Advances in COMPUTERS
VOLUME 34

Contributors to This Volume

J. K. AGGARWAL
TED J. BIGGERSTAFF
LAWRENCE CHISVIN
R. JAMES DUCKWORTH
RALPH DUNCAN
WILLIAM I. GROSKY
RUDY HIRSCHHEIM
HEINZ K. KLEIN
RAJIV MEHROTRA
N. NANDHAKUMAR
Advances in COMPUTERS

EDITED BY
MARSHALL C. YOVITS
Purdue School of Science
Indiana University-Purdue University at Indianapolis
Indianapolis, Indiana

VOLUME 34

ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
Boston  San Diego  New York
London  Sydney  Tokyo  Toronto
This book is printed on acid-free paper.

Copyright © 1992 by Academic Press, Inc.
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ACADEMIC PRESS, INC.
1250 Sixth Avenue, San Diego, CA 92101-4311

United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX

Library of Congress Catalog Card Number: 59-15761

ISBN 0-12-012134-4

Printed in the United States of America
92 93 94 95    9 8 7 6 5 4 3 2 1
Contents

Contributors . . . vii
Preface . . . viii

An Assessment and Analysis of Software Reuse
Ted J. Biggerstaff
1. Introduction . . . 1
2. Software Reusability Successes . . . 10
3. Examples of Reuse Implementation Technologies . . . 30
4. Effects of Key Factors . . . 38
5. Futures and Conclusions . . . 53
6. References . . . 54

Multisensory Computer Vision
N. Nandhakumar and J. K. Aggarwal
1. Introduction . . . 59
2. Approaches to Sensor Fusion . . . 63
3. Computational Paradigms for Multisensory Vision . . . 86
4. Fusion at Multiple Levels . . . 99
5. Conclusions . . . 105
6. References . . . 107

Parallel Computer Architectures
Ralph Duncan
1. Introduction . . . 113
2. Terminology and Taxonomy . . . 115
3. Synchronous Architectures . . . 118
4. MIMD Architectures . . . 129
5. MIMD Execution Paradigm Architectures . . . 139
6. Conclusions . . . 149
   Acknowledgments . . . 152
   References . . . 152

Content-Addressable and Associative Memory
Lawrence Chisvin and R. James Duckworth
1. Introduction . . . 160
2. Address-Based Storage and Retrieval . . . 162
3. Content-Addressable and Associative Memories . . . 164
4. Neural Networks . . . 174
5. Associative Storage, Retrieval, and Processing Methods . . . 176
6. Associative Memory and Processor Architectures . . . 184
7. Software for Associative Processors . . . 212
8. Conclusion . . . 225
   Acknowledgments . . . 228
   References . . . 229

Image Database Management
William I. Grosky and Rajiv Mehrotra
1. Introduction . . . 237
2. Image Database Management System Architecture . . . 239
3. Some Example Image Database Management Systems . . . 249
4. Similarity Retrieval in Image Database Systems . . . 266
5. Conclusions . . . 283
   Acknowledgments . . . 283
6. References and Bibliography . . . 283

Paradigmatic Influences on Information Systems Development Methodologies: Evolution and Conceptual Advances
Rudy Hirschheim and Heinz K. Klein
1. Introduction . . . 294
2. Evolution of Information Systems Development Methodologies . . . 295
3. Methodologies and Paradigms . . . 305
4. Paradigms and the Continued Evolution of Methodologies . . . 325
5. Conclusion . . . 366
   Acknowledgments . . . 367
6. Appendices: Summaries of the Methodologies . . . 367
   References . . . 381

Author Index . . . 393
Subject Index . . . 405
Contents of Volumes in this Series . . . 413
Contributors

Numbers in parentheses refer to the pages on which the authors' contributions begin.

J. K. Aggarwal (59), Computer and Vision Research Center, College of Engineering, The University of Texas, Austin, Texas 78712
Ted J. Biggerstaff (1), Microelectronics and Computer Technology Corporation, Austin, Texas 78759
Lawrence Chisvin (159), Digital Equipment Corporation, Hudson, Massachusetts 01749
R. James Duckworth (159), Department of Electrical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609
Ralph Duncan (113), Control Data, Government Systems, Atlanta, Georgia 30328
William I. Grosky (237), Computer Science Department, Wayne State University, Detroit, Michigan 48202
Rudy Hirschheim (293), College of Business Administration, University of Houston, Houston, Texas 77204
Heinz K. Klein (293), School of Management, State University of New York, Binghamton, New York 13901
Rajiv Mehrotra (237), Computer Science Department, Center for Robotics and Manufacturing Systems, University of Kentucky, Lexington, Kentucky 40506
N. Nandhakumar (59), Department of Electrical Engineering, University of Virginia, Charlottesville, Virginia 22903
Preface

The publication of Volume 34 of Advances in Computers continues the in-depth presentation of subjects of both current and continuing interest in computer and information science. Contributions have been solicited from highly respected experts in their fields who recognize the importance of writing substantial review and tutorial articles in their areas of expertise. Advances in Computers permits the publication of survey-type articles written from a relatively leisurely perspective. By virtue of the length of the chapters included, authors are able to treat their subjects both in depth and in breadth. The Advances in Computers series began in 1960 and now continues in its 33rd year with this volume. During this period, in which we have witnessed great expansion and dynamic change in the computer and information fields, the series has played an important role in the development of computers and their applications. The continuation of the series over this lengthy period is a tribute to the reputations and capabilities of the authors who have contributed to it.

Included in Volume 34 are chapters on software reuse, multisensory computer vision, parallel computer architecture, associative memory, image databases, and paradigms for information systems development.

In the first chapter Ted Biggerstaff states that software reusability is an approach that under special circumstances can produce an order of magnitude improvement in software productivity and quality, and under more common circumstances can produce less spectacular but nevertheless significant improvements in both. His chapter examines several aspects of reuse. He concludes that software reuse provides many opportunities for significant improvements to software development productivity and quality within certain well-defined contexts. If one understands where it works well and why, it can be a powerful tool in one's arsenal of software development tools and techniques.

Nandhakumar and Aggarwal in Chapter 2 consider that computer vision broadly includes a variety of sensing modes. They conclude that the advantages of multisensory approaches to computer vision are evident from their discussions. The integration of multiple sensors or multiple sensing modalities is an effective method of minimizing the ambiguities inherent in interpreting perceived scenes. The multisensory approach is useful for a variety of tasks including pose determination, surface reconstruction, object recognition, and motion computation, among others.

In the third chapter Ralph Duncan indicates that the term parallel processing designates the simultaneous execution of multiple processors to solve a single computational problem cooperatively. Parallel processing has
attracted a great deal of recent interest because of its potential for making difficult computational problems tractable by significantly increasing computer performance. He further states that parallel processing must be supported by architectures that are carefully structured for coordinating the work of many processors and for supporting efficient interprocessor communications. His chapter’s central aim has been to show that, despite their diversity, parallel architectures define a comprehensible spectrum of machine designs. Each of the major parallel architecture classes included represents a fundamental approach to supporting parallelized program execution effectively. Chisvin and Duckworth in the fourth chapter state that associative memory has finally come of age. After more than three and a half decades of active research, industry integrated circuit design and fabrication ability has finally caught up with the vast theoretical foundation built up over that time. In the past five years, in particular, there has been an explosion in the number of practical designs based upon associative concepts. Advances in very largescale integration technology have allowed many previous implementation obstacles to be overcome. Their chapter describes the field of contentaddressable memory and associative memory, and the related field of associative processing. Compared to conventional memory techniques, contentaddressable and associative memory are totally different ways of storing, manipulating, and retrieving data. In the next chapter Grosky and Mehrotra discuss database management systems for images. Although database management systems were originally developed for data processing applications in a business environment, there has recently been much interest expressed in the database community for devising databases for such nonstandard data as graphics, maps, images, video, and audio, as well as their various combinations. Much of the initial impetus for the development for such nonstandard databases originated in the scientific community concerned with the type of data that was to be managed. Grosky and Mehrotra convey an appreciation for the continuing development of the field of image databases. They believe that since researchers in the database community have shown a mutual interest in its development, the field of image database management should experience much growth. This field is still in its infancy and not yet on a firm footing; the correct questions are just starting to be asked, let alone answered. Hirschheim and Klein, in the final chapter, state that the subject of computer-based information systems development has received considerable attention in both the popular and academic literature over the past few decades. One area that continues to have a high profile and where a remarkable amount of interest can easily be observed is in the approaches or methodologies for developing information systems. It is likely that hundreds of
different methodologies exist. In this chapter, the authors explore the emergence of alternative information systems development methodologies, placing them in their historical context and noting where and why they differ from each other. Hirschheim and Klein believe that the history of methodologies appears to be driven more by fashionable movements than by theoretical insights. They conclude that, from the beginning, methodologies were influenced primarily by functionalism, but more recently the inspiration has come from alternative paradigms. They have also shown that methodologies can be improved by systematically importing fundamental concerns and principles inspired by different paradigms. I am pleased to thank the contributors to this volume. They have given extensively of their time and effort to make this book an important and timely contribution to their profession. Despite the considerable time and effort required, they have recognized the importance of writing substantial review and tutorial contributions in their areas of expertise; their cooperation and assistance are greatly appreciated. Because of their efforts, this volume achieves a high level of excellence and should be of great value and substantial interest for many years to come. It has been a pleasant and rewarding experience for me to edit this volume and to work with the authors.
MARSHALL C. YOVITS
An Assessment and Analysis of Software Reuse

TED J. BIGGERSTAFF
Microelectronics and Computer Technology Corp.
Austin, Texas
1. Introduction . . . 1
   1.1 Hyperboles of Reuse . . . 2
   1.2 Key Factors Fostering Successful Reuse . . . 4
2. Software Reusability Successes . . . 10
   2.1 Fourth-Generation Languages (LSR to VLSR) . . . 10
   2.2 Application Generators (VLSR) . . . 13
   2.3 Forms Designer Systems (LSR to VLSR) . . . 15
   2.4 Interface Developer's Toolkits . . . 18
   2.5 The Software Factory (MSR to LSR, Process-Oriented Reuse) . . . 20
   2.6 Emerging Large-Scale Component Kits (LSR) . . . 22
   2.7 User-Oriented Information System (LSR to VLSR) . . . 23
   2.8 Application-Specific Reuse (LSR to VLSR) . . . 25
   2.9 Designer/Generators (LSR to VLSR) . . . 27
3. Examples of Reuse Implementation Technologies . . . 30
   3.1 Classification and Library Systems . . . 30
   3.2 CASE Tools . . . 31
   3.3 Object-Oriented Programming Systems . . . 33
4. Effects of Key Factors . . . 38
   4.1 Relationships among the Reuse Factors . . . 38
   4.2 A Quantitative Model of the Relative Amount of Integration Code . . . 41
5. Futures and Conclusions . . . 53
   5.1 Futures . . . 53
   5.2 Conclusions . . . 54
6. References . . . 54
1. Introduction
Software reusability (Biggerstaff and Perlis, 1984; Biggerstaff and Richter, 1987; Freeman, 1987; Tracz, 1987, 1988; Biggerstaff and Perlis, 1989; Weide et al., 1991) is not a “silver bullet”* (Brooks, 1987), but is an approach that under special circumstances can produce an order of magnitude improvement in software productivity and quality, and under more common
* The phrase “silver bullet” is jargon that refers to a panacea for software development.
circumstances can produce less spectacular but nevertheless significant improvements in both. This chapter will examine several aspects of reuse: (1) reuse hyperboles that lead to false expectations, (2) examples of reuse successes, (3) the factors that make these examples successful, (4) the relationships among these factors, (5) in particular, the relationship between reuse technologies and their potential for productivity and quality improvement, and (6) the quantitative relationship between the key factors and the resultant reuse benefits.
1.1 Hyperboles of Reuse
After listening to a series of speakers, each promising additive cost decreases that were summing suspiciously close to 100%, one wag was heard to comment, “If this keeps up, pretty soon our internal software development activities will be so efficient that they will start returning a profit.” As in this story, software reusability hyperboles often strain credulity. Unfortunately, software reusability hyperbole is more seductive than software reusability reality. There are several major reuse hyperboles that reflect some measure of truth but unfortunately overstate the profit of reuse or understate the required qualifications and constraints.

• Reuse technology is the most important factor to success. This is an aspect of the silver bullet attitude and is typified by statements like: “If I choose Ada, or Object-Oriented programming or an application generator then all other factors are second- and third-order terms in the equation that defines the expected improvement. Success is assured.” However, this is seldom completely true. While the technology can have very high impact (as with application generators for example), it is quite sensitive to other factors such as the narrowness of the application domain, the degree to which the domain is understood, the rate of technology change within the domain, the cultural attitude and policies of the development organizations, and so forth. Yes, the technology is important but it is not always primary nor even a completely independent factor.

• Reuse can be applied everywhere to great benefit. This is another aspect of the silver bullet attitude that one can apply reuse to any problem or application domain with the same expectation of high success. The reality is that narrow, well-understood application domains with slowly changing technologies and standardized architectures are the most likely to provide a context where reuse can be highly successful. For
example, well-understood domains like management information systems (MIS) and business applications, user interfaces, narrowly defined product lines, numerical computation, etc. all, to a greater or lesser extent, have these qualities and reuse has flourished in these environments. Reuse has failed in new, poorly understood domains.

• Reuse is a hunter/gatherer activity. Making a successful reuse system is largely an intellectual activity of finding the right domain, the right domain standards, the infrastructure, and the right technical culture. It is not simply a matter of going out into the field and gathering up components left and right. Casually assembled libraries seldom are the basis of a high payoff reuse system. Successful reuse systems are crafted to accomplish a set of well and narrowly defined company or organizational goals. Too general a set of goals (e.g., we need a reuse system) or too general a domain (e.g., we need components that support all of our functional needs) usually lead to a low payoff. The hidden truth in this attitude is that populating a reuse library is largely fieldwork and that the “gold” is in the domain. But the success comes through problem-driven harvesting, establishing domain standards to enhance component interconnectability, and careful adaptation of the harvested components to those interconnection standards.

• We can have reuse without changing our process. Reuse is sensitive to many cultural, policy and environmental factors. An anti-reuse attitude within an organization, a process that is inconsistent with reuse, or a weak, unsupportive infrastructure (software and process) can doom a potentially successful reuse effort.
Given that we reject these hyperboles, let us look at the reality of software reuse. In the broadest sense, software reuse is the formalization and recording of engineering solutions so that they can be used again on similar software developments with very little change. Hence, in one sense, the software reuse process institutionalizes the natural process of technology evolution. Consider the evolution of commercial software products. Successful companies often maximize their competitiveness by focusing on product niches where they can build up their technological expertise and thereby their product sets and markets, in an evolutionary fashion. For example, over a period of years, a company might evolve a line editor into a screen editor and then evolve that into a word processor and finally evolve that into a desktop publishing system. Each generation in such an evolution exploits elements of the previous generations to create new products and thereby build new markets. In an informal sense, such a company is practicing reuse within a product niche. The companies that formalize and institutionalize this process are truly practicing reuse. Since this definition of reuse
is independent of any specific enabling technology (e.g., reuse libraries or application generators), it allows us to take a very broad view of reuse, both in the range of potential component types that can be reused (e.g., designs, code, process, know-how, etc.) as well as in the range of technologies that can be used to implement reuse. The success of a reuse strategy depends on many factors, some of them technical and some of them managerial. While we will attempt to point out management factors that foster or impede reuse, we will largely focus on the technology of reuse. In the next subsection, we hypothesize a number of factors or properties that we believe foster successful software reuse. Then in the following sections of the chapter, we will examine several reuse successes and the role that these factors played in those successes. Finally, we attempt to build a qualitative model that describes the interrelationship among the factors and a quantitative model that describes the effects of two of the key independent technology factors on the payoff of software reuse. In the end, we hope to leave the reader with a good sense of the kinds of reuse approaches and technologies that will lead to success and those that will not.
1.2 Key Factors Fostering Successful Reuse
Some of the key factors that foster successful reuse are : 0 0
0
Narrow domains Well-understood domains/architectures Slowly changing domain technology Intercomponent standards Economies of scale in market (opportunities for reuse) Economies of scale in technologies (component scale) Infrastructure support (process and tools) Reuse implementation technology
Narrow domains: The breadth of the target domain is the one factor that stands out above all others in its effect on productivity and quality improvement. Typically, if the target domain is so broad that it spans a number of application areas (often called horizontal reuse) the overall payoff of reuse for any given application development is significantly smaller than if the target domain is quite narrow (often called vertical reuse). The breadth of the target domain is largely discretionary, but there is a degree to which
AN ASSESSMENT AND ANALYSIS OF SOFTWARE REUSE
5
the reuse implementation technology may constrain the domain breadth. There is a range of implementation technologies, with broad-spectrum technologies at one end and narrow-spectrum technologies at the other. Broadspectrum technologies (e.g., libraries of objects or functions) impose few or no constraints on the breadth of the target domain. However, narrow-spectrum technologies, because of their intimate relationship with specific domain niches, do constrain the breadth of the target domain, and most often constrain target domains quite narrowly. In general, narrow-spectrum implementation technologies incorporate specialized application domain knowledge that amplifies their productivity and quality improvements within some specific but narrow domain. As an example, fourth-generation languages (4GLs) assume an application model that significantly improves the software developer’s ability to build MIS applications but is of no help in other domains such as avionics. Even though there is a restrictive relationship only at one end of the spectrum (between narrow target domains and narrow implementation technologies), in practice there seems to be a correlation between both ends of the spectrum. Not only do narrow-spectrum technologies, perforce, correspond to narrow target domains but broad-spectrum technologies often (but not always) correspond to broader domains. The key effect of domain breadth is the potential productivity and quality improvement possible through reuse. Reuse within very narrow domains provides very high leverage on productivity and quality for applications (or portions of applications) that fall within the domain but provides little or no leverage for applications (or portions of applications) that fall outside the domain. For example, an application generator might be used to build MIS applications and it would give one very high leverage on the data management portion of the application but it would not help at all in the development of the rest of the application. Luckily, MIS applications are heavily oriented toward data management and therefore, such reuse technologies can have a significant overall impact on MIS applications. Broad-spectrum technologies, on the other hand, show much less productivity and quality improvement on each individual application development but they affect a much broader class of applications. Generally speaking, the broad-spectrum technologies we are going to consider can be applied to virtually any class of application development. In the succeeding sections, we will often use the general terms narrowspectrum reuse and broad-spectrum reuse to indicate the breadth of the domain without any specific indication of the nature of the implementation technology being used. If the breadth of the implementation technology is important to the point, we will make that clear either explicitly or from context.
6
TED J. BIGGERSTAFF
Well-understood domaindarchitectures: The second key factor affecting the potential for reuse success is the level of understanding of problem and application domains, and the prototypical application architectures used within those domains. Well-understood domains and architectures foster successful reuse approaches and poorly understood domains and architectures almost assure failure. Why is this? Often as a domain becomes better and better understood, a few basic, useful, and successful application architectures evolve within the domain. Reuse systems can exploit this by reusing these well-understood architectural structures so that the software developer does not have to recreate or invent them from scratch for each new application being developed. However, if such application architectures have not yet evolved or are not known by the implementing organization, it is unlikely they will be discovered by a reuse implementation project. The fact that the problem domains in which narrow-spectrum reuse has been successful are well-understood domains is not coincidental. In fact, it is a requirement of a narrow-spectrum reuse technology. This observation points up a guideline for companies that intend to build a narrow spectrum reuse system to support application development. To successfully develop a narrow-spectrum reuse technology, say an application generator or a domain-specific reuse library, the developer must thoroughly understand the problem and application domain and its prototypical architectures in great detail before embarking on the development of a reuse system for that domain.
There is a three-system rule of thumb-if one has not built at least three applications of the kind he or she would like to support with a narrowspectrum technology, he or she should not expect to create a program generator or a reuse system or any other narrow-spectrum technology that will help build the next application system. It will not happen. One must understand the domain and the prototypical architectures thoroughly before he or she can create a narrow-spectrum reuse technology. Hence, the biggest, hardest, and most critical part of creating a narrow-spectrum technology is the understanding of the domain and its prototypical architectures. Slowly changing domain technology: Not only must one understand the domain but the domain needs to be a slowly changing one if it is to lend itself to reuse technology. For example, the domain of numerical computation is one in which the underlying technology (mathematics) changes very little over time. Certainly, new algorithms with new properties are invented from time to time (e.g., algorithms allowing high levels of parallel computation) but these are infrequent and the existing algorithms are largely constant.
A N ASSESSMENT A N D ANALYSIS
OF SOFTWARE REUSE
7
Thus, if an organization makes a capital investment in a reuse library or an application generator for such domains, they can amortize that investment over many years. Rapidly changing domains, on the other hand, do not allow such long periods of productive use and, therefore, d o not offer as profitable a return on the initial investment. Intercomponent standards: The next factor is the existence of intercomponent standards. That is, just like hardware chips plug together because there are interchip standards, software components, and especially narrowspectrum technology components plug together because there are analogous intercomponent standards. These standards arise out of an understanding of the problem domains and the prototypical architectures. The narrower the domain, the narrower and more detailed the intercomponent standards. In very broad domains, these standards deal with general interfaces and data (e.g., the format of strings in a string package), whereas in a narrow domain the standards are far more narrowly focused on the elements of that domain (e.g., in an “input forms” domain, the standards might specify the basic data building blocks such as field, label, data type, data preskntation form, and so forth). This factor suggests that certain narrow spectrum reuse technology strategies will not work well. For example, if one intends to build a library of reusable software components, the strategy of creating a library and then filling it with uncoordinated software components, will lead to a vast wasteland of components that do not fit together very well. Consequently, the productivity improvement will be low because the cost to adapt the components is high. The analogy with hardware manufacturing holds here. If two software components (or chips) are not designed to use the same kinds of interfaces and data (signals), extra effort is required to build interface software (hardware) to tie them together. This reduces that payoff gained by reuse and also tends to clutter the design with Rube Goldberg patches that reduce the resulting application’s maintainability and limit its ability to evolve over time. Economies of scale in market: Another important factor is the economies of scale in the “market,” where we are using the term market in the broadest sense of the word and intend to include the idea that the total coalition of users of a component, regardless of the means by which they acquire it, is the market for that component. Thus, economies of scale in the market means that any reuse technology should be driven by a large demand or need. One should be able to identify many opportunities to apply the reuse technology to justify its development (or purchase) and maintenance. If you
8
TED J. BIGGERSTAFF
are only going to develop one or two applications, it seldom pays to develop (or purchase) a reuse technology for the target application. This is not to say that informal, ad hoc or opportunistic reuse, which is not organizationally formalized, should not be exploited. The point is that if an institutionalized reuse technology costs a company a lot to develop and maintain, it should return a lot more in savings to that company. One way to gauge that return beforehand is to consider the opportunities for reuse. Economies of scale in technologies: There are also economies of scale in the technologies themselves, in the sense that, the larger the prefabricated component that is used by the reuse technology, the greater the productivity improvement for each use. And it is this increase in size of components that tends to force the narrowing of the technology domain. Thus, the size of the prefabricated component, the narrowness of the application domain, and the potential productivity improvement are all positively correlated. Because the scale of the components is so important and the fact that scale correlates to other important properties of reuse technologies, we introduce some broad terminology that draws on the hardware component analogy. Smull-scule components are defined to be from 10 to 100 lines of code, i.e., O(I0’) LOC; medium-scale components are those from 100 to 1000 lines, i.e., O(IO’) LOC; /urge-scale from 1000 to 10,000 lines, i.e., O(IO’) LOC; uery large-scale from 10,000 to 100,000 lines, i.e., 0(104) LOC; and hyperscule above 100,000 lines, i.e., greater than 0(105) LOC. The sizes that we choose are somewhat arbitrary and flexible because we are most interested in the relative properties of the reuse systems that rely on the different scales of components. Therefore, the numbers should not be taken too literally but rather should provide a loose categorization of component sizes. Carrying the hardware analogy further, we use the term SSR (small-scale reuse) to refer to those technologies that tend to use small-scale components on the average. SSR is motivated by the hardware term SSI (small-scale integration). Similarly, MSR, LSR, VLSR, and HSR are medium-scale, large-scale, very large-scale and hyper-scale reuse technologies. While reuse technologies are not, strictly speaking, limited lo a particular scale, they seem to most easily apply to a characteristic scale range. For example, libraries of functions tend toward small scale and medium scale not because it is impossible to build large and very large function-based components, but rather because of the lack of formal support for large-scale design structures (e.g., objects or frameworks) in functionally based programming languages. Any such large-scale design structure falls outside of the functional language formalism and must be manually enforced. Experience has shown that manual enforcement tends not to be very successful. It is generally
AN ASSESSMENT AND ANALYSIS OF SOFTWARE REUSE
9
easier to use other reuse implementation technologies (e.g., true object-based languages) that provide formal mechanisms to enforce and manage these larger-scale structures. Infrastructure support: Another important factor is an organization’s infrastructure. Most reuse technologies (and especially the narrow spectrum technologies) pay off best when they are coordinated with an existing, welldefined, and mature software development infrastructure (process). For example, an organization that uses computer-aided software engineering (CASE) tools is better positioned to exploit the reuse of design information than one that does not. CASE tools provide a (partially) formal notation for capturing such designs. And if an organization is already trained and using CASE tools, the additional effort to integrate a library of reusable designs into the process is significantly less than it would be otherwise. Reuse implementation technologies: One factor that can effect the degree of success of a reuse approach is the implementation or enabling technology that one chooses. For many narrow spectrum approaches to reuse, the technology is intimately tied to the approach and it makes more sense to discuss these technologies in the context of the discussions of the specific approaches. We will do this in the following section. On the other hand, broad-spectrum implementation technologies are not tied to any specific reuse approach, even though they are quite often used for broad-spectrum reuse, and so we will mention a few instances of these technologies here and discuss their values. 0
0
• Libraries: Library technology is not a primary success factor but its value lies largely in establishing a concrete process infrastructure that fosters reuse by its existence more than by its functionality. If an organization's first response to a reuse initiative is to build a library system, then they probably have not yet thought enough about the other more important factors.
• Classification systems: The main value of classification systems is that they force an organization to understand the problem and application domain.
• CASE tools: Their value lies in establishing a representation system for dealing with designs and thereby including reusable components that are more abstract (and therefore, more widely reusable) than code.
• Object-oriented programming languages: Their main value is in the perspicuity of the representation and its tendency to foster larger and more abstract reusable components (i.e., classes and frameworks) than
10
in earlier languages (i.e., functions). Further, the object-oriented representation tends to lead to clearer, more elegant and more compact designs. In summary, reuse success is not a result of one technology or one process model or one culture. It is a result of many different mixtures of technologies, process models, and cultures. We can be guided by a few general principles that point in the direction of success and warn us away from surefire failures, but in the end, the details of success are defined by hard technical analysis and a strong focus on the application and problem domains. I suspect that there is an 80/20 rule here-the domain has an 80% effect and all of the rest has a 20% effect. 2.
2. Software Reusability Successes
Now let us consider some cases of successful reuse and analyze them in the light of these success factors.
2.1 Fourth-Generation Languages (LSR to VLSR)
Among the earliest rapid software development technologies to appear, and ones that can be bought off the shelf today, are fourth-generation languages (4GLs) (Gregory and Wojtkowski, 1990; Martin, 1985; Martin and Leben, 1986a, b). These are quite narrow technologies that apply most specifically to the domain of MIS and business applications. The entities that are being reused in these cases are the abstract architectural structures (i.e., design components) of MIS applications. The typical 4GL system provides the end user with some kind of high-level capability for database management. For example, a high-level query from the end-user is often translated into an application database transaction that generates a report. The report may be a business form, a text-based report, a graph, a chart, or a mixture of these elements (see Fig. 1). 4GLs are typically very high-level languages that allow you to talk to the database system without all of the overhead that you would have to use if you were writing an equivalent COBOL program. In a COBOL program you might have to allocate memory and buffers to handle the results from the query. You might have to open the database, initiate the search, and so forth. In contrast, 4GL languages typically do all of those things for you. They provide a language that requires you to talk only about the essential database operations. For example, Fig. 2 shows a Structured Query Language (SQL) query that selects a part number from a table of all parts, such that the weight of the associated part is less than 700 pounds.
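To give a concrete flavor of the kind of statement Fig. 2 describes, the short sketch below runs such a query using Python's built-in sqlite3 module. It is only an illustration written for this discussion: the PARTS table and its PNO and WEIGHT columns are assumed, illustrative names, not taken from the original figure.

    # Illustrative sketch only: the PARTS table and its PNO/WEIGHT columns
    # are assumed names, not the ones in Fig. 2.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PARTS (PNO TEXT, PNAME TEXT, WEIGHT REAL)")
    conn.executemany(
        "INSERT INTO PARTS VALUES (?, ?, ?)",
        [("P1", "bolt", 12.0), ("P2", "frame", 950.0), ("P3", "panel", 640.0)],
    )

    # The query states only the essential operation; buffer allocation,
    # cursor management, and report formatting are left to the system.
    for (pno,) in conn.execute("SELECT PNO FROM PARTS WHERE WEIGHT < 700"):
        print(pno)   # prints P1 and P3

The point of the comparison with COBOL is exactly this brevity: everything except the statement of the essential database operation is handled by the underlying system.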
FIG. 1. High-level query or processing request.
. . . > AFI, and more specifically SC = 10^n * AFI. This is a convenient approximation because it provides a simple if approximate relationship between PAC and component scale. That is, for AFI near 1, n is approximately log10(average component size), and for AFI near 10, n + 1 is approximately log10(average component size), and so forth. Thus, n is a relative gauge of the component scale. If one makes a few simplifying assumptions about AFI's range, we have an independent variable that ranges over the reuse scale, namely, SSR, MSR, LSR, VLSR, etc. Thus, we can easily relate the approximate (average) amount of work involved in connection of reused components to the scale of those components.
Using this approximation, Eq. (4.9) becomes

PAC = AFI / (10^n * AFI * (1 + P) + AFI).

Canceling out AFI, we get

PAC = 1 / (10^n * (1 + P) + 1).          (4.13)

For n = 1, 2, 3, . . . , we get

PAC(n=1) = 1 / (10 * P + 11)
PAC(n=2) = 1 / (100 * P + 101)
PAC(n=3) = 1 / (1000 * P + 1001)

and so forth. Thus, for n > 0,

PAC_approx = 1 / (10^n * (1 + P)).          (4.14)
We can see that for at least one order of magnitude difference between the component scale (SC) and the average number of connections (AFI), the amount of total connection code is below 10% (for n = 1 and p = 0) and well below that for larger n's. Thus, for libraries with good interconnection standards and large components, the amount of work involved in interconnection is small relative to the overall development. The payoff of reuse is seen quite clearly in this case by examining the ratio of connection code to reused code, which is approximately the inverse of the component scale for small AFI:

PAC / PAR ≈ 1 / 10^n.
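Before drawing the practical conclusion, the algebra is easy to check numerically. The following short Python sketch (written for this assessment, not taken from the chapter) evaluates the exact expression of Eq. (4.13) against the approximation of Eq. (4.14) for a few values of n and P.

    # Proportion of connection code (PAC), assuming SC = 10**n * AFI.
    # P is the ratio of new code to reused code.

    def pac_exact(n, p):          # Eq. (4.13)
        return 1.0 / (10**n * (1.0 + p) + 1.0)

    def pac_approx(n, p):         # Eq. (4.14), valid for n > 0
        return 1.0 / (10**n * (1.0 + p))

    for n in (1, 2, 3):
        for p in (0.0, 1.0, 4.0):
            print(f"n={n}, p={p}: PAC={pac_exact(n, p):.4f}, "
                  f"approx={pac_approx(n, p):.4f}")

    # For n = 1 and p = 0 the exact value is 1/11, about 0.09, already
    # below 10%; it falls by roughly a factor of ten for each additional
    # order of magnitude of component scale.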
If rather than just examining the proportion of interconnection code, we would like to know the proportion of reused code (and by implication the proportion of code to be developed), we can perform a similar set of algebraic manipulations to derive the formulas for PAR in each of the three
50
TED J. BIGGERSTAFF
Poorly Standardized Libraries
I
Well Standardized Libraries
ili!!T PAC
P I 0 P r 1
P=2 P = 3 P=4
0.25
1
2
3
4
5
Relatively Small Components
a
AFI (Averape Fan-In)
PAC
PAC
1 -
=
2 c P
PAC
--
-
- 1
PAR
MI-4 AFI 3 AFI
I
2
AFI = 1
PAC
1
=
lo* *
(1
+
P)
+
1
PAC < 0 . 1 0 for all PAC
AF I
=
(1
PAC
--
- AFI
PAR
+
=
P)
+
( n = 1 or n > 1 ) and AFI
Fan-In
all ( p = PAC -=-
1
PAR
10"
0
or p > 0)
FIG. 22. Summary of case analysis
cases considered earlier. The results of these derivations are: 1
CASE 1 :
PAR =
CASE 2:
PAR=2+P
CASE 3 :
PAR =
(1
+ P) + AFI 1
10" 10" * ( 1
+ P) + 1
'
Relatively Large Components
51
AN ASSESSMENT AND ANALYSIS OF SOFTWARE REUSE
The formula for case 3 is fairly complex to compute and it would be convenient to have a simpler approximation to PAR for case 3. The apparent proportion of application reuse (APAR) is a useful approximation. APAR is defined as
APAR
RLOC RLOC + NLOC
= ___
which can be expressed as
APAR
1
=-
1 +P'
In other words, APAR ignores the connection code, assuming it to be small. Obviously, this approximation only works for some situations. The question is, Under what circumstances is this a good approximation of PAR? Figure 23 shows the APAR curve in comparison with the PAR curves for case 2 and several parameterizations of case 1. It is clear from this figure that APAR is generally not a good approximation for either case 1 or case 2 . However, for case 3, APAR is a pretty good approximation under most parameterizations. For n > or = 2, the connectivity does not significantly alter the percent of reused code and APAR is a good approximation. For n = 1, the worst case is when p = 0, and even in this case, the difference is only about 0.08. The remaining integral values of p (greater than 0) differ by no more than 0.02. For n = 0, the formula reduces to case 2 . This leads
AFI=l
(a case
2)
I 1
2
3
4
5
6
P (New/Reused Ratio) FIG. 23. Proportion of reuse code (apparent and real).
I
7
52
TED J. BIGGERSTAFF
to the following rule of thumb: If on the averagc, thc component scale (SC) is one or more orders of magnitude greater than AFI (the average interconnection fan-in) and the reuse library is well standardized (ACC is near 11, the connectivity code has no appreciable effect on the reuse proportions and APAR is a good approximation for PAR.
4.2.4 Effects on Defect Removal In the previous sections, we have focused largely on the excessive plumbing costs that arise from poorly standardized libraries and small components. The analytical model also has cost avoidance implications with respect to defect removal that may be as great or greater than the cost avoidance that accrues from well-designed reuse regimes. The important facts to note are: 0
0
Since reused code has significantly fewer defects than new code, defect removal from reused code is usually significantly cheaper than from new code. It is not unusual for there to be anywhere from several times to an order of magnitude difference between these costs. Since connective code is new code, it will exhibit the higher defect rates and therefore, higher defect removal costs than reused code.
When considering the effects of reuse regimes on defect removal, the conclusions are the same as when considering the effects of reuse regimes on basic development, i.e., make the connective code be as small as possible, thereby making PAR as large as possible. Each line of reused code will cost several times (and perhaps even an order of magnitude) less for defect removal than a line of new code or connective code. Therefore, the less connective code we have, the better. Thus, we are drawn to the same conclusions as above: to make defect removal as inexpensive as possible, we need to standardize our libraries and use large components.
4.2.5 Conclusions from the Model In summary, the conclusions drawn from our analytical model confirm those that we reached by qualitative argument and case study observations: 0
0
Library standards (most often expressed in terms of application domain data structure and protocol standards) are effective in promoting reuse. Large components reduce the relative effort to interconnect reusable components in all but those libraries with the poorest level of standardization.
AN ASSESSMENT AND ANALYSIS OF SOFTWARE REUSE
53
Therefore, the conclusion must be to develop large components (which tend toward domain specificity) and use a small set of (domain-specific) data structures and protocols across the whole library of components.
5. Futures and Conclusions 5.1 Futures If I try to predict the future evolution of reuse, I see two major branchesvertical reuse and horizontal reuse-that break into several minor branches. In the vertical reuse branch, I see large-scale component kits becoming the “cut-and-paste’’ components for end-user application creation. That is, more and more applications will be constructed by using folders of facilities that are analogs of the clip art that is so widespread in today’s desktop publishing. Of course, such end-user programming will have inherent limitations and therefore, will not replace professional programming, only change its span and focus. The other major evolutionary branch within vertical programming evolution will be the maturation of application-specific reuse, which will evolve toward larger-scale components and narrower domains. This technology will be used largely by the professional programmer and will probably focus mostly on application families with a product orientation. Even though the productivity and quality improvements will be high, as with all vertical reuse technologies, the motivation in this case will be less a matter of productivity and quality improvement and more a matter of quick time to market. More and more software companies are succeeding or failing on the basis of being early with a product in an emerging market. As they discover that reuse will enhance that edge, they will evolve in toward reuse-based product development. Interestingly, I doubt that any of the vertical reuse approaches will long retain the label “reuse,” but more likely, the technology will be known by application specific names, even though, in fact, it will be reuse. The second major evolution of reuse technologies will be in the area of horizontal reuse and here I see two major branches-systems enhancements and enabling technologies. As technologies like interface toolkits, user-oriented information systems, and 4GL-related technologies mature and stabilize, they will become more and more part of the operating system facilities. This is not so much a statement of architecture, in that they will probably not be tightly coupled with the operating systems facilities, but more a matter of commonly being a standard part of most workstations and PCs. In fact, a litmus test of the maturity of these technologies is the degree to which they
54
TED J. BIGGERSTAFF
are considered a standard and necessary part of a delivered computer. One can see this kind of phenomenon currently happening with the X windows system. Within 10 or so years, it will probably be difficult and unthinkable to buy a workstation or PC that does not have some kind of windowing interface delivered with it. The other major branch of horizontal reuse is the set of reuse enabling technologies. More and more these technologies will merge into a single integrated facility. The object-oriented language systems and their associated development environments (i-e., the integrated debuggers, editors, profilers, etc.) will be integrated with the CASE tools such that the design and source code become an integral unit. The CASE tools themselves will be enhanced by designer/generator systems to allow them to do increasingly more of the work for the designer/programmer by using reuse technologies and libraries. Finally, I expect to see both the CASE tools and programming language development environments merge with reverse engineering, design recovery, and re-engineering tools and systems. These reverse engineering, design recovery, and re-engineering tools all support the population of reuse libraries as well as the analysis, understanding and maintenance of existing systems. Without such systems, the reuse libraries will largely be empty and the technology impotent. These are the systems that allow an even more primitive kind of reuse, that of bootstrapping previous experience into formal reusable libraries and generalized reusable know-how. Thus, while horizontal reuse and vertical reuse will evolve along different paths, both will move from independent tool sets to integrated facilities and consequently their leverage will be amplified. 5.2 Conclusions
There are no silver bullets in software engineering, and reuse is not one either, although it may come as close as anything available today. While not a silver bullet or cure-all, it does provide many opportunities for significant improvements to software development productivity and quality within certain well-defined contexts. If one understands where its works well and why, it can be a powerful tool in one’s arsenal of software development tools and techniques. REFERPNCES Arango, G. (1988). Domain Engineering for Software Reuse, Ph.D. dissertation, University of California at Ivine. Batory, D. S. (1988). Concepts for a Database System Compiler, ACM PODS. Batory, D. S . , Barnett, J. R., Roy, J., Twichell, B. C., and Garza, J. (1989). Construction of File Management Systems from Software Components. COMPSAC.
Bigelow, J., and Riley, V. (1987). Manipulating Source Code in Dynamic Design. HyperText ’87 papers. Bigelow, J. (1988). Hypertext and CASE. IEEE Software 21(3), 23-27. Biggerstaff, T. J., and Perlis, A. J., eds (1984). Special Issue on Reusability. IEEE Transactions on Sqftware Engineering, SE-lO(5). BiggerstaK, T. J. (1987). Hypermedia as a Tool to Aid Large-Scale Reuse. MCC Technical report STP-202-87; also published in “Workshop on Software Reuse,” Boulder, Colorado. Biggerstaff, T. J., and Richter, C. (1 987). Reusability Framework, Assessment, and Directions. IREE Software. Biggerstaff,T. J . , and Perlis, A. J., eds (1989). “Software Reusability” (two volumes). AddisonWesleylACM Press. Biggerstaff, T. J. (1989). Design Recovery for Maintenance and Reuse, IEEE Computer. Biggerstaff. T. J., Hoskins, J., and Webster, D. (1989). DESIRE: A System for Design Recovery. MCC Technical Report STP-021-89. Brachman, R. J., and Schmolze, J . G. (1985). An Overview of the KL-ONE Knowledge Representation System. Cognitive Science 9, 171 216. Brooks, F. P. (1989). No Silver Bullet: Essence and Accidents of Software Engineering. IEEE Computer 22(7). Chikofsky, E. J . ed. (1988). Special Issue on Computer Aided Software Engineering. IEEE Software. Chikofsky, E. J., ed. (1989). Computer-Aided Software Engineering. IEEE Computer Society Press Technology Series. Conklin, J. (1987). Hypertext: An Introduction and Survey. IEEE Computer. Cox, B. (1 986). “Object-Oriented Programming: An Evolutionary Approach.” AddisonWesley. Cross, J. H., 11, Chikofsky, J., and May, C. H., Jr. (1992). Reverse Engineering. In “Advances in Computers,” Vol. 35 (Marshall Yovitz, Ed.) Academic Press, Boston. Cusumano, M. A. (1989). The Software Factory: A Historical Interpretation. IEEE Software. Cusumano, M. A. (1991). “Japan’s Software Factories: A Challenge to U.S. Management.” Oxford University Press. Ellis, M. A., and Stroustrup, B. (1990). “The Annotated C + + Reference Manual.” AddisonWesley. Fikes, R., and Kehler, T. (1985). The Role of Frame-Based Representation in Reasoning. Communiccitions of the ACM, 28(9). Finin, T. (1986a). Understanding Frame Languages (Part 1). A1 Expert. Finin. T. (1986b). Understanding Frame Languages (Part 2 ) . A1 Expert. Fisher, A. S . (1988). “CASE: Using Software Development Tools.” Wiley. Freeman, P. ( 1987). Tutorial on Reusable Software Engineering. IEEE Computer Society Tutorial. Goldberg, A., and Robson, D. (1983). “Smalltalk-80: The Language and Its Implementation.” Addison-Wesley. Gregory, W., and Wojtkowski, W. (1990). “Applications Software Programming with FourthGeneration Languages.” Boyd and Fraser Publishing, Boston. Gullichsen, E., D’Souza, D., Lincoln, P., and The, K.-S. (1988). The PlaneTextBook. MCC Technical Report STP-333-86 (republished as STP-206-88). Heller, D. ( I 990). “Xview Programming Manual.” O’Reilly and Associates, Inc. Hinckley, K. (1989). The OSF Windowing System. Dr. Dobbs Journal. Horowitz, E., Kemper, A., and Narasimhan, B. (1985). A Survey of Applications Generators. IEEE Software. Kant, E. (1985). Understanding and Automating Algorithm Design. IEEE Transacfions on Software Engineering SE-11( 1I).
Kim, W., and Lechovsky, F. H. eds. (1989). ‘Object-Oriented Concepts, Databases. and Applications,” Addison-Wesley/ACM Press. Lubars, M. D. (1987). Wide-Spectrum Support for Software Reusability. MCC Technical Report STP-276-87, ( 1987) also published in “Workshop on Software Reuse,” Boulder, Colorado. Lubars, M. D. (1990). The ROSE-2 Strategies for Supporting High-Level Software Design Reuse. MCC Technical Report STP-303-90, (to appear). Also to appear in a slightly modified form in M. Lowry and R. McCartney, eds., “Automating Software Design,” under the title, Software Reuse and Refinement in the IDEA and ROSE Systems. AAAI Press. Lubars, M. D. (1991). Reusing Designs for Rapid Application Development. MCC Technical Report STP-RU-045-91. Martin, J. (1985). “Fourth-Generation Languages: Volume 11. Principles.” Prentice-Hall. Martin, J., and Leben, J. (1986a). “Fourth-Generation Languages-Volume 11. Representative 4GLs.” Prentice-Hall. Martin, J., and Leben, J. (1986b). “Fourth-Generation Languages-Volume 111. 4GLs from IBM.” Prentice-Hall. Matos, V. M., and Jalics, P. J. (1989). An Experimental Analysis of the Performance of Fourth Generation Tools on PCs. Communications qf the ACM 32( 11). Matsumoto, Y . (1989). Some Experiences in Promoting Reusable Software: Presentation in Higher Abstract Levels. In “Software Reusability” (T. J. Biggerstaff and A. Perlis, eds.). Addison-Wesley/ACM Press. Meyer, B. ( 1988). “Object-Oriented Software Construction.” Prentice-Hall. Neighbors, J. M. (1987). The Structure of Large Systems. Unpublished presentation, Irvine, California. Norman, R. J., and Nunamaker, J. F., Jr. (1989). CASE Productivity Perceptions of Software Engineering Professionals. Communications of the ACM 32(9). Nye, A. (1988). “Xlib Programming Manual.” O’Reilly and Associates, Inc. Nye, A,, and O’Reilly, T. (1990). “X Toolkit Intrinsics Programming Manual.” O’Reilly and Associates, Inc. Parker. T., and Powell, J. (May 1989). Tools for Building Interfaces. Computer Language. Pressman, R. S. (1987). “Software Engineering: A Practitioner’s Approach-2nd Ed.” McGraw-Hill. Prieto-Diaz, R. (1 989). Classification of Reusable Modules. In “Software Reusability-Volume I” (T. J. Biggerstatf and A. Perlis, eds.). Addison-Wesley. Rich, C., and Waters, R. (1989). Formalizing Reusable Components in the Programmer’s Apprentice. In “Software Reusability” (T. J. Biggerstaff and A. Perlis, eds.). Addison-Wesley/ACM Press. Rowe, L. A,, and Shoens, K. A. (1983). Programming Language Constructs for Screen Definition. IEEE Transactions on Software Engineering, SE-9( 1). Saunders, J. H. (March/April 1989). A Survey of Object-Oriented Programming Languages. Journal of Ohject-Oriented Programming. Scheifler, R. W., Gettys, J., and Newman, R. (1988). “X Windowing System: C Library and Protocol Reference.” Digital Press. Sclby, R. W. (1989). Quantitative Studies of Software Reuse. In “Software Reusability” (T. J. Biggerstaff and A. Perlis, eds.). Addison-Wesley/ACM Press. Smith, J. B.,and Weiss, S. F. eds. (1988). Special Issue on Hypertext. Communications qf the ACM 31(7). Stroupstrup, B. (1 986). “The C + + Programming Language.” Addison-Wesley. Stroupstrup, B. (May 1988). What is Object-Oriented Programming? IEEE Software 10 20. Sun Microsystems Corporation (1990). “Openwindows Developer’s Guide 1 . 1 User Manual.” Sun Microsystems.
AN ASSESSMENT AND ANALYSIS OF SOFTWARE REUSE
57
Tracz, W. ed. (July 1987). Special Issue on Reusability. IEEE Sofiwnre. Tracz, W. ed. (July 1988). Tutorial on Software Reuse: Emerging Technology. IEEE Computer Society Tutorial. Wartik, S. P., and Penedo, M. H. (March 1986). Fillin: A Reusable Tool for Form-Oriented Software. IEEE SoJiware. Weide, B. W., Ogden, W F., and Zweben, S. H. (1991). Reusable Software Components. In “Advances in Computers” (M. C. Yovits, ed.) Xerox Corporation (1979). “Alto User’s Handbook.” Xerox Palo Alto Research Center, Palo Alto, California. Xerox Corporation (1981 ). “8010 Star Information System Reference Guide.” Dallas, Texas. Young, D. A. (1989). “X Window Systcms Programming and Applications with Xt.” PrenticeHall.
This Page Intentionally Left Blank
Multisensory Computer Vision

N. NANDHAKUMAR*
Department of Electrical Engineering
University of Virginia
Charlottesville, Virginia
J. K. AGGARWAL†
Computer and Vision Research Center
College of Engineering
The University of Texas
Austin, Texas

* Supported in part by the Commonwealth of Virginia's Center for Innovative Technology under contract VCIT INF-91-007, and in part by the National Science Foundation under grant IRI-91109584.
† Supported by the Army Research Office under contract no. DAAL-03-91-G-0050.

1. Introduction . . . 59
2. Approaches to Sensor Fusion . . . 63
   2.1 The Fusion of Multiple Cues from a Single Image . . . 63
   2.2 The Fusion of Information from Multiple Views . . . 68
   2.3 The Fusion of Multiple Imaging Modalities . . . 71
3. Computational Paradigms for Multisensory Vision . . . 86
   3.1 Statistical Approaches to Multisensory Computer Vision . . . 86
   3.2 Variational Methods for Sensor Fusion . . . 90
   3.3 Artificial Intelligence Approaches . . . 91
   3.4 The Phenomenological Approach . . . 94
4. Fusion at Multiple Levels . . . 99
   4.1 Information Fusion at Low Levels of Processing . . . 100
   4.2 The Combination of Features in Multisensory Imagery . . . 102
   4.3 Sensor Fusion During High-Level Interpretation . . . 103
   4.4 A Paradigm for Multisensory Computer Vision . . . 103
5. Conclusions . . . 105
References . . . 107
1. Introduction
Automated analysis of digitized imagery has been an active area of research for almost three decades. Early research in this area evolved from signal processing schemes developed for processing one-dimensional signals. The science of describing and analyzing one-, two-, and three-dimensional
signals quickly became an established area of research. The area grew rapidly and was propelled by new theories and experimental findings in areas as diverse as cybernetics, artificial intelligence, mathematical modelling, human psychophysics, and neuro-physiological investigation. Moreover, the concomitant advances in technology made available increasingly sophisticated imaging sensors and greater computational power, which facilitated the implementation and verification (or refutation) of these new ideas. The development of automated image analysis techniques has also been driven by the urgent need for automating a variety of tasks such as equipment assembly, repair and salvage in hazardous environments, routine and repetitive inspection and monitoring, complex assembly operations that require sensing and interpretation of a scene, guidance and navigation of vehicles and projectiles, analysis of remotely sensed data, and so forth. All of these factors have provided great impetus to research in digital image analysis and have made possible the large and useful collection of knowledge that exists today in this exciting specialization of science and technology. Research in the automated analysis of digitized imagery may be grouped into three broad, loosely defined categories:

• Image processing: The development of digital signal processing techniques to restore, enhance and compress images. Several books have been published on this subject, including the ones by Gonzalez and Wintz (1987), Rosenfeld and Kak (1982), and Jain (1989).

• Pattern recognition: The development of mathematical (typically statistical and structural) models for representing or modelling classes of patterns and optimal algorithms for classifying patterns. The books by Duda and Hart (1973), Fukunaga (1990), and Therrien (1989) contain detailed discussions of important aspects of this approach.

• Computer vision: The development of scene and world models involving a hierarchy of representations, and algorithms for interpreting scenes based on computational models of the functional behavior of biological perceptual systems. The books by Marr (1982), Ballard and Brown (1982), Horn (1986), and Schalkoff (1989) describe important results established in this area of research.
These categories overlap considerably. For example, problems such as image segmentation have been addressed from various perspectives, and such research may be classified into any of the preceding categories, depending on the particular approach that is followed. While the term computer vision has been construed by some to mean the investigation of computational models of only the human visual system, its usage in current literature includes a variety of sensing (perceptual) modes such as active range imaging
and thermal imaging. Moreover, computational models developed for computer vision rely on a variety of formalisms such as computational, differential, or analytic geometry and Markov random field models, among others. In the following discussion, the term computer uision is used with the latter, broader definition in mind. It is well known that the human visual system extracts a greal deal of information from a single gray-level image. This fact motivated researchers to devote much of their attention to analyzing isolated gray-scale images. However, research in computer vision has made it increasingly evident that formulation of the interpretation of a single image (of a general scene) as a computational problem results in an underconstrained task. Several approaches have been investigated to alleviate the ill-posed nature of image interpretation tasks. The extraction of additional information from the image or from other sources, including other images, has been seen as a way of constraining the interpretation. Such approaches may be broadly grouped into the following categories: (1) the extraction and fusion of multiple cues from the same image, e.g., the fusion of multiple shape-from-X methods; (2) the use of multiple views of the scene, e.g., stereo; and more recently (3) the fusion of information from different modalities of sensing, e.g., infrared and laser ranging. Various researchers have referred to each of these approaches as multisensory approaches to computer vision. The order in which the approaches have been listed indicates, approximately, the chronological order in which these methods have been investigated. The order is also indicative of the increasing amount of additional information that can be extracted from the scene and that can be brought to bear on the interpretation task. Past research in computer vision has yielded analytically well-defined algorithms for extracting simple information (e.g., edges, 2-D shape, stereo range, etc.) from images acquired by any one modality of sensing. When multiple sensors, multiple processing modules, or different modalities of imaging are to be combined in a vision system, it is important to address the development of (1) models relating the images of each sensor to scene variables, (2) models relating sensors to each other, and (3) algorithms for extracting and combining the different information in the images. No single framework is suitable for all applications and for any arbitrary suite of sensors. The choice of a computational framework for a multisensory vision system depends on the application task. Several computational paradigms have been employed in different recent multisensory vision systems. The paradigms can be categorized as (1) statistical, (2) variational, (3) artificial intelligence, and (4) phenomenological approaches. Statistical approaches typically involve Bayesian schemes that model multisensory information using multivariate
probability models or as a collection of individual (but mutually constrained) classifiers or estimators. These schemes are appropriate when the domain of application renders probabilistic models to be intuitively natural forms of models of sensor performance and the state of the sensed environment. An alternative, deterministic, approach is based on variational principles wherein a criterion functional is optimized. The criterion functional implicitly models world knowledge and also explicitly includes constraints from multiple sensors. Adoption of this approach results in an iterative, numerical relaxation approach that optimizes the criterion functional. The complexity of the task sometimes precludes simple analytical formulations for scene interpretation tasks. Models relating the images of each sensor to scene variables, models relating sensors to each other, and algorithms for extracting and combining the different information in the images usually embody many variables that are not known prior to their interpretation. This necessitates the use of heuristic and empirical methods for analyzing the images. The development of complex interpretation strategies and knowledge representational mechanisms for using such methods has been intensively researched in the field of artificial intelligence (AI). Many of these ideas can be employed in the design of a multisensory vision system. Recently, research has been directed at using phenomenological models for multisensory vision. The models are based on physical laws, e.g., the conservation of energy. Such models relate each of the sensed signals to the various physical parameters of the imaged object. The objective is to solve for the unknown physical parameters by using the known constraints and signal values. The physical parameters then serve as meaningful features for object classification. This chapter highlights the different ideas mentioned previously that are currently being investigated. The chapter is not meant to be an exhaustive compendium of such work. In keeping with this objective, a comparison and review of some recently reported work is presented while describing briefly some rccent and popular approaches to sensor fusion. Section 2 provides a brief description of specific systems that adopt multiple sensors for vision. The systems described are broadly classified into three groups: (1) those that combine the outputs of multiple processing techniques applied to a single image of the scene, (2) those that combine information extracted from multiple views of the same scene using the same imaging modality, and (3) those that combine different modalities of imaging, different processing techniques, or multiple views of the scene. Section 3 discusses some general computational paradigms used for implementing multisensory scene perception. It also discusses typical applications of each of the approaches. Section 4 discusses issues pertaining to the hierarchical processing of multisensory imagery and levels of sensory information fusion. It presents a paradigm for a
model-based vision system incorporating fusion at multiple levels of processing. The paradigm described in Section 4 is not prescribed as a general paradigm for multisensory vision since a general paradigm does not, as yet, exist for all applications. Finally, Section 5 contains concluding remarks.

2. Approaches to Sensor Fusion

The term multisensor fusion has many connotations as described in the previous section. Approaches to combining multisensory imagery may be grouped into three broadly defined categories: (1) fusion of multiple cues from a single image, (2) integration of information from different views of a single scene, and (3) integration of different imaging modalities. Recent contributions in each of these three areas are discussed in this section.

2.1 Fusion of Multiple Cues from a Single Image

A great deal of past and current research in computer vision has focused on the extraction of information from a single image. Different techniques, such as texture analysis, contour analysis, and shape analysis, among others, were developed and applied separately to an image. These techniques offered specific solutions to artificially constrained problems that could be solved in the laboratory or with synthesized imagery. The complexity of real-world scenes limits the usefulness of these techniques for imagery acquired from real scenes. In general, each of these problem formulations is typically underconstrained, yielding ambiguous results. This motivated researchers to combine the output of several different operations on an image in an attempt to constrain the interpretation. Such efforts have been directed by engineering applications and have also been motivated by results of psychophysical investigations. The latter have shown that various biological perceptual systems combine the outputs of multiple processing modules to produce an interpretation of the scene, e.g., blob, terminator, and crossing detection modules are integrated to perceive texture (Julesz and Bergen, 1987). Presented in the following are examples of recent computer vision systems that follow this approach.

2.1.1 Visible Discontinuity Detection
Discontinuities in the intensity, texture, and orientation of surfaces imaged in a scene provide important information for scene segmentation, object classification, motion computation, etc. The reliable detection of visible discontinuities is, therefore, an important problem. A project that seeks to achieve this goal by combining the output of multiple discontinuity detecting
modules is the MIT Vision Machine (Poggio et al., 1988). Parallel modules compute zero crossings of the Laplacian of Gaussian filtered image, Canny's edge detection scheme, and texture. Other information extracted from stereoscopic analysis, optic flow computation, and color segmentation is also integrated. The approach is based on the argument that, at discontinuities, the coupling between different physical processes and the image data is robust. Hence, discontinuities are argued to be "ideal" for integrating information from different visual cues, and the system is motivated by psychophysical findings that support this position. The approach seeks to refine the initial estimates of discontinuities using information from several cues.

The different discontinuity cues are combined in the MIT Vision Machine using a Markov random field (MRF) model. The MRF model facilitates sensor fusion. Consider a surface f and a sparse observation g of this surface. Let f_i and g_i denote the corresponding values at site i in the image. The prior probabilities P(f) can be shown to be Gibbsian; i.e.,

P(f) = (1/Z) exp[-U(f)/T]     (1)

where Z is a normalizing constant, T is known as the temperature, and U(f) = Σ_i U_i(f) is the sum of contributions from every local neighborhood. Knowing the conditional probability of g given f, the posterior distribution is given by the Bayes theorem as

P(f | g) = (1/Z') exp[-U(f | g)/T]     (2)

where the energy function U(f | g) is given by

U(f | g) = Σ_C U_C(f) + Σ_i γ_i (f_i - g_i)²     (3)

C denotes the cliques defined for the neighborhood of site i that contain site j, and γ_i = 1 at sites where data are available. The problem is to search for the f that maximizes the posterior probabilities for the entire image. One solution strategy involves the application of simulated annealing and stochastic relaxation techniques (Geman and Geman, 1984). The prior energy function can be modified to include other sources of information, such as intensity edge information, texture, orientation, etc. For example, let l_ij be the output of a line detector that has value 1 if a linear edge exists between sites i and j and has value 0 otherwise. The energy function can then be
modified to be

U_C(f) = (f_i - f_j)² (1 - l_ij) + β V_C(l_ij)     (4)

where V_C is an operator that supports specified configurations of line edges. This operator may also be defined to support discontinuities detected from other sources of information, such as texture and orientation. Defining U_C(f) to include information from multiple sources of information is thus a convenient and popular way to exploit MRF models for multisensory vision. The limitations of the approach are many, as listed by the system's proponents (Poggio et al., 1988). Information integration may require goal-directed processing, which the current MRF-based approach does not provide. Also, the probabilistic formulation of MRF is too general and therefore may be too inefficient. Deterministic algorithms, such as regularization techniques, are preferred for this reason. A discussion of the advantages of deterministic approaches over stochastic approaches for visual reconstruction can be found in recent literature (Blake, 1989).
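To make the preceding formulation concrete, the energy of Eqs. (3) and (4) can be evaluated directly on image arrays. The following Python sketch assumes a rectangular grid with first-order neighborhoods, binary line elements stored between vertically and horizontally adjacent sites, and a V_C that simply counts active line elements; these choices are illustrative assumptions and not details of the MIT Vision Machine.

import numpy as np

def total_energy(f, g, gamma, lines_h, lines_v, beta=1.0):
    """Energy U(f|g) of Eq. (3) with the line-process clique term of Eq. (4).

    f       : current estimate of the surface (H x W array)
    g       : sparse observations (H x W array)
    gamma   : 1 where an observation exists, 0 elsewhere (H x W array)
    lines_h : hypothesized discontinuities between vertically adjacent
              sites ((H-1) x W array of 0/1 values)
    lines_v : hypothesized discontinuities between horizontally adjacent
              sites (H x (W-1) array of 0/1 values)
    """
    # Data term: gamma_i * (f_i - g_i)^2
    data = np.sum(gamma * (f - g) ** 2)

    # Smoothness term, switched off across hypothesized discontinuities:
    # (f_i - f_j)^2 * (1 - l_ij) over vertical and horizontal neighbor pairs.
    smooth = np.sum((f[1:, :] - f[:-1, :]) ** 2 * (1 - lines_h))
    smooth += np.sum((f[:, 1:] - f[:, :-1]) ** 2 * (1 - lines_v))

    # Penalty beta * V_C(l) for introducing line elements; here V_C simply
    # counts active line elements (a common simplification).
    penalty = beta * (np.sum(lines_h) + np.sum(lines_v))
    return data + smooth + penalty

Minimizing this energy over f and over the line elements, whether by simulated annealing or by a deterministic scheme, yields the fused reconstruction together with its discontinuities; intensity edges, texture boundaries, or other cues can be used to initialize or bias the line elements.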
2.1.2 Computing Shape

Many researchers address the use of shading information to compute the shape of an imaged surface. This problem is inherently underconstrained since brightness information at any pixel provides only a single constraint while surface orientation constitutes two degrees of freedom, i.e., (p, q) that denote the surface gradients along the x- and y-axes, respectively. Integrating other sources of information to constrain the solution has been an active area of research. One such commonly used piece of information is the assumption of smoothness (continuity) of the surface. This constraint allows the derivation of a method to grow a surface from points of known surface depth and orientation. The growth of the surface occurs along characteristic strips that are given by the solution of a system of five ordinary differential equations (Horn, 1986). This method is sensitive to noise, and it cannot use constraints from boundaries of the strip. An alternative to the characteristic strip expansion method that overcomes these limitations and also allows occluding contour information to be integrated as boundary conditions is based on a variational approach (Ikeuchi and Horn, 1981; Horn and Brooks, 1986). This approach seeks to minimize the deviation from smoothness and also the error in the image-irradiance equation. The stereographic plane is used instead of the gradient space. The conformal stereographic projection of the gradient space is defined as

f = 2p / (1 + √(1 + p² + q²)),    g = 2q / (1 + √(1 + p² + q²))     (5)
Functions f(x, y) and g(x, y) are sought that minimize

∫∫ [ (f_x² + f_y² + g_x² + g_y²) + λ (E(x, y) - R_s(f, g))² ] dx dy     (6)

where λ ≥ 0, E(x, y) is the image brightness, and R_s(f, g) is the reflectance map. The Euler equations for the preceding formulation consist of a pair of partial differential equations, the discrete forms of which specify an iterative relaxation approach for computing f and g. One drawback to the approach is that the resulting surface slopes may not be integrable. If z(x, y) is the surface being solved for, then integrability is defined by

z_xy(x, y) = z_yx(x, y)     (7)

viz., the second partial derivatives are independent of the order of differentiation. Methods for enforcing integrability in the solution of the surface are discussed by Frankot and Chellappa (1988) and Simchony and Chellappa (1990). The variational approach described previously, which uses the method of Lagrange multipliers to solve the constrained minimization problem, is also termed the regularization approach. The main objective of regularization is to transform ill-posed problems into well-posed ones. The variational approach is a convenient computational framework for incorporating multiple constraints and, hence, is an attractive strategy for implementing a multisensory computer vision system.

The integration of the output of multiple texture analysis modules has been investigated by Moerdler and Kender (1987). Two shape-from-texture methods are integrated: (1) shape from uniform texel spacing, and (2) shape from uniform texel size. Their motivation for using multiple shape-from-texture modules is that a single module can be applied only to a very limited range of real images while the combination of different modules allows surface orientation estimation for a wider class of textured surfaces. In shape-from-uniform-texel size, two texels T1 and T2 are detected whose sizes are S1 and S2, respectively. If F1 is the distance from the center of texel T1 to the vanishing point (Fig. 1), then

S1 / S2 = F1 / F2     (8)

where F2 = F1 - D. Since D can be measured from the image, we can solve for F1. In shape-from-uniform-texel spacing, three texels are detected (Fig. 2). The distance between the first texel and the vanishing point is then computed from the measured spacings between the three texels.
FIG. 1. Computing vanishing points using uniformly sized texels.
Each vanishing point circumscribes a great circle on the Gaussian sphere. Vanishing points extracted from different choices of texels and from applying multiple shape-from-texture approaches to the same surface patch contribute multiple great circles, the intersections of which specify two unique surface orientations corresponding to the visible and invisible sides of the surface. The integration of multiple surface orientation estimates from the different approaches is designed to yield a “most likely orientation” for each texel path. An “augmented texel” is used for the integration process. This is a data structure containing a 2-D description of a texel patch and a list of orientation constraints. A hierarchical representation consisting of multiple Gaussian spheres tessellated at different scales of resolution is used to fuse multiple orientation information. A Waltz-type algorithm computes the most likely orientation for each texel patch. Surface segments are then generated from this information. Performance of the system on real data has been reported (Moerdler and Kender, 1987).
FIG. 2. Computing vanishing points using uniformly spaced texels.
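For the uniform-texel-size cue, the relation of Eq. (8) together with F2 = F1 - D fixes the distance from a texel to the vanishing point. The short Python sketch below carries out this computation; the equation is the reconstruction given above, and the assumption that S1 and S2 are measured along the direction toward the vanishing point is made here for concreteness.

def texel_vanishing_distance(S1, S2, D):
    """Distance F1 from the center of texel T1 to the vanishing point,
    from S1/S2 = F1/F2 with F2 = F1 - D (uniformly sized texels).

    S1, S2 : measured texel sizes in the image (S1 is the nearer, larger one)
    D      : measured distance between the two texel centers
    """
    if S1 == S2:
        raise ValueError("equal texel sizes imply the vanishing point "
                         "lies at infinity (fronto-parallel surface)")
    return S1 * D / (S1 - S2)

# Example: the nearer texel appears 20 pixels wide, the farther one 16 pixels,
# and their centers are 50 pixels apart, giving F1 = 250 pixels.
F1 = texel_vanishing_distance(20.0, 16.0, 50.0)

Each vanishing point obtained in this way contributes one great-circle constraint on the Gaussian sphere, as described in the text.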
2.2 The Fusion of Information from Multiple Views

Although the extraction and integration of multiple visual cues from an image does yield more information about the imaged scene, the extra information produces sufficient constraints for unique solutions in only a limited number of situations. This is especially true for the problem of reconstructing the three-dimensional structure of the imaged scene using techniques such as shape-from-shading or shape-from-multiple-texture modules. The problem of 3-D scene reconstruction benefits greatly from the use of multiple views of a scene. The extra information available from additional views is entirely due to the geometrical constraints that arise from the motion of the camera and object. The simplest example of integrating multiple views is stereoscopic depth perception. Figure 3 illustrates the main principle of using two cameras C1 and C2, whose positions, orientations, and focal lengths are calibrated with respect to a fixed coordinate system (perhaps centered on one of the cameras). Consider an object P that projects onto image plane points P1 and P2 in C1 and C2, respectively. Since the cameras are calibrated, the vectors O1P1 and O2P2 are known, and hence, the intersection of these vectors can be computed to determine the 3-D coordinates of point P. The main problem in stereoscopic depth perception is to search for P2 in C2 given P1 in C1 such that both P1 and P2 correspond to projections of the same point P in 3-D space. This problem is termed the correspondence problem. The primary constraint used to solve this problem is that P2 must lie on the epipolar plane containing P1, where the epipolar plane is defined to be the plane
FIG.3. Stereoscopic depth reconstruction.
containing the two centers of projection, O1 and O2, and the point P. The intersection of the epipolar plane containing P1 and the image plane of C2 determines the epipolar line l2 on which P2 may be found. Additional constraints, such as the uniqueness of a match and smoothness of the imaged surface, are required to further constrain the establishment of correspondence. Several techniques have been developed for constraining the correspondence task. Dhond and Aggarwal (1989a) present a review of such techniques. A recently developed approach to facilitate correspondence relies on the use of a third camera C3 to create a trinocular imaging system. The image point P1 now specifies epipolar lines l2 as well as l3 as shown in Fig. 4. A candidate match P2 in C2 specifies another epipolar line l3' in C3. If P2 is a valid match, then a point P3 in C3 that is at (or very near) the intersection of l3 and l3' will have a similar intensity distribution when compared with P1 and P2. This condition signals a valid correspondence. Dhond and Aggarwal (1989b) analyze in detail the contribution of the third camera in aiding the correspondence process. The computation involved in establishing correspondence can be simplified further by rectifying the trinocular images (Ayache and Hansen, 1988; Ayache and Lustman, 1991), which involves applying linear image transformations to produce parallel, horizontal/vertical epipolar lines in the transformed images.
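The geometric core of stereoscopic depth recovery, intersecting the back-projected rays once a correspondence (P1, P2) has been accepted, can be written compactly in terms of calibrated projection matrices. The linear (direct linear transformation) formulation below is a standard sketch under that assumption and is not taken from the systems cited above.

import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of a 3-D point from two calibrated views.

    P1, P2 : 3x4 camera projection matrices (assumed known from calibration)
    x1, x2 : (u, v) image coordinates of the corresponding points
    Returns the 3-D point X such that x1 ~ P1 X and x2 ~ P2 X.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the
    # smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

The same routine applies unchanged to the third camera of a trinocular system, where the extra view serves mainly to verify candidate correspondences before triangulation.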
FIG.4. Trinocular imaging system.
A generalization of the preceding problem is to compute the 3-D scene structure and relative motion given 2-D images from unknown positions. Solutions to these problems rely on geometric and projective constraints that typically yield a system of nonlinear equations. A vast amount of literature is available on these topics and hence they are not discussed here. For example, Aggarwal and Nandhakumar ( 1988) review techniques for estimating 3-D motion from a sequence of 2-D images. It is worth noting that the integration of information in such approaches is truly synergistic. In the case of stereoscopic analysis, for example, the integration of simple cues (such as 2-D coordinates of edges) extracted from each image via identical processing modules yields 3-D information that cannot be otherwise obtained. Research has also been conducted on the integration of information from multiple views as well as from multiple processing modules that analyze these views. For example, Krotkov and Kories ( 1 988) discuss the combination of focus ranging methods and stereo ranging techniques. An agile, servomotor driven camera system is controlled autonomously to orient and focus cameras and to adjust the illumination. The focus ranging and stereo processes cooperate to yield more reliable estimates of the depth of objects from the cameras. The integration of depth estimates from the two processes is based on a statistical framework that seeks to reduce the variance of the final estimate. Another system that integrates multiple cues extracted from an image with information extracted from multiple views is the MIT Vision Machine (Poggio et ul., 1988), mentioned in Section 2.1. The MRF formulation also is used to integrate range data extracted from stereoscopic analysis, as well as optic flow extracted from a sequence of images, with the output from other early vision modules. Aloimonos and Basu (1988) discuss the fusion of stereo, retinal motion, contour, shading, and texture cues for computing 3-D structure and motion information of the scene with minimal assumptions. They explore issues regarding the uniqueness and stability of solutions for different pairwise combinations of these sources of information. Moerdler and Boult (1 988) discuss the fusion of stereo and multiple shapefrom-texture modules for recovering three-dimensional surface information. Their objective for information fusion is to enhance the robustness of surface reconstruction. Information fusion occurs in two stages. The combination of multiple shape-from-texture modules is similar to that described by Moerdler and Kender (1987) and is termed intra-process integration. Moerdler and Kender argue that it is easier to heuristically combine data from similar processes. A regularization-based approach combines the output of this stage with stereo range data to produce smooth object surfaces. This latter process is termed interprocess integration. A blackboard scheme is proposed for interaction between the computational modules and the integration modules.
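A common statistical device for the kind of cooperative ranging described by Krotkov and Kories is to weight each depth estimate by the inverse of its error variance, which minimizes the variance of the combined estimate when the errors are independent and Gaussian. The following sketch illustrates the idea; this particular weighting rule is an assumption made here for illustration and may differ from the cited implementation.

import numpy as np

def fuse_depth_estimates(depths, variances):
    """Combine independent depth estimates by inverse-variance weighting.

    depths    : depth estimates of the same scene point
                (e.g., one from focus ranging, one from stereo)
    variances : corresponding error variances of each estimate
    Returns the fused depth and its (reduced) variance.
    """
    depths = np.asarray(depths, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = np.sum(w * depths) / np.sum(w)
    fused_var = 1.0 / np.sum(w)
    return fused, fused_var

# Example: a focus-ranging estimate and a more precise stereo estimate.
d, v = fuse_depth_estimates([2.10, 2.25], [0.04, 0.01])

The fused variance is always smaller than either input variance, which is the sense in which the cooperation of the two ranging processes yields a more reliable estimate.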
2.3 The Fusion of Multiple Imaging Modalities

It has been observed that the human visual system and other biological perceptual systems combine information from multiple monochrome visual images and from multiple processing modules operating on these images to produce a rich interpretation of the scene (Marr, 1982, Chapter 3). Research in computer vision, however, has shown that emulating this behavior functionally by using artificial means is a very difficult task. The approaches discussed in the previous sections continue to yield ill-conditioned formulations and produce very sparse interpretations. These problems may be lessened by using additional sensory information acquired via disparate sensing modalities that further limit the ambiguities in the interpretation. Such an approach has been motivated by two very different factors: (1) the recent availability of new sensing modalities, e.g., laser range and infrared; and (2) neurobiological findings that establish ways in which disparate sensory information is fused in natural perceptual systems, e.g., infrared and visual image fusion in snakes (Newman and Hartline, 1982) and the fusion of acoustic and visual imagery in barn owls (Gelfand, Pearson, and Spence, 1988). We present the salient features of several different research projects that combine multiple imaging modalities and that are motivated by either or both of these factors. We discuss the approaches used in research projects that are mature and integrate information in a nontrivial manner.
2.3.1 Different Components of Laser Radar Imagery

Chu, Nandhakumar, and Aggarwal (1988, 1990) developed a system that combines information from range, intensity, and velocity components of laser radar (ladar) imagery. The objective of the research is to detect and classify man-made objects in outdoor scenes. Each component of the ladar imagery is processed by different modules, and the resulting segmentation maps are fused to produce a composite segmentation map. The different modules process the image components based on the specific nature of the information contained in each image component. For example, the range image is segmented by using geometric analysis, i.e., by growing planar surfaces in the scene. Also, surface roughness parameters are extracted to help detect whether or not the region corresponds to a man-made object. Intensity imagery is analyzed to yield statistical properties of the speckle noise in the image. Different types of surfaces yield different types of speckle noise. Characterizing speckle helps distinguish between different types of surfaces.
The segmentation map and features extracted by the various modules are fed to an expert system for classification. The KEE expert system shell has been used for developing the rules for classification. The system has been tested on a large set of real multisensory ladar imagery obtained from outdoor scenes. The segmentation results compare favorably with those obtained by manual segmentation. Preliminary attempts at classifying man-made objects show promising results. Other modalities of imaging, such as infrared and millimeter-wave radar, are also being incorporated into the system. The block diagram of the system is shown in Fig. 5.
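The chapter does not spell out the rule used to fuse the per-component segmentation maps. One simple realization, shown in the sketch below, assigns a composite label to each distinct combination of range, intensity, and velocity labels, so that composite regions are the intersections of the per-sensor regions; this is an illustrative assumption rather than the published algorithm.

import numpy as np

def composite_segmentation(label_maps):
    """Fuse per-sensor segmentation maps into a composite map.

    label_maps : list of integer label images of identical shape, one per
                 ladar component (range, intensity, velocity).
    Two pixels receive the same composite label only if they agree in every
    component map, i.e., composite regions are intersections of the
    per-sensor regions.
    """
    stacked = np.stack(label_maps, axis=-1)
    h, w, _ = stacked.shape
    flat = stacked.reshape(h * w, -1)
    # One composite label per unique tuple of per-sensor labels.
    _, composite = np.unique(flat, axis=0, return_inverse=True)
    return composite.reshape(h, w)

The composite regions can then be characterized by the features (planarity, roughness, speckle statistics) computed by the individual modules and passed to the rule-based classifier.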
2.3.2 Structured Lighting and Contour Imagery
Wang and Aggarwal (1987, 1989) describe a system that combines information from both structured lighting and silhouettes (occluding contours) of the imaged object to reconstruct the three-dimensional structure of the object. A parallel projection imaging geometry is assumed. Multiple silhouettes from multiple views are rasterized in the direction parallel to the base plane. Each rasterized line segment is backprojected along the horizontal plane to intersect with backprojected line segments from other views. These intersections define a polygon on a plane parallel to the base plane. The stack of polygons corresponding to different parallel planes define the bounding volume description of the object (Fig. 6). The bounding volume description is then refined by using surface structure information computed from structured lighting. The computation of surface structure from light striping does not require correspondence to be established between the projected and sensed lines. Two orthogonal patterns are projected onto the object. Each pattern is a set of equally spaced stripes marked on a glass plate. Geometrical constraints are used to recover local surface orientations at the intersections of the two sets of mutually orthogonal grid lines. These local orientations are propagated along the lines to determine the global structure. Let the world coordinate axes be chosen such that the x-y plane is the base (horizontal) plane. Let the pan angle and elevation angle of the image plane normal be denoted by θI and ψI, respectively. Similarly, let the normal to the plane containing the grid lines (which are to be projected onto the object) make pan and elevation angles of θg and ψg, respectively. Also, let the orientation of the plane, which is tangential to the object surface at the point of interest, be denoted by (θo, ψo). Let ν1 and ν2 be orientations of the sensed stripes in the image plane reflected off the base plane. Let ρ1 and ρ2 be orientations of the sensed stripes in the image plane reflected off the point of interest on the object surface.
FIG. 5. Integrated analysis of the different components of ladar imagery.
The first step involves computing (θI, ψI) for the imaging configuration. A constraint is obtained for the first stripe pattern and another for the second stripe pattern. Each constraint defines a curve on the Gaussian sphere. Four intersections of these curves provide four possible interpretations. The correct interpretation can be easily discerned by using a distinguishable marking (Wang and Aggarwal, 1987). The second step involves computing (θo, ψo) for each point on the object where the grid lines intersect. Again, for the first stripe pattern

A sin ψo + B sin θo cos ψo = 0     (12)

and for the second stripe pattern

C cos θo cos ψo + D sin θo cos ψo + E sin ψo = 0     (13)

where A, B, C, D, and E are known functions of (ρ1, ρ2), (θI, ψI), and (θg, ψg) (Wang and Aggarwal, 1987). Note that (θI, ψI) and (θg, ψg) are known a priori while (ρ1, ρ2) can be measured in the image plane. Each constraint defines a curve on the Gaussian sphere and intersections of these curves correspond to the solutions. A unique solution is available since the image plane orientation is known and since mirror reflections can be discarded. The orientation of the tangent plane at each stripe junction is propagated along the stripe lines using cubic spline interpolation. This allows the change in depth to be computed along sensed grid lines, thus providing surface structure. Note, however, that this process does not fix the position of the computed partial surface structure at a unique point in space.

Occluding contour information from multiple views is used to position the partial surface structure in space. The partial surface structure computed from each view is used to refine the bounding volume computed from the occluding contours. The surface structure computed from light striping can be slid along the contour generating lines for that view. In order to constrain its position, the contour generating lines of a different view are used, along with additional geometrical
constraints as illustrated in Fig. 7 (Wang and Aggarwal, 1989). Radial lines are drawn from the centroid of the object to intersect the contour. Depending on the type of contour intersected, i.e., partial surface structure(s) or contour generating lines, different surface averaging operations are executed to coalesce the information into a single surface description. Hu and Stockman (1987) describe a more qualitative approach that uses grid coding as well as the intensity image. The stripes projected on the object yield qualitative surface shape information such as planar, convex, concave, etc. Correspondence between projected and sensed stripes is assumed and triangulation is used to compute the depth along the stripes. The intensity image yields boundary information. Boundaries are assumed to be one of five possible types, e.g., extremum, blade, fold. A rule-based system uses physical constraints between adjacent region types and separating contour types to label surface regions as well as the dividing contours.
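Returning to the grid-coding constraints of Eqs. (12) and (13), the two curves they define on the Gaussian sphere can be intersected numerically once the coefficients A through E have been computed from the calibrated geometry and the measured stripe orientations. The sketch below parameterizes one curve by θo and locates sign changes of the other constraint; it assumes A ≠ 0 and is only one possible way of carrying out the intersection, not the authors' procedure.

import numpy as np

def solve_orientation(A, B, C, D, E, n=3600):
    """Numerically intersect the Gaussian-sphere curves of Eqs. (12)-(13).

    A..E are the known coefficients (functions of the measured stripe
    orientations and the calibrated projector/camera geometry).
    Returns candidate (theta_o, psi_o) pairs in radians.
    """
    candidates = []
    thetas = np.linspace(-np.pi, np.pi, n, endpoint=False)
    # Eq. (12) gives psi_o along the curve: tan(psi_o) = -(B/A) sin(theta_o),
    # assuming A is nonzero.
    psis = np.arctan(-(B / A) * np.sin(thetas))
    # Residual of Eq. (13) along that curve; its roots are the intersections.
    res = (C * np.cos(thetas) * np.cos(psis)
           + D * np.sin(thetas) * np.cos(psis)
           + E * np.sin(psis))
    for i in range(n - 1):
        if res[i] == 0.0 or res[i] * res[i + 1] < 0.0:
            candidates.append((float(thetas[i]), float(psis[i])))
    return candidates

Of the candidate orientations returned, the mirror reflections and the interpretations inconsistent with the known image plane orientation are discarded, as described above.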
2.3.3 Color Imagery

Color may also be considered to be multisensory information since irradiation in three different spectral bands is sensed. Baker, Aggarwal, and Hwang (1988, 1989) address the problem of detection and the semantic interpretation of large stationary man-made objects, such as concrete bridges, in monocular color images of nonurban scenes. Their system consists of an expert system in which the higher level interpretation stage is tightly coupled with the lower-level image analysis modules. Initial segmentation feeds cues to the higher level. Hypotheses are generated and the low-level modules are directed in an incremental segmentation that uses color and geometric information to verify the existence of instances of three-dimensional models. The color image of the scene is first converted to a monochrome gray-level image. A Laplacian of Gaussian (LoG) filter is applied to the image and the zero-crossings of the output are detected to form an edge map. Since the edge map yields closed boundaries, each closed region is assigned a distinct label. Straight line segments exceeding a predetermined threshold are detected. Appropriate pairs of parallel lines are then selected to detect rectilinear structures. Strict mathematical parallelism, however, cannot be used since this includes both collinearity, as a degenerate case, and lines with a common orientation in two distinctly separate and unrelated parts of the scene. Also, line pairs that are strictly parallel in 3-D are often not parallel in their 2-D perspective projection. This leads to the notion of perceptually parallel lines, that is, lines accepted as parallel for scene interpretation purposes. The perceptual grouping rule for parallel lines is defined as the following: find all line pairs with a similar orientation and significant overlap that
FIG. 7. Positioning partial surface structure using bounding volume and occluding contour, first from a single view and then using an additional view (Wang and Aggarwal, 1989).
are separated by a perpendicular distance less than half the average length of the two lines. Rectilinear structures are extracted by identifying those subregions of the intensity (gray-scale) image bounded, in part, by parallel line segments. Each pair of perceptually parallel line segments defines a rectangle, called an intrinsic rectangle. There are two categories of intrinsic rectangles, called atomic and nonatomic rectangles. An atomic rectangle is derived from perceptually parallel line segments bounding a single region. A nonatomic rectangle encompasses more than one region in the image, as shown in Figs. 8 and 9. If the intrinsic rectangle contains more than one label, the rectangle covers multiple regions and is rejected. The color of each atomic rectangle is then sampled and used to reject rectangles occurring in natural, i.e., not man-made, portions of the scene. The color representation scheme used in the system is the CIE (1978) recommended L*a*b* color space, which defines a uniform metric space representation of color so that unit perceptual distances can be represented by unit spatial distances. For each atomic rectangle, the average values of luminance L, chroma C, and hue H are computed from the red, green, and blue values as specified by the CIELAB transformation. The material composition of each region is estimated based on color characteristics. Each atomic rectangle
FIG. 8. Outdoor scene containing a concrete bridge (Baker et al., 1989).
is associated with each material and a confidence factor is assigned to that linkage. The confidence factor for a particular association between a rectangle and a material type is obtained from a color confidence function associated with that material type. Each confidence function for each material type is stored as a three-dimensional array indexed by color coordinates. The confidence functions may be considered to define volumes (of constant confidence values) in three-dimensional color space. Confidence factors are returned in the range [0, 1.01 and the DempsterShafter formalism is used for updating belief in classification (Shafer, 1976). This approach allows belief in other material types to reduce the belief that the material type of a region is concrete. Color constancy and fine brightness constancy control are handled within the encoding of the color confidence functions. The confidence functions are determined from training sets under supervised learning. The training phase involves the specification of the 3-D volumes of constant confidence values. The superquadric family of parametric volume representations is chosen for this purpose. The ( L ,H , C) data are obtained from training data consisting of intrinsic rectangles. Superquadric volumes are fit to these data to define the confidence functions. Values of the function are specified by a heuristic rule. Incremental segmentation is then performed. First, all obviously joinable regions are merged. Joinable regions are those that have central axes that are approximately parallel and that also have overlapping (artificially created) line segments on the nearer ends. The initial hypothesis generation is data driven from the material list in the segmenter graph. The interpreter attempts to instantiate as many instances of each bridge model as there are vertically oriented rectilinear concrete surfaces in the graph. The interpreter infers missing pieces in a complete model that has been instantiated. The missing piece is first instantiated, thus forcing a local (incremental) resegmentation of the scene and the creation of a new region. Verification of missing pieces is based on color information. During the verification process, the color confidence function is weakened to be able to accept a larger region of the color space as acceptable color values for the hypothesis being verified. Belief in the overall model is adjusted based on this additional information. Belief could be withdrawn if the model is later found to be inconsistent. The interpreter uses various constraints, including geometrical relationships between structural aggregates as well as the presence of shadows and the spatial relationships between the structural aggregates and shadows. A truth maintenance mechanism implemented within KEE retracts portions of the belief network that depend on assertions no longer believed. The interpreter cycles though the hypothesize and verify cycles until a complete model acquires a high measure of belief. Having detected a concrete bridge in the scene, the system then explores other structural aggregates in the image that
FIG. 10. Atomic rectangles corresponding to the concrete material type.
have not been associated with the verified model. Figure 10 shows the atomic rectangles, with color indicating the concrete material type. Figure 11 shows the results of the interpretation after the incremental segmentation and verification. Joinable regions have been appropriately joined and verified based on the instantiated model of a bridge. In Fig. 1 1 , the interpreter has detected two bridges, the first partially occluding a second. Three structural aggregates on the extreme right were not joinable to the other bridge structures because of the occluding telephone pole in front of the bridges. Levine and Nazif (1985a, 1985b) describe a rule-based image segmentation technique that processes color imagery. Their approach consists of first partitioning the image into a set of regions to form a region map. Edges are also extracted to form a map of lines. The regions are then repeatedly split and merged; and lines are repeatedly added, deleted, and joined. An important aspect of their system is a focus of attention mechanism. This mechanism identifies “interesting phenomena” in the image, e.g., a group of large adjacent regions that are highly uniform, highly textured, etc. The focus of attention mechanism chooses the order in which data are selected on which rules are to be applied. The system thus incorporates a feedback mechanism in which the data specify the rules to be applied and the order in which the rules are to be applied.
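The color representation used by Baker et al. can be reproduced with standard colorimetric formulas: RGB values are mapped to CIE L*a*b* and then to the luminance, chroma, and hue coordinates that index the confidence functions. The sketch below assumes sRGB primaries, a D65 white point, and a simple array-indexed confidence table; the chapter does not state these particular details, so they should be read as illustrative assumptions.

import numpy as np

def rgb_to_lch(rgb, white=(95.047, 100.0, 108.883)):
    """Convert an RGB triple (values in [0, 1]) to (L, C, H) via CIE L*a*b*.
    The sRGB primaries and D65 white point are assumptions made here."""
    rgb = np.asarray(rgb, dtype=float)
    rgb = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = 100.0 * (m @ rgb)
    t = xyz / np.asarray(white)
    f = np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return L, np.hypot(a, b), np.degrees(np.arctan2(b, a)) % 360.0

def color_confidence(lch, table, bins):
    """Look up a material confidence value from a 3-D table indexed by
    quantized (L, C, H) coordinates, mirroring the array-indexed confidence
    functions described above (the bin edges are an assumption)."""
    idx = tuple(np.clip(np.digitize(v, b) - 1, 0, len(b) - 2)
                for v, b in zip(lch, bins))
    return table[idx]

The confidence value returned for each atomic rectangle can then be treated as evidence for or against a material hypothesis in the belief-updating scheme described above.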
FIG. 11. Final interpretation shows that two instantiated models have been verified. One bridge partially occludes another, which is behind the former. Joinable aggregates are joined by appropriate or verified structures. Several concrete material structures at the extreme right remain separated from either instantiated model.
Klinker, Shafer, and Kanade (1988) discuss the segmentation of objects using physical models of color image generation. Their model consists of a dichromatic reflection model that is a linear combination of surface reflection (highlights) and reflection from the surface body. The combined spectral distribution of matte and highlight points forms a skewed T-shaped cluster (in red-green-blue space) where the matte points lie along one limb of the T and the highlight points lie along the other limb. Principal component analysis of color distributions in small nonoverlapping windows provides initial hypotheses of the reflection type. Adjacent windows are merged if the color clusters have similar orientations. These form “linear hypotheses.” Next, skewed T-shaped clusters are detected. This specifies the dichromatic model used to locally resegment the color image via a recursive region merging process. Thus a combination of bottom-up and top-down processing segments images into regions corresponding to objects of different color. More recently, Healey (1991) reports on a color segmentation approach that uses a reflection model that includes metallic as well as dichromatic surfaces. The
segmentation algorithm considers the color information at each pixel to form a Gaussian random vector with three variables. Segmentation is achieved by a recursive subdivision of the image and by the analysis of resulting region-level statistics of the random vector. Jordan and Bovik (1988) developed an algorithm that uses color information to aid the correspondence process in stereo vision algorithms. Their work is motivated by psychophysical findings that indicate the secondary role of color information in human stereo vision.

2.3.4 Infrared and Visual Imagery
Nandhakumar and Aggarwal (1987, 1988a-c) present a technique for automated image analysis in which information from thermal and visual imagery is fused for classifying objects in outdoor scenes. A computational model is developed that allows the derivation of a map of heat sinks and sources in the imaged scene based on estimates of surface heat fluxes. Information integration is implemented at the different levels of abstraction in the interpretation hierarchy, i.e., at the pixel and the symbolic levels. Pixel-level information fusion yields a feature based on the lumped thermal capacitance of the objects, which quantifies the surface's ability to sink/source heat radiation. Region-level fusion employs aggregate region features in a decision tree classifier to categorize imaged objects as either vegetation, building, pavement, or vehicle. Real data are used to demonstrate the approach's usefulness. The approach classifies objects based on differences in internal thermal properties and is tolerant to changes in scene conditions, occlusion, surface coatings, etc. The approach is suitable for applications such as autonomous vehicle navigation, surveillance, etc. The multisensory vision system Nandhakumar and Aggarwal (1987, 1988a-c) describe is largely a data-driven system. Oh, Nandhakumar, and Aggarwal (1989) and Karthik, Nandhakumar, and Aggarwal (1991) develop a unified modeling scheme that allows the synthesis of different types of images. In particular, they describe the generation of thermal and visual imagery as well as the prediction of classifier features used by the multisensory vision system of Nandhakumar and Aggarwal (1987, 1988a-c) for object recognition. The development of specific strategies for using the developed unified models for model-based multisensory vision is under investigation.

2.3.5 Range and Intensity Imagery
The integration of registered laser range and intensity imagery has been intensively researched. Gil et al. (1983, 1986) explore the extraction of edge
information by combining edges separately extracted from range and intensity edges. A more complete edge description of the scene is obtained by merging edges extracted from the two types of images. The combination of intensity edge information and 3-D information from range imagery is used to recognize objects (Magee and Aggarwal, 1985; Magee ef al., 1985). Lines and curves are extracted from the intensity edge imagery. Range information corresponding to these features is used to specify their positions in 3-D space. A graph-matching approach is used to recognize objects where the nodes of the graph correspond to features and edges correspond to geometric relationships. The intensity guided range-sensing approach is also extended for computing the motion of imaged objects (Aggarwal and Magee, 1986).
2.3.6 Range, Visual, and Odometry

Research in autonomous navigation at CMU has focused on the use of laser range sensors, color cameras, inertial navigation systems, and odometry for interpreting scenes, finding roads, and following roads (Stentz and Goto, 1987; Kanade, 1988). A software system called CODGER integrates the tasks of perception, planning, and control functions. The system implements three types of sensory functions: (1) competitive fusion occurs when sensors are of the same modality, e.g., vehicle position; (2) complementary fusion occurs when sensors are of different modality, e.g., stairs are identified by using color and range maps; (3) sensors are used independently, e.g., landmark recognition by using only the color camera.
2.3.7 Radar and Optical Sensors

Shaw, de Figueiredo, and Kumar (1988) discuss the integration of visual images and low-resolution microwave radar scattering cross-sections to reconstruct the three-dimensional shapes of objects for space robotic applications. Their objective is to "combine the interpreted output of these sensors into a consistent world-view that is in some way better than its component interpretations." The visual image yields contours and a partial surface-shape description for the viewed object. The radar system provides an estimate of the range and a set of polarized radar scattering cross sections, which is a vector of four components. An "intelligent decision module" uses the information derived from the visual image to find a standard geometrical shape for the imaged object. If this is possible, then a closed form expression is used to predict the radar cross section. Otherwise, an electromagnetic model uses the sparse surface description to compute the radar cross section
by using a finite approximation technique. The unknown shape characteristics of the surface are then solved for iteratively, based on minimizing the difference between the predicted and sensed radar cross section. This technique is illustrated by a simulation reported by Shaw et al. (1988).

2.3.8 Sonar and Stereo Range Sensors
Mathies and Elfes (1988) discuss the integration of sonar range measurements and stereo range data for mobile robot applications. Occupancy grids are used for each ranging modality to represent the sensed information. The 2-D plane containing the sensors is tessellated into cells and each cell can have one of two states: occupied or empty. Sensor data update the probabilities of the states from multiple views of the scene. The probability updates are based on a Bayesian scheme where the prior probabilities of a sensor reading given the state of a cell are obtained from a probabilistic sensor model. The probabilistic model for the sonar sensor is defined by the beam pattern. The behavior of range error for a given disparity error defines the probabilistic model for the stereo range sensor. The integration of the two occupancy grids is based on the same Bayesian update scheme used for the individual occupancy grids. Experimental results illustrate the performance of this method using real data (Mathies and Elfes, 1988).
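The Bayesian update that Mathies and Elfes apply to each occupancy grid, and again when the two grids are combined, has a compact form. The sketch below uses placeholder sensor likelihoods; the calibrated sonar beam model and the stereo disparity-error model of the original system are not reproduced here.

import numpy as np

def fuse_occupancy(prior, p_reading_given_occ, p_reading_given_emp):
    """Bayesian update of a 2-D occupancy grid.

    prior               : grid of P(cell occupied) before the reading
    p_reading_given_occ : P(reading | cell occupied) from the sensor model
    p_reading_given_emp : P(reading | cell empty) from the sensor model
    Returns the posterior occupancy probability for every cell.
    """
    num = p_reading_given_occ * prior
    den = num + p_reading_given_emp * (1.0 - prior)
    return num / den

# The same rule fuses the sonar and stereo grids: the posterior produced by
# one sensor's update serves as the prior for the other sensor's update.
grid = np.full((50, 50), 0.5)            # uninformative prior
grid = fuse_occupancy(grid, 0.7, 0.3)    # sonar update (scalar models broadcast)
grid = fuse_occupancy(grid, 0.6, 0.4)    # stereo update

In practice, the likelihood grids vary from cell to cell according to the sensor geometry, which is precisely what the probabilistic sensor models described above provide.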
2.3.9 Multispectral Imagery

Bhanu and Symosek (1987) describe a knowledge-based system for interpreting multispectral images. The system uses 5 spectral channels of a 12 channel scanner. The channels are chosen based on a priori knowledge of their ability to discriminate between classes of objects, such as sky, forest, field, and road. Each of the five spectral images is processed by a texture boundary detector. The outputs are combined to form a single gradient image. Edge segments are detected by labeling local maxima of the gradient image. These edge segments are then grown to form closed contours. Statistics (mean and standard deviation) of each channel are computed for each region. Features based on region location and adjacency are computed. During interpretation, spectral and local features are used to first detect the sky. Then the remaining regions are analyzed using a pseudo-Bayesian approach based on relational, spectral, and location features. It is evident from the preceding that researchers are investigating a variety of sensing modalities and a variety of strategies for integrating multiple sensors. In the next section we describe general classes of techniques used to integrate multisensory information.
3. Computational Paradigms for Multisensory Vision

The previous section discussed specific systems, each of which incorporates a specific suite of sensors and attempts a particular vision task. We discussed ways in which multisensory information is fused in each system. This section discusses a more general issue, i.e., computational frameworks, each of which is suitable for a variety of multisensory vision tasks. The development of a single framework general enough to be applicable to different suites of sensors and to different vision applications has been considered in the past. However, the realization of this goal has yet to be achieved. Several specific approaches have been adopted for designing multisensory vision systems. The popular computational approaches may be categorized into the following broadly defined classes: (1) statistical integration, (2) variational approaches, (3) artificial intelligence (AI) techniques, and (4) phenomenological approaches. The basic principles in each of these approaches are presented.

3.1 Statistical Approaches to Multisensory Computer Vision
Several distinct statistical approaches have been explored for multisensory computer vision. The most straightforward approach utilizes Bayesian decision theory based on multivariate statistical models. Such techniques are especially widespread in the analysis of multispectral remote-sensing data. This approach typically consists of first forming a feature vector wherein each variable corresponds to the signal value (e.g., pixel gray level) from each sensor. This feature vector is then classified by a statistical decision rule. Other features, such as the mean intensity level in a neighborhood, contrast, second- and higher-order moments, entropy measures, etc., which are computed for each sensor, have also been used as elements of the feature vector; e.g., see Lee, Chin, and Martin (1985). In some techniques, linear or nonlinear combinations of signal values from different sensors form a feature, several of which are then fed to a classifier, e.g., Rosenthal, Blanchard, and Blanchard (1985). Other extensions to the standard statistical approach are reported, e.g., Di Zenzo et al. (1987) report a fuzzy relaxation labeling approach for image interpretation wherein a Gaussian maximum likelihood classifier provides initial probability estimates to the relaxation process. Different optimal classification rules have been developed for interpreting multisource data for each of a variety of statistical models assumed for the data. For example, consider s_i(x, y) to be the signal (feature) from the ith sensor at image location (x, y), and the feature vector S(x, y) to be defined as (s_1(x, y), ..., s_N(x, y))^T, where the number of sensors (features) is N.
Let P_k be the prototypical feature vector for class k. A simple classifier based on the minimum-distance rule will choose class c for pixel (x, y) if

$$[S(x, y) - P_c]^2 \le [S(x, y) - P_k]^2, \qquad \forall k \ne c. \tag{14}$$
It is well known that the preceding classifier is optimal (maximizes the likelihood ratio) when S(x, y) are Gaussian random vectors, s_i(x, y) are independent and identically distributed, the class covariance matrices are equal, and the cost associated with each possible misclassification is equal. It is possible to derive optimal classifiers for other choices of statistical models. Classifiers derived in such a manner, however, do not address the problem of choosing sufficiently discriminatory features from the infinite number of available features. Such approaches therefore suffer from the disadvantage that the global optimality of the feature set is impossible to guarantee. Also, the training of such classifiers is difficult since very large training data sets are warranted for achieving a reasonable error rate. It is also not clear what physical properties of the imaged objects are being utilized by the classifier during the discrimination process.
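A minimal sketch of the decision rule of Eq. (14) may make the idea concrete. The class prototypes and the feature values below are invented for illustration; in practice the prototypes would be estimated from training data.

```python
import numpy as np

# Prototype feature vectors P_k for three hypothetical classes,
# with N = 2 features (e.g., one signal value per sensor).
prototypes = {
    "vegetation": np.array([0.2, 0.7]),
    "road":       np.array([0.5, 0.1]),
    "water":      np.array([0.9, 0.4]),
}

def classify(feature_vector):
    """Minimum-distance rule of Eq. (14): choose the class c whose
    prototype P_c minimizes ||S(x, y) - P_k||^2 over all k."""
    return min(prototypes,
               key=lambda k: np.sum((feature_vector - prototypes[k]) ** 2))

# Feature vector S(x, y) for one pixel, one entry per sensor.
print(classify(np.array([0.45, 0.15])))   # -> "road"
```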
3.1.1 Markov Random Field Models
MRF models provide a convenient framework for making local decisions in the context of those made in a local neighborhood. Appropriate forms of the prior probability density functions also allow the integration of different sources of information in making such contextual decisions. Consider the classification problem of assigning a label/state l(x, y) to a pixel at location (x, y). Let L denote the state assignment to the entire image. Let Y denote a specific set of multisensory data. The problem is to find the L that maximizes the posterior probability P(L | Y). Applying the Bayes theorem, the problem is equivalent to maximizing P(Y | L)P(L). The MRF assumption states that P[l(x, y) | L'(x, y)] = P[l(x, y) | L̄(x, y)], where L'(x, y) is the set L minus the element l(x, y), and L̄(x, y) is the state assignment in a local neighborhood defined at location (x, y). This assumption renders the prior joint probability density function P(L) to be of the Gibbs form; i.e.,

$$P(L) = \frac{1}{Z}\, e^{-U(L)/T} \tag{15}$$

where Z is a normalizing constant, T is known as the temperature, and U(L) is known as the energy function,

$$U(L) = \sum_{V_c} F_c(W_c). \tag{16}$$
F_c(W_c) is a function of the states of the pixels in clique V_c. The image model is a two-dimensional analog of a one-dimensional hidden Markov model. While optimal solutions can easily be computed for the latter, searching for the optimal solution of the two-dimensional problem is computationally prohibitive (Geman and Geman, 1984; Therrien, 1989). Hence, suboptimal solutions that yield good results are typically used. One solution strategy involves the application of simulated annealing and stochastic relaxation techniques (Geman and Geman, 1984). An important feature of the MRF model that makes it suitable for multisensory computer vision is that the prior energy function U(L) can be modified to include other sources of information. For example, one of the potential functions constituting the prior energy function may be defined as

$$F_c(x, y) = (l_{x,y} - l_{k,l})^2 - \sum_{i=1}^{N} \beta_i\, V_{s_i}[\,l(x, y),\, z_{s_i}(x, y)\,] \tag{17}$$

where the operator V_{s_i} measures the support provided by the data z_{s_i}(x, y) from sensor s_i to the state/label l(x, y). The MIT Vision Machine implements a specific instance of this approach for integrating image discontinuities detected by different processing modules (Poggio et al., 1988).
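A toy sketch of the suboptimal search strategy mentioned above: a Metropolis-style simulated annealing loop over a one-dimensional label field whose energy combines a smoothness clique potential with weighted sensor-support terms in the spirit of Eq. (17). The energy form, weights, and data are invented and are not taken from any of the cited systems.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(labels, sensor_support, beta):
    """Toy MRF energy: a smoothness clique potential penalizing label
    changes between neighbours, minus weighted sensor support for the
    chosen label at each site."""
    smooth = np.sum(labels[:-1] != labels[1:])
    support = sum(sensor_support[s][np.arange(labels.size), labels].sum()
                  for s in sensor_support)
    return smooth - beta * support

# Two hypothetical sensors, each giving a support score for labels {0, 1}
# at 6 sites (rows = sites, columns = labels).
sensor_support = {
    "thermal": rng.random((6, 2)),
    "visual":  rng.random((6, 2)),
}

labels = rng.integers(0, 2, size=6)
T = 2.0                                   # annealing temperature
for _ in range(200):
    site = rng.integers(6)
    proposal = labels.copy()
    proposal[site] = 1 - proposal[site]   # flip one label
    dE = energy(proposal, sensor_support, beta=1.0) - \
         energy(labels, sensor_support, beta=1.0)
    # Metropolis acceptance: always accept downhill moves, sometimes uphill.
    if dE < 0 or rng.random() < np.exp(-dE / T):
        labels = proposal
    T *= 0.98                             # cool the temperature
print(labels)
```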
3.1.2 Multi-Bayesian Techniques
When a suite of sensors is used to collect and merge partial and uncertain measurements of the environment into a single consistent description, the sensors may be considered as a team that makes a joint decision by using complementary, competitive, and cooperative information (Durrant-Whyte, 1987, 1988). Having chosen appropriate probabilistic models for the sensors and the state of the environment, the interpretations from multiple sensors can be merged by using Bayes decision theory. First, consider the case of multiple sensors sensing the geometric structure of the environment. If the environment contains known objects, then a network can be used as the model of the environment wherein nodes are geometric features (lines, surfaces, etc.) and sensor coordinate frames, and edges are (uncertain) geometric relations between nodes. Thus, the parameter vector p (e.g., the intercepts of straight lines) of the features/nodes is considered to be uncertain. Consider a set of observations ẑ = {ẑ_1, ..., ẑ_n} of the environment, where p and ẑ_i are Gaussian random vectors; p ~ N(p̂, Λ_p); ẑ_i ~ N(p, Λ_i); ẑ_i = p + v_i; and v_i is zero-mean Gaussian noise. The posterior probability distribution f(p | ẑ_1, ..., ẑ_n) is jointly Gaussian with mean

$$\bar{p} = \Big(\Lambda_p^{-1} + \sum_{i=1}^{n} \Lambda_i^{-1}\Big)^{-1} \Big(\Lambda_p^{-1}\hat{p} + \sum_{i=1}^{n} \Lambda_i^{-1}\hat{z}_i\Big) \tag{18}$$

and covariance matrix

$$\Lambda = \Big(\Lambda_p^{-1} + \sum_{i=1}^{n} \Lambda_i^{-1}\Big)^{-1}. \tag{19}$$
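Equations (18) and (19) amount to information-weighted averaging of the prior and the sensor observations, which the following sketch illustrates. The parameter vector, covariances, and observations are hypothetical.

```python
import numpy as np

def fuse_gaussian(prior_mean, prior_cov, observations, obs_covs):
    """Combine a Gaussian prior on the parameter vector p with independent
    Gaussian observations z_i: the posterior information (inverse
    covariance) is the sum of the individual informations, and the mean
    is the information-weighted average."""
    info = np.linalg.inv(prior_cov)
    weighted = info @ prior_mean
    for z, cov in zip(observations, obs_covs):
        info += np.linalg.inv(cov)
        weighted += np.linalg.inv(cov) @ z
    post_cov = np.linalg.inv(info)
    return post_cov @ weighted, post_cov

# Two sensors observing the 2-D parameter vector of a line (hypothetical).
prior_mean = np.array([0.0, 0.0])
prior_cov  = np.eye(2) * 10.0            # weak prior
z1, cov1 = np.array([1.0, 2.1]), np.diag([0.2, 0.5])
z2, cov2 = np.array([1.2, 1.9]), np.diag([0.4, 0.1])

mean, cov = fuse_gaussian(prior_mean, prior_cov, [z1, z2], [cov1, cov2])
print(mean, np.diag(cov))
```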
When the observations are not all Gaussian, a clustering and filtering operation can be used to reject the outlying measurements and arrive at a consensus estimate of the parameter vector (Durrant-Whyte, 1988). Given the network world model that expresses geometric constraints, fusing a new set of sensor observations into the network requires uncertainties to be updated throughout the network. Durrant-Whyte (1988) describes a rational update policy that maintains Bayesianity and geometric consistency. Pearl (1987) also describes methods for propagating and updating belief in specific classes of distributed Bayesian networks. When the environment is unknown, the multisensor system can be considered a team of multi-Bayesian observers. Consider two sensors making observations z_1 and z_2 of two disparate geometric features p_1 and p_2. If a geometric relationship exists between the two features, then the local estimates δ_1(z_1) and δ_2(z_2) made by the sensors constrain each other. A utility function u_i[·, δ_i(z_i)] is required to compare local decisions. An appropriate choice of the individual utility function is the posterior likelihood of the feature given the local estimate, where p_i is a single feature being estimated. The team utility function may be chosen to be the joint posterior likelihood. The team utility function may have either a unique mode or be bimodal. The former convexity property indicates that the sensors agree with the team consensus; the latter condition indicates that they disagree. Consider the transformation of the scene geometry p to individual features p_i by the transformation p_i = h_i(p). Denote the hypothesis of the scene geometry generated from a single feature p_i as p = h_i^{-1}(p_i). The inverse transformation is, in general, indeterminate. If each sensor makes individual estimates δ_i(z_i) of possible features, the sensor fusion task is to find p such that the joint
posterior density given by

$$F\{\,p \mid h_1^{-1}[\delta_1(z_1)], \ldots, h_n^{-1}[\delta_n(z_n)]\,\} = \prod_{i=1}^{n} L\{\,p \mid h_i^{-1}[\delta_i(z_i)]\,\} \tag{22}$$

is convex. Durrant-Whyte (1988) describes a recursive algorithm that implements a pair-wise convexity analysis to cluster agreeing hypotheses into different groups.

3.2 Variational Methods for Sensor Fusion
The integrated analysis of multiple sensors can sometimes be formulated as an optimization problem subject to multiple constraints. For example, depth information may be provided by multiple sensing techniques and the problem is to fit a surface while minimizing the deviation from smoothness. Analysis techniques available in the calculus of variations are typically applied to such problems. The method of Lagrange multipliers is used to integrate the constraints from the multiple sensors to form a new functional to be optimized (extremized). Consider the problem of solving for functions f_i(x), i = 1, ..., n, which have specified values at the boundaries x = x_1 and x = x_2. Given a criterion to be satisfied, e.g., smoothness, the approach consists of formulating an error functional to be minimized, of the form

$$e = \int_{x_1}^{x_2} F(x, f_1, \ldots, f_n, f_1', \ldots, f_n')\, dx. \tag{23}$$

The minimization is subject to the constraints

$$u_i(x, f_1, \ldots, f_n) = 0, \qquad i = 1, 2, \ldots, m. \tag{24}$$

For example, if multiple range-sensing methods yield multiple estimates of depth z_k(x) and if f(x) is the required surface, then an appropriate form for u_k(x, f) is u_k(x, f) = [f(x) - z_k(x)]^2. Using the method of Lagrange multipliers, a new error functional of the form

$$e^* = \int_{x_1}^{x_2} F^*(x, f_1, \ldots, f_n, f_1', \ldots, f_n')\, dx \tag{25}$$

is minimized, where

$$F^* = F + \sum_{i=1}^{m} \lambda_i(x)\, u_i \tag{26}$$
and λ_i(x) are known as the Lagrange multipliers. Applying the variational principle, it can be shown that Eq. (25) is minimized by the solution to the following Euler equations (Courant and Hilbert, 1953):

$$\frac{\partial F^*}{\partial f_i} - \frac{d}{dx}\left(\frac{\partial F^*}{\partial f_i'}\right) = 0, \qquad i = 1, \ldots, n. \tag{27}$$

Discrete approximations of the Euler equations specify an iterative numerical solution for the unknown functions f_i(x). A very simple error functional is presented in the preceding for the sake of illustration. More useful formulations comprise multiple independent variables (multiple integrals), F expressed as a function of second- and higher-order derivatives of f_i, and constraints that may be expressed in integral forms. The two-dimensional formulation is commonly used for combining multiple constraints and multiple sources of information in tasks such as surface extraction (Ikeuchi and Horn, 1981; Moerdler and Boult, 1988) and motion computation (Aggarwal and Nandhakumar, 1988). Euler equations are unavailable for all general forms of the error functional, and in general, they have to be derived for specific cases by using the variational principles. Note that the variational approach is a deterministic approach. One advantage of this approach is that it does not require knowledge of prior probability models, as in the case of statistical approaches. However, a priori information is required in the variational approach and is implicit in the form of the specific error functional chosen, e.g., C¹ smoothness of the surface.
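A minimal numerical sketch of the idea behind Eqs. (23)-(27): multiple noisy depth profiles are fused into one smooth estimate by iterating a discretized update. For simplicity the data constraints are treated as weighted penalties rather than exact Lagrange-multiplier constraints, and the sensors, noise levels, and weights are invented.

```python
import numpy as np

def fit_surface(depth_estimates, weights, smoothness=1.0, iters=500):
    """Iteratively solve a discretized smoothness-plus-data functional:
    each update moves f(x) toward a blend of its neighbours' average and
    the weighted average of the sensor depth estimates (the discrete
    analogue of an Euler equation update)."""
    z = np.asarray(depth_estimates, dtype=float)      # shape (K sensors, M samples)
    w = np.asarray(weights, dtype=float)[:, None]     # one weight per sensor
    data = (w * z).sum(axis=0) / w.sum()              # weighted data term
    f = z.mean(axis=0)                                # initial surface estimate
    for _ in range(iters):
        neighbours = np.empty_like(f)
        neighbours[1:-1] = 0.5 * (f[:-2] + f[2:])
        neighbours[0], neighbours[-1] = f[1], f[-2]
        f = (smoothness * neighbours + w.sum() * data) / (smoothness + w.sum())
    return f

# Two hypothetical range sensors observing the same 1-D profile with noise.
x = np.linspace(0, 1, 50)
truth = np.sin(2 * np.pi * x)
z_sonar  = truth + np.random.default_rng(1).normal(0, 0.20, x.size)
z_stereo = truth + np.random.default_rng(2).normal(0, 0.05, x.size)

f = fit_surface([z_sonar, z_stereo], weights=[1.0, 4.0])
print(np.round(f[:5], 3))
```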
3.3 Artificial Intelligence Approaches

The complexity of the task sometimes precludes simple analytical formulations for scene interpretation. Models relating the images of each sensor to scene variables, models relating sensors to each other, and algorithms for extracting and combining the different information in the images usually embody many variables unknown prior to interpretation. This necessitates the use of heuristic and empirical methods for analyzing the images. Typically, the appropriate choices of techniques for processing the imagery are also not known a priori. Hence, the strategy for interpreting the images tends to be very complex. The nature of the task demands the use of iterative techniques that search for interpretations consistent with known analytical models as well as common-sense knowledge of the scene. These strategies are typically implemented as hypothesize-and-verify cycles of processing. A combination of data-driven and goal-driven processing is therefore required. Another complication involved in interpreting multisensory imagery is that
the different kinds of information extracted from each of the sensors, and the information derived by combining them, are best represented and stored using different schemes. Maintaining these representations, as well as the explicit relationship between them, is difficult. The issues raised earlier, including those of complex strategies, knowledge representation, and the application of heuristic and empirical techniques, have been intensively researched in the field of artificial intelligence (AI). Such research has focused on general theories regarding these issues as well as on solutions to specific problems in which such issues are addressed (Nandhakumar and Aggarwal, 1985). The recent progress in artificial intelligence research has made available many useful computational tools for sensor fusion. The development of large data bases of heuristic rules and complex control strategies for combining multisensory data has been explored. Research issues focus on (1) developing new representational schemes for modeling the world and sensed information in a common framework that supports reasoning and decision making, and (2) developing new interpretation strategies for specific sensor suites and applications. Typically, a specific set of sensing modalities is chosen for an application. Useful features are identified and algorithms for evaluating them are implemented. Rules are then used for examining the collection of these features to arrive at a consistent interpretation. An important aspect of the interpretation strategy is to decide on which area of the scene or subset of features to focus at some intermediate stage of processing, viz., a focus-of-attention mechanism. The given task, the choice of the features, and the interpretation strategy are usually instrumental in suggesting an appropriate world representation. No single AI framework has been shown to be optimal for a general collection of sensors and for all tasks. Hence, we present a survey of multisensory vision systems that are representative of different approaches to different specific tasks. A rule-based system that combines information from ladar range, ladar intensity, ladar doppler, millimeter-wave radar, and passive infrared imagery for detecting and classifying man-made objects in outdoor scenes is being developed using KEE, a commercially available expert system shell (Chu et al., 1988, 1990). Frames are used in a hierarchical organization to represent individual regions and scene objects that are collections of regions (see Fig. 12). Slots in the frames correspond to region parameters and the attributes of objects. Rules are applied to the segmented input images to evaluate slot values for low-level frames. Rules are then applied to these frames to form groupings of frames corresponding to objects in the scene. In addition to this forward-chaining approach, it is also possible to implement different control strategies, such as backward chaining and truth maintenance. The KEE expert system shell has also been used for implementing a system
FIG. 12. Representation of regions and objects using frames in KEE.
that identifies structures in color images of outdoor scenes (Baker et al., 1988, 1989). Low-level processing yields cues that instantiate models. Model-driven processing refines the partial segmentation and extracts geometric and color features in the image to verify the instantiated model. A representation termed the multisensor kernel system (MKS) is proposed for a robot equipped with various types of sensors (Henderson and Fai, 1983). The representation of three-dimensional objects is built from information provided by "logical sensors," which provide 2-D and 3-D features extracted from visual and range images of the object. The logical sensor outputs are combined to form a feature vector of dimension k, where k is the number of logical sensor outputs. These vectors are nodes of a "spatial proximity graph." This representation is built by first ordering the collection of vectors into a tree structure based on a measure of distance between vectors and then linking nearest neighbors of the vectors to each other. Although the representation is argued to be general, it has been developed specifically for fusing visual and tactile data. It is unclear how suitable this approach is for a suite of highly disparate sensors. A schema-based approach for sensor fusion is proposed, based on experience gained by the researchers in developing the VISIONS system (Belknap, Riseman, and Hanson, 1986; Arkin, Riseman, and Hanson, 1988). The system is used to integrate information from sonar sensors and visual cameras and has been argued to be a useful test bed for experimenting with different perceptual strategies for robot navigation. The schema-based system allows top-down and bottom-up analyses. Initially discovered cues generate hypotheses. Focus-of-attention mechanisms then direct processing to
verify or discard these hypotheses. The system is described in detail for interpreting scenes based on combining the output of line-detecting and region-finding modules. A distributed blackboard approach has been proposed for sensor fusion in an autonomous robot (Harmon and Solorzano, 1983; Harmon, 1988). The blackboard is organized into a class tree. This hierarchical representation allows inheritance mechanisms, which are useful, for example, in maintaining geometric reference frames of various objects in the scene. Control statements, which are extended forms of production rules, are stored in the blackboard as separate objects and activated by a monitor that detects when condition field values of the rules are changed. The distributed system includes tools for performance monitoring and debugging. The system does not consider any specific algorithms for sensor interpretation and fusion. Applications of the system for autonomous welding and autonomous terrain-based navigation are reported. Hutchinson, Cromwell, and Kak (1988) describe a system that dynamically plans optimal sensing strategies in a robot work cell. An augmented geometric CAD model of the object is used. In addition to representing the object's 3-D structure, the model also includes a table of features that can be observed by each of the sensors, as well as an aspect graph of the object (Ikeuchi and Kanade, 1988). Sensors include a laser ranging device, fixed and manipulator-held video cameras, a force-torque sensor mounted on the robot's wrist, and the manipulators, which measure the distance between the robot's fingers. A wide variety of 3-D and 2-D features are extracted separately from each of these sensors. The initial set of features extracted from the imaged object forms hypotheses of the object's possible positions and attitudes. The aspect graph is searched for the best viewing position to disambiguate the hypotheses. This viewing position is then chosen and the sensor(s) appropriate for sensing the features in the predicted aspects are applied. Hutchinson et al. (1988) describe the application of this technique to one object in the work cell. Luo and Lin (1987) proposed a system for fusing a wide variety of sensors for a robot assembly cell. Analysis and control of sensing is divided into four phases: "far away," "near to," "touching," and "manipulating." A probabilistic framework is used to fuse 3-D feature location estimates using measurements made in each of these phases. Experimental results illustrating the application of this approach to a real task are unavailable.

3.4 The Phenomenological Approach
The phenomenological approach is a recently developed computational approach for integrating multisensory information (Nandhakumar and
Aggarwal, 1987, 1988a-c). This approach relies on phenomenological or physical models that relate each of the sensed signals to the various physical parameters of the imaged object. The models are based on physical laws, e.g., the conservation of energy. The objective is to solve for the unknown physical parameters by using the known physical constraints and signal values. The physical parameters then serve as meaningful features for object classification. Denote sensed information as s_i. Each imaging modality (viz., physical sensor) may yield many types of sensed information s_i. For example, we may have s_1 = "thermal intensity," s_2 = "stereo range," s_3 = "visual intensity," s_4 = "visual edge strength," etc. Let Z_{s_i}(x, y) denote the value of sensed information s_i at any specified pixel location (x, y). For the sake of brevity, Z_{s_i} will be used instead of Z_{s_i}(x, y) in the following. Each source of information is related to object parameters and ambient scene parameters, collectively denoted by p_j, via a physical model of the following form:

$$f_i(Z_{s_i}, p_1, p_2, \ldots, p_N) = 0 \tag{28}$$

where N is the total number of scene and object parameters. Note that for each f_i, only a subset of the entire set of parameters has nonzero coefficients. Examples of p_j include visual reflectance of the surface, relative surface orientation, material density, and surface roughness. In addition to the preceding, various natural laws may be applied to interrelate the physical properties of the objects, e.g., principles of rigidity and the law of the conservation of energy. These lead to additional constraints of the following form:

$$g_j(p_1, p_2, \ldots, p_N) = 0. \tag{29}$$

Let K denote the set of all p_i known a priori, either by direct measurement (e.g., ambient temperature) or directly derivable from an image (e.g., surface temperature from the thermal image). Let U denote the set of all p_i not directly measurable. Obviously N = C(U) + C(K), where C(U) denotes the cardinality of set U. To solve for the unknown parameters, we need a total of at least C(U) independent equations in the form of (28) or (29) that contain elements of U. Note that, in general, the equations are nonlinear, and hence solving them is not straightforward. Also, it may be possible to specify a larger number of equations than required, thus leading to an overconstrained system. An error minimization approach may then be used to solve for the unknowns.
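When the constraint set of Eqs. (28)-(29) is overconstrained, the unknowns can be recovered by error (least-squares) minimization, as the following sketch illustrates. The particular constraint forms and all numerical values below are simplified stand-ins invented for illustration; they are not the models of the cited work.

```python
import numpy as np
from scipy.optimize import least_squares

# Known quantities K: sensed values and directly measurable parameters
# (all numbers here are invented for illustration).
L_thermal = 0.52     # thermal gray level  -> constrains surface temperature
L_visual  = 0.40     # visual gray level   -> constrains reflectance/orientation
T_ambient = 295.0    # measured ambient temperature (K)
W_s       = 800.0    # measured irradiation (W/m^2)

def residuals(unknowns):
    """Stack the physical constraints f_i(Z_si, p) = 0 and g_j(p) = 0 as
    residuals.  The forms below are simplified stand-ins for phenomenological
    models, not the authors' exact equations."""
    T_s, rho, cos_theta = unknowns
    r1 = L_thermal - 1.0e-3 * (T_s - T_ambient) - 0.5                 # thermal model stand-in
    r2 = L_visual - rho * cos_theta                                   # Lambertian stand-in
    r3 = W_s * (1 - rho) * cos_theta - 10.0 * (T_s - T_ambient)       # energy-balance stand-in
    return [r1, r2, r3]

sol = least_squares(residuals, x0=[300.0, 0.5, 0.8])
T_s, rho, cos_theta = sol.x
print(T_s, rho, cos_theta)
```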
Consider the integration of spatially registered and calibrated thermal and visual imagery using such an approach (Nandhakumar and Aggarwal, 1987, 1988a-c). The gray level L_t of the thermal image provides information regarding the surface temperature T_s. The relation is of the following form:

$$L_t = K_1 \int_{\lambda_1}^{\lambda_2} \frac{C_1 \lambda^{-5}}{\exp(C_2 / \lambda T_s) - 1}\, d\lambda + K_2 \tag{30}$$

where K_1, K_2, C_1, and C_2 are constants, and [λ_1, λ_2] is the spectral bandwidth of the sensor. Assuming Lambertian reflectance in the visual spectrum, the gray level L_v of the visual image is related to the surface reflectance ρ and the incident angle θ as

$$L_v = K_3 W_s\, \rho \cos\theta + K_4 \tag{31}$$

where K_3 and K_4 are constants, and W_s is the intensity of irradiation (W/m²) on a surface perpendicular to the direction of irradiation. The principle of the conservation of energy applied to the surface equates the absorbed energy W_abs (in the visual spectrum) to the sum of the conducted, convected, and radiated energies (W_cd, W_cv, and W_rad, respectively; see Fig. 13). This energy balance constraint is expressed as

$$R = \frac{a_2 q_2 + a_3 q_3}{q_1 - 1} \tag{32}$$

where
q_1 = W_abs / W_cd,
q_2 = (T_s - T_amb) / W_abs,
q_3 = σ(T_s⁴ - T_amb⁴) / W_abs,
a_2 = h, the convection coefficient,
a_3 = ε, the surface emissivity, and
W_abs = W_s (1 - ρ) cos θ.

FIG. 13. Surface energy exchange (Nandhakumar and Aggarwal, 1988c).
FIG. 14. Equivalent thermal circuit of the imaged surface (Nandhakumar and Aggarwal, 1988c).
From these equations, it is possible to compute R at each pixel. R is an estimate of the ratio W_cd/W_abs and, therefore, is a measure of the object's relative ability to act as a heat sink/source. The value of R is closely related to that of the object's lumped thermal capacitance (Fig. 14). Hence, R is a physically meaningful feature for object classification. Figure 15 shows a block diagram of the sensor integration scheme. Figure 16 shows the visual image of a scene. Figure 17 shows the thermal image. Figure 18 shows the mode of the values of R computed for each region. Figure 19 shows the output of a decision tree classifier that uses R and other image-derived features.
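A per-pixel computation of the heat-flux ratio R can be sketched directly from the energy balance. The constants and the small image patches below are invented; in the cited work the surface temperature comes from the calibrated thermal image and the reflectance and orientation from the visual image.

```python
import numpy as np

def heat_flux_ratio(T_s, rho, cos_theta, T_amb, W_s, h, eps):
    """Per-pixel estimate of R = W_cd / W_abs from the energy balance:
    the absorbed flux not carried away by convection and radiation is
    attributed to conduction into the object."""
    sigma = 5.67e-8                                  # Stefan-Boltzmann constant
    W_abs = W_s * (1.0 - rho) * cos_theta            # absorbed irradiation
    W_cv  = h * (T_s - T_amb)                        # convected flux
    W_rad = eps * sigma * (T_s**4 - T_amb**4)        # radiated flux
    W_cd  = W_abs - W_cv - W_rad                     # conducted flux (balance)
    return W_cd / W_abs

# Hypothetical per-pixel maps derived from the thermal image (T_s) and the
# visual image (rho, cos_theta); scene constants are also invented.
T_s       = np.array([[305.0, 310.0], [300.0, 320.0]])
rho       = np.array([[0.3, 0.4], [0.2, 0.5]])
cos_theta = np.array([[0.9, 0.8], [0.95, 0.7]])

R = heat_flux_ratio(T_s, rho, cos_theta, T_amb=295.0, W_s=900.0, h=10.0, eps=0.9)
print(np.round(R, 2))
```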
FIG. 15. Overview of approach for integrated analysis of thermal and visual imagery (Nandhakumar and Aggarwal, 1988c).
FIG. 16. Visual image of the scene (Nandhakumar and Aggarwal, 1988c).
This phenomenological approach is extended for analyzing a temporal sequence of thermal and visual imagery (Nandhakumar, 1990). Nandhakumar (1991) discusses robust methods of solving for the parameters occurring in Eq. (32). The phenomenological approach is also suitable for a variety of other sensor suites and domains of application. For example, the interpretation of underwater visual and sonar imagery described by Malik and Nandhakumar (1991) follows such an approach. The phenomenological model used for this application is based on the conservation of acoustic energy propagating through the interface between two fluids. Roughness information extracted from visual imagery is used, along with acoustic backscatter information, to estimate the physical parameters of the imaged surface, such as compressional wave speed and material density ratios. These parameters are shown to be useful features for material classification. The integrated analysis of radar and optical sensors, described by Shaw et al. (1988), is also based on a phenomenological approach. The principal difference between the phenomenological approach and the others is that the former seeks to establish physically meaningful features for classification. The other approaches seek to establish optimal classification strategies without regard to the optimality of the feature set. An emerging technique yet to be explored in any detail relies on connectionist ideas (Bolle et al., 1988) and on principles of artificial neural networks (Pearson et al., 1988; Gelfand et al., 1988). Very little work is reported on
FIG. 17. Thermal image of the scene (Nandhakumar and Aggarwal, 1988c).
the use of such approaches for sensor fusion. Neural mechanisms for sensor fusion discovered in several primitive natural perceptual systems are likely candidates for emulation by such approaches, although at this point the problem remains a very difficult one to solve.

4. Fusion at Multiple Levels
A computer vision system that uses single or multiple sensors to classify objects in a scene typically implements the following sequence of operations: (1) segmentation of image(s) and detection of features, (2) evaluation of feature attributes and values, and (3) classification/interpretation of features. Many variations to this sequence of operations do, of course, exist. For example, segmentation may be incomplete and partial segmentation may be iteratively refined based on interpretation. The computation of feature values may also be iteratively enhanced, and higher-level models may guide these operations. These modifications do not drastically change the approach to
FIG. 18. Mode of the heat flux ratio for each region (Nandhakumar and Aggarwal, 1988c).
the interpretation task, and the preceding paradigm is generally followed in most vision systems discussed in the literature and in the previous section of this chapter. It is obvious that one could use multiple sources of information in each of these operations to improve the performance of each module and, thus, that of the entire system. This aspect, i.e., the fusion of multisensory information at different levels of analysis, is discussed here. Examining recently reported systems from this perspective engenders a new paradigm for fusing information at different levels of analysis in a multisensory vision system.

4.1 Information Fusion at Low Levels of Processing
Asar, Nandhakumar, and Aggarwal (1990) describe an example of a technique that combines information at the lowest levels of analysis. Their technique segments scenes by using thermal and visual imagery. Image pyramids are grown separately for the thermal and visual images. Regions are grown in the thermal image at a reduced image resolution. Contrast information extracted from the visual image is used to control this region-growing process. The labels are propagated to the highest resolution image by using links in the visual pyramid to form the final segmentation.
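A single-resolution sketch of the idea: regions are grown on thermal similarity, but growth across a boundary is suppressed where the visual image shows strong contrast. The pyramid construction and the propagation of labels through pyramid links are omitted, and the thresholds and images are invented.

```python
import numpy as np

def segment(thermal, visual, t_thresh=5.0, v_thresh=0.15):
    """Toy single-level analogue of the pyramid scheme: flood-fill regions
    in the thermal image, vetoing a merge across a pixel boundary when the
    visual image shows strong local contrast there."""
    h, w = thermal.shape
    labels = -np.ones((h, w), dtype=int)
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y, x] >= 0:
                continue
            labels[y, x] = next_label
            stack = [(y, x)]
            while stack:                       # grow one region
                cy, cx = stack.pop()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] < 0
                            and abs(thermal[ny, nx] - thermal[cy, cx]) < t_thresh
                            and abs(visual[ny, nx] - visual[cy, cx]) < v_thresh):
                        labels[ny, nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

thermal = np.array([[300, 301, 320, 321],
                    [300, 302, 320, 322],
                    [300, 301, 319, 321]], dtype=float)
visual = np.array([[0.2, 0.2, 0.8, 0.8],
                   [0.2, 0.2, 0.8, 0.8],
                   [0.2, 0.2, 0.8, 0.8]])
print(segment(thermal, visual))
```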
FIG. 19. Output of decision tree classifier (Nandhakumar and Aggarwal, 1988c).
Duncan, Gindi, and Narendra (1987) describe another approach to segmenting scenes using multisensory data. The different noise characteristics of the two sensors used are exploited to yield the best estimate of edges in the images. A deterministic hill-climbing approach is adopted in a sequential search for the next edge pixel. The approach chooses one image over the other depending on the noise present at each location. The metric used is image variance about a candidate edge or boundary pixel. The method has been demonstrated on one-dimensional signals, and extensions to two-dimensional images are discussed. However, no results are shown for two-dimensional images. Also, it is unclear whether the technique works in cases where occlusions exist. Duncan and Staib (1987) discuss a model-driven approach for the segmentation of multisensory images. A probabilistic framework is employed. Edges and edge segments are extracted from the different images. Trial contours are generated and ψ-s curves are computed for each contour. Disagreements between the trial contours extracted from the different images prompt the application of the model in searching the images for better trial contours. The search, however, consists of a local monotonic optimization approach and is susceptible to failure in the presence of local minima.
The composite gradient image extracted by Bhanu and Symosek (1987) from five channels of a multispectral imager is also a case where low-level sensor fusion is exploited for improved scene segmentation. The many segmentation methods that rely on color features may also be grouped under this category.
4.2 The Combination of Features in Multisensory Imagery
A great deal of research in multisensory computer vision has dealt with combining features extracted from the different sensors' outputs. Each sensor's output is processed separately to detect features. The extracted features are combined with one of two objectives in mind: (1) to produce new features different in type from those extracted from each sensor, or (2) to increase the reliability of the features extracted from each imaging modality. These two approaches are illustrated with examples. A typical example of the former approach is stereoscopic perception, where intensity edge locations are integrated to yield depth estimates. The computed 3-D information is different in nature from the 2-D information extracted from each image. The extraction of structure and motion parameters from a sequence of monocular intensity images also belongs to the former class of approaches. The images need not be produced by the same sensing modality. An example of such a system is the one described by Nandhakumar and Aggarwal (1987, 1988b). In this system, surface temperature values extracted from a thermal image are combined with surface shape and reflectivity values extracted from the corresponding visual image of the scene to estimate values of internal thermal object properties used as features for object classification. The other approach, which is distinct in its objective from that just described, integrates multiple values of one type of feature as sensed by different sensors to improve the accuracy of the final estimate. Typical examples of such an approach are systems that compute more reliable surface reconstructions by combining the surface estimates produced by different methods, e.g., the fusion of shape-from-texture and stereo outputs using a blackboard scheme combining the information (Moerdler and Boult, 1988). The combination of structured lighting techniques to compute surface shape with contour analysis to determine the location of the computed surface is another example of the latter approach (Wang and Aggarwal, 1989). An analogous approach is that followed by Shaw et al. (1988) in which surface shape is hypothesized from the visual image and radar cross-section scattering models verify and refine the reconstructed object shape. The MIT Vision Machine also conforms to this approach by integrating edge information in
the form of edges or discontinuities detected in the outputs of various modules, such as optic flow, texture analysis, etc. The objective is to produce a denser and more reliable map of discontinuities in the scene. In contrast to these examples where images were strictly visual, Chu et al. (1990) describe a technique for segmenting registered images of laser radar range and intensity data, and for combining the resultant segmentation maps to yield a more reliable segmentation of outdoor scenes into natural and man-made objects. The combination of the segmentation maps involves first partitioning regions in one map with those in the other and then using various heuristic rules to merge regions.

4.3 Sensor Fusion During High-Level Interpretation
Features extracted by separately processing the different images, and also those computed by combining information at low and intermediate levels of analysis as discussed earlier, may be combined at the highest levels of analysis during the final stages of interpretation. The system described by Nandhakumar and Aggarwal (1988a, 1988c) discusses the fusion of information at the intermediate and higher levels of analysis. Aggregate features for each region in the image are evaluated separately for the thermal and visual images of outdoor scenes. A feature based on integrating information from the thermal and visual images at an intermediate level of analysis is also computed and an aggregate value of this feature for each region is computed. All these features are then considered together, during the final interpretation, by a decision tree classifier that labels regions in the scene as vegetation, buildings, roads, and vehicles. The CMU NAVLAB project also implements the fusion of information at higher levels of processing (Kanade, 1988). The range image is segmented into surface patches. The reflectance image is processed to yield lines. This information is combined to detect road edges for navigation. The colors and positions of the regions are used to further classify regions in the scene using an expert system. Dunlay (1988) adopted a similar approach wherein color imagery is processed separately using a simple color metric to extract road boundaries. The 3-D locations of the road boundaries are computed assuming a planar road surface. These are overlaid on the range image to limit the search for obstacles on the road, which are detected as blobs in the range image.

4.4 A Paradigm for Multisensory Computer Vision
We now outline a model-based paradigm for multisensor fusion. We illustrate this paradigm by outlining a recently reported system that combines
thermal and visual imagery for classifying objects in outdoor scenes. Information fusion from the different imagery occurs at different levels of analysis in the system (see Fig. 20).
FIG. 20. Sensor fusion at various levels of analysis.
At the lowest levels, thermal and visual imagery are combined to extract meaningful regions in the scene (Asar et al., 1990). A pyramidal approach is adopted for segmentation, as outlined in Section 4.1. The thermal image is then analyzed to produce estimates of surface temperature while the visual image produces estimates of surface shape and reflectivity. This information is combined at the intermediate stages of analysis via a phenomenological scene model, which is based on the law of the conservation of energy. Scene variables, such as wind speed, wind temperature, and solar insolation, are used in the model to relate surface temperature, shape, and reflectivity to an internal thermal object property, i.e., thermal capacitance (Nandhakumar and Aggarwal, 1987, 1988b). The physical model allows the estimation of heat fluxes at the surface of the imaged objects. A feature based on these surface fluxes yields insight into the relative ability of the object to act as a heat sink or heat source. This feature is evaluated at each pixel of the
registered thermal and visual image pair. Thus, information fusion at this intermediate level is synergistic and results in a new feature useful in identifying scene objects (Nandhakumar and Aggarwal, 1987, 1988b). A representative value of this feature based on surface heat fluxes is chosen for each region by computing the mode of the distribution of this feature value for each region. Other aggregate features from each imaging modality, for each region, are also computed separately. These include the average region temperature and surface reflectivity. These features are used in a decision tree classifier to assign labels to the regions. The labels are vehicle, vegetation, road, and building. Thus, information from the two imaging modalities is again combined during this high-level interpretation phase (Nandhakumar and Aggarwal, 1988a, 1988c). Another important component of the system is the object modeling approach, which consists of a unified 3-D representation of objects that allows the prediction of the thermal image and the visual image as well as the surface heat fluxes and, hence, the features used in classification (Oh et al., 1989; Karthik et al., 1991). The model is constructed from multiple silhouettes of objects, and the model can be "edited" to include concavities, internal heat sources, and inhomogeneities. Currently, the models used in each of the levels of analysis are different, and the classification task is based on feature values lying in fixed ranges. The system is being extended to use the predictions provided by the unified object models to guide the interpretation phase.
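The final labelling step can be pictured as a small decision tree over the aggregate region features. The thresholds and test ordering below are invented placeholders, not the rules used in the cited system.

```python
def classify_region(R, mean_temp, reflectivity):
    """Toy decision tree over aggregate region features: the heat-flux
    ratio R, the average region temperature, and the surface reflectivity.
    All thresholds are hypothetical."""
    if R > 0.5:                      # strong heat sink/source behaviour
        return "building" if reflectivity > 0.4 else "vehicle"
    if mean_temp > 300.0:            # warm, weakly conducting surfaces
        return "road"
    return "vegetation"

print(classify_region(R=0.7, mean_temp=298.0, reflectivity=0.3))   # -> "vehicle"
print(classify_region(R=0.2, mean_temp=305.0, reflectivity=0.5))   # -> "road"
```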
5. Conclusions

The advantages of multisensory approaches to computer vision are evident from the discussions in the previous sections. The integration of multiple sensors or multiple sensing modalities is an effective method of minimizing the ambiguities inherent in interpreting perceived scenes. The multisensory approach is useful for a variety of tasks including pose determination, surface reconstruction, object recognition, and motion computation, among others. Several problems that were previously difficult or even impossible to solve because of the ill-posed nature of the formulations are converted to well-posed problems with the adoption of a multisensory approach. We discussed specific formulations that benefit from such an approach. The previous sections presented an overview of recent ideas developed in multisensory computer vision and a comparison and review of some recently reported work. We classified existing multisensory systems into three broadly defined groups: (1) those that combine the output of multiple processing techniques applied to a single image of the scene, (2) those that combine information extracted from multiple views of the same scene by using the
same imaging modality, and (3) those that combine different modalities of imaging, different processing techniques, or multiple views of the scene. We presented examples of several systems in each category. We discussed several commonly used computational frameworks for multisensory vision and presented typical applications of such approaches. The chapter categorized computational frameworks as statistical, variational, artificial intelligence, and phenomenological. We discussed issues pertaining to the hierarchical processing of multisensory imagery and the various levels at which sensory information fusion may occur. Finally, we presented a paradigm for a model-based vision system incorporating the fusion of information derived from different types of sensors at low, intermediate, and higher levels of processing. We discussed the specific case of integrating thermal and visual imagery for outdoor scene interpretation. However, the principles embodied in this approach can be generalized to other combinations of sensor types and application domains. At the lowest levels of analysis, multisensory information is combined to segment the scene. At intermediate levels of analysis, a phenomenological scene model based on physical principles, such as the conservation of energy, is used to evaluate physically meaningful features. These features are combined at the highest levels of analysis to identify scene objects. This paradigm emphasizes the optimality and physical significance of features defined for object recognition. Such an approach simplifies the design of classifiers and yet ensures the required performance. The phenomenological approach has been applied to a limited number of application domains. Its advantages in other application areas remain to be verified. We cited recent research in the fusion of sonar and visual imagery for underwater scene classification as another successful implementation of this paradigm. The paradigm was not presented as the preferred paradigm for all vision tasks. Instead, it was meant to illustrate the various issues that need to be addressed in designing a multisensory vision system. Another paradigm, based on a connectionist or artificial neural network approach to multisensory vision, also remains to be investigated in detail. Recent and continuing developments in multisensory vision research may be attributable to several factors, including (1) new sensor technology that makes affordable previously unexplored sensing modalities, (2) new scientific contributions in computational approaches to sensor fusion, and (3) new insights into the electrophysiological mechanisms of multisensory perception in biological perceptual systems. Most of the progress to date may be attributed to the second cause. The development of new, affordable sensors is currently an important and active area of research and may be expected to have a significant future impact on the capabilities of vision systems. For example, the availability of low-cost imaging laser ranging sensors, passive
infrared sensors, and high-frequency radar imagers would provide significant impetus to research in developing multisensor-based autonomous navigation, object recognition, and surface reconstruction techniques. Many lessons from nature are yet to be learned from neurophysiological and psychophysiological studies of natural perceptual systems. Such studies may provide useful clues for deciding what combination of sensing modalities is useful for a specific task, and they may also provide new computational models for intersensory perception. Many multisensory vision tasks are very computation intensive. Hence, while significant milestones have been established in multisensory computer vision research, the development and application of practical multisensory vision systems in industry, defense, and commerce have not, as yet, been completely successful. The continual increase in performance of available computational hardware may be expected to provide additional impetus to the development of practical multisensory vision systems for "real-world" applications. Highly parallel computer architectures may also meet the computational demands placed on multisensory strategies. The development of such architectures, the automatic identification of parallelism inherent in multisensory vision tasks, and strategies for exploiting this parallelism are other topics of research yet to be addressed. Therefore, a highly interdisciplinary approach to research in multisensory vision is expected in the future in order to realize practical and robust real-time vision systems.

REFERENCES

Aggarwal, J. K., and Magee, M. J. (1986). Determining Motion Parameters Using Intensity Guided Range Sensing, Pattern Recognition 19(2), 169-180.
Aggarwal, J. K., and Nandhakumar, N. (1988). On the Computation of Motion from a Sequence of Images, Proceedings of the IEEE 76(8), 917-935.
Aggarwal, J. K., and Nandhakumar, N. (1990). Multisensor Fusion for Automatic Scene Interpretation - Research Issues and Directions, in "Analysis and Interpretation of Range Images," ed. R. C. Jain and A. K. Jain. Springer Verlag, New York, pp. 339-361.
Aloimonos, J., and Basu, A. (1988). Combining Information in Low-Level Vision, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 862-906.
Arkin, R. C., Riseman, E., and Hanson, A. (1988). AURA: An Architecture for Vision-Based Robot Navigation, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 417-431.
Asar, H., Nandhakumar, N., and Aggarwal, J. K. (1990). Pyramid-Based Image Segmentation Using Multisensory Data, Pattern Recognition.
Ayache, N., and Hansen, C. (1988). Rectification of Images for Binocular and Trinocular Stereovision, "Proceedings of the International Conference on Pattern Recognition," Rome.
Ayache, N., and Lustman, F. (1991). Trinocular Stereovision for Robotics, IEEE Trans. Pattern Analysis and Machine Intelligence 13, 73-85.
Baker, D. C., Aggarwal, J. K., and Hwang, S. S. (1988). Geometry-Guided Segmentation of Outdoor Scenes, "Proceedings of the SPIE Conference on Applications of Artificial Intelligence VI," Vol. 937, Orlando, FL, pp. 576-583.
Baker, D. C., Hwang, S. S., and Aggarwal, J. K. (1989). Detection and Segmentation of Man-Made Objects in Outdoor Scenes: Concrete Bridges, Journal of the Optical Society of America A 6(6), 938-950.
Ballard, D. H., and Brown, C. M. (1982). "Computer Vision." Prentice-Hall, Inc., Englewood Cliffs, NJ.
Belknap, R., Riseman, E., and Hanson, A. (1986). The Information Fusion Problem and Rule-Based Hypotheses Applied to Complex Aggregations of Image Events, "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition," pp. 227-234.
Bhanu, B., and Symosek, P. (1987). Interpretation of Terrain Using Hierarchical Symbolic Grouping from Multi-Spectral Images, "Proceedings of the DARPA Image Understanding Workshop," Los Angeles, pp. 466-474.
Blake, A. (1989). Comparison of the Efficiency of Deterministic and Stochastic Algorithms for Visual Reconstruction, IEEE Trans. PAMI 11(1), 2-12.
Bolle, R., Califano, A., Kjeldsen, R., and Taylor, R. W. (1988). Visual Recognition Using Concurrent and Layered Parameter Networks, "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition," San Diego, CA.
Chu, C. C., Nandhakumar, N., and Aggarwal, J. K. (1988). Image Segmentation and Information Integration of Laser Radar Data, "Proceedings of the Conference on Pattern Recognition for Advanced Missile Systems," Huntsville, AL.
Chu, C. C., Nandhakumar, N., and Aggarwal, J. K. (1990). Image Segmentation Using Laser Radar Data, Pattern Recognition 23(6), 569-581.
CIE (1978). "Recommendation on Uniform Color Spaces, Color Difference Equations, Psychometric Color Terms," Technical Report Supplement No. 2 to CIE Publication No. 15, Commission Internationale de l'Eclairage, Paris.
Courant, R., and Hilbert, D. (1953). "Methods of Mathematical Physics," Interscience Publishers, New York.
Dhond, U. R., and Aggarwal, J. K. (1989a). Structure from Stereo - A Review, IEEE Trans. Systems, Man and Cybernetics 19(6), 1489-1510.
Dhond, U. R., and Aggarwal, J. K. (1989b). A Closer Look at the Contribution of a Third Camera Towards Accuracy in Stereo Correspondence, "Image Understanding and Machine Vision," Technical Digest Series 14, Optical Society of America, pp. 78-81.
Di Zenzo, S., Bernstein, R., Degloria, S. D., and Kolsky, H. G. (1987). Gaussian Maximum Likelihood and Contextual Classification for Multicrop Classification, IEEE Trans. on Geoscience and Remote Sensing GE-25(6), 805-814.
Duda, R. O., and Hart, P. E. (1973). "Pattern Classification and Scene Analysis," John Wiley and Sons, New York.
Duncan, J. S., and Staib, L. H. (1987). Shape Determination from Incomplete and Noisy Multisensory Imagery, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 334-344.
Duncan, J. S., Gindi, G. R., and Narendra, K. S. (1987). Multisensor Scene Segmentation Using Learning Automata, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 323-333.
Dunlay, R. T. (1988). Obstacle Avoidance Perception Processing for the Autonomous Land Vehicle, "Proceedings of the IEEE International Conference on Robotics and Automation," Philadelphia, pp. 912-917.
Durrant-Whyte, H. F. (1987). Sensor Models and Multi-Sensor Integration, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 303-312.
Durrant-Whyte, H. F. (1988).
“Integration, Coordination, and Control of Multi-Sensor Robot Systems,” Kluwer Academic Publishers, Boston, 1988.
Frankot, R. T., and Chellappa, R. (1988). A Method for Enforcing Integrability in Shape from Shading Algorithms, IEEE Trans. Pattern Analysis and Machine Intelligence 10, 439-451.
Fukunaga, K. (1990). "Introduction to Statistical Pattern Recognition." Academic Press, San Diego, CA.
Gelfand, J. J., Pearson, J. C., and Spence, C. D. (1988). Multisensor Integration in Biological Systems, "Proceedings of the Third IEEE Symposium on Intelligent Control," Arlington, VA.
Geman, S., and Geman, D. (1984). Stochastic Relaxation, Gibbs Distribution and the Bayesian Restoration of Images, IEEE Trans. Pattern Analysis and Machine Intelligence 6, 721-741.
Gonzalez, R. C., and Wintz, P. (1987). "Digital Image Processing." Addison-Wesley Publishing Company, Reading, MA.
Hager, G., and Mintz, M. (1987). Searching for Information, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 313-322.
Harmon, S. Y. (1988). Tools for Multisensor Data Fusion in Autonomous Robots, "Proceedings of the NATO Advanced Research Workshop on Highly Redundant Sensing for Robotic Systems," Il Ciocco, Italy.
Harmon, S. Y., and Solorzano, M. R. (1983). Information Processing System Architecture for an Autonomous Robot System, "Proceedings of the Conference on Artificial Intelligence," Oakland University, Rochester, MI.
Healey, G. (1991). Using Color to Segment Images of 3-D Scenes, "Proceedings of the SPIE Conference on Applications of Artificial Intelligence," Vol. 1468, Orlando, FL, pp. 814-825.
Henderson, T. C., and Fai, W. S. (1983). A Multi-Sensor Integration and Data Acquisition System, "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition," Washington, DC, pp. 274-279.
Horn, B. K. P. (1986). "Robot Vision." MIT Press, Cambridge, MA.
Horn, B. K. P., and Brooks, M. J. (1986). The Variational Approach to Shape from Shading, Computer Vision Graphics and Image Processing 33, 174-208.
Hu, G., and Stockman, G. (1987). 3-D Scene Analysis via Fusion of Light Striped Image and Intensity Image, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 138-147.
Hutchinson, S. A., Cromwell, R. L., and Kak, A. C. (1988). Planning Sensing Strategies in a Robot Work Cell with Multisensor Capabilities, "Proceedings of the IEEE International Conference on Robotics and Automation," Philadelphia, pp. 1068-1075.
Ikeuchi, K., and Horn, B. K. P. (1981). Numerical Shape from Shading and Occluding Contours, Artificial Intelligence 17, 141-184.
Ikeuchi, K., and Kanade, T. (1988). Modeling Sensors and Applying Sensor Model to Automatic Generation of Object Recognition Program, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 697-710.
Jain, A. K. (1989). "Fundamentals of Digital Image Processing." Prentice-Hall, Englewood Cliffs, NJ.
Jordan, J. R., and Bovik, A. C. (1988). Computational Stereo Using Color, Cover Paper of Special Issue on Machine Vision and Image Understanding, IEEE Control Systems Magazine 8(3), 31-36.
Julesz, B., and Bergen, J. R. (1987). "Textons, The Fundamental Elements in Preattentive Vision and Perception of Textures," in "Readings in Computer Vision: Issues, Problems, Principles, and Paradigms," ed. M. A. Fischler and O. Firschein. Morgan Kaufmann Publishers, Los Altos, CA, pp. 243-256.
Kanade, T. (1988). CMU Image Understanding Program, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 40-52.
Karthik, S., Nandhakumar, N., and Aggarwal, J. K. (1991). Modeling Non-Homogeneous 3D Objects for Thermal and Visual Image Synthesis, "Proceedings of the SPIE Conference on Applications of Artificial Intelligence," Orlando, FL.
Klinker, G. J., Shafer, S. A., and Kanade, T. (1988). Image Segmentation and Reflection Analysis through Color, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 838-853.
Krotkov, E., and Kories, R. (1988). Adaptive Control of Cooperating Sensors: Focus and Stereo Ranging with an Agile Camera System, "Proceedings of the IEEE International Conference on Robotics and Automation," Philadelphia, pp. 548-553.
Lee, B. G., Chin, R. T., and Martin, D. W. (1985). Automated Rain-Rate Classification of Satellite Images Using Statistical Pattern Recognition, IEEE Trans. on Geoscience and Remote Sensing GE-23(3), 315-324.
Levine, M. D., and Nazif, A. M. (1985a). Dynamic Measurement of Computer Generated Image Segmentations, IEEE Trans. PAMI 7(2), 155-164.
Levine, M. D., and Nazif, A. M. (1985b). Rule-Based Image Segmentation - A Dynamic Control Strategy Approach, Computer Vision Graphics and Image Processing 32(1), 104-126.
Luo, R. C., and Lin, M.-H. (1987). Multisensor Integrated Intelligent Robot for Automated Assembly, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 351-360.
Magee, M. J., and Aggarwal, J. K. (1985). Using Multi-Sensory Images to Derive the Structure of Three-Dimensional Objects: A Review, Computer Vision, Graphics and Image Processing 32, 145-157.
Magee, M. J., Boyter, B. A., Chien, C.-H., and Aggarwal, J. K. (1985). Experiments in Intensity Guided Range Sensing Recognition of Three-Dimensional Objects, IEEE Trans. on Pattern Analysis and Machine Intelligence 7(6), 629-637.
Malik, S., and Nandhakumar, N. (1991). Multisensor Integration for Underwater Scene Classification, "Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics," Charlottesville, VA.
Marr, D. (1982). "Vision." W. H. Freeman and Co., New York.
Matthies, L., and Elfes, A. (1988). Integration of Sonar and Stereo Range Data Using a Grid-Based Representation, "Proceedings of the IEEE International Conference on Robotics and Automation," Philadelphia, pp. 727-733.
Mitiche, A., and Aggarwal, J. K. (1986). Multiple Sensor Integration/Fusion Through Image Processing: A Preview, Optical Engineering 25(3), 380-386.
Mitiche, A., Cil, B., and Aggarwal, J. K. (1983). Experiments in Combining Intensity and Range Edge Maps, Computer Vision, Graphics and Image Processing 21, 395-411.
Moerdler, M. L., and Boult, T. E. (1988). The Integration of Information from Stereo and Multiple Shape-from-Texture Cues, "Proceedings of the DARPA Image Understanding Workshop," Cambridge, MA, pp. 786-793.
Moerdler, M. L., and Kender, J. R. (1987). An Approach to the Fusion of Multiple Shape from Texture Algorithms, "Proceedings of the AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion," St. Charles, IL, pp. 272-281.
Nandhakumar, N. (1990). A Phenomenological Approach to Multisource Data Integration: Analyzing Infrared and Visible Data, "Proceedings of the NASA/IAPR TC7 Workshop on Multisource Data Integration in Remote Sensing," College Park, MD.
Nandhakumar, N. (1991). Robust Integration of Thermal and Visual Imagery for Outdoor Scene Analysis, "Proceedings of the IEEE International Conference on Systems, Man and Cybernetics," Charlottesville, VA.
Nandhakumar, N., and Aggarwal, J. K. (1985). The Artificial Intelligence Approach to Pattern Recognition - A Perspective and an Overview, Pattern Recognition 18(6), 383-389.
Nandhakumar, N., and Aggarwal, J. K. (1987).
Multisensor Integration-Experiments in Integrating Thermal and Visual Sensors, “Proceedings of the First International Conference on Computer Vision,” London, pp. 83-92.
MULTISENSORY COMPUTER VISION
111
Nandhakumar, N., and Aggarwal, J. K. (1988a). A Phenomenological Approach to Thermal and Visual Sensor Fusion, “Proceedings of the NATO Advanced Research Workshop on Highly Redundant Sensing for Robotic Systems,” I1 Ciocco, Italy, pp. 87-101. Nandhakumar, N., and Aggarwal, J. K. (1988b). Integrated Analysis of Thermal and Visual Images for Scene Interpretation, IEEE Trans. on Pattern Analysis and Machine Infelligence 10(4), 469-481. Nandhakumar, N., and Agganval, J . K. (1988~).Thermal and Visual Information Fusion for Outdoor Scene Perception, “Proceedings of the IEEE International Conference on Robotics and Automation,” Philadelphia, pp. 1306 1308. Newman, E. A,, and Hartline, P. H. (1982). The Infrared “Vision” of Snakes, Scientific American 246(3), 116-127. Oh, C. H., Nandhakumar, N., and Agganval, J. K. (1989). Integrated Modelling of Thermal and Visual Image Generation, “Proceedings of the IEEE Computer Vision and Pattern Recognition Conference.” Ohta, Y. ( 1 985). “Knowledge-Based Interpretation of Outdoor Natural Color Scenes,” Pitman Publishing Inc., Massachusetts. Pearl, J. (1987). Distributed Revision of Composite Beliefs, Artificiul Intelligence 33, pp. 173215. Pearson, J. C., Gelfand, J. J., Sullivan, W. E., Peterson, R. M., and Spence, C. D. (1988). Neural Network Approach to Sensory Fusion, “Proceedings of the SPIE Conference on Sensor Fusion,” Vol. 931, Orlando, FL, pp. 103 108. Poggio, T., Little, J., Gillett, W., Geiger, D., Wienshall, D., Villalba, M., Larson, N., Cass, T., Bulthoff, H., Drumheller, M., Oppenheimer, P., Yang, W., and Hurlhert, A. (1988). The MIT Vision Machine, “Proceedings of DARPA image Understanding Workshop,” Cambridge, MA, pp. 177-198. Rodger, J. C., and Browse, R. A. (1987). An Object-Based Representation for Multisensory Robotic Perception, “Proceedings of the AAAI Workshop on Spatial Reasoning and MultiSensor Fusion,” St. Charles, IL, pp. 13 20. Rosenfeld, A,, and Kak, A. C. (1982). “Digital Image Processing.” Academic Press, New York. Rosenthal, W. D., Blanchard, B. J., and Blanchard, A. J. ( 1985). Visible/Infrared/Microwave Agriculture Classification, Biomass and Plant Height Algorithms, IEEE Trans. on Geoscience and Remote Sensing GE-23(2), 84-90. Schalkoff, R. J. (1989). “Digital Image Processing and Computer Vision,” John Wiley and Sons, New York. Shafer, G. (1976). “A Mathematical Theory of Evidence,” University Press, New York. Shaw, S. W., deFigueiredo, R. J. P., and Kumar, K . (1988). Fusion of Radar and Optical Sensors for Space Robotic Vision, “Proceedings of the IEEE Robotics and Automation Conference,” Philadelphia, pp. 1842- 1846. Simchony, T., and Chellappa, R. (1990). Direct Analytical Methods for Solving Poisson Equations in Computer Vision Problems, IEEE Trans. Pattern Analysis and Machine Intelligence 12, 435-446. Stentz, A,, and Goto, Y. (1987). The CMU Navigational Architecture, “Proceedings of the DARPA Image Understanding Workshop,” Los Angeles, pp. 440-446. Therrien, C. W. (1989). “Decision, Estimation and Classification.” John Wiley and Sons, New York. Wang, Y. F., and Aggarwal, J. K. (1987). On Modeling 3-D Objects Using Multiple Sensory Data, “Proceedings of the IEEE International Conference on Robotics and Automation, Raleigh, NC, pp. 1098-1 103. Wang, Y. F., and Aggarwal, J. K. (1989). Integration of Active and Passive Sensing Techniques for Representing 3-D Objects, IEEE Trans. Robotics and Automation 5(4), 460-471.
This Page Intentionally Left Blank
Parallel Computer Architectures

RALPH DUNCAN
Control Data Government Systems
Atlanta, Georgia

1. Introduction
2. Terminology and Taxonomy
   2.1 Interrelated Problems of Terminology and Taxonomy
   2.2 Low-level Parallelism
   2.3 Flynn's Taxonomy
   2.4 Definition and Taxonomy
3. Synchronous Architectures
   3.1 Pipelined Vector Processors
   3.2 SIMD Architectures
   3.3 Systolic Architectures
4. MIMD Architectures
   4.1 Distributed Memory Architectures
   4.2 Shared Memory Architectures
5. MIMD Execution Paradigm Architectures
   5.1 MIMD/SIMD Architectures
   5.2 Data-Flow Architectures
   5.3 Reduction Architectures
   5.4 Wavefront Array Architectures
6. Conclusions
   Acknowledgments
   References
1. Introduction
The term "parallel processing" designates the simultaneous, cooperative use of multiple processors to solve a single computational problem. Parallel processing has attracted a great deal of recent interest because of its potential for making difficult computational problems tractable by significantly increasing computer performance. Two basic kinds of computational problems are encouraging research in parallel processing through their need for orders-of-magnitude improvements in computer processing speed. First, problems characterized by inordinate size and complexity, such as detailed weather or cosmological modeling, often require hours or days of conventional processing. This
hinders developing conceptual models and discourages researchers from modeling the phenomena of interest at a desirable level of detail. Real-time problems, which require computations to be performed within a strictly defined time period and are typically driven by external events, also need significant performance improvements. Real-time systems are being taxed by shorter times for processing and by demands for more processing to be performed before a time deadline. For example, real-time systems in military aircraft are being stressed by increased sensor input speeds and by the need for additional processing to provide more sophisticated electronic warfare functionality. These computational problems call for vast performance increases that conventional, single-processor computers are unlikely to provide. Although developers have achieved impressive increases in uniprocessor speed, continued advances are constrained by fundamental physical laws. The primary barriers to achieving this kind of performance improvement through parallel processing, however, are conceptual ones: finding efficient ways to partition a problem among many processors and to orchestrate multiple processors executing in a cooperative fashion. Since the difficulty of surmounting conceptual obstacles is less formidable than overcoming fundamental physical laws (such as the speed of light), parallel processing is a promising means for achieving significant computer performance advances. Clearly, parallel processing must be supported by architectures that are carefully structured for coordinating the work of many processors and for supporting efficient interprocessor communications. The many parallel architectures that have been developed or proposed define a broad and quite diverse spectrum of architectural possibilities. There are several reasons for this variety; these include the many possible responses to the fundamental conceptual challenge, the divergent characteristics of problems amenable to parallelization, and the practical limitations of alternative technologies that can be used for inter-processor communications. The parallel architecture discipline has been further enriched by the introduction of a host of new parallel architectures during the 1980s. The sheer diversity of parallel processing architectures can be daunting to a nonspecialist. Thus, this chapter attempts to provide a tutorial that surveys the major classes of parallel architecture, describing their structure and how they function. In addition, this chapter correlates parallel architecture classes with references to representative machines, in order to steer the interested reader to the vast literature on individual parallel architectures. Although this chapter's primary intent is not taxonomic, a high-level parallel architecture taxonomy is presented in order to structure the discussion and demonstrate that the major architecture classes define a coherent spectrum of design alternatives.
2. Terminology and Taxonomy
2.1 Interrelated Problems of Terminology and Taxonomy
A coherent survey of parallel architectures requires at least a high-level architecture taxonomy in order to show that the diversity of extant architectures springs from different approaches to supporting a small number of parallel execution models, rather than from ad hoc approaches to replicating hardware components. A parallel architecture taxonomy, in turn, requires a definition of "parallel architecture" that carefully includes or excludes computers according to reasonable criteria. Specifying a definition for parallel architectures that can serve as the basis for a useful taxonomy is complicated by the need to address the following goals:

• Exclude architectures incorporating only low-level parallel mechanisms that have become commonplace features of modern computers
• Maintain elements of Flynn's useful taxonomy (Flynn, 1966) based on instruction and data streams
• Include pipelined vector processors and other architectures that intuitively seem to merit inclusion as parallel architectures, but that are difficult to gracefully accommodate within Flynn's scheme.

2.2 Low-level Parallelism
How a parallel architecture definition handles low-level parallelism is critical, since it strongly influences how inclusive the resulting taxonomy will be. Our definition and taxonomy will exclude computers that employ only low-level parallel mechanisms from the set of parallel architectures for two reasons. First, failure to adopt a more rigorous standard could make the majority of modern computers "parallel architectures," rendering the term useless. Second, architectures having only the features listed below do not offer the explicit framework for developing high-level parallel programming solutions that will be an essential characteristic of our parallel architecture definition.
• Instruction pipelining: the decomposition of instruction execution into a linear series of autonomous stages, allowing each stage to simultaneously perform a portion of the execution process (e.g., decode, calculate effective address, fetch operand, execute, store)
• Multiple central processing unit (CPU) functional units, providing independent functional units for arithmetic and Boolean operations that execute concurrently
• Separate CPU and input/output (I/O) processors, freeing the CPU from I/O control responsibilities by using dedicated I/O processors.

Although these features are significant contributions to performance engineering, their presence alone does not make a computer a parallel architecture.

2.3 Flynn's Taxonomy
Flynn's taxonomy for computer architectures enjoys such widespread usage that any proposed parallel architecture taxonomy must take it into account. The Flynn taxonomy classifies architectures on the presence of single or multiple streams of instructions and data, yielding the following four categories:

1. SISD (single instruction stream, single data stream): defines serial computers
2. MISD (multiple instruction streams, single data stream): would involve multiple processors applying different instructions to a single datum; this hypothetical possibility is generally deemed impractical
3. SIMD (single instruction stream, multiple data streams): involves multiple processors simultaneously executing the same instruction on different data
4. MIMD (multiple instruction streams, multiple data streams): involves multiple processors autonomously executing diverse instructions on diverse data
Although these distinctions provide a useful shorthand for characterizing architectures, they are insufficient for precisely classifying modern parallel architectures. For example, pipelined vector processors merit inclusion as parallel architectures, since they provide both underlying hardware support and a clear programming framework for the highly parallel execution of vector-oriented applications. However, they cannot be adequately accommodated by Flynn's taxonomy, because they are characterized by neither processors executing the same instruction in SIMD lockstep nor the asynchronous autonomy of the MIMD category.

2.4 Definition and Taxonomy
In order for a definition of parallel architecture to serve as the basis for a useful taxonomy, then, it should include appropriate computers that the
Flynn schema does not handle and exclude architectures incorporating only low-level parallelism. The following definition is therefore proposed: a parallel architecture provides an explicit, high-level framework for expressing and executing parallel programming solutions by providing multiple processors, whether simple or complex, that cooperate to solve problems through concurrent execution. Figure 1 shows a taxonomy based on the imperatives discussed earlier and the proposed definition. This informal taxonomy uses high-level categories to delineate the principal approaches to parallel computer architecture and to show that these approaches define a coherent spectrum of architectural alternatives. Definitions for each category are provided in the section devoted to that category. This taxonomy is not intended to supplant efforts to construct fully articulated taxonomies. Such taxonomies usually provide comprehensive subcategories to reflect permutations of architectural characteristics and to cover lower-level features. In addition, detailed parallel architecture taxonomies are often developed in conjunction with a formal notation for describing computer architectures. Significant parallel architecture taxonomies have been proposed by Dasgupta (1990), Hockney and Jesshope (1981), Hockney (1987), Kuck (1982), Schwartz (1983), Skillicorn (1988), and Snyder (1988).
FIG. 1. High-level taxonomy of parallel computer architectures. © 1990 IEEE.
3. Synchronous Architectures
The initial category in our high-level taxonomy consists of synchronous parallel architectures, which coordinate concurrent operations in lockstep by using global clocks, central control units, or vector unit controllers. Our survey of synchronous architectures next examines pipelined vector processors, SIMD architectures, and systolic arrays.

3.1 Pipelined Vector Processors

Vector processor architectures were developed to directly support massive vector and matrix calculations. Early vector processors, such as Control Data's Star-100 (Lincoln, 1984) and Texas Instruments' Advanced Scientific Computer (Watson, 1972), were developed in the late 1960s and early 1970s and were among the first parallel architectures to be offered commercially. Vector processors are characterized by multiple, pipelined functional units that can operate concurrently and that implement arithmetic and Boolean operations for both vectors and scalars. Such architectures provide parallel vector processing by sequentially streaming vector elements through a functional unit pipeline and by streaming the output results of one unit into the pipeline of another as input (a process known as "chaining"). Although data elements for a vector operation enter a given functional unit's pipeline in sequential fashion, parallelism is achieved by concurrently executing different stages of the vector operation on different data elements (or element pairs). Additional parallelism is provided by having the various functional units execute simultaneously.

A representative architecture might have a vector addition unit consisting of six pipeline stages (Fig. 2). If each pipeline stage in the hypothetical architecture shown in the figure has a cycle time of 20 nsec, then 120 nsec elapse from the time operands a1, b1 enter stage 1 until result c1 is available. When the pipeline is filled, however, a result is available every 20 nsec. Thus, start-up overhead of pipelined vector units has significant performance implications. In the case of the register-to-register architecture depicted, special high-speed vector registers hold operands and results. Efficient performance for such architectures (e.g., Cray-1, Fujitsu VP-200) is obtained when vector operand lengths are multiples of the vector register size. Memory-to-memory architectures, such as the Control Data Cyber 205 and Texas Instruments Advanced Scientific Computer, use special memory buffers instead of vector registers.

FIG. 2. Register-to-register vector architecture operation. © 1990 IEEE.

Recent vector processing supercomputers (e.g., the Cray Y-MP/4 and Nippon Electric Corporation SX-3) typically unite 4 to 10 vector processors through a large shared memory. Since such architectures can support task-level parallelism by assigning individual tasks to different CPUs, they could
arguably be classified as MIMD architectures. However, since pipelined vector processing units remain the backbone of such multihead architectures, they are categorized in this discussion as vector processors for clarity's sake. It was argued previously that an architecture's utilizing multiple functional units or instruction pipelining, per se, is insufficient to merit classifying the architecture as parallel. Since multiple units and pipelining are the underlying mechanisms for vector architectures' concurrent execution, one might question their inclusion as parallel architectures. However, such architectures' vector instructions, as well as the language extensions and subroutine libraries that facilitate their use, do provide the user with a high-level framework for developing parallel solutions. Thus, the combination of a vector-level framework for expressing application parallelism with the effective exploitation of multiple units and pipelining to support that parallelism makes it reasonable to classify vector machines as parallel architectures. Figure 3 shows some representative vector processor architectures. Only two of Cray Research's many models are depicted. In addition, the figure suggests both the current preference for register-to-register approaches and the introduction of recent models by Japanese manufacturers.
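To make the timing argument concrete, the following small C program (a sketch written for this discussion, using the hypothetical six-stage, 20-nsec adder described above) prints when each element of a vector sum would emerge from such a pipeline; the first result appears after 120 nsec, and subsequent results appear every 20 nsec once the pipeline is full.

/* Toy model of a six-stage pipelined vector adder with a 20-nsec stage time.
   Element i of c = a + b leaves the pipeline after (STAGES + i) cycles,
   illustrating fill latency versus steady-state throughput.                 */
#include <stdio.h>

#define STAGES   6
#define CYCLE_NS 20
#define VLEN     8

int main(void)
{
    double a[VLEN], b[VLEN], c[VLEN];
    for (int i = 0; i < VLEN; i++) { a[i] = i; b[i] = 10.0 * i; }

    for (int i = 0; i < VLEN; i++) {
        c[i] = a[i] + b[i];                      /* work done by the pipeline */
        int ready_ns = (STAGES + i) * CYCLE_NS;  /* element i emerges here    */
        printf("c[%d] = %5.1f ready at %3d nsec\n", i, c[i], ready_ns);
    }
    return 0;
}

Long vectors thus amortize the 120-nsec start-up cost across many 20-nsec results.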
FIG. 3. Example vector processor architectures.

3.2 SIMD Architectures
FIG. 4. SIMD execution.

SIMD architectures (Fig. 4) typically employ a central control unit, multiple processors, and an interconnection network (IN) for either processor-to-processor or processor-to-memory communications. The distinctive
aspect of SIMD execution consists of the control unit broadcasting a single instruction to all processors, which execute the instruction in lockstep fashion on local data. The IN allows instruction results calculated at one processor to be communicated to another processor for use as operands in a subsequent instruction. SIMD architectures often allow individual processors to disable execution of the current broadcast instruction. As the subsections below will show, the SIMD architecture category encompasses several distinctive subclasses of machine, including processor arrays for word-sized operands, massively parallel machines composed of 1-bit processors, and associative memory architectures.
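The lockstep model can be sketched in a few lines of C (an illustrative simulation only; the opcode names and the eight-PE array are invented for the example). A single broadcast instruction is applied by every enabled processing element to its own local datum, and a PE that has disabled itself simply ignores the broadcast.

/* Minimal SIMD sketch: one broadcast instruction applied in lockstep by all
   enabled processing elements (PEs) to their local data.                    */
#include <stdio.h>

#define NPE 8

typedef enum { ADD_CONST, NEGATE } Opcode;

typedef struct { int local; int enabled; } PE;

static void broadcast(PE pe[], Opcode op, int operand)
{
    for (int i = 0; i < NPE; i++) {              /* conceptually simultaneous */
        if (!pe[i].enabled) continue;
        if (op == ADD_CONST) pe[i].local += operand;
        else                 pe[i].local = -pe[i].local;
    }
}

int main(void)
{
    PE pe[NPE];
    for (int i = 0; i < NPE; i++) { pe[i].local = i; pe[i].enabled = 1; }

    pe[3].enabled = 0;                   /* PE 3 masks off the next broadcast */
    broadcast(pe, ADD_CONST, 100);

    for (int i = 0; i < NPE; i++)
        printf("PE%d: %d\n", i, pe[i].local);
    return 0;
}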
3.2.1 Processor Array Architectures
Processor arrays geared to the SIMD execution of numerical instructions have often been used for computation-intensive scientific problems, such as nuclear energy modeling. The processor arrays developed in the late 1960s (e.g., Illiac-IV) and their more recent successors (e.g., the Burroughs Scientific Processor) utilize processors that accommodate word-sized operands. Operands are usually floating-point (or complex) values and typically range in size from 32 to 64 bits. Various IN schemes have been used to provide processor-to-processor or processor-to-memory communications, with mesh and crossbar approaches being among the most popular.

One variant of processor array architecture uses a large number (thousands) of 1-bit processors. This machine organization was employed by several significant SIMD architectures of the 1980s and is one of the architectural approaches sometimes characterized as constituting "massive" parallelism. Various approaches to constructing SIMD architectures with 1-bit processors have been explored. In bit-plane architectures, the array of processors is arranged in a symmetrical grid (e.g., 64 x 64) and associated with multiple "planes" of memory bits that correspond to the dimensions of the processor grid (Fig. 5). Processor(x, y), situated in the processor grid at location (x, y), operates on the memory bits at location (x, y) in all the associated memory planes. Usually, operations are provided to copy, mask, and perform arithmetic operations on entire memory planes, as well as on columns and rows within a plane. Loral's Massively Parallel Processor (Batcher, 1980) and the Distributed Array Processor exemplify this kind of architecture, which is often used for image processing applications by mapping pixels to the memory's planar structure.

FIG. 5. Bit-plane array processing. © 1990 IEEE.

An alternative approach to 1-bit processor organization is exemplified by Thinking Machines Corporation's Connection Machine (Hillis, 1985), which organizes as many as 65,536 one-bit processors as sets of four-processor meshes united in a hypercube topology. Figure 6 reflects the recent commercial emphasis on 1-bit processor SIMD architectures. Although SIMD arrays based on word-oriented processors continue to be developed, reduced interest in this traditional approach is currently evident.

FIG. 6. Example SIMD processor array architectures.

3.2.2 Associative Memory Processor Architectures
Associative memory processors (Kohonen, 1987) constitute a distinctive type of SIMD architecture. These architectures use special comparison logic to effect parallel operations on stored data on the basis of data content. Research in constructing associative memories began in the late 1950s with the obvious goal of being able to search memory in parallel for data that matched some specified datum. Associative memory processors developed in the early 1970s, such as Bell Laboratories' Parallel Element Processing Ensemble (PEPE), and more recent architectures (e.g., Loral's ASPRO) have often been geared to such database-oriented applications as tracking and surveillance. Figure 7 shows the functional units that characterize a typical associative memory processor. A program controller (serial computer) reads and executes instructions, invoking a specialized array controller when associative memory instructions are encountered. Special registers enable the program controller and associative memory to share data. Most current associative memory processors use a bit-serial organization, which involves concurrent operations on a single bit-slice (bit-column) of
all the words in the associative memory. Each associative memory word, which usually has a very large number of bits [e.g., 32 kilobytes (32K)], is associated with special registers and comparison logic that functionally constitute a processor. Hence, an associative processor with 4K words effectively has 4K processing elements (PEs). Figure 8 depicts a row-oriented comparison operation for a generic bit-serial architecture. A portion of the comparison register contains the value to be matched. All of the associative PEs start at a specified memory column and compare the contents of 4 consecutive bits in their row against the comparison register contents, setting a bit in the A register to indicate whether or not their row contains a match. In Fig. 9, a logical OR operation is performed on a bit-column and the bit-vector in register A, with register B receiving the results. A zero in the
Mask register indicates that the associated word is not included in the current operation.

FIG. 8. Associative memory comparison operation. © 1990 IEEE.

FIG. 9. Associative memory logical OR operation. © 1990 IEEE.

Figure 10 shows example associative memory architectures. In addition to the bit-serial architecture category discussed above, the figure uses several other categories of architecture defined by Yau and Fung (1977) in their older, but still useful, article. In fully parallel architectures, all bits (or groups of bits) in a given column of memory are accessed by an instruction, and multiple columns can be accessed simultaneously. This functionality can be implemented by a distributed logic approach, in which the columns of concurrently accessed memory are several bits wide, and typically contain enough bits to constitute a character. Lesser-known variants of associative memory architecture have included word-serial machines, which use hardware to implement loop constructs for searching, and block-oriented architectures, which use rotating memory devices as the associative memory. These latter approaches are included primarily for historical interest. In recent years, interest in these traditional approaches to associative memory architecture seems to have lessened, with much of the work in content-addressable memory passing to the neural network field.

FIG. 10. Example associative memory architectures.
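The row-oriented comparison of Fig. 8 can be approximated by the following C sketch (serialized here for clarity; the memory contents, search pattern, and field width are arbitrary). Every unmasked word compares the same bit-column window against the comparison register and records a match bit in its A register.

/* Sketch of a bit-serial associative search: each word compares a field at
   the same bit positions against a search pattern; matches set a response
   bit.  A word whose mask bit is zero does not participate.                */
#include <stdio.h>
#include <stdint.h>

#define NWORDS 8

int main(void)
{
    uint16_t memory[NWORDS] = { 0x9ABC, 0x1234, 0x9A00, 0x5678,
                                0x9AFF, 0x0000, 0xFFFF, 0x9A11 };
    int mask_reg[NWORDS], a_reg[NWORDS];

    for (int w = 0; w < NWORDS; w++) mask_reg[w] = 1;
    mask_reg[4] = 0;                        /* word 4 excluded from the search */

    unsigned pattern = 0x9A;                /* comparison register contents    */
    int start = 8, width = 8;               /* bit-column search window        */

    for (int w = 0; w < NWORDS; w++) {      /* done in parallel by the PEs     */
        unsigned field = (memory[w] >> start) & ((1u << width) - 1);
        a_reg[w] = mask_reg[w] && (field == pattern);
    }

    for (int w = 0; w < NWORDS; w++)
        printf("word %d (%04X): %s\n", w, (unsigned)memory[w],
               a_reg[w] ? "match" : "no match");
    return 0;
}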
3.3 Systolic Architectures

In the early 1980s H. T. Kung of Carnegie-Mellon University proposed systolic architectures to solve the problems of special-purpose systems that
must balance intensive computations with demanding I/O bandwidths (Kung, 1982). Systolic architectures (systolic arrays) are pipelined multiprocessors in which data is pulsed in rhythmic fashion from memory and through a network of processors before returning to memory (Fig. 11). A global clock and explicit timing delays synchronize this pipelined data flow, which consists of data items obtained from memory that are to be used as operands by multiple processors within the array. In some schemes, this pipelined data flow may include partial results computed by the array's processors. Modular processors united by regular, local interconnections are typically used in order to provide basic building blocks for a variety of special-purpose systems. During each time interval, these processors transmit and receive a predetermined amount of pipelined data, and execute an invariant sequence of instructions.
FIG. 11. Systolic flow of data from and to memory. © 1990 IEEE.
Systolic arrays address the performance requirements of special-purpose systems by achieving significant parallel computation and by avoiding I/O and memory bandwidth bottlenecks. A high degree of parallelism is obtained by pipelining data through multiple processors, most often in two-dimensional fashion. Systolic architectures maximize the computations performed on a datum once it has been obtained from memory or an external device. Hence, once a datum enters the systolic array it is passed along to the various processors that need it, without an intervening store to memory. According to H. T. Kung's definition, only processors at the topological boundaries of a systolic array perform I/O to and from memory. Figures 12a-e show how a simple systolic array could calculate the outer product of two matrices
The zero inputs shown moving through the array represent explicit timing delays used for synchronization. Each processor in this tightly synchronized scheme is expected to accept/send operands and execute a code sequence during each time-step period. Thus, if the operands needed by a given processor have not yet become available by passing through antecedent processors, timing delay operands are sent to that processor to ensure its computations are appropriately delayed. In the example, each processor begins with an accumulator set to zero and, during each cycle, adds the product of its two inputs to the accumulator. After five cycles the matrix product is complete. Figure 13 shows example systolic arrays developed by industry, academia, and government. The examples suggest that systolic array architectures have rapidly become commercially viable, particularly for algorithm-specific systems that perform military signal processing applications. In addition, programmable (reconfigurable) systolic architectures, such as the iWarp and Saxpy Matrix-1, have been constructed that are not limited to implementing
a single algorithm. Although systolic concepts were originally proposed for very large-scale integration (VLSI)-based systems to be implemented at the chip level, recent systolic architectures have been implemented at a variety of physical levels.

FIG. 12. Systolic matrix multiplication. © 1990 IEEE.

FIG. 13. Example systolic array architectures.
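The pulsed data flow of Fig. 12 can be mimicked in software. The C sketch below (a functional shortcut rather than a cell-by-cell simulation, with arbitrary 2 x 2 operand values) computes what each cell of the array would see: row i of one matrix and column j of the other arrive at cell (i, j) skewed by i + j time steps, zeros fill the gaps as timing delays, and each cell adds the product of its two inputs to an accumulator every step.

/* Systolic-style matrix multiplication: at time t, cell (i,j) receives
   A[i][t-i-j] from the west and B[t-i-j][j] from the north (or a zero
   delay operand when the index is out of range) and accumulates their
   product.  After the array drains, C holds the matrix product.          */
#include <stdio.h>

#define N 2

int main(void)
{
    double A[N][N] = { {1, 2}, {3, 4} };
    double B[N][N] = { {5, 6}, {7, 8} };
    double C[N][N] = { {0} };

    for (int t = 0; t < 3 * N; t++)               /* enough steps to drain    */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                int k = t - i - j;                /* which operand pair, if any */
                double a_in = (k >= 0 && k < N) ? A[i][k] : 0.0;
                double b_in = (k >= 0 && k < N) ? B[k][j] : 0.0;
                C[i][j] += a_in * b_in;           /* zeros act as timing delays */
            }

    for (int i = 0; i < N; i++)
        printf("[ %4.0f %4.0f ]\n", C[i][0], C[i][1]);
    return 0;
}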
4. MIMD Architectures
MIMD architectures employ multiple processors that can execute independent instruction streams. Thus, MIMD computers support parallel solutions that require processors to operate in a largely autonomous manner. Although software processes executing on MIMD architectures are synchronized by passing messages through an IN or by accessing data in shared memory, MIMD architectures are asynchronous computers, characterized by decentralized hardware control. In this exposition, MIMD architectures are treated as being synonymous with asynchronous architectures. The impetus for developing MIMD architectures can be ascribed to several interrelated factors. MIMD computers support higher-level parallelism (subprogram and task levels) that can be exploited by “divide and
conquer" algorithms organized as largely independent subcalculations (e.g., searching, sorting). MIMD architectures may provide an alternative to depending on further implementation refinements in pipelined vector computers to provide the significant performance increases needed to make some scientific applications tractable (e.g., three-dimensional fluid modeling). Finally, the cost effectiveness of n-processor systems over n single-processor systems encourages MIMD experimentation. Both major categories of MIMD architecture, distributed memory and shared memory computers, are examined in the following text. First we discuss distributed memory architectures and review popular topological organizations for these message-passing machines. Subsequent sections consider shared memory architectures and the principal interconnection technologies that support them.
4.1 Distributed Memory Architectures
Distributed memory architectures (Fig. 14) connect processing nodes (consisting of an autonomous processor and its local memory) with a processor-to-processor IN. Nodes share data by explicitly passing messages through the IN, since there is no shared memory. Significant developments in distributed memory architecture occurred during the 1980s, often spurred by the desire to construct a multiprocessor architecture that would "scale" (i.e., accommodate a large increase in processors without significant performance degradation) and would satisfy the processing requirements of large scientific applications characterized by local data references.
FIG. 14. MIMD distributed memory architecture structure. © 1990 IEEE.
Various IN topologies have been proposed to support architecture expandability and provide efficient performance for parallel programs with differing interprocessor communication patterns. Figure 15 depicts some common topologies. Although the suitability of these IN topologies for a given architecture is partly determined by the cost and performance characteristics of a particular implementation, several more abstract characteristics can be used to judge topologies' relative merits. First, a topology's scalability is strongly influenced by the number of connections that are required for each node (the node's "degree"), since physical constraints limit the number of connections one can feasibly implement. It is desirable, therefore, for the number of connections per node to remain fixed or to grow logarithmically as the number of system nodes increases.
FIG. 15. MIMD interconnection network topologies: (a) ring; (b) mesh; (c) tree; (d) hypercube; (e) tree mapped to a reconfigurable mesh. © 1990 IEEE.
Another important consideration is a topology’s inherent fault tolerance. This involves the degree of disruption that a single node’s failure causes and the overhead involved in routing messages around a failed node. A third abstract measure of topology suitability is communication diameter, which can be defined as the maximum number of communication links that a message must traverse between any source and any destination node, while taking the shortest available path (Bhuyan, 1987). In an informal sense, it is the best routing solution for the worst case pairing of source and destination nodes. The following subsections review several popular topologies in terms of these considerations.
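The communication diameter of any of these topologies can be computed directly from its connection structure; the short C program below (an illustration only, using an eight-node ring) performs a breadth-first search from every node and reports the worst-case shortest path, which for the ring is N/2 = 4 links.

/* Communication diameter by brute force: BFS from each node yields shortest
   hop counts; the diameter is the maximum over all source/destination pairs. */
#include <stdio.h>

#define N 8

static int adj[N][N];

static int farthest(int src)
{
    int dist[N], queue[N], head = 0, tail = 0, worst = 0;
    for (int i = 0; i < N; i++) dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > worst) worst = dist[v];
                queue[tail++] = v;
            }
    }
    return worst;
}

int main(void)
{
    for (int i = 0; i < N; i++) {               /* ring: node i <-> i+1 mod N */
        adj[i][(i + 1) % N] = 1;
        adj[(i + 1) % N][i] = 1;
    }
    int diameter = 0;
    for (int s = 0; s < N; s++) {
        int d = farthest(s);
        if (d > diameter) diameter = d;
    }
    printf("communication diameter = %d\n", diameter);
    return 0;
}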
4.1.1 Ring Topology Architectures

A major benefit of a ring topology is that each node's degree (number of interconnections) remains constant as processors are added to form a larger ring. Significant drawbacks to a simple ring topology include the large communication diameter (N/2 for N processors) and low fault tolerance (a single failure disrupts communications). Ring-based architectures' communication diameter, however, can be improved by adding chordal connections. Both chordal connections and the use of multiple rings can increase a ring-based architecture's fault tolerance. Typically, fixed-size message packets are used that include a node destination field. Ring topologies are most appropriate for a small number of processors executing algorithms that are not dominated by data communications. Control Data Corporation has built several specialized architectures that use ring topologies for pipelining. These are hybrid architectures, however, that have both shared memory and message-passing capabilities. Such architectures include the Advanced Flexible Processor (Allen, 1982), the Cyberplus (Ray, 1985), and the Parallel Modular Signal Processor (Colestock, 1988).

4.1.2 Mesh Topology Architectures
A symmetrical two-dimensional (2-D) mesh, or lattice, topology has n² nodes, each connected to its four immediate neighbors. Wrap-around connections at the edges are sometimes provided to reduce the communication diameter from 2(n - 1) to 2 * (integer part of n/2). Increasing the mesh size does not alter node degree. Meshes with simple, four-neighbor connections are relatively fault tolerant, since a single fault results in no more than two additional links being traversed to bypass the faulty node. A mesh's communication diameter can be reduced and its fault tolerance
increased by providing additional diagonal links or by using buses to connect nodes by rows and columns.

4.1.3 Tree Topology Architectures
Tree topology architectures have been proposed to support the parallel execution of algorithms for searching and sorting, image processing, and other algorithms amenable to a divide-and-conquer approach. Although a variety of tree-structured topologies have been suggested, the complete binary tree topology is the most analyzed variant and is the one discussed below. Node degree is not a barrier to binary tree topology scalability, since it remains fixed as tree size increases. Communication diameter and fault tolerance, however, are significant limitations for a binary tree unadorned with additional communications links. For example, the communication diameter for such a tree with n levels and 2ⁿ - 1 processors is 2(n - 1). Furthermore, disrupted communications links at a single node would sever communications between all that node's descendants and the rest of the tree. For these reasons, various additional communications links have been proposed for binary tree topologies, such as buses or point-to-point links that unite all nodes at the same tree level. Well-known parallel architectures based on tree topologies include the DADO (Stolfo, 1987) and Non-Von architectures (Shaw, 1981) developed at Columbia University.

4.1.4 Hypercube Topology Architectures
Since hypercube topologies are not likely to be as familiar to readers as rings or trees, we define the topology in some detail before considering its relative merits. A Boolean n-cube or hypercube topology uses N = 2ⁿ processors arranged in an n-dimensional cube, where each node has n = log₂ N bidirectional links to adjacent nodes (Fig. 15). Individual nodes are uniquely identified by n-bit numeric values that range from 0 to N - 1 and that are assigned in a manner that ensures adjacent nodes' values differ by a single bit. Messages contain the destination node's bit-value and a label initialized to the source node's bit-value. When a processor routes a message, it selects an adjacent node that has a bit in common with the destination value that the routing node lacks, corrects that bit of the message label, and sends the message to the selected node. As a result of these conventions, the number of links traversed by a message traveling from node A to node B is equal to the number of bits that differ in the two nodes' bit-values. Since the source and destination node
labels can at most differ in each of the n bits in their respective labels, the communication diameter of such a hypercube topology is n = log₂ N. Similarly, hypercube node degree grows in proportion to log₂ N. Thus, the total number of processors can be doubled at the cost of increasing the number of interconnections per node by a single communications link. These properties make hypercube topologies attractive as the basis for message-passing architectures that can "scale up" to a large number of processors (i.e., on the order of 1024) in order to meet demanding scientific application requirements. In practice, hypercube topology fault tolerance is likely to be as much influenced by the sophistication of the message routing system as by the topology's abstract properties. For example, if a node in a log₂ N dimension hypercube (where log₂ N > 2) possesses a message that it should forward to a node other than its immediate neighbors, and a single neighbor node has failed, at least one optimal-length pathway to the destination is available. In order to cope with multiple faults, the message routing mechanism could be enhanced to use suboptimal alternative paths when faults block the optimal-length pathways. Interest in hypercube topologies was stimulated by the development of the Cosmic Cube architecture at the California Institute of Technology (Seitz, 1985). Commercial architectures based on hypercube topologies have included the Ametek Series 2010, Intel Personal Supercomputer, and NCUBE/10. Research is continuing on generalized hypercubes where N is not restricted to being an integral power of 2 (Bhuyan and Agrawal, 1984).
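The routing rule just described is easy to express in C; in the sketch below (a 16-node example written for illustration), each hop corrects one bit in which the current node's label differs from the destination's, so the number of hops equals the number of differing bits.

/* Hypercube routing: repeatedly flip one label bit that differs from the
   destination; the path length equals the Hamming distance between labels. */
#include <stdio.h>

static void route(unsigned cur, unsigned dst)
{
    printf("%2u", cur);
    while (cur != dst) {
        unsigned diff = cur ^ dst;          /* bits still to be corrected  */
        cur ^= diff & (~diff + 1u);         /* correct the lowest such bit */
        printf(" -> %2u", cur);
    }
    printf("\n");
}

int main(void)
{
    route(3, 12);                           /* labels differ in 4 bits: 4 hops */
    route(5, 7);                            /* labels differ in 1 bit:  1 hop  */
    return 0;
}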
4.1.5 Reconfigurable Topology Architectures
An interconnection network embodies a single topology in the sense that its physical implementation in hardware is fixed. A reconfigurable topology architecture can, however, provide mechanisms, such as programmable switches, that effectively allow the user to superimpose various interconnection patterns onto the physical IN. Recent research architecture prototypes have implemented IN topology reconfigurability with diverse approaches. For example, Lawrence Snyder's CHiP (Configurable Highly Parallel computer; Snyder, 1982) allows the user to superimpose different topologies onto an underlying mesh structure. Another approach, which is exemplified by H. J. Siegel's PASM (Partitionable SIMD/MIMD system; Siegel et al., 1987), allows the user to partition a base topology into multiple interconnection topologies of the same type.
A significant motivation for constructing reconfigurable topology architectures is that such an architecture can act as many special-purpose architectures that efficiently support the communications patterns of particular algorithms or applications. Figure 16 shows example distributed memory architectures that utilize reconfigurable topologies and the common topologies discussed in previous subsections.
FIG. 16. Example MIMD distributed memory architectures.

4.2 Shared Memory Architectures

As befits their name, the defining characteristic of shared memory architectures is a global memory that each processor in the system can access. In such an architecture, software processes, executing on different processors, coordinate their activities by reading and modifying data values in the shared memory. Our discussion defines these architectures, which involve multiple general-purpose processors sharing memory, as parallel architectures, while
excluding architectures in which a single CPU only shares memory with I/O processors. A significant number of shared memory architectures, such as Encore Computer Corporation's Multimax and Sequent Computer Systems' Balance series, were commercially introduced during the 1980s. These shared memory computers do not have some of the problems encountered by message-passing architectures, such as message sending latency as data is queued and forwarded by intermediate nodes. However, other problems, such as data access synchronization and cache coherency, must be solved. Using shared memory data to coordinate processes executing on different processors requires mechanisms that synchronize attempts to access this data. The essential problem is to prevent one processor from accessing a datum while another process' operation on the datum is only partially complete, since the accessed data would be in an indeterminate state. Thus, one process must not read the contents of a memory location while another process is writing a new value to that location. Various mechanisms, such as test-and-set primitives, fetch-and-add instructions, or special control bits for each memory word, have been used to synchronize shared memory access (Dubois et al., 1988). These mechanisms can be implemented through microcoded instructions, sophisticated memory controllers, and operating system software. For example, the test-and-set/reset primitives shown below can be used to grant a processor sole access to a shared variable when the test-and-set primitive returns a zero value.
TEST-AND-SET (lock-variable)
    temp := lock-variable;
    lock-variable := 1;
    RETURN (temp);
END;

RESET (lock-variable)
    lock-variable := 0;
END;
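In C, the same locking discipline might be expressed with the language's atomic test-and-set facility; the fragment below is only a single-threaded sketch of the usage pattern, with the shared counter standing in for any data structure the lock protects.

/* Spin-lock built from an atomic test-and-set (C11 atomic_flag).  A caller
   spins until the primitive reports that the flag was previously clear.    */
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock_variable = ATOMIC_FLAG_INIT;
static int shared_counter = 0;              /* datum protected by the lock */

static void acquire(void)
{
    while (atomic_flag_test_and_set(&lock_variable))
        ;                                   /* busy-wait ("spin-lock")     */
}

static void release(void)
{
    atomic_flag_clear(&lock_variable);      /* the RESET primitive         */
}

int main(void)
{
    acquire();
    shared_counter++;                       /* critical section            */
    release();
    printf("counter = %d\n", shared_counter);
    return 0;
}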
Processors that receive a value of one when invoking the primitive are prohibited from accessing the variable; they typically enter a busy waiting state as they repeatedly invoke the primitive ("spin-lock") or wait for an interrupt signaling that the lock variable's value has been reset ("suspend-lock"). Other approaches to synchronizing shared data access are described in Kuck et al. (1986), Gottlieb et al. (1983), and Jordan (1984). Each processor in a shared memory architecture may have a local memory that is used as a cache. Multiple copies of the same shared memory data, therefore, may exist in various processors' caches at a given time. Maintaining a consistent version of such data is the cache coherency problem, which is caused by sharing writable data, process migration among processors, and
I/O activity. Solutions to this problem must ensure that each processor uses the most recently updated version of cached data. Both hardware-based schemes, such as write-invalidate and write-update protocols for "snoopy caches," and software-based schemes, such as predetermining data cacheability or time-stamping data-structure updates, have been proposed (Stenstrom, 1990). Although systems with a small number of processors typically use hardware "snooping" mechanisms to determine when cached memory data has been updated, larger systems often rely on software solutions to minimize performance impact. Useful overviews of cache coherence schemes are presented in Dubois et al. (1988) and Stenstrom (1990). Figure 17 illustrates some major alternatives, outlined below, for connecting multiple processors to shared memory.
FIG. 17. MIMD shared memory interconnection schemes: (a) bus interconnection; (b) 2 x 2 crossbar; (c) 8 x 8 omega MIN routing a P3 request to M3. © 1990 IEEE.
4.2.1 Bus Interconnections
Time-shared buses (Fig. 17a) offer a fairly simple and relatively inexpensive way to give multiple processors access to a shared memory. Many of the commercial parallel architectures introduced during the 1980s were bus-based, shared memory machines. However, a single, time-shared bus can effectively accommodate only a moderate number of processors (4-20), since only one processor can access the bus at a given time. In order to accommodate more processors or to increase communications bandwidth, bus-based architectures sometimes utilize multiple buses and hierarchical interconnection systems (Mudge et al., 1987). The experimental Cm* architecture, for example, employs two kinds of buses: a local bus linking a cluster of processors, and a higher-level system bus that links dedicated service processors associated with each cluster. The Hector architecture (Vranesic et al., 1991) exhibits an alternative approach, using a hierarchy of "rings" (bit-parallel, point-to-point connections) to interconnect short buses that each serve a small number of processors.

4.2.2 Crossbar Interconnections
Crossbar interconnection technology uses a crossbar switch of n² crosspoints to connect n processors to n memories (Fig. 17b). Processors may contend for access to a memory location, but crossbars prevent contention for communication links by providing a dedicated pathway between each
possible processor/memory pairing. Crossbar interconnections offer high communications performance but are a relatively expensive IN alternative. Power, pinout, and size considerations typically limit crossbar architectures to using a small number of processors (i.e., 4-16). The Alliant FX/8, which uses a crossbar scheme to connect processors and cache memories, is an example of a commercial parallel architecture using crossbar interconnections.
4.2.3 Multistage Interconnection Networks
Multistage interconnection networks, or MINs (Bhuyan, 1987; Kothari, 1987; Siegel, 1985), offer a compromise between the relatively high-price/high-performance alternative of crossbar INs and the low-price/low-performance alternative offered by buses. An N x N MIN connects N processors to N memories by deploying multiple "stages" or banks of switches in the IN pathway. When N is a power of 2, a popular approach is to employ log₂ N stages of N/2 switches, using 2 x 2 switches. A processor making a memory access request specifies the desired destination (and pathway) by issuing a bit-value that contains a control bit for each stage. The switch at stage i examines the ith bit to determine whether the input (request) is to be connected to the upper or lower output. Figure 17c illustrates MIN switching with an omega network connecting eight processors and memories, where a control bit equal to 0 indicates a connection to the upper output. Since the communication diameter of such MINs is proportional to log₂ N, they can support a large number of processors (e.g., 256). Since MIN technology offers a moderate price/performance IN alternative with a high degree of scalability, it has received a great deal of research attention, leading to proposals for variations such as the omega, flip, SW-banyan, butterfly, multistage shuffle-exchange, baseline, delta, and generalized cube networks. Similarly, many fault-tolerant MINs have been proposed, including the extra stage cube, multipath omega, dynamic redundancy, merged delta, and INDRA networks (Adams et al., 1987). Figure 18 shows examples of MIMD shared memory architectures categorized by the IN technologies discussed above.

FIG. 18. Example MIMD shared memory architectures.
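Destination-tag routing through such an omega network can be sketched in a few lines of C (the eight-way size and the processor/memory pairs below are illustrative). At each stage the request is perfect-shuffled and then switched to the upper or lower output according to the next destination bit, examined from the most significant bit down, so after log₂ N stages the request arrives at the requested memory regardless of which processor issued it.

/* Destination-tag routing in an 8 x 8 omega MIN: shuffle, then take the
   upper (0) or lower (1) switch output selected by the next destination
   bit.  After log2 N stages the request reaches memory module `memory'. */
#include <stdio.h>

#define N      8
#define STAGES 3                            /* log2 N                      */

static void route(unsigned processor, unsigned memory)
{
    unsigned pos = processor;
    printf("P%u", processor);
    for (int s = STAGES - 1; s >= 0; s--) {
        unsigned bit = (memory >> s) & 1u;  /* control bit for this stage  */
        pos = ((pos << 1) & (N - 1)) | bit; /* shuffle + switch setting    */
        printf(" -> %u", pos);
    }
    printf(" = M%u\n", pos);
}

int main(void)
{
    route(3, 3);                            /* the request shown in Fig. 17c */
    route(6, 1);
    return 0;
}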
5. MIMD Execution Paradigm Architectures

MIMD/SIMD hybrids, data-flow architectures, reduction machines, and wavefront array processors all pose a similar difficulty for an orderly taxonomy of parallel architectures. Each of these architectural types is predicated on MIMD principles of asynchronous operation and concurrent
manipulation of multiple instruction and data streams. However, each architecture type is also structured to support a distinctive parallel execution paradigm that is as fundamental to its overall design as MIMD characteristics. For example, the data-flow execution paradigm exemplifies a distinctive form of processing, in which instruction execution is triggered by operand availability. Although data-flow architectures can be implemented using diverse MIMD technologies, their design features coalesce around the central concept of supporting data-flow execution. This dualism poses several taxonomic problems. Placing these architectures in MIMD subcategories solely on the basis of their memory structure and interconnection characteristics obscures the most fundamental aspect of their design: supporting a distinctive kind of parallel program execution. Simply adding special MIMD subcategories for these architectures, however, results in undesirable asymmetry and imprecision. First, having MIMD subcategories at the same taxonomic level be based on both supported execution
models (e.g., data-flow) and structural characteristics (e.g., shared memory, bus-based) makes the subcategorization asymmetrical and somewhat arbitrary. Second, the MIMD architectures discussed in Section 4 can typically support multiple parallel execution models. One can implement a message-passing application using shared memory for the messages, or can implement an application using data-flow principles on a distributed memory hypercube architecture. Thus, if one subcategorizes MIMD architectures on the basis of supported execution models, one would have many architectures grouped under an imprecise category for "other models" or "multiple models." Our taxonomy, therefore, creates a separate, high-level category: MIMD Execution Paradigm Architectures. This inelegant term emphasizes that these MIMD architecture types are structured to support particular parallel execution models.

5.1 MIMD/SIMD Architectures
A variety of experimental hybrid architectures have been constructed during the 1980s that allow selected portions of an MIMD architecture to be controlled in SIMD fashion (e.g., DADO, NON-VON, PASM, and the Texas Reconfigurable Array Computer, or TRAC) (Lipovski and Malek, 1987). These architectures employ diverse mechanisms for reconfiguration and SIMD execution control. One promising approach, based on tree-structured, message-passing computers, such as DADO2 (Stolfo and Miranker, 1986), will be used here to illustrate hybrid MIMD/SIMD operation. The master/slave relation of a SIMD architecture's controller and processors can be mapped onto the node/descendants relation of a subtree (Fig. 19). When the root processor node of a subtree operates as a SIMD controller, it transmits instructions to descendant nodes, each of which executes
the instructions on data in its local memory. In a true message-passing architecture, this instruction transmission process differs from that of the classic SIMD model of simultaneously broadcasting instructions to each processor, since instructions can be first transmitted to the controlling processor's descendants, and then transmitted down the tree to their descendants.

FIG. 19. MIMD/SIMD operation. © 1990 IEEE.

The flexibility of MIMD/SIMD architectures obviously makes them attractive candidates for further research; specific incentives for recent development efforts include supporting image processing applications (PASM; Siegel et al., 1987); studying scalable, reconfigurable architectures (TRAC; Lipovski and Malek, 1987); and parallelizing expert system execution (NON-VON; Shaw, 1981; DADO; Stolfo and Miranker, 1986). Figure 20 shows some example MIMD/SIMD architectures.

FIG. 20. Example MIMD/SIMD architectures.
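The subtree broadcast can be sketched as follows (a toy C model with the tree stored in an array and an add-constant instruction standing in for an arbitrary SIMD operation): the controlling node forwards the instruction to its children, and every descendant applies it to its local datum and forwards it further down the tree.

/* MIMD/SIMD subtree broadcast: node `controller' forwards an instruction to
   its descendants in a complete binary tree (children of i at 2i+1, 2i+2);
   each descendant applies the instruction to its local data.               */
#include <stdio.h>

#define NNODES 15                           /* four-level complete binary tree */

static int local_data[NNODES];

static void broadcast(int controller, int addend)
{
    int left = 2 * controller + 1, right = 2 * controller + 2;
    if (left < NNODES)  { local_data[left]  += addend; broadcast(left,  addend); }
    if (right < NNODES) { local_data[right] += addend; broadcast(right, addend); }
}

int main(void)
{
    for (int i = 0; i < NNODES; i++) local_data[i] = i;

    broadcast(1, 100);      /* node 1 acts as SIMD controller for its subtree */

    for (int i = 0; i < NNODES; i++)
        printf("node %2d: %d\n", i, local_data[i]);
    return 0;
}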
5.2 Data-Flow Architectures
The fundamental characteristic of data-flow architectures is an execution paradigm in which instructions are enabled for execution as soon as all of their operands become available. Hence, the execution sequence of a data-flow program's instructions is based on data dependencies. Data-flow architectures can be geared to exploiting concurrency at the task, routine, and instruction levels. A major incentive for data-flow research, which dates from J. B. Dennis's pioneering work in the mid-1970s, is to explore new
computational models and languages that can be effectively exploited to achieve large-scale parallelism. Programs for data-flow architectures can be expressed as data-flow graphs, such as the program fragment depicted in Fig. 21. Graph nodes may be thought of as representing asynchronous tasks, although they are often single instructions. Graph arcs represent communications paths for tokens that carry either execution results needed as operands in subsequent instructions or control information. Some of the diverse approaches used to implement data-flow computing are outlined below. Static implementations load all program-graph nodes into memory during initialization and allow only one instance of a node to be executed at a time; dynamic architectures allow node instances to be created at run-time and multiple instances of a node to be concurrently executed (Srini, 1986). Some architectures directly store token information containing instruction results into a template for the instruction that will use the results as operands ("token storage"). Other architectures use token matching schemes, in which a matching unit collects result tokens and tries to match them with instructions' required operands. When a complete set of tokens (all required operands) is assembled for an instruction, an instruction template containing the relevant operands is created and queued for execution (Treleaven et al., 1982b). Proposed instruction formats for data-flow architectures differ considerably (Srini, 1986). Significant differences result from varying constraints on the number of input and output arcs that may be associated with a graph node and from alternative approaches to representing control information.
FIG. 21. Data-flow graph-program fragment. © 1990 IEEE.
A typical scheme, however, might allow operand data to be written into instruction fields as either literals or (result) memory addresses by using control bits to identify which data format is being used. Figure 22 shows how a simplified token-matching architecture might process the program fragment shown in Fig. 21. At step 1, the execution of (3 * a) results in the creation of a token that contains the result (15) and an indication that the instruction at node 3 requires this as an operand. Step 2 shows the matching unit that will match this token and the result token of (5 * b) with the node 3 instruction. The matching unit creates the instruction token (template) shown at step 3. At step 4, the node store unit obtains the relevant instruction opcode from memory. The node store unit then fills in the relevant token fields (step 5), and assigns the instruction to a processor. The execution of the instruction creates a new result token to be used as input to the node 4 instruction. Figure 23 shows some examples of data-flow architectures, and categorizes them on the basis of the static and dynamic architecture distinction discussed above. Readers interested in detailed discussions of data-flow architecture
FIG. 22. Data-flow token matching example. © 1990 IEEE.
FIG. 23. Example data-flow architectures. Dynamic architectures: EDFG System (Srini, 1985); Irvine Data-Flow Machine (Arvind & Gostelow, 1975); Manchester Data-Flow Computer (Watson & Gurd, 1979); MIT Tagged Token Data-Flow Computer (Arvind & Kathail, 1981); Newcastle JUMBO (Treleaven et al., 1982a); Utah Data-Driven Machine (Davis, 1978). Static architectures: CERT LAU System (Plas et al., 1976); MIT Data-Flow Computer (Dennis & Misunas, 1975); TI Distributed Data Processor (Cornish, 1979).
characteristics and taxonomy can consult Treleaven et al. (1982b) and Srini (1986).
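The token-matching flow of Figs. 21 and 22 can be imitated in a few lines of code; the graph encoding, operand counts, and input values below are illustrative assumptions only, not the instruction format of any machine listed in Fig. 23:

```python
# A toy token-matching interpreter for the fragment of Fig. 21.
# graph: node -> (opcode, list of (successor node, operand slot)).
from collections import defaultdict

OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
graph = {
    1: ("*", [(3, 0)]),    # node 1 computes 3 * a, feeding operand 0 of node 3
    2: ("*", [(3, 1)]),    # node 2 computes 5 * b, feeding operand 1 of node 3
    3: ("+", [(4, 0)]),    # node 3 sums its inputs, feeding node 4
}

def run(initial_tokens):
    matching_store = defaultdict(dict)       # node -> {slot: operand value}
    tokens = list(initial_tokens)            # result tokens: (node, slot, value)
    while tokens:
        node, slot, value = tokens.pop(0)
        if node not in graph:                # node 4 lies outside the fragment
            print("token for node", node, "carries", value)
            continue
        matching_store[node][slot] = value   # wait in the matching unit
        if len(matching_store[node]) == 2:   # full operand set: fire instruction
            opcode, destinations = graph[node]
            result = OPS[opcode](matching_store[node][0], matching_store[node][1])
            tokens.extend((d, s, result) for d, s in destinations)

# With a = 5 and b = 2 (arbitrary choices), node 1 yields the result token 15
# mentioned in the text, node 2 yields 10, and node 3 emits 25 toward node 4.
run([(1, 0, 3), (1, 1, 5), (2, 0, 5), (2, 1, 2)])
```

Only when a node's operand set is complete does an instruction template exist to be queued, which is the essential difference from the control-flow sequencing of a conventional processor.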
5.3 Reduction Architectures

Reduction, or demand-driven, architectures (Treleaven et al., 1982b) implement an execution model in which an instruction is enabled for execution when its results are required as operands for another instruction that is already enabled for execution. Most reduction architecture research began in the late 1970s in order to explore new parallel execution models and to provide architectural support for applicative (functional) programming languages. Reduction architectures execute programs that consist of nested expressions. Expressions are recursively defined as literals or as function applications on arguments that may be literals or expressions. Programs may reference named expressions, which always return the same value (i.e., have the property of "referential transparency"). Hence, reduction programs are function applications constructed from primitive functions. Reduction program execution consists of recognizing reducible expressions, then replacing them with their calculated values. Thus, an entire
reduction program is ultimately reduced to its result. Since the general execution model only enables an instruction for execution when its results are needed by a previously enabled instruction, some additional rule is needed to enable the first instruction(s) and begin computation. Practical challenges for implementing reduction architectures include synchronizing instruction result demands and managing copies of evaluation results. Demands for an instruction's results must be synchronized, because preserving referential transparency requires that an expression's results be calculated only once. Copies of expression evaluation results must be maintained, since an expression result could be referenced (needed) more than once and a single copy could be consumed by subsequent reductions upon first being delivered. Reduction architectures employ either string-reduction or graph-reduction to implement demand-driven execution models. String-reduction involves manipulating literals and copies of values, which are represented as strings that can be dynamically expanded and contracted. Graph-reduction involves manipulating literals and references (pointers) to values; thus, a program is represented as a graph and garbage collection is performed to reclaim dynamically allocated memory as the reduction proceeds. Figures 24 and 25 show a simplified version of a graph-reduction architecture that maps the program below onto tree-structured processors and passes tokens that demand or return results. Figure 24 depicts all the demand tokens produced by the program, as demands for the values of references propagate down the tree. In Fig. 25, the last two result tokens produced are shown as they are passed to the root node. The program fragment used in Figs. 24 and 25 is:

a = fbc;
b = +de;
c = *fg;
d = 1.  e = 3.  f = 5.  g = 7.
Figure 26 shows reduction machine architectures, categorized according to whether they implement the string or graph reduction mechanisms discussed previously.
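A minimal demand-driven evaluator for the fragment above shows the two ingredients just discussed: demands propagate down to the literals, and each named expression is reduced only once, with its result shared. The fragment leaves the primitive applied at node a unspecified, so addition is used below purely as a placeholder:

```python
# A sketch of demand-driven graph reduction over the program fragment above.
# The `reduced` table models the single, shared copy of each reduced expression
# that referential transparency permits; the operator for "a" is a placeholder.

program = {
    "a": ("+", "b", "c"),        # placeholder primitive applied to b and c
    "b": ("+", "d", "e"),
    "c": ("*", "f", "g"),
    "d": 1, "e": 3, "f": 5, "g": 7,
}
OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
reduced = {}                      # results already computed (and shareable)

def demand(name):
    """Send a demand token for `name`; return the corresponding result token."""
    if name in reduced:
        return reduced[name]
    expr = program[name]
    if isinstance(expr, tuple):   # reducible expression: demand its arguments
        op, left, right = expr
        value = OPS[op](demand(left), demand(right))
    else:                         # a literal simply returns itself
        value = expr
    reduced[name] = value
    return value

print(demand("a"))                # reduces b = 4 and c = 35 first, then a
```

The initial call to demand("a") plays the role of the extra rule needed to start the computation, and the cache stands in for the bookkeeping a real machine must do to keep result copies available for every later reference.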
5.4 Wavefront Array Architectures

Wavefront array processors (Kung et al., 1987) combine the data pipelining of systolic arrays with an asynchronous data-flow execution paradigm. In the early 1980s, S. Y. Kung proposed wavefront array concepts to address
FIG. 24. Reduction architecture demand token production. © 1990 IEEE.
the same kind of problems that stimulated systolic array research. Thus, wavefront array processors are intended to provide efficient, cost-effective architectures for special-purpose systems that balance intensive computations with high I/O bandwidth. Wavefront and systolic architectures are both characterized by modular processors and regular, local interconnection networks. Both kinds of arrays read data from external memory (using PEs at their topological boundaries), pulse data from neighbor to neighbor through a local IN, and write results to external memory using boundary PEs. Wavefront arrays, however, replace the global clock and explicit time delays used for synchronizing systolic data pipelining with asynchronous handshaking as the mechanism for coordinating inter-processor data movement. Thus, when a processor has performed its computations and is ready to pass data to its successor, it informs the successor, sends data when the successor indicates it is ready, and receives an acknowledgment from the successor. The handshaking mechanism makes computational wavefronts
FIG. 25. Reduction architecture result token production. © 1990 IEEE.
pass smoothly through the array without intersecting, as the array's processors act as a wave propagating medium. In this manner, correct sequencing of computations replaces the correct timing of systolic architectures. Figure 27 depicts wavefront array operation, using the matrix multiplication example used earlier to illustrate systolic operation (Fig. 12). The simplified example shows an array that consists of processing elements (PEs) with one-operand buffers for each input source. Whenever a boundary PE's buffer associated with external memory is empty and the memory still contains inputs, the PE immediately reads the next available operand from memory. Operands from other PEs are obtained by using a handshake protocol. Figure 27a shows the situation after memory input buffers are initially filled. In Fig. 27b PE(1, 1) adds the product ae to its accumulator and transmits operands a and e to neighbors; thus, the first computational wavefront is shown propagating from PE(1, 1) to PE(1, 2) and PE(2, 1). Figure 27c shows the first computational wavefront continuing to propagate, as a second wavefront is propagated by PE(1, 1).
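The handshake itself is easy to mimic with blocking one-operand buffers; the two-processor pipeline below (thread names, operand values, and the per-operand computations are invented) only illustrates how readiness signaling replaces a global clock:

```python
# A sketch of handshaking between neighboring PEs: a one-operand buffer blocks
# the sender until the receiver is ready, so data movement needs no global
# clock. Threads stand in for PEs; the computations are arbitrary.
import threading, queue

def pe(name, next_operand, link_out, compute):
    total = 0
    for x in next_operand():          # from external memory or a neighbor
        total += compute(x)           # local computation on the operand
        if link_out is not None:
            link_out.put(x)           # handshake: blocks until successor accepts
    print(name, "accumulated", total)

memory = [2, 4, 6]                    # operands at the array boundary
link = queue.Queue(maxsize=1)         # one-operand buffer between neighbors

boundary = threading.Thread(target=pe, args=(
    "PE(1,1)", lambda: iter(memory), link, lambda x: 10 * x))
inner = threading.Thread(target=pe, args=(
    "PE(1,2)", lambda: iter(link.get, None), None, lambda x: 100 * x))

boundary.start(); inner.start()
boundary.join()
link.put(None)                        # end-of-stream marker for the inner PE
inner.join()
```

Because each transfer waits for the successor to be ready, a slow processor simply delays its own wavefront locally instead of forcing the whole array onto a worst-case clock period.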
FIG. 26. Example reduction machine architectures. Graph reduction machines: Cambridge SKIM (Clarke et al., 1980); ALICE (Darlington & Reeve, 1981); Utah AMPS (Keller et al., 1978). String reduction machines: GMD R-Machine (Kluge & Schlutter, 1980); Newcastle Reduction Machine (Treleaven & Mole, 1980); N.C. Cellular Tree Machine (Mago, 1979); SERFRE (Villemin, 1982). Graph and string reduction machines: Indiana APSA (O'Donnell et al., 1988).
S. Y. Kung argues (Kung et al., 1987) that wavefront arrays enjoy several advantages over systolic arrays, including greater scalability (since global clock skewing is not a problem), increased processing speed when nodes' processing times are not uniform, simpler programming (since computations need not be explicitly scheduled), and greater run-time fault tolerance (since a single processor can be independently interrupted for testing). Wavefront arrays constructed by the Applied Physics Laboratory of Johns Hopkins University (Dolecek, 1984) and by the British Standard Telecommunications Company and Royal Signals and Radar Establishment (McCanny and McWhirter, 1987) should facilitate further assessment of wavefront arrays' proposed advantages.

6. Conclusions
This discussion’s central aim has been to show that, despite their diversity, extant parallel architectures define a comprehensible spectrum of machine designs. Each of the major parallel architecture classes that we have reviewed represents a fundamental approach to effectively supporting parallelized program execution. Although these approaches range from providing networks
FIG. 27. Wavefront array matrix multiplication. © 1990 IEEE.
of general-purpose processors to supporting specific parallel programming philosophies and languages, this conclusion attempts to characterize the direction in which the field of parallel architecture research was moving in early 1991. Recent accomplishments in “scalable” architectures are likely to strongly shape research efforts in the immediate future. The concern for building systems that can be significantly increased in size without performance
degradation has become an important aspect of designing message-passing topologies (e.g., hypercube architectures), interconnection networks (e.g., MINs, hierarchical bus systems), and execution paradigms (e.g., wavefront array processing). The commercial viability of Thinking Machines Corporation’s Connection Machine, Loral’s Massively Parallel Processor, and various hypercube architectures is spurring interest in massively parallel computers that use thousands of processors. The diversity of mature parallel architecture types suggests that there are many viable ways to structure parallel processing systems. This breadth of alternatives encourages researchers to select possible system components and component integration strategies from a wide range of alternatives. Such a concern with system component selection may encourage research attention to be more equally divided among processor, memory, and interconnection technologies, rather than focusing primarily on performance engineering for specialized processors. For example, recent years have seen many research efforts directed to multistage interconnection networks and to organizing cached memory hierarchies. One of the last decade’s most stimulating developments has been the introduction of new architecture types that are geared to supporting a specific parallel execution model. Such architectures have included systolic and wavefront array processors, data-flow architectures, reduction machines, and the massively parallel, bit-oriented SIMD machines. This increased concern with conceptualizing parallel execution models is a departure from the concerns of the vector architecture approach in its maturity, which has tended to emphasize successive engineering refinements to highly specialized components. The first prototypes of execution model-oriented architectures are often constructed using standard microprocessors, buses and memory chips. This combination of emphasizing parallel execution paradigms and of using standard components as system building blocks has significant implications. First, these trends make it easier for researchers to contribute to the field, since the enormous financial investment needed to develop architectures like the classic vector computers can be avoided by using standard components for prototyping. Hence, a professor at a relatively ill-funded Third World university, who is armed with a promising conceptual model of parallel execution and some standard components, has a reasonable chance of constructing a novel parallel architecture. By making it easier to experiment with new parallel architecture approaches, these trends are likely to result in an even greater variety of proposed parallel architecture approaches. Parallel processing is firmly established as a viable mechanism for solving computational problems that are characterized by intensive calculations and demanding processing deadline requirements. By providing diverse architectures that are well suited to different kinds of computational problem, the
parallel architecture subdiscipline has made parallel processing a useful tool for many application domains. Current research concerns, such as scalability, interconnection network and hierarchical memory refinement, and parallel execution paradigm support, suggest that the number and variety of parallel architectures under active development will continue to increase.
Acknowledgments
The author thanks the following individuals for providing research papers, descriptions of NVN architectures, and insights: Theodore Bashkow, Laxmi Bhuyan, Joe Cavano, Jack Dongarra, Paul Englehart, Scott Fahlman, Dennis Gannon, H. T. Kung, S. Y. Kung, G. J. Lipovski, Richard Lott, David Lugowski, Miroslaw Malek, Susan Miller, Wayne Ray, Malcolm Rimmer, Douglas Sakal, Howard J. Siegel, Charles Seitz, Lawrence Snyder, Vason Srini, Kent Steiner, Salvatore Stolfo, Philip Treleaven, David Waltz, and Jon Webb.
REFERENCES

Adams, G. B., Agrawal, D. P., and Siegel, H. J. (1987). A Survey and Comparison of Fault-Tolerant Multistage Interconnection Networks. Computer 20(6), 14-27.
Allen, G. R. (1982). A Reconfigurable Architecture for Arrays of Microprogrammable Processors. In "Special Computer Architectures for Pattern Processing" (K. S. Fu and T. Ichikawa, eds.), pp. 157-189. CRC Press, Boca Raton, Florida.
Anderson, G. A., and Kain, R. Y. (1976). A Content-Addressed Memory Design for Data Base Applications. Proc. Int. Conference on Parallel Processing, pp. 191-195.
Annaratone, M., Arnould, E., Gross, T., Kung, H. T., Lam, M., Menzilcioglu, O., and Webb, J. A. (1987). The Warp Computer: Architecture, Implementation and Performance. IEEE Trans. Comput. C-36(12), 1523-1538.
Arvind and Gostelow, K. P. (1975). A New Interpreter for Data Flow and its Implications for Computer Architecture. Rep. No. 72, Department of Information and Computer Science, University of California, Irvine.
Arvind and Kathail, V. (1981). A Multiple Processor that Supports Generalized Procedures. Proceedings 8th Annual Symposium Computer Architecture, Minneapolis, pp. 291-302.
Barnes, G. H., Brown, R. M., Kato, M., Kuck, D. J., Slotnik, D. L., and Stokes, R. A. (1968). The Illiac IV Computer. IEEE Trans. Comput. C-17(8), 746-757.
Batcher, K. E. (1972). Flexible Parallel Processing and STARAN. 1972 WESCON Technical Papers, Session 1: Parallel Processing Systems, pp. 115.1-115.3.
Batcher, K. E. (1980). Design of a Massively Parallel Processor. IEEE Transactions Comput. C-29(9), 836-844.
BBN Laboratories (1985). "Butterfly Parallel Processor Overview." BBN Laboratories, Cambridge, Massachusetts.
Beetem, J., Denneau, M., and Weingarten, D. (1987). The GF11 Parallel Computer. In "Experimental Parallel Computing Architectures" (J. J. Dongarra, ed.), pp. 255-298. Elsevier, Amsterdam.
Bhuyan, L. N., and Agrawal, D. P. (1984). Generalized Hypercube and Hyperbus Structures for a Computer Network. IEEE Trans. Comput. C-33(4), 323-333.
Bhuyan, L. N. (1987). Interconnection Networks for Parallel and Distributed Processing. Computer 20(6), 9-12.
Borkar, S., Cohn, R., Cox, G., Gleason, S., Gross, T., Kung, H. T., Lam, M., Moore, B., Peterson, C., Pieper, J., Rankin, L., Tseng, P. S., Sutton, J., Urbanski, J., and Webb, J. (1988). iWARP: an Integrated Solution to High-Speed Parallel Computing. Proceedings Supercomputing 88, Orlando, Florida, pp. 330-339.
Briggs, F., and Hwang, K. (1984). "Computer Architectures and Parallel Processing." McGraw-Hill, New York.
Clarke, T. J. W., Gladstone, P. J. S., Maclean, C. D., and Norman, A. C. (1980). SKIM: the S, K, I Reduction Machine. Proceedings LISP-80 Conf., Stanford, California, August, pp. 128-135.
Colestock, M. (1988). A Parallel Modular Signal Processor. Proceedings 8th Conference Digital Avionics Systems, San Jose, California, October 17-20, pp. 607-613.
Control Data Corp. (1976). "Control Data STAR-100 Computer System." Control Data Corp., Minneapolis, Minnesota.
Cornish, M. (1979). The TI Data Flow Architectures: the Power of Concurrency for Avionics. Proceedings Third Conference Digital Avionics Systems, Fort Worth, Texas, pp. 19-25.
Couranz, G. R., Gerhardt, M. S., and Young, C. J. (1974). Programmable Radar Signal Processing Using the RAP. Proceedings Sagamore Computer Conference on Parallel Processing, pp. 37-52.
Crane, B. A., Gilmartin, M. J., Huttenhoff, J. H., Rux, P. T., and Shiveley, R. R. (1972). PEPE Computer Architecture. Proceedings IEEE COMPCON, pp. 57-60.
Darlington, J., and Reeve, M. (1981). ALICE: a Multiprocessor Reduction Machine for the Parallel Evaluation of Applicative Languages. Proceedings Int. Symposium on Functional Programming Languages and Computer Architecture, Goteborg, Sweden, pp. 32-62.
Dasgupta, S. (1990). A Hierarchical Taxonomic System for Computer Architectures. Computer 23(3), 64-74.
Davis, A. L. (1978). The Architecture and System Method of DDM1: a Recursively Structured Data Driven Machine. Proceedings 5th Annual Symposium Computer Architecture, pp. 210-215.
Dennis, J. B., and Misunas, D. P. (1975). A Preliminary Architecture for a Basic Data Flow Processor. Proceedings 2nd International Symp. Computer Architecture, January 20-22, pp. 126-132.
Dolecek, Q. E. (1984). Parallel Processing Systems for VHSIC. Tech. Report, Applied Physics Laboratory, Johns Hopkins University, Laurel, Maryland, pp. 84-112.
Dongarra, J. J., ed. (1987). "Experimental Parallel Computing Architectures." North-Holland, Amsterdam.
Drake, B. L., Luk, F. T., Speiser, J. M., and Symanski, J. J. (1987). SLAPP: a Systolic Linear Algebra Parallel Processor. Computer 20(7), 45-49.
Dubois, M., Scheurich, C., and Briggs, F. A. (1988). Synchronization, Coherence, and Event Ordering in Multiprocessors. Computer 21(2), 9-21.
Encore Computer Corp. (1987). "Multimax Technical Summary," Publication no. 726-01759 Rev. D. Encore Computer Corp., Marlboro, Massachusetts.
ETA Systems, Inc. (1987). "ETA10 Supercomputer Series," Brochure no. 205326. ETA Systems, Inc., St. Paul, Minnesota.
Finnila, C. A., and Love, H. H. (1977). The Associative Linear Array Processor. IEEE Transactions Comput. C-26(2), 112-125.
Flynn, M. J. (1966). Very High Speed Computing Systems. Proceedings IEEE 54, pp. 1901-1909.
Foulser, D. E., and Schreiber, R. (1987). The Saxpy Matrix-1: a General-Purpose Systolic Computer. Computer 20(7), 35-43.
Gajski, D. D., Lawrie, D. H., Kuck, D. J., and Sameh, A. H. (1987). CEDAR. In "Parallel Computing: Theory and Comparisons" (G. J. Lipovski and M. Malek, eds.), pp. 284-291. Wiley, New York.
Goodyear Aerospace Corp. (1984). "Functional Description of ASPRO, the High Speed Associative Processor," document no. GER 16868. Loral Systems Group, Akron, Ohio.
Gottlieb, A., Grishman, R., Kruskal, C. P., McAuliffe, K. P., Rudolph, L., and Snir, M. (1983). The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions Comput. C-32(2), 175-189.
Hays, N. (1986). New Systems Offer Near-Supercomputer Performance. Computer 19(3), 104-107.
Hein, C. E., Zieger, R. M., and Urbano, J. A. (1987). The Design of a GaAs Systolic Array for an Adaptive Null Steering Beamforming Controller. Computer 20(7), 92-93.
Higbie, L. C. (1972). The OMEN Computers: Associative Array Processors. Proceedings IEEE COMPCON, pp. 287-290.
Hillis, W. D. (1985). "The Connection Machine." MIT Press, Cambridge, Massachusetts.
Hockney, R. W., and Jesshope, C. R. (1981). "Parallel Computers: Architecture, Programming, and Algorithms." Adam Hilger, Ltd., Bristol, England.
Hockney, R. W. (1987). Classification and Evaluation of Parallel Computer Systems. In "Springer-Verlag Lecture Notes in Computer Science," No. 295, pp. 13-25.
Hwang, K., ed. (1984a). "Tutorial Supercomputers: Design and Applications." IEEE Computer Society Press, Silver Spring, Maryland.
Hwang, K. (1984b). Evolution of Modern Supercomputers. In "Tutorial Supercomputers: Design and Applications" (K. Hwang, ed.), pp. 5-8. IEEE Computer Society Press, Silver Spring, Maryland.
Jones, A. K., and Schwarz, P. (1980). Experience Using Multiprocessor Systems: a Status Report. ACM Comput. Surveys 12(2), 121-165.
Jordan, H. F. (1984). Experience with Pipelined Multiple Instruction Streams. In "Tutorial Supercomputers: Design and Applications" (K. Hwang, ed.), pp. 239-249. IEEE Computer Society Press, Silver Spring, Maryland.
Kandle, D. A. (1987). A Systolic Signal Processor for Signal-Processing Applications. Computer 20(7), 94-95.
Kapauan, A., Wang, K.-Y., Gannon, D., and Snyder, L. (1984). The PRINGLE: an Experimental System for Parallel Algorithm and Software Testing. Proceedings International Conference on Parallel Processing, pp. 1-6.
Keller, R. M., Patil, S., and Lindstrom, G. (1978). An Architecture for a Loosely Coupled Parallel Processor. Technical Report No. UUCS-78-105, Department of Computer Science, University of Utah, Salt Lake City.
Kluge, W. E., and Schlutter, H. (1980). An Architecture for the Direct Execution of Reduction Languages. Proceedings International Workshop on High-Level Language Computer Architecture, Fort Lauderdale, Florida, pp. 174-180.
Kohonen, T. (1987). "Content-Addressable Memories," 2nd ed. Springer-Verlag, New York.
Kothari, S. C. (1987). Multistage Interconnection Networks for Multiprocessor Systems. In "Advances in Computers," Vol. 26 (M. C. Yovits, ed.), pp. 155-199. Academic Press, New York.
Kozdrowski, E. W., and Theis, D. J. (1980). Second Generation of Vector Supercomputers. Computer 13(11), 71-83.
Kuck, D. J. (1982). High-Speed Machines and their Compilers. In "Parallel Processing Systems" (D. Evans, ed.). Cambridge University Press, Cambridge, England.
Kuck, D. J., and Stokes, R. A. (1984). The Burroughs Scientific Processor (BSP). In "Tutorial Supercomputers: Design and Applications" (K. Hwang, ed.), pp. 90-103. IEEE Computer Society Press, Silver Spring, Maryland.
Kuck, D. J., Davidson, E. S., Lawrie, D. H., and Sameh, A. H. (1986). Parallel Supercomputing Today and the Cedar Approach. Science 231, 967-974.
Kung, H. T. (1982). Why Systolic Architectures? Computer 15(1), 37-46.
Kung, S. Y., Lo, S. C., Jean, S. N., and Hwang, J. N. (1987). Wavefront Array Processors: Concept to Implementation. Computer 20(7), 18-33.
Lang, G. R., Dharsai, M., Longstaff, F. M., Longstaff, P. S., Metford, P. A. S., and Rimmer, M. T. (1988). An Optimum Parallel Architecture for High-Speed Real-Time Digital Signal Processing. Computer 21(2), 47-57.
Leeland, S. B. (1987). An Advanced DSP Systolic Array Architecture. Computer 20(7), 95-96.
Lincoln, N. R. (1984). Technology and Design Tradeoffs in the Creation of a Modern Supercomputer. In "Tutorial Supercomputers: Design and Application" (K. Hwang, ed.), pp. 32-45. IEEE Computer Society Press, Silver Spring, Maryland.
Lipovski, G. J., and Malek, M. (1987). "Parallel Computing: Theory and Comparisons." Wiley, New York.
Lopresti, D. P. (1987). P-NAC: a Systolic Array for Comparing Nucleic Acid Sequences. Computer 20(7), 98-99.
Mago, G. A. (1979). A Cellular, Language Directed Computer Architecture. Proceedings Conference on Very Large Scale Integration, Pasadena, California, January, pp. 447-452.
Manuel, T. (1985). Parallel Machine Expands Indefinitely. Electronics Week, May 13, 49-53.
McCanny, J. V., and McWhirter, J. G. (1987). Some Systolic Array Developments in the United Kingdom. Computer 20(7), 51-63.
Miura, K., and Uchida, K. (1984). FACOM Vector Processor System: VP-100/VP-200. In "Tutorial Supercomputers: Design and Applications" (K. Hwang, ed.), pp. 59-73. IEEE Computer Society Press, Silver Spring, Maryland.
Mudge, T. N., Hayes, J. P., and Winsor, D. C. (1987). Multiple Bus Architectures. Computer 20(6), 42-48.
Nash, J. G., Przytula, K. W., and Hansen, S. (1987). The Systolic/Cellular System for Signal Processing. Computer 20(7), 96-97.
O'Donnell, J. T., Bridges, T., and Kitchel, S. W. (1988). A VLSI Implementation of an Architecture for Applicative Programming. Future Generation Computer Systems 4(3), 245-254.
Paddon, D. J., ed. (1984). "Super-Computers and Parallel Computation." Clarendon Press, Oxford.
Perron, R., and Mundie, C. (1986). The Architecture of the Alliant FX/8 Computer. In "Digest of Papers, COMPCON, Spring 1986" (A. G. Bell, ed.), pp. 390-393. IEEE Computer Society Press, Silver Spring, Maryland.
Pfister, G. F., Brantley, W. C., George, D. A., Harvey, S. L., Kleinfelder, W. J., McAuliffe, K. P., Melton, E. A., Norton, V. A., and Weiss, J. (1987). An Introduction to the IBM Research Parallel Processor Prototype (RP3). In "Experimental Parallel Computing Architectures" (J. J. Dongarra, ed.), pp. 123-140. Elsevier, Amsterdam.
Plas, A., Comte, D., Gelly, O., Syre, J. C., and Durrieu, G. (1976). LAU System Architecture: a Parallel Data Driven Processor Based on Single Assignment. Proceedings International Conference Parallel Processing, August 24-27, pp. 293-302.
Ray, W. A. (1985). CYBERPLUS: a High Performance Parallel Processing System. Proceedings 1st Intercontinental Symposium Maritime Simulation, Munich, pp. 24-29.
Reddaway, S. F. (1973). DAP: a Distributed Array Processor. Proceedings 1st Annual Symposium Computer Architecture, pp. 61-65.
Reinhardt, S. (1988). Two Parallel Processing Aspects of the Cray Y-MP Computer System. Proceedings International Conference Parallel Processing, August 15-19, pp. 311-314.
Rettberg, R. D., Crowther, W. R., Carvey, P. P., and Tomlinson, R. S. (1990). The Monarch Parallel Processor Hardware Design. Computer 23(4), 18-30.
Rudolf, J. A. (1972). A Production Implementation of an Associative Array Processor: STARAN. Proceedings AFIPS Fall Joint Computer Conference 41(1), 229-241.
Russell, R. M. (1978). The Cray-1 Computer System. Communications ACM 21(1), 63-72.
Schwartz, J. (1983). "A Taxonomic Table of Parallel Computers, Based on 55 Designs." Courant Institute, New York University, New York.
Seitz, C. L. (1985). The Cosmic Cube. Communications ACM 28(1), 22-33.
Shaw, D. E. (1981). Non-von: a Parallel Machine Architecture for Knowledge Based Information Processing. Proceedings 7th International Joint Conference on Artificial Intelligence, pp. 961-963.
Siegel, H. J. (1985). "Interconnection Networks for Large-Scale Parallel Processing: Theory and Case Studies." Lexington Books, Lexington, Massachusetts.
Siegel, H. J., Schwederski, T., Kuehn, J. T., and Davis, N. J. (1987). An Overview of the PASM Parallel Processing System. In "Tutorial: Computer Architecture" (D. D. Gajski, V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds.), pp. 387-407. IEEE Computer Society Press, Silver Spring, Maryland.
Skillicorn, D. B. (1988). A Taxonomy for Computer Architectures. Computer 21(11), 46-57.
Snyder, L. (1982). Introduction to the Configurable, Highly Parallel Computer. Computer 15(1), 47-56.
Snyder, L. (1988). A Taxonomy of Synchronous Parallel Machines. Proceedings 17th International Conference Parallel Processing, University Park, Pennsylvania, pp. 281-285.
Srini, V. (1985). A Fault-Tolerant Dataflow System. Computer 18(3), 54-68.
Srini, V. (1986). An Architectural Comparison of Dataflow Systems. Computer 19(3), 68-88.
Stenstrom, P. (1990). A Survey of Cache Coherence Schemes for Multiprocessors. Computer 23(6), 12-24.
Stolfo, S. (1987). Initial Performance of the DADO2 Prototype. Computer 20(1), 75-83.
Stolfo, S. J., and Miranker, D. P. (1986). The DADO Production System Machine. Journal Parallel and Distributed Computing 3(2), 269-296.
Treleaven, P. C., and Mole, G. F. (1980). A Multi-Processor Reduction Machine for User-Defined Reduction Languages. Proceedings 7th International Symposium on Computer Architecture, pp. 121-130.
Treleaven, P. C., Brownbridge, D. R., and Hopkins, R. P. (1982b). Data-Driven and Demand-Driven Computer Architecture. ACM Comput. Surveys 14(1), 93-143.
Treleaven, P. C., Hopkins, R. P., and Rautenbach, P. W. (1982a). Combining Data Flow and Control Flow Computing. Computer Journal 25(2), 207-217.
Villemin, F. Y. (1982). SERFRE: a General-Purpose Multi-Processor Reduction Machine. Proceedings International Conference Parallel Processing, August 24-27, pp. 140-141.
Vranesic, Z., Stumm, M., Lewis, D., and White, R. (1991). Hector: a Hierarchically Structured Shared-Memory Multiprocessor. Computer 24(1), 72-79.
Wada, H., Ishii, K., Fukagawa, M., Murayama, H., and Kawabe, S. (1988). High-Speed Processing Schemes for Summation Type and Iteration Type Vector Instructions on Hitachi Supercomputer S-820 System. Proceedings International Conference Supercomputing, St. Malo, France, pp. 197-206.
Watson, W. J. (1972). The ASC: a Highly Modular Flexible Super Computer Architecture. Proceedings AFIPS Fall Joint Computer Conference, pp. 221-228.
Watson, I., and Gurd, J. (1979). A Prototype Data Flow Computer with Token Labeling. Proceedings National Computer Conference, New York, 48, pp. 623-628.
Widdoes, L. C., and Correll, S. (1979). The S-1 Project: Developing High-Performance Digital Computers. Energy and Technology Review, Lawrence Livermore Laboratory Publication UCRL-52000-79-9, September, pp. 1-15.
Wiley, P. (1987). A Parallel Architecture Comes of Age at Last. IEEE Spectrum 24(6), 46-50.
Yau, S. S., and Fung, H. S. (1977). Associative Processor Architecture: a Survey. ACM Computing Surveys 9(1), 3-27.
Content-Addressable and Associative Memory*

LAWRENCE CHISVIN
Digital Equipment Corporation
Hudson, Massachusetts
R. JAMES DUCKWORTH
Department of Electrical Engineering
Worcester Polytechnic Institute
Worcester, Massachusetts

1. Introduction . . . 160
2. Address-Based Storage and Retrieval . . . 162
3. Content-Addressable and Associative Memories . . . 164
   3.1 Nomenclature . . . 164
   3.2 Materials . . . 165
   3.3 Associative Storage and Retrieval in a CAM . . . 166
   3.4 Multiple Responses . . . 167
   3.5 Writing into a CAM . . . 168
   3.6 Obstacles and Advantages of Content-Addressable and Associative Memories . . . 168
   3.7 Applications that Benefit from a CAM . . . 170
   3.8 New Architectures . . . 173
4. Neural Networks . . . 174
   4.1 Neural Network Classifiers . . . 175
   4.2 Neural Network as a CAM . . . 176
5. Associative Storage, Retrieval, and Processing Methods . . . 176
   5.1 Direct Association . . . 177
   5.2 Indirect Storage Method . . . 178
   5.3 Associative Database Systems . . . 178
   5.4 Encoding and Recall Methods . . . 180
   5.5 Memory Allocation in Multiprocessor CAMs . . . 182
   5.6 CAM Reliability and Testing . . . 183
6. Associative Memory and Processor Architectures . . . 184
   6.1 Associative Memory Design Considerations . . . 186
   6.2 Associative Processors . . . 187
   6.3 CAM Devices and Products . . . 198
* Based on "Content-Addressable and Associative Memory: Alternatives to the Ubiquitous RAM" by Lawrence Chisvin and R. James Duckworth, which appeared in IEEE Computer, Vol. 22, No. 7, pages 51-64, July 1989. Copyright © 1989 IEEE.
7. Software for Associative Processors . . . 212
   7.1 STARAN Software . . . 213
   7.2 DLM Software . . . 215
   7.3 ASP Software . . . 216
   7.4 Patterson's PL/1 Language Extensions . . . 218
   7.5 PASCALIA . . . 219
   7.6 LUCAS Associative Processor . . . 220
   7.7 The LEAP Language . . . 221
   7.8 Software for CA Systems . . . 223
   7.9 Neural Network Software . . . 223
8. Conclusion . . . 225
   8.1 Additional References . . . 225
   8.2 The Future of Content and Associative Memory Techniques . . . 228
Acknowledgments . . . 228
References . . . 229
1. Introduction
The associative memory has finally come of age. After more than three and a half decades of active research, including scores of journal papers, conference proceedings, book chapters, and thesis treatments, the industry's integrated-circuit design and fabrication ability has finally caught up with the vast theoretical foundation built up over that time. The past five years in particular have seen an explosion in the number of practical designs based upon associative concepts. Advances in very large-scale integration (VLSI) technology have allowed many previous implementation obstacles to be overcome, and there seems to be a more general recognition that alternative approaches to the classic method of computing are necessary to produce faster and more powerful computing systems. This chapter describes the field of content-addressable memory (CAM) and associative memory, and the related field of associative processing. Content-addressable and associative memory are a totally different way of storing, manipulating and retrieving data compared to conventional memory techniques. The authors' work in this area started in 1984, when it became obvious that a faster, more intelligent memory solution was required to efficiently accommodate a highly parallel computer system under development (Brailsford, 1985). Although tremendous improvements had been made in the speed and capability of both microprocessors and peripherals, the function of memory had changed very little. We realized that a more intelligent memory could off-load some of the data processing burden from the main processing unit, and furthermore, reduce the volume of data routinely passed between the execution unit and the data storage unit.
This chapter is a review and discussion of the kind of intelligent memory that would solve the problems we recognized. It spans the range from content-addressable memory (CAM), which can retrieve data based upon the content rather than its address, and extends into associative processing, which allows inexact retrieval and manipulation of data. The field of neural networks is covered as well, since they can be considered a form of associative processor, and because some researchers are using neural networks to implement a CAM. Throughout the text, recent content-addressable and associative system examples are used to support the authors' contention that such systems are now feasible. The size and versatility of actual devices has been increasing rapidly over the last few years, enabling the support of new kinds of parallel and AI architectures. The paper by Kadota et al. (Kadota, 1985), for example, describes an 8-kbit device they call a CARM (content-addressable and reentrant memory), designed to provide a high-speed matching unit in data flow computers. Also, a project in England called SCAPE is involved with the design of an associative parallel processor which has been optimized for the support of image processing algorithms (Jones, 1988); a 20-kbit CMOS associative memory integrated circuit design for artificial intelligence machines is described by Ogura et al. (Ogura, 1986), and recently a machine called ASCA was developed which executes Prolog at high speed using CAMs (Naganuma, 1988). The largest CAM device built at this time appears to be the DISP (dictionary search processor) chip (Motomura, 1990) which, with a 160-kb CAM, is over ten times larger than previously reported CAMs. A number of commercial content-addressable memory devices have recently been introduced by Advanced Micro Devices, Coherent Research Inc., Music Semiconductors, and Summit Microsystems. These devices are described in more detail in later sections. An interesting idea that takes into account the inherent fault-tolerant capabilities of a CAM has also recently been reported by a number of researchers. In a conventional memory system every addressable memory cell must function correctly, otherwise the device is useless. However, if faulty cells in a CAM can be found and isolated, then a perfect device is not essential since the actual storage of data does not have to relate to a specific memory location. Another interesting development that has recently been published is a proposal to construct an optical content-addressable memory (Murdocca, 1989). We start this chapter with a brief overview of the traditional address-based storage method which pervades all our present-day computer systems, and describe some of its deficiencies and inherent weaknesses. We then introduce the concept of content-addressable and associative storage and explain some of the terminology that abounds in this area. Next, we explain some of the obstacles that face the development of these intelligent memory
systems and explain the potential advantages that can be obtained if the obstacles can be overcome. We then describe how CAM techniques are presently being used in both traditional computers and some newer highly parallel computer systems. We also introduce the technique of hash coding that has been used by computer system designers in the past in an attempt to implement the CAM functionality using software. We follow this with a discussion of the use of neural networks as associative processing systems. In the next section we describe the storage, retrieval, and processing of data using associative techniques and then we describe the associative memory and processor architectures of devices that have either been used or are in active use today. This section also describes the design and use of CAM devices that are commercially available or have been produced in research laboratories. The issues of software for associative processors, including software for neural networks, are discussed next. Finally, in order to place our chapter in historical context, we summarize the major milestones and the articles that have been published over the last 25 years, and we conclude with some thoughts on the future prospects for the field of intelligent memory systems. We hope that this chapter will explain the associative concepts in enough detail to interest new people to study existing problems, and that it will motivate the incorporation of some of the ideas discussed into new designs, thus accelerating the exciting progress already underway.

2. Address-Based Storage and Retrieval
Traditional computers rely on a memory architecture that stores and retrieves data by addressing specific memory locations, as shown in Fig. 1.
FIG. 1. Location-addressed memory: an address selects a numbered word, and the data are then read out.
Every accessed data word must travel individually between the processing unit and the memory reservoir through a communications medium, often a shared bus, one word at a time. The elegance and simplicity of this approach has ensured its success, evidenced by the ubiquitous nature of the computer today. However, there are some inherent drawbacks to a word-at-a-time, location-addressed memory. One major problem of address-based memory is that the memory access path becomes the limiting factor for system performance. This has come to be known as the "von Neumann bottleneck" (Backus, 1978). Much of the traffic on the communications medium is involved with sending information back and forth merely to calculate the effective address of the necessary data word. A second important drawback to the location-addressed approach is the serial nature of the processing, where each piece of information manipulated by the computer must be handled sequentially. This approach is particularly slow in search and compare problems, for example, where many items must be inspected to determine the outcome. If the items are each distinct and unrelated to one another, then the only reason they must be processed sequentially is that the architecture is limited to handling them in that manner. All the records could be inspected simultaneously if the system allowed it. A linear search operation for an exact match on a conventional computer finds the match, on average, halfway down the search list. The search time increases at the same rate as the list size. The performance penalty increases if a more complex comparison is necessary while searching, such as correlating or sorting the data. Techniques such as hash coding and hardware execution pipelines attempt to alleviate the problems by reducing the search time and overlapping the functions. However, improvement using conventional methods is limited. Addressing by location is particularly inefficient when:

• Data are associated with several sets of reference properties
• Data elements are sparse relative to the values of the reference properties
• Data become dynamically disordered in memory during processing (Hanlon, 1966)
The serious disadvantages inherent in location-addressed memories become more obvious when multiple processing units are introduced into the computing system. Modern parallel processing architectures, such as data flow machines, exploit application parallelism to increase their execution performance. Systems that rely on data flow properties do not execute efficiently using a traditional memory, where each data word can only be
accessed serially by its location in a large sequential array. The use of CAM devices in these types of parallel computers is discussed in more detail in Section 3.8, "New Architectures." Conventional memory systems are also too slow in some applications. For example, the bridge between high-speed local area networks is readily implemented with a CAM. The bridge provides transparent communication between workstations on different networks. The problem with a bridge is the short time in which it must recognize that a message is for a station on the other network and route it accordingly. There may be many thousands of stations on the networks, and the bridge must check the destination address to determine whether to accept the message and pass it on to the other network. Sequentially comparing an incoming address with addresses stored in the bridge may take many cycles, and slows down the overall message transfer in the system. Ideally the search and comparison should be done in parallel so that the search time remains constant irrespective of the number of addresses that must be compared. Commercial content-addressable memory devices manufactured by Advanced Micro Devices and MUSIC Semiconductors, and described in more detail in Section 6.3, "CAM Devices and Products," can carry out this search action and require less than 1 μs to find a match.
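As a rough software analogy (the station names below are invented, and a hashed set only approximates the parallel compare a CAM performs in hardware), the difference between scanning the station table and probing it by content looks like this:

```python
# Sequential scan versus content-based probe of a bridge's station table.
# A CAM compares the incoming destination address against every stored address
# at once; the set membership test below merely imitates that constant-time
# behavior in software.
stations_on_other_network = {f"station-{i}" for i in range(10_000)}

def sequential_lookup(table, destination):
    for entry in table:               # time grows with the number of stations
        if entry == destination:
            return True
    return False

def content_lookup(table, destination):
    return destination in table       # cost roughly independent of table size

print(sequential_lookup(list(stations_on_other_network), "station-9999"))  # True
print(content_lookup(stations_on_other_network, "station-9999"))           # True
```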
3. Content-Addressable and Associative Memories
The basic problems with conventional address-based systems have led researchers to investigate the potential benefits of CAMs, where information is stored, retrieved, or modified based upon the data itself, rather than by its arbitrary storage location. In some ways, we can view such a memory as a representation of the information it contains, rather than as a consecutive sequence of locations containing unrelated data (Kohonen, 1987).

3.1 Nomenclature
Actual implementations of CAMs have been reported since at least 1956 (Slade and McMahon, 1956), and much of the research has used its own definitions. The people who surveyed the early progress of this field showed the problems associated with keeping track of what research was being conducted. Hanlon, in his 1966 paper (the first comprehensive survey of the field) (Hanlon, 1966), defined content-addressable memory as a storage mechanism where the data are located by content. He defined an associative memory as “a collection or assemblage of elements having data storage capabilities, and which are accessed simultaneously and in parallel on the
basis of data content rather than by specific address or location." In this definition, the term "associative" referred to the interrelationships between the data elements, rather than the specific method of storage. Minker (1971) used the International Federation of Information Processing definition of an associative store as "a store whose registers are not identified by their name or position but by their content." Parhami (1973) defined an associative memory as a "storage device that stores data in a number of cells," where the cells "can be accessed or loaded on the basis of their contents." This was similar to what Hanlon had called a content-addressable memory. Parhami further defined an associative processor as a system that exhibited sophisticated data transformation or included arithmetic control over the contents of a number of cells, depending upon their content. An associative computer was defined as "a computer system that uses an associative memory or processor as an essential component for storage or processing." Foster (1976) defined a CAM to be "a device capable of holding information, comparing that information with some broadcast information, and indicating agreement or disagreement between the two." A content-addressable parallel processor (CAPP) was defined as "a CAM with the added ability to write in parallel into all those words indicating agreement." In more recent literature, the term "associative memory" is used to describe a general storage and retrieval system that can access or modify cells based on their content, but does not necessarily need an exact match with a data key. This is similar to Hanlon's definition, and is the more generic description. Content-addressable memory has come to represent the mechanism that is used to implement the associative system. However, many research papers still refer to them seemingly interchangeably, and both terms must be used to effectively find information on the topic.
3.2 Materials

Most associative memories today are constructed using silicon in the form of VLSI circuits, and the examples in this chapter are mainly drawn from that wealth of experience. There are, however, systems in various stages of experimentation that are built using other methods, including Josephson memory cells (Morisue et al., 1987) and optical or optoelectronic principles (Farhat, 1989; Murdocca et al., 1989; White, 1988). The field of optics in particular shows excellent promise, and it is likely that someday large, complex, and powerful associative engines will be designed and produced using optical techniques (Berra, 1987). This area is still in its infancy, however, and the systems being produced and suggested are more useful as small research vehicles than as commercially viable products (Berra et al., 1990).
3.3 Associative Storage and Retrieval in a CAM

The concepts of storage and retrieval in a CAM are straightforward, and are described here in canonical form. The basic functions of a CAM are:

1. Broadcast and comparison of a search argument with every stored location simultaneously
2. Identification of the matching words
3. Access to the matching words

Figure 2 is a simple block diagram of a CAM. A particular record can be found by matching it with a known pattern. This involves a key word, a mask word, and matching logic. The key word is used to input the pattern for comparison, while the mask word enables only those parts of the key word that are appropriate in the context of the request. The key and mask word combination is provided to the tag memory and matching logic, where the actual data comparison takes place. After a match has been found, the appropriate data words can be output to the requesting program or modified, depending upon the capabilities of the system architecture and the requirements of the application. Figure 3 shows a common CAM data word arrangement, where each data word is partitioned into fixed segments. In this scheme, there are three fields containing specific information. The tag bits signify the type of location, and are used to show whether the location is empty or used. If the location is used, this field identifies the type of information stored, such as temporary data or program code.
FIG. 2. Content-addressable memory block diagram (tag memory and matching logic; data memory).
FIG. 3. Common bit arrangement for content-addressable memory (TAGS, LABEL, and DATA fields).
The balance of the word is split into label and data fields. The label field is used to match any incoming key word requests, and the data field holds the information that will be returned or modified. If more flexibility is desired, or if the label information is embedded in the data, these two conceptual segments are treated as one entity. In this implementation, the entire data field is compared with the properly masked search key.
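The key/mask match against the label field can be sketched in software as follows; the field layout and class interface are invented for illustration, and the loop only simulates what the hardware does against every stored word simultaneously:

```python
# A software model of the match operation: broadcast a key and mask, and every
# stored word whose label agrees with the key on the unmasked bits responds.
class CAM:
    def __init__(self):
        self.words = []                       # each stored word: (label, data)

    def store(self, label, data):
        self.words.append((label, data))

    def match(self, key, mask):
        """Return the data field of every responder.
        Hardware compares all words at once; this loop is only a simulation."""
        return [data for label, data in self.words
                if (label & mask) == (key & mask)]
```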
3.4 Multiple Responses

Since it is possible (in fact, likely in a complex system) that a search will identify more than one matching stored record, some method of sorting through or outputting multiple matches must be provided. The two main problems with multiple responses are (1) identifying the number of responders, and (2) selecting each member from the set of responses. As an example of these problems, assume that a 24-bit CAM contains the entries shown in Fig. 4. This figure shows separate label and data fields, and for simplicity contains no tag field. When the key word "3AH" is applied to the CAM, three labels respond after the matching operation. These responders have corresponding data items containing 386CH, ABCDH, and 9732H. Each of the multiple matches might be selected at random or in some predefined priority order. Assuming some priority, the matching words could be presented as they are found in an ordered array, or they could be sorted by an algorithmic selection process. In the example of Fig. 4, they might be sorted alphabetically or numerically.
LABEL    DATA
25H      1234H
3AH      386CH
65H      4287H
3AH      ABCDH
80H      5624H
3AH      9732H

FIG. 4. A section of a CAM.
3.5 Writing into a CAM

After the matching operation identifies the associated data, it is often necessary to write new information into the CAM. This brings with it decisions unique to associative storage. The first difficulty is deciding where in the memory to write the information. Since the data words are usually not addressable by location, some other method of identifying currently available memory areas must be employed. This method must take into account the likelihood that certain data words are related to one another and should be stored for efficient retrieval. The free areas can be identified by their content, by a separate tag bit, or by a pattern in a special field. The word might be placed at random within the free area, or a more predictable choice might be made. For example, the new data word might be stored in the first free memory area found (if this had any meaning in the particular architecture), or it might be placed close to other related data. The algorithm would depend upon the intended application. Changing only the partial contents of a data word is a desirable function, and writing to an arbitrary number of cells within different words is a potentially powerful operation (Parhami, 1973). Once the location has been determined, the memory system must also confront the decision of exactly how the word is to be written. Since a content-addressable architecture relies on the relationship of the data, it is not sufficient to merely dump the new information into its appointed storage location. The data currently residing in that location might have to be merged with the incoming data, and the label field will almost certainly have to reflect the new information now contained in that area of the memory.
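One of the write policies just described, finding a free word by matching on a separate tag bit, can be sketched as follows (the field names and the first-responder priority rule are arbitrary choices):

```python
# Free words carry an "empty" tag; a write is a match on that tag followed by
# an update of one responder. In hardware the tag match, like any other match,
# would examine all words in parallel.
EMPTY, USED = 0, 1

class TaggedCAM:
    def __init__(self, size):
        self.words = [{"tag": EMPTY, "label": 0, "data": 0} for _ in range(size)]

    def write(self, label, data):
        free = [w for w in self.words if w["tag"] == EMPTY]   # match on the tag field
        if not free:
            raise MemoryError("no free CAM word")
        free[0].update(tag=USED, label=label, data=data)      # first responder wins

cam = TaggedCAM(4)
cam.write(0x3A, 0x386C)    # lands in some free word; its location is irrelevant
```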
3.6 Obstacles and Advantages of Content-Addressable and Associative Memories
There have been a number of obstacles to commercially successful associative memories. Some of these are listed below:

• Relatively high cost for reasonable storage capacity
• Poor storage density compared to conventional memory
• Slow access time due to the available methods of implementation
• Functional and design complexity of the associative subsystem
• Lack of software to properly utilize the associative power of the new memory systems
An associative or content-addressable memory is more expensive to build and has lower storage density than a conventional address-based memory
because of the overhead involved in the storage, comparison, manipulation, and output selection logic. Some current large-scale integration (LSI) versions of CAMs are discussed later, where this characteristic can be clearly seen. Content-addressable and associative memories are always more complex than location-addressable memories. Manipulation of data based upon the contents, in association with the contents of other locations, entails design decisions that do not exist when information can be saved merely by address. In a location-addressed memory, the usual design considerations include the word length, the number of words, the base technology (e.g., CMOS vs. ECL), the internal architecture (e.g., dynamic vs. static), and the speed. The CAM has all the decisions above, and adds some significant tradeoffs of its own. The internal fields have to be determined (e.g., whether a special "index" field is required or any arbitrary search capability is to be allowed), what the internal architecture will be (e.g., bit-serial vs. word-serial), how to interface to the memory, how much internal interconnection is required between the various cells, how to handle multiple CAM hits, how to detect and correct errors, what language will best utilize the architecture and hardware (it may even be necessary to write a new language), and how this more expensive system compares to the traditional way of solving the target problem. The unfamiliarity with associative concepts that hampers many designers aggravates the situation, but even a widely understood CAM technology involves the extra tasks of storage and retrieval by content. Very little software is currently available for CAMs. At some very high level of hierarchy, the program differences can be made transparent, but the lowest programming levels will always have to adapt to the underlying architecture (and in some cases the hardware) to extract the power of content-based storage efficiently.

3.6.1 Motivating Factors
The motivation to overcome these obstacles is that a combination of highly parallel processing techniques and associative storage lends itself to certain classes of applications (Murtha, 1966; Thurber and Wald, 1975). For example, a large part of the software in administrative and scientific data processing is related to the searching and sorting necessary for data arrangement in a sequentially addressed memory. This is especially true in tasks such as compiling, scheduling, and real-time control. This type of housekeeping is unnecessary in a CAM because the data are used for retrieval and can be output already sorted by whatever key is specified. A database consisting of unordered list structures is a perfect candidate for content-addressable treatment (Hurson et al., 1989). Because the CAM
searches and compares in parallel, the time to extract information from the storage medium is independent of the list length. There is no need to sort or compact the information in the memory, since it can be retrieved easily based on its contents. This has immediate implications for common data manipulations such as collating, searching, matching, cross-referencing, updating, and list processing. An associative approach can help any problem where the information is stored based on an association with other data items. Lea (1975) provides an excellent illustration of the type of benefits obtainable through the use of a CAM. He discusses how one would access and update a company telephone directory. Using a location-addressable memory would involve some special technique for accessing the data, such as "hash-coding" or "inverted-listing" on the name field (Kohonen, 1987). This works fine until it is necessary to retrieve the information by a field other than the name. If one wanted to find out who was in room X, for example, it would still be necessary to go through the entire list looking for the "room" field. One could, of course, provide an access key for other fields, but this method of cross-retrieval quickly becomes cumbersome for a great number of possible keys and a large database. Moreover, if one wanted to allow for access based upon a name that was "almost" right, the design of the retrieval key would have to be such that this was possible. Updating such a database involves other problems, and the more flexible the retrieval mechanism, the longer and more complex the job of storage. A CAM solves all these problems. Retrieval is based upon the actual contents, and in this way every field is a "key" to the entire entry. Since the database is inspected in parallel, access to any specific data cell is fast and efficient. The storage process is greatly simplified since the location of the entry is irrelevant. Once the input fields are stored, their actual contents provide the links to the rest of the database.
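Lea's directory example can be mimicked in software: every field of every record is compared against a query in which unspecified fields are treated as "don't care." The records and field layout below are invented for illustration, and a real CAM would perform the per-record comparisons in parallel rather than in a loop:

    # Each record is a (name, room, extension) tuple; None in the query means "don't care".
    directory = [
        ("Jones", "A-112", "4417"),
        ("Patel", "B-204", "4298"),
        ("Smith", "A-112", "4533"),
    ]

    def associative_lookup(query):
        """Return every record whose specified fields all match the query."""
        return [rec for rec in directory
                if all(q is None or q == field for q, field in zip(query, rec))]

    print(associative_lookup((None, "A-112", None)))   # who is in room A-112?
    print(associative_lookup(("Patel", None, None)))   # Patel's full entry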
3.7 Applications that Benefit from a CAM

This characterization suggests a vast array of applications that can potentially benefit from associative treatment. As just a few recent examples, content-addressable and associative memories have been suggested for list processing system garbage collection (Shin and Malek, 1985a), graph traversal (Shin and Malek, 1985b), pattern classification (Eichmann and Kasparis, 1989; Suzuki and Ohtsuki, 1990), pattern inspection (Chae et al., 1988), text retrieval (Hirata et al., 1988; Yamata et al., 1987), signal and image processing (Lea, 1986), speech processing (Cordonnier, 1981), image analysis (Snyder and Savage, 1982; Lee, 1988), parallel exhaustive search for NP-complete problems (Yasuura et al., 1988), digital arithmetic through
truth table lookup processing (Cherri and Karim, 1988; Mirsalehi and Gaylord, 1986; Papachristou, 1987), logic simulation (Sodini et al., 1986), probabilistic modeling (Berkovich, 1981), and characterization of ultrasonic phenomena (Grabec and Sachse, 1989). CAMs are especially appropriate for computer languages, such as LISP (Bonar and Levitan, 1981; Ng et al., 1987) and PROLOG (Chu and Itana, 1985), that use list structures as their building blocks and tend to fragment memory during execution. The Conclusions section at the end of this chapter provides a review of some important previous surveys on content-addressable and associative systems. Most of the literature mentioned there has its own list of applications. The performance improvement in many of the above areas can be dramatic using an associative memory, especially when the database being manipulated is large and search time on a conventional computer becomes significant. To see this improvement, consider a sorting problem, where each data item must be compared to all the other data items to ascertain its sort position. In the general case, the search/sort time grows at the rate of O(n log n), where n is the number of items on the list. An application that needed only the maximum value would grow in execution time at least as fast as the increase in the list size. With an appropriate associative memory or associative processor, the sorting could be done while the access is occurring, growing only as the list grows. A problem that needed only the largest value could inspect all the data items simultaneously, and the performance would be the same for any list size that fit within the memory. One novel application for a CAM is the processing of recursively subdivided images and trees (Oldfield et al., 1987). An example of this is a binary tree used to represent the pixels in a drawing. If each node in the tree has to be visited for the desired operation, then a conventional location-addressed memory can be made efficient. If, however, only a few of the nodes need to be either inspected or changed (for example, if a small portion of the picture needs modification), a CAM is a clear winner. With a traditional memory, the entire tree must be rewritten for each local change, and a search entails looking at all the previous nodes for the one of interest. A CAM allows a single node to be easily found (by using an appropriate key pattern as a search argument), and provides for local changes in constant time. CAMs have been suggested to enhance the performance of logic programming systems. One such system implements a version of the PROLOG language, and uses a CAM for the variable environment and the database (Nakamura, 1984). In this system, both a traditional serial depth-first search and a heuristic (best-first) concurrent evaluation can be accommodated. In the depth-first method, the bindings are stored in the CAM, and are referred to by the appropriate keys. Concurrent program evaluation
is obtained by having the execution processors share the common associative memories. These memories contain the environments, the database, and the search operation context table. More recently, researchers at Syracuse University have been investigating the use of CAMs to increase the speed of logic programming (Kogge et al., 1988; Oldfield, 1986; Ribeiro, 1988; Ribeiro et al., 1989). The SUMAC machine (Syracuse University Machine for Associative Computation) uses advanced CAMs and an instruction set well suited for logic programming execution (Oldfield, 1987b). The logic expressions in their system are represented using the previously described CAM-implemented tree structures (Oldfield et al., 1987). Operations related to unification and data structure manipulation are the most frequent and time-consuming parts of executing logic programs. The unification operation involves finding substitutions (or bindings) for variables that allow the final resolution of the user's goal. The backtracking operation describes the process by which a program continues a search by examining alternate paths. Both of these operations can be improved by using a CAM to store the information. An even better idea is to reduce the number of such operations, and a CAM will help in this area, too. An index stored in the CAM can filter the clauses needing to be matched against a goal. This can reduce the number of blind alleys in the search operation, and thereby increase the efficiency of the program. The other major operation that can be improved by a CAM is the maintenance and updating of the data structures. One example of this is the creation of, search for, and deletion of a binding. Other examples are garbage collection (easily implemented by an "in use" bit defined in the CAM word) and compaction (totally unnecessary in a CAM). Many conventional computer implementations of PROLOG spend an inordinate amount of time searching through linear lists. As already discussed, an associative treatment of this function has potentially significant performance improvement capability. The technique of content addressing has been used for many years in the design of cache memory. A cache memory is a high-speed memory placed in the path between the processor and the relatively slow main memory. The cache stores recently accessed data and code on the assumption that, if the processor needs this information again, it can be retrieved from the faster cache rather than from main memory, thereby speeding up execution. For more information see, for example, Stone's work (Stone, 1990). CAM techniques have also been used for many years in memory management units to carry out the virtual-to-physical address translations. The AT&T WE-32201 Integrated Memory Management Unit/Data Cache
(IMDC) is reported to be the first device to include a content-addressable-memory-based Memory Management Unit (MMU) and a large instruction/data cache on a single chip (Goksel, 1989).
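The address-translation use just mentioned can be pictured as a small translation lookaside buffer in which the virtual page number is the search key and the matching entry supplies the physical frame. This is a generic sketch with an assumed 4-Kbyte page size and invented entries, not the actual organization of the AT&T device:

    PAGE_SIZE = 4096

    # Each entry associates a virtual page number with a physical frame number.
    tlb = {0x400: 0x1F3, 0x401: 0x2A7, 0x500: 0x011}   # a dict stands in for the CAM

    def translate(virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        frame = tlb.get(vpn)                 # the associative lookup on the page number
        if frame is None:
            raise LookupError("TLB miss: fall back to the page tables")
        return frame * PAGE_SIZE + offset

    print(hex(translate(0x400123)))          # -> 0x1f3123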
3.8 New Architectures

Another important use of CAMs and associative memories is in the implementation of new computer architectures. Massively parallel computer systems cannot afford a serial memory bottleneck, but must instead have concurrent access to much of the computer's database. Data flow machines, for example, rely on data words that have tags to route them through the system. The matching of the data tags to the proper operation nodes requires an associative operation. In the data flow computational model, instructions can only be executed when all their operands are available, and the pairing and routing of the operands for instructions is one of the most critical parts of the whole system. An example of an early data flow computer is the Manchester data flow machine (Gurd et al., 1985). At the time the Manchester machine was constructed (1978), the largest commercially available CAM was only 64 bits in size, making the cost of building and implementing a true content-addressable matching store prohibitive. Instead, a pseudo-content-addressable matching store was implemented by manipulating data in conventional random access memory through a hardware hashing function unit (Silva and Watson, 1983). [For more information on hash coding see, for example, Knuth (1973).] Technology improvements have now made larger-sized CAMs feasible. Two papers have recently been published that describe devices developed using content-addressable memory techniques to improve the performance of data flow systems. Takata et al. (1990) describe a high-throughput matching memory that uses a combination of a small amount of associative memory (32 words by 50 bits) with a hashing memory (512 words by 42 bits). The paper by Uvieghara (1990) describes a smart memory for the Berkeley dataflow computer. The content-addressable and reentrant memory (CARM), which is described in Section 6.3.1.1, is also suitable for constructing a high-speed matching unit (Kadota et al., 1985).
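A Manchester-style matching store can be approximated in software: a token waits under its tag until its partner arrives, and the pair is then released to fire an instruction. The sketch below substitutes an ordinary hash table for the hardware hashing unit and assumes two-operand instructions:

    waiting = {}   # tag -> first operand to arrive (the hashed "matching store")

    def arrive(tag, value):
        """Return a ready (left, right) operand pair, or None if the token must wait."""
        if tag in waiting:
            partner = waiting.pop(tag)
            return (partner, value)      # both operands present: the instruction can fire
        waiting[tag] = value             # first arrival: park it under its tag
        return None

    print(arrive(("add", 7), 3))   # None, the token waits
    print(arrive(("add", 7), 4))   # (3, 4), instruction "add" at node 7 can fire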
3.8.1 Computer Networks

Associative memory is also appropriate in an intelligent computer network routing system. It could utilize the association of message content and routing information to efficiently package and dispatch the network intercommunication. Duplicate messages within local routing areas could be sent between the global routing nodes as a single message, which would be
decoded at the receiving node by content-addressable techniques. This would reduce the traffic between the global nodes. In multistage networks such as an omega network (see, e.g., Almasi and Gottlieb, 1989; Decegama, 1989) it is very advantageous to reduce network traffic by a technique known as message combining. This technique combines messages that are to be sent to the same memory location. If message combining is not performed, then hot spots (Pfister and Norton, 1985) can occur, degrading the performance of the system. Message combining is implemented in the New York University (NYU) Ultracomputer (Almasi and Gottlieb, 1989). Using content-addressable techniques to compare and match the destination addresses may result in substantial performance improvement.
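Message combining of this kind can be caricatured as merging requests that target the same memory address into a single outgoing message. This toy sketch shows only the merge on the forward path; real combining switches also split the reply on the way back:

    from collections import defaultdict

    def combine(requests):
        """Merge read requests that target the same address into a single message."""
        by_address = defaultdict(list)
        for requester, address in requests:
            by_address[address].append(requester)
        # One outgoing message per distinct address, remembering who to answer later.
        return [(address, requesters) for address, requesters in by_address.items()]

    print(combine([("P0", 0x100), ("P1", 0x100), ("P2", 0x104)]))
    # -> [(256, ['P0', 'P1']), (260, ['P2'])]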
4. Neural Networks
The field of neural networks (1989a; Lippmann, 1987) has in recent years gone from a research curiosity to commercial fruition. In some ways, neural networks represent the entire field of associative memories. This is true for two reasons. First, the concepts behind neural networks were understood long before the technology was available to implement them efficiently. Second, a neural network is in every way an associative processing engine. It is ironic that John von Neumann, whose word-at-a-time architecture has become so prevalent in the computer field, was one of the early proponents of the associative memory discipline (Gardner, 1990) long before there was any possibility of implementing a feasible system. This great man's own work helped to establish the associative systems that are now making inroads into his previously unchallenged computer architecture. The field of neural networks has grown substantially in recent years due to improvements in VLSI technology (Goser et al., 1989; Graf et al., 1988; Treleaven et al., 1989). The number of groups actively involved in artificial neural network research has increased from about 5 in 1984 to about 50 in 1988 (Murray, 1990). Neural network business has gone from about 7 million dollars in 1987 to around 120 million dollars today (Gardner, 1990). The basis for neural networks is massive parallelism, simple fine-grained execution elements, and a highly connected intercommunication topology. The network explores many competing hypotheses in parallel, arriving at the best solution based upon the input stimulus and the links and variable weights already in the system. The neural network has as its biological model the human brain, and it attempts to solve the same types of problems that humans can solve so well. They happen to be problems that conventional computers struggle with, mostly unsuccessfully.
Current targets of neural networks are the fields of speech and pattern recognition, process control, signal processing, nondestructive testing, and stress analysis. Despite advances in conventional computer technology, where computers have been designed that are significantly more powerful than anything available just a few years ago, speech and pattern recognition remains elusive. Pure computation speed does not seem to be an advantage for these problems. The human brain, for example, is relatively slow (about 1000 pulses per second) (Treleaven et al., 1989), yet people can recognize entities even when they are obscured. The current thought is that it is the parallelism that contributes to this skill, and that is where neural networks come in (Lerner, 1987).

4.1 Neural Network Classifiers
Figure 5 shows a neural network classifier (Lippmann, 1987). This system accepts input patterns and selects the best match from its storage database. The input values are fed into the matching store, where matches are made based upon the currently available data. The intermediate scores are then passed to the selection block, where the best match is filtered for output. The selection information is returned to modify the matching store data, and thus train the associative network. The data interconnections that are formed during this training session become the associative store, and provide the basis for later content-driven output selection. As selections are made, the feedback modifies the storage information such that the correct associative interconnections exist within the neural network. The weights that are used to interconnect the neurons thus change over time. This allows the network to “learn” what is appropriate behavior for various input conditions. This learning process is accomplished during a set of supervised training trial runs. Different sets of input stimuli are presented
FIG. 5. Neural network classifier.
to the network, and at the end of the sessions the network is either told how well it performed or it is given what the correct answers should have been. In this way, the neural network becomes iteratively better at its task until the point at which it is ready for real, non-training input stimulus. The detailed underlying operation of the neural network is oriented to the higher-level function of selecting from its stored database the pattern that is "most like" the input stimulus. The definition of "most like" varies, depending upon the required operation of the network. If the neural network is created to recognize speech, for example, the comparison might depend upon some encoded version of the raw input stimuli, extracting the frequency content over some time snapshot. On the other hand, a vision recognition system might break a picture into pixels and use a gray-scale level to represent and compare two images. Teuvo Kohonen, of the Helsinki University of Technology, has developed an associative memory using a neural network that does an astounding job of recognizing partially obscured facial images (Kohonen et al., 1981).
4.2 Neural Network as a CAM
A neural network can also be used as a CAM (Boahen et al., 1989; Verleysen et al., 1989a, b), providing the correct output when only part of an input pattern is available. One example of this use is a bibliographic search subsystem. A partial citation could be input to the neural network, and the entire bibliographic entry would be found and output. This could be handled by training the neural network to recognize any piece of the bibliographic reference exactly, or by recognizing which internal reference most closely matches the input data, even if no field has an exact match. The input data might be encoded in some way and stored as a data pattern that has no easily recognizable association with the actual information. Encoding the data, however, might make classification and recognition easier or more accurate.
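The pattern-completion behavior described in this section can be demonstrated with a small Hopfield-style network, a generic textbook construction rather than any of the specific systems cited: patterns are stored in a weight matrix by the outer-product (Hebbian) rule, and a corrupted input is driven toward the nearest stored pattern.

    import numpy as np

    # Two stored +/-1 patterns (orthogonal, which helps recall).
    patterns = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
                         [ 1, -1,  1, -1,  1, -1,  1, -1]])

    # Hebbian (outer-product) storage with a zero diagonal.
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)

    def recall(probe, steps=5):
        state = np.array(probe)
        for _ in range(steps):                      # synchronous threshold updates
            state = np.where(W @ state >= 0, 1, -1)
        return state

    noisy = [-1, 1, 1, 1, -1, -1, -1, -1]           # first pattern with one bit flipped
    print(recall(noisy))                            # -> [ 1  1  1  1 -1 -1 -1 -1]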
5. Associative Storage, Retrieval, and Processing Methods
In an associative memory, we must assign state variables to conceptual items and the connections between them. The associative recall takes the form of a response pattern obtained on the output when presented with a key pattern on the input. A further input in the form of a mask pattern includes context information that selects closeness of recall. In this way, a broad search might turn up enough information to allow a more narrow
search using a new key or mask context. Many relationships can be considered when manipulating the data within an associative memory, and the intersection of the relevant items can be used for specific recall. An example of this might be a search for all the people on a particular street who had incomes above 20,000 dollars per year. Depending upon how the information was stored, this might take one pass through the CAM or two. In the two-pass method, the first pass could provide the name of the street as the key (or key and mask combination), and the output would be a list of names. This list would be buffered somewhere, and the second pass would provide a key and mask combination that only matched people with incomes greater than 20,000 dollars. The intersection of the two lists (names of people on the street and people who made more than 20,000 dollars) is the target in this example. One way to further process the two lists would be to feed the fully extracted information from each associative pass into a standard computer, where they would be combined sequentially. This would not provide the performance of a totally parallel CAM, but would still be faster than doing the entire operation on that same sequential computer. A two-pass associative strategy could be implemented by loading the results of the first pass (including name and income) into a CAM buffer, and providing the income match as the second-pass key. The second-pass search key would be applied to the new buffer CAM, which contained the retrieved list of names. This would provide a list of matches that already contained the intersection of street name and salary. If the information was properly structured in the initial CAM, a one-pass solution to this problem is possible. For example, if the entry for each name included the person's street address and income, a key and mask combination could be formulated that matched only those entries falling into the appropriate intersection. As the previous example hints, information in an associative memory can be arranged in different ways. The various elements in the memory can be linked by direct or indirect association.
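The street-and-income query discussed above can be written out as key/mask searches over fixed-format records. The records below are invented, and the loop stands in for comparisons that a hardware CAM would evaluate concurrently:

    # Fixed-format records: (name, street, income).
    residents = [
        ("Adams", "Elm St",  18000),
        ("Baker", "Elm St",  27500),
        ("Clark", "Oak Ave", 41000),
        ("Davis", "Elm St",  52000),
    ]

    def search(records, street=None, min_income=None):
        """Fields left as None are masked out of the comparison."""
        return [r for r in records
                if (street is None or r[1] == street)
                and (min_income is None or r[2] > min_income)]

    # Two-pass formulation: first select by street, then refine by income.
    on_street = search(residents, street="Elm St")
    print(search(on_street, min_income=20000))              # Baker and Davis

    # One-pass formulation, possible because both fields sit in the same record.
    print(search(residents, street="Elm St", min_income=20000))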
5.1 Direct Association
Direct association makes a logical link between the stored items. Using one of the items as a key causes the memory to present the associated item. This method of association is limited to systems where the interconnection fields of all the data items are known at the time the information is stored. A direct association storage mechanism can have more than two items in the link, as long as they are fixed and specific. In the example at the start of this section about street names and income levels, it was mentioned that
certain methods of storage could provide the ability to retrieve the intersection in one pass. A direct association system might allow this. If the name, street, and income were all combined directly, then one key and mask combination could be used to query the database and pick off the matching entries. The drawback to this, of course, is that every conceivable link must be known and fixed from the start. The larger and more subtle the interconnections become, the more cumbersome this method is. If we wanted to add religion, political affiliation, and marital status to the list, it would soon be impossible to provide a one-pass answer to any reasonably useful query. Beyond that, it would be impossible to query the database using any link that was not understood during the storage.

5.2 Indirect Storage Method

The indirect storage method involves the use of inferences to save information, giving an object a certain value for an attribute. In a simple case, three pieces of information can be stored for each set. By providing one or two of the pieces of information as a key, the entire triple can be accessed.

FIG. 6. Indirect association.
Consider an apple with the color red, as shown in Fig. 6. The object here is "apple," the attribute is "color," and the value is "red." This can be represented by the triple (apple, color, red) (Kohonen, 1977; Stuttgen, 1985). By providing the value and the attribute (X, color, red), we extract the name of the object (X = apple). Alternatively, we could present the object along with the attribute (apple, color, X) to extract the value of the color (X = red). If we present only the object (apple, X, Y), the given response is both the attribute and the value (X = color, Y = red). This returns general information about the object. Relational structures such as this can be built up to create complex concepts (Kohonen, 1977). In this example, the database could be expanded to include information about other attributes (taste, feel, etc.) and contain other objects with separate or overlapping values.
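The (object, attribute, value) retrieval patterns can be expressed directly in software: a query is a triple in which any position may be a wildcard, and the store returns every matching triple. A minimal sketch of the idea, with a couple of invented facts added to Kohonen's example:

    triples = [
        ("apple", "color", "red"),
        ("apple", "taste", "sweet"),
        ("lemon", "color", "yellow"),
    ]

    X = None   # wildcard marker

    def query(pattern):
        """Return all stored triples consistent with the pattern."""
        return [t for t in triples
                if all(p is X or p == v for p, v in zip(pattern, t))]

    print(query((X, "color", "red")))       # which object is red?  -> apple
    print(query(("apple", "color", X)))     # what color is the apple?
    print(query(("apple", X, X)))           # everything known about the apple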
5.3 Associative Database Systems

The ideas presented above can be embodied in an associative database system using the general concepts of database theory (Gillenson, 1987, 1990). Information is stored as objects, or entities, with descriptive
characteristics called attributes. Within the database, the entities have associations with one another. Combining associations leads to a relationship. For example, a mechanic might have taken a particular course, and the two entities of "mechanic" and "course" form an association. The mechanic's grade in that course would be an attribute of the relationship. The three database models in use today are the hierarchical, network, and relational systems (Bic and Gilbert, 1986; Holbrook, 1988). The relational database model, the newest of the three structures, overcomes the record-oriented limitation of the hierarchical and network descriptions. It allows many-to-many relationships without the pointer overhead. Since the information is not stored with a predefined structural relationship, the relational model is the most appropriate for implementation as an associative processing system. The traditional relational model can be referred to as complete. Complex relationships can be described, but each piece of information is explicitly represented. Large knowledge bases can also be stored and processed using an incomplete relational model (McGregor et al., 1987). In this model, each entity is designated to be a member of a particular class, which is a category or type identification. Class information is stored at the level of the class, and each member is assumed to have the characteristics of its associated class. This storage method allows a limited inference mechanism. A type lattice can be created from this definition, where entities can be considered part of more than one set. This relationship is shown in Fig. 7.

FIG. 7. Relationship between classes in a type lattice.

The generic relational model (GRM) has been created to precisely define this method of database processing. The GRM consists of objects, or subsections, which communicate by means of a message-passing protocol. A query will normally describe an implicit tuple, which will then be translated
through model expansion, or inference, to a set of explicit tuples. Section 6.2.12 describes a system that implements the GRM in a real computer. At the user level, the informational query to a database system is obviously not formed in tuples or sets. Rather, it takes the form of a specific question. The query can ask for an exact match ("Retrieve the record whose identification number is 1357"), a partial match ("Retrieve the records whose first name is Smith and who are Republicans"), or an orthogonal range ("Retrieve the records for all identification numbers between 1000 and 2000") (Wu and Burkhard, 1987). The storage and retrieval hardware/software mechanism must be matched to the underlying architecture to obtain satisfactory performance at reasonable cost. Many examples of this symmetry are provided in the sections that describe the associative systems that have been conceived and implemented.

5.4 Encodings and Recall Methods
Various recall methods are possible when using an associative memory for search and match operations (Kohonen, 1977).

5.4.1 Hamming Distance
One of the earliest methods of best-match retrieval was based on the Hamming distance devised by R. W. Hamming (Hamming, 1980). Many computer systems now use "Hamming" error-correcting codes, which are designed with the assumption that data and noise (error information) are random. In other words, no statistical basis exists for assuming that certain patterns will be more prevalent than others. We can construct Hamming codes to allow correction of any number of random error bits. They work by using only a portion of all the data patterns possible given a certain number of bits in the word. In practice, extra check bits are added to a number of data bits, and the combination of check and data bits forms the data word. The code's error detection ability is symmetrical, in that an error in either a data bit or a check bit will be handled properly. The number of extra check bits necessary depends upon the size of the usable data word and the required correction capability. For example, to detect and correct 1 error bit in 32 bits of usable data, 7 check bits are needed; for a 64-bit data word, 8 check bits are required. A geometric analysis of the same coding technique introduces the concept of a "Hamming distance." If all the possible patterns of data and check bits are enumerated, only a subset are used for actual stored information. The legitimate information patterns are stored such that a data word with errors
is geometrically closer to one correct data pattern than any other. We can picture this as a distance between legitimate data words, as shown in Fig. 8 for a 3-bit code. The two correct data patterns in this example are (0, 0, 0) and (1, 1, 1). The other corners of the cube are possible data patterns with at most a single bit error. The circled corners are the data patterns that will be corrected to (0, 0, 0), and the corners with squares will be corrected to (1, 1, 1). The dotted lines form planes that separate the two data domains. An associative memory can use the concept of Hamming distance for inexact retrieval by allowing the correction process to respond with the closest data pattern for a given query key. In this case, a single response is possible, and any input will respond with some output. By choosing an appropriate Hamming distance between legitimate data words, we can construct a robust associative memory. More legitimate data patterns can be accommodated for the same number of memory bits if the system will accept multiple responses. In this case, there might be many memory words at the same Hamming distance in relation to the key word, and each of them is accounted for in the memory response.
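Best-match retrieval by Hamming distance amounts to counting differing bit positions between the key and each stored word and keeping the minimum (or every word at the minimum, if multiple responses are accepted). A small sketch with arbitrary 8-bit words:

    def hamming(a, b):
        return bin(a ^ b).count("1")        # number of differing bit positions

    stored = [0b00000000, 0b11111111, 0b11110000, 0b00001111]

    def best_matches(key):
        """Return every stored word at the minimum Hamming distance from the key."""
        distances = [hamming(key, w) for w in stored]
        d_min = min(distances)
        return d_min, [w for w, d in zip(stored, distances) if d == d_min]

    print(best_matches(0b11110010))   # distance 1, closest word 0b11110000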
5.4.2 Flag Algebra

A data transformation method using a concept called "flag algebra" (Tavangarian, 1989) has been suggested to enhance the parallel processing of associative data in a uniprocessor hardware system. This approach replaces complex searching, arithmetic, and logical operations with simple Boolean functions. The system consists of three major components. First, the word-oriented data must be transformed into flag-oriented data. This new representation identifies each word as a flag in a bitvector; the data is processed by manipulating the flags. The second part of the system processes the flag-oriented data using a new algebra based on set theory, Boolean algebra, and the special characteristics of the flagvectors. This new processing method, called "flag algebra," is used to manipulate all the flags simultaneously.
FIG. 8. Three-dimensional model of Hamming distance.
Finally, the flag-oriented resulting bitvectors must be converted back to word-oriented data. A flag-oriented associative processor has been suggested for the implementation of the above method. The program and word-oriented data are stored in the sequential PD memory (program/data memory). The word-oriented data coming from the PD memory or the input/output (I/O) units are converted to flag-oriented data during program execution and stored in the flag memory and flag registers. Parallel, associative techniques are used to manipulate the flag-oriented data. A sequential control unit directs the operation of the processor, and obtains its instructions from the PD memory.
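The flag-oriented representation can be loosely imitated with ordinary bitmaps: each distinct value owns a bit-vector with one flag per record, and a search becomes a Boolean combination of whole vectors rather than a record-by-record scan. This is only a rough software analogue of the idea, not Tavangarian's actual processor:

    # Records 0..5 hold colors; transform the column into one flag-vector per value.
    colors = ["red", "blue", "red", "green", "blue", "red"]

    flags = {}
    for position, value in enumerate(colors):
        flags[value] = flags.get(value, 0) | (1 << position)

    # Searching for "red or blue" is now a single Boolean OR over the flag-vectors.
    hits = flags["red"] | flags["blue"]
    print([i for i in range(len(colors)) if hits & (1 << i)])   # -> [0, 1, 2, 4, 5]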
5.5 Memory Allocation in Multiprocessor CAMs

Multiprocessor systems can also make use of content-addressable memories, but care must be taken in such cases to allocate the CAM appropriately. In particular, the memory must be allocated so as to reduce the conflicts that arise when more than one processor needs access to the same associative data. An example of this is a system incorporating several CAMs to store the tables used in a multiprocessor relational database engine. In a traditional, location-addressed memory, noninterference can be guaranteed by giving each processor a unique address range. When the data is instead manipulated by content, the solution to this problem is less obvious. To make the best use of the higher cost associated with the CAM hardware, an allocation strategy should provide for minimal overhead to prevent access contention. That is, most of the storage should be used for real data. Furthermore, all memory allocations should be conflict free. Finally, during one revolution of a circulating-type memory, the contents of each CAM should be capable of supporting a complete fetch by a set of noninterfering processors. Kartashev and Kartashev (1984) discuss a method for supporting such a system in terms of minimal and nonminimal files. A minimal file includes a set of data words that can be accessed by the same processor (let us say, processor P) in consecutive revolutions without the possibility of another processor changing them between accesses. Those data words are "connected" to processor P for the consecutive revolutions. A nonminimal file describes a set of data words that do not have this property. Both minimal file allocation (where each processor accesses only its minimal file during a memory revolution) and nonminimal file allocation (where each processor can access a nonminimal data file during a memory revolution) are described in the referenced paper.
5.6 CAM Reliability and Testing

When we discuss retrieval of information, it is assumed that the information occupying the CAM cells is correct. That is, the data stored in the memory is what we intended to put there. Even if we ignore the possibility that some software flaw put erroneous data into the memory, we cannot guarantee that the information is correct, since real hardware devices do fail on occasion. Given that CAM-based hardware is likely to grow in both storage capacity and importance in the years ahead, the reliability of such systems is a legitimate concern for designers (Grosspietsch, 1989). Testing a content-addressable memory is far from trivial, and current RAM test methods cannot be directly applied to this new technology. From a fault perspective, the CAM can be viewed as a combination of traditional random access memory (RAM) storage with extra logic to perform the comparison, masking, selection, etc. (Grosspietsch et al., 1986). So we can expect the same kinds of stuck-at, coupling, pattern-sensitive, and leakage problems already familiar to conventional memory designers. Furthermore, CAM faults can be classified into two broad categories: (1) a word that should match the search key misses, and (2) a word that should not match the search key hits. To detect errors on the fly, an error-detecting code can be added to the CAM storage section. The false "miss" fault (the first type above) can be detected during CAM verification by loading a pattern into all the CAM words, and presenting that pattern as a search argument. Every word should "hit" in the CAM, and any word that does not show a match can be assumed to have a problem. Diagnosis can be made even better by providing the ability to read (as well as write) the search, mask, and match (hit) registers. One proposed solution to detect a false "hit" (the second fault type above) is to add extra, strategically located, hardware. For each retrieval of a matching word (based upon the "hit" register), an extra comparison can be made between the search/mask register combination and the output match register. If the selected output data word does not match the properly masked input search register, a false hit is confirmed. Another method suggested to test CAMs is to gang entire groups of cells together, and to look for any words that do not follow an expected retrieval pattern (Mazumder and Patel, 1987). For example, all the odd lines can be grouped into one class and the even lines grouped into another class. If all the even lines react the same (expected) way, then they are all assumed to be good. If at least one line does not provide the same matching output as the rest, a single error line can present the evidence of this error. In this way, a good CAM is tested quickly, since the error line will show that all the words reacted as expected. When employing any of the above methods, the
use of proper test patterns can speed the verification process and allow more specific diagnosis when an error is detected. Once an error has been detected in the CAM, the question remains about how to deal with it (Grosspietsch et al., 1987). The entire memory can be rendered unusable, but this becomes less attractive as the memory grows in size and is implemented in ever more dense technologies. Some better methods to deal with bad cells are:
1. Swap a spare (good) cell with a bad one (cell redundancy)
2. Mark the bad word locations as unusable (graceful degradation by shrinking the number of words)
3. Triplicate each search and use a voting mechanism to correct errors.
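The false-"miss" check described above is easy to state in software terms: write one pattern to every word, search for that pattern, and flag any word whose match fails to appear. A schematic sketch with a deliberately planted fault; real test hardware would also exercise the mask and hit registers:

    def false_miss_test(write_all, read_all, pattern):
        """Write `pattern` to every word, then search; report any word that fails to hit."""
        write_all(pattern)                                    # broadcast write
        return [i for i, word in enumerate(read_all()) if word != pattern]

    # Toy 8-word memory in which word 2 has one bit stuck at zero (a planted fault).
    words = [0] * 8
    def write_all(pattern):
        for i in range(len(words)):
            words[i] = pattern if i != 2 else pattern & ~0b0100

    print(false_miss_test(write_all, lambda: words, 0xA5A5))  # -> [2]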
6. Associative Memory and Processor Architectures
The data in an associative memory can be grouped by bits, bytes, words, variable fields, or blocks. Furthermore, the architecture can identify the information by discrete data fields (such as words), or distribute the information throughout the memory (a neural net is one example). The architectural trade-offs that must be considered when deciding the orientation of the memory are (Thurber, 1976):

• The storage medium
• The communication between cells
• The type of retrieval logic
• The nature and size of external logic (such as registers and input/output ports)
The ultimate goal of a word-oriented CAM is to compare each data word with a key word, appropriately modified by a mask word. Every word in the memory is inspected in parallel, and any matching words are identified by a "match" bit (of which there is one per data word). The words can then be retrieved by cycling through the match bits. This tends to carry a high cost premium due to the comparison circuitry necessary for every single bit in the memory. In practice, some serialization is ordinarily used to get close to this ideal of totally parallel operation.
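The word-oriented search just described reduces to a masked comparison performed against every word at once, yielding a vector of match bits; a priority encoder (or a scan of the match bits) then reads the responders out one at a time. A behavioral sketch with an arbitrary word width and mask convention:

    def cam_search(words, key, mask):
        """Return the match-bit vector for a masked compare of `key` against every word."""
        return [(w & mask) == (key & mask) for w in words]

    def first_responder(match_bits):
        """Priority encoder: index of the first matching word, or None."""
        for i, hit in enumerate(match_bits):
            if hit:
                return i
        return None

    words = [0x12A4, 0x56A4, 0x9B70, 0x12FF]
    hits = cam_search(words, key=0x00A4, mask=0x00FF)   # match on the low byte only
    print(hits)                   # -> [True, True, False, False]
    print(first_responder(hits))  # -> 0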
A typical method of simplification is called word-parallel, bit-serial (or bit-slice). The data in a bit-serial associative memory (Fig. 9) is inspected one bit at a time, with the same bit in every word in the memory examined simultaneously. The bit position in each memory word is handled as a slice running from the first memory word to the last. The search time through such a memory is related to the word width rather than the memory depth (number of words). This is explained by the fact that the entire memory depth is searched in parallel for each bit in the word. An example of a word-parallel, bit-serial design is that developed by Blair (1987). A word-parallel, byte-serial memory can also be constructed on the same principle, only the "slice" is a byte rather than a bit. This reduces the number of shifts necessary to compare the entire data word at the cost of more comparison circuitry (since an entire byte is being compared in parallel for each data word). Although less useful in practice, a word-serial, bit-parallel associative memory can also be created (Kohonen, 1987). In this CAM architecture, one whole word is read in parallel and compared to the key/mask combination. The memory cycles through all the words in the memory sequentially with each associative access. This reduces the number of parallel comparison circuits at the cost of an increased cycle time, growing as the number of word entries grows. The advantages of this method over simply using a normal address-based memory (and programming the comparison) are simplicity and access speed. The programmer can simply specify the key word and the hardware handles the actual work. Having the sequential comparison happen at a very low hardware level makes this operation much faster than if a program were to execute it one word at a time. The hardware
FIG. 9. Bit-serial associative memory block diagram.
designer can use the limited access types and distances to optimize the cycling of the data words, perhaps using block-mode mechanisms or other technology-specific cycling abilities. An associative memory architecture that has been considered useful, especially for large database systems, is called block-oriented (Su, 1988). The systems that use this architecture are largely derived from Slotnick's "logic per track" system, described in his classic 1970 paper (Slotnick, 1970). The simplest way to understand this type of memory is to envision a rotating disk, a commonly used method of implementing a block-oriented CAM (Smith and Smith, 1979). Each cylinder of the disk is broken into blocks, and there are multiple processing elements (a disk read/write head, perhaps with some filtering logic), one for each cylinder. The information on the rotating medium serially passes by each of the parallel processing elements. The amount of information that can be accommodated using such an architecture can vary dramatically, depending upon the clocking scheme chosen. For example, Parhami (1989) has shown that a 70-90% capacity improvement can be realized at minimal cost by moving from a single scheme to one where the tracks are divided into equal-capacity groups. It is, of course, not necessary to view the information in the memory as a series of words at all. A distributed logic memory (DLM) (Lee, 1962) places the comparison and manipulation logic into each cell, and thus performs the comparison function on every cell truly simultaneously. The information content of the memory is not made up of discrete information locations. Rather, it consists of information distributed throughout the entire memory. This system is described further in Section 7.2.
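The word-parallel, bit-serial search described above can be caricatured in software by walking the key one bit position at a time and pruning, at each step, every word whose bit-slice disagrees; the number of steps tracks the word width rather than the number of words. A sketch with an assumed 4-bit word width:

    def bit_serial_search(words, key, width):
        """Keep a candidate set and eliminate words one bit-slice at a time."""
        candidates = set(range(len(words)))
        for bit in range(width):                      # one pass per bit position
            key_bit = (key >> bit) & 1
            candidates = {i for i in candidates
                          if ((words[i] >> bit) & 1) == key_bit}
        return sorted(candidates)

    words = [0b1011, 0b0110, 0b1011, 0b1111]
    print(bit_serial_search(words, key=0b1011, width=4))   # -> [0, 2]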
6.1 Associative Memory Design Considerations

As with a conventional memory, the three most important design considerations for the construction of an associative memory are speed, cost, and density. The speed of a content-addressable or associative memory depends upon (Stuttgen, 1985) the access time of the individual storage elements, the cycle time of the comparison system, and the degree of parallelism in the operation. This can be seen most clearly by example. A bit-serial associative memory has less inherent parallelism than a distributed logic memory. If the word width is (for example) 10 bits, then it is going to take at least 10 cycles through the comparison system for every lookup. The distributed logic memory (as well as the all-parallel word-oriented CAM) inspects every cell totally in parallel, and achieves its matching decision in one cycle. However, a DLM has a longer cycle time for each cell operation than a bit-serial memory because of the extra logic in each cell and the interconnection hardware required.
The useful speed of a particular associative memory thus depends upon the characteristics of the application, and how they relate to the architecture of the memory. Word-oriented (including bit- and byte-serial) CAMs perform better when executing data and arithmetic computation problems, where the processing information is naturally in word format. Fewer cycles (than on a distributed logic memory) should be needed to read the operands, perform the operation, and store the results. A distributed logic architecture, on the other hand, works very well for equality comparisons, since operations of that kind are naturally performed in parallel and simultaneously over all the bits in the memory (Thurber, 1976). The task of input/output is a problem in a bit-serial CAM, since a conventional location-addressable word-oriented computer is likely to be the interface to the world of humans. Some method of accessing the information by word is necessary, which entails the ability to read and write in a direction orthogonal to the normal processing operations. A totally distributed logic memory has the same problems, since no word access is even necessarily defined. However, a more structured DLM (such as an all-parallel but word-oriented CAM) can provide I/O in a format easily used by a normal computer. The cost of an associative memory is controlled by the price of the storage elements, the cell interconnection expense, and the amount (and per-gate cost) of logic associated with each element. In general, a bit- or byte-serial architecture will be less expensive than a distributed logic architecture because of the smaller amount of logic for each storage element. The density of a memory system is related to the size of the storage elements, the overhead associated with the comparison, and the amount of interconnection between the elements. A bit-serial system will be more dense than a DLM due to a reduced comparison overhead (remember that the DLM has the comparison in every bit). The DLM is also likely to have more interconnection than the bit-serial CAM, especially if the storage is very unstructured.
6.2 Associative Processors

6.2.1 The STARAN
The most notable truly associative processor is the STARAN (Batcher, 1974), developed by the Goodyear Corporation in the early 1970s. What gives it such prominence is that it was successfully offered for sale to paying customers, and it can thus be considered the first practical associative processor ever produced. The STARAN architecture was designed such that an off-the-shelf implementation was feasible, and this cost-reduction principle, coupled with some clever features, probably contributed to its success (Feldman and Fulmer, 1974). There are many good descriptions of the machine in
varying degrees of depth (Foster, 1976; Thurber and Wald, 1975; Thurber, 1976), so only an overview is provided here. The description here will be used in a later section to give one example of associative processing software. The STARAN machine consists of an associative array, a PDP-11 sequential controller unit, and a sequential program memory. The controller executes operations on the associative array based upon normal sequential instructions residing in the program memory. The associative array can be made up of several array modules, each of which is described next. In the STARAN associative module there is one processing element (PE) for each word, and the set of PEs forms a bit slice across the data words. The associative processing is performed in a bit-serial manner, but I/O is performed in word-serial form (where a data word is accessed with all its bits in parallel). These two access methods (bit and word) are accommodated through a multidimensional access (MDA) memory, which is 256 bits by 256 bits. The MDA allows the memory information to be accessed by word, by bit-slice, or by other fixed formats (such as a number of records with 8-bit bytes in each record).
6.2.2 A Hierarchical Associative Memory System

A hierarchical associative memory system has been described by Stuttgen (1985) to take advantage of the tradeoffs available between performance and cost. The first level of the hierarchy would be a fast, flexible, distributed, relatively expensive (but still cost-effective) associative memory that would operate on local data quickly. The second level of the hierarchy would be a larger, slower (perhaps bit-serial), less expensive associative store containing the entire database or program. This two-level approach is conceptually similar to having a cache buffer between a traditional computer and its location-addressed memory. The analogy to a cache extends to the user's perspective in the design, since the high-speed memory must be architecturally invisible to the programmer (except for the performance increase, of course). As far as the user is concerned, the associative memory is one large, powerful storage unit. Therefore, dedicated hardware would control the interaction between the various levels of storage. The local "cache" (first-level memory) would process the currently active subset of data items in this system, only going to the larger (but slower) second-level storage when it ran out of room. Lest we carry this analogy with a traditional memory too far, let's look at some major and important differences. In an associative computer, much of the processing happens in the memory itself. The parallel associative matching and manipulating can
therefore take place on either level of storage. This is totally unlike a location-addressed system, where all the processing must eventually travel to some specific processing element [central processing unit (CPU) or smart peripheral]. The question of when to send information between the two storage (and processing) levels is thus made more difficult, but also more flexible. One way to handle the problem of data transfer is to split the main storage (the second level of the hierarchy) into sections. The system would determine which sections to transfer to the first level based upon a tradeoff between the overhead of the interlevel data transfer and the expected performance gain (a standard cost/benefit analysis). Another solution to the data transfer problem is to perform an associative search of the main storage with a particular key and context mask. The number of responses from the main memory could be counted, and the mask or key could be modified until the number of responding cells fit into the first-level memory. In either of these scenarios, and in any formulation of level management, fast information interchange between the levels is important. We presented an earlier example in Section 5, where we wished to query a database and have returned to us all the people on a particular street who made more than a certain income level. The hierarchical system above could be used to good advantage for this by embarking on a two-pass search. The first pass would retrieve all the people on the street from the large, slower database storage. The information from this search could be loaded into our fast, local CAM, and the income level could then be used to query the local memory.
6.2.3 The HYTREM Database Search System

A different hierarchical approach was taken by the designers of the HYTREM (Hybrid Text-REtrieval Machine) database search system (Lee, 1990), as shown in Fig. 10. In this system there are actually three levels of hierarchy, the top two of which are associative. Before describing the system in a little more detail, it is important to understand its intent, so that the elegant operation of the various pieces can be appreciated. The HYTREM is meant to store and retrieve large text databases efficiently. It uses a text signature to initially screen the entire database, then does a final selection on the remaining entries in a more accurate manner. The first level of hierarchy is a relatively small but fast bit-serial associative memory that stores a hashed signature of the database. The compressed signature information is typically only 10-20% as large as the entire database, and can thus be searched quickly. The first screen eliminates all the text records that cannot possibly match the query. The remaining records are likely to match, but there may be some false-positive indications (what
FIG. 10. Diagram of the HYTREM system.
the designers call false drops). A multiple-match resolver (MRR) is included to retrieve the qualified pointers and send them to the next stage. The next level of hierarchy is a text processor with more complex pattern matching capabilities, called the associative linear text processor (ALTEP). The ALTEP does a more thorough matching operation on the text that is delivered from the signature file, and will make a final determination about the appropriateness of the text to the query. It is implemented as a linear cellular array, optimized for signature file access. The text in a signature file system is broken into fixed-length blocks and is loaded into the ALTEP cells on demand. At that point the array functions as an associative matching processor. The ALTEP also has an MRR to deliver the information to a sequential controlling element. The final, lowest level of hierarchy is a mass storage system. This contains the entire database and delivers the text for both the ALTEP matching operation and whatever further information is requested after a successfully matched query. The designers of the HYTREM envision this storage level as a set of magnetic disks, with a cache memory somewhere in the path to add a performance boost. Even with the relatively slow access time of a disk, they believe performance can be kept respectable through parallel and overlapping operation of various system components.
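The two-stage screening behind HYTREM can be illustrated with hashed word signatures: each text block is summarized by a superimposed bit signature, queries are screened against the signatures first, and only the surviving blocks are examined exactly, which also weeds out false drops. The signature width and hash choice here are arbitrary:

    SIG_BITS = 64

    def signature(words):
        """Superimpose one hashed bit per word into a block signature."""
        sig = 0
        for w in words:
            sig |= 1 << (hash(w) % SIG_BITS)
        return sig

    blocks = [["parallel", "memory", "search"],
              ["magnetic", "disk", "storage"],
              ["associative", "memory", "chip"]]
    sigs = [signature(b) for b in blocks]

    def retrieve(query_words):
        q = signature(query_words)
        screened = [i for i, s in enumerate(sigs) if (q & ~s) == 0]   # cheap first pass
        # Exact second pass removes any false drops that survived the screen.
        return [i for i in screened if all(w in blocks[i] for w in query_words)]

    print(retrieve(["memory", "search"]))   # -> [0]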
6.2.4 Syracuse University Database System

An efficient data/knowledge base engine has been suggested by researchers at Syracuse University (Berra, 1987b), targeting huge systems comprising
hundreds of gigabytes. As with the HYTREM system, they suggest the use of a hashed signature or descriptor file, which they call a surrogate file, to greatly compress the access information contained in the full (or extensional) database (EDB). The surrogate file can be used to index the EDB, and even provides the ability to perform some relational operations on the data without access to the larger EDB. An associative memory would be used to implement the surrogate file, and any retrieval that depends upon only a partial match would be performed directly at this level. The memory would perform functions such as exact match, maximum, minimum, and Boolean operations. Since the surrogate file is compact and regular, the major drawbacks to associative memories (prohibitive cost for large storage and data format rigidity) are no longer issues. At some point the entire reference must be obtained from the full EDB, and the hashing function must be chosen so as to ensure efficient data transfer. The surrogate file can be created using a superimposed code word (SCW) mechanism, where the individually hashed values are ORed together. The index into the EDB would be guaranteed to retrieve any existing facts that match the query, but using the SCW method there might be unwanted facts retrieved (false drops). These must be eliminated after retrieval, which means more data transfer from the EDB and more postretrieval processing. An alternative hashing function, called concatenated code words (CCW), concatenates the individually hashed entities before storage in the surrogate file. This makes it far more likely that all the retrieved facts are desired (few false drops), but necessitates a longer word length to accommodate the extra information. Using a CCW should reduce the amount of data transfer traffic between the EDB and the front-end processing unit. Simulation of the two hashing schemes described above has shown that the surrogate file can easily be less than 20% of the size of the entire EDB, and that the hashing function must be chosen based upon the characteristics of the information contained in the database. The amount of redundancy in the database must be analyzed to determine which of the hashing functions will provide the smaller surrogate file.
6.2.5 CAM-Based Hierarchical Retrieval Systems

Hashizume et al. (1989) discuss the problem of data retrieval and also suggest a hierarchical architecture to obtain good performance for a reasonable cost. Although modern integrated circuit technology has provided the ability to create a truly useful CAM, they argue that it will always be more expensive than a large bulk memory device. Their model of the proper associative system consists of a parallel VLSI CAM as the local high-speed
search engine, and a block-oriented mass storage device as the main memory. The blocks in the mass storage device would correspond to the capacity of the local CAM, and data would be transferred back and forth from the mass storage in single block packets. Their paper evaluates the performance of their proposed system by making assumptions about the data characteristics, and changing various parameters to determine the outcome.
6.2.6 The LUCAS System

Another interesting associative processor is the LUCAS (Lund University content addressable system) (Fernstrom et al., 1986). It was built during the early 1980s in order to study highly parallel systems, specifically their architectural principles, programming methodology, and applicability to various problems. The LUCAS system contains four major blocks, as shown in Fig. 11, the most interesting of which is the associative processor array. The processor array is interfaced to the outside world through a standard sequential master processor, which sends instructions to the associative array through a control unit. The processor array is composed of 128 processors, each of which has a 4096-bit memory module and a PE. The processors are configured in a bit-serial organization. Data in a memory module is connected to the PE, and can be used in the same memory module or routed to a different one. A typical operation for the LUCAS is to compare the processor array contents (configured so that all the memory modules are accessed as one associative memory) with some template. Matching words are accessed by a multiple-match resolving circuit. Data is routed among PEs by an interconnection network that allows eight possible sources for each PE input. One of the inputs is dedicated to the path between the PE and its memory, and the other seven can be configured to suit a particular application. This scheme has several important ramifications. The communication links are fixed (not data dependent), and the transfer of information happens in parallel between the source/destination pairs. The data are permuted as they are transferred. If there is no
FIG. 11. Diagram showing the LUCAS system blocks.
If there is no direct link between two PEs, multiple passes through the network must be made to route information between them.

The LUCAS architecture makes it a reasonable engine for certain classes of problems, and the developers suggested several applications for it. It can be used effectively for matrix multiplication, fast Fourier transforms, and graph theoretic problems (such as the shortest distance between points, minimal spanning tree, etc.). Furthermore, LUCAS was suggested as a backend processor for relational database processing, and as a dedicated processor for image processing. (The software for this system is discussed in Section 7.6.)

6.2.7 Matching Hardware and Software
A novel computer architecture has been suggested by Blair and Denyer (1989) that uses the power of content addressability to attack certain classical data structures and algorithms. Among their examples are vectors, lists, sets, hash tables, graphs, and sorting problems. The CAM is bit-serial and word-parallel to provide a constant speed unrelated to the number of words, to minimize the pin count and internal buses, and to keep the memory bit size reasonable. Each word has a tag set associated with it that is used to manipulate the contents of that word. There are two tags that keep the status of a comparison operation, and one tag that identifies the data word as empty ("empty tag"). An empty tag status means the associated data word is undefined and available for new data. Empty locations are also not a part of any comparison.

The CAM operates as follows. A masked comparison is performed on the entire memory, and the appropriate tags will be left "active" after this operation. Each matching word is inspected one at a time, and it is either manipulated somehow or marked as empty (by setting the "empty tag" for that word). The matching tag for that word is then made passive (cleared), which brings up the "next" active tag (from the original matching operation). When all the words from the last match have been inspected and made passive, the CAM signals that there are no more responders.

The smaller modular CAM described above can be cascaded to form large CAMs. A group "lookahead" signal, formed by a NOR function on the tag values, can be used to enhance the performance through a bypass of groups that have no active words. (The software for this system is discussed in Section 7.8.)
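A small behavioural model helps make this tag discipline concrete. The Python sketch below is an approximation written for this survey, not Blair and Denyer's design; the class name, the word width, and the way responders are visited are assumptions.

# Behavioural model of a word-parallel CAM with per-word tag bits (assumed detail level).

class TaggedCAM:
    def __init__(self, words):
        self.data = [0] * words
        self.empty = [True] * words    # "empty tag": word undefined, skipped by searches
        self.match = [False] * words   # comparison-status tag

    def write_any_empty(self, value):
        i = self.empty.index(True)     # any empty word will do; its location is irrelevant
        self.data[i], self.empty[i] = value, False

    def compare(self, value, mask):
        # Masked comparison over the whole memory "at once"; empty words never respond.
        for i in range(len(self.data)):
            self.match[i] = (not self.empty[i]) and (self.data[i] & mask) == (value & mask)

    def visit_matches(self, action):
        # Inspect responders one at a time, clearing each match tag as it is handled.
        for i in range(len(self.data)):
            if self.match[i]:
                action(self, i)
                self.match[i] = False

cam = TaggedCAM(8)
cam.write_any_empty(0b1010)
cam.compare(0b1000, mask=0b1000)
cam.visit_matches(lambda c, i: print("responder:", bin(c.data[i])))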
6.2.8 The DBA System

The DBA (database accelerator) system (Sodini et al., 1986; Wayner, 1991) at Massachusetts Institute of Technology (MIT) is a research vehicle
to discover and evaluate appropriate uses for content-addressable memories. The DBA can be viewed as a single-instruction, multiple-data (SIMD) (Flynn, 1972) content-addressable processor, with the additional ability to store and manipulate "don't care" conditions, to selectively enable subsections of the memory, and to process sequences through a finite state machine. Each 32-bit CAM word in the DBA has its own 1-bit microprocessor, a set of four single-bit registers that can be used to store the results of the current operation or feed into the next operation, a matching-pattern result latch, a selection circuit that enables the word, and a priority encoder to provide a serial output when more than one CAM word hits. The DBA system, which is a combination of the above basic cells, is organized as a set of 1-bit data paths connected in a linear, nearest-neighbor topology.

As an example of the power such a system can provide, the designers suggest its use to enhance the performance of logic simulation. To use the DBA in this fashion, the network under simulation should be viewed as a clocked system, where the logic description can be represented as a sum of products feeding some latching element (most modern synchronous designs meet these constraints). The simulation is carried out by taking the logic network, represented as a Boolean function of its inputs, and computing in parallel the results of a set of functions over a set of variables. This is done in several steps. Before starting the actual simulation, the input variables are assigned unique bit positions in the CAM word. As a simple example, assume that the CAM consists of four words, each word having a width of 4 bits. Consider the expression
D = (A × B̄) + (Ā × C)
which contains three input variables (A, B, C). These could be assigned to the leftmost three bit positions in the CAM word. The DBA's ability to store and manipulate "don't care" conditions is crucial here. The minterm designation can be represented by the words 10X and 0X1 (ABC, where the "X" term means "don't care"). Each minterm is assigned its own CAM word, and one "in-use" bit per word specifies whether that term is to take part in the simulation. If there are more inputs than a single CAM word can accommodate, the simulation must employ a multiple-pass technique, making use of the DBA's sophisticated logical ability and internal storage. The general procedure does not change. The simple example above uses only 2 CAM words to determine "D," so the "in-use" bit (we assume that it is the rightmost bit) is only set in the two significant words. The 4 words would be 10X1, 0X11, XXX0, XXX0.

The actual simulation is carried out in three phases. The first phase in the simulation evaluates the minterms of all the equations, the second phase performs the AND function on the results of the minterms, and the last phase returns the logic values.
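The minterm phase of this scheme can be mimicked in a few lines. The sketch below is only an illustration of the matching rule, not the MIT implementation; it stores each CAM word as a string of trits ('0', '1', 'X') followed by the in-use bit.

# Ternary (trit) matching for the D = (A x B-bar) + (A-bar x C) example.
# Each word holds one trit per input variable plus a trailing in-use bit.

CAM_WORDS = ["10X1", "0X11", "XXX0", "XXX0"]   # minterms 10X and 0X1 are in use

def minterm_matches(word, inputs):
    """True if the stored trits match the input vector (X matches anything)."""
    if word[-1] == "0":                 # in-use bit clear: word takes no part
        return False
    return all(t in ("X", bit) for t, bit in zip(word[:-1], inputs))

def evaluate_d(a, b, c):
    inputs = f"{a}{b}{c}"
    # Phase 1: all minterm words compare against the inputs in parallel.
    hits = [minterm_matches(w, inputs) for w in CAM_WORDS]
    # The later phases combine the minterm results; for this single-output
    # example that collapses to an OR over the responders.
    return int(any(hits))

assert evaluate_d(1, 0, 1) == 1   # the A.B-bar term fires
assert evaluate_d(0, 1, 1) == 1   # the A-bar.C term fires
assert evaluate_d(1, 1, 0) == 0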
6.2.9 The CAAPP System
Another recent bit-serial associative processor, the content addressable array parallel processor (CAAPP) (Shu et al., 1988), has been designed from basic principles to efficiently support an image understanding system. It is one component of the overall architecture, and provides pixel level and symbolic processing. The basic building block of the CAAPP is the PE, which consists of an ALU, support circuitry, and memory. Each PE consists of a 320-bit on-chip memory store and an external 32K-bit "backing store" designed with dual-port video RAMs (VRAMs). The backing store has growth capability if larger VRAMs are used. The PEs are interconnected through a nonmultiplexed full mesh network, providing a compact and efficient topology.

The CAAPP has several interesting architectural features. These include an activity bit that controls PE responses to a query, a some/none response feedback ability, and a method to count responders. The activity bit in a particular PE might be set after an initial query, and would thus leave only a subset of PEs for further processing. The some/none lines from all the processing elements are wired together such that only a single output line needs to be monitored by the system to determine if any matching responses were obtained.

An example of where this type of system might prove useful is in the creation of a histogram that exhibits the informational content in an image. The controller would broadcast data that describes a particular pixel intensity range. Any matching PE would set its some/none line, and the response count circuitry would quickly determine how many active lines matched that range.
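A rough software analogue of the histogram example is sketched below. The function and the intensity ranges are assumptions made for the illustration; in the real CAAPP the per-range count comes from the wired some/none line and the response-count circuitry rather than from a Python loop.

# Sketch: building an intensity histogram with broadcast compare + responder count.
# One pixel value lives in each PE; bin edges are illustrative assumptions.

def histogram(pixels, bin_edges):
    counts = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Controller broadcasts the range; every PE compares its pixel in parallel.
        responders = [lo <= p < hi for p in pixels]
        # some/none line: one wired signal says whether anyone matched at all.
        some = any(responders)
        # response-count circuitry: how many PEs raised their line.
        counts.append(sum(responders) if some else 0)
    return counts

print(histogram([0, 3, 7, 7, 12, 200, 255], [0, 8, 64, 256]))   # -> [4, 1, 2]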
6.2.10 The CAFS

Another associative subsystem, called the content addressable file store (CAFS) (Burnard, 1987), has been created by ICL and is available on all their mainframes. It is designed to search efficiently through great quantities of unstructured text data, replacing cumbersome and often inadequate software measures such as tagging and indexing. The CAFS hardware, built into ICL's disk controllers, consists of four major sections. The logical format unit identifies logical records within the byte stream. The retrieval unit converts the input byte stream into records for transmission to the CPU. The search evaluation unit does the actual search based upon data and mask information supplied. This unit determines if a particular record should be retrieved. The retrieval processor accumulates the matching responses ("hit" records) and can perform other simple arithmetic operations on those records.
The CAFS system has the ability to process a special tagged structure called the self-identifying format (SIF). Properly tagged tokens can be stored in records of arbitrary length (fixed or variable). Using the SIF, the CAFS engine can identify tokens of any type, independently specifying the types to be searched for and retrieved, even applying a mask to each tag as it is processed. So, for example, the CAFS can search for any name, a name of some particular type, any surname, a surname of one type, etc. This search and processing ability is limited only by the length of the chosen tag.
6.2.11 The GAPP
A unique content-addressable memory approach has been chosen for the geometric arithmetic parallel processor (GAPP) (Wallis, 1984). A systolic array architecture is used to create a bit-serial CAM. It can search for data based upon its content, optionally performing logical and arithmetic operations on that data. Each array chip contains 72 single-bit processing elements, each element having access to 128 bits of RAM. The PEs operate in parallel as an SIMD machine, and more than one GAPP chip can be combined for greater capacity and flexibility.
6.2.12 The GAAP
A large knowledge base machine is currently being designed by researchers at the University of Strathclyde, Glasgow, Scotland, in collaboration with Deductive Systems, Ltd. (McGregor, 1986; McGregor et al., 1987). This machine will implement the generic relational model (GRM), briefly described in Section 5.3. The major associative component of this computer is the generic associative array processor (GAAP). This processor allows the hardware mechanism to inferentially expand the implicit query tuples into a set of explicit ones. In the GAAP architecture, a traditional sequential processor controls an array of custom VLSI associative chips. The controller also has its own dedicated RAM to coordinate the interchip information. Connections among lines within one cell and between lines in different cells provide the information about set membership. The intercell and intracell communication matrix can be used to perform the operations needed in the GRM. These operations include set membership insertion and deletion, upward closure to determine the sets to which an entity belongs, and downward closure to ascertain the members of a set. Set operations such as union, intersection, and selection are also implemented.
6.2.13 The ASP
Lea has written extensively about his associative string processor (ASP) (Lea, 1986a, b, c; 1987a, b), offered as a cost-effective parallel processing engine, the architecture of which is shown in Fig. 12. His unfortunate use of the acronym ASP can lead to confusion, since Savitt et al. used the same three letters to refer to their 1967 "Association-storing processor" specification (Savitt et al., 1967). The original ASP is examined in the software section of this chapter. Lea's ASP, described in the following section, is not particularly related to that classic machine, though he does give at least conceptual credit to the original ASP.

The associative string processor architecture (Lea's ASP) describes a reconfigurable and homogeneous computing foundation, designed to take advantage of the inexorable technological migration from VLSI to ultra-large-scale integration and then on to wafer-scale integration. Its goal is to efficiently support such operations as set processing, string processing, array processing, and relational data processing. The building block of the ASP is the substring. Many substrings operate in parallel, and are supported by an ASP data buffer (ADB), a controller, and a data communications network.
FIG. 12. Diagram of the ASP architecture.
Each substring incorporates a string of identical APEs (associative processing elements) which communicate through an inter-APE network. During operation of the ASP, all the APEs simultaneously compare their stored data and activity registers to the information broadcast on the data and activity buses in the substring. Any APEs that find a match are either directly activated themselves, or indirectly activate other APEs. Activation in this context means that the APE's activity register is updated. Once an APE has been activated, it then executes local processing operations in parallel with other active APEs.

Four basic operations are supported by the APE: match, add, read, and write. The match operation affects the M (matching) and D (destination) flags, either setting or clearing them based upon the APE registers and the broadcast information. In the add operation, a bit-serial addition or subtraction is performed, and the outcome is stored in the M (sum) and C (carry) flags. A read operation drives the data bus with a wire-AND of all the activated APEs. A write operation updates the data and activity registers of all active APEs with the information on the data and activity buses.

The ASP supports both bit-parallel, single-APE data transfer through the shared data bus, and bit-serial, multiple-APE information flow through the inter-APE communication network. The inter-APE communication path is restricted to high-speed transfer of activity signals or M-flag patterns between APEs. This communication architecture, coupled with the ability to activate each APE by its data content, allows efficient control of data movement. The LKL and LKR ports maintain continuity by allowing information to be sensed by the external ASP controller. The inter-APE communication network allows the ASP substring to effectively emulate such common data arrangements as arrays, tables, trees, and graphs.
6.3 CAM Devices and Products

Over the last few years a number of content-addressable memory integrated circuits have been designed and built. Most of these have been constructed in research laboratories, but in recent years several commercial CAM devices and board-level products have become available. This section first describes the devices and architectures developed at various research institutions and then describes commercially available CAM devices.

6.3.1 CAM Devices Being Developed
6.3.1.1 The CARM. Kadota et al. (1985) describe an 8-kbit content-addressable integrated circuit organized as 256 words by 32 bits. Their device
is called a content-addressable and reentrant memory (CARM) and was fabricated with 2-μm CMOS technology and two-layer metallization. The basic structure is similar to that of a static RAM in that it has address decoders, memory cell arrays, bit lines, word lines, write drivers, and sense amplifiers. To provide the added functionality required in the CARM, data and mask registers are added along with sense lines for each word, an address encoder (as well as a decoder), and an address pointer. A block diagram of this device is shown in Fig. 13.

Thirty-two-bit-wide data are applied to the device through the data pins and transferred to the data register. These data are then masked according to the condition of the bits set in the mask register and applied to all the memory words in parallel. If the masked data coincide with the data stored in the memory words, a match occurs, and the sense lines for those words are activated and propagated to the matching sense amplifier. This amplifier activates the sequential address encoder and the address of the corresponding word is output through the address bus. When more than one word matches the data applied to the device, the corresponding addresses are output, one after another, from the sequential address encoder.
FIG. 13. Block diagram of the CARM: data bus, address bus, data and mask registers, associative memory (256 words by 32 bits), sequential address encoder, sense amplifier, and control and timing logic.
6.3.1.2 CAM Device Expansion. One of the problems with earlier CAM devices was their limited size and lack of expandability. Even with current technology it is not possible to conceive that a single integrated circuit could be designed and built that would satisfy all the memory requirements of a content addressable memory. A modular design is therefore essential, whereby the capacity of the memory can be easily increased by the suitable addition of extra identical integrated circuits. In conventional RAM systems it is relatively easy to increase both the memory width and depth of the system by adding extra memory devices. Increasing the size of a CAM system is not as simple. The most difficult expansion feature to implement is concerned with the label and tag fields, which are the fields where we require a content-addressable search. The tag field may be considered an extension of the label field, but we still have to allow for expansion of the width of this combined field, which is also called horizontal expansion. Also, we must be able to increase the number of entries that may be placed in the memory, which corresponds to an increase in the depth, or vertical expansion, of the memory.

Ogura et al. (1985) describe a 4-kbit CAM integrated circuit organized as 128 words by 32 bits which may be interconnected to produce larger sizes of CAM. A garbage flag register is used to indicate whether a word location is empty or not, and during write operations a multiple-response resolver selects from among empty word locations. To implement a memory with more than 128 words, a number of these CAMs can be connected together by sharing a common data bus. Each CAM has an inhibit-in and an inhibit-out signal; the inhibit-out signal is generated from the multiple-response resolver flag register outputs. To extend the word count (depth) of this memory, two memory chips can be connected together on a common data bus, as shown in Fig. 14, to produce a memory system with 256 thirty-two-bit words. The use of the inhibit signals assigns a priority in a daisy-chain fashion to each of the modules to resolve contention for the data bus. When data are to be stored or retrieved from the system, each of the CAM modules identifies an empty word location by referring to the garbage flag register. If an empty word is located in a module then its inhibit-out signal is set to "1", which is then transferred to the last chip in a ripple-through operation. After the highest priority module has accessed the common bus, its inhibit-out signal is set to "0", allowing the next integrated circuit in the sequence to have priority.

To allow multiple integrated circuits (ICs) to be connected together to increase the word "width," existing designs of CAMs, e.g., the two devices mentioned above, generate an address when a match occurs and use this to allow for expansion. As an example, assume that it is required to carry out a matching operation on 96 bits but the actual width of the individual devices is only 32 bits.
FIG. 14. Increasing the word count of CAMs (vertical expansion).
The 96 bits would be split into three groups of 32 bits and one group applied to each of the three devices. One of the devices acts as a master and the rest are programmed to act as slaves, as shown in Fig. 15. If the master detects a match on the 32 bits that it receives, it outputs the address corresponding to the location of the matched entry, and this address is then used as an input to all the slaves. The slaves compare their 32 bits of input data with the data stored at that particular address, and if the stored data are the same then an output bit will be set. A match on the whole 96 bits only occurs if all the slaves indicate that their part of the word matches.
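The master/slave arrangement can be modelled in software as follows. The class and method names are invented for the illustration; actual devices signal matches and priorities on dedicated pins rather than returning Python lists.

# Sketch of horizontal (width) expansion: a 96-bit key split over three 32-bit CAMs.

class Cam32:
    """One 32-bit CAM slice; entries at the same index belong to the same 96-bit word."""
    def __init__(self):
        self.words = []

    def store(self, value32):
        self.words.append(value32)

    def match_addresses(self, value32):          # master role: content search
        return [a for a, w in enumerate(self.words) if w == value32]

    def verify(self, address, value32):          # slave role: addressed compare
        return self.words[address] == value32

def match_96(master, slaves, key96):
    parts = [(key96 >> shift) & 0xFFFFFFFF for shift in (64, 32, 0)]
    for address in master.match_addresses(parts[0]):
        # The master's match address drives the slaves, which check their own 32 bits.
        if all(s.verify(address, p) for s, p in zip(slaves, parts[1:])):
            return address
    return None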
FIG. 15. A CAM arrangement that increases the width of the label field.
6.3.2 The Database Accelerator Chip

Most of the LSI versions of CAMs that are currently available, such as the devices described in the previous section, use about 10 transistors for each cell. The memory cells for these devices were all based on a static memory cell, but, by comparison, a dynamic CAM cell which requires only five transistors has been designed as part of the Smart Memory Project at MIT (Wade and Sodini, 1987). These devices may be compared to the single transistor and capacitor required to store one bit of information in a conventional dynamic random access memory. The DBA chip was briefly described as a complete system in Section 6.2.8.

The DBA chip has two major sections: a memory section consisting of a CAM array used for associative operations, and a processing section which consists of many simple, general-purpose, single-bit computing elements. In this section we wish to concentrate specifically on the design of the CAM, which uses a unique concept called trits (ternary digits). A trit can be either the usual zero or one, but also a don't care ("X") value. This CAM is also called a ternary CAM (Brown, 1991; Hermann, 1991; Wade, 1989). Traditionally CAMs have implemented the don't care function with a mask register separate from the data register, but the ternary CAM allows for the storage of the three states directly. As opposed to the static CAM cell designs of the integrated circuits mentioned above, the DBA cell is a five-transistor CAM cell (Wade and Sodini, 1987). This dynamic cell was used because of its small size and ability to store three states.
6.3.3 The Dictionary Search Processor

As far as the authors are aware, the largest CAM produced to date is the DISP integrated circuit, which has a 160-kbit content addressable memory (Motomura, 1990a). The DISP was developed to aid in dictionary search operations required for natural language processing. Two of the most important requirements of a dictionary search system are increasing the vocabulary (which may be several tens of thousands of words) and speeding up the search process of this large dictionary. To complicate matters, the input words to the system may have misspellings. It is therefore necessary not only that the system search for a stored word that exactly matches the input word, but also that it have the ability to search for stored words that approximately match the input word. Previous dictionary search systems have used conventional memory and software to iteratively read out stored words and compare them with the input word. This process may take many thousands of cycles, especially when the nearest match to the input word is required. However, CAMs are
able to simultaneously compare an input word with all the stored words in one cycle, and so the DISP was developed to enable large and fast practical dictionary search systems to be constructed. The DISP contains a 160-kbit data CAM (DCAM) organized as 20 CAM arrays of size 512 rows by 16 columns, and a high-speed cellular automaton processor. A block diagram of the DISP is shown in Fig. 16.

In order to reduce the number of comparisons between the input word and the stored words, the DISP classifies the stored words into a number of categories. The control code CAM shown in the figure is responsible for storing indexes into the DCAM based on the classification scheme used. For example, the categories could be selected using the first character of the stored word. The DISP can store a maximum of 2048 words classified into 16 different categories.

As mentioned earlier, a dictionary search system should respond with the correct word even when the input word contains spelling errors. Similar to the Hamming code distance described previously in Section 5.4.1, the cellular automaton processor of the DISP calculates the distance, based on character substitutions, insertions, and deletions, between the input word and the closest stored words. Only stored words with distances less than or equal to 2 are treated as matched words; a word with a distance greater than 2 is treated as a mismatched word. Once a matched word or words are found, the priority encoder will serially output the addresses of the matched words, starting with the closest match. The DISP can store a maximum of 2048 words, but multiple DISPs may be connected in parallel to increase the vocabulary of the system.
FIG. 16. Block diagram of the DISP: character code input, control code CAM, controller, row/column address logic, data CAM, cellular automaton processor, and priority encoder.
For example, a 50,000-word dictionary search system could be constructed using 25 DISPs. Further details and additional references on this device can be found in a report by Motomura et al. (1990).
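The distance criterion can be illustrated with ordinary edit distance. The sketch below is a sequential stand-in for what the DISP's parallel hardware computes; the sample dictionary is invented, and only the distance-2 cutoff and closest-first ordering follow the description above.

# Sequential illustration of the DISP matching rule (distance <= 2 counts as a match).

def edit_distance(a, b):
    """Character substitutions, insertions, and deletions (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dictionary_search(word, dictionary, max_distance=2):
    # Closest stored words first, mirroring the priority encoder's output order.
    hits = [(edit_distance(word, w), w) for w in dictionary]
    return [w for d, w in sorted(hits) if d <= max_distance]

print(dictionary_search("memorry", ["memory", "memoir", "armory", "history"]))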
6.3.4 Fault-Tolerant Content Addressable Memory Devices
In a conventional memory system, if every addressable location cannot be accessed and correctly manipulated then the memory is usually useless. To increase the yield of memory devices some manufacturers include extra spare capacity which is automatically switched in if faulty memory locations are detected. A CAM is naturally fault tolerant since there is no concept of an absolute storage location and the physical location of data can be arbitrary. So long as faulty cells can be detected and isolated, a CAM will still function, albeit with reduced capacity. The articles by Grosspietsch et al. (Grosspietsch et al., 1986, 1987; Grosspietsch, 1989) cover the issues of testability and fault tolerance in general and are a good introduction to this area.

A number of researchers have incorporated these concepts into actual designs. Blair (1987) describes a device that exploits the natural fault tolerance of a CAM at the cost of one extra latch per word. During a test cycle this latch will be set if a fault is detected, and it will ensure that the faulty CAM locations are not used for actual data storage and retrieval.

An 8-kbit CAM (128 words by 64 bits) that is fault tolerant under software control is described by Bergh et al. (1990). A faulty word location in the memory can be made inaccessible by on-chip circuitry. This device was developed for a real-time multiprocessor system, but the authors also describe its use in telecommunications systems and as a matching unit for a data-flow computer (see Section 3.8 for more details). An additional feature of this device is a 12-bit counter that contains the number of valid words stored in the CAM. Each time a word is stored or deleted, the counter is incremented or decremented accordingly.

A self-testing reconfigurable CAM (RCAM) is described by McAuley and Cotton (1991). The size of this device is 256 words by 64 bits, and it was designed for general high-speed table look-up applications. The design is similar to the CARM device described previously, but it has two additional transistors for the self-test. During the self-test cycle a number of test patterns are automatically generated to test all the CAM words. If a fault is found, that word is disabled from future selection. This self-test reconfiguration is typically carried out when power is first applied. The RCAM has 256 usable words, less the number of bad locations found during the self-test.
The RCAM is also interesting because it is an example of an addressless CAM. It does not contain the usual address encoder to identify the matching locations in the CAM, but instead outputs the matching word directly. To explain this concept further, an example of the use of the RCAM for address translation in a high-speed packet switching network is described (McAuley and Cotton, 1991). When packets arrive at the switch they must be routed to the correct output port. The correct port to be used is based on the destination address, and the switch must translate the packet's destination address into an output port number. The RCAM may be used for this purpose by splitting the 64-bit words into a 48-bit packet address field and a 16-bit port address field, as shown in Fig. 17. After receiving a packet, the destination address is applied to the RCAM with the masking register set so that the top 16 bits are don't care. Any RCAM locations that have the same bottom 48 bits will match, and the 16-bit port address will be output (along with a duplicate copy of the 48-bit address).
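The translation step amounts to a masked match, as the following sketch shows. The field widths follow Fig. 17; the class, its methods, and the sample route are assumptions, and a real RCAM would emit the matching 64-bit word on its outputs rather than return a Python value.

# Sketch of address-less lookup: mask off the port field, match on the packet address.

ADDR_BITS, PORT_BITS = 48, 16
ADDR_MASK = (1 << ADDR_BITS) - 1

class RouteCam:
    def __init__(self):
        self.words = []                      # each word: port (top 16 bits) | address (bottom 48 bits)

    def add_route(self, packet_addr, port):
        self.words.append((port << ADDR_BITS) | (packet_addr & ADDR_MASK))

    def lookup(self, packet_addr):
        # Comparand applied with the top 16 bits treated as don't care.
        for word in self.words:
            if word & ADDR_MASK == packet_addr & ADDR_MASK:
                return word >> ADDR_BITS     # matching word read out directly; no address encoder
        return None

cam = RouteCam()
cam.add_route(0x00A0C9123456, port=7)
assert cam.lookup(0x00A0C9123456) == 7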
6.3.5 Commercially Available CAM Products
The devices and architectures mentioned above have all been produced in research laboratories. Over the last few years a number of companies have started producing products that utilize content addressable techniques. These products range from individual CAM integrated circuits to complete associative processing board products. This section provides brief details and references for what the authors believe are the main products that are currently commercially available.
FIG. 17. The RCAM used for address to port translation.
6.3.5.1 Advanced Micro Devices. One of the first commercially available CAM devices was the Am99C10 device from Advanced Micro Devices (1990a). This device is organized as 256 words by 48 bits and has been optimized for address decoding in local area networks and bridging applications, although it could also be used effectively in database machines, file servers, image processing systems, neural networks, and other applications. A block diagram of the Am99C10A is shown in Fig. 18.

Each of the 256 words consists of a 48-bit comparator and a 48-bit register. When the data (comparand) is presented to the CAM array, a simultaneous single-cycle compare operation is performed between the comparand and all 256 stored words in about 100 ns. Any of the 48 bits of the comparand may be selectively masked, disabling those bits from participating in the compare operation and thereby allowing comparisons to be made on only a portion of the data word. If the comparand matches a stored word, the on-chip priority encoder generates an 8-bit address identifying the matched word location in the array. If there are multiple matches in the array, the priority encoder generates the address of the lowest matched location. The addresses of other matching words may be selected individually by setting the skip bit in the CAM word.
FIG. 18. A block diagram of the Am99C10A. (Reprinted with permission from Advanced Micro Devices. Copyright © 1990.)
Some of the applications that AMD suggest for the Am99C10A are (1990a):

• Local area network (LAN) bridge address filtering
• LAN ring message insertion and removal
• Database machine support: search and sort accelerators
• Pattern recognition: string search engines, etc.
• Image processing and machine vision: pattern recognition, image registration, etc.
• Neural net simulation
• AI language support (LISP, etc.): garbage collection support, PROLOG accelerators, etc.

The main intended use for this device is a LAN address filter. This application is also mentioned for the other commercial devices described in the following sections, and it therefore seems appropriate to elaborate on this example of CAM use.

6.3.5.2 LAN Bridge Address Filtering. A LAN bridge should provide transparent communication between two networks. An example of a bridge between an FDDI network and an Ethernet network is shown in Fig. 19, and a block diagram of the FDDI-Ethernet bridge is shown in Fig. 20. To allow workstations on the different networks to communicate with each other, the bridge must pass through appropriate messages from one network to another.
FIG. 19. An FDDI-Ethernet network system. (Reprinted with permission from Advanced Micro Devices. Copyright © 1990.)
FIG. 20. A block diagram of the FDDI-Ethernet bridge. (Reprinted with permission from Advanced Micro Devices. Copyright © 1990.)
For example, assume that the workstation with address 562C sends a message to workstation 34E5. For this to occur the bridge must recognize that the address 34E5 is for a workstation on the other Ethernet network and pass the message accordingly. The FDDI-Ethernet bridge must compare the destination addresses of all the transmitted messages to see if any of the messages should be routed to its Ethernet network. The problem is that there may be hundreds or even thousands of workstations on the LANs, and the bridge therefore has to compare the message destination address with many stored addresses as quickly as possible. A simple sequential search approach would be too slow, but a CAM device such as the Am99C10A can carry out the required message address comparison in a single cycle. The 48-bit word size of the Am99C10A corresponds to the 48-bit address length of the network messages. More information on LAN address filtering and bridge implementations using the Am99C10 can be found in Wilnai and Amitai (1990) and Bursky (1988), as well as in the Advanced Micro Devices data sheet (1990a).
6.3.5.3 The MUSIC Semiconductors LANCAM. The MUSIC (MultiUser Speciality Integrated Circuits) Semiconductors Company in Colorado introduced in 1990 a content-addressable memory also targeted at address filtering in LANs and routers (1991a). The name of the device is LANCAM
(part number MU9C1480), and it is capable of storing 1024 64-bit fields. The device may be used for destination and source address recognition, and also for general associative data storage in systems such as database accelerators, code converters, machine vision systems, and target acquisition systems. The MU9C1480 is very similar to the AMD Am99C10 described previously, but it has additional functionality and more than four times the capacity. Figure 21 shows a block diagram of the LANCAM. Although the internal data path is 64 bits wide, the external interface is multiplexed to allow communication with the device over a 16-bit bus (labeled DQ0-15). This bus conveys data, commands, and status to and from the MU9C1480. The four signals shown in the bottom right of the figure are flags to allow the device to be vertically cascaded. These four signals have the following meanings:
/MF   Match Flag: This output goes low when a valid match occurs during a comparison cycle.
/MI   Match Input: This input is used in vertically cascaded systems to prioritize devices.
/FF   Full Flag: This output, when low, indicates that all the memory locations within a device contain valid contents.
/FI   Full Input: This input is used in vertically cascaded systems to generate a CAM memory system full condition.
Using the above four signals it is very easy to increase the size of the memory by connecting together a number of LANCAMs. This vertical expansion is shown in Fig. 22. For bridge or other applications that require more than 1024 entries, the LANCAM can be easily cascaded without the need for external priority encoders or address decoders. Figure 22 shows the vertical cascading of the LANCAM, and it can be seen that the flag signals are simply daisy-chained together.

6.3.5.4 The National Semiconductor SONIC. National Semiconductor also has a device targeted for high-speed LANs called the systems-oriented network interface controller (SONIC) (Wheel, 1990). This device employs a CAM architecture to perform address filtering and has seventeen 48-bit entries to store destination addresses.
6.3.5.5 The Summit Microsystems CAM Board. Summit Microsystems is an example of a company that produces board-level products containing a number of CAM integrated circuits.
FIG. 21. A block diagram of the LANCAM. (Reprinted by permission of MUSIC Semiconductors. Copyright © 1991.)
FIG. 22. Vertically cascading the LANCAM. (Reprinted by permission of MUSIC Semiconductors. Copyright © 1991.)
Their SM4k-GPX board contains an array of AMD Am99C10 CAM chips to provide a 4096-word by 48-bit matching memory (1989b). An input pattern up to 48 bits wide can be compared against all the CAM array words in a single 100-ns cycle. The board responds with the address of the CAM word that found an exact match with the input pattern. This board plugs into a standard PC/AT bus and is supplied with menu-driven software to provide a CAM development system. It is possible to daisy-chain up to 15 additional boards to expand the CAM capacity to 64k words. The boards contain a 16-bit address register, effectively expanding the 8-bit addressing range of the individual Am99C10 devices. The boards also have a special area for user prototyping and personalization. Some of the applications that Summit Microsystems state are suitable for this board are:

• LAN interconnect address filtering (Wilnai and Amitai, 1990a)
• File servers
• Database management
• Disk caching
• Radar and SONAR (sound navigation ranging) signature recognition
• Image processing
• Neural networks
• Cache memories
6.3.5.6 Coherent Research, Inc. Coherent Research is a company that provides both individual devices and board-level products. The CRC32256
is a CMOS associative processor with a capacity of a 256-word by 36-bit content addressable memory (1990b). The Coherent Processor (Stormon, 1989; Wayner, 1991) is a card for the PS/2 Model 70/80 that uses up to 16 of the CRC32256 chips to provide a 4096-word by 36-bit associative parallel processing array. The Coherent Processor development system provides hardware and software support for writing, debugging, and running parallel application programs in the C language. The company also has a software tool called the Coherent Processor simulator, which runs under MS-DOS or Sun Unix and which simulates a parallel associative processor. Programs developed on the simulator can be run on the Coherent Processor board simply by recompiling. Coherent Research has a number of application notes that describe the use of their products in such fields as neural networks, LANs, relational databases, pattern recognition, and radar multiscan correlation.
7. Software for Associative Processors
Storage, retrieval, and manipulation concepts for associative memories and processors differ significantly from those used on traditional sequential address-based computers. Consider a typical address-based computer instruction such as STORE X, which presumably takes a value from an internal register and stores it in a memory location with address “X.” We cannot directly compare this with its exact counterpart in an associative processor because the CAM storage contains no location addressed by “X.” All we can reference are locations differentiated by their contents. This is true even if the underlying hardware in the associative computer does allow such address-based location selection at the lowest hardware level, since this user-available reference method would completely circumvent the whole point of the associative processing architecture. Depending upon the exact implementation of the associative system, this type of instruction might refer to a pattern “X,” and it might store the pattern in some subset of associative memory storage fields. Associative software will be discussed in more detail in the following sections, and several interesting implementations of associative languages will be provided. The software programmer writing applications for a content-addressable processing system does not necessarily need to have detailed knowledge of the underlying hardware data storage mechanism. It is possible to allow the user to specify the program in terms of associative interconnections, and let the system handle the exact implementation. So, for example, the user could write a program in some associative language and that program could be executed on either a fully associative computer, or on a partially associative
computer (e.g., bit-serial), or even on an entirely traditional address-based computer. The hardware and operating system (or run-time library routines, microcode, etc.) could shield the programmer from the details. It might even make sense to debug the program on a serial location-dependent machine before letting it loose on a parallel associative hardware engine, since troubleshooting the code would be simpler without the parallel aspect of computation. However, for the programmer to fully exploit the power of fast, parallel, content-related data management, it is necessary that he or she comprehend the underlying associative architecture. At some level, the software architecture must contain the proper syntax, program flow control, and data structures to efficiently execute applications on the associative hardware (Potter, 1988).

The emerging field of parallel processing is one area where an associative memory can substantially enhance a computing system. For example, the Linda parallel processing language relies on "tuple space" to coordinate the multiple execution threads. The tuples formed in this language are matched by associative comparison rather than address. This language obviously benefits from an associative component.

Many new programming languages in the field of artificial intelligence are nonprocedural in nature. It is hoped that these languages will more closely model how people deal with problems. Languages such as Prolog and Smalltalk identify objects and their interrelationships, and are good prospects for an associative processor.

Content-addressable systems show good promise as fast, efficient database engines. The software necessary to implement this function must provide easy, context-sensitive searching capability on sometimes unstructured data, and the user interface must be accessible to the average programmer (Berra and Trouillinos, 1987). Some important associative computers have been formulated with that task as their goal, and several are described in this section to show the nature of such systems.
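As a concrete illustration of content-based selection, the fragment below mimics the flavour of Linda's tuple matching mentioned above. It is only a toy written for this discussion: real Linda defines blocking in/rd/out primitives, and the wildcard convention used here is an assumption.

# Toy tuple space: tuples are selected by content, never by address.

tuple_space = [("point", 3, 4), ("job", "compile", 17), ("point", 9, 9)]

ANY = object()   # wildcard standing in for a formal (typed) Linda parameter

def matches(template, candidate):
    return len(template) == len(candidate) and all(
        t is ANY or t == c for t, c in zip(template, candidate))

def rd(template):
    """Return the first tuple whose content matches the template."""
    for t in tuple_space:
        if matches(template, t):
            return t
    return None

print(rd(("point", ANY, 4)))   # -> ('point', 3, 4)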
7.1 STARAN Software
The STARAN associative parallel computer architecture was described previously in Section 6.2.1. This section will provide some of the details about the STARAN software (Davis, 1974; Thurber, 1976) as an example of an associative machine language and system support combination. Since the machine can be operated in address-mode as well as associative-mode, there are obviously going to be many language facets that are common to
other non-associative computers. This section deals only with the special language constructs and implementations that are unique to an associative computer.

The assembly language for the STARAN processor is called APPLE (Associative Processor Programming LanguagE). Each APPLE instruction is parsed by the hardware into a series of microcoded execution operations, and as such the underlying hardware may not always perform the entire operation in parallel. From the vantage point of the programmer, however, each assembly language instruction may be viewed as performing the entire operation as if it were a totally parallel machine.

The array instructions in the APPLE language are unique to the associative STARAN computer. These instructions are loads, stores, associative searches, parallel moves, and parallel arithmetic operations. They operate on the MDA memory arrays and the PEs that are associated with them. The load array instructions load the PE registers or the common register with data from the MDA. The load function can also perform logical operations on the data before it is finally saved in the PEs. The store array instructions perform just the opposite function. They move data from the PE or common register (with logical operations and a possible mask enable) to the memory arrays.

The associative search instructions search the words in the array enabled by the mask register. The search can take several formats, and the comparisons can be made by many of the nonexact methods already listed in a previous section (such as greater/less than, greater/less than or equal, maximum, minimum). By combining different searches, even more powerful comparisons such as between limits and next higher can be obtained. This group also contains special instructions to resolve multiple responses from a search.

The parallel move instructions move masked fields within an array to other fields within the same array. Permutations of the data as it travels from the source to the destination fields are possible, such as complement, increment, decrement, and move the absolute value. Finally, the parallel arithmetic array instructions provide the ability to perform masked parallel operations such as add, subtract, multiply, divide, and square root within the associative arrays. This instruction group has several formats, but in each case it uses all the array words in parallel (or whichever ones appropriately match) as potential targets.

The STARAN also offers a macro definition language, which allows the programmer to add other useful operations. These would include other arithmetic, logical, relational, and string manipulation operators. Although the APPLE language is unique to the STARAN system, it does provide an excellent example of the kinds of low-level machine instructions that make sense on an associative computer.
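To give a feel for the associative search group, here is a small Python analogue of a between-limits search followed by a maximum search. APPLE itself is an assembly language, and the word values, mask register, and function names below are assumptions made for the illustration.

# Parallel-style masked searches over an array of words (illustrative only).

words = [0x12, 0x47, 0x30, 0x47, 0x05]
mask  = [True, True, True, True, True]     # mask register: which words may respond

def search_between(lo, hi):
    """Between-limits search: set a responder flag for each enabled word in range."""
    return [m and lo <= w <= hi for w, m in zip(words, mask)]

def search_maximum():
    """Maximum search: responders are the enabled words holding the largest value."""
    enabled = [w for w, m in zip(words, mask) if m]
    top = max(enabled)
    return [m and w == top for w, m in zip(words, mask)]

print(search_between(0x10, 0x40))   # -> [True, False, True, False, False]
print(search_maximum())             # -> [False, True, False, True, False]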
7.2 DLM Software
The distributed logic memory (DLM) as a string retrieval engine was proposed by Lee in his classic paper on the subject (Lee, 1962). In this memory scheme, there are no fixed words at all. Rather, the cells are laid out as a string from start to end, each cell communicating with its neighboring cells (next and previous) in the string. Each cell contains a symbol for storage, and a tag field to identify the active state of the cell. A comparison string is entered through an I/O port, and this string is compared to the entire associative memory. When the comparison operation is complete, only the matching fields identify themselves and the retrieval process can begin.

The comparison is done as follows. The first symbol in the input string is compared to all the symbols in the memory that reside in the first position of a field (the start of the field is identified by a special symbol). If any of the cells match, they send a propagate signal to the next cell, which sets that new cell active. The next symbol in the input string is then compared to any still-active cells, which behave in the same way as the first cell if a match is made. In this way, only the cells that represent a complete string match will show an active status, since any nonmatching symbol will not propagate an active signal to the next neighbor cell. This description has been brief, and is only meant to familiarize the reader enough so that the instructions presented next have some context. Several incarnations of this basic architecture have been developed, and more information is available in other reference works (Kohonen, 1987; Thurber, 1976; Thurber and Wald, 1975).

The most basic instructions in the DLM are match, propagate left/right, store, and read. The match command causes all currently active cells to compare their data symbols to the reference symbol input. This is done associatively and in parallel. A side effect of the matching operation is a clearing of the matched cell's active state and a propagation of that active state to the next cell (left or right, depending upon the propagation direction control). The propagate instruction causes a transfer of the activity state from each cell to its neighbor (left or right, again depending upon the direction control). For example, if a cell is currently active and a propagation command is given with the control to the left, the cell will become inactive but the previous cell will become active (the previous cell being the one to the left of the current cell). Every cell is affected in parallel for this command, so each cell transfers its information simultaneously to the next (or previous) cell.

The store and read instructions operate only on the active cells. An active cell, as already described, contains a tag in its activity field identifying it as ready to accept or provide a symbol when the appropriate command is given.
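The match-and-propagate search described above can be sketched as follows. The cell layout is simplified to one character and one activity bit per cell, and the function name and sample memory are assumptions made for the illustration.

# Sketch of distributed-logic string search: activity ripples right on each matching symbol.

def dlm_find(memory, pattern, field_starts):
    """memory: list of single-character cells; field_starts: indices where fields begin."""
    active = [i in field_starts for i in range(len(memory))]
    for symbol in pattern:
        next_active = [False] * (len(memory) + 1)
        # Every still-active cell compares its symbol to the broadcast symbol in parallel;
        # a match clears the cell's activity and propagates it to the right-hand neighbor.
        for i, cell_active in enumerate(active):
            if cell_active and memory[i] == symbol:
                next_active[i + 1] = True
        active = next_active[:len(memory)]
    # Cells that are active now sit just past a complete match of the pattern.
    return [i - len(pattern) for i, a in enumerate(active) if a]

memory = list("CATDOGCAR")
print(dlm_find(memory, "CA", field_starts={0, 3, 6}))   # -> [0, 6]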
FIG. 23. Directed graph.
The store command instructs every active cell to simultaneously save the symbol provided from the input port. The active state is then propagated to the next cell, and the current state is made inactive. In this way, a string can be saved one symbol at a time. The read command sends the symbol from any active cell to the output port. If there is more than one active cell, some multiple match resolution mechanism must be provided. By combining the four instruction groups above, the defined search ability already described can be obtained. As well, it is possible to perform other types of transactions, such as arithmetic operations and proximity searches, on the information contained in the DLM.

7.3 ASP Software

Another interesting associative language is the ASP (association-storing processor) specification (Savitt et al., 1967). This language (and the architecture for its suggested implementation) was designed to simplify the programming of nonarithmetic problems. The basic unit of data in the ASP language is the relation. It relies on the ordered triples of indirect association already described in Section 5.2. In this architecture, the reader may recall, the triple (A, R, B) states that A is related to B through the relation R. This association of two items is what the ASP language calls a relation. The relations in ASP can be expressed as above, or as a directed graph as shown in Fig. 23. Each item in the relation must be distinct, and no item can appear without a relation. A compound item is formed when the association between two other items is itself an item with its own association, as shown in Fig. 24.
7.3 ASP Software Another interesting associative language is the ASP (association-storing processor) specification (Savitt et al., 1967). This language (and the architecture for its suggested implementation) was designed to simplify the programming of nonarithmetic problems. The basic unit of data in the ASP language is the relation. It relies on the ordered triples of indirect association already described in Section 5.2. In this architecture, the reader may recall, the triple (A, R, B) states that A is related to B through the relation R. This association of two items is what the ASP language calls a relation. The relations in ASP can be expressed as above, or as a directed graph as shown in Fig. 23. Each item in the relation must be distinct, and no item can appear without a relation. A compound item is formed when the association between two other items is itself an item with its own association as shown in Fig. 24. The ASP language transforms the data based upon conditions. There are two main components of this transformation. First, the language provides a search capability where the existing database is inspected for a match with one set of relations. Any matching data is replaced with data described by B
t
C FIG.24. A compound item.
another set of relations. Furthermore, the instruction identifies the next instruction to be executed based upon the success or failure of the current match operation (conditional branch). ASP instructions are expressed as structures of relations, and linked together to form programs. One interesting aspect of this representation is that one ASP program can be processed by another ASP program. The best way to show how this would work in practice is to provide an example. The ASP description might read:

    LOCATE items X1 which are examples of airman, and have the jobclass of arm spec, and are stationed at items X2 (which are located in Europe), and have the status of items X3.
    REPLACE items X1 which have the status of X3 by items X1 which have the status of ALERT.
This language statement would take all the military personnel identified in the above description and change their status to “alert.” It was expected that a machine would be constructed based upon the ASP specification, and that the language described would be executed on this hardware. So the language was first designed, then the hardware architecture was formulated. A major component in this hardware was a distributed logic associative memory called the context-addressed memory. This highly interconnected memory would have the capability to perform the global searches on all the items (and their relations) in parallel.
7.3.1 RAPID System Software

Parhami has suggested a CAM-based system for reference and document retrieval (Parhami, 1972). Called RAPID (rotating associative processor for information dissemination), it includes an underlying architecture and a machine language specification for that architecture. Like the DLM, this system performs its operation on variable-length string records rather than fixed-length words. Special tag symbols are specified so that records can be identified. The system is envisioned to be a byte-serial CAM, where a byte is inspected in one low-level hardware operation.

For the purpose of understanding the language, we can assume that the hardware is a circulating disk with enough heads to read and write one byte simultaneously. This is how Parhami envisioned the hardware implementation. Other hardware mechanisms could be used, of course, but by viewing the hardware as a disk the reader can gain insight into how the language interacts with the processing engine. We
can further assume that each machine language instruction is performed on at least one full rotation of the disk, so that subsequent instructions will operate on the information left by the previous instruction. As the instructions are described, the similarity to the DLM will become apparent.

The first instruction type is of the search variety. A single character or string (strings would take several rotations, one byte comparison per rotation) would be searched for. When found, an "active" marker would be set such that a subsequent rotation could recognize the results of the search. The character search could take the form of equal, not equal, greater/less than, and greater/less than or equal. Other search instructions would look for marked characters or strings and perform some operation on them, such as setting new active markers.

The propagate instruction would transfer active markers to other characters. The currently active characters would have their markers cleared, and the target of the propagation would have their markers set. This propagation would happen to every marker occurrence in one revolution, and would appear to happen in parallel to the programmer. The expand instruction would take active markers and propagate them without clearing the currently active markers. So, when a marker was found, the next "x" characters would be marked active ("x" depending upon the exact instruction parameter) but the currently active marker would remain active. The contract instruction would reset markers in a row. The add instruction would add a numerical value to all marked characters and replace the new sum into that character. Finally, the replace instruction would replace every marked character with a new character specified in the instruction.

The language just described can be used to write programs that would search for patterns and modify them based upon subsequent instructions. So an initial search could be made for a character string, then the marked characters would be expanded or propagated until just the right combination of characters would remain marked for retrieval or modification.

7.4 Patterson's PL/1 Language Extensions
Patterson has suggested extensions to the PL/1 language to allow associative structures to be easily manipulated (Patterson, 1974). He believes that an extension of an existing language is more reasonable than an entirely new language, since most problems have both associative and sequential components. His language would add a declaration for an associative procedure, similar to the already-included PL/1 declaration for a recursive procedure. This procedure would include a parameters field that the programmer could use to specify the appropriate entry length for the application.
Variables would be differentiated by their nature (sequential or associative) upon their initial declaration. The two variable types would be static (for a normal sequential variable) and associative. An associative variable declaration would define a field in every associative word. Comparisons would be made in parallel and associatively between an input from the sequential machine and the associative words, or between two fields in all associative words. The associative words taking part in any operation ("active" words) would generally be a subset of the total words.

The associative function would allow relational operators (greater/less than and equal) and logical operators (AND, OR, NOT) to be executed. This operation would be carried out simultaneously on all currently active words, and would potentially reduce their number if not all of the words respond. Two special statements would find the minimum or maximum values in a particular field and activate each word that contains these values. The activate statement would perform the same relational and logical operations in parallel on the associative memory, but would be executed on all the words rather than just the currently active ones. This could be used to set all the words active, or it could activate the first matching word found. The for statement could select a subset of the active words for the operation, and the else could be used to perform some operation on the active words that did not meet the selection criteria.

Assignment statements in the PL/I extension would look similar to normal assignment statements in many languages (X = Y). However, the outcome would be different if an associative variable were involved. If both variables were associative, the statement would move data from one field to another in all active words. If the source Y were a common sequential variable, then it would be loaded simultaneously into the X field of all active associative words. If the X field were common and the Y field associative, then the first active word in the associative memory would be loaded into the destination variable.
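As a rough illustration only (the class and method names below are ours, not Patterson's syntax), the following Python sketch mimics the behavior just described: activation tested against all words, associative operations restricted to the currently active subset, and the different cases of assignment.

# A toy model of Patterson-style associative words. Each word is a dict of
# named fields; the "active" words are the subset that later operations touch.
class AssociativeMemory:
    def __init__(self, words):
        self.words = [dict(w) for w in words]
        self.active = [True] * len(self.words)    # all words active initially

    def activate(self, field, predicate):
        # Like the activate statement: test ALL words, not just the active ones.
        self.active = [predicate(w[field]) for w in self.words]

    def refine(self, field, predicate):
        # Associative operation: applied only to currently active words,
        # possibly reducing their number.
        self.active = [a and predicate(w[field])
                       for a, w in zip(self.active, self.words)]

    def assign_scalar(self, field, value):
        # Sequential source -> associative destination: broadcast to active words.
        for a, w in zip(self.active, self.words):
            if a:
                w[field] = value

    def read_first(self, field):
        # Associative source -> sequential destination: the first active word wins.
        for a, w in zip(self.active, self.words):
            if a:
                return w[field]
        return None

mem = AssociativeMemory([{"id": 1, "score": 40},
                         {"id": 2, "score": 75},
                         {"id": 3, "score": 90}])
mem.activate("score", lambda s: s > 50)    # words 2 and 3 become active
mem.refine("score", lambda s: s < 80)      # only word 2 remains active
mem.assign_scalar("score", 0)              # broadcast into the active word
print(mem.read_first("id"))                # prints 2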
7.5 PASCAL/A
Another language suggested for extension is PASCAL. By adding some associative concepts to that language, PASCAL/A is formed (Stuttgen, 1985). PASCAL/A has only one more data structure than standard PASCAL, and that is the table. The table is similar to the “relation” in database terminology, but it provides more flexibility in that row uniqueness is not mandated (although the programmer can provide row uniqueness in the table if he wishes). A table declaration contains fields called attributes, which describe the information contained within it.
Associative procedures in the PASCAL/A language operate on "active" rows in the table (the concept of "active" shows up often in associative languages). Generic instructions (such as emp.salary := emp.salary + 1000) operate associatively and in parallel on all currently active rows; the statement just given would add 1000 dollars to the salary field of every active row in the database. Active rows are those that have matched a selection criterion, as set forth in specially defined content-addressed statements for the language. For example, a statement such as
WHERE emp[salary < 20,000] DO salary := salary + 1000

would first search the database for employees currently earning less than 20,000 dollars, activating all the rows where this is true. Every row that matched this criterion would then have its salary increased by 1000 dollars.

The associative data structures (tables) in PASCAL/A are manipulated through several special statements. The insert procedure writes a row into the table. Tables can be read by either the retrieve (nondestructive read) or the readout (erases the row from the table after reading) procedure; in each case, some arbitrary row would be copied into a buffer for other processing. Finally, the delete statement would erase all the active rows in the database.

The PASCAL/A statements described above can be combined into powerful programs that query and modify complex databases. The author of the language suggests it would be especially strong in the areas of artificial intelligence, array processing, database systems, pattern recognition, numerical analysis, compilers, operating systems, and graph algorithms.
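The fragment below is a Python approximation of these ideas, not PASCAL/A itself; the procedure names follow the description above, and the in-place parallel update is, of course, only simulated sequentially.

# A toy PASCAL/A-style table: rows need not be unique, and operations apply
# to the rows activated by a content-based selection.
class Table:
    def __init__(self):
        self.rows = []
        self.active = []

    def insert(self, **attributes):
        self.rows.append(dict(attributes))
        self.active.append(False)

    def where(self, predicate):
        # Activate every row matching the selection criterion.
        self.active = [predicate(r) for r in self.rows]
        return self

    def do(self, update):
        # Apply the update to all active rows (conceptually in parallel).
        for a, r in zip(self.active, self.rows):
            if a:
                update(r)

    def readout(self):
        # Destructive read: return and erase an arbitrary active row.
        for i, a in enumerate(self.active):
            if a:
                del self.active[i]
                return self.rows.pop(i)
        return None

emp = Table()
emp.insert(name="Ann", salary=18000)
emp.insert(name="Bob", salary=25000)

# WHERE emp[salary < 20,000] DO salary := salary + 1000
emp.where(lambda r: r["salary"] < 20000).do(
    lambda r: r.update(salary=r["salary"] + 1000))
print(emp.rows)    # Ann now earns 19000; Bob is unchanged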
7.6 LUCAS Associative Processor

PASCAL was also chosen as the base language for the LUCAS associative processor (Fernstrom et al., 1986), previously described in terms of its hardware and architecture (see Section 6.2.6). PASCAL was chosen for this system over APL and FORTRAN because it is a structured language with powerful control constructs, has strong typing of variables, and offers excellent error detection at both compile and run time. The language PASCAL/L (as in PASCAL/LUCAS) adds several important extensions to standard PASCAL, including special constructions to allow parallel operations and special variable declarations for data that is allocated to the associative array. The two kinds of parallel variables are selectors and parallel arrays. Selector variables control the parallelism of operations on the PEs. This can be
used to operate simultaneously on a defined subset of the PEs, while excluding operation on the unselected subset. A parallel array variable describes a fixed number of elements, all of which have the same type characteristic (e.g., integer, Boolean, character, etc.). The parallel array can also restrict the operation to a subset of the total PEs. Assignment statements operate based upon the variable types on the left and right side of the assignment. Sequential to sequential assignments operate just as they do in standard PASCAL. If the left-hand side (destination) is a parallel variable and the right-hand side (source) is a sequential variable, every component reference in the associative array will be loaded with the scalar expression contained in the sequential variable. If the destination is a sequential variable and the source is a parallel expression, the parallel source must indicate just one element in the array (e.g., S:=P[5]), and that is transferred to the sequential variable. Finally, if the destination is a parallel variable and the source is a parallel expression, the referenced components on the left are loaded with their corresponding elements on the right. The control structure for PASCAL/L also accommodates parallel associative processing. As many as 128 PEs may be operated on in parallel in each statement. So a statement such as
WHERE <selector expression> DO <true-statement> ELSEWHERE <false-statement>

operates on both paths in parallel: the true clause is executed on one set of PEs using one set of data, while the false clause is executed on another set of PEs using another data set. The CASE statement operates on all the paths in parallel, using different data on the different PEs. There is also a WHILE AND WHERE statement, which repeats as long as the selector is true for any element, with the selector determining which PEs are active for any particular repetition.
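A rough Python sketch of this selector idea follows; it is not PASCAL/L, and the selector is simply modeled as a Boolean mask over a small array of per-PE values.

# A toy model of PASCAL/L selectors and parallel arrays. A selector is a
# Boolean mask over the processing elements; WHERE/ELSEWHERE applies one
# update to the selected PEs and another to the rest.
N_PE = 8                                   # LUCAS allowed up to 128 PEs
values = list(range(N_PE))                 # a parallel array, one element per PE

def where_elsewhere(data, selector, true_op, false_op):
    # Apply true_op to selected elements and false_op to the others.
    # On the real machine both paths proceed in parallel on disjoint PE sets.
    return [true_op(v) if s else false_op(v)
            for v, s in zip(data, selector)]

selector = [v % 2 == 0 for v in values]    # select the even-valued PEs
values = where_elsewhere(values, selector,
                         true_op=lambda v: v * 10,
                         false_op=lambda v: -v)
print(values)    # prints [0, -1, 20, -3, 40, -5, 60, -7]

7.7 The LEAP Language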
ALGOL was extended to include some associative concepts (such as associations and sets) by Feldman and Rovner (1969) to create their LEAP language. Their language aims at striking the right balance between ease of use and efficiency of execution. They provide an interesting view of RAM as a special form of CAM, with one field of the CAM reserved for the address of the word. By relaxing the need for a special address field, the CAM provides more flexible retrieval capability. However, fixed and static fields (direct association) provide no ability to express complex interrelationships in the data. If we look at the example of a telephone directory (often used to show the benefits of an associative memory), the drawback of direct association becomes obvious.
What if one person has two telephone numbers, or if one number must be associated with two people sharing an office? The LEAP language relies on the ordered-triple concept (already described in Section 5.2 as indirect association) to create a more useful associative description. Thus, the language syntax treats each association as the 3-tuple (a, o, v), representing the attribute, object, and value, respectively. A 3-tuple of this sort forms an association, and the items are the components of the association.

Four new data type declarators were added to the standard ALGOL language: item, itemvar, local, and set. An item is similar to a LISP atom. An item that is stored in a variable is called an itemvar. Items may also be members of a set, or be associated to form the 3-tuples described above. A LEAP expression can be used to create new items or associations (a construction expression), or to retrieve information about existing items or associations (a retrieval expression). Items are obtained during execution by using the function new. The count operator returns the number of elements in the specified set. The istriple predicate returns a value that indicates whether the specified argument is an association item. There are several set operators providing the standard set manipulations such as NOT, AND, and OR.

A few extra program statements are added to make ALGOL into LEAP. The put statement performs a union operation (e.g., put tom in sons will insert the item tom into the set sons), while the remove statement does the opposite (removes the element from a set). The delete statement destroys an item that was previously created. The make statement places the specified "triple" into the universe of associations, whereas the erase statement removes the association from that universe.

The most important addition in the LEAP language is the loop statement, exemplified by the foreach statement. This statement must perform its operation over a set of simultaneous associative equations in order to determine the loop variables. The best way to show how this works is by example. The expression we will work with is:

foreach father · x = bill do put x in sons

In this expression, father and bill are items, x is a local data type, and sons is a set. This expression would first determine the set of all items that match the condition that the father attribute is "bill." In other words, who are the people that have "bill" as their father? Each time the condition is met, the current value of "x" is added to the set "sons." More complex expressions can be created that use more local variables and include Boolean functions in the set search space. The LEAP data system was used in a more recent artificial intelligence language called SAIL (Feldman et al., 1972).
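The following Python sketch captures the spirit of this triple-based retrieval. It is not LEAP or ALGOL; the names make, erase, and foreach follow the description above, and pattern matching is reduced to a single free variable.

# A toy associative triple store in the spirit of LEAP: each association is an
# (attribute, object, value) 3-tuple, and foreach binds a free variable to
# every component that satisfies the pattern.
FREE = object()           # stands in for a LEAP "local" (free) variable

class TripleStore:
    def __init__(self):
        self.triples = set()

    def make(self, attribute, obj, value):
        # Place the association into the universe of associations.
        self.triples.add((attribute, obj, value))

    def erase(self, attribute, obj, value):
        self.triples.discard((attribute, obj, value))

    def foreach(self, attribute, obj, value):
        # Yield every binding of the single FREE component that matches.
        pattern = (attribute, obj, value)
        for actual in self.triples:
            if all(p is FREE or p == a for p, a in zip(pattern, actual)):
                bindings = [a for p, a in zip(pattern, actual) if p is FREE]
                if len(bindings) == 1:
                    yield bindings[0]

db = TripleStore()
db.make("father", "tom", "bill")
db.make("father", "joe", "bill")
db.make("father", "sam", "fred")

# foreach father . x = bill do put x in sons
sons = set()
for x in db.foreach("father", FREE, "bill"):
    sons.add(x)
print(sons)    # prints {'tom', 'joe'} (in some order)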
7.8 Software for CA Systems

The section on associative architectures introduced a computing system designed (and prototyped) by Blair and Denyer (1989). The system they envisioned has a careful matching between the software and its underlying hardware architecture; we provide more details on the software for that system in this section. Their architecture was called a "triplet," and contained a CAM-CPU-RAM combination. The CPU and RAM were viewed as a normal sequential (von Neumann) computer, and the CAM was connected to the CPU and accessed by an address group. Blair and Denyer chose (as did many before them) to extend an existing high-level language rather than create a new language from scratch, or count on an intelligent compiler to recognize when associative processing was possible. The C language was used as their starting point.

Before describing the language extensions, we will explain an important underlying concept in this content-addressable system. After an associative comparison is performed using the CAM, a "bag" of words is formed from the matching entries. This bag contains the group of words whose tags show them to be active (the bag might be empty if no matches were made). The field statement identifies which fields within the CAM are to be used, and what type of information they contain (char, int, etc.). The bag is defined by the reserved word define, which describes the comparison fields and specifies their values. The function can also return a value that specifies whether the pattern describes an empty bag.

The first operation type is the simple Boolean function empty, which is true if and only if the bag is empty. This can be used to determine when to stop looping (e.g., while (!empty)). The next operation returns the next value in the bag, and returns status to show when the bag is empty (there are no more values). The remove operation deletes an entry from the bag, and also returns similar status. Special language constructs are provided to perform common manipulations, such as loops that operate on each member of a bag. The foreach statement first defines the bag based upon a pattern, and loops (similar to a while and next combination) until the bag is empty (i.e., the entire bag has been operated upon). The fromeach loop defines a bag and then performs a remove operation on each member of the bag. The repeat with statement performs a new define on each iteration of the loop, so the members of each bag depend upon the operations of the last iteration.
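To illustrate the bag abstraction, here is a small Python sketch rather than the authors' actual C extensions; the operation names follow the description above, but their exact form in the real language may differ.

# A toy model of the CAM "bag": an associative comparison yields the bag of
# matching words, which the host program then consumes one word at a time.
class CAM:
    def __init__(self, words):
        self.words = list(words)       # each word is a dict of named fields

    def define(self, **pattern):
        # Associative comparison: return the bag of words matching the pattern.
        return [w for w in self.words
                if all(w.get(f) == v for f, v in pattern.items())]

def foreach(cam, **pattern):
    # Loop over each member of the bag, in the spirit of the foreach construct.
    bag = cam.define(**pattern)
    while bag:                         # the empty test
        yield bag.pop()                # next and remove combined

cam = CAM([{"tag": "free", "addr": 0},
           {"tag": "used", "addr": 1},
           {"tag": "free", "addr": 2}])
for word in foreach(cam, tag="free"):
    print("reclaiming word at address", word["addr"])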
7.9 Neural Network Software

Neural networks have been referred to several times in this chapter, and the underlying hardware for such a system is obviously significantly different from a normal sequential computer.
The programming of a neural network consists of modifying the connections between the primitive storage elements, and the systems created for this purpose have been called connectionist (Fahlman and Hinton, 1987). The programming of a connectionist computing system is more akin to teaching than to what is normally considered programming. The learning process is accomplished by entering initial input data and feeding the selected outputs back into the teaching inputs. The associative neural network decides which stored pattern most closely matches the input vector, and selects an output based upon the best match.

The programming of connectionist systems, still a topic of much research and controversial debate, is dependent upon the associated hardware. The information in the neural system can be stored as a local or a distributed representation. In a local representation, each discrete packet of data is localized to a section of the hardware. This is the easiest to program, since there is very little interaction between most of the neural nodes. It is also more familiar to most programmers, in that the individual chunks of information can be stored and validated without regard to the rest of the stored data. However, it allows many single points of failure to exist: if the local area used to store a piece of information is broken, that information is no longer available for recall. A distributed representation completely spreads the data throughout the available hardware. In this scheme, every neural node in the processing structure is potentially activated when any data is input. This eliminates any single point of failure, and is the most reliable associative system possible in terms of hardware. The disadvantage of the completely distributed representation is the programming and validation obstacle it presents to a software engineer. Since every execution and storage unit is conceivably involved with every recall attempt, unexpected connections can influence the decision process.

Languages that can be used to describe the operations generally performed by neural networks have been called neurosoftware (Hecht-Nielsen, 1990). The goal of neurosoftware is to free the programmer from having to deal with the underlying mechanisms involved in storage and retrieval. In other words, let the hardware (or some firmware/operating system combination) map the programmer's conceptual statements into meaningful execution routines, freeing the user from tedious and difficult housekeeping chores. Let the programmer describe what associative operations need to be performed at some high level. This theme is seen repeatedly in the examples of software written for associative processing systems. The following description shows how it applies to neural network software.

It is assumed here that there is a traditional address-based computer acting as the user interface to the neural network.
Since most problems have sections that are best handled sequentially, this approach seems to provide the most efficient use of scarce resources. The first facility needed in this scenario is some kind of network load command, which takes a description of the network and transfers it to the underlying network hardware. Once the network description is loaded, an instruction is required to define the run-time constants such as learning rates and thresholds. An instruction to define the initial state of each processing element is necessary, and there must be another instruction to provide a new input value (and its weight) to the processing elements. Another instruction should be included to monitor the current state of each processing element.

Once the network has been initialized, an instruction to actually run the system must be included; this would cause the neural network to activate the underlying hardware and perform its execution. After the execution is done (or after some amount of time, if the network runs continuously), an instruction to save the state of the network is necessary. This saved state can be used to restore the network at some later date. These primitive instructions can be used to completely control the neural network under program control of the host (traditional) computer. An example of a general-purpose neural network description language is AXON. This language is described in detail in Robert Hecht-Nielsen's (1990) book on neurocomputing.
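A host-side control sequence of this kind might look like the Python skeleton below. The function names are illustrative placeholders only; they are not part of AXON or of any particular neurocomputer interface.

# A skeleton of the host-side control primitives described above. Every method
# is a stand-in for whatever the real neurocomputer interface provides.
class NeuralNetworkHost:
    def load_network(self, description):
        # Transfer a network description to the underlying hardware.
        self.description = description

    def set_constants(self, learning_rate, threshold):
        self.learning_rate = learning_rate
        self.threshold = threshold

    def init_state(self, initial_states):
        self.states = list(initial_states)

    def present_input(self, values, weight=1.0):
        self.inputs = [v * weight for v in values]

    def read_state(self):
        # Monitor the current state of each processing element.
        return list(self.states)

    def run(self, steps):
        # Stand-in for starting execution on the neural hardware.
        for _ in range(steps):
            pass                       # the real hardware would update states here

    def save_state(self):
        # Snapshot that could later be used to restore the network.
        return {"states": list(self.states), "description": self.description}

host = NeuralNetworkHost()
host.load_network("three-layer demo net")
host.set_constants(learning_rate=0.1, threshold=0.5)
host.init_state([0.0, 0.0, 0.0])
host.present_input([1.0, 0.0, 1.0])
host.run(steps=10)
checkpoint = host.save_state()

8. Conclusion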
This chapter has given a broad overview of content-addressable and associative systems. Important terms were defined, associative concepts were explained, and recent examples were provided in this rapidly progressing area of data processing and retrieval by content. In this chapter we concentrated on providing information on the most recent advances in content-addressable and associative systems. We conclude this chapter with information that places our article in historical context and shows the vast amount of research that has been lavished on the subject: there have been a significant number of major reviews of the topic during the last 30 years, and they are described briefly here. Most of these reviews contain a bibliography of their own and provide additional, older references. Finally, we conclude with our thoughts on what we believe the next few years will bring in this area of intelligent memory systems.

8.1 Additional References
The first comprehensive survey of CAMs and associative memories was by Hanlon (1966).
The CAM had been around for about 10 years at the time, and his motivation was to summarize previous research and suggest interesting areas for further development. Hanlon's survey described the concepts of this emerging field, and provided an excellent state-of-the-art (for 1966) tour of the topic. This included some details on the materials and architectures considered promising at the time. That same year the Advances in Computers series published a chapter by Murtha (1966) that discussed highly parallel information processing systems. Although the chapter was not entirely dedicated to the subject of associative systems, there was a good discussion of associative processors and their ramifications.

The next major review of the research literature was done by Minker (1971). His paper was mostly a comprehensive bibliography with a very brief description of some interesting developments since the Hanlon survey. As with the Hanlon paper, Minker listed some applications of associative memories, as well as a few interesting implementation materials and memory organizations. He concluded that, as of 1971, "associative memory hardware technology has not yet come of age."

Parhami (1973) was the next major reviewer of the subject. His primary thesis was that "associative processing is an important concept that can be employed to enhance the performance of special-purpose and general-purpose computers of the future." His article was not a tutorial, but rather a newer survey of associative processing techniques with a new bibliography for those interested in reading about it all first hand. His report described the architectural concepts inherent to associative storage and processing, detailed some interesting hardware implementations, briefly touched upon a few software considerations, and (as usual) provided some potential applications.

P. Bruce Berra (1974) provided a discussion of associative processors and their application to database management in his presentation at the 1974 AFIPS National Computer Conference and Exposition. He discussed most of the implementations attempted to that time, and showed the advantages and disadvantages of the associative approach in such applications. The ACM journal Computing Surveys published an article by Thurber and Wald (1975) that discussed associative and parallel processors. This article presented an excellent genealogy of associative SIMD machines, then went on to discuss at some length associative processors and their design issues and trade-offs. Several actual machines were highlighted.

Two major books were published in 1976 that covered the topic of content-addressable systems. Foster (1976) dealt with the subject of content-addressable parallel processors. His book discussed the basics of content-addressable computers, included some useful algorithms for such machines, detailed several applications, presented some CAM hardware, and described the STARAN associative system in some detail.
In that same year, Thurber (1976) published a book about large-scale parallel and associative computers. This book dealt with similar subject matter to the 1975 Thurber and Wald report, but was able to provide more details on the associative computers mentioned. In 1977 Yau and Fung surveyed associative processors for ACM Computing Surveys. During 1979, both IEEE Computer (1979a) and IEEE Transactions on Computers (1979b) featured special issues on database machines; each issue was dedicated to articles describing hardware and software for database applications.

Kohonen (1987) put the subject all together in his 1980 book on content-addressable memories (updated to briefly survey new information in 1987). He attempted to include a complete description of the field by "presenting most of the relevant results in a systematic form." His book included information about CAM concepts, CAM hardware, and content-addressable processors. In 1985 Stuttgen (1985) provided a review of associative memories and processors as part of his book on hierarchical associative processing systems. The review section listed a number of different taxonomies for associative systems, including his own view. He then discussed several different past architectures in a way that allowed direct comparison of their benefits and drawbacks. Also in 1985, Lea wrote a chapter called "Associative Processing" for the book Advanced Digital Information Systems (Lea, 1985a). The 1987 proceedings of COMPEURO contained an article by Waldschmidt (1987) that summarized the fields of associative processors and memories. In 1988, Su dedicated a chapter of his book on database computers to associative memory systems. The chapter presents an excellent overview of the topic, with descriptions of some major content-addressable architectures and their application to database management systems.

Zeidler reviewed the topic of content-addressable mass memories (Zeidler, 1989) in his 1989 report. Mass memories are defined as those having large storage capacities (gigabytes), and are targeted for use in database and information systems. His paper was one of several in a special issue of the IEE Proceedings (1989) that concentrated on associative processors and memories.

There are numerous other papers and books on the subject, including an earlier paper by the current authors (Chisvin and Duckworth, 1989). Many of them are mentioned in this report in reference to specific associative concepts. We have attempted to concentrate on recent developments in this chapter, and refer to older references only where they provide the classical description of some aspect of associative computing. We believe that the references above provide a reasonable historical review of the topic.
8.2 The Future of Content and Associative Memory Techniques
The concepts and techniques of content-addressable and associative systems, already making their appearance in the commercial world, will become more important in time. This will happen as the technology used to build the devices reduces the size and cost of the final system, and as more people become familiar with the systems thus created. The development of inherently fault-tolerant CAM devices should help to produce very large devices, and the availability of optically based CAMs within a few years seems an exciting possibility.

We seem to be at an interesting crossroads in this field of intelligent or smart memory systems. The technology is now available to implement devices of reasonable size. However, the open question is whether enough semiconductor manufacturers will support and produce general-purpose devices in sufficient volume that system designers will consider using them. It is the classic chicken-and-egg situation: engineers will not incorporate new parts into their products unless they are well supported and second sourced by at least one other manufacturer; on the other hand, manufacturers will not commit to an expensive introduction of a major new part unless they perceive that a sizable market for that part is available. As with any new and exciting field of knowledge, the success of these systems will also depend on the availability of bright, motivated people to program and apply them to both current problems and problems not yet imagined.
Acknowledgments
The motivation for this work started at the University of Nottingham in England with the MUSE project (Brailsford and Duckworth, 1985). This project involved the design of a structured parallel processing system using a mixture of control and data flow techniques. The use of CAM to improve the performance of the machine was investigated by a research student who demonstrated the potential of this approach (Lee, 1987); we acknowledge his contributions to this field. We thank Worcester Polytechnic Institute for providing the resources to produce this chapter. We would also like to thank Gary Styskal, an M.S. student at Worcester Polytechnic Institute, who performed a literature and product search of CAM architectures and devices. This chapter was based on a report published in IEEE Computer (Chisvin and Duckworth, 1989), and we wish to thank the Institute of Electrical and Electronics Engineers (IEEE) for permission to use that material in this chapter.
REFERENCES (a). Coherent Processor 4,096 Element Associative Processor. Data Sheet, Coherent Research, East Syracuse, New York. (1979a). IEEE Computer 12(3). ( 1979b). IEEE Transactions on Computers C-28(6). (1986a). Memory Update for Computers. New Scientist 109 (1492), 36. (l989b). SMC4k-GPX A General Purpose IBM PCjAT Add-on Content Addressable Memory Board. Data Sheet, Summit Microsystems Corporation, Sunnyvale, California. (1 989c). Special Section on Associative Processors and Memories. IEE ProceedinRs, Parf E 136(5), 341-399. (1989a). Special Issue on Neural Networks. IEEE Microsystems. (1990a). Am99C10A 256 x 48 Content Addressable Memory. Publication no. 08125, Advanced Micro Devices, Sunnyvale, California. (1990b). CRC32256 CMOS Associative Processor with 256 x 36 Static Content Addressable Memory. Coherent Research, Syracuse, New York. (1991a). MU9C1480 LANCAM. Data Sheet, MUSIC Semiconductors, Colorado Springs, Colorado. Almasi, G . S., and Gottlieb, A. (1989). Highly Parallel Computing. Bejamin/Cummings, Redwood City, California. Backus, J. (1978). Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. Communications of the A C M , 21(8), 613-641. Batcher. K. E. (1974). STARAN Parallel Processor System Hardware. Proceedings of AFIPS N C C 43, 405-410. Bergh, H.. Eneland, J., and Lundstrom, L.-E. (1990). A Fault-Tolerant Associative Memory with High-speed Operation. IEEE Journal of Solid-State Circuits 25(4), 912 919. Berkovich, S. Y . (198 1). Modelling of Large-Scale Markov Chains with Associative Pipelining. Proceedings 1981 International Conference on Parallel Processing, I3 I - 132. Berra, P. B. (1974). Some Problems in Associative Processor Applications to Data Base Management. Proceedings of AFIPS N C C 43, 1-5. Berra, P. B., and Troullinos, N. B. (1987a). Optical Techniques and Data/Knowledge Base Machines. IEEE Compufer 20( lo), 59-70. Berra, P. B., Chung, S. M., and Hachem, N. I. (1987b). Computer Architecture for a Surrogate File to a Very Large DatajKnowledge Base. IEEE Computer 20(3), 25 -32. Berra, P. B., Brenner, K.-H., Cathey, W. T., Caulfield, H. J., Lee, S. H., and Szu, H. (1990). Optical Database/Knowledgebase Machines. 29(2), 195-205. Bic, L., and Gilbert, J. P. (1986). Learning from AI: New Trends in Database Technology. IEEE Computer 19(3), 44-54. Blair, G . M. (1987). A Content Addressable Memory with a Fault-Tolerance Mechanism. IEEE Journal of Solid-State Circuits SC-22(4), 614-61 6. Blair, G. M., and Denyer, P. B. (1989). Content Addressability: An Exercise in the Semantic Matching of Hardware and Software Design. IEE Proceedings, Port E 136(1), 41 47. Boahen, K . A,, Pouliquen, P. O., Andreau, A. G., and Jenkins, R. E. (1989). A Heteroassociative Memory Using Current-Mode MOS Analog VLSI Circuits. IEEE Transactions on Circuits and Systems 36(5), 747 155. Bonar, J. G., and Levitan, S. P. (1981). Real-Time LISP Using Content Addressable Memory. Proceedings 1981 International Conference on Parallel Processing, 112- 1 19. Brailsford, D. F., and Duckworth, R. J. (1985). The MUSE Machine-An Architecture for Structured Data Flow Computation. New Generation Computing 3, I8 1- 195, OHMSHA Ltd., Japan.
Brown. C. (May 13, 1991). Chip Doubles as Data Cruncher. Elrctronic Engineering Times 43, 46. Burnard, L. (1987). CAFS: A New Solution of an Old Problem. LiteraryandLinguistic Computing 2(1), 7 12. Rursky, D. ( 1988). Content-Addressable Memory Does Fast Matching. Electronic Design 36(27), 119 121. Chae, %-I.. Walker, T., Fu, C.-C., and Pease, R. F. (1988). Content-Addressable Memory for VLSI Pattern Inspection. IEEE Journal of’ Solid-state Circuits 23( I). 74--78. Cherri, A. K., and Karim, M. A. (1988). Modified-Signed Digit Arithmetic Using an Efficient Symbolic Substitution. Applied Optics 27( 18), 3824 3827. Chisvin, L., and Duckworth, J. (1989). Content-Addressable and Associative Memory: Alternatives to the Ubiquitous RAM. IEEE Computer 22(7), 51-64. Chu, Y., and Itano, K. (1985). Execution in a Parallel, Associative Prolog Machine. Technical Report TR-147 I , University of Maryland, College Park. Cordonnier, V., and Moussu, L. (1981). The M.A.P. Project: An Associative Processor for Speech Processing. Proceedings 1981 International Conjcrence on Parallel Processing, 120128. Davis, E. W. (1974). STARAN Parallel Processor System Software. Proceedings qf AFIPS NCC 43, 11-22. Deeegdma, A. L. (1989). The Technology of Parallel Proccssing-Volume I . Prentice-Hall, Englewood Clifs, New Jersey. Eichmann, G., and Kasparis, T. (1989). Pattern Classification Using a Linear Associative Memory. Pattern Recognition 22(6), 733 740. Fahlman, S. E., and Hinton, G. E. (1987). Connectionist Architectures for Artificial Intelligence. IEEE Cornputer 20( I ) , 100 109. Farhat, N. H. (1989). Optoelectronic Neural Networks and Learning Machines. IEEE Circuits and Devices, 32-~41. Feldman, J. A,, and Rovncr, P. D. (1969). An Algol-Based Associative Language. Communicafions of the ACM 12(8), 439 449. Feldman, J . A,, Low, J. R., Swinehart, D. C., and Taylor, R. H. (1972). Recent Developments in SAIL-An Algol-Based Language for Artificial Intelligence. Proceedings uf AFIPS FJCC, 41, Part 11, 1193--1202. Feldman, J. D., and Fulmer. L. C. (1974). RADCAP-AN Operational Parallel Processing Facility. Proceedings of’ A F I P S N C C 43, 7-1 5. Fernstrom, C., Kruzela, I., and Svensson, B. ( 1986). “LUCAS Associative Array Processor.” Springer-Verlag, Berlin. Flynn, M. J. Some Computer Organizations and Their Effectiveness. IEEE Transactions on Computers C-21(9), 948-960 (September 1972). Foster, C. C. ( 1976). “Content Addressable Parallel Processors.” Van Nostrand Reinhold C:ompany, New York. Gardner, W. D. Neural Nets Get Practical. High Performance Systems, 68-72. Gillenson, M. L. (1987). The Duality of Database Structures and Design Techniques. Communications of’the A C M 30(12), 1056---1065. Gillenson, M. L. (1990). “Database Design and Performance.” In “Advances in Computers”Volume 30, pp. 39 83. Goksel, A. K., Krambeck, R. H., Thomas, P. P., Tsay, M.-S., Chen, C. T., Clemens, D. G., LaRocca, F. D., and Mai, L.-P. (1989). A Content-Addressable Memory Management Unit with On-Chip Data Cache. IEEE Journal of’ Solid-State Circuits 24( 3), 592-596. Goser, K., Hilleringmann, U., Rueckert, U., and Schumacher, K. (1989). VLSI Technologies al Neural Networks. IEEE Micru, 28-44.
Grabec, I., and Sachse, W. (1989). Experimental Characterization of Ultrasonic Phenomena by a Learning System. Journal of’Applied Physics 66(9), 3993-4000. Graf, H. P., Jackel, L. D., and Hubbard, W. E. (1988). VLSI Implementation of a Neural Network Model. IEEE Computer 21(3), 41L49. Grosspietsch, K. E., Huber, H., and Muller, A. (1986). The Concept of a Fault-Tolerant and Easily-Testable Associative Memory. FTCS-16, Digest of Papers, The 16th Annual International Symposium on Fuult-Tolerant Computing Systems, 34- 39. Grosspietsch, K. E., Huber, H., and Muller, A. (1987). The VLSI Implementation of a FaultTolerant and Easily-Testable Associative Memory. Proceedings of Compeuro ’87, 47 50. Grosspietsch, K. E. (1989). Architectures for Testability and Fault Tolerance in ContentAddressable Systems. TEE Proceedings, Part E, 136(5), 366-373. Curd, J. R., Kirkham, C. C., and Watson, I. (1985). The Manchester Prototype Dataflow Computer. Communications of’the ACM 28( I ) , 34-52. Hamming, R. W. (1 980). “Coding and Information Theory.” Prentice-Hall, Englewood Cliffs, New Jersey. Hanlon, A. G. (1966). Content-Addressable and Associative Memory Systems. IEEE Transactions on Electronic Computers EC-15(4), 509-521. Hashizume, M., Yamamoto, H., Tamesadd, T., and Hanibuti, T. (1989). Evaluation of a Retrieval System Using Content Addressable Memory. Systems and Computers in Japan 20(7), 1-9. Hecht-Nielsen, R. (1990). “Neurocomputing.” Addison-Wesley, Reading, Massachusetts. Hermann, F. P., Keast, C. L.. Ishio, K., Wade, J . P., and Sodini, C. G. A Dynamic ThreeState Memory Cell for High-Density Associative Processors. IEEE Journal of Solid-Stare Circuits 26(4), 537-541. Hirata, M., Yamada, H., Nagai, H., and Takahashi, K. (1988). A Versatile Data-String-Search VLSI. IEEE Journal of Solid-State Circuits 23(2), 329- 335. Holbrook, R. (1988). New RDBMS Dispel Doubts. Perform OLTP Applications. Computer Technology Review 8(6), 1I - 15. Hurson, A. R., Miller, L. L., Pakzad, S. H., Eich, M. H., and Shirazi, B. (1989). Parallel Architectures for Database Systems. In “Advances in Computers”-Volume 28, pp. 107151. Jones, S. (1988). Design, Selection and Implementation of a Content-Addressable Memory for a VLSI CMOS Chip Architecture. IEE Proceedings Part E. Computers and Digital Techniques 135(3), 165 172. Kadota, H., Miyake, J., Nishimichi, Y., Kudoh, H., and Kagawa, K. (1985). An 8-kbit Content-Addressable and Reentrant Memory. IEEE Jotrrnal of Solid-State Circuits SC-20(5), 951-957. Kartashev, S. P., and Kartashev, S. I. (1984). Memory Allocations for Multiprocessor Systems That Incorporate Content-Addressable Memories. IEEE Transactions on Computers C-33( I), pp. 28 ~ 4 4 . Knuth, D. E. (1973). “The Art of Computer Programming-Volume 3: Sorting and Searching.” Addison-Wesley, Reading, Massachusetts. Kogge, P., Oldfield, J., Brule, M., and Stormon, C. (1988). VLSI and Rule-Based Systems, Computer Archirecrure News 16(5), 52 65. Kohonen, T. (1977). “Associative Memories: A System-Theoretical Approach.” Springer-Verlag, New York. Kohonen, T., Oja, E., and Lehtio, P. (1981). Storage and Processing of Information in Distributed Associative Memory Systems. In “Parallel Models of Associative Memory” (Anderson, J. A,, ed.), pp. 105-143. Lawrence Erlbaum, Hillsdale, New Jersey. Kohonen, T. ( 1987). “Content-Addressable Memories.” Springer-Verlag, New York.
Lea, R M. (1975). Information Processing with an Associative Parallel Processor. IEEE Computer, 25-32. Lea, R. M. (1985a). Associative Processing. In “Advanced Digital Information Systems” (Aleksander, I., ed.), pp. 531-585. Prentice Hall, New York. Lea, R. M. (1986b). VLSI and WSI Associative String Processors for Cost-Effective Parallel Processing, The Computer Journal, 29(6), 486-494. Lea, R. M. (1986~).VLSI and WSI String Processors for Structured Data Processing. IEE Proceedings, Part E 133(3), 153-161. Lea, R. M. (19x64. SCAPE: A Single-Chip Array Processing Element for Signal and Image Processing, IEE Proceedings, Pt. E 133(3), 145-151. Lea, R. M. (1988a).The ASP, A Fault-Tolerant VLSI/ULSI/WSI Associative String Processor for Cost-Effective Systolic Processing. Proceeding3 1988 IEEE Internastional Conference on Systolic, Arrays, 5 15-524. Lea, R. M. (1988b). ASP: A Cost-Effective Parallel Microcomputer. IEEE Micro 8(5), 10-29. Lee, C. Y. (1962). Intercommunicating Cells, Basis for a Distributed Logic Computer. FJCC 22, 130 136. Lee, D. L.. and Lochovsky, F. H. (1990). HYTREM-A Hybrid Text-Retrieval Machine for Large Databases. IEEE Transactions on Computers 39( I), 111-123. Lee, J . S. J., and Lin, C. (1988). A Pipeline Architecture for Real-Time Connected Components Laheling. Proceedings ojthe S P l E 1004, 195 201. Lee, W.-I-’.(1987). Thc Development of Associative Memory for Advanced Computer System, M.Phil. Thesis, IJniversity of Nottingham. Lerncr, E. J. (1987). Connections: Associative Memory for Computers. Aerospace America, 12-13. Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Muguzine 4(2), 4 21. Mazumder, P., and Patel, J. H. (1987). Methodologies for Testing Embedded Content Addressable Memories. FTCS-17, Digex! of Pupers, Tire 17th Intrrnurional Symposium on FaultTolerant Gompuling, 201-275. McAuley, A. J., and Cotton, C. J. (1991). A Self-Testing Reconfigurable CAM. IEEE Journal qf Soliii-State Circuits 26(3), 257-261. McGregor, D., McInnes, S., and Henning, M. (1987). An Architecture for Associative Processing of Large Knowledge Bases (LKBs). The Computer Journal 30(5), 404-412. Minker, J. (1971 ). An Overview of Associative or Content-Addressable Memory Systems and a KWIC Index to the Literature: 1956 1970. A C M Computing Reviews 12(10), 453-504. Mirsalehi, M. M., and Gaylord, T. K. (1986). Truth-Table Look-Up Parallel Data Processing Using A n Optical Content-Addressable Memory. Applied Optics 25( 14), 2277-2283. Morisue, M., Kaneko, M., and Hosoya, H. (1987). A Content-Addressable Memory Circuit Using Josephson Junctions. Transactions on Mugnetics MAG-23(2), 743-746. Motomura. M.. Toyoura, J., Hirdta, K., Ooka, H., Yamada, H., and Enomoto, T. (1990). A I .2-Million Transistor, 3-MHz, 20-b Dictionary Search Processor (DISP) ULSI with a 160kb CAM. IEEE Journal of Solid-State Circuits 25(5), 1158-1165. Murdocca, M., Hall, J., Levy, S., and Smith, D. (1989). Proposal for an Optical Content Addressable Memory. Optical Computing 1989 Technical Digesst Series 9, 210 213. Murray, J. P. ( 1990). The Trade-offs in Neural-Net Implementations. High Performance Systems, 74 78. Murtha, J. C. (1966). Highly Parallel Information Processing Systems. In “Advances in Computers”-Volume 7, pp. 2--116. Academic Press, New York. Naganuma. J.. Ogura, T., Yamada, S., and Kimura, T. (1988). High-speed CAM-Based Architecture for a Prolog Machine (ASCA). IEEE Transactions on Computers 37( l l ) , 1375-1383.
Nakamura, K. (1984). Associative Concurrent Evaluation of Logic Programs. Journal ofLogic Programming 1(4), 285-295. Ng, Y. H., and Glover, R. J. (1987). The Basic Memory Support for Functional Languages. Proceedings of COMPEURO ‘87, 35 40. Ogura, T., Yamada, S., and Nikaido, T. (1985). A 4-kbit Associative Memory LSI. IEEE Journal of Solid-State Circuits SC-20(6), 1277-1282. Oura, T., Yamada, S., and Yamada. J. (1986). A 20kb CMOS Associative Memory LSI for Artificial Intelligence Machines. Proceedings IEE International Conference on Computer Design: VLSI in Compulers, 574-571. Oldfield, J. V. (1986). Logic Programs and an Experimental Architecture for their Execution. IEE Proceedings, Part I133(3), 123-127. Oldfield, J. V., Williams, R. D., and Wiseman, N. E. (1987a). Content-Addressable Memories for Storing and Processing Recursively Subdivided Images and Trees. Electronics Letters 23(6), 262. Oldfield, J. V., Stormon, C. D., and Brule, M. (1987b). The Application of VLSI Contentaddressable Memories to the Acceleration of Logic Programming Systems. Proceedings of COMPEURO ’87, 27-30. Papachristou, C. H. ( 1987). Associative Table Lookup Processing for Multioperand Residue Arithmetic. Journal of the ACM 34(2), 376-396. Parhami, B. (1 972). A Highly Parallel Computing System for Information Retrieval. Proceedings of AFIPS FJCC 41(Part 11), 681-690. Parhami, B. (1973). Associative Memories and Processors: An Overview and Selected Bibliography. Proceedings of the IEEE 61(6), 722-730. Parhami, B. (1989). Optimal Number of Disc Clock Tracks for Block-Oriented Rotating Associative Processors. IEE Proceedings, Part E, 136(6), 535-538. Patterson, W. W. (1974). Some Thoughts on Associative Processing Language. Proceedings of AFIPS NCC 43, 23-26. Pfister, G. F., and Norton, V. A. (1985). Hot Spot Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers C-34( lo), 943-948. Potter, J . L. (1988). Data Struclures for Associative Supercomputers. Proceedings 2nd Symposium on the Frontiers of Massively Parallel Computations, 77-84. Ribeiro, J . C. (1988). “CAMOAndOr: An Implementation of Logic Programming Exploring Coarse and Fine Grain Parallelism.” CASE Center Technical Report No. 88 15, Syracuse University, Syracuse, New York. Ribeiro, J. C. D. F., Stormon, C. D., Oldfield, J. V., and Brule, M. R. (1989). ContentAddressable Memories Applied to Execution of Logic Programs. IEE Proceedings, Part E 136(5), 383 388. Savitt, D. A,, Love, H. H., Jr., and Troop, R. E. (1967). ASP: A New Concept in Language and Machine Organization. Proceedings of AFIPS SJCC 30, 87-102. Shin, H., and Malek, M. (1985a). Parallel Garbage Collection with Associative Tag. Proceedings 1985 International Conference on Parallel Processing, 369-375. Shin, H., and Malek, M. (1985b). A Boolean Content Addressable Memory and Its Applications. Proceedings of the IEEE 73(6), 1142-1 144. Shu, D., Chow, L.-W., Nash, J. G., and Weems, C. (1988). A Content Addressable, Bit-Serial Associative Processor. VLSI Signal Processing 111, 120-128. da Silva, J. G. D., and Watson, I. (1983). Pseudo-Associative Store with Hardware Hashing, IEE Proceedings, Part E 130(1), 1 9 24. ~ Slade, A. E., and McMahon, H. 0. (1956). A Cryotron Catalog Memory System. Proceedings UfEJCC, 115 120. Slotnick, D. L. (1970). Logic per Track Devices. Advances in Computers 10, 291 -296.
Smith, D. C. P., and Smith, J. M. (1979). “Relational Database Machines.” IEEE Computer 12(3), 28 38. Snyder, W. E., and Savage, C. D. (1982). Content-Addressable Read/Write Memories for Image Analysis. IEEE Transactions on Cornpulers C-31( lo), 963~-968. Sodini, C., Zippel, R,, Wade, J., Tsai, C., Reif, R., Osler, P., and Early, K . The MIT Database Accelerator: A Novel Content Addressable Memory. WESCON/86 Conference Record 1214, 1-6. Stone, H. S . ( 1990). “High Performance Computer Architecture.” Addison-Wesley, Reading, Massachusetts. Stormon, C. D. (1989). “The Coherent Processor. An Associative Processor for A1 and Database.” Technical Report, Coherent Research, Syracuse, New York. Stuttgen, H. J. ( 1985). “A Hicrarchical Associative Processing System.” Springer-Verlag, Berlin. Su, S. Y . W. (1988). “Database Computers: Principals, Architectures, and Techniques,” pp, 180-225. McGraw-Hill, New York. Suzuki, K., and Ohtsuki, T. (1990). CAM-Based Hardware Engine for Geometrical Problems in VLSI Design. Electronics and Communications in Japan, Part 3 (Fundamental Electronic Science) 73(3), 57- 67. Takata, H., Komuri, S., Tamura, T., Asdi, F., Satoh, H., Ohno, T., Tokudua, T., Nishikawa, H., and Terada. H. (1990). A 100-Mega-Access per Second Matching Memory for a DataDriven Miroprocessor. IEEE Journul of Solid-State Circuits 25( I), 95-99. Tavangarian, D. (1989). Flag-Algebra: A New Concept for the Realisation of Fully Parallel Associative Architectures. IEE Proceedings, Part E 136(5), 357-365. Thurber, K. J., and Wald, L. D. (1975). Associative and Parallel Processors. ACM Computing Surveys 7(4), 21 5-255. Thurber, K. J. ( 1976). “Large Scale Computer Architecture: Parallel and Associative Processors.” Hayden, Rochelle Park, New Jersey. l’releaven, P., Pdcheco, M., and Vellasco, M. (1989). VLSI Architectures for Neural Networks. IEEE Micro, 8-27. Uvieghara, G . A,. Nakagome, Y., Jeong, D.-K., and Hodges, D. A. (1990). An On-Chip Smart Memory for a Data-Flow CPU. IEEE Journcrl of Solid-State Circuits 25(1), 84 94. Verleysen, M., Sirletti, B., Vandemeulebroecke, A,, and Jespers, P. G . A. (1989a). A HighStorage Capacity Content-Addressable Memory and Its Learning Algorithm. IEEE Transactions on Circuits und Systems 36(5), 762 766. Verleysen, M., Sirletti, B., Vandemeulebroecke, A,, and Jespers, P. G. A. (1989b). Neural Networks for I ligh-Storage Content-Addressable Memory: VLSI Circuit and Learning Algorithm. IEEE Journul af Solid-Stnte Circuits 24(3), 562-569. Wade, J . P., and Sodini, C. G. (1987). Dynamic Cross-Coupled Bit-Line Content Addressable Journal of Solid-State Circuits SC-22( I ) , 119Memory Cell for High Density Arrays. I 121. Wade, J. P., and Sodini, C. G. (1989). A Ternary Content Addressable Search Engine. IEEE Journul oj’ Solirl-Stcite Circuits 24(4), 1003 1013. Waldschmidt, K. (1987). Associative Processors and Memories: Overview and Current Status. Proceedings of COMPEURO ’87, 19 26. Wallis, I.. (1984). Associative Memory Calls on the Talents of Systolic Array Chip. Electronic Design. 217 226. Wayner, P. (1991). Smart Memories. Byte 16(3), 147-152. Wheel, L. (1990). LAN Controller Goes Sonic. Elecfronic Engineering Times 571, 25, 30. White, H. J., Aldridge, N. B., and Lindsay, I. (1988). Digital and Analogue Holographic Associative Memories. optical Engineering 27( I), 30 37.
Wilnai, D., and Amitai, Z. (1990). Speed LAN-Address Filtering with CAMs. Electronic Design, 75-78.
Wu, C. T., and Burkhard, W. A. (1987). Associative Searching in Multiple Storage Units. ACM Transactions on Database Systems 12(1), 38-64.
Yamada, H., Hirata, M., Nagai, H., and Takahashi, K. (1987). A High-speed String Search Engine. IEEE Journal of Solid-State Circuits 22(5), 829-834.
Yasuura, H., Tsujimoto, T., and Tamaru, K. (1988). Parallel Exhaustive Search for Several NP-complete Problems Using Content Addressable Memories. Proceedings of 1988 IEEE International Symposium on Circuits and Systems 1, 333-336.
Yau, S. S., and Fung, H. S. (1977). Associative Processor Architecture-A Survey. ACM Computing Surveys 9(1), 3-27.
Zeidler, H. Ch. (1989). Content-Addressable Mass Memories. IEE Proceedings, Part E 136(5), 351-356.
Image Database Management

WILLIAM I. GROSKY
Computer Science Department
Wayne State University
Detroit, Michigan
RAJIV MEHROTRA
Computer Science Department
Center for Robotics and Manufacturing Systems
University of Kentucky
Lexington, Kentucky

1. Introduction
2. Image Database Management System Architecture
   2.1 Classical Database Architecture
   2.2 Classical Data Models
   2.3 A Generic Image Database Architecture
   2.4 A Generic Image Data Model
3. Some Example Image Database Management Systems
   3.1 First-Generation Systems
   3.2 Second-Generation Systems
   3.3 Third-Generation Systems
4. Similarity Retrieval in Image Database Systems
   4.1 Shape Similarity-Based Retrieval
   4.2 Spatial Relationship-Based Retrieval
5. Conclusions
Acknowledgments
References and Bibliography
1. Introduction
Contemporary database management systems are devised to give users a seamless and transparent view into the data landscape being managed. Such programs give users the illusion that their view of the data corresponds to the way that it is actually internally represented, as if they were the only users of the software. Originally developed for data processing applications in a business environment, database management systems have recently attracted much interest in the database community as vehicles for such nonstandard data as graphics (CAD/CAM in a manufacturing environment), maps (geographic information systems), statistics (scientific and experimental data management), rules (deductive databases and expert systems), images, video, and audio (image, document, and multimedia databases), as well as their various combinations.
Much of the initial impetus for the development of such nonstandard databases originated in the scientific community concerned with the type of data that was to be managed. In this survey chapter, we hope to convey an appreciation for the continuing development of the field of image databases.

The initial impetus for image databases originated with the image interpretation community. Most of the proposals from this community, however, were quite narrowly conceived and hence, after a brief flurry of activity in the late 1970s and early to mid-1980s, interest in this activity decreased drastically, even resulting in the dropping of this area from the title of the IEEE-sponsored workshop previously known as the Workshop on Computer Architecture for Pattern Analysis and Image Database Management. It is now known as the Conference on Computer Architecture for Pattern Analysis and Machine Intelligence. In our opinion, interest could not be sustained in this area due to its unsophisticated conception. The image interpretation community, or more accurately for the time, the pattern recognition and image processing community, conceived of an image database management system as just a way of managing images for image algorithm development test beds. Images were retrieved based on information in header files, which contained only textual information. At this time, the database community largely ignored such nonstandard applications due, we believe, to the unsophisticated nature of the then-current database management systems. It has only been since the development of various object-oriented approaches to database management that the field has expanded into these areas.

In the last half of the 1980s, however, the situation had largely been reversed. The database community had expressed much interest in the development of nonstandard database management systems, including image databases, due, as mentioned earlier, to the development of the object-oriented paradigm as well as various data-driven approaches to iconic indexing. However, the interest of the image interpretation community had wavered. Only in the decade of the 1990s have the two communities been converging on a common conception of what an image database should be. This is due to the acceptance of the belief that image and textual information should be treated equally: images should be retrievable by content and should also be integral components of the query language. Thus, image interpretation should be an important component of any query processing strategy. A revived interest in the field from this perspective is shown by the publication of Grosky and Mehrotra (1989a). As is becoming increasingly apparent, moreover, the experience gained from this view of what an image database should be will generalize to other modalities, such as voice and touch, and will usher in the more general field of study of what we call sensor-based data management.
An area related to image database management, and even considered a subarea by some researchers, is that of geographic or spatial database management. While there are many common issues between these two fields, notably those in data representation, data modeling, and query processing, the intent of geographic or spatial database researchers is quite different from what we consider the intent of image database management to be. Researchers in spatial data management are concerned with managing map data, which is largely graphics, or presentation, oriented. With the exception of satellite data interpretation issues, there is no notion of interpreting a map that has just been acquired by some sensor. Interpretation issues are largely bypassed by having interpretations entered into the system by the users or the database administrators; the system already knows that, say, a lake exists in a particular region or that a particular road connects two cities at specific geographic coordinates. The research issues of spatial data management concern how to represent and query such nonstandard information in a database environment. The relation between this field and image database management lies in the fact that map data and image feature data are related and can be represented and modeled in similar ways. Similarly, spatial query language design gives insight into the design of query languages that encompass images. While we discuss certain papers from the spatial data management field where they pertain to issues shared with image data management, this chapter will largely bypass that field. The reader is referred to Samet (1990a, 1990b) for a good survey of this area.

There are many interesting problems in the field of image database management. Those that will be discussed in this chapter concern data modeling, sensor data representation and interpretation, user interfaces, and query processing. The organization of the rest of this chapter is as follows. In Section 2, we discuss a generic image database architecture after familiarizing the reader with various classic database architectures. Section 3 covers various implementations and proposals for image database management systems; we have divided these systems into three generations. The very important topic of similarity retrieval and query processing in this relatively new environment is then discussed in Section 4. Finally, we offer our conclusions in Section 5.
2. Image Database Management System Architecture
2.1 Classical Database Architecture

FIG. 1. The architecture of a database management system: individual user views at the external database level, a community view at the conceptual database level, and a storage view at the physical database level.

The architecture of a standard database management system, as shown in Fig. 1, is usually divided into three different levels, corresponding to the ANSI/SPARC standard (Tsichritzis and Lochovsky, 1978).
ANSI/SPARC standard (Tsichritzis and Lochovsky, 1978). These levels are the physical database level, the conceptual database level, and the external database (view) level. The physical database resides permanently on secondary storage devices. This level is concerned with actual data storage methods. The conceptual database is an abstracted representation of a real world pertinent to the enterprise that is using the database, a so-called miniworld. The external database or view level is concerned with the way in which the data is viewed by individual users. In other words, an external database is an abstracted representation of a possibly transformed portion of the conceptual database. These levels of abstraction in data representation provide two levels of data independence. The first type of independence, called physical data independence, follows from the relationship between the physical database level and the conceptual database level. This permits modifications to the physical database organization without requiring any alterations at the conceptual database level. The second type of independence, which follows from the relationship between the conceptual database level and the external database level, is called logical data independence. This allows modifications to the conceptual level without affecting the existing external databases or any application programs that have access to the database. A database management system provides a data definition language (DDL) to specify the definition of the conceptual database in terms of some data model (the conceptual schema), as well as to declare views or external databases (the external schema). There is also a data manipulation language (DML) to express queries and operations over external views.
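To make the levels and the two kinds of data independence concrete, the following small sketch, written in Python purely for illustration, mimics a conceptual relation and an external view defined over it; the relation, attribute, and view names are hypothetical and are not taken from any particular system.

# Conceptual level: the community view of the miniworld.
employees = [
    {"name": "A. Jones", "salary": 52000, "department": "Imaging"},
    {"name": "B. Smith", "salary": 47000, "department": "Graphics"},
]

# External level: one user's view, exposing only what that user may see.
def directory_view(conceptual):
    return [{"name": e["name"], "department": e["department"]}
            for e in conceptual]

# Logical data independence: the conceptual relation may later gain new
# attributes (say, an office number) without any change to directory_view
# or to the application programs written against it.  Physical data
# independence is analogous one level down: how the records are actually
# stored may change without altering the conceptual relation.
print(directory_view(employees))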
In Section 2.3, we will see how the classical database architecture should be modified in order to support image data.

2.2 Classical Data Models
The implementation-independent framework that is employed to describe a database at the logical and external levels is called a data model. These models represent the subject database in terms of entities, entity types, attributes of entities, operations on entities and entity types, and relationships among entities and entity types. There is a growing literature on various types of data models (Hull and King, 1987; Peckham and Maryanski, 1988). The most important ones that will be discussed here are the entity-relationship data model, the relational data model, the functional data model, and the object-oriented data model. Each of these data models has been used and extended to support image data.

2.2.1 The Entity-Relationship Data Model
This approach represents the entities under consideration as well as the relationships between them in a generic fashion (Chen, 1976). Each entity has a set of associated attributes, each of which can be considered to be a property of the entity. Relationships among entities may also have associated attributes. As an example, suppose we are trying to represent the situation where students are taking courses in a university environment. The entities would be student, course, faculty, and department. The student entity would have the attributes name (string), address (string), and social security number (string); the course entity would have the attributes name (string), number (integer), and description (string); the faculty entity would have the attributes name (string), address (string), social security number (string), and salary (integer); and the department entity would have the attributes name (string) and college (string). Among the relationships might be one between student, course, and department with associated attributes date (date) and grade (character), which indicates when and how successfully a given student took a given course; another between faculty, course, and department with associated attributes date (date), building (string), and room number (integer), which indicates when and where a particular faculty member taught a particular course; another between course and department with no attributes, indicating that the given course belongs to the given department; another between faculty and department with associated attributes rank (string) and hire date (date), which indicates that a particular faculty member belongs to a particular department and is of a certain rank; and another between faculty
and department with associated attributes from-date (date), and to-date (date), which indicates that a given faculty member was chair of the given department for a given period of time. Various extensions to this model have added such enhancements as integrity conditions to the basic model (Teorey, 1990).
2.2.2 The Relational Data Model

This model was motivated largely by the design of various file processing systems. Just as a file consists of records, each record consisting of various fields, a relation consists of tuples, each tuple consisting of various attributes. Thus, from a naive point of view, file corresponds to relation, record corresponds to tuple, and field corresponds to attribute. The beauty of the relational approach is that it is a mathematically precise model. There exist precisely defined operators (union, difference, selection, projection, join) that, using relational algebra, can be combined to retrieve any necessary information the user requires. It has also been shown, through the use of relational calculus, that relational algebra is quite powerful in its querying capabilities. Until the mid-1980s, most applications could be modeled using this approach with no loss in semantics. However, such modern applications for database management systems as graphics, rule-based systems, multimedia, and geographic information systems have experienced difficulty in using relational systems without a loss of semantics. This loss can be overcome only by adding some nonrelational components to the system. This disadvantage is perhaps the main reason why object-oriented database management systems are becoming the systems of choice. As an example, let us consider the miniworld discussed in Section 2.2.1. In our notation, we will write R(a, b) to denote the fact that we have a relation called R with attributes a and b. We then have the following relations with their associated attributes:

student (name, address, social-security-number)
course (name, number, description)
faculty (name, address, social-security-number, salary)
department (name, college)
takes (student-social-security-number, course-name, course-number, course-description, department-name, date, grade)
taught (faculty-social-security-number, course-name, course-number, course-description, department-name, date, building, room-number)
works (faculty-social-security-number, department-name, rank, hire-date)
belongs (course-name, course-number, course-description, department-name)
chair (faculty-social-security-number, department-name, from-date, to-date)

These attributes are defined over the same domains as in Section 2.2.1.
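As an aside, and not as part of the original example, the relational operators just mentioned are easily mimicked; the following short Python sketch shows selection, projection, and an equi-join combining to answer the query 'names of students who earned an A in Databases' over a small fragment of the above schema. The sample tuples are hypothetical.

student = [{"name": "Lee", "social-security-number": "111"},
           {"name": "Kim", "social-security-number": "222"}]
takes = [{"student-social-security-number": "111",
          "course-name": "Databases", "grade": "A"}]

def select(rel, pred):                      # selection
    return [t for t in rel if pred(t)]

def project(rel, attrs):                    # projection
    return [{a: t[a] for a in attrs} for t in rel]

def join(r, s, ra, sa):                     # equi-join on r.ra = s.sa
    return [{**t, **u} for t in r for u in s if t[ra] == u[sa]]

answer = project(
    select(join(student, takes, "social-security-number",
                "student-social-security-number"),
           lambda t: t["course-name"] == "Databases" and t["grade"] == "A"),
    ["name"])
print(answer)   # [{'name': 'Lee'}]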
2.2.3 The Functional Data Model

In this approach, attributes and relations are represented by functions whose domain consists of entities and whose range consists of entities or sets of entities (Shipman, 1981). The preceding miniworld can then be represented by the following functions:
student( ) →→ entity
name (student) → string
address (student) → string
social-security-number (student) → string
courses (student) →→ course x date x character

course( ) →→ entity
name (course) → string
number (course) → integer
description (course) → string
home (course) → department

faculty( ) →→ entity
name (faculty) → string
address (faculty) → string
social-security-number (faculty) → string
salary (faculty) → integer
works (faculty) →→ department x string x date

department( ) →→ entity
name (department) → string
college (department) → string
chairs (department) →→ faculty x time-period

time-period( ) →→ entity
begin (time-period) → date
end (time-period) → date

We note that a set-valued function is denoted by →→.
2.2.4 The Object-Oriented Data Model

Over the years, the weakness of the relational model in fully capturing the semantics of such recent database application areas as computer-aided design (CAD), computer-assisted software engineering (CASE), office information systems (OIS), and artificial intelligence has become quite apparent. Due to its semantic richness, various object-oriented approaches to database design have been gaining in popularity (Zdonik and Maier, 1990). There are disadvantages to the object-oriented approach, however. These include the lack of agreement on a common model, the lack of a firm theoretical foundation, and the inefficiency of many currently implemented object-oriented database management systems as compared to the various existing relational implementations. An object-oriented database management system can be behaviorally object-oriented as well as structurally object-oriented. Behavioral object-orientation relates to the notion of data encapsulation, the concept of methods, and the notion of the is-a type hierarchy and its associated concept of inheritance. Structural object-orientation relates to the notion of complex objects, that is, objects whose attribute values are themselves objects rather than simple data types such as integers and strings, and to the associated is-part-of hierarchy. An object-oriented definition of our example miniworld is as follows:

class Person
    superclasses: none
    attribute name: string
    attribute address: string
    attribute social-security-number: string
class Student
    superclasses: Person
    attribute transcript: set of Course-History
class Course-History
    superclasses: none
    attribute class: Course
    attribute when: date
    attribute grade: character
class Faculty
    superclasses: Person
    attribute salary: integer
    attribute works-for: set of Position
class Position
    superclasses: none
    attribute place: Department
    attribute rank: string
    attribute hired: date
class Course
    superclasses: none
    attribute name: string
    attribute number: integer
    attribute home: Department
class Department
    superclasses: none
    attribute name: string
    attribute college: string
    attribute chairs: set of Regime
class Regime
    superclasses: none
    attribute person: Faculty
    attribute date-range: Time-Period
class Time-Period
    superclasses: none
    attribute begin: date
    attribute end: date
    method length: date x date → integer
2.3 A Generic Image Database Architecture

With the advent of image interpretation and graphics technologies, a wide variety of applications in various areas have evolved that require an application-dependent abstraction of the real world in terms of both textual and image data. It has thus become essential to develop or extend existing database management systems to store and manage this data in an integrated fashion. The type of information required to be managed can broadly be classified into five categories:
Iconic: This information consists of the images themselves, which are stored in a digitized format.
Image-Related Data: This is the information found in the header and trailer files of the images.
Feature Information Extracted from the Images: This information is extracted by processing the images in conjunction with various world models.
Image-World Relationships: This information consists of the relationships between various image features and the corresponding real-world entities. This information may be known a priori or obtained through analyzing the image.
World-Related Data: This is conventional textual data describing the abstracted world pertinent to the application.

Any image database management system must facilitate the storage and management of each of these five types of information. The advantages of data independence, data integrity, data sharing, controlled redundancy, and security offered by conventional database management systems for textual data are required here for both textual and image data. Such a system should perform query operations on iconic information by content. Generalizing from image data management to sensor-based data management and using satellite data as an example, this type of retrieval would include one or more, in combination, of the following simple cases:
1. The retrieval of image data from textual data. An example would be to find the spatio-temperature data distribution taken over a specific geographical area on a given day by a specific sensor.
2. The retrieval of textual data from image data. An example would be to find the particular sensor that measured a given spatio-temperature data distribution.
3. The retrieval of image data from image data. An example would be to find the visual image of the particular hurricane that manifested a given pattern of spatio-pressure readings.
4. The retrieval of textual data from textual data. An example would be to find the type of sensor residing on a particular type of satellite at a particular time.

As is obvious, some of the above-mentioned image data retrievals require the use of image representation (modeling) and recognition techniques. An efficient system will no doubt use model-based recognition techniques, whose management will support the efficient insertion, deletion, and updating of given models. This is extremely important in a database environment. In light of the preceding discussion, we can say that an image database management system will consist of three logical modules, as shown in Fig. 2.
FIG. 2. The logical structure of an image database management system: a user interface system connecting the user to a textual data management system (with its textual data storage) and to an image understanding system (with its image storage).
The image understanding module handles image storage, processing, feature extraction, decomposition, and matching. The textual data management module is a conventional database management system. It manages the textual data related to images, textual data extracted from the images, and other textual data. Recent research in knowledge-based systems design has advocated the use of conventional database management systems to store models and production-inference rules (Dayal, Buchmann, and McCarthy, 1988). In an image database management system, the models and the matching processes are nothing but the knowledge that the image understanding module needs to perform its task. Therefore, one can employ the textual data management module of an image database management system to manage the image models and the information related to the matching process (rules) that are needed by the image understanding module. The user interface module interprets the input commands, plans the processing steps, and executes the plans by invoking the appropriate subsystems at the proper time.

2.4 A Generic Image Data Model
FIG. 3. Another view of the proposed design for an image database management system.

We believe that an image data model must represent the following types of information. The conceptual schema should consist of four parts (Mehrotra and Grosky, 1985): the model base, the model-base instantiation, the instantiation-object connection, and the object information repository, as shown in Fig. 3. The model base consists of hierarchical descriptions of generic entities that the system is expected to manage as well as descriptions of the processing
that must occur for image interpretation. The model-base instantiation contains detailed hierarchical descriptions of the processed image data. These descriptions are detailed in the sense that all components and their relationships are described as an associated set of attributes. The description of an image will be in one-to-one correspondence with the associated model-base information. Each image entity corresponds to a real-world entity with given semantics. This correspondence is defined in the instantiation-object connection. Finally, the object information repository consists of textual information concerning these real-world entities. To use the system as a purely standard database management system or as an integrated image database management system, only the object information repository would be made available to the users for the definition of external views. In other words, the users would not have to worry about the iconic entity description and processing aspects of the system. The hierarchical descriptions of the generic objects and the image interpretation methods would be inserted in the model base by the database administrator. The information in the model-base instantiation would be stored by the system itself as the required information is obtained through processing the input images. On the other hand, to use the system for purely image interpretation or graphics applications, the entire conceptual schema would be made available to the user for the definition of external views. Thus in this case, the users can define and maintain their own models and image interpretation or graphics
functions. In the former case, the model-base instantiation would be generated and stored by the system itself, whereas in the case of graphics applications, it would be inserted by the users. This system will be general enough to be used in one of the previously mentioned modes or in any combination of these. To achieve this generality and still allow the sharing of information among various types of users, however, one should not be allowed to change the information generated and stored by the system.
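The four parts of this conceptual schema can be summarized by the following schematic sketch, given here in Python purely as a reading aid; the class and field names are ours and are not taken from the authors' design.

from dataclasses import dataclass, field

@dataclass
class ModelBase:                      # generic entity descriptions and interpretation procedures
    generic_entities: dict = field(default_factory=dict)
    interpretation_procedures: dict = field(default_factory=dict)

@dataclass
class ModelBaseInstantiation:         # detailed descriptions of processed image data
    image_descriptions: dict = field(default_factory=dict)

@dataclass
class InstantiationObjectConnection:  # image entity -> real-world entity correspondence
    links: dict = field(default_factory=dict)

@dataclass
class ObjectInformationRepository:    # conventional textual data about real-world entities
    objects: dict = field(default_factory=dict)

# For ordinary database users only the object information repository would be
# exposed for external view definition; the other three parts are maintained
# by the database administrator and by the system itself.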
3. Some Example Image Database Management Systems
In this section, we will give the reader a flavor of the different types of image database management systems that have been designed over the years. In order to accomplish this in a meaningful fashion, we divide the development of such systems into three generations. Systems in the first generation are characterized by being implemented relationally. As such, any image interpretation task associated with their use is either nonexistent or hardwired into the system and, if under user control, is so in a very rudimentary fashion. There is no notion of new image feature detectors being composed during run time by the user. Other standard database issues, such as the nature of integrity conditions in this new environment and potentially new notions of serializability, are also left unexamined. While relational systems are still being designed today, mainly for geographic information systems (Orenstein and Manola, 1988), the main thrust of the first generation lasted from the late 1970s until the early 1980s. Second-generation systems are characterized by being designed either in a more object-oriented fashion or utilizing a semantically rich extension of the relational model. In this approach, image interpretation routines are, more or less, the methods and, as such, are packaged along with their respective objects. Still, there is no notion of the user composing new image feature detectors in a user-friendly and interactive fashion during run time. However, such database issues as integrity conditions are being examined in this new environment (Pizano, Klinger, and Cardenas, 1989). The second generation began in the mid-1980s and is still ongoing. The third generation of image database systems is just beginning. These systems, when fully implemented, will allow the user to manage image sequences as well as to interact with the image interpretation module and compose new image feature detectors interactively and during run time. This interaction will be conducted at a high level and in a very user-friendly fashion. That is, the user will have available a toolbox of elementary features and their associated detectors (methods) as well as connectors of various sorts that will allow him or her to build complex features and detectors from
more elementary ones through an iconic user interface. The only system with which we are familiar that discusses this concept in the context of images is that of Gupta, Weymouth, and Jain (1991), although Orenstein and Manola (1988) discuss this concept in a geographical information context.
3.1 First-Generation Systems
The early systems of this generation have been of two major types. There are those systems specifically designed for pattern recognition and image processing applications. These systems were concerned mainly with images (Chang, 1981a). The textual data in these systems consists mostly of textual encodings of the positional information exhibited in the images. There are also those systems that are similar to conventional database management systems and that have images as part of a logical record. These systems, however, are not capable of handling the retrieval of image data by content. They cannot be called integrated image database systems as they do not treat images equally with text. The only two attempts towards the design of integrated image database systems are described in Grosky (1984) and Tang (1981). The pioneering work in this area was done by Kunii, Weyl, and Tennenbaum (1974). In their system, a relational database schema is utilized to describe images. A relation snap (snap#, data, place, subject, negative#, frame#) is used to store the image-related data. The relations objectab1 (snap#, object#, object-name), objectab2 (object-name, superposition-order), and part (object#, part#, part-name, part-superposition-order) are used to describe the images as superimposed objects and the objects as superimposed parts. Some additional relations are used to describe the color, texture, and regions of the objects. This approach satisfies the requirements of compatibility of textual data, data independence from hardware, and data independence from the viewpoints of the information and of the user. However, it does not address the issues concerning methods of extracting information from images and mapping them into the description schema, nor the design of a data manipulation language for data input, update, retrieval, and analysis. The next system we discuss is the graphics-oriented relational algebraic interpreter (GRAIN), developed by Chang and his colleagues (Chang, Reuss, and McCormick, 1977; 1978; Chang, Lin, and Walser, 1980; Lin and Chang, 1979; 1980). The organization of the GRAIN system is shown in Fig. 4. This system consists of RAIN, the relational algebraic interpreter, to manage the relational database for retrieval use, and ISMS, the image storage management system, to manage the image store. The main characteristic of this system is the distinction of logical images from physical images.
FIG. 4. System organization of GRAIN: a database machine (RAIN) managing the relational database and an image store processor managing image storage, both serving a display device.
This distinction leads to the design of a versatile and efficient image data storage and retrieval system. Logical images are a collection of image objects that can be considered masks for extracting meaningful parts from an entire image. These are defined in three tables: the picture object table, the picture contour table, and the picture page table. Each physical image is stored as a number of picture pages that can be retrieved from image storage using ISMS commands. A relational query language called GRAIN provides the means to retrieve and manipulate the image data. The concepts of generalized zooming and picture algebra have also been explored. Vertical zooming corresponds to a more detailed view of an image, whereas horizontal zooming is with respect to a user-supplied selection index, such as the degree of similarity. In this case, horizontal zooming corresponds to continuously changing this similarity degree and viewing the corresponding retrieved images. Picture algebra is an image version of relational algebra. This system meets the requirements of compatibility of textual data, data independence, and a manipulation language for image data. However, no methods have been developed to transform an image into its corresponding tuples in the above relational tables; the image description is manually entered. Also, the system has been used mainly in a geographical information context. A system designed recently for map information retrieval that has similar concepts is discussed in Tanaka and Ichikawa (1988). Another important first-generation system is the relational database system for images (REDI) developed by Chang and Fu (1980b; 1980c; 1981). REDI was designed and implemented for managing LANDSAT images and digitized maps. Figure 5 illustrates the system organization of REDI.

FIG. 5. System organization of REDI. © 1980 IEEE.

In this approach, the database management system is interfaced
with an image understanding system. The image features are extracted from images and image descriptions are obtained by using image processing operators supported by the system. Image descriptions and registrations of the original images are stored in the relational database. Original images are stored in a separate image store. A query language called query-by-pictorial-example (QPE) is part of the system. QPE is an extended version of the predicate-calculus-based relational symbolic data manipulation language Query-by-Example (Zloof, 1977). This system made the first effort to manage the image processing routines as well as the image data. It did this through the use of so-called image processing sets. Each image processing set is an ordered sequence of image processing operations that accomplishes recognition tasks for various domains. There were processing sets for roads, rivers, cities, and meadows. All processing sets were packaged together into the LANDSAT processing package. This concept anticipated the emerging concepts of object-oriented design and is interesting for that reason. This
system also included support for image-feature-relation conversion, introduction of pictorial examples that enabled effective pictorial queries utilizing terminals, and a simple similarity retrieval capability. An example road and city database consists of the following tables:

roads (frame, road-id, x1, y1, x2, y2)
road-name (frame, road-id, name)
position (frame, xsize, ysize, xcenter, ycenter, location)
cities (frame, city-id, x1, y1, x2, y2)
city-name (frame, city-id, name)
The position relation holds the registration information of an image, where location indicates where the image is stored. Figure 6 shows how the data manipulation command, 'Apply the Road processing set to the image whose frame number is 54 and insert the processing results into the roads relation,' would be stated in query-by-pictorial-example, while Fig. 7 similarly illustrates the query, 'Find one image frame whose road network pattern is most similar to that of the image shown on the display terminal.' For Fig. 7, the value * of location denotes a default display terminal location for the given image. The road processing set is applied to this image and the intermediate results are inserted into a relation temp. The image operator SIM-LL finds lines similar to given lines.

FIG. 6. A query-by-pictorial-example data manipulation statement. © 1981 IEEE.

Tang (1981) extended the relational data model to allow an attribute of a relation to have a data type of picture or device. The picture data type is characterized by three numbers: m, n, and h. The size of the image is m x n and the maximum allowed number of gray levels is h. The device data type can take as values only operating-system-recognizable I/O device names. The device type is introduced in order to manage the complicated I/O system in an integrated image database system through the use of the concept of a logical I/O system. The language SEQUEL, a forerunner of SQL, is
extended to serve as an interface between the users and the system. An example database is the following:
employee (name, id-number, face(pic), department-number)
employee-feature (id-number, feature-name, feature(pic))
department (department-number, location, manager)
monitors (name(device), department-number, person-in-charge)
scanners (name(device), department-number, person-in-charge)

A sample query over this database is, 'Exhibit the face and name, on monitor A, of the employee whose nose has been scanned by the scanner in department 5.' This would be expressed in SEQUEL as follows:
SELECT employee.name, employee.face ('monitor A')
FROM employee, employee-feature, scanner
WHERE scanner.department-number = 5 AND
      employee-feature.feature-name = 'nose' AND
      employee-feature.feature = scanner.name AND
      employee.id-number = employee-feature.id-number
The weakness of this approach is that an image cannot stand by itself in the database and an entity cannot have more than a single associated image. Grosky (1984) proposed a logical data model for integrated image databases that overcomes these weaknesses. He proposed three entity sets: one consisting of individual analog images, another consisting of individual digital images, and the last consisting of digital subimages. The relationships among various entities are represented by three tables: Analog-Digital, connecting an analog image to its various digitized counterparts; Digital-Subdigital, connecting digital subimages to the digital images in which they occur; and Appearing-In, connecting a digital subimage to the subject entities appearing in it. In this approach, the query 'Display the names and addresses of all persons on file who were photographed together with employee Joseph Smith,' would be

SELECT name, address
FROM employee
WHERE employee.id-number IN
    SELECT subject-id
    FROM Appearing-In
    WHERE subdigital-id IN
        SELECT subdigital-id
        FROM Digital-Subdigital
        WHERE digital-id IN
            SELECT digital-id
            FROM Digital-Subdigital
            WHERE subdigital-id IN
                SELECT subdigital-id
                FROM Appearing-In
                WHERE subject-id IN
                    SELECT employee.id
                    FROM employee
                    WHERE name = 'Joseph Smith'

Also discussed is the need for pictorial as well as textual indices. The last first-generation system we discuss is the picture database management system (PICDMS) of Chock, Cardenas, and Klinger (1981; 1984) and the associated query language PICQUERY (Joseph and Cardenas, 1988). This system was initially designed for geographical applications, but it has some quite interesting features that can profitably be used in generic image database management systems. Its most interesting architectural property is
how image information is represented. At each point in an image, different attributes are generally recorded. For geographical applications, these attributes could be spectral data, elevation data, or population data, while in a generic image, these attributes could comprise such data as region segmentation data, boundary data, or optic flow data. Rather than record this information for each point, however, an image is subdivided into a gridlike pattern, where each grid element is of some equal small area, and the preceding attributes are recorded for each entire grid element. Rather than store particular attribute values for the entire image in an individual record, moreover, here a record consists of all the attribute values for the same grid element. Thus, if an image consists of g grid cells, each grid cell having a attributes, rather than having an image file consisting of a records, each record having g fields, this approach has an image file consisting of g records, each record having a fields. The associated query language PICQUERY allows the user to request such operations as edge detection, different kinds of segmentation, and similarity retrievals of various sorts.
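The record organization just described is easy to picture with a small sketch. The following Python fragment is ours rather than PICDMS code, and the attribute names are hypothetical; it merely builds such a grid-oriented image file, with one record per grid cell and one field per attribute.

GRID_ATTRIBUTES = ["spectral", "elevation", "population"]   # hypothetical choice

def make_image_file(rows, cols):
    """g = rows * cols records, each with one field per attribute."""
    return [
        {"row": r, "col": c, **{a: None for a in GRID_ATTRIBUTES}}
        for r in range(rows) for c in range(cols)
    ]

image_file = make_image_file(4, 4)
image_file[5]["elevation"] = 122.0    # update one attribute of one grid cell

# Adding a new attribute (say, an optic-flow field) touches every record but
# leaves the record-per-grid-cell organization unchanged.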
3.2 Second-Generation Systems
Systems in this generation are characterized by using more powerful data modeling techniques: either various semantically rich extensions to the relational model are used, or a model of the object-oriented variety. The system REMINDS (Mehrotra and Grosky, 1985) discusses a generic image data model that also included aspects related to image interpretation tasks. Although relational in implementation, it has many structural object-oriented aspects to it. Based on the image data model discussed in Section 2.4, the model base consists of two parts: the generic entity descriptions and the functional subschema. The former consists of hierarchical descriptions of generic entities that the system is expected to manage. A model of an entity consists of descriptions of its parts and their interrelationships. In a hierarchical description, each component entity is further broken down into subentities, down to the level of primitive entities, with recursion being supported. As an example, the following tables capture the generic entity shown in Fig. 8.
Primitive
PrimitiveId    AttributeName    AttributeValue
C              Type             Circle
C              Radius           1
FIG. 8. A generic entity.

ComplexPart
ComplexPart      ComponentPart    InstanceOf       Scaling
BearUpperBody    LeftEar          C                1
BearUpperBody    RightEar         C                1
BearUpperBody    Face             BearFace         1
BearFace         LeftEye          C                0.9
BearFace         RightEye         C                0.9
BearFace         Skull            C                5
BearLowerBody    Stomach          C                5
BearLowerBody    LeftLeg          C                3.5
BearLowerBody    RightLeg         C                3.5
Bear             UpperBody        BearUpperBody
Bear             LowerBody        BearLowerBody

Relation
ComplexPart      ComponentPart1   ComponentPart2   RelationType
BearUpperBody    LeftEar          RightEar         LeftOf
BearUpperBody    LeftEar          Face             Above
BearUpperBody    LeftEar          Face             Touch
BearUpperBody    RightEar         Face             Above
BearUpperBody    RightEar         Face             Touch
BearFace         LeftEye          RightEye         LeftOf
BearFace         LeftEye          Skull            Inside
BearFace         RightEye         Skull            Inside
BearLowerBody    LeftLeg          RightLeg         LeftOf
BearLowerBody    Stomach          LeftLeg          Above
BearLowerBody    Stomach          LeftLeg          Touch
BearLowerBody    Stomach          RightLeg         Above
BearLowerBody    Stomach          RightLeg         Touch
Bear             UpperBody        LowerBody        Above
Bear             UpperBody        LowerBody        Touch
The hierarchical structure of the generic entity shown in Fig. 8 is exhibited in Fig. 9.

FIG. 9. The hierarchical structure of the bear entity.

Methods are objects also. The functional subschema will logically manage the descriptions of all the image interpretation procedures available in the system. For each image interpretation task, a control structure describing how a set of procedures combine to perform that task resides here. This feature of their system makes the image interpretation system highly modular, which, in turn, makes it easily modifiable: the procedures can be shared among various tasks, new procedures can easily be added, and old procedures can easily be replaced or removed. Thus, the duplication of efforts in the development of new image analysis techniques can be avoided. This is a highly desirable environment in which to carry out image analysis experiments. This kind of interaction with the image interpretation module should be possible for the user at run time as well as, of course, for the database administrator. The following tables illustrate a simplified functional subschema for recognizing the generic entity shown in Fig. 8. The
table Functions lists the given operators along with their associated addresses, whereas the table FunctionHierarchy exhibits the partial order of the various operations involved. We note that the detectors in this latter table perform such tasks as verifying properties of and relationships among the recognized subcomponents.

Functions
FunctionId    FunctionName
F1            EdgeOperator
F2            ThresholdOperator
F3            ThinningOperator
F4            LinkingOperator
F5            LineDetector
F6            CircleDetector
F11           FindEdge
F12           FindLine
F13           FindCircle
F14           FindLeftEye
F15           FindRightEye
F16           FindSkull
F17           FindFace
F18           FindLeftEar
F19           FindRightEar
F20           FindUpperBody
F21           FindStomach
F22           FindLeftLeg
F23           FindRightLeg
F24           FindLowerBody
F25           FindBear
F26           LeftEyeDetector
F27           RightEyeDetector
F28           SkullDetector
F29           FaceDetector
F30           LeftEarDetector
F31           RightEarDetector
F32           UpperBodyDetector
F33           StomachDetector
F34           LeftLegDetector
F35           RightLegDetector
F36           LowerBodyDetector
F37           BearDetector
FunctionHierarchy
Command          PredecessorComponentFunction    SuccessorComponentFunction
FindEdge         EdgeOperator                    ThresholdOperator
FindEdge         ThresholdOperator               ThinningOperator
FindEdge         ThinningOperator                LinkingOperator
FindLine         FindEdge                        LineDetector
FindCircle       FindEdge                        CircleDetector
FindLeftEye      FindCircle                      LeftEyeDetector
FindRightEye     FindCircle                      RightEyeDetector
FindSkull        FindCircle                      SkullDetector
FindLeftEar      FindCircle                      LeftEarDetector
FindRightEar     FindCircle                      RightEarDetector
FindStomach      FindCircle                      StomachDetector
FindLeftLeg      FindCircle                      LeftLegDetector
FindRightLeg     FindCircle                      RightLegDetector
FindFace         FindLeftEye                     FaceDetector
FindFace         FindRightEye                    FaceDetector
FindFace         FindSkull                       FaceDetector
FindLowerBody    FindStomach                     LowerBodyDetector
FindLowerBody    FindLeftLeg                     LowerBodyDetector
FindLowerBody    FindRightLeg                    LowerBodyDetector
FindUpperBody    FindLeftEar                     UpperBodyDetector
FindUpperBody    FindRightEar                    UpperBodyDetector
FindUpperBody    FindFace                        UpperBodyDetector
FindBear         FindLowerBody                   BearDetector
FindBear         FindUpperBody                   BearDetector
The next few systems we discuss concern themselves with managing geographic information, but in each approach there are interesting ideas that can easily be applied to generic image database management systems. The system PROBE (Orenstein and Manola, 1988) has been designed by researchers in the database management area and, as such, raises some quite interesting issues. PROBE uses a functional data modeling approach and represents spatial data as collections of points along with associated operations. One important issue concerns the nature of the operations packaged with each object class. Packaging all necessary application-based operations with the corresponding object class will make it difficult for implementers, who must then be familiar with database issues as well as application issues. Thus, the authors leave it to the database system kernel to implement general basic operations and to the object class to handle the more specialized operations. In turn, these specialized operations should be written in such a generalized fashion that they rely on the database-system-implemented operations as much as possible. An example of this occurs in their discussion of query processing, where the concept of a geometry filter is also introduced. This is a collection of procedures that iterate over various collections of objects in one or more nested loops, choosing candidates that might satisfy certain query criteria and then verifying that they indeed do satisfy the criteria. As an example, consider the query, 'Find all pairs of objects x and y, such that x and y are close to each other.' This command would be
expressed in their notation, PDM algebra, as

candidates := spatial-join(x, y)
result := select(candidates, close)

Spatial join is implemented in the database system kernel and chooses pairs of objects likely to be close to one another. In the application, each candidate is examined by the associated method close, where the notion of two objects being close to one another is more precisely defined. To show the applicability of the authors' approach to a generic image database application, we exhibit an example schema from their paper:
type image is entity
    pixels (image, X, Y) → pixel
    place (image) → box    (*Bounding box giving bounding latitudes and longitudes*)
    time (image) → time    (*When the image was taken*)
    frequency (image) → float    (*Spectral band*)
    feature (image) → set of feature    (*Set of notable features, extracted by an image interpreter*)

type feature is entity
    type (feature) → feature-type
    location (feature) → (latitude, longitude)    (*Real-world coordinates*)
    occurrences (feature) → set of (image, x, y)    (*Describes the occurrence of a feature in each image containing the feature, and gives the position of the feature within the image*)
    near (feature) → set of feature    (*A set of nearby features*)

type road is feature
    name (road) → string
    crosses (road) → set of road
    length (road) → real
type bus-stop is feature
    buses (bus-stop) → bus-line
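The filter-and-refine flavor of the geometry filter discussed above can also be sketched in a few lines. The Python fragment below is ours and only a schematic reading of the idea; the helper names (bounding_box, spatial_join, close) are hypothetical and do not correspond to PROBE's actual PDM operations.

def bounding_box(obj):
    xs = [p[0] for p in obj["points"]]
    ys = [p[1] for p in obj["points"]]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_near(b1, b2, eps):
    return (b1[0] - eps <= b2[2] and b2[0] - eps <= b1[2] and
            b1[1] - eps <= b2[3] and b2[1] - eps <= b1[3])

def spatial_join(xs, ys, eps=1.0):
    """Kernel-level filter: a cheap bounding-box test producing candidates."""
    return [(x, y) for x in xs for y in ys
            if x is not y and boxes_near(bounding_box(x), bounding_box(y), eps)]

def close(x, y, eps=1.0):
    """Application-level refinement: the precise notion of 'close'."""
    return any(abs(p[0] - q[0]) + abs(p[1] - q[1]) <= eps
               for p in x["points"] for q in y["points"])

def close_pairs(objects):
    candidates = spatial_join(objects, objects)
    return [(x, y) for x, y in candidates if close(x, y)]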
Another system designed by database researchers is that constructed around the query language PSQL (Roussopoulos, Faloutsos, and Sellis, 1988). This language is a spatial data management-based extension of SQL and was first formulated in Roussopoulos and Leifker (1984). The system follows the philosophy of having an extensible language with embedded specialized, application-dependent commands, the latter being implemented by a separate application processor. See Fig. 10 for the architecture of this system. At present, PSQL supports points, line segments, and regions, and it supports numerous specialized operators for these entities. An example command in their system is
SELECT hwy = section
FROM highways, cities
WHERE city = 'Detroit' and
      distance(location, segment) =
          min(SELECT distance(location, segment)
              FROM highways, cities
              WHERE city = 'Detroit' and hwy-name = 'I80')

which finds the highway section of I-80 closest to Detroit. Query processing makes use of the specialized data structures of R-trees and R+-trees (Guttman, 1984; Sellis, Roussopoulos, and Faloutsos, 1987). These indexing mechanisms, or structures like them, can also be used in generic image database management systems.
FIG. 10. The architecture of the image database system for PSQL. © 1988 IEEE.
With respect to the above two systems, the functional data model is much more natural than SQL for various spatial operations and queries. However, image interpretation is quite a bit more complex than spatial operations, and it is unclear from these articles how a real image interpretation task would proceed. Goodman, Haralick, and Shapiro (1989) overcome this shortcoming by indicating, for a particular image interpretation task, not only the image modeling that is necessary but also the associated processing steps, what we previously called the functional subschema. The problem discussed is that of pose estimation, which is determining the location and orientation of an object from its 2-D image. The data model used is CAD-like and hierarchical. Primitive features are considered to be of level 0, while, in general, level-k features represent relationships between features of level less than k. As an example, consider the line drawing shown in Fig. 11. To describe this line drawing, the authors use a data structure called a relational pyramid. This data structure is hierarchical and describes higher-level features in terms of features at lower levels. Conceptually, this data structure captures the following information:

Level-0 Features
    Straight: L1, L2, L3
    Curve: C1, C2
Level-1 Features
    Three-Line Junctions
        J2 : {(straight, L1), (straight, L3), (curve, C1)}
        J3 : {(straight, L1), (straight, L2), (curve, C1)}
    Four-Line Junctions
        J1 : {(straight, L2), (straight, L3), (curve, C1), (curve, C2)}

Level-2 Features
    Junction Adjacency
        {(four-line, J1), (three-line, J2)}
        {(four-line, J1), (three-line, J3)}
        {(three-line, J2), (three-line, J3)}
FIG. 11. A sample line drawing. © 1989 IEEE.
For rapid feature matches, another data structure, called the summary pyramid, is constructed based on the relational pyramid. This data structure
captures the number of occurrences of each type of feature. Such a structure, based on the above relational pyramid, is

Level-0 Features
    Straight: 3
    Curve: 2

Level-1 Features
    Three-Line Junctions: [(straight, straight, curve), 2]
    Four-Line Junctions: [(straight, straight, curve, curve), 1]

Level-2 Features
    Junction Adjacency: [(four-line, three-line), 2], [(three-line, three-line), 1]
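The tallying step that produces a summary pyramid from a relational pyramid is simple to sketch. The following Python fragment is ours, not the authors' code, and uses an ad hoc encoding of the pyramid levels for the example above.

from collections import Counter

relational_pyramid = {
    0: [("straight", "L1"), ("straight", "L2"), ("straight", "L3"),
        ("curve", "C1"), ("curve", "C2")],
    1: [(("straight", "straight", "curve"), "J2"),
        (("straight", "straight", "curve"), "J3"),
        (("straight", "straight", "curve", "curve"), "J1")],
    2: [(("four-line", "three-line"), ("J1", "J2")),
        (("four-line", "three-line"), ("J1", "J3")),
        (("three-line", "three-line"), ("J2", "J3"))],
}

def summary_pyramid(rel):
    """Count, per level, how many features of each type occur."""
    return {level: Counter(ftype for ftype, _ in feats)
            for level, feats in rel.items()}

print(summary_pyramid(relational_pyramid)[0])
# Counter({'straight': 3, 'curve': 2})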
A functional subschema is then developed that utilizes these data structures. This encompasses creating a 2-D wire-frame representation from the associated image, building the relational and summary pyramids, using an associated index structure into the CAD model database, and finally determining the correct pose. Finally, an interesting use of images has been studied by Pizano, Klinger, and Cardenas (1989). In this paper, the notion of using images for expressing integrity constraints in a spatial database environment is explored. Each image represents an unacceptable database state. For example, in Fig. 12 an image is shown that conveys the fact that automobiles and people cannot be in a crosswalk simultaneously. These constraint images are automatically translated to predicate logic formulas and then to a form more amenable to whichever database management system is at hand.
FIG. 12. An example image constraint description. © 1989 IEEE.
3.3 Third-Generation Systems

In all previous systems discussed, the user could formulate a standard database schema related to the aspect of the world to be modeled. This schema could, of course, include images. However, the user has no control over the module of the system that performs the actual image interpretation. Third-generation systems allow the user some control over this module. There will be some sort of functional subschema that the user can formulate. The only papers of which we are aware that have put some flesh on this concept are those of Jagadish and O'Gorman (1989) and Gupta, Weymouth, and Jain (1991a, b). In Jagadish and O'Gorman (1989), derived image feature types can be built on top of particular base types. This customization is not in terms of a fixed set of operations, however, and whether it can be done dynamically is unclear. There is the notion of a physical hierarchy and a logical hierarchy as part of image data modeling. The physical hierarchy starts at the pixel level, advances to the chain level, the line level, the composite level, the structure level, and finally to the entire image level. In parallel with this, the logical hierarchy provides the semantics of the corresponding physical hierarchical structures. As an implementation of this general concept, the authors introduce the TLC image model, which is an acronym for thin line code. Entities at each level have their own associated attributes and methods. Different notions of inheritance are discussed due to the nature of the application. As an example, a polygon's constituent lines are part of the polygon but are not subtypes of the type polygon. However, these lines may still inherit such attributes as color and thickness from the given polygon. The discussion in Gupta, Weymouth, and Jain (1991a, b) is extremely comprehensive with respect to data model design as part of the implementation of a very general image database management system called VIMSYS (Visual Information Management System). This is the only prototype system in which managing information from image sequences has also been addressed. VIMSYS has a layered data model that is divided into an image representation and relation layer, an image object and relation layer, a semantic object and relation layer, and a semantic event and relation layer, each layer being implemented via object-oriented techniques. In the image representation and relation layer, each image object has multiple representations that are mutually derivable from each other. The image object and relation layer concerns itself with image features and their organization. Examples of such features are those of texture, color, intensity, and geometry. New features can easily be formed from given features. Using supplied constructors, one can define such features as an intensity histogram by the expression graph_of(intensity, integer) as well as a texture field by the
expression matrix_of(append(orientedness, point)). The latter definition illustrates the process of combining two existing features into a composite feature through the use of the operator append. The semantic object and relation layer is used to connect real-world entities with various objects in the preceding two layers. Finally, the semantic event and relation layer is used to construct so-called temporal features, a collection of features over an image sequence. An example of a temporal feature is that of a rotation. The authors' design of a user interface is also quite interesting. The query specification is done through a graphical user interface in an incremental manner. The authors recognize that specifying a query over an image domain is not as straightforward as other researchers have presented it and have given the user much freedom to specify exactly what he or she wants. As an example, the user may want to search for a greenish object of a particular shape. The system will allow the user to specify what he or she means by the term greenish by manipulating hue, saturation, and lightness scrollbars via a mouse until the shade of green that the user feels is appropriate is exhibited. The user can use similar methods to specify the shape. Thus, the query style is more navigational than in other systems.
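The flavor of such feature composition can be conveyed with a small sketch. The following Python stand-ins for the constructors named above (graph_of, matrix_of, append) are hypothetical and are intended only to show how composite features are built from existing ones; they do not reflect the actual VIMSYS interface.

def graph_of(domain_feature, range_type):
    """A derived feature mapping values of one feature to another type,
    e.g., an intensity histogram as graph_of(intensity, integer)."""
    return {"kind": "graph", "domain": domain_feature, "range": range_type}

def append(f1, f2):
    """Combine two existing features into a composite feature."""
    return {"kind": "composite", "parts": (f1, f2)}

def matrix_of(element_feature):
    """A spatial field of some feature, e.g., a texture field."""
    return {"kind": "matrix", "element": element_feature}

intensity_histogram = graph_of("intensity", "integer")
texture_field = matrix_of(append("orientedness", "point"))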
4. Similarity Retrieval in Image Database Systems

In image database systems, we often want to retrieve images whose contents satisfy certain conditions specified in an iconic query (i.e., queries that involve input images and conditions on them). In other words, an image database management system must support the retrieval of image data by content (Grosky and Mehrotra, 1989b; 1990). Two types of image data retrieval (or commands) involve input images:

Shape Similarity-Based Retrieval: In these queries, the spatial relationships among the objects in an image are not important. The specified conditions are based on similarity of shapes. An example is, 'Find all images that contain one or more objects present in the input image or in the view of camera C1.'

Spatial Relationship-Based Retrieval: In these queries, the constraints on the similarity of shapes as well as the similarity of their spatial relationships are specified. For example, 'Find all images containing the object in the view of camera C1 to the left of the object in the view of camera C2,' or 'Find all images having the same objects and same relationships among them as in the input image.'
To process iconic commands, the query image data must be analyzed to identify its contents. In other words, image representation and interpretation
should be components of a viable query processing strategy. The query image(s) as well as the stored images must be efficiently and reliably interpreted. This requires an efficient organization of the model base as well as the model-base instantiation. The model base has to be searched to interpret the contents of the query images. The model-base instantiation has to be searched to identify the stored images or the model instantiations that meet the conditions specified in the query. We believe that an image database should have one index for organizing the model base and separate indexes for the instantiations of each of the models. In this case, the processing of an iconic command can be considered a two-phase process. First, the model-base index is searched to analyze the content of the query images. This phase yields the matching or most similar models found in the query images as well as the values of various required model parameters (such as size, location, or various relationships). Then, the instantiation indexes corresponding to the retrieved models can be searched to identify the instantiations or images meeting the query conditions, possibly through the use of various model parameters. Since images are usually corrupted by noise or distortions, the search for similar shapes or images must also be capable of handling corrupted query images. Thus, efficient noise-insensitive and distortion-insensitive index structures based on robust representations of images are essential to achieve image data retrieval in an image database system. The traditional index structures are not directly applicable to these two classes of image retrieval. Several index mechanisms have been proposed to retrieve geometric objects that intersect a given spatial range (Guttman, 1984; Orenstein and Manola, 1988; Samet, 1990a; 1990b; Sellis et al., 1987). These mechanisms are useful for spatial database systems, but are not useful for the previously mentioned types of image information retrieval. As far as image information retrieval is concerned, the key issues to be handled in the design of an image information retrieval system are the following.

Shape and Image Representation: How can the useful information present in an image be described in terms of the features or properties of the shapes of the objects or their spatial relationships? Of course, an important point is that these representations should be extracted automatically by processing the images. For the first type of retrieval, an image is represented as a set of shapes or regions present in that image. Each shape is represented in terms of its properties or primitive structural features. It is generally assumed that all the shapes that could appear in the images to be managed are known a priori. Therefore, a representation of each of the known shapes (objects) is usually compiled and stored in the model base. For the second type of image information retrieval, an image is represented by an ordered or partially ordered
set of shapes or by a graph structure. The ordering is determined by the spatial relationships of interest.

Similarity Measure: What measures or criteria should be employed to automatically determine the similarity or dissimilarity of two shapes or of the spatial relationships among objects? The similarity measure used by a system depends on the type of features or properties used to represent shapes or spatial relationships.

Index Structures: How should the shape and spatial relationship representations be organized so as to enable an efficient search for similar shapes or spatial relationships based on a predefined similarity measure? Since a large set of known models or images has to be searched to select a subset of models or images that satisfies certain conditions, model and image data must be organized in some index structure to facilitate efficient search.

There are two main classes of approaches to image information retrieval. One class of approaches deals with the design and manipulation of indexes for shape similarity-based retrieval. In other words, these are data-driven techniques for shape recognition. The other set of techniques is concerned with image spatial knowledge representation in order to retrieve images based on the similarity of spatial relationships among the various objects appearing in the given images. Some of these techniques are reviewed in the following subsections.
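The two-phase processing of an iconic query described earlier in this section can be summarized by the following schematic sketch, written in Python purely for illustration; the index objects and helper functions are hypothetical placeholders rather than components of any of the systems surveyed here.

def process_iconic_query(query_image, model_base_index, instantiation_indexes,
                         extract_features, satisfies):
    # Phase 1: interpret the query image against the model base.
    features = extract_features(query_image)
    matches = model_base_index.search(features)    # [(model, parameters), ...]

    # Phase 2: search the instantiation index of each retrieved model.
    results = []
    for model, params in matches:
        for image in instantiation_indexes[model].search(params):
            if satisfies(image, params):
                results.append(image)
    return results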
4.1 Shape Similarity-Based Retrieval
Shape matching or object recognition is an important problem in the area of machine vision. A number of approaches have been proposed for interpreting images containing two-dimensional (2-D) objects. Most of the existing techniques are model-based. The goal of a model-based system is to precompile the description of each known object, called a model, and then to use these models to identify any objects present in the input image data and to determine their locations. A model for an object is developed using features extracted from one or more prototypes of that object. In general, the overall functioning of a model-based recognition system can be divided into two phases: the training phase and the recognition phase. In the training phase, the system builds the models of the known objects, stores the models in a database, called the model base, and collects or generates information useful for the recognition of unknown objects. In the recognition phase, the models and other useful information acquired during the
training phase are utilized to analyze the input images. Figure 13 shows the main functional components of a model-based object recognition system. The matching process of the recognition phase of most existing model-based object recognition systems can be divided into two component processes: hypotheses generation and hypotheses verification. The hypotheses generation component is responsible for hypothesizing the identities and locations of objects in the scene, whereas the hypotheses verification component performs tests to check whether a given hypothesis is acceptable or not. This mode of operation is called the hypothesize-and-test paradigm. Several shape matching or object recognition techniques have been proposed. One approach is to use each of the precompiled models, in turn, as a test model. Hence, the object's identity is assumed to be known. The image data is searched for one or more features of the model under consideration. If matching features are found, then an instance of an object is assumed to be present and the location parameters are estimated, if possible or desired. The presence of an object at the estimated location may be verified later. We call this the model-by-model approach to shape recognition (Ayache and Faugeras, 1986; Bolles and Cain, 1982; Turney, Mudge, and Volz, 1985). The main disadvantage of this approach is that the cost of shape matching is usually high, because the image data is exhaustively searched for a selected feature belonging to the test model. Another approach, which we call feature-by-feature (Knoll and Jain, 1986), forms a collection of features from all the models in the training phase and associates with each feature a list containing where and in which objects that feature is found. Each of these features is then searched for in
the image data. If a particular feature is found, the list associated with that feature is used to hypothesize and verify the identities and locations of the possible objects. The main limitation of this approach is that, to achieve a higher speed of recognition, only features that appear in a certain proportion of the models should be used to form the model feature collection (Knoll and Jain, 1986; Turney et al., 1985). To find such features, complex and expensive methods are usually used, and these must be repeated each time a model is deleted or inserted. The fundamental difference between these two approaches (see Fig. 14) is that the model-by-model approach uses a feature belonging to a given model, whereas the feature-by-feature approach uses a feature belonging to a collection of features obtained from the database of models.
FIG. 14. The model-driven approaches to object recognition: (a) the model-by-model approach; (b) the feature-by-feature approach. Each proceeds from an image data representation through matching to hypothesis generation.
These two approaches are model driven in the sense that the image data is searched for model-related feature data (either belonging to a specified model or to a collection of features obtained from the model database) in order to generate hypotheses. The various model-driven techniques are not suitable for database retrieval because a linear search is conducted to find matching models. Therefore, a desirable response time for processing retrieval requests may not be attainable. Alternatively, the model database can be searched for an image-related feature in order to find which models contain that image feature. Once this information is available, the identities and locations of the objects can be hypothesized and verified. In other words, a data-driven approach (Grosky and Mehrotra, 1990; Mehrotra, 1986) to the recognition of objects is another possibility. One way of finding the identity and location of an object that contains a given image feature is to search each model, in turn, for this feature: a data-driven, model-by-model approach. However, another possibility is to form a collection of features belonging to the models and to search this collection for the given image feature. Since high speed is one of the desirable characteristics of an object recognition system in a database environment, the search for a given feature in the feature collection must be conducted with a minimum of effort. The efficiency of such a search can be increased by the use of heuristic search procedures such as A* (Grebner, 1986). However, this approach also employs a linear search and is thus not desirable for similarity retrieval in an image database system. The conventional data management approach to speeding up search is to organize the data in a particular way and then employ appropriately tailored search procedures. For example, binary search can be used with a sorted set of numerical data. If, in addition to the search operation, insertion and deletion operations are also required, the data can be organized in an index structure such as a binary search tree, kd-tree, 2-3 tree, hash table, or B-tree. Since an object recognition system in an image database environment may be required to identify additional objects and may no longer be required to identify some of the previously existing objects, the insertion and deletion of models must also be handled efficiently by such a system. Earlier data-driven, model-based object recognition techniques (Agin, 1980; Gleason and Agin, 1979) cannot handle complex image data containing overlapping, partially visible, and touching objects, due to the limitations of the features used for building models. Recently, a few data-driven techniques capable of handling complex image data have been proposed (Grosky and Mehrotra, 1990; Lamdan, Schwartz, and Wolfson, 1988; Mehrotra and Grosky, 1989; Stein and Medioni, 1990). In these techniques, as in traditional databases, iconic index structures are employed to store the image
and shape representation in such a way that searching for a given shape or image feature can be conducted efficiently. Some of these techniques handle the insertion and deletion of shapes or image representations very efficiently and with very little influence on the overall system performance. The general functioning of an index-based, data-driven object recognition technique is depicted in Fig. 15. Index-based, data-driven techniques are highly suited for similarity retrieval in an image database management system because they offer efficient shape matching as well as the possibility of inserting and deleting models. The existing iconic index structures for shape similarity-based retrieval can be classified into two classes based on the types of features used to represent shapes: global feature-based indexes and local feature-based indexes.

4.1.1 Global Feature-Based Indexes
These techniques utilize primitive structural features or properties that are derived from the entire shape. Examples of such features are area, perimeter, and a set of rectangles or triangles that cover the entire shape, among others. Since the entire shape is required to extract these features, however, techniques based on them cannot handle images containing overlapping or touching shapes.
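As a simple illustration of the kind of global features involved, the following hedged sketch computes a small feature vector (area, a perimeter estimate, bounding-box extents, and aspect ratio) from a binary region mask. The particular feature set and the mask representation are illustrative assumptions, not the feature set of any specific system discussed below.

```python
# Hedged sketch: computing a global feature vector from a binary mask.
# The mask is a list of rows of 0/1 values; 1 marks pixels inside the shape.
# Assumes the mask contains at least one foreground pixel.

def global_features(mask):
    height, width = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    min_x, max_x, min_y, max_y = width, -1, height, -1

    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            area += 1
            min_x, max_x = min(min_x, x), max(max_x, x)
            min_y, max_y = min(min_y, y), max(max_y, y)
            # Count boundary pixels: any 4-neighbour lies outside the shape.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < 0 or ny >= height or nx < 0 or nx >= width or not mask[ny][nx]:
                    perimeter += 1
                    break

    extent_x = max_x - min_x + 1
    extent_y = max_y - min_y + 1
    aspect_ratio = extent_x / extent_y
    return [area, perimeter, extent_x, extent_y, aspect_ratio]
```

Because every one of these quantities depends on the whole region, occluding or merging two shapes changes the vector drastically, which is exactly why global feature-based indexes fail on overlapping or touching objects.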
One of the earliest indexed, structure-based object recognition systems, called the SRI Vision Module, was proposed by Gleason and Agin (Agin, 1980; Gleason and Agin, 1979). This system uses global feature-based shape representations. The regions of a 2-D shape are represented by a vector of numerical attributes (or features) such as area, moments, perimeter, center of mass, the extent in the x and y directions, number of holes, area of holes, aspect ratio, and thinness ratio. Several images of each shape are taken to obtain average values of the various shape attributes. After building representations of all the known shapes, a binary tree-based attribute index of the type shown in Fig. 16 is created as follows. The two feature values with the largest separation for a given attribute and the corresponding pair of shapes are selected to reside at the root node of the index tree. A threshold is then selected for this attribute that distinguishes between the two shapes. Next, two subtrees of the root node are formed so that all shapes whose given attribute value is less than or equal to the threshold become members of the left subtree and all other shapes (i.e., those whose given attribute value is greater than the threshold) become members of the right subtree. This procedure is applied recursively to the two subtrees. This recursion terminates when the size of a subtree becomes one. Insertion or deletion of models requires a complete reconstruction of the decision tree for the new set of models. No secondary storage implementation has been proposed for this index. If N attributes are used to represent a shape, it becomes a point in an N-dimensional feature space. In this case, any multidimensional point indexing technique can be used.
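The recursive construction just described can be sketched as follows. This is a hedged reconstruction of the general scheme, not the actual SRI Vision Module code; shapes are assumed to be given as (name, attributes) pairs, where attributes maps attribute names to their averaged values.

```python
# Hedged sketch of building a binary attribute decision tree in the spirit
# of the SRI Vision Module description above.

def build_decision_tree(shapes, attribute_names):
    if len(shapes) <= 1:
        return {"leaf": shapes}                       # recursion stops at size one

    # Pick the attribute (and shape pair) with the largest value separation.
    best = None
    for attr in attribute_names:
        ordered = sorted(shapes, key=lambda s: s[1][attr])
        low, high = ordered[0], ordered[-1]
        separation = high[1][attr] - low[1][attr]
        if best is None or separation > best[0]:
            best = (separation, attr, low, high)

    _, attr, low, high = best
    threshold = (low[1][attr] + high[1][attr]) / 2.0  # distinguishes the two shapes

    left = [s for s in shapes if s[1][attr] <= threshold]
    right = [s for s in shapes if s[1][attr] > threshold]
    if not left or not right:                         # all remaining shapes identical
        return {"leaf": shapes}
    return {
        "attribute": attr,
        "threshold": threshold,
        "left": build_decision_tree(left, attribute_names),
        "right": build_decision_tree(right, attribute_names),
    }


def classify(tree, attributes):
    while "leaf" not in tree:
        branch = "left" if attributes[tree["attribute"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

As the text notes, inserting or deleting a model means rebuilding the whole tree from scratch on the new shape set.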
FIG. 16. An example of a decision tree classifier.
Grosky and Lu (1986) propose a boundary code-based iconic index for shape recognition. In their approach, a shape is represented by the code of its boundary. The similarity of two shapes is then based on the length of a particular type of longest common subsequence, called the longest q-generalized common subsequence (LqGCS), of the boundary codes of the two shapes, computed by a generalized pattern matching technique for two strings. An index is designed by packing the boundary codes into a superstring. Each character in the superstring carries a set of votes for the individual strings to which it belongs. To classify an input string (or boundary code), the LqGCS of this string with the superstring is found. The votes of the matching and nonmatching characters are used to determine the quality of the match between the input string and each of the strings in the database. Insertion or deletion of models again requires the complete redesign of the superstring for the new set of models.

Recently, Jagadish proposed a retrieval technique for similar rectilinear shapes (Jagadish, 1991). A rectilinear shape is represented by a set of rectangles that cover the entire shape. One of the rectangles is selected as the reference rectangle and is used to normalize the locations and sizes (each represented by a pair of values) of the other rectangles. The location of a rectangle before normalization is represented by the coordinates of the center of the line segment joining its lower-left and upper-right corners. The size of a rectangle before normalization is represented by the pair (x_ur - x_ll, y_ur - y_ll), where (x_ur, y_ur) and (x_ll, y_ll) are the coordinates of its upper-right and lower-left corners, respectively. A shape is then described by a vector (t_x, t_y, s, d) for the reference rectangle and a vector (c_x, c_y, s_x, s_y) for each of the other rectangles. Here (t_x, t_y) is the location of the reference rectangle, s is the product of the x and y components of the size of the reference rectangle, d is the ratio of the y and x components of the size of the reference rectangle, (c_x, c_y) is the center of a given rectangle normalized with respect to (t_x, t_y), and s_x and s_y are the x and y components of the size of the given rectangle normalized with respect to the size of the reference rectangle. Thus, a shape covered by k rectangles becomes a point in 4k-dimensional space, and any multidimensional point indexing method can be used. The similarity of two shapes (or two rectangles) is then determined by the sum of the areas of the nonintersecting regions, if any, when one shape is placed on the other.

Since all these techniques rely on global feature-based shape representations, they cannot handle images with overlapping or touching shapes or objects. We now describe some index-based techniques that permit shape similarity-based retrieval even when input images contain such overlapping or touching shapes.
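Before moving on, here is a hedged sketch of the rectangle normalization step in the rectilinear-cover representation just described. The choice of the first rectangle as the reference, the use of subtraction to normalize locations, and the plain list output are illustrative simplifications, not details taken from Jagadish (1991).

```python
# Hedged sketch: turn a rectilinear cover into a 4k-dimensional point,
# following the normalization described above.  A rectangle is given by
# its lower-left and upper-right corners: (x_ll, y_ll, x_ur, y_ur).

def shape_vector(rectangles):
    def center(r):
        x_ll, y_ll, x_ur, y_ur = r
        return ((x_ll + x_ur) / 2.0, (y_ll + y_ur) / 2.0)

    def size(r):
        x_ll, y_ll, x_ur, y_ur = r
        return (x_ur - x_ll, y_ur - y_ll)

    ref = rectangles[0]                      # illustrative choice of reference rectangle
    t_x, t_y = center(ref)
    ref_sx, ref_sy = size(ref)
    s = ref_sx * ref_sy                      # product of the size components
    d = ref_sy / ref_sx                      # ratio of the y to the x size component

    vector = [t_x, t_y, s, d]
    for rect in rectangles[1:]:
        c_x, c_y = center(rect)
        s_x, s_y = size(rect)
        vector += [c_x - t_x, c_y - t_y,        # location normalized to the reference
                   s_x / ref_sx, s_y / ref_sy]  # size normalized to the reference
    return vector
```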
4.1.2 Local Feature-Based Indexes
These techniques utilize primitive local structural or relational features to represent shapes and images. Local features are those that do not depend on the entire shape and can therefore be extracted by processing local segments of a shape or an image. Examples of local features are line and curve segments of the object boundary and points of maximal curvature change. Mehrotra and Grosky proposed a data-driven object recognition approach based on local feature-based iconic index structures (Mehrotra, 1986; Mehrotra and Grosky, 1989). They proposed that, given any structural feature-based shape representation technique and a quantitative method to measure the similarity (or difference) between any two features, a feature index tree having the following properties can be created (Grosky and Mehrotra, 1990):

1. The model features are stored at the leaf nodes.
2. Each of the interior nodes contains a feature, called the reference feature. This feature can be either a member of the model feature collection or an artificial feature.
3. The members of any subtree are more similar to the reference feature stored at the root of that subtree than to the reference feature stored at the root of the sibling subtree.

Given a feature of the input image, the best matching feature in the feature index tree can be easily found using the following algorithm:

1. Let the root of the feature index tree be at level 0. Find which of the two reference features at level 1 of the index tree is more similar to the given feature.
2. Search the subtree whose root has the more similar reference feature and ignore the other subtree.
3. Recursively apply this procedure until a leaf node is reached. The feature stored at the leaf node is then taken to be the best matching feature.
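A hedged sketch of this descent is given below. The node layout (a dict with two children, each carrying a reference feature) and the Euclidean dissimilarity function are illustrative assumptions; any feature representation with a quantitative similarity measure would do.

```python
# Hedged sketch of searching a binary feature index tree as described above.
# Interior nodes: {"reference": feature, "left": node, "right": node}
# Leaf nodes:     {"feature": feature, "shape_locations": [...]}

def dissimilarity(f1, f2):
    # Illustrative measure: Euclidean distance between feature vectors.
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5


def find_best_match(node, query_feature):
    while "feature" not in node:                      # descend until a leaf is reached
        left_ref = node["left"].get("reference", node["left"].get("feature"))
        right_ref = node["right"].get("reference", node["right"].get("feature"))
        if dissimilarity(query_feature, left_ref) <= dissimilarity(query_feature, right_ref):
            node = node["left"]                       # follow the more similar side
        else:
            node = node["right"]
    return node                                       # best matching model feature
```

The list attached to the returned leaf is the shape-location information discussed next; it drives hypothesis generation and verification.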
Associated with each feature stored at a leaf node of the feature index is a list of shape-location information that tells where and in which shapes that feature appears. The shape-location list associated with the best matching feature is used to hypothesize the identities and locations of possible shapes. These hypotheses are later verified. The average time complexity of recognition in this case is O(log_2 N) for a feature set of size N. This index structure permits efficient insertion and deletion of models. The index tree could be
developed by incrementally adding the features of each model one at a time or by recursively subdividing the entire collection of model features. A prototype system based on this feature index is presented in Mehrotra and Grosky (1989). In this system, a shape is modeled as an ordered set of vertices of the polygonal approximation of its boundary. Each vertex is described by a set of attributes consisting of a length, an angle, and its coordinate values. The length attribute gives the distance of the given vertex from the previous vertex, and the angle attribute gives the angle at the given vertex. In other words, a shape is described by an attributed string. Finally, fixed-size subsets of these vertices (disjoint or nondisjoint) are used as features for building the feature index. Figure 17 shows an example of a feature. An edit-distance-based similarity measure was proposed to determine the similarity of two attributed strings (or features). This measure computes the cost of transforming one attributed string into another. It attains a value of zero for exactly matching features and increases with the dissimilarity between the features.

Grosky, Neo, and Mehrotra (1989; 1991) extended their binary tree-based feature index to an m-way tree for secondary memory implementation. This generalized index has the following properties:

1. Each internal node has the structure shown in Fig. 18. The value Ref is a reference feature used to determine key values, while s represents the current out-degree of the node and is restricted to lie in the range [2, m]. The pointers P_i point to subtrees that are also m-way search trees. The values K_i divide the underlying features into intervals.
FIG. 17. An example of a feature.
FIG. 18. Structure of an internal node: Ref | s | P_0 | K_1 | P_1 | K_2 | P_2 | ... | K_{s-1} | P_{s-1}.
2. The key values in an internal node are in ascending order; i.e., K_i < K_{i+1} for 1 ≤ i ≤ s - 2.
3. All key values in nodes of the subtree pointed to by P_i are less than or equal to the key value K_{i+1}, for 0 ≤ i ≤ s - 2.
4. All key values in nodes of the subtree pointed to by P_{s-1} are greater than the key value K_{s-1}.
5. A typical leaf node is shown in Fig. 19 (of the form n | F_0 | L_0 | F_1 | L_1 | ...). The value n represents the current number of features in the node. Each F_i is a feature with an associated list L_i containing where and in which models F_i is found. In their implementation, this list has a maximum declared size; any list that grows beyond this bound is chained into an overflow area. Each leaf node can contain from 1 to r features.

The key values of an internal node are similarity values between the reference feature in that node and the features in its subtrees. A good match for an input feature is a feature in the index whose similarity value with respect to the input feature is less than some threshold. A two-phase index search process was proposed to find a good match for an input feature. The first phase, called the external search, searches for a leaf node containing the potentially matching feature. The second phase, called the internal search, searches the data associated with that leaf node for the best matching feature. Two cutoff criteria are used to eliminate some subsets from the search for the best match. Suppose that b is the current best-match key found so far, q the query key, and ξ = sim(q, b) the similarity between q and b, where sim is a metric similarity measure. If sim(q, x) < ξ, then b is updated with x and ξ is updated with sim(q, x), since x is a closer match. The following cutoff criteria provide sufficient conditions for eliminating a subset Y of the key space X if it is known a priori that sim(q, y) > ξ for all y in Y:

1. Suppose Y ⊆ X, x ∈ X, and for every y ∈ Y we have sim(x, y) ≤ k. Then, if sim(q, x) - k ≥ ξ, we can eliminate subset Y from consideration; that is, no key in Y is closer to the query key than b.
2. Suppose Y ⊆ X, x ∈ X, and for every y ∈ Y we have sim(x, y) ≥ k. Then, if k - sim(q, x) ≥ ξ, we can eliminate subset Y from consideration.
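Before turning to how the external search applies these tests, here is a hedged sketch of the two cutoff checks themselves. The function names and the way the bounds are supplied are illustrative assumptions; the point is only that an entire subtree can be discarded from the search using the bounds above.

```python
# Hedged sketch of the two cutoff criteria.  `sim` is a metric similarity
# (distance-like) measure; smaller values mean more similar.  For a subtree Y,
# we assume a known bound k on sim(x, y) for all y in Y, for some pivot x.

def can_prune_upper(sim_q_x, k, xi):
    # Criterion 1: every y in Y satisfies sim(x, y) <= k.
    # If sim(q, x) - k >= xi, no y in Y can beat the current best match b.
    return sim_q_x - k >= xi


def can_prune_lower(sim_q_x, k, xi):
    # Criterion 2: every y in Y satisfies sim(x, y) >= k.
    # If k - sim(q, x) >= xi, again no y in Y can be closer than b.
    return k - sim_q_x >= xi
```

Both checks follow from the triangle inequality for the metric sim: in the first case sim(q, y) ≥ sim(q, x) - sim(x, y) ≥ sim(q, x) - k, and in the second sim(q, y) ≥ sim(x, y) - sim(q, x) ≥ k - sim(q, x).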
The external search starts by traversing the path of the tree to the leaf L that ostensibly contains the exact match. Hence, a better estimate of ξ is obtained, resulting in the possible exclusion of various subtrees from the search. If a good match is not found, the two cutoff criteria are applied while alternately searching the left and right siblings of L. Once a cutoff criterion is met, further search of the siblings in that direction is unnecessary, since the key values in the tree are in ascending order.

Another class of data-driven shape matching techniques is based on the concept of geometric hashing. These methods store the model-base information in a hash table that is indexed to search for a given shape or a feature of a shape. Lamdan and Wolfson (1988) represent a shape by a similarity-invariant representation of a set of interest points. This is done by defining an orthogonal coordinate frame using an ordered pair of points, called the basis pair, and representing all other points with respect to this frame. Multiple representations of an object using different basis pairs are then obtained. For each basis pair, the transformed coordinates of all other points are hashed into a table that stores the (shape, basis pair) tuples for every coordinate. To analyze given image data, a basis frame is selected from the set of image interest points and the coordinates of all other points are computed with respect to the selected basis. For each transformed point, the hash table is indexed and votes are gathered for the (model, basis pair) tuples appearing there. The number of votes for a (model, basis pair) tuple indicates the quality of the similarity. The transformation parameters are hypothesized using the point correspondence between the model points and the image points. The hypothesized transformation is then verified.

Stein and Medioni (1990) propose another hash-based shape matching technique. They represent a shape by the polygonal approximation of its boundary. A set of adjacent line segments of the polygonal approximation, called a super segment, is used as the basic feature for creating a hash table. A super segment is characterized by a set of numerical attributes. The representation of each super segment is gray coded and hashed into a table where (super segment, object) tuples are stored. To analyze a query image, gray codes of the super segments of the input are used to index the hash table and to generate and verify hypotheses regarding the identity and location of the shape. This technique also permits the efficient insertion and deletion of models.

Some other data-driven shape matching techniques suitable for shape similarity-based retrieval in a database environment are described in
Hong and Wolfson (1988); Kalvin et al. (1986); Mehrotra, Kung, and Grosky (1990); and Sethi and Ramesh (1989a; 1989b; 1991).
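To make the geometric hashing idea sketched above more concrete, here is a hedged, simplified version of the basis-pair scheme attributed to Lamdan and Wolfson (1988). The quantization step, the use of a single basis pair at query time, and all function names are illustrative assumptions; the published method enumerates many basis pairs and includes a verification stage.

```python
# Hedged sketch of geometric hashing with basis pairs.
# A shape is a list of 2-D interest points [(x, y), ...].

from collections import defaultdict
from itertools import permutations

def to_basis(p, origin, axis):
    """Express point p in the frame defined by the basis pair (origin, axis)."""
    ux, uy = axis[0] - origin[0], axis[1] - origin[1]
    norm2 = ux * ux + uy * uy
    vx, vy = p[0] - origin[0], p[1] - origin[1]
    # Coordinates along the basis vector and its perpendicular (similarity invariant).
    return ((vx * ux + vy * uy) / norm2, (vy * ux - vx * uy) / norm2)

def quantize(coord, step=0.25):
    return (round(coord[0] / step), round(coord[1] / step))

def build_hash_table(models):
    """models: dict shape_name -> list of interest points."""
    table = defaultdict(list)
    for name, points in models.items():
        for origin, axis in permutations(points, 2):       # every ordered basis pair
            for p in points:
                if p is origin or p is axis:
                    continue
                table[quantize(to_basis(p, origin, axis))].append((name, (origin, axis)))
    return table

def vote(table, image_points, origin, axis):
    votes = defaultdict(int)
    for p in image_points:
        if p is origin or p is axis:
            continue
        for entry in table.get(quantize(to_basis(p, origin, axis)), []):
            votes[entry] += 1
    return votes          # high vote counts suggest (model, basis pair) hypotheses
```

At query time one would try several basis pairs drawn from the image points, take the highest-voted (model, basis pair) hypotheses, and then verify the implied transformation against the image, as the text describes.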
4.2 Spatial Relationship-Based Retrieval

Retrieving images that meet shape identity and spatial relationship constraints requires efficient representation and organization of spatial relationship knowledge; such representations are sometimes called relational models. Very limited research has been reported on this type of image data retrieval. Generally, two types of image representation models are used: graphs and strings. These methods assume that any given input image is first processed to obtain the identities and locations of the objects/shapes present in that image. In a graph-based method, an image representation or relational model is defined by a graph whose nodes represent objects and whose edges represent relationships. Shapiro and Haralick (1982) proposed two organizations for graph-based relational models. One of these organizations is based on the concept of clustering, whereas the other is based on the concept of binary trees. They defined two distance measures to quantify the similarity of two representations. According to their first measure, the distance D(G_1, G_2) for a pair of graphs (G_1, G_2), each of size s, is given by

D(G_1, G_2) = min_f || f(G_1) - G_2 ||,

where f is a permutation of s elements and || . || represents any norm. G_1 and G_2 are considered similar if D(G_1, G_2) is less than or equal to some threshold d. The second distance measure is a generalization of the first. Let M_1 = {R_1, . . . , R_K} and M_2 = {S_1, . . . , S_K} be two relational models. For any N-ary relation R ⊆ A^N and association f ⊆ A × B, the composition R ∘ f is defined as

R ∘ f = { (b_1, . . . , b_N) ∈ B^N | ∃(a_1, . . . , a_N) ∈ R with (a_n, b_n) ∈ f for 1 ≤ n ≤ N }.

The distance between M_1 and M_2 is then defined in terms of two types of errors: the structural error and the completeness error of the association f. The structural error of an association f ⊆ A × B with respect to N-ary relations R ⊆ A^N and S ⊆ B^N is

E_S(f) = | R ∘ f - S | + | S ∘ f⁻¹ - R |.

The structural error is a measure of the tuples found in R but not in S, or found in S but not in R. The completeness error of an association f ⊆ A × B with respect to N-ary relations R ⊆ A^N and S ⊆ B^N is
E_C(f) = | S - R ∘ f | + | R - S ∘ f⁻¹ |.

The completeness error is a measure of the tuples in S that no tuples in R map to, and vice versa. The combined error is then given by

E_{R,S}(f) = C_1 E_S(f) + C_2 E_C(f).
The total error of f with respect to the relational models M_1 and M_2 is then given by

E(f) = Σ_{k=1}^{K} E_{R_k, S_k}(f).
The distance between M_1 and M_2 is given by

GD(M_1, M_2) = min_f E(f).
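A hedged sketch of these error measures is given below for relations stored as sets of tuples. Representing the association f as a Python dict (a one-to-one mapping) and the brute-force set operations are illustrative assumptions; the code merely spells out the set expressions above.

```python
# Hedged sketch of the structural and completeness errors for one relation
# pair (R, S) under an association f, given as a dict mapping A -> B.

def compose(R, f):
    """R ∘ f: map every tuple of R through f; drop tuples f cannot fully map."""
    out = set()
    for tup in R:
        if all(a in f for a in tup):
            out.add(tuple(f[a] for a in tup))
    return out

def invert(f):
    return {b: a for a, b in f.items()}

def structural_error(R, S, f):
    return len(compose(R, f) - S) + len(compose(S, invert(f)) - R)

def completeness_error(R, S, f):
    return len(S - compose(R, f)) + len(R - compose(S, invert(f)))

def combined_error(R, S, f, c1=1.0, c2=1.0):
    return c1 * structural_error(R, S, f) + c2 * completeness_error(R, S, f)

def total_error(M1, M2, f, c1=1.0, c2=1.0):
    """M1 and M2 are lists of relations R_k and S_k; sum the combined errors."""
    return sum(combined_error(R, S, f, c1, c2) for R, S in zip(M1, M2))
```

Computing GD(M_1, M_2) would then require minimizing total_error over all candidate associations f, which is what makes exact relational matching expensive and motivates the organizations described next.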
The clustering-based approach forms clusters of relational models using one of the previously mentioned distance measures for comparing two relational models or graphs. For each cluster, a representative is selected such that every member of the cluster is more similar to its representative than to the representatives of the other clusters. To retrieve matching images/models, the input relational model is matched against each of the cluster representatives. The clusters whose representatives are closely similar to the input model are then searched for the best matching or closely matching images/models. The binary tree-based relational model organization has the same properties as the binary tree-based feature index structure of Mehrotra and Grosky discussed earlier. For a given set of relational models S, a binary tree is recursively generated. At each level of recursion, for every large enough set of relational models L, two models A and B belonging to L are selected so as to minimize

Σ_{G ∈ L} min[ D(G, A), D(G, B) ],

where D(R, X) denotes the distance between models R and X. The remaining models of set L are split into two groups P_A and P_B so that every model in P_A is more similar to A than to B and every model in P_B is more similar to B than to A. The search for the best matching relational model starts with the comparison of the input model with the two representatives at level 1, where the root is at level 0. If an acceptable match is found, the search terminates; otherwise, the subtree with the more similar representative is recursively searched and the other subtree is ignored. No secondary storage implementation has been proposed for any of these methods. Other treatments of relational matching may be found in Haar (1982); Mulgaonkar, Shapiro, and Haralick (1982a, 1982b); Shapiro and Haralick (1981); and Shapiro et al. (1984).

Chang, Shi, and Yang (1987) have proposed a two-dimensional string representation for modeling the spatial relationships among the objects in an image. In their approach, the input is regarded as a symbolic image that preserves the spatial relationships among objects of the original image. This
symbolic image can be obtained by recognizing the identities and spatial locations in the x and y directions of objects present in the original image. A symbolic image is encoded as a two-dimensional string. Formally, let V be the set of symbols representing the pictorial objects and let R be the set {=,