Parallel Computational Geometry
Selim G. Akl
Kelly A. Lyons
Department of Computing and Information Science
Queen's University
PRENTICE HALL, Englewood Cliffs, NJ 07632
Library of Congress Cataloging-in-Publication Data
Akl, Selim G. Parallel computational geometry / Selim G. Akl, Kelly A. Lyons. p. cm. Includes bibliographical references (p. ) and indexes. ISBN 0-13-652017-0 1. Geometry--Data Processing 2. Parallel processing (Electronic computers) 3. Computer algorithms. I. Lyons, Kelly A. II. Title. QA448.D38A55 1993 516'.00285'435--dc 20 92-8940 CIP
Acquisitions editor: THOMAS McELWEE Editorial/production supervision and interior design: RICHARD DeLORENZO Copy editor: CAMIE GOFFI Cover design: JOE DiDOMENICO Prepress buyer: LINDA BEHRENS Manufacturing buyer: DAVID DICKEY Editorial assistant: PHYLLIS MORGAN
To Joseph
S.G. Akl
To Rainy Lake and the people on it
K.A. Lyons
© 1993 by Prentice-Hall, Inc. A Simon & Schuster Company Englewood Cliffs, New Jersey 07632
All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
ISBN 0-13-652017-0
Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
Contents
PREFACE
1 INTRODUCTION
1.1 Origins of Parallel Computational Geometry
1.2 Representative Problems
1.3 Organization of the Book
1.4 Problems
1.5 References
2 MODELS OF PARALLEL COMPUTATION
2.1 Early Models
2.1.1 Perceptrons
2.1.2 Cellular Automata
2.2 Processor Networks
2.2.1 Linear Array
2.2.2 Mesh or Two-Dimensional Array
2.2.3 Tree
2.2.4 Mesh-of-Trees
2.2.5 Pyramid
2.2.6 Hypercube
2.2.7 Cube-Connected Cycles
2.2.8 Butterfly
2.2.9 AKS Sorting Network
2.2.10 Stars and Pancakes
2.3 Shared-Memory Machines
2.3.1 Parallel Random Access Machine
2.3.2 Scan Model
2.3.3 Broadcasting with Selective Reduction
2.3.4 Models for the Future
2.4 Problems
2.5 References
3 CONVEX HULL
3.1 Shared-Memory Model Algorithms
3.2 Network Model Algorithms
3.3 Other Models
3.4 When the Input Is Sorted
3.5 Related Problems
3.5.1 Three-Dimensional Convex Hulls
3.5.2 Digitized Images
3.5.3 Convex Hull of Disks
3.5.4 Computing Maximal Vectors
3.6 Problems
3.7 References
4 INTERSECTION PROBLEMS
4.1 Line Segments
4.2 Polygons, Half-Planes, Rectangles, and Circles
4.3 Problems
4.4 References
5 GEOMETRIC SEARCHING
5.1 Point Location
5.2 Range Searching
5.3 Problems
5.4 References
6 VISIBILITY AND SEPARABILITY
6.1 Visibility
6.1.1 Visibility Polygon from a Point Inside a Polygon
6.1.2 Region of a Polygon Visible in a Direction
6.1.3 Visibility of the Plane from a Point
6.1.4 Visibility Pairs of Line Segments
6.2 Separability
6.3 Problems
6.4 References
7 NEAREST NEIGHBORS
7.1 Three Proximity Problems
7.2 Related Problems
7.3 Problems
7.4 References
8 VORONOI DIAGRAMS
8.1 Network Algorithms for Voronoi Diagrams
8.2 PRAM Algorithms for Voronoi Diagrams
8.3 Problems
8.4 References
9 GEOMETRIC OPTIMIZATION
9.1 Minimum Circle Cover
9.2 Euclidean Minimum Spanning Tree
9.3 Shortest Path
9.4 Minimum Matchings
9.4.1 Graph Theoretic Formulation
9.4.2 Linear Programming Formulation
9.4.3 Geometric Formulation
9.4.4 Parallel Algorithm
9.4.5 Related Problems
9.4.6 Some Open Questions
9.5 Problems
9.6 References
10 TRIANGULATION OF POLYGONS AND POINT SETS
10.1 Trapezoidal Decomposition and Triangulation of Polygons
10.2 Triangulation of Point Sets
10.3 Problems
10.4 References
11 CURRENT TRENDS
11.1 Parallel Computational Geometry on a Grid
11.1.1 Geometric Search Problem
11.1.2 Shadow Problem
11.1.3 Path in a Maze Problem
11.1.4 Concluding Remarks
11.2 General Prefix Computations and Their Applications
11.2.1 Lower Bound for GPC
11.2.2 Computing GPC
11.2.3 Applying GPC to Geometric Problems
11.2.4 Concluding Remarks
11.3 Parallel Computational Geometry on Stars and Pancakes
11.3.1 Basic Definitions
11.3.2 Data Communication Algorithms
11.3.3 Convex Hull Algorithms on the Star and Pancake Networks
11.3.4 Solving Geometric Problems by the Merging Slopes Technique
11.3.5 General Prefix Computation
11.3.6 Concluding Remarks
11.4 Broadcasting with Selective Reduction
11.4.1 BSR Model
11.4.2 Sample BSR Algorithms
11.4.3 Optimal BSR Implementation
11.4.4 Concluding Remarks
11.5 Problems
11.6 References
12 FUTURE DIRECTIONS
12.1 Implementing Data Structures on Network Models
12.2 Problems Related to Visibility
12.2.1 Art Gallery and Illumination Problems
12.2.2 Stabbing
12.3 Geometric Optimization Using Neural Nets
12.4 Parallel Algorithms for Arrangements
12.5 P-Complete Geometric Problems
12.6 Dynamic Computational Geometry
12.7 Problems
12.8 References
BIBLIOGRAPHY
INDEXES
Author
Subject
Preface
Programming computers to process pictorial data efficiently has been an activity of growing importance over the last 40 years. These pictorial data come from many sources; we distinguish two general classes:
1. Most often, the data are inherently pictorial; by this we mean the images arising in medical, scientific, and industrial applications, such as, for example, the weather maps received from satellites in outer space.
2. Alternatively, the data are obtained when a mathematical model is used to solve a problem and the model relies on pictorial data; examples here include computing the average of a set of data (represented as points in space) in the presence of outliers, computing the value of a function that satisfies a set of constraints, and so on.
Regardless of their source, there are many computations that one may want to perform on pictorial data; these include, among many others, identifying contours of objects, "noise" removal, feature enhancement, pattern recognition, detection of hidden lines, and obtaining intersections among various components. At the foundation of all these computations are problems of a geometric nature, that is, problems involving points, lines, polygons, and circles. Computational geometry is the branch of computer science concerned with designing efficient algorithms for solving geometric problems of inclusion, intersection, and proximity, to name but a few.
Until recently, these problems were solved using conventional sequential computers, computers whose design more or less follows the model proposed by John von Neumann and his team in the late 1940s. The model consists of a single processor capable of executing exactly one instruction of a program during each time unit. Computers built according to this paradigm have been able to perform at tremendous speeds, thanks to inherently fast electronic components. However, it seems today that this approach has been pushed as far as it will go, and that the simple laws of physics will stand in the way of further progress. For example, the speed of light imposes a limit that cannot be surpassed by any electronic device. On the other hand, our appetite appears to grow continually for ever more powerful computers capable of processing large amounts of data at great speeds.
One solution to this predicament that has recently gained credibility and popularity is parallel processing. Here a computational problem to be solved is broken into smaller parts that are solved simultaneously by the several processors of a parallel computer. The idea is a natural one, and the decreasing cost and size of electronic components have made it feasible. Lately, computer scientists have been busy building parallel computers and developing algorithms and software to solve problems on them. One area that has received its fair share of interest is the development of parallel algorithms for computational geometry.
This book reviews contributions made to the field of parallel computational geometry since its inception about a decade ago. Parallel algorithms are presented for each problem, or family of problems, in computational geometry. The models of parallel computation used to develop these algorithms cover a very wide range, and include the parallel random access machine (PRAM) as well as several networks for interconnecting processors on a parallel computer. Current trends and future directions for research in this field are also identified. Each chapter concludes with a set of problems and a list of references.
The book is addressed to graduate students in computer science, engineering, and mathematics, as well as to practitioners and researchers in these disciplines. We assume the reader to be generally familiar with the concepts of algorithm design and analysis, computational geometry, and parallelism. Textbook treatment of these concepts can be found in the following references:
1. Algorithm Design and Analysis
G. Brassard and P. Bratley, Algorithmics: Theory and Practice, Prentice Hall, Englewood Cliffs, New Jersey, 1988.
T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
U. Manber, Introduction to Algorithms: A Creative Approach, Addison-Wesley, Reading, Massachusetts, 1989.
2. Computational Geometry
H. Edelsbrunner, Algorithms in Combinatorial Geometry, EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
K. Mehlhorn, Data Structures and Algorithms 3: Multi-Dimensional Searching and Computational Geometry, EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
3. Parallel Algorithms
S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, 1988.
Finally, we wish to thank the staff of Prentice Hall for their help, the reviewers for their enthusiasm, and our families for their love and support.
Selim G. Akl
Kelly A. Lyons
1 Introduction
Computational geometry is a branch of computer science concerned with the design and analysis of algorithms to solve geometric problems. Applications where efficient solutions to such problems are needed include computer graphics, pattern recognition, robotics, statistics, database searching, and the design of very large scale integrated (VLSI) circuits. Typical problems involve a set of points in the plane for which it is required to compute the smallest convex polygon containing the set, or to find a collection of edges connecting the points whose total length is minimum, to determine the closest neighbor to each point, and so on. A survey of important results in this area can be found in [Lee84b], while textbook treatments of the subject are provided in [Mehl84, Prep85, Edel87].
1.1 Origins of Parallel Computational Geometry
Due to the nature of some applications in which geometric problems arise, fast and even real-time algorithms are often required. Here, as in many other areas, parallelism seems to hold the greatest promise for major reductions in computation time. The idea is to use several processors which cooperate to solve a given problem simultaneously in a fraction of the time taken by a single processor. Therefore, it is not surprising that interest in parallel algorithms for geometric problems has grown in recent years. While some early attempts date back to the late 1950s, the modern approach to parallel computational geometry was pioneered by A. Chow in her 1980 Ph.D. thesis [Chow80]. Other initial attempts are described in [Nath80] and in [Akl82]. Since then a number of significant results have been obtained, and important problems have been identified whose solutions are still outstanding. In this book we survey the first ten years of research in parallel computational geometry.
1.2 Representative Problems
Consider the following problems, each a classic in computational geometry.
Problem 1: Convex Hull. Given a finite set of points in the plane, it is required to find their convex hull (i.e., the convex polygon with smallest area that includes all the points, either as its vertices or as interior points).
Problem 2: Line Segment Intersection. Given a finite set of line segments in the plane, it is required to find and report all pairwise intersections among line segments, if any exist.
Problem 3: Point Location in a Planar Subdivision. Given a convex planar subdivision (i.e., a convex polygon itself partitioned into convex polygons) and a finite set of data points, it is required to determine the polygon of the subdivision occupied by each data point.
Problem 4: Visibility Polygon from a Point Inside a Polygon. Given a simple polygon P and a point p inside P, it is required to determine that region of P that is visible from p (i.e., the region occupied by points q such that the line segment with endpoints p and q does not intersect any edge of P).
Problem 5: Closest Pair. Given a finite set of points in the plane, it is required to determine which two are closest to one another.
Problem 6: Voronoi Diagram. Given a finite set S of data points in the plane, it is required to find, for each point p of S, the region of the plane formed by points that are closer to p than to any other point of S.
Problem 7: Minimum-Distance Matching. Given 2n points in the plane, it is required to match each point with a single other point so that the sum of the Euclidean distances between matched points is as small as possible.
Problem 8: Polygon Triangulation. Given a simple polygon P, it is required to triangulate P (i.e., to connect the vertices of P with a set of chords such that every resulting polygonal region is a triangle).
Each of the problems above has been studied thoroughly in the literature, and often more than one efficient sequential algorithm exists for its solution. The list above is also illustrative of geometric problems with a significant degree of inherent parallelism. Take, for instance, Problem 2. It is obvious that one could check all pairs of segments simultaneously and determine all existing intersections.
Similarly, in Problem 3, all polygons of the subdivision may be checked at the same time for inclusion of a given data point. The same is true of Problem 5, where the closest neighbor of each point can be computed in parallel for all points and the overall closest pair of points quickly determined afterward. These examples demonstrate that very fast solutions to geometric problems can be obtained through parallel computation. However, the solutions just outlined are rather crude and typically require a large number of resources. Our purpose in this book is to show that geometric problems can be, and indeed have been, solved by algorithms that are efficient both in terms of running time and computational resources.
We now introduce some terminology and notation used throughout the book. As customary in computational geometry, we refer to the number of relevant objects in the statement of a problem as the size of that problem. For example, the size of Problem 1 is the number of points, while the size of Problem 2 is the number of line segments. Let $f(n)$ and $g(n)$ be functions from the positive integers to the positive reals:
1. The function $g(n)$ is said to be of order at least $f(n)$, denoted $\Omega(f(n))$, if there are positive constants $c$ and $n_0$ such that $g(n) \ge c f(n)$ for all $n \ge n_0$.
2. The function $g(n)$ is said to be of order at most $f(n)$, denoted $O(f(n))$, if there are positive constants $c$ and $n_0$ such that $g(n) \le c f(n)$ for all $n \ge n_0$.
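As a small worked illustration of these two definitions, consider the function $g(n) = 3n^2 + 5n$. This function and the constants below are chosen here purely as an example and do not come from the text.

```latex
\begin{align*}
g(n) &= 3n^2 + 5n \;\le\; 3n^2 + n^2 \;=\; 4n^2 && \text{for all } n \ge 5 \quad (\text{since } 5n \le n^2 \text{ when } n \ge 5),\\
g(n) &= 3n^2 + 5n \;\ge\; 3n^2 && \text{for all } n \ge 1.
\end{align*}
```

Hence $g(n)$ is of order at most $n^2$ (take $c = 4$ and $n_0 = 5$) and of order at least $n^2$ (take $c = 3$ and $n_0 = 1$). The lower-order term $5n$ and the multiplicative constants play no role in either statement.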
The $\Omega(\ )$ notation is used to express lower bounds on the computational complexity of problems. For example, to say that $\Omega(n \log n)$ is a lower bound on the number of operations required to solve a certain problem of size $n$ in the worst case means that the problem cannot be solved by any algorithm (whether known or yet to be discovered) in fewer than $cn \log n$ operations in the worst case, for some constant $c$. On the other hand, the $O(\ )$ notation is used to express upper bounds on the computational complexity of problems. For example, if there exists an algorithm that solves a certain problem of size $n$ in $cn^2$ operations in the worst case, for some constant $c$, and furthermore, no other algorithm is known that requires asymptotically fewer operations in the worst case, then we say that $O(n^2)$ is an upper bound on the worst-case complexity of the problem at hand. Both the $\Omega(\ )$ and the $O(\ )$ notations allow us to concentrate on the dominating term in an expression describing a lower or upper bound and to ignore any multiplicative constants.
In an algorithm, an elementary operation is either a computation assumed to take constant time (such as adding or comparing two numbers) or a routing step (i.e., the sending of a datum from one processor to a neighboring processor in a parallel computer). The number of elementary operations used by a sequential algorithm is generally used synonymously with the running time of the algorithm. Thus if a sequential algorithm performs $cn$ operations to solve a problem of size $n$, where $c$ is some constant, we say that the algorithm runs in time $t(n) = O(n)$. In a parallel algorithm an upper bound on the total number of operations performed by all processors collectively in solving a problem (also known as the cost or work) is obtained by multiplying the number of processors by the running time of the algorithm (i.e., the maximum number of operations performed by any one processor). When an algorithm solves a problem using a number of operations that matches, up to a constant multiplicative factor, the lower bound on the computational complexity of the problem, we say that the algorithm is optimal if it is a sequential algorithm, and cost optimal if it is a parallel algorithm.
A randomized algorithm (whether sequential or parallel) is one that terminates within a prespecified running time with a given probability. The running time of such an algorithm is said to be probabilistic. A deterministic algorithm, on the other hand, has a guaranteed worst-case running time. In this book we refer to an algorithm as being deterministic only in those circumstances where it is to be contrasted with a randomized algorithm. When no qualifier is used explicitly, it should be understood that the algorithm in question (whether sequential or parallel) is deterministic. We adopt the notation used in [Reif90], where $\tilde{O}(\ )$ is used to express the running time of a probabilistic algorithm, while $O(\ )$ is used in conjunction with deterministic time. Sometimes the expected running time of a deterministic algorithm is of interest. To this end, an average-case analysis is conducted assuming that the input obeys a certain probability distribution.
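The crude approach to Problem 5 mentioned earlier in this section, together with the definition of cost given above, can be made concrete with a short sketch. The Python code below is sequential and only illustrative: the function names `closest_pair_brute_force` and `pair_count` are our own, and the assignment of one processor per pair of points is an assumption made for the sake of the example, not an algorithm taken from the literature.

```python
import math
from itertools import combinations


def closest_pair_brute_force(points):
    """Return (distance, p, q) for a closest pair among `points`.

    Every pair is examined independently of all the others. With one
    (hypothetical) processor per pair, all distances could therefore be
    computed in a single parallel step; here the pairs are simply
    enumerated one after the other.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    p, q = min(combinations(points, 2),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return math.dist(p, q), p, q


def pair_count(n):
    """Number of point pairs, i.e. processors used by the crude scheme."""
    return n * (n - 1) // 2


points = [(0, 0), (5, 5), (1, 0), (9, 2)]
print(closest_pair_brute_force(points))  # distance 1.0 between (0, 0) and (1, 0)
print(pair_count(len(points)))           # 6 pairs for 4 points
```

Since the number of pairs grows quadratically with the number of points, the cost of this scheme (number of processors multiplied by running time) is at least quadratic, however fast each step is. This is the sense in which such solutions are fast but wasteful of resources.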
1.3 Organization of the Book
Unlike in sequential computation, where von Neumann's model prevails, several models of parallel computation have been proposed and used. In Chapter 2 we introduce the most common of these models, particularly those used to design parallel algorithms for computational geometry. Many of the more interesting results in parallel computational geometry are algorithms designed for the shared-memory PRAM model of computation (Section 2.3.1). Algorithms for the less powerful network models are often seen as more practical since actual machines based on these models can be constructed more readily. However, designing algorithms for the PRAM model results in complexities that reflect the inherent limits of solving a problem in parallel rather than limits due to data movement [Yap87]. For example, the lower bound for communicating data on a mesh-connected computer (Section 2.2.2) is $\Omega(n^{1/2})$, and this lower bound holds for most interesting algorithms on the mesh. In this book we describe parallel algorithms that solve geometric problems on both network and shared-memory models.
The main body of the book, Chapters 3 to 10, is organized according to geometric problem. In each of these chapters we describe a problem and give some idea of the extent to which the problem has been studied in the sequential world and the best known time complexities for solving the problem sequentially. We then describe parallel solutions to the problem and discuss the significance of the parallel results. When appropriate, a table is provided summarizing the complexities of existing parallel algorithms for the problem. Finally, Chapters 11 and 12 cover current trends and future directions, respectively, in the design of parallel algorithms for geometric problems.
We stress that the parallel algorithms selected for treatment in this survey are for problems that are fundamental in nature, such as construction, proximity, intersection, search, visibility, separability, and optimization. By contrast, parallel algorithms for various applications of computational geometry are covered in [Uhr87, Kuma90, Kuma91]. We also draw the reader's attention to [Good92b], where another survey of parallel computational geometry can be found.
1.4 Problems
1.1. A parallel computer is a computer consisting of several processors that cooperate to solve a problem simultaneously. This definition leaves many details unspecified, particularly those details pertaining to the structure and operation of a parallel computer. Several of the options available in designing a parallel computer, or more specifically, a parallel model of computation, are outlined in Chapter 2. Suppose that you had to design a parallel computer. Before reading Chapter 2, and perhaps to better appreciate the issues therein, describe how your parallel computer will be organized and how the processors will function when solving a problem in parallel.
1.2. The convex hull of a finite set P of points in the plane is the smallest convex polygon that contains all the points of P. The convex hull is a cornerstone concept in computational geometry, and algorithms for computing it, both sequentially and in parallel, have provided many insights to the field's theory and practice. Use the parallel computer designed in solving Problem 1.1 to compute the convex hull of a set of points.
1.3. Given a finite set P of points in the plane, it is required to compute a triangulation of P (i.e., it is required to connect the points of P repeatedly using nonintersecting straight-line segments until no more segment can be added without creating an intersection). The resulting structure is the convex hull of P and a collection of polygons inside it, each of which is a triangle. Suggest a way to solve this problem in parallel.
1.4. Assume that the triangulation T of a set of points P, as defined in Problem 1.3, is known. Given a point p not in P, it is required to determine the triangle (if any) of T in which p falls. Is there a fast way to solve this problem in parallel? How many processors will be needed? Express your answers in terms of the number of points in P.
1.5. In some applications it is required to determine, given a finite set of points in the plane, which two are closest (if several pairs satisfy the condition, one may be selected at random). Propose an efficient parallel solution to this problem (i.e., one that is fast and does not use an excessive number of processors). Can your solution be extended to points in d-dimensional space where d > 2?
1.6. Another common problem in computational geometry is to determine intersections of a number of objects. For simplicity, let us assume that all objects are equilateral triangles in the plane, all of the same size and all having one edge parallel to the x-axis. Design a parallel algorithm to solve this problem, and discuss its time and processor requirements.
1.7. An n-vertex convex polygon P is given in the plane such that its interior contains the origin of coordinates. It is required to identify P (i.e., determine its shape and location) using finger probes. For a chosen directed line L, a finger probe can be thought of as a point moving from infinity along and in the direction of L until it first touches P at some point p. The outcome of the probe is the pair (L, p), where p is $\infty$ if L does not intersect P. A sequence of such probes may be used to determine the exact shape and location of P. Design a parallel algorithm for identifying a set of probes sufficient to determine the shape and location of a given convex polygon.
1.8. General polygons are polygons in which two or more edges may cross. This class of polygons includes simple polygons as a subclass. (In simple polygons, no two edges may cross.)
(a) Give a definition of the interior of a general polygon.
(b) Design a test for point inclusion in a general polygon (i.e., a test to determine whether a given data point falls inside a given general polygon).
(c) Design a test for polygon inclusion in a general polygon (i.e., a test to determine whether a general polygon is included inside another general polygon).
(d) Design a test for polygon intersection [i.e., a test to determine whether two general polygons intersect (of which inclusion is a special case)].
(e) Develop parallel implementations of the tests in parts (b) through (d).
(f) Are there applications where general polygons arise?
1.9. You are given two simple polygons P and Q in the plane, where Q falls entirely inside P. Now consider two points p and q in the annular region R = P - Q (i.e., p and q are in P but not in Q). It is required to find a polygonal path from p to q that falls entirely in R and satisfies one or both of the following conditions:
(a) The number of line segments on the path is minimized.
(b) The total length of the path is minimized.
Develop a parallel algorithm for solving this problem.
1.10. Given a point p inside a simple polygon P, it is required to compute a polygonal path from p to every vertex v of P such that the number of line segments on each path, also called the link distance from p to v, is minimized. Design a parallel algorithm for solving this problem.
1.11. Using the definition of link distance in Problem 1.10, develop an adequate definition for the concept of link center of a simple polygon P, and design a parallel algorithm for locating such a center.
1.12. In sequential computation, data structures play a crucial role in the development of efficient algorithms, particularly for computational geometric problems. Discuss the idea of how data structures might be implemented in a parallel computational environment.
1.5 References
[Akl82] S. G. Akl, A constant-time parallel algorithm for computing convex hulls, BIT, Vol. 22, 1982, 130-134.
[Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Edel87] H. Edelsbrunner, Algorithms in combinatorial geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
[Good92b] M. T. Goodrich and C. K. Yap, What can be parallelized in computational geometry: a survey, manuscript in preparation, 1992.
[Kuma90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal (Editors), Parallel Algorithms for Machine Intelligence and Vision, Springer-Verlag, New York, 1990.
[Kuma91] V. K. Prasanna Kumar, Parallel Architectures and Algorithms for Image Understanding, Academic Press, New York, 1991.
[Lee84b] D. T. Lee and F. P. Preparata, Computational geometry - a survey, IEEE Transactions on Computers, Vol. C-33, No. 12, 1984, 1072-1101.
[Mehl84] K. Mehlhorn, Data structures and algorithms 3: multi-dimensional searching and computational geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
[Nath80] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt, Parallel Algorithms for the Convex Hull Problem in Two Dimensions, Technical Report EE 8005, Department of Electrical Engineering, Indian Institute of Technology, Delhi Hauz Khas, New Delhi, October 1980.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Uhr87] L. Uhr (Editor), Parallel Computer Vision, Academic Press, New York, 1987.
[Yap87] C. K. Yap, What can be parallelized in computational geometry? Invited talk at the International Workshop on Parallel Algorithms and Architectures, Humboldt University, Berlin, May 1987, Lecture Notes in Computer Science, No. 269, Springer-Verlag, Berlin, 1988, 184-195.
2 Models of Parallel Computation
In this chapter we define existing models of parallel computation, with a particular emphasis on those used in the development of parallel computational geometric algorithms.
2.1 Early Models
We begin this review with two models that predate today's more popular ones (developed mostly since the late 1970s). Interestingly, the salient features of these two models were later rediscovered, and new names are now used for these models.
2.1.1 Perceptrons
The perceptron, proposed in the late 1950s [Rose62], was intended to model the visual pattern recognition ability in animals. A rectangular array of photocells (representing the eye's retina) receives as input from the outside world a binary pattern belonging to one of two classes. Inside the machine, the input bits are collected into n groups. Within a group, each bit is multiplied (at random) by +1 or -1, and the products are added: If the sum is larger than or equal to a certain threshold, a 1 is produced as input to the next stage of the computation; otherwise, a 0 is produced. Each of these n bits is then multiplied by an appropriate weight value $w_i$ and the products are added: Again, if the sum is larger than or equal to a given threshold, a final output of 1 is produced (indicating that the original input pattern belongs to one class); otherwise, a 0 is produced (indicating the other class). Figure 2.1 shows a rectangular array and a perceptron.
Figure 2.1 Perceptron.
A number of limitations of the perceptron model are uncovered in [Mins69]. It is important to point out that many of the ideas underlying today's neural net model of computation owe their origins to perceptrons. Neural nets, however, are more general in a number of ways; for example, they are not restricted to visual patterns, can classify an input into one of several classes, and their computations may be iterative [Lipp87].
2.1.2 Cellular Automata
The cellular automaton consists of a collection of simple processors all of which are identical. Each processor has a fixed amount of local memory and is connected to a finite set of neighboring processors. Figure 2.2 illustrates an example of a processor (a cell) in a cellular automaton and its neighboring processors. At each step of a computation, all processors operate simultaneously: Input is received from a processor's neighbors (and possibly the outside world), a small amount of local computation is performed, and the output is then sent to the processor's neighbors (and possibly the outside world). Developed in the early to mid-1960s [Codd68], this model enjoyed a purely theoretical interest until the advent of very large scale integrated circuits. The dramatic reduction in processor size brought about by the new technology rendered the model feasible for real computers. Today's systolic arrays [Fost80] are nothing but finite cellular automata often restricted to two-dimensional regular interconnection patterns, with various input and output limitations. Cellular automata are the theoretical foundation upon which lies the more general processor network model of Section 2.2.
2.2 Processor Networks
In a processor network, an interconnected set of processors, numbered 0 to N - 1, cooperate to solve a problem by performing local computations and exchanging
Figure 2.2 Example of a processor and its neighbors in a cellular automaton.
Figure 2.3 Linear array with N = 6 processors.
messages [Akl89a]. Although all identical, the processors may be simple or powerful, operate synchronously or asynchronously, and execute the same or different algorithms. The interconnection network may be regular or irregular, and the number of neighbors of each processor may be a constant or a function of the size of the network. Local computations as well as message exchanges are taken into consideration when analyzing the time taken by a processor network to solve a problem. Some of the most widely used networks are outlined below.
2.2.1 Linear Array
The simplest way to interconnect N processors is as a one-dimensional array. Here processor i is linked to its two neighbors i - 1 and i + 1 through a two-way communication line. Each of the end processors, 0 and N - 1, has only one neighbor. Figure 2.3 shows an example of a linear array of processors for N = 6.
2.2.2 Mesh or Two-Dimensional Array
A two-dimensional network is obtained by arranging the N processors into an m x m array, where $m = N^{1/2}$. The processor in row j and column k is denoted by (j, k), where $0 \le j \le m - 1$ and $0 \le k \le m - 1$. A two-way communication line links (j, k) to its neighbors (j + 1, k), (j - 1, k), (j, k + 1), and (j, k - 1). Processors on the boundary rows and columns have fewer than four neighbors and hence fewer connections. This network is also known as the mesh or the mesh-connected computer (MCC) model.
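The neighbor rules for the linear array and the mesh can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and processors are addressed exactly as in the text (an index i for the array, a pair (j, k) for the mesh, with no wraparound links).

```python
def linear_array_neighbors(i, n):
    """Neighbors of processor i in a linear array of n processors."""
    # End processors 0 and n - 1 keep only the neighbor that exists.
    return [p for p in (i - 1, i + 1) if 0 <= p < n]


def mesh_neighbors(j, k, m):
    """Neighbors of processor (j, k) in an m x m mesh without wraparound."""
    candidates = [(j + 1, k), (j - 1, k), (j, k + 1), (j, k - 1)]
    # Boundary processors lose the candidates that fall outside the array.
    return [(r, c) for (r, c) in candidates if 0 <= r < m and 0 <= c < m]
```

For the mesh of Figure 2.4 (m = 4), a corner processor has two neighbors, an edge processor three, and an interior processor four.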
Figure 2.4 Mesh with N = 16 processors.
When each of its processors is associated with a picture element (or pixel) of a digitized image (i.e., a rectangular grid representation of a picture), the mesh is sometimes referred to as a systolic screen. Figure 2.4 shows a mesh with N = 16 processors. A number of processor indexing schemes are used for the processors in a mesh [Akl85b]. For example, in row-major order, processor i is placed in row j and column k of the two-dimensional array such that i = jm + k, for $0 \le i \le N - 1$, $0 \le j \le m - 1$, and $0 \le k \le m - 1$. In snakelike row-major order, processor i is placed in row j and column k of the processor array such that i = jm + k when j is even, and i = jm + m - k - 1 when j is odd, where i, j, and k are as before. Finally, shuffled row-major order is defined as follows. Let $b_1 b_2 \cdots b_q$ and $b_1 b_{(q/2)+1} b_2 b_{(q/2)+2} b_3 b_{(q/2)+3} b_4 \cdots b_{q/2} b_q$ be the binary representations of two indices $i$ and $i_s$, respectively, $0 \le i, i_s \le N - 1$. Then processor $i_s$ occupies in shuffled row-major order the position that would be occupied by processor $i$ in row-major order. The mesh model can be generalized to dimensions higher than two. In a d-dimensional mesh, each processor is connected to two neighbors in each dimension, with processors on the boundary having fewer connections [Akl85b, Hole90]. Several variations on the mesh have been proposed, including the mesh with broadcast buses [where the processors in each row (or column) are connected to a bus
Figure 2.5 Tree with $N = 2^4 - 1 = 15$ processors.
over which a processor can broadcast a datum to all other processors in the same row (or column)], and the mesh with reconfigurable buses (which is essentially a mesh with broadcast buses and four switches per processor, allowing several subbuses to be created as needed by the algorithm).
2.2.3 Tree
In a tree network, the processors form a complete binary tree with d levels. The levels are numbered from 0 to d - 1 and there are a total of $N = 2^d - 1$ nodes, each of which is a processor. A processor at level i is connected by a two-way line to its parent at level i + 1 and to its children at level i - 1. The root processor (at level d - 1) has no parent and the leaves (all of which are at level 0) have no children. Figure 2.5 shows a tree with $N = 2^4 - 1 = 15$ nodes.
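The level structure of the tree network can be checked with a small sketch. The text does not number processors within a level, so the (level, position) addressing used here, with positions counted from 0 left to right, is an assumption made only for illustration, as are the function names.

```python
def tree_level_sizes(d):
    """Processors per level, from level 0 (leaves) up to level d - 1 (root)."""
    return [2 ** (d - 1 - level) for level in range(d)]


def tree_parent(level, pos, d):
    """Parent of the processor at (level, pos); the root has no parent."""
    if level == d - 1:
        return None
    return (level + 1, pos // 2)


def tree_children(level, pos):
    """Children of the processor at (level, pos); leaves have none."""
    if level == 0:
        return []
    return [(level - 1, 2 * pos), (level - 1, 2 * pos + 1)]
```

Summing the level sizes for d = 4 gives 8 + 4 + 2 + 1 = 15, matching the count $2^d - 1$ for the tree of Figure 2.5.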
2.2.4 Mesh-of-Trees
In a mesh-of-trees (MOT) network, N processors are placed in a square array with $N^{1/2}$ rows and $N^{1/2}$ columns. The processors in each row are interconnected to form a binary tree, as are the processors in each column. The tree interconnections are the only links among the processors. Figure 2.6 shows a mesh-of-trees with N = 16 processors. Sometimes the mesh-of-trees architecture is described slightly differently. Here, N processors form an $N^{1/2} \times N^{1/2}$ base such that each base processor is a leaf of a column binary tree and a row binary tree. Additional processors form row and column binary trees. Each base processor is connected to its parent processor in its column binary tree and its parent processor in its row binary tree. The total number of processors is O(N). In some cases, mesh connections between the base processors are allowed.
Figure 2.6 Mesh-of-trees with N = 16 processors.
This architecture can be nicely embedded in the plane making it useful for VLSI implementation. Figure 2.7 shows this different mesh-of-trees architecture.
2.2.5 Pyramid A one-dimensional pyramid computer is obtained by adding two-way links connecting processors at the same level in a binary tree, thus forming a linear array at each level. This concept can be extended to higher dimensions. For example, a two-dimensional
Figure 2.7 Slightly different mesh-of-trees where N = 16. The N boxes are the base processors and the black circles are additional processors that form row and column binary trees.
Figure 2.8 Pyramid with d = 2.
pyramid consists of $4^{d+1}/3 - 1/3$ processors distributed among d + 1 levels. All processors at the same level are connected to form a mesh. There are $4^d$ processors at level 0 (also called the base) arranged in a $2^d \times 2^d$ mesh. There is only one processor at level d (also called the apex). In general, a processor at level i, in addition to being connected to its four neighbors at the same level, also has connections to four children at level i - 1 (provided that $i \ge 1$), and to one parent at level i + 1 (provided that $i \le d - 1$). Figure 2.8 shows a pyramid with d = 2.
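The processor count of the two-dimensional pyramid can be verified with a short sketch. It assumes that levels are numbered 0 (base) to d (apex), which gives d + 1 levels, and that the four children of a processor are the 2 x 2 block directly beneath it. The (level, row, column) addressing and the function names are hypothetical.

```python
def pyramid_level_sizes(d):
    """Processors per level 0..d; level i is a 2^(d-i) x 2^(d-i) mesh."""
    return [4 ** (d - i) for i in range(d + 1)]


def pyramid_size(d):
    """Closed form for the total: (4^(d+1) - 1) / 3."""
    return (4 ** (d + 1) - 1) // 3


def pyramid_parent(i, r, c):
    """Parent of processor (r, c) at level i (assumes i is below the apex)."""
    return (i + 1, r // 2, c // 2)


def pyramid_children(i, r, c):
    """Four children of processor (r, c) at level i (assumes i >= 1)."""
    return [(i - 1, 2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
```

For d = 2 the levels hold 16, 4, and 1 processors, and the closed form gives the same total of 21.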
Figure 2.9 Hypercube with $N = 2^3$ processors.
2.2.6 Hypercube
Assume that $N = 2^d$ for some $d \ge 1$. A d-dimensional hypercube is obtained by connecting each processor to d neighbors. The d neighbors of processor i are those processors j such that the binary representations of the numbers j and i differ in exactly one bit. Figure 2.9 shows a hypercube with $N = 2^3$ processors.
2.2.7 Cube-Connected Cycles
To obtain a cube-connected cycles (CCC) network, we begin with a d-dimensional hypercube, then replace each of its $2^d$ corners with a cycle of d processors. Each processor in a cycle is connected to a processor in a neighboring cycle in the same dimension. See Figure 2.10 for an example of a CCC network with d = 3 and $N = 2^d d = 24$ processors. In the figure each processor has two indices ij, where i is the processor order in cycle j. A modified CCC network is a CCC network with additional links, guaranteeing that it can be partitioned into smaller CCC networks [Mill88].
2.2.8 Butterfly
A butterfly network consists of $2^d (d + 1)$ processors organized into d + 1 rows and $2^d$ columns. If (i, j) is the processor in row i and column j, then for i > 0, (i, j) is connected to (i - 1, j) and (i - 1, k), where the binary representations of the numbers j and k differ only in the ith most significant bit. A butterfly network with $2^3 (3 + 1)$ processors is illustrated in Figure 2.11. The butterfly is related to both the hypercube and the CCC architectures. A link in a hypercube between processors i and j such that the binary representation of the number i differs from the binary representation of the number j in the rth bit corresponds to a link in a butterfly between processors (r - 1, i) and (r, j). To see how the butterfly is related to the CCC model, we begin by identifying row 0 with row d. Now, consider each column of processors in the butterfly as a node of a d-dimensional hypercube such that the processors in a column are connected in a cycle at the node in the order in
Figure 2.10 Cube-connected cycles network with d = 3 and N = 24 processors.
which they appear in the column. Any algorithm that can be implemented in T(n) time on a butterfly can be implemented in T(n) time on a hypercube and a CCC network [Ullm84].
2.2.9 AKS Sorting Network
An O(n) processor network capable of sorting n numbers into nondecreasing order in $O(\log n)$ time is exhibited in [Leig85], based on the earlier work of [Ajta83]. (Footnote: All logarithms in this book are to the base 2 unless stated otherwise.)
Figure 2.11 Butterfly network with $2^3 (3 + 1)$ processors.
This network, combined with a modified CCC network, is referred to in [Mill88] as a modified AKS network.
2.2.10 Stars and Pancakes
These are two interconnection networks with the property that for a given integer $\eta$, each processor corresponds to a distinct permutation of $\eta$ symbols, say $\{1, 2, \ldots, \eta\}$. In other words, both networks connect $N = \eta!$ processors, and each processor is labeled with the permutation to which it corresponds. Thus, for $\eta = 4$, a processor may have the label 2134. In the star network, denoted by $S_\eta$, a processor v is connected to a processor u if and only if the label of u can be obtained from that of v by exchanging
Figure 2.12 A 4-star.
the first symbol with the ith symbol, where $2 \le i \le \eta$. Thus for $\eta = 4$, if v = 2134 and u = 3124, u and v are connected by a two-way link in $S_4$, since 3124 and 2134 can be obtained from one another by exchanging the first and third symbols. Figure 2.12 shows $S_4$. In the pancake network, denoted by $P_\eta$, a processor v is connected to a processor u if and only if the label of u can be obtained from that of v by flipping the first i symbols, where $2 \le i \le \eta$. Thus for $\eta = 4$, if v = 2134 and u = 4312, u and v are connected by a two-way link in $P_4$, since 4312 can be obtained from 2134 by flipping the four symbols, and vice versa. Figure 2.13 shows $P_4$. Both the star and pancake interconnection networks have been proposed as alternatives to the hypercube. They have recently been used to solve several problems in computational geometry. These two networks, their properties, and associated algorithms are studied in detail in Chapter 11.
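To compare the adjacency rules of the star and pancake networks with that of the hypercube (Section 2.2.6), the following sketch lists the neighbors of a processor in each network. The function names are hypothetical, and processor labels are represented as Python strings such as "2134", which is a choice made here for readability.

```python
def hypercube_neighbors(i, d):
    """Processors whose binary index differs from i in exactly one of d bits."""
    return [i ^ (1 << b) for b in range(d)]


def star_neighbors(label):
    """Exchange the first symbol with the i-th symbol, for each i >= 2."""
    neighbors = []
    for i in range(1, len(label)):  # 0-based index i is the (i+1)-th symbol
        symbols = list(label)
        symbols[0], symbols[i] = symbols[i], symbols[0]
        neighbors.append("".join(symbols))
    return neighbors


def pancake_neighbors(label):
    """Flip (reverse) the first i symbols, for each i >= 2."""
    return [label[:i][::-1] + label[i:] for i in range(2, len(label) + 1)]
```

With the label 2134 from the text, the star rule yields 3124 among its neighbors and the pancake rule yields 4312. Under both rules a processor has one neighbor per choice of i, so its degree is one less than the number of symbols.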
2.3 Shared-Memory Machines
One of the main challenges involved in designing an algorithm for a processor network follows from the fact that the routing of messages from one processor to another is the responsibility of the algorithm designer. This challenge is removed completely by the models described in this section.
Figure 2.13 A 4-pancake.
2.3.1 Parallel Random Access Machine
In a parallel random access machine (PRAM), the processors no longer communicate directly through a network. Instead, a common memory is used as a bulletin board and all data exchanges are executed through it. Any pair of processors can communicate through this shared memory in constant time. As shown in Figure 2.14, an interconnection unit (IU) allows each processor to establish a path to each memory location for the purpose of reading or writing. The processors operate synchronously and each step of a computation consists of three phases:
1. The read phase, in which the processors read data from memory
2. The compute phase, in which arithmetic and logic operations are performed
3. The write phase, in which the processors write data to memory
Depending on whether two or more processors are allowed to read from and/or write to the same memory location simultaneously, three submodels of the PRAM are identified:
1. The exclusive-read exclusive-write (EREW) PRAM, where both read and write accesses by more than one processor to the same memory location are disallowed
2. The concurrent-read exclusive-write (CREW) PRAM, where simultaneous reading from the same memory location is allowed, but not simultaneous writing
Figure 2.14 PRAM.
3. The concurrent-read concurrent-write (CRCW) PRAM, where both forms of simultaneous access are allowed
In the case of the CRCW PRAM, one must also specify how write conflicts are to be resolved (i.e., what value is stored in a memory location when two or more processors are attempting to write potentially different values simultaneously to that location). Several conflict resolution policies have been proposed, such as the PRIORITY rule (where processors are assigned fixed priorities, and only the one with the highest priority is allowed to write in case of conflict), the COMMON rule (where, in case of conflict, the processors are allowed to write only if they are attempting to write the same value), the ARBITRARY rule (where any one of the processors attempting to write succeeds), the SMALLEST rule (where only the processor wishing to write the smallest datum succeeds), the AND rule (where the logical AND of the Boolean values to be written ends up being stored), the SUM rule (where the values to be stored are added up and the sum deposited in the memory location), the COLLISION rule (where a special symbol is stored in the memory location to indicate that a write conflict has occurred), and many other variants.
Computational geometry on a single processor uses the REAL RAM as the model of computation, which allows real arithmetic up to arbitrary precision, as well as evaluation of square roots and analytic functions, such as "sin" or "cos", in O(1) time. In parallel, the model of computation is the REAL PRAM, sometimes denoted as RPRAM. We make the assumption that the PRAM model used for computational geometry is the REAL PRAM and refer to it simply as PRAM.
Two fundamental algorithms for the EREW PRAM are broadcasting and sorting. Broadcasting allows a datum d to be communicated to N processors in O(log N) time, by beginning with one processor reading d and then doubling the number of processors that have d at each iteration [Akl85b].
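The doubling idea behind broadcasting can be simulated sequentially as follows. This is a sketch under stated assumptions, not the algorithm as given in [Akl85b]: the function name is hypothetical, each processor's copy is modeled as one cell of a Python list, and the inner loop stands for one synchronous parallel step.

```python
def erew_broadcast(datum, n):
    """Simulate broadcasting `datum` to n processors by doubling.

    Returns (local copies, number of rounds). In each round, processor i
    (for i < have) copies its value to processor i + have, so no cell is
    read or written by more than one processor within the same round.
    """
    local = [None] * n
    local[0] = datum  # a single processor reads the datum first
    have = 1          # number of processors that currently hold the datum
    rounds = 0
    while have < n:
        for i in range(min(have, n - have)):  # conceptually simultaneous
            local[i + have] = local[i]
        have = min(2 * have, n)
        rounds += 1
    return local, rounds
```

For n = 8 the simulation needs three rounds (1, 2, 4, then 8 holders), in line with the O(log N) bound stated above.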
The second algorithm, sorting, allows N numbers to be sorted in nondecreasing order by N processors in O(log N) time
[Cole88b]. By using these two algorithms, it is possible to simulate any concurrent-read or concurrent-write step involving N processors in O(log N) time on a PRAM that disallows them [Akl89a, Corm90].
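The write-conflict rules listed in Section 2.3.1 for the CRCW PRAM can be made concrete with a small resolver for a single memory location. Everything here is illustrative: the function name, the `COLLISION_SYMBOL` marker, the convention that a smaller processor number means higher priority, and the use of `None` for "nothing is written" are assumptions not fixed by the text.

```python
COLLISION_SYMBOL = "#"  # hypothetical marker for the COLLISION rule


def resolve_write(requests, rule):
    """Value stored in one location, given (processor_id, value) requests."""
    if not requests:
        return None
    values = [value for _, value in requests]
    if rule == "PRIORITY":
        # Assumption: the smallest processor id has the highest priority.
        return min(requests, key=lambda req: req[0])[1]
    if rule == "COMMON":
        # The write happens only if all values agree; None means no write.
        return values[0] if all(v == values[0] for v in values) else None
    if rule == "ARBITRARY":
        return values[0]  # any writer may succeed; the first is chosen here
    if rule == "SMALLEST":
        return min(values)
    if rule == "AND":
        return all(values)
    if rule == "SUM":
        return sum(values)
    if rule == "COLLISION":
        return values[0] if len(values) == 1 else COLLISION_SYMBOL
    raise ValueError("unknown rule: " + rule)
```

For example, if processors 3, 1, and 2 attempt to write 5, 9, and 5, then PRIORITY stores 9, SMALLEST stores 5, SUM stores 19, and COLLISION stores the special symbol.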
2.3.2 Scan Model
Given n data items $x_0, x_1, \ldots, x_{n-1}$ and an associative binary operation *, it is required to compute the n - 1 quantities $x_0 * x_1$, $x_0 * x_1 * x_2$, ..., $x_0 * x_1 * \cdots * x_{n-1}$. It is well known that all required outputs can be computed in parallel on a processor network with n processors in O(log n) time. This is known as a parallel prefix computation [Krus85]. It is argued in [Blel89] that since O(log n) is the amount of time required to gain access to a shared memory of O(n) locations, the time for parallel prefix computation can be absorbed by the time for memory access. Further, since the latter is assumed to take constant time, so should the former. The scan model is therefore the usual PRAM augmented with a special circuit to perform parallel prefix. As a result, many algorithms that use parallel prefix and run on the PRAM in T time units run on the scan model in T/log n time units.
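One common way to carry out a parallel prefix computation is by recursive doubling, simulated sequentially below. This sketch is an assumption about how the computation might be organized on a shared-memory machine, not the scan circuit of [Blel89]; the function name is hypothetical, and the snapshot `prev` models the rule that all processors read before any processor writes.

```python
import operator


def parallel_prefix(xs, op=operator.add):
    """Return (prefixes, rounds) for an associative operation `op`.

    After the last round, position i holds xs[0] op xs[1] op ... op xs[i].
    """
    s = list(xs)
    n = len(s)
    step, rounds = 1, 0
    while step < n:
        prev = s[:]  # read phase: every processor sees the old values
        for i in range(step, n):  # conceptually simultaneous
            s[i] = op(prev[i - step], prev[i])
        step *= 2
        rounds += 1
    return s, rounds
```

Because the left operand always comes from the lower index, the sketch also works for operations that are associative but not commutative, such as string concatenation. The number of rounds grows as log n, rounded up.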
2.3.3 Broadcasting with Selective Reduction Broadcasting with selective reduction, proposed in [Akl89c], extends the power of the CRCW PRAM while using only its existing resources: The interconnection unit connecting processors to memory locations is exploited to allow each processor to gain access to potentially all memory locations (broadcasting). At each step of an algorithm involving a concurrent write, the algorithm can specify, for each memory location, which processors are allowed to write in that location (selection) and the rule used to combine these write requests (reduction). This model is described in detail in Chapter 11 together with algorithms for solving geometric problems on it.
2.3.4 Models for the Future
All popular models of computation today are based largely on assumptions derived from digital electronics. It is believed, however, that new models may emerge from totally different approaches to building computers. There are already computers in existence today in which some devices are built using optical components [Feit88]. The day may not be far off when central processing units and memories are optical. There are also studies under way to investigate the possibility of building biologically based computers [Conr86]. The effect these models may have on our approach to algorithm design in general, and computational geometry in particular, is still unknown.
2.4 Problems
2.1. How does your parallel computer, designed in solving Problem 1.1, compare with the parallel models of computation described in this chapter?
2.2. In Chapter 3, several parallel algorithms are described for computing the convex hull of a set of n points in the plane (defined in Problem 1.2). Before reading Chapter 3, attempt to compute the convex hull on one or more of the models of computation presented in this chapter. For simplicity, you may assume that no two points have the same x- or y-coordinate, and that no three points fall on the same straight line. Your algorithms may use one or more of the following properties of the convex hull:
(a) If a point p falls inside the triangle formed by any three of the other n - 1 points, p is not a vertex of the convex hull.
(b) If $p_i$ and $p_j$ are consecutive vertices of the convex hull, and $p_i$ is viewed as the origin of coordinates, then among all the remaining n - 1 points of the set, $p_j$ forms the smallest angle with $p_i$ with respect to the positive (or negative) x-axis.
(c) A segment $(p_i, p_j)$ is an edge of the convex hull if and only if all the n - 2 remaining points fall on the same side of an infinite line through $(p_i, p_j)$.
(d) If all the rays from a point p to every other point in the set are constructed, and the largest angle between each pair of adjacent rays is smaller than $\pi$, then p is not on the hull (and conversely).
2.3. Analyze each algorithm designed in Problem 2.2 to obtain its running time t(n) and the number of processors it uses p(n), both of which are functions of the size of the problem n (i.e., the number of points given as input).
2.4. A set of 2n points in the plane consists of n blue points and n red points. It is required to connect every blue point to exactly one red point, and similarly, every red point to exactly one blue point by straight lines whose total length is the minimum possible. Derive parallel algorithms for solving this problem on at least two different models of parallel computation, and analyze their running time.
2.5. Two points p and q in a simple polygon P are said to be visible from one another if the line segment with endpoints p and q does not intersect any edge of P. The visibility polygon from a point p contained inside a polygon P is that region of P that is visible from p. Show how this problem can be solved in parallel on a hypercube parallel computer.
2.6. Given a set of circular arcs S on a circle C, it is required to find the minimum number of arcs in S that cover C. Design an efficient parallel algorithm for solving this problem on a two-dimensional array of processors.
2.7. The plus-minus $2^i$ (PM2I) interconnection network for an N-processor computer is defined as follows: Processor j is connected to processors r and s, where $r = j + 2^i \bmod N$ and $s = j - 2^i \bmod N$, for $0 \le i < \log N$.
(a) Compare the PM2I processor network to the hypercube.
(b) Use the PM2I processor network to solve Problem 1.4.
2.8. Consider the following model of parallel computation. The model consists of $n^2$ processors arranged in an n x n array (n rows and n columns). The processors are interconnected as follows:
(a) The processors of each column are connected to form a ring (i.e., every processor is connected to its top and bottom neighbors), and the topmost and bottommost processors of the column are also connected.
(b) The processors of each row are connected to form a binary tree (i.e., if the processors
in the row are numbered 1, 2, ..., n, then processor i is connected to processors 2i and 2i + 1 if they exist). Use this model to solve Problem 1.5.
2.9. Let N processors, numbered 0, 1, ..., N - 1, be available, where N is a power of 2. In the perfect shuffle interconnection network a one-way line links processor i to processor j, where the binary representation of j is obtained by cyclically shifting that of i one position to the left. Thus for N = 8, processor 0 is connected to itself, processor 1 to processor 2, processor 2 to processor 4, processor 3 to processor 6, processor 4 to processor 1, processor 5 to processor 3, processor 6 to processor 5, and processor 7 to itself. In addition to these shuffle links, two-way links connecting every even-numbered processor to its successor are sometimes added to the network. These connections are called the exchange links. In this case, the network is known as the shuffle-exchange interconnection network. Use the shuffle-exchange interconnection network to solve Problem 1.6.
2.10. An Omega network is a multistage interconnection network with n inputs and n outputs. It consists of k = log n rows numbered 1, 2, ..., k with n processors per row. The processors in row i are connected to those in row i + 1, for i = 1, 2, ..., k - 1, by a perfect shuffle interconnection.
(a) Discuss the relationship between the Omega network and a k-dimensional hypercube.
(b) Use the Omega network to solve Problem 2.5.
2.11. A satellite picture is represented as an n x n array of pixels each taking an integer value between 0 and 9, thus providing various gray levels. The position of a pixel is given by its coordinates (i, j), where i and j are row and column numbers, respectively. It is required to smooth the picture [i.e., the value of pixel (i, j) is to be replaced by the average of its value and those of its eight neighbors (i - 1, j), (i - 1, j - 1), (i, j - 1), (i + 1, j - 1), (i + 1, j), (i + 1, j + 1), (i, j + 1), and (i - 1, j + 1), with appropriate rounding].
(a) Design a special-purpose model of parallel computation to solve this problem. Assume that N, the number of processors available, is less than $n^2$, the number of pixels.
(b) Give two different implementations of the smoothing process, and analyze their running times.
2.12. As described in Problem 2.11, a picture can be viewed as a two-dimensional array of pixels. A set S of pixels is said to be convex if the convex hull of S does not contain any pixel not belonging to S. Design a parallel algorithm for the two-dimensional pyramid to determine whether a set of pixels is convex.
2.5 References
[Ajta83] M. Ajtai, J. Komlós, and E. Szemerédi, An O(n log n) sorting network, Combinatorica, Vol. 3, 1983, 1-19.
[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Blel89] G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538.
[Codd68] E. F. Codd, Cellular Automata, Academic Press, New York, 1968.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Conr86] M. Conrad, The lure of molecular computing, IEEE Spectrum, Vol. 23, No. 10, October 1986, 55-60.
[Corm90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
[Feit88] D. G. Feitelson, Optical Computing, MIT Press, Cambridge, Massachusetts, 1988.
[Fost80] M. J. Foster and H. T. Kung, The design of special purpose VLSI chips, Computer, Vol. 13, No. 1, January 1980, 26-40.
[Hole90] J. A. Holey and O. H. Ibarra, Iterative algorithms for planar convex hull on mesh-connected arrays, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 102-109.
[Krus85] C. P. Kruskal, L. Rudolph, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185.
[Leig85] F. T. Leighton, Tight bounds on the complexity of parallel sorting, IEEE Transactions on Computers, Vol. C-34, No. 4, April 1985, 344-354.
[Lipp87] R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine, April 1987, 4-22.
[Mill88] R. Miller and Q. F. Stout, Efficient parallel convex hull algorithms, IEEE Transactions on Computers, Vol. C-37, No. 12, December 1988, 1605-1618.
[Mins69] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, Massachusetts, 1969.
[Rose62] F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1962.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
3 Convex Hull
A set $P = \{p_0, p_1, \ldots, p_{n-1}\}$ of points in the plane is given, where each point is represented by its Cartesian coordinates [i.e., $p_i = (x_i, y_i)$]. It is required to find the convex hull of P, denoted CH(P) (i.e., the smallest convex polygon that includes all the points of P). The vertices of CH(P) are points of P such that every point of P is either a vertex of CH(P) or lies inside CH(P). Figure 3.1 shows a set P of points and the convex hull of P.
Without doubt the most popular problem among designers of sequential computational geometric algorithms, constructing convex hulls has enjoyed similar attention in parallel computing. In fact, it appears to be the first problem in computational geometry for which parallel algorithms were designed [Nath80, Chow81, Akl82]. To simplify our subsequent discussion we make two assumptions:
1. No two points have the same x or y coordinates.
2. No three points fall on the same straight line.
These two assumptions can easily be lifted without affecting the behavior of the algorithms we present. Our statement of the convex hull problem requires a polygon to be computed. Any algorithm for determining CH(P) must then produce its vertices in the (clockwise or counterclockwise) order in which they appear on the convex hull. Consequently, any such algorithm can be used to sort n numbers [Akl89a]. Therefore, the running time of any algorithm for computing CH(P) on some model of computation is bounded below by the time required to sort on that model. For example, sorting n numbers on an O(n)-processor linear array requires $\Omega(n)$ time [Akl85b], and hence the same bound applies to computing the convex hull. In fact, as we will see below, many algorithms for computing the convex hull use sorting explicitly as a step in their computations. It should be noted that the problem of determining the vertices of the convex hull in any order is no easier asymptotically than that of producing them as a polygon. Indeed,
Figure 3.1 (a) Set P of points and (b) convex hull CH(P) of P.
it is known that the former problem requires $\Omega(n \log n)$ algebraic operations [Yao81], and this coincides with the lower bound on sorting [Ben-O83].
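The claim that a convex hull algorithm can be used to sort can be illustrated with a standard reduction: place each number x at the point (x, x^2) on a parabola, compute the hull as a polygon, and read the x-coordinates off in order. The sketch below is sequential and uses a gift-wrapping hull in the style of Jarvis, chosen here because it does not itself sort; it is an assumption for illustration, not one of the parallel algorithms of this chapter. It assumes the input numbers are distinct, and all function names are hypothetical.

```python
def cross(o, a, b):
    """Positive if b lies to the left of the directed line from o to a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def jarvis_hull(points):
    """Hull vertices in counterclockwise order, starting at the leftmost point.

    Assumes at least two distinct points with no three on a straight line.
    """
    start = min(points)  # leftmost point, found by a linear scan
    hull = [start]
    while True:
        current = hull[-1]
        candidate = points[0] if points[0] != current else points[1]
        for p in points:
            # Keep the candidate that has every other point on its left.
            if p != current and cross(current, candidate, p) < 0:
                candidate = p
        if candidate == start:
            return hull
        hull.append(candidate)


def sort_via_hull(numbers):
    """Sort distinct numbers by lifting them onto the parabola y = x^2."""
    if len(numbers) < 2:
        return list(numbers)
    points = [(x, x * x) for x in numbers]
    return [x for x, _ in jarvis_hull(points)]
```

Every point on the parabola is a hull vertex, and a counterclockwise walk from the leftmost vertex visits them in increasing order of x. A hull algorithm that returns its vertices as a polygon therefore cannot be asymptotically faster than sorting on the same model.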
3.1 Shared-Memory Model Algorithms
One of the first parallel algorithms for the convex hull problem appears in [Akl82]; it runs in constant time on the CRCW PRAM (with the AND rule for resolving write conflicts) and requires $O(n^3)$ processors. An improved algorithm for the same model that also runs in constant time but with $O(n^2)$ processors is described in [Akl89b]. The algorithm makes use of the following two properties of the convex hull:
Property 1. Let $p_i$ and $p_j$ be consecutive vertices of CH(P), and assume that $p_i$ is taken as the origin of coordinates. Then among all points of P, $p_j$ forms the smallest angle with $p_i$ with respect to the positive (or negative) x-axis.
Property 2. A segment $(p_i, p_j)$ is an edge of the convex hull if and only if all the n - 2 remaining points fall on the same side of an infinite line through $(p_i, p_j)$.
For simplicity of notation, and when the distinction is clear from context, we henceforth use $(p_i, p_j)$ to represent both the straight-line segment with endpoints $p_i$ and $p_j$, as well as the infinite straight line through points $p_i$ and $p_j$, without qualification. Assume that $O(n^2)$ processors are available on a CRCW PRAM. By assigning O(n) processors to each point $p_i$, it is possible to determine in constant time (using the SMALLEST write-conflict resolution rule) the points $p_k$ and $p_m$ such that the segments $(p_i, p_k)$ and $(p_i, p_m)$ form the smallest angles with respect to the positive x-axis and the negative x-axis, respectively. Now, by assigning O(n) processors to each segment $(p_i, p_j)$ found in the preceding step, it is possible to determine in constant time (using the AND write-conflict resolution rule) whether all input points fall on the same side of an infinite straight line through $(p_i, p_j)$, in which case $p_i$ is declared to be a point of CH(P). Finally, by sorting them according to their polar angles (using a constant time CRCW PRAM sorting algorithm [Akl89a]), the points identified as vertices of CH(P)
can be listed in the order in which they appear on the boundary of the convex hull (e.g., clockwise order). Thus the entire algorithm requires constant time. It is interesting to point out that this algorithm is essentially a parallelization of the algorithm due to Jarvis [Jarv73], long believed to be inherently sequential because of the incremental (point-by-point) way it constructs the convex hull. Note further that no algorithm is known for constructing the convex hull in constant time in the worst case while using asymptotically fewer that n
2
processors. By contrast, the CRCW PRAM
algorithm described in [Stou88], which requires 0(n) processors and the COLLISION rule for resolving write conflicts, assumes that the data are chosen from a uniform distribution and runs in constant expected time. A different approach is taken in [Atal86a] and, independently in [Agga88], for the weaker CREW PRAM with 0(n) processors. It is based on the idea of multiway divideand-conquer, which consists of dividing the problem into a number of subproblems whose solutions are obtained recursively in parallel, and then merging these solutions. The algorithm proceeds in three steps. In the first step the n points are sorted by their x-coordinates. The set P is then partitioned into n1/2 sets PI, P2 , .. . , P,112, divided by vertical lines such that Pi is to the left of Pj if i < j. This step is implemented in 0(logn) time using the sorting algorithm of [Cole88b]. In the second step, the convex hull problem is solved recursively in parallel for all Pi to obtain CH(Pi). Finally, the union of the convex polygons CH(PI), CH(P2 ), ..., CH(Pn 2) is computed, yielding CH(P). The merge step is implemented as follows. Let u and v be the points of P with smallest and largest x-coordinates, respectively. The convex polygon CH(P) consists of two parts: the upper hull (from u to v) and the lower hull (from v to u). We describe the merge step for the upper hull (the computation for the lower hull is symmetric). Each CH(Pi) is assigned 0(n"/2 ) processors. These processors find the n 1 2 - I upper common tangents between CH(P,) and the remaining n1 2 _I other convex polygons. Each tangent between two polygons is obtained by applying the sequential algorithm of [Over8l], which runs in 0(logn) time. Since 0(n1 /2 1) processors are computing tangents to the same polygon simultaneously, concurrent reads are needed during this step. Among all tangents to polygons to the left of CH(P,), let Vi be the one with smallest slope and tangent to CH(Pi) at the point vi. 
Similarly, among all tangents to polygons to the right of CH(P_i), let W_i be the one with the largest slope and tangent to CH(P_i) at the point w_i. As shown in Figure 3.2, if the angle formed by V_i and W_i is less than 180°, none of the points of CH(P_i) is on the upper hull; otherwise, all the points from v_i to w_i are on the upper hull. These computations are done simultaneously for all CH(P_i), each yielding a (possibly empty) list of points on the upper hull. A parallel prefix computation, as defined in Section 2.3.2, is then used to compress these lists into one. In this case the addition operation (+) is used as the binary associative operation and the computation is referred to as computing prefix sums. Consider a shared memory array of points z(1), z(2), ..., z(n), where for each point z(i), it is known whether or not z(i) is an upper hull point. To compact all upper hull points into adjacent positions of the array, we assign a label s(i) to z(i), such that s(i) = 1 if z(i) is an upper hull point; otherwise, s(i) = 0. The position of upper hull point z(k)
Convex Hull
30
Chap. 3
[Figure 3.2 (a) The angle formed by V_i and W_i is less than 180°. (b) The angle formed by V_i and W_i is greater than 180°.]
in the compacted array is then obtained from s(1) + s(2) + ... + s(k). This quantity is known as a prefix sum, and all n prefix sums can be computed in O(log n) parallel time using O(n) processors. Initially, processor i knows s(i), and when the following iterative computation terminates, s(i) has been replaced by s(1) + s(2) + ... + s(i): During iteration j, where 0 ≤ j ≤ log n - 1, and for all i, 2^j + 1 ≤ i ≤ n, processor i replaces s(i) with s(i - 2^j) + s(i). Thus the merge step requires O(n) processors and O(log n) time. The overall running time of the algorithm is given by t(n) = t(n^{1/2}) + b log n, for some constant b, which is O(log n). Since O(n) processors are used, the cost of the algorithm (i.e., the total number of operations performed) is O(n log n). This cost is optimal in view of the Ω(n log n) lower bound on the number of operations required to solve the convex hull problem [Prep85].
The same running time is obtained in [Cole88a], using the cascading merge technique described in detail in [Atal89c]. We outline the technique briefly here. Given an unsorted list of elements stored at the leaves of a tree T (where some leaves may be empty), the list U(v) at each node v of T is computed, which is the sorted list of all elements stored at descendant leaves of v. The algorithm proceeds in stages, computing a list U_s(v) at stage s for each node v ∈ T. At each stage s, samples of the elements in U_{s-1}(x) and U_{s-1}(y) are merged to form the list U_s(v), where x and y are the left and right children of v, respectively. An internal node v is active at stage s if ⌊s/3⌋ ≤ alt(v) ≤ s, where alt(v), the altitude of a node v, is the height of the tree T minus the depth of v (the depth of the root is 0). Node v is full at stage s if ⌊s/3⌋ = alt(v). For each stage s up to and including the time a node v becomes full, a sample of every fourth element of U_{s-1}(x) is passed to v and merged with every fourth element of U_{s-1}(y) to create the list U_s(v). In stage s + 1, a sample of every second element of U_s(x) and a sample of every second element of U_s(y) are merged, and in stage s + 2, all the elements of U_{s+1}(x) and U_{s+1}(y) are merged. Therefore, there are 3 × height(T) stages in total. A sorted list L is a c-cover of a sorted list J if between every two elements of the list (-∞, L, +∞), there are at most c elements of J. It is shown in [Atal89c] how the two sample lists can be merged to create U_s(v) in constant time using O(|U_s(v)|) processors if the list of items passed to v from its child node x (y) in stage s is a c-cover of the list of items passed to v from x (y) in stage s + 1.
In [Cole88a], n points are sorted by x-coordinate and the upper and lower hulls are computed using the cascading divide-and-conquer technique. Consider computing the upper hull. Initially, the points are paired and each pair represents an edge on the upper hull of that pair of points. The upper hulls are stored at the leaves of a tree.
Given edges on two upper hulls sorted by slope, the edges are merged using cascading merge, and the common tangent t is found. Those edges not on the union of the upper hulls are deleted from the sorted list and t is added. Adding and deleting these edges does not change the fact that the list being passed up the tree at stage s is a c-cover of the list being passed up the tree at stage s + 1. The convex hull is thus computed in O(log n) time using O(n) processors on a CREW PRAM.
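The prefix-sum compaction used in the merge step of the multiway divide-and-conquer algorithm can be illustrated with a small sequential simulation. This is only a sketch: the function names `prefix_sums_doubling` and `compact_upper_hull` are hypothetical, indices are 0-based rather than 1-based as in the text, and a copied list stands in for the synchronous reads that the PRAM processors would perform in each iteration.

```python
def prefix_sums_doubling(s):
    """Simulate the O(log n)-iteration parallel prefix-sum computation.

    In the iteration with offset 2**j, every "processor" i >= 2**j
    (0-based) replaces s[i] with s[i - 2**j] + s[i]. All reads use the
    values from the previous iteration, which the snapshot `prev` models.
    """
    s = list(s)
    n = len(s)
    step = 1  # step == 2**j
    while step < n:
        prev = s[:]  # values as they were before this iteration
        for i in range(step, n):  # conceptually done by all i at once
            s[i] = prev[i - step] + prev[i]
        step *= 2
    return s


def compact_upper_hull(z, is_upper):
    """Move the flagged points of z into adjacent positions, keeping order."""
    labels = [1 if flag else 0 for flag in is_upper]  # the labels s(i)
    pos = prefix_sums_doubling(labels)
    out = [None] * (pos[-1] if pos else 0)
    for k, flag in enumerate(is_upper):
        if flag:
            out[pos[k] - 1] = z[k]  # prefix sum gives the 1-based position
    return out
```

For instance, `compact_upper_hull(list("abcdef"), [True, False, True, True, False, True])` places the four flagged entries in the first four positions of the output, in their original order.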
Two other algorithms for the CREW PRAM are described in the pioneering work of Chow [Chow81]: The first uses O(n) processors and runs in O(log^2 n) time, while the second uses O(n^{1+1/K}) processors and runs in O(K log n) time, with 1 ≤ K ≤ log n.
Many efforts were directed toward obtaining efficient convex hull algorithms on the least powerful variant of the shared-memory model, namely the EREW PRAM. For example, an algorithm appearing in [Nath80] runs in O(K log n) time with O(n^{1+1/K}) processors, 1 ≤ K ≤ log n, thus duplicating the performance of the CREW PRAM algorithm in [Chow81]. Also described in [Nath80] is an O(N)-processor algorithm, 1 ≤ N ≤ n, which runs in O((n/N) log n + log n log N) time. An algorithm in [Akl84] uses O(n^{1-ε}) processors, 0 < ε < 1, and runs in O(n^ε log h) time, where h is the number of edges on the convex hull. It is shown in [Mill88] how the CREW PRAM multiway divide-and-conquer algorithm of [Atal86a] and [Agga88] can be modified to achieve the same performance on the weaker EREW PRAM. A judicious distribution of work among processors and
broadcasting the data needed to compute the tangents avoids the need for concurrent reads. The algorithm has essentially the same structure as the CREW algorithm (i.e., subdivision, recursive solution, and merging), with two differences:
1. In the first step, P is subdivided into n^{1/4} subsets, P_1, P_2, ..., P_{n^{1/4}}, each with n^{3/4} points.
2. In the third step, a different approach is used to compute the tangents. Let (p_i, p_j) be the upper tangent between CH(P_i) and CH(P_j), with p_i in P_i and p_j in P_j. The slope of this tangent lies between the slopes of (p_{i-1}, p_i) and (p_i, p_{i+1}), and also between the slopes of (p_{j-1}, p_j) and (p_j, p_{j+1}).
This property is used to obtain the upper (lower) tangents between all pairs of convex polygons computed recursively during the second step of the algorithm. We now describe how this is done for the upper tangents.
Step 1. Let the upper hull of P_i contain n_i points. Of these, n^{1/4} are marked, breaking the upper hull into convex chains of equal length. For each three consecutive marked points p_{k-1}, p_k, p_{k+1}, the processor containing p_k, say processor j, creates two slope records: [slope of straight line through (p_k, p_{k+1}), p_k, p_{k+1}, j], and [slope of straight line through (p_{k-1}, p_k), p_{k-1}, p_k, j].
Step 2. Every CH(P_i) sends its O(n^{1/4}) slope records to every other CH(P_j).
Step 3. Every CH(P_j) creates two slope records for each of its points [i.e., O(n^{3/4}) records in all]. These records are merged with the O(n^{1/2}) records received from the other CH(P_i).
Step 4. Through a parallel prefix (postfix) operation, each processor, upon receiving a record with slope s, can determine the largest (smallest) slope of a CH(P_j) edge that is smaller (larger) than s. Thus each processor in CH(P_j) that contains a received record representing a point p of CH(P_i) can determine (in constant time) the endpoints of the upper tangent to CH(P_j) passing through p, and whether p is on, to the left, or to the right of the upper common tangent between CH(P_i) and CH(P_j).
Step 5. Each CH(P_j) returns to CH(P_i) the records it received from the latter, appended with the information gathered in step 4. This allows CH(P_i) to determine, for each CH(P_j), the points (between consecutive marked points) that contain an endpoint of an upper common tangent between CH(P_i) and CH(P_j).
Step 6. Thus far, the search within CH(P_i) for the endpoint of the upper tangent to CH(P_j) has been narrowed down to O(n^{1/2}) points. Steps 1 to 5 are now repeated two more times: In the first, O(n^{1/4}) equally spaced points, among the O(n^{1/2}), are sent from CH(P_i) to CH(P_j), thus reducing the number of candidates further to O(n^{1/4});
finally, all O(n^{1/4}) leftover points are sent to CH(P_j), leading to the determination of the upper tangent endpoint. Computing the tangents therefore requires O(log n) time. Since all other steps are identical to the ones in the CREW PRAM algorithm, this algorithm runs in time t(n) = t(n^{3/4}) + c log n, for some constant c, which is O(log n).
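The O(log n) solution of this recurrence can be checked by unrolling it. The short derivation below is a sketch that assumes the recursion stops at some constant problem size n_0 with t(n_0) = O(1); that base case is an assumption made here for illustration. After i levels the subproblem size is n^{(3/4)^i}, so the i-th level contributes c (3/4)^i log n.

```latex
\begin{align*}
t(n) &= t\bigl(n^{3/4}\bigr) + c\log n \\
     &= t\bigl(n^{(3/4)^2}\bigr) + c\,\tfrac{3}{4}\log n + c\log n \\
     &\le t(n_0) + c\log n \sum_{i \ge 0} \bigl(\tfrac{3}{4}\bigr)^{i} \\
     &= O(1) + 4c\log n \;=\; O(\log n).
\end{align*}
```

The same argument applied to the CREW PRAM recurrence t(n) = t(n^{1/2}) + b log n gives a geometric series with ratio 1/2, hence t(n) ≤ O(1) + 2b log n.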
3.2 Network Model Algorithms
Several parallel algorithms for the convex hull problem have been developed for network models. Among the first algorithms are two results by Chow presented in [Chow81] for the cube-connected cycles (CCC) network model. The first algorithm runs in O(log^2 n) time on a CCC with O(n) processors. It uses the divide-and-conquer technique, which splits the set P into two sets P_1 and P_2, recursively solves the problem on P_1 and P_2, and then merges the two hulls to find CH(P). The second algorithm in [Chow81] is also a divide-and-conquer algorithm; however, P is split into n^{1-1/K} subsets of n^{1/K} points each, where 1 ≤ K ≤ log n. The problem is solved on each of the n^{1-1/K} subsets simultaneously and the hulls are merged into one. This algorithm runs in O(K log n) time on a CCC with O(n^{1+1/K}) processors.
Little work on convex hull solutions for the CCC model has since been reported; however, a number of results exist for the hypercube model of computation. Stojmenović presents two parallel convex hull algorithms for a hypercube with O(n) processors that both run in O(log^2 n) time [Stoj88a]. In both algorithms the input data are distributed one point per processor, and the points are sorted by x-coordinate in O(log^2 n) time [Akl85b]. The first algorithm is an adaptation for the hypercube of the CREW PRAM multiway divide-and-conquer algorithm of [Atal86a] and [Agga88]. The second algorithm is similar in spirit to the first CCC algorithm of [Chow81]: The set P is divided into two disjoint sets P_1 and P_2, each with approximately n/2 points and stored in a hypercube of O(n/2) processors; CH(P_1) and CH(P_2) are computed recursively, and CH(P) is formed by constructing the two common tangents between CH(P_1) and CH(P_2). We describe in some detail how this last step is performed.
For each edge e of CH(P_1) [similarly, for each edge of CH(P_2)] it is possible to decide if e is an edge of CH(P) by applying Property 2; namely, edge e is in CH(P) if CH(P_1) and CH(P_2) are in the same half-plane defined by the infinite line through e. We describe how this test is done for edges in CH(P_1). For edges in CH(P_2), the test is symmetric. Rather than testing all the vertices of CH(P_2) with an edge e of CH(P_1), it suffices to test two points: the nearest and farthest points of CH(P_2) to e. Assuming that these two points are known to the processor that stores e, that processor can determine if e belongs to CH(P) in constant time. Consequently, every processor containing a point of CH(P_1) or CH(P_2) can determine if that point is a vertex of CH(P). Among the vertices of CH(P) thus determined, exactly two from CH(P_1) have two adjacent edges such that one edge is on CH(P) and the other is not. Similarly, exactly two vertices from CH(P_2) have the property of being adjacent to two edges, one on CH(P) and the other
[Figure 3.3 Mesh of size 16 in proximity order. Processor indices by row:
 0  1 14 15
 3  2 13 12
 4  7  8 11
 5  6  9 10]
not on CH(P). These four points define the upper and lower tangents of CH(P_1) and CH(P_2). It remains to be shown how, for each edge e of CH(P_1), the points p_i and p_j of CH(P_2) that are nearest to and farthest from e, respectively, can be found. The following property is used: p_i belongs to an edge e_i of CH(P_2) such that |slope(e) - slope(e_i)| is minimized; similarly, p_j belongs to an edge e_j of CH(P_2) such that |slope(e) + π - slope(e_j)| is minimized. Therefore, by merging the slopes of edges in CH(P_1) and CH(P_2), the nearest and farthest points in CH(P_2) to each edge in CH(P_1) can be found. Since merging two lists of size n/2 each on an O(n)-processor hypercube can be done in O(log n) time, and since all other operations described require constant time, the merge step runs in O(log n) time. Because CH(P_1) and CH(P_2) are computed simultaneously on disjoint subcubes, the running time of the algorithm is t(n) = t(n/2) + b log n for some constant b, which is O(log^2 n).
Another algorithm for computing the convex hull of n planar points on an O(n)-processor hypercube is given by Miller and Stout in [Mill88]. There are two main steps in the algorithm: The first sorts the set of points in O(log^2 n) time; the second is a divide-and-conquer computation requiring O(log n) time. Therefore, the time to sort n points on a hypercube of size n dominates the performance of this convex hull algorithm. Recently, a faster algorithm to sort on the hypercube has been developed which runs in O(log n log log n) time. This sorting algorithm is presented in [Leig91] and is based on ideas first proposed in [Cyph90]. Therefore, the running time of the convex hull algorithm by Miller and Stout [Mill88] can be reduced to O(log n log log n).
Miller and Stout also present an algorithm for computing the convex hull of a set of n points in the plane that runs in O(n^{1/2}) time on a mesh of size n [Mill89b]. The indexing used for the mesh is proximity ordering, which combines the advantages of snakelike ordering and shuffle row-major ordering: Adjacent processors have consecutive processor numbers and processors are organized by quadrant, which is useful for algorithms that employ the divide-and-conquer strategy. Figure 3.3 shows a mesh of size 16 with processors ordered in proximity order. To simplify the description of the algorithm, it is assumed that there is no more than one point per processor. The points are sorted by x-coordinate and divided into four subsets, P_1, P_2, P_3,
[Figure 3.4 CH(P_1), CH(P_2), and the lines (a, p) and (p, b). All points in CH(P_2) lie below (a, p); some points in CH(P_2) lie above (p, b).]
and P_4, by three vertical separating lines. Each subset is mapped to a consecutive quadrant on the mesh. Let the quadrant A_i contain the set P_i of points, i = 1, 2, 3, 4. The convex hull is found recursively for the points in each quadrant and the resulting four convex hulls are merged in three steps. CH(P_1) and CH(P_2) are merged to form CH(P_1 ∪ P_2) = CH(B_1), and CH(P_3) and CH(P_4) are merged to form CH(P_3 ∪ P_4) = CH(B_2). Finally, CH(B_1) and CH(B_2) are merged to form CH(B_1 ∪ B_2) = CH(P).
We now describe how merging two convex hulls is performed in O(n^{1/2}) time on a mesh of size n. We discuss merging CH(P_1) and CH(P_2) into CH(B_1). The other two merges are similar. Merging CH(P_1) and CH(P_2) into CH(B_1) requires finding the points p, t ∈ CH(P_1) and q, s ∈ CH(P_2) such that (p, q) is the upper common tangent to CH(P_1) and CH(P_2) and (s, t) is the lower common tangent. We describe how to find the point p ∈ CH(P_1). A similar approach is used to find q, s, and t. Note that the two hulls do not intersect because they are separated by a vertical separating line.
The coordinates of the point of CH(P_1) with the smallest x-coordinate, x_min, and the point with the largest x-coordinate, x_max, and their positions in counterclockwise order around CH(P_1) are reported to all processors of quadrant A_1 by two semigroup operations. A semigroup computation applies an operation such as minimum to all data items in a given quadrant in O(r^{1/2}) time, where r is the maximum number of processors in a quadrant, and broadcasts the resulting value to all processors in the quadrant. Note that p must lie on or above the line (x_min, x_max). Let a and b be the points immediately succeeding and preceding p, respectively, in the counterclockwise ordering of hull points. All the points in CH(P_2) must be below the line (a, p) and some of the points in CH(P_2) must be above the line (p, b). See Figure 3.4. Initially, p is chosen to be the point p_i on CH(P_1) halfway between points x_min
and x_max; p_i is identified and reported to all processors in quadrant A_1 by using a semigroup operation. Let a_i and b_i be the points immediately succeeding and preceding p_i, respectively. The two processors that contain a_i and b_i compute (a_i, p_i) and (p_i, b_i), respectively, and pass these values to all processors in quadrant A_2 [the quadrant storing the points of CH(P_2)] by performing a concurrent read operation. Concurrent read, sometimes called random access read, allows any number of processors to read a value stored at another processor. Concurrent read takes O(n^{1/2}) time on a mesh of size n. The processors in A_2 store a 1 (0) in a variable if they are below (above) (a_i, p_i) and store a 0 (1) in a different variable if they are above (below) (p_i, b_i). They then write both of these variables to the processor in A_1 that contains p_i using a concurrent write operation where conflicts are resolved by writing the minimum value in the variables. Concurrent write, sometimes called random access write, allows any number of processors to write to a location in a different processor. Concurrent write is executed in O(n^{1/2}) time on a mesh of size n. The processor that stores p_i determines:
1. If all points in CH(P_2) are below (a_i, p_i) (1 written by processors in A_2), or
2. If one or more points in CH(P_2) are above (p_i, b_i) (0 written by processors in A_2).
If both conditions are satisfied, the point p has been found. If the first condition is not satisfied, x_max is assigned to a_i and p_i is recomputed to be halfway between x_min and x_max. If the second condition is not satisfied, x_min is assigned to b_i and p_i is recomputed as above. The data are compressed to minimize communication cost and the merge algorithm is iterated. In data compression, m pieces of data distributed randomly on a mesh of size r such that r ≥ m are moved to a submesh of size m in O(r^{1/2}) time. In this binary search manner, O(log n) iterations of the algorithm are executed to find each of the four tangent points. Steps in the first iteration operate on approximately n/2 pieces of data, and because of the data compression operation, at the ith iteration, approximately n/2^i pieces of data are involved. Therefore, the number of steps over O(log n) iterations is
    \sum_{i=0}^{O(\log n)} (n/2^i)^{1/2} = O(n^{1/2}).
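The bound on this sum follows from a geometric series. As a brief worked check (a sketch that extends the finite sum to an infinite one, which can only increase it):

```latex
\begin{align*}
\sum_{i=0}^{O(\log n)} \left(\frac{n}{2^{i}}\right)^{1/2}
  &= n^{1/2} \sum_{i=0}^{O(\log n)} 2^{-i/2} \\
  &\le n^{1/2} \cdot \frac{1}{1 - 2^{-1/2}} \\
  &= \bigl(2 + \sqrt{2}\bigr)\, n^{1/2} \;=\; O\bigl(n^{1/2}\bigr).
\end{align*}
```

The first iteration therefore dominates: the later iterations together cost only a constant factor more than the first.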
To compute the points in CH(P_1 ∪ P_2) = CH(B_1), all processors concurrently read the number of points in CH(P_1), the number of points in CH(P_2), and the counterclockwise positions of p, t ∈ CH(P_1) and q, s ∈ CH(P_2). Each processor computes the position in CH(B_1) of its hull point (if it contains one). This final step in the merging takes O(n^{1/2}) time. The total time to merge is O(n^{1/2}), as shown above, and merging CH(P_1) with CH(P_2) can be done in parallel with merging CH(P_3) and CH(P_4). The following recurrence relation gives the total time of the convex hull algorithm on a mesh of size n: t(n) = t(n/4) + c n^{1/2}, for some constant c; therefore, the algorithm runs in O(n^{1/2}) time. In [Mill84a], Miller and Stout present a similar algorithm for a mesh with a
snakelike order indexing scheme. The running time is the same as for the proximity order algorithm. Both of these algorithms for the mesh are time optimal since the lower bound for sorting on a mesh of size n is Ω(n^{1/2}) [Akl85b].
In [Wang90a], parallel algorithms are sketched for sorting n elements in constant time and computing the convex hull of n points using a three-dimensional n × n × n mesh with reconfigurable buses. The sorting algorithm, which appeared originally in [Wang90b], achieves its constant running time by exploiting the constant time reconfigurability of the buses and the fact that transmission of a signal along a bus through O(n) processors takes constant time. In [Wang90a], a straightforward planar embedding of the n × n × n array is proposed. While the changes in the interconnection from that described in [Wang90b] are cosmetic (e.g., replacing a diagonal with a right angle), the number of processors remains the same, and the sorting algorithm is essentially unchanged. The convex hull algorithm described in [Wang90a] is also an immediate consequence of the result in [Wang90b].
Algorithms for computing the convex hull on a linear array, a d-dimensional mesh, and a hypercube are presented in [Hole90]. Three types of linear array are considered. One allows input at one end and output at the other such that data travel in one direction only, the second allows input at all processors but data movement in one direction only, and the third allows input and output at all processors and data movement in either direction. The convex hull algorithm runs on all three types of linear array in O(n) time with O(n) processors, on a d-dimensional mesh in O(d^2 n^{1/d}) time with O(n) processors, and on a hypercube in O(log^2 n) time with O(n) processors. Dynamic convex hull algorithms are also given in which deletions and insertions of points from and to the set are handled.
Finally, the following results regarding convex hull computation on processor networks deserve mention: O(n) time on an O(n)-processor linear array [Chaz84, Chen87]; O(log n) time on an O(n^2)-processor mesh-of-trees [Akl9b]; and O(log n) time on an O(n)-processor modified AKS network [Mill88]. In [Reif90] a randomized algorithm for determining the convex hull of a set of points in the plane on an O(n)-processor butterfly is given that runs in O(log n) probabilistic time. In Chapter 11 we describe an algorithm for computing the convex hull on the star and pancake networks.
3.3 Other Models
A number of models of computation that are less well known than the PRAM or processor network models have also been used to develop parallel convex hull algorithms. Three of these are particularly noteworthy. In [Blel88], two algorithms are given for the scan model. The first of these is based on the sequential algorithm of [Eddy77], dubbed Quickhull in [Prep85]. When the input points obey certain probability distributions, the algorithm runs on the scan model with O(n) processors in O(log h) expected time, where h is the number of points on the convex hull. In the worst case, however, the algorithm runs in O(n) time. The second algorithm in [Blel88] is an adaptation of
the multiway divide-and-conquer algorithm of [Atal86a] and [Agga88]: It also requires O(n) processors, but runs in O(log n) time in the worst case. An algorithm for the BSR model is described in [Akl89c] which uses the following property of convex hull points:
Property 3. Consider a point p. Construct all the rays from p to every other point in the set. These rays form a star centered at p. Measure the angles between each pair of adjacent rays. If the largest such angle is smaller than π, then p is not on the hull (and conversely).
The algorithm in [Akl89c] requires O(n^2) processors and runs in constant time. The details of this algorithm are provided in Chapter 11.
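Property 3 can be illustrated with a small sequential test. The sketch below is not the constant-time BSR algorithm of [Akl89c]; it only checks the property for one point by sorting the ray angles. The function name `on_hull_by_angular_gap` and the tolerance `eps` are assumptions introduced here, and a point lying on a hull edge (largest gap exactly π) is counted as being on the hull.

```python
import math


def on_hull_by_angular_gap(p, points, eps=1e-12):
    """Return True if p is on the convex hull of `points` by Property 3.

    p is on the hull exactly when the largest angle between adjacent
    rays from p to the other points is at least pi.
    """
    # Direction of the ray from p to every other point, in (-pi, pi].
    angles = sorted(
        math.atan2(q[1] - p[1], q[0] - p[0])
        for q in points
        if q != p
    )
    if len(angles) < 2:
        return True  # with fewer than two rays, p is trivially extreme
    # Gaps between consecutive rays, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    return max(gaps) >= math.pi - eps
```

Applied to the four corners of a square together with its center, the test accepts each corner and rejects the center, whose four rays are separated by gaps of π/2.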
Summary
The table in Figure 3.5 summarizes the results in the previous sections. Note that h is the number of hull edges, 0 < ε < 1, and 1 ≤ K ≤ log n.
3.4 When the Input is Sorted
Assume that the n points for which the convex hull is to be computed are already sorted (say, by their x-coordinates). This situation may be used advantageously by any algorithm that explicitly sorts its input, and whose processor and time requirements are dominated by those for sorting. For example, O(log^3 n/(log log n)^2) time algorithms are described in [Mill88] for computing the convex hull of n sorted points on a tree, a pyramid, and a mesh-of-trees, each with O(n) processors. Also given in [Mill88], as stated in Section 3.2, is an O(n)-processor hypercube algorithm that runs in O(log n) time if the input points are sorted and is identical to the one described in detail in Section 3.1 for the EREW PRAM. A CREW PRAM algorithm is presented in [Good87b] which computes the convex hull for sorted inputs in O(log n) time using O(n/log n) processors. An algorithm in [Fjal90] computes the convex hull of a set of sorted points in O(log n/log log n) time using O(n log log n/log n) processors on a COMMON CRCW PRAM.
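To see why presorted input removes the dominant cost, it may help to look at the sequential case. The sketch below is not one of the parallel algorithms cited above; it is the well-known stack-based scan (often called the monotone chain method), which finds the upper hull of x-sorted points in linear time. The names `cross` and `upper_hull_sorted` are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull_sorted(points):
    """Upper hull, left to right, of points already sorted by (x, y).

    Each point is pushed once and popped at most once, so the scan is
    linear in the number of points; no sorting is performed here.
    """
    hull = []
    for p in points:
        # Going left to right along the upper hull, every turn must be a
        # right turn; pop while the last two points and p fail that.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For the sorted input `[(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]`, the scan discards `(2, 1)` and keeps the other four points as the upper hull.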
3.5 Related Problems
3.5.1 Three-Dimensional Convex Hulls
There has been some work in developing parallel algorithms for the three-dimensional convex hull problem. Some of the earliest algorithms for computing the convex hull of a set of points in three dimensions are presented in [Chow80]. One of these runs on a CREW PRAM in O(log^3 n) time using O(n) processors and an O(log n) time parallel sorting algorithm. [Note that the running time in [Chow80] is actually given as O(log^3 n log log n) because, at the time, O(log n log log n) was the running time of the
[Figure 3.5: table of convex hull algorithms, with columns Reference, Model, Processors, and Running time.]
[...] 0, a point p in the xy-plane is transformed to the point p' by the inversion such that P_0 p' is in the same direction as P_0 p and |P_0 p'| = r^2 / |P_0 p|. If the inversion is applied twice, the original point results. The exterior of the sphere corresponds to one half-space bounded by the plane, and the interior of the sphere corresponds to the other half-space. Let S' be the set of inversion points of S. By property 1 of Voronoi diagrams, one way to find the Voronoi diagram of S is to test each set of three points, p_i, p_j, p_k ∈ S to determine if the circle through p_i, p_j, p_k contains any other point of S. This test corresponds to checking if the convex hull of S', CH(S'), and P_0 are in the same half-space bounded by the face of CH(S') that is defined by p_i', p_j', p_k'. If this test is successful, the center of the circle through p_i, p_j, p_k is a Voronoi point.
Algorithms are given in [Chow80] to compute the convex hull of a set of points in three dimensions (see Chapter 3) on a CCC model. The first runs in O(log^4 n) time and uses O(n) processors, and the second runs in O(K log^3 n) time and uses O(n^{1+1/K}) processors, 1 ≤ K ≤ log n. These algorithms and the inversion method described are used in the design of two algorithms that compute the Voronoi diagram of a set of points in the plane on a CCC model within the same time and processor bounds.
Mi Lu presents a Voronoi diagram algorithm also based on Brown's method that runs in O(n^{1/2} log n) time on an O(n)-processor mesh [Lu86a]. An algorithm for computing the convex hull of a set of points on a sphere is used in the Voronoi diagram algorithm. It also runs on an O(n)-processor mesh in O(n^{1/2} log n) time.
A time optimal algorithm is given in [Jeon90] that runs in O(n^{1/2}) time on an n^{1/2} × n^{1/2} mesh. The algorithm is based on the divide-and-conquer approach used in [Sham75]. The set of points is sorted by x-coordinate and divided in half into two
102
Voronoi Diagrams
Chap. 8
sets L and R by a vertical separating line l such that points in L are to the left of l and points in R are to the right of l. Sorting takes O(n^{1/2}) time on a mesh of size n [Akl85b]. Recursively, the Voronoi diagrams Vor(L) and Vor(R) are computed for the sets L and R, respectively. The two diagrams are then merged, resulting in Vor(L ∪ R). The merge step finds C, the collection of edges in Vor(L ∪ R) that are shared by polygons of points in L and polygons of points in R. This dividing chain C is monotone with respect to the y-axis, and all points to the left of C are closer to a point in L than to any point in R. Similarly, all points to the right of C are closer to a point in R than to any point in L.
The merge step works by identifying those Voronoi edges in Vor(L) and Vor(R) that are intersected by C. Planar point location is used to determine which Voronoi vertices of Vor(L) [respectively, Vor(R)] are closer to R [respectively, L], and the Voronoi edges of Vor(L) and Vor(R) are subdivided into groups depending on whether one, both, or none of their endpoints are closer to L or to R (special action is taken for unbounded Voronoi edges). It is shown how to determine, from this information, which edges intersect C. Let B_l be the set of edges of Vor(L) that intersect C, and let B_r be the set of edges of Vor(R) that intersect C. Therefore, B = B_l ∪ B_r is the set of edges in both Vor(L) and Vor(R) that intersect C. The edges in B are sorted according to the order in which they intersect C. The chain C is directed from bottom to top and the edges in B are directed from the endpoint closer to L to the endpoint closer to R. Two edges e_i, e_j ∈ B are y-disjoint if the minimum y-value of e_i is no less than the maximum y-value of e_j. The minimum (maximum) y-value of an edge is the minimum (maximum) y-value of its two endpoints. If two edges are y-disjoint, the order in which they cross C is easily determined since C is monotone with respect to the y-axis.
If two edges are not y-disjoint, three cases are considered to determine the order in which they cross C. To find the actual edges of C, the points p_i and p_j, the bisector of which defines an edge of C, are found using a precede operation. The precede operation finds, for each edge e_l ∈ B_l, the greatest edge in B_r (sorted by the order in which they intersect C) that is less than e_l. The precede operation takes O(n^{1/2}) time [Jeon90]. Finally, the edges of Vor(L ∪ R) together with their vertices and the bisector points that define them are distributed so that each processor contains a constant number of Voronoi edges. Merging the two Voronoi diagrams takes O(n^{1/2}) time on a mesh. Because Vor(L) and Vor(R) are computed simultaneously on disjoint halves of the mesh, the total time t(n) for the algorithm is t(n) = t(n/2) + O(n^{1/2}), which is O(n^{1/2}). Several improvements to the algorithm in [Jeon90] are described in [Jeon91a].
An algorithm in [Stoj88a] that computes the Voronoi diagram of a set of n points in the plane in O(log^3 n) time on an O(n)-processor hypercube can be obtained by using the algorithm in [Jeon90] and a planar point location algorithm.
Two algorithms are presented in [Saxe90] (see also [Saxe91]). The first finds the Delaunay triangulation of a set of n points in the plane in O(log^2 n) time on a mesh-of-trees of size n^2. The second algorithm constructs the Delaunay triangulation for a set of points in three dimensions and runs in O(m^{1/2} log n) time on an n × n mesh-of-trees, where m is the number of tetrahedra in the triangulation. The algorithm for a set of points in the plane is based on the fact that if (p_i, p_j) is a Delaunay edge, and if p_k is
the point such that cos Z PiPk Pi is a minimum among all points in S on the same side of the line through (pi,Pj) on which Pk lies, then APiPjPk is a Delaunay triangle. Let pI, P2,. . - P, be the set of points S. Each processor in row i is loaded with the coordinates of the point pi, and each processor in column j is loaded with the coordinates of the point pj, such that the processor in position (ij) in the mesh-of-trees contains two points, pi and pj. Each processor computes the square of the distance between its two points (i #Fj), and the minimum function is computed from leaf processors to the root in each column in 0(logn) time. The resulting edge that defines a closest point to the point pj is stored in each processor in column j. This edge is a Delaunay edge [Prep85]. An 0(logn) time compacting procedure is used to remove duplicate edges. Each processor containing edge (pi, Pj) computes cos Z Pi Pk P1 for k 0 i, j, and Pk to one side of the line through (PiPj) and cos Z PiPiPj for I $ i, j, and pi to the other side of the line through (pi,pj). The minimum "cos" on each side of the line through (pi,pj) is found by passing values to the column root in O(logn) time. The four new edges (PiPk), (PjPk)P (ipI), and (pjpi) are stored in processors in the
same column. Now, for each newly created edge $(p_i, p_j)$, a point $p_k$ is found that forms a triangle of the triangulation with that edge, as was done in the previous step. By doing this, two new edges are created for each existing edge. This last step is iterated $O(\log n)$ times since, at each iteration, the number of edges remaining to be examined decreases by half. As new edges are added, a compaction algorithm is executed to remove duplicate edges and to place the edges in the mesh-of-trees in a form suitable for the next iteration. Thus the total time taken is $O(\log^2 n)$. A similar algorithm is given for the case when the points are in three dimensions.

An algorithm is given in [Jeon91b] that computes the Voronoi diagram of n points in the plane on an n-processor mesh in $O(n^{1/2})$ time, where distances are measured using the $L_1$-metric. Finally, it is shown in [Schw89] how the discrete Voronoi diagram of an $n \times n$ digitized image can be obtained under the $L_1$-metric in $O(\log n)$ time on a mesh-of-trees with $O(n^2)$ processors.
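The two geometric facts used by the mesh-of-trees algorithm above (a closest-point edge is a Delaunay edge, and the point minimizing the cosine on one side of a Delaunay edge completes a Delaunay triangle) can be illustrated with a short sequential sketch. The function names below are hypothetical, the scans that the mesh-of-trees performs with column minima are written as plain loops, and points collinear with the edge are simply skipped.

```python
import math

def side(p, q, r):
    """Cross product (q - p) x (r - p): positive if r lies to the left of line pq."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def cos_angle(pi, pk, pj):
    """Cosine of the angle at pk subtended by the segment (pi, pj)."""
    ax, ay = pi[0] - pk[0], pi[1] - pk[1]
    bx, by = pj[0] - pk[0], pj[1] - pk[1]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def closest_point_edge(points, j):
    """Index i != j minimizing the squared distance to points[j]."""
    others = (i for i in range(len(points)) if i != j)
    return min(others, key=lambda i: (points[i][0] - points[j][0]) ** 2
                                     + (points[i][1] - points[j][1]) ** 2)

def third_vertices(points, i, j):
    """For a Delaunay edge (i, j), return (k, l): the point with minimum cosine
    on the left and on the right of the line through the edge (None if a side is empty)."""
    best = {1: None, -1: None}
    for k, pk in enumerate(points):
        if k in (i, j):
            continue
        s = side(points[i], points[j], pk)
        if s == 0:
            continue  # assumption: collinear points are ignored in this sketch
        sign = 1 if s > 0 else -1
        c = cos_angle(points[i], pk, points[j])
        if best[sign] is None or c < best[sign][0]:
            best[sign] = (c, k)
    return tuple(b[1] if b else None for b in (best[1], best[-1]))

points = [(0, 0), (2, 0), (1, 1), (1, 5), (1, -3)]
nearest_to_p0 = closest_point_edge(points, 0)
left_k, right_l = third_vertices(points, 0, 1)
```

Here the edge between `points[0]` and `points[1]` is completed by `points[2]` on one side and `points[4]` on the other, which yields the four new edges described in the text.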
8.2 PRAM Algorithms for Voronoi Diagrams

Several algorithms exist for computing the Voronoi diagram of n planar points on the PRAM model of computation. Chow gives an algorithm that uses inversion and computes the convex hull of a set of points in three dimensions. It runs in $O(\log^3 n)$ time on a CREW PRAM with $O(n)$ processors [Chow80]. A time-optimal algorithm is described in [Prei88] in which each point computes its own Voronoi polygon by determining its neighbors. The algorithm runs in $O(\log n)$ time using $O(n^3)$ processors on a CREW PRAM. The authors show how their algorithm can run on a (SMALLEST, LARGEST) CRCW PRAM in $O(1)$ time with $O(n^4)$ processors.
Algorithms for computing the Voronoi diagram for a set of points under the $L_1$-metric are given in [Wee90] and [Guha90] for the CREW PRAM model. The first runs in $O(\log n)$ time and uses $O(n)$ processors, and the second runs in $O(\log^2 n)$ time and uses $O(n/\log n)$ processors. Both algorithms are cost optimal in view of the $\Omega(n \log n)$ sequential lower bound for this problem [Prep85].

In [Agga88], an algorithm is given that uses divide-and-conquer and runs in $O(\log^2 n)$ time on an $O(n)$-processor CREW PRAM. An algorithm is given in [Evan89] for computing the Voronoi diagram of points in the plane that runs in $O(\log^3 n)$ time and uses $O(n)$ processors on a CREW PRAM. It is also shown how the algorithm runs in $O(\log^2 n)$ time on a CRCW PRAM with the SMALLEST write conflict resolution rule. The algorithm uses the divide-and-conquer technique of Shamos [Sham75] and is similar to that in [Jeon90]. List ranking is used to compute the edges of C instead of the precede function. Given a linked list L of n elements represented as an array of pointers, list ranking computes, for each element $e \in L$, its distance from the tail of the list, that is, the number of elements in L that follow e. List ranking can be performed in $O(\log n)$ time on an EREW PRAM of size $O(n/\log n)$ [Cole88c]. The authors point out that by using the optimal point location algorithm of Jeong and Lee [Jeon90] (see Chapter 5), their algorithm runs in $O(n^{1/2})$ time on a mesh of size n.

In [Levc88], a parallel algorithm is given for computing the Voronoi diagram of a planar point set within a square window W. It runs in $O(\log n)$ average time on a (PRIORITY) CRCW PRAM with $O(n/\log n)$ processors when the points are drawn independently from a uniform distribution. The algorithm uses multilevel bucketing. The square window W is divided into equal-size cells in $(\log n)/2 + 1$ ways, creating $(\log n)/2 + 1$ grids. Each grid $G_l$, $l = 0, 1, \ldots, (\log n)/2$, partitions W into $2^{\log n - 2l}$ equal-size squares.
The grid with the most squares has $2^{\log n}$ squares, and the grid with the fewest squares is W itself. The points of S are sorted into these buckets in parallel. This is done in $O(\log n)$ expected time using $O(n/\log n)$ processors on a (PRIORITY) CRCW PRAM by converting a randomized algorithm due to Reif [Reif85] that runs in $O(\log n)$ probabilistic time on a CRCW PRAM of size $n/\log n$. This is the only step of the Voronoi diagram algorithm that requires concurrent writing. It is shown how several Voronoi polygons of points can be computed in parallel by first computing, for a point in S, its rough rectangle of influence. Given a point p in a square $C_l(i, j)$ of grid l, define $R_l(p)$ as the region around $C_l(i, j)$ from $C_l(i - 3, j - 3)$ to $C_l(i + 3, j + 3)$. If $R_l(p)$ extends outside W, only the part of it inside W is considered. The rough rectangle of influence $RI(p)$ for a point p is the rectangle $R_l(p)$ such that every square in $R_l(p)$ contains at least one point of S, and for each $R_k(p)$, $0 < k$ [...]

[...] $2\pi$ and $(y - 2\pi) > x_i$ are handled by merging the list of counterclockwise endpoints with $y_j - 2\pi$, $j = 1, 2, \ldots, n$, and finding new successors. To check whether no cover exists, it suffices to test whether $SUCC(A_i) = A_i$ for some arc $A_i$. If a cover exists, it is found using the following idea from [Lee84a]: If the size $m_1$ of the minimum circle cover starting from an arbitrary arc is known, the size of the minimum circle cover for the set is either $m_1$ or $m_1 - 1$. In other words, the greedy algorithm starting at an arbitrary arc always produces a cover whose size is at most one more than the size of the minimum circle cover. Once an arbitrary arc has been chosen, $O(n)$ processors cooperate to find $m_1$ in $O(\log n)$ time. Each processor j starts at arc $A_j$ and doubles the number of successor arcs in the cover at each step in parallel until the x endpoint of the arbitrary arc is included in the cover. Processor 1 then finds $m_1$ by summing the number of arcs included by each processor.
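The successor function and the greedy rule from [Lee84a] can be made concrete with a small sequential sketch. Everything about the representation is an assumption made for illustration: arcs are given as `(start, end)` pairs in integer degrees with `0 <= start < 360` and `start < end <= start + 360`, the function names are hypothetical, and the minimum is found by brute force over all starting arcs rather than by the parallel doubling described above.

```python
FULL = 360  # assumption: integer degrees keep the arithmetic exact

def successors(arcs):
    """SUCC(i) is the arc that starts inside arc i and reaches farthest past
    its end; SUCC(i) == i means that no arc extends arc i."""
    succ = []
    for i, (s_i, e_i) in enumerate(arcs):
        best, best_end = i, e_i
        for k, (s_k, e_k) in enumerate(arcs):
            start = s_i + (s_k - s_i) % FULL   # start of arc k, unrolled past s_i
            end = start + (e_k - s_k)
            if start <= e_i and end > best_end:
                best, best_end = k, end
        succ.append(best)
    return succ

def greedy_cover(arcs, succ, first):
    """Follow successors from arc `first` until the whole circle is covered."""
    cover = [first]
    origin, reach = arcs[first]
    start, cur = origin, first
    while reach - origin < FULL:
        nxt = succ[cur]
        if nxt == cur:
            return None                        # an arc is its own successor
        s, e = arcs[nxt]
        start += (s - arcs[cur][0]) % FULL     # unrolled start of the next arc
        reach = start + (e - s)
        cover.append(nxt)
        cur = nxt
    return cover

def min_circle_cover(arcs):
    """Brute force: run the greedy rule from every arc and keep the smallest cover."""
    succ = successors(arcs)
    covers = [greedy_cover(arcs, succ, i) for i in range(len(arcs))]
    covers = [c for c in covers if c is not None]
    return min(covers, key=len) if covers else None

arcs = [(0, 130), (120, 250), (240, 370), (100, 200), (350, 380)]
succ = successors(arcs)
best = min_circle_cover(arcs)
from_arc_3 = greedy_cover(arcs, succ, 3)
```

In this instance the greedy cover started at arc 3 uses four arcs while the minimum uses three, which matches the statement that an arbitrary start is at most one arc worse than the optimum.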
After $m_1$ has been determined, each processor j checks in $O(\log n)$ time whether there is a cover of size $m_1 - 1$ starting at $A_j$.

A slightly different approach is presented in [Boxe89c], but the same running time with the same number of processors is achieved. The 2n endpoints are stored in records with four fields such that the initial values are $x_i$ or $y_i$, $y_i$, i, and $\infty$ in fields 1, 2, 3, and 4, respectively, for record i. The records are sorted on their first field in $O(\log n)$ time using $O(n)$ processors on a CREW PRAM [Cole88b]. To find $SUCC(A_i)$ for each arc $A_i$, a parallel prefix "max" operation is used on the second field of the records. This operation takes $O(\log n)$ time with $O(n)$ processors [Krus85]. For the $y_i$ records, this operation gives the maximum $y_k$ such that $x_k \le y_i < y_k$, and the index k is stored in the fourth field of each record. Since the arcs can cross R, an additional 2n records with $2\pi$ subtracted from the $y_i$'s are created and the steps above are repeated on these records. The 4n records resulting from the union of the two sets of records are then sorted by the third field, the index of the endpoint. Each processor i examines the four records at sorted positions $4i$, $4i + 1$, $4i + 2$, and $4i + 3$ in constant time in parallel to find the index of $SUCC(A_i)$. If there is no cover, there is an arc that is its own successor, which can be determined in $O(\log n)$ time with $O(n)$ processors. If there is a solution, then for all arcs $A_i$ that include the origin R, the minimum number of arcs, $count_i$, required for
$A_i$ to wrap around on itself is found. A linked list of the successor arcs is a partially ordered list of n elements represented by an array. A modified list ranking procedure is used to find $count_i$ for each arc $A_i$ that includes R in $O(\log n)$ time. The minimum $count_i$ is then computed in $O(\log n)$ time to find the minimum circle cover.

Finally, an algorithm is given in [Atal89b] that computes a minimum circle cover in $O(\log n)$ time using $O(n)$ processors on an EREW PRAM. If the endpoints of the arcs are sorted, only $O(n/\log n)$ processors are needed. This algorithm is cost-optimal in view of the $\Omega(n \log n)$ and $\Omega(n)$ lower bounds for unsorted and sorted endpoints, respectively, given in [Lee84a]. For unsorted endpoints, sorting is performed first in $O(\log n)$ time using $O(n)$ processors [Cole88b]. The endpoints are labeled such that for indices i and j, $i < j$ means that $x_i$ is before $x_j$ in a clockwise walk around the circle. This relabeling can be done using parallel prefix in $O(\log n)$ time with $O(n/\log n)$ processors on an EREW PRAM [Krus85]. In [Atal89b], in addition to $SUCC(A_i)$ for each arc $A_i$, the inverse function $SUCC^{-1}(A_i)$ is defined: $SUCC^{-1}(A_i) = \{A_j \in S \mid SUCC(A_j) = A_i\}$. Note that $|SUCC^{-1}(A_i)|$ can be greater than 1. The first two steps of the algorithm eliminate arcs properly contained in other arcs and compute SUCC and $SUCC^{-1}$ for each arc in S. It is shown that eliminating contained arcs can be accomplished in $O(\log n)$ time using $O(n/\log n)$ processors by means of parallel prefix. A method similar to that in [Boxe89c] is used for computing $SUCC(A_i)$ for each arc $A_i$. A test [requiring $O(\log n)$ time and $O(n/\log n)$ processors] is then made to see if there is no solution, that is, if for some $A_i$, $SUCC(A_i) = A_i$. It is shown how to compute $SUCC^{-1}(A_i)$ by first proving that for every arc $A_i$, the arc(s) in $SUCC^{-1}(A_i)$ occur consecutively around the circle C.
Therefore, the arcs can be "marked" with the indices j for which $SUCC(A_j) \neq SUCC(A_{j+1})$, and $SUCC^{-1}(A_i)$ can be computed for each $A_i \in S$ in $O(\log n)$ time using $O(n/\log n)$ processors. With the successor function and its inverse computed for each arc and properly contained arcs removed, the minimum circle cover is computed by using a parallel version of the greedy algorithm given in [Lee84a], as was done in [Bert88]. Recall that W is the set of arcs that contain the origin R. A new copy of $A_i \in W$, $New(A_i)$, is created and the successor function is modified so that every $SUCC(A_j) = A_i$ such that $A_i \in W$ is changed to $SUCC(A_j) = New(A_i)$, and every $SUCC(New(A_i)) = \emptyset$. The result is a forest of $|W|$ trees such that the roots of the trees are the elements of $New(W)$ and the children of a node $A_j$ in a tree T are the arcs in $SUCC^{-1}(A_j)$. The arcs $A_i \in W$ are among the leaves of the trees. Since the inverse of the successor function is available, the trees can be computed in $O(\log n)$ time with $O(n/\log n)$ processors using the Euler tour technique [Tarj85] and list ranking. The trees are then used to find a minimum circle cover by finding the minimum depth of each leaf $A_k$ such that $A_k \in W$.
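List ranking, which the circle cover algorithms above rely on, can be sketched by simulating pointer jumping. This is the simple variant with one (simulated) processor per element and $O(\log n)$ synchronous rounds; it is not the processor-optimal EREW algorithm cited in the text, and the function name and the use of `None` for the tail are assumptions made for illustration.

```python
def list_rank(succ):
    """succ[i] is the index of the element that follows i, or None for the tail.
    Returns (rank, rounds), where rank[i] is the number of elements after i."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if s is None else 1 for s in succ]
    rounds = 0
    while any(s is not None for s in nxt):
        # One synchronous step: every element reads the old arrays, then all write.
        new_rank = [rank[i] + (rank[nxt[i]] if nxt[i] is not None else 0)
                    for i in range(n)]
        new_nxt = [nxt[nxt[i]] if nxt[i] is not None else None
                   for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds

# The list 3 -> 0 -> 4 -> 1 -> 2 stored as an array of successor pointers.
rank, rounds = list_rank([4, 2, None, 0, 1])
```

Each round doubles the distance spanned by every pointer, so a list of five elements is ranked in three rounds.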
Summary

The table in Figure 9.2 summarizes the results in this section. Note that $q - 1$ is the minimum number of arcs crossing any point of the circle.

Problem               Reference    Model        Processors                Running time
Minimum cardinality   [Bert88]     CREW PRAM    $O(n^2/\log n + qn)$      $O(\log n)$
                      [Sark89a]    CREW PRAM    $O(n)$                    $O(\log n)$
                      [Boxe89c]    CREW PRAM    $O(n)$                    $O(\log n)$
                      [Atal89b]    EREW PRAM    $O(n)$                    $O(\log n)$
Minimum weight        [Bert88]     CREW PRAM    $O(n^3/\log n)$           $O(\log^2 n)$

Figure 9.2 Performance comparison of parallel minimum circle cover algorithms.
Figure 9.3 Set of n = 16 points and EMST of set.
9.2 Euclidean Minimum Spanning Tree

Given a set S of n points in the plane, a Euclidean spanning tree of S is a tree linking the points of S with rectilinear (straight-line) edges. A Euclidean minimum spanning tree (EMST) of S is one for which the total (Euclidean) length of the edges is smallest among all such trees. The lower bound for computing the EMST of a set of points on a single processor is $\Omega(n \log n)$ [Prep85]. Figure 9.3 shows a set of n = 16 points and the EMST of the set.

It is shown in [Mill89b] how the EMST can be computed in parallel using the algorithm of Sollin described in [Good77] on a mesh with $O(n)$ processors. Initially, each point is viewed as a connected component. At each iteration, every component is connected by an edge to its nearest (component) neighbor, thus forming a new component for the next iteration. Since each iteration reduces the number of components by at least a factor of 2, the EMST is found after at most $\log n$ iterations. The
algorithm uses the procedure for finding all nearest neighbors (ANN) on a mesh (see Section 7.1) and runs in $O(n^{1/2} \log n)$ time. It operates on the implicit complete graph connecting the points of S. By starting with a sparser graph guaranteed to contain the EMST, such as the Delaunay triangulation, more efficient algorithms may be obtained. Indeed, it is pointed out in [Mill89b] that an $O(n^{1/2})$ time (and hence optimal) algorithm for the mesh can be obtained based on an algorithm for computing the Voronoi diagram [Jeon90]. Another EMST algorithm that runs on the mesh and is based on first finding the Voronoi diagram for S appears in [Lu86a]. The running time of this algorithm is $O(n^{1/2} \log n)$ on an $O(n)$-processor mesh. (Voronoi diagram construction is discussed in Chapter 8.)

We note in passing that the EMST is only a special case of the minimum spanning tree (MST) problem defined for general connected and weighted graphs. Many algorithms for computing the MST in parallel exist [Akl89a], and these are of course applicable to the EMST problem. However, the algorithms discussed in the preceding paragraph were singled out because they exploit the geometric properties of the EMST.
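The component-merging scheme of Sollin's algorithm can be sketched sequentially as follows. This is only an illustration on the implicit complete graph: the function name is hypothetical, the nearest-neighbor search of each round is a brute-force double loop rather than the mesh procedure, and ties between equal-length edges are broken by the index pair so that no cycle can form.

```python
import math

def emst_sollin(points):
    """Return (edges, rounds) for the EMST of `points` using component merging."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges, components, rounds = [], n, 0
    while components > 1:
        rounds += 1
        cheapest = {}  # component root -> (length, i, j) of its shortest outgoing edge
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                cand = (math.dist(points[i], points[j]), i, j)
                for r in (ri, rj):
                    if r not in cheapest or cand < cheapest[r]:
                        cheapest[r] = cand
        # Every component is joined to its nearest neighbouring component.
        for _, i, j in sorted(set(cheapest.values())):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j))
                components -= 1
    return edges, rounds

pts = [(0, 0), (1, 0), (3, 0), (3, 1)]
tree, rounds = emst_sollin(pts)
length = sum(math.dist(pts[i], pts[j]) for i, j in tree)
```

With four points the tree is complete after two rounds, in line with the bound of at most $\log n$ iterations.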
9.3 Shortest Path

In addition to the minimum spanning tree problem, other graph theoretic problems have been studied in the special case of a geometric setting. One such problem, which we study in this section, is that of computing the shortest path between two points. A third problem, computing perfect matchings, is discussed in the following section.

Given a simple polygon P with n vertices and two points s and d in P, the interior shortest path (ISP) problem asks for the shortest path from s to d that lies completely inside P. Figure 9.4 shows a polygon P, two points s and d inside P, and the shortest path from s to d. This problem is solved in [ElGi86b] on an $O(n)$-processor CREW PRAM in $O(\log^2 n)$ time, provided that P is monotone. This result is strengthened in [ElGi88], where it is shown how the same model can solve the ISP problem in $O(\log n)$ time for arbitrary simple polygons. The algorithm consists of two major steps, each requiring $O(\log n)$ time and $O(n)$ processors. A triangulation $TP$ of P is constructed in the first step using the parallel algorithm of [Good89a], then its dual $TP'$ (a tree) is obtained, where each edge connects two nodes whose corresponding triangles in $TP$ share an edge. Denote by $s'$ ($d'$) the node in $TP'$ corresponding to the triangle containing s (d). The algorithm of [Tarj85] is then applied to obtain a simple path from $s'$ to $d'$, which corresponds to a simple polygon S contained in P and is called a triangulated sleeve. The edges of this triangulated sleeve are arranged in order of increasing distance from s. In the second step, a divide-and-conquer approach is used to compute a shortest path from s to d in the triangulated sleeve.

An algorithm is also given in [ElGi88] for computing the shortest paths from a point inside an n-vertex simple polygon P to the vertices of P. It runs in $O(\log^2 n)$ time on an $O(n)$-processor CREW PRAM. This running time is improved to $O(\log n)$ in [Good90a].
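The first step of the ISP algorithm, extracting the triangulated sleeve from the dual tree, can be illustrated sequentially. In this sketch the triangulation is assumed to be given as triples of vertex indices, the triangles containing s and d are assumed to be known already, and a breadth-first search stands in for the parallel tree technique of [Tarj85]; all names are hypothetical.

```python
from collections import deque

def dual_tree(triangles):
    """Adjacency lists of the dual: two triangles are adjacent if they share an edge."""
    owner = {}
    adj = {t: [] for t in range(len(triangles))}
    for t, (a, b, c) in enumerate(triangles):
        for edge in (frozenset((a, b)), frozenset((b, c)), frozenset((c, a))):
            if edge in owner:
                adj[t].append(owner[edge])
                adj[owner[edge]].append(t)
            else:
                owner[edge] = t
    return adj

def sleeve(triangles, t_s, t_d):
    """Triangles on the unique dual-tree path from triangle t_s to triangle t_d."""
    adj = dual_tree(triangles)
    prev = {t_s: None}
    queue = deque([t_s])
    while queue:
        t = queue.popleft()
        if t == t_d:
            break
        for u in adj[t]:
            if u not in prev:
                prev[u] = t
                queue.append(u)
    path, t = [], t_d
    while t is not None:
        path.append(t)
        t = prev[t]
    return path[::-1]

# A hexagon 0..5 triangulated with a central triangle and three ears.
hexagon = [(0, 1, 2), (2, 3, 4), (4, 5, 0), (0, 2, 4)]
path = sleeve(hexagon, 0, 1)
```

The resulting list of triangles is ordered from the triangle containing s to the triangle containing d, which is the order in which the second step processes the sleeve.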
It is also shown in [Good90a] that the farthest neighbor for each vertex in P (where distance is measured by the shortest path inside P) can be determined in $O(\log^2 n)$ time on a CREW PRAM with $O(n)$ processors.

Figure 9.4 Polygon P, two points s and d inside P, and shortest path from s to d.

A simple polygon is said to be rectilinear if all of its edges are either horizontal or vertical. Let P be a simple rectilinear convex polygon with n vertices inside of which lie n pairwise disjoint rectangles. The latter are called obstacles. CREW PRAM algorithms are given in [Atal90c] for computing shortest paths inside P that avoid the set of obstacles. Descriptions of shortest paths are obtained in $O(\log^2 n)$ time using $O(n^2/\log^2 n)$ processors if the source and destination are on the boundary of P, $O(n^2/\log n)$ processors if the source is an obstacle vertex and the destination a vertex of P, and $O(n^2)$ processors if both source and destination are obstacle vertices. Using these descriptions, a single processor can obtain the path length in constant time if the source and destination are vertices, and in $O(\log n)$ time if they are arbitrary points. The shortest path itself can be retrieved from its description by $O(n/\log n)$ processors in $O(\log n)$ time.

9.4 Minimum Matchings

Let 2n points in the plane be given, of which n are colored red and n are colored blue. It is required to associate every blue point with exactly one red point such that the sum of the distances between the pairs thus formed is the smallest possible. This is a special case (for points in the plane) of the more general minimum-weight perfect matching problem on bipartite graphs, also known as the assignment problem. Figure 9.5 shows two sets of points and a minimum-weight perfect matching of the sets.

Figure 9.5 Two sets of points and minimum-weight perfect matching of sets.

Two efficient sequential algorithms for the assignment problem in the plane are known [Vaid89]. The first runs in $O(n^{2.5} \log n)$ time when the distances are measured using either the Euclidean (i.e., $L_2$) or Manhattan (i.e., $L_1$) metric. The second runs in $O(n^2 \log^3 n)$ time and applies only when the Manhattan metric is used. Parallel algorithms for the assignment problem in the plane that achieve an optimal speedup with respect to the algorithms of [Vaid89] are presented in [Osia90]. The algorithm for Euclidean distances runs in $O(n^3/p^2 + n^{2.5} \log n/p)$ time, using p processors where $p < n^{1/2}$. It achieves an optimal speedup with respect to the algorithm of [Vaid89] when $p > n^{1/2}/\log n$. When the distances are measured using the Manhattan metric, an algorithm is given in [Osia90] that solves the assignment problem in the plane on an EREW PRAM with $O(\log^2 n)$ processors in $O(n^2 \log n)$ time.
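The range of p for which the Euclidean algorithm attains optimal speedup can be checked with a short calculation. The derivation below is a sketch that assumes the bounds read $T_p(n) = n^3/p^2 + n^{2.5} \log n / p$ for the parallel algorithm and $n^{2.5} \log n$ for the sequential algorithm of [Vaid89]; it compares the cost $p \cdot T_p(n)$ with the sequential running time.

```latex
% Assumed bounds: T_seq(n) = n^{2.5} log n,  T_p(n) = n^3/p^2 + n^{2.5} log n / p
\begin{align*}
p \cdot T_p(n) &= \frac{n^3}{p} + n^{2.5}\log n, \\
p \cdot T_p(n) = O\!\left(n^{2.5}\log n\right)
  &\iff \frac{n^3}{p} = O\!\left(n^{2.5}\log n\right)
  \iff p = \Omega\!\left(\frac{n^{1/2}}{\log n}\right).
\end{align*}
```

The same comparison for the Manhattan metric gives a cost of $\log^2 n \cdot n^2 \log n = n^2 \log^3 n$, which matches the second sequential bound.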
In what follows we provide some theoretical background to the assignment problem in the plane, a summary of the algorithm of [Osia90] for the case where the Euclidean metric is used to measure distances, a description of related matching problems, and a number of open questions.
9.4.1 Graph Theoretic Formulation

A bipartite graph is a graph whose set of nodes is the union of two disjoint sets of nodes U and V. No two nodes in either U or V have an arc between them. A complete bipartite graph is a bipartite graph in which there is an arc between every node in U and every node in V. Let $G = (U, V)$ be a complete bipartite graph on the plane induced by two sets of points U and V, with $|U| = |V| = n$. A matching M of G is a pairing of the points in U with those in V such that every point in U is paired with no more than one point in V, and vice versa. The weight of M is the sum of the distances between the pairs of points in the matching. A perfect matching is a matching M of G such that every point in G is paired with another point. A minimum-weight perfect matching is a matching M of G such that the weight of M is a minimum over all perfect matchings. The assignment problem in the plane is to determine a minimum-weight perfect matching of G.

9.4.2 Linear Programming Formulation

Let $d(u_i, v_j)$ be the distance between $u_i \in U$ and $v_j \in V$. A linear programming formulation of the assignment problem on the plane is:

$$\text{Minimize} \quad \sum_{(u_i, v_j)} d(u_i, v_j)\, x_{ij}$$

subject to

$$\sum_{j} x_{ij} = 1, \quad i = 1, 2, \ldots, n,$$
$$\sum_{i} x_{ij} = 1, \quad j = 1, 2, \ldots, n,$$
$$x_{ij} \ge 0,$$

with the understanding that the pairing $(u_i, v_j)$ is in the matching M if and only if $x_{ij} = 1$. The constraints of the linear program mean that when a solution is obtained, each point must be paired with exactly one other point. To solve this linear program, a dual to the linear program, which is generally easier to solve, is formulated as:

$$\text{Maximize} \quad \sum_{i} a_i + \sum_{j} b_j$$

subject to

$$a_i + b_j \le d(u_i, v_j).$$

The orthogonality conditions are:

$$x_{ij} > 0 \;\Rightarrow\; a_i + b_j = d(u_i, v_j),$$
$$a_i \neq 0 \;\Rightarrow\; \sum_{j} x_{ij} = 1, \quad i = 1, 2, \ldots, n,$$
$$b_j \neq 0 \;\Rightarrow\; \sum_{i} x_{ij} = 1, \quad j = 1, 2, \ldots, n.$$
An algorithm based on this formulation maintains primal and dual feasibility at all times and, in addition, maintains satisfaction of all orthogonality conditions except the second. The number of points for which the second condition is not satisfied decreases during the course of the computation.

An alternating path in G with respect to a matching M is a simple path such that exactly one of any two consecutive edges $e_i$, $e_{i+1}$ on the path is in the matching. With respect to a matching M, a point is said to be exposed if it is not paired with any other point; otherwise, it is said to be matched (or paired). An alternating tree relative to a matching M is a tree whose root is exposed and in which the path between the root and any other point in the tree is an alternating path. An alternating path in G joining two distinct exposed points in U and V is called an augmenting path. A matching M that does not have a maximum number of edges can be augmented using an augmenting path P by including in M the edges on P not in M and removing from M the edges on P that are in M.

The algorithm searches for a series of augmenting paths. Each time an augmenting path is found, the matching is augmented. A search for an augmenting path is done by growing alternating trees. When alternating trees cannot be grown, dual variables are revised to permit further growth of alternating trees. However, when an augmenting path cannot be found, a solution to the problem has been obtained.
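Augmentation along a path amounts to taking the symmetric difference of the matching and the edges of the path. The following sketch shows this on a tiny example; the representation (edges as `frozenset` pairs of point labels, paths as lists of labels) and the function names are assumptions made for illustration.

```python
def is_augmenting(matching, path):
    """True if `path` joins two exposed points and alternates between edges
    outside and inside the matching."""
    matched = {v for e in matching for v in e}
    if path[0] in matched or path[-1] in matched:
        return False
    edges = [frozenset(e) for e in zip(path, path[1:])]
    # Edges at even positions must be outside M, edges at odd positions inside M.
    return all((e in matching) == (k % 2 == 1) for k, e in enumerate(edges))

def augment(matching, path):
    """Add the path edges not in M and remove the path edges in M."""
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    return matching ^ path_edges

M = {frozenset(("v1", "u2"))}
P = ["u1", "v1", "u2", "v2"]
ok = is_augmenting(M, P)
M_new = augment(M, P)
```

After the augmentation the matching contains one more edge than before, and both endpoints of the path are matched.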
9.4.3 Geometric Formulation

The operation of determining which edge is the next to include in an alternating tree can be reduced to a geometric query problem as follows. The slack $s_{ij}$ on an edge $(u_i, v_j)$ is the distance between $u_i$ and $v_j$, minus the sum of the dual variables associated with $u_i$ and $v_j$ [i.e., $s_{ij} = d(u_i, v_j) - a_i - b_j$]. There are two types of points in V: those in an alternating tree and those that are not in any alternating tree. For the latter, we need to determine the edge $(u_i, v_j)$, where $u_i$ is in an alternating tree and $v_j$ is not in an alternating tree, such that $s_{ij}$ is minimum. To do this, weights $w(u_i)$, $u_i \in U$, and $w(v_j)$, $v_j \in V$, related to the dual variables, are associated with the points. Now, determination of the edge $(u_i, v_j)$, not in M and with minimum slack, to be included in an alternating tree, is reduced to a geometric query problem involving the weights.

Let F be a forest of alternating trees, and let H represent the sum of the amounts h by which the dual variables change during a phase. The relationships between the weights and the dual variables are given as $a_i = w(u_i) + H$ and $b_j = w(v_j) - H$. At the beginning of a phase, H is initialized to 0. When the dual variables are to be revised,
h is added to H instead of revising the dual variables. When a point is included in F, the associated weight is initialized to the dual variable. At the end of a phase, when an augmenting path has been discovered, the matching is augmented and the dual variables are revised using H and the weights associated with the points.

To determine the next edge to include in F efficiently, a solution to the following geometric query problem is required: Given a set of points Q and a weight $w(p)$ for each point $p \in Q$, preprocess Q so that for a given query point q, a point in Q nearest to q can be found quickly, where the distance between the points for this query problem is the slack $s_{ij}$. A solution to this problem for points on the Euclidean plane can be obtained through the use of the weighted Voronoi diagram (WVD) of the points in Q. A weighted Voronoi diagram partitions the plane into $O(|Q|)$ regions. Each point $p \in Q$ has a region $Vor(p)$ associated with it, defined as follows:

$$Vor(p) = \{\, p'' \mid d(p'', p) - w(p) \le d(p'', p') - w(p'),\ \forall p' \in Q \,\}.$$
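The query that the weighted Voronoi diagram answers can be stated as a brute-force scan, which is a convenient reference when testing a faster structure. The sketch below assumes points are coordinate pairs and weights are given in a parallel list; the function name is hypothetical, and no preprocessing is done, so each query takes linear time.

```python
import math

def weighted_nearest(Q, w, q):
    """Index of the point p in Q that minimizes d(q, p) - w(p)."""
    return min(range(len(Q)), key=lambda k: math.dist(q, Q[k]) - w[k])

Q = [(0, 0), (10, 0)]
unweighted = weighted_nearest(Q, [0, 0], (4, 0))
weighted = weighted_nearest(Q, [0, 3], (4, 0))
```

With zero weights the query point is assigned to the closer point, while a weight of 3 on the second point enlarges its region enough to capture the same query.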
Sequentially, the WVD of a set Q of n points can be computed in $O(n \log n)$ time [Fort87], and a query can be answered in $O(\log n)$ time [Edel86a].

9.4.4 Parallel Algorithm

The parallel algorithm of [Osia90] for solving the assignment problem on the Euclidean plane is summarized below.

Step 1. Initialize a matching M to an empty set.
Step 2. In parallel, root alternating trees at exposed points in U.
Step 3. If F is empty then stop else
    (3.1) In parallel determine a point $v_j$ not in F such that $v_j$ is nearest to a point $u_i$ in F, using the distance $s_{ij} = d(u_i, v_j) - w(u_i) - w(v_j)$.
    (3.2) If $s_{ij} = 0$ and $v_j$ is exposed, all processors augment the matching, update dual variables, and go to step 2.
    (3.3) If $s_{ij} = 0$, grow an alternating tree by adding $(u_i, v_j)$ and $(v_j, u_k) \in M$ to F, initialize the weights of $u_i$, $v_j$, and $u_k$ to their respective dual variables, and go to step (3.1).
    (3.4) If $s_{ij} > 0$, in parallel revise the dual variables using $h = s_{ij}$ and go to step (3.1).

9.4.5 Related Problems

A different minimum matching problem is considered in each of [Osia91] and [He91]. In [Osia91] we are given 2n points in the plane, all of which are of the same color. It is required to match each point with a single other point so that the sum of the distances between matched points is a minimum. It is shown in [Osia91] that this problem can be solved on a p-processor EREW PRAM in $O(n^{2.5} \log^4 n/p)$ time, where $p < n^{1/2}$. A restriction of this problem to the case where the 2n points fall on the boundary of a convex polygon is described in [He91], where an algorithm is given for solving the problem in $O(\log^2 n)$ time on an $O(n)$-processor CREW PRAM.

9.4.6 Some Open Questions

Several problems are left open in [Osia90, Osia91]; for example:

1. Can smaller running times be obtained using more processors?
2. What performance can be obtained when using a set of interconnected processors (instead of the PRAM)?
3. Are there efficient parallel algorithms for computing maximum matchings in the plane?

Parallel algorithms for optimization problems other than the ones discussed in this chapter are described in [Agga88] and [Ferr91b].
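The primal-dual scheme of Section 9.4.4 can be tried out with a compact sequential sketch. This is not the algorithm of [Osia90]: it is the classical $O(n^3)$ Hungarian method, which grows a single alternating tree per phase and scans all slacks directly instead of querying a weighted Voronoi diagram. The names are hypothetical; `u` and `v` play the role of the dual variables $a_i$ and $b_j$, and `delta` plays the role of h.

```python
import math

def min_weight_matching(red, blue):
    """Return (match, total), where red[i] is paired with blue[match[i]]."""
    n = len(red)
    cost = [[math.dist(r, b) for b in blue] for r in red]
    INF = float("inf")
    u = [0.0] * (n + 1)          # dual variables of the red points (1-based)
    v = [0.0] * (n + 1)          # dual variables of the blue points (1-based)
    p = [0] * (n + 1)            # p[j] = red point matched to blue j (0 = exposed)
    way = [0] * (n + 1)          # predecessor links of the alternating tree
    for i in range(1, n + 1):    # one phase per exposed red point
        p[0], j0 = i, 0
        minv = [INF] * (n + 1)   # minimum slack seen for each blue point
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    slack = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if slack < minv[j]:
                        minv[j], way[j] = slack, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):   # revise the dual variables by delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:           # reached an exposed blue point
                break
        while j0 != 0:               # augment along the path found
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match, sum(cost[i][match[i]] for i in range(n))

red = [(0, 0), (0, 2), (5, 0)]
blue = [(0, 1), (5, 1), (0, 3)]
match, total = min_weight_matching(red, blue)
```

In this example each red point is paired with a blue point at distance 1, so the total weight of the matching is 3.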
9.5 Problems

9.1. Design and compare parallel algorithms for solving the minimum cardinality circle cover problem on the following models of parallel computation:
    (a) Mesh
    (b) Mesh with broadcast buses
    (c) Mesh with reconfigurable buses
9.2. Show how the minimum-weight circle cover problem can be solved on a hypercube parallel computer.
9.3. Can the Euclidean minimum spanning tree problem be solved in constant time on the model of computation known as broadcasting with selective reduction?
9.4. For two points p and q inside a rectilinear polygon P, define a smallest path from p to q as a rectilinear path that minimizes both the distance and the number of line segments in the path. Given P, p, and q, design a parallel algorithm for computing the smallest path from p to q on a mesh of processors.
9.5. Given a simple polygon P, an external shortest path between two vertices p and q of P, denoted $SP(p, q)$, is a polygonal chain of vertices of minimum length that avoids the interior of P. The external diameter of P is the $SP(p, q)$ of maximum length over all pairs of vertices p and q of P. Design an algorithm for computing the external diameter of a simple polygon P on the EREW PRAM model of parallel computation.
9.6. Investigate various approaches to computing in parallel a maximum-weight perfect matching of a set of 2n points in the plane.
9.7. Given a convex polygon P with n vertices and an integer $k \ge 3$, it is required to compute the minimum area k-gon that circumscribes P.
    (a) Design a parallel algorithm for solving this problem for the case k = 3.
    (b) What can be said about parallel solutions when $k \ge 4$?
9.8. The following problem is known as the maximum empty rectangle (MER) problem: Given an isothetic rectangle $R_1$ and a set of points P inside $R_1$, it is required to find an isothetic rectangle $R_2$ of maximum area such that $R_2$ is completely contained in $R_1$ and does not contain any points from P. Design a hypercube algorithm for solving the MER problem.
9.9. You are given a collection P of planar points, an integer C, and a radius R. The elements of P may be viewed as customers of a set F of facilities to be located in the plane such that each facility has capacity C. It is required to find a set F of planar points so that each customer in P can be assigned to some facility in F with distance at most R, and so that no facility has more than C customers assigned to it. Design a parallel algorithm for solving this problem.
9.10. Design a parallel algorithm for the model of your choice that computes a smallest radius disk that intersects every line segment in a set of n line segments in the plane.
9.11. Given a simple rectilinear polygon P, it is required to cover P with a minimum number of squares, possibly overlapping, all interior to P. Design a parallel algorithm for solving this problem.
9.12. A set P of points is given in the plane. The Euclidean traveling salesperson problem calls for finding a simple polygon whose vertices are the points of P and whose perimeter is the shortest possible. This problem is believed to be very hard to solve sequentially in time polynomial in the number of points of P [Prep85]. Design a parallel algorithm that combines the minimum spanning tree and the minimum-weight perfect matching to obtain a solution to the Euclidean traveling salesperson problem that is no worse than 1.5 times the optimal.
9.6 References [Agga88] [Akl89a] [Atal89b] [Atal9Oc]
[Bert881 [Boxe89c] [Cole88b]
A. Aggarwal, B. Chazelle, L. J. Guibas, C. O'Dunlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327. S. G. Akl, The Design and Analysis of ParallelAlgorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989. M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the minimum circle-cover problem, Information Processing Letters, Vol. 32, 1989, 159-165. M. J. Atallah and D. Z. Chen, Parallel rectilinear shortest paths with rectangular obstacles, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 270-279. A. A. Bertossi, Parallel circle-cover algorithms, Information ProcessingLetters, Vol. 27, 1988, 133-139. L. Boxer and R. Miller, A parallel circle-cover minimization algorithm, Information Processing Letters, Vol. 32, 1989, 57-60. R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
124 [Edel86a] [ElGi86b]
[EiGi88] [Ferr9lb]
[Fort871 [Good771 [Good89a] [Good9Oa]
[He9l]
[Jeon9O] [Krus85]
[Lee84a] [Lu86a]
[MilI89b] [Osia90]
[Osia9l]
[Prep85]
Geometric Optimization
Chap. 9
H. Edelsbrunner, L. J. Guibas, and J. Stolfi, Optimal point location in a monotone subdivision, SIAM Journal on Computing, Vol. 15, 1986, 317-340. H. ElGindy, A Parallel Algorithm for the Shortest Path Problem in Monotone Polygons, Technical Report MS-CIS-86-49, Department of Computer and Information Science, Faculty of Engineering and Applied Science, University of Pennsylvania, Philadelphia, May 1986. H. ElGindy and M. T. Goodrich, Parallel algorithms for shortest path problems in polygons, The Visual Computer, Vol. 3, 1988, 371-378. A. G. Ferreira and J. G. Peters, Finding smallest paths in rectilinear polygons on a hypercube multiprocessor, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 162-165. S. Fortune, A sweepline algorithm for Voronoi diagrams, Algorithmica, Vol. 2, 1987, 153-174. S. E. Goodman and S. T. Hedetniemi, Introduction to the Design and Analysis of Algorithms, McGraw-Hill, New York, 1977, section 5.5. M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351. M. T. Goodrich, S. B. Shauck, and S. Guha, Parallel methods for visibility and shortest path problems in simple polygons, Proceedings of the Sixth Annual Symposium on Computational Geometry, Berkeley, California, June 1990, 73-82. X. He, An efficient parallel algorithm for finding minimum weight matching for points on a convex polygon, Information Processing Letters, Vol. 37, No. 2, January 1991, 111-116. C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178. C. P. Kruskal, L. Rudolf, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185. C. C. Lee and D. T. Lee, On a circle-cover minimization problem, Information Processing Letters, Vol. 18, 1984, 109-115. M. 
Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811. R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340. C. N. K. Osiakwan and S. G. Akl, Efficient Parallel Algorithms for the Assignment Problem on the Plane, Technical Report 90-284, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990. C. N. K. Osiakwan, Parallel computation of weighted matchings in graphs, Ph.D. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1991. F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Sark89a] D. Sarkar and I. Stojmenović, An optimal parallel circle-cover algorithm, Information Processing Letters, Vol. 32, July 1989, 3-6.
[Stoj88b] I. Stojmenović and M. Miyakawa, An optimal parallel algorithm for solving the maximal elements problem in the plane, Parallel Computing, Vol. 7, 1988, 249-251.
[Tarj85] R. E. Tarjan and U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM Journal on Computing, Vol. 14, 1985, 862-874.
[Vaid89] P. M. Vaidya, Geometry helps in matching, SIAM Journal on Computing, Vol. 18, No. 6, December 1989, 1201-1225.
10
Triangulation of Polygons and Point Sets
In this chapter we review parallel algorithms for the following two problems: 1. Decomposing simple polygons into trapezoids; such decompositions are used in planar point location (see Chapter 5), as well as the triangulation of simple polygons (also discussed in this chapter). 2. Triangulating point sets; this problem has practical applications in the finite-element method and in numerical analysis.
10.1 Trapezoidal Decomposition and Triangulation of Polygons
The trapezoidal decomposition or trapezoidal map of a polygon P is the decomposition of P into trapezoids. Given a simple n-vertex polygon P, (vertical) trapezoidal edge(s) are determined for each vertex. A trapezoidal edge for vertex v is an edge e of P that is directly above or below v such that the vertical line segment from v to e is inside P. Figure 10.1 shows a polygon P and a trapezoidal decomposition of P. A triangulation of a simple n-vertex polygon P is the augmentation of P with diagonal edges (or chords) connecting vertices of P such that in the resulting decomposition, every face is a triangle. Figure 10.2 shows a triangulation of the polygon P shown in Figure 10.1. In this section we present algorithms that compute the trapezoidal map of a simple polygon and algorithms that, given a trapezoidal map of a simple polygon P, triangulate P.
An algorithm that decomposes an n-vertex simple polygon P (possibly with holes) into trapezoids is given in [Asan88] with running time O(n) on a linear array of size n. Using the trapezoidal decomposition of P, P is decomposed into monotone polygons in O(n) time. Each monotone polygon is then triangulated sequentially by one processor in O(n) time.
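The definition of a trapezoidal edge can be made concrete with a small sequential sketch. The code below is a brute-force O(n^2) illustration of the definition only, not any of the parallel algorithms surveyed in this section. It assumes a simple polygon without holes whose vertices have pairwise distinct x-coordinates; the names edge_y_at and trapezoidal_edges are hypothetical, chosen for this example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def edge_y_at(p: Point, q: Point, x: float) -> float:
    """y-coordinate at abscissa x of the line through p and q (p and q differ in x)."""
    t = (x - p[0]) / (q[0] - p[0])
    return p[1] + t * (q[1] - p[1])


def trapezoidal_edges(poly: List[Point]) -> List[Tuple[Optional[int], Optional[int]]]:
    """For each vertex return (above, below): the indices of its trapezoidal
    edges, or None where there is none. Edge j joins poly[j] and poly[(j + 1) % n].

    Assumption (not from the text): the polygon is simple, has no holes, and
    no two vertices share an x-coordinate.
    """
    n = len(poly)
    result = []
    for vx, vy in poly:
        above, below = [], []  # pairs (vertical distance, edge index)
        for j in range(n):
            p, q = poly[j], poly[(j + 1) % n]
            # Skip edges that do not cross the vertical line through v;
            # the strict test also skips the two edges incident to v.
            if not (min(p[0], q[0]) < vx < max(p[0], q[0])):
                continue
            y = edge_y_at(p, q, vx)
            (above if y > vy else below).append((abs(y - vy), j))
        # An odd number of boundary crossings on one side means that the
        # vertical segment from v to the nearest edge on that side is inside P.
        up = min(above)[1] if len(above) % 2 == 1 else None
        down = min(below)[1] if len(below) % 2 == 1 else None
        result.append((up, down))
    return result
```

For the convex quadrilateral (0,0), (3,-1), (5,2), (2,4), only the vertices (3,-1) and (2,4) have a trapezoidal edge, since the leftmost and rightmost vertices have no boundary edge directly above or below them.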
Figure 10.1 Polygon P and trapezoidal decomposition of P.
In [Jeon90] an algorithm for multilocating points in a set of nonintersecting line segments is described (see Chapter 5). It is shown how the trapezoidal decomposition of a simple polygon P and a triangulation of P can be constructed by direct application of this algorithm. The multilocation algorithm runs on a mesh of size n in O(n^{1/2}) time. An algorithm for point location on a hypercube presented in [Dehn90] implies O(log^2 n)-time solutions to trapezoidal decomposition and triangulation of a simple polygon on a hypercube of size O(n log n). Randomized algorithms for computing the trapezoidal decomposition of a set of nonintersecting line segments and the triangulation of a simple polygon are given in [Reif90]. Each algorithm runs in O(log n) probabilistic time on an O(n)-processor butterfly network.
Several algorithms on the CREW PRAM model of computation also exist for computing the trapezoidal map of a simple polygon and triangulating a simple polygon. The algorithm of [Agga88] described in Chapter 4 decides whether any two line segments in a set of line segments intersect and, if not, computes the vertical trapezoidal decomposition for the set of line segments. The trapezoidal decomposition of a simple polygon P is found in O(log^2 n) time using O(n) processors or in O(log n) time using O(n log n) processors. From the trapezoidal decomposition G, the polygon is partitioned into monotone polygons: Let t be a trapezoid in G and let c be a reflex corner of P such that c lies in the relative interior of a vertical edge of t; a diagonal edge is added that joins c to the corner of P on the opposite vertical edge of t. This can be done in constant time using O(n) processors, resulting in the partition of P into a
set of horizontally monotone polygons. Horizontally monotone polygons consist of an upper chain and a lower chain of edges that meet at two extremal points, a leftmost and a rightmost point. From the horizontally monotone partition, P is partitioned into one-sided monotone polygons. A polygon Q is a one-sided monotone polygon if it is monotone and it has one distinguished edge s such that the vertices of Q are all above or below s except for the endpoints of s. The endpoints of s are the extremal points of a horizontally monotone polygon.
Figure 10.2 Triangulation of P.
Let Q be a one-sided monotone polygon with q vertices. Without loss of generality, assume that the distinguished edge s is below the vertices of Q. To triangulate Q, divide Q into q^{1/2} sections using q^{1/2} vertical lines and find the lower hull of the part of Q above s in each section. The parts of Q above each hull are recursively triangulated by utilizing a multiway divide-and-conquer technique. The common tangent lines between each pair of lower hulls are computed iteratively until there is only a single lower hull left. This process results in a partial triangulation with the remaining parts to be triangulated having a similar shape. This shape is called a funnel polygon by Goodrich in [Good89a]. A funnel polygon is a one-sided monotone polygon that consists of a single edge followed by a convex chain followed by a single edge (or a vertex) followed by another convex chain. Figure 10.3 shows examples of funnel polygons. A funnel polygon K is triangulated in O(log k) time using O(k) processors, where k is the number of vertices of K. The remaining section to be triangulated is bounded by the
distinguished edge s with vertices l and r (left and right, respectively) on the bottom, and the final lower hull boundary LH on the top, with two edges connecting l to the left extreme of LH and r to the right extreme of LH. This portion is triangulated easily since each point on the lower hull is visible from either l or r.
Figure 10.3 Funnel polygons.
The triangulation algorithm given in [Agga88] runs in O(log n) time using O(n) processors if the trapezoidal map for the polygon to be triangulated is given. Otherwise, it runs in O(log^2 n) time using O(n) processors or in O(log n) time using O(n log n) processors. A similar algorithm is presented in [Atal86b]. There, a simple polygon is decomposed into trapezoids in O(log n log log n) time using O(n) processors and is triangulated, given the trapezoidal decomposition, in O(log n) time using O(n) processors.
In [Atal89c], cascading divide-and-conquer is used to construct a trapezoidal decomposition of a simple polygon P in O(log n) time using O(n) processors. A plane sweep tree T is constructed for the line segments that make up P, and T is made into a fractional cascading data structure. Both operations take O(log n) time with O(n) processors. Each vertex p of P is multilocated in T; this yields the edge of P that is directly above (or below) p. Given this trapezoidal decomposition or trapezoidal map of P, the algorithm in [Good89a] finds a triangulation of P in O(log n) time using O(n/log n) processors, which improves on the result of [Atal86b] and [Agga88] by a factor of log n processors. The result of Goodrich uses a method similar to those of [Atal86b] and [Agga88].
Given a trapezoidal decomposition of a simple polygon P, [Yap88] presents an algorithm that triangulates P in O(log n) time using O(n) processors by making two calls to a trapezoidal map algorithm such as the one given in [Atal89c]. This algorithm avoids the steps in [Atal86b], [Agga88], and [Good89a] that perform heterogeneous operations such as calls to the trapezoidal map algorithm, convex hull construction, and multiway divide-and-conquer. Instead, it performs just two calls to the trapezoidal map algorithm and, as such, is more elegant.
Finally, randomized algorithms are given in [Reif87] that find the trapezoidal decomposition of a simple polygon and triangulate a simple polygon in O(log n) probabilistic time using O(n) processors.
Reference   Model          Processors    TD time              T time
[Asan88]    Linear array   O(n)          O(n)                 O(n)†
[Jeon90]    Mesh           O(n)          O(n^{1/2})           O(n^{1/2})
[Reif90]    Butterfly      O(n)          O(log n)             O(log n)
[Dehn90]    Hypercube      O(n log n)    O(log^2 n)           O(log^2 n)
[Atal86b]   CREW PRAM      O(n)          O(log n log log n)   O(log n)
[Agga88]    CREW PRAM      O(n)          Given a TD           O(log n)
[Agga88]    CREW PRAM      O(n)          O(log^2 n)           -
[Atal89c]   CREW PRAM      O(n)          O(log n)             -
[Good89a]   CREW PRAM      O(n/log n)    Given a TD           O(log n)
[Yap88]     CREW PRAM      O(n)          O(log n)             O(log n)
[Reif87]    CREW PRAM      O(n)          O(log n)             O(log n)
Figure 10.4 Performance comparison of parallel polygon triangulation algorithms.
Summary
The table in Figure 10.4 summarizes the results for polygon triangulation in parallel. There are two running times given:
TD time. The time to construct the trapezoidal decomposition.
T time. The time to construct a triangulation given a trapezoidal decomposition.
In [Asan88], the polygon is partitioned into monotone polygons in parallel, and the time marked with a † is the time to triangulate each monotone polygon with one processor.
10.2 Triangulation of Point Sets
Triangulating a set S of n points requires partitioning the convex hull of S into triangles such that the vertex set of the partition is the set of points. The problem of triangulating a set of points in the plane is more difficult than triangulating a simple polygon since the sequential lower bound for triangulating a set of points is Ω(n log n) [Prep85] and a linear time algorithm exists for triangulating a simple polygon [Chaz90]. Conceptually, one can see that if there exists an algorithm to triangulate a simple polygon, it can be used to triangulate a point set S by first constructing a simple polygon P from S, then finding the convex hull of P. Each of P and the polygons formed between P and the convex hull edges can then be triangulated using a polygon triangulation algorithm. The parallel algorithms described in this section compute arbitrary triangulations of point sets. On the other hand, the Delaunay triangulation is the triangulation of a set of points
S such that the minimum angle of its triangles is a maximum over all triangulations [Saxe90]. Since the Delaunay triangulation of S is the dual of the Voronoi diagram of S, algorithms for computing Delaunay triangulations are discussed in Chapter 8. Figure 10.5 illustrates an arbitrary triangulation of a set of points.
Figure 10.5 Arbitrary triangulation of set of points.
An algorithm is given in [Chaz84] that triangulates a set S of n points in the plane on a linear array of size n in O(n) time. The convex hull of S, CH(S), is found in O(n) time using an algorithm also given in [Chaz84]. CH(S) is then partitioned into h - 2 triangles, where h is the number of points on CH(S), by adding an edge from one point of CH(S) to every other point of CH(S). The rest of the points in S are then triangulated inside each of the triangles. Processors store either the edges of CH(S) or the edges of a triangle in clockwise order. When a point p is added to the triangulation, it is passed through the array to test if it is inside the face of a triangle R and, if so, R is replaced by three triangles made up by joining p to vertices of R. This causes the rest of the information stored in the linear array of processors to "ripple" down the array in linear time. Points can also be added to the triangulation that are outside CH(S) by using an algorithm similar to Chazelle's convex hull algorithm [Chaz84].
Three more algorithms are considered for triangulating point sets in parallel. They all run on the CREW PRAM. The first two triangulate sets of points in the plane and the third one triangulates points in d-dimensional space. The algorithm in [Merk86] triangulates a set of points S in the plane and runs in O(log n) time using O(n) processors. It reduces the problem of triangulating points inside the convex hull of S to triangulating points inside triangles. The convex hull of S is found in O(log n) time using O(n) processors by an algorithm such as the one in [Atal86a] described in Chapter 3.
The lowest rightmost point X is found and the rest of the points p_i are sorted by the angle θ_i that p_i makes with X in the positive x-direction. The sorted sequence
is split by lines through the extreme points of S and X. These splitting lines partition a convex polygon into triangles. Consider a subsequence of points p_l, p_{l+1}, ..., p_{i-1}, p_i, p_{i+1}, ..., p_{r-1}, p_r in one of these partitions, where p_l and p_r are the left and right extreme points that bound the subsequence, respectively, in a clockwise direction around CH(S). The height Δ_i from the line through X parallel to the line through (p_l, p_r) is calculated for each point p_i ∈ [p_l, p_r]. An algorithm called simple triangulation, which takes a subsequence such as the one defined above and triangulates it, runs in O(log k) time using O(k) processors, where k is the number of points in the subsequence. Since a point can be in at most two subsequences, the processors can be divided among the subsequences so that O(k) processors are used for each subsequence. The simple triangulation algorithm splits a subsequence of size k into k^{1/2} subsequences of size k^{1/2} and recursively triangulates each in a multiway divide-and-conquer process. The heights are used to connect points to their left higher and right higher neighbors in the subsequence and down to the point X. Merging the triangulated subsequences in O(log n) time with O(n) processors is accomplished using a data structure similar to a segment tree [Bent80]. The entire algorithm takes O(log n) time with O(n) processors on a CREW PRAM.
An algorithm in [Wang87] achieves the same running time and also uses multiway divide-and-conquer but does not first reduce the problem of triangulating points in a convex hull to triangulating points in a triangle. The set of points is partitioned into n^{1/2} subsets of size n^{1/2} each. The problem is solved recursively on each subset and, during the algorithm, the convex hull of each of the subsets is created. The upper hulls of the n^{1/2} convex hulls, n^{1/2} - 1 of their pairwise common upper supporting lines, and n^{1/2} - 1 "middle" lines connecting every two adjacent sets of points, form n^{1/2} - 1 funnel polygons.
(Funnel polygons were described in the preceding section.) An algorithm is presented that triangulates funnel polygons in O(log m) time using O(m) processors, where m is the number of vertices in the funnel polygon. A supporting line l_ij is chosen for a pair of convex hulls CH_i and CH_j (CH_j to the left of CH_i) where the slope of l_ij is smaller than the slope of all supporting lines between CH_i and hulls to the left of CH_i. The supporting lines can be found in O(log n) time with one processor [Over81], and since there are at most n supporting lines, in O(log n) time using O(n) processors. The algorithm to triangulate each funnel polygon is called and the entire process is repeated for the lower hulls. It is shown that allocating O(n) processors to n^{1/2} - 1 funnel polygons so that each funnel polygon P of size m is allocated m - 2 processors can be done in O(log n) time using O(n) processors on the CREW PRAM.
The algorithm in [Wang87] is adapted to run on an O(n)-processor hypercube by MacKenzie and Stout [MacK90a] by dividing the set of points into n^{1/4} subsets of size n^{3/4} each. At each stage of the recursion, O(SORT(n)) time is used, where SORT(n) is the time needed to sort n numbers on an O(n)-processor hypercube. The total time required is t(n) = t(n^{3/4}) + O(SORT(n)), which is O(SORT(n)). At the time of this writing, the fastest known sorting algorithm on a hypercube of size n has running time O(log n log log n) [Cyph90, Leig91]. Using this sorting algorithm, the triangulation algorithm of [MacK90a] runs in O(log n log log n) time.
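The claim that the recurrence for the hypercube algorithm solves to O(SORT(n)) can be checked by unrolling it. The sketch below assumes SORT(n) ≤ c log n log log n for some constant c (the sorting bound quoted above), treats subproblem sizes as real numbers, and ignores the constant cost of the base case; the constant c is introduced here for illustration only.

```latex
\begin{align*}
t(n) &\le t\!\left(n^{3/4}\right) + c\,\log n\,\log\log n \\
     &\le c \sum_{i \ge 0} \log\!\left(n^{(3/4)^{i}}\right)\,
                           \log\log\!\left(n^{(3/4)^{i}}\right) \\
     &\le c\,\log n\,\log\log n \sum_{i \ge 0} \left(\tfrac{3}{4}\right)^{i} \\
     &\le 4c\,\log n\,\log\log n \;=\; O(\mathrm{SORT}(n)).
\end{align*}
```

The work per level shrinks geometrically because log(n^{3/4}) = (3/4) log n, so the top level of the recursion dominates the total.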
Finally, an algorithm given in [ElGi86a] triangulates a point set in arbitrary dimensions in O(log^2 n) time using O(n/log n) processors on a CREW PRAM.
Reference                 Model          Processors    Running time
[Chaz84]                  Linear array   O(n)          O(n)
[Merk86], [Wang87]        CREW PRAM      O(n)          O(log n)
[MacK90a] with [Leig91]   Hypercube      O(n)          O(log n log log n)
[ElGi86a]†                CREW PRAM      O(n/log n)    O(log^2 n)
Figure 10.6 Performance comparison of parallel algorithms for triangulating point sets.
Summary
The table in Figure 10.6 summarizes the results in this section. All of the results are for triangulating n points in the plane except for the reference marked with a †, which triangulates n points in arbitrary dimensions.
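The incremental scheme attributed to [Chaz84] in this section (fan the convex hull from one hull vertex, then split the triangle containing each remaining point into three) can be sketched sequentially. This is an illustration of the idea only: it does not model the linear array or its "ripple" behavior, it uses Andrew's monotone chain for the hull rather than Chazelle's systolic algorithm, and it assumes that no three input points are collinear. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive if counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def strictly_inside(p: Point, t: Triangle) -> bool:
    """True if p lies strictly inside the counterclockwise triangle t."""
    a, b, c = t
    return cross(a, b, p) > 0 and cross(b, c, p) > 0 and cross(c, a, p) > 0


def triangulate(points: List[Point]) -> List[Triangle]:
    """Arbitrary triangulation of a point set (assumes no three collinear points)."""
    hull = convex_hull(points)
    # Fan: join hull[0] to every other hull vertex.
    triangles = [(hull[0], hull[i], hull[i + 1]) for i in range(1, len(hull) - 1)]
    on_hull = set(hull)
    for p in points:
        if p in on_hull:
            continue
        for k, t in enumerate(triangles):
            if strictly_inside(p, t):
                a, b, c = t
                # Replace the containing triangle by three triangles meeting at p.
                triangles[k:k + 1] = [(a, b, p), (b, c, p), (c, a, p)]
                break
    return triangles
```

Each interior point replaces one triangle by three, so a set of n points with h hull vertices ends with (h - 2) + 2(n - h) = 2n - h - 2 triangles. The linear scan for the containing triangle makes this sketch quadratic in the worst case.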
10.3 Problems
10.1. Show how an n-vertex simple polygon can be triangulated on the following processor networks:
(a) Tree
(b) Butterfly
(c) Pyramid
10.2. Design a hypercube algorithm for trapezoidal decomposition and triangulation of a simple polygon with n vertices, whose cost is O(n log^2 n).
10.3. Design a CREW PRAM algorithm that decomposes a simple polygon with n vertices into trapezoids and whose cost is O(n). Can this performance be obtained on an EREW PRAM?
10.4. Show how a triangulation of a set of points in the plane can be computed on each of the following models of computation:
(a) Mesh
(b) Tree
(c) Modified AKS network
(d) Broadcasting with selective reduction (BSR)
10.5. Design a mesh-of-trees algorithm for computing a triangulation of a set of points in a d-dimensional space.
10.6. Given a set S of points in the plane, a minimum-weight triangulation of S is a triangulation T such that the sum of the Euclidean lengths of its edges is a minimum over all triangulations of S. Design a parallel algorithm for computing T on each of the following models of computation:
(a) Mesh
(b) Hypercube
(c) CRCW PRAM
(d) Broadcasting with selective reduction (BSR)
10.7. Repeat Problem 10.6 for the case where the points of S form a simple polygon.
10.8. Repeat Problem 10.6 for the case where the points of S lie on a set L of m straight nonvertical lines, numbered 1 to m, and satisfying the following two properties:
(i) No two lines intersect inside CH(S).
(ii) All the points of S on line i are above line i + 1 and below line i - 1.
10.9. Design a parallel algorithm for triangulating a simple polygon with holes.
10.10. Design parallel algorithms for computing the following two geometric structures for a set P of points in the plane [Prep85]:
(a) The Gabriel graph of P has an edge between points p_i and p_j of P if and only if the disk with diameter the segment (p_i, p_j) contains no point of P in its interior.
(b) The relative neighborhood graph of P has an edge between points p_i and p_j of P if and only if d(p_i, p_j), the distance from p_i to p_j, is such that d(p_i, p_j) ≤ min_{k ≠ i,j} max(d(p_i, p_k), d(p_j, p_k)).
10.11. A set of points P in d-dimensional space, where d > 2, is given.
(a) Define the concept of a triangulation T of P.
(b) Design a parallel algorithm for computing T on a hypercube computer.
10.12. A set of points P in d-dimensional space, where d > 2, is given.
(a) Provide a definition for the concept of the Delaunay triangulation T of the convex hull of P.
(b) Design a parallel algorithm for computing T [Beic90].
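The two definitions in Problem 10.10 can be pinned down with a brute-force sequential sketch, which may also serve as a reference against which a parallel solution is checked. It is not a solution to the problem: it runs in O(n^3) time on one processor. The function names are hypothetical, and the non-strict comparison in the relative neighborhood test is an assumption about the intended definition.

```python
from itertools import combinations
from math import dist
from typing import List, Set, Tuple

Point = Tuple[float, float]


def gabriel_graph(pts: List[Point]) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, whose diametral disk has no other point in its interior."""
    edges = set()
    for i, j in combinations(range(len(pts)), 2):
        p, q = pts[i], pts[j]
        # r is strictly inside the disk with diameter pq exactly when the
        # angle p-r-q is obtuse, i.e. when (p - r) . (q - r) < 0.
        if all((p[0] - r[0]) * (q[0] - r[0]) + (p[1] - r[1]) * (q[1] - r[1]) >= 0
               for k, r in enumerate(pts) if k != i and k != j):
            edges.add((i, j))
    return edges


def relative_neighborhood_graph(pts: List[Point]) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, with d(p_i, p_j) <= max(d(p_i, p_k), d(p_j, p_k)) for all other k."""
    edges = set()
    for i, j in combinations(range(len(pts)), 2):
        d = dist(pts[i], pts[j])
        # Assumption: the comparison is non-strict.
        if all(d <= max(dist(pts[i], r), dist(pts[j], r))
               for k, r in enumerate(pts) if k != i and k != j):
            edges.add((i, j))
    return edges
```

For the three points (0,0), (4,0), (2,1), the point (2,1) lies inside the disk whose diameter joins the other two, so both graphs consist of the two edges incident to (2,1).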
10.4 References
[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Asan88] T. Asano and H. Umeo, Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region, Parallel Computing, Vol. 6, 1988, 209-216.
[Atal86a] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Beic90] I. Beichl and F. Sullivan, A robust parallel triangulation and shelling algorithm, Proceedings of the Second Canadian Conference in Computational Geometry, Ottawa, Ontario, August 1990, 107-111.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz90] B. Chazelle, Triangulating a simple polygon in linear time, Proceedings of the Thirty-First Annual Symposium on Foundations of Computer Science, St. Louis, October 1990, Vol. I, 220-230.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[ElGi86a] H. ElGindy, An optimal speed-up parallel algorithm for triangulating simplicial point sets in space, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 389-398.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays · Trees · Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[Merk86] E. Merks, An optimal parallel algorithm for triangulating a set of points in the plane, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 399-411.
[Over81] M. H. Overmars and J. van Leeuwen, Maintenance of configurations in the plane, Journal of Computer and System Sciences, Vol. 23, 1981, 166-204.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Saxe90] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions, IEEE Transactions on Computers, Vol. C-39, No. 3, March 1990, 400-404.
[Wang87] C. A. Wang and Y. H. Tsin, An O(log n) time parallel algorithm for triangulating a set of points in the plane, Information Processing Letters, Vol. 25, 1987, 55-60.
[Yap88] C. K. Yap, Parallel triangulation of a polygon in two calls to the trapezoidal map, Algorithmica, Vol. 3, 1988, 279-288.
11
Current Trends
The purpose of this chapter is to expose some of the trends in parallel computational geometry that are developing at the time of this writing. We begin by describing a number of algorithms that run on systolic screens and solve problems defined on two-dimensional pictures, such as insertion, deletion, computing shadows, and finding shortest paths. We then present a generalization of the prefix sums problem and show how it leads to the derivation of efficient parallel algorithms for several problems in computational geometry, including finding maximal vectors and ECDF searching. This is followed by a study of the properties of the star and pancake interconnection networks and their use in solving a family of computational geometric problems. Finally, we conclude with a detailed discussion of the model of computation known as broadcasting with selective reduction and its applications. For other expositions of current trends in parallel computational geometry, the reader is referred to [Agga92] and [Good92a].
11.1 Parallel Computational Geometry on a Grid
Current graphics technology uses raster scan devices to display images. These devices can be modeled by a two-dimensional array of picture elements or pixels (see Section 2.2.2). This suggests that it would be useful to consider a form of geometry where objects are composed of pixels. Operations on objects simply deal with collections of pixels. Indeed, a language is proposed in [Guib82] to manipulate pixels in which general graphics operations can be performed. An idea that blends in naturally with the use of pixel operations is the use of massive parallelism. A natural way to use parallelism in this framework is to assign a processor to each pixel with a suitable underlying connection network. Algorithms are described in [Four88] that use the frame buffer or the array of pixels in such a way. This concept was also used in our discussion of systolic screens and their associated algorithms (see, for example, Chapters 3 and 7). In [Akl90b] several common geometric problems are examined and algorithms are provided for their solution incorporating some of the ideas mentioned above.
Figure 11.1 Grid and three regions. Regions can have holes and disconnected components.
These include a geometric search problem, a shadow problem, and a path in a maze problem. This section is devoted to a discussion of these problems and the corresponding algorithms designed to solve them in parallel. Other algorithms for the problems addressed in this section, as well as algorithms for related problems and additional references, can be found in [Beye69], [Nass80], [Won87], [Prea88], [Ayka91], [Dehn91a], and [Dehn91b].
11.1.1 Geometric Search Problem
Consider a square finite subset of the plane denoted by U, partitioned into an N^{1/2} by N^{1/2} square grid. Following the terminology used in the computer graphics literature we will call each grid square a pixel. We will identify each pixel by its row and column position in U. A region r is defined by a collection of pixels from U and identified with a unique label. Let R = (r_1, r_2, ..., r_n) represent a well-ordered sequence of regions as shown in Figure 11.1. We would like to process operations in the following form:
1. INSERT: Given the description of a new region r, and a location in the sequence R, insert r into the sequence.
2. DELETE: Given a region r, delete it from R.
3. RETURN TOP REGION: Given a pixel (i,j), return the region r containing (i,j), such that of all regions containing (i,j), r is the one that appears first in the sequence R. If there is no region in R containing (i,j), return NIL.
These operations are intended to be a formalism of the operations one performs when using a raster graphics display with inputs from an electronic selecting device (e.g., a mouse, light pen, or touch screen). Each region represents an object of the graphics display. A RETURN TOP REGION operation represents the ability to choose an object by selecting a pixel the object contains. The INSERT and DELETE operations reflect the ability to insert and delete objects and move objects around. (Using our formalism, moving an object from one position to another results in a DELETE followed by an INSERT.) The sequence of the regions denotes that objects are layered, and in the case where several objects contain a selected pixel, the one that is on top is the one that is chosen.
It is shown in [Akl90] that by using an array of processors (one processor per pixel) this problem can be solved in a simple and efficient manner. In what follows we begin by describing a sequential solution to the problem, designed to run on a single processor. This will serve as our paradigm in presenting the parallel solution of [Akl90], which runs on an array of processors.
Sequential solution. One can view the problem we have described as a combination of two search problems. One search requires us to determine all the regions that contain a given pixel. The second is a search among these regions for the one that appears first in our ordering. A simple solution maintains a priority queue T_ij for each pixel (i,j). All regions in R that contain (i,j) will be stored in T_ij.
We must update w priority queues to INSERT a region r containing w pixels. Similarly, an update of w priority queues is required to DELETE a region r. A RETURN TOP REGION operation simply examines one priority queue and returns an answer. We must now address the problem of maintaining these priority queues. If we could assign a value to the priority of a region, an update of a priority queue is straightforward. Using a suitable balanced search tree implementation, we can insert into and delete from the priority queue in time that is proportional to log |T_ij|. However, inserting into these priority queues requires knowledge of R. A region's position in R cannot be represented by a fixed rank; rather, the position in R only has meaning relative to other regions in R. We therefore require one additional priority queue, called T_R, to maintain the entire sequence R. The operations INSERT and DELETE use T_R when updating individual priority queues. Let us first examine the INSERT operation. We are given a region r to be inserted into the sequence R, after a region q. We want to update each T_ij corresponding to pixels (i,j) that are contained in r. The correct position in which to insert r into T_ij is after the region r*, such that r* is the last region in T_ij that precedes q in R. We can identify r* by performing a binary search in T_ij. At each comparison we examine the
region s to determine whether it precedes or succeeds q. If s precedes q, we can search in the upper part of the sequence, and if s succeeds q, we search in the lower part of the sequence. After O(log |T_ij|) comparisons we can find r*. Each comparison to determine if s precedes or succeeds q is done by locating s in T_R. This computation requires O(log n) operations. Observe that |T_ij| ≤ n. Thus to INSERT a region r consisting of w pixels, we use O(w log^2 n) operations. Using a similar method, we can DELETE a region consisting of w pixels in O(w log^2 n) operations. Given the pixel (i,j), the operation RETURN TOP REGION can be performed in constant time, by returning the first region found in T_ij.
This scheme uses one memory location for each pixel in the union of all of the regions in R, plus storage for the structure T_R. On the average, each pixel is shared by a small number of regions, yielding a storage requirement of O(n). In the worst case, however, each of the N pixels is part of all n regions, and the storage requirement is O(N × n). Similarly, the time to perform an INSERT or DELETE operation is O(N log^2 n).
The analysis above suggests a time-versus-storage trade-off. Instead of using T_ij, assume that a copy of the priority queue T_R is maintained for each pixel (i,j). In that copy of T_R, those regions containing (i,j) are identified by a special label. In this case, the storage requirement is always O(N × n), while the running time of an INSERT or DELETE operation is O(N log n). This reduction of the running time by a factor of O(log n) is due to the fact that all the information required to insert or delete a region is available at each pixel. Operation RETURN TOP REGION can be performed in constant time as before by maintaining, for each pixel (i,j), a pointer to the top region in T_R containing (i,j).
Parallel solution. The observation in the final paragraph of the preceding section leads to the following parallel implementation of the algorithm.
Assume that N processors are available. The processors are arranged in a two-dimensional array of N^(1/2) rows and N^(1/2) columns. There are no links whatsoever connecting the processors. Each processor is associated with a pixel in U. We assume that each processor has enough memory to store a copy of TR. Within that copy, those regions containing the associated pixel are identified by a special label. All processors are capable of communicating with the outside world. Suppose that a region r with w pixels is to be inserted into the sequence R, after region q. Each of the w pixels in the region receives r and q and a "1" bit indicating that it belongs to this region. Each of the remaining N - w pixels also receives r and q and a "0" bit indicating that it lies outside the region. All processors thus receive, in constant time, the data needed to perform the insertion. The insertion is done locally by each processor in an additional O(log n) steps, with all processors operating in parallel to update their copy of TR. The processors included in the new region also label that region in their copy of TR. Consequently, INSERT requires O(log n) time. The same analysis applies to DELETE. Finally, operation RETURN TOP REGION [containing pixel (i,j)] is performed by querying processor ij, associated with pixel (i,j). If a pointer is maintained at all times to the top region in TR containing (i,j), then processor ij is capable of answering the query in constant time.
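The per-pixel scheme just described can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the class names PixelProcessor and Screen are hypothetical, a plain Python list stands in for each processor's copy of TR (so a local update costs O(n) here rather than O(log n)), the first element of the list is taken to be the top region, and the loop over processors stands in for what the N processors would do simultaneously.

```python
class PixelProcessor:
    """One processor per pixel, holding its own copy of the region sequence."""

    def __init__(self):
        self.order = []      # stand-in for the local copy of TR, top-most region first
        self.inside = set()  # regions carrying the "special label" (they contain this pixel)

    def insert(self, r, q, bit):
        # Insert r immediately after q; q=None means r becomes the top-most region.
        pos = 0 if q is None else self.order.index(q) + 1
        self.order.insert(pos, r)
        if bit:              # the "1" bit: this pixel belongs to region r
            self.inside.add(r)

    def delete(self, r):
        self.order.remove(r)
        self.inside.discard(r)

    def top_region(self):
        # The text keeps a pointer for O(1) answers; a linear scan is used here for brevity.
        for r in self.order:
            if r in self.inside:
                return r
        return None


class Screen:
    """Unconnected array of processors; every update is sent to all of them."""

    def __init__(self, rows, cols):
        self.proc = {(i, j): PixelProcessor() for i in range(rows) for j in range(cols)}

    def insert(self, r, q, pixels):
        # Sequential stand-in for "all processors operate in parallel".
        for ij, p in self.proc.items():
            p.insert(r, q, ij in pixels)

    def delete(self, r):
        for p in self.proc.values():
            p.delete(r)

    def return_top_region(self, i, j):
        return self.proc[(i, j)].top_region()
```

Because no processor ever consults another, the simulation makes it easy to see why the text requires no links between processors for this problem.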
A similar problem is examined in [Bern88]. A scene consisting of rectangular windows is maintained to allow insertions, deletions, and mouse click queries in much the same way as described above. However, the approach in [Bern88] is object oriented; that is, the algorithm is independent of the resolution of the display medium. Inserting or deleting a rectangle can be performed in O(log^2 n log log n + k log^2 n) time using O(n log^2 n + a log n) space, where k is the number of visible line segments that change, n is the number of rectangles, and a is the number of visible line segments at the time of the update. A mouse click query is performed in O(log n log log n) time.
11.1.2 Shadow Problem

Assume that an N^(1/2) by N^(1/2) mesh of processors represents an N^(1/2) by N^(1/2) grid of pixels. We will call this mesh the screen. The pixels are assumed to be unit squares and pixel (i,j) covers the square with corners (i ± 0.5, j ± 0.5). We assume that the rows and columns are numbered from 0 to N^(1/2) - 1. Each processor in the interior of the screen can communicate with four neighbors, and the exterior ones with one or two. An image on the screen is defined as a set of squares. Given a collection of images and a light source, as shown in Figure 11.2, it is required to compute the shadows that the images throw onto each other. In Figure 11.2, the parts of the object that are not highlighted are shadows. We assume that each processor ij knows whether or not pixel (i,j) is in the image. It is shown in [Dehn88e] that the shadows can be computed in O(N^(1/2)) time for a light source that is infinitely far away. The idea there is to solve the problem in strips that run parallel to the rays of light. The strip width needs to be chosen carefully to achieve the O(N^(1/2)) running time. A simpler algorithm that also computes the shadows in O(N^(1/2)) time is given in [Akl90]. Moreover, the algorithm works for any light source that is either inside or outside the screen. This algorithm is now described in some detail.
Computing shadows. Suppose that processor ij is the processor closest to the light source. We assume that this processor knows the location of the light source. Processor ij starts the computation, and in at most 2N^(1/2) steps all other processors can compute the shadows on their squares. Since all squares are equal in size, it can easily be seen that on each side of a square there can be at most one interval of light and one interval of shadow. The following algorithm computes all contours of images that receive light from the light source.
Algorithm Shadows
for all processors st (except processor ij) do in parallel
1. Wait for a message from neighboring processors; this message specifies the coordinates of the light source and the shadow interval on the edge that separates the square of processor st and the square of the sending processor.
2. Using the coordinates of the light source, processor st computes the number of messages it should receive.
3. The shadow intervals on the remaining edges can be computed from the coordinates of the light source, the received shadow intervals of one or more edges, and the knowledge that square (s,t) is or is not part of the image.

Figure 11.2 Point light source and some objects. The illuminated parts of the objects are highlighted.

The number of received messages in step 2 is either 0, 1, or 2. For example, if the light source is inside the screen, the coordinates of the light source are in a square (a,b), for some a and b such that 0 ≤ a, b < N^(1/2). Processor ij, where ij = ab, will send the initial messages. All processors st with s = a or t = b will receive one message. All other processors will receive two messages. On the other hand, suppose that the light source is outside the screen in a square with coordinates (a,b) for some a < 0 and 0 ≤ b < N^(1/2). Processor ij, where ij = 0b, will start the computation. All processors st with t = b will receive one message. All other processors will receive two messages. Similarly, it can be seen that the number of messages sent is at most 3, except possibly for the starting processor. To obtain the shadowing information of the remaining edges in step 3, it is
required only to compute the straight lines through the light source and the endpoints of the received shadow intervals, and to intersect these with the remaining edges. From the two observations above it can be concluded that steps 2 and 3 can be executed in constant time. Consequently, the overall running time of the algorithm is O(N^(1/2)).

Figure 11.3 Maze.

11.1.3 Path in a Maze Problem

We are given an N^(1/2) by N^(1/2) grid consisting of some white squares and some black squares, as shown in Figure 11.3, for N = 81. Two white squares, A (the origin) and B (the destination), are designated. It is required to find a shortest path from A to B along white squares, which avoids the black squares (the obstacles). Of course, there may be cases where no path exists from A to B (for example, if A were A' in Figure 11.3). Applications that require finding the shortest path in a maze include circuit design, robot motion planning, and computer graphics. It should be noted that in the restricted maze problem [Lee61], the shortest path can go from one square to another only through a common horizontal or vertical side. By contrast, our definition also allows the path to go diagonally from one square to the next through a common corner. In what follows we begin by presenting a sequential solution to the problem. We then describe a parallel algorithm for a systolic screen, first proposed in [Akl90].

Sequential solution. To solve the problem sequentially, we express it in graph theoretic terms. A weighted graph is used to represent the maze as follows:
1. A white (black) node is associated with a white (black) square.
2. Each black node is connected to its immediate neighbors (at most eight) by arcs of infinite weight.
3. Each white node is connected to every white immediate neighbor (at most eight) by an arc of finite weight.
Figure 11.4 Graph corresponding to maze in Figure 11.3.
4. The nodes associated with the origin and destination are marked A and B, respectively.

The resulting graph for the maze in Figure 11.3 is shown in Figure 11.4. For simplicity only arcs of finite weight are shown. The problem of finding a shortest path in a maze is now reduced to that of finding a shortest path in a graph. The graph has N nodes and e arcs, where e ≤ 2N^(1/2)(N^(1/2) - 1) = O(N). It is shown in [John77] that a shortest path in such a graph can be found in O(N log N) time. If it turns out that this path has infinite weight, we know that no path exists from origin to destination in the given maze.

Parallel solution. Let the squares in the maze be indexed in row-major order (i.e., left to right and top to bottom) from 1 (top left square) to N (bottom right square). Our parallel solution associates a processor i with square i. Strictly speaking, only the processors associated with white squares will be needed. However, we usually do not know in advance which squares will be white and which will be black. Thus in a computer graphics application, for example, each pixel is assigned a processor: the same pixel is white in some scenes, black in others. The parallel algorithm may be thought of as a wave that originates at A and sweeps the maze; if B is reached, a stream flows back from B to A, yielding the required shortest path. Thus the algorithm consists of two phases: the forward phase and the backward phase. Each phase requires at most N steps. This means that a shortest path is found (if one exists) in at most 2N steps. If the wave never reaches B (i.e., if
there is no path from origin to destination), the backward phase is never executed. In most cases, the two phases will overlap in time, with the second phase beginning before the first phase has ended. The algorithm is given below.

Algorithm Maze
Step 1. The origin square is assigned the label (t, a, 0), where t stands for temporary, a is the origin's index, and 0 is the distance from the origin to itself.
Step 2. for k = 1 to 2N do
  for i = 1 to N do in parallel
    (2.1) Once square i is assigned a label (t, j, d), for some j, it labels all its immediate unlabeled white neighbors, k, with (t, i, d + w_ik), where w_ik is the distance from i to k.
    (2.2) A square receiving more than one label simultaneously in (2.1) retains only one, the one with the smallest distance. If two or more labels are received with the same distance, the label with the smallest index is chosen.
    (2.3) Once the destination square is labeled (t, j, d), for some j, it changes its label to (f, j, d), where f stands for final.
    (2.4) Once a square is assigned a label (f, j, d), for some j, it changes the label of its neighbor j from (t, m, d) to (f, m, d), for some m.

When the algorithm terminates, the squares labeled final define the path from origin to destination (if one such path exists). Assuming that the squares represent pixels, a line can be drawn along those pixels labeled final. The algorithm runs in O(N) parallel time. Any distance function can be used for assigning weight values. For example, if we simply want to minimize the number of squares traveled, weights of 1 are assigned to all arcs. On the other hand, if Euclidean distance is to be minimized, we can assign weights of 1 to vertical and horizontal arcs, and weights of √2 to diagonal arcs. It should be clear that the algorithm's speed can be nearly doubled by initializing two waves simultaneously, one originating at A and the other at B.
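The forward wave and backward stream of Algorithm Maze can be sketched as a sequential simulation. The sketch below is illustrative only and makes several assumptions that are not in the text: the function name maze_wave and the '.'/'#' grid encoding are hypothetical, all arcs have weight 1 (the "number of squares" distance), squares are indexed from 0 rather than 1, and the backward phase runs only after the forward phase has finished instead of overlapping with it.

```python
def maze_wave(grid, origin, dest):
    """Return a shortest path as a list of (row, col) squares, or None if none exists.

    grid is a list of equal-length strings: '.' is a white square, '#' a black one.
    """
    rows, cols = len(grid), len(grid[0])
    label = {origin: (None, 0)}          # square -> (predecessor, distance), cf. (t, j, d)
    frontier = [origin]
    # Forward phase: each iteration is one synchronous step of the wave.
    while frontier and dest not in label:
        proposals = {}                   # labels offered to unlabeled white squares
        for (r, c) in frontier:
            d = label[(r, c)][1]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] != '.' or (nr, nc) in label:
                        continue
                    # Step (2.2): keep the smallest distance, then the smallest sender index.
                    cand = (d + 1, r * cols + c, (r, c))
                    if (nr, nc) not in proposals or cand < proposals[(nr, nc)]:
                        proposals[(nr, nc)] = cand
        for square, (d, _, pred) in proposals.items():
            label[square] = (pred, d)
        frontier = list(proposals)
    if dest not in label:
        return None                      # the wave never reached the destination
    # Backward phase: follow predecessor indices from the destination to the origin.
    path = [dest]
    while path[-1] != origin:
        path.append(label[path[-1]][0])
    return path[::-1]
```

With unit weights every square receives its final distance the first time the wave touches it, which is why labeling only unlabeled neighbors is sufficient in this simplified setting.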
11.1.4 Concluding Remarks

In this section, parallel algorithms were described for three geometric problems defined on a two-dimensional array of pixels. The algorithms presented are simple, efficient, and easy to implement. In each algorithm one processor is associated with every pixel. The algorithms differ, however, in the way the processors are interconnected. In the first algorithm, the processors conduct a geometric search independently of one another without the need to communicate among themselves. Consequently, the processors are not connected in any way. In the second algorithm, in order to compute the shadows created by images and a light source, each processor must communicate with its two horizontal and vertical neighbors. Finally, in the third algorithm, a shortest path is discovered between two given pixels, by allowing each processor to communicate
Figure 11.5 Instance of range searching problem.
with its diagonal neighbors (in addition to its horizontal and vertical neighbors). The discussion in this section suggests that using a grid of processors to solve raster geometric problems is an expedient strategy.
11.2 General Prefix Computations and their Applications

Consider the following problem defined in [Spri89], where it is called general prefix computation (GPC): Let f(1), f(2), ..., f(n) and y(1), y(2), ..., y(n) be two sequences of elements with a binary associative operator "*" defined on the f-elements, and a linear order "<" defined on the y-elements ...

... and P_η. In [Aker87a] and [Aker89] it is shown that there is an O(η log η)-length sequence of dimensions s_1, s_2, s_3, ..., s_p for S_η and another O(η log η)-length sequence of dimensions p_1, p_2, p_3, ..., p_q for P_η, 2 ≤ s_i ≤ η, 2 ≤ p_j ≤ η, such that broadcasting on S_η or P_η can be done by letting each processor send its message along the dimensions s_1, s_2, ..., s_p or p_1, p_2, ..., p_q. Since Ω(log(η!)) = Ω(η log η) is the lower bound for broadcasting on any network with η! nodes [Aker87a], assuming that each processor can communicate with only one neighbor in one time unit, all these broadcasting algorithms are optimal.
Computing prefix sums, ranks, maxima, and minima. Given elements x_0, x_1, ..., x_(N-1), stored in processors 0, 1, ..., N-1 in a network with processors ordered such that processor i < processor j if and only if i < j, and an associative binary operation *, the parallel prefix computation (defined in Chapter 2) is to compute all the quantities s_i = x_0 * x_1 * ... * x_i, i = 0, 1, ..., N-1. At the end of the computation we require that processor i contain s_i. Here, we refer to the problem as the prefix sums problem, since + is one possible binary associative operation. An O(η log η)-time algorithm for computing all prefix sums on X_η with respect to the processor ordering (Definition 2), using a constant-time routing scheme, is given below. The prefix sums computation on X_η is done using the procedure GROUP-COPY. Suppose that we have computed prefix sums for two groups of substructures as follows:

Group 1. X_(η-1)(i) ... X_(η-1)(i + k)
Group 2. X_(η-1)(i + k + 1) ... X_(η-1)(i + 2k + 1)

and that each processor holds two variables, s and t, for storing the partial prefix sum so far and the total sum of values in the group it is in, respectively. Let the total sum in group 1 be t_1 and the total sum in group 2 be t_2. We first use GROUP-COPY to send t_1 to every processor in group 2, and t_2 to every processor in group 1; then the prefix sums in processors in group 1 remain the same, while the prefix sum s in a processor in group 2 becomes s * t_1. The total sum for all the processors in both groups becomes t_1 * t_2. All these steps can be accomplished in O(1) time. When a group contains only one X_(η-1), the algorithm is called recursively. This leads to a running time of O(η log η). It is straightforward to state the algorithm formally. However, care must be taken since η is not necessarily a power of 2.

Assume now that some nodes in X_η are marked. The rank of a marked node u is the number of marked nodes that precede u. The ranks of all the marked nodes can be computed in O(η log η) time by applying the prefix sums algorithm, with * being the usual addition +, each marked node having value 1, and others having value 0 (the rank of a marked node is its prefix sum minus one). The maximum and minimum of η! values stored one per node in X_η can also be found in O(η log η) time by letting the binary associative operation in the prefix sums algorithm be max and min, respectively. The final result (either the maximum or the minimum) is reported in all processors. It is easy to see that the idea for broadcasting can also be used to find the maximum or minimum of η! elements in O(η log η) time on S_η or P_η.

Sorting, merging, unmerging, and finding cousins. Given a sequence of elements stored in a set of processors, with each processor holding one element, we say that the sequence is sorted in the F (forward) direction if for any two elements x and y held by processors p and q, respectively, p < q if and only if x < y.
The R (reverse) direction is defined similarly. The sequential lower bound of Ω((η!) log(η!)) on the number of steps required for sorting η! numbers [Knut73] implies a lower bound of Ω(log(η!)) = Ω(η log η) on the number of parallel steps needed to sort on both S_η and P_η. Sorting on S_η has been studied in [Menn90], in which an O(η^3 log η)-time algorithm
is given. This algorithm is based on a sorting algorithm for the mesh-connected computer given in [Sche89] and is outlined below as procedure η-Star Sort. We denote by D the direction of the final sorted sequence, where D can be either F or R. We also use D̄ to denote the direction opposite to D. Each iteration of step 2 in the procedure implements a merging algorithm.

Procedure η-Star Sort (D)
1. in parallel sort all the odd-numbered rows in the forward direction and all the even-numbered rows in the reverse direction recursively.
2. for j = 1 to ⌈log η⌉ do
   a. Starting with row 1, arrange all rows into groups of 2^j consecutively numbered rows (the last group may not have all 2^j rows).
   b. in parallel sort the columns within each group of rows in the direction D.
   c. in parallel
      1. sort the rows in odd-numbered groups by calling FTG (D);
      2. sort the rows in even-numbered groups by calling FTG (D̄).

Procedure FTG ("fixing the gap" as it is called in [Menn90]) is defined as follows:

Procedure FTG (D)
if the row is not a 1-star do
1. in parallel sort all columns in the direction D.
2. in parallel sort all rows with FTG (D).

It is important to note that if procedure FTG is called with a k-star, then, by Assumption 1, each row in step 2 is a (k - 1)-star. Also note that in step 2, FTG is applied to each row, with all rows being sorted in parallel. From the algorithms above we can see that sorting or merging on X_η is reduced to sorting on the columns. Since each column is connected as a linear array, odd-even transposition sort [Akl89a] can be applied. This means that given two sorted sequences stored in two groups of X_(η-1)'s:

A: X_(η-1)(i), X_(η-1)(i + 1), ..., X_(η-1)(j),
B: X_(η-1)(k), X_(η-1)(k + 1), ..., X_(η-1)(l),

i ≤ j < k ≤ l (A and B do not necessarily contain the same number of X_(η-1)'s), such that A and B are in opposite directions, they can be merged into a sorted sequence stored in

C: X_(η-1)(i), X_(η-1)(i + 1), ..., X_(η-1)(j), X_(η-1)(k), X_(η-1)(k + 1), ..., X_(η-1)(l),

in either direction in O(η^2) time. Let t(η) be the time to sort η! elements on S_η; then

t(η) = t(η - 1) + ⌈log η⌉ × O(η^2) = O(η^3 log η).
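The column sorts used above rely on odd-even transposition sort on a linear array. As a reminder of how that primitive behaves, here is a minimal sequential simulation. It is a sketch only: the function name and the reverse flag (standing in for the R direction) are illustrative assumptions, and each pass of the inner loop represents one parallel step in which disjoint neighboring pairs compare and exchange simultaneously.

```python
def odd_even_transposition_sort(values, reverse=False):
    """Sort a linear array in len(values) alternating compare-exchange steps."""
    a = list(values)
    n = len(a)
    for step in range(n):
        # Even steps compare pairs (0,1), (2,3), ...; odd steps compare (1,2), (3,4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            out_of_order = a[i] < a[i + 1] if reverse else a[i] > a[i + 1]
            if out_of_order:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

On a linear array of m processors the sort uses m such steps, each involving only neighbor-to-neighbor exchanges.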
It is not hard to see that the same sorting and merging algorithms also apply to P_η
since the nodes in columns of P_η, when arranged in an η × (η - 1)! array, can also be considered as connected (Assumption 2).

Now let A, B, and C as defined above be given, such that each element in C knows the rank of the node in which it was before the merging. The problem of unmerging is to permute the list to return each element in C to its original node in A or B. This operation is the inverse of merging. The problem can be solved by running the merging algorithm in reverse order, using the given rank information (the rank information is used to compute address coordinates, as defined in the following section; the unmerging procedure is basically an ASCEND-type algorithm). The problem of unmerging can also be solved by applying the operations of concentration and translation to be described later (concentrate the elements of A, then those of B, then translate the elements of B). Both approaches take O(η^2) time.

Let A and B be two sorted lists stored in two groups of X_(η-1)'s, and let a be an element of A. The cousins of a in B are two consecutive elements b_1 and b_2 in B, such that a lies between b_1 and b_2 in the sorted list resulting from merging A and B (we assume that B has two dummy elements, -∞ and +∞, for obvious reasons). The cousins in B of each element in A can be determined in O(η^2) time by merging and interval broadcasting (the latter is described further below).

Two classes of parallel algorithms. Suppose that all the nodes u_0, u_1, ..., u_(η!-1) in X_η have been ordered such that u_k ...
Figure 11.20 Implementing the BROADCAST instruction.
components such as registers) is only a constant factor greater than the cost of a comparator.
11.4.4 Concluding Remarks

We conclude our discussion of BSR by mentioning the following open problem. For many applications, we would like to be able to utilize simultaneously two tags, two selectors, and two limit values at the BROADCAST. In other words, we want a memory location to "accept" data fulfilling two conditions instead of one. A good example is the GPC that captures the core of many important problems, particularly in computational geometry. To solve this problem in constant time on the BSR model, we require "double selection": the first to select j_i < m and the second to select y(j_i) < y(m). The implementation of [Fava91] does not allow this operation, and it is not clear that it can be obtained by an extension of that implementation. Efficiently implementing double selection, and more generally multiple selection, remains a major open problem.
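To make the two conditions of double selection concrete, the following sequential reference sketch computes, for each position m, the reduction of all f(j) with j < m and y(j) < y(m). It is not a BSR implementation, only a specification-style illustration. The function name gpc, the use of 0-based indices, the strict inequalities (taken from the text as printed), and the choice of None for an empty selection are all assumptions.

```python
from functools import reduce


def gpc(f, y, op):
    """Sequential reference for the general prefix computation (O(n^2) time)."""
    n = len(f)
    result = []
    for m in range(n):
        # Double selection: first condition j < m, second condition y(j) < y(m).
        selected = [f[j] for j in range(m) if y[j] < y[m]]
        # Reduce the selected data with the associative operator, left to right.
        result.append(reduce(op, selected) if selected else None)
    return result
```

For example, gpc([1, 2, 3, 4], [2, 1, 3, 0], lambda a, b: a + b) selects f(0) and f(1) for position 2 and nothing for the other positions.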
11.5 Problems

11.1. In the geometric search problem of Section 11.1.1, we assumed that regions in a grid can have holes and disconnected components. Does lifting one or both of these assumptions result in a problem for which more efficient parallel algorithms exist?
11.2. Develop an alternative solution to the geometric search problem of Section 11.1.1 which requires O(n + w log n) time per INSERT or DELETE operation and O(1) time per RETURN TOP REGION operation, where n is the number of regions and w is the number of pixels in a region. How much storage does your algorithm require?
11.3. Design solutions to the shadow problem of Section 11.1.2 for each of the following models of computation:
(a) Mesh with broadcast buses
(b) Mesh with reconfigurable buses
(c) Modified CCC
11.4. It is pointed out at the end of Section 11.1.3 that the speed of Algorithm Maze can be nearly doubled by initializing two waves simultaneously, one originating at A and the other at B. Give a formal statement of an algorithm that makes use of this idea and runs on the same model of computation as Algorithm Maze.
11.5. Develop algorithms for solving the path in a maze problem on models of parallel computation other than the systolic screen. For example, how fast can the problem be solved on a hypercube?
11.6. On both the CREW PRAM and the mesh, the time to compute the GPC by the algorithm of Section 11.2.2 matches (up to a constant factor) the time required to sort. This is optimal since any GPC algorithm can be used to sort. On the hypercube, however, this is not the case. To date, the fastest algorithm for sorting n elements on an n-processor hypercube runs in O(log n log log n) time. On the other hand, the algorithm of Section 11.2.2 for computing the GPC on the hypercube takes O(log^2 n) time. Can you develop a faster algorithm for computing the GPC on the hypercube? Alternatively, can you show that the algorithm of Section 11.2.2 is optimal?
11.7. Suggest other problems where GPC might lead to efficient parallel algorithms.
11.8. It is stated in [Spri89] that the GPC can be computed in constant time on the BSR model using n processors. The algorithm given therein is as follows. Each processor i, 1 ≤ i ≤ n, broadcasts a tag y(i) and a datum f(i). Each memory location m, 1 ≤ m ≤ n, applies the selection operator < and the limit parameter y(m) to select those data f(j) for which j < m and y(j) < y(m). All selected data f(j) are then reduced, using the reduction operator *, to one value D_m that is finally stored in memory location m. It would appear, however, that this algorithm requires double selection. Indeed, we need to know that j < m and that y(j) < y(m) before including f(j) in the computation of D_m. The circuit implementing BSR and described in Section 11.4.3 does not allow double selection.
(a) Can the circuit of Section 11.4.3 be extended to allow double selection?
(b) Alternatively, can the GPC be computed on the existing implementation of BSR without the need for double selection?
(c) Regardless of what the answers to (a) and (b) might be, are there geometric problems where double (or, more generally, multiple) selection in BSR might be useful?
11.9. The algorithms described in Section 11.3.2 for sorting on the star and pancake networks run in O(η^3 log η) time. As suggested at the end of Section 11.3.3, however, there may be room for improvement since the lower bound on the time required for sorting η! elements using η! processors is Ω(log η!) [i.e., Ω(η log η)]. Faster algorithms for sorting on these networks would imply faster algorithms for solving numerous other problems, including several in computational geometry. Can you find such algorithms? Alternatively, can you show that they do not exist?
11.10. Are there other problems in computational geometry (besides the ones described in Section 11.3) that can be solved efficiently on star and pancake networks?
11.11. Given n points in the plane, it is required to find the two that are closest to each other (i.e., the closest pair). Can this problem be solved in constant time using n processors on the BSR model?
Hint: An extension of the BSR notation may be required. If the reduction operator is max or min, we may want the value stored to be the index of the maximum or minimum of a set of values rather than that value itself.
11.12. A set P of n points in the plane is given, where it is assumed for simplicity that no two points have the same x- or y-coordinate. Let x_min and x_max be the two points with minimum and maximum x-coordinate, respectively. The convex hull of P can be regarded as consisting of two convex polygonal chains: the upper hull, which goes from x_min to x_max (above the line segment (x_min, x_max)), and the lower hull, which goes from x_max to x_min (below the line segment (x_min, x_max)). Many convex hull algorithms are based on the idea of computing the upper hull and the lower hull separately. In fact, an algorithm for the upper hull requires minor modifications to produce the lower hull. Now consider the following algorithm for computing the upper hull of P [Ferr91a].
Algorithm Upper Hull
Step 1. Sort the points of P by their x-coordinates.
Step 2. For each point p of P do the following:
  1. Among all points to the right of p, find the point q such that the line through the segment (p,q) forms the largest angle with the horizontal.
  2. Label all points of P that fall below (p,q).
Step 3. All unlabeled points form the upper hull.
(a) Prove that this algorithm correctly finds the upper hull of P.
(b) Can this algorithm be implemented to run in constant time, using n processors, on the BSR model?
Hint: The hint in Problem 11.11 may be helpful here also.
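For readers who wish to experiment with Algorithm Upper Hull before attempting parts (a) and (b), a direct sequential transcription of its three steps might look as follows. This is a quadratic-time sketch, not a BSR implementation. The function names are hypothetical, points are assumed to be (x, y) tuples with distinct x-coordinates, "largest angle with the horizontal" is interpreted as largest slope, and "below (p,q)" is interpreted as lying strictly between p and q in x and strictly under the segment.

```python
def cross(p, q, r):
    """Signed area test: negative when r lies to the right of the directed line p -> q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def upper_hull(points):
    """Return the upper hull, from the leftmost to the rightmost point."""
    pts = sorted(points)                      # Step 1: sort by x-coordinate
    labeled = set()
    for i, p in enumerate(pts[:-1]):          # Step 2: every point with a right neighbor
        right = pts[i + 1:]
        # Step 2.1: the point q whose segment (p, q) has the largest slope.
        q = max(right, key=lambda r: (r[1] - p[1]) / (r[0] - p[0]))
        # Step 2.2: label the points lying below the segment (p, q).
        for r in right:
            if r[0] < q[0] and cross(p, q, r) < 0:
                labeled.add(r)
    return [p for p in pts if p not in labeled]   # Step 3: unlabeled points
```

The search for q in Step 2.1 is a maximum over a set of values whose result is a point rather than a value, which is the situation the hint in Problem 11.11 refers to.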
11.6 References [Agga88]
A. Aggarwal, B. Chazelle, L. J. Guibas, C. O'Dtnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327. [Agga921 A. Aggarwal, Ed., Special Issue: Parallel Computational Geometry, Algorithmica, Vol. 7, No. 1, 1992. [Ajta83] M. Ajtai, J. Koml6s, and E. Szemeredi, An O(n logn) sorting network, Combinatorica, Vol. 3, 1983, 1-19. [Aker87a] S. B. Akers, D. Harel, and B. Krishnamurthy, The star graph: an attractive alternative to the n-cube, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 393-400. [Aker87b] S. B. Akers and B. Krishnamurthy, The fault tolerance of star graphs, Proceedings of the Second International Conference on Supercomputing, San Francisco, May 1987. [Aker89] S. B. Akers and B. Krishnamurthy, A group theoretic model for symmetric interconnection networks, IEEE Transactions on Computers, Vol. C-38, No. 4, 1989, 555-566. [Akl89a] S. G. Akl, The Design and Analysis of ParallelAlgorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
Sec. 11.6 [Akl89b]
[Akl89c] [Akl89d] [Akl90] [AkI91aJ
[Akl91b]
[Ak191c]
[Akl91d]
[Aki9le]
[Alt87]
[Ayka9l]
[Batc68]
[Bern88]
[Beye69] [Blel89] [Boxe89a]
References
183
S. G. Akl, On the power of concurrent memory access, in Computing and Information, R. Janicki and W. W. Koczkodaj (Editors), Elsevier, New York, Proceedings of the International Conference on Computing and Information, ICCI '89, Toronto, 1989, 49-55. S. G. Aki and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520. S. G. AkU, Reflections on a parallel model of computation, Invited talk, First Great Lakes Computer Science Conference, Kalamazoo, Michigan, October 1989. S. G. Akl, H. Meijer, and D. Rappaport, Parallel computational geometry on a grid, Computers and Artificial Intelligence, Vol. 9, No. 5, 1990, 461-470. S. G. Akl, Parallel synergy: can a parallel computer be more efficient than the sum of its parts? Proceedings of the Thirteenth IMACS World Congress on Computation and Applied Mathematics, Dublin, July 1991. S. G. Akl, Memory access in models of parallel computation: from folklore to synergy and beyond, in Algorithms and Data Structures, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1991, 92-104. S. G. AkI and G. R. Guenther, Application of BSR to the maximal sum subsegment problem, International Journal of High Speed Computing, Vol. 3, No. 2, June 1991, 107-119. S. G. AkU, K. Qiu, and I. Stojmenovi6, Computational geometry on the star and pancake networks, Proceedings of the Third Annual Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 252-255. S. G. Akl, K. Qiu, and I. Stojmenovic, Data communication and computational geometry on the star and pancake interconnection networks, Proceedings of the Third Symposium on Parallel and Distributed Processing, Dallas, December 1991, 415-422. H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata, Deterministic simulation of idealized parallel computers on more realistic ones, SIAM Journal on Computing, Vol. 16, No. 5, October 1987, 808-835. C. Aykanat and T. M. 
Kurq, Efficient parallel maze routing algorithms on a hypercube multicomputer, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Architectures, 224-227. K. E. Batcher, Sorting networks and their applications, Proceedings of the AFIPS 1968 Spring Joint Computer Conference, Atlantic City, New Jersey, April/May 1968, 307-314. M. Bern, Hidden surface removal for rectangles, Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 183-192. W. T. Beyer, Recognition of topological invariants by iterative arrays, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1969. G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538. L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
184
Current Trends
Chap. 11
[Brow79b] K. Q. Brown, Geometric transforms for fast geometric algorithms, Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1979.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dehn88e] F. Dehne, A. Hassenklover, J.-R. Sack, and N. Santoro, Parallel visibility on a mesh connected parallel computer, in Parallel Processing and Applications, E. Chiricozzi and A. D'Amico (Editors), North-Holland, Amsterdam, 1988, 203-210.
[Dehn91a] F. Dehne and S. E. Hambrusch, Parallel algorithms for determining k-width connectivity in binary images, Journal of Parallel and Distributed Computing, Vol. 12, No. 1, May 1991, 12-23.
[Dehn91b] F. Dehne, Ed., Special Issue: Parallel Algorithms for Geometric Problems on Digitized Pictures, Algorithmica, Vol. 6, No. 5, 1991.
[Fava90] L. Fava, The design of an efficient BSR network, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1990.
[Fava91] L. Fava Lindon and S. G. Akl, An Optimal Implementation of Broadcasting with Selective Reduction, Technical Report 91-298, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Ferr91a] A. G. Ferreira, personal communication, 1991.
[Four88] A. Fournier and D. Fussel, On the power of the frame buffer, ACM Transactions on Computer Graphics, Vol. 7, 1988, 103-128.
[Free75] H. Freeman and R. Shapira, Determining the minimal area rectangle for an arbitrary closed curve, Communications of the ACM, Vol. 18, 1975, 409-413.
[Gibb88] A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, England, 1988.
[Good92a] M. T. Goodrich, Ed., Special Issue: Parallel Computational Geometry, International Journal on Computational Geometry and Applications, 1992.
[Grah72] R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Information Processing Letters, Vol. 1, 1972, 132-133.
[Guib82] L. J. Guibas and J. Stolfi, A language for bitmap manipulation, ACM Transactions on Computer Graphics, Vol. 1, 1982, 191-214.
[Houl85] M. E. Houle and G. T. Toussaint, Computing the width of a set, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, 1985, 1-7.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[John77] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the ACM, Vol. 24, No. 1, 1977, 1-13.
[Jwo90] J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall, Embedding of cycles and grids in star graphs, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, Dallas, December 1990, 540-547.
[Karp90] R. M. Karp and V. Ramachandran, A survey of parallel algorithms for shared memory machines, in Handbook of Theoretical Computer Science, J. van Leeuwen (Editor), North-Holland, Amsterdam, 1990, 869-941.
[Knut73] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, Massachusetts, 1973.
[Kuce82] L. Kucera, Parallel computation and conflict in memory access, Information Processing Letters, Vol. 14, No. 2, April 1982, 93-96.
[Lang76] T. Lang and H. S. Stone, A shuffle-exchange network with simplified control, IEEE Transactions on Computers, Vol. C-25, No. 1, January 1976, 55-65.
[Lee61] C. Y. Lee, An algorithm for path connections and its applications, IRE Transactions on Electronic Computers, Vol. EC-10, No. 3, 1961, 346-365.
[Lee86a] D. T. Lee, Geometric location problems and their complexity, Proceedings of the Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, No. 233, Springer-Verlag, Berlin, 1986, 154-167.
[Lee86b] D. T. Lee and Y. F. Wu, Geometric complexity of some location problems, Algorithmica, Vol. 1, 1986, 193-211.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Menn90] A. Menn and A. K. Somani, An efficient sorting algorithm for the star graph interconnection network, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 1-8.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Nass80] D. Nassimi and S. Sahni, Finding connected components and connected ones on a mesh-connected parallel computer, SIAM Journal on Computing, Vol. 9, No. 4, November 1980, 744-757.
[Nass81] D. Nassimi and S. Sahni, Data broadcasting in SIMD computers, IEEE Transactions on Computers, Vol. C-30, No. 2, February 1981, 101-106.
[Nass82] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, Journal of the ACM, Vol. 29, No. 3, 1982, 642-667.
[Niga90] M. Nigam, S. Sahni, and B. Krishnamurthy, Embedding hamiltonians and hypercubes in star interconnection graphs, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 340-343.
[Parb87] I. Parberry, Parallel Complexity Theory, Research Notes in Theoretical Computer Science, Pitman Publishing, London, 1987.
[Pate90] M. S. Paterson, Improved sorting networks with O(log N) depth, Algorithmica, Vol. 5, 1990, 75-92.
[Prea88] B. T. Preas, M. J. Lorenzetti, and B. D. Ackland (Editors), Physical Design Automation of Electronic Systems, Benjamin-Cummings, Menlo Park, California, 1988.
[Prep81] F. P. Preparata and J. Vuillemin, The cube-connected-cycle: a versatile network for parallel computation, Communications of the ACM, Vol. 24, No. 5, 1981, 300-309.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Qiu91a] K. Qiu, H. Meijer, and S. G. Akl, Parallel routing and sorting on the pancake network, Proceedings of the International Conference on Computing and Information, Ottawa, May 1991, Lecture Notes in Computer Science, No. 497, Springer-Verlag, Berlin, 360-371.
[Qiu91b] K. Qiu, H. Meijer, and S. G. Akl, Decomposing a star graph into disjoint cycles, Information Processing Letters, Vol. 39, No. 3, August 1991, 125-129.
[Qiu91c] K. Qiu, S. G. Akl, and H. Meijer, The Star and Pancake Interconnection Networks: Properties and Algorithms, Technical Report 91-297, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Rana87] A. G. Ranade, How to emulate shared memory, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 185-194.
[Rank90] S. Ranka and S. Sahni, Hypercube Algorithms with Application to Image Processing and Pattern Recognition, Springer-Verlag, New York, 1990.
[Rey87] C. Rey and R. Ward, On determining the on-line minimax linear fit to a discrete point set in the plane, Information Processing Letters, Vol. 24, No. 2, 1987, 97-101.
[Roth76] J. Rothstein, On the ultimate limitations of parallel processing, Proceedings of the 1976 International Conference on Parallel Processing, Detroit, August 1976, 206-212.
[Sche89] I. D. Scherson and S. Sen, Parallel sorting in two-dimensional VLSI models of computation, IEEE Transactions on Computers, Vol. C-38, No. 2, 1989, 238-249.
[Sham78] M. I. Shamos, Computational geometry, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, Connecticut, 1978.
[Shan50] C. E. Shannon, Memory requirements in a telephone exchange, Bell Systems Technical Journal, Vol. 29, 1950, 343-349.
[Shi91] X. Shi, Contributions to sequence problems, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1991.
[Snyd86] L. Snyder, Type architectures, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. 1, 1986, 289-317.
[Spri89] F. Springsteel and I. Stojmenović, Parallel general prefix computations with geometric, algebraic, and other applications, International Journal of Parallel Programming, Vol. 18, No. 6, December 1989, 485-503.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Thom77] C. D. Thompson and H. T. Kung, Sorting on a mesh-connected parallel computer, Communications of the ACM, Vol. 20, No. 4, 1977, 263-271.
[Tous83] G. T. Toussaint, Solving geometric problems with the "rotating calipers", Proceedings of IEEE MELECON'83, Athens, May 1983.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
[Vish84] U. Vishkin, A parallel-design distributed-implementation (PDDI) general-purpose computer, Theoretical Computer Science, Vol. 32, 1984, 157-172.
[Won87] Y. Won and S. Sahni, Maze routing on a hypercube multiprocessor computer, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 630-637.
12 Future Directions
As its title indicates, this final chapter aspires to point toward some directions for future research in parallel computational geometry. We discuss implementing data structures on network models, problems related to visibility (such as art gallery, illumination and stabbing problems), geometric optimization using neural nets, arrangements, P-complete problems, and dynamic computational geometry.
12.1 Implementing Data Structures on Network Models
The most important feature of the PRAM, and the reason for its power, is the common memory shared by the processors. Not only does the shared memory serve as a communication medium for the processors, but it also allows a direct implementation of complex data structures, in a manner very similar to the way they are implemented in the memory of a sequential computer. The situation is considerably different for processor networks, where the memory is no longer shared but is instead distributed among the processors. Implementing data structures in the latter situation poses a more serious challenge.
The problem of implementing data structures on a hypercube is addressed in [Dehn90]. A class of graphs called ordered h-level graphs is defined, which includes most of the standard data structures. It is shown in [Dehn90] that for such a graph with n nodes stored on a hypercube, $O(n)$ search processes can be executed in parallel. This allows efficient solutions to be obtained for the following two problems:
1. Given a set of n line segments in the plane, consider one endpoint of some segment p in the set. Let two rays emanate from that endpoint in the direction of the positive and negative y-axis, respectively. The rays intersect at most two other segments, called the trapezoidal segments for that endpoint. Computing the trapezoidal map of the set consists of determining, for each endpoint, its trapezoidal segments.
2. Given an n-vertex simple polygon P, it is required to triangulate P.
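The definition of trapezoidal segments in problem 1 can be made concrete with a small sequential sketch. This is a brute-force illustration of the definition only, not the hypercube algorithm of [Dehn90]; the function names (`y_at`, `trapezoidal_segments`, `trapezoidal_map`) are hypothetical, and vertical segments and segments that merely touch the endpoint are ignored for simplicity.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def y_at(seg: Segment, x: float) -> Optional[float]:
    """y-coordinate of seg at abscissa x, or None if x lies outside the
    segment's x-range (vertical segments are skipped: an assumption)."""
    (x1, y1), (x2, y2) = seg
    if x1 == x2 or not (min(x1, x2) <= x <= max(x1, x2)):
        return None
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def trapezoidal_segments(segments: List[Segment], owner: int, p: Point):
    """Indices of the first segments hit by the upward and downward vertical
    rays from endpoint p of segments[owner]; None where a ray hits nothing."""
    px, py = p
    above = below = None  # (y, index) of the closest candidate found so far
    for i, seg in enumerate(segments):
        if i == owner:
            continue
        y = y_at(seg, px)
        if y is None:
            continue
        if y > py and (above is None or y < above[0]):
            above = (y, i)
        elif y < py and (below is None or y > below[0]):
            below = (y, i)
    return (above[1] if above else None, below[1] if below else None)


def trapezoidal_map(segments: List[Segment]) -> Dict:
    """Trapezoidal segments for every endpoint, by exhaustive search."""
    return {(i, p): trapezoidal_segments(segments, i, p)
            for i, seg in enumerate(segments) for p in seg}
```

Each endpoint is examined against every other segment, so the sketch performs a quadratic number of tests; the parallel results quoted in this section concern doing far better than that.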
It is shown in [Dehn90] that both of these problems can be solved on a hypercube of size $O(n \log n)$ in time $O(\log^2 n)$. (Recall that faster algorithms to solve these problems exist for the more powerful CREW PRAM model, as shown in Chapter 10.)
Consider a data structure modeled as a graph G with n nodes of constant degree. The multisearch problem calls for efficiently performing $O(n)$ search processes on such a data structure. An additional condition on the problem is that each search path is defined on-line (i.e., once a search query reaches some node v of G, it then determines which node of G it should visit next, using information stored at v). In [Atal91a] the multisearch problem is solved for numerous classes of data structures in $O(n^{1/2} + r n^{1/2}/\log n)$ time on a mesh-connected computer of size n, where n is the size of the data structure and r is the length of the longest search path. This result leads to an optimal $O(n^{1/2})$-time algorithm for the three-dimensional convex hull problem on a mesh of size n with constant storage per processor (see Section 3.5.1).
These results suggest that implementing data structures on processor networks is a worthwhile endeavor that deserves to be pursued for other models besides the hypercube and the mesh.
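The on-line condition in the multisearch problem can be illustrated with a sequential simulation. The sketch below is only meant to show the shape of the problem, assuming a hypothetical `next_node` callback and a small binary search tree as the graph G; it says nothing about how [Atal91a] routes queries on a mesh.

```python
from typing import Callable, List, Optional


def multisearch(start: int,
                queries: List[float],
                next_node: Callable[[int, float], Optional[int]]) -> List[List[int]]:
    """Advance all queries in lockstep rounds. In each round every active
    query moves to the node chosen by next_node, which may look only at the
    query and at the node currently visited (the on-line condition)."""
    paths = [[start] for _ in queries]
    active = set(range(len(queries)))
    while active:
        finished = set()
        for q in active:  # on a parallel machine these steps are simultaneous
            nxt = next_node(paths[q][-1], queries[q])
            if nxt is None:
                finished.add(q)
            else:
                paths[q].append(nxt)
        active -= finished
    return paths


# Hypothetical example graph: a binary search tree, node -> (key, left, right).
NODES = {0: (50, 1, 2), 1: (25, 3, 4), 2: (75, None, None),
         3: (10, None, None), 4: (30, None, None)}


def bst_step(v: int, x: float) -> Optional[int]:
    """Decide the next node using only the information stored at v."""
    key, left, right = NODES[v]
    if x == key:
        return None
    return left if x < key else right


paths = multisearch(0, [30, 75, 5], bst_step)
print(paths)
```

The number of rounds executed equals the length of the longest search path, which plays the role of r in the bound quoted above.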
12.2 Problems Related to Visibility
Illumination and stabbing are two problems related to visibility (see Chapter 6) that are the subject of intense study in computational geometry, but for which parallel algorithms are yet to be developed.
12.2.1 Art Gallery and Illumination Problems
The fundamental art gallery problem, now a classic in computational geometry, asks for the minimum number of guards sufficient to cover the interior of an art gallery room with n walls. The origins of this problem are traced in [O'Rou87], where many interesting variations and results are also presented. One version (giving rise to a family of problems) asks for the minimum number of lights sufficient to illuminate a set of objects (lines, triangles, rectangles, circles, convex sets, etc.). Here a light source is stationary and can illuminate 360 degrees about its position. For example, as shown in Figure 12.1, three light sources suffice to illuminate six straight-line segments. Although the topic is currently receiving considerable attention [Czyz89a, Czyz89b, Czyz89c, Urru89, Sher90], to our knowledge only one parallel algorithm has been developed to solve art gallery and illumination problems: It is shown in [Agga88] how an optimal placement of guards in an art gallery (in the shape of an n-vertex simple polygon) can be obtained in $O(\log n)$ time on an $O(n \log n)$-processor CREW PRAM.
Figure 12.1 Three light sources suffice to illuminate six straight-line segments.
12.2.2 Stabbing
The following problems, studied in [Pell90] for a set T of triangles in 3-space, are representative of a wide range of stabbing problems:
1. Query problem: Given a line, does it stab T (i.e., does the line intersect each triangle in the set)?
2. Existence problem: Does a stabbing line exist for T?
3. Ray shooting problem: Which is the first triangle in T hit by a given ray?
Typically, the objects being stabbed range from lines to polygons to polyhedra, while ray shooting (also called ray tracing) is used to enumerate pairs of visible faces of a polyhedron and to determine whether certain complex objects (such as nonconvex polyhedra) intersect. We are not aware of any effort toward developing parallel algorithms for these problems.
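To make the query and ray shooting problems concrete, here is a minimal sequential sketch that tests each triangle in turn. It uses the well-known Möller-Trumbore line-triangle test, which is a substitute chosen for illustration and not the method of [Pell90]; all names are hypothetical, and lines lying in (or parallel to) a triangle's plane are treated as misses.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Triangle = Tuple[Vec, Vec, Vec]
EPS = 1e-12  # tolerance for "parallel to the triangle's plane" (an assumption)


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_parameter(p: Vec, d: Vec, tri: Triangle) -> Optional[float]:
    """Parameter t with p + t*d inside tri, or None if the line misses it."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(d, e2)
    a = dot(e1, h)
    if abs(a) < EPS:          # line parallel to the plane: counted as a miss
        return None
    f = 1.0 / a
    s = sub(p, v0)
    u = f * dot(s, h)         # first barycentric coordinate of the hit point
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = f * dot(d, q)         # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    return f * dot(e2, q)


def stabs(p: Vec, d: Vec, triangles: List[Triangle]) -> bool:
    """Query problem: does the line through p with direction d hit every triangle?"""
    return all(line_parameter(p, d, t) is not None for t in triangles)


def ray_shoot(p: Vec, d: Vec, triangles: List[Triangle]) -> Optional[int]:
    """Ray shooting: index of the first triangle hit by the ray from p along d."""
    best, best_t = None, None
    for i, tri in enumerate(triangles):
        t = line_parameter(p, d, tri)
        if t is not None and t >= 0.0 and (best_t is None or t < best_t):
            best, best_t = i, t
    return best
```

Both routines examine the triangles independently of one another, which is what makes these problems natural candidates for parallel treatment; the existence problem is harder because the line itself is unknown.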
12.3 Geometric Optimization Using Neural Nets
Investigations into the use of neural nets for solving optimization problems in computational geometry are currently under way [Dehn92]. In a neural net, each node (or neuron) is a processor connected to other neurons. A threshold d(i) is associated with each neuron N(i). A weight w(i, j) is associated with the edge leaving neuron N(i) and entering neuron N(j). A value v(i, t + 1) is N(i)'s output at time t + 1: It is a function of d(i), w(j, i), and v(j, t) for all N(j) connected to N(i) by an edge directed from N(j) to N(i). Usually, v(i, t) is a "step" function whose value is 0 unless its input exceeds a certain threshold, in which case its value is 1. A near-optimal solution to a problem is found by minimizing an energy function E. The neurons operate simultaneously, and after each iteration it is determined whether E has reached a (local) minimum, or whether a new iteration is to be performed [Hopf85, Rama88].
As an example, consider the following problem: Given an n-vertex simple polygon, it is required to triangulate the polygon such that the sum of the lengths of the edges forming the triangulation is a minimum. We associate one neuron with each of the n(n - 1)(n - 2)/6 triangles determined by triples of polygon vertices. Triangle T(i) will be said to belong to the optimal triangulation if and only if v(i, t), the output of neuron N(i), is 1. The triangulation sought must satisfy the following conditions:
1. Each boundary edge (i.e., each edge of the given polygon) belongs to exactly one triangle.
2. Each interior edge (i.e., each edge added to create a triangulation) is shared by exactly two triangles.
3. The area of the given polygon is equal to the sum of the areas of the triangles forming the triangulation.
4. The sum of the perimeters of the triangles forming the triangulation is minimal.
A function E of the outputs v(i, t) is thus derived using conditions 1 through 4, whose minimum corresponds to the optimal triangulation.
One of the main difficulties of this approach is in choosing appropriate values for the many parameters used. These parameters include the thresholds d(i) and the weights w(i, j) used in computing the v(i, t) at each iteration. They also include the various multiplicative and additive constants used in the expression for E. Another difficulty is in the choice of the initial values for the v(i, t). It should also be emphasized that there is no guarantee that the solution obtained is indeed optimal or that the process converges quickly. However, this field is still in its infancy and it is clear that a lot of work and new insights are still needed.
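The update rule described at the start of this section can be sketched in a few lines. The following is an illustration under stated assumptions: the step rule fires when the weighted input exceeds d(i), the energy is taken to be the standard Hopfield form (the text does not give an expression for E), and the two-neuron network at the end is a made-up toy, not an encoding of the triangulation problem.

```python
from typing import List, Tuple


def step(x: float) -> int:
    """Step function: 1 if the net input is positive, 0 otherwise."""
    return 1 if x > 0 else 0


def synchronous_step(v: List[int], w: List[List[float]], d: List[float]) -> List[int]:
    """v(i, t+1) from d(i), w(j, i) and v(j, t); all neurons update together."""
    n = len(v)
    return [step(sum(w[j][i] * v[j] for j in range(n)) - d[i]) for i in range(n)]


def energy(v: List[int], w: List[List[float]], d: List[float]) -> float:
    """Assumed Hopfield-style energy: -1/2 sum w(i,j) v(i) v(j) + sum d(i) v(i)."""
    n = len(v)
    pair = sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return -0.5 * pair + sum(d[i] * v[i] for i in range(n))


def run(v: List[int], w: List[List[float]], d: List[float],
        max_iters: int = 100) -> Tuple[List[int], bool]:
    """Iterate until a fixed point is reached or max_iters is exhausted."""
    for _ in range(max_iters):
        nxt = synchronous_step(v, w, d)
        if nxt == v:
            return v, True
        v = nxt
    return v, False


# Toy network: two mutually inhibiting neurons ("pick exactly one").
W = [[0, -2], [-2, 0]]
D = [-1, -1]
print(run([1, 0], W, D), energy([1, 0], W, D))
print(run([0, 0], W, D, max_iters=10))
```

Started from [1, 0] the toy network sits at a fixed point of lower energy, while started from [0, 0] the simultaneous updates oscillate between [0, 0] and [1, 1]. This small case shows why the choice of initial values and parameters matters, as noted above.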
12.4 Parallel Algorithms for Arrangements
A problem that has recently gained the attention of designers of parallel algorithms is that of constructing arrangements. The problem is to determine the geometric structure of the intersections of objects in space and, in particular, the structure of the intersections of lines in the plane. Given a set L of lines in the plane, their arrangement A(L) is a subdivision of the plane. Algorithms for a single processor can be found in [Edel86b] and [Edel90]. Arrangements of n lines in two dimensions can be computed in $O(n^2)$ time, and arrangements of n hyperplanes in d dimensions can be computed in $O(n^d)$ time with a single processor [Edel86b]. In [Ande90], an algorithm for computing the arrangement of n lines in two dimensions on a CREW PRAM is given that runs in $O(\log n \log^* n)$ time and uses $O(n^2/\log n)$ processors. This algorithm is generalized in [Ande90] to compute the arrangement of n hyperplanes in d dimensions on a CREW PRAM in $O(\log n \log^* n)$ time using $O(n^d/\log n)$ processors. It is shown in [Hage90] how the latter problem can be solved in $O(\log n)$ time by a randomized algorithm that runs on an ARBITRARY CRCW PRAM and uses $O(n^d/\log n)$ processors. An EREW PRAM algorithm for constructing an arrangement of n lines on-line is also given in [Ande90], where each insertion is done optimally in $O(\log n)$ time using $O(n/\log n)$ processors. Finally, several CREW PRAM algorithms are given in [Good90b] that use generalized versions of parallel plane sweeping to solve a number of problems including hidden surface elimination and constructing boundary representations for collections of objects.
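For readers who want a concrete picture of what an arrangement of lines records, the following brute-force sketch lists, for each line, its intersections with the other lines in sorted order. It is a sequential illustration only and does not reflect the algorithms of [Ande90] or [Hage90]. It assumes non-vertical lines y = m*x + q with integer coefficients (so that exact rational arithmetic can be used); the function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Line = Tuple[int, int]  # (m, q) for the line y = m*x + q, integers assumed


def intersections_per_line(lines: List[Line]):
    """For each line, the list of (x, y, other_index) intersection records,
    sorted by x-coordinate. Parallel lines contribute no record."""
    result = []
    for i, (m1, q1) in enumerate(lines):
        points = []
        for j, (m2, q2) in enumerate(lines):
            if i == j or m1 == m2:
                continue
            x = Fraction(q2 - q1, m1 - m2)   # solve m1*x + q1 = m2*x + q2
            points.append((x, m1 * x + q1, j))
        points.sort()
        result.append(points)
    return result


# Three lines in general position: y = 0, y = x, y = -x + 2.
arrangement = intersections_per_line([(0, 0), (1, 0), (-1, 2)])
for index, points in enumerate(arrangement):
    print(index, [(str(x), str(y), j) for x, y, j in points])
```

With n lines in general position each list has n - 1 entries and the arrangement has n(n - 1)/2 vertices, which is consistent with the quadratic sequential bound quoted above.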
12.5 P-Complete Geometric Problems
A problem of size n is said to belong to the class NC if there exists a parallel algorithm for its solution which uses $O(n^p)$ processors and runs in $O(\log^q n)$ time, where p and q are nonnegative constants. Let P be the class of problems solvable sequentially in time polynomial in the size of the input. A problem $\Pi$ in the class of P-complete problems has the following two properties:
1. It is not known whether $\Pi$ is in NC.
2. If $\Pi$ is shown to be in NC, then P = NC [Cook85].
It is shown in [Atal90b] that a number of geometric problems in the plane belong to the class of P-complete problems. This work suggests that there may exist other natural two-dimensional geometric problems for which there is no algorithm that runs in polylogarithmic time while using a polynomial number of processors.
12.6 Dynamic Computational Geometry
In applications such as robotics, graphics, and air traffic control, it is often required to determine geometric properties of systems of moving objects. As these applications usually occur in real-time environments, a parallel (and hence fast) solution is of great value. In an abstract setting, we are given a number of points (objects) that are moving in Euclidean space, with the added condition that for each point (object) every coordinate of its motion is a polynomial of bounded degree in the time variable. This formulation is used in [Boxe89b] to derive CREW PRAM algorithms for several problems, including the nearest neighbor, closest pair, collision, convex hull, and containment problems. For n moving objects, the algorithms typically run in $O(\log^2 n)$ time, using a number of processors that is only slightly more than linear in n. Mesh and hypercube algorithms for the same problems are described in [Boxe89a]. We feel that these important contributions have barely scratched the surface, however, and that many well-known computational geometric problems with applications to dynamic systems await parallel solutions.
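The abstract setting above, in which every coordinate of a moving point is a polynomial in time, can be sketched as follows. This is a sequential illustration with hypothetical names, not the CREW PRAM method of [Boxe89b]: a moving point is a tuple of coefficient lists (lowest degree first), and the squared distance between two points is itself a polynomial in t whose degree is at most twice the motion degree.

```python
from typing import List, Optional, Sequence, Tuple

Poly = List[float]                 # coefficients, lowest degree first
MovingPoint = Tuple[Poly, ...]     # one polynomial per coordinate


def poly_eval(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a polynomial at t by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def poly_add(a: Poly, b: Poly) -> Poly:
    n = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0)
            for k in range(n)]


def poly_sub(a: Poly, b: Poly) -> Poly:
    return poly_add(a, [-c for c in b])


def poly_mul(a: Poly, b: Poly) -> Poly:
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def squared_distance_poly(p: MovingPoint, q: MovingPoint) -> Poly:
    """Polynomial in t giving the squared distance between p(t) and q(t)."""
    total: Poly = [0]
    for a, b in zip(p, q):
        diff = poly_sub(a, b)
        total = poly_add(total, poly_mul(diff, diff))
    return total


def nearest_neighbor(points: List[MovingPoint], i: int, t: float) -> Optional[int]:
    """Index of the point closest to points[i] at time t (brute force)."""
    best, best_d2 = None, None
    for j, q in enumerate(points):
        if j == i:
            continue
        d2 = poly_eval(squared_distance_poly(points[i], q), t)
        if best_d2 is None or d2 < best_d2:
            best, best_d2 = j, d2
    return best


# Two points moving toward each other on the x-axis and one stationary point.
P = ([0, 1], [0])     # position (t, 0)
Q = ([4, -1], [0])    # position (4 - t, 0)
R = ([10], [0])       # position (10, 0)
print(squared_distance_poly(P, Q))
```

In this example the squared distance between P and Q vanishes at t = 2, so the two points collide there, and the nearest neighbor of P changes from Q to R as time advances. Tracking how such answers change over time is what distinguishes the dynamic problems from their static counterparts.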
12.7 Problems
12.1. The standard data structures of sequential computation include linked lists, queues, stacks, trees, and so on.
(a) Show how the standard data structures of sequential computation can be implemented on a linear array of processors.
(b) Investigate problems in computational geometry where the implementations of part (a) lead to efficient parallel solutions.
12.2. Repeat Problem 12.1 for the following models of parallel computation:
(a) Tree
(b) Mesh-of-trees
(c) Modified AKS network
(d) Pyramid
12.3. A straight line that intersects each member of a set S of geometric objects is called a stabbing line, or a transversal, for S. Assume that S consists of n isothetic unit squares in the plane and that a transversal for S exists. Design a parallel algorithm for finding a placement of the minimum number of lights sufficient to illuminate the squares.
12.4. Let P be a simple polygon with n vertices, and let k be a fixed integer, 1 < k < n. It is required to place k guards inside P so that the area of P visible to the guards is maximized. Design algorithms for solving this problem on the following models of parallel computation:
(a) Hypercube
(b) Modified CCC
(c) Scan
12.5. Given a simple polygon P, it is required to find the shortest closed path in P such that every point of P is visible from some point on the path. Develop a parallel algorithm for solving this problem.
12.6. Assume that a set S of straight-line segments is given. It is required to compute a transversal of S of minimum length. Discuss various parallel solutions to this problem on different models of computation.
12.7. Repeat Problem 12.6 for the case where S is a set of convex polygons.
12.8. Develop neural net solutions to the following geometric optimization problems defined on a set of planar points:
(a) Minimum spanning tree
(b) Minimum-weight perfect matching
(c) Minimum-weight triangulation
12.9. Two sets of points in the plane are said to be linearly separable if a straight line can be found such that the two sets are on different sides of the line. Design a neural net algorithm for testing linear separability of two sets of points in the plane.
12.10. Given a set L of n straight lines in the plane, it is required to compute the arrangement A(L) of L on a hypercube computer. Your algorithm should produce, for each line in L, a sorted list of its intersections with other lines in L.
12.11. Are there problems in computational geometry that you suspect are not in NC?
12.12. Design and compare algorithms for solving the nearest-neighbor problem for a set of moving points in the plane on the following models of parallel computation:
(a) Linear array
(b) Mesh
(c) d-Dimensional mesh
(d) Mesh with broadcast buses
(e) Mesh with reconfigurable buses
12.8 References
[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Ande90] R. Anderson, P. Beame, and E. Brisson, Parallel algorithms for arrangements, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 298-306.
[Atal90b] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-complete geometric problems (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 317-326.
[Atal91a] M. J. Atallah, F. Dehne, R. Miller, A. Rau-Chaplin, and J.-J. Tsay, Multisearch techniques for implementing data structures on a mesh-connected computer, Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July 1991, 204-214.
[Boxe89a] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
[Boxe89b] L. Boxer and R. Miller, Parallel dynamic computational geometry, Journal of New Generation Computer Systems, Vol. 2, No. 3, 1989, 227-246.
[Cook85] S. A. Cook, A taxonomy of problems with fast parallel algorithms, Information and Control, Vol. 64, 1985, 2-22.
[Czyz89a] J. Czyzowicz, E. Rivera-Campo, N. Santoro, J. Urrutia, and J. Zaks, Guarding Rectangular Art Galleries, Technical Report TR-89-27, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89b] J. Czyzowicz, E. Rivera-Campo, and J. Urrutia, Illuminating Rectangles and Triangles on the Plane, Technical Report TR-89-50, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89c] J. Czyzowicz, E. Rivera-Campo, J. Urrutia, and J. Zaks, Illuminating Lines and Circles on the Plane, Technical Report TR-89-49, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[Dehn92] F. Dehne, B. Flach, M. Gastaldo, D. Graf, R. Merker, R. Sack, and N. Valiveti, Computational geometry on Hopfield networks, manuscript in preparation, 1992.
[Edel86b] H. Edelsbrunner, J. O'Rourke, and R. Seidel, Constructing arrangements of lines and hyperplanes with applications, SIAM Journal on Computing, Vol. 15, No. 2, 1986, 341-363.
[Edel90] H. Edelsbrunner, L. J. Guibas, and M. Sharir, The complexity and construction of many faces in arrangements of lines and of segments, Discrete and Computational Geometry, Vol. 5, 1990, 161-196.
[Good90b] M. T. Goodrich, M. R. Ghouse, and J. Bright, Generalized sweep methods for parallel computational geometry (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 280-289.
[Hage90] T. Hagerup, H. Jung, and E. Welzl, Efficient parallel computation of arrangements of hyperplanes in d dimensions, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 290-297.
[Hopf85] J. J. Hopfield and D. W. Tank, "Neural" computation of decisions in optimization problems, Biological Cybernetics, Vol. 52, 1985, 141-152.
[O'Rou87] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.
[Pell90] M. Pellegrini, Stabbing and ray shooting in 3 dimensional space, Proceedings of the Sixth Annual ACM Symposium on Computational Geometry, Berkeley, California, June 1990, 177-186.
[Rama88] J. Ramanujam and P. Sadayappan, Optimization by neural networks, Proceedings of the IEEE International Conference on Neural Networks, San Diego, 1988, (II) 325-332.
[Sher90] T. Shermer, Recent Results in Art Galleries, Technical Report CMPT TR 90-10, Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, 1990.
[Urru89] J. Urrutia and J. Zaks, Illuminating Convex Sets, Technical Report TR-89-31, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
Bibliography
[Agga85] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Proceedings of the Twenty-Sixth Annual Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, 468-477.
[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Agga92] A. Aggarwal, Ed., Special Issue: Parallel Computational Geometry, Algorithmica, Vol. 7, No. 1, 1992.
[Ajta83] M. Ajtai, J. Komlós, and E. Szemerédi, An O(n log n) sorting network, Combinatorica, Vol. 3, 1983, 1-19.
[Aker87a] S. B. Akers, D. Harel, and B. Krishnamurthy, The star graph: an attractive alternative to the n-cube, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 393-400.
[Aker87b] S. B. Akers and B. Krishnamurthy, The fault tolerance of star graphs, Proceedings of the Second International Conference on Supercomputing, San Francisco, May 1987.
[Aker89] S. B. Akers and B. Krishnamurthy, A group theoretic model for symmetric interconnection networks, IEEE Transactions on Computers, Vol. C-38, No. 4, 1989, 555-566.
[Akl82] S. G. Akl, A constant-time parallel algorithm for computing convex hulls, BIT, Vol. 22, 1982, 130-134.
[Akl84] S. G. Akl, Optimal parallel algorithms for computing convex hulls and for sorting, Computing, Vol. 33, 1984, 1-11.
[Akl85a] S. G. Akl, Optimal parallel algorithms for selection, sorting and computing convex hulls, in Computational Geometry, G. T. Toussaint (Editor), Elsevier, Amsterdam, 1985, 1-22.
[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89b] S. G. Akl, On the power of concurrent memory access, in Computing and Information, R. Janicki and W. W. Koczkodaj (Editors), Elsevier, New York, Proceedings of the International Conference on Computing and Information, ICCI '89, Toronto, 1989, 49-55.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Akl89d] S. G. Akl, Reflections on a parallel model of computation, Invited talk, First Great Lakes Computer Science Conference, Kalamazoo, Michigan, October 1989.
[Akl90] S. G. Akl, H. Meijer, and D. Rappaport, Parallel computational geometry on a grid, Computers and Artificial Intelligence, Vol. 9, No. 5, 1990, 461-470.
[Akl91a] S. G. Akl, Parallel synergy: can a parallel computer be more efficient than the sum of its parts? Proceedings of the Thirteenth IMACS World Congress on Computation and Applied Mathematics, Dublin, July 1991.
[Akl91b] S. G. Akl, Memory access in models of parallel computation: from folklore to synergy and beyond, in Algorithms and Data Structures, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1991, 92-104.
[Akl91c] S. G. Akl and G. R. Guenther, Application of BSR to the maximal sum subsegment problem, International Journal of High Speed Computing, Vol. 3, No. 2, June 1991, 107-119.
[Akl91d] S. G. Akl, K. Qiu, and I. Stojmenović, Computational geometry on the star and pancake networks, Proceedings of the Third Annual Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 252-255.
[Akl91e] S. G. Akl, K. Qiu, and I. Stojmenović, Data communication and computational geometry on the star and pancake interconnection networks, Proceedings of the Third Symposium on Parallel and Distributed Processing, Dallas, December 1991, 415-422.
[Alnu89] H. M. Alnuweiri and V. K. Prasanna Kumar, An efficient VLSI architecture with applications to geometric problems, Parallel Computing, Vol. 12, 1989, 71-93.
[Alt87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata, Deterministic simulation of idealized parallel computers on more realistic ones, SIAM Journal on Computing, Vol. 16, No. 5, October 1987, 808-835.
[Ande89] R. Anderson, P. Beame, and E. Brisson, Parallel Algorithms for Arrangements, Technical Report 89-12-08, Department of Computer Science, University of Washington, Seattle, 1989.
[Ande90] R. Anderson, P. Beame, and E. Brisson, Parallel algorithms for arrangements, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 298-306.
[Asan85] T. Asano, An efficient algorithm for finding the visibility polygon for a polygonal region with holes, Transactions of the IECE Japan E-68, Vol. 9, 1985, 557-559.
[Asan86] T. Asano and H. Umeo, Systolic Algorithms for Computing the Visibility Polygon and Triangulation of a Polygonal Region, Technical Report of IECE of Japan, COMP86-7, 1986, 53-60.
[Asan88] T. Asano and H. Umeo, Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region, Parallel Computing, Vol. 6, 1988, 209-216.
[Atal85] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 411-417.
[Atal86a] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal87] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 151-160.
[Atal88a] M. J. Atallah and M. T. Goodrich, Parallel algorithms for some functions of two convex polygons, Algorithmica, Vol. 3, 1988, 535-548.
[Atal88b] M. J. Atallah, G. N. Frederickson, and S. R. Kosaraju, Sorting with efficient use of special-purpose sorters, Information Processing Letters, 1988, 13-15.
[Atal89a] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the visibility of a simple polygon from a point (preliminary version), Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 114-123.
[Atal89b] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the minimum circle-cover problem, Information Processing Letters, Vol. 32, 1989, 159-165.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Atal89d] M. J. Atallah and J.-J. Tsay, On the parallel-decomposability of geometric problems, Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 104-113.
[Atal90a] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-Complete Geometric Problems, Technical Report, Department of Computer Science, Johns Hopkins University, Baltimore, 1990.
[Atal90b] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-complete geometric problems (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 317-326.
[Atal90c] M. J. Atallah and D. Z. Chen, Parallel rectilinear shortest paths with rectangular obstacles, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 270-279.
[Atal91a] M. J. Atallah, F. Dehne, R. Miller, A. Rau-Chaplin, and J.-J. Tsay, Multisearch techniques for implementing data structures on a mesh-connected computer, Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July 1991, 204-214.
[Atal91b] M. J. Atallah, D. Z. Chen, and H. Wagener, An optimal parallel algorithm for the visibility of a simple polygon from a point, Journal of the ACM, Vol. 38, No. 3, July 1991, 516-533.
[Ayka91] C. Aykanat and T. M. Kurç, Efficient parallel maze routing algorithms on a hypercube multicomputer, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Architectures, 224-227.
[Batc68] K. E. Batcher, Sorting networks and their applications, Proceedings of the AFIPS 1968 Spring Joint Computer Conference, Atlantic City, New Jersey, April/May 1968, 307-314.
[Beic90] I. Beichl and F. Sullivan, A robust parallel triangulation and shelling algorithm, Proceedings of the Second Canadian Conference in Computational Geometry, Ottawa, Ontario, August 1990, 107-111.
[Ben-O83] M. Ben-Or, Lower bounds for algebraic computation trees, Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, Boston, May 1983, 80-86.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Berg89] B. Berger, J. Rompel, and P. W. Shor, Efficient NC algorithms for set cover with applications to learning and geometry, Proceedings of the Thirtieth Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, October/November 1989, 54-59.
[Bern88] M. Bern, Hidden surface removal for rectangles, Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 183-192.
[Bert88] A. A. Bertossi, Parallel circle-cover algorithms, Information Processing Letters, Vol. 27, 1988, 133-139.
[Beye69] W. T. Beyer, Recognition of topological invariants by iterative arrays, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1969.
[Blel88] G. E. Blelloch and J. J. Little, Parallel solutions to geometric problems on the scan model of computation, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 218-222.
[Blel89] G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538.
[Blel90] G. E. Blelloch, Vector Models for Data-Parallel Computing, MIT Press, Cambridge, Massachusetts, 1990.
[Boxe87a] L. Boxer and R. Miller, Parallel Dynamic Computational Geometry, Technical Report 87-11, Department of Computer Science, State University of New York at Buffalo, 1987.
[Boxe87b] L. Boxer and R. Miller, Parallel algorithms for dynamic systems with known trajectories, Proceedings of the IEEE 1987 Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, 1987.
[Boxe88] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. I, Architecture, 323-330.
[Boxe89a] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
[Boxe89b] L. Boxer and R. Miller, Parallel dynamic computational geometry, Journal of New Generation Computer Systems, Vol. 2, No. 3, 1989, 227-246.
[Boxe89c] L. Boxer and R. Miller, A parallel circle-cover minimization algorithm, Information Processing Letters, Vol. 32, 1989, 57-60.
[Boxe90] L. Boxer and R. Miller, Common intersections of polygons, Information Processing Letters, Vol. 33, No. 5, 1990, 249-254; see also corrigenda in Vol. 35, 1990, 53.
[Bren74] R. P. Brent, The parallel evaluation of general arithmetic expressions, Journal of the ACM, Vol. 21, No. 2, 1974, 201-206.
[Brow79a] K. Q. Brown, Voronoi diagrams from convex hulls, Information Processing Letters, Vol. 9, 1979, 223-228.
[Brow79b] K. Q. Brown, Geometric transforms for fast geometric algorithms, Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1979.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz86] B. Chazelle and L. J. Guibas, Fractional cascading: I. A data structuring technique, Algorithmica, Vol. 1, 1986, 133-162.
[Chaz90] B. Chazelle, Triangulating a simple polygon in linear time, Proceedings of the Thirty-First Annual Symposium on Foundations of Computer Science, St. Louis, October 1990, Vol. 1, 220-230.
[Chen87] G.-H. Chen, M.-S. Chern, and R. C. T. Lee, A new systolic architecture for convex hull and half-plane intersection problems, BIT, Vol. 27, 1987, 141-147.
[Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Chow81] A. L. Chow, A parallel algorithm for determining convex hulls of sets of points in two dimensions, Proceedings of the Nineteenth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, September/October 1981, 214-223.
[Codd68] E. F. Codd, Cellular Automata, Academic Press, New York, 1968.
[Cole88a] R. Cole and M. T. Goodrich, Optimal parallel algorithms for polygon and point-set problems (preliminary version), Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 201-210.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Cole88c] R. Cole and U. Vishkin, Approximate parallel scheduling. I. The basic technique with applications to optimal parallel list ranking in logarithmic time, SIAM Journal on Computing, Vol. 17, 1988, 128-142.
[Cole90a] R. Cole and O. Zajicek, An optimal parallel algorithm for building a data structure for planar point location, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 280-285.
[Cole90b] R. Cole, M. T. Goodrich, and C. Ó'Dúnlaing, Merging free trees in parallel for efficient Voronoi diagram construction, in Automata, Languages and Programming, M. S. Paterson (Editor), Lecture Notes in Computer Science, No. 443, Springer-Verlag, Berlin, 1990, 432-445.
[Conr86] M. Conrad, The lure of molecular computing, IEEE Spectrum, Vol. 23, No. 10, October 1986, 55-60.
[Cook82] S. Cook and C. Dwork, Bounds on the time for parallel RAM's to compute simple functions, Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, San Francisco, May 1982, 231-233.
[Cook85] S. A. Cook, A taxonomy of problems with fast parallel algorithms, Information and Control, Vol. 64, 1985, 2-22.
[Corm90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Czyz89a] J. Czyzowicz, E. Rivera-Campo, N. Santoro, J. Urrutia, and J. Zaks, Guarding Rectangular Art Galleries, Technical Report TR-89-27, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89b] J. Czyzowicz, E. Rivera-Campo, and J. Urrutia, Illuminating Rectangles and
Triangles on the Plane, Technical Report TR-89-50, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89c] J. Czyzowicz, E. Rivera-Campo, J. Urrutia, and J. Zaks, Illuminating Lines and Circles on the Plane, Technical Report TR-89-49, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Dado87] N. Dadoun and D. G. Kirkpatrick, Parallel processing for efficient subdivision search, Proceedings of the Third Annual ACM Symposium on Computational Geometry, Waterloo, Ontario, June 1987, 205-214.
[Dado89] N. Dadoun and D. G. Kirkpatrick, Parallel construction of subdivision hierarchies, Journal of Computer and System Sciences, Vol. 39, 1989, 153-165.
[Dehn86a] F. Dehne, O(n^{1/2}) algorithms for the maximal elements and ECDF searching problem on a mesh-connected parallel computer, Information Processing Letters, Vol. 22, 1986, 303-306.
[Dehn86b] F. Dehne, J.-R. Sack, and N. Santoro, Computing on a Systolic Screen: Hulls, Contours and Applications, Technical Report SCS-TR-102, School of Computer Science, Carleton University, Ottawa, Ontario, October 1986.
[Dehn88a] F. Dehne, Solving visibility and separability problems on a mesh-of-processors, The Visual Computer, Vol. 3, 1988, 356-370.
[Dehn88b] F. Dehne, J.-R. Sack, and I. Stojmenović, A note on determining the 3-dimensional convex hull of a set of points on a mesh of processors, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 154-162.
[Dehn88c] F. Dehne, Q. T. Pham, and I. Stojmenović, Optimal Visibility Algorithms for Binary Images on the Hypercube, Technical Report TR-88-27, Computer Science Department, University of Ottawa, Ottawa, Ontario, October 1988.
[Dehn88d] F. Dehne and I. Stojmenović, An O(√n) time algorithm for the ECDF searching problem for arbitrary dimensions on a mesh-of-processors, Information Processing Letters, Vol. 28, 1988, 67-70.
[Dehn88e] F. Dehne, A. Hassenklover, J.-R. Sack, and N. Santoro, Parallel visibility on a mesh connected parallel computer, in Parallel Processing and Applications, E. Chiricozzi and A. D'Amico (Editors), North-Holland, Amsterdam, 1988, 203-210.
[Dehn89] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Proceedings of the Fifteenth International Workshop on Graph-Theoretic Concepts in Computer Science, June 1989.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[Dehn91a] F. Dehne and S. E. Hambrusch, Parallel algorithms for determining k-width connectivity in binary images, Journal of Parallel and Distributed Computing, Vol. 12, No. 1, May 1991, 12-23.
[Dehn91b] F. Dehne, Ed., Special Issue: Parallel Algorithms for Geometric Problems on Digitized Pictures, Algorithmica, Vol. 6, No. 5, 1991.
[Dehn92] F. Dehne, B. Flach, M. Gastaldo, D. Graf, R. Merker, R. Sack, and N. Valiveti, Computational geometry on Hopfield networks, manuscript in preparation, 1992.
[Deng90] X. Deng, An optimal parallel algorithm for linear programming in the plane, Information Processing Letters, Vol. 35, 1990, 213-217.
[Dyer80] C. R. Dyer, A fast parallel algorithm for the closest pair problem, Information Processing Letters, Vol. 11, No. 1, 1980, 49-52.
[Dyer84] M. E. Dyer, Linear time algorithms for two- and three-variable linear programs, SIAM Journal on Computing, Vol. 13, No. 1, 1984, 31-45.
[Eddy77] W. F. Eddy, A new convex hull algorithm for planar sets, ACM Transactions on Mathematical Software, Vol. 3, No. 4, 1977, 398-403.
[Edel86a] H. Edelsbrunner, L. J. Guibas, and J. Stolfi, Optimal point location in a monotone subdivision, SIAM Journal on Computing, Vol. 15, 1986, 317-340.
[Edel86b] H. Edelsbrunner, J. O'Rourke, and R. Seidel, Constructing arrangements of lines and hyperplanes with applications, SIAM Journal on Computing, Vol. 15, No. 2, 1986, 341-363.
[Edel87] H. Edelsbrunner, Algorithms in combinatorial geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
[Edel90] H. Edelsbrunner, L. J. Guibas, and M. Sharir, The complexity and construction of many faces in arrangements of lines and of segments, Discrete and Computational Geometry, Vol. 5, 1990, 161-196.
[ElGi86a] H. ElGindy, An optimal speed-up parallel algorithm for triangulating simplicial point sets in space, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 389-398.
[ElGi86b] H. ElGindy, A Parallel Algorithm for the Shortest Path Problem in Monotone Polygons, Technical Report MS-CIS-86-49, Department of Computer and Information Science, Faculty of Engineering and Applied Science, University of Pennsylvania, Philadelphia, May 1986.
[ElGi88] H. ElGindy and M. T. Goodrich, Parallel algorithms for shortest path problems in polygons, The Visual Computer, Vol. 3, 1988, 371-378.
[ElGi90] H. ElGindy, personal communication, 1990.
[Evan89] D. J. Evans and I. Stojmenović, On parallel computation of Voronoi diagrams, Parallel Computing, Vol. 12, 1989, 121-125.
[Fava90] L. Fava, The design of an efficient BSR network, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1990.
[Fava91] L. Fava Lindon and S. G. Akl, An Optimal Implementation of Broadcasting with Selective Reduction, Technical Report 91-298, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Feit88] D. G. Feitelson, Optical Computing, MIT Press, Cambridge, Massachusetts, 1988.
[Ferr91a] A. G. Ferreira, personal communication, 1991.
[Ferr91b] A. G. Ferreira and J. G. Peters, Finding smallest paths in rectilinear polygons on a hypercube multiprocessor, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 162-165.
[Fink74] R. A. Finkel and J. L. Bentley, Quad-trees; a data structure for retrieval on composite keys, Acta Informatica, Vol. 4, 1974, 1-9.
[Fjal90] P.-O. Fjällström, J. Katajainen, C. Levcopoulos, and O. Petersson, A sublogarithmic convex hull algorithm, BIT, Vol. 30, No. 3, 1990, 378-384.
[Fort78] S. Fortune and J. Wyllie, Parallelism in random access machines, Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, San Diego, May 1978, 114-118.
[Fort87] S. Fortune, A sweepline algorithm for Voronoi diagrams, Algorithmica, Vol. 2, 1987, 153-174.
[Fost80] M. J. Foster and H. T. Kung, The design of special purpose VLSI chips, Computer, Vol. 13, No. 1, January 1980, 26-40.
[Four88] A. Fournier and D. Fussell, On the power of the frame buffer, ACM Transactions on Computer Graphics, Vol. 7, 1988, 103-128.
[Free75] H. Freeman and R. Shapira, Determining the minimal area rectangle for an arbitrary closed curve, Communications of the ACM, Vol. 18, 1975, 409-413.
[Ghos91] K. S. Ghosh and A. Maheshwari, An optimal parallel algorithm for determining the intersection type of two star-shaped polygons, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 2-6.
[Gibb88] A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, England, 1988.
[Good77] S. E. Goodman and S. T. Hedetniemi, Introduction to the Design and Analysis of Algorithms, McGraw-Hill, New York, 1977, section 5.5.
[Good87a] M. T. Goodrich, Efficient parallel techniques for computational geometry, Ph.D. thesis, Purdue University, West Lafayette, Indiana, 1987.
[Good87b] M. T. Goodrich, Finding the convex hull of a sorted point set in parallel, Information Processing Letters, Vol. 26, December 1987, 173-179.
[Good88] M. T. Goodrich, Intersecting Line Segments in Parallel with an Output-Sensitive Number of Processors, Technical Report 88-27, Department of Computer Science, Johns Hopkins University, Baltimore, 1988.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Good89b] M. T. Goodrich, C. Ó'Dúnlaing, and C. K. Yap, Constructing the Voronoi diagram of a set of line segments in parallel, Proceedings of the 1989 Workshop on Algorithms and Data Structures (WADS'89), Lecture Notes in Computer Science, No. 382, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1989, 12-23.
[Good90a] M. T. Goodrich, S. B. Shauck, and S. Guha, Parallel methods for visibility and shortest path problems in simple polygons, Proceedings of the Sixth Annual Symposium on Computational Geometry, Berkeley, California, June 1990, 73-82.
[Good90b] M. T. Goodrich, M. R. Ghouse, and J. Bright, Generalized sweep methods for parallel computational geometry (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 280-289.
[Good92a] M. T. Goodrich, Ed., Special Issue: Parallel Computational Geometry, International Journal on Computational Geometry and Applications, 1992.
[Good92b] M. T. Goodrich and C. K. Yap, What can be parallelized in computational geometry: a survey, manuscript in preparation, 1992.
[Gott87] A. Gottlieb, An overview of the NYU Ultracomputer project, in Special Topics in Supercomputing, Vol. 1, Experimental Parallel Computing Architectures, J. J. Dongarra (Editor), Elsevier, Amsterdam, 1987, 25-96.
[Grah72] R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Information Processing Letters, Vol. 1, 1972, 132-133.
[Guha90] S. Guha, An optimal parallel algorithm for the rectilinear Voronoi diagram, Proceedings of the Twenty-Eighth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, October 1990, 798-807.
[Guib82] L. J. Guibas and J. Stolfi, A language for bitmap manipulation, ACM Transactions on Computer Graphics, Vol. 1, 1982, 191-214.
[Hage90] T. Hagerup, H. Jung, and E. Welzl, Efficient parallel computation of arrangements of hyperplanes in d dimensions, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 290-297.
[He91] X. He, An efficient parallel algorithm for finding minimum weight matching for points on a convex polygon, Information Processing Letters, Vol. 37, No. 2, January 1991, 111-116.
[Hill85] W. D. Hillis, The Connection Machine, MIT Press, Cambridge, Massachusetts, 1985.
[Hole90] J. A. Holey and O. H. Ibarra, Iterative algorithms for planar convex hull on mesh-connected arrays, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 102-109.
[Hole91] J. A. Holey and O. H. Ibarra, Triangulation, Voronoi diagram, and convex hull in k-space on mesh-connected arrays and hypercubes, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Applications, 147-150.
[Hopf85] J. J. Hopfield and D. W. Tank, "Neural" computation of decisions in optimization problems, Biological Cybernetics, Vol. 52, 1985, 141-152.
[Houl85] M. E. Houle and G. T. Toussaint, Computing the width of a set, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, 1985, 1-7.
[Jarv73] R. A. Jarvis, On the identification of the convex hull of a finite set of points in the plane, Information Processing Letters, Vol. 2, No. 1, 1973, 18-21.
[Jeon87] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on mesh-connected computers, Proceedings of the 1987 Fall Joint Computer Conference, Exploiting Technology Today and Tomorrow, October 1987, 311-318.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Jeon91a] C.-S. Jeong, An improved parallel algorithm for constructing Voronoi diagram on a mesh-connected computer, Parallel Computing, Vol. 17, July 1991, 505-514.
[Jeon91b] C.-S. Jeong, Parallel Voronoi diagram in L1 (L∞) metric on a mesh-connected computer, Parallel Computing, Vol. 17, No. 2/3, June 1991, 241-252.
[John77] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the ACM, Vol. 24, No. 1, 1977, 1-13.
[Jwo90] J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall, Embedding of cycles and grids in star graphs, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, Dallas, December 1990, 540-547.
[Karp90] R. M. Karp and V. Ramachandran, A survey of parallel algorithms for shared memory machines, in Handbook of Theoretical Computer Science, J. van Leeuwen (Editor), North-Holland, Amsterdam, 1990, 869-941.
[Kim90] S. K. Kim, Parallel algorithms for the segment dragging problem, Information Processing Letters, Vol. 36, No. 6, December 1990, 323-328.
[Kirk83] D. G. Kirkpatrick, Optimal search in planar subdivisions, SIAM Journal on Computing, Vol. 12, No. 1, February 1983, 28-35.
[Knut73] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, Massachusetts, 1973.
[Krus85] C. P. Kruskal, L. Rudolph, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185.
[Kuce82] L. Kucera, Parallel computation and conflict in memory access, Information Processing Letters, Vol. 14, No. 2, April 1982, 93-96.
[Kuma86] V. K. Prasanna Kumar and M. M. Eshaghian, Parallel geometric algorithms for digitized pictures on a mesh of trees, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 270-273.
[Kuma90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal (Editors), Parallel Algorithms for Machine Intelligence and Vision, Springer-Verlag, New York, 1990.
[Kuma91] V. K. Prasanna Kumar, Parallel Architectures and Algorithms for Image Understanding, Academic Press, New York, 1991.
[Lang76] T. Lang and H. S. Stone, A shuffle-exchange network with simplified control, IEEE Transactions on Computers, Vol. C-25, No. 1, January 1976, 55-65.
[Lee61] C. Y. Lee, An algorithm for path connections and its applications, IRE Transactions on Electronic Computers, Vol. EC-10, No. 3, 1961, 346-365.
[Lee77] D. T. Lee and F. P. Preparata, Location of a point in a planar subdivision and its applications, SIAM Journal on Computing, Vol. 6, 1977, 594-606.
[Lee81] D. T. Lee, H. Chang, and C. K. Wong, An on-chip compare steer bubble sorter, IEEE Transactions on Computers, Vol. C-30, 1981, 396-405.
[Lee84a] C. C. Lee and D. T. Lee, On a circle-cover minimization problem, Information Processing Letters, Vol. 18, 1984, 109-115.
[Lee84b] D. T. Lee and F. P. Preparata, Computational geometry-a survey, IEEE Transactions on Computers, Vol. C-33, No. 12, 1984, 1072-1101.
[Lee86a] D. T. Lee, Geometric location problems and their complexity, Proceedings of the Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, No. 233, Springer-Verlag, Berlin, 1986, 154-167.
[Lee86b] D. T. Lee and Y. F. Wu, Geometric complexity of some location problems, Algorithmica, Vol. 1, 1986, 193-211.
[Lee89] D. T. Lee and F. P. Preparata, Parallel batched planar point location on the CCC, Information Processing Letters, Vol. 33, 1989, 175-179.
[Leig81] F. T. Leighton, New lower bound techniques for VLSI, Proceedings of the Twenty-Second Annual Symposium on Foundations of Computer Science, Nashville, Tennessee, October 1981, 1-12.
[Leig85] F. T. Leighton, Tight bounds on the complexity of parallel sorting, IEEE Transactions on Computers, Vol. C-34, No. 4, April 1985, 344-354.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Levc88] C. Levcopoulos, J. Katajainen, and A. Lingas, An optimal expected-time parallel algorithm for Voronoi diagrams, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 190-198.
[Lipp87] R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine, April 1987, 4-22.
[Lodi86] E. Lodi and L. Pagli, A VLSI solution to the vertical segment visibility problem, IEEE Transactions on Computers, Vol. C-35, No. 10, October 1986, 923-928.
[Lu86a] M. Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811.
[Lu86b] M. Lu and P. Varman, Mesh-connected computer algorithms for rectangle-intersection problems, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 301-307.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[MacK90b] P. D. MacKenzie and Q. F. Stout, Practical hypercube algorithms for computational geometry, poster presentation at the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990.
[Megi83] N. Megiddo, Linear time algorithm for linear programming in R^3 and related problems, SIAM Journal on Computing, Vol. 12, No. 4, 1983, 759-776.
[Mehl84] K. Mehlhorn, Data structures and algorithms 3: multi-dimensional searching and computational geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
[Menn90] A. Menn and A. K. Somani, An efficient sorting algorithm for the star graph interconnection network, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 1-8.
[Merk86] E. Merks, An optimal parallel algorithm for triangulating a set of points in the plane, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 399-411.
[Mill84a] R. Miller and Q. F. Stout, Computational geometry on a mesh-connected computer (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 66-73.
[Mill84b] R. Miller and Q. F. Stout, Convexity algorithms for pyramid computers (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 177-184.
[Mill85a] R. Miller and Q. F. Stout, Pyramid computer algorithms for determining geometric properties of images, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, June 1985, 263-271.
[Mill85b] R. Miller and Q. F. Stout, Geometric algorithms for digitized pictures on a mesh-connected computer, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March 1985, 216-228.
[Mill87a] R. Miller and S. E. Miller, Using hypercube multiprocessors to determine geometric
properties of digitized pictures, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 638-640.
[Mill87b] R. Miller and Q. F. Stout, Mesh computer algorithms for line segments and simple polygons, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 282-285.
[Mill88] R. Miller and Q. F. Stout, Efficient parallel convex hull algorithms, IEEE Transactions on Computers, Vol. C-37, No. 12, December 1988, 1605-1618.
[Mill89a] R. Miller and S. Miller, Convexity algorithms for digitized pictures on an Intel iPSC hypercube, Supercomputer, Vol. 31, May 1989, 45-51.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Mins69] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, Massachusetts, 1969.
[Nand88] S. K. Nandy, R. Moona, and S. Rajagopalan, Linear quadtree algorithms on the hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 227-229.
[Nass80] D. Nassimi and S. Sahni, Finding connected components and connected ones on a mesh-connected parallel computer, SIAM Journal on Computing, Vol. 9, No. 4, November 1980, 744-757.
[Nass81] D. Nassimi and S. Sahni, Data broadcasting in SIMD computers, IEEE Transactions on Computers, Vol. C-30, No. 2, February 1981, 101-106.
[Nass82] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, Journal of the ACM, Vol. 29, No. 3, 1982, 642-667.
[Nath80] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt, Parallel Algorithms for the Convex Hull Problem in Two Dimensions, Technical Report EE 8005, Department of Electrical Engineering, Indian Institute of Technology, Delhi Hauz Khas, New Delhi, October 1980.
[Niga90] M. Nigam, S. Sahni, and B. Krishnamurthy, Embedding hamiltonians and hypercubes in star interconnection graphs, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 340-343.
[O'Rou87] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.
[Osia90] C. N. K. Osiakwan and S. G. Akl, Efficient Parallel Algorithms for the Assignment Problem on the Plane, Technical Report 90-284, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Osia91] C. N. K. Osiakwan, Parallel computation of weighted matchings in graphs, Ph.D. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1991.
[Over81] M. H. Overmars and J. van Leeuwen, Maintenance of configurations in the plane, Journal of Computer and System Sciences, Vol. 23, 1981, 166-204.
[Parb87] I. Parberry, Parallel Complexity Theory, Research Notes in Theoretical Computer Science, Pitman Publishing, London, 1987.
[Pate90] M. S. Paterson, Improved sorting networks with O(log N) depth, Algorithmica, Vol. 5, 1990, 75-92.
[Pell90] M. Pellegrini, Stabbing and ray shooting in 3 dimensional space, Proceedings of the
Sixth Annual ACM Symposium on Computational Geometry, Berkeley, California, June 1990, 177-186.
[Prea88] B. T. Preas, M. J. Lorenzetti, and B. D. Ackland (Editors), Physical Design Automation of Electronic Systems, Benjamin-Cummings, Menlo Park, California, 1988.
[Prei88] W. Preilowski and W. Mumbeck, A time-optimal parallel algorithm for the computing of Voronoi-diagrams, Proceedings of the Fourteenth International Workshop on Graph-Theoretic Concepts in Computer Science, Amsterdam, Lecture Notes in Computer Science, No. 344, J. van Leeuwen (Editor), Springer-Verlag, Berlin, June 1988, 424-433.
[Prep81] F. P. Preparata and J. Vuillemin, The cube-connected-cycle: a versatile network for parallel computation, Communications of the ACM, Vol. 24, No. 5, 1981, 300-309.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Qiu91a] K. Qiu, H. Meijer, and S. G. Akl, Parallel routing and sorting on the pancake network, Proceedings of the International Conference on Computing and Information, Ottawa, May 1991, Lecture Notes in Computer Science, No. 497, Springer-Verlag, Berlin, 360-371.
[Qiu91b] K. Qiu, H. Meijer, and S. G. Akl, Decomposing a star graph into disjoint cycles, Information Processing Letters, Vol. 39, No. 3, August 1991, 125-129.
[Qiu91c] K. Qiu, S. G. Akl, and H. Meijer, The Star and Pancake Interconnection Networks: Properties and Algorithms, Technical Report 91-297, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Rama88] J. Ramanujam and P. Sadayappan, Optimization by neural networks, Proceedings of the IEEE International Conference on Neural Networks, San Diego, 1988, (II) 325-332.
[Rana87] A. G. Ranade, How to emulate shared memory, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 185-194.
[Rank90] S. Ranka and S. Sahni, Hypercube Algorithms with Application to Image Processing and Pattern Recognition, Springer-Verlag, New York, 1990.
[Reif85] J. H. Reif, An optimal parallel algorithm for integer sorting, Proceedings of the Twenty-Sixth Annual Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, 496-503.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Rey87] C. Rey and R. Ward, On determining the on-line minimax linear fit to a discrete point set in the plane, Information Processing Letters, Vol. 24, No. 2, 1987, 97-101.
[Rose62] F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1962.
[Roth76] J. Rothstein, On the ultimate limitations of parallel processing, Proceedings of
208
[Rub90] [Sark89a] [Sark89b]
[Saxe90
[Saxe9l]
[Sche891 [Schw80] [Schw89I
[Sham75] [Sham78] [Shan50] [Sher90
[Shi9l] [Shih87I [Snir85] [Snyd86] [Spri89]
[Srid90]
Bibliography the 1976 International Conference on Parallel Processing, Detroit, August 1976, 206-212. C. Rub, Parallel algorithmsfor red-blue intersection problems, manuscript, FB 14, Informatik, Universitat des Saarlandes, Saarbrucken, 1990. D. Sarkar and I. Stojmenovi6, An optimal parallel circle-cover algorithm, Information Processing Letters, Vol. 32, July 1989, 3-6. D. Sarkar and I. Stojmenovi6, An Optimal Parallel Algorithm for Minimum Separation of Two Sets of Points, Technical Report TR-89-23, Computer Science Department, University of Ottawa, Ottawa, Ontario, July 1989. S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions, IEEE Transactions on Computers, Vol. C-39, No. 3, March 1990, 400-404. S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Correction to: "Parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions," IEEE Transactions on Computers, Vol. C-40, No. 1, January 1991, 122. 1. D. Scherson and S. Sen, Parallel sorting in two-dimensional VLSI models of computation, IEEE Transactions on Computers, Vol. C-38, No. 2, 1989, 238-249. J. T. Schwartz, Ultracomputers, ACM Transactions on Programming Languages and Systems, Vol. 2, No. 4, October 1980, 484-521. 0. Schwarzkopf, Parallel computation of discrete Voronoi diagrams, Proceedingsof the Sixth Annual Symposium on Theoretical Aspects of Computer Science, Paderbom, Germany, February 1989, 193-204. M. I. Shamos, Geometric complexity, Proceedings of the Seventh ACM Symposium on Theory of Computing, Albuquerque, New Mexico, May 1975, 224-233. M. I. Shamos, Computational geometry, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, Connecticut, 1978. C. E. Shannon, Memory requirements in a telephone exchange, Bell Systems Technical Journal, Vol. 29, 1950, 343-349. T. 
Shermer, Recent Results in Art Galleries, Technical Report CMPT TR 90-10, Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, 1990. X. Shi, Contributions to sequence problems, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1991. Z.-C. Shih, G.-H. Chen, and R. C. T. Lee, Systolic algorithms to examine all pairs of elements, Communications of the ACM, Vol. 30, No. 2, February 1987, 161-167. M. Snir, On parallel searching, SIAM Journal on Computing, Vol. 12, No. 3, August 1985, 688-708. L. Snyder, Type architectures, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. 1, 1986, 289-317. F. Springsteel and I. Stojmenovi6, Parallel general prefix computations with geometric, algebraic, and other applications, International Journal of Parallel Programming, Vol. 18, No. 6, December 1989, 485-503. R. Sridhar, S. S. Iyengar, and S. Rajanarayanan, Range search in parallel using distributed data structures, Proceedings of the International Conference on
Bibliography
209
[Stoj87]
Databases, Parallel Architectures, and Their Applications, Miami Beach, Florida, March 1990, 14-19. I. Stojmenovic, Parallel Computational Geometry, Technical Report CS-87-176, Computer Science Department, Washington State University, Pullman, Washington, November 1987.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Stoj88b] I. Stojmenović and M. Miyakawa, An optimal parallel algorithm for solving the maximal elements problem in the plane, Parallel Computing, Vol. 7, 1988, 249-251.
[Stou84] Q. F. Stout and R. Miller, Mesh-connected computer algorithms for determining geometric properties of figures, Proceedings of the 1984 International Conference on Pattern Recognition, 1984.
[Stou85] Q. F. Stout, Pyramid computer solutions of the closest pair problem, Journal of Algorithms, Vol. 6, 1985, 200-212.
[Stou88] Q. F. Stout, Constant-time geometry on PRAMS, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 104-107.
[Tama89a] R. Tamassia and J. S. Vitter, Parallel Transitive Closure and Point Location in Planar Structures, Technical Report CS-89-45, Department of Computer Science, Brown University, Providence, Rhode Island, October 1989.
[Tama89b] R. Tamassia and J. S. Vitter, Optimal parallel algorithms for transitive closure and point location in planar structures, Proceedings of the 1989 Symposium on Parallel Algorithms and Architectures, Santa Fe, New Mexico, June 1989, 399-408.
[Tama90] R. Tamassia and J. S. Vitter, Optimal cooperative search in fractional cascaded data structures, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 307-316.
[Tama91] R. Tamassia and J. S. Vitter, Planar transitive closure and point location in planar structures, SIAM Journal on Computing, Vol. 20, No. 4, August 1991, 708-725.
[Tarj85] R. E. Tarjan and U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM Journal on Computing, Vol. 14, 1985, 862-874.
[Thin87] Thinking Machines Corporation, Connection Machine Model CM-2 Technical Summary, Thinking Machines Technical Report HA87-4, April 1987.
[Thom77] C. D. Thompson and H. T. Kung, Sorting on a mesh-connected parallel computer, Communications of the ACM, Vol. 20, No. 4, 1977, 263-271.
[Tous83] G. T. Toussaint, Solving geometric problems with the "rotating calipers", Proceedings of IEEE MELECON'83, Athens, May 1983.
[Uhr87] L. Uhr (Editor), Parallel Computer Vision, Academic Press, New York, 1987.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
[Umeo89] H. Umeo and T. Asano, Systolic algorithms for computational geometry problems - a survey, Computing, Vol. 41, 1989, 19-40.
[Urru89] J. Urrutia and J. Zaks, Illuminating Convex Sets, Technical Report TR-89-31, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Vaid89] P. M. Vaidya, Geometry helps in matching, SIAM Journal on Computing, Vol. 18, No. 6, December 1989, 1201-1225.
[vanW90] K. van Weringh, Algorithms for the Voronoi diagram of a set of disks, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Vish84] U. Vishkin, A parallel-design distributed-implementation (PDDI) general-purpose computer, Theoretical Computer Science, Vol. 32, 1984, 157-172.
[Voro08] G. Voronoi, Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième Mémoire: Recherches sur les parallélloèdres primitifs, Journal für die Reine und Angewandte Mathematik, 134, 1908, 198-287.
[Wang87] C. A. Wang and Y. H. Tsin, An O(log n) time parallel algorithm for triangulating a set of points in the plane, Information Processing Letters, Vol. 25, 1987, 55-60.
[Wang90a] B.-F. Wang and G.-H. Chen, Constant Time Algorithms for Sorting and Computing Convex Hulls, Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, 1990.
[Wang90b] B.-F. Wang, G.-H. Chen, and F.-C. Lin, Constant time sorting on a processor array with a reconfigurable bus system, Information Processing Letters, Vol. 34, No. 4, 1990, 187-192.
[Wee90] Y. C. Wee and S. Chaiken, An optimal parallel L1-metric Voronoi diagram algorithm, Proceedings of the Second Canadian Conference on Computational Geometry, Ottawa, Ontario, August 1990, 60-65.
[Wegr91] P. Wegrowicz, Linear programming on the reconfigurable mesh and the CREW PRAM, M.Sc. thesis, School of Computer Science, McGill University, Montreal, Quebec, 1991.
[Won87] Y. Won and S. Sahni, Maze routing on a hypercube multiprocessor computer, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 630-637.
[Yao81] A. C. Yao, A lower bound to finding convex hulls, Journal of the ACM, Vol. 28, No. 4, 1981, 780-787.
[Yap87] C. K. Yap, What can be parallelized in computational geometry? Invited talk at the International Workshop on Parallel Algorithms and Architectures, Humboldt University, Berlin, May 1987, Lecture Notes in Computer Science, No. 269, Springer-Verlag, Berlin, 1988, 184-195.
[Yap88] C. K. Yap, Parallel triangulation of a polygon in two calls to the trapezoidal map, Algorithmica, Vol. 3, 1988, 279-288.
Index

Author Index
Ackland, B. D., 185
Aggarwal, A., 45, 62, 72, 107, 123, 135, 182, 193
Ajtai, M., 25, 182
Akers, S. B., 182
Akl, S. G., 6, 25, 45, 46, 62, 72, 87, 95, 107, 123, 124, 182, 183, 185
Alt, H., 183
Anderson, R., 193
Asano, T., 87, 135
Atallah, M. J., 46, 62, 72, 87, 88, 95, 107, 123, 135, 193
Aykanat, C., 183
Batcher, K. E., 183
Beame, P., 193
Beichl, I., 135
Ben-Or, M., 46
Bentley, J. L., 62, 72, 96, 135
Bern, M., 183
Bertossi, A. A., 123
Beyer, W. T., 183
Bhatt, P. C. P., 6, 48, 108, 136
Blelloch, G. E., 25, 46, 72, 183
Boxer, L., 62, 123, 183, 193
Brent, R. P., 62
Bright, J., 193
Brisson, E., 193
Brown, K. Q., 46, 107, 183
Callahan, P., 193
Chaiken, S., 109
Chang, H., 96
Chazelle, B., 45, 46, 62, 63, 72, 73, 96, 107, 123, 135, 182, 193
Chen, D. Z., 87, 88, 123
Chen, G. H., 46, 49, 63
Chern, M. S., 46, 63
Chow, A. L., 6, 46, 63, 107
Codd, E. F., 26
Cole, R., 26, 46, 62, 63, 72, 73, 87, 96, 107, 123, 135, 183
Conrad, M., 26
Cook, S. A., 193
Cormen, T. H., 26
Cypher, R., 46, 88, 96, 136, 183
Czyzowicz, J., 193
Dadoun, N., 46, 63, 73
Dehne, F., 46, 47, 73, 88, 136, 183, 193
Deng, X., 63
Dhall, S. K., 184
Dyer, C. R., 96
Dyer, M. E., 63
Eddy, W. F., 47
Edelsbrunner, H., 6, 73, 123, 193
ElGindy, H., 47, 123, 124, 136
Eshaghian, M. M., 47, 96
Evans, D. J., 107
Fava, L. (Fava Lindon, L.), 183
Feitelson, D. G., 26
Ferreira, A. G., 124, 184
Finkel, R. A., 96
Fjällström, P. O., 47
Flach, B., 193
Fortune, S., 124
Foster, M. J., 26
Fournier, A., 184
Frederickson, G. N., 95
Freeman, H., 184
Fussel, D., 184
Gastaldo, M., 193
Ghosh, K. S., 63
Ghouse, M. P., 193
Gibbons, A., 184
Goodman, S. E., 124
Goodrich, M. T., 6, 46, 47, 62, 63, 72, 73, 87, 88, 95, 96, 107, 108, 124, 135, 136, 184, 193
Gopalakrishnan, P. S., 6
Graf, D., 193
Graham, R. L., 184
Guenther, G. R., 25, 46, 182
Guha, S., 88, 108, 124
Guibas, L. J., 45, 62, 63, 72, 73, 107, 123, 135, 182, 184, 193
Hagerup, T., 183, 193
Hambrusch, S. E., 183
Harel, D., 182
Hassenklover, A., 183
He, X., 124
Hedetniemi, S. T., 124
Holey, J. A., 26, 47, 108
Hopfield, J. J., 194
Houle, M. E., 184
Ibarra, O. H., 26, 47, 108
Iyengar, S. S., 73
Jarvis, R. A., 47
Jeong, C. S., 63, 73, 108, 124, 136, 184
Johnson, D. B., 184
Jung, H., 193
Jwo, J. S., 184
Kanal, L. N., 6
Karp, R. M., 184
Katajainen, J., 47, 108
Kim, S. K., 63
Kirkpatrick, D. G., 46, 63, 73
Knuth, D. E., 184
Komlós, J., 25, 182
Kosaraju, S. R., 95
Krishnamurthy, B., 182, 185
Kruskal, C. P., 26, 124
Kucera, L., 184
Kumar, V., 6
Kung, H. T., 26, 186
Kurc, T. M., 183
Lakshmivarahan, S., 184
Lang, T., 184
Lee, C. C., 124
Lee, C. Y., 184
Lee, D. T., 6, 63, 73, 96, 108, 124, 136, 184
Lee, R. C. T., 46, 63
Leighton, F. T., 26, 47, 88, 96, 136, 184
Leiserson, C. E., 26
Levcopoulos, C., 47, 108
Lin, F. C., 49
Lingas, A., 108
Lippmann, R. P., 26
Little, J. J., 46, 72
Lodi, E., 88
Lorenzetti, M. J., 185
Lu, M., 47, 63, 108, 124
MacKenzie, P. D., 47, 63, 88, 96, 136
Maheshwari, A., 63
Maheshwari, S. N., 6, 48
Megiddo, N., 63
Mehlhorn, K., 6, 183
Meijer, H., 182, 185
Menn, A., 184
Merker, R., 193
Merks, E., 136
Miller, R., 26, 47, 48, 62, 63, 96, 97, 124, 183, 185, 193
Miller, S., 48
Miller, S. E., 48
Minsky, M., 26
Miyakawa, M., 48, 124
Moona, R., 96
Mumbeck, W., 108
Nandy, S. K., 96
Nassimi, D., 185
Nath, D., 6, 48
Nigam, M., 185
O'Dunlaing, C., 45, 62, 72, 107, 108, 123, 135, 182, 193
O'Rourke, J., 193, 194
Osiakwan, C. N., 124
Overmars, M. H., 48, 136
Pagli, L., 88
Papert, S., 26
Parberry, I., 185
Paterson, M. S., 185
Pellegrini, M., 194
Peters, J. G., 124
Petersson, O., 47
Pham, Q. T., 88
Plaxton, C. G., 46, 88, 96, 136, 183
Prasad, V. C., 108, 136
Prasanna Kumar, V. K., 6, 47, 96
Preas, B. T., 185
Preilowski, W., 108
Preparata, F. P., 6, 48, 63, 73, 96, 108, 124, 136, 183, 185
Qiu, K., 182, 183, 185
Rajagopalan, S., 96
Rajanarayanan, S., 73
Ramachandran, V., 184
Ramanujam, J., 194
Ranade, A. G., 185
Ranka, S., 185
Rappaport, D., 182
Rau-Chaplin, A., 46, 73, 136, 193
Reif, J. H., 6, 48, 73, 88, 108, 136
Rey, C., 185
Rivera-Campo, E., 193
Rivest, R. L., 26
Rosenblatt, F., 26
Rothstein, J., 185
Rüb, C., 63
Rudolf, L., 26, 124
Rytter, W., 184
Sack, J. R., 47, 183, 193
Sadayappan, P., 194
Sahni, S., 185, 186
Santoro, N., 47, 183, 193
Sarkar, D., 88, 124
Saxena, S., 108, 136
Scherson, I. D., 185
Schwarzkopf, O., 108
Seidel, R., 193
Sen, S., 6, 48, 73, 88, 136, 185
Shamos, M. I., 6, 48, 63, 73, 96, 108, 124, 136, 185, 186
Shannon, C. E., 186
Shapira, R., 184
Sharir, M., 193
Shauck, S. B., 88, 124
Shermer, T., 194
Shi, X., 186
Shih, Z. C., 63
Snir, M., 26, 124
Snyder, L., 186
Somani, A. K., 184
Springsteel, F., 186
Sridhar, R., 73
Stojmenović, I., 47, 48, 64, 88, 96, 107, 109, 124, 182, 183, 186
Stolfi, J., 73, 123, 184
Stone, H. S., 184
Stout, Q. F., 26, 47, 48, 63, 88, 96, 97, 124, 136, 185
Sullivan, F., 135
Szemeredi, E., 25, 182
Tamassia, R., 73
Tank, D. W., 194
Tarjan, R. E., 124
Thompson, C. D., 186
Toussaint, G. T., 184, 186
Tsay, J. J., 46, 87, 95, 193
Tsin, Y. H., 136
Uhr, L., 7
Ullman, J. D., 26, 186
Umeo, H., 87, 135
Urrutia, J., 193, 194
Vaidya, P. M., 125
Valiveti, N., 193
van Leeuwen, J., 48, 136
van Weringh, K., 48, 109
Varman, P., 63
Vishkin, U., 107, 124, 186
Vitter, J. S., 73
Voronoi, G., 109
Vuillemin, J., 185
Wagener, H., 88
Wang, B. F., 49
Wang, C. A., 136
Ward, R., 185
Wee, Y. C., 109
Wegrowicz, P., 64
Welzl, E., 193
Won, Y., 186
Wong, C. K., 96
Wood, D., 62, 72, 135
Wu, Y. F., 184
Yao, A. C., 49
Yap, C. K., 6, 7, 45, 62, 72, 107, 108, 123, 135, 136, 182, 193
Zajicek, O., 73
Zaks, J., 193, 194

Subject Index
AKS, sorting:
    circuit, 176
    network, 18
Algorithm:
    cost optimal, 3
    deterministic, 3
    optimal, 3
    parallel, 1
    randomized, 3
    sequential, 3
Alternating:
    path, 120
    tree, 120
Antipodal, 166
Arrangement, 90
Art gallery, 88
Ascend, 160
Assignment problem, 118, 119
Associated point, 164
Augmented plane sweep tree, 67
Augmenting path, 120
Average case analysis, 4
Balanced search tree, 139
Biological computer, 23
Bipartite graph, 118
Boundary of a point set, 43
Bridged separator tree, 68
Broadcasting, 22, 156:
    interval, 162
    with selective reduction, 23, 169
Bucketing, 104
Butterfly network, 17
Cascading:
    divide and conquer, 53, 81, 92, 130
    fractional, 53, 67, 130
    merge technique, 30
Cayley graph, 151
Cellular automata, 10
Center of a set, 95
Circle cover:
    minimum cardinality, 111
    minimum weight, 111
Circular:
    arcs, 24, 62, 111
    range searching, 72
Circumscribing polygon, 45, 122
Classification theory, 89
Closest pair (CP), 89, 169, 181
Cluster analysis, 89
Combinational circuit, 175
Computational geometry, 1
Concentration, 161
Conflict resolution policies, 22
Convex:
    hull, 2, 5, 24, 27, 164, 173, 181
    polygon, 2
    polygonal chain, 44, 61, 181
    subdivision, 2
Cooperative searching, 68
Co-podal pair, 168
Cost, 3:
    optimal, 3
Cousins, 158
Critical:
    point merging, 81
    support line, 166
Cube-connected cycles (CCC) network, 17
Data structure, 187
Delaunay triangulation, 100, 102, 116, 131, 135
Depth:
    of a combinational circuit, 175
    of a point set, 45
    of an image, 40
    of collision, 61
Descend, 160
Deterministic algorithm, 3
Diameter:
    of a point set, 95, 166
    of a polygon, 122
Digitized image, 12, 40, 93, 103
Dirichlet tessellation, 99
Disk, 105, 123
Distance:
    between a point and an oriented edge, 164
    between line segments, 95
    between polygons, 95, 167, 168
Distribution, 161
Divide and conquer, 33, 101, 104, 105, 116
    cascading, 53, 81, 92, 130
    multiway, 29, 31, 33, 38, 43, 82, 129, 133
Dividing chain, 102
Dominate, 41, 149, 172
Dynamic computational geometry, 191
Dynamically changing set, 95
ECDF (Empirical Cumulative Distribution Function) searching, 41, 147, 150, 168
Elementary operation, 3
Euclidean:
    minimum spanning tree, 99, 115, 123
    minimum weight perfect matching, 118, 123
    minimum weight triangulation, 134
    traveling salesperson problem, 123
Euler tour technique, 114
Euler's relation, 100
Expected running time, 4
Extremal:
    point, 43
    search, 164
Facility location problem, 123
Farthest neighbor, 94
    all (AFN), 94
Finger probe, 5
Fold-over operation, 91
Fractional cascading, 53, 67, 130
Frame buffer, 137
Free tree, 105
Funnel polygon, 129, 133
Gabriel graph, 135
General:
    polygon, 5
    prefix computation, 146, 168, 180
Geometric optimization, 111, 122, 189
Graphics, 137
Greedy algorithm, 112, 114
Grid, 137
Half-plane, 41, 51, 59
Ham-sandwich cut, 87
Hull:
    convex, 2, 5, 24, 27, 164, 173, 181
    lower, 29, 181
    upper, 29, 133, 181
Hypercube network, 17
Hyperplane, 190
Illumination, 188
Indexing scheme, 12:
    proximity ordering, 34
    row-major order, 12
    shuffled row-major order, 12
    snakelike row-major order, 12
Interconnection network, 11:
    AKS sorting, 18, 176
    butterfly, 17
    cube-connected cycles (CCC), 17
    hypercube, 17
    linear array, 11
    mesh, 11
    mesh of trees (MOT), 13
    omega, 25
    pancake, 19, 151, 152
    perfect shuffle, 25
    plus minus 2^i (PM2I), 24
    pyramid, 14
    star, 19, 151
    tree, 13
Interconnection unit (IU), 21, 170
Intersection, 2, 51:
    of two convex polygons, 169
    strict, 111
Inversion, 101
Isothetic:
    line segment, 54
    rectangle, 58
    unit square, 192
k-D (or multidimensional binary) tree, 70, 72
Kernel of a simple polygon, 59
    reachability, 87
Leader, 162, 178
Light source, 141
Line segment, 51, 81, 82, 83
Linear:
    array network, 11
    programming, 59, 119
Linearly separable, 192
Link:
    center, 6
    distance, 6
List ranking, 104, 114
Loci of proximity, 99
Lower:
    bound, 3, 147, 177
    hull, 29, 181
Maintenance, 45
Many-to-one routing, 163
Matching, 2, 117:
    Euclidean, 118
    Manhattan, 118
    maximum, 122
    minimum weight perfect, 119, 123
Maximal vectors (maximal elements), 41, 137, 147, 150, 168, 172
Maximum empty rectangle, 122
Maze, 143
m-contour, 41
Merging, 158:
    circuit, 176
    slopes technique, 164
Mesh:
    network, 11
    of trees (MOT) network, 13
    with broadcast buses, 12, 13
    with reconfigurable buses, 13
Metric:
    L1 (Manhattan), 94, 103, 118
    L2 (Euclidean), 118
    L∞, 94
Minimax linear fit, 167
Model of computation, 4
Monotone polygon, 58
Multidimensional binary (or k-D) tree, 70, 72
Multi-level bucketing, 104
Multilocation, 54, 128, 130
Multisearch, 40, 188
Multiway divide and conquer, 29, 31, 33, 38, 43, 82, 129, 133
NC, 191
Nearest neighbor, 89
    all (ANN), 89
    ball, 92
    query (QNN), 89
Neighbor:
    nearest, 89
    farthest, 94
Network model, 4, 10
Neural net, 9, 189
Next element search, 69
Omega network, 25
One-sided monotone polygon, 129
Optical computer, 23
Optimal algorithm, 3
Optimization, 111, 122, 189
Order:
    at least, 3
    at most, 3
    proximity, 34
    row-major, 12
    shuffled row-major, 12
    snakelike row-major, 12
Orthogonal range searching, 71
Pancake network, 19, 151, 152
Parallel:
    algorithm, 1
    computer, 4
    prefix, 23, 29, 113, 157, 175
Parallel random access machine (PRAM), 4, 21:
    CRCW, 22
    CREW, 21
    EREW, 21
    Real (RPRAM), 22
Parallelism, 1, 2
Path:
    external shortest, 122
    in a maze, 143
    polygonal, 6
    shortest, 116
    smallest, 122
Pattern recognition, 9, 65
P-complete problem, 191
Peeling, 40, 44, 45
Perceptron, 9
Perfect shuffle network, 25
Performance comparisons of:
    parallel algorithms for computing maximal vectors and related problems, 43
    parallel algorithms for triangulating point sets, 134
    parallel convex hull algorithms, 39
    parallel line segment intersection algorithms, 56
    parallel minimum circle cover algorithms, 115
    parallel point location algorithms, 70
    parallel polygon triangulation algorithms, 131
    parallel polygon, half-plane, rectangle and circle intersection algorithms, 60
    parallel QNN, ANN and CP algorithms, 93
    parallel visibility and separability algorithms, 86
    parallel Voronoi diagram algorithms, 106
Pipelining, 68, 77, 82
Pixel, 12, 40, 94, 137, 141
Plane sweep, 51:
    tree, 54, 67, 130
Plus minus 2^i network (PM2I), 24
Point location, 2, 57, 65, 104
Polygon:
    circumscribing, 45, 122
    convex, 2
    general, 5
    horizontally monotone, 129
    inclusion, 5
    intersection, 5
    monotone, 58
    one-sided monotone, 129
    rectilinear, 117, 122, 123
    separation, 85
    simple, 5
    star shaped, 57
    vertically convex, 61
    visibility, 76, 77
    with holes, 62, 76, 135
Polygonal chain, 44, 78, 122
    convex, 44, 61, 181
Prefix sums, 29, 157, 175
Priority queue, 139
Probabilistic time, 3
Processor, 4
    network, 10
Proximity problems, 89, 99
Pyramid network, 14
Quad, 105
    tree, 94
Random access machine (RAM), 22
    Real, 22
Range:
    searching, 70, 146
    tree, 71, 72
Rank, 158
Ray shooting, 189
Reachability, 87
Rectangle:
    isothetic, 72, 123
    maximum empty, 122
    minimum-area, 166
    of influence, 104
    query, 61, 72
Rectilinear:
    convex hull, 40
    convex image, 40
    polygon, 117, 122, 123
Recursive doubling, 157
Relative neighborhood graph, 135
Retrieval:
    direct, 71
    indirect, 71
Reversing, 161
Routing, 3, 154
    many-to-one, 163
Running time, 3
Scan model, 23
Screen, 12, 40, 41, 137, 141
Searching:
    circular range, 72
    cooperative, 68
    ECDF (Empirical Cumulative Distribution Function), 41, 137, 147, 150, 168
    extremal, 164
    geometric, 138
    multi, 188
    next element, 69
    orthogonal range, 71
    range, 70, 146
Segment:
    dragging, 55
    tree, 51, 72, 94, 133
Semigroup operation, 35
Separability, 75, 84
Sequential algorithm, 3
Sequentially separable simple polygons, 85
Set difference, 163
Shadow, 141
Shared memory model, 4, 20, 170
Similar polygons, 72
Simple polygon, 5
Size:
    of a combinational circuit, 175
    of a problem, 2
Sorting, 22, 27, 158, 159, 173, 176
Stabbing, 188
Star network, 19, 151
Star shaped polygon, 57
Step:
    computation, 3
    routing, 3
Subdivision:
    arbitrary, 65
    convex, 65
    hierarchical, 67
    monotone, 67
    planar, 65
    triangulated, 65
Supporting line, 133
Systolic:
    array, 10
    screen, 12, 40, 41, 137, 141
Tangent, 29, 32
Thiessen tessellation, 99
Threaded binary tree, 71
Translation, 161
Transversal, 192
Trapezoidal:
    decomposition (map), 82, 127
    segment, 187
Tree:
    alternating, 120
    augmented plane sweep, 67
    balanced search, 139
    bridged separator, 68
    Euclidean minimum spanning, 99, 115, 123
    free, 105
    multidimensional binary (or k-D), 70, 72
    network, 13
    plane sweep, 54, 67, 130
    quad, 94
    range, 71, 72
    segment, 51, 72, 94, 133
    threaded binary, 71
Triangulated sleeve, 116
Triangulation, 2, 5, 116, 127, 135
    Delaunay, 100, 102, 116, 131, 135
    minimum weight, 134
    of a point set, 131, 147, 168
    of a polygon, 127, 189
Two-dimensional array, 11
Two-set dominance counting, 42, 147, 150, 168
Unmerging, 158
Upper:
    bound, 3
    hull, 29, 133, 181
Vector sum of two convex polygons, 168
Vertically convex polygon, 61
Visibility, 2, 75, 188
    chain, 78
    graph, 84
    hull, 77, 85
    pair of line segments, 83
    polygon, 24, 76, 77
Voronoi diagram, 2, 40, 89, 99, 116, 132
    discrete, 103
    furthest site, 100
    of disks, 105
    of order k, 107
    weighted, 121
Well balanced curve segment, 55
Width:
    of a combinational circuit, 175
    of a point set, 167
Work, 3
Worst case analysis, 3
Y-disjoint edges, 102