World Scientific Series in Computer Science Vol. 33
Mathematical Foundations of Parallel Computing
World Scientific Series in Computer Science Vol. 33
Mathematical Foundations of Parallel Computing
Valentin V. Voevodin
Russian Academy of Sciences
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 9128
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 73 Lynton Mead, Totteridge, London N20 8DH

MATHEMATICAL FOUNDATIONS OF PARALLEL COMPUTING
Copyright © 1992 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
ISBN 981-02-0820-0
Printed in Singapore by General Printing Services Pte. Ltd.
To my faithful, kind, and loving wife Sima
Contents

Preface

Chapter 1. Algorithm and Its Graph
§ 1. General notion of algorithm
§ 2. Algorithm notations
§ 3. Graph of algorithm
§ 4. Topological sorting
§ 5. Schedules and graph machine
§ 6. Examples

Chapter 2. Algorithm Execution Time
§ 7. Vector properties of schedules
§ 8. Number semirings and other sets
§ 9. Minimax properties of schedules
§ 10. Optimal and high-speed schedules
§ 11. Examples

Chapter 3. Algorithms and Computer Memory
§ 12. Examples
§ 13. Total required memory size
§ 14. Hierarchical memory
§ 15. Sectioning of memory
§ 16. Decomposition of algorithm and of its graph

Chapter 4. Matrix Investigation of Algorithm Structure
§ 17. Graphs and matrices
§ 18. Recovering the linear functional
§ 19. Computing gradient and derivative
§ 20. Roundoff error analysis
§ 21. Examples

Chapter 5. Functional Investigation of Algorithm Structure
§ 22. Space-time schedules
§ 23. Regular graphs
§ 24. Passage to the limit
§ 25. Data streams
§ 26. Examples

Chapter 6. Algorithm Graph and Schedules Building
§ 27. Some statistics
§ 28. Order relation
§ 29. Notation particularities
§ 30. Guidelines for algorithm graph building
§ 31. Splitting the algorithm graph
§ 32. Linear expressions
§ 33. Branching index
§ 34. Linear information closure
§ 35. Examples

Afterword
References
Index
Preface

The subject matter of this book is nontraditional for mathematicians. A lot of things will still feel unfamiliar, in spite of the fact that the objects of our examination are well known: we will study algorithms. However, our interest in algorithms is of a rather specific nature. We focus on studying the structure of algorithms and on particularities of their implementation on parallel computers. It is this aspect of algorithm investigation that is nontraditional and poorly known to mathematicians. The most nontraditional part of it is research methodology.

Before the reader proceeds with the book, the author's viewpoint should be made clear. Hopefully both the research goals and the ways to achieve them will be easily understood after that.

For several decades, the author's research interests have been in the field of computational mathematics and numerical software. The way things went, more than a dozen different computers had to be learned over that period. One problem persisted: every time the same programs were transported to a new computer, various changes in the source code had to be made, although all programs were written in strict accordance with the specifications of machine-independent algorithmic languages.

This state of things is highly abnormal from the point of view of a mathematician for whom a computer is merely a tool he uses in his work. Recall that operating systems and algorithmic languages together with compilers were originally intended to spare algorithm designers the necessity to scrutinize the peculiarities of various computers. They were supposed to handle all activities required to adjust application programs to current machine settings.

This initial goal has not been achieved yet, and it is unlikely that it will be reached in the near future. One does not have to be an expert in algorithmic languages to come to this conclusion. Just try to name a language that guarantees that programs written in it will be implemented efficiently on different computers without any changes. You will pretty soon be convinced that no such languages exist.
Of course, we may blame system developers for poor appreciation of the algorithm designers' needs. The rebuke is partly justified. The algorithm designers' interests have certainly got lost among a lot of other vested interests and are no longer dominant. It must also be owned, however, that the problems are far from being simple.

The most important one is related to the accuracy of computational processes. It is well known that the overall accuracy of results is influenced by such issues as number representation and the mode of rounding-off. Different computers may treat numbers and round them off in different ways. In most algorithmic languages these characteristics are not reflected in algorithm language notations and are machine-dependent. Strictly speaking, an algorithm written in such a language is not machine-independent. In practice this brings about a substantial diversity of results.

Then we are left with a bundle of questions, as: "What do we actually mean by a machine-independent language? What rules should we follow while developing programs to make sure that the results obtained eventually may be viewed as machine-independent (maybe with certain reservations)? How should they be treated from the point of view of accuracy?"
And so on.

Huge amounts of effort have gone into finding the answers to the above questions. Standards were adopted on number representation and round-off modes. Error bounds were obtained for many algorithms. On the whole, however, the results are not spectacular. The backward error analysis approaches are not constructive, and forward and interval analysis is too time-consuming. No other approaches currently exist. It is possible that the chief achievement is the full realization of the fact that the influence of roundoff error cannot be ignored during numerical software development.

For a long time it seemed that the only fundamental difficulty concerning the implementation of algorithms on different computers was the accuracy problem. Though its easy solution could not be counted on, certain prospects nonetheless came into view. Gradually the confidence went on building up that the process of the development of machine-independent numerical
software was feasible.

The advent of parallel computers and supercomputers brought grave difficulties into play. The corresponding problems have undermined that confidence. Every new parallel computer forces another reconsideration of what has been done and another refurbishment of application software. High-level languages and compilers for parallel computers have been developed. However, it is generally recognized that the code they generate does not exploit the resources of parallel systems to full extent. Certainly, very effective software exists for individual computers, but its use is limited. New algorithmic languages have sprung up. The associated buzzword was parallelization. Papers on parallelization come in thousands, yet it is not clear what is actually to be done next. The list of facts of this kind could be continued on and on.

The author's initial interest in parallel computations was rather limited. The author just wanted to know how to use parallel computers to solve large problems in such branches as linear algebra, numerical analysis, optimization, physics, etc. However, getting familiar with this new field stretched out for long. Initial reading of papers was not illuminating: many questions arose, but there were no answers. Moreover, as the area of studied information enlarged, the number of such questions grew. Gradually it dawned upon the author that the answers could not be found in literature simply because those answers were not yet known.

Effective numerical software design for parallel computers proved to be difficult, primarily due to the wide variety of parallel system architectures and the variety of data processing organizations. These in their turn imply the necessity of
various numerical methods and algorithms, software, and new languages.

Restructuring existing software to meet the requirements of parallel computers is not an easy task. Moreover, it is not always clear whether such restructuring is feasible at all. Despite the large number of papers on program parallelization methods, so far there is no general practical constructive methodology in this field. For that reason, new algorithmic languages and numerical methods are most often developed for large parallel computers, and all programs are written anew. Mathematicians can hardly revise their algorithms again and again to tailor them to various new computers, whether with the fashionable prefix "super" or not.

Any computational process is based on some algorithm. A very important fact must be recognized: in spite of the long period of existence of computer science, mathematicians really know little about the structure of most of the methods and algorithms that they develop and use in practical projects. Dominant over several decades, the von Neumann architecture riveted the algorithm designers' attention to just two computer-related characteristics: operation count and required memory size. Even the influence of rounding errors was most often ignored in computations. The exploration of structural properties of algorithms, as e.g. modularity, has remained rudimentary.

The upshot of it all was that the arrival of parallel computers found computational mathematics without adequate knowledge of algorithm structures. Without such knowledge the employment of parallel computers cannot by itself solve large problems. Related sciences, in particular algorithmic language, compiler, and computer architecture development, were in a similar plight.

It is easy to foresee many objections to this. However, let us go beyond the long list of achievements. Let us pose questions and try to find answers. Let us try to unearth the obstructions that prevent us from finding these answers. Then we shall understand and realize how little we in fact know about parallel computations.

This reasoning motivated the author to start a thorough investigation of numerous problems arising in parallel computations. The impulse was to investigate mathematically the bottlenecks of
analyzing and implementing parallelism in algorithms, from mathematical formulas and programs to compilers and computational systems.

The joint investigation of computational systems and algorithms is an exceptionally complicated task. Perhaps the most difficult stage of it is the choice of mathematical concepts to describe both computers and algorithms. This choice is difficult to make not so much due to the necessity to provide uniform descriptions of certain fixed objects, but primarily because we must be able to set and recover various characteristics of these objects, and also to collate and transform individual object descriptions. To use computers to solve these problems is an attractive idea.

It is almost evident that algorithms can be described by graphs (with a few reservations). It can hardly be doubted that it is in principle possible to use graphs to describe the functional units of computational systems as well. So it is not surprising that graphs came into the author's view at a certain moment.

Of course, attempts were made to make as much use as possible of graph theory with its numerous applications and of available graph-theoretic results. These attempts ended in a disappointment. Specific features of algorithms and computational systems contributed to this. Research goals implied taking them into account and forced us to consider graphs whose number of nodes is of the same order as the number of algorithm operations. As a rule, the number of algorithm operations is huge. Moreover, it typically depends on certain parameters that are not known at compile time.

Correlating properties of algorithms and computers implies determining whether their graphs are isomorphic, homomorphic, etc. Graph theory has practically nothing to suggest for the solution of such problems but for some kind of search. Actually most problems cannot be solved in linear time, requiring the exhaustive search. This is of course unacceptable in terms of time cost. However, even linear search time is unacceptable for us, as it will take about the same time as running the algorithms themselves on serial computers.

Finally it became clear that graph theory can only lend us the terminology and a few basic facts. Nonetheless, a very important inference was drawn from the accomplished research. Specifically, the joint
investigation of computational systems and algorithms will only be effective if the characteristic features of our graphs are taken into account. But what are these features? What is graph structure in general? There were no answers to these questions.

Although the first attempts to use graphs did not yield constructive results, it was undesirable to abort their use as the means of research. The chief pro argument was that graphs were actually being used to describe both algorithms and computational systems. Certainly, there were no viable alternatives for the role of the mathematical instrument. However, their use was not adequately elaborate. Moreover, many different types of graphs were used to describe the same objects. Those graphs differed in size and in the identification of arcs and nodes. The relation between different graphs denoting the same objects was not always clear.

Several fragmentary papers boosted the author's confidence that an elegant treatise could be put together using graphs. They belonged to different authors, were published at different times, and had no connection whatever. Yet each paper used graphs. Taken together, they covered the entire scope of our consideration: algorithms, programs, languages, compilers, parallel computers. These works had great impact on forming the author's viewpoint, and they deserve to be mentioned here specially.

The first paper is a review by A.P.Ershov [32]. It sums up the research results in program schemes theory. There are also numerous examples illustrating the use of graphs. In particular, all essential types of program graphs are listed there (control flow, data flow, etc.). Another graph type mentioned in the review is the so-called program implementation data flow graph. Its properties and potential applications are however not discussed in [32]. Note that the most interesting research results on the parallel structure of algorithms and programs are obtained using various modifications of that graph.

Next go the papers [61,62] by L.Lamport. They demonstrate the feasibility of analysis of the parallel structure of algorithms written down as sequential programs. For a class of programs, two analysis methods were suggested: the coordinate method and the hyperplane
method. These methods proved to be sufficiently effective for the selected class of programs and were incorporated in a number of compilers for parallel computers. In particular, a coordinate system associated with the program was introduced, and a way to place graph nodes in it was specified constructively. Ignoring the terminological differences between [61,62] and [32], we easily notice that the graph used in [61,62] is the program implementation data flow graph. Thus the papers [61,62] laid the foundation for using computers to analyze the structure of graphs.

Next we mention a series of papers [54,55] by D.J.Kuck and others describing the development of Parafrase. Parafrase is essentially a FORTRAN-to-FORTRAN converter that automatically restructures programs so that they become implementable on parallel computers. Program structure is analyzed both on macro- and microlevel. Graphs are widely used in Parafrase as a research tool. Parafrase is a conclusive evidence that machine-independent tools may be developed to investigate a wide variety of structural features related to parallelism. Development of such tools gives rise to a lot of new and meaningful mathematical problems.

Finally we mention the paper [68] by D.I.Moldovan. It is dedicated to constructing systolic arrays for a rather narrow class of algorithms. However, the paper's chief message lies not in the treatment of that class itself, but rather in the rules used to construct systolic arrays. Borrowing the terminology from [32], we may state that the paper suggests to construct systolic arrays formally, using projections of the program implementation data flow graph. This may be viewed as an attempt to bring into accord the structures of algorithm and parallel system architecture.

Naturally, the discussed papers did not furnish answers to all the questions. Yet taken together they strengthened the opinion that graphs provide a sound base for a global research into parallelism ranging from mathematical formulas to computational systems.

Of course, the author's positions were influenced by other papers as well. The author would like to gratefully mention here the works of Barlow R.H., Brent R.P., Demmel J.W., Dennis J.B., Dongarra J.J., Duff I.S., Ershov A.P., Evans D.J., Faddeev D.K. and Faddeeva V.N., Gurd J.,
Heller D., Hockney R.W., Hwang K., Kuck D.J., Kung H.T., Kung S.Y., Lamport L., Maslov V.P., Moldovan D.I., Plemmons R.J., Sameh A.H., Siegel H.J., Stone H.S., Traub J.F., and many others. The impact of their work was not always direct, but they have helped and are helping to steer the right course in studying the mathematical foundations of parallelism.

In the book [88] the author essayed to state his opinions on these topics. The chief objective was to understand the roots of the difficulties that hinder the investigations. The chief research object was the algorithm graph. With a few reservations, it is identical to the program implementation data flow graph of [32]. It was demonstrated that systematic study of algorithm graphs enables us to solve a lot of important problems, including the problem of mapping algorithms optimally onto parallel computer architectures. It was also established that the main hurdle on the way to solve most problems was the total lack of information on the structure of algorithm graphs. This information can only be derived from some algorithm notations, as mathematical formulas, algorithmic language programs, etc.

The need to conduct theoretical research in these uncertain circumstances obliged us to start developing software tools to study the structure of algorithms, programs, and computational systems. These tools aim at building the algorithm graph, exploring its structure, parallelizing existing sequential programs, restructuring them in such a way as to make them implementable on parallel computers. They can also be counted upon to design mathematical models of computational systems optimally suited to implement classes of algorithms, to develop numerical software that may be easily transported onto various computational systems, and so on.

Simultaneous theoretical and practical efforts were fruitful. We mention here just a few of the acquired results. It turned out that all existing program parallelization methods are in fact methods of finding solutions to a certain inequality of Bellman type, the coefficients whereof are readily determined using the algorithm graph. Some previously unknown bottlenecks of parallelization processes were exposed. Links were established between the problem of studying the parallel structure of an algorithm and other problems having no apparent relation
to that one, as studying round-off errors propagation during the algorithm execution, complexity bounds for gradient evaluation and similar processes. The connection between algorithm graph structure and the possibility of the effective algorithm implementation on hierarchical memory systems was also revealed. The list of examples could be continued. The reader will find other examples in the corpus of the book.

This book presents the results of the author's research of algorithm structures. The choice of notation to represent algorithms is not substantial for the material. However, to respond to practical demands we focus on those algorithms that are written as FORTRAN-like language programs. The investigations are carried out using a model of computational system which we called the graph machine. This choice is accounted for by the desire to conduct the joint investigation of algorithm and computational system structure. Transformation of the graph machine produces other models of computational systems. Among those we may choose one most strongly suited for the implementation of algorithms of a given class. The approach was substantiated earlier in the course of development of a systolic array design system [88].

One of the requirements that the author believed necessary to observe was separation of algorithm structure investigation from implementation considerations. Therefore we make a point of keeping our research machine-independent. This requirement implies treating all input data of algorithms as unknown symbol variables. It is, however, rather evident that the values of input data have some impact on algorithm structure. For example, they may affect branching process and the total number of executable operations. They may also influence other characteristics of algorithms, as the number of parallel branches, the amount of distributed or hierarchical memory in use, etc. It is natural to surmise that, to conduct investigations of the influence of concrete input data on algorithm structure and on the efficiency of computational process, we must develop special methods for symbol manipulations. These methods are nontraditional for this research field and may prove to be complicated.

We now make a few observations about the distinctive features of
this book. The author tried to let the reader feel the indeterminate nature of the grounds on which the investigation of the structure of parallel algorithms has to be built. Therefore any additional assumptions on research conditions are made only when it becomes clear that previously made assumptions are used up and no further progress is possible. The author hopes this way of exposition should help the reader realize whence the considered problems originate and why such-and-such methods are used to solve them. Of course, the author strived to maintain rigor at an acceptable level. A certain level is primarily required so as not to lose important relations between individual facts. However, excessive dwelling on minute details may eclipse the most important facts. The wish to steer a midway course also had impact on the manner of exposition.

The chief points we make are put down as Statements. The text itself also contains a good deal of information. Statements are not necessarily more complex than the surrounding text; rather, we make Statements to draw the reader's attention to a particular result. Many proofs are just skeletons devoid of all details; the simplest proofs are altogether omitted. Each chapter contains meaningful examples to help the reader understand the matter.

This book is not a review of papers. It presents chiefly the point of view of the author. It is due to this circumstance that the list of references is relatively short. The author apologizes to all readers who will not like this list. The number of papers in parallel computing is huge and keeps growing. Nevertheless this scientific field has no sound theoretical basement yet. At this juncture it is not easy to understand the scale of this or that paper's contribution to parallel computing theory. A large reference list including well above one thousand references may be found in another book by the same author [88].

This book may be read in several different ways, depending on the reader's preferences. If the reader wishes to get familiar with the theoretical foundations, to realize the prospects and pitfalls of mapping algorithms onto computer architectures, he should marshall his patience and work through the book with pencil and notepad. In this case, examples may be viewed as illustrations to theoretical material.
xix If
t h e reader
topics
merely
considered
wishes
t o achieve
a general
i n t h e b o o k , he may j u s t
O n l y a modicum o f a d d i t i o n a l examples.
In this
damentals
of parallei
case
information
read
understanding
t h e Examples s e c t i o n s .
i s r e q u i r e d t o understand the
t h e b o o k may be v i e w e d a s a n e x p o s i t i o n computing
through
o f the
the solution
of fun-
of selected
pro-
blems. T h i s book
i s based on t h e research t h a t
t h e a u t h o r has been
ing
o n f o r many y e a r s now a t t h e D e p a r t m e n t o f N u m e r i c a l
the
USSR
Academy
gratefulness manuscript the took
was
exceptional part
o f Sciences.
The
author
t o G.I.Marchuk f o r c o n s t a n t read
by E . E . T y r t y s h n i k o v .
usefulness
i n . This
book
i s pleased
support
i s also
r e a d i n g a t t h e Moscow U n i v e r s i t y
a
base
t o express h i s
of this
The a u t h o r
o f h i s remarks
r e s e a r c h . The
gratefully
notes
and o f t h e d i s c u s s i o n s f o r lectures
a n d a t Moscow C o l l e g e
Technology.
Valentin
carry-
Mathematics o f
V.
Voevodin.
he
the author i s f o rPhysics
and
Chapter 1 Algorithm and Its Graph

Before we start our study of structure of algorithms, we must specify precisely the subject and
problem
formulation
search
area
stantial Clearly,
dealing
we
shall
describe The ous,
success
have
to restrict
answer
however,
to this
question
that i t involves
t e n t we s h a l l
a class
number o f w a y s t h e y o f programs
characterize
which
of algorithms
basic we
I ti s this
will
call
studying
permits
have
devise
amounts t o s p e c i f y i n g a c o m p u t e r s . We In full
portions an
w i l l not
detail. I n -
computational
we w i l l
r e v i s e o u r problem:
we w i l l
study
the key
of algorithms).
abstract
machine. Moreover,
the structure of algorithms
to specify a
structure describing
important
will
a graph
that
that
build
a
instead of
the functioning of
machines. Putting
of
This
and programs
some b a s i c
( o r t h e most
s t r u c t u r e we
choice
we e s s e n t i a l l y
or hypothetical
of algorithms
t r y t o discover
l a r g e , ex-
the algorithms
plan.
w h o l e s e t o f s u c h m a c h i n e s . T h e n we w i l l
graph
class o f
I t i s obvi-
probably
t o analyze only
c a n be p u t i n w r i t i n g .
of algorithms
that
system
sub-
b e , a n d how d o we
i s not immediately clear.
computers.
f o rexisting
classes
we w i l l
kernels Using
to a certain
of that class
a
algorithms.
some k i n d o f c o m p r o m i s e b e t w e e n o u r d e -
the f o l l o w i n g research
define
stead,
accurate
loose r e -
t o achieve
arbitrary
t o meet t h e m . To a c e r t a i n ,
be implemented on v a r i o u s
class
hope
our research
b e n e f i t by p r e f e r r i n g
to outline To
cannot
studying
the nature
The
i n the rather
it?
mands a n d o u r c a p a c i t y
can
We
while
o f our investigations
i n the process.
importance
with algorithms.
But what should
the goal
t o be used
i s of special
utilitarian
algorithms.
us
f o rinquiry,
the mathematical apparatus
paramount
this
plan
into
Importance.
effect
will
I n the f i r s t
description of the class o f algorithms possibility
t o construct
a
allow place,
us t o s o l v e we w i l l
t h a t we s t u d y .
mathematical
apparatus
two problems
obtain This
the exact
opens up t h e
t o explore
their
2
structures. to
As we
analyze
place,
fabricate
the s t r u c t u r a l
the
optimally
possibility
properties
arises
of
f o r us
our
to
algorithms.
study
Certainly, structure.
bulk o f our e f f o r t w i l l
above-listed
problems.
The
out
we
will
not
put
It just
is difficult
particular
the
relevant features of algorithms.
implementations hardly
t i o n s o f an a l g o r i t h m on
the graph
preserving
have
sketched
we
p o i n t on what
start
into
implementathose
interest,
units,
more r e a l i s t i c
basing because
t o comprehend a l l
shall determine of
after
first.
architecture
h e l p one
the
f o r i t points
studying the set of
number o f f u n c t i o n a l
as
e t c . T h e n we
computer models,
im-
time, will while
characteristics. the plan
carrying
i s t o be
ever
By
only
investigation
the c h a r a c t e r i s t i c s
machine
the achieved
r i t h m s . Now
problem,
of
algorithms.
computer, p r i m a r i l y
t h e g r a p h m a c h i n e we
minimize
communications,
transform
We
that
second
particular
the
storage,
the
s t r u c t u r e deserve
a
second
i t a l l depends
tackled
t o choose the o p t i m a l p a r a l l e l
on one's e x p e r i e n c e w i t h
plementations
s h o u l d be
able
architectures
the s t r u c t u r e of p r a c t i c a l
aside
which aspects of a l g o r i t h m
be
to the s o l u t i o n of the f i r s t
second problem
h a v e g a i n e d some i n s i g h t i n t o
Nevertheless
go
will I n the
computer
algorithms.
we
for specific
a p p a r a t u s , we
h e a v i l y on t h e f o r t u n a t e c h o i c e o f t h e b a s i c The
tailored
the mathematical
of
algo-
i t o u t . This chapter e l a b o r a t e s our
to
view-
investigated
investigate
and
the
structure
by w h a t means.
1. General Notion of Algorithm

In computer-based research we constantly encounter all kinds of algorithms. The solution of computational problems is described by algorithms. Information processing performed by a compiler is of quite different nature, yet it is also described by algorithms. Even the man who works at a terminal follows some algorithm in his activity. We see that the notion of algorithm is very widespread. Its use is often ambiguous and can be interpreted in more than one way.

Consider, for example, the algorithm of summing several numbers. What does it exactly mean? From the point of view of pure mathematics, the result does not
depend on the order of terms. Therefore, we can accept any order of summing, discarding all others. But that order becomes significant once the addition is performed by a computer. This can be due to the influence of rounding errors, or the idiosyncrasies of the particular computer's instruction set, etc. In that case we must settle on some definite order of operands in this context. The selected order of summing can be expressed in an algorithmic language, say, in FORTRAN. Due to that we can imagine that the algorithm is defined by being put into a program: it is essentially a FORTRAN program in this case. Unfortunately, if we run this program on different computers, we can obtain different results. Of course, we exclude the case that the program is incorrect or used wrongly. The different results are accounted for by the different ways different computers use to represent numbers, and different rounding-off schemes. Thus, even the possession of a program in an algorithmic language does not provide the full information on the underlying algorithm.

What is, then, an algorithm? The structure of what shall we study in this book? To find the answer to these questions let us discuss the meaning of the word "algorithm" itself. The Mathematical Encyclopaedia defines an algorithm to be a finite set of rules that make it possible to solve mechanically any individual problem from a certain set of analogous problems. It is understood that the input data of the problem can vary within certain limits, and that the application of the rules proceeds unambiguously, so that we know the result of each step of the rules application. In this general form the definition can offer only a starting point for studying the structure of algorithms.

One can think that the difficulty is caused by our attempting to make use of the most general definition, while we have agreed to study only those algorithms that can be implemented on a computer. More specifically, we are interested in studying the computational processes. Let us consult a more strict, and perhaps less clouded, definition. The Mathematical Encyclopaedia defines a computational algorithm to be formal instructions specifying the computational process that starts with arbitrary input data and aims at achieving the result that corresponds to these data. It is difficult to explain what formal instructions are,
and what the computational process is, and so on.

It looks like it is futile to continue our search for a rigorous mathematical definition of algorithm. The notion of algorithm in general is one of the primary mathematical concepts, the basic notions that cannot be reduced to more elementary ones. We can only rationalize or clarify it in some way, write it down in detail using some notation, etc. While hope existed that all mathematical problems can be solved algorithmically, there was no good reason to refine the concept of algorithm. Once a recipe was suggested to solve a problem or a class of problems, there was a tacit agreement that this recipe was an "algorithm". The necessity to refine the notion of algorithm only materialized when it was proven that there exist problems that cannot be solved using algorithms from a specified class.

We start our elaboration of the concept of algorithm by enumerating its features. Here is the list provided by the Mathematical Encyclopaedia:
- the set of possible input data;
- the set of possible results;
- the set of possible intermediate results;
- the rule to start the algorithm execution;
- the rule of processing;
- the rule to end the algorithm execution;
- the rule to obtain the result.

There are several commonly accepted refinements of the general concept of algorithm. We mention the Turing machine, recursive functions and normal algorithms. Strictly speaking, each of the refinements curtails the intuitive notion of algorithm, although in a sense all the refinements are equivalent. Every refinement consists in specifying exactly the range within which each of the seven listed parameters can vary. The choice of ranges is what distinguishes one refinement from another.

Our list of parameters is minimum minimorum for computational processes. It is often insufficient for the description of many properties of algorithms. Notice that all seven parameters bear on the data, and that the algorithm consists entirely or in part in transforming these data. The data have to be stored somewhere during the algorithm execution.
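To make the seven parameters more tangible, here is a minimal Python sketch that writes one familiar algorithm in exactly that shape. Everything in it is an illustrative assumption rather than part of the text: the names `Algorithm`, `run` and `euclid` are hypothetical, and Euclid's algorithm is chosen only because it is short.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Algorithm:
    """Hypothetical container for the seven parameters listed above."""
    is_input: Callable[[Any], bool]     # 1. set of possible input data
    is_result: Callable[[Any], bool]    # 2. set of possible results
    is_state: Callable[[Any], bool]     # 3. set of possible intermediate results
    start: Callable[[Any], Any]         # 4. rule to start the execution
    step: Callable[[Any], Any]          # 5. rule of processing
    finished: Callable[[Any], bool]     # 6. rule to end the execution
    output: Callable[[Any], Any]        # 7. rule to obtain the result


def run(alg: Algorithm, data: Any) -> Any:
    """Execute an algorithm given by its seven parameters."""
    if not alg.is_input(data):
        raise ValueError("data is outside the set of possible input data")
    state = alg.start(data)
    while not alg.finished(state):
        assert alg.is_state(state), "left the set of intermediate results"
        state = alg.step(state)          # intermediate data live in `state`
    result = alg.output(state)
    assert alg.is_result(result), "result is outside the set of results"
    return result


# Euclid's algorithm for the greatest common divisor (assumed example).
euclid = Algorithm(
    is_input=lambda p: len(p) == 2 and all(isinstance(x, int) and x > 0 for x in p),
    is_result=lambda r: isinstance(r, int) and r > 0,
    is_state=lambda s: s[0] >= 0 and s[1] >= 0,
    start=lambda p: (p[0], p[1]),
    step=lambda s: (s[1], s[0] % s[1]),
    finished=lambda s: s[1] == 0,
    output=lambda s: s[0],
)

if __name__ == "__main__":
    print(run(euclid, (48, 18)))
```

Note that the variable `state` plays the role of storage for intermediate data, although none of the seven parameters names it explicitly.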
The media include a sheet of paper, magnetic tape, electronic or optical storage devices, etc. The data storage methods range from icons on paper to physical state of devices. It seems that memory is involved in any algorithm, either explicitly or implicitly. Some kind of agreement must always exist on the rules by which the memory data are accessed. All in all, introducing memory as part of the concept of algorithm is inevitable. Bypassing trivial cases, if we regard an algorithm as a process then the algorithm execution goes on like this: first, some operations are accomplished, and many others are performed later. It follows that the results of the earlier operations must be stored somewhere until they are needed. Yet there is no mention of memory either in the general description of algorithms or in the seven features listed above. We will discuss this matter again later.

Thus, we make the commonly accepted notion of algorithm our starting point, taking into account one of its refinements, e.g. the Turing machine, and our decision to consider only those algorithms that can be implemented on computers. The choice of the Turing machine makes sense because of the similarity of its architecture to that of the existing computers, as well as because of its role in mathematics.

The Turing machine can be thought of as an automatic device that can in general be in a finite quantity of states. There are two special states: the initial and the final ones. The machine works step-by-step, each step being the transition from one state into another. Note that the subsequent state can coincide with the previous one. The process starts at the initial state and ends at the final state. All the information is stored in memory. The memory is a tape stretching infinitely in both directions and partitioned into cells. Each cell can store one letter of some alphabet. The alphabet contains a "void" letter. If a cell is empty then we say that it stores the void letter. The machine has a multifunction read/write head. We can think of it as hanging above the tape. During each step the head is on top of only one cell of the tape. Each step can involve reading/writing one letter.

Performing each step involves the following actions. Suppose the
read/write head is above some cell and the current state of the machine is not final. Then, depending on the state of the machine and the letter in the cell, some letter is written into the same cell. The written letter can be the same as stored in the cell previously. After that the tape is shifted one cell right or left, or remains motionless. The machine itself goes to its next state. The process comes to an end when the Turing machine is in the final state. A program of the Turing machine is the enumeration of all possible steps of the machine. This is a precise mathematical definition. In general the contents of the program depends on the alphabet that is being used.

Now let us single out that part of the above description which is important for us right now. A particular specification of the Turing machine can differ from ours in small details. For example, the tape can stretch infinitely in only one direction; there can be several read/write heads with separate tapes; some other executing devices may be present besides the read/write one, the tape-pulling one, etc.; and so on. However, in all cases there is the memory, and the functioning of the Turing machine is described by some program.

Since Turing machines mirror adequately the intuitive notion of algorithm, this means that exploring the structures of algorithms is closely associated with exploring the structures of programs that describe the functioning of the Turing machine. In spite of their immense theoretical value, Turing machines have very little in common with modern computers. Algorithms used in practice are hardly ever written down as Turing machine programs. Therefore it is not expedient to explore the algorithm structure using such programs. Nonetheless, our discussion shows that apparently the chief target of algorithm structure investigation must be programs for real or hypothetical machines.
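The machine described above is simple enough to simulate in a few lines. The following Python sketch is only an illustration under stated assumptions: the function name `run_turing_machine`, the state names `"start"` and `"halt"`, the blank character used as the void letter, and the bit-inverting program are all hypothetical choices, and the head is moved over a fixed tape instead of shifting the tape under a fixed head, which is equivalent.

```python
VOID = " "  # assumed representation of the "void" letter


def run_turing_machine(program, tape_text, initial="start", final="halt",
                       max_steps=10_000):
    """Simulate a one-tape Turing machine.

    `program` maps (state, letter) to (letter_to_write, move, next_state),
    where move is -1 (left), 0 (stay) or +1 (right). The table is the
    "enumeration of all possible steps" of the machine.
    """
    tape = {i: ch for i, ch in enumerate(tape_text)}  # unbounded both ways
    head, state, steps = 0, initial, 0
    while state != final:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before the final state")
        letter = tape.get(head, VOID)          # an empty cell stores VOID
        write, move, state = program[(state, letter)]
        tape[head] = write                     # may equal the old letter
        head += move
        steps += 1
    used = [i for i, ch in tape.items() if ch != VOID]
    if not used:
        return ""
    return "".join(tape.get(i, VOID) for i in range(min(used), max(used) + 1))


# Assumed example program: invert every bit, halt on the first void cell.
INVERT = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", VOID): (VOID, 0, "halt"),
}

if __name__ == "__main__":
    print(run_turing_machine(INVERT, "0110"))
```

Even for this trivial task the program table says nothing directly about which operations depend on which data, which is one way to see why such programs are an inconvenient object for structural analysis.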
2. Algorithm Notations

While analyzing various algorithms, it is easy to notice that we actually analyze not algorithms as processes, but rather some formal notations. There is a serious reason for this: if there were no formal notations for algorithms, then it would be impossible to spread the exact
knowledge of algorithms among the scientific community and consequently no accumulation of knowledge in that area would take place. That is why the problem to analyze a large enough class of algorithms actually leads to analyzing the corresponding formal notations. We have mentioned the choice of formal notations while discussing Turing machines.

The absence of the unique interpretation of an algorithm is usually accounted for by the incompleteness of its description. The algorithm designer pursues some specific goal, e.g. to obtain the problem solution with given accuracy; to guarantee sufficient execution speed; to exploit resources of prescribed type and amount; to express the algorithm via operations with rigidly determined properties, etc. But the entire set of conditions under which the designer's goal is achieved is rarely written down together with the algorithm itself. Most often many details are omitted and some conditions are tacitly assumed to hold. Now if that incomplete description is used by somebody or some machine that is guided by other conventions, the algorithm description actually acquires quite different meaning. The consequences of that discrepancy are not at all easy to track.

Here are a few typical examples. If mathematical notation is used to describe an algorithm, then as a rule the execution order of individual operations is not specified precisely. Sometimes it is not specified at all, as in our earlier example of finding the sum of several numbers. This means that the algorithm designer was not concerned about the consequences of different execution orders and left the decision to the end user. Accordingly, the exact execution order is chosen by a programmer who writes code based on mathematical formulas. He usually reasons as follows: "Since the algorithm designer did not specify the execution order, all execution orders of operations must be equivalent, and I can take any one of them following my own preferences". Such arguments can involve considerable danger from the computational point of view. True, all execution orders are equivalent from his point of view. The algorithm designer usually relies on such properties of operations as commutativity, associativity, and distributivity. They can allow all valid execution orders. Yet this equivalence is conditional. In this case there is no
need for a meticulous clarification of execution order, as the result would be one and the same under the above-listed conditions. Due to rounding errors, the implied properties of operations do not hold while the algorithm is executed by a computer. It follows that the contents of "equivalence" in the algorithm specification undergoes considerable change. Now the algorithms that were equivalent under exact arithmetics would manifest unlike properties as regards e.g. their numerical stability. If this circumstance is not taken into account then even an experienced user, well aware of the quirks of computer arithmetics, can overlook the instability of some particular algorithm, being quite sure that its stability was accurately studied and certified. The disregard of that phenomenon can produce incorrect results. In spite of this, mathematical formulas are quite often used to write computer programs without careful examination of their applicability.

Rounding errors unavoidably accompany computer-based research. It follows from our discussion that algorithm notations that are best-suited for our analysis are those that do not rely on any undeclared algebraic properties of operations. One of the goals of the development of algorithmic languages was to rule out the possibility of multiple interpretations of the notation. Existing languages always require the specification of the execution order, except for the cases where the independence of some operations is known a priori. Notice that virtually all the languages offer no clues as to the inner structure of operations. Unfortunately existing algorithmic languages do not provide the means to specify the exact methods of rounding-off. Due to different rounding-off, one and the same program would generate different intermediate data and results on different computers. This essentially implies that even existing algorithmic languages do not specify algorithms precisely. We have to get along with this, because no theory of rounding errors is elaborate enough to contribute to algorithmic language design. That is one of the main reasons why it is so difficult to write portable numeric software.

All in all, programs in algorithmic languages are good enough in that they chiefly supply the exhaustive descriptions of algorithms. However, an unexpected obstacle emerges: most algorithmic languages support but relatively
simple algorithm
notations.
Host e x i s t i n g languages were c o n c e i v e d Neumann c o m p u t e r execution execution
of
Moreover,
the
m o s t no
architectures.
t i m e shows l i t t l e
way
Since the
individual
d e p e n d s on
mutual
whether
languages
The
algorithmic that
been
knowledge.
" d u s t y deck"
shall
used
that
information
on
dern p a r a l l e l dependent
write
do
a
information
not
is a
and
to
refurbish
of
manually
languages.
and
a
memory
traditional amounts
of
portion
of
looms t h a t
we
small
Danger
data.
implemen-
via
enormous
even
task.
in a l -
share
indirectly
convenient,
of
operations.
traditional
accumulate
formidable
to dig
algorithmic
this
problem. the
I f we
cannot
tasks
sprout
it,
we
out
to
derive
mentations
of
algorithms
hope up
an
contrive we
mo-
achieve
to
exist-
somebody
extract
s h a l l have
the
s t r u c t u r e of
full from
i s , do
achieve
a
to
extensively
information sequential
success. Moreover, use
Can
we
sequential
algorithms
e x e c u t i o n order preserves the
to
it.
hope t o
of e x i s t i n g algorithm
i n d i v i d u a l o p e r a t i o n s ? I f the
expect
fixing
of
not
information,
must d e v e l o p c o n s t r u c t i v e
Another question
original
of
t i o n order of
if
that
notations,
not
can
use
that
to
re-
manually.
c h a n c e t o do
change
efficient
require
about such branches. Since
language
computational branches out
have t h e
not
P a r a l l e l computers would
incorporate
i t out.
The
do
simultaneous execution of data i n -
information
way
architectures
operations.
implies
explicitly
l o t of d i f f i c u l t
parallel
von-Neumann
branches.
the
from
most p r o g r a m s A
opens
program order
help faster
Is described
simple
d a t a dependence o f
computational
must f i n d
solve
To
architectures
programs
ally
subset
r e f l e c t e d i n the
is
traditional
maximum s p e e d w i t h o u t
we
some
von-
h a v e t o embark on I t . Recall
ing
from
worldwide
programs
the the
t h e s e o p e r a t i o n s exchange or
operations
arrangement
have
classic
architectures,
d a t a dependence does n o t
of
of
to accomplish a subset of operations
i t i s not
influence
references.
operations
on
t a t i o n of algorithms, The
those
era
o r none w h a t e v e r dependence on
time required
information
With
i n the
by
notations?
If
procedures
to
languages
fixing
answer i s y e s , on
possible
programs. algorithm in
this
e x i s t i n g sequential
On
extract
the
t h e n we
parallel the
actuexecu-
other
canimplehand,
s t r u c t u r e , then
case
the
algorithmic
we
opportunity languages
on parallel computers. This possibility is currently widely discussed.

It is very important while solving any kind of problem to understand its bottlenecks, or at least to recognize the factors that shape the bottlenecks. Parallel computers themselves do not create bottlenecks if we run sequential programs on them. Yet they highlight the insufficiency of sequential notations to guarantee the efficient implementation of algorithms.

We have mentioned that every description of an algorithm must include the declaration of memory and of the method to store the data. In the time when the first computers were built, the most popular way to specify an algorithm was to use mathematical formulas. In formulas memory manifests itself quite clearly: it is associated with the set of all used variables, including indexed variables. The notation used to specify every individual variable can be regarded as the description of a memory cell that is used to store the value of that variable. Mathematical formulas essentially tell us which variables have to be used to evaluate some other variable, and in precisely which way. In other words, they tell us to extract data from some memory locations, perform some manipulations, and then store the result into another memory location. It is this memory access scheme that is reflected in classic von-Neumann architectures and in traditional algorithmic languages. Decades later, with the advent of parallel computers it has become clear that it is not enough to specify where the required data are stored. To implement algorithms efficiently we must also explicitly specify what operations have impact on these data. It is the absence of this information that constitutes the narrowest bottleneck in the problem of using sequential programs on parallel computers. Consequently, the successful solution of many problems will depend on obtaining and storing the information about the interconnections of individual operations of algorithms.

To get closer to the answers to our questions let us consider some aspects of algorithm notations in more detail. Let us investigate various methods to compute
$$ y = (a_1 a_2 + a_3 a_4)(a_5 a_6 + a_7 a_8) \tag{2.1} $$

We can regard this expression as a statement of some algorithmic language. The execution order of operations of the right-hand side is not unique; indexing does not have anything to do with the structure of the expression. Actually (2.1) spawns several mathematically equivalent algorithms. We will lay out the computation in layers, assuming that operations within each level are executed simultaneously and levels are executed serially in one-by-one fashion. Suppose we have implemented three execution orders:

Method 1

    Input data   $a_1,\ a_2,\ a_3,\ a_4,\ a_5,\ a_6,\ a_7,\ a_8$
    Layer 1      $a_1 a_2,\quad a_3 a_4,\quad a_5 a_6,\quad a_7 a_8$
    Layer 2      $a_1 a_2 + a_3 a_4,\quad a_5 a_6 + a_7 a_8$
    Layer 3      $(a_1 a_2 + a_3 a_4)(a_5 a_6 + a_7 a_8)$

[Tables for Method 2 (Layers 1-4) and Method 3 (Layers 1-5): the same input data and the same operations, with at most two operations per layer.]
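The layer tables can be mimicked by a short Python sketch. It rests on assumptions that should be read as such: the form of (2.1) is taken to be y = (a1 a2 + a3 a4)(a5 a6 + a7 a8), the operation names `p1`..`p4`, `s1`, `s2`, `y` are hypothetical labels, and the four-layer and five-layer schedules are merely examples of two-processor orders with those layer counts, not necessarily the exact tables of Methods 2 and 3.

```python
import operator

# Each operation: name -> (function, left operand, right operand).
OPS = {
    "p1": (operator.mul, "a1", "a2"),
    "p2": (operator.mul, "a3", "a4"),
    "p3": (operator.mul, "a5", "a6"),
    "p4": (operator.mul, "a7", "a8"),
    "s1": (operator.add, "p1", "p2"),
    "s2": (operator.add, "p3", "p4"),
    "y": (operator.mul, "s1", "s2"),
}

# A schedule is a list of layers; operations in one layer run "simultaneously".
THREE_LAYERS = [["p1", "p2", "p3", "p4"], ["s1", "s2"], ["y"]]        # 4 processors
FOUR_LAYERS = [["p1", "p2"], ["p3", "p4"], ["s1", "s2"], ["y"]]       # 2 processors
FIVE_LAYERS = [["p1", "p2"], ["s1", "p3"], ["p4"], ["s2"], ["y"]]     # 2 processors


def run_schedule(layers, inputs):
    """Evaluate (2.1) layer by layer; fail if an operand is not ready yet."""
    values = dict(inputs)
    for layer in layers:
        produced = {}
        for name in layer:
            fn, left, right = OPS[name]
            # KeyError here means the schedule violates a data dependence.
            produced[name] = fn(values[left], values[right])
        values.update(produced)  # results become visible after the layer
    return values["y"]


def processors_needed(layers):
    """Width of the widest layer."""
    return max(len(layer) for layer in layers)


if __name__ == "__main__":
    data = {f"a{i}": 0.1 * i for i in range(1, 9)}
    for schedule in (THREE_LAYERS, FOUR_LAYERS, FIVE_LAYERS):
        print(len(schedule), processors_needed(schedule),
              run_schedule(schedule, data))
```

Because every schedule applies each operation to the same operands, the floating-point results agree exactly; only the number of layers and the number of processors differ.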
These tables give some insight into the structure of algorithms to compute (2.1). We can see that Method 1 produces the result in three steps but it requires four independent processors to carry out individual operations. Methods 2 and 3 use only two processors but they take longer to obtain the result. Notice that with Method 3 the unfortunate choice of execution order incurred a redundant step of computation.

Our tables are cumbersome and not suited for investigating algorithm structures, so we now substitute graphs for tables. We assume that graph nodes correspond to algorithm operations and input data, and arcs correspond to the links of operations. We place the nodes in layers and lay them out in accordance with tables; input data take layer 0. In this analysis the contents of individual operations is not important.

[Fig. 2.1]

The resulting graphs that correspond to Methods 1-3 are shown in Fig. 2.1. Notice that the graphs for Methods 1-3 are the same for any input data. More important for us is that the three pictures in Fig. 2.1 actually represent one and the same graph. The three graphs are isomorphic, i.e. there exists a one-to-one mapping of their
13 n o d e s t h a t a l s o maps a r c s of
roundoff
the
same
ters
errors
computer.
as w e l l ,
operations out
is
o n same
Our
important
The
structures.
formulas
involving
scribed
exact
how
same
have
or
we
could
to
store
safely
the operations
(2.1) that
by t h e parentheses i n ( 2 . 1 ) .
This
indicates
a few data
(2.1)
describe
the algorithm
some q u a n t i t i e s .
replaced invent
The i n p u t
regard
o f those
a ,..., a , y 1 8 a totally
as a sequence o f opera-
valid
explicitly variables
from
specified they
plementations
i s o f no
what
that
a a +a a
on which So
The memory
that
locations,
variables
of (2.1) are equivalent
2. 1 c o n -
i f we
carry
out
graphs
we
depend and w h i c h
that
a l l v a l i d im-
as f a r as t h e d a t a would
them i n
I t i s n o t immed-
algorithm
our variables clear
form
processing
i t makes
By b u i l d i n g
i t became q u i t e
the
S
dependencies
between
variables
a r e concerned.
Therefore
they
results
e v e n when r o u n d i n g e r r o r s
influence
the computation.
Consider
locations y e t we c a n
e t c . as d e s c r i b i n g
t o a memory l o c a t i o n . difference
we
1, 2 , . . . , 8, 9
specified,
the operations
of operations.
influence.
importance;
3 4 1 2 3 4
f r o m some memory
( 2 . 1)
sequence
values are stored.
i n ( 2 . 1 ) by i n t e g e r s
a a , a a ,
I t follows
i n r e t r i e v i n g data
clear
addresses
different notation.
12
3
locations.
bet-
d a t a and t h e r e s u l t a r e de-
results are not e x p l i c i t l y
the symbols
S
links
(2.1).
some w a y , a n d s t o r i n g t h e r e s u l t iately
f o r same
are carried
f o r a l l methods t o compute
F i g . 2. 1 m e r e l y
intermediate
required
another
results
O f c o u r s e , n o h y p o t h e s e s a r e made a s t o t h e compu-
representation
could
sist
Precisely
produce
b y t h e a d d r e s s e s o f memory c e l l s w h e r e t h e i r
I
on
be i d e n t i c a l o n d i f f e r e n t compu-
order defined
that would evaluate
The
identical results
e x a m p l e shows t h a t g r a p h s p r o v i d e a handy a p p a r a t u s t o e x p l o r e
algorithm
tions
t h a t even i n t h e presence
yield
property.
ween t h e o p e r a t i o n s . ter
would
computers
This holds
the execution
a very
these
numbers.
I t follows
methods w o u l d
The r e s u l t s
provided
i s not important.
preserve
t o arcs.
the three
yield
again our example of summing numbers. Let us
identical
investigate various methods to compute

$$\sum_{i=1}^{n} a_i. \qquad (2.2)$$

The exact addition of numbers obeys the laws of commutativity and associativity. Thus the result does not depend on the order of the terms. Obviously, for large $n$ the total number of orders to evaluate (2.2) would be quite large. Which order of execution shall we choose? They are all equivalent from the point of view of exact computation. With the influence of roundoff errors, a substantial difference in accuracy shows up. If we ignore the absolute values of $a_i$ while choosing the order of summing, then consecutive summing is the worst in accuracy. For $n = 8$, it looks like this:

Input data   a_1  a_2  a_3  a_4  a_5  a_6  a_7  a_8
Layer 1      a_1 + a_2
Layer 2      (a_1 + a_2) + a_3
Layer 3      (a_1 + a_2 + a_3) + a_4
Layer 4      (a_1 + a_2 + a_3 + a_4) + a_5
Layer 5      (a_1 + a_2 + a_3 + a_4 + a_5) + a_6
Layer 6      (a_1 + a_2 + a_3 + a_4 + a_5 + a_6) + a_7
Layer 7      (a_1 + a_2 + a_3 + a_4 + a_5 + a_6 + a_7) + a_8

The so-called doubling scheme is the best in accuracy. Here we depict it for $n = 8$:

Input data   a_1  a_2  a_3  a_4  a_5  a_6  a_7  a_8
Layer 1      a_1 + a_2      a_3 + a_4      a_5 + a_6      a_7 + a_8
Layer 2      (a_1 + a_2) + (a_3 + a_4)      (a_5 + a_6) + (a_7 + a_8)
Layer 3      (a_1 + a_2 + a_3 + a_4) + (a_5 + a_6 + a_7 + a_8)

It is a well-known fact that consecutive summing is worse than the doubling scheme by a factor of about $n / \log_2 n$. There is also an instructive difference in data dependencies between individual operations. The graphs in Fig. 2.2 correspond to consecutive summing and the doubling scheme, respectively. Each node stands for a single addition operation. Even though both graphs incorporate the same number of nodes and describe the evaluation of one and the same expression (2.2), they are not isomorphic.
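The two summation orders discussed above can be tried out with a small sketch. The function names `consecutive_sum` and `doubling_sum` are hypothetical and chosen only for this illustration; each returns the sum together with the number of layers it needed. The accuracy comparison against `math.fsum` in the demo is only indicative, since the observed errors depend on the data.

```python
import math
import random


def consecutive_sum(values):
    """Sum left to right: ((a1 + a2) + a3) + ...; uses n - 1 layers."""
    total = values[0]
    layers = 0
    for v in values[1:]:
        total = total + v
        layers += 1
    return total, layers


def doubling_sum(values):
    """Sum neighbouring pairs layer by layer (the doubling scheme)."""
    level = list(values)
    layers = 0
    while len(level) > 1:
        # Each layer adds disjoint pairs; these additions are independent.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried to the next layer
        level = nxt
        layers += 1
    return level[0], layers


if __name__ == "__main__":
    random.seed(0)
    data = [random.uniform(0.0, 1.0) for _ in range(1 << 16)]
    reference = math.fsum(data)  # correctly rounded reference sum
    for name, fn in (("consecutive", consecutive_sum), ("doubling", doubling_sum)):
        s, layers = fn(data)
        print(f"{name:12s} layers={layers:6d} abs error={abs(s - reference):.3e}")
```

For eight numbers the first function needs 7 layers and the second needs 3, matching the two tables.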
Isomorphic graphs always have identical critical path lengths (the critical path is the longest path in a graph). The first of our graphs has a critical path length of 7, while it equals 3 for the second one. The first graph indicates that consecutive summing has no parallel branches of computation. The second graph demonstrates that the doubling scheme contains a large number of data-independent operations.

Fig. 2.2
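The critical path lengths quoted above can be checked with a short sketch. It assumes a graph is stored as a dictionary mapping each addition node to the list of nodes whose results it uses; this representation and the names `consecutive_graph`, `doubling_graph` and `critical_path_length` are illustrative assumptions, not notation from the text. Input data are not counted as nodes, in line with the convention that each node stands for a single addition.

```python
from functools import lru_cache


def consecutive_graph(n):
    """Node k (k = 1..n-1) is the k-th addition; it uses the result of node k-1."""
    return {k: ([k - 1] if k > 1 else []) for k in range(1, n)}


def doubling_graph(n):
    """Additions of the doubling scheme, assuming n is a power of two.

    Nodes are (layer, index) pairs; layer 1 reads input data only.
    """
    graph = {}
    layer, width = 1, n // 2
    while width >= 1:
        for i in range(width):
            if layer == 1:
                graph[(layer, i)] = []
            else:
                graph[(layer, i)] = [(layer - 1, 2 * i), (layer - 1, 2 * i + 1)]
        layer += 1
        width //= 2
    return graph


def critical_path_length(graph):
    """Number of nodes on the longest path of an acyclic graph (node -> predecessors)."""

    @lru_cache(maxsize=None)
    def depth(node):
        return 1 + max((depth(p) for p in graph[node]), default=0)

    return max(depth(node) for node in graph)


if __name__ == "__main__":
    for name, g in (("consecutive", consecutive_graph(8)), ("doubling", doubling_graph(8))):
        print(name, "nodes:", len(g), "critical path:", critical_path_length(g))
```

Both graphs have 7 nodes for n = 8, yet their critical paths differ, which is one quick way to see that they cannot be isomorphic.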
Arcs in graphs represent data transfers in the process of algorithm execution. The graphs in Fig. 2.2 illustrate the crucial difference between the two schemes with respect to data transfers. Comparing
the graphs
second graph
That
means
that
i n Fig.
i n Figs.
2 . 1 a n d 2.2 we r e a d i l y n o t i c e
2.2 i s i s o m o r p h i c
the algorithms
they
scheme t o sum 8 n u m b e r s a n d e v a l u a t i o n tures, is
even
quite
have
a
rithms. have. be
different.
is the
that
algorithms
i n common,
a t least
They f e a t u r e
between
doubling
- have i d e n t i c a l
appearance o f formulas
I t i s clear
i n Fig. 2.1.
I.e. the
with
(2.1)
struc-
and ( 2 . 2 )
identical
as regards
graphs
the flow of
the execution.
For example,
drawback
cies
o f (2.1)
-
g r a p h s and t a b l e s a r e m e r e l y a n o t h e r n o t a t i o n
executed
The
t h e outward
l o t o f propert ies
data during Our
though
t o t h e graphs
represent
that
some p r o p e r t i e s
our tables
i n parallel. of tables operations
immediately
provided
that
explicitly
the formula specify
The f o r m u l a n o t a t i o n s I s that
they
poorly
on d i f f e r e n t by graphs.
exact s t r u c t u r e o f every operation,
lack
That
the other while
notations
which
reflect
layers.
On
t o express
algodo n o t
operations
may
that explicitness. the data kind
hand,
of
dependeninformation
tables
graphs lack
that
comprise lnforma-
16 tion
a l t o g e t h e r . However, t h a t
becomes n e c e s s a r y , include
links
between
individual whether
operations
tions,
operations.
must
be
executed
them.
a l l data
I ti s precisely this
tance f o re f f i c i e n t
movements t h a t
computers,
idea
naturally
algorithm notations
some
other
while
and i t I s t h i s
as a g e n e r a l
idea,
exe-
imporinforma-
L e t us use t h e
a l g o r i t h m graphs,
and t h e n
graphs t o analyze s t r u c t u r e s o f a l g o r i t h m s . This
be d i s p u t e d
opera-
implementa-
i s of vital
comes t o m i n d .
to build
we
simultaneously,
take place
information that
t h e data
of algorithm
a l l possible
the following
but i t s implementation
use
can hard-
i s f a r from
obvious. Almost
ficiently
immediately clear-cut
we
come
structure
arbitrary
picture
choice
that
up
of
h e a v i l y upon a f o r t u n a t e c h o i c e
to
or after
Now,
the o b t a i n e d
Yet,
before
use o f p a r a l l e l
specifies
the graph
t h e a l g o r i t h m i c language n o t a t i o n l a c k s .
being
can be expanded t o
c a n be e x e c u t e d
that
existing
An
explicitly
Using
some o p e r a t i o n s
t i o n s and f u l l y d e s c r i b e s
ly
nodes
e t c . The g r a p h o f a l g o r i t h m d e t e r m i n e s
cuting
tion
o f graph
representation of algorithms
determine
which
the description
I f i t
the relevant information.
Graph
can
i n f o r m a t i o n i s n o t a l w a y s needed.
against
graphs
difficulties.
i n Figs.
2 . 1 , 2.2
o f mapping o f g r a p h nodes o n t o
of that
mapping c o u l d
result
would n o t h e l p
us u n d e r s t a n d
the structure
the formulas
(2.1),
The
(2.2) o f f e r no h i n t s
In a wildly
suf-
depends a
plane.
intricate
o f t h e graphs.
a s t o how t h i s
problem i s
be s o l v e d . The
problem o f f i n d i n g
the best
m a p p i n g c a n c e r t a i n l y be s o l v e d i f
t h e number o f n o d e s i s s m a l l . We c o u l d lities. could
We
only
mention
this
because
just
look over
a l l the possibi-
f o r an a r b i t r a r y
p r o v e t h e o n l y method t o b u i l d and e x p l o r e
algorithm
i t s graph.
That
this
i s why
we s t r e s s t h e s m a l l n e s s o f t h e number o f n o d e s . As ber
a rule,
programs i n a l g o r i t h m i c languages d e s c r i b e
of individual
operations.
Moreover,
h a n d , a s i t d e p e n d s o n some p a r a m e t e r s . chance
to build
and e x p l o r e
i ti s often At t h i s
a l g o r i t h m graphs
a h u g e num-
n o t known
juncture
l o o k i n g over
there
beforei s no
a l l possible
variants. Actual
programs,
however,
are arbitrary
only
on
the
individual
17 statements
level.
The s t a t e m e n t s
As r e g a r d s t h e p r o g r a m the
algorithm
conciseness in
o f a program
the structure
t o be r a t h e r
t h e r e a r e no r e a s o n s
due t o l o o p s must
o f a l g o r i t h m s and t h e i r
Mathematical express
as a w h o l e ,
tend
reflection
notation,
algorithms.
but that
programs,
Directly
reflect
simple.
to believe
i t d e s c r i b e s does n o t possess any p e c u l i a r
exact nature o f t h a t
to
themselves
that
features.
The
i n some
way
Itself
g r a p h s . We do n o t know y e t t h e i s another matter.
schemes, g r a p h s
or indirectly,
c a n a l l be
those
notations
used imply
memory u s a g e t o s t o r e d a t a . The ways v a r i o u s n o t a t i o n s make u s e o f memory
may d i f f e r
rithm
significantly.
With
i s d e s c r i b e d as a s e t o f e q u a l i t i e s .
right-hand (taken
side
together
process would
o f any e q u a l i t y with
super-
(i.e.
I n t h e memory m o d e l t h a t will
ever
This
most
reducing
languages.
overall
memory
resulted
Our ing
discussion
algorithm
proves
structures
notation)
property
and
no
holds
h o w e v e r . The n e -
o f mathematical
requirements,
memory
i n t h e concept
11 a l l o w e d m u l t i p l e
m a k i n g a l g o r i t h m g r a p h b u i l d i n g much more d i f f i c u l t instead o f the mathematical
the algorithmic
t h e same
programs,
memory
which has r e p l a c e d t h e concept
programming
cells,
computer
and t h e variables
i n mathematical
by mathematical
Naturally,
an a l g o -
side
identical
or else
that
i s n o t t h e case w i t h
t o use s p a r i n g l y
"assignment"
i s implied
notation,
The l e f t - h a n d
not contain
I tfollows
get overwritten.
good f o r graphs. cessity
must
or subscripts],
n o t be s p e c i f i e d .
cells
in
the mathematical
"equality"
re-use at
of
o f memory
t h e same
i f we u s e a
time
program
notation. t h a t we c a n n o t e x p e c t
t h e problem o f e x p l o r -
t o b e s i m p l e . T h e r e f o r e we s h a l l
elaborate our
g o a l s a n d o u r means b e f o r e we e m b a r k o n i t s s o l u t i o n ,
3. Graph of Algorithm

A particular computer implementation of an algorithm inevitably requires that changes be made either in the algorithm as a whole, or in some of its fragments. Formally, this adjustment of algorithm to a particular computer means that the algorithm undergoes some transformation. This transformation may change some properties of the algorithm in an undesirable, or, worse, inadmissible way.
18 A l g o r i t h m s were always t a i l o r e d ities.
Yet
this
activity
vent
of
parallei
fact
that parallel
putation
gained
computers.
11
the
requirements
matter
will
quite
imperceptible. For
an
search. about tool ved
end
the
the
any
the
user
actual
result
would
must be
From
that
goal of
computational but
late
of
the
a l l allowable
only
from
results. We
may
the Now,
have
notation. erations
of
some
target
or
operations
the
machine,
Turing
of
the
target
interpret guously.
a
his
as
is
goals
can
i s why
re-
possible
to
result
That
use
the sol-
the be
should
acmain
quite
we
stipupreserve
modification
of
preserve
accuracy
an
grasp
the
the
and
exactly,
any
i s complete in
we
algorithm of
etc.)
a
via
some
that
can
is
language, e t c .
non-contradictory
the
corresponding
to
that
I f the
and
hypo-
execute
able
machine
op-
assumes
(possibly
such machines
that
hypothetical
designer
of
the
describe I t ?
algorithm
algorithm
existence
machine
a
do
an
i m p l i e s p e r f o r m i n g some
automaton,
hypothetical
the
how
s p e c i f i e s . Examples o f
notations
conduct
become
to a guaranteed
a
to
subject to
problem being
algorithms
algorithm
i n some a l g o r i t h m i c
machine
algorithm
of
that
Actually
(or
he
operations
programs w r i t t e n
secondary.
appropriate
explicitly)
machine
sequence o f
mathematical
such
our
wishes
of
approximate
e s s e n t i a l l y we
objects.
just
kind
the
results.
modifications
that
He
to
com-
properties
as
little
development. Other
s p e c i f i c a t i o n o f an
implicitly
thetical)
the
as
i s the nature of t h a t set
stated
Any on
(either
set
the
tool
obtaining
are
the
algorithms
shaped
a
the
the
Otherwise
know
tool.
transformations
choose
what
view
same t h e y
the guaranteed accuracy o f we
to
that
ad-
in
explore
transforming
vaguely
is just
D e p e n d i n g on
algorithms
important,
Thus
of
the
1ies
to
must p i n p o i n t
so
e i t h e r exact or
point
just
get
like
with
this
starting
computers.
can
computer
result.
curacy.
that
and
construction
the
we
preserved while
particular
user,
Ideally,
to obtain
of
ill-defined
swing
for
peculiar-
s p e c i f i c a t i o n of p a r a l l e l
before
detail
computer
large
reason
the
that
in full
t h a t s h o u l d be
be
primary
follows
structures of algorithms
fit
particularly
The
computers r e q u i r e
branches.
of algorithms
to f i t various
the
include perform executes
description then
i t must
language
unambi-
However, even a perfunctory description of a target machine is seldom included in algorithm specifications. Most often the target machine is not even mentioned. The algorithm designer wordlessly presumes that the algorithm notation itself determines the underlying target machine. As we have stated earlier, if such a description is used with different assumptions on the target machine, then misinterpretation can easily occur. We stress again that the differences in interpretation can lead to unpredictable consequences. The unspoken assumptions as to the target machine are the chief source of ambiguity in algorithm
notations. We h a v e machine priate
noted
i sfairly
that often
modifications.
an a l g o r i t h m used
on another
are
intended
made t o r u n them o n p a r a l l e l
programs
guish
special
that
formed upon m a t r i x rithm,
we h a v e
as w e l l
will
gorithm cell
computers, y e t attempts
transportation of
I t i s important
task
i s t o develop
thing
objects
It
these o b j e c t s i s what
an a l g o r i t h m
arestored
operations
r e s u l t s . The u s u a l
d o . I f t h e amount o f a v a i l a b l e must
allow
must
way t o d o t h i s
and i n be p e r -
the contents
variables.
s e to f operations
i s through these operations crucial point
o f this
on matrix
that
provide
then
the a l -
of a
memory
T h i s c o m p l i c a t e s t h e no-
entries
i s that
organization.
remains
the designer's goal
discussion
entries
any s u b s c r i p t -
memory i s l i m i t e d
f o r overwriting
by values o f other subscripted
The
to
e n t r i e s . H o w e v e r , i f we w i s h t o w r i t e down t h e a l g o -
I f memory c o n s i d e r a t i o n s a r e n o t i m p o r t a n t ,
theentire
and t h e
b o t h a r e m a t r i c e s . On
t a t i o n and imposes l i m i t a t i o n s on t h e c h o i c e o f s u b s c r i p t Yet
to distin-
designer
t o d e c i d e o n a p a r t i c u l a r way t o i d e n t i f y m a t r i x
designer
FOR-
machine.
and o u t p u t
important
as a l l Intermediate
subscripts. ing
t o another.
i t i s i m m a t e r i a l where
m a n n e r . The o n l y
appro-
c o m p u t e r s . We m u s t h a v e t h e p r o p e r d e -
t h e designer's Input
with
i sw r i t i n g
FORTRAN p r o g r a m s a r e i n -
requirements o f t h ealgorithm
inverse.
the d e s i g n stage what
one machine
thev i t a l
thematrix
formulas,
machines t o ensure t h e c o r r e c t
needs o f a p a r t i c u l a r t a r g e t
Suppose find
from
between
to a specific
machine, p o s s i b l y
t o be u s e d o n u n i p r o c e s s o r a l
scription of different our
tailored
A c h a r a c t e r i s t i c example o f t h i s
TRAN p r o g r a m s b a s e d o n m a t h e m a t i c a l herently
notation
unchanged.
i s achieved.
thedesigner's
goals
20 are
not reflected
i n algorithm notations
They a r e c a m o u f l a g e d and
i t s language
gorithmic is
of
creep
into
execution
order
and r e p e a t e d
should
be h a d i n m i n d , h o w e v e r ,
the execution
order
ideas
that
from
order
scription
as a h i n d r a n c e .
i f he t h i n k s t h a t
parallelism
and i s less
burdened
turn
By d o i n g
Generally
algorithm that
coating
Given following
originating
even w i t h
than
greater t h e FOR-
cloud
has to
t h e mathematical
some
important
somebody's d i s c o v e r i n g
reflected
i n particular
prothem,
properties
by i t . The q u e s t i o n
arises
while
by o u r preoccupation
are responsible
for
with
e f f e c t i v e im-
suppose
rules.
that
This
we c a n e x e c u t e
the algorithm
means t h a t some s e t o f o p e r a t i o n s
a n d f o r e v e r y o p e r a t i o n we c a n d e t e r m i n e , w h i c h
functions
i n some f i x e d
on t h evalues
characteristics
those
their
i t s a r g u m e n t s . Suppose t h a t a l l o p e r a t i o n s
depend o n l y
preserving the
o u t t o be p o s s i b l e . The
computers.
the input data, some p r e s c r i b e d
vector
that
machines
I tturns
i s determined
on p a r a l l e l
is f u l l y defined,
time
garbage
intention.
of the results.
o f algorithms
ations provide
no
c o n c e p t , a n d he
a n y a l g o r i t h m n o t a t i o n masks t h o s e
a r enot d i r e c t l y
accuracy
plementation
sults
thwart
i t n e v e r was h i s
o f the solution
properties
as
o r even
exec-
i t i s p o s s i b l e t o f r e e the a l g o r i t h m n o t a t i o n o f a l l t h e redun-
guaranteed nature
therigid
the a l g o r i t h m designer
orders
s o , he c a n i n v o l u n t a r i l y
speaking,
dedicated
t o t h e m a t h e m a t i c a l de-
by u n e s s e n t i a l
execution
o f the algorithm,
though, o f course,
whether
for a
t h e mathematical d e s c r i p t i o n o f f e r s
a set o f possible
notation. perties
issue
t o u s e a FORTRAN p r o g r a m t o
c o m p u t e r , he c a n r e g a r d
He c a n t h e n
TRAN p r o g r a m . Due t o a number o f r e a s o n s , specify
i t i s n o t always easy t o
o f t h e a l g o r i t h m i n s e a r c h o f a more p u r i f i e d
right
dant
With a l -
t h e c h a f f . The d e t e r m i n a t i o n
i s b y no means a s e c o n d a r y
solve h i s problem on a p a r a l l e l
of
form.
machine
o v e r w r i t i n g o f the contents o f
FORTRAN p r o g r a m m e r . Y e t i f somebody w i s h e s
is
pure
t h ea l g o r i t h m s p e c i f i c a t i o n .
the core o f the designer's
ution
original
cells.
It tell
that
i n their
idiosyncrasies of the target
l a n g u a g e s s u c h a s FORTRAN, ALGOL, e t c . t h e c h i e f i n t e r f e r e n c e
rigid
memory
by numerous
number
o f v a r i a b l e s , and t h e i r r e -
o f t h e s e v a r i a b l e s . Suppose a l s o
influence
oper-
c a n be v i e w e d
the output
of the algorithm.
that I t
i s
21 only
Important
i.e.
a l l the
that
a l l the
operations
completed. A l l i n a l l , rithm
the
set
of
arguments
that
we
have
for
these
postulate that
variables that
are
every
operation
arguments
as
output
an a l g o r i t h m
modified
be
i n the
process of
that
are performed d u r i n g the
the arguments f o r every Note
that
we
do
not
t y p e o f v a r i a b l e s and some a l p h a b e t ,
that (in
the
number
terms o f
T h u s we
do
impose
any
boolean values, of
substantial
o p e r a t i o n s . Our
i n s and
matrices,
outs
admit
the
operation of
summing n
Typically,
class
a l l v a r i a b l e s and
number o f
ferent
the
i s i m m a t e r i a l . For
of
numbers.
each o t h e r ,
i n which
ried out.
I f such classes
and
variables
basic
are
can
are
be
we
f o r given
tions. tion,
I f a we
out o f
link
having
are
an
We
l a c k one
of
data
could the
into
f e a t u r e s . The
sig-
to belong
variables
to the
difsame be
Just
an
of
or
how
real
operations
shall
identify
operation
corresponding
several
input
n e v e r u s e d . We or
the
data. of
ac-
say
numbers
are
differ
actually
that basic
car-
operations
defined.
possible as
nodes.
nodes w i t h
argument
nodes w i t h
an
these
opera-
f o r another
opera-
arc.
The
arc
goes
argument. conventions
arguments,
agree t h a t
that are a c t u a l l y p e r f o r -
graph
i s an
graph
t h e node t h a t p r o d u c e s the
There tion
input
result
vari-
a l g o r i t h m can
the
fixed,
is
become
additions, multiplications,
important manner
algorithm.
algorithm f a l l
operations
Consider the set of a l g o r i t h m operations med
an
example, a l l v a r i a b l e s o f
I t i s not
nor
of
some s p e c i f i c
d i f f e r e n c e between
a l l operations
fixed
numbers.
operations
possessing
assume
Is
i f n
t h i s o p e r a t i o n can
i s f o r v a r i a b l e s and
while
numbers and
from
classes
distinction classes,
divisions
numbers, y e t
numbers
the
letters
only
results]
t h e c h o s e n o p e r a t i o n s ) f o r each o p e r a t i o n o f an
not
on
numbers,
a r r a y s , e t c . We
i f i n p u t v a r i a b l e i s an a r r a y o f n
nificant
be
( i . e . a r g u m e n t s and
ceptable
small
operations
restrictions
v a r i a b l e s can
input v a r i a b l e s are
real
algo-
operation.
a b l e and
a
be
execution;
t h e c o r r e s p o n d e n c e w h i c h shows w h i c h r e s u l t s o f w h i c h
of
must
determines:
execution; the set of operations
are
ready,
or
producing
the r e l a t e d
Unless
f o r the
otherwise
arcs are
case o f the
result
either
specified,
an
we
operathat
is
nonexistent assume
that
22
all input/output goes via special i/o devices. In that case only those graph nodes that correspond to i/o operations will have no incoming/outgoing arcs, respectively. We do not require one-to-one correspondence between individual i/o data items and i/o operations. The number of such operations can vary to meet the demands of a particular situation. If the pictorial image of a graph becomes annoyingly cumbersome, we would drop some of our agreements to achieve better quality of illustrations. We will
often
treat
t a g a r c s and nodes
graph that
n i z i n g o p e r a t i o n s and d a t a Strictly dependence
speaking,
graph
data". Clearly, simplify
this
i t to just
belong
unwieldy t h e graph
term does n o t f u l l y object,
shall
graph
execution expression
recog-
should
be c a l l e d
i s unfit
" t h e data
to given
input
t o use. Therefore
o r algorithm
graph.
we
Though
r e v e a l t h e n a t u r e o f t h e u n d e r l y i n g mathemat-
see t h a t
i t adequately
t h a t we
reflects
introduced i s a directed
G=(V,E), C.
£ t h e s e t o f arcs o f t h e graph
graph
theory, c h i e f l y ,
assume t h e r e a d e r book
[80] e a s i l y
t h e essence o f
Since
i s familiar
t h e graph
We w i l l
with
of algorithm
links
basic
only fixes
To be more p r e c i s e , t h e r e
t a r g e t machine, s i n c e i t can i n f l u e n c e t y p e s . The r e l a t e d g r a p h
with
and such.
Thus
the target
We make a p o i n t
will
theory [ t h e
t h e a l g o r i t h m was
remain
but a
However,
the said
orig-
few t r a c e s o f
t h e c h o i c e o f o p e r a t i o n s and
parameters are the t o t a l most
often
a n d o p e r a t i o n s i s made b e f o r e a n y n o t a t i o n the algorithm.
from
t h e s e t o f e x e c u t a b l e op-
t o t h e t a r g e t machine f o r which
nected
We
between them, i t no l o n g e r c o m p r i s e s any i n f o r -
written.
o f nodes,
several facts
n o t i o n s o f graph
peculiar
types
m u l t i g r a p h . We
i s t h e s e t o f nodes
them].
inally
degrees
V
need
mation
data
acyclic
where
t h e t e r m i n o l o g y and a few b a s i c r e s u l t s .
covers
e r a t i o n s and d a t a
sent
thus
we
analysis. The g r a p h
the
classes,
corresponding
of algorithm,
make u s e o f t h e s t a n d a r d n o t a t i o n and
to different
the described
ical
we
a s v e c t o r s . Whenever n e c e s s a r y
transfers o f various kinds.
of algorithm
this
our
arcs
parameters
number o f nodes,
t h e choice
of
data
i s selected t o repre-
a r e n o t n e c e s s a r i l y con-
machine. that
t h e graph
of algorithm
i s the kernel of the
designer's idea as represented in the algorithm notation. The algorithm designer could well be aware of that kernel. Quite possibly, he had to mask it because the target machine language did not allow any adequate description of it. But in the vast majority of cases the algorithm designer is not aware of that kernel. Moreover, he doesn't even have the smallest idea of its existence. It is quite understandable since the graph of algorithm is really the informational kernel of it. Up to a short while ago, the knowledge of how exactly the information flows during algorithm execution had no practical value. No wonder nobody strove to acquire it. With the advent of parallel computers this knowledge turns out to be of paramount importance. The graph of algorithm comprises the relevant information explicitly.

By choosing to consider the graph of algorithm rather than the algorithm itself, we are free not to take into account any notation peculiarities, nor the idiosyncrasies of the target machine. After we have used a FORTRAN program to build the graph of algorithm, we are no longer bothered by the serial nature of that language, nor by the multiple re-use of memory cells. The most important thing is that consideration of that graph opens up the possibility to explore the entire set of algorithm implementations that are informationally equivalent. Basing on this research we will be able to select the best implementation for any particular computer, including parallel computers.

The graph of algorithm places the only restriction on the execution order, which reads as follows: before the execution of any operation may start, all the operations that provide its arguments must be completed. Strictly speaking, this is not a restriction, but rather the necessary and sufficient condition for the algorithm to be correct. Therefore the graph of algorithm defines a set of permissible execution orders. The structure of that set depends on the graph. However, all permissible execution orders possess a singularly important invariant. Namely, the effect of roundoff errors accumulation does not depend on the eventual choice of execution order. The following statement holds:

STATEMENT 3.1. Suppose that for all operations the roundoff errors are fully determined by argument values. Then for a given set of input data all algorithm implementations that correspond to one and the same
24 graph
yield
suits
identical
results,
do not depend
including
on execution
that
E ,
£ ....
be
pointed
0
1
where
Obviously,
o f execution
Certainly,
order,
depend o n e x e c u t i o n
originating
a l l nodes that
order
Then
tions.
we
I t follows
that
on e x e c u t i o n o r d e r The
plore
agreed
as
input
o f £j (
that
we
just
proved
from
produce
f o rE that
Re-
same r e -
operations roundoff
data
do
errors
the results of
f o r E-
operations
opens
notation
o f a l g o r i t h m graphs
t h e m s e l v e s . F r o m now o n , we w i l l graphs,
will
t o Ej
exists.
i = l , do n o t d e p e n d o n e x e c u t i o n o r -
results
the results
a r e now f r e e
the structure
rithms their
As we
from
may «
will
opera-
n o t depend
either.
statement
tunities.
those
subsets
nodes b e l o n g i n g
o f a r g u m e n t s . Now s u p p o s e t h a t
can regard
into
partitioning
a r e arguments
a n d we h a v e
nodes
and t h e nodes o f E
from
one such
the o p e r a t i o n s from £(, . . . , E ^ ,
all
n o d e s . Ex-
i n p u t nodes,
at least
input data
depend o n l y on t h e v a l u e s
der.
on t h e s e t o f graph
the s e t o f graph
only
t o o n l y by t h e arcs
Osi£Jc-l.
sults.
order
we p a r t i t i o n contains
The re-
0
gardless
not
order,
where E
errors.
order.
Graph a r c s d e f i n e a p a r t i a l ploiting
a l l roundoff
since
they
up wide
research
peculiarities,
we
opporcan ex-
instead o f dealing with
algo-
not distinguish algorithms
are equivalent
from
some
point
from
o f view. I n
p r a c t i c e an a l g o r i t h m i s seldom s p e c i f i e d b y i t s g r a p h and b u i l d i n g t h e graph
from
some o t h e r
algorithm notation
we assume t h i s p r o b l e m h a s b e e n s o l v e d We
have
mentioned
that
quirements of a p a r t i c u l a r adjustments
answer w h i c h
and
thus
to the
or i n part.
re-
These
they a r e those
the accuracy
o f the results
transformations that
the basic
preserve
v a r i a b l e s and b a s i c
opera-
unchanged).
now make
several
algorithm graph. F i r s t , ly
be a d j u s t e d
entirely
t r a n s f o r m a t i o n o f t h e a l g o r i t h m . Now we
o f a I g o r i thm ( a s s u m i n g
t i o n s remain Ue
an a l g o r i t h m must
transformations preserve
are permissible:
the graph
t a s k . F o r now,
i n some m a n n e r .
computer, e i t h e r
amount t o a c e r t a i n
can
i sa difficult
f o ra given
remarks
note
that
set o f input data.
bearing
on t h e i n t r o d u c e d
notion of
t h e graph o f a l g o r i t h m i s d e f i n e d onI f an a l g o r i t h m
i s described
as an
e x e c u t i o n s c h e d u l e f o r some t a r g e t m a c h i n e , t h e n a n y r e c o r d o f i t p r a c tically
always deals
with
input data
i n either
direct
or indirect
way.
25 For of
e x a m p l e , Gauss e l i m i n a t i o n thelinear
order
p i v o t i n g depends b o t h on t h e o r d e r
system b e i n g s o l v e d a n d o n t h e v a l u e s o f t h e p i v o t s . The
o f t h e system
i sspecified
lues o f t h ep i v o t s ditions
with
explicitly
depend on i n p u t
i n a FORTRAN p r o g r a m
data
on t h e input
indirectly.
( o r program
One may g e t t h e I m p r e s s i o n al
algorithms
branches. on
only,
that
We s t r e s s
operations.
that
o f operations.
on
data,
conditional Fourier es,
even
or
t o terminate
Sometimes
thealgorithm
the description
For example,
a s we
control
uncondition-
no
conditional
shall
graph does n o t depend
o f algorithm
may
process. graph,
contain f o r fast
a l o t o f conditional
see l a t e r ,
o f algorithm
t o t h e inner
a t y p i c a l FORTRAN p r o g r a m
includes
Conditional transfers occur fairly often, and conditional branches (in any algorithmic language) also depend, directly or indirectly, on input data. We do not impose any substantial restrictions on algorithms, though our study is aimed at algorithms whose graph does not depend on input data. Conditional branches are often used just to start some iteration of fixed order. In that case they do not affect much the structure of the graph, modifying it slightly along the borders of regions where graph nodes are situated. Strictly speaking, we must regard an algorithm containing conditional branches as a family of similar algorithms. It will be interesting to investigate the general structure of the graphs from such a family, though we will not pursue this issue further now. We can always take the union of all the graphs from a family and investigate the structure of that union. Of course, we shall obtain coarser results, but they often prove to be sufficient for practical purposes.

So we have plenty of reasons to begin by studying those algorithms whose graphs do not depend on input data. Besides, if we try to investigate all possible situations, we are risking to waste plenty of effort delving into structures that one hardly ever comes across in practice. For unconditional algorithms it is always possible to build the graph without executing the algorithm; for generic algorithms this is not always possible.

Choosing not to distinguish an algorithm from its graph, we will assume without further notice that all objects and dependencies that may
influence the implementation of the algorithm are reflected in the graph. For example, to take into account the overhead for interprocessor communication or memory traffic, we must include into the graph new nodes and arcs that reflect those operations. Their distinguishing feature is that they do not change any data but take time to complete.
4. Topological Sorting

Our research is aimed at the efficient implementation of algorithms on parallel computers. Determining which operations can be executed in parallel (i.e. simultaneously) is therefore one of our major tasks. We can point out the general idea on which our research will be grounded.

The execution of an algorithm must necessarily unfold in time. Given a particular implementation, we are able to determine which operations are executed at a given moment. Obviously, at any time an operation that is being executed could receive its arguments only from those operations that have been completed by this moment. Thus any implementation induces a well-defined sorting of its operations. The distinguishing feature of this sorting is that all operations are distributed among groups wherein the operations that belong to the same group are to be executed simultaneously, while the groups are executed consecutively. Generally speaking, the operations from different groups are connected in some way. It goes without saying that any sorting of algorithm operations induces the corresponding sorting of graph nodes and vice versa. This means that exploring the parallel structure of algorithms is closely related to studying the sortings just described.
So far we don't know anything about algorithm structure except for the fact that any algorithm can be described by a directed graph. Any process that unfolds in time has its beginning and end. Some algorithm notations, such as programs in serial algorithmic languages, offer conspicuous descriptions of the first and the last operation of the algorithm. In the graph of the algorithm, these operations correspond to the node that has no incoming arcs and to the node that has no outgoing arcs, respectively.
Suppose that an algorithm is defined by a graph, or that it was not built using those algorithm notations that explicitly specify the initial and terminal nodes. Nevertheless, at least one input node and one output node always exist, as the following statement shows.

STATEMENT 4.1. Any directed acyclic graph comprises at least one node to which no arcs point and at least one node from which no arcs originate.

The statement obviously holds for empty graphs (i.e. for those graphs that have no arcs at all). Suppose that the graph has at least one arc. Consider any critical path in the graph. Suppose that its starting node has an incoming arc. There are two possibilities in that case. If the preceding node belongs to the same critical path then we have found a circuit in our acyclic graph. If it is not one of the nodes of the critical path then we have found a path that is longer than the critical path. In both cases there is a contradiction to the assumptions of the statement. The existence of a node out of which no arcs depart is proved similarly.

Statement 4.1 does not supply any information on the structure of the graph. Nevertheless, it defines a partitioning of the graph nodes into three groups: the input nodes, the output nodes, and all the rest. Provided that the critical path length is not below 2, all three groups are not empty.

STATEMENT 4.2. Consider a directed acyclic graph of $n$ nodes. There exists an integer $s \le n$ such that all the nodes can be labeled with integers $\{1, 2, \ldots, s\}$ so that if an arc originates from a node labeled $i$ and points to a node labeled $j$ then $i < j$.

Choose any number of graph nodes that have no predecessors and label them with "1". Remove all labeled nodes and the arcs incident on them from the graph. The resulting graph is acyclic. Now we repeat the process, labeling any number of nodes that have no predecessors in the remaining graph with "2", etc. Since on each step at least one node is labeled, the total number of labels cannot exceed the number of graph nodes.
COROLLARY. No identically labeled nodes are connected by an arc.

COROLLARY. The minimal number of labels exceeds the critical path length by 1. For that number of labels there exists a labeling such that the maximum length of paths terminating in the node labeled by $k$ equals $k-1$.

The proof of 4.2 essentially yields a constructive method to find all critical paths.

A marking of nodes that satisfies 4.2 is called a topological sorting. It follows from the proof of 4.2 that there exists a topological sorting that utilizes precisely $s$ labels for all $s$ in the interval between the critical path length plus 1 and the total number of nodes.

Statement 4.2 still holds if we replace the inequality $i < j$ by $i \le j$. However, not one of the corollaries remains valid. We will refer to the corresponding sorting of nodes as the generalized topological sorting.

There are two reasons to introduce generalized topological sortings. As we shall see later, the set of generalized topological sortings is in a sense a natural closure of the set of topological sortings. Besides, generalized topological sortings are useful as intermediate objects during a non-trivial search for a specific topological sorting. Given a generalized topological sorting, we do not have to consider the entire graph once again. Agreeable results can be obtained by finding appropriate sortings for the groups of nodes of the generalized topological sorting.
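The labeling procedure from the proof of Statement 4.2 can be sketched in a few lines. The sketch below is only an illustration under our own assumptions: nodes are numbered 0..n-1, arcs are given as pairs (i, j), the function name `topological_labels` is hypothetical, and on every step we label all nodes without predecessors (the proof allows any non-empty subset of them).

```python
from collections import defaultdict

def topological_labels(n, arcs):
    """Label nodes 0..n-1 of a directed acyclic graph with 1..s so that
    every arc (i, j) leads from a smaller label to a larger one."""
    indeg = [0] * n                      # number of unlabeled predecessors
    succ = defaultdict(list)
    for i, j in arcs:
        succ[i].append(j)
        indeg[j] += 1
    label = [0] * n                      # 0 means "not labeled yet"
    current = [v for v in range(n) if indeg[v] == 0]
    step = 0
    while current:
        step += 1
        following = []
        for v in current:                # label every node with no predecessors
            label[v] = step
            for w in succ[v]:            # "remove" v and its outgoing arcs
                indeg[w] -= 1
                if indeg[w] == 0:
                    following.append(w)
        current = following
    if 0 in label:
        raise ValueError("the graph contains a circuit")
    return label

arcs = [(0, 2), (1, 2), (2, 3), (1, 4)]
labels = topological_labels(5, arcs)
print(labels)  # [1, 1, 2, 3, 2]
```

In this small graph the critical path 0 -> 2 -> 3 has length 2 and the sorting uses 3 labels, in agreement with the second corollary.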
[Fig. 6.4]

The graph of the algorithm is shown in Fig. 6.4. Each node of the graph corresponds to a macrooperation (6.8), that is, to the solution of a bidiagonal system, and the data being transmitted is a vector. The picture clearly shows that the graph has serial structure and admits no parallelization. An individual macrooperation cannot be parallelized either, since it, too, has serial structure. Only the matrix-vector computation of the right-hand sides of the bidiagonal systems can be parallelized. These facts suggest that there is no good parallelization of the above-mentioned algorithm and, hence, of the methods that solve these systems. However, such a conclusion would be premature.

Consider the algorithm for solving the block bidiagonal system. Using the notation we introduced earlier for matrix and vector entries, we have a FORTRAN-like notation of the algorithm:
51 DO 1 J m l , n 1
= 0
= 0 (6.9) -
u
2
CONTINUE
The dominant operation of (6.9) (essentially it is the only one) is

$$u_{ik} = b_{ik}^{-1}\,(f_{ik} - e_{i-1,k}\,u_{i-1,k} - d_{i,k-1}\,u_{i,k-1}). \qquad (6.10)$$

Here we assume that $e_{0k} = d_{i0} = 0$ for all $k, i$. This operation computes $u$ using $f, e, d, b, x, y$ by the formula $u = b^{-1}(f - ex - dy)$. We put initial zeroes as arguments for some operations.

Again, the straightforward analysis of (6.9) is no good. To build the graph, consider a rectangular grid with integer nodes on the $i,k$ plane for $1 \le i \le n$, $1 \le k \le m$. We put the graph nodes into all nodes of the grid. The node with coordinates $(i,k)$ corresponds to the operation (6.10). By the enumeration of operations, the operation with coordinates $(i,k)$ receives its arguments from the nodes $(i-1,k)$ and $(i,k-1)$. All other data required to perform the operation is input data. We omit the input nodes and the nodes that feed the output, assuming that the input data is not needed to execute any other operations. Unlike (6.1), the present graph does not broadcast input data.
The graph of the algorithm is shown in Fig. 6.5 for the case $n=5$, $m=9$. Despite our apprehensions the graph can be readily parallelized. The layers of the maximum parallel form are drawn in dashed lines. Clearly, the height of the algorithm equals $m+n-1$, and the width of the algorithm is $\min(m,n)$. The groups of nodes hemmed by dashed lines in Fig. 6.6 correspond to individual nodes of the graph in Fig. 6.4. They are also the layers of a generalized parallel form. This collection of drawings illustrates how an easily parallelizable algorithm can be turned into a non-parallelizable one by the unfortunate choice of macrooperations.

Merging operations is widely used while analyzing algorithm structures, since omitting superfluous details can greatly facilitate the investigation. This example warns us to exercise caution while merging operations, so as not to lose important information on algorithm structure.

Fig. 6.5        Fig. 6.6
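The height and width quoted for the grid graph can be checked with a short sketch. It assumes only the dependence pattern stated in the text, namely that node (i, k) receives arguments from (i-1, k) and (i, k-1); the function name `wavefront_layers` and the dictionary layout are our own choices for illustration.

```python
def wavefront_layers(n, m):
    """Distribute grid nodes (i, k), 1 <= i <= n, 1 <= k <= m, over the layers
    of the maximum parallel form. A node is placed one layer after the latest
    of its predecessors (i-1, k) and (i, k-1)."""
    layer = {}
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            preds = [layer[p] for p in ((i - 1, k), (i, k - 1)) if p in layer]
            layer[(i, k)] = 1 + max(preds, default=0)
    layers = {}
    for node, number in layer.items():
        layers.setdefault(number, []).append(node)
    return layers

layers = wavefront_layers(5, 9)
height = len(layers)                                   # number of layers
width = max(len(nodes) for nodes in layers.values())   # largest layer
print(height, width)  # 13 5
```

Every layer is an anti-diagonal i + k = const of the grid, which is why all operations of a layer are mutually independent.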
This example is noteworthy for one more reason. Notice that the graphs of the three previous examples have the following common property: the arcs originating from any node can be obtained by shifting those originating from the node with coordinates (1,1). We shall refer to such graphs as regular graphs. This regularity is the reason due to which the maximum parallel form, the height and the width of the algorithm were easily found. Later we shall see that regular graphs occur fairly often in practical problems and can in many respects be scrutinized exhaustively.

For now, we observe that the graphs of the algorithms under consideration did not depend on input data for a given size. Naturally, this circumstance facilitated graph building and exploration. We shall presently consider a different example.

EXAMPLE 6.4. Given two arrays of numbers $a_i$, $b_i$, $1 \le i \le n$, we wish to compute two numbers $a$, $b$ using the following program:

      a = 0
      b = 0
      DO 1 i = 1, n
      a = max(a, b) + a_i                                   (6.11)
      b = min(a, b) + b_i
    1 CONTINUE
If we do not know $a_i$ and $b_i$ beforehand then we cannot foretell whether $a$ or $b$ will be greater than the other for every $i$. Consequently, the graph of the algorithm cannot be independent of input data. We will build an expanded graph in such a way as to guarantee that the graph of the algorithm for any set of input data will be its subgraph. To do this, we will ignore the contents of the operations inside the loop. We assume that for all $i$ their arguments are $a, b, a_i$ and $a, b, b_i$. Actually, both numbers $a, b$ are required only to determine which of the formulas, e.g. $a = a + a_i$ and $a = b + a_i$, should be used to update $a$. After the choice has been made, either $a$ or $b$ is no longer required. Our expansion of the algorithm graph consists in ignoring this circumstance and assuming that both $a$ and $b$ are always used to compute the result.

The expanded graph for $n=6$, together with input and output nodes, is depicted in Fig. 6.7. It is easy to build. The layers of the maximum parallel form are shown in dashed lines. It is obvious that the parallel form of the algorithm graph for any given set of input data can be formed from the parallel form of the expanded graph by discarding a half of its arcs. Namely, either the crossing arcs or the non-crossing arcs remain between any adjacent layers. We cannot foresee which arcs remain unless the values of $a_i$, $b_i$ are known in advance.

Fig. 6.7

Of course, it is a very simple example. Yet our point here was to demonstrate the use of the expanded graph technique for algorithm exploration in the presence of conditional operations or conditional branching. Unfortunately, we cannot consider more complex examples. The only reason is that the resulting graphs can hardly be drawn on paper. This obstacle severely curbs the choice of examples where graphs are to be built.
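The dependence of the graph on input data in Example 6.4 can be made visible by recording which operand actually supplies each max and min. The sketch below is our own illustration: it assumes the two assignments of the loop are executed sequentially (so the second one sees the updated a), and the function name `run_example` and the trace format are hypothetical.

```python
def run_example(a_in, b_in):
    """Execute the loop of Example 6.4 and record, for every iteration, which
    of the running values a, b was really used by max(a, b) and min(a, b)."""
    a = b = 0
    trace = []
    for ai, bi in zip(a_in, b_in):
        src_max = "a" if a >= b else "b"   # operand that supplies max(a, b)
        a = max(a, b) + ai
        src_min = "a" if a <= b else "b"   # operand that supplies min(a, b)
        b = min(a, b) + bi
        trace.append((src_max, src_min))
    return a, b, trace

first = run_example([1, 1], [5, 0])
second = run_example([1, 1], [0, 0])
print(first)   # (6, 5, [('a', 'b'), ('b', 'b')])
print(second)  # (2, 0, [('a', 'b'), ('a', 'b')])
```

The two runs differ in the second iteration: one set of inputs realizes an arc from b into the update of a, the other an arc from a. Both arcs are present in the expanded graph, and each run keeps only one of them.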
Chapter 2
Algorithm Execution Time

The time required to execute an algorithm on a computer is one of its major efficiency characteristics. Of course, the execution time does not depend solely on the algorithm. It also depends considerably on the structure and the characteristics of the computer. The mutual influence of computer structure and algorithm structure on the execution time is rather complex and equivocal. The entire history of computational mathematics and computer technology development testifies to this. Some algorithms that were deemed effective on serial computers are often very ineffective on existing parallel computers. However, these same algorithms can become effective once again if new computers appear.

How can we assess the algorithm execution time, then? Clearly, direct measurements on a particular computer are not suitable. The comparison would yield different results on another computer. Therefore we must introduce some abstract method to compare algorithm execution times. In other words, we must compare implementations of algorithms on some abstract machine that should not depend on current hardware peculiarities, but should still reflect general trends in computer system design.

Serial computers have long ago created the well-defined procedure for comparison of algorithms with respect to execution time. This well-known procedure consists in comparing the operation counts that the algorithms require to obtain the solution to a given accuracy. This procedure permits us to compare algorithms designed for uniprocessor computers. It reflects the most important characteristics of serial implementations of algorithms reasonably well and hardly depends at all on hardware modifications. For all these reasons the operations count became an almost universally acceptable criterion of algorithm efficiency. Formally, we can treat the operations count as the time required to execute an algorithm on an abstract uniprocessor computer that performs every operation in unit time, while all other activities, such as data i/o, memory traffic, data transmitting via communication channels, etc., do not take any time at all. It is also assumed that operations are executed one-by-one without any breaks.
As we have demonstrated in §6, the operations count becomes altogether inadequate from both theoretical and practical viewpoints for comparing algorithm execution times on parallel computers. Consequently, the abstract serial machine that was used to compare algorithm execution times becomes inapplicable either. Now an appropriate abstract parallel machine is needed. We could use the graph machine described in §5. The fundamental difference between the abstract serial computer and the graph machine is that all algorithms are implemented on one and the same processor of the former, while only one algorithm can be implemented on the latter.

Of course, an infinite parallel machine can easily be introduced. Suppose that it has an infinite number of processors, each processor performs any operation in unit time, and all other work, including establishing the interprocessor communications, takes no time at all. We could readily use this machine to compare algorithm execution times on parallel computers. However, we should not rush to introduce such a machine. Before examining differences between implementations of various algorithms we must understand in which way various implementations of one and the same algorithm differ. The graph machine makes a quite suitable model for the latter task, since it firmly establishes the set of processors, their types, and the communication network. Moreover, the comparison of implementations is much easier on the graph machine compared with the abstract parallel machine. Having studied the problem with the help of the graph machine, we can proceed to the abstract parallel machine, i.e. use the abstract parallel machine as our model of a parallel computer.

In this chapter we set out to explore the set of schedules for the graph machine. Our principal goal is to investigate the set of minimums of the functional that represents algorithm execution time. The structure of that set is trivial for serial implementations: it consists of those implementations for which there is no idle run. Since eliminating idle runs does not change the execution order, the set of minimums of the time functional is essentially the same as the set of all implementations. As regards parallel implementations, we cannot eliminate all idle runs for all processors. The only thing that can be safely assumed is that at any moment at least one of the processors is busy executing some algorithm operation. Therefore with parallel implementations, the structure of the set of minimums of the time functional is rather complicated algebraically.
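The difference between the two abstract machines discussed in this introduction can be illustrated numerically. The sketch below is only an illustration under the stated idealizations (unit-time operations, free communication, unlimited processors); the function name `execution_times` and the graph encoding are our own assumptions.

```python
def execution_times(n, arcs):
    """Return (serial_time, parallel_time) for a DAG of n unit-time operations.
    serial_time:   one processor executes operations one by one, so it is n.
    parallel_time: unlimited processors and free communication, so it is the
                   number of nodes on the longest path of the graph."""
    preds = {v: [] for v in range(n)}
    for i, j in arcs:
        preds[j].append(i)
    finish = {}

    def finish_time(v):
        # an operation finishes one time unit after its latest argument is ready
        if v not in finish:
            finish[v] = 1 + max((finish_time(p) for p in preds[v]), default=0)
        return finish[v]

    return n, max((finish_time(v) for v in range(n)), default=0)

# Two ways to add eight numbers with seven additions:
tree = [(0, 4), (1, 4), (2, 5), (3, 5), (4, 6), (5, 6)]   # pairwise summation
chain = [(i, i + 1) for i in range(6)]                     # running sum
print(execution_times(7, tree))   # (7, 3)
print(execution_times(7, chain))  # (7, 7)
```

Both algorithms have the same operations count, yet their times on the idealized parallel machine differ, which is the reason the operations count alone is not an adequate criterion here.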
7. Vector Properties of Schedules

Before we begin to study the properties of schedules, it is necessary to obtain the mathematical relations describing all schedules as a set of vectors. We begin with the set $R_\omega$ of all schedules corresponding to a given delay vector $\omega$. Our preference for $R_\omega$ is justified by the fact that various choices of the vector $\omega$ allow us to distinguish particular kinds of schedules. For example, $\omega = 0$ defines the set of generalized schedules. Other cases will be considered later.

At this point we make a refinement. We have introduced the notation $\omega_{ij}$ for the delay on the arc connecting the $i$th and the $j$th nodes. This notation is correct solely in case only one arc connects those two nodes. In general, two nodes can be connected by more than one arc. In that case some additional tag should be attached to the notation in order to distinguish the arcs. We will not do this, in order to keep our notation simple. That amounts to the assumption that any two nodes can be linked by not more than one arc. That assumption does not add any essential constraints.

We have mentioned that not every vector $t$ can be a schedule. For the vector $t$ to be a (generalized) schedule, its components must define a valid execution order of operations. Given an algorithm, we observe that the restrictions on a schedule are imposed only by the delay vector; the specification of some delay vector is also equivalent to the description of the restrictions of the implementation. If the vector $t$ is a schedule, we can assume that the operations are performed instantly at the moments defined by the schedule components.
STATEMENT 7.1. Let $\omega$ be a given delay vector. The following condition is necessary and sufficient for the vector $t$ to be a schedule corresponding to $\omega$: if an arc originates from the $i$th node and points to the $j$th node then the inequality

$$t_j - t_i \ge \omega_{ij} \qquad (7.1)$$

must hold, where $\omega_{ij}$ is the component of the delay vector corresponding to that arc.

If an arc goes out of the $i$th node into the $j$th node then one of the results of the $i$th operation is an argument for the $j$th operation. The specification of a nonnegative vector $\omega$ implies that the time interval between the two operations must be greater than or equal to $\omega_{ij}$. This is precisely the condition expressed by (7.1). Thus the necessity is merely a reformulation of the fact that the schedule $t$ corresponds to the delay vector $\omega$. To prove the sufficiency, consider a canonical parallel form of the algorithm graph. Given a vector $t$, we can always execute the operations of the first layer, since the delay vector $\omega$ does not impose any restrictions on the moments at which they can be executed. Suppose that for given $t$ and $\omega$ we can execute all operations up to the $k$th layer. It follows from the way the vector $\omega$ is specified that if (7.1) is satisfied, the $(k+1)$th layer can be executed, and so on, until all algorithm operations are executed. The specification of $\omega$ does not impose any constraints on $t$ except for (7.1). Therefore (7.1) fully defines the set of schedules $R_\omega$, so the set of schedules is well defined by a delay vector.

While investigating schedules, we will use the notation $\{\cdot\}_s$ to denote the $s$th component of a vector whenever our formulas get too involved. The index expression may consist of more than one index, if the vector components are indexed that way. As a rule, the schedule vector components have one index, while the delay vector components have two indices.

STATEMENT 7.2. Let vectors $u, v$ be schedules corresponding to delay vectors $\tau, \nu$. Then the vector $u+v$ is a schedule corresponding to the delay vector $\tau+\nu$, and for any $\lambda \ge 0$ the vector $\lambda u$ is a schedule corresponding to the delay vector $\lambda\tau$.
For all valid pairs $i,j$ the inequalities (7.1) hold for the vectors $u$ and $v$:

$$u_j - u_i \ge \tau_{ij}, \qquad v_j - v_i \ge \nu_{ij}.$$

Therefore

$$\{u+v\}_j - \{u+v\}_i = (u_j - u_i) + (v_j - v_i) \ge \tau_{ij} + \nu_{ij} = \{\tau+\nu\}_{ij},$$

$$\{\lambda u\}_j - \{\lambda u\}_i = \lambda (u_j - u_i) \ge \lambda\tau_{ij} = \{\lambda\tau\}_{ij},$$

i.e. inequalities analogous to (7.1) hold for the vectors $u+v$ and $\lambda u$ with the corresponding delay vectors.

This Statement establishes the relationship between schedules corresponding to different delay vectors. However, we are more interested in studying the set of schedules corresponding to one and the same delay vector. The following statement is almost trivial.

STATEMENT 7.3. The set $R_0$ of all generalized schedules is a linear cone.

Certainly, let $u$ and $v$ be generalized schedules. The following inequalities hold for all valid pairs of indices $i,j$:

$$u_j - u_i \ge 0, \qquad v_j - v_i \ge 0. \qquad (7.2)$$

According to Statement 7.2, inequalities analogous to (7.2) are valid for the components of $u+v$ and $\lambda u$, where $\lambda \ge 0$ is arbitrary. It follows that the vectors $u+v$ and $\lambda u$ are generalized schedules, so the set $R_0$ is a linear cone.

COROLLARY. The set $R_0$ of generalized schedules is convex and closed.

If $u$ and $v$ are generalized schedules then the inequalities (7.2) hold. Therefore for any $\lambda$, $0 \le \lambda \le 1$, and with $\omega_{ij} = 0$ for generalized schedules, we have

$$\{\lambda u + (1-\lambda)v\}_j - \{\lambda u + (1-\lambda)v\}_i = \lambda(u_j - u_i) + (1-\lambda)(v_j - v_i) \ge \lambda\omega_{ij} + (1-\lambda)\omega_{ij} = \omega_{ij},$$
i.e. $\lambda u + (1-\lambda)v$ is a generalized schedule and $R_0$ is convex. Suppose that a vector $z$ is an accumulation point of $R_0$. It means that there exists a sequence of generalized schedules $z^k$ that converges to $z$. Then

$$z_j - z_i = \lim_{k\to\infty} z^k_j - \lim_{k\to\infty} z^k_i = \lim_{k\to\infty} (z^k_j - z^k_i) \ge 0,$$

i.e. $z$ turns out to be a generalized schedule.

The set of generalized schedules is a cone only due to the fact that it corresponds to the zero delay vector. If a vector $t$ is a schedule for the delay vector $\omega$ then it would still be a schedule for all nonnegative delay vectors whose components do not exceed those of $\omega$. This is proven by the fact that the inequalities (7.1) hold good as $\omega_{ij}$ decreases. In particular, since the components of delay vectors are nonnegative, we conclude that, in the notation of Statement 7.2, the inequalities

$$\{\tau+\nu\}_{ij} \ge \tau_{ij}, \qquad \{\lambda\tau\}_{ij} \ge \tau_{ij}$$

hold for $\lambda \ge 1$ and all valid pairs $i,j$. Taking this into account and using Statement 7.2, we obtain

STATEMENT 7.4. The set of schedules corresponding to a given delay vector possesses the following properties:
the sum of schedules is a schedule;
the product of a schedule and a number $\lambda \ge 1$ is a schedule;
the set of schedules is convex and closed.

All proofs are similar to the proofs of Statements 7.2 and 7.3. Prove, for example, the convexity. We have

$$u_j - u_i \ge \omega_{ij}, \qquad v_j - v_i \ge \omega_{ij}.$$

For any $\lambda$, $0 \le \lambda \le 1$, we find that

$$\{\lambda u + (1-\lambda)v\}_j - \{\lambda u + (1-\lambda)v\}_i = \lambda(u_j - u_i) + (1-\lambda)(v_j - v_i) \ge \lambda\omega_{ij} + (1-\lambda)\omega_{ij} = \omega_{ij},$$

i.e. the set of schedules for a given delay vector is indeed convex.

For a given non-zero delay vector $\omega$ the set of schedules $R_\omega$ is no longer a cone. However, for any $\omega$ the schedules from $R_\omega$ belong to the set of generalized schedules $R_0$.
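Condition (7.1) and the properties listed in Statement 7.4 are easy to try out on a small graph. The sketch below is a numerical illustration only: the three-node graph, the delay values, and the helper name `is_schedule` are our own assumptions and do not come from the text.

```python
def is_schedule(t, omega):
    """Condition (7.1): t[j] - t[i] >= omega[(i, j)] for every arc (i, j)."""
    return all(t[j] - t[i] >= w for (i, j), w in omega.items())

# Hypothetical graph with arcs 0->1, 1->2, 0->2 and their delays.
omega = {(0, 1): 2, (1, 2): 1, (0, 2): 1}
u = [0, 2, 3]
v = [1, 4, 6]

total = [x + y for x, y in zip(u, v)]              # sum of two schedules
double = [2 * x for x in u]                        # product with lambda = 2
mix = [0.5 * x + 0.5 * y for x, y in zip(u, v)]    # convex combination
half = [0.5 * x for x in u]                        # product with lambda = 0.5

print(is_schedule(u, omega), is_schedule(v, omega))   # True True
print(is_schedule(total, omega))                      # True
print(is_schedule(double, omega))                     # True
print(is_schedule(mix, omega))                        # True
print(is_schedule(half, omega))                       # False
```

The last line shows why Statement 7.4 requires $\lambda \ge 1$: shrinking a schedule can violate a non-zero delay, so $R_\omega$ is not a cone.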
In spite of the fact that $R_0$ includes all the sets $R_\omega$, it also, in a sense, is included in each $R_\omega$.

STATEMENT 7.5. Let $t$ be a given schedule from $R_\omega$. Then for any generalized schedule $u$ the vector $t+u$ is also a schedule from $R_\omega$.

Certainly, for all valid pairs $i,j$ we have

$$t_j - t_i \ge \omega_{ij}, \qquad u_j - u_i \ge 0.$$

Using these inequalities, we obtain

$$\{t+u\}_j - \{t+u\}_i = (t_j - t_i) + (u_j - u_i) \ge \omega_{ij}.$$

Thus, (7.1) holds for the vector $t+u$, and this vector is therefore a schedule from $R_\omega$.

We see that the cone of generalized schedules, when shifted to an arbitrary schedule from $R_\omega$, is a subset of $R_\omega$ for all $\omega$. It is natural to try and find out on what condition the sets $R_\omega$ and $R_0$ are identical up to a parallel translation.

STATEMENT 7.6. Suppose the system of linear algebraic equations

$$p_j - p_i = \omega_{ij} \qquad (7.3)$$

is compatible and the vector $p$ is its solution. Then the set $R_\omega$ is identical to the set $R_0$ shifted by the vector $p$.

Since (7.3) holds, $p$ is a schedule from $R_\omega$. By Statement 7.5, the set $R_0$ shifted by the vector $p$ is contained in $R_\omega$. It remains to prove that any schedule from $R_\omega$ can be represented by a generalized schedule plus a shift by the vector $p$. Take any schedule $t$ from $R_\omega$ and consider the vector $t-p$. We have

$$t_j - t_i \ge \omega_{ij}, \qquad p_j - p_i = \omega_{ij}.$$

Subtracting these relations, we conclude that

$$\{t-p\}_j - \{t-p\}_i = (t_j - t_i) - (p_j - p_i) \ge 0,$$

i.e. the vector $t-p$ is a generalized schedule.
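Statement 7.6 can be tried out numerically by solving (7.3) directly. The sketch below is one possible way to do it and is not taken from the text: it propagates values of p over the arcs in both directions and reports incompatibility when two routes disagree. The names `solve_shift` and `is_schedule` and the sample delays are our own assumptions.

```python
from collections import defaultdict, deque

def is_schedule(t, omega):
    """Condition (7.1) for every arc (i, j)."""
    return all(t[j] - t[i] >= w for (i, j), w in omega.items())

def solve_shift(n, omega):
    """Try to solve p[j] - p[i] = omega[(i, j)] for all arcs (system (7.3)).
    Returns a solution with p = 0 at one node of every connected component,
    or None when the system is incompatible."""
    adj = defaultdict(list)
    for (i, j), w in omega.items():
        adj[i].append((j, w))     # along the arc:   p[j] = p[i] + w
        adj[j].append((i, -w))    # against the arc: p[i] = p[j] - w
    p = [None] * n
    for start in range(n):
        if p[start] is not None:
            continue
        p[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y, w in adj[x]:
                if p[y] is None:
                    p[y] = p[x] + w
                    queue.append(y)
                elif p[y] != p[x] + w:
                    return None   # two routes give different values
    return p

omega = {(0, 1): 2, (1, 2): 1, (0, 2): 3}
p = solve_shift(3, omega)
t = [0, 3, 5]                                   # some schedule from R_omega
shifted = [x - y for x, y in zip(t, p)]         # t - p
zero = {arc: 0 for arc in omega}
print(p)                                        # [0, 2, 3]
print(is_schedule(shifted, zero))               # True
print(solve_shift(3, {(0, 1): 2, (1, 2): 1, (0, 2): 1}))  # None
```

In the first graph the delays along the two routes from node 0 to node 2 agree (2 + 1 = 3), so (7.3) is compatible and t - p is a generalized schedule; in the second they disagree and no shift vector exists.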
The compatibility of the system (7.3) provides a sufficient condition for the sets $R_\omega$ and $R_0$ to be identical up to a shift. Now we inquire what necessary conditions are workable. Suppose that $R_\omega$ is generated from $R_0$ via a shift by a vector $s$. What properties are inherent to the schedule $s$?

For any schedule $t$ in $R_\omega$ the vector $t-s$ is a generalized schedule. Consider any pair of nodes $i,j$ connected by an arc that goes out of the $i$th node into the $j$th node. Due to the fact that $t-s$ is a generalized schedule, the inequality

$$\{t-s\}_j - \{t-s\}_i \ge 0$$

holds. Hence, $t_j - t_i \ge s_j - s_i$. Since $R_\omega$ is closed, there exists such a schedule that $t_j - t_i$ achieves its minimum over $R_\omega$ for the given arc. Since $s$ itself belongs to $R_\omega$, this minimum equals $s_j - s_i$. Thus we have obtained

STATEMENT 7.7. Suppose the set $R_\omega$ is generated from $R_0$ via a shift by the vector $s$. Then for all pairs $i,j$ corresponding to arcs of the algorithm graph the minimum value of $t_j - t_i$ over all schedules $t$ in $R_\omega$ equals $s_j - s_i$.

Thus, if the sets $R_\omega$ and $R_0$ are identical up to a shift by $s$, the components of the schedule $s$ possess a very important special property. Namely, they satisfy the inequalities (7.1) "with minimal gaps", i.e. making each of those inequalities as close to an equality as possible. If the system (7.3) is compatible then all inequalities (7.1) actually become equalities for the schedule $s$.

We have mentioned in §5 that during the implementation of an algorithm the delay vector is specified in a special way. If some result produced by the $i$th operation is not consumed by the $j$th operation immediately after the delay $\omega_{ij}$ has elapsed then it must be stored somewhere. Thus the residual $t_j - t_i - \omega_{ij}$ can be regarded as the time of extra, unplanned storage of both input and intermediate data. Similar treatment is also valid for an arbitrary delay vector $\omega$. In that case the residuals $t_j - t_i - \omega_{ij}$ characterize the discrepancy between the schedule $t$ and the delay vector $\omega$, that is, the idle run times caused by $t$ and $\omega$ being "ill-fitting". Statement 7.7 asserts that the schedule $s$ minimizes each of those times. If the system of equations (7.3) is compatible then the corresponding schedule $p$ directs a special mode of algorithm execution. Every data item, whether it is an input or an intermediate value, is used right after it is produced and the delay defined by the vector $\omega$ has elapsed; no additional wait occurs.
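The residuals discussed above are straightforward to compute. The following sketch is illustrative only: the helper name `residuals` and the sample graph are our own assumptions, with the delays chosen so that the system (7.3) is compatible.

```python
def residuals(t, omega):
    """Residual t[j] - t[i] - omega[(i, j)] for every arc (i, j): the extra
    time a result of operation i waits before operation j consumes it."""
    return {(i, j): t[j] - t[i] - w for (i, j), w in omega.items()}

omega = {(0, 1): 2, (1, 2): 1, (0, 2): 3}
t = [0, 3, 5]     # an arbitrary schedule for omega
s = [0, 2, 3]     # solves p[j] - p[i] = omega[(i, j)] on every arc

print(residuals(t, omega))  # {(0, 1): 1, (1, 2): 1, (0, 2): 2}
print(residuals(s, omega))  # {(0, 1): 0, (1, 2): 0, (0, 2): 0}
```

For the schedule s every residual vanishes, which corresponds to the mode of execution where each data item is consumed as soon as its delay has elapsed.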
Given a delay vector $\omega$, suppose that some nodes and arcs of the algorithm graph form a cycle. In that case we set the direction to traverse the cycle. We assume that an arc is traversed in the positive direction if we encounter its initial node first, and in the negative direction otherwise. We refer to $\omega_{ij}$ as the directed delay of the $ij$th arc traversed in the positive direction and to $-\omega_{ij}$ as the directed delay of the same arc in the negative direction. Suppose that the $ij$th arc is traversed $r_{ij}$ times in the positive direction and $b_{ij}$ times in the negative direction. We refer to the sum $\sum (r_{ij} - b_{ij})\,\omega_{ij}$, taken over all $i,j$ corresponding to the arcs of the cycle, as the directed delay of the cycle. The cycle is called balanced if its directed delay is zero.

The system of linear algebraic equations (7.3) is compatible if and only if the algorithm graph has no cycles at all or all cycles of the graph are balanced.

Suppose that (7.3) is compatible and $p$ is its solution. If the algorithm graph has cycles, take any one of them and set the traverse direction. With every traverse of the $ij$th arc in the positive direction we associate the equality $p_j - p_i = \omega_{ij}$, and with every traverse of that arc in the negative direction the equality $p_i - p_j = -\omega_{ij}$. Sum up all the equalities in accordance with the cycle traverse order. The right-hand side of the sum will be the directed delay of the cycle, while the left-hand side will be zero. Certainly, for every node of the cycle the value $p_i$ will always be included in the equalities of the neighboring arcs with both "+" and "-" signs. During the summation, these $p_i$ will annihilate.

Now suppose that the graph of the algorithm either has no cycles, or all its cycles are balanced. Without losing generality, we can assume the graph to be connected. Take any node and ascribe some value $p_i$ to it. Suppose a connected subgraph is built with such values $p_i$ ascribed to its nodes that the equations (7.3) are satisfied. Add one new arc such that at least
STATEMENT 7 . 8 .
which
i t s o r i g i n and t h e n i t s end p o i n t ;
i n the negative
u
tr^j-bj/) jj
zero d i r e c t e d delay
ced
n o d e . Now
the arc I s traversed i n the negative d i r e c t i o n .
sum o f v a l u e s
arcs
some s e q u e n c e o f g r a p h
and t h e l a s t
during a cycle traverse the i j t h
the
influences the
determine,
sequence e v e r y n e i g h b o r i n g nodes a r e c o n n e c t e d b y
arc, as w e l l
otherwise
circumstance
ensure t h e c o m p a t i b i l i t y o f (7.3).
v e c t o r u, choose
an
tive direction
this
T h e r e f o r e we w i l l
value ascri-
f o r a l l arcs o f
one o f i t s t e r m i n a l
nodes belongs to
t o the b u i l t
t h e subgraph,
unique
value
Let sumed
then that
subgraph.
(7.3),
taken
I f t h e o t h e r node d o e s n o t b e l o n g f o r t h e added
should correspond
t h e o t h e r node
o u r subgraph
belong
t o t h e o t h e r node.
t o t h e subgraph,
t o be c o n n e c t e d ,
some c y c l e s t o g e t h e r w i t h
arc, determines the
t o o . Since
t h e added
we h a v e a s -
a r c has t o
some o f t h e a r c s o f t h e s u b g r a p h .
constitute Take any o f
s u c h c y c l e s a n d s e t t h e t r a v e r s e d i r e c t i o n i n s u c h a way a s t o t r a v e r s e the of
added a r c i n t h e p o s i t i v e the origin
d i r e c t i o n . Suppose
o f t h e added a r c , 1 -
that
r i s t h e number
t h e number o f i t s end p o i n t .
c o r r e s p o n d i n g v a l u e s p^. a n d P j m u s t h a v e b e e n d e f i n e d verse a l l arcs o f the cycle, the
r t h node.
Again
positively directed -WJJ w i t h
every
equalities.
sides
will
directed
Remembering
that
t h e sum o f r i g h t - h a n d
tion
(7.3) holds f o r t h e r l t h arc. Going
tions
on w i t h
(7. 3) w i l l
COROLLARY. is
vector
the cycle
be
o f adding
I f
satisfied.
Ue
new
have
algorithm
then
graph
i t s solutions
is
we
graph
we
will
a
find
t h e equa-
that
obtained
sura u p
left-hand
i s balanced,
arcs,
thus
the compatibility of that the
t h e sum o f
every
p;-p . •
a r c . Now
be - M J . C o n s e q u e n t l y ,
sides w i l l
t h e process
compatible,
(1,1
This to
of that
s u c h v a l u e s p^ t o a l l nodes o f a l g o r i t h m
(7.3), which proves
(7.3)
a r c , and t h e e q u a l i t y
traverse
Tra-
and f i n i s h i n g a t = fa.. , w i t h
p.-p.
By t h e same a r g u m e n t a s a b o v e ,
be P r ~ P j •
that
ascribe
a t t h e J t h node
traverse of the i j t h
negatively
these
starting
we a s s o c i a t e t h e e q u a l i t y
The
previously.
finally
a l l equa-
solution
to
system. connected
occupy
the
and line
the
system
directed
by
the
1).
is proven by the fact that for a connected graph the solution of (7.3) is determined uniquely once any of its components $p_i$ is fixed, and the sum of any solution with the vector $(1, 1, \ldots, 1)$ is again a solution of (7.3).

For any algorithm graph the set of delay vectors $\omega$ that make all cycles of the graph balanced is easy to describe. If all cycles are balanced then (7.3) is compatible and there exists a generalized schedule that is its solution. In that case (7.3) can be regarded as the formulas to derive the components of the delay vector using the components of the schedule. If we take an arbitrary generalized schedule $t$ and define the vector $\omega$ by
compatible
any
and
generalized If
algorithm
case
there are
may
the
. then, obviously,
-t
will
belong
exist
a
schedule schedule
and
be
the
following
Fig. 7.1(a).
The
p -p
2
1
12
additional
,
delay
minimizes
(7.3)
p -p
3
= u
2
the
has
23
the
Clearly,
i t i s compatible
w
bv
S
In
this
2
-S
1
=
i f and
(tl , 12
case t h e s e t R
only
tor
s.
I f the and
We
have
existence as
the
S ~S
3
one
ta^^ R
p -p
3
from
Yet
exist. be
speci-
=
1
w
13
that
>
"j2
the
a l g o r i t h m on
+
= 0la "23
I f
u
3
*
) 3
that
using
R
t h e s h i f t by
the
vec-
o
+ w 2 3
holds,
are a l t o g e t h e r
p o s s i b l e . T h i s does not f o r the
each
time
S - 5 = W +W 3 1 12 23
then
no
such
of
(7.3)
that every data
mean t h a t
vector
s
different,
compatibility
o f such a l g o r i t h m implementation
s o o n as
fastest
and w o
mentioned
for
S t a t e m e n t 7.7.
form
i f w^
W , 23
-
2
i s obtained
inequality sets R
that
such v e c t o r s e x i s t s
u exists
the
in
7.1
of Statement 7.7,
virtue
3
Even
b) Fig.
then,
the
caused by
wait
by
a)
Z3
waits
a l g o r i t h m graph
the
,
will sense
set, then
vector.
i s described
example. Let
(7.3)
In that
of a l g o r i t h m graph.
s c h e d u l e does not
system
= w
balanced.
to the described
the
that
I t e m . Such s i t u a t i o n
c a s e s when e v e n t h a t
Consider by
= t
implemented w i t h o u t
between
i n d i v i d u a l data
fied
v e c t o r does not
c a n n o t be
there
a.^,
schedule balances the cycles
the delay
discrepancy
formulas
therefore a l l cycles
this
the whole. Let
implies
the
item i s used
implementation
is
the algorithm graph
the be
specified
by F i g . 7 . 1 ( b ) .
has no c y c l e s , t h e s y s t e m
We
assume
'
1
•
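The compatibility criterion of this section (system (7.3) is solvable exactly when every cycle of the algorithm graph is balanced) can be illustrated with a short sketch. It assumes that (7.3) has the form $p_j - p_i = \omega_{ij}$ for every arc $(i, j)$, as the three-node example in the text suggests; the function name `solve_potentials`, the zero-based node numbering and the integer delays are illustrative choices, not part of the original.

```python
from collections import deque

def solve_potentials(num_nodes, arcs):
    """Try to solve p[j] - p[i] = w for every arc (i, j, w).

    Returns a list of potentials (each connected component anchored at 0),
    or None when some cycle is unbalanced. Integer delays are assumed so
    that the equality test below is exact.
    """
    # Each arc can be walked forwards (+w) or backwards (-w).
    adjacency = [[] for _ in range(num_nodes)]
    for i, j, w in arcs:
        adjacency[i].append((j, w))
        adjacency[j].append((i, -w))

    potentials = [None] * num_nodes
    for start in range(num_nodes):
        if potentials[start] is not None:
            continue
        potentials[start] = 0  # free choice: fixes one component of p
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j, w in adjacency[i]:
                expected = potentials[i] + w
                if potentials[j] is None:
                    potentials[j] = expected
                    queue.append(j)
                elif potentials[j] != expected:
                    return None  # a cycle through this arc is unbalanced
    return potentials

# Three-node graph with arcs 1->2, 2->3, 1->3 (nodes renumbered from 0).
balanced = [(0, 1, 2), (1, 2, 3), (0, 2, 5)]    # w13 == w12 + w23
unbalanced = [(0, 1, 2), (1, 2, 3), (0, 2, 4)]  # w13 != w12 + w23
print(solve_potentials(3, balanced))    # [0, 2, 5]
print(solve_potentials(3, unbalanced))  # None
```

Fixing one potential per connected component mirrors the remark that a solution is determined uniquely once any of its components is fixed.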
Thus o n l y extent r
the f i r s t
functional
be i n t r o d u c e d
J
j
m a x ( ( ( i , i / ) , (w, v))
=
(8.5).
The
(8,5) suggests
=
J
• (u,v)©(w, v ) .
of the properties
f o r the f u n c t i o n a l
0. The
J
J
'
= m a x ( m a x ( u +v . I , max (w ,+V .)) j
-)
value that
(8.6) does not hold

of $R_\omega(p)$ defines a set of schedules from $R_\omega$ consisting of all the "sums" of schedules from $R_\omega(s)$ with schedules from $R_\omega(p)$. The notations $R_\omega(s) \otimes R_\omega(p)$ and $\lambda \circ R_\omega(s)$ are to be understood in the same way. Formally Statement 9.2 ensures that the result of application of any of the mentioned operations to any classes belongs to $R_\omega$. However, while proving Statement 9.2 we have actually established the following relations:

$$R_\omega(s) \oplus R_\omega(p) \subset R_\omega(s \oplus p), \qquad R_\omega(s) \otimes R_\omega(p) \subset R_\omega(s \otimes p), \qquad \lambda \circ R_\omega(s) = R_\omega(\lambda \circ s). \tag{9.4}$$

The
last property of the $\circ$ operation allows us to assume when studying any class $R_\omega(s)$ that one of the components of the initial conditions vector is fixed. For example, we can assume that $\min_j s_j = 0$. Taking into account the nonnegativeness of the components of the delay vector we find that the formulas (9.1) imply $\min_j t_j = 0$ for all schedules in $R_\omega(s)$ in that case.

We have already mentioned that we assume numbers to be ordered in the usual way while we examine the properties of schedules. In that case, the usual concepts of limit, closed sets, bounded sets, etc. are applicable to the sets of schedules. The upper and lower bounds of schedules are to be treated as the respective bounds for their components.

STATEMENT 9.4. The class $R_\omega(s)$ is bounded from below and closed for all valid vectors $s$ and $\omega$.

The components of the delay vector are nonnegative, so the formulas (9.1) imply that all components of any schedule in $R_\omega(s)$ are not less than the minimal component of the initial conditions vector $s$. This proves the class $R_\omega(s)$ to be bounded from below. Now let $z$ be an accumulation point for $R_\omega(s)$. There exists a sequence of schedules $z^k$ in $R_\omega(s)$ such that $z^k \to z$ as $k \to \infty$. For schedules $z^k$ we have

$$z^k_j - z^k_i \ge \omega_{ij}, \qquad z^k_j = s_j \ \text{if } g_j = \emptyset. \tag{*}$$

By evaluating limits in (*) we obtain that the inequalities (7.1) hold for $z$ and that $z_j = s_j$ for the case $g_j = \emptyset$. Consequently the vector $z$ is a schedule. Since the initial conditions vector for $z$ is the vector $s$, $z$ belongs to the class $R_\omega(s)$, i.e. the class $R_\omega(s)$ is closed.

STATEMENT
9.5. The class $R_\omega(s)$ contains the schedule $0(s)$ such that

$$0(s) \oplus u = u, \qquad 0(s) \otimes u = 0(s) \tag{9.5}$$

for any $u \in R_\omega(s)$.

Take any number $j$. The class $R_\omega(s)$ is closed and bounded from below. It follows that a schedule $u^j \in R_\omega(s)$ exists such that the greatest lower bound of values of the $j$th components of all schedules $u \in R_\omega(s)$ is reached on it. Consider the schedule

$$0(s) = \bigotimes_j u^j, \tag{9.6}$$

where the "product" is taken over all $j$. Clearly, $0(s) \in R_\omega(s)$. Furthermore, the nature of the $\otimes$ operation implies that each component of $0(s)$ is the exact lower bound of values of that component over all schedules in $R_\omega(s)$. As a consequence of that, we obtain

$$(0(s) \oplus u)_j = \max(0_j(s), u_j) = u_j, \qquad (0(s) \otimes u)_j = \min(0_j(s), u_j) = 0_j(s)$$

for any $u \in R_\omega(s)$, i.e. the relations (9.5) hold good for the schedule $0(s)$ defined by (9.6).

COROLLARY. Every
zero
element
Indeed, These
respect
obey
the
is
to
c l a s s ft is)
the
operations
laws, y e t
R^ls)
class
with
i s closed
fact
that
the
the
schedule
minimum
(9.1)
are
under
commutative,
values
o f components
"
is)
vector by tor.
that
However,
still
a r e cases
com-
to
ob¬
for a given
s
inequalities i n
* 0, •
(9.7) = 0.
i s not
closed
can when
alter the
the
This
under the
i s accounted f o r
initial
result
usual
c o n d i t i o n s vec-
o f vector
operations
belongs t o the c l a s s ft [«).
tional
vector
Let
i n ft i s ) . "
Every
addition
class
is)
is
convex
with
respect
to
conven-
multiplication.
v b e l o n g t o t h e c l a s s ft i s ) . A c c o r d i n g t o S t a tu A, 0 £ A £ 1 , t h e v e c t o r \u + ( l - A ) v i s a s c h e d u l e
I f g . = a then J
J that not
corresponds v.
ts
and scalar-vector
{ A u + ( 1 - A ) v } , = Au
see
ft
s c h e d u l e s u and
teraent 7.4 f o r any
and
) If g
scalar-vector multiplication.
STATEMENT 9 . 6 .
Me
formulas
that
j
these operations
there
testify.
i f the
•*
case t h e c l a s s R ^ l s )
a d d i t i o n and
the f a c t
imply
the
we h a v e
+ ia
lea e g
(9.5)
Is
t h e minimums o f t h e
reached
O j ( s ) - s . i f gj
the general
e and » .
and d i s t r i b u t i v e
us t o b u i l d
(9.1)
Therefore
= max(0 J
are
allows
are
by e q u a l i t i e s .
0 .is)
In
possessing
operations
as the r e l a t i o n s
0 ( s ) . The r e l a t i o n s
replaced
the
associative,
c o m p o n e n t s o f Ois)
the
p o n e n t s o f a l l s c h e d u l e s f r o m ft is) tain
semiring e and e .
h a v e i n v e r s e o p e r a t i o n s . T h e s c h e d u l e Ois)
they do not
"zero element" o f t h a t semiring, The
a commutative
the operations
only
t o the
Therefore
Mi-Xi\r, = J J
i s the v e c t o r same
initial
As + ( 1 - A ) s . = s
J
Au. + ( l - A ) v a s c h e d u l e ,
conditions
f o r a n y A, OSASl,
J J
the
vector
vector
as the
Au + ( 1 - A ! v
but
also i t
schedules u indeed be-
86 longs
to R i s ) . w
According to Statement 7.4 the conventional sum of schedules from $R_\omega$ always belongs to $R_\omega$. Yet it is only the class $R_\omega(0)$ of all the classes $R_\omega(s)$ that is closed under conventional vector addition, as the equality $s + s = s$ holds for $s = 0$ only. For $\lambda \ge 1$ the product of a schedule from $R_\omega$ and $\lambda$ always belongs to $R_\omega$. Again, only the class $R_\omega(0)$ is closed under that operation, as $\lambda s = s$ holds good for $s = 0$ only.

We see that the class $R_\omega(0)$ possesses certain exceptional properties as regards the conventional vector operations. The ultimate reason for these special properties of the class $R_\omega(0)$ is the unique position of the zero vector in the set of vectors with conventional vector operations. However, that zero vector ceases to be exceptional in the set of vectors with operations $\oplus$ and $\otimes$. Its place is taken over by another "zero" vector. Accordingly, the special position is now occupied by the corresponding class $R_\omega(s^0)$.
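The algebra behind these remarks can be checked numerically. The sketch below assumes, as the componentwise formulas in the proof of Statement 9.5 indicate, that $\oplus$ is the componentwise maximum and $\otimes$ the componentwise minimum; the helper names `oplus` and `otimes` and the random test vectors are illustrative only.

```python
import itertools
import random

def oplus(u, v):
    """Componentwise maximum: the 'sum' of two vectors."""
    return tuple(max(a, b) for a, b in zip(u, v))

def otimes(u, v):
    """Componentwise minimum: the 'product' of two vectors."""
    return tuple(min(a, b) for a, b in zip(u, v))

random.seed(0)
vectors = [tuple(random.randint(0, 9) for _ in range(4)) for _ in range(6)]

for u, v, w in itertools.product(vectors, repeat=3):
    assert oplus(u, v) == oplus(v, u)                      # commutative
    assert oplus(oplus(u, v), w) == oplus(u, oplus(v, w))  # associative
    # 'multiplication' distributes over 'addition'
    assert otimes(u, oplus(v, w)) == oplus(otimes(u, v), otimes(u, w))

# A vector lying componentwise below the whole family acts as its "zero".
zero = tuple(min(column) for column in zip(*vectors))
for u in vectors:
    assert oplus(zero, u) == u
    assert otimes(zero, u) == zero
print("zero vector of the family:", zero)
```

Unlike the conventional zero vector, this "zero" depends on the set of vectors under consideration, which is why any componentwise lower bound can play that role.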
STATEMENT 9.7. Let the set of initial conditions vectors $S$ comprise the vector $s^0$ that is the "zero" vector with respect to the $\oplus$ operation. Then any schedule from $R_\omega(s)$ can be represented as a "sum" (in terms of the $\oplus$ operation) of the schedule $0(s)$ and some schedule from the class $R_\omega(s^0)$. Furthermore, the $\oplus$ "sum" of $0(s)$ and any schedule from $R_\omega(s^0)$ yields a schedule in $R_\omega(s)$.

Consider some class $R_\omega(s)$ and let $t$ be some schedule in $R_\omega(s)$. The relations (9.1) hold for the vector $t$ and (9.7) for $0(s)$. We define the vector $u$ by

$$u_j = t_j \ \text{if } g_j \ne \emptyset, \qquad u_j = s^0_j \ \text{if } g_j = \emptyset.$$

By definition of $0(s)$ the equalities $0_j(s) = s_j$ hold if $g_j = \emptyset$. By definition of $s^0$ the equalities $s_j \oplus s^0_j = s_j$ hold. Therefore $0(s) \oplus u = t$. It remains to verify that $u \in R_\omega(s^0)$. Since in the case of $g_j = \emptyset$ the components of $u$ exactly match those of the vector $s^0$, the only thing that is not yet proven is that the vector $u$ is a schedule in $R_\omega$. The proof of that can readily be obtained by collating the first group of relations of (9.1) for the vectors $t$ and $u$. The only difference is that for $g_i = \emptyset$ the quantity $t_i$ which is equal to $s_i$ is replaced by a not greater quantity $s^0_i$ in the relations for the vector $u$. This replacement obviously preserves the inequality. Consequently, $u \in R_\omega(s^0)$.

Now let $v$ be an arbitrary schedule in $R_\omega(s^0)$. Consider the "sum" $0(s) \oplus v$. Since $0(s)$ and $v$ both belong to $R_\omega$, the vector $0(s) \oplus v$ is a schedule in $R_\omega$ as well. If $g_j = \emptyset$ then we have

$$(0(s) \oplus v)_j = \max(0_j(s), v_j) = \max(s_j, s^0_j) = s_j,$$

i.e. $0(s) \oplus v$ belongs to the class $R_\omega(s)$.
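A small worked instance may make the decomposition of Statement 9.7 concrete. It assumes that relations (9.1) read $t_j \ge t_i + \omega_{ij}$ for $i \in g_j$ and $t_j = s_j$ for input nodes; the two-node graph and all numbers are chosen purely for illustration.

```latex
\begin{align*}
&\text{Graph: } 1 \to 2,\quad \omega_{12} = 2,\quad g_1 = \emptyset,\quad g_2 = \{1\},\quad s_1 = 3,\quad s^0_1 = 0.\\
&0(s) = (3,\ 3 + 2) = (3,\ 5), \qquad t = (3,\ 9) \in R_\omega(s) \ \text{since } 9 \ge 3 + 2.\\
&u = (s^0_1,\ t_2) = (0,\ 9) \in R_\omega(s^0) \ \text{since } 9 \ge 0 + 2.\\
&0(s) \oplus u = (\max(3, 0),\ \max(5, 9)) = (3,\ 9) = t.
\end{align*}
```

Here the schedule $t$ is recovered as the $\oplus$ "sum" of the "zero" schedule $0(s)$ and a schedule $u$ of the class $R_\omega(s^0)$.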
"Adding"
(using the $\oplus$ operation) a given vector to a set of vectors can be regarded as a sort of "parallel translation" of that set by the given vector. In these terms the message of Statement 9.7 is that any class $R_\omega(s)$ can be generated from the class $R_\omega(s^0)$ by the "parallel translation" by the vector $0(s)$ using the $\oplus$ operation. We introduce the notation $u \oplus R_\omega(p)$ to designate the "parallel translation" of the class $R_\omega(p)$ by the vector $u$. Then we have

$$R_\omega(s) = 0(s) \oplus R_\omega(s^0). \tag{9.8}$$

This yields the following representation of the set of schedules $R_\omega$:

$$R_\omega = \bigcup_s 0(s) \oplus R_\omega(s^0).$$
Summing it all up, we state that exploring the set of schedules $R_\omega$ can be reduced to the study of the set of "zero" schedules $0(s)$ and of the class $R_\omega(s^0)$. Notice the weakness of the constraints imposed on the vector $s^0$. The only thing required is that it should be the "zero vector" with respect to the $\oplus$ operation in some set comprising the vector $s$ that defines our class $R_\omega(s)$. Given a vector $s$, take any vector $s^0$ satisfying the inequality $s^0 \le s$ (the inequality is to be taken componentwise). Then the set consisting of the two vectors $s$ and $s^0$ will be closed under the $\oplus$ operation, and $s^0$ will be the "zero vector" in that set. In other words, every class $R_\omega(s)$ can be generated by the "parallel translation" by the vector $0(s)$ using the $\oplus$ operation of any other class $R_\omega(s^0)$, provided the inequality $s^0 \le s$ holds.

In case the set of initial conditions vectors does not contain the "zero vector", it can always be added to that set. The inequality $0 \le s$ holds good for any nonnegative initial conditions vector $s$. Consequently, any class $R_\omega(s)$ with nonnegative initial conditions vector can be generated by a "parallel translation" from the class $R_\omega(0)$ using the vector $0(s)$. We have already noted that the class $R_\omega(0)$ is somewhat special. Now the representation of the set of schedules $R_\omega$ has the form

$$R_\omega = \bigcup_s 0(s) \oplus R_\omega(0).$$

We
have m e n t i o n e d
particular
earlier
c l a s s ft is)
we
p)
is hold:
(10.2)
$\lambda \circ 0(s) = 0(\lambda \circ s)$.
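The relations (10.2) can be tried out on a small graph. The sketch below assumes that relations (9.7) have the form $0_j(s) = \max_{i \in g_j}(0_i(s) + \omega_{ij})$ for nodes with predecessors and $0_j(s) = s_j$ for input nodes, that $\oplus$ is the componentwise maximum, and that $\lambda \circ u$ adds $\lambda$ to every component. The four-node graph, its delays and the function names are hypothetical.

```python
def optimal_schedule(preds, delays, s):
    """Compute 0(s) from the assumed form of relations (9.7).

    preds  -- dict: node -> list of predecessor nodes (empty for input nodes)
    delays -- dict: (i, j) -> nonnegative delay on arc i -> j
    s      -- dict: input node -> initial condition s_j
    Nodes are assumed to be numbered in topological order.
    """
    schedule = {}
    for j in sorted(preds):
        if preds[j]:
            schedule[j] = max(schedule[i] + delays[(i, j)] for i in preds[j])
        else:
            schedule[j] = s[j]
    return schedule

def oplus(u, v):
    """Componentwise maximum of two vectors stored as dicts."""
    return {j: max(u[j], v[j]) for j in u}

def shift(lam, u):
    """The shift operation: add lam to every component."""
    return {j: u[j] + lam for j in u}

preds = {1: [], 2: [], 3: [1, 2], 4: [3]}
delays = {(1, 3): 2, (2, 3): 1, (3, 4): 4}
s = {1: 0, 2: 5}
p = {1: 3, 2: 1}

zero_s = optimal_schedule(preds, delays, s)   # {1: 0, 2: 5, 3: 6, 4: 10}
zero_p = optimal_schedule(preds, delays, p)   # {1: 3, 2: 1, 3: 5, 4: 9}

# 0(s) (+) 0(p) equals 0(s (+) p), and shifting commutes with 0(.)
assert oplus(zero_s, zero_p) == optimal_schedule(preds, delays, oplus(s, p))
assert shift(7, zero_s) == optimal_schedule(preds, delays, shift(7, s))
```

The computation is a longest-path propagation in topological order, which is one natural way to turn the inequalities of a schedule into equalities.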
The $\circ$ operation shifts all components of a schedule by one and the same number. Notice that the relations (9.7) are invariant to such shifts. Consider two schedules $0(s)$ and $0(p)$. In case $g_j \ne \emptyset$ we have

$$\begin{aligned}
(0(s) \oplus 0(p))_j &= \max(0_j(s), 0_j(p)) = \max\Bigl(\max_{i \in g_j}(0_i(s) + \omega_{ij}),\ \max_{i \in g_j}(0_i(p) + \omega_{ij})\Bigr) \\
&= \max_{i \in g_j}\bigl(\max(0_i(s) + \omega_{ij},\ 0_i(p) + \omega_{ij})\bigr) = \max_{i \in g_j}\bigl(\max(0_i(s), 0_i(p)) + \omega_{ij}\bigr) \\
&= \max_{i \in g_j}\bigl((0(s) \oplus 0(p))_i + \omega_{ij}\bigr),
\end{aligned}$$

i.e. the first group of relations (9.7) holds. If $g_j = \emptyset$ then the second group of relations (9.7) holds good by force of definition of operation $\oplus$.

COROLLARY. The set of
conditions
initial
of
optimal
vectors
schedules
with
is
respect
to
isomorphic
the pair
to of
the
operations
set ©
and © . The optimal
© operation schedules.
stays
Indeed,
Suppose t h a t a l l b u t
one
are
the
zero,
graph
and that
nodes.
conditions Input
nodes
node,
then
within
I f the
take
for
are
linked
yet
twooptimal
non-zero
the
s and p are
schedule
by paths
t h e values
initial
initial
i s zero. same
components
set o f
and 0 ( p ) .
conditions
nonnegative
t o one and the
the
0(s)
correspond
O(s)aOlp)
o f corresponding
conditions vector.
a d d i t i o n and
schedules
components
0 ( s ) © 0 ( p ) may e x c e e d t h o s e o f t h e o p t i m a l zero
i t may g o b e y o n d
components o f t h e i r
vectors
vector
R
vectors
to different
then
the
initial
I f both algorithm
that
the
graph
o f t h e vector
schedule corresponding
Note a l s o
o f our
conventional
s c a l a r - v e c t o r m u l t i p l i c a t i o n do n o t g o beyond R
to
the
vector
(provided
it) the
scalar
bounds o f not
factor the
i s greater
set o foptimal
convex w i t h respect STATEMENT 1 0 . 3 .
tains
the vector
tion.
Then
ule"
that
The
s
the set is
equal
formulas
ule
do not
the
definition
to these
Suppose q
which
1), yet
i s "zero
optimal
to
0(s°).
imply
they
do not
stay
within
the
schedules i s
operations.
the set
of
(9.7)
than
s c h e d u l e s . The s e t o f o p t i m a l
initial
vector"
schedules
that
of
with also
conditions respect
contains
vectors to the
con-
the ©
opera-
"zero
sched-
all components of an optimal schedule do not decrease as long as none of the components $s_j$ decreases. By definition of "zero vector" with respect to the $\oplus$ operation any component of any initial conditions vector $s$ is not less than the corresponding component of the vector $s^0$. It follows that each component of the optimal schedule $0(s)$ is not less than the corresponding component of the optimal schedule $0(s^0)$. This proves that $0(s^0)$ is the "zero schedule" with respect to the $\oplus$ operation in the set of all optimal schedules.

Let $P$ be an Abelian semigroup and $R$ be a commutative semiring. Suppose that the operation of multiplication by semiring elements is introduced for the semigroup elements. Suppose further that this operation is distributive. Then the semigroup $P$ is called a semimodule over the semiring $R$. Summarizing our results on the properties of optimal schedules, we obtain

STATEMENT 10.4. If the set of initial conditions vectors is a semimodule with respect to the pair of operations $\oplus$, $\circ$ then the set of optimal schedules is also a semimodule with respect to that same pair of operations.

The relation (9.7) implies that the schedule $0(s)$ can be regarded as the image of the initial conditions vector $s$ for some mapping $A_\omega$, i.e. $0(s) = A_\omega(s)$. By virtue of (10.2) the operator $A_\omega$ is linear with respect to operations $\circ$, $\oplus$, since the equality

$$A_\omega(\alpha \circ u \oplus \beta \circ v) = \alpha \circ A_\omega u \oplus \beta \circ A_\omega v \tag{10.3}$$

holds. Moreover, Statement 10.3 ensures that the operator $A_\omega$ maps the "zero vector" from its domain onto the "zero vector" in its range. The fact that no components of an optimal schedule decrease as any component $s_j$ changes without decreasing means that the operator $A_\omega$ is nondecreasing, that is, if $s \le p$ then $A_\omega s \le A_\omega p$. As usual, the vector inequality should be regarded componentwise. Obviously, the operator $A_\omega$ is continuous and bounded from below. Statements 10.2-10.4 imply that this operator maps a semigroup onto semigroup and a semimodule onto semimodule. The operator $A_\omega$ possesses the inverse operator, since it maps different initial conditions vectors onto different optimal schedules. The operator $A_\omega$ can be represented as the direct sum of the identity operator $E$ corresponding to the second group of relations of (9.7) and of the operator $\hat{A}_\omega$ corresponding to the first group of (9.7). The operator $\hat{A}_\omega$ possesses no inverse, so it is only because the identity operator $E$ is reversible that the operator $A_\omega$ possesses the inverse.

The
new information on the properties of optimal schedules allows us to refine the formulas (9.4).

STATEMENT 10.5. The following equalities hold:

$$R_\omega(s) \oplus R_\omega(p) = R_\omega(s \oplus p), \qquad \lambda \circ R_\omega(s) = R_\omega(\lambda \circ s). \tag{10.4}$$

We only have to show that any vector from $R_\omega(s \oplus p)$ can be represented as a "sum" of vectors from $R_\omega(s)$ and $R_\omega(p)$. Let $z \in R_\omega(s \oplus p)$. By virtue of representation (9.8) we have $z = 0(s \oplus p) \oplus u$ where $u \in R_\omega(s^0)$ and $s^0$ is any "zero vector" for the vectors $s$, $p$, $s \oplus p$. Taking into account (10.2), the equality $u \oplus u = u$ and the commutativity and associativity of the $\oplus$ operation, we conclude that

$$z = 0(s \oplus p) \oplus u = (0(s) \oplus 0(p)) \oplus (u \oplus u) = (0(s) \oplus u) \oplus (0(p) \oplus u).$$

In accordance with the representation (9.8) $0(s) \oplus u \in R_\omega(s)$, $0(p) \oplus u \in R_\omega(p)$.

COROLLARY. The set of classes $R_\omega(s)$ is isomorphic to the set of initial conditions vectors with respect to the pair of operations $\oplus$ and $\circ$.

Now we are able
to describe the macrostructure of the set $R_\omega$ of schedules corresponding to one and the same delay vector $\omega$ and some set of initial conditions vectors. It is the set of classes $R_\omega(s)$. Provided the set of initial conditions vectors comprises the "zero" element $s^0$, each class $R_\omega(s)$ can be represented as the "sum" (9.8). The operations $\circ$ and $\oplus$ may be introduced in the set of classes using formulas (10.4). If the set of initial conditions vectors is a semigroup with respect to operation $\oplus$ or a semimodule with respect to the pair of operations $\oplus$, $\circ$, then the set $R_\omega$ is a semigroup or a semimodule with respect to these same operations, respectively, as it is the union of classes $R_\omega(s)$. The existence of the "zero" initial condition $s^0$ implies the existence of the "zero" class - it is the class $R_\omega(s^0)$.

The microstructure of the set $R_\omega$ is described via its macrostructure and the microstructure of each of the classes $R_\omega(s)$. Every class $R_\omega(s)$ is a commutative semiring with respect to operations $\oplus$ and $\otimes$ with the "zero" schedule $0(s)$. According to the representation (9.8) it can be generated by a "parallel translation" in terms of the $\oplus$ operation of a fixed class $R_\omega(s^0)$ by its "zero" schedule $0(s)$. The set of "zero" schedules $0(s)$ is isomorphic to the set of classes $R_\omega(s)$ that contain these schedules with respect to operations $\oplus$ and $\circ$. Metrically every class is bounded from below and closed. Besides that, it is convex with respect to the conventional vector addition and scalar-vector multiplication.

We proceed with our investigation of individual classes. Given the moments of initial data input, the optimal schedule describes the fastest algorithm implementation. However, algorithm execution time will change with every modification of the initial conditions vector, i.e. with every change of moments of initial data input. Besides that, every class may contain other schedules that yield the same execution time as the optimal schedule. We will refer to schedules for which the time functional $T(t)$ achieves its minimum as high-speed schedules in the class $R_\omega(s)$, or in the entire set of schedules $R_\omega$, or in some other
set.

STATEMENT 10.6. Given any schedules $u$, $v$ in $R_\omega$, the inequalities

$$T(u \oplus v) \le T(u) \oplus T(v), \qquad T(u \otimes v) \le T(u) \oplus T(v) \tag{10.5}$$

always hold.

Consider the system of obvious inequalities

$$\min_i u_i \le u_i \le \max_i u_i, \qquad \min_i v_i \le v_i \le \max_i v_i.$$

As both max and min are non-decreasing functions, we have

$$\begin{aligned}
\max(u_i, v_i) &\le \max(\max_i u_i, \max_i v_i), & \max(u_i, v_i) &\ge \max(\min_i u_i, \min_i v_i), \\
\min(u_i, v_i) &\le \min(\max_i u_i, \max_i v_i), & \min(u_i, v_i) &\ge \min(\min_i u_i, \min_i v_i).
\end{aligned}$$

Taking into account (10.1) we find that

$$\begin{aligned}
\max(\max_i u_i, \max_i v_i) &= \max(T(u) + \min_i u_i,\ T(v) + \min_i v_i) \\
&\le \max(T(u) \oplus T(v) + \min_i u_i,\ T(u) \oplus T(v) + \min_i v_i) \\
&= T(u) \oplus T(v) + \max(\min_i u_i, \min_i v_i).
\end{aligned}$$

Now we obtain the first of the inequalities (10.5):

$$\begin{aligned}
T(u \oplus v) &= \max_i(\max(u_i, v_i)) - \min_i(\max(u_i, v_i)) \\
&\le \max(\max_i u_i, \max_i v_i) - \max(\min_i u_i, \min_i v_i) \le T(u) \oplus T(v).
\end{aligned}$$

Similarly we have

$$\begin{aligned}
\min(\max_i u_i, \max_i v_i) &= \min(T(u) + \min_i u_i,\ T(v) + \min_i v_i) \\
&\le \min(T(u) \oplus T(v) + \min_i u_i,\ T(u) \oplus T(v) + \min_i v_i) \\
&= T(u) \oplus T(v) + \min(\min_i u_i, \min_i v_i).
\end{aligned}$$

Now we obtain the second of the inequalities (10.5):

$$\begin{aligned}
T(u \otimes v) &= \max_i(\min(u_i, v_i)) - \min_i(\min(u_i, v_i)) \\
&\le \min(\max_i u_i, \max_i v_i) - \min(\min_i u_i, \min_i v_i) \le T(u) \oplus T(v).
\end{aligned}$$
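The inequalities (10.5) lend themselves to a quick randomized check. The sketch assumes, as the proof above uses it, that the time functional is $T(u) = \max_i u_i - \min_i u_i$, that $\oplus$ and $\otimes$ are the componentwise maximum and minimum, and that $\oplus$ applied to two numbers is their maximum; the function names and the random integer schedules are illustrative.

```python
import random

def oplus(u, v):
    """Componentwise maximum of two schedules."""
    return [max(a, b) for a, b in zip(u, v)]

def otimes(u, v):
    """Componentwise minimum of two schedules."""
    return [min(a, b) for a, b in zip(u, v)]

def time_functional(u):
    """T(u) = max_i u_i - min_i u_i, the span of a schedule."""
    return max(u) - min(u)

random.seed(1)
for _ in range(10_000):
    n = random.randint(1, 6)
    u = [random.randint(0, 20) for _ in range(n)]
    v = [random.randint(0, 20) for _ in range(n)]
    bound = max(time_functional(u), time_functional(v))  # T(u) (+) T(v)
    assert time_functional(oplus(u, v)) <= bound
    assert time_functional(otimes(u, v)) <= bound
```

A passing run is of course not a proof; it only confirms that the reconstructed inequalities are consistent on the sampled schedules.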
The time functional is nonnegative for all schedules. By Statement 10.6 it has an upper bound that is additive with respect to the $\oplus$ operation. It resembles an additive seminorm in that aspect, yet due to the absence of uniformity with respect to the $\circ$ operation it is not actually a seminorm. We have already noted that the time functional does not depend on the $\circ$ operation.

STATEMENT
10.7. For any set $\Omega$ of schedules that is closed under each of the operations $\oplus$, $\otimes$, and $\circ$, the set of minimums of the time functional in $\Omega$ is closed under these same operations.

Let $T_\Omega$ denote the minimum value of the time functional in $\Omega$. Let schedules $u$ and $v$ be minimums, i.e. $T(u) = T(v) = T_\Omega$. Essentially we have to prove that $T(u \oplus v) = T(u \otimes v) = T_\Omega$. According to the inequalities (10.5) we have

$$T(u \oplus v) \le T(u) \oplus T(v) = \max(T_\Omega, T_\Omega) = T_\Omega, \qquad T(u \otimes v) \le T(u) \oplus T(v) = \max(T_\Omega, T_\Omega) = T_\Omega.$$

The schedules $u \oplus v$ and $u \otimes v$ are in $\Omega$, as we have assumed the set $\Omega$ to be closed under operations $\oplus$ and $\otimes$. Neither $T(u \oplus v)$ nor $T(u \otimes v)$ can be less than $T_\Omega$ since $T_\Omega$ is the minimum value of the time functional in $\Omega$. It follows that the equalities $T(u \oplus v) = T(u \otimes v) = T_\Omega$ must hold.

If the set $\Omega$ is closed under one of the operations $\oplus$ or $\otimes$, then the set of high-speed schedules in $\Omega$ is a semigroup. It is a semiring (a semimodule, a semialgebra) if $\Omega$ is closed under operations $\oplus$ and $\otimes$ ($\oplus$, $\circ$ or $\oplus$, $\otimes$ and $\circ$). By applying different terms to sets of vectors we make it clear which operations are currently considered.

STATEMENT 10.8. If the semiring of the minimums of a functional is metrically closed and bounded from below (from above) then it contains the "zero element" (the "unit element") with respect to operation $\oplus$ (with respect to operation $\otimes$).

Let the semiring be metrically closed and bounded from below (from above). Repeating with hardly any changes the proofs of statements 9.4 and 9.5, we establish the existence of the schedule on which the greatest lower bounds (the least upper bounds) of all components are reached. It is this schedule that is "zero" ("unit") with respect to operation $\oplus$ (or $\otimes$).

COROLLARY. If the
bounded
from
below
semiring (from
of
the minimums
above)
then
from
that
semiring
and the
"zero"
schedule
from
that
semiring
and the
"unit"
schedule)
schedule
of
that
same
to
operations ional
Let
of
the
"product" (the is
time
functional
of
any
schedule
"sum" of
any
schedule
the
"zero"
(the
"unit")
semiring.
STATEMENT 1 0 . 9 .
funct
the
I f
+ and
•
the then
set
of
the
schedules
algorithm
£1 i s convex execution
with
time
is
respect a
convex
in £3,
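Let $u$, $v$ be arbitrary schedules in $\Omega$. For any $\lambda$, $0 \le \lambda \le 1$, we have
\[
\begin{aligned}
T(\lambda u + (1-\lambda)v) &= \max_i(\lambda u_i + (1-\lambda)v_i) - \min_i(\lambda u_i + (1-\lambda)v_i) \\
&\le \max_i(\lambda u_i) + \max_i((1-\lambda)v_i) - \min_i(\lambda u_i) - \min_i((1-\lambda)v_i) \\
&= \lambda(\max_i u_i - \min_i u_i) + (1-\lambda)(\max_i v_i - \min_i v_i) = \lambda T(u) + (1-\lambda)T(v).
\end{aligned}
\]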
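The chain of inequalities can also be checked numerically. The following is only a sketch: it assumes the time functional has the form $T(u) = \max_i u_i - \min_i u_i$ used in the derivation, and the function names and the random test vectors are illustrative, not taken from the text.

```python
import random


def time_functional(u):
    """T(u) = max_i u_i - min_i u_i (the form assumed from the derivation)."""
    return max(u) - min(u)


def convex_combination(u, v, lam):
    """Componentwise lam*u + (1 - lam)*v."""
    return [lam * a + (1 - lam) * b for a, b in zip(u, v)]


def check_convexity(trials=1000, n=6, seed=0):
    """Return True if T(lam*u + (1-lam)*v) <= lam*T(u) + (1-lam)*T(v)
    holds on every randomly drawn pair of vectors (hypothetical test data)."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(0, 10) for _ in range(n)]
        v = [rng.uniform(0, 10) for _ in range(n)]
        lam = rng.random()
        lhs = time_functional(convex_combination(u, v, lam))
        rhs = lam * time_functional(u) + (1 - lam) * time_functional(v)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point rounding
            return False
    return True


print(check_convexity())
```

A random search of this kind cannot replace the proof; it only guards against a sign or index slip in the reconstructed inequality.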
This proves the convexity of the time functional.

COROLLARY. If the set of schedules $\Omega$ is convex with respect to operations $+$, $\cdot$, then the set of the minimums of the time functional is also convex with respect to these operations.

Let schedules $u$ and $v$ be minimums and $T(u) = T(v) = T_\Omega$. By Statement 10.9 we conclude that
\[ T(\lambda u + (1-\lambda)v) \le \lambda T(u) + (1-\lambda)T(v) = T_\Omega \]
for any $\lambda$, $0 \le \lambda \le 1$. Since $T_\Omega$ is the minimum value of the time functional in $\Omega$, $T(\lambda u + (1-\lambda)v)$ cannot be less than $T_\Omega$. Therefore the equality $T(\lambda u + (1-\lambda)v) = T_\Omega$
must hold. Consequently the set of minimums is convex.

Now we are able to describe some sets of high-speed schedules. Consider the class $R_\Omega(s)$. It is closed under operations $\oplus$ and $\otimes$. It follows that the set of high-speed schedules in $R_\Omega(s)$ is a semiring with respect to operations $\oplus$ and $\otimes$. The schedule $O(s)$ is a high-speed one in $R_\Omega(s)$. This same schedule is the "zero" in the semiring of high-speed schedules from $R_\Omega(s)$. This semiring is metrically closed and bounded from below. Actually, an even stronger property holds: $\min_i u_i$ remains constant for all schedules $u$ in $R_\Omega(s)$. The fact that the time functional is constant in the semiring implies that the semiring is bounded from above. Hence the "unit" schedule from the semiring of high-speed schedules exists in the semiring itself. The semiring of high-speed schedules from $R_\Omega(s)$ is convex with respect to operations $+$, $\cdot$.

According to (9.8) any class $R_\Omega(s)$ can be obtained as the result of the "parallel translation" of $R_\Omega(s^0)$ by the schedule $O(s)$. The semirings of high-speed schedules corresponding to these classes do not possess that property. Moreover, in the general case they are not even isomorphic. The
set of high-speed schedules from $R_\Omega$ has the following structure. Taken as a whole, it is a semi-algebra with respect to operations $\oplus$, $\otimes$, $\odot$. It is convex with respect to operations $+$, $\cdot$ and metrically closed. Algebraically this set is a union of semirings of high-speed schedules from some classes $R_\Omega(s)$, but in general not from all of them. To be more precise, only those classes are taken for which the corresponding optimal schedules are high-speed schedules in $\Omega$. All such optimal
schedules
vectors
i s bounded
schedules
from
ty" ) would optimal In
from
below
remain
"zero"
based
e. g.
upon
and
"unit"
("unity")
a schedule Now
we
components
schedules
Therefore t o be
i n the
initial
the
conditions
set of
This
high-speed
"zero"
semi-algebra
(or
of
"uni-
high-speed
equal
graph
arc
the path
a
one
may
vector
whose
of
sider
the
(9.7)
there exist
the
j t h algorithm
0.(0)
direct
of
the
of value,
relation
o p t i m a l schedule
ascribe
commay
algebraic though
to
the
high-speed
i s not
always
a
sufficient conditions
of
equal
node and
ing
t o t h e j t h node f r o m
of
initial one
possess
in
the
another
graph
node.
+ WJf
..
an
algo¬
d e f i n e the
length i t .
vectors the
time
which
The
then
contains corresponding is
equal
to
path.
optimal schedule
0 ( 0 ) . Con-
i n accordance
a r c goes f r o m
I t follows that
t h e j ' t h node s u c h
the
constituting
then
critical
I f gjie that
61^. to
path.
execution
algorithm
= 0^,(0)
arcs
conditions
the corresponding graph
weight
t h e j t h n o d e . We
the c r i t i c a i
set
i ' t h node such
jth
schedules
the
the weights
algorithm
the
build
an
He
i s called If
yield
length
t o be
high-speed
i t h and
of
components
Take s = 0 and
"zero"
knowledge
prove no
rather
c o n d i t i o n s v e c t o r s c o n t a i n s a v e c t o r whose
the
t h e sum
schedules
weighted
have
is a
methods. Such methods
i t i s important to f i n d
another.
STATEMENT 1 0 . 1 0 .
high-speed
The
can
property
linking
t o be
10.9.
the
schedules
high-speed.
describe
p a t h o f maximum l e n g t h
the
of
("unity").
high-speed
schedules
c a s e when t h e s e t o f i n i t i a l
a
set
of numerical
Statement
In particular,
h i g h - s p e e d one.
rithm
"zero"
t h e use
of the set of
schedules.
of
I f the
( f r o m above) then
case f i n d i n g
task r e q u i r i n g
structure
for
semi-algebra.
R ^ contains the
the general
"zero"
a
schedules.
plicated be
form
with
i t into
a path exists
the
lead-
that
(10.6)
where $p'$, $l'$, $j'$, $k'$ are the numbers of nodes belonging to that path. In accordance with the properties of optimal schedules the quantity $O_j(0)$ is minimal for schedules corresponding to $s = 0$. Consequently, the weighted length of any path linking any input node and the $j$th node cannot exceed the value in the right-hand side of (10.6). If we take the last operation of the algorithm as the $j$th node then the sum in the right-hand side of (10.6) will be equal to the weighted length of the critical path of the graph. Clearly, the algorithm execution time cannot be less than the weighted length of the critical path, and this limit is reached on the schedule $O(0)$.

Now let the set of initial conditions vectors contain a vector whose components all equal some nonzero value. This case can be reduced to the one we have considered by taking into account the relations
\[ \lambda \odot O(s) = O(\lambda \odot s), \qquad T(\lambda \odot O(s)) = T(O(s)). \]

COROLLARY. If all components of the initial conditions vector $s$ equal one another then the optimal schedule $O(s)$ is a high-speed one in $R_\Omega$.

Notice that
the arbitrariness of the initial conditions vector can only hinder our achieving the minimum algorithm execution time. To get around it we have to take a vector with identical components as the initial conditions vector. For particular cases other initial conditions may exist on which the minimum execution time specified by Statement 10.10 is reached.
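As far as it can be read from the text, Statement 10.10 ties the minimum execution time to the weighted length of the critical path of the algorithm graph. The sketch below computes that length for a small acyclic graph. It assumes that weights are attached to arcs and that the length of a path is the sum of its arc weights; the function name and the example graph are hypothetical.

```python
from collections import defaultdict, deque


def critical_path_length(num_nodes, weighted_arcs):
    """Length of the longest path in an acyclic graph.

    weighted_arcs is a list of (tail, head, weight) triples with nodes
    numbered 0 .. num_nodes - 1 (an assumed encoding of the graph).
    """
    successors = defaultdict(list)
    indegree = [0] * num_nodes
    for tail, head, weight in weighted_arcs:
        successors[tail].append((head, weight))
        indegree[head] += 1

    longest = [0] * num_nodes  # longest path ending at each node
    ready = deque(j for j in range(num_nodes) if indegree[j] == 0)
    processed = 0
    while ready:
        j = ready.popleft()
        processed += 1
        for head, weight in successors[j]:
            longest[head] = max(longest[head], longest[j] + weight)
            indegree[head] -= 1
            if indegree[head] == 0:
                ready.append(head)
    if processed != num_nodes:
        raise ValueError("the graph contains a cycle")
    return max(longest, default=0)


# Hypothetical five-node graph; the arc 1 -> 3 carries weight 2.
arcs = [(0, 1, 1), (0, 2, 1), (1, 3, 2), (2, 3, 1), (3, 4, 1)]
print(critical_path_length(5, arcs))
```

Nodes are processed in topological order, so every node is finalized only after all of its predecessors; this is the standard longest-path computation for acyclic graphs rather than a method taken from the book.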
11. Examples

While investigating schedules one of the most important points is finding the subsets of algorithm graph nodes that correspond to those functional units of the machine which do their jobs at the same time moment. These subsets define some parallel forms of the algorithm graph. Therefore we will refer to these subsets as the layers of the schedule corresponding to the time moment that defines the subset. A layer of a schedule is sometimes called a wavefront. The following geometric interpretation accounts for this term. Suppose that algorithm graph nodes are mapped onto points of some space and the
nodes of a layer form a surface in that space or, which is the same thing, the layer corresponds to some surface. At every time moment we have a surface specifying the operations which are executed at the given moment. If we regard these surfaces as a function of time we have a process which resembles wave propagation. This grounds the term "wavefront" which is applied to the layers. We will primarily examine optimal and high-speed schedules. We shall illuminate the behavior of the layers of schedules with some examples typical for computational algorithms. We will apply the results of this investigation in the forthcoming chapter.

EXAMPLE 11.1. Let the nodes of the graph considered in Example 6.3 be situated in the nodes of an integer grid with coordinates $i$, $k$ where $1 \le i \le m$, $1 \le k \le n$. Suppose that in Fig. 6.5 the node in the upper left corner has coordinates $(1,1)$, the $i$ axis is oriented from top to bottom and the $k$ axis is oriented from left to right. We assume that all components of the vector $u$ equal 1, and that time is discrete and takes positive integer values. Recall that input nodes are not shown in Fig. 6.5. Without losing generality we can assume they are situated in the integer points of the axes of the coordinate system.

Let $t_{ik}$ be the component of the schedule corresponding to the algorithm graph node with coordinates $i$, $k$. Then the formulas (9.1) will have the form
\[
\begin{aligned}
t_{ik} &\ge \max(t_{i-1,k},\, t_{i,k-1}) + 1, && \min(i,k) \ge 1, \\
t_{ik} &= s_{ik}, && \min(i,k) = 0.
\end{aligned}
\qquad (11.1)
\]
Here, $s_{ik}$ are the components of the initial conditions vector specifying the moments of data input.

The general solution of the problem (11.1) is easy to describe. The quantity $t_{ik}$ can be regarded as a function $t(i,k)$ of two variables $i$ and $k$. The relations (11.1) imply that $t(i,k)$ strictly increases with $i$, $k$. The function $t(i,k)$ takes integer values in integer points, is defined for $i, k \ge 0$ and takes given values $s_{ik}$ if $\min(i,k) = 0$. No other constraints are imposed on the solution of (11.1).
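To make (11.1) concrete, the sketch below builds the solution in which the inequality holds as an equality at every inner node, so each operation starts as early as its two predecessors allow, and then groups the inner nodes by time moment into layers. The function names and the encoding of the boundary values are assumptions made for illustration. With zero boundary values the recurrence yields $t(i,k) = i + k - 1$ at the inner nodes, which the example verifies.

```python
def earliest_grid_schedule(m, n, s=None):
    """Return t[i][k], 0 <= i <= m, 0 <= k <= n, with equality in (11.1).

    s, if given, is an (m+1) x (n+1) table whose entries with min(i, k) == 0
    are the boundary (data input) moments; otherwise they default to 0.
    """
    t = [[0] * (n + 1) for _ in range(m + 1)]
    if s is not None:
        for i in range(m + 1):
            for k in range(n + 1):
                if min(i, k) == 0:
                    t[i][k] = s[i][k]
    for i in range(1, m + 1):
        for k in range(1, n + 1):
            t[i][k] = max(t[i - 1][k], t[i][k - 1]) + 1
    return t


def layers(t):
    """Group inner nodes (i, k) by their time moment: one layer per moment."""
    result = {}
    for i in range(1, len(t)):
        for k in range(1, len(t[0])):
            result.setdefault(t[i][k], []).append((i, k))
    return result


t = earliest_grid_schedule(3, 4)
assert all(t[i][k] == i + k - 1 for i in range(1, 4) for k in range(1, 5))
for moment, nodes in sorted(layers(t).items()):
    print(moment, nodes)
```

Each printed layer consists of the nodes with a fixed value of $i + k$, that is, of one anti-diagonal of the grid.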
For given boundary values $s_{ik}$ the set of such functions forms a semiring with respect to operations $\oplus$ and $\otimes$. This fact is obvious in our particular case. The identity function $O(i,k)$ with respect to the $\oplus$ operation exists in the said semiring. It is given by
\[
O(i,k) =
\begin{cases}
\max(O(i-1,k),\, O(i,k-1)) + 1 & \text{if } \min(i,k) \ge 1, \\
s_{ik} & \text{if } \min(i,k) = 0.
\end{cases}
\]
This is actually the optimal schedule in the class $R_\Omega(s)$. If all $s_{ik}$ are equal to 0, the function $O(i,k)$ is given explicitly by a simple formula
\[ O(i,k) = i + k - 1 \quad \text{if } \min(i,k) \ge 1. \qquad (11.2) \]

The schedule given by (11.2) is an optimal schedule for the given initial conditions; it is a high-speed schedule in $R_\Omega$. Therefore the function $O(i,k)$ describes precisely the maximum parallel form of the algorithm graph. The layers of that parallel form are given by the sets of nodes on which the function $O(i,k)$ is constant, i.e. the coordinates of the nodes of each layer satisfy the equations of the form $O(i,k) = \text{const}$. For the schedule (11.2) these equations describe the lines shown by dashed lines in Fig. 6.5. Thus for the graph in Fig. 6.5 the wavefront of the high-speed schedule propagates in much the same way as a planar wave. Consequently, it can be described by a hyperplane. Such a situation is frequently encountered; we will often attempt to describe wavefronts by hyperplanes.

EXAMPLE 11.2. We now modify the algorithm graph deleting several nodes from the upper left corner together with those arcs incident on them. The resulting graph is shown in Fig. 11.1. Again, we assume all components of the vector $u$ to be equal to 1 and time to take positive integer values. Consider an optimal high-speed schedule corresponding to zero values of $s$. Its layers are shown by dashed lines in Fig. 11.1.
We
can observe
the
graph
11.2
6.5
the
though
that
the r e g u l a r i t y of the wavefront that
i s severely
layers
i t i s not
of
marred.
another
lines
I t is a
11.1
we
high-speed
modification
s i g n i f i c a n t changes
i n the layers
ing s u i t a b l e schedules.
Irregular layers
scribe
and
A viable
We
that
the graph
i n F i g . 11.1
can
therefore
t r y t o expand
6.5.
We
examine then
investigate.
i t s schedules,
consider their
mentioned The general,
that
should
i n Figs.
exactly
o f some o f
before their
possess "regular"
high-speed while
i s a subgraph a given
o f the graph
graph of
findt o de-
by o u r p i c t u r e s . inFig.
before s t a r t i n g
t h e expanded g r a p h . We
graph
have
a
11.2 a r e b o t h h i g h - s p e e d
importance f o r the p a r a l l e l i s f o u n d . By a n d l a r g e ,
the t o t a l
dramatically
our graphs
may r e -
a r e much more d i f f i c u l t
out i s suggested
onto the o r i g i n a l
11.1 a n d
i s that
should admit
way
the schedules
what s c h e d u l e
requirement
not d i f f e r
too,
to and
i n fact
i n §4.
i t i s not o f v i t a l
themselves
graphs
study
reduction
approach
schedules
vestigation nificant
one,
of a graph of algorithm
s c h e d u l e s . T h i s c i r c u m s t a n c e causes a d d i t i o n a l d i f f i c u l t i e s
see
enjoyed I n
represent i n Fig.
F i g . 11.2
Thus an i n s i g n i f i c a n t i n quite
schedule.
dashed
optimal.
Fig.
sult
The
from
number o f l a y e r s
the minimal
simple description.
exploration, schedules.
the only in a
one, w h i l e I f we
choose
i t i s very important
ones. I n
structure I n sig-
schedule
the
layers
t o expand
t o know
what
EXAMPLE 11.3. Consider graph
nodes
w h e r e 2^i^n, vector
u
values.
laJSi-l,
t o equal Suppose
k=0,
the input
j=0,
k-0,
in
,
schedule
than
that
0(i,j,0)
of
i n R
are
on
6.2. L e t
coordinates i , j , k
i,j,0.
The
situated
on
the
A are
i , j , 1 feeds
situated
input
) + 1 i f j*o
or
i*0, k=l
initial
o f the problem
conditions
(11.3)
(11.1).
vector
i s much more
However,
obtained e x p l i c i t l y
that
the
difficult high-speed
i n case
a l l s . ., IJK
have
OU.j.O) - max(0(i,J-1,0), OU, j,0)
We
conclude
The
layers
schedule
satisfy
= 0 i f >0.
= j
i f j*0.
i s a high-speed
i n F i g . 6.2.
again
a
hyperplane,
The
wavefront
a l though
(11.4)
one
g r a p h . The
t h e e q u a t i o n s j = const.
lines
We
(11.4)
form o f the algorithm
outwardly
0(J.J-1.0)) + 1 i f j * 0
that
0(j',j,0)
parallel
data
t i l . 3)
D
e q u a l 0, we
line
input.
the problem be
the
J.J.i
. , i f j - 0 , k=0
can
occupy
the matrix
, t . .
of
i n t h e nodes f o r b
formulas (9.1) take the form
J.j-l.o
data
take p o s i t i v e ' integer
the vector
components
with
t h e components
general solution
obtain
nodes data
. t . .
s.
f r o m Example with
and
data
t h e moments o f i n i t i a l
The to
discrete
node
.
. , -
grid
providing
J.J-1,0
s . ., a r e
specify
nodes
e max(t.
t,
Here,
graph
components
J.J.O
t o be
providing
f o r k = l . The
i n F i g . 6.2
integer
Once a g a i n we a s s u m e a l l c o m p o n e n t s o f t h e
time
a l l inner nodes
t h e node w i t h
t.
k=0,1.
1 and
the input
t h e nodes
into
the graph
occupy t h e nodes o f an
i t defines
a maximum
nodes b e l o n g i n g t o
individual
The
and
layers
d e s c r i b e d by
the graphs
a r e shown by
the schedule
i n Figs.
6.2
and
dashed
(11.4) i s
6. 5
are not
alike.
have
succeeded
in finding
the general
solution
of
(11.1),
yet
this find
I s more
other particular
be
based
to
the graph,
linked the
difficult
f o r (11.3).
solutions
o f (11.3)
on t h e approach
by
(11.3)
This
the graph,
t o nodes w i t h
mains a c y c l i c
adding
arcs
(
s max(t
t . . i.j.o
2.1,0
The q u a n t i t y
.
i and j . ( 1 1 . 5 )
integer
points values
of
applled
> sj
from
search
will
a d d new
arcs
repeatedly
. f
1,0,0
t(I,j,0)=Sj ,i
I f j=0,k=0
c a n be r e g a r d e d implies that
be r e w r i t t e n
, t
Q
these
follows:
)+l
i f i s 3 , J that
determines
N
u&v
(t).
N (t).
The sum
B, o r , e q u i v a l e n t l y ,
Then
certainly
identical,
w
A n B determines
u
v
b y A u fl a n d A n B.
which
readily
I tfollows
and t h e l a s t
follows
from
5 (
( '. u f f i ( /
N ( t ) + N ( t ) i s determined
i d e n t i t y o f ( 1 3 . 9 ) I n d e e d h o l d s . The f i r s t are
that
solely
£
termines and
(13.6) y i e l d s
i s determined
tothe
t h e
A
u
setof e
by b o t h that
d e
~
sets A
the f i r s t
terms o f (13.9)
the obvious
identity
\[ a + b = \max(a,b) + \min(a,b), \]
which is true for any two numbers $a$, $b$.

Unlike the time functional, the memory functional depends not only on schedules, but also on the defects of the nodes. In general the extreme values of both functionals would be reached on different schedules. It follows that it is impossible to find in the general case the schedule which would minimize both algorithm execution time and the required memory amount.

Let
w be a d e l a y
all
initial
t=T.
Both
and
data
zero
a) e x i s t
while
d e t e r m i n i n g6
schedules
the s e t o f schedules
the following
respect
u,
f o r which
are outputted at
t o the operations
graph.
f o r any two s c h e d u l e s
and N it) u®v
N it) u®v
(with
s e t f o r any a l g o r i t h m
13.7 t h a t
s e t s o f nodes d e t e r m i n i n g tions
Consider
a r e i n p u t t e d a t 1=0 a n d a l l r e s u l t s
and u n i t y i n such
proving
vector.
v
We
a
have
observed
the sets
o f nodes
a r e t h e i n t e r s e c t i o n a n d t h e sum o f t h e and W ^ ( t ) .
N^it)
Therefore
u n d e r o u r assump-
i n e q u a l i t i e s h o l d f o r any schedule u, p r o v i d e d
that
t h e d e f e c t s o f i n t e r n a l nodes a r e n o n n e g a t i v e
N
In
The
the
= « (I) a N it). ti o
it)
case t h e d e f e c t s o f i n t e r n a l nodes a r e n o n p o s i t i v e
N
as
1
1
"zero ' v e c t o r
possible, moment
implying
o
(t) s U it) u
= H it).
i
schedules each o p e r a t i o n the minimization
of i t s execution
maximum
o f time
memory
t o be e x e c u t e d functional.
i s required
c a s e a n d minimum memory i n t h e s e c o n d c a s e . The f o l l o w i n g viously hold
S
and
f o rthe f i r s t
case
^
@ N i t ) .
it)
the opposite
N ^
This functional
it)
^ N it)
relations hold
z N it)
example s u g g e s t s properties
that
fcH
it):
f o r t h e second
® N i t ) ,
requires
N
H
i t )
a l g o r i t h m g r a p h and t h e s e t o f v a l i d
However a t
i n the
first
r e l a t i o n s ob-
U)
case
a H (t)• K it).
a more t h o r o u g h more
i t ) 9 11
a s soon
detailed schedules.
s c r u t i n y o f t h e memory
information
both
on the
14. Hierarchical Memory

Suppose that a computational system does not have enough RAM to store all needed data. External memory has to be used in that case. As a rule, the external memory access time by far exceeds both that of RAM and the average arithmetic operation execution time. To avoid the substantial increase in the overall problem solution time due to external memory references they should be performed in background as the main computation goes on. Obviously, the cumulative external memory access time should in that case be considerably less than the total time required to perform all arithmetic operations. Alternating computations and external memory exchanges often proves to be more practical; in that case, the cumulative external memory access time should not be much greater than the total time required to perform all arithmetic operations.

Not every algorithm admits of an effective use of external memory. Example 12.2 that we considered earlier shows that there exist algorithms for which the number of external memory references is on the whole proportional to the number of arithmetic operations. If the external memory access time is not very small, no implementations of such algorithms yield negligible overall time of external memory referencing. It is therefore very important to understand what algorithms admit of the effective use of external memory and what properties of algorithm graphs are responsible for it.

To
avoid overcrowding our research by unessential details we will be making several simplifying assumptions. Let the computational system have two-level memory and any arithmetic operation be performed in unit time. Suppose that RAM has limited storage capacity and allows instantaneous conflict-free access to any of its cells to any number of functional units. This implies that only those schedules will be considered that correspond to delay vectors whereof all components are not less than 1. Let external memory be arbitrarily large. We shall not be currently concerned about the precise access mode, the number of communication channels, whether these channels are pipelined, etc. The only thing that matters is that data exchanges be invariant with respect to time.
In particular, consider a set of exchanges and let them start at time moments $t_1, t_2, \ldots$. Assume that for any $\tau$ similar exchanges may be performed at time moments $t_1 + \tau, t_2 + \tau, \ldots$ provided that no other exchanges occur at these moments. The main characteristic of external memory is the time required to perform a single data exchange. Denote that time by $q$ where $q > 1$ or even $q \gg 1$. Assume also that all external memory exchanges go via RAM.

We have
noted that external memory exchanges go either in background or intermittently as the main computation goes on. Suppose they are performed in background. We modify the computation process in the following way. Partition the time interval in which the algorithm is executed into smaller intervals. Assume that operations performed within each of the smaller intervals do not use any data that are written in external memory in that same time interval. Now modify the execution process on every smaller interval as follows. First, read from external memory everything that is to be read in that interval, then perform all arithmetic operations, and finally, perform all writes to external memory.

These transforms
result in a new algorithm implementation that alternates external memory references and computation. Besides, these references are now grouped. Switching from background to intermittent exchanges generally causes an increase in overall algorithm execution time. Yet it is easy to see that having the partitioning into intervals of length $2q$ the time increase factor is not greater than 4. Indeed, the fragment itself can be executed in $2q$ time units. All data needed to execute any fragment can be read from external memory in $3q$ time units. Finally, all data that are to be stored in external memory can be written in time $3q$. The overall time needed to complete the fragment does not exceed $8q$. Note that this upper bound is guaranteed for both synchronous and asynchronous modes of computation, any algorithm graph, any $q$, any algorithm execution time, any mode of access to external memory, etc. The only important thing is that the exchanges should be invariant in time, as we have mentioned earlier. As a rule, little increase in algorithm execution time actually occurs. Of course, having formally brought apart computation and external memory exchanges, we can again perform the exchanges as a background job in practice. But now these exchanges can be grouped.
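The arithmetic behind the factor 4 can be spelled out in a few lines. This is a sketch of the bound only, using the figures quoted in the text ($2q$ for computing, $3q$ for reading, $3q$ for writing); the function names and the rounding of the number of intervals are assumptions made for illustration.

```python
import math


def fragment_time_bound(q):
    """Upper bound on the time of one fragment: read, then compute, then write."""
    read_time = 3 * q      # all data needed by the fragment
    compute_time = 2 * q   # the fragment itself (interval of length 2q)
    write_time = 3 * q     # all data to be stored in external memory
    return read_time + compute_time + write_time


def slowdown_factor(q):
    """Ratio of the alternating bound to the original interval length 2q."""
    return fragment_time_bound(q) / (2 * q)


def total_time_bound(total_time, q):
    """Bound for a whole run of length total_time split into intervals of 2q."""
    intervals = math.ceil(total_time / (2 * q))
    return intervals * fragment_time_bound(q)


print(fragment_time_bound(5), slowdown_factor(5), total_time_bound(100, 5))
```

The factor does not depend on $q$, which matches the remark that the bound holds for any $q$ and any execution time.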
Thus we will consider those algorithm implementations whereby computation and external memory references go as alternating processes. That means that first data are read from external memory into RAM, then some computations are performed, then data are written from RAM into external memory, new data are read in, etc.

Every algorithm has a set of its implementations. With each implementation, some data are stored in external memory, other data are stored in RAM. It follows that each implementation requires some definite amount of RAM. The arising problems include:

- given an upper bound for runtime, find the algorithm implementation that minimizes RAM usage;
- find out what properties of algorithm graph influence the amount of required RAM.

The list of such problems can be carried on. We will investigate only a certain number of them.

Recall
Statement
independent. gorithm worth
1 3 . 5 . The e s t i m a t e
In particular,
implementations
while
t o consider
mate c a n b e s h a r p e n e d . importance
f o rour
STATEMENT graphs
and
quire
both
on s e r i a l
specific
an
at
least
p]F
for
i s implementation-
and p a r a l l e l when
simple
algorithm
computers.
the required statement
p^,
those
graph of
N
holds
cases
The f o l l o w i n g
implementations
RAH amounts
i ty i e l d s
e s t i m a t e w o u l d be t h e same f o r a l l a l -
memory
I tI s esti-
i s of particular
problems.
1 4 . 1 . Let
certain
this
u
(t)
scheduls
the
consist
of
s
disjoint
corresponding
respectively.
Then
fragments the
execute
re-
estimate
£ max p 1 i£iss
that
sub-
(14.1)
the
said
fragments
con-
secutively.
Indeed, vided
t h e memory
t h e fragments
(14.1) c e r t a i n l y
i s freed
are executed
on c o m p l e t i o n
o f each
consecutively. Therefore
fragment
pro-
the estimate
holds,
COROLLARY. L e t a n a l g o r i t h m g r a p h
consist
of
s disjoint
subgraphs
and
the
mum
of
ith
fragment
is
that
execute
the
defects the
of
p^
13.5
of
the
fragments
( 1 4 . 11
provided
a
the requirement
s m a l 1 amount
of
In that
fied
as
use
f a r as
bottleneck
partitioned
into
[u,
v)
whose
end
I f we
two
its
end
placing
into
of
the
schedules
sharper
the
and
time
The
that
large
time
fragments
they
are
on
be
each i n -
may
prove
to
one
by
i s well as
our
to
consecutive
executed
i s concerned,
usefulness
of
data
the
of
fragment
result
input
requirement
same
as
than
o f each graph
course
that the
two
no
justi-
other
t i m e . The
of Statement
point.
The
the data
that
and
}
of
the
such
different
set
that
sets
i s a directed
im-
chief
14.1
conexter-
o f nodes V
f o r a l l arcs the
relations
of
the graph
i t from
t h e graph
cut
V ) then d e l e t i n g
(V ,
use
and
first
G^.
The
corresponding
i n f o r m a t i o n b o r n e by the f o l l o w i n g
attachment
transfer then
of
admits
corresponding
to
nodes
to
in G . I t
of the algorithm into
consecutively.
these
described inputting
by
the arcs of the
procedure.
o u t p u t node t o i t s o r i g i n
these d a t a
graph C always
a l l operations
cut defines „ s p l i t t i n g
be e x e c u t e d
loss of
d e l e t e d we
cut
a l l operations
can
a new
s e t s Jf
s u b g r a p h s G]
directed
the
Suppose
elements
execute
any
that
graph.
set o f such a r c s
that
link
outputting
all
PJ.
then
that
being
at
resources
disjoint
and
fragments
deleted,
(IT,
G
To a v o i d cut
two
p o i n t s are
schedules
i n G^
follows
results
to s u b s t a n t i a l l y decrease that
know a d i r e c t e d
G would s p l i t
nodes
imposes
Sufficiently
a directed
v e l ^ h o l d . The
i s denoted
such
of for
maxi-
efficiently.
is
of
i s of
forcing
memory u s a g e .
be
and
This
computer
the
consecutively.
significantly
14.1
If
i t shows t h e ways t o i m p l e m e n t t h e a l g o r i t h m u s i n g
G={V,E)
and
holds
implementation
RAM.
sign.
number
of a l l algorithm results
likely
i s now
memory
the
(14.1)
algorithm execution
are
i n that
Let
of
and
same
case t h e c o n s e c u t i v e e x e c u t i o n o f f r a g m e n t s
plementations
uel^
of
the fragments.
make e f f i c i e n t one.
nal
be the
a l g o r i t h m fragment,
execution
sists
may
i n memory. S t a t e m e n t
dividual
data
the
algorithm
that
requires
have
estimate
of
dropping stored
nodes
input
then
estimate
Statement
internal
number
total
The
all
new
and
As
nodes can
the arc
them i n t o
each arc
a new
being
be
directed i s being
input
node to
viewed
deleted
as
by
the o t h e r subgraph.
Whenever we deal with algorithm graph splitting using directed cuts we assume that the corresponding input and output nodes are always added.

So far we restrict our consideration to the consecutive execution of individual fragments. We also assume that the same set of memory cells may be used for the consecutive execution of individual fragments, and that any fragment execution begins by reading in memory all input data and terminates by outputting all the results.

STATEMENT 14.2. Let all internal nodes of an algorithm graph have nonnegative (nonpositive) defects. Then for any directed cut that does not split output (input) nodes, any implementation that executes both fragments consecutively requires one and the same amount of memory as the execution of the entire algorithm.

Consider for the sake of definiteness the case of nonnegative defects of internal nodes. Consider a directed cut. Any splitting of the algorithm graph induced by the cut, followed by the attachment of corresponding input and output nodes, does not change the defects of internal nodes. In case the cut does not divide the output nodes their number is the same for the second subgraph and for the entire graph of algorithm. By virtue of Statement 13.5 the same amount of memory is needed to execute the algorithm itself and its second fragment. The total number of output nodes of the first subgraph is not greater than the total number of input nodes of the second subgraph, the latter in its turn not exceeding the total number of its output nodes. It follows that the implementation of the first fragment does not require more memory than the implementation of the second one. The case of nonpositive defects of internal nodes is treated in exactly the same way, the required memory amount being now determined by the total number of input nodes.

COROLLARY. Let the defects of all internal nodes of the algorithm graph be either of the same sign or zero. Suppose a directed cut splitting the algorithm graph into two subgraphs involves $p$ arcs. Then the implementation of any algorithm fragment requires at least $p$ words of memory.
memory.
If into
we u s e a s e q u e n c e o f d i r e c t e d
subgraphs
then
the implementation
cuts
to slice
o f t h e whole
a n a l g o r i t h m graph a l g o r i t h m can be
140 b r o k e n i n t o a sequence o f i m p l e m e n t a t i o n s posing
every
tation
o f the whole
Statement result
fragment
requires a small algorithm w i l l
time
into
as g r a p h
c r e a s e u s u a l l y does n o t b e g i n when t h e s e t s o f b o t h
require small
o f r e q u i r e d memory
main unchanged f o r a l o n g
until
Sup-
a m o u n t o f memory, t h e i m p l e m e n -
also
14.2 shows t h a t p a r t i t i o n i n g
i n t h e decrease
o f i n d i v i d u a l fragments.
memory
amount.
s u b g r a p h s may n o t t y p i c a l l y amount. T h a t
splitting
the process
amount c a n r e -
progresses. i s close
i n p u t and o u t p u t nodes g e t s p l i t
I t s de-
t o i t s end,
to a
substantial
extent. Notice describe
that
a whole, data must
be
t h e r e d u c t i o n o f t h e r e q u i r e d memory a m o u n t w h i c h we
i s to a certain extent f i c t i t i o u s .
To e x e c u t e
t h e a l g o r i t h m as
exchange between f r a g m e n t s must be p e r f o r m e d .
stored
somewhere,
and
some
additional
memory
These
i s therefore
n e e d e d . By memory we a l w a y s mean r a n d o m - a c c e s s memory h e r e . input
The
As f o r t h e
and o u t p u t n o d e s a d d e d i n t h e c o u r s e o f t h e s p l i t t i n g
will
assume
that
they
represent
a l g o r i t h m graph s p l i t t i n g
implementation
requiring
implementation
always e x i s t s .
duced
t o t h e minimum
devices
i f only
process,
referencing external
p r o c e s s may be v i e w e d a s a s e a r c h
a specified
amount o f RAM.
Furthermore,
RAM
Obviously,
then
graph
splitting
f o r the s u c h an
u s a g e c a n i n f a c t be r e -
t h e arguments and t h e r e s u l t s
t h e time
o f exchanges w i t h
cut requires that number o f a r c s ternal
effect
i n that
i s small
fragment. tional tion perly
o f each i n -
c u t be p e r f o r m e d .
writes
that
algorithm execution
t h e number o f i n p u t d a t a
Of c o u r s e ,
this
I f this
algo-
i s given,
and reads e q u a l
Consequently
can be e f f e c t i v e l y
i n comparison w i t h
amount
e x t e r n a l memory becomes c r i t i c a l .
controlled
time
comparison
o f ex-
as g r a p h
split-
n o t take
I f we
and r e s u l t s
i s v a l i d only
Each
t o the
the time
consider-
manage
to find
o f each
t h e number o f o p e r a t i o n s
s y s t e m has o n l y one p r o c e s s o r
channel.
I f RAM
E x c h a n g e s w i t h e x t e r n a l memory w i l l
on o v e r a l l
such s p l i t t i n g ment
nodes.
a number o f a d d i t i o n a l
memory r e f e r e n c e s
t i n g progresses. able
are individual
we
memory.
d i v i d u a l o p e r a t i o n a r e s t o r e d . The f r a g m e n t s o f t h e c o r r e s p o n d i n g rithm
data
frag-
within
that
i n case t h e computa-
a n d o n e e x t e r n a l memory communica-
i s n o t t h e case
t h e n o u r a r g u m e n t s m u s t be
pro-
adjusted. T h u s we h a v e
essentially
reduced
t h e problem
of efficient
use o f
external of
memory t o t h e f o l l o w i n g
directed
cuts
number o f n o d e s cut to
exists,
whose
ly small
the implementation
increase
further
of
splitting
respect
with
external
'Does a l g o r i t h m g r a p h
i s considerably
subgraphs?'
of the entire
o f arcs
directed
tively
memory
are performed
operations
effectively,
execution.
small w i t h at
fragments.
respect
fines.
The
The t o t a l
to the total
l e a s t one d i r e c t e d
substantially
time
i s guaranteed
t o t h e c u t . Any
smaller
search
than
that
question exchanges
i . e . the
cumulative t h e time
t h e correspondence
(or individual
references)
o f our algorithm Into
consecu-
number o f a l l a r c s o f a l l c u t s i s
number o f g r a p h n o d e s .
cut exists
reduced
relative-
as compared w i t h
Establishing
c u t s we o b t a i n a s p l i t t i n g
executed
a l g o r i t h m may be
Suppose f u r t h e r
b e t w e e n g r o u p s o f e x t e r n a l memory r e f e r e n c e s and
one s u c h
o f t h e a l g o r i t h m i n v o l v e s a n s w e r i n g t h e same
t o each o f t h e fragments.
arithmetic
than t h e
belonging
t i m e o f e x t e r n a l memory r e f e r e n c i n g i s s m a l l of
less
algorithm execution
t h e number
such
that
I tfollows
t h e number o f a r c s
cuts
involves
the analysis
that
In i t i s
t h e number o f n o d e s I n t h e s u b g r a p h s
f o r other
admit
I f at least
o f t h e two f r a g m e n t s d e f i n e d b y i t . The
i n the overall
t h e smallness
with
of arcs
i n the corresponding
the implementation
by
number
question:
i t de-
of the frag-
ments. The g e o m e t r i c we r e g a r d be
from
the functional
point
a l g o r i t h m g r a p h nodes i n t o
prises
p
arcs.
corresponding tween
Consider to V
determines
quires V
to
the existence
also
directed cut
a n d V^. S u p p o s e I t com-
vector
u whereby t h e o p e r a t i o n s
L e t t be any time
groups
that
r . Yet i t i s only
must
of operations.
o f such
the data
be s t o r e d a t t h e moment
the existence
of a directed
schedule
N ( t ) i s p. G i v e n
memory f u n c t i o n a l
"u t,
respectively,
number o f a l g o r i t h m g r a p h
first
g r o u p and p o i n t
arcs form
I t i s easy t o see t h a t
arcs that originate
from
t o nodes o f t h e second group.
a directed
c u t f o r any t , "
(
u
r
) being
NAt) i s
t h e nodes o f the
Therefore
t h e number
a l l such
o farcs i n
it. It ted
i s i m p o r t a n t t o have v a r i o u s c r i t e r i a
cuts
( i f any) a g i v e n a l g o r i t h m
graph
t o determine
admits.
what
One o f t h e s e
direci s pro-
cured by STATEMENT 1 4 . 3 . (z
, P
the
w ) with P pairs of
distinct
node.
vector
Then
with
z ,
i
nonzero
Suppose
l
p
at ion
requires
at
under
the schedule
corresponding
least
linking
possess
a
to a
p words
of
delay
memory.
t h e common n o d e o p e r a t i o n i s
to
i s generated
t o t h e moment u . T h e s e d a t a
w i l l be
K
later
than u . Consequently, N ( u (ftp.
p
1
implementing
paths
u. A c e r t a i n amount o f d a t a
previously
consumed b y t h e n o d e s w ,...,w This c r i t e r i o n
(z^, w^>, . . . ,
a r e paths
these
p
1
arcs
there
and that
implement
components
t h e nodes z ,...,z
while
p
comprise that
z , w
. . . ;
t h e moment a t w h i c h
k
by
w;
any algorithm
Denote b y u be e x e c u t e d
graph
end points.
nodes
"
common
L e t a n algorithm
implies
K
U K
t h a t we c a n n o t d o w i t h
small
t h e a l g o r i t h m d e f i n e d by t h e graph
amount o f RAM
i n F i g . 1 2 . 2 . To
prove t h i s i t i s s u f f i c i e n t t o t a k e nodes o f a d j o i n i n g l e v e l s as z .....z a n d w .....w i n S t a t e m e n t 14.3. The nodes h a v i n g t h e maxiV p 1 p. mum
values
o f coordinates
i , j a r e common
p a i r s o f n o d e s . T h e amount o f d a t a execution in
o f the corresponding
one l e v e l ,
Reading
memory a c c e s s the
time
should
execution.
stored
This
these
data
n o t t o slow
implies
that
down
than
be done
o f points d u r i n g the
i t i s necessary
that
operation time f o r
significantly
a l l o r almost
that
the c o n s i d e r a t i o n o f p a r a l l e l
a l l data
the algoh a s t o be
implementations
o b l i g a t o r y f o r b o u n d i n g t h e r e q u i r e d memory a m o u n t . study of s e r i a l STATEMENT reached
these
i n RAM.
Notice
the
1inking
t h e number
must
levels. Obviously,
n o t b e much g r e a t e r
memory e x c h a n g e p r o c e s s
rithm
operations equals
and w r i t i n g
execution o foperations o fboth
f o r a l l paths
t h a t must b e s t o r e d a t t h e moment o f
on serial
implementations
14.4.
The global
In the general
case
i s sufficient.
maximum
implementations.
i s not
of
required
memory
amount
is
Let the quantity $N_u(t_0)$ be the global maximum reached on schedule u at time moment $t_0$. We can assume without loss in generality that no operations are being executed at the moment $t_0$. Then $N_u(t_0)$ is equal to the number of arcs originating from the nodes whose execution was over by the moment $t_0$ and pointing to the nodes that are to be executed after $t_0$. Then the said maximum would remain unchanged if we were to perform all operations before and after $t_0$ sequentially (as the number of arcs carrying data at the moment $t_0$ would not change).

Our treatment of two-level memory can easily be expanded to include the case of multi-level memory. Suppose that memory access time grows with the level number. As a rule, memory capacity rapidly decreases as the level number decreases. Furthermore, it can usually be assumed with little loss in generality that the cumulative capacity of the first l levels is much less than the capacity of the (l+1)th level. At this juncture multi-level memory usage can be investigated recursively. First we regard the highest-level memory as "external" memory and all other memory as "RAM". Having obtained all available information, we proceed to refining the structure of "RAM", splitting off the last but one level memory, etc. This recursive process of course amounts to the splitting of the algorithm graph into disjoint fragments by directed cuts.
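The quantity $N_u(t_0)$ used in the proof of Statement 14.4 can be evaluated directly for a serial schedule. The sketch below is a hedged illustration in Python: the name `peak_live_arcs` and the list-of-pairs graph encoding are hypothetical, and, as in the proof, it counts arcs rather than distinct data items.

```python
def peak_live_arcs(order, arcs):
    """Largest number of arcs 'in flight' between two consecutive operations.

    order -- list of nodes in the order a serial schedule executes them
    arcs  -- iterable of (u, v) pairs, one per arc u -> v
    An arc is in flight after k operations if its source is among the first
    k executed nodes and its target is not.
    """
    pos = {node: k for k, node in enumerate(order)}
    peak = 0
    for k in range(1, len(order)):
        live = sum(1 for u, v in arcs if pos[u] < k <= pos[v])
        peak = max(peak, live)
    return peak


# Hypothetical four-node graph: a and b feed c, and c feeds d.
arcs = [("a", "c"), ("b", "c"), ("c", "d")]
peak = peak_live_arcs(["a", "b", "c", "d"], arcs)
```

In this small example the maximum is reached just before c is executed, when both of its arguments are being held.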
15. Sectioning of Memory

We now suppose that for some reason or other it is necessary to use algorithms requiring very large amounts of RAM. The question naturally arises whether it is possible to weaken the demands on the RAM structure by taking into account more properties of practical algorithms.

Recall that by RAM we actually mean the memory whose access time is of the same degree of magnitude as algorithm operation time. We are not concerned about the technological and structural means to achieve fast memory access. If a relatively small number of memory references take somewhat longer, or even substantially longer, little drop in efficiency will actually occur in practice. We set out to make effective use of that circumstance.
Let the computer memory consist of sections. Assume that the functional units reference memory via a switch. Suppose that the average memory access time is roughly the same as operation time provided the functional units reference one and the same section of memory for a long time, and the time required to switch between memory sections is considerably greater. We will refer to RAM structured like this as sectioned memory. In case the data exchange between the functional units and memory is chaotic, the average memory access time will be large due to frequent switching between sections. However, if switching takes place only once in a while, then an acceptable average memory access time will ensue. Whether an algorithm admits of an effective implementation on a sectioned memory computer system is determined by the algorithm structure.
Consider any algorithm that can be effectively implemented on such a system. We have seen that switching between memory sections has to be infrequent in that case. It follows that almost all algorithm operations may be broken into such groups that all operation arguments for each group are taken from one and the same memory section. Assume the results of these operations are stored in that same section. It is generally not important where the results of operations that take arguments from different sections are stored, as their number is small. Nonetheless let us agree that the results of an operation are always stored into the section whence the last argument was drawn.

Now group together those algorithm graph nodes whose results belong to one and the same memory section. Switching will then be caused only by those arcs whose end points are elements of different groups. Deleting these arcs from the graph we obtain a splitting of the graph into disjoint subgraphs, their number being the same as the number of used memory sections. An algorithm implementation on a sectioned memory computer system is effective only if the number of arcs whose deletion splits the graph into disjoint subgraphs is relatively small.

Let G = (V, E) be a graph (not necessarily directed). Suppose the set of its nodes (vertices) is partitioned into two disjoint subsets $V_1$ and $V_2$. The set of arcs (edges) whose end points are elements of different sets is referred to as an undirected cut, or just a cut, of the graph, and denoted by $(V_1, V_2)$.
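The definition of an undirected cut translates into a one-line filter. The Python sketch below is only an illustration under stated assumptions: the function name `undirected_cut` and the edge-list encoding are hypothetical, and the second subset is taken to be every node not listed in the first.

```python
def undirected_cut(edges, v1):
    """Return the edges whose end points lie in different subsets.

    edges -- iterable of (u, v) pairs; orientation is ignored
    v1    -- set of nodes forming the first subset (the second subset is
             assumed to consist of all remaining nodes)
    """
    return [(u, v) for u, v in edges if (u in v1) != (v in v1)]


# Hypothetical five-edge graph on the nodes 1..4.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
cut = undirected_cut(edges, {1, 2})
```

The length of the returned list is the quantity that the discussion of sectioned memory requires to be relatively small.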
The possibility to build an effective algorithm implementation on a sectioned memory computer system means the following. Undirected cuts exist in the algorithm graph that consist of a relatively small number of arcs. Deleting these cuts from the graph splits the graph into disjoint subgraphs whose number is the same as the number of used memory sections and whose size is determined by section size. Clearly, the reverse statement also holds: if such undirected cuts exist in the algorithm graph, then the algorithm can be effectively (as regards run time) implemented on a sectioned memory computer system.

So far we have assumed the existence of a switch connecting functional units and memory sections. Actually the switch was introduced chiefly to make our discussion more illustrative. The essential requirements are in fact as follows:

- the computer system architecture admits of sectioned memory;
- the average memory access time is roughly the same as operation time if the functional units reference one and the same section of memory for a long time;
- switching functional units between different memory sections is considerably slower than performing operations.

These requirements are apparently satisfied by all computer systems that have sectioned, or distributed, memory. The actual method of data transfer between memory sections and functional units is unimportant as regards the above-listed requirements. Maybe a switch is used, or a communication network, or something else. It is also not important whether the functional units are distributed among memory sections or not. Provided the requirements are satisfied, the restrictions imposed on algorithms by a given system architecture bear solely on the sizes of the undirected cuts we choose.

For the algorithm, various system architectures will in general yield different usage of functional units. All peculiarities of individual system architectures eventually tell here. Both algorithm structure and communications contribute to this. The types of connections between memory sections and functional units play the main role.
We have shown in §14 that the algorithm in Example 12.2 cannot be implemented effectively on any multi-level memory computer system in case RAM size is insufficient to store all data, including the intermediate results. However, it admits of a good implementation on a sectioned memory computer system. Indeed, define a set of undirected cuts by hyperplanes parallel to the coordinate hyperplanes and not containing graph nodes. The end points of arcs of any of such cuts lie on the different sides of the corresponding hyperplane. The graph is split into subgraphs contained within the parallelepipeds delimited by the hyperplanes. The ratio of the number of arcs in each cut to the total number of algorithm graph arcs contained within the parallelepipeds is proportional to the ratio of the surface area of the parallelepiped to its volume. The latter is small for sufficiently large parallelepipeds. It follows that, establishing the correspondence between the parallelepipeds and memory sections, we achieve a good implementation of the algorithm in Example 12.2 on a sectioned memory computer system.

Thus using sectioned memory broadens the scope of effectively implemented algorithms. Our next questions are, what algorithms admit of an effective implementation on sectioned memory computer systems and what algorithms cannot be implemented effectively on such systems? The example we just considered suggests an answer to the first question.

Let algorithm graph nodes be points in some metrical space. Suppose that the nodes are situated within a region with sufficiently smooth boundary in such a way that their number is a good estimate of the volume of that region. We will call a graph local if the lengths of all of its arcs (edges) are small with respect to the size of the region containing the graph. It is easy to introduce various quantitative characteristics describing this notion, yet they are not of interest for us now.

STATEMENT 15.1. Any sufficiently large algorithm whose graph is local can be effectively implemented on a sectioned memory computer system.

Given a local graph, consider the hyperplanes parallel to the coordinate hyperplanes and not containing graph nodes. They define a partitioning of the region into parallelepipeds and of the graph into subgraphs.
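The surface-to-volume argument behind such a partitioning can be explored numerically. The following Python sketch rests on assumptions that are not taken from the text: the graph is a hypothetical n x n x n grid whose arcs join each node to its successor along every coordinate axis, the cutting hyperplanes are spaced b nodes apart, and the name `cut_fraction` is made up for this example.

```python
from itertools import product


def cut_fraction(n, b):
    """Count the arcs of an n x n x n grid graph that cross block boundaries.

    Each node (x, y, z) has an arc to its successor along every axis.
    Blocks are cubes of side b; an arc crosses a cutting hyperplane when
    its end points fall into different blocks.
    Returns (crossing, total).
    """
    crossing = total = 0
    for node in product(range(n), repeat=3):
        for axis in range(3):
            c = node[axis]
            if c + 1 < n:                 # the successor exists
                total += 1
                if c // b != (c + 1) // b:
                    crossing += 1
    return crossing, total


small_blocks = cut_fraction(8, 2)   # many small parallelepipeds
large_blocks = cut_fraction(8, 4)   # fewer, larger parallelepipeds
```

Comparing the two results for the same grid shows the share of cut arcs shrinking as the blocks grow, which is the effect the argument relies on.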
The
ered
during
re-analysis
is
that
our
the
total
parallelepipeds lengths sing
of
of
number
is
of
small.
they
parallelepipeds,
parallelepipeds the
number o f
such a r c s i s r e l a t i v e l y
facets
of
are the
Thus t h e p o s s i b i l i t y sectioned graph
analysis
of
graphs
there
are
or
their
Generally
most o f t e n t h e Note
isomorphic
are
of
graph c u t s . It
use
of
themlinking layers
the
total
the
of
the
local them
local
not possess t h a t
the b o t t l e n e c k .
(and
Dot
computation of
whereby
have
However, property.
the
like),
product
evalu-
s o l v i n g l a r g e systems o f g r i d
number
number
of
data
are
s t r u c t u r e upon
the
i n p u t and
the
equa-
output
The
one.
the
influence
memory
undirected
cuts
always easy t o The
now tell
traditional
loops.
Loops
of
involves
algorithm
the
play
i n v e s t i g a t i o n of
the
main
role.
whether a given
a l g o r i t h m and
often
encompass
the
algorithm
graph
is
isomorphic
program n o t a t i o n s computational
involve
fragments
t h a t have c l e a r - c u t m a t h e m a t i c a l meaning. D e s p i t e
t h a t t h e i r graphs between d i s t i n c t
l o c a l due
putational
liar
to the
existence
of
"long"
links
f r a g m e n t s . S u c h a l g o r i t h m g r a p h s may
c a l g r a p h s and
The
either
graphs.
method
a
algorithm graph.
not
be
the
i m p l e m e n t a t i o n on
to a
most to
conjugate gradient
fragments of
studying
required
i s not
to a local the
to
thin
Consequently,
that
w h o s e g r a p h s do
to
encompas-
respect
relatively
the
bottlenecks.
that
structure
those
different
points of arcs
within
shows
is proportional
thing
parallelepipeds
t h e end
algorithms
trouble while
tions.
the
with
consid-
local,
size of the
i s isomorphic
products being
operations
to
is
i s d e t e r m i n e d by w h e t h e r
the e v a l u a t i o n o f dot l o t of
belong
effective algorithm
ation
causes a
important
small
that
graphs
popular algorithms
already
graph
small.
that
Here b e l o n g , f o r example, t h e
that
graph
to the
also
situated
o f an
t o some g r a p h
practical
local
the
algorithm
only
points
parallelepipeds.
memory c o m p u t e r s y s t e m
i s close
are
provided
adjoining
The
Since
large. Therefore
different
to
12.2.
w i t h respect
that
sufficiently
close
Example
a r c s w h o s e end
small
I t follows
the
selves are
i s rather
of
relatively
i t s arcs are
region.
sizes
situation
of the
then
mathematical
split
meaning.
into Here
subgraphs lies
the
be
t h a t no
transformed longer
methodological
may com-
into lo-
have t h e
fami-
difficulty
of
dealing with local graphs.

Consider, for example, SSOR methods for the solution of systems of linear grid equations. Their structure is well mirrored by Example 12.2. These methods are iterative. The t loop in (12.2) describes individual iterations. On each iteration, upper and lower triangular systems are alternatively solved. The solution processes are described by the double loops in (12.2). The notation (12.2) has thus a clear-cut mathematical structure.

Applying the standard methods of building the algorithm graph to (12.2) would not yield a local graph. Locality is violated by "long" arcs that correspond to data exchanges between the two i, j loops in (12.2). In fact, the absence of locality was the main reason of our preferring the equivalent notation (12.3) to the notation (12.2). The corresponding graph is shown in Fig. 12.2, and it is local. It fully reflects the structure of the algorithm, and therefore the mathematical structure of the original notation (12.2). In particular, the odd t levels correspond to the upper double loop in (12.2), the even t levels correspond to the lower one. If we split this graph into subgraphs contained within parallelepipeds, then the resulting subgraphs would no longer reflect the structure of (12.2). Moreover, it is not obvious at all whether these subgraphs can be identified with any familiar individual objects and processes.
16. Decomposition of Algorithm and of Its Graph

Algorithm graph splitting (or decomposition) into subgraphs results in the decomposition of the algorithm itself into partial algorithms. The graph of every partial algorithm is the corresponding subgraph of the original graph, and it can be regarded as some new independent algorithm. The independence manifests itself in memory traffic peculiarities. In a number of cases partial algorithms are implemented independently of one another. There are also other facts that corroborate this viewpoint.

We will regard partial algorithms as new large operations of the original algorithm. We will use the term macrooperation to refer to
such operations. Data dependencies between termined two
by
algorithm
distinct
that originate arc. so
i n one
Obviously,
that
graph
subgraphs,
arcs
call
be
to
graph
be
as
regarded
of
larger
and
data The
as
o p e r a t i o n s and
i n two
stages.
level,
then
tion
data
of
partial
quires a
by
First,
choices Taking
demands.
graphs
i t i s p o s s i b l e t o choose macronodes
t h e demands o n d a t a
The
w o u l d be
strict
natural
requirements
cuts
arc
I f the
an
belonging
disjoint
parts,
parts.
t h e end
be
cut. I t follows
This used
deleting
these
of
on
terms
operations cuts.
implementa-
The
t h e macroimplementa-
macronodes
the e n t i r e
re-
algorithm.
c o n s i d e r a b l y weaken t h e r e of
i n s u c h a way well.
This
system
individual
as
t o weaken
results
in i t s
architectures.
what
c o n d i t i o n s the
We
macro-
macrograph
means
that
to build
graph
is
macrograph
t o any cut
macroarc.
from
split
is
the
into
subgraphs
every
macroarc
t h a t any induces
I t i s an
graph
splits
element the
involves the arcs
of
some
graph
into
the set o f d i r e c t e d such
those c u t s from
set of directed
the macrograph
cut of
differ-
of only
directed cut that participated a directed
by
acyclic.
macronodes o f t h e macroarc b e l o n g i n g t o
Consequently,
f o r m i n g o f the macrograph
can
of
peculiarities
computer
algorithm
resulting
cut. Deleting that
self.
on
question arises,
16.1. then
Take any
directed
size
arcs refer
in
compromises are p o s s i b l e w h i l e choosing
directed
ent
will
acyclic.
STATEMENT directed
the
e x c h a n g e b e t w e e n t h e m as
have seen t h a t e f f e c t i v e nodes.
account
sets
is acyclic, i t
individual
that
o f macronodes can into
We
a r e d e s c r i b e d on
to
than
source
i n less
graph.
to build algorithm
algorithms corresponding
l e s s e r amount o f r e s o u r c e s
macro-
r e g a r d m a c r o a r c s as
the chosen a l g o r i t h m graph
implementations
any arcs
disjoint
algorithm described The
de-
graph
the other a into
i n d i v i d u a l macronodes a r e c o n s i d e r e d .
Moreover, s p e c i a l
turn
into
the macrograph
transfers.
are
subgraphs. For
algorithm
divided
will
c o n c e p t o f m a c r o g r a p h a l l o w s us
tions
sink
o f the o r i g i n a l
i s determined
graph
of
m a c r o n o d e s o f t h e new
macrograph. Provided a graph
transfers
separate
be
a m a c r o a r c . We
the corresponding
raacrooperations
set
o f t h e subgraphs and
each s e t would
t h e new
link
entire
a l l a r c s o f a l l c u t s may
connecting
can
that
the
the
one
i n the
the macrograph
cuts
i n the
cuts
i n t h e macrograph
that
the corresponding
mac-
isolates
algorithm
it-
graph
ronodes. Suppose the macrograph contains a circuit. During the elimination of the cuts this circuit must be broken at some moment due to the elimination of some macroarc. As the end points of that macroarc must belong to non-connected parts of the graph, no other path linking the macronodes may exist. Therefore the macrograph cannot contain circuits.

Thus the use of directed cuts yields an acyclic macrograph. Therefore the macrograph can actually be regarded as an algorithm graph macrodescription. Parallel forms and schedules for the macrograph may in this case be investigated and built in just the same way as for the "usual" algorithm graph. Each macrooperation may be executed independently of those macrooperations that do not influence its arguments at the moment of its execution. The set of macrooperations may be executed sequentially or in parallel.
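The acyclicity property discussed above can be checked on small examples. The Python sketch below is a hedged illustration, not the author's construction: the names `build_macrograph` and `is_acyclic`, the dictionary that assigns each node to a macronode, and the four-node chain are all assumptions made for this example. It contrasts a splitting obtained by a directed cut with one that is not.

```python
def build_macrograph(arcs, part):
    """Collapse arcs into macroarcs.

    arcs -- iterable of (u, v) pairs of the algorithm graph
    part -- dict mapping every node to the label of its macronode
    Returns the set of (source_macronode, target_macronode) pairs.
    """
    return {(part[u], part[v]) for u, v in arcs if part[u] != part[v]}


def is_acyclic(nodes, arcs):
    """Kahn-style check: repeatedly remove nodes that have no incoming arcs."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for w in successors[n]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(nodes)


chain = [(1, 2), (2, 3), (3, 4)]                       # hypothetical graph
by_directed_cut = {1: "P", 2: "P", 3: "Q", 4: "Q"}     # one directed cut
interleaved = {1: "P", 2: "Q", 3: "P", 4: "Q"}         # not a directed cut

good = is_acyclic({"P", "Q"}, build_macrograph(chain, by_directed_cut))
bad = is_acyclic({"P", "Q"}, build_macrograph(chain, interleaved))
```

For the interleaved grouping the two macronodes depend on each other, so neither macrooperation could be executed as a whole before the other.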
Each individual macrooperation may also be implemented in any suitable way. For example, the macrooperations may be executed consecutively, while the execution of each of them is parallelized.

Suppose that a macroprocessor is chosen to execute the macronodes. It has some resources, including memory. Generally an optimization problem has to be solved while choosing directed cuts. The macroprocessor memory requirements decrease as individual macrographs get smaller, but requirements on communication channels throughput grow; vice versa, choosing larger subgraphs results in milder requirements on communication channels, but larger macroprocessor memory is needed.

Regarding the macroprocessors as functional units, we can use them to build various computational systems on which the macrograph can be implemented. The possible organizations include systems with a single macroprocessor, pipelined processors, systolic arrays, etc. [88]. If we are able to arrange the directed cuts in such a way as to weaken the data dependencies between the macronodes, then the requirements on the communication network linking them will be mild. After we decide on the number of macroprocessors they can be linked practically arbitrarily. The topology of links will not have significant impact on the overall implementation efficiency.

Whenever there is a possibility to split the algorithm into loosely coupled subgraphs, it is worth doing in most cases. The splitting relegates
the main difficulties of the problem of mapping algorithms onto computer architectures to the macroprocessor structure. Note that we never imposed any restrictions on the structure of the macroprocessor. They may well be very simple functional units or very complex ones, and work in synchronous or asynchronous modes, etc. Their complexity is ultimately determined by both the structure of operations and the computer technology advances.

We have merely sketched the ways to solve the problem of mapping algorithms onto computer architectures. Yet even our superficial treatment reveals the remarkably tight connection between the structure of the algorithm and the structure and properties of the computational system that optimally suits it. A more detailed discussion of the mapping problem is not currently our intention.

Algorithm graph decomposition using directed cuts is so important that we would like to express it once again in our notation. Consider Example 12.1 and the splitting of its graph induced by a set of hyperplanes parallel to the coordinate planes. It is easy to see that this splitting generates subgraphs whose structure is analogous to that of the overall graph. Therefore the partial algorithms should admit of a description analogous to 12.1. Denote by EXP(A, B, C, M, N, L) the subroutine that operates upon the arrays A, B, and C of dimensions N×L, M×L, and M×N, respectively, and is determined by the following description in a FORTRAN-like language:

      DO 1 t = 1, M
      DO 2 i = 1, N
      DO 3 j = 1, L
    3 A(i,j) = A(i,j) + A(i-1,j) + A(i,j-1)
      IF (i ≠ N) GO TO 2
      B(t,j) = A(N,j)
    2 C(t,i) = A(i,L)
    1 CONTINUE                                        (16.1)

As usual, we assume that by definition

      A(0,j) = B(t,j),    A(i,0) = C(t,i).

It can be easily verified that the additional modification of entries of B, C in (16.1) does not change the entries of A computed in (12.1).
With the above-described partitioning of the algorithm graph in Example 12.1 every subgraph corresponds to a partial algorithm also described by (16.1), the values M, N, L now being the sizes of the subgraph. Besides, the arrays A, B, and C now store the input data and results also belonging to the subgraph. These input data and results provide explicit information on what data may be stored in external memory and what can be gained by that. Now it remains to put down the algorithm as a whole.

Suppose that the numbers M, N, L in (12.1) are represented as sums of integers

      $M = m_1 + \dots + m_p$,    $N = n_1 + \dots + n_q$,    $L = l_1 + \dots + l_s$.

According to these representations partition the matrix A into blocks $A_{rk}$ of sizes $n_r \times l_k$, the matrix B into blocks $B_{hk}$ of sizes $m_h \times l_k$, and the matrix C into blocks $C_{hr}$ of sizes $m_h \times n_r$. The algorithm can now be written in the form

      DO 1 h = 1, p
      DO 2 r = 1, q
      DO 3 k = 1, s
    3 EXP(A_rk, B_hk, C_hr, m_h, n_r, l_k)
    2 CONTINUE
    1 CONTINUE                                        (16.2)

The initial contents of the arrays $A_{rk}$, $B_{hk}$ and $C_{hr}$ must match that of the corresponding blocks in the arrays A, B, and C. The algorithm results will be stored in the array A consisting of the blocks $A_{rk}$ grouped in the appropriate manner.

The notation (16.2) renders unnecessary any further research concerning the possibilities of algorithm graph splitting using directed cuts, as it provides the full information about it explicitly.
I t
Is
not
even
that
necessary
i s t o be Now
ting
taken
consider
using
cutting
t o know t h e
Inner
i n t o account
Example
undirected
hyperplanes
As
Example 12.1,
algorithms that
with
produces
plementation the
nodes w o u l d
process
into
of
algorithms.
This
undirected
Any
it
a set
of
between
of
linked
graph then
them.
In
a partial
operations.
This
is
typical
an
algorithm
Algorithm
closely
nearly
used
to
usually
dominates the
effective
the
cuts
The
problem
that
overall
the
same
im-
reduce partial
splittings
represent
of
i n various
typically
the
used
of
dependen-
decompose
i n the
I n t h a t c a s e we
set
of
i s the
fact
algorithms.
execution
portabi1ity
is
libraries.
only
that
software
For
time
of
time.
The
therefore
Since
the
directly
are
example,
subroutines problem largely
overall
of the
size
of
performed. beget
tremendous neglected.
that some
the
inevitably
on
i t is
c i r c u m s t a n c e o f c o u r s e c a n n o t be
have
will
ex-
systems, Macrooperations
algorithm execution
fields
cuts
cannot but
transporting applications
The
the
macro-
i n some a p p r o p r i a t e mode.
complicated
the
algo-
i s whether
the macrograph. I f u n d i r e c t e d
role.
the analysis reveals
area
are
t r a n s p o r t c a n n o t be
amounts o f a l g o r i t h m s . T h i s Nonetheless
graph
t h e n o t a t i o n s f o r i t bear n o t
describe
play
i s huge, the
progress
to
to exist
including parallel
the p o r t a b i l i t y
libraries
using
to
main q u e s t i o n
e v e n more i m p o r t a n t
applications software
problem o f
algorithm
The
i s guaranteed
i s d e f i n e d by
the
subroutines
that
to data
computers,
always
partial
algorithm
to
directed
d e c o m p o s i t i o n and
various
I t follows the
graph.
the
the macrooperations according
such guarantees e x i s t .
to
the the
case
I t seems t h a t
related
onto various
Provided
i t i s impossible
allows
ecute a l l macrooperations simultaneously
memory u s a g e .
split-
hyperplanes,
identifying
to break
for
macrooperations.
order
order
u s e d , t h e n no
algorithms.
circuits.
i . e.
thing
time.
t o that o f the e n t i r e
impossible
stages,
only
algorithm graph
coordinate
similar
The
cuts.
i s possible to order
cies
are
situation
decomposition
as
two
EXP.
the whole a l g o r i t h m to implementations of
p r o d u c e d by
rithm
the
have
I t i s i n general
implementation
partial
to
be
the
t h e m a c r o g r a p h p r o d u c e d by
new
macrograph
before,
parallel
s t r u c t u r e of a l l subgraphs w i l l Unlike
i s i t s execution
12.2.
cuts
are
s t r u c t u r e of
set
common
of
algorithms
kernel,
usually
of
one
not
and very
large.
Moreover, t h i s
sufficiently
large
o p e r a t i o n s , as plication,
I f a l l linear
a l g o r i t h m s w o u l d be
of
such a
the
algebraic
reduced
course,
operations.
in
are
fields
t h e d i f f e r e n c e may totally
example,
rithm
various
the
identical,
common. The
different
graphs
of
though
would turn
one
of
graph
to
the key
and
parts of
yield
algorithms
(12.1)
and
linear
of
their
the
may
the
described
i n the block
block
of
that,
form. Yet
rather
identical
are
have
graphs. algonothing
revealed
book i s d e v o t e d .
based
on
cuts
we than
multiplication
these a l g o r i t h m s
analysis,
of
kernels.
functional
situation
to which t h i s
algorithms
In spite
possess
algebra
background
development
theory.
be
of
theoretical
matrix
investigation
the s t r u c t u r a l
multi-
algebra a l -
carefully
different
to
of
functionally
essential traits
building
matrix-vector
were
The
of
out
several
matrix-vector
problem f o r l i n e a r
seldom d e s c r i b e d
s t r u c t u r a l analysis of algorithms, rithm
i n terms of simple
to the problem of p o r t a b i l i t y
matrix-vector
i s a w e l l - e s t a b l i s h e d branch
that
structural: For
algorithms
the p o r t a b i l i t y
l i n e a r a l g e b r a i c a l g o r i t h m s are Of
the
multiplication,
description Is closely related
m e t h o d s and
observe
described
example,
such k e r n e l f o r numerical
terms of such o p e r a t i o n s ,
implementing
o f t e n be For
m a t r i x a d d i t i o n and
etc. constitute
gorithms. in
k e r n e l can
operations.
by
the
Algo-
constitute
Chapter 4
Matrix Investigation of Algorithm Structure

Various questions arise as we are investigating an algorithm. Attempts to find answers to them follow. The scope of questions is usually quite wide. The questions may touch upon complexity bounds, upon the feasibility of certain transformations, upon computing various characteristics, etc. Algorithm investigation actually amounts to the investigation of its record. Hence the importance of the chosen algorithm notation and its structure. It is intuitively clear that the success of this investigation depends largely upon the structure of the algorithm itself.

The
discussion
some o b j e c t s
of algorithm
are specified
the
emerging q u e s t i o n s .
use
already proved
the
graph
notation.
depend 11
beget
associated
it,
with
algorithm
matrix
only
Clearly,
o f data
consider
fully
We
algorithm
shall
parti-
including
structure graph
graph
notation c a n be
must be r e -
we
shall
associate
start
with
an o b j e c t
a
with
a number o f p r o b l e m s b e a r i n g o n a l This
connections
a particular kind
reflects
being useful
i t s de-
o f algorithm
algorithm
to generality,
notation.
investigation.
as v a r i a t i o n m a t r i x
entries
which
that
object.
properties
weighted
its
object
Fairly often
notation,
the choice
that helps solve constructively
will
and i t s
s p e c i f i c a t i o n s . Many
of exploited that
o u r commitment
general
answers t o
i s one o f s u c h o b j e c t s ,
i n i t s pure form.
i n some way o r o t h e r .
i n that
gorithm
to
assumed
a specific
be m e a n i n g f u l i f
constructive
H o w e v e r , we h a v e o b s e r v e d many t i m e s
on t h e k i n d
c a n be
Following rather
A l g o r i t h m graph
fruitful.
would only
provide
i s a c c o m p a n i e d b y some a d d i t i o n a l
cularities
flected
would
i s not always applied
scription
should
structure
that
o f algorithm. the algorithm
object
i s a
o f algorithm.
matrix
In this
o f such m a t r i c e s . The
structure
graph.
chapter
a we
I t i s referred
of i t s
Therefore
called
nontrivial
we c a n c o u n t
on
f o r t h e s o l u t i o n o f problems concerning t h e i n v e s t i g a -
t i o n o f a l g o r i t h m s and t h e i r
structures.
17. Graphs and Matrices

Consider any algorithm data
change.
algorithm tations such input
We
have
the order
data
property.
within
the o r i g i n a l
graph.
on n u m b e r s . consists
an a l g o r i t h m
Denote
u
= F
k
k
l u
a l l F^
heavy
those
algorithms this
T h u s we evaluation
we
by
the value
that
evaluate
assume
that
p variables
information cluding
u^
be e a s i l y
number
certain
one o p e r a -
o f operations
data.
the algorithm
Suppose
*
< k.
(17.1)
of
their
arguments.
b u t some s e t o f v a l u e s u ^ . that
there
amounts
functions
Is quite
u
theory
F besides
propagation
and o f g r a d i e n t
obtained only
This
the recurrent
U^j...,u . Both
error
to a
to just
i s only
one r e -
to considering a t given
only
points.
Ob-
large.
relations
(17.1)
describe the
function
on t h e f u n c t i o n
roundoff
derivatives
c a n , up
functions
results
v • FUj
of
be-
as a sequence
fc
c a n assume
class of algorithms
of a certain
even i f
i s established
computations:
smooth
Without
viously,
we
k
are sufficiently
restrictions,
n o t change
k
c a n be t a k e n f o r a l g o r i t h m
represented
sequences
i n generality.
pp, t h e n
graph.
the f i r s t
and a l l t h e r e s t
postulate
t o t h e j t h node
evaluating
emerge i n
(17.1).
o f u ^ t o t h e k t h g r a p h node. C l e a r l y ,
i s used as an argument w h i l e
ed
matrix
problems concerning t h e i n v e s t i g a t i o n o f a l g o r i t h m
t h a t an a r c
i f and o n l y i f t o (17.1),
no
t h e k t h node i s p o i n t -
t o by a r c s o r i g i n a t i n g f r o m nodes k , . . . , k
Sk
1 Now we w i l l
s k*p
otherwise.
(17.7)

to see that the kth column of the matrix corresponds to the quantity $u_k$, and the kth row corresponds to the quantity $u_{k+p}$. The kth row has entry $-1$ in the column whose number corresponds to the number of the quantity $u_{k+p}$ being evaluated. It has entries $+1$ in columns whose
numbers
describes refer The is
numbers
the informational
connection
o f information
obvious.
entries
The f o r m e r
representing
of the quantity
t i , . The m a t r i x * K*p o f u , . T h e r e f o r e we w i l l k
interconnection
t o I t a s t h e information
relation
the
a r e argument
i s derived
partial
of algorithm
matrix
connection
matrix
from
the l a t t e r
derivatives
o f F.
by s e t t i n g
to unity.
s t r u c t u r e o f n o n z e r o e n t r i e s i s t h e same f o r b o t h The
ces
information
related
nonzero
connection
to the algorithm
e n t r i e s . We
connection
matrix.
matrices
H e r e we w i l l
of this
structure
graph. S u b s t i t u t e
kind
of their
nonzero
reflects
information
of this kind
i s the
to this matrix.A l l
o f algorithm
entries fully
i s why
family of matri-
as w e i g h t e d
our discussion
the graph
a l l the
That
any numbers f o r a l l i t s
matrix
A sample m a t r i x
limit
define
matrix
matrices.
spawns a w h o l e
t o any such
of the algorithm.
matrix
variation
refer
matrix
(17.1).
to the variation
uniquely,
so t h e
the structure o f a l -
gorithm. We
have
many
times
graph nodes can s i m p l i f y hope t o a c h i e v e rix
observed
a simpler
o f graph
the
layerwise:
operations
of m a t r i x
matrix)
first,
the operations layer,
new
a r g u m e n t s o f t h e new o p e r a t i o n
operation
in
F .
Then
we
except
Fig.
trix
with
1 7 . 1 . Each
nonzero
defines
has o n l y
i s selected
and columns
t h e new
r o w o f F.
e n t r i e s equal
layer,
then
the enumeration
has a t l e a s t
1. Each
one nonzero e n t r y
we
corresponding
to
have n o t been c o u n t e d y e t , so as t o match
t h e new r o w
t o the i n i t i a l
of the information
enumeration
fol-
t o t h e arguments o f
a l l columns
F^ t h a t
f o r columns corresponding
t h e rows
accordance
Enumerate a l l
i n the f i r s t
e t c . That
enumerate
T h e new c o l u m n e n u m e r a t i o n
terchanging
mat-
t h e e n u m e r a t i o n o f c o l u m n s we p r o c e e d a s
the
enumeration,
connection
(17.1).
we e n u m e r a t e a l l c o l u m n s c o r r e s p o n d i n g
the
etc.
t h a t we c a n
by c h o o s i n g a s u i t a b l e enumera-
form o f the algorithm
i n t h e second
r o w s . To d e f i n e
lows. F i r s t ,
enumeration o f
nodes.
Consider any p a r a l l e l operations
an a p p r o p r i a t e
s t r u c t u r e o f the information
(and hence o f t h e v a r i a t i o n
tion
that
the graph d e s c r i p t i o n . I tf o l l o w s
obtain
connection
the matrix
one n o n z e r o
entry
row o f t h e h a t c h e d - o v e r that equals - 1 .
data. I n -
part
matrix
shown i n
and a l l t h e o f t h e ma-
[Fig. 17.1, panels a) and b)]

In the general case each column of the information connection matrix can have more than one entry equal to unity. In particular, the jth column includes $s_j$ such entries if $s_j$ operations use $u_j$ as one of their arguments. We have mentioned that the meaning of this situation is that some arcs originating from certain graph nodes stand for the transport of identical data items. Recall that we refer to this situation as data broadcasting. Thus if there are data broadcasts in the algorithm then some columns of the information connection matrix may have nonzero intersection with more than one matrix row. This property holds for all row and column permutations. In particular, the above permutations transform the information connection matrix to the form shown in Fig. 17.1a). It follows that there exists at least one column that goes through more than one of the $P_j$. If there are no data broadcasts in the algorithm then the matrix will look as in Fig. 17.1b). No column goes through more than one of the $P_j$ matrices.

Matrices are traditionally used to represent graphs. We will discuss several kinds of such matrices, taking into account the most general properties of algorithm graphs. Consider a directed graph $G=(V,E)$ without loops and multiple arcs. Suppose it has n nodes and m arcs, and all nodes and arcs are marked. An $n \times n$ square matrix B with entries $b_{ij}$ is called adjacency matrix of the graph if

$$b_{ij} = \begin{cases} 1 & \text{if an arc originates from the } i\text{th node and points to the } j\text{th node,} \\ 0 & \text{otherwise.} \end{cases}$$

Note that the main diagonal of the adjacency matrix is zero. Rows corresponding to output nodes and columns corresponding to input nodes are zero as well.

The adjacency matrix of the algorithm graph is closely related to its information connection matrix. Specifically, the information connection matrix is the submatrix formed by the last $n-p$ rows of the matrix $B' - E$, where B is the adjacency matrix of the algorithm graph, E is the identity matrix, and the superscript ' stands for transposing.

As shown above, the information connection matrix can be transformed so as to facilitate the exploration of the algorithm by the appropriate choice of enumeration of operations. In fact this amounts to transforming the adjacency matrix by enumerating graph nodes in an appropriate way.

STATEMENT 17.1. A loopless directed graph defined by the adjacency matrix B has no circuits if and only if there exists a permutation matrix P such that $P'BP$ is upper triangular.

Supposing the graph has no circuits, various parallel forms of it must exist. Take any of them and enumerate the nodes layerwise, from layer to subsequent layer, starting at 1. With that enumeration any arc would point to a node with greater number. This means that the adjacency matrix is upper triangular. Now suppose the adjacency matrix is upper triangular. Tracing any path in the graph we would always pass by nodes with increasing numbers. Therefore we would never find ourselves back in the starting node. It follows that there are no circuits in the graph. Since the graph has no loops, all diagonal entries of the adjacency matrix are zero.
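The argument behind Statement 17.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the adjacency matrix is a list of 0/1 rows, nodes are numbered from 0, and the function names (`parallel_form_layers`, `renumber`, `is_upper_triangular`) and the sample graph are hypothetical and do not come from the text. The first function builds one particular parallel form layer by layer, and the second applies the corresponding renumbering, which is the same as forming $P'BP$.

```python
def parallel_form_layers(B):
    """Group nodes into layers of a parallel form; return None if a circuit exists."""
    n = len(B)
    # Number of arcs pointing to each node (column sums of B).
    indegree = [sum(B[i][j] for i in range(n)) for j in range(n)]
    layer = [j for j in range(n) if indegree[j] == 0]
    layers, placed = [], 0
    while layer:
        layers.append(layer)
        placed += len(layer)
        next_layer = []
        for i in layer:
            for j in range(n):
                if B[i][j]:
                    indegree[j] -= 1
                    if indegree[j] == 0:
                        next_layer.append(j)
        layer = next_layer
    # Nodes lying on a circuit never reach indegree zero and are never placed.
    return layers if placed == n else None


def renumber(B, order):
    """Return P'BP for the permutation that gives node order[k] the new number k."""
    return [[B[a][b] for b in order] for a in order]


def is_upper_triangular(M):
    """True if every entry strictly below the main diagonal is zero."""
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))


# Hypothetical example: arcs 3->1, 3->2, 1->0, 2->0, numbered "badly".
B = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 1, 0]]
layers = parallel_form_layers(B)
order = [node for layer in layers for node in layer]
PBP = renumber(B, order)
```

For the sample graph the layerwise enumeration turns a matrix that is not upper triangular into one that is. A graph with a circuit, such as two nodes pointing at each other, yields `None` because no layer can ever be started.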
Actually this proof allows us to establish an even closer relationship between a particular parallel form and the graph's adjacency matrix.

STATEMENT 17.2. An acyclic directed graph without loops and multiple arcs, defined by the adjacency matrix B, has a parallel form of height l and width s if and only if there exists a permutation matrix P such that $P'BP$ is an upper triangular block matrix of block order l with zero square diagonal blocks of order not exceeding s.

For an arbitrary acyclic directed graph, little further details as to the structure of the adjacency matrix can be obtained. As we specify additional graph properties, the corresponding properties of the adjacency matrix can be discovered. We had a similar situation when we discussed the information connection matrix.

An $n \times m$ rectangular matrix A with entries $a_{ij}$ is called incidence matrix if

$$a_{ij} = \begin{cases} 1 & \text{if the } j\text{th arc originates from the } i\text{th node,} \\ -1 & \text{if the } j\text{th arc points to the } i\text{th node,} \\ 0 & \text{otherwise.} \end{cases}$$

Only two entries within each column are nonzero. They equal 1 and $-1$ since there are no loops in the graph. No two columns are identical because there are no multiple arcs.

Suppose that an algorithm whose graph is defined by its incidence matrix A is implemented on a computational system. Assume the implementation satisfies the conditions postulated in Chapters 2 and 3. Consider a schedule $t = (t_1, \ldots, t_n)$ and a delay vector w. A reformulation of Statement 7.1 gives

STATEMENT 17.3. Let A be the incidence matrix of a graph and w be a delay vector. For a vector t to be a schedule corresponding to the delay vector w it is necessary and sufficient that the inequality

$$-A't \ge w \qquad (17.8)$$

holds.
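As a small illustration of the inequality (17.8), the sketch below builds an incidence matrix from a list of arcs and checks a candidate schedule component by component. The representation of arcs as `(source, target)` pairs, the 0-based node numbering, the function names, and the sample data are all assumptions made for this example only.

```python
def incidence_matrix(n, arcs):
    """Build the n x m incidence matrix A for arcs given as (source, target) pairs."""
    A = [[0] * len(arcs) for _ in range(n)]
    for j, (src, dst) in enumerate(arcs):
        A[src][j] = 1    # the jth arc originates from node src
        A[dst][j] = -1   # the jth arc points to node dst
    return A


def satisfies_delays(A, t, w):
    """Check the vector inequality -A't >= w component by component."""
    n, m = len(A), len(w)
    return all(-sum(A[i][j] * t[i] for i in range(n)) >= w[j] for j in range(m))


# Hypothetical data: nodes 0 and 1 feed node 2, which feeds node 3; unit delays.
arcs = [(0, 2), (1, 2), (2, 3)]
A = incidence_matrix(4, arcs)
w = [1, 1, 1]
ok = satisfies_delays(A, [0, 0, 1, 2], w)
too_early = satisfies_delays(A, [0, 0, 0, 2], w)
```

Each component of $-A't$ is the difference between the moments at the head and the tail of one arc, so the second candidate fails because node 2 would start at the same moment as its arguments.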
We cannot hope that the matrix-vector description of schedules would significantly simplify their investigation. The incidence matrix hardly ever accompanies an algorithm's description; besides, its size is immense. However, special features of an algorithm graph must necessarily be reflected in the structure of nonzero entries of the incidence matrix. This can be useful when exploring the set of inequalities (17.8). If the lth component of the vector inequality (17.8) refers to the ith and jth algorithm graph nodes then it has the familiar form

$$t_j - t_i \ge w_{ij}.$$

Therefore Statement 7.8 can be reformulated as follows:

STATEMENT 17.4. For all algorithm graph cycles to be balanced it is necessary and sufficient that the system of linear algebraic equations

$$-A't = w \qquad (17.9)$$

be compatible.

Certainly, all our earlier results can be reformulated in terms of certain properties of the vector inequality (17.8). The question is whether it is worthwhile, and in case it is, what the purpose of it may be.

Algorithm graph is fairly often balanced with respect to the delay vector $w = e$, where all components of e equal 1.

STATEMENT 17.5. An acyclic directed graph without loops and multiple arcs, defined by the adjacency matrix B, has all cycles balanced with respect to the vector e if and only if there exists a permutation matrix P such that $P'BP$ is block upper bidiagonal with zero square diagonal blocks.

Consider a solution to (17.9). Using this vector (assuming the nodes correspond to operations of some algorithm) we can build a parallel form of the graph, grouping into one layer the nodes with the same execution moments. Each arc can link only nodes from neighboring layers as the corresponding execution moments will differ only by 1. Enumerating the nodes of the graph layerwise we observe that the adjacency matrix has the form specified in the statement. Now suppose the adjacency matrix has the prescribed form. Build a parallel form of the graph, ascribing to one and the same layer the nodes corresponding to each diagonal block. Each node will be linked only to nodes from a neighboring layer, so all cycles of the algorithm graph will be balanced.

Statements concerning the adjacency matrix can be helpful when investigating the information connection matrix and the variation matrix of an algorithm. They point out the way of enumerating the operations of the algorithm (17.1) to facilitate the investigation of its structure.
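The compatibility condition of Statement 17.4 for the delay vector $w = e$ can be tested directly. The sketch below is one possible way, not a method taken from the text: it tries to solve $t_j - t_i = 1$ for every arc from node i to node j by walking over the graph with arc directions ignored. The function name `balanced_levels`, the arc-list input, and the sample graphs are hypothetical.

```python
from collections import deque


def balanced_levels(n, arcs):
    """Try to solve t[dst] - t[src] = 1 for every arc; return t, or None if incompatible."""
    neighbors = [[] for _ in range(n)]
    for src, dst in arcs:
        neighbors[src].append((dst, 1))    # following the arc raises t by 1
        neighbors[dst].append((src, -1))   # going against the arc lowers t by 1
    t = [None] * n
    for start in range(n):
        if t[start] is not None:
            continue
        t[start] = 0                       # each connected component is fixed up to a shift
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j, step in neighbors[i]:
                if t[j] is None:
                    t[j] = t[i] + step
                    queue.append(j)
                elif t[j] != t[i] + step:
                    return None            # some cycle is not balanced
    return t


# Hypothetical graphs: a "diamond" (balanced) and a triangle with a shortcut arc (not balanced).
diamond = balanced_levels(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
shortcut = balanced_levels(3, [(0, 1), (1, 2), (0, 2)])
```

When a solution exists, nodes with equal values of `t` can be placed in one layer, and every arc then links neighboring layers, which is the block upper bidiagonal structure named in Statement 17.5.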
18. Recovering the Linear Functional

Let us consider in greater detail a special case of algorithm (17.1) where all functions $F_k$ are linear. This implies that the execution of algorithm consists in computing

$$u_k = \sum_{i=1}^{s_k} \alpha_{k,k_i} u_{k_i}, \qquad k = p+1, \ldots, n. \qquad (18.1)$$

Assume that all $\alpha_{k,k_i}$ are known before the computation starts. Obviously, this computation determines a way to find values of some function (17.2) that has to be linear in the variables $u_1, \ldots, u_p$. Consequently, it has the form

$$v = F(u_1, \ldots, u_p) = \sum_{j=1}^{p} \beta_j u_j. \qquad (18.2)$$

Of course, the direct evaluation of F using (18.2) is much more effective than implementing (18.1). However, $\beta_j$ may not be known. That is why it may become necessary to evaluate F indirectly via (18.1). The natural question arises: 'what is the best way to do it?'

The total number of additions, subtractions, and multiplications to execute (18.1) equals

$$N = 2 \sum_{k=p+1}^{n} s_k - (n-p). \qquad (18.3)$$

The total number of additions, subtractions, and multiplications to compute (18.2) using the $\beta_j$ equals

$$M = 2p - 1. \qquad (18.4)$$
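The two ways of evaluating F and their operation counts can be compared on a toy case. The sketch below is only an illustration: the encoding of (18.1) as a list of `(coefficient, index)` pairs, the 0-based indices, and all function names are assumptions of this example. The routine `recover_beta` shows one straightforward, and not necessarily the cheapest, way to obtain the $\beta_j$, namely carrying a coefficient vector over the input variables through every step of (18.1).

```python
def run_recurrence(inputs, steps):
    """Execute (18.1): each entry of steps lists (coefficient, index) pairs for one new u."""
    u = list(inputs)
    for terms in steps:
        u.append(sum(a * u[i] for a, i in terms))
    return u[-1]


def recover_beta(p, steps):
    """Carry coefficient vectors over the p inputs through (18.1), gathering like terms."""
    reps = [[1 if j == i else 0 for j in range(p)] for i in range(p)]
    for terms in steps:
        reps.append([sum(a * reps[i][j] for a, i in terms) for j in range(p)])
    return reps[-1]


def cost_recurrence(steps):
    """Operation count (18.3): s_k multiplications and s_k - 1 additions per step."""
    return sum(2 * len(terms) - 1 for terms in steps)


def cost_direct(p):
    """Operation count (18.4): p multiplications and p - 1 additions."""
    return 2 * p - 1


# Hypothetical example with p = 2 (indices are 0-based):
# u_2 = 2*u_0 + 3*u_1, then u_3 = 1*u_2 + 4*u_0.
steps = [[(2, 0), (3, 1)], [(1, 2), (4, 0)]]
beta = recover_beta(2, steps)
direct = sum(b * x for b, x in zip(beta, [5, 7]))
```

Here `run_recurrence([5, 7], steps)` and the direct evaluation with the recovered coefficients give the same value, while `cost_recurrence(steps)` and `cost_direct(2)` reproduce the counts (18.3) and (18.4) for this small case.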
Clearly, for large n we have $N \gg M$. If (18.1) is to be executed many times for different input data $u_1, \ldots, u_p$ and the same $\alpha_{k,k_i}$, then it may be less costly to precompute the numbers $\beta_j$ and then evaluate (18.2) directly as many times as necessary. Again there is the problem of efficient evaluation of $\beta_j$.

Seemingly this problem can be readily solved. Indeed, consider (18.1). We see that $u_{p+1}$ is already represented as a linear combination of $u_1, \ldots, u_p$. Suppose we have analogous representations for all $u_i$ with $i < k$. Substituting them into (18.1) and gathering like terms we obtain the required representation (18.2) for $u_k$. Provided $s_k$ are bounded, finding the explicit representation (18.2) would require for large n about

$$K \approx 2p \sum_{k=p+1}^{n} s_k \qquad (18.5)$$

additions, subtractions, and multiplications. This is about p times greater than the number of operations involved in a single computation (18.1). Looking at (18.3)-(18.5) we can conclude that if (18.1) is to be executed more than p times for the same $\alpha_{k,k_i}$, then it is advantageous to precompute $\beta_j$ and then evaluate the linear function (18.2) directly.

It turns out, however, that this easily obtained result is far from the best. The $\beta_j$ can actually be computed at a much smaller computational expense.

The recurrent relations (18.1) can be regarded as a system of linear algebraic equations

$$\sum_{i=1}^{s_k} \alpha_{k,k_i} u_{k_i} - u_k = 0, \qquad k = p+1, \ldots, n, \qquad (18.6)$$

with respect to variables $u_{p+1}, \ldots, u_n$, the variables $u_1, \ldots, u_p$ being free. Actually, we only have to find $u_n$. Note that the matrix of the system (18.6) is the variation matrix of the algorithm (18.1). It has the form
-1
i f
j=i+p,
a . i f j i s among l i + p ) 0
Now
write
+