Lecture Notes in Computer Science Edited by G. Goos and J. Hartmanis
39 Data Base Systems Proceedings, 5th informatik Symposium, IBM Germany, Bad Homburg v.d.H., September 24-26, 1975
Edited by H. Hasselmeier and W. G. Spruth
Springer-Verlag Berlin • Heidelberg • New York 1976
Editorial Board: P. Brinch Hansen • D. Gries • C. Moler • G. Seegmüller • J. Stoer • N. Wirth
Editors: Helmut Hasselmeier, Dr.-Ing. Wilhelm G. Spruth, IBM Deutschland, EF Grundlagenentwicklung, Schönaicher Straße 220, 703 Böblingen/BRD
Library of Congress Cataloging in Publication Data
Informatik Symposium, 5th, Bad Homburg vor der Höhe, 1975. Data base systems. (Lecture notes in computer science; 39) English or German. Sponsored by IBM Germany and the IBM World Trade Corporation. Bibliography: p. Includes index. 1. Data base management--Congresses. I. Hasselmeier, H. II. Spruth, W. G. III. IBM Deutschland. IV. IBM World Trade Corporation. V. Title. VI. Series. QA76.9.D3I52 1975 001.6'442 75-46?0
AMS Subject Classifications (1970): 00A10, 68-02, 68-03, 68A05, 68A10, 68A20, 68A50
CR Subject Classifications (1974): 4.30, 4.33, 4.34, 4.0, 4.22, 4.6
ISBN 3-540-07612-3 Springer-Verlag Berlin • Heidelberg • New York
ISBN 0-387-07612-3 Springer-Verlag New York • Heidelberg • Berlin
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin • Heidelberg 1976. Printed in Germany. Printing and binding: Offsetdruckerei Julius Beltz, Hemsbach/Bergstr.
Contents

H. Remus: Überlegungen zur Entwicklung von Datenbanksystemen (Considerations on the Development of Data Base Systems)
G. Richter: On the Relationship between Information and Data
A. Blaser/H. Schmutz: Data Base Research: A Survey
C. Schünemann: Grundlegendes zur Speicherhierarchie (Fundamentals of the Storage Hierarchy)
M.M. Astrahan, D.D. Chamberlin, W.F. King, I.L. Traiger: System R - A Relational Data Base Management System
P.E. Mantey/E.D. Carlson: Geographic Base Files: Applications in the Integration and Extraction of Data from Diverse Sources
P. Lockemann: Data Base User Languages for the Non-Programmer
U. Schauer: Ein System zur interaktiven Bearbeitung umfangreicher Messdaten (A System for the Interactive Processing of Large Volumes of Measurement Data)
O. Saal: Datenbankorganisation bei der Hoechst Aktiengesellschaft (Data Base Organization at Hoechst AG)
E. Edelhoff: Nutzung von Datenbanken im nicht-wissenschaftlichen Bereich einer Hochschule (Use of Data Bases in the Non-Scientific Sector of a University)
R. Heitmüller: Einsatz eines Datenbanksystems beim Hessischen Landeskriminalamt (Use of a Data Base System at the Hessian State Criminal Police Office)
I.A. Clark: Relational Data Dictionary Implementation
H.L. Hill: Data Base System Evaluation
H. Wedekind: Datensicherheit in Datenbanksystemen (Data Security in Data Base Systems)
R. Bayer: On the Integrity of Data Bases and Resource Locking
T.B. Steel: Data Base Standardization - A Status Report
PREFACE
The papers in these Proceedings were presented at the 5th Informatik-Symposium which was held in Bad Homburg, Germany, from September 24 - 26, 1975. The Symposium was organized by the Scientific Relations Department of IBM Germany and sponsored by IBM Germany and the IBM World Trade Corporation.
The aim of the Informatik-Symposium is to strengthen and improve the communication between universities and industry, by covering a subject in the field of computer science, both from a university and from an industry point of view.
During the last 5-10 years, Data Base Systems have developed from a highly speculative "Management Information System (MIS)" approach to a practical production tool. In the late 50's and early 60's, the application program was viewed as the nucleus of an application, with multiple data sets as accessories to the application program, and multiple, more or less unrelated application programs serving the needs of a larger enterprise or organization. The modern approach views the data base as the nucleus of a data processing operation, surrounded by multiple application programs operating on its data.
This switch has significantly increased the need for features and characteristics that permit quick adaptation to an ever-changing set of external requirements. In the old approach, external changes could usually be confined to one or a few application programs and their associated data sets. Because of the tight coupling between application programs and their data in a Data Base System, external changes are much more pervasive than they used to be. As a consequence, practical Data Base System implementations require a degree of universality and generality unknown in previous data processing installations.
In organizing this Symposium, we structured the subject matter into four topics. The first topic, data structures, covers the logical view the user has of internally stored data. This topic is closely related to the subject of data base languages. In treating it, we specifically tried to avoid repeating the popular argument over the pros and cons of the various data representation models, e.g. the hierarchical, network, and relational models.
The second topic deals with components and technology. Today the magnetic disk is the main technology for storing large amounts of data. Its peculiarities largely shape the structure of today's data base systems. A major change in data base structures can be expected if and when we succeed in replacing magnetic disk storage with another, more amenable storage technology.
System aspects is the third topic. It includes problems of data security and data integrity. The evolution of data base systems has generated numerous ethical, social and moral questions. It is the responsibility of the data processing community to ensure technically acceptable solutions to these issues.
User aspects is the fourth topic of the Symposium. Data Base Systems require a number of tools for their installation, maintenance, and evaluation. Refinement and enhancement of these tools may be one of the major prerequisites for the further development of Data Base Systems.
The editors would like to express their thanks to everybody who contributed to the Symposium by preparing a talk, providing advice on its content and organization, or assisting in its administration.
Boeblingen, October 24, 1975
H. Hasselmeier
W.G. Spruth
Considerations on the Development of Data Base Systems
(Überlegungen zur Entwicklung von Datenbanksystemen)

Horst Remus,
IBM Palo Alto, California
Abstract
In the development toward integrated data processing, two steps are particularly noteworthy:
- The data base as the central element, with the application programs essentially governing the traffic with the data base (query or update).
- The teleprocessing network, which permits simultaneous access to a program or a data base from several user stations.
Placing the data base at the center of the data processing system requires particular thought about its organization, in contrast to a file that serves as the access file of one particular program (opened and closed by that one program). A further step is the insertion of generalized data base systems between user program and data, based on the idea of data independence. Other considerations concern response time ("performance"), data protection, and data backup ("integrity" and "recovery"). To the user, the system presents itself in two parts:
- the data model, and
- the language with which these data are manipulated ("user interface").
Problems still to be solved point toward data bases with simultaneous access from several systems, and toward data bases distributed over different nodes of a network.
1. EVOLUTION TOWARD THE DATA BASE
We consider sets whose elements are data or information composed of alphanumeric characters. The following operations apply to such sets:
a) Query, i.e. extracting particular pieces of information from the set as a whole.
b) Report generation, i.e. the (usually summary) condensation of the information set, or of parts of it, according to attributes that are not necessarily given automatically by the structure of the set.
c) Update of the information set, i.e. adding, deleting, or changing parts of it. (A special form of update is the format change, i.e. adding or omitting information relative to every existing item.)

Historically, the structure or organization of information sets has evolved as follows (Figure 1 attempts a schematic summary):

The first step in collecting information is the list, whose simplest form is the running (unordered) list. Its original data carriers were media on which legible writing was possible. Queries were made manually: the list is searched for the entry in question, normally starting at the beginning of the list. Report generation is impossible in most cases, because individual queries are very time-consuming. Updating is done manually by appending a new entry at the end or by striking out entries that have become superfluous. A change in the list format (additional information per entry) normally causes no difficulties, since the additional information is only available for the newly added entries anyway.

The next step is the ordered list, on the same media. An ordered list is produced from a running list by sorting on an ordering key. A running list can also be ordered automatically, e.g. in chronological lists such as parish registers. Querying is much simpler, which in turn makes report generation easier. Updating, however, raises problems with inserting entries: any amount of space reserved for insertions eventually runs out. This either destroys the order or forces the creation of a new list. A partial way out is supplementary lists, referenced from the base list (instead of entering the complete information there). Such methods quickly become unwieldy, however; chess opening theory books, for example, are reissued again and again.

The next step is to break the list up into individual entries: the card file. It places certain special demands on the media. The difficulties of the ordered list with respect to adding entries disappear. The invention of the punched card and the associated electromechanical handling of information made it possible to automate individual manual processing steps. Semi-automatic individual queries, however, are normally too time-consuming. Report generation can be largely automatic, but the punched-card file must be specially prepared for the program, i.e. for the tabulating machine's plugboard wiring (sorting, merging, and other special operations). Updating is semi-automatic. Format changes become problematic and usually lead to the creation of a new card file.

The use of other media such as disk or tape makes fully automatic processing possible and leads to the file. Like the punched-card file, a file is normally organized around one particular application. The programmer "opens" (OPEN) and "closes" (CLOSE) the file depending on whether the associated application is running. When the application is not running, the file may even be physically removed from the system; in any case it is normally not accessible to other applications. Query and report generation are likewise possible only for particular application programs. Processing several applications concurrently from one and the same terminal, or one or more applications from different terminals, becomes problematic. Update and format change require the automatic creation of a new file.

A multitude of applications and users sharing one and the same set of data leads to the data base. Its special requirements are explained in more detail below.
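The contrast between the running list and the ordered list can be made concrete in a few lines of Python. This is a minimal sketch, not taken from the paper: the class names and the fixed `capacity` (standing in for the space reserved on a written medium) are illustrative assumptions. The running list accepts every addition at the end but must be searched from the beginning; the ordered list supports binary search on its ordering key, but every insertion shifts later entries, and once the reserved space is used up the list has to be recreated.

```python
import bisect


class RunningList:
    """Unordered ("running") list: entries are appended, queries scan from the start."""

    def __init__(self):
        self.entries = []

    def add(self, key, value):
        self.entries.append((key, value))  # a new entry always goes at the end

    def query(self, key):
        for k, v in self.entries:  # sequential search, starting at the beginning
            if k == key:
                return v
        return None


class OrderedList:
    """List kept sorted on an ordering key, with a fixed amount of room."""

    def __init__(self, capacity):
        self.keys = []
        self.values = []
        self.capacity = capacity  # hypothetical: the space available on the medium

    def add(self, key, value):
        if len(self.keys) >= self.capacity:
            # The reserved space is used up: the order cannot be kept
            # without rewriting the whole list.
            raise OverflowError("no room left; the list must be recreated")
        i = bisect.bisect_left(self.keys, key)  # position that keeps the order
        self.keys.insert(i, key)  # every later entry has to move down
        self.values.insert(i, value)

    def query(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search on the ordering key
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None
```

For instance, an `OrderedList(capacity=2)` that already holds two entries raises `OverflowError` on the third `add`, the programmatic counterpart of having to write a new list.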
2. DATA BASES AND DATA BASE SYSTEMS

Implicit in the definition of a data base is the concept of minimal redundancy and of a structure that the user can understand, the data model. A data base is normally accessed simultaneously by a number of users with different kinds of applications. This requires continuous supervision of the data base by a data base administrator. Besides preserving the integrity of the data base, these system programmers aim to optimize performance factors such as response time and storage. They are therefore interested in the physical organization of the data base, including how indexes and pointers work.

The application programmers or "end users" are interested in the logical data model and in ways of retrieving and updating data base elements. To understand what requirements both the data base administrator and the application programmer place on data base systems, we must examine data base applications more closely. First, recall the difference between batch processing and real-time processing (Figure 2). In batch processing, work is processed in groups, either at fixed times or after a certain amount of work has accumulated with respect to a given attribute. In real-time processing, each step is carried out immediately against the entire data set.
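The difference between batch and real-time processing can be sketched as follows. The inventory example and the `batch_size` trigger are hypothetical; the text also allows batches to be run at fixed times, which would simply mean calling `run()` from a scheduler instead. In the batch version the shared data lag behind the submitted transactions until the group is processed, while in the real-time version every transaction is reflected immediately.

```python
class Inventory:
    """Shared data: stock level per item."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def apply(self, item, delta):
        self.stock[item] = self.stock.get(item, 0) + delta


class BatchProcessor:
    """Collects transactions and applies them as a group once enough have accumulated."""

    def __init__(self, inventory, batch_size):
        self.inventory = inventory
        self.batch_size = batch_size  # hypothetical trigger for running the batch
        self.pending = []

    def submit(self, item, delta):
        self.pending.append((item, delta))  # only collected, not yet applied
        if len(self.pending) >= self.batch_size:
            self.run()

    def run(self):
        for item, delta in self.pending:  # process the accumulated group in one pass
            self.inventory.apply(item, delta)
        self.pending.clear()


class RealTimeProcessor:
    """Applies every transaction to the shared data as soon as it arrives."""

    def __init__(self, inventory):
        self.inventory = inventory

    def submit(self, item, delta):
        self.inventory.apply(item, delta)
```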
In addition, two parameters of an application are particularly important:
- the predictability of accesses, and
- the frequency of similar accesses (repetitiveness).
Many combinations of the two occur. For example, one does not know in advance which part of the inventory a storekeeper will ask about; what he wants to know about it, however, is known precisely.
In general, data base operations can be divided into the following kinds (Figure 3):
1. Efficient execution of repetitive work (traditional batch processing).
2. Predefined queries ("How large is the stock of 2-inch nails?").
3. Random, poorly structured, and unforeseen queries ("How many engineers in Hamburg earn more than DM 6000 per month?").
A system that handles nos. 1 and 2 is called an "operational" or "supervisory" system; a system that handles no. 3 is called an "information" or "executive" system. Examples of the two groups are:
- Operational systems: a bank with terminals at every teller window, airline reservation, air traffic control.
- Information systems: a library with retrieval of information by keyword, market information for management, a data base of personnel data.
One and the same data base should normally support both kinds of system.
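The distinction between predefined and unforeseen queries can be illustrated with the two example questions above. In this Python sketch, which is an illustration rather than anything from the source, the predefined query is a fixed function that only takes a parameter (the item), while the unforeseen query accepts an arbitrary condition at the moment it is asked. The `Employee` record, its field names, and any sample data are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    occupation: str
    city: str
    monthly_income_dm: int


def stock_level(inventory, item):
    """Predefined query: the question is fixed in advance; only the item varies."""
    return inventory.get(item, 0)


def count_matching(records, predicate):
    """Unforeseen query: any condition, formulated at the moment it is asked."""
    return sum(1 for record in records if predicate(record))
```

With a hypothetical staff list, `count_matching(staff, lambda e: e.occupation == "engineer" and e.city == "Hamburg" and e.monthly_income_dm > 6000)` answers the second question without any program written for it in advance.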
3. SPECIAL REQUIREMENTS FOR DATA BASES
The requirement of minimal redundancy has already been mentioned. Most tape libraries contain a wealth of redundant data. Handling redundancy in an uncontrolled way can (as in many office filing systems) make frequent rearrangement or reorganization necessary. Another issue, of course, is the consumption of storage space and the associated cost. Moreover, multiple copies of the same data may yield different information, because they may be at different stages of update. A data base organization should therefore aim to avoid redundancy wherever this appears economically sound. For reasons of data integrity and possible recovery of erroneous data, however, some redundancy may be necessary.
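The risk that redundant copies drift apart can be shown in a few lines. This Python sketch uses a hypothetical customer record; it contrasts two independent copies, only one of which is updated, with a single stored record that both applications reference.

```python
def update_address(record, new_address):
    record["address"] = new_address


# Redundant storage: billing and shipping each keep their own copy.
billing = {"customer": 4711, "address": "Hauptstr. 1, Bonn"}
shipping = dict(billing)  # independent copy of the same facts
update_address(billing, "Ringstr. 5, Koeln")  # only one copy is updated
copies_agree = billing["address"] == shipping["address"]  # now False

# Non-redundant storage: both applications refer to one stored record.
customer = {"customer": 4711, "address": "Hauptstr. 1, Bonn"}
billing_view = shipping_view = customer  # two references, one record
update_address(billing_view, "Ringstr. 5, Koeln")
views_agree = billing_view["address"] == shipping_view["address"]  # True
```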
Another requirement is flexibility in representing data relationships. Different programmers use different logical files, all of which are nevertheless based on the same data base.

The performance aspects of a data base system are very important. For the user of a terminal, the decisive performance factors are the response time and the number of transactions per unit of time that a system can handle (traffic volume, throughput). Systems with a high traffic volume include, for example, airline reservation systems and large banks, and there are already applications today that require 10 or more transactions per second. For such applications, a dialog with a response time of 2 seconds is expected. There are also systems with a lower traffic volume, in which the number of transactions per unit of time is of minor importance. For traditional batch processing applications, response time is irrelevant; their design criterion is the efficiency of the processing unit. To achieve the required increase in performance, further measures must be considered, such as splitting (decentralizing) the data base into several data bases (bank branches, for example, may be better served by individual data bases) or adding further computers.

In many cases data must be protected against access by unauthorized persons ("security and privacy"). This requirement can be translated into the requirement that the system check the authorization of a user and of his actions (e.g. by a password). The controls should be designed so that skilled programmers cannot easily circumvent them. Actions should also be monitored and recorded, so that misuse can be detected after the fact. Likewise, it must be possible to check the data base itself continuously. It is also necessary that data and their interrelationships are not destroyed by machine malfunctions or other "accidents" (data integrity). Every system must therefore provide facilities for data integrity tests and for recovery of the data base. Naturally, such controls affect the performance of the data base system.

Furthermore, there is the requirement to write application programs independently of the data organization and access technique (data independence). IMS [3], for example, offers a certain degree of data independence: new data segments can be added at certain points in the hierarchy without changing programs, and the length of a record or the division of the data base into data groups can also be changed.
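A minimal Python sketch of data independence follows. It does not show how IMS achieves it: the `read_records` access layer and the two stored layouts are hypothetical. The program that refers to fields by name keeps working after a field is added to the stored format, while the program that relies on the physical position of the salary field breaks.

```python
# Two versions of the stored record format; only the access layer knows them.
LAYOUT_V1 = ("emp_no", "name", "salary")
LAYOUT_V2 = ("emp_no", "name", "dept", "salary")  # a new field was inserted


def read_records(raw_records, layout):
    """Hypothetical access layer: map stored records onto named fields."""
    return [dict(zip(layout, raw)) for raw in raw_records]


def payroll_total(records):
    """Data-independent program: refers to fields by name only."""
    return sum(record["salary"] for record in records)


def payroll_total_by_position(raw_records):
    """Layout-dependent program: assumes salary is the third stored field."""
    return sum(raw[2] for raw in raw_records)


stored_v1 = [(1, "Abel", 3000), (2, "Bach", 4000)]
stored_v2 = [(1, "Abel", "D10", 3000), (2, "Bach", "D20", 4000)]
```

Here `payroll_total(read_records(stored_v2, LAYOUT_V2))` still returns 7000, whereas `payroll_total_by_position(stored_v2)` picks up the department code instead of the salary and fails.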
4. DATA BASE STRUCTURES
The function of a data base is to store data and the relationships between data. The logical description of a data base is called the data base schema; a schema thus defines the data model for the user. A subschema is the view of the data base for one particular application program. Figure 4 shows how the various parts of a data base system work together, and in particular the meaning of the terms schema and subschema. Figure 5 shows the structure of a job-placement data base. The relationships between the individual files are clearly visible: the employer file supplies the details for the field "employer number", and the talent file the details for the field "required talent" in the job list. This reveals one main form of data base structure, the hierarchical organization. The files "employer number" and "talent group" are subdivisions of the file "job list" (parent-child relationship).
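The parent-child relationship in the job-placement example can be sketched as a small hierarchical record. In this Python illustration, the `Segment` class, the `walk` helper, and the sample values are assumptions; only the segment names follow the text. `walk` visits each parent before its children, i.e. top-down and left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One node of a hierarchical record: a segment with its child segments."""
    kind: str
    data: dict
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)  # attach the child below this parent
        return child


def walk(segment, depth=0):
    """Yield (level, segment) top-down, each parent before its children."""
    yield depth, segment
    for child in segment.children:
        yield from walk(child, depth + 1)
```

A job-list segment with an employer-number child and a talent-group child is then traversed as level 0 followed by the two level-1 children.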
The ability to express relationships between individual data fields in the data base structure has led to three main forms of data base organization:

1. The hierarchical data base structure (Figure 6). Here the highest level has one and only one node, the "root of the tree". Every node on any other level is assigned exactly one node on the next higher level. Correspondingly, Knuth [4] defines a tree or hierarchical structure as "a finite set T of one or more nodes with (a) one specially designated node, the root of the tree, and (b) m ≥ 0 remaining disjoint (unconnected) subsets T1, ..., Tm, each of which is a tree. These subsets are called subtrees." IMS [3] uses the hierarchical data base structure.

2. If a node is to be traced back to more than one node on a higher level, the structure can no longer be described by a tree. The resulting structure is called a network. A network structure thus arises when there are multiple relationships between the elements of different levels; naturally, a hierarchical or tree structure is only a special case of it. One example is a family tree. Because the word "network" is used in so many ways in the data processing industry, the term "plex structures" is often used in English-speaking countries. Figure 7 shows some simple examples of network structures; more complex ones exist. By introducing multiple indexes and redundancy, network structures can be reduced to tree structures. The work of the CODASYL group [1] leads to a network structure.

3. The requirement to manage without redundancy, and to be able to represent the relationships between data base elements as an algebraic calculus, leads to the "relational data base" according to Codd (see the detailed description in [2]). The basic operations for forming new sets of records are union and intersection. From a mathematical standpoint the language of the relational data base appears very elegant, but for performance reasons implementations have so far gained little acceptance. The advantages of files whose records all lie on the same level center on the clarity and simplicity of the data model and of the language with which relationships can be manipulated. By using multiple indexes and redundancy, relational representations can be reduced to the hierarchical or network forms above.

In connection with data base structures, one often speaks of lists (chains) and rings. These structures, however, refer to the way in which records within a file are linked to one another. They therefore describe techniques for realizing logical structures by physical means, whereas structures 1-3 above are particular forms of logical structure. A decisive element of both the list and the ring structure is the pointer, which points from one record to the next. In ring structures, two-way pointers are normally used.
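The ring structure with two-way pointers can be sketched in Python. Here the pointers are object references rather than disk addresses, and the designated `owner` record that anchors the ring is an assumption borrowed from common ring implementations, not something the text specifies. Following `next` pointers from the owner visits the records in one direction and eventually returns to the owner; the `prev` pointers allow traversal in the other direction and let a record be unlinked without searching for its predecessor.

```python
class Record:
    """A record that carries its own two-way pointers."""

    def __init__(self, key):
        self.key = key
        self.next = self  # a single record forms a ring of one
        self.prev = self


class Ring:
    """Records chained in a circle; the owner record is the entry point."""

    def __init__(self, owner_key):
        self.owner = Record(owner_key)

    def insert_after(self, node, key):
        new = Record(key)
        new.prev, new.next = node, node.next
        node.next.prev = new  # old successor now points back to the new record
        node.next = new
        return new

    def delete(self, node):
        if node is self.owner:
            raise ValueError("the owner record anchors the ring")
        node.prev.next = node.next  # bypass the record in both directions
        node.next.prev = node.prev

    def forward(self):
        node = self.owner.next
        while node is not self.owner:  # the ring closes back on the owner
            yield node.key
            node = node.next

    def backward(self):
        node = self.owner.prev
        while node is not self.owner:
            yield node.key
            node = node.prev
```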
5. DATA DESCRIPTION LANGUAGES
A language that describes the logical data structure should meet the following requirements:
- The division into data sets such as files, records, segments, and data elements should be clearly describable.
- Each type of such a unit should have a specific name (e.g. two different record types should have different names).
- The subdivision of a given data set into particular subsets should be clearly recognizable (which data elements are contained in a given data grouping, etc.).
- The sequence must be specified, and repetitions should be indicated.
- The language should express which data elements are used as indexes.
- Relationships between record types, segment types, etc., which form the basis of the data structure, must be specified and clearly named.
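To make the requirements above concrete, here is a hypothetical mini description language written as Python data classes. It is not COBOL, DL/I, or CODASYL DDL syntax; every class and field name is an assumption. Each record type and relationship has its own name, the order of a record's elements fixes their sequence, `repeats` marks repetition, `is_index` marks index elements, and relationships may only refer to record types that have been defined.

```python
from dataclasses import dataclass, field


@dataclass
class ElementType:
    name: str
    repeats: int = 1        # a value above 1 marks a repeating element
    is_index: bool = False  # True if the element is used as an index


@dataclass
class RecordType:
    name: str       # every record type has its own name
    elements: list  # list order = the specified sequence of elements


@dataclass
class Relationship:
    name: str  # relationships are named as well
    parent: str
    child: str


@dataclass
class Schema:
    records: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_record(self, record):
        if record.name in self.records:
            raise ValueError(f"record type {record.name!r} is already defined")
        self.records[record.name] = record

    def relate(self, name, parent, child):
        for record_name in (parent, child):
            if record_name not in self.records:
                raise ValueError(f"unknown record type {record_name!r}")
        self.relationships.append(Relationship(name, parent, child))

    def index_elements(self, record_name):
        return [e.name for e in self.records[record_name].elements if e.is_index]
```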
According to J. Martin [5], data description languages fall into different levels depending on the user's point of view (Figure 8):
1. The language for the application programmer, which describes the data base subschema (e.g. the DATA DIVISION in COBOL, or the PSBs in DL/I; PSB = program specification block).
2. The general description of the data base schema, used by the data base administrator (e.g. the DL/I logical data base description). The COBOL DATA DIVISION, for example, cannot describe the relationships in a schema and therefore cannot be used here.
3. The physical data description (e.g. the DL/I physical data base description). Unlike the logical data description, which is completely independent of hardware and storage considerations, the physical description is of great interest for performance optimization.
Apart from DL/I, CODASYL's data description language (DDL) is probably the best-known data base description language.
6. HARDWARE CONSIDERATIONS
Data bases of more than 4 billion bytes are known. That corresponds to 40-50 IBM 3330 disk units. It is conceivable to back up a disk unit with a larger storage unit of longer access time, similar to the virtual storage concept between main memory and disk. The IBM 3850, announced about a year ago, for example, provides 10^3 to 10^4 times more storage space with an access time that is longer by a factor of 10^2. The user sees the system as a single disk system, but for performance considerations the hardware parameters are of the greatest importance. For example, there are strong dependencies between response time, transaction rate, and the size of direct storage, i.e. the storage available at the lowest level of the storage hierarchy. Response time grows with the transaction rate and falls as more direct storage becomes available (less paging). The transaction rate can be increased with more direct storage.
Other hardware parameters are, of course, the speed of the computer and the structure and components of the communications network.
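The qualitative dependencies above can be illustrated with a standard queueing approximation. The text gives no formula; this Python sketch assumes a single-server M/M/1 queue, whose mean response time is R = S / (1 - ρ) with utilization ρ = λ·S, and a hypothetical service time made up of processor time, I/O time, and a fixed cost per page fault. Raising the transaction rate λ increases the response time, and fewer page faults (more direct storage) shorten the service time S and with it the response time.

```python
def service_time(cpu_s, io_s, page_faults, fault_s):
    """Hypothetical service time per transaction; each page fault adds fault_s."""
    return cpu_s + io_s + page_faults * fault_s


def response_time(service_s, arrival_rate):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho), rho = rate * S."""
    rho = arrival_rate * service_s  # utilization of the single server
    if rho >= 1:
        raise ValueError("load exceeds capacity; the queue grows without bound")
    return service_s / (1 - rho)
```

With these sample values, a service time of 0.06 s (two page faults) at 10 transactions per second gives ρ = 0.6 and a mean response time of about 0.15 s; without the page faults the service time drops to 0.04 s and the response time to about 0.067 s.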
7. OUTLOOK
The additional requirements for extending existing data base systems or developing future ones center on the following aspects:
a) Higher performance. Growth of the data base and of the number of data base users demands higher transaction rates and shorter response times. The answer lies in more suitable data base organizations and in minimizing administrative functions. Certain vendor tools allow the data base to be "tuned", and there are also opportunities for improvement under the user's control. Some improvements can be achieved through better use of hardware (multiprocessing or similar techniques).
b) Continuous operation. The requirement of 24-hour access to the data base has certain consequences for the implementation. First, after an interruption caused by a malfunction, the data base must be recovered quickly and operations resumed at short notice. This requires keeping a quickly accessible "journal". In addition, work should continue on the best techniques for error prevention, detection, and correction. A further requirement is to reorganize the data base while routine operation continues. A dictionary [7] can be an essential aid in managing the data bases here.
c) Ease of installation and use. The parameters that lead to an optimal organization of a data base are very complex. System vendors generally help with automatic organization aids or with hints in the documentation. Installability is largely the same as the ability to understand the physical representation of the data base; here, too, a dictionary [7] can be useful. Ease of use depends essentially on the nature of the languages for data manipulation and data description and on the interface to the programming languages. Other functions that simplify use have to do with automatic control of the flow of information; essential here is the handling of control information (control blocks), as done, for example, in the standard network architecture. To simplify later use, data bases and the associated systems must be designed so that they can later be changed or extended.
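The journal mentioned under b) can be sketched as a backup copy plus a log of the changes made since that copy. In this minimal Python sketch (class and method names are hypothetical), every update is written to the journal before the data base is changed; after a failure, recovery restores the last copy and rolls the journal forward. A real system would keep the journal on separate, quickly accessible storage rather than in the same memory as the data.

```python
import copy


class JournaledStore:
    """Data base with a backup copy and a journal of changes made since that copy."""

    def __init__(self, data):
        self.data = dict(data)
        self.backup = copy.deepcopy(self.data)  # last full copy of the data base
        self.journal = []  # (key, new value) pairs recorded since the copy

    def update(self, key, value):
        self.journal.append((key, value))  # record the change in the journal first
        self.data[key] = value  # then apply it to the data base

    def checkpoint(self):
        self.backup = copy.deepcopy(self.data)  # take a new full copy
        self.journal.clear()  # older journal entries are no longer needed

    def crash(self):
        self.data = None  # simulate loss of the data base after a malfunction

    def recover(self):
        self.data = copy.deepcopy(self.backup)  # restore the last copy
        for key, value in self.journal:  # roll forward: redo journaled changes
            self.data[key] = value
```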
References
[1] CODASYL, "1974 Status Report on Data Base Activities".
[2] Date, C.J., "An Introduction to Database Systems", Addison-Wesley, Reading, Mass., 1975.
[3] Information Management System, "System/Application Design Guide", IBM Form No. SH20-9025.
[4] Knuth, D.E., "The Art of Computer Programming, Vol. 1: Fundamental Algorithms", Addison-Wesley, Reading, Mass., 1968.
[5] Martin, J., "Computer Data Base Organization", Prentice-Hall, Englewood Cliffs, N.J., 1975.
[6] Senko, M.E., Altman, E.B., Astrahan, M.M., and Fehder, P.L., "Data Structures and Accessing in Data-Base Systems", IBM Systems Journal 12, 30-93 (1973).
[7] Uhrowczik, P.P., "Data Dictionary/Directories", IBM Systems Journal 12, 332-350 (1973).
Figure 1: Evolution toward the data base. The figure compares the running list, ordered list, card file, punched-card file, file, and data base with respect to data carrier and data representation, query, report generation, update, and format change. The data carriers range from media that allow human writing and reading, through media separable per entry and punched cards, to tape and disk.
Figure 2: Batch processing and real-time processing.
Figure 3: Characteristics of data base systems (after James Martin). Each entry gives operational systems / information systems.
- Access: planned or preprogrammed / spontaneous, not preprogrammed
- Typical examples: bank counter, flight reservation / sales analysis, personnel information
- Typical users: bank tellers, foremen, lower management / information staff, middle management, assistants of upper management
- Normal purpose: support of routine operations / support of planning and urgent information needs
- Response time: seconds / minutes or hours
- Implementer of the application: programmer / information specialist
- Implementation time: weeks or months / hours
- Typical languages: COBOL, FORTRAN, PL/I / IQF, GIS
Figure 4: How a data base system works (data base system, system buffer, work area of the program, application program A).
Figure 5: Breakdown of a data base for job placement (name, address, availability, experience, working climate, training, salary, social benefits).
Figure 6: Hierarchical data base structure (root, levels 2 to 4).
Figure 7: Data base network structures.
Figure 8: Levels of data description: the application programmer's subschema; the schema as the global or general data base description (data base administrator); the physical description and storage mapping, executed automatically by the data base system.
On the Relationship between Information and Data

Gernot Richter
Gesellschaft fuer Mathematik und Datenverarbeitung (GMD), St. Augustin
Summary

This paper analyzes the relationship between information and data against the background of a general model of information systems that explicitly distinguishes between information and its representation. Using a conceptual model of information (IMC), designed for talking about information structures, it shows the role of type declarations. It outlines definitions of the concepts of format and data. In the light of these considerations it discusses some topics concerning data base management systems.

1. A model view of information systems

A very useful view for the conceptual consideration of information systems is one that identifies functional units (Funktionseinheiten) in the sense of [DIN]. Such units are characterized only by their function within the system, not by their technical realization. This kind of functional consideration was introduced and applied years ago in [ABN], following some ideas of C. A. Petri. There the term office (Instanz) was introduced for functional units that influence each other by communicating. This created the need for a complementary kind of functional unit that allows messages to be exchanged between offices. For this kind of unit [ABN] gave the term channel (Kanal).

The concept of channel is directly related to the concept of interface as used in [ANSI]: an interface is a system of rules governing communication via a considered channel. A channel, too, is characterized only by its function within the system. It serves as a facility where the communicating offices can post and take messages. This yields a model view of information systems that decomposes them into two distinct classes of functional units:
- offices, characterized by the processes they can perform;
- channels, characterized by the states they can assume.
Applied to data base management systems, this model view has gained some publicity since the recent publication of [ANSI]. It is under discussion both in scientific research (IFIP/TC-2 and IAG) and in standardization (ISO/TC 97/SC 5). With this model in mind, we examine the interrelation between information and data. To do so, we take a close look at the configuration of two offices communicating via one channel, which seems to be an adequate minimum configuration for this purpose.
To illustrate this configuration, we use the graphic notation of [PET], in which offices are depicted by boxes and channels by circles (the cited paper considers only elementary offices and channels). This yields fig. 1.

In the adopted model, the two offices communicate by exchanging messages via the linking channel. The arrows in fig. 1 only indicate the possibility of access; they are not functional units.

Fig. 1 depicts a further aspect: the exchange of messages makes sense only if both communicating offices share a common background of understanding that allows them to interpret the messages found in the channel. Assuming such a "universe of discourse" is a very useful auxiliary model for understanding communication, also between technical functional units.
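The two-office configuration of fig. 1 can be sketched in Python as follows. This minimal illustration is not part of the original paper. The classes `Channel` and `Office`, the first-in, first-out order of messages, and the dict standing in for the common background of understanding are all assumptions made for the sketch. The channel only holds state; the office performs the processes of sending and of interpreting what it takes.

```python
from collections import deque


class Channel:
    """Passive functional unit: characterized only by the states it can
    assume, here the sequence of messages currently posted to it."""

    def __init__(self):
        self._messages = deque()

    def post(self, message):
        self._messages.append(message)

    def take(self):
        """Remove and return the oldest message, or None if the channel is empty."""
        return self._messages.popleft() if self._messages else None


class Office:
    """Active functional unit: performs processes and communicates only via
    channels. `background` is a hypothetical stand-in for the common
    background of understanding (message -> meaning)."""

    def __init__(self, name, background):
        self.name = name
        self.background = background

    def send(self, channel, message):
        channel.post(message)

    def receive(self, channel):
        """Take a message and interpret it; None if there is nothing to take
        or the message cannot be interpreted with this office's background."""
        message = channel.take()
        if message is None:
            return None
        return self.background.get(message)


common = {"7": "number seven"}
a, b = Office("A", common), Office("B", common)
link = Channel()
a.send(link, "7")
print(b.receive(link))  # number seven
```

An office without the shared background (an empty dict) takes the same message but cannot interpret it, which mirrors the remark that exchanging messages only makes sense with a common universe of discourse.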
2. Model information and abstraction

So far no distinction has been made between information and data. But words such as "represent" and "interpret" indicate a kind of mapping between two things. This section shows that two such mappings have to be considered. Both have the nature of an abstraction, i.e. the omission of features that are not to be considered, but they start at different points.

One kind of abstraction starts with the so-called initial information (Ausgangsinformation). This is to be understood as the whole of the knowledge or ideas a person has about something (of the real world or anything else). For a certain intended purpose, i.e. in a pragmatic context, the whole information may not be needed, only the "relevant" part of it. The information about a person, e.g., is different for administrative and for medical purposes. Likewise, the information about a technical system needed for teaching purposes differs from what is needed for engineering purposes. So the intended purpose controls the abstraction process. The result of this process has been called model information (Modellinformation). In [STEEL] the abstraction is called "engineering abstraction", which yields the "engineering model"; similar considerations are found in [DUHI]. The term "model information" indicates that we are still on the information level.
In the present context we do not adopt any definition of information. The concept is used in the sense of knowledge or idea (about something), so information is viewed as being of a mental nature. Obviously, depending on the intended purpose, various abstractions can be performed on the same initial information. Whether the model information "exists" or not, whatever that means, is of no interest in this presentation. However, like other authors, we have found it very useful to assume a level of model information.

Because of its mental nature, model information cannot be communicated directly. There must be a representation of it (on a medium) that can be handed to the addressee or stored for later use. Such a representation is what is usually called "data". The distinction between information and its representation is the background for all the ideas developed below.
Now we can turn to the other abstraction, which is of a quite different nature. Consider some messages that have the same meaning by agreement. What does "same meaning" mean here? As already pointed out, any exchange of messages between communicating offices is assumed to aim at exchanging model information. Each message is therefore considered a representation of model information. Rules map the messages onto model information. Such a mapping is usually called "semantics", and the process of mapping a message onto model information is called "interpretation". If several messages are mapped onto the same model information, they all have the "same meaning". This is an abstraction from various representations that ignores their respective representational peculiarities: one model information pertains to several representations.

When an author communicates with an audience, a problem arises that may already have been apparent in the above discussion: the author needs to represent the model information being written about. For this purpose it helps to use a (graphical) language whose interpretation is agreed upon. In such a language the emphasis lies on the model information rather than on one of its possible representations. Such a graphical language is presented below; a representation in it is called canonical.
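A small sketch illustrates the second abstraction, from several representations to one model information. The interpretation table and the function names `same_meaning` and `abstract_from_representation` are hypothetical. The Python values only stand in for model information, which the paper regards as being of a mental nature.

```python
# Hypothetical agreed interpretation: each message (a representation) is mapped
# onto model information. The string values are only stand-ins for it.
INTERPRETATION = {
    "7": "SEVEN",
    "seven": "SEVEN",
    "VII": "SEVEN",
    "8": "EIGHT",
}


def same_meaning(m1, m2, interpretation=INTERPRETATION):
    """Messages have the same meaning iff both are interpretable and the
    agreed interpretation maps them onto the same model information."""
    return (m1 in interpretation and m2 in interpretation
            and interpretation[m1] == interpretation[m2])


def abstract_from_representation(interpretation=INTERPRETATION):
    """Group messages by meaning, ignoring representational peculiarities."""
    classes = {}
    for message, meaning in interpretation.items():
        classes.setdefault(meaning, set()).add(message)
    return classes


print(same_meaning("VII", "seven"))  # True
print(same_meaning("7", "8"))        # False
```

A message missing from the table has no meaning under this agreement, so it has the same meaning as nothing, not even itself.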
3. Outlines of a conceptual model of information

Before dealing with any problems of representation, we have to identify the properties of model information itself. What is an adequate view of model information with respect to applications? This question leads into an area of argument that has been very controversial, at least in the past: the advantages and deficiencies of so-called "data models" (hierarchic, network, relational, ...). For general considerations we can avoid this topic by adopting a view that covers the various "data models". This view has been outlined in [DUHI] and is reflected in a conceptual system called Information Management Concepts (IMC). These concepts were developed as a means for talking about model information, in particular in the context of data base management systems. At the same time, rules were developed for the graphic representation of model information in terms of IMC. This section outlines both the basic concepts of IMC and the related canonical representations, to prepare the treatment of "data" (in the sense of representation) in the next section.
In IMC any portion of model information is called a construct (Gebilde), e.g. the information about a book in a library, a car in an administration, a family, or a process in a factory. Depending on the intended communication, a construct may be viewed either as elementary or as compound. A construct that is considered elementary (in a given situation) is called an atom (Atom); its composition, if any, is of no significance. A compound construct is called an aggregate (Aggregat). Thus a construct is either an atom or an aggregate. A construct that appears as a part of another construct is called a component (Komponente) of it.

The composition of an aggregate is either a collection (Kollektion) or a nomination (Nomination). A collection is a finite (mathematical) set of immediate components, i.e. an unordered one. A nomination is a finite (mathematical) function from a set of names (Name) to its immediate components. The domain of a nomination is thus a set of names, and the names serve as selectors by which the immediate components can be referred to. In a collection they cannot. Names are used as selectors in the same manner in the Vienna Definition Language (cf. e.g. [ZEM]).

To show examples, we first have to introduce the canonical graphic representation of IMC. In IMC a construct is shown either by nested boxes (fig. 2) or by trees (fig. 3). In a tree representation, an aggregate is expressed by a vertex and its immediate components by the subtrees attached to it; atoms are represented by small circles. The names of a nomination are written close to the corresponding components. The two representation techniques may also be combined. For an example of a "relation" construct and of a "set network" construct in IMC representation, see [DKR].

Looking at the representation, we notice that the same construct may appear at various locations within an aggregate (fig. 2 and 3).
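As a reading aid, the IMC construct kinds defined above can be modelled with immutable Python classes. This sketch rests on stated assumptions and is not IMC notation. The class names are choices made here, as are the use of `frozenset` for collections (finite, unordered sets) and of sorted name/component pairs for nominations (finite functions from names). The example data is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Atom:
    """A construct considered elementary; its inner composition is of no
    significance, so `value` is just an opaque stand-in."""
    value: object


@dataclass(frozen=True)
class Collection:
    """Aggregate whose immediate components form a finite, unordered set;
    they are not selected by names."""
    components: frozenset


@dataclass(frozen=True)
class Nomination:
    """Aggregate given by a finite function from names to immediate
    components; the names act as selectors."""
    pairs: tuple  # sorted (name, component) pairs, kept hashable

    @staticmethod
    def of(mapping: Mapping[str, Construct]) -> Nomination:
        # Names are unique, so sorting compares names only.
        return Nomination(tuple(sorted(mapping.items())))

    def domain(self) -> frozenset:
        return frozenset(name for name, _ in self.pairs)

    def __getitem__(self, name: str) -> Construct:
        return dict(self.pairs)[name]


Construct = Union[Atom, Collection, Nomination]

# Hypothetical example: a construct c1 with three named components.
bonn = Atom("Bonn")
home = Nomination.of({"street": Atom("Main St."), "city": bonn})
c1 = Nomination.of({
    "home address": home,
    "place of birth": bonn,
    "branches": Collection(frozenset({bonn, Atom("Berlin")})),
})
print(sorted(c1.domain()))  # ['branches', 'home address', 'place of birth']
```

Because the classes compare by value, the atom `bonn` reached via "place of birth" and via "home address" then "city" is the same construct, which is the situation the concept of spot addresses.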
On the conceptual level we therefore need a concept that distinguishes between different appearances of one construct within a considered embracing construct. For this purpose IMC introduces the concept of spot (Stelle). A spot can be defined as a sequence of pairs (name, construct). In the case of a collection, the empty name is inserted at the name position of the pair. The first pair of a spot-defining sequence always consists of the empty name and the construct in which (= relative to which) the spot is considered. With the reference symbols of fig. 4, the construct c3 in question appears at the spots

(-,c1) (home address,c2) (city,c3)
(-,c1) (place of birth,c3)
(-,c1) (branches,c5) (-,c3)

which are spots in c1. (The lower-case c's stand for the respective constructs.) The same construct also appears at the spot (-,c2) (city,c3) in c2 and at the spot (-,c5) (-,c3) in c5. Another example is c7, which appears in c1 at the following two spots:

(-,c1) (home address,c2) (street,c6) (number,c7)
(-,c1) (date of birth,c4) (year,c7)
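The spot definition can be turned into a small search that enumerates every spot of a construct within an embracing construct. The sketch assumes plain Python data: dicts are nominations, lists are collections, other values are atoms, and sameness is tested with `==`. The example values are hypothetical and only mirror the shape of the example above, in which c3 and c7 appear at several spots. Each spot extends the path to its predecessor by one pair, so the spots of an embracing construct form a tree even where one construct appears at several of them.

```python
EMPTY = "-"  # the empty name, written "-" as in the spots above


def spots(embracing, target):
    """Yield every spot of `target` within `embracing`: a list of
    (name, construct) pairs whose first pair is (EMPTY, embracing).
    Sketch conventions (assumed): dicts are nominations, lists are
    collections, everything else is an atom; sameness is ==."""
    def walk(construct, path):
        if construct == target:
            yield path
        if isinstance(construct, dict):        # nomination: select by name
            for name, component in construct.items():
                yield from walk(component, path + [(name, component)])
        elif isinstance(construct, list):      # collection: empty name
            for component in construct:
                yield from walk(component, path + [(EMPTY, component)])
    yield from walk(embracing, [(EMPTY, embracing)])


# Hypothetical data shaped like the example: c3 = "Bonn", c7 = 27.
c3 = "Bonn"
c2 = {"street": {"name": "Main St.", "number": 27}, "city": c3}
c1 = {
    "home address": c2,
    "place of birth": c3,
    "date of birth": {"day": 3, "month": "May", "year": 27},
    "branches": ["Bonn", "Berlin"],
}
for spot in spots(c1, c3):
    print([name for name, _ in spot])
# ['-', 'home address', 'city']
# ['-', 'place of birth']
# ['-', 'branches', '-']
```

Calling `spots(c1, 27)` yields the two spots of c7, and `spots(c2, c3)` yields the single spot (-,c2) (city,c3) relative to c2.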
The concept of spot turns out to be essential for discussing and understanding some sophisticated aspects of data base management systems, not least those concerning the interrelationship between information (constructs) and data (representations). Figs. 2 and 3 also show that in canonical graphic representations constructs are always depicted at their spots, so a construct appearing at several spots is drawn several times. Since spots by definition always form hierarchic trees, one might be tempted to say that in IMC every construct, in hierarchic, network, relational and other models alike, is hierarchic. But obviously only the spot structure is hierarchic.
So far only individual constructs have been considered. Nothing has been said about types or declarations, and they have not been used tacitly either. We now turn to types of constructs (Gebildetyp). A type in general is a set, but not every set is a type. First of all, the elements of such a set have to be determined. In the present context the elements are constructs; in the world of data base management systems, the terms "occurrence" or "instance" of a type have been adopted instead of "element". But not even every set of constructs is a construct type. A type of constructs has to be declared. More precisely, a type declaration specifies which constructs belong to the specified type. Only representations of constructs can be exchanged via channels, and constructs are "understood" by interpretation. Hence the constructs admitted for exchange should be specified by type declarations.

A language in which type declarations can be made is often called a "data definition language". Unfortunately, this is an example of the sloppy terminology so characteristic of the field of data processing. Not even "type declaration language" would be sufficiently precise, because, as shown below, other types also have to be declared (on the representational level). Strictly speaking, we therefore prefer to say "construct type declaration language" (CTDL).

It would be beyond the scope of this paper to discuss such a language in detail. An example of a construct type declaration is shown in fig. 5, in a graphic analogy to the canonical representation, and an occurrence of that type in fig. 6. In both figures the small "type plate" in the upper right-hand corner of a box provides a place for inserting the type designation (Typenbezeichnung). In an occurrence, the type plate states to which type the construct belongs (cf. fig. 6 and 10).

After this very short outline, we have concepts for talking about model information and a canonical representation technique. The concept of type has been emphasized because of its great importance for the questions of representation discussed in the next section.
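A construct type declaration can be illustrated with a hypothetical mini-language embedded in Python. A declared type is a specification, and the type itself is the set of all constructs that satisfy it. The helpers `atom_type`, `nomination_type`, `collection_type` and `is_occurrence`, as well as the example type `STREET`, are invented for this sketch and are not the CTDL of the paper.

```python
# Hypothetical stand-in for a construct type declaration language.
# Conventions (assumed): dicts are nominations, lists are collections,
# anything else is an atom.

def atom_type(predicate):
    return ("atom", predicate)


def nomination_type(component_types):
    """component_types: name -> type of the component selected by that name."""
    return ("nomination", dict(component_types))


def collection_type(element_type):
    return ("collection", element_type)


def is_occurrence(construct, ctype):
    """True iff `construct` is an occurrence (instance) of the declared type."""
    kind, spec = ctype
    if kind == "atom":
        return not isinstance(construct, (dict, list)) and spec(construct)
    if kind == "nomination":
        return (isinstance(construct, dict)
                and set(construct) == set(spec)
                and all(is_occurrence(construct[n], t) for n, t in spec.items()))
    if kind == "collection":
        return (isinstance(construct, list)
                and all(is_occurrence(c, spec) for c in construct))
    raise ValueError(f"unknown kind of type: {kind!r}")


NAME = atom_type(lambda x: isinstance(x, str) and x != "")
NUMBER = atom_type(lambda x: isinstance(x, int) and x > 0)
STREET = nomination_type({"name": NAME, "number": NUMBER})

print(is_occurrence({"name": "Main St.", "number": 27}, STREET))  # True
print(is_occurrence({"name": "Main St."}, STREET))                # False
```

The declaration deliberately says nothing about how occurrences are written down. That belongs to the representation type declarations discussed next.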
4. Data as representations

For convenience, the term "data" is used below instead of "digital data" (cf. [DIN]), meaning representations that consist of characters only. Other representations (pictures, sounds, etc.) are not considered, i.e. their relationship to information is not investigated.
Referring to the configuration of two offices with one channel (fig. 1), let the piece of paper in fig. 7 be the realization of a communication channel. The question is whether the addressee interprets the five representations on it as representations of five, four, three, two or one construct. The answer depends on the agreements between the communicating offices. One agreement might be that all five are to be interpreted as "number seven". Under another agreement, a representation such as "4+3" might be taken for an arithmetic expression and not be interpreted as "number seven". Yet another agreement might distinguish between a "bar seven" and a "plain seven" (7), so that even these two differ on the construct level.

Such agreements are common in everyday communication. That the exact shape of the characters in a text on paper is irrelevant, for instance, is taken for granted. In mathematical literature, on the contrary, it is usual to distinguish carefully between different fonts. Or: in many programming languages, interspersed blanks are irrelevant in some places but relevant in others. These examples show that the interpretation of the various representations is a matter of agreements, which are usually made at the beginning of a communication. The relationship between information and representation (data) therefore has to be established in advance to make mutual understanding via a channel possible.
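The dependence of interpretation on agreements can be made concrete by counting how many distinct constructs five representations denote under different agreements. The five strings and the agreements are hypothetical stand-ins; the actual content of fig. 7 is not reproduced here.

```python
# Five representations on the "channel" (hypothetical stand-ins).
REPRESENTATIONS = ["7", "seven", "VII", "4+3", "bar 7"]

# Each agreement maps every representation onto a construct (here a tuple).
everything_is_seven = {r: ("number", 7) for r in REPRESENTATIONS}
sum_is_expression = {**everything_is_seven, "4+3": ("expression", "4+3")}
bar_seven_differs = {**sum_is_expression, "bar 7": ("bar seven", 7)}
literal_reading = {r: ("string", r) for r in REPRESENTATIONS}


def number_of_constructs(agreement):
    """How many distinct constructs the five representations denote."""
    return len({agreement[r] for r in REPRESENTATIONS})


print([number_of_constructs(a) for a in
       (everything_is_seven, sum_is_expression, bar_seven_differs, literal_reading)])
# [1, 2, 3, 5]
```

The representations stay the same in every case; only the agreed mapping onto constructs changes.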
What provisions have to be made? For communication to be possible, there must be a common background of understanding, i.e. a predefined mapping of representations onto constructs must have been agreed upon beforehand. In the course of communication, further agreements may extend this common background: one office passes declarations to the other, which accepts or rejects them. The declarations comprise
- construct type declarations
- representation type declarations.
Construct type declarations were discussed in the preceding section. The construct type declaration determines the constructs that can be communicated via the considered channel. The construct type declaration language is part of the common background mentioned above.

A representation type declaration refers to a construct type. It determines, for the constructs of that type, the set of all admissible representations, i.e. those that can be exchanged via the considered channel. This leads to the concept of representation type (Darstellungstyp). The representation type declaration language (RTDL) is a further part of the common background mentioned above.

An example may illustrate the relationship between construct type and representation type. (The ad-hoc languages used should be understood intuitively.) Although the example is very simple, many figures are needed to depict the ideas presented so far, which indicates the magnitude of the declarations usually implied. Fig. 8 shows a declaration of the four construct types CALENDAR-DATE, MONTH-NAME, YEAR and DAY-NUMBER. The latter three are types of atoms; the first is an aggregate type. In addition, the type composition is shown in IMC representation. Fig. 9 shows a declaration of four representation types. MONTH REPR, YEAR REPR and DAY REPR are the representation types for the construct types MONTH-NAME, YEAR and DAY-NUMBER, respectively, and DATE REPR is the representation type for the construct type CALENDAR-DATE. Fig. 10 shows two occurrences of the construct type CALENDAR-DATE (and of the component construct types) and some occurrences of the representation type DATE REPR. Despite these extensive declarations, many implicit assumptions remain: the character sets to be used, the medium (e.g. paper), the arrangement of the characters on it, and other details. All of these belong to the pre-existing common background of the communicating offices.
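The CALENDAR-DATE example can be sketched in Python, with predicates for the four construct types and regular expressions for the representation types. Figs. 8 and 9 are not reproduced in this text, so the component names `day`, `month`, `year` and the layout of DATE REPR (day, month name, four-digit year, separated by blanks) are assumptions. The sketch still relies on implicit agreements of exactly the kind listed above, e.g. the character set and the blank as separator.

```python
import re

MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
               "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


# Construct types (hypothetical reading of fig. 8): three atom types and the
# aggregate type CALENDAR-DATE, whose component names are assumed here.
def is_month_name(x):
    return x in MONTH_NAMES


def is_year(x):
    return isinstance(x, int) and 1 <= x <= 9999


def is_day_number(x):
    return isinstance(x, int) and 1 <= x <= 31


def is_calendar_date(c):
    return (isinstance(c, dict) and set(c) == {"day", "month", "year"}
            and is_day_number(c["day"]) and is_month_name(c["month"])
            and is_year(c["year"]))


# Representation types (hypothetical stand-ins for fig. 9): patterns for the
# atom representations, arranged positionally by DATE REPR.
DAY_REPR = r"(\d{1,2})"
MONTH_REPR = "(" + "|".join(MONTH_NAMES) + ")"
YEAR_REPR = r"(\d{4})"
DATE_REPR = re.compile(rf"{DAY_REPR} {MONTH_REPR} {YEAR_REPR}")


def represent(date):
    """CALENDAR-DATE construct -> representation of type DATE REPR."""
    if not is_calendar_date(date):
        raise ValueError(f"not an occurrence of CALENDAR-DATE: {date!r}")
    return f"{date['day']} {date['month']} {date['year']:04d}"


def interpret(text):
    """Representation of type DATE REPR -> CALENDAR-DATE construct."""
    match = DATE_REPR.fullmatch(text)
    if match is None:
        raise ValueError(f"does not conform to DATE REPR: {text!r}")
    day, month, year = match.groups()
    date = {"day": int(day), "month": month, "year": int(year)}
    if not is_calendar_date(date):
        raise ValueError(f"denotes no CALENDAR-DATE: {text!r}")
    return date


d = {"day": 24, "month": "SEPTEMBER", "year": 1975}
print(represent(d))                  # 24 SEPTEMBER 1975
print(interpret(represent(d)) == d)  # True
```

A string can conform to DATE REPR and still denote no CALENDAR-DATE (e.g. day 32). The sketch checks the representation type and the construct type separately for that reason.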
This example suggests that the concept of format belongs to the concept of representation type. So far we have assumed that only one representation type can be declared for each construct type. This restriction should now be dropped. If several representation types can be declared for one construct type, each of them could be called a format (Format), in close agreement with the common use of this term. In the example of fig. 9, instead of the single representation type DATE REPR we could declare three representation types (= formats) for constructs of type CALENDAR-DATE: two "positional" formats and one "key-word" format.

In existing systems, construct type declaration and representation type declaration are often not explicitly separated. The declaration of the input area in working storage (= format layout declaration) is often at the same time the specification of the construct type. This may be a reasonably economical approach. But to understand the relationship between information and data, one should be aware of the double function of such a "data definition".
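Three formats for one construct type might look like this: two positional formats and one key-word format, each a pair of functions that map a CALENDAR-DATE construct to a representation and back. The concrete layouts (DD.MM.YYYY, YYYY-MM-DD, and "DAY=dd MONTH=name YEAR=yyyy") and all function names are hypothetical.

```python
import re

MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def _date(day, month_no, year):
    month_no = int(month_no)
    if not 1 <= month_no <= 12:
        raise ValueError(f"no such month: {month_no}")
    return {"day": int(day), "month": MONTHS[month_no - 1], "year": int(year)}


def _groups(pattern, text):
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"{text!r} does not conform to the format")
    return match.groups()


# Format 1 (positional): DD.MM.YYYY
def dmy_repr(d):
    return f"{d['day']:02d}.{MONTHS.index(d['month']) + 1:02d}.{d['year']:04d}"


def dmy_interp(s):
    day, month, year = _groups(r"(\d{2})\.(\d{2})\.(\d{4})", s)
    return _date(day, month, year)


# Format 2 (positional): YYYY-MM-DD
def ymd_repr(d):
    return f"{d['year']:04d}-{MONTHS.index(d['month']) + 1:02d}-{d['day']:02d}"


def ymd_interp(s):
    year, month, day = _groups(r"(\d{4})-(\d{2})-(\d{2})", s)
    return _date(day, month, year)


# Format 3 (key-word): the order of the KEY=value items is irrelevant
def kw_repr(d):
    return f"DAY={d['day']} MONTH={d['month']} YEAR={d['year']}"


def kw_interp(s):
    items = dict(re.findall(r"(\w+)=(\w+)", s))
    return {"day": int(items["DAY"]), "month": items["MONTH"],
            "year": int(items["YEAR"])}


FORMATS = {"DMY": (dmy_repr, dmy_interp), "YMD": (ymd_repr, ymd_interp),
           "KEYWORD": (kw_repr, kw_interp)}

date = {"day": 24, "month": "SEPTEMBER", "year": 1975}
for name, (rep, interp) in FORMATS.items():
    text = rep(date)
    assert interp(text) == date  # every format represents the same construct
    print(name, text)
# DMY 24.09.1975
# YMD 1975-09-24
# KEYWORD DAY=24 MONTH=SEPTEMBER YEAR=1975
```

The loop checks that every format round-trips to the same construct. This is the sense in which the formats are different representation types of one construct type.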
Applying this view of the relationship between information (constructs and construct types) and data (representations and representation types), we can outline a flow of information between two offices via one channel. Suppose office A requests office B to retrieve a construct with given properties (e.g. from a data base). Office B finds the specified construct, identifies its type, and chooses one of the pertaining representation type declarations established for the channel. It then puts a representation of the construct (i.e. data conforming to the representation type) into the channel. Since this representation conforms to the established representation type declaration, office A can interpret the data, knowing the representation type and the construct type.
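The flow between offices A and B can be traced in a short sketch. The shape of the declarations table, the representation type name `DATE-KW`, and the choice to send the type names along with the data are assumptions. The paper only requires that office A know the construct type and the representation type when it interprets the data.

```python
from collections import deque


# Representation type DATE-KW (hypothetical) for construct type CALENDAR-DATE.
def date_kw_repr(d):
    return f"DAY={d['day']} MONTH={d['month']} YEAR={d['year']}"


def date_kw_interp(s):
    items = dict(item.split("=", 1) for item in s.split())
    return {"day": int(items["DAY"]), "month": items["MONTH"],
            "year": int(items["YEAR"])}


# Declarations established for the channel between A and B (assumed shape):
# construct type -> {representation type: (represent, interpret)}.
DECLARATIONS = {"CALENDAR-DATE": {"DATE-KW": (date_kw_repr, date_kw_interp)}}


def office_b(store, properties, channel):
    """Find a construct with the given properties, identify its type, choose a
    declared representation type and post the data into the channel."""
    for ctype, construct in store:
        if all(construct.get(k) == v for k, v in properties.items()):
            rtype, (represent, _) = next(iter(DECLARATIONS[ctype].items()))
            # Assumption: type names travel with the data so that A knows them.
            channel.append((ctype, rtype, represent(construct)))
            return True
    return False


def office_a(channel):
    """Interpret the data, knowing construct type and representation type."""
    ctype, rtype, data = channel.popleft()
    _, interpret = DECLARATIONS[ctype][rtype]
    return ctype, interpret(data)


store = [("CALENDAR-DATE", {"day": 24, "month": "SEPTEMBER", "year": 1975}),
         ("CALENDAR-DATE", {"day": 26, "month": "SEPTEMBER", "year": 1975})]
channel = deque()
office_b(store, {"day": 26}, channel)
print(channel[0][2])  # DAY=26 MONTH=SEPTEMBER YEAR=1975
print(office_a(channel))
```

Only the string produced by `date_kw_repr` passes through the channel. The construct itself stays in office B's store and is rebuilt in office A by interpretation.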
Some readers may have noticed that the CALENDAR-DATE example does not explain why the representations do not show all details of the represented constructs (cf. fig. 10). This corresponds to practice in data processing, because it is the representation, not the construct, that occupies storage. More extensive representations with less extensive declarations could be provided for various reasons (security, processing, etc.). Of course, that would require more capacity in the channels involved (storage). In any case, the question arises whether such a "representation" is really a representation of a construct. Strictly speaking, it is not. It shows only the individual part (Individualteil) of the represented construct, because the representational part common to all occurrences of the type is contained in the type declarations. Only together with the specifications in the type declarations, which allow the construct to be interpreted, is a full representation present.

This suggests that the word "data" in the term "data definition" is partly justified. "Data" (e.g. in "input data") usually means the individual part of a full representation rather than the full representation itself. With this in mind, the "data definition" defines the admissible data, i.e. the admissible individual parts of construct representations. However, omitting the word "type" from the term is misleading.
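A fixed-width layout shows the distinction between the individual part and the full representation. The layout `DATE_LAYOUT` and both functions are hypothetical. The stored data carries only the values; the field names, order and widths, which make up the part common to all occurrences, live in the declaration.

```python
# Hypothetical representation type declaration holding the part common to all
# occurrences: the names, order and widths of the fields.
DATE_LAYOUT = [("DAY", 2), ("MONTH", 2), ("YEAR", 4)]


def individual_part(date, layout=DATE_LAYOUT):
    """What is actually stored or transmitted as "data": only the values."""
    return "".join(f"{date[name]:0{width}d}" for name, width in layout)


def full_representation(data, layout=DATE_LAYOUT):
    """Re-attach the common part from the declaration to the individual part."""
    fields, pos = [], 0
    for name, width in layout:
        fields.append(f"{name}={data[pos:pos + width]}")
        pos += width
    if pos != len(data):
        raise ValueError("data does not conform to the declared layout")
    return " ".join(fields)


data = individual_part({"DAY": 24, "MONTH": 9, "YEAR": 1975})
print(data)                       # 24091975
print(full_representation(data))  # DAY=24 MONTH=09 YEAR=1975
```

Without `DATE_LAYOUT` the string "24091975" cannot be interpreted at all, which is why it is only the individual part of a representation.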
5. Practice oriented remarks

This concluding section attempts to apply the ideas about information and data discussed above. First, a preliminary remark. One might get the impression that IMC is offered as a new proposal for a data model, to compete with other, well-known data models. That would be a misunderstanding. IMC aims to be a conceptual tool for speaking about information, and on this level it comprises the various data models. Nevertheless, it is a specific conceptual model. As such it offers a view of model information that allows a variety of information structures to be formed, but it has its own limitations, too.

It is not the task of this paper to outline the features of hierarchic, network, relational or other data models. But it may be of interest here what these attributes refer to. They refer to the so-called "data structures" that can be established in a system of the respective model and that are supported by the system's functions.
In the terminology introduced above we would, of course, say "information structure" instead of "data structure", since the structures meant here concern constructs, not their representations. In our understanding, the structure of the representation is normally left to the implementor, who chooses it to achieve efficiency, security, or any other goal. For orientation in such systems, the related questions concerning model information are of main interest:
- what levels of aggregation are available;
- what restrictions apply to the nesting of constructs, nominations or collections;
- whether special generic types adjusted to the application are available (e.g. "relations", which in terms of IMC are collections of equally domained nominations, so-called collectives (Kollektiv));
- what properties can be used to address constructs (independently of their representation);
- and other questions.
The answers to these questions, together with the pertaining operations on the constructs, characterize a system as hierarchic, network or relational.
It
is
a
network or r e l a t i o n a l
matter
of
course,
i n f l u e n c e d by r e p r e s e n t a t i o n of "redundancy" benefits
and
clarified, but
to
chance)
appearance
are of
are of relevance.
of
redundancy.
~ @ ! _ _ § ~
construct
"consistency
(cf.
constraints"
But
it
has
to be
constructs,
of
appears
an
embracing
(Parallelstelle).
type
that a
(necessarily or If
declaration
hy
the system it
will store the r e p r e s e n t a t i o n of the c o n s t r u c t each time it appears
(at
a
parallel
spot)
to be. that
or
It is c o n c e i v a b l e the
same
with the RESULT
(usually once).
The more often the
the higher the degree of redundancy is in p r i n c i p l e
technique
consistency-conditioned
the
less often
is stored,
decide,
Once a
whether
representation
is free to
this
so-called
the SOURCE clause of [DDLC]).
offices)
the
r e q u i r e d
s p e c i f i c a t i o n of this kind has been established,
(as one of the c o m m u n i c a t i n g
be
problem
It has been shown,
at several spots is
has to be specified in the
consistency
The
does not refer to the level of
construct
to
It is not intended here to consider the
at which the same c o n s t r u c t called
a
model
also e f f i c i e n c y and other aspects
techniques
disadvantages
Spots,
many
together with the
data
may appear at several spots as a component
construct. by
the
that r e d u n d a n c y
a
and
(or something else).
that
is one of them.
questions
render
the level of their representation.
construct
what is the support for
could
(and actually is done sometimes)
he
applied
p a r a l l e l spots.
feature of [DDLC]).
said
also
for
other
than
Such a s i t u a t i o n is also given
On the model i n f o r m a t i o n type level
RESULT clause specifies that the atom at the s p e c i f i e d spot is the
result of the e x e c u t i o n of a specified procedure, at other spots as input. additionally
is
In both the
specified,
SOURCE
which uses c o n s t r u c t s
and
the
RESULT
clause
whether a r e p r e s e n t a t i o n of the depending
atom is m a i n t a i n e d p e r m a n e n t l y
(ACTUAL)
by the system,
or is made up
only when r e q u i r e d for passing it via the c o m m u n i c a t i o n channel to r e q u e s t i n g office causes
redundancy.
(VIRTUAL). However
In the strict sense, also
i n t e r p r e t a t i o n of the ACTUAL and VIRTUAL
another,
the
the ACTUAL feature less
restrictive
feature is conceivable,
where
33
the
system still remains
assumed above)
free to follow the s p e c i f i c a t i o n
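The ACTUAL/VIRTUAL trade-off can be made concrete with a small Python sketch. It is a hypothetical model, not the DDLC mechanism itself. The class name `DerivedAtom`, the input atoms QUANTITY and UNIT-PRICE, and the procedure `amount` are assumptions chosen for illustration. The sketch counts how often the RESULT procedure is executed. An ACTUAL atom is recomputed on every update of an input spot, which is the cost of keeping a redundant representation consistent. A VIRTUAL atom is computed only when an office requests it.

```python
from typing import Callable, Dict, Optional


class DerivedAtom:
    """Hypothetical model of an atom given by a RESULT clause.

    mode "ACTUAL":  a representation is maintained permanently and must be
                    recomputed whenever an input spot changes (redundancy).
    mode "VIRTUAL": the atom is made up only when an office requests it.
    """

    def __init__(self, atoms: Dict[str, int],
                 procedure: Callable[[Dict[str, int]], int], mode: str):
        if mode not in ("ACTUAL", "VIRTUAL"):
            raise ValueError(f"unknown mode: {mode}")
        self.atoms = dict(atoms)       # constructs at the other spots (input)
        self.procedure = procedure     # the specified procedure
        self.mode = mode
        self.evaluations = 0           # how often the procedure was executed
        self.stored: Optional[int] = (
            self._evaluate() if mode == "ACTUAL" else None)

    def _evaluate(self) -> int:
        self.evaluations += 1
        return self.procedure(self.atoms)

    def update(self, spot: str, value: int) -> None:
        self.atoms[spot] = value
        if self.mode == "ACTUAL":
            # Keep the redundant stored representation consistent.
            self.stored = self._evaluate()

    def request(self) -> int:
        """Pass the derived atom to the requesting office."""
        if self.mode == "ACTUAL":
            return self.stored
        return self._evaluate()        # made up only when required


def amount(a: Dict[str, int]) -> int:
    """Hypothetical procedure: AMOUNT = QUANTITY * UNIT-PRICE."""
    return a["QUANTITY"] * a["UNIT-PRICE"]
```

Start both variants from the same inputs, then apply three updates followed by one request. The ACTUAL variant runs the procedure four times and the VIRTUAL variant runs it once, while both deliver the same value. Which variant pays off depends on the ratio of updates to requests.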
Taking a closer look at the discussion of redundancy, one encounters
a
(the "system")
is a
unit with a storage as a private channel fig.
11
is
configuration
often
preferred
containing
(input channel,
two
stated.
representations) RESULT
rather
than
are
the is
a
With
this
what is the object channel
which
the
As a matter of fact this is seldom clearly
input format declaration
(e.g.
sequence of atom
(e.g.
SOURCE feature,
made up to one complex declaration package,
declaration into the same package.
well known under
1.
a diagram
we have also three places to
complexity of which is still more increased by
"optimization"
fig.
and data base format declaration
feature)
functional
If we consider a representation type declaration,
is applied to?
In particular,
To show explicitly
computerized
channels or still better three channels
the question has to be answered, declaration
configuration
(the "data base"),
data base, output channel)
represent constructs.
type
(in the context of
system
is a slight modification of that used so far.
that one of the offices
like
(as
or to understand it only as an efficiency constraint
data base management systems) which
verbatim
label
"schema".
minimization
of
packing
the
construct
Such declaration packages The
consequence
of
the
are
such
an
the number of characters to be
written by the programmer at the expense of
quality
of
software,
in
particular of clarity.
Finally, some remarks on the relationship between information and data on the one hand and their manipulation on the other hand might be appropriate. It would be an obvious question to ask whether constructs or their representations are manipulated. Strictly speaking, only representations can be handled, as was stated previously. But so-called data manipulation languages do not refer to the representational level only. Primarily they are designed for the manipulation of constructs. This will be illustrated by an example of the retrieval of a construct: the properties which are specified as parameters of a request refer to a construct rather than to a representation of it. The delivery of the found construct is done by putting it into the respective channel in an agreed representation, i.e. meeting the output format. Another example is "navigation". This term refers to moving from one spot to the other in an extensive construct. Also here no reference to the representation of this construct is involved. Only upon request does the navigator get some representation of the construct (at the spot where he has arrived). In case of a data base management system, he does not receive the representation
on which the retrieval has been performed,
representation.
A counter-example,
representation in the data base in the output channel Although
a
information,
this
implementor,
does
the
user
has
requirements. time
exert
language refers to the level of model
not
imply
representations
accessed in order to execute several
representation
and
interests access.
to
way
application
of
adequacy
and
resources
will decrease. from
manipulation given
to
computer
computer
and
level.
a
efficiency.
concepts,
security
However,
of
update /
compromise
between
in overall
computing
information
differentiation
hand
computing
(traffic density, balanced
facilities to system interfaces,
view of information
to
A good choice of
More and more it becomes evident,
includes to support conceptual presented
cost,
functions as well as a forecast
the involved people and the intended
to this goal.
On the other
the influence of storage and biased
access
it is up to the
which refer to storage and
should yield
considerations
actual
also the policies of
time,
acting in the future
etc.)
to
move
influence
has
some influence to the information
user's
no
But again,
in what way he has provided to be
He
These requirements
retrieval ratio, efficiency
that
manipulation commands.
construct types and of manipulation the
where the
is the same as
(librarian's counter).
takes place in the system.
which
is a library,
(room with book-shelves)
"data manipulation"
representations
however,
but an output
time
that we have
structures
and
where more preference is application. wherever
This goal
useful.
The
and data is intended to be a contribution
References
[DIN]
DIN/Fachnormenausschuss 44300 "Information Institute
[ANSI]
ANSI/X3/Sparc/DBMS
Study
GMD/Arbeitsgruppe the description (German).
[PZT]
Prozesse".
[DURI]
R. Durchholz
and
[DKR]
Beschreibung
Verlag,
"Concepts
T. B. Steel
"Data
Jr.,
IFIP-TC-2
"Abstract 10/5,
(German).
Datenbanksysteme,
E. Falkenberg
base
J. W. Klimbie
Conference
(German).
a "A
status
technical
1975 Elektronische
G. Richter, und
"Design of a data programs
(DAGS)"
Systementwuerfe
und W. Klutentreter,
fuer (Hrsg.),
1974
Description
CODASYL
data
197~
basic system for application Datenmodelle
Report".
for
Namur, January
Objects"
In:
CODASYL/Data
1967
1968
W. Klutentreter,
GMD, St. Augustin,
diskreter Haendler,
standardization
Special Working
Rechenanlagen R. Durchholz,
base
of the DDL",
H. Zemanek,
for
systems"
Basel,
Data Base Management,
(eds.), North-Holland,
base management
[DDLC]
zur
Birkhaeuser
In:
"Terminology computer
ueber Automatentheorie,
G. Richter,
systems".
American
1971
and K. L. Koffeman,
report".
Report.
fuer Betriebssystemnormung,
(Hrsg.),
DIN
German
1975
of models of job processing
in-depth evaluation [ZEM]
Interim
February
"Grundsaetzliches
Unger,
management
[STEEL]
Group,
In: 3. Colloquium
(FNI),
(German).
March 1972
Institute,
GMD, St. Augustin,
C. A. Petri, Peschl,
vocabulary"
for Standardization,
National Standards
[ABS]
Informationsverarbeitung
processing;
Language Committee
DDL Journal of Development,
(DDLC), June 1973
"June 73
Figure 1: Configuration of communicating functional units [figure: communicating offices]
Figure 11: Extended configuration of communicating functional units [figure: offices labelled "user" and "system"]
Figure 2: Constructs in IMC box representation [figure: a construct with the components name (family, first name), home address (city; street with street name and number), place of birth, date of birth (year, month, day) and branches; legible values include JACKSON, JOHN, HOUSTON, WASHINGTON, LOS ANGELES and ANN ARBOR]
Figure 3: Constructs in IMC tree representation [figure: the construct of fig. 2 drawn as a tree]
Figure 4: Construct representation of fig. 2 with additional lettering for reference purposes [figure: the construct of fig. 2 with reference labels such as C3, C6 and C7; legible values include CAMBRIDGE]
Figure 5: Graphic construct type definition [figure: construct type EMPLOYEE with components including SKILLS and NUMBER]
Figure 6: Occurrence of construct type defined in fig. 5 [figure: an EMPLOYEE occurrence with PERSON J. WALTERS, SKILLS entries with SKILLCODE 1120 and SKILLCODE 1135, and a NUMBER]
Figure 7: see next page
    construct atom type YEAR:        1900 ≤ INTEGER ≤ 1999
    construct atom type MONTH-NAME:  JANUARY, FEBRUARY, ... DECEMBER
    construct atom type DAY-NUMBER:  1 ≤ INTEGER ≤ 31
    construct type CALENDAR-DATE
        nomination:
            MONTH --> construct type MONTH-NAME
            YEAR  --> construct type YEAR
            DAY   --> construct type DAY-NUMBER
        non-occurrences:
            MONTH      DAY
            FEBRUARY   30
            FEBRUARY   31
            APRIL      31
            etc.
    [graphic: CALENDAR-DATE with the components MONTH, YEAR and DAY, each leading to an atom]
Figure 8: Construct type declarations
    representation type MONTH REPR
        represented construct type MONTH-NAME
        string: 1 or JAN --> atom JANUARY
                ...
                12 or DEC --> atom DECEMBER
    representation type DAY REPR
        represented construct type DAY-NUMBER
        string: DECIMAL representation
    representation type YEAR REPR
        represented construct type YEAR
        string: DECIMAL representation
    representation type DATE REPR
        represented construct type CALENDAR-DATE
        string: (DAY REPR "-" MONTH REPR "-" YEAR REPR)
             or (YEAR REPR "-" MONTH REPR "-" DAY REPR)
             or ("D:" DAY REPR /// "M:" MONTH REPR /// "Y:" YEAR REPR; delimiter ",")
Figure 9: Representation type declarations
Figure 7: Five construct representations on paper [legible: 4+3, SEVEN, seven]
    CALENDAR-DATE occurrence: DAY 4, MONTH OCTOBER, YEAR 1967
        representations: 4-OCT-1967   D:4,Y:1967,M:OCT   1967-10-4
    CALENDAR-DATE occurrence: DAY 14, MONTH MAY, YEAR 1973
        representations: M:MAY,Y:1973,D:14   D:14,M:5,Y:1973   14-5-1973   1973-MAY-14
Figure 10: Construct type occurrences and representation type occurrences of fig. 8 and 9
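Figs. 8 to 10 make the point that many representation type occurrences stand for one construct type occurrence. That can be tried out with a minimal Python sketch of the DATE REPR declaration. The sketch rests on three assumptions:
- "///" is read as "these components in any order", since fig. 10 shows D:, M: and Y: in varying order.
- The non-occurrence list, which fig. 8 ends with "etc.", is completed with the other 30-day months.
- Leap years are ignored.

All function names are hypothetical.

```python
# Atoms of construct type MONTH-NAME (fig. 8), in calendar order.
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

# Non-occurrences of CALENDAR-DATE as (MONTH, DAY) pairs. Fig. 8 lists
# FEBRUARY 30, FEBRUARY 31 and APRIL 31 "etc."; the other 30-day months are
# an assumption of this sketch, and leap years are not modelled.
NON_OCCURRENCES = {("FEBRUARY", 30), ("FEBRUARY", 31)} | {
    (m, 31) for m in ("APRIL", "JUNE", "SEPTEMBER", "NOVEMBER")}


def month_repr(s: str) -> str:
    """MONTH REPR: '1'..'12' or 'JAN'..'DEC' represent the same atom."""
    if s.isdecimal() and 1 <= int(s) <= 12:
        return MONTHS[int(s) - 1]
    for m in MONTHS:
        if s == m[:3]:
            return m
    raise ValueError(f"not a MONTH REPR: {s!r}")


def decimal_repr(s: str, lo: int, hi: int) -> int:
    """DAY REPR / YEAR REPR: DECIMAL representation inside the atom's range."""
    if not (s.isdecimal() and lo <= int(s) <= hi):
        raise ValueError(f"not a decimal in {lo}..{hi}: {s!r}")
    return int(s)


def calendar_date(d: str, m: str, y: str) -> dict:
    """Build the CALENDAR-DATE construct and reject non-occurrences."""
    date = {"DAY": decimal_repr(d, 1, 31), "MONTH": month_repr(m),
            "YEAR": decimal_repr(y, 1900, 1999)}
    if (date["MONTH"], date["DAY"]) in NON_OCCURRENCES:
        raise ValueError(f"non-occurrence: {date}")
    return date


def date_repr(text: str) -> dict:
    """Map a DATE REPR string to the construct it represents."""
    if ":" in text:
        # Third alternative: D:, M:, Y: components, read here as appearing
        # in any order ("///"), delimited by ",".
        parts = [p.split(":", 1) for p in text.split(",")]
        fields = dict(p for p in parts if len(p) == 2)
        if len(parts) != 3 or sorted(fields) != ["D", "M", "Y"]:
            raise ValueError(f"not a DATE REPR: {text!r}")
        return calendar_date(fields["D"], fields["M"], fields["Y"])
    pieces = text.split("-")
    if len(pieces) == 3:
        # First alternative DAY-MONTH-YEAR, second YEAR-MONTH-DAY.
        for d, m, y in (pieces, pieces[::-1]):
            try:
                return calendar_date(d, m, y)
            except ValueError:
                pass
    raise ValueError(f"not a DATE REPR: {text!r}")
```

Under these assumptions, all representations of the first occurrence in fig. 10 yield the same construct (DAY 4, MONTH OCTOBER, YEAR 1967). A string such as 30-FEB-1970 is rejected as a non-occurrence.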
Figure 11: see first page (fig. 1)
Data Base Research: A Survey

A. Blaser, H. Schmutz
IBM Wissenschaftliches Zentrum Heidelberg, Tiergartenstr. 15
Abstract: The research activities of institutes and industry in the data base area are reviewed. Most of the activities center around data models, data manipulation, system implementation issues, data base architecture, and modelling and analysis. A comparison with the requirements of information system development shows emerging trends as well as differences between research and development with respect to principles and user aspects. Conclusions are drawn with respect to potential future research.

TABLE OF CONTENTS
1. Introduction
2. Data Models
3. Data Manipulation Languages
4. System Problems
5. Storage Structures and Search Algorithms
6. Modelling and Analysis
7. Summary and Conclusions
8. Bibliography
1. INTRODUCTION
and
in
of
CONTENTS
Data
past
by
documented
research
design
I®
The
area
and
established
base
2.
1.
the
interactive
techniques
emphasis
TABLE
in
considered
of
present
192/
/49,
this
paper
research
is
primarily
activities
in
to
the
provide
data
an
base
overview
area.
This
over
~a--
per
does
not
er~
information
information
survey
retrieval
systems
of
such
an
introduction
to
available
Ll~htfoot:
Jardine
and
of
T
data
still
help
is a
or
have
been
such
a
The
the
scheme
an
first
our
shown
programs
is
seen
by
base
the
We
will
~
is
is
which
sical
or
internal
is
actually
we
can
The
use
are
the
of
between select
the
conceptual
conceptual
are
specified
in
the
in
conceptual the
never
in
subpart
a
and of
the
definition
external information
the
It
with
the
It
is
a
standard-shown
designer
IMS
in
through the
views
exist
of
a
serves
in
as
the
the
double
phy-
form~
help
of
All
of
The
these and
mappings
purpose:
sufficient
a
C[!),
administrator
language.
a
central the
Information
[ mapping
mappings.
and
a
conceptual
with
the and
language.
example,
way
base
as a
at
syntax
base
of
to
installation, may
For
way
aspects
referred
"correct"
data
serve
the
or
mapping
necessary
informa--
legal
form
of
of
system
system
(fig.
is
information,
represents
the
the
data
mapping
and
of
usually
It
physical
internal
information
Is
reflects
Given
responsibility data
been
the
retrieval
information.
corresponding to
the
unconsciously,
information
directly.
of
information
memory.
of
what
defining
used
views
of
type
grammar
is other
view
stored
construct
mappings
mappings
for
to
has
major
views
for is
given
a
describesv
view
(D.A.
schemes
or
point.
group
For
specifies
point
a
to
during
responsible This
A
conceptual
Experience
which
knowledge~
persons~
flow
level.
reference
by
similar
central
schema
The
J.A.
Systems
accepted.
consciously
shows
administrator".
a
and
definition
widely
employed
very
data
group
to
in
commer-
addition
in
our
make
authors
the
information.
similar
Users
question,
is
scheme
conceptual
therefore
a
experience
conceptual is
the
Barnett
-- A
of
danger
interested
of
Management 1974)
already
I and
schema
(A.J.
the
ago.
and
view
IMS
Base
implemented~
fig.
as
of
iS
some
(IMS)
Data
the
decade
in
who
depth
Vurth--
aspects
aware
reader, In
Is
and To
who a
integrated
"data
debates.
book.
well
Amsterdam,
This
in
are
System
architecture
conceptual
information
the
of
nearly
is
The
stored,
system?
software,
survey.
(ANSI/X3/SPARC)
mappings,
tion.
this
~ which of
such
In
base
non-computer-oriented
the
systems~
Holland, in
Wedekind's
scheme
data
base
Management
subject
scheme
group
Date's
to study
North
base
simplification ization
recommend field,
to
data
We
the
referenced
a
and
addressed.
Development.
editor)
Is
systems not
Information
literature
available
and
data
Evolutionary
What
are
limitations
cially
with
commercially
for
{a} a
to
spe--
Fig. 1: Structure of a data base system [figure: users PU (parametric user), IU (interactive user), APP (application programmer) and DBA (data base administrator); views E (external), C (conceptual) and I (internal)]
Fig. 7: Subgraph of E, MGR and SAL
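The three levels of views and the two mappings between them can be sketched in a few lines of Python. The sketch makes these assumptions:
- The record layout (employee number, 12-byte name, manager number, salary) is chosen for illustration.
- The attribute names ENO, NAME, MGR and SAL are chosen loosely after fig. 7.
- The sample values in any usage are hypothetical.

Nothing here reproduces a particular system. The internal/conceptual mapping decodes stored bytes into conceptual tuples, and the conceptual/external mapping derives one user's view. An application that relies only on `external_view` would not notice a change of the internal `RECORD` layout, provided the internal/conceptual mapping is adapted accordingly.

```python
import struct
from typing import Dict, List, Tuple

# Internal schema (assumed layout): employee number, 12-byte name,
# manager number, salary; little-endian, no padding. Longer names are
# silently truncated to 12 bytes by struct.pack.
RECORD = struct.Struct("<i12sii")


def conceptual_to_internal(rows: List[Dict]) -> bytes:
    """Store conceptual tuples as fixed-length internal records."""
    return b"".join(
        RECORD.pack(r["ENO"], r["NAME"].encode("ascii"), r["MGR"], r["SAL"])
        for r in rows)


def internal_to_conceptual(block: bytes) -> List[Dict]:
    """Internal/conceptual mapping: decode stored records into tuples."""
    return [{"ENO": eno, "NAME": name.rstrip(b"\0").decode("ascii"),
             "MGR": mgr, "SAL": sal}
            for eno, name, mgr, sal in RECORD.iter_unpack(block)]


def external_view(rows: List[Dict], mgr: int) -> List[Tuple[str, int]]:
    """Conceptual/external mapping: one manager's view (name, salary)."""
    return [(r["NAME"], r["SAL"]) for r in rows if r["MGR"] == mgr]
```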
same
rigor
/12~/.
a
furnish
user
restricts fig.
W
with
to
for
the
an
the a
other
science
does
force
side
subview
of
can us
be
to
does
data
Since
not
base
may
which
mapped
exclude
it
the
relationships~
illustration,
computer not
On
seen
practically
these
all to
structures
difficulty
a
{i.e.
be
conveniently
provide
subgraph)
by
the
known
some
form
from
our
to
which
user.
See
structures
in
of
graphs,
high
level
it data
modeling.
2.3.
The
It
is
not
in
the
Equivalence
at
sense
all
surprising
that
in
ple
straightforward
respondin~ ween
of
the
language.
In
question
The
of
of
Bobrow
/17/,
models
are
by
Neuhold
of
creates
3.1
Low
Level
As
we
can
see
an
application
to
as
second
the to
DBTG most
a
we
in
The
models°
DIAM
one
on
how
This
[s~ of
equivalent
model is
a
model
to
a
of
be
simCOP--
choice
bet-
"convenient"
or
however~ the
therefore
can
there
question
question will
or
are
cases
schema
decided
also
of
data
Sibley
same
for
First
a
and
a
while
system.
This
a
has
models /167/
for
superimpose
need
/43/.
DATA
tially
a
models
not
data
come
only
manlpule--
back
to
the
McGee
{at
model
in
investigated /122/0
least
in
creates
a
A
model
on
"superimposition
results
been
a
new
theory"
this
Rs
direction
by
Different
the
world
mapping By
a
it
was
are
of
prob-
problem9 stated
reported
by
/82/®
3.
onset
the
section
coexist
on
how the
Codd
EeP.
Frasson
in
/134/y to
~ even
namely
which
the in
other.
be
but
equivalence
likely
researches)
the
must
model next
convert
in
becomes
in
Moreover,
to
models
the
different
equivalence.
question
lem7
way
data
the
encoded
versa.
schema
processing
question
tion
vice
different
"natural" a
and
equivalent
two
that
information
encoded and
CRM
of Data Models
MANIPULATION
a
"low case
Versus
in
High
fig.
is
program
records in
LANGUAGES
are
of
data or
oe
access
are
Logic
accessed
interactively
typically
programming level"
Level
retrieved
language.
as is
"one the
in at
one
This
a
by
type
record
at
higher
level
a
external
form
terminal, one of
time
and
In
logic"°
the
processed
processing
"multiple
either
is
first
sequenreferred
Typical records
via
for at
a
the time
logic". level
Research logic.
program in
allocation
subset
needs
by
a the
Even
modest
may
very
of
access compared well
plication
be
its
%he
use
to
and
even
Is
still
Of
the
thelm
more
type common
specify to
sub space
in
a
to
todays
level
towards systems
their
processing
Results
through user
a
viewv
high
oriented implemented
in
oper-
resource
external
systems~
of
tO
and
the
higher
application
going
the
specified
available
for
is
primarily
though
the
scheduling
and
be ape
the
required
program
for selection
has
in
is
conceptual
commercially
it
the
towards
also
logic"
data
the
relevance
as
that
time
projects
data to
of
programs
%he
ef~ect~
research to
a
oriented
information
between
nature though
interactive are
In
realize
at
of
this
mapping
primarily
to
records
purposes, Is
are
important
which
system
program
logic,
is
"multiple
on
The
which
It
case
advance
ate.
activities
ap--
Installa-
tions,
Subsequently searchers then
we
some will
is
In
which
they
of
data
models.
Some
Table
I lists CRM
data
manipulation
referenced. wlth
are
We
languages
start
based
to languages
used.
Finally
languages
wlth on
which
are
will
come
we
some
of for
the
Table I. Some CRM Implementations

System    Location       Language type   Reference
IS/1      IBM UK         algebra         Todd
MacAIMS   MIT            algebra         Goldstein
RDMS      MIT/MULTICS    algebra         Steuert
MORIS     Milano         calculus        Bracchi
SQUARE    IBM Research   mapping         Boyce
SEQUEL    IBM Research   mapping         Chamberlin
INGRES    Berkeley       calculus        Held
ZETA      Toronto        definitional    Mylopoulos
DAMAS     MIT            calculus        Rothnie
%he
other
developed
the
A by
special the
way
equivalence
Implementations
though
System
the
devoted
CRM
3.2.
ment
be
continue
subsection
of
relational
systems
claim
to
imple-
claim
that
they
implement four
homogeneous
represent
concept
of
tlenal XRM~
a
data
and
files
graph
SEQUEL
is
for
system)
a
snduser ing
derived
and
dy
an
Te
give
us
consider
is
an
is
IS/l)
This
tion
In
{P
~
%he
query
the a
data
the
relations
which
on
lan@uage stands CUPID
top its
ef
as
currently
tool
berela--
%o
low
a
resembles
INGRES
is
ef
RAM,
supporting
definitional
by
top
homoge-
oriented
management
used
and
the
rela~
of
query
keywords.
system
data
on
on
top
The
first
between
better)
en
graphics
language
of
let
level
implementer
the
primto
stu-
access.
different
styles
of
query
languages,
let
query:
name
of
the
algebra
{S;
C2
is
a
=
~M"
advisor
of
approach)
));
sequence
{operator
=
)%s).
calculus
Ci
=
%o
query
OF
PROF
IS
P
RANGE
OF
STUD
IS
S
INTO
R(PROF.PN)
RETRIEVE
=
a
the
we
student~
whose
name
C5)
%
obtain:
C2
selection
v*') I a
refers
oriented
Cl
of
RANGE
=
second
the
selection
value
language
WHERE
(operator and
in
the
to
INGRES)
PROF.P#
=
=
iIth
';'), a
a
projec-
domain.
we
STUD.P#
obtain:
AND
STUD.SN
~M ~
Here
The
answer
PROF
and
STUD
existential
a
aspect
the
product
{operator
QUEL)
is
has
specifically
rela%ion~tl
expression
cartesian
ZETA
or)
syntax.
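The algebra and calculus styles discussed above can be contrasted on the example query "name of the advisor of student 'M'". The following Python sketch is only an illustration, not the syntax of IS/1 or INGRES. It makes these assumptions:
- The relations are P(P#, PN) for professors and S(S#, SN, P#) for students, as inferred from the queries in this section.
- The tuples are hypothetical.
- Python comprehensions stand in for the query languages.

The algebra version composes operators (cartesian product, selection, projection) in a prescribed sequence. The calculus version only states the condition the result must satisfy, with the student tuple bound by an existential quantifier.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]

# Hypothetical tuples for P(P#, PN) (professors) and S(S#, SN, P#) (students).
P: List[Row] = [{"P#": 1, "PN": "B"}, {"P#": 2, "PN": "K"}]
S: List[Row] = [{"S#": 10, "SN": "M", "P#": 1}, {"S#": 11, "SN": "N", "P#": 2}]


def product(r: List[Row], rname: str, s: List[Row], sname: str) -> List[Row]:
    """Cartesian product; attribute names are qualified by relation name."""
    return [{**{f"{rname}.{k}": v for k, v in t.items()},
             **{f"{sname}.{k}": v for k, v in u.items()}}
            for t in r for u in s]


def select(rel: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    return [t for t in rel if pred(t)]


def project(rel: List[Row], *attrs: str) -> Set[Tuple]:
    return {tuple(t[a] for a in attrs) for t in rel}


# Algebra: the user prescribes a sequence of operators.
algebra = project(
    select(product(P, "P", S, "S"),
           lambda t: t["P.P#"] == t["S.P#"] and t["S.SN"] == "M"),
    "P.PN")

# Calculus: the user states the property of the result; the student tuple
# is bound by an existential quantifier, written here with any().
calculus = {(p["PN"],) for p in P
            if any(s["P#"] == p["P#"] and s["SN"] == "M" for s in S)}
```

Both expressions evaluate to the same one-tuple relation. The difference lies only in whether the user prescribes the evaluation steps or describes the result.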
English
and
system
of
binary
based
In
implemented
implemented
directed"
level
is
compact
more
language
%he
"M"P
the
(
It
following
is
a
a
shown)
somewhere
relations
is
stored
QUEL
~WsynTax
impression the
n-ary
has
119/.
high
a
optimization
What
In
a
hls
DAMAS
I%
with
Toronto,
provides
implement
i%Ives.
it
/S3)
~%
calculus,
turn
systems
is is
supports
offers
nine
SQUARE
in
SQUARE
from
which
interfaces
tions
XRM
the
which
supporting
/110/.
developed
user
relational
which
model
Of
approach
/111/.
management)
the
file.
experiments, an
management
flat
data
early
vlmapping"v
algebra
neous
flat
is are
in
the
variables
quantifications
result in are
relation the
~)
predicate
applied
by
a
unary
calculus defaulT.
relatien. sense
Clearly
ever
which
In
SEQUEL,
All
of
the
"mapping"
FROM
P
WHERE
P.P#
IN
SELECT
P#
FROM
S
WHERE
S.SN
=
nine
systems
tion
research
data
solution
of
pointed
out
for
the
three
ing
research
Some
above~
may
model.
In a
there
First
data
effort
this
graph
is
most
DIAM
that
their
already
ZETA
first
genera-
contribution
to
significant
as
development,
we
know
system
to
manipulation called
DIAM
that
At
least
such
SEQUEL
is
on@o-called
model
oriented
entities.
formulated
P{PN}
FERAL
the
where
recently on
model
/82/.
graph
system~
medel~
A very
IMS and describes
which
interesting a
query
nition
This
generated
SN
allo~ on
hierarchical
DIAM
as
languages
of
work
query of
a
the
their which
Language}
continues
language graph
{or
in
binary of
preceding
as an
/157-159/.
composition
the
=
rela--
relations
subsection
can
oP
very
and to
QUEL
a
data
In
query
data
to
another form
Nice~
the
system
given
computer~ can
DBTG
then
in to
be
a
and
students
one
comparison.
Implements
the
with
similar
language
to
model
/123/.
is
definition
data
least
language
developed
map
at
manipulation
similar
a
professors need
developed
offers a
VM1 ;
between
IS/I
research
language
language
possibly
is
some
CRM
Independent
its
for
query
connection
descrlbed of
language
PS
where
top
usin~
follows:
identifier
McGee
The
as
property
example
for
the
/72/.
FERAL
query
as
discuss
(Representation
interesting
FERAL
establishes
single
with
The
in
RIL
will
are
model,
languase II
activities
we
data
with
tional}
form,
called
follow--on and
follow-on
subsection
between
fers
be
though
by
research
oriented
has
es
might
means
INGRES. The
;
Systems
FERAL
a
This
Increased
planned.
Senko's
be
be
what
problems~
SEQUEL~
mentloned~
usin G
its
systems. base
systems
Non-CRM
already
are
base data
is
represent
'M';
R,
3.3.
data
obtain:
PN
the
As
we
SELECT
the
System
approach~
a
SIMS
/194/
language.
The
their
internal
hierarchical
accessed
by
the
with A
graph
advantagconceptual
which data
ofdefi-
form
and
conceptual query
language
without
actually
tures
ob~ee%ives~
which
though
SIMS
report
generation,
reports
is
computer level
3.4.
User
%his
A
a
by
design
SIMS
most
of
p~oblem Dana
data~
meets
other
wlth
these
experimental
fea-
systems~
implementations.
i~you%s
which
and
specifically
to
Presser
of
with
report
designed
computer
solve
for
about
this
generated
the an
task
help
of
a
interesting
/46/.
Aspects
we
apply
missed
earlier
the
natural.
interface
of
Into
access
will
discuss
specific
some
data
technique
manipulation
with
have
as
CRM
efforts
has
general
purpose
programming
a
powerful
their
75/.
build
an
question, of
%rac% the
of
might
groups.
respect
is
system
management imental
languages
%o
the
whose
interface
to
Further
approach
The
feasibility
language"
is
Thompson,
found
of
the
subject
in
at
a
as
Is
and
systems
offered fo
for
traditional
best
by at
way
a
It
Kraegeloh
in
natural
report
~nd
lan--
some
is
ZETA
user
R~ND~Z-a
as
natural its
about
±he
data
exper-
/184,
Implemented language
at-
called
TORUS
uses
%o
believe
least
system~
already
of
proposes system
the
the
/42/.
natural
base
researchers
language
are
protection inclusion
/149/.
whether
Some
data
efforts
Schauer
data
APL
is
Toronto.
147,
lin~uis-
/156/.
~tcommunicatinn
with
TO r a t h e r
sceptical
data
manipulation
languages
many
applicable~
since
"universe
of
%he
in of
implemented
which %o
be
top
/59/.
attractive
Petrick
systems,
the
syntax
natural
being
study
language
more
developed
references can
a
and for
lan-
the
these
develop
language
query
combine of
proposal
computer.
in
to Two
the
%o
open~
formal the
freedom
such
being
/131/.
a
embed
a
language on
still
to
to
language
llke
data
computer
currently
languase
tLc
a %he
proposes
which
language
that
goal
query
currently
target
facilities.
ALGOL
CRM
groups
make
Codd
an
rigorously
end--user
possibility
guage~
I02/.
is
defining
Its
describes
measurement
which
all
VOUS~
into
Interactive
evaluation
way
research
Earley
structures
as
computational
specific
/44~
data
the
research
with
mechanisms
of
i~e=
the
use~®
guage
A
the
non--trivial
section
series
to
a~e
of
langua@e
deslgners the
a
one
seems
high
In
Is
converting
of
the
computer
considerations. these
discourse"
considerations is
essentially
in In
natural the are
case not
restricted
to
the
simply
A
objects
completely
IS
to
119,
graphical
form.
Into
spaces
free
formulated
easily used
a
menu
extended method.
and has
It
by
Is
to
wlth
geographic
can
point
user
obtain
The
questlon~
cessful Tigations chology of
the
(or
slight
3.5
As
pointed
ferent
out
data
languages. one
(CRM)
guages
are
attribute name
the
form
the
of
McDonald
display
device,
CRM
by
is
a
a
can
be
McDonald's
query°
such
the
help
Schauer's of
Zloof's is
system
asso--
in
displayed
approach
abstract
which map
while
/143,
that
questions
the
contents)
to
questions
skill
more
to
suc-
Inves-
experimental
question
indicate
Is
reasoning.
Of
methods
posed
opposed
users
In
(with
information
within
the
to
The
some
"examples"
modiflcatlon 19
in
entities.
the
as
of
and
which
queries
error.
oriented
employ
fills
picture
device
answered
to
user
of
/2S/
relations
Simple
and
graphic
unskilled the
Zloof~
llke
subareas
seems
the
are
easily this
CRM
very names
followed
be
which
and
illustrated
{or
"can
schemata.
To
a
of
psy-
183/. of
One
syntax
are
semantics
o~
a
are
/143/.
Equivalence
can
for
in
described
a
the
semantics
user-interface
answers
earlier~
corresponding
diagram
GADS
to
for
models
equivalence
and
by
stored
probability
display
or
independent
Model
a
the
extension
locations
of
taken
Example
the an
use
experiments
significance
pate
is
way,
generally
significant
flew
unbiased
reported
more
low
cannot
under
find
a a
one
another
are to
wlth
related
whether
than
base
description.
locations.
%o
information
By
draw
to
a
data
requires
of
Query
expresses
natural
is
relation
example
clated
the
method
description
the
) which
query
Their
Zloof's of
CUPID,
the
the
In
in
approach
149/o
display
stored
dictionary,
different
/198,
used
verbs
data
structured
Schauer
of
and
by
similar and a
we
the to
introduce
SEQUEL.
followed
In A
will
the
for CaM
we
variable by
an
the
briefly
two graph
deal is
that
dif-
respect
indicate of query
with
name.
query
languages, Both
relation by
to that
the
model,
denoted
attribute
know with
equivalence
informally
(GRAPH)
variables.
period
to
we
equivalent
we
extended
other
examplesv
made")
Subsequently be
end
and
by
be
a
lan-
names, relation
Example
S
relation
SN
S.SN
In
GRAPH
we
~elatlons), a
period
name
~ttribute variable
deal
with
A
denotation
set
followed
by
relation
name
a
~elatlon
denotation.
with
obvious
the
the
name
or
sets
(unary is
relation
a
relation A
meaning
set
name
that
A
followed
denotation
the
and or
denotation.
name
set
relations)
a
variable
name
relation
by
may
relations
set
a
also
be
'W~unsU
is
a
followed
by
a
used over
by
denotation
period
a
(binary
followed
as all
a
variable
elements
of
set.
Example S~
S.SC~
PS~
It
should
while
The
CRM
be
noted
is
bound
period
ls
composition
A
query
in
~rom
is
of
the
CRM
names~ with
In
and fhe
GRAPH
set
is the
both
arise.
to
to
in
the
definition
of
sets
levels,
FROM
llst
a
The
recursive
used
as
the
operator
for
functional
right.
of
predicate
is and help
languages
variables)
form:
list[
~he
is
languages
listl
a
(or
relations
GRAPH
two
both
in
with
ambiguity.
that
~elatlons
denotations
built
In
llstl
sets
PS.SC.CN
left
SELECT
In
S.SC.CN
FSoSC~
llst2
WHERE
attribute is
over
predicate:
names~
variables
list2
is
a
which
can
list be
of
relation
built
startin@
lls%2®
list the of
the
of
relation
predicate relations
use
subsequent
of
is
denotationst over
starting
subscripts
examples
are
set wlth
may such
list2
is
denotations the
sets
a
list
of
which
can
he
in
llst20
be
necessary
to
that
ambiguity
does
avold not
Query 1: Name of the professor who advises student 'M'.
CRM:   SELECT PN FROM P,S WHERE P.P# = S.P# and S.SN = 'M';
GRAPH: SELECT PN FROM P WHERE P.PS.SN = 'M';
This
simple
the
two
query
data
lationshlp of
may
used
be
GRAPH
illustrates
models.
between
pertles
these
will
Query
2
CRM
uses
Names
are
of
the
while
in
graph
the
CRM
apparent
courses
essential
wlth
the
by
attended
do
a
between
logical
re-
unique
por--
of
these
make
we
some
help
model
as
in
difference
%hat
%he
to
has
composition
more
VMt
requires
encoded
Therefore,
functional even
=
normalization
entities
become
P.PS.SN
already
entities
directly.
simply
This
WHERE
relationships
comparison natural
in
where
language.
query.
next
students
which
are
advised
by
vBm°
CRM:   SELECT CN FROM P, S, SC, C WHERE P.PN = 'B' and P.P# = S.P# and S.S# = SC.S# and SC.C# = C.C#;
GRAPH: SELECT PS.SC.CN FROM P WHERE P.PN = 'B';
The
brevity
should~ the
and
graph
model with
between can
then
ies
in
a
implement user
a
be
over
CRM.
macro in
accept
GRAPH
simple the
P
terms
their and
on
advantages
Is
top of
of the
as
these
a
%0
the
superiority extend
definitions
encodings.
algorithm.
to
essential
possible
accepts
CRM
WBt ;
compared
an
convert
forward
language
it
which
=
form
conclude
fact,
queries
the
P.PN
GRAPH
to
In
of
WHERE
the
used
processor
straight
GRAPH all
of
not
entities
has
FROM
elegance
however~
language
the
PS.SC.CN
Thls queries
With
other
macro
into
model.
the
The
CRM
relations processor
CRM
words,
CRM implementation
graph
of
querwe
such
can that
differences
of
the
is
primarily
away
languages
with
the
and
of the
other
are
of
of
sections
underlying
syntactical
help
implementation. quent
a
thei~
nature
syntax
little
other
deserve
and
they
Issues
of
relevance
questions a~e
appear
since
macros.
practical
Many
models
in
like
the
on can
one
level
be
data
glven those
process
a
the
transformed model
versus
right
sort
discussed
of
which
in
receiving
of
subse-
more
atten-
tion.
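The remark that GRAPH queries can be converted into CRM queries by a straightforward macro algorithm can be illustrated with a small Python sketch. The edge table is hypothetical. It assumes that the GRAPH relationship PS corresponds to the join P.P# = S.P#. It also assumes that SC corresponds to the joins S.S# = SC.S# and SC.C# = C.C# through the relation SC, which is how the CRM forms of queries 1 and 2 connect the relations. The expander handles only a single equality predicate. It does not introduce separate tuple variables when two paths pass through the same relation, which a real macro processor would have to do.

```python
from typing import Dict, List, Tuple

# Hypothetical edge table: (from relation, GRAPH relationship) ->
# (target relation, intermediate relations, CRM join conditions).
EDGES: Dict[Tuple[str, str], Tuple[str, List[str], List[str]]] = {
    ("P", "PS"): ("S", [], ["P.P# = S.P#"]),
    ("S", "SC"): ("C", ["SC"], ["S.S# = SC.S#", "SC.C# = C.C#"]),
}


def walk(start: str, steps: List[str]) -> Tuple[str, List[str], List[str]]:
    """Follow GRAPH relationships, collecting relations and join conditions."""
    rels, conds, cur = [start], [], start
    for step in steps:
        target, via, joins = EDGES[(cur, step)]
        rels += via + [target]
        conds += joins
        cur = target
    return cur, rels, conds


def graph_to_crm(select_path: str, from_rel: str,
                 where_path: str, value: str) -> str:
    """Expand SELECT <path> FROM <rel> WHERE <rel.path> = '<value>'."""
    *sel_steps, sel_attr = select_path.split(".")
    head, *whr_steps, whr_attr = where_path.split(".")
    if head != from_rel:
        raise ValueError("WHERE path must start at the FROM relation")
    sel_rel, rels1, conds1 = walk(from_rel, sel_steps)
    whr_rel, rels2, conds2 = walk(from_rel, whr_steps)
    rels = list(dict.fromkeys(rels1 + rels2))      # keep order, drop repeats
    conds = list(dict.fromkeys(conds1 + conds2))
    conds.append(f"{whr_rel}.{whr_attr} = '{value}'")
    return (f"SELECT {sel_rel}.{sel_attr} FROM {', '.join(rels)} "
            f"WHERE {' and '.join(conds)}")
```

For example, `graph_to_crm("PS.SC.CN", "P", "P.PN", "B")` returns `SELECT C.CN FROM P, S, SC, C WHERE P.P# = S.P# and S.S# = SC.S# and SC.C# = C.C# and P.PN = 'B'`. That is the CRM form of query 2 with the conditions in a different order.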
4. SYSTEM PROBLEMS
4.1 Introduction
The major problems
concurrent
access
gram
management
with
locking
and
last7
enough
in to
a data
and and
hut
data
shared
not
leasfT times
system
by
scheduling,
like
many
with error
with
high
enough
the
whole
to
make
IMS
users~
system
or
recovery
response
base
are
with
data
with
pro-
Integrity~
independence
data
transaction system
with
application
enforced
isolation~
connected
rates
short
and
attractive
for
the
user.
The
implementation
may
turn
natural full
out that
data
in
wlth
fact
outside
the
area
nection
to
provide
does
and
high
reference
of
data
independence
attached
at
a
purposes~ is
Among
and to
of
the was
all
least
storage
though of
the
systems at
supported
SEQUEL~
nearly
therefore
portion
conception
activity in
It
large
functions,
independence
solutions
so
systems
models
to
data
experimental
manpower,
original
follow--on
will
with
Its
for
and aim
system in
%o
the
we
references the
ce~pect
even time
the
be[n~ problem
above.
base
data
sections
in
projects
DIAM
researchers
that
data
system~
management 3~
R7
plans
a
costly
research
base
mentioned
tional
few
System
expePimental~
The
such
quite
section
ambitious
structures.
areas
of be
only
of
set
mentioned very
to
multi-user research bibliography,
a
far
have
not
not
mean
that
level
and
systems. in
query
considerable
the
area
data In
developed they
languages. number
integrity
addition~ Of
full
have
security
of
In
and
subsequent
recovery mender
opera--
problems
relevant
and the
size
ignored
pafers in will
authorization
in confind in
4.2 Data Independence
A
data
it
/175/
base
allows
without
system
transformations
also
dence
{a
affecting
correctly
is
in
model
it
for
is
with
widely
organization
and
to
changes
form
fact
/182/.
tha%
The of
its
links
or
inverted
5).
Every
such
direct
mix
of
application
there
will
be
a
The
need
new
types
schema.
of In
affected~
ers
for
base.
or
example~
There
since may
data
the
old
may
main.
Certainly,
ments
into
many
are
be
be
a
designed
a
many
least
s may
suchs
or
indepento
Is
data
not
the data
of
a
affect
they
rely
%o
which
the of
the
In
on
CRM
of
which
conceptual a
binary
programs Consid-
only new
data
read
and
new
model~ relation
programsy
if
many
one
to
the
insert
the
data~
model.
Informationv
otherwise
of
conceptual
conceptual
the
update
programs
old
a
tlme~
additions
the
unaffected.
which
the
the
ad-
for
wlth
to In
domains
only
see stoP-
base
application
a
of read
In
data
due
changed
since
structures;
changes
remain
{projection}
(for
organization.
programs~
programs
some
mix
%he data
depends
organization
existing may
internal
program
changes
domain
alter
constraint
even
the
data~ of
implemented
The
arises
of
the
redundancy
internal the
of
best
storage
data
The
application
consequence
activities,
some
to
form
a
means
the
part
of
directly
other
generallyv
changes
one
that
data
application
ape
Since
aftected~
Other
an
independence
between
are
¢o
that
independence,
absolutely
more
subview fop
Is
internal
large
need
dependency
entered.
constraint
a
no Is
at
the
least
different
relations
the
files
addition
he
model
be
there
cannot
the
should
already
relaxing
at
stlll
no
paths
programs.
conceptual
while
internal
optimize
Informations
generals
are
the
update
to
adapt
for
independence
implementation
to
need
data
i.eo
of
access
additional
attempt
given
data
conceptual
conceptual
of
performance
via
will
of
sense
respect
independence
is
section
requires
T
wlth
certain
there
example
mlnls±rator
~ence
a
which
transformatlonsT
Of
invariantT
on
i.e,
many
the
data
stays
heavily
In~s
how
that
forms
in %he
in
claimed,
internal
respect
before
programs
selection
extent
conceptt~l
transformations. of
the
programs
correctly
clear
the
%o
or
existing
Independencev
conceputal
recognized
run
the
of
data
internal
of}
makes
between
the
the
independence
independence
sometimes
internal
progeams while
the
This
We d i s t i n g u i s h need
after
consequence
as
of
which
effect
transformations. automatic
data
maximum
programs~
non-affected run
supports
This since
new new
do-ele-
Information for to
these
examplev a
many
%o
programs
constraint.
Support of Conceptual Data Independence

Supporting conceptual data independence requires a system that can determine, after a change of the conceptual schema, which programs are affected and which are not. For programs written in general-purpose programming languages this decision problem is, in general, not solvable. It becomes solvable only if the languages are restricted so that the data accessed by each program can be derived from its text. Similar decision problems have been studied in other contexts /53, 65/.
Support of Internal Data Independence

1. Internal data independence can be supported if the data definition is separated into a conceptual schema and an internal schema, and the system itself carries out the mapping between the two. The query language then refers only to the conceptual schema: the user specifies which data he needs, not how they are to be accessed. During execution the system determines which access paths exist (for example, which attributes are inverted) and exploits them. The user need not know whether an attribute is inverted or not, and the data base administrator may add or drop inversions without affecting application programs. The price is that the system must select an access strategy, "optimally" in some sense, which introduces overhead at execution time. This approach is followed by almost all experimental relational systems /175/.
2. A second, more flexible approach to data independence is data translation. Data stored in one form (under one data model or organization) are converted into another form by means of a mapping, which is specified in a data definition and translation language. Several projects have developed such languages for hierarchical as well as for network (DBTG) data structures; some of them describe data by context-free grammars. Work in this area includes the data translation project at the University of Michigan (Fry, Merten, Navathe, Taylor), the work of Smith and of Ramirez et al., and the languages DEFINE and CONVERT developed by Housel, Lum and Shu. Experiments have also been run over the ARPA network /95, 108, 126, 133, 142, 165, 166, 169, 177, 179, 194/. This work becomes more important as the size of existing data collections grows, since such collections cannot be converted to new systems without translation tools, and as more data bases must be shared between different computers.
4.3. Data Integrity and Recovery

Though data integrity and recovery problems exist in every data management system, they are increased in a multi-user system, where an adequate solution is essential.

The notion of data integrity can be made precise by consistency rules (integrity assertions). These are statements about the data base that must hold in every consistent state. Simple rules refer to a single object, for example that an age is a positive number. More complex rules relate different objects, for example "the birthdate of a person's father precedes the birthdate of the person", or "the sum of the expenses of a department cannot exceed the budget allocated to it". A data base state is consistent if it satisfies all consistency rules.

A straightforward approach, proposed in /66/, is to provide a language in which the consistency rules are specified, and to have the system check the rules whenever the data base is modified. This approach raises three problems. First, the rule language: if it has the power of the predicate calculus, it is undecidable whether a given set of rules is itself consistent. Second, the cost of checking: for a large data base, checking all rules may take hours of processing time, so only the rules affected by a modification should be checked. Third, consistency is required only at certain points in time. While a transaction is in progress, rules may be temporarily violated, and they must hold again before its modifications become visible to other users.
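The two example rules, together with the requirement that rules hold only at the end of a transaction, can be sketched as follows. The state layout (persons by name, expenses and budgets per department) and all function names are hypothetical and chosen for illustration. The sketch also re-checks every rule after each transaction, which is exactly the cost the second problem warns about.

```python
import copy
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Person:
    name: str
    birthdate: date
    father: Optional[str] = None  # name of the father, if recorded


# Hypothetical state: (persons by name, expenses per department, budget per department)
State = Tuple[Dict[str, Person], Dict[str, List[float]], Dict[str, float]]


def father_precedes(persons: Dict[str, Person]) -> List[str]:
    """Rule 1: a person's father is born before the person. Returns violators."""
    violators = []
    for p in persons.values():
        father = persons.get(p.father) if p.father else None
        if father is not None and not father.birthdate < p.birthdate:
            violators.append(p.name)
    return violators


def over_budget(expenses: Dict[str, List[float]], budget: Dict[str, float]) -> List[str]:
    """Rule 2: a department's expenses do not exceed its budget.
    Assumes every department with expenses has a budget entry."""
    return [d for d, items in expenses.items() if sum(items) > budget[d]]


def consistent(state: State) -> bool:
    persons, expenses, budget = state
    return not father_precedes(persons) and not over_budget(expenses, budget)


def run_transaction(state: State, transaction: Callable[[State], None]) -> State:
    """Apply the transaction to a private copy. Rules are checked only at its
    end, so intermediate steps may violate them. An inconsistent result is
    discarded and the old state is kept."""
    new_state = copy.deepcopy(state)
    transaction(new_state)
    return new_state if consistent(new_state) else state
```

A transaction that adds an expense and raises the budget in the same unit is accepted, even though the data are over budget between its two steps. Adding the expense alone is rejected.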
The first problem can be eased by restricting the rule language. The second requires an analysis of the rules: for each type of modification, the system must determine which rules can be affected and what must be checked. A rule like "a person's father is born before the person" can be checked by inspecting only the modified tuple and the tuples directly connected to it. A rule like "no person is his own ancestor", which requires the father relation to be cycle-free, is more costly. A straightforward check takes time proportional to $n^3$ for $n$ persons, whereas only the subgraph connected to the modified tuples actually has to be examined. Stonebraker /176/ has proposed to enforce integrity rules by query modification: before execution, the user's update is transformed into one that can only produce a consistent state.

The third problem is complicated by concurrent access. A transaction is a sequence of operations of one user that transforms a consistent data base state into another consistent state. When transactions of several users are executed concurrently, they may interfere, and a user may see or act on an inconsistent state produced by another user. The basic known solution is locking: a transaction obtains exclusive access to the objects it modifies. Simple facilities of this kind are provided by existing systems such as IMS; a more complete analysis has been given by Eswaran et al. /65/.
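A complication of locking in data base systems is that a transaction may have to lock objects that do not exist yet, for example records that another transaction is about to create. Such objects cannot be locked individually. Eswaran et al. /65/ therefore propose to lock sets of objects described by predicates. These sets may be potentially infinite, and two predicate locks conflict only if some object could satisfy both predicates. Locking, however, creates the danger of deadlock: a situation in which each of a set of processes waits for a resource held by another process of the set, so that none can proceed. Deadlocks have been studied extensively in operating systems, where they are resolved by preemption, i.e. by taking resources away from one process and giving them to another. In a data base system this requires backing the preempted transaction up to a state in which it did not yet hold the resource /29, 30, 83/.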
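A minimal sketch of a predicate-lock conflict test, assuming that predicates are conjunctions of closed ranges on individual attributes. For such predicates, "some object could satisfy both" reduces to the ranges overlapping on every shared attribute. `PredicateLock`, `LockTable` and the range representation are hypothetical. General predicates would require a satisfiability test that this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed range [low, high] of one attribute


@dataclass
class PredicateLock:
    owner: str                   # transaction holding or requesting the lock
    exclusive: bool              # True for update, False for read
    ranges: Dict[str, Interval]  # conjunction: attr in [low, high] for each attr


def may_overlap(p: Dict[str, Interval], q: Dict[str, Interval]) -> bool:
    """True if some (possibly not yet existing) tuple could satisfy both
    predicates. Attributes constrained by only one predicate never exclude."""
    for attr in p.keys() & q.keys():
        (p_lo, p_hi), (q_lo, q_hi) = p[attr], q[attr]
        if max(p_lo, q_lo) > min(p_hi, q_hi):
            return False
    return True


def conflicts(a: PredicateLock, b: PredicateLock) -> bool:
    if a.owner == b.owner or not (a.exclusive or b.exclusive):
        return False
    return may_overlap(a.ranges, b.ranges)


class LockTable:
    def __init__(self) -> None:
        self.granted: List[PredicateLock] = []

    def request(self, lock: PredicateLock) -> bool:
        """Grant the lock unless it conflicts with one already granted
        (a refused requester would have to wait; waiting is not modelled)."""
        if any(conflicts(lock, g) for g in self.granted):
            return False
        self.granted.append(lock)
        return True
```

A reader locking "salary between 1000 and 2000" thus blocks another transaction from inserting a record with salary 1500, even though that record did not exist when the read lock was granted.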
As in operating systems, there are three ways to deal with deadlocks /29, 67/. They can be prevented by preclaiming: a process must claim all the resources it requires before it starts, and it is given either all of them or none. They can be avoided by a scheduler that grants a request only if the resulting state cannot lead to a deadlock. Or they can be allowed to occur, detected, and removed by preempting one of the processes involved. Each of these approaches imposes restrictions in a data base system. Preclaiming, for instance, requires that the set of objects a transaction will need is known in advance, although this set may depend on the data the transaction reads, and whether two claimed sets overlap can only be decided by evaluating predicates. Performance considerations should dictate which approach is chosen. It should also be noted that the mechanisms for backing out a preempted transaction are essentially the same as those needed for recovery, so they can serve both purposes.
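For the detection approach, a common formulation (assumed here; it is not stated in the text) is a wait-for graph with an edge from a waiting transaction to the transaction holding the object it needs. A deadlock then corresponds to a cycle in this graph. The sketch below finds one cycle by depth-first search. `find_deadlock` is a hypothetical name, and choosing which transaction in the cycle to back out is left out.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of transactions, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {t: WHITE for t in waits_for}
    for targets in waits_for.values():
        for t in targets:
            color.setdefault(t, WHITE)
    path: List[str] = []

    def visit(t: str) -> Optional[List[str]]:
        color[t] = GREY
        path.append(t)
        for u in sorted(waits_for.get(t, ())):
            if color[u] == GREY:  # edge back into the current path: a cycle
                return path[path.index(u):]
            if color[u] == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in sorted(color):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

The search visits each transaction and each wait edge once. This makes it cheap enough to run whenever a lock request has to wait.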
Recovery is necessary whenever a transaction does not terminate normally. A transaction may fail because of an error in the program itself (for example, a zero-divide), because the system backs it out (for example, to resolve a deadlock), or because of a system or hardware failure. The objective of recovery is to restore a consistent data base state from which the effects of the failed transaction are removed, without unnecessary loss of the work of other transactions. Failures should be isolated so that they do not propagate to other transactions. Recovery algorithms are usually based on checkpoints and on journals in which the modifications of the data base are recorded. Recovery in data base/data communication systems has been described by Bjork, Davies and Sayani /50, 81, 83, 148/, and the facilities of MULTICS are another example. The burden of recovery should not be placed on the application programmer. Whenever a failure occurs, the system must restore the data base and either restart the affected transactions or inform the user which of his input has been processed.
In particular, the interactive user (the problem solver) should be informed only in terms of the query language he uses. He should not be required to know anything about the internal data structures.
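A minimal sketch of transaction back-out from a journal of before-images, assuming an in-memory key/value store. It also assumes that exclusive locks are held until commit, so no two active transactions write the same item. `JournaledStore`, `write`, `commit` and `back_out` are hypothetical names. The sketch covers only the undo of one transaction. Checkpoints and recovery from system or hardware failures, which need a journal kept on separate storage, are not shown.

```python
from typing import Any, Dict, List, Tuple

# Journal entry: (transaction, key, before-image, whether the key existed before)
Entry = Tuple[str, str, Any, bool]


class JournaledStore:
    def __init__(self, data: Dict[str, Any]):
        self.data = data
        self.journal: List[Entry] = []

    def write(self, tx: str, key: str, value: Any) -> None:
        # Record the before-image first, then update the data.
        self.journal.append((tx, key, self.data.get(key), key in self.data))
        self.data[key] = value

    def commit(self, tx: str) -> None:
        # The transaction's undo entries are no longer needed.
        self.journal = [e for e in self.journal if e[0] != tx]

    def back_out(self, tx: str) -> None:
        # Undo the transaction's writes in reverse order using the before-images.
        for owner, key, before, existed in reversed(self.journal):
            if owner != tx:
                continue
            if existed:
                self.data[key] = before
            else:
                self.data.pop(key, None)
        self.journal = [e for e in self.journal if e[0] != tx]
```

The same `back_out` serves both purposes named in the text: removing a failed transaction, and backing out a transaction preempted to break a deadlock.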
5. STORAGE STRUCTURES AND SEARCH ALGORITHMS

Research in this area aims (a) at storage structures that reduce storage costs and the time required for retrieval and update, and (b) at search algorithms that reduce the number of accesses needed to answer a query. These two topics are discussed in the following subsections.

5.1. Storage Structures

Most storage structures used in data base systems derive from well-known techniques: inverted lists, hierarchical indices and hashing. Recent years have brought several improvements. B-trees, introduced by Bayer and McCreight /9/, are balanced multiway trees. Indices organized as B-trees allow retrieval, insertion and deletion with a number of accesses that grows only logarithmically with the size of the file. Quad trees, described by Finkel and Bentley /77/, allow retrieval on composite keys, and Haerder has studied index structures that support both retrieval and update /90/.
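The logarithmic behaviour of B-trees can be made concrete with a worked bound. In a B-tree of order $k$, every node except the root holds at least $k$ keys (so it has at least $k+1$ sons), and the root holds at least one key. A tree with $h$ levels therefore contains at least $2(k+1)^{h-1}-1$ keys. The sketch below inverts this bound to give the largest possible number of levels, and hence of index accesses per search, for $n$ keys. `max_levels` is a hypothetical name.

```python
def max_levels(n_keys: int, k: int) -> int:
    """Upper bound on the number of levels of a B-tree of order k
    (every node except the root holds at least k keys) with n_keys keys."""
    if n_keys < 1 or k < 1:
        raise ValueError("need n_keys >= 1 and k >= 1")
    h = 1
    # A tree of h + 1 levels holds at least 2 * (k + 1) ** h - 1 keys.
    while 2 * (k + 1) ** h - 1 <= n_keys:
        h += 1
    return h
```

With $k = 50$, one million keys fit in at most four levels, because five levels would already require at least $2 \cdot 51^4 - 1 = 13\,530\,401$ keys.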
Hashing has recently been studied extensively by Lum et al. /112, 113/, who compared key-to-address transformations on large existing files; among the methods compared, the division method generally gave the best performance. Ghosh has introduced the consecutive retrieval organization, in which the records relevant to each query are stored in consecutive locations. For multi-attribute retrieval, inverted lists remain the primary technique. Methods for reducing their storage requirements, for example by compression, and for accelerating their processing have been proposed. Combinations of these structures, e.g. of links and inversions, are also under investigation.
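A minimal sketch of the division method of hashing: the bucket address of a key is the remainder of the key divided by the number of buckets. The function names are hypothetical. The example illustrates standard textbook advice (assumed here, not taken from the cited studies) that the divisor should share no factor with regularities in the keys. Keys that are all multiples of 10 crowd into a tenth of 100 buckets but spread perfectly over 101 buckets.

```python
from typing import Dict, List


def division_hash(key: int, n_buckets: int) -> int:
    """Division method: the bucket address is the remainder key mod n_buckets."""
    return key % n_buckets


def load_buckets(keys: List[int], n_buckets: int) -> Dict[int, List[int]]:
    """Distribute keys over buckets 0 .. n_buckets - 1."""
    buckets: Dict[int, List[int]] = {b: [] for b in range(n_buckets)}
    for key in keys:
        buckets[division_hash(key, n_buckets)].append(key)
    return buckets


def longest_chain(keys: List[int], n_buckets: int) -> int:
    """Largest number of keys sharing one bucket (a crude measure of overflow)."""
    return max(len(b) for b in load_buckets(keys, n_buckets).values())
```

For the keys 0, 10, ..., 990, `longest_chain(keys, 100)` is 10, while `longest_chain(keys, 101)` is 1.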
The basic techniques are well understood and are described in surveys and textbooks /54, 99, 100, 118, 120, 135, 160/. The remaining problem is not a lack of storage structures but the relationship between storage structures and data independence. In a system with data independence, the application program (or query) is formulated against an external or conceptual form of the data, and the responsibility for binding it to the internal storage structures lies with the system. The system must then use the available storage structures optimally, i.e. reduce the number of secondary storage accesses needed to evaluate the query. This problem, the reduction problem, is discussed in the next subsection.
5.2. The Reduction Problem

The term "optimization" is used here in the same sense as in compiler construction. It does not mean that an optimal solution is found, and unrealistic expectations should be avoided. Variations of the problem arise in the handling of intermediate results. Consider, for example, the expression

$(A\ op_1\ B)\ op_2\ (C\ op_3\ D)$

where $A$, $B$, $C$ and $D$ are relations and $op_1$, $op_2$, $op_3$ are operators of the relational algebra. A straightforward evaluation algorithm first constructs the two intermediate relations $AB = A\ op_1\ B$ and $CD = C\ op_3\ D$, stores them temporarily in auxiliary storage, and then evaluates $AB\ op_2\ CD$. The intermediate relations may be large, in some cases far exceeding the size of the data base itself. Building them then requires an enormous number of accesses and a large temporary storage area. Evaluation algorithms that exploit inversions and avoid constructing intermediate relations can bring drastic improvements. Most research so far is oriented towards interactive queries and assumes the relational algebra or calculus as the underlying query language.
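The straightforward algorithm for an expression tree such as $(A\ op_1\ B)\ op_2\ (C\ op_3\ D)$ can be sketched as a bottom-up evaluator that materializes every intermediate relation and records its size. The evaluator below is a hypothetical illustration with only two operators, natural join and selection, and relations held as lists of dictionaries. It shows how the order of operators determines the size of the temporary relations, not how any cited system optimizes.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, object]
Expr = Union[List[Row], tuple]  # a stored relation, or (operator, operand, ...)


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Join on all attributes the two relations share (product if none)."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


def evaluate(expr: Expr, temp_sizes: List[int]) -> List[Row]:
    """Straightforward bottom-up evaluation: every intermediate relation is
    built in full, and its size is appended to temp_sizes."""
    if isinstance(expr, list):  # a stored relation: nothing to build
        return expr
    op, *args = expr
    if op == "join":
        result = natural_join(evaluate(args[0], temp_sizes), evaluate(args[1], temp_sizes))
    elif op == "select":
        pred: Callable[[Row], bool] = args[1]
        result = [row for row in evaluate(args[0], temp_sizes) if pred(row)]
    else:
        raise ValueError(f"unknown operator {op!r}")
    temp_sizes.append(len(result))
    return result
```

Take 100 employees spread over 10 departments, and a query for the employees of the one department located in Paris. Joining first and selecting afterwards builds a temporary relation of 100 tuples. Selecting the department first builds temporaries of only 1 and 10 tuples, and both orders return the same 10 rows.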
One of the earliest reduction algorithms is due to Palermo. He transforms a query of the relational calculus into a sequence of elementary operations such that each tuple of a relation has to be accessed only once. Astrahan and Chamberlin /6/ have implemented a version of such an algorithm in an experimental system. Other researchers start from a query given as a boolean expression on attribute values. Following Wong and Chiang, the expression is brought into a standard form whose elementary terms can be evaluated by merging the inverted lists of the attributes involved /5, 44, 75, 84, 89, 125, 140, 180, 195/. Most of these investigations assume that secondary storage accesses are the bottleneck and neglect CPU time. With large inversions, however, the CPU time needed for merging lists may itself become the bottleneck, so this assumption is not always valid.
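Evaluating a boolean query by merging inverted lists can be sketched as below. Each inverted list is assumed to be a sorted list of record numbers. AND and OR become linear merges of two lists, and a conjunction of several terms starts with the shortest lists. The shortest-first ordering is a common heuristic assumed here, not a method attributed to the cited authors. All names are hypothetical. The merges are pure CPU work, which is why large inversions can shift the bottleneck from secondary storage to the CPU.

```python
from functools import reduce
from typing import List

Postings = List[int]  # sorted record numbers taken from an inverted list


def intersect(a: Postings, b: Postings) -> Postings:
    """AND: merge two sorted lists, keeping record numbers present in both."""
    i = j = 0
    out: Postings = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: Postings, b: Postings) -> Postings:
    """OR: merge two sorted lists without duplicates."""
    i = j = 0
    out: Postings = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def and_query(lists: List[Postings]) -> Postings:
    """Conjunction of several terms, merging the shortest lists first so that
    intermediate results stay small (a common heuristic, assumed here)."""
    ordered = sorted(lists, key=len)
    return reduce(intersect, ordered[1:], ordered[0]) if ordered else []
```

For example, with the lists for dept = D1 and city = Paris intersected and then united with the list for age = 30, `union(intersect(dept_d1, city_paris), age_30)` yields the matching record numbers in ascending order.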
6. MODELLING AND ANALYSIS

Research in modelling and analysis has as its objective to learn about the behaviour of existing data base management systems, and to predict the effects of changes to a system, or the performance of a proposed system, before it is built. Such models help the data base designer and the data base administrator to evaluate alternatives, for example different storage organizations. The primary tools are simulation and analytical models.

Several simulation tools have been developed /91, 132/. FOREM, developed by Senko and his colleagues /154/, simulates the file organization of a data base and estimates its performance. PHASE is a tool of similar purpose, and Nakamura et al. have built an event-driven simulation package for data base systems. These tools are useful but have their limits: a comprehensive simulation of a complex system may itself become too detailed and too expensive.

Analytical models are tractable only under simplifying assumptions. It is therefore necessary to restrict them to parts of a system or to assume, for example, deterministic behaviour. Cardenas /22/ has analyzed the storage structures of generalized data base systems, Yao /196/ has proposed a general model of file organizations, and Wedekind /193/ has developed analytical methods for storage structures. Lavenberg and Shedler /103/ have represented the DL/I component of IMS by a queueing model in which the I/O activity of the system is modelled explicitly.
A problem investigated frequently, by many authors and under varying assumptions, is the selection of the attributes to be indexed in a flat file. Contributions have been made, among others, by Lum and Ling /114/, Palermo /139/, Stonebraker, and Yue and Wong /197/. A second category of problems concerns the distribution of data sets over the levels of a storage hierarchy so as to minimize cost. Here Chen, Buzen and others have also considered the effects of contention for the disk drives within the hierarchy. A third category concerns the allocation of data in a computer network such as the ARPA network: files are to be assigned to nodes so as to minimize the total of storage and communication line costs, subject to constraints on access time. Casey, Chandra, Chang, Farley, King, Schkolnick, Shneiderman, Stewart and others have contributed algorithmic solutions, heuristic approaches and bounds on the optimal cost /21, 23, 26, 27, 31, 32, 60, 71, 98, 116, 156, 164, 174/.

For all these models, the question remains open how their input data, i.e. the workload of a data base system, can be characterized. Answers can only be found by observing operational systems and collecting statistics and traces, ranging from program address references to disk accesses and calls to the data base system /145/.
From such observations, models of the workload can be derived. Lewis and Shedler /107/, for example, model the arrival of transactions at a data base system as a non-stationary Poisson process (i.e. a Poisson process whose rate varies with time), whose parameters are fitted to measurements of an operational system. Ghosh /86/ has proposed a model of the references of programs to data base blocks. Easton and Tuel /61/ have established relationships between data base reference behaviour and secondary storage usage. Such semi-empirical approaches can be validated by comparing them with observations. It is clear, however, that the characterization of data base workloads is still at its beginning.
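To drive a simulation model with a non-stationary Poisson arrival process of the kind described above, arrival times can be generated by thinning. Thinning is a standard technique assumed here, not one described in this paper: candidates are drawn from a homogeneous process with the maximum rate, and each candidate is kept with probability $\lambda(t)/\lambda_{max}$. The function name and its parameters are hypothetical, and the rate function would in practice be fitted to measurements.

```python
import random
from typing import Callable, List


def nonstationary_poisson(rate: Callable[[float], float], rate_max: float,
                          horizon: float, rng: random.Random) -> List[float]:
    """Arrival times in [0, horizon) of a Poisson process with time-dependent
    rate(t), generated by thinning. Requires 0 <= rate(t) <= rate_max."""
    if rate_max <= 0:
        raise ValueError("rate_max must be positive")
    arrivals: List[float] = []
    t = 0.0
    while True:
        # Candidate arrival from a homogeneous process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        # Keep the candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            arrivals.append(t)
```

A rate function with a busy period followed by an idle one, for instance `lambda t: 10.0 if t < 5.0 else 0.0`, produces arrivals only during the busy period. Passing a seeded `random.Random` makes simulation runs reproducible.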
7. SUMMARY AND CONCLUSIONS

Before we try to summarize the state of research, let us recall the objectives of data base systems. In conventional systems, the application programmer is responsible for the data structures, for their representation on different storage devices, and for the integrity of the data. In a data base system this responsibility is taken over by the system, in the same way as operating systems have made programs independent of devices. The goals of data base research, in particular data independence and data integrity, have been described in the previous sections. They have been convincingly demonstrated in principle, but the complexity of their implementation remains an extremely important factor.
Research on data base systems started only a few years ago, and a large part of it has been devoted to data models and languages. For some time the discussion between proponents of different data models was held in the spirit of a "religious war". This attitude has now changed, and the emphasis is shifting towards questions of implementation and of the man-machine interface. Several experimental prototype implementations have been built to demonstrate the practical viability of the models; their results will help to justify or to reject the approaches before large investments are made. Modelling and analysis, although still at its beginning, is already producing useful tools for the data base designer and administrator.

Comparing current research with the systems offered by industry, one notes a difference in emphasis. Industrial systems are designed primarily for the parametric user, i.e. for large numbers of predefined transactions, whereas research is oriented primarily towards the interactive user, the problem solver.
Conclusions

With the wealth of existing research results, it becomes meaningful to ask: What are the major achievements? What are the recognizable trends? What are the major problems? We are well aware that the reader may disagree with our answers.

Major Achievements

1. Model Development. There is growing agreement on the primary levels of data description (conceptual, internal, external) and on the role of the data base administrator.
2. Multiple Views. Owing to data independence, the same data can be offered to different users in different views, suited to their applications and programming languages. For the problem solver, interactive query languages based on predicate logic offer a high-level means of access.
3. Storage Structures. Results like B-trees can already be found in textbooks (Knuth, Vol. 3, chapter 6).
Recognizable Trends

1. Data Models. After many years of controversy, the coexistence of different data models in the same system is now recognized as a realistic alternative.
2. Performance. Throughput, i.e. the transaction rate a system can achieve, is a major problem of current systems. It has been recognized that the CPU, and not only I/O, can be the bottleneck.
3. Data Dictionary/Directory. Descriptive information about the data is increasingly kept in an integrated dictionary/directory, which offers a central source of information to users and programs. With this trend, data base management and operating system functions, such as resource scheduling, increasingly merge.
2. Data Integrity

It is necessary that the data base system can check integrity rules and can handle their violations. The emphasis here is on functions which allow users and installations to specify such rules, and on techniques which make them enforceable.

3. Concurrency

Concurrent use of shared data requires locking. The handling of deadlocks (prevention or detection) and more general kinds of locks are research issues, in particular in connection with multiprocessing.

4. Recovery

Recovery from system failures must become faster and more efficient than periodic dumps of the whole data base allow.

Reorganization

Insertion, deletion and update of data gradually lead to physical disorder (fragmentation) of the stored data base, which degrades performance. Reorganization reestablishes physical order. In general, the whole data base is dumped and reloaded, which for large data bases may take hours, during which the data base is not available for normal use. Such an interruption is becoming less and less tolerable. A solution which avoids the interruption is to reorganize parts of the data base dynamically, concurrently with normal use.

Distributed Data Bases on Computer Networks

Distributing a data base over the computers of a network raises new problems, for example how to order the updates of different processes in time.

Design Tools

Design tools are needed which help the data base administrator to select storage structures and to make other design decisions, based on relevant information about the stored data and its usage. Some such tools are reported in section 6.
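Deadlocks are one of the concurrency problems mentioned above. A common way to detect them is to search a wait-for graph for cycles. The sketch below assumes a hypothetical lock manager that reports, for each waiting transaction, the transactions whose locks it waits for. It is a generic depth-first search, not the scheme of any paper cited in this report.

```python
def find_deadlock(waits_for):
    """Return one cycle of transactions in the wait-for graph, or None.

    waits_for maps a transaction id to the set of transaction ids whose
    locks it is waiting for. Every transaction on a cycle waits forever.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in sorted(waits_for.get(t, ())):
            state = color.get(u, WHITE)
            if state == GREY:                 # back edge: cycle found
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in sorted(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

A detected deadlock can then be broken by backing out one transaction on the returned cycle. Deadlock prevention instead rules out such cycles in advance.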
Acknowledgement

The authors are grateful to their colleagues at the IBM Research Laboratories at Yorktown Heights and San Jose, and to representatives of universities in Europe and North America, for many helpful discussions which formed the basis of this report. Specifically, they are grateful to E. F. Codd and M. E. Senko. They are also grateful to members of the IBM Heidelberg Scientific Center staff for a critical review of a preversion of this status report.
8. BIBLIOGRAPHY

The subsequent list of references is intended to help the reader in selecting literature on recent research results. Subsection 8.1 contains cross references, ordered according to subject, which refer to the numbers of the annotated references. Subsection 8.2 contains the references, ordered alphabetically according to first author. The annotations should not be considered as critical reviews. It is hoped that the list is of value to the reader.

8.1 Cross References by Subject

Data Definition Languages: 35, 37, 82, 95, 169, 179, 194
Data Independence: 17, 18, 47, 48, 55, 82, 180, 181, 182, 194
Data Integrity: 1, 29, 30, 45, 65, 66, 78, 125, 129, 142, 152, 163, 166, 175, 176
Data Manipulation Languages: 1, 3, 6, 13, 16,
19, 20, 25, 28, 35, 36, 40, 42, 46, 59, 68, 69, 70, 72, 73, 74, 79, 87, 93, 101, 102, 105, 106, 109, 110, 119, 123, 128, 131, 136, 141, 143, 147, 149, 155, 158, 173, 183, 184, 185, 194, 198
Data Model Equivalence: 17, 82, 122, 134, 167
Data Models: 1, 2, 4, 7, 8, 14, 20, 34, 35, 38, 39, 41, 43, 52, 56, 57, 58, 63, 68, 69, 70, 79, 121, 124, 133, 151, 15[?], 157, 178, 190
Data Security: 30, 75, 94, 171
Data Translation: 44, 95, 108, 126, 142, 153, 165, 177
Modelling and Analysis
  General: 24, 61, 85, 86, 91, 103, 107, 113, 115, 117, 127, 132, 137, 144, 161, 188, 193, 196
  Tools: 12, 22, 138, 145, 154, 170
  Optimization Algorithms: 21, 23, 26, 2[?], 31, 32, 33, 60, 71, 85, 88, 97, 98, 114, 139, 150, 162, 164, 174, 197
Privacy: 76, 94, 171, 187
Recovery: 15, 50, 62, 81, 83, 148
Resource Allocation and Scheduling: 29, 30, 65, 67, 116
Search Algorithms: 5, 6, 84, 92, 140, 147, 1[?]
Storage Structures: 9, 10, 11, 51, 77, 80, 90, 96, 111, 112, 130, 146, 155, 182, 186, 189
Surveys and Textbooks: 8, 49, 54, 64, 99, 100, 104, 118, 120, 135, 156, 160, 172, 191, 192
8.2 References
1. Abrial, J. R. Data Semantics. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974.
A philosophical paper which advocates a binary relational data model, in which relations are defined in terms of access functions.

2. ANSI/X3/SPARC Study Group on Data Base Management Systems. Interim Report. ACM SIGMOD Newsletter, 1975.

3. Ash, W. L., and Sibley, E. H. TRAMP: An Interpretive Associative Processor with Deductive Capabilities. Proc. ACM Natl. Conference (1968), 144-156.
TRAMP is a question answering system based on binary relations. Its deductive capabilities allow relations to be defined in terms of other relations, e.g. "grandfather" as the father of the father or mother.

4. Ashany, R. [...] Connection Matrix [...]. IBM Poughkeepsie, TR 00.2200.
The data base is represented as a binary connection matrix whose rows represent entities and whose columns represent attributes. An entry of 1 indicates that the attribute is true for the entity, 0 that it is false. Sparse matrix techniques are employed to reduce storage requirements.

5. Astrahan, M. M., and Chamberlin, D. D. Implementation of a Structured English Query Language. CACM 18, 580-588 (1975).
Describes an interpreter for SEQUEL which makes use of secondary indexes.

6. Astrahan, M. M., and Ghosh, S. P. A Search Path Selection Algorithm for the Data Independent Accessing Model (DIAM). Proc. 1974 ACM SIGFIDET Workshop, ACM, New York, 1974.
The algorithm accepts a query expressed in RIL (see Fehder) and constructs an access path through the DIAM structures, applying a heuristic "minimization".

7. Bachman, C. W. The Programmer as Navigator. CACM 16, 653-658 (1973).
Bachman's famous 1973 ACM Turing Award Lecture.

8. Bachman, C. W. Trends in Data Base Management - 1975. AFIPS NCC 1975 Proc., vol. 44, 569-576.
Discusses the evolution of data base management, the relational vs. network (graph) data model debate, the tripartite (conceptual, external, internal) data description of ANSI/X3/SPARC, and the support of data structures by new hardware.

9. Bayer, R. Symmetric Binary B-trees: Data Structure and Maintenance Algorithms. Acta Informatica 1, 290-306 (1972).
A modification of B-trees with logarithmic insertion and deletion algorithms.

10. Bayer, R. Storage Characteristics and Methods for Searching and Addressing. Information Processing 74, North Holland, Amsterdam, 1974, 440-444.
Contains a discussion of pseudo random access (i.e. hashing), B-trees and indexed sequential access in random access and virtual memories.

11. Bayer, R., and McCreight, E. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189 (1972).
Describes the B-tree index structure with efficient insertion and deletion algorithms. B-trees have since become a standard method.

12. Bennett, B. T., and Kruskal, V. J. LRU Stack Processing. IBM J. Res. Dev. (1975), to appear.
Traditional stack processing algorithms for computing LRU stack distances are inefficient when a large number of distinct pages appear, as in data base systems. The authors describe a new algorithm with drastically improved efficiency.

13. Bergen, M., Erbe, R., Pistor, P., Schauer, U., and Walch, G. [...] Interactive [...] Environment [...] and Its Application in Computer Aided Design. Proc. Workshop on Data Bases for Interactive Design, Waterloo, Canada, September 1975.

14. Biller, H. [...]

15. Bjork, L. A. Recovery Scenario for a DB/DC System. Proc. ACM Natl. Conf. 1973, 142-146.

16. Bjorner, D., Codd, E. F., Deckert, K. L., and Traiger, I. L. The Gamma-Zero n-ary Relational Data Base Interface: Specifications of Objects and Operations. IBM Research Report RJ 1200, 1973.
A detailed description of a low level relational interface for accessing a data base.

17. Bobrow, D. [...] Experimental Data Management System. In Data Base Systems (R. Rustin, editor), Prentice-Hall, Englewood Cliffs, 1972.
The paper describes EDMS, an experimental system implemented in LISP. It contains an excellent brief discussion of Codd's relational approach vs. the hierarchy or network approach.

18. Boyce, R. F., Chamberlin, D. D., King, W. F., and Hammer, M. M. Specifying Queries as Relational Expressions: SQUARE. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974.
SQUARE is a set oriented, high level query language based on Codd's relational approach, using the so-called "concept of mapping". See also Chamberlin/Boyce, "SEQUEL".

19. Bracchi, G., Fedeli, A., and Paolini, P. A [...] Relational [...] System. Laboratorio di Calcolatori, Istituto di Elettrotecnica ed Elettronica, Politecnico di Milano, Internal Report No. 72-5, 1972.
MORIS is a Codd-like, calculus oriented data manipulation language. The users' view (external schema) may include hierarchical structures, i.e. unnormalized data.
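The LRU stack distance of a reference is the depth of the referenced page in the LRU stack. An LRU buffer of C pages hits exactly the references with distance at most C, so a single pass yields hit ratios for every buffer size. The sketch below uses the straightforward list-based method, whose cost grows with the number of distinct pages. It is of the traditional kind Bennett and Kruskal report as inefficient, not their improved algorithm. The function names are hypothetical.

```python
def lru_stack_distances(refs):
    """LRU stack distance of each reference; None marks a first reference."""
    stack = []                       # most recently used page at index 0
    distances = []
    for page in refs:
        if page in stack:
            depth = stack.index(page) + 1      # 1-based depth in the stack
            stack.pop(depth - 1)
            distances.append(depth)
        else:
            distances.append(None)             # infinite distance: cold miss
        stack.insert(0, page)
    return distances


def lru_hit_ratio(refs, capacity):
    """Hit ratio of an LRU buffer holding `capacity` pages."""
    d = lru_stack_distances(refs)
    hits = sum(1 for x in d if x is not None and x <= capacity)
    return hits / len(refs)
```

For the reference string A B C A B D A, the distances are None, None, None, 3, 3, None, 3. A three-page buffer therefore hits 3 of 7 references, and a two-page buffer hits none.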
20. Bracchi, G., Fedeli, A., and Paolini, P. A Multilevel Relational Model for Data Base Management Systems. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974.
Advocates a binary relational model for the conceptual schema, and many models (graph (hierarchical, network), relational, etc.) for the external schema as well as for the internal schema.

21. Buzen, J. P., and Chen, P. P.-S. Optimal Load Balancing in Memory Hierarchies. Information Processing 74, North Holland, Amsterdam, 1974, 271-275.
Offers a queuing model for the allocation of data sets in a memory hierarchy.

22. Cardenas, A. F. Evaluation and Selection of File Organization - A Model and System. CACM 16, 540-548 (1973).
The paper describes a model and program which can be used to estimate the total storage and average access time costs of a variety of file organizations for given data and requirements. See Schkolnick and Yue/Wong for related treatments.

23. Cardenas, A. F. Analysis and Performance of Inverted Data Base Structures. CACM 18, 253-263 (1975).
A generalization of the model, used to analyze inverted data base structures. See also Farley/Stewart.

24. Cardenas, A. F., and Sagamang, J. P. Modeling and Analysis of Data Base Organization: The Doubly Chained Tree Structure. Information Systems 1, 57-67 (1975).

25. Carlson, E. D., Bennett, J. L., Giddings, G. M., and Mantey, P. E. The Design and Evaluation of an Interactive Geo-Data Analysis and Display System. Information Processing 74, North Holland, Amsterdam, 1974, 1055-1061.
GADS is an interactive system which provides graphic display of geographically related data to non-programmers. The paper discusses the experience gained with GADS and the requirements which must be met by systems of this kind.
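In an inverted data base structure, as analyzed by Cardenas, each indexed attribute value points to the list of records containing it. A conjunctive query is therefore answered by intersecting lists instead of scanning the file. The sketch assumes a hypothetical in-memory file of Python dicts. Record layout and attribute names are illustrative only, and no cost model is included.

```python
from collections import defaultdict


def build_inverted_index(records, attributes):
    """Map (attribute, value) -> list of record numbers containing it."""
    index = defaultdict(list)
    for rid, rec in enumerate(records):
        for attr in attributes:
            if attr in rec:
                index[(attr, rec[attr])].append(rid)
    return index


def conjunctive_query(index, **conditions):
    """Record numbers satisfying every attribute = value condition."""
    result = None
    for attr, value in conditions.items():
        ids = set(index.get((attr, value), ()))   # .get does not insert keys
        result = ids if result is None else result & ids
    return sorted(result or ())
```

Only the lists of the queried values are read, so the work depends on list lengths rather than on file size. The price is the storage and update cost of the index lists.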
26. Casey, R. G. Allocation of Copies of a File in an Information Network. AFIPS SJCC 1972 Proc., vol. 40, 617-625.
The author gives an exact and a heuristic solution of the problem of allocating copies of a file among the nodes of a computer network, taking into account the costs of storing and of transmission between nodes.

27. Casey, R. G. Design of Tree Networks for Distributed Data. AFIPS NCC 1973 Proc., vol. 42, 251-257.

28. Chamberlin, D. D., and Boyce, R. F. SEQUEL: A Structured English Query Language. Proc. 1974 ACM SIGFIDET Workshop, ACM, New York, 1974.
SEQUEL is a query language similar to SQUARE (Boyce et al.), however with a syntax closer to natural English.

29. Chamberlin, D. D., Boyce, R. F., and Traiger, I. L. A Deadlock-Free Scheme for Resource Locking in a Data Base Environment. Information Processing 74, North Holland, Amsterdam, 1974, 340-343.
The authors propose a locking scheme which avoids deadlocks, and thereby the backout of processes required by deadlock detection, as well as indefinite delays.

30. Chamberlin, D. D., Gray, J. N., and Traiger, I. L. Views, Authorization, and Locking in a Relational Data Base System. AFIPS NCC 1975 Proc., vol. 44, 425-430.
Views, i.e. relations derived from other relations in SEQUEL, are used for authorization; the problem of updating via views is discussed. Locks give one user temporarily exclusive use of a resource.

31. Chandra, A. K., and Wong, C. K. Worst Case Analysis of a Placement Algorithm Related to Storage Allocation. To appear in SIAM Journal on Computing.
The authors analyze the worst case performance of a heuristic algorithm which allocates data sets to disk drives such that the probability of simultaneous access to one drive is minimized. See also Easton/Wong.

32. Chang, S. K. [...] Decomposition [...] Hierarchic [...] Data [...]. Proc. ACM SIGMOD 1975 Intl. Conf. on Mgmt. of Data, San Jose, 1975.

33. Chen, P. P.-S. Optimal File Allocation in Multilevel Storage Systems. AFIPS NCC 1973 Proc., vol. 42, 277-282.
Extends Casey's results to a storage hierarchy, taking queuing effects into account. See also Buzen/Chen.

34. CODASYL Development Committee, Language Structure Group. An Information Algebra. Phase I Report. CACM 5, 190-204 (1962).
An "oldtimer" which is the original source of many ideas, e.g. the entity concept. Files may be interpreted as sets of n-tuples on which operations such as union and joins can be performed.

35. CODASYL Programming Language Committee. Data Base Task Group Report, April 1971. Available from ACM.
Contains the DBTG proposal.

36. CODASYL Programming Language Committee. DBLTG Proposal, February 1973.
Contains the definition of the COBOL data manipulation and subschema data definition languages, which are essentially those of ref. 35.

37. CODASYL Data Description Language Committee. Data Description Language. Journal of Development, June 1973.
Essentially the schema data description language of ref. 35, i.e. the same data model.

38. CODASYL Systems Committee. Feature Analysis of Generalized Data Base Management Systems. Technical Report, May 1971. Available from ACM.
Compares commercially available systems.

39. Codd, E. F. A Relational Model of Data for Large Shared Data Banks. CACM 13, 377-387 (1970).
Introduced the relational model of data.

40. Codd, E. F. A Data Base Sublanguage Founded on the Relational Calculus. 1971 ACM SIGFIDET Workshop, ACM, New York, 1971.

41. Codd, E. F. Relational Completeness of Data Base Sublanguages. In Data Base Systems (R. Rustin, editor), Prentice-Hall, Englewood Cliffs, 1972.
Contains the definition of the relational completeness of a data sublanguage.

42. Codd, E. F. Seven Steps to Rendezvous with the Casual User. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974.
A brief description of the Rendezvous system, a question answering system in which the casual user's query is clarified in a dialogue of seven steps.

43. Codd, E. F. Recent Investigations in Relational Data Base Systems. Information Processing 74, North Holland, Amsterdam, 1974, 1017-1021.
Topics include natural language dialogue, the choice of data model and query language, the superimposition of data models, and topics needing further investigation.

44. Conway, R. W., Maxwell, W. L., and Morgan, H. L. On the Implementation of Security Measures in Information Systems. CACM 15, 211-220 (1972).
The authors propose performing security checks, where possible, only once at compile time rather than at each access; the approach is implemented in the ASAP file system.

45. Conway, R. W., Maxwell, W. L., and Morgan, H. L. A Technique for File Surveillance. Information Processing 74, North Holland, Amsterdam, 1974, 988-992.
Describes surveillance programs which are associated with a file and perform certain functions automatically when the file is accessed.
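The join, on which both the Information Algebra and Codd's relational model build, combines those tuples of two relations that agree on their common attributes. The following hash-join sketch is illustrative only. It assumes, for simplicity, that a relation is a non-empty list of Python dicts with identical keys, and it is not a method taken from the cited papers.

```python
def natural_join(r, s):
    """Natural join of relations r and s, each a list of dicts."""
    if not r or not s:
        return []
    common = sorted(set(r[0]) & set(s[0]))
    # Build phase: hash the tuples of s on the values of the common attributes.
    buckets = {}
    for u in s:
        buckets.setdefault(tuple(u[a] for a in common), []).append(u)
    # Probe phase: combine each tuple of r with its matching tuples of s.
    result = []
    for t in r:
        for u in buckets.get(tuple(t[a] for a in common), []):
            result.append({**t, **u})
    return result
```

With no common attributes, every key is the empty tuple, and the join degenerates into the Cartesian product, as the definition requires.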
46. Dana, C., and Presser, L. An Information Structure for Data Base and Device Independent Report Generation. AFIPS FJCC 1972 Proc., vol. 41, 1111-1116.
The paper describes high level data manipulation elements for the generation of reports.

47. Date, C. J., and Hopewell, P. Storage Structure and Physical Data Independence. 1971 ACM SIGFIDET Workshop, ACM, New York, 1971.

48. Date, C. J., and Hopewell, P. File Definition and Logical Data Independence. 1971 ACM SIGFIDET Workshop, ACM, New York, 1971.

49. Date, C. J. An Introduction to Database Systems. Addison-Wesley, Reading, Massachusetts, 1975.
One of the first attempts at a comprehensive introduction to data base systems, with many annotated references.

50. Davies, C. T. Recovery Semantics for a DB/DC System. Proc. ACM Natl. Conf. 1973, 136-141.
Together with Bjork's paper, an easy to understand introduction to the recovery concept.

51. Dearnley, P. A. The Operation of a Model Self Organizing Data Management System. Comp. Journal 17, 205-210 (1974).
The system observes the usage patterns of files and restructures them accordingly; simulation results are reported.

52. Delobel, C., and Casey, R. G. Decomposition of a Data Base and the Theory of Boolean Switching Functions. IBM J. Res. Develop. 17, 374-386 (1973).
Deals with the decomposition of a flat file into a set of flat files having the same information content without (enormous) redundancy. The problem is related to finding a minimal cover in the theory of Boolean switching functions.

53. Di Paola, R. A. The Solvability of the Decision Problem for Classes of Proper Formulas and Related Results. Technical Report R-803-PR, Rand Corp., Santa Monica, Calif., August 1971.

54. D'Imperio, M. E. Data Structures and Their Representation in Storage. Annual Review in Automatic Programming, vol. 5, Pergamon Press, 1969.

55. Dittmann, E. L. [...] Klassifizierung [...] Datenunabhaengigkeit [...] System-Entwurf. Technische Hochschule Darmstadt, Forschungsgruppen Informatik, Berichte DV75-1, 1975.

56. Doerrscheidt, E. Das Konzept des Objektbeschreibungsbaumes als Grundstruktur eines graphenorientierten Datenbankmodells. Lecture Notes in Computer Science, vol. 26, 532-541, Springer Verlag, Berlin, 1975.
Describes a graph oriented data model based on LISP ideas.

57. Durchholz, R., and Richter, G. Concepts for Data Base Management Systems. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974.

58. Earley, J. Toward an Understanding of Data Structures. CACM 14, 617-628 (1971).
Sketches a formal theory of data structures and their incorporation into ALGOL-like programming languages.

59. Earley, J. Relational Level Data Structures for Programming Languages. Acta Informatica 2, 293-309 (1973).
Proposes the incorporation of relational level data structures into programming languages.

60. Easton, M. C., and Wong, C. K. The Effect of a Capacity Constraint on the Minimal Cost of a Partition. JACM 22, 441-449 (1975).
The problem considered by Chandra/Wong is extended by capacity constraints, and an algorithm is proposed.

61. Easton, M. C. Model for Interactive Data Base Reference String. IBM Research Report RC 5050, Sept. 1974.
Describes a modification of the independent reference model which describes measured working set behaviour well, while retaining analytical tractability.

62. Edelberg, M. Data Base Contamination and Recovery. 1974 ACM SIGFIDET Workshop, ACM, New York, 1974.
Error propagation is described by a graph oriented model in which processes transfer data between data blocks. For a given error, an algorithm determines the set of contaminated data blocks; recovery restores these blocks (e.g. from a log) and reruns the affected processes.

63. Ehrich, H. D. Grundlagen einer Theorie der Datenstrukturen. Acta Informatica 4, 201-211 (1975).
Graph oriented data base schemata are investigated from a more mathematical point of view.

64. Engles, R. W. A Tutorial on Data Base Organization. Annual Review in Automatic Programming, vol. 7, part 1, Pergamon Press, 1972.

65. Eswaran, K. P., Gray, J. N., Lorie, R. A., and Traiger, I. L. On the Notions of Consistency and Predicate Locks in a Data Base System. IBM Research Report RJ 1487, December 1974.
The paper defines consistency within a transaction and determines its consequences for locking. Predicate locks are proposed, together with a language for their specification and an algorithm which determines whether two predicates overlap.

66. Eswaran, K. P., and Chamberlin, D. D. Functional Specifications of a Subsystem for Data Base Integrity. IBM Research Report RJ 1601, 1975.
Contains the specification of integrity rules, which are interpreted as consistency routines to be invoked after changes of the data base.

67. Everest, G. C. Concurrent Update Control and Data Base Integrity. In Data Base Management, Proc. IFIP Work. Conf., Cargese, Corsica, April 1974, North Holland, Amsterdam, 1974, 241-270.
Contains a classification of approaches. Preclaiming of resources to prevent deadlocks is advocated by the author.
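Eswaran et al. propose locking predicates instead of individual records. Two locks then conflict only if some tuple could satisfy both predicates and at least one lock is exclusive. The sketch below deliberately simplifies by assuming predicates are conjunctions of closed ranges on attributes, which makes the overlap test a per-attribute interval intersection. General predicates, as in the paper, need a much harder test. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredicateLock:
    """Lock on all tuples satisfying a conjunction of closed ranges.

    ranges maps attribute name -> (low, high); attributes that are not
    mentioned are unconstrained.
    """
    ranges: dict
    exclusive: bool = False


def predicates_overlap(p, q):
    """True if some tuple could satisfy both conjunctive range predicates."""
    for attr in set(p.ranges) & set(q.ranges):
        lo = max(p.ranges[attr][0], q.ranges[attr][0])
        hi = min(p.ranges[attr][1], q.ranges[attr][1])
        if lo > hi:                 # empty intersection on this attribute
            return False
    return True


def locks_conflict(p, q):
    """Locks conflict if their predicates overlap and one is exclusive."""
    return (p.exclusive or q.exclusive) and predicates_overlap(p, q)
```

Two readers never conflict. A writer conflicts with a reader only where their ranges can select a common tuple, which also protects tuples that do not yet exist (phantoms).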
68. Falkenberg, E. Time-Handling in Data Base Management Systems. CIS-Report 07/74, University of Stuttgart, 1974.
A detailed discussion of the time dimension. A data model and its data manipulation language are extended to cope with time, e.g. to state that employee A is related to B from time T1 to T2.

69. Falkenberg, E. Strukturierung und Darstellung von Information an der Schnittstelle zwischen Datenbankbenutzer und Datenbank-Management-System. Thesis, Institut fuer Informatik, University of Stuttgart, 1975.
Describes the "Gegenstandsmodell" (object model), a data model closely related to the concepts of DIAM, and a high level data manipulation language for it.

70. Falkenberg, E., Meyer, B., and Schneider, H. J. [...] Resultatspezifizierende Handhabung von Datensystemen. Lecture Notes in Computer Science, Springer Verlag, Heidelberg.

71. Farley, J. H. G., and [...] Query Execution and Index Selection for Relational Data Bases. University of Toronto, CSRG-53, March 1975.
Extends the models of Cardenas' papers to relational data bases, including joins. Describes recent investigations into this subject.

72. Fehder, P. L. The Representation Independent Language. IBM Research Report RJ 1121 (1972), Part 2: RJ 1251 (1973).
Describes RIL, the data manipulation language of DIAM.

73. Fehder, P. L. The Hierarchic Query Language (HQL). IBM Research Report RJ 1307, Nov. 1973.
Describes a query language to operate on IMS-like hierarchic data.

74. Feldman, J. A., and Rovner, P. D. An ALGOL-Based Associative Language. CACM 12, 439-449 (1969).
The high level, ALGOL-like programming language LEAP is based on binary associations, which are implemented using a hash coding technique.

75. Fernandez, E. B., Summers, R. C., and Coleman, C. D. An Authorization Model for a Shared Data Base. Proc. ACM SIGMOD 1975 Intl. Conf. on Mgmt. of Data, San Jose, 1975.
Authorization is governed by predicates over the data base contents and is enforced primarily at compile time.

76. Fiedler, H. Datenschutz [...]. Lecture Notes in Computer Science, vol. 26, Springer, 1975.
A survey of discussions on privacy.

77. Finkel, R. A., and Bentley, J. L. Quad Trees: A Data Structure for Retrieval on Composite Keys. Acta Informatica 4, 1-9 (1974).
A generalization of binary search trees to composite keys.

78. Florentin, J. J. Consistency Auditing of Data Bases. Comp. Journal 17, 52-58 (1974).
Consistency rules are predicate calculus expressions over the data base contents; problems of their implementation are discussed.

79. Frank, R. L., and Sibley, E. H. The Data Base Task Group Report: An Illustrative Example. ISDOS Working Paper, University of Michigan.
Shows in detail, by an example, how the DBTG specifications are used.

80. Frank, R. L., and Yamaguchi, K. A Model for a Generalized Data Access Method. AFIPS NCC 1974 Proc., vol. 43, 45-52.
Describes a method which allows users to tailor access methods to their application.

81. Fraser, A. G. Integrity of a Mass Storage Filing System. Comp. Journal 12 (1969).
Describes recovery methods for a filing system.

82. Frasson, [...] Increase Data Independence [...] Hierarchical Structure [...]. GI 1975, Lecture Notes in Computer Science, vol. 34, Springer Verlag, Heidelberg, 1975.
Describes methods to increase data independence in hierarchical structures.
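A quad tree generalizes the binary search tree to two-part (composite) keys. Each node stores a point and splits the plane around it into four quadrants, and a range query descends only into quadrants that can intersect the query rectangle. The sketch below is a minimal point quad tree under assumptions of our own: tie-breaking toward east/north and hypothetical function names. It is not code from Finkel and Bentley and omits deletion and balancing.

```python
class QuadNode:
    """One point of a point quad tree."""

    def __init__(self, x, y, data=None):
        self.x, self.y, self.data = x, y, data
        # Children by quadrant: 0 = NE, 1 = NW, 2 = SW, 3 = SE.
        self.children = [None, None, None, None]


def quadrant(node, x, y):
    """Quadrant of (x, y) relative to node; ties go east and north."""
    if x >= node.x:
        return 0 if y >= node.y else 3
    return 1 if y >= node.y else 2


def insert(root, x, y, data=None):
    """Insert the point (x, y) and return the root of the tree."""
    new = QuadNode(x, y, data)
    if root is None:
        return new
    node = root
    while True:
        q = quadrant(node, x, y)
        if node.children[q] is None:
            node.children[q] = new
            return root
        node = node.children[q]


def range_search(node, xlo, xhi, ylo, yhi, out=None):
    """Data of all points with xlo <= x <= xhi and ylo <= y <= yhi."""
    if out is None:
        out = []
    if node is None:
        return out
    if xlo <= node.x <= xhi and ylo <= node.y <= yhi:
        out.append(node.data)
    east, west = xhi >= node.x, xlo < node.x
    north, south = yhi >= node.y, ylo < node.y
    # Descend only into quadrants that can intersect the query rectangle.
    for q, wanted in ((0, east and north), (1, west and north),
                      (2, west and south), (3, east and south)):
        if wanted:
            range_search(node.children[q], xlo, xhi, ylo, yhi, out)
    return out
```

The quadrant tests mirror the insertion rule exactly. A query therefore never skips a subtree that could hold a matching point, and points equal to a node go to its NE subtree.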
83. Genton, [...] Recovery [...]. Comp. Journ. [...]
Discusses journaling and checkpointing procedures for recovery in commercial systems.

84. Ghosh, S. P. String Path Search Procedures for Data Base Systems. IBM J. Res. Dev. 18, 408-422 (1974).
An algorithm is given which determines, for queries on DIAM string structures, the access paths yielding minimum access.

85. Ghosh, S. P., and Lum, V. Y. Analysis of Collisions when Hashing by Division. Information Systems 1, 15-22 (1975).
Collisions in the "hashing by division" technique are considered analytically.

86. Ghosh, S. P., and Tuel, W. G. A Design of an Experiment to Model Data Base System Performance. IBM Research Report RJ 1482, Dec. 1974.

87. Goldstein, R. C., and Strnad, A. J. The MacAIMS Data Management System. 1970 ACM SIGFIDET Workshop, ACM, New York, 1970.
MacAIMS is an early relational system (see also LEAP, Feldman/Rovner).

88. Gorenstein, S., [...] Galati, [...] Data Base Reorganization [...]. IBM Research Report RC 5063, Oct. 1974.

89. Greenfeld, [...]
The problem of clustering records into blocks is considered, with the objective of minimizing the number of block transfers necessary.

90. Haerder, T. Die Implementierung von Zugriffspfaden durch Bitlisten. Technische Hochschule Darmstadt, Forschungsgruppen Informatik, Berichte DV74-2, 1974.
The author investigates the implementation of access paths by bit lists in a relational data system.
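In hashing by division, the technique analyzed by Ghosh and Lum, an integer key k is stored in bucket k mod m, where m is the number of buckets. The sketch below only illustrates the technique and not their analysis. It counts collisions, i.e. keys arriving at an already occupied bucket, for a hypothetical key set. It shows why m is usually chosen so that it shares no factor with regularities in the keys, often a prime.

```python
def division_hash(key, m):
    """Bucket address of an integer key under hashing by division."""
    return key % m


def count_collisions(keys, m):
    """Number of keys that hash to an already occupied bucket."""
    occupied = set()
    collisions = 0
    for k in keys:
        b = division_hash(k, m)
        if b in occupied:
            collisions += 1
        else:
            occupied.add(b)
    return collisions
```

For the keys 0, 10, ..., 100 and m = 10, every key lands in bucket 0, giving 10 collisions. With m = 11, the same 11 keys all land in different buckets.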
91. Haerder, T. [...] Zugriffszeitverhalten [...] Datenbank [...]. Technische Hochschule Darmstadt, Forschungsgruppen Informatik, Berichte DV74-3, 1974.
Includes a comparison, by simulation, of conventional access methods with bit lists and indexes.

92. Hall, P. A. V. Common Subexpression Identification in General Algebraic Systems. IBM UK Scientific Centre Report UKSC 0060, Nov. 1974.

93. Held, G. D., Stonebraker, M. R., and Wong, E. INGRES - A Relational Data Base System. AFIPS NCC 1975 Proc., vol. 44, 409-416.
Describes the relational system INGRES with its calculus based query language QUEL. Integrity and access control are provided by modification of the user's query.

94. Hoffman, L. J. (editor). Security and Privacy in Computer Systems. Melville Publishing Company, Los Angeles, 1973.

95. Housel, B. C., Smith, D. P., Shu, N. C., and Lum, V. Y. DEFINE: A Nonprocedural Data Description Language for Defining Information Easily. Proc. ACM Pacific 75, San Francisco, April 1975.
DEFINE describes data structures for data translation, together with CONVERT.

96. Inglis, J. Inverted Indexes and Multilist Structures. Comp. Journ. 17, 59-63 (1974).
Discusses how inverted lists and multilist structures can be used to maintain files.

97. Karp, R. M., McKellar, A. C., and Wong, C. K. Near-Optimal Solutions to a 2-Dimensional Placement Problem. IBM Research Report RC 4740; to appear in SIAM Journal on Computing.
Records are placed in a 2-dimensional storage such that the expected distance between consecutively accessed records is minimized.

98. King, W. F. On the Selection of Indices for a File. IBM Research Report RJ 1341, January 1974.
See also Cardenas.

99. Knuth, D. E. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1968.

100. Knuth, D. E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Massachusetts, 1973.

101. Kraegeloh, K.-D., and Lockemann, P. C. Retrieval in a Set-Theoretically Structured Data Base: Concepts and Practical Considerations. [...] 1975.

102. Lavenberg, S. S., and Shedler, G. S. A Queuing Model of the DL/I Component of IMS. IBM Research Report RJ 1561, 1975.
An analytically tractable queuing model of data base access in IMS.

103. Lefkovitz, D. File Structures for On-Line Systems. Spartan Books, 1969.

104. Lehmann, H., Ott, N., and Zoeppritz, M. User Specialty Languages [...]. IBM Germany, Heidelberg Scientific Center, Technical Report 75.08.007, 1975.
Describes a natural language like query language which is translated into a set-theoretic intermediate language.

105. Levien, R. E., and Maron, M. E. A Computer System for Inference Execution and Data Retrieval. CACM 10, 715-721 (1967).
Describes a system based on binary relations (see also Di Paola).

106. [...], and Yormark, B. A Prototype [...] System. AFIPS NCC 1974 Proc., vol. 43, 1974.
Describes an implemented prototype system relying on graphics.

107. Lewis, P. A. W., and Shedler, G. S. Statistical Analysis of [...] Transaction [...] in a Data Base System. IBM Research Report RJ 1629, August 1975.
Describes the statistical analysis of measurement data of a transaction stream, modeled as a Poisson process with time varying rate. It makes heavy use of standard statistical methods.

108. Liu, S., and Heller, J. A Record Oriented, Grammar Driven Data Translation Model. 1974 ACM SIGFIDET Workshop, ACM, New York, 1974.
Grammars are used as a specification for mapping data strings into trees, and trees into equivalent strings.

109. Lockemann, P. C., and Knutsen, W. D. A Multiprogramming Environment for Online Data Acquisition and Analysis. CACM 10, [...]-764 (1967).

110. Lorie, R. A., and Symonds, A. J. A Schema for Describing a Relational Data Base. 1970 ACM SIGFIDET Workshop, ACM, New York, 1970.
Describes RAM, a data management system in the sense of LEAP (Feldman/Rovner), based on binary relations.

111. Lorie, R. A. XRM - An Extended (n-ary) Relational Memory. IBM Cambridge Scientific Center Report G320-2096, Cambridge, Massachusetts, January 1974. Implements homogeneous
flat
files
on
top
of
RAM
(see
Lorle/Symonds)o
112.
113.
-
1970.
Describes
XRM
mapping
string
tars°
R.
tional York)
mappings
programs
parame
Lorle,
as
different
approach
earlier
and
Taken
Online
bricated
110o
to
Lockemann)
An
be
Lum)
V0
CACM
13)
Yo
MulTi--aT%rlbute
660
-
Lum)
Y.
form
Techniques)
Yo~
665)
Yuen)
P, a
Retrieval
with
Combined
Indexes.
1970.
S.
Tat
Fundamental
and
Dodd)
Me
Performance
Key Study
%o
Address on
Large
TransExist-
102
Ing
114.
Formatted a
plled
large
%0
V®
Yo~
of
Secondary
356,
±he
Cardenas
Vo
Y®
1973.
Lumv
V.
ented
117.
H.
An
Optimization
Proc,
1971
Performance Using
for
and
an
E®~
Data
techniques
as
ap-
ACM
Problem
NAT1.
on
Conf.~
the
vol.
Selec26,
349
into
the
problem
considered
by
Abstract
Wang~ Set
of
File
C.
P.~
Key--To--Address Trans-
Concept.
and
Allocation
the
algorithm
CACM
Ling~
in
H,
Storage
16,
603
A Cost
-
Ori-
Hlerarcbies,
cost
for
of
data
storage7 set
CPU~
allocation
channel is
e%c,
outllned~
cost.
Smith~
Memory
Analysis
197~.
this
and
Virtual
an
combining
minimizes
K.~
M.
322,
-
function
Maruyama~
Investi~tlons
Senko~
318
defined
for
hashin 8
others°
Algorithm
A cost is
1971.
of
Keys.
General
Y.~
18,
which
Ling~
Methods
612,
CACM
4,
sets,
earlier
and
forms%ion
116.
and
vol®
evaluations
1871.
Of
Lum,
and
data
tion
One
llS.
survey
Lum,
-
CACM t4~
Files=
Con±alas
S=
E,
Analysis
IBM
Indexes,
of
Research
Design
Report
RC
Alternatives 5087,
0ct.
1974, A
number
B-trees cally
118,
are
Surveys
McDonaldt
ACM,
New
and
7~
N®,
alternatives
resulting
York7
into
for
indexes
formulas~
which
oPganlzed
as
are
numeri-
is
system.
McGee~
W®
5 -- 1 9 ~
a
See
W.
C.
Hash
Table
Methods.
ACM
Comput--
1975.
M.
Conferencev
also
data
CUPID San
-- t h e
Friendly
Query
Francisco
t April
197Sv
File
volo
flow
Fi!e S~
687
P~ocessing.
Pergamon
Structures
Processing
dlagram-llke
language
%0
the
Held.
Generalized
Programmln~
Information
G.
StonebrakerT
Pacific
grahicy
C~
T.
1975o
INGRES
McGee~
Lewis,
and
ACM
CUPID
matic
121,
analyzed
D=~
Language.
120.
Implementation
evaluated.
Maurer~W. ing
119.
of
Press~
for 1233
Annual
Generalized
-- 1 2 3 9 ,
Review
in
Auto--
t96~.
North
Data
Management.
Holland,
Amster--
103
1968.
dam,
122.
Introduces
graphs
McGee,
Co
Data
W,
Base
April The
author of
McGee~
presents
W.
C,
ACM
SIGMOD
The
paper
125.
Go
H.
and
relations
Intl.
%he
Mehl,
J.
earlier
of
information.
Data
Conf.
Equivalence,
Cargese v CorsicaT
1974.
equivalent
organizations
organizations
at
on
Network
T Proc.~ and
on
Look
papers
between
W,s
and
in
and
of
the data
a
New
proposal
network
Data.
ACM,
Data
data
Prec.
Structures.
York,
fop
1975.
a data
manl-
structures,
AFIPS
1967
FJCC
525
-
New
proposal
to
the
York,
Ao %o
C. in
P,
G°s
the
compiled
and
A
Study
IMS Data
of
information
Order
Bases,
data
1974
as
sets
Transformations ACM
independence
routines,
appllcatlon
File
to view
SIGFIDET
of
Work--
1974.
increase of
be%wren
Merten~ proach
held
York~
proposing
sets.
Wangt
ACMT
program
F r y T Jo
Po
Translation.
which and
data
A Data
1974
ACM
supported
intercept
the
by
IMS
communi-
management.
Descrlp%ion
SIGF[DET
Language
Workshop~
ACM~
ApNew
1974,
Describes
the
idea
translation
Merten~
A.
Gos
New
MeyerT
York~
B.y
and
design
behind
%he
Of M i c b i @ a n
UnivePsi±y
Severance,
through
D.
G.
Modeling.
Performance Proc.
Evaluation
1972
ACM
Natlo
File Conf,t
1972
and
Technology.
and
project,
of Organizations
the
Work°
Operations
operating
shop~
cation
128.
STudy
(CRM}
Conference
Another
stored
organizations.
requirements
Structures
ACM,
of
flle
(DBTG)
Hierarchic
data
IFIP
for
1967. of
with
127.
197S
One
A
126.
flat
Level
outlines language
MealeyT S34,
File
the
Amsterdamv
a number
language
models
%o
Proc,
Holland~
homogeneous
pualtion
124.
A Contribution
North
description
123.
conceptual
Management°
19747
class
as
Schneider~
Course
H.
Notesv
Jo
Predicate
University
of
Logic
Berlinv
and
Data
available
Base from
authors.
Reviews interface
predicate llke
logic
in Coddes
and work
Its
use
and
in
as
a
model
natural
fop
man-machine
language
question--
104
answerin~
129.
sys%ems~
Minsky~
N®
Workshop~ The He
On
!nte~act~on
ACMT
author
New
discusses
proposes
a
Vlconsls%ent
wlth
YorkT
concepts~
cons±Ductive
operators"
Data
Bases.
[974
ACM
SIGFIDET
1974. integrity approach
to
be
used
rules~
%0
as
user
integrity
prlmi%ives
views
for
etc.
deflnlng
by more
complex
opera%ions.
130.
Mul!in~ Hashed
131.
J,
K,
An
Overflow.
Mylopoulos7
J.~
Relatlonal
Improved CACM
Index
15~
301
Schus%er~
System~
-
S.~
1975
Sequential 007,
and
AFIPS
Access
Me%hod
uslng
1972,
Tslchritzis7
NCC
Prec.
D.
A
Multilevel
vol.
44~
403
fhe
prototype
-
408,
197S. The
mechanism
ZETA/TORUS system
with
language ZETA
132.
used are a
on
as
an
Nakamuma~ Base
vol.
44,
±np
of
-
base
lower
level
a
I.~
of
rel~tional %0
data
define
a
prlmi%ives.
natural
and
Performance 463,
is
capabillfy
Yoshida~
System 459
development ZETA
"intelligent"
language
Kondov
high
TORUS
system
management level
is
query
bulit
on
interface,
H.
A
Evaluation.
Slmulation 197S
AFIPS
Model NCC
for Proc,
IS7S.
of expe~Iments
DescPiptlon data
%he
defini#lon
F.~
Data
in
descrlbed.
management
simulating
sysfem
in
a
the
processes
conventional
withln
slmulatlon
a
pack--
age.
133.
Nava%he~
cation 1975 The
S® of
paper
al
Mer%enT
Relatlonal
Eo
View,
when mo~e
J.
%hat
powerful
February
The
paper
of
G.
Investigation
to
Data
into
Translation.
the
Appll-
ACM
SIGMOD
-- 1 3 8 ,
Codd~s in
the
relational
model
context
data
of
".,®
poses
ser-
tr&nslaTlen
as
a
restruc%urln@".
Mapping:
University
10,
123
used
Data
A.
~odel
Proc,~
concludes
fop
Neuhold~
and
Conf®
pmoblems
vehicle
134.
the
Intl,
ious
Bo~
A
Formal
KaPlsruheT
Hierarchical
and
Relation-
Forschungsberlch±e~
Berlcht
1973. compares
formal
notation.
tional
model
is
hlePaPchlcal In
a
and
partIculart
special
case
It of
the
relational m~Mes
clear
hierarchical
da±a
models
%ha%
%be
model.
in
rela-
105
135.
136.
Blnary
Nlever@elt
v J.
Computing
Surveys
Notleyy UK-SC
M.
G,
Search
6v
The
3~
Trees
and
File
Organization.
ACM
1974.
Peterlee
IS/I
System.
IBM
UK~
Peterleey
Report
0018.
Describes
IS/It
one
of
the
earlier
Codd
~elationnl
implementa--
tions.
137.
Olsonl
C.
cessed
Records°
A.
Random
Access
Prec.
File
of
1969
Organization
ACM
Natl.
for
Confo
Indirectly ACMt
New
AcYork~
1969.
138.
Owensl
P.
Phase
J.
Information
II
--
Processing
a
Data
71T
827
Base --
Management
832T
North
Modeling Holland~
System. Amsterdamv
1972. Phase
II
is
management
138.
Palermo~
P.
Indexes. the
of
modeling
IBM
eamlier
designed
specifically
fop
data
Approach
Research papers
Report
on
index
RJ
to
the
0730~
Selection
July
selection.
of
Sec-
Cardenas
fop
1970.
See
results.
Palermo,
F°
RJ
July
I072~ paper
P.
A Data
for
queries
Petrlckt
S.
R.
Research
REQUEST
an
one
of
in
Search
RC
the
Problem.
earlier
predicate
Semantlc
Report
is
Base
IBM
Research
Report
1972.
contains
gorithms
IBM
tool
A Quantitative
ondary
The
141.
F.
One
recent
140.
a
evaluation.
optimizing
calculus
Interpretation 4457~
July
expe~tmental~
reduction
al-
form.
in
the
REQUEST
system,
1973.
natural
language
question
answering
system°
142.
Ramlrez~ tion
of
guage.
J.
1974
Describes to
D0
P*
Reisner~ Evaluation
Rln)
N.
Ao~
Conversion ACM
an
and
Prywes~
Programs
SIGFIDET
using
Workshopv
implementation
Smlth}v
translating
143o
At~
Data
of
which
a
ACM~ data
complies
N, a
S.
Automatic
Data
New
York~
Lan-
1974.
definition
data
Genera-
Description
language
definitions
{due
Into
data
programs.
P.~
Boyee~ of
R.
two Data
For
and
Base
Ch~mberltn
Query
t
Languages
Do
P, -
Human
SQUARE
Factors and
SE--
106
QUEL~
AFIPS
1975
NCC
A psychological
analyzed.
in
144.
Data of
data
the
performance
Models
models
64
-
447
show
the
of
Data
ternational
are
452,
1975.
subjects
is
a
but
slight
language,
which
Implemented of
sequences
described
and
statistically primarily
differ
at
Rothnie,
program the
J.
low
8.¢
a Paged
for
and
of
to
be
for
used
implementations.
A
ACM
Framework European
measurement of
levels
end,
and
a
data
for
Evalu-
Chapters
In-
evaluation
of
base
commands
involve
hgih
Lozano,
To
Environment. "multiple
for
allowing
the
levels the
~97S.
197S.
different
at
objective
D.
of
Representation.
May
and
disk
system
issued
address
in
is the
reference
end.
Memory
A combina%ion nlque
at
the
Prec.
Symposium
dlfferent
1554t
different
Hlldebrand~
Systems,
events
Storage
no.
wlth
of
framework
The
application traces
and
J.,
Base
Secondary Repo~t
designed
Computing
presented.
for MRC
evaluation
Rodriguez-Rosell,
in
on
Wisconsin~
The
ation
146.
with
nonprogrammers
dependency
A~
Rei%er,
An
44,
syntax~
University
145.
vol.
experiment
Only
significant
Pros°
a
Attribute
CACM
key
reductlon
Based 63
17,
hashing" of
the
File
-- 6 9 ,
and
Organization
1974.
inverted
number
Of
page
file
tech--
faults
for
multi--key--retrieval.
147.
Rothnie~
Jo
Relational vel*
148.
44,
employed
with
every
Sayanty
~.
burg.
and
To
attempts
Restart
U.
recovery emphasts
Ein
Messdaten.
Verlag,
Expressions
1975
utilize
the
AFIPS
in
NCC
a
Prec.
1975.
Processing
puts
Schauer, chef
423~
Retrieval
System.
to
for
the
and
Recovery
System.
purpose
1974
of
gained
optimization.
in a ACM
Information
Transaction
SIGFIDET
Oriented
Workshopv
ACM~
discussed.
]he
1974.
York,
ReStart
-
Inter--Entry
Management
tuple-access
Information
149.
Base
strategy
H.
author
Evaluating
417
The
New
B.
Data
appear
policies on
System IBM
as
~eidelberg.
are
defined
and
performance.
zur
Germany, Lecture
Interaktiven Informatlk Notes
in
Bearbeltung Symposium
Computer
umfan~rel--
1979t
Science,
Bad
~om--
Springer
107
Introduces
an
Interactive storage~ "query
150.
a by
brary
or
SchkolnlckT
See
151.
graphics
M.
Conf.
also
H.
tional
A.s
Data
Datay
San authors
ism
of
The
similar
ACM
See
to
data
language
{llke
an
also
open
ended
ll-
/13/.
ACM
Optimization, Jose~
comblnln~
relational
SlGMOD
1975
1975.
research.
J.
Re
SIGMOD
On
the
1975
Semantics
Intlo
Conf.
of
the
Rela-
on
M~mt.
of
1975.
are
Codd~s
access
Data t San
Swenson~
Model.
Jose~
The
world.
for
and
Index
of
Mgmto
system
a
manipulation
wlth
subroutines.
Secondary
base
{APL)~
data
Zloof)
FORTRAN
on
data
facilities
see
Cardenas
Schmld~
measurement
oriented
example"~
PL/I
of
Intern.
interactive
computational
concerned
relational authors
wlth
the
model
and
the
a
kind
of
employ
gap
between modelled graph
the
pure
part
of
model
formalthe
to
real
fill
the
gap,
152.
Schmutzt
H.
Germanyv
74.10.004t A
Oct.
special
schema
to
153.
of
context--free
hierarchical
mapping
Go
Language
31~
1975. authors
Senkor
as
M.
M°~
and
ARPA
E.)
a
Holland~
FOREM
is
evaluation
an
Senkot Data Journo
Mt
data
E.~
Structures 12~
30
a
is Pair
for
base
IBM Peport
describe
grammars
are
internal a
or
the
used
to
external
theoretical
J.
Creation
Information
for
to
treat-
systems°
E.
Networks.
used
and
model
data
language
V.
North
evaluate
model,
Deasautelst
Yo)
(FOREM)o
1968.
to
Relations.
Technical
data
of
Systems
translation
in
a
File
I,
a
25
-
network
network.
Lum)
Model
is In
for
propose
the
and
Centerv
conceptual
system problems
Schneldery
The
Languages
grammars
data
between
described
important
Evaluation
155.
of
Translation
such
154.
a
the The
ment
Regular Scientific
1974.
form
describe view.
Parenthesis Heidelberg
and
Amsterdam~ and
and -- 9 3 ~
E.
B0~
Accessing 1973,
P.
J.
A File
Processing
Organization 687
514
-
519~
1969.
simulation
management
Altman~
Owens9
Information
tool
specifically
designed
systems.
AstrahanT in
Data
M. Base
Mo~
and
Systems.
Fehder~
P,
L.
IBM S y s t e m s
108
This
paper
descrlbes
tem T one
of
research
156.
157.
159.
Senko~
M.
tities
~nd
Senko~
M.
E.
and
ideas
behind
app?oaches
comprehensive
E.
Data
Report
RC
An
Senko~
Me
~®
Report
RC
5263v
Senko~
M®
Eo
%he
DIAM
sys-
%0
data
base
3
I~
Description Oct.
-
Pela%ions~
--
13,
Setsv
En-
1975.
in
the
DIAM
II
wlth
FERAL
for
Lan@u~ge
Description
5073~
RecordsT
Systems
Context
of
FORAL.
a
Mul--
IBM
Re-
1973.
Introduction
%0
Users,
IBM
Pesearcb
1975.
Speclfiea%len
Results on
Sys±ems:
Inform.
Structured
Output
thoughts
Information
Things.
search
ence
%he
e~rller
systems,
tilevel
158.
%he
In
Very
DIAM
Large
of
II
Stored
wlth
Data
Data
FORALo
Basesw
Structures
Proc.
of
Bos%ont
the
1975~
and
Desired
In%.
Confer-
available
from
ACM. The
last
which
is
three
references
based
on
introduce
binary
DIAM
I[~
a
and
has
FERAL
assocla%ions
proposed as
sys%em~ Its
query
language.
160.
Severance~
161.
A
D.
scheme
164.
G.
is
A
[~
descrlbed~ a
set
Shneiderman~
B®
362
-- 3 6 5 ,
Optimum
Shnelderman~
B.~ -
566
~nd
577T
paper
describes
cem%aln
classes
of
B.
Model
The
3,
p~per
93 is
-
and
Gen--
Alternative
File
StPuc-
1975o a
special
Base
Scheuermann~
Of
IJCIS
Survey
"two
dimensional
space
including
well-knewn
of
case,
ReoPganization
Points.
CACM
A
103,
P°
S%ructuPed
Data
STructures.
1974.
The
Shneiderman~
of
A 1974.
1973.
CACM
17~
3t
organizations
as
Data
55, maps
data
organlzatlons
6~
Model --
51
which of
Mechanism:
Surveys
Parametric
Systems
~v t o
Search
Computing
conven±Ional
16,
163.
ACM
Inform.
parameters
162.
Iden%[fler
Model.
Severance~ tures.
O.
D.
erallzed
an
approach
data
for
to
deal
wl%h
integrity
in
case
sTruc%ures.
Optimizing
Indexed
File
Structures0
1974.
concerned
wlth
the
selection
of
index
size
at
dlf--
109
ferent
165.
Shut
N.
C.~
SlbleyT
paper
Ho,
E.
CACM
discusses
paper
Eo
two
"data
-
750
V.
for
On
the
structured" wlth
et
al.
R.
W.
759,
goals
Y.
CONVERT
Data
of
A
a
High
Conversion.
Level
CACM
18,
and
ACMv
Data
Definition
and
Mappin~
1973. a
data
definition
mapping
Equivalence
Workshop,
independence
A
Taylor~
philosophical
Sibley,
Lum,
Language
deflnitlon
H.
translation
the
16,
data
ACM S I G F I D E T
168.
and
The
The
and
to H o u s e l
Language°
Sibley~
C°,
1975.
lustrates
167.
B.
Deflnltion
-- S 6 7 ,
A companion
166.
performance.
improve
Housel~
Translation 5S7
%0
levels
New
of York~
or
"procedural" connection
and
il-
examples.
Data
Based
1~74
Systems.
1974o
directions~
its
by
language
"relational" (DBTG)
are
to d a t a
(Codd)
and
the
compared.
Also
data
restructuring
and
dafa
Dictionaries
for
Is d i s c u s s e d .
E.
H.y
and
Information
discussion
Sayanl,
Systems
of
the
H®
H.
Data
Interface.
need
for
and
Element
NBS-Report objectives
v 1974o of
a Data
Dictionary
capability.
169.
dissertation,
One
of
the
to
Data
Description
of
Pennsylvaniav
University
earller
data
definition
and
and
mapping
Conversion.
1971. languages°
See
Ramirez.
Smithy cal
P.
Approach
Do
D.
also
170 •
An
Smithy PHo
S.
Data
E.,
and
Base
Mommens,
Structures.
J. ACM
H.
Automatic
SIGMOD
Generation
1975
intl.
Conf.
of
Physi-
San
Jose~
from
des-
1975. A
criptive into
171.
172.
design a i d
prototype
input
account
S%ahl~
Fo
A.
AFIPS
NCC
Steel~
To
SIGMOD
1975
IMS
and
A Homophonic vol.
Data Intl.
described
physical
constralnts
Prec.
B~
is
42,
Base Conf.
data
for
- 568v
Standardization on
Mgmt.
generates
structure
objective
Cipher 565
which
of
def[nitlons
taklns
functions.
Computational
Cryptogvaby.
1973.
--
Datay
A
San
Status
Jose~
Report, 197S.
ACM
110
173.
Steuertt tem:
J. ~ a n d
Goldman~
A Perspective.
J®
!974
The
ACM
Relational
Data
SIGFIDET
Workshop,
RDMSv
system
Management ACM,
Sys-
New
York,
1874. An
in±roduc%ory
and
174.
175.
based
on
deecrlptlon
Codd's
Stonebraker~
M.
The
Indices.
IJCIS
See
Cardenas
else
3,
of
-- 1 8 8 ,
for
~esearch
Stonebraker,
M.
SIGFIDET
Workshop
The
paper
first
Partial
on
ACM,
the
unfortunately of
used
and
Inversions
%hls
View
Proe.~
analyzes
being
at
MIT
Combined
1974.
A Functional
which
It d e s c P l b e s
Choice
a
model.
167
ACM
approach,
of
relational
%ople.
of New
Data
problem ks
%he
types
~®
Implementation
not
data
Independence.
YorkT with
kept
1874
1974. a
promising
through
independence
up
%o
to be
form~l The
end.
provided
in
INGRES.
176.
Stonebraker7 Views San
177.
by
Jose,
The
also
et
Su~
Held
S.
1974
Y.
The
Data
tation~
which
Taylo~
W,
Constraints
1975
is
R,
have
it T which
their
Sharing
in
ACM~
York,
a
in
New
of
Intl.
Conf.
and Prec.,
in
more
detail.
Data
Base
Translation
a Ne±work
See
the
a
corresponding
deals
%o
the
of
IFIP
Work.
a conceptual datalogical data
Storage.
Arbor,
Conf.
1974. data
model
approach
forms.
hlanagemen%
Physical
Approach
Amsterdam,
internal
Base
Ann
data
of
with
Data
Infological
~olland~
a kind
Environment.
[974,
Proco
North
MichiganT
for
used
Semiautomatic
is
)(appln G
of
proposal being
See
inte~rity
Management
Generalized
and
a
to
1974.
approach
University
Contains
Integrity
SIGMOD
Foundation
Base
April
It m a y
R.
STructures
180.
Data
with
A
Data
Conceptual
Base.
associated
H.
Workshop,
[nfolo~ical
ments.
Lam,
Corsica,
Taylor~
approach
Achieving
philosophy.
t79.
and
B.
Cargese,
of
ACM
al.
SIGFIDET
Sundgren, to
INGRES
W.T
for
ACM
qodificaTlon.
19U5.
Describes
System
178.
Query
System Ph.
D.
Data
diseer--
1971,
definition
and
~[ichigan data
mapping
translation
languase, experi-
MeDten/Fry.
W®
Data
Administration
and
±he
DBTG
Report.
1974
ACM
111
SIGFIDET Among
others~
taln
181.
Workshop
data
Taylors Base
the
Ro
Cargese~
Wo7
TeichroewT Proc.
D,
of
ACMy
essential
of
about
slstance
183.
J°
Thomas~ by
given ple
185o
F. A
So
186.
Tslchritzls~
of
Turn~
R.I
Research
Development of
IFIP
of Work.
Amsterdamv a
data
in
on
paper:
the
J.
PJ
File
Data Conf,
1974°
base
at
a
user
Organtzatlon.
Informations
there
is
as
data
Storage
no a
and
have
absolutely
Proco
vol°
44~
to he
to be
439
with
of
made
of
as-
Query
197S.
subJects~ into
know-
wlth
Study
- 44ST
3S
translated
best
function
A Psychological
an experiment
of how
Interactions.
Van
der
417~
who
query
by
were exam-
Pool~
of
the
Toronto7 CoddWs
experimental
deverill~
R.
1969
Nail,
ACM
No
IBM
system
Framework
(i.eQ
o~
A.
Overview.
Technical
AFIPS
B° v and System,
UKSC
Peterleev
1975o
relational
Shapiroy
Dos±erty
Language
Technical
007S~
A Network
J.
Coy
1969.
A
UKSC
- Measures
der
-
networks
and
P°
Extensihle
PRTV:
P°
physical
Systems
188~
P°
description
University
187.
399
Report
Discusses
to
o~
B° v Lockemann~
J.
Technical A new
of
Changes
English
Rapidly
Proc*T
Todd~
In
NCC
of
oh--
to
Zloo~)o
REL:
Conf.
use
AFIPS
the
Proco
programsQ
this
Gould~
results
questions
Thompsont S.
and
On
on
Information.
computer.
preprocessor
a
1971~
in
future
use
Holland~
Symposium
Yorkv
the
197S
the
(see
SIGIR
the
C°t
Example°
Reports
184.
of
North evolution
Approach
New
W°
the
Impact
message
representation
D°
1974.
time.
1974.
is
1971
to
Managementv
April
An
the
Retrieval~
ledge
Base
Its
Yorkl
precomptle
Stemple~
~ concern and
New
proposes at
and
Corslca~
authors
The
author
Data
installation
182.
ACM)
Independence
Editions.
The
Proc.t
Eeport model
linked
Z.
for
Optimum
Relation
Implementation.
CSRG-49~
February
can
be
1975.
Implemented
on
top
structures)o
Privacy
Ef~ectlvenessy 1972
IS/1.
FJCC~
Storage
and
Security
Costs
and
vol.
41y
435
Allocation
in
Data
Bank
Protection--Intru-
444.
for
a
File
in
112
Steady
State.
Files
with
overflow
189.
Vose~
M.
Wang~
R.,
C.
Data
and
a
set
of
specify
minimal
without of
set
are
to
the
19,
cover
to
-- 7 7 ,
Inverted
Index
set
of
cover
which
relations
is
again
Given
third
Logical
calculates with
a
the
normal
in
1975.
algorithm,
given
CoddVs
rate
state.
Synthesis
71
dependencies.
in
with
steady
Approach
Segment
Dev®
a
and
overflow
!972.
minimal
tPansltive
melatlons
1973. {hashing)
for
An
H.B,
minimal
Each
S,
May
Res.
covers
38v
utilization,
given
J.
16,
J°
a
dependencies.
floss
Storage
Wedekind~ IBM
-
27
transformations
analyzed.
Bull.
and
17,
Dlv.
Richardson~
Design.
authors
tive
Res.
factors
Comp.
P.,
Base
The
are
relevant
Maintenance.
190 •
J.
key--to--address
areas
other
and
IBM
transl-
set
of
minimum
form
can
velacovert
easily
be
constructed.
H.
191.
Wedeklnd,
1£2.
Wedekind~
B.
Mannhelm,
1974.
193 •
Wedekind~ System. esev
W.
paperlS
tion
of
Wellis~
the
Base April
efficient
M.
E.~
Katke,
1117-
SIMS
is
interesting
data
normal
form
and
tion
been
paid
%o
Based
File
Organizations.
Each
query
queries. to
the
is In
Olsont
assumed
elementary
a
IFIP
in
Work.
Data
a
Base
Conf.,
Amsterdam,
analysis
J,t
number
and
Carg-
IB74.
for
Yang,
the
S.
System.
in
case
of and
mapped
determina-
C,
SIMS
AFIPS
to
the
reasons.
T.
a
FJCC
be
the
queries
a
14,
of
593
boolean
data and
high
language.
Canonical
CACM to
offers
-
an
1972,
ba@e access
and
597,
be
blgh Data hier-
atten-
programs.
In
Attribute
1971.
expression can
level
PartlculaP
data
Structure -
a
language.
conceptual
query
C.
I%
manipulation
transferability
Chiang~
this
and
Information
be
used
and
of
Holland~
mapping
go,
Paths
Access
Instltut
1872, for
may
1972.
paths.
definition,
archical
Wong,
North
W®,
1131,
files
has
of
Berlin,
Bibliographlsches
Proc.
modeling
User-Oriented
41~
on
1974.
access
vol.
level
Selection
is
Gruytem~
I.
Management,
concern
Integrateds
195.
On
Data
de
Datenbanksys%eme
Corsicav
The
194.
Datenorganlsatlon.
over
elementary
organized
according
becomes
essentially
the
113
problem
of
pu±%ing
a
boolean
expression
In%o
some
s%~ndard
~orm,
196.
Yao~
S.
±hrou~h
B.
Michlgan~
197.
Y u e 7 P. ondary also For
198.
The
basic
user
Wongt
C.
Selec%lon,
recen%
M.
-
frame
Op%Imlz~%ion Pho
D°
of F i l e
dlsser%a%ion~
Organization of
Universlty
K.
S%orage
IBM
Cos%
Research
Consldera%ions
Repot%
RC
5070~
in %o
Sec-
appear
IJCIS,
431
%hat
and
Index
other
M.
437,
userWs
and
Modeling.
1974.
C.~
in
Zloof,
Evalua%ion
Analy%ie
resul%s
Query
in
this
By
Example.
of
query
area
of
rese&~ch
197S
AFIPS
NCC
see
Cardenas,
Proc.
vol.
44,
1975.
features
pe~cep%ion
of
of
manipula%ing
of
reference
fills
da%a
example
processing
%ables
consis%ing
informa%ion.
by
in of
in
are %his
illustrated. query
a graphically table
skele%onsv
language
The is
pre--estebllshed in%o
which
%he
Fundamentals of the Storage Hierarchy
Claus Schünemann, IBM Böblingen
1. INTRODUCTION
The subject of this contribution is the concrete storage and addressing of data, assuming a hierarchically structured storage system. Where data base aspects are touched upon, they are viewed from the standpoint of the hardware implementation and primarily with regard to performance.

Present-day computer storage systems are already largely hierarchical. A distinction should be made, however, between two kinds of hierarchy. The first is characterized merely by graded capacities. The second is a strict hierarchy, in which random access is possible at every level and the data flow never skips a level. The combination of main memory and buffer storage is a strict hierarchy, and it was this combination that first brought the concept of a storage hierarchy to general attention [1]. The buffer storage (cache) is transparent to the machine architecture. It matches the speed of main memory to the still higher speed of the processor.

The sequence of main memory followed by magnetic disk storage must likewise be regarded as a strict hierarchy. Apart from program paging in virtual storage, this view has not so far been in the foreground. Disk storage has been treated, also by the machine architecture, more as an input/output device. Because of its long access time (including tape mounting), magnetic tape storage can no longer be counted as part of the hierarchy in the strict sense.
Recent automatic tape transport systems, such as the IBM 3850 cartridge storage, have shown approaches to integrating the large and inexpensive capacity of tape storage as a genuine top level of the data-flow hierarchy. Tape storage could, for example, be given the function of an archive, and disk storage that of a large-capacity working store. The contents of entire virtual disk packs would then be transferred automatically to the disk system on demand [2]. Figure 1 sketches the scheme of this hierarchy concept.

The weak point of the present storage hierarchy is the ratio of main memory to disk access times, which exceeds 1:10,000. This is the so-called access gap. Inserting drum storage or fixed-head disk storage in between does not change the situation significantly. As is well known, one therefore tries to bridge the disparity by switching between programs under multiprogramming. Processor and main memory speeds keep rising, while the access time of the mechanically operating mass storage stays the same. As a result, the degree of multiprogramming, the main memory size and the number of disk spindles must keep growing. This moves the system away from the cost optimum. In addition, the demands on the controlling operating system and its complexity increase, while its efficiency decreases.

The following sections classify the storage parameters for the entire spectrum of the hierarchy according to uniform criteria. These parameters are then used to discuss the performance of the hierarchy, with particular regard to the problem of the access gap. The requirements of data base operation are addressed briefly.
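The access-gap argument above can be made concrete with a deliberately idealized model. This is a sketch, not a model taken from the paper. It assumes that each program alternates a fixed CPU burst with one disk access, that disk accesses of different programs overlap perfectly (enough spindles, no queuing) and that switching programs costs nothing. The function names and the workload figures (5 ms of computation per 30 ms disk access) are hypothetical illustrations.

```python
def cpu_utilization(n_programs: int, compute_us: int, io_us: int) -> float:
    """Idealized CPU utilization with n_programs in main memory.

    Each program computes for compute_us microseconds, then waits io_us
    microseconds for a disk access.  Disk accesses are assumed to overlap
    perfectly and program switching is assumed to be free.
    """
    return min(1.0, n_programs * compute_us / (compute_us + io_us))


def required_programs(compute_us: int, io_us: int) -> int:
    """Smallest degree of multiprogramming that keeps the CPU fully busy."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(compute_us + io_us) // compute_us)


# Hypothetical workload: 5 ms of CPU work per 30 ms disk access.
print(required_programs(5_000, 30_000))   # 7
# A processor four times faster, same disk: the CPU burst shrinks to 1.25 ms.
print(required_programs(1_250, 30_000))   # 25
```

The disk time stays fixed while the CPU burst shrinks, so the required degree of multiprogramming grows almost in proportion to processor speed. This matches the trend described in the text. Real systems also pay for switching overhead and disk queuing, which this model ignores.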
2. TECHNOLOGY AND OPERATING PARAMETERS
Numerous technologies are known that exploit a wide variety of physical effects and lead to very different storage characteristics. The most widespread today are semiconductor technology and magnetic-film technology. Semiconductors are used for the fast electronic random-access matrix memories. Magnetic film is used for the slower and cheaper mass storage, mainly disk and tape storage. A further group has not yet reached broad product maturity: optical storage and storage operating with electron beams [3,4]. The various shift-register technologies, such as CCD (Charge Coupled Device) [5,6] or magnetic bubbles [7], are likewise taking only tentative steps in commercial use for the time being.

The specific modes of operation of the individual storage families are not discussed here. Instead, the whole storage spectrum is described uniformly by a set of invariant technological and operating parameters (Table 1). The two most important operating parameters, mean access time and cost per bit, are roughly reciprocal to each other. Together they determine the position of a technology within the overall spectrum. The diagram in Fig. 2 shows typical present-day values as a function of the most significant technology parameter, the number of bits per read/write station [8].
The access time has two components. The first is the access time in the narrower sense, a kind of dead time before the first bit is transferred. The second is the data transfer time. The transfer time depends on the chosen block length and on the data rate, which is given by the clock frequency and the internal bit width. Additional delays caused by the external transmission channel are included in the transfer time.

Modularity denotes the divisibility of a storage unit or hierarchy level into modules that each have their own parallel access. Modularity raises the access rate. The capacity for modular subdivision generally decreases as the technology parameter "bits per read/write station" increases. Where reading and writing are mechanically decoupled from data transport, the access rate can be raised further by overlapping. In the IBM 3850 tape cartridge storage, for example, the next cartridge is already being transported while the previous one is still in the read/write station. Further examples of asynchronous parallel operation are the configuration of several disk storage units in one data processing installation and the subdivision of main memory into modules that operate independently and in parallel.

The cost per bit is likewise determined primarily by the number of bits per read/write station. It also depends on the specific technological and design factors and on the general state of miniaturization. Fig. 3, for example, shows the historical development of bit density in magnetic disk storage. Accordingly, the figures in Fig. 2 are valid only for a particular point in time.
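The decomposition of access time described above, together with the gain from modularity, reduces to a few lines of arithmetic. This is a sketch under stated assumptions:

- The data rate is taken as clock frequency times internal bit width, with channel delays folded into that rate.
- The access-rate bound assumes that requests spread evenly over fully independent modules.
- The disk and memory figures (30 ms latency, 6.4 MHz bit-serial transfer, 4096-byte block, 1 microsecond module cycle) are hypothetical.

```python
def data_rate(clock_hz: float, width_bits: int) -> float:
    """Data rate in bytes per second: clock frequency x internal bit width."""
    return clock_hz * width_bits / 8


def access_time(latency_s: float, block_bytes: int, rate_bytes_per_s: float) -> float:
    """Access time = latency (dead time before the first bit) + block transfer time."""
    return latency_s + block_bytes / rate_bytes_per_s


def access_rate(modules: int, access_time_s: float) -> float:
    """Upper bound on accesses per second for `modules` independent modules
    (assumes requests are spread evenly and never conflict)."""
    return modules / access_time_s


# Hypothetical disk: 30 ms latency, bit-serial at 6.4 MHz, 4096-byte block.
rate = data_rate(6.4e6, 1)                  # 800,000 bytes/s
t_disk = access_time(0.030, 4096, rate)     # 0.030 s latency + 0.00512 s transfer
print(f"disk access: {t_disk * 1e3:.2f} ms")

# Hypothetical main memory split into 4 modules with a 1 microsecond cycle each.
print(f"memory: {access_rate(4, 1e-6):,.0f} accesses/s")
```

With these assumed numbers the latency dominates the disk access. Doubling the block length adds only about 5 ms to each access while doubling the data delivered.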
The relative positions of the technologies in Fig. 2, by contrast, should remain largely independent of the general state of the art, since progressive miniaturization benefits all technologies. In a balanced configuration, the storage capacity of each hierarchy level is roughly a reciprocal function of its cost per bit.

A further operating parameter is the reliability of the storage, i.e. the mean number of bits read per erroneous bit. Reliability depends on three factors:

- the natural freedom from defects of the medium;
- the degree to which good units are screened out;
- the effort spent on deliberate redundancy with subsequent error correction.

The defect density of the medium naturally decreases as its homogeneity increases. Typical reliability values for a new disk storage unit, after appropriate screening, are 10^9 without correction and 10^12 after correction.

The physical nature of the storage determines how volatile the recorded information is. A working store can tolerate a certain volatility with periodic refreshing. An archive or journal store, of course, must store data permanently. Closely related to volatility is the property of ON-line or OFF-line writing. Storage written OFF-line is commonly called ROM.
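The reliability figures quoted above for disk storage can be turned into expected error counts and mean times between errors. The figures are 10^9 bits read per erroneous bit before correction and 10^12 after. The sketch assumes that errors occur independently at a constant average rate. The 100-megabyte read volume and the 6.4 Mbit/s transfer rate are hypothetical values chosen only for illustration.

```python
def expected_bit_errors(bits_read: float, bits_per_error: float) -> float:
    """Expected number of erroneous bits when reading bits_read bits."""
    return bits_read / bits_per_error


def seconds_between_errors(bits_per_error: float, bit_rate: float) -> float:
    """Mean time between erroneous bits during continuous reading."""
    return bits_per_error / bit_rate


BITS_READ = 100e6 * 8      # reading 100 megabytes (hypothetical)
BIT_RATE = 6.4e6           # bits per second (hypothetical)

for label, reliability in (("raw", 1e9), ("corrected", 1e12)):
    print(label,
          expected_bit_errors(BITS_READ, reliability),
          seconds_between_errors(reliability, BIT_RATE))
```

At the assumed rate, continuous reading would hit an uncorrected error about every two and a half minutes. After correction, this falls to about once every 43 hours.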
Bei verschiedenen Anwendungen,
kumenten mit geringer ~nderungsfrequenz,
z.B. Speicherung yon Do-
kann der ROM-Speicher durchaus
sinnvoll und, da entsprechend billig, von Interesse sein. Ein Obergang zwischen dem normalen schreibbaren Speicher und dem ROM stellt der PROM bzw. EAROM (Programmable bzw. Electrically Alterable Read Only Memory) dar. Der ROM-Speicher wird bier nicht weiter behandelt. Der letzte Operationsparameter
ist die adressierbare Einheit, die im
Verein mit der eigentlichen Zugriffszeit die Komplexit~t der Zugriffsmethode und Effizienz des Datensuchens bestimmt. Man unterscheidet zwischen Orts- und Inhaltsadressierung. sierung ist auf Hauptspeicherebene
Die Ortsadres-
die dominierende Adressierungsart:
Die physische Lokation jedes Datenelementes ist vom Programm definiert und wird Ober die Adresse direkt gefunden. Dieses Konzept ist auf den h6heren Speicherebenen f~r das Aufsuchen yon Datens~tzen nicht mehr zweckm~6ig, wenn die S~tze z.B. in Form einer Datenbank organisiert,
118
programmunabh~ngig und vielen Benutzern verf~gbar sein sollen. Sie m~ssen also letztlich durch ihren Inha!t, gegeben durch ein oder mehrere Merkmale, gekennzeichnet sein. Innerhalb eines Satzes sind die Daten im allgemeinen wieder formatiert, d.h. ihre semantische Bedeutung ist durch ihren relativen Ort bestimmt. Die heutige Suchtechnik bei inhaltsadressierten Datens~tzen bedient sich Indextabellen,
in denen z~B. die Hauptmerkmale numerisch oder alphabe-
tisch geordnet und die reale Speicheradresse direkt zugeordnet ist. Beim Vorliegen weiterer
(Neben-) Merkmale k6nnen diese in eigenen Ta-
bellen gelistet werden, wobei die Speicheradressen aller S~tze, die dieses Merkmal enthalten, wieder zugeordnet werden. Mit diesen invertierten Listen kann bekanntlich der Prozess des Suchens nach mehrfachen Merkmalen schnell, d.h. ohne alle S~tze sequentiell prozessieren zu m~ssen, durchgef~hrt werden. Mit Hilfe der Indextabellen wird also die Inhaltsadresse eines Datensatzes
in eine Ortsadresse umgewandelt.
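The index-table and inverted-list scheme just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a model of any particular system: the record layout, the attribute names `city` and `dept`, and the integer "location addresses" are all hypothetical. A multi-key query intersects the inverted lists, so no record has to be read sequentially.

```python
from collections import defaultdict

# Hypothetical records, keyed by an illustrative location address.
records = {
    100: {"key": 7, "city": "Boeblingen", "dept": "R&D"},
    200: {"key": 3, "city": "Bad Homburg", "dept": "Sales"},
    300: {"key": 5, "city": "Boeblingen", "dept": "Sales"},
    400: {"key": 9, "city": "Boeblingen", "dept": "R&D"},
}

# Primary index table: primary key -> location address, kept in key order.
primary_index = dict(sorted((rec["key"], addr) for addr, rec in records.items()))


def build_inverted_list(attribute):
    """Secondary-key table: attribute value -> set of location addresses."""
    table = defaultdict(set)
    for addr, rec in records.items():
        table[rec[attribute]].add(addr)
    return table


inverted = {attr: build_inverted_list(attr) for attr in ("city", "dept")}


def search(**criteria):
    """Addresses of all records matching every criterion, using only the lists."""
    address_sets = [inverted[attr].get(value, set()) for attr, value in criteria.items()]
    return sorted(set.intersection(*address_sets)) if address_sets else []


print(primary_index[5])                       # content address -> location address: 300
print(search(city="Boeblingen", dept="R&D"))  # [100, 400]
```

Each additional key only adds one more set to intersect, which is why the inverted lists scale to multi-key queries without touching the records themselves.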
The location address is then reached quickly and directly in random-access storage. Searching the index tables for the desired key, however, is itself again a process with a sequential sequence of steps. A further step toward parallelism would be to store the index tables in associative memories, with the following advantages:
- The numerical or alphabetical ordering of the keys is no longer needed, so updating becomes simple: new index entries are added or removed directly.
- Inverted lists are no longer needed, since the memory can associate on multiple keys simultaneously.
- The search is direct and simultaneous instead of sequential.
The fact that an associative memory requires the data to be formatted would not be a drawback in this case. A special case of location addressing is addressing by pointers.
Pointers also decouple the user program from the data address; their drawback is that the pointer chain has to be traversed sequentially.
The individual storage technologies differ in the size of the unit that is addressable in hardware. This unit is, e.g., one byte for (semiconductor) matrix storage, about 10-20 KBytes for disk storage, and millions of bytes for conventional tape storage. If this addressable unit is equal to or smaller than the desired block length to be transferred, we speak of random access. Disk storage has only semi-random access, since its addressing unit (the track) is many times larger than a convenient logical record length or than the optimal block length for this hierarchy level; the specific block must then again be searched for sequentially along the track.
The so-called access methods, i.e., the practical procedures for locating records, reflect the underlying technological addressing parameters. One example is the index-sequential access method for "direct random" access to disk storage: the primary keys of the records are kept in an index table in ascending order, and the table assigns the corresponding track address on the disk to each group of records. The records themselves are sorted in the same order, so that sequential access avoids paying the long access time for every individual record. As the disk rotates, the keys read from the track are compared with the search key until a match is found. On updating, e.g., when a further record is inserted into a possibly gapless physical sequence of records, a pointer refers to a new track address on an overflow track. The method thus combines the search elements index table, sequential search, and pointer technique into a procedure adapted to the specific conditions of disk storage, Fig. 4a.
In another store that also has a homogeneous medium, the electron-beam store, the addressing unit can be chosen freely between one byte and tens of thousands of bytes. The access procedure can be purely index-oriented and correspondingly simple: there is no sequential search and no overflow problem. Thanks to the short actual (electronic) access time, sequential ordering of the records is unnecessary, and a record can be stored at any location, Fig. 4b. The larger addressing unit, i.e., the lower degree of "random access", of the low-cost technologies is not a fundamental drawback in itself, since data are transferred in blocks within a hierarchy anyway. A gradual drawback arises only when, as with disk storage, the optimal block length and the technological addressing unit do not coincide. This discrepancy then shows up in elaborate and time-consuming "access methods".
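A toy model of the index-sequential method of Fig. 4a may make the interplay of index table, sequential track search, and overflow pointers concrete. The class `IndexSequentialFile`, its track capacity of three records, and the choice to move a full track's highest record to the overflow area are illustrative assumptions; real implementations differ in many details.

```python
import bisect


class IndexSequentialFile:
    """Toy index-sequential file: sorted tracks, an index table, overflow chains."""

    def __init__(self, sorted_records, track_capacity=3):
        self.track_capacity = track_capacity
        # Records (key, data) are laid out in key order, track_capacity per track.
        self.tracks = [sorted_records[i:i + track_capacity]
                       for i in range(0, len(sorted_records), track_capacity)]
        # Index table: highest key on each track when the file was loaded.
        self.index = [track[-1][0] for track in self.tracks]
        # Overflow area: entries [key, data, pointer to next entry or None].
        self.overflow = []
        self.overflow_head = [None] * len(self.tracks)

    def _home_track(self, key):
        # Index lookup: first track whose highest key is >= key (last track otherwise).
        return min(bisect.bisect_left(self.index, key), len(self.tracks) - 1)

    def insert(self, key, data):
        t = self._home_track(key)
        track = self.tracks[t]
        bisect.insort(track, (key, data))
        if len(track) > self.track_capacity:
            # Track is full: push its highest record to the overflow area and
            # link it at the head of this track's pointer chain.
            k, d = track.pop()
            self.overflow.append([k, d, self.overflow_head[t]])
            self.overflow_head[t] = len(self.overflow) - 1

    def search(self, key):
        """Return (data, comparisons), or (None, comparisons) if absent."""
        t = self._home_track(key)
        comparisons = 0
        for k, d in self.tracks[t]:          # sequential search along the track
            comparisons += 1
            if k == key:
                return d, comparisons
        p = self.overflow_head[t]            # then follow the overflow pointers
        while p is not None:
            k, d, p = self.overflow[p]
            comparisons += 1
            if k == key:
                return d, comparisons
        return None, comparisons


f = IndexSequentialFile([(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f")])
f.insert(25, "x")      # track 0 is full, so record 30 moves to the overflow area
print(f.search(25))    # ('x', 3)
print(f.search(30))    # ('c', 4): three comparisons on the track, one in the chain
print(f.search(45))    # (None, 3)
```

The comparison counts show the cost structure: the index narrows the search to one track, comparisons along the track are sequential, and every overflow record adds further steps along its pointer chain. An electron-beam store as in Fig. 4b would need only the index lookup.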
3. STORAGE HIERARCHY
Besides storing data, the task of a storage system is to supply the processor with the data it needs quickly enough and in the requested quantity per unit of time. Analogous to the system performance parameters response time and throughput, storage performance can be defined by the parameters access time and access rate. If a store permits only one access at a time, the access rate is approximately the reciprocal of the access time. If several accesses can proceed simultaneously, i.e., with a modularity greater than 1, the maximum access rate increases accordingly. How far the maximum access rate can actually be exploited depends on parameters such as system control, program profile, degree of multiprogramming, number of parallel processors, etc. In a hierarchy, a certain basic modularity of the individual levels is desirable if only to allow simultaneous data traffic upward and downward.
At the main-memory level, for example, the control achieves this through the independent operation of processor and channels. For effective multiprogramming, sufficient modularity of the disk storage level is an essential prerequisite. The purpose of multiprogramming is to increase the resulting access rate, measured at the interface to the processor, and thus the system throughput.
Nevertheless, as is well known, the throughput bottleneck of today's data processing systems still lies in the access time and access rate of disk storage. Since further speed improvements can be expected for processors and semiconductor memories, whereas disk access time can hardly be improved any further, this problem will become ever more pressing. A solution through a further increase in the degree of multiprogramming, i.e., in the number of concurrently running programs, with a corresponding increase in main-memory size and disk modularity, appears impractical for reasons of cost and complexity. Moreover, efficiency suffers when the degree of multiprogramming is too high: system management overhead grows relative to productive work, the chance of serving several accesses from a single disk-arm position decreases, and so on. Another solution to this problem is to extend the storage hierarchy concept further while keeping the degree of multiprogramming limited.
The (unrealizable) ideal store, i.e., a store with the access time of the buffer store and the cost of tape storage, can be approximated by a balanced hierarchy with sufficiently fine gradation. Fortunately, technological development promises storage products that, in performance and cost, fill precisely the region of the "gap" and thus fit well into the spectrum. Possible technologies for the "gap" include the CCD shift-register store, the shift-register store with movable magnetic bubbles, and the electron-beam storage tube, Fig. 5. In the following, these technologies are called electronic mass storage.
3.1 Hierarchy Mechanism
The storage hierarchy thus consists of a series of storage levels in which access time and storage capacity increase with the level number. On a storage access, the processor first tries to find the data on the lowest, fastest level. If this fails, the next level is accessed, and so on. When data are transferred to the next lower level, not just the requested word or byte but an entire block is transferred. A part of the block is deposited on each lower level.
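The lookup-and-transfer mechanism can be sketched as follows. This is a simplified model under explicit assumptions: levels are unbounded sets of block numbers (no replacement yet), the topmost archive level is assumed to hold everything, and, unlike the text, the whole block rather than only part of it is copied into every faster level. The names `make_hierarchy` and `access` are hypothetical.

```python
def make_hierarchy(n_levels):
    """Levels 0..n_levels-2 as sets of block numbers; the archive on top is implicit."""
    return [set() for _ in range(n_levels - 1)]


def access(levels, address, block_size):
    """Return the level at which `address` is found (len(levels) = archive).

    The search starts at level 0, the fastest. On a miss, the whole block
    containing the address is copied into every faster level, not just the
    requested word. Capacity limits and replacement are ignored here.
    """
    block = address // block_size
    found = len(levels)                 # default: only the archive has it
    for i, level in enumerate(levels):
        if block in level:
            found = i
            break
    for faster in levels[:found]:
        faster.add(block)
    return found


levels = make_hierarchy(3)              # e.g. buffer, main memory, archive
trace = [access(levels, a, block_size=4) for a in range(8)]
print(trace)                            # [2, 0, 0, 0, 2, 0, 0, 0]
```

With sequential addresses, one slow access per block is followed by three hits on the fastest level, which is the look-ahead effect of block transfer.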
For the block lengths chosen, the transfer time is usually small compared with the actual access time. The essence of the storage hierarchy is thus that, at the cost of slightly more access time (namely including transfer time), entire blocks of data or programs are transferred on the assumption that part of them will be requested for processing in the near future anyway. This is an anticipatory access (look-ahead) that exploits the transfer time, which is short compared with the actual access time. The mechanism is supported by the fact that data are often accessed several times within a short period, e.g., in program loops, but also when operating on frequently used working data.
The hit rate, i.e., the probability of finding the data on the level being accessed, increases with storage capacity and generally also with block length; it naturally also depends on the particular data and program profile. In the simplest case, space on a full hierarchy level is freed in a self-regulating way by common algorithms such as FIFO or LRU (Least Recently Used). This mechanism can, of course, be supported by keeping certain frequently used data, such as index tables and catalogs, resident on lower, fast levels, and correspondingly by keeping parts of the operating system resident in main memory. On the higher levels, where every single access weighs heavily in the performance balance, control is implemented in software and is correspondingly more "intelligent".
Conceptually, address control and the search for data on a level could most simply be handled through one address space spanning the entire storage system. Each hierarchy level would then contain a table mapping the virtual overall storage address to the local level address. Because of the historical development, a storage hierarchy usually has several address spaces: at the buffer-store level, the real main-memory address is mapped to a particular location in the buffer store; at the main-memory level, the address, which today is usually virtual and thus already spans a larger address space, is mapped to the real address. On the content-addressed higher hierarchy levels, the index tables mentioned earlier take over the task of locating data: logical and hierarchy-specific searching become identical. The mapping tables are stored either on the same level or on lower (fast) levels; for the buffer store, the table is held in a separate, more or less associatively operating memory.
The entire data processing system can thus be pictured as the combination of an archive store holding all data on-line, for example a magnetic tape store with automatic tape handling, and a processor system, which in turn consists of the actual processor and a hierarchy of working stores. The various technology and control parameters, some of which were discussed in the previous section, vary along the hierarchy axis as sketched in Fig. 6.
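How the hit rate of a single level depends on its capacity and on the replacement algorithm can be explored with a small simulation. The reference trace below is invented, imitating a program loop that keeps touching two hot blocks; block length is held fixed, so only the capacity dependence and the FIFO/LRU difference are shown.

```python
from collections import OrderedDict


def hit_rate(trace, capacity, policy="LRU"):
    """Fraction of block references that hit on a level holding `capacity` blocks.

    When the level is full, the block at the front of `resident` is evicted:
    with FIFO that is the block loaded earliest, with LRU the block used
    least recently (a hit moves the block to the back).
    """
    resident = OrderedDict()
    hits = 0
    for block in trace:
        if block in resident:
            hits += 1
            if policy == "LRU":
                resident.move_to_end(block)
            continue
        if len(resident) >= capacity:
            resident.popitem(last=False)
        resident[block] = None
    return hits / len(trace)


# Invented trace: a loop that keeps touching blocks 0 and 1, plus others.
trace = [0, 1, 2, 0, 1, 3, 0, 1, 4, 0, 1, 2]
for capacity in (2, 3, 5):
    print(capacity, round(hit_rate(trace, capacity), 3))
print("FIFO, 3:", round(hit_rate(trace, 3, "FIFO"), 3))
# 2 0.0
# 3 0.5
# 5 0.583
# FIFO, 3: 0.333
```

On this trace the hit rate rises with capacity, and at capacity 3 LRU keeps the hot blocks resident while FIFO evicts them merely because they were loaded first.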
3.2 Performance Considerations
The most important criterion for the storage hierarchy is the total access time or total access rate, both in absolute terms and relative to cost. These relationships are discussed below using a very simple model. The model is based on "typical" values of the various parameters and extrapolates where data are unknown. As the technology diagram, Fig. 2, already indicates, there appears to be a simple, natural law relating cost per bit to access time, the variable along the spectrum. This relationship, together with the assignment of hit rate and storage capacity to access time, is plotted in the model-parameter diagram, Fig. 7. The capacity distribution curve is assumed to be a straight line (on a logarithmic scale) with the buffer store and the archive store as end points. The chosen archive capacity is 10^12 bits, the buffer capacity 200 Kbits. The points on the line for main memory and disk storage correspond roughly to real values. In principle, the capacity distribution curve can be chosen freely within the technologically available spectrum; with growing processor performance and data volume it will shift upward. For the hit rate in multiprogrammed batch operation, some empirical data are available for the buffer-to-main-memory range as a function of capacity and block length [9]. Typical values from these data were used for the model curve, which was extrapolated toward the upper hierarchy levels.
The model does not take into account the mutual dependencies among block length, access time, hit rate, degree of multiprogramming, etc., but assumes fixed typical values. The total access time is
$$t_{ges} = t_1 + (1-h_1)\,t_2 + (1-h_2)\,t_3 + \dots + (1-h_{n-1})\,t_n \qquad \text{(Eq. 1)}$$
where $t_n$ is the access time and $h_n$ the hit rate of the n-th level.
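Eq. 1 translates directly into code. The numbers in the example are illustrative: 50 ns and 40 ms are the buffer and disk access times shown in Fig. 1, while the 1 µs main-memory time and both hit rates are assumptions. Even at a 99.99% hit rate below the disk, the disk term dominates the total.

```python
def total_access_time(times, hit_rates):
    """Eq. 1: t_ges = t_1 + (1-h_1) t_2 + (1-h_2) t_3 + ... + (1-h_{n-1}) t_n.

    times[i] is the access time of level i+1 and hit_rates[i] is h_{i+1};
    the top level is assumed to hold all data, so h_n is not needed.
    """
    if len(hit_rates) < len(times) - 1:
        raise ValueError("need a hit rate for every level except the top one")
    total = times[0]
    for h_prev, t in zip(hit_rates, times[1:]):
        total += (1 - h_prev) * t
    return total


# Buffer 50 ns and disk 40 ms as in Fig. 1; main memory 1 us and hit rates assumed.
t_us = total_access_time([0.05, 1.0, 40000.0], [0.95, 0.9999])
print(round(t_us, 3))  # 4.1 (microseconds), of which about 4.0 comes from the disk
```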
The maximum total access rate, i.e., the access flow at the interface to the processor, is
$$\max Z_{ges} = \frac{1}{\dfrac{t_1}{p_1} + \dfrac{1-h_1}{p_2}\,t_2 + \dots + \dfrac{1-h_{n-1}}{p_n}\,t_n} \qquad \text{(Eq. 2)}$$
where $p_n$ is the access parallelism of the n-th level, which corresponds roughly to its modularity. It is assumed that 50% of the access parallelism translates into a genuine increase of the access rate through multiprogramming, i.e., $p_{eff} = 0.5\,p$; further, that below the disk storage level program switching no longer pays off ($p = 1$); and finally, that the system runs with a single processor. Eq. 2 is modified accordingly. Some model results based on real technologies are compiled in Table II.
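Eq. 2 together with the multiprogramming assumption can be sketched the same way. The helper `effective_parallelism` encodes one reading of the model: p_eff = 0.5 p on levels where program switching pays off (here only the disk, with an assumed modularity of 8) and p = 1 elsewhere. Times are in µs, so the result is in units of 10^6 accesses per second; all numbers are illustrative, not Table II values.

```python
def max_access_rate(times, hit_rates, parallelism):
    """Eq. 2: maximum total access rate seen at the processor interface.

    max Z_ges = 1 / (t_1/p_1 + (1-h_1) t_2/p_2 + ... + (1-h_{n-1}) t_n/p_n)
    """
    denominator = times[0] / parallelism[0]
    for h_prev, t, p in zip(hit_rates, times[1:], parallelism[1:]):
        denominator += (1 - h_prev) * t / p
    return 1 / denominator


def effective_parallelism(modularity, switching):
    """Model assumption: where program switching pays off, only half of the
    modularity turns into extra access rate (p_eff = 0.5 p); elsewhere p = 1."""
    return [0.5 * p if sw else 1 for p, sw in zip(modularity, switching)]


times_us = [0.05, 1.0, 40000.0]   # buffer, main memory, disk (illustrative)
hit_rates = [0.95, 0.9999]        # assumed h_1, h_2

single = max_access_rate(times_us, hit_rates, [1, 1, 1])
multi = max_access_rate(times_us, hit_rates,
                        effective_parallelism([1, 1, 8], [False, False, True]))
print(round(single, 3), round(multi, 3))  # 0.244 0.909
```

Dividing the disk term by p_eff = 4 is what nearly quadruples the rate here, which is the mechanism by which multiprogramming raises throughput in the model.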
Different storage access rates lead to different processor utilization. The model processor chosen executes 2 MIPS (million instructions per second) with an average of 2 accesses per instruction. This processor can deliver its full performance only if the storage system permits 4 million accesses per second. The poor utilization of this 2-MIPS processor in today's configuration without multiprogramming is not surprising. Even with multiprogramming, utilization is only moderate. Only the introduction of electronic mass storage brings it up to a reasonable order of magnitude.
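The utilization argument is simple arithmetic: a 2-MIPS processor making 2 accesses per instruction needs 4 × 10^6 accesses per second. The sketch below assumes, as a simplification, that utilization is the delivered access rate divided by this demand, capped at 100%; the three access rates are arbitrary examples, not the Table II values.

```python
def processor_utilization(access_rate_per_s, mips=2.0, accesses_per_instruction=2.0):
    """Share of the processor's nominal speed that the storage system can sustain.

    A processor executing `mips` million instructions per second with
    `accesses_per_instruction` storage accesses each needs
    mips * 1e6 * accesses_per_instruction accesses per second; if the
    hierarchy delivers fewer, the processor waits (capped at 100%).
    """
    demand = mips * 1e6 * accesses_per_instruction
    return min(1.0, access_rate_per_s / demand)


for rate in (1e6, 2e6, 5e6):
    print(f"{rate / 1e6:.0f} M accesses/s -> {processor_utilization(rate):.0%}")
# 1 M accesses/s -> 25%
# 2 M accesses/s -> 50%
# 5 M accesses/s -> 100%
```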
With multiprogramming, the access-rate bottleneck now shifts from disk storage (with its high modularity) to tape storage. This bottleneck could be overcome by increasing the number of hierarchy levels further, specifically by inserting an intermediate level between disk and tape storage. Technologically, such a level is within sight, namely through a modification of the conventional disk store into a set of flexible disks with very high volumetric bit density [9]. The access rate of the hierarchy configuration then exceeds 4 million per second.
The results of Table II raise the question of the optimal gradation of the hierarchy with the end points held fixed.
For this analysis, a uniform gradation is assumed without reference to real technologies, and the number of levels is varied; multiprogramming is not considered. The results are plotted in Fig. 8: at about 16 levels the access rate reaches a saturation value (which in this simple case is the reciprocal of the mean access time). This access rate is only about 2 times lower than that of the pure buffer store level. Fig. 8 also shows the price-performance figure, namely access rate per total bit cost. Here the optimum lies at about 8-10 levels; the improvement over a 4-level hierarchy is more than a factor of 6. Based on the more realistic data of Table II, the gain from a step from today's 4 levels to the 6 levels examined is considerably higher still, since no uniform gradation was assumed there.
A further advantage of a finer hierarchy gradation is an improved processor "efficiency": the number of accesses to disk and tape storage decreases, so the number of instructions executed (in access routines) per access to the storage hierarchy also decreases, and the processor "efficiency" increases. Finally, the operating system can be kept simpler. The reliability aspect, which becomes more critical as the number of levels grows, is not included in this model. Likewise, the costs of controllers, address tables, etc. are not taken into account, and the extrapolation of the hit-rate curve is entirely hypothetical. All this notwithstanding, the model results may be taken as an indication that a finer hierarchy gradation still holds considerable performance potential.
4. STORAGE ASPECTS OF DATA BASE OPERATION
Data base operation can in principle also be fitted into the model considered so far. The parameter that may change (toward less favorable values) is the hit rate, particularly on the high levels. Experience with this still has to be gained, so the model values are retained here, especially since a certain "neighborhood" relationship between queries can be expected in a data base as well. In practical terms, one could picture a distribution of functions across the individual hierarchy levels as sketched in Table III. Data groups with a high professional access rate must be moved from the archive level to the disk storage level and kept resident there.
Besides the data volume, the specific performance parameter of a data base is the permissible query rate, which should rise as data base capacity grows. The following rough calculation may serve as an illustration: according to Table II, the model access rate for today's hierarchy with multiprogramming is 0.85 M/s. Assuming an average program path of 100 K instructions per data base query, the system would allow 4.25 queries per second. This value is unlikely to be sufficient for a data base capacity of 10^12 bits. After the introduction of electronic mass storage, the query rate rises to 14 per second, and with an additional intermediate level between disk and tape storage to about 30 per second, provided a processor performance of about 3 MIPS is available. The question ultimately of interest, namely how many terminals can be connected to a data base of this size with satisfactory service, naturally depends on the mean query load per terminal. Assuming a mean load of one query per terminal per minute, the number of terminals comes to 30 · 60 = 1800. This connection capacity per 10^12 bits of data base capacity appears sufficient.
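The rough calculation can be reproduced in a few lines. The figure of 2 storage accesses per instruction is taken over from the model processor of Section 3.2; with it, 0.85 × 10^6 accesses/s and 100 K instructions per query give the 4.25 queries per second quoted above. The function names are illustrative.

```python
def query_rate(access_rate_per_s, instructions_per_query=100_000,
               accesses_per_instruction=2):
    """Queries per second the storage system can sustain."""
    return access_rate_per_s / (instructions_per_query * accesses_per_instruction)


def terminals_supported(queries_per_s, queries_per_terminal_per_min=1):
    """Terminals served if each issues the given number of queries per minute."""
    return queries_per_s * 60 / queries_per_terminal_per_min


print(query_rate(0.85e6))        # 4.25
print(terminals_supported(30))   # 1800.0
```

Read the other way round, 14 and 30 queries per second correspond to about 2.8 × 10^6 and 6 × 10^6 accesses per second, and the latter needs a 3-MIPS processor at 2 accesses per instruction.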
As a conclusion from these considerations, we state that the organization and technology of future storage systems have the potential to meet the performance requirements of widespread data base operation.
References
[1] C.W. Pugh, "Storage Hierarchies: Gaps, Cliffs and Trends", IEEE Transactions on Magnetics, Vol. MAG-7, No. 4, Dec. 1971
[2] C. Johnson, "IBM 3850 - Mass Storage System", Nat. Comp. Conf. 1975, p. 509
[3] J. Kelly, "The Development of an Experimental Electron-Beam-Addressable Memory Module", Computer, February 1975
[4] W.C. Hughes et al., "BEAMOS, A New Electronic Digital Memory", Nat. Comp. Conf. 1975, p. 541
[5] G.F. Amelio, "Charge-Coupled Devices for Memory Application", Nat. Comp. Conf. 1975, p. 515
[6] W.S. Boyle et al., "Charge-Coupled Devices - A New Approach to MIS Device Structures", IEEE Spectrum, July 1971, p. 18
[7] A.H. Bobeck et al., "A New Approach to Memory and Logic: Cylindrical Domain Devices", Proc. AFIPS Conf., Vol. 55, 1969
[8] R.R. Martin et al., "Electronic Disks in the 1980's", Computer, February 1975, p. 24
[9] D.H. Gibson, "Considerations in Block-Oriented Systems Design", AFIPS Proc., Vol. 30, SJCC 1967, pp. 75-80
TABLE I: STORAGE PARAMETERS
TECHNOLOGY PARAMETERS
- STORAGE MEDIUM (HOMOGENEITY, BIT DENSITY)
- BITS PER READ/WRITE STATION (MATRIX/SEQUENTIAL ARRANGEMENT)
- DATA TRANSPORT
OPERATIONAL PARAMETERS
- ACCESS TIME
- TRANSFER TIME = F(TRANSFER WIDTH, CLOCK FREQUENCY), BLOCK LENGTH
- MODULARITY -> ACCESS RATE
- COST PER BIT -> CAPACITY
- RELIABILITY
- VOLATILITY
- ADDRESSABLE UNIT (BYTE/BLOCK ADDRESSING)
TABLE II: MODEL HIERARCHY PERFORMANCE PARAMETERS (processor 2 MIPS, 2 accesses/instruction)
Configurations: P+H+SP+B; P+H+SP+B multiprogr.; P+H+E+SP+B; P+H+E+SP+B multiprogr.; P+H+E+SP+F+B multiprogr.
Columns: t [µs]/p_eff for each level (P, H, E, SP, FP, B); t_ges [µs]; max. Z_ges [10^6/s]; total cost; processor utilization [%]
P = buffer store, H = main memory, E = electronic mass storage, SP = rigid disk, FP (F) = flexible disk, B = tape
TABLE III: FUNCTION ALLOCATION IN DATA BASE OPERATION
Level | Technology | Typ. capacity | Function
1 | Bipolar buffer store | 4-16 KBytes | Fast working store for combining data with programs
2 | FET main memory | 10^5-10^7 B | Provision of programs and data for a manageable period of operation
3 | Shift-register or electron-beam store | 10^7-10^9 B | Holding frequently used programs, e.g., the operating system, and working data, e.g., index tables, descriptors, catalogs, pointer networks, etc.
4 | Disk storage | 10^8-10^10 B | Files for professional use, data backup
5 | Tape storage (automatic tape handling) | 10^10-10^13 B | Document data base, data backup, archiving
Fig. 1: Storage hierarchy today (tape storage with manual loading, about 10 s; tape storage with automatic loading; disk storage, about 40 ms; main memory, µs range; buffer store, 50 ns; control channels)
Fig. 2: Operational parameters as a function of technology parameters (bit cost at market prices in cts/bit, mean access time, and addressable unit plotted against bits per read/write station, 10^2 to 10^12; technologies from matrix to sequential arrangement: BIP, FET, bubbles, electron-beam tube, disk, automatic tape; annotations: homogeneity of the medium, electronic/mechanical data transport)
Fig. 3: Disk storage bit density versus year, 1960-1980 (areal bit density in bits/inch², linear bit density in bits/inch, track density in tracks/inch; data points IBM 2311, 2314, 3330-001, 3330-002, 3340, CDC 9762)
Fig. 4: Addressing systems. A) Disk storage: index table, direct address to the data track, sequential search along the track, overflow pointer to a record on an overflow track. B) Electron-beam storage: index entries 2, 3 and 5 point directly to the addresses X, Y and Z of records 2, 3 and 5.
Fig. 5: Technology overview (without optical technologies): storage capacity in bits (10^2 to 10^14) versus access time in s (10^-8 to 10^2), showing magnetic tape storage (automatic), magnetic disk, electron beam, and the "gap"
Fig. 6: Parameter trends over the hierarchy spectrum (levels from the processor to the data store: 1 BIP, 2 FET, 3 shift register/electron beam, 4 disk, 5 automatic tape; address tables at levels 2 to 4). Rising toward the data store: homogeneity of the medium, mechanical data transport, addressing unit, block length, access time, control effort (software), capacity, hit rate. Rising toward the processor: modularity, bit cost, data volatility, electronic data transport, data rate (clock frequency × bus width), hardware control.
Fig. 7: Model parameters: bit cost (cts/bit), 1-h, storage capacity (bits), and parallel accesses plotted against access time (s, 10^-8 to 10^2) for the levels P (buffer store), H (main memory), E (electronic mass storage), SP (rigid disk), FP (flexible disk), B (tape)
Fig. 8: Model results for uniform gradation (log scale): access rate (10^6/s) and access rate per total bit cost versus number of levels (2 to 18)
System R: A Relational Data Base Management System
Morton M. Astrahan, IBM Research Laboratory, San Jose, California
Donald D. Chamberlin, IBM Research Laboratory, San Jose, California
W. Frank King, IBM Research Laboratory, San Jose, California
Irving L. Traiger, IBM Research Laboratory, San Jose, California
INTRODUCTION
System R is a data base management system which provides a high-level, non-procedural relational data interface. The system provides a high level of data independence by isolating the end user as much as possible from underlying storage structures. The system permits definition of a variety of relational views on common underlying data. Data control features are also provided, including authorization, integrity assertions, triggered transactions, a logging and recovery subsystem, and facilities for maintaining data consistency in a shared-update environment.
The relational model of data was introduced by Codd [1] in 1970 as an approach toward providing solutions to the various outstanding problems of current data base management systems. In particular, Codd addressed the problem of providing a data model or view which is divorced from various implementation considerations (the data independence problem) and also the problem of providing the data base user with a very high-level, non-procedural data sublanguage for accessing data. It should be stressed here that the relational model is a framework or philosophy for finding solutions to these and other problems in data base management; the relational approach is thought to make solutions more elegant and perhaps simpler, but the approach by itself does not solve these problems.
With this caveat in mind, our first purpose is to briefly describe a related set of data base problems which we are attempting to solve in a coherent way following the relational approach. Our solutions are embodied in an experimental prototype data management system called System R, which is currently being designed, implemented, and evaluated at the IBM San Jose Research Laboratory. We wish to emphasize that System R is a vehicle for research in data base architecture and is not available as a product. Furthermore, the ideas discussed in this paper should not be considered as having product implications.
To a large extent, the acceptance and value of the relational approach hinge on the demonstration that a system can be built which is operationally complete (can actually be used in a real environment to solve real problems) and has performance at least comparable to today's existing systems. With the present state of systems performance prediction, the only credible demonstration is to actually construct such a system and to evaluate it in a real environment.
The point of this paper, then, is to describe the set of problems which are being studied in the System R framework, to discuss the objectives of the system (which amounts to a description or definition of the term operationally complete), and to describe the architecture of the system, including overall structure, interfaces, and functional design.
The System R project is not the first implementation of the relational approach; however, we know of no other relational system which is really aimed at an operationally complete capability. Other efforts have demonstrated feasibility in various related problem areas. For example, both the IS/1 system [2] and the Phase/0 SEQUEL prototype [3] were single-user systems. No concurrent sharing of data was permitted, and hence data control, locking, and recovery issues were greatly simplified. The INGRES project [4] at U.C. Berkeley is also single-user oriented. In addition, each of these projects has an incomplete treatment of views, i.e., of providing various views of data to various users.
The next section describes the overall goals of System R and the list of capabilities which we believe to be necessary in an operational environment. The following section describes the architecture of the system, and describes in overview terms its major interfaces and the components which support these interfaces.
SYSTEM OBJECTIVES
System R is focused on five main goals:
1. To provide a high level, non-procedural relational data interface.
2. To provide the maximum possible data independence for the basic data objects (base relations).
3. To support derived relational views.
4. To provide facilities for data control consistent with the high level of the data interface.
5. To discover the performance trade-offs inherent in this type of data base capability.
First, each of these goals will be discussed and illustrated.
1. High Level Non-Procedural Relational Data Interface
The trend toward higher level languages has long been evident in the programming domain.
Set-oriented data sublanguages were introduced in 1962 in the CODASYL Information Algebra [5]. Codd's ALPHA language [6] and Relational Algebra [7] raised the level of data sublanguages by letting the user specify the properties of the data required without describing the access path or detailed sequence of operations to be used to obtain the data. This trend toward higher level non-procedural programming [8] is aimed at reducing the number of decisions the programmer must make in order to express his problem/solution, and at making the decisions more relevant to the solution (as opposed to being relevant to the programming of a specific computer). Halstead has examined two programs solving the same problem using his software physics techniques [9], one written in ALPHA and the other in DBTG-COBOL, and for this case found that the ALPHA solution required 30 times fewer mental discriminations than the lower level solution. This observation should translate directly into increased programmer productivity and ease of maintenance.
Thus, human productivity is one strong reason for the goal of supporting a high-level, non-procedural data interface. The other reason for moving toward non-procedural interfaces is related to the optimization of the execution of the program. If the data base were dedicated to a single application, its structure could be optimized for that application only, and the application could be written in terms of that optimized structure. However, in an integrated data base environment, such local optimization is likely to be inefficient. Hence, the system must itself optimize the execution of the application on a data base whose structure is a compromise among the various applications. The high-level, non-procedural specification reveals the application intent better, and hence is easier for the system to use as a basis for optimization.
The available relational languages (ALPHA, Relational Algebra) were very formal and required much mathematical sophistication on the part of the user. In particular, the ALPHA language is based on the first order predicate calculus. The relational algebra introduces a collection of operators (selection, projection, join, division, etc.), each of which has relational operands and produces relational results. The need to discover more user-oriented, non-mathematical relational languages became apparent and is currently being pursued by several research groups [11,12].
The principal external interface of System R is called the Relational Data Interface (RDI), and provides relationally complete [7] facilities for data manipulation, data definition, and data control. To support high-level, non-procedural, set-oriented applications, the RDI contains the SEQUEL data sublanguage in its entirety. SEQUEL is documented in [10].
Of course, not all requirements can best be met through a non-procedural approach, and for this reason the RDI contains single-tuple-oriented operators (FETCH, INSERT, DELETE, REPLACE, etc.) in addition to the set-oriented capabilities of SEQUEL. We have designed the RDI to be used in two modes:

(a) Directly by an application program (e.g., a COBOL program) which uses RDI operators to access the data base.

(b) As the target of a translator program (a special case of an application program) which is emulating some other type of user interface.

2. Data Independence
Date [13] has defined data independence as the immunity of applications to change in storage structure and access strategy. Often, however, the notion is associated with the ability of a data base system to provide various logical views of the data, for example to make visible only selected records of a file, and selected attributes of each record. By view, informally, we mean a window through which an application can access the data base. The term "window" is used to imply that changes to the data base which affect the view are visible to the application. We wish to distinguish these two notions of data independence. In this subsection we address only the first notion of data independence; the second, which we call the support of derived views, is discussed in the next subsection.

Typically, data management systems permit two levels of data definition. The lower level, or "schema", describes the primitive data objects being managed by the system. In System R, these primitive objects are called base relations. The description of a base relation includes the relation name, attribute names, description of the units of each attribute, the domain of each attribute, the order of the attributes within a relation, the order (if any) of the tuples within a relation, etc. In particular, the definition of a base table does not include any information about physical storage or available physical access paths to the data. However, each base relation has a very direct physical representation, i.e., each tuple of the relation has a stored representation. Data independence implies that the base relation can be supported by a variety of physical structures and access strategies. Clearly, data independence is important if a system is to allow growth and meet the changing requirements of various applications. System R provides a rich set of access structures. Any of these can be used to support a given base relation.

3. Support of Derived Views
The higher level of data independence consists of the ability to define alternative views in terms of the primitive data objects. This notion appears in most contemporary data management systems, and the usefulness of such systems depends in large measure on the capability of the system to support derived views. The inability to support views which differ from the primitive views often leads to programs which are complex, because they are warped to use views which are not natural but can be supported, and which require extensive maintenance as the system changes over time.
As an example of the usefulness of derived views, consider a data base containing the following two types of records:

CATALOG (PARTNO, DESC, PRICE) and
SALES (SALENO, PARTNO, QSOLD).

The CATALOG file is ordered by part number, and gives the description and price of each part. The SALES file is ordered by sale number, and gives the part number and quantity sold for each sale. Suppose we wish to print out all the SALES records for parts which have a price greater than $1000. We could write a program to scan through the CATALOG file, finding parts with PRICE > $1000; for each such part, a separate scan could be made through the SALES table to find all the corresponding records. This program would be highly procedural; it would require repeated scanning of the SALES table, and would give the system little opportunity to optimize the query by choosing among alternate access paths. However, if our system permits the specification of derived views, the user might specify a view consisting of the join of the two files, as follows:

SALES-CAT (SALENO, PARTNO, DESC, PRICE, QSOLD).

The program could then consist of a single scan through the SALES-CAT view.
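To make the contrast concrete, here is a minimal Python sketch of the two programs for the CATALOG/SALES query. The sample records are invented, and the in-memory lookup table inside `sales_cat` is only one possible way of evaluating the join; it merely stands in for whatever access path the system would be free to choose. The point is that `view_query` never states how the join is computed, while `procedural_query` fixes the nested scanning order.

```python
# Hypothetical sample records standing in for the CATALOG and SALES files.
CATALOG = [
    {"PARTNO": 1, "DESC": "bolt", "PRICE": 2},
    {"PARTNO": 2, "DESC": "motor", "PRICE": 1500},
    {"PARTNO": 3, "DESC": "pump", "PRICE": 1200},
]
SALES = [
    {"SALENO": 10, "PARTNO": 2, "QSOLD": 1},
    {"SALENO": 11, "PARTNO": 1, "QSOLD": 500},
    {"SALENO": 12, "PARTNO": 3, "QSOLD": 2},
    {"SALENO": 13, "PARTNO": 2, "QSOLD": 4},
]


def procedural_query(catalog, sales, limit=1000):
    """Procedural version: scan CATALOG, and rescan all of SALES for each
    part whose price exceeds the limit."""
    result = []
    for part in catalog:
        if part["PRICE"] > limit:
            for sale in sales:  # one full SALES scan per qualifying part
                if sale["PARTNO"] == part["PARTNO"]:
                    result.append(sale)
    return result


def sales_cat(catalog, sales):
    """Derived view SALES-CAT(SALENO, PARTNO, DESC, PRICE, QSOLD): the join of
    SALES and CATALOG on PARTNO. The caller does not see how the join is
    evaluated; this sketch builds a PARTNO lookup table in memory."""
    by_partno = {part["PARTNO"]: part for part in catalog}
    for sale in sales:
        part = by_partno.get(sale["PARTNO"])
        if part is not None:
            yield {"SALENO": sale["SALENO"], "PARTNO": sale["PARTNO"],
                   "DESC": part["DESC"], "PRICE": part["PRICE"],
                   "QSOLD": sale["QSOLD"]}


def view_query(catalog, sales, limit=1000):
    """Non-procedural version: a single scan through the SALES-CAT view."""
    return [row for row in sales_cat(catalog, sales) if row["PRICE"] > limit]


print([row["SALENO"] for row in view_query(CATALOG, SALES)])  # prints [10, 12, 13]
```

Because the join strategy lives inside the view, it could be replaced (for example by one using a PARTNO index on SALES) without touching `view_query`.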
Besides being easier to write, this program would give the system flexibility to take advantage of new access paths which may become available (such as a PARTNO index on the SALES file) without requiring changes in the program. A major goal of the System R project is to develop and investigate the technology of derived views. This problem has three distinct aspects, each of which is being studied:
(a) Exactly what set of operations on derived views is supportable? As an example of this issue, imagine a request to delete a tuple from the SALES-CAT view described above. Since this view is a join of two underlying files, it is not obvious what actions should be taken on the files to support the deletion. (Should we delete the SALES record but retain the CATALOG record?) For some kinds of view modification requests, there may be several possible actions which would produce the desired result; for other kinds of requests, there may be no possible supporting action. Codd [18] has described some examples of the latter phenomenon.

(b) How should the view be bound to the available physical structures and access paths? This aspect of the binding problem concerns the optimization of the view and of accesses on the view in terms of available access paths, e.g., indexes, sequential scan, etc.

(c) When should binding be performed? For dynamic view definition, the binding must also be dynamic. In System R, we are investigating various binding-time strategies: dynamic binding will occur for dynamically defined views, but for certain often-used or very demanding views, the binding will be done statically, with (hopefully) an increase in performance.

4. Data Control Facilities
Data Control includes those aspects of a data base system which control the access to and use of data. We distinguish four types of data control, each of which is being investigated in System R.

(a) Authorization. This form of control is the most common type, being present in almost all current systems. Authorization is the mechanism to permit or deny the creation and manipulation of data structures and views by various users. Any user of System R may potentially be authorized selectively to create new tables and views, and to grant authorizations for his objects to other users. The authorization mechanism of System R is described more fully in [14].

(b) Integrity.
Integrity control provides a mechanism for enforcing that the data in the data base obeys certain rules or predicates which have been declared to the system. This form of control is typically not found in current data base systems but is left to protocols imbedded in various application programs. In System R, two main types of integrity control facilities are provided: integrity assertions and triggers. Integrity assertions are expressed in the SEQUEL language as predicates about the data in the data base [15]. The system then guarantees the truth of these predicates. Exactly when the system checks an assertion is a function of both the type of assertion and the transaction boundary which caused the assertion to be checked. Triggers are actions that are invoked when some triggering condition is detected. For example, suppose that the DEPT relation contains an attribute NEMPS which represents the number of employees in the department. To maintain the validity of this value, we can declare triggers to update this field whenever an employee is hired, fired, or transferred.

(c) Consistency.
Integrity implies the static correctness of the data base, and consistency is concerned with its dynamic correctness. Suppose that one application program is transferring a set of employees from Dept. 48 to Dept. 50, while simultaneously another application program is giving raises to all employees in Dept. 50. The interaction of these programs may have the undesirable result that some but not all of the transferred employees receive the raise. Even worse, if the transferring program encounters a failure and backs out its updates, it may develop that a raise has been given to someone in Dept. 48. In current systems the application would contain specific statements (e.g., "LOCK DEPT 50") to avoid these problems. A major goal of System R is to eliminate defensive coding which is not a part of the problem being solved but is related only to the fact that the solution is running in a certain environment. Since the user cannot know in advance the exact environment in which his application will run (perhaps no other users are currently updating employee records; in this case such lock control is not needed), the system must provide consistency. The approach being pursued is to require that the user define the boundaries of a transaction, which is a sequence of statements to be executed as an atomic unit. The system then requests whatever resources it needs in the run-time environment to guarantee the atomicity. Furthermore, this same atomic unit is used as the unit of integrity, i.e., integrity may be suspended within a transaction but is guaranteed at the transaction endpoints. If a transaction violates integrity at its endpoint, then the transaction is backed out.

(d) Recovery.
The fourth aspect of data control is concerned with preserving the integrity of the data if the system experiences a malfunction or if an application backs up, either voluntarily or involuntarily (e.g., as in the case of deadlock). The recovery capabilities of System R include the usual checkpoint/restart functions as well as the ability to back up an ongoing transaction to user-specified points. These capabilities are examples of functions which are required in order to have an operationally complete capability.

ARCHITECTURE AND SYSTEM STRUCTURE

We will describe the overall architecture of System R from two viewpoints. First, we will describe the system as seen by a single transaction, i.e., a monolithic description. Second, we will investigate its multi-user dimensions. Figure 1 gives a functional view of the system including its major interfaces and components. The RDI, as described previously, is the external interface which can be called directly from a programming language, or used to support various other interfaces. The Relational Storage Interface (RSI) is the access-method-like level which handles the access to single tuples of base relations. This interface and its supporting system (Relational Storage System - RSS) is actually a complete storage subsystem in that it manages devices, space allocation, storage buffers (one level store), transaction consistency and locking, deadlock, backout, transaction recovery and logging. Furthermore, it maintains indexes on selected attributes of base relations.
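The following single-user Python sketch illustrates how the data control facilities described above fit together: an integrity assertion checked only at the transaction endpoint, a trigger maintaining DEPT.NEMPS, and backout of a violating transaction, the kind of function the RSS provides. It is a toy under stated assumptions: the relation EMP and the attributes DNO and NAME are hypothetical, there is no locking or concurrency, and backout uses an in-memory undo log, which is one common technique rather than a description of System R's implementation.

```python
class IntegrityViolation(Exception):
    """Raised when an integrity assertion is false at a transaction endpoint."""


class Database:
    """Toy stand-in for a set of base relations plus declared controls."""

    def __init__(self):
        self.relations = {"DEPT": [], "EMP": []}
        self.assertions = []  # predicates over the whole data base
        self.triggers = {}    # relation name -> actions run after each insert


class Transaction:
    """Updates executed as an atomic unit. Integrity may be violated inside
    the transaction; it is checked only at the endpoint, and a violating
    transaction is backed out by replaying an undo log."""

    def __init__(self, db):
        self.db = db
        self.undo_log = []

    def insert(self, rel, row):
        rows = self.db.relations[rel]
        rows.append(row)
        self.undo_log.append(lambda: rows.remove(row))
        for action in self.db.triggers.get(rel, []):
            action(self, row)  # trigger actions run inside the same transaction

    def update(self, row, attr, value):
        old = row[attr]
        row[attr] = value
        self.undo_log.append(lambda: row.__setitem__(attr, old))

    def backout(self):
        while self.undo_log:
            self.undo_log.pop()()  # undo the most recent change first

    def commit(self):
        for assertion in self.db.assertions:
            if not assertion(self.db):
                self.backout()
                raise IntegrityViolation(assertion.__name__)
        self.undo_log.clear()


def maintain_nemps(txn, emp):
    """Trigger: when an employee is hired, increment NEMPS of the department."""
    for dept in txn.db.relations["DEPT"]:
        if dept["DNO"] == emp["DNO"]:
            txn.update(dept, "NEMPS", dept["NEMPS"] + 1)


def emp_in_existing_dept(db):
    """Assertion: every employee belongs to an existing department."""
    dnos = {dept["DNO"] for dept in db.relations["DEPT"]}
    return all(emp["DNO"] in dnos for emp in db.relations["EMP"])


db = Database()
db.relations["DEPT"] = [{"DNO": 48, "NEMPS": 0}, {"DNO": 50, "NEMPS": 0}]
db.assertions.append(emp_in_existing_dept)
db.triggers["EMP"] = [maintain_nemps]

t1 = Transaction(db)
t1.insert("EMP", {"NAME": "SMITH", "DNO": 50})
t1.commit()  # assertion holds; Dept. 50 now has NEMPS == 1

t2 = Transaction(db)
t2.insert("EMP", {"NAME": "JONES", "DNO": 50})  # trigger raises NEMPS to 2
t2.insert("EMP", {"NAME": "BROWN", "DNO": 99})  # no Dept. 99 exists
try:
    t2.commit()
except IntegrityViolation:
    pass  # t2 was backed out, including the trigger's NEMPS update
```

Because the trigger's update is logged in the same transaction, backing out t2 also restores NEMPS, so the data base is left exactly as t1 committed it.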
[Figure 1: Functional view of System R (Relational Data Interface, RDI)]
[Figure 11a: Data description function]

[Figure 11b: Data selection function]
  Intersection
  Relative complement      $\{x \mid x \in M_1 \land x \notin M_2\}$
  Cardinality              $Kz: M \to Z$

Binary relation operators
  Converse relation        $Ko: R \to R$
  Restriction              $Rb: R \times M \to R$       $\{(x,y) \mid (x,y) \in R \land x \in M\}$
  Product                  $Rp: R \times R \to R$       $\{(x,y) \mid \exists z: (x,z) \in R_1 \land (z,y) \in R_2\}$
  Union                    $RU: R \times R \to R$

Reduction of binary relations
  Domain                   $Vo: R \to M$                $\{x \mid \exists y: (x,y) \in R\}$
  Range                    $Na: R \to M$                $\{x \mid \exists y: (y,x) \in R\}$
  Individual domain        $Vg: R \times I \to M$       $\{x \mid (x,I) \in R\}$
  Individual range         $Ng: R \times I \to M$       $\{x \mid (I,x) \in R\}$
  Restricted domain        $VgU: R \times M \to M$      $\{x \mid (x,y) \in R \land y \in M\}$

Reduction of measure functions
  $Fw: F \times I \to D$ (n = 2)

Logical operators
  $\in: I \times M \to B$       Test on set membership
  $\subset: M \times M \to B$   Test on set inclusion

In addition, the standard logical operators and the standard arithmetic and comparison operators are available for numbers as well as measures.

Control mechanism

Sequencing of operations. "Programs" for the set theoretic machine are expressed in a functional notation. Operations are performed from left to right and, for each nested argument, from inside out. Example: a question such as "Are cities birthplaces of engineers?" would take the following form in the set theoretic machine:

  ⊂(Mw(Mcity), VgU(en(Rbirthplace), Mw(Mengineer)))
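A minimal Python sketch of some of the operators in the table above, with sets and binary relations modelled as Python sets of elements and of pairs. It assumes that Mw(...) and en(...) simply retrieve a stored set or relation (so they are left out), that the inclusion test ⊂(M1, M2) means M1 ⊆ M2, and that birthplace pairs are stored as (place, person); the data are invented.

```python
def restriction(r, m):
    """Rb: {(x, y) | (x, y) in R and x in M}"""
    return {(x, y) for (x, y) in r if x in m}

def product(r1, r2):
    """Rp: {(x, y) | there is a z with (x, z) in R1 and (z, y) in R2}"""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}

def converse(r):
    """Ko: {(y, x) | (x, y) in R}"""
    return {(y, x) for (x, y) in r}

def domain(r):
    """Vo: {x | there is a y with (x, y) in R}"""
    return {x for (x, _) in r}

def range_of(r):
    """Na: {x | there is a y with (y, x) in R}"""
    return {y for (_, y) in r}

def individual_domain(r, i):
    """Vg: {x | (x, I) in R}"""
    return {x for (x, y) in r if y == i}

def individual_range(r, i):
    """Ng: {x | (I, x) in R}"""
    return {y for (x, y) in r if x == i}

def restricted_domain(r, m):
    """VgU: {x | (x, y) in R and y in M}"""
    return {x for (x, y) in r if y in m}

def member(i, m):
    """Test on set membership."""
    return i in m

def included(m1, m2):
    """Test on set inclusion, read here as M1 being a subset of M2."""
    return m1 <= m2

# Hypothetical data; birthplace pairs are (place, person) so that VgU yields places.
CITY = {"Karlsruhe", "Bonn", "Ulm"}
ENGINEER = {"Meier", "Schulz"}
BIRTHPLACE = {("Karlsruhe", "Meier"), ("Bonn", "Schulz"), ("Ulm", "Huber")}

# "Are cities birthplaces of engineers?", evaluated from inside out
answer = included(CITY, restricted_domain(BIRTHPLACE, ENGINEER))
print(answer)  # False: Ulm is only the birthplace of Huber, who is not an engineer
```

Nesting Python calls reproduces the machine's inside-out evaluation order: the innermost argument is computed first and passed outward.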
Loops. Loops are introduced by the use of bounded quantifiers which have three arguments:
1) The name of a bound variable; each of its substitutions defines an invocation of the loop.
2) An expression resulting in a set of objects (range).
3) An expression resulting in a truth value (scope); it may be regarded as the loop body.

Important quantifiers are
  AL: M×B → B    all, every
  EI: M×B → B    some
  DB: M×B → M    which
  ZB: M×B → Z    how many
where, on the left-hand side of each signature, M denotes the bounding set and B the condition.

Examples:
  DB(x, Mw(Mcity), ∈(x, VgU(en(Rbirthplace), Mw(Mengineer))))
with the meaning of "Which cities are birthplaces of engineers?";
  DB(x1, Mw(Mmanuf), ZB(x2, Vg(en(Rprod), x1), DB(x3, Mw(Mailment), ∈(x2, Vg(en(Rmedic), x3)))))
with the meaning of "How many products of which manufacturers are medications for which ailments?"

Expressions in the data base
Set membership of an arbitrary kind is expressed by including arbitrary set expressions in the representation of a set. Example (in German):

  Mrezeptpflichtig:
    Ispasmocibalgin
    Vg(en(RDerivat), IOxazolidin)
    IMorphin
    Mw(MOpiate)
    Mw(MHypnotika)
    IMethadon
    Vg(en(RDerivat), ISuccinimid)
    Vg(en(RHeilmittel), IAgitiertheit)

This representation declares Spasmocibalgin, all derivates of Oxazolidin, Morphin, all opiates, etc. to be prescription drugs (rezeptpflichtig: prescription-only; Derivat: derivate; Hypnotika: hypnotics; Heilmittel: remedy; Agitiertheit: agitation). This concept is extended to relations and measure functions. Two of its advantages are:
- Since all objects are evaluated on request only, changes to the data base may be made locally without regard to any interrelationships that may exist.
- Expressions may be stored without regard for the existence of any individuals for them. Hence one could construct a data base consisting exclusively of higher-order relationships.
One consequence, however, is that the control mechanism must itself be defined recursively, since it may be invoked on any load operation.
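The bounded quantifiers and the lazily evaluated set representations can be sketched together in Python. In this sketch a bound variable is represented by the parameter of a Python function, a stored set may contain individuals and expressions (Python callables), and `load` evaluates expressions only on request, calling itself recursively when an expression refers to another stored set. The data base contents are invented: "Codein" as the only stored opiate and "Derivat-X" are placeholders, and derivates of Morphium replace the Oxazolidin example (sec. 3.3 mentions Thebacon as a derivate of Morphium).

```python
def DB(rng, cond):
    """'which': the elements of the range satisfying the condition."""
    return {x for x in rng if cond(x)}

def ZB(rng, cond):
    """'how many': the number of elements of the range satisfying the condition."""
    return sum(1 for x in rng if cond(x))

def AL(rng, cond):
    """'all, every'."""
    return all(cond(x) for x in rng)

def EI(rng, cond):
    """'some'."""
    return any(cond(x) for x in rng)

def Vg(r, i):
    """Individual domain: {x | (x, I) in R}."""
    return {x for (x, y) in r if y == i}

def load(db, name):
    """Evaluate a stored set on request. Members are individuals or
    expressions (callables); an expression may itself call load, so the
    control mechanism is recursive."""
    result = set()
    for member in db[name]:
        if callable(member):
            result |= member(db)
        else:
            result.add(member)
    return result

# Illustrative data base (hypothetical contents). RDerivat holds
# (derivate, parent) pairs, so Vg(RDerivat, "Morphium") gives Morphium's derivates.
DATA = {
    "RDerivat": {("Thebacon", "Morphium")},
    "Opiate": ["Codein"],
    "rezeptpflichtig": [
        "Spasmocibalgin",                            # an individual
        "Methadon",                                  # an individual
        lambda db: load(db, "Opiate"),              # all opiates
        lambda db: Vg(db["RDerivat"], "Morphium"),  # all derivates of Morphium
    ],
}

rx = load(DATA, "rezeptpflichtig")
print(sorted(rx))
print(ZB(rx, lambda x: x in Vg(DATA["RDerivat"], "Morphium")))
```

Adding a pair to RDerivat changes the membership of rezeptpflichtig without editing its stored representation, which is the first advantage listed above.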
3.3 Natural language

Few users will feel at ease with the highly stylized language introduced in sec. 3.2. One possible step of abstraction, therefore, is the definition of a new abstract machine accepting natural language input. By necessity this is a highly restricted form of natural language, since its semantics, and hence its syntactic forms, can be no more than what may ultimately be reduced to a set theoretic interpretation. Moreover, it must be considered more restrictive than the set theoretic interface because, while one may nest set theoretic expressions to an arbitrary depth, those beyond a certain depth simply cannot be stated in natural language in any comprehensible fashion.

To speak of objects, operators and control mechanism in connection with natural language turns out to be highly unnatural, or rather impossible. It is possible, however, to define an abstract machine level in terms of the syntax of the interface, which in turn may still be based on object types. This is in striking similarity to Very High Level languages vis-a-vis High Level programming languages: Very High Level languages are loosely described as languages used to specify what is to be done, rather than how it is to be done [SI 74].
In accordance with sec.2.2, the object types must relate to the ones of the set theoretic machine. In this case the relationship is straightforward, as indicated by the following list:
  N  proper names for the objects of the universe.
  A  attributes (properties of an object of the universe).
  R  references from one object of the universe to a second one (e.g. Thebacon is referred to by Morphium as its derivate).
  M  references to measures.
  D  numbers or measures.
  S  sentences. These are of two kinds: sentences to be answered by yes or no, and sentences to be answered by counting or enumerating proper names.
Some examples from KAIFAS, in which German was chosen as the natural language interface (object types in parentheses):

  Ist Psyquil (N) rezeptpflichtig (A)?
    (Is Psyquil prescription-only?)
  Betraegt die Tagesdosis (M) von Chinidin (N) 2 Gramm (D)?
    (Is the daily dose of Chinidin 2 grams?)
  Welche Derivate von Morphium sind rezeptpflichtig?
    (Which derivates of Morphium are prescription-only?)
The syntax of the interface is described by a grammar with the following general properties:

(1) Syntactical variables must relate to the object types, and hence cannot be based on the traditional grammatical categories such as noun, noun phrase, adjective, etc., but on categories that are essentially semantical in nature. The variables are NE (names), ME (attributes), RE (references), ZA (numbers), SA (sentences) and QO (quantifiers), together with a variable for references to measures.

(2) On the other hand, the traditional categories must be accounted for in some way, e.g. in order to reject incorrect inflections. As a consequence, each syntactical variable is indexed by a number of features. Examples:
  MAS masculine, FEM feminine, NEU neuter (gender)
  NOM nominative, GEN genitive, DAT dative, ACC accusative (case)
  STR strong declension
  ATT attribute/apposition
  ADJ word class (adjective/noun)
  NUM number (+NUM singular, -NUM plural)

(3) Even for restricted natural language, grammars are known to be extremely complex because of the multitude of syntactic aspects to be observed. The application of features simplifies the grammar insofar as it can be arranged in two levels:
  a) a context-free grammar in terms of the variables from (1);
  b) a feature program to be associated with each production on level a).

Example:
Typical productions of level a) are
  ME → ME ME
  ME → RE ME
  ME → RE NE
  SA → ME sind ME ?
The production ME1 → ME2 ME3 refers to the following feature program (syntactic variables are numbered for reference).

Part 1: Test of right-hand features for acceptance (reduction takes place only if the condition is true):
  test(ME2, +ADJ+ATT) ∧ test(ME3, -ADJ-ATT) ∧ meq(MAS,FEM,NEU,ME2,ME3) ∧ equ(NUM,ME2,ME3) ∧ meq(NOM,GEN,DAT,ACC,ME2,ME3)

Part 2: Assignment of features to the syntactic variable on the left-hand side:
  -ADJ-ATT, copy(NUM,ME2), and(MAS,FEM,NEU,ME2,ME3), and(NOM,GEN,DAT,ACC,ME2,ME3)
test, meq, equ, copy and and are feature operators. For example, test is true when the features of the first argument meet the condition specified by the second argument; meq is true whenever at least one of the listed features agrees in both syntactic variables specified; copy copies the features of the syntactic variable specified.

3.4 Pharmacology

The natural language level is supposed to serve a variety of application areas. We postulate that these application areas are all served by the same natural language grammar, since each must be explainable in terms of set theory. Consequently, these areas differ only in the vocabulary they assign to the object types. Level 3 is reached from level 2 simply by introducing names and relating them to the object types. Below are given a few typical examples of assignment in the area of pharmacology:
  proper names: medications, substances, companies, ailments, e.g. Thebacon, Morphium, CIBA, Angina pectoris
  attributes: properties, e.g. Tablette (tablet), rezeptpflichtig (prescription-only)
  references: e.g. Indikation and Kontraindikation (indication and contraindication; from ailment to medication), Hersteller (manufacturer; from company to medication)
  references to measures: e.g. Preis (price), Dosis (dose), Haltbarkeit (shelf life)
  numbers or measures: e.g. 5 DM, 2 Tabletten/Tag (tablets per day), … Wochen (weeks)
  sentences: e.g. Welche Preise haben Praeparate, die bei Angina Pectoris indiziert sind und deren Kontraindikation nicht Glaukom ist? (What are the prices of preparations that are indicated for angina pectoris and whose contraindication is not glaucoma?)
3.5 Translations

The path between adjacent nodes (3) and (4) (sec.2.3) is traversed by translation. We shall briefly illustrate this for the passage between natural language and set language. In this case translation consists of the three traditional phases: lexical analysis, syntactic analysis and code generation. The sentence

  "Welche Firmen sind Hersteller tablettenfoermiger Medikamente?"
  (Which companies are manufacturers of tablet-form medications?)

shall serve as an example.

Lexical analysis

Lexical analysis includes the mapping from the pharmacological level to the natural language level, with a few exceptions, and proceeds, for each word encountered, in three steps:
  (i) reduction of a word to its word stem;
  (ii) dictionary lookup, resulting in a syntactical variable, some of its features, a morphemic class, as well as the set level name for the word;
  (iii) assignment of further feature values on the basis of the morphemic class and the actual morphemic ending.
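The three lexical steps can be sketched in Python as follows. The lexicon entries, the morphemic class names (`noun-en`, `noun-e`, `adj`) and the ending table are hypothetical, chosen only so that the output reproduces the feature rows for "Firmen", "tablettenfoermiger" and "Medikamente" in the lexical analysis of the example sentence. Storing gender with noun stems but deriving it from the ending for adjectives is likewise an assumption of this sketch.

```python
# Hypothetical lexicon: stem -> (syntactic variable, lexical features,
# morphemic class, set level name).
LEXICON = {
    "firm": ("ME", {"+FEM"}, "noun-en", "M26"),
    "medikament": ("ME", {"+NEU"}, "noun-e", "M22"),
    "tablettenfoermig": ("ME", {"+ATT", "+STR", "+ADJ"}, "adj", "M9"),
}

# Hypothetical ending table: (morphemic class, ending) -> alternative feature sets.
ENDINGS = {
    ("noun-en", "en"): [{"-NUM", "+NOM", "+GEN", "+DAT", "+ACC"}],
    ("noun-e", "e"): [{"-NUM", "+NOM", "+GEN", "+ACC"}],
    ("adj", "er"): [
        {"+MAS", "+NUM", "+NOM"},
        {"+FEM", "+NUM", "+GEN", "+DAT"},
        {"+MAS", "+FEM", "+NEU", "-NUM", "+GEN"},
    ],
}

ORDER = ["MAS", "FEM", "NEU", "NUM", "NOM", "GEN", "DAT", "ACC", "ATT", "STR", "ADJ"]


def reduce_to_stem(word):
    """Step (i): split the word into the longest stem known to the lexicon
    and the remaining ending."""
    w = word.lower()
    for cut in range(len(w), 0, -1):
        if w[:cut] in LEXICON:
            return w[:cut], w[cut:]
    raise KeyError(word)


def analyse(word):
    """Steps (ii) and (iii): dictionary lookup, then one reading per
    feature set allowed by the morphemic class and the ending."""
    stem, ending = reduce_to_stem(word)
    synvar, lexical, mclass, name = LEXICON[stem]
    return [(synvar, lexical | extra, name) for extra in ENDINGS[(mclass, ending)]]


def show(features):
    """Render features in a fixed order, e.g. '+FEM-NUM+NOM'."""
    return "".join(f for key in ORDER for f in sorted(features) if f[1:] == key)


for word in ["Firmen", "tablettenfoermiger", "Medikamente"]:
    for synvar, features, name in analyse(word):
        print(word, synvar, show(features), name)
```

An ambiguous ending such as "-er" simply yields several readings; choosing among them is left to the feature tests of the syntactic analysis.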
The lexical analysis of the entire sentence results in

  word                 syn. var.  features                            int. name
  Welche               QO         +MAS+FEM+NEU-NUM+NOM+ACC            DB
  Firmen               ME         +FEM-NUM+NOM+GEN+DAT+ACC            M26
  sind
  Hersteller           RE         +MAS+NUM+NOM+DAT+ACC                R23
                       RE         +MAS-NUM+NOM+GEN+ACC
  tablettenfoermiger   ME         +MAS+NUM+NOM+ATT+STR+ADJ            M9
                       ME         +FEM+NUM+GEN+DAT+ATT+STR+ADJ
                       ME         +MAS+FEM+NEU-NUM+GEN+ATT+STR+ADJ
  Medikamente          ME         +NEU-NUM+NOM+GEN+ACC                M22

Note the syntactic ambiguities due to the different feature combinations for "Hersteller" and "tablettenfoermiger". Note also that lexical analysis by itself cannot always determine the case (as for "Firmen": all four cases are still possible) or the gender (as for "tablettenfoermiger").
Syntactic analysis

Syntactic analysis includes three phases: reduction (level a)), feature analysis (level b)), and final code manipulation. For each production applied, reduction and feature analysis follow each other immediately. Hence a production is applied in three steps:
  (i) matching of input string and right-hand side;
  (ii) test of right-hand features for acceptance;
  (iii) if true, reduction to left-hand side and assignment of features.
For example, the production and feature program from sec.3.3 result in the following when applied to the phrase "tablettenfoermiger Medikamente":
  ME2 ('tablettenfoermig'):
    1) +MAS+NUM+NOM+ATT+ADJ          (rejected on meq)
    2) +FEM+NUM+GEN+DAT+ATT+ADJ      (rejected on meq)
    3) +MAS+FEM+NEU-NUM+GEN+ATT+ADJ
  ME3 ('Medikamente'):
    1) +NEU-NUM+NOM+GEN+ACC
  ME1 (result):
    1) +NEU+GEN-NUM-ADJ-ATT          (note the disambiguation)

The syntactic analysis of the entire sentence is illustrated in figure 3. Because of the possibility of ambiguities, the result is a parsing graph rather than a tree (in this case the ambiguity of the sentence is due to "Hersteller"). The numbers adjacent to the syntactic variables refer to an associated list of features. Final code manipulation is left to the final stages of code generation, but must be considered part of the syntactic analysis because without it context-sensitive or transformational rules could not be avoided.

Code generation

Whenever a production is applied, a semantic action associated with it generates a functional set expression. Its arguments point to other such expressions unless they are individuals. Example:

       (tablettenfoermiger Medikamente)
            /                  \
       Mw(M9)                Mw(M22)
  (tablettenfoermig)       (Medikament)
[Figure 3: Parsing graph for "Welche Firmen sind Hersteller tablettenfoermiger Medikamente?"]
[Figure 4: Printout of the code resulting from translation of "WELCHE FIRMEN SIND HERSTELLER TABLETTENFOERMIGER MEDIKAMENTE ?"]
On completion of the parse, the pointer structure corresponding to the syntactic variable SA is transformed into a linear string. This string must be submitted to a further string manipulation for two reasons.
(1) Completion of the syntactic analysis. Quantifiers do not yet appear in front of the expression. Moving them there is subject to a number of rules that govern their sequence.
(2) Optimization. In many cases quantifiers (whose evaluation may be time-consuming), e.g. DB, can be replaced by standard set or relation operators.
The code resulting from translation of the sentence above is shown in the printout in figure 4.

Reverse translation

Set level names may immediately be translated into the pharmaceutical level simply by again invoking the dictionary. However, under certain conditions (empty sets), set expressions may themselves be part of a result. This requires a translation into both level 2 and level 3. Examples:
  Vg(R12, I14) → Heilmittel fuer Psychosen (remedies for psychoses)
  Mw(M9)       → tablettenfoermig (tablet-form)
  I2           → Verophen
4 Semantic primitives as a basis

4.1 Motivation

In order to study the adequacy of the rules of ch.2 for constructing systems that are arranged in the form of layers, and to determine whether they must be further refined or augmented, it is helpful to examine existing systems of this kind. One of the oldest (though it was not conceived that way) is Woods' question-answering machine [Wo 68, Wo 73]. Like the set theoretic approach, the semantics is composed of objects of a universe and interrelationships between them. Unlike the previous approach, these are not collected into mathematical sets and relations but treated as propositions, to which a procedural approach is taken. This is probably due to an orientation towards explaining natural language rather than manipulating concrete data bases.
4.2 Semantic Primitives

Object types
  O    Elementary objects, e.g. Boston, AA-57, DC-9, 8:00 a.m.
  Fn   n-ary functions (n ≥ 1), e.g. departure time (of flight x1 from place x2). These need not be functions in the strict sense. If a function may yield more than one value (e.g. officer of a ship), it is defined as a successor function such that
         officer(x, 0) = a1        (start)
         officer(x, a1) = a2
         ...
         officer(x, an) = END      (end)
  Rn   n-ary relations (predicates) (n ≥ 1), e.g. jet (flight x1 is a jet), arrive (flight x1 goes to place x2).
       Designators are either names of elementary objects or of the form Fn(x1,...,xn), where xi is a designator; e.g. departure time (AA-57, Boston) for 8:00 a.m.
       Propositions have the form Rn(x1,...,xn), where xi is a designator; e.g. jet (AA-57), place (Boston), arrive (AA-57, Chicago).
  B    Truth values
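The successor-function convention for multi-valued functions can be sketched in Python as follows. The ship and officer names are placeholders, and END is represented by a string; enumerating all values then amounts to following successors from 0 until END is reached.

```python
END = "END"

# Hypothetical data: the officers of each ship, in a fixed order.
OFFICERS = {"Ship-1": ["Officer A", "Officer B", "Officer C"]}


def officer(x, previous):
    """Successor form of a multi-valued function: officer(x, 0) is the first
    value, officer(x, a_i) the next one, and END follows the last."""
    values = OFFICERS[x]
    if previous == 0:
        return values[0] if values else END
    i = values.index(previous)
    return values[i + 1] if i + 1 < len(values) else END


def all_values(f, x):
    """Enumerate every value of f for argument x by following successors."""
    values, v = [], f(x, 0)
    while v != END:
        values.append(v)
        v = f(x, v)
    return values


print(all_values(officer, "Ship-1"))
```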
Example (from [Wo 68]): a set of semantic primitives for the flight schedules table.

Primitive predicates
  CONNECT (X1, X2, X3)    Flight X1 goes from place X2 to place X3
  DEPART (X1, X2)         Flight X1 leaves place X2
  ARRIVE (X1, X2)         Flight X1 goes to place X2
  DAY (X1, X2, X3)        Flight X1 leaves place X2 on day X3
  IN (X1, X2)             Airport X1 is in city X2
  SERVCLASS (X1, X2)      Flight X1 has service of class X2
  MEALSERV (X1, X2)       Flight X1 has type X2 meal service
  JET (X1)                Flight X1 is a jet
  DAY (X1)                X1 is a day of the week (e.g. Monday)
  TIME (X1)               X1 is a time (e.g. 4:00 p.m.)
  FLIGHT (X1)             X1 is a flight (e.g. AA-57)
  AIRLINE (X1)            X1 is an airline (e.g. American)
  AIRPORT (X1)            X1 is an airport (e.g. JFK)
  CITY (X1)               X1 is a city (e.g. Boston)
  PLACE (X1)              X1 is an airport or a city
  PLANE (X1)              X1 is a type of plane (e.g. DC-3)
  CLASS (X1)              X1 is a class of service (e.g. first-class)
  AND (S1, S2)            S1 and S2
  OR (S1, S2)             S1 or S2
  NOT (S1)                S1 is false
  IFTHEN (S1, S2)         if S1 then S2
  (where S1 and S2 are propositions)

Primitive functions
  DTIME (X1, X2)          the departure time of flight X1 from place X2
  ATIME (X1, X2)          the arrival time of flight X1 in place X2
  NUMSTOPS (X1, X2, X3)   the number of stops of flight X1 between place X2 and place X3
  … (X1)                  the airline which operates flight X1
  EQUIP (X1)              the type of plane of flight X1
  FARE (X1, X2, X3, X4)   the cost of an X3 type ticket from place X1 to place X2 with service of class X4 (e.g. the cost of a one-way ticket from Boston to Chicago with first-class service)
Operators

To every function and relation there exists a program (procedure or subroutine) which determines a value of a function or the truth of a proposition. Examples (procedure names are capitalized):
  JET (AA-57)              → true
  ARRIVE (AA-57, Chicago)  → true
  ARRIVE (AA-57, Boston)   → false
  DTIME (AA-57, Boston)    → 8:00 a.m.

Whereas the abstract machine of ch.3 was based on object types but supplied specific operators, the abstract machine in this case is defined in terms of both object and operator types. Specific instances must be supplied by the user for both of them. However, with the advent of microprogramming, computer scientists should have little problem in adjusting to this kind of notion.
Control m e c h a n i s m As
in
the
notation~
preceeing
e.g.
example,
p r o g r a m s are expresseo
in £unctional
203
TEST(CONNECT would
(AA-57, ~OSTON, C~ICAGO))
stand
for
"Does
AA-57 go £rom 5oston to Chicago?".
Likewise, queries of any appreciable degree of complexity are based on the notion of the bounded quantifier as a representative for loops. The format for a quantified expression is
(FOR <quant> <var> / <class> : <restriction> ; <scope>)
where
<quant> is a type of quantifier (EACH, EVERY, SOME, THE, n MANY),
<var> is a bound variable,
<class> is the class of objects over which quantification is to range; it is specified by special enumeration functions, e.g. SEQ, DATALINE, NUMBER, AVERAGE, which besides enumerating may also perform searches or computations,
<restriction> is a restriction on the range,
<scope> is the scope;
<restriction> and <scope> may both be quantified expressions.
Unlike KAIFAS, where the result of evaluating an expression is automatically retranslated and displayed, here this must be explicitly requested by commands such as TEST (test the truth of a proposition) and PRINTOUT (print the representation of a designator). Examples:
(FOR EVERY X1 / (SEQ TYPECS) : T ; (PRINTOUT (X1)))
prints the sample numbers for all the lunar samples which are of type C rocks, i.e. breccias (T stands for "true").
(TEST (FOR 30 MANY X1 / (SEQ FLIGHT) : JET (X1) ; DEPART (X1, BOSTON)))
"Do 30 jet flights leave Boston?"
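To make the semantics of the bounded quantifier concrete, here is a minimal Python sketch of a truth-valued evaluator for (FOR <quant> <var> / <class> : <restriction> ; <scope>), as used under TEST. It is an illustration under stated assumptions, not Woods' implementation: the class is passed as an already enumerated collection, restriction and scope are Python predicates, n MANY is read as "at least n", and THE is read as "exactly one object satisfies the restriction, and it satisfies the scope".

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def FOR(quant: str, objects: Iterable[T],
        restriction: Callable[[T], bool],
        scope: Callable[[T], bool], n: int = 0) -> bool:
    """Evaluate a bounded quantifier over an enumerated class.

    The range is the class filtered by the restriction; the scope is then
    tested for each member of that range.
    """
    rng = [x for x in objects if restriction(x)]
    hits = sum(1 for x in rng if scope(x))
    if quant in ("EACH", "EVERY"):
        return hits == len(rng)
    if quant == "SOME":
        return hits >= 1
    if quant == "THE":          # assumed: unique object in the range
        return len(rng) == 1 and hits == 1
    if quant == "MANY":         # assumed: "n MANY" means at least n
        return hits >= n
    raise ValueError(f"unknown quantifier {quant!r}")
```

With a hypothetical list of (flight, plane type, origin) triples, `FOR("MANY", flights, lambda f: f[1] == "JET", lambda f: f[2] == "BOSTON", n=30)` corresponds to the second example above.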
4.3 Natural language
As a general rule, the introductory remarks to sec.3.3 apply here as well: the level of the "English-like" query language provided on level 2 is influenced by the range of expressions possible on the previously discussed level 1. In contrast to KAIFAS, inspection of the data base is not limited to the evaluation of level 1 expressions but may also take place during translation from level 2 into level 1. The semantic actions associated with a rule of grammar impose further restrictions; e.g. they make sure that the first argument of CONNECT is indeed an instance of the class FLIGHT.
This is illustrated by the following example. In a first step a syntactic analysis is performed and a phrase marker is derived, e.g.
[Figure: phrase marker; recoverable fragment: a subject NP dominating the proper noun (NPR) AA-57]
Since verbs in English correspond roughly to predicates, and noun phrases are used to denote the arguments of the predicate, the verb in the phrase marker will be the primary factor in determining the predicate. In the example, the predicate will be CONNECT. For this it is necessary that the subject be a flight and that there be prepositional phrases whose objects are places representing origin (from) and destination (to). The grammatical relations among elements of a phrase marker are defined by partial tree structures, e.g.
G1: S dominating NP (1) and VP, the VP dominating V (2): subject-verb
G2: VP dominating V (1) and NP (2): verb-object
G3: PP dominating PREP (1) and NP (2): preposition-object modifying a VP
Among the three structures, G1 and G3 both match subtrees of the phrase marker. Which of these matches is acceptable depends on additional rules, e.g.
(G1: FLIGHT ((1)) and (2) = fly)
((1) and (2) are positional variables in the partial tree structure). In the example, this rule is obviously satisfied. More complex rules are possible; e.g. the topmost S-node of the phrase marker is matched by the rule
1-(G1: FLIGHT ((1)) and (2) = fly) and 2-(G3: (1) = from and PLACE ((2))) and 3-(G3: (1) = to and PLACE ((2))) ==> CONNECT (1-1, 2-2, 3-2)
where i-j denotes the positional variable (j) of subrule i.
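The rule above can be read as a pattern over the matches of G1 and G3. The Python sketch below assumes that the phrase marker has already been reduced to a list of matches with their positional variables (1) and (2), as they might arise from a sentence such as "AA-57 flies from Boston to Chicago" (a hypothetical example, since the original phrase marker is not recoverable). FLIGHT and PLACE stand in for the semantic checks and use made-up class contents.

```python
from itertools import product

# Hypothetical class contents for the semantic checks.
FLIGHTS = {"AA-57"}
PLACES = {"BOSTON", "CHICAGO"}


def FLIGHT(x: str) -> bool:
    return x in FLIGHTS


def PLACE(x: str) -> bool:
    return x in PLACES


def interpret(matches):
    """Apply the CONNECT rule to (structure, {1: ..., 2: ...}) matches.

    Subrule 1: G1 with FLIGHT((1)) and (2) = fly
    Subrule 2: G3 with (1) = from and PLACE((2))
    Subrule 3: G3 with (1) = to and PLACE((2))
    Result:    CONNECT(1-1, 2-2, 3-2)
    """
    sub1 = [b for s, b in matches if s == "G1" and FLIGHT(b[1]) and b[2] == "fly"]
    sub2 = [b for s, b in matches if s == "G3" and b[1] == "from" and PLACE(b[2])]
    sub3 = [b for s, b in matches if s == "G3" and b[1] == "to" and PLACE(b[2])]
    return [("CONNECT", a[1], b[2], c[2]) for a, b, c in product(sub1, sub2, sub3)]
```

If any subrule finds no acceptable match (for instance because the subject is not a flight), the rule yields no interpretation at all.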
4.4 Airline guide
The system under discussion was first applied to a flight schedules table. To illustrate the application interface, a few examples of queries are given below (from [Wo 68]).
Does American Airlines have a flight which goes from Boston to Chicago?
What is the departure time from Boston of every American Airlines flight that goes from Boston to Chicago?
What American Airlines flights arrive in Chicago from Boston before 1:00 p.m.?
How many airlines have more than 3 flights that go from Boston to Chicago?
4.5 Lunar geology
More recently the system has been applied to access, compare and evaluate the chemical analysis data on lunar rock and soil composition that was accumulating as a result of the Apollo missions [Wo 73]. Examples:
What is the average concentration of aluminum in high alkali rocks?
Give me all analyses of S10046!
How many breccias contain olivine?
Do any samples have greater than 13 percent aluminum?
What is the average modal concentration of ilmenite in type A rocks?
4.6 Critique
(1) The possibility of inspecting the data base both on level 1 and during translation from level 2 to level 1 introduces a note of confusion. Since, according to the definition in sec.2.3, translation must make no reference to data in the base, this lack of separation will have practical repercussions: either certain changes on level 1 will necessitate changes in the rules of grammar, or parts of the control mechanism for level 1 must be duplicated for translation purposes.
(2) In Woods' system the subroutines do not appear to verify that their arguments are of the proper kind (e.g. ARRIVE does not check whether AA-57 is indeed a flight or Chicago a place), since this is done on translation. As a consequence, the primitive machine of level 1 does not by itself restrict the ranges of its operators correctly, and level 1 and the translation into it become interdependent. The problem is circumvented if, in the parlance of abstract machines, unary predicates are prescribed for all object types that restrict the ranges of the corresponding arguments of all operators. (Note that in KAIFAS not only the categories of data but also their relationships to each other may be expressed by a set of axioms.)
(3) Operators (subroutines) and the predicates and functions they implement correspond in a one-to-one fashion. In order to make sure that the requirements governing the relationship between a predicate or function and its corresponding procedure are met, it suffices to treat the two as interdependent abstract machines, albeit instances of the same resource.

5 Relational model
5.1 Motivation
One of the most widely discussed approaches to data bases is Codd's relational model [Co 70, Co 72, We 74], which lends itself particularly well to users who think in terms of formatted, table-like structures. Intuitively, a relation is a table that may be named. It consists of a number of entries, called n-tuples, each of which is uniquely identified by a key; each entry consists of a sequence of fields, and the columns are headed by named attributes. More formally speaking, a table is a set, so its entries are not ordered, whereas the fields within an n-tuple are. A certain familiarity on the reader's part with the relational model is assumed; only its interpretation by a machine will be examined here.
5.2 Relational algebra
Objects
A set of objects (domain) A is assumed for each attribute, the attribute naming the domain.
Relations
$R_n(A_1, A_2, \ldots, A_n) \subseteq A_1 \times A_2 \times \cdots \times A_n$
Example: SUPPLIER (SUPPLIERNR, NAME, LOC), KEY = SUPPLIERNR
SUPPLIER:
SUPPLIERNR | NAME     | LOC
1          | Jones    | New York
2          | Smith    | Chicago
3          | Connors  | Boston
4          | Thompson | New York
Key attributes are indicated (here SUPPLIERNR); keys may also be composite. Hierarchical relationships are usually eliminated by normalization; hence all relations can be assumed to be normalized.
$T_n \in R_n$: n-tuple.
Operators
Standard relation operators [We 74]:
Direct product: $R_{n_1} \otimes R_{n_2} \to R_{n_1+n_2}$: $\{(T_{n_1} \oplus T_{n_2}) \mid T_{n_1} \in R_{n_1} \wedge T_{n_2} \in R_{n_2}\}$ ($\oplus$: concatenation operator)
Union: $R_n \cup R_n \to R_n$
Intersection: $R_n \cap R_n \to R_n$
Difference: $R_n - R_n \to R_n$
(for union, intersection and difference the attributes must be "compatible")
Special operators:
Projection: $R_n[A] \to R_m$: relation $R_n$ restricted to the attributes $A = \{A_1, \ldots, A_m\}$.
Join: $R_{n_1}[A \,\theta\, B] R_{n_2} \to R_{n_1+n_2}$: $\{(T_{n_1} \oplus T_{n_2}) \mid T_{n_1} \in R_{n_1} \wedge T_{n_2} \in R_{n_2} \wedge T_{n_1}[A] \,\theta\, T_{n_2}[B]\}$, where $A, B$ are sets of attributes and $\theta$ is one of $\{=, \neq, <, \leq, >, \geq\}$ (slight modifications, e.g. the natural join, are possible).
Restriction: $R_n[A \,\theta\, B] \to R_n$: $\{T_n \mid T_n \in R_n \wedge T_n[A] \,\theta\, T_n[B]\}$, where $A, B, \theta$ are as above.
Division: $R_n[A \div B] R_n \to R_m$: see [Co 72], p.74.
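The operators above can be sketched in Python on relations represented as sets of tuples. This is an illustrative sketch, not the notation of [We 74] or [Co 72]: attributes are addressed by 0-based position rather than by name, joins and restrictions compare single attributes rather than attribute sets, and θ is passed as an ordinary two-argument predicate. Division follows Codd's image-set formulation: a value of the remaining attributes qualifies if its image set over A contains all of S[B].

```python
from itertools import product
from typing import Callable, Iterable

Relation = set  # a relation is a set of equal-length tuples
Theta = Callable[[object, object], bool]


def direct_product(r: Relation, s: Relation) -> Relation:
    """R_n1 (x) R_n2: concatenate every tuple of r with every tuple of s."""
    return {t + u for t, u in product(r, s)}


def project(r: Relation, attrs: Iterable[int]) -> Relation:
    """R_n[A]: keep only the attribute positions in attrs (0-based here)."""
    attrs = list(attrs)
    return {tuple(t[a] for a in attrs) for t in r}


def restrict(r: Relation, a: int, theta: Theta, b: int) -> Relation:
    """R_n[A theta B]: keep tuples whose attributes a and b satisfy theta."""
    return {t for t in r if theta(t[a], t[b])}


def join(r: Relation, a: int, theta: Theta, b: int, s: Relation) -> Relation:
    """R_n1[A theta B]R_n2: concatenated pairs (t, u) with t[a] theta u[b]."""
    return {t + u for t, u in product(r, s) if theta(t[a], u[b])}


def divide(r: Relation, a: list[int], s: Relation, b: list[int]) -> Relation:
    """R[A / B]S: values of r's remaining attributes whose image set
    over the attributes a contains every tuple of s projected on b."""
    if not r:
        return set()
    arity = len(next(iter(r)))
    rest = [i for i in range(arity) if i not in a]
    divisor = project(s, b)
    result = set()
    for key in project(r, rest):
        image = {tuple(t[i] for i in a) for t in r
                 if tuple(t[i] for i in rest) == key}
        if divisor <= image:
            result.add(key)
    return result
```

For instance, `project(SUPPLIER, [2])` on the SUPPLIER table above yields the three distinct locations, and a natural join on LOC would be `join(SUPPLIER, 2, lambda x, y: x == y, 0, other)` for some hypothetical relation `other`.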
Control mechanism
Since all relational operators are infix operators, relational algebra "programs" are formed by nested expressions rather than by linear sequences of statements. For an example see sec.5.3.

5.3 Relational calculus
In place of the relational algebra, Codd proposes an applied predicate calculus (relational calculus) and shows that any expression of the calculus (alpha-expression) may be reduced to an equivalent algebraic expression.
Alphabet of the calculus:
Individual constants: a1, a2, a3, ...
Index constants: 1, 2, 3, 4, ... (attributes are indexed per relation instead of named)
Tuple variables: r1, r2, r3, ...
Predicate constants: monadic P1, P2, P3, ... (relation names); dyadic =, ≠, <, ≤, >, ≥
Logical symbols: ∃, ∀, ∧, ∨, ¬
Delimiters.
Simple alpha-expressions have the form
$(t_1, t_2, \ldots, t_k) : w$
where
- w is a well-formed formula,
- $t_1, \ldots, t_k$ are distinct terms, each consisting of an indexed or a non-indexed tuple variable,
- the set of tuple variables occurring in $t_1, \ldots, t_k$ is precisely the set of free variables in w.
Example: "Find the name and location of all suppliers each of whom supplies all projects":
Alpha-expression:
$(r_1[2], r_1[3]) : P_1 r_1 \wedge \forall P_2 r_2 \, \exists P_3 r_3 \, ((r_1[1] = r_3[1]) \wedge (r_3[3] = r_2[1]))$
After reduction to relational algebra:
S1 = R1, S2 = R2, S3 = R3
S = S1 ⊗ S2 ⊗ S3
T3 = S[1=6] ∩ S[8=4]
T2 = T3[1,2,3,4,5]
T1 = T2[(4,5) ÷ (1,2)] S2
T = T1[2,3]
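As a cross-check on the reduction above, the following Python sketch evaluates the example query twice: once directly from the alpha-expression and once through the algebra steps S, T3, T2, T1, T. The contents of PROJECT and SUPPLY and their non-key attributes (a project name, a part number) are hypothetical; only the attribute positions used in the text (SUPPLY's first and third attributes, PROJECT's first) are taken from it. Positions are 0-based in the code.

```python
# Hypothetical data: PROJECT(PROJNR, PNAME), SUPPLY(SUPPLIERNR, PARTNR, PROJNR).
SUPPLIER = {(1, "Jones", "New York"), (2, "Smith", "Chicago"),
            (3, "Connors", "Boston"), (4, "Thompson", "New York")}
PROJECT = {("P1", "Bridge"), ("P2", "Tower")}
SUPPLY = {(1, "bolt", "P1"), (1, "nut", "P2"), (2, "bolt", "P1"),
          (4, "nut", "P2"), (4, "bolt", "P1")}


def by_calculus():
    """(r1[2], r1[3]) : P1 r1 and forall P2 r2 exists P3 r3
    ((r1[1] = r3[1]) and (r3[3] = r2[1]))."""
    return {(r1[1], r1[2]) for r1 in SUPPLIER
            if all(any(r3[0] == r1[0] and r3[2] == r2[0] for r3 in SUPPLY)
                   for r2 in PROJECT)}


def by_algebra():
    """The reduction from the text, with positions shifted to 0-based."""
    s = {a + b + c for a in SUPPLIER for b in PROJECT for c in SUPPLY}  # S = S1 ⊗ S2 ⊗ S3
    t3 = {t for t in s if t[0] == t[5] and t[7] == t[3]}               # S[1=6] ∩ S[8=4]
    t2 = {t[:5] for t in t3}                                           # T3[1,2,3,4,5]
    t1 = {k for k in {t[:3] for t in t2}                               # T2[(4,5) ÷ (1,2)] S2
          if PROJECT <= {t[3:5] for t in t2 if t[:3] == k}}
    return {(t[1], t[2]) for t in t1}                                  # T1[2,3]
```

Both functions return the suppliers that supply every project; here, Jones and Thompson with their locations.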
ALPHA is a language more appealing to the user than the predicate calculus. The example above may be reformulated in ALPHA as
RANGE SUPPLIER L
RANGE PROJECT P ALL
RANGE SUPPLY K SOME
GET W (L.NAME, L.LOC): (L.SUPPLIERNR = K.SUPPLIERNR) ∧ (K.PROJNR = P.PROJNR)
(the order of quantifiers must be maintained!), or, equivalently,
RANGE SUPPLIER L
RANGE PROJECT P
RANGE SUPPLY K
GET W (L.NAME, L.LOC): (∀P)(∃K)((L.SUPPLIERNR = K.SUPPLIERNR) ∧ (K.PROJNR = P.PROJNR))

5.4 Higher levels
However, ALPHA still relies on a user's formal training. For this reason the language SQUARE [Bo 74] has been devised; it has been shown to be reducible to the relational calculus. The view of relations offered by SQUARE is different from that offered by ALPHA: (i) scan the rows of a table one after another; (ii) for a given value or set of values in one column, examine the corresponding elements of another column in the same row.
Expressions of SQUARE are of a form such as
B R A (S)   (read: "find B of R where A is S")
("disjunctive mapping"), where R is a relation, A and B are sets of attributes (domain and range of the mapping, respectively), and S is an argument that may itself be an expression. Other forms, e.g. conjunctive and n-ary mappings, have a similar appearance.
Example: NAME EMP DEPT ("TOY") stands for "Find the names of employees in the toy department".
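A minimal Python sketch of the SQUARE form B R A (S), assuming single attributes for A and B (the text allows sets of attributes) and relations represented as lists of dictionaries. The EMP and DEPT contents, including the MGR attribute, are hypothetical. The nested call illustrates that the argument S may itself be the result of another mapping.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def mapping(relation: Iterable[Row], b: str, a: str, s: set) -> set:
    """SQUARE-style 'find B of R where A is S': the B-values of all rows
    of the relation whose A-value lies in the argument set s."""
    return {row[b] for row in relation if row[a] in s}


# Hypothetical relations, not taken from the source.
EMP = [{"NAME": "Adams", "DEPT": "TOY"},
       {"NAME": "Baker", "DEPT": "SHOE"},
       {"NAME": "Clark", "DEPT": "TOY"}]
DEPT = [{"DEPT": "TOY", "MGR": "Jones"},
        {"DEPT": "SHOE", "MGR": "Smith"}]

# NAME EMP DEPT ("TOY")
toy_staff = mapping(EMP, "NAME", "DEPT", {"TOY"})
# Nested argument: names of employees in the departments managed by Jones.
jones_staff = mapping(EMP, "NAME", "DEPT", mapping(DEPT, "DEPT", "MGR", {"Jones"}))
```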
More recently, attempts have been reported [Co 74] that allow a user to engage in a dialog with a relational data base system on the basis of natural English. The approach differs drastically from the ones discussed in chs.3 and 4 in that a truly two-way communication is envisioned.
5.5 Comment
It has been shown that relational algebra and ALPHA are equivalent, i.e. any query expressible in relational algebra is expressible in ALPHA and vice versa; the same holds for ALPHA and SQUARE. Equivalence is a symmetric relation. From the point of view of the definition given in sec.2, this does not preclude a hierarchy of abstract machines (relational algebra, ALPHA, SQUARE, in the direction of increasing user sophistication), since the condition of hierarchy can still be satisfied by restricting each higher level to the primitives of the level below. However, a further refinement of the notion of hierarchy is necessary.
6 Conclusions
There are some striking similarities between the examples of chs.3, 4 and 5:
- In each, the lowest level has been well formalized.
- All rely on quantification as a means for building complex expressions.
- All three systems tend towards a less formal, more natural language on the higher levels.
On the other hand, the few examples do not constitute a proof that hierarchies of successive levels, as indicated in the introduction, can be found for every application. Experiences so far have been limited, but at the very least they suggest that such hierarchies can be implemented in some well-defined situations. Furthermore, it may be necessary to provide the user with a stylized language on an intermediate level as well. Of course, the relationship between successive levels has to be made much more precise, and techniques to measure the efficiency of successive translations must be explored. Finally, the paper did not attend to the critical question of what form the root of a hierarchy should take; this appears to be a largely unsolved problem.
Acknowledgement
The author is grateful to G. Goos for carefully reading the manuscript and making helpful suggestions.
References
[Ab 74] J.R. Abrial, Data Semantics, in [Kl 74], 1-59
[Bo 74] R.F. Boyce, D.D. Chamberlin, W.F. King, M.M. Hammer, Specifying Queries as Relational Expressions, in [Kl 74], 169-176
[Bu 72] Burroughs Corp., B6700/7700 Information ..., Executive System Programming Language (ESPOL), Manual, 1972
[Co 70] E.F. Codd, A Relational Model of Data for Large Shared Data Banks, Comm. ACM 13 (1970), No. 6, 377-387
[Co 72] E.F. Codd, Relational Completeness of Data Base Sublanguages, in: R. Rustin (ed), Data Base Systems, Courant Computer Science Symp., Prentice-Hall, Inc. 1972, 65-98
[Co 74] E.F. Codd, Seven Steps to Rendezvous with the Casual User, in [Kl 74], 179-199
[Col 68] L.S. Coles, An Online Question-Answering System with Natural Language and Pictorial Input, Proc. 23rd Natl. ACM Conf. (1968), 169-181
[Go 73] G. Goos, Hierarchies, in F.L. Bauer (ed), Advanced Course on Software Engineering, Lecture Notes in Econ. and Math. Systems, vol. 81, 29-46
[Gr 69] C.C. Green, The Application of Theorem Proving to Question-Answering Systems, Tech. Rep. No. CS138, Stanford Univ. 1969
[Kl 74] J.W. Klimbie, K.L. Koffeman (eds), Data Base Management, North-Holland Publ. Co. 1974
[Kr 75] K.D. Kraegeloh, P.C. Lockemann, Hierarchies of Data Base Languages: An Example, Information Systems (in print)
[Su 74] B. Sundgren, Conceptual Foundation of the Infological Approach to Data Bases, in [Kl 74], 61-94
[SI 74] ACM SIGPLAN Symposium on Very High Level Languages, March 1974, ACM, New York 1974
[We 74] H. Wedekind, Data Base Systems I, BI-Wissenschaftsverlag, Reihe Informatik, vol. 16, 1974 (in German)
[Wi 68] N. Wirth, PL360, A Programming Language for the 360 Computers, Journ. ACM 15 (1968), No. 1, 37-74
[Wo 68] W.A. Woods, Procedural Semantics for a Question-Answering Machine, Proc. AFIPS Fall Joint Comp. Conf. 33 (1968), 457-471
[Wo 73] W.A. Woods, Progress in Natural Language Understanding - An Application to Lunar Geology, Proc. AFIPS Natl. Comp. Conf. 42 (1973), 441-450
A System for the Interactive Processing of Large Volumes of Measurement Data
Ulrich Schauer,
IBM Deutschland GmbH, Scientific Center Heidelberg
Summary
In processing measurement data one must distinguish between a standard evaluation of the measurements, which rests on a particular model, and an analysis whose aim is to recognize logical relationships and to find an explanatory model. While the standard evaluation may well run in batch mode, with a data model adapted to the linkages apparent from the model, the analysis calls for an interactive system with a data model that permits arbitrary linkages and with a data manipulation language that should be as descriptive as possible yet allow complex selection criteria. Available systems meet the requirements of analysis only partially; usually the data manipulation language lacks capabilities for computational data processing. In the following, an experimental system for processing measurement data is described, which is being developed at the IBM Scientific Center in Heidelberg.
1. INTRODUCTION
Large collections of measurement data can be exploited to the full only when the specialists responsible for the analysis (e.g. scientists and engineers, mostly without much programming experience) are enabled to carry out the processing themselves, without the help of programmers. This requires an interactive system that allows subsets of the data to be formed under complex selection criteria, fed into existing or newly written processing programs, and the results displayed in tabular or graphical form.
Even the selection criteria may involve quite intricate computations, which are best carried out with building blocks from a program library. The system can thus be adapted to particular fields by adapting the underlying program library. Since only a limited number of prefabricated programs can be available, data manipulation through a host language will often still be necessary. As host language, APL is particularly well suited to the intended goal because of its high degree of interactivity, its adaptability to the user's programming experience, and its wealth of data manipulation operations. Figure 1 gives an overview of the system structure.
[Figure 1: System structure. Components: data management system, information system, data manipulation system, interactive host language (APL)]
The data base contains both problem data and descriptive data. The program library stands symbolically for a collection of programs which may be written in PL/I, FORTRAN or Assembler, which can be invoked from APL with data from the APL workspace or from the data base, and which deliver their results back to the APL workspace. The user communicates through APL or through one of the systems embedded in APL for manipulating measurement data, programs and the associated documentation. The terminals envisaged are primarily display screens and typewriter terminals. Figure 2 gives an overview of the data components to be managed by the system.
[Figure 2: Data components. Recoverable labels: catalog processing, descriptive data, problem data]
… 5) and for data manipulation (e.g. x ← y ÷ z-100) for numeric data and, apart from arithmetic operations, also for non-numeric data. The use of APL as host language allows, in particular, convenient manipulation of rectangular structures of numeric and text data (vectors, matrices).
b) Subroutines for solving standardized problems from fields such as mathematics (e.g. numerical integration and differentiation) and statistics (e.g. linear regression, test procedures, display of frequency distributions etc.).
c) Application-specific standard procedures (e.g. analysis of ECG recordings, classification of fingerprints etc.).
The host language APL, with its many available APL library programs and the possibility of initiating graphical displays from APL, already offers all the facilities needed for data manipulation. Nevertheless, classes b) and c) are necessary components of the system. Class b) allows falling back on subroutines written in FORTRAN, PL/I or Assembler, which can bring better computing times, especially with large data volumes. Programs of class c) exist mainly in FORTRAN or PL/I, because they are usually developed for batch use.
2.2 Problem data
The system uses a relational data model: the data base consists of a collection of large tables that can be manipulated with easily understood operations (Codd /1,2,3/). Data attributes are permanently assigned to the columns of a table, as in the SEQUEL system (Boyce, Chamberlin /4,5/). Subsets of data from one or more tables are specified with a descriptive language based on example entries in the tables concerned, which is equally suited to incorporating subroutine calls into the program flow (Zloof /6/).
The data elements in a table column may be dimensioned data (e.g. vectors representing a series of measurements, or matrices which may represent several measurement series or a function of two variables, etc.). The obvious ambiguity is resolved by an interpretation assigned to the table column.
a) Interpretation attribute: governs how a matrix is to be interpreted, e.g. as the values of a function of two variables at the points of an equidistant grid. The grid points
$(x_0 + i \cdot h,\; y_0 + j \cdot k), \quad i = 0, 1, \ldots, m-1, \quad j = 0, 1, \ldots, n-1$
are defined by specifying $x_0, y_0, h, k$ and $m, n$.
b) Representation attribute: allows compression mechanisms to be specified for data representations, in addition to, for example, 1-, 2- or 4-byte integers.
c) Storage attribute: most data are stored in the XRM data base (Lorie /7/). Large data elements (e.g. digitized images) can, however, also be kept in tape or disk files managed by CMS (Conversational Monitor System) and made known to XRM merely by giving their file name and an access routine.
The system automatically converts physical units and performs automatic data conversion according to the interpretation, representation and storage attributes; it also enforces consistency rules, defined by logical conditions, for new entries or changes in a table.
2.3 Descriptive data
The system for manipulating the unformatted catalog information is a self-contained component with facilities for generating and maintaining the catalog entries about data and algorithms and for computer-aided retrieval of the relevant ones (Erbe, Walch /8/). The formatted data description is stored in the XRM data base and comprises a directory each of:
a) conversion tables for physical units,
b) methods, with program identification,
c) data attributes, with table and column identifiers.
By means of b) and c), programs and tables can be identified quickly if the name of the method or of the attributes of the table column in question is known.
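The grid interpretation of a matrix column (attribute a) in sec.2.2) and the use of a conversion directory for physical units (directory a) in sec.2.3) can be sketched in Python as follows. The function names and the two-entry conversion table are hypothetical; the factors themselves (1 nm = 0.001 µm, 1 cal = 4.1868 J for the International Table calorie) are standard values.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def grid_points(x0, y0, h, k, m, n) -> List[List[Point]]:
    """Grid points (x0 + i*h, y0 + j*k) for i = 0..m-1, j = 0..n-1."""
    return [[(x0 + i * h, y0 + j * k) for j in range(n)] for i in range(m)]


def interpret_matrix(values, x0, y0, h, k) -> Dict[Point, float]:
    """Read an m x n matrix as samples f(x, y) on that equidistant grid."""
    m, n = len(values), len(values[0])
    points = grid_points(x0, y0, h, k, m, n)
    return {points[i][j]: values[i][j] for i in range(m) for j in range(n)}


# Hypothetical excerpt of a conversion directory: (from, to) -> factor.
CONVERSIONS = {("nm", "um"): 1e-3, ("cal", "J"): 4.1868}


def convert(value, frm, to):
    """Convert value from unit frm to unit to, using the table in either direction."""
    if frm == to:
        return value
    if (frm, to) in CONVERSIONS:
        return value * CONVERSIONS[(frm, to)]
    if (to, frm) in CONVERSIONS:
        return value / CONVERSIONS[(to, frm)]
    raise KeyError(f"no conversion from {frm} to {to}")
```

Keeping the conversions in a table rather than in code mirrors the design described in sec.2.3: new units can be added by a data entry without changing any program.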
3. THE DATA MANIPULATION LANGUAGE
Initially, two language levels are provided.
3.1 Procedural language level
The following properties characterize procedural data manipulation:
a) Data are accessed by APL commands (Lorie, Symonds /9/).
b) Conversions between the external data representation in the XRM data base and the internal data representation (e.g. representation and storage) take place automatically.
c) Consistency rules are checked automatically whenever data are added or changed.
d) The data are processed table by table or row by row.
e) The user is responsible for correct processing of the data with respect to physical units and interpretation.
3.2 Descriptive language level
The non-procedural language EQBE is an extension of QBE (Query by Example, Zloof /6/). It is also suitable for users with little knowledge of APL (experience in using APL as a desk calculator suffices) and without programming experience. The language is highly descriptive. Relations and the subroutines available in the program library are represented as tables, and the user formulates his data selection by making the appropriate row entries, designating the output values and, where necessary, defining selection criteria by APL statements. EQBE is best explained by examples.
3.3 Examples of EQBE
R | R1 | R2
r | x  | y
is a skeleton for a table named R with two columns identified by R1 and R2. The values x, y represent a table row, and r is an identifier for this row. The user enters r, x and y into the skeleton
R | R1 | R2
which the system supplies when table R is requested. The data variables x, y can take on all tuple values stored in R:
$\{(x, y) \mid (x, y) \in R\}$
1. Selection of a column (projection)
R | R1 | R2
  | x  |
□ ← x
Specifying a row identifier is not necessary. □← is to be understood as the symbol for output. The query reads: wanted is the set of x values. A possible formulation in predicate calculus would be
$\{x \mid \exists y\; (x, y) \in R\}$
The domain of y of course extends only over the values in the R2 column of R. In the following we also write this more briefly as $\{x \mid u(x,)\}$ and regard $u(x,)$ as a predicate which is true if a tuple exists in R whose first component equals x.
2. Simple query with restricting conditions formulated in the host language
R | R1 | R2 | R3
u | x  | y  | z
Conditions (APL): x > 5 + y×y, (z < 25) ∨ (z > 50)
□ ← x
$\{x \mid \exists y\, z\; u(x, y, z) \wedge (x > 5 + y \times y) \wedge ((z < 25) \vee (z > 50))\}$
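3. Intersection
R | R1 | R2        S | S1 | S2
r | x  | y         s | x  | z
Conditions: x > y, z = 10
□ ← x
$\{x \mid \exists y\, z\; r(x, y) \wedge s(x, z) \wedge (x > y) \wedge (z = 10)\}$
If the constant value 10 is entered in S instead of z, the APL statement z = 10 is dropped.
4. Union
[Skeleton entries not recoverable: tuple variables u and v, conditions x > y and z = 10, output □ ← x]
$\{x \mid \exists y\; u(x, y,) \wedge (x > y)\} \cup \{x \mid \exists z\; v(x,, z) \wedge (z = 10)\}$
or
$\{x \mid (\exists y\; u(x, y,) \wedge (x > y)) \vee (\exists z\; v(x,, z) \wedge (z = 10))\}$
5. Difference
R | R1 | R2        S  | S1 | S2
r | x  | y         ¬s |    | x
□ ← x
$\{x \mid \exists y\; r(x, y) \wedge \neg s(, x)\}$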
Of course, every data variable that occurs in a negated tuple variable must also occur in a non-negated tuple variable (or be known as a global variable).
6. Cartesian product
R | R1 | R2        S | S1 | S2
r | x  | y         s | x1 | z
□ ← x, y, x1, z
$\{(x, y, x_1, z) \mid r(x, y) \wedge s(x_1, z)\}$
7. Equijoin (restriction in the Cartesian product)
R | R1 | R2        S | S1 | S2
r | x  | y         s | x  | z
□ ← x, y, z
$\{(x, y, z) \mid r(x, y) \wedge s(x, z)\}$
8. Generalized join with subsequent projection
R | R1 | R2        S | S1 | S2
r | x  | y         s | x1 | z
Condition: x ≥ y
□ ← z
$\{z \mid \exists x\, x_1\, y\; r(x, y) \wedge s(x_1, z) \wedge (x \geq y)\}$
Any Boolean function could stand in place of the ≥ operator.
9. Division
R | R1 | R2        S | S1 | S2        T | T1 | T2
r | x  | y         s | x  | z         t | .y | z
□ ← x
$\{x \mid \exists z\, \forall y \in .y\; r(x, y) \wedge s(x, z) \wedge t(y, z)\}$
where .y stands for $\{y \mid \exists \vec{x}\, \exists z\; r(x, y) \wedge s(x, z)\}$; $\vec{x}$ means that x is to be chosen fixed, and the occurrence of .y in t is to be understood such that $\forall y \in .y\; t(y, z)$ holds.
10. Grouping
$\{x \mid \forall y\, \exists z\; r(x, y) \wedge s(x, z) \wedge t(y, z)\}$
cannot be formulated so far. A means is needed to express dependency between variables. With the convention that y.z shall mean ∀y ∃z, the corresponding entries are:
[Skeleton tables R, S and T using the entries .y, y. and z; layout not recoverable]
□ ← x
We are now able to carry out every operation of relational algebra. The completeness of QBE in the extended form presented here is thereby established for simple queries involving only one operation of relational algebra. It also follows for arbitrarily composed operations: each QBE query establishes, when it is defined, a logical data view corresponding to the result table. The result table is created only when an APL program, generated by a query processor from the logical data view, is executed. A new query can be built on the logical data view of queries already defined, so that a complex query can be broken down into individual steps.
3.4 Discussion of the extensions of QBE
The extensions described below allow the treatment of fairly complex queries, as are to be expected with measurement data, without impairing simplicity for elementary queries.
a) Algorithms held in a program library (APL functions, FORTRAN subroutines, PL/I procedures or assembler routines) can be used for data evaluation or data selection within a query.
b) Arbitrary APL statements can be used within a query for data selection and evaluation. Apart from the comparison operations, QBE allows only a limited number of built-in functions such as COUNT, SUM etc.
c) The result table of a query can be displayed in a wide variety of ways by specifying format-describing forms, including graphical form, and repeatedly with changing forms.
d) Every query defines a logical data view, which can be used to decompose complex queries into a sequence of simpler queries.
e) Every query can be executed repeatedly, and the values of global variables can be changed from one execution to the next. For users experienced in APL, this opens up interesting possibilities for data processing with adaptable building blocks.
f) The decoupling effect of QBE, namely that row entries can be made in any order, has been further strengthened (use of the grouping facility).
g) The grouping facility also allows queries that elude treatment by QBE to be processed without decomposition into successive steps.
h) The counterpart in EQBE of QBE's ALL D operator (all different) is a prefixed dot; correspondingly, for the ALL operator (all, with repetitions), a prefixed dot together with the tuple identifier in parentheses. A pseudovariable such as .y or .x(r) can be used in APL statements and stands for a range of values of the same kind.
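The tuple-calculus formulas of several EQBE examples in sec.3.3 can be mirrored by Python set comprehensions, which may help when checking what a filled-in skeleton means. The sketch below covers examples 1, 3, 5 and 9 for binary relations given as sets of pairs; it is an illustration, not the query processor described in the text, and the reading of the pseudovariable .y follows the definition given for example 9.

```python
def projection(R):
    """Example 1: {x | exists y: (x, y) in R}."""
    return {x for (x, _y) in R}


def intersection(R, S):
    """Example 3: {x | exists y, z: r(x,y) and s(x,z) and x > y and z = 10}."""
    return {x for (x, y) in R for (x2, z) in S if x2 == x and x > y and z == 10}


def difference(R, S):
    """Example 5: {x | exists y: r(x,y) and not s(,x)};
    s(,x) holds if some S-tuple has x as its second component."""
    second = {z for (_x, z) in S}
    return {x for (x, _y) in R if x not in second}


def dot_y(R, S, x):
    """Pseudovariable .y for a fixed x: {y | exists z: r(x,y) and s(x,z)}."""
    has_s = any(x2 == x for (x2, _z) in S)
    return {y for (x1, y) in R if x1 == x and has_s}


def division(R, S, T):
    """Example 9: {x | exists z forall y in .y: r(x,y) and s(x,z) and t(y,z)}."""
    result = set()
    for x in {x for (x, _y) in R}:
        ys = dot_y(R, S, x)
        zs = {z for (x2, z) in S if x2 == x}
        if any(all((y, z) in T for y in ys) for z in zs):
            result.add(x)
    return result
```

Note how the division needs a single z that works for every y in .y; this is exactly the kind of variable dependency that example 10 (grouping) has to express differently.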
4. PROCESSING OF MEASUREMENT DATA
4.1 The data processing system
APL is excellently suited to the interactive analysis of measurement data that fit into the APL workspace (Schatzoff /10/). With large data volumes APL loses its attraction because, for reasons of space, data selection from tables can then no longer be expressed in APL style as a single operation over a dimensioned array, but only as a recursive rule applied to all table rows. A procedural language level with APL as host language is therefore not yet fully satisfactory. A further aspect of measurement data is that one measurement often stands for the aggregate of many individual values (e.g. a digitized measurement curve). For processing such measurements it is desirable to be able to call, from the host language APL, programs developed in another language (FORTRAN, PL/I, Assembler). Other experimental data base systems that use APL as host language are mostly designed only for small data volumes (Palermo /11/; Klebanoff, Lochovsky, Tsichritzis /12/), and they either do not allow programs not written in APL to be used at all, or only with inefficient data communication (via external files).
With the architecture described in Figure 3 we obtain a problem-solving system with
- data base access on two language levels (procedural and descriptive),
- the possibility of using prefabricated programs from an easily extensible program library (FORTRAN, PL/I or Assembler programs),
- aids for managing the documentation on data and programs,
- automatic conversion of data into the desired physical units,
- automatic data conversion as required by interpretation, representation and storage,
- support for graphical input/output devices,
- availability of programs for graphical display,
- an interface allowing input/output devices to be substituted easily.
[Figure 3: System architecture. Components shown: VM/370 Conversational Monitor System; CP/CMS commands; information system (data, methods); non-procedural language level (EQBE); procedural language level (DB service); file access; spooling; XRM database system; program library (FORTRAN, Assembler, PL/I); auxiliary processors (menu technique etc.); interface for input/output devices; station with display screen.]
4.2 Examples of Data Processing
The following two examples are intended to illustrate the problem-solving capabilities. The first example shows the link to programs from a program library; the second shows, among other things, the use of global variables.

1. Which material recorded in the database has a mean reflection coefficient (between 250 and 300 nm) greater than 60?
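The numerical step that this query delegates to the library program SIMPSONREGEL can be sketched in Python. The sketch assumes that the reflection spectrum is stored as equally spaced samples starting at 150 nm with a step width of 5 nm (the values given in the query form), and that "mean reflection coefficient" means the integral over 250 to 300 nm divided by the interval width; both the spectra and the names `simpson` and `mean_reflection` are hypothetical, not the library's actual interface.

```python
from typing import Dict, List, Sequence


def simpson(samples: Sequence[float], h: float) -> float:
    """Composite Simpson's rule for equally spaced samples with step h."""
    n = len(samples) - 1  # number of intervals
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number (>= 2) of intervals")
    odd = sum(samples[1:n:2])   # interior points with odd index, weight 4
    even = sum(samples[2:n:2])  # interior points with even index, weight 2
    return h / 3 * (samples[0] + 4 * odd + 2 * even + samples[n])


def mean_reflection(spectrum: Sequence[float], start_nm: float = 150.0,
                    step_nm: float = 5.0, x1: float = 250.0, x2: float = 300.0) -> float:
    """Assumed definition: mean over [x1, x2] = integral / (x2 - x1)."""
    i1 = round((x1 - start_nm) / step_nm)  # sample index of x1
    i2 = round((x2 - start_nm) / step_nm)  # sample index of x2
    return simpson(spectrum[i1:i2 + 1], step_nm) / (x2 - x1)


# Hypothetical spectra sampled from 150 nm to 350 nm in 5 nm steps (41 values).
spectra: Dict[str, List[float]] = {
    "material_a": [70.0] * 41,
    "material_b": [float(i) for i in range(41)],  # rises linearly with wavelength
}
result = [name for name, s in spectra.items() if mean_reflection(s) > 60]
print(result)  # ['material_a']
```

The interval 250 to 300 nm spans 10 steps of 5 nm, an even number, so Simpson's rule applies directly.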
Query form (EQBE):

MATERIALSPEKTREN | MATERIALNAME | REFLEXIONSSPEKTRUM
                 | material     | reflexion

SIMPSONREGEL | OUTPUT (AUSGABE): INTEGRALWERT | INPUT (EINGABE): INTEGRAND | LIMITS (GRENZEN)
             | integral                       | reflexion                 | x1, x2

x1 ← 250    x2 ← 300
Spectrum: start value (STARTWERT) 150 nm, step width (SCHRITTWEITE) 5 nm
Comparison value: 60

gamma [kg·dm⁻³] × c [cal/(deg·g)] × lambda [cal/(cm·deg·s)]

With this formulation, the existence of an entry in the table MATERIALWERTE is guaranteed. In addition, a contradictory entry could exist (if MATERIALNAME is not a key). With the following modification, the additional condition is either satisfied or undecidable (because no entry of material values exists):
MATERIALWERTE | NAME     | SPEZ. GEWICHT | SPEZ. WÄRME | WÄRMELEITFÄHIGKEIT
              |          | (spec. weight) | (spec. heat) | (thermal conductivity)
              | material | gamma         | c           | lambda
0.5

... (the ordered pair ⟨x,y⟩ being definable as {{x},{x,y}}), and one can say that x bears the relationship R to y provided that ⟨x,y⟩ ∈ R ("∈" being the predicate of set membership). Thus the distinction between a "relation" and a "relationship", whose confusion is another example of terminological idiocy, is made quite precise.
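The set-theoretic construction just described can be sketched in Python with frozensets: the ordered pair is built as {{x},{x,y}}, a relation is a set of such pairs (defined either by enumeration or by a property), and "x bears R to y" is simply membership. The relation names `employs` and `less_than` and their contents are hypothetical examples.

```python
from typing import Hashable, Set


def ordered_pair(x: Hashable, y: Hashable) -> frozenset:
    """Set-theoretic ordered pair <x, y> = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def bears(R: Set[frozenset], x: Hashable, y: Hashable) -> bool:
    """x bears the relationship R to y iff <x, y> is a member of R."""
    return ordered_pair(x, y) in R


# A relation defined by enumeration of its members (hypothetical data) ...
employs = {ordered_pair("Acme", "Smith"), ordered_pair("Acme", "Jones")}
# ... and one defined by a property that a pair must possess.
less_than = {ordered_pair(a, b) for a in range(4) for b in range(4) if a < b}

print(bears(employs, "Acme", "Smith"))  # True
print(bears(employs, "Smith", "Acme"))  # False: the order of the pair matters
print(bears(less_than, 1, 3))           # True
```

The construction makes ⟨x,y⟩ and ⟨y,x⟩ different sets whenever x ≠ y, which is what lets a set of pairs carry a direction.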
Relations of interest can be given names and defined either by enumeration of their members or by any property that must be possessed by a pair to enjoy membership,
in exactly the same fashion
that any other set is completely defined by its members. The equally troublesome concept of "order" can be explicitly defined.
A partial ordering is any relation having the properties
of reflexivity,
anti-symmetry and transitivity.
A linear ordering
is a partial ordering where any two elements in its field are comparable, and a well-ordering is a nowhere dense linear ordering. Structures of arbitrary complexity can be constructed. The concept of a general array (Steel 1964) developed out of some early data structure studies, and it can be shown that any nondense complex is expressible as a general array so defined.
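The definitions of partial and linear ordering above can be checked mechanically for a finite relation. In this Python sketch a relation is a set of ordered pairs, written as tuples for readability rather than in the {{x},{x,y}} form; the example sets and the helper names are hypothetical, and well-ordering is not covered.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def field(R: Set[Pair]) -> Set[Hashable]:
    """All elements occurring in some pair of R."""
    return {a for pair in R for a in pair}


def is_partial_order(R: Set[Pair]) -> bool:
    """Reflexive on its field, anti-symmetric and transitive."""
    F = field(R)
    reflexive = all((a, a) in R for a in F)
    antisymmetric = all(a == b for (a, b) in R if (b, a) in R)
    transitive = all((a, d) in R for (a, b), (c, d) in product(R, R) if b == c)
    return reflexive and antisymmetric and transitive


def is_linear_order(R: Set[Pair]) -> bool:
    """Partial order in which any two elements of the field are comparable."""
    F = field(R)
    return is_partial_order(R) and all((a, b) in R or (b, a) in R for a, b in product(F, F))


S = {1, 2, 3, 6}
divides = {(a, b) for a in S for b in S if b % a == 0}
leq = {(a, b) for a in S for b in S if a <= b}
print(is_partial_order(divides), is_linear_order(divides))  # True False
print(is_partial_order(leq), is_linear_order(leq))          # True True
```

Divisibility is a partial but not a linear ordering on {1, 2, 3, 6}, because 2 and 3 are incomparable.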
As digital computers cannot deal with dense structures except in finite approximation, this would seem to be sufficient. The modal predicate of deontic logic, "O" (for "obliged to"), and its derived predicates "O¬" ("obliged to not" ≡ "forbidden to") and "¬O¬" ("not forbidden to" ≡ "permitted to") provide the required paradigm for expressing legal constraints in the model or for defining the rules of access.
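How "O", "O¬" and "¬O¬" relate can be sketched with the possible-worlds semantics of standard deontic logic, in which O p holds when p is true in every deontically ideal world. That semantics is an assumption of this sketch (the text does not fix one), as are the access-rule atoms and the names `obliged`, `forbidden` and `permitted`; the list of ideal worlds must be non-empty, otherwise everything would come out both obligatory and forbidden.

```python
from typing import Callable, FrozenSet, Sequence

World = FrozenSet[str]                 # a world: the atomic statements true in it
Proposition = Callable[[World], bool]


def atom(name: str) -> Proposition:
    return lambda w: name in w


def obliged(p: Proposition, ideal: Sequence[World]) -> bool:
    """O p: p holds in every deontically ideal world."""
    return all(p(w) for w in ideal)


def forbidden(p: Proposition, ideal: Sequence[World]) -> bool:
    """O¬ p: obliged not to."""
    return obliged(lambda w: not p(w), ideal)


def permitted(p: Proposition, ideal: Sequence[World]) -> bool:
    """¬O¬ p: not forbidden."""
    return not forbidden(p, ideal)


# Hypothetical access rules: every access is logged; a clerk may read
# salaries but never update them.
ideal = [frozenset({"access_logged"}),
         frozenset({"access_logged", "clerk_reads_salary"})]

print(obliged(atom("access_logged"), ideal))           # True
print(forbidden(atom("clerk_updates_salary"), ideal))  # True
print(permitted(atom("clerk_reads_salary"), ideal))    # True
print(obliged(atom("clerk_reads_salary"), ideal))      # False
```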
These examples could be multiplied at considerable length, but they should suffice to illustrate the point.
From a theoretical point of
view there is no more suitable vehicle for expressing a conceptual schema.
This is, of course, not the whole story.
First, theoretical possibility and practical possibility are not identical.
There is the danger that the necessary expressions get too large and cumbersome for effective use. In an age where we deal with million-instruction operating systems, this is not a fully persuasive argument in any event.
It is, however, moot.
The number and character of the necessary expressions do not become excessive, as they would, say, in moving from conventional procedural languages to Turing machines.
On the contrary, nearly a century of search for compact notation has resulted in definitional sequences that provide more compact expression than one typically finds in programming-language data descriptions (or sub-schemas), which perform less of the task. Some of this is due, of course, to the use of large character sets, but in any case economy of notation is not a problem. A second potential difficulty is the actual use of the tools to construct the desired models, a task that is necessarily an art rather than a science.
Clearly, if the process of constructing a model could itself be formalized, one would already have the model in the input.
On this point I can only say that I have personally been partially successful in constructing models of relatively complex insurance procedures, in a matter of a few days, inventing notation as I went along.
This effort was only partially
successful in the sense that, while I was able to generate static models with no difficulty,
the problem with time and the dynamic
behavior of the model caused difficulties of two types.
First, there was the philosophical problem of the potential as opposed to the actual.
How does one treat the property "age at death" prior to
the actual death of the individual?
Formally, of course, this is trivial, but obtaining some assurance that the formalism does not hide an ambiguity or paradox is far from trivial. The second problem with time has to do with the inelegance of making the variable denoting time distinguished and, therefore, a special case. While there is nothing inherently wrong with mathematical inelegance per se, several thousand years of logical and mathematical history suggest intuitively that something is wrong. Some recent work (Thomason 1974) on the reduction of tense logic to modal logic hints at a solution to this problem. I have gone far enough with this work to become convinced that the approach is sound and that no fundamental invention is required, only some hard work to refine the ideas.

There remains, however, one further potential criticism of this approach with which it is necessary to deal.
It is a criticism to which I would prefer to reply "a pox on those who raise it" and then ignore the matter. As a practical consideration, however, it will not go away. It is much the same argument that has been raised in the past against every programming language except COBOL; i.e., the language is too much like algebra, so only mathematicians can use it. The argument is irrefutable, for if people believe they cannot understand something, they won't!
However, there is one difference between this situation and the programming language situation. The only one who must construct models is the enterprise administrator, and only the data base administrator and the applications administrators need to read such models. These individuals are presumably senior and well compensated. They can be required to have a little education.
Furthermore, while I have no proof, it is my belief that once the barrier of belief in its esoteric character is overcome,
it is no
harder to teach reasonably intelligent people the relevant logic than it is to teach them COBOL and the DDL.
To summarize this personal view of the nature of a conceptual schema: any alternative is either equivalent, and therefore equally complex while being less understood for lack of familiarity, or it is not equivalent, and therefore can only model a subset of that reality otherwise amenable to modelling.
The only real issue is
whether some less powerful but more acceptable formalism exists that is adequate for modelling anticipated enterprises for a reasonable future.
In my view neither data structure diagrams (Bachman 1969) nor normalized relations (Codd 1970) nor the CODASYL DDL (CODASYL 1971) being discussed at this Working Conference are candidates for such an alternative. As overlaid structures for internal and external schemas they may be quite suitable, the criteria for acceptability there being different.

In conclusion, let me reiterate that the latter portion of this paper is my personal view of the appropriate structure for a conceptual schema and does not necessarily represent the view of other members of the ANSI/SPARC Study Group on Data Base Management Systems.
On the other hand, the general principle of the three-level approach and the essential requirement for the conceptual schema are fundamental to the deliberations of the Study Group.
It
is reasonable to claim that this position will be maintained in the Final Report of the Study Group and will continue to characterize the official position taken by ANSI on behalf of the USA in any deliberations on data base management systems in the ISO.
REFERENCES

Bachman, C. W.: "Data Structure Diagrams", Data Base, 1:2 (1969).
Bernays, P. and Fraenkel, A. A.: "Axiomatic Set Theory", North-Holland (Amsterdam 1958).
Braithwaite, R. B.: "Scientific Explanation", Cambridge University Press (London 1953).
Carnap, R.: "Introduction to Semantics", Harvard University Press (Cambridge, MA 1942).
CMSAG Joint Utilities Project: "Data Management Systems Requirements", CMSAG (Orlando, FL 1971).
CODASYL: "A Survey of Generalized Data Base Management Systems", available from NTIS (Washington, DC 1969).
CODASYL: "Data Base Task Group Report", ACM (New York 1971).
Codd, E. F.: "A Relational Model of Data for Large Shared Data Banks", CACM, 13:6 (1970), pp. 377-387.
Gödel, K.: "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I", Monatshefte für Mathematik und Physik, 38 (1931), pp. 173-198.
GUIDE/SHARE: "Data Base Management System Requirements", SHARE Inc. (New York, N. Y. 1970).
Hilbert, D. and Ackermann, W.: "Grundzüge der Theoretischen Logik", Julius Springer (Berlin 1938).
ISO: ISO/TC97 (Geneva-3) 669.
Kleene, S. C.: "Introduction to Metamathematics", van Nostrand (Princeton, N. J. 1952).
Quine, W. V. O.: "Mathematical Logic", rev. ed., Harvard University Press (Cambridge, MA 1961).
SPARC: "Outline for Preparation of Proposals for Standardization", document SPARC/90, CBEMA (Washington, DC 1974).
SPARC: "Interim Report: Study Committee on Data Base Management Systems", SIGMOD NEWSLETTER (forthcoming).
Steel, T. B., Jr.: "Beginnings of a Theory of Information Handling", CACM, 7:2 (1964), pp. 97-103.
Tarski, A.: "Fundamentale Begriffe der Methodologie der deduktiven Wissenschaften I", Monatshefte für Mathematik und Physik, 37 (1930), pp. 361-404.
Thomason, S. K.: "Reduction of tense logic to modal logic, I", J. Symbolic Logic, 39:3 (1974), pp. 549-551.
Von Wright, G. H.: "An Essay in Modal Logic", North-Holland (Amsterdam 1951).
[Figure: schema processors and their administrators. Labels recoverable: Enterprise Administrator; Conceptual Schema Processor; Data Base Administrator; Application Administrator; External Schema Processor; Internal Schema Processor; Storage; Internal (System) Program Subsystem; Conceptual/Internal Transformer; External (Application) Program Subsystem.]