Series Editors
The Worldwide Series in Computer Science has been created to publish textbooks which both address and anticipate the needs of an ever-evolving curriculum, thereby shaping its future. It is intended for undergraduates majoring in Computer Science and for practitioners. Its philosophy derives from the conviction that the discipline of computing must produce technically skilled engineers who will inevitably face, and possibly invent, radically new technologies throughout their future careers. New media will be used innovatively to support high-quality texts written by leaders in the field.
JOHN WILEY & SONS, LTD
Chichester  New York  Weinheim  Brisbane  Singapore  Toronto
Copyright © 2000 by John Wiley & Sons, Ltd, Baffins Lane, Chichester, West Sussex PO19 1UD, England
National: 01243 779777
International: (+44) 1243 779777
e-mail (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com
Reprinted March 2001
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, UK W1P 0LP, without the permission in writing of the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system for exclusive use by the purchaser of the publication.

Neither the author nor John Wiley & Sons, Ltd accept any responsibility or liability for loss or damage occasioned to any person or property through using the material, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. The author and publisher expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Ltd is aware of a claim, the product names appear in capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Other Wiley Editorial Offices

John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA

Weinheim  Brisbane  Singapore  Toronto

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

ISBN 0-471-97697-0
Printed and bound, from authors' own electronic files, in Great Britain by Biddles Ltd, Guildford and King's Lynn. This book is printed on acid-free paper responsibly manufactured from sustainable forestry, for which at least two trees are planted for each one used for paper production.
Contents

Preface

1 Introduction
  1.1 Why study compiler construction?
    1.1.1 Compiler construction is very successful
    1.1.2 Compiler construction has a wide applicability
    1.1.3 Compilers contain generally useful algorithms
  1.2 A simple traditional modular compiler/interpreter
    1.2.1 The abstract syntax tree
    1.2.2 Structure of the demo compiler
    1.2.3 The language for the demo compiler
    1.2.4 Lexical analysis for the demo compiler
    1.2.5 Syntax analysis for the demo compiler
    1.2.6 Context handling for the demo compiler
    1.2.7 Code generation for the demo compiler
    1.2.8 Interpretation for the demo compiler
  1.3 The structure of a more realistic compiler
    1.3.1 The structure
    1.3.3 Short-cuts
  1.4 Compiler architectures
    1.4.1 The width of the compiler
  1.5 Properties of a good compiler
  1.6 Portability and retargetability
  1.7 The place and usefulness of optimizations
  1.8 A short history of compiler construction
    1.8.1 1945-1960: code generation
    1.8.2 1960-1975: parsing
    1.8.3 1975-present: code generation and code optimization; parsing
  1.9 Grammars
  1.10 Closure algorithms
  1.11 The outline code used in this book
  1.12 Conclusion
  Summary - Further reading - Exercises

2 From program text to abstract syntax tree
  2.1 From program text to tokens - the lexical structure
    Reading the program text
    Lexical versus syntactic analysis
    Regular expressions and regular descriptions
    Creating a lexical analyzer by hand
    Creating a lexical analyzer automatically
    Transition table compression
    Lexical identification of tokens
    Macro processing and file inclusion
  2.2 From tokens to syntax tree - the syntax
    Two classes of parsing methods
    Error detection and error recovery
    Creating a top-down parser manually
    Creating a top-down parser automatically
    2.2.5 Creating a bottom-up parser automatically
  2.3 Conclusion
  Summary - Further reading - Exercises

3 Annotating the abstract syntax tree - the context
  3.1 Attribute grammars
    3.1.1 Dependency graphs
    3.1.2 Attribute evaluation
    3.1.3 Cycle handling
    3.1.4 Attribute allocation
    3.1.5 Multi-visit attribute grammars
    3.1.6 Summary of the types of attribute grammars
    3.1.7 L-attributed grammars
    3.1.8 S-attributed grammars
    3.1.9 Equivalence of L-attributed and S-attributed grammars
    3.1.10 Extended grammar notations and attribute grammars
    3.1.11 Conclusion
  3.2 Manual methods
    3.2.1 Threading the AST
    3.2.2 Symbolic interpretation
    3.2.3 Data-flow equations
    3.2.4 Interprocedural data-flow analysis
    3.2.5 Carrying the information upstream - live analysis
    3.2.6 Comparing symbolic interpretation and data-flow equations
  3.3 Conclusion
  Summary - Further reading - Exercises

4 Processing the intermediate code
  4.1 Interpretation
    4.1.1 Recursive interpretation
    4.1.2 Iterative interpretation
  4.2 Code generation
    4.2.1 Avoiding code generation altogether
    4.2.2 The starting point
    4.2.3 Trivial code generation
    4.2.4 Simple code generation
    4.2.5 Code generation for basic blocks
    4.2.6 BURS code generation and dynamic programming
    4.2.7 Register allocation by graph coloring
    4.2.8 Supercompilation
    4.2.9 Evaluation of code generation techniques
    4.2.10 Debugging of code optimizers
    4.2.11 Preprocessing the intermediate code
    4.2.12 Postprocessing the target code
    4.2.13 Machine code generation
  4.3 Assemblers, linkers, and loaders
    4.3.1 Assembler design issues
    4.3.2 Linker design issues
  4.4 Conclusion
  Summary - Further reading - Exercises

5 Memory management
  5.1 Data allocation with explicit deallocation
    Basic memory allocation
    Extensible arrays
  5.2 Data allocation with implicit deallocation
    5.2.1 Basic garbage collection algorithms
    5.2.2 Preparing the ground
    5.2.3 Reference counting
    5.2.4 Mark and scan
    5.2.5 Two-space copying
    5.2.6 Compaction
    5.2.7 Generational garbage collection
  5.3 Conclusion
  Summary - Further reading - Exercises

6 Imperative and object-oriented programs
  6.1 Context handling
    6.1.1 Identification
    6.1.2 Type checking
    6.1.3 Conclusion
  6.2 Source language data representation and handling
    6.2.6 Array types
    6.2.7 Set types
    6.2.10 Interface types
  6.3 Routines and their activation
    6.3.1 Activation records
    6.3.2 Routines
    6.3.3 Operations on routines
    6.3.4 Non-nested routines
    6.3.5 Nested routines
    6.3.6 Lambda lifting
    6.3.7 Iterators and coroutines
  6.4 Code generation for control flow statements
  6.5 Code generation for modules
    6.5.1 Name generation
    6.5.2 Module initialization
    6.5.3 Code generation for generics
  6.6 Conclusion
  Further reading - Exercises

7 Functional programs
  7.1 A short tour
    7.1.1 Offside rule
    7.1.2 Lists
    7.1.3 List comprehension
  The translation of lists
  The translation of pattern matching
  The translation of list comprehension
  The translation of nested functions
  The accumulator transformation
  Summary - Further reading - Exercises

8 Logic programs
  The general implementation model, interpreted
  8.3 Unification
    8.3.1 Unification of structures, lists, and sets
    8.3.2 The implementation of unification
    8.3.3 Unification of two unbound variables
    8.3.4 Conclusion
  8.4 The general implementation model, compiled
    8.4.1 List procedures
    8.4.2 Compiled clause search and unification
    8.4.3 Optimized clause selection in the WAM
    8.4.4 Implementing the 'cut' mechanism
    8.4.5 Implementing the predicates assert and retract
  8.5 Compiled code for unification
    8.5.1 Unification instructions in the WAM
    8.5.2 Deriving a unification instruction by manual partial evaluation
    8.5.3 Unification of structures in the WAM
    8.5.4 An optimization: read/write mode
    8.5.5 Further unification optimizations in the WAM
    8.5.6 Conclusion
  Summary - Further reading - Exercises

9 Parallel and distributed programs
  9.1 Parallel programming models
    9.1.1 Shared variables and monitors
    9.1.2 Message passing models
    9.1.3 Object-oriented languages
    9.1.4 The Linda Tuple space
    9.1.5 Data-parallel languages
  9.2 Processes and threads
  9.3 Shared variables
    9.3.1 Locks
    9.3.2 Monitors
  9.4 Message passing
    9.4.1 Locating the receiver
    9.4.2 Marshaling
    9.4.3 Type checking of messages
    9.4.4 Message selection
  9.5 Parallel object-oriented languages
    9.5.1 Object location
    9.5.2 Object migration
    9.5.3 Object replication
  9.6 Tuple space
    9.6.1 Avoiding the overhead of associative addressing
    9.6.2 Distributed implementations of the tuple space
  9.7 Automatic parallelization
    9.7.1 Exploiting parallelism automatically
    9.7.2 Data dependencies
    9.7.3 Loop transformations
    9.7.4 Automatic parallelization for distributed-memory machines
  9.8 Conclusion
  Summary - Further reading - Exercises

Appendix A
  A.1 Syntax-determined classes and semantics-determining methods
  A.2 The simple object-oriented compiler
  A.3 Object-oriented parsing
  A.4 Evaluation
  Exercise
Preface

In the 1980s and 1990s, while the world was witnessing the rise of the PC and the Internet on the front pages of the daily newspapers, compiler design methods developed with less fanfare, developments seen mainly in the technical journals, and - more importantly - in the compilers that are used to process today's software. These developments were driven partly by the advent of new programming paradigms, partly by a better understanding of code generation techniques, and partly by the introduction of faster machines with large amounts of memory.

The field of programming languages has grown to include, besides the traditional imperative paradigm, the object-oriented, functional, logical, and parallel/distributed paradigms, which inspire novel compilation techniques and which often require more extensive run-time systems than do imperative languages. BURS techniques (Bottom-Up Rewriting Systems) have evolved into very powerful code generation techniques which cope superbly with the complex machine instruction sets of present-day machines. And the speed and memory size of modern machines allow compilation techniques and language features that were unthinkable before. Modern compiler design methods meet these challenges head-on.
Our audience consists of mature students in one of their final years, who have at least used a compiler occasionally and given some thought to the concept of compilation. When these students leave the university, they will have to be familiar with language processors for each of the modern paradigms, using modern techniques. Although curriculum requirements in many universities may have been lagging behind in this respect, graduates entering the job market cannot afford to ignore these developments.

Experience has shown us that a considerable number of techniques traditionally taught in compiler construction are special cases of more fundamental techniques. Often these techniques work for imperative languages only, whereas the fundamental techniques have a much wider applicability. We have therefore tried to present the material by carefully distinguishing between:
-material that is useful to the student;
-material that will benefit the student only in exceptional cases;
-details and optimizations.

Students who have finished this book can expect to:
-have obtained an understanding of the concepts of modern compiler design and some familiarity with their practical application;
-be able to start participating in the construction of a language processor for each of the modern paradigms after a minimal training period;
-be able to read the literature.
The first two provide a firm basis; the third provides potential for growth.
The book consists of two parts. The first, comprising Chapters 1 through 5, is concerned with language processing in general; it includes a chapter on memory management, both in the compiler and in the generated code. The second part, Chapters 6 through 9, covers the specific techniques required by the various programming paradigms treated in this book.

The structure of the book:

How to do:                      in imperative and          in functional        in logic             in parallel/distributed
                                object-oriented programs   programs             programs             programs
                                (Chapter 6)                (Chapter 7)          (Chapter 8)          (Chapter 9)

analysis (Chapter 2)
context handling (Chapter 3)    identifier                 polymorphic          static rule          Linda static
                                identification             type checking        matching             analysis
synthesis (Chapter 4)           code for                   code for list        structure            marshaling
                                while-statement            comprehension        unification
run-time systems (no chapter)   stack                      reduction machine    Warren Abstract      replication
                                                                                Machine

The interactions between the two parts are outlined in the table above. The leftmost column shows the four phases of compiler construction: analysis, context handling, synthesis, and run-time systems; the chapters in this column cover both the manual and the automatic creation of the pertinent software. The top row shows the paradigms covered in this book; for each paradigm, an example of a subject treated by each of the phases is shown. The paradigm chapters discuss manual techniques only, all automatic techniques having been described in the first part. Ideally the table would be square, with all boxes filled, but some entries are missing and there is no chapter on run-time systems. The top right entries would cover the text analysis of logic and parallel/distributed languages, but the present text analysis techniques are flexible enough - and the languages similar enough - that there is nothing to be said there, for lack of problems. The bottom entries would discuss manual and automatic techniques
for creating run-time systems. Unfortunately there is little or no theory on this subject: run-time systems are still crafted by hand by programmers on an intuitive basis; there is nothing to be said there, for lack of solutions.

Chapter 1 introduces the reader to compiler design by examining a simple traditional modular compiler/interpreter in detail. Several high-level aspects of compiler construction are discussed, followed by a short history of compiler construction and an introduction to formal grammars.

Chapter 2 treats the analysis phase of a compiler: the conversion of the program text to an abstract syntax tree. Techniques for lexical analysis, lexical identification of tokens, and syntax analysis are discussed.

Chapter 3 covers the second phase of a compiler: context handling. Several methods of context handling are discussed: automated ones using attribute grammars, manual ones using L-attributed and S-attributed grammars, and semi-automated ones using symbolic interpretation and data-flow analysis.

Chapter 4 covers the synthesis phase of a compiler, covering both interpretation and code generation. The section on code generation is mainly concerned with machine code generation; the intermediate code required for paradigm-specific constructs is treated in Chapters 6 through 9.

Chapter 5 concerns memory management techniques, both for use in the compiler and in the generated program.
Chapters 6 through 9 address the special problems in compiling for the various paradigms - imperative, object-oriented, functional, logic, and parallel/distributed. Compilers for imperative and object-oriented programs are similar enough to be treated together in one chapter, Chapter 6. Appendix A discusses a possible but experimental method of object-oriented compiler construction, in which an attempt is made to exploit object-oriented concepts to simplify compiler design.

Several subjects in this book are treated in a non-traditional way, and some words of justification may be in order.

Lexical analysis is based on the same dotted items that are traditionally reserved for bottom-up syntax analysis, rather than on Thompson's NFA construction. We see the dotted item as the essential tool in bottom-up pattern matching, unifying lexical analysis, LR syntax analysis, and bottom-up code generation. The traditional lexical algorithms are just low-level implementations of item manipulation. We consider the different treatment of lexical and syntax analysis to be a historical artifact. Also, the difference between the lexical and the syntax levels tends to disappear in modern software.

Considerable attention is being paid to attribute grammars, in spite of the fact that their impact on compiler design has been limited. Still, they are the only known way of automating context handling, and we hope that the present treatment will help to lower the threshold of their application.

Functions as first-class data are covered in much greater depth in this book than is usual in compiler design books. After a good start in Algol 60, functions lost much status as manipulatable data in languages like C, Pascal, and Ada, although Ada 95 rehabilitated them somewhat. The implementation of some modern concepts, for example functional and logic languages, iterators, and continuations, however, requires functions to be manipulated as normal data. The fundamental aspects of the implementation are covered in the chapter on imperative and object-oriented languages; specifics are given in the chapters on the various other paradigms.

An attempt at justifying the outline code used in this book to specify algorithms can be found in Section 1.11.

Additional material, including more answers to exercises, and all diagrams and all code from the book, are available through John Wiley's Web page.
The book contains far too much material for a compiler design course of 13 lectures of two hours each, as given at our university, so a selection has to be made. Depending on the maturity of the audience, an introductory, more traditional course can be obtained by including, for example,

Chapter 1; Chapter 2 up to 2.1.7; 2.1.10; 2.1.11; 2.2 up to 2.2.4.5; 2.2.5 up to 2.2.5.7; Chapter 3 up to 3.1.2; 3.1.7 up to 3.1.10; 3.2 up to 3.2.2.2; 3.2.3; Chapter 4 up to 4.1; 4.2 up to 4.2.4.3; 4.2.6 up to 4.2.6.4; 4.2.11; Chapter 5 up to 5.1.1.1; 5.2 up to 5.2.4; Chapter 6 up to 6.2.3.2; 6.2.4 up to 6.2.10; 6.4 up to 6.4.2.3.

A more advanced course would include all of Chapters 1 to 6, excluding Section 3.1. This could be augmented by one of Chapters 7 to 9 and perhaps Appendix A.

An advanced course would skip much of the introductory material and concentrate on the parts omitted in the introductory course: Section 3.1 and all of Chapters 5 to 9, plus Appendix A.
We owe many thanks to the following people, who were willing to spend time and effort on reading drafts of our book and to supply us with many useful and sometimes very detailed comments: Mirjam B&er, Raoul Bhoedjang, Wilfred Dittmer, Thomer M. Gil, Ben N. Hasnai, Bert Huijben, Jaco A. Imthorn, John Romein, Tim Ruhl, and the anonymous reviewers. We thank Ronald Veldema for the Pentium code segments. We are grateful to Simon Plumtree, Gaynor Redvers-Mutton, Dawn Booth, and Jane Kerr of John Wiley & Sons Ltd, for their help and encouragement in writing this book. Lambert Meertens kindly provided information on an older ABC compiler, and Ralph Griswold on an Icon compiler.

We thank the Faculteit Wiskunde en Informatica (now part of the Faculteit der Exacte Wetenschappen) of the Vrije Universiteit for their support and the use of their equipment.
Dick Grune
Henri E. Bal
Ceriel J.H. Jacobs
Koen G. Langendoen
[email protected], http://www.cs.vu.nl/~dick
[email protected], http://www.cs.vu.nl/~bal
[email protected], http://www.cs.vu.nl/~ceriel
[email protected], http://pds.twi.tudelft.nl/~koen
Amsterdam, May 2000
Java is a trademark of Sun Microsystems
Miranda is a trademark of Research Software Ltd
MS-DOS is a trademark of Microsoft Corporation
Pentium is a trademark of Intel
PostScript is a trademark of Adobe Systems Inc
Smalltalk is a trademark of Xerox Corporation
UNIX is a trademark of AT&T
Introduction
In its most general form, a compiler is a program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text. This process is called translation, as it would be if the texts were in natural languages. Almost all compilers translate from one input language, the source language, to one output language, the target language, only. One normally expects the source and target language to differ greatly: the source language could be C and the target language might be machine code for the Pentium processor series. The language the compiler itself is written in is called the implementation language.

The main reason why one wants such a translation is that one has hardware on which one can 'run' the translated program, or more precisely: have the hardware perform the actions described by the semantics of the program. After all, hardware is the only real source of computing power. Running a translated program often involves feeding it input data in some format, and will probably result in some output data in some other format. The input data can derive from a variety of sources; examples are files, keystrokes, and network packages. Likewise, the output can go to a variety of places; examples are files, monitor screens, and printers.

To obtain the translated program, we run a compiler, which is just another program whose input data is a file with the format of a program source text and whose output data is a file with the format of executable code. A subtle point here is that the file containing the executable code is (almost) tacitly converted to a runnable program; on some operating systems this requires some action, for example setting the 'execute' attribute.

To obtain the compiler, we run another compiler whose input consists of compiler source text and which will produce executable code for it, as it would for any program source text. This process of compiling and running a compiler is depicted in Figure 1.1; that compilers can and do compile compilers sounds more confusing than it is. When the source language is also the implementation language and the source text to be compiled is
actually a new version of the compiler itself, the process is called bootstrapping. The term 'bootstrapping' is traditionally attributed to a story of Baron von Münchhausen (1720-1797), although in the original story the baron pulled himself from a swamp by his hair plait, rather than by his bootstraps (Anonymous, 1840).
Figure 1.1 Compiling and running a compiler.
Compilation does not differ fundamentally from file conversion but it does differ in degree. One clear aspect of compilation is that the input has a property called semantics, its 'meaning', which must be preserved by the process and which is often less clearly identifiable in a traditional file conversion program, for example one that converts EBCDIC to ASCII. On the other hand, a GIF to JPEG converter has to preserve the visual impression of the picture, which might with some justification be called its semantics. In the final analysis, a compiler is just a giant file conversion program.

The compiler can work its magic because of two factors:
-the input is in a language and consequently has a structure, which is described in the language reference manual;
-the semantics of the input is described in terms of and is attached to that structure.

These factors enable the compiler to 'understand' the program and to collect its semantics in a semantic representation. The same two factors exist with respect to the target language. This allows the compiler to reformulate the collected semantics in terms of the target language. How all this is done in detail is the subject of this book.
(Figure not reproduced: the source text enters the front-end, which produces the semantic representation; the back-end turns this into the target code.)
Figure 1.2 Conceptual structure of a compiler.
The part of a compiler that performs the analysis of the source language text is called the front-end, and the part that does the target language synthesis is the back-end; see Figure 1.2. If the compiler has a very clean design, the front-end is totally unaware of the target language and the back-end is totally unaware of the source language: the only thing they have in common is knowledge of the semantic representation. There are technical reasons why such a strict separation is inefficient, and in practice even the best-structured compilers compromise.

The above description immediately suggests another mode of operation for a compiler: if all required input data are available, the compiler could perform the actions specified by the semantic representation rather than re-express them in a different form. The code-generating back-end is then replaced by an interpreting back-end, and the whole program is called an interpreter. There are several reasons for doing this, some fundamental and some more opportunistic.

One fundamental reason is that an interpreter is normally written in a high-level language and will therefore run on most machine types, whereas generated object code will only run on machines of the target type: in other words, portability is increased. Another is that writing an interpreter is much less work than writing a back-end.

A third reason for using an interpreter rather than a compiler is that performing the actions straight from the semantic representation allows better error checking and reporting to be done. This is not fundamentally so, but is a consequence of the fact that compilers (front-end/back-end combinations) are expected to generate efficient code. As a result, most back-ends throw away any information that is not essential to the program execution in order to gain speed; this includes much information that could have been useful in giving good diagnostics, for example source code line numbers.

A fourth reason is the increased security that can be achieved by interpreters; this effect has played an important role in Java's rise to fame. Again, this increased security is not fundamental since there is no reason why compiled code could not do the same checks an interpreter can. Yet it is considerably easier to convince oneself that an interpreter does not play dirty tricks than that there are no booby traps hidden in binary executable code.

It should be pointed out that there is no fundamental difference between using a compiler and using an interpreter. In both cases the program text is processed into an intermediate form, which is then interpreted by some interpreting mechanism.
The original meaning of 'to compile' is 'to select representative material and add it to a collection'; present-day makers of compilation compact discs use the term in its proper meaning. In its early days programming language translation was viewed in the same way: when the input contained for example 'a + b', a prefabricated code fragment 'load a in register; add b to register' was selected and added to the output. A compiler compiled a list of code fragments to be added to the translated program. Today's compilers, especially those for the non-imperative programming paradigms, often perform much more radical transformations on the input program.

In compilation,
-the program processing is considerable;
-the resulting intermediate form, machine-specific binary executable code, is low-level;
-the interpreting mechanism is the hardware CPU; and
-program execution is relatively fast.
In interpretation,
-the program processing is minimal to moderate;
-the resulting intermediate form, some system-specific data structure, is high- to medium-level;
-the interpreting mechanism is a (software) program; and
-program execution is relatively slow.
These relationships are summarized graphically in Figure 1.3. Section 4.2.3 shows how a fairly smooth shift from interpreter to compiler can be made.

After considering the question of why one should study compiler construction (Section 1.1) we will address some general issues in compiler construction, including compiler architecture (Sections 1.2 to 1.5), retargetability (Section 1.6), and the place of optimizations (Section 1.7). This is followed by a short history of compiler construction (Section 1.8). Next are two more theoretical subjects: an introduction to context-free grammars (Section 1.9), and a general closure algorithm (Section 1.10). A brief explanation of the pseudo-code used in the book (Section 1.11) concludes this introductory chapter.

Occasionally, the structure of the text will be summarized in a 'roadmap' like the one below.
(Figure not reproduced.)
Figure 1.3 Comparison of a compiler and an interpreter.

Roadmap
1 Introduction
1.1 Why study compiler construction?
1.2 A simple traditional modular compiler/interpreter
1.3 The structure of a more realistic compiler
1.4 Compiler architectures
1.5-1.7 Properties of a good compiler
1.8 A short history of compiler construction
1.9 Grammars
1.10 Closure algorithms
1.11 The outline code used in this book

There are a number of objective reasons why studying compiler construction is a good idea:
-compiler construction is a very successful branch of computer science, and one of the earliest to earn that predicate;
-given its close relation to file conversion, it has wider application than just compilers;
-it contains many generally useful algorithms in a realistic setting.
We will have a closer look at each of these below. The main subjective reason to study compiler construction is of course plain curiosity: it is fascinating to see how compilers manage to do what they do.
Compiler construction is a very successful branch of computer science. Some of the reasons for this are the proper structuring of the problem, the judicious use of formalisms, and the use of tools wherever possible.
Compilers analyze their input, construct a semantic representation, and synthesize their output from it. This analysis-synthesis paradigm is very powerful and widely applicable. A program for tallying word lengths in a text could for example consist of a front-end which analyzes the text and constructs internally a table of (length, frequency) pairs, and a back-end which then prints this table. Extending this program, one could replace the text-analyzing front-end by a module that collects file sizes in a file system; alternatively, or additionally, one could replace the back-end by a module that produces a bar graph rather than a printed table; we use the word 'module' here to emphasize the exchangeability of the parts. In total, four programs have already resulted, all centered around the semantic representation and each reusing lots of code from the others.

Likewise, without the strict separation of analysis and synthesis phases, programming languages and compiler construction would not be where they are today. Without it, each new language would require a completely new set of compilers for all interesting machines, or die for lack of support. With it, a new front-end for that language suffices, to be combined with the existing back-ends for the current machines: for L languages and M machines, L front-ends and M back-ends are needed, requiring L+M modules, rather than LxM programs. See Figure 1.4.

It should be noted immediately, however, that this strict separation is not completely free of charge. If, for example, a front-end knows it is analyzing for a machine with special machine instructions for multi-way jumps, it can probably analyze case/switch statements so that they can benefit from these machine instructions. Similarly, if a back-end knows it is generating code for a language which has no nested routine declarations, it can generate simpler code for routine calls. Many professional compilers are integrated compilers, for one programming language and one machine architecture, using a semantic representation which derives from the source language and which may already contain elements of the target machine. Still, the structuring has played and still plays a large role in the rapid introduction of new languages and new machines.
(Figure not reproduced: L front-ends and M back-ends, joined through a common semantic representation.)
Figure 1.4 Creating compilers for L languages and M machines.

For some parts of compiler construction excellent standardized formalisms have been developed, which greatly reduce the effort to produce these parts. The best examples are regular expressions and context-free grammars, used in lexical and syntactic analysis. Enough theory about these has been developed from the 1960s onwards to fill an entire course, but the practical aspects can be taught and understood without going too deeply into the theory. We will consider these formalisms and their applications in Chapter 2.

Attribute grammars are a formalism that can be used for handling the context, the long-distance relations in a program that link, for example, the use of a variable to its declaration. Since attribute grammars are capable of describing the full semantics of a language, their use can be extended to interpretation or code generation, although other techniques are perhaps more usual. There is much theory about them, but they are less well standardized than regular expressions and context-free grammars. Attribute grammars are covered in Section 3.1.

Object code generation for a given machine involves a lot of nitty-gritty programming when done manually, but the process can be automated, for example by using pattern matching and dynamic programming techniques. Quite a number of formalisms have been designed for the description of machine code, both at the assembly level and at the binary level, but none has gained wide acceptance to date and each compiler writing system has its own version. Automated code generation is treated in Section 4.2.6.
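To give a flavor of these two kinds of formalisms, here is a small illustrative fragment of our own, not taken from the book: a regular description of identifiers and integers, and a context-free grammar rule for expressions.

letter     → 'a' | 'b' | ... | 'z'
digit      → '0' | '1' | ... | '9'
identifier → letter (letter | digit)*
integer    → digit digit*

expression → expression '+' term | term

The regular descriptions can be fed to a lexical analyzer generator, and the grammar rule to a parser generator, as discussed below and in Chapter 2.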
Once one has the proper formalism in which to describe what a program should do, one can generate a program from it, using a program generator. Examples are lexical analyzers generated from regular descriptions of the input, parsers generated from grammars (syntax descriptions), and code generators generated from machine descriptions. All these are generally more reliable and easier to debug than their handwritten counterparts; they are often more efficient too.

Generating programs rather than writing them by hand has several advantages:
-The input to a program generator is of a much higher level of abstraction than the handwritten program would be. The programmer needs to specify less, and the tools take responsibility for much error-prone housekeeping. This increases the chances that the program will be correct. For example, it would be cumbersome to write parse tables by hand.
-The use of program-generating tools allows increased flexibility and modifiability. For example, if during the design phase of a language a small change in the syntax is considered, a handwritten parser would be a major stumbling block to any such change. With a generated parser, one would just change the syntax description and generate a new parser.
-Pre-canned or tailored code can be added to the generated program, enhancing its power at hardly any cost. For example, input error handling is usually a difficult affair in handwritten parsers; a generated parser can include tailored error correction code with no effort on the part of the programmer.
-A formal description can sometimes be used to generate more than one type of program. For example, once we have written a grammar for a language with the purpose of generating a parser from it, we may use it to generate a syntax-directed editor, a special-purpose program text editor that guides and supports the user in editing programs in that language.

In summary, generated programs may be slightly more or slightly less efficient than handwritten ones, but generating them is so much more efficient than writing them by hand that whenever the possibility exists, generating a program is almost always to be preferred.

The technique of creating compilers by program-generating tools was pioneered by Brooker et al. (1963), and its importance has continually risen since. Programs that generate parts of a compiler are sometimes called compiler compilers, although this is clearly a misnomer. Yet, the term lingers on.
Compiler construction techniques can be and are applied outside compiler construction in its strictest sense. Alternatively, more programming can be considered compiler construction than one would traditionally assume. Examples are reading structured data, rapid introduction of new formats, and general file conversion problems.

If data has a clear structure it is generally possible to write a grammar for it. Using a parser generator, a parser can then be generated automatically. Such techniques can, for example, be applied to rapidly create 'read' routines for HTML files, PostScript files, etc. This also facilitates the rapid introduction of new formats. Examples of file conversion systems that have profited considerably from compiler construction techniques are TeX text formatters, which convert TeX text to dvi format, and PostScript interpreters, which convert PostScript text to instructions for a specific printer.
A third reason to study compiler construction lies in the generally useful data structures and algorithms compilers contain. Examples are hashing, precomputed tables, the stack mechanism, garbage collection, dynamic programming, and graph algorithms. Although each of these can be studied in isolation, it is educationally more valuable and satisfying to do so in a meaningful context.
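As a small illustration of how one of these, hashing, shows up in a compiler, a possible hash function for entering identifiers into a fixed-size symbol table could be sketched in C as follows; this sketch is ours and does not appear in the book.

#include <stdio.h>
#include <stddef.h>

#define TABLE_SIZE 1024     /* number of buckets in the (hypothetical) symbol table */

/* Simple multiplicative string hash mapping an identifier to a bucket index. */
size_t hash_identifier(const char *name) {
    size_t h = 0;

    while (*name != '\0') {
        h = h * 31 + (unsigned char)*name++;   /* fold in the next character */
    }
    return h % TABLE_SIZE;
}

int main(void) {
    printf("%zu\n", hash_identifier("length"));   /* bucket index for the identifier 'length' */
    return 0;
}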
In this section we will show and discuss a simple demo compiler and interpreter, to introduce the concepts involved and to set the framework for the rest of the book. Turning to Figure 1.2, we see that the heart of a compiler is the semantic representation of the program being compiled. This semantic representation takes the form of a data structure, called the 'intermediate code' of the compiler. There are many possibilities for the form of the intermediate code; two usual choices are linked lists of pseudo-instructions and annotated abstract syntax trees. We will concentrate here on the latter, since the semantics is primarily attached to the syntax tree.
The syntax tree of a program text is a data structure which shows precisely how the various segments of the program text are to be viewed in terms of the grammar. The syntax tree can be obtained through a process called 'parsing'; in other words, parsing is the process of structuring a text according to a given grammar. For this reason, syntax trees are also called parse trees; we will use the terms interchangeably, with a slight preference for 'parse tree' when the emphasis is on the actual parsing. Conversely, parsing is also called syntax analysis, but this has the problem that there is no corresponding verb 'to syntax-analyze'. (In linguistic and educational contexts, the verb 'to parse' is also used for the determination of word classes: determining that in 'to go by' the word 'by' is an adverb and in 'by the way' it is a preposition. In computer science the word is used exclusively to refer to syntax analysis.) The parser can be written by hand if the grammar is very small and simple; for larger and/or more complicated grammars it can be generated by a parser generator. Parser generators are discussed in Chapter 2.

The exact form of the parse tree as required by the grammar is often not the most convenient one for further processing, so usually a modified form of it is used, called an abstract syntax tree, or AST. Detailed information about the semantics can be attached to the nodes in this tree through annotations, which are stored in additional data fields in the nodes; hence the term annotated abstract syntax tree. Since unannotated ASTs are of limited use, ASTs are always more or less annotated in practice, and the abbreviation 'AST' is used also for annotated ASTs.

Examples of annotations are type information ('this assignment node concerns a Boolean array assignment') and optimization information ('this expression does not contain a function call'). The first kind is related to the semantics as described in the manual, and is used, among other things, for context error checking. The second kind is not related to anything in the manual but may be important for the code generation phase. The annotations in a node are also called the attributes of that node, and since a node represents a grammar symbol, one also says that the grammar symbol has the corresponding attributes. It is the task of the context handling module to determine and place the annotations or attributes.

Figure 1.5 shows the expression b*b - 4*a*c as a parse tree; the grammar used for expression is similar to those found in the Pascal, Modula-2, or C manuals:
(Parse tree drawing not reproduced.)
Figure 1.5 The expression b*b - 4*a*c as a parse tree.
expression → expression '+' term | expression '-' term | term
term       → term '*' factor | term '/' factor | factor
factor     → identifier | constant | '(' expression ')'
Figure 1.6 shows the same expression as an AST and Figure 1.7 shows it as an annotated AST in which possible type and location information has been added. The location information in Figure 1.7 suggests offsets from a stack pointer and allocation in a machine register, but the precise nature of the information is not important at this point. What is important is that we see a shift in emphasis from syntactic structure to semantic contents.

Usually the grammar of a programming language is not specified in terms of input characters but of input 'tokens'. Examples of input tokens are identifiers (for example length or a5), strings ("Hello!", "!@#"), numbers (0, 123e-5), keywords (begin, real), compound operators (++, :=), separators (;, [), etc. Input tokens may be and sometimes must be separated by white space, which is otherwise ignored. So before feeding the input program text to the parser, it must be divided into tokens. Doing so is the task of the lexical analyzer; the activity itself is sometimes called 'to tokenize', but the literary value of this word is doubtful.
Figure 1.6 The expression b*b - 4*a*c as an AST.
(Annotated AST drawing not reproduced; each node carries annotations such as type: real and loc: reg1, loc: sp+16, or loc: const.)
Figure 1.7 The expression b*b - 4*a*c as an annotated AST.
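One possible C representation of such an annotated expression node is sketched below; the field names and the Type enumeration are illustrative assumptions of ours and differ from the simpler Expression node that the demo compiler itself uses (Figure 1.16).

typedef enum {TYPE_REAL, TYPE_INT, TYPE_BOOL} Type;   /* hypothetical type annotation */

typedef struct annotated_node {
    char oper;                            /* '-', '*', or 'N' for a leaf (name or constant) */
    struct annotated_node *left, *right;  /* operands; NULL in leaf nodes */
    const char *name;                     /* identifier or constant text, for leaves */
    /* annotations placed by the context handling module: */
    Type type;                            /* e.g. TYPE_REAL */
    int loc;                              /* e.g. a register number or a stack offset */
} Annotated_node;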
We see that the front-end in Figure 1.2 must at least contain a lexical analyzer, a syntax analyzer (parser), and a context handler, in that order. This leads us to the structure of the demo compiler/interpreter shown in Figure 1.8. The back-end allows two intuitively different implementations: a code generator and an interpreter. Both use the AST, the first for generating machine code, the second for performing the implied actions immediately.
Figure 1.8 Structure of the demo compiler/interpreter.
To avoid the host of detailed problems that marks much of compiler writing, we will base our demonstration compiler on fully parenthesized arithmetic expressions with operands of one digit. An arithmetic expression is 'fully parenthesized' if each operator plus its operands is enclosed in a set of parentheses and no other parentheses occur. This makes parsing almost trivial, since each open parenthesis signals the start of a lower level in the parse tree and each close parenthesis signals the return to the previous, higher level: a fully parenthesized expression can be seen as a linear notation of a parse tree.

expression → digit | '(' expression operator expression ')'
operator   → '+' | '*'
digit      → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Figure 1.9 Grammar for simple fully parenthesized expressions.
To simplify things even further, we will have only two operators, + and *. On the other hand, we will allow white space, including tabs and newlines, in the input. The grammar in Figure 1.9 produces such forms as 3, (5+8), and (2*((3*4)+9)). Even this almost trivial language allows us to demonstrate the basic principles of both compiler and interpreter construction, with the exception of context handling: the language just has no context to handle.
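As a worked illustration (ours), the form (2*((3*4)+9)) can be derived from the grammar of Figure 1.9 step by step:

expression
→ '(' expression operator expression ')'
→ '(' digit operator expression ')'
→ '(' 2 '*' expression ')'
→ '(' 2 '*' '(' expression operator expression ')' ')'
→ '(' 2 '*' '(' '(' expression operator expression ')' '+' digit ')' ')'
→ '(' 2 '*' '(' '(' 3 '*' 4 ')' '+' 9 ')' ')'
= (2*((3*4)+9))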
#include   "parser.h"     /* for type AST_node */
#include   "backend.h"    /* for Process() */
#include   "error.h"      /* for Error() */

int main(void) {
    AST_node *icode;

    if (!Parse_program(&icode)) Error("No top-level expression");
    Process(icode);

    return 0;
}

Figure 1.10 Driver for the demo compiler.
Figure 1.10 shows the driver of the compiler/interpreter, in C. It starts by including the definition of the syntax analyzer, to obtain the definitions of type AST_node and of the routine Parse_program(), which reads the program and constructs the AST. Next it includes the definition of the back-end, to obtain the definition of the routine Process(), for which either a code generator or an interpreter can be linked in. It then calls the front-end and, if it succeeds, the back-end. (It should be pointed out that the condensed layout used for the program texts in the following sections is not really favored by any of the authors but is solely intended to keep each program text on a single page. Also, the #include commands for various system routines have been omitted.)
The tokens in our language are (, ), +, *, and digit. Intuitively, these are five different tokens, but actually digit consists of ten tokens, for a total of 14. Our intuition is based on the fact that the parser does not care exactly which digit it sees, so as far as the parser is concerned, all digits are one and the same token: they form a token class. On the other hand, the back-end is interested in exactly which digit is present in the input, so we have to preserve the digit after all. We therefore split the information about a token into two parts, the class of the token and its representation. This is reflected in the definition of the type Token_type in Figure 1.11, which has two fields, one for the class of the token and one for its representation.

For token classes that contain only one token which is also an ASCII character (for example +), the class is the ASCII value of the character itself. The class of digits is DIGIT, which is defined in lex.h as 257, and the repr field is set to the representation of the digit. The class of the pseudo-token end-of-file is EOF, which is defined as 256; it is useful to treat the end of the file as a genuine token. These numbers over 255 are chosen to avoid collisions with any ASCII values of single characters.
/* Define class constants */
/* Values 0-255 are reserved for ASCII characters */
#define EOF     256
#define DIGIT   257

typedef struct {int class; char repr;} Token_type;

extern Token_type Token;
extern void get_next_token(void);

Figure 1.11 Header file lex.h for the demo lexical analyzer.
The representation of a token has at least two important uses. First, it is processed in one or more phases after the parser to produce semantic information; examples are a numeric value produced from an integer token, and an identification in some form from an identifier token. Second, it is used in error messages, to display the exact form of the token. In this role the representation is useful for all tokens, not just for those that carry semantic information, since it enables any part of the compiler to produce directly the correct printable version of any token.

The representation of a token is usually a string, implemented as a pointer, but in our demo compiler all tokens are single characters, so a field of type char suffices.

The implementation of the demo lexical analyzer, as shown in Figure 1.12, defines a global variable Token and a procedure get_next_token(). A call to get_next_token() skips possible layout characters (white space) and stores the next single character as a (class, repr) pair in Token. A global variable is appropriate here, since the corresponding input file is also global. In summary, a stream of tokens can be obtained by calling get_next_token() repeatedly.
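As a usage sketch of our own (not part of the book's demo compiler), a tiny test driver could tokenize standard input by calling get_next_token() in a loop and printing what it finds:

#include <stdio.h>    /* for printf() */
#undef EOF            /* lex.h (Figure 1.11) redefines EOF as 256 */
#include "lex.h"      /* for Token, get_next_token(), DIGIT, EOF */

int main(void) {
    get_next_token();                          /* read the first token */
    while (Token.class != EOF) {
        if (Token.class == DIGIT) {
            printf("DIGIT: %c\n", Token.repr);
        } else {
            printf("token: %c\n", Token.repr); /* (, ), +, or * */
        }
        get_next_token();                      /* advance to the next token */
    }
    return 0;
}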
It is the task of syntax analysis to structure the input into an AST. The grammar in Figure 1.9 is so simple that this can be done by two simple Boolean read routines, Parse_operator() for the non-terminal operator and Parse_expression() for the non-terminal expression. Both routines are shown in Figure 1.13 and the driver of the parser, which contains the initial call to Parse_expression(), is in Figure 1.14.

Each of the routines tries to read the syntactic construct it is named after, using the following strategy. The routine for the non-terminal N tries to read the alternatives of N in order. For each alternative A it tries to read its first member A1. If A1 is found present, the routine assumes that A is the correct alternative and it then requires the presence of the other members of A. This assumption is not always warranted, which is why this parsing method is quite weak. But for the grammar of Figure 1.9 the assumption holds.

If the routine succeeds in reading the syntactic construct in this way, it yields a pointer to the corresponding AST as an output parameter, and returns a 1 for success; the output parameter is implemented as a pointer to the location where the output value must be stored, a usual technique in C.
#include   "lex.h"    /* for self check */

/* PRIVATE */
static int Layout_char(int ch) {
    switch (ch) {
    case ' ': case '\t': case '\n': return 1;
    default:                        return 0;
    }
}

/* PUBLIC */
Token_type Token;

void get_next_token(void) {
    int ch;

    /* get a non-layout character: */
    do {
        ch = getchar();
        if (ch < 0) {
            Token.class = EOF; Token.repr = '#';
            return;
        }
    } while (Layout_char(ch));

    /* classify it: */
    if ('0' <= ch && ch <= '9') {Token.class = DIGIT;}
    else                        {Token.class = ch;}

    Token.repr = ch;
}

Figure 1.12 Lexical analyzer for the demo compiler.
If the routine fails to find the first member of any alternative of N, it does not consume any input, does not set its output parameter, and returns a 0 for failure. And if it gets stuck in the middle it stops with a syntax error message.

The C template used for a rule

P → A1 A2 ... An | B1 B2 ... | ...

is presented in Figure 1.15. More detailed code is required if any of A1, B1, ..., is a terminal symbol; see the examples in Figure 1.13. An error in the input is detected when we require a certain syntactic construct and find it is not there. We then give an error message by calling Error() with an appropriate message; this routine does not return and terminates the program, after displaying the message to the user.

This approach to parsing is called recursive descent parsing, because a set of routines descend recursively to construct the parse tree. It is a rather weak parsing method and makes for inferior error diagnostics, but is, if applicable at all, very simple to implement. Much stronger parsing methods are discussed in Chapter 2, but recursive descent is sufficient for our present needs.
static int Parse_operator(Operator *oper) {
    if (Token.class == '+') {
        *oper = '+'; get_next_token(); return 1;
    }
    if (Token.class == '*') {
        *oper = '*'; get_next_token(); return 1;
    }
    return 0;
}

static int Parse_expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();

    /* try to parse a digit: */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }

    /* try to parse a parenthesized expression: */
    if (Token.class == '(') {
        expr->type = 'P';
        get_next_token();
        if (!Parse_expression(&expr->left)) {
            Error("Missing expression");
        }
        if (!Parse_operator(&expr->oper)) {
            Error("Missing operator");
        }
        if (!Parse_expression(&expr->right)) {
            Error("Missing expression");
        }
        if (Token.class != ')') {
            Error("Missing )");
        }
        get_next_token();
        return 1;
    }

    /* failed on both attempts */
    free_expression(expr);
    return 0;
}

Figure 1.13 Parsing routines for the demo compiler.
#include   "lex.h"
#include   "error.h"     /* for Error() */
#include   "parser.h"    /* for self check */

/* PRIVATE */
static Expression *new_expression(void) {
    return (Expression *)malloc(sizeof(Expression));
}

static void free_expression(Expression *expr) {free((void *)expr);}

static int Parse_operator(Operator *oper_p);
static int Parse_expression(Expression **expr_p);

/* PUBLIC */
int Parse_program(AST_node **icode_p) {
    Expression *expr;

    get_next_token();   /* start the lexical analyzer */
    if (Parse_expression(&expr)) {
        if (Token.class != EOF) {
            Error("Garbage after end of program");
        }
        *icode_p = expr;
        return 1;
    }
    return 0;
}

Figure 1.14 Parser environment for the demo compiler.
The recursive descent parsing presented here is not to be confused with the much stronger predictive recursive descent parsing, which is discussed in Chapter 2. The latter is an implementation of LL(1) parsing, and includes the use of look-ahead sets to base decisions on.

Although in theory we should have different node types for the ASTs of different syntactic constructs, it is more convenient to group them in broad classes and to have one node type for each of these classes. This is one of the differences between the parse tree, which follows the grammar faithfully, and the AST, which serves the convenience of the compiler writer. More in particular, in our example all nodes in an expression are of type Expression, and, since we have only expressions, that is the only possibility for the type of AST_node. To differentiate the nodes of type Expression, each such node contains a type attribute, set with a characteristic value: 'D' for a digit and 'P' for a parenthesized expression. The type attribute tells us how to interpret the other fields in the node. Such interpretation is needed in the code generator and the interpreter. The header file with the definition of the node type Expression is shown in Figure 1.16.
int P(...) {
    /* try to parse the alternative A1 A2 ... An */
    if (A1(...)) {
        if (!A2(...)) Error("Missing A2");
        ...
        if (!An(...)) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative B1 B2 ... */
    if (B1(...)) {
        if (!B2(...)) Error("Missing B2");
        ...
        return 1;
    }
    ...
    /* failed to find any alternative of P */
    return 0;
}

Figure 1.15 A C template for a grammar rule.
typedef int Operator;

typedef struct _expression {
    char type;                         /* 'D' or 'P' */
    int value;                         /* for 'D' */
    struct _expression *left, *right;  /* for 'P' */
    Operator oper;                     /* for 'P' */
} Expression;

typedef Expression AST_node;    /* the top node is an Expression */

extern int Parse_program(AST_node **);

Figure 1.16 Parser header file for the demo compiler.
The syntax analysis module shown in Figure 1.14 defines a single routine Parse_program(), which tries to read the program as an expression by calling Parse_expression() and, if it succeeds, converts the pointer to the expression to a pointer to an AST_node, which it subsequently yields as its output parameter. It also checks that the program text is indeed finished after the expression.

Figure 1.17 shows the AST that results from parsing the expression (2*((3*4)+9)). Depending on the value of the type attribute, a node contains either the attribute value or the three attributes left, oper, and right. In the diagram, the non-applicable attributes have been crossed out in each node.
(AST drawing not reproduced.)
Figure 1.17 An AST for the expression (2*((3*4)+9)).
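Since the original drawing cannot be reproduced here, the following sketch (our rendering) shows the same AST: the root is a 'P' node for the outer multiplication, its left operand is the digit 2, and its right operand is the 'P' node for (3*4)+9.

            '*'
           /   \
          2     '+'
               /   \
             '*'    9
            /   \
           3     4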
As mentioned before, there is no context to handle in our simple language. We could have introduced the need for some context handling in the form of a context check by allowing the logical values t and f as additional operands (for true and false) and defining + as logical or and * as logical and. The context check would then be that the operands must be either both numeric or both logical. Alternatively, we could have collected optimization information, for example by doing all arithmetic that can be done at compile time. Both would have required code that is very similar to that shown in the code generation and interpretation sections below. (Also, the optimization proposed above would have made the code generation and interpretation trivial!)
The code generator receives the AST (actually a pointer to it) and generates code from it for a simple stack machine. This machine has four instructions, which work on integers:

PUSH n    pushes the integer n onto the stack
ADD       replaces the topmost two elements by their sum
MULT      replaces the topmost two elements by their product
PRINT     pops the top element and prints its value
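The stack machine itself is not shown in the book; as an illustration, a minimal C sketch of an executor for these four instructions, reading one instruction per line from standard input, could look as follows (all names are ours, and the sketch omits error and overflow checks).

#include <stdio.h>
#include <string.h>

int main(void) {
    int stack[100];
    int sp = 0;                /* number of elements currently on the stack */
    char line[32];
    int n;

    while (fgets(line, sizeof line, stdin) != NULL) {
        if (sscanf(line, "PUSH %d", &n) == 1) {
            stack[sp++] = n;                                 /* PUSH n */
        } else if (strncmp(line, "ADD", 3) == 0) {
            sp--; stack[sp-1] = stack[sp-1] + stack[sp];     /* ADD */
        } else if (strncmp(line, "MULT", 4) == 0) {
            sp--; stack[sp-1] = stack[sp-1] * stack[sp];     /* MULT */
        } else if (strncmp(line, "PRINT", 5) == 0) {
            printf("%d\n", stack[--sp]);                     /* PRINT */
        }
    }
    return 0;
}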
The module, which is shown in Figure 1.18, defines one routine Process() with one parameter, a pointer to the AST. Its purpose is to emit - to add to the object file - code with the same semantics as the AST. It first generates code for the expression by calling Code_gen_expression() and then emits a PRINT instruction. When run, the code for the expression will leave its value on the top of the stack where PRINT will find it; at the end of the program run the stack will again be empty (provided the machine started with an empty stack).

#include   "parser.h"     /* for types AST_node and Expression */
#include   "backend.h"    /* for self check */

/* PRIVATE */
static void Code_gen_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        printf("PUSH %d\n", expr->value);
        break;
    case 'P':
        Code_gen_expression(expr->left);
        Code_gen_expression(expr->right);
        switch (expr->oper) {
        case '+': printf("ADD\n");  break;
        case '*': printf("MULT\n"); break;
        }
        break;
    }
}

/* PUBLIC */
void Process(AST_node *icode) {
    Code_gen_expression(icode); printf("PRINT\n");
}

Figure 1.18 Code generation back-end for the demo compiler.
The routine Code_gen_expression() checks the type attribute of its parameter to see if it is a digit node or a parenthesized expression node. In both cases it has to generate code to put the eventual value on the top of the stack. If the input node is a digit node, the routine obtains the value directly from the node and generates code to push it onto the stack: it emits a PUSH instruction. Otherwise the input node is a parenthesized expression node; the routine first has to generate code for the left and right operands recursively, and then emit an ADD or MULT instruction. When run with the expression (2*((3*4)+9)) as input, the compiler that results from combining the above modules produces the following code:
PUSH 2
PUSH 3
PUSH 4
MULT
PUSH 9
ADD
MULT
PRINT
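Executing this code on the stack machine proceeds as follows; the stack contents after each instruction are shown on the right, with the top of the stack last:

PUSH 2     stack: 2
PUSH 3     stack: 2 3
PUSH 4     stack: 2 3 4
MULT       stack: 2 12
PUSH 9     stack: 2 12 9
ADD        stack: 2 21
MULT       stack: 42
PRINT      prints 42; the stack is empty again

So the program prints 42, the value of (2*((3*4)+9)).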
The interpreter (see Figure 1.19) is very similar to the code generator. Both perform a depth-first scan of the AST, but where the code generator emits code to have the actions performed by a machine at a later time, the interpreter performs the actions right away. The extra set of braces ({ ... }) after case 'P': is needed because we need two local variables and the C language does not allow declarations in the case parts of a switch statement.

#include "parser.h"    /* for types AST_node and Expression */
#include "backend.h"   /* for self check */

/* PRIVATE */
static int Interpret_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        return expr->value;
        break;
    case 'P': {
        int e_left = Interpret_expression(expr->left);
        int e_right = Interpret_expression(expr->right);
        switch (expr->oper) {
        case '+': return e_left + e_right;
        case '*': return e_left * e_right;
        }
    }
        break;
    }
}

/* PUBLIC */
void Process(AST_node *icode) {
    printf("%d\n", Interpret_expression(icode));
}

Figure 1.19 Interpreter back-end for the demo compiler.
Note that the code generator code (Figure 1.18) and the interpreter code (Figure 1.19) share the same definition file (called a 'header file' in C), backend.h, shown in Figure 1.20. This is possible because they both implement the same interface: a single routine Process(AST_node *). In Chapter 4 we will see an example of a different type of interpreter (Section 4.1.2) and two other code generators (Sections 4.2.3 and 4.2.3.2), all with the same interface. Another module that implements the back-end interface could be a module that displays the AST graphically. Each of these can be combined with the lexical and syntax modules, to produce a program processor.
extern void Process(AST_node *);

Figure 1.20 Common back-end header for code generator and interpreter.
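As an illustration of such an alternative back-end, here is a hedged sketch of a module that prints the AST in prefix form through the same Process() interface. It is not part of the book's demo compiler, and a genuinely graphical display module would of course need more machinery; the sketch only shows that any module satisfying the interface of Figure 1.20 can be plugged in.

#include <stdio.h>
#include "parser.h"    /* for types AST_node and Expression */
#include "backend.h"   /* for self check */

/* PRIVATE */
static void Print_expression(Expression *expr) {
    switch (expr->type) {
    case 'D':
        printf("%d", expr->value);             /* a digit prints its value */
        break;
    case 'P':
        printf("(%c ", expr->oper);            /* operator first: prefix form */
        Print_expression(expr->left);
        printf(" ");
        Print_expression(expr->right);
        printf(")");
        break;
    }
}

/* PUBLIC */
void Process(AST_node *icode) {
    Print_expression(icode);
    printf("\n");
}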
In the demo compiler the front-end consisted of three modules and the back-end could stay as a single module; a more realistic compiler, whose structure is shown in Figure 1.21, contains considerably more. It will also be supported by a module for symbol table handling and one for error reporting; these modules will be called upon by almost all other modules.

A short description of each of the modules follows, together with an indication of where the material is discussed in detail.

The program text input module finds the program text file, reads it efficiently, and turns it into a stream of characters, allowing for different kinds of newlines, escape codes, and include files, when these are to be included. This function may require cooperation with the operating system on the one hand and with the lexical analyzer on the other.

The lexical analysis module isolates tokens in the input stream and determines their class and representation. It can be written by hand or generated from a description of the tokens. Optionally, it may do some limited interpretation on some of the tokens, for example to see if an identifier is perhaps actually a macro or a keyword (reserved word).

The syntax analysis module converts the stream of tokens into an abstract syntax tree (AST). Some syntax analyzers consist of two modules. The first one reads the token stream and calls a function from the second module for each syntax construct it recognizes; the functions in the second module then construct the nodes of the AST and link them. This has the advantage that one can replace the AST generation module to obtain a different AST from the same syntax analyzer, or, alternatively, one can replace the syntax analyzer and obtain the same type of AST from a (slightly) different language.
Figure 1.21 Structure of a compiler: a front-end (lexical analysis, syntax analysis, context handling, intermediate code generation) and a back-end (IC optimization, code generation, target code optimization, machine code generation, executable code generation), linked by streams of tokens, ASTs, intermediate code (IC), symbolic instructions, and bit patterns, running from the program text file to the executable code file.
The above modules are the subject of Chapter 2.

The context handling module collects context information from various places in the program and annotates nodes with the results. Examples are: relating type information from declarations to expressions; connecting goto statements to their labels, in imperative languages; deciding which routine calls are local and which are remote, in distributed languages. These annotations are then used for performing context checks or are passed on to subsequent modules, for example to aid in code generation. This module is discussed in Chapter 3.

The intermediate code generation module translates language-specific constructs into more general constructs; these general constructs then constitute the intermediate code, sometimes abbreviated IC. Deciding what is a language-specific and what is a more general construct is up to the compiler designer, but usually the choice is not very difficult. One criterion for the level of the intermediate code is that it should be reasonably straightforward to generate machine code from it for various machines, as suggested by
Figure 1.4. Usually the intermediate code consists almost exclusively of expressions and flow-of-control instructions.

Examples of the translations done by the intermediate code generation module are: replacing a while statement by tests, labels, and jumps in imperative languages; inserting code for determining which method to call for an object in languages with dynamic binding; replacing a Prolog rule by a routine that does the appropriate backtracking search. In each of these cases an alternative translation would be a call to a routine in the run-time system, with the appropriate parameters: the Prolog rule could stay in symbolic form and be interpreted by a run-time routine, a run-time routine could dynamically find the method to be called, and even the while statement could be performed by a run-time routine if the test and the body were converted to anonymous subroutines. The intermediate code generation module is the place where the division of labor between in-line code and the run-time system is decided. This module is treated in Chapters 6 through 9, for the imperative, object-oriented, functional, logic, and parallel and distributed programming paradigms.

The intermediate code optimization module performs preprocessing on the intermediate code, with the intention of improving the effectiveness of the code generation module. An example of straightforward preprocessing is constant folding, in which all operations in expressions with known simple operands are performed. A more sophisticated example is in-lining, in which carefully chosen calls to some routines are replaced by the bodies of those routines, while at the same time substituting the parameters.

The code generation module rewrites the AST into a linear list of target machine instructions, in more or less symbolic form. To this end, it selects instructions for segments of the AST, allocates registers to hold data, and arranges the instructions in the proper order.

The target code optimization module considers the list of symbolic machine instructions and tries to optimize it by replacing sequences of machine instructions by faster or shorter sequences. It uses target-machine-specific properties.

The precise boundaries between intermediate code optimization, code generation, and target code optimization are floating: if the code generation is particularly good, little target code optimization may be needed or even possible, and constant folding can be done during code generation or even on the target code. Still, some optimizations fit better in one module than in another, and it is useful to distinguish the above three levels.

The machine code generation module converts the symbolic machine instructions into the corresponding bit patterns. It determines machine addresses of program code and data and produces tables of constants and relocation tables.

The executable code generation module combines the encoded machine instructions, the constant tables, the relocation tables, and the headers, trailers, and other material required by the operating system into an executable code file.

The back-end modules are discussed in Chapter 4.
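As an illustration of the first of these translations, a source fragment such as

    while (i < n) { sum = sum + i; i = i + 1; }

might be turned into intermediate code along the following lines; the notation used here (a temporary t1, labels, and IF_FALSE/GOTO jumps) is an assumption made for the example and is not the book's actual intermediate code:

    L_test:
        t1 := i < n
        IF_FALSE t1 GOTO L_end
        sum := sum + i
        i := i + 1
        GOTO L_test
    L_end:

Which of these operations end up as in-line instructions and which become calls into the run-time system is exactly the division of labor discussed above.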
There is one important component of a compiler that is traditionally left outside compiler structure pictures: the run-time system of the compiled programs. Some of the actions required by a running program will be of a general, language-dependent and/or machine-dependent housekeeping nature; examples are code for allocating arrays, manipulating stack frames, and finding the proper method during method invocation in an object-oriented language. Although it is quite possible to generate code fragments for these actions wherever they are needed, these fragments are usually very repetitive and it is often more convenient to compile them once and store the result in library modules. These library modules together form the run-time system. Some imperative languages need only a minimal run-time system; others, notably the logic and distributed languages, require run-time systems of considerable size, containing code for parameter unification, remote procedure call, task scheduling, etc. The parts of the run-time system needed by a specific program can be linked in by the linker when the complete object program is constructed, or even be linked in dynamically when the compiled program is called; object programs and linkers are explained in Section 4.3. If the back-end is an interpreter, the run-time system must be incorporated in it.

As an aside it should be pointed out that run-time systems are not only traditionally left out of compiler overview pictures like those in Figure 1.8 and Figure 1.21, they are also sometimes overlooked or underestimated in compiler construction planning. Given the fact that they may contain such beauties as printf(), malloc(), and concurrent task management, overlooking them is definitely inadvisable.
It is by no means always necessary to implement all modules of the back-end:

- Writing the modules for generating machine code and executable code can be avoided by using the local assembler, which is almost always available.
- Writing the entire back-end can often be avoided by generating C code from the intermediate code. This exploits the fact that good C compilers are available on virtually any platform, which is why C is sometimes called, half jokingly, 'The Machine-Independent Assembler'. This is the usual approach taken by compilers for the more advanced paradigms, but it is certainly also recommendable for first implementations of compilers for new imperative and object-oriented languages.

The object code produced by the above 'short-cuts' is often of good to excellent quality, but the increased compilation time may be a disadvantage. Most C compilers are quite substantial programs and calling them may well cost noticeable time; their availability may, however, make them worth it.
Compilers can differ considerably with regard to their architecture; unfortunately, terminology to describe the different types is lacking or confusing. Two architectural questions dominate the scene. One is concerned with the granularity of the data that is passed between the compiler modules: is it bits and pieces or is it the entire program? In other words, how wide is the compiler? The second concerns the flow of control between the compiler modules: which of the modules is the boss?
A compiler consists of a series of modules that transform, refine, and pass on information in a fixed pattern.
Information passes mainly from the front to the end, from module Mn to module Mn+1. Each such consecutive pair of modules defines an interface, and although in the end all information has to pass through all these interfaces, the size of the chunks of information makes a considerable difference to the structure of the compiler. The two natural choices for the size of the chunks of information are the smallest unit that is meaningful between the two modules, and the entire program. This leads to two types of compilers, neither of which seems to have an established name; we will call them 'narrow' and 'broad' compilers, respectively.

A narrow compiler reads a small part of the program, typically a few tokens, processes the information obtained, produces a few bytes of object code if appropriate, discards most of the information about these tokens, and repeats this process until the end of the program text is reached.

A broad compiler reads the entire program and applies a series of transformations to it (lexical, syntactic, contextual, optimizing, code generating, etc.), which eventually result in the desired object code. This object code is then generally written to a file.

It will be clear that a broad compiler needs an amount of memory that is proportional to the size of the source program, which is the reason why this type has always been rather rare. Until the 1980s, a broad compiler was unthinkable, even in academia. A narrow compiler needs much less memory; its memory requirements are still linear in the length of the source program, but the proportionality constant is much lower since it gathers permanent information (for example about global variables) at a much slower rate.

From a theoretical, educational, and design point of view, broad compilers are preferable, since they represent a simpler model, more in line with the functional programming paradigm. A broad compiler consists of a series of function calls (Figure 1.22) whereas a narrow compiler consists of a typically imperative loop (Figure 1.23). In practice, 'real' compilers are often implemented as narrow compilers. Still, a narrow compiler may have a broad component: it is quite natural for a C compiler to read each routine in its entirety, process it, and then discard all but the information needed for compiling the rest of the program.
In the future we expect to see more broad compilers and fewer narrow ones. Most compilers for the new programming paradigms are already broad, since they often started out as interpreters. Since scarcity of memory will be less of a problem in the future, more and more imperative compilers will be broad. On the other hand, almost all compiler construction tools have been developed for the narrow model and thus favor it. Also, the narrow model is probably better for the task of writing a simple compiler by hand, since it requires much less dynamic memory allocation.

Since the 'field of vision' of a narrow compiler is, well, narrow, it is possible that it cannot manage all its transformations on the fly. Such compilers then write a partially transformed version of the program to disk and, often using a different program, continue with one or more further passes; occasionally even more passes are used. Not surprisingly, such a compiler is called a 2-pass (or N-pass) compiler, or a 2-scan (N-scan) compiler. If a distinction between the terms is made, '2-scan' often indicates that the second pass actually reads the original program text again, the difference being that it is now armed with
SET Object code TO
    Assembly (Code generation (Context check (Parse (Tokenize (Source code)))));

Figure 1.22 Flow-of-control structure of a broad compiler.
WHILE NOT Finished:
    Read some data D from the source code;
    Process D and produce the corresponding object code, if any;

Figure 1.23 Flow-of-control structure of a narrow compiler.
information extracted during the first scan.

The major transformations performed by a compiler, as shown in Figure 1.21, are called phases, giving rise to the term N-phase compiler, which is of course not the same as an N-pass compiler. Since on a very small machine each phase could very well correspond to one pass, these notions are sometimes confused. With larger machines, better syntax analysis techniques, and simpler programming language grammars, N-pass compilers with N>1 are going out of fashion. It turns out that not only compilers but also people like to read their programs in one scan. This observation has led to syntactically stronger programming languages, which are correspondingly easier to process.

Many algorithms in a compiler use only local information; for these it makes little difference whether the compiler is broad or narrow. Where it does make a difference, we will show the broad method first and then explain the narrow method as an optimization, if appropriate.
In a broad compiler, control is not a problem: the modules run in sequence and each module has full control when it runs, both over the processor and over the data. A simple driver can activate the modules in the right order, as already shown in Figure 1.22. In a narrow compiler, things are more complicated. While pieces of data are moving forward from module to module, control has to shuttle forward and backward, to activate the proper module at the proper time. We will now examine the flow of control in narrow compilers in more detail.

The modules in a compiler are essentially 'filters', reading chunks of information, processing them, and writing the results. Such filters are most easily programmed as loops which execute function calls to obtain chunks of information from the previous module and routine calls to write chunks of information to the next module. An example of a filter as a main loop is shown in Figure 1.24.
WHILE Obtained input character Ch from previous module:
    IF Ch = 'a':
        // See if there is another 'a':
        IF Obtained input character Ch1 from previous module:
            IF Ch1 = 'a':
                // We have 'aa':
                Output character 'b' to next module;
            ELSE Ch1 /= 'a':
                Output character 'a' to next module;
                Output character Ch1 to next module;
        ELSE Ch1 not obtained:
            Output character 'a' to next module;
            EXIT WHILE;
    ELSE Ch /= 'a':
        Output character Ch to next module;

Figure 1.24 The filter aa → b as a main loop.
It describes a simple filter which copies input characters to the output while replacing the sequence aa by b; the filter is representative of, but of course much simpler than, the kind of transformations performed by an actual compiler module. The reader may nevertheless be surprised at the complexity of the code, which is due to the requirements of proper termination of the previous, the present, and the next module. The need for proper handling of end of input is, however, very much a fact of life in compiler construction, and we cannot afford to sweep its complexities under the rug.

The filter obtains its input characters by calling upon its predecessor in the module sequence; such a call may succeed and yield a character, or it may fail. The transformed characters are passed on to the next module. Except for routine calls to the previous and the next module, control remains inside the while loop all the time, and no global variables are needed.
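For comparison, a rough C rendering of the filter of Figure 1.24 might look as follows; the routines Get_character() and Put_character(), which stand for the calls to the previous and the next module, are assumptions made for the sketch (here they simply use standard input and output).

#include <stdio.h>

/* Hypothetical interfaces to the previous and the next module. */
static int Get_character(int *ch) {
    int c = getchar();
    if (c == EOF) return 0;
    *ch = c;
    return 1;
}
static void Put_character(int ch) { putchar(ch); }

/* The filter aa -> b as a main loop, after Figure 1.24. */
void Filter_aa_to_b(void) {
    int ch, ch1;
    while (Get_character(&ch)) {
        if (ch == 'a') {
            if (Get_character(&ch1)) {
                if (ch1 == 'a') {
                    Put_character('b');        /* we have 'aa' */
                } else {
                    Put_character('a');
                    Put_character(ch1);
                }
            } else {
                Put_character('a');            /* a final 'a' at end of input */
                break;
            }
        } else {
            Put_character(ch);
        }
    }
}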
Although main loops are efficient, easy to program, and easy to understand, they have one serious flaw which prevents them from being used as the universal programming model for compiler modules: a main loop does not interface well with another main loop in traditional programming languages. Linking the output of one main loop to the input of another involves a transfer of control that leaves both the environment of the callee and that of the caller intact, regardless of whether the consumer calls the producer to obtain new information or the producer calls the consumer to pass on processed information. The traditional function call creates a new environment for the callee and the traditional function return destroys the environment of the callee. So they cannot serve to link two loops. A transfer of control that does possess the desired properties is the coroutine call, which involves having separate stacks for the two loops to preserve both environments. The coroutine linkage also takes care of the end-of-input handling: an attempt to obtain information from a module whose loop has terminated fails. A well-known implementation of the coroutine linkage is the UNIX pipe, in which the two separate stacks reside in different processes and therefore in different address spaces. Implementation of coroutines in imperative languages is discussed in Section 6.3.7.

Although coroutine linkage was proposed by Conway (1963) early in the history of compiler construction, no programming language except perhaps Icon (Griswold and Griswold, 1983) has ever featured a usable implementation of it. In the absence of coroutines we have to choose one of our modules as the main loop in a narrow compiler and implement the other loops through trickery. That this implies major surgery to these loops is shown by Figure 1.25, which shows our filter as a loop-less module preceding the main loop, and Figure Answers.1, which shows it as a loop-less module following the main loop. We see that global variables are needed to record information that must remain available between two successive calls of the function. The variable Input exhausted records whether the previous call of the function returned from the position before the EXIT WHILE in Figure 1.24, and the variable There is a stored character records whether it returned from before outputting Ch1. Some additional code is required for proper end-of-input handling. Note that the code is 29 lines long as opposed to 15 for the main loop. Similar considerations apply to the post-main variant, which is given as an exercise. An additional complication is that proper end-of-input handling requires that the filter be flushed by the using module when it has supplied its final chunk of information. See Exercise 1.11.

Looking at Figures 1.25 and Answers.1 in the answers to the exercises, we see that the complication comes from having to save program state that resides on the stack. It is therefore convenient to choose for the main loop the module that has the most state on the stack. That module will almost always be the parser; the code generator may gather more state, but it is usually stored in a global data structure rather than on the stack. This explains why we almost universally find the parser as the main module in a narrow compiler: in very simple-minded wording, the parser pulls the program text in through the lexical analyzer, and pushes the code out through the code generator.
The foremost property of a good compiler is of course that it generates correct code. A compiler that occasionally generates incorrect code is useless; a compiler that generates incorrect code once a year may seem useful but is dangerous. It is also important that a compiler conform completely to the language specification.
SET the flag Input exhausted TO False;
SET the flag There is a stored character TO False;
SET Stored character TO Undefined;    // can never be an 'a'

FUNCTION Filtered character RETURNING a Boolean, a character:
    IF Input exhausted:
        RETURN False, No character;
    ELSE IF There is a stored character:
        // It cannot be an 'a':
        SET There is a stored character TO False;
        RETURN True, Stored character;
    ELSE Input not exhausted AND There is no stored character:
        IF Obtained input character Ch from previous module:
            IF Ch = 'a':
                // See if there is another 'a':
                IF Obtained input character Ch1 from previous module:
                    IF Ch1 = 'a':
                        // We have 'aa':
                        RETURN True, 'b';
                    ELSE Ch1 /= 'a':
                        SET Stored character TO Ch1;
                        SET There is a stored character TO True;
                        RETURN True, 'a';
                ELSE Ch1 not obtained:
                    SET Input exhausted TO True;
                    RETURN True, 'a';
            ELSE Ch /= 'a':
                RETURN True, Ch;
        ELSE Ch not obtained:
            SET Input exhausted TO True;
            RETURN False, No character;

Figure 1.25 The filter aa → b as a pre-main module.
It may be tempting to implement a subset of the language, a superset, or even what is sometimes optimistically called an 'extended subset', and users may even be grateful, but those users will soon find that programs developed with such a compiler are much less portable than those written using a fully conforming compiler. (For more about the notion of an 'extended subset', see Exercise 1.13.)

Another property of a good compiler, one that is often overlooked, is that it should be able to handle programs of essentially arbitrary size, as far as available memory permits. It may seem very reasonable to say that no sane programmer uses more than 32 parameters in a routine or more than 128 declarations in a block and that one may therefore allocate a fixed amount of space for each in the compiler. One should, however, keep in mind that programmers are not the only ones who write programs. More and more software is generated by other programs, and such generated software may easily contain more than 128 declarations in one block, although more than 32 parameters to a routine seems excessive, even
for a generated program; famous last words... Especially any assumptions about limits on the number of cases in a case/switch statement are unwarranted: very large case statements are often used in the implementation of automatically generated parsers and code generators. Section 5.1.3 shows how the flexible memory allocation needed for handling programs of essentially arbitrary size can be achieved at an almost negligible increase in cost.

Compilation speed is an issue but not a major one: almost all programs can be expected to compile in under a second on modern machines. Larger programming projects are usually organized in many relatively small subprograms, modules, library routines, etc., which can be compiled separately; recompilation after a program modification is usually restricted to the modified compilation units only. Also, compiler writers have traditionally been careful to keep their compilers 'linear in the input', which means that the compilation time is a linear function of the length of the input file. This is even more important when generated programs are being compiled, since these can be of considerable length.

There are several possible sources of non-linearity in compilers. First, the linear-time parsing techniques are rather inconvenient, and the worry-free parsing techniques can be cubic in the size of the input in the worst case. Second, many code optimizations are potentially exponential in the size of the input, since often the best code can only be found by considering all possible combinations of machine instructions. Third, naive memory management can result in quadratic time consumption. Fortunately, good linear-time solutions or heuristics are available for all these problems.

Compiler size is hardly an issue anymore, with most computers having many megabytes of primary memory nowadays.

The user-friendliness of a compiler shows mainly in the quality of its error reporting. At the least, the user should be presented with a clear error message which includes the perceived cause of the error, the name of the input file, and the position in it. Giving a good error cause description is often hard or impossible, due to the limited insight compilers have into incorrect programs. Pinpointing the error is aided by recording the file name and line number with every token and every node in the AST. More fancy reporting mechanisms, including showing parts of the syntax tree, may not have the beneficial effect the compiler writer may expect from them, but it may be useful to provide some visual display mechanism, for example opening a text editor at the point of the error.

The importance of the speed and the size of the generated code depends totally on the purpose of the compiler. Normally one can expect that the user is more interested in high speed than in small size (code for embedded applications, such as washing machines, portable telephones, etc., is an exception). Moderately advanced optimization techniques will perhaps provide a factor of three improvement over very naive code generation; implementing such optimizations may take about the same amount of time as the entire compiler project. Gaining another factor of two or even three over this may be possible through extensive and aggressive optimization; one can expect to spend many times the original effort on an optimization phase of this nature.
A program is considered portable if it takes a limited and reasonable effort to make it run on different machine types. What is 'limited and reasonable' is, of course, a matter of opinion, but today one often thinks in terms of editing a 'makefile' to reflect the local situation.

With compilers, machine dependence lies not only in the compiler itself but also, perhaps even mainly, in the code it generates. It is therefore useful to distinguish two forms of portability: the ease with which the compiler itself can be made to run on another machine, and the ease with which it can be made to generate code for another machine. The first is called the portability of the compiler, the second its retargetability. If the compiler is written in a reasonably high-level language, good portability comes fairly naturally; retargetability requires replacing or adapting the back-end. In this context it is important to note that creating a new back-end need not mean writing one from scratch: much of a back-end is machine-dependent, but much of it is not, and parts can be reused from other back-ends and other parts can be generated from formal machine descriptions. This approach can reduce the work needed to create a new back-end to a reasonable effort. Given the proper tools, it may take between one and four programmer-months to create a new back-end; the machine descriptions involved range in size from a few hundred lines upwards.
Optimizations are attractive: much research in compiler construction is concerned with them, and compiler writers regularly see all kinds of opportunities for them. It should, however, be kept in mind that implementing them is not the first priority in compiler construction: unlike correctness, optimization is not essential for a compiler to be useful. As in all programming, it is easier to make a correct program fast than a fast program correct; likewise, it is easier to make correct generated object code fast than to make fast generated object code correct.

There is another reason besides correctness why we often show the basic, unoptimized algorithm in this book: some traditional algorithms are really optimized versions of more basic algorithms. Sometimes the basic algorithm is easier to understand than the optimized version, and in any case the basic version allows more freedom of design than the optimized version.

A good example is the stack in implementations of imperative languages. At any moment the stack holds the pertinent data (administration and local variables) for each active routine, a routine that has been called and has not yet terminated. Such a block of data is called the 'activation record' of this activation of the routine. In this view, activation records are found only on the stack, and each represents a running routine; we consider the stack the primary item, of which the activation records are just parts. It is, however, more instructive to consider the activation record as the primary item: it arises naturally when a routine is called ('activated') since it is obvious that its pertinent data has to be stored somewhere. Its allocation on a stack is just an optimization that happens to be possible in many, but not all, imperative and object-oriented languages. From this point of view it is easier to understand the implementation of those languages for which stack allocation is not a good optimization: imperative languages with coroutines or Ada-like tasks, object-oriented languages with active Smalltalk-like objects, functional languages, Icon, etc.

Probably the best attitude towards optimization is to first understand and implement the basic structure and algorithm, then see which optimizations the actual situation allows, and only implement them if they are considered worthwhile (considering their cost). In situations in which the need for optimization is obvious from the start, as for example in code generators, the basic structure would include a framework for these optimizations. This framework can then be filled in as the project progresses.

This concludes our introductory part on actually constructing a compiler. In the remainder of this chapter we consider three further issues: the history of compiler construction, formal grammars, and closure algorithms.
Three periods can be distinguished in the history of compiler construction: 1945-1960, 1960-1975, and 1975-present. Of course, the years are approximate.
During this period languages developed relatively slowly and machines were idiosyncratic. The primary problem was how to generate code for a given machine. The problem was exacerbated by the fact that assembly programming was held in high esteem, and high(er)-level languages and compilers were looked at with a mixture of suspicion and awe: using a compiler was often called 'automatic programming'. Proponents of high-level languages feared, not without reason, that the idea of high-level programming would never catch on if compilers produced code that was less efficient than what assembly programmers produced by hand. The first FORTRAN compiler (Sheridan, 1959) optimized heavily and was far ahead of its time in that respect.
The 1960s and 1970s saw a proliferation of new programming languages, and language designers began to believe that having a compiler for a new language quickly was more important than having one that generated very efficient code. This shifted the emphasis in compiler construction from back-ends to front-ends. At the same time, studies in formal languages revealed a number of powerful techniques that could be applied profitably in front-end construction, notably in parser generation.
From 1975 to the present, both the number of new languages proposed and the number of different machine types in regular use decreased, which reduced the need for quick-and-simple or quick-and-dirty compilers for new languages and/or machines. The greatest turmoil in language and machine design being over, people began to demand professional compilers that were reliable, efficient, both in use and in generated code, and preferably with pleasant user interfaces. This called for more attention to the quality of the generated code, which was easier now, since with the slower change in machines the expected lifetime of a code generator increased.

Also, at the same time new paradigms in programming were developed, with functional, logic, and distributed programming as the most prominent examples. Almost invariably, the run-time requirements of the corresponding languages far exceeded those of the imperative languages: automatic data allocation and deallocation, list comprehensions, unification, remote procedure call, and many others, are features which require much run-time effort that corresponds to hardly any code in the program text. More and more, the emphasis shifts from 'how to compile' to 'what to compile to'.
Grammars, or more precisely context-free grammars, are the essential formalism for describing the structure of programs in a programming language. In principle, the grammar of a language describes the syntactic structure only, but since the semantics of a language is defined in terms of the syntax, the grammar is also instrumental in the definition of the semantics. There are other grammar types besides context-free grammars, but we will be mainly concerned with context-free grammars. We will also meet regular grammars, which more often go by the name of 'regular expressions' and which result from a severe restriction on the context-free grammars; and attribute grammars, which are context-free grammars extended with parameters and code. Other types of grammars play no more than a marginal role in compiler construction. The term 'context-free' is often abbreviated to CF. We will give here a brief summary of the features of CF grammars.

A grammar is a recipe for constructing elements of a set of strings of symbols. When applied to programming languages, the symbols are the tokens in the language, the strings of symbols are program texts, and the set of strings of symbols is the programming language. The string

BEGIN print ( "Hi!" ) END
consists of 6 symbols (tokens) and could be an element of the set of strings of symbols generated by a programming language grammar, or in more normal words, be a program in some programming language. This cut-and-dried view of a programming language would be useless but for the fact that the strings are constructed in a structured fashion; and to this structure semantics can be attached.
A production rule defines a named syntactic construct in terms of other syntactic constructs and terminal symbols. An example of a production rule is

expression → '(' expression operator expression ')'
The right-hand side of a production rule can contain two kinds of symbols, terminal symbols and non-terminal symbols. As the word says, a terminal symbol (or terminal for short) is an end point of the production process and can be part of the strings produced by the grammar. A non-terminal symbol must occur as the left-hand side (the name) of one or more production rules and cannot itself be part of the strings produced by the grammar. Terminals are also called tokens, especially when they form an input to be analyzed. Non-terminals and terminals together are the grammar symbols; the grammar symbols in the right-hand side of a rule are collectively called its members, and when they occur as nodes in a syntax tree they are more often called its 'children'.

In discussing grammars, it is customary to use some conventions that allow the class of a symbol to be deduced from its typographical form. Non-terminals are denoted by capital letters, mostly A, B, C, and N. Terminals are denoted by lower-case letters near the end of the alphabet, mostly x, y, and z. Sequences of grammar symbols are denoted by Greek letters near the beginning of the alphabet, mostly α (alpha), β (beta), and γ (gamma). Lower-case letters near the beginning of the alphabet (a, b, c, etc.) stand for themselves, as terminals. The empty sequence is denoted by ε (epsilon).
The central data structure in the production process is the sentential form. It can be described as a string of grammar symbols, and can then be thought of as a partially produced program text. For our purposes, however, we want to represent the syntactic structure of the program too. The syntactic structure can be added to the flat interpretation of a sentential form as a tree positioned above the sentential form, so that the leaves of the tree are the grammar symbols. This combination is also called a production tree.

A string of terminals can be produced from a grammar by applying production steps to a sentential form, as follows. The sentential form is initialized to a copy of the start symbol. Each production step finds a non-terminal N in the leaves of the sentential form, finds a production rule N→α with N as its left-hand side, and replaces the N in the sentential form with a tree having N as the root and the right-hand side of the production rule, α, as the leaf or leaves. When no more non-terminals can be found in the leaves of
the sentential form, the production process is finished, and the leaves form a string of terminals in accordance with the grammar.

Using the conventions described above, we can write that a production step replaces the sentential form βNγ by βαγ. The process leading from the start symbol to a string of terminals is called a derivation of that string. Suppose our numbered production rules are

[1] expression → '(' expression operator expression ')'
[2] expression → '1'
[3] operator → '+'
[4] operator → '*'

in which the terminal symbols are surrounded by apostrophes to distinguish them from the identifiers that denote non-terminals, and suppose the start symbol is expression. The sequence of sentential forms shown in Figure 1.26 is then a leftmost derivation of the string (1*(1+1)): at each step it is the leftmost non-terminal in the sentential form that is rewritten. The notation R@P in Figure 1.26 indicates that grammar rule R is used to rewrite the non-terminal at position P. The resulting parse tree (in which the derivation order is no longer visible) is shown in Figure 1.27.
expression
'(' expression operator expression ')'                      1@1
'(' '1' operator expression ')'                             2@2
'(' '1' '*' expression ')'                                  4@3
'(' '1' '*' '(' expression operator expression ')' ')'      1@4
'(' '1' '*' '(' '1' operator expression ')' ')'             2@5
'(' '1' '*' '(' '1' '+' expression ')' ')'                  3@6
'(' '1' '*' '(' '1' '+' '1' ')' ')'                         2@7

Figure 1.26 Leftmost derivation of the string (1*(1+1)).
We see that recursion, the ability of a production rule to refer, directly or indirectly, to itself, is essential to the production process; without recursion, a grammar would produce only a finite set of strings.

The production process is kind enough to produce the program text together with its production tree, but then the program text is committed to a linear medium (a computer file) and the production tree gets stripped off in the process. Since we need the structure to find out the semantics of the program, we use a syntax analyzer, or parser, to retrieve it. The systematic construction of parsers is treated in Chapter 2.
Figure 1.27 Parse tree of the derivation in Figure 1.26.
The single grammar rule format

non-terminal → zero or more grammar symbols

used above is sufficient in principle to specify any grammar, but in practice a richer notation is used. For one thing, it is usual to combine all rules with the same left-hand side into one rule: for example, the rules

N → α        N → β        N → γ

are combined into one rule

N → α | β | γ

in which the original right-hand sides are separated by vertical bars. In this form α, β, and γ are called the alternatives of N, and the combined rule may be considered an abbreviation of the original rules. The format described so far is known as BNF notation, for Backus-Naur Form. It is very suitable for expressing nesting and recursion, but less convenient for expressing repetition and optionality, although it can of course express repetition through recursion. To remedy this, three additional notations are introduced, each in the form of a postfix operator:

- R+ indicates the occurrence of one or more Rs, to express repetition;
- R? indicates the occurrence of zero or one R, to express optionality; and
- R* indicates the occurrence of zero or more Rs, to express optional repetition.

Parentheses may be needed if these postfix operators are to operate on more than one grammar symbol. The grammar notation that allows the above forms is called EBNF, for Extended BNF. An example is the grammar rule

parameter_list → ('IN' | 'OUT')? identifier (',' identifier)*

which produces program fragments like

a, b
IN year, month, day
OUT left, right
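To connect this notation back to the recursive descent template of Figure 1.15: an EBNF option maps naturally onto an if statement and a repetition onto a while loop in a hand-written parser. The sketch below is illustrative only; the token interface used (Token_class, get_next_token(), Error(), and the token class names) is an assumption, not the demo compiler's.

/* Hedged sketch: recursive descent code for the EBNF rule
       parameter_list -> ('IN' | 'OUT')? identifier (',' identifier)*
   The token interface below is assumed for the example. */
enum token_class { IDENTIFIER, IN_SYMBOL, OUT_SYMBOL, COMMA, END_OF_INPUT };
extern enum token_class Token_class;      /* class of the current token */
extern void get_next_token(void);
extern void Error(const char *message);

int Parameter_list(void) {
    if (Token_class == IN_SYMBOL || Token_class == OUT_SYMBOL) {
        get_next_token();                 /* the optional part (...)? */
    }
    if (Token_class != IDENTIFIER) return 0;
    get_next_token();                     /* the first, required identifier */
    while (Token_class == COMMA) {        /* the repetition (',' identifier)* */
        get_next_token();
        if (Token_class != IDENTIFIER) { Error("Missing identifier"); return 0; }
        get_next_token();
    }
    return 1;
}

The ? becomes an if, the * becomes a while, and the | inside the parentheses becomes the two-way test on the token class.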
There are a number of properties of grammars and their components that are used in discussing grammars. A non-terminal N is left-recursive if, starting with a sentential form N, we can produce another sentential form starting with N. An example of direct left-recursion is

expression → expression '+' factor | factor
but we will meet other forms of left-recursion in Section 2.2.4.3. A grammar that contains one or more left-recursive rules is itself called left-recursive. Right-recursion also exists, but is less important.

A non-terminal N is nullable if, starting with a sentential form N, we can produce an empty sentential form.

A non-terminal N is useless if it can never produce a string of terminal symbols: any attempt to do so inevitably leads to a sentential form that again contains N. A simple example is

expression → '+' expression | '-' expression
but less obvious examples can easily be constructed. Theoretically, useless non-terminals can just be ignored, but in real-world specifications they almost certainly signal a mistake on the part of the user; in the above example, it is likely that a third alternative, perhaps | factor, has been omitted. Grammar-processing software should check for useless non-terminals, and reject the grammar if they are present.

A grammar is ambiguous if it can produce two different production trees with the same leaves in the same order. That means that when we lose the production tree due to the linearization of the program text, we cannot recover it unambiguously; and since the semantics derives from the production tree, we lose the semantics as well. So ambiguous grammars are to be avoided in the specification of programming languages, where attached semantics plays an important role.
Thoughts, ideas, definitions, and theorems about grammars are often expressed in a mathematical formalism. Some familiarity with this formalism is indispensable in reading books and articles about compiler construction, which is why we will briefly introduce it here. Much, much more can be found in any book on formal languages, for which see the Further Reading section of this chapter.
The basic unit in formal grammars is the symbol. The only property of these symbols is that we can take two of them and compare them to see if they are the same. In this they are comparable to the values of an enumeration type. Like these, symbols are written as identifiers, or, in mathematical texts, as single letters, possibly with subscripts. Examples of symbols are N, x, procedure_body, assignment_symbol, and +.

The next building unit of formal grammars is the production rule. Given two sets of symbols V1 and V2, a production rule is a pair

(N, α)   such that N ∈ V1, α ∈ V2*

in which X* means a sequence of zero or more elements of the set X. This means that a production rule is a pair consisting of an N which is an element of V1 and a sequence α of elements of V2. We call N the left-hand side and α the right-hand side. We do not normally write this as a pair (N, α) but rather as N→α; but technically it is a pair. The V in V1 and V2 stands for vocabulary.

Now we have the building units needed to define a grammar. A context-free grammar G is a 4-tuple

G = (VN, VT, S, P)

in which VN and VT are sets of symbols, S is a symbol, and P is a set of production rules. The elements of VN are called the non-terminals, those of VT the terminals, and S is called the start symbol. In programmer's terminology this means that a grammar is a record with four fields, containing the non-terminals, the terminals, the start symbol, and the production rules.

The previous paragraph defines only the general form of a grammar. To make it a real, acceptable grammar, it has to fulfill three requirements:

VN ∩ VT = ∅

in which ∅ denotes the empty set and which means that VN and VT are not allowed to have symbols in common: we must be able to tell terminals and non-terminals apart;

S ∈ VN

which means that the start symbol must be a non-terminal; and

P ⊆ VN × (VN ∪ VT)*

which means that the left-hand side of each production rule must be a non-terminal and that the right-hand side may consist of both terminals and non-terminals but is not allowed to include any other symbols.
Sequences of symbols are called strings. A string β is said to be directly derivable from a string α in a grammar, written α ⇒ β, if and only if there exist strings δ1, δ2, γ and a non-terminal N ∈ VN such that

α = δ1 N δ2,   β = δ1 γ δ2,   and   N→γ ∈ P

This means that if we have a string and we replace a non-terminal N in it by the right-hand side of a production rule for N, we obtain a string that is directly derivable from it. Of course, 'replacement' is an imperative notion; the definition above expresses the same idea declaratively.

A string β is derivable from a string α, written α ⇒* β, if and only if α = β or there is a string that is derivable from α and from which β is directly derivable. This means that a string is derivable from another string if we can reach it from that string through zero or more direct derivation steps. A sentential form of a grammar G is defined as any string that is derivable from the start symbol S of G. A sentence of a grammar G is defined as a sentential form that does not contain non-terminals:

S ⇒* σ   with   σ ∈ VT*

which denotes a string σ derivable from S which is in VT*, the set of all strings consisting of terminal symbols only. Finally, the language L(G) of a grammar G is defined as

L(G) = { σ | S ⇒* σ, σ ∈ VT* }

which is the set of all terminal productions of G. These terminal productions are called the sentences in the language L(G). Terminal productions link grammars to programming languages: if G is a grammar for a programming language, then L(G) is the set of all programs in that language that are correct in a context-free sense.

Terminal symbols in a programming language have another property in addition to their identity: a representation, the way they are typed, printed, etc. For example, the representation of an assignment symbol may be := or =, and a similar choice exists for an integer-type symbol. If we replace the terminal symbols in a sentence by their representations and add layout and comments, we obtain a program. It is usually considered acceptable to have a terminal symbol that has an empty representation; it is only slightly less acceptable to have different terminal symbols that share the same representation.

Since we are, in this book, more concerned with algorithms than with formal proofs, we will use this formalism sparingly or not at all.
A number of algorithms in compiler construction start off with some basic information items and then apply a set of rules to extend the information and draw conclusions from them. These 'information-improving' algorithms share a common structure, which does not show up well when the algorithms are treated in isolation; treated that way, they look more different than they really are. We will therefore treat one such algorithm here in some detail,
as a representative of this class of algorithms, and refer back to it from the following chapters. The example algorithm in question is the construction of the calling graph of a program. The calling graph of a program is a directed graph which has a node for each routine (procedure or function) in the program and an arrow from node A to node B if routine A calls routine B directly or indirectly. Such a graph is useful to find out, for example, which routines are recursive and which routines can be expanded in-line inside other routines. Figure 1.28 shows the sample program in C for which we will construct the calling graph; the listing shows the procedure headings and the procedure calls only.
void P(void) { ... Q(); ... S(); ... }
void Q(void) { ... R(); ... T(); ... }
void R(void) { ... P(); ... }
void S(void) { ... }
void T(void) { ... }

Figure 1.28 Sample C program used in the construction of a calling graph.
When the calling graph is first constructed from the program text, it contains only the arrows for the direct calls, the calls to a routine B that occur directly in the body of a routine A; these are our basic information items. (We do not consider here calls of anonymous routines, routines passed as parameters, etc.; such calls can be handled too, but their problems have nothing to do with the algorithm being discussed here.) The initial calling graph of the code in Figure 1.28 is given in Figure 1.29, and derives directly from that code.
Figure 1.29 Initial (direct) calling graph of the code in Figure 1.28 (arrows P→Q, P→S, Q→R, Q→T, and R→P).
The initial calling graph is, however, of little immediate use since we are mainly interested in which routine calls which other routine directly or indirectly. For example, recursion may involve call chains from A to B to C back to A. To find these additional information items, we apply the following rule to the graph:
If there is an arrow from node A to node B and one from B to C, make sure there is an arrow from A to C.

If we consider this rule as an algorithm (which it is not yet), this set-up computes the transitive closure of the relation 'calls directly or indirectly'. The transitivity axiom of the relation can be written as

A → B  and  B → C  imply  A → C

in which the operator → should be read as 'calls directly or indirectly'. Now the statements 'routine A is recursive' and 'A → A' are equivalent. The resulting calling graph of the code in Figure 1.28 is shown in Figure 1.30. We see that the recursion of the routines P, Q, and R has been brought into the open.
Figure 1.30 Calling graph of the code in Figure 1.28.
Generalizing from this example, a closure algorithm is specified by three ingredients:

- Data definitions: definitions and semantics of the information items; these derive from the nature of the problem.
- Initializations: one or more rules for the initialization of the information items; these convert information from the specific problem into information items.
- Inference rules: one or more rules of the form 'If information items I1, I2, ... are present then information item J must also be present'. These rules may again refer to specific information from the problem at hand.

The rules are called inference rules because they tell us to infer the presence of information item J from the presence of information items I1, I2, .... When all inferences have been drawn and all inferred information items have been added, we have obtained the closure of the initial item set. If we have specified our closure algorithm correctly, the final set contains the answers we are looking for. For example, if there is an arrow from node A to node A, routine A is recursive, and otherwise it is not. Depending on circumstances, we can also check for special, exceptional, or erroneous situations.
Figure 1.31 shows recursion detection by calling graph analysis written in this format.
Data definitions:
1. G, a directed graph with one node for each routine. The information items are arrows in G.
2. An arrow from a node A to a node B means that routine A calls routine B directly or indirectly.

Initializations:
If a routine A contains a call to routine B, an arrow from A to B must be present.

Inference rules:
If there is an arrow from node A to node B and one from B to C, an arrow from A to C must be present.

Figure 1.31 Recursion detection as a closure algorithm.
Two things must be noted about this format. The first is that it does specify which information items must be present but it does not specify which information items must not be present; nothing in the above prevents us from adding arbitrary information items. To remedy this, we add the requirement that we do not want any information items that are not required by any of the rules: we want the smallest set of information items that obeys the rules in the closure algorithm. This smallest set is the closure we are after.

The second is that the closure algorithm as introduced above is not really an algorithm, in that it does not specify when and how to apply the inference rules and when to stop; it is rather a declarative, Prolog-like specification of our wishes, and 'closure specification' would be a more appropriate term. Actually, it does not even correspond to an acceptable Prolog program: the program in Figure 1.32 gets into an infinite loop immediately.
calls(A, C) :- calls(A, B), calls(B, C).
calls(a, b).
calls(b, a).
?- calls(a, a).

Figure 1.32 A Prolog program corresponding to the closure algorithm of Figure 1.31.
What we need is an implementation that will not miss any inferred information items, will not add any unnecessary items, and will not get into an infinite loop. The most convenient implementation uses an iterative bottom-up algorithm and is treated below.
If the inference rules are restricted to a simple form, there are further implementation possibilities, and matters such as the removal of information items and other special handling can be specified very efficiently.
The usual way of implementing a closure algorithm is the iterative bottom-up sweep. In this approach, the information items are visited in some systematic order, to find sets of items that fulfill a condition of an inference rule. For each such set found, the corresponding inferred item is added, if it was not already present. Added items may fulfill other conditions again, so we have to continue sweeping until no more items can be added. The exact order of investigation of items and conditions depends on the data structures and the inference rules. There is no generic closure algorithm into which the inference rules can be plugged to obtain a specific closure algorithm; some ingenuity is still required. Figure 1.33 shows outline code for an iterative implementation of the transitive closure algorithm.
SET the flag Something changed TO True;
WHILE Something changed:
    SET Something changed TO False;
    FOR EACH Node 1 IN Graph:
        FOR EACH Node 2 IN Descendants of Node 1:
            FOR EACH Node 3 IN Descendants of Node 2:
                IF there is no arrow from Node 1 to Node 3:
                    Add an arrow from Node 1 to Node 3;
                    SET Something changed TO True;

Figure 1.33 Outline of a bottom-up algorithm for transitive closure.
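A minimal C sketch of this bottom-up closure is given below; the adjacency (boolean) matrix and the fixed upper bound on the number of nodes are assumptions made for the sketch, not the book's data structure.

#include <stdbool.h>

#define MAX_NODES 100   /* assumed fixed upper bound, for the sketch only */

/* Arrow[a][b] is true if routine a calls routine b directly or, after the
   closure has been computed, indirectly. */
bool Arrow[MAX_NODES][MAX_NODES];

void Transitive_closure(int n_nodes) {
    bool something_changed = true;

    while (something_changed) {
        something_changed = false;
        for (int node1 = 0; node1 < n_nodes; node1++)
            for (int node2 = 0; node2 < n_nodes; node2++)
                if (Arrow[node1][node2])
                    for (int node3 = 0; node3 < n_nodes; node3++)
                        if (Arrow[node2][node3] && !Arrow[node1][node3]) {
                            Arrow[node1][node3] = true;   /* inferred arrow */
                            something_changed = true;
                        }
    }
}

/* After the call, a routine r is recursive exactly when Arrow[r][r] holds. */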
A sweep consists of visiting the nodes of the graph in some order and adding an arrow from each node to all its descendants' descendants, as known at that moment. It is important to recognize the restriction 'as known at that moment' since this is what forces us to repeat the sweep until
no more arrows are added. We are then sure that the descendants we know are all the descendants there are.

The algorithm seems quite inefficient. If the graph contains n nodes, the body of the outermost for-loop is repeated n times; each node can have at most n descendants, so the body of the second for-loop can be repeated n times, and the same applies to the third for-loop. Together this is O(n^3) in the worst case. Each run of the while-loop adds at least one arc (except the last run), and since there are at most n^2 arcs to be added, it could in principle be repeated n^2 times in the worst case. So the total time complexity would seem to be O(n^5), which is much too high to be used in a compiler.

There are, however, two effects that save the iterative bottom-up closure algorithm. The first is that the above worst cases cannot materialize all at the same time. For example, if all nodes have all other nodes for descendants, all arcs are already present and the algorithm finishes in one round. There is a well-known algorithm by Warshall (1962) which does transitive closure in O(n^3) time and O(n^2) space, with very low multiplication constants for both time and space, but it has the disadvantage that it always uses this O(n^3) time and O(n^2) space. Also, O(n^3) time is still rather stiff in a compiler.

The second effect is that the graphs to which the closure algorithm is applied are usually sparse, which means that almost all nodes have only a few outgoing arcs. Also, long chains of arcs are usually rare. This changes the picture of the complexity of the algorithm completely. Let us say for example that the average fan-out of a routine is f, which means that a routine calls on the average f other routines; and that the average calling depth is d, which means that on the average after d calls within calls we reach either a routine that does not call other routines or we get involved in recursion. Under these assumptions, the while-loop will be repeated on the average d times, since after d turns all required arcs will have been added. The outermost for-loop will still be repeated n times, but the second and third loops will be repeated f times during the first turn of the while-loop, f^2 times during the second turn, f^3 times during the third turn, and so on, until the last turn, which takes f^d times. So the if-statement will be executed
n × (f^2 + f^4 + f^6 + ... + f^(2d))  =  n × (f^(2(d+1)) - f^2) / (f^2 - 1)
times on the average. Although the constant factor can be considerable (for f=4 and d=4 it is almost 70 000), the main point is that the time complexity is now linear in the number of nodes, which suggests that the algorithm may be practical after all. This is borne out by experience, and by many measurements (Syslo and Dzikiewicz, 1975). For non-sparse graphs, however, the time complexity of the bottom-up transitive closure algorithm is still O(n^3).
In summary, although transitive closure has non-linear complexity in the general case, for sparse graphs the bottom-up algorithm is almost linear.
In addition to being explained in words and by examples, the outline of an algorithm is sometimes sketched in a pseudo-code; we have already seen examples in Figures 1.24, 1.25, and 1.33. The outline code ignores much of the problematic details that beset many real-world programming languages, including memory allocation and deallocation, type conversion, and declaration before use. It is an informal, reasonably high-level language, based loosely on the programming language ABC (Geurts et al., 1989). We have chosen not to use an existing programming language, for several reasons:

- We emphasize the ideas behind the algorithms rather than their specific implementation, since we believe the ideas will serve for a longer period and will allow the compiler designer to make modifications more readily than a specific implementation would. This is not a cookbook for compiler construction, and supplying specific code might suggest that compilers can be constructed by copying code fragments from books.
- We do not want to be drawn into a C versus C++ versus Java discussion. We emphasize ideas and principles, and we find each of these languages pretty unsuitable for high-level idea expression.
- Real-world code is much less intuitively readable, mainly due to historical syntax and memory allocation problems.

The rules of the outline code are not very fixed, but the following notes may help in reading the code. Lines can end in a semicolon (;), which signals a command, or in a colon (:), which signals a control structure heading. The body of a control structure is indented by 4 spaces with respect to its heading. The end of a control structure is evident from a return to a previous indentation level or from the end of the code segment; so there is no explicit END line. Keywords are all-capital; they are self-evident, we hope. Identifiers can contain spaces in addition to letters, digits, and underscores; this is unusual in programming languages, but is helpful in constructing meaningful identifiers that print well both in program code and in running text. Type identifiers and field selectors of records start with a lower-case letter, because they denote a class ('a dog'); the other identifiers start with an upper-case letter, because they denote individual items ('Fido'). Field selectors are marked by a dot: Node .left. The selector is separated from the identifier by a space, to avoid constructs like Top node .left being misread as Top (node .left). It also emphasizes the fact that the selector is actually a postfix operator acting on the record. A command can, among other things, be an English-language command starting with a verb, with individual items embedded as arguments. Such commands can be viewed as procedure calls. An example from Figure 1.33 is
arrow from Node l to Node 3 ;
whichcan be viewedas a call to a procedurewithtwoformal p~ametersAdd an. arrow f m m ... t o ...,in addition to being a command in English. Long lines may be brokenfor reasons of page width; the continuationline or lines are indented by another 4 spaces. Broken lines can be recognized by the fact that they do not end in a colon or semicolon, Comments start at // and run to the endof the line.
This concludes our introduction to compiler writing. We have seen a toy interpreter and compiler that already show many of the features of a real compiler. A discussion of the general properties of compilers was followed by an introduction to context-free grammars and closure algorithms. Finally, the outline code used in this book was introduced. As in the other chapters, a summary, suggestions for further reading, and exercises follow.
- A compiler is a big file conversion program. The input format is called the source language, the output format is called the target language, and the language it is written in is the implementation language.
- One wants this file conversion because the result is in some sense more useful, as in any other conversion. Usually the target code can be run efficiently, on hardware.
- Target code need not be low-level, as in assembly code. Many compilers for high- and very high-level languages generate target code in C or C++.
- Target code need not be run on hardware; it can also be interpreted by an interpreter, and in that case the conversion from source to target can be much simpler.
- Compilers can compile newer versions of themselves; this is called bootstrapping.
- Compiling works by first analyzing the source text to construct a semantic representation, and then synthesizing target code from this semantic representation. This analysis/synthesis paradigm is very powerful, and is also useful outside compiler construction.
- The usual form of the semantic representation is the AST, abstract syntax tree, which is the syntax tree of the input, with useful context and semantic annotations at the nodes.
- Large parts of compilers are generated automatically, using program generators based on formalisms, which is another word for special-purpose programming languages. Important formalisms are regular and context-free grammars (for program text analysis), attribute grammars (for context handling), and bottom-up tree rewriting (for code generation).
- The source code input consists of characters. Lexical analysis constructs tokens from the characters. Syntax analysis constructs a syntax tree from the tokens. Context handling checks and annotates the syntax tree. Code generation constructs target code from the annotated syntax tree. Usually the target code needs the support of a run-time system.
- Broad compilers have the entire AST at their disposal all the time; narrow compilers make do with the path from the node under consideration upwards to the top of the AST, plus information collected about the branches on the left of that path.
- The driving loop of a narrow compiler is usually inside the parser: it pulls tokens out of the lexical analyzer and pushes parse tree nodes to the code generator.
- A good compiler generates correct, truthful code, conforms exactly to the source language standard, is able to handle programs of virtually arbitrary size, and contains no quadratic or worse algorithms.
- A compiler that can easily be run on different platforms is portable; a compiler that can easily be made to generate target code for different platforms is retargetable.
- Although ...izations are attractive and useful, compiler construction has shifted from ..., since most of the how-to problems have been solved by automatic ...
- Grammars allow us to recover the structure of the source program, whose text was linearized in the process of committing it to a file.
- Important algorithms in compiler construction are closure algorithms, which collect more information ...; the algorithms differ in the way the information is collected and ...
... and Warshall (1962). Schnorr presents a sophisticated but still reasonably simple version of the iterative bottom-up algorithm shown in Section 1.10.1 and proves that its expected time requirement is linear in the sum of the number of nodes and the final number of edges. Warshall's algorithm is very famous and is treated in any textbook on algorithms, for example Sedgewick (1988).
Compilers are often written in the language they implement. Identify advantages and disadvantages of this technique.

Referring to Section 1.1.1.1, give additional examples of why a language front-end would need information about the target machine and why a back-end would need information about the source language.

Redo the demo compiler from Section 1.2 in your favorite programming language. Compare it to the version in this book.

Given the following incomplete grammar for a very simple segment of English:

    Sentence → Subject Verb Object
    Subject → Noun-Phrase
    Object → Noun-Phrase
    Noun-Phrase → Noun-Compound | Personal-Name | Personal-Pronoun
    Noun-Compound → Article? Adjective-Sequence? Noun
    ...
(a) What is the parse tree for the sentence I see you, in which I and you are terminal productions of Personal-Pronoun and see is a terminal production of Verb?
(b) What would be a sensible AST for this parse tree?

Consider the demo compiler from Section 1.2. One property of a good compiler is that it is able to give good error messages, and good error messages require, at least, knowledge of the name of the input file and the line number in this file where an error occurred. Adapt the lexical analyzer from Section 1.2.4 to record these data in the nodes and use them to improve the quality of the error reporting.

Implement the constant folding optimization discussed in Section 1.2.6: do all arithmetic at compile time.

One module that is missing from Figure 1.21 is the error reporting module. Which of the modules shown would use the error reporting module and why?

Modify the code generator of Figure 1.18 to generate C code rather than PUSH, ADD, MULT, and PRINT instructions, and compile and run that code.
Where is the context that must be remembered between each cycle of the while loop in Figure 1.23 and the next?

Is the compiler implemented in Section 1.2 a narrow or a broad compiler?
Construct the post-main version of the main-loop module in Figure 1.24.

For those who already know what a finite-state automaton (FSA) is: rewrite the pre-main and post-main versions of the aa → b filter using an FSA. You will notice how the code is simpler: an FSA is a more efficient but less structured device for storage of state than a set of global variables.

What is an 'extended subset' of a language? Why is the term usually used in a ...
The grammar rule for expression in Section 1.2.1 has:

    expression → expression '+' term | expression '-' term | term

If we replaced this by

    expression → expression '+' expression | expression '-' expression | term

it would still produce the same language, but the replacement is not ...
Convert the EBNF rule

    parameter-list → ('IN' | 'OUT')? identifier (',' identifier)*

of Section 1.9.3 to BNF.
...
C → B C S

Name the non-terminals that are left-recursive, right-recursive, nullable, or useless. What language does the grammar produce?
Why could one want two or more terminal symbols with the same representation? Why would it be bad design to have a terminal symbol with an empty representation?
Refer to Section 1.9.5.1 on the definition of a grammar, condition (3). Why do we have to be able to tell terminals and non-terminals apart?

Argue that there is only one 'smallest set of information items' that fulfills the requirements of a closure specification.
History of compiler construction: Study Conway's paper (1963) on the coroutine-based modularization of compilers, and write a summary of it.
The front-end of a compiler starts with a stream of characters which constitute the program text, and is expected to create from it intermediate code that allows context handling and translation into target code. It does this by first recovering the syntactic structure of the program according to the grammar of the language, and then using that syntactic structure to generate the corresponding intermediate code. Suppose, for example, that the source language has constant definitions of the form
    CONST pi = 3.14159265;
    CONST pi_squared = pi * pi;
and the grammar for such constant definitions is:

    constant-definition → 'CONST' identifier '=' expression ';'

in which 'CONST', '=', and ';' are terminal symbols, which appear as is in the program, and identifier and expression are non-terminals, which refer to grammar rules ... each occurrence of the constant ... evaluated to give ... of the block.' Parsing the program text results in a syntax tree, which contains ...; since the semantics is defined based on this tree, it is natural to use the syntax tree as the intermediate code.
In practice, the actual syntax tree contains too many dead or uninteresting branches and a cleaned-up version of it, the abstract syntax tree or AST, is more efficient. The difference between the two is pragmatic rather than fundamental, and the details depend on the good taste and design skills of the compiler writer. Consider the (oversimplified) grammar rule for expression in Figure 2.1. Then the actual syntax tree for

    CONST pi_squared = pi * pi;
is

[Parse tree diagram: a constant-definition node with children CONST, identifier (pi_squared), =, expression, and ';'. The expression node expands through a product node into an expression (a factor holding the identifier pi), '*', and a factor holding the identifier pi.]
as specified by the grammar, and a possible abstract syntax tree could be:

[AST diagram: a constant-definition node with two children, pi_squared and an expression node representing pi '*' pi.]
The simplifications are possible because
1. the tokens CONST, =, and ; serve only to alert the reader and the parser to the presence of the constant definition, and need not be retained for further processing;
2. the semantics of identifier (in two different cases), expression, and factor are trivial (just passing on the value) and need not be recorded.
    expression → product | factor
    product → expression '*' factor
    factor → number | identifier

Figure 2.1 A very simple grammar for expression.

This means that nodes for constant-definition can be implemented in the compiler as records with two fields:

    struct constant_definition {
        Identifier *CD_idf;
        Expression *CD_expr;
    };

(in addition to some standard fields recording in which file and at what line the constant definition was found). Another example of a useful difference between parse tree and AST is the combination of the node types for if-then-else and if-then into one node type if-then-else. An if-then statement is represented by an if-then-else node, in which the else part has been supplied by an empty statement, as shown in Figure 2.2.
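The following C sketch illustrates this combination of node types; the struct layout and the names new_if_then and empty_statement are invented here for illustration and do not come from the compiler discussed in this book.

    #include <stdlib.h>

    struct expression;                 /* condition of the if-statement */
    struct statement;                  /* any statement node */

    /* One node type serves both if-then-else and if-then. */
    struct if_statement {
        struct expression *condition;
        struct statement  *then_part;
        struct statement  *else_part;  /* an empty statement node for plain if-then */
    };

    /* Build an if-then node by supplying an empty statement as the else part. */
    struct if_statement *new_if_then(struct expression *condition,
                                     struct statement *then_part,
                                     struct statement *empty_statement) {
        struct if_statement *node = malloc(sizeof(struct if_statement));
        node->condition = condition;
        node->then_part = then_part;
        node->else_part = empty_statement;
        return node;
    }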
[Figure 2.2 Syntax tree (a) and abstract syntax tree (b) of an if-then statement. In (a) the if-statement node has children IF, condition, THEN, and statement; in (b) the if-statement node has children condition, statement, and an empty statement for the missing else part.]
The context handling module gathers information about the nodes and combines it with information about other nodes. This information serves to perform contextual checking and to assist in code generation. The abstract syntax tree adorned with these bits of information is called the annotated abstract syntax tree. Actually the abstract syntax tree passes through many stages of 'annotatedness' during compilation. The degree of annotatedness starts out low, straight from parsing, and continues to grow even through code generation, in which, for example, actual memory addresses may be attached as annotations to nodes. At the end of the context handling phase our AST might have the form
[Annotated AST diagram: a constant-definition node with children pi_squared (TYPE: real) and expression (TYPE: real); the '*' expression node has two children pi, each with TYPE: real and VAL: 3.14159265.]
and after constant folding - the process of evaluating constant expressions in the compiler rather than at run time - it might be

[Folded AST diagram: a constant-definition node with children pi_squared (TYPE: real) and expression (TYPE: real, VAL: 9.86960437).]
Having established the annotated abstract syntax tree as the ultimate goal of the front-end, we can now work our way back through the design. To get an abstract syntax tree we need a parse tree; to get a parse tree we need a parser, which needs a stream of tokens; to get the tokens we need a lexical analyzer, which needs a stream of characters; and to get these characters we need to read them. See Figure 2.3.
[Figure 2.3 Pipeline from input text to annotated syntax tree.]
Some compiler systems come with a so-called structure editor and a program management system which stores the programs in parsed form. It would seem that such systems can do without much of the machinery described in this chapter, but if they allow unstructured program text to be imported or allow such modifications to the existing text that parts of it have to be reanalyzed from the character level on, they still need the full apparatus. Noonan (1985) gives a set of heuristic rules for deriving a good AST structure from a grammar. For an even more compact internal representation of the program than ASTs see Waddle (1990).
..., and it is not unusual for 30% of the time spent in the front-end to be taken up by this part of the compiler.

Roadmap
2.1 From program text to tokens - the lexical structure
2.1.1 Reading the program text
2.1.2 Lexical versus syntactic analysis
2.1.3 Regular expressions and regular descriptions
2.1.4 Lexical analysis
2.1.5-2.1.9 Creating lexical analyzers
2.1.10-2.1.12.3 Symbol handling and its applications
The program text consists of characters, but the use of the standard character-reading routines is best avoided, since these routines are slower than necessary and on some systems make an extra copy of the characters the file contains. Older recommendations to speed up the reading of the program file no longer apply; on modern machines the recommended method is to use the fastest input method available and to read the entire input file into memory. Modern machines have large memories, program files are seldom that large, and it is usually easy to obtain the size of a file, so memory for the entire input file can be allocated in advance. Having the entire input file in memory also helps with tokens whose sizes are not known beforehand; examples of such tokens are strings and identifiers. Many of these need to be stored for
further use by the compiler, and allocating space for them is much easier if their sizes are known in advance. Suppose, for example, that a string is read using a routine that yields characters one by one. In this set-up, the incoming characters have to be stored in some temporary buffer until the end of the string is found; the size of this buffer is not known in advance. Only after the end of the string has been read can the final allocation of space for the string take place; and once we have the final destination we still have to copy the characters there. This may lead to complicated allocation techniques, or alternatively the compiler writer is tempted to impose arbitrary limits on the largest allowable string length; it also costs processing time for the copying operation. With the entire file in memory, however, one can just note the position of the first string character, find the end, calculate the size, allocate space, and copy it. Or, if the input file stays in memory throughout the entire compilation, one could represent the string by a pointer to the first character and its length, thus avoiding all allocation and copying. Keeping the entire program text in memory has the additional advantage that error messages can easily show the precise code around the place of the problem.
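A minimal C sketch of this whole-file reading scheme is shown below; the function name and the error handling are our own, and standard C file I/O is assumed.

    #include <stdio.h>
    #include <stdlib.h>

    /* Read the whole file into a freshly allocated, zero-terminated buffer;
       returns NULL on failure. A real compiler would also record the file
       name and the size for later error reporting. */
    char *read_entire_file(const char *file_name) {
        FILE *fp = fopen(file_name, "rb");
        if (fp == NULL) return NULL;

        fseek(fp, 0L, SEEK_END);                 /* obtain the size of the file */
        long size = ftell(fp);
        rewind(fp);

        char *buffer = malloc((size_t)size + 1);
        if (buffer != NULL) {
            size_t length = fread(buffer, 1, (size_t)size, fp);
            buffer[length] = '\0';               /* end-of-input marker */
        }
        fclose(fp);
        return buffer;
    }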
There is some disagreement as to whether 'newline' is a character or not, and if it is, what it looks like. Trivial as the question may seem, it can be a continuous source of background bother in writing and using the compiler. Several facts add to the confusion. First, each operating system implements its own convention. In UNIX, the newline is a character, with the value octal 12. In MS-DOS the newline is a combination of two characters, with values octal 15 and 12, in that order; the meaning of the reverse order and that of the characters in isolation is undefined. And in OS-370 the newline is not a character at all: a text file consists of lines called 'logical records' and reading it produces a series of data structures, each containing a single line. Second, in those systems that seem to have a newline character, it is actually rather an end-of-line character, in that it does not occur at the beginning of the first line, but does occur at the end of the last line. Again, what happens when the last line is not terminated properly by a 'newline character' is undefined. And last but not least, some people have strong opinions on the question, not all of them in agreement with the actual or the desired situation. Probably the sanest attitude to this confusion is to convert the input to a fixed internal format as soon as possible. This keeps the operating-system-dependent part of the compiler to a minimum; some implementation languages already provide library routines for doing this. The internal format must allow easy lexical analysis, for normal processing, and easy reproduction of the original program text, for error reporting. A convenient format is a single character array in which the lines are stored consecutively, each terminated by a newline character. But when the text file format supplied by the operating system differs too much from this, such an array may not be cheap to construct.
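As an illustration of such a conversion, the following C sketch rewrites MS-DOS style CR LF pairs and lone CR characters to single newline characters, in place; the function name is our own, and only these two conventions are handled.

    #include <stddef.h>

    /* Normalize line terminators in the zero-terminated buffer to single
       '\n' characters; returns the new length of the text. */
    size_t normalize_newlines(char *text) {
        size_t from = 0, to = 0;
        while (text[from] != '\0') {
            if (text[from] == '\r') {                    /* CR or CR LF */
                text[to++] = '\n';
                from += (text[from + 1] == '\n') ? 2 : 1;
            } else {
                text[to++] = text[from++];
            }
        }
        text[to] = '\0';
        return to;
    }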
Having both a lexical and a syntax analysis requires one to decide where the border between the two lies. Lexical analysis produces tokens and syntax analysis consumes them, but what exactly is a token? Part of the answer comes from the language definition and part of it is design. A good guideline is: 'If it can be separated from its left and right neighbors by white space without changing the meaning, it's a token; otherwise it isn't.' If white space is allowed between the colon and the equals sign in :=, it is two tokens, and each should appear as a separate token in the grammar. If they have to stand next to each other, with nothing intervening, it is one token, and only one token occurs in the grammar. This does not mean that tokens cannot include white space: strings can, and they are tokens all the same, since adding white space in a string changes its meaning. Note that the quotes that demarcate the string are not tokens, since they cannot be separated from their neighboring characters by white space without changing the meaning. Comments and white space are not tokens in that the syntax analyzer does not consume them; they are generally discarded by the lexical analyzer, but it is often useful to retain them, to be able to show some program text surrounding an error.

From a pure need-to-know point of view, all the lexical analyzer has to supply in the struct Token are the class and repr fields as shown in Figure 1.11, but in practice it is very much worthwhile to also record the name of the file, line number, and character position in which the token was found (or actually where it started). Such information is invaluable for giving user-friendly error messages, which may surface much later on in the compiler, when the actual program text may be long discarded from memory.
The shapes of the tokens of a language may be described informally in the language manual, for example: 'An identifier is a sequence of letters, digits, and underscores that starts with a letter; no two consecutive underscores are allowed in it, nor can it have a trailing underscore.' Such a description is quite satisfactory for the user of the language, but for compiler construction purposes the shapes of the tokens are more usefully expressed in what are called 'regular expressions'. Regular expressions are well known from their use as search expressions in text editors, where for example the search expression ab* is used to find a text segment that consists of an a followed by zero or more bs.

A regular expression is a formula that describes a possibly infinite set of strings; it can be viewed both as a recipe for generating these strings and as a pattern for matching strings. The above regular expression ab*, for example, generates the set { a, ab, abb, abbb, ... }. When we have a string that can be generated by a regular expression, we say that the regular expression matches the string.

The most basic regular expression is a pattern that matches just one character, and the simplest of these is the one that specifies that character explicitly; an example is the pattern a, which matches the character a. There are two more basic patterns, one for matching a set of characters and one for matching all characters (usually with the exception of the end-of-line character, if it exists). These three basic patterns appear at the top of Figure 2.4, in which x, y, z, ... stand for any character and R, R1, R2, ... stand for any regular expression. A pattern can optionally be followed by a repetition operator; examples are b? for an optional b, b* for a possibly empty sequence of bs, and b+ for a non-empty sequence of bs.
    Basic patterns:           Matching:
    x                         The character x
    .                         Any character, usually except a newline
    [xyz...]                  Any of the characters x, y, z, ...

    Repetition operators:
    R?                        An R or nothing (= optionally an R)
    R*                        Zero or more occurrences of R
    R+                        One or more occurrences of R

    Composition operators:
    R1 R2                     An R1 followed by an R2
    R1 | R2                   Either an R1 or an R2

    Grouping:
    (R)                       R itself

Figure 2.4 Components of regular expressions.

There are two composition operators. One is the invisible operator, which indicates
concatenation; it occurs for example between the a and the b in ab*. The second is the | operator, which separates alternatives; for example, ab*|cd? matches anything that is matched by ab* or alternatively by cd?. The repetition operators have the highest precedence (bind most tightly); next comes the concatenation operator; and the alternatives operator | has the lowest precedence. Parentheses can be used for grouping. For example, the regular expression ab*|cd? is equivalent to (a(b*))|(c(d?)). A more extensive set of operators might for example include a repetition operator of the form R^m-n, which stands for m to n repetitions of R, but such forms have limited usefulness and complicate the implementation of the lexical analyzer considerably.
A comparison with the right-hand sides of production rules in CF grammars suggests itself. We see that only the basic patterns are characteristic of regular expressions. They share with the BNF notation the invisible concatenation operator and the alternatives operator, and with EBNF the repetition operators and parentheses. The use of characters like ?, *, +, and | as operators and [, ], (, and ) as separators prevents them from being used to match themselves as actual characters. There is usually some trickery involving escape characters to force these characters to stand for themselves rather than being taken as operators. One example of such an escape character is the backslash, \, which is used as a prefix: \* denotes the asterisk, \\ the backslash,
etc. Another is the quote, which is used to surround the escaped part: '+?' denotes a plus followed by a question mark, ''' denotes the quote character itself, etc. As we can see, additional trickery is needed to represent the escape characters themselves. It would have been esthetically more satisfying had the characters used as operators been special symbols, endowing escaped forms of *, +, ?, etc., with a special meaning rather than the normal characters, but this is not the path that history has taken, and the present situation presents its problems.
We can substitute the right-hand side of the first rule (which contains no non-terminals) in the second and further rules, adding parentheses where needed; the last rule then contains the combination of the previous rules and its right-hand side corresponds to the desired regular expression. A regular description for the identifier defined at the beginning of this section is:

    digit → [0-9]
    underscore → _
    letter_or_digit → letter | digit
    underscored_tail → underscore letter_or_digit+
    identifier → letter letter_or_digit* underscored_tail*

It is relatively easy to see that this implements the restrictions about the use of the underscore: no two consecutive underscores and no trailing underscore.
The substitution process described above converts this into:

    identifier → [a-zA-Z] ([a-zA-Z] | [0-9])* (_ ([a-zA-Z] | [0-9])+)*

which, after some simplification, reduces to:

    identifier → [a-zA-Z] [a-zA-Z0-9]* (_[a-zA-Z0-9]+)*

The right-hand side is the regular expression for identifier. This is a clear case of conciseness versus readability.
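The derived regular expression can be tried out directly with the POSIX regular expression library; the test program below is our own illustration and uses whole-string anchoring (^ and $), which differs from the longest-prefix matching a lexical analyzer performs.

    #include <regex.h>
    #include <stdio.h>

    int main(void) {
        const char *pattern = "^[a-zA-Z][a-zA-Z0-9]*(_[a-zA-Z0-9]+)*$";
        const char *tests[] = {"counter_1", "no__good", "trailing_", "x"};
        regex_t re;

        if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 1;
        for (int i = 0; i < 4; i++) {
            int matches = (regexec(&re, tests[i], 0, NULL, 0) == 0);
            printf("%-10s %s\n", tests[i], matches ? "identifier" : "rejected");
        }
        regfree(&re);
        return 0;
    }

As expected, counter_1 and x are accepted, while no__good (two consecutive underscores) and trailing_ (trailing underscore) are rejected.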
Each token class is specified by a regular expression. Some tokens have only one shape and correspond to a fixed string; keywords (reserved words) also fall in this class, and their treatment is discussed in Section 2.1.10. Other tokens can occur in many shapes and correspond to more complicated regular expressions; examples are identifiers and numbers. Strings and comments also fall in this class, but again they often require special treatment. The combination of token class name and regular expression is called a token description. An example is:

    assignment-symbol → :=
The basic task of a lexical analyzer is, given a set S of token descriptions and a position P in the input stream, to determine which of the regular expressions in S will match a segment of the input starting at P and what that segment is. If there is more than one such segment, the lexical analyzer must have a disambiguating rule; normally the longest segment is the one we want. This is reasonable: if S contains the regular expressions =, =/, and =/=, and the input is =/=, we want the full =/= matched. If the longest segment is matched by more than one regular expression in S, again tie-breaking is needed and we must assign priorities to the token descriptions in S. Since S is a set, this is somewhat awkward, and it is usual to rely on the textual order in which the token descriptions are supplied: the token that has been defined textually first in the token description file wins. To use this facility, the compiler writer has to specify the more specific token descriptions before the less specific ones: if any letter sequence is an identifier except xyzzy, then the following will do the job:

    magic-symbol → xyzzy
    identifier → [a-z]+

Roadmap
Creating lexical analyzers:
2.1.5 Creating a lexical analyzer by hand
2.1.6 Creating a lexical analyzer automatically
2.1.7 Transition table compression
2.1.8 Error handling in lexical analyzers
2.1.9 A traditional lexical analyzer generator - lex
Lexical analyzers can be written by hand or generated automatically, in both cases based on the specification of the tokens through regular expressions; the required techniques are treated in this and the following section, respectively. Generated lexical analyzers in particular require large tables and it is profitable to consider methods to compress these tables (Section 2.1.7). Next, we discuss input error handling in lexical analyzers. An example of the use of a traditional lexical analyzer generator concludes the sections on the creation of lexical analyzers. It is relatively easy to write a lexical analyzer by hand. Probably the best way is to start it with a case statement over the first character of the input. The first characters of the different tokens are often different, and such a case statement will split the analysis problem into many smaller problems, each of which can be solved with a few lines of ad hoc code. Such lexical analyzers can be quite efficient, but still require a lot of work, and may be difficult to modify.
Figures 2.5 through 2.12 contain the elements of a simple but non-trivial lexical analyzer. It recognizes five classes of tokens: identifiers as defined above, integers, one-character tokens, and the token classes ERRONEOUS and EOF. As one-character tokens we accept the operators +, -, *, and /, and the separators ; , (comma), (, ), {, and }, as they might be used in an actual programming language. We skip layout and comment; comment starts with a sharp character # and ends either at another # or at end of line. Single characters in the input not covered by any of the above are recognized as tokens of class ERRONEOUS. An alternative action would be to discard such characters with a warning or error message, but since it is likely that they represent a typing error for an actual token, it is probably better to pass them on to the parser to show that there was something there. Finally, since most parsers want to see an explicit end-of-file token, the pseudo-character end-of-input yields the real token of class EOF for end-of-file.

    /* Define class constants; 0-255 reserved for ASCII characters: */
    #define IDENTIFIER  256
    #define INTEGER     257
    #define EOF         258
    #define ERRONEOUS   259
    typedef struct {
        char *file_name;
        int line_number;
        int char_number;
    } Position_in_File;

    typedef struct {
        int class;
        char *repr;
        Position_in_File pos;
    } Token_Type;

    extern Token_Type Token;
    extern void start_lex(void);
    extern void get_next_token(void);

Figure 2.5 Header file lex.h of the handwritten lexical analyzer.
    #include "input.h"      /* for get_input() */
    #include "lex.h"

    /* PRIVATE */
    static char *input;
    static int dot;             /* dot position in input */
    static int input_char;      /* character at dot position */

    #define next_char()  (input_char = input[++dot])

    /* PUBLIC */
    Token_Type Token;

    void start_lex(void) {
        input = get_input();
        dot = 0;
        input_char = input[dot];
    }

Figure 2.6 Data and start-up of the handwritten lexical analyzer.
records the position of the token to be identified in the field Token.pos by calling note_token_position(); the code for this routine is not shown here. Next, get_next_token() takes a five-way split based on the present input character, a copy of which is stored in input_char. Three cases are treated on the spot; two more complicated cases are referred to routines. Finally, get_next_token() converts the chunk of the input which forms the token into a zero-terminated string by calling input_to_zstring() (not shown) and stores the result as the representation of the token. Creating a representation for the EOF token is slightly different since there is no corresponding chunk of input. Figures 2.8 through 2.10 show the routines for skipping layout and recognizing identifiers and integers. Their main task is to move the variable dot just past the end of the form they recognize. In addition, recognize_identifier() and recognize_integer() set Token.class. The routine get_next_token() and its subroutines frequently test the present input character to see whether it belongs to a certain class; examples are calls of is_letter(input_char) and is_digit(input_char). The routines used for this are defined as macros and are shown in Figure 2.11. As an example of its use, Figure 2.12 shows a simple main program that calls get_next_token() repeatedly in a loop and prints the information found in Token. The loop terminates when a token with class EOF has been encountered and processed. Given the input # *# 8; ##abc_dd_8;z #/ z it prints the results shown in Figure 2.13.
    void get_next_token(void) {
        int start_dot;

        skip_layout_and_comment();
        /* now we are at the start of a token or at end-of-file, so: */
        note_token_position();

        /* split on first character of the token */
        start_dot = dot;
        if (is_end_of_input(input_char)) {
            Token.class = EOF; Token.repr = "<EOF>"; return;
        }
        if (is_letter(input_char)) {recognize_identifier();}
        else if (is_digit(input_char)) {recognize_integer();}
        else if (is_operator(input_char) || is_separator(input_char)) {
            Token.class = input_char; next_char();
        }
        else {Token.class = ERRONEOUS; next_char();}

        Token.repr = input_to_zstring(start_dot, dot - start_dot);
    }

Figure 2.7 Main reading routine of the handwritten lexical analyzer.
    void skip_layout_and_comment(void) {
        while (is_layout(input_char)) {next_char();}
        while (is_comment_starter(input_char)) {
            next_char();
            while (!is_comment_stopper(input_char)) {
                if (is_end_of_input(input_char)) return;
                next_char();
            }
            next_char();
            while (is_layout(input_char)) {next_char();}
        }
    }

Figure 2.8 Skipping layout and comment in the handwritten lexical analyzer.
We see that often questions of the type is_letter(ch) are asked. These questions have the property that their input parameters are from a finite set and their result depends
    void recognize_identifier(void) {
        Token.class = IDENTIFIER; next_char();
        while (is_letter_or_digit(input_char)) {next_char();}
        while (is_underscore(input_char) && is_letter_or_digit(input[dot+1])) {
            next_char();
            while (is_letter_or_digit(input_char)) {next_char();}
        }
    }

Figure 2.9 Recognizing an identifier in the handwritten lexical analyzer.
    void recognize_integer(void) {
        Token.class = INTEGER; next_char();
        while (is_digit(input_char)) {next_char();}
    }

Figure 2.10 Recognizing an integer in the handwritten lexical analyzer.
    #define is_end_of_input(ch)   ((ch) == '\0')
    #define is_layout(ch)         (!is_end_of_input(ch) && (ch) <= ' ')
    ...

... provides the state, and the function Next item set(Item set, Ch) describes the state transitions; the function Next item set is called a transition function. The algorithm, implemented in this way, is called a finite-state automaton. Since there are only a finite number of states, it is customary to number them, starting from S0 for the initial state. The question remains how to determine the set of reachable item sets. The answer is very simple: by just constructing them, starting from the initial item set; that item set is certainly reachable. For each character Ch in the character set we then compute the item set Next item set(Item set, Ch). This process yields a number of new reachable item sets (and perhaps some old ones we have already met). We repeat the process for each of the new item sets, until no new item sets are generated anymore. Since the set of item sets is finite, this will eventually happen. This procedure is called the subset algorithm; it finds the reachable subsets of the set of all possible items, plus the transitions between them. It is depicted as a closure algorithm in Figure 2.26.
    1a. A 'state' is a set of items.
    1b. Let States be a set of states.
    2a. A 'state transition' is a triple (start state, character, end state).
    2b. Let Transitions be a set of state transitions.

    1. Set States to contain a single state, Initial item set().
    2. Set Transitions to the empty set.

    If States contains a state S, States must contain the state E and Transitions must contain the state transition (S, Ch, E) for each character Ch in the input character set, where E = Next item set(S, Ch).

Figure 2.26 The subset algorithm for lexical analyzers.
For the two token descriptions above, we find the initial state Initial item set():

    integral-number → (•[0-9])+
    fixed-point-number → (•[0-9])* '.' ([0-9])+
    fixed-point-number → ([0-9])*• '.' ([0-9])+

We call this state S0. For this example we consider only three character classes: digits, the decimal point, and others - semicolons, parentheses, etc. We first compute Next item set(S0, digit), which yields

    integral-number → (•[0-9])+
    integral-number → ([0-9])+•
    fixed-point-number → (•[0-9])* '.' ([0-9])+
    fixed-point-number → ([0-9])*• '.' ([0-9])+
    ...
        Token.pos.line_number = line_number;
        strcpy(Token.repr = (char*)malloc(strlen(yytext)+1), yytext);
    }

    int yywrap(void) {return 1;}

Lex input for the token set from Figure 2.20.
that case. Finally, since the representation of the token in the array yytext[] will be overwritten by that of the next token, it is secured in Token.repr by the fairly horrible C-ism in the last statement in the routine. The declaration of yywrap() assists in proper end-of-file handling; further details can be found in any lex manual, for example that by Levine, Mason and Brown.
The handwritten lexical analyzer of Section 2.1.5 recorded the position in the input file of the token delivered by tracking that position inside the routine next_char(). Unfortunately, we cannot do this in a reliable way in lex, for two reasons. First, some variants of lex read ahead arbitrary amounts of input before producing the first token; and second, some use the UNIX input routine fread() rather than getc() to obtain input. In both cases, the relation between the characters read and the token recognized is lost. We solve half the problem by explicitly counting lines in the lex code. To solve the entire problem and record also the character positions inside a line, we need to add code to measure and tally the lengths of all patterns recognized. We have not shown this in our code to avoid clutter. This concludes our discussion of lexical analyzers proper. The basic purpose of the stream of tokens generated by a lexical analyzer in a compiler is to be passed on to a syntax analyzer. For purely practical reasons it is, however, convenient to introduce additional layers between lexical and syntax analysis. These layers may assist in further identification of tokens (Section 2.1.10), macro processing and file inclusion (Section 2.1.12.1), conditional text inclusion (Section 2.1.12.2), and possibly generics (Section 2.1.12.3). We will now first turn to these intermediate layers.
In a clean design, the only task of a lexical analyzer is isolating the text of the token and identifying its token class. The lexical analyzer then yields a stream of (token class, token representation) pairs. The token representation is carried through the syntax analyzer to the rest of the compiler, where it can be inspected to yield the appropriate semantic information. An example is the conversion of the representation 8#377# (octal 377 in Ada) to the integer value 255. In a broad compiler, a good place for this conversion would be in the initialization phase of the annotation of the syntax tree, where the annotations that derive from the tokens form the basis of further attributes. In a narrow compiler, however, the best place to do computations on the token text is in the lexical analyzer. Such computations include simple conversions, as shown above, but also more elaborate actions, for example identifier identification. Traditionally, almost all compilers were narrow for lack of memory and did considerable semantic processing in the lexical analyzer: the integer value 255 stored in two bytes takes less space than the string representation "8#377#". With modern machines the memory considerations have for the most part gone away, but language properties can force even a modern lexical analyzer to do some semantic processing. Three such properties concern identifiers that influence subsequent parsing, macro processing, and keywords. In C and C++, typedef and class declarations introduce identifiers that influence the parsing of the subsequent text. In particular, in the scope of the declaration

    typedef int T;
the code fragment

    (T *)

is a cast which converts the subsequent expression to 'pointer to T', and in the scope of the variable declaration

    int T;

it is an incorrect expression with a missing right operand to the multiplication sign. In C and C++ parsing can only continue when all previous identifiers have been identified sufficiently to decide if they are type identifiers or not. We said 'identified sufficiently' since in many languages we cannot do full identifier identification at this stage. Given the Ada declarations

    type Planet is (Mercury, Venus, Earth, Mars);
    type Goddess is (Juno, Venus, Minerva, Diana);
then in the code fragment

    for P in Mercury .. Venus loop

the identifier Venus denotes a planet, and in

    for G in Juno .. Venus loop

it denotes a goddess. This requires overloading resolution and the algorithm for this belongs in the context handling module rather than in the lexical analyzer. (Identification and overloading resolution are covered in Section 6.1.1.)

A second reason to have at least some identifier identification done by the lexical analyzer is related to macro processing. Many languages, including C, have a macro facility, which allows chunks of program text to be represented in the program by identifiers. Examples of parameterless macros are

    #define EOF    256
    #define DIGIT  257

from the lexical analyzer in Figure 1.11; a macro with parameters occurred in

    #define is_digit(c)   ('0' <= (c) && (c) <= '9')
The straightforward approach is to do the macro processing as a separate phase between reading the program and lexical analysis, but that means that each and every character in the program will be processed several times; also the intermediate result may be very large. See Exercise 2.26 for additional considerations. Section 2.1.12 shows that macro processing can be conveniently integrated into the reading module of the lexical analyzer, provided the lexical analyzer checks each identifier to see if it has been defined as a macro. A third reason to do some identifier identification in the lexical analyzer stems from the existence of keywords. Most languages have a special set of tokens that look like identifiers but serve syntactic purposes: the keywords or reserved words. Examples are if, switch, case, etc. from Java and C, and begin, end, task, etc. from Ada. There is again a straightforward approach to deal with the problems that are caused by this, which is specifying each keyword as a separate regular expression to the lexical analyzer, textually before the regular expression for identifier. Doing so increases the size of the transition table considerably, however, which may not be acceptable.
These three problems can be solved by doing a limited amount of identifier identification in the lexical analyzer, just enough to serve the needs of the lexical analyzer and parser. Since identifier identification has many more links with the rest of the compiler than the lexical analyzer itself has, the process is best delegated to a separate module, the symbol table module. In practical terms this means that the routine Get next token(), which is our version of the routine get_next_token() described extensively above, is renamed to something like Get next simple token(), and that the real Get next token() takes on the structure shown in Figure 2.42. The procedure Switch to macro() does the fancy footwork needed to redirect further input to the macro body; see Section 2.1.12.1 for details.
    FUNCTION Get next token() RETURNING a token:
        SET Simple token TO Get next simple token();
        IF Class of Simple token = Identifier:
            SET Simple token TO Identify in symbol table (Simple token);
            // See if this has reset Class of Simple token:
            IF Class of Simple token = Macro:
                Switch to macro (Simple token);
                RETURN Get next token();
            ELSE Class of Simple token /= Macro:
                // Identifier or Type identifier or Keyword:
                RETURN Simple token;
        ELSE Class of Simple token /= Identifier:
            RETURN Simple token;

Figure 2.42 A Get next token() that does lexical identification.
Effectively this introduces a separate phase between the lexical analyzer proper and the parser, the lexical identification phase, as shown in Figure 2.43. Lexical identification is also called screening (DeRemer, 1974). Once we have this mechanism in place, it can also render services in the implementation of generic declarations; this aspect is covered in Section 2.1.12.3. We will first consider implementation techniques for symbol tables, and then see how to do macro processing and file inclusion; the section on lexical analysis closes by examining the use of macro processing in implementing generic declarations.
In its basic form a symbol table (or name list) can be thought of as an extensible array of records that can be indexed by a string rather than by an integer. The string is the identifier and the associated record contains collected information about the identifier. The basic interface of a symbol table module consists of one single function:
Figure 2.43 Pipeline from input to lexical identification.
    FUNCTION Identify (Identifier name) RETURNING a pointer to Identifier info;
When called with an arbitrary string Identifier name it returns a pointer to a record of type Identifier info; when it is later called with that same string, it returns the same pointer, regardless of how often this is done and how many other calls of Identify() intervene. The compiler writer chooses the record type Identifier info so that all pertinent information that will ever need to be collected for an identifier can be stored in it. It is important that the function Identify() returns a pointer to the record rather than a copy of the record, since we want to be able to update the record to collect information in it. In this respect Identify() acts just like an array of records in C. If C allowed arrays to be indexed by strings, we could declare an array

    struct Identifier_info Sym_table[...];

and use &Sym_table[Identifier_name] instead of Identify(Identifier name). When used in a symbol table module for a C compiler, Identifier info could, for example, contain pointers to the following pieces of information:

1. the actual string (for error messages and for verification in hashing; see below);
2. a macro definition (see Section 2.1.12);
3. a keyword definition;
4. a list of type, variable and function definitions (see Section 6.1.1);
5. a list of struct and union name definitions (see Section 6.1.1);
6. a list of struct and union field selector definitions (see Section 6.1.1).
In practice, many of these pointers would be null for most of the identifiers. This approach splits the problem of building a symbol table module into two problems: how to obtain an access mechanism which emulates an extensible string-indexable array, and how to design and maintain the data attached to the string (which in our case is the identifier). The latter problem is actually a set of problems, since many pieces of information about identifiers have to be collected and maintained, for a variety of reasons and often stemming from different parts of the compiler. We will treat these where they occur; here we will turn to the implementation of the access mechanism.
It would be nice if implementation languages provided extensible string-indexable arrays as a built-in type constructor, but with the exception of awk, Perl, and a few others, they don't. There are several ways to implement such a mechanism. The most obvious one is simple to implement but very inefficient: use a linked list of (string, data) pairs. The access time is linear in the number of entries, which makes the cost of entering n identifiers into the symbol table a quadratic function of n: the cost is O(n²). To get to the desired string more easily, a tree structure - binary or more advanced - might be used. The access time is now logarithmic in the number of entries, which makes the total cost O(n ln n), which is better. The best approach is to use a hash table, which has almost constant, or, with some care, totally constant access time. This makes the total cost linear.

There are many variants in hash algorithm design; we present one common variant here. The basic building block of hashing is the hash function. A hash function accepts a string as a parameter and maps it onto an integer in the range 0..N-1, for some N. Reasonable values of N range from a few hundreds to a few tens of thousands, but in principle any number will work. These integers are to be used as indexes into an array of N elements, the hash table. Each element in the hash table is a container for a set of (string, data) pairs; such containers are often unceremoniously called buckets. The idea is that a string S, which is mapped onto the integer M by the hash function, will be stored in bucket number M, together with its data. Since the buckets are elements of an array, finding the right bucket can be done in constant time, by indexing the hash table.

A call of Identify(Identifier name) starts by computing the hash value of Identifier name, to find the number of the bucket that is the proper place for Identifier name and its associated data. It finds the bucket B by indexing, and examines it to see if it contains the record (Identifier name, data). If it is indeed found, Identify() returns the address of the record. If it is not found, Identifier name has never been seen before: it certainly cannot be in any of the other buckets. The call to Identify() then creates an empty record for it in the bucket B, stores the data of Identifier name in it, and returns the address of the record. An outline of the code for the function Identify is given in Figure 2.44.

Inevitably, sometimes two different strings will hash to the same integer and end up in the same bucket, in what is called a collision. If we choose N much larger than the largest expected number of different identifiers in a program (a not too well-defined concept!), most buckets will be empty, many will contain one string, and a few will contain more than one string. We can therefore use the simplest data structure to implement the buckets: linked lists. Since the Identifier info records in the linked lists still contain their strings we can easily find the corresponding record by stepping through the linked list. If we do not find a corresponding record, we have to add an empty one. So we arrive at the data layout shown in Figure 2.45, in which three identifiers, length, width, and height, have been introduced into a hash table of size 5 (actual sizes are more likely to be between 1000 and 10 000). The hash function used was
    FUNCTION Identify (Identifier name) RETURNING a pointer to Identifier info:
        SET Bucket number TO Hash value (Identifier name);
        IF there is no Identifier info record for Identifier name
                in Bucket [Bucket number]:
            Insert an empty Identifier info record for Identifier name
                in Bucket [Bucket number];
        RETURN Pointer to Identifier info record for Identifier name
            in Bucket [Bucket number];

Figure 2.44 Outline of the function Identify.
    (sum of the ASCII values of the characters in the name + length of name) MOD table size
and it so happens that both width and height map onto the same integer, 4.
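For concreteness, this hash function can be written in C as follows; the function name and the toy table size are our own, taken from the example rather than from a production compiler.

    #include <string.h>

    #define TABLE_SIZE 5        /* toy size from Figure 2.45 */

    /* (sum of the ASCII values of the characters in the name
        + length of name) MOD table size */
    static unsigned int hash_value(const char *name) {
        unsigned int sum = 0;
        size_t length = strlen(name);
        for (size_t i = 0; i < length; i++) sum += (unsigned char)name[i];
        return (sum + (unsigned int)length) % TABLE_SIZE;
    }

With this function, hash_value("width") and hash_value("height") both yield 4, matching the collision described in the text, while hash_value("length") yields 3.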
[Figure 2.45 A hash table of size 5 containing three identifiers: buckets 0 to 4, with the zero-terminated strings length, width, and height stored in their buckets; width and height are chained together in the same bucket.]
We can now see what the function Identify in symbol table (Token) in Figure 2.46 must do. It calls the function Identify (Token) from Figure 2.44, examines the record obtained, and resets the class of the token accordingly, if needed. A possible structure for Identify in symbol table (Token) based on the data proposed above for C is shown in Figure 2.46. To accommodate the new information in the token we have extended it with a field Identifier info pointer; this is one of the extensions discussed in Section 2.1.5. Since the extensions are usually very class-specific, they can be allocated in a union, so their space requirements are not a problem.
    FUNCTION Identify in symbol table (Token) RETURNING a token:
        SET Identifier info pointer TO Identify (Representation of Token);
        IF Identifier info indicates macro:
            SET Class of Token TO Macro;
        ELSE IF Identifier info indicates keyword:
            SET Class of Token TO Keyword;
        ELSE IF Identifier info indicates type identifier:
            SET Class of Token TO Type identifier;
        ELSE Identifier info indicates variable, struct, union, etc. identifier:
            // Do not change Class of Token;
        Append Identifier info pointer to Token;
        RETURN Token;

Figure 2.46 A possible Identify in symbol table (Token) for C.
Note that if the token represents an identifier, it now has two references to the identifier in it: one as the representation that any token has, and one in the identifier info. And what is more, the two do not have to be the same. For example, in a language that allows but does not differentiate lower-case and upper-case letters in identifiers (as does Ada) the representation will be the actual representation of that token, for example PostScript in one place and postscript in another, and the string in the identifier info will always be the same canonical representation, for example POSTSCRIPT.
Stepping through the buckets introduces a linear time component in the access mechanism and a quadratic one in the entire algorithm. This can easily be seen by considering what happens when we use a hash table of size 1: all strings will be mapped onto the same integer, 0, and a single linked list will result. In fact, the hash table does nothing fundamental to the quadratic nature of the algorithm; it just replaces one list of length n by N lists of average length n/N. A simple but effective way to improve efficiency is by making the lists self-organizing. The easiest way to effect this is by moving the record that is accessed to the head of the linked list. The rationale is that when an identifier is used at a certain position in the program, it is likely that it will again be referenced soon. So the next access will find the most recently used identifier at the beginning of the list. This simple trick avoids almost all list searching.
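A sketch of this move-to-front trick in C is shown below; the record layout (a name field and a next field) is our own minimal assumption about what an Identifier info record contains.

    #include <string.h>
    #include <stddef.h>

    struct Identifier_info {
        char *name;                          /* the identifier string itself */
        struct Identifier_info *next;        /* next record in the same bucket */
        /* ... further collected information ... */
    };

    /* Look up 'name' in one bucket and move the record found to the head of
       the list, so that an immediately following look-up finds it at once;
       returns the record, or NULL if the name is not in this bucket. */
    struct Identifier_info *find_and_promote(struct Identifier_info **bucket,
                                             const char *name) {
        struct Identifier_info **link = bucket;
        while (*link != NULL) {
            if (strcmp((*link)->name, name) == 0) {
                struct Identifier_info *found = *link;
                *link = found->next;         /* unlink from the old position */
                found->next = *bucket;       /* relink at the head of the list */
                *bucket = found;
                return found;
            }
            link = &(*link)->next;
        }
        return NULL;
    }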
An important determinant of the quality of a hash table mechanism is the hash function. A good hash function should be fast and should map the identifiers evenly over the integers, with no obvious relationship between the identifiers that map onto the same integer. Related identifiers like a, b, ab, ba, AB, BA, Ab, etc. should map onto different integers; the hash value should depend on the value of each character in the string and also on their order. It is surprisingly easy to spend considerable effort on designing a hash function and still come up with a bad one. McKenzie, Harries and Bell (1990) have analyzed the performance of several widely used hash functions and have concluded that the following simple iterative algorithm performs very well, provided that 'good' combinations of the hash table size N and the multiplication constant k are chosen.

    h_0 = 0
    h_i = k × h_{i-1} + c_i        for 1 ≤ i ≤ n
    hash value = BITS(h_n, 30) MOD N
Here, n indicates the length of the identifier, c_i is the ASCII value of the i-th character in the identifier, and BITS(h, m) yields the least significant (rightmost) m bits of the integer h. Excellent values for k and N appear to be 613 and 1008. A faster algorithm can be obtained for k=4 and N=1403, since the multiplication by 4 can be implemented by a very efficient left-shift, but the small value of k causes sets of similar identifiers to cluster. Obviously, larger values for N result in a better performance, because there will be fewer collisions, but raising N will also cost more memory. With k=613 and N=1008, we obtain the integers 610, 812, and 3005 for the hash values of length, width, and height, respectively.
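A direct C rendering of this hash function, with the constants k = 613 and N = 1008 quoted in the text, might look as follows; the function name is our own, and BITS(h, 30) is implemented by masking with 2^30 − 1.

    #define HASH_K 613UL
    #define HASH_N 1008UL

    /* h_0 = 0;  h_i = k*h_{i-1} + c_i;  result = BITS(h_n, 30) MOD N */
    static unsigned int hash_identifier(const char *name, size_t n) {
        unsigned long h = 0;
        for (size_t i = 0; i < n; i++)
            h = HASH_K * h + (unsigned char)name[i];    /* wraps harmlessly */
        return (unsigned int)((h & ((1UL << 30) - 1)) % HASH_N);
    }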
... having a certain string as a value is to be substituted in its place. A macro definition can specify formal parameters, which have to be substituted by the actual parameters. An example in C is

    #define is_capital(ch)   ('A' <= (ch) && (ch) <= 'Z')
which states that is_capital(ch) must be substituted by ('A' <= (ch) && (ch) <= 'Z'), ...

... if there are pairs (A_1, I_1) to (A_n, I_n) with all I_i < m, the pair (N → A_1 ... A_n, I_N) must be replaced by a pair (N → A_1 ... A_n, I_new) where I_new = ..., provided I_new ...
    #define new_expr()  ((struct expr *)malloc(sizeof(struct expr)))

    struct term {
        int type;                       /* 'I' only */
    };
    #define new_term()  ((struct term *)malloc(sizeof(struct term)))

    extern void print_expr(struct expr *e);
    extern void print_term(struct term *t);

Figure 2.80 Data structures for the parse tree.
The code in Figure 2.79 has a serious LL(1) problem: it exhibits hidden left-recursion. The left-recursion of the rule expression is hidden by the C code {(*ep) = new_expr(); (*ep)->type = '-';}, which is a pseudo-rule producing ε. This hidden left-recursion prevents us from applying the left-recursion removal technique from Section 2.2.4.3. To turn the hidden left-recursion into visible left-recursion, we move the C code to after expression; this requires storing the result of expression temporarily in an auxiliary variable, e_aux. See Figure 2.81, which shows only the new rule for expression; the one for term remains unchanged.
    expression(struct expr **ep):
        expression(ep)
        {   struct expr *e_aux = (*ep);
            (*ep) = new_expr(); (*ep)->type = '-'; (*ep)->expr = e_aux;
        }
        '-' term(&(*ep)->term)
    |
        {(*ep) = new_expr(); (*ep)->type = 'T';}
        term(&(*ep)->term)
    ;

Figure 2.81 Visibly left-recursive grammar with code for constructing a parse tree.
Now that we have turned the hidden left-recursion into direct left-recursion we can apply the technique from Section 2.2.4.3. We find that
    N = expression(struct expr **ep)
    α = {   struct expr *e_aux = (*ep);
            (*ep) = new_expr(); (*ep)->type = '-';
            (*ep)->expr = e_aux;
        }
        '-' term(&(*ep)->term)
    β = {(*ep) = new_expr(); (*ep)->type = 'T';}
        term(&(*ep)->term)
which results in the code shown in Figure 2.82. Figure 2.83 shows what the new code does: the rule expression_tail_option is called with the address (ep) of a pointer (*ep) to the top node collected thus far as a parameter (a). When another term is found in the input, the pointer to the node is held in the auxiliary variable e_aux (b), a new node is inserted above it (c), and the old node and the new term are connected to the new node, which is accessible through ep as the top of the new tree. This technique constructs proper parse trees in spite of the grammar transformation required for LL(1) parsing.
    expression(struct expr **ep):
        {(*ep) = new_expr(); (*ep)->type = 'T';}
        term(&(*ep)->term)
        expression_tail_option(ep)
    ;

    expression_tail_option(struct expr **ep):
        {   struct expr *e_aux = (*ep);
            (*ep) = new_expr(); (*ep)->type = '-'; (*ep)->expr = e_aux;
        }
        '-' term(&(*ep)->term)
        expression_tail_option(ep)
    |
    ;

Figure 2.82 Adapted LLgen grammar with code for constructing a parse tree.
A sample run with the input i-i-i yields (((I)-I)-I); here i is just an identifier and I is the printed representation of a token of the class IDENTIFIER.
[Figure 2.83 Tree transformation performed by expression_tail_option: (a) ep points to the top node collected thus far; (b) the pointer is saved in e_aux while a new node is created; (c) the new node is inserted above the old one, with the old node and the new term connected to it.]

... Not all grammars allow the LR(1) or
LALR(1) parser construction technique to result in a parser; those that do not are said to exhibit LR(1) or LALR(1) conflicts, and measures to deal with them are discussed in Section 2.2.5.7. Techniques to incorporate error handling in LR parsers are treated in Section 2.2.5.8. An example of the use of a traditional bottom-up parser generator concludes this section on the creation of bottom-up parsers.

Roadmap
2.2.5 Creating a bottom-up parser automatically
2.2.5.1 LR(0) parsing
2.2.5.3 LR(0) conflicts
2.2.5.4 SLR(1) parsing
2.2.5.5 LR(1) parsing
2.2.5.6 LALR(1) parsing
2.2.5.7 Making a grammar LR(1) - or not
2.2.5.8 Error handling in LR parsers
2.2.5.9 A traditional bottom-up parser generator - yacc/bison
The main task of a bottom-up parser is to find the leftmost node that has not yet been constructed but all of whose children have been constructed. This sequence of children is called the handle, because this is where we get hold of the next node to be constructed. Constructing a node for a parent N and connecting the children in the handle to that node is called reducing the handle to N. In Figure 2.52, node 1, terminal t6, and node 2 together form the handle, which has just been reduced to node 3 at the moment the picture was taken.
To construct that node we have to find the handle and we have to know to which right-hand side of which non-terminal it corresponds: its reduction rule. It will be clear that finding the handle involves searching both the syntax tree as constructed so far and the input. Once we have found the handle and its reduction rule, our troubles are over: we reduce the handle to the non-terminal of the reduction rule, and restart the parser to find the next handle.
Although there is effectively only one deterministic top-down parsing algorithm, LL(k), there are several different bottom-up parsing algorithms. All these algorithms differ only in the way they find a handle; the last phase, reduction of the handle to a non-terminal, is the same for each of them. We mention the following bottom-up algorithms here:
- precedence parsing: pretty weak, but still used in simple parsers for anything that looks like an arithmetic expression;
- bounded-context parsing, with k tokens left context and m tokens right context: reasonably strong and for that reason very popular in the 1970s, especially BC(2,1), but now out of fashion;
- LR(0): theoretically important but too weak to be useful;
- SLR(1): an upgraded version of LR(0), but still fairly weak;
- LR(1): like LR(0) but both very powerful and very memory-consuming; and
- LALR(1): a slightly watered-down version of LR(1), which is both powerful and usable: the workhorse of present-day bottom-up parsing.
We will first concentrate on LR(0), since it shows all the principles in a nutshell. The steps to LR(1) and from there to LALR(1) are then simple. It turns out that finding a handle is not a simple thing to do, and all the above algorithms, with the possible exception of precedence parsing, require so much detail that it is humanly impossible to write a bottom-up parser by hand: all bottom-up parser writing is done by a parser generator.
One of the immediate advantagesof bottom-up parsing is that it has no problems with leftrecursion. We can therefore improve our grammar of Figure 2.53 so as to generate the proper left-associative syntax tree for the + operator. The result is left-recursive -see Figure 2.84. We have also removed the non-terminal parenthesized-expression by substituting it; the grammar is big enough as it is.
input → expression EOF
expression → term | expression '+' term
term → IDENTIFIER | '(' expression ')'

Figure 2.84 A simple grammar for demonstrating bottom-up parsing.
LR parsers are best explained using diagrams with item sets in them. To keep these diagrams manageable, it is customary to represent each non-terminal by a capital letter and each terminal by itself or by a single lower-case letter. The end-of-input token is traditionally represented by a dollar sign. This form of the grammar is shown in Figure 2.85; we have abbreviated the input to Z, to avoid confusion with the i, which stands for IDENTIFIER.
Z → E $
E → T | E + T
T → i | ( E )

Figure 2.85 An abbreviated form of the simple grammar for bottom-up parsing.
In the beginning of our search for a handle, we have only a vague idea of what the handle can be and we need to keep track of many different hypotheses about it. In lexical analysis, we used dotted items to summarize the state of our search and sets of items to represent sets of hypotheses about the next token. LR parsing uses the same technique: item sets are kept in which each item is a hypothesis about the handle. Where in lexical analysis these item sets are situated between successive characters, here they are situated between successive grammar symbols. An item N→α•β between two grammar symbols means that we maintain the hypothesis that this αβ is a possible handle, to be reduced to N when actually found applicable, and that the part α has already been recognized directly to the left of this point. When the dot reaches the right end of the item, as in N→αβ•, we have identified a handle. The members of the right-hand side αβ have all been recognized, since the item has been obtained by moving the dot successively over each member of them. These members can now be collected as the children of a new node N. As in the lexical analyzer, an item with the dot at the end is called a reduce item.
The various LR parsing methods differ in the exact form of their LR items, but not in the methods of using them. So there are LR(0) items, SLR(1) items, LR(1) items, and LALR(1) items, and the methods of their construction differ, but there is essentially only one way of using them.
We will now demonstrate how LR items are used to do bottom-up parsing. Assume the input is i+i; we are interested in the initial item set, the set of hypotheses about the handle we have before the first token. Initially, we know only one node of the tree: the top. This gives us the first possibility for the handle: Z→•E$, which means that if we manage to recognize an E followed by end-of-input, we have found a handle which we can reduce to Z, the top of the syntax tree. But since the dot is still at the beginning of the right-hand side, it also means that we have not seen any of these grammar symbols yet. The first thing we need to see is an E. The dot in front of the non-terminal E suggests that we may be looking at the wrong symbol at the moment and that the actual handle may derive from E. So we add two new items to the initial item set, one for each alternative of E: E→•T and E→•E+T, which describe two other hypotheses about the handle. Now we have a dot in front of another non-terminal, T, which suggests that perhaps the handle derives from T; this adds two more items to the initial item set: T→•i and T→•(E). The dot in front of the E in E→•E+T suggests also that the handle could derive from E, but we knew that already, so this introduces no new hypotheses. So our initial item set, s0, contains five hypotheses about the handle:

Z → •E$
E → •T
E → •E+T
T → •i
T → •(E)
As with a lexical analyzer, the initial item set is positioned before the first input symbol:

s0    i       +       i       $

where we have left open spaces between the symbols to accommodate the future item sets. Note that the four additional items in the item set s0 are the result of ε moves, moves made by the handle-searching automaton without consuming input. As before, the ε moves are performed because the dot is in front of something that cannot be matched directly. The construction of the complete LR item set is also very similar to that of a lexical item set: the initial contents of the item set are brought in from outside and the set is completed by an ε closure algorithm. An ε closure algorithm for LR item sets is given in Figure 2.86. To be more precise, it is the ε closure algorithm for LR(0) item sets, and s0 is an LR(0) item set. Other ε closure algorithms will be shown below. The ε closure algorithm expects the initial contents to be brought in from elsewhere; for the initial item set s0 this consists of the item Z→•S$, where S is the start symbol of the grammar and $ represents the end-of-input. The important part is the inference rule.
Data definitions:
    A set S of LR(0) items.
Initializations:
    S is prefilled externally with one or more LR(0) items.
Inference rules:
    If S holds an item of the form P→α•Nβ, then for each production rule N→γ in G, S must also contain the item N→•γ.

Figure 2.86 ε closure algorithm for LR(0) item sets for a grammar G.
The inference rule predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is sometimes called the prediction rule; it corresponds to an ε move, in that it allows the automaton to move to another state without consuming input. Note that the dotted items plus the prediction rule represent a top-down component in our bottom-up algorithm. The items in an item set form one or more sets of top-down predictions about the handle, ultimately deriving from the start symbol. Since the predictions are kept here as hypotheses in a set rather than being transformed immediately into syntax tree nodes as they are in the LL(1) algorithm, left-recursion does not bother us here.
Using the same technique as with the lexical analyzer, we can now compute the contents of the next item set s1, the one between the i and the +. There is only one item in s0 in which the dot can be moved over an i: T→•i. Doing so gives us the initial contents of the new item set s1: { T→i• }. Applying the prediction rule does not add anything, so this is the new item set. Since it has the dot at the end, it is a reduce item and indicates that we have found a handle. More precisely, it identifies i as the handle, to be reduced to T using the rule T→i. When we perform this reduction and construct the corresponding part of the syntax tree, the input looks schematically as follows:
T
|
i    +    i    $

Having done one reduction, we restart the algorithm, which of course comes up with the same value for s0, but now we are looking at the non-terminal T rather than at the unreduced i. There is only one item in s0 in which the dot can be moved over a T: E→•T. Doing so gives us the initial contents of a new value for s1: { E→T• }. Applying the prediction rule does not add anything, so this is the new item set; it contains one reduce item. After reduction by E→T, the input looks as follows:

E
|
T
|
i    +    i    $

and it is quite satisfying to see the syntax tree grow. Restarting the algorithm, we finally get a really different initial value for s1, the set

Z → E•$
E → E•+T
We now have:
E
|
T
|
i    +    i    $

The next token in the input is a +. There is one item in s1 that has the dot in front of a +: E→E•+T. So the initial contents of s2 are { E→E+•T }. Applying the prediction rule yields two more items, for a total of three for s2:

E → E+•T
T → •i
T → •(E)

Going through the same motions as with s0 and again reducing the i to T, we get:

E
|
T         T
|         |
i    +    i    $

Now there is one item in s2 in which the dot can be carried over a T: E→E+•T, which identifies a new handle, E + T, which is to be reduced to E. So we finally find a case in which our hypothesis that the handle might be E + T is correct. Remember that this hypothesis already occurs in the construction of s0. Performing the reduction we get:
       E
    ___|___
   /   |   \
  E    |    T
  |    |    |
  T    |    |
  |    |    |
  i    +    i    $

which brings us back to a value of s1 that we have seen already:

Z → E•$
E → E•+T

Unlike last time, the next token in the input is now the end-of-input token $. Moving the dot over it gives us s2, { Z→E$• }, which contains one item, a reduce item; it shows that a handle has been found and says that E $ must be reduced to Z:
            Z
          __|__
         /     \
        E       \
     ___|___     \
    /   |   \     \
   E    |    T     \
   |    |    |      \
   T    |    |       \
   |    |    |        \
   i    +    i         $
This final reduction completes the syntax tree and ends the parsing process. Note how the LR parsing process (and any bottom-up parsing technique for that matter) structures the input, which is still there in its entirety.
Precomputing the item sets

The above demonstration of LR parsing shows two major features that need to be discussed further: the computation of the item sets and the use of these sets. We will first turn to the computation of the item sets.
The item sets of an LR parser show considerable similarities to those of a lexical analyzer. Their number is finite and not embarrassingly large, and we can define routines Initial item set() and Next item set() with meanings corresponding to those in the lexical analyzer. We can therefore precompute the contents of all the reachable item sets and the values of Initial item set() and Next item set() for all their parameters. Even the bodies of the two routines for LR(0) items, shown in Figures 2.87 and 2.88, are similar to those for the lexical analyzer, as we can see when we compare them to the ones in Figures 2.22 and 2.23. One difference is that LR item sets are moved over grammar symbols, rather than over characters. This is reflected in the first parameter of Next item set(), which now is a Symbol.
Another is that there is no need to test if S is a basic pattern (compare Figure 2.23). This is because we have restricted ourselves here to grammars in BNF notation. So S cannot be a non-basic pattern; if, however, we allow EBNF, the code in Figure 2.88 will have to take repetition and combination operators into account.
FUNCTION Initial item set RETURNING an item set:
    SET New item set TO Empty;
    // Initial contents - obtain from the start symbol:
    FOR EACH production rule S→α for the start symbol S:
        SET New item set TO New item set + item S→•α;
    RETURN ε closure (New item set);

Figure 2.87 The routine Initial item set for an LR(0) parser.

FUNCTION Next item set (Item set, Symbol) RETURNING an item set:
    SET New item set TO Empty;
    // Initial contents - obtain from token moves:
    FOR EACH item N→α•Sβ IN Item set:
        IF S = Symbol:
            SET New item set TO New item set + item N→αS•β;
    RETURN ε closure (New item set);

Figure 2.88 The routine Next item set() for an LR(0) parser.
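To make the precomputation concrete, the following is a small self-contained C sketch, not taken from the book, of the item-set computations of Figures 2.86 to 2.88 for the grammar of Figure 2.85. Items are encoded as bits in a machine word; names such as item_bit and next_item_set are ours.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* The grammar of Figure 2.85; rule 0 is Z -> E $. */
struct rule { char lhs; const char *rhs; };
static const struct rule rules[] = {
    {'Z', "E$"}, {'E', "T"}, {'E', "E+T"}, {'T', "i"}, {'T', "(E)"}
};
enum { NRULES = 5 };

/* An item is a (rule, dot) pair; an item set is a bit mask over all items. */
typedef unsigned int ItemSet;

static int item_bit(int r, int d) {
    int bit = 0, i;
    for (i = 0; i < r; i++) bit += strlen(rules[i].rhs) + 1;
    return bit + d;
}

/* The epsilon closure of Figure 2.86: keep adding predicted items
   N -> .gamma for every dot that stands before the non-terminal N. */
static ItemSet closure(ItemSet s) {
    int changed = 1;
    while (changed) {
        int r, d, p;
        changed = 0;
        for (r = 0; r < NRULES; r++)
            for (d = 0; rules[r].rhs[d] != '\0'; d++)
                if ((s & (1u << item_bit(r, d)))
                    && isupper((unsigned char)rules[r].rhs[d]))
                    for (p = 0; p < NRULES; p++)
                        if (rules[p].lhs == rules[r].rhs[d]
                            && !(s & (1u << item_bit(p, 0)))) {
                            s |= 1u << item_bit(p, 0);
                            changed = 1;
                        }
    }
    return s;
}

/* Next item set (Figure 2.88): move the dot over 'sym' wherever possible. */
static ItemSet next_item_set(ItemSet s, char sym) {
    ItemSet n = 0;
    int r, d;
    for (r = 0; r < NRULES; r++)
        for (d = 0; rules[r].rhs[d] != '\0'; d++)
            if ((s & (1u << item_bit(r, d))) && rules[r].rhs[d] == sym)
                n |= 1u << item_bit(r, d + 1);
    return closure(n);
}

static void print_item_set(const char *name, ItemSet s) {
    int r, d;
    printf("%s:\n", name);
    for (r = 0; r < NRULES; r++)
        for (d = 0; d <= (int)strlen(rules[r].rhs); d++)
            if (s & (1u << item_bit(r, d)))
                printf("  %c -> %.*s.%s\n",
                       rules[r].lhs, d, rules[r].rhs, rules[r].rhs + d);
}

int main(void) {
    /* The initial item set s0 is the closure of { Z -> .E$ } (Figure 2.87). */
    ItemSet s0 = closure(1u << item_bit(0, 0));
    print_item_set("s0", s0);
    print_item_set("s0 over i", next_item_set(s0, 'i'));
    print_item_set("s0 over E", next_item_set(s0, 'E'));
    return 0;
}

Running this sketch reproduces the five items of s0 derived above and the item sets reached over i and over E.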
Calling Initial item set() yields s0, and repeated application of Next item set() gives us the other reachable item sets, in an LR analog of the lexical subset algorithm explained in Section 2.1.6.3. The reachable item sets are shown, together with the transitions between them, in Figure 2.89. The reduce items, the items that indicate that a handle has been found, are marked by a double rim. We recognize the sets S0, S5, S6, S1, S3, S4 and S2 (in that order) from the parsing of i+i; the others will occur in parsing different inputs. The transition table is shown in Figure 2.90. This tabular version of Next item set() is traditionally called the GOTO table in LR parsing. The empty entries stand for the empty set of hypotheses; if the empty set is obtained while searching for the handle, there is no hypothesis left, no handle can be found, and there is a syntax error. The empty set is also called the error state. It is quite representative that most of the GOTO table is empty; also the non-empty part shows considerable structure. Such LR tables are excellent candidates for transition table compression.
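As an illustration of the kind of table compression alluded to here, the following C sketch, ours rather than the book's, packs the rows of a sparse GOTO-like table into one array by shifting each row until its non-empty entries fall into holes left by the rows already placed; a check array records which row owns each slot and so resolves the value/hole ambiguity. The table contents assume the reconstruction of Figure 2.90 shown below.

#include <stdio.h>

enum { NROWS = 10, NCOLS = 7 };          /* columns: i + ( ) $ E T; 0 = empty */

static const int table[NROWS][NCOLS] = {
    {5, 0, 7, 0, 0, 1, 6}, {0, 3, 0, 0, 2, 0, 0}, {0, 0, 0, 0, 0, 0, 0},
    {5, 0, 7, 0, 0, 0, 4}, {0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0}, {5, 0, 7, 0, 0, 8, 6}, {0, 3, 0, 9, 0, 0, 0},
    {0, 0, 0, 0, 0, 0, 0},
};

static int packed[NROWS * NCOLS];        /* the shifted strips            */
static int check[NROWS * NCOLS];         /* owner of each slot, -1 = free */
static int displacement[NROWS];          /* starting position of each row */

static void compress(void) {
    int r, c, d, i;
    for (i = 0; i < NROWS * NCOLS; i++) check[i] = -1;
    for (r = 0; r < NROWS; r++) {
        for (d = 0; ; d++) {             /* first displacement that fits  */
            int fits = 1;
            for (c = 0; c < NCOLS; c++)
                if (table[r][c] != 0 && check[d + c] != -1) { fits = 0; break; }
            if (fits) break;
        }
        displacement[r] = d;
        for (c = 0; c < NCOLS; c++)
            if (table[r][c] != 0) { packed[d + c] = table[r][c]; check[d + c] = r; }
    }
}

/* Lookup consults the check array to tell a real entry from a hole. */
static int lookup(int row, int col) {
    int i = displacement[row] + col;
    return check[i] == row ? packed[i] : 0;
}

int main(void) {
    int r, c, i, used = 0;
    compress();
    for (r = 0; r < NROWS; r++)          /* verify against the original table */
        for (c = 0; c < NCOLS; c++)
            if (lookup(r, c) != table[r][c]) { printf("bug\n"); return 1; }
    for (i = 0; i < NROWS * NCOLS; i++)
        if (check[i] != -1) used = i + 1;
    printf("%d entries packed into %d slots\n", NROWS * NCOLS, used);
    return 0;
}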
Figure 2.89 Transition diagram for the LR(0) automaton for the grammar of Figure 2.85.
The use of the item sets differs considerably from that in a lexical analyzer, the reason being that we are dealing with a push-down automaton here rather than with a finite-state automaton. The LR push-down automaton also differs from an LL push-down automaton. Its stack consists of an alternation of states and grammar symbols, starting and ending with a state. The grammar symbols on an LR stack represent the input that has already been reduced. It is convenient to draw LR reduction stacks horizontally with the top to the right:

s0 A1 s1 A2 ... An sn

where An is the n-th grammar symbol on the stack and the state sn is the top of the stack. Like the LL automaton, the LR automaton has two major moves and a minor move, but they are different:

- Shift: The shift move removes the first token from the present input and pushes it onto the stack. A new state is determined using the GOTO table indexed by the old state and the input token, and is pushed onto the stack. If the new state is the error state, a syntax error has been found.
                      GOTO table                    ACTION table
          symbol
state       i     +     (     )     $     E     T
  0         5           7                 1     6   shift
  1               3                 2                shift
  2                                                  Z → E $
  3         5           7                       4   shift
  4                                                  E → E + T
  5                                                  T → i
  6                                                  E → T
  7         5           7                 8     6   shift
  8               3           9                      shift
  9                                                  T → ( E )

Figure 2.90 GOTO and ACTION tables for the LR(0) automaton for the grammar of Figure 2.85.
- Reduce: The reduce move is parameterized with the production rule N→α to be used in the reduction. The grammar symbols in α, with the states following them, are removed from the stack; in an LR parser they are guaranteed to be there. N is then pushed onto the stack, and the new state is determined using the GOTO table and pushed on top of it. In an LR parser this is guaranteed not to be the error state.
- Termination: The input has been parsed successfully when it has been reduced to the start symbol. If there are tokens left in the input though, there is a syntax error.

The state on top of the stack in an LR(0) parser determines which of these moves is applied. The top state indexes the so-called ACTION table, which is comparable to Class of token recognized() in the lexical analyzer. Like the latter, it tells us whether we have found something or should go on shifting input tokens, and if we have found something it tells us what it is. The ACTION table for our grammar is shown as the rightmost column in Figure 2.90. For states that have outgoing arrows it holds the entry 'shift'; for states that contain exactly one reduce item, it holds the corresponding rule.
We can now summarize our demonstration of the parsing of i+i in a few lines; see Figure 2.91. The code for the LR(0) parser can be found in Figure 2.92. Comparison to Figure 2.69 shows a clear similarity to the LL push-down automaton, but there are also considerable differences. Whereas the stack of the LL automaton contains grammar symbols only, the stack of the LR automaton consists of an alternating sequence of states and grammar symbols, starting and ending with a state, as shown, for example, in Figure 2.91 and in many other figures.

Reduction stack              Rest of input    Action
s0                           i + i $          shift
s0 i s5                      + i $            reduce by T → i
s0 T s6                      + i $            reduce by E → T
s0 E s1                      + i $            shift
s0 E s1 + s3                 i $              shift
s0 E s1 + s3 i s5            $                reduce by T → i
s0 E s1 + s3 T s4            $                reduce by E → E + T
s0 E s1                      $                shift
s0 E s1 $ s2                                  reduce by Z → E $
s0 Z                                          stop

Figure 2.91 LR(0) parsing of the input i+i.

Parsing terminates when the entire input has been reduced to the start symbol of the grammar, and when that start symbol is followed on the stack by the end state; as with the LL(1) automaton this will happen only when the EOF token has also been reduced. Otherwise, the state on top of the stack is looked up in the ACTION table. This results in 'shift', 'reduce using rule N→α', or 'erroneous'. If the entry is 'erroneous' there was a syntax error; this cannot happen in an LR(0) parser, but the possibility is mentioned here for compatibility with other LR parsers. For 'shift', the next input token is stacked and a new state is stacked on top of it. For 'reduce', the grammar symbols in α are popped off the stack, including the intervening states. The non-terminal N is then pushed onto the stack, and a new state is determined by consulting the GOTO table and stacked on top of it. This new state cannot be 'erroneous' in any LR parser (see Exercise 2.48).
Above we stated that bottom-up parsing, unlike top-down parsing, has no problems with left-recursion. On the other hand, bottom-up parsing has a slight problem with right-recursive rules, in that the stack may grow proportionally to the size of the input program; normally the maximum stack size is proportional to the logarithm of the program size. This is mainly a problem for parsers with a fixed stack size; since parsing time is already linear in the size of the input, adding another linear component does not much degrade parsing speed. Some details of the problem are considered in Exercise 2.51.

The above LR(0) method would appear to be a fail-safe method to create a deterministic parser for any grammar, but appearances are deceptive in this case: we selected the grammar carefully for the example to work. We can make a transition diagram for any grammar and we can make a GOTO table for any grammar, but we cannot make a deterministic ACTION table for just any grammar. The innocuous-looking sentence about the construction of the ACTION table may have warned the reader; we repeat it here: 'For states that have outgoing arrows it holds the entry "shift"; for states that contain exactly one reduce item, it holds the corresponding rule.'
IMPORT Input token [1..];    // from the lexical analyzer

SET Input token index TO 1;
SET Reduction stack TO Empty stack;
PUSH Start state ON Reduction stack;
WHILE Reduction stack /= {Start state, Start symbol, End state}:
    SET State TO Top of Reduction stack;
    SET Action TO Action table [State];
    IF Action = "shift":
        // Do a shift move:
        SET Shifted token TO Input token [Input token index];
        SET Input token index TO Input token index + 1;    // shifted
        PUSH Shifted token ON Reduction stack;
        SET New state TO Goto table [State, Shifted token .class];
        PUSH New state ON Reduction stack;    // can be Empty
    ELSE IF Action = ("reduce", N→α):
        // Do a reduction move:
        Pop the symbols of α from Reduction stack;
        SET State TO Top of Reduction stack;    // update State
        PUSH N ON Reduction stack;
        SET New state TO Goto table [State, N];
        PUSH New state ON Reduction stack;    // cannot be Empty
    ELSE Action = Empty:
        ERROR "Error at token ", Input token [Input token index];

Figure 2.92 LR(0) parsing with a push-down automaton.
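For comparison with the outline of Figure 2.92, here is a hedged, self-contained C version - our sketch, not the book's code - of the LR(0) push-down automaton for the grammar of Figure 2.85, driven by the GOTO and ACTION tables of Figure 2.90. The lexical analyzer is replaced by reading single characters, '$' stands for end-of-input, and only the states are kept on the stack, since the grammar symbols are never consulted by the algorithm.

#include <stdio.h>

enum { NSTATES = 10, NCOLS = 7 };             /* columns: i + ( ) $ E T */

static int sym_col(char c) {
    switch (c) {
    case 'i': return 0; case '+': return 1; case '(': return 2;
    case ')': return 3; case '$': return 4; case 'E': return 5; case 'T': return 6;
    default:  return -1;
    }
}

/* GOTO table of Figure 2.90; -1 is the empty entry (the error state). */
static const int goto_table[NSTATES][NCOLS] = {
    { 5, -1,  7, -1, -1,  1,  6},   /* 0 */
    {-1,  3, -1, -1,  2, -1, -1},   /* 1 */
    {-1, -1, -1, -1, -1, -1, -1},   /* 2 */
    { 5, -1,  7, -1, -1, -1,  4},   /* 3 */
    {-1, -1, -1, -1, -1, -1, -1},   /* 4 */
    {-1, -1, -1, -1, -1, -1, -1},   /* 5 */
    {-1, -1, -1, -1, -1, -1, -1},   /* 6 */
    { 5, -1,  7, -1, -1,  8,  6},   /* 7 */
    {-1,  3, -1,  9, -1, -1, -1},   /* 8 */
    {-1, -1, -1, -1, -1, -1, -1},   /* 9 */
};

/* ACTION table of Figure 2.90: 0 = shift, n > 0 = reduce by rule n. */
static const int action[NSTATES] = { 0, 0, 1, 0, 3, 4, 2, 0, 0, 5 };

static const struct { char lhs; int len; const char *text; } rule[] = {
    {0, 0, 0},
    {'Z', 2, "Z -> E $"}, {'E', 1, "E -> T"}, {'E', 3, "E -> E + T"},
    {'T', 1, "T -> i"},   {'T', 3, "T -> ( E )"},
};

static int parse(const char *input) {
    int states[100], sp = 0;
    const char *p = input;
    states[0] = 0;                            /* the start state */
    for (;;) {
        int act = action[states[sp]];
        if (act == 0) {                       /* shift move */
            int col = sym_col(*p);
            int next = col < 0 ? -1 : goto_table[states[sp]][col];
            if (next < 0) { printf("error at '%c'\n", *p); return 1; }
            printf("shift %c\n", *p);
            p++;
            states[++sp] = next;
        } else {                              /* reduce move */
            printf("reduce by %s\n", rule[act].text);
            if (act == 1) { printf("input accepted\n"); return 0; }
            sp -= rule[act].len;              /* pop the right-hand side */
            states[sp + 1] = goto_table[states[sp]][sym_col(rule[act].lhs)];
            sp++;
        }
    }
}

int main(void) {
    return parse("i+i$");
}

On the input i+i$ this sketch prints the same sequence of shifts and reductions as Figure 2.91.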
This sentence points to two problems: some states may have both outgoing arrows and reduce items; and some states may contain more than one reduce item. The first situation is called a shift-reduce conflict, the second a reduce-reduce conflict. In both cases the ACTION table contains entries with multiple values, and the algorithm is no longer deterministic. If the ACTION table produced from a grammar in the above way is deterministic (conflict-free), the grammar is called LR(0).
Very few grammars are LR(0). For example, a grammar with an ε-rule is not LR(0). Suppose the grammar contains the production rule A→ε. The corresponding reduce item A→• will be predicted by any item of the form P→α•Aβ. The first is a reduce item, the second has an outgoing arrow on A, so we have a shift-reduce conflict. And ε-rules are very frequent in grammars.
Even modest extensions to our example grammar cause trouble. Suppose we extend it to allow array elements in expressions, by adding the production rule T→i[E]. When we construct the transition diagram, we meet the item set corresponding to S5:

T → i•
T → i•[E]

and we have a shift-reduce conflict on our hands: the ACTION table requires both a shift and a reduce, and the grammar is no longer LR(0). Or suppose we want to allow assignments in the input by adding the rules Z→V:=E$ and V→i, where V stands for variable; we want a separate rule for V, since its semantics differs from that of T→i. Now we find the item set corresponding to S5 to be

T → i•
V → i•

and we have a reduce-reduce conflict. These are very common cases. Note that states that do not contain reduce items cannot cause conflicts: reduce items are needed both for shift-reduce and for reduce-reduce conflicts. For more about the non-existence of shift-shift conflicts see Exercise 2.49.
For a run-of-the-mill programming language grammar, one can expect the LR(0) automaton to have some thousands of states. With, say, 50 tokens in the language and 2 or 4 bytes to represent an entry, the ACTION/GOTO table will require some hundreds of kilobytes. Table compression will reduce this to some tens of kilobytes. So the good news is that LR(0) tables claim only a moderate amount of memory; the bad news is that LR(0) tables are almost certainly full of conflicts.
The above examples show that the LR(0) method is just too weak to be useful. This is caused by the fact that we try to decide from the transition diagram alone what action to perform, and that we ignore the input: the ACTION table construction uses a zero-token look-ahead, hence the name LR(0). There are basically three ways to use a one-token look-ahead: SLR(1), LR(1), and LALR(1). All three methods use a two-dimensional ACTION table, indexed by the state on the top of the stack and the first token of the present input. The construction of the states and the table differ, though.
The SLR(1) (DeRemer, 1971) parsing method has little practical significance these days, but we treat it here because we can explain it in a few lines at this stage and because it provides a good stepping stone to the far more important LR(1) method. For one thing it allows us to show a two-dimensional ACTION table of manageable proportions. The SLR(1) method is based on the consideration that a handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N: a reduce item N→α• is applicable only if the look-ahead is in FOLLOW(N). Consequently, SLR(1) has the same transition diagram as LR(0) for a given grammar, and the same GOTO table, but a different ACTION table. Based on this rule and on the FOLLOW sets

FOLLOW(Z) = { $ }
FOLLOW(E) = { ) + $ }
FOLLOW(T) = { ) + $ }

we can construct the SLR(1) ACTION table for the grammar of Figure 2.85. The result is shown in Figure 2.93, in which a reduction to a non-terminal N is indicated only for look-ahead tokens in FOLLOW(N).
                          look-ahead token
state       i         +          (         )          $
  0       shift                shift
  1                 shift                            shift
  2                                                  Z → E $
  3       shift                shift
  4                 E → E+T              E → E+T     E → E+T
  5                 T → i                T → i       T → i
  6                 E → T                E → T       E → T
  7       shift                shift
  8                 shift                shift
  9                 T → (E)              T → (E)     T → (E)

Figure 2.93 ACTION table for the SLR(1) automaton for the grammar of Figure 2.85.
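The SLR(1) reduce criterion itself is easy to express in code. The following fragment is a minimal sketch, ours rather than the book's, assuming FOLLOW sets represented as bit sets over the token numbering; the names are ours.

#include <stdio.h>

/* Tokens numbered so that FOLLOW sets fit in a bit set. */
enum { TK_i, TK_PLUS, TK_LPAREN, TK_RPAREN, TK_EOF };
#define BIT(t) (1u << (t))

/* FOLLOW sets of the grammar of Figure 2.85, as given in the text. */
static const unsigned follow_E = BIT(TK_RPAREN) | BIT(TK_PLUS) | BIT(TK_EOF);
static const unsigned follow_T = BIT(TK_RPAREN) | BIT(TK_PLUS) | BIT(TK_EOF);

/* The SLR(1) criterion: a reduce item N -> alpha. is applicable
   only if the look-ahead token is in FOLLOW(N). */
static int slr_reduce_applicable(unsigned follow_N, int lookahead) {
    return (follow_N & BIT(lookahead)) != 0;
}

int main(void) {
    /* In state 5 (T -> i.) a reduce is allowed on '+' but not on 'i'. */
    printf("%d %d %d\n",
           slr_reduce_applicable(follow_T, TK_PLUS),
           slr_reduce_applicable(follow_T, TK_i),
           slr_reduce_applicable(follow_E, TK_EOF));
    return 0;
}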
When we compare the ACTION table in Figure 2.93 to the GOTO table from Figure 2.90, we see that the columns marked with non-terminals are missing; non-terminals do not occur in the input and they do not figure in look-aheads. Where the ACTION table has 'shift', the GOTO table has a state number; where the ACTION table has a reduction, the GOTO table is empty. It is customary to superimpose the ACTION and GOTO tables in the implementation. The combined ACTION/GOTO table has shift entries of the form sN, which mean 'shift to state N'; reduce entries rN, which mean 'reduce using rule number N'; and of course empty entries, which mean syntax errors. The ACTION/GOTO table is also called the parse table. It is shown in Figure 2.94, in which the following numbering of the grammar rules is used:

1: Z → E $
2: E → T
3: E → E + T
4: T → i
5: T → ( E )

Note that each alternative counts as a separate rule. Also note that there is a lot of structure in the ACTION/GOTO table, which can be exploited by a compression algorithm.
It should be emphasized that in spite of their visual similarity the GOTO and ACTION tables are fundamentally different. The GOTO table is indexed by a state and one grammar symbol that resides on the stack, whereas the ACTION table is indexed by a state and a look-ahead token that resides in the input. That they can be superimposed in the case of a one-token look-ahead is more or less accidental, and the trick is not available for look-ahead lengths other than 1.
When we now introduce a grammar rule T→i[E], we find that the shift-reduce conflict of the LR(0) method has gone away.
                  stack symbol / look-ahead token
state       i      +      (      )      $      E      T
  0        s5            s7                    1      6
  1               s3                   s2
  2                                    r1
  3        s5            s7                           4
  4               r3            r3     r3
  5               r4            r4     r4
  6               r2            r2     r2
  7        s5            s7                    8      6
  8               s3            s9
  9               r5            r5     r5

Figure 2.94 ACTION/GOTO table for the SLR(1) automaton for the grammar of Figure 2.85.
The reduce item T→i• applies only when the look-ahead is one of ), +, and $, so the ACTION table can freely specify a shift for [. The SLR(1) table will now contain the line

state       i      +       [       ]       (      )       $
  5               T → i   shift   T → i           T → i   T → i

Note the reduction on ], since ] is in the new FOLLOW(T). The ACTION table is deterministic and the grammar is SLR(1).
It will be clear that the SLR(1) automaton has the same number of states as the LR(0) automaton for the same grammar. Also, the ACTION/GOTO table of the SLR(1) automaton has the same size as the GOTO table of the LR(0) automaton, but it has fewer empty entries. Experience has shown that SLR(1) is a considerable improvement over LR(0), but is still far inferior to LR(1) or LALR(1). It was a popular method for some years in the early 1970s, mainly because its parsing tables are the same size as those of LR(0). It has now been almost completely superseded by LALR(1).
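One common way to store such a superimposed ACTION/GOTO table - sketched below with our own naming and with the entry placement of the reconstruction in Figure 2.94 - is to encode sN as a positive number, rN as a negative number, and the empty (error) entry as zero; GOTO entries in the non-terminal columns are plain state numbers and are encoded like shifts.

#include <stdio.h>

#define S(n)   (n)        /* shift to state n, or a GOTO entry */
#define R(n)   (-(n))     /* reduce by rule n                  */
#define EMPTY  0          /* syntax error                      */

enum { COL_i, COL_PLUS, COL_LPAREN, COL_RPAREN, COL_EOF, COL_E, COL_T, NCOLS };

static const int parse_table[10][NCOLS] = {
    /* 0 */ { S(5), EMPTY, S(7), EMPTY, EMPTY, S(1), S(6) },
    /* 1 */ { EMPTY, S(3), EMPTY, EMPTY, S(2), EMPTY, EMPTY },
    /* 2 */ { EMPTY, EMPTY, EMPTY, EMPTY, R(1), EMPTY, EMPTY },
    /* 3 */ { S(5), EMPTY, S(7), EMPTY, EMPTY, EMPTY, S(4) },
    /* 4 */ { EMPTY, R(3), EMPTY, R(3), R(3), EMPTY, EMPTY },
    /* 5 */ { EMPTY, R(4), EMPTY, R(4), R(4), EMPTY, EMPTY },
    /* 6 */ { EMPTY, R(2), EMPTY, R(2), R(2), EMPTY, EMPTY },
    /* 7 */ { S(5), EMPTY, S(7), EMPTY, EMPTY, S(8), S(6) },
    /* 8 */ { EMPTY, S(3), EMPTY, S(9), EMPTY, EMPTY, EMPTY },
    /* 9 */ { EMPTY, R(5), EMPTY, R(5), R(5), EMPTY, EMPTY },
};

int main(void) {
    int e = parse_table[0][COL_i];         /* the entry s5 of Figure 2.94 */
    if (e > 0)      printf("s%d\n", e);
    else if (e < 0) printf("r%d\n", -e);
    else            printf("error\n");
    return 0;
}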
The reason why conflict resolution by FOLLOW set does not work nearly as well as one might wish is that it replaces the look-ahead of a single item of a rule N in a given LR state by the FOLLOW set of N, which is the union of all the look-aheads of all alternatives of N in all states. LR(1) item sets are more discriminating: a look-ahead set is kept with each separate item, to be used to resolve conflicts when a reduce item has been reached. This greatly increases the strength of the parser, but also the size of its parse tables.
The LR(1) technique will be demonstrated using the rather artificial grammar shown in Figure 2.95. The grammar has been chosen because, first, it is not LL(1) or SLR(1), so these simpler techniques are ruled out, and second, it is both LR(1) and LALR(1), but the two automata differ.

S → A | x b
A → a A b | B
B → x

Figure 2.95 Grammar for demonstrating the LR(1) technique.

The grammar produces the language { xb, aⁿxbⁿ | n ≥ 0 }. This language can of course be parsed by much simpler means, but that is beside the point: if semantics is attached to the rules of the grammar of Figure 2.95, we want a structuring of the input in terms of that grammar and of no other. It is easy to see that the grammar is not LL(1): x is in FIRST(B), so it is in FIRST(A), and S exhibits a FIRST/FIRST conflict on x.
The grammar is not SLR(1) either, which we can see from the SLR(1) automaton shown in Figure 2.96. Since the SLR(1) technique bases its decision to reduce using an item N→α• on the FOLLOW set of N, these FOLLOW sets have been added to each item in set braces. We see that state S2 contains both a shift item, on b, and a reduce item, B→x• {b $}. The SLR(1) technique tries to solve this conflict by restricting the reduction to those look-aheads that are in FOLLOW(B). Unfortunately, however, b is in FOLLOW(A), so it is also in FOLLOW(B), resulting in an SLR(1) shift-reduce conflict.
The LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item. We will write an LR(1) item thus: N→α•β{σ}, in which σ is the set of tokens that can follow this specific item. When the dot has reached the end of the item, as in N→αβ•{σ}, the item is an acceptable reduce item only if the look-ahead at that moment is in σ; otherwise the item is ignored.
The rules for determining the look-ahead sets are simple. The look-ahead sets of existing items do not change; only when a new item is created, a new look-ahead set must be determined. There are two situations in which this happens.
- When creating the initial item set: the look-ahead set of the initial items in the initial item set S0 contains only one token, the end-of-file token (denoted by $), since that is the only token that can follow the start symbol of the grammar.
- When doing ε moves: the prediction rule creates new items for the alternatives of N in the presence of items of the form P→α•Nβ{σ}; the look-ahead set of each of these items is FIRST(β{σ}), since that is what can follow this specific item in this specific position.
Creating new look-ahead sets requires us to extend our definition of FIRST sets to include such look-ahead sets. The extension is simple: if FIRST(β) does not contain ε, FIRST(β{σ}) is just equal to FIRST(β); if β can produce ε, FIRST(β{σ}) contains all the tokens in FIRST(β), excluding ε, plus the tokens in σ. The ε closure algorithm for LR(1) items is given in Figure 2.98.
Figure 2.96 The SLR(1) automaton for the grammar of Figure 2.95.

Supplying the look-ahead of $ to the start symbol yields the items S→•A{$} and S→•xb{$}, as shown in S0, Figure 2.97. Predicting items for the A in the first item gives us A→•aAb{$} and A→•B{$}, both of which carry $ as a look-ahead, since that is what can follow the A in the first item. The same applies to the last item in S0: B→•x{$}. The first time we see a different look-ahead is in S3, in which the prediction rule for A in the first item yields A→•aAb{b} and A→•B{b}. Both have a look-ahead b, since FIRST(b{$}) = {b}. The rest of the look-ahead sets in Figure 2.97 do not contain any surprises. We are pleased to see that the shift-reduce conflict has gone: state S2 now has a shift on b and a reduce on $. The other states were all right already and have of course not been spoiled by shrinking the look-ahead set. So the grammar of Figure 2.95 is LR(1).

Figure 2.97 The LR(1) automaton for the grammar of Figure 2.95.

The code for the LR(1) automaton is shown in Figure 2.99. The only difference with the LR(0) automaton in Figure 2.92 is that the Action table is now indexed by the state and the look-ahead symbol. The pattern of Figure 2.99 can also be used in a straightforward fashion for LR(k) parsers for k > 1, by simply indexing the ACTION table with more look-ahead symbols. Of course, the ACTION table must have been constructed accordingly.
We see that the LR(1) automaton is more discriminating than the SLR(1) automaton. In fact, it is so strong that any language that can be parsed from left to right with a one-token look-ahead in linear time can be parsed using the LR(1) method: LR(1) is the strongest possible linear left-to-right parsing method. The reason for this is that it can be shown (Knuth, 1965) that the set of LR items implements the best possible breadth-first search for handles.
Data definitions:
    A set S of LR(1) items of the form N→α•β{σ}.
Initializations:
    S is prefilled externally with one or more LR(1) items.
Inference rules:
    If S holds an item of the form P→α•Nβ{σ}, then for each production rule N→γ in G, S must also contain the item N→•γ{τ}, where τ = FIRST(β{σ}).

Figure 2.98 ε closure algorithm for LR(1) item sets for a grammar G.
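The look-ahead computation FIRST(β{σ}) used by this inference rule can be sketched as follows; this is our illustration rather than the book's code, and it assumes that FIRST sets are available as bit sets with a separate bit for ε.

#include <stdio.h>

enum { TK_a, TK_b, TK_x, TK_EOF, EPS };
#define BIT(t) (1u << (t))

/* FIRST(beta {sigma}): scan beta from the left; each symbol contributes
   its FIRST set minus epsilon, and we go on to the next symbol (and
   finally to sigma) only as long as epsilon can still be produced. */
static unsigned first_of_tail(const int *beta, int len,
                              const unsigned *first_of, unsigned sigma) {
    unsigned result = 0;
    int i;
    for (i = 0; i < len; i++) {
        result |= first_of[beta[i]] & ~BIT(EPS);
        if (!(first_of[beta[i]] & BIT(EPS)))
            return result;            /* beta cannot produce epsilon here */
    }
    return result | sigma;            /* all of beta can vanish: add sigma */
}

int main(void) {
    /* Symbols of the grammar of Figure 2.95, with their FIRST sets. */
    enum { SYM_a, SYM_b, SYM_x, SYM_A, SYM_B, NSYMS };
    unsigned first_of[NSYMS] = {
        BIT(TK_a), BIT(TK_b), BIT(TK_x),       /* terminals a, b, x */
        BIT(TK_a) | BIT(TK_x),                 /* FIRST(A)          */
        BIT(TK_x),                             /* FIRST(B)          */
    };
    /* Predicting A in the item A -> a.Ab {$}: beta = b, sigma = {$},
       so the predicted items get look-ahead FIRST(b{$}) = {b}. */
    int beta[] = { SYM_b };
    unsigned la = first_of_tail(beta, 1, first_of, BIT(TK_EOF));
    printf("look-ahead bits: 0x%x (the bit for b is 0x%x)\n", la, BIT(TK_b));
    return 0;
}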
IMPORT Input token [1..];    // from the lexical analyzer

SET Input token index TO 1;
SET Reduction stack TO Empty stack;
PUSH Start state ON Reduction stack;
WHILE Reduction stack /= {Start state, Start symbol, End state}:
    SET State TO Top of Reduction stack;
    SET Look ahead TO Input token [Input token index] .class;
    SET Action TO Action table [State, Look ahead];
    IF Action = "shift":
        // Do a shift move:
        SET Shifted token TO Input token [Input token index];
        SET Input token index TO Input token index + 1;    // shifted
        PUSH Shifted token ON Reduction stack;
        SET New state TO Goto table [State, Shifted token .class];
        PUSH New state ON Reduction stack;    // cannot be Empty
    ELSE IF Action = ("reduce", N→α):
        // Do a reduction move:
        Pop the symbols of α from Reduction stack;
        SET State TO Top of Reduction stack;    // update State
        PUSH N ON Reduction stack;
        SET New state TO Goto table [State, N];
        PUSH New state ON Reduction stack;    // cannot be Empty
    ELSE Action = Empty:
        ERROR "Error at token ", Input token [Input token index];

Figure 2.99 LR(1) parsing with a push-down automaton.
It is possible to define an LR(k) parser, with k > 1, which does a k-token look-ahead. This change affects the ACTION table only: rather than being indexed by a state and a look-ahead token, it is indexed by a state and a look-ahead string of length k. The GOTO table remains unchanged. It is still indexed by a state and one stack symbol, since the symbol in the GOTO table is not a look-ahead; it already resides on the stack. LR(k > 1) parsers are stronger than LR(1) parsers, but only marginally so. If a grammar is not LR(1), chances are slim that it is LR(2). Also, it can be proved that any language that can be expressed by an LR(k > 1) grammar can be expressed by an LR(1) grammar. LR(k > 1) parsing has some theoretical significance but has never become popular.
The increased parsing power of the LR(1) technique does not come entirely free of charge: LR(1) parsing tables are one or two orders of magnitude larger than SLR(1) parsing tables. Whereas the average compressed SLR(1) automaton for a programming language will require some tens of kilobytes of memory, LR(1) tables may require some megabytes of memory, with perhaps ten times that amount required during the construction of the table. This may present little problem in present-day computers, but traditionally compiler writers have been unable or unwilling to use that much memory just for parsing, and ways to reduce the LR(1) memory requirements have been sought. This has resulted in the discovery of LALR(1) parsing. Needless to say, memory requirements for LR(k) ACTION tables with k > 1 are again orders of magnitude larger. A different implementation of LR(1) that reduces the table sizes somewhat has been presented by Fortes Galvez (1992).

When we look carefully at the states in the LR(1) automaton in Figure 2.97, we see that some of the item sets are very similar to some other sets. More in particular, S3 and S10 are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S11, and S8 and S12. What remains of the item set of an LR(1) state when one ignores the look-ahead sets is called the core of the LR(1) state. For example, the core of state S2 in Figure 2.97 is

S → x•b
B → x•

All cores of LR(1) states correspond to LR(0) states. The reason for this is that the contents of the cores are determined only by the results of shifts allowed from other states. These shifts are determined by the GOTO table and are not influenced by the look-aheads. So, given an LR(1) state whose core is an LR(0) state, shifts from the item set in it will produce new LR(1) states whose cores are again LR(0) states, regardless of look-aheads. We see that the LR(1) states are split-up versions of LR(0) states.
Of course this fine split is the source of the power of the LR(1) automaton, but this power is not needed in each and every state. For example, we could easily combine states S8 and S12 into one new state S8,12 holding one item A→aAb•{b $}, without in the least compromising the discriminatory power of the LR(1) automaton. Note that we combine states with the same cores only, and we do this by adding the look-ahead sets of the corresponding items they contain. Next we lead the transitions away from the old states and to the new state. In our example, the transitions on b in S6 and S11, leading to S8 and S12 respectively, are moved to lead to S8,12. The states S8 and S12 can then be removed, reducing the number of states by 1.
Continuing this way, we can reduce the number of states considerably. Due to the possibility of cycles in the LR(1) transition diagrams, the actual algorithm for doing so is much more complicated than shown here (Pager, 1977), but since it is not used in practice, we will not give it in detail.

Figure 2.100 The LALR(1) automaton for the grammar of Figure 2.95.

It would seem that if one goes on combining states in the fashion described above, one would very soon combine two (or more) states into a new state that would have a conflict, since after all we are gradually throwing away the look-ahead information that we have just built up to avoid such conflicts. It turns out that for the average programming language grammar this is not true. Better still, one can almost always afford to combine all states with identical cores, thus reducing the number of states to that of the SLR(1) - and LR(0) - automaton. The automaton obtained by combining all states of an LR(1) automaton that have the same cores is the LALR(1) automaton. The LALR(1) automaton for the grammar of Figure 2.95 is shown in Figure 2.100.
We see that our wholesale combining of states has done no damage: the automaton is conflict-free, and the grammar is LALR(1), as promised. The item for B in S2 retained its look-ahead $, which distinguishes it from the shift on b. The item for B that does have a look-ahead of b (since b is in FOLLOW(B), such an item must exist) sits safely in state S7. The contexts in which these two reductions take place differ so much that the LALR(1) method can keep them apart.
It is surprising how well the LALR(1) method works. It is probably the most popular parsing method today, and has been so for at least twenty years. It combines power - it is only marginally weaker than LR(1) - with efficiency - it has the same memory requirements as LR(0). Its disadvantages, which it shares with the other bottom-up methods, will become clear in the chapter on context handling, especially Section 3.1.7. Still, one wonders if the LALR method would ever have been discovered (LaLonde et al., 1971) if computers in the late 1960s had not been so starved of memory.
One reason why the LALR method works so well is that state combination cannot cause shift-reduce conflicts. Suppose the LALR(1) automaton has a state S with a shift-reduce conflict on the token t. Then S contains at least two items, a shift item A→α•tβ{σ} and a reduce item B→γ•{σ'} whose look-ahead set σ' contains t. The shift item is present in all the LR(1) states that have been combined into S, perhaps with different look-aheads. A reduce item B→γ• with a look-ahead set that includes t must be present in at least one of these LR(1) states, or t would not be in the LALR reduce-item look-ahead set of S. This implies that this LR(1) state already had a shift-reduce conflict, so the conflict was not caused by combining.
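The state-combination step itself amounts to little more than comparing cores and taking the union of look-ahead sets. The following C sketch is ours, with an assumed encoding of items; it shows that step for the pair S8/S12 discussed above (the exact assignment of the look-aheads {$} and {b} to the two states is assumed for the sake of the example).

#include <stdio.h>

/* An item's core is the pair (rule, dot); the look-ahead is a bit set. */
struct lr1_item  { int rule, dot; unsigned lookahead; };
struct lr1_state { int nitems; struct lr1_item item[8]; };

/* Two states have the same core if they contain the same (rule, dot)
   pairs; here the items are assumed to be kept sorted, as a set. */
static int same_core(const struct lr1_state *s, const struct lr1_state *t) {
    int i;
    if (s->nitems != t->nitems) return 0;
    for (i = 0; i < s->nitems; i++)
        if (s->item[i].rule != t->item[i].rule || s->item[i].dot != t->item[i].dot)
            return 0;
    return 1;
}

/* Merge state t into state s by adding the look-ahead sets; transitions
   that led to t must afterwards be redirected to s by the caller. */
static void merge_lookaheads(struct lr1_state *s, const struct lr1_state *t) {
    int i;
    for (i = 0; i < s->nitems; i++)
        s->item[i].lookahead |= t->item[i].lookahead;
}

int main(void) {
    enum { LA_b = 1u << 0, LA_EOF = 1u << 1 };
    /* Hypothetical encoding: rule 2 is A -> a A b, dot position 3. */
    struct lr1_state s8  = { 1, { { 2, 3, LA_EOF } } };
    struct lr1_state s12 = { 1, { { 2, 3, LA_b   } } };
    if (same_core(&s8, &s12)) {
        merge_lookaheads(&s8, &s12);
        printf("merged look-ahead set: 0x%x\n", s8.item[0].lookahead); /* {b $} */
    }
    return 0;
}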
Although the chances for a grammar to be LR(1) are much larger than those of being SLR(1) or LL(1), one often encounters a grammar that still is not LR(1). The reason is generally that the grammar is ambiguous, which implies that no deterministic method would work. And the reason that one wants to use ambiguous grammars is that they often allow the properties of the language to be stated more simply. An example is the grammar fragment for the conditional statement in C:

if-statement → 'if' '(' expression ')' statement
             | 'if' '(' expression ')' statement 'else' statement
statement → ... | if-statement | ...

which exhibits the dangling else problem. Now the statement

if (x > 0) if (y > 0) p = 0; else q = 0;

is ambiguous, since it has two syntax trees, as shown in Figures 2.101 and 2.102. The C manual states that the tree in Figure 2.101 is the correct parsing. In the LR(1) automaton, we notice the problem by observing that the state that contains the two items

item 1:  if-statement → 'if' '(' expression ')' statement • { ... 'else' ... }
item 2:  if-statement → 'if' '(' expression ')' statement • 'else' statement

has a shift-reduce conflict on the token else. Note that such a state must exist. The item marked 'item 1' must occur in at least one state, or else the corresponding if-statement could never be recognized; call this state X. Since the first part of the if-else-statement is the same as that of the if-statement, the if-else-statement will also lead to state X, since until the else has been seen, the parser cannot see any difference. So the item marked 'item 2' must also be present in state X.
if-statement
    'if'  '('  expression  ')'  statement
                 x > 0            |
                              if-statement
                                  'if'  '('  expression  ')'  statement  'else'  statement
                                               y > 0            p = 0;             q = 0;

Figure 2.101 A possible syntax tree for a nested if-statement.

Resolving shift-reduce conflicts

Shift-reduce conflicts are traditionally solved in an LR parser generator in a way that is similar to the conflict resolution used in lexical analyzers: the longest possible sequence of grammar symbols is taken for reduction. This is very simple to implement: in a shift-reduce conflict, do the shift. Note that if there is more than one shift-reduce conflict in the same state, this criterion solves them all. As with the lexical analyzer, this almost always does what one wants. In the case of the dangling else, the else gets shifted and as a result is paired with the latest if without an else, thus conforming to the C manual.
if-statement
    'if'  '('  expression  ')'  statement  'else'  statement
                 x > 0            |                   q = 0;
                              if-statement
                                  'if'  '('  expression  ')'  statement
                                               y > 0            p = 0;

Figure 2.102 An alternative syntax tree for the same nested if-statement.

So in general, shift-reduce conflicts are not removed from the grammar but are resolved by shift preference in the parser generator.
Another useful technique for resolving shift-reduce conflicts is the use of precedences between tokens. The word 'precedence' is used here in the traditional sense, in which, for example, the multiplication sign has a higher precedence than the plus sign; the notion may be extended to other tokens as well in parsers. This method can be applied only if the reduce item in the conflict ends in a token followed by at most one non-terminal, but many do. In that case we have the following situation, which has a shift-reduce conflict on t:

P → α•tβ{...}          (the shift item)
Q → γuR•{...t...}      (the reduce item)

where R is either empty or one non-terminal. Now, if the look-ahead is t, we perform one of the following three actions:
1. if symbol u has a higher precedence than symbol t, we reduce; this yields a node Q containing u and leaves t outside of it to the right;
2. if t has a higher precedence than u, we shift; this continues with the node for P, which will contain t when recognized eventually, and leaves u out of it to the left;
3. if both have equal precedence, we also shift (but see Exercise 2.55).
This method requires that the user of the parser generator supply precedence information. It allows considerable control over the resolution of shift-reduce conflicts. Note that the dangling else problem can also be solved by giving the else token the same precedence as the ) token; then we do not have to rely on a built-in preference for shifting in a shift-reduce conflict.
Reduce-reduce conflicts correspond to the situation in a lexical analyzer in which the longest token still matches more than one pattern. The most common built-in resolution rule is the same as in a lexical analyzer: the textually first grammar rule in the parser generator wins. This is easy to implement and is usually satisfactory.
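In yacc and bison, precedence information is supplied with %left, %right and %nonassoc declarations, and %prec overrides the precedence of a single rule. The fragment below is a hedged sketch, not taken from the book, of how both the arithmetic and the dangling-else shift-reduce conflicts might be resolved this way; the token and rule names are ours.

%token IDENTIFIER IF ELSE
%nonassoc THEN          /* a pseudo-token with lower precedence than ELSE,   */
%nonassoc ELSE          /* so the conflict on ELSE is resolved as a shift    */
%left '+'               /* '+' and '*' are left-associative;                 */
%left '*'               /* '*' has the higher precedence                     */

%%

statement
    : IF '(' expression ')' statement %prec THEN
    | IF '(' expression ')' statement ELSE statement
    | IDENTIFIER '=' expression ';'
    ;

expression
    : expression '+' expression
    | expression '*' expression
    | IDENTIFIER
    ;

Giving the short if-rule the precedence of THEN plays the same role as giving else the same precedence as ) in the discussion above: in the conflict on ELSE the shift wins, and the else is paired with the nearest if.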
Error handling in LR parsers

An LR parser detects a syntax error when it finds an empty entry in the ACTION table; such an error situation is characterized by a reduction stack

s0 A1 s1 A2 ... At st

and an input token tx for which ACTION[st, tx] is empty. To recover from the error we need to reach a situation in which this is no longer true. Since two parties are involved, the stack and the input, we can consider modifying either or both. But, just as in Section 2.2.4.5, modifying the stack endangers our chances of obtaining a correct syntax tree. Actually the situation is even worse in an LR parser, since removing grammar symbols from the reduction stack implies throwing away parts of the syntax tree that have already been found to be correct. There are many proposed techniques to do such repairs, almost all of them moderately successful; some even search the states on the stack and the next few input tokens to find the most promising repair (McKenzie et al., 1995).
It would be attractive not to modify the stack, but this is difficult. Several techniques have been proposed. If the top state st allows a shift or reduction on a token, say tr, one can insert tr and perform the shift or reduction. Unfortunately, this has a good chance of bringing us back to a situation with the same top state st, and since the rest of the input has not changed either, history will repeat itself. We have seen that the acceptable-set techniques from Section 2.2.4.5 avoid modifying the stack, so they suggest themselves for LR parsers too, but they are less successful there. A naive approach is to take the set of correct tokens as the acceptable set. This causes the parser to discard tokens from the input one by one until a token is found that does have a non-empty entry in the ACTION table, so parsing can continue; but this panic-mode error recovery may discard important tokens and yields bad results. An approach similar to the one described for LL parsers is much more complicated for LR parsers. For these reasons, most practical error recovery methods in LR parsers tend to modify the stack.
The best known method is the one used by the LALR(1) parser generator yacc (Poonen, 1977). The method requires some non-terminals to be chosen as error-recovering non-terminals; these are usually the 'big names' from the grammar: declaration, expression, etc. If a syntax error is detected while constructing a node for an error-recovering non-terminal, say R, the idea is to give up the entire attempt to construct that node, construct a dummy node instead that has the proper attributes, and discard tokens from the input until one is found that indicates the end of the damaged production of R in the input. Needless to say, finding the end of the damaged production is the risky part.
This idea is implemented as follows. The grammar writer adds the alternative erroneous to the right-hand side of one or more non-terminals, thereby marking them as non-terminals that are licensed to produce a dummy syntax subtree. During the construction of the LR states, each state that contains an item of the form N→α•Rβ, in which R is an error-recovering non-terminal, is marked as 'error-recovering'.
When a syntax error occurs, the top of the stack exhibits a state st and the present input starts with a token tx, such that ACTION[st, tx] is empty. See Figure 2.103, in which we assume that R was defined as

R → G H I | erroneous

and that we have already recognized and reduced the G and H. The pseudo-terminal erroneous-R represents the dummy node that is allowed as an alternative of R.

R → •G H I
R → •erroneous-R

Figure 2.103 LR error recovery - detecting the error.

The error recovery starts by removing elements from the top of the stack one by one until it finds an error-recovering state; see Figure 2.104, where the algorithm finds the error-recovering state sr. Note that this action removes correctly parsed nodes that could have become part of the tree for R. We now construct the dummy node erroneous-R for R, push R onto the stack, and use the GOTO table to determine the new state on top of the stack.
R-> .G H I R -5 . erroneous-R G->
.....
Figure 2.104 LR error recovery -finding an error recovery state.
p-> p->
. t y ... ‘t,
...
...
~~
Figure 2.105 LR error recovery -repairing the stack.
0
~-~ p-> I
.
. t z ...
.
Figure 2.106 LR error recovery -repairing the input.
the stack. Since the error-recovering state contains the item N+a@RP, we can be certain that the new state is not empty, as shown in Figure2.105. The new state S, defines a set of
I p-> t , .
1
...
I l
f . ,
l
...
w e 2.307 LR esror recovery -restart in^ the passer.
acceptable tokens, tokens for which the row ACTION[s,,,..] contains a non-empty entry; these are the tokens that are acceptable in sz. We then discard tokens from the input until we find a token t2 that is in the acceptable set and can therefore follow R. Note that this action attempts to remove the rest of the production of R from the input; see Figure 2.106. Nowatleast one parsing step canbetaken,since ACTION[s,, tz.] is not empty" This prevents looping. The final situation is depicted in Figure 2.107. The procedure described here cannot loop, restricts the damage to the syntax tree toa known place and has a reasonable chance of getting the parser on the rails again. There is a risk, however, that it will discard an important token and derail the parser further. Also, the rest of the compiler must be based on the g r a ~ m a as r extended with the alte~atives erroneous in all error-recovering non-te~inals.In the above example that mean^ that all code that processes nodesof type R must allow the possibility that the nodeis actually a ~ u m m ynode erroneous-R.
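In yacc and bison the error-recovering alternative is not written as a pseudo-terminal erroneous but with the reserved token error; the short sketch below, with rule names of our own choosing, shows how a grammar writer might mark statements as error-recovering. The macro yyerrok, provided by the generated parser, tells it to leave its error state immediately.

%token IDENTIFIER NUMBER

%%

statement_list
    : /* empty */
    | statement_list statement
    ;

statement
    : IDENTIFIER '=' NUMBER ';'
    | error ';'          { yyerrok; }   /* a damaged statement: skip to the
                                            next ';', build a dummy node,
                                            and resume parsing             */
    ;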
A traditional bottom-up parser generator - yacc

Probably the most widely used parser generator is yacc, which started as a UNIX utility in the mid-1970s and has since seen more than twenty years of service in many compilation and conversion projects. Yacc is an LALR(1) parser generator. The name stands for 'Yet Another Compiler Compiler', but it is not a compiler compiler in that it generates parsers rather than compilers. One recent problem with yacc is that it generates C rather than ANSI C; a yacc look-alike called bison, provided by GNU, remedies this problem. The yacc code shown in this section has been tested using bison.
The most striking difference between top-down and bottom-up parsing is that where top-down parsing determines the correct alternative right at the beginning and then works its way through it, bottom-up parsing considers collections of alternatives simultaneously and only decides at the last possible moment on the correct alternative. Although this openness of mind increases the strength of the method considerably, it makes it much more difficult to execute code. In fact, code can only be executed safely at the end of an alternative, when its applicability has been firmly established. This also rules out the use of parameters, since it would be unclear when (or even whether) to evaluate them and to pass them.
Yacc's approach to this is to associate with each member exactly one parameter, which should be set by that member when it has been recognized. This means that when the entire alternative of a non-terminal N has been recognized, all these parameters are in place and can be used to construct the parameter for N. The parameters are named $1, $2, ..., $n for the n members of an alternative; the count includes the terminal symbols. The parameter associated with the rule's non-terminal itself is called $$. The full yacc code for constructing parse trees for simple expressions is shown in Figure 2.108. The code at the end of the first alternative of expression allocates a new expression node and yields its address as the parameter $$ for expression. Next, it sets the type and the two pointer fields to the parameter of the first member and the third member, respectively. The second member is the terminal symbol '-'; its parameter is not used. The actions in the second alternative of expression and in term are similar.
All this raises questions about the types of the parameters. Since the parameters are kept on the parsing stack, implemented as an array that parallels the LALR(1) reduction stack, they all have to be of the same type. This is inconvenient, because the compiler writer will want to associate different data structures with different non-terminals. A way out is provided by implementing the parameters as unions of the various data structures. Yacc is aware of this and allows the union to be defined by the user, through a %union keyword. Referring to the two structures declared inside the %union, with tags expr and term, the %type statements associate the entry tagged expr in the union with the non-terminal expression, and the entry tagged term with the non-terminal term. This allows yacc and the C compiler to check and generate the proper field selections. The code for main and the auxiliary routines, shown in Figure 2.109, is similar to that of the LLgen example, and the generated parser produces the same output as the LLgen example on correct input; on incorrect input it prints
parse error
This concludes our discussion of the first stage of the compilation process - textual analysis: the conversion from characters in a source file to abstract syntax tree. We have seen that the conversion takes place in two major steps separated by a minor one. The major steps first combine characters into tokens and then structure the sequence of tokens into an abstract syntax tree. In between the two major steps, some assorted language-dependent character and token manipulation may take place, to perform identifier identification, macro processing, file inclusion, and conditional compilation. Both major steps are based on more or less automated pattern matching, using regular expressions and context-free grammars respectively. Important algorithms in both steps use 'items', which are simple data structures used to record partial pattern matches.
%{
#include "tree.h"
%}

%union {
    struct expr *expr;
    struct term *term;
}

%type <expr> expression;
%type <term> term;
%token IDENTIFIER
%start main

%%

main:
    expression              {print_expr($1); printf("\n");}
;

expression:
    expression '-' term     {$$ = new_expr(); $$->type = '-';
                             $$->expr = $1; $$->term = $3;}
|
    term                    {$$ = new_expr(); $$->type = 'T'; $$->term = $1;}
;

term:
    IDENTIFIER              {$$ = new_term(); $$->type = 'I';}
;

Figure 2.108 Yacc code for constructing parse trees.
We have also seen that the main unsolved problem in textual analysis is the handling of syntactically incorrect input; only ad-hoc techniques are available. A very high-level view of the relationships of the techniques is given in Figure 2.110.
#include "lex.h"

int main(void) {
    start_lex();
    yyparse();              /* routine generated by yacc */
    return 0;
}

int yylex(void) {
    get_next_token();
    return Token.class;
}

Figure 2.109 Auxiliary code for the yacc parser for simple expressions.
                      Top-down                         Bottom-up
Lexical analysis      Decision on first character:     Decision on reduce items:
                      manual method                    finite-state automata
Syntax analysis       Decision on first token:         Decision on reduce items:
                      LL(1) method                     LR techniques

Figure 2.110 A very high-level view of text analysis techniques.
- Lexical analysis turns a stream of characters into a stream of tokens; syntax analysis turns a stream of tokens into a parse tree, or, more probably, an abstract syntax tree. Together they undo the linearization the program suffered in being written out sequentially.
- An abstract syntax tree is a version of the syntax tree in which only the semantically important nodes are retained. What is 'semantically important' is up to the compiler writer.
- Source program processing starts by reading the entire program into a character buffer. This simplifies memory management, token isolation, file position tracking, and error reporting.
- Standardize newline characters as soon as you see them.
- A token consists of a number (its class) and a string (its representation); it should also include position-tracking information.
- The form of the tokens in a source language is described by patterns in a special formalism; the patterns are called regular expressions. Complicated regular expressions can be simplified by naming parts of them and reusing the parts; a set of named regular expressions is called a regular description.
- A lexical analyzer is a repeating pattern matcher that will cut up the input stream into tokens matching the token patterns of the source language.
- Ambiguous patterns are resolved by accepting the longest match. If that fails, the order of the patterns is used to break the tie.
- Lexical analyzers can be written by hand or generated automatically, in both cases based on the specification of the tokens through regular expressions.
- Handwritten lexical analyzers make a first decision based on the first character of the token and use ad-hoc code thereafter.
- The lexical analyzer is the only part of the compiler that sees each character of the source program; as a result, it performs an order of magnitude more actions than the rest of the compiler phases.
- Much computation in a lexical analyzer is done by side-effect-free functions on a finite domain. The results of such computations can be determined statically by precomputation and stored in a table. The computation can then be replaced by table lookup, greatly increasing the efficiency.
- The resulting tables require and allow table compression.
- Generated lexical analyzers represent their knowledge as a set of items. An item is a named fully parenthesized regular expression with a dot somewhere in it. The part before the dot matches the last part of the input scanned; the part after the dot must match the first part of the rest of the input for the item to succeed.
- Scanning one character results in an item being transformed into zero, one, or more new items. This transition is called a shift. The set of items kept by the lexical analyzer is transformed into another set of items by a shift over a character.
- The item sets are called states and the transformations are called state transitions.
- An item with the dot at the end, called a reduce item, signals a possible token found, but the end of a longer token may still be ahead. When the item set becomes empty, there are no more tokens to be expected, and the most recent reduce item identifies the token to be matched and reduced.
- All this item manipulation can be avoided by precomputing the states and their transitions. This is possible since there are a finite number of characters and a finite number of item sets; it becomes feasible when we limit the precomputation to those item sets that can occur in practice: the states.
- The states, the transition table, and the transition mechanism together are called a finite-state automaton, FSA.
- Generated lexical analyzers based on FSAs are very efficient, and are standard, although handwritten lexical analyzers can come close.
- Transition tables consist mainly of empty entries. They can be compressed by cutting them into strips, row-wise or column-wise, and fitting the values in one strip into the holes in another, by shifting one with respect to the other; the starting positions of the strips are recorded and used to retrieve entries. Some trick must be applied to resolve the value/hole ambiguity.
- In another compression scheme, the strips are grouped into clusters, the members of which do not interfere with each other, using graph coloring techniques.
- Many tokens recognized by the lexical analysis have to be identified further before they can be passed on; this concerns mainly identifiers and keywords.
- Identifiers are identified by looking them up in a symbol table, a data structure indexed by strings. The string is the identifier; the entry associated with it holds the information about the identifier. Such string-indexed arrays can be implemented efficiently using hashing.
- Macro processing, conditional compilation, and file inclusion can be implemented using a stack of input buffers.
- Generic units can be implemented by file inclusion and macro processing, using a form of token insertion between the lexical and the syntax analyzer.
ere are two ways of doing -down and bottom-up. op-down ~arsingtries gram prod~ctionprocess; bottom-up parsing tries to rollback the proa ~ t o ~ a t i c a l l yfro^ a p-downparsers can bewrittenmanuallyorbegenerated context-free grammas. -A handw~ttentop-downparserconsists of a set of recursiveroutines,eachroutine closely to a rule in the r a m a r . Such a parser is called a recursive deshis techni~ueworks for a restricted set of g r ~ a r only; s the restrictions down parsers use precomputation of the decisions that predictive recursive desce~tparsers take dynamical~y. ~ n a ~ b i g u o transition us tables are obtained for s LL( 1) g r a ~ m a r only. -~ o n s t ~ c t i of o nthe table is based on the F1 ST and FOLLOW sets of the non-terminals. kens any ~ r o d u c t i of o ~N can start with, andE if N produces the ( N ) contains all tokensthat can follow anyprod~ctionof N. -The transitloe inco~oratedin a recursive descent passer to yield a predicwhich the parsing stack coincides with the routine calling stack; or be l) push-down automaton, in which the stackis an explicit array. can be r e ~ o v e dby left-factorin substitutio~,and left-r~cursionrernomas, and can be resolved by having dynamic conflict resolvers in the 'I
can recover from syntax errors by plotting a shortest path out, deletillg tok~nsfrom the rest of the in ut until one is found that is acceptable on that path, and
then following that path until that token can be accepted. This is called acceptable-set error recovery. -~ottom-upparsers work by repeatedly identifying a handle, The handle is the list of children of the last node that was expanded in producing the program. Once found, the bottom-up parser reducesit to the parent node and repeats the process. -Finding the handleis the problem; there are many approximative techniques. -The LR parsingtechniquesuseitemsets of proposedhandles.Theirbehaviorwith respect to shift (over a token) is similar, their reduction decisioncriteria differ. -In LR(0) parsing any reduce item (=item with the dot at the end) causes a reduction. In SLR( 1) parsing a reduce item N+a causes a reduction only if the look-ahead token is in the FOLLOW set of N. In LR( 1) parsing a reduce item N+a* {CT}causes a reduction only if the look-ahead token is in CT,a small set of tokens computed especially for that occurrence of the item. -Like the generated lexical analyzer, the LR parser can perform a shift over the next token or a reduce by a given grammar rule. The decision is found by consulting the ACTION table, which can be produced by precomputation on the item sets. If a shift is prescribed, the new state can be found by consulting the GOT0 table, which can be LR parserswith a one-token look-ahead, the precomputedinthesameway.For ACTION and GOT0 tables can be superimposed. -The LALR(1) item sets and tables are obtained by combining those LR(1) item sets that differ in look-ahead sets only. This reduces the table sizes to those of LR(0) parsers, but, remarkably, keeps almost all parsing power. -An LR item set has a shiftheduce conflict if one item in it orders a shift and another a reduce, taking look-ahead into account. An LR item set has a reduceheduce conflict if two items init order two different reduces, taking look-ahead into account. -LR shif~reduceconflicts can be resolved by always preferring shift over reduce; LR reduce/reduce conflicts can be resolved by accepting the longest sequence of tokens for the reduce action. The precedence of operators can also help. -Error recovery in anLR parser is difficult, since much of the information it gathers is of a tentative nature. In one approach, some non-terminals are declared error-recovering by the compiler writer. When an error occurs, states are removed from the stack until a state is uncoveredthatallows a shift onan error-recovering non-terminal R; next, a dummy node R is inserted; finally, input tokens are skipped until one is found that is acceptable in the new state. This attempts to remove all traces of the production of R and replaces it witha dummy R.
The use of finite-state automata for lexical analysis was first described by Johnson et al. (1968)andtheuse of LL(1) was first described by Lewisand Stems (Lewis 11 and Steams, 1968), although in both cases the ideas wereolder, LR(k) parsing was invented by nuth (1965). Lexical analysis and parsing are covered to varying degrees in all compiler design books, but few books are dedicated solely to them. We mention here a practice-oriented
book by Crune and Jacobs (1990) and two theoretical books, one by Sippu and SoisalonSoininen (1990) and the other by Ah0 and Ullman (1973), both in two volumes. A book by Chapman (1987) givesa detailed treatment of LR parsing. There are a num~erof good toexcellentcommercial and publicdomain lexical analyzer generators and parser generators. Information about them can be found in the postingsin the c ~ ~ ~ne~sgroup, . c which ~ are ~ much ~ more ~ ~up toe date~thanany ~ printed text can be.
2.1
Section 2.1 .l advises to read the program text with a single system call. Actually, you usually need three: oneto find out the size of the input file, one to allocate space for it, and one to read it, Write a program for your favorite operating system that reads a file into memory, and counts the number of occurrences of the character sequence abc+abc.Try to malse it as fast as possible. Note: the sequences may overlap.
2.2
On your favorite system and programming language, time the process of reading a large file using the language-supplied character read routine. Compare this time to asking the system for the size of the file, allocating the space, and reading the file using one call of the languag~-suppliedmass read routine.
2.3
Usingyourfavoritesystemand p r o g r a ~ i n glanguage, create a file of size 256 which contains all 256 different 8-bit characters. Read it character by character, and as a block. What do you get?
2.
Somebody in a compiler construction project suggests solving the newline problem by systematically replacing all newlines by spaces, since they mean the same anyway. Why is this almost certainly wrong?
2.5
Give a regular expression for all sequences of Os and 1 s that (a)contain exactly 2 1s. (b) contain no consecutive 1s. (c) contain an even numberof 1 S.
2.6
Why would thedot pattern (.)usually exclude the newline (Figure 2.4)?
2.7
Whatdoestheregularexpression erroneous? Are they ambiguous?
2.
(from Stuart Broad) The following is a highly simplified grammar for URLs, assuming proper definitions for1 etter and digit.
a ? * mean?And
a**? Aretheseexpressions
I URL . l t label label ”+ letter ( letgit-hyphen-string? letgit ) ? letgit-hyphen-string ”+ letgit-hyphen I letgit-hyphen letgit-hy~hen-string letgit-hyphen ”+ letgit I 11-11
URL ”+ label
ewritethe skip-layout-and-comm~nt routineofFigure2.8toallow nested comments.
for
The c o ~ m e n skipping t scheme of Figure 2.8 suffices for single-character coml~entdelimiters. However, multi-character comment-deli~tersrequire some more attention. Write a ski~-layo~t-an~-comm~nt routine for C, where comments are delimited by ‘/*’and ‘*/’,and don’t nest, The end of the s~bsectionon table compression in Section 2.1 5.1 leaves single array of 256 bytes, charbit S E 1. Since programs contain onlyA acters in therange 32 through126,plusnewlineandpetab,somebodyproposes togainanotherfactor of 2andreducethearrayto a of 128. What is your reaction? Explain why is there a FOR EACH statement in Figure 2.16 rather thanjust: IF the input matchesT,”+R,
...
over Length:
The text distinguishes ‘shift items’ with the dot in front of a basic pattern, ‘reduce items’ with the dot at the end, and ‘non-basic items’ with the dot in front of a re sLlbexpressio~. What about items with the dot just before the closingp~enthesisof a p ~ e n t h e s i subexpression? ~e~ uppose you are to extend an existing lexical analyzer generator witha basic pattern which matches two consecutive occurrences of the same characters, for example aa, ==, or ,,. How would you implement this (notso) basic pattern?
g,
Argue the correctnessof some of the dot motion rulesof Figure 2.19, at use regular expressions, for exampleSCML, add a third composiwith R &R2 ~ e a ~ i that n g both R and R 2 must occur but that they may occur in any order; so R I &R2 is equivalent to R I R 2 IR 2R Show the E move rules for thiscomposiin a fashionsimilartothoseinFigure2.19,starting from the item T-+ Show that the closure algorithmfor dotted items (Figure 2.25) terminates.
that ‘our closure al orithm terminates after having erated five sets, outof a possible 64’. late keywords in a file. A keyword is any sequence of letters delophes: ‘begin‘i ekeyword begin. eware of non-letters between apose~pressiol~s for the process, and construct the F .31 using marking by state (rather than by charles to be compressed often contain many rows 0, 3, and 7 of Figure 2.90: -t-
state i 0
3 7
(
$
)
E
T
1
5 5 5
shift shift shift
6 4
7 7
6
8
empty entries -and thus more compressibility -can be obtained by assignin to one of the rows in such a group the role of ‘principal9 and reducing the others to the differen~ewith the principal. Taking row 7 for the principal, we can simplify the table to: state principal i 0 3 7
-I-
(
7
)
$
E
T
l 4
7 5
7
8
6
shift
If, upon re~ieva19an empty entry is obtained from a row that has a p~ncipal,the be obtained from that principal. Fill in the detail to turn this idea (l) table of Figure 2.94 in two ways: using rowdisplace~entwith nd using column dis~lacementwith ~ a r ~ by n state. g ilar lexical analyzer generator to generatea filter that removes r a m files. One problem is that the comment starter /* may ments may be arbitrarily long and most even if it is su~sequentlydiscarded, so ge buffers, which are not supplied by all of lex or flex to enerate~lexical analy start condition feature c o n s u ~ the e c o l ~ ~ eline n t by line. guages, for example Algol 68, feature a token class similar to is largely similar to the formats used in C p r i n t f ( ) calls.
For example, $3d$ described the formatting of an integer value in 3 digits. Additionally, numbers in formats may be dynamic expressions: integers f o ~ a t t e dunder $n.(2*a ) d$ will have 2 *a digits. Design a lexical analyzer that will handle this. int 1: the dynamic expressions can, of course, contain function calls that have formats as parameters, recursively. Hint 2: this is not trivial. Will the function Identify ( ) in Figure 2.44 still work if the hash function maps all identifiers onto the same number? Why? Suppose your symbol tablehas to accommodate between 10 and1000 identifiers as a rule, but is also required occasionally to handle10 million identifiers reasonablyefficiently.Alwaysallocating a hashtable of size 10 million is out of thequestion. Design an appropriate hash table algorithm. An adviserto a compilerconstructionproject insists thattheprogrammatically correct way to do macro processing is in a separate phase between reading the program and lexical analysis. Show this person the errors of his or her ways.
7 In the subsection on back-calls in Section 2.1.12.1, we need a back-call because the process of recognizing the identifier mac overruns the end of the identifier by one character. The handwritten lexical analyzer in Section2. 1.5 also overruns the endof an identifier. Why do we not need a back-call there? Give a code segment (in some ad hoc notation) that uses N generic items and that willcause a piece of code tobegenerated2N-1timesundergenerics by macro expansion. r easy.Forexample,thesuffixrule Constructing a suffix g r a ~ a is terminal A "+ B C D is: A-suffix "+ B-suffix C D
1
C D
I
C-suffix D
I
D
I
for the nonD-suffix:
Using this technique, construct the suffix grammar for the grammar of Figure 2.84. Try to make the resulting suffix grammar LALR( l)and check this property using an LALR( 1) parser generator. Use the resulting parser torecognize tails of productions of the grammarof Figure 2.84. Add parse tree constructing codeto the recursive descent recognizerof Figure 2.54. (a)Construct a (non-predictive) S "+ '('
S ')'
I
recursive descent parserfor the grammar
')'
Will it parse correctly? ( S t ) E. (b)Repeat for (c)Repeat for S-+' ( ' S I ) ' I ) I
~
I
I&.
Why is the correct associativity of the addition operator + (in the grammar of Figure 2.53) less important than thatof the subtraction operator-?
Exercises
I89
2.33 Naive recursive descent parsing of expressions with n levels of precedence requires n routines in the generated parser. Devise a technique to combine then routines into one routine, which gets the precedence as a parameter. Modify this code to replace recursive calls to the same precedence level by repetition, so that only calls to parse expressions of higher precedence remain.
Add parse tree constructingcode to the predictive recognizer in Figure 2.61. Naively generated predictive parsers often contain useless code. For example, the entire switchmechanismintheroutine parenthesized-expression () in Figure 2.61 is superfluous, and so is the default : error () ;case in the routine terrn ( ) . Design rules toeliminate these inefficiencies. 2.36 Answer the questions of Exercise 2.31 for a predictive recursive descent parser. 2.37 (a) Devise thecriteria for a g r a m a to allow parsing witha non-predictive recursive descent parser. Call such a grammar NE” (b) Would you create a predictive or non-predictive recursive descent parser for an NPRD grammar?
The grammar in Figure 2.1l 1 describes a simplified version of declarations in C,
declaration: decl-specifiers init-declarator? decl-specifiers: type-specifier decl-specifiers? type-specifier: lint’ I ‘long’ init-declarator: declarator initializer? declarator: IDENTIFIER I declarator ’(‘ ‘1’ I declarator ‘[’ ‘ l ’ initializer: ’ expression I initializer-list I initializer-list initializer-list: expression I initializer-listinitializer-list I initializer-list ’}‘ l
;
’
= I
l { !
‘ { l
l } ’
l } ‘
’ { l
Figure 2.111 A si~plifiedgrammar for declarations in C.
(a)Showhowthisgrammarproducesthedeclaration long inti = {1, 2 } ; (b) Make this grammar LL( 1) under the unrealistic -assumption that expression is a single token. (c) Retrieve the full grammar of the variable declaration in C from the manual and make it LL(1). (Much more difficult.)
-
(a) Construct the transition tableof the LL(1) push-down automatonfor the g r a ~ a r
epeat, but with the above definitionof B replaced by B + b
1
E
Complete the parsing started in Figure 2.70. e t e ~ i n where e exactly the prediction stackis located in a predictive parser. ~ ~ l ~ - L L~( dl )v, a npca ~r ~~i ntopic ~ : (a) The LL( l ) methoddescribedinthisbookusesthe F O L L O ~set of a nonterminal N to decide when to predict a nullable production of N. As in the SLR( L) method, the F ~ L L O set ~is too coarse an approximation since it includes any token that can ever follow N , whereas we are interested in the setof tokens that can follow N on the actual prediction stack during parsing. Give a simple grammar in which this makes a difference. (b)Wecaneasilyfindtheexacttokensetthatcanactual1 terminal 7‘ on the prediction stack f T, a 1: it is FIRST(a), exact token set to improve our prediction? (c) ~e can inco~oratethe exact follow set of each prediction stack entry into the LL( 1) ~ush-downautomaton by expandingictionstack entries to ( g r a ~ m a r symbol,tokenset)pairs.Inanalogytotheautomaton,thesetokensetsare called ‘look-ahead sets’. Design rules for computing the look-ahead sets in the predictions for the stack element(N, a)for production rulesN+P. (d) The LL(l) method that uses the look-aheads described here rather than the FOLLOW set is called ‘full-LL( l)” Show that full-LL(1) provides better error detection than strong”LL(l),in the sense that it will not incorrectly predict a nullable alternative. Give an example using the grammar from part(a). (e) Show that there is no full-LL(1) grammar that is not also strongstruct a grammar that has a F I R S T ~ O L L Oconflict ~ when set, such that the conflict goes away in all situations when using th look-ahead set. (f) Show that there are full-LL(2) g r a m m ~ sthat are not strong-LL(2). sider a non-ter~inalwith two alternatives, one producing the empty string and one producing one token.
the g r a m ~ a of r Figure 2.53 and some tables provided in the text, deter~ine p~rent~esi~e~-~xpre~s~ the acceptable set of theLL(1)parsingstack rest-expression EOF. Consider the automatic com~utationof the acceptable set based on continuations9as ex~lainedin Section 2.2.4.5. The text suggests that upon finding anerror, the parser goes through all the l~otionsit would go through if the input were exhausted. This
ise a simpler method to c o ~ p u t ethe acceptable nt 2: note that the order in which the symbols sit ~ athe ~ value of the a c c e ~ t a ~set. le on the stack is i ~ m a t e for table set of a prediction stack confi uration
a will always
rules for the con~ersio correct trees with trans for me^ conversion to bea u t o ~ a t e,or show that this c a ~ n obe t done. item sets and their transitia~s for the ote: ( ,not ) inthesecond alter~ative,) parser when the that haw calls for a ‘reduce usin N+a’, the top of the stack does indeed c o ~ t a the i ~ m e ~ ~ e of r sa in the correct when the reduce move has been ~ e ~ o r by ~ erepdlac in^ a by N, the ed on topof it cannot be ‘erroneo~s’in an L,
(1) automata for the ~ r a ~ ~ a r
cursion causes linear re in p ~ i c u l show ~, mmar S+XS I x the stack will grow at behavior in this respect between airs of items can coexist in anL
I
ro
t to
ct
Q+.P (c) Show that no look-ahead can make the item in setpart (b) conflict-free. Construct an unambiguous grammar f r a g ~ e n that t implements the C interpretation of the dangling else. Hint: since the ambiguity occurs when the then-part of an ifstatementcontains an if-state~entwithoutanelse-part,introducetwonewnont e ~ i n a l s ,short-i f-s tatement for an if-statement without an else-part, and st~tement-but-not-short-if withtheobviousmeaning; additional new non-terminals may be needed. Refer to the subsection on resolving shift-reduce conflicts in Section2.2.5.7 on making a g r a m m ~LR(l), where precedence information about operators is used to help resolve shift-reduce conflicts. In addition to having precedences, operators can be left- or right-associative. For example, the expression a+b+c must be grouped as (a+b ) +c, but a* *b* *c, in which the ** represents the ex~o~entiation operator, as a** (b* *c ) , a conventionarising from thefact that mustbegrouped (a**b)**c would simply be equal to a** (b*c ) . So, addition is left-associative and e~ponentiationis ~ght-associative. ncorporate associ~tivityinto theshiftreduce conflict-resolvi~grules stated in the text. (a) Show that the g r a ~ m a for r type in some p r o g r a ~ ~ i nlanguage, g shown in Figure 2. I 12, exhibits a reduceheduce conflict. type "+ actual-type I virtual-type actual-type -+ actual-basic-type actual-size virtual-type -+virtual-basic-type virtual-size actual-basic-type "+ 'int' I 'char' [' N U ~ '1 ~ E ~ actual-size -+ ' virtual-basic-type "+ 'int' I 'char' I 'void' virtual-size "+ ' [' ' 1 igure 2.312 Sample gra~nrnarfor type.
(b) Make the grammar LAL (I); check your answer using an LAL erator. (c) Add code that constructs the proper parse tree in spite of the transfor~a~ion.
7 Consider a variant of the g ~ ~ m frommFigure ~ 2. 5 in which A is error-recovering:
I
S - + A xb A "3 aAb B B " 3 x
I
I
erroneous
How will the LR(1) parser for this grammar react to empty input? What will the resulting parse tree be?
2.58 LR error recovery with stack modification throws away trees that have already been constructed. What happens to pointers that already point into these trees from elsewhere? 2.59 History o ~ ~ ~ Study ~ s iSamelson ~ g : and Bauer's paper (1960), which. introduces the use of a stack in parsing, and write a summaryof it.
The lexical analysis and parsing described in Chapter2, applied to a pro~ramtext, result in an abstract syntax tree with a minimal but important degree of annotation: the Token.class and Token.repr attributessupplied by thelexicalanalyzer as the a token initial attributes of theterminalsintheleafnodes of theAST.Forexample, representinganintegerhastheclass‘integer’and its valuederivesfromthetoken representation; a token representing an identifier has the class ‘identifier’, but completion of further attributes may haveto wait until the identification mechanism has done its work. Lexicalanalysisandparsingtogetherperformthe context-free processing of the source program, which means that they analyze and check features that can be analyzed and checked either locally or in a nesting fashion. Other features, for example checking the number of parameters in a call to a routine against the number of parameters in its declaration, do not fall into this category. They require establishing and checkin range relationships, whichis the domainof context ha~dling. Context handling is required for two different purposes: to check context co~ditions imposed by the language specification and to collect i n f o ~ a t i o nfor semantic processin In an extremely clean compiler, two different phases would be assigned to this: first all language-requiredcontextcheckingwouldbedone,thentheinputprogramwould be declared contextually correct, and only then would the collection of other information start. The techniquesusedare,however,exactlythesame,and it would be artificialto distinguish the two aspects on a technical level. After all, when we try to find out if a given m a y parameter A to a routine has more than one dimension, it makes no difference whether we do so because the language forbids multi-dimensionalm a y parameters and we if A hasmorethanonedimension,orbecausewecan havetogiveanerrormessage
ata needed for these analyses and checks is stored as attributes in the nodes of
sicallystoredthereoractuallyreside else~here,for in the local variables of an analyzing routine, is more or d efficienc~considerations prime focus in this book is ple~entation,we will treat
ulesarestilloftenwrittenby
d that the input values needed
s have large and often r ~ ~ e t i ~sets i v eof
The c o ~ p ~ t a t i o ~ s r e ~by ~context ired
bein ar that is already
hand1
can
bespecified
he contest-free . To d with two features, one *
*
terminal or b on-ter~~nal, zero or more attrib a type, like the fields in a record; these are ameters, they consist of a name and a type o
1
x
the actual attributes is allocated automatically in each node that is created for S in the abstractsyntaxtree.Theattributesareusedtoholdinformationaboutthesemantics attached to that specific node. So, all nodesintheASTthatcorrespondtothesame grammar symbol S have the same formal attributes, but their values -the actual attributes -may differ. -With each U N+M, ...Mrt,a set of tation rules are associated -the -whichexpresssomeattributevalues of theleft-hand side N and the members of the right-hand side Mi in terms of other attributes values of these. These evaluation rules also check the context conditions and issue warning and error messages. Note that evaluation rules are associated with production rules rather thanwithnon-terminals.This is reasonablesincetheevaluation nnles areconcerned with the attributesof the membersM i , which are production-rule-specific. *
In addition, the attributes have to fulfill the followin
-The attributes of eachgrammarsymbol
synes; theevaluationrules for all production rules of N can count on the valuesof the inherited attributes of N to be set by the parent node, and have themselves the obligation to set the synthesized attributes of N. Note that the re~uirementconcerns g r a ~ m a rsymbols rather than production rules. This is again reasonable, since in any position in the AST in which anN node produced by one production rule for N occurs, a node produced by any other produ~tionrule of N may occur and they should all have the same attribute structure.
N are divided into twogroups,called
The re~uireme~ts apply to all alternatives of all g r a m ~ a symbols, r and more in particular to all M i s in the production rule N+MI ...Ma. As a result, the evaluat~onrules for a production rule N+Ml ...M E can count on the valuesof the synthesized attributes of M ito be set by Mi, and have the obligation to set the inherited attributes of Mi, for all l G= Base: ERROR "Token It Token value cannot be a digit in base V Base; RETURN Base - 1; Figure 3.9 The function Checked
digit
value.
Figure 3.10 The dependency graph of Number.
base Digit-Seq ~
I base
Digit
value
F i ~ u 3.11 r ~ The two dependency graphs of Digit-Seg.
Digit-Token
~
The dependency graph
igu
of Digit.
Base-Tag
'D'
I
B'
3 The two dependency graphs of Base-Tag.
system the rulefor Digit-Seq could, for example, look as follows: Digit-Seq(IN~ base, SYN value)"+ Digit-Seq (base, value) Digit (base, value) ~TTRIBUTERULES: SET value TO Digit-Seq .value * base + Digit .value;
I
Digit
(base,
value)
"his is indeed a considerable si~plificationover Figure 3.8. The style of Figure 3.8 has the advantage of being explicit, unambiguous, and not influenced towards any pa~icular system, and is preferable when many non-terminals have attributes with identical names. But when no mis~nderstandin can arise in small examples we will use the above abbreviated notation. To implement the data-flow ~ a c h i n ein the way explained above, we have to visit all nodes of the data de~endencygraph. ~isitingall nodes of a graph usually requires some care to avoid infinite loops, but a simple solutio^ is available in this case since the nodes are also linked in the parse tree, which is loop-free. y visiting all nodes in the parse tree we auto~aticallyvisit all nodes inthe data dependency graph, and we can visit all nodes in the parse tree by aver sing it recursively. Now our algorithm at each node is very simple: try to perfom all the ass rules section for that node, traverse the children, and when return' o perform all the assign~ents rules the section. in The attribute rited values downwards; pre-visit as post-visit the
assignments harvest the synthesized attributes of the children and propagate them upwards.
PROCEDURE Evaluate for Digit-Seq alternative-l ( pointer todigit-seq node Digit-Seq, pointer todigit-seq alt-l node Digit-Seq alt-l ):
// Propagate attributes: Propagate for Digit-Seq alternative-l (Digit-Seq, Digit-Seq alt-l);
// Traverse subtrees: Evaluate for Digit-Seq (Digit-Seq a1t-l .Digit-Seq); Evaluate for Digit (Digit-Seq alt-l.Digit);
// Propagate attributes: Propagate for Digit-Seq alternative-l (Digit-Seq, Digit-Seq alt-l) ( PROCEDURE Propagate for Digit-Seq alternative-l pointer to digit-seq node Digit-Seq, pointer to digit-seq alt-l node Digit-Seq alt-l 1: IF Digit-Seq alt-l .Digit-Seq .base is not set AND Digit-Seq .base is set: SET Digit-Seq alt-l .Digit-Seq .base TO Digit-Seq.base;
IF Digit-Seq alt-l .Digit .base is not set AND Digit-Seq .base is set: SET Digit-Seq alt-l .Digit .base TO Digit-Seq .base; IF Digit-Seq .value is not set AND Digit-Seq alt-l .Digit-Seq .value is set AND Digit-Seq .base is set AND Digit-Seq alt-l .Digit .value is set: SET Digit-Seq .value TO Digit-Seq alt-l .Digit-Seq .value * Digit-Seq .base + Digit-Seq alt-l .Digit .value; Figure 3.14 Data-flow codefor the first alternative of Digit-Seq.
Outline code for theevaluation of nodesrepresentingthe first alternative of Digit-Seq is given in Figure 3.14. The code consists of two routines, one, Evaluate which organizes the assignment attempts and the for Digit-Seg alternative-l, recursive traversals, and one, Propagate for Digit-Seq alternative-l, whichattemptstheactualassignments.Bothgettwoparameters: a pointer to the Digit-Seq node itself and a pointer, Digit-Seq alt-l, to a record containing the pointerstothechildren of thenode. The type of thispointer is digit-seq alt-1 node,since we are working on nodes that represent the first alternative of the grammar rule for Digit-Seq. The twopointersrepresentthetwo levels independencygraph diagrams like the one in Figure 3.3.
The routine Evaluate for Digit-Seq alternative-l is called by a roufor Digit-Seq when this routine finds that the Digit-Seq node it tine Evaluate is calledforderivesits first alternative. The codein Evaluate for Digit-Seq alternative-1 is straightforw~d. The first IF statement in Propagate for Digit-Seq alternat ive-l corresponds to the assignment SET Digit-Seq[l] .base TO Digit-Seq .base;
in the rules section expressed as
of Digit-Seq in Figure 3.8. It shows the same assign men^, now
SET Digit-Seq alt-l .Digit-Seq .base TO Digit-Seq .base;
but preceded by a test for appropriateness. The assign~entis appropriate only if the destination value has not yet been set and the source value(s) are available. A more elaborate version of the same principle can be seen in the third I F statement. All this means, of course, that attributes have tobe implemented in sucha way that one can test if their values have been set. 3.15, calls theroutine Evaluate for The overalldriver,showninFigure Number repeatedly,untiltheattribute Number .valueis set.Eachsuch call will cause a ~ompleterecursive traversal of the syntax tree, transporting values down and upas available. For a 'normal' attribute g r m a r , this process converges in a few rounds. Actually, for the present example it always stops after two rounds, since the traversals work from left to right and the grammar describes a two-pass process. A call of the resulting program withinput 5 6 '7Bprints Evaluate for Number Evaluate for Number Number .value = 375
called called
The above data-flow implementation, charrning as it is, has a number of drawbacks. First, if there is a cycle in the computations, the attribute evaluator will loop. Second, the produced code may not be large, but it does a lot of work; with some restrictio~son the attribute grammar, much simpler evaluation techniques become possible. There is much theory about both problems, and we will discuss the essentials of them in Sections 3.1.3.2 and 3.1.5.
PROCEDURE Run the data-flow attribute evaluator on node Number: WHILE Number .value is not set: // report progress PRINT "Evaluate for Number called"; Evaluate for Number (Number) ;
// Print one attribute: PRINT "Number .value=
It,
Number .value;
Figure 3.15 Driver for thedata-flow
code.
There is another, almost equally naive, methodof dynamic attribute evaluation, which wewanttomentionhere, since itshowsanupperbound for thetimerequiredto do dynamic attribute evaluation. In this method, we link all attributes in the parse tree into a linked list, sort this linked list topologically according to the data dependencies, and perform the assignments in the sorted order. If there are n a t ~ b u t e and s d data dependencies, them topologically costs O(n+d);the subsequent assignments cost O(n). The topological sort will also reveal any (dynamic) cycles. For more about topological sort, see below.
The difference between normal sorting and topological sorting is that the n o ~ a sort l workswith a cornparisonoperatorthatyieldsthevalues ‘smaller’, ‘equal’, and ‘larger’, whereas the comparison operator of the topological sort can also yield the value ‘don’t care’: normal sorting uses a totalordering,topologicalsorting a partialordering.Elementthat compare as ‘don’t care’ may occurin any order in the orderedresult. The topological sort is especially useful when the comparison represents a dependency of some kind: the ordered result will be such that no element in it is dependent on a later element and each element will be preceded by all its pre~equisites.This means that the elements can be produced, computed, assigned,or whatever, in their topological order. ical sort can be performed recursively in time
~roportionalto
O(n+d), where n is the number of elements and d the number of dependencies, as follows. Take an abitrary elementnotyetintheordered
result, recursively find all elements it is dependent on, and put these in the ordered result in the proper order. Now we can append the element we started with, since all elements it depends on precede it. Repeat until all elements are in the ordered result. For an outline algorithm see Figure 3.16. It assumes that the set of nodes that a given node is dependent on can be found in a time proportional to thesize of that set.
To prevent the attribute evaluator from looping, cycles in the evaluation computations must
be detected. We must distinguish between static and dynamic cycle detection. In dynamic cycle detection, the cycle is detected during the evaluation of the attributes in an actual syntax tree; it shows that there is a cycle in a particular tree. Static cycle detection looks at the attribute grammar and fromit deduces whether any tree that it produces can ever exhibit a cycle: it covers all trees. In other words: if dynamic cycle detection finds that there is no cycle in a particular tree, then all we know is that that particular tree has no cycle; if static cycle detection finds that there is no cycle in an attribute g r ~ a r then , we know
11
F ~ C T I O NTopological sort of (a set Set) RETURNING a list: SET ListTO Empty list; ~ H I L Ethere is a Node in Set but not in List: Append Node and its predecessors List; to RETURN List; ~ R O C E ~ ~Append RE Node and its predecessors to List: // First append the predecessors of Node: FOR EACH Node-l IN the Set of nodes that Node is dependent on: IF Node-l is not in List: Append Node-l and its predecessors to List; Append Node toList; ure 3.16 Outline codefor a simple i ~ p l ~ ~ e n ~ofttopological ion sort.
that no tree producedby that g r a m m ~will ever exhibit a cycle. Clearly static cycle detection is much more valuable than dynamic cycle detection; u~surprisingly,it is also much more difficult.
There is a simple way to dynamically detect a cycle in the above data-flow implementation, but itis inelegant: if the syntax tree hasN attributes and more thanN rounds are found to be required for obtaining an answer, there must be a cycle. The reasoning is simple: if there is no cycle, each round will compute at least one attribute value, so the process will t e ~ n a t after e at most N rounds; if it does not, there is a cycle. Even though this brutein the force approach works, the general problem with dynamic cycle detection remains: end we have to givean error message saying somethinglike 'Compiler failure due to a data depende~cycycle in the attribute grammar', which is e ~ b a ~ a s s i n gIt. is far preferable to do static cycle checking; if we reject during compiler const~ctionany attribute gram^^ that can ever producea cycle, we will not be caught in the above situation,
As a first step in designing analgo~thmto detect the possibilityof an attribute dependency cycle in any tree produced by a given attribute g r ~ a r we , ask ourselves how such a
cycle can exist at all. A cycle cannot ori nate directly from a dependency graph of a productionrule P, for thefollowinheattributeevaluationrulesassignvaluestoone ted attributes of the children of P and the synthesized attributes set of attributes, the values of the syn~esized att~butes of the children of P and the inherited attributesof P. And these two sets aredisjoint, have no element in common,so no cycle can exist. For an attribute dependency cycle to exist, the data flow has to leave the node, pass through some part of the tree and return to the node, perhaps repeat this process several times to different parts of the tree and then return to the attribute it started from. It can
leave downward through an inherited attribute of a child, into the tree that hangs from this node and then it must return from that tree througha synthesized attribute of that child, or it can leave towards the parent through oneof its synthesized attributes, into the rest of the tree, after which it must return from the parent through oneof its inherited attributes. Or it can do both in succession, repeatedly, in any combination. Figure 3.17 shows a long, possibly circular, data-flow path. It starts from an inherited attribute of node N,descends into the tree below N,passes twice through one of the subtrees at the bottom and once through the other, climbs back to a synthesized attribute of N, continues to climb into the rest of the tree, where it first passes through a sibling tree of at the left and then through one at the right, after whichit returns to node N,where it lands at an inherited attribute. If this is the same inherited attribute the data flow started from, there is a dependency cycle in this particular tree. The main point is that to form a dependency cycle the data flow has to leave the node, sneak its way through the tree and return to the same attribute, It is this behavior that we want to catch at compiler construction time.
Figure 3.17 A fairly long, possibly circular, data-flow path.
Figure 3.17 shows that there are two kinds of dependencies between the attributesof a n o n - t e ~ i n aAt l from inherited to synthesized and from synthesized to inherited. The first eney and stems from all the subtrees that can be found under N; there are infinitely many of these, so we need a summary of the dependencies they can generate. The second is called an S ency andoriginatesfrom all the trees of a telymany of these,The s u m m ~ of the which N canbe a node;thereare dependencies between the attributes of a non-te~inalcan be collected in an an example of which is shown in Figure 3.18. Since IS-dependencies stem from things that happen below nodesfor N and SI-dependencies from things that happen above nodes for N, it is convenient to draw the dependencies(in data-flow direction!) in those same positions. The IS-SI graphs are used as follows to find cycles in the a t ~ b u t edependencies of a
A n example of an IS-SIgraph.
grammar. Suppose we are given the dependency graph for a production rule N+PQ (see Figure 3.19), and the complete IS-SI graphs of the children P and Q in it, then we can obtain the IS-dependenciesof N caused by N+PQ by adding the dependencies in the IS-SI graphs of P and Q to the dependency graph of N-PQ and taking the transitive closure of the dependencies. This transitive closure uses the inference rule that if data flows from attribute a to attribute b and from attribute b to attribute c, then data flows from attribute a to attribute c.
i~ur3 e.19 The dependency graph for the production ruleN"+P&.
The reason is as follows. At attribute evaluation time, all data flow enters the node through the i~heritedattributes o f N , may pass through trees produced by P and/or any order, and emerge to the node and may end up in synthesized attributes. Since SI graphs of P and Q s u ~ a r i z all e possible data paths through all possible trees produced by P and Q, and since the dependency graph of N+PQ already showed the fixed direct dependencies within that rule, the effects of all data paths in trees below N+P known.Next we take the transitive closure of thedependencies.Thishastwoeffects: first, if there is a possible cycle in the tree below N including the node for N+PQ, it will show up here; and second, it gives us all data-flow paths that lead from the inherited attributes of N in N+PQ to synthesized attributes. If we do this for all production rules for N, we obtain the complete setof IS-dependencies of N. Likewise, if we had all depende~cygraphs of all production rules in which N is a child, and the complete IS-SI graphs of all the other non-terminals in those production
rules, we could in the same manner detect anycycle that runs through a tree of which N is a child, and obtain all SI-dependencies of N. Together this leads to the IS-SI graph of N and the detectionof all cycles involvingN. Initially, however, we do not have any complete IS-SI graphs. So we start with empty IS-SI graphs and perform the transitive closure algorithm on each production rule in turn and repeat this process until no more changes occur to the IS-SI graphs. The first sweep through the production rules will findall IS- and SI-dependencies that follow directly from the dependency graphs, and each following sweep will collect more dependencies, untilall have been found. Then, if no IS-SI graph exhibits a cycle, the attribute g r a m ~ a is r noncyclic and is incapable of producing an AST with a circular attribute dependency path. We will examine the algorithm more in detail and then see why it cannot miss any dependencies.
SET the flag Something was changed TO True;
// Initialization step: FOR EACH terminal T I N Attribute grammar: SET the I S - S I graph of T TO T ' s dependency FOR
graph;
EACH non-terminalN I N Attribute grammar: SET the I S - S I graph of N TO the empty set;
~ H ~ Something L E was changed: SET Something was changed FOR
EACH
production
TO
False;
rule P = Mo+M ,...M, I N Attribute
grammar:
// Construct the dependency graph copy D: SET the dependency graph D TO a copy of the dependencyhgraphof // Add the dependencies already found Mii4 for n : FOR EACH M I N M O...M , , : FOR EACH dependency d I N the I S - S I graph of M : Insert d in D ; // Use the dependency graph D: Compute all induced dependencies Dinby transitive closure; I F D contains a cycle: ERROR ticyclefound inproductionii,P // Propagate the newly discovered dependencies: FOR EACHM I N M ~ . . . ~ , : FOR EACH dependency d I N L) SUCH THAT the attributes dinare attributesof M : I F there is no dependencyd in the I S - S I graph of M : Enter a dependency d into theI S - S I graph of M ; SET Something was changed TO True; Figure 3.20 Outline of the strong-cyclicity testfor an attribute g r a l ~ ~ .
An outline of the algorithm is given in Figure 3.20. It examines each production rule in turn, takes a copy of its dependency graph, merges in the dependencies already known raphs of the non-te~inaland its children, and takes the transitive cloncies. If a cycle is discovered, an error message is given, Then the algorithm updates non-terminal the children its and newly with any discovered ~ e ~ e ~ d e n c i process result the of this, is repeated, since still mo ures3.19 t ~ o u g h3.23showtheactions of one su step.Thedependenciesin fromtheattributeevaluation ru ven for N ~ P in Qtheattriies areimmutable, so we a w o r ~ n gcopy of themin collectedsofarareshowninFigure3.21. The ,in N,P, and Q; these may originate directly from s of these n o n - t e ~ n a l sor , they may have been found by previousrounds of the algorith~. Thediagrams also containoneSI-dependency, from N.s l to N.i2; it must originate from a previous round of the algori~m,since the dependency graphs of rules for a non-terminal do not contain assignments to the inherited attributes of that non-te~inal.The value of the synthesized attribute .S does not depend on ,so it is either generated inside Q or derives from terminal symbol in this is shown as an m o w starting from nowhere.
The IS-SI graphs of N, P,and Q collected so far.
The dotted lines in ure 3.22 show the result of mer~ingthe IS-SI graphs of N,P, ng the transitive closure adds manymore dependencies, but to drawnonlythosethat connect twoattributes of thesamenonterminal. There aretwo of theseone I -dependencyfrom N.il to path N.i L +P.i +P. .s2), SI-dependency one and the Q .i ). These added areto th (because of path and Q, respectively, ~esultingin We now want to show that ure 3.20 cannot miss cycles that might orithm may, however, some ct cycles that cannot occur in actual
igure 3.22 Transitive closure over the dependenciesof N, P, Q and D.
Figure 3.23 The new IS-SIgraphs of N,P , and Q.
trees, as we will see below. Suppose the algorithm has declared the attribute grmmar to be cycle-free, and we still find a tree T with a cyclic attribute dependency patht" in it. We shall now show that this leads to a contradiction. We first take an arbitrary node N on the path, and consider the parts of the path inside N. If the path does not leaveN anywhere, it just follows the dependencies of the dependency graph of N, since the path is circular, the dependency graph of N itself must contain a cycle, which is impossible. So the path has to leave the node somewhere. It does so through an attribute of the parent or a child node, and then returns through another attribute of that same node; there may be more than one node with that property. Now for at least one of these nodes, the attributes connected by the path leaving and returning to N are not connected by a dependency arc in the IS-SI graph of N: if all were connected they would forma cycle in the IS-SI graph, which would have been detected. Call the node G, and the attributes A and A 2. Next we shift our attention to node G. A I and A 2 cannot be connected in the IS-SI graph of G, since if they were the dependency would have been copied to the IS-SI graph of N. So it is obvious that the dependency between A l and A cannot be a direct depen-
dency in the dependency graph of C. We are forced to conclude that the path continues H,different from N,through and that C too must have at least one parent or child node which the circular path leaves C and returns to it, through attributes that are not connected by a dependency arc in the IS-SI graph of G: if they were all connected the transitive closure step would have added the dependency betweenA and A 2 . The same r e a s o n i ~applies ~ to H , and so on. This procedure crosses off all nodes as possible sources of circularity, so the hypothetical circular pathP cannot exist, which leads to our claim that the algorithmof Figure 3.20 cannot miss cycles. An attribute grammar in which no cycles are detectedby the algorithm of Figure 3.20 is called , algorithm The presented here is actually too pessi~istic about cyc t cycles where none can materialize. The reason is that the algorithm assumes that when thedata flow from an attribute of node N passes through N ' s child M kmore than once, it can find a different subtree there on each occasion. This is the result of merging into D in Figure 3.20 the IS-SI graph of M k ,which represents the data flow through all possible subtreesfor Mk. This assumption is clearly incorrect, and it occasionally allows dependencies to be detected that cannot occur in an actual tree, leading to false cyclicity messages. A correct algorithm exists, and uses a set of ZS-SI graphs for each on-terminal, rather than a single IS-SI graph. Each IS-SI graph in the set describes a combinatio~of dependencies that can actually occur in a tree; the union of the IS-SI graphs graphs for N yields the single IS-SI graph usedfor N in the algorithmof in the same way as the union of the look-ahead sets of the items for N yields the FOLLOW setof N. In principle, the correct algorithmis exponential in the maxinlum number of members in any grammar rule, but tests (Raihii and Saarinen, 1982) have shownthatcyclicitytesting for practicalattributegrammars is ite feasible. A gram^^ thatshowsnocyclesunderthecorrectalgorithm is called 110 yelic. Almost all grammars that are non-cyclic are also strongly non-cyclic, so in practice the simpler, ~euristic, algorithm of Figure 3.20 is completely satisfactory. Still, it is not difficult to construct a non-cyclic but not strongly non-cyclic attribute grammar, as is shown in Exercise 3.5. The data-flow technique from Section 3.l .2 enables us to create very general attribute evaluators easily, and the circularity test shown here allows us to make sure that they will not loop. It is, however, felt that this full generality is not always necessary and that there is room for less general but much more efficient attribute evaluation methods. We will coverthreelevels of simplification:multi-visitattribute ramm mars (Section 3,3.5), L.attributed g r ~ m (Section ~ s 3.1.'7), and S-attributed g r a ~ ~ a(Section rs 3.1.8). The latter two are specially il~portantsince they do not needthe full syntax tree to be stored, and are therefore suitable for narrow compilers.
So far we have assumed that the attributes of a node are allocated in that node, li in a record. For simple attributes -integers, pointers to types, etc. -this is satisfactory, but for large values, for example the environ~ent,this is clearly undesirable. The easiest solution is to impleme~tthe routine that updatesthe environment such thatit delivers a pointer
to the new environment. This pointer can then point toa pair containing the update andthe pointer to the old environment; this pair would be stored in global memory, hidden from the attribute g r a ~ a r The . implementation suggested here requiresa lookup time linear in the size of the environment, but better solutions are available. Another problem is that many attributes are just copies of other attributes on a higher orlowerlevelin the syntaxtree,andthatmuchinformation is replicatedmanytimes, requirin~time for thecopyingandusing up memory.Choosing a goodform for the abstract syntax tree already alleviates the problem considerably. Many attributes are used in a stack-like fashion only and can be allocated very profitably on a stack (Jazayeri and Pozefsky, 1981). Also, there is extensive literature on techniques for reducing the memory requirementsfurther (opdenAkkerandSluiman,1991;EngelfrietanddeJong,1990; Kastens, 198’7; Farrow and Yellin, 1986). Simpler attribute allocation mechanisms are possible for the more restricted attribute grammar types discussed below.
Now that we have seen a solution to the cyclicity problem for attribute grammars, we turn to their efficiency problems. The dynamic evaluation of attributes exhibits some serious inefficiencies: values must repeatedly be tested for availability; the complicated flow of control causes much overhead; and repeated traversals over the syntax tree may be needed to obtain all desired attribute values.
The above problems can be avoided by having a fixed evaluation sequence, implemented as p r o g r a ~code, for each production rule of each non-te~inalN; this implements a form of s t ~ t i c~ t ~ r ie ~v ~ ~ Zt ~ e ~ tThe io~ task . of such a code sequence is to evaluate the attributes of a node P, which represents production ruleN+MIM 2.... The attribute values needed to do so can be obtained in two ways:
-The code can visit a child C of P to obtain the values of some of C’s synthesized attriC to compute butes while supplying some of C’s inherited attribute values to enable those synthesized attributes. -It can leave for the parent of P to obtain the values of some of P’s own inherited attributes while supplying some of P’s own synthesized attributes to enable the parent to compute those inherited attributes.
ince there is no point in computing an attribute beforeit is needed, the computationof the required attributes can be placed just before the point at which the flow of control leaves the node for the parent orfor a child. So there are basically two kindsof visits: Supply a setof inherited attribute valuesto a child Mi Visit child Mi Harvest a set of synthesized attribute values supplied Mi by
and synthesized inherited
attribute attribute
values values
to
the
supplied
parent by
the
parent
This reduces the possi~i~ities for the visiting code of a production rule N+MI . . .to~ the ~
// Visit 1 from the parent: flow of control from parent enters here. // The parent has set some inherited attributes, theI N set 1. // Visit some childrenM k t M ,.. ,": Compute some inherited attributes of M k tthe set (ZMk),; 'Visit Mk for th // M~ returns W
ts synthesized attributes evaluated.
Compute some inherited attributes NI;, of the set (ZMl)I; Visit Ml for the first time; // nCr, returns with some of its synthesized attributes evaluated. rhaps visit some more children, including possibly M k or // M, again, while supplying the proper inherited attributes // and obtaining synthesized attributes in return.
// End of the visits to children. Compute some ofN ' s synthesized attributes, the set $NI; Leave to the parent; // End of visit 1 from the parent.
// Visit 2 from the parent: flow of control re-enters here. // The parent has set some inherited attributes, theI N zset . ... // Again visit some children while supplying inherited // attributes and obtaining synthesized attributes in return. $NZ; Compute same ofN's synthesized attributes, the set Leave to the parent; // End of visit2 from the parent.
... // Perhaps code for some more visits 3..n from the parent, // supplying setsIN3 to INn and yielding // sets SN3 to SNn.
Outline code for multi-visit attribute evaluation.
scheme This is called on: the flow of control pays multiple visits toeach node, accord in^ to a scheme fixed at compiler generation time. It canbe imple~entedas a tree-walker, which executes the codesequentia~lyand moves the flow of
control to the children or the parent as indicated;it will need a stack to leave to the correct position in the parent. Alternatively, and more usually, multi-visit attribute evaluation is implemented by recursive descent. Each visit from the parent is then implemented as a separateroutine,a visitin. r ~ u t i ~which e , evaluatesthe approp~ateattributerulesand calls the appropriate visit routines of the children. In this implementation, the 'leave to parent' at the end of each visit is implemented as a return statement and the leave stack is a c c o ~ o d a t e din the return stack.
c " " " " " _
\
I
f i -thvisitto
...- '
I
c - " " " " " " " " _ _
N
(ZMk)~ ,\
h'
(SMk~"""..
" " " " " " " " " "
I'
@
i
'*
igure 3.25 The i-th visit to a nodeN, visiting two children, M, and Ml.
Figure 3.25 showsadiagram
of thei-thvisittoanode
for theproductionrule
N+MrM z..., during which the routine for that node visits two of its children, M k and Ml.
The flow of control is indicated by the numbered dotted mows, the data flow by the solid arrows. In analogy to the notationINi for the set of inherited attributes to be supplied toN on the i-th visit, the notation( I ~ kindicates ) ~ the set of inherited attributes tobe supplied to ~k on the i-th visit. The parent of the node has prepared for the visit by computing the and these are supplied to the node for N (1). Assuming inherited attributes in the set INi,
that the first thing the i-th visit toa node of that type has todo is to perform the h-th visit to Mk (2), the routine computes the inherited attributes ( I M ~(3), ) ~ using the data dependencies from the dependency graph for the production rule N+M M2 .... These are passed to the node of type Mk, and its h-th visiting routine is called (4). This call returns with the ) ~ (5). One of these is combined with an attrib~tevalue synthesizedattributes ( S M ~ set ))~ for the j-th visit to M~(6). from INi to produce the inherited attributes( I M ~('7 (8) supplies back the values of the attributes in ( S M ~ (9). ) ~ Finally the synthesized attributes in SNi are computed (lo), and the routine returns (l 1). Note that during the visits to Mk and M Lthe flow of control ((4) and (8)) and the data flow (solid arrows) coincide; this is because we cannot see what happensinside these visits. An il~portantobservation about the sets INl.,ptand SN1,., is in order here. INi is associated with the start of the i-th visit by the parent and SNi with the i-thleave to the parent. The parent of the node for N must of course adhere to thisinterface, but the parent does not know which production rule for N has produced the child it is about to visit. So the sets INI.., and SNI , . n must be the same for all production rules for N. they are a property of the non-ter~inaliV rather than of each separate production rulefor N. Similarly, all visiting routines for production rules in the grammar that contain the n o n - t e ~ i n aN l in the right-hand side mustcall the visiting routines of N in the same order 1..n. If N occurs more than once in one production rule, each occurrence must get its own visiting sequence, which mustconsist of routine calls in that same orderl..n. It should also be pointed out that there is no reason why one single visiting routine could not visit a child more thanonce. The visits can even be consecutive,if dependencies in general. in other production rules require more than one visit Toobtain a multi-visitattributeevaluator, we will first showthat once weknow acceptable IN and SN sets for all non-te~inalswe can construct a multi-visit attribute evaluator, and we will then see how to obtain such sets. S
The above outline of the multiple visits to a node for a production rule of N partitions the attributes of N into a list of pairs of sets (IN1, SN1), (IN2, SN2), ..., (INn, SNn) for what is called an n-visit. Visit i uses the attributes in INi, which were set by the parent, visits some children some number of times in some order, and returns after having set the attributes in SNi. The sets IN1..n must together contain all inherited attributes of N, and SN1..n all its synthesized attributes, since each attribute must in the end receive a value some way or another. None of the INi and SNi can be empty, except IN1 and perhaps SNn. We can see this as follows. If an INi were empty, the visit from the parent it is associated with would not supply any new information, and the visit could be combined with the previous visit. The only exception is the first visit from the parent, since that one has no previous visit. If an SNi were empty, the leave to the parent it is associated with would not supply any new information to the parent, and the leave would be useless. An exception might be the last visit to a child, if the only purpose of that visit is an action that does not influence the attributes, for example producing an error message. But actually that is an improper use of attribute grammars, since in theory even error messages should be collected in an attribute and produced as a synthesized attribute of the start symbol.
Given an acceptable partitioning (INi, SNi) for i = 1..n, it is relatively simple to generate the corresponding multi-visit attribute evaluator. We will now consider how this can be done and will at the same time see what the properties of an 'acceptable' partitioning are. The evaluator we are about to construct consists of a set of recursive routines. There are n routines for each production rule P: N->M1... for non-terminal N, one for each of the n visits, with n determined by N. So if there are p production rules for N, there will be a total of p x n visit routines for N. Assuming that P is the k-th alternative of N, a possible name for the routine for the i-th visit to that alternative might be Visit_i to N alternative_k. During this i-th visit, it calls the visit routines of some of the M1... in P. When a routine calls the i-th visit routine of a node N, it knows statically that it is called for a node of type N, but it still has to find out dynamically which alternative of N is represented by this particular node. Only then can the routine for the i-th visit to the k-th alternative of N be called. So the routine Visit_i to N () contains calls to the routines Visit_i to N alternative_k (), as shown in Figure 3.26, for all required values of k.

PROCEDURE Visit_i to N (pointer to an N node Node):
    SELECT Node .type:
        CASE alternative_1: Visit_i to N alternative_1 (Node);
        ...
        CASE alternative_k: Visit_i to N alternative_k (Node);
        ...

Figure 3.26 Structure of an i-th visit routine for N.
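If the visit routines are written in C rather than in the pseudocode above, the dispatch of Figure 3.26 becomes a switch on a tag stored in the node. The sketch below is an illustration only; the node layout, the restriction to two alternatives, and the stub routine bodies are assumptions, not generated evaluator code.

#include <stdlib.h>

/* Hypothetical node for a non-terminal N with two alternatives; the
   children and attribute fields are omitted for brevity. */
typedef struct N_node {
    int alternative;                 /* which production rule applies: 1 or 2 */
} N_node;

/* One routine per (visit, alternative); in a real evaluator their bodies
   are generated from the attribute rules of the corresponding production. */
static void Visit_1_to_N_alternative_1(N_node *node) { (void)node; /* generated */ }
static void Visit_1_to_N_alternative_2(N_node *node) { (void)node; /* generated */ }

/* The dispatch routine of Figure 3.26: the caller knows statically that the
   node is an N, but the alternative must be discovered dynamically. */
void Visit_1_to_N(N_node *node) {
    switch (node->alternative) {
    case 1: Visit_1_to_N_alternative_1(node); break;
    case 2: Visit_1_to_N_alternative_2(node); break;
    default: abort();                /* malformed AST */
    }
}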
We will now discuss how we can determine which visit routines to call in which order inside a visiting routine Visit_i to N alternative_k (), based on information gathered during the generation of the routines Visit_h to N () for h < i.

The IS-SI graph of Figure 3.18 allows more than one acceptable partitioning, among them ({I1}, {S1}), ({I2, I3}, {S2}). Now the idea behind ordered attribute grammars is that the later an attribute is evaluated, the smaller the chance that its evaluation will cause a cycle. This suggests that the partitioning that evaluates the attributes as late as possible is preferable. This late evaluation idea is used as follows to derive a partitioning from an IS-SI graph. We want attributes to be evaluated as late as possible; the attribute evaluated last cannot have any other attribute depending on it, so its node in the IS-SI graph cannot have outgoing data-flow arrows. This observation can be used to find the synthesized attributes in SN_last; note that we cannot write SNn, since we do not yet know the value of n, the number of visits required. SN_last contains all synthesized attributes in the IS-SI graph on which no other attributes depend; these are exactly those that have no outgoing arrows. Next, we remove the attributes in SN_last from the IS-SI graph. This exposes a layer of inherited attributes that have no outgoing data-flow arrows; these make up IN_last, and are removed from the IS-SI graph. This process is repeated for the pair (IN_last-1, SN_last-1), and so on, until the IS-SI graph has been consumed completely. Note that this makes all the sets in the partitioning non-empty except perhaps IN1, the last set to be created: it may find the IS-SI graph empty already. We observe that this algorithm indeed produces the partitioning ({I1}, {S1}), ({I2, I3}, {S2}) for the IS-SI graph of Figure 3.18.

The above algorithms can be performed without problems for any strongly cycle-free attribute grammar, and will provide us with attribute partitionings for all symbols in the grammar. Moreover, the partitioning for each non-terminal N conforms to the IS-SI graph for N, since it was derived from it. So adding the data dependencies arising from the partitioning to the IS-SI graph of N will not cause any direct cycle inside that IS-SI graph. But still the fact remains that dependencies are added, and these may cause larger cycles, cycles involving more than one non-terminal, to arise. So, before we can start generating code, we have to run the cycle-testing algorithm again. If the test does not find any cycles, the grammar is an ordered attribute grammar, and the partitionings can be used to generate attribute evaluation code. This code will
- not loop on any parse tree, since the final set of IS-SI graphs was shown to be cycle-free;
- never use an attribute whose value has not yet been set, since the moment at which an attribute is evaluated is determined by the partitioning, and the partitionings conform to the IS-SI graphs used and so to the dependencies;
- evaluate the correct values before each visit to a node and before each return from it, since the code scheme in Figure 3.24 obeys the partitioning.
Very many, not to say almost all, attribute grammars that one writes naturally turn out to be ordered, which makes the notion of an ordered attribute grammar a very useful one. We have explained the technique using terms like the k-th visit out of n visits, which somehow suggests that considerable numbers of visits may occur. We found it advantageous to imagine that for a while, while trying to understand the algorithms, since thinking so made it easier to focus on the general case. But in practice visit numbers larger than 3 are rare; most of the nodes need to be visited only once, some may need two visits, a small minority may need three visits, and in most attribute grammars no node needs to be visited four times. Of course it is possible to construct a grammar with a non-terminal X whose nodes require, say, 10 visits, but one should realize that its partitioning consists of 20 non-overlapping sets, IN1..10 and SN1..10, and that only the set IN1 may be empty. So X will have to have at least 9 inherited attributes and 10 synthesized attributes. This is not the kind of non-terminal one normally meets during compiler construction.
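The late-evaluation derivation of the partitionings described above can be programmed as repeatedly peeling off the attributes on which no other attribute depends, alternating between a synthesized and an inherited layer. The C sketch below is an illustration only: the adjacency-matrix representation of the IS-SI graph and all names are assumptions, and a real generator would record which attribute ends up in which set rather than just count them.

#include <stdio.h>

#define MAX_ATTRS 8

/* Toy IS-SI graph for one non-terminal: arrow[a][b] != 0 means there is a
   data-flow arrow from attribute a to attribute b (b depends on a). */
typedef struct {
    int n_attrs;
    int is_synthesized[MAX_ATTRS];
    int arrow[MAX_ATTRS][MAX_ATTRS];
} IS_SI_graph;

/* Remove all still-present attributes of the requested kind that have no
   outgoing arrows to still-present attributes; return how many were removed. */
static int peel_layer(const IS_SI_graph *g, int present[], int want_synthesized) {
    int peeled[MAX_ATTRS], count = 0;
    for (int a = 0; a < g->n_attrs; a++) {
        if (!present[a] || g->is_synthesized[a] != want_synthesized) continue;
        int has_outgoing = 0;
        for (int b = 0; b < g->n_attrs; b++)
            if (present[b] && g->arrow[a][b]) { has_outgoing = 1; break; }
        if (!has_outgoing) peeled[count++] = a;
    }
    for (int i = 0; i < count; i++) present[peeled[i]] = 0;
    return count;
}

/* Produce the layers SN_last, IN_last, SN_last-1, ... by late evaluation. */
void derive_partitioning(const IS_SI_graph *g) {
    int present[MAX_ATTRS], left = g->n_attrs;
    for (int a = 0; a < g->n_attrs; a++) present[a] = 1;
    while (left > 0) {
        int n_syn = peel_layer(g, present, 1);   /* a set SN_i, from the back */
        int n_inh = peel_layer(g, present, 0);   /* the matching set IN_i */
        printf("layer: %d synthesized, %d inherited attribute(s)\n", n_syn, n_inh);
        if (n_syn + n_inh == 0) { printf("IS-SI graph contains a cycle\n"); return; }
        left -= n_syn + n_inh;
    }
}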
The ordered attribute grammar for the octal/decimal example
We will now apply the ordered attribute grammar technique to our attribute grammar of Figure 3.8, to obtain a multi-visit attribute evaluator for that grammar. We will at the same time show how the order of the calls to visit routines inside one Visit_i to N alternative_k () routine is determined. The IS-SI graphs of the non-terminals in the grammar, Number, Digit_Seq, Digit, and Base_Tag, are constructed easily; the results are shown in Figure 3.27. We find no cycles during their construction and see that there are no SI-dependencies: this reflects the fact that no non-terminal has a synthesized attribute whose value is propagated through the rest of the tree to return to the node it originates from. The next step is to construct the partitionings. Again, this is easy to do, since each IS-SI graph contains at most one inherited and one synthesized attribute. The table in Figure 3.28 shows the results. As we have seen above, the j-th visit to a node of type Mx in Figure 3.24 is a building block for setting the attributes in (SMx)j:

// Require the attributes needed to compute the
// attributes in (IMx)j to be set;
Compute the set (IMx)j;
Visit child Mx for the j-th time;
// Child Mx returns with the set (SMx)j evaluated.
but it can only be applied in an environment in which the values of the attributes in (IMx)j are available or can be evaluated. With this knowledge we can now construct the code for the first (and only) visit to nodes of the type Number. Number has only one alternative, Number alternative_1, so the code we are about to generate will be part of a routine Visit_1 to Number alternative_1 (). The alternative consists of a Digit_Seq and a Base_Tag. The set E of attributes that have already been evaluated is empty at this point, and next_visit_number of Digit_Seq and next_visit_number of Base_Tag are both zero. The building block for visiting Digit_Seq is
Figure 3.27 The IS-SI graphs of the non-terminals from the grammar of Figure 3.8.
              IN1         SN1
Number        { }         { value }
Digit_Seq     { base }    { value }
Digit         { base }    { value }
Base_Tag      { }         { base }

Figure 3.28 Partitionings of the attributes of the grammar of Figure 3.8.
// Requires Number alt_1 .Base_Tag .base to be set.
// Compute the attributes in IN1 of Digit_Seq, the set { base }:
SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;
// Visit Digit_Seq for the first time:
Visit_1 to Digit_Seq (Number alt_1 .Digit_Seq);
// Digit_Seq returns with its SN1, the set { value }, evaluated;
// it supplies Number alt_1 .Digit_Seq .value.
and the one for Base_Tag is
// Requires nothing.
// Compute the attributes in IN1 of Base_Tag, the set { }:
// Visit Base_Tag for the first time:
Visit_1 to Base_Tag (Number alt_1 .Base_Tag);
// Base_Tag returns with its SN1, the set { base }, evaluated;
// it supplies Number alt_1 .Base_Tag .base.
Their data requirements have been shown as comments in the first line; they derive from the sets IN1 of Digit_Seq and Base_Tag, as transformed by the data dependencies of the attribute evaluation rules. For example, IN1 of Digit_Seq says that the first visit requires Number alt_1 .Digit_Seq .base to be set. The attribute evaluation rule for this is

SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;

whose data dependency requires Number alt_1 .Base_Tag .base to be set. The value of Number alt_1 .Base_Tag .base is not in E at the moment, so the building block for visiting Digit_Seq cannot be generated at this moment.

Next we turn to the building block for visiting Base_Tag, also shown above. This building block requires no attribute values to be available, so we can generate code for it. The set SN1 of Base_Tag shows that the building block sets the value of Number alt_1 .Base_Tag .base, so Number alt_1 .Base_Tag .base is added to E. This frees the way for the building block for visiting Digit_Seq, code for which is generated next. The set SN1 of Digit_Seq consists of the attribute value, so we can add Number alt_1 .Digit_Seq .value to E.

There are no more visits to generate code for, and we now have to wrap up the routine Visit_1 to Number alternative_1 (). The set SN1 of Number contains the attribute value, so code for setting Number .value must be generated. The attribute evaluation rule in Figure 3.8 shows that Number .value is just a copy of Number alt_1 .Digit_Seq .value, which is available, since it is in E. So the code can be generated, and the attribute grammar turns out to be an ordered attribute grammar, at least as far as Number is concerned. All these considerations result in the code of Figure 3.29. Note that we have effectively been doing a topological sort on the building blocks, using the data dependencies to compare building blocks.

For good measure, and to allow comparison with the corresponding routine for the data-flow machine in Figure 3.14, we give the code for visiting the first alternative of Digit_Seq in Figure 3.30. In this routine, the order in which the two children are visited is immaterial, since the data dependencies are obeyed both in the order (Digit_Seq, Digit) and in the order (Digit, Digit_Seq). Similar conflict-free constructions are possible for Digit and Base_Tag, so the grammar of Figure 3.8 is indeed an ordered attribute grammar, and we have constructed automatically an attribute evaluator for it. The above code indeed visits each node of the integer number only once.
PROCEDURE Visit_1 to Number alternative_1 (
    pointer to number node Number,
    pointer to number alt_1 node Number alt_1
):
    // Visit 1 from the parent: flow of control from the parent enters here.
    // The parent has set the attributes in IN1 of Number, the set { }.

    // Visit some children:
    // Compute the attributes in IN1 of Base_Tag, the set { }:
    // Visit Base_Tag for the first time:
    Visit_1 to Base_Tag (Number alt_1 .Base_Tag);
    // Base_Tag returns with its SN1, the set { base }, evaluated.

    // Compute the attributes in IN1 of Digit_Seq, the set { base }:
    SET Number alt_1 .Digit_Seq .base TO Number alt_1 .Base_Tag .base;
    // Visit Digit_Seq for the first time:
    Visit_1 to Digit_Seq (Number alt_1 .Digit_Seq);
    // Digit_Seq returns with its SN1, the set { value }, evaluated.
    // End of the visits to children.

    // Compute the attributes in SN1 of Number, the set { value }:
    SET Number .value TO Number alt_1 .Digit_Seq .value;

Figure 3.29 Visiting code for Number nodes.
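The manual construction above - keep a set E of evaluated attributes, emit any building block whose requirements are in E, add what it sets to E, and repeat - is the topological sort on the building blocks just mentioned. A minimal C sketch of that scheduling loop follows; representing attribute sets as bit masks and the block descriptions themselves are assumptions made for the illustration.

#include <stdio.h>

/* A building block needs some attributes to be in E already and sets others.
   Attributes are numbered 0..31 and sets are bit masks; this is a toy setup. */
typedef struct {
    const char *name;
    unsigned needs;     /* attributes that must already be evaluated */
    unsigned sets;      /* attributes this block evaluates */
    int emitted;
} Building_block;

/* Emit the blocks in some order that respects the data dependencies;
   returns 0 if no such order exists (the grammar is not ordered). */
int schedule_blocks(Building_block block[], int n_blocks) {
    unsigned E = 0;                          /* the set of evaluated attributes */
    int n_emitted = 0;
    while (n_emitted < n_blocks) {
        int progress = 0;
        for (int i = 0; i < n_blocks; i++) {
            if (block[i].emitted || (block[i].needs & ~E) != 0) continue;
            printf("emit code for %s\n", block[i].name);  /* generate its code */
            E |= block[i].sets;
            block[i].emitted = 1;
            n_emitted++; progress = 1;
        }
        if (!progress) return 0;             /* a dependency cannot be satisfied */
    }
    return 1;
}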
Of course, numbers of the form [0-9]+[BD] can be and normally are handled by the lexical analyzer, but that is beside the point. The point is, however, that

- the grammar for Number is representative of those language constructs in which information from further on in the text must be used, and
- the algorithms for ordered attribute evaluation have found out automatically that no node needs to be visited more than once in this case, provided the nodes are visited in the right order. See Exercises 3.6 and 3.7 for situations in which more than one visit is necessary.

The above construction was driven by the contents of the partitioning sets and the data dependencies of the attribute evaluation rules. This suggests a somewhat simpler way of constructing the evaluator while avoiding testing the partitionings for being appropriate:
- Construct the IS-SI graphs while testing for circularities.
- Construct from the IS-SI graphs the partitionings, using late evaluation.
- Construct the code for the visiting routines, starting from the obligation to set the attributes in SNk and working backwards from there, using the data dependencies and the IN and SN sets of the building blocks supplied by the other visit routines as our guideline.

If we can construct all visit routine bodies without violating the data dependencies, we have proved that the grammar was ordered and have at the same time obtained the multi-visit attribute evaluation code.
PROCEDURE Visit_1 to Digit_Seq alternative_1 (
    pointer to digit_seq node Digit_Seq,
    pointer to digit_seq alt_1 node Digit_Seq alt_1
):
    // Visit 1 from the parent: flow of control from the parent enters here.
    // The parent has set the attributes in IN1 of Digit_Seq, the set { base }.

    // Visit some children:
    // Compute the attributes in IN1 of Digit_Seq, the set { base }:
    SET Digit_Seq alt_1 .Digit_Seq .base TO Digit_Seq .base;
    // Visit Digit_Seq for the first time:
    Visit_1 to Digit_Seq (Digit_Seq alt_1 .Digit_Seq);
    // Digit_Seq returns with its SN1, the set { value }, evaluated.

    // Compute the attributes in IN1 of Digit, the set { base }:
    SET Digit_Seq alt_1 .Digit .base TO Digit_Seq .base;
    // Visit Digit for the first time:
    Visit_1 to Digit (Digit_Seq alt_1 .Digit);
    // Digit returns with its SN1, the set { value }, evaluated.
    // End of the visits to children.

    // Compute the attributes in SN1 of Digit_Seq, the set { value }:
    SET Digit_Seq .value TO
        Digit_Seq alt_1 .Digit_Seq .value * Digit_Seq .base +
        Digit_Seq alt_1 .Digit .value;

Figure 3.30 Visiting code for Digit_Seq alternative_1 nodes.
This technique is more in line with the usual compiler construction approach: just try to generate correct, efficient code; if you can, you win, no questions asked. Farrow (1984) discusses a more complicated technique that creates attribute evaluators for almost any non-cyclic attribute grammar, ordered or not.
There is a series of restrictions that reduce the most general attribute grammars to ordered attribute grammars. The important point about these restrictions is that they increase considerably the algorithmic tractability of the grammars but are almost no obstacle to the compiler writer who uses the attribute grammar.

The first restriction is that all synthesized attributes of a production and all inherited attributes of its children must get values assigned to them in the production. Without this restriction, the attribute grammar is not even well-formed.

The second is that no tree produced by the grammar may have a cycle in the attribute dependencies. This property is tested by constructing for each non-terminal N a summary, the IS-SI graph set, of the data-flow possibilities through all subtrees deriving from N. The test for this property is exponential in the number of attributes in a non-terminal and identifies non-cyclic attribute grammars. In spite of its exponential time requirement the test is feasible for 'normal' attribute grammars on present-day computers.

The third restriction is that the grammar still be non-cyclic even if a single IS-SI graph is used per non-terminal rather than an IS-SI graph set. The test for this property is linear in the number of attributes in a non-terminal and identifies strongly non-cyclic attribute grammars.

The fourth restriction requires that the attributes can be evaluated using the multi-visit scheme of Figure 3.24. This leads to multi-visit attribute grammars. Multi-visit attribute grammars have a partitioning for the attributes of each non-terminal, as described above. Testing whether an attribute grammar is multi-visit is exponential in the total number of attributes in the worst case, and therefore prohibitively expensive (in the worst case).

The fifth restriction is that the partitionings constructed heuristically using the late evaluation criterion turn out to be acceptable and not create any new cycles. This leads to ordered attribute grammars. The test is O(n^2), where n is the number of attributes per non-terminal, if implemented naively, and O(n ln n) in theory, but since n is usually small, this makes little difference.

Each of these restrictions is a real restriction, in that the class it defines is a proper subclass of the class above it. So there are grammars that are non-cyclic but not strongly non-cyclic, strongly non-cyclic but not multi-visit, and multi-visit but not ordered. But these 'difference' classes are very small, and for all practical purposes the above classes form a single class, 'the attribute grammars'.

In the following two sections we will discuss two classes of attribute grammars that result from far more serious restrictions: the 'L-attributed grammars', in which an inherited attribute of a child of a non-terminal N may depend only on synthesized attributes of children to the left of it in the production rule for N and on the inherited attributes of N itself; and the 'S-attributed grammars', which cannot have inherited attributes at all.
The parsing process constructs the nodes in the syntax tree in left-to-right order: first the parent node and then the children in top-down parsing, and first the children and then the parent node in bottom-up parsing. It is interesting to consider attribute grammars that can match this behavior: attribute grammars which allow the attributes to be evaluated in one left-to-right traversal of the syntax tree. Such grammars are called L-attributed grammars. An L-attributed grammar is characterized by the fact that in the dependency graph of each of its production rules no data-flow arrow points from a child to that same child or to a child to the left of it. Many programming language grammars are L-attributed; this is not surprising, since the left-to-right information flow inherent in them helps programmers in reading and understanding the resulting programs. An example is the dependency graph of the rule for Constant_definition in Figure 3.3, in which no information flows from Expression to Defined_identifier. The human reader, like the parser and the attribute evaluator, arrives at a Constant_definition with a symbol table, sees the defined identifier and the expression, combines the two in the symbol table, and leaves the Constant_definition behind. An example of an attribute grammar that is not L-attributed is the Number grammar from Figure 3.8: the data-flow arrow for base points to the left, and in principle the reader has to read the entire digit sequence to find the B or D which tells how to interpret the sequence. Only the fact that a human reader can grasp the entire number in one glance saves him or her from this effort; computers are less fortunate.

The L-attributed property has an important consequence for the processing of the syntax tree: once work on a node has started, no part of the compiler will need to return to one of the node's siblings on the left to do processing there. The parser is finished with them, and all their attributes have been computed already. Only the data that the nodes contain in the form of synthesized attributes are still important. Figure 3.31 shows part of a parse tree for an L-attributed grammar.
Figure 3.31 Data flow in part of a parse tree for an L-attributed grammar.
We assume that the attribute evaluator is working on node C2, which is the second child of node B3, which is the third child of node A; whether A is the top or the child of another node is immaterial. The upward arrows represent the data flow of the synthesized attributes of the children; they all point to the right or to the synthesized attributes of the parent. All inherited attributes are already available when work on a node starts, and can be passed to any child that needs them. They are shown as downward arrows in the diagram. Figure 3.31 shows that when the evaluator is working on node C2, only two sets of attributes play a role:

- all attributes of the nodes that lie on the path from the top to the node being processed: C2, B3, and A;
- the synthesized attributes of the left siblings of those nodes: C1, B1, B2, and any left siblings of A not shown in the diagram.

More in particular, no role is played by the children of the left siblings of C2, B3, and A, since all computations in them have already been performed and the results are summarized in their synthesized attributes. Nor do the right siblings of C2, B3, and A play a role, since their synthesized attributes have no influence yet. The attributes of C2, B3, and A reside in the corresponding nodes; work on these nodes has already started but has not yet finished. The same is not true for the left siblings of C2, B3, and A, since the work on them is finished; all that is left of them are their synthesized attributes.

Now, if we could find a place to store the data synthesized by these left siblings, we could discard each node in left-to-right order, after the parser has created it and the attribute evaluator has computed its attributes. That would mean that we do not need to construct the entire syntax tree but can restrict ourselves at all times to the nodes that lie on the path from the top to the node being processed. Everything to the left of that path has been processed and, except for the synthesized attributes of the left siblings, discarded; everything to the right of it has not been touched yet. A place to store the synthesized attributes of left siblings is easily found: we store them in the parent node. The inherited attributes remain in the nodes they belong to, and their values are transported down along the path from the top to the node being processed. This structure is exactly what top-down parsing provides. This correspondence allows us to write the attribute processing code between the various members, to be performed when parsing passes through.
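In a hand-written narrow compiler this correspondence shows up directly in the recursive descent routines: inherited attributes become parameters, synthesized attributes become return values, and the attribute code is written between the calls for the members. The following self-contained C sketch illustrates the idea for an invented mini-grammar (digits separated by '-', with the base as an inherited attribute); it is not one of the book's examples.

#include <stdio.h>

static const char *input;                    /* trivial hand-made 'lexer' */

/* Digit(INH base, SYN value): the inherited attribute arrives as a
   parameter, the synthesized attribute is the return value. */
static int parse_digit(int base) {
    int value = *input++ - '0';
    return value < base ? value : 0;         /* crude error handling */
}

/* Expression(INH base, SYN value) -> Digit { '-' Digit }
   The attribute code sits between the member parses. */
static int parse_expression(int base) {
    int value = parse_digit(base);
    while (*input == '-') {
        input++;
        value -= parse_digit(base);          /* left-to-right evaluation */
    }
    return value;
}

int main(void) {
    input = "7-3-2";
    printf("%d\n", parse_expression(10));    /* prints 2 */
    return 0;
}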
An example of a system for handling L-attributed grammars is LLgen; LLgen was explained in Section 2.2.4.6, but the sample code in Figure 2.76 featured synthesized attributes only, representing the values of the expression and its subexpressions. Figure 3.32 includes an inherited attribute as well: a symbol table which contains the representations of some identifiers, together with the integer values associated with these identifiers. This symbol table is produced as a synthesized attribute by the non-terminal declarations in the rule main, which processes one or more identifier declarations. The symbol table is then passed as an inherited attribute down through expression and expression_tail_option, to be used finally in term to look up the value of the identifier found. This results in the synthesized attribute *t, which is then passed on upwards. For example, the input b = 9, c = 5; b - c, passed to the program produced by LLgen from the grammar in Figure 3.32, yields the output result = 4. Note that synthesized attributes in LLgen are implemented as pointers passed as inherited attributes, but this is purely an implementation trick of LLgen to accommodate the C language, which does not feature output parameters.

The coordination of parsing and attribute evaluation is a great simplification compared to multi-visit attribute evaluation, but is of course applicable to a much smaller class of attribute grammars. Many attribute grammars can be doctored to become L-attributed grammars, and it is up to the compiler constructor to decide whether to leave the grammar intact and use an ordered attribute evaluator generator, or to modify the grammar to adapt it to a system like LLgen. In earlier days much of compiler design consisted of finding ways to allow the - implicit - attribute grammar to be handled by a handwritten left-to-right evaluator, to avoid handwritten multi-visit processing.

The L-attributed technique allows a more technical definition of a narrow compiler than the one given in Section 1.4.1. A narrow compiler is a compiler, based formally or informally on some form of L-attributed grammar, that does not save substantially more information than that which is present on the path from the top to the node being processed.
{ #include "symbol_table.h" }

%lexical get_next_token_class;
%token IDENTIFIER;
%token DIGIT;
%start Main_Program, main;

main {symbol_table sym_tab; int result;}:
    {init_symbol_table(&sym_tab);}
    declarations(sym_tab)
    expression(sym_tab, &result)
    {printf("result = %d\n", result);}
;

declarations(symbol_table sym_tab):
    declaration(sym_tab) [ ',' declaration(sym_tab) ]* ';'
;

declaration(symbol_table sym_tab) {symbol_entry *sym_ent;}:
    IDENTIFIER {sym_ent = look_up(sym_tab, Token.repr);}
    '=' DIGIT  {sym_ent->value = Token.repr - '0';}
;

expression(symbol_table sym_tab; int *e) {int t;}:
    term(sym_tab, &t) {*e = t;}
    expression_tail_option(sym_tab, e)
;

expression_tail_option(symbol_table sym_tab; int *e) {int t;}:
    '-' term(sym_tab, &t) {*e -= t;}
    expression_tail_option(sym_tab, e)
|
;

term(symbol_table sym_tab; int *t):
    IDENTIFIER {*t = look_up(sym_tab, Token.repr)->value;}
;

Figure 3.32 LLgen code for an L-attributed grammar for simple expressions.
In most cases, the length of that path is proportional to ln n, where n is the length of the program, whereas the size of the entire AST is proportional to n. This, and the intuitive appeal of L-attributed grammars, explains the popularity of narrow compilers.
We have seen that the attribute evaluation in L-attributed grammars can be incorporated conveniently in top-down parsing, but its implementation using bottom-up parsing is less obvious. In fact, it seems impossible. The problem lies in the inherited attributes, which must be passed down from parent nodes to children nodes. The problem is that in bottom-up parsing the parent nodes are identified and created only after all of their children have been processed, so there is just no place from which to pass down any inherited attributes when they are needed. Yet the most famous LALR(1) parser generator, yacc, does it anyway, and it is interesting to see how it accomplishes this feat.

As explained in Section 2.2.5.2, a bottom-up parser has a stack of shifted terminals and reduced non-terminals; we parallel this stack with an attribute stack which contains the attributes of each stack element in that same order. The problem is to fill the inherited attributes, since code has to be executed for it. Code in a bottom-up parser can only be executed at the end of an alternative, when the corresponding item has been fully recognized and is being reduced. But now we want to execute code in the middle:

A -> B {C.inh_attr := f(B.syn_attr);} C

Here B.syn_attr is a synthesized attribute of B and C.inh_attr is an inherited attribute of C. The trick is to attach the code to an ε rule introduced for the purpose, say A_action1:

A -> B A_action1 C
A_action1 -> ε {C.inh_attr := f(B.syn_attr);}

Yacc does this automatically and also remembers the context of A_action1, so B.syn_attr and C.inh_attr can be identified in spite of having been moved out of their scopes by the above transformation. Now the code in A_action1 is at the end of an alternative and can be executed when the item for A_action1 -> ε is reduced. This works, but the problem is that after this transformation the grammar may no longer be LALR(1): introducing ε rules is bad for bottom-up parsers. The parser will work only if the item introduced for A_action1 is the only one in the set of hypotheses at that point. Only then can the parser be confident that this is the right item and that the code can be executed. This also ensures that the parent node is A, so the parser already knows it is going to construct a parent node A. These are severe requirements. Fortunately, there are many grammars with only a small number of inherited attributes, so the method is still useful.

There are a number of additional tricks to obtain cooperation between attribute evaluation and bottom-up parsing. One is to lay out the attribute stack so that the one and only synthesized attribute of one node is in the same position as the one and only inherited attribute of the next node. This way no code needs to be executed in between, and the problem of executing code in the middle of a grammar rule is avoided. See the yacc or bison manual for details and notation.
If inherited attributes are a problem, let's get rid of them. This gives us S-attributed grammars, which are characterized by having no inherited attributes at all. It is remarkable how much can still be done within this restriction. In fact, anything that can be done in an L-attributed grammar can be done in an S-attributed grammar, as we will show in Section 3.1.9. Now life is easy for bottom-up parsers. Each child node stacks its synthesized attributes, and the code at the end of an alternative of the parent scoops them all up, processes them, and replaces them by the resulting synthesized attributes of the parent. A typical example of an S-attributed grammar can be found in the yacc code in Figure 2.108. The code at the end of the first alternative of expression:

{$$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3;}

picks up the synthesized attributes of the children, $1 and $3, and combines them into the synthesized attribute of the parent, $$. For historical reasons, yacc grammar rules each have exactly one synthesized attribute; if more than one synthesized attribute has to be returned, they have to be combined into a record, which then forms the only attribute. This is comparable to functions allowing only one return value in most programming languages.
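The mechanism behind this can be simulated in a few lines of C: the synthesized attributes of the symbols on the parse stack live on a parallel attribute stack, and a reduction pops the children's attributes and pushes the parent's. The sketch below hand-simulates one reduction of expression -> expression '-' term; the stack layout and the driver in main are assumptions for the illustration, not yacc's actual implementation.

#include <stdio.h>

#define STACK_SIZE 64

/* Attribute stack parallel to the parser stack: one synthesized attribute
   (an int in this toy) per grammar symbol on the parse stack. */
static int attr_stack[STACK_SIZE];
static int attr_sp = 0;

static void shift_attr(int value) { attr_stack[attr_sp++] = value; }

/* Reduce by expression -> expression '-' term: pop $1, $2 and $3,
   push the parent's synthesized attribute $$. */
static void reduce_subtraction(void) {
    int term = attr_stack[--attr_sp];        /* $3 */
    --attr_sp;                               /* $2, the '-' token, carries nothing */
    int expr = attr_stack[--attr_sp];        /* $1 */
    shift_attr(expr - term);                 /* $$ */
}

int main(void) {
    /* Hand-simulated parse of "9 - 5": shift 9, '-', 5, then reduce. */
    shift_attr(9); shift_attr(0); shift_attr(5);
    reduce_subtraction();
    printf("%d\n", attr_stack[attr_sp - 1]); /* prints 4 */
    return 0;
}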
It is relatively easy to convert an L-attributed grammar into an S-attributed grammar, but, as is usual with grammar transformations, this conversion does not improve its looks. The basic trick is to delay any computation that cannot be done now to a later moment when it can be done. More in particular, any computation that would need inherited attributes is replaced by the creation of a data structure specifying that computation and all its synthesized attributes. This data structure (or a pointer to it) is passed on as a synthesized attribute up to the level where the missing inherited attributes are available, either as constants or as synthesized attributes of nodes at that level. Then we do the computation.

The traditional example of this technique is the processing of variable declarations in a C-like language; an example of such a declaration is int i, j;. When inherited attributes are available, this processing can be described easily by the L-attributed grammar in Figure 3.33. Here the rule Type_Declarator produces a synthesized attribute type, which is then passed on as an inherited attribute to Declared_Idf_Sequence and Declared_Idf. It is combined in the latter with the representation provided by Idf, and the combination is added to the symbol table. In the absence of inherited attributes, Declared_Idf can do only one thing: yield the representation repr as a synthesized attribute, as shown in Figure 3.34. The various reprs resulting from the occurrences of Declared_Idf in Declared_Idf_Sequence are collected into a data structure, which is yielded as the synthesized attribute repr list. Finally this list reaches the level at which the type is known and where the delayed computations can be performed.

It will be clear that this technique can in principle be used to eliminate all inherited attributes, at the expense of introducing more synthesized attributes and moving more code up the tree.
Declaration ->
    Type_Declarator(type) Declared_Idf_Sequence(type) ';'
;

Declared_Idf_Sequence(INH type) ->
    Declared_Idf(type)
|
    Declared_Idf_Sequence(type) ',' Declared_Idf(type)
;

Declared_Idf(INH type) ->
    Idf(repr)
    ATTRIBUTE RULES:
        Add to symbol table (repr, type);
;

Figure 3.33 Sketch of an L-attributed grammar for Declaration.
Declaration ->
    Type_Declarator(type) Declared_Idf_Sequence(repr list) ';'
    ATTRIBUTE RULES:
        FOR EACH repr IN repr list:
            Add to symbol table (repr, type);
;

Declared_Idf_Sequence(SYN repr list) ->
    Declared_Idf(repr)
    ATTRIBUTE RULES:
        SET repr list TO Convert to list (repr);
|
    Declared_Idf_Sequence(old repr list) ',' Declared_Idf(repr)
    ATTRIBUTE RULES:
        SET repr list TO Append to list (old repr list, repr);
;

Declared_Idf(SYN repr) ->
    Idf(repr)
;

Figure 3.34 Sketch of an S-attributed grammar for Declaration.
In this way, any L-attributed grammar can be converted into an S-attributed one. Of course, in some cases some of the attribute code will have to be moved right to the top of the tree, in which case the conversion automatically creates a separate postprocessing phase. This shows that in principle one scan over the input is enough.
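Inside the compiler, the 'delay the computation' trick of Figure 3.34 amounts to passing a list of postponed items upwards and executing the postponed work once the missing information becomes available. The C sketch below only illustrates that data structure; the list type, the symbol-table stub, and the hand-simulated processing of int i, j; are all assumptions.

#include <stdio.h>
#include <stdlib.h>

/* The synthesized attribute of Declared_Idf_Sequence in Figure 3.34:
   a list of the identifier representations collected so far. */
typedef struct Repr_list {
    const char *repr;
    struct Repr_list *next;
} Repr_list;

static Repr_list *append_to_list(Repr_list *old, const char *repr) {
    Repr_list *node = malloc(sizeof *node);
    node->repr = repr;
    node->next = old;                        /* order does not matter here */
    return node;
}

/* Stand-in for the real symbol-table routine. */
static void add_to_symbol_table(const char *repr, const char *type) {
    printf("declare %s of type %s\n", repr, type);
}

int main(void) {
    /* S-attributed processing of "int i, j;": the reprs are collected while
       the type is not yet known ... */
    Repr_list *reprs = NULL;
    reprs = append_to_list(reprs, "i");
    reprs = append_to_list(reprs, "j");

    /* ... and the delayed computations are performed at Declaration level,
       where the type has become available. */
    for (Repr_list *p = reprs; p != NULL; p = p->next)
        add_to_symbol_table(p->repr, "int");
    return 0;
}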
The transformation from L-attributed to S-attributed grammars seems attractive: it allows stronger, bottom-up, parsing methods to be used for the more convenient L-attributed grammars. Unfortunately, the transformation is practically feasible for small problems only, and serious problems soon arise. For example, attempts to eliminate the entire symbol table as an inherited attribute (as used in Figure 3.2) lead to a scheme in which at the end of each visibility range the identifiers used in it are compared to those declared in it, and any identifiers not accounted for are passed on upwards to surrounding visibility ranges. Also, much information has to be carried around to provide relevant error messages. See Exercise 3.12 for a possibility to automate the process. Note that the code in Figures 3.33 and 3.34 dodges the problem by having the symbol table as a hidden variable, outside the domain of attribute grammars.
Notations like E .attr for an attribute deriving from grammar symbol E break down if there is more than one E in the grammar rule. A possible solution is to use E[1], E[2], etc., for the children and E for the non-terminal itself, as we did for Digit_Seq in Figure 3.8. More serious problems arise when the right-hand side is allowed to contain regular expressions over the grammar symbols, as in EBNF notation. Given an attribute grammar rule

Declaration_Sequence(SYN symbol table) ->
    Declaration*
    ATTRIBUTE RULES: ...

it is less than clear how the attribute evaluation code could access the symbol tables produced by the individual Declarations, to combine them into a single symbol table. Actually, it is not even clear exactly what kind of node must be generated for a rule with a variable number of children. As a result, most general attribute grammar systems do not allow EBNF-like notations. If the system has its own attribute rule language, another option is to extend this language with data access operations to match the EBNF extensions. L-attributed and S-attributed grammars have fewer problems here, since one can just write the pertinent code inside the repeated part. This is the approach taken in LLgen, and a possible form of the above rule for Declaration_Sequence in LLgen would be

Declaration_Sequence(struct Symbol_Table *Symbol_Table)
    {struct Symbol_Table st;}:
    {Clear_Symbol_Table(Symbol_Table);}
    [ Declaration(&st) {Merge_Symbol_Tables(Symbol_Table, &st);} ]*
;

given proper declarations of the routines Clear_Symbol_Table() and Merge_Symbol_Tables(). Note that LLgen uses square brackets [ ] for the grouping of grammatical constructs, to avoid confusion with the parentheses ( ) used for passing attributes to rules.
This concludes our discussion of grammar-based context handling. In this approach, the context is stored in attributes, and the grammatical basis allows the processing to be completely automatic (for attribute grammars) or largely automatic (for L- and S-attributed grammars). Figure 3.35 summarizes the possible attribute value flow through the AST for ordered attribute grammars, L-attributed grammars, and S-attributed grammars. Values may flow along branches from anywhere to anywhere in ordered attribute grammars, up one branch and then down the next in L-attributed grammars, and upwards only in S-attributed grammars.
Figure 3.35 Pictorial comparison of three types of attribute grammars: ordered, L-attributed, and S-attributed.
We will now discuss some manual methods, in which the context is stored in ad-hoc data structures, not intimately connected with the grammar rules. Of course most of the data structures are still associated with nodes of the AST, since the AST is the only representation of the program that we have.
Although attribute grammars are the only means we have at the moment for generating context-processing programs automatically, the more advanced attribute evaluation techniques for them are still in their infancy, and much context processing is done at a lower level, by writing code in a traditional language like C or C++. We will give here two methods to collect context information from the AST; one is completely manual and the other uses some reusable software. Whether this collected information is then stored in the nodes (as with an attribute grammar), stored in compiler tables, or consumed immediately is immaterial here: since it is all handy-work, it is up to the compiler writer to decide where to put the information. The two methods are symbolic interpretation and data-flow equations. Both start from the AST as produced by the syntax analysis, possibly already annotated to some extent, but both require more flow-of-control information than the AST directly provides. In particular, we need to know for each node its possible flow-of-control successor or successors. Although it is in principle possible to determine these successors while checking the context information, it is much more convenient to have the flow of control available in each node in the form of successor pointers. These pointers link the nodes in the AST together in an additional data structure, the control flow graph.

Roadmap
3.2 Manual methods
3.2.1 Threading the AST
3.2.2 Symbolic interpretation
3.2.3 Data-flow equations
3.2.4 Interprocedural data-flow analysis
3.2.5 Carrying the information upstream - live analysis
3.2.6 Comparing symbolic interpretation and data-flow equations
The control flow graph can be constructed statically by threading the tree, as follows. There is a threading routine for each node type; the threading routine for a node of type N gets a pointer to the node to be processed as a parameter, determines which production rule of N describes the node, and calls the threading routines of its children, in a recursive traversal of the AST. The set of routines maintains a global variable Last node pointer, which points to the last node processed on the control flow path, the dynamically last node. When a new node N on the control path is met during the recursive traversal, its address is stored in Last node pointer .successor, and Last node pointer is then made to point to N. Using this technique, the threading routine for a binary expression could have the following form:

PROCEDURE Thread binary expression (Expr node pointer):
    Thread expression (Expr node pointer .left operand);
    Thread expression (Expr node pointer .right operand);
    // link this node to the dynamically last node
    SET Last node pointer .successor TO Expr node pointer;
    // make this node the new dynamically last node
    SET Last node pointer TO Expr node pointer;
This makes the present node the successor of the last node of the right operand and registers it as the new dynamically last node. Figure 3.36 shows the AST and the control flow graph for the expression b*b - 4*a*c, where the pointers that make up the AST are shown as solid lines and the control flow graph is shown using arrows.
Figure 3.36 Control flow graph for the expression b*b - 4*a*c (initial and final situation).
Note that the first node visited by the threading is the - node at the top, but that the first node in the control flow graph is the leftmost b, as it should be. Statically, the - node is the first node, but dynamically, at run time, the leftmost b is the first node.

Threading code in C for the demo compiler from Section 1.2 is shown in Figure 3.37. The threading code for a node representing a digit is trivial; that for a binary expression node derives directly from the code for Thread binary expression given above. Since there is no first dynamically last node, a dummy node is used to play that role temporarily. At the end of the threading, the thread is terminated properly; its start is retrieved from the dummy node and stored in the global variable Thread_start, to be used by a subsequent interpreter or code generator.

A complication arises if the flow of control exits in more than one place from the tree below a node. This happens, for example, with the if-statement. Actually, there are two problems here. The first is that the node that corresponds to the run-time then/else decision has two successors rather than one, and the second is that when we reach the node dynamically following the entire if-statement, its address must be recorded in the dynamically last nodes of both the then-part and the else-part. So a single variable Last node pointer is no longer sufficient. The first problem can only be solved by storing two successor pointers in the if-node; this makes the if-node different from the other nodes, but in any graph that is more complicated than a linked list, some node will have to store more than one pointer. One way to solve the second problem is to replace Last node pointer by a set of last nodes, each of which will be filled in when the dynamically next node in the control flow path is found. But it is often more convenient to construct a special join node to merge the diverging flow of control. Such a node is then part of the control flow graph without being part of the AST; we will see in Section 3.2.2 that it can play a useful role in context checking.
#include"parser. #include"thread.
h" h"
/* fortypesAST-nodeandExpression /* for selfcheck */ /* PRIVATE */
*/
static AST-node *Last-node; static void Th~ead-expression(Expression*expr) { switch (expr->type) { case 'D' : Last-node->successor = expr; Last-node = expr; break; case 'Pr: Thread-expression(expr->left); Thread-expression(expr->right); Last-node->successor = expr; Last-node = expr; break;
}
1
/* PUBLIC */
AST-node *Thread-start; void Thread-AST(AST-node *icode) { AST-node Dummy-node;
1
Last-node = &Dummy-node; Thread-expression(icode); Last-node->successor = (AST-node *IO; Thread-start Dummy-node.successor; I :
Figure 3.37 Threading codefor the demo compiler from Section 1.2.
The threading routine for an if-statement could then have the form shown in Figure 3.38. The if-node passed as a parameter has two successor pointers, true successor and false successor. Note that these differ from the then part and else part pointers: the part pointers point to the tops of the corresponding syntax subtrees, whereas the successor pointers point to the dynamically first nodes in these subtrees. The code starts by threading the expression which is the condition in the if-statement; next, the if-node itself is linked in as the dynamically next node, Last node pointer having been set by Thread expression to point to the dynamically last node in the expression. To prepare for processing the then- and else-parts, an End if join node is created, to be used to combine the control flows from both branches of the if-statement and to serve as a link to the node that dynamically follows the if-statement. Since the if-node does not have a single successor field, it cannot be used as a last node, so we use a local auxiliary node Aux last node to catch the pointers to the dynamically first nodes in the then- and else-parts. The call of Thread block (If node pointer .then part) will put the pointer to its dynamically first node in Aux last node, from where it is picked up and assigned to If node pointer .true successor in the next statement.
Finally, the end of the then-part will have the End if join node set as its successor.
PROCEDURE Thread if statement (If node pointer):
    Thread expression (If node pointer .condition);
    SET Last node pointer .successor TO If node pointer;
    SET End if join node TO Generate join node ();

    SET Last node pointer TO address of a local node Aux last node;
    Thread block (If node pointer .then part);
    SET If node pointer .true successor TO Aux last node .successor;
    SET Last node pointer .successor TO address of End if join node;

    SET Last node pointer TO address of Aux last node;
    Thread block (If node pointer .else part);
    SET If node pointer .false successor TO Aux last node .successor;
    SET Last node pointer .successor TO address of End if join node;

    SET Last node pointer TO address of End if join node;

Figure 3.38 Sample threading routine for if-statements.
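Translated into the C style of Figure 3.37, the routine of Figure 3.38 might look as follows. The node layout with condition, then_part, else_part, true_successor and false_successor fields, and the helpers Thread_block and Generate_join_node, are assumptions for the sketch; the demo compiler of Section 1.2 has no if-statements.

/* Assumed node layout; only the fields used for threading are shown. */
typedef struct AST_node AST_node;
struct AST_node {
    AST_node *successor;                       /* ordinary and join nodes */
    AST_node *condition, *then_part, *else_part;
    AST_node *true_successor, *false_successor;
};

extern AST_node *Last_node;                    /* the dynamically last node */
extern void Thread_expression(AST_node *expr); /* as in Figure 3.37 */
extern void Thread_block(AST_node *block);
extern AST_node *Generate_join_node(void);

void Thread_if_statement(AST_node *if_node) {
    AST_node aux_last_node;                    /* catches first nodes of the parts */
    AST_node *end_if_join_node = Generate_join_node();

    Thread_expression(if_node->condition);
    Last_node->successor = if_node;            /* the if-node follows the condition */

    Last_node = &aux_last_node;
    Thread_block(if_node->then_part);
    if_node->true_successor = aux_last_node.successor;
    Last_node->successor = end_if_join_node;   /* end of the then-part joins */

    Last_node = &aux_last_node;
    Thread_block(if_node->else_part);
    if_node->false_successor = aux_last_node.successor;
    Last_node->successor = end_if_join_node;   /* end of the else-part joins */

    Last_node = end_if_join_node;              /* the new dynamically last node */
}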
Figure 3.39 AST of an if-statement before threading.
Given the AST of Figure 3.39, the routine will thread it as shown in Figure 3.40. Note that Last node pointer has been moved to point to the End if join node. Threading the AST can also be expressed by means of an attribute grammar.
Figure 3.40 AST and control flow graph of an if-statement after threading.
The successor pointers are then implemented as inherited attributes. Moreover, each node gets an additional synthesized attribute that is set by the evaluation rules to the pointer to the first node to be executed in the tree.
If_statement(INH successor, SYN first) ->
    'IF' Condition 'THEN' Then_part 'ELSE' Else_part 'END' 'IF'
    ATTRIBUTE RULES:
        SET If_statement .first TO Condition .first;
        SET Condition .true successor TO Then_part .first;
        SET Condition .false successor TO Else_part .first;
        SET Then_part .successor TO If_statement .successor;
        SET Else_part .successor TO If_statement .successor;

Figure 3.41 Threading an if-statement using attribute rules.
The threading rules for an if-statement are given in Figure 3.41. In this example we assume that there is a special node type Condition, the semantics of which is to evaluate the Boolean expression and to transfer the flow of control to true successor or false successor, as the case may be.

It is often useful to implement the control flow graph as a doubly-linked graph, in which each link consists of a pointer pair: one from the node to the successor and one from the successor to the node. This way, each node contains a set of pointers to its dynamic successor(s) and a set of pointers to its dynamic predecessor(s). This arrangement gives the algorithms working on the control flow graph great freedom of movement, which will prove especially useful when processing data-flow equations. The doubly-linked control flow graph of an if-statement is shown in Figure 3.42.
Figure 3.42 AST and doubly-linked control flow graph of an if-statement.
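One straightforward representation of such a doubly-linked control flow graph gives every node small arrays of successor and predecessor pointers, filled in pairs by a single linking routine. The C sketch below is illustrative only; the fixed array size and the field names are assumptions.

#include <assert.h>

#define MAX_LINKS 2            /* enough for if-nodes and join nodes here */

typedef struct CFG_node CFG_node;
struct CFG_node {
    int n_successors, n_predecessors;
    CFG_node *successors[MAX_LINKS];
    CFG_node *predecessors[MAX_LINKS];
    /* ... the AST payload of the node would go here ... */
};

/* Record a control-flow arc from 'from' to 'to' in both nodes, so that
   algorithms can traverse the graph in either direction. */
void add_control_flow_link(CFG_node *from, CFG_node *to) {
    assert(from->n_successors < MAX_LINKS && to->n_predecessors < MAX_LINKS);
    from->successors[from->n_successors++] = to;
    to->predecessors[to->n_predecessors++] = from;
}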
No threading is possible in a narrow compiler, for the simple reason that there is no AST to thread. Correspondingly, less context handling can be done than in a broad compiler. Still, since parsing of programs in imperative languages tends to follow the flow of control, some checking can be done. Also, context handling that cannot be avoided, for example strong type checking, is usually based on information collected in the symbol table.

Now that we have seen means to construct the complete control flow graph of a program, we are in a position to discuss two manual methods of context handling: symbolic interpretation, which tries to mimic the behavior of the program at run time in order to collect context information, and data-flow equations, a semi-automated, restricted form of symbolic interpretation. Other methods are possible. An unusual approach to data-flow analysis, based on a grammatical paradigm different from that of the attribute grammars, is given by Uhl and Horspool (1994), in which the kinds of information to be gathered by data-flow analysis must be specified and the processing is automatic.
When a program is executed, the control follows one possible path through the control flow graph. The code executed at the nodes is not the code of the attribute rules, which represents (compile-time) context relations, but code that represents the semantics of the node. For example, the attribute evaluation code for the if-statement in Figure 3.41 is mainly concerned with updating the AST and with passing around information about the if-statement. At run time, however, the code executed by an if-statement node is a simple jump to the then- or else-part, depending on a condition bit computed just before. The run-time behavior of the code at each node is determined by the values of the variables it finds at run time upon entering the code, and the behavior determines these values again upon leaving the code. Much contextual information about variables can be deduced statically by simulating this run-time process at compile time, in a technique called symbolic interpretation or simulation on the stack.

To do so, we attach a representation of the run-time stack to each arrow in the control flow graph. In principle, this compile-time representation of the run-time stack holds an entry for each identifier visible at that point in the program, regardless of whether the corresponding entity will indeed be put on the stack at run time. In practice we are mostly interested in variables and constants, so most entries will concern these. The entry summarizes all compile-time information we have about the variable or the constant, at the moment that, at run time, the control is following the arrow in the control flow graph. Such information could, for example, tell whether it has been initialized or not, or even what its value is. The stack representations at the entry to a node and at its exit are connected by the semantics of that node.

Figure 3.43 shows the stack representations in the control flow graph of an if-statement similar to the one in Figure 3.40. We assume that we arrive with a stack containing two variables, x and y, and that the stack representation indicates that x is initialized and y has the value 5; so we can be certain that when the program is run and the flow of control arrives at the if-statement, x will be initialized and y will have the value 5. We also assume that the condition is y > 0. The flow of control arrives first at the node for y and leaves it with the value of y put on the stack. Next it comes to the 0, which is stacked, and then to the operator >, which unstacks both operands and replaces them by the value true. Note that all these actions can be performed at compile time thanks to the fact that the value of y is known. Now we arrive at the if-node, which unstacks the condition and uses the value to decide that only the then-part will ever be executed; the else-part can be marked as unreachable, and no code will need to be generated for it. Still, we depart for both branches, armed with the same stack representation, and we check them both, since it is usual to give compile-time error messages even for errors that occur in unreachable code.

The outline of a routine Symbolically interpret an if statement is given in Figure 3.44. It receives two parameters, describing the stack representation and the 'if' node. First it symbolically interprets the condition. This yields a new stack representation, which holds the condition on top. The condition is unstacked, and the resulting stack representation is then used to obtain the stack representations at the ends of the then- and the else-parts. Finally the routine merges these stack representations and yields the resulting stack representation.
Figure 3.43 Stack representations in the control flow graph of an if-statement.
FUNCTION Symbolically interpret an if statement (
    Stack representation, If node
) RETURNING a new stack representation:
    SET New stack representation TO
        Symbolically interpret a condition (
            Stack representation, If node .condition
        );
    Discard top entry from New stack representation;
    RETURN Merge stack representations (
        Symbolically interpret a statement sequence (
            New stack representation, If node .then part
        ),
        Symbolically interpret a statement sequence (
            New stack representation, If node .else part
        )
    );

Figure 3.44 Outline of a routine Symbolically interpret an if statement.
The actual code will contain more details. For example, it will have to check for the presence of the else-part, since the original if-statement may have been without one, and depending on how the stack representation is implemented it may need to copy it and pass one copy to each branch of the if-statement. It will be clear that many properties can be propagated in this way through the control flow graph, and that the information obtained can be very useful both for context checks and for doing optimizations. In fact, this is how some implementations of the context-checking program lint operate. Symbolic interpretation in one form or another has been in use from early on; for example, Naur (1965) used symbolic interpretation to do type checking, but the technique was not described in the mainstream literature until the mid-1970s (King, 1976); it was apparently one of those things one did.

We will now consider the check for uninitialized variables in more detail, in two variants of symbolic interpretation. The first, simple symbolic interpretation, works in one scan from routine entrance to routine exit and applies to structured programs and simple properties only; a program is structured when it consists of flow-of-control structures with one entry point and one exit point only. The second variant, full symbolic interpretation, works in the presence of any kind of flow of control and for a wider range of properties. The fundamental difference between the two is that simple symbolic interpretation follows the AST closely: for each node it analyzes the children in the order in which they occur in the syntax, and the stack representations are processed in that same order. This restricts the method to structured programs only, and to simple properties, but allows it to be applied in a narrow compiler. Full symbolic interpretation, on the other hand, follows the threading of the AST as computed in Section 3.2.1. This obviously requires the entire AST, and since the threading of the AST may and usually will contain cycles, a closure algorithm is needed to compute the full required information. In short, the difference between full and simple symbolic interpretation is the same as that between general attribute grammars and L-attributed grammars.
To check for the use of uninitialized variables using simple symbolic interpretation, we make a compile-time representation of the local stack (and the parameter stack) and follow this representation through the control flow graph. The representation can be implemented conveniently as a linked list of (name, properties) pairs, a 'property list'. The list starts off as empty, or, if there are parameters, as initialized with the parameters with their properties: Initialized for IN and INOUT parameters and Uninitialized for OUT parameters. We also maintain a return list, in which we combine the stack representations as found at return statements and at routine exit. We then follow the arrows in the control flow graph, all the while updating the representation. The precise actions required at each node type depend of course on the source language, but are usually fairly obvious; we will therefore discuss them only briefly here.

When a declaration is met, the declared name is added to the list, with its status: Initialized if there was an initialization in the declaration, and Uninitialized otherwise.
When the flow of control splits, for example in an if-statement node, a copy is made of the original list; one copy is followed on its route through the then-part, the other through the else-part, and at the end-if node the two lists are merged. Merging is trivial except in the case in which a variable obtained a value in one branch but not in the other. In that case the status of the variable is set to May be initialized. The status May be initialized is equal to Uninitialized for most purposes, since one cannot rely on the value being present at run time, but a different error message can be given for its use. Note that the status should actually be called 'May be initialized and also may not be initialized'. The same technique applies to case statements.

When an assignment is met, the status of the destination variable is set to Initialized, after processing the source expression first, since it may contain the same variable. When the value of a variable is used, usually in an expression, its status is checked, and if it is not Initialized, a message is given: an error message if the status is Uninitialized, since the error is certain to happen when the code is executed, and a warning for May be initialized, since the code may actually still be all right. An example of C code with this property is

/* y is still uninitialized here */
if (x >= 0) { y = 0; }
if (x >= 0) { z = y; }

Here the status of y after the first statement is May be initialized. This causes a warning concerning the use of y in the second statement, but the error cannot materialize, since the controlled part of the second statement will only be executed if x >= 0. In that case the controlled part of the first statement will also have been executed, initializing y.

When we meet a node describing a routine call, we need not do anything at all in principle: we are considering information on the run-time stack only, and the called routine cannot touch our run-time stack. If, however, the routine has IN and/or INOUT parameters, these have to be treated as if they were used in an expression, and any INOUT and OUT parameters have to be treated as destinations in an assignment.

When we meet a for-statement, we pass through the computations of the bounds and the initialization of the controlled variable. We then make a copy of the list, which we call the loop-exit list. This list collects the information in force at the exit of the loop. We pass the original list through the body of the for-statement, and combine the result with the loop-exit list, as shown in Figure 3.45. The combination with the loop-exit list represents the possibility that the loop body was executed zero times. Note that we ignore here the back jump to the beginning of the for-statement - the possibility that the loop body was executed more than once. We will see below why this is allowed.

When we find an exit-loop statement inside a loop, we merge the list we have collected at that moment into the loop-exit list; we then continue with the empty list. When we find an exit-loop statement outside any loop, we give an error message. When we find a return statement, we merge the present list into the return list, and continue with the empty list. We do the same when we reach the end of the routine, since a return statement is implied there. When all stack representations have been computed, we check the return list to see whether all OUT parameters have obtained a value, and give an error message if they have not.
expr From-expr To-expr
5 =: v
-
Forstatement
V
from to
' 1 L rom to I
I
F i ~ ~ 3r.4e5 Stack representations in the controlflow graph of a for-state~ent.
check the return list to see if all OUT parameters have obtained a value, and give an error message if they have not. And when we reach the end node of the routine, we check all variable identifiers i the list. If one has the status Uninit ial. i zed, it was never initialized, and a W can be given. The above technique can be refinedin many ways. Bounds in for-state~entsare ofte constants, either literal or named. If so, their values will often prove that the loop will performed at least once. In that case the original list should not be mergedinto the exit list, to avoid inapprop~atemessages. The same applies to the well-knownC idioms for i n f i ~ i t ~ loops:
for (;;) ... while ( 1 ) ...
Once we have a system of symbolic inte~retationin place in our compiler, we can easilyextend it to fit specialrequirements of andpossibilitiesoffered by thesource language. One possibility is to do similar accounting to see if a variable, constant, field selector, etc. is used at all. A second possibility is to replace the status Initialized by or even the set of values the variable may hold, a technique called n. This information can be used for at least two purposes: to identify variables that are actually usedas constants in languages thatdo not have constant declarations, and to get a tighter grip on the tests in for- and while-loops. Both may improve the codethatcanbegenerated.Yetanother,moresubstantial,possibility is to do last-def analysis, as discussed in Section3.2.2.3. When we try to implement constant propagation using the above technique, however, we run into problems. Consider the segment of a C program in Figure 3.46. Applying the i hasthevalue 0 at the ifabovesimple s y ~ b o l i cinterpretationtechniqueyieldsthat statement, so the test i > 0 can be evaluated at compile time and yields 0 (false). Consequently, an optimizermightconcludethatthebody of theif-statement,the call to print f () ,can be removed sinceit will not be executed. Thisis patently wrong.
int i =
0;
while (some condition) { if (i 0) printf (iiLoop reentered: i = %d\nfi,i) ; i++ ;
1
Figure 3.46 Value set analysisin the presenceof a loop statement.
It is therefore interesting to examine the situations in which, and the kind of properties for which, simple symbolic inte~retationas explained above will work. Basically, there arefour require~entsfor simplesymbolicinterpretationtowork;motivation for these requirements will be given below. 1. The p r o ~ r must a ~ consist of flow-of-control structures with one entry point and one exit point only. 2. The values of the property must form a lattice, which means that the values can be ordered in a sequence v l ..v, such that there is no operation that will transformvj into vi with i <j; we will writevi value); break; case , P t: printf ( l t {\n\ int e-left = Pop(); int e-right Po switch (%d) {\n\ case . Push(e-left + e-right); break;\n\ case Push(e-1eft * e-right); break;\n\ 11\m" exprPoper I
1:
' + l
' * l : I
):
1
1
1
break;
Active-node-pointer
= Activ~-node-pointer->s~ccessor;
printf (flprintf(\ll%%d\\n\", printf (llreturn ~ ; } \ n l f :)
Pop()) ; /* print
the
result* / \ n f l );
/* P U ~ L I*/~ void Process (AST-node *icode) { Thread-AST(icode); Active-node-pointer = Thread-start; Trivial-codemgeneration();
1
.l2 A trivial code generator for the demo compiler of Section 1.2.
in the inte~reter,and Expression-P for the case ly those interpreter routines that are actually usedby a p included in the threaded code. The characteristic advantage of threaded code is that it is small. is ~ ~ i used ~ inl y process control andembed~edsystems, to control hardware with iimite for example ~ a l ~ t and o ~tele~hones. s code by hand,but t~readedcodecan also be g e n e ~ ~ very t e ~ well
#include Iistack.hi1 int main(void1 { Push(7) ; Push(1); Push(5); -left = pop () ; int e-right = Pop 0 ; switch (43) { case Push(e-left -I- e-right); break; * e-right); break; case Push(e-left /-I-/:
{
1)
int e-left = Pop ( ) ; int e-right = Pop 0 ; switch (42) { case Push(e-left -t e-right); break; Push(e-left * e-right); break; case /-I-/:
>> printf
return
/*/:
(rt%d\nli,
o ;}
Pop() ) ; /* print
the
result*/
.l3 Code for (7 * ( 11-5)) generated by the code generator of Figure4.12.
int main(void) { Expression-D ( 7 ) ; E~pres~ion-D ( 1 ); Expression-D ( 5 ) ; xpression-P (43); /* 43 = ASCII valueaf Expression-P (42); /* 42 = ASCII value of Print (1 ; return 0;
/*/
*/ */
1
Possible threaded code for (7 * (1 + 5 1 .
code was first rese a variety of contexts
the PDP-3 1 (1973) er and Walker, 3980;
code size reduction is desired, the routines can be numbered and the laced by an array of routine n ~ ~ b e rifs there ; are no more than 256 iffer~ntroutines~one byte per call suffices (see xercise 4.5). Since each routine has a knownnumber of p ~ a ~ e t and ~ r sis all p ~ a ~ e t e derive rs from fields inthe AST and are thus con~tantsknown to the code erator, the ~arameterscan be incor~oratedinto the
#include
li
stack.h"
{ void Expression-D (int digit) Pus~(digit);
1
void Exp~ession-~(intoper) int e-left = Pop 0 ; int switch (oper) { case "I-': Push(e-left + case / * / : Push(e-left *
void Print (void) { printf ( r l % d \ n " ,Pop (1
1
{
e-right = Pop 0
e-right); break; e-right); break;
;
Routines for the threaded code for (7 * (1 -t5 1 ) .
en era ti on has become c o ~ p ~ e t e~l yl ~ ~ e d . Actually, the above t e ~ ~ n i ~ u e ince the code s ~ ~ m e nfrom t s the in
operations, effect jumps, etc. the ~rimitive routine^ need to be e i ~ t ~ ~ r eist ~ e ra c ~ i n
out.
;
Pwtial evaluation in a s e g ~ ~ eofn tthe code ~euerator,
7 Code for (7 * (l+5 ) ) ~ e ~ l eby ~ the ~ ~code ~ e~d~ n e r a tof o rFigure 4.16.
'
M.C. ( ~ a u r i t Cornelis) s Escber (1898-1972), Dutch artistknown for his intri ous drawings and ~ a i n t i ~ g s .
ore ground ( ~ n - n o wview ) of partiallye v a l u ~ tcode. i~~
.l9 ~ackground(run-later) view of partially ~ v ~ l u a t icode. n~
ram generation9and the more co~piler-constructionoriented book extensive example of eneratin an object code se~mentby ~ a n u a partial l eval~ationcan be found in Section 8.
In simple code gene ratio^, a fixed t~anslationto the target codeis chosen for each ~ o s s i ~ l e code ~eneration9 the nodes in the AST are then replacedby their translations, and the AST is linearized by following the data flow in flow of controlelsewhere. Since thecorrectness of this corn d~pendsvery rnuch on the inte~faceconventions between each i ~ ~ o ~ toakeep n tthese interface conventionssimple;but, as e conventions allow more efficient transla~ions. lmple code generation requires local decisions only, and is ther~fore able for narrow compilers. With respect to machine types,it is p
somewhatsimi
models,thepurestackmachineandthepure r e uses a stack to store and manipulate values; It has two typesof inst~ctions,those that moveor copy values between the top of the stack and elsewhere, and those that do o~erationson the top element or e l e ~ e n t sof the sta The stack machine has two important data administration pointers: the stack pointer which points to the topof the stack, and the basepoi~ter P, which points to thebeginni~ of the region on the stack where the local variables are stored; see Figure 4.20. It ma have other data ad~inistrationpointers, for example a pointer to th lobal data area and a stack area limit pointer, but these play no direct role in simple code *
I I
I
I I
I
stack
Data a ~ ~ n i s ~ a tini oa nsimple stack machine.
For our explanation we assume a very simple stack ~ a c ~ i none e , in which all stack e integer and which features only the machine i n s ~ c t i o n s u ~ ~ ~ inz e d many overflow stack with on here; We ~achines Figure 4.21. stack o v e ~ o wis results inand a synchronous interr~pt,which allows the operati Push-Const c pushestheconstant c (incorporated in the m ontothetop of thestack;thisactionraisesthestackpointer by 3. pushes a copy of the value of the i-th local variable on the top of th porated in the machine inst~ction,but BP is add stack element; this raises the stack pointer by 1. ment from the stack and stores its value in the pointer by 1. Add-Top2 removes the top two andpushes the resultbackontothestack;thisactionlowersthestackpointer by l . Subtr-Top2 and Mult-Top2 do similarthin S ; notetheorder of the o p e r ~ d sin Subtr-Top2: the deeper stack entryis the left operand since it was pushed first. Suppose p is a local variable; then the codefor p :=p+5 is
nstruction
Push-Const Push-Local Store-Local
Actions
e i i
SP:=SP+1; stack [SPJ:=c; SP:=SP+1; stack [SPI :=stack [BP+i]; stack [BP+il :=stackESP] ; SP:=SP-1; stack FSP-11 :=stack[SP-l]+stack [SP] ; SP:"sp-1; stack [SP-11 :=stack[SP-l]-stack [SP]; SP:=SP-I; SP:= s P - ~ ; stack ESP-l1 :=stack[SP-11 *stack ; [SP]
Add-Top2 Subtr-Top2 Mu1 t-Top2
Stack machine instructions.
Push-Local Push-Const Add-Top2 -Local
#p 5
#p
// // // //
Push value of #p-th local onto stack. Push value 5 onto stack. Add top two elements. Pop and store result back in #p-th local.
is the position number of p among the local variables. ote that the operands ne instructions are all compile-time constants: the operand of Push-Local and Store-Local is not the value of p -which is a ~ n - t i quantity ~e -but the ~ u k machine model has been made popular
bythe ce all modern machines, with the exception of RI instr~ctionsthismodelstillhaswide applicabi~ity.Its maindisadvantage is that on a emory to store values in, a set of registers to perne set contai~sinstructions to copy he instructions in the othe set perform operathe result in one of them. n our simple regisstore values of type integer; the instructions are here consist of two parts. The first part can be ll of whichimply a register as thetarget,or on as thetarget. The secondpartspecifiesthe Reg, or Mem. For e~ample,an inst~ction d-Const 5 , R 3 would add theconstant 5 to thecontentsister 3. The above i ~ s t ~ c tnames i o ~ havebeenchosen for theirexplanatoryvalue; do not derive from ge. Each assembler has its own set ofinst~ctionnames, most of them emarks are in order here. The first is that the rightmost operand in the edestination of theoperation, in accordancewithmostassembler at this is a property of those assembly lan uages, not of the machine instructions themselves. In two-register inst~ctions,the destination register doubles as the
~
Inst~ction
Actions
Load-Cons Load-Mem Store-Reg
tc ,R, X R, R, ,x
R,t:=fc ; R,,:= X ; X :=RIz ;
Add-Reg Subtr-Reg Mu1t-Reg
R, I R, R, ,R, R, ,R,
R, :=R,+R, ; -R, ; R, :=Rtz ; R, :=R, *R,,
I
Figure 4.22 Register machine instructions.
first source register of the operation during execution; this is a property of the m a c ~ i ~ inst~ctionsof a pure register machine. Load-Mem x ,R, with sema~tics The secondremark is thattheabovenotation R, :=xis misleading. We should actually have written Instruction
Load-Mem
Actions
&x, R,R,:
=* ( U );
in which &x is the address of x in memory. Just as we have to write Push-Local in which #b is the variable number of b, to push the value of b onto the stack, we sho in principle, write Load-Mem &x,R1 to load the value of x into R1. course that machine inst~ctionscan contain constants only: the load-co contains the constant value directly, the load-memory and store-memory in tain constant addresses that allow them to access the values of the variables. ally assembly languages consider the address indication & to be implicit in store instructions, making forms like Load-Mem x,R1 the normal way value of a variable into a register; its semantics is R, :=* { &x),in which the address operator & is provided by the assembler or compiler at compile time and the ~ e r ~ f e r e ~ c e operator * by the inst~ctionat run time. The code for p :=p+5 on a register-memory machine would be: Load-Mem Load-Const Add-Reg Store-Reg
p,R1
5 ,R2 R2 ,R1 Rl,p
in which p represents the addressof the variable p. The most pro~inentrepresentative of thismodel is theI modern machines have registers, the model is very relevant. Its e main problemis that the number of registers is limited.
we can generate stack machine code for arithmetic expressions. As the ex~ressionb*b - 4 * (a*c ) ;its AST is shown in Figure 4.23.
W
*
*
a
/\ C
The abstract syntax tree for b * b - 4 * (a *c ) .
ext we ~ o n s i ~ the e r ASTs that belong to the stack machine instructions from Fig-
Push-Const C:
Add-TopZ
:
/+\
l
Push-Local i:
l
Subtr-Top2
l
Mult-Top2:
l
i
:
Store-Local i:
i
l /-\ .-
/"\
/*\
The abstract syntax trees for the stack machine i n s t ~ c t i o ~ s .
erface convention that operands are supplied to and retrieved from the e trivial: each machine instruction correspondsexactly to one ee Figure 4.24. As a result, the r ~ ~ r i t i nofgthe tree is also is replaced by its straightforw~dtranslation; see Figure 4.25, in which the variable n u ~ b e r (stack s positions) of a, b, and c.
Subtr-Top2
/ l /\ /\ /\ Mult-Top2
Mu1t-Top2
Push-Local #b
Push-Local #b
Push-Const
4
Mult-Top2
Push-Local #a Push-Local
#c
The abstract syntax tree for b*b - 4 * (a*c)rewritten.
The only thing that is left to be done is to order the inst~ctions.The conventions that an operand leaves its result on the top of the stack and that an operation may only be issued when its opera~d(s)are on the top of the stack i ~ e d i a t e l ysuggest a simple evaluation order: depth-first visit. ~epth-firstvisit has the property that it first visits all the children of a node and then immediately afterwards the node itself; since the chil~renhave put their results on the stack (as per ~ o n v e ~ t i the o ~p)nt can now find them there and can use themtoproduceitsresult.Inotherwords, dept~-firstvisitcoincideswiththedataflow mows inthe of an ex~ression.So wearrive at thecode ~ e ~ ~ r a ta1i o n shown in Figure 4.26, in which the procedure E m i t ( ) produces its p ~ a ~ e t e r ( in s )the proper inst~ctionformat.
P ~ O C E ~ U RGenerate E code (Node) : SELECT Node .type: CASE Constant type: Emit (ltPush-Consts' Node .Value); CASE LocalVar type: Emit (tlPush-Locall' Node .number); CASE StoreLocal type: Emit (lrStore-Localsl Node .number) ; CASE Add type: Generate code (Node .left) ; Generate code (Node .right) ; Emit ( "Add-Top2'I ) ; CASE Subtract type: Generate code (Node ,left) ; Generate code (Node .right) ; Emit ( Subtr-Top2 ) ; CASE Multiply type: Generate code (Node .left) ; Generate code (Node .right) ; Emit ( "Mu1t-Top2 ) ; It
.26 Depth-first code~eneratio~l for a stack machine.
N 4.2
~ p p l y this i ~algorithm ~ to thetopnodeinFigure4.25yieldsthefollowing sequence:
code
Push-Local #b Push-Local #b Nult-Top2 4 Push-Const Push-Local #a. Push-Local #c Mult-Top2 Mu1 t-Top2 Subtr-Top2
successivestack CO urationsthatoccurwhenthissequenceisexecutedareshown igure 4.27, in which values appear in their symbolic form. The part of the stack on h e~pressionsare evaluated is called the ‘working stack’;it istreated more extensively
7 Successive stack c o l l f i ~ u r a t i ofor ~ ~b*b - 4* (a*e )
was said about code ene era ti on for the stack ~ a c h i n applies e to the register XI. The ASTs of the e chine i~st~uctions from Figure 4.22 can be found in utputs are mentioned except for the result
Load-Const c R,
Add-Reg R,,
:
Load-Mem x R,,:
,R, :
Subtr-Reg R,fi, R,
:
+
i"
/\
/\ Store-Reg R,, x: *
X/ :
= \
R,
The abstract syntax treesfor the register machine instructio~s.
of the top instruction, the output register of an inst~ctionmust be used im~ediatelyas an input register of the parent instruction in the AST, and that, for the moment at least, the two input registersof an instruction must be different. Notethat as a result of theconventiontonamethedestinationlast in a s s e ~ ~ l ~ instructions, the two-operand instructions mention their operandsin an order reversed that which appears in the ASTs: these i n s t ~ c t i o ~mention s their second source re first,sincethefirstregister is thesame as the destination,whichismentioned s ~nfortunately,this m a y occasionally lead to some confusion.
R1 are available as auxiliary registers.
Registerallocation is noweasy;seeFigure 4.29, inwhich Target is a re number and Aux is a set of register numbers. Less accurately, we will refer toTarget as a register and toAux as a set of registers. The code for the leaves in the expression tree is strai~htforward: justemit the using the target register. The code for an operation node starts with code for the left using the same parameters as the parent: all auxiliary registers are still available and th result must arrive in the target register. For the right child the situation is dif~erent:on register, Target,is now occupied, holding the result of the left tree. We therefore pick a register from the auxiliary set, Target 2, and generate code for the right child wit
~ R O C E ~ U RGenerate E code (Node, a register Target, a register set Aux): SELECT Node .type: CASE Constant type: Emit ("Load-Const Node .value I I I R i l Target); CASE Variable type: Emit ( l1Load-Mem Node .address , R t 1 Target); CASE ... CASE Add type: Generate code (Node.left, Target, Aux) ; SET Target 2 TO An arbitrary element of Aux; SET Aux2 TO Aux \ Target 2; // the \ denotes the set difference operation Generate code (Node .right, Target 2, Aux 2); Emit ( I'Add-Reg R" Target 2 , R i 1 Target) ; CASE .. igure 4.29 Simple code ene era ti on with register allocation.
eaves the result in Target and frees Target 2. So when we leave the routine, all auxctuall~,no set ~anipulationis necessary in this case, because the set can be impleally a stack of the numbers l to the numberof available registers, a
shown in Figure 4.31, Figure 4.32 shows the ont tents of the of this code. The simil~itywith Figure 4.27 is i ~ e d i a t e :
ding b twice is dealt with in the subsection on common n 4.2.5.1.) The reason is that one register gets tied up e a*c is being computed. If we had treated the right
result of the first child. So it is advantageous to gen-
e can be computed from the weights of its children,
c
PROCE~UREGenerate code (Node, a register number Target): SELECT Node .type: CASE Constant type: Emit ("Load-Const Node .value l l t RTarget) " ; CASE Variable ty mit ( I1Load-Mem Node .address ,R" Target) ; CASE ... CASE Add type: Generate code (Node left, Target); Generate code (Node .right, Target+-l) ; Emit (liAdd-RegRIi Target+-lll,Rsi Target) ; CASE ... I
uce 4.30 Simple code genera~ionwith register n u ~ b e ~ n ~ .
Load-Mem Load-Mem Mult-Reg Load-Const Load-Mem Load-Mem Mu1t-Reg Mult-Reg Subtr-Reg
b,R1 b,R2 R2,R1 4,R2 a,R3 c,R4 R4 ,R3 R3 ,R2 R2 ,R1
.31 Register machine code for the expression b* b
- 4 * (a*c ) .
the weight of a subtree can be d e t e r ~ i n esi~ ly by a depth-first prescan, as shown in Fi ure 4.34. If the left tree is heavier, we compile it first. hold in^ its result CO doing the second tree costs Required right re~isters, tog rightt-l,but since Required left > Required ri left, so Required right+l cannot be larger than Required fice. The same applies vice versa to the right tree if it is heavier. If bot require one extra register. This techni~ueis someti~escalled ,after its designers (Sethi and~ l l ~ a 1n ,~ ~ 0 ~ . b*b - 4 " ( a * c ) , withthe ee that the tree a* c is heavier see that this leads to the code sh Theabove computatio~s~ e n e r a l i % to ~o~e~ations with n operands. An e x a ~ p l eof such an operation is a ro~tinecall with y1 ~ ~ a ~ e t eunder r s , the not unusual conven~io~ that
R2 : R1 :
R4 : R3 : R2 : R1 :
R4 : R3 : R2 :
I
R1 : b*b-4" (a*c)
Successive register contentsfor b*b - 4* (a*c ) .
Load-Mem Load-Mem Mult-Reg Load-Mem Load-Mem Mult-Reg Load-Const Mu1 t-Reg Subtr-Reg
b,R1 b,R2 R2 ,R1 a,R2 c,R3 R3 R2 4,R3 R3,R2 R2 R1
Weighted register machine codefor the expression b*b - 4* (a*c 1.
all ~ ~ a m e t e must r s be passed in registers (for rz smaller than some reasonable n u ~ b e r ) . e ~ g u ~ ethat n t each f i ~ i s h operand e~ takes away one register, registers will be used most e c o ~ o ~ i c a if ~ lthe y ~ ~ a ~ etrees t e are r sorted ~ c c o r d i ~tog weight, the heaviest
FUNCTION Weight of (Node)RETURN IN^ an integer: SELECT Node .type: CASE Constant type: RETURN1; CASE Variable type: RETURN 1; CASE . . CASE Add type: SET Required left TO Weight of (Node .left) ; SET Required right TO Weight of (Node .right); IF Required left> Required right: RETURN Required left; IF Required left Required right: RETURN Required right; // Required left = Required right RETURN Required left+ l; CASE .,. Figure 4.34 Register require~ents(weight) of a node.
AST for b*b - 4 * (a*c)with register weights.
first, and processed in that order (Appel and Supowit, 1987). If the sorted order is then the compilation of tree 1 requires E +O registers, that of tree 2 requires E? ters, and that of tree y2 requires En+n--l registers. The total number of required n
for the node is the m a ~ i m uof~these terms, in a formula max(Ek+k-l). k= 1
reduces to the ~F-statementsin Figure 4.34. Suppose, for example, we have a routine with three parameters, to be ~ e l i v e r ein~ registers R1, 112, and R 3 , with actual parameters of weights livi=l, ?V2=4., and sorting the weights, we conclude that we must process the parameters in the order 2, 3, 1. The computation
~ r a ~ e tnumber er N upied when starting parameter
N
2 4 0
4
3 2 l
3
1 1 2
3
4 that we need 4 registers for the code generation of the parameters. Since we now the first e ~ ~ r e s s i otondeliver its result in register 2, we can no longer use a simple the code of Figure 4.30, but must rather use a set, as in the origi~alcode of Fighe process and its results are shown in Figure 4.36.
com~utationorder
parameter second parameter third
first parameter Evaluation order of three parameter trees.
venthemost i~attentivereader will bynow have noticed that we have swept a very o r t a ~ prob~em t under the rug: the expression to be translated may require more regisrs than the machine possesses or than the compiler designer has set aside for expression valuation. If that happens, oneor more values from registers have to be stored in memory locations,calledtemporaries,toberetrieved later. Onesaysthatthe contents of these spilled, or, less accurately but more cornmonly, that the registers are spilled; and a techniqu~of choosing which register(s) to spill is called a re gister spilling technique (except for exhaustive search), and new tooldtechniquesare still beingdeveloped. The simple d on the observation that the tree for a very complicated a top region in which the weights are higher than the number of registers we on a number of trees dangle, the weights of which are equal to or of registers, We can detach these trees from the original tree and to tempor~yvariables. This leaves us with a set of temporary variables r which we can generate code since we have enough registers, plus a d original tree, to which we repeat the process. An outline of the code
n
is shown in Figure4.37.
~ R O C E ~ U RGenerate E code for large trees (Node, Target register): SET Auxiliary register set TO Available register set\ Target register; WHILE Node /= No node: Compute the weights of all nodes of the tree of Node; SET Tree node TO Maximal non-large tree (Node); Generate code (Tree node, Target register, Auxiliary register set); IF Tree node/= Node: SET Temporary location TO Next free temporary location(); Emit ( I'Store RI1 Target register t r , T 1 lTemporary location); Replace Tree node by a reference to Temporary location; Return any temporary locations in the tree of Tree nod to the pool of free temporary locations; ELSE Tree nodeT= Node: Return any temporary locations in the tree of Node to the pool of free temporary locations; SET Node TO No node; F~C~IO Maximal N non-large tree (Node) RETURNING node: a IF Node .weightc = Size of Auxiliary register set: RETURN Node; IF Node .left .weight > Size of Auxiliary register set: RETURN Maximal non-large tree (Node.left); ELSE Node .right .weight >= Size of Auxiliary register set: RETURN Maximal non-large tree (Node.right); .37 Code generation for large trees.
The method uses the set of available registers and a pool of t e ~ p o r ~variables y in memory. The main routine repeated~yfinds a subtree that can be c o ~ p i l ~using d no more than the available registers, and generates code for it which yields the result in Target entire tree, the code generation process is complete. register. If the subtree was the Otherwise, a temporarylocation is chosen, code formovingthecontents of Targ register to that location is emitted, and the subtree is replaced by a re~erenceto t e ~ p o r alocation. ~ (If replacing the subtree is impossib~ebecause the expression tree is an unalterable part of an AST, we have to make a copy first.) The process of compilin subtrees continues until theentire tree has been consumed. The a ~ ~ i l function i a ~ Maximal non-large tree (Node) first checks if the tree of its parameter Node can already be compiled with the available registers; if so, the n o n - l ~ g etreehasbeenfound. Other~ise,atleastone of the c ~ i l ~ r eofn Node must require at least all the available registers. The function then looks for a non4 the left or the right child; since the register re~uirementsdecrease goin
the code generatedfor our sample tree when compiled with2 regislster is spilled, to temporary variable T1.
Load-Mem Load-Mem Mu1 t-Reg Load-Const Mu1t-Reg Store-Reg Load-Mem Load-Mern Mu1t-Reg Load-Mem Subtr-Reg .38 Code generated for b*b -
a,Rl c,R2 R2,R1 4,R2 R2,R1 R1 T1 b,R1 b, R2 R2 R1 FT1 R2 R2 R1 I
I
I I
4 * ( a * c 1 withonly 2 registers.
A few words may be said about the number of registers that a compiler designer should reserve for expressions. Experience shows (Yuval, 1977) that for handw~ttenproregisters are enough to avoid spilling almost completely. A problem is, hownerated ~ r o g r a can ~ s and indeed do contain ~bitrarilycomplex expressions, ons side ring thatsuchgeneratedprograms or 5 registerswillnotsuffice. bly cause spilling even if much larger numbers of registers were set aside for expressions, reserving 4 or 5 registers still seemsa good policy.
addition to thepureregistermachineinstructionsdescribedabove,manyregister chines have instructions for combining the contents of a register with that of a memory of location. An example is aninstruction Add-Nem X.,R1 foraddingthecontents X to R1. The above techni~uesare easily adapted to include these new example, a memory location as a right operand now requires zero regisne; thisreducesthew of thetrees. The newtree is shown in Figresultingnewcodeine 4.40. We seethatthe a ~ g o r i t ~now m prosubtree 4 *a* c first, and that the produced code differs com~letelyfrom
There are a few simple techniq~esfor allocating registers for the entire routine we are L for the first L local he simplest is to set aside a fixed n ~ ~ bofe registers r rest of the available registers as workin ailable registers are those that are not ne k limit pointer, heap pointer, activation record base ~ointer,etc.).
.39 Register-weighted tree for a memory-register machine.
Load-Const Load-Mem Mu1t-Mem Mult-Reg Load-Mem Mu1t-Mem Subtr-Reg
4,R2 a,R1 c,Rl R1 ,R2 b,R1 b,R1 R2 ,R1
.40 Code for the register-~eighte~ tree for a memory-register machine.
With a little bit of effort we can do better: if we set aside L, registers for local variables we do not necessarily have to give them to the first L, such variables, but we may able to make a better choice. For example, the C language allows local variables to have the storage attributer e g i S t e r , and priority canbe given to these variables when handi out regi more sophisticated approach is use tousage counts (Frei 1974). is an estimate how of frequently a variable is used. The it is best to keep the most fsequently used variables in registers. Frequency estimates ca be obtained from static or dynamic profiles.See below for more on profiling info~mation. The problem with these and all other procedure-wide register allocation schemes is that they assign a register to a variable even in those regions of the routine in which the variable is not used. In Section 4.2.7 we will seea method to solve this problem.
~ v a l ~ ~oftsii o ~ p~l ecode
ene era ti on
Quite generally speaking and as a very rough estimate, simple code generation loses about a factor of three over a reasonable good optimizing compiler. This badly ~uantifiedstatement means that it would be surprising if reasonable optimization effort didnot bring a factor of two of improvement, and that it would be equally su~risingif an improvel~entfactor of six could be reached without extensive effort. Such an extensive effort probably be spent in many small but often complicated refinements, each yiel
rofiling i n f o r ~ ~ t i ~ n e honest,labor-intensiveway of obtainingstatistical i n f o ~ a t i o n codeusage is by dynamicprofiling.Statementsareinserted, ally or auto~atically,into the program, which produce a record of e code are executed: the program is i ~ s t r is then run on a representative set of input data and the records red and condensedinto the desired statistical usage data. ractice it is simpler to do static profiling, based on the simple control of traffic entering a node the a ~ o u n leaving t it; this is the ow-of-control equivalent of off‘s laws of electric circuits (Knuth and Stevenson, 1973). The procedure body is set to, say, 1. At if-statements we the incoming stream passes through the then”part and lse-part; loops are (re)entered 9 out of 10 times; etc. f linear equations, which can be solved, resulting in all the basic blocks. See Exercises 4.9 and 4.10.
ow traffic rule which says that the amount
these techniques here, between the simple and the more advance compilaS
we recall, theidea
of symbolic inter~retationwastohaveanapproximate try of each node and to transformit into an ap~roximate e. The stack representation was approximate in that it usuall~ r~cor~ informatio~ ed items like ‘x is initiali~ed’rather than ‘x has the value 3’, g an approximation is, of course, sentation at compile time. After been initialized, but we have no ame technique but does keep the nded to include the stack, the will call such a representation r for short. Now, if the effect of a e do so. This is, for example, f the assignment x := 3 can be
recorded exactly in the regvar descriptor as ‘x = 3’. If, however, we cannot, for some reason, record the effect of a node exactly in the regvar descriptor, we solve the problem by generating code for the node and record its effect in the regvar descriptor. When confronted with an assignment x :=read-real ( ) we are forced to generate code for it. ine instruction and Suppose in our compiler we call a function by using a Call-Rout suppose further that we have decided that a function returns its result in register R1. We ine read-real. and record in the regvar descriptor then generate the code Call-Rout ‘The value of x is in R1’. Together they i ~ p l e ~ e nthe t effect of the node x :=read-real () exactly. In this way, the regvar descriptor gets to contain detailed information about which registers are free, what each of the other registers contains, where the present valuesof the local and temporary variables can be found, etc. These data can then be used to produce better code for the next node. Consider, for example, the code s e g ~ e n t x:=read-real ( ) ; y :=x * x, Atentrytothesecondassignment,theregvar descriptor contains ‘The value of x is in R1’.Suppose register R4 is free. Now the second assignmentcanbetranslatedsimply as Load-Reg R1,R4 ; Mult-Reg RI,R4, which enters a second item into the regvar descriptor, ‘The valueof y is in R4’,Note that the resulting code Call-Routine Load-Reg Mult-Reg
read-real R1,R4 R1,R4
does not access the memory locations of x and y at all. If we have sufficient registers, the values of x and y will never have to be stored in memory. This technique combines very well with live analysis: when we leave thelive range of a variable, we can delete all inforfree a register. mation aboutit from the regvar description, which will probably Notethat a registercan contain thevalue of morethanonevariable:after a :=b := c expresS ion.>,the register that received the value of the expression contains the present values of both a and b. Likewise the value of a variable can sometimes be found in more than one place: after the generated code Load-Mem x , R3,the value of x can be found both in the location x and in registerR3. The regvar descriptor can be implemented as a set of information items as suggested above, but it is more usual to baseits i~plementationon the fact thatthe regvar descriptor has to answer three questions:
-where can the value of a variable V be found?
-what does register R contain? -which registers are free?
It is traditionally implemented asa set of three data structures:
escriptors, addressed by register numbers, whose n-th entry conon what register n contains; eseriptors (also known as a dress d~scriptors),addressed by variable names, whose entryV contains information indicating where the valueof variable V can be found; and
n
-a set of free registers. The advantage is that answers to questions are available directly, the disadvantage inserting and removing information may require updating three data structures. When this technique concentrates mainly on the registers, it is called
is that
As explained at the beginning of Section 4.2, instruction selection, register allocation, and
inst~ctionordering are intertwined, and finding the optimal rewriting of the AST with available instruction templates is ~P-complete(Aho et al., 1977; run0 and Sethi, 1976). We present here three techni~uesthat each addresses a part of the problem. The first, basic bZoc~s,is mainly concerned with optimization, instruction selection, and instruction ordering in limited parts of the AST. The second, ut^^^-^^ tree r e w r ~ t ~discussed ~g, in Section 4.2.6, shows how a very good inst~ctionselector can be generated automatically for very general instruction sets and cost functions, under the assumption that enoughregisters are available. The third, register all~cationby g ~ colaor in^, ~ discussed ~ in Section 4.2.7, explains a good and very general heuristic for register allocation. In an actual compiler some of these techni~ues wouldbecombinedwitheachother andor withad-hoc approaches, We will turn first to the useof basic blocks. Goto statements, routine calls, and other breaks in the flow of control are complicating factors in code generation. This is certainly so in nmow compilers, in which neither the code from which a jump to the present code rnay have originated nor thecode to which control is transferred is available, so no analysis can be done. ut it is also true in a broad compiler: the required code rnay be available (or in the case of a routine it rnay not), but information about contents of registers and memory locations will still have to be merged at the join nodes in the flow of control, and, as explained in Section 3.2.2.1, this merge may have to be performed iteratively. Such join nodes occur in many places even in wellstructured programs and in the absenceof user-written jumps: the join node of the flow of control from thethen-p~tand the else-part at the end of an if-else statement is an example. ration in more ‘quiet’ parts of the AST has led to the idea s a part of the control graph thatcontains no splits (jumps) orcombines(labels). It is usualtoconsideronly which cannot be extended by including adjacent nodes without violating the definitionof a basic block. A maximal basic block starts at a label or at the beginning of the routine and ~ e or label or the endof the routine. A routine call ends just before a jump or j u ~ p - l i node terminates a basicblock, after the parametershavebeenevaluatedandstoredintheir required locations. Since jumpshave been excluded, the control flow inside a basic block cannot contain cycles, Intheimperativelanguages,basicblocksconsistexclusively of expressionsand assignments, which follow each other sequentially. In practice this is also true for functional and logic languages, since when they are compiled, imperative code is generated for thel~. The effect of an assign~entin a basic block rnay be local to that block, in which case the resulting valueis not used anywhere else and the variable is dead at the endof the basic
block, or it may be non-local, in which case the variable is an output variable of the basic block. In general, one needs to do routine-wide live analysis to obtain thisi n f o ~ a t i o nbut , sometimes simpler means suffice: the scope rules of C tell us that at the end of the basic block in Figure 4.4l n is dead.
{
1
int n; n x n y
= a + l ; = b + n*n = n + l ; = d * n ;
-I-
c;
1 Sample basic blockin C.
Ifwedo not have this i n f o ~ a t i o n(as is likely in a narrow compiler) we have to assume that all variables are live at basic block end; they are all output variables. Sirnilarly, last-def analysis (as explained in Section 3.2.2.3) can give us information about the values of input variables to a basic block. Both types of information can allow us to generate better code; of the two, knowledge about the output variables is more important. A basic block is usually required to deliver its results in specific places: variables in specified memory locations and routine para~etersin specified registers or places on the stack. We will now look at one way to generatecode for a basic block. Our code generation proceeds in two steps. First we convert the AST and the control flow implied in it into a AST thedependencygraph is a a directedacyclic dependencygraph;unlikethe graph. We then rewrite the dependency graph to code. We use the basic blockof Figure 4.41 as an example; its AST is shown in Figure 4.42. It is convenient to draw the AST for an assignment with the source as the left branch and the destination as the right branch; to emphasize the inversion, we write the traditional assign~entoperator := as = :. The C program text in Figure 4.41 shows clearly thatn. is a local variable and is dead at block exit. We assume that the valuesof x and y are used elsewhere:x and y are live at block exit; it is immate~alwhether we know this because of a preceding live analysis or just assume it because we h o w no thin^ about them.
Until now, we have threaded ASTs to obtain control flow graphs, which are then used to make certain that code is generated in the right order. But the restrictions imposed by the control flow graph are often more severe than necessary: actually only the data dependencieshavetobeobeyed.Forexample,thecontrolflowgraph for a + b definesthat a must be evaluated before b, whereas the data dependency allows a and b to be evaluated in any order. As a result, it is easier to generate good code from a data dependency graph
i
~i
/+\
/=:\
n 1
+/+\
X
n
/\ n
/*\
/+\
= /
i = /
: \
: \
n 1
Y
d /*\
n
n
.42 AST of the sample basicblock.
than from a control flow graph. Although in both cases any topological ordering consistent with the interface conventions of the templates is acceptable, the control flow graph generallydefinesthe order preciselyandleavesnofreedomtothetopologicalorderin whereas the data dependency graph often leaves considerab~e freedom, One of the most impo~antproperties of a basic block is that its AST including its control flow graph is acyclic and can easily be convertedinto a data dependency graph, which ad~antageousfor code generation. There are two main sourcesof data dependencies in the AST of a basic block: data flow inside expressions. The resulting data dependencies come in two varieties, downwardfroman assignme~toperatortothedestination,andupwardfromthe operands to all other operators. The generated code must implement this data flow (and of course the operations on these data). data flow from values assigned to variables to the useof the values of these variables in further nodes. The resulting data dependencies need not be suppo~edby code, since the data flow is effected by having the data stored in a machine location, from where it is retrieved later. The order of the assignments to the variables, as implied by the flow of control, must be obeyed, however. The implied flow of control is simple, since basic blocks by definition contain only sequential flowof control. For a third source of data dependencies, conce~ingpointers, see Section4.2.5.3. Three obser~ationsare in order here: The order of the evaluation of operations in expressions is immate~al,as long as the data dependencies inside the expressions are respected. If the value of a variable V is used more than once in a basic block, the order of these uses is immaterial, as long as each use comes after the assig~mentit depends on and before the next assignment toV.
-The order in which the assignments to variables are executed is im~aterial,as long asall assignments to a specific variable V are executed in sequential,left-to-right, order. These considerations give us a simple algorithm to convert the AST of a basic block to a data dependency graph. 1. Replace the arcs that connect the nodes in the AST of the basic block by data dependency arrows. Thearrowsbetweenassignmentnodesandtheirdestinationsinthe expressions in the AST pointfrom destination node to assignment node; the other arrows point fromthe parent nodes downward, 2. Insert a data dependency arrow from each variable V used as an operand to the assignment that set its value, or to the beginningof the basic block if V was an input variable. replaced. This dependency reflects thefact that a value stays in a variable until 3. Insert a data dependency arrow from each assignment to a variable V to the previous assign~entto V , if present. This dependency reflects the fact that an assignment to a variable replaces the old value of that variable. 4. Designate the nodes that describe the output values as roots of the graph. From a data dependency point of view, they are the primary interesting results from which all other interesting results derive. 5. Remove the ;-nodes and their arrows. The effects of the flow of control specified by 2 and 3. them have been taken over by the data dependencies added in steps
Figure 4.43 shows the resulting data dependency graph.
a
J '
L
1
.43 Data dependency graph for the sample basic block.
Nest, we realize that an assignment in the data dependency graph just passes on the valueandcanbeshort-circuited.Also, we caneliminatefromthegraph all nodesnot reachable t~roughat least one of theroots.Thesesimplificationsyield the final data
n
dependency graph redrawn in Figure4.44.
X
Y
+
*
+
c
a
1
d
+
Cleaned-up data dependency graph for the sample basic block.
Note that the only roots to the graph are the external dependencies for x and y. Note also that if we happened to know that x and y were dead at block exit too, the entire data dependency graph would disappear automatically. Figure 4.44 has the pleasant property that it specifies the semanticsof the basic block precisely: all required nodes and data dependencies are present and no node or data dependency is superfluous. eforegoingintotechniques of convertingthedependencygraphintoefficient e inst~ctions,however,wehavetodiscusstwofurtherissues conce~ingbasic blocksanddependencygraphs. The first is an impo~ant opti~i%ation, ~ u sub- ~ e ~ p r ~ s s i u ~ e l i and ~ i ~the ~ tsecond i o ~ , is the traditional representation of basic blocks and dependency graphs as triples.
~~~~n s ~ ~ e ~ ~ r e~ sl si ~ i ~i n n~ti~n Experiencehasshownthatmanybasicblockscontain e expressions that occur more than once in the basic block and evaluate to the same value at each occurrence. Common subexpressions originate from repeated subexpressions in the source code, for example x = a*a y = a*a
+ 2*a*b + b*b; - 2*a*b
-+
b*b;
which contains three c o ~ ~ subexpressions. o n This may come as a surprise to C or Java programers, who are used to factoring out common s~bexpressionsalmost without thinking:
~
double quads = a*a + b*b; double cross-prod = 2*a*b; x = quads + cross-prod; y = quads - cross-prod;
but such solutions are less convenient in a langua tions in sub-blocks. Also, common subexpressions can be generated by the intermediate code generation phase for many constructs in many languages, including C. For example, theCexpression a [i] -t b [i] ,inwhich a and b arearrays of 4-byteintegers, is translated into *(a
+
4*i)
+ *(b +
4*i)
which features the common subexpression 4 *i . Identifyingand c o ~ ~ i n common i ~ g subexpressions for thepurpose of computi~g them only once is useful, since doing so results in smaller and faster code, but this only workswhenthevalue of theexpression is thesame at each occurrence.Equalsubexpressions in a basic block are not necessarilycommon subexpressions. For example, the source code x = a*a + 2*a*b + b*b; a = b = O ; y = a*a - 2*a*b + b*b;
still contains three pairs of equal subexpressions, but they no longer evaluate to the same value, due to the intervening assign~ents,and do not qualify as ‘common subexpressions’. The effect of the assignments cannot be seen easily in the AST, but shows up immediately in the data dependency graph of the basic block, since the a s and bs in the third line have differentdependencies from thoseinthefirstline. This meansthatcommonsubexpressions cannot be detected right away in the AST, but their detection has to wait until the data dependency graph has been constructed. Once we have the data dependency graph, finding the common subexpressions is simple. The rule is that two nodes that have the operands, the operator, and the dependencies in common can be combined into one node. This reduces the number of operands, and thus the number of machine inst~ctionsto be generated. Note that we have already met a simple version of this rule: two nodes that have the operand and its dependencies in common can be combined into one node. It was this rule that allowed us to short-circuit the assignments and eliminate the variablen in the transfo~ationfrom Figure 4.43 to Figure 4.44. ConsiderthebasicblockinFigure4.45,whichwasderived from theoneinFigure 4.41 by replacing n by n*n in the third assignment. Figure 4.46 shows its data dependency graph at the ~ o m e n that t the common variables with identical dependencies have already been elil~i~ated. This results in the presence of two nodes with identical operators (*),identical operands (the + node), and identical data dependencies, again on the -tnode. The two nodes can be combined (Figure4.4171, resulting in the eli~inationof the common subexpression. Detecting that two or more nodes in a graph are the same is usually imple~entedby storingsomerepresentation of each nodeinahashtable. If thehashvalue of anode
{
1
int n; n x n y
= a + l ; = b + n*n + c; = n*n -t 1; = d * n ;
/* subexpression n*n .., */ /* ... incommon */
Figure 4.45 Basic block in C with common sL~b~xpre~sion.
b
a
X
Y
+
*
*
*
1
1
Figure 4.46 Data dependency graph with common subexpression.
depends on its operands, its operator, and its dependencies,c o m ~ o nnodes will hash to the same value. As is usual with hashing algorithms, an additional check is needed to see if they really fulfill the r e q u ~ e ~ e ~ t s , As with almost all optimi~ationtechniques, the usefulness of common subexpression elimination depends on the source language and the source program, and it is difficult to give figures, but most compiler writers find it useful enough to include it in their compilers. T~aditionally,data dependency graphs are implemented as arrays of triples. A record with three fields representing an operator with its two operands, and corresponds to an operator node in the data dependency graph. If the operator is monadic, the second operand is left empty. The operands can be constants, variables, and indexes to other triples. These indexes to other triples replace the pointers that connect the nodes in the data dependency graph, Figure 4.48 shows the array of triples co~e~ponding to the data dependency graph of Figure 4.44.
X
Y
+
*
+
c
+
d
7 Cleaned-up data dependency graph withcommon subexpression eliminated.
position 1 2 3 4 5
6 7
8
triple a + l @l * @l b + @2 @3
+
c
=: x @l + l d * @6 @4
@7 =: TyT
Figure 4.48 The data dependency graphof Figure 4.44 as an array of triples.
~eneratinginstructions from a data dependency graph is very similar to doing so from an AST: the nodes are rewritten by machine instruction templates and the result is linearized. The main difference is that the data dependency graph allows much more leeway in the order of the inst~ctionsthan the AST, since the latter reflects the full sequential specification inherent in imperative languages. So we will tryto exploit this reater freedom. In this section we assume a 'register-memory machine', a machine with reg op := rnern inst~ctionsin addition to the reg op := reg inst~ctionsof the pure register machine, and we restrict our generated code to such inst~ctions,to reduce the complexity of the code generation. n e available machine inst~ctionsallow most of the nodes to be rewritten simply by a single appropriate machine inst~ction,and we can concentrate on lineari-
zation and register allocation. We will turn to the linearization first, and leave the register allocation to the next subsection.
inear~zutionof the ~ a t ~ a e ~ e n ~~e rn ac ~~ h Wehaveseenattheb of Section 4.2 thatany lineari~ationobtained by atopological ordering of theins is acceptableas far ascorrectness is concerned,butthat for optimization purposes some orderings are better than others. In the absence of other criteria,twoorderingssuggestthemselves,correspondingtoearlyevaluationandtolate evaluation, respectively. In the early evaluation orde~ng,code for a node is issued as soon as the code for all of its operands has been issued. In the late evaluation ordering, codefor a node is issued as late as possible. It turns out that early evaluation ordering tends to require more registers than late evaluation ordering. The reason is clear: early evaluation ordering creates values as soon as possible, which may be long before they are used, and these values have to be kept in registers. We will therefore concentrate on late evaluation t is useful to distinguish between the notionof ‘late’ evaluation used here rather than themorecommonnotion of ‘lazy’evaluation. The difference is that‘lazyevaluation’ impliesthatwehopetoavoidtheactionatall,whichisclearlyadvantageous;in‘late evaluation’ we know beforehand that we will have to perform the action anyway, but we find it advanta~eousto perform it as late as possible, usually because fewer resources are tied up that way. The same considerations applied in Section 3.1 5 . 3 , where we tried to evaluate the attributes as late as possible. ven within the late evaluation ordering there is still a lot of freedom, and we will exploit this freedom to adapt the ordering to the characterof our machine instr~~ctions. We observe that re~ister-me~ory machines allow very efficient ‘ladder’ sequences like Load-Mem Add-Mem Mu1t-Mem Subtr-Mem
a ,R1 b,R1 c ,R1 d,R1
for the expression ( ( (a+-b) *c ) -d) ,and we would like our ordering algorithm to produce such sequences. To this end first we define an cea in data dependency graph: 1.Each root nodeof the graph is an available ladder sequence. 2. If an available ladder sequence S ends in an operation node N, then S extended with N’s left operand is also an available ladder sequence. 3. Ifan available ladder sequenceS ends in an operation nodeN whose operator is c o m ~ u tative -~ e a n i n gthat the left and right operand can be interchanged without affecting the result -then S extended withN’s right operand is also an available ladder sequence.
In other words, available ladder sequences startat root nodes, continue normally along left operands but inay continue along the right operand for commutative operators, may stop anywhere, and must stop at leaves. Code generated for a given ladder sequence starts at its last node, by loading a leaf variable if the sequence ends in a leaf, or an i n t e ~ e d i a t evalue if the sequence ends ear-
lier. Working backwards along the sequence, code is then generatedfor each of the operation nodes. Finally the resulting value is stored as indicated in the root node. For example, the code generatedfor the ladder sequencex,+,+,b in Figure 4.44 would be Load-Mem Add-Reg Add-Mern Store-Reg
b, R1
I1,Rl c,R1 Rl,x
assuming that the anonymous right operand of the + is available in some register ‘Intermediate l ’). The actual rewriting is shown in Figure4.49.
11
(for
X
Store-Reg Rl,x
+
Add Add
b
I1
7
l
Memc,R1
7
Reg I1,Rl
Load-Mem b,R1 Rewriting and ordering a ladder sequence.
The following simple heuristic ordering algorithm tries to combine the identification of suchladdersequenceswithlateevaluation.Basically, it repeatedlyfindsaladder sequence from among those that could be issued last, issues code for it, and removes it in reverse order and the last from the graph. As a result, the instructions are identified instruction of the entire sequence is the first to be determined. To delay the issues of register allocation, we will use pseudo-registers during the ~inearization. like normal registers, except that we assume that there are enoughof them. We will see in the next subsection how the pseudo-registers canbe mapped onto real registers or memory locations. However, the register used inside the ladder sequence must be a real register or the whole plan fails,so we do not want to run the risk thatit gets assigned to memory during register allocation. Fo~unately,since the ladder register is loaded at the beginning of the resulting code sequence and is stored at the endof the code sequence, the live ranges of the registers in the different laddersdo not overlap, and the same real register,for example RI,can be usedfor each of them. The algorithm consists of the following five steps. 1. Find an acceptable ladder sequence S that has the property that none of its nodes has more than one incoming data dependency. 2. If any operand of a node N in S is not a leaf but another node M , associate a new pseudo-register R with M if it does not have one already; use R as the operand in the
code generated for N and makeM an additional root of the dependency graph. 3. Generate code for the ladder sequenceS, using RI as the ladder register. 4. Remove the ladder sequenceS from the data dependency graph. 5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code. In step 1 we want to select a ladder sequence for which we can generate code il~mediately in a last-to-first sense. The intermediate values in a ladder sequence can only be used by code that will be executed later. Since we generate code from last to first, we cannot erate the code for a ladder sequence S until all code that uses i n t e ~ e d i a t evalues from S hasalreadybeengenerated. So anysequencethathas i n c o ~ i ndata ~ dependencieswill have to wait until the code that causes the depend~ncieshas been generated and removed from the dependency graph, together with its dependencies. This explains the ‘incoming data dependency’ part in step 1. Step 2 does a simple-~ndedform of register allocation. The other steps speakfor themselves.
X
Y
*
+
c
d
+
‘50 Cleaned-up data dependency graph for the sample basic block.
Returning to Figure 4.44, which is repeated here for convenience (Figure 4.50), we see that there are two available ladder sequences without ~ultiple inco~ing data dependencies: x,+,+,*,in which we have followed the right operand of the second addition; and y, *, +. It makes no difference to the algorithm which one we process first; we will on the weak grounds thatwe are generati~gcode laststart here with the sequence y, *,-I-, to-first, and y is the rightmost root of the dependency graph. T’he left operand of the node + in the sequence y, *, + is not a leaf but another node, the + of a + 1, and we associate the first free pseudo-register X1 with it. We make X1 an additional root of the dependency graph. So we obtain the following code:
Load-Reg Add-Cons Mu1t-Mem Store-Reg
X1 ,R1 tl ,R1 d ,R1 R1 ,y
Figure 4.51 showsthedependencygraph removed,
after theaboveladdersequencehasbeen
x t
+
G
+
Fi~ure4.51 Data dependency graph after removal of the first ladder sequence.
The nextavailableladdersequencecomprisesthenodes S , +, +, *. Wecannot include the + node of a + 1 in this sequence, since it has three incoming data dependencies ratherthan one. The operands of the final node * arenotleaves, but they do not require a new pseudo-register, since they are already associated with the pseudo-register XI. So the generated code is straightfor~ard: X1 ,R1 Load-Reg Mu1 t-Reg X1 ,R1 b,Rl Add-Mem c,Rl Add-Mem R1,x Store-Reg
Removal of this second ladder sequence from the dependency graph yields the graph shown in Figure 4.52. The available ladder sequence comprises both nodes: X1 and +;it rewrites to the following code: Load-Mem Add-Const Load-Reg
a ,R1 l,R1 R1 ,X1
I
Data dependency graph after removalof the second ladder sequence.
Removing the above ladder sequence removes all nodes from the dependency graph, and we have completed thisstage of the code generation. The result is in Figure 4.53.
Load-Mem Add-Cons Load-Reg
a,R1 t1,Rl R1 X1 I
X1 R1 Load-Reg Mu1 t-Reg x1 R1 Add-Mem b,R1 Add-Mern clR1 R1,x Store-Reg I
Load-Reg Add-Const Mu1t-Mem Store-Reg
X1 ,R1 1,Rl d lR1 RllY
igure 4.53 Pseudo-register target code generatedfor the basic block.
e~ister alloc~tion for the line~rizedcode One thing remains to be done: the pseudo-registers have to be mapped onto real registers or, failing that, to memory locations. There are several ways to do so. One simple method, which requires no further analysis,is the following. We map the pseudo-registers onto real registers in the order of appearance, and when we run out of registers, we map the remaining ones onto memory locations. Note that mapping pseudo-registers to memo^ locations is consistent with their usage in the instructions. For a machine with at least two registers, R1 and R2,the resulting codeis shown in Figure4.54. Note the instruction sequence Load-Reg R1 ,R2 ; Load-Reg R2,Rl,in which thesecondinstructioneffectivelydoesnothing,Such‘stupid’instructionsaregenerated often during code generation, usually on the boundary between two segments of the code. There are at least three ways to deal with such inst~ctions: im~roving the code generation
Load-Mem Add-Cons Load-Reg
a ,R1 t1 ,R1 R1 ,R2
Load-Reg Mult-Reg Add-Mem Add-Mem Store-Reg
R2 ,R1 R2 ,R1 b,R1 c,Rl R1 ,x
Load-Reg Add-Const Mult-Mem Store-Reg
R2 ,R1 1 ,R1 d,R1 R1,y
Code generated for the program segment of Figure 4.41.
algorithm; doing register t r ~ c ~ ias ~ gexplained , in the last paragraph of Section 4.2.4.3; and doing~ e e ~ ~ u ~Z tei ~ i z ~ as t i will o ~ ,be explained in Section 4.2.12. A more general and better way to map pseudo-registers onto real ones involves doing moreanalysis. NOWthatthedependencygraphhasbeenlinearizedwecanapply live ~ ~ ~ Z y as s i described s, in Section 3.2.5, to determine the live ranges of the pseudo-registers, and apply the algorithmsfrom Section 4.2.7 todo register allocation. For comparison, the code generated by thefull optimizing version of the GNU C compiler gcc is shown in Figure 4.55, converted to the notation used in this chapter. We see that i s has avoided both Load-Reg R 2 ,R1 instructions, possibly using register tracking.
Load-Mem Add-Const Load-Reg
a ,R1 1,Rl R1,
Mult-Reg Add-Mem Add-Mem Store-Reg
R1 ,R2 b,R2 c,R2 R2 ,x
Add-Const Mu1t-Mem Store-Reg
1 ,R1 d ,R1 Rl,y
R2
Code generatedby the GNU C compiler, gcc.
Pointers cause two different problems for the dependency graph const~ctionin the above sections. First, assignment under a pointer may change the value of a variable in a subsequent expression: in a = x * y ; *p = 3; b = x * y ;
x * y is not a common subexpression if p happe~sto point to x or y. retrieved from under a pointer may change after an assignment: in a = *p * g; b = 3; e = *p * g ;
*p * q is not a common subexpression if p happens to point tob.
Static data-flow analysis may help to determine if the inte~erencecondition holds, but that does not solve the problem entirely. If we find that the condition holds, or if, in the more usual case, we cannot determine that it does not hold, we have to take the interference into account in the dependency graph construction.Ifwe do this, the subsequentcode generation algorithm of Section 4.2.5.2 will automatically generate correct code for the basic block. 'The intederence caused by an assignment under a pointer in an expression can be inco~oratedin the dependency graph by recognizing that it makes any variable used in a subseq~entexpression dependent on that assignment. These extra data de~endenciescan be added to the dependency graph. Likewise, the result of re~ievinga value from under a pointer is dependent on all preceding assignments. Figure 4.56 shows a basic block similar to that in Figure 4.41,except that the second ~~~e~ x rather than to x. The data ~ependencygraph in Figure 4.57 a s s i ~ ~ m eassigns nt features two additional data dependencies, leadi from the variables n and d in the third andfourthexpressiontotheassignmentunder ointer. Theassignmentitself is marked with a *;note that the x is a normal input operand to this assi nment operation, andthat its data dependency is downward.
{
1
int n; n = a + l ; *x = b + n*n n = n + l ; y = d * n ;
+
c;
Figure 4.56 Sample basic block with an assign~entunder a pointer.
-.
-.
_"""""
7 Data d e p e ~ d e n c ~raph with an a ~ ~ i g ~ under ~ e natpointer.
Since the n inthethird e~pressionhasmore data expression two, it is not a c o ~ o subexpression, n two. As a result, the variable n cannot be e ~ i l ~ i n a h, Figure 4.58. Where the dependency grap ence x, +,+, *,this sequence is now not av has an i n c o ~ i ndata ~ depe~dence.The only available s the co~espondingcode and rernovi the = :* node. This makes the sequ ingthenode n, sincethelatterhastwoincornin sequence is n,= :,+. The r e s u l t i ~code ~ can be foun The codefeatures a pseudo-inst~ction Ins~uction
Store-Indirect-Mem
whichstores the contents of
Actions
R,,x
" x :=R,;
r R, underthepointerfound i ave such an instruction, but
ee that the code d replaced by more expensiv
x
Y
*
a
1
Cleaned-up data dependency graph with an ~ s s i ~ n under ~ e ~at pointer.
a,R1
Load-Mem Add-Cons t Store-Reg
R1,n
Load-Mem Mu1t-Mem Add-Mem Add-Mem Store-Indirect-Mem
n,R1 m,R1 b,R1 c,R1 Rl,x
Load-Mem Add-Cons t Mu1t"em Store-Reg
n,R1
1,r1
1,r1
d,R1 R1,Y
,59 Target code generatedfor the basic blockof Figure 4.56.
ter' in C, no pointer to it can be obtained,so no assign~~ent under a pointer can affect e same applies to local variables in languages in whichno point~rsto local variables can be obtained. Also, if the source language has strong typing, one can restrict the added dependencies to variables of the same type as that of the pointer under w h i c ~the assignof variables an assign~entcan possibly ment took place, since that type defines the set
P affect. This concludes our discussion of the heuristic optimized code generation for basic e will now turn to a very efficient method to generate optimal code for expressions.
3 . 2 we have shown how bottom-up tree rewriting can convertan A hmetic ex~ressioninto an inst~ctiontree which can then be linearized. In o used only very simple machine instructions, with the result that the tree rew cesswascompletely deter~inistic. In practice,however,machinesoftenhave a great variety of i~structions,simple ones and CO icated ones, m d better code can if allavailable inst~ctionsareutilized.chinesoftenhaveseveralhu ~ a c h i n e i n s t ~ c t i ooften n s , each with ten or more addressing modes, and it eous if' codegenerators for suchmachinescouldbederived from a concise descriptionratherthanwrittenbyhand. It turnsoutthatthe com~inationof ottom-up pattern matching and d y n a ~ i cprograml~ingexplained below allows precis el^
small set of instructions of a varied nature; the setis more or less ~achines,large enough to show the principles involved and small eexplanationmanageable. For each inst~uctionweshow tics in the form of a formula, its cost of exec t r q units, its name, both a~breviatedand in full, and a number label in ourpatternmatchingalgorithm. Since wewillbematchin 11, each node in the AST of an instruction has been given a label: instruction, the simple label goes to its top node and the other nodes are labele tosomescheme.For examp the Nult-scaledits only subnode is labeled .1. We will call the ee, becausewewillusethes Ts as patternsin a p s an aside, the cost figures in igure 4.60 suggest that on ~ e m o r ycosts 3 units,multiplicatiocosts 4 units,addition is apparently done in parallel with other CPU activities, and if an instruction contains tw m~ltiplications,their activities overlap a great deal, Such conditions and theco~espondin i~egularitiesin the cost structure are fairly common.If the cost structure of the i n s t ~ c t i o ~ set is such that the cost of each inst~ctionis simply the sum of the costs of its a p p ~ ~ n t s no gain in choosing co~bined inst~ctions, and simple code generat real-world machines are more baroque, for better or for worse. The AST contains three types of operands: mem, which indicates the co~tentso C S t,which indicates a constant; and reg, which indicates the conte h inst~ctionyields its (single) result in a register, which is used as t nother instruction7 or yields the final result of the expression to be compiled. The instr~ctionset shown here has been restricted to addition and multiplicatio instructions only; this is sufficient to show the algorithms. The 'scaled register' i
Rn
#l est
#2
mem
Load-Const est, R,
load constant
cost = 1
Load-Mem mem, R,,
load from memory
cost = 3
Add-Mem mem R,
add from memory
cost = 3
Add-Reg RI,R,
add registers
cost = 1
Mult-Mem nzem, R,
multiplyfrommemorycost
I
#4
/+\
R,
R1 12
*
#
/\
Mu1
#7
/+\
t-Reg R,, R,
Add-Scaled-Reg cst ,R, R, I
=6
multiply registers
cost = 4
addscaledregister
cost = 4
R,
est/*\
R,
B
Mul /*\
t-Scaled-Regcst ,R, ,R,
multiplyscaledregistercost
Sample inst~ction patterns for BURS code
en era ti on.
=5
tions #7 and #8 are s o ~ e w h a unnatural, t and are i n t r o ~ u c eonly ~ for ote that it is quite simple to describe an inst~ctionusi inear nature of the AST; this is necessary if we inst~ctionsto an auto~aticcode g e n e ~ ~ t ogenerator. r ~nstruction be specified bya line containing fourse~icolon-separatedfields: reg +:= (cst*regl); Add-Scaled-Reg cst,regl,reg; 4; Add scaled register
The firstfieldcontains eno i n f o ~ a t i o nto co~struc thesymbolic inst~ctiontoissued;thethird field is an e yields the cost of the inst~ction;and the fourth field is the be used in diagnostics, etc. The third field is an expression, to be evaluated inst~ctionis considered, rather than a fixed constant. an i n s t ~ c t i dependent o~ on the context.
(cst ==
1.
II
cst == 2
II
cst == 4) ? 3
: 4
Another form of context could be a compiler flag that indicate should optimize for program size rather thanfor p r o g r a ~speed. then be: O p t i ~ i ~ ~ F o ~? ~3 p :e (cst e~
>
in which the 3 is an indication of the time consu~ptionof arethe instructio~sizes for smallandnon-smallvalue numbers suggest thatc s t is stored in one byte if it fits in 7 not u n ~ ~ s u~angement). al ing to generate code fo
e. A rewrite of th described by attaching the instruction node labels to the n to see that there are many possible rewritesof our input tr ure 4.60. For e ~ a ~ p lFigure e, 4.62 shows a naive rewrite, wh ,and #6 only;these co~espondto those of a pure results in 7 inst~ctionsand its cost is 17 units, using line~ization,as obtained following the t e c h ~ i ~ uexplai es register allocation in~ection4.2.4.2, is shown in Figure 4.63. ure 4.64 illust~atesanother rewrite possibility. st-fit algo~thm:starting from the top, was chosen,andtheoper theoperatorsinthetree r e ~ u i r e ~ e n of t s that inst~ction. Thisforce rewrite is better than the naive one: it uses 4 cost is l 4 units. On the other hand, the top-down l
b
/+\* 4
/\ /\ *
8
a
Input tree for the BURS code generation.
+ #4
l
#l 4
/\R l l
R #l 8
#2 a
Naive rewrite of the input tree.
Load-Cons t Load-Hem Mult-Reg Load-Const Mult-Reg Load-Mem Add-Reg
8,R1 a,R2 R2 ,R1 4 I R2 R1 I R2 b, R1 R2 ,R1 Total
;1 ;3 ;4 ;1 ;4 ;3
unit units units unit units units ; 1 unit = 17 units
3 Code resul~ingfrom the naive rewrite.
rewrite the top of the tree in such a way that no rewrites can be found for the bottom parts; This discussion identifies two main problems:
#2
i: b
* #5
4 /*\
/\
'i
#l 8
Top-down largest-fit rewrite of the input tree.
Load-Const Mu1t"em Load-Mem Add-Scaled-Reg
8,Rl a,Rl b,R2 4,Rl,R2 Total
;1 ;6 ;3 ;4
unit units units units = 14 units
.65 Code resulting from the top-down largest-fit rewrite.
do we find all possible rewrites, and how do we represent them? It will be clear t we do not fancy listing them all! do wefindthebest/cheapestrewriteamongall ~ossibilities,~ r e f e ~ in ~ btime l~ mear in the sizeof the expression to be translated?
y a form of botto~-uppattern m . Thistechnique is known as a in particular, the code is generat tree: s bottom-up and identifies possible instruci s top-downandselectsat
each nodeone llected during the previous scan; bottom-up and emits the inst~ctionsin the
correct linearized order. ach of the scans can be i~plementedas a recursive visit, the first and the third ones as post-order visits and the secondas a pre-order visit. The instructi0~-collectingscan is themost i~teresting,andfourvariantswillbe
first variant finds all possible inst~ctionsusing item sets s all possible instructions using a tree automaton (Section ists of the first variant followed by a bottom-up scan that identifies the best tions using dynamic programming, (Section 4.2,6.3), and the final onecome third into a single efficient bottom-up scan (Section 4.2.6.4).
tween each pair of characters a set of in which a position was marked by a nized from the part we still hoped to
d at thebe~inningof Section 4.2.6. resent in the label set at node N in attern tree I (including the node L ) N. Moreover, we hope to be able pattern tree co~espondsto a reg-
o ~ ~ ~ i o n d e t which e ~ i n inst~ctions es can accept the result as an operand. For register, its top node cannot be to be inamemorylocation. the location in which the result mber #7 as an example. The igure 4.66a. When the label e write the compound label re 4.66b. When there is no after a character from the
The most obvious one thesecondonatree, ble token s t a ~ i n gat a
c
#7-
>reg
#7.1
The dotted trees c o ~ ~ s p o n dto i n#'7+reg ~ and to #"7.l.
given position; we then make that decision state. In treematchingwekeep all po reaches the top of the input tree and we line code for this bottom-u~pattern rec the co~espondingtyp
final and rest
node: the operand should bea memory loca only one such pattern: the variable can be there are twolabels here, +mem The rewrite ~ossibilitiesfor the n +c st for the constant itself and # 1+r The label sets for nodes 8 and a are obtained similar~y. The lower * node is next and its label set is more interest in^. We scan the set of pattern trees again for nodes that could match this node: should have two operands. We find five such nodes: see that we can match our node to the top node of pattern tree #5:
-its left operand is required to be a re~ister,and indeed the label ant 8 in the input tree shows thata register can be found erand is required to be a memory location, the presenc tree is confir~edby the label +mem in the node with variablea. This match results in the additionof a label #5+reg to our node. Next we match our node to the t node of inst~~ction # ~ the : rig required to bea register, and the label can be made available. Next we rec requires a constant for its left operand, which is c o n f i ~ ~ ebyd the l
PROCEDURE Bottom-up pattern matching (Node) : IF Node isan operation: Bottom-up pattern matching (Node .left); Bottom-up pattern matching (Node .right); SET Node .label set TO Label set (Node); for ELSE IF Nodeis a constant: 0 ; SET Node .label set TO Label set for constant ELSE Node is a variable: SET Node .label set TO Label set for variable 0 ; FUNCTION Label set for (Node) RETURNING a label set: SET Label set TO the Empty set; FOR EACH Label IN the Machine label set: FOR EACH Left labelIN Node .left .label set: FOR EACH Right label IN Node .right .label set: IF Label .operator= Node .operator AND Label .operand= Left label .result AND Label .second operand= Right label .result: Insert Label into Label set; RETURN Label set; F ~ C T I O NLabel set for constant(1 RETURNING a label set: SET Label setTO { (No operator, No location, No location, I1Constantr1) }; FOR EACH Label IN the Machine label set: IF Label .operator= lrLoadfl AND Label .operand= "Constantrr : Insert Label into Labelset; RETURN Label set; F ~ C T I O NLabel set for variable( ) RETURNING a label set: SET Label setTO { (No operator, NO location, No location, lrMemorylf) }; FOR EACH Label IN the Machine label set: IF Label .operator= rlLoadll AND Label .operand= lrMemoryrl : Insert Label into Labelset; RETURN Label set; .67 Outline code for bottom-up pattern matching in trees.
node, and a register as its right operand, which is also there; this adds the label the same reasoning we recognize subnode lf.8.1,but we fail to match node #8 to our node: its leftoperand is aregister,which is availableatthe 4 node,but its rightoperand is #$ .l,and #B. 1 is not in the label set of the a node. e next node to be visited by the bottom-~~p pattern matcher is the higher * node, where the situation is similar to that at the lower * node, an where we i ~ ~ e d i a t erecogly nize the top node of i ~ s t ~ c t i kt6 o ~and s the subnode #7.1. ut here we also recognize the top node of i ~ s t ~ c t i #8: o n the left operandof this top nodeis a register, whichis available,
TYPE operator: llLoadll +‘, ’*’ ; TYPE location: llConstantll llMemoryll llRegisterll a label; I
// anode ina TYPE label: an operator FIELD operator; a location FIELD operand; a location FIELD second operand; a location FIELD result;
pattern. tree
8 Types for bottom-up pattern recognitionin trees.
and is #8.1, which is indeed in the label set of the right operand of e the left operand allows a constant and the right operand allows a ude subnode ft8.1. e top nodes of inst~ctions#4 and #7 for the top node of the input tree is now easy.
#4 ->reg
/\ /y +
#7->req
- #6->reg ti7.1
b
#.-;m;ng
*
g
*
4
8
->cst #l ->reg
# 5 ->reg #6 ->reg #7.1
a ->mem 112 ->reg
Label sets resultingfrom bottom-up pattern matching.
t we have obtained in the aboveinst~ction-collect in^ scan is
theinputtreewithsets of possible re~ritinginstructions concise recipe for const~uctin~ tree rewrites using a subse uent node of theinputtreegivesusthechoice of rewrit or by i n s t r ~ c t i o#7. ~ We could, for example, decide to rewrite by forces the b node and the lower * node to be rewritten to registers and the high in place. The label set at the b node supplies only one rewrit #2, but that at the lower * node allows two possibili hoosing inst~ction#5 resultsintherewritesh
#6 causes an additional rewrite of the a node using instrucucceeded in obtaining a succinct representation of all possible
ssible for the pattern set to be insuf~cientto match a given input empty set of rewrite labels at some node, in which case the ck. In practice, however, this is a non-problem since all real mall' inst~ctionsthattheyalonewillsufficetorewriteany Note, for example, that the instructions #I, ft2, #4, and #6 f rewriting any expression tree consisting of constants, varications.Also,the BURS auto~atonconstruction algo~thm allows us to detect this situatio~statically, at compiler con-
t the algorithm sketched above perfoms at each node an amount t of the size of the input tree. Also, the amount of space used to by the n u ~ b e of r possible labels andis also independent of the sequently, the algorith~is linear in the size of the input tree, other hand, both the work done and the space required are proportional to the n set, which can be co~siderable,and we would like to remove this ques from thelexicalanalysissceneproveagainvaluable;morein Ute allpossiblematches at code ge~erator ge~erationtime, chniques as in the generation of lexical analyzers. set of pattern trees (which is supdescriptio~),there is only a finite ,given the operator of Node in Label set operan~s,we can precompute the resulting 1 algo~thmfor lexicalanalyzers in sup~liedby thefunctions Label set f o r constant ieldconstantresults.(Real-world an operand, etc.) Usin bel sets as operands, we check all nodes in the pattern trees to of these operands. If they can, we note the relation and add label set to our setof label sets. We then repeat the process with our enlarged ets, and co~tinueuntil the process converges and no changes occur any more. the theoretically possible label sets are realized in this prolaced by numbers, the states. Rather than storing a label set e store a state; this reduces the space needed in each node ati ion to a constant and quite small amount. The result sionaltable,indexed by operator,leftoperandstate,andrightoperand s the state of the possible matches at the operator. for pattern matching ateach node to that of simple table
indexing in a transition table; the simplified code analysis, the table orithrn usesconstant and S Inanalogyto the -state auto~aton(FSA) used in lexical ~alysis the character list and c o ~ p u t e snew states from old states and input ch
PROCE~UREBottom-up pattern matching (Node) : IF Node isan operation: ottom-up pattern matching (Node .left); Bottom-up pattern matching (Node .right); SET Node .state TO Next state [Node .operand, Node left.state, Node .right.state] ; ELSE IF Node is a constant: SET Node .state TO State for constant; ELSE Node isa variable: SET Node .state TO State for variable; .70 Outline code for efficientb o t t o ~ - ~ pattern p ~ ~ t cin htrees. i ~ ~
With, say, a hundred operators and some t ~ o u s a ~states, d the an empty entry, the origi~alset of p
without slowing them down seriously.
the steps of the algorithm on larger trees, we see that we often recompute the optimal rewrites of the lower nodes in the tree. We could prevent this by doing m e ~ o i ~ a t i oon n the results obtained for the nodes, but it is easier to just precompute these results in a botto~-upscan, as follows. or each node in our bottom-up scan, we examine the possible rewrites as determined by the instruction~collectingscan, and for each rewriting instruction we establish its cost by adding the cost of the instruction to the m i n i ~ a costs l of getting the operands in the places in which the inst~ctionrequires them to be. ~e then record the best rewrite in the node, with its cost, in the form of a label with cost indication. or example, we will write the rewrite label #5+reg with cost 7 units as #5+reg@7. The ~ i n i m a costs l of the operands are known because they were preco~putedby the same algorithm, which visited the co~espondingnodes easlier, due to the ~ o t t o ~ - unature p of the scan. The only thing still needed to get the process started is knowing the ~ i n i m a costs l of the leaf nodes, but since a leaf node has no operands, its cost is equal to the cost of the instruction, if one is required to load the value, and zero otherwise. As withtheoriginal inst~ction-collecting scan(asshowninFigure4.69), this bottom-up scan starts at the b node; refer to igure 4.71. There is only one way to get the value in a register, by using instru~tion#2, and the cost is 3; leaving it in m e ~ o r ycosts 0. The situatio~at the 4: and 8 nodes is also simple (load to register by instr~ction#l, cost = 1, or leave as constant), and thatat the a node is equalto that at the b node. entries, #5+reg, 6+reg, #7.1, and
-A rewrite with pattern tree#5 (=instruction #5) requires the left operand to be placed in a register, which costs l unit; it requires its right operand to be in memory, where ;and it costs 6 unitsitself:together7units.hisresultsinthelabel
it
patte~ tree #6 again requires the left operand to be placed in a register, at ires its right operand to be placed in a register to which costs 3 units; and itself together 8 units. This results in the label .l and #8 .l do not correspond with top nod cannot get a value into a register, so no cost is attached to them. see that there are two ways to get the value of the subtree at the lower * node into a ster, one costing 7 units and the other 8. We keep only the cheaper possibility, the one with instl~ction#5, and we record its rewrite pattern and its cost in the node. We do not the rewrite possibility with instruction #6, since it can never be part of a rewrite of the input tree. A si~ilar sit~ation obtains at the higher* node: it can be rewrittenby instruction #6 at cost 1 (left operand) + 7 (right operand) + 4 (instruction) = 12, or by inst~ction#8 at cost 1 (left operand) + 3 (right operand) + 5 (instruction) = 9. The choice is obvious: we keep reject instruction #6. At the t get e again two possi~ilities: instructio~#$ and at cost (left 3 operand) + 9 (right + (instruction) 1 = 13 ) + 4 (inst~ction)= 1 at cost 3 (leftoperand) + 7 (right choice is again clear: we keep instruction#4 and reject instr~ction ow we have only one rewrite possibility for each location at each node, and we are
+
b
/\
#4->reg 013
#7.1
* #E->reg @9
->mem 00 #2->reg @3
->cst @O
8 ->cst 0 0
#l->reg 01
a ->mem 00 #2->reg @3
Bottom-up pattern matching with costs.
certain that it is the cheapest rewrite possible, given the instru have more than one instruction attached to them, and the next guity in atop-downinstruction-selectingscan,similartotheone de consider the result location required at the top of the input tree, whic donthis i n f o ~ a t i o n ,we ch turn decides the rewrite e top node is rewritten using instruction #4, which requires two register ope selects instruction a register, which decid put into registers, but remains a constant. The labels involved in the actual rewrite have been checked i ure 4 only thing that is left to do is to linearize the rewritten tree into an inst~ction sequence: the code-generation scan in our code generation scheme. As e~plainedin the su~sectionon weighted register allocation in Section 4.2.4.2, we can do this by a recursive process which for each node generates code for its heavier operand first, followed by code lowedbythe inst~ctionitself. The result is s h o in~ ~ gain over the naive code (cost 17 units) and top-down largest-fit (cost 14 units) is lies mainly in the a~ificiallysmall inst~ctionset of our example; real machines have much larger inst~ctionsets and consequently provide much more good pattern matching. algorith~has advantages over the other rewriting algorithms in that it provides optimal rewritin of any tree and that it cannot get stuck, provided the setof instruce techni~ueof finding the ‘best’ path through a graph
by scan~ingit in a fixed
Load-Mem Load-Const Mult-Scaled-Reg Loa~-Mem Add-Reg
a,R1 4,R2 8,Rl,R2 b,R1 R2 ,R1 Total
;3
units unit ; 5 units ; 3 units ; l unit = 13 units ;l
.72 Code generatedby bottom-up pattern matching.
a set of ‘best’ sub-solutions at each node is called
from the infor~ationat nodes that have already been visited. When all a very c o ~ approach ~ o to ~ all kinds of optimi~ationproblems. As already ove, it can be seen as a specific im~lementationof memoization. For a more of dynamic p r o g r a ~ i n g see , text books on ~~gorithms, for example
headditionoperator to saveanotherunit.The e it does not (yet) know about such c o ~ ~ t a t i v i t i e There s. : specify for each inst~ctionthat involves a c o ~ u t a t i v e e code generator generator, or mark c o ~ u t a t i v eoperators r generator and let it add the patterns. The it is more auto~atic,and is less work in the code generator will indeed produce theAdd” cost to 12, as shown in Figure
to find the cheapest possible rewrite for each type of hine on which the cost functions of the i n s ~ u c t i o are ~s
Load-Mem Load-Const Mult-Scaled-Reg Add-Nem
a,R1 4,R2 8,Rl,R2 b,Rl Total
;3
units unit units units = l2units ;1 ;5 ;3
igure 4.73 Code generated by bottom-up pattern matching, usingcorn~u~ativity'
erator generation time and a single state number suffices at each node. The two processes are combined by adapting the subset algorithm from Section 4.2.6.2 to generate a transition table Cost conscious n e x t s t a t e []. Thisadaptation is far fromtrivial, as we shall see.
o ~ ~ ithen ~ i~ ~ t t e~atc r n chi^^
~~~
i~~tr~ction
sel~ctio~
~l~orith~s
The first step in ~ o m ~ i n i nthe g two algorithms is easy: the cost of each label is incorporated into the state; we use almost the same format for a label as in Section 4.2.6.1: ~ ~ Z o c f f t i o ~ This ~ c oextension ~t. of the structure of a label causes two problems: l . Input trees can be arbitrarily complex and have unboundedcosts. If we include the cost inthelabel,therewill bean ~nboundednumber of labels and conse~uently an unbounded numberof states. 2. Subnodes like #7,1 and #8. l have no cost attached to them in the original algorithm, but they will need one here. We shall see below how these problems are solved. create the initial states. Initial states derive from inst~ctions S only,operandsthatareavailablewithout the intervention of further instru~tions. The mostobviousexamples of suchoperandsareconstantsand memory locations, but the program counter (inst~ctioncounter) and the stack pointer also come into this category, As we have seen above, each basic operand is the basis of an initial state. Our example inst~ctionset in Figure 4.60 contains two basic operands -constants and memory locations -and two instructions that operate on them Constants give rise to state S1and memory locations to state S2: State SI: +CSt@O #l+reg@l State S 2 : +mem@O #2+reg@3
We are now in a position to create new states from old states, by precomputing entries of our transitio~table. To find such new entries, we systematically consider all triplets of an operator and two existing states, and scan the inst~ctionset to find nodes that match the
cost of label n in the left state -Icost of label m in the right state -Icost of instruction p
and sucha form is c o l ~ p ~ to e dthe costof a rewrite cost of label q in the left state -Icost of label r in the right state + cost of instruction S
But that means that only the reZ~tivecosts of e labels in each state count: if the costs of all labels in a state are increased or reduced by the same amount the result of parisonwillremainthesame e sameappliestotheinitialstates.This ob allows us to normalize a state a consta~tamount from all costs in We shall n o ~ a l i z estates by subtracting the smallest cost it contains from each of the costs; this reduces the smallest cost to zero. Nor~alization states various the reduces g@6, reg@^, etc., to a single state #4+reg@0. Now this cost 0 no longer means that it costs 0 units to 0 c o ~ p ~ toe dother possi rewrite by pattern tree #;“c,but that that possibility has cost (of which there happen to be none). All this means that the top of the tree will no carry an indication of the total cost of the tree, as it did in igure 4.71, but we would not base any decision on the absolute valueof the total cost anyway, even i loss is not serious. It is of course possible to assess the total cost of a g scan, or even on the fly, but such action is not finite- state^ and require side the FSA. is Cost conscious next s t a t e * / , atetocompute s are #5,#6, #7.1,and #8.l ;the labels for #5 and #6 are #5+reg@1-1-0+6=7 #6”+reg@l+3-1-4=8
of whichonlylabel #5+reg@7 survives. ~ o ~ p u t i ne costs for thelabe nodes #7.1 and #8.1 involves the costs of the no es then~selves,which are u decide to localize the entire cost of an instruction in its top node, so the cost of the subnodes is zero. No cost units will be lost or gained by this decision since subnodes can in the end onlyc o ~ b i n with e their own top nodes, which then carry the cost.So the new state is #5+reg@7 #7.1@0+3+0=3 #8.1@0-1-3+0=3
which after normalization reduces to #5+reg@4 #7.1@0 #8.1@0
We continue to combine one operator and two operand states using the above techniques untilnomorenewstatesarefound.orthe inst~ctionset of Figure 4.60 this
yields l 3 states, the contents of which are shown in Figure 4.74.. The states SI, S2, S3, and S4 in our explanation correspond to S S 02, S 03, and S 5 , respectively, in the table. The state S o o is the empty state. Its presence as the value of an entry Cost cons cious next state [op, S,, S,] means that no rewrite is possible for a node with operator op and whose operands carry thestates S, and S,. If the input tree contains such a node, thecode €eneration process will get stuck, and to avoid that situation any transition table with entries S o o must be rejected at compiler €eneration time. A second table (Figure 4.75) displays the initial states for the basic locations supported by the instruction set.
Figure 4.74 States of the BURS auto~atonfor Figure 4.60.
c s t : so1 m e m : So2 Figure 4.75 Initial states for the basic operands.
The transition table Cost conscious next state E I is shown in Figure 4.76; we see that it does not contain the emptystate S oo. To print the three-dimensional tableon two-dimensionalpaper,thetables for theoperators -I- and * are disp~ayedseparat~ly. Almost all rows in the tables are identical and have already been combined in the printout, compressing the table vertically. Further possibilities for hori~ontalc o ~ p r e ~ s i oare n clear, even in this small table. This redundancy is c h ~ a c t e ~ s tof ic using the proper techniques, such tables can be compressed to
The last table, Figure 4.77, contains the actual rewrite information. It specifies, based on the state of a node, which inst~ctioncan be used to obtain the result of the expression in a given location. mpty entries meanthatno inst~ctionis required, entries with -mean that no inst~ctionis available and that the result cannot be obtained in the required location. Forexample, if a node is labeledwiththestate So2 anditsresult is to be delivered in a register, the node shouldbe rewritten using instruction #2, and if its result is required in memory, no inst~ctionis needed; it is not possible to obtainthe result as a constant.
S03
s07
S05
S03
s07
s03
sll
s03
s03
s07
S11 S07S07
s03
s03
s03
s03
s07
S07
S07
S11
F i g ~ 4.76 ~ e The transition table Cost conscious next
$11
state [1.
CSt
mern reg Figure 4.77 The code generationtable,
ode ~ ~ ~ @ r usin^ t i t~~ o ~cost-co~scio~s ~e~t-stut~ t~ble The process of ~eneratingcodefrom an inputtreenowproceedsasfollows.First all leaves are labeled with their co~espondinginitial states: those that contain constants with S o and those that contain variables in memory with So2, as specified in the table in Figure 4.75; see Figure 4.78, Next, the bottorn-up scan assigns states to the inner nodesof the tree, using the tables in Figure 4.76. Starting at the bottom-most node which has operator
#4
States and i ~ s ~ r uused ~ ~ini ~ ~ s
One of tlne most pleasant properties of the circumstances. We will give s getting the value of the expres essentially the same and can has two kinds of registers, A ppose, for example, th -registerscannot.Th tion whether it is an
rewriting with instruction#73. As a different example, suppose we want to use the size of the CO when two rewrites have the same r u n - t i ~ ecost (which happens often). i ~ ecode size). Now, when com~ari cost pair rather than a single cost value: ( ~ ~ n - tcost, costs, we first comp e the r u n - t i ~ ecost fields and if they turn out to be equal, we CO pare the code sizes. these are equal too, the two sequences are equivalent as to cost, and we can choose either. Another advantage of this method is that if we want to optimize for code size, we just compare them as ordered integer pairs with the first and the second element e~chan~ed. The run-time cost will then be used as a tie breaker when two rewrites require the same amount of code. Anadaptation in a completelydifferentdirectionagain is toinclude all mac~ine inst~ctions-flow of control, fast move and copy, conversion, etc. -in the instruction set and take the complete A of a routine (or even the entire compilation unit) as the input tree.Instructionselectiand linear~zationwouldthenbecompletely a~tomatic. Such S t~chnologyare still experi~ental. eatment of the theory behind bottom-u~tree rewriting is given by (19 9 ~ ) for ; the full theory see Aho and Johnson (1976). A more recent publication on the application of d y n a ~ i c ~ m i to n tree ~ rewriti Proebsting (1995). An interesting variation on the al~orithmis described Ganapathi, and Tjiang(1989). In thissectionwehaveassumedthatenoughregistersilable for anyrewritewe choose. For a way to include register allocation into the 4.25. o ode generation by c o ~ ~ b i n bottomin~ Thisconcludesourdiscussion matching and dynamic pro have e seen that it yields optimal code when applicable, but thatit isap~licableto pure trees only.
In the subsection on procedure-wide
registe~allocation in Section 42.42, we have seen
that naive register allocation for the entire routine ignores the fact that variables only need registers when they are live. On the other hand, when two variables are live at the same position in the routine, they need two different registers. We can therefore say that two variables that are both live at a given position in the program ‘interfere’ with each other when register allocation is concerned. It will turn out that this inte~erenceinformation is important €or doing high-quality register allocation. Without live analysis, we can only conclude that all variables have values at all positions in the program and tbeyall interfere with each other. So it is essential for good register allocation to have done live analysis on the variables. We will de~~onstrate the techof Fi nique of register allocation by graph coloring, using the program segment
a := read0; b := read0; G := read(); a := a + b + c; 0 , 0 for n = 0 , and -1 for n c 0. Figure 4.84 shows the optimal code sequence found by supercompilation on the Intel 80x86; the sequenceis surprising, to say the least. The cwd inst~ctionextends the sign bitof the %ax register, which is assumed to contain the value of n,into the %dx register. Negw negates its register and sets the carry flag cf to 0 if the register is 0 and to 1 otherwise. Adcw adds the second register plus the carry
;n
in register %ax cwd ; convert to double word: I (%dx,%ax) = (extend-s negw %ax ; negate: (%ax,cf) := ( - % a x adcw %dxl%dx ; add with. carry: %dx := %d
O ~ ~ i code ~ a for l the function S ign (n ) .
ase n > 0
%dx %ax initially:
cf
-
n
-
0
n
-
0
-n
1
-n
ase %dx
n = 0 %ax cf 0
-
0
0
-
l
0
0
l
0
0
_ .
ase
n
< 0
%
-
n
-
-1
n
-
0
-1.
-n
1
0
-l
-n
1.
cwd negw
%ax
adcw %dxl%dx
Actions of the 80x86 code from Figure 4.84.
e
Techni~ue
roblem
uality
xpression trees, usin re~ister-re~ister or ~ e ~ o r y - r e ~ iinstrucste~ tions ptimal ptimal euristic
lnst~ctionswith cost funcptimal euristic egister allocation when all
euristic
~ o ~ ~ ~ofisome s o code n eneration t~c~lliques.
Tbe desc~ptionof code ~ e n e r a t i o ~ t e c h n i ~in~ this ~ e sbookaints a rel~tively oder rate view of code ~ptimization. enerators often are much more a and use tens and sometimes h ues and tricks, each of w ~ i c hcan ple interfere with each of the other optimi~ations.Also, such code ~ e n e r ~ t ooften r s distinguish large numbers of special these special cases and the tricks c o ~ b i ~ a t i owith n s any of the 0th
o p t i ~ i z a t i o ~and s fails when in error lies
c
In one non-optimizi~~ c
when c o ~ p i l ~ d i ~ o ~ t cessaril~ the thatmean details on the
Figure 4.87 Incorrect C p r o g r with ~ ~ co~pil~tion-dependent effect.
just after the array A [1 0 ]. When during execution i reached the value 10, the assignment A f 1.0 1 = 2 *1 0 was pe~ormed,which updated i to 20, since it was located at the position where A [1.01 would be if it existed. So, the loop t e ~ n a t e dafter having filled the i was allocated array as expected. In another, more optimizing compilation, the variable in a register, the loop body was performed 20 times and i n f o ~ a t i o noutside A [] or i was ove~~tten. Also, an uninitialized variable in the program may be allocatedby chance in a zeroed predicta~ly location in one form of compilation and in a used register in another, with unpredictable results for the ~ n n i n gprogram. All this leads to a lot of confusion and argume~tsabout the de~arcationof responsibilities between compiler writers and compiler users, and compiler writers have sometimes gone to great lengths to isolate optimization errors. When introducing an opti~ization, it is impo~antto keep the non-opti~zingcode present in thecode generator and to have a simple flag allowing the opti~izationto be performed or skipped. This allows selective testingof the optimizations and anyof their combinations, and tends to keep the opti~zationsrelatively clean and independent, as far as possible. It also allows the following drastic technique, invented by Boyd and Walley (1993). A counter is kept which counts the number of opti~zationsapplied in the compilation of a program; at the end of the compilation the compiler reports something like ‘This co~pilationinvolved N optimizations’. Now, if the code generated for a program P malfunctions, P is first compiled with all optimi~ationsoff and run again. If the error persists, P itself is at fault, otherwise it is likely, though not certain, that the error is with the optimizations. Now P is compiled again, this time allowing only the first N / 2 opti~izations; since each optimization can be applied or skipped at will, this is easily implemented. If the error still occurs,the fault wasdependentonthe first N / 2 opti~zations,otherwise it binary search will thus lead us to depended on the last N-N/2 opti~zations. ~ontinued the precise o ~ t i ~ z a t i othat n caused the error to appear. Of course, this optimization need not itself be wrong; its m~f~nctioning could have been triggered by an error in a previous optimization, But such arethe joys of debugging,..
As we have seen at the beginning of Section 4.2, it may be useful to do preprocessin the i n t e ~ e d i a t ecode. We have seen that the interm~diatecode originates from source-
langua~e-depe~dent inte~ediate code ene era ti on, which removes most specificfeaturesandperformstheific optimi~atio~s required by the loopsandcasestatementshavebeeoved from imperat~ve~ r o g r ~ s , from functionalprograms,andunification from logicpro involvin~themhavebeendone.asically,onlyexpressions,if-state in. So preproc the i n t e ~ ~ e d i acode t e conce~tr~tes onthes
xp~essionsare consta~tf o l ~ i ~ e for~ c o~m ~ i l e - t i ~ e e~pression eval~ation. For example, most Cco~pilerswill c o ~ p i l the e routine is the traditional t
char lower-case-from-capital(char ch) return ch + (’a’ - ’A‘);
{
char lower-case-from-capital(char eh) return ch -I- 32;
{
1
as
1
since a has the integer value 9’7 and A is 65. Some compilerswillapply ~mutativityandassociativity mles to ex order to find co~stantexpression. ch compilers will even fold the constants I
I
char lower-case-f~o~-capital(charch) { return ch + ‘a’ - ’A’;
1
in spite of the fact that both co~stantsdo not sharea node in Constant folding is one of the simplest and most effe program~ersusually will not write constant expressionsdi arise from character constants, macro processing,symboli code ene era ti on.
that the semantic
. Some care has to be inprocess. the For ex
eneratedfrom specifications. In principle,constant extreme case of arith~eticsimpli~cation.
E + O
E * l E ** 1 l ** E
foldin
= $ E = $ E * E
*
l
.88 Some transfor~~tions for arithmetic ~ i ~ ~ l i f i c a t i o n ,
orithmsfor thereduction of multiplic~tiontoaddition are suppliedby dy (1977); generalized operator stren th reduction is discussed in depth
inated.
of R. To this end, a copy i s made of the of S in the place of the call. S o ~ e w
same, since only one call is replaced by eac replace the parameters in-line, in macro sub necessary to i ~ p l e m e n tthe p ~ a m e t e rtr uage. So the call node in S is rep1 properly, a block that results from cop handle the return value, if applicable. As an example, the C routine prin igure 4.90. Note thatthe naive ma -I-+* i++ ) would be incorrect, since shown that there are no other calls to
eliminated, but this may not be easy to determine, especially not in the presence of separate compilation. void S { ,*. print-square (i++);
...
void print-square (int n) { printf ( "square = %d\nl@ , n*n) ;
1
Figure 4.89 C code with a routine to be in-lined.
void S { ... {int n = i++; printf ("square = %d\n", n * n ) ;} ...
1
void print-square (int n) { printf ("square = %d\n", n * n )
1
;
.90 C code with the routine in-lined.
The obvious advantage of in-lining is that it eliminates the routine call ~ e c h a n i s ~ , which may be expensive on some machines, but its greatest gain lies in the fact that it opens the door to many new optimizations, especially the more advanced ones. For example, the call pr int-square (3 ) is in-lined to {int n = 3; printf ('Isquare= %d\nit, n * n ) ;}
which is transformed by constant propagationinto {int n
= 3;
printf ("square = %d\n",
3*3)
;}
Constant folding then turns thisinto {int n = 3; printf(llsquare= %d\n1I,g ) ; }
and code generation for basic blocks finds that the variable n is not needed and generates something like
“square = %d\n”
Load-par Load-par Call
*
9
printf
not always live upto the expectations of the implementers; see, for all and Torczon (1992). The reason is that in-lining can co~plicatethe r o ~ r text a ~ to ~ such an extent that some otherwiseeffective opti~izationsfail; also, inforneeded for opti~zationcanbelostintheprocess.Extensivein-liningcan, for te very large expressions, which may require more registers than are availonsequent degradation in performance. Also, duplicating the code of a soumay increasetheload on theinstruction cache. These are examples of conflicting The complications of in-lining function calls in C are discussed by Hwu and
Chang
is s i ~ l ato r in-lining in thata copy of a routine is made, but rather than using place a call, it is used to create a new routine in which one or more parametershavebeenreplaced by constants. The cloning of a routine R is useful when static alysis showsthat R isoftencalledwiththesameconstantparameter or parameters. se for example that the routine double power-series (int n, double a[l , double x) { double result = 0.0; int p; for (p = 0; p S: n; p++) result += aCp1 * (x ** p); return result;
1
n
u,X”
,is called with x set to I .O. Cloning it for this p ~ a m e t e yields r the
p=O
new routine double power-series-x-l (int n, double a[]). { double result = 0.0; int p; for (p = 0; p S: n; p++) result += a[p] * (1.0 ** p) ; return result;
1
etic si~plifi~ation reduces this to double power-series-x-l (int n, double a[1) { double result = 0.0; int p; for (p = 0 ; p S: n; p + + ) result += a [p1 ; return result;
1
ach call of the form ower-series (n, a, 1 . 0 ) is then power-series-x-l (n, a),and a moree transformation is useful even if there is only one constant p ~ a m e t e rcan be substituted in macro fashion, si have side effects. A large p r o ~ o ~ i of o nthe calls with constant par tines, and cloning is most effective when the complete ing the library routines,
As we have seen above (Fi ures 4.54 and
4 5 9 , even ~oderately so~histicated cod eration tec~niquescan produce stupid inst~ctionsequences like Load-Reg R2 ,R1
or Store-Reg Load-Nern
R1,n n ,R1
One way of reme~yingthis situatio~is to do postprocessing in the form of replaces ization. sequences of symbolic ~ a c h i in ~ b input questions: es. two what raises This what er we are i n s t r ~ c t i osequences; ~ an instructions to be replaced? The two questions can be answered inde~ende~tly.
. A replace~entpattern consists of three components:
ters that fulfill the conditions. Its applicatio~consists of replacing the ~ a t c h inst~ctions e~ by the instructions in the r e p l a c e ~ ~list, n t with the para~eterssubs will normally be shorter. An example in some ad-hoc notation is LOad-Reg R, ,Rb;
Load-Reg R,, Rd
I
Ra=Rd, Rb=R,=> Load-R
which says that if we find the first two Load-Reg instructions in t they refer to the same but reversed register pair, we should replace them(=>) instruction. It is t e ~ p t i n gto c o n s t ~ c at full set of replacement patterns for a
which can be applied to any sequence of symbolic l~achineinstructions to o efficient sequence, but there are severalp r o b l e ~ swith this idea, The first is that instruction sequences that sequences are rarer than one fight think. For ex increment inst~ctionIncrement R,, which in efore accepti~git as a replaceme~tfor Add-Const 1 ,R, instructions affect the condition registers of the machine in the same way and react to er overflow in the same way. If there is any difference, the replacement cannot be accepted in a general-pu~osepeephole optimizer. special-purpose and is used after a code generator th ters and is used for a language that declares ow undefined, the e the ment can be accepted withoutpro~lems. he second is thatwewouldenlike to accept r e ~ e n t that s patentlydo not do thing e as the original, r example, we W like to replace the sequence Load-Const l,R,; Add-Reg R,,R, by I n c ~ ~ r n ~ ~ ut t this is inco~ect sincethefirstinstructionsequenceleaves R, set to l andtheseoes not affectthat enou the s e c o ~ duse of register. however, If, the code generator is kind R, is its lastreplacement the use, is correct. nreplacement the pattern Load-COnSt l,Ra; Add-Reg Rb,R, Increment R,
1
R, = Rb, ~ ~ - l ~ ~ t - U S e => (~b)
Last-use infor~ationmay be readily obtaine when the code is being not be available to a general-purpose peephol he third is that code generators usually have a very ~imited re~ertoire of inst~uctio~ se~uences,and a general-pu~osepeephole optimizer contains many ~ a t t e that ~ s will just never match anything that is generated. e~lacementp a t t e ~ can s be created by ~ ~ orn~ ed~ ~ r abt e d postprocessing, a handw~tten replace~ent patternsetsuffices. structed by somebody with a good nowl ledge of the ma chin^ in pages of generatedcode.Goodreplacementpatternsthen ea Experience shows (Tanenbaum et al., 1982) that about a hundred take care of almost all correctable inefficiencies left by a relative1 xperience shown also (Davidson has an that s e ~ c h i n gfor clever peephole optimizations is ente~ainingbut of most useful o~tirniz~tions are generally obvious. eplacement patterns canalso be derive a process generation code similar to by bo o, three, or more instruction trees are combined into one tre If this rewrite hasa lower total cost than the rnent pattern. The process is described by This auto~aticprocess is especially peep~ole opti~~ization. An example is the use of peephole opti~izationto s u b s ~ the ~~e entirecodegenerationphasefromintermediatecode to machineinstructions ( 1990). In this process, the inst~~ctions of the inter~ediatecode and the tar
instructions together are considered instructions of a single imaginary machine, with the provisothatany i n t e ~ e d i a t ecode instruction is moreexpensivethananysequence of machine instructions. A peephole optimizer is then used to optimize the i n t e ~ e d i a t ecode instructions away. The peephole optimizer is generated automatically from descriptions of both the intermediate and the machine instructions. This combines code ge~erationand peephole optimization and works because any rewrite of any inte~ediate inst~ctions to machine instructions is already an improvement. It also shows the interchangea~ilityof some compiler construction techniques. S
We will now turn to techniques for locating instruction sequences in the target inst~ction list that match any of a list of replacement patterns; once found, the sequence must be replaced by the indicated replacement. A point of consideration is that this replacement may cause a new pattern to appear that starts somewhat earlier in the target instruction list, and the algorithm must be capable of catching this new pattern as well. Some peephole optimizers allow labels and jumps inside replacement patterns: GOT0
La; Lb:
I
La = Lb => Lb:
butmostpeepholeoptimizersrestricttheleft-handsideofareplacementpatterntoa sequence of instructions with the property that the flowof control is guaranteed to enter at the first instruction and to leave at the end of the last instruction. These are exactly the re~uirementsfor a basic block, and most peephole optimization is done on the code produced for basic blocks. The linearized code from the basic blockis scanned to find left-hand sidesof patterns. When a left-hand side is found, its applicability is checked using the conditions attached to the replacement pattern, and if it applies, the matched instructions are replaced by those in the right-hand side. The process is then repeated to see if more left-hand sides of patterns can be found. The total result of all replacements depends on the order in which left-hand sides are identified, but as usual, finding the least-cost result is an NP-complete problem. A simple heuristic ordering is to find the first place in a left-to-right scan at which a matching lefthand side is found and then replace the longest possible match, The scanner must then back up a few instructions, to allow for the possibility that the replacement together with the preceding instructions match another left-hand side. We have already met a technique that will do multiple pattern matching efficiently~ choosethelongestmatch,andavoidbackingup -usian FSA; andthatiswhat peepholeoptimizersdo.Sincewehavealreadydiscusseveralpatternmatching rithms, we will describe this one only briefly here. The dotted items involved in the matching operation consistof the pattern instruction lists of the replacement patterns, without the attached para~eters;the dot may be positioned between two pattern instructions or at the end. We denote an item by P ,.. with P, for the i-th instruction in the pattern, and the input Iby I ... IN. The set o between the two input instructionsI, and I,+l contains all dotted items P ...Pk which P ...Pk matches ...I,. To movethissetovertheinstruction the items for which Pk+i matches I,+l, andweaddallnewitems PI
~ a t c h e sIncl. When we find an item with the dot at the end, we have found a matching p a ~ t and e ~ only then are we going to check the condition attached to it. If more than one ncluding conditions, we choose the longest. replaced the pattern instructions by the replacement instructions, we can start our scan at the first replacing instruction, since the item set just before it s u ~ a r i z e s allpartly match in^ patterns at thatpoint. No backingupoverprevious inst~ctionsis required.
optimizer is in~erselyproportional to the quality of the code he i ~ ~ o ~ aofn ac peephole e lded by the codegene~ation phase. A good code generatorrequires little peephole tion, but a naive code generator can benefit greatly m a good peephole optime compilerwriters(DavidsonandWhalley, 1989; vidsonandFraser, 1984.b) quality compilers from naivecode generation followed by aggressive peephole optimi~ation.
heresult of theabove compi~ation efforts is thatoursourceprogramhasbeen tralls~or~ed into a linearized list of target machine inst~ctionsin some symbolic format, usual re~resentationis an m a y or a linked list of records, each describing a machine i ~ s t ~ c t i oinna f o ~ a that t was decided by the compiler writer; this fonnat has nothing to do with the actual bit patterns of the real machine instructions. The purpose of compilation in an executabl~object file with se~anticscorrespondi~gto that of the ch an object file contains the bit patterns of the machine instmctio~s ut of the code generation process, embedded inbinary-encod~dinfor~ a t i othat ~ is partly progra~-dependentandpartly o~erating-system-dependent. For ailers are OS-dependent, in~or~lation about calls to library rout, and the format in which this i n f o ~ a t i o nis specified is again t machine code generationis the conversion of the machine code in at into an object file. Since code selection, register allocation, and in principle. already been done, this conversionis s~aightfor~ard is a lot of work, and since it involves specifying hundredsof bit paterns, ~ ~ o r ~ p r work o n e at that. In short, it should be avoided; fo~unatelythat is easy to do,
a programthat accepts lists of most all systemsfeatureatleastoneassembler, chine code inst~ctionsand su~ounding info~ation in character code format and genthem. These ~ u ~ a n - r e ~ d alists b l e of symbolic chine instmce machine instructions we have seen above were in some by generating assembly code as the last stage of our code ration process we can avoid writing the target machine code generation part of the ork of the people who wrote the assembler. In addition to nvolved in the c o n s ~ c t i o nof our compiler we also gain a useful interface for checkinganddebuggingthegeneratedcode:itsoutputinreadable
assembly code. It is truethatwriti e assemblyoutputto file andcallin the job slowsdown th ilationprocess,butthe costs areoften soft~~e~engineering benefits. Even if no assembler is available, as experimental machine, it is probably worth while to first write it asthe final step in thecompilationprocess.ing so partitionsthe interface usefulin const~ctingthe compiler,yieldsan asse~~ler, d which can also be used for other compilers. piler is available on the t attractive to take this idea a step further, by chan ss: ratherthangeneratin i ~ t e ~ e d icode a t ~from theannotated ++ code from it, which all optimization and target machine code ge C and C++ here, since these are probabl andthemostwidelyavailablecornpi s name as the ~latform-independentasse~bler Code generation into a higher-level langu attractive for compilers for non-i~perativelang logical, distri~uted,and special-purpose languages produce the approach can also be useful for imperative and object first C++ compilers produced C code and even a heavily for C itself could generateC code in whichall chec~ing cit. In each of these situations the savi enormous. On the down side, using C a factor of two slower than those that may be up to machine code. in (1988) gives a case study of C as a target l language SAIL, and Tarditi, Lee and Acharya trans la tin^ ~tandardML. If, for some reason, the compiler should do its own object file techniques can be applied as those used in an assembler. The constru discussed below.
An assembler, like a compiler, is a converter from source code to target code, so many of the usual compiler construction techniques are applicable in assembler construction; these include lexical analysis, symbol table management, and backpatching. There are differences too, though, resulting from the relative simplicity of the source format and the relative complexity of the target format. Assemblers are best understood by looking first at the final situation: an assembled program is still several steps away from the situation in which it is executing. We will start from this situation and work backwards to the input and the tasks of an assembler.

A running program consists of four components: a code segment, a stack segment, a data segment, and a set of registers. The contents of the code segment derive from the source code and are usually immutable; in modern systems they are often extendible, to allow dynamic linking. The contents of the stack segment are mutable and start off empty. The contents of the data segment are also mutable and are prefilled from the literals in the source program. The contents of the registers usually start off uninitialized or zeroed. The code and the data relate to each other through addresses of locations in these segments. These addresses are stored in the machine instructions and in the data segment. Most operating systems will set the memory management unit of the machine in such a way that the address spaces of the code and data segments start at zero for each running program, regardless of where these segments are located in real memory.

A running program is initiated by loading the contents of an executable code file into memory. The loader is usually an integrated part of the operating system, hardly visible, and its activation is implicit in calling a program, but one should not forget that it is there. As part of the operating system, it has special privileges. All initialized parts of the program derive from the executable code file, in which all addresses should be based on segments starting at zero. The loader reads these segments from the executable code file, copies them to suitable places in memory, allocates a stack segment, and jumps to a predetermined location in the code segment, to start the program. So the executable code file must contain a code segment and a data segment; it may also contain other indications, for example the initial stack size and the execution start address.

The executable code file derives from combining one or more program object files and probably some library object files, and is constructed by a linker. The linker is a normal user program, without any privileges. All operating systems provide at least one linker, and most traditional compilers use this standard linker, but an increasing number of compiling systems come with their own linker. The reason is that a specialized linker can check that the proper versions of the object files are being combined, which the standard linker, usually designed for more traditional languages, cannot.

Each object file carries its own code and data segment contents, and it is the task of the linker to combine these into the one code segment and one data segment of the executable code file. The linker does this in the obvious way, by making copies of the segments, concatenating them, and writing them to the executable code file, but there are two complications here. (Needless to say, the object file generator and the linker have to agree on the format of the object files.)

The first complication concerns the addresses inside the code and data segments. The code and data in the object files relate to each other through addresses, as those in the executable code file do, but since the object files were created without knowledge of where they will be linked into an executable, the addressing of each segment of each object file starts at zero. Consequently, the segments of all object files except the first one have to be relocated when segments from different object files are linked together. Suppose, for example, that the length of the code segment in the first object file a.o is 1000 bytes. Then the second code segment, from the object file b.o, will start at the location with machine address 1000. All its internal addresses were originally
computed with 0 as start address, however, so all its internal addresses will now have to be increased by 1000. To do this, the linker must know which positions in the object segments contain addresses, and whether the addresses refer to the code segment or to the data segment. This information is called relocation information. There are basically two formats in which relocation information can be provided in an object file: in the form of bit maps, in which some bits correspond to each position in the object code and data segments at which an address may be located, and in the form of a linked list. Bit maps are more usual for this purpose.

The second complication is that code and data segments in object files may contain addresses of locations in other program object files or in library object files. A location L in an object file whose address can be used in other object files is marked with an external symbol; the external symbol looks like an identifier. Other object files can refer to L by using an external reference. Object files contain information about the external symbols they refer to and the external symbols for which they provide entry points. This information is stored in an external symbol table. For example, if an object file a.o contains a call to the routine printf at location 500, the file contains the explicit information in the external symbol table that it refers to the external symbol _printf at location 500. And if the library object file printf.o has the body of printf starting at location 100, that file contains the explicit information in the external symbol table that it features the external entry point _printf at address 100. It is the task of the linker to combine these two pieces of information and to update the address at location 500 in the copy of the code segment of file a.o to the address of location 100 in the copy of printf.o, once the position of this copy with respect to the other copies has been established. The prepended underscore in the external name _printf is a convention to mark external symbols.

The linking process for three code segments is depicted in Figure 4.91; the segments derive from the object files a.o, b.o, and printf.o mentioned above. The length of the code segment of b.o is assumed to be 3000 bytes and that of printf.o 500 bytes. The code segment of b.o contains three internal addresses, which refer to locations 1600, 250, and 400, relative to the beginning of the segment; this is indicated in the diagram by having relocation bit maps along the code and data segments, in which the bits corresponding to locations 1600, 250, and 400 are marked with a C for 'Code'. Segments usually contain a high percentage of internal addresses, much higher than shown in the diagram, which is why relocation bit maps are more efficient than linked lists. The code segment of a.o contains one external address, that of the external symbol _printf as described above. The code segment of printf.o contains one external entry point, the location of _printf. The code segments of a.o and printf.o will probably also contain many internal addresses, but these have been ignored here.

The linking process first concatenates the segments. It then updates the internal addresses in the copies of a.o, b.o, and printf.o by adding the positions of those segments to them; it finds the positions of the addresses by scanning the relocation maps, which also indicate whether the address refers to the code segment or the data segment.
Finally it stores the external address of _printf, which computes to 4100 (= 1000 + 3000 + 100), at location 500, as shown.
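The relocation of internal addresses can be sketched in C as follows. The object-file layout, the field names, and the choice of one relocation entry per 4-byte word are assumptions made for this sketch; it also ignores the code/data distinction that the bit map in the text records.

    #include <stdint.h>
    #include <stddef.h>

    /* One object file's code segment plus its relocation map. */
    struct segment {
        uint8_t *code;         /* segment contents                         */
        size_t   size;         /* in bytes                                 */
        uint8_t *relocation;   /* relocation[i] != 0: the 4-byte word at   */
                               /* offset 4*i contains an internal address  */
    };

    /* Add 'base', the position of this segment in the executable, to
       every internal address marked in the relocation map.              */
    static void relocate(struct segment *seg, uint32_t base) {
        for (size_t i = 0; 4*i + 4 <= seg->size; i++) {
            if (seg->relocation[i]) {
                uint32_t addr = (uint32_t)seg->code[4*i]
                              | (uint32_t)seg->code[4*i + 1] << 8
                              | (uint32_t)seg->code[4*i + 2] << 16
                              | (uint32_t)seg->code[4*i + 3] << 24;
                addr += base;                      /* relocate the address */
                seg->code[4*i]     = (uint8_t)addr;
                seg->code[4*i + 1] = (uint8_t)(addr >> 8);
                seg->code[4*i + 2] = (uint8_t)(addr >> 16);
                seg->code[4*i + 3] = (uint8_t)(addr >> 24);
            }
        }
    }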
[Figure 4.91 diagram: the code segments of a.o (1000 bytes, containing a reference to _printf), b.o (3000 bytes), and printf.o (500 bytes, containing the entry point _printf), their relocation bit maps, and the resulting 4500-byte executable code segment.]
Figure 4.91 Linking three code segments.

We see that an object file will contain at least four components: the code segment, the data segment, the relocation bit map, and the external symbol table. We are now in a position to discuss issues in the construction of assemblers and linkers. We will not go into the construction of loaders, since they hardly require any special techniques and are almost universally supplied with the operating system.
An assembler converts from symbolic machine code to binary machine code, and from symbolic data to binary data. In principle the conversion is one to one; for example the assembler instruction

    addl    %edx,%ecx
which does a 32-bit addition of the contents of the %edx register to the %ecx register, is converted to the binary data
    00000001 11010001 (binary)  =  01 D1 (hexadecimal)
The byte 00000001 is the operation code of the operation addl, the next two bits 11 mark the instruction as register-to-register, and the trailing two groups of three bits, 010 and 001, are the translations of %edx and %ecx. It is more usual to show the translation in hexadecimal; as shown above, the instruction is 01D1. Such translations can be looked up in tables built into the assembler. As in programming languages, there are some minor complications due to the overloading of instruction names, which have to be resolved by considering the types of the operands. The translated instructions are packed closely, with no-op instructions inserted where alignment requirements would leave gaps. A no-op is an instruction that does nothing (except perhaps waste time).

The conversion of symbolic data to binary data involves converting, for example, the two-byte integer 666 to hexadecimal 9A 02 (again on an 80x86, which is a little-endian machine), a double-length (8-byte) floating point number to its 8-byte bit pattern, and the two-byte string "PC" to hexadecimal 50 43. Note that a string in assembly code is not extended with a null byte; the null byte is a C convention, and language-specific conventions have no place in assembly code. The C string "PC\0" must be translated by the code generator to an explicit three-byte string in the assembly code; the assembler will then translate this to hexadecimal 50 43 00.

The main problems in constructing an assembler lie in the handling of internal addresses, addresses that refer to locations in the same segment, and external addresses, those that refer to locations in segments in other object files. We will now discuss these two issues.
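As a small aside, the table-driven encoding just shown can be illustrated with a C function; it covers only the register-to-register form of addl, and the register numbering follows the usual IA-32 convention (%eax = 0, %ecx = 1, %edx = 2, ...).

    #include <stdio.h>
    #include <stdint.h>

    /* Encode 'addl %src,%dst' (dst := dst + src) as the two bytes
       0x01 followed by the ModRM byte 11 src dst (binary).        */
    static void encode_addl_reg_reg(int src, int dst, uint8_t out[2]) {
        out[0] = 0x01;                                /* operation code          */
        out[1] = (uint8_t)(0xC0 | (src << 3) | dst);  /* mod=11, reg=src, rm=dst */
    }

    int main(void) {
        enum { EAX, ECX, EDX };
        uint8_t bytes[2];
        encode_addl_reg_reg(EDX, ECX, bytes);         /* addl %edx,%ecx */
        printf("%02X %02X\n", bytes[0], bytes[1]);    /* prints: 01 D1  */
        return 0;
    }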
Figure 4.92 shows an assembly code fragment with internal symbols: its data segment defines a data item labeled varl, and its code segment uses varl in a memory-reading instruction and contains a forward jump to the definition of the label labell. The assembler reads the assembly code and assembles the code and data segments into two different arrays, the code array and the data array. While translating the data segment, it translates the source material into binary data in the data array and records the addresses of the labeled locations; in particular, it records the value of the label varl, which must later be used in a memory-reading instruction assembled into the code array.
    .data
            ...
            .align 8
    varl:   .long 666
    .code
            ...
            addl varl,%eax
            ...
            jmp labell
            ...
    labell: ...
Figure 4.92 Assembly code fragment with internal symbols.
While translating the code segment, the assembler encounters the instruction addl varl,%eax, for which it assembles the proper binary pattern and the value of the data segment label varl, and it marks the location being assembled in the code segment as 'relocatable to the data segment'. When the assembler encounters the instruction jmp labell, however, it cannot complete the translation, since the value of labell is not yet known. There are two solutions to this problem: backpatching and two-scan assembly.

When using backpatching, the assembler keeps a backpatch list for each label L whose value is not yet known. This list contains the addresses A1 ... An of the locations in the code and data segments being assembled into which the value of L must be stored when that value becomes available. When an applied occurrence of L is encountered and the assembler decides that the value of L must be assembled into location Ai, the address Ai is inserted in the backpatch list for L, and the location itself is left open. The resulting arrangement is shown in Figure 4.93, which depicts the assembly code, the assembled binary code, and one backpatch list, for the label labell. When finally the defining occurrence of L is found, the address of the position it labels is determined and assigned to L as its value. Next the backpatch list is processed, and for each entry Ak the value of L is stored in the location addressed by Ak.

When using two-scan assembly, the assembler processes its input file twice; the first scan determines the values of all labels. It performs the same translation process described above, but without producing code, just keeping track of where everything will go, and it records the values of the defining occurrences of all labels. For an applied occurrence of a label L the first scan does not need the value of L, since that value derives only from the position of its defining occurrence. During the second scan, the values of all labels are known and the actual translation can take place without problems.
[Figure 4.93 diagram: three columns showing the assembly code, the assembled binary, and the backpatch list for labell; the list links the locations of the three jmp labell instructions whose target address is still open.]
Figure 4.93 A backpatch list for labels.
Some additional complications may occur if the assembler language supports features like macro processing, multiple segments, labels in expressions, etc., but these are mostly of an administrative nature.
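A minimal sketch of the backpatching technique in C follows; the fixed-size patch list and the 4-byte little-endian address format are simplifying assumptions made for this sketch.

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_PATCHES 64

    /* Backpatch administration for one label. */
    struct label {
        int      defined;                /* defining occurrence seen?             */
        uint32_t value;                  /* address of the labeled position       */
        size_t   n_patches;
        size_t   patch_at[MAX_PATCHES];  /* locations still waiting for the value */
    };

    static void put32(uint8_t *code, size_t at, uint32_t v) {
        code[at]     = (uint8_t)v;          code[at + 1] = (uint8_t)(v >> 8);
        code[at + 2] = (uint8_t)(v >> 16);  code[at + 3] = (uint8_t)(v >> 24);
    }

    /* Applied occurrence: store the value if known, otherwise register a patch. */
    static void use_label(struct label *l, uint8_t *code, size_t at) {
        if (l->defined) put32(code, at, l->value);
        else l->patch_at[l->n_patches++] = at;    /* leave the location open */
    }

    /* Defining occurrence: fix the value and process the backpatch list. */
    static void define_label(struct label *l, uint8_t *code, uint32_t value) {
        l->defined = 1;
        l->value = value;
        for (size_t i = 0; i < l->n_patches; i++)
            put32(code, l->patch_at[i], value);
        l->n_patches = 0;
    }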
The external symbol and address information of an object file is summarized in its external symbol table, an example of which is shown in Figure 4.94. The table specifies, among other things, that the data segment has an entry point named _options at location 50, the code segment has an entry point named _main at location 100, the code segment refers to an external entry point _printf at location 500, etc. Also there is a reference to an external entry point named _file_list at location 4 in the data segment. Note that the meaning of the numbers in the address column is completely different for entry points and references: for entry points, the number is the value of the entry point symbol; for references, the number is the address where the value of the referred entry point must be stored. The external symbol table can be constructed easily while the rest of the translation is being done. The assembler then produces a binary version of it and places it in the proper position in the object file, together with the code and data segments, the relocation bit maps, and possibly further header and trailer material.
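The table of Figure 4.94 could be represented as an array of records, roughly as in the following C sketch; the type and field names are invented for the illustration.

    #include <stdint.h>

    enum ext_type    { ENTRY_POINT, REFERENCE };
    enum ext_segment { SEG_CODE, SEG_DATA };

    /* One entry of the external symbol table.  For an entry point, 'address'
       is the value of the symbol; for a reference, it is the location where
       the value of the referred entry point must be stored.                  */
    struct ext_symbol {
        const char      *name;       /* for example "_printf" */
        enum ext_type    type;
        enum ext_segment segment;
        uint32_t         address;
    };

    static const struct ext_symbol example[] = {
        { "_options", ENTRY_POINT, SEG_DATA,  50 },
        { "_main",    ENTRY_POINT, SEG_CODE, 100 },
        { "_printf",  REFERENCE,   SEG_CODE, 500 },
    };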
The basic operation of a linker is simple: it reads each object file and appends each of the four components to the proper one of four lists.
    External symbol     Type          Address
    _options            entry point     50   data
    _main               entry point    100   code
    _printf             reference      500   code
    _atoi               reference      600   code
    _printf             reference      650   code
    _exit               reference      700   code
    _msg_list           entry point    300   data
    _Out_Of_Memory      entry point    800   code
    _fprintf            reference      900   code
    _exit               reference      950   code
    _file_list          reference        4   data

Figure 4.94 Example of an external symbol table.
This yields one code segment, one data segment, one relocation bit map, and one external symbol table, each consisting of the concatenation of the corresponding components of the object files. In addition the linker retains information about the lengths and positions of the various components. It is now straightforward to do the relocation of the internal addresses and the linking of the external addresses; this resolves all addresses. The linker then writes the code and data segments to the executable code file; optionally it can append the external symbol table, which can be useful for debugging the translated program. This finishes the translation process we started in the first line of Chapter 2!

Real-world linkers are often more complicated than described above, and constructing one is not a particularly simple task. There are several reasons for this. One is that the actual situation around object modules is much hairier than shown here: many object file formats have features for repeated initialized data, special arithmetic operations on relocatable addresses, conditional external symbol resolution, etc. Another is that linkers often have to wade through large libraries to find the required external entry points, and advanced symbol table techniques are used to speed up the process. A third is that users tend to think that linking, like garbage collection, should not take time, so there is pressure on the linker writer to produce a blindingly fast linker.

One obvious source of inefficiency is the processing of the external symbol table. For each entry point in it, the entire table must be scanned to find entries with the same symbol, which can then be processed. This leads to a process that requires a time O(n²), where n is the number of entries in the combined external symbol table. Scanning the symbol table for each symbol can be avoided by sorting it first; this brings all entries concerning the same symbol together, so they can be processed efficiently.
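The sorting idea can be sketched in C with the standard qsort() routine; the record type is a simplified version of the one sketched above, and the actual combination of entry points and references is omitted.

    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>

    struct ext_symbol { const char *name; int type; uint32_t address; };

    static int compare_by_name(const void *a, const void *b) {
        const struct ext_symbol *sa = a, *sb = b;
        return strcmp(sa->name, sb->name);
    }

    /* Sort the combined external symbol table by name; all entries for one
       symbol are then adjacent and one linear scan suffices, so resolution
       costs O(n log n) rather than O(n^2).                                 */
    static void resolve_external_symbols(struct ext_symbol *table, size_t n) {
        qsort(table, n, sizeof table[0], compare_by_name);
        for (size_t i = 0; i < n; ) {
            size_t j = i;
            while (j < n && strcmp(table[j].name, table[i].name) == 0) j++;
            /* table[i..j-1] holds the entry point(s) for and the references
               to one symbol; store the entry point address in each reference
               here (omitted).                                               */
            i = j;
        }
    }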
This concludes our discussion of the last step in compiler construction, the transformation of the fully annotated AST to an executable binary file. The basic process is that of tree rewriting: nodes or sets of nodes are replaced by nodes or sets of nodes that embody the same semantics and that are closer to the hardware. The most powerful tree rewriting method, BURS, is based on a pattern matching technique that is a tree extension of the technique used for lexical analysis in the front-end of a compiler. In an extremely high-level view of compiler construction, one can say that textual analysis is done by pattern matching, context handling by a data-flow machine, and object code synthesis (code generation) again by pattern matching. Many of the algorithms used both in compiler building and in compilation can conveniently be expressed as closure algorithms.
-The annotated AST produced by context handling is converted to intermediate code in a paradigm- and language-specific process. Next, the intermediate code is converted into symbolic machine instructions by a paradigm-independent, language-independent, and largely machine-independent process: code generation. The symbolic machine instructions are then converted to some suitable low-level code: C code, assembly code, or machine code.
-Another option is to interpret the intermediate code.
-Intermediate code usually consists of expressions, routine administration and calls, and jumps; it may include special-purpose language-specific operations, which can be inlined or hidden in a library routine.
-An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language. An interpreter performs essentially the same actions as the CPU of the computer, except that it works on AST nodes rather than on machine instructions.
-Interpreters come in two varieties: recursive and iterative. A recursive interpreter has an interpreting routine for each node type in the AST; it follows the control flow graph. An iterative interpreter consists of a flat loop over a case statement which contains a code segment for each node type; it keeps an active-node pointer similar to the instruction pointer of a CPU.
-The routine in a recursive interpreter for the non-terminal N performs the semantics of the non-terminal N. It normally follows the control flow graph, except when a status indicator indicates otherwise.
-Unless the source language specifies the data allocation explicitly, run-time data in a
recursive interpreter is usually kept in an extensive symbol table. This allows ample debugging information to be kept.
-A recursive interpreter can be written relatively quickly, and is useful for rapid prototyping; it is not the architecture of choice for a heavy-duty interpreter.
-The run-time data in an iterative interpreter are kept in arrays that represent the global data area and the activation records of the routines, in a form that is close to that of a compiled program.
-Additional information about the run-time data in an iterative interpreter can be kept in shadow arrays that parallel the data arrays. These shadow arrays can be of assistance in detecting the use of uninitialized data, the improper use of data, alignment errors, attempts to overwrite protected or system area data, etc.
-Using an iterative interpreter can be expected to be between 30 and 100 times slower than running a compiled program, but an interpreter optimized for speed can reduce the loss to perhaps a factor of 10.
-The basis of code generation is the systematic replacement of nodes and subtrees of the AST by target code segments, in such a way that the semantics is preserved. It is followed by a linearization phase, which produces a linear sequence of instructions from the rewritten AST.
-The replacement process is called tree rewriting. The linearization is controlled by the data-flow and flow-of-control requirements of the target code segments.
-The three main issues in code generation are code selection, register allocation, and instruction ordering.
-Finding the optimal combination is NP-complete in the general case. There are ways to simplify the code generation problem: 1. consider only small parts of the AST at a time; 2. simplify the target machine; 3. restrict the interfaces between code segments.
-Code generation is performed in three phases: 1. preprocessing, in which some AST node patterns are replaced by other ('better') AST node patterns, using program transformations; 2. code generation proper, in which all AST node patterns are replaced by target code sequences, using tree rewriting; 3. postprocessing, in which some target code sequences are replaced by other ('better') target code sequences, using peephole optimization.
-Pre- and postprocessing may be performed repeatedly.
-The simplest way to obtain code is to generate for each node of the AST the code segment an iterative interpreter would execute for it. If the target code is C or C++, all optimizations can be left to the C or C++ compiler. This process turns an interpreter into a compiler with a minimum of investment.
-Rather than repeating a code segment many times, a single copy can be kept in a library, and routine calls to it can be generated, reducing the size of the object code considerably. This technique is called threaded code. The object size reduction may be important for embedded systems.
- An even larger reduction in object size can be achieved by numbering the library routines and storing the program as a list of these numbers. All target machine dependency is now concentrated in the library routines.
- Going in the other direction, the repeated code segments may each be partially evaluated in their contexts, leading to more efficient code.
- In simple code generation, a fixed translation to the target code is chosen for each possible node type. These translations are based on mutual interface conventions.
- Simple code generation requires local decisions only, and is therefore especially suitable for narrow compilers.
- Simple code generation for a register machine rewrites each expression node by a single machine instruction; this takes care of code selection. The interface convention is that the output register of one instruction must be used immediately as an input register of the parent instruction.
- Code for expressions on a register machine can be generated by a depth-first recursive visit; this takes care of instruction ordering. The recursive routines carry two additional parameters: the register in which the result must be delivered and the set of free registers; this takes care of register allocation.
- Since each operand that is not processed immediately ties up one register, it is advantageous to compile code first for the operand that needs the most registers. This need, called the weight of the node, or its Sethi-Ullman number, can be computed in a depth-first visit.
- When an expression needs more registers than are available, we need to spill one or more registers to memory. There is no best register spilling technique, except for exhaustive search, which is usually not feasible. So we resort to heuristics.
- In one heuristic, we isolate maximal subexpressions that can be compiled with the available registers, compile them, and store the results in temporary variables. This reduces the original tree, to which we repeat the process.
- The machine registers are divided into four groups by the compiler designer: those needed for administration purposes, those reserved for parameter transfer, those reserved for expression evaluation, and those used to store local variables. Usually, the size of each set is fixed, and some of these sets may be empty.
- Often, the set of registers reserved for local variables is smaller than the set of candidate variables. Heuristics include first come first served, register hints from the programmer, and usage counts obtained by static or dynamic profiling. A more advanced heuristic uses graph coloring.
- Compilation by symbolic interpretation is somewhat similar to symbolic interpretation. As in the latter, we keep a symbolic representation, but now we include both the stack and the registers; more importantly, this time the information in the representation must be exact. The representation is called a regvar descriptor.
- If the effect of a node can be stored exactly in the regvar descriptor, we do so. No code is generated for the node, but its semantics is preserved in the regvar descriptor: old regvar + node = new regvar.
- If the effect of a node is such that we cannot keep the information in the regvar descriptor exact, we generate code to achieve the effect, and record the result in the regvar descriptor;
the semantics is preserved in the rewriting of the node: old regvar + node = code + new regvar.
-If we have the information from live analysis available, we can delete all information about a variable from the regvar descriptor the moment we leave its live range.
-A basic block is a maximal part of the control flow graph that contains no splits (jumps) or combines (labels). A basic block starts at a label or at the beginning of the routine and ends just before a jump or jump-like node or label or the end of the routine. It contains expressions and assignments only.
-The concept of a basic block separates the concerns of code generation for straight sequences of expressions and assignments from those of flow of control. The separation is especially useful for narrow compilers, since it allows them to do optimized code generation for expression sequences.
-Code generation for basic blocks proceeds in two steps. First the control flow graph is converted into a dependency graph, which is a 'dag', a directed acyclic graph. Then we rewrite the dependency graph to code. The gain lies in the fact that the dependency graph is much less restrictive to instruction ordering than the control flow graph.
-The dependency graph of a basic block is formed by two kinds of dependencies: data dependencies inside expressions through operands, and data dependencies through variables that obtain their value in an assignment and whose value is used in an expression further on. The ultimate data dependencies are the values that are still needed after the basic block; these are called the roots of the basic block.
-Emphasizing these data dependencies and removing other flow-of-control dependencies yields a rough data dependency graph that can be simplified by shunting out the assignments and keeping only those nodes that are reachable from the roots. The graph is a dag, a directed acyclic graph.
-The dag of a basic block can be reduced further by recognizing common subexpressions. The reduction takes place by repeatedly merging nodes that have the same operands, operator, and dependencies. Traditionally, dags of basic blocks are implemented as arrays of triples.
-The nodes in the dag of a basic block are rewritten to the corresponding machine instructions. The dag is then linearized based on late evaluation of the operands.
-The specific form of late evaluation used for linearizing dags identifies ladder sequences; such ladder sequences match sequences of register-memory instructions that all have the register in common. Such sequences are very efficient. To find the linearization, first the available ladder sequences are isolated and then code is generated from them in last-to-first order, starting with a ladder on which no data from other ladders depend. The ladder is then removed from the dag and the process is repeated.
-Pointers in expressions in basic blocks can be handled by two simple rules: 1. an assignment under a pointer makes any variable used in a subsequent expression dependent on that assignment; 2. retrieving a value from under a pointer is dependent on all preceding assignments. Extended analysis may allow some of these dependencies to be canceled.
-Optimal rewriting of expression trees (but not of dags!) can be obtained through BURS code generation; BURS stands for Bottom-Up Rewriting System.
-The BURS technique allows one to decompose a given input tree of arbitrary complexity into a number of subtrees, each of which is a member of a given set of trees, the pattern trees; the pattern trees may again be of arbitrary complexity. To apply BURS to code generation, we equate the input tree with the expression AST and the pattern trees with the ASTs of the machine instructions.
-BURS operates in two scans over the input tree, one bottom-up and one top-down. The bottom-up scan annotates each node of the input tree with references to nodes in the pattern trees. The presence of a reference to a node N of a pattern tree at a node I in the input tree means that the tree with I at the top can be rewritten by rewriting its top section by the pattern subtree that has N as its top; this implies that after the rewrite of the top section, all other parts of the tree below I can also be rewritten. The top-down scan can then rewrite the entire tree.
-The bottom-up scan combines sets of fragments of pattern trees that are applicable at a certain node in the input tree much in the same way as a lexical analyzer combines sets of items of regular expressions that are applicable in a certain position in the input stream.
-As with the lexical analyzer, the speed of the BURS pattern matcher can be improved by implementing it as an FSA, a tree automaton in this case, rather than interpretatively.
-Unlike the case of the lexical analyzer, the various patterns have different costs, and we want a minimum-cost rewrite. In the interpreting implementation, cost-based decisions can be handled by using a dynamic programming technique: at each node, only the cheapest way to get the result in a given register type is retained. In the tree automaton implementation, constant costs can be incorporated into the automaton; the resulting transition tables are often huge but can be compressed considerably.
-BURS code generation is adapted relatively easily to additional requirements. Examples are code generation for machines with several types of registers and the extension of the method to flow-of-control instructions.
-Two variables which are both live at a given position in the program 'interfere' with each other where register allocation is concerned. If we know the live ranges of all variables, we can construct the register interference graph of the variables, in which each node represents a variable and each arc between two nodes N1 and N2 represents overlap of the live ranges of the variables represented by the nodes N1 and N2.
-We can find a possible assignment of registers to variables by coloring the graph such that no two nodes of the same color are connected by an arc; each color then represents a register. The optimal register assignment corresponds to a graph coloring with the lowest possible number of colors.
-The problem of optimal graph coloring is NP-complete, but good heuristics exist; for example, temporarily remove a node of minimal degree from the graph, color the remaining graph recursively using the same algorithm, then reconnect the removed node and color it.
-In supercompilation, a small but frequently used intermediate code fragment is taken and the best possible code for it is generated using exhaustive search. The resulting code is then used as a template in a compiler. Surprising code sequences have been found in this way.
-Before converting the intermediate code to target code it may be preprocessed to improve efficiency. Examples of simple preprocessing are constant folding and arithmetic simplification. Care has to be taken that arithmetic overflow conditions are translated faithfully by preprocessing, if the source language semantics requires so.
-More extensive preprocessing can be done on routines: they can be in-lined or cloned.
-In in-lining, a call to a routine is replaced by the body of the routine called. This saves the calling and return sequences and opens the way for further optimizations. Care has to be taken to preserve the semantics of the parameter transfer.
-In cloning, a copy C of a routine R is made, in which the value of a parameter P is fixed to a value V; all calls to R in which the parameter P has the value V are replaced by calls to the copy C. Often a much better translation can be produced for the copy C than for the original routine R.
-Some sub-optimal symbolic machine code sequences produced by the code generation process can be removed by peephole optimization, in which fixed parameterized sequences are replaced by other, better, fixed parameterized sequences. About a hundred replacement patterns are sufficient to take care of almost all correctable inefficiencies left by a relatively simple code generator.
-Replaceable sequences in the instruction stream are recognized using an FSA based on the replacement patterns in the peephole optimizer. The FSA recognizer identifies the longest possible sequence, as it does in a lexical analyzer. The sequence is then replaced and scanning resumes.
-Code generation yields a list of symbolic machine instructions, which is still several steps away from an executable binary program. In most compilers, these steps are delegated to the local assembler.
-The assembler translates the symbolic instructions generated for a source code module to a relocatable binary object file. The linker combines some relocatable binary files and probably some library object files into an executable binary program file. The loader loads the contents of the executable binary program file into memory and starts the execution of the program.
-The code and data segments of a relocatable object file consist of binary code derived directly from the symbolic instructions. Since some machine instructions require special alignment, it may be necessary to insert no-ops in the relocatable object code.
-Relocatable binary object files contain code segments, data segments, relocation information, and external linkage information.
-The memory addresses in a relocatable binary object file are computed as if the file were loaded at position 0 in memory. The relocation information lists the positions of the addresses that have to be updated when the file is loaded in a different position, as it usually will be.
-Obtaining the relocation information is in principle a two-scan process. The second scan can be avoided by backpatching the relocatable addresses as soon as their values are determined. The relocation information is usually implemented as a bit map.
-An external entry point marks a given location in a relocatable binary file as available from other relocatable binary files. An external entry point in one module can be
accessed by an external reference in a different, or even the same, module.
- The external linkage information is usually implemented as an array of records.
- The linker combines the code segments and the data segments of its input files, converts relative addresses to absolute addresses using the relocation and external linkage information, and links in library modules to satisfy left-over external references.
- Linking results in an executable code file, consisting of one code segment and one data segment. The relocation bit maps and external symbol tables are gone, having served their purpose. This finishes the translation process.
- In an extremely high-level view of compiler construction, one can say that textual analysis is done by pattern matching, context handling by a data-flow machine, and object code synthesis (code generation) again by pattern matching.
Books on and general discussions about interpreter design are rare, unfortunately. The most prominent examples are by Griswold and Griswold (1986), who describe an interpreter in detail, and by Klint (1981), who describes a variety of interpreter types. Much valuable information can be found in the Proceedings of the SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques (1987). The most important recent books on code generation are by Muchnick (1997) and Morgan (1998). A continuous source of information can be found in the annual SIGPLAN Conferences on Programming Language Design and Implementation. Other useful sources include descriptions of complete compilers, a survey of data-flow analysis techniques, and the description of the retargetable C compiler lcc by Fraser and Hanson. As with interpreters, reading material on assembler design is not abundant; we mention Salomon (1992) as one of the few books. Linkers and loaders have long lived in the undergrowth of compilers and operating systems; yet they are getting more important with each new programming language and more complicated with each new operating system. Levine's book (1999) is the first book in 20 years to give serious attention to them and the first ever to be dedicated exclusively to them.
Exercises

This is an exercise in memoization, which is not properly a compiler construction subject, but the exercise is still instructive. Given a directed acyclic graph G and a node N in it, design and implement an algorithm for finding the shortest distance from N to a leaf of G by recursive descent, where a leaf is a node with no outgoing arcs. Test your implementation on a large graph of the structure shown in Figure 4.95.

Extend the iterative interpreter in Figure 4.5 with code for operators.
Figure 4.95 Test graph for recursive descent marking.
Iterative interpreters are much faster than recursive interpreters, but yield less extensive run-time diagnostics. Explain. Compiled code gives even poorer error messages. Explain.

Explain how a self-extracting archive works (a self-extracting archive is a program that, when executed, extracts the contents of the archive that it represents).
Section 4.2.3.1 outlines how the threaded code of Figure 4.14 can be reduced by numbering the routines and coding the list of calls as an array of routine numbers. Show such a coding scheme and the corresponding interpreter.

Generating threaded code as discussed in Section 4.2.3.1 reduces the possibilities for partial evaluation as discussed in Section 4.2.3.2, because the switch is in the Expression_P routine. Find a way to prevent this problem.
The weight of a tree, as discussed in Section 4.2.4.2, can also be used to reduce the maximum stack height when generating code for the stack machine of Section 4.2.4.1. (a) How? (b) Give the resulting code sequence for the AST of Figure 4.23.

The subsection on machines with register-memory operations in Section 4.2.4.2 explains informally how the weight function must be revised in the presence of instructions for combining the contents of a register with that of a memory location. Give the revised version of the weight function of Figure 4.34.
The code of the C routine of Figure 4.96 corresponds to the flow graph of Figure 4.97. The weights for static profiling have been marked by the letters a to g. Set up the traffic flow equations for this flow graph, under the following assumptions. At an if-node, 70% of the traffic goes to the then-part and 30% goes to the else-part; a loop body is (re)entered 9 out of 10 times; in a switch statement, all cases get the same traffic, except the default case, which gets half.
    void Routine(void) {
        if (...) {
            while (...) {
                A;
            }
        } else {
            switch (...) {
                case ...: B; break;
                case ...: C; break;
            }
        }
    }
Figure 4.96 Routine code for static profiling.
Using the same techniques as in Exercise 4.9, draw the flow graph for the nested loop
    while (...) {
        A;
        while (...) {
            B;
        }
    }
Set up the traffic equations and solve them.

In Section 4.2.5.1, an algorithm is presented to convert the AST of a basic block to a data dependency graph. Step 3 of that algorithm inserts dependencies to reflect the fact that an assignment to a variable replaces its old value. Which other requirement does it also express?

Given the code fragment

    x := a*a + 2*a*b + b*b;
    y := a*a - 2*a*b + b*b;
draw the dependency graph before and after common subexpression elimination.

(a) Draw the dependency graph of the C expression *p++. (b) In the basic block { a = *p++; b = *p++; }, the expression *p++ is not a
common subexpression, appearances notwithstanding.
Figure 4.97 Flow graph for static profiling of Figure 4.96.
How does the common subexpression elimination technique described in Section 4.2.5.1 discover that *p++ is not a common subexpression?

In Section 1.2 we chose ASTs over lists of instructions for the intermediate code representation. Redo the demo compiler from that section with triples rather than ASTs as intermediate code.

Give the data dependency graph after the common subexpression elimination of Exercise 4.12 in triple representation.

Step 2 of the algorithm in the subsection on the linearization of the data dependency graph in Section 4.2.5.2 features two nodes, S and N. Is the algorithm still correct when S and N happen to be the same node?

Show that there is always at least one available ladder sequence when linearizing the data dependency graph.

Refer to Figure 4.49 concerning the rewriting and ordering of a ladder sequence. Given the commutativity of the + operator, it is tempting to add the value of b
directly to I1 without loading it into register R1 first, thus saving one register and one instruction. Explain why this is not allowed.

Use the ladder-sequence algorithm to generate code for the dependency graph of Figure …; show the modified dependency graph after each coded ladder sequence.

The ladder-sequence algorithm on page 329 always uses R1 as the ladder register. Better code can be obtained by using an 'appropriate' register and register-to-register transfers. Discuss ways to choose an 'appropriate' register.

Given a machine with 3 machine instructions:
where Mem denotes the contents of a memory location. (a) Show the sets that the BURS code generator builds at the nodes of the input tree corresponding to the expression a + b, and explain why it does so. (b) Show the tree or trees that result from the rewriting process.

Refer to Section 4.2.6.3 on instruction selection by dynamic programming. Suppose we indicate the associativity of operators to the code generator generator, in addition to commutativity. Would this be useful? Would it cause the code generator generator to add more patterns? Which ones?

Redesign the lexical analysis algorithm to yield the largest number of tokens for a given input string, rather than the longest matches, using dynamic programming in a way similar to the tree pattern matching algorithm. Why is it not good enough to just yield the shortest token all the time?

Redesign the lexical analysis algorithm to yield the smallest number of matches, rather than the longest matches.
The dynamic programming algorithm sketched in Section 4.2.6.3 assigns cost zero to the label mem, since the label means 'value already resides in memory' rather than 'value can be brought to memory at cost C'. Add an instruction Store_Reg R,x with cost 3 units to the instruction set of Figure 4.60. This instruction allows any operand that resides in a register to be yielded in memory, at a price. Also, in addition to the cost of a rewrite in a label, we can record the maximum number of registers used. Redo the computations that led to Figure 4.71, while including the possibility of storing operands in memory and recording the maximum number of registers used. Use the resulting tree to derive an instruction sequence when only one register is available. What happens when two registers are available?
Given the register interference graph for the five variables a, b, c, d, and e:
[register interference graph diagram over the nodes a, b, c, d, and e]
show the steps the graph coloring algorithm goes through when coloring this graph, and produce the resulting register allocation.

Given the code sequence
    {
        int tmp_2ab = 2*a*b;
        int tmp_aa = a*a;
        int tmp_bb = b*b;
        x = tmp_aa + tmp_2ab + tmp_bb;
        y = tmp_aa - tmp_2ab + tmp_bb;
    }
and given that a and b are live on entry and dead on exit, and that x and y are live on exit: (a) Construct the register interference graph. (b) Color the graph. How many registers do we need?

Refer to Section 4.2.8 on supercompilation. Suppose we have a modest set of 20 machine instructions, each operating on 2 registers, and 3 machine registers. We want to find a pattern for a given function by trying all combinations of at most N instructions. Under the assumptions that trying a single combination takes ten microseconds, that all but one or two combinations are rejected, and that rejecting takes place on average after two tests, what would be a reasonable value of N for the search to take a weekend?

On some processors, multiplication is extremely expensive, and it is worthwhile to replace all multiplications with a constant by a combination of left-shifts, additions, and/or subtractions. Assume that our register machine of Figure 4.22 has an additional instruction:
    Instruction              Actions
    Shift_Left_Reg c,Rn      Rn := Rn << c

    Add_Constant c1 …
Show how the FSA recognizer and replacer described in Section 4.2.12.2 completely removes the instruction sequence Add_Constant 1; Add_Constant 2; Add_Constant -3. Show all states of the recognizer during the transformation.

Learn to use the local assembler, for example by writing, assembling, and running a program that prints the tables of multiplication from 1 to 10.

Many processors have PC-relative addressing modes and/or instructions. They may, for example, have a jump instruction that adds a constant to the program counter. What is the advantage of such instructions and addressing modes?

Many processors have conditional jump instructions only for conditional jumps with a limited range. For example, the target of the jump may not be further than 128 bytes away from the current program counter. Sometimes, an assembler for such a processor still allows unlimited conditional jumps. How can such an unlimited conditional jump be implemented?

Find and study documentation on the object file format of a compiler system that you use regularly. In particular, read the sections on the symbol table format and the relocation information.

History of code generation: Study Anderson's two-page paper (1964), which introduces a rudimentary form of bottom-up tree-rewriting for code generation, and identify and summarize the techniques used. Hint: the summary will be much longer than the paper.
All compilers and many run-time systems use dynamically sized data. The size of such data is not known in advance and room for it must be found at run time. Examples inside the compiler are symbol tables, strings from the source program, ASTs, register interference graphs for graph coloring, and many others. The examples in run-time systems derive from the nature of the source language: strings, dynamically sized arrays in imperative languages, closures in functional languages, tentative unifications in logic languages, and incoming messages in distributed languages are a few that come to mind.

There was a time when compiler writers could afford to allocate fixed amounts of memory for such data, for example 32 bytes for an identifier, and give an error message or even truncate the data, with or without warning, when the limit was exceeded. With the sometimes awkward memory structures and the limited memory management techniques of those days, such restrictions helped to simplify compiler writing. Also, available memories were so small that nobody expected to be able to use long identifiers or to compile large programs. With present memory sizes and better programming languages, techniques, and algorithms, both arguments have become void, and dynamic memory management is an integral part of both compilers and run-time systems.

When a program is started, most operating systems allocate at least the following three memory segments for it:
-the ‘code segment’, which contains the program code and which
is usually read-only or execute-only;it is addressedbytheprogramcounterand is nexttoinvisible to the program code itself; -the ‘stack segment’, which contains the stack and which may come with an overflow and underflow detection mechanism; itis addressed by one or more stack pointers, which are most of the time manipulated automatically by machine inst~ctions; -the ‘data segment’, which is a single contiguous stretch of memory locations, totally at the disposition of the program for the purpose of storing data; its start location and size
are accessible to the program in some way and its contents are addressable by machine instructions; its use is up to the programmer. When the emphasis is on memory management, the data segment is also called the heap.

Some operating systems allow more complicated and sometimes baroque memory architectures, but we will restrict ourselves to the simple case. Our main interest is in the data segment; we will use it for all our memory management. Usually there is an operating system command or system call to resize the data segment; if the segment has to be moved as a result of this, the operating system makes sure this move is transparent to the program.

It is the task of memory management to hand out and collect subsegments of the data segment such that all subsegments fit inside the data segment and no memory location is ever part of more than one subsegment. These subsegments are usually called 'blocks' or 'chunks'; for a more precise definition of blocks and chunks see Section 5.1.1. Additional important objectives are that a request for memory should not be refused when there is still a possibility to fulfill it and that the amortized cost of obtaining and freeing memory should be constant, or almost constant. See below for an explanation of the notion of 'amortized cost'. Memory allocation methods in the 1960s and '70s used to require a time proportional to the number of blocks handed out; this resulted in quadratic time requirements and limited the methods to some several thousands of blocks. This was no problem with the limited computer memories in those days, but it is unacceptable now, and more efficient methods are required.
The amortized cost of a step in a sequence of steps is the average cost of one step, averaged over a long enough period. If, for example, each step normally costs 1 unit but one in five costs 10 units, the amortized cost is 2.8 units, since that is the average per step in the long run. This way of measuring costs is useful for situations in which we cannot prevent occasional steps from being much more expensive than usual. This is the case in algorithms that require occasional reorganization of their data; it also applies to buying a house.
Allocating memory in the heap is easy enough: we keep a pointer to the first free location in the heap, allocate the requested block from there, and bump the pointer to the next free location. The problem with this scheme is that sooner or later we run out of heap space; if memory blocks are requested continually and no blocks are released, even the largest memory will fill up eventually. If we decide we want blocks to be released, a new question arises: who is responsible for the release? The naive answer is of course 'the programmer', but that answer underestimates the complexity of the task: programs and run-time systems often allocate data which have unpredictable lifetimes and of whose existence the programmer is not even aware. Efforts to solve these problems have resulted in techniques for automatically releasing unused data.
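A minimal sketch of this 'bump the pointer' scheme in C, assuming a fixed-size heap array and ignoring alignment; as noted above, it has no way of releasing blocks.

    #include <stddef.h>

    #define HEAP_SIZE (1 << 20)

    static unsigned char heap[HEAP_SIZE];
    static size_t first_free = 0;          /* index of the first free location */

    /* Allocate 'size' bytes by bumping the pointer; returns NULL when the
       heap is exhausted.                                                   */
    static void *bump_allocate(size_t size) {
        if (size > HEAP_SIZE - first_free) return NULL;
        void *block = &heap[first_free];
        first_free += size;
        return block;
    }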
In the following sections we will discuss techniques for data allocation with explicit deallocation by the programmer, and for data allocation with implicit deallocation. Implicit deallocation is usually called 'garbage collection'. Generally speaking, most compilers deallocate their internal data explicitly (or not at all), and most run-time systems leave the deallocation of user-allocated data to the user or the garbage collector, and that of their own data to the garbage collector. More in particular, almost all run-time systems for programs in the functional, logical, and more advanced paradigms feature garbage collectors. Garbage collectors are also used in many applications outside compiler construction; examples are text editors, word processors, photo editors, and web browsers. We will therefore discuss the techniques with explicit deallocation in a general and compiler setting and those with implicit deallocation in a run-time system setting.

Roadmap
5 Memory management
5.1 Data allocation with explicit deallocation
    5.1.1 Basic memory allocation
    5.1.2 Linked lists
    5.1.3 Extensible arrays
5.2 Data allocation with implicit deallocation
    5.2.1 Basic garbage collection algorithms
    5.2.2 Preparing the ground
    5.2.3 Reference counting
    5.2.4 Mark and scan
    5.2.5 Two-space copying
    5.2.6 Compaction
    5.2.7 Generational garbage collection
5.1 Data allocation with explicit deallocation
In most systems, basic memory allocation comes in the form of a routine that finds a block of unused memory of the requested size, marks it as used, and returns a pointer to the block. If no such block is available, the result varies: a null pointer may be returned, an error routine may be called, or the program may be aborted. The requested size is given as a parameter to the routine; the marking prevents the block from being handed out more than once. A second routine can be used to return the block to the system. This routine simply marks the block as not in use. The implementation of this marking varies; a bit map with one marking bit for each byte suggests itself, but other implementations are possible and perhaps equally usual. This basic mechanism is available in C as the routines void *malloc(size_t size) and free(void *ptr). The void * construction is a trick to fake polymorphic pointers in C, and need not worry us here. We will discuss the workings of malloc() and free() below.
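A small usage example of the malloc()/free() interface described above; reporting an error on a null result is one of the possible reactions mentioned in the text.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* request room for a 100-character identifier */
        char *identifier = malloc(100);
        if (identifier == NULL) {               /* the null-pointer reaction */
            fprintf(stderr, "out of memory\n");
            return 1;
        }
        strcpy(identifier, "a_rather_long_identifier");
        printf("%s\n", identifier);
        free(identifier);                       /* return the block explicitly */
        return 0;
    }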
Although compilers are essentially just normal programs, they have enough in common to make it worthwhile to consider the allocation and deallocation problems of two classes of data types that are in great demand in compilers. These data types are linked lists and extensible arrays; both are used to implement sorted or unsorted sets or bags. It would be convenient if our implementation language had type constructors for them built in, but many traditional implementation languages do not. A correct and efficient implementation of these data structures is important for the correctness and the efficiency of the compiler, which is why we pay attention to them here. It is true that some modern languages (for example Java) have automatic data allocation and deallocation, but many compilers are still written in a more traditional language, and experience has shown that it is very advantageous to handle allocation and deallocation in these languages properly and systematically.
A memory allocation request of N bytes supplies the user with a pointer to a block of N free bytes. The memory allocation process requires the block to be embedded in a more complicated data structure, since some additional administration is associated with it; we use the term 'chunk' for a block together with its administration. Pointers to blocks occur in many places, including program variables, machine registers, and other blocks; chunk pointers occur only in chunk administrations, if at all. The offset between chunk pointer and block pointer needs attention on machines with alignment requirements. Such a requirement says that the address of a data item must be a multiple of 2^N, with N depending on the data type and the machine; on many machines, for example, integers can only be stored in and retrieved from addresses divisible by 4.
Storing any data type in a block then requires the block pointer to be a multiple of the least common multiple of all alignment requirements; in practice this means that the numerical value of the block pointer must be a multiple of 8 or 16, and chunk addresses must be aligned correspondingly.

We will now consider the relation between chunks and blocks. When the size of a chunk is known, one can find its end and thus the start of the next chunk. Pointers that reside inside a block are under user control and will point to the start of blocks; pointers inside the allocation administration, if any, are under control of the allocator and point to chunks. A possible layout of chunks is shown in Figure 5.1. In addition to the chunk size we need a few bits in each chunk for administration purposes; one of these bits is a free bit, which indicates whether the chunk is free. These bits can be stored in the size field. On machines with alignment restrictions, the size field must be aligned on an integer-alignment boundary; if we allocate 4 bytes for the field, it can probably include the administration bits. For the block pointer to come out on a 16-, 32-, or 64-byte boundary, the chunk must then be allocated on a 16-, 32-, or 64-byte boundary minus 4.
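One possible C rendering of this chunk layout, with the free bit packed into the least significant bit of a 4-byte size field; the exact packing is an assumption made for this sketch and requires chunk sizes to be even, which the alignment rules above guarantee.

    #include <stdint.h>
    #include <stddef.h>

    /* Chunk administration: the chunk size (administration included) with
       the free bit packed into its least significant bit; the block handed
       to the user starts right after the administration.                   */
    struct chunk_header {
        uint32_t size_and_free;
    };

    #define ADMINISTRATION_SIZE sizeof(struct chunk_header)

    static size_t chunk_size(const struct chunk_header *c) {
        return c->size_and_free & ~(uint32_t)1;
    }
    static int chunk_is_free(const struct chunk_header *c) {
        return (int)(c->size_and_free & 1);
    }
    /* The next chunk starts where this one ends. */
    static struct chunk_header *next_chunk(struct chunk_header *c) {
        return (struct chunk_header *)((char *)c + chunk_size(c));
    }
    /* The block pointer corresponding to a chunk pointer. */
    static void *block_of_chunk(struct chunk_header *c) {
        return (char *)c + ADMINISTRATION_SIZE;
    }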
Figure 5.1 Memory structure used by the malloc/free mechanism (two adjacent chunks, laid out from low to high addresses, one marked 'in use' and one marked 'free').
We will now describe a naive implementation of a routine Malloc (Block size); it is intuitively very simple and exhibits the basic principles, but it is not good enough for practical use; optimizations will be suggested in Section 5.1.1.1. To allocate a block B of the required Block size, Malloc steps through the chunks until it finds a chunk C marked 'free' that is large enough to accommodate B. The chunk C is then broken into two chunks C1 and C2, such that C1 has the proper size for the block B, unless of course the block happens to fit exactly, in which case no chunk C2 is created. Now the size fields of C1 and C2 are set to their new values, the free bit of C1 is turned off, that of C2 is turned on, and a pointer to the block in C1 is returned to the requester. To free a block pointed to by a given pointer, the free bit of the corresponding chunk is simply turned on.

If Malloc cannot find a chunk that is large enough, it first tries to coalesce adjacent free chunks into larger chunks, by scanning memory looking for such situations. If two adjacent free chunks C1 and C2 are found, the size of C1 is set to the sum of the sizes of C1 and C2. The free bit of C1 need not be modified since it is already set to free; neither do the administration fields in C2 need adjustment, since they are now out of reach of the allocator. This operation can also be performed during any other scan of memory. If coalescing does not provide a sufficiently large chunk either, Malloc calls a routine Solution to out of memory condition () in a last attempt to remedy the situation. Below we discuss some ways of obtaining additional memory that could be implemented in the routine Solution to out of memory condition ().
SET the polymorphic chunk pointer First-chunk pointer TO Beginning of available memory;
SET the polymorphic chunk pointer One past available memory TO
        Beginning of available memory + Size of available memory;
SET First-chunk pointer .size TO Size of available memory;
SET First-chunk pointer .free TO True;

FUNCTION Malloc (Block size) RETURNING a polymorphic block pointer:
    SET Pointer TO Pointer to free block of size (Block size);
    IF Pointer /= Null pointer: RETURN Pointer;
    Coalesce free chunks;
    SET Pointer TO Pointer to free block of size (Block size);
    IF Pointer /= Null pointer: RETURN Pointer;
    RETURN Solution to out of memory condition (Block size);

PROCEDURE Free (Block pointer):
    SET Chunk pointer TO Block pointer - Administration size;
    SET Chunk pointer .free TO True;

Figure 5.2 A basic implementation of Malloc (Block size).
Outline code for the basic Malloc is given in Figures 5.2 and 5.3. The pointer One past available memory points to the first byte past the end of available memory. This is a useful pointer, but if memory extends to the last byte of addressable memory it may not be representable; in that case additional trickery is needed. The code assumes that data can be accessed at any alignment; if this is not supported by a given machine, the computation of Requested chunk size in Figure 5.3 must be modified to take the alignment requirements into account. The implementation of the routine Solution to out of memory condition () depends on the system. If the basic memory allocation system coexists with a garbage collector that uses the same chunk structure, the routine can call it, in an attempt to find unused chunks that have not been freed. Another option is to try to increase the amount of memory allotted to the program. The Malloc algorithm as stated assumes that the value of One past available memory is fixed for any one run of the program, but actually many operating systems feature requests to reset this value and thus extend or restrict the amount of memory available to the program. After all, our memory allocation system is a client of a more basic memory allocation system, that of the operating system. Such requests to bump One past available memory can be issued at any moment the routine Pointer to free block of size (Block size) fails to find a block, and it is tempting to do so at the first possible occasion, to avoid the work of coalescing blocks or doing garbage collection. The problem is, however, that some operating systems will extend the available memory almost indefinitely but implement this by supplying paged or virtual memory, which resides completely or partially on disk and is
FUNCTION Pointer to free block of size (Block size) RETURNING a polymorphic block pointer:
    // Note that this is not a pure function
    SET Chunk pointer TO First-chunk pointer;
    SET Requested chunk size TO Administration size + Block size;
    WHILE Chunk pointer /= One past available memory:
        IF Chunk pointer .free:
            IF Chunk pointer .size - Requested chunk size >= 0:
                // large enough chunk found:
                Split chunk (Chunk pointer, Requested chunk size);
                SET Chunk pointer .free TO False;
                RETURN Chunk pointer + Administration size;
        // try next chunk:
        SET Chunk pointer TO Chunk pointer + Chunk pointer .size;
    RETURN Null pointer;

PROCEDURE Split chunk (Chunk pointer, Requested chunk size):
    SET Left-over size TO Chunk pointer .size - Requested chunk size;
    IF Left-over size > Administration size:
        // there is a non-empty left-over chunk
        SET Chunk pointer .size TO Requested chunk size;
        SET Left-over chunk pointer TO Chunk pointer + Requested chunk size;
        SET Left-over chunk pointer .size TO Left-over size;
        SET Left-over chunk pointer .free TO True;

PROCEDURE Coalesce free chunks:
    SET Chunk pointer TO First-chunk pointer;
    WHILE Chunk pointer /= One past available memory:
        IF Chunk pointer .free:
            Coalesce with all following free chunks (Chunk pointer);
        SET Chunk pointer TO Chunk pointer + Chunk pointer .size;

PROCEDURE Coalesce with all following free chunks (Chunk pointer):
    SET Next-chunk pointer TO Chunk pointer + Chunk pointer .size;
    WHILE Next-chunk pointer /= One past available memory AND Next-chunk pointer .free:
        // Coalesce them:
        SET Chunk pointer .size TO Chunk pointer .size + Next-chunk pointer .size;
        SET Next-chunk pointer TO Chunk pointer + Chunk pointer .size;

Figure 5.3 Auxiliary routines for the basic Malloc (Block size).
consequently much slower to access than real memory. Whether such a request is profitable therefore depends on the interplay of several factors, which are unknown: the memory allocator itself, the way the program uses it, and the operating system. One important rule is of course that the memory allocator should not give up as long as there is one possibility left to continue.
There are two efficiency problems with this implementation. First, finding a suitable chunk in the free list requires a linear search through all of memory, which is unacceptable. Second, coalescing is done only when the usual linear search fails; this makes the performance irregular. The simplest approach to the first problem is to chain the free chunks into a free list; the link pointers can be accommodated in the unused space of the free chunks themselves. Chunks can still be split as explained above; chunks that are freed are prepended to the list. We have thus replaced a linear search through all chunks by a linear search through all the free chunks. This is some improvement, but is likely not enough. A more sophisticated implementation classifies the free chunks according to size and keeps a separate free list for each size interval. Intervals could for example be powers of 2, and separate linked lists could be kept for all free chunks with sizes between 2^n and 2^(n+1)-1, for a sensible set of values of n. Now, when a block of size s is requested, we determine a k such that 2^k <= s < 2^(k+1). We then consult the free list for sizes 2^k through 2^(k+1)-1; if that list is empty, we try the list for the next interval, and so on, until we find one that is not empty. The first block in that list has enough room, so allocation takes place in constant time; a chunk freed by a call of Free must of course be linked into the proper list, which also takes constant time. The interval boundaries need not be powers of 2; depending on the application, a different increasing sequence can be more useful.
Coalescing can be done on the fly, during each call of Free, if we have easy access to the chunk preceding the one being freed; the one following it is already within easy reach, using the size of the chunk being freed. This access can be obtained by storing the size information of each chunk at its end as well, where it can be found by the next chunk. Of course, this duplicate size information costs additional space and must be kept up to date by all other operations on the chunks.
By combining these two optimizations, a very efficient memory allocator can be constructed. The combination of the two optimizations is not entirely trivial, since a chunk being freed may be merged with the preceding and/or following chunk, depending on which of them is free. The surrounding free chunk or chunks must then be removed from their respective free lists, and the chunk resulting from the merge must be inserted into the proper free list. We will now turn to two special applications of memory allocation in a compiler: linked lists and extensible arrays.
Linked lists are used for many purposes in a compiler: identifier lists, syntax trees, code fragments, etc. The C compiler from the ACK (Tanenbaum et al., 1983) contains 25 different list types, and the Orca compiler contains 17; for Orca, see Bal, Kaashoek and Tanenbaum (1992). Records in these linked lists are added and removed at irregular intervals, and a naive implementation is to request them one by one from the standard memory manager, for example malloc(), and return them using free(). Much efficiency can be gained by batching these records in blocks rather than allocating them one by one. A separate set of blocks is allocated and maintained for each record type T. Each block is an array of a fixed number of records and is obtained from the standard memory manager; suitable sizes are 16 or 32 records per block. A free list is maintained, linking the free records in the blocks. Initially, the system starts with zero blocks and an empty free list. The first allocation request for a record of type T finds the free list empty, allocates a block of type ARRAY OF T, creates a free list linking the free records using the space in the free records themselves, and hands out the first record. Subsequent allocation requests normally obtain their records directly from the free list, which speeds up memory management considerably. Records can be returned to this system and are then reattached to the free list. Blocks are never returned to the standard memory manager. In principle they could be, if an entire block came to consist of free records only, but detecting such an event would require much more administration and the total gain would be limited, as observed in Exercise 5.5.
Figure 5.4 shows outline code for the routines New T () and Free T (Old record). The global variable Free list for T is used to hold the beginning of the linked list of free records. When a new record is requested, Free list for T is tested to see if a free record is available. If it is not, a new block is requested from Malloc () and the records in the newly obtained block are linked into the free list, which is then no longer empty. The requested record can now be unlinked from the head of the free list, possibly zeroed, and a pointer to it delivered to the caller. The routine Free T () links the record to be freed back into the free list; the record is not marked or cleared in any way, and if the program uses the pointer to it afterwards, chaos will ensue. The structure of the free list, as it winds through the allocated blocks, is sketched in Figure 5.5. The free list is constructed using link fields in the records; since these are present only in free records, they do not need allocation space of their own and can be stored in the records, in the same space where user data resides when the record is in use. This technique requires the records to be at least as large as a link pointer, a condition that is almost certainly fulfilled for records used in a compiler. It will be clear that this code can easily be generated automatically for all record types T used in dynamic allocation in the compiler or in the compiled program.
An extensible array is an array to which elements can be added at the high-index end. Extensible arrays are used in several places in a compiler: in hash tables, while reading and storing input the length of which is not known in advance, while constructing executable program images in the assembler, etc. Their use in run-time systems is perhaps less common but not exceptional; one application is the event queue in simulation programs, when this queue is maintained by heap sort. Extensible arrays have advantages over linked lists when the elements are small, never change their order, or are never freed again.
SET Free list for T TO No T;

FUNCTION New T () RETURNING a pointer to T:
    IF Free list for T = No T:
        // Acquire a new block of records:
        SET New block [1 .. Block factor for T] TO Malloc (Size of T * Block factor for T);
        // Construct a free list in New block:
        SET Free list for T TO address of New block [1];
        FOR Record count IN [1 .. Block factor for T - 1]:
            SET New block [Record count] .link TO address of New block [Record count + 1];
        SET New block [Block factor for T] .link TO No T;
    // Extract a new record from the free list:
    SET New record TO Free list for T;
    SET Free list for T TO New record .link;
    // Zero the New record here, if required;
    RETURN New record;

PROCEDURE Free T (Old record):
    // Prepend Old record to free list:
    SET Old record .link TO Free list for T;
    SET Free list for T TO address of Old record;

Figure 5.4 Outline code for blockwise allocation of records of type T.
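The following C sketch is a rough counterpart of Figure 5.4, under the assumption of a hypothetical record type T; it is not the book's code. The link field reuses the record's own space through a union, as described in the text, and the block factor of 32 is one of the sizes suggested above.

#include <stdlib.h>

#define BLOCK_FACTOR 32                     /* records per block */

typedef struct T { int data[4]; } T;        /* hypothetical record type */

/* A free record is overlaid with a link pointer; an in-use record holds a T. */
typedef union Record {
    union Record *link;                     /* valid while on the free list */
    T payload;                              /* valid while in use */
} Record;

static Record *free_list_for_T = NULL;

T *new_T(void) {
    if (free_list_for_T == NULL) {
        /* acquire a new block of records and string them into the free list */
        Record *block = malloc(BLOCK_FACTOR * sizeof(Record));
        if (block == NULL) return NULL;
        for (int i = 0; i < BLOCK_FACTOR - 1; i++)
            block[i].link = &block[i + 1];
        block[BLOCK_FACTOR - 1].link = NULL;
        free_list_for_T = &block[0];
    }
    /* extract a record from the head of the free list */
    Record *record = free_list_for_T;
    free_list_for_T = record->link;
    return &record->payload;                /* zero it here, if required */
}

void free_T(T *old_record) {
    /* prepend the record to the free list; it must not be used afterwards */
    Record *record = (Record *)old_record;
    record->link = free_list_for_T;
    free_list_for_T = record;
}

Blocks are never returned to the standard memory manager, so the only call of malloc() is the one that acquires a fresh block.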
For example, storing the characters of a string in an extensible array is much more attractive than storing them in a linked list. The simple-minded approach to extensible arrays would be to first allocate an array of reasonable size, and then, when the array turns out to be too small, to extend it with a reasonable increment, and to repeat this process as needed. Doing so, however, yields a quadratic algorithm, as can be seen as follows. We write the cost of allocating an array of size n as C(n). When we try to increase the size of the array, say with p bytes, we will find that there is not always room for this at the end of the array. In that case we will have to allocate a new array, at a cost α, which we assume is independent of n. We then have to copy the contents of the old array; this will have a cost proportional to n, say γn. In C, the function realloc() does all this for us; other systems usually have similar facilities. This leads us to the following recurrence relation for the cost of obtaining a new array of length n + p:
C(n + p) = C(n) + α + γn
The recurrence relation is solved by

C(n) = C(p) + (n/p - 1)α + γn(n - p)/(2p)
Figure 5.5 List of free records in allocated blocks.
which can easily be verified by substitution, and which shows that the process is quadratic in n. Fortunately, it is possible to extend arrays in linear time. The trick is to increase the size of the array by a constant factor rather than by a constant amount. When the array needs to be extended, we allocate a new array that is β times as big, with β > 1. Again the allocation has constant cost α. We then copy the contents of the old array into the new array, at cost γn. This yields the recurrence relation
C(βn) = C(n) + α + γn

which has the solution

C(n) = α log_β(n) + γn/(β - 1) + constant
This can again be verified by substitution, and shows that the process indeed requires time linear in n; the logarithmic term is negligible. Intuitively speaking, each reallocation step takes β times as much time as the previous one, but the result lasts β times as long. A disadvantage of almost any scheme for extensible arrays is that sooner or later the array has to be moved. Consequently, no pointers can be kept to entries in the array, and accessing elements of the array using an index involves loading the address of the array first rather than using a fixed starting location. Fortunately, the C syntax allows the same construction for both situations: it is 'referentially transparent' in this case, since the indexing construct A[i] will work both if A is a fixed allocated array of type T and if A is a pointer to a dynamically allocated array of type T.
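A minimal C sketch of this factor-based growth follows, assuming an illustrative element type char, a growth factor of 2, and the realloc() facility mentioned above; the struct and function names are not prescribed by the text.

#include <stdlib.h>

/* A hypothetical extensible array of characters, growing by a factor of 2. */
typedef struct ExtArray {
    char  *data;
    size_t used;        /* number of elements in use */
    size_t allocated;   /* current capacity */
} ExtArray;

/* Append one element, growing the array by a constant factor when it is full;
   this gives the linear (amortized) cost derived above. */
int ext_array_append(ExtArray *array, char element) {
    if (array->used == array->allocated) {
        size_t new_size = array->allocated == 0 ? 16 : 2 * array->allocated;
        char *new_data = realloc(array->data, new_size);
        if (new_data == NULL) return 0;     /* out of memory */
        array->data = new_data;
        array->allocated = new_size;
    }
    array->data[array->used++] = element;
    return 1;
}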
The basic allocation scheme embodied by malloc() and free(), blockwise allocation, and the extensible array technique all have one thing in common: memory that is no longer needed must be deallocated explicitly by the user. Explicit deallocation is a problem for programmer and compiler alike; we will now turn to automatic methods, which require no explicit deallocation.
Correctly freeing memory blocks in handwritten code and in generated programs is difficult, because of the unpredictable lifetimes of dynamically allocated data. Many programmers have experienced the consequences of freeing a block by mistake and then accessing the freed - and possibly reused - data through a dangling pointer. Such errors are hard to find and to fix:
- the dangling pointer may be used long after the block was freed erroneously;
- after dereferencing the dangling pointer, the program may proceed with the incorrect data for some time before an inconsistency is observed;
- the error may be hard to reproduce, because in interactive systems the exact same sequence of memory allocations and deallocations may be required to cause the observed problem.
Therefore, garbage collection is considered to be an important feature of modern programming systems, one that reduces programming effort considerably. Programming systems that offer garbage collection include, notably, Smalltalk, the functional languages, and scripting languages like awk.
Garbage chunks can be identified directly, as exemplified in this chapter by reference counting, or indirectly, by first identifying the reachable chunks and concluding that the rest is garbage; the latter approach is exemplified in this chapter by two techniques, mark and scan and two-space copying. The three techniques are all very different:
- Reference counting directly identifies garbage chunks. It is simple and reasonably efficient, but requires all pointer actions to be monitored during program execution and may fail to recover all garbage chunks.
- Mark and scan identifies reachable chunks and concludes that the rest is garbage. It is reasonably efficient and does not require pointer monitoring, but is quite complicated. It is the only algorithm that will recover all available memory.
- Two-space copying is not concerned with garbage. It copies the reachable chunks from a memory region called 'from-space' to a memory region called 'to-space'; the remaining space in to-space is a single free chunk. It is very efficient, does not require pointer monitoring, and is only moderately complicated, but it wastes half of the memory.
Once garbage chunks have been identified by these techniques, they must be turned into free memory space to be useful. Garbage chunks found by reference counting or mark and scan must be returned to the free list by the algorithm; two-space copying automatically creates a fresh free list, consisting of a single large chunk that includes all of free memory.
Locating all free chunks and adding them to the free list is not always enough: it allows us to get more chunks in the free list, and we may even succeed in coalescing adjacent free chunks, but it leaves the free memory fragmented into a number of free chunks separated by chunks in use. This phenomenon is called fragmentation. When the user requests a chunk with a size that is larger than the largest free chunk, the memory allocator is unable to supply that chunk, even though the total size of free memory may be larger or even much larger than the request. To obtain the free memory in its most valuable form, a single free chunk, we need to do compaction. Compaction moves the reachable chunks to one side, thereby moving the free chunks to the other side; the latter can then easily be coalesced into a single free chunk. It is more complicated and time-consuming than just freeing the garbage chunks, but it avoids fragmentation and is the best way to recover all unused memory.
The main problem with compaction is that it involves moving reachable chunks, which may contain pointers to other chunks which are also moved. Needless to say, this has to be orchestrated carefully; a technique for doing so is explained in Section 5.2.6. Reference counting and mark and scan can optionally be followed by a compaction phase; two-space copying does compaction automatically.
Garbage collection algorithms come in three varieties:
- One-shot: the garbage collector is started, runs to completion while in full control of all chunks, and then returns, leaving behind a hopefully improved situation. Since the garbage collector is in full control when it runs, it can be fairly simple, but its unexpected activation can be disruptive. This is not much of a problem inside compilers, but it may be in the application program.
- On-the-fly (incremental): some garbage collector actions are performed at each call of Malloc and/or Free. These actions make some local modifications to the chunk structure to increase the probability of finding a free chunk when needed.
On-the-fly garbage collectors are usually much more difficult to construct than one-shot garbage collectors, but they are smoother and less disruptive in their operation. Also, they may still need a one-shot garbage collector as back-up for situations in which they cannot cope with the demand.
- Concurrent: the garbage collector runs on a second processor, different from the one that runs the program. It runs continuously and concurrently, and tries to keep memory garbage-free. Unfortunately, concurrent garbage collection is sometimes also called on-the-fly, in spite of the fact that that term suggests one agent rather than two.
Reference counting garbage collection is an on-the-fly algorithm. On-the-fly and concurrent variants of mark and scan garbage collection have been known since the late 1970s (Dijkstra and Lamport, 1978; Ben-Ari, 1984), but have not seen wide application. We will restrict ourselves here to reference counting and the one-shot variants of mark and scan and two-space copying.
Garbage collection needs considerable preparation and support in the compiler, which will be treated in the next section. We will then discuss the three garbage collection techniques mentioned above: reference counting, mark and scan, and two-space copying (Sections 5.2.3 to 5.2.5), followed by an explanation of compaction (Section 5.2.6). It has been found advantageous to restrict most of the garbage collection activations to the most recently allocated set of chunks, the newest generation. A discussion of this optimizing approach, called 'generational garbage collection' (Section 5.2.7), concludes this section and this chapter.
A chunk is only reachable by the program if the program has a pointer to it directly, or can use a pointer to reach the chunk indirectly. The pointers that are directly available to the program can be located in various places, depending on the implementation. These places may include the global variables, local variables, routine parameters, registers, and perhaps others. We will use the term program data area for the non-heap memory that is directly accessible to the program code. The set of pointers in the program data area will be referred to as the root set. It should be noted that the root set is a conceptual notion rather than a data structure; it is the set of all pointers in the program data area, not a list of their values. The root set is usually not implemented directly, but is conceptually present in the program code of the garbage collector. The pointers in the root set may point to chunks in the heap, which is under control of the garbage collector; such chunks are then reachable. Reachable chunks in the heap can, of course, contain pointers that point to other chunks in the heap, which are then reachable as well. This decomposes the problem of finding all reachable chunks - and so the problem of garbage collection - into three subproblems:
1. determining the root set by finding all pointers in the program data area, with their types;
2. finding all pointers in a given chunk, with their types;
3. finding all reachable chunks using the information of 1 and 2.
Subproblems 1 and 2 require knowledge of the pointer layout of the program data area and of each chunk type. The garbage collector needs compiler support for this; techniques to supply the required information are covered in the next section. Solutions to subproblem 3 constitute the garbage collection algorithms proper, and are usually implemented as run-time system routines. A constellation of a root set and a heap with reachable and unreachable chunks is shown in Figure 5.6; chunks a and c are reachable from the root set, d and f are reachable from the heap through a and c respectively, and chunks b and e are unreachable. The first pointer from c points into the program data area.
Figure 5.6 A root set and a heap with reachable and unreachable chunks.
Reference counting, which is not based on the concept of reachable chunks, does not require a root set, but it must still be able to find all pointers in a given chunk. Since the garbage collector will, in principle, follow any pointer, there is still another requirement that the code generated by the compiler has to obey: all pointers must be valid, which means that they must point to locations the garbage collector can again interpret. Pointer validity is also called pointer consistency.
The compiler has to provide the information it has about the pointer layout of each chunk type to the garbage collector. The pointer layout of a chunk C describes the position of each pointer P in C and the type of the chunk that P points to. The compiler also has to make sure that all reachable pointers, both in the program data area and in the heap, are valid when the garbage collector is activated. Providing the root set requires knowledge of the pointer layout of the program data area, and it was already clear that we also need the pointer layout of each chunk type.
Achieving pointer validity is relatively simple. A pointer can only be invalid when it is uninitialized or when it has been ruined by an assignment of a value of another type. If the compiler makes sure that each newly allocated pointer is initialized correctly, no bad pointer values of the pointer type exist, and if the language is type-secure, no values of other types can be assigned to pointers, so pointer validity is assured. If the source language is not type-secure, full pointer validity is not achieved and garbage collection can only be based on so-called conservative garbage collection; conservative garbage collection is discussed briefly at the end of this section. Determining the layout of chunks, techniques for specifying the layout to the garbage collector, and the layout of the program data area are discussed in the next two sections.
The compiler is in full control of the layout of chunks, so the only problem is how to transfer the knowledge of the pointer layout to the garbage collector. The answer to this question depends on an important property of chunks: whether a chunk is self-descriptive, that is, whether having a pointer to it is sufficient to obtain the pointer layout of the chunk pointed to. There are two ways for chunks to be self-descriptive: they can carry their pointer layout information in each copy, either directly or through a pointer to a shared descriptor, or they can all have the same layout. The latter situation arises for example in Lisp interpreters, in which each chunk consists of exactly two pointers, one to the car field and one to the cdr field of the Lisp cell. There are several ways to make the pointer layout of chunks available to the garbage collector.
- The compiler can generate a bit map for each chunk type, specifying which positions in a chunk of that type are pointers to other chunks. With this method, chunks must be self-descriptive, since just having the pointer must be sufficient for the garbage collector to continue. So each chunk must either contain its bit map or a pointer to it. This method requires the collector to interpret the bit map, which can be expensive.
- The compiler can generate a specific routine for each chunk type, which calls a collector routine, passed to it as a parameter, for each pointer inside the chunk (a sketch of such a generated routine is given below). This avoids the run-time interpretation of bit maps and the need for self-descriptive chunks: the generated code can pass the type of the chunk together with the pointer to the routine. It also makes handling any chunk of a type that does not contain pointers a null operation, probably a useful optimization. A problem with this method is that the routines tend to be recursive, in which case they require an undetermined amount of stack space, which may or may not be available when the garbage collector is invoked. This is one of the rare cases where a significant portion of the run-time system is generated automatically.
- The compiler can organize the chunks to start off with an array containing all the pointers, followed by the other data types. With this organization, the garbage collector only needs to know the location of the pointer array and the total number of pointers in it. If the pointer array is the first field and the total number of pointers is stored in the chunk administration, the chunk is self-descriptive at low cost. Collecting the pointers inside a consecutive region is the most efficient method, since it requires the fewest cross calls between compiled code and collector. The language definition may, however, not allow the reorganization of the data items inside a chunk.
In addition to the pointer layout, the collector has to know the size of each chunk. Similar considerations as with the pointer layout apply. The technique of specifying the pointer layout by routine is described extensively by Goldberg (1991), for a strongly typed source language.
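To make the second option concrete, the following is a sketch of what such a compiler-generated routine might look like in C, for a hypothetical chunk type Tree_node with two pointer fields; the callback type and all names are illustrative, and the recursion into child chunks is here left to the collector routine that is passed in.

/* Hypothetical chunk type with two pointer fields and some non-pointer data. */
typedef struct Tree_node {
    struct Tree_node *left;
    struct Tree_node *right;
    int value;
} Tree_node;

/* The collector routine receives the address of each pointer field, so that it
   can follow the pointer (mark and scan) or update it (copying collectors). */
typedef void (*pointer_action)(void **pointer_field);

/* Compiler-generated: apply 'act' to every pointer field of a Tree_node. */
void walk_pointers_in_Tree_node(Tree_node *chunk, pointer_action act) {
    act((void **)&chunk->left);
    act((void **)&chunk->right);
    /* the field 'value' contains no pointers, so it is skipped */
}

/* Compiler-generated for a chunk type without pointers: a null operation. */
typedef struct String_node { char text[32]; } String_node;
void walk_pointers_in_String_node(String_node *chunk, pointer_action act) {
    (void)chunk; (void)act;                 /* nothing to do */
}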
Pointer layout of the program data area
The root set is usually supplied by running a library routine that scans the program data area and calls a specific garbage collection routine for each pointer the program data area contains. It is then up to the specific garbage collection routine to see if the pointer is interesting and to perform the proper actions. To perform its task, the library routine must be able to find all pointers in the program data area, with their types, and be sure each pointer is valid; in short, it needs the pointer layout of the program data area. The problem is that the pointer layout of the program data area, unlike that of chunks, is complicated and dynamically variable. The program data area usually consists of the global data area and a stack holding one or more stack frames or activation records. The pointer layout of the global data area is known and constant, although it may be distributed over several source program modules. To know the pointer layout of the stack, the garbage collector has to know which activation records it contains, and what the pointer layout of each activation record is. Both kinds of information are dynamic, so activation records must be self-describing. This is, however, easier said than done: the contents of the activation records change dynamically.
The main obstacles to a pointer-valid program data area of known layout are the stacks and parameter lists under construction which may occur in it. Working stacks may contain pointers of dynamically differing types, depending on the progress of the program, and parameter lists do not correspond to a data type in the language. Also, parameter lists are sometimes constructed on the working stack, which combines the problems. In both cases the solution lies in defining anonymous data types for all structures that can occur, and keeping track of which one is current. For example, the parameter list of a routine with three formal parameters

PROCEDURE Three parameters (an integer I, a tree pointer Tp, a graph pointer Gp)
can be described by a record type

TYPE _anonymous_0001:
    an integer I;
    a tree pointer Tp;
    a graph pointer Gp;
To get guaranteed pointer validity, the record must be allocated in its entirety before the actual parameters are evaluated, and the pointers in it must be set to null; they are then updated as the actual parameters are obtained. Adding this feature to an existing compiler may require considerable work. The same technique must be used for the working stack, but there the situation is worse: whereas parameter lists are fixed and appear in the program in the form of formal parameter lists, the data structures that appear on the working stack are the results of pushes and pops generated by the code generator, after it has optimized the target code. One source of relief in this awkward situation is that pointer validity is only required at points where the garbage collector can indeed be activated; this is at an explicit use of an allocator and at a routine call. Fortunately, these are exactly the points at which a code generator tends to make sure that all values from registers are stored safely in memory. So we only have to define anonymous data types for the working stack configurations at these points. Still, for a large routine with many expressions that manipulate pointers, this can amount to a considerable number of anonymous data types. Also, the actual data type of the working stack has to be updated at run time. Similar techniques can be used for data structures with dynamically changing types, for example unions.
Another approach to pointers in the working stack can be followed in implementations in which the working stack is used only to implement expressions. The technique is simple: make sure the working stack is always empty when a garbage collector activation can occur. In fact, it has to be empty when any call occurs, since any call might in the end cause a call to the garbage collector to be made. To achieve this situation, all calls and all other actions that can have side effects are moved to positions before the expression, using temporary variables. For example, an expression

a := (b := c(3)) + d(new Tree);

in which the assignment (:=) inside the expression yields the value assigned and new is the record allocator, is transformed into

b := c(3);
t1 := new Tree;
t2 := d(t1);
a := b + t2;
In effect, the 'difficult' part of the working stack has been moved to the local variable stack. Since the data type of the local variable stack is constant for a given routine, only one anonymous data type needs to be generated. Some languages and compilers just cannot supply the pointer layout information and consistency required by garbage collection. An example of such a language is C, in which in principle any piece of data can be stored in any type of variable by using the proper cast.
To accommodate such languages and such compilers, another approach to the pointer layout and consistency problem has been devised. In this approach, any sequence of bytes in the program data area and the heap that contains a value that could represent a valid pointer is taken to be a valid pointer. Once this drastic idea has sunk in, two things will become clear. The first is that some of the assumed pointers may actually be fake. This has two consequences: we have to be very careful in following pointers; and we may occasionally reach a chunk by accident that could not have been reached by following correct pointers. This may cause subsequent algorithms to retain chunks that could be freed. The second is that if all pointers to non-free chunks indeed reside in the program data area and the heap, we will find all non-free chunks. This prevents the algorithm from freeing chunks that should be retained. The approach described here is known as conservative garbage collection: it preserves at least as many chunks as it should and probably more; the idea was introduced by Boehm and Weiser (1988). Its implementation requires considerable care; pitfalls are discussed by Wentworth (1990) and Boehm (1993). Although it has been applied successfully in some systems, it is still regarded as experimental, and we will not discuss it any further here.
Garbage collection is an old and mature subject and many techniques, ranging from straightforward to ingenious to downright sneaky, have been invented in the course of time. A full treatment of garbage collectors could easily fill a book (at least one such book exists, Jones and Lins (1996)), so we will make a number of simplifying assumptions for the benefit of the following discussion. We assume that the garbage collector operates on the same memory structure as, and in cooperation with, the simple malloc/free mechanism described in Section 5.1.1. We also assume that it is possible both to free blocks explicitly and to have them freed automatically by a garbage collector. This is useful in implementations in which the run-time system itself uses allocated blocks and knows when to free them safely; it also allows the garbage collector to free blocks through this mechanism, thus decreasing the complexity of the garbage collector.
Even with our simplified malloc/free model, the following constitutes an outline only; many details are swept under the rug. For one thing, we assume that all pointers point to the beginning of chunks - or at least to a fixed position inside of them. In some languages the programmer can construct a pointer to a field inside a record; it must then be possible to reconstruct from such a pointer the pointer to the allocated chunk into which it points. Actual garbage collectors are full of details required by the peculiarities of the data types of the implemented language, including arrays, unions, and sets of types that may contain pointers. And last, and in this case probably least, in addition to data pointers the administrative pointers that connect activation records must be followed. These pointers are especially essential in functional languages and in imperative languages that feature coroutines, functions as first-class citizens, or tasks. We also assume that all chunks are self-descriptive, either because they have enough information in them to retrieve the pointer layout, or because they all have the same structure. The algorithms below can usually be reworked easily to operate on non-self-
descriptive data, but some benefits of the algorithm may be lost in the process.
As said before, the discussion below is concerned with the basic algorithms only, which are relatively simple; actual garbage collectors tend to be complicated and hairy, and, which is worse, error-prone. The reasons for this additional complexity are twofold. The first is that the basic algorithms assume the user data to consist of chunks of known size and type, with all pointers initialized. Worse, many memory management publications in the computer science literature use a model in which chunks are all of the same type and in which the heap and the program data area have the same uniform structure. Reality is different. Actual languages have uninitialized pointers, arrays of dynamically alterable size, unions which have dynamic types some of which may contain pointers, exception handlers which may invalidate assumptions about what is on the working stack, tasks that may update data structures concurrently, and other surprises. Also, the structures of the program data area and the heap are usually vastly different. There is no fundamental reason why all these complications could not be incorporated correctly in an existing garbage collection algorithm, but doing so requires considerable care and increases the complexity of the code.
The second reason is that garbage collection is intuitively considered a waste of valuable computer cycles by many users, and there is pressure on the compiler writer to optimize the garbage collector heavily. Given the baroque structure of the data types in most languages, there is indeed often room for spectacular optimization; for example, knowing that an array consists of elements that do not contain pointers and so does not need to be scanned will improve the speed of the garbage collector. Also, the target code optimizer may optimize away code that is essential for the garbage collector. It may, for example, decide in a late stage that it can hold a variable in a register all the time and delete all memory accesses to the location reserved for it. This leaves an uninitialized pointer location, which will derail the garbage collector, unless the pointer layout is also updated. Again, this can all be programmed correctly, but tends to heap complication on complication. Each optimization, both of the garbage collector and of the generated code, is a threat to garbage collection correctness, and has to be designed and implemented with the utmost care.
Garbage collectors are notoriously difficult to debug, since they are doing surgery on the nervous system of the running program - the pointers in the activation records and in user data structures in the heap. For the same reason, a bug in the garbage collector will cause unpredictable and incomprehensible failure of the user program. Conversely, once a compiler has a reputation for having a buggy garbage collector, any failure of any user program using the compiler will first be blamed on the garbage collector, before any other source of error is considered. Even more than is the case with other parts of a compiler, it is important to get a garbage collector right before making it fast. We will now turn to the actual garbage collection algorithms.
Reference counting is an intuitive garbage collection algorithm that records in each chunk the number of pointers that point to it; when the number drops to zero, the chunk can be declared garbage. In a literal sense, reference counting collects garbage, unlike the other
garbage collection algorithms, which actually collect reachable chunks. In line with the name 'reference counting', we will call pointers 'references' in this section.
The reference count is updated in several actions on the chunk. When a chunk is allocated from the heap, its reference count is initialized to one. Whenever a reference to the chunk is duplicated, its reference count is increased by one ('incremented'). Likewise, whenever a reference to the chunk is deleted, its reference count is decreased by one ('decremented'). If the reference count drops to 0, the chunk can be freed because it is no longer reachable. Figure 5.7 shows a heap with a number of chunks with reference counts and the references connecting them; some references come from the global data area. Note that there are no chunks with reference count 0.
Figure 5.7 Chunks with reference counts in a heap.
Simply returning the chunk with a zero reference count to the free list is not enough to reclaim all garbage, since it may contain references to other chunks that may now become garbage as well. For example, by deleting the reference to chunk b in Figure 5.7, chunk e also becomes garbage, but f remains in use, since there is still one reference left and its reference count does not drop to 0. The resulting constellation is shown in Figure 5.8.
The two main issues in implementing reference counting are keeping track of all reference manipulations and recursively freeing chunks with zero reference count. The compiler plays an important role in keeping track of references; the recursive freeing is delegated to a run-time routine. We will now consider both.
The compiler inserts special code for all reference manipulations: incrementing the reference count when a reference to a chunk is duplicated and decrementing it when such a reference is deleted. References are typically duplicated as an effect of some assignment in the source language. It does not matter what the target of the assignment is: it may be a variable in the program data area, a field in a dynamically allocated data structure, etc. In
Figure 5.8 Result of removing the reference to chunk b in Figure 5.7.
each case the reference to the chunk is duplicated, so its reference count must be incremented. Besides assignment statements, the compiler also has to add reference-increasing code to parameter transfers, since a reference that is passed as a parameter is effectively assigned to a local variable of the called routine. Note that not all references in the running program are references to chunks on the heap; many of them point to blocks in the program data area, and all reference-counting code must make sure it does not follow such references.
References to chunks are typically deleted implicitly by assignment statements. An assignment to a reference variable overwrites the current reference with a new value, so before installing the new reference, the reference count of the chunk addressed by the current reference should be decremented. Figure 5.9 shows an outline of the code that must be generated for the pointer assignment p := q when performing reference counting.
IF Points into the heap (q):
    Increment q .reference count;
IF Points into the heap (p):
    Decrement p .reference count;
    IF p .reference count = 0:
        Free recursively depending on reference counts (p);
SET p TO q;

Figure 5.9 Code to be generated for the pointer assignment p := q.
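In C-like terms, the code of Figure 5.9 could be packaged in a helper that the compiler calls for every pointer assignment; the chunk header layout, the points_into_heap() test, and the names are illustrative assumptions, not the book's code.

#include <stddef.h>

/* Hypothetical reference-counted chunk header. */
typedef struct RcChunk {
    unsigned ref_count;
    /* pointer fields and user data follow */
} RcChunk;

extern int points_into_heap(RcChunk *p);        /* assumed heap test */
extern void free_recursively(RcChunk *p);       /* as in Figure 5.10 */

/* Generated for every pointer assignment  p := q  in the source program. */
void rc_assign(RcChunk **p, RcChunk *q) {
    if (q != NULL && points_into_heap(q))
        q->ref_count++;                         /* a reference is duplicated */
    if (*p != NULL && points_into_heap(*p)) {
        (*p)->ref_count--;                      /* the old reference is deleted */
        if ((*p)->ref_count == 0)
            free_recursively(*p);
    }
    *p = q;
}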
The other source of reference deletions is returning from a routine call. When a routine returns, all its local variables are deleted. A local variable holding a reference to a chunk should be processed to decrement the associated reference count, and the same applies to parameters that hold references.
We have seen that the proper way to reclaim the memory allocated to a chunk is to first decrement recursively the reference counts of all references contained in the chunk, and then return it to the free list. Recursion, however, is an unwelcome feature in a garbage collector, since it requires an unpredictable amount of stack space. Depending on the run-time system and the operating system, this stack space may be part of the program data area, part of the heap, or a separate memory segment; in any case, its size is limited and may not suffice for the operation of the garbage collector. Having a garbage collector fail for lack of memory is kind of embarrassing, though, and several techniques have been invented to avoid the problem. The best solution is using pointer reversal, which will be explained in Section 5.2.4.3. A simpler but still adequate solution is suggested in Exercise 5.10(c). We will discuss here briefly a very simple trick that does not solve the problem completely, but usually alleviates it sufficiently: tail recursion elimination. The trick is to use repetition rather than recursion for the last pointer in the chunk to be freed. This avoids using the stack for freeing chunks that form a linear list. An outline of the code is given in Figure 5.10, in which Pointer references the chunk to be freed recursively.
PROCEDURE Free recursively depending on reference counts (Pointer):
    WHILE Pointer /= No chunk:
        IF NOT Points into the heap (Pointer): RETURN;
        IF NOT Pointer .reference count = 0: RETURN;
        FOR EACH Index IN 1 .. Pointer .number of pointers - 1:
            Free recursively depending on reference counts (Pointer .pointer [Index]);
        SET Aux pointer TO Pointer;
        IF Pointer .number of pointers = 0:
            SET Pointer TO No chunk;
        ELSE Pointer .number of pointers > 0:
            SET Pointer TO Pointer .pointer [Pointer .number of pointers];
        Free chunk (Aux pointer);    // the actual freeing operation

Figure 5.10 Recursively freeing chunks while avoiding tail recursion.
Reference counting is a simple technique that can be implemented easily by modifying the compiler to perform reference manipulation as outlined above. Unfortunately, reference counting has some serious drawbacks that limit its applicability. First, and foremost, reference counting cannot reclaim cyclic data structures. Consider the example in Figure 5.8, in which the reference count of chunk a is 2. If the reference from the root
set to chunk a is deleted, the reference count drops to 1, so the chunk is not reclaimed, as shown in Figure 5.11. By definition, however, chunk a has become garbage, since it is no longer reachable from the root set. Reference counting cannot reclaim chunk a because chunk d, which has also become garbage, holds a reference to it.
Figure 5.11 Reference counting fails to identify circular garbage.
The problem with reference counting is that it takes its decisions by considering only one node in the graph at a time, and in order to reclaim a cyclic data structure all nodes in the data structure should be considered as garbage together. Once reference counting has failed to reclaim a cyclic data structure, the chunks involved will never be reclaimed. This has the unfortunate effect that free space leaks away, which might even cause the program to run out of free space when other garbage collectors would be able to reclaim the cyclic structures and allow the program to continue.
The second problem with reference counting is efficiency. The compiled code has to monitor all reference manipulations, and each and every reference manipulation requires the adjustment of the associated reference counts. This is a considerable overhead in comparison to other garbage collection techniques that do not monitor any pointer action and reclaim garbage chunks only when needed (see Sections 5.2.4 and 5.2.5).
The final problem with reference counting is memory fragmentation. The free list is augmented with the reclaimed chunks, but it remains fragmented. In principle, doing a compaction phase during a reference counting allocation request is possible, but few reference counting garbage collectors go to such lengths.
Despite its problems, reference counting is a popular technique for managing relatively small numbers of dynamically allocated data structures, usually in handwritten software. For example, a UNIX kernel typically uses reference counts to handle the recovery of file descriptors. Its use in generated code is less widespread.
The mark and scan garbage collection algorithm described in this section is the most effective one in that it frees all memory that can be freed (reference counting fails to free circular structures; two-space copying leaves half of the memory unavailable). It is often combined with compaction, and then also provides the largest possible chunk of memory available. The mark and scan algorithm is also sometimes called mark and sweep.
The mark and scan garbage collection algorithm consists of two phases. The first, the marking phase, marks all chunks that are still reachable; the second, the scan phase, scans the allocated memory, considers as free all chunks that are not marked reachable, and makes them available again. We will now consider the marking and scanning phases in more detail.
Marking is based on two principles: chunks reachable through the root set are reachable, and any chunk reachable from a pointer in a reachable chunk is itself reachable. We assume that the root set resides in a program data area or the topmost activation record, and that a data type description for it has been constructed and made available by the compiler. Now, in its simplest form, marking marks the program data area as reachable, finds the pointers in it using its data type description, and recursively marks all chunks pointed to by these pointers, in a simple depth-first scan. If this recursive process finds a chunk without pointers or a chunk that has already been marked, it backtracks and continues with the next pointer. Since the number of reachable chunks is finite and no chunk is processed more than once, this depth-first scan terminates and takes a time linear in the number of reachable chunks. Marking requires a second bit in the administration header of the chunk, the marked bit, in addition to the free bit. This bit starts off as 'cleared'.
The main problem with this recursive process is that it needs a stack of unknown size, and the question is where to find room for this stack at a time when memory is in short supply, witness the fact that the garbage collector was activated. It has been suggested to us that in this day and age, in which computers easily have stacks of 1 megabyte or more, the question is no longer relevant. It is our experience that each generation of designers again suggests that 'X is certainly enough', and that suggestions like these lead to machines with built-in 640 KB boundaries.
The simplest answer to the question of where to put the stack is: reserve room for the marking stack in each chunk. One pointer and a small counter are sufficient: the pointer points back to the parent chunk which contains the pointer that caused the present chunk to be processed, and the counter counts how many pointers have already been processed in the present chunk. See Figure 5.12, in which the third child is being processed. This technique costs room for one pointer, one counter, and one bit per allocated chunk. Whether this is a problem or not depends on the average size of a chunk and on the ease with which the counter and bit can be accommodated. In a Lisp interpreter, in which chunks (called 'cells' there) contain two pointers only, the overhead is more than 33%, even though the counter can only have the values 0 and 1; in modern systems, in which the minimum allocation size is perhaps 16 or 32 bytes anyway, the overhead is often consider-
Figure 5.12 Marking the third child of a chunk.
ably less. For a way to mark the directed graph of all reachable chunks without using space for the extra pointer, see Section 5.2.4.3.
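For concreteness, here is a minimal C sketch of the simple recursive depth-first marking described above, assuming a hypothetical self-descriptive chunk layout (a marked bit plus an array of pointer fields); a production collector would rather use the stack-less technique of Section 5.2.4.3.

#include <stddef.h>

typedef struct Chunk {
    int marked;                  /* the marked bit */
    size_t n_ptrs;               /* number of pointer fields in this chunk */
    struct Chunk *ptr[];         /* the pointer fields */
} Chunk;

extern int points_into_heap(Chunk *p);   /* assumed test for heap pointers */

/* Depth-first marking of all chunks reachable from 'p'. */
void mark(Chunk *p) {
    if (p == NULL || !points_into_heap(p) || p->marked)
        return;                  /* no chunk, not in the heap, or already done */
    p->marked = 1;
    for (size_t i = 0; i < p->n_ptrs; i++)
        mark(p->ptr[i]);         /* may recurse deeply; see the discussion above */
}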
Just freeing the unreachable chunks is now easy: using the lengths noted in the chunks, we step through memory, from chunk to chunk. For each chunk C we check if it has been marked reachable; if so, we clear the marked bit of C, and if not, we turn on the free bit of C. We can also exploit this left-to-right scan to combine adjacent free chunks. To this end, we keep a pointer F to the first free chunk we find and note its size. As long as we keep on meeting free chunks, we just add up their sizes, until we run into a chunk in use or into the end of memory. We then update the administration of the chunk pointed to by F to the total size of the free chunks, thus creating a single larger free chunk, and continue our scan. We repeat this process as soon as we meet another free chunk, and so on. The result of a mark and scan operation is a heap in which all chunks marked in use are reachable and each pair of free chunks is separated by chunks in use. This is the best one can achieve without moving the chunks. A subsequent compaction phase can combine all the free chunks into one large free chunk, thereby improving the performance even further.
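The scan phase can be rendered in C along the following lines, assuming a hypothetical chunk header with a size in bytes plus free and marked bits; the names are illustrative.

#include <stddef.h>

typedef struct Chunk {
    size_t size;                            /* chunk size in bytes */
    unsigned free   : 1;
    unsigned marked : 1;
} Chunk;

#define NEXT_CHUNK(c) ((Chunk *)((char *)(c) + (c)->size))

/* Scan phase: free unmarked chunks, clear the marks of reachable chunks,
   and coalesce runs of adjacent free chunks on the way. */
void scan_and_free(Chunk *first_chunk, Chunk *one_past_memory) {
    Chunk *chunk = first_chunk;
    while (chunk != one_past_memory) {
        if (chunk->marked) {
            chunk->marked = 0;              /* reachable: keep it, clear the mark */
            chunk = NEXT_CHUNK(chunk);
        } else {
            /* unreachable or already free: merge the whole run into one chunk */
            Chunk *f = chunk;
            size_t total = 0;
            while (chunk != one_past_memory && !chunk->marked) {
                total += chunk->size;
                chunk = NEXT_CHUNK(chunk);
            }
            f->free = 1;
            f->size = total;                /* absorbs the following free chunks */
        }
    }
}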
The above marking technique requires an overhead of one pointer plus a few bits in each chunk at all times, even when the garbage collector is not running. This overhead can be avoided almost completely by pointer reversal. This ingenious technique has applications outside of the field of garbage collection, since it allows one to visit all nodes of a directed graph without using additional space; the marking algorithm based on pointer reversal is also called the Schorr and Waite algorithm, after its inventors (Schorr and Waite, 1967).
The marking algorithm described above kept a pointer in each chunk C. This pointer pointed to C's parent P in the visiting process and would be followed when the marking algorithm was finished with chunk C. Now, when the marking algorithm is working on chunk C, it finds the pointers in C, which point to children of C, one by one. Imagine that the marking algorithm has gone off visiting the n-th child, say D; then after a while it will return from this visit. Upon returning from D, the marking algorithm can retain a pointer to D, the chunk it just left, and this pointer already resides in C, in the n-th pointer field! This leads to the observation that while the marking algorithm is visiting the n-th child of C, the contents of the n-th pointer field in C are redundant.
Figure 5.13 The Schorr and Waite algorithm, arriving at C.
The Schorr and Waite graph marking algorithm utilizes this redundancy to store the parent pointer which would otherwise go on the stack, as follows. Figure 5.13 shows the situation when processing of the chunk C starts. The algorithm maintains two auxiliary pointers, Parent pointer and Chunk pointer; Chunk pointer points to the chunk being processed, Parent pointer to its parent. Moreover, each chunk contains a counter field which records the number of the pointer in the chunk that is being processed; the counter starts off at 0, and when it has reached the total number of pointers in the chunk, the chunk has been visited completely. We assume that the processing of pointers in C proceeds until we reach the n-th pointer, which points to child D. In order to move to D, we shift the contents of Parent
Figure 5.14
Figure 5.15 About to return from D.
pointer, Chunk pointer, and the n-th pointer field in C circularly, using a temporary variable Old parent pointer. Some care is required, since the pointer that addresses C changes in the middle of the code:
Figure 5.16 About to return from C.
// C is pointed to by Chunk pointer.
SET Old parent pointer TO Parent pointer;
SET Parent pointer TO Chunk pointer;
// C is pointed to by Parent pointer.
SET Chunk pointer TO n-th pointer field in C;
SET n-th pointer field in C TO Old parent pointer;

This results in the return pointer to the parent P of C being stored in the n-th pointer field in C, which normally points to D, as shown in Figure 5.14. As far as Parent pointer and Chunk pointer are concerned, the situation is now equivalent to that in Figure 5.13, when we first arrived at C, except that C is now the parent and D is the chunk going to be processed. Figure 5.15 shows the situation when we are about to return from visiting D; the only difference with Figure 5.14 is that the counter in D has now reached the total number of pointers in D. In order to return from D to C, we circularly shift back the pointers:
// C is pointed to by Parent pointer.
SET Old parent pointer TO Parent pointer;
// C is pointed to by Old parent pointer.
SET Parent pointer TO n-th pointer field in C;
SET n-th pointer field in C TO Chunk pointer;
SET Chunk pointer TO Old parent pointer;
// C is pointed to by Chunk pointer.

and increment the counter in C. The whole fancy footwork is then repeated for the n+1-th pointer in C, and so on, until all children of C have been visited. We are then ready to return from C, as shown in Figure 5.16. Note that the return pointer to P has by now been stored in, and subsequently retrieved from, each pointer position in C. Returning to P from C, the marking algorithm is in a position similar to where it was when returning from D to C. Again, the marking algorithm makes sure it brings along the
pointer to its point of departure, C, to restore the pointer in P to C, which for the duration has been replaced by P's return pointer to its parent.
It should be pointed out that the above only describes a clever technique for avoiding a stack while visiting all nodes in a graph. To prevent looping on cycles in the graph, nodes must be marked at the beginning of a visit, and already marked nodes must not be visited again. The same marked bit can then be used during freeing, to indicate that a chunk is reachable.
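The pointer-reversal idea can be sketched in C as follows, for a hypothetical node type in which the first fields are an array of child pointers; this is an illustration of the technique, not the book's code, and it folds the marking described earlier into the traversal.

#include <stddef.h>

#define MAX_PTRS 4

typedef struct Node {
    int marked;                     /* mark bit */
    int counter;                    /* index of the pointer being processed */
    int n_ptrs;                     /* number of pointer fields in this node */
    struct Node *ptr[MAX_PTRS];     /* the pointer fields */
} Node;

static int in_heap(Node *p) { return p != NULL; }   /* stand-in for a real heap test */

void mark_schorr_waite(Node *root) {
    Node *parent = NULL;            /* Parent pointer */
    Node *chunk = root;             /* Chunk pointer  */

    if (!in_heap(chunk) || chunk->marked) return;
    chunk->marked = 1;
    chunk->counter = 0;

    for (;;) {
        if (chunk->counter < chunk->n_ptrs) {
            int n = chunk->counter;
            Node *child = chunk->ptr[n];
            if (in_heap(child) && !child->marked) {
                /* move to the child: rotate Parent pointer, Chunk pointer and ptr[n] */
                chunk->ptr[n] = parent;     /* store the return pointer in C */
                parent = chunk;
                chunk = child;
                chunk->marked = 1;
                chunk->counter = 0;
            } else {
                chunk->counter++;           /* child needs no visit; next pointer */
            }
        } else {
            /* all children of this chunk visited; return to the parent */
            if (parent == NULL) return;     /* back at the root: done */
            int n = parent->counter;
            Node *grandparent = parent->ptr[n];   /* the stored return pointer */
            parent->ptr[n] = chunk;         /* restore the original pointer */
            chunk = parent;
            parent = grandparent;
            chunk->counter++;               /* continue with the next pointer */
        }
    }
}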
The mark phase in a mark and scan garbage collector touches the reachable chunks only, but the scan phase touches all chunks. This makes it expensive, since most chunks have a short lifetime and a very large fraction of the heap consists of garbage chunks when the garbage collector is invoked. Two-space copying exploits this important observation by avoiding scanning all chunks and only processing the reachable chunks; it manages to do so at the expense of doubling the memory requirements. With memory becoming cheaper every day, however, it has become reasonable to optimize for time rather than for memory.
The basic two-space copying collector by Cheney (1970) works as follows. The available heap space is divided into two equal parts: the from-space and the to-space, as shown in Figure 5.17. During normal computation, new chunks are allocated in from-space by simply advancing a pointer through the from-space. When the heap space in the from-space has been consumed, all reachable chunks are copied to the empty to-space by the garbage collector.
Figure 5.17 Memory layout for two-space copying.
The two-space copying operation starts with copying the chunks in from-space that are referenced by pointers in the root set. The copies are placed in theto-space in consecutive positions, starting right at the beginning. The original of a chunk in the from-space is marked ‘copied’ and a forwarding pointer pointing to the copy in the to-space is stored in it; since the chunk has just been copied, its contents can be destroyed. No pointers are updated in the copies,so the pointers in themstill point to chunks in thefrom-space, Then the chunks in to-space are scanned from left to right for pointers to chunks in from-space, using a ‘scan pointer’. Suppose a pointer in a chunk P under the scan pointer
points to a chunk Q in from-space. Now there are two possibilities: in which case it contains a forwarding pointer which is used to upd Q is not marked ‘copied’, in which case it must be copied now. After the copy has been made,theoriginal Q is marked‘copied’and its contentsarereplaced by a forward in^ pointer to the copy. This processis repeated until the to-space co~tainsno more pointersto chunksinfrom-space.Then all reachablechunksinthe from-spac ave beencopied to to-space and all pointers have been updated to point into to-space. xt, the roles of the two semi-spaces are ‘flipped’, and ordinary computation is resumed. When scanning, we may also find that the pointer in a chunk P points to the program data area; such pointers need no attentio~and are ignored. The strict separation of program data and the heap allows the garbage collector to detect efficiently with one compare instruction whether the pointer refers to a chunk in from-space or in program data. The pointer cannot point toa chunk in to-space, sinceP has just been copied from from-space, which contains no pointers to to-space, and no pointersitinhave been updated yet.
r ” ” “ ” “ -
from-space
I
I I
I I I I l I
Figure 5.18 Initial situation in two-space copying.
This algorithm does not need a stack to keep track of which chunks contain pointers that still must be traced. Instead it uses a simple scan pointer in the to-space that shows which of the copied chunks have been scanned and which have not. Figure5.18 shows the from- and to-spaces at the start of a garbage collection operation, Figure 5.19 shows a
from-space
" " " " " " " " " " r
I
I
I I
I I 1 I 1
, '?
:
I I I I I
I
'l
scan pointer
ure 5.19 Snapshot after copying the first level.
snapshot after the three chunks directly accessible from the root set have been copied and scanning started, and Figure 5.20 shows the situation after four chunks have been copied and one has been scanned. The forwarding pointers have been shown as dotted lines with hollow m o w heads, to distinguish them from the 'real' pointers; in an implementation, the 'copied' marker, not shown in the diagrams, serves to distinguish between the two. We see that the collector copies the graph in breadth-first order, and compacts the chunks automatically (i~evitably italso compacts the arrows in. the dia~ram!). It is also easy to see that after all chunks have been copied and scanned, thefrom-space contains no interesting data my more and can be considered empty. A copying garbage collector addresses the fragmentation overhead and time complexity of the mark and scan garbage collector. The key insight is to move chunks so free space will always be consecutive, which makes memory allocation very cheap, since it can be pedormed by advancin~a global pointer. The main problems with two-space copying garbage collection are thatit wastes half of the heap and thatits performance is poor if the heap is nearly full. Two-space copying is a very effective but also very demanding garbagecollectionalgorithm.It is a goodchoicewhen ~ n - t i m espeed is importantand ~ e is not~scarce,o This ~ is fre~uentlythe case with advanced p r o g r ~ n paradigms g
from-space
" " " " " " " " " " " r
I
I
I
root set
I
' I
I
l
I
l
I
scan pointer
Figure 5.20 Snapshot after having scanned one chunk.
on modern machines, whichis where we find two-space copying mostoften.
cti Since the size of the largest free chunk may be considerably smaller than the sum of the sizes of all free chunks, it is useful to move all used chunks together. Doing so moves all free chunks together also, so they can be combined into one ma~imum-sizefree chunk. The compaction algorithm described below can be performed in any situation in which there are only two kindsof chunks, chunks marked in use and chunks marked free, and the pointers in the chunks marked in use are all consistent. Since this is true even if some of the chunks marked in use are actually unreachable, compaction can, technically speaking, be done at almost any moment: it is largely independent of garbage collection and is just a free-list improving technique. Needless to say, it is most effective just after the garbage collector has run, and in practiceit is often integrated with the garbage collector. It is user two-space copying, though. action is most simply done in three left-to-right sweeps through memory, using one additional pointer per chunk. We assume that the used chunks will be moved to the
lower end of the heap. For clarity, Figures 5.21 through5.23, which depict the compaction as if compactionmovesthechunksfrom'old'memoryto 'new' process,aredrawn memory, but actually the 'old, and 'new' in the diagrams designate the same memory, The first sweep calculates the addresses of the new positions of the chunks, the second updates existing pointers to pointto the new positions, and the third actually moves the chunks:
-Address caZc~Zation,see Figure 5.21. The chunks are scanned from low to high and
for C its new positionaftercompaction is computed;theaddress eachusedchunk corresponding to this position is stored in C's administration. Since the new position of the first used chunk is known (the lower edge of memory) and the sizes of the chunks are also known, the address calculationis trivial. -Pointer date, see Figure 5.22. The program data area and the chunks are scanned for pointers that point into the heap; each such pointer to a chunk C is updated to the new address of C found in theadminis~ationof C. -Move c ~ ~ n see ~ Figure s , 5.23. The chunksarescannedfromlowtohighthrough memory and each used chunk is moved to its new position, as found in its adrninistration. Since chunks will only move to the left (or stay in place), this can be done with a single left-to-right scan. All pointers within the chunks now again point to the chunks they pointed to before the compaction started. All of the memory after thelast used chunk now formsa single free chunk.
old
new
f
1
1
2
1
""""""""""".I""""""I
igure 5.21 Address calculation during compaction.
d frees are utilized by an optimization called gen-space copying, it is based on the phenomenon that most allocated blocks fall out of use almost immediately; conversely, if a block has been around for a while already, chances are thatit will stay in business for some time to come, This suggests restricting the -expensive -action of the garbage collector to the most recentlyallocatedchunksonly.Especiallyin a compactinggarbagecollectortheseare
address new
of 1
address new
of 2
old
1"-J
J
" " " " " "
new
I I
1
1
2
1
""""""""""".I""""""I
Fig~re5.22 Pointer update during compaction.
I
" " " " " " " " " " " " " " " " "
old I""""~""~
new
Figure 5.23 Moving the chunks during compaction.
easy to find: they all lie at addresses past the end of the last compaction. Only if garbage collection on this newest generation fails to free enough memory or to meet some efficiency criterion, is a full garbage collection pedomed. Many variants of generational garbage collection exist, some of great sophistication and efficiency, but in its basic form it divides the heap dynamically into two regions, one for an older generation of chunks and one for a younger generation; the routine mal loc ( ) hands out chunks fromthe younger region. As usual, a third region also exists, the program data area. Now a certain large percentage of the garbage collector activationsrestrict themselves to the younger generation region, usinga simple trick. The trick is to consider the older generation region to be part of the program data area, and collect only the chunks in the younger generation region, using any garbage collection algorithm desired. Since the program data area can be seen as an input parameter to any garbage collection process and since it can in principle have any shape and content, it is obvious that this unusual view of what constitutes a program data area still yields correct results. The comp l e m e n t ~smaller percentage of the garbage collector activations are full garbage collections, in whichthe program data area and the heap have their traditional roles. These serve to reclaim any chunksin the older generation region that have b e ~ o unreachable. ~e
Algorithms differ in the answers they give to the questions of what the exact difference between older and younger is, whether more than two generations are distinguish~d, exactly when and how often a full garbage collection is done, and what garbage collection algorithm is used.Theyareoftencombinedwithotheroptimizingtechniques,most of which again exploit statistical propertiesof chunk usage. For example, it is known that very few pointers to young chunks are stored in old chunks and the program data area, so some algorithms keep a list of such pointers to be used as a root set for garbage collecting the young region. This avoids scanning the possibly large program data area and old region upon such occasions, but requires m o ~ i t o r i n ~ all pointer assignments, ene era ti on^ garbage collection algorithms are the fastest and most efficient collection algorithms known. For an example of a mature, h i g h - p e ~ o ~ a n cgarbage e collection system using generational garbage collection, see Ungar (1984.).
This concludes our treatment of memory management with explicit and implicit deallocation. In summary, explicit deallocation is implemented using linked lists of free chunks, and implicit allocation is implemented by computing the transitive closure of the directly accessible chunks and freeingall chunks not in thisset.
-All compilers and many compiled programs need dynamic memory allocation -access to memory chunks the sizesof which are not known statically.
-Usually, all dynamic memory is allocated from a single contiguous segment of ~ e m o r y , called the data segment or the heap.
-Allocatingrequestedmemorychunks
is an ad~nistrativematter;freeing m e ~ o r y chunks that are no longer needed is both an a d ~ n i s ~ a t i vmatter e and a p r o g r a ~ m i ~ g paradigm issue. -Dataallocationwithexplicitdeallocation by the programmer usually uses a mal loc ()Ifree () -like interface, Data allocation with implicit deallocation requires a formof garbage collection. -Memory fragments are called ‘blocks’ if they are in the hands of the user, and ‘chunks’ iftheyarehandledbythememoryallocator. A chunkcontains a blockplussome administration. -Memory is allocated in a region called the ‘heap’. It consists of a contiguo~ssequence of chunks, each chunk marked with its size. A bit tells whethera chunk is in use or free. -A call of mal l oc finds a free chunk of sufficient size and returns a pointer to it, after updating some ad~nistration.A call of free ( ) frees the chunks supplied to it, and may coalesce adjacent free chunks into a larger one. -For increased efficiency, the free chunks can be linked into multiple free lists, one for each size interval.
-~ o m ~ i ldata e r st~cturesthat require dynamic allocation are linked
lists and extensible ked lists of records of a given size can be in~plementedefficiently by batchrds in larger blocks and ~aintaininga free list of them, ~xtensiblearrays l~entedefficiently by increasing the array sizeby a constant factor. on with implicit ~eallocationis very usef~lsince explicit deallocation is very difficult for the programmer to do; explicit deallocation in generated code is often sodifficultas to be infeasible. Also, m e ~ allocation o ~ errorsarehardtofindand correct. collection should free all chunks that the program will not use any more, but r the c~unksthat are u~~~eachable from the program dat ounting garbage collection detects garbage chunks and scan garbage collection id all ~nreachablechunks. Two tion copies all reachable chunks to a different space, to e chunks to the free list leaves free memory fr co~pactionis needed to upgrade thefragl~entedfree list into a sin algol~thmscan be one-shot, on-the-~y,or concurrent. bage collector needs a root set and the pointer layout of all reachable achable pointers must be valid, point to somethin rt is needed for all thee of these features. ted from the p r o g r a ~data area; pointer layout follows from the compiler must provide both to the arbage collector in tbe comachieved in a ty~e-securelanguage by properly initia~izingail
-The pointer layout of chunks is known to the com~iler;it can be su~pliedto the garbage r in the formof a bit map, in theform of a routine, or it can be fixed.
can be se~f-descri~tive, in which case the pointer layout can be obtained from context. d each chunk itself, inter layout must be s ~ ~ ~ lbyi ethe data area i s c o ~ ~ l i c a and t ~ dd y n a ~ i c a l ~ y v ~ i a b l e . e the stack with ~bitraryactivation records and the working stacks nder construction in these activation records. be created by the compiler for every possible tivation record that can occurduring p ~ o g r aexecution, ~ andcodemustbegenerateatkeepstrack of i n is cu~rent. er of required pointer layouts for activation records may be reduced by using k only for simple expressions. It is then empty when the es not allow the root set andlor the pointer layout to be provided or to be ac~ieved,a conservative estimate can be made of the root set in each chunk: any value that looks like a ~ointeris y, thisworkscorrectly; it may still leavesome
-Actual garbage collectors are complicated by baroque language
features such as arrays of dynamical~yalterable size, unions with dynamic types which contain pointers, and exception handlers. The effects of aggressive code optimization can also be a compli-
arbage collection records ineach chunk the numberof pointers that point to it; when the number drops to zero the chunk is garbage and can befreed. -To keep the reference countof a chunk current, it must be incremented when a reference to the chunkis copied and decremented when sucha reference is destroyed. - hen reference counting frees a chunk, the reference counts of all chunks it refers to mustbedecremented,and so on, recursively. A recursiveimplementationcanbe achieved using pointer reversal. -Reference counting cannot free unreachable circular data st~ctures.Also, monito~ng all pointer manipulation may be expensive. -Mark and scan garbage collectionis the: only algorithm thatfrees all memory that can be freed. It is often combined with compaction, and then also provides the largest possible chunk of memo^ available, The marking phase marks all chunks that are still reachable; the scan phase scans the allocated memory andfrees all chunks that are not marked reachable. -~ a r k i n gneeds a stack of u n ~ o w nsize. This stack can be distributed over the chunks, with each chunk having room for a parent pointer in the marking process. The scan phase can combine adjacent chunks to improve the quality of the free list. -The overhead of the room for the parent pointer can be eliminated by using pointer reversal, also called the Schorr and Waite algorithm. In pointer reversal, while visitinga child D of a chunk C, the parent pointer of C is stored inthe location in C which held the pointer to L). The visit toD returns a pointer to D, which is used to restore the pointer in C by swapping it with the parent pointer. This restores the parent pointer, which can then be swapped withthe location of the next pointer inC. Intwo-spacecopying, the availableheap space is divided into twoequalparts:the from-space and the to-space. New chunks are allocated in from-space by advancing a pointer. Upon garbage collection, all reachable chunks are copied from the to the empty to-space, while updating all pointers, including those in the pr area. -Two-space copying copies the graph of reachable chunks in breadth-first fashion: first the chunks that are reachable directly from the program data area, then those reachable in two steps from the program data area, then thosereachable in three steps, andso on. -When a chunk is reached for the first time, through a pointer P,the chunk is copied to the first free position in the to-space, and a forwarding pointer is left in the ori pointing to the copy; the pointer to the copy replaces P, When a chunk is reached t ~ o u g ha pointer ,no copying takes place and the forwarding pointeris used to replace
-
-
Q*
-When
all reachable chunksinthefrom-spacehavebeencopiedtoto-spaceandall pointers have been ~pdated,the roles of from-space and to-space are flipped, and ordinary computationis resumed. -Two-space copying does not needa stack; it avoids the fragmentation overhead and time
complexity of the mark and scan garbage collector. Its main problem is that it wastes half of the heap. It is a good choice when run-time speed is i l ~ p o ~ a and n t memo^ is not scarce. The size of the largest free chunk may be considerably smaller than the sum of the sizes of all free chunks; this is calledfragmentation.Compactionmoves all used c h u n ~ s together; this moves all free chunks together also, so they can be combined into one ma~imum-sizefree chunk. ompaction is a free-list improving technique rather thana garbage collection technique and is largely independent of garbage collection. It can, technically speaking,be done at almost any moment, butis most effective right aftera garbage collection. C o ~ ~ a c t i oisndone in three left-to-right sweeps through memory: thefirst sweep calculates the addresses of the new positions of the chunks in the heap; the second updates existing pointers in the program data area and the heap chunks to point to the new positions; and the third actually moves the chunks. ~enerationalgarbage collection restricts the expensive action of the garbage collector to the most recently allocated chunks only,since these are the most short-lived. In its basic form, generational garbage collection divides the heap dynamically into one ~egionfor an older generationof chunks and onefor a younger generation. ~enerational arbage collector activationsrestrictthemselvestotheyoungergenerationregion by onsidering the older generation region to be part of the program data area. The algor i t h ~then automatically collects only the chunks in the younger generation region. ~enerationalgarbage collection algorithms are the fastest and most efficient garbage collection algorithms known. n s u ~ m a r yexplicit , deallocation is implemented using linked lists of free chunks, and implicit allocation is implemented by computing the transitive closure of the directly accessible chunks and freeingall chunks not in this set.
tailed infor~ationabout the garbage collection algorithms described here and many othcan be found in the book Garbage Collection -Algorit~msfor Automatic ~ y ~ , a m i c mory ~ a n ~ g e m e by n t Jones and Lins (195%). An extensive bibliography on garbage ollection and related topics is supplied by Sankaran (1994), and an interesting survey is ivenbyCohen(1981).Journals to consult are, for example, ACM S I G ~ ~ o ~ ~ c e~s , ~ r a n s ~ c ton i ~~~r so g r ~ m ~~i n g u a g and ~ s ~ystems,and S o ~ ~ r~ e ~ ~ cand t i ~ e ~x~~rience. dea age collection has a long history starting in the 1960s with the development of . The original paper on LISP( ~ c c a r t h y1960) , describes mark and scan garbage collection, called ‘recl~ation’there. The early garbage collection algorithms, such as reference counting, imposed a considerable overhead on the execution time of an application, which led to the (~s)conceptionthat garbage collection is more expensive than manual memory manage~ent. Recentstudies,however,haveshownthatadvancedalgorithms such as generational copying garbage collection perform roughly as well as, and sometimes even better than, manual memory management (Appel,1987; Barrett and Zorn, 1995).
Recent advances have been reported in the Conference ~roceedingsof the AC ~I~~~~ PLAN Inte~ationalSymposiumonMemory ~anagementISMM '98, Notices, Vol. 34, No. 3, March 1999, and in the more algorith~callyoriented c o ~ p u t e r science journals. Much material can also be found in the ~ ~ o c e e ~ i ofthe n g s Inte~~atiun~Z ~ o r k s h on o ~ er no^ ~ ~ ~ a g e r n1992, e n t published as Springer's Lecture Notes in Computer Science No.637. ~~~
5.1
Modify the outline code for the basic Nalloc given in Figures 5.2 and 5.3 for the situation that the least common multipleof the align~entrestrictions of the machine is 32 and the A ~ m i n i ~ t ~ a t isize o n is 4.
.2
In Section 5,l. 1 we suggest calling the garbage collector, if there is one, whenever Mal l o c ( ) runs out of space, hoping that it will free some unused chunks the user has forgotten to free. It would seem, however, that this is incorrect, since the user could free some of these chunks later on in the program, and these chunks would then be freed twice. Why is this not a problem?
.3 In languages with explicit memory deallocation, memory leaks can bea major prob-
lem.Memoryleaksoccurwhenthe programer forgetstodeallocate a block of memory that is no longer used. What can be done to address this problem, apart fromusingagarbagecollector?AnswersaresuppliedbyHastingsandJoyce (1992), Barach, Taenzer and Wells (1982) and in the answers section of this book. (a) Using a loop of calls to mal loc () ,each requesting say l Mb, find out how much memory one can obtainon your system. (b) By timing the access times of random bytes in the chunks thus obtained, find out if the access times of the chunks are all equal. If they are not, what could be the cause? In Section 5.1.2 it is claimed that returning blocks to the standard memory m a n a ~ e ~ would require much administration. Why is it not enough to have a single counter per block, which holds the number of busy records for that block, and to return the block whenit reaches O? 5.
W e r e , in the construction of assemblers as explained in Section 4.3, can one make good useof extensible arrays?
5.7
Name some properties of garbage collection algorithms that make bage collection very hard.
concu~entgar-
A C programer (hacker) decides to exploit the fact that addresses on the target
machine are always less than 2**31, and uses the most significant (32nd)ofbitsome , code like: pointer variables asa flag bit, to store some usefuli n f o ~ a t i o nusing
i n t x, *p; /* s e t flag b i tt h u s : */ = ( i n t *) ( (int) p 0x80000000); /* d e r e f e r e n c et h e pointer t h u s : */ x = * ( i n t *) ( ( i n t ) p & Ox7fffffff);
What will this do to the garbage collection algo~thms?Will conservative garbage collection still work? In
9 ,
thereferencecount of thechunkunder is incrernented before that of decre~ented. What c o ~ ~ con l d ivably go wrongiftheorder consider the assign~entp :=p.
~ e n rec~rsive t descent e s t ~ ~ c t u shown re in t is the comp~exityof nt the Schorr andWa
your irnplementationon
a large complexity of the algow about~ e ~ o i z a t i o n ? algorith~.Answer the same questions
ing.Test
orithm for l~arkingagraphwithoutusinganunbounded stack: use a b o ~stack ~ as ~ exp~ained e ~ by uth (1973) in his Art of ~ o ~ ~Pro~ ~ t e orithm G on page 415. It works as follows. Say, the stack is limited Whet? the stack ove~lows,the oldest entry is discarded, leavin 100 most recent return addresses. When the stack becomes ex austed (under~ows) oritbrn R. The arki in^ a1 o r i t h ~now consists s no more nodes to~ n a k . h will be marked bya single runof the algorith~ A? orithm; hint: you need one more markin than for the ~nboundedstack i~plernen~~tion. JVhat is its complex it^? ility to limit the stop-and-go behaviorof the mark and scan algorithmis to scan phase incrementally. After the mark phase, weno longer perform a scan of the complete memory, but rather modify theMal loc () code in Figures 5.2 .3 to scan until it finds a free chunk of suitable size. Give outline code that ~mp~ements this i ~ c r e m ~ n t a l s c h e ~ e .
onsider the pointer reversal al~orithmof Section 5.2.4.3 for bin ~ o r yfor , e ~ a ~ pon l ea C in the finished inter with ablebe atonode, t we cannot write that pointer in read-only memory. Suppose now that the tree stores the ‘exclusive or’ (0)of the left and right pointers and the parent pointer, rather than the left and right pointers the~selves.When returnin for the last time to a node, we can then find the arent pointer as follows. The right pointer entry holds r e n tp o i n t e r , and since we know right there!), we can c o ~ p u t the e parent pointer:
right
pointerG3 parent pointer
= parent
the equalities A
pointerG3 right
~ =~ ~
pointer
~ =~ OOB A = A B.O Turn~this idea into a complete
C o ~ p a c t i has o ~ occasionally been ch~acterizedas ‘one-space copying’, in analogy e s differences. to two-space copyin . Describe the s i ~ i l ~ i t iand
c is to^
of ~ e ~ ~~ n ~o ~Study e~ Collins’ ~ ~ npaper t : (196O), which i ~ t r o d ~ c e s reference co~nting,and write a s u ~ of ~it in ymodern ter~inology.
n the previous chapters we have discussed general methods for performing lexical and syntaxanalysis,contexthandling,codegeneration,andmemorymanagement,while disregardingthe p r o g r a ~ i n gparadigm from whichthesourceprogramoriginated.In doing so, we have exploited the fact that much of compiler const~ctionis in~ependentof a Java compilerdiffersconsiderablyfrom a Prolog thesourcecodeparadigm.Still, compiler:bothrequire p~digm-specifictechniques, in additiontogeneralcompiler construction techniques. We will explore these paradigm~s~ecific techniques in the next fourchapters.Figure6.1showsthenumbers of thechaptersandsectionsthatcontain ater rial on the subjects of lexicalandsyntacticanalysis,contexthandling,and code gelleration for programs in the four paradigm covered in this book. For each of the three subjects we have already considered the general methods; the p~adig~-specific methods are covered as shown in the table. There are hardly any p~adigm-specificlexical and syntactic issues, so the general treatme~tin Chapter 2 suffices. One exception is the ‘offsiderule’ in some fu~ctional languages, in which the text layout plays a role in parsing; it is described in Section 7.1.1. WO aspects of context handling are almost universal: identification and type checking; of the rather than dedicating a separate chapter to these, we cover them at the beginning present chapter, in Section 6.1. The p~adigm-specificcontext manipulations are described where neededin Chapters 6 through 9. The main differences in compilersfor different paradigms lie in the kindof code they enerate, and this is the primary reason for having a separate chapter for each of them. ost (but not all) compilers for imperative and object-oriented languages produce code at the level of an assembler or lower, whereas many (but by no means all) compilers for func-
identification &
6.1
F i ~ ~ r6.1e Roadmap to p~adi~m-specific issues.
tional, logic, and parallel and distributed languages generate C or C++code as their target code. The general code generating techniques have been discussed in Chapter 4, and the specific foms of code to be generated for the four paradigms are treated in Chapters 6 through 9. ~ e t u ~ i to n gthe structure of the present chapter, we first discuss identifier/operator identification and type checking; operator identification requires special attention since in most languages operators are overloaded, andthe context is needed to make the final identification. The rest of the chapter concerns code generation for the imperative and objectoriented languages. Since programs in these languages work by specifying direct instructions for the manipulation of explicit data, and since both the instructions and the data are relatively close tothose of thehardware,themainissuesin code generation for these languages are the mappings of source language statements to low-level instructions and of source language data onto bytes. These issues are covered in Sections 6.4 and 6.2, respectively. An inter~ediateposition is occupied by the activation records and closures ( tion 6.3). Activation records are data structures used by the run-time system to repre~ent active (‘~nning’or ‘suspended’) routines; closures are a represe~tationof routi~esthat allow some advanced operations on them, most prominently currying. Both combine data and flowof control. Althou~hfunctional, logic, and paralle~distributedprograms often require data structures that are not direct mappings of programer data and instruction sequences thatdo not derive directly from the program, much of the following discussion is still relevant for the more advanced paradigms.
Roadmap
6 ~ ~ p e r a t i vand e object-or~e~ted pro 6.1 Context and ling 6.1.1 ~dentification 6.1.2 Type checking
6.3.2~ o u t i n e s .3~ p ~ r a t i o on n s routi~es
6.3.4 c on-neste~ routines .3.5Nested routines .3.6Lambda lifting
.3.7 ~ t ~ r a t oand r s ~oroutines
6.4.1 Local flow of control
6.5.2 Module initializati~~n .3Code g e ~ ~ r a t fbr i o ~generics
The context handling phase follows the lexical and syntactic phases. Th c e ~ e dwith local pheno~enaand relate each item to its jmmediate neighbors only: for e x a ~ ~ llexical e ? a~alysiscombines letters withadjacent letters andits i ~ t oidentifiers, x analysis c o ~ ~ i n adjacent es identifiersandoperatorsintoressions.Context on the otherhand, is c o n c e ~ ewith ~ lon~-distancerelations: for e x a ~ p l e ,it type of a variable in a declaration to its use in an expression and relates the position of a label to its use in a stateme~lt. The connecto~sinthese aretheidentifiers: all applie c c u ~ e n c eof~ an i~entifieri in an e i o i, ~ and from that socket they obtain infor~ation which fit in one socket, the d e c l ~ ~ tfor ,lifeti~e,and so on. first task of the context ase is to take the a n ~ o t ~ t esyntax d tree and to find the c~urrenceof each appli occurrence of an identifier or operator in the p r o g r a ~ . That done, the context phasecan turn to its second task, perfor ing on each node of the tree? as specified by the rules in the lang areconcernedwithtyp ition ex~ressionin an if-stat a jump to a label ins f o r - s t a t e ~ e from ~ t a goto perform the final steps in syntax checkiven in the ma~ualmay forbid records y more use~-frie~dly to allow such records in the
parser and to issue the error message ‘No fields found in record type declaration’ during context handling than to exclude them syntactically and issue the message ‘~nexpected closing parenthesis’, or something similar, durin Context handling is sometimes also called which emphasizes the for correct meaning se. more that than Of course, correct m e ~ i n gcannotbecheckedinany fo a1 senseoftheword,but often useful w ~ i n g s about dubious ~ e a n i n g scan be given. amples are the detection of unused variables and routines, infinite loops, non-trivial exp ions that have a trivial result (for example EXPR >= 0 , where EXPR is anexpression of type unsigne~integer),andmanyothers.The problem with thisis that much of this infor~ationbecomes available more or less autom ically during code optimization. Issuing the w a i n S in that phase, however, has the a wardpropertythatthe ~ e s s a g e sfail toappearwhentheprogram is compiledwithout opti~izations,but duplicating part of the code opti~zationeffort in the context handler ivethepropermessages is notattractiveeither. The reasonablecompromise doing the analysis and issuing the messages in the context phase and saving the resultsfor the code generator -complicates the interfaces of all the phases between the context phase and the consumerof the i n f o ~ a t i o n ,Such are the worries of the compiler designer, We will now first turn to two context handling issues that play a direct role in static correctness c h e c ~ n gin most languages: identification and type checking.
At a conceptual level, identification is the process of finding the defining occurrence of a givenappliedoccurrence of m identifieroroperator.The o c c ~ r r e n cof~an identifier is the place of its main, and usually only, introduction. This introduction supplies information about the identifier: its kind (whether it is a constant, a variable, a module, etc.), its type, possibly an initial or fixed value, possible allocation properties, etc. The other occurrences of an identifier are its pro lie o ~ c ~ r r e n and c ~ are s the consumers of this information. For example, in month: integerRANGE [l..l21 ; *.. month := 1; ~ H I month ~ E c = 12 DO print-string (month-n~me[month]) ; month := month c 1; DONE ;
the month in the first line is the defining occurrence, the others are applied occurrence This formulation of the problem does not cover all cases. Some languages all tions, which result in identifiers having more than one introduction. Other languages do not require ide~tifiersto be introduced or declared at all; those la have the information about the identifiers built in or the informationis distributed over all the applied occu~ences:type and other properties ofan identifier follow from its form andlor use. At the compiler constructionlevel, these differences disappear: thereis an i n f o ~ a t i o n entry for each named item in the program, in which all information about this item is col-
lected. It is the task of the identification process to connect all occurrences of an identifier to its proper entry. Infor~ationcan then be inserted in and retrieved from this entry as desired. As explained in Section 2.1 .l the l, i n f o ~ a t i o nbase in which all the entries are kept is called thes y ~ b table ~ l or ~ a list. ~ e Not all identifiers must be looked upin the same set of entries: for example, v~riables must be looked up among the local and global identifiers, field selectors must be looked up the field selectors of a given type, etc. Each of these sets of entries defines a ,and the syntactic position of an identifier in the program determines which space it belongs in. The precise rules follow from the language manual, but usually identifiers in the syntactic positions of v able names, routine names, and some others live in onenamespace,the ~ e ~ ~~~e e r ~ ee,~ andfieldselectorsliveinspecialnamespaces belongillg to record types. implies that one can have a variable name i next to a field name i,~ i t h o uthe t g in each other’s way,as shown in theC code fragment struct one-int { int i; 1 i;
...
i.i =
3;
The first i in i .i is looked up in the general name space, leading to its identificationas a variable of type str u c t one-int, The second i is then looked up in the special name space of the field selectorsof the type struct one-int . Specific questions about the name spaces in a particular language must be answered by the lan~uagemanual. Examples of such questions are whether labels live in the general name space or in a special label name space, and whether module names have a name space of their own. In principle any position that can be distinguished syntactically or contextually can have its own name space, C has three main name spaces, one cont~ningthe names of enums, structs, and unions, one containing the labels, and one containing all the rest;thelatterincludesvariableidentifiers,routineidentifiers,typeidentifiers,and enumeration value identifiers. In addition, C has a name space for each struct and each union; these name spaces contain only the field selectors of the co~espondingstructs and unions.
ome name spaces, especially those in block-st~cturedlanguages, are scope-structured. These scopes work in stack fashion: there is a stack of scope elements, one for each scope entered, and all actions are performedon the top element(s). The rules are simple:
a new empty scope element is stacked upon scope entry; declared identifiers are entered in the top scope element; applied identifiers are looked up in the scope elements from top to bottom; and the top element is removed upon scope exit, thus removing all declarations from that scope.
A naive i ~ p l e ~ e n t a t i oofna scope-structured name space is shown in Figure 6.2; on
the left is the scope stack, on the right the linked lists of declaration information records, one for each name declared in each scope. The capital P in a record for a name stands for the properties attached to the name in the declaration.
wrong
r
Itright
level
l
It
...
4 3 ..*
2 1
0
stack scope
1L
..* properties name
Figure 6.2 A scope-structured symbol table.
Five levels of scope have been shown; they represent a possible constellation for a program: level 0 is the library level, level 1the program (routine declaration) level, the formal parameter level, level 3 the local variable level and level 4 a subblock. symbol table might result from the C code in Figure 6.3. This set-up allows easy insertion of new names and easy removal of entire scopes; identifiers can be foundby perfor~inga simple sequential search. The organization shown in Figure 6.2 is simple and intuitive, but lacks two features essential for use in a practical compiler: it has no provisions for name spaces and it does not use the fast symbol table access discussed in Section 2.1 .l l. We have seen there that some identifier identification must already be done between the lexical scan and the parser, long before anything about scope and name spaces is known. This identification is based on the routine Identify (Name ) ,which yields a pointer to a record of type Ident f - i ier inf 0, which gives access to all i n f o ~ a t i o n about the identifier Name. The speed of access is obtained usinga hash table. There are several ways to combine hash table identification, name spaces, and scopes in one implementation. We will discuss one possibility here, suitablefor C-like l a ~ ~ u a ~ e s ; for a partially alternative implementation^ see Exercise 6.2.
void rotate(doub1e angle) {
1
f . .
void paint (intleft, int right) { Shade matt, signal;
... { Counter right, wrong;
C code leading to the symbol cable in Figure 6.2.
6.5 together show an implementation that utilizes the easy access to hash table e x ~ ~ a i n eind~ e c t i o n2.1 11.1 ,provides a fixed n ~ ~ bofe r t of a single field selection, and allows efficient scope stack opera,the d i a g r a ~ sdepict a subset of the symbol table from Figure 6.2: on part: the Identification info recordsare cessed by the hash table. The name spaces are imp~emented as
sible macro definition of I; in C, such a macro definition takes e~nitions.The decl field points to a stack of declarations o f I eral name space, i~plementedas a linked list of declaration i n f o ~ a t i o nrecords. aration informat~o~ record is marked with its scope level, and the linked list is top record is the identification sought: the declaration infoma-
i~plementsthe rest of the scopestack: its structure ~irrorsthe ori cords pointed to by ion info records r e p r i ~ use ~ yof this s scope on top of the stack is to be ist of id~ntifiersdeclared in that e d e c l ~ a t i o ~as ~ ss, h o ~ nby the ment conclude the operation; the These data s t r ~ ~ t u r eallow s fast and easy addition of identifier declarations andthe
Id.info
name hash table
llpaintll
,4 A hash-table based symbol table.
removal of entire scope i n f o ~ a t i o nsets. ~ i t h o u these t data structures,scoperemoval would involve scanning the entire symbol table, which may contain hundreds or thousands of identifiers, to find the ones that are declared on the level beto removed. As we haveseen,recordandmoduledeclarationscreatenamedsubscopeswhich S live in scopes. In idf sel,idf is first looked up as an identifier in the idenives a pointer to a definition of id€,which among others holds a pointer to a record It describin the type of idf. i r , which may be defined in a different (older) scope than idf itself, is then tested to see if it is a st~cturedtype. If it is not, the e~pressionidf .se1 tries to select from a type that has no selectors and an error message must be given. In one possible imp~ementation,shown in Figure 6.7, T provides a pointer to a list of records describing selectors, in which the selector se1 can be looked up by se~uentia~ search. This will then lead to the type of the field. In Figure 6.7, it would be contained in property partP of the second record. Removing a level that holds definitions of st~ucturedtypes entails r e ~ o v both i ~ the ~
di.n f o
( wrong )
Id.info("right")
d ,i n f o ( I t s i g n a l
level
4
3
info d.
(" p a i n t )
I( d " r. iont faot e l f )
2
.*.
1
i n fdo.
( " p r i n t f If
i n fdo.
0
( s i g n a l I* )
..*
scope stack
igure 6.5 Scope table for the hash-table based symbol table.
PROCEDURE Remove topmost scope0 : SET Link pointer TO Scope stack [Top level]; WHILE Link pointer/= No link: // Get the next Identification info record SET Idf pointer TO Link pointer .idf-info; SET Link pointer TO Link pointer .next; // Get its first Declaration info record SET Declaration pointer TO Idf pointer .decl; // Now Declaration pointer .level= Top level // Detach the first Declaration info record SET Idf pointer .decl TO Declaration pointer .next; by Declaration pointer; Free the record pointed at level] ; Free Scope stack [Top SET Top level TO Top level - 1; Figure 6.6 Outline codefor removing declarations at scopeexit.
type definitions and their fields. Thisis not particularly difficult, but requiressome care.
type record
ure 6.7 F i n d i ~ g ” ttype ~ e of a selector.
If the source language manual states that all identifiers in a scope hide the identifiers with the same names in older (more outer) scopes, identificationis relatively simple: the definition record pointed at by the decl field in Figure 6.4 provides the definition sought. In a language with overloading, however, identifiers do not hide all identifiers with the same name; for example, PUT ( S : STRING) doesnothide PUT (i : INTEGER) inAda. Also, in many languages the operators are overloaded: the + in 3 + 5 is different from the one in 3 .l + 5 .l,sincethefirsthastwointegersasoperandsandthesecond operates on two floating point numbers. The ~ ~ b i g u i caused ty by the overloading is resolved by considering the context in which the name to be identified occurs. There are two issues here. The first is that in the presence of overloading the identifier identification process comes up with a set of definitions, rather than with a single identification. These definitions are selected from the list of definition^ supp~iedby the decl field in Figure 6.4. The rules for selecting such a set of definitions from the list of all definitions depend on the source language.The second issue then concerns the reduction of this set to a single definition. This reduction process is again based on language-specific rules that consider the context of the identifier. If, as a result of the reduction process, no applicable definition remains, the identifieris undefined; if more than one definition remains, the identifier is ambiguous; and if exactly one definition remains, it isthe definition of the identifier in the given context. Suppose, for example, that the compiler needs to identify the PUT in the Ada statement PUT ( He1lo It ) . In Ada, routine identifiers can be overloaded and the overloading has to be resolved by considering the types of the parameters. The compiler then calls Identify ( PUT It ) to obtain a list of definitions of PUT ( ) . Now assume the scope stack
level 1: procedure procedure
PUT PUT
= P64 ; (STRING) = P57; (MATRIX)
level 0: procedure PUT (INTEGER) = P48 ; procedure PUT (STRING) = P33 ; procedure PUT(FILE-TYPE/ INTEGER)= P27; procedure PUT(FILE-TYPE/ STRING)= P14;
Here PN,with N a number, represents the propertiesof the declaration in the symbol table; also,theotheridentifiersinthescopestackarenotshown.The call Identify ( PUT 1 willthenreturnthe list procedure PUT (STRING) = P64 ; procedure PUT(~TR1X)= P57; = P48 ; procedure PUT (INTEGER) procedure PUT(FILE-TY~E/ INTEGER) = P27; procedure PUT(FI~E-TYPE, STRING) = P14;
Note that the definition procedure PUT (STRING) = P33 is not included: it is hidden by the definition procedure PUT (STRING) = P64,since the routine name and theparametertypesareidentical.Goingthroughthislistwiththeparameter of type STRING of the call, the compiler rejects the last four entries because of differences in number and types of parameters, but keeps the first entry, P64. So only one definition remains and the call is correct. A call PUT (MY-OUTPUT , 3) will identify P27 , and the PUT in a third call PUT (MY-OUTPUT , 3 .1 4 ) would fail to be identified. A considerable additional complication is that the types of the actual p~ametersthat help determine the definition of a routine name can themselves be the result of overload resolution. The problem is even more severe when identification is combined with coercions; that situationis discussed in Section 6.1.2.3. Now that we have seen how we can still identify names in the presence of overloading, we consider its consequences for removing the top scope upon block exit. The main change is that there cannow be more than one definition with the same level in the definition record chains in Figure6.4. It is easy to remove them all rather than just the first one, when visiting the Identifier info record in the code in Figure 6.6.
Some progra~minglanguages feature constructs thatintroduc~copies of scopes from other places. Examples are the scope resolution operator in C++, the with-statement in Pascal and Modula-2, and the import declaration in Modula-2, Ada, and other languages with modules or packages.Thedetails of thesescopeimporters differ fromlanguageto language. In C++, a scope resolution operator x ::preceding a function definition introduces the field selectors of the class x in a scope around that function definition. The Pascal and ~ o d u l a - 2construct WITH x DO ... END is similar: it starts a newscopein which the field selectors of the record x are available as identifiers in the general name space. The FROM module IMPORT declaration of Modula-2andthe use declaration of Ada, however, do not introduce a new scope but rather merge the new names into the
present scope. Name clashes must be prevented in Modula-2 by explicitly selecting the imported narnes, and are avoided automatically in Adaby the overloading mechanism and the visibility rules associated with the u s e declaration. In all cases the result is that, with some identifiers,the programmer canjust write S e l,instead of i d f .S e l,or more importantly PUT ( " T e x t It ) instead of TEXT-IO .PUT ( " T e x t ) . e obvious im~lementationof this feature is of course to retrieve the scope S to be imported, and to collect all names in it. Next, this collection is filtered as required, and a new scope is created and stacked if the semantics of the importing construct prescribes so. Each name N is then defined in the top-level scope in the usual fashion, with its properties derivedfrom its entryin S. The timerequired by thistechnique is propo~ionaltothe number of names imported.
The previous section has shownus that looking upan identifier leads toa declaration information record a cont~nerfor the properties of the object that the identifier identifies. The most basic and obvious, but therefore somewhat inconspicuous, propertyis the the object -whether it is a constant, a variable, a routine, a type, a module, etc.; first step in checking the proper use of the identifier is checking this kind. Kind checking in itself is almost -but not entirely -trivial: for example, when the context requires a module i~entifier, itis simple to test if a module identifier has been supplied, using the decl~ationinformation. Some of the kinds, however, specify objects that i for exampleconstants,variables,androutines.Thesevaluesbelongto t values and operations -and their use is subject to often complicated and and has to be checked: type checking is not at all trivial. We will therefore first concentrate on type checking, and address oneissue in kind checking at the endof this section. Type checking is involved in large parts of the annotated syntax tree. For example, the language rules usually specify which types can be combined with a certain operator; there are rules for formal and actual parameter types in a routine call; and assigning an expression value to a variable restricts the allowed typesfor the expression. Type infomation in a compiler has to be implemented in such a way that all these and many other checks can be p e ~ o ~ conveniently. ed Types are usually introducedby name through ty
-
TYPE Int-Array = A R W Y [Integer l..lO] OF Integer;
which defines the type I n t a r r a y , but they may also be introduced ~ ~ o n y ~ o ~ as sinl y , the following variable declaration: Y [Integer l. .lo] OF R e a l ;
When a type is introduced anonymously, it does not have a name in the source program, but for uniformity it needs one. Therefore, the compiler must use its imagination to invent a unique name, for example producing such internal names as# t y p e 0 1-in-l i n e - 3 5 . So the above declarationof a is seen as an abbreviationof
TYPE #typeOl_in-line-35 = ARRAY [Integer l..lO] OF Integer; VAR a: #typeOl-in-line-35;
Type declarations often refer to other type identifiers. In some languages they are allowed to refer to identifiers that have not been declared yet; such references are called ces. Forwardreferences enable theusertodefinemutuallyrecursive types, for example: TYPE Ptr-List-Entry = POINTER TO List-Entry; TYPE List-Entry = RECORD Element: Integer; Next: Ptr-List-Entry; END RECORD;
However, forward references writer:
also add some complications to the work
of the compiler
-The forwardreferences must be resolved. When, during processing,a forward reference is met, it is added to the symbol table, marked asa forward reference. Next, when a type declaration for thisforwardreference is met,itssymboltableentry is modifiedto represent the actual type insteadof a forward reference. -A check must be added for loose ends. At the end of a scope in which forward references occur, the forward references all must have been resolved if the language manual says so (and it usually does), This check can be implemented by checking all symbol table entries of the scope for forward references. If a forward reference is met, it is a loose end, and must be reported. -A check must be added for circularity. The consequence of allowing forward references is that the user cannow write a circular type definition: TYPE X = y ; TYPE y = X;
which probably is illegal. We will see below how to deal with this problem.
e Alltypesin a compilationunitarecollectedin a ty le, with a singleentry for each type. For each type, the type table entry might, among others, contain the following:
-its type constructor (‘basic’, ‘record’, ‘array’, ‘pointer’, and others); -the size and alignment requirementsof a variable of the type;
-the types of the components, if applicable.
Various information is being recorded for types:
-for a basic type: its precise type (integer, real,etc.); -for a record type: thelist of the record fields, with their names and types;
-for an array type: the numberof dimensions, the index type(s), and the element type;
-for a pointer type: the referenced type; -for other type constructors: the appropriate information. The type table entry must contain all the type information required to perform the type checking and code generation. The exact content therefore dependson the source language at hand. The representation in the compiler, on the other hand, depends on the implementation language, the language in which the compiler is written. In an imperative implementation language, the representation of a type usually is a record with a variant part (or a union) for those items that depend on the type constructor. In an object-oriented implementation language, a type would be an object class with all fields and methods that all type constructors share, and each type constructor would have its own subclass, with extensions specific for the type constructorat hand. To demonstrate how the type tableis built, let us now consider a simple language with record and pointer as the only type constructors, and one built-in type: integer. Also, for the sake of si~plicity,let us assume that there is only one, global, scope for type identi~ers.In this language, the built-in type integer is a predefined identifier indicating an integer type. The compiler places the identifier in the symbol table and its type in the type table, before it starts processing the input. This results in table: symbol table:type
TYPE
0:
INTEGER
ttintegerlt : TYPE
0
where the type table is indexed by values of the form TYPE n (implemented as integers or pointers) and the symbol tableis indexed by strings, as explained in Section2.1.11.1. Because of possible forward references,it isnot possible tojust build the type table as input is processed. One possible solution is to add identifier references to the type table, which must be resolved when all input has been processed. For example, let us process the following type declarations: TYPE a = b; TYPE b = POINTER TO a; TYPE C = d; TYPE d = C ;
Processing the first type declaration, table and symbol table:
able:symboltable:
TYPE a = b; ,might result in the following type
type
TYPE 0 : INTEGER "integert1: TYPE 0 TYPE 1: ID-REF l t b t t . TYPE l "b" : ~ D E F I ~ E D - T Y P E
Then, processing TYPE b = POINTER TO a ;results in another type table entry for a (becausereferencestoidentifiershavenotbeenresolvedyet). So, weadd TYPE 2 : ID-REF It arttothetypetableandprocess TYPE b = POINTER TO TYPE 2 . The ~ D E F I N E D - T Y P E symbol table entryfor b is now resolved, and we obtain:
ble:
ble:
symbol
table:
type
TYPE 0 : INTEGER TYPE 1: ID-REF r l b l l TYPE 2: ID-REF Irac1 TYPE 3: POINTER TO TYPE2
llintegerll : TYPE : TYPE l I1b": TYPE 3
0
The last two lines of the input are processed in the same way, and our type table and symbol table now look as follows: symbol
table:
type
TYPE TYPE TYPE TYPE TYPE TYPE
0:
1: 2: 3: 4: 5:
INTEGER ID-REF i l b l l ID-REF Irat1 POINTER TO TYPE 2 ID-REF I'df1 ID-REF llcll
"integer": TYPE
0
r l a l:l TYPE l "b" : TYPE 3 r t C 1 l : TYPE 4 r l d l: l TYPE 5
Nowthattheinputhasbeenprocessed,allidentifierreferencesmustberesolved, There are two reasons for doing this, the more important one being cycle detection. The other reason is that it is convenient to have each type referred to by a single, unique, index in the type table. The first step in resolving the identifier references is to replace the identifier references in the type tableby type references: a reference to an entry in the symbol table which refers to an entry in the type table is replaced by a direct reference to this type table entry. Our modified type tablenow has the following contents: type table:
TYPE 0 : INTEGER TYPE 1: TYPE 3 TYPE 2: TYPE 1 TYPE 3: POINTER TO TYPE2 TYPE 5: TYPE 4
Now, cycles can be detected by the closure algorithm presented in Figure 6.8. The algorithm also resolves type references to type references: when it has finished, a type table entry can still be a TYPE,but the entry it then refers to is no longer a TYPE -unless it is a cyclicdefinitioll.Thealgorithmconstructs a set Cyclic containing all typetable entries desc~binga type with a cycle. After executing this algorithm, our type table looks as follows:
type table: TYPE 0 : INTE~ER
TYPE 3: POINTER TO TYPE 2 TYPE 4: TYPE 4 TYPE 5: TYPE 4
and the Cyclic set contains both TYPE 4 and TYPE 5 . For a11 members in the Cyclic set, an error message must be produced. Also, these type table entries can be replaced with a special ERRONEOU~-TYPE entry,
Data definitions:
1. Let T be a type table that has entries containing either a type description or a reference (TYPE) to a type table entry.
2. Let Cyclic be a set of type table entries.

Initializations:
Initialize the Cyclic set to empty.

Inference rules:
If there is a TYPE type table entry t1 in T and t1 is not a member of Cyclic, let t2 be the type table entry referred to by t1.
1. If t2 is t1, then add t1 to Cyclic.
2. If t2 is a member of Cyclic, then add t1 to Cyclic.
3. If t2 is again a TYPE type table entry, replace, in t1, the reference to t2 with the reference referred to by t2.

Figure 6.8 Closure algorithm for detecting cycles in type definitions.
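The following is a minimal C sketch of the algorithm in Figure 6.8, assuming the type table is an array of tagged entries; the names TypeEntry, TYPE_REF, and detect_cycles are illustrative assumptions, not the book's code:

#include <stdbool.h>

#define MAX_TYPES 100

typedef enum { BASIC, POINTER, TYPE_REF } TypeKind;

typedef struct {
    TypeKind kind;
    int referred;               /* target entry for TYPE_REF (and POINTER) */
} TypeEntry;

static TypeEntry type_table[MAX_TYPES];
static bool cyclic[MAX_TYPES];  /* the set Cyclic */
static int n_types;

/* Apply the inference rules of Figure 6.8 until nothing changes any more. */
static void detect_cycles(void) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int t1 = 0; t1 < n_types; t1++) {
            if (type_table[t1].kind != TYPE_REF || cyclic[t1]) continue;
            int t2 = type_table[t1].referred;
            if (t2 == t1 || cyclic[t2]) {
                cyclic[t1] = true;                     /* rules 1 and 2 */
                changed = true;
            } else if (type_table[t2].kind == TYPE_REF) {
                /* rule 3: short-circuit the chain of TYPE references */
                type_table[t1].referred = type_table[t2].referred;
                changed = true;
            }
        }
    }
}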
A good next step is to replace any type table reference TYPE n, where the corresponding type table entry n contains TYPE m, with a type table reference TYPE m. Note that there may be many of these references, in the symbol table, in the type table itself, and in other data structures in the compiler. This action converts our type table and symbol table to:

type table:
TYPE 0: INTEGER
TYPE 1: TYPE 3
TYPE 2: TYPE 3
TYPE 3: POINTER TO TYPE 3
TYPE 4: ERRONEOUS-TYPE
TYPE 5: ERRONEOUS-TYPE

symbol table:
"integer": TYPE 0
"a": TYPE 3
"b": TYPE 3
"c": TYPE 4
"d": TYPE 5
and may make similar changes to other data structures in the compiler. Another option is to have a function Actual_type() in the compiler, which is used whenever a type table reference is required and which resolves a TYPE reference, for example:

FUNCTION Actual type (Type index) RETURNING a type index:
    IF Type table [Type index] is a TYPE reference:
        RETURN Type table [Type index] .referred index;
    ELSE Type table [Type index] is a direct type:
        RETURN Type index;
The first approach is less error-prone, but requires an extra pass over the internal compiler data structures. For a different approach to scope table management see Assmann (1992).
When type checking an expression or the formal and actual parameters of a routine call, often two types must be compared to see if they match in the given context. For example, a routine may be declared to have a parameter of a floating point type, and a call to this routine may have an integer parameter. The compiler has to detect this situation, and produce an error message if the language rules do not allow it. Which combinations are allowed in which context is specified by the language manual. For example, the routine call of the example above is allowed in (ANSI) C, but not in Modula-2.
An important part of comparing two types concerns the notion of type equivalence. When two types are equivalent, values of these types have the same representations: one can be used where the other is required, and vice versa. The language manual, again, specifies when two types are equivalent. There are two kinds of type equivalence, structural equivalence and name equivalence. Virtually all modern imperative and object-oriented languages use name equivalence.

Name equivalence
Two types are name-equivalent when they have the same name. Note that this requires every type to have a name (either user-declared or anonymously assigned by the compiler). In a language with name equivalence, the two types t1 and t2 in

TYPE t1 = ARRAY [Integer] OF Integer;
TYPE t2 = ARRAY [Integer] OF Integer;
are not equivalent since they have different generated names, but the following two types are:

TYPE t3 = ARRAY [Integer] OF Integer;
TYPE t4 = t3;
Implementing a name equivalence check is easy: name-equivalent types have the same index in the type table, once the ID-REF entries as described above have been resolved.
Structural equivalence
Two types are structurally equivalent when variables of these types can assume the same values (when the two types have the same structure) and allow the same operations. This makes the types

TYPE t5 = RECORD c: Integer; p: POINTER TO t5; END RECORD;
TYPE t6 = RECORD c: Integer; p: POINTER TO t6; END RECORD;
TYPE t7 = RECORD
    c: Integer;
    p: POINTER TO RECORD
        c: Integer;
        p: POINTER TO t5;
    END RECORD;
END RECORD;

all equivalent.
Testing for structural equivalence is difficult. As it is hardly used anymore, we will only discuss it very briefly here. The algorithm basically works as follows: first, all types are placed into a single equivalence class. Then, repeatedly, attempts are made to split each equivalence class E into two equivalence classes E1 and E2, by selecting a single type T as a pivot from E and putting all types that can be shown to be not equivalent to T in E1 and all types that cannot be shown to be not equivalent to T in E2, according to certain rules.
Type equivalence is only a building block in, not the complete answer to, type checking. In type checking, we want to answer the following question: if we expect a value of type T1 at a particular position in the program, but we find a value of type T2, is that acceptable? The type expected is sometimes called the a priori type and the type found the a posteriori type, but the terms 'type expected' and 'type found' are more intuitive and less error-prone. If T1 and T2 are equivalent, the type check certainly succeeds; if they are not equivalent, the rules are language-dependent. For example, if a is a variable of type real, we expect a value of type real in the right-hand side of an assignment to a. However, the assignment a := 5 may have to be dealt with as well. Some languages do not allow any type mismatch; others require the compiler to insert a data and type conversion, in this case from integer to real. Such an implicit data and type conversion is called a coercion. Exactly which coercions the compiler can insert depends on the language. In general, more than one coercion may have to be applied. For example, in the assignment xx := 5 where xx is of type complex, a coercion from integer to real and then a coercion from real to complex may be required. In many languages, the possible coercions also depend on the context. For example, in C, the context 'operand of a binary operator' differs from the context 'right-hand side of an assignment expression'. In the latter context, a coercion from real to integer is allowed, in the former it is not.
The presence of coercions complicates operator and routine identification, because operand and result types may need to be coerced to another type before a matching identification can be found. Also, unrestricted application of coercions may lead to ambiguities. For example, the + in the expression 2 + 3 indicates integer addition, but, allowing a coercion from integer to floating point for both operands, it could also indicate floating point addition. Finding the proper coercions in an AST, for a language with arbitrary sets of types, contexts, and coercions, is a very difficult problem, for which no solution has yet been found. The approach presented here is similar to the one used in Section 6.1.1.2 to handle overloading of identifiers, and works for moderately complicated coercion rules. It is based on two closure algorithms, to be applied in succession. The first finds all types in each node that might play a role, the second crosses out all inapplicable types; the details depend on the source language rules. If in the end any type set in a node is empty, no identification could be made, and if any of the type sets contains more than one element, an ambiguity has been detected. The closure algorithms are given in Figures 6.9 and 6.10. The sets with which the type sets of the nodes are initialized can be built in (as for a leaf 3.14) or supplied by the identification mechanism. Note that even the leaf nodes get type sets, since they too may be overloaded (for example enumeration values in Ada) and/or coerced. The inference rule in Figure 6.9 adds all types reachable by coercions; usually there are very few of these. The inference rules in Figure 6.10 then remove all types that are upward or downward incompatible. Note that the algorithm only determines the types; we have to record the coercions used separately.
Data definitions:
Let each node in the expression have a variable type set S attached to it. Also, each node is associated with a non-variable context C.

Initializations:
The type set S of each operator node contains the result types of all identifications of the operator in it; the type set S of each leaf node contains the types of all identifications of the node. The context C of a node derives from the language manual.

Inference rule:
For each node N with context C, if its type set S contains a type T1 which the context C allows to be coerced to a type T2, T2 must also be present in S.

Figure 6.9 The closure algorithm for identification in the presence of overloading and coercions, phase 1.
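A minimal C sketch of phase 1, assuming types are small integers and the coercions allowed in each context are given by a table; the names TypeSet, coercion_allowed, and close_type_set are illustrative assumptions, not the book's code:

#include <stdbool.h>

#define N_TYPES    8
#define N_CONTEXTS 4

typedef struct { bool member[N_TYPES]; } TypeSet;

/* coercion_allowed[c][from][to] tells whether context c allows a value
   of type 'from' to be coerced to type 'to'. */
extern bool coercion_allowed[N_CONTEXTS][N_TYPES][N_TYPES];

/* Phase 1 (Figure 6.9): keep adding coercion targets to the type set of a
   node with context ctx until the set is stable. */
void close_type_set(TypeSet *s, int ctx) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int from = 0; from < N_TYPES; from++) {
            if (!s->member[from]) continue;
            for (int to = 0; to < N_TYPES; to++) {
                if (coercion_allowed[ctx][from][to] && !s->member[to]) {
                    s->member[to] = true;
                    changed = true;
                }
            }
        }
    }
}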
As said before, the algorithm presented here is not the answer to all identification problems in the presence of overloading and coercions. In fact, as it stands, it is not even capable of handling operator identification in C. The basic problem is mainly one of
Data definitions:
Let each node in the expression have a type set S attached to it.

Initializations:
Let the type set S of each node be filled by the algorithm of Figure 6.9.

Inference rules:
1. For each operator node N with type set S, if S contains a type T such that there is no operator identified in N that results in T and has operands T1 and T2 such that T1 is in the type set of the left operand of N and T2 is in the type set of the right operand of N, T is removed from S.
2. For each operand node N with type set S, if S contains a type T that is not compatible with at least one type of the operator that works on N, T is removed from S.

Figure 6.10 The closure algorithm for identification in the presence of overloading and coercions, phase 2.
language design, and can be sketched as follows. Suppose a language has two types, int and real, and two +-operators, with types (int, int) -> int and (real, real) -> real. To accommodate expressions like 3.14 + 5 the language has to allow a coercion from int to real in operand context, to reach the expression 3.14 + (real) 5, which identifies the second +. But having such a coercion in operand context makes the expression 3 + 5 ambiguous, since it allows both 3 + 5 to identify the first + and (real) 3 + (real) 5 to identify the second +. This problem in language design is solved by having rules like 'In operand context, a value can only be coerced to real if the other operand is real without using a coercion'. Such rules make the coercions allowed in one syntactic position dependent on what is found in another syntactic position. This kind of longer-range relationship is not supported by the algorithm presented here and has to be coded separately, either as a third scan over the expression or as ad-hoc code in the inference rule in Figure 6.10. Another potential problem with the algorithm is that phase 1 will not terminate for certain coercion rules. This happens, for example, with the coercion rules of Algol 68, where the inference rule in Figure 6.9 will continue forever to require new types. Fortunately very few programming languages have as complicated a coercion mechanism as Algol 68, so this phenomenon is usually not a problem.
Some languages allow the use of casts. A cast specifies the required type explicitly. For the compiler, it just introduces a different context, with usually different and stronger coercion rules. A cast differs from an explicit conversion in that it still uses the coercion system of the language. In contrast, an explicit conversion is a function which transforms data of one type to that of another type, and the types of its operands and result are subject to the normal source language type rules. Of course, the function could be a built-in function of the language, in which case the compiler must know about it. On the other hand, it could just be a library function.
Kind checking is trivial, as explained at the start of this section. The exception concerns constants and variables, and the complication arises from the fact that the actual kinds we are concerned with are locations and values rather than constants and variables. We will now examine the relationships between these notions.
In the assignment destination := source, we expect a location on the left and a value on the right. If we adhere to the idea that the assignment operator, like all operators, requires values as input, we expect a value which is the address of a location on the left and a (normal) value on the right. Based on the topology of the assignment statement, the first is called an lvalue (pronounced 'el-value') and the second is called an rvalue.
In the assignment p := q;, in which p and q are variables, we expect an lvalue for p and an rvalue for q. Since p is a variable, it has a location and its address is the lvalue; one even says that 'a variable is an lvalue'. On the other hand, q is an lvalue too, but in its position an rvalue is required. The kind checking system solves this by inserting a coercion which retrieves the contents of the location addressed by an lvalue. This coercion is similar to the dereferencing explained below, but is less conspicuous because it is usually incorporated in the machine instructions. Although the AST for the assignment p := q;, as amended by kind checking, contains a dereference node (Figure 6.11, in which the arrows show the dependencies), this node is not reflected explicitly in the corresponding code sequence:

Load_Mem  q, R1
Store_Reg R1, p
It is hidden inside the machine instruction Load_Mem, as explained in the introduction to register machines at the beginning of Section 4.2.4.
Figure 6.11 AST for p := q with explicit deref: a deref node is applied to the location of q, and its result goes to the location of p.
We are now in a position to formulate the kind checking rules for lvalues and rvalues. The table in Figure 6.12 shows the basic rules; a - indicates that no action is required. The lvalue/rvalue attribute propagates bottom-up in the AST of an expression, according to language-dependent, but usually obvious, rules. Some of the rules are stated in the table in Figure 6.13, for C or a similar language; V stands for lvalue or rvalue. The combined rules state, for example, that a[1] is an lvalue if a is a variable; type checking then tells us that a must be an array. This means that a[1] can be used as the destination in an assignment, since an lvalue is expected in that position. Suppose, however, that the expression a[1] is used as an index, for example in a[a[1]]; here an rvalue is required, so the lvalue needs to be dereferenced. On the other hand, 3 := 4 is erroneous, since 3 is an rvalue, and an lvalue is required; and so is &x := 7, for the same reason.
                     expected
found          lvalue        rvalue

lvalue         -             deref
rvalue         error         -

Figure 6.12 Basic checking rules for lvalues and rvalues.
expression construct        resulting kind

constant                    rvalue
identifier (variable)       lvalue
identifier (otherwise)      rvalue
&lvalue                     rvalue
*rvalue                     lvalue
V[rvalue]                   V
V.selector                  V
rvalue+rvalue               rvalue
lvalue:=rvalue              rvalue

Figure 6.13 lvalue/rvalue requirements and results of some expression constructs.
This concludes our discussion of context handling for imperative and object-oriented languages. The main problem is the identification of types, operators, and identifiers. The problem is complicated by forward references, type equivalence rules, routine and operator identification rules, overloading of identifiers and operators, and context-dependent coercions. No hard-and-fast general algorithm is available, but for almost all practical languages the problem can be solved by some form of inference rules working on type sets in the nodes of the expressions, and for many languages much simpler approaches involving only one type per node are possible.
In this section we discuss some of the data structures that represent source language data at run time, and the run-time manipulation needed to deal with the source language operations on these data. In the source language, a data item has a type, which may be a basic, built-in type of the language, or a constructed type, built using one of the type constructors of the language. The target language data types are usually limited to single bytes, integers of various sizes, address representations, and floating point numbers of several sizes. Every source language data type is mapped to a particular combination of target language data types, and the run-time representation of a source language data item is the result of the application of this mapping. It is the task of the compiler writer to create such a mapping. We assume here that the target language has the common arithmetic, comparison, and copy operations, and has a byte-addressable memory; this assumption is almost universally justified.
The basic types in a source language are characters, integers of several sizes, and floating point numbers of several sizes. The source language operations on these typically are arithmetic operations, assignment, and comparison. The arithmetic operations include addition, subtraction, multiplication, division, and remainder. All these can be mapped directly to the target language data types and operations. Often, the same arithmetic operations can be used for unsigned arithmetic, except for comparison; for these, special instructions usually exist. Characters usually are mapped to single bytes, although some modern source languages require two or four bytes to represent all possible character values. Some source languages also have a void type, corresponding to no value at all. In some other languages a void type is present, but only implicitly. A target language representation for a void type is not needed.
An enumeration type defines a set of names to be the values of the new data type. The run-time representation of an enumeration value is an integer, with values usually ranging from 0 to the number of enumeration values minus 1, although some languages allow the programmer to specify the integer values corresponding to the names of the enumeration type. In any case, the range of values is known at compile time, so an integer type of suitable size can be chosen to represent the values. Operations allowed on enumerations usually are limited to copying, comparison for equality, comparison for greater/smaller, and sometimes increment/decrement, all of which are readily available in the target language for the integer representation chosen.
An enumeration type which is available in many languages, including some that do not have explicit enumeration types, is the Boolean type, with false and true as enumeration literals. An implementation with 0 for false and 1 for true suggests itself, but in many cases representing the Boolean value can be avoided, as shown in Section 6.4.1.1.
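As an illustration, a compiler targeting C might map a three-valued enumeration type as follows; this is a sketch, and the type name and literal names are assumptions:

/* Source-language type: TYPE Color = (red, green, blue);            */
/* All three values fit in one byte, so a one-byte integer is chosen. */
typedef unsigned char Color;

enum { red = 0, green = 1, blue = 2 };

/* Copying, comparison, and increment then map directly onto the      */
/* integer operations of the target, e.g.:                            */
/*     Color c = green;  if (c < blue) c = c + 1;                     */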
Most imperative and object-oriented languages support a pointer type. Pointers represent the addresses of source language data structures. The run-time representation of a pointer is an unsigned integer of a size large enough to represent an address. The integer operations for copying, assignment, comparison, etc. are available for pointers as well, and some target machines have special instructions or addressing modes for dealing with pointers. The one operation that is particular to pointers is dereferencing, which consists of obtaining the value of the data structure that the pointer refers to. If the value to be obtained is small enough to fit in a register, dereferencing can usually be implemented as a single machine instruction. If the value is larger, though, as in dereferencing a pointer to a record or array, the situation is more complicated. Theoretically, dereferencing delivers the value on top of the working stack, but it is more efficient to find out what the final destination of the value is and to copy it to that place directly. For example, the assignment

q = *p;
in which q is the name of a record variable of type T and p is a pointer to a record of the same type, may be translated to a call of a library routine byte_copy():

byte_copy(&q, p, sizeof(T));
or the loop inside byte_copy may be in-lined.
An obvious optimization is available when the record obtained by dereferencing is used only for field selection. The language C has a special notation for this situation, ptr->field, but other languages require the programmer to use the formally correct notation (*ptr).field. Literal implementation of the latter would indeed dereference the record under the pointer ptr to the top of the working stack and then replace the top of the working stack by the selected field. But when the compiler recognizes the situation it can first compute the pointer to the field and then dereference that pointer: (*ptr).field is translated as *(&(ptr->field)).
The above applies when the context requires an rvalue. When the context requires an lvalue for a field selection, for instance in an assignment to ptr->field, the required address can be obtained by adding the offset of field within the record to the pointer ptr.
Most languages that support pointers actually support typed pointers, for example POINTER TO INTEGER. In addition, there is often a generic pointer type, which can be coerced to any other pointer type. For example, C has a 'pointer to void' type, and Modula-2 has a 'pointer to byte' type. Coercing such a generic pointer type to another pointer type is a compile-time action, for which no run-time code is required.
In the above we have assumed that a pointer indeed points to a valid value present in memory. In an incorrect program this need not be the case. Although the compiler can usually make sure that if the pointer refers to a value, that value will be of the right type, there are a number of situations in which the pointer does not refer to a value:

1. the pointer was never initialized;
2. the pointer is a null pointer, normally used to indicate the absence of a value, but the programmer forgot to check;
3. the pointer once referred to a value, but that value was located on the heap and has been removed since by a free() operation, leaving the pointer dangling;
4. the pointer once referred to a value, but that value was located in the activation record of a routine that has since been terminated, leaving the pointer dangling.

These dangers make pointers hard to use, as anyone with programming experience in C can confirm. Languages differ in their approach to bad pointers. In C, the actions of an incorrect program are undefined, and the best one can hope for in the case of any of the above errors is that the program crashes before it produces incorrect results. The memory management units (MMUs) of most processors will generate an exception or trap if the pointer to be dereferenced is null, thus catching error 2 above, but their behavior on uninitialized pointers (error 1) will be erratic, and since errors 3 and 4 involve perfectly good pointers whose referents have gone away, no MMU assistance can be expected for them.
Language designers sometimes take measures to tame the pointer. Several approaches suggest themselves. Avoiding pointers altogether is a good way to solve pointer problems (as done, for example, in Java and in functional and logic languages) but requires a lot of alternative programming support. Automatic initialization of pointers eliminates uninitialized pointers (error 1). The dereferencing of null pointers (error 2) can be eliminated in the language design by having two kinds of pointer-like type constructors, 'pointers', which may point to a value or be null, and 'references', which are guaranteed to point to a value. References can be dereferenced safely. Pointers cannot be dereferenced and can only be used in a test for null; if the pointer is not null, a reference is obtained, which can then, of course, be dereferenced safely. Having a garbage collector and disallowing calls to free() eliminates dangling pointers to the heap (error 3). And disallowing pointers to local variables eliminates the other dangling pointers (error 4).
Although these possibilities are language design issues rather than compiler design issues, they are still relevant to compiler design. Implementing automatic initialization of pointers is easy, and symbolic interpretation can often establish that a pointer cannot be null. This is especially important in translating generated code, in which care has already been taken never to dereference null pointers. Also, there is a technique that avoids dangling pointer errors due to returning routines without disallowing pointers to local variables. Aspects of this technique are important for the run-time systems of some imperative and many functional and logic languages, which is why we will discuss these aspects here.
Of the four types of errors, error 4 is the most problematic, since a good pointer turns into a dangling pointer through a routine exit, an action that is only remotely associated with the pointer. Following the saying that an ounce of prevention is worth a pound of cure, a set of rules has been developed to prevent dangling pointers from arising in the first place, the so-called 'scope rules'. Although no language except Algol 68 has put these rules in the hands of the programmer, they play a considerable role in the more advanced forms of routine construction and calling, and in the implementation of functional and logic languages. Because the possibilities and restrictions of passing routines as parameters and returning them as values are difficult to understand without these scope rules, it seems advisable to treat them here, in their more easily understandable data structure version.
Values located in activation records have limited 'lifetimes', where the lifetime of an entity is the time in the run of the program during which the entity exists. The lifetime of such a value is the same as that of the activation record in which it resides. Therefore, pointers to local values have limited 'validity spans'; the validity span of a pointer is equal to the lifetime of the value it points to. For historic reasons the validity span of a pointer is called its 'scope', and the rules for avoiding dangling pointers are called 'scope rules'. This terminology is unfortunate, since it inevitably causes confusion with the notions 'scope of an identifier' and 'scope rules' as discussed in Section 6.1.1, but we will conform to it to keep in line with the literature. Where necessary, a distinction can be made by using the terms 'identifier scope' and 'pointer scope'.
The values that pointers refer to can be located in activation records or on the heap; for the purpose of this discussion, the global data area can be considered as the oldest activation record. The lifetime of a value is equal to that of the container it is located in. For values on the heap, the lifetime is infinite; this does not mean that the data will be kept infinitely long, but rather that it is impossible to ever find out that its lifetime may not be infinite. The lifetime of an activation record is governed by the routine calling mechanism; if the call is a simple subroutine call, the lifetime of the resulting activation record is enclosed in the lifetime of the activation record of the caller. If we call the lifetime of an activation record its 'scope', as we did before with pointers, we see that one scope can lie completely inside another scope. We will call a scope P smaller than a scope Q if P lies entirely inside Q. This imposes an ordering on scopes, but note that some scopes, for example those in different threads, are incommensurable, so the ordering is a partial one in principle.
In summary, the lifetime of a value is the time span during which the value exists, and the scope of a value is the time span during which the value is valid. The purpose of the scope rules is to make sure that any value will be valid during its entire lifetime. We are now in a position to formulate the scope rules (van Wijngaarden et al., 1975):

- The scope of a location on the heap is infinite; the scope of a location in an activation record is that of the activation record.
- The scope of the activation record of the program is infinite; the scopes of the other activation records depend on the calling mechanism and the implementation.
- The scope of a value is the smallest scope of any pointer it contains, or infinite if it does not contain pointers.
- The scope of a pointer is that of the location into which it points.
- A value with scope V may be stored only in a location with scope L if V >= L; in other words, p := q requires 'scope of p <= scope of q'.

It is incorrect to express the last rule as 'A value with scope V may not be stored in a location with scope L if V < L'. The assignment is also forbidden if V and L cannot be compared, since that would imply the possibility that part of the lifetime of L falls outside the lifetime of V.
Together these rules ensure that values remain valid as long as they exist, so no dangling pointers can originate from routines returning. If activation records are put on a stack, the scope of the activation record of a called routine is smaller than that of the activation record of the caller. An immediate consequence of that is that data on the heap cannot contain pointers to the stack, except perhaps to the global activation record. Another consequence is that a local pointer variable in a routine R cannot point to data local to a routine called by R. This effectively prevents the notorious dangling pointer error caused by assigning the address of a local variable to a pointer passed as a parameter, as shown in Figure 6.14. Here the scope of &actual_buffer is that of the activation record of obtain_buffer(), and that of *buffer_pointer is that of do_buffer(), which is larger. This assignment constitutes a scope rule violation; if it is allowed to pass unchecked, as it is in C, the following happens. When obtain_buffer() returns, actual_buffer[] disappears, but a pointer to it remains in buffer, which is passed to use_buffer(). When use_buffer() then uses its now invalid parameter, the scope rule violation takes its toll.
In principle, the scopes of pointers, locations, and values could be maintained and checked at run time, but doing so is awkward and inefficient. Symbolic interpretation can be used to avoid generating checks where the need does not arise. In practice the scope rules are used at compiler design time, to design efficient implementations of advanced routine operations in all paradigms, with the property that no dangling pointers will ever be generated. These implementations are designed so that they never store a short-lived value in a longer-lived container. We will see several examples in Section 6.3. A run-time approach to pointer safety is taken by Austin et al. (1994), who discuss a method for the detection of all pointer and array access errors by replacing pointers with so-called 'safe pointers'.
void do_buffer(void) {
    char *buffer;

    obtain_buffer(&buffer);
    use_buffer(buffer);
}

void obtain_buffer(char **buffer_pointer) {
    char actual_buffer[256];

    *buffer_pointer = &actual_buffer;
    /* this is a scope-violating assignment: */
    /* scope of *buffer_pointer > scope of &actual_buffer */
}

Figure 6.14 Example of a scope violation in C.
A record, also called a structure, is a data item in which a fixed number of members of possibly different types are grouped together. In the target language, these members are represented consecutively in a memory area that is large enough to contain all members. For example, a variable of the C structure
struct example {
    int member1;
    double member2;
};
is - in principle - represented as follows:

| member1 | member2 |
However, some processors impose address alignment requirements on certain data accesses. For example, the SPARC processor requires an int (a 4-byte quantity) to have an address that is a multiple of 4 (is 4-byte aligned), and a double (an 8-byte quantity) to be 8-byte aligned. Member1 of the example above thus must be 4-byte aligned, which is easily accomplished if the structure itself is 4-byte aligned, but member2 must be 8-byte aligned, and this can be accomplished by first making sure that it is 8-byte aligned within the structure, by inserting a gap of 4 bytes between member1 and member2, and then making sure that the structure itself is 8-byte aligned. This way, member2 will also always be 8-byte aligned.
In general, gaps must be inserted between the structure members to make sure that each member is aligned properly within the structure, and then the size and the alignment requirement for the structure itself must be computed. This alignment requirement is the lowest common multiple (LCM) of the member alignment requirements. Often this is just the largest member alignment requirement, because alignment requirements usually are small powers of 2 (and thus one is either a multiple or a divisor of the other). Also, a gap can be inserted at the end of the structure to make sure that the size of the structure is a multiple of its alignment requirement. This is convenient when the structure is used to build other data types, such as an array of these structures.
To get back to our example structure, on a SPARC it will be represented as follows:
| member1 | gap | member2 |
and it must be aligned on 8-byte boundaries. If x is a variable of this type, it has size 16, x.member1 lies at offset 0 from the start of x, and x.member2 lies at offset 8 from the start of x.
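A sketch in C of how a compiler might compute such a layout, given the size and alignment requirement of each member; round_up, lcm, and layout_record are illustrative helper names, not part of the book's code:

#include <stddef.h>

typedef struct {
    size_t size;
    size_t alignment;
    size_t offset;      /* filled in by layout_record() */
} Member;

static size_t round_up(size_t n, size_t alignment) {
    return (n + alignment - 1) / alignment * alignment;
}

static size_t lcm(size_t a, size_t b) {
    size_t m = a;
    while (m % b != 0) m += a;      /* adequate for small powers of 2 */
    return m;
}

/* Computes member offsets, the record's alignment requirement, and its
   size, padded to a multiple of that alignment. */
size_t layout_record(Member member[], int n, size_t *record_alignment) {
    size_t offset = 0, align = 1;
    for (int i = 0; i < n; i++) {
        offset = round_up(offset, member[i].alignment);   /* insert gap */
        member[i].offset = offset;
        offset += member[i].size;
        align = lcm(align, member[i].alignment);
    }
    *record_alignment = align;
    return round_up(offset, align);                  /* gap at the end */
}

For the example structure, with members of size/alignment 4/4 and 8/8, this yields offsets 0 and 8, alignment 8, and size 16, as described above.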
The mandatory operations on record types are field selection and copying; comparison for equality is not always required. The selection of a field from a record is accomplished in the target language by computing the address of the field, and then accessing the field through that address. This is done by adding the offset of the field within the record, which is known to the compiler, to the address of the record. The resulting address is the lvalue of the field (if it has one in the context at hand). To obtain the rvalue of the field, this address must be dereferenced. Record copying can be accomplished either by copying the fields individually or by copying the whole record, including the contents of the gaps; the latter may be more efficient if there is an efficient memory block copy routine available in the target language.
The possible presence of gaps makes record comparison more complicated than record copying. The contents of a gap usually are undefined, so they must be disregarded when comparing records. Therefore, record comparison has to be done field by field. For the example record type above, the compiler could generate a routine, returning 1 if the records are equal, and 0 if they are not, according to the following C scheme:

int compare_example(struct example *s1, struct example *s2) {
    if (s1->member1 != s2->member1) return 0;
    if (s1->member2 != s2->member2) return 0;
    return 1;
}
The compiler can then generate code that calls this routine whenever two records of this type must be compared. It could also generate in-line code to perform the comparison.
A union type is a data type of which the values are of one of a set of types. For example, a variable a of type

union {
    int i;
    float f;
};
in C can either hold a value of type int or a value of type float. To access the value of type int, the programmer uses a.i; to access the value of type float, the programmer uses a.f. A union cannot hold both values at the same time. In C, the programmer is responsible for keeping track of which of the union fields is present: the union is not 'discriminated'. Some other languages have a special field, the union tag, which is always present in unions and indicates which variant the union currently holds: the union is 'discriminated'. Each variant of the union has its own (constant) value of the union tag. The type of the union tag usually is an enumeration type, and each variant is associated with a particular enumeration value.
The run-time representation of an undiscriminated union is very simple: it is like a record, except that all fields overlap. Therefore, its size is equal to the aligned size of the largest variant, and its alignment requirement is equal to the lowest common multiple of the alignment requirements of its variants. Field selection of an undiscriminated union is simple: the field always resides at offset 0 from the start of the union. Copying consists of a memory copy of the size of the union. Comparison is not available for undiscriminated unions.
The representation of a discriminated union is that of a record containing the union tag and the - undiscriminated - union of the variants, as above. The generated code can access the union tag to check which variant is current; whether the programmer can also do so is source-language dependent. In principle a check must be generated for each union-field access to make sure that the selector matches the current variant of the union, and if it does not, a run-time exception must be produced. In an optimizing compiler this check can often be avoided by doing static analysis, for example using symbolic interpretation as explained in Section 3.2.2.
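A sketch in C of the run-time representation of a discriminated union with the two variants above, including the check that generated code would perform on field access; the names are illustrative assumptions:

#include <stdlib.h>

enum variant_tag { TAG_I, TAG_F };          /* the union tag values */

struct discriminated_union {
    enum variant_tag tag;                   /* the union tag */
    union {                                 /* the undiscriminated union */
        int   i;                            /* variant TAG_I */
        float f;                            /* variant TAG_F */
    } u;
};

/* Checked access to the int variant, as the generated code would do it: */
int get_i(const struct discriminated_union *d) {
    if (d->tag != TAG_I) abort();           /* run-time exception */
    return d->u.i;
}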
The array type describes data structures which consist of series of items of the same type. An array can have one dimension (a vector), or more. An array element is indicated through one or more index expressions, one for each dimension of the array. The run-time representation of the array consists of a consecutive sequence of the representations of the elements; the address of the first element is the address of the array. For example, a one-dimensional array

A: ARRAY [1..3] OF Integer;

will be stored as

| A[1] | A[2] | A[3] |
and a two-dimensional array

B: ARRAY [1..2, 1..3] OF Integer;
can be stored either in row-major order, which means that the elements are stored row after row (Figure 6.15a), or in column-major order, which means that the elements are stored column after column (Figure 6.15b). These schemes can easily be extended to more than two dimensions. Below, we will assume row-major order, as this is the order most often used.
Figure 6.15 Array B in row-major (a) and column-major (b) order.
Note that, as long as the compiler makes sure that the element size is a multiple of the alignment restriction of the element, there will be no gaps between the array elements. This is important for the implementation of array comparison, as well as the implementation of element selection, as discussed below. As with records, array copying can be done either element by element or through a memory block copy. Array comparison, if the source language allows it, can be done with a memory block comparison if the array elements do not have any gaps, and element by element otherwise.
All languages that support arrays also support element selection, usually through indexing. Assume an n-dimensional array A with base address base(A), where dimension k has lower bound LB_k and upper bound UB_k. The number of elements along dimension k is then LEN_k = UB_k - LB_k + 1. Now suppose that the location of element A[i_1, i_2, ..., i_n] must be computed. Given the base address base(A) and the size el_size of the array elements in bytes, the location of the required element is obtained by multiplying el_size with the number of elements in front of the required element, and adding the result to base(A). For a one-dimensional array this location is base(A) + (i_1 - LB_1) x el_size. In
general, for an n-dimensional array, the location of the element A[i_1, i_2, ..., i_n] is:

base(A) + ((i_1 - LB_1) x LEN_2 x LEN_3 x ... x LEN_n
         + (i_2 - LB_2) x LEN_3 x ... x LEN_n
         + ...
         + (i_n - LB_n)) x el_size

This is a lot of computation just to determine the location of one element. Fortunately, a large part of it can be precomputed. Reorganizing the expression results in

base(A) - (LB_1 x LEN_2 x LEN_3 x ... x LEN_n
         + LB_2 x LEN_3 x ... x LEN_n
         + ...
         + LB_n) x el_size
       + (i_1 x LEN_2 x ... x LEN_n + i_2 x LEN_3 x ... x LEN_n + ... + i_n) x el_size

All lines of this expression except the last are independent of the indices in the element selection and depend only on the array A. In fact, the first n lines of the expression contain the location of the element A[0, 0, ..., 0], the zeroth element. Of course, it is quite possible that the zeroth element does not exist inside the array, because 0 may not be a member of the interval LB_k..UB_k for some k; this does not prevent us, however, from using its location, zeroth_element(A), in address computations. The LEN_k x ... x LEN_n products in the last line are also independent of the indices, and we can compute them in advance. If we designate the product LEN_{k+1} x ... x LEN_n x el_size by LEN_PRODUCT_k, we have:

LEN_PRODUCT_n = el_size
LEN_PRODUCT_k = LEN_PRODUCT_{k+1} x LEN_{k+1}    for n > k >= 1

With these values precomputed, the location of the element A[i_1, i_2, ..., i_n] can be computed by the formula

zeroth_element(A) + i_1 x LEN_PRODUCT_1 + ... + i_n x LEN_PRODUCT_n

All these precomputed values for the array can be stored in an array descriptor; the descriptor should also contain the array bounds (or the lower bound and the length) themselves, so array bound checks can be generated. An example of an array descriptor is shown in Figure 6.16. The computation described above results in the location of an array element. When an lvalue is required in the context at hand, this location serves as one. When the rvalue of the
ii"
ii"
Figure 6.16 An array descriptor.
array element is required, the location (a pointer) must be dereferenced. Again, this dereferencing may be implicit in the selected machine instructions.
When the array does not change size during its lifetime, and the bounds are known at compile time, it is called a static array. In contrast, when the array may change size during its lifetime, or its bounds are determined at run time, it is called a dynamic array. For static arrays the array descriptor does not actually have to exist at run time. Instead, the compiler can compute all the values the array descriptor would contain and use them as constants in index expressions, with the exception of zeroth_element. The compiler can, however, compute the offset of zeroth_element from the first element of the array itself, and use that to compute the value of zeroth_element when needed.
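A sketch of indexing through a run-time array descriptor along the lines described above; the field and function names are illustrative, and offsets are kept in bytes:

#include <stdlib.h>

#define MAX_DIM 8

typedef struct {
    char *zeroth_element;        /* address element [0,0,...,0] would have */
    int   n_dim;
    int   lower_bound[MAX_DIM];  /* LB_k, needed for bound checks */
    int   length[MAX_DIM];       /* LEN_k, needed for bound checks */
    long  len_product[MAX_DIM];  /* LEN_PRODUCT_k, el_size included */
} ArrayDescriptor;

/* Location of A[index[0], ..., index[n_dim-1]], with bound checks. */
char *element_location(const ArrayDescriptor *d, const int index[]) {
    char *location = d->zeroth_element;
    for (int k = 0; k < d->n_dim; k++) {
        int i = index[k];
        if (i < d->lower_bound[k] || i >= d->lower_bound[k] + d->length[k])
            abort();                               /* array bound error */
        location += (long)i * d->len_product[k];
    }
    return location;
}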
Some languages have set types, which are usually limited to sets of a small subrange of integers. These are probably best implemented as bitsets. In a bitset, each value in the subrange is represented by a single bit: if the bit is set, the value is a member of the set; if the bit is not set, the value is not a member of the set. Bitsets can be stored conveniently in machine words. For example, the set of the integers ranging from 0 to 31 can be represented by a single 32-bit word. The usual set operations are implemented by means of bit operations: set union is implemented with a bitwise OR, set intersection is implemented with a bitwise AND, symmetric set difference is implemented with a bitwise EXCLUSIVE OR, and set difference is implemented with a bitwise NOT followed by a bitwise AND; a sketch follows below.
Some languages allow sets of any type for which comparison for equality exists. Possible representations of such sets are linked lists (Tarjan, 1975), trees, and hash tables (Sassa and Goto, 1976).
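For a set over the subrange 0..31, the bitset operations mentioned above can be written in C as follows (a sketch; the names are assumptions):

#include <stdint.h>

typedef uint32_t Set32;                 /* bitset over the subrange 0..31 */

static Set32 singleton(int v)                       { return (Set32)1 << v; }
static int   is_member(Set32 s, int v)              { return (s >> v) & 1; }
static Set32 set_union(Set32 a, Set32 b)            { return a | b; }    /* OR  */
static Set32 set_intersection(Set32 a, Set32 b)     { return a & b; }    /* AND */
static Set32 symmetric_difference(Set32 a, Set32 b) { return a ^ b; }    /* EXCLUSIVE OR */
static Set32 set_difference(Set32 a, Set32 b)       { return a & ~b; }   /* AND NOT */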
Routines are not considered data in many programming languages, and in these languages the question of how to represent routines as a data type does not arise. In C, a routine as data can be implemented simply as a pointer to its code; an indirect routine call is then used to activate it. The best implementation of routines as data in languages that allow more advanced operations on routines - nesting, routines as first-class data, partial parameterization, etc. - depends in complicated ways on the exact operations allowed and the choices made in the implementation as to the allocation of the activation records. This issue will be covered integrally in Section 6.3.
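For instance, in C a routine passed as data is just a code pointer, and activating it is an indirect call (a minimal sketch; the names are assumptions):

int twice(int x) { return 2 * x; }

/* 'routine' is a pointer to code; the call below is an indirect routine call. */
int apply(int (*routine)(int), int argument) {
    return routine(argument);
}

/* apply(twice, 3) yields 6 */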
An object is a record with built-in methods, with some additional features. Its type is usually called a class. In some languages, an object is similar to an abstract data type in which only the methods are visible from outside and the object fields can only be accessed from within the method bodies. In other languages, the object fields can be accessed like record fields.
The basic operations on objects are 'field selection', 'copying', and 'method invocation'. Although many object-oriented languages do not allow direct access to the fields of an object, field selection is needed for the access from within the method bodies. Copying is trivial and will not concern us here. For method invocation, the method must first be identified, and then the call must be made. The call is similar to a routine call, but with one difference: as the object fields are directly accessible from within the method body, an extra, implicit, parameter must be supplied: the pointer to the object. The method can then access the object fields through this pointer. In this section, we will focus on method selection; the actual routine or method invocation is discussed in Section 6.4.2. In most object-oriented languages, objects also have constructors and destructors, which are methods that are to be invoked on object creation and removal, respectively. As far as the following discussion is concerned, these are methods which are invoked like any other method.
If this was all there was to it, the implementation of objects would be quite simple. Suppose that we have an object class A with methods m1 and m2 and fields a1 and a2. The run-time representation of an object of class A then consists of a record containing the fields a1 and a2:

| a1 | a2 |
In addition, the compiler maintains a compile-time table of methods for class A:

m1: m1_A
m2: m2_A
where we have appended '_A' to the method names, to indicate that they operate on an object of class A, rather than on a different class that also defines methods with the names m1 or m2. In this simple model, field selection is implemented as record field selection, and object copying is implemented as record copying; method selection is done by the identification phase in the compiler. Methods are implemented as routines with one additional parameter, a pointer to the object. So the method m2_A could be translated to the C routine

void m2_A(Class_A *this, int i) {
    ... /* body of method m2_A, accessing any object field x as this->x */
}
assuming that m2_A() has one integer parameter and returns no value, and where Class_A is the C type name for class A. The method invocation a.m2(3); is then translated to m2_A(&a, 3);. The problem is, however, that all object-oriented languages have at least some of the features discussed below. They make objects much more useful to the programmer, but also complicate the task of the compiler writer. These are the features that distinguish objects from abstract data types.
Inheritance, a feature of all object-oriented languages, allows the programmer to base a class B on a class A, so that B inherits A's fields and methods, in addition to its own zero or more fields and methods. This feature is also described as 'class B extends class A'. Class A is called the parent class of class B, and class B is a subclass of class A. Now suppose that class B extends class A by adding a method m3 and a field b1. The run-time representation of an object of class B is then:

| a1 | a2 | b1 |
In addition, the compile-time table of methods for class B is:

m1: m1_A
m2: m2_A
m3: m3_B

This can still be implemented using the means for abstract data types described above.
A class B which extends a class A may also redefine one or more of A's methods; this feature is called method overriding. Overriding implies that when a parent class P defines a method m, all classes based directly or indirectly on P will have that method, but the implementations of the method in these subclasses may differ when it is overridden. Put more precisely, the method is declared in the class P, and then defined in class P and possibly redefined in any of its subclasses. We use the phrase 'the declaration of X' here as a statement saying that X exists, and 'the definition of X' as a statement telling exactly what X is; this usage is in accordance with but somewhat stricter than the traditional usage in programming languages, where one speaks for example of 'forward declarations' rather than 'forward definitions'.
Now assume that class B in the example above redefines method m2, which was already defined for objects of class A. Then the definition of method m2 in A is both its only declaration and its first definition; that in class B is a redefinition. Some languages allow the method declaration to occur without a definition; the method is then called virtual, and a class in which at least one virtual method occurs is an abstract class. The virtual methods must then be defined in classes that extend the abstract class.
We will rename the methods so that the name reflects both the class in which the method is declared and the class in which it is defined. The names consist of three parts, the method name, the class it is declared in, and the class it is defined in; the parts are separated by underscores (_). So a name m2_A_B is used to designate a method m2 declared in class A and defined in class B.
Method overriding affects the compile-time table of methods. Under the above assumption that class B redefines method m2, which was already declared and defined in class A, the method table of class A now becomes:

m1: m1_A_A
m2: m2_A_A
and the method table of class B becomes:

m1: m1_A_A
m2: m2_A_B
m3: m3_B_B
Now suppose a is an object of class A and b is an object of class B. A method call a.m2(...) will then be translated with a call to m2_A_A, whereas a method call b.m2(...) will be translated with a call to m2_A_B. This differentiates clearly between m2_A_A, which was declared in class A and defined in class A, and m2_A_B, which was also declared in class A but defined in class B. If inheritance is the only other object-oriented feature in the language, the type of the translation of m2_A_A is
void m2_A_A(Class_A *this, int i);

and that of m2_A_B is

void m2_A_B(Class_B *this, int i);
When a class B extends a class A and the language allows a pointer of type 'pointer to class B' to be assigned to a variable of type 'pointer to class A', the language supports polymorphism: a variable of type 'pointer to class A' may actually refer to an object of class A or of one of its extensions. The implementation of this feature requires a new operation, pointer supertyping: converting a pointer to an object of subclass B to a pointer to an object of its parent class A. This operation is used in assignments, for example:

class B *b = ...;
class A *a = b;
in which the second line is translated into

class A *a = convert_ptr_to_B_to_ptr_to_A(b);

The routine convert_ptr_to_B_to_ptr_to_A() is a compile-time type conversion: because an object of class B starts with the fields of class A, the value of the pointer need not be changed and the only effect consists of changing the type of the pointer from 'pointer to B' to 'pointer to A' inside the compiler. But note that we now have identical pointers to objects of different classes.
Now that a pointer p of type class A * can actually refer to an object of class B, a question arises about the methods that apply. If the source program applies method m2 to the object referred to by p, should the translation invoke m2_A_A or m2_A_B? There are two possible answers to this question: static binding, which maintains that statically p refers to an object of class A, so m2_A_A should be called; and dynamic binding, which maintains that if the object is actually of class B we should apply m2_A_B, and if the object is actually of class A, we should apply m2_A_A. Static binding is trivial to implement, but most object-oriented languages use dynamic binding, for various important programming language design reasons. Dynamic binding has two significant consequences:

- There are two kinds of As out there, 'genuine' As which use m2_A_A and As 'embedded' in Bs, which use m2_A_B, and the two cannot be distinguished statically. It follows that from now on, the object representation must include dynamic type information, telling if it is an A or a B.
- B's methods require a pointer to a B to obtain access to all of B's fields. As m2_A_B may, however, be called through a pointer of type 'pointer to A' that actually refers to an object of class B, we need yet another operation, pointer subtyping, which (re)constructs a pointer to B from the pointer to A.

The method invocation p->m2(3), where p is statically a pointer to an object of class A, could then be translated to
switch (dynamic_type_of(p)) {
case Dynamic_class_A: m2_A_A(p, 3); break;
case Dynamic_class_B: m2_A_B(convert_ptr_to_A_to_ptr_to_B(p), 3); break;
}
where the dynamic type information is an enumeration type with the values Dynamic_class_A and Dynamic_class_B. When p is statically a pointer to B, we could translate the invocation p->m2(3) immediately to m2_A_B(p, 3);.
Note that this code is consistent with the declarations void m2_A_A(Class_A *this, int i) and void m2_A_B(Class_B *this, int i). We will see, however, that a better translation is possible. For now, pointer subtyping is a compile-time operation.
The switch statement used to find out which method routine to call is a function that works on a small domain, and lends itself to precomputation. To this end we incorporate the pointer conversion from A to B in the routine for m2_A_B, which now accepts a pointer to A:

void m2_A_B(Class_A *this_A, int i) {
    Class_B *this = convert_ptr_to_A_to_ptr_to_B(this_A);
    ... /* body of method m2_A_B, accessing any object field x as this->x */
}
More in general, every method translation m_X_Y gets as its first parameter a pointer to Class_X, which is then immediately converted to a pointer to Class_Y by convert_ptr_to_X_to_ptr_to_Y(). If X and Y are the same, the conversion can be omitted. With this modification to m2_A_B(), the method invocation p->m2(3), where p is statically a pointer to an object of class A, can be translated as

(dynamic_type_of(p) == Dynamic_class_A ? m2_A_A : m2_A_B)(p, 3);
which features a computed function which is called with the parameter list (p, 3). Rather than computing the function from the dynamic type information of p each time an operation on p is performed, we can incorporate the resulting function address in the dynamic type information. The type information for an object of class B is then a record with three selectors, m1_A, m2_A, and m3_B, containing the addresses of the routines to be called for methods m1, m2, and m3. These are m1_A_A(), m2_A_B(), and m3_B_B(); each of these routines has as its first parameter a pointer to an A. Such a record with addresses of method routines is called a dispatch table; the type information in each object is implemented as a pointer to its dispatch table, as shown in Figure 6.17. The type information for an object of class A is a two-field dispatch table containing the addresses of the routines m1_A_A() and m2_A_A(), both of which have a pointer to an object of class A as their first parameter. So in both dispatch tables the selector m1_A
Figure 6.17 The representation of an object of class B.
selects routines of the same type; the same applies to m2_A. This in turn means that the selection p->dispatch_table->m2_A, where p is of type class A *, yields the proper routine with the proper type, regardless of whether p points to an object of class A or class B. The method invocation p->m2(3) can now be translated very efficiently as

(p->dispatch_table->m2_A)(p, 3);
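A sketch in C of this dispatch-table scheme for classes A and B; the struct layouts and names are illustrative assumptions (a real compiler generates the equivalent data structures itself), and B's full dispatch table would additionally contain an entry for the selector m3_B:

typedef struct Class_A Class_A;
typedef struct Class_B Class_B;

typedef struct {
    void (*m1_A)(Class_A *this);
    void (*m2_A)(Class_A *this, int i);
} Dispatch_table_A;

struct Class_A {
    const Dispatch_table_A *dispatch_table;
    int a1, a2;
};

struct Class_B {
    Class_A a;      /* the A part comes first, so a B pointer is also an A pointer */
    int b1;
};

void m1_A_A(Class_A *this)        { (void)this;          /* body of m1_A_A */ }
void m2_A_A(Class_A *this, int i) { (void)this; (void)i; /* body of m2_A_A */ }

void m2_A_B(Class_A *this_A, int i) {
    Class_B *this = (Class_B *)this_A;   /* convert_ptr_to_A_to_ptr_to_B */
    (void)this; (void)i;                 /* body of m2_A_B, may use this->b1 */
}

const Dispatch_table_A dispatch_A = { m1_A_A, m2_A_A };
const Dispatch_table_A dispatch_B = { m1_A_A, m2_A_B };

/* The translation of p->m2(3) for p of static type Class_A *: */
void call_m2(Class_A *p) { (p->dispatch_table->m2_A)(p, 3); }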
Since we have modified the routine m2_A_B(), the translation given above for the invocation p->m2(3), when p is statically a pointer to B, is no longer correct. The routine now expects a pointer to A, and must be given one:

m2_A_B(convert_ptr_to_B_to_ptr_to_A(p), 3);

Until now, the effect of the conversion routines has been that of type conversion only; no actual code was involved. We will now turn to another object-oriented feature that requires the conversion routines to have substance.
So far we have considered only single inheritance, in which an object class may only inherit from a single parent class. In this section, we will discuss the consequences of allowing objects to extend more than one parent class. This feature is called multiple inheritance, and is supported by several important object-oriented programming languages. Suppose, for example, that we have an object class C with fields c1 and c2 and methods m1 and m2, an object class D with field d1 and methods m3 and m4, and an object class E which extends both C and D, adds a field e1, redefines methods m2 and m4, and adds a method m5, as shown in Figure 6.18.
Unlike the situation with single inheritance, it is no longer possible to represent an object as a pointer to a dispatch table followed by all object fields. In particular, the 'D inside E' object must start with a pointer to its dispatch table, followed by its object fields. It is still possible to combine the dispatch tables for E, 'C inside E', and 'D inside E'. However, they are no longer all indicated by the same pointer. The dispatch table for E becomes:
class C {
    field c1;
    field c2;
    method m1();
    method m2();
};
class D {
    field d1;
    method m3();
    method m4();
};
class E extends C, D {
    field e1;
    method m2();
    method m4();
    method m5();
};

Figure 6.18 An example of multiple inheritance.
m1_C_C
m2_C_E
m3_D_D
m4_D_E
m5_E_E
so the dispatch tables for E and 'C inside E' still have the same address. The dispatch table for 'D inside E' starts right behind the dispatch table for 'C inside E'. The representation is shown in Figure 6.19.
Note that the pointer supertyping and pointer subtyping operations, which up to this point did not require code, now in some instances get substance:

supertyping:
convert_ptr_to_E_to_ptr_to_C(e) = e
convert_ptr_to_E_to_ptr_to_D(e) = e + sizeof(class C)

subtyping:
convert_ptr_to_C_to_ptr_to_E(c) = c
convert_ptr_to_D_to_ptr_to_E(d) = d - sizeof(class C)
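Sketched in C, and with the size of the C part taken as a known constant, the conversions with substance amount to pointer arithmetic; all names here are illustrative assumptions:

typedef struct Class_C Class_C;
typedef struct Class_D Class_D;
typedef struct Class_E Class_E;

enum { SIZEOF_CLASS_C = 24 };    /* assumed size of the 'C inside E' part */

Class_C *convert_ptr_to_E_to_ptr_to_C(Class_E *e) {
    return (Class_C *)e;                              /* no adjustment needed */
}
Class_D *convert_ptr_to_E_to_ptr_to_D(Class_E *e) {
    return (Class_D *)((char *)e + SIZEOF_CLASS_C);   /* skip the C part */
}
Class_E *convert_ptr_to_C_to_ptr_to_E(Class_C *c) {
    return (Class_E *)c;
}
Class_E *convert_ptr_to_D_to_ptr_to_E(Class_D *d) {
    return (Class_E *)((char *)d - SIZEOF_CLASS_C);
}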
When an object class E inherits from both class C and class D, an ambiguity may arise. For example, C and D may both contain a method with the same name. Class E inherits both these methods, but when applying a method with this name to an object of class E, only one method can be applied, and it may not be clear which version is intended. The language rules should indicate when a conflict or ambiguity arises, and the compiler has to detect these situations. Often the compiler task is complicated enough to warrant
Figure 6.19 A representation of an object of class E.
specific publications, for example Ramalingam and Srinivasan (1997) or Boyland and Castagna (1997). These are very language-specific problems, however, and we will not discuss them here.
An important issue that complicates multiple inheritance, one which we have ignored in the above discussion, is repeated inheritance. For example, if both class C and D of the example above are extensions of a class A, then what does 'A inside E' mean? Depending on the language, this may mean one of two things: either an object of class E contains two objects of class A, or it contains one object of class A. In the former case we speak of independent multiple inheritance, in the latter we speak of dependent multiple inheritance. A language may even allow a mixture of the two: some classes may be inherited dependently, others independently.

Independent multiple inheritance is implemented exactly as described above. The only complication is in the identification, and the language rules should prescribe when and how the programmer should use qualification, exactly as if the fields and/or methods of class A resulted from different classes, but with the same names.

Now let us turn to dependent inheritance. The added complexity here is not in the selection, but in the representation of the object data. We can no longer use the 'parent objects first' scheme, because, in the example above, we would get two copies of the A object. We will have to arrange the object data so that it only has one copy of the fields of A. We can accomplish that by placing the components of the object data in the object in the following order: the first entry is the pointer to the dispatch table of E (and of 'C inside E'); the next entries are the fields of A; then follow all fields of C not inherited from A; the entry after that is the pointer to the dispatch table of 'D inside E' (which points to within E's dispatch table); then follow all fields of D not inherited from A; and, finally, all fields of E not inherited from C or D.
This order is correct for the E object and the 'C inside E' object, but what about the 'D inside E' object? The compiler has to decide what a D object looks like when compiling the D object class, at a time when it does not know about either C or E. Assuming that a D consists of a pointer to its dispatch table followed by its fields will not work, since when a D resides inside an object of type E, the fields it inherits from A are some distance ahead of the dispatch table pointer and D's own fields follow it. So when producing code to access the object fields, the compiler has no idea where they are.

As usual, the answer to this kind of problem is a run-time descriptor. This descriptor should allow a method to find the object fields, given a pointer to the object itself. For each field, the descriptor must contain the offset of the field from the object pointer. We enumerate the object fields, so that we can use the enumeration index as an index in an offset table. In addition to a pointer to the dispatch table, the object representation now must also contain a pointer to the offset table. Since we do not know beforehand which object classes will be involved in multiple inheritance, we must follow this two-pointer scheme for all objects.

Now let us return to our example of Figure 6.18, and extend it by assuming that both object class C and object class D extend an object class A, which has fields a1 and a2 and methods m1 and m3. So, object class C redefines method m1, and object class D redefines method m3; see Figure 6.20. An object of class E has the representation depicted in Figure 6.21. Assuming that all pointers and fields have size 1, field a1 has offset 2 from the E pointer, field a2 has offset 3, etc., and the index table of class E contains

2 3 4 5 8 9
The index table of the 'D inside E' class contains

-4 -3 2
Note that for an object of class E, there is an ambiguity on m1 as well as m3. The language rules or the programmer will have to specify which m1 and m3 are intended when applying them to an object of class E.

By now it may be clear why some languages (notably Java) do not allow multiple inheritance: it causes many complications and adds some method invocation overhead. On the other hand, the benefit is added flexibility.
All the manipulations discussed above make a method invocation more expensive than a routine call: the method is called through a dispatch table, and the method itself may have to adjust the object pointer that is passed to it. Therefore, some languages have a mechanism for indicating that a method may not be redefined by a subclass. Such an indication allows the compiler to identify the method directly, as it would identify any routine; the compiler can then use the routine call mechanism instead of the dispatch table. In Java, a method can be marked 'final', indicating that it may not be redefined. In C++, methods that may be redefined must be marked 'virtual'. Static analysis can also be of assistance in allowing the compiler to find out exactly to which subclass a method is applied. It may also be able to determine the method called, and generate a direct routine call.

More information about the implementation of inheritance can be found in papers by
class A { field a1; field a2; method m1(); method m3(); };

class C extends A { field c1; field c2; method m1(); method m2(); };

class D extends A { field d1; method m3(); method m4(); };

class E extends C, D { field e1; method m2(); method m4(); method m5(); };

Figure 6.20 An example of dependent multiple inheritance.
Templ (1993), which covers independent multiple inheritance only, and Vitek and Horspool (1996), which covers implementation of dynamic binding in dynamically typed object-oriented languages.
Java has incorporated an extension that relieves the limitations of single inheritance somewhat, without adding the complexity (and the power) of multiple inheritance. The extension consists of so-called interfaces. An interface is like an object class in that it consists of a number of method specifications. In contrast to an object class, however, it cannot have object fields. An interface may extend a single parent interface. An interface is not instantiated like an object, but one can declare Java variables of an interface type, and invoke methods from the interface specification on them; Java variables are actually pointers to objects. The trick is that an object class may specify that it implements one or more of those interfaces, and that an interface type is compatible with any object type that implements this interface. So, for example, given an interface
Figure 6.21 An object of class E, with dependent inheritance.
public interface Comparable {
    public int compare(Comparable o);
}
it is possible to define an object class that implements this interface, while it still can extend another object class. The compiler must generate a separate dispatch table for each interface that an object class implements. This separate interface dispatch table only contains entries for methods that are specified in the interface specification, but the entries refer to methods of the object type. A variable of an interface type can be represented by a record containing two pointers, one to the interface dispatch table, and one to the object. Method invocation on an interface then goes through the pointer to the interface dispatch table. Conversion from an interface value to an object class value requires a check that the object class actually corresponds to the type of the object referred to by the interface value, or is a parent class of it. The reverse conversion consists of a check that the object class actually implements the interface type, and of locating the proper dispatch table within the object class.
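The two-pointer record described above might be sketched in C roughly as follows; all type, field, and function names are invented for this illustration and do not come from a particular compiler.

#include <stdio.h>

/* A sketch of an interface value: one pointer to an interface dispatch
 * table, one pointer to the object itself. */
typedef int (*compare_fn)(void *self, void *other);

struct interface_value {
    compare_fn *itable;   /* interface dispatch table */
    void       *object;   /* the object implementing the interface */
};

/* A toy object class that "implements" the interface. */
struct point { int x; };

static int point_compare(void *self, void *other) {
    return ((struct point *)self)->x - ((struct point *)other)->x;
}

static compare_fn point_itable[] = { point_compare };

int main(void) {
    struct point a = { 3 }, b = { 5 };
    struct interface_value iv = { point_itable, &a };
    /* Method invocation goes through the interface dispatch table. */
    printf("%d\n", iv.itable[0](iv.object, &b));   /* prints -2 */
    return 0;
}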
Routines have been with us since the first programs were written and will probably continue to serve us for a very long time. It is therefore amazing to see how complicated these seemingly basic entities are. A routine call is a combination, a successful combination one must admit, of at least four loosely related features:

1. supplying a new computing environment containing at least some temporary memory, the local variables;
2. passing information to the new environment, the parameters;
3. transfer of the flow of control to the new environment, with - in principle - guaranteed return;
4. returning information from the new environment, the return value(s).

Some of these features are available in isolation in some languages. Creating a new environment (1) is available in C and many other languages as block entrance. Code that can be transferred to with guaranteed return (3) without creating a new environment is known as a 'refinement' (Bal and Grune, 1994). But the package deal of the routine call has been far more influential.
The list of the four features above shows that the basic ingredient of a routine activation is the new environment. The data structure supporting the new environment is the activation record, also called frame. An activation record holds the data pertinent to an invocation of a routine or object method; it represents an activated and not yet terminated routine. In particular, it contains user data - local variables, parameters, return values, register contents - and administration data - code addresses, pointers to other activation records. Managing the user data is discussed in Section 6.4.2.2.

In non-concurrent code, only one of the activation records represents a running routine; all the others are suspended: no instructions of those routine activations are being executed. The instruction being executed is located in the code of the running routine and the program counter, PC, points at it, or - more usually - just after it, depending on hardware conventions. The activation record of the running routine is indicated by a frame pointer, FP. The frame pointer, which usually resides in a dedicated register, is used at run time to access the contents of the activation record of the running routine; its use is explained in more detail in Section 6.4.2.2.

In this section we concern ourselves with allocating, deallocating, and otherwise organizing the activation records in efficient ways. In many languages, routines are activated in a strictly last-in-first-out order: when a routine A invokes a routine B, A cannot continue until B has finished. For such languages a stack is the preferred allocation scheme for activation records. In other languages complications exist, due to features like nested routines, iterators, coroutines, routines passed as parameters, routines returned as routine results, partially parameterized calls, non-local gotos, and continuations. Such features complicate the use of a stack for the activation records. In our implementations, we will try to use stack allocation as long as possible, for efficiency reasons.

We will first discuss the contents of activation records. Next, we consider several forms of routines and make an inventory of the operations available on routines. Finally we consider implementations of these operations on the various forms of routines.
Although the contents of an activation record depend on the source language, the compiler, and the target machine, the dependency is not a strong one, and all types of activation records have much in common. They almost invariably include the following components:
- the local variables of the routine;
- the parameters of the routine;
- the working stack; and
- an administration part.
The first three components correspond closely to the local variables, parameters, and intermediate results in the source program. These components will be examined in more detail in Section 6.4.2.2, where we discuss the actual routine call. Some aspects of the administration part are discussed further on in this subsection.

Figure 6.22 shows a possible structure for an activation record. The peculiar order of the components is helpful in the creation of activation records in routine calls and in addressing the user values they contain. It is the most natural order on most machines and it is the order used in our treatment of routine calls in Section 6.4.2.2. Other orders are possible, though, and may have their advantages occasionally. An activation record is accessed through its frame pointer. In our treatment this frame pointer points to the last byte of the administration part, just before the first local variable. This is convenient since it allows addressing local variable k by FP + offset(k) and parameter p by FP + sizeof(administration part) + offset(p), regardless of how many parameters or local variables there are. The offsets of local variables are negative, those of the parameters positive. The direction of low to high addresses is in accordance with that on most machines.

The exact contents of the administration part of an activation record A resulting from the invocation of a routine R are machine- and implementation-dependent. They always include either return or continuation information, and a 'dynamic link'; they may include a 'lexical pointer'. In some implementations, the administration part also contains copies of values kept in registers, but since these again represent local variables, parameters, and intermediate results from the source program, their treatment is also deferred to Section 6.4.2.2.

Return information consists of a return address, which is the code address in the caller to which the flow of control will return when R terminates. Continuation information consists of a continuation address, which is the code address in R at which execution will continue when R is resumed. Which of the two is present depends on the implementation, as explained in the next few paragraphs. In both cases the dynamic link is the frame pointer of the caller of R.

The issue of storing return addresses in the activation records of callees or continuation addresses in the activation records of callers may be confusing and a few words are in order. In principle, any operation that suspends a routine R for whatever reason can store the continuation (resumption) information in the activation record of R; the information can then serve to continue R when that is required. Such operations include calling another routine, suspending in an iterator, transferring to another coroutine, and even a thread
Figure 6.22 Possible structure of an activation record. (From high to low addresses: the parameters; the administration part with lexical pointer, return information, dynamic link, and saved registers and miscellaneous data, the end of which the frame pointer FP indicates; the local variables; and the working stack, topped by the stack pointer SP.)
switch (Section 9.12). Storing continuation information about R in the activation record of R has the advantage of keeping R as a conceptual unity. Storing return information in the activation record of the callee has a much more limited applicability: it supports only routine calling. The reason is simple: only the callee can get at the information and the only thing it can do with it is to use it to return to the caller. Hardware support is, however, much stronger for storing return information than for storing continuation information, and in virtually all implementations the call to a routine S stores return information in the activation record of S.

The administration part may contain another important entry: the 'lexical pointer', also called 'static link'; this is the frame pointer of the enclosing visible scope. The lexical pointer allows access to the local variables of the lexically enclosing routine and to its lexical pointer, as will be explained in Section 6.3.5. For some languages, no lexical pointer is needed. In C, for example, routines can only be defined on the top level, so a routine has only one enclosing visible scope, the global scope, in addition to its own local scope. The frame pointer of the global activation record is constant, though, and need not be stored; it can be incorporated into the machine addresses of the global items. The frame pointer of the local activation record is available as the current frame pointer FP.

Activation records can be allocated on a stack or on the heap. Stack allocation is cheaper but limits the usability of the activation records. As said before, we will try to use stack allocation as long as possible in our implementation.
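As a rough illustration only, the administration part of Figure 6.22 could be pictured in C as a struct along the following lines; the field names and their order are assumptions made for this sketch, not a prescribed layout.

/* Illustrative only: one possible shape of the administration part of an
 * activation record. A real compiler chooses offsets to suit its own
 * calling conventions. */
struct administration_part {
    void *lexical_pointer;   /* FP of the lexically enclosing activation
                                record, if the language needs one */
    void *return_address;    /* or a continuation address, depending on the
                                implementation */
    void *dynamic_link;      /* FP of the caller */
    /* saved registers and miscellaneous data may follow */
};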
A routine is just a piece of code, reachable through a pointer to its first instruction, its code address. When a routine R is called (invoked, activated) an activation record is created for it and the flow of control is transferred to its first instruction by setting the program counter to the routine code address; the routine is now running. When R calls a routine S, an activation record for S is created and the flow of control is transferred to the first instruction of S; S is now running and R is suspended. R is the parent of S, and S is the child of R. When S finishes, it is terminated and R is resumed. The ancestors of a routine R are defined as the parent of R and the ancestors of the parent of R. Routines that have activation records are called active. A routine can be active more than once simultaneously, in which case it is recursive; only one invocation can be running at any given time. When a routine returns, its activation record is removed. When the last activation record of a routine has been removed, the routine becomes inactive again.

The above is the behavior of the classical subroutine; there are several other kinds of routines, which exhibit additional features. The simplest is the iterator, a routine that can suspend temporarily and return to its parent without losing its activation record. This allows the iterator to continue where it suspended itself, when it is called again. As a result, the iterator can yield a succession of values; hence its name. The temporary return statement is usually called 'suspend' or 'yield'. An example application of an iterator in C notation is given in Figure 6.23. The iterator get_next_int() is started by the first call to it. After having yielded an 8 upon the first call and a 3 upon the second, it yields a 0, after which it terminates, due to the implicit return at the end. The returned value 0 stops the while loop. In languages that lack iterators, programmers usually implement them by using global variables or by introducing additional parameters, as shown for example in Section 1.4.2, but the general implementation requires the retention of the activation record. A well-known example of an iterator is the UNIX routine getchar().

A third variety of routines is the coroutine. Like the iterator it can suspend itself, but unlike the iterator suspending does not imply a return to the parent but rather transfers the control to a named coroutine, which is then resumed. This form of flow of control is often used in simulation programs, and was introduced by Simula 67 (Birtwistle et al., 1975). There the coroutine transfer statement is called 'resume'. The statement resume(C) in a coroutine X leaves X temporarily and resumes the coroutine C at the same point the last resume statement in C left C temporarily. A simplified example application in a C-like notation concerns producer/consumer communication, as shown in Figure 6.24. The example shown there is overly simple, since in practice the resume(Consumer) statements may be hidden in routines called directly or indirectly by Producer(), and a subsequent resume(Producer) must continue inside those calls, with the complete environments intact. This requires retaining the activation records of those calls and all calls that lead to them.

Independently of these variants, routines can be global or nested. The code in a global routine only has access to global entities and to its own local entities. The code in a nested routine, declared on the same level as local variables, has access to these same
void use_iterator(void) {
    int i;

    while ((i = get_next_int()) > 0) {
        printf("%d\n", i);
    }
}

int get_next_int(void) {
    yield 8;
    yield 3;
    yield 0;
}

Figure 6.23 An example application of an iterator in C notation.
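The remark above that, in languages without iterators, programmers fall back on global variables or extra parameters can be illustrated with a plain-C emulation of get_next_int(); the static counter below stands in for the retained activation record. This is an illustration of ours, not code from the book.

#include <stdio.h>

/* Emulating the iterator of Figure 6.23 in plain C: the static variable
 * 'state' plays the role of the retained activation record. */
static int get_next_int(void) {
    static int state = 0;
    switch (state++) {
    case 0: return 8;
    case 1: return 3;
    default: return 0;
    }
}

int main(void) {
    int i;
    while ((i = get_next_int()) > 0) {
        printf("%d\n", i);
    }
    return 0;
}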
char buffer[100];

void Producer(void) {
    while (produce_buffer()) {
        resume(Consumer);
    }
    empty_buffer();              /* signal end of stream */
    resume(Consumer);
}

void Consumer(void) {
    resume(Producer);
    while (!empty_buffer_received()) {
        consume_buffer();
        resume(Producer);
    }
}

Figure 6.24 Simplified producer/consumer communication using coroutines.
global and strictly local environments, but also to entities declared in lexically intervening routines, as shown in Figure 6.25. Since j, k, and l reside in different activation records, providing access to them requires some thought.
In addition to declaring a routine, which specifies its name, parameter types, and return type and which is a compile-time action, several operations of increasing complexity can be performed on routines.
int i;
void level_0(void) {
    int j;
    void level_1(void) {
        int k;
        void level_2(void) {
            int l;
            ...                  /* code has access to i, j, k, l */
            k = l; j = l;
        }
        ...                      /* code has access to i, j, k */
        j = k;
    }
    ...                          /* code has access to i, j */
}

Figure 6.25 Nested routines in C notation.
First of all, a routine can be defined. This differs from declaring a routine in that defining it supplies the code of the routine body. Also, defining may be a run-time action: when the routine level_1() in Figure 6.25 is recursive, each recursive invocation defines a different routine level_2(), each having access to a different incarnation of variable k. A routine definition results in a defined routine, possibly represented at run time as a routine value.

No doubt the most important operation on a defined routine is calling it. Calling a defined routine creates an activation record for it and transfers control to its code. Details of routine calling, which include parameter passing, passing back a return value, and returning from the routine, are covered in Section 6.4.2.2.

Once a defined routine is seen as a value, two more operations on it become possible: passing it as a parameter to a routine in a call, and returning it as a value from a routine call. An important difference between these two operations is that passing a routine value as a parameter introduces the value into a smaller pointer scope, whereas returning a routine as a value introduces the value into a larger scope. (Pointer scopes were treated in Section 6.2.3.2.) Consequently, returning routines as values is fundamentally more difficult to implement than passing them as parameters. It turns out that once we have implemented returning routines as values, we can also store them in arbitrary data structures, more in particular in global variables. Routines passed as parameters are important in the implementation of logic languages; routines stored in data structures are essential for functional languages, but also occur in some imperative languages, for example Icon and
Algol 68. Routine values resulting from these two operations must allow the same operations as any other defined routine. In particular, it must be possible to call them, to pass them on as parameters, and to return them as values.

A less frequent operation is jumping out of a routine. The destination of such a jump, also called a non-local goto, is a non-local label. A non-local label is a label in another routine. A variation on the code from Figure 6.25 is shown in Figure 6.26. The routine level_2() contains a non-local goto statement to label L_1 in routine level_1(). The goto L_1 statement terminates the activation of level_2() and transfers control to L_1 in routine level_1(); if level_2() is recursive, several invocations of level_2() will have to be terminated. If level_1() is recursive, each incarnation defines a different non-local label L_1.

In addition to being directly visible, the non-local label can also be passed to the running routine as a parameter. When the non-local label is returned as a value or stored in a data structure with a scope not smaller than that of the label itself, it is called a continuation. Continuations allow side branches of computations to be resumed, and support a remarkable programming paradigm (Appel, 1992).

void level_0(void) {
    void level_1(void) {
        void level_2(void) {
            ...
            goto L_1;
            ...
        }
        ...
        L_1: ...
        ...
    }
    ...
}

Figure 6.26 Example of a non-local goto.
Designing a non-local goto mechanism requires finding a representation for the non-local label that supports transfer of control, being passed as a parameter, and being returned as a value. Note that this representation does not need to be the same as that of a routine value.

The last operation on routines we will discuss in this section is partial parameterization. In partial parameterization, one or more actual parameters are supplied to a defined routine, but the routine is not called, even if all actual parameters have been supplied.
Instead, a new defined routine results, with n-m parameters, if the original routine had n parameters, of which m have been supplied; again, m can be equal to n. An example in C notation would be:

extern int add(int i, int j);    /* yields i + j */
int (*inc)(int i);               /* a routine variable inc */

int main(void) {
    ...
    inc = add(, 1);              /* supply the second parameter */
    ...
    printf("%d\n", inc(5));
}
in which an external routine add(int, int) is parameterized with 1 as its second parameter, to yield a new one-parameter routine, inc(). No call is involved, just the creation of a new defined routine. The last line of the code calls inc() with one parameter, 5, to print the result 6. Partial parameterization is also called 'currying', although this term is more often used more specifically for partial parameterization in which just the first parameter is supplied. We will often prefer the term 'curried routine' over the somewhat unwieldy term 'partially parameterized routine'. Currying plays a large role in functional languages. Since a curried routine is just an ordinary routine, it is important that curried routines be implemented so that all operations available on routines are available on them. This includes calling, further currying, being a parameter, being a return value, and perhaps others.
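Standard C cannot express add(, 1) directly; as a rough illustration of what a compiler must arrange behind the scenes, the supplied argument can be captured in a small record. The names below are invented for this sketch.

#include <stdio.h>

/* A hand-written emulation of inc = add(, 1): the supplied second argument
 * is stored next to the code address. */
static int add(int i, int j) { return i + j; }

typedef struct {
    int (*code)(int i, int j);   /* the original two-parameter routine */
    int j;                       /* the captured second parameter */
} add_curried;

static int call_curried(add_curried c, int i) { return c.code(i, c.j); }

int main(void) {
    add_curried inc = { add, 1 };          /* supply the second parameter */
    printf("%d\n", call_curried(inc, 5));  /* prints 6 */
    return 0;
}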
We will first discuss the implementation of non-nested routines, since they are simpler than nested routines. Non-nested routines can be implemented using stack allocation exclusively, except when currying is among the required operations.

A non-nested routine is represented at run time simply by the start address of its code. When called, a new activation record is stacked, as described in Section 6.4.2.2. If the code address is known at compile time, a routine call instruction can be used to transfer control to the routine; if the address results from run-time computation, an indirect routine call instruction must be used.

A non-nested running routine has access to two environments only: the global data area and its own activation record. The global data area is addressed directly. The routine's own activation record is accessible through the frame pointer FP, which is kept pointing to the activation record of the running routine. Direct addressing and the frame pointer together provide access to the complete environment of a running non-nested routine.

A non-nested routine can be passed on as a parameter or returned as a value by just passing on or returning its code address. The same operations are possible on this passed or returned routine as on the original routine, since in both cases the code address is all that is needed.
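In C terms, 'just passing the code address' is ordinary function-pointer passing, as the short illustration below shows; the function names are ours, not the book's.

#include <stdio.h>

/* For a non-nested routine, the code address is the entire routine value:
 * passing the routine as a parameter is passing a function pointer. */
static int square(int x) { return x * x; }

static void apply_and_print(int (*routine)(int), int argument) {
    printf("%d\n", routine(argument));
}

int main(void) {
    apply_and_print(square, 7);   /* prints 49 */
    return 0;
}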
Jumping out of a non-nested routine is not a natural concept, since besides the routine's own code there is no other syntactically visible code to jump to. Still, it is occasionally useful to terminate a running routine and transfer control to a marked code location in an ancestor routine. Two possible applications are: stopping a recursive search when an answer has been found; and handling exceptions (Section 6.4.3.2). The C programming language has the setjmp/longjmp mechanism for this. A call to the built-in routine setjmp(env) saves information about its code position and stack environment in a 'jump buffer' pointed to by the parameter env, and returns 0; it marks a possible place in the execution of the program to which control may be transferred by performing the non-local goto. A later call to the built-in routine longjmp(env, val) restores the environment saved by the last call of setjmp(env), and returns from the call to setjmp() as if it returned val. This effectively implements a non-local goto, with the jump buffer representing the non-local label. A condition for the proper functioning of this mechanism is that the routine that called setjmp must still be active at the time the corresponding longjmp is called.

The mechanism is demonstrated in Figure 6.27. The routine find_div_7() implements a recursive search for a number divisible by 7, and is symbolic for any such search process. When a number divisible by 7 has been found, a longjmp is performed to the label (*jmpbuf_ptr) passed as a parameter; otherwise the search continues with the next higher number. Without the longjmp call, the search recurses into infinity.
#include <stdio.h>
#include <setjmp.h>

void find_div_7(int n, jmp_buf *jmpbuf_ptr) {
    if (n % 7 == 0) longjmp(*jmpbuf_ptr, n);
    find_div_7(n + 1, jmpbuf_ptr);
}

int main(void) {
    jmp_buf jmpbuf;              /* type defined in setjmp.h */
    int return_value;

    if ((return_value = setjmp(jmpbuf)) == 0) {
        /* setting up the label for longjmp() lands here */
        find_div_7(1, &jmpbuf);
    }
    else {
        /* returning from a call of longjmp() lands here */
        printf("Answer = %d\n", return_value);
    }
    return 0;
}

Figure 6.27 Demonstration of the setjmp/longjmp mechanism.
The driver establishes the 'non-local label' by calling setjmp(); the actual label is not textually visible and is located after the else. The driver then initiates the search, starting at 1; the non-local label is passed as a parameter. When the solution is found, find_div_7 performs the non-local goto, which lands at the else branch.

Note that the traditional C programming technique of allocating the jmp_buf data structure among the global variables constitutes a violation of the pointer scope rules. The jump buffer will contain pointers to the activation record of the routine that fills it, so its scope is smaller than that of the global data area. If the jump buffer is filled in the routine main() in C the problem disappears, since there is no code on a global level that could access the jump buffer, but if the jump buffer is filled in a subroutine, a pointer scope violation can easily occur, resulting in a jump to a routine that has already been terminated. Passing the jump buffer address as a parameter to all interested routines as in Figure 6.27 solves the problem and is safe, but annoying.

Now that we have seen the feature and its use, we turn to its implementation. The implementation of setjmp(env) must at least save the frame pointer of its caller and its own return address in the jump buffer env. The implementation of longjmp(env, val) retrieves the destination activation record frame pointer and the return address from the jump buffer env. It then unstacks activation records until it finds the destination activation record and transfers to the return address. The implementation must also deliver val in the function result register.

Currying of non-nested routines cannot be implemented substantially more simply than that of nested ones. We will therefore postpone its discussion to the next section.
Not all operations on nested routines can be implemented using stack allocation for the activation records, but much can still be done to preserve the stack regime.

In addition to the usual code address, the routine descriptor used to represent a defined nested routine R must contain enough information to provide access to the data that are visible from the point of its definition. These are the constants, variables, parameters, routines, etc., of the lexically enclosing routines of R, and they reside in activation records of these routines. The straightforward way to provide this access is to include the frame pointer of the invocation of the routine in which R is defined (another solution, using 'closures', is discussed in Section 6.3.5.5). This pointer is called the lexical pointer. Returning to Figure 6.25, the descriptor of the routine level_2() consists of the code address of level_2() and a lexical pointer, the frame pointer of the enclosing routine level_1(). Figure 6.28 shows such a two-pointer routine descriptor.

Several points are worth noting here. The first is that it is conceptually convenient to imagine the definition of a routine to correspond to run-time code, which produces a correctly filled local routine descriptor in the activation record, just as the definition int i = 5; produces a correctly filled local integer variable. Figure 6.29 shows possible code for the construction of a routine descriptor for the routine level_2() from Figure 6.25. Further optimization may of course render the explicit construction of the routine descriptor superfluous, just as constant propagation can remove the allocation of i from
Figure 6.28 A routine descriptor for a language that requires lexical pointers.
the program if i turns out not to be modified anywhere. Also, the value level_2_as_a_value could be constructed on the fly when calling routine A() in Figure 6.29.

void level_1(void) {
    int k;
    void level_2(void) {
        int l;
        ...
    }
    ...
    routine_descriptor level_2_as_a_value = {
        FP_of_this_activation_record(),   /* FP of level_1() */
        level_2                           /* code address of level_2() */
    };
    ...
    A(level_2_as_a_value);                /* level_2() as a parameter */
}

Figure 6.29 Possible code for the construction of a routine descriptor.
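In C terms, the two-pointer descriptor of Figure 6.28 might be declared roughly as follows; the type and field names are ours, chosen for this sketch only.

/* A plausible rendering of the two-pointer routine descriptor: a lexical
 * pointer plus a code address. */
typedef struct {
    void *lexical_pointer;        /* FP of the defining routine's activation record */
    void (*code_address)(void);   /* entry point of the nested routine */
} routine_descriptor;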
A second point is that if level_1() is recursive, each incarnation has a different activation record with a different address, so the routine descriptors for the different level_2()s in them differ, as they should.

Another point is that when nested routines have a two-pointer descriptor, then so must the non-nested ones, for reasons of uniformity. The top-level routine descriptors can get a null lexical pointer, since all data visible from outside a top-level routine is accessible by direct addressing and the lexical pointer will never be consulted.

Since the code of routine level_2() has access not only to the data of level_1() but also to those of level_0(), it would seem that supplying a lexical pointer to just the activation record of level_1() is not enough. We will see now, however, that it is.
When a routine R defined by a two-pointer routine descriptor D is called, a new activation record is created; the present program counter PC, the frame pointer FP, and the lexical pointer from D are stored in the administration area of the new activation record; FP is made to point to the new activation record; and control is transferred to the code address from D. See Figure 6.30.
Figure 6.30 Calling a routine defined by the two-pointer routine descriptor D (a. before calling R; b. after calling R).
As said, the point of having a lexical pointer is the access it allows to all lexically enclosing environments. We will now first see how the access to l and k in the statement k = l in Figure 6.25 can be obtained. The variable l is located in the activation record of the running routine, so it can be reached through the frame pointer: l is *(FP + offset(l)), where offset(X) is the offset of entry X from the frame pointer of the activation record. The variable k is located in the activation record of the immediately enclosing routine, which can be reached through the lexical pointer, which in turn can be found through the frame pointer: k is *(*(FP + offset(lexical_pointer)) + offset(k)) in routine level_2(). (Of course, k is *(FP + offset(k)) in routine level_1().) So the assignment k = l is translated to intermediate code as shown in Figure 6.31. The translation of the statement j = l is similar, except that j must be found by following the lexical pointer twice (Figure 6.32). These translations may look imposing, but BURS techniques can often find good code for such forms, by exploiting advanced addressing modes.
*(*(FP + offset(lexical_pointer)) + offset(k)) = *(FP + offset(l))

Figure 6.31 Intermediate code for the non-local assignment k = l.
We see that storing the lexical pointer to the activation record of the lexically enclosing routine in the activation record of the running routine builds a linked list of those activation records that hold the data visible from the running routine. The length of the list is equal to the lexical nesting depth of the running routine. The number of times the lexical pointer must be followed to reach an entry in a routine R from a routine S is equal to the difference in lexical nesting depth between S and R. It is therefore convenient to represent local addresses in the compiler as pairs of nesting difference and offset; since the nesting difference cannot be negative, a value of -1 can be used to code direct addressing. So, inside routine level_2(), l is represented as [0, offset(l)], k as [1, offset(k)], j as [2, offset(j)], and i as [-1, offset(i)]. Note that the nesting difference is a compile-time constant.
*(*(*(FP + offset(lexical_pointer)) + offset(lexical_pointer)) + offset(j)) = *(FP + offset(l))

Figure 6.32 Intermediate code for the non-local assignment j = l.
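A run-time helper that resolves such [nesting difference, offset] pairs could look roughly like the sketch below; the position of the lexical pointer within the activation record is an assumption of this sketch, and a real compiler would emit the chain of dereferences inline rather than call a routine.

/* Illustrative only: follow the lexical pointer 'nesting_difference' times,
 * then add the offset to obtain the address of the variable. */
#define LEXICAL_POINTER_OFFSET 0   /* assumed position within the activation record */

static char *address_of(char *fp, int nesting_difference, int offset) {
    while (nesting_difference-- > 0) {
        fp = *(char **)(fp + LEXICAL_POINTER_OFFSET);
    }
    return fp + offset;
}
/* With this helper, the assignment k = l of Figure 6.31 becomes
 *   *(int *)address_of(fp, 1, offset_k) = *(int *)address_of(fp, 0, offset_l);
 */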
Passing a nested routine as a parameter is simple now: just pass the two-pointer descriptor. No matter to what static or dynamic depth the routine level_2() is passed on, when it is finally called the above calling scheme constructs an activation record with a lexical pointer that indicates the activation record of level_1() and thus restores the proper environment for a call of level_2(); see Figure 6.33. Since the two-pointer routine descriptor contains a pointer of possibly limited scope, the lexical pointer, we have to consider the pointer scope rules. The scope of the routine descriptor is the same as that of the activation record in which the routine was declared. When passing the routine descriptor to a child routine, it is passed into an environment of smaller scope, so no scope violation can occur, regardless of whether we use a heap or a stack for the activation records.
Returning a nested routine as a value is equally simple: just return the two-pointer descriptor. Now, however, the two-pointer descriptor is passed to an environment of potentially larger scope, so there is the danger of a pointer scope violation. The violation will indeed occur under a stack regime: when routine level_1() returns routine level_2() as a value to its parent level_0(), and this parent calls the returned routine, the call will construct an activation record for level_2() whose lexical pointer refers to the activation record of level_1(), which is long gone! We see that returning a nested routine as a value is incompatible with the stack regime; it requires the activation records to be allocated on the heap.

Heap allocation indeed solves the problem. Since all environments have the same infinite scope, no data is introduced into environments of larger scope, and no pointer scope violation can occur. More in detail, the activation record of the call of routine
Figure 6.33 Passing a nested routine as a parameter and calling it.
level_1() in our example will be retained automatically after the call has terminated, since it is still accessible from the program data area: FP, which is in the root set, points to the activation record of level_0(), which contains the routine value level_2(), which contains the frame pointer of level_1() as the lexical pointer. Such operations may seem weird in a C-like language, but are commonplace in functional languages, and in some imperative languages, for example Icon and some variants of Algol 68.
The main operation a non-local label in a given routine invocation I must support is the non-local goto. Performing a non-local goto to this label terminates zero or more routine invocations until routine invocation I surfaces, and then transfers control to the local label. This can be implemented by representing the non-local label as a two-pointer descriptor containing the frame pointer of the routine invocation that holds the label, and the code address of the label. The implementation of a non-local goto to a non-local label L must then travel back through the activation record chain as defined by the dynamic links, until it finds the activation record with a frame pointer that is equal to that in L. Each activation
record met on the way must be released, since the activation of the corresponding routine is terminated implicitly by the non-local goto. Once the proper activation record has been found, FP is made to point to it and execution proceeds at the code address indicated in L. If routine level_1() in Figure 6.26 is recursive, each invocation defines a different label L_1. Their representations differ by pointing to different activation records; performing non-local gotos on them will terminate different numbers of active routines.

The two-pointer non-local label descriptor can be passed as a parameter or returned as a value in the same way as a routine descriptor can, and the same pointer scope considerations hold. The two are, however, fundamentally different: a routine descriptor holds a pointer to an activation record that will be the parent of a new running routine and a code address to be jumped to by a routine call instruction; a non-local label descriptor holds a pointer to an activation record that will itself be running again and a code address to be jumped to by a jump instruction.

As with routine definitions, it is often convenient to allocate the non-local label descriptor as a constant entry in the activation record, as shown in Figure 6.34. Again, however, values like L_1_as_a_value may be constructed on the fly.

void level_1(void) {
    non_local_label_descriptor L_1_as_a_value = {
        FP_of_this_activation_record(),   /* FP of level_1() */
        L_1                               /* code address of L_1 */
    };
    void level_2(void) {
        ...
        non_local_goto(L_1_as_a_value);   /* goto L_1; */
        ...
    }
    ...
    L_1: ...
    ...
}

Figure 6.34 Possible code for the construction of a label descriptor.
The data structure representing a curried routine must be capable of holding an indeterminate number of actual parameters, so the above two-pointer descriptor does not suffice and will have to be extended. The extension consists of space for all its parameters, plus a mask indicating which parameters have already been filled in. Figure 6.35 shows the result of partially parameterizing a 5-parameter routine with its second and fourth parameter values.
Figure 6.35 A closure for a partially parameterized routine.
This representation of a partially parameterized routine is called a closure; unfortunately it is the same term as used in 'transitive closure' and 'closure algorithm', with which it has nothing to do. The closure can be allocated on the stack or on the heap, with the corresponding consequences for the pointer scope rules.

Once we have this representation, implementing the desired operations on it is simple. Currying a closure further is implemented by scanning the mask to find the locations of the free parameters and filling them in. The required type checking has already been done, since the curried routine just has a new routine type with a subset of the original parameters. If currying in the stricter sense of supplying only the first actual parameter is used, the mask can be replaced by a counter and finding the location of the first free parameter is trivial.

Calling a curried routine implies completing the closure by supplying the last of the parameters and then invoking it. The invocation of a completely curried routine can be done remarkably efficiently: allocate room for the activation record and block copy the closure into it. If the layout is chosen properly, the parameters and the lexical pointer are already in place, and the mask can be overwritten by the dynamic link; such a layout is already shown in Figure 6.35. Note that it is possible to have a closure that represents a fully parameterized routine, which has nevertheless not been invoked. Such a routine/closure acts as a parameterless routine, and can be treated as such.

Passing a curried routine as a parameter or returning it as a value can be implemented in the obvious way: pass a copy of the closure. Since the closure contains the required lexical pointer, it can be invoked in the same way as described above. This implementation of currying requires that all routines be treated as curried and that closures be used instead of two-pointer routine descriptors everywhere.
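A closure of this kind might be laid out in C along the lines below; the field names, the fixed parameter count, and the bit-mask encoding are assumptions made for the sketch, not the exact layout of Figure 6.35.

/* Illustrative layout of a closure for a routine with PARAM_COUNT parameters:
 * code address, lexical pointer, a mask of filled parameters, and the
 * parameter slots themselves. */
#define PARAM_COUNT 5

typedef struct {
    void     (*code_address)(void);
    void      *lexical_pointer;
    unsigned   filled_mask;              /* bit i set when parameter i is supplied */
    long       parameter[PARAM_COUNT];
} closure;

/* Currying: supply one more actual parameter. */
static void curry(closure *cl, int index, long value) {
    cl->parameter[index] = value;
    cl->filled_mask |= 1u << index;
}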
In this subsection we have considered representations for nested routines and implementations of the operations on them. Some of these operations, notably passing routines as parameters and returning them as values, give rise to new routines, and we have been careful to ensure that our implementations of the operations also work on them. Most operations on routines and non-local labels are compatible with a stack regime, except returning them as values, which requires heap-allocated activation records. Since heap allocation of the activation records slows down the calling mechanism, it is important to seek ways to avoid it and maintain a stack regime. We will now discuss such a way.
The problem with two-pointer routine descriptors is that they contain a lexical pointer, which points into the stack and which reduces their pointer scope. As a result, two-pointer routine descriptors cannot be moved around freely, which in turn limits their usability. One simple solution was given above: allocate all activation records on the heap. We will now consider another simple solution: do away with lexical pointers and make all routines global. This immediately reintroduces the original problem: how to gain access to the non-local data; but we now give a new answer: pass pointers to them as parameters.

Figure 6.36 shows the result of this transformation on the routines of Figure 6.25. At the top we see the innermost routine level_2(), which used to have access to two non-local non-global variables, j and k, and which now has two pointer parameters. When called (in level_1()), the actual parameters are supplied from j, which level_1() has received as a parameter, and the address of k. A similar explanation applies to the call of level_1() in level_0(). It is clear that the two assignments *k = l and *j = l assign the value of l to the proper locations.

The transformation shown here is called lambda lifting. The name derives from the lambda expressions in Lisp and other functional languages that are lifted to global level by it, but the technique has been in use with C programmers for ages. Lambda lifting has effectively rid us of two-pointer routine descriptors with their limited scopes, but the moment we try to pass a lifted routine as a parameter, we run into a new problem: how do we pass on the extra parameters that result from the lambda lifting? When passing level_2() as a parameter, how do we pass the j and &k that go with it?

There is a stunning answer to this question: use partial parameterization! Rather than passing the code address of level_2(), we pass a closure C containing the code address, j, and &k. Note that these closures, unlike the earlier closures, do not contain lexical pointers. When the routine parameter corresponding to C is finally called, the run-time system recognizes it as a closure and performs a closure invocation. The scope of closure C is determined by the scope of &k, which, unfortunately, is again that of the activation record of the lexically enclosing routine, level_1(). This still prevents the closure from being returned as a value, due to pointer scope problems.

This problem is solved by another drastic measure: all local data that is used non-locally is allocated on the heap, and the corresponding local entries are replaced by pointers. Figure 6.37 shows the final result. In particular, the closure passed in the call of A() has
int i;

void level_2(int *j, int *k) {
    int l;
    ...                      /* code has access to i, *j, *k, l */
    *k = l; *j = l;
}

void level_1(int *j) {
    int k;
    ...                      /* code has access to i, *j, k */
    level_2(j, &k);          /* was: level_2(); */
    A(level_2);              /* level_2() as a parameter: */
                             /* this is a problem */
}

void level_0(void) {
    int j;
    ...                      /* code has access to i, j */
    level_1(&j);             /* was: level_1(); */
}

Figure 6.36 Lambda-lifted routines in C notation.
infinite scope: it contains a code address and two heap pointers. Thus, the closure can be moved to any environment and called wherever needed. The activation record of level_1() may long have disappeared; the required 'locals' in it still exist on the heap.

We have now achieved an implementation in which the stack mechanism is used for the routine invocation administration and the strictly local variables, and in which variables that are used non-locally are allocated on the heap. Such an implementation is advantageous for languages in which efficient routine calling and free movement of routine variables are very important. The implementation is especially profitable for languages that require currying anyway; in short, for implementing functional languages and advanced imperative languages. The main property of the implementation is that routines are represented by closures that do not carry lexical pointers; this gives them the opportunity to acquire infinite scope.
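The closure built by the call A(closure(level_2, j, k)) in Figure 6.37 could be pictured as a small heap-allocated record; the helper below is a sketch with invented names, not the book's run-time system.

#include <stdlib.h>

/* Illustrative only: a heap-allocated closure for the lifted level_2(),
 * holding its code address and the two lifted pointer parameters. */
typedef struct {
    void (*code_address)(int *j, int *k);
    int  *j;
    int  *k;
} level_2_closure;

static level_2_closure *make_closure(void (*code)(int *, int *), int *j, int *k) {
    level_2_closure *cl = malloc(sizeof *cl);   /* error handling omitted */
    cl->code_address = code;
    cl->j = j;
    cl->k = k;
    return cl;
}

/* Invoking the closure later, in any environment:
 *   cl->code_address(cl->j, cl->k);
 */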
The above techniques give us enough material to implement both iterators and coroutines in a simple way. The two implementations are almost identical. The invocation of an iterator or coroutine creates an activation record in the usual way, and its address is stored in a variable, allocated for the purpose; the activation record will have to be allocated on the heap, unless the functionality of the iterator or coroutine is restricted severely. When
int i;

void level_2(int *j, int *k) {
    int l;
    ...                           /* code has access to i, *j, *k, l */
    *k = l; *j = l;
}

void level_1(int *j) {
    int *k = (int *)malloc(sizeof (int));
    ...                           /* code has access to i, *j, *k */
    level_2(j, k);                /* was: level_2(); */
    A(closure(level_2, j, k));    /* was: A(level_2); */
}

void level_0(void) {
    int *j = (int *)malloc(sizeof (int));
    ...                           /* code has access to i, *j */
    level_1(j);                   /* was: level_1(); */
}

Figure 6.37 Lambda-lifted routines with additional heap allocation in C notation.
either is suspended, continuation information is stored in that activation record. The iterator returns temporarily to its caller, whose activation record can be found by following the dynamic link; the coroutine transfers to another coroutine, whose activation record can be found in its coroutine variable. When the iterator or coroutine is resumed, the continuation information is retrieved and acted upon. When the iterator or coroutine terminates, its activation record is deleted, and the iterator or coroutine variable zeroed.

This concludes our discussion of non-nested and nested routines and the implementation of several operations applicable to them. The basic data structures in all the implementations are the activation record and the routine representation. The latter exists in two variants, the two-pointer routine descriptor and the closure. The detailed allocation design decisions are based on the pointer scope rules explained in Section 6.2.3.2.
In Section 4.2, we discussed code generation for expressions and basic blocks. In this section, we will concentrate on code generation for statements that affect the flow of control, and thus demarcate the basic blocks. Three levels of flow of control can be distinguished:
- local flow of control, which determines the statement inside a routine or method to be executed next (Section 6.4.1);
- routine calls and method invocations, which perform the parameter transfer and flow-of-control manipulation needed to activate a new routine (Section 6.4.2);
- non-local jumps, which transfer the flow of control out of the currently running routine into an ancestor routine (Section 6.4.2.3).

We assume that all source code expressions, with one exception, have already been evaluated and that the results have been stored in an appropriate place, usually a register. The exception is formed by Boolean expressions used for flow control, for example the control expressions in if-statements; they are treated separately in Section 6.4.1.1. In addition, we assume the existence of a mechanism in the compiler for allocating temporary variables and labels. This description is intentionally not very precise; details always depend on the source and target languages.

To describe the code generation for the flow of control, we will use the four statement types below, written in a Pascal-like notation. Each has a straightforward equivalent on virtually all processors.
- a simple goto statement: GOTO label;
- an indirect goto statement: GOTO label register;
- a conditional goto statement, in two forms: IF condition register THEN GOTO label and IF NOT condition register THEN GOTO label;
- an assignment statement: destination := source.

We will sometimes write simple expressions where registers appear in the above statements, especially when these expressions derive from the code generation mechanism itself rather than from the source code; this increases the readability of the code samples.
The two main mechanisms for influencing the local flow of control in imperative and object-oriented languages are 'selection' and 'repetition'. Selection causes a piece of code to be selected for execution, based on the value of some expression. Repetition causes a piece of code to be executed zero or more times, based on the value of some expression or expressions. More often than not these expressions are Boolean expressions, and in many cases it is useful to translate them in special ways. We will therefore first consider code generation for Boolean expressions used in controlling program execution, and then turn to code for selection and repetition statements.
Boolean expressions are no different from other expressions: evaluating one yields a Boolean value, on the stack or in a register. Most Boolean expressions are, however, used to affect the flow of control rather than to produce a value; we will call Boolean expressions used in that way Boolean control expressions. There are two reasons to treat Boolean control expressions specially.

The first reason has to do with two properties of machine instructions. Boolean expressions often consist of comparisons, and the comparison instructions of most machines produce their results in special condition registers in a special format rather than as 0/1 integer values; so an additional conversion is required to obtain a genuine Boolean value. And the most usual way of affecting the flow of control is by using a conditional jump instruction; the machine instructions for conditional jumps base their decisions on condition registers rather than on 0/1 integer values; so jumping on a Boolean value requires an additional conversion of the Boolean value to a value in a condition register. It will, however, be clear that the naive code sequence for Boolean control expressions
/* code for the Boolean expression: */
comparison code, yielding a condition value
conversion from condition value to Boolean
/* code for the conditional jump: */
conversion from Boolean to condition value
jump on condition value

is to be avoided.

The second reason is related to a property of some programming languages. Several programming languages (for example C, Ada, Java) feature lazy Boolean operators, operators that evaluate operands only when their value is needed. Examples are the && and || operators in C. Such operators do not fit the translation model discussed in Sections 4.2.4.1 and 4.2.4.2: expr1 && expr2 cannot be translated as

code to compute expr1 in loc1
code to compute expr2 in loc2
code for the && operator on loc1 and loc2
since that would result in the unconditional evaluation of both expressions. Instead, code intermingled with conditional jumps must be generated; again these conditional jumps react to values in condition registers. In short, Boolean control expressions are tightly interrelated with conditional jumping.

This relationship can be exploited conveniently when we know the labels to which control must be transferred when the Boolean expression yields true or false, before we generate the code. We can then use a code generation technique like the one shown in Figure 6.38. The procedure Generate code for Boolean control expression gets two parameters, True label and False label, in addition to the usual Node pointer. A special value No label is available for these parameters, to indicate that control must continue at the end of the expression: the control must 'fall through' to the end. We assume a single condition register here, although most machines have several of them, with assorted semantics.

If the node represents a (numeric) comparison operator, we generate code for this operator; this leaves the result in the condition register. Then, depending on the presence or absence of the true and false labels, we generate zero, one, or two jump instructions.

The use of the value No label is shown in the entry for the lazy && operator. First we generate code for the expression before the && operator, such that when the expression fails control is transferred to the False label and when the expression succeeds, control falls through, to reach the code for the expression after the && operator. Similar entries can be
PROCEDURE Generate code for Boolean control expression (
    Node, True label, False label
):
    SELECT Node .type:
        CASE Comparison type:               // <, >, ==, etc. in C
            Generate code for comparison expression (Node .expr);
            // The comparison result is now in the condition register
            IF True label /= No label:
                Emit ("IF condition register THEN GOTO" True label);
                IF False label /= No label:
                    Emit ("GOTO" False label);
            ELSE True label = No label:
                IF False label /= No label:
                    Emit ("IF NOT condition register THEN GOTO" False label);
        CASE Lazy and type:                 // the && in C
            Generate code for Boolean control expression
                (Node .left, No label, False label);
            Generate code for Boolean control expression
                (Node .right, True label, False label);
        ...
        CASE Negation type:                 // the ! in C
            Generate code for Boolean control expression
                (Node .expr, False label, True label);

Figure 6.38 Code generation for Boolean expressions.
control falls through, to reach the expression after the && operator. Similar entries can be c o ~ s t ~ c t for e d the I I and ? :operators of C. The last entry in Figure 6.38 shows that the i ~ p l e ~ e n t a t i oofn the negation operator comes free of charge: we just swap the true and false labels. As an example, thecall Generate code for Boolean control expression ( Parse ( I ' i > 0 & & j =. O t t ) , No label, Else label ):
in which we assume that Parse (string) produces the parse tree for string,yields the code sequence Compare-greater i, 0 IF NOTc ~ ? ~ ~ i register t i ~ n THEN Compare-greater j , 0 IF NOT c ~ ~ n ~ register ~ t i ~ nTHEN
GOTO lseE GOTO
Else
labe l label
There are also occasions when we have to construct a genuine Boolean value, for example to assign it to a variable or to pass it as a parameter. We can then use conversion
instructio~sif the target machine has them, or use the above scheme to produce code to jump to places where the proper values are constructed. S
The two most c o m ~ o nselection statements are the if-state~entand the case statement. The i f - s t a t e ~ ~ selects nt ne of twostatementsequences(one ofwhichmaybeabsent), based on the value of a oolean expression; the case statement (also sometimes called a ‘switchstatement’or a ‘selectionstatement’)selectsoneout of several statem~nt of an integer or enumeration expression. sequences, based on the value
~~e ~ - s t ~ t e ~ e ~ t The general form of an i IF Boolean e x ~ r e ~ ~ ~THEN s i o n true
f sequence
ELSE false seque~ceEND IF;
which results in the AST
Boolean e x ~ ~ e s ~ ~ i o nsequence true
.false sequence
Code generation for an if-statement is simple: allocate two new labels, fa1se-labe l -labe l,and generate the following code: Boolean control code( ~ o o l e a ne x ~ r e s s i o n / 0 code fortrue seque~ce GOT0 end-label; false-label: code forfalse seque~ce end-label:
I
false-label)
oolean control code is thecodegenerated by theprocedure Generate code for Boolean control expression withtheparametersshown. If the else-part is absent (or empty), the fa1se-labe l is not needed and we generate Boolean control code (Boole~n ex~ression/ 0 , end-label) code fortrue sequence end-label:
e case s t a t e ~ ~ ~ t We will consider se S
CASE case expression IN I , : state~entsequence I
.I..
I, : ~statelnentsequence, ELSE else -state~entsequence END CASE;
where I I ...I, are case labels -integer or enumeration values representing compile-time constants. The expression case expression must be evaluated. If its value is equal to one of the values I I ,...,I,, the corresponding statement sequence is executed. If not, the statement sequenceof the else-part is executed. here are many code generation schemes for case statements and it is the task of the c o ~ p i l e rto select an efficient one for the statement at hand. The choice depends on the number of case entries, the range (or reach) of the case labels, and on the density o f the case labels within the range. he followi~gis a simple scheme that works well when there are only a few case entries (n being small, say 10 or less). First, nt-2 labels are allocated: label-l through label-n, label-else, and label-next. Also, a temporary variable is allocated for the case expression. Then, the code of Figure 6.39 is generated. This scheme implements a linear search through all the case labels. Note that we allocate a temporary variable for the case expression. Usually, the language manual specifies that the case expression must be evaluated only once, and, even if the language manual does not, it may still be more efficient to do so.
tmp-case-value := case ex~ression; IF tmp-case-value = 1, THEN GOTO label-l;
...
IF tmp-case-value = 1, THEN GOTO label-n; // or insert the code at label-else GOTO label-else; label-l : code for stute~entsequence l GOTO label-next;
...
label-n: code for.state~ent sequence, GOT0 label-next; label-else: code forelse -state~ent sequence label-next: Figure 6.39 A simple translation schemefor case statements.
The execution timeof the above scheme is linear inn, the number of cases in the case statement.Caseselection in constanttime is possibleusing a jum le, asfollows. First
the compiler computes the lowest case label Iiowand the highest case label Ihigh.Then the entries, to be indexed with indices ranging from compiler generates a table of [hjgh-lio,+l 0 to I h ~ ~ h -The I ~ entries ~ ~ . in this table are code labels: label-k for an entry with index Ik-IloW,for k ranging from l to n, and label-el se for all others. Finally the follow in^ code is generated: tmp-case-value := case ex~~ession ; IF tmp-case-value I,, THEN GOTO label-else; IF tmp-case-value > I,,,,, THEN GOT0 label-else; GOTO table [tmp-case-value - Iiowl ;
If Ihigh-~~ow is much larger than n, many of the jump table entries contain label-el se, and the table may be deemed too space-inefficient. In that case, the case labels can be organized into a balanced binary tree, in which each node of the tree represents one case label I, therightbranchindicates a subtree with case labels larger than I, andthe left branch indicates a subtree with case labels smaller than I. For each node node-k in the binary tree, the following codeis generated: label-k: tmp-case-value IF THEN GOTO label of left branch of node-k; IF tmp-case-value > I, THEN GOT0 label of right branch of node-k; code forstate~entsequencek GOTO label-next;
Ifthe left branchand/ortherightbranchdoesnot exist, the c o ~ e s ~ o n d i nGOTO g is replaced by GOTO label-else. Many more advanced translation schemes for case statements exist. Several translation schemes for the case statement were analyzed and compared by Sale (1981). Very sophisticated techniques for producing good code for the case statement are desc~bedby Hennessy and Mendelsohn(1 982), Bernstein (1985), and Kannan and Proebsting(1994).
ts are the while statement and the for-state~e~t. while statement executes a statement sequence an indete~inatenumber of times (including 0 times), as long as the while expression remains fulfilled. The for-state~entexecutes a statement sequencea fixed numberof times.
The w ~ i state~ent l ~ WHILE
oozea an
expression DO state~entsequence END
WHILE ;
can be processed by allocating two labels: end-label and test-label,and generating the following code:
test-label: Boolean control code( ~ o o l e a nexpression, code forstate~ent sequence GOTO test-label; end-label:
0,
end-label) ;
In many cases, however, the following scheme results in more efficient code: allocate two labels: sequence-label and test-label,and generate the following code: GOTO test-label; sequence-label: code forstate~ent sequence test-label: Boolean control code ( ~ o o l e a ne~pression, sequence-label,
0) ;
This scheme is usually more efficient when there are several iterations, because it only executes a single conditional jump instruction per iteration, whereas the first scheme executes a conditional jump inst~ctionand an ~nconditionaljump inst~ctionper iteration. If ~ u u Z e ~e~pression n evaluates to f a1 the sefirst time, there will be no iterations, and the first scheme is more efficient. Which scheme is more efficient also depends on the target processor. Note that the branch in the conditional goto statementof the first scheme is actually taken only once. On many processors, in particular processors that maintain an instruction. look-ahead cache, a conditional branch instruction is more expensive when the branch is taken than when it is not, because when it is, the instruction look-ahead cache must be flushed. On the other hand, some modem processors perform look-ahead at the target of the branch instruction as well, andlor havea sophisticated branch prediction mechanism. We will first consider the following typeof ~ o ~ - s t ~ t ~ FOR
i IN lower b o u n ~ . . upper b o u n ~DO state~ent sequence
END FOR;
where i is thecontrolledvariable of the for-state~ent,the implicit step size is 1, and Zu~erb o u n ~and upper b o u n ~are to be evaluated once upon starting the for-statement. Code generation for a for-statement is quite tricky, because care must be taken that the controlled variableof the for-statement does not cause ovefflow. tiveapproach is to allocate a temp or^ variable tmp-ub for theupper erate the following code: i := lower b o u ~ ~ ; tmp-ub := upper b o u n ~ ; WHILE i c = tmp-ub DO code for~ t ~ t e ~ esequence nt i := i+l; END WHILE ;
where the while statement is handled as described above. Unfo~unately,this scheme will ~ oproduce u n ~the largest not always work. In particular, the computation of u ~ ~ e r may value representable for the type of the controlled variable i. After a while i will reach the value of tmp-ub. Then, on machines that detect overflow, the increment of the controlled variablewill cause anexception. On machinesthat do notdetectoverflow,the forstatement will never t e ~ n a t e because ? i can never become larger than the largest value representable for its type, and can thus never become larger than tmp-ub. Therefore,theloopterminationtestmustcomparethecontrolledvariable i with tmp-ub for equality, and be executed after the statement sequence, but before the increment of the con~olledvariable. However, moving the loop t e ~ n a t i o ntest to the end of the statement sequence means that another testis required to detemine if the loop is to be entered at all. This leads to the following, improved, scheme (with the tempor~yvariable tmp-ub allocated as above, and labelsloop-label and end-label): i := lower b o a n ; ~ tmp-ub := apper ~ o u ~ ~ ; I F i > tmp-ub THEN GOTO end-label; loop-label: code for s t ~ t e ~ e n~t ~ e ~ a e ~ ~ e I F i = tmp-ub THEN GOTO end-label; i := i 3- 1; GOTO loop-label; end-label:
In this gene ratio^ scheme, the first I F clause makes sure that the statement sequence will neverbeexecutedwhen lower ~ o u is n larger ~ than u ~ ~~e or u Also, ~ ~the. controlled variable i will never be incremented beyond upper ~ o uthus ~ preventing ~ , overflow and its consequences. i An issuethat we haveignoreduntilnow is whatvaluethecontrolledvariable should have after the loop is finished. In some languages? the for-loop declares the controlled variable implicitly, so it no longer exists after the loop. Some other languages not specify what value the controlled variable should have afterwards, orexplicitly specify it are e~oneous). Whenthe that it is notspecified (so thatprogramsthatdependon language manual does specify the value, the implementation should of course follow the manual, if necessary by adding the properassign~entat the end-label. Many languages offer more general formsof for-statements, for example providing an explicit step size. An explicit step size causes additional complications, as is illustrated by the following for-statement: FOR i I N 1..6 STEP 2 DO
... END FOR;
The complication here is that the controlled variable never becomes exactly equal to the upper bound of the for-statement. Therefore, the scheme described above fails miserably. With an explicit step size, we cannot compare the upper bound and the controlled variable for equality, and, as we have seen above, we cannot compare for greater/s~allereither, because of possible overflow problems. A solution lies in com~utingfirst the number of times the loop will be executed. An
~
~6.4 1Code 0 ~ e~n e r ~ t i oforn c~ntrolflow S
extra temp or^ variable is neededtoholdthisnumber.Notethatthistemporaryloop count must have a range that is large enough to represent the difference between the maximunl value and the minimum value representable in the type of the controlled variable. or example, if the controlled variable is of type integer, the value PIAX (integer) MIN (integer) must be representable in this temporary variable. ~emarkably,this can be a c c o ~ ~ l i s h ebyd making the loop count an unsigned integer of the same size as the integer: if therepresentablevalues of aninteger range from -2’’ to 2n-1, thenthe re~resentablevalues ofan unsigned integerof thesame size range from 0 to 2”+”-1, which is exactly large enough. Also, if the step size expression is not a constant, the step size needs a temporary variable as well. All of this leads us to the scheme of Figure 6.40. Note that, although we no longer use the controlled variable to detect whether the for-statement is finished, we still have to keep its value up-to-date because it maybeusedin the state~~ent sequence code. Also note that the compiler can switch to the simpler and more efficient scheme discussed earlier when it can determine that thestep size is l.
i := lower bound ; tmp-ub := upper bound; tmp-s t ep-s zei := step-size ; IF tmp-step-size = 0 THEN ,.. probably illegal; cause run-time error ... IF tmp-step-size tmp-ub THEN GOTO end-label; // the next statement uses tmp-ub - i // to evaluate tmp-loop-count to its correct, unsigned value tmp-loop-count := (tmp-ub - i) DIV tmp-step-size + 1; GOTO loop-label; neg-s t ep : IF i
upperbound) THEN
where T H R O is ~ a basic statement to signal anerror. These checks can bequite expensive, which is why many compilers have a switch that disables the generationof these checks.
When a programming language does not have facilities to allow the user to deal with runtime errors, the story ends here. The ~ n - t i m esystem can make sure that it produces an intelligible error messa~e,and then rnay termi~atethe program, producing some kind of memoryimage,usuallycalled a ‘memorydump’,whichcanthenbeexaminedusing a post-mo~emdebugger. Many languages, however, include a mechanism for dealing with run-time errors. We will now discuss two such mechanisms, signal routines and exceptions.
A simplemechanism for dealingwithrun-timeerrors
is the nal statement. A signal statementbindsaspecificclass of errorconditionstoaspeciuser-definedroutine,the ler. Wheneverarun-timeerroroccurs,thecorrespondingsignalhandler is led. The signal handler might then close open files, free system resources, print a message, and terminate the program. Alternatively, it could deal in some other way with the error, and just return. In the latter case, the pro ram will resume just after the point where the error occurred. To implement the signal statement, the run-time system maintains a program-widelist of (errorconditionclass,signalhandler)pairs, so it cancallthe co~espondingsignal handler when a run-time error occurs. Sometimes it is convenient to have the program continue at a point different from where the run~timeerror occurred, For example, a program could be in a main loop processing commands. A run-time error in one of the commands should then cause the program to print an error message and continue with the main loop. The programmer can implement this by using a non-local goto statement in the signal handler to transfer to the main loop of the program. In a non-nesting language like C, the s e t j m p ~ o n ~ m mechanp ism can be used,as shown in Figure 6.46.
#include <setjmp.h=. jmp-buf j mpbuf;
/* type defined in setjmp.h
*/
void handler (int signo) { printf (“ERROR, signo = %d\ntt , signo); longjmp (jmpbuf , 1 );
1
int main(void) { /* install the handler ... */ signal (6, handler) ; /* ... for some signals*/ signal (12, handler) ; if (setjmp(jmpbu~)== 0) { /* setting up the label for longjmp() lands here */ /* normal code: */
... } else { /* returning from a call of longjmp() lands here*/ /* exception code: */ ...
Figure 6.46 An example program usings e t j mp/longj mp in a signal handler.
A more flexible feature, available
in amming languages, is the facility to specify ‘exception handlers’. An ~x~~ specifies particular a error condition and a set of statements that will be executed should this error condition occur. This allows an exception handler to access the local variablesof the block or routine in which the error condition occurred. The statements in the exception handler replace the rest of the statements in the block or routine, should the corresponding error condition, or ‘exception’, occur. ~ s u a l l yeach , block of statements or each routine can have its own exception handler. It can even have several exception handlers, for different error condition^. When an exception E occurs, the chain of activation records is followed, releasing activation records on the way as in the implel~entationof a non-local goto, until one is found that has a handler for the exception E. This handler is then executed, and execution resumes at the end of the block or routine that the handler is associated with -unless the exception handler terminates the programor causes another exception. For each exceptionhandler,thecompilergeneratesthe code co~espondingtothe statements in the exception handler,term in at in^ it with a jump to the endof the block associated with the handler, and labels this code with a unique handler label. Also, for each block or routine with exception handlers it generates a table of (exception, handler label) tuples, with one entry for each exception handler. In addition, the administration partof each activation record must contain a pointer to the table of exception handlers currently installed for its routine. If the exception handlers are associated with a routine, this is a constant; if they are associated with a block, the pointermustbeupdatedasblockswithexceptionhandlersareenteredandleft.Now, when an exception E occurs, the chain of activation records is examined as follows, most recent activation record first:
I The pointer to the table of exception handlers is extracted from the currently examined activation record.
2 The table referenced is searched for a handler for exception E. If such a handler H is found, step 3 is performed. If not, step l is perfomed on the parent activation record is ter(the one of thecaller). If thereare no moreactivationrecords,theprogram minated, possibly after printing a message. 3 A non-local goto to the exception handler H is performed. This automatically releases all e~aminedactivation records that turned out not to have a handler for exception E. Note that the ~eneratedcode for the exception handler takes care of the continuation of the program execution,once the handler is finished. The discussion above leads us to the impo~antobservation that routines with exception handlers can be the target of a non-local goto. So, depending on the implementation of non~localgotos, the compiler may or may not keep variables of routines with exception handlers in registers, in accordance with the preceding discussion on non-local gotos. A disadvantage of the method described above is that, depending on the i ~ p l e m e n t a ~ tionofthenon-localgoto,activationrecordsmaybeexaminedtwice,oncetofinda handler, and once to perform the non-local goto. An optimization to this a~proachis to restore information from an examined activation record when it turns out not to have a
handler for the exception, and to then discard the activation record. The exception handler mechanism is more flexible than the signal handler mechanism. Its increased flexibility comes, however, at the cost of a slight increase in the number of instructions required to perform a routine call or block entrance; these instructions are necessary for theconstruction of theexceptionhandlertable.Otherexceptionhandler implementations are feasible, of which we will discuss one example. An alternative implementation replaces the (exception, handler label) pairs mentioned above by a program-widelist of tuples(exception,handlerlabel,startaddress,end ere ‘startaddress’indicatesthebeginning of theblockprotected by the handler, and ‘end address’ indicates the end of that block. When an exception occurs, the run-time system uses this list of tuples to determine which handler should be invoked, as follows. First, the value of the program counter at the time the exception occurredis compared with the start and end addressesof blocks protected by a handler for the exception at hand. If it falls within a start address-end address range, a handler is found, and a nonlocal goto is performed, as discussed above. If not, the return address is extracted from the activation record at hand. This return address representsthe position in the code where the caller currently is. This position may again be protected by a handler, so, again, it is compared with the start and end addressesof blocks protected by a handler for the exception at band. Again, as above, if a handler is found, a non-local goto is performed. If not, the return address ofthe caller is extracted, etc. Ultimately, either a return address (and an activation record) witha handler is found, or execution terminates. This scheme is more efficient as long as no exceptions occur. However, searching a handler may be more expensive, depending on the exact implementation of the exception handler table.
es) are syntactic structures in whicha number of related items are grouped together. They often restrict access to their contents by providing an explicit a interface, which is then the only means of accessing the module. The related items in module could for example be variables and routines affecting these variables. Modules are somewhat similar to objects, but there are also considerable differences. In some respects theyaresimpler:theyusually cannot becreated dynamically, andlack all the objectoriented features discussed in Section 6.2.9. On the other hand, modules are compilation units and it must be possible to compile them separately; second, modules often require an explicit initialization. Regarding code generation, modules introduce two problems to the compiler writer:
-Thetargetlanguageusuallyhasone,
flat, namespace. ”herefore, thecompilermust make sure that if two different modules export an item of the same name, they have a different name in the generatedcode. -The compiler must generate code to perform the module initialization; a co~plication here is that at module initialization modules may use items from other modules, so these other modules mustbe initialized first.
Usually, the rules that the characters in a name must obey are more strict in the source language than in the target language. Often, there is a character c that is allowed in names in the target language but not in names in the source language; examples are the .(dot) andthe $ sign. If so, this feature canbeusedtocreateuniquenames for itemsin a module: simply concatenate the item name to the module name, using c as a separator. Note that this assumes that module names are unique, which they usually are. If there is no such character, there may be some other rules in the source langua can be exploited in the compiler to produce a unique name. If all else fails, the compilation process could havea phase that analyzes the complete program and does name generation.
Most source languages allow modules to have explicit i ~ i ~ i e , for example for lobal variables. Even if this is not the case, a language might require implicit initiali~ation of these variables. Note that a module, and more specifically the initialization code of a module, could use items from other modules. This means that those modules should be initialized earlier.Thiscanbeaccomplished by havingthe initiali~ationcode of each module call the initialization code of all the modules that it uses. Then, the whole initialization phase can be started by calling the initiali~ationphase of the module that contains the main program. When adopting this solution, there are two issues one should be aware o f avoiding multiple i~itiali~ations and detecting circular dependencies. If module A uses module B and module C, and moduleB also uses module C, the initialization code of module A calls that of module B and that of module C. The initialization code of module B also calls that of module C, so C’s initialization will get called twice. This canbeprevented by having a module-s~eci~ic variable This rnodul e has been initial zedi in each module, which is set to true once its initialization code is called. The initialization of a module then becomes: IF NOT
This module has been initialized: SET This module has been initialized TO true; // call initializationo f the modules usedby this module // code f o r this module’s own initializations
ircular dependencies between modules ~ e c ~ c ~ tare i ousually ~s detectedby the compiler.
hen compiling a module specification A that imports a module specification B, the com-
piler usually demands that module specification B is present for e~aminationor has already been compiled. If module specification B then requires module specification A, the compiler can easily detect thatit already was compilingA, and deal with this according to what the language manual prescribes (whichis probably to disallow this). Formodule i ~ ~ Z e ~ e ~the t situation ~ t ~ ois~ sdifferent. W e n compiling a module
implementation A, which uses module specification B, the compiler can see what module specifications are used by the module specification for B,but it cannot know what module specifications are used by B’s implementation, since the latter need not even be available yet. Now suppose the present or a future B implementation uses A’s specification. Then we have a circular dependency, which is not detected at compile time, because when cornpiling B’s implement~tio~, the compiler only readsA’s specification, which in our example does not useB at all. One way to detect circular dependencies is to postpone this check to run time, during module initialization. Eachmodulethenhas a module-wideBooleanvariable This module is being initialized, and the initialization code then becomes: IF This
module is being initialized:
// circular dependency; deal with it
IF NOT This module is initialized: SET This module is initialized TO true; SET This module is being initialized TO true; // call initialization of the modules used by SET This module is being initialized TO false; // code for this module‘s own initializations
this module
A disa~vantageof this approach is that the error does not become apparent until the compiled program is run. A more elegant solution to this problem is to have the compiler produce a list of the modules that each module uses. A separate compilation phase, which is invoked after all module implementations have been compiled, may then read theselists and detect circular dependencies. Such a phase could also be used to determine a global initialization order, by imposA 0 ) {B(i-l);} showSP ("A.SP. exit" );
1
void B(int i) { showSP ( !!B.SP.entry") ; if (i 0 ) {A(i-l);} showSP ( IIB. SP exit ;
1
Figure 6.48 Mutually recursive test routines.
(c) Design a format for routines as parameters and test your design with the code from Figure 6.49; start by calling A (l0 , B) . void A(int i, void C()) { showSP ( "A.SP .entry"1 ; if (i > 0) {C(i-l, A);} showSP("A.SP.exit");
1
void B(int i, void C O ) { showSP ( IfB. SP .entry") ; if (i > 0) {C(i-l, B);} showSP ( !!B.SP .exit ) ;
1
Figure 6.49 Routines for testing routines as parameters.
esign a formatfor labels as p~ametersand test your design with thecode from Figure 6.50. A (int i, label L) { showSP ( "A.SP .entry") ; if (i 0) {B(i-l, exit); return;} exit: showSP(lfA.SP.exitl') ; goto L;
void
1
void B(int i, label L) { showSP ( IIB. SP .entry") ; if (i > 0 ) {A(i-x, exit); return;} exit: showSP(IlB.SP.exit"); goto L;
1
Figure 6.50 Routines for testing labels as parameters.
What code should be generated for the Boolean assignment b := x > y,in which b is a Boolean variable with the representation 'true' = l and 'false' = 0, and x > y translates to Compare-greater x, y, which leaves the result in the c o ~ ~ ~register? t i o ~ The discussion of case statements in Section 6.4.1.2 mentions using a balanced tree for the case labels as a possible i~plementationfor the case statement. Why does the tree have to be balanced? Discuss the translationof a case statement by means of a hash table. A repeat statement allows the programmer to execute a sequence of statements an indete~inatenumber of times, until a ~ o o Z ee~~~ ~r e s s isi ofulfilled. ~ Give a translation of the statement REPEAT .state~entsequence UNTIL Boolean e ~ ~ r e s s i ~END ?n
REPEAT;
to intemediate code. Note that thes t ~ t e ~ e ~ ~ is executed s e ~at~least ~ ~ once, c e The C languagehas a breakand a continuestatement.Thebreakstatementterminates the closest enclosing loop, and the continue statement proceeds with the next iteration. In fact, these are both jumps. In the code generation schemes of Section 6.4.1 3, where would these go to? In other words, where should the compiler place the break-label, and where the continue-label? Why is the C while loop of Figure 6.42 not exactly equivalent to thefor-loop of Fig6.41? Hint: consider the effectof a continue statement inside the body.
The optimization explained in the last subsection of Section 6.4.1.3 replaces a loop with a step size of 1 by a loop with a step size of 4, thus apparently introducing all the complexity of the code of Figure 6.40. Devise a simpler code scheme for such loops. Make sure it is still ove~ow-resistant. In Section 6.4.2.2 we claim that pushing the last parameter first onto a stack is a technique suitable for languages that ailow a variable number of para~etersto be passed in a routine call. Explain. Refer to the subsection on signal handlers in Section 6.4.3.2 on handling run-time errors. Consider the following code fragment in a language that allows a exception handler to resumeat the point where theerror occurred: X :=
A
Y := A
/ B; / B;
An optimizing compiler wants to transform this code into: X := A / B; Y :=
x;
Explain why, if an exception handler is defined for the code fragment, this optimization is incorrect. .32 A language designer is designing a p r o g r a ~ i n glanguagethat
features, a m o ~ ~ other things, genesic routines with generic type parameters. Helshe considers adding generic routines as generic p~ameters(non-generic routines can already be passed as normal parameters in the language) and comes to you, a compiler designer, for advice. Evaluate the proposed addition froma compiler const~ctionpoint of view. i of i ~~~ e ~ t~~t ui vn ~ e ~ i ~~~u~ ~ l ee ~ e ~ Study t ~ t Shesidan’s i ~ n : paper (1959) on the IBM F O R T R compiler, ~~ and identify ands u ~ m ~the z etechniques used.
3 ~
unctional languages are based on the idea that a program is a function with one input parameter,itsinput,andoneresult,itsoutput.Running a program is seen as the application of a function to the input, resulting in the output. This computatio~almodel of functionapplication,simple as it is, is veryexpressiveandbuildsupontheresultsof mathematical research into computable functions. The at he ma tical worldview of functionallanguagesshowsupin a number of places. For example, syntactical constructs have been included in the language to ease the s~ecificationof sets and recursive functions. More important, however, is the idea that a p r o g r a ~ e should r not be bothered with implementation details of a certain function, but should rather concentrate on specifying its input-output relation. In particular, a ~rogrammer shouldspecify w h ~ tto compute,ratherthan how, ~ h e ~ and e , when. ~onsequently,there are no ; -operators, for-loops, or other statements that specify how the co~putationproceeds from one statementtotheother;morespecifically,thereare no statements at all. Also, there is no assignment operator specifying a variable or memory location where a computation mustbe stored. The difference in approach to computing is illustrated by the following example that lists progr~msections for the factorial function f ac in the functional language Haskell (Figure 7.1) and the imperative language C (Figure 7.2).
fac 0 = 1 f a c n = n * f a c (n-1) Figure 7.1 Functional specification of factorial in Haskell,
int fac (int n) { int product =
1;
while (n > 0 ) { product *= n; n--;
1
1
return
product;
Figure 7.2 lrnperative definitionof factorial inC.
The functional specification of f ac is a good example of how recursion is used to tackle problems. The function is defined by two equations. The first equation states that O! = 1, while the second equation states that n != nx(n -l)!. Unfolding the recursion then leads to n != nx(n--l)x(n--2)x,..x1. This definition of factorial is also the basis for the C code, but this is not directly visible because of the interleaving of control statements and computation rules. The C code uses a while loop to avoid expensive recursive function calls; the i n t e ~ e d i a t eresults produced by the loop are accumulated in the local variable product. Efficiency considerations often lead imperative languagep r o g r a ~ e r to s compromise on the readability of their code. Functional language programmers, on the other hand, put the burden of finding an efficient computation order on the compiler. Fortunately, cornpiler technology for functional languages has matured enough to handle cases like factorial: the techniques discussed in the remainder of this chapter are capableof generating C code from the Haskell specification thatis equivalent in efficiency. Functional languages raise the level of abstraction at which programmers should be concernedaboutperformance.Functional p r o g r a ~ e r smust still beconcernedabout O(n’) versus 0 (nlogn) algorithms, but need no longer worry about reducing the number calls, managing memory efficiently, and other low-level programing aspects. A consequence is that the interpreter or compiler must handle these issues and must handle them efficiently. The first functional language compilers were not up to this task, and generated very slow code; a slow-down of a factor of 100 compared to C was no exception. Fortunately, today’s compilers are capable of generating code quality that matches, and sometimes even exceeds, thatof plain C for a considerable class of programs. On the one hand, compilers for functional languages are more complicated than their imperative counterparts. The front-end has to handle additional syntactic constructs capturing concise mathematical notations, whereas the back-end has to handle the higher level of abstraction.Ontheotherhand,functionallanguagescompilersaresimplerthantheir imperative counterparts. The simple and regular syntax leads to a small and clean frontend without the many exceptional cases typically found in imperative compilers, and the functionalparadigmeasesmany of theadvancedanalysesemployed byan o ~ t i m i ~ i n g back-end, since, for example, no memory aliasing can occur.
efore discussing the additional techniques employed in a functional language compiler, we present a short tour of the p r o g r ~ n glanguage Haskell (Peyton Jones and Hughes, 1999) to highlight the aspects of functional languages that require special care. Haskellwasdefined byan intern at ion^ standardcommittee.It is representative of the ‘purelyfunctionallazylanguages’andcontainsmost features fromthefunctional languages developed so far.
The following sections emphasize the aspectsof functional languages thatraise the level of abstraction above that of imperative languages. Some aspects, such as pattern matching, have a modest impact on the compiler and are typically handled in the lexical analyzer and parser,Thisso-called s~ntacticsugar includes:the offside rule, list notation, list co~prehension,andpatternmatching.Otheraspects of functionallanguages like lazy evaluation cannot even be handled completely at compile time and require conside~a~le run-time support. m e syntax of function application is explained below.
Thesyntax for functionapplication is veryconcise,butpossibly misleading for the unwary. Applying a function f to the two arguments 1.1and 13 is written in Haskell as: f 11 1 3
Note that there are no brackets around the arguments to f , nor are the ~ g u m e n t sseparated by a comma.Thereason for simply juxtaposing function and arguments is that it allows currying to be expressed naturally, as shown in Section 7.1.7. so g n+l is Function applicatio~bindsstrongerthananyoperator, parsed as ( g n) + 1 rather than g (n+l ) as the layout may sugges Also, it is left-associative, so g g n is parsed as ( g g ) m,rather than as g ( g n),and parentheses are required to obtain the other interpretation.
The layout of a Haskell program matters to parsing. Consider the following definition d i v i d e , which handles divisionsby zero: divide x 0 = i n f d i v i d e x y = x/y
of
The definitionconsists of twoequations.An *on consists of a left-hand side, followed by the = token,followed by theright-ha . Notethat there is no explicit token to denote the end of each equation. In many languages a line break is treated as ordinary white space, which would allow incorrect parses suchas: (divide x 0 = infdivide ( y = x/y)
x)
Ontheotherhand,treatingevery line breakas a equation t e ~ i n a t o ris ’ since a long expression may need to span several lines for clarity. The trols the bounding box of an expression. Everything below and to the ri is defined to be part of the expression that makes up the right-hand side of the equation. The right-hand side teminates justbefore the first token that is ‘offside’ -to the left -of the = position. The offside rule has to be applied recursively when handling nested equations, as in: f a c n = i f ( n == 0 ) t h e n 1 e l s e prod n ( n - 1 ) where prodace n = i f ( n == 0 ) t h e n acc e l s ep r o d( a c c * n )( n - 1 )
The offside rule can be handled conveniently in the lexical analyzer by having it insert explicit end-of-equation tokens (the ;in Haskell), as follows. The lexical analyzer maintains a stack of offside markers. When the lexical analyzer detects a token with assoc’ offside semantics such as =, it pushes the current character position onto the marker Upon detecting a line break, the lexical analyzer skips all white space and records the character position of the next token. It compares the position against the top of the marker stack. If the new position is less than the marker, the lexical analyzer pops the marker and returnstheend-of-equationtoken.Itcontinuespoppingmarkersandreturningend-ofequation tokens until the stackis empty or the position is greater than thetop marker.
The list is a powerful data structure that has been part of functional languages from day one with the in~oductionof the list processing language LISP (McCarthy, 1960). Since lists are used so frequently, Haskell includes special notation for them.Theempty list, also known as ‘Nil’, is denoted by []. Non-empty lists are written as an opening bracket f , the elements separated by c o m a s , and a closing bracket 1,as shown in the examples below:
Strings are actually just lists of characters, so redt1is a convenient notation for [11 r l l , It e I t , d It I . The last exmple shows a shorthand notation named
forthelist [l, 2 , 3 , 4 , 5 , 6, 7 , 8 , 9 , 101. Theelementsofalistcanbeof any type -integers, strings, user-defined types, etc. -but all elements of a given list must be of the same type; the polymo~hictyping of Haskell, discussed in Section 7.1.5, does not allow lists to contain elements of different types. Listscan also beconstructedelement by elementusingthe : infixoperator.This operator combines an element and a list into a new list. For example, the list E 1, 2 , 3 I is equivalent to the expression (1. : ( 2
: (3 :
[l)))
S a secondexample,considerthe sequence [n .. m] dynamically:
range function,which
r a n g e n m = i f n>m t h e n [ l e l s e( n : range
constructs thearithmetic
(n+l) m )
at closely matches mathematical set notation. For example, the setof squares of all odd numbers up to 100, defined mathematically by S = ( n 2I ne { l ,...,lOO}
A
odd n }
can be expressed in Haskellas follows: S
= [nA2
1
n Int [Char]I -> Int
are instances of the type [a1 -> Int.These instances typically appear when length is used in larger expressions and the context imposes additional constraints of the type of length,as for example in length [l, 2, 31 + length [rrredtr, rlyellowli , llgreenll]
The main advantage of polymorphic typing is that functions and data structures can be reused for any desired type instance. The same definition of length can be used to cornpute the length of a list of integers and a list of strings. ~anguageswithout a polymorphic type system would require the specification of two length functions: one that operates on lists of integers and another that handles lists of strings. C allows the programmer to resort to pointers and write a single length function that operates on generic lists, where elementsarepresent as void * pointers.Thehandling of pointers,however, is errorprone and the useof the void * device discards valuable typeinfo~ation. Type checking in polymorphic languages like Haskell is more difficult than in traditional monomorphic type systems. In p ~ i c u l a r the , equality of types involving type vario will n , be explained in Section’7.3. ables is harder to determine and requires~ n i ~ ~ a t ias
By definition, a function in Haskell defines a fixed relation between inputs and output: whenever a function f is applied to the argument value arg it will produce the same output no matter what the overall state of the computation is. Haskell, like any other pure functional language, is said to be ‘referentially transparent’ or ‘side-effect free.’ This property does not hold for imperative languages, where assignments to global variables and
through pointers may cause two function calls f a r g to yield different results, even when the argument valuea r g is the same in both calls. The good thing about referential transparency is that it simplifies program analysis and transformation since a closed expression always denotes the same value independent of the context, and may be moved around freely. A closed expression is an expression that contains no references to external names other than global identifiers.The bad thing is that it preventstheprogrammerfromwriting space-efficient programsthatuse in-place updates. For example, adding 1 to all elements of a list in Haskell requires a complete new list to be constructed: = [l add-one I I add-one (x:xs) = x+l
:
add-one xs
Each application of the list constructor ‘:’allocates a fresh node. In an imperative language, we can update the input list in-place, in which case we require the caller to create a copy of the input beforehand if the original is still needed later. In simple cases the functional language compiler can deterrnine that the input list is indeed no longer neededafter the call to add-one, so it can generate in-place updates too. Unfortunately, whenever the input list is passed as part of a larger data structure the compiler probably cannot infer the last-use requirement, and must conservatively allocate a new list. Functional programs are notoriousfor allocating huge amountsof memory. Therefore garbage collection is a very important part of the run-time system. It allows the run-time system to present the user with an unlimited ~ o u noft memory, but requires the compiler to insert the necessary hooks in the code, as discussed in Section 5.2.2.1. Given the fast rate at which functional programs typically allocate and, implicitly, deallocate memory, garbage collection techniques that only examinethe live data, like two-space copying (Section 5.2.5), perform best.
A fundamental concept of modern functional languages is that functions are ‘first-class citizens’: functions may be passed as ~guments,returned as results, and kept indata structures, just like ordinary data values such as integers. A function that takes a function as an ~ g u m e n tor , delivers one asa result, is referred to as a higher-or~erf~nction.Imperative languagesbarelysupporthigher-orderfunctions:functionsmayperhapsbepassedas parameters, but it is impossible to create ‘new’ functionsat run time. The ability to construct new functions out of existing ones provides great abstractive power to the user, and is common in mathematics. The differential operator, for example, is a higher-order function which takesa function as argument and returns anotherfunction, its derivative, as the result:
This mathematical definition can be directly expressed in Haskell as follows: diff f = fwhere f- X = (f (x+h) - f X) / h = 0.0001 h
Note that this definitionof d i ff crudely approximates the true derivativesince it takes the limit by fixing h to a small constant; a better definition that employs a sequence of ever decreasing values will be presented in Section 7.1.8. The important aspect of the example is that d i f f returns as its result a 'new' function that is composed out of already existing functions ( f , +,-,and />.The expression d i ff s q u a r e approximates the function 2x and can be used in other expressions. For example, the expression (diff square) o
yields an approximation to the derivative of s q u a r e in point 0: 0 .0 0 0 1. Since d i ff s q u a r e yields a function, it can be differentiated again: the expression (diff (diff square)) 0
yields 2 .0 as an approximation to the second derivativeof s q u a r e in point 0, Another way to create new functions is to partially apply an existing function to a number of arguments that is less than the arity of the function. This technique is known as ,after the logician H.B. Curry, and can be used to specialize functions by fixing some parameter values. A s an example, we will look at the differential operator again. Instead of constructing a new function that truly operates as the derivative, we candefine a function d e r i v that computes the derivative off in point x: deriv f x = (f (x+h) - f x) / h where h = 0.0001
It is important to note that there is no semantic difference between the definitions unary function d i ff and the binary functiond e r i v , since the equation
of the
deriv f x == (diff f ) x
holds for any function f and point x. Functional languages therefore consider a function with YE arguments as syntactic sugar for a chain of YE unary functions processing the arguments one by one: This view allows currying since function calls consisting of functions applied to too few arguments like d e r i v s q u a r e are not semantic errors, but can be usefully interpreted as shorthand notation for some new function, In short, an YE-ary function applied to m arguments denotes an (YE-m)-aryfunction. In this view, applying an rz-ary function to YE arguments leads to a 0-ary function, which may be passed aroundlike any other function. As a consequence it is not necessary to compute the value of a 0-aryfunction when simplypassing it on to another function:it may be passed unevaluated as long as the called function is prepared to carry out the evaluation when needing the value. Lazy evaluation, which is to
be discussed next, uses exactly this approach and does not evaluate an expression (a 0-ary function) until its valueis needed. Currying complicates compiler construction since we need additional run-time support to construct and evaluate curried functions. This will be discussed in detail in Section "7.5.
Functional languages lack explicit control flow and are based solely on expressions constructedout of valuesandfunctionapplications.Evaluating a non-trivialexpression amounts to evaluating some subexpressions and combining the partial results intothe final answer by applyinga built-in operator or user-defined function. Itis up to the compiler and run-time system to determine which subexpressions to evaluatein which order. Imperative languages, on the contrary, prescribe toa large extent how and when expressions should be evaluated. For example, expressionsat argument positions must be evaluated before calling azy ~ ~ a ~ u arelaxes t i o ~these constraints by specifying that a subexpression will only be evaluated when its value is needed for the progress of the computation, This approachofferstheprogrammer a simplewayofspecifyinginfinite data structures, as shown in the following example: main = take 100 [l. .l
The function take is a standard Haskell function that takes the firstn elements from a list. In this case take operates on the infinite list of natural numbers generated by the [l...] expression. Lazy evaluation does not start off by evaluating I 1 ..I ,as would happen in a comparable C program, but rather invokes take since take is the function at the top of the expression tree for main. The call of take proceeds to evaluate its second argument step by step until the firstl00 list elements are known. of With lazy evaluation we can improve on our earlier derivation example. Instead approximating the derivative with a small fixed value for h, we can take the limit of a series of approximations, deriv f x = lim [(f (x+h) - f x)/h 1 h. nd.num ? Num(1) : mu1 (n, fac (Appl (Appl (&-sub, n) , Num(1) ) ) ) ;
Figure 7.31 Lazy translation of the argument of f ac.
Until now we have used the strictness i n f o ~ a t i o nto avoid constructing applications spines, but we can go one step further. In addition to generating a routine fac ( ) which expects one unevaluated argument in which it is strict, we can generate a second routine f ac-evaluated ( ) , whichexpectsanevaluatedargument.We can thenreplace fac (n)by fac-evaluated(eva1 (n))
and, if we h o w that n has already been evaluated, leave out the call to eval ( ) . We can do the same for the built-in operators, leading to the introduction of add-evaluated ( ) ,
Pnode f ac (Pnode Pnode n= -argO;
1
-argO) {
return equal (n, Num(0)) ->nd.num ? Num(1) : mul(n, fac(sub(n, Num(1))));
Figure 7.32 The first step in exploiting the strictness of the argument of f ac.
equal-evaluated () ,etc. The code for fac that uses this optimization is shown in Figure 7.33; note that the code for fac ( ) itself has been left in place, andnow calls fac-evaluated ( ) after evaluating the argument. Now, when generating a direct call has a choice between two translations: calling of fac, the code generation fac-evaluat (ed ) if fac is known to be strict and the argument is known to be evaluated, and calling f ac () otherwise. (The options of calling -f ac ( ) when arguments are in an m a y and of creating a node using -f ac are still open as well.) This code uses at most one call of eval ( ) for each call of fac (n);this is a great improvement over the code from Figure 7.32, which calledeval () 6n times.
Pnode fac-evaluated (Pnode Pnode n= -argO;
1
return equal-evaluated(n, Num(0)) ->nd.num ? Num(1) : mul-evaluated (n, fac-evaluated (sub-evaluated (n, Num (1)) ) ) ;
Pnode ac f (Pnode -argO){ return fac-evaluated(eva1
1
-argO 1 {
(-argO) );
Figure 7.33 Exploiting the strictness off ac and the built-in operators.
With strictness analysis we avoid c o n s ~ c t i n gand inte~retingapplication spines at strict positions. The overall effect on performance is strongly program dependent, since the quality of the strictness analyzer and the lazinessof the program determine whichfraction of the application spines can be optimized away. The strictness analysis discussed so far only considers argumentsof functions, but many expressions are partof data structures. Some progress has been made with analyzing the strictness of data structures like lists (Burn,1991), but in general it is a hard nut to crack, so most strictness analyzers simplydo not address the issue at all. To alleviate the problem of strictness analysis failing to determine the strictness of certain expressions compilers often have provisions for the programmer to manually indi-
cate strictness. Haskell provides the standard function seq (for ‘sequence’) that can be used in seq x y to ensure that x is evaluated before running y. This function can easily be usedto ma& strict expressions,butalsoplaysanimportantrolein progra~ing interactive programs where actions should happen in a precise order.
For simplicity the graph reducer requires all data to be ‘b~xed’in a graph node cont~ning a tag for identification, as outlined in Figure 7.15. Although this allows a uniform treatment of pointers when inte~retingthe computation graph, it is also a source of inefficiency. For example, the strict version of the function f ac in Figure 7.33 still allocates a considerable number of graph nodes to hold thei n t e ~ e d i a t eresults produced by the builtin operators less,mul,and sub,even in their -evaluated forms. These intermediate nodes are passedon to another operator or call of f ac,both of which are onlyinterested in the numerical contents, not in the box surrounding it. The function mul-evaluated () Pnode mul-evaluated (Pnode a, Pnode b) { return Num(a->nd.num * b->nd.num);
1
simply dereferences the argument pointers without even ins~ectingthe node tag, and does not use the pointers otherwise, It is obvious that we can pass the numbers directly, instead of boxed in graph nodes. In general, we can pass any basic value unboxed at strict argument positions. Once strictness analysis is performed, it takes little analysis to find out o ~ i ~ which ~gumentsare suited for unboxing. This analysis is called ~ The effect on the fac example is shown in Figure ’7.34, which features the routine f ac-unboxed ( ) . The built-in operators are no longer needed, and the correspondingC operators are invoked directly. The function fac-evaluat ed ( ) was adapted to take care of unboxing the argument beforecalling f ac-unboxed ( ) ,and boxing the result in a number node. Boxing the result is necessary since fac-evaluated () has to return a boxed value as before. f ac,it no longer allocates any graph node. After applying the boxing analysis to This is, however, an exception since most functions in a Haskell program of reasonable size do not operate on strict arguments with basic types. Nevertheless boxing analysis can have a noticeable effect on performance: the unboxed factorial runs about l 0 times faster than its boxed counterp~t.
The last statement in a routine in code generated for functional languages is often a function call. One reason is that repetition is expressed using recursion infunctional languages. But we will see in Chapter 8 that the same phenomenon occurs in code generatedfor logic programs, as Figures 8.16, 8.19, and even 8.15 and 8.18 amply attest. Some of these tail calls can be avoidedby in-lining the called routine, but the ones involved in recursion cannot. These in particular cause a problem, since they may lead to stack overflow. A tailrecursive routine working on a linked list of N elements will reach a stack depth of N just
int fac-unboxed(int n) { return (n == 0 ? 1 : n * fac-unboxed(n-1));
1 Pnode fac-evaluated(Pnode -argO) { Pnode n = -argO; return ~um(Eac-unboxe~(n->nd.num));
1 Pnode Eac (Pnode -argO) { return fac-evaluated (eval(-argO))
;
1 re 7.34 Strict, unboxed translation of the function fac.
before finishing; and when it finishes, it spends a time proportional to N in unstac~ingthe activation records. An iterative routine w o r ~ n gon the same N-element list would have a stack depth of 1 and a 1-level unstacking operation would suffice upon ter~ination. So it is natural to look for ways of avoiding the stacking involved in tail calls. Fo~unately,much of the administration involved in a tail call is super~uousand can be eliminated. There are several reasons for this. First, there is no reason to return to the routine R that made the call (the caller) when the routine C that was called (the callee) has finished, since there is no additional code in the caller that needs to be executed. So there is no need for a return address when C exits, nor was there any need to store one when C was called and the transfer (tail call) to C could be effected by a simple jump instruction. Second, the formal and local variables of the caller R are no longer required as soon as the parameters of the tail callee C have been computed, provided these p ~ a ~ e t edo r s not contain pointers to the formals or locals of R. And third, now that we have seen that the formals and locals can be abandoned and there is no need for a return address, most or all of the activation record of R is free just before the call. When generating assembly code, these possibilities for opti~ization are easily exploited. The parameter values of C are written over the activation record of R, which now looks like a proper activation record for C, and a simple jump to the start of the code of C is executed. Some care must be taken in the overwriting of the variables, but on the whole the o p t i ~ z a t i o nis straightfo~ardand r e w ~ d i n g .It prevents the stack from growing from tail recu~sion,thus extending the range of applicability of recursion, and reduces the routine e ~ t r y / ~ xcode i t to a simple jump. When generating C, however, the start of the code of the callee is not available to jump to, except in the case of a directly tail-recursive routine, Fo~unately,that is an important case in functional programs. First, many tail calls are already directly tailrecursive. Second, the remaining indirect tail calls can often be t r a n s f o ~ e daway by inlining the code of the callee. If the code of the callee itself ends in a tail call, we must inline further if that call is not to the original caller. This approach fails if not all source code is available, for example, when calling a separately compiled (library) function. Most
mutually recursive functions, however, occur in the same module. As an example consider the following function that drops the first n elements from a list:

    drop n lst = if (n == 0) then lst else drop (n-1) (tail lst)
Using strictness information and boxing analysis we would normally generate the C code in Figure 7.35. The tail call can easily be transformed into a back jump, as shown in Figure 7.36. The tricky part of overwriting the parameters with the new values of the tail call is left to the C compiler by introducing a set of temporary variables, _tmp0 and _tmp1.

Pnode drop_unboxed(int n, Pnode lst) {
    if (n == 0) {
        return lst;
    } else {
        return drop_unboxed(n-1, tail(lst));
    }
}

Figure 7.35 Strict, unboxed translation of the function drop.
Pnode drop_unboxed(int n, Pnode lst) {
L_drop_unboxed:
    if (n == 0) {
        return lst;
    } else {
        int _tmp0 = n-1;
        Pnode _tmp1 = tail(lst);
        n = _tmp0; lst = _tmp1;
        goto L_drop_unboxed;
    }
}

Figure 7.36 Translation of drop with tail recursion eliminated.
Although we have seen above how to transform direct tail recursion into efficient C code, many self-recursive functions do not end in a tail call, but require a little postprocessing of the result of the recursive call. An example is again the function fac:
fac n = if (n == 0) then 1 else n * fac (n-1)
The problem is that in this code the multiplication is performed after the recursive call, resulting in a stack of n recursive calls. Another transformation is required to reduce the required stack space to a single frame. If the operator using the recursive result is associative, as is the case for multiplication, we can transform the code to use an accumulating argument, as shown in Figure 7.37.
    fac n = if (n == 0) then 1
            else prod n (n-1)
            where prod acc n = if (n == 0) then acc
                               else prod (acc*n) (n-1)

Figure 7.37 Tail-recursive fac with accumulating argument.
The local function prod accumulates the results of the multiplications along the way in its first argument acc: the call prod p q computes p*q*(q-1)*...*1. So the desired product n*(n-1)*...*1 is computed by the call prod n (n-1), except for n equals 0, which has to be caught as a special case. Since prod is a tail-recursive function that is strict in both arguments, the code generator may now produce the code shown in Figure 7.38. Note that the local function prod is lifted to the top and has been renamed to fac_dot_prod. The tail call to itself has been transformed into a jump, which makes the code as efficient as the handwritten imperative version of Figure 7.2.

The transformation to an accumulating argument depends on the compiler being able to recognize the recursive pattern and to determine that the operations performed on the results are associative. In other words, the top-level node in the AST should be an associative operator and one of its children should be a self-recursive call. Figure 7.39 gives the translation scheme for a general right-recursive function: if a function definition matches the first line in the table, it can be replaced by the other two functions. In this scheme, ⊕ denotes an associative operator, F is the function name, and B, V, E, E1...En denote arbitrary expressions. An additional constraint is that the ⊕ operator should be strict, to avoid building a chain of unevaluated expressions through the accumulating argument. Note that the test B in the replacement code of F is evaluated in a different environment than the B in the code for F'; so the code of B is duplicated in this transformation, but its actions are not. The function fac, given as

    fac n = if (n == 0) then 1 else n * fac (n-1)
matches the conditions in the translation scheme, with F = fac, n = 1, ⊕ = *, B = (n == 0), V = 1, E = n, and E1 = (n-1). This results in its replacement by the functions
int fac_dot_prod_unboxed(int acc, int n) {
L_fac_dot_prod_unboxed:
    if (n == 0) {
        return acc;
    } else {
        int _tmp0 = n * acc;
        int _tmp1 = n-1;
        acc = _tmp0; n = _tmp1;
        goto L_fac_dot_prod_unboxed;
    }
}

int fac_unboxed(int n) {
    if (n == 0) {
        return 1;
    } else {
        return fac_dot_prod_unboxed(n, n-1);
    }
}

Figure 7.38 Accumulating translation of fac, with tail recursion elimination.
F x1...xn  = if B then V else E ⊕ F E1...En
    ==>
F x1...xn  = if B then V else F' E E1...En
F' x_acc x1...xn = if B then x_acc ⊕ V else F' (x_acc ⊕ E) E1...En

Figure 7.39 Accumulating argument translation scheme.
    fac n = if (n == 0) then 1 else fac_prime n (n-1)
    fac_prime x_acc n = if (n == 0) then x_acc * 1
                        else fac_prime (x_acc * n) (n-1)
which, after the arithmetic simplification x * 1 ==> x, matches the code in Figure 7.37. Translation of the functional core code of Figure 7.37 to C yields the code of Figure 7.38. In-lining and a few other optimizations discussed in Chapter 4 yield code that is comparable to that in Figure 7.2. In summary, the following optimizations were required to transform the definition of fac in Figure 7.1 to C code equivalent to that of Figure 7.2:
- top-level short-circuiting of the if-statement and its branches;
- short-circuiting of == and * due to known strictness of their arguments;
- strictness analysis, which yields that fac is strict in its argument;
- short-circuiting the - operator, which is now allowed;
- translation to C;
- unboxing the argument;
- accumulating the argument;
- and further optimizations performed by the C compiler.
The optimizations presented in this section showed that it is possible to generate efficient code for the factorial example. Likewise, many simple mathematical functions can be compiled into efficient C code. In general, however, strictness analysis breaks down on symbolic code that manipulates user-defined data structures; it is very difficult to determine whether or not certain fields inside a record are strict, especially when infinite data structures and higher-order functions are involved. Aggressive in-lining and constant propagation before carrying out strictness analysis reduces the problem somewhat, since all cases of known functions passed as an argument then reduce to ordinary, non-higher-order, functions. Nevertheless, the daily practice is that most data structures are passed lazily, which causes a significant amount of time to be spent in the run-time system. The graph reducer and garbage collector may take more execution time than the user-defined code in the worst case.
As mentioned several times before, the performance of functional programs is strongly dependent on the amount of graph manipulation involved. Streamlining the handling of graph nodes therefore has a noticeable impact on performance. In this section we discuss a few advanced techniques employed by several reduction engines at the lowest level.
The basic graph reducer discussed in Section 7.5.2 uses a single graph node type (Figure 7.15) to represent the different possibilities. Although this uniform coding eases the interpretation of graph nodes, it also wastes memory. For example, a simple number takes up as much space as a function descriptor holding three attributes (its arity, name, and code address). The obvious improvement is to use variable-length nodes, where each node consists of a tag followed by a number of fields depending on the node type. The change to the graph reducer, and to the garbage collector as well, is minimal, since the code in Figure 7.17 is based on a case analysis of the node tag. The performance effect is that less heap space will be allocated, resulting in fewer garbage collections. Another effect is that cache performance increases since unused parts of the uniform nodes no longer take up precious cache space.
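As an illustration only (the type and field names below are invented for this sketch and are not those of the book's run-time system), variable-length nodes might be declared in C along the following lines:

    /* Hypothetical variable-length node layouts: a common tag prefix,   */
    /* followed only by the fields that the particular node type needs.  */
    typedef enum { Tag_Num, Tag_Appl, Tag_Func } Node_tag;

    typedef struct { Node_tag tag; int num; } Num_node;            /* 2 words */
    typedef struct { Node_tag tag; void *fun, *arg; } Appl_node;   /* 3 words */
    typedef struct {                                                /* function descriptor */
        Node_tag tag;
        int arity;
        const char *name;
        void *code;
    } Func_node;

    typedef struct { Node_tag tag; } Node;      /* the reducer sees only this prefix */

    int node_size(const Node *n) {              /* needed by the garbage collector */
        switch (n->tag) {
        case Tag_Num:  return sizeof(Num_node);
        case Tag_Appl: return sizeof(Appl_node);
        case Tag_Func: return sizeof(Func_node);
        }
        return 0;
    }

The case analysis in node_size() mirrors the case analysis the graph reducer and the garbage collector already perform on the node tag, which is why the change to them is small.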
Graph nodes are quite small, so the node tag field consumes relatively much memory. Note that a graph node is always identified through a pointer, or else it is garbage, so we can lift the tag from the node and put it in the pointer itself. The least significant bits in a pointer are 'free', because of alignment constraints for the nodes. Typically nodes are aligned on word (4-byte) boundaries, allowing 2 bits for tagging. Four tags is precisely what we need to encode the node types used by the basic graph reducer; the node type defined in Figure 7.15 enumerates five values, but we can easily represent 'Nil' as a constant (null) pointer. If more tags are needed, the most significant bits in a pointer can be used for tagging too. This reduces the maximal addressable memory space, but, when using 32-bit pointers, taking two bits off still leaves 1 Gbyte. In combination with the two least significant bits, this provides 16 different tags, which is enough to represent the (most common) node types in any graph reducer.

Pointer tagging saves one word per graph node, which reduces the pressure on the cache and garbage collector. Another advantage is that inspecting the tag now consists of selecting a few bits from the pointer instead of dereferencing it. The selection can be accomplished with a simple binary AND operation, and is cheaper than the memory lookup required before. Nothing comes for free, of course, but the price of pointer tagging is small: before dereferencing a tagged pointer the tag bits must be zeroed to convert it into a 'clean' pointer. It typically requires one binary AND operation to zero the tag bits, so the benefits far outweigh the costs.
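The following macros give a minimal sketch of pointer tagging in C; the tag names and values are assumptions made for the example, and they rely on all nodes being aligned on at least 4-byte boundaries:

    #include <stdint.h>

    #define TAG_BITS       ((uintptr_t)0x3)   /* two low bits are free on 4-byte aligned nodes */
    #define TAG_NUM        ((uintptr_t)0x1)   /* example tag values */
    #define TAG_APPL       ((uintptr_t)0x2)

    #define GET_TAG(p)     ((uintptr_t)(p) & TAG_BITS)
    #define SET_TAG(p, t)  ((void *)((uintptr_t)(p) | (t)))
    #define UNTAG(p)       ((void *)((uintptr_t)(p) & ~TAG_BITS))

Inspecting the tag is then a single AND (GET_TAG), and dereferencing a tagged pointer costs one extra AND (UNTAG) to zero the tag bits first.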
The basic code generation scheme from Section 7.6 translates a function into a sequence of calls to constructor functions like Num() and Appl() to build the graph representing the body of the function. This simplistic approach causes the nodes to be allocated one by one. It is more efficient to allocate them all at once. Aggregate allocation can be arranged, but requires some analysis of the AST to compute the total amount of heap space needed for the complete expression. Note that the compiler can no longer generate code that calls the constructor functions of the run-time system, but must generate the proper code for filling in the specific node type itself.
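A sketch of the difference, with an assumed bump-pointer allocator and assumed fill routines (none of these names come from the book's run-time system):

    #include <stddef.h>

    typedef struct node *Pnode;
    extern char *heap_alloc(size_t n_bytes);            /* one allocation, one heap-limit check */
    extern void fill_num(Pnode n, int value);           /* compiler-generated: write tag+fields */
    extern void fill_appl(Pnode n, Pnode fun, Pnode arg);
    extern Pnode fac_descriptor;                        /* descriptor node for the function fac */
    #define NODE_SIZE 16                                /* assumed uniform node size */

    /* Aggregate allocation of the graph for the expression 'fac 3': the
       compiler has computed that 2 nodes are needed, so it reserves them
       in one request and fills them in place, instead of calling Num()
       and Appl() and allocating node by node.                            */
    Pnode build_fac_3(void) {
        char *block = heap_alloc(2 * NODE_SIZE);
        Pnode num3 = (Pnode)(block + 0 * NODE_SIZE);
        Pnode appl = (Pnode)(block + 1 * NODE_SIZE);
        fill_num(num3, 3);
        fill_appl(appl, fac_descriptor, num3);
        return appl;
    }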
The binary application node supports the concepts of currying and higher-order functions very well, but has the drawback of generating long application spines for functions with many arguments. Calling an n-ary function in a lazy context requires n application nodes. Although aggregate allocation of the nodes at once reduces the costs of building the spine somewhat, more can be gained by representing the spine as a single vector of pointers to the arguments. A vector apply node is a variable-length graph node consisting of a function pointer, a length field, and n argument pointers. The length field is needed by both the graph reducer and the garbage collector to function properly. The vector apply node is more space efficient than a spine of binary application nodes if the number of arguments exceeds two (2+n versus 2n). Another advantage of using vector nodes is that
unwinding becomes cheaper, because the pointers in the application spine need not be traversed. Vector apply nodes frequently represent delayed function calls in which all function arguments are present; the number of arguments matches the arity of the function. In this case the vector apply node is in fact a self-contained closure. The graph reducer can improve performance by discriminating closures from curried applications. Normally the graph reducer unwinds an apply node by stacking the argument list before finally invoking the function through its code address. The arguments are passed as an array of pointers. Inside a closure the arguments already appear as an array, so the graph reducer can skip the unwind phase and invoke the function directly. This saves processing as well as stack space. It is an important optimization since most vector apply nodes hold closures and currying is used (relatively) sparingly in most functional programs.
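A possible C layout for a vector apply node, with an assumed calling convention (again, these declarations are illustrative, not the book's):

    typedef struct node *Pnode;

    typedef struct vector_apply {
        Pnode (*function)(Pnode args[]);   /* code address of the function            */
        int    length;                     /* number of argument pointers that follow */
        Pnode  args[];                     /* the arguments, as one array (C99)       */
    } Vector_apply;

    /* If length equals the arity of the function, the node is a closure and
       the unwind phase can be skipped: the arguments already form the array
       that the function expects, so it can be invoked directly.             */
    Pnode reduce_closure(Vector_apply *v) {
        return v->function(v->args);
    }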
This concludes the chapter on compiling functional programs. We saw that the increased level of abstraction provided by functional languages over traditional imperative languages puts a burden on the compiler writer. Some features, like pattern matching and list comprehension, are considered syntactic sugar and can be handled quite easily in the front-end. Fundamental features like currying, higher-order functions, and lazy evaluation, however, require substantial effort from the compiler back-end even when C is used as the target language. Lazy evaluation implies an execution order (normal-order reduction) that is different from the strict model offered by the target machine (applicative-order reduction in C). Therefore the language run-time system centers around an interpreter, the graph reducer, that repeatedly searches for the next computation step (reduction) in a graph of unevaluated expressions. To increase performance, the functional language compiler exploits several optimizations to minimize or even circumvent interpretation overhead (for example, graph manipulation). Strictness analysis is the most important optimization since it identifies which expressions must always be evaluated, so traditional code can be generated for them. Unfortunately, strictness analysis on symbolic code involving user-defined data structures is notoriously difficult. As a consequence a large fraction of the computation goes through the interpreter. This also explains why the garbage collector, discussed in Chapter 5, is another important component of the run-time system of a functional language.
- Functional languages promote a declarative style where programmers specify only what to compute. It is up to the compiler and run-time system to decide how, where, and when to compute (sub-)results.
- Functional languages include special syntax (for example, list comprehension) and programming constructs (for example, higher-order functions) to raise the level of abstraction. This complicates the compiler and the run-time system.
- Syntactic sugar is handled by the lexical analyzer and parser. This includes the offside rule, list notation, list comprehension, and pattern matching.
- The more fundamental issues concern polymorphic typing, referential transparency, higher-order functions, and lazy evaluation.
- A functional language compiler is typically structured as a front-end that transforms the complete source language to an intermediate language that contains only the essential features: basic data types, structured data types (lists), non-nesting functions, local bindings (lets), expressions, higher-order functions, and lazy evaluation semantics.
- Numerous optimizations are performed on the intermediate code (or actually the AST), before code is generated. Most compilers build on imperative compiler technology by generating C; C is used as a portable optimizing assembler.
- Pattern matching is handled by generating a simpler function which includes a sequence of if-then-else constructs to explicitly check whether or not actual arguments match a specific constant or type constructor.
- List comprehension is handled by a translation scheme that transforms the (semi-)mathematical notation into a set of higher-order functions that iterate over lists and filter out the right elements.
- Nested functions are floated to the top by the lambda-lifting transformation. Pointers to identifiers in outer scopes are translated into pointers to additional arguments that are passed to the lifted functions explicitly.
- Type checking of functional languages is complicated by polymorphic types; functions and expressions may have multiple types. Also, types need not be specified explicitly, so the compiler must infer the type of arbitrary expressions using type inference techniques.
- Type inference first determines the constraints that are imposed by the body of the function at hand. The set of constraints is then solved by unification, which results in the most general instance of the type of the function.
- Higher-order functions and lazy evaluation are tackled by an interpreter known as the graph reducer, which explicitly manipulates application nodes representing (partial) function calls. A function call with n argument expressions is represented as a spine of n application nodes, each holding a function and an argument pointer. An application spine that holds a function and all its arguments is known as a 'redex' (for 'reducible expression').
- The graph reducer is a three-stroke engine that repeatedly selects a redex, instantiates the right-hand side of the function, and updates the root of the redex. Instantiating the right-hand side of the function builds the graph representing the body and replaces the formal parameters with pointers to the actual arguments.
- The reduction order controls the selection of the next redex. Normal-order reduction always selects the leftmost outermost redex and, consequently, only reduces what is needed to progress the computation; this is lazy evaluation. Applicative-order reduction is used by imperative languages and evaluates the leftmost innermost redex first; expressions at argument positions are evaluated before calling the function.
- Normal-order reduction can handle a larger class of programs. It allows programmers to specify infinite data structures as long as they only use a finite part of them.
- The code generator emits C code, which will be invoked by the graph reducer, to instantiate the body of a user-defined function at run time.
- Expressions are translated into a sequence of application nodes. The if-then-else construct translates into a call to the conditional function supplied by the run-time system. Local bindings (let-expressions) translate into assignments to local variables in C.
- If a function f directly calls another function g (with some arbitrary argument expressions) then control can be transferred directly to g without using application nodes. This optimization short-circuits top-level function applications. Expressions occurring as operands to a built-in strict operator may be evaluated immediately instead of being transformed into a (lazy) piece of graph.
- A broad compiler, which has access to the AST of all functions, is needed to achieve acceptable performance. Strictness analysis and boxing analysis are program-wide analyses that avoid the manipulation of graphs in many cases.
- Strictness analysis identifies for each function which arguments will always be evaluated, so corresponding argument expressions need not be passed as a graph of function applications, but may be evaluated safely before the call.
- Determining the strict arguments of a function proceeds bottom-up through the AST. Function application is handled by looking up the strict arguments of the called function and passing the sets of strict arguments in the AST at the corresponding positions upward. The strictness properties of a recursive function are optimistically set to true. Then the strict arguments are determined by repeatedly propagating sets along the AST, until a fixed point is reached.
- Once strictness analysis is performed it is no longer necessary to pass strict arguments as graph nodes. Boxing analysis determines which arguments may be passed by value and takes care of changing between boxed and unboxed representations when switching from interpreted to compiled code.
- In assembly code, the tail call inside a routine R can be optimized into a stack adjustment plus a jump to the new routine C.
- In C we can only jump to the start of the current routine. This allows us to optimize the important case of directly tail-recursive functions by overwriting the parameters with the values of the tail call and jumping back to the start of function R.
- Often recursive functions do not end in a tail call, but in an operator that processes the result returned by the tail call. The accumulating argument transformation may be applied when the operator is associative and the tail call is recursive. The body of the function is reorganized to take an additional argument that accumulates partial results in a strictly tail-recursive manner; the operation is performed as input to the accumulating argument of the tail call.
- The end result is that state-of-the-art compilers for functional languages can emit code that is similar to traditional imperative counterparts for the class of numerical algorithms. Optimizing symbolic and inherently lazy programs is much more difficult, though.
- The last drop of performance can be gained by streamlining the graph reducer itself. We discussed four advanced graph-manipulation techniques: variable-length nodes, pointer tagging, aggregate node allocation, and vector apply nodes (or closures).
Functional programming differs considerably from imperative programming. The books by Bird (1998) and Thompson (1999) provide a gentle introduction into the world of currying and lazy evaluation. Both books use the Haskell language, which is the de facto standard functional language with lazy semantics and which succeeds early lazy languages like SASL, Miranda, and LML. Standard ML (Milner et al., 1997), on the other hand, is the prominent functional language with strict, or non-lazy, semantics. It is quite popular, and the book by Paulson (1996) is a good place to start. LISP, originating in the early sixties, is still alive, with Scheme (Dybvig, 1996) being the prominent representative.

Detailed information about the implementation of functional languages can be found in the proceedings of the annual International Conference on Functional Programming (ICFP), which combines the former biennial Functional Programming and Computer Architecture (FPCA) and Lisp and Functional Programming (LFP) conferences. The Implementation of Functional Languages workshops are also a rich source of information. Journals to consider are the Journal of Functional Programming and Higher-Order and Symbolic Computation (formerly: LISP and Symbolic Computation). A systematic study of functional language implementations is given by Douence and Fradet (1998).

This chapter discussed the modern approach of compiling functional languages using C as the universal assembler. This approach is the most recent stage in the historical development of implementing functional languages, which progressed from poorly performing interpreters (SKI combinators (Turner, 1979)) through supercombinators (G-machine (Johnsson, 1984)) to native code generation. The standard reference to the early work is The Implementation of Functional Programming Languages (Peyton Jones, 1987). The follow-up, Implementing Functional Languages (Peyton Jones and Lester, 1992), is especially useful for practitioners since it describes the spineless tagless G-machine that forms the heart of the state-of-the-art Haskell compiler from Glasgow University. The Glasgow Haskell Compiler (GHC) is available from the Internet (http://www.haskell.org/ghc/), and comes with a complete set of sources. It is structured such that 'others can plug in their own strictness analyzer, profiler, front-end, back-end, or other special pass'. This open structure, and the availability of all source code, eases experimentation with the compiler, although learning the GHC's internals is quoted to be 'a fairly substantial exercise'.
7.1  In Section 7.3, the type of map is described as map :: (b -> c) -> [b] -> [c], 'since the -> operator is right-associative, which is consistent with currying'. Explain why the right-associativity of the -> operator is consistent with currying and what would be wrong if the type of map were specified as map :: b -> c -> [b] -> [c].

7.2  Derive the type of the following function foldr, which folds a right-associative operator into a list of values:

     foldr op val []     = val
     foldr op val (x:xs) = op x (foldr op val xs)
7.3  Haskell uses the dot operator (.) to denote function composition (f . g). What is the polymorphic type of this built-in composition operator?
7.4  Give equivalent functional-core expressions for the following lists: [1..48]
7.5  Pattern matching is considered syntactic sugar that can easily be compiled away. Transform the function

     unique (a:b:cs) = if (a == b) then a : unique cs
                       else a : unique (b:cs)
     unique cs       = cs

     which contains a nested pattern, into an equivalent functional-core routine.

7.6  The pattern translation in Figure 7.6 contains code to catch cases in which all patterns fail. (a) Argue that the activation of this code signals an error in the specification of the function. (b) Argue that it signals an error in the use of the function.
7.7  Pattern matching by repeated variables; advanced topic: Some functional languages (for example Miranda, but not Haskell) allow another form of pattern matching, pattern matching by repeated variables. This pattern allows an argument name to occur more than once in the left-hand side of a function definition and requires these occurrences to have the same value for the match to succeed. This allows, for example, definitions like

     equal x x = 1
     equal x y = 0

     (a) Find out the precise semantics of this form of pattern matching in Miranda, with an eye to, for example, f (x:(y:xs)) (y:(x:ys)).
     (b) Design a translation technique for the translation of these patterns to the functional core.
     (c) Design the strictness computation rules for the code resulting from these patterns.

7.8  The definition of the quicksort function from Section 7.1.3 uses list comprehensions. Use the translation scheme in Figure 7.10 to derive a version that uses higher-order functions instead.

7.9  The transformed qsort function in Exercise 7.8 uses local functions. Use the lambda-lifting transformation to float these functions out.
7.10 Applicative-order reduction may compute expressions that are not needed. Give an example where applicative-order reduction even fails to terminate, while normal-order reduction succeeds in returning a result (by being lazy).

7.11 The three-stroke reduction engine ends each reduction step by updating the root with the result. Omitting the update may lead to exponential overhead. Give an example that demonstrates this.

7.12 The reduction engine in Figure 7.17 updates the root node by copying the value into it. This approach does not work if we extend the reducer to handle user-defined data structures, which can be of arbitrary size. The usual work-around is to introduce indirection nodes that consist of a tag and a pointer to the true value. Discuss the changes to the eval() code that are required to incorporate such indirection nodes.

7.13 Recursive definitions inside let-expressions can not be translated into assignments to C variables. Explain. A solution is to use a two-phase approach. First, generate code for the recursive definition while using a blank entry for each forward reference. Second, generate code to backpatch the blank entries. Give the translation for the function

     rep n = let lst = n : lst in lst

     which generates an infinite list of n's.

7.14 Explain the optimizations in Section 7.6.1 as examples of partial evaluation.

7.15 What are the strict arguments of the following function?

     f x y z 0 = x + z
     f x y z p = f y 0 0 (p-1) + f z z 0 (p-1)

7.16 Explain the unboxing transformation of Figure 7.33 to Figure 7.34 as an example of partial evaluation.
7.17 In the last few paragraphs of Section 7.7.3 we examined the problem of filling a parameter record with values, some of which possibly derive from that same parameter record. If, for simplicity, we assume that all values have the same size, we can formalize the problem as follows. Given the set A of assignments

     P_new[1] := f1(...);
     P_new[2] := f2(...);
     ...
     P_new[N] := fN(...);

     in which P_new[] is an array, and the function arguments (...) may contain any combination of the elements of an array P_old, give an algorithm for generating code for the assignments in A that works correctly even if P_new and P_old are actually the same array. These assignments do what is called in situ replacement.
7.18 The mappend function given in Section 7.4.3 looks like a suitable candidate for the accumulating argument transformation, but it is not. Explain. Hint: consider the performance consequences.

7.19 Discuss the benefits of using vector apply nodes when evaluating the following expression:

     take 100 (from 1)
     where take 0 xs     = []
           take n (x:xs) = x : take (n-1) xs
           from n        = n : from (n+1)

7.20 History of functional language implementation: Study Landin's paper (1964) on the mechanical evaluation of expressions, and identify and summarize the concepts and techniques described.
In the previous chapter we have seen that the compilation of functional programs is basically simple, once the idea of a closure is understood. The situation for logic programs is different: even implementing the basic mechanisms already poses considerable problems.

Logic programming is based on the specification of 'relations' between 'terms', 'facts' using these relations, and 'rules' for 'inferring' new facts from existing facts. Some of the terms involved in these facts and rules may be, or may contain, logical variables. Facts and rules together are called 'clauses'. A program in a logic language consists of a 'list of clauses'. The point of logic programming is that it is possible to request specific inferred facts by doing 'queries'. Such inferred facts are obtained by the program by searching through the clauses, combining them in various ways and doing backtracking when the combinations lead nowhere, and often even when they do lead to a desired fact. During this search, the variables in the rules may be tentatively bound to various terms, and these bindings will have to be undone during backtracking. The terms tried for variables may be constants, but may also be other variables or structures; these structures may hold constants, bound or unbound variables and/or further structures. Also, the variables are themselves terms. For these reasons, the usual means of binding variables to values, the assignment, is out of the question, and the binding is performed by a process called 'unification', to be explained further on in this chapter. Note that 'a variable is bound to a value', whereas 'a value is assigned to a variable'. The reason for using the verb 'to bind' in this way is that, unlike traditional variables, logic variables can be unbound (have no binding) or bound (have a binding), and that if they have a binding, that binding leads to a value (or another variable).

An example of a relation is 'is a parent of'. Facts using this relation might be 'Arne is a parent of Sachiko' and 'Sachiko is a parent of Rivka'. Another example of a relation is
'is a grandparent of', and the two relations might be linked in a rule 'X is a grandparent of Z if there is a Y such that X is a parent of Y and Y is a parent of Z', in which X, Y, and Z are logical variables. The system should then be able to infer from the above facts the new fact 'Arne is a grandparent of Rivka'.

The first logic language processors were interpreters, and in fact there has long been serious doubt if logic language programs could be compiled profitably at all. Today, for problems of a logic programming nature, logic programs compiled by high-end logic-language compilers achieve speeds that approach those obtained for the corresponding imperative implementations to within a factor of 2 or 3. Compilation of full backtracking search over the clauses and of unification of the terms requires substantial implementation effort even in their non-optimized form, but modern Prolog compilers achieve the remarkable speed of their object code by doing extensive optimizations. Fortunately, many optimizations consist of specializations of more general code and can be understood easily on the basis of that code.

We will first explain the full backtracking search mechanism by showing a very simple and inefficient interpreter for it (Section 8.2). This is followed by a discussion of the unification of two terms (Section 8.3). We then concentrate on a general compilation technique for full recursive backtracking search over the clauses combined with full recursive backtracking unification of the terms. Its main property is that it is a uniform implementation of a logic programming model that is slightly more general than is required by Prolog, and from which more specialized implementations can be derived (Section 8.4). Often, some of the search code can be performed at compile time and part of the remainder can be generated as chunks of in-line code. This often allows us to avoid calling upon the general search mechanism. This optimization is discussed in Section 8.4.3. Next, we restrict ourselves to unification as used in Prolog, which does not require backtracking; this yields a more imperative implementation of unification (Section 8.5). As with the search code, often some of the unification code can be performed at compile time and some of the remainder can be generated as chunks of in-line code. Finally, we apply one optimization to the unification code: 'Read'/'Write' mode (Section 8.5.4).

Some compilers for logic languages simplify life by generating C code rather than assembly code, and so will we in this chapter. There is a problem with this approach, however: the most natural expression of full backtracking search requires nested routine declarations, which standard C does not allow. Fortunately, one of the most important C implementations, the GNU C compiler, has an extension that allows exactly these nested routines. Also, nested routines are available in a few other imperative languages. Some compilers for advanced logic languages take the level of the target code another step higher and generate Prolog code (Debray, 1994).

Whereas few intermediate codes for imperative languages have reached more than local fame, there exists a more or less well-known intermediate code for Prolog: the Warren Abstract Machine, or WAM; this may reflect on the complexity of the task of compiling Prolog.
The WAM document defines 39 instructions that together allow an efficient compiled implementation of Prolog, including the 'cut' operator but excluding assert, retract, and the numeric operators (Warren, 1983). The instructions deal with such things as selecting the proper clause to use in inferring a fact, unifying under various circumstances, and backtracking. Each instruction corresponds reasonably closely
to a node in the AST of a Prolog program; this makes intermediate code generation relatively easy. The WAM does all memory allocation, for which it uses five stacks; it can do its own deallocation, but efficiency can be increased by using an internal compacting garbage collector on some of the stacks. The machine incorporates quite a number of optimizations to the basic Prolog execution model. Many of these optimizations interact with each other, with the result that most WAM instructions cannot be understood in isolation: several sections of the defining document are required to understand the purpose and applicability of the instructions. The WAM report gives translations of the WAM instructions into assembly code, which allows WAM-to-target-code generation almost by macro expansion. Ait-Kaci has written a tutorial on the WAM (Ait-Kaci, 1991). The full WAM is too detailed for treatment in this book. We will first concentrate on the execution model for logic languages and then derive a few WAM-like instructions from it, with optimizations. This approach avoids the complications inherent in interacting optimizations and may also be helpful in designing WAM-like instruction sets for other languages. For another derivation of some WAM instructions see Kursawe (1987).
8.2 The general implementation model, interpreted
8.3 Unification
8.4 The general implementation model, compiled
8.5 Compiled code for unification
We first briefly recall the concepts of logic programming, mainly to review the terminology and to introduce a simple working example.

Logic programming is based on relations between terms, facts stating such relations, and rules for inferring new facts from established facts. A fact consists of the name of the relation and its terms, in some syntactic notation; it is an instance of the named relation. An example of a relation is parent/2, which notation means that the relation is called parent and governs 2 terms; in other words, it is a binary relation. It is important to note that in a given logic language it is distinct from another relation parent/1, which would then govern 1 term. A few examples of facts using the relation parent/2 are:
    parent(arne, james).
    parent(arne, sachiko).
    parent(koos, rivka).
    parent(sachiko, rivka).
    parent(truitje, koos).
It is more than likely that the first fact is intended to mean that Arne is a parent of James, and that similar meanings should be attached to the other four facts, but of course such an interpretation is in no way necessary for a logic program to function. Given the above facts, a query could be

    ?- parent(X, rivka).

in which X is an unbound variable. Conventionally, names of variables begin with a capital letter, and names of constants begin with a small letter. The query ?- parent(X, rivka) asks the system to scan the facts, to find those which match the query, and to show the values of the variables for which the matches occur; note the plural 'matches', since more than one match may occur. The body of the query, parent(X, rivka), is our initial goal of the search. A fact matches a goal if there is a binding for the variables in the goal and the fact that makes goal and fact equal. The above goal results in two successive bindings, X = sachiko and X = koos, since the first makes the goal equal to the fact parent(sachiko, rivka) and the second to parent(koos, rivka). Queries do not need to contain variables: the query ?- parent(koos, rivka) results in yes and the query ?- parent(rivka, arne) results in no. On the other hand, facts may contain variables: the fact parent(X, rivka) means that X is a parent of Rivka for any value of X. The fact parent(X, rivka) can be considered as shorthand for an infinite number of facts: parent(a, rivka), parent(b, rivka), ... This feature is useful, for example to express the fact equal(X, X), which says that X equals X for any value of X. An example of a rule is

    grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
which means that when we have established the facts parent(X, Y) and parent(Y, Z) for some values of X, Y, and Z, we may infer the fact grandparent(X, Z). Another interpretation of this rule is that we may replace the goal grandparent(X, Z) by the goal list parent(X, Y), parent(Y, Z) for any bindings of X, Y, and Z. The part before the :- token is called the head of the rule, the part after the :- token is called the body of the rule. We see that the comma in the body can be interpreted as a logical AND. Again the rule may be seen as a summary of infinitely many rules, obtained by infinitely many different bindings for X, Y, and Z:
    grandparent(a, c) :- parent(a, b), parent(b, c).
        with X=a, Y=b, Z=c
    grandparent(truitje, rivka) :- parent(truitje, koos), parent(koos, rivka).
        with X=truitje, Y=koos, Z=rivka
    grandparent(arne, rivka) :- parent(arne, koos), parent(koos, rivka).
        with X=arne, Y=koos, Z=rivka
    ...
Note that the facts required by the second example have been established above and that we may therefore infer the fact grandparent(truitje, rivka); this is not the case in the first and the third example. Actually, facts and rules are not as different in nature as it seems: rules establish facts conditionally upon other facts, facts do so unconditionally. It is therefore convenient to play down the difference and combine them in the notion clause.
Now suppose the system is presented with the query

    ?- grandparent(arne, X).

which asks for an X of which arne is a grandparent, and makes grandparent(arne, X) our initial goal. The system then tries to find or infer the goal for some binding for X. There are no grandparent facts directly, but there is a rule for inferring them, the clause

    grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
To get an applicable instance of this clause, we make a copy of it with fresh variables in it:

    grandparent(X1, Z1) :- parent(X1, Y1), parent(Y1, Z1).
and try to make the goal grandparent(arne, X) and the head of this clause equal by binding the variables in them; this is the unification process we mentioned above. It is easy to see that the bindings X1=arne and Z1=X do the job; note that we can bind a variable (Z1) to a variable (X). This unification transforms the clause copy into

    grandparent(arne, X) :- parent(arne, Y1), parent(Y1, X).
We can now replace our goal grandparent(arne, X) by a new goal list: parent(arne, Y1), parent(Y1, X). Next, we try to unify the first part, parent(arne, Y1), with a fact and find that there are two possibilities, parent(arne, james) (with Y1=james) and parent(arne, sachiko) (with Y1=sachiko). The first transforms our goal list into

    parent(arne, james), parent(james, X).

but no fact can be unified with parent(james, X), so this is a dead end; the second transforms our goal list into

    parent(arne, sachiko), parent(sachiko, X).
and now we find that parent(sachiko, X) can be unified with the fact parent(sachiko, rivka) by using the binding X=rivka. So we have shown that our query can be reduced to a list of known facts by the binding X=rivka, and the answer to the query ?- grandparent(arne, X). is

    grandparent(arne, rivka).
This corresponds to the fact 'Arne is a grandparent of Rivka' mentioned at the beginning of this chapter. The process by which it was derived is an example of an inference. The above covers the basics of the logic paradigm. All logic languages have many additional features, usually including structures, lists, and other more advanced data types, integer arithmetic, string manipulation, ways to influence the search order, and ways to abort a search. We will only consider two of these additional features in this chapter, both from Prolog: structures and the 'cut' mechanism.
We can use a more precise version of the above inference technique to explain the workings of a very simple interpreter for logic languages. This interpreter is not intended as a serious implementation and serves as a model only; also, it implements only the bare bones of the logic paradigm, as explained above. The main data structure of this interpreter is the goal list stack. Each of the goal lists on the stack is a separate possibility to satisfy the query, independent of the other goal lists: the semantics of the stack is the logical OR of all its elements. This set-up is depicted in Figure 8.1. The interpreter works on the top goal list on the stack only. It pops it off the stack, processes it, and pushes zero, one, or more new goal lists back onto the stack. The stack starts with one entry: the body of the query, followed by a special goal that has the variables in the query and their names as terms. When this special goal is reached, answers to the query have been found and the processing stops. The goal list notation is extended with the construct [P ?= Q], which signifies a request to unify P and Q, and the construct [P == Q], which signifies that P and Q have been unified successfully. These constructs are used as follows. As we have seen above, unification serves to adapt a copy of a clause to a given goal list by finding the minimal substitution that will make the head of the clause and the first goal in the goal list equal. For example, the (clause copy, goal list) pair
Figure 8.1 The goal list stack (the stacked goal lists are combined by OR; the stack ends in a bottom element).
    grandparent(X1, Z1) :- parent(X1, Y1), parent(Y1, Z1).
    grandparent(arne, X), ...

        ...
            stack_argument(goal_arg2);
            stack_argument(goal_arg1);
            stack_argument(put_constant("parent/2"));
            interpret(goal_list_tail);
Figure 8.25 Translation of parent/2 with code for controlling asserted and retracted clauses.
A jump to L_fail signals exhaustion of the program-defined clauses; a jump to L_cut signals abandoning all further clauses for the relation, including those added by assert. The code in Figure 8.25 handles the assertion of clauses and the retraction of clauses that have been asserted. The retraction of program-defined clauses is implemented by preceding each call of a clause routine C by a test using a global Boolean retracted_C. Many Prolog implementers decide that this causes so much useless overhead that the implementation does not allow program-defined clauses to be retracted unless they are specifically declared as retractable. The goal lists in asserted clauses may contain goals with relations that occur only in other asserted clauses, in which case the processing of these goals has to be left to the interpreter, but they may also contain goals with program-defined relations. Since a routine has been compiled for such a goal, it is profitable to call that routine to process the goal, rather than leaving it entirely to the interpreter. This requires the run-time conversion of the name of a routine to a call to that routine.
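For concreteness, the test with a global Boolean retracted_C might look as follows; the clause routine name and its parameterless signature are assumptions for this sketch:

    extern void parent_2_clause_1(void);       /* a compiled, program-defined clause routine */

    int retracted_parent_2_clause_1 = 0;       /* one global Boolean per retractable clause */

    void try_parent_2_clause_1(void) {
        if (!retracted_parent_2_clause_1) {    /* skip the clause once it has been retracted */
            parent_2_clause_1();
        }
    }

    void retract_parent_2_clause_1(void) {     /* retracting the clause just sets the flag */
        retracted_parent_2_clause_1 = 1;
    }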
Figure 8.26 Switch to interpretation of added clauses: the compiled code stacks the goal arguments and the relation name, and then calls the interpreter routine interpret().
To effect this, we use the same entry() routine used in selecting code for unifying a constant, shown in Figure 8.22, and apply it here to the name of the relation and a fixed table of names of program-defined clauses. The dispatch routine do_relation() is shown in Figure 8.27 and uses the name of the relation and the fixed table of relation names to select a code sequence for that relation. The code sequence for parent/2 unstacks the goal arguments from the interpreter stack and passes them and an action routine to the compiled routine parent_2(); the other code sequences are similar. If the requested relation is not program-defined, the flow of control is directed to an interpreter routine interpret_asserted_relation(), which searches relations that stem from asserted clauses only. The action routine passed to do_relation() is an internal routine in the interpreter. It processes the goals still left on the interpreter stack and then calls the action routine originally passed to the interpreter in Figure 8.25. If we assume that the goal list stack is maintained explicitly in the interpreter, the relation name to be passed to do_relation() is found on the top of the stack and do_relation() is called as follows:

    do_relation(unstack_argument()->name, internal_goal_list_tail);
The process is depicted in Figure 8.28. The unstack_argument() call

    unstack_argument()->name

in the above call of do_relation() and those in
void do_relation(const char *relation, Action goal_list_tail) {
    switch (entry(relation, N, relation_table)) {
    case ENTRY_...:
        ...
    case ENTRY_parent_2:
        goal_arg1 = unstack_argument();
        goal_arg2 = unstack_argument();
        parent_2(goal_arg1, goal_arg2, goal_list_tail);
        break;
    case ENTRY_grandparent_2:
        ...
        break;
    case ENTRY_...:
        ...
    default:
        interpret_asserted_relation(relation, goal_list_tail);
        break;
    }
}

Figure 8.27 Dispatch routine for goals in asserted clauses.
Figure 8.28 Switch to execution of compiled clauses.
    goal_arg1 = unstack_argument();
    goal_arg2 = unstack_argument();
in Figure 8.27 unstack a goal, just as the inverse sequence in Figure 8.25 stacked a goal. Note, however, that this is not the same goal. The code in Figure 8.25 is located in the compiled program and stacks a goal from the compiled program, to be tried with asserted clauses by the interpreter; the code in Figure 8.27 is located in the interpreter and unstacks a goal from an asserted and interpreted clause, to be tried either with a compiled clause routine if one is available, or with asserted clauses.

The interplay between interpreter and compiled code allows the assert and retract operations to be implemented reasonably efficiently. Maintaining this mechanism is less of an imposition than it would seem to be, since much of it is needed anyway to support the assert and retract operations. An implementation similar to the one described here is applied to programs with tens of thousands of Prolog rules.

So far we have been concerned with compiled clause selection and its optimization. On the unification side, we have assumed the existence of a routine unify_terms(Term *goal_arg, Term *head_arg, Action goal_list_tail), which finds all possible unifications between goal_arg and head_arg and calls the action routine for each unification found. This generality allows one, for example, to implement sets as terms. Unification in Prolog, on the other hand, is simpler: it cannot succeed in more than one way, and if one step fails the whole unification fails; no backtracking inside the unification is required. So it is likely that the unification routines and their calling sequences can be made more efficient. Also, we would prefer to generate tailored code for the unification of specific terms, rather than call the general routine.
Since unification in Prolog either succeeds or fails, we can use the implementation of Section 8.3.2, which does exactly that. This requires a little rewriting of the parent/2 and grandparent/2 routines. The basic structure of these routines is the same as shown in Figures 8.15 and 8.19; they still have an action routine as their last parameter, and they still act as OR list procedures. But the clause routines require some modification; the one for parent(arne, sachiko) is shown in Figure 8.29. The call for the head unification has been replaced by a simple set of calls of the Boolean function unify_terms(), which receives no action routine. Each call tests one unification, and if they all succeed we can go on and call the function for the body, parent_2_clause_2_body(), which is again identical to that in Figure 8.16.

Figure 8.29 shows the use of the trail. The trail is one of the stacks of the run-time system; it is used to store pointers to variables that have been bound. Each clause routine R that calls unify_terms() declares a stack marker, and records in it the present stack level by calling set_trail_mark(). Each time unify_terms() or one of its subroutines binds a variable, it stacks its address by calling trail_binding() (Figure 8.6).
/* translation of 'parent(arne, sachiko).' */
void parent_2_clause_2(void) {
    Trail_Mark trail_mark;

    set_trail_mark(&trail_mark);
    /* translation of '(arne, sachiko)' */
    if (unify_terms(goal_arg1, put_constant("arne"))
    &&  unify_terms(goal_arg2, put_constant("sachiko"))
    ) {
        /* translation of ... */
        void parent_2_clause_2_body(void) {
            goal_list_tail();        /* translation of '.' */
        }
        /* translation of ... */
        parent_2_clause_2_body();
    }
    restore_bindings_until_trail_mark(&trail_mark);
}

Figure 8.29 Clause routine with flat unification for parent(arne, sachiko).
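For reference, here is a minimal sketch of what the trail primitives used in Figure 8.29 might look like; it assumes a fixed-size array-based trail and an unbind() helper, and is not the book's actual run-time code (Figure 8.6):

    typedef struct term Term;
    typedef int Trail_Mark;                       /* a mark is just a saved trail level */

    extern void unbind(Term *var);                /* assumed: make the variable unbound again */

    #define TRAIL_SIZE 10000
    static Term *trail[TRAIL_SIZE];
    static int trail_top = 0;

    void set_trail_mark(Trail_Mark *m)  { *m = trail_top; }
    void trail_binding(Term *var)       { trail[trail_top++] = var; }

    void restore_bindings_until_trail_mark(Trail_Mark *m) {
        while (trail_top > *m) {
            unbind(trail[--trail_top]);
        }
    }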
At the end, when all possibilities for this binding have been examined, the routine R unstacks all addresses of variables from the stack until the original level is reached again, by calling restore_bindings_until_trail_mark() on the same stack marker. Note that it is not possible to know statically how many variables have been bound: the calls of unify_terms() may have failed immediately, have failed just before the end, or may have succeeded; so a dynamic record must be kept.

Now that we have 'flattened' the unification we can take the next step towards compiled unification. Since the heads of the clauses are known from the program, we often find ourselves generating calls of unify_terms() with constant parameters; examples are

    unify_terms(goal_arg1, put_constant("arne"))
from Figure 8.29, and

    unify_terms(goal_arg,
        put_struct_2("times", put_constant("2"), put_variable("X"))
    )

used to unify the head structure times(2, X) and the goal argument. In such cases it seems unnecessary to invoke the heavy interpretive unification mechanism of unify_terms(), and the code looks the more clumsy because it creates complicated data structures just for the purpose of comparing them. One could try to remove that inefficiency by creating all fixed data structures at compile time and passing them to unify_terms(), but still the full interpretive unification mechanism would be employed. Also, this technique would fail for structures with constants and variables inside.
Both inefficiencies can be removed simultaneously by directly incorporating the information about the head arguments in the compiled unification code. This leads to the unification 'instructions' UNIFY_CONSTANT, UNIFY_VARIABLE, UNIFY_STRUCTURE, and several others. As with any interface, there are two sides to these instructions: how they are used and how they are implemented. Their use is demonstrated in Figure 8.30, which features two applications of UNIFY_CONSTANT. These applications are to be seen as macros, and for that reason they are written in capital letters, following C convention. They expand into code that falls through if the unification succeeds, after establishing the proper bindings, and jumps to the label L_fail when it does not. Note that the UNIFY_... instructions are code segments rather than routines, in that the flow of control can leave them by falling through and jumping out. This jump-on-failure fits in with the WAM instructions for clause selection, as seen, for example, in Figure 8.22.
/* optimized translation of 'parent(arne, sachiko).' */
void parent_2_clause_2(void) {
    Trail_Mark trail_mark;

    set_trail_mark(&trail_mark);
    /* optimized translation of '(arne, sachiko)' */
    UNIFY_CONSTANT(goal_arg1, "arne");       /* macro */
    UNIFY_CONSTANT(goal_arg2, "sachiko");    /* macro */

    /* optimized translation of '.' */
    goal_list_tail();
L_fail:
    restore_bindings_until_trail_mark(&trail_mark);
}

Figure 8.30 Clause routine with compiled unification for parent(arne, sachiko).
As to the implementation of these instructions, we can of course design the code that implements them by hand, but it is more instructive, and perhaps also safer, to derive them by partially evaluating the corresponding call to unify_terms(). The code to be substituted for UNIFY_CONSTANT(goal_arg, head_text) is shown in Figure 8.31; g is a temporary variable of type Term *, which loses its meaning at the end of the code segment. The code was derived semi-automatically by partially evaluating the call

    unify_terms(goal_arg, put_constant(head_text))

Since partial evaluators for C are still experimental and not widely available, this particular partial evaluation was performed by hand, as demonstrated in the next section.
g = deref(goal_arg);
if (g->type == Is_Variable) {
    trail_binding(g);
    g->term.variable.term = put_constant(head_text);
}
else {
    if (g->type != Is_Constant) goto L_fail;
    if (strcmp(g->term.constant, head_text) != 0) goto L_fail;
}

Figure 8.31 The unification instruction UNIFY_CONSTANT.
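Since the text describes UNIFY_CONSTANT as a macro, the code segment of Figure 8.31 could, for example, be packaged as follows; this wrapping is only a sketch, and the surrounding run-time declarations (Term, deref(), trail_binding(), put_constant()) are assumed to be in scope:

    #define UNIFY_CONSTANT(goal_arg, head_text)                                   \
        do {                                                                      \
            Term *g = deref(goal_arg);          /* g is local to the segment */   \
            if (g->type == Is_Variable) {                                         \
                trail_binding(g);                                                 \
                g->term.variable.term = put_constant(head_text);                  \
            }                                                                     \
            else {                                                                \
                if (g->type != Is_Constant) goto L_fail;                          \
                if (strcmp(g->term.constant, (head_text)) != 0) goto L_fail;      \
            }                                                                     \
        } while (0)

The do/while(0) wrapper gives g exactly the limited lifetime the text describes, while the goto L_fail preserves the jump-on-failure flow of control.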
The code for UNIFY_CONSTANT (Figure 8.31) holds no surprises (fortunately!). On the one hand, it is similar to what a programmer might have written for the task; on the other hand one recognizes fragments from the general unification code from Figures 8.6 and 8.7. A similar process produces the code for UNIFY_VARIABLE (Figure 8.32). It differs somewhat because the head variable may already be bound.
v = deref(head_var);        /* head_var may already be bound */
g = deref(goal_arg);
if (v->type == Is_Variable) {
    if (g != v) {
        trail_binding(v);
        v->term.variable.term = g;
    }
}
else {
    /* no further compilation possible */
    /* call interpreter */
    if (!unify_terms(g, v)) goto L_fail;
}

Figure 8.32 The unification instruction UNIFY_VARIABLE.
Above, the code for the unification instruction UNIFY_CONSTANT was said to be derived by manual partial evaluation. We will now show how this can be done with only minor effort. In the case of UNIFY_CONSTANT, the derivation consists of in-lining the subroutines of unify_terms(), replacing head_arg by put_constant(head_text), and then repeatedly simplifying the code and deleting unreachable code. The process required the following operations, of an almost mechanical nature.

- In-lining. Start from unify_terms() (Figure 8.4) and in-line the routines it calls for handling variables (Figure 8.7), including unify_unbound_variable(). This is easy to do since they all have the same parameters; just inserting the bodies suffices. The test for constants is already in-line; we do not in-line unify_structures() since we foresee we will not need it. The result is still C; see Figure 8.33.
- Constant as head. Remove the now useless statement Term *head = deref(head_arg), since head_arg is not a variable. Change the heading to unify_constant(Term *goal_arg, char *head_text) and replace head by (put_constant(head_text)) throughout the body. See Figure 8.34.
- Substitution. We replace (put_constant(head_text))->type by Is_Constant, (put_constant(head_text))->term.constant by head_text, and then (put_constant(head_text))->term.* by ERRONEOUS. The code no longer compiles, due to the presence of ERRONEOUS. See Figure 8.35.
- Normalization. We replace Is_Constant == Is_Variable by 0, and (goal == (put_constant(head_text))) by 0 since goal->type is Is_Variable at that point; and we replace goal->type by Is_Constant in the switch statement, since that is its value in that context. The code again does not compile, due to the presence of ERRONEOUS.
- Simplification. Remove the ineffective operations (|| 0) and the unreachable code. The code compiles again, since all occurrences of ERRONEOUS were located in unreachable code, as desired. See Figure 8.37.
- Maintenance. Replace return 1 by fall-through and return 0 by goto L_fail, to match the WAM flow of control. Remove superfluous parentheses. This is the final product; see Figure 8.38.

The process of partially evaluating a part of an interpreter to obtain an optimized code instruction can be applied in many situations. More in particular, it produced the code segments for UNIFY_VARIABLE (Figure 8.32) and UNIFY_STRUCTURE (Figure 8.39).
int unify_terms(Term *goal_arg, Term *head_arg) {
    /* Handle any bound variables: */
    Term *goal = deref(goal_arg);
    Term *head = deref(head_arg);

    if (goal->type == Is_Variable || head->type == Is_Variable) {
        /* Handle identical variables: */
        if (goal == head) return 1;

        /* Bind the unbound variable to the other term: */
        if (head->type == Is_Variable) {
            trail_binding(head);
            head->term.variable.term = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = head;
        }
        return 1;                /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != head->type) return 0;

        switch (goal->type) {
        case Is_Constant:        /* both are constants */
            return strcmp(goal->term.constant, head->term.constant) == 0;
        case Is_Structure:       /* both are structures */
            return unify_structures(
                &goal->term.structure, &head->term.structure
            );
        }
    }
}

Figure 8.33 Deriving the instruction UNIFY_CONSTANT, stage 1.
Deriving the code for UNIFY_STRUCTURE along the lines explained above is straightforward most of the way, but the resulting code ends in a loop to unify the components, which does not fit the notion of an 'instruction'. The WAM instruction for unifying structures only unifies the functor and the arity; separate instructions must be generated for unifying the components. We use the same separation here, but some care is required. The code for UNIFY_STRUCTURE can be found in Figure 8.39. Figure 8.40 shows an application of it
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable
    ||  (put_constant(head_text))->type == Is_Variable) {
        /* Handle identical variables: */
        if (goal == (put_constant(head_text))) return 1;

        /* Bind the unbound variable to the other term: */
        if ((put_constant(head_text))->type == Is_Variable) {
            trail_binding((put_constant(head_text)));
            (put_constant(head_text))->term.variable.term = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;                /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != (put_constant(head_text))->type) return 0;

        switch (goal->type) {
        case Is_Constant:        /* both are constants */
            return strcmp(
                goal->term.constant, (put_constant(head_text))->term.constant
            ) == 0;
        case Is_Structure:       /* both are structures */
            return unify_structures(
                &goal->term.structure,
                &(put_constant(head_text))->term.structure
            );
        }
    }
}

Figure 8.34 Deriving the instruction UNIFY_CONSTANT, stage 2.
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable || Is_Constant == Is_Variable) {
        /* Handle identical variables: */
        if (goal == (put_constant(head_text))) return 1;

        /* Bind the unbound variable to the other term: */
        if (Is_Constant == Is_Variable) {
            trail_binding((put_constant(head_text)));
            ERRONEOUS = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;                /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != Is_Constant) return 0;

        switch (goal->type) {
        case Is_Constant:        /* both are constants */
            return strcmp(goal->term.constant, head_text) == 0;
        case Is_Structure:       /* both are structures */
            return unify_structures(&goal->term.structure, ERRONEOUS);
        }
    }
}

Figure 8.35 Deriving the instruction UNIFY_CONSTANT, stage 3.
in the compiled code for the head times(2, X). The compiled code first establishes the presence of the structure, then acquires a local variable goal_comp pointing to the component list, and then steps through the list, unifying the components in turn. As usual, if any of the unifications fails, the instruction executes a jump to the label L_fail.

We see from Figures 8.31 and 8.39 that when the goal argument is an unbound variable, the corresponding constant or structure must be created and the variable bound to it. This is necessary to allow further unification and/or reporting of the value in answer to a query. There is a problem here, though: whereas put_constant() creates the entire
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable || 0) {
        /* Handle identical variables: */
        if (0) return 1;

        /* Bind the unbound variable to the other term: */
        if (0) {
            trail_binding((put_constant(head_text)));
            ERRONEOUS = goal;
        }
        else {
            trail_binding(goal);
            goal->term.variable.term = (put_constant(head_text));
        }
        return 1;                /* always succeeds */
    }
    else {
        /* Handle terms of different type: */
        if (goal->type != Is_Constant) return 0;

        switch (Is_Constant) {
        case Is_Constant:        /* both are constants */
            return strcmp(goal->term.constant, head_text) == 0;
        case Is_Structure:       /* both are structures */
            return unify_structures(&goal->term.structure, ERRONEOUS);
        }
    }
}

Figure 8.36 Deriving the instruction UNIFY_CONSTANT, stage 4.
constant, put_structure() creates only the record for the structure, fills in the functor and the arity, but leaves the component entries empty. This matches the behavior of the second part of the code for UNIFY_STRUCTURE, which checks only the type, the functor, and the arity, but leaves testing the component entries to the code following it. This code (Figure 8.40) cannot, however, handle uninitialized component entries, as would be supplied by the first part if the goal variable is an unbound variable. We solve this problem by initializing the components with anonymous variables in the call to initialize_components() in Figure 8.39; these variables can then be bound to the
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    /* Handle bound goal_arg: */
    goal = deref(goal_arg);
    if (goal->type == Is_Variable) {
        trail_binding(goal);
        goal->term.variable.term = (put_constant(head_text));
        return 1;                /* always succeeds */
    }
    else {
        if (goal->type != Is_Constant) return 0;
        return strcmp(goal->term.constant, head_text) == 0;
    }
}

Figure 8.37 Deriving the instruction UNIFY_CONSTANT, stage 5.
int unify_constant(Term *goal_arg, char *head_text) {
    Term *goal;

    goal = deref(goal_arg);
    if (goal->type == Is_Variable) {
        trail_binding(goal);
        goal->term.variable.term = put_constant(head_text);
    }
    else {
        if (goal->type != Is_Constant) goto L_fail;
        if (strcmp(goal->term.constant, head_text) != 0) goto L_fail;
    }
    return 1;

  L_fail:
    return 0;
}

Figure 8.38 Deriving the instruction UNIFY_CONSTANT, final result.
components of the structure by the instructions for the entries. Figures 8.41 and 8.42 show the effects of the code of Figure 8.40 when goal_arg is an unbound variable Y, represented as a null pointer. Figure 8.41 shows the result of the instruction UNIFY_STRUCTURE(goal_arg, "times", 2). Three allocations
g = deref(goal_arg);
if (g->type == Is_Variable) {
    trail_binding(g);
    g->term.variable.term = put_structure(head_functor, head_arity);
    initialize_components(g->term.variable.term);
}
else {
    if (g->type != Is_Structure) goto L_fail;
    if (g->term.structure.arity != head_arity
    ||  strcmp(g->term.structure.functor, head_functor) != 0)
        goto L_fail;
}

Figure 8.39 The unification instruction UNIFY_STRUCTURE.
/* match times(2, X) */
UNIFY_STRUCTURE(goal_arg, "times", 2);                  /* macro */
{   /* match (2, X) */
    register Term **goal_comp =
        deref(goal_arg)->term.structure.components;
    UNIFY_CONSTANT(goal_comp[0], "2");                  /* macro */
    UNIFY_VARIABLE(goal_comp[1], X);                    /* macro */
}

Figure 8.40 Compiled code for the head times(2, X).
have taken place: one for the structure times/2, one for an array of two components, and one for two 'anonymous' variables, which were named _A01 and _A02 here. The instructions UNIFY_CONSTANT(goal_comp[0], "2") and UNIFY_VARIABLE(goal_comp[1], X) then find these anonymous variables and bind them to their respective arguments, as shown in Figure 8.42. We see that the same instructions can perform unification with data structures from the goal when these are present and construct them when they are not.

We are now in a position to generate in-line code for the unification of constants, variables, and structures as head arguments with goal arguments; unification of lists as head arguments is very similar to that of structures. Compiled unification performs effectively the same operations on the data structures as interpreted unification does, with the exception of the introduction of anonymous variables as structure components. It is much faster, since many of the decisions have been taken at compile time. Note also that compiled unification is applied to all arguments, whereas the optimized clause selection (Section 8.4.3) examines the first argument only.

Figure 8.41 The effect of the instruction UNIFY_STRUCTURE. (Diagram: the unbound goal variable Y is bound to a new times/2 structure whose two components are the fresh anonymous variables _A01 and _A02.)

Figure 8.42 The effect of the instructions of Figure 8.40. (Diagram: _A01 is now bound to the constant "2" and _A02 to the variable X.)
The WAM features several additional optimizations on the unification instructions; we will discuss one of them in some detail in the next section and briefly mention some others in the section thereafter.
There is a second solution to the problem of the uninitialized component entries, already hinted at in the wording above: leave the entries uninitialized and modify the unification instructions so they can handle that situation. This solution is especially attractive because the anonymous variables are clearly an artifact of our implementation and continue to add at least one level of indirection to all operations on the structure; avoiding them will result in considerable speed-up, since many Prolog programs manipulate structures and lists extensively. The solution cannot be implemented as it stands, however, and requires two modifications.

The first is simple: the unification instructions on the components have to be notified that the component entries have not been initialized. This is done by introducing a global state variable 'mode' with two values, 'Read' and 'Write'. In 'Read' mode, each instruction works as described above. In 'Write' mode the instruction creates precisely the data structure that it would unify with in 'Read' mode, and initializes the proper component entry with a pointer to it. Unification starts in 'Read' mode, and a UNIFY_STRUCTURE that needs to create the structure switches to 'Write' mode. Since structures can be nested, the mode must be saved at the start of structure unification and restored at the end of it.

The second modification is also simple but more far-reaching. The unification instructions as described above get a pointer to the goal argument to work on, and unification instructions for the components of structures get pointers to the components. As a result, when we do not initialize the component entries, the instructions get uninitialized pointers, and no amount of clever programming can make them do anything sensible with them. To initialize a component entry with a pointer to a data structure, the instruction needs the address of the component entry, rather than its (uninitialized) value. So the unification instructions need pointers to pointers to data structures; as the saying goes, there is no problem in computer science that cannot be solved by introducing another level of indirection. Fortunately, this is easy to do, as shown in Figure 8.43, where we have just added the address operator & to the goal arguments. The code saves the mode in mode_1 before calling the read/write version of UNIFY_STRUCTURE, since that instruction may set the mode to 'Write', to warn the following instructions. The mode is restored when all structure components have been unified. Nested structures will need more than one temporary mode_N to save nesting modes and more than one temporary goal_comp to address the various component lists.

Now that we have seen how the new instructions are used, we can look into their implementation. The read/write version of UNIFY_CONSTANT is shown in Figure 8.44 and is representative of the general approach. The code for 'Read' mode is the same as that for the original UNIFY_CONSTANT (Figure 8.31), except that there is an extra dereference operation on the pointer to the goal argument. The code for 'Write' mode creates the constant and stores a pointer to it under the goal argument pointer, which in 'Write' mode will be an entry in a component list of a goal structure. The code for UNIFY_STRUCTURE (Figure 8.45) follows this pattern, with the exception that it sets the mode to 'Write' rather than initializing the components, when the goal argument is an unbound variable.

Figures 8.46 and 8.47 show the effects of the code of Figure 8.40 with goal_arg an
/* match times(2, X) */
mode_1 = mode;                                          /* save mode */
UNIFY_STRUCTURE(&goal_arg, "times", 2);                 /* macro */
    /* saves read/write mode and may set it to Write_Mode */
{   /* match (2, X) */
    register Term **goal_comp =
        deref(goal_arg)->term.structure.components;
    UNIFY_CONSTANT(&goal_comp[0], "2");                 /* macro */
    UNIFY_VARIABLE(&goal_comp[1], X);                   /* macro */
}
mode = mode_1;                                          /* restore mode */

Figure 8.43 Compiled code with read/write mode for the head times(2, X).
if (mode == Read_Mode) {
    g = deref(*goal_ptr);
    if (g->type == Is_Variable) {
        trail_binding(g);
        g->term.variable.term = put_constant(head_text);
    }
    else {
        if (g->type != Is_Constant) goto L_fail;
        if (strcmp(g->term.constant, head_text) != 0) goto L_fail;
    }
}
else {    /* mode == Write_Mode */
    *goal_ptr = put_constant(head_text);
}

Figure 8.44 The instruction UNIFY_CONSTANT with read/write mode.
unbound variable Y, this time using read/write mode. Figure 8.46 shows that the instruction UNIFY_STRUCTURE(goal_arg, "times", 2) has now left the component array entries uninitialized, avoiding the overhead of the construction of the anonymous variables. The instructions UNIFY_CONSTANT(goal_comp[0], "2") and UNIFY_VARIABLE(goal_comp[1], X) recognize this situation and fill in the component array entries with their respective arguments, as shown in Figure 8.47.

Compiled unification with read/write mode creates exactly the same data structure as a set of calls to the put_...() routines, except that only that part of the data structure is created that is not already present in the goal argument.
if (mode == Read_Mode) {
    g = deref(*goal_ptr);
    if (g->type == Is_Variable) {
        trail_binding(g);
        g->term.variable.term = put_structure(head_functor, head_arity);
        mode = Write_Mode;       /* signal uninitialized goals */
    }
    else {
        if (g->type != Is_Structure) goto L_fail;
        if (g->term.structure.arity != head_arity
        ||  strcmp(g->term.structure.functor, head_functor) != 0)
            goto L_fail;
    }
}
else {    /* mode == Write_Mode */
    *goal_ptr = put_structure(head_functor, head_arity);
}

Figure 8.45 The instruction UNIFY_STRUCTURE with read/write mode.
Figure 8.46 The effect of the instruction UNIFY_STRUCTURE with read/write mode. (Diagram: the goal variable Y points to a times/2 structure whose component entries are left uninitialized.)
The above UNIFY_... instructions are simpler than the actual WAM instructions, which feature several further optimizations. We will briefly discuss them here; for a full treatment see the tutorial by Aït-Kaci (1991).

We saw that the unification of structures is somewhat complicated by the fact that they can nest. This necessitates saving the mode as shown in Figure 8.43, and leads to the use of several component pointers at the same time, each of which we would like to allocate in a machine register. The WAM avoids these complications in a drastic way: it forbids nested structures. If the head argument consists of a nested structure, it must be uni-
Figure 8.47 The effect of the instructions of Figure 8.40 with read/write mode. (Diagram: the component entries now hold the constant "2" and the variable X directly.)
fied in top-down breadth-first fashion: the goal and head top-level structures are unified first, skipping any substructures. Next, the second level of structures is unified, skipping any subsubstructures, etc. Since deeply nested head arguments to clauses are rare, this restriction is not much of a problem in practice. The restriction has several advantages: saving the mode has become superfluous; only one goal_comp pointer is needed, which can easily be placed in a register; and the door is opened to other optimizations, which we will describe now.

Since only the UNIFY_STRUCTURE instruction can set the mode to 'Write', the mode is guaranteed to be 'Read' in any unification not inside a structure. Since simpler code can be generated in 'Read' mode (for example the code of Figure 8.31) than when the mode is unknown (Figure 8.44), the WAM has special instructions GET_... to be used instead of UNIFY_... whenever 'Read' mode is guaranteed statically; so GET_CONSTANT would be used instead of UNIFY_CONSTANT in Figure 8.30. Note that since no structure unification is ever started inside a structure, there is no UNIFY_STRUCTURE in the WAM, just a GET_STRUCTURE.

Since there is only one pointer to the component list, goal_comp, it can be implicit in the machine and need not be passed explicitly to the instructions. And since the unification instructions for structure components (the UNIFY_... instructions) are differentiated from those used outside structures (the GET_... instructions), GET_STRUCTURE can set the pointer goal_comp and the UNIFY_... instructions can use it and then increment it, to reach the next component. This also removes the need for the extra indirection described above. As a result, the compiled unification code of Figure 8.43 reduces to

    GET_STRUCTURE(goal_arg, "times", 2);        /* macro */
    UNIFY_CONSTANT("2");                        /* macro */
    UNIFY_VARIABLE(X);                          /* macro */
Note that UNIFY_CONSTANT and UNIFY_VARIABLE each have one parameter now, the head argument known from the rule head; the goal argument is implicit.

A different optimization concerns the instructions GET_VARIABLE and UNIFY_VARIABLE. The first use of a free variable X is guaranteed to be unbound, so the instruction reduces to

    v = head_var;
    g = deref(goal);
    trail_binding(v);
    v->term.variable.term = g;

for GET_VARIABLE and to

    v = head_var;
    if (mode == Read_Mode) {
        g = deref(*goal_comp++);
        trail_binding(v);
        v->term.variable.term = g;
    }
    else {    /* mode == Write_Mode */
        *goal_comp++ = v;
    }

for UNIFY_VARIABLE; note the use of the goal component entry pointer goal_comp in the latter, since it is used only during structure unification. The original GET_VARIABLE and UNIFY_VARIABLE instructions are renamed to GET_VALUE and UNIFY_VALUE, although these names are somewhat misleading: the second and further uses of a head variable need not represent a value at all, since it may have become bound to an unbound variable.
This concludes our discussion of some optimizations of compiled unification in the WAM, and of compiled unification in general. We have seen that the building blocks for compiled unification can to a large extent be derived semi-automatically by the partial evaluation of calls to the interpreter. This method was already applied to the imperative languages in Section 4.2.3.2, but the gain was limited there. The difference is that here the method is applied manually and to a relatively simple interpreter. Professional compilers for functional and logic languages combine these or similar techniques and use them in combination with optimized clause selection (Section 8.4.3) and tail call elimination (Section 7.7.3), and such traditional optimization techniques as constant propagation, in-lining, and cloning. The resulting code can come within a factor of two or three with respect to the speed of a similar program in an imperative or object-oriented language.
Summary

- Logic programming is based on (1) the specification of relations between terms, (2) facts using these relations, and (3) rules for inferring new facts from existing facts. Specific inferred facts can be requested by doing queries.
- Facts and rules together are called clauses; they can contain logic variables.
- A logic variable can be (1) unbound; (2) bound to a value, in which case it represents that value; or (3) bound to another variable, in which case it represents whatever that variable represents. A binding of a variable in a clause gives a new clause; the new clause is called an instantiation of the more general clause.
- Clauses can be used to infer new facts, as follows. A clause consists of a head and a list of goals; dependencies between the head and goals are expressed by logic variables. If in an instantiation of a clause all goals are established facts, the head of the clause can be inferred as a fact.
- Answering a query means finding bindings for the variables in it that turn the query into a known or inferred fact.
- To infer a goal fact F, a clause is sought that might infer F, given proper variable bindings. Possible proper bindings are found by unifying F with the head of the clause. The clause is then instantiated using these bindings; the instantiated goal list yields new goals. Some of these may be known facts, others may need to be inferred. If all goals are eventually reduced to known facts or empty, F is inferred.
- The state of the search for clauses and bindings can be recorded in a stack of extended goal lists. An extended goal list is a list of goals and unification requests.
- The inference process considers the top stack entry only. If the top stack entry is empty, we have found an answer to the query and report it, using the instruction 'Succeed'. If the top stack entry contains unification requests, an attempt is made to satisfy these, using the instructions 'Unify' and 'Match'; if this fails, the top entry is removed. Otherwise, clauses are located that might infer the first goal in the top entry. The instruction 'Attach clauses' is used to make copies of the top entry and insert unification requests with these clauses in them.
- All Prolog interpreters and compiled Prolog programs work with this inference process or optimizations of it.
- The inference process acts only on the topmost entry and mainly on its first goal, except for variable binding, which acts on the whole topmost entry. The non-first goals in the non-topmost entries are almost exclusively copies of those in the topmost stack entry, and it is an essential optimization to combine them. However, once combined, variable binding in the topmost entry affects them all. To undo this effect when the topmost entry is removed, all bindings are recorded when performed and undone when the topmost entry is removed. This is called backtracking.
- Unification is a top-down process on two terms, one from a head term and one from a goal term. The terms are viewed as trees. For unification to succeed, the two top nodes must be equal and have the same number of children, and the pairwise unification of the children of both terms must succeed.
- An unbound variable is unified with a value or variable by binding it to that value or variable. The binding is recorded for backtracking purposes.
- Structures, lists, and sets are unified by the same rules as terms. Variables in them are unified by the rules for variable unification.
- Most Prolog compilers generate either C code or WAM code. The Warren Abstract Machine code is an assembly-level macro code that allows convenient optimizing Prolog compilation.
- In a compiled Prolog program, each relation is implemented as a routine, and so is each clause.
- The first goal in the routine for a clause C is processed by a call G to the routine compiled for the relation in that first goal. This routine gets an action routine as a parameter that will be called whenever G succeeds in finding a proper set of bindings. The action routine is constructed to process the rest of the goals of C and then call C's action routine. This implements the AND component of inference. The technique requires nesting routine declarations.
- The routine compiled for a relation R successively tries all clauses for R, passing to them the action routine it obtained from its caller. Copying the goal list has been replaced by copying the action routine. This implements the OR component of inference.
- Static and dynamic analysis can show that many of the calls to clauses for a relation cannot succeed, and should be avoided. The WAM has special instructions for this.
- The WAM has an instruction for clause selection based on the nature of the first goal argument: constant, variable, structure, or list; it effects a four-way jump. Next, for constants and structures as goal arguments it has instructions which perform N-way jumps, based on the value of the constant or the functor of the structure. The code positions reached through these jumps are occupied by calls to exactly the clauses that can have first head arguments with the selected properties.
- Exhaustion of the search possibilities is implemented as a jump to a fail label in the WAM. The Prolog 'cut' is implemented by jumping to the fail label when backtracking over it.
- Assert is implemented by conditionally calling the interpreter at the beginning and the end of each routine for a relation. Retract is implemented by conditionally calling the clause routines. If the interpreter finds in an asserted clause a goal for which a compiled routine exists, it switches back to the compiled code.
- Naive compiled code for unification is inefficient because it often compares known constants to known constants and because it often creates data structures just to compare them.
- To avoid the inefficiencies, the WAM has special instructions for unification with head arguments that are constants, variables, structures, and lists. These instructions are partial evaluations of the general unification routine, specialized for head arguments that are constants, variables, structures, and lists. They apply to all head arguments, first or otherwise.
- If a WAM unification instruction meets a goal value, it compares the value to its built-in value. If the two match, the instruction continues; otherwise it jumps to the fail label. If a WAM unification instruction meets a goal variable, it creates a copy of its internal value, binds the variable to it, and continues.
- The WAM unification instructions for structures and lists create the top nodes of the structure or list only. The creation of the children is left to further unification instructions, which are notified of this situation by turning from 'Read' mode to 'Write' mode.
- Using the above and many other optimizations, compiled logic programs can come within a factor of two or three with respect to the speed of equivalent programs in imperative and object-oriented languages.
Further reading

Information about the implementation of logic languages can be found in journals like the Journal of Logic Programming and ACM Transactions on Programming Languages and Systems, and in the proceedings of, for example, the International Conference on Logic Programming and the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI.

The original explanation of the Warren Abstract Machine is given by Warren (1983). Aït-Kaci (1991) has written a tutorial on the WAM. The definitive survey of sequential Prolog implementation, both historically and technically, was written by Van Roy (1994); it contains an extensive literature list.
Exercises

8.1  Prolog tries the different subgoals in the body of a clause from left to right. In theory, other evaluation orders are also possible, for example from right to left. For each of the two queries

         ?-  grandparent(arne, Z).
         ?-  grandparent(X, rivka).

     determine whether left-to-right or right-to-left order is more efficient (using the grandparent relation of Section 8.1). What can we learn from this?
8.2  In Section 8.2.1 we showed how the interpreter processes the query ?- gp(arne, X). Process this query again, but now using the optimizations discussed in Section 8.2.2.
8.3  Given the following relations:

         author(grune, parsingTechniques).
         author(jacobs, parsingTechniques).
         author(grune, programmingLanguageEssentials).
         author(bal, programmingDistributedSystems).
         author(bal, programmingLanguageEssentials).
         coauthor(X, Y) :- author(X, Book), author(Y, Book).

     (a) Show how the interpreter described at the beginning of Section 8.2 processes the query:

             ?- coauthor(bal, grune).

     (b) Describe which code will be generated for these relations using the techniques of Section 8.4.2.
     (c) Describe which code will be generated for the relations using the optimized clause selection technique described in Section 8.4.3.
8.4  Project: In Section 8.2 we presented an implementation model in which the goal lists are kept in a queue which acts like a stack, since the 'Attach clauses' instruction manipulates it in a last-in-first-out fashion. Explore the effects of changing the 'Attach clauses' instruction so as to manipulate the queue entries in a first-in-first-out fashion: it removes the top element, but adds the new clauses at the bottom. Which of the optimizations discussed here are still applicable?

8.5  A language designer proposes to allow queries in which relation names can be variables as well. Discuss the consequences for the general interpreted implementation model.

8.6  (a) Show how the interpreter processes the unification of p(X, [2, X, 4]) and p(3, [A, B, 4]), using the techniques described in Section 8.3.
     (b) Describe which code will be generated for this unification using the techniques described in Section 8.5.

8.7  Write a unification routine unify_lists() for lists as described in Section 8.3.1. Why is this code an improvement over a direct application of unify_structures()?

8.8  In Section 8.3.3 we explain that the unification sequence B to A, C to B, D to B, E to C leads to the binding tree in Figure 8.10b rather than the one in Figure 8.10a. Is there a binding sequence that would lead to the tree in Figure 8.10a, and if so, what is it?

8.9  Refer to the example at the beginning of Section 8.4.1 on list procedures. The procedure return_two_values() could also return a list with both values it wants to return, and let the caller consume the results one at a time. Why will this not work in general?

8.10 Write a list procedure that accepts two list procedures specifying integer lists and that implements the intersection of both lists.

8.11 Construct code for a clause with two cuts in it:

         first_first_grandparent(X, Z) :- parent(X, U), !, parent(U, Z), !.

     which tries to say that X is a first-first-grandparent of Z if Z is the first child of the first child of X.

8.12 Find out from the Prolog manual how assert and retract should behave with respect to backtracking, and verify whether the approach from Section 8.4.5 supports this behavior.

8.13 Design an implementation of optimized clause selection and retracted clause handling in an integrated fashion, using bit vectors. Hint: see Colomb (1988).

8.14 (a) The meta-logical operator var(X) in Prolog is defined to succeed if X is currently an unbound variable and to fail otherwise. How can this operator be implemented in compiled Prolog?
     (b) The meta-logical arithmetical infix operator is in Prolog requires its right operand to be a structure E that represents an arithmetic expression in some specific format. All variables in E must be bound. If V is currently an unbound variable, the operation succeeds and V is bound to the arithmetic value of E. If V is bound and its value is equal to the arithmetic value of E, the operation succeeds. Otherwise it fails. How can this operator be implemented in compiled Prolog? Assume the existence of a routine Term *evaluate_expression() that evaluates an expression in the specific format to a term of type Constant, or gives an error message if the expression is in error. Also assume single-length integer arithmetic.

8.15 Suppose one introduces in Prolog an arithmetic operator between, with the syntax

         V is between E1 and E2

     which evaluates E1 and E2 as in Exercise 8.14. If V is unbound, the operation succeeds with V bound successively to all arithmetic values between E1 and E2 inclusive (of which there may be zero!). If V is bound and its value lies between E1 and E2 inclusive, the operation succeeds. Otherwise it fails. How can this operator be implemented in compiled Prolog?

8.16 Derive a unification instruction UNIFY_LIST for unifying lists, similar to the one in Figure 8.39.

8.17 History of logic language implementation: Study Robinson's paper (1965), which introduces unification as an essential part of theorem proving and logic programming (requires considerable mathematical sophistication), or his later paper (Robinson, 1971), which concerns the implementation of unification, and write a summary of it.
9  Parallel and distributed programs

Parallel and distributed systems consist of multiple processors that can communicate with each other. Languages for programming such systems support constructs for expressing concurrency and communication. In this chapter, we will study how such languages can be implemented. As we will see, the presence of multiple processors introduces many new problems for a language implementer.

We will first define what we mean by a parallel system and what by a distributed system. In a parallel system, the goal is to solve a given problem as fast as possible, using multiple processors that cooperate to solve a single problem. Parallel processing typically features in applications that would take a huge amount of compute time, often weeks or months, on a single processor. So, the reason for using a parallel system is to obtain performance improvements.

A distributed system also contains multiple processors, but now the processors work on many different tasks from many users. There is normally no single problem involved. A typical configuration is a system with one workstation per user, plus some additional processors providing a file service or a time-shared compute service. The processors in a distributed system are autonomous and are connected by a network. There are many reasons for using a distributed system instead of a single, centralized machine: a distributed system is more cost-effective, potentially more fault-tolerant, and it is easier to extend the system incrementally.

Interestingly, it is possible to use a distributed system as a parallel one. In fact, many research groups have studied how to use a collection of (idle) workstations for running parallel programs. This approach is attractive, because many institutes already have the workstations installed.
An important issue in compiling parallel and distributed programs is the architecture of the target system. During the past decades, many architectures for parallel and distributed systems have been designed. In this chapter, however, we focus on only two widely-used machine models: multiprocessors and multicomputers. The difference between these two architectures is that with a multicomputer, processes on different processors run in different address spaces, whereas with a multiprocessor at least part of the address space is shared (see Figure 9.1). In a multiprocessor, all processors have access to a single, shared memory. The processors communicate by reading and writing variables in the shared memory. Multiprocessors can be constructed, for example, by connecting several processors to a single bus. Other, more scalable, methods also exist.

A multicomputer consists of several processors connected by a network. The processors communicate by sending messages over this network to each other. The network can either be especially designed for the multicomputer, as in the CM-5 or SP-2, or it can be an off-the-shelf local area network (LAN), such as Ethernet. A distributed system can be regarded as a multicomputer with an off-the-shelf LAN.
Figure 9.1 Multiprocessors and multicomputers: (a) a multiprocessor, in which all processors access one shared memory; (b) a multicomputer, in which each processor has its own memory and communicates over a network.
Many models and languages have been proposed for parallel and distributed programming, and a great amount of research has been done on how to implement these languages efficiently. In this chapter, we necessarily cover only a small fraction of all this work. Rather than trying to give a broad overview of compilation techniques for many different paradigms, we will focus on a few important models. We will focus on language constructs for parallel programming, although many constructs, for example message passing, can be used just as well for writing distributed applications. For distributed applications, additional language constructs exist, such as atomic transactions, but these will not be discussed here.

The parallel programming models we discuss are: shared variables, message passing, objects, Tuple Space, and data-parallel programming. The first two models, shared variables and message passing, are low-level communication mechanisms that reflect the underlying hardware. Objects, Tuple Space, and data parallelism are high-level constructs that allow communication to be expressed at a higher level of abstraction. The models are described in Section 9.1. Their implementation is based on processes and threads, which are treated in Section 9.2. In Sections 9.3 to 9.6 we discuss implementation techniques for shared variables, message passing, objects, and Tuple Space. In Section 9.7, we look at automatic parallelization, which tries to generate data-parallel programs from sequential programs.

We should emphasize that much interesting research is done on other programming paradigms too, in particular on functional and logic languages, and on other parallel programming models, for example collective communication. Also, we will restrict ourselves to compilation techniques for multiprocessors and multicomputers built out of simple microprocessors. Much research has also been done on alternative forms of parallel hardware, for example vector computers, super-scalar machines, and data-flow machines. We will not cover these in this book.
Roadmap
9 Parallel and distributed programs
9.1 Parallel programming models
9.2 Processes and threads
9.3 Shared variables
9.4 Message passing
9.5 Parallel object-oriented languages
9.6 Tuple space
9.7 Automatic parallelization
A few general remarks can be made about the implementation of parallel languages. Even more than with logic and functional languages, much of the implementation comes in the form of a run-time system, rather than compiler support. Some of the more popular parallel programming systems (for example PVM and MPI) are even implemented as libraries that are linked with a program written in a traditional language, such as C or FORTRAN. These programming systems need no direct compiler support whatsoever. A problem with this approach is that the programmer is responsible for making correct library calls, since there is no compiler that checks the calls. Incorrect usage of the library occurs, for example, when a process sends a message containing an integer value, while the receiver expects a floating point value. The type error will not be caught, and will result in incorrect and unpredictable program behavior. This approach is comparable to programming in assembly, where the library calls are regarded as machine instructions.

Other programming systems are designed as relatively small extensions to existing languages; they are implemented using compilers that do check for errors such as the one described above. Compilers for such languages usually are fairly simple, however, and most of the language implementation consists of a complex run-time system.

An interesting area where compilers do play an important role concerns languages with implicit parallelism. In most languages, the programmer is required to express parallelism explicitly, and to use statements to coordinate the parallel activities (for example by using message passing). In languages with implicit parallelism, on the other hand, the compiler tries to parallelize the program automatically and to generate communication code. Such compilers are very hard to build and are sometimes even referred to as 'heroic' compilers. We study such compilers in Section 9.7.

One problem in studying parallel run-time systems is that the boundary between the Operating System (OS) and the run-time system is often fuzzy. Many primitives, such as processes and communication, can be implemented either in the OS or in the run-time system. A recent trend is to move functionality from the OS to the run-time system. The advantages are that it saves expensive interactions with the OS and that it gives the compiler writer more control over the implementation of the functionality. An extreme form used in some systems is to map devices, for example the network interface, in user space and to let the run-time system control the device without any involvement from the OS. In our discussions, we will always make clear which functionality we assume the OS to provide.
9.1 Parallel programming models

A parallel programming model provides support for expressing parallelism as well as communication and synchronization between parallel tasks. Below we briefly describe the programming models used in this chapter.
Perhaps the simplest parallel programming model is that of a collection of processes communicating through shared variables. A process is an abstraction of a physical processor. Each process executes program code sequentially. Conceptually, a process may be thought of as containing a virtual processor, which supplies the processing power, and an address space, which stores data; so an address space is an abstraction of physical memory. The concepts of process and address space are used in many other parallel programming models (discussed later) as well. A program can create multiple processes to obtain parallelism. A new process is created with the fork statement:

    fork sort(A, 1, 10);

which forks off a copy of sort() as a new process and passes three parameters to it.

With the shared-variable programming model, at least part of the address spaces of multiple processes overlaps, so that these processes can access the same variables. Such variables are called shared variables. A given shared variable can be read and written by some or all processes, and can thus serve as a mechanism for communication between these processes.
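Although fork is a language-level statement, a run-time system on a single node might map each forked process onto an OS-level thread, an approach discussed further in Section 9.2. The following sketch shows one possible mapping using POSIX threads; the argument-marshaling structure and the routine names are invented for this example and are not part of the model itself.

    #include <pthread.h>
    #include <stdlib.h>

    /* Hypothetical argument record for 'fork sort(A, 1, 10);' */
    struct sort_args {int *A; int low; int high;};

    extern void sort(int *A, int low, int high);

    /* Thread wrapper: unpack the arguments and run the forked routine. */
    static void *sort_process(void *p) {
        struct sort_args *a = p;
        sort(a->A, a->low, a->high);
        free(a);
        return NULL;
    }

    /* What generated code for 'fork sort(A, 1, 10);' might look like. */
    void fork_sort(int *A) {
        pthread_t tid;
        struct sort_args *a = malloc(sizeof *a);
        a->A = A; a->low = 1; a->high = 10;
        pthread_create(&tid, NULL, sort_process, a);   /* create the new process */
        pthread_detach(tid);        /* the forked process runs independently */
    }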
An important problem with this programming model is synchronizing the access to the shared variables. If multiple processes simultaneously try to change the same data structure, the result will be unpredictable and the data structure may be left in an inconsistent state. A typical example is two processes that simultaneously try to increment a shared variable X by executing:

    X: shared integer;

    X := X + 1;
Assume that X initially contains the value 5. Clearly, if both processes increment the variable, the resulting value of X should be 7. What may happen instead, however, is that both processes read the original value (5) and compute the new value (6) based on the original value. Next, both processes write this new value into X. As a result, the variable is assigned the value 6 twice, instead of being incremented twice.

To prevent such undesirable behavior, synchronization is needed to make sure that only one process can access a given shared variable at a certain time. This form of synchronization is called mutual exclusion. A simple primitive to achieve mutual exclusion is the lock variable, which has indivisible operations to acquire (set) and release the lock. If a process tries to acquire a lock that has already been acquired, this process will block until the lock has been released. Lock variables can thus be used to restrict the access to a given shared data structure to a single process only. The example above would be written as follows using lock variables:

    X: shared integer;
    X_lock: lock;

    Acquire_Lock(X_lock);
    X := X + 1;
    Release_Lock(X_lock);
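For comparison, the same guarded increment in a C-based run-time system could be expressed with a POSIX mutex playing the role of the lock variable; this is only an illustrative sketch, not part of the model above.

    #include <pthread.h>

    int X = 5;                                             /* shared variable */
    pthread_mutex_t X_lock = PTHREAD_MUTEX_INITIALIZER;    /* its lock */

    void increment_X(void) {
        pthread_mutex_lock(&X_lock);       /* Acquire_Lock(X_lock) */
        X = X + 1;                         /* the protected update */
        pthread_mutex_unlock(&X_lock);     /* Release_Lock(X_lock) */
    }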
Now, only one of the processes can execute the increment statement at a given time. Unfortunately, this method is rather low-level and error-prone. For example, the programmer should guard all accesses to shared variables by locks. If the increment statement occurs in several places in the program, and one occurrence is not protected by lock statements, the program is still in danger of behaving incorrectly.

A more structured and higher-level solution to the synchronization problem, which we use in this chapter, is the monitor. A monitor is similar to an Abstract Data Type in that it is a type that contains data and the operations to access the data. The data encapsulated by a monitor are shared by multiple processes. The key idea of a monitor is to allow only one operation inside the monitor at any given time. Thus, all monitor operations are executed in a mutually exclusive way.

Besides mutual exclusion, a second form of synchronization is needed, called condition synchronization. With condition synchronization, a process can be blocked until a specific condition occurs, for example until a certain process has produced a result that is needed. With monitors, condition synchronization usually is expressed through condition variables, which are shared variables of type Condition, on which two indivisible operations are defined: wait() and signal(). The operation wait(c) on a condition variable c blocks the invoking process; a signal(c) wakes up one process blocked in a wait on the same condition variable. The two primitives can only be invoked from within a monitor operation. An important feature of the wait operation is that a process that blocks in a wait operation is temporarily lifted out of the monitor, so another process may enter the monitor.

A simple example of a monitor is given in Figure 9.2. The monitor contains a variable representing a 'bin' and operations to put data in the bin and take data out of it. The put() operation blocks if the bin already contains data and the get() operation blocks if the bin does not yet contain data. When the operations wake up after being blocked, they recheck the status of the bin, because the status may have changed again since the signal operation was issued.
monitor BinMonitor;
    bin: integer;
    occupied: Boolean := false;
    full, empty: Condition;

    operation put(x: integer);
    begin
        while occupied do          # wait if the bin already is occupied
            wait(empty);
        od;
        bin := x;                  # put the item in the bin
        occupied := true;
        signal(full);              # wake up a process blocked in get
    end;

    operation get(x: out integer);
    begin
        while not occupied do      # wait if the bin is empty
            wait(full);
        od;
        x := bin;                  # get the item from the bin
        occupied := false;
        signal(empty);             # wake up a process blocked in put
    end;
end;

Figure 9.2 An example monitor.
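To make the semantics concrete, the bin monitor of Figure 9.2 could be implemented in C on top of POSIX threads roughly as follows; the mutex provides the monitor's mutual exclusion and the two condition variables correspond to full and empty. This is only a sketch of one possible mapping, not the implementation used later in this chapter.

    #include <pthread.h>

    static int bin;                      /* the data protected by the monitor */
    static int occupied = 0;
    static pthread_mutex_t monitor = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t full = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t empty = PTHREAD_COND_INITIALIZER;

    void put(int x) {
        pthread_mutex_lock(&monitor);             /* enter the monitor */
        while (occupied)                          /* wait if the bin is occupied */
            pthread_cond_wait(&empty, &monitor);  /* releases the monitor while blocked */
        bin = x; occupied = 1;                    /* put the item in the bin */
        pthread_cond_signal(&full);               /* wake up a process blocked in get */
        pthread_mutex_unlock(&monitor);           /* leave the monitor */
    }

    void get(int *x) {
        pthread_mutex_lock(&monitor);
        while (!occupied)                         /* wait if the bin is empty */
            pthread_cond_wait(&full, &monitor);
        *x = bin; occupied = 0;                   /* get the item from the bin */
        pthread_cond_signal(&empty);              /* wake up a process blocked in put */
        pthread_mutex_unlock(&monitor);
    }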
The shared-variable programming model is primarily intended for programming shared-memory multiprocessors, machines with physical shared memory. The model is less suitable for multicomputers, since such machines do not have a shared memory for storing shared variables. An alternative programming model, message passing, is suitable for both multiprocessors and multicomputers. With message passing, each process can access only its own local data. Processes can exchange data by sending messages to each other. The basic model uses two primitives, the send statement and the receive statement, as illustrated below:

    process1:  send(process2, message);
    process2:  receive(process1, message);
or receive print(size, text)by size; # order the messages by increasing size
e have so far assumed that messages are received by an explicit receive statement. Analternativemethod,called i ,is tocreate a new ~~r~~~for each incoming ss, with its own pro~ramcounter and stack. the context of a process and can access the address spaceof this process: a thread does not have an address space of its own. This thread will execute a ,a routine that is definedby the p r o ~ r a ~for e reach type of message. After handler, the thread terminates. With implicit receipt, multiple threads may cess. All these threads can access the
of that process. Threads can also be used for other purposes. For example, if a process wants to send a request to a remote process, it can create a separate thread to senda message and waitfor the result. In the mean time, the original thread (for example the main program) can continue processing. Threads have become an important concept in many parallel pro 9.2, ming systems, and will be discussed in Section In this bookwe assume that the threadsof a process execute ina pseudo-parallel way, each that means which e (single) processor on thisruns; Alternatively, called is also multiple memory multiprocessor could be assigned to the process, in which case the threads can executephysically in parallel. If thesynchronization ~echanismsdescribed for shared variables (for example locks and monitors) are usedfor synch~onizingaccess to the global variables of the process, the program will execute correctly in both cases,
Sharedvariablesandmessagepassingarelow-levelmodelsthatdirectlyreflectthe shared~~emory and distributed- memo^ machine architectures. Many other parallel prog r a ~ n languages g have been designed that are based on more abstract models. Examples include parallel functional, logic, and object-oriented languages. In this chapter, we will discuss the latter category, which contains a large ~ u ~ b ofe langua~es r ( ~ i l s o nand Lu, 1996). The key idea o f object-oriented program~ngis to ‘encapsulate’ the data in objects. The data inside an object can only be accessed through on the object. Other impo~antconcepts are classes, i andGrune, 1994), butweneednotdiscussthesehere.One of thegreatestadvantages claimed for object-oriented p r o g r a ~ i n gis that it resultsin well-st~cturedprograms, making the technique suitable for writing large programs and easing the reuse of software components.Theseadvantagesare just as important for parallel p r o g ~ a ~ ~ as i nforg se~uentialprogra~ming, whichexplainsthe large interestinparallelobject-oriented languages, ~ ~ a l ~ e l can i s mbe introduced by allowing several objects to executeat the same time, possibly on different processors, by letting a process execute inside the object. Comunication between objects is expressed through operation invocations:an object can invoke an operation on another object, possibly located on a remote processor. Operation invocatio~ is similar to message passing, but it is more cleanly integrated in the language sema~tics, Just as with message passing, there are many alternatives for invoking and servicing an operation.Synchronousinvocationswaituntiltheoperationhasbeenexecuted,while asynchronous invocations continue imme~iately.The receiving object can accept invocations explicitly or implicitly. Many parallel object-oriented languages allow the process inside an object to consist of multiple threads of control. A popular model is to have one threadfor the main process of the object and to createan a~ditionalthread on the flyfor each operation invocation; this implements implicit receipt, ~ynchronizationof these threads can be expressed using a any parallel object-oriented languages are based on monitors.
Anotherapproachtoobtain a morehigh-leveltract p r o g r a ~ i n gmodel is throughan appropriate com~unicationdata structure:the ace. TupleSpace wasdesignedas system. Linda is a small set of simple primitives that can be added to an existingsequentiallanguage,resultingin a newparallellanguage.This idea hasbeen applied to several base languages, resulting in parallel languages like C/Linda, F~RTRANLinda,and LispLinda. The Tuple Space is a conceptual shared memory that is addressed ‘associatively’, as follows. Tuple Space can be regarded as a box of tuples -records -that can be accessed by all processes in the program, regardless of the processor on which they run. In this sense, Tuple Spaceis a shared memory; the pointis, however, that TupleSpace can also be implemented reasonably efficiently ona dist~buted~memory system, as we will see. Three operations are defined on Tuple Space:
-o u t adds a tuple to theTuple Space; -read reads a matching tuple in TupleSpace; -in reads a matching tuple in Tuple Space and simultaneous~yremoves the tuple from Tuple Space. For example, using C/L,inda, thecall o u t ( l l i t e m l l ,4 , 5 - 4 8 ) ;
generates a tuple with three fields (a string, an integer, and a floating point number) and deposits it in Tuple Space. The read ( ) and in ( ) operations search for a tuple in Tuple Space. For each field of the tuple, they can specifyeither
eter, which is an expression passedby value, or ter, which is a variable preceded by a ‘?’symbol and which is passed by reference. The terns ‘actual’ and ‘formal’ are used with this meaning in the Linda literature; they should not be confused with the traditional use of ‘actual and formal parameters’ in prog r a m ~ ~ nlanguages, g For example, the call float f ; i n ( l f i t e m ’ l ,4 , ? &f)
;
specifies two actual parameters (a string and an integer) and one formal parameter (of type float). This call tries to find a tuple T with three fields, such that:
-the actual parameters in the call have the same types and values as the corresponding fields in tuple
-the formal parameters in thecall have the same types as theco~espondingfields in tuple T.
Thisprocessiscalled t . If a tuple T is foundthatmatches,anyformal parameters in the call get the values of the co~espondingtuple fields of T‘, and the tuple T is removed from Tuple Space;all this is performed as an i ~ ~ i v i ~ui~~eZr ~e t i o also n ( called ~ t u ~ action), i c If the matching tuple was generatedby the call
o u t ( l f i t e m l l ,4 , 5 . 4 8 )
;
the variable f will get the value5.48. If more than one tuple exists in the TupleSpace that i n ( ) or matches,one is selectedarbitrarily. If nomatchingtupleexists,thecallto read ( ) blocks: the calling process is suspended until another process adds a matching tuple. The Tuple Space primitives are i ~ ~ i v i ~ in i ~the Z esense that if two processes simultaneously try to i n the same tuple, onlyone of them will succeed; the otherone will block. Linda does not contain any primitives to modify a tuple while it is in Tuple Space. Instead, thetuplemustbetakenout of Tuple Space, modifiedlocally,andthenputback. The advantage of this approach is that it automatically provides mutual exclusion synchronization on the Tuple Space. As an example, suppose we want to increment the second field of the tuple shown above. We can then use the following code: intn; float f; i n ( l l i t e m l l , ? & n , ? &f) ; o u t ( I 1 i t e m l 1 ,n + 1, f ) ;
The first statement gets the tuple and removes it from Tuple Space. The second statement increments the second field and puts back the new tuple. If two or more processes try to execute this code at the same time, onlyone process will succeed in inning the tuple. The others will block until the tupleis back. This achieves the right synchronization behavior. The above example shows how simple shared data structures can be built in Tuple Space. More complicated data structures can be built using a similar approach. The Linda model resembles the shared variable model, but an important difference is that Tuple Space is addressed associatively. A tuple does not have an address, as a word in memory has. The read ( ) and i n ( ) primitives specify a description for a tuple, and the system has to find a tuple that matches this description.
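The associative addressing described above is the part a run-time system must implement: it has to search its tuple store for a tuple whose fields match the actuals and formals of an in() or read() call. The fragment below sketches one naive, centralized way of doing this for tuples with integer fields only; the data structures, the negative-value convention for formals, and all names are invented for this sketch, and Section 9.6 discusses realistic implementation techniques.

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_FIELDS 4

    struct tuple {
        int n_fields;
        int field[MAX_FIELDS];
        struct tuple *next;
    };

    static struct tuple *store;                    /* the tuple store */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t added = PTHREAD_COND_INITIALIZER;

    /* out(): add a tuple to the store. */
    void out_tuple(int n_fields, int fields[]) {
        struct tuple *t = malloc(sizeof *t);
        t->n_fields = n_fields;
        memcpy(t->field, fields, n_fields * sizeof(int));
        pthread_mutex_lock(&lock);
        t->next = store; store = t;
        pthread_cond_broadcast(&added);     /* wake up blocked in()/read() calls */
        pthread_mutex_unlock(&lock);
    }

    /* in(): find a matching tuple and remove it atomically; block if none.
       A negative value in the template marks a formal (wildcard) field. */
    void in_tuple(int n_fields, int template[], int result[]) {
        pthread_mutex_lock(&lock);
        for (;;) {
            for (struct tuple **tp = &store; *tp; tp = &(*tp)->next) {
                struct tuple *t = *tp;
                int i, match = (t->n_fields == n_fields);
                for (i = 0; match && i < n_fields; i++)
                    if (template[i] >= 0 && template[i] != t->field[i])
                        match = 0;          /* an actual field differs */
                if (match) {
                    memcpy(result, t->field, n_fields * sizeof(int));
                    *tp = t->next;          /* remove the tuple from the store */
                    pthread_mutex_unlock(&lock);
                    free(t);
                    return;
                }
            }
            pthread_cond_wait(&added, &lock);   /* wait for another out() */
        }
    }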
Another important parallel programming model is data parallelism. With data parallelism, all processors execute the same algorithm (and code), but operate on different parts of a data set, which usually is an array. In contrast, in a parallel program using processes, different processors can execute different algorithms. The latter kind of parallelism is generally known as task parallelism. Data parallelism is more restrictive and less flexible than task parallelism, but it is easier to use, since only a single algorithm has to be defined for all processors. A usual approach is to let the programmer define which computations can be executed in parallel, and let the compiler distribute the computations and the data among the different processors. Some parallel languages do this by providing built-in data-parallel operations. Fortran 90, for example, supports operations on matrices that can be executed in a data-parallel way. The simplest explicit language construct for data-parallel programming is the parallel loop. For example, matrix multiplication can be expressed as follows:
    parfor i := 1 to N do
        parfor j := 1 to N do
            C[i,j] := 0;
            for k := 1 to N do
                C[i,j] := C[i,j] + A[i,k] * B[k,j];
            od;
        od;
    od;
This example specifies that the steps of the two outer loops can be executed in parallel, leaving it to the compiler to actually distribute the computations over the different processors. An important advantage of data parallelism is that, due to their simple structure, data-parallel programs are easier to analyze by a compiler. We will discuss this in more detail in Section 9.7, when we look at automatic parallelization. Figure 9.3 summarizes the five parallel programming models.
Figure 9.3 Summary of the five parallel programming models: message passing, shared variables, parallel objects, Tuple Space, and data-parallel.
We now turn to the implementation of the five parallel programming models described above. All these models use processes and many also use threads in addition to processes. Therefore, in this section, we first discuss the implementation of processes and threads.

The first important thing to note is that modern operating systems already provide a process abstraction. This should come as no surprise, since a process is essentially an abstraction of a physical processor, and it is the task of the OS to provide abstractions of hardware entities. Unfortunately, using OS processes for implementing language-level processes often has a high overhead. In particular, a context switch from one OS process to another is often expensive, since it requires intervention from the OS kernel. Also, creating OS processes is usually expensive, because such processes are heavyweight; they carry a large amount of state information, such as open file descriptors and accounting information.

An alternative way to implement language-level processes is to use a single OS process per physical processor, and implement all language-level processes on that processor as threads. This approach is used in many parallel languages, since it often is far more efficient than using multiple OS processes. If the language itself also provides a threads concept, language-level threads can then be implemented in the same way as language-level processes. Below, we discuss how to implement threads.

Many - but not all - operating systems provide threads in addition to processes. A thread carries much less state information than a process, so it is more lightweight. If threads are managed by the OS, however, all thread operations (creation, context switching, and so on) still have to go through the OS, which is expensive. An alternative solution is to implement threads in user space. In general, this is easy to do, and in fact many user-level thread packages exist. The only complicated operation is a thread context switch, which requires saving the current program counter, stack pointer, and registers, and replacing them with the previously saved values from another thread. Since this operation cannot be expressed in most high-level languages, a few lines of assembly code are usually needed for implementing thread context switches.

A fundamental problem with user-level threads concerns blocking calls to the OS kernel. If one thread invokes a blocking operation (for example a receive statement), another thread in the same process should be allowed to continue. If the OS does not know about the existence of user-level threads, however, the OS will block the entire process, thus preventing other threads from running. Fortunately, several solutions to this problem have been invented (Anderson et al., 1991). The basic idea is to let the OS know about the existence of threads, but to avoid involving the OS in each and every thread operation. This idea, called scheduler activations, is supported in several modern operating systems, for example Solaris. We assume in the remainder of this chapter that the run-time system uses user-level threads.

Each user-level thread is represented by a thread control block, which has room to save the status information (see Figure 9.4). Depending on the threads package, the control block may have additional fields. In the example, the Stack limits field indicates the beginning and end of the stack that is assigned to the thread. The Status field indicates
whether the thread is 'running' (it is the current thread), 'runnable' (it is not the current thread, but it can execute), or 'blocked' (it cannot execute because it is suspended).
Figure 9.4 Structure of a thread control block.
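As an illustration, a thread control block along the lines of Figure 9.4 might be declared in C as follows. This is a minimal sketch; the field names and types are chosen for this example only and will differ between thread packages.

    /* Minimal sketch of a thread control block for a user-level threads
       package; a real package stores more (and machine-specific) state,
       such as the full register set. */
    enum thread_status { RUNNING, RUNNABLE, BLOCKED };

    struct thread_control_block {
        void *program_counter;             /* saved program counter */
        void *stack_pointer;               /* saved stack pointer */
        void *stack_base, *stack_limit;    /* beginning and end of the thread's stack */
        enum thread_status status;         /* running, runnable, or blocked */
        struct thread_control_block *next; /* link field for scheduler and lock lists */
    };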
Sometimes a threads package contains a separate scheduler thread that determines which runnable thread should be allowed to execute next. The scheduler uses the status fields to determine which threads can execute. If multiple threads are runnable, two scheduling regimes are possible: either the current thread executes until it (voluntarily) blocks, or the scheduler preempts the running thread after a certain period of time and schedules another thread. The scheduler also takes care of doing the actual context switch to the new thread. A disadvantage of using a special scheduler thread is that the context switches to and from the scheduler thread add run-time overhead. Many thread packages therefore avoid using an explicit scheduler thread. Instead, a thread can call a library routine that selects the next thread to run and does a context switch directly to the selected thread.

Thread control blocks sometimes need to be manipulated as list items. A monitor, for example, can maintain a list of threads that want to enter the monitor. As another example, the scheduler thread may use a list of runnable threads, so it does not need to inspect the Status field of every thread when it wants to select another runnable thread. For this reason, control blocks usually contain one or more pointer fields that are used to link them into lists.

This concludes our introduction to the five parallel programming models and our discussion of processes and threads. In the following sections we turn to the implementation of the five models, where we will primarily look at the implementation of communication primitives.
On a shared-memory multiprocessor, a shared variable can be implemented simply by storing it in the shared memory. The compiler has to know which part of the memory address space is shared, so it can assign shared variables to locations in shared memory. On many multiprocessors, all of the memory is shared, so this task is trivial. On a distributed-memory multicomputer, shared variables are of course much more difficult to implement. Techniques have been developed to simulate a virtual shared memory on top of a distributed-memory system (Li and Hudak, 1989; Keleher et al., 1994). Just as virtual memory moves pages between main memory and the disk, virtual shared memory moves pages between the memories of different processors. This technique is outside the scope of this book, however.
The main difficulty with the shared-variable model is how to implement the synchronization primitives. Let us first look at lock variables. A lock variable has two operations, to acquire and release the lock, that look deceptively simple. The hard problem, however, is implementing these operations indivisibly. If several threads try to execute an operation on the same lock variable, only one of them should be allowed to execute at any one time. This problem has been studied extensively in the operating systems literature. One solution for making the primitives indivisible is to implement them in the OS, but this has a high overhead. Fortunately, most processors have hardware instructions that allow an efficient user-level implementation of locks. A frequently used instruction is test-and-set. This instruction checks if a given variable is equal to zero, and if so sets the variable to one; either way, the original value of the variable is returned. The hardware guarantees the instruction to be indivisible.

With such an instruction, it becomes possible to implement indivisible operations on a lock. A lock variable holds a zero if it is free and a one if it is taken. To acquire the lock, a test-and-set instruction on this variable is executed. If the result is zero, the lock was free, and the variable is now set to one. If multiple threads try to acquire the lock simultaneously, only one of the test-and-set instructions will return zero, since the instruction is indivisible; the others will see that the variable was already set to one. Thus, there can be no confusion about which thread obtained the lock.

If a thread fails to acquire a lock, it should block until the lock becomes free; this happens when it is set to zero again by some other thread. A simple way to implement this blocking is to repeatedly try acquiring the lock. This approach is called spinning (or busy waiting). It wastes CPU time, however, which in general should be avoided. A better solution is to maintain a list of blocked threads with each lock variable, as shown in Figure 9.5. With this approach, when a thread needs to block on a lock, its control block is added to this list, its status is set to 'not runnable', and another (runnable) thread is selected by the scheduler (see Figure 9.5(b)). Whenever a thread releases a lock, it can make the first thread on the list 'runnable' again, so the scheduler will eventually select it for execution again (see Figure 9.5(c)). Note that releasing a lock now becomes more complicated than just resetting the lock variable to zero; for further details see Exercise 9.4.
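A minimal C sketch of a lock built on a test-and-set instruction is shown below. The test_and_set() helper stands for the hardware instruction (here written with the GCC/Clang builtin __sync_lock_test_and_set()); block_on() and make_runnable() stand for the threads-package routines that manipulate the list of blocked threads. All of these names are placeholders for this illustration, not the API of any particular system.

    #include <stddef.h>

    struct thread_control_block;                    /* defined by the threads package */
    extern void block_on(struct thread_control_block **list);
    extern void make_runnable(struct thread_control_block **list);

    typedef struct lock {
        volatile int taken;                         /* 0 = free, 1 = taken */
        struct thread_control_block *blocked;      /* threads waiting for this lock */
    } lock_t;

    static int test_and_set(volatile int *p) {
        return __sync_lock_test_and_set(p, 1);      /* atomically set to 1, return old value */
    }

    void acquire(lock_t *l) {
        while (test_and_set(&l->taken) != 0) {
            /* Lock already taken: queue ourselves and let another thread run.
               A real implementation must close the race between the failed
               test-and-set and blocking; see Exercise 9.4. */
            block_on(&l->blocked);
        }
        /* test-and-set returned 0: the lock was free and is now ours */
    }

    void release(lock_t *l) {
        if (l->blocked != NULL)
            make_runnable(&l->blocked);             /* wake the first waiting thread */
        l->taken = 0;                               /* simplified; see Exercise 9.4 */
    }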
Let us now look at the second synchronization primitive described in Section 9.1.1: the monitor. A monitor is a higher-level mechanism than a lock. It is usually implemented with some support from the compiler. In particular, the compiler helps in making all monitor operations mutually exclusive.
Figure 9.5 A lock variable containing a list of blocked threads.
First of all, the compiler automatically generates a lock variable for each monitor, called the monitor lock. Also, the compiler generates code at the start of every monitor operation to acquire the monitor lock; likewise, it generates code at the end of the operation to release the lock. These statements assure that only one thread can be in the monitor at a time; if multiple threads try to execute an operation on the same monitor, all but one will be blocked on the monitor lock.

The compiler also generates code to implement the wait and signal operations. For this purpose, a separate list is used for each condition variable, containing the control blocks of threads that are blocked in a wait operation on that condition variable. The wait operation adds the control block of the current thread to this list, marks the current thread as 'not runnable', releases the monitor lock, and finally allows the scheduler to resume another thread. A signal operation makes one of the threads in the list runnable and removes its control block from this list. Depending on the precise semantics of the monitor, a signal operation may or may not do a context switch to the signaled thread. If no context switch is done, the signaled thread will eventually be selected by the scheduler.
When that moment arrives, the thread will first obtain the monitor lock again, and then continue executing its operation.
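To make the compiler's task concrete, the following C sketch shows what code a compiler might generate around a monitor operation and for the wait and signal operations. The run-time routines (acquire, release, schedule, and so on) and the data structures are placeholder names for the mechanisms described above, not the output of any particular compiler.

    #include <stddef.h>

    struct thread;                                   /* thread control block */
    struct lock;                                     /* as in the previous sketch */

    struct condition { struct thread *waiting; };    /* blocked-thread list */
    struct monitor   { struct lock *monitor_lock; };

    extern void acquire(struct lock *l);
    extern void release(struct lock *l);
    extern struct thread *current_thread(void);
    extern void add_to_list(struct thread **list, struct thread *t);
    extern struct thread *remove_first(struct thread **list);
    extern void mark_blocked(struct thread *t);
    extern void mark_runnable(struct thread *t);
    extern void schedule(void);                      /* pick and run another thread */

    /* generated at the start and end of every monitor operation: */
    void monitor_entry(struct monitor *m) { acquire(m->monitor_lock); }
    void monitor_exit (struct monitor *m) { release(m->monitor_lock); }

    /* generated for a 'wait' on a condition variable: */
    void monitor_wait(struct monitor *m, struct condition *c) {
        add_to_list(&c->waiting, current_thread());  /* queue ourselves */
        mark_blocked(current_thread());              /* not runnable any more */
        release(m->monitor_lock);                    /* let other threads enter */
        schedule();                                  /* resumes here after a signal */
        acquire(m->monitor_lock);                    /* re-enter the monitor */
    }

    /* generated for a 'signal': make one waiting thread runnable */
    void monitor_signal(struct condition *c) {
        struct thread *t = remove_first(&c->waiting);
        if (t != NULL) mark_runnable(t);             /* scheduler will run it later */
    }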
We now discuss how to implement message passing on a distributed-memory machine. The implementation depends on which of the communication protocols provided by the operating system is being used. Many languages are implemented on top of a reliable communication protocol, such as TCP/IP. For efficiency reasons, some language implementers prefer a lighter-weight, unreliable protocol, such as UDP, or even use their own communication protocols. An unreliable protocol usually is more efficient than a reliable one; this implies that it obtains a lower latency and higher throughput. The disadvantage is of course that the language run-time system then needs to implement its own protocol to make the language-level operations reliable. Since this protocol can be tailored to what the run-time system needs, the overall performance may be better than that of an implementation on top of a reliable OS protocol. In the extreme case, some languages are implemented by mapping the network device into user space and running the entire network protocol as part of the run-time system. This approach only uses the OS during program initialization (to map the network device), but avoids the OS for communication, so it has a low overhead (… et al., 199?). In general, the more functionality is implemented in user space, the more flexibility the run-time system has. This often results in performance gains, but at the cost of a more complicated run-time system. For our discussion, however, we assume that the run-time system is built on top of a reliable communication protocol provided by the OS.

Even with this simplification, there still are many issues the run-time system has to deal with. The issues we discuss are: locating the receiver, marshaling, type checking, and message selection. For the purpose of our discussion, assume we have two processes, Sender and Receiver, that are running on different processors and that want to communicate with each other as follows:

    Process Sender:
        A, B: array [1..100] of float;
        initialize A and B;
        send multiply(A, B) to Receiver;

    Process Receiver:
        A, B: array [1..100] of float;
        receive multiply(A, B);
        multiply A and B, print result;
Process Sender thus sends a message multiply(A, B) with two arrays of 100 floating point numbers to process Receiver; process Receiver waits for such a message to arrive and then services it.
The first question to answer is how the sender manages to get the message delivered at the right destination process. At the language level, identifiers such as Sender and Receiver are used to identify processes, but the OS communication protocol clearly does not know anything at all about such identifiers. Typically, OS protocols only know about processor names. Thus, the first task of a run-time system is to maintain an administration of where each language process runs. In languages that implement language-level processes as threads, an often-used approach is to have one communication daemon per processor. A message intended for process Receiver is sent to the daemon of the processor on which Receiver runs, using the OS communication protocol; the daemon then hands the message to the thread that implements the destination process (as described later).
The next issue is how to transfer the message over the network. In the example above, the message is a structured value containing two arrays of floating point numbers. The OS communication protocol, however, does not know about data types like arrays or floating point numbers. A typical protocol merely provides a primitive to send and receive flat buffers, which are untyped arrays of bytes. Therefore, procedures are needed to convert a language-level message to a flat buffer at the sending side and to convert such a buffer back at the receiving side. Such procedures are known as marshaling and unmarshaling routines.

An example of these conversions is shown in Figure 9.6. Here, a record has to be marshaled that contains the string "John" and the integer 26. The message buffer containing the marshaled data starts with a length field in a fixed format (4 bytes), denoting the number of bytes to follow (9). Next, the string is stored in the buffer, in null-terminated ASCII form. Finally the four bytes of the integer value are stored in the buffer, with the least significant byte at the rightmost (highest) address, which also happens to be the format used by the sending processor. At the receiving side, the string and integer value are extracted from the buffer. Note that, in the example, the receiving processor uses a different data representation for integers than is used for the buffer, and stores the least significant byte at the lowest address. An important advantage of marshaling is that such differences in machine representations can be masked. All that is needed is that the sender and receiver agree upon the format of the marshaled data in the buffer. As an optimization, it is also possible to always store the data in the format of the sender, and include a description of this format, for example specifying the byte order in the buffer. The receiver can then check whether the format in the buffer is different from its own format. This scheme avoids doing conversions when the sender and receiver have the same data format.

In a library-based programming system, the user often has to write the marshaling routines by hand. With parallel languages, the routines are usually generated automatically by the compiler from the type definitions of the values sent and received. The marshaling routine for a message containing simple data structures, for example arrays, is easy to generate. The routine simply allocates a buffer of the right size and copies the arrays into the buffer.
Figure 9.6 Marshaling.
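As an illustration of what a generated marshaling routine for the record of Figure 9.6 might look like, here is a minimal C sketch. The record layout, the 4-byte length field stored most-significant-byte first, and the helper names are assumptions made for this example, not a definitive wire format.

    #include <string.h>
    #include <stdint.h>

    struct person { char name[16]; int32_t age; };

    /* Marshal a person record into buf; returns the number of bytes written.
       Buffer layout: 4-byte length, null-terminated string, 4-byte integer
       (least significant byte at the highest address, as in Figure 9.6). */
    size_t marshal_person(const struct person *p, unsigned char *buf) {
        size_t slen = strlen(p->name) + 1;          /* string plus '\0' */
        size_t len  = slen + 4;                     /* payload length, e.g. 9 */
        buf[0] = (unsigned char)(len >> 24);        /* fixed-format length field */
        buf[1] = (unsigned char)(len >> 16);
        buf[2] = (unsigned char)(len >> 8);
        buf[3] = (unsigned char)(len);
        memcpy(buf + 4, p->name, slen);             /* null-terminated ASCII string */
        buf[4 + slen]     = (unsigned char)(p->age >> 24);
        buf[4 + slen + 1] = (unsigned char)(p->age >> 16);
        buf[4 + slen + 2] = (unsigned char)(p->age >> 8);
        buf[4 + slen + 3] = (unsigned char)(p->age);
        return 4 + len;
    }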
For complicated, pointer-based data structures (such as lists, trees, and graphs), however, automatic generation is not always possible, because the compiler cannot know what constitutes the entire data structure or how to deal with cycles in data structures. Many languages therefore restrict the data types that can be used in messages to scalars, records, and arrays, and disallow data types that contain pointers.

An interesting approach to marshaling is taken by Java RMI, the Java Remote Method Invocation. Any object can be marshaled automatically, provided its class implements the Serializable interface. The Java library contains generic routines that can marshal and unmarshal any serializable object. The routines are written in Java and check dynamically what the types of the object parameters are, using an object-oriented feature called reflection. This approach is highly flexible, but unfortunately the run-time inspection also has a tremendous overhead. It has been shown (Maassen et al., 1999) that specialized compiler-generated routines can be orders of magnitude faster.
Another important advantage of using a parallel language rather than a library is that the compiler can type-check messages, in the same way it type-checks routine calls. In the example above, the compiler can check that multiply messages always contain two arrays of 100 floating point numbers each. Without such a compiler-check, the sender could also send, say, one array of 400 integer numbers. The discrepancy would go unnoticed, but would clearly result in erroneous results, because the receiver will interpret the incoming stream of bytes as floating point numbers.
After marshaling the message, the next step is to actually transfer it over the network. As stated before, we assume that the operating system provides a reliable communication protocol to achieve this, so we do not discuss the details of such a protocol here. Once a message has been delivered at the destination processor, typically to a local communication daemon thread, the next step is to hand the message to the right receiving process. Depending on the language, this may be a complicated task, because programmers often are given control over the order in which incoming messages are serviced. In the simplest case (used in our example above), the receiver only indicates the type of message it wants to receive (such as a multiply). In this case, the run-time system can store incoming messages in separate lists, based on the message type. Each receive statement then blocks until the list for the specified type is non-empty. Some languages, however, give the user more flexibility about which messages to accept, and in which order. In SR, for example, the receiver can specify a predicate on the message:

    receive msg(A, N) suchthat N < 50;
In this case, the run-time system has to traverse the list of incoming messages of type msg and check if there is any message whose second field is smaller than 50. Whenever a new message arrives, it also has to be checked in this way. Features like these make the language more expressive, but they clearly also complicate the language implementation.

Our discussion so far has assumed that messages are accepted using an explicit receive statement. An alternative form is implicit message receipt. Conceptually, this approach creates a new thread of control within the receiving process that executes a message handler. To implement this, the run-time system may indeed create a new thread of control for each message. Since message handlers generally take little execution time, however, the overhead of creating and destroying these threads will be high. To reduce the overhead, several techniques can be used. One idea is to have the communication daemon thread execute the message handler for each incoming message, without creating a new thread. Although this will work in most cases, sometimes a message handler routine may need to block. For example, it may want to acquire a lock that is already taken. Clearly, the communication daemon thread should not block, since that would prevent other incoming messages from being serviced. If message handlers are allowed to block, it follows that they need to be executed by separate threads. A second and more general idea is to use a pool of message-handling threads. When a thread has finished executing a handler, it is returned to the pool instead of being destroyed. When another message comes in, it can be handled by one of the threads in this pool. If the pool is empty when a message arrives, a new thread is created. In this way, many expensive thread creation and destruction operations can be avoided.
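A minimal C sketch of the pool-based scheme might look as follows. The routines get_idle_thread(), create_handler_thread(), and hand_over() are placeholder names for threads-package and run-time routines; they are not taken from any specific system.

    #include <stddef.h>

    struct message;
    struct thread;

    extern struct thread *get_idle_thread(void);        /* NULL if the pool is empty */
    extern struct thread *create_handler_thread(void);
    extern void hand_over(struct thread *t, struct message *m);

    /* Called by the communication daemon for every incoming message. */
    void dispatch(struct message *m) {
        struct thread *t = get_idle_thread();
        if (t == NULL)
            t = create_handler_thread();    /* pool empty: create a fresh thread */
        hand_over(t, m);                    /* the thread runs the handler and then
                                               returns itself to the pool */
    }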
Parallel object-oriented languages mostly use similar implementation techniques as imperative languages. As we have seen, many object-oriented languages are based on ideas like threads and monitors, whose implementation we have already discussed. Below, we discuss three additional implementation techniques that have been developed more specifically for object-oriented languages: how to locate objects, how to migrate objects, and how to replicate objects.
The first issue we discuss concerns naming and locating of objects. A parallel or distributed program uses many objects, located on several processors, which can access each other. Typically, an object in such a program is identified by an object identifier (OID). An OID is similar to a pointer. Unlike normal pointers in C or Pascal, however, an OID is valid network-wide. An object on one processor may contain a reference to an object on another processor. There is no syntactic distinction between invoking an operation on a local object - an object on the same processor as the invoker - or on a remote object. In other words, concurrent object-oriented languages provide a global, program-wide, name space for all objects.

A key question is how to implement this global name space for objects on a distributed-memory machine. A traditional pointer, as used for example in C or Pascal, is represented as a memory address; such a pointer, however, only has meaning for a single address space. An OID, on the other hand, identifies an object on any processor, independent of its location. Typically, some bookkeeping is used on each processor to map an OID onto a physical processor and a memory address within that processor. For example, an OID can be represented as two words: one identifying the processor on which the object resides and one containing its address within that processor (assuming each processor runs a single process). As an example, suppose an object X contains a reference to another object Y. The reference of X to Y is then stored as a two-word OID, as shown in Figure 9.7. Suppose X invokes an operation on Y, as in:

    Y->Determine_value("Diamond");
The run-time system then first determines on which processor Y is stored, by looking at the first word in the OID. Next, it sends a message to this processor containing the operation, the address, and the parameters.

Unfortunately, treating every object in this way is very expensive. For remote objects, the overhead of determining the location of the object is small compared to the time needed to issue a remote operation. For local objects, however, the overhead can be substantial. One solution is to require the programmer to distinguish between local objects - objects that cannot be accessed from a remote processor - and shared or global objects. In Java, for example, remote objects have to implement the Remote interface. Another solution, chosen in the Emerald system, is to let the compiler figure out which objects are used only locally; the others are assumed to be shared.
Figure 9.7 Using object identifiers for remote references (an OID holds a processor number and an address within that processor).
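A two-word OID and the lookup step described above might be sketched in C as follows. The field names, my_processor_id, and the routines local_invoke() and send_invocation() are hypothetical, used only to illustrate the bookkeeping.

    /* A network-wide object identifier: processor number plus the
       object's address within that processor. */
    struct request;                              /* marshaled operation + parameters */

    struct oid {
        int   processor;                         /* processor holding the object */
        void *address;                           /* address within that processor */
    };

    extern int  my_processor_id;
    extern void local_invoke(void *obj, struct request *req);
    extern void send_invocation(int processor, void *obj, struct request *req);

    void invoke(struct oid o, struct request *req) {
        if (o.processor == my_processor_id)
            local_invoke(o.address, req);        /* local object: no communication */
        else
            send_invocation(o.processor, o.address, req);
    }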
Some languages allow objects to be moved dynamically from one processor to another. This technique, called object migration, has several advantages. For example, suppose an object X is going to invoke a sequence of operations on a remote object Y. Without object migration, each invocation would result in communication. An alternative scheme is to first migrate object Y to the processor on which X runs, then invoke all operations there, and finally, if desired, migrate object Y back to its original location. If Y does not contain much data, this scheme may result in less communication overhead. Object migration can also be used to improve the load balance of a program. The Emerald language provides several notations for expressing object migration and for requesting the system to keep certain groups of objects together on the same processor.

To implement object migration, several challenging problems have to be solved. The data of the object have to be transferred to its new location, using the marshaling and unmarshaling techniques described earlier. For objects containing one or more threads of control, these threads also have to be moved, which is complicated. Finally, if objects are allowed to migrate, it becomes far more complicated to locate an object, given its OID. Our scheme described above assumed that an OID contains the location of the object, encoded as a processor number and a local address. If objects can migrate, this scheme no longer works. One solution to this problem is to keep track of all references to each object. If an object X migrates from processor A to processor B, all references to X are then changed on all processors. Unfortunately, maintaining all this information in a coherent way is very hard. An alternative scheme that does not need this information works as follows (see Figure 9.8). Instead of updating all references, only the information on processor A is updated. The run-time system on processor A stores a forwarding address (processor B) for object X (see Figure 9.8(b)). Whenever another processor, say C, tries to invoke an operation on X,
it will send a message to the run-time system on processor A, because processor C is not yet aware of the new location of X. The run-time system of processor A will notice that X has moved and use the forwarding address for X to determine its current location (processor B); the message is then forwarded to this processor. The run-time system of processor B will execute the operation, return the result to processor C, and also inform the run-time system of processor C of the new location of X. All subsequent invocations from processor C on object X will go straight to processor B, as shown in Figure 9.8(c). In theory, an object can move multiple times, so many forwarding addresses may have to be followed to find a given object. For this reason, the technique is known as pointer chasing.
Figure 9.8 Pointer chasing.
Besides object migration, some languages also support object replication. With this technique, copies of an object are stored in the local memories of several processors. The advantage of object replication is that operations that read the state of the object but do not modify it can be executed using the local copy, without doing any communication. Thus, if an object is mainly read and seldom modified, object replication can substantially reduce the communication overhead (Bal et al., 1998). Some languages also use replication to make programs more fault-tolerant.

A key issue is what to do when an object is modified. Clearly, if the operation is applied only to the local copy of the object, all other copies in the system will become out-of-date. Two solutions to this problem are possible. One is to invalidate all other copies, by asking each processor to delete its copy. The problem with this is that a processor whose copy has been invalidated needs to get a new copy of the object when it again needs to invoke an operation on the object. If the object is large, this will have a high com-
munication overhead. The alternative is to update all copies, for example by sending the operation and its parameters and applying the operation to all copies of the object.

The most fundamental problem with object replication in general is how to guarantee consistency of all copies. We illustrate this using the update protocol. Assume that two processes M1 and M2, on two processors, simultaneously want to do an operation on an object X that contains an integer with initial value 0. For example, M1 wants to set the integer to 10 and M2 wants to set it to 20, both using an operation assign():

    M1: X->assign(10);
    M2: X->assign(20);
If M1 and M2 both send their operation to all processors, there is a risk that these messages arrive in a different order at different processors. Some processors may receive the message from M1 first and then the message from M2, so the final value of the integer will be 20. Other processors may receive the messages the other way round, resulting in a final value of 10. As a result, the copies become inconsistent. Several solutions have been proposed for this problem. The approach taken in the Orca system is to use a totally-ordered broadcast, which guarantees that all messages arrive at all processors in the same order (Bal et al., 1998). With this primitive, either all processors receive the message from M1 first, or all processors receive the message from M2 first. Such a communication primitive can be implemented either by the OS or by a user-space protocol.
The next programming model we discuss is the Linda Tuple Space. The Tuple Space provides a particularly interesting challenge for language implementers. Recall that the Tuple Space is a kind of associative shared memory. We will discuss how the Tuple Space can be implemented on a distributed-memory system. A trivial way of implementing Tuple Space would be to store the entire Tuple Space as a list on a single processor (the 'Tuple Space server'), and search this entire list whenever a tuple is read. Obviously, this approach will be highly inefficient. First, it requires a linear search of the entire Tuple Space. Second, every Tuple Space operation will communicate with the Tuple Space server, making this processor a communication bottleneck. Thus, an efficient Tuple Space implementation has to address two issues:
- how to avoid the overhead of associative addressing;
- how to implement the shared memory model on a system that does not provide shared memory.

Linda implementations use a combination of compile-time and run-time techniques. The Linda preprocessor translates Linda programs into the target language, optimizing the usage of Tuple Space operations, as discussed below. The run-time part is a kernel that is linked with the compiled target language code. The preprocessor and the kernel cooperate to address the two issues discussed above.

We first discuss how to avoid the overhead of associative addressing of the Tuple Space model. This problem has to be addressed for any implementation of Linda, whether it runs on a multiprocessor or a multicomputer. To simplify our discussion, we assume here that the Tuple Space is implemented using the shared memory of a multiprocessor; a distributed implementation is discussed in the next subsection. The Linda preprocessor analyzes all calls to Tuple Space primitives in the program text. It partitions all the calls into disjoint sets, called partitions. The partitioning is done in such a way that Tuple Space calls in different partitions cannot affect each other. For this purpose, the compiler looks at the number of parameters of the calls, the types of the parameters, and the values of the parameters if they are compile-time constants.
    int n, year;
    float f;
    ...
    /* code initializing n and f */
    (1) in("Olympic year", 1928);
    (2) in("Classic movie", "Way out West", 1937);
    (3) in("Classic movie", "Sons of the desert", 1933);
    (4) in("Classic movie", "Sons of the desert", ? &year);
    (5) in("Popular number", 65536);
    (6) in("Popular number", 3.14159);
    (7) in("Popular number", n);
    (8) in("Popular number", f);

Figure 9.9 A demo program in C/Linda.
For example, assume a C/Linda program with the declarations and statements shown in Figure 9.9. It is clear that the first two calls cannot affect each other, because they have a different number of parameters: the first call ins a tuple with two fields and the second call ins a tuple with three fields, so they access different tuples. The second and third calls both have three parameters, but the compiler can determine statically that the last two parameters are different, so these calls cannot affect each other either. Likewise, the fifth and sixth calls differ in their second parameter. The third and fourth calls clearly do affect each other, since the tuple

    ("Classic movie", "Sons of the desert", 1933)
can be inned by both calls. The last two calls are more interesting, because they have actual parameters whose values are not known at compile time. Note that the parameters are variables, but since they are not preceded by a ? they are actual parameters that are evaluated at run time. Still, the seventh and eighth calls cannot affect each other, because the types of their second
parameter are different: the seventh call features an integer parameter and the eighth call a floating point parameter. It is not clear at compile time whether the fifth and seventh calls will affect each other, since that depends on the run-time value of the variable n. Likewise, the sixth and eighth calls may or may not affect each other, depending on the value of f. In such cases, the compiler is conservative and puts both calls in the same partition. The compiler performs the analysis that we informally described above and uses it to generate the five partitions shown in Figure 9.10 for the demo C/Linda program of Figure 9.9. In this example we have used in calls everywhere, but the analysis is the same if out or read primitives (or some combination) are used.

    Partition 1: (1) in("Olympic year", 1928);
    Partition 2: (2) in("Classic movie", "Way out West", 1937);
    Partition 3: (3) in("Classic movie", "Sons of the desert", 1933);
                 (4) in("Classic movie", "Sons of the desert", ? &year);
    Partition 4: (5) in("Popular number", 65536);
                 (7) in("Popular number", n);
    Partition 5: (6) in("Popular number", 3.14159);
                 (8) in("Popular number", f);

Figure 9.10 The partitions for the demo C/Linda program in Figure 9.9.
An immediate advantage of this analysis is that the entire Tuple Space can now be split up into different sections, one for each partition. Since calls in two different partitions cannot affect each other, a call need only search the section of the Tuple Space that is reserved for its partition. So, instead of an exhaustive search of the entire Tuple Space, a call need only search a small part of the Tuple Space.

The result of the Tuple Space analysis can be used for more advanced optimizations as well. In particular, the Linda preprocessor analyzes all calls within a partition and then determines a storage representation that is optimized for that partition. We illustrate this idea with several examples. Assume a given partition contains the three calls in the following fragment:
    string title, SomeTitle;
    int year, SomeYear;
    out("Classic movie", "Sons of the desert", 1933);
    out("Classic movie", SomeTitle, SomeYear);
    in("Classic movie", ? &title, ? &year);
So, no other calls in the program produce or consume tuples of this form. The tuples of this form will be stored in a separate section of Tuple Space. Since all tuples in this section of Tuple Space will always have the constant "Classic movie" as their first field, there is no need to do run-time matching on this field. In other words, the compiler can determine statically that the first field of the out operations and the in operation will always match. Hence, this field need not be stored in the tuple at run time, which saves memory and the costs of a run-time string comparison. Also, the second and third parameters of the in call are formal parameters of the right type, so they will always match the actual parameters. In conclusion, there is no need to do any run-time matching at all for the in call. If tuples of this form exist, resulting from one or more out calls, the run-time system can just take such a tuple and return it. A simple way to implement this is to store the tuples of this partition in a queue. The out call adds a tuple to this queue and the in call waits until the queue is not empty and then returns the first element. So, the following data structure is used for this partition:

    Q: queue of record
           string title;
           int year;
       end;
A call like out("Classic movie", title, year) is translated into:

    enqueue(Q, [title, year]);
and a call like in("Classic movie", ? &title, ? &year) is translated into:

    [title, year] := dequeue(Q);
where dequeue blocks while the queue is empty. The result is that during run time the Tuple Space need not be searched at all: it is sufficient to do an operation on a queue data structure. The overhead of the associative memory model has thus been eliminated in this case.

As another example, assume the programmer changes the in statement into

    in("Classic movie", ? &title, 1940);
to search for classic movies produced in 1940. This small change has a major impact on the representation scheme. Clearly, this call still belongs to the partition described above, since it has the right number and types of elements. So, we now have a partition with these calls:

    out("Classic movie", "Sons of the desert", 1933);
    out("Classic movie", SomeTitle, SomeYear);
    in("Classic movie", ? &title, 1940);
The new in call cannot be translated into a simple dequeue statement, because it requires that the third field contains the value 1940. One solution would be to search the entire queue until a tuple is found whose last field contains 1940. Unfortunately, if the queue contains many elements, as for example in a complete list of all classic movies, this search may take a long time. The Linda preprocessor therefore switches to an alternative representation scheme for partitions like these, based on a hash table. Each tuple that is added by an out operation is inserted into a hash table, using the last field of the tuple as a hash key. The in operation likewise uses the last field as a hash value, so a linear search is avoided. If both in calls are used in the same program, things become more complicated again. Assume the program contains the following two statements:

    in("Classic movie", ? &title, ? &year);
    in("Classic movie", ? &title, 1940);
The problem is that the first call does not have an actual parameter (a value) that can be used for hashing. Instead, the first call must search the hash table. The hash table thus must have an interface that allows this operation. To summarize, the Linda preprocessor partitions all Tuple Space calls in a program into disjoint sets and then selects a representation scheme for each set that is tailored to that set. The result is that a Tuple Space call seldom needs to search the complete Tuple Space.
We now turn to the implementation of the Tuple Space on a distributed-memory system. On a shared-memory multiprocessor, the entire Tuple Space can simply be stored in this shared memory. On a distributed-memory machine, however, other representations have to be found. Several approaches exist. One approach is based on hashing. As described above, many Tuple Space calls have a field that has a hash value associated with it. We have described how this hash field can be used to avoid searching the Tuple Space. In a distributed implementation, the hash field can also be used to determine on which processor to store the tuple. To continue our earlier example, a statement like

    out("Classic movie", "The Music Box", 1932);

can use the last field (1932) as a hash value. Assume we have a hash function h() that returns a value between 1 and P, where P is the number of processors in the system. The above statement then sends a message to processor h(1932) and stores the tuple

    ("Classic movie", "The Music Box", 1932)

on that processor. (As described earlier, the first field can actually be optimized away, but we ignore this optimization for our current discussion.) In this way, all tuples in the partition get randomly distributed among all processors, thus preventing one processor from becoming a communication bottleneck. When a statement like

    in("Classic movie", ? &title, 1940);

is executed, a message is sent to processor h(1940) to ask it for a tuple that matches the in call. A problem occurs if the program contains in calls that do not have an actual parameter for the last field, as in

    in("Classic movie", ? &title, ? &year);
In this case, the Linda preprocessor can determine that the last field is not suitable as a hash value. The second field has the same problem. Instead, the preprocessor can use the first field as hash value. Unfortunately, this means that all tuples that have "Classic movie" as their first field will be stored on the same processor. If there are many such tuples, this processor may become a bottleneck.

The hash-based solution thus tries to distribute all tuples randomly among all processors. The advantage is that it is less likely that any one processor becomes a bottleneck. The disadvantage is that almost all Tuple Space calls result in communication. Also, if a processor P1 wants to send data to another processor P2 (using a tuple), chances are that the data go through another (random) processor P3, because the tuple containing the data is hashed onto processor P3. In this case, two messages (instead of one) are required to get the data from the source to the destination.

A completely different approach than hashing is to use a uniform distribution, which comes in several variants. One option is to broadcast all out calls and store the new tuple at all processors. The entire Tuple Space thus becomes replicated on all processors. The advantage is that read calls can be executed locally, using the local copy of Tuple Space, so they do not require any communication at all. In calls can first check the local Tuple Space to see if a matching tuple is available, but they also need communication to inform the other processors that they have to delete the tuple.

The second form of uniform distribution is the inverse of the above scheme: all out calls are executed without doing communication and the new tuple is stored in the local Tuple Space. The entire Tuple Space thus is partitioned among all processors. Whenever a tuple needs to be found (using read or in operations), the run-time system first checks whether a matching tuple is available in the local partition. If not, it broadcasts a request to all processors. Each processor then checks if a matching tuple is available in its partition. If multiple processors have a matching tuple, further communication is needed to select one of the tuples.

Yet another uniform distribution scheme organizes all processors in a two-dimensional grid. If a processor P executes an out call, the call is sent to all processors in the same row as P. A read or in call is sent to all processors in the same column as the sender. Since a row and a column always intersect at exactly one point, a read or in call will always find the tuple (at the processor at the intersection point) if it exists.

To summarize, many different representations of the Tuple Space on a distributed system are possible. The best scheme in general depends on the application. For example, if an application frequently reads a given tuple (for example the global bound in a branch-and-bound algorithm), the best performance is obtained if the tuple is replicated. Thus, the first uniform distribution scheme, which replicates the Tuple Space, is most efficient. If
tuples are read infrequently, however, this scheme is less efficient. The best scheme also depends on the underlying hardware. For example, if the hardware supports multicast (sending the same message to multiple processors), as an Ethernet network does, replicating the Tuple Space becomes more attractive.

A general problem with this variety of implementation techniques is that the performance model is not clear to the programmer. For example, on some implementations read operations may be cheap while in and out operations are expensive. On other implementations, out operations are cheap and read and in operations are expensive. In other words, the performance of a Linda program will be hard to predict without knowing more about the underlying hardware and Linda implementation. This makes it hard to write programs that are not only portable but also efficient on a wide range of systems.

In conclusion, Linda provides an abstract, high-level programming model. Implementing Linda efficiently is a challenging task, and many novel techniques have been developed to avoid the overhead of associative addressing and to implement the Tuple Space on a distributed-memory system. The high level of abstraction also makes it more difficult for the programmer to understand the performance behavior of the Linda primitives. Nonetheless, the Linda model has had quite some impact on parallel processing research since its introduction in 1985, and is still being studied and used. For example, JavaSpaces (Freeman et al., 1999) is based on a model somewhat similar to Tuple Space.
The techniques described so far in this chapter are intended for implementing languages with explicit constructs for expressing parallelism. An interesting question is whether we need such languages at all. Parallel programming is more difficult than sequential programming, so for the programmer it would be much easier if the language implementation would take care of the parallelization. The holy grail in parallel processing is to design a compiler that takes as input any sequential program written in a given language and produces as output an efficient parallel program. In this way, programmers could write their programs in sequential languages and would not be bothered with expressing parallelism, communication, or synchronization. In addition, existing sequential programs could be run on a parallel computer without any effort from the user.

The key issue is of course whether such an automatically parallelizing compiler can be implemented. No such compiler exists yet, and it is hard to tell if one ever will. For many applications, clever ideas are required to parallelize them efficiently. Many papers and Ph.D. theses have been devoted to describing such ideas for specific applications. It remains to be seen whether a compiler will ever be able to parallelize all these applications automatically. On the other hand, substantial progress has been made with automatic parallelization. Rather than trying to predict the future, we will describe the technical problems of automatic parallelization and illustrate some of the techniques that have been developed so far.

Most of the work on automatic parallelization is targeted at specific classes of parallel architectures. We initially assume that the target machine is a shared-memory multiprocessor. In Section 9.7.4 we shall look at distributed-memory multicomputers. Much research has been done on compilers for vector computers and Very Long Instruction
Word (VLIW) machines, but we do not discuss such work here. Also, we assume that the source program is written in a Pascal-like, imperative language. In the real world, most parallelizing compilers are designed for Fortran.
The first question to ask is what kind of parallelism a compiler should try to exploit. In theory, it could look all over the input program and try to find sections of code that can be executed concurrently. Unfortunately, this often results in very fine-grained parallelism that is not suitable for most parallel processors. For example, the two statements

    A := B * C;
    X := Y + 2;
can be executed in parallel, but it is unlikely that the overhead of sending the statements to different processors will be outweighed by the performance gain of using parallelism. Some machines (for example VLIWs) are designed to exploit such fine-grained parallelism efficiently, so for such architectures this approach makes sense. The gain to be expected is modest, however. To exploit larger-scale parallelism and to increase the grain size, a good idea is to look only at loops in the program, since this is where programs usually spend most of their time. This observation is especially true for numerical programs. Most parallelizing compilers therefore exploit loop parallelism. As an example, consider the following statements:

    print("start computing");
    for i := 1 to 1000 do
        A[i] := sqrt(i);
    od;
    print(A);
This code computes 1000 square roots. All iterations can be executed in parallel, since the calls to sqrt are independent of each other and the results are written to different elements of array A. Moreover, since the calls are expensive, it is worthwhile to execute them in parallel. Assuming the compiler knows about the sqrt function (for example because it is built into the language) and knows that it has no side effects, it can easily determine that the calls can and should be executed in parallel.

A compiler could convert the above code to a parallel program as follows. Initially, one processor (for example CPU 0) starts executing the program. After executing the first print statement, it arrives at the for-loop. It then splits up the 1000 iterations among all processors. If there are, say, 10 processors, CPU 0 executes iterations 1 to 100, CPU 1 executes iterations 101 to 200, and so on. Each processor writes part of the array A, which is stored in shared memory. After all processors are done, they synchronize and wait until the last one is finished. Finally, CPU 0 prints the value of A[]. This execution model is used by most parallelizing compilers. Except for the sequential code (which is executed by one CPU), all processors execute the same code, but on different parts of the data. The model therefore is known as 'Single Program Multiple Data' ('SPMD') parallelism (Karp, 1987).
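As an illustration of the SPMD model, the code a parallelizing compiler might generate for the loop above could look roughly like the following C sketch. The routines my_cpu(), num_cpus(), and barrier() are placeholder run-time routines, not part of any particular compiler.

    /* Every processor runs this same function on its own slice of the
       iteration space 1..1000. */
    #include <math.h>

    extern double A[1001];            /* shared array, indices 1..1000 */
    extern int  num_cpus(void);
    extern int  my_cpu(void);         /* 0 .. num_cpus()-1 */
    extern void barrier(void);        /* final synchronization */

    void parallel_loop(void) {
        int p     = num_cpus();       /* e.g. 10 */
        int me    = my_cpu();
        int chunk = 1000 / p;         /* iterations per processor */
        int low   = me * chunk + 1;   /* first iteration of my block */
        int high  = (me == p - 1) ? 1000 : low + chunk - 1;
        for (int i = low; i <= high; i++)
            A[i] = sqrt((double)i);
        barrier();                    /* wait until all processors are done */
    }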
We do not discuss the details of the work distribution and synchronization here, but assume the compiler generates code for it, using traditional communication mechanisms (for example shared variables or message passing) as needed. If different iterations do different amounts of computation, the work distribution can become quite complicated, and advanced scheduling techniques may be needed to give each processor an equal amount of work.

In general, programs may contain a large number of loops, so a relevant question is which loops to parallelize. If there are several consecutive loops (each with independent iterations) in the program, they can all be parallelized, one after each other. For example, the following code:

    for i := 1 to 1000 do
        A[i] := sqrt(i);
    od;
    print(A);

    for i := 1 to 5000 do
        B[i] := arctan(i);
    od;
    print(B);
just applies the technique described earlier to both loops in turn. If the loops are 'nested' inside each other, however, we have a different situation. For example, consider the following code, which adds the matrices A[] and B[]:

    for i := 1 to 500 do
        for j := 1 to 1000 do
            C[i,j] := A[i,j] + B[i,j];
        od;
    od;
There are (at least) two ways to parallelize this code. One way is to repeatedly distribute the iterations of the inner loop over all processors. With 10 processors, each processor will execute 100 additions and then synchronize; this process is repeated 500 times (for all values of i). An alternative approach is to distribute the outer loop. All 10 processors will then execute 50000 (= 50 x 1000) additions before they synchronize. Clearly, the latter approach has a lower overhead, because it requires fewer work-distribution and synchronization steps. Parallelizing compilers therefore try to parallelize 'outer loops'. Put in other words, they try to exploit large-grained parallelism whenever possible. An alternative to this compiler analysis is to allow the programmer to indicate which loops should be executed in parallel. This is what data-parallel languages do, using the parfor construct discussed in Section 9.1.5.
In the examples given so far, all iterations of the loops are independent of each other. Many loops in actual programs, however, do not have this property, and this is where the problems for parallelizing compilers begin. To illustrate, consider a slightly different version of our earlier example:
    for i := 2 to 1000 do
        A[i] := sqrt(A[i-1]);
    od;
The iterations of the loop now can no longer be executed in parallel. For example, during iteration 2 (i = 3), the value of A[2] is required. The value of A[2] must be computed before iteration 2 can start. Since the value is computed during iteration 1, iteration 1 must be executed before iteration 2. This is called a loop-carried dependence. The argument applies to all iterations, so the loop iterations must be executed in order. In other words, the loop is sequential and cannot be parallelized.

Parallelizing compilers are faced with two challenges. First, they should determine as precisely as possible whether a given loop may or may not be executed in parallel. When in doubt, they must be 'conservative' and execute the loop sequentially; otherwise the parallel program might produce different results than the original sequential program. Second, whenever a compiler discovers a loop that cannot be parallelized, it should try to transform it into a loop that can be parallelized. We discuss such transformations in the next subsection. Below, we look at the first issue.

In general, there can be several different reasons why a given loop cannot be parallelized. We illustrate the most important cases with the following code fragment:

    for i := 1 to 1000 do
        (1) A[i] := ...;
        (2) ... := A[i-1];
        (3) ... := B[i];
        (4) B[i-1] := ...;
        (5) C[i] := ...;
        (6) C[i-1] := ...;
    od;
Statement 1 assigns a variable (A[i]) which will be used in statement 2 during the next iteration of the loop; this is called a flow dependence. Statement 3 uses a variable whose value will be changed in statement 4 during the next iteration; this is called an anti-dependence. Statement 5 assigns a variable that will be reassigned in statement 6 during the next iteration; this is called an output dependence. In each case, the dependence prevents the loop from being parallelized.

The examples above used very simple subscript expressions. We used either i or i-1 as index, where i is the loop variable. In practice, programs use far more complicated subscript expressions, which makes the dependence analysis harder. Still, an accurate dependence analysis may make the difference between being able to parallelize a loop and having to run it sequentially. As an example, consider the following code fragment:

    for i := 1 to 250 do
        A[4*i] := ...;
        ... := A[4*i - 1];
    od;
At first sight, one might say that the code looks similar to statements 1 and 2 of our previous example, so there is a flow dependence. More careful inspection of the code reveals that this is not the case. The first statement will assign the array entries 4, 8, 12, and so on, while the second statement will read the entries 3, 7, 11, and so on. Thus, there is no dependence and the loop can be parallelized. On the other hand, the loop

    for i := 1 to 200 do
        A[5*i] := ...;
        ... := A[4*i - 1];
    od;
does have a dependence. For example, iteration 3 (i = 3) will assign to A[15] while iteration 4 will read this element. Several techniques, often based on algebraic manipulation, have been developed for testing whether a given loop contains any dependencies.

Several other problems exist that make loop analysis and parallelization even harder. If a loop contains a routine call, this routine will have to be analyzed to determine its effects on any variables that it can access. So, interprocedural analysis is required here. Another hard problem concerns pointer variables. The code below, for example, contains an indirect assignment through a pointer variable p:

    for i := 1 to 250 do
        *p := ...;
        ... := A[4*i];
    od;
If p points to an element of A [],this assignm~ntmay cause a loop-carried dependence. Unfo~unately,it is often difficult to determine during compile time whether p points into A E I ,let alone determining to which specific element p points. Often, this depends on run-time conditions, and cannot be determined during compile time. Still, several techniques have been developed thattry to estimate the possible targetsof pointer variables and usethisinformationtoparallelizeconservativelyloopscontainingpointer-dereferences ( H u ~ eetl al., 1994). An alternative to this compiler analysis is to let the programmer indicate which loops can safely be run in parallel. Data-parallel languages often use a construct like the parfor statement for this purpose. This construct thus may indicate that it is both desirable and legal to run a given loop in parallel. Of course, this also takes away an important advantage of automatic parallelization.
The for-loops written by the programmer often have a form that make compiler optimizationandparallelizationdifficult.Optimizingcompilers for parallelmachinestherefore exploittechniquestotransformloops into therforms,thatareeithermoreefficientor easiertoparallelize.Dozens of such loo r ~ s t r u ~ t u r itransfor~ations n~ havebeen designed and each compiler typically usesa bag of such 'tricks.' Some tr~sformationsare not related to but increase try to sequential performance, for example by improvingthe avior of theprogram;they strive toincreasethehitratio for the CPU cache. Rather than trying to give a complete overview of loop transfo~ations,we
will illustrate the general idea by describing two transformations that are designed specifically for parallel machines. We start witha simple example, shown below: f o r i := 1 t o 2 5 0 do A [ 4 * i ] := sqrt ( i ); T := A [ 4 * i
-
l ] ;
. * *
od;
This code is similar to one of our earlier examples, but an important difference is that it contains an assignment to a scalar variable T. The problem is that this assignment creates an output dependence, since every iteration of the loop will do an assignment to that same T. A compiler can easily transfom this loop to an equivalent loop that does not have this dependence. The idea is to let every iteration assign to a different (temporary) variable. After the loop has been executed, the value of the last (250th) iteration is assigned to T, We can implement this idea using an extra array, called Tmp 1. f o r i := 1 t o 2 5 0 do A [ 4 * i ] := s q r t ( i ) ; T m p [ i ] := A [ 4 * i - l ] ;
...
od; T := TmpC2501 ;
This transfo~ationis called scalar ex ansion. It eliminates the output dependence and thus allows the transformed loop to be paralleli~ed.On a dist~buted-memorymachine, a slightly different approach is to give every processor its own scalar variable, stored in its local memory. The Tmp [] array then essentially is distributed over the local memories. This transfomation is known Another important transfo interc~an~e, which works nested onloops, such as the loops shown below: f o r i := 2 t o 5 0 0 do f o r j := 1 t o 1 0 0 0 do A[i, j]
:= A [ i - l , j ]
-I-
B [ i ,j ] ;
od;
od;
This code is similar to the matrix additioncode shown earlier, but the left-hand side of the assignment statement now is an element of A [,1. This assignment statement causes a loop-carried dependence. For example, iteration 2 (i = 3) of the outer loop uses the elements A [2 , l] to A [2 , 1 0 0 0 ],which are computed during iteration 1 (i = 2). We couldparallelizetheinnerloop,but, as wehavediscussedbefore,thisgenerates finegrained parallelism, which is less efficient. We can solve this problem by t r a n s f o ~ n gthe code as follows:
e inner and outer loops have been exchanged. We can now parallelize the outer 00 tasks that each execute the inner loop se~uentially, leli~ingcompiler must of course make sure that the original and transformed equivalent. The compiler is only allowed to do transformations such as scalar d loop interchange that do not change the result of the programs. We do not alysis that is required for this.
~ultiprocessor, auto~atic p~alle~ization becomes even more difficult, because the com-
for i := l to 5 0 0 do for j := 1 to 1 0 0 0 do C [ i , jl := A [ i , jl + B [ i , j l ; 0d; od :
~ e ~ o machine, r y the three arrays can be stored in the s ~ a r e d m e ~ owhere ry, accessed by all processors. On a multicomputer, this no longer is the case. e ~ o 1 ~ t i owould n be to store the arrays in the memory of one processor, say CPU 0. All ors can then access these data by sending ~ e s s a g e sto CPU 0, asking it to certain elements of the arrays. CPU 0 would thus implement a (very slow) is approach would generate a huge amount of co~munication utweigh the gainsof parallelism and thus be unacceptable. cceptable solution, avoiding a large part o f the commu~icationoverhead, can be i , s t ~ i the ~ ~arrays t ~ ~over ~ the different processors, in such a way that each ains those elements that it accesses most frequently. In the above example, actually is quite simple. ~ssumingthat we parallelize the outer loop over accesses 50 rows of the three arrays. For example, CPU 0 accesses rows 51 to 100, and so on, as shown in Fi the mays row-wise we can reducethe c o ~ u n i c ~ t i o n . n~ortunately,thisexample is overlysimplistic.Ingeneral,pr o illustrate this,supposethematrixaddition code
l
CPU 0 50
51
CPU 1
100 I
l
451
I "
CPU 10
500
"
re 9.11 An array distributed over 10 CPUs.
A [ 1 l, ]
:= A [ 1 2 5 ,
l]
+
A[175,
11;
With the row-wise distribution described above, the two operands are stored on CPUs 2 and 3, respectively. If CPU 0, which contains A [1 11 ,executes the statement, it thus has to communicate with these two processors to get the current values of the two elements. A slightly more complicated exampleis the following code: I
f o r i := 2 t o 499 do f o r j := 1 t o 1 0 0 0 do A [ i , j ] := B [ i - l ,j ]
+
B[i+l,
jl;
od; od;
Most of the assign~entscan be executed without doingc o ~ u n i c a t i o nusing , the d i s ~ i b ~ " tiondescribedabove.Forexample,processor 0 canassign B E 1 11 -t- B 3 l 1 1 to A [2 , l],since it contains all these elements, For some assignments, however, data from a remoteprocessorareneeded.Forexample,theassignmentto A [5 0 l1 requires B [5 1 11 ,which is stored on CPU 1. In general, each processor requires the last row of its left neighbor and the first row of its right neighbor (except for the leftmost and rightmost processor). The task of thecompiler is togeneratethe c o ~ u n i c a t i o nstatementsneededto implement these data transfers. In simple cases like the one above, the compiler can compute precisely which elementseach processor needs, so it can insert send and receive statements in the code to transfer these elements. Figure 9.12 shows in a pseudo-notation what the code generated by the compiler might look like. Each processor is assigned a number of rows of the matrices, ranging from lower-bound to upper-bound, A processor allocates space for these parts of the array, and some extra space to contain two additio~al rows of B; the initia~izationof the arrays is not shown. At the start of the e~ecution,each processor checks to see if it should send its first and last rows to its neighbors; we use the I
I
I
notation E3 E i , *1 to denote a row of B. Next, each processor checks if it should receive rows from its neighbors. The send commands can be synchronous or asynchronous, the receive commandsaresynchronous.Finally, each processorperformsitsshare of the work.
INPUT : ID: number of this processor (ranging from o to P-I.) lower-bound: number of first row assigned to this processor upper-bound: number of last row assigned to this processor real Arlower-bound. .upper-bound, 10003, Bllower-bound-l .. upper-bound+l, 10001;
Figure 9.12 Generated (pseudo-)code.
In more complicated cases, the data to be transferred can only be computed time. As an example, for the statement
at run
All, l] := B[X, l] ;
it depends on the run-time value of X from which processor the element must be fetched. In this case, the compiler generates code to determine this processor (at run time) and to send a request message toit; the run-time system on this processor will then reply with the requested value. Note that this exchange needs two messages over the network, a request and a reply, so it is less efficient than theearlier example. Finally, there are many important opti~izationsa compiler can apply to reduce the c o ~ u n i c a t i o noverhead. For example, messa~ecom~inin e s s a ~ ea ~ ~ r ~ ~ a t i o n combines multiple small messages into a single large message, which is more efficient on mostsystems.Anotheroptimization is toexploitspecialhardwareprimitives,suchas broadcasting. We do not discuss such opti~izationshere. Suffice it to say that the task of ~eneratingefficientmessagepassing code makesthealreadyverychallengingtask of automatic ~arallelizationeven more difficult. The data distribution has a huge impact on the performance of the parallel program, since it determines to a large extent which data transfers will take place. The optimal distribution depends on how the arrays are used in the mosttime"consumin~parts of the program,Since it is verydifficultfor a compilertodeterminetheoptimaldistribution,
languages like High Performance Fortran allow thep r o g r ~ e to r help the compiler. HPF defines a collection of compiler directives that programmers can use to specify distributions of arrays. For example, the following HPFcode fragment !HPF$ PROCESSORS pr(4) real A(100, 1 0 0 ) !HPF$ DISTRIBUTE A(BLOCK, *) ONTO
pr
declares a two-dimensional arrayA that is distributed row-wise overfour processors. Many other types of distributions, for example cyclic or column-wise, can also be expressed in HPF. The compiler can use this i n f o ~ a t i o nas hints about how to dist~bute the data structures, The speed-ups achieved by supplying hints depend completely on the nature of the problem and the quality of the hints; they can easily make the difference between an infeasible computation anda readily doable one.
This concludes our discussion of Compilation and processing techniques for parallel and distributed programs. We have seen that such programs may express theparallel is^ explicitly or may rely on automatic p~allelizationby some language processor. Five models for explicit parallelismhavebeenintroducedandtheirimplementationsdiscussed;the Linda Tuple Space can profit more from compiler support than the other four. Automatic parallelization still is a challenging problem, especially for distributed-memory machines. Some languages therefore use a combination of compiler analysis and directives from the user.
-The goal of parallel systems is to solve a given problem as fast as possible, using multi-
ple processors. A distributed system contains multiple autonomous processors that are connected bya network and that work on many different tasks from many users. -A process is an abstraction of a physical processor. A process contains an address space that stores data. -Manymodelsandlanguages exist for programmingparallelanddistributedsystems. The programming models discussed in this chapter are shared variables, message passing, shared objects, TupleSpace, and data-parallelism. -Shared variables are variables that can be accessedby multiple processes. Mutual exclusionsynchronization ~echanisms(forexamplelocks)andconditionsynchronization mechanisms (for example monitors) are used to synchronize access to the shared variabies. -Message passing uses two primitives, send and receive, to transfer data from one process to another process. Languages differ in the way processes are addressed, and the way messages are sent (synchronous or asynchronous) and received (explicitly or implicitly). -Parallel object-oriented languages encapsulate shared data in objects.
s a globally shared data structure called the Tuple
ams, all processors execute the same algorithm, but operate on ent parts of the data set. Automatically parallelizing compilers usually try to generate data-parallel programs. parallel and dist~butedlanguages are implemented mainly using a run-time systher than extensive compiler support. ting system processes, but a ~ e d - ~ ~ultiprocessor e ~ o ~ ly, and without using busy on top of an indivisible ruction, such as test-and-set, Busy waiting is avoided by ~aintaininga list lementation issues. Many different communiTCPBP, and custom user-level protocols). ng an administration of where each lan are used to convert between data structures and a linear can allel language over a library is that i t s i~plementa~ion er control over the order in which ~ e s s a g e sare serd a more complicated run-time system, which maintains a list of incoming that i s inspected whenever the application wants to receive a message. el object-oriented languages, one issue is how to locate objects; if objects can inter chasing techniques can be used. ct replication must use a protocol for keeping the replicas of en an object is modified, such a protocol can either invalidate of the object or it can update all copies, for example using total~y-ordered Linda compilers reduce the overhead by selecting the right storage representation for each tuple. buted-~emorymulticomputer, Tuple Space can use a hash-based partition in^ cheme (replicating or p ~ i t i o n i n gthe Tuple Space). The most scheme depends on the app~icationand the underlying hardware. a challengin~problem. Paralle~izingcompilers try to find s that are in~ependentfrom each other. To e li~ in a teloop encies, they can apply loop rest~cturingtransfo~ations. For machines without ~ ~ mtheocompiler ~ , also has to dis~ibutethe data structures over the different rocessors need remote data. c o ~ ~ i n i nisgan important optimization to reduce the c o ~ ~ u n i c a t i o formance Fortran (HPF) also uses i n f o ~ a t i o n(hints) from the pro
introductions to parallel processing ( A l ~ a s and i Gottlieb, 1994; al., 1994; Wilson,1995).Goodbooksondistributedsystems include Tanenbaum (1995) and Coulouris, Rollimore and Kindberg (19 papersaboutparallelanddistributedprogramminglanguagesareby T a ~ e ~ b (la 989) u ~ and Skillicorn andTalia (1998). A survey of dist~butedp r o g r a ~ i n gmodels is given by Andrews (1991). The book by Wilson and Lu (1996) focuses on object-o~entedparallel languages based on C++. An extensiveoverview of distributedsharedmemorysystems is includedinTanenbaum’s ~ Zhou (l 990) and Nitzb book (1995). More concise survey papers are by S t u and and Lo (1991). 0 ct migration and replication are discussed extensively in papers by et al. (1988)and Z et al. (1998)respectively.Linda is describedinseveralresearch papers (Ahuja et al., 1986; and Celernter, 1989). A technical report describes imp~ementationissues of Linrnson et al., 1988). The literature contains overviews of compiler an sf or mat ions for automatic parallelization (Wolfe, 3 99 et al. 1994). Exampl ( al., 1996) and SUIF compilers include Polaris mance Fortran i s described by Loveman (1993).
essage passingcanbeimplementedonboth shared- memo^ anddistributed~ e m o r ymachines, whereas shared variables are difficult to implement on machines withoutshared m e ~ o r y .Roes thisimplythatsharedvariablesandmonitorsare super~uous? The Linda Tuple pace can be used to simulate both message passing and shared variables.Describehowboth se~ding/receiving messagesand readin~/writin~ shared variables can be simulated using Linda. Explain why the costsof a thread switch depend very much on the CPU architecture. As discussed in Section 9.3, busy waiting can be avoidedby using an ad mini st ratio^ (a list of control blocks) for blocked threads. This ad mini st ratio^, however, also has
to be protected by locks, to make the operations on it indivisible. So, to implement a lock ~ i t h o u tbusywaiting, weneedanotherlock.Willthisresultinanendless implementation loop?
Somerun-timesystemsacquirelocksusing a combination of busywaitingand thread switching, to avoid wasting CPU cycles while still reducing the overhead of thread switches. Can you thinkof a useful combination? plain what happens if two processes simultaneousl~try to execute a monitor st reads and returnsdata protected by the monitor (but without modi-
(b) Design an -optimized -implementation of monitors that will allow multiple readers, while preserving the original monitor semantics as perceived by the programmer,
. ' 7
Manymessagepassinglanguagesthatuseportsas expl~nedin Section 9.1 allow only a single process at a time to receive from any given port. Why would this restriction be imposed? Write marshaling and unmarshaling routinesfor the following procedure: procedure f(x: float; y " : integer; z: array[1..1001 of integer);
9.9
Somemessagepassinglanguages(forexampleAda, SR) support a 'select statement', with which a receiver can wait for different types of messages from different senders,dependingonwhichonearrives first. Veryfewlanguagessupportthe opposite: a statement with which a sender can wait until one of the receivers is willing to accept its message (is waiting in a receive state~ent).Can you think of a reason for this?
9.10 Section 9.4'4 explained that message handlers that may block should be executed by a newly created thread rather than by a daemon thread. Someone suggests the following idea: let the daemon thread simply start executing the message handler. Only if the handler actually needs to block, create a new thread dynamically, copy the current stack of the daemon to the newly created thread, andlet the new thread continue the executionof the handler. Why is this hard to implement? 9.11 In Section 9.5.2 we discussed the possibility of gratin^ an object from one processor to another processor. Suppose an object X invokes an operation on an object Y that is located on a different processor. Further suppose X contains a thread of control, but y does not, Two different options for object migration now exist: X could of X. What are be moved to the processor ofY, or Y could be moved to the processor the advantages and disadvantagesof both options? 9.12 Someone suggests a newupdateprotocol for implementing replicated objects that does not need totally-ordered group communication. The scheme works as follows. For each replicated object, one processor (containing a replica) is designated as the primary copy. If a processor wants to do a write operation on a replicated object, it sends the operation to the primary copy of the object. The primary copy broadcasts theoperation to all other processorsthatcontain a copy. Why is thisscheme incorrect? 9.13 The operation Maximum is defined as follows. Each processor performs the following procedure call: result = Maximum
(value) ;
where value may be different for each processor, Each processor waits (blocks) until every processor has started this call. Next, every processor gets as result the
€xercises
maximum of all the value parameters. Figure 9.13 shows how the operation can number of processors P:
697
be implemented in C/Linda, for a fixed
int Maximum (int value){ if (mycpu == 1) { /* the first CPU creates the tuple*/ out ( "max1I , value, 1 ) ; } else { /* other CPUs wait their turn*/ in(l'maxll, ? &max, mycpu); if (value > max) max = value; out ( llmax" , max, mycpu-t 1) ;
1
/* wait until allCPUs are done */
1
read (flmaxl', ? &max, P) ; return max;
Figure 9.13 CLinda code for the operationMaximum.
(a)Analyzetheperformance of theoperation for thethreedifferentdistributed implementations of Tuple Spacedescribed in Section 9.6.2. (b)Designanoptimizedoperation,usinghandwrittenmessagepassing.Compare the optimized implementation with the Linda version.
9.14 Some imple~entationsof Tuple Spacesupport an extra primitive f e t ch-a l1 (t ) withwhichalltuplesthatmatch t canbeobtainedinoneindivisibleoperation. Explain why this primitiveis difficult to implement.
p
9.15 Does the following code contain a loop dependency? If so, what kind of dependency is it? for i := 2 to 9 do A[7*i] := i*i; A[9*i - 21 := i*j*i. A[11* (i-1)+7] := i*i*i*i* od;
9.16 Loop interchanges as described in Section 9.7.3 may also have Performance disadvantages, Consider the following example:
for i := 1 to 1 0 0 0 do for j := 1 t o 1 0 0 0 do A[i,
j] := A [ i , jl
-I-
1;
ad; 0d;
The sequential perfomance of this code may decrease su~stantiallyif the two loops are interchan~ed.Explain,
is to^ of parallel and d i s t ~ i ~ u~t ae n~ g u ~ gi e~ p l e ~ e n t ~Study t i o ~K~q ’ s paper (1987), which points out the difficu~tiesof parallel p r o ~ r a ~ and i n introduces ~ the concept of SPMD parallelism, and s u m ~ ~its z main e points.
A compiler can be viewed as a set of algorithms for tasks that apply actions to data that representsyntacticconstructs.Examplesareparsing(thetext of) a function call, (the AST of) an expression, ge~eratingcode for (the AST of) a for-loop, etc. shows someof these tasks ina table. In Section 1.2 we have shown how this table can be sliced up into rows, each row corresponding to a module or pass. Figures 1.12 through 1.19 showed four modules: lexical analysis, parsing, code generation, and inte~retatio~. This ment emphasizes theactions p e ~ o ~ on e the d data, and is representative of the ive p r o g r ~ i n g m (but see Section 1.41). It is also possible to slice this table along the columns, This results in modules for syntactic data objects, with the actions attached to them, and leads to an object-oriented approach to compiler construction.
declarations,expressions,function calls, statements, for-loops, etc., each with eth hods ‘Parse’, ‘Handle context” ‘Interpret’,‘Generatecode’,and a fewothers.Thishasthe advantage that all information about a syntactic construct is collected in one place in the compiler, in the code for that syntactic construct. For example, each instance of the class F ~ n ~ ~ i o n - Cknows ~ l lhow to create itself by parsing, how to handle its contest, how optimize itself, how to generate code for itself, etc., in good object-oriented fashion. ore in p ~ t i c ~ l acalling r, the method Generate-Code ( ) of the top object of the AST will set the machine^ in motion and generate code for the entire tree. A disadvantage is of
1
ass~gnments declarations for-loops simple expressions
l
lexical analysis
lexical analysis of declarations
parsing
parsing of simple expressions
context handling context handling of declarations optimization
optimization of for-loops code generation for assignments
code generation
Figure A.1 Table of tasks in a compiler with some tasks filled in.
course, as with all strong structuring, that information needed for sophisticated decisions may not be easily available, because it happens to reside in distant and poorly accessible places. But then again, imperative compiler constructionis not easy either. For the best match between syntax features and object-oriented features, we have to streamline our grammar of Figure 1.9, as follows. The object-oriented paradigm has two fun~amentalways to create new classes:
-class e~tension,in which
a new class, the subclass, is formed by extending an existing class, the superclass; the new class can be seen as a specialization of its superclass. -c o ~ ~ o s i t i oinn ,which a new class is formed by combining one or more existingclasses; an instance of the new class then contains instances of the compos~ngclasses, or references to them.
Class extension corresponds toa grammar rule of the form A "+ AI
I
A2
I ... I
An
with the restriction that Al through A, are either all non-terminals or all t e ~ n a l s .From an object-oriented point of view, this rule expresses the idea that objects of classes Al through A, are special kindsof As. Composition correspondsto a g r rule of~the form ~ A
+ A1AnA2
with the restriction that there are no other rules for A. From an object-oriented point of view, this rule expresses the idea that objects of class A are composed of objects of classes Al through A,. The object-oriented view does not impose the proper order Al ... A, of the components, however; this order must be insured separately by thecode of the parsing routines. To find a natural assignmentof the non-terminals in a grammar to extended and composed classes, we have to restrict our grammar to rulesof the above forms; such a grammar form. It is easy to bring a grarnmar into AND/OR form by introducing some new non-terminals. A possible AND/OR form of our grammar for fully ~arenthesi~ed expressions is shown in Figure A.2. The non-terminals expression and operator areORnon-terminals, digit and parenthesized-expression are AND non-terminals. The rule for expression says that digit and parenthesized-expression are both special kinds of expressions,which fits our intuition. The classes Digit and Parenthesized-Expression will therefore be extensions of theclass Expression. Since each expression is eithera digit or a parenthesized expression, there will be no objects with dynamic classExpression,only Digit and Parenthesized-Expression. Thesyntaxtreeitselfwillconsist of nodes of static types Expression and Operator only. expression -+ digit I parenthesized-expression digit -+ DIGIT parenthesized-expression -+ '(/ expression operator expression')' operator "+ '-I I" Figure A.2 The grammarfor simple fully parenthesized expressions in AND/OR form.
Figures A.3 through A.8 show the demo compiler in Java. The class Expression in features theabstractmethods Print ( ) , Figure A S hasbeendeclaredabstract,and Interpret () ,and Generate-Code ( ) ;this makes these methods obligatory for the classes Digit (Figure A.6) and Parenthesized-Expression (Figure A.7). Indeed, both Digit and Parenthesized-Expression contain non-abstract imple~ e n t a ~ o of n sthe methods. The same applies to the abstract class Operator. It is now easy to verify that if the top node is the object Program of class Expression,the method call Program. Print () will print the expression, Program. Interpret () will evaluate the expression, and Program. Generate-Code () will generate code for the expression. Making the syntax tree create itself by parsing turns out to be more difficult, however,
class Demo-Compiler { public static void main(String[ I args) { // Get the intermediate code Lex.Get-Token ( ) ; // startthelexicalanalyser Expression icode = Expression.Parse-Or(); if (Lex.Token-Class != Lex. EOF){ System.err.println(liGarbage at end of program ignored");
1
if (icode == null) { System.err.println("No intermediate code producedii); return;
1
// Print it 'I) ; System.out,print (I1Expression: icode, Print (1 ; System.out.println0; // Interpret it System.out .print (IiInterpreted:; System.out.println(icode.Interpret0);
1
// Generate code for it System.out.println('lCode:'l); icode .Generate-Code (1 ; System.out.println('lP~INT'l) ;
Figure A.3 Class Demo-Compiler, driver for the object-oriented demo compiler.
It would be nice to rig the constructors that are called when an object is created so they would do theparsing;thenthesimpledeclaration Expression Program = new ression ( ) ;would create a new object Program of class Expression,initialized with the parse tree. ~ n f o ~ u n a t e l this y , seems impossible for technical reasons (see Exercise A.3 ). The next best thing is to have self-parsing methods for all nodes; this is the approach shown in Figures A S through A.8. The parsing templates for AND and OR non-te~inals A.9, is similar to that differ. The template for anANDnon-terminal,showninFigure shown for the recursive descent parser in Section 1.2.5, except that there is only one alternative to parse, The template for an OR non-te~inal,shown in Figure A.10, is less obvious. OR templates are considerable. The The differences between the AND and Parse-And () method is applied to an already allocated object of an extended class co~espondingto an AND non-terminal. Its task is to see if the required components can
import java.io.IOException; class Lex { public static public static public static public static
int Token-Class; final int EOF = 256; final int DIGIT = 257; char Token-Char-Value;
public static voidGet-Token0 char ch;
{
// Get non-whitespace character do {int ch-or-EOF; try {ch-or-EOF = System.in.read();} catch (IOException ex){ch-or-EOF = -l;} if (ch-or-EOF -c 0) {Token-Class = EOF; return;} ch = (char)eh-or-EOF; // cast safe is } while (Character.is~hitespace(ch)); // Classify it if (Character. isDigit ) (ch) { Token-Class = DIGIT; Token-Char-Value = ch;
1
else { Token-Class = ch;
1
1
1
~ i ~ u A.4 r e Class Lax,lexical analyzerfor the object"o~enteddemo compiler.
be found in the input and, if so, to initialize the object with them, To do so, it makes the same assumption as the recursive descent parser in Section 1.2.5, which is that if the first member of the alternative is found, the others haveto be present too. The Parse-Or method is astaticmethod; it is appliedtoanabstractclass co~espondingto an OR non-te~inal,before any object has been allocated. Its task is to l be found inthe input and,if so, see if any of the specializations of the OR n o n - t e ~ n a can To do so, it tries the possible subclasses in textual toreturn the c o ~ e s p o n d i nobject. ~ order, as follows. To try a subclass Ai,it tentatively declares an object ai of that subclass and calls its method ai .Parse-And ( ) ,to see if a syntactic construct c o ~ e s p o n d i nto~ Ai is present in the input. If it is, the call ai .Parse-And ( ) initializes the fields in the subclass object and succeeds. The object ai is then returned by the method Parse-Or ( ) ;this action changes the dynamic class of the object to Ai,but its static class remains A. If Parse-And ( ) cannot find a syntactic construct Ai in the input, the object ai is abandoned to thegarbagecollector,andthenextsubclass is tried. If all attemptsto find apropersubclass fail, there is nothinginthe input atthispointthat
abstract class Expression{ public static ExpressionParse-Or0 {
// Try to parse a parenthesized-expression Parenthesized-Expression pe = new if (pe.Parse-And()) return pe;
Parenthesized-Expression();
// Try to parse a digit Digit d= new Digit () ; if (d.ParseAnd0 ) return d;
1 1
return null;
abstract public void Print(); abstract public int Interpret(); abstract public void Generate-CodeO;
Figure A.5 Class Expression for the object-oriented demo compiler.
class Digit extends Expression { private char digit= '\Or; public boolean Parse-And() { if (Lex.Token-Class == Lex.DIGIT) { digit = Lex.Token-Char-Value; Lex. Get-Token () ; return true;
1
1
return false;
public void Print() {System.out.print(digit);} public
int
Interpret ( ) {return
digit- ' 0' ;}
public void Generate-Code0 { System,out.print('!PUSH + digit
1
1
+ ft\n'!) ;
Figure A.6 Class Digit for the object-oriented demo compiler.
corresponds to anA,and the routine returns the null pointer. boolean and Anotherdifference is that Parse-And () methodsreturna Parse-Or ( ) methods return a value that statically belongs to an OR class but dynami-
class Parenthesized-Expression extends Expression { private Expression left; private Operator oper; private Expression right; public boolean Parse-AndO { if (Lex,Token-Class == ' ( ' 1 { Lex.Get-Token ( 1 ; if ((left = Expression,Parse-Or()) == null) { S y s t e m . e r r . p r i n t l n ( " M i s s i n g left expression"); return false;
1
if
(
(oper = 0perator.Parse-Or ( ) ) == null) { Syst~m.err.println(ltMissing operator"); return false;
1
if ((right = Expression,Parse-Or()) == null) { System.err .println( lfMissingright expression"); return false;
1
if (Lex,Token-Class ! = ' 1 ' ) { System,err.println("Missing ) added");
1
else { Lex.Get-Token ( 1 ;} return true;
1
return false;
} public void Print 0 { System.out.print(' ( ' ) ; left .Print ( ) ; oper.Print 0 ; right.Print ( 1 ; System.out.print(')');
1 public int Interpret ( 1 { return oper.Interpret(left.Interpret0, right,Interpret());
1 public void Generate-Code ( ) { left .Generate-Code 0 ; right.Generate-Code ( 1 ; oper .Generate-Code ( ) ;
1
1 Figure A.7 Class Parenthesized-Expres sion for the object-oriented demo compiler.
abstract class Operator{ public static OperatorParse-Or0 { // Try to parse a‘ S ‘ Actual-Operator co-plus = new Actual-Operator(’+’); if (co-plus.Parse-And () 1 return co-plus;
// Try to parse a’*‘ Actual-Operator eo-times = new Actual-Operator(’*‘); if (Go-tirnes.Parse-And()) return co-times;
1
1
return null;
abstract public void Print(); abstract public int Interpret(int e-left’ int e-right); abstract public void Generatemcode();
class Actual-Operator extends Operator{ private char oper; public Actual-Operator(char op) {oper = op;} public boolean Parse-And() { if (Lex.Token-Class == oper) {Lex.Get-Token(); return true;} return false;
1
public
void
Print 0 {System.out
.print
(oper) ; }
public int Interpret (int e-left int e-right) { switch (oper) { case +‘, return e-left + e-right; case ’*’: return e-left * e-right; I
1
1
return 0;
public void Generate-Code0 { switch (oper) { case ’+ ’: System.out.print ( llADD\nll ) ;break; case ’ : System. out .print ( llMULT\nll ) ;break;
1
* l
Figure A.8 Class Operator for the object-oriented demo compiler.
class A extends... { private A, al; // fields for the components private A2a2; f
.
.
private A, a,; boolean Parse-And (1 { // Try to parse the alternative A1 A2 ... An: if ((a, = ~,.~arse-oro)!= null) {
public
if ( (a2 = A2.Parse-Or 0 1 == null) error ("Missing A2II ) ;
(I..
1
1
1
if ( (a, = An. Parse-Or 0 == null) error ( "Missing An") ; return true;
return
false;
Figure A.9 The templatefor an AND non-terminal.
abstract class A{ public staticA Parse-Or 0 {
// Try to parse an Al: A, al = new A1 (1 ; if (al.Parse-And (11 return al; ... // Try to parse an A,:
An , a new An() ; if (a,. Parse-And (11 return an;
1
1
return null;
Figure A.10 The templatefor an OR non-terminal.
cally to an AND class whichis an extension of that OR class. This is reasonable, since the Parse-And () method only has to determine if a syntax structure that is known in detail is present in the input, whereas the Parse-Or ( ) method has to determine specialization applies. One result of the different templates shown above is that the nodes in the AS?: are always instancesof OR classes statically but of AND classes ~ynamically. ~~~~~
The class Demo-Compiler (Figure A.3), which contains the method main () ,and the static class Lex: (Figure AA), which provides the stream of tokens, require no further explanation,
The technique shown here is only one way to do object-oriented compiler construction. It has been chosen because it employs the characteristic features of object-oriented programming, class extension and method overriding with dynamic binding, to improve compiler structure, But it also leaves some problems unsolved. For one thing, it is not at all obvious how a more advanced parsing technique, which requires a global view of the grammar, could be incorporated within the above framework, in whichall knowledge about a syntactic construct S is localized explicitly in the class definition for S. For another, it is not (yet) clear how class extension and method overriding can help in machine code generation. It is difficult to assess the weightof these problems, since almost all our experience in compilerconstruction is intheimperativeparadigm,and all toolsandtechniquesare geared to it. But the approach demonstrated inthis section shows that at least occasionally an improvement in structure can be obtained. Holmes (1995) reports on the construction of a Pascal compiler using object-oriented techni~uessystematically wherever possible. The AND/OR form of grammars for objectoriented parsing was suggestedby Koskjmies (1991).
A.1 In Section A.3 we suggest that the natural way to do object-oriented parsing would be to let the object constructorsdo the parsing. (a) Find out why this cannot be done in Java, C++, and your favorite object-oriented language, (If it carz be done in any object-oriented language, we would like to hear about it!) (b)What feature(s) shouldanobject-orientedlanguagehavetoallowparsing by object constructors?
Additional answers to exercises are available through JohnWiley’s Web page.
1.1
Advuntuges: Assuming the language is still being designed, writing a major piece of software in it is an excellent shake-down for the language design. Compiling the compiler may be a good way to debug the compiler (but there is a problem here: how defensible is debugging by using not well debugged tools?) Any improvement to the compiler benefits the compiler writers themselves, which gives them an incentive to improve the compiler more. ~ ~ ~ ~ d v u n t u~goeost s; ~ a p p i nproblems: g there is no compilertocompilethefirstversionwith. Any changes to the language may necessitate many modifications to the compiler, as both the implementation and the source language change. The compiler may inadvertently be tuned to cons~ctionsused specifically in the compiler.
1.9
In data structures outside the while statement, as withany while statement.
1.11 See Figure Answers.l. 1.13 Firstasubset
is created by takingawaysomefeatures;thelanguage is thenextended by adding new features. An example would be a C compiler which does not implement floating point numbers but does havebuilt-in infinite-le~gthintegerarithmetic.Thesarcasmcomesfromthefactthateverything is an extended subsetof everything else, which makes the term meaningless.
1.19 The g r ~ m a t i c a production l process stops when the sentential form consists of terminals only; to test this situation, we have to be able to tell terminals and non-terminals apart. Actually, this is not entirely true:we can scan the ~ ~ ~declare a all r symbols , in left-hand sides asn o n - t e ~ n a l sand all other symbols as terminals. So context condition(l)actually provides redundancy that can be checked.
1.20 Suppose there were two different smallest setsof information items, S , and St. Then Siand S2 must have the same size (or one would not be the smallest) and each must contain at least one item the other does not contain (or they would not be different). Call one such differing item in SIX. Since both sets started with
nswers to exer~ises
SET
the
flag
There aischaracter -a- buffered TO False;
PROCEDURE Accept filtered character Ch from previous module: IF There is a character -a- buffered = True: // See if this is a second 'a': IF Ch = 'ar: // We have 'aa': SET Thereis a character -a- buffered TO False; Output character 'b' to next module; ELSE Ch /= ' a r : SET Thereis a character -a- buffered TO False; Output character * a r to next module; Output character Ch to next module; ELSE IF Ch= 'a': SET Thereis a character -a- buffered TO True; ELSE There is no character 'a' buffered AND /=Ch'a': Output character Ch to next module; PROCEDURE Flush: IF There isa character -a- buffered: SET Thereis a character -a- buffered Output character 'a* to next module; Flush nextmodule;
TO
False;
Elgure Answers.1 The filter aa "+b as a post-main module,
the same initial items,X cannot be an initial itembut must have been added by some application of an inference rule. This rule clearly did not applyin S,, so there must be at least one other itemY that is present in Siand absent from S,. By induction, all items inS, must differ from all iterns inS,, but this is impossible since both started with the same initial items.
2.7
They both mean the same as a*. They are not fund~entallyerroneous butmaydraw a warning from a processor, since they are probably not what the programmer intended. Ambiguity is not a concept in lexical analysis, so they are not ambiguous.
2.11 Let himher implement it and then feedan object file or jpeg picture as source file to the compiler; admire the crash, Or, more charitably, explain this intention to the inventor.
2.17 Each round adds at least one dotted itern to
Closure set and there is only a finite number of dotted
items. 2.23 Close cooperation between lexical and syntax analyzer is required. As a kludge, prelim in^ skipping of the dynamic expression based on counting nested parentheses could be considered. Error recovery is a nightmare,
2.26 It isn't as simple as that. It depends on the amount of interaction of the macro processing with the lexical analysis of larger units, for example strings and comments. InC the scheme is hare-brained since it would require the macro processor to do almost full lexical analysis, to avoid substi~tinginside strings and comments. But in a language in which macro names have an easily recognizable form (for example in PLh, in which macro names start witha %),there is no such interference, and a better st~cturingof the compiler is obtained by a separate phase. But the loss in speed and the large memory requirements remain. Also, with full macro processing preceding compilation, it is very difficult to reconstruct the source text as the compiler user seesit.
2.29 Figure Answers.=! shows an LALR( 1) suffix grammar for the grammarof Figure 2.84.
%token %token
IDENTIFIER EOF
%%
input-suffix : expression-suffix EOF I EOF ; expression : term I expression term ; expression-suffix : term-suffix I expression-suffixterm term : IDENTIFIER 1 ’(‘ expression ’) ‘ ; term-suffix : expression 0 ’ I expression-suffix l + ’
I
’ ) l
1
‘
term
l+‘
)
l
;
;
Figure Answers.2 An LALR(1) suffix grammar for the grammar of Figure 2.84.
2.33 See Hanson (1985). 2.37 (a) LL(1) and &-free.(b) Predictive is still more efficient. 2.41 The recursion stack consistsof a list of activation records, eachof which defines an active routine; only the top one is running. Each activation record contains a continuation address (often called return address) telling where the routine should continue when it becomes the top node. The code from the continuation address to theend of the routine consistsof zero or more routine calls. These calls represent what is being predicted and the co~espondinggrammar symbols are part of the prediction stack. Thus each activation record represents part of the re diction stack; the total prediction stack is the concatenation of all these parts, in the order of the activation records. Additional exercise: draw a picture that illustrates the above explanation ina clear way. 2.44 The acceptable setof a n o n - t e ~ n aNl is the union of HRST(N) and the acceptable setof the shortest alternative of N. So, the acceptable sets of all non-terminals can be precomputed using a closure algo~thm. Now, if the prediction stackis available directly (as an array or a linked list), we can traverse the stackand compute the union of the acceptable sets of the symbols in it. In LLgen, however, the prediction stack is just the C stack and is not available for traversal. LLgen keeps an integer array indexed by grammar symbols countinghow many times a given symbolis present on the stack. This informationis easily m ~ n t ~ n e d and sufficesto compute the acceptable set. 2.48 (a) When the ACTION table calls for a ‘reduce using rule N j a ’ , the item setco~espondjngto the stateon the top of the stack contains the itemN+a*, The dot can onlybe at the end of a when it has just passed over the last member of a,which must therefore be just below the top state on the stack. This reasoning on the stack. applies successively to all other members of a,which must therefore also be (b) The item set precedinga on the stack must contain the item N + w , or no a would be recognized and no item N+a* wouldeventually be found.Theitem N+*a musthaveoriginatedfromsomeitem P+P*Ny. The presence of this item guarantees that a transition on N is possible, leading to a state that includes P+PN*y. 2.49 A value ‘shift’ in an ACTION table entry does not conflict with another ‘shift’ value in that same entry, but a ‘shift’and a ‘reduce’ do. So do a ‘reduce’ and another ‘reduce’, since they are actually two different ‘reduces’: ‘reduce to M’ and ‘reduce toM. 2.91 Thetreehastheform (x 1 n- (x) 1n- andthelast x is thefirsthandle,inanybottom-upparser. So allthe (x 1n- mustbestacked.
~ n s ~ etor sexercises
2.55 After rule 2, add: 'If t and shift.'
U
are the same operator: if the operator
is left-associative, reduce, otherwise
2.58 In a pure bottom-up parser no such pointers exist: trees are constructed before their parents, and the only pointer to a tree is the one on the stack that is used to discard the tree; the stack entry that contains it is removed by the recovery process. If other pointers have been created outside the parsing mechanism, these must be found and zeroed.
3.2
For a on-terminal N,some of its production rules could set some attributes and other rules could set other attributes. Then the attributes in a tree with a node for N in it could be evaluable for one production (tree) of that N, alld not for another. This destroys the composability of context-free g r ~ m a r swhich , says that anywhere anN is specified, any production ofN is acceptable.
3.5
See FiguresAnswers.3andAnswers.4.
a
b Figure Answers3 Dependency graphs for S and A.
3.9
The fact that no intervening visits to other children are needed shows that the production rule already has all in all production rules that have the information for the second visit, This may, however, not be the case this type of child,so other production rules may require two non-consecutive visits.
3.15 See Figure Answem5, and note that the code is simpli~cation a over that from Figure 3.37. 3.18 We need two variables, the actual number needed here and a high-water mark. Simple symbolic interpretation suffices. 3.19 The successor of the then-part is the merge node at the end of the if-statement rather than its else-part, it and is correct that we enter that node with an empty list during symbolic interpretation, since we will never reach the end of the if-statement from the end of the then-part when the program is run, due to the intervenon the linear program text. ing jump. Full symbolic interpretation works on the threaded AST rather than 3.21 It violates requirement 4 in Section 3.2.2.1: the actions to be taken on constantsdo not subsume those taken on variables. Quite to the contrary, any constant can be handled by code generated for variables but not the other way around.
IS-SI graph set of A:
merged IS-SI graph of A:
Figure Ansvvers.4 IS-SI graph sets and IS-SI graph of A.
#include #include
Ifparser. hsl “thread.h1!
/ * for types AST-node and Expression * / / * for self check * /
/ * PRIVATE * / static AST-node *Thread-expression(Expressi0n *expr, AST-node *succ) { switch (expr->type){ case ‘ D ’ : expr->successor = succ; return expr; break; case ’ P’ : expr->successor = succ; return Thread-expression(expr->left, Thread-expression(expr->right,expr) 1; break;
1
1 / * PUBLIC * /
AST-node *Thread-start; void Thread-AST(AST-node *icode) { Thread-start = Thread-expression(icode,
0);
1 Figure Answers.5 Alternative threading code for the demo compiler kom Section 1.2.
x becomes initialized. Con: It sounds unreasonable and counterintuitive to get a variable initialized by assigning the value of an unini~iaIizedvariable to it. Pro: The error in the program is probably the lack of initiali~ationof y; the further usage of x is independent of this error. Since a warning is already given on the assign~ent,no further warnings on subsequent - probably correct - uses of x seem appropriate.
ns
Consider any routine with a flow-of-control graph that is a linear list from routine entry to routine exit. hat ever the contents of the KILL and GEN sets, the IN and OUT sets will be computed in one scan through the list, and there is no way to transport jnfo~ation the about the routine exit back to the last use of a variable.
A self-extracting archive works exactly like the ‘compiler’ of Section 4.2.1 :there, the executable tile contains both the AST and the interpreter. A self-extracting archive contains both the contents of the archive and the extraction code. Often, the archiveis compressed, and the extraction code also contains decompression code. The program: int Program[] = { / D ~ , 7 , ‘ D f , l , ‘ D ’ , 5 , ’ P ’ , ‘ + ’ , ’ P f , ‘ * ’ , ’ ! ’ , O } ;
The interpreter: int main(void) { int PC = 0 ; int Instruction; while ((Instruction = Program[PC++]) != 0 ) { switch (Instruction) { case ‘ D ’: Expression-D (Program [PC++]) ; break; case ’P‘ : Expression-P (Program [PC++] ) ; break; case ’!’: Print(); break;
1
1
return 0;
} See Figure Answers.6; the ‘then’ gets0.7, the ‘else’ 0.3; loop skipping gets 0.1, loop entering 0.9; the cases get 0.4, 0.4, 0.2. Traffic at routine entsy i s arbitrarily set to 1. The first column gives the 17 equations; all can be solved by simple substitu~ion,except those for e, f , and g, which need elimination. The results are given in the second column. Note that we predict that for every time the routine is called, the loop body A will be executed 6.3 times. Also note that the traffic out of the routine is again 1; what goes in must come out. These dependencies also express the requirement that all assignments to a variable are executed in sequential, left-to-right order. See Figures Answers.7 and Answers.8.
(a) See Figure Answers.9. (b) The input p of the second *p++ is dependent on the output p of the first *p++ and so its dependencies differ from thoseof the inputp of the first*p++ . S and N cannot be the same node, since that would make the dependency graph contain a cycle because S refers to N .
A ladder sequences starts at each graph root, except when that root has an incorning dependency. Not all roots can have incorning dependencies,or the dependency graph wouldbe cyclic. Suppose the token set { a, bcd,ab, c , d } and the input abcd. I~mediate~y re~rning the a yields 2 tokens, whereas 3 can be achieved, obviously. A s s u ~ ethe entire input is in memory. Record in each item its starting position and the number of tokens recognized so far. At each reduce item that says that N tokens have been recognized, add the T n i t i a l item set with token counter N+l and the present location as starting point. Having arrived at the end of
Equation a = l
Value 1.0
b = c = d = ~3 = f = g = h =
0.7
0.7
a 0.3 a b 0.1 (d+f)
0.3 0.7 0.7
6.3 (d+f) 6.3 c 0.3 i = 0.4 h 0 12 j = 0.4 h 0.12 k = 0.2 h 0.06 g
0.9
*
l = i j
0.12
m =
0.12
n = k
0.06
o = e
0.7
p
=L
l+m+n
g = o+p
0.30 1.0
Figure Answers.6 TrafFc equations and their solution for Figure 4.97.
+
2
X
Y
+
+ *
a
*
2
a
Figure Answers.7 The dependency graph before common subexpression elimination.
the input, find the reduceitem with the largest token counter and isolate the tokenit identifies. Work backwards, identifying tokens.
4.28 For the first inst~ctionin the sequence we have 20*2=40c o ~ b i n a t i o ~using s , R I , RI and R1, R2, or more compactly {R l } ,{ R I , R2 }. Forthesecondinstructionwehave20*2*3=120combinations,using {R 1 , R2 } ,{R 1 ,R2 ,R 3 };for the further i n s ~ c t i o n we s have 20*3*3=180 co~binationseach. In total 4800~18dv-~ Combinations. Estimating a weekend at 2.5 days,each of about 80000 seconds, we have about2 ~ 1 0 ~ ~ ~ s e c oorn d2x10" s, tests. So we want the largestN for which 4 8 0 0 ~ 1 8 p -is~smaller than 2x10". Now, 4800x1804~2=1.Sx10s and 4 8 ~ ~ 1 8 0 ~ - ~ = 2 . 7so~ N1 04 ' .~ ,
~ ~ s ~toeexercises r s
X
Y
Figure Answers.8 The dependency graph after common subexpression e~i~ninati~n.
Figure Answers9 The dependency graph of the expression *p++
4.34 Advantages of PC-relative addressing modes and instructions are: -they require no relocation, thus reducing the work of the linker; -they allow position-independent code,code that can be loaded anywhere in memory, without any rnodifications; -they may allow shorter instructions: an offset may fit in a byte whereas a full address usually does not, Even if it does, the assernbler still has to reserve space for a full address, because the linker may modify it.
5.2
The garbage collector will free chunks only when they are unreachable. If they are u n r ~ ~ ~ hthey a b cannot l~ So it is safe to call the garbage be freed by the user since the user does not have a pointer to them any more. collector from Mall o c () .
5.6
In the allocation of the arrays into which the code and data segments will be compiled; perhaps
inthe allo-
cation of the external symbol table. 5.7
In general, the garbage collection algorithms inspect pointer values, which will be (si~ultaneously)changed by the application. Some garbage collection algorithms (for example two-space copying and compaction)
copy data, which is dangerous if the application can access the data in the meantime. Some algorithms 5.2.2), assume that the garbage collection only becomes active at specific points in the program (see Section which is difficult to guarantee with a concurrent garbage collection.
.Y
If the assignment p :=p is not optimized away, if p points to a chunkP with reference count l , if P contains a pointer to another chunk Q and if the reference count of Q is also 1, then first decreasing the reference Q tobefreed. Subse~uentlyincreasingthereferencecountof countof P causes P tobefreed,ses P will not raise the reference CO again and the pointer to Q in P will be left dangling. Also, on some systemsfreeing P mightcauseiontotakeplace,afterwhichthechunk P wouldbegoneentirely and incrementing its reference count would overwrite an arbitrary memory location.
5.12 See Sikl6ssy (1972). The ‘overlapping lists’ in the paper are dags.
6.2
6.
6.7
In most languages we need to know if an identifier is a keyword or the name of a macro, long before its name space is known, If we want to postpone the identificati~nto the time that the proper name space is known, we will need othermechanisl~sto solve the keyword and macro name questions. We have rvalue?V: V -+ rvalue. In principle, rvalue?lvalue :lvalue could yield an lvalue, but ANSI C defines it to yield an rvalue. In GNU C an lvalue results, but a warning is given under the -pedantic flag. Can’t be. The last scope rule forbids the creation of such values.
by a class descriptor which contains, among others, the method 6.13 At run time, a class may be represented table of the class. Such a class descriptor could also contain a pointer to the class descriptor of its parent class. An object then contains a reference to its class descriptor instead of a reference to the method table. Then, the implementation of theinstanceof operator becomes easy, see Figure Answers.10.
FUNCTION Instance of (Obj, Class) RETURNING a boolean; SET Object Class TO Obj. Class: WHILE Object Class/= No class: IF Object Class= Class: RETURN true; ELSE Object Class/= Class: SET Object Class TO Object Class .Parent; RETURN false; Figure Answers.10 l ~ p l e l ~ ~ n t of a ~the o ninstanceof operator.
6.14 The code for these callsis: (*(e->dispatch-table[O] ) ) (e); (*(e->dispatch-table[2]) ) ( (class D *) ( (char (*(e->dispatch-table[ 3 1 ) ) ( (class D *) ( (char
*)e+ sizeof *)e+ sizeof
(class C) ) ) ; (class ) )C) ;
Note that althoughm4 is redefined in classE, it still requires a pointerto an object of classD.
6.16 For example, when the caller calls several routines consecutively, the ‘caller saves’ scheme allows saving and restoring only once, whereas the ‘callee saves’ scheme has no option but to doit for every call. Also, in the ‘callee saves’ scheme the callee has to save and restore all registers that might be needed by any caller, whereas the ‘caller saves’ scheme allows for saving and restoring only those registers that are needed for
this particular caller, at this particular call site, This may cause overflow, because the controlled variable may then be incremented beyond the upper bound (and thus possibly beyond the maximum representable value). 6.30 Pushing the last parameter first makes sure that the first parameter ends up on the top of the stack, regardless of the number of parameters. ~ n s t a c ~ i the n g parameters will therefore yield them in textual order,
2 Two new operations must be supported: instantiation of a generic routine whichis a parameter, and passing a generic routine as a parameter to an instantiation. The consequences from a compiler construction point of to be dealt view are modest. When implementing instantiation through expansion, the first instantiation with resides within a non-generic unit. Therefore,a generic routine parameteris available as an AST, which can be copied and processed just as the AST of an ordinary generic unit. One issue that must be dealt with, weather the language has generic routine parameters or not, is cycle detection: when one generic unit contains an instantiation of another generic unit, the result is a chain of not contain further instantiations, which must be terminated by the instantiation of a generic unit which does instaIltiations. If this is not the case, the implementation through expansion scheme will fail. Without generic routine parameters, detecting this situationis easy: when a generic unit occurs twice on the instantiation chain, there is a cycle. With generic routine parameters, when an identical instantiation (same generic unit and sameinstant~~tion para~eters) occurs twice onthe instantiation chain, there is a cycle. When implementing instantiation through dope vectors, a generic routine has a code address, and can be passed as a parameter, just like an ordinary routine.
(a) One should not write functions that fail for some arguments (but sometimes one has to). (b) One should not call a function with arguments for which it will fail (but that is sometimes hard to know).
1 The following program uses the value of the expression fac 20 2¹⁰ times:

    tree 0 val = val
    tree n val = let t = tree (n-1) val in t*t
    main = print (tree 10 (fac 20))
Although the ++ (append) operator is associative, the amount of work differs for a ++ (b ++ c) and (a ++ b) ++ c. This is a consequence of the append operator, which essentially creates a copy of its first argument to 'replace' the null pointer with a pointer to the second list. In (a ++ b) ++ c the list a is copied twice, whereas only one copy is required for a ++ (b ++ c); a C sketch of this copying follows these answers.

Transforming mappend into an accumulating argument function would cause the sublists to be copied many times (depending on their position).

There is no reduction in heap space, since all lazy expressions allocated in the heap consist of a (built-in) function with two arguments, which is the break-even point. The benefit is in the graph reducer, which does not need to unwind application spines, but can call the suspended function immediately.
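The copying behavior of append can be rendered in C as follows (the Cell struct is our own; this is a sketch, not the book's run-time code): append builds a fresh copy of its first argument so that the copy's final null pointer can be replaced by a pointer to the second list.

    #include <stdlib.h>

    struct Cell { int head; struct Cell *tail; };

    /* Returns a fresh copy of 'first' whose last tail field points to
       'second'; 'second' itself is shared, not copied. */
    struct Cell *append(const struct Cell *first, struct Cell *second) {
        if (first == NULL) return second;
        struct Cell *copy = malloc(sizeof *copy);
        copy->head = first->head;
        copy->tail = append(first->tail, second);  /* copies every cell of 'first' */
        return copy;
    }

    /* append(append(a, b), c) therefore copies the cells of a twice (the
       outer call copies the copy made by the inner call), whereas
       append(a, append(b, c)) copies them only once. */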
The new paradigm implements breadth-first search rather than depth-first search. This usually takes (much) more time, but will find a solution if one exists, unlike depth-first search, which may work itself into an infinite branch of the search tree. The discussed optimizations are all applicable to some degree. The number of values returned is in general unbounded, so an infinite list may have to be returned.

The asserts and retracts should not be undone by backtracking. Since the data structure in which the asserted clauses are kept and the corresponding counters (for example, number_of_clauses_added_at_end_for_parent_2 in Figure 8.25) are global variables, no backtracking will occur, as required.
9.1 No; shared variables are easier to use than message passing. Also, on a shared-memory machine, shared-variable programs often get better performance than message-passing programs, because message-passing programs do more copying.

9.4 No; the lock that is used to protect the administration can use spinning, because the operations on the list are very simple and will not block for a long time.
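A minimal sketch of this point using the POSIX spinlock API (the free-list administration is our own example): the critical section is only a few instructions long, so busy waiting is cheaper than blocking.

    #include <pthread.h>
    #include <stddef.h>

    struct node { struct node *next; };

    static pthread_spinlock_t list_lock;        /* protects the administration below */
    static struct node *free_list = NULL;

    void free_list_init(void) {
        pthread_spin_init(&list_lock, PTHREAD_PROCESS_PRIVATE);
    }

    void free_list_push(struct node *n) {
        pthread_spin_lock(&list_lock);          /* spin: the holder leaves almost immediately */
        n->next = free_list;
        free_list = n;
        pthread_spin_unlock(&list_lock);
    }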
9.6 (a) The first operation that succeeds in acquiring the monitor lock will continue. The second operation, however, will block on the monitor lock, and thus cannot continue until the first operation has released the lock. Thus, although the two operations could in principle be executed simultaneously, the implementation of a monitor runs them sequentially.
(b) 1. Use symbolic interpretation to find those operations that are read-only. 2. Use a multi-state lock to protect the monitor: nobody inside, some readers inside, some readers inside and one or more writers waiting, and one writer inside.
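Point (b)2 can be sketched with the POSIX readers-writer lock as the multi-state lock (the counter is just a stand-in for the monitor's protected data): operations found to be read-only take the read lock and may run simultaneously; the others take the write lock.

    #include <pthread.h>

    static pthread_rwlock_t monitor_lock = PTHREAD_RWLOCK_INITIALIZER;
    static int monitor_data = 0;       /* stand-in for the monitor's state */

    int read_only_operation(void) {    /* identified as read-only, e.g. by symbolic interpretation */
        pthread_rwlock_rdlock(&monitor_lock);
        int value = monitor_data;
        pthread_rwlock_unlock(&monitor_lock);
        return value;
    }

    void updating_operation(void) {
        pthread_rwlock_wrlock(&monitor_lock);
        monitor_data++;
        pthread_rwlock_unlock(&monitor_lock);
    }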
9.10 It is difficult to copy the stack of a thread, because there may exist variables that point into the stack. In C, for example, it is possible to take the address of a local variable (stored on the stack) and store this value in a global variable. If the stack is copied to a new region in memory, the global variable will still point to the location in the original stack. So, migrating a stack from one memory region to another is hard to implement transparently; a small C illustration follows answer 9.11.

9.11 Since X contains a thread, it is harder to migrate, since threads always have a state that has to be migrated (for example, a stack); moreover, thread migration has the same implementation difficulty as stack migration, as described in the answer to Exercise 9.10. Migrating object Y is easier and less expensive, since it does not contain a thread. On the other hand, if multiple active objects (like X) located on different processors do operations on object Y, object Y will repeatedly be moving between these processors. In this case, it may be more efficient to move the active objects to the processor of Y.
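A minimal C illustration of the problem in answer 9.10 (our own example): once the address of a stack location has escaped into a global variable, the stack cannot be copied to another memory region without leaving that pointer dangling.

    #include <stdio.h>

    int *escaped;                      /* may come to point into a thread's stack */

    void worker(void) {
        int local = 42;                /* lives in worker's stack frame */
        escaped = &local;              /* the address of a stack location escapes */
        /* If the run-time system now copied this stack elsewhere, 'escaped'
           would still point into the old region; finding and updating all
           such pointers transparently is not possible in C. */
        printf("%d\n", *escaped);
    }

    int main(void) { worker(); return 0; }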
A.1 (a) In Java, C++, and probably in most other object-oriented languages, the 'constructor' is not a constructor at all but rather an initializer. The real constructor is the new, and when it has finished constructing the object (building the house), the constructor is called to initialize it (move in the furniture). In Expression Program = new Expression();, the new allocates space for an Expression, which will not be large enough to hold any of its extended subtypes, and the constructor cannot stretch this space afterwards. Also, it is the new that returns the object (or a pointer to it), not the constructor.
(b) Its constructor should be a real constructor, in which the programmer can allocate the space, possibly for a subtype, initialize its fields, and return the constructed object.
2-pass compiler 26  2-scan compiler 26  a posteriori type 455  a priori type 455  ABC 46  abstract class 473, 701, 703  abstract data type 471-472  abstract syntax tree 9, 22, 52, 55, 194  acceptable partitioning 222, 223-224  acceptable-set method 138-139  accumulating arguments 585, 595  acquiring a lock 660, 669-670, 674, 695  action routine 616  ACTION table 160, 170, 191  activation record 32, 412, 462, 482  active node 285-286  active routine 32, 485, 497, 711  active-node pointer 285-287, 297  actual parameter in Linda 664  Ada 447, 696  address descriptor 319  address space 29, 376, 657, 659, 662, 668, 675  administration part 483-484, 514, 522  aggregate node allocation 588  Algol 68 457, 463-464, 488, 496, 622  Algol 68 format 187  alignment requirements 379, 399, 435, 465, 467, 533  alternative 37, 113-114, 702  ambiguous grammar 38, 111, 172, 192, 457  amortized cost 397, 560  analysis-synthesis paradigm 6
annotated abstract syntax tree  annotation 9, 96, 194  anonymous type declaration 449  anti-dependence 687  ANTLR 132  application spine 561, 573, 718  applicative-order reduction 566, 5  applied occurrence 199, 380, 440,  arithmetic sequence construct 541  arithmetic simplification 367, 370, 586  arity 546, 566, 587, 610, 639  array descriptor 469  array type 450, 467  array without arra 534  asserta() 630  assertz() 630-631
assignl~ent248,277, 321,416,458,464,532 assignment under a pointer 334,688 associative addressing 664 AST 9 asynchronous message passing 66 atomic action 664 Attach clauses instruction 603 attribute 9, 96, 135,234, 547 attribute evaluation rule 196-197,211,223, 227, 274 attribute evaluator 196,202,204,23 1, 273 attribute g r ~ 7,195, m ~ 547 auto~aticpa~al~elization 684, 695 automatic programming 33
auxiliary register 309 available ladder sequence 328 back-end 3,6, 11,32 backpatch list 263, 380 backpatching 263,380 backtracking 77,117,596,607,654 Backus Normal Form 37 Backus-Naur Form 37 backward data-flow 262,277 bad pointer 462 bag399 base address 467-468 basic block 254,320,322,373 basic item 72-73, 80 basic operand 35 1 basic type 450,460,543,582 BC(k,m) parsing 152 binary relation 598 binding a logical variable 596,654 bison 178 bitset470 block 397,399 blocked thread 668-670,695-696 BNF 37,59,158,202 body 599 Boolean control expression 502 Boolean type 461 bootstrapping 2,709 bottom-up parser 114, 150, 178, 191,230,234-235, 712 bottom-up pattern matching 341,342-343,346 bottom-up rewriting system 341 bottom-up tree rewriting 320,357,365, 372 bounded stack 436 boxing analysis 582 broad compiler 26-27,96,244,320 bucket 100,102 BURS 337,341,349,357,494 busy waiting 669,695 C 25,117,375,442,549,558,597 C preprocessor 107 caching behavior 688 call by reference 5 14 call by result 514 call by value 5 14 call by value-result 514 callee 29,260,513,534,576,583 callee-saves scheme 515,534 caller 260,518,576,583 caller-saves scheme 515,534 calling graph 41,43 calling sequence 513,515,517,634
candidate set 261 case label 506,536 case statement 505-507,536 cast457 cell 41 l , 420 character move 73 child112 child routine 485,495 chunk 397,399 class471 class composition 700 class extension 700,708 clause 596,600 cloning 368,370-37 1 closed hashing with chaining 100 closure 439,498,589 closure algorithm 42,498 closure specification 43,5 1 CM-5657 coalesce 400,403,408 code addressof a routine 483,485,491,496,499, 587 code generatedfor aquery 621 code generation 19,290,295,297,302,320,337, 374,501,523,525,568 code generation module 24 code hoisting 277 code in-lining 24,369-370,395,582,623,638 code segment 375,396,716 code selection 293 code-generating scan 341 coercion 448,455 collision 100, 103,444 column-major order 468 comb algorithm 90 comment starter 68, 187 common subexpression 324-325,334-335, 391-392 common subexpression elimination3 10, 324-326, 549 communication daemon thread672,674 compaction 408,421,428,437,717 co~p.compi1ersnewsgroup 185 compilation by symbolic inte~retation3 18 compilation unit 31,357,450,523 compiler l compiler compiler 8, 178 composition operator 58, 3 86 compound result type 517 concurrent execution 663,675,685 concurrent garbage collection 409,435 condition code 286 condition synchronization 660 condition variable 660,670
conditional compilation 104  conditional text inclusion 104, 107  conservative garbage collection 414  conservative parallelization 687  constant folding 24, 49, 55, 367, 395, 549  constant propagation 250, 253, 369, 395, 638  context condition 39, 195-196, 199, 709  context handling 9, 19, 194, 244, 440  context handling module 23, 54, 97, 279  context (in coercions) 455  context switch 667-668, 670  context-free grammar 39, 60, 133, 194, 712
directly derivable from 39 d i s c ~ ~ n a t eunion d 467 5-476,478-479,481, 512
continuation of a stack 139, 140, 175
dynamic cycle detection 211
conversion 457  core of an LR(1) state 170-171  coroutine 29, 33, 414, 482, 484, 485, 500  coroutine call 29  ctype package 68  currying 489, 497, 540, 546, 551, 560, 592-593  cut operator 597, 601, 628, 654
dynamic programming 337, 347, 350, 357
dag 321
distributed system 656, 695  distributing an array 690  dope vector 526  dotted item 71-72, 110, 153, 373  dotted tree 342  dvi format 8  dynamic allocation part 515, 516  dynamic array 470, 514
ε closure 77, 79, 154, 166  ε move 73, 154, 186  EBNF 38, 50, 59  elaborate a statement 284  element of an array 467  element selection 468  empty list 541
~ ~ n g e rcompiler ~us 29 dan~lingpointer 407,463-464 data dependency 686 data p~allelism658, 665-666,685 data segment 376,396,7 16 data-flow equations 253,262,265,267,277 data-flow paradigm 195 p r o g r ~ ~ n658,665, g 688 data-pa~~l~el dead code e l i ~ n a t i o n368,395,638 dead variable 32 1 debugging of garbage collectors 415 decl~ationof a method 473 defining occurrence 199,380,440,4 definition of a method 473 degree of a node 360 , dep~ndencygraph 2 0 ~274,321,326-328 dependent inheritance 478 dependent on 200,334 dereferencing a field selection 461 derivable from 40 derivation36 dese~alization672 d e s u g ~ n g553 dete~inistic110, 117, 134, 172,337 direct left-recursion l30 directed acyclic graph 321
enu~erationtype 461,467,475 equation541 error correction 116, 136, 144,175 error detection 115-116, 142, 190 error handling in lexical analyzers 93 error state 158 error-recove~ngnon-te~inal176 escape character 59 Escher, M.C. 301 Ethernet657 evaluate an expression 284 event queue 404 exception hand~er 415,522,537 exclusive or 436 executable code output module 24 Extended BNF 38 extended subset 30, 50 extensible may 98,287, 398,404,435 external entry point 377 external name 377 e x t e ~ reference a~ 377 external symbol 377 external symbol table 377 extra-logical predicate 630
failure in the WAM 625
field selection 466 file inclusion 104 filter 557 final m e ~ o d479 fine-grained p~allelism685 finite-state auto~aton80 finite-state tree automaton 347 FIRST set 12I first-class citizens, functions as 545 FIRST~IRSTconflict 127 first-fit 90 FIRST~OLLOWconflict 127 jlex 187 flow dependence 687 FOLLOWset 123 follow set 138 FOLLOW-set error recovery 139 follow-set error recovery 138 fork statement 659 formal attribute 195 formal parameterin Linda 664 for-statement 508 forward declaration 441,473 forward reference 450,594 forwarding address 676 FP 482,515,517-518 frame 412,482,489,585 frame pointer 482,489,493,559 free bit 399,420-421 free list 400,416 from-space 408,425 front-end 3,52,539 FSA 50,80,347,373 full symbolic inte~retation 247,2S 1, 277 full-LL(1) 124, 190 fully p~enthesizedexpression 12,701 function result register 5 17 functional core 549,558-559, S68, 575, 594 functor 607, 639 garbage collection 262,398,407,545,598 gabage set 407 gcc 297,333 general closure algorithm 44 general name space 442 generational garbage collection 409,429 generator 557 generic pointer type 462 generic unit 109,525 global routine 485 GNU C compiler 297 goal599 goal list stack 601 goto statement 25 1, 368,502
GOT0 table 158 g r a m ~ a r34,35 grammar symbol 35,195 graph coloring 92 graph reducer 560 handle152 hash func~ion100,682 hash table 300, 188,325,443,470,532,626,682 Haskell 540 head599 header file in C 22 heap 397,462,484,495,498,560 heap sort 404 heroic compiler 659 heuristic graph coloring 360 hidden left-recursion 130, 149 High Performance Fortran 693 higher-order function 545 HPF 693 Huffrnan compression 86
IC 23 Icon 29,33,389,488,496,615, 622 if-statement 368, 505 i~plementationlanguage 1, 100, 399,451 implicit parallelism 659 implicit receipt 662 imported scope 448 in operation in Linda 664,679 in situ replacement 595 inactive routine 485 incremental garbage collection 408,436 independent inheritance 478 indexing468 indirect left-recursion 130 indirect routine call 261,47 1,489 indivisible operation 660,664-665,669,695,697 inference rule 42 inference technique 601 inheritance472 inherited attribute 196 initial state 80, 343,351 in-lining368 input tree 339 inst~t~ation through dope vectors S26 ~ n s ~ n t ~ athrough ~ i o n expansion 526 instruction ordering 293 inst~ction-colIectingscan 341 inst~ction-selectingscan 341 inst~mentinga program 3 17 interface 480 interference graph 92,359 intermediate code 9,23
intermediate code generation module 23  intermediate code optimization module 24  intermediate code tree 279  interpreter 3, 281, 295, 560, 603, 630, 699  interprocedural data-flow analysis 260, 688  invalidate replicated copies 677  IS-dependency 212  IS-SI graph 212, 274  item 71-72, 110, 153, 373  iterative interpreter 285, 297, 389, 603  iterator 485, 500
I 673 Java M join operator 255 jump buffer 490 jump table 506 jumping out of a routine 488,496 kind449 ladder sequence 328 LALR( 1) automaton 172 LALR(1) parsing 152, 170, 176, 191,234 lambda lifting 499,559,594 language generatedby a grammar 40 last-def analysis 253 late evaluation 224,228, 328 latency 67 1 lattice 250, 277 L-attributed g r a ~ a 230, r 269,275 lazy Boolean operator 117-1 18,503 lazy evaluation 328,547,560,564,566,575 Ice 389 least fixed point 43 left-associative 153, 540,7 12 left-factoring 129 left-hand side 35 leftmost derivation 36 leftmost innermost redex 566 leftrnost outermost redex 566 left-recursion removal 129 left-recursive 38, 120, 127, 144 Lempel-Ziv compression 86 let-expression 543, 572,594 lex 83,94, 187 lexical analysis module 22 lexical analyzer 10, 61, 68, 83,9344,541 lexical identification phase 98 lexical pointer 483-484,491,493, 514,559 library 25,281,297, 376,658,672 lifetime 463 Linda 664 Linda kernel 678 Linda preprocessor 678
Linda Tuple Space 678 linear-time 31, 86, 131, 168, 224,277,406 linker 25,376, 381, 389 lint 247
list procedure 616, live analysis 262-263,265, 319, 333, 358 live variable 262 LL(1) conflict 127-128 LL(1) grammar 123, 127, 142 LL(1) parser generation 123 LL(1) parsing 121-122,139 LL(2) g r a ~ m a r132, 190 LLgen 132,142, 179,232,237,711 LLgen directive 144 loader 3'76,389 local attribute 203 local variable area 515 location458 lock variable 660,669 longjmp (1 490 look-ahead set 165-166, 190 loop interchange 689 loop r e s t ~ c t u ~ ntransformation g 688 loop unrolling 511 loop-carrieddependence 687 loop-exit list 248 LR automaton 159 LRitern 153, 193 LR reduce item 154 LR shift item 154 n ~ LR(0) g r ~ 162 LR(0) parsing 152-153, 191
machine code generation 374  machine code generation module 24  macro application 103  macro call 103  macro definition 103, 444  makefile 32, 296  mark and scan 408, 420  mark and sweep 420  marked bit 420  marker rule 131  marking 420  marking phase 420  marshaling 672  Match instruction 603, 609  match move 134  matching a goal 599
matching in regular expressions 58 matrix467 maximal basic block 320 meaning of 'to compile' 3 meet operator 255 member35 memoization 285,348, 350,389,436 memory dump 520 memory fragmentation 408,419,427 memory management unit 462 message aggregation 692 message buffer 672 message combining 692 message handler 662 message passing 658,671 message selection 674 method663 method invocation 471,479,512,673 method overriding 472 Miranda593 module523 module initialization 524 monitor 660,669,695 monitor lock 670 most recent binding 607,613 MP1 658 ~ u l t i c o ~ p u t e657 r multiple inheritance 476,533 multiprocessor657 multi-visit attribute evaluation 219, 230 mutual exclusion synchronization 660 name equivalence 454 name generation for modules 524 name list 98,442 name space 442,526,532,675 narrow compiler 26,96,232,269,302,320-321, 444 nested routine 485, 558,597,618 newline 57, 185 non-correcting error recovery 116 non-cyclic attribute grammar 217,275 non-dete~inistic162,362 nondet~r~nistic-polynomial 90 non-local ~ssignment494 non-local goto 482,488,496,5 18 non-local label 488,491,496 n o n " t e ~ i nsymbol ~ 35,39 no-op instruction 379 normal-order reduction 566 NP-complete 89,90,293,320,360,373 N-phase compiler 27 N P D grammar 189 null sequence 367
nullable non-terrninal 38, 121-122, 190 object 471,658 object constructor 471,527,702,708 object destructor 471,527 object identifier 675 object location 675 object migration 676 object replication 677 object-oriented compiler construction 699 objec~-ol~ented parsing 702 occur check 608 offset table 479 offside rule 438,541 OID 675 one-shot garbage collection 408 one-space copying 437 on-the-fly garbage collection 408 operation on an object 663 opti~ization 32,64, 102,403,479,511,646,648 ordered attribute evaluator 224 ordered attribute grammar 224 out operation in Linda 664 outer loop 686 outline code 46 output dependence 687 overloading447 package523 panic-mode error correction 139 parallel loop statement 665 parallel system 656 parameter area 514 parameter passing mechanism5 14 parent class 472,476,481 parent routine 485 parse table 164 parse tree 9,148 parser 36,110,702 parsing 9, 110,540, 616 partial evaluation 297,300, 390,638 partial paramete~zation 488,499 partition (Tuple Space) 679,683 partitioning (of attributes) 222 passing a routine as a parameter 463,487,495,617 pattern matching 543,553, 593 pattern tree 337 PDA 134 peephole optimization 333,371 phase of a compiler 27 PLD preprocessor 104-105, 107-108,710 pointer chasing 677 pointer consistency 410 pointer layout 410
pointer problems 334 pointer (re)subtyping 474 pointer reversal 422 pointer supertyping pointer tagging 588 pointer type 46I pointer validity 410 polymo~hicfunction application 551 polymo~hictype S43 polymorphic type checking551 p o l y m o ~ h i s474 ~ po1yn~)mial time 90 pool of threads 674 port 662 portability32 portability of a compiler 32 position"independent code 716 post-order visit 112 precedence 59, 174 precedence parsing l 52 precomputation 66, 80, 120, 157, 191,285, 346, 469,475 prediction move 134 prediction rule 155 predictionstack 134 predictiveparser 122 predictive recursive descentparser 17, 121 pre-order visit 112 prerequisite to 200,210 priva~i~ation 689 process 658,659 process abstraction 667 production rule 35,39, 196, S47 produ~tiontree 35 program counter 35 1, 395,482,493,662,667 program data area 409,496 program generator 7, 66, 301 program text input module 22, 56 program-counter relative addressing 395 Prolog 43, 296,597,628,630,654 property list 247 pseud(~-register329, 362 pure register machine 304 pure stack machine 303 push-do~nautomaton 13 PVNI 658
read/write mode 646  receive statement 662  reclamation 434  record type 465  recursive descent parser 15, 117  recursive interpreter 281  redex 561  reduce item 72, 186, 714  reduce move 160  reduce-reduce conflict 162  reducible expression 561  reduction engine 566  reduction (graph) 561  reduction (in parsing) 152  reduction stack 159  reference counting 407, 415  referential transparency 406, 544  refinement 482  register allocation 293, 316, 332-333, 357  register allocation by graph coloring 320, 357  register and variable descriptor 318  register descriptor 319  register interference graph 359  register spilling technique 314  register storage class 336  register tracking 320, 333  register-memory machine 327  regular description 56, 60
regvar descriptor 318  relation 596, 598  releasing a lock 660  relocation 380  relocation information 377  repeated inheritance 478  repeated variables 593  repetition operator 58, 72, 124, 142  replacement pattern 371  replicated Tuple Space 683  rest of program, grammar for 116  resume statement 485  resuming a routine 483, 4  retargetability 32  return information 483  return list 247  returning a routine as a value 482, 487, 495, 559  right-associative 117, 192, 551, 593
qualifier 557  root set 409, 496  reaching-definitions analysis 253  read operation in Linda 664
routine denotation 621
routine type 471 row displacement 87 row-major order 468 rule 596,599 runnable thread 668 ~ n n i n groutine 32,439,482,485 ~ n n i n thread g 668 run~timeerror 5 19 run-time system 25,281,396,499,545,560,659, 667 rvalue 458,462,470,5 14 safe pointer 464 S-attributed gr~nlnar230,235, 275-276 scalar expansion 689 scan phase 420 scan pointer 425 scheduler 668,686 scheduler activation 667 Schorr and Waite algorithm 422,436 languages 442 scope in block~s~uctured scope of a pointer 463,491,532,559 screening98 select statement 696 selection statement 505 self-descriptive chunk 41 1 self-organizing list 102 semantic checking 441 semantic representation 2 send s~a~ement 662 sentence 40 sentential form 35,40 serialization 672 set type 470 Sethi-Ull~annumbering 3 11 s e t jmp () 491 setjmp~on~m mechanism p 490 shadow memory 287 shared variable 658,659, 668 shift item 72, 166, 172, 186 shift move 159 shift-reduce conflict 162 shift-shift conflict 391 sho~-circuitinga function application 573,586 SI-dependency 2 12 signal handler 521, 537 signal operation 660,670 signal statement 521 signa~reof a computer virus 83 simple symbolic interpretation 247,277,712 simulation on the stack 245 single inheritance 476 Single Program Multiple Data parallelism 685 SLR(1) parsing 152,163, 190-191
S~alltalk33 sorted set 399 source language 1 SP 513 SP-2 657 sparse transition table 86 specialization 370 spineless tagless G-machine 592 spinning669 SPMD parallelism 685 SR 674 stack 513 stack, directionof growth 513 stack of input buffers 105 stack pointer 533 stack representation 245 stack segment 376,396 start symbol 35,39, 136, 144, 199 state 80 state ans sit ion 80 static array 470 static attribute evaluation 204, 218 static binding 474 static cycle checking 22 1 static link 484,491,5 14 status indicator 282 storage representationin Linda 680 strength reduction 367 strict ~ g u ~ e 565,574,576, nt 579,594 strictness analysis 576, 594 string39 strong-LL(1) parser 124 strongly LL( 1) 124 strongly non-cyclic 217,275 struct~ra1 equivalence 455 structure type 465 subclass472 subroutine485 subset algorithm 81, 358, 346,352 substitution in grammars 129 subtree 112 Succeed instruction 603,605 suffix grammar 116, 188 supercompilation363 suspend statement 485 suspended routine 439,482,485 switch statement 505 symbol 39 symbol table 98, 133, 188,284,442 symbolic interpretation 245 sync~onizationprimitive 660 synchronous message passing 662 syntactic sugar 540,553 syntax allalysis 9, 57, 110,440
syntax analysis module 22 syntax tree 9, I 1 0, 135,230 synthesized attribute 196 table compression 86 table lookup 68,285,347 tail call 582 target code optimization module 24 target language 1,375,523,549 target register 309 task 33,656 task parallelism 665 TCPDP 671 term 596 terminal 35 t e ~ n aproduction l 40 terminal symbol 3 5 3 9 terminated routine 32,462,485 test-and-set instruction 669 TeX 8 thread 658,662,667 thread control block 667 thread preemption 668 threaded code 297 threading (of an AST) 239,247,276,287 ~ o u g h p u t67 1 token 10,35 token description 61,70,77 tokenize 10 top-down parses 113, 120, 142,232 to-space 408,425 totally-ordered group c o ~ u n i c a t i o n678 trail 610, 634 transition diagram 82, 158, 161, 163 transition function 80 transitive closure 42,498 transitive closure algorithm 44 translation 1 traversing a node 112 traversing a tree 3 12 tree rewriting 290 triple 324,326 tuple matching 664 Tuple Space 658,664,678 Tuple Space server 678 two-scans assembly 380 two-space copying 408,420,425,437,545 type 449 type checking 449,673 type declaration 449,544,55 1 type equivalence 454-455,55 1 type extension 472 type table 450,552 type variable 544,551
typed pointer 462  UDP 671  unary relation 598  unbound variable 599, 612, 625  undiscriminated union 467  unification 596, 607, 609, 612, 634, 654-655  uniform distribution in Linda 683  uniform self-identifying data representation 281  Unify instruction 603, 609  union tag 467  union type 466  unmarshaling 672  unrolling factor 511  unwinding 566, 575, 589, 718  update replicated copies 678  URL syntax 185  usage count 317, 362  useless non-terminal 38  user space 667
validity span 463 value 458 value oC a variable in an assembler 379 variable descriptor 319 vai~able-lengthgraph node 587 vector 467 vector apply node 588,595 very busy expression 277 visiting a node 112 visiting routine 220 vocabulary 39 void type 460 wait operation 660,670 "AM 597,623,636,638,648,653 Warren Abstract Machine 597,653 weight of a subtree 310-311,313-314,316,365, 390 weighted register allocation 3 10, 339, 349 while s ~ a t e ~ e 276,507,509 nt working stack 308,412,461,483,513,515,517 yacc 176, 178,234
yield statement 485  zeroth element 469, 533