The Art Of Assembly Language Programming
by Randall Hyde
For the High Level Assembler and other pertinent information, visit the original site: http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html
This is an unaltered, combined, and bookmarked version of the original PDF documents. Acrobat file prepared in March 2003.
The Art of Assembly Language Programming (Short Contents)

The Art of Assembly Language ..... 1
Volume One: Data Representation ..... 1
Chapter One Foreword ..... 3
Chapter Two Hello, World of Assembly Language ..... 11
Chapter Three Data Representation ..... 53
Chapter Four More Data Representation ..... 87
Chapter Five Questions, Projects, and Lab Exercises ..... 119
Volume Two: Machine Architecture ..... 135
Chapter One System Organization ..... 137
Chapter Two Memory Access and Organization ..... 157
Chapter Three Introduction to Digital Design ..... 203
Chapter Four CPU Architecture ..... 234
Chapter Five Instruction Set Architecture ..... 270
Chapter Six Memory Architecture ..... 303
Chapter Seven The I/O Subsystem ..... 327
Chapter Eight Questions, Projects, and Labs ..... 355
Volume Three: Basic Assembly Language ..... 391
Chapter One Constants, Variables, and Data Types ..... 393
Chapter Two Introduction to Character Strings ..... 419
Chapter Three Characters and Character Sets ..... 439
Chapter Four Arrays ..... 463
Chapter Five Records, Unions, and Name Spaces ..... 483
Chapter Six Dates and Times ..... 501
Chapter Seven Files ..... 517
Chapter Eight Introduction to Procedures ..... 541
Chapter Nine Managing Large Programs ..... 569
Chapter Ten Integer Arithmetic ..... 587
Chapter Eleven Real Arithmetic ..... 611
Chapter Twelve Calculation Via Table Lookups ..... 647
Chapter Thirteen Questions, Projects, and Labs ..... 663
Beta Draft - Do not distribute
© 2002, By Randall Hyde
Volume Four: Intermediate Assembly Language ..... 725
Chapter One Advanced High Level Control Structures ..... 727
Chapter Two Low-Level Control Structures ..... 751
Chapter Three Intermediate Procedures ..... 805
Chapter Four Advanced Arithmetic ..... 853
Chapter Five Bit Manipulation ..... 909
Chapter Six The String Instructions ..... 935
Chapter Seven The HLA Compile-Time Language ..... 949
Chapter Eight Macros ..... 969
Chapter Nine Domain Specific Embedded Languages ..... 1003
Chapter Ten Classes and Objects ..... 1059
Chapter Eleven The MMX Instruction Set ..... 1113
Chapter Twelve Mixed Language Programming ..... 1151
Chapter Thirteen Questions, Projects, and Labs ..... 1195
Volume Five: Advanced Procedures ..... 1277
Chapter One Thunks ..... 1279
Chapter Two Iterators ..... 1305
Chapter Three Coroutines and Generators ..... 1329
Chapter Four Advanced Parameter Implementation ..... 1341
Chapter Five Lexical Nesting ..... 1375
Chapter Six Questions, Projects, and Labs ..... 1399
Appendix A Answers to Selected Exercises ..... 1405
Appendix B Console Graphic Characters ..... 1407
Appendix D The 80x86 Instruction Set ..... 1449
Appendix E The HLA Language Reference ..... 1483
Appendix F The HLA Standard Library Reference ..... 1485
Appendix G HLA Exceptions ..... 1487
Appendix H HLA Compile-Time Functions ..... 1493
Appendix I Installing HLA on Your System ..... 1531
Appendix J Debugging HLA Programs ..... 1533
Appendix K Comparing HLA and MASM ..... 1539
Appendix L HLA Code Generation for HLL Statements ..... 1541
Index ..... 1561
Hello, World of Assembly Language
The Art of Assembly Language (Full Contents)

1.1 Foreword to the HLA Version of “The Art of Assembly...” ..... 3
1.2 Intended Audience ..... 6
1.3 Teaching From This Text ..... 6
1.4 Copyright Notice ..... 7
1.5 How to Get a Hard Copy of This Text ..... 8
1.6 Obtaining Program Source Listings and Other Materials in This Text ..... 8
1.7 Where to Get Help ..... 8
1.8 Other Materials You Will Need (Windows Version) ..... 8
1.9 Other Materials You Will Need (Linux Version) ..... 9

2.1 Chapter Overview ..... 11
2.2 Installing the HLA Distribution Package ..... 11
2.2.1 Installation Under Windows ..... 12
2.2.2 Installation Under Linux ..... 15
2.2.3 Installing “Art of Assembly” Related Files ..... 18
2.3 The Anatomy of an HLA Program ..... 19
2.4 Some Basic HLA Data Declarations ..... 21
2.5 Boolean Values ..... 23
2.6 Character Values ..... 23
2.7 An Introduction to the Intel 80x86 CPU Family ..... 23
2.8 Some Basic Machine Instructions ..... 26
2.9 Some Basic HLA Control Structures ..... 29
2.9.1 Boolean Expressions in HLA Statements ..... 30
2.9.2 The HLA IF..THEN..ELSEIF..ELSE..ENDIF Statement ..... 32
2.9.3 The WHILE..ENDWHILE Statement ..... 33
2.9.4 The FOR..ENDFOR Statement ..... 34
2.9.5 The REPEAT..UNTIL Statement ..... 35
2.9.6 The BREAK and BREAKIF Statements ..... 36
2.9.7 The FOREVER..ENDFOR Statement ..... 36
2.9.8 The TRY..EXCEPTION..ENDTRY Statement ..... 37
2.10 Introduction to the HLA Standard Library ..... 38
2.10.1 Predefined Constants in the STDIO Module ..... 40
2.10.2 Standard In and Standard Out ..... 40
2.10.3 The stdout.newln Routine ..... 41
2.10.4 The stdout.putiX Routines ..... 41
2.10.5 The stdout.putiXSize Routines ..... 41
2.10.6 The stdout.put Routine ..... 42
2.10.7 The stdin.getc Routine ..... 43
2.10.8 The stdin.getiX Routines ..... 44
2.10.9 The stdin.readLn and stdin.flushInput Routines ..... 46
© 2001, By Randall Hyde
2.10.10 The stdin.get Macro ..... 46
2.11 Putting It All Together ..... 47
2.12 Sample Programs ..... 47
2.12.1 Powers of Two Table Generation ..... 47
2.12.2 Checkerboard Program ..... 48
2.12.3 Fibonacci Number Generation ..... 50

3.1 Chapter Overview ..... 53
3.2 Numbering Systems ..... 53
3.2.1 A Review of the Decimal System ..... 53
3.2.2 The Binary Numbering System ..... 54
3.2.3 Binary Formats ..... 55
3.3 Data Organization ..... 56
3.3.1 Bits ..... 56
3.3.2 Nibbles ..... 56
3.3.3 Bytes ..... 57
3.3.4 Words ..... 58
3.3.5 Double Words ..... 59
3.4 The Hexadecimal Numbering System ..... 60
3.5 Arithmetic Operations on Binary and Hexadecimal Numbers ..... 62
3.6 A Note About Numbers vs. Representation ..... 63
3.7 Logical Operations on Bits ..... 65
3.8 Logical Operations on Binary Numbers and Bit Strings ..... 68
3.9 Signed and Unsigned Numbers ..... 69
3.10 Sign Extension, Zero Extension, Contraction, and Saturation ..... 73
3.11 Shifts and Rotates ..... 76
3.12 Bit Fields and Packed Data ..... 81
3.13 Putting It All Together ..... 85

4.1 Chapter Overview ..... 87
4.2 An Introduction to Floating Point Arithmetic ..... 87
4.2.1 IEEE Floating Point Formats ..... 90
4.2.2 HLA Support for Floating Point Values ..... 93
4.3 Binary Coded Decimal (BCD) Representation ..... 95
4.4 Characters ..... 96
4.4.1 The ASCII Character Encoding ..... 97
4.4.2 HLA Support for ASCII Characters ..... 100
4.4.3 The ASCII Character Set ..... 104
4.5 The UNICODE Character Set ..... 108
4.6 Other Data Representations ..... 109
4.6.1 Representing Colors on a Video Display ..... 109
4.6.2 Representing Audio Information ..... 111
4.6.3 Representing Musical Information ..... 114
4.6.4 Representing Video Information ..... 115
4.6.5 Where to Get More Information About Data Types ..... 115
4.7 Putting It All Together ..... 116
5.1 Questions ..... 119
5.2 Programming Projects for Chapter Two ..... 124
5.3 Programming Projects for Chapter Three ..... 124
5.4 Programming Projects for Chapter Four ..... 125
5.5 Laboratory Exercises for Chapter Two ..... 126
5.5.1 A Short Note on Laboratory Exercises and Lab Reports ..... 126
5.5.2 Compiling Your First Program ..... 127
5.5.3 Compiling Other Programs Appearing in this Chapter ..... 128
5.5.4 Creating and Modifying HLA Programs ..... 129
5.5.5 Writing a New Program ..... 129
5.5.6 Correcting Errors in an HLA Program ..... 130
5.5.7 Write Your Own Sample Program ..... 131
5.6 Laboratory Exercises for Chapter Three and Chapter Four ..... 132
5.6.1 Data Conversion Exercises ..... 132
5.6.2 Logical Operations Exercises ..... 133
5.6.3 Sign and Zero Extension Exercises ..... 133
5.6.4 Packed Data Exercises ..... 134
5.6.5 Running this Chapter’s Sample Programs ..... 134
5.6.6 Write Your Own Sample Program ..... 134

1.1 Chapter Overview ..... 137
1.2 The Basic System Components ..... 137
1.2.1 The System Bus ..... 138
1.2.1.1 The Data Bus ..... 138
1.2.1.2 The Address Bus ..... 139
1.2.1.3 The Control Bus ..... 139
1.2.2 The Memory Subsystem ..... 140
1.2.3 The I/O Subsystem ..... 146
1.3 HLA Support for Data Alignment ..... 146
1.4 System Timing ..... 149
1.4.1 The System Clock ..... 149
1.4.2 Memory Access and the System Clock ..... 150
1.4.3 Wait States ..... 151
1.4.4 Cache Memory ..... 153
1.5 Putting It All Together ..... 156

2.1 Chapter Overview ..... 157
2.2 The 80x86 Addressing Modes ..... 157
2.2.1 80x86 Register Addressing Modes ..... 157
2.2.2 80x86 32-bit Memory Addressing Modes ..... 158
2.2.2.1 The Displacement Only Addressing Mode ..... 158
2.2.2.2 The Register Indirect Addressing Modes ..... 159
2.2.2.3 Indexed Addressing Modes ..... 160
2.2.2.4 Variations on the Indexed Addressing Mode ..... 161
2.2.2.5 Scaled Indexed Addressing Modes ..... 163
2.2.2.6 Addressing Mode Wrap-up ..... 164
2.3 Run-Time Memory Organization ..... 164
2.3.1 The Code Section ..... 165
2.3.2 The Static Sections ..... 167
2.3.3 The Read-Only Data Section ..... 167
2.3.4 The Storage Section ..... 168
2.3.5 The @NOSTORAGE Attribute ..... 169
2.3.6 The Var Section ..... 169
2.3.7 Organization of Declaration Sections Within Your Programs ..... 170
2.4 Address Expressions ..... 171
2.5 Type Coercion ..... 173
2.6 Register Type Coercion ..... 175
2.7 The Stack Segment and the Push and Pop Instructions ..... 176
2.7.1 The Basic PUSH Instruction ..... 176
2.7.2 The Basic POP Instruction ..... 177
2.7.3 Preserving Registers With the PUSH and POP Instructions ..... 179
2.7.4 The Stack is a LIFO Data Structure ..... 180
2.7.5 Other PUSH and POP Instructions ..... 183
2.7.6 Removing Data From the Stack Without Popping It ..... 184
2.7.7 Accessing Data You’ve Pushed on the Stack Without Popping It ..... 186
2.8 Dynamic Memory Allocation and the Heap Segment ..... 187
2.9 The INC and DEC Instructions ..... 190
2.10 Obtaining the Address of a Memory Object ..... 191
2.11 Bonus Section: The HLA Standard Library CONSOLE Module ..... 192
2.11.1 Clearing the Screen ..... 192
2.11.2 Positioning the Cursor ..... 193
2.11.3 Locating the Cursor ..... 194
2.11.4 Text Attributes ..... 195
2.11.5 Filling a Rectangular Section of the Screen ..... 197
2.11.6 Console Direct String Output ..... 199
2.11.7 Other Console Module Routines ..... 200
2.12 Putting It All Together ..... 201

3.1 Boolean Algebra ..... 203
3.2 Boolean Functions and Truth Tables ..... 205
3.3 Algebraic Manipulation of Boolean Expressions ..... 208
3.4 Canonical Forms ..... 209
3.5 Simplification of Boolean Functions ..... 214
3.6 What Does This Have To Do With Computers, Anyway? ..... 221
3.6.1 Correspondence Between Electronic Circuits and Boolean Functions ..... 221
3.6.2 Combinatorial Circuits ..... 223
3.6.3 Sequential and Clocked Logic ..... 228
3.7 Okay, What Does It Have To Do With Programming, Then? ..... 232
3.8 Putting It All Together ..... 233

4.1 Chapter Overview ..... 234
4.2 The History of the 80x86 CPU Family ..... 234
4.3 A History of Software Development for the x86 ..... 241
4.4 Basic CPU Design ..... 245
4.5 Decoding and Executing Instructions: Random Logic Versus Microcode ..... 247
4.6 RISC vs. CISC vs. VLIW ..... 248
4.7 Instruction Execution, Step-By-Step ..... 250
4.8 Parallelism – the Key to Faster Processors ..... 253
4.8.1 The Prefetch Queue – Using Unused Bus Cycles ..... 255
4.8.2 Pipelining – Overlapping the Execution of Multiple Instructions ..... 259
4.8.2.1 A Typical Pipeline ..... 259
4.8.2.2 Stalls in a Pipeline ..... 261
4.8.3 Instruction Caches – Providing Multiple Paths to Memory ..... 262
4.8.4 Hazards ..... 263
4.8.5 Superscalar Operation – Executing Instructions in Parallel ..... 265
4.8.6 Out of Order Execution ..... 266
4.8.7 Register Renaming ..... 266
4.8.8 Very Long Instruction Word Architecture (VLIW) ..... 267
4.8.9 Parallel Processing ..... 268
4.8.10 Multiprocessing ..... 268
4.9 Putting It All Together ..... 269

5.1 Chapter Overview ..... 270
5.2 The Importance of the Design of the Instruction Set ..... 270
5.3 Basic Instruction Design Goals ..... 271
5.3.1 Addressing Modes on the Y86 ..... 278
5.3.2 Encoding Y86 Instructions ..... 279
5.3.3 Hand Encoding Instructions ..... 282
5.3.4 Using an Assembler to Encode Instructions ..... 286
5.3.5 Extending the Y86 Instruction Set ..... 287
5.4 Encoding 80x86 Instructions ..... 288
5.4.1 Encoding Instruction Operands ..... 290
5.4.2 Encoding the ADD Instruction: Some Examples ..... 296
5.4.3 Encoding Immediate Operands ..... 300
5.4.4 Encoding Eight, Sixteen, and Thirty-Two Bit Operands ..... 301
5.4.5 Alternate Encodings for Instructions ..... 301
5.5 Putting It All Together ..... 302

6.1 Chapter Overview ..... 303
6.2 The Memory Hierarchy ..... 303
6.3 How the Memory Hierarchy Operates ..... 305
6.4 Relative Performance of Memory Subsystems ..... 306
6.5 Cache Architecture ..... 308
6.6 Virtual Memory, Protection, and Paging ..... 312
6.7 Thrashing ..... 314
6.8 NUMA and Peripheral Devices ..... 315
6.9 Segmentation ..... 316
6.10 Segments and HLA ..... 316
6.10.1 Renaming Segments Under Windows ..... 317
6.11 User Defined Segments in HLA (Windows Only) ..... 319
6.12 Controlling the Placement and Attributes of Segments in Memory (Windows Only) ..... 321
6.13 Putting It All Together ..... 325
7.1 Chapter Overview ..... 327
7.2 Connecting a CPU to the Outside World ..... 327
7.3 Read-Only, Write-Only, Read/Write, and Dual I/O Ports ..... 329
7.4 I/O (Input/Output) Mechanisms ..... 331
7.4.1 Memory Mapped Input/Output ..... 331
7.4.2 I/O Mapped Input/Output ..... 332
7.4.3 Direct Memory Access ..... 333
7.5 I/O Speed Hierarchy ..... 333
7.6 System Busses and Data Transfer Rates ..... 334
7.7 The AGP Bus ..... 336
7.8 Handshaking ..... 337
7.9 Time-outs on an I/O Port ..... 340
7.10 Interrupts and Polled I/O ..... 342
7.11 Using a Circular Queue to Buffer Input Data from an ISR ..... 343
7.12 Using a Circular Queue to Buffer Output Data for an ISR ..... 349
7.13 I/O and the Cache ..... 352
7.14 Protected Mode Operation ..... 352
7.15 Device Drivers ..... 353
7.16 Putting It All Together ..... 354

8.1 Questions ..... 355
8.2 Programming Projects ..... 361
8.3 Chapters One and Two Laboratory Exercises ..... 363
8.3.1 Memory Organization Exercises ..... 363
8.3.2 Data Alignment Exercises ..... 364
8.3.3 Readonly Segment Exercises ..... 367
8.3.4 Type Coercion Exercises ..... 367
8.3.5 Dynamic Memory Allocation Exercises ..... 368
8.4 Chapter Three Laboratory Exercises ..... 369
8.4.1 Truth Tables and Logic Equations Exercises ..... 370
8.4.2 Canonical Logic Equations Exercises ..... 371
8.4.3 Optimization Exercises ..... 372
8.4.4 Logic Evaluation Exercises ..... 372
8.5 Laboratory Exercises for Chapters Four, Five, Six, and Seven ..... 377
8.5.1 The SIMY86 Program – Some Simple Y86 Programs ..... 377
8.5.2 Simple I/O-Mapped Input/Output Operations ..... 380
8.5.3 Memory Mapped I/O ..... 381
8.5.4 DMA Exercises ..... 382
8.5.5 Interrupt Driven I/O Exercises ..... 383
8.5.6 Machine Language Programming & Instruction Encoding Exercises ..... 384
8.5.7 Self Modifying Code Exercises ..... 386
8.5.8 Virtual Memory Exercise ..... 388

1.1 Chapter Overview ..... 393
1.2 Some Additional Instructions: INTMUL, BOUND, INTO ..... 393
1.3 The QWORD and TBYTE Data Types ..... 397
© 2001, By Randall Hyde
Beta Draft - Do not distribute
Hello, World of Assembly Language
1.4 HLA Constant and Value Declarations ................................................... 397 1.4.1 Constant Types ............................................................................... 400 1.4.2 String and Character Literal Constants .......................................... 401 1.4.3 String and Text Constants in the CONST Section ......................... 402 1.4.4 Constant Expressions ..................................................................... 403 1.4.5 Multiple CONST Sections and Their Order in an HLA Program .. 405 1.4.6 The HLA VAL Section .................................................................. 406 1.4.7 Modifying VAL Objects at Arbitrary Points in Your Programs .... 406 1.5 The HLA TYPE Section .......................................................................... 407 1.6 ENUM and HLA Enumerated Data Types .............................................. 408 1.7 Pointer Data Types .................................................................................. 409 1.7.1 Using Pointers in Assembly Language .......................................... 410 1.7.2 Declaring Pointers in HLA ............................................................ 411 1.7.3 Pointer Constants and Pointer Constant Expressions .................... 411 1.7.4 Pointer Variables and Dynamic Memory Allocation ..................... 412 1.7.5 Common Pointer Problems ............................................................ 413 1.8 Putting It All Together ............................................................................. 417 2.1 Chapter Overview .................................................................................... 419 2.2 Composite Data Types ............................................................................. 419 2.3 Character Strings ..................................................................................... 
419 2.4 HLA Strings ............................................................................................. 421 2.5 Accessing the Characters Within a String ............................................... 426 2.6 The HLA String Module and Other String-Related Routines ................. 428 2.7 In-Memory Conversions .......................................................................... 437 2.8 Putting It All Together ............................................................................. 438 3.1 Chapter Overview .................................................................................... 439 3.2 The HLA Standard Library CHARS.HHF Module ................................. 439 3.3 Character Sets .......................................................................................... 441 3.4 Character Set Implementation in HLA .................................................... 442 3.5 HLA Character Set Constants and Character Set Expressions ................ 443 3.6 The IN Operator in HLA HLL Boolean Expressions .............................. 444 3.7 Character Set Support in the HLA Standard Library .............................. 445 3.8 Using Character Sets in Your HLA Programs ......................................... 447 3.9 Low-level Implementation of Set Operations ......................................... 449 3.9.1 Character Set Functions That Build Sets ....................................... 449 3.9.2 Traditional Set Operations ............................................................. 455 3.9.3 Testing Character Sets ................................................................... 458 3.10 Putting It All Together ........................................................................... 461 4.1 Chapter Overview .................................................................................... 463 4.2 Arrays ...................................................................................................... 
463 4.3 Declaring Arrays in Your HLA Programs ............................................... 464 4.4 HLA Array Constants .............................................................................. 464 4.5 Accessing Elements of a Single Dimension Array .................................. 465
4.5.1 Sorting an Array of Values ............................................................ 467 4.6 Multidimensional Arrays ......................................................................... 468 4.6.1 Row Major Ordering ...................................................................... 469 4.6.2 Column Major Ordering ................................................................. 473 4.7 Allocating Storage for Multidimensional Arrays .................................... 474 4.8 Accessing Multidimensional Array Elements in Assembly Language ... 475 4.9 Large Arrays and MASM ........................................................................ 476 4.10 Dynamic Arrays in Assembly Language ............................................... 477 4.11 HLA Standard Library Array Support ................................................... 479 4.12 Putting It All Together ........................................................................... 481 5.1 Chapter Overview .................................................................................... 483 5.2 Records ................................................................................................. 483 5.3 Record Constants ..................................................................................... 485 5.4 Arrays of Records .................................................................................... 486 5.5 Arrays/Records as Record Fields ......................................................... 487
5.6 Controlling Field Offsets Within a Record .............................................. 489 5.7 Aligning Fields Within a Record ............................................................. 490 5.8 Pointers to Records .................................................................................. 491 5.9 Unions ...................................................................................................... 492 5.10 Anonymous Unions ............................................................................... 494 5.11 Variant Types ......................................................................................... 495 5.12 Namespaces ........................................................................................... 496 5.13 Putting It All Together ........................................................................... 498 6.1 Chapter Overview .................................................................................... 501 6.2 Dates ........................................................................................................ 501 6.3 A Brief History of the Calendar ............................................................... 502 6.4 HLA Date Functions ................................................................................ 505 6.4.1 date.IsValid and date.validate ........................................................ 505 6.4.2 Checking for Leap Years ............................................................... 507 6.4.3 Obtaining the System Date ............................................................. 509 6.4.4 Date to String Conversions and Date Output ................................. 510 6.4.5 date.unpack and date.pack .............................................................. 511 6.4.6 date.Julian, date.fromJulian ............................................................ 512 6.4.7 date.datePlusDays, date.datePlusMonths, and date.daysBetween . 
512 6.4.8 date.dayNumber, date.daysLeft, and date.dayOfWeek .................. 513 6.5 Times ....................................................................................................... 514 6.5.1 time.curTime .................................................................................. 514 6.5.2 time.hmsToSecs and time.secstoHMS ........................................... 515 6.5.3 Time Input/Output .......................................................................... 515 6.6 Putting It All Together ............................................................................. 516 7.1 Chapter Overview .................................................................................... 517 7.2 File Organization ..................................................................................... 517
7.2.1 Files as Lists of Records ................................................................ 517 7.2.2 Binary vs. Text Files ...................................................................... 518 7.3 Sequential Files ........................................................................................ 520 7.4 Random Access Files ............................................................................... 527 7.5 ISAM (Indexed Sequential Access Method) Files .................................. 530 7.6 Truncating a File ...................................................................................... 533 7.7 File Utility Routines ................................................................................ 534 7.7.1 Copying, Moving, and Renaming Files ........................................ 534 7.7.2 Computing the File Size ................................................................. 536 7.7.3 Deleting Files ................................................................................. 538 7.8 Directory Operations ............................................................................... 538 7.9 Putting It All Together ............................................................................. 539 8.1 Chapter Overview .................................................................................... 541 8.2 Procedures ............................................................................................... 541 8.3 Saving the State of the Machine .............................................................. 543 8.4 Prematurely Returning from a Procedure ................................................ 546 8.5 Local Variables ........................................................................................ 547 8.6 Other Local and Global Symbol Types ................................................... 551 8.7 Parameters ............................................................................................... 
552 8.7.1 Pass by Value ................................................................................. 552 8.7.2 Pass by Reference .......................................................................... 555 8.8 Functions and Function Results ............................................................... 557 8.8.1 Returning Function Results ............................................................ 558 8.8.2 Instruction Composition in HLA ................................................... 558 8.8.3 The HLA RETURNS Option in Procedures .................................. 560 8.9 Side Effects .............................................................................................. 562 8.10 Recursion ............................................................................................... 563 8.11 Forward Procedures ............................................................................... 567 8.12 Putting It All Together ........................................................................... 567 9.1 Chapter Overview .................................................................................... 569 9.2 Managing Large Programs ...................................................................... 569 9.3 The #INCLUDE Directive ....................................................................... 570 9.4 Ignoring Duplicate Include Operations ................................................... 571 9.5 UNITs and the EXTERNAL Directive ................................................. 572 9.5.1 Behavior of the EXTERNAL Directive ......................................... 575 9.5.2 Header Files in HLA ...................................................................... 576 9.6 Make Files ............................................................................................... 578 9.7 Code Reuse .............................................................................................. 
580 9.8 Creating and Managing Libraries ............................................................ 581 9.9 Name Space Pollution ............................................................................. 583 9.10 Putting It All Together ........................................................................... 585 10.1 Chapter Overview .................................................................................. 587
10.2 80x86 Integer Arithmetic Instructions ................................................... 587 10.2.1 The MUL and IMUL Instructions ................................................ 587 10.2.2 The DIV and IDIV Instructions ................................................... 589 10.2.3 The CMP Instruction .................................................................... 592 10.2.4 The SETcc Instructions ................................................................ 593 10.2.5 The TEST Instruction ................................................................... 596 10.3 Arithmetic Expressions .......................................................................... 597 10.3.1 Simple Assignments ..................................................................... 597 10.3.2 Simple Expressions ...................................................................... 598 10.3.3 Complex Expressions ................................................................... 600 10.3.4 Commutative Operators ............................................................... 603 10.4 Logical (Boolean) Expressions .............................................................. 604 10.5 Machine and Arithmetic Idioms ............................................................ 606 10.5.1 Multiplying without MUL, IMUL, or INTMUL ......................... 606 10.5.2 Division Without DIV or IDIV .................................................... 607 10.5.3 Implementing Modulo-N Counters with AND ............................ 608 10.5.4 Careless Use of Machine Idioms .................................................. 608 10.6 The HLA (Pseudo) Random Number Unit ............................................ 608 10.7 Putting It All Together ........................................................................... 610 11.1 Chapter Overview .................................................................................. 
611 11.2 Floating Point Arithmetic ...................................................................... 611 11.2.1 FPU Registers ............................................................................... 611 11.2.1.1 FPU Data Registers ............................................................. 612 11.2.1.2 The FPU Control Register ................................................... 612 11.2.1.3 The FPU Status Register ...................................................... 615 11.2.2 FPU Data Types ........................................................................... 619 11.2.3 The FPU Instruction Set ............................................................... 621 11.2.4 FPU Data Movement Instructions ................................................ 621 11.2.4.1 The FLD Instruction ............................................................ 621 11.2.4.2 The FST and FSTP Instructions .......................................... 622 11.2.4.3 The FXCH Instruction ......................................................... 622 11.2.5 Conversions .................................................................................. 623 11.2.5.1 The FILD Instruction ........................................................... 623 11.2.5.2 The FIST and FISTP Instructions ........................................ 623 11.2.5.3 The FBLD and FBSTP Instructions .................................... 624 11.2.6 Arithmetic Instructions ................................................................. 624 11.2.6.1 The FADD and FADDP Instructions .................................. 625 11.2.6.2 The FSUB, FSUBP, FSUBR, and FSUBRP Instructions .... 625 11.2.6.3 The FMUL and FMULP Instructions .................................. 626 11.2.6.4 The FDIV, FDIVP, FDIVR, and FDIVRP Instructions ...... 626 11.2.6.5 The FSQRT Instruction ..................................................... 627 11.2.6.6 The FPREM and FPREM1 Instructions ........................... 
628 11.2.6.7 The FRNDINT Instruction .................................................. 628 11.2.6.8 The FABS Instruction .......................................................... 628 11.2.6.9 The FCHS Instruction ....................................................... 629 11.2.7 Comparison Instructions ............................................................ 629 11.2.7.1 The FCOM, FCOMP, and FCOMPP Instructions ............... 629 11.2.7.2 The FTST Instruction ........................................................ 630 11.2.8 Constant Instructions .............................................................. 631
11.2.9 Transcendental Instructions ......................................................... 631 11.2.9.1 The F2XM1 Instruction ..................................................... 631 11.2.9.2 The FSIN, FCOS, and FSINCOS Instructions ............... 631 11.2.9.3 The FPTAN Instruction ..................................................... 632 11.2.9.4 The FPATAN Instruction .................................................... 632 11.2.9.5 The FYL2X Instruction ....................................................... 632 11.2.9.6 The FYL2XP1 Instruction ................................................... 632 11.2.10 Miscellaneous instructions ......................................................... 633 11.2.10.1 The FINIT and FNINIT Instructions ................................. 633 11.2.10.2 The FLDCW and FSTCW Instructions .......................... 633 11.2.10.3 The FCLEX and FNCLEX Instructions ......................... 633 11.2.10.4 The FSTSW and FNSTSW Instructions .......................... 633 11.2.11 Integer Operations .............................................................. 634 11.3 Converting Floating Point Expressions to Assembly Language ........... 634 11.3.1 Converting Arithmetic Expressions to Postfix Notation .............. 635 11.3.2 Converting Postfix Notation to Assembly Language .................. 637 11.3.3 Mixed Integer and Floating Point Arithmetic .............................. 638 11.4 HLA Standard Library Support for Floating Point Arithmetic ............. 638 11.4.1 The stdin.getf and fileio.getf Functions ....................................... 639 11.4.2 Trigonometric Functions in the HLA Math Library .................... 639 11.4.3 Exponential and Logarithmic Functions in the HLA Math Library 640 11.5 Sample Program .................................................................................... 640 11.6 Putting It All Together ........................................................................... 
646 12.1 Chapter Overview .................................................................................. 647 12.2 Tables ..................................................................................................... 647 12.2.1 Function Computation via Table Look-up ................................... 647 12.2.2 Domain Conditioning ................................................................... 650 12.2.3 Generating Tables ........................................................................ 651 12.3 High Performance Implementation of cs.rangeChar ............................. 655 13.1 Questions ............................................................................................... 663 13.2 Programming Projects ........................................................................... 670 13.3 Laboratory Exercises ............................................................................. 677 13.3.1 Using the BOUND Instruction to Check Array Indices .............. 677 13.3.2 Using TEXT Constants in Your Programs .................................. 680 13.3.3 Constant Expressions Lab Exercise ............................................. 682 13.3.4 Pointers and Pointer Constants Exercises .................................... 684 13.3.5 String Exercises ............................................................................ 685 13.3.6 String and Character Set Exercises .............................................. 687 13.3.7 Console Array Exercise ............................................................... 691 13.3.8 Multidimensional Array Exercises ............................................... 693 13.3.9 Console Attributes Laboratory Exercise ...................................... 696 13.3.10 Records, Arrays, and Pointers Laboratory Exercise .................. 698 13.3.11 Separate Compilation Exercises ................................................. 
704 13.3.12 The HLA (Pseudo) Random Number Unit ................................ 710 13.3.13 File I/O in HLA .......................................................................... 711 13.3.14 Timing Various Arithmetic Instructions .................................... 712 13.3.15 Using the RDTSC Instruction to Time a Code Sequence .......... 715 13.3.16 Timing Floating Point Instructions ............................................ 719
13.3.17 Table Lookup Exercise .............................................................. 722 1.1 Chapter Overview .................................................................................... 727 1.2 Conjunction, Disjunction, and Negation in Boolean Expressions ........... 727 1.3 TRY..ENDTRY ....................................................................................... 729 1.3.1 Nesting TRY..ENDTRY Statements .............................................. 730 1.3.2 The UNPROTECTED Clause in a TRY..ENDTRY Statement ..... 732 1.3.3 The ANYEXCEPTION Clause in a TRY..ENDTRY Statement ... 735 1.3.4 Raising User-Defined Exceptions .................................................. 735 1.3.5 Reraising Exceptions in a TRY..ENDTRY Statement ................... 737 1.3.6 A List of the Predefined HLA Exceptions ..................................... 737 1.3.7 How to Handle Exceptions in Your Programs ............................... 737 1.3.8 Registers and the TRY..ENDTRY Statement ................................ 739 1.4 BEGIN..EXIT..EXITIF..END ................................................................. 740 1.5 CONTINUE..CONTINUEIF ................................................................... 745 1.6 SWITCH..CASE..DEFAULT..ENDSWITCH ........................................ 747 1.7 Putting It All Together ............................................................................. 749 2.1 Chapter Overview .................................................................................... 751 2.2 Low Level Control Structures ................................................................. 751 2.3 Statement Labels ...................................................................................... 751 2.4 Unconditional Transfer of Control (JMP) ............................................... 753 2.5 The Conditional Jump Instructions .......................................................... 
755 2.6 “Medium-Level” Control Structures: JT and JF ...................................... 759 2.7 Implementing Common Control Structures in Assembly Language ....... 759 2.8 Introduction to Decisions ......................................................................... 760 2.8.1 IF..THEN..ELSE Sequences ........................................................ 761 2.8.2 Translating HLA IF Statements into Pure Assembly Language .... 764 2.8.3 Implementing Complex IF Statements Using Complete Boolean Evaluation 768 2.8.4 Short Circuit Boolean Evaluation .................................................. 769 2.8.5 Short Circuit vs. Complete Boolean Evaluation ............................ 770 2.8.6 Efficient Implementation of IF Statements in Assembly Language 772 2.8.7 SWITCH/CASE Statements .......................................................... 776 2.9 State Machines and Indirect Jumps ........................................................ 784
2.10 Spaghetti Code ....................................................................................... 786 2.11 Loops ..................................................................................................... 787 2.11.1 While Loops ................................................................................. 787 2.11.2 Repeat..Until Loops ..................................................................... 788 2.11.3 FOREVER..ENDFOR Loops ....................................................... 789 2.11.4 FOR Loops ................................................................................... 790 2.11.5 The BREAK and CONTINUE Statements .................................. 791 2.11.6 Register Usage and Loops .......................................................... 795 2.12 Performance Improvements ................................................................... 796 2.12.1 Moving the Termination Condition to the End of a Loop ........... 796 2.12.2 Executing the Loop Backwards ................................................... 798 2.12.3 Loop Invariant Computations .................................................... 799 2.12.4 Unraveling Loops ....................................................................... 800
2.12.5 Induction Variables ..................................................................... 801
2.13 Hybrid Control Structures in HLA ........................................................ 802 2.14 Putting It All Together ........................................................................... 804 3.1 Chapter Overview .................................................................................... 805 3.2 Procedures and the CALL Instruction ..................................................... 805 3.3 Procedures and the Stack ......................................................................... 807 3.4 Activation Records .................................................................................. 810 3.5 The Standard Entry Sequence .................................................................. 813 3.6 The Standard Exit Sequence .................................................................... 814 3.7 HLA Local Variables ............................................................................... 815 3.8 Parameters ............................................................................................... 816 3.8.1 Pass by Value ................................................................................. 817 3.8.2 Pass by Reference .......................................................................... 817 3.8.3 Passing Parameters in Registers ................................................... 818 3.8.4 Passing Parameters in the Code Stream ........................................ 820 3.8.5 Passing Parameters on the Stack ................................................... 822 3.8.5.1 Accessing Value Parameters on the Stack ............................. 824 3.8.5.2 Passing Value Parameters on the Stack ................................. 825 3.8.5.3 Accessing Reference Parameters on the Stack ...................... 831 3.8.5.4 Passing Reference Parameters on the Stack .......................... 834 3.8.5.5 Passing Formal Parameters as Actual Parameters ................. 
836 3.8.5.6 HLA Hybrid Parameter Passing Facilities ............................ 838 3.8.5.7 Mixing Register and Stack Based Parameters ....................... 839 3.9 Procedure Pointers ................................................................................... 839 3.10 Procedural Parameters ........................................................................... 842 3.11 Untyped Reference Parameters ............................................................. 843 3.12 Iterators and the FOREACH Loop ........................................................ 843 3.13 Sample Programs ................................................................................... 846 3.13.1 Generating the Fibonacci Sequence Using an Iterator ................. 846 3.13.2 Outer Product Computation with Procedural Parameters ........... 848 3.14 Putting It All Together ........................................................................... 851 4.1 Chapter Overview .................................................................................... 853 4.2 Multiprecision Operations ....................................................................... 853 4.2.1 Multiprecision Addition Operations ........................................... 853 4.2.2 Multiprecision Subtraction Operations .......................................... 856 4.2.3 Extended Precision Comparisons ................................................... 857 4.2.4 Extended Precision Multiplication ................................................. 860 4.2.5 Extended Precision Division .......................................................... 864 4.2.6 Extended Precision NEG Operations ............................................. 872 4.2.7 Extended Precision AND Operations ............................................ 873 4.2.8 Extended Precision OR Operations ................................................ 874 4.2.9 Extended Precision XOR Operations ............................................. 
874 4.2.10 Extended Precision NOT Operations ........................................... 874 4.2.11 Extended Precision Shift Operations ........................................... 875 4.2.12 Extended Precision Rotate Operations ......................................... 878 4.2.13 Extended Precision I/O ................................................................ 878
4.2.13.1 Extended Precision Hexadecimal Output ............................ 879
4.2.13.2 Extended Precision Unsigned Decimal Output ................... 879
4.2.13.3 Extended Precision Signed Decimal Output ....................... 882
4.2.13.4 Extended Precision Formatted I/O ...................................... 883
4.2.13.5 Extended Precision Input Routines ...................................... 884
4.2.13.6 Extended Precision Hexadecimal Input ............................... 887
4.2.13.7 Extended Precision Unsigned Decimal Input ...................... 891
4.2.13.8 Extended Precision Signed Decimal Input .......................... 895
4.3 Operating on Different Sized Operands .................................................. 895 4.4 Decimal Arithmetic ................................................................................. 897 4.4.1 Literal BCD Constants ................................................................... 898 4.4.2 The 80x86 DAA and DAS Instructions ......................................... 898 4.4.3 The 80x86 AAA, AAS, AAM, and AAD Instructions .................. 900 4.4.4 Packed Decimal Arithmetic Using the FPU ................................... 901 4.5 Sample Program ....................................................................................... 903 4.6 Putting It All Together ............................................................................. 906 5.1 Chapter Overview .................................................................................... 909 5.2 What is Bit Data, Anyway? ..................................................................... 909 5.3 Instructions That Manipulate Bits ........................................................... 910 5.4 The Carry Flag as a Bit Accumulator ...................................................... 916 5.5 Packing and Unpacking Bit Strings ......................................................... 917 5.6 Coalescing Bit Sets and Distributing Bit Strings ..................................... 920 5.7 Packed Arrays of Bit Strings ................................................................... 922 5.8 Searching for a Bit ................................................................................... 923 5.9 Counting Bits ........................................................................................... 925 5.10 Reversing a Bit String ............................................................................ 927 5.11 Merging Bit Strings ............................................................................... 
929 5.12 Extracting Bit Strings ............................................................................ 930 5.13 Searching for a Bit Pattern ..................................................................... 931 5.14 The HLA Standard Library Bits Module ............................................... 932 5.15 Putting It All Together ........................................................................... 933 6.1 Chapter Overview .................................................................................... 935 6.2 The 80x86 String Instructions ................................................................ 935 6.2.1 How the String Instructions Operate .............................................. 936 6.2.2 The REP/REPE/REPZ and REPNZ/REPNE Prefixes ................... 936 6.2.3 The Direction Flag ......................................................................... 937 6.2.4 The MOVS Instruction ................................................................... 938 6.2.5 The CMPS Instruction .................................................................... 943 6.2.6 The SCAS Instruction .................................................................... 946 6.2.7 The STOS Instruction .................................................................... 946 6.2.8 The LODS Instruction .................................................................... 947 6.2.9 Building Complex String Functions from LODS and STOS ......... 947 6.3 Putting It All Together ............................................................................. 948 7.1 Chapter Overview .................................................................................... 949
© 2001, By Randall Hyde
Beta Draft - Do not distribute
7.2 Introduction to the Compile-Time Language (CTL) ............................... 949 7.3 The #PRINT and #ERROR Statements ................................................... 951 7.4 Compile-Time Constants and Variables .................................................. 952 7.5 Compile-Time Expressions and Operators .............................................. 953 7.6 Compile-Time Functions ......................................................................... 956 7.6.1 Type Conversion Compile-time Functions .................................... 957 7.6.2 Numeric Compile-Time Functions ................................................ 957 7.6.3 Character Classification Compile-Time Functions ........................ 958 7.6.4 Compile-Time String Functions ..................................................... 958 7.6.5 Compile-Time Pattern Matching Functions ................................... 958 7.6.6 Compile-Time Symbol Information ............................................... 959 7.6.7 Compile-Time Expression Classification Functions ...................... 960 7.6.8 Miscellaneous Compile-Time Functions ....................................... 961 7.6.9 Predefined Compile-Time Variables ............................................. 961 7.6.10 Compile-Time Type Conversions of TEXT Objects ................... 961 7.7 Conditional Compilation (Compile-Time Decisions) ............................. 962 7.8 Repetitive Compilation (Compile-Time Loops) ...................................... 966 7.9 Putting It All Together ............................................................................. 968 8.1 Chapter Overview .................................................................................... 969 8.2 Macros (Compile-Time Procedures) ....................................................... 969 8.2.1 Standard Macros ............................................................................ 
969 8.2.2 Macro Parameters .......................................................................... 971 8.2.2.1 Standard Macro Parameter Expansion .................................. 971 8.2.2.2 Macros with a Variable Number of Parameters .................... 974 8.2.2.3 Required Versus Optional Macro Parameters ....................... 975 8.2.2.4 The "#(" and ")#" Macro Parameter Brackets ....................... 976 8.2.2.5 Eager vs. Deferred Macro Parameter Evaluation .................. 977 8.2.3 Local Symbols in a Macro ............................................................. 981 8.2.4 Macros as Compile-Time Procedures ............................................ 985 8.2.5 Multi-part (Context-Free) Macros ................................................. 985 8.2.6 Simulating Function Overloading with Macros ............................. 990 8.3 Writing Compile-Time "Programs" ......................................................... 995 8.3.1 Constructing Data Tables at Compile Time ................................... 996 8.3.2 Unrolling Loops ............................................................................. 999 8.4 Using Macros in Different Source Files .................................................. 1001 8.5 Putting It All Together ............................................................................. 1001 9.1 Chapter Overview .................................................................................... 1003 9.2 Introduction to DSELs in HLA ............................................................... 1003 9.2.1 Implementing the Standard HLA Control Structures .................... 1003 9.2.1.1 The FOREVER Loop ............................................................ 1004 9.2.1.2 The WHILE Loop .................................................................. 1007 9.2.1.3 The IF Statement ................................................................... 
1009 9.2.2 The HLA SWITCH/CASE Statement ............................................ 1015 9.2.3 A Modified WHILE Loop .............................................................. 1026 9.2.4 A Modified IF..ELSE..ENDIF Statement ...................................... 1030 9.3 Sample Program: A Simple Expression Compiler .................................. 1036
9.4 Putting It All Together ............................................................................. 1057 10.1 Chapter Overview .................................................................................. 1059 10.2 General Principles .................................................................................. 1059 10.3 Classes in HLA ...................................................................................... 1061 10.4 Objects ................................................................................................... 1063 10.5 Inheritance ............................................................................................. 1064 10.6 Overriding .............................................................................................. 1065 10.7 Virtual Methods vs. Static Procedures ................................................... 1066 10.8 Writing Class Methods, Iterators, and Procedures ................................ 1067 10.9 Object Implementation .......................................................................... 1071 10.9.1 Virtual Method Tables ................................................................. 1073 10.9.2 Object Representation with Inheritance ....................................... 1075 10.10 Constructors and Object Initialization ................................................. 1079 10.10.1 Dynamic Object Allocation Within the Constructor .................. 1081 10.10.2 Constructors and Inheritance ...................................................... 1082 10.10.3 Constructor Parameters and Procedure Overloading ................. 1085 10.11 Destructors ........................................................................................... 1086 10.12 HLA’s “_initialize_” and “_finalize_” Strings .................................... 1087 10.13 Abstract Methods ................................................................................. 
1091 10.14 Run-time Type Information (RTTI) .................................................... 1094 10.15 Calling Base Class Methods ................................................................ 1095 10.16 Sample Program ................................................................................... 1096 10.17 Putting It All Together ......................................................................... 1112 11.1 Chapter Overview .................................................................................. 1113 11.2 Determining if a CPU Supports the MMX Instruction Set .................... 1113 11.3 The MMX Programming Environment ................................................. 1114 11.3.1 The MMX Registers ..................................................................... 1114 11.3.2 The MMX Data Types ................................................................. 1116 11.4 The Purpose of the MMX Instruction Set .............................................. 1117 11.5 Saturation Arithmetic and Wraparound Mode ...................................... 1118 11.6 MMX Instruction Operands ................................................................... 1118 11.7 MMX Technology Instructions ............................................................. 1123 11.7.1 MMX Data Transfer Instructions ................................................. 1123 11.7.2 MMX Conversion Instructions .................................................... 1123 11.7.3 MMX Packed Arithmetic Instructions ......................................... 1131 11.7.4 MMX Logic Instructions .............................................................. 1133 11.7.5 MMX Comparison Instructions ................................................... 1134 11.7.6 MMX Shift Instructions ............................................................... 1138 11.8 The EMMS Instruction .......................................................................... 
1139 11.9 The MMX Programming Paradigm ....................................................... 1140 11.10 Putting It All Together ......................................................................... 1148 12.1 Chapter Overview .................................................................................. 1151
12.2 Mixing HLA and MASM/Gas Code in the Same Program ................... 1151 12.2.1 In-Line (MASM/Gas) Assembly Code in Your HLA Programs . 1151 12.2.2 Linking MASM/Gas-Assembled Modules with HLA Modules .. 1154 12.3 Programming in Delphi/Kylix and HLA ............................................... 1157 12.3.1 Linking HLA Modules With Delphi Programs ............................ 1158 12.3.2 Register Preservation ................................................................... 1161 12.3.3 Function Results ........................................................................... 1161 12.3.4 Calling Conventions ..................................................................... 1167 12.3.5 Pass by Value, Reference, CONST, and OUT in Delphi ............. 1172 12.3.6 Scalar Data Type Correspondence Between Delphi and HLA .... 1173 12.3.7 Passing String Data Between Delphi and HLA Code .................. 1175 12.3.8 Passing Record Data Between HLA and Delphi ......................... 1177 12.3.9 Passing Set Data Between Delphi and HLA ................................ 1181 12.3.10 Passing Array Data Between HLA and Delphi .......................... 1181 12.3.11 Delphi Limitations When Linking with (Non-TASM) Assembly Code 1182 12.3.12 Referencing Delphi Objects from HLA Code ............................ 1182 12.4 Programming in C/C++ and HLA ......................................................... 1182 12.4.1 Linking HLA Modules With C/C++ Programs ............................ 1183 12.4.2 Register Preservation ................................................................... 1186 12.4.3 Function Results ........................................................................... 1186 12.4.4 Calling Conventions ..................................................................... 1186 12.4.5 Pass by Value and Reference in C/C++ ....................................... 1189 12.4.6 Scalar Data Type Correspondence Between C/C++ and HLA .... 
1189 12.4.7 Passing String Data Between C/C++ and HLA Code .................. 1191 12.4.8 Passing Record/Structure Data Between HLA and C/C++ .......... 1191 12.4.9 Passing Array Data Between HLA and C/C++ ............................ 1192 12.5 Putting It All Together ........................................................................... 1193 13.1 Questions ............................................................................................... 1195 13.2 Programming Problems ......................................................................... 1203 13.3 Laboratory Exercises ............................................................................. 1212 13.3.1 Dynamically Nested TRY..ENDTRY Statements ....................... 1213 13.3.2 The TRY..ENDTRY Unprotected Section .................................. 1214 13.3.3 Performance of SWITCH Statement ............................................ 1215 13.3.4 Complete Versus Short Circuit Boolean Evaluation .................... 1219 13.3.5 Conversion of High Level Language Statements to Pure Assembly 1222 13.3.6 Activation Record Exercises ........................................................ 1222 13.3.6.1 Automatic Activation Record Generation and Access ........ 1222 13.3.6.2 The _vars_ and _parms_ Constants ..................................... 1224 13.3.6.3 Manually Constructing an Activation Record ..................... 1226 13.3.7 Reference Parameter Exercise ..................................................... 1228 13.3.8 Procedural Parameter Exercise .................................................... 1231 13.3.9 Iterator Exercises .......................................................................... 1234 13.3.10 Performance of Multiprecision Multiplication and Division Operations 1237 13.3.11 Performance of the Extended Precision NEG Operation ........... 1237 13.3.12 Testing the Extended Precision Input Routines ......................... 
1238 13.3.13 Illegal Decimal Operations ........................................................ 1238 13.3.14 MOVS Performance Exercise #1 ............................................... 1238 13.3.15 MOVS Performance Exercise #2 ............................................... 1240 13.3.16 Memory Performance Exercise .................................................. 1242
13.3.17 The Performance of Length-Prefixed vs. Zero-Terminated Strings 1243
13.3.18 Introduction to Compile-Time Programs ................................... 1249
13.3.19 Conditional Compilation and Debug Code ............................... 1250
13.3.20 The Assert Macro ....................................................................... 1252
13.3.21 Demonstration of Compile-Time Loops (#while) ...................... 1254
13.3.22 Writing a Trace Macro ............................................................... 1256
13.3.23 Overloading ................................................................................ 1258
13.3.24 Multi-part Macros and RatASM (Rational Assembly) .............. 1261
13.3.25 Virtual Methods vs. Static Procedures in a Class ...................... 1264
13.3.26 Using the _initialize_ and _finalize_ Strings in a Program ........ 1267
13.3.27 Using RTTI in a Program ........................................................... 1270
1.1 Chapter Overview .................................................................................... 1279 1.2 First Class Objects ................................................................................... 1279 1.3 Thunks ..................................................................................................... 1281 1.4 Initializing Thunks ................................................................................... 1282 1.5 Manipulating Thunks ............................................................................... 1283 1.5.1 Assigning Thunks ........................................................................... 1283 1.5.2 Comparing Thunks ......................................................................... 1284 1.5.3 Passing Thunks as Parameters ....................................................... 1285 1.5.4 Returning Thunks as Function Results .......................................... 1286 1.6 Activation Record Lifetimes and Thunks ................................................ 1288 1.7 Comparing Thunks and Objects .............................................................. 1289 1.8 An Example of a Thunk Using the Fibonacci Function .......................... 1289 1.9 Thunks and Artificial Intelligence Code ................................................. 1294 1.10 Thunks as Triggers ................................................................................ 1296 1.11 Jumping Out of a Thunk ........................................................................ 1299 1.12 Handling Exceptions with Thunks ......................................................... 1302 1.13 Using Thunks in an Appropriate Manner .............................................. 1302 1.14 Putting It All Together ........................................................................... 1303 2.1 Chapter Overview .................................................................................... 
1305 2.2 Review of Iterators .................................................................................. 1305 2.2.1 Implementing Iterators Using In-Line Expansion .......................... 1307 2.2.2 Implementing Iterators with Resume Frames ................................ 1308 2.3 Other Possible Iterator Implementations ................................................. 1314 2.4 Breaking Out of a FOREACH Loop ....................................................... 1316 2.5 An Iterator Implementation of the Fibonacci Number Generator ........... 1317 2.6 Iterators and Recursion ............................................................................ 1323 2.7 Calling Other Procedures Within an Iterator ........................................... 1327 2.8 Iterators Within Classes ........................................................................... 1327 2.9 Putting It All Together ............................................................................. 1327 3.1 Chapter Overview .................................................................................... 1329 3.2 Coroutines ................................................................................................ 1329 3.3 Parameters and Register Values in Coroutine Calls ................................ 1334
3.4 Recursion, Reentrancy, and Variables ..................................................... 1335 3.5 Generators ................................................................................................ 1337 3.6 Exceptions and Coroutines ...................................................................... 1340 3.7 Putting It All Together ............................................................................. 1340 4.1 Chapter Overview .................................................................................... 1341 4.2 Parameters ............................................................................................... 1341 4.3 Where You Can Pass Parameters ............................................................ 1341 4.3.1 Passing Parameters in (Integer) Registers ...................................... 1342 4.3.2 Passing Parameters in FPU and MMX Registers ........................... 1345 4.3.3 Passing Parameters in Global Variables ........................................ 1346 4.3.4 Passing Parameters on the Stack .................................................... 1347 4.3.5 Passing Parameters in the Code Stream ......................................... 1351 4.3.6 Passing Parameters via a Parameter Block .................................... 1353 4.4 How You Can Pass Parameters ............................................................... 1354 4.4.1 Pass by Value-Result ..................................................................... 1354 4.4.2 Pass by Result ................................................................................ 1359 4.4.3 Pass by Name ................................................................................. 1360 4.4.4 Pass by Lazy-Evaluation ................................................................ 1362 4.5 Passing Parameters as Parameters to Another Procedure ........................ 1363 4.5.1 Passing Reference Parameters to Other Procedures ...................... 
1363 4.5.2 Passing Value-Result and Result Parameters as Parameters ......... 1365 4.5.3 Passing Name Parameters to Other Procedures ............................. 1365 4.5.4 Passing Lazy Evaluation Parameters as Parameters ...................... 1366 4.5.5 Parameter Passing Summary .......................................................... 1366 4.6 Variable Parameter Lists ......................................................................... 1368 4.7 Function Results ...................................................................................... 1370 4.7.1 Returning Function Results in a Register ...................................... 1370 4.7.2 Returning Function Results on the Stack ....................................... 1371 4.7.3 Returning Function Results in Memory Locations ........................ 1371 4.7.4 Returning Large Function Results ................................................. 1372 4.8 Putting It All Together ............................................................................. 1372 5.1 Chapter Overview .................................................................................... 1375 5.2 Lexical Nesting, Static Links, and Displays ............................................ 1375 5.2.1 Scope .............................................................................................. 1375 5.2.2 Unit Activation, Address Binding, and Variable Lifetime ..... 1376 5.2.3 Static Links .................................................................................... 1377 5.2.4 Accessing Non-Local Variables Using Static Links ...................... 1382 5.2.5 Nesting Procedures in HLA ........................................................... 1384 5.2.6 The Display .................................................................................... 1387 5.2.7 The 80x86 ENTER and LEAVE Instructions ................................ 1391 5.3 Passing Variables at Different Lex Levels as Parameters. ...................... 
1394 5.3.1 Passing Parameters by Value ......................................................... 1394 5.3.2 Passing Parameters by Reference, Result, and Value-Result ... 1395 5.3.3 Passing Parameters by Name and Lazy-Evaluation in a Block Structured Language 1395 5.4 Passing Procedures as Parameters ........................................................... 1396
5.5 Faking Intermediate Variable Access ...................................................... 1396 5.6 Putting It All Together ............................................................................. 1397 6.1 Questions ................................................................................................. 1399 6.2 Programming Problems ........................................................................... 1402 C.1 Introduction ............................................................................................. 1411 C.1.1 Intended Audience ......................................................................... 1411 C.1.2 Readability Metrics ....................................................................... 1411 C.1.3 How to Achieve Readability ......................................................... 1412 C.1.4 How This Document is Organized ................................................ 1413 C.1.5 Guidelines, Rules, Enforced Rules, and Exceptions ..................... 1413 C.1.6 Source Language Concerns ........................................................... 1414 C.2 Program Organization ............................................................................. 1414 C.2.1 Library Functions .......................................................................... 1414 C.2.2 Common Object Modules .............................................................. 1415 C.2.3 Local Modules ............................................................................... 1415 C.2.4 Program Make Files ...................................................................... 1416 C.3 Module Organization .............................................................................. 1417 C.3.1 Module Attributes .......................................................................... 1417 C.3.1.1 Module Cohesion .................................................................. 
1417 C.3.1.2 Module Coupling .................................................................. 1418 C.3.1.3 Physical Organization of Modules ........................................ 1418 C.3.1.4 Module Interface ................................................................... 1419 C.4 Program Unit Organization ..................................................................... 1420 C.4.1 Routine Cohesion .......................................................................... 1420 C.4.2 Routine Coupling ........................................................................... 1421 C.4.3 Routine Size ................................................................................... 1421 C.5 Statement Organization ........................................................................... 1422 C.5.1 Writing “Pure” Assembly Code .................................................... 1422 C.5.2 Using HLA’s High Level Control Statements ............................... 1424 C.6 Comments ............................................................................................... 1430 C.6.1 What is a Bad Comment? .............................................................. 1430 C.6.2 What is a Good Comment? ............................................................ 1431 C.6.3 Endline vs. Standalone Comments ................................................ 1432 C.6.4 Unfinished Code ............................................................................ 1433 C.6.5 Cross References in Code to Other Documents ............................ 1434 C.7 Names, Instructions, Operators, and Operands ....................................... 1435 C.7.1 Names ............................................................................................ 1435 C.7.1.1 Naming Conventions ............................................................ 1437 C.7.1.2 Alphabetic Case Considerations ........................................... 
1437 C.7.1.3 Abbreviations ........................................................................ 1438 C.7.1.4 The Position of Components Within an Identifier ................ 1439 C.7.1.5 Names to Avoid .................................................................... 1440 C.7.1.6 Special Identifiers ................................................................. 1441 C.7.2 Instructions, Directives, and Pseudo-Opcodes .............................. 1442 C.7.2.1 Choosing the Best Instruction Sequence ............................... 1442 C.7.2.2 Control Structures ................................................................. 1443 C.7.2.3 Instruction Synonyms ........................................................... 1446
C.8 Data Types .............................................................................................. 1447 C.8.1 Declaring Structures in Assembly Language ................................ 1447 H.1 Conversion Functions ............................................................................. 1493 H.2 Numeric Functions .................................................................................. 1495 H.3 Date/Time Functions .............................................................................. 1498 H.4 Classification Functions .......................................................................... 1498 H.5 String and Character Set Functions ........................................................ 1500 H.6 Pattern Matching Functions .................................................................... 1504 H.6.1 String/Cset Pattern Matching Functions ....................................... 1505 H.6.2 String/Character Pattern Matching Functions ............................... 1510 H.6.3 String/Case Insensitive Character Pattern Matching Functions .... 1514 H.6.4 String/String Pattern Matching Functions ..................................... 1516 H.6.5 String/Misc Pattern Matching Functions ...................................... 1517 H.7 HLA Information and Symbol Table Functions ..................................... 1521 H.8 Compile-Time Variables ........................................................................ 1527 H.9 Miscellaneous Compile-Time Functions ................................................ 1528 J.1 The @TRACE Pseudo-Variable .............................................................. 1533 J.2 The Assert Macro ..................................................................................... 1536 L.1 The HLA Standard Library ..................................................................... 1541 L.2 Compiling to MASM Code -- The Final Word .......................................
1542 L.3 The HLA if..then..endif Statement, Part I ............................................... 1547 L.4 Boolean Expressions in HLA Control Structures ................................... 1548 L.5 The JT/JF Pseudo-Instructions ................................................................ 1554 L.6 The HLA if..then..elseif..else..endif Statement, Part II ........................... 1554 L.7 The While Statement ............................................................................... 1558 L.8 repeat..until .............................................................................................. 1559 L.9 for..endfor ................................................................................................ 1559 L.10 forever..endfor ....................................................................................... 1559 L.11 break, breakif ......................................................................................... 1559 L.12 continue, continueif ............................................................................... 1559 L.13 begin..end, exit, exitif ............................................................................ 1559 L.14 foreach..endfor ...................................................................................... 1559 L.15 try..unprotect..exception..anyexception..endtry, raise ........................... 1559
Volume One: Data Representation

Chapter One: Foreword
An introduction to this text and the purpose behind this text.
Chapter Two: Hello, World of Assembly Language
A brief introduction to assembly language programming using the HLA language.
Chapter Three: Data Representation
A discussion of numeric representation on the computer.
Chapter Four: More Data Representation
Advanced numeric and non-numeric computer data representation.
Chapter Five: Questions, Projects, and Laboratory Exercises
Test what you’ve learned in the previous chapters!

These five chapters are appropriate for all courses teaching machine organization and assembly language programming.
Page 1
Page 2
© 2001, By Randall Hyde
Beta Draft - Do not distribute
Chapter One: Foreword
Nearly every text has a throw-away chapter as Chapter One. Here’s my version. Seriously, though, some important copyright, instructional, and support information appears in this chapter. So you’ll probably want to read this stuff. Instructors will definitely want to review this material.
1.1 Foreword to the HLA Version of “The Art of Assembly...”

In 1987 I began work on a text I entitled “How to Program the IBM PC, Using 8088 Assembly Language.” First, the 8088 faded into history; shortly thereafter the phrase “IBM PC” and even “IBM PC Compatible” became far less dominant in the industry, so I retitled the text “The Art of Assembly Language Programming.” I used this text in my courses at Cal Poly Pomona and UC Riverside for many years, getting good reviews on the text (not to mention lots of suggestions and corrections). Sometime around 1994-1995, I converted the text to HTML and posted an electronic version on the Internet. The rest, as they say, is history. A week doesn’t go by that I don’t get several emails praising me for releasing such a fine text on the Internet. Indeed, I only hear three really big complaints about the text: (1) It’s a University textbook and some people don’t like to read textbooks, (2) It’s 16-bit DOS-based, and (3) there isn’t a print version of the text.

Well, I make no apologies for complaint #1. The whole reason I wrote the text was to support my courses at Cal Poly and UC Riverside. Complaint #2 is quite valid; that’s why I wrote this version of the text. As for complaint #3, it was really never cost effective to create a print version; publishers simply cannot justify printing a text 1,500 pages long with a limited market. Furthermore, having a print version would prevent me from updating the text at will for my courses.

The astute reader will note that I haven’t updated the electronic version of “The Art of Assembly Language Programming” (or “AoA”) since about 1996. If the whole reason for keeping the book in electronic form has been to make updating the text easy, why haven’t there been any updates? Well, the story is very similar to Knuth’s “The Art of Computer Programming” series: I was sidetracked by other projects[1]. The static nature of AoA over the past several years was never really intended.
During the 1995-1996 time frame, I decided it was time to make a major revision to AoA. The first version of AoA was MS-DOS based and by 1995 it was clear that MS-DOS was finally becoming obsolete; almost everyone except a few die-hards had switched over to Windows. So I knew that AoA needed an update for Windows, if nothing else. I also took some time to evaluate my curriculum to see if I couldn’t improve the pedagogical (teaching) material to make it possible for my students to learn even more about 80x86 assembly language in a relatively short 10-week quarter. One thing I’ve learned after teaching an assembly language course for over a decade is that support software makes all the difference in the world to students writing their first assembly language programs. When I first began teaching assembly language, my students had to write all their own I/O routines (including numeric to string conversions for numeric I/O). While one could argue that there is some value to having students write this code for themselves, I quickly discovered that they spent a large percentage of their project time over the quarter writing I/O routines. Each moment they spent writing these relatively low-level routines was one less moment available to them for learning more advanced assembly language programming techniques. While, I repeat, there is some value to learning how to write this type of code, it’s not all that related to assembly language programming (after all, the same type of problem has to be solved for any language that allows numeric I/O). I wanted to free the students from this drudgery so they could learn more about assembly language programming. The result of this observation was “The UCR Standard Library for 80x86 Assembly Language Programmers.” This is a library containing several hundred I/O and utility functions that students could use in their assembly language programs. 
More than nearly anything else, the UCR Standard Library improved the progress students made in my courses.

1. Actually, another problem is the effort needed to maintain the HTML version since it was a manual conversion from Adobe Framemaker. But that’s another story...
It should come as no surprise, then, that one of my first projects when rewriting AoA was to create a new, more powerful, version of the UCR Standard Library. This effort (the UCR Stdlib v2.0) ultimately failed (although you can still download the code written for v2.0 from http://webster.cs.ucr.edu). The problem was that I was trying to get MASM to do a little bit more than it was capable of and so the project was ultimately doomed. To condense a really long story, I decided that I needed a new assembler. One that was powerful enough to let me write the new Standard Library the way I felt it should be written. However, this new assembler should also make it much easier to learn assembly language; that is, it should relieve the students of some of the drudgery of assembly language programming just as the UCR Standard Library had. After three years of part-time effort, the end result was the “High Level Assembler,” or HLA. HLA is a radical step forward in teaching assembly language. It combines the syntax of a high level language with the low-level programming capabilities of assembly language. Together with the HLA Standard Library, it makes learning and programming assembly language almost as easy as learning and programming a High Level Language like Pascal or C++. Although HLA isn’t the first attempt to create a hybrid high level/low level language, nor is it even the first attempt to create an assembly language with high level language syntax, it’s certainly the first complete system (with library and operating system support) that is suitable for teaching assembly language programming. Recent experiences in my own assembly language courses show that HLA is a major improvement over MASM and other traditional assemblers when teaching machine organization and assembly language programming. The introduction of HLA is bound to raise lots of questions about its suitability to the task of teaching assembly language programming (as well it should). 
Today, the primary purpose of teaching assembly language programming at the University level isn’t to produce a legion of assembly language programmers; it’s to teach machine organization and introduce students to machine architecture. Few instructors realistically expect more than about 5% of their students to wind up working in assembly language as their primary programming language[2]. Doesn’t turning assembly language into a high level language defeat the whole purpose of the course? Well, if HLA let you write C/C++ or Pascal programs and attempted to call these programs “assembly language” then the answer would be “Yes, this defeats the purpose of the course.” However, despite the name and the high level (and very high level) features present in HLA, HLA is still assembly language. An HLA programmer still uses 80x86 machine instructions to accomplish most of the work. And those high level language statements that HLA provides are purely optional; the “purist” can use nothing but 80x86 assembly language, ignoring the high level statements that HLA provides. Those who argue that HLA is not true assembly language should note that Microsoft’s MASM and Borland’s TASM both provide many of the high level control structures found in HLA[3].

Perhaps the largest deviation from traditional assemblers that HLA makes is in the declaration of variables and data in a program. HLA uses a very Pascal-like syntax for variable, constant, type, and procedure declarations. However, this does not diminish the fact that HLA is an assembly language. After all, at the machine language (vs. assembly language) level, there is no such thing as a data declaration. Therefore, any syntax for data declaration is an abstraction of data representation in memory. I personally chose to use a syntax that would prove more familiar to my students than the traditional data declarations used by assemblers.
Indeed, perhaps the principal driving force in HLA’s design has been to leverage the student’s existing knowledge when teaching them assembly language. Keep in mind, when a student first learns assembly language programming, there is so much more for them to learn than a handful of 80x86 machine instructions and the machine language programming paradigm. They’ve got to learn assembler directives, how to declare variables, how to write and call procedures, how to comment their code, what constitutes good programming style in an assembly language program, etc. Unfortunately, with most assemblers, these concepts are completely different in assembly language than they are in a language like Pascal or C/C++. For example, the indentation techniques students master in order to write readable code in Pascal just don’t apply to (traditional) assembly language programs. That’s where HLA deviates from traditional assemblers. By using a high level syntax, HLA lets students leverage their high level language knowledge to write good readable programs. HLA will not let them avoid learning machine instructions, but it doesn’t force them to learn a whole new set of programming style guidelines, new ways to comment their code, new ways to create identifiers, etc. HLA lets them use the knowledge they already possess in those areas that really have little to do with assembly language programming so they can concentrate on learning the important issues in assembly language.

2. My experience suggests that only about 10-20% of my students will ever write any assembly language again once they graduate; less than 5% ever become regular assembly language users.
3. Indeed, in some respects the MASM and TASM HLL control structures are actually higher level than HLA’s. I specifically restricted the statements in HLA because I did not want students writing “C/C++ programs with MOV instructions.”

So let there be no question about it: HLA is an assembly language. It is not a high level language masquerading as an assembler[4]. However, it is a system that makes learning and using assembly language easier than ever before possible.

Some long-time assembly language programmers, and even many instructors, would argue that making a subject easier to learn diminishes the educational content. Students don’t get as much out of a course if they don’t have to work very hard at it. Certainly, students who don’t apply themselves as well aren’t going to learn as much from a course. I would certainly agree that if HLA’s only purpose was to make it easier to learn a fixed amount of material in a course, then HLA would have the negative side-effect of reducing what the students learn in their course. However, the real purpose of HLA is to make the educational process more efficient; not so the students spend less time learning a fixed amount of material (although HLA could certainly achieve this), but to allow the students to learn the same amount of material in less time so they can use the additional time available to them to advance their study of assembly language. Remember what I said earlier about the UCR Standard Library: its introduction into my course allowed me to teach even more advanced topics in my course. The same is true, even more so, for HLA. Keep in mind, I’ve got ten weeks in a quarter.
If using HLA lets me teach the same material in seven weeks that took ten weeks with MASM, I’m not going to dismiss the course after seven weeks. Instead, I’ll use this additional time to cover more advanced topics in assembly language programming. That’s the real benefit to using pedagogical tools like HLA. Of course, once I’ve addressed the concerns of assembly language instructors and long-time assembly language programmers, the need arises to address questions a student might have about HLA. Without question, the number one concern my students have had is “If I spend all this time learning HLA, will I be able to use this knowledge once I get out of school?” A more blunt way of putting this is “Am I wasting my time learning HLA?” Let me address these questions using three points. First, as pointed out above, most people (instructors and experienced programmers) view learning assembly language as an educational process. Most students will probably never program full-time in assembly language, indeed, few programmers write more than a tiny fraction (less than 1%) of their code in assembly language. One of the main reasons most Universities require their students to take an assembly language course is so they will be familiar with the low-level operation of their machine and so they can appreciate what the compiler is doing for them (and help them to write better HLL code once they realize how the compiler processes HLL statements). HLA is an assembly language and learning HLA will certainly teach you the concepts of machine organization, the real purpose behind most assembly language courses. The second point to ponder is that learning assembly language consists of two main activities; learning the assembler’s syntax and learning the assembly language programming paradigm (that is, learning to think in assembly language). Of these two, the second activity is, by far, the more difficult. 
HLA, since it uses a high level language-like syntax, simplifies learning the assembly language syntax. HLA also simplifies the initial process of learning to program in assembly language by providing a crutch, the HLA high level statements, that allows students to use high level language semantics when writing their first programs. However, HLA does allow students to write “pure” assembly language programs, so a good instructor will ensure that they master the full assembly language programming paradigm before they complete the course. Once a student masters the semantics (i.e., the programming paradigm) of assembly language, learning a new syntax is relatively easy. Therefore, a typical student should be able to pick up MASM in about a week after mastering HLA[5].

As for the third and final point: to those who would argue that this is still extra effort that isn’t worthwhile, I would simply point out that none of the existing assemblers have more than a cursory level of compatibility. Yes, TASM can assemble most MASM programs, but the reverse is not true. And it’s certainly not the case that NASM, A86, GAS, MASM, and TASM let you write interchangeable code. If you master the syntax of one of these assemblers and someone expects you to write code in a different assembler, you’re still faced with the prospect of having to learn the syntax of the new assembler. And that’s going to take you about a week (assuming the presence of well-written documentation). In this respect, HLA is no different than any of the other assemblers.

Having addressed these concerns you might have, it’s now time to move on and start teaching assembly language programming using HLA.

4. The C-- language is a good example of a low-level non-assembly language, if you need a comparison.
5. This is very similar to mastering C after learning C++.
1.2 Intended Audience

No single textbook can be all things to all people. This text is no exception. I’ve geared this text and the accompanying software to University level students who’ve never previously learned assembly language programming. This is not to say that others cannot benefit from this work; it simply means that as I’ve had to make choices about the presentation, I’ve made choices that should prove most comfortable for this audience I’ve chosen.

A secondary audience who could benefit from this presentation is any motivated person who really wants to learn assembly language. Although I assume a certain level of mathematical maturity from the reader (i.e., high school algebra), most of the “tough math” in this textbook is incidental to learning assembly language programming and you can easily skip over it without fear that you’ll miss too much. High school students and those who haven’t seen a school in 40 years have effectively used this text (and its DOS counterpart) to learn assembly language programming.

The organization of this text reflects the diverse audience for which it is intended. For example, in a standard textbook each chapter typically has its own set of questions, programming exercises, and laboratory exercises. Since the primary audience for this text is University students, such pedagogical material does appear within this text. However, recognizing that not everyone who reads this text wants to bother with this material (e.g., downloading it), this text moves such pedagogical material to the end of each volume in the text and places this material in a separate chapter. This is somewhat of an unusual organization, but I feel that University instructors can easily adapt to this organization and it saves burdening those who aren’t interested in this material.

One audience to whom this book is specifically not directed is those persons who are already comfortable programming in 80x86 assembly language. Undoubtedly, there is a lot of material such programmers will find of use in this textbook. However, my experience suggests that those who’ve already learned x86 assembly language with an assembler like MASM, TASM, or NASM rebel at the thought of having to relearn basic assembly language syntax (as they would have to in order to learn HLA). If you fall into this category, I humbly apologize for not writing a text more to your liking. However, my goal has always been to teach those who don’t already know assembly language, not extend the education of those who do. If you happen to fall into this category and you don’t particularly like this text’s presentation, there is some good news: there are dozens of texts on assembly language programming that use MASM and TASM out there. So you don’t really need this one.
1.3 Teaching From This Text

The first thing any instructor will notice when reviewing this text is that it’s far too large for any reasonable course. That’s because assembly language courses generally come in two flavors: a machine organization course (more hardware oriented) and an assembly language programming course (more software oriented). No text that is “just the right size” is suitable for both types of classes. Combining the information for both courses, plus advanced information students may need after they finish the course, produces a large text, like this one.

If you’re an instructor with a limited schedule for teaching this subject, you’ll have to carefully select the material you choose to present over the time span of your course. To help, I’ve included some brief notes at the beginning of each Volume in this text that suggest whether a chapter in that Volume is appropriate for a machine organization course, an assembly language programming course, or an advanced assembly programming course. These brief course notes can help you choose which chapters you want to cover in your course.

If you would like to offer hard copies of this text in the bookstore for your students, I will attempt to arrange with some “Custom Textbook Publishing” houses to make this material available on an “as-requested” basis. As I work out arrangements with such outfits, I’ll post ordering information on Webster (http://webster.cs.ucr.edu). If your school has a printing and reprographics department, or you have a local business that handles custom publishing, you can certainly request copyright clearance to print the text locally.

If you’re not taking a formal course, just keep in mind that you don’t have to read this text straight through, chapter by chapter. If you want to learn assembly language programming and some of the machine organization chapters seem a little too hardware oriented for your tastes, feel free to skip those chapters and come back to them later on, when you understand the need to learn this information.
1.4 Copyright Notice

The full contents of this text are copyrighted material. Here are the rights I hereby grant concerning this material. You have the right to

• Read this text on-line from the http://webster.cs.ucr.edu web site or any other approved web site.
• Download an electronic version of this text for your own personal use and view this text on your own personal computer.
• Make a single printed copy for your own personal use.

I usually grant instructors permission to use this text in conjunction with their courses at recognized academic institutions. There are two types of reproduction I allow in this instance: electronic and printed. I grant electronic reproduction rights for one school term, after which the institution must remove the electronic copy of the text and obtain new permission to repost the electronic form (I require a new copy for each term so that corrections, changes, and additions propagate across the net). If your institution has reproduction facilities, I will grant hard copy reproduction rights for one academic year (for the same reasons as above). You may obtain copyright clearance by emailing me at

[email protected]

I will respond with clearance via email. My returned email plus this page should provide sufficient acknowledgement of copyright clearance. If, for some reason, your reproduction department needs to have me physically sign a copyright clearance, I will have to charge $75.00 U.S. to cover my time and effort needed to deal with this. To obtain such clearance, please email me at the address above. Presumably, your printing and reproduction department can handle producing a master copy from PDF files. If not, I can print a master copy on a laser printer (800x400dpi); please email me for the current cost of this service.

All other rights to this text are expressly reserved by the author. In particular, it is a copyright violation to

• Post this text (or some portion thereof) on some web site without prior approval.
• Reproduce this text in printed or electronic form for non-personal (e.g., commercial) use.
The software accompanying this text is all public domain material unless an explicit copyright notice appears in the software. Feel free to use the accompanying software in any way you feel fit.
1.5 How to Get a Hard Copy of This Text

This text is distributed in electronic form only. It is not available in hard copy form nor do I personally intend to have it published. If you want a hard copy of this text, the copyright allows you to print one for yourself. The PDF distribution format makes this possible (though the length of the text will make it somewhat expensive).

If you’re wondering why I don’t get this text published, there’s a very simple reason: it’s too long. Publishing houses generally don’t want to get involved with texts for specialized subjects as it is; the cost of producing this text is prohibitive given its limited market. Rather than cut it down to the 500 or so 6” x 9” pages that most publishers would accept, my decision was to stick with the full text and release the text in electronic form on the Internet. The upside is that you can get a free copy of this text; the downside is that you can’t readily get a hard copy.

Note that the copyright notice forbids you from copying this text for anything other than personal use (without permission, of course). If you run a “Print to Order/Custom Textbook” publishing house and would like to make copies for people, feel free to contact me and maybe we can work out a deal for those who just have to have a hard copy of this text.
1.6 Obtaining Program Source Listings and Other Materials in This Text

All of the software appearing in this text is available from the Webster web site. The URL is

http://webster.cs.ucr.edu

The exact filename(s) of this material may change with time, and different services use different names for these files. Check on Webster for any important changes in addresses. If, for some reason, Webster disappears in the future, you should use a web-based search engine like “AltaVista” and search for “Art of Assembly” to locate the current home site of this material.
1.7 Where to Get Help

If you’re reading this text and you’ve got questions about how to do something, please post a message to one of the following Internet newsgroups:

comp.lang.asm.x86
alt.lang.asm
Hundreds of knowledgeable individuals frequent these newsgroups and as long as you’re not simply asking them to do your homework assignment for you, they’ll probably be more than happy to help you with any problems that you have with assembly language programming. I certainly welcome corrections and bug reports concerning this text at my email address. However, I regret that I do not have the time to answer general assembly language programming questions via email. I do provide support in public forums (e.g., the newsgroups above and on Webster at http://webster.cs.ucr.edu) so please use those avenues rather than emailing questions directly to me. Due to the volume of email I receive daily, I regret that I cannot reply to all emails that I receive; so if you’re looking for a response to a question, the newsgroup is your best bet (not to mention, others might benefit from the answer as well).
1.8 Other Materials You Will Need (Windows Version)

In addition to this text and the software I provide, you will need a machine running a 32-bit version of Windows (Windows 9x, NT, 2000, ME, etc.), a copy of Microsoft’s MASM and a 32-bit linker, some sort of text editor, and other rudimentary general-purpose software tools you normally use. MASM and MS-Link are freely available on the internet. Alas, the procedure you must follow to download these files from Microsoft seems to change on a monthly basis. However, a quick post to comp.lang.asm.x86 should turn up the current site from which you may obtain this software. Almost all the software you need to use this text is part of Windows (e.g., a simple text editor like Notepad.exe) or is freely available on the net (MASM, LINK, and HLA). You shouldn’t have to purchase anything.
1.9 Other Materials You Will Need (Linux Version)

In addition to this text and the software I provide, you will need a machine running Linux (preferably Linux 2.4 or later), “as” and “ld” (if you can compile GCC programs, you’ve got these; they come standard with most distributions), some sort of text editor, and other rudimentary general-purpose software tools you normally use. Although not necessary, it helps if you’ve got superuser privileges during installation so you can put the software in a reasonable spot.
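A quick way to confirm that the “as” and “ld” prerequisites are present is to probe for them from the shell. The following is only a sketch under stated assumptions, not part of the HLA distribution: the function name check_tools is made up for illustration, and it assumes a POSIX-style shell such as bash, where the standard "command -v" built-in reports whether a program can be found on the search path.

```shell
# check_tools is a hypothetical helper (not part of HLA).
# It prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The assembler and linker this section names as prerequisites.
check_tools as ld || echo "Install the missing tools before continuing."
```

If either tool is reported missing, installing your distribution's GCC development packages will normally supply both.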
Chapter Two: Hello, World of Assembly Language

2.1 Chapter Overview

This chapter is a “quick-start” chapter that lets you start writing basic assembly language programs right away. This chapter presents the basic syntax of an HLA (High Level Assembler) program, introduces you to the Intel CPU architecture, provides a handful of data declarations and machine instructions, describes some utility routines you can call in the HLA Standard Library, and then shows you how to write some simple assembly language programs. By the conclusion of this chapter, you should understand the basic syntax of an HLA program and be prepared to start learning new language features in subsequent chapters.
2.2 Installing the HLA Distribution Package

Before you can learn assembly language programming using HLA, you must first successfully install HLA on your system. Currently, HLA is available for the Linux and Windows operating systems. This section explains how to install HLA on these two systems. If HLA is already running on your system, you may skip to the next major section in this chapter.

The latest version of HLA is available from the Webster web server at

http://webster.cs.ucr.edu

Go to this web site and follow the HLA links to the “HLA Download” page. From here you should select the latest version of HLA for download to your computer. The HLA distribution is provided in a “Zip File” compressed format. Under Windows, you will need a decompressor program like PKUNZIP or WinZip in order to extract the HLA files from this zipped archive file; under Linux, you will use the GZIP and TAR programs to decompress and extract HLA. A detailed description of the use of these decompression products is beyond the scope of this manual; please consult the software vendor’s documentation or their web page for information concerning the use of these products. This discussion will only briefly describe how to use them to extract important HLA files.

This text assumes that you will unzip the HLA distribution into the root directory of your C: drive under Windows, or to the “/usr/hla” directory under Linux. You can certainly install HLA anywhere you want, but you will have to adjust the following descriptions if you install HLA somewhere else. If possible, you should install HLA using root/administrator privileges; regardless, you should make sure the permissions are set properly on the files so everyone has read and execute access to the HLA files. If you are unsure how to do this, please consult your operating system’s documentation or consult a system administrator.

HLA is a console application. In order to run the HLA compiler you must run the command window program (this is “command.com” on Windows 95 and 98, or “cmd.exe” on Windows NT and Windows 2000; Linux users typically run “bash” or some other shell). This also means that you should be familiar with some simple “command line interface” (CLI) or “shell” commands.

Most Windows distributions let you run the command prompt window from the Start menu or from a submenu hanging off the Start menu (you may also select “RUN” from the Start menu and type “cmd” as the program name). This text assumes that you are familiar with the Windows command window and you know how to use some basic command window commands (e.g., dir, del, rename, etc.). If you have never before used the Windows command line interpreter, you should consult an appropriate text to learn a few basic commands.

Most Linux distributions run “bash” or some other shell program whenever you open up a terminal window (e.g., a GNOME or KDE terminal window or an X-TERM window). There are some minor differences between the shells running under Linux; this document assumes that you are using GNU’s “bash” shell. Again, this text assumes that you are comfortable with a few commands like ls, rm, and mv. If you have never used a Unix shell program before, you should consult an appropriate text or the on-line documentation to learn a few basic commands.
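The permission requirement mentioned above (read and execute access for everyone) can be met on Linux with the standard chmod command. This is a minimal sketch rather than an official installation step: the function name make_readable is hypothetical, and /usr/hla is simply the install directory this text assumes.

```shell
# make_readable is a hypothetical helper (not part of HLA).
# a+r : everyone may read every file and directory under the given path
# X   : add execute/search permission only to directories and to
#       files that are already executable by someone
make_readable() {
    chmod -R a+rX "$1"
}

# Example (typically run as root, with HLA extracted to /usr/hla):
#   make_readable /usr/hla
#   ls -l /usr/hla        # confirm the resulting permissions
```

The capital X is a deliberate choice here: it keeps ordinary data files (include files, documentation) from being marked executable while still letting every user enter the directories and run the programs.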
Before you can actually run the HLA compiler, you must set the system execution path and set up various environment variables. The following subsections explain how to do this under Windows and then Linux.
2.2.1 Installation Under Windows

HLA is not a stand-alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as MASM, then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.lib) and, possibly, several operating system specific library files (e.g., kernel32.lib under Windows). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section discusses how to set up HLA on your Windows system.

First, you will need an HLA distribution for Windows. The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

As noted earlier, HLA is not a stand-alone assembler. The HLA package contains the HLA compiler, the HLA Standard Library, and a set of include files for the HLA Standard Library. If you write an HLA program with just this code, HLA will produce an "ASM" file and then stop. To produce an executable file you will need Microsoft’s MASM and LINK programs, along with some Windows library files, to complete the process. The easiest way to get all the files you need is to download the "MASM32" package from http://www.pdq.com.au/home/hutch/masm.htm or from any of the other places on the net where you can find the MASM32 package (Webster maintains a current link if this link is dead). Once you unzip this file, it’s easy to install the MASM32 package using the install program it supplies. You must install MASM32 (or the MASM/LINK/Win32 library files) before HLA will function properly.
Here are the steps I went through to install MASM32 on my system:

• I downloaded masm32v6.zip from the URL above (later versions are probably okay too, although there is a slight chance that the installation will be different).
• I double-clicked on the masm32v6.zip file (which runs WinZip on my system).
• I chose to extract "install.exe". I told WinZip to extract this file to C:\.
• I double-clicked on the "install.exe" icon and selected the "C:" drive in the window that popped up. Then I hit the install button and waited while MASM32 extracted all the pertinent files. This produced a directory called "MASM32".

MASM32 is a powerful assembly language development subsystem in its own right, but it uses the traditional MASM syntax rather than the HLA syntax. So we’ll use MASM32 mainly for the assembler, linker, and library files. MASM32 also includes a simple editor/IDE and several other tools that may be useful to an HLA programmer. Feel free to check this software out and see if it is useful to you. For now, note that the executable files you will ultimately need are ML.EXE, ML.ERR, LINK.EXE, and a couple of DLLs. You can find them in the MASM32\BIN subdirectory. Leave them there for the time being. The MASM32\LIB directory also contains many Win32 library files you will need. Again, leave them alone for the time being.

• Next, if you haven’t already done so, download the HLA executables file from Webster at http://webster.cs.ucr.edu. On Webster you can download several different ZIP files associated with HLA from the HLA download page. The "Executables" file is the only one you’ll absolutely need; however, you’ll probably want to grab the documentation and examples files as well. If you’re curious, or you want some more example code, you can download the source listings to the HLA Standard Library. If you’re really curious (or masochistic), you can download the HLA compiler source listings too (this is not for casual browsing!).
• I downloaded the HLA1_32.zip file while writing this. Most likely, there is a much later version available as you’re reading this. Be sure to get the latest version. I chose to download this file to my "C:\" root directory.
• After downloading HLA1_32.zip to my C: drive, I double-clicked on the icon to run WinZip. I selected "Extract" and told WinZip to extract all the files to my C:\ directory. This created an "HLA" subdirectory in my root on C: with two subdirectories (include and hlalib) and two EXE files (HLA.EXE and HLAPARSE.EXE). The HLA program is a "shell" program that runs the HLA compiler (HLAPARSE.EXE), MASM (ML.EXE), the linker (LINK.EXE), and other programs. You can think of HLA.EXE as the "HLA Compiler".
• Next, I created the following text file and named it "IHLA.BAT" (note that you may need to change the default drive letters if you want to install HLA on a drive other than "C:"):
    path=c:\hla;c:\masm32\bin;%path%
    set lib=c:\masm32\lib;c:\hla\hlalib;%lib%
    set include=c:\hla\include;c:\masm32\include;%include%
    set hlainc=c:\hla\include
    set hlalib=c:\hla\hlalib\hlalib.lib
• Be sure you’ve typed all the lines exactly as written or HLA will fail to run properly. You may use any reasonable text editor (e.g., NOTEPAD.EXE) to create this file. Do not use a word processing program, since these generally don’t save their data as plain text. Be sure the file is named "IHLA.BAT" and not "IHLA.BAT.TXT" or some other variation. This batch file tells the system where to find all the files you will need when running HLA.
• Advanced Win32 users should note that you can set up all these environment variables inside the Windows system control panel in the "Advanced->Environment Variables" area. This is ultimately far more convenient than using this batch file (for reasons you’ll soon see). However, you can mess up your system if you don’t know what you’re doing in the system control panel, so only advanced users who have done this before should attempt it.
• HLA is a Win32 console program. To run HLA you must open a console window. Under Windows 2000, Microsoft has hidden this away in Start->Programs->Accessories->Command Prompt. You might find it in another location. You can also start the command prompt processor by selecting Start->Run and entering "cmd".
• Once you’ve got the command prompt ("C:>" or something similar), execute the IHLA.BAT file you’ve created by typing "IHLA" at the command line prompt. Hit the ENTER key to execute the command.
• At this point, HLA should be properly installed and ready to run. Try typing "hla -?" at the command line prompt and verify that you get the HLA help message. If not, go back and figure out what went wrong up to this point (it doesn’t hurt to start over from the beginning if you’re lost).
• Thus far, you’ve verified that HLA.EXE is operational. Now try the following command: "ML /?". This should run the Microsoft Macro Assembler (MASM) and display its help screen. You can ignore the information that appears; you will probably never need it.
• Next, let’s verify the correct operation of the linker. Type "link /?" and verify that the linker program runs. Again, you can ignore the help screen that appears.
• Now it’s time to try your hand at writing an honest to goodness HLA program and verify that the whole system is working. Here’s the canonical "Hello World" program written in HLA (we will revisit this program a little later in this chapter, so don’t worry about what it means just yet). Enter it into a text editor and save it using the filename "HW.HLA":
    program HelloWorld;
    #include( "stdlib.hhf" )

    begin HelloWorld;

        stdout.put( "Hello, World of Assembly Language", nl );

    end HelloWorld;
• Make sure you’re in the directory containing the HW.HLA file and type the following command at the "C:>" prompt: "HLA -v HW". The "-v" option tells HLA to produce verbose output during compilation. This is helpful for determining what went wrong if the system fails somewhere along the line. This command should produce the following output:
    HLA (High Level Assembler)
    Written by Randall Hyde and released to the public domain.
    Version Version 1.32 build 4904 (prototype)
    Files:
    1: hw.hla
    Compiling "hw.hla" to "hw.asm"
    Assembling hw.asm via "ml /c /coff /Cp hw.asm"
    Microsoft (R) Macro Assembler Version 6.14.8444
    Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
    Assembling: hw.asm
    Linking via "link -subsystem:console /heap:0x1000000,0x1000000 /stack:0x1000000,0x1000000 /BASE:0x3000000 /machine:IX86 -entry:?HLAMain @hw.link -out:hw.exe kernel32.lib user32.lib c:\hla\hlalib\hlalib.lib hw.obj"
    Microsoft (R) Incremental Linker Version 5.12.8078
    Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
    /section:.text,ER
    /section:readonly,R
    /section:.edata,R
    /section:.data,RW
    /section:.bss,RW
• If you get all of this output, you’re in business. You can run the “HW” program using the following CLI (command line interpreter) command:

    HW
• One thing to remember is that unless you set the environment variables permanently in the System control panel, you will have to run the IHLA.BAT file every time you open a new command prompt window. Since this is a pain, here are some instructions I’ve taken from the Internet that describe how to set up the environment variables (DO THIS AT YOUR OWN RISK!):
1) Open System Properties (Winkey-Break is a convenient shortcut) and go to the Advanced tab, then Environment Variables. Add "c:\hla" to the Path in SYSTEM VARIABLES, not in the "User variables" list. Click OK, but keep the Environment Variables window open; we're not done.

2) Look at the contents of ihla.bat (above).

3) In the "User variables" list, you must end up with each of these settings. For example, to create hlainc, click the "New..." button, type "hlainc" as the name of the variable, and type "c:\hla\include" as the variable value (all without quotes, of course). If there is already a path set, and it already has some value, add this immediately to the end: ";c:\hla;%path%". That will preserve your existing User and System paths as well as adding c:\hla.

   For example, suppose you opened up your user variables and the path already said "C:\Private Files\PantiePix;c:\winnt\system32;c:\winnt;c:\winnt\System32\Wbem;d:\lcc\bin;D:\PROGRA~1\ULTRAE~1;D:\4NT300;C:\msoffice\Office;c:/hla". You would click on Edit and type "C:\Private Files\PantiePix;c:\hla;%path%". (The same advice applies to preserving existing lib and include settings.)

4) Once you reboot the computer, you should be all set for "Hello world of assembly language"! (without having to run the IHLA.BAT file.)
Installing HLA is a somewhat involved process. Unfortunately, this is necessary because I don’t have the rights to distribute MASM, LINK, and other Microsoft files. Fortunately, HUTCH has collected all of these files together so they are easy to download. If you are concerned about possible legal issues with the download, you may legally download MASM and LINK from Microsoft’s site. A link on Webster (at the URL above) describes how to do this.

At the time this was being written, work was progressing on HLA to produce TASM compatible output, and plans were in the works to produce NASM and Gas versions as well. However, you will still have to obtain the Microsoft library files from some source if you intend to produce a Win32 application. Versions of HLA may appear for other operating systems as well. Check out Webster to see if any progress has been made in this direction.

The two most common problems people have running HLA involve the location of the Win32 library files and the choice of linker. During the linking phase, HLA (well, link.exe actually) requires the kernel32.lib, user32.lib, and gdi32.lib library files. These must be present in the pathname(s) specified by the LIB environment variable. If, during the linker phase, HLA complains about missing object modules, make sure that the LIB path specifies the directory containing these files. If you’re an MS VC++ user, installation of VC++ should have set up the LIB path for you. If not, then locate these files (they are part of the MASM32 distribution) and copy them to the HLA\HLALIB directory (note that the ihla.bat file includes c:\hla\hlalib as part of the LIB path).

Another common problem with running HLA is the use of the wrong link.exe program. Microsoft has distributed several different versions of link.exe; in particular, there are 16-bit linkers and 32-bit linkers. You must use a 32-bit segmented linker with HLA. If you get complaints about "stack size exceeded" or other errors during the linker phase, this is a good indication that you’re using a 16-bit version of the linker. Obtain and use a 32-bit version and things will work. Don’t forget that the 32-bit linker must appear in the execution path (specified by the PATH environment variable) before the 16-bit linker.
2.2.2 Installation Under Linux

HLA is not a stand-alone program. It is a compiler that translates HLA source code into a lower-level assembly language. A separate assembler, such as Gas (as), then completes the processing of this low-level intermediate code to produce an object code file. Finally, you must link the object code output from the assembler using a linker program. Typically you will link the object code produced by one or more HLA source files with the HLA Standard Library (hlalib.a). Most of this activity takes place transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run smoothly, you must have installed HLA and all the support files correctly. This section discusses how to set up HLA on your system.

First, you will need an HLA distribution for Linux. The latest version of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the latest version if you do not already possess it.

As noted earlier, HLA is not a stand-alone assembler. The HLA package contains the HLA compiler, the HLA Standard Library, and a set of include files for the HLA Standard Library. If you write an HLA program with just this code, HLA will produce an "ASM" file and then stop. To produce an executable file
you will need GNU’s as and ld programs (these come with any Linux distribution that supports compiling C/C++ programs). Note that HLA only works with Gas v2.10 or later. The Gas assembler is part of the Binutils package. If you don’t have version 2.10 or later, download an appropriate binutils package from the internet. Without Gas v2.10 or later installed on your system, HLA will generate errors when it attempts to assemble its output by invoking the as (Gas) executable.

Here are the steps I went through to install HLA on my Linux system:

• First, if you haven’t already done so, download the HLA executables file from Webster at http://webster.cs.ucr.edu. On Webster you can download several different files associated with HLA from the HLA download page. The "Linux Executables" file is the only one you’ll absolutely need; however, you’ll probably want to grab the documentation and examples files as well. If you’re curious, or you want some more example code, you can download the source listings to the HLA Standard Library. If you’re really curious (or masochistic), you can download the HLA compiler source listings too (this is not for casual browsing!).
• I downloaded the HLA1_39.tar.gz file while writing this. Most likely, there is a much later version available as you’re reading this. Be sure to get the latest version. I chose to download this file to my root directory; you can put the file wherever you like, though this documentation assumes that all HLA files wind up in the "/usr/hla/..." directory tree. If you do not already have a “/usr/hla” subdirectory, you can create one with the “mkdir” command (it’s best to do this using the “root” or “superuser” account; if you do not have superuser privileges, you should have your system administrator do this for you).
• After downloading HLA1_39.tar.gz to my root directory, I executed the following shell command: "gzip -d HLA1_39.tar.gz".
Once decompression was complete, I extracted the individual files using the command "tar xvf HLA1_39.tar". This extracted a couple of executable files ("hla" and "hlaparse") along with two subdirectories (include and hlalib). The HLA program is a "shell" program that runs the HLA compiler (hlaparse), Gas (as), the linker (ld), and other programs. You can think of "hla" as the "HLA Compiler". It is a good idea, at this point, to set the permissions on "hla" and "hlaparse" so that everyone can read and execute them. You should also set read and execute permissions on the two subdirectories and read permissions on all the files within them (if this isn’t the default state). Do a "man chmod" from the Linux command line if you don’t know how to change permissions.
• Next (logged in as a plain user rather than root or the super-user), I edited the ".bashrc" file in my home directory ("/home/rhyde" in my particular case; this will probably be different for you). I found the line that defined the "PATH" variable. It originally looked like this on my system:

    PATH=$DBROOT/bin:$DBROOT/pgm:$PATH

  I edited this line to add the path to the HLA directory, producing the following:

    PATH=$DBROOT/bin:$DBROOT/pgm:/usr/hla:$PATH

  Without this modification, Linux will probably not find HLA when you attempt to execute it unless you type a full path (e.g., "/usr/hla/hla") when running the program. Since this is a pain, you’ll definitely want to add "/usr/hla" to your path.
• Next, I added the following four lines to ".bashrc" (note that Linux filenames beginning with a period don’t normally show up in directory listings unless you supply the "-a" option to ls):

    hlalib=/usr/hla/hlalib/hlalib.a
    export hlalib
    hlainc=/usr/hla/include
    export hlainc

  These four lines define (and export) environment variables that HLA needs during compilation.
  Without these environment variables, HLA will probably complain about not being able to find include files, or the linker (ld) will complain about strange undefined symbols when you attempt to compile your programs. After saving the ".bashrc" file, you can tell Linux to apply the changes to the current shell by using the command:

    source .bashrc
  Note: this discussion only applies to users who run the BASH shell. If you are using a different shell (like the C-Shell or the Korn Shell), then the directions for setting the path and environment variables differ slightly. Please see the documentation for your particular shell if you don’t know how to do this. Also note that Linux does not normally display files whose names begin with a period when you use the “ls” command; to see such files, use the “ls -a” shell command.
• At this point, HLA should be properly installed and ready to run. Try typing "hla -?" at the command line prompt and verify that you get the HLA help message. If not, go back and figure out what went wrong up to this point (it doesn’t hurt to start over from the beginning if you’re lost).
• Now it’s time to try your hand at writing an honest to goodness HLA program and verify that the whole system is working. Here’s the canonical "Hello World" program written in HLA (we’ll discuss this program in detail a little later in this chapter). Enter it into a text editor and save it using the filename "hw.hla":
    program HelloWorld;
    #include( "stdlib.hhf" )

    begin HelloWorld;

        stdout.put( "Hello, World of Assembly Language", nl );

    end HelloWorld;
• Make sure you’re in the directory containing the "hw.hla" file and type the following command at the prompt: "hla -v hw". The "-v" option tells HLA to produce verbose output during compilation. This is helpful for determining what went wrong if the system fails somewhere along the line. This command should produce output similar to the following (this listing was captured while compiling a file named "t.hla", so your output will show "hw" wherever it shows "t"):
    HLA (High Level Assembler) Parser
    Written by Randall Hyde and released to the public domain.
    Version Version 1.39 build 6845 (prototype)
    -t active
    File: t.hla
    Compiling "t.hla" to "t.asm"
    HLA (High Level Assembler)
    Copyright 1999, by Randall Hyde, all rights reserved.
    Version Version 1.39 build 6845 (prototype)
    ELF output
    Using GAS assembler
    GAS output
    -test active
    Files:
    1: t.hla
    Compiling 't.hla' to 't.asm' using command line [hlaparse -v -sg -test "t.hla"]
    Assembling "t.asm" via [as -o t.o "t.asm"]
    Linking via [ld -o "t" "t.o" "/usr/hla/hlalib/hlalib.a"]
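The extraction and permission steps described in this section can be collected into a short shell sketch. It is only an illustration under stated assumptions, not part of the HLA distribution: the function name install_hla is hypothetical, and the archive is assumed to unpack "hla", "hlaparse", "include", and "hlalib" directly into the current directory. You would normally run it from an account that is allowed to write to the destination.

```shell
#!/bin/bash
# Hypothetical helper (not part of HLA): unpack an HLA .tar.gz archive
# into a destination directory and make it readable by everyone.
# Usage: install_hla ARCHIVE DEST
install_hla() {
    local archive="$1" dest="$2"

    # Create the destination directory if it does not exist yet.
    mkdir -p "$dest" || return 1

    # Same effect as "gzip -d" followed by "tar xvf", but without
    # leaving an intermediate .tar file behind.
    gzip -dc "$archive" | ( cd "$dest" && tar xf - ) || return 1

    # Give everyone read access. The capital X adds execute (search)
    # permission only to directories and to files that are already
    # executable by someone.
    chmod -R a+rX "$dest"

    # The two programs themselves must be executable by everyone.
    chmod a+rx "$dest/hla" "$dest/hlaparse"
}
```

For the setup this section describes, the call would be "install_hla HLA1_39.tar.gz /usr/hla", issued from the directory that holds the downloaded file.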
Installing HLA is a somewhat involved process, but take heart: it’s a lot simpler to install HLA under Linux than under Windows! (See the previous section if you need proof.) Versions of HLA may appear for other operating systems (beyond Windows and Linux) as well. Check out Webster to see if any progress has been made in this direction.

Note one very unusual thing about HLA: carefully written (console) applications will compile and run on all supported operating systems without change. This is unheard of for
assembly language! So if you are using multiple operating systems supported by HLA, you’ll probably want to download files for all supported OSes.

Note: to run the HelloWorld program, a Linux user would type “hw” at the command line prompt (or “./hw” if the current directory is not in the execution path).
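If HLA cannot find its include files or its library, the environment settings from this section are the first thing to check. The following sketch, which assumes a Bourne-style shell such as bash, reports anything that looks wrong. The name check_hla_env is hypothetical, and the checks cover only the hlainc and hlalib variables and the search path described above.

```shell
#!/bin/bash
# Hypothetical sanity check (not part of HLA) for the settings this
# section adds to .bashrc. Prints one line per problem found and
# returns nonzero if there was at least one.
check_hla_env() {
    local problems=0

    # hlainc must name the directory holding the .hhf include files.
    if [ -z "${hlainc:-}" ] || [ ! -d "${hlainc:-}" ]; then
        echo "hlainc is unset or is not a directory: '${hlainc:-}'"
        problems=$((problems + 1))
    fi

    # hlalib must name the Standard Library archive file itself.
    if [ -z "${hlalib:-}" ] || [ ! -f "${hlalib:-}" ]; then
        echo "hlalib is unset or is not a file: '${hlalib:-}'"
        problems=$((problems + 1))
    fi

    # The hla program must be reachable through PATH.
    if ! command -v hla >/dev/null 2>&1; then
        echo "hla was not found in PATH"
        problems=$((problems + 1))
    fi

    [ "$problems" -eq 0 ]
}
```

Running "check_hla_env && echo ok" in a freshly opened terminal window shows whether the ".bashrc" changes took effect.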
2.2.3 Installing “Art of Assembly” Related Files

Although HLA is relatively flexible about where you put it on your system, this text assumes you’ve installed HLA in the “hla” directory on your C: drive under a Win32 operating system or in “/usr/hla” under Linux. This text also assumes the standard directory placement for the HLA files, which has the following layout:

• HLA directory
    • AoA directory
    • Doc directory
    • Examples directory
    • hlalib directory
    • hlalibsrc directory
    • include directory
    • Tests directory

The “Art of Assembly” (AoA) software distribution has the following directory tree structure:

• AoA directory
    • volume1
        • ch01 directory
        • ch02 directory
        • etc.
    • volume2
        • ch01 directory
        • ch02 directory
        • etc.
    • etc.
The main HLA directory contains the executable code for the compiler. This consists of two files, HLA.EXE/hla and HLAPARSE.EXE/hlaparse (Windows/Linux). These two programs must be in the current execution path in order to run the compiler. Under Windows, it wouldn’t hurt to put the ml.exe, ml.err, link.exe, mspdbX0.dll (X = 5, 6, or greater), and msvcrt.dll files in this directory as well. Under Linux, the “as” and “ld” programs are already in the execution path, assuming your Linux system supports C/C++ development.

The Doc directory contains reference material for HLA in PDF and HTML formats. If you have a copy of Adobe Acrobat Reader, you will probably want to read the PDF versions since they are much nicer than the HTML versions. These documents contain the most up-to-date information about the HLA language; you should consult them if you have a question about the HLA language or the HLA Standard Library. Generally, material in this documentation supersedes information appearing in this text, since the HLA documentation is electronic and is probably more up to date.

The Examples directory contains a large set of HLA programs that demonstrate various features of the HLA language. If you have a question about an HLA feature, you can probably find an example program that demonstrates that feature in the Examples directory. Such examples provide invaluable insight that is often superior to a written description of the feature. Note that some of these programs may be specific to Windows or Linux; not all of them will compile and run under both operating systems.
The hlalib directory contains the object code for the HLA Standard Library. As you become more competent with HLA, you may want to take a look at how HLA implements various library functions by checking out the library source code in the hlalibsrc subdirectory.

The include directory contains the HLA Standard Library include files. These special files (which end with a “.hhf” suffix, for “HLA Header File”) are needed during assembly to provide prototype and other information to your program. The example programs in this chapter all include the HLA header file “stdlib.hhf” that, in turn, includes all the other HLA header files in the standard library.

The Tests directory contains various test files that test the correct operation of the HLA system. HLA includes these files as part of the distribution package because they provide additional examples of HLA coding.

The AoA directory contains the code specific to this textbook. This directory contains all the source code to the (complete) programs appearing in this text. It also contains the programs appearing in the Laboratory Exercises section of each chapter. Therefore, this directory is very important to you. Within this directory, the information is further divided up by volume and chapter. The material for Chapter One appears in the “ch01” subdirectory of the “volume1” directory in the AoA directory tree, the material for Chapter Two appears in the “ch02” subdirectory of the “volume1” directory, etc.
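The volume and chapter naming scheme makes it easy to work out where a given chapter's files live. As a small sketch, assuming the AoA directory sits directly inside the HLA directory (as the layout above suggests) and that chapter numbers are always written with two digits, a hypothetical shell helper could build the path like this:

```shell
#!/bin/bash
# Hypothetical helper: print the directory that holds the AoA files
# for a given volume and chapter.
# Usage: aoa_chapter_dir HLA_ROOT VOLUME CHAPTER
aoa_chapter_dir() {
    # %02d pads the chapter number to two digits (2 becomes 02).
    printf '%s/AoA/volume%d/ch%02d\n' "$1" "$2" "$3"
}

aoa_chapter_dir /usr/hla 1 2    # prints /usr/hla/AoA/volume1/ch02
```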
2.3 The Anatomy of an HLA Program

An HLA program typically takes the following form:

    program pgmID ;        These identifiers specify the name of the program.
                           They must all be the same identifier.

        Declarations       The declarations section is where you declare
                           constants, types, variables, procedures, and
                           other objects in an HLA program.

    begin pgmID ;

        Statements         The Statements section is where you place the
                           executable statements for your main program.

    end pgmID ;

    PROGRAM, BEGIN, and END are HLA reserved words that delineate the program.
    Note the placement of the semicolons in this program.

Figure 2.1 Basic HLA Program Layout
The pgmID in the template above is a user-defined program identifier. You must pick an appropriate, descriptive name for your program. In particular, pgmID would be a horrible choice for any real program. If you are writing programs as part of a course assignment, your instructor will probably give you the name to use for your main program. If you are writing your own HLA program, you will have to choose this name.

Identifiers in HLA are very similar to identifiers in most high level languages. HLA identifiers may begin with an underscore or an alphabetic character, and may be followed by zero or more alphanumeric or underscore characters. HLA’s identifiers are case neutral. This means that the identifiers are case sensitive insofar as you must always spell an identifier exactly the same way in your program (even with respect to upper and lower case). However, unlike other case sensitive languages, like C/C++, you may not declare two identifiers in the program whose names differ only by the case of alphabetic characters appearing in the identifiers. Case neutrality enforces the good programming style of always spelling your names exactly the same way (with respect to case) and never declaring two identifiers whose only difference is the case of certain alphabetic characters.

A traditional first program people write, popularized by K&R’s “The C Programming Language”, is the “Hello World” program. This program makes an excellent concrete example for someone who is learning a new language. Here’s what the “Hello World” program looks like in HLA:
    program helloWorld;
    #include( "stdlib.hhf" );

    begin helloWorld;

        stdout.put( "Hello, World of Assembly Language", nl );

    end helloWorld;
Program 2.1
The Hello World Program
The #include statement in this program tells the HLA compiler to include a set of declarations from the stdlib.hhf (standard library, HLA Header File) file. Among other things, this file contains the declaration of the stdout.put code that this program uses.

The stdout.put statement is the “print” statement for the HLA language. You use it to write data to the standard output device (generally the console). To anyone familiar with I/O statements in a high level language, it should be obvious that this statement prints the phrase “Hello, World of Assembly Language”. The nl appearing at the end of this statement is a constant, also defined in “stdlib.hhf”, that corresponds to the newline sequence.

Note that semicolons follow the program, BEGIN, stdout.put, and END statements [1]. Technically speaking, a semicolon is not necessary after the #INCLUDE statement. It is possible to create include files that generate an error if a semicolon follows the #INCLUDE statement, so you may want to get in the habit of not putting a semicolon here (note, however, that the HLA standard library include files always allow a semicolon after the corresponding #INCLUDE statement).

The #INCLUDE is your first introduction to HLA declarations. The #INCLUDE itself isn’t actually a declaration, but it does tell the HLA compiler to substitute the file “stdlib.hhf” in place of the #INCLUDE directive, thus inserting several declarations at this point in your program. Most HLA programs you will write will need to include at least some of the HLA Standard Library header files (“stdlib.hhf” actually includes all the standard library definitions into your program; for more efficient compiles, you might want to be more selective about which files you include. You will see how to do this in a later chapter).

Compiling this program produces a console application. Running this program in a command window prints the specified string, and then control returns to the command line interpreter (or shell, in Unix terminology).

Note that HLA is a free-format language. Therefore, you may split statements across multiple lines (just like in high level languages) if this helps to make your programs more readable. For example, the stdout.put statement in the HelloWorld program could also be written as follows:

    stdout.put
    (
        "Hello, World of Assembly Language",
        nl
    );

1. Technically, from a language design point of view, these are not all statements. However, this chapter will not make that distinction.
Another item worth noting, since you’ll see it cropping up in example code throughout this text, is that HLA automatically concatenates any adjacent string constants it finds in your source file. Therefore, the statement above is also equivalent to:

    stdout.put
    (
        "Hello, "
        "World of Assembly Language",
        nl
    );

Indeed, “nl” (the newline) is really nothing more than a string constant, so (technically) the comma between the nl and the preceding string isn’t necessary. You’ll often see the above written as:

    stdout.put( "Hello, World of Assembly Language" nl );

Notice the lack of a comma between the string constant and nl. This turns out to be perfectly legal in HLA, though it only applies to certain symbolic string constants; you may not, in general, drop the comma. The chapter on Strings, later in this text, will explain in detail how this works. This discussion appears here because you’ll probably see this “trick” employed by sample code prior to the formal discussion in the chapter on Strings.
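Returning to the identifier rules given at the start of this section, a short shell sketch can make them concrete. It illustrates only the rules as stated in the text (a leading letter or underscore followed by letters, digits, or underscores, plus case neutrality); the function names are hypothetical, and the sketch makes no attempt to reproduce everything the HLA compiler checks.

```shell
#!/bin/bash
# Succeeds if $1 is spelled like an HLA identifier: a letter or
# underscore followed by zero or more letters, digits, or underscores.
is_hla_identifier() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'
}

# Succeeds if two names differ only in letter case. Under HLA's case
# neutrality such a pair may not be declared in the same program.
differ_only_by_case() {
    local a b
    a=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    b=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    [ "$1" != "$2" ] && [ "$a" = "$b" ]
}

is_hla_identifier "helloWorld"      && echo "helloWorld: valid"
is_hla_identifier "2fast"           || echo "2fast: invalid"
differ_only_by_case "Count" "count" && echo "Count/count: conflict"
```

The last line shows the case-neutral rule in action: "Count" and "count" are each valid spellings on their own, but a single program may not declare both.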
2.4
Some Basic HLA Data Declarations HLA provides a wide variety of constant, type, and data declaration statements. Later chapters will cover the declaration section in more detail but it’s important to know how to declare a few simple variables in an HLA program. HLA predefines three different signed integer types: int8, int16, and int32, corresponding to eight-bit (one byte) signed integers, 16-bit (two byte) signed integers, and 32-bit (four byte) signed integers respectively2. Typical variable declarations occur in the HLA static variable section. A typical set of variable declarations takes the following form
    static
        i8:     int8;
        i16:    int16;
        i32:    int32;

"static" is the keyword that begins the variable declaration section. i8, i16, and i32 are the names of the variables to declare here. int8, int16, and int32 are the names of the data types for each declaration.

Figure 2.2  Static Variable Declarations
Those who are familiar with the Pascal language should be comfortable with this declaration syntax. This example demonstrates how to declare three separate integers, i8, i16, and i32. Of course, in a real program you should use variable names that are a little more descriptive. While names like “i8” and “i32” describe the type of the object, they do not describe its purpose. Variable names should describe the purpose of the object.

In the STATIC declaration section, you can also give a variable an initial value that the operating system will assign to the variable when it loads the program into memory. The following figure demonstrates the syntax for this:
2. A discussion of bits and bytes will appear in the next chapter if you are unfamiliar with these terms.
    static
        i8:     int8    := 8;
        i16:    int16   := 1600;
        i32:    int32   := -320000;

The constant assignment operator, ":=", tells HLA that you wish to initialize the specified variable with an initial value. The operand after the constant assignment operator must be a constant whose type is compatible with the variable you are initializing.

Figure 2.3  Static Variable Initialization
It is important to realize that the expression following the assignment operator (“:=”) must be a constant expression. You cannot assign the values of other variables within a STATIC variable declaration.

Those familiar with other high level languages (especially Pascal) should note that you may only declare one variable per statement. That is, HLA does not allow a comma delimited list of variable names followed by a colon and a type identifier. Each variable declaration consists of a single identifier, a colon, a type ID, and a semicolon.

Here is a simple HLA program that demonstrates the use of variables within an HLA program:
    program DemoVars;
    #include( "stdlib.hhf" );

    static
        InitDemo:       int32 := 5;
        NotInitialized: int32;

    begin DemoVars;

        // Display the value of the pre-initialized variable:

        stdout.put( "InitDemo's value is ", InitDemo, nl );

        // Input an integer value from the user and display that value:

        stdout.put( "Enter an integer value: " );
        stdin.get( NotInitialized );
        stdout.put( "You entered: ", NotInitialized, nl );

    end DemoVars;

Program 2.2  Variable Declaration and Use
In addition to STATIC variable declarations, this example introduces three new concepts. First, the stdout.put statement allows multiple parameters. If you specify an integer value, stdout.put will convert that value to the string representation of that integer’s value on output. The second new feature this sample program introduces is the stdin.get statement. This statement reads a value from the standard input device (usually the keyboard), converts the value to an integer, and stores the integer value into the NotInitialized variable. Finally, this program also introduces the syntax for (one form of) HLA comments. The HLA compiler ignores all text from the “//” sequence to the end of the current line. Those familiar with C++ and Delphi should recognize these comments.
2.5 Boolean Values

HLA and the HLA Standard Library provide limited support for boolean objects. You can declare boolean variables, use boolean literal constants, use boolean variables in boolean expressions (e.g., in an IF statement), and you can print the values of boolean variables.

Boolean literal constants consist of the two predefined identifiers true and false. Internally, HLA represents the value true using the numeric value one; HLA represents false using the value zero. Most programs treat zero as false and anything else as true, so HLA’s representations for true and false should prove sufficient.

To declare a boolean variable, you use the boolean data type. HLA uses a single byte (the least amount of memory it can allocate) to represent boolean values. The following example demonstrates some typical declarations:

    static
        BoolVar:    boolean;
        HasClass:   boolean := false;
        IsClear:    boolean := true;
As you can see in this example, you may declare initialized as well as uninitialized variables.

Since boolean variables are byte objects, you can manipulate them using eight-bit registers and any instructions that operate directly on eight-bit values. Furthermore, as long as you ensure that your boolean variables only contain zero and one (for false and true, respectively), you can use the 80x86 AND, OR, XOR, and NOT instructions to manipulate these boolean values (we’ll describe these instructions a little later).

You can print boolean values by making a call to the stdout.put routine, e.g.,

    stdout.put( BoolVar );
This routine prints the text “true” or “false” depending upon the value of the boolean parameter (zero is false, anything else is true). Note that the HLA Standard Library does not allow you to read boolean values via stdin.get.
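As a minimal sketch of these points, the following program declares an initialized boolean variable, prints it, changes it, and then uses it as a stand-alone expression. The program and variable names (DemoBool, IsReady) are hypothetical, and the example borrows the MOV instruction and the IF statement that later sections of this chapter describe.

```hla
program DemoBool;
#include( "stdlib.hhf" );

static
    IsReady: boolean := false;   // hypothetical name, one byte of storage

begin DemoBool;

    // stdout.put prints the text "true" or "false" for boolean operands.

    stdout.put( "IsReady = ", IsReady, nl );

    // Store the literal constant true (the value one) into the variable.

    mov( true, IsReady );

    // A boolean variable may appear by itself as an IF expression.

    if( IsReady ) then

        stdout.put( "IsReady is now ", IsReady, nl );

    endif;

end DemoBool;
```

Because the variable only ever holds the predefined constants false and true, the zero/non-zero test in the IF statement behaves exactly as you would expect.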
2.6 Character Values

HLA lets you declare one-byte ASCII character objects using the char data type. You may initialize character variables with a literal character value by surrounding the character with a pair of apostrophes. The following example demonstrates how to declare and initialize character variables in HLA:

    static
        c:          char;
        LetterA:    char := 'A';
You can print character variables using the stdout.put routine. We’ll return to the subject of character constants a little later.
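A complete program along these lines might look like the following sketch. The names DemoChar and Punct are hypothetical; LetterA comes from the declaration above.

```hla
program DemoChar;
#include( "stdlib.hhf" );

static
    LetterA: char := 'A';
    Punct:   char := '!';    // hypothetical second character variable

begin DemoChar;

    // stdout.put prints a char operand as the character itself.

    stdout.put( "LetterA = ", LetterA, nl );
    stdout.put( "Punct   = ", Punct, nl );

end DemoChar;
```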
2.7 An Introduction to the Intel 80x86 CPU Family

Thus far, you’ve seen a couple of HLA programs that will actually compile and run. However, all the statements utilized to this point have been either data declarations or calls to HLA Standard Library routines. There hasn’t been any real assembly language up to this point. Before we can progress any farther and learn
some real assembly language, a detour is necessary. For unless you understand the basic structure of the Intel 80x86 CPU family, the machine instructions will seem mysterious indeed.

The Intel CPU family is generally classified as a Von Neumann Architecture Machine. Von Neumann computer systems contain three main building blocks: the central processing unit (CPU), memory, and input/output devices (I/O). These three components are connected together using the system bus. The following block diagram shows this relationship:
[Block diagram: the CPU, Memory, and I/O Devices, connected together by the system bus]

Figure 2.4  Von Neumann Computer System Block Diagram
Memory and I/O devices will be the subjects of later chapters; for now, let’s take a look inside the CPU portion of the computer system, at least at the components that are visible to the assembly language programmer.

The most prominent items within the CPU are the registers. The Intel CPU registers can be broken down into four categories: general purpose registers, special purpose application accessible registers, segment registers, and special purpose kernel mode registers. This text will not consider the last two sets of registers. The segment registers are not used much in modern 32-bit operating systems (e.g., Windows, BeOS, and Linux); since this text is geared around programs written for 32-bit operating systems, there is little need to discuss the segment registers. The special purpose kernel mode registers are intended for use by people who write operating systems, debuggers, and other system level tools. Such software construction is well beyond the scope of this text, so once again there is little need to discuss the special purpose kernel mode registers.

The 80x86 (Intel family) CPUs provide several general purpose registers for application use. These include eight 32-bit registers that have the following names:

    EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP

The “E” prefix on each name stands for extended. This prefix differentiates the 32-bit registers from the eight 16-bit registers that have the following names:

    AX, BX, CX, DX, SI, DI, BP, and SP

Finally, the 80x86 CPUs provide eight 8-bit registers that have the following names:

    AL, AH, BL, BH, CL, CH, DL, and DH

Unfortunately, these are not all separate registers. That is, the 80x86 does not provide 24 independent registers. Instead, the 80x86 overlays the 32-bit registers with the 16-bit registers and it overlays the 16-bit registers with the 8-bit registers. The following diagram shows this relationship:
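    32-bit register   16-bit overlay   8-bit overlays
    ---------------   --------------   --------------
    EAX               AX               AH, AL
    EBX               BX               BH, BL
    ECX               CX               CH, CL
    EDX               DX               DH, DL
    ESI               SI
    EDI               DI
    EBP               BP
    ESP               SP

Each 16-bit register occupies the low-order 16 bits of the corresponding 32-bit register. Within AX, BX, CX, and DX, the "H" register is the high-order byte and the "L" register is the low-order byte.

Figure 2.5  80x86 (Intel CPU) General Purpose Registers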
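You can observe this overlap directly. The sketch below is an illustration only: the names DemoOverlap, Low16, and All32 are hypothetical, and it uses the MOV instruction that a later section of this chapter introduces. It clears EAX, writes only to AH, and then copies AX and EAX into memory variables so they can be printed as decimal integers.

```hla
program DemoOverlap;
#include( "stdlib.hhf" );

static
    Low16: int16;    // receives a copy of AX
    All32: int32;    // receives a copy of EAX

begin DemoOverlap;

    mov( 0, eax );       // Clears all 32 bits, so AX, AH, and AL are zero too.
    mov( 1, ah );        // Writes only bits 8..15 of EAX.

    mov( ax, Low16 );    // AX is now 256 (AH=1, AL=0).
    mov( eax, All32 );   // EAX is 256 as well; its upper 16 bits are still zero.

    stdout.put( "AX = ", Low16, ", EAX = ", All32, nl );

end DemoOverlap;
```

Although the program never stores a value into AX or EAX after clearing EAX, both end up holding 256, because storing one into AH changes the byte they share with it.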
The most important thing to note about the general purpose registers is that they are not independent. Modifying one register will modify at least one other register and may modify as many as three other registers. For example, modification of the EAX register may very well modify the AL, AH, and AX registers as well. This fact cannot be overemphasized here. A very common mistake in programs written by beginning assembly language programmers is register value corruption because the programmer did not fully understand the ramifications of the above diagram.

The EFLAGS register is a 32-bit register that encapsulates several single-bit boolean (true/false) values. Most of the bits in the EFLAGS register are either reserved for kernel mode (operating system) functions, or are of little interest to the application programmer. Eight of these bits (or flags) are of interest to application programmers writing assembly language programs. These are the overflow, direction, interrupt disable[3], sign, zero, auxiliary carry, parity, and carry flags. The following diagram shows their layout within the lower 16 bits of the EFLAGS register.
3. Application programs cannot modify the interrupt flag, but we’ll look at this flag later in this text, hence the discussion of this flag here.
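[Diagram: bits 15 down to 0 of the FLAGS register. Reading from the high-order end toward bit 0, the labeled flags are Overflow, Direction, Interrupt, Sign, Zero, Auxiliary Carry, Parity, and Carry. The remaining bits are marked "Not very interesting to application programmers".]

Figure 2.6  Layout of the FLAGS Register (Lower 16 bits of EFLAGS)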
Of the eight flags that are usable by application programmers, four flags in particular are extremely valuable: the overflow, carry, sign, and zero flags. Collectively, we will call these four flags the condition codes[4]. The state of these flags (boolean variables) will let you test the results of previous computations and allow you to make decisions in your programs. For example, after comparing two values, the state of the condition code flags will tell you if one value is less than, equal to, or greater than a second value. The 80x86 CPUs provide special machine instructions that let you test the flags, alone or in various combinations.

The last register of interest is the EIP (instruction pointer) register. This 32-bit register contains the memory address of the next machine instruction to execute. Although you will manipulate this register directly in your programs, the instructions that modify its value treat this register as an implicit operand. Therefore, you will not need to remember much about this register since the 80x86 instruction set effectively hides it from you.

One important fact that comes as a surprise to those just learning assembly language is that almost all calculations on the 80x86 CPU must involve a register. For example, to add two (memory) variables together, storing the sum into a third location, you must load one of the memory operands into a register, add the second operand to the value in the register, and then store the register away in the destination memory location. Registers are a middleman in nearly every calculation. Therefore, registers are very important in 80x86 assembly language programs.

Another thing you should be aware of is that although the general purpose registers have the name “general purpose”, you should not infer that you can use any register for any purpose. The SP/ESP register, for example, has a very special purpose (it’s the stack pointer) that effectively prevents you from using it for any other purpose. Likewise, the BP/EBP register has a special purpose that limits its usefulness as a general purpose register. All the 80x86 registers have their own special purposes that limit their use in certain contexts. For the time being, you should simply avoid the use of the ESP and EBP registers for generic calculations and keep in mind that the remaining registers are not completely interchangeable in your programs.
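The "register as middleman" pattern described in this section can be sketched as follows. This is only an illustration: the names DemoSum, ValueA, ValueB, and Total are hypothetical, the choice of EAX is arbitrary, and the MOV and ADD instructions are the ones the next section introduces.

```hla
program DemoSum;
#include( "stdlib.hhf" );

static
    ValueA: int32 := 25;
    ValueB: int32 := 17;
    Total:  int32;

begin DemoSum;

    mov( ValueA, eax );    // Step 1: load one memory operand into a register.
    add( ValueB, eax );    // Step 2: add the second operand to the register.
    mov( eax, Total );     // Step 3: store the register in the destination.

    stdout.put( "Total = ", Total, nl );    // Prints "Total = 42".

end DemoSum;
```

Three instructions are needed for what a high level language would write as a single assignment statement, because the addition cannot take both of its operands from memory.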
2.8 Some Basic Machine Instructions

The 80x86 CPUs provide anywhere from just over a hundred to many thousands of different machine instructions, depending on how you define a machine instruction. Even at the low end of the count (greater than 100), it appears as though there are far too many machine instructions to learn in a short period of time. Fortunately,
4. Technically the parity flag is also a condition code, but we will not use that flag in this text.
you don’t need to know all the machine instructions. In fact, most assembly language programs probably use around 30 different machine instructions[5]. Indeed, you can certainly write several meaningful programs with only a small handful of machine instructions. The purpose of this section is to provide a small handful of machine instructions so you can start writing simple HLA assembly language programs right away.

Without question, the MOV instruction is the most often-used assembly language statement. In a typical program, anywhere from 25% to 40% of the instructions are MOV instructions. As its name suggests, this instruction moves data from one location to another[6]. The HLA syntax for this instruction is

    mov( source_operand, destination_operand );

The source_operand can be a register, a memory variable, or a constant. The destination_operand may be a register or a memory variable. Technically the 80x86 instruction set does not allow both operands to be memory variables; HLA, however, will automatically translate a MOV instruction with two 16- or 32-bit memory operands into a pair of instructions that will copy the data from one location to another. In a high level language like Pascal or C/C++, the MOV instruction is roughly equivalent to the following assignment statement:

    destination_operand = source_operand ;

Perhaps the major restriction on the MOV instruction’s operands is that they must both be the same size. That is, you can move data between two eight-bit objects, between two 16-bit objects, or between two 32-bit objects; you may not, however, mix the sizes of the operands. The following table lists all the legal combinations:
Table 1: Legal 80x86 MOV Instruction Operands

    Source          Destination
    ------          -----------
    Reg8 (a)        Reg8
    Reg8            Mem8
    Mem8            Reg8
    constant (b)    Reg8
    constant        Mem8
    Reg16           Reg16
    Reg16           Mem16
    Mem16           Reg16
    constant        Reg16
    constant        Mem16
    Reg32           Reg32
    Reg32           Mem32
    Mem32           Reg32
    constant        Reg32
    constant        Mem32

a. The suffix denotes the size of the register or memory location.
b. The constant must be small enough to fit in the specified destination operand.

5. Different programs may use a different set of 30 instructions, but few programs use more than 30 distinct instructions.
6. Technically, MOV actually copies data from one location to another. It does not destroy the original data in the source operand. Perhaps a better name for this instruction should have been COPY. Alas, it’s too late to change it now.

You should study this table carefully. Most of the general purpose 80x86 instructions use this same syntax. Note that in addition to the forms above, the HLA MOV instruction lets you specify two memory operands as the source and destination. However, this special translation that HLA provides only applies to the MOV instruction; it does not generalize to the other instructions.

The 80x86 ADD and SUB instructions let you add and subtract two operands. Their syntax is nearly identical to the MOV instruction:

    add( source_operand, destination_operand );
    sub( source_operand, destination_operand );

The ADD and SUB operands must take the same form as the MOV instruction, listed in the table above[7]. The ADD instruction does the following:

    destination_operand = destination_operand + source_operand ;
    destination_operand += source_operand;    // For those who prefer C syntax

Similarly, the SUB instruction does the calculation:

    destination_operand = destination_operand - source_operand ;
    destination_operand -= source_operand ;   // For C fans.

With nothing more than these three instructions, plus the HLA control structures that the next section discusses, you can actually write some sophisticated programs. Here’s a sample HLA program that demonstrates these three instructions:
    program DemoMOVaddSUB;
    #include( "stdlib.hhf" );

    static
        i8:     int8    := -8;
        i16:    int16   := -16;
        i32:    int32   := -32;

    begin DemoMOVaddSUB;

        // First, print the initial values
        // of our variables.

        stdout.put
        (
            nl,
            "Initialized values: i8=", i8,
            ", i16=", i16,
            ", i32=", i32,
            nl
        );

        // Compute the absolute value of the
        // three different variables and
        // print the result.
        // Note, since all the numbers are
        // negative, we have to negate them.
        // Using only the MOV, ADD, and SUB
        // instruction, we can negate a value
        // by subtracting it from zero.

        mov( 0, al );       // Compute i8 := -i8;
        sub( i8, al );
        mov( al, i8 );

        mov( 0, ax );       // Compute i16 := -i16;
        sub( i16, ax );
        mov( ax, i16 );

        mov( 0, eax );      // Compute i32 := -i32;
        sub( i32, eax );
        mov( eax, i32 );

        // Display the absolute values:

        stdout.put
        (
            nl,
            "After negation: i8=", i8,
            ", i16=", i16,
            ", i32=", i32,
            nl
        );

        // Demonstrate ADD and constant-to-memory
        // operations:

        add( 32323200, i32 );
        stdout.put( nl, "After ADD: i32=", i32, nl );

    end DemoMOVaddSUB;

Program 2.3  Demonstration of MOV, ADD, and SUB Instructions

7. Remember, though, that ADD and SUB do not support memory-to-memory operations.

2.9 Some Basic HLA Control Structures

The MOV, ADD, and SUB instructions, while valuable, aren’t sufficient to let you write meaningful programs. You will need to complement these instructions with the ability to make decisions and create loops in your HLA programs before you can write anything other than a trivial program. HLA provides several high level control structures that are very similar to control structures found in high level languages. These
include IF..THEN..ELSEIF..ELSE..ENDIF, WHILE..ENDWHILE, REPEAT..UNTIL, and so on. By learning these statements you will be armed and ready to write some real programs.

Before discussing these high level control structures, it’s important to point out that these are not real 80x86 assembly language statements. HLA compiles these statements into a sequence of one or more real assembly language statements for you. Later in this text, you’ll learn how HLA compiles the statements and you’ll learn how to write pure assembly language code that doesn’t use them. However, you’ll need to learn many new concepts before you get to that point, so we’ll stick with these high level language statements for now since you’re probably already familiar with statements like these from your exposure to high level languages.

Another important fact to mention is that HLA’s high level control structures are not as high level as they first appear. The purpose behind HLA’s high level control structures is to let you start writing assembly language programs as quickly as possible, not to let you avoid the use of real assembly language altogether. You will soon discover that these statements have some severe restrictions associated with them and you will quickly outgrow their capabilities (at least the restricted forms appearing in this section). This is intentional. Once you reach a certain level of comfort with HLA’s high level control structures and decide you need more power than they have to offer, it’s time to move on and learn the real 80x86 instructions behind these statements.
2.9.1 Boolean Expressions in HLA Statements

Several HLA statements require a boolean (true or false) expression to control their execution. Examples include the IF, WHILE, and REPEAT..UNTIL statements. The syntax for these boolean expressions represents the greatest limitation of the HLA high level control structures. This is one area where your familiarity with a high level language will work against you: you’ll want to use the same boolean expressions you use in a high level language, and HLA only supports some basic forms.

HLA boolean expressions always take the following forms[8]:

    flag_specification
    !flag_specification
    register
    !register
    Boolean_variable
    !Boolean_variable
    mem_reg relop mem_reg_const
    register in LowConst..HiConst
    register not in LowConst..HiConst

A flag_specification may be one of the following symbols:

• @c    carry:        True if the carry is set (1), false if the carry is clear (0).
• @nc   no carry:     True if the carry is clear (0), false if the carry is set (1).
• @z    zero:         True if the zero flag is set, false if it is clear.
• @nz   not zero:     True if the zero flag is clear, false if it is set.
• @o    overflow:     True if the overflow flag is set, false if it is clear.
• @no   no overflow:  True if the overflow flag is clear, false if it is set.
• @s    sign:         True if the sign flag is set, false if it is clear.
• @ns   no sign:      True if the sign flag is clear, false if it is set.
8. Technically, there are a few more, advanced, forms, but you’ll have to wait a few chapters before seeing these additional formats.
The use of the flag values in a boolean expression is somewhat advanced. You will begin to see how to use these boolean expression operands in the next chapter.

A register operand can be any of the 8-bit, 16-bit, or 32-bit general purpose registers. The expression evaluates false if the register contains a zero; it evaluates true if the register contains a non-zero value.

If you specify a boolean variable as the expression, the program tests it for zero (false) or non-zero (true). Since HLA uses the values zero and one to represent false and true, respectively, the test works in an intuitive fashion. Note that HLA requires that stand-alone variables be of type boolean. HLA rejects other data types. If you want to test some other type against zero/not zero, then use the general boolean expression discussed next.

The most general form of an HLA boolean expression has two operands and a relational operator. The following table lists the legal combinations:
Table 2: Legal Boolean Expressions

    Left Operand          Relational Operator     Right Operand
    ------------          -------------------     -------------
    Memory Variable       = or ==                 Memory Variable,
    or Register           <> or !=                Register, or
                          <                       Constant
                          <=
                          >
                          >=

Note that both operands cannot be memory operands. In fact, if you think of the Right Operand as the source operand and the Left Operand as the destination operand, then the two operands must be the same as those allowed for the ADD and SUB instructions.

Also like the ADD and SUB instructions, the two operands must be the same size. That is, they must both be eight-bit operands, they must both be 16-bit operands, or they must both be 32-bit operands. If the Right Operand is a constant, its value must be in the range that is compatible with the Left Operand.

There is one other issue of which you need to be aware. If the Left Operand is a register and the Right Operand is a positive constant or another register, HLA uses an unsigned comparison. The next chapter will discuss the ramifications of this; for the time being, do not compare negative values in a register against a constant or another register. You may not get an intuitive result.

The IN and NOT IN operators let you test a register to see if it is within a specified range. For example, the expression “EAX in 2000..2099” evaluates true if the value in the EAX register is between 2000 and 2099 (inclusive). The NOT IN (two words) operator lets you check to see if the value in a register is outside the specified range. For example, “AL not in ‘a’..’z’” evaluates true if the character in the AL register is not a lower case alphabetic character.

Here are some examples of legal boolean expressions in HLA:

    @c
    Bool_var
    al
    ESI
    EAX < EBX
    EBX > 5
    i32 < -2
    i8 > 128
    al < i8
    eax in 1..100
    ch not in 'a'..'z'
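Because the IN and NOT IN operators only accept a register as their left operand, a value held in a memory variable must first be copied into a register. The following sketch shows one way to do this; the names DemoInRange and Letter are hypothetical, and it uses the IF..ELSE..ENDIF statement that the next section describes.

```hla
program DemoInRange;
#include( "stdlib.hhf" );

static
    Letter: char := 'q';    // hypothetical character to classify

begin DemoInRange;

    // Range tests need a register operand, so copy the character into AL.

    mov( Letter, al );

    if( al in 'a'..'z' ) then

        stdout.put( "'", Letter, "' is a lower case letter", nl );

    else

        stdout.put( "'", Letter, "' is not a lower case letter", nl );

    endif;

end DemoInRange;
```

Changing the initial value of Letter to a digit or an upper case character selects the ELSE branch instead.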
2.9.2 The HLA IF..THEN..ELSEIF..ELSE..ENDIF Statement

The HLA IF statement uses the following syntax:

    if( expression ) then
        sequence of one or more statements
    elseif( expression ) then
        sequence of one or more statements
    else
        sequence of one or more statements
    endif;

The elseif clause is optional. Zero or more elseif clauses may appear in an if statement. If more than one elseif clause appears, all the elseif clauses must appear before the else clause (or before the endif if there is no else clause).

The else clause is optional. At most one else clause may appear within an if statement and it must be the last clause before the endif.

Figure 2.7  HLA IF Statement Syntax
The expressions appearing in this statement must take one of the forms from the previous section. If the associated expression is true, the code after the THEN executes, otherwise control transfers to the next ELSEIF or ELSE clause in the statement.

Since the ELSEIF and ELSE clauses are optional, an IF statement could take the form of a single IF..THEN clause, followed by a sequence of statements, and a closing ENDIF clause. The following is an example of just such a statement:

    if( eax = 0 ) then

        stdout.put( "error: NULL value", nl );

    endif;
If, during program execution, the expression evaluates true, then the code between the THEN and the ENDIF executes. If the expression evaluates false, then the program skips over the code between the THEN and the ENDIF.

Another common form of the IF statement has a single ELSE clause. The following is an example of an IF statement with an optional ELSE:

    if( eax = 0 ) then

        stdout.put( "error: NULL pointer encountered", nl );

    else

        stdout.put( "Pointer is valid", nl );

    endif;
If the expression evaluates true, the code between the THEN and the ELSE executes; otherwise the code between the ELSE and the ENDIF clauses executes.

You can create sophisticated decision-making logic by incorporating the ELSEIF clause into an IF statement. For example, if the CH register contains a character value, you can select from a menu of items using code like the following:

    if( ch = 'a' ) then

        stdout.put( "You selected the 'a' menu item", nl );

    elseif( ch = 'b' ) then

        stdout.put( "You selected the 'b' menu item", nl );

    elseif( ch = 'c' ) then

        stdout.put( "You selected the 'c' menu item", nl );

    else

        stdout.put( "Error: illegal menu item selection", nl );

    endif;
Although this simple example doesn’t demonstrate it, HLA does not require an ELSE clause at the end of a sequence of ELSEIF clauses. However, when making multi-way decisions, it’s always a good idea to provide an ELSE clause just in case an error arises. Even if you think it’s impossible for the ELSE clause to execute, just keep in mind that future modifications to the code could possibly void this assertion, so it’s a good idea to have error reporting statements built into your code.
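The fragments above assume that a register already holds the value to test. As a self-contained sketch, the following program reads an integer into a memory variable and classifies it with an IF..ELSEIF..ELSE..ENDIF statement. The names DemoIf and UserValue are hypothetical. Note that the left operand of each comparison is an int32 memory variable rather than a register, which avoids the unsigned comparison issue mentioned in the previous section.

```hla
program DemoIf;
#include( "stdlib.hhf" );

static
    UserValue: int32;    // hypothetical variable holding the user's input

begin DemoIf;

    stdout.put( "Enter an integer value: " );
    stdin.get( UserValue );

    if( UserValue < 0 ) then

        stdout.put( UserValue, " is negative", nl );

    elseif( UserValue = 0 ) then

        stdout.put( "You entered zero", nl );

    else

        // Neither of the tests above succeeded, so the value is positive.

        stdout.put( UserValue, " is positive", nl );

    endif;

end DemoIf;
```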
2.9.3 The WHILE..ENDWHILE Statement

The WHILE statement uses the following basic syntax:
    while( expression ) do
        sequence of one or more statements      (the loop body)
    endwhile;

The expression in the WHILE statement has the same restrictions as the IF statement.

Figure 2.8  HLA While Statement Syntax
This statement evaluates the boolean expression. If it is false, control immediately transfers to the first statement following the ENDWHILE clause. If the value of the expression is true, then control falls through to the body of the loop. After the loop body executes, control transfers back to the top of the loop where the WHILE statement retests the loop control expression. This process repeats until the expression evaluates false.

Note that the WHILE loop, like its high level language siblings, tests for loop termination at the top of the loop. Therefore, it is quite possible that the statements in the body of the loop will not execute (if the expression is false when the code first executes the WHILE statement). Also note that the body of the WHILE loop must, at some point, modify the value of the boolean expression or an infinite loop will result.

    mov( 0, i );
    while( i < 10 ) do

        stdout.put( "i=", i, nl );
        add( 1, i );

    endwhile;
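The fragment above assumes that i has been declared elsewhere. A complete program built around a WHILE loop might look like the following sketch, which adds up the integers from 1 through 10. The names DemoWhile, Counter, and RunningTotal are hypothetical.

```hla
program DemoWhile;
#include( "stdlib.hhf" );

static
    Counter:      int32 := 1;
    RunningTotal: int32 := 0;

begin DemoWhile;

    while( Counter <= 10 ) do

        // ADD cannot take two memory operands, so pass the
        // counter through EAX on its way to the total.

        mov( Counter, eax );
        add( eax, RunningTotal );

        // Modify the loop control variable so the loop terminates.

        add( 1, Counter );

    endwhile;

    stdout.put( "The sum of 1..10 is ", RunningTotal, nl );    // Prints 55.

end DemoWhile;
```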
2.9.4 The FOR..ENDFOR Statement

The HLA FOR loop takes the following general form:

    for( Initial_Stmt; Termination_Expression; Post_Body_Statement ) do

        << Loop Body >>

    endfor;

This is equivalent to the following WHILE statement:

    Initial_Stmt;
    while( Termination_Expression ) do

        << Loop Body >>

        Post_Body_Statement;

    endwhile;
Initial_Stmt can be any single HLA/80x86 instruction. Generally this statement initializes a register or memory location (the loop counter) with zero or some other initial value. Termination_Expression is an HLA boolean expression (same format that WHILE allows). This expression determines whether the loop body will execute. The Post_Body_Statement executes at the bottom of the loop (as shown in the WHILE example above). This is a single HLA statement. Usually it is an instruction like ADD that modifies the value of the loop control variable.

The following gives a complete example:

    for( mov( 0, i ); i < 10; add( 1, i )) do

        stdout.put( "i=", i, nl );

    endfor;

    // The above, rewritten as a while loop, becomes:

    mov( 0, i );
    while( i < 10 ) do

        stdout.put( "i=", i, nl );
        add( 1, i );

    endwhile;
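The Post_Body_Statement need not step by one. As a small variation on the example above, the following sketch (with the hypothetical names DemoFor and LoopIndex) declares its own loop control variable and prints the even numbers from 0 through 10.

```hla
program DemoFor;
#include( "stdlib.hhf" );

static
    LoopIndex: int32;    // hypothetical loop control variable

begin DemoFor;

    // Initialize to zero, continue while LoopIndex <= 10,
    // and add two after each pass through the loop body.

    for( mov( 0, LoopIndex ); LoopIndex <= 10; add( 2, LoopIndex )) do

        stdout.put( "LoopIndex = ", LoopIndex, nl );

    endfor;

end DemoFor;
```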
2.9.5 The REPEAT..UNTIL Statement

The HLA REPEAT..UNTIL statement uses the following syntax:

    repeat
        sequence of one or more statements      (the loop body)
    until( expression );

The expression in the UNTIL clause has the same restrictions as the IF statement.

Figure 2.9  HLA Repeat..Until Statement Syntax
The HLA REPEAT..UNTIL statement tests for loop termination at the bottom of the loop. Therefore, the statements in the loop body always execute at least once. Upon encountering the UNTIL clause, the program will evaluate the expression and repeat the loop if the expression is false (that is, it repeats while false). If the expression evaluates true, control transfers to the first statement following the UNTIL clause.

The following simple example demonstrates one use for the REPEAT..UNTIL statement:

    mov( 10, ecx );
    repeat

        stdout.put( "ecx = ", ecx, nl );
        sub( 1, ecx );

    until( ecx = 0 );
If the loop body will always execute at least once, then it is more efficient to use a REPEAT..UNTIL loop rather than a WHILE loop.
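One situation where the loop body must run at least once is when the value being tested is not known until the body reads it. The following sketch, which uses the hypothetical names DemoRepeat, UserValue, and RunningTotal, keeps a running total of the integers the user enters and stops after the user enters zero.

```hla
program DemoRepeat;
#include( "stdlib.hhf" );

static
    UserValue:    int32;
    RunningTotal: int32 := 0;

begin DemoRepeat;

    repeat

        stdout.put( "Enter an integer (0 to stop): " );
        stdin.get( UserValue );

        // Add the new value to the total by way of EAX, since
        // ADD does not allow two memory operands.

        mov( UserValue, eax );
        add( eax, RunningTotal );

    until( UserValue = 0 );

    stdout.put( "Total = ", RunningTotal, nl );

end DemoRepeat;
```

The test at the bottom of the loop cannot be evaluated sensibly until stdin.get has stored something into UserValue, which is why REPEAT..UNTIL fits this case better than WHILE.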
2.9.6 The BREAK and BREAKIF Statements

The BREAK and BREAKIF statements provide the ability to prematurely exit from a loop. They use the following syntax:

    break;
    breakif( expression );

The expression in the BREAKIF statement has the same restrictions as the IF statement.

Figure 2.10  HLA Break and Breakif Syntax

The BREAK statement exits the loop that immediately contains the break. The BREAKIF statement evaluates the boolean expression and terminates the containing loop if the expression evaluates true.
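As an illustration, the following sketch uses BREAKIF to leave a WHILE loop before its own termination test would end it. The names DemoBreakif, Counter, and StopAt are hypothetical. Since a boolean expression cannot have two memory operands, the counter is copied into EAX before the comparison.

```hla
program DemoBreakif;
#include( "stdlib.hhf" );

static
    Counter: int32 := 0;
    StopAt:  int32 := 5;    // hypothetical early-exit value

begin DemoBreakif;

    while( Counter < 10 ) do

        // Leave the loop early once Counter reaches StopAt.

        mov( Counter, eax );
        breakif( eax = StopAt );

        stdout.put( "Counter = ", Counter, nl );
        add( 1, Counter );

    endwhile;

    // Control arrives here with Counter equal to 5, not 10.

    stdout.put( "Loop ended with Counter = ", Counter, nl );

end DemoBreakif;
```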
2.9.7 The FOREVER..ENDFOR Statement

The FOREVER statement uses the following syntax:

    forever
        sequence of one or more statements      (Loop Body)
    endfor;

Figure 2.11 HLA Forever Loop Syntax
This statement creates an infinite loop. You may also use the BREAK and BREAKIF statements along with FOREVER..ENDFOR to create a loop that tests for loop termination in the middle of the loop. Indeed, this is probably the most common use of this loop as the following example demonstrates:

    forever

        stdout.put( "Enter an integer less than 10: " );
        stdin.get( i );
        breakif( i < 10 );
        stdout.put( "The value needs to be less than 10!", nl );

    endfor;
2.9.8 The TRY..EXCEPTION..ENDTRY Statement

The HLA TRY..EXCEPTION..ENDTRY statement provides very powerful exception handling capabilities. The syntax for this statement is the following:

    try
        sequence of one or more statements      (Statements to test)
    exception( exceptionID )
        sequence of one or more statements      (At least one exception handling block.)
    exception( exceptionID )
        sequence of one or more statements      (Zero or more (optional) exception handling blocks.)
    endtry;

Figure 2.12 HLA Try..Except..Endtry Statement Syntax
The TRY..ENDTRY statement protects a block of statements during execution. If these statements, between the TRY clause and the first EXCEPTION clause, execute without incident, control transfers to the first statement after the ENDTRY immediately after executing the last statement in the protected block. If an error (exception) occurs, then the program interrupts control at the point of the exception (that is, the program raises an exception). Each exception has an unsigned integer constant associated with it, known as the exception ID. The “excepts.hhf” header file in the HLA Standard Library predefines several exception IDs, although you may create new ones for your own purposes. When an exception occurs, the system compares the exception ID against the values appearing in each of the one or more EXCEPTION clauses following the protected code. If the current exception ID matches one of the EXCEPTION values, control continues with the block of statements immediately following that EXCEPTION. After the exception handling code completes execution, control transfers to the first statement following the ENDTRY. If an exception occurs and there is no active TRY..ENDTRY statement, or the active TRY..ENDTRY statements do not handle the specific exception, the program will abort with an error message. The following sample program demonstrates how to use the TRY..ENDTRY statement to protect the program from bad user input:
    repeat

        mov( false, GoodInteger );    // Note: GoodInteger must be a boolean var.
        try

            stdout.put( "Enter an integer: " );
            stdin.get( i );
            mov( true, GoodInteger );

          exception( ex.ConversionError );

            stdout.put( "Illegal numeric value, please re-enter", nl );

          exception( ex.ValueOutOfRange );

            stdout.put( "Value is out of range, please re-enter", nl );

        endtry;

    until( GoodInteger );
The REPEAT..UNTIL loop repeats this code as long as there is an error during input. Should an exception occur, control transfers to the EXCEPTION clauses to see if a conversion error (e.g., illegal characters in the number) or a numeric overflow occurs. If either of these exceptions occur, then they print the appropriate message and control falls out of the TRY..ENDTRY statement and the REPEAT..UNTIL loop repeats since GoodInteger was never set to true. If a different exception occurs (one that is not handled in this code), then the program aborts with the specified error message9. Please see the “excepts.hhf” header file that accompanies the HLA release for a complete list of all the exception ID codes. The HLA documentation will describe the purpose of each of these exception codes.
2.10 Introduction to the HLA Standard Library

There are two reasons HLA is much easier to learn and use than standard assembly language. The first reason is HLA’s high level syntax for declarations and control structures. This HLA feature leverages your high level language knowledge, reducing the need to learn arcane syntax, allowing you to learn assembly language more efficiently. The other half of the equation is the HLA Standard Library. The HLA Standard Library provides a lot of commonly needed, easy to use, assembly language routines that you can call without having to write this code yourself (or even learn how to write yourself). This eliminates one of the larger stumbling blocks many people have when learning assembly language: the need for sophisticated I/O and support code in order to write basic statements. Prior to the advent of a standardized assembly language library, it often took weeks of study before a new assembly language programmer could do as much as print a string to the display. With the HLA Standard Library, this roadblock is removed and you can concentrate on learning assembly language concepts rather than learning low-level I/O details that are specific to a given operating system.

A wide variety of library routines is only part of HLA’s support. After all, assembly language libraries have been around for quite some time10. HLA’s Standard Library continues the HLA tradition by providing a high level language interface to these routines. Indeed, the HLA language itself was originally designed specifically to allow the creation of a high-level accessible set of library routines11. This high level interface, combined with the high level nature of many of the routines in the library, packs a surprising amount of power in an easy to use package.

9. An experienced programmer may wonder why this code uses a boolean variable rather than a BREAKIF statement to exit the REPEAT..UNTIL loop. There are some technical reasons for this that you will learn about later in this text.
10. E.g., the UCR Standard Library for 80x86 Assembly Language Programmers.
11. HLA was created because MASM was insufficient to support the creation of the UCR StdLib v2.0.

The HLA Standard Library consists of several modules organized by category. The following table lists many of the modules that are available12:
Table 3: HLA Standard Library Modules

    Name        Description
    args        Command line parameter parsing support routines.
    conv        Various conversions between strings and other values.
    cset        Character set functions.
    DateTime    Calendar, date, and time functions.
    excepts     Exception handling routines.
    fileio      File input and output routines.
    hla         Special HLA constants and other values.
    Linux       Linux system calls (HLA Linux version only).
    math        Transcendental and other mathematical functions.
    memory      Memory allocation, deallocation, and support code.
    misctypes   Miscellaneous data types.
    patterns    The HLA pattern matching library.
    rand        Pseudo-random number generators and support code.
    stdin       User input routines.
    stdout      Provides user output and several other support routines.
    stdlib      A special include file that links in all HLA standard library modules.
    strings     HLA’s powerful string library.
    tables      Table (associative array) support routines.
    win32       Constants used in Windows calls (HLA Win32 version only).
    x86         Constants and other items specific to the 80x86 CPU.

Later sections of this text will explain many of these modules in greater detail. This section will concentrate on the most important routines (at least to beginning HLA programmers), the stdio library.
12. Since the HLA Standard Library is expanding, this list is probably out of date. Please see the HLA documentation for a current list of Standard Library modules.
2.10.1 Predefined Constants in the STDIO Module

Perhaps the first place to start is with a description of some common constants that the STDIO module defines for you. You have already seen one of these constants in code appearing in this chapter. Consider the following (typical) example:

    stdout.put( "Hello World", nl );
The nl appearing at the end of this statement stands for newline. The nl identifier is not a special HLA reserved word, nor is it specific to the stdout.put statement. Instead, it’s simply a predefined constant that corresponds to the string containing the standard end of line sequence (this is a carriage return/line feed pair under Windows or just a line feed under Linux). In addition to the nl constant, the HLA standard I/O library module defines several other useful character constants. They are

• stdio.bell    The ASCII bell character. Beeps the speaker when printed.
• stdio.bs      The ASCII backspace character.
• stdio.tab     The ASCII tab character.
• stdio.eoln    A linefeed character (even under Windows).
• stdio.lf      The ASCII linefeed character.
• stdio.cr      The ASCII carriage return character.
Except for nl, these characters appear in the stdio namespace (and, therefore, require the “stdio.” prefix). The placement of these ASCII constants within the stdio namespace is to help avoid naming conflicts with your own variables. The nl name does not appear within a namespace because you will use it very often and typing stdio.nl would get tiresome very quickly.
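A short sketch of these constants in use follows. The program name ConstDemo and the text being printed are made up for illustration; the constants stdio.tab, stdio.bell, and nl are the ones listed above.

```hla
program ConstDemo;
#include( "stdlib.hhf" );

begin ConstDemo;

    // Separate two columns with a tab character, then end the line.

    stdout.put( "Name", stdio.tab, "Value", nl );
    stdout.put( "count", stdio.tab, "10", nl );

    // Emit the bell character (whether this is audible depends
    // on the console).

    stdout.put( stdio.bell );

end ConstDemo;
```

Note that only nl appears without the "stdio." prefix; the other constants need it because they live in the stdio namespace.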
2.10.2 Standard In and Standard Out

Many of the HLA I/O routines have a stdin or stdout prefix. Technically, this means that the standard library defines these names in a namespace13. In practice, this prefix suggests where the input is coming from (the standard input device) or going to (the standard output device). By default, the standard input device is the system keyboard. Likewise, the default standard output device is the console display. So, in general, statements that have stdin or stdout prefixes will read and write data on the console device.

When you run a program from the command line window (or shell), you have the option of redirecting the standard input and/or standard output devices. A command line parameter of the form “>outfile” redirects the standard output device to the specified file (outfile). A command line parameter of the form “

        0 ) do

        if( ColCnt = 8 ) then

            stdout.newln();
            mov( 0, ColCnt );

        endif;
        stdout.put( i32:5 );
        sub( 1, i32 );
        add( 1, ColCnt );

    endwhile;
    stdout.put( nl );

end NumsInColumns2;

Program 2.5 Demonstration of stdout.put Field Width Specification
The stdout.put routine is capable of much more than the few attributes this section describes. This text will introduce those additional capabilities as appropriate.
2.10.7 The stdin.getc Routine. The stdin.getc routine reads the next available character from the standard input device’s input buffer17. It returns this character in the CPU’s AL register. The following example program demonstrates a simple use of this routine:
16. Note that you cannot specify a padding character when using the stdout.put routine; the padding character defaults to the space character. If you need to use a different padding character, call the stdout.putiXSize routines. 17. “Buffer” is just a fancy term for an array.
program charInput;
#include( "stdlib.hhf" );

var
    counter: int32;

begin charInput;

    // The following repeats as long as the user
    // confirms the repetition.

    repeat

        // Print out 14 values.

        mov( 14, counter );
        while( counter > 0 ) do

            stdout.put( counter:3 );
            sub( 1, counter );

        endwhile;

        // Wait until the user enters 'y' or 'n'.

        stdout.put( nl, nl, "Do you wish to see it again? (y/n):" );
        forever

            stdin.readLn();
            stdin.getc();
            breakif( al = 'n' );
            breakif( al = 'y' );
            stdout.put( "Error, please enter only 'y' or 'n': " );

        endfor;
        stdout.newln();

    until( al = 'n' );

end charInput;

Program 2.6 Demonstration of the stdin.getc() Routine
This program uses the stdin.readLn routine to force a new line of input from the user. A description of stdin.readLn appears just a little later in this chapter.
2.10.8 The stdin.getiX Routines

The stdin.geti8, stdin.geti16, and stdin.geti32 routines read eight, 16, and 32-bit signed integer values from the standard input device. These routines return their values in the AL, AX, or EAX register, respectively. They provide the standard mechanism for reading signed integer values from the user in HLA. Like the stdin.getc routine, these routines read a sequence of characters from the standard input buffer. They begin by skipping over any white space characters (spaces, tabs, etc.) and then convert the following stream of decimal digits (with an optional, leading, minus sign) into the corresponding integer. These routines raise an exception (that you can trap with the TRY..ENDTRY statement) if the input sequence is not a valid integer string or if the user input is too large to fit in the specified integer size. Note that values read by stdin.geti8 must be in the range -128..+127; values read by stdin.geti16 must be in the range -32,768..+32,767; and values read by stdin.geti32 must be in the range -2,147,483,648..+2,147,483,647. The following sample program demonstrates the use of these routines:
program intInput;
#include( "stdlib.hhf" );

var
    i8:  int8;
    i16: int16;
    i32: int32;

begin intInput;

    // Read integers of varying sizes from the user:

    stdout.put( "Enter a small integer between -128 and +127: " );
    stdin.geti8();
    mov( al, i8 );

    stdout.put( "Enter a small integer between -32768 and +32767: " );
    stdin.geti16();
    mov( ax, i16 );

    stdout.put( "Enter an integer between +/- 2 billion: " );
    stdin.geti32();
    mov( eax, i32 );

    // Display the input values.

    stdout.put
    (
        nl,
        "Here are the numbers you entered:", nl, nl,
        "Eight-bit integer: ", i8:12, nl,
        "16-bit integer: ", i16:12, nl,
        "32-bit integer: ", i32:12, nl
    );

end intInput;

Program 2.7 stdin.getiX Example Code
You should compile and run this program and test what happens when you enter a value that is out of range or enter an illegal string of characters.
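If you would rather trap those errors than let the program abort, one possible approach is to wrap the call in the same TRY..ENDTRY pattern shown earlier in this chapter. The following is only a sketch under that assumption: the program name SafeInt8 is hypothetical, and it reuses the ex.ConversionError and ex.ValueOutOfRange exception IDs and the GoodInteger flag from the earlier example.

```hla
program SafeInt8;
#include( "stdlib.hhf" );

static
    i8:          int8;
    GoodInteger: boolean;

begin SafeInt8;

    repeat

        mov( false, GoodInteger );
        try

            stdout.put( "Enter an integer between -128 and +127: " );
            stdin.geti8();
            mov( al, i8 );
            mov( true, GoodInteger );   // Reached only if geti8 succeeded.

          exception( ex.ConversionError );

            stdout.put( "Illegal numeric value, please re-enter", nl );

          exception( ex.ValueOutOfRange );

            stdout.put( "Value is out of range, please re-enter", nl );

        endtry;

    until( GoodInteger );

    stdout.put( "You entered ", i8, nl );

end SafeInt8;
```

Any exception other than the two listed is not handled by this code, so it would still abort the program as described in the TRY..ENDTRY section.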
2.10.9 The stdin.readLn and stdin.flushInput Routines

Whenever you call an input routine like stdin.getc or stdin.geti32, the program does not necessarily read the value from the user at that moment. Instead, the HLA Standard Library buffers the input by reading a whole line of text from the user. Calls to input routines will fetch data from this input buffer until the buffer is empty. While this buffering scheme is efficient and convenient, sometimes it can be confusing. Consider the following code sequence:

    stdout.put( "Enter a small integer between -128 and +127: " );
    stdin.geti8();
    mov( al, i8 );
    stdout.put( "Enter a small integer between -32768 and +32767: " );
    stdin.geti16();
    mov( ax, i16 );

Intuitively, you would expect the program to print the first prompt message, wait for user input, print the second prompt message, and wait for the second user input. However, this isn’t exactly what happens. For example if you run this code (from the sample program in the previous section) and enter the text “123 456” in response to the first prompt, the program will not stop for additional user input at the second prompt. Instead, it will read the second integer (456) from the input buffer read during the execution of the stdin.geti8 call.

In general, the stdin routines only read text from the user when the input buffer is empty. As long as the input buffer contains additional characters, the input routines will attempt to read their data from the buffer. You may take advantage of this behavior by writing code sequences such as the following:

    stdout.put( "Enter two integer values: " );
    stdin.geti32();
    mov( eax, intval );
    stdin.geti32();
    mov( eax, AnotherIntVal );
This sequence allows the user to enter both values on the same line (separated by one or more white space characters) thus preserving space on the screen. So the input buffer behavior is desirable every now and then. Unfortunately, the buffered behavior of the input routines is definitely counter-intuitive at other times. Fortunately, the HLA Standard Library provides two routines, stdin.readLn and stdin.flushInput, that let you control the standard input buffer. The stdin.readLn routine discards everything that is in the input buffer and immediately requires the user to enter a new line of text. The stdin.flushInput routine simply discards everything that is in the buffer. The next time an input routine executes, the system will require a new line of input from the user. You would typically call stdin.readLn immediately before some standard input routine; you would normally call stdin.flushInput immediately after a call to a standard input routine. Note: If you are calling stdin.readLn and you find that you are having to input your data twice, this is a good indication that you should be calling stdin.flushInput rather than stdin.readLn. In general, you should always be able to call stdin.flushInput to flush the input buffer and read a new line of data on the next input call. The stdin.readLn routine is rarely necessary, so you should use stdin.flushInput unless you really need to immediately force the input of a new line of text.
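The sketch below shows one way stdin.flushInput might be placed so that each prompt waits for its own line of input. The program name FlushDemo and the variables first and second are hypothetical; the call sequence follows the "flush immediately after an input routine" advice above.

```hla
program FlushDemo;
#include( "stdlib.hhf" );

static
    first:  int32;
    second: int32;

begin FlushDemo;

    stdout.put( "Enter the first integer: " );
    stdin.geti32();
    mov( eax, first );

    // Discard anything else typed on that line so the next
    // input call must wait for a fresh line from the user.

    stdin.flushInput();

    stdout.put( "Enter the second integer: " );
    stdin.geti32();
    mov( eax, second );

    stdout.put( "first=", first, " second=", second, nl );

end FlushDemo;
```

With the flush in place, typing "123 456" at the first prompt should leave 456 unused rather than silently satisfying the second prompt.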
2.10.10 The stdin.get Macro

The stdin.get macro combines many of the standard input routines into a single call, in much the same way that stdout.put combines all of the output routines into a single call. Actually, stdin.get is much easier to use than stdout.put since the only parameters to this routine are a list of variable names. Let’s rewrite the example given in the previous section:

    stdout.put( "Enter two integer values: " );
    stdin.geti32();
    mov( eax, intval );
    stdin.geti32();
    mov( eax, AnotherIntVal );

Using the stdin.get macro, we could rewrite this code as:

    stdout.put( "Enter two integer values: " );
    stdin.get( intval, AnotherIntVal );
As you can see, the stdin.get routine is a little more convenient to use. Note that stdin.get stores the input values directly into the memory variables you specify in the parameter list; it does not return the values in a register unless you actually specify a register as a parameter. The stdin.get parameters must all be variables or registers18.
2.11
Putting It All Together This chapter has covered a lot of ground! While you’ve still got a lot to learn about assembly language programming, this chapter, combined with your knowledge of high level languages, provides just enough information to let you start writing real assembly language programs. In this chapter, you’ve seen the basic format for an HLA program. You’ve seen how to declare integer, character, and boolean variables. You have taken a look at the internal organization of the Intel 80x86 CPU family and learned about the MOV, ADD, and SUB instructions. You’ve looked at the basic HLA high level language control structures (IF, WHILE, REPEAT, FOR, BREAK, BREAKIF, FOREVER, and TRY) as well as what constitutes a legal boolean expression in these statements. Finally, this chapter has introduced several commonly-used routines in the HLA Standard Library. You might think that knowing only three machine instructions is hardly sufficient to write meaningful programs. However, those three instructions (mov, add, and sub), combined with the HLA high level control structures and the HLA Standard Library routines are actually equivalent to knowing several dozen machine instructions. Certainly enough to write simple programs. Indeed, with only a few more arithmetic instructions plus the ability to write your own procedures, you’ll be able to write almost any program. Of course, your journey into the world of assembly language has only just begun; you’ll learn some more instructions, and how to use them, starting in the next chapter.
2.12
Sample Programs This section contains several little HLA programs that demonstrate some of HLA’s features appearing in this chapter. These short examples also demonstrate that it is possible to write meaningful (if simple) programs in HLA using nothing more than the information appearing in this chapter. You may find all of the sample programs appearing in this section in the “ch02” subdirectory of the “volume1” directory in the software that accompanies this text.
2.12.1 Powers of Two Table Generation The following sample program generates a table listing all the powers of two between 2**0 and 2**30.
18. Note that register input is always in hexadecimal or base 16. The next chapter will discuss hexadecimal numbers.
// PowersOfTwo
//
// This program generates a nicely-formatted
// "Powers of Two" table. It computes the
// various powers of two by successively
// doubling the value in the pwrOf2 variable.

program PowersOfTwo;
#include( "stdlib.hhf" );

static
    pwrOf2:   int32;
    LoopCntr: int32;

begin PowersOfTwo;

    // Print a start up banner.

    stdout.put( "Powers of two: ", nl, nl );

    // Initialize "pwrOf2" with 2**0 (two raised to the zero power).

    mov( 1, pwrOf2 );

    // Because of the limitations of 32-bit signed integers,
    // we can only display 2**0..2**30.

    mov( 0, LoopCntr );
    while( LoopCntr < 31 ) do

        stdout.put( "2**(", LoopCntr:2, ") = ", pwrOf2:10, nl );

        // Double the value in pwrOf2 to compute the
        // next power of two.

        mov( pwrOf2, eax );
        add( eax, eax );
        mov( eax, pwrOf2 );

        // Move on to the next loop iteration.

        inc( LoopCntr );

    endwhile;
    stdout.newln();

end PowersOfTwo;

Program 2.8 Powers of Two Table Generator Program
2.12.2 Checkerboard Program This short little program demonstrates how to generate a checkerboard pattern with HLA.
// CheckerBoard
//
// This program demonstrates how to draw a
// checkerboard using a set of nested while
// loops.

program CheckerBoard;
#include( "stdlib.hhf" );

static
    xCoord:  int8;    // Counts off eight squares in each row.
    yCoord:  int8;    // Counts off four pairs of squares in each column.
    ColCntr: int8;    // Counts off four rows in each square.

begin CheckerBoard;

    mov( 0, yCoord );
    while( yCoord < 4 ) do

        // Display a row that begins with black.

        mov( 4, ColCntr );
        repeat

            // Each square is a 4x4 group of
            // spaces (white) or asterisks (black).
            // Print out one row of asterisks/spaces
            // for the current row of squares:

            mov( 0, xCoord );
            while( xCoord < 4 ) do

                stdout.put( "****    " );
                add( 1, xCoord );

            endwhile;
            stdout.newln();
            sub( 1, ColCntr );

        until( ColCntr = 0 );

        // Display a row that begins with white.

        mov( 4, ColCntr );
        repeat

            // Print out a single row of
            // spaces/asterisks for this
            // row of squares:

            mov( 0, xCoord );
            while( xCoord < 4 ) do

                stdout.put( "    ****" );
                add( 1, xCoord );

            endwhile;
            stdout.newln();
            sub( 1, ColCntr );

        until( ColCntr = 0 );

        add( 1, yCoord );

    endwhile;

end CheckerBoard;

Program 2.9 Checkerboard Generation Program
2.12.3 Fibonacci Number Generation The Fibonacci sequence is very important to certain algorithms in Computer Science and other fields. The following sample program generates a sequence of Fibonacci numbers for n=1..40.
// This program generates the Fibonacci sequence
// for n=1..40.
//
// The Fibonacci sequence is defined recursively
// for positive integers as follows:
//
//  fib(1) = 1;
//  fib(2) = 1;
//  fib( n ) = fib( n-1 ) + fib( n-2 ).
//
// This program provides an iterative solution.

program fib;
#include( "stdlib.hhf" );

static
    FibCntr:    int32;
    CurFib:     int32;
    LastFib:    int32;
    TwoFibsAgo: int32;

begin fib;

    // Some simple initialization:

    mov( 1, LastFib );
    mov( 1, TwoFibsAgo );

    // Print fib(1) and fib(2) as a special case:

    stdout.put
    (
        "fib( 1) = 1", nl
        "fib( 2) = 1", nl
    );

    // Use a loop to compute the remaining fib values:

    mov( 3, FibCntr );
    while( FibCntr

m) using sign extension. Unfortunately, given an n-bit number, you cannot always convert it to an m-bit number if m < n. For example, consider the value -448. As a 16-bit hexadecimal number, its representation is $FE40. Unfortunately, the magnitude of this number is too large to fit into an eight bit value, so you cannot sign contract it to eight bits. This is an example of an overflow condition that occurs upon conversion.
5. Zero extending into DX:AX or EDX:EAX is just as necessary as the CWD and CDQ instructions, as you will eventually see.
To properly sign contract one value to another, you must look at the H.O. byte(s) that you want to discard. The H.O. bytes you wish to remove must all contain either zero or $FF. If you encounter any other values, you cannot contract it without overflow. Finally, the H.O. bit of your resulting value must match every bit you’ve removed from the number. Examples (16 bits to eight bits):

    $FF80 can be sign contracted to $80.
    $0040 can be sign contracted to $40.
    $FE40 cannot be sign contracted to 8 bits.
    $0100 cannot be sign contracted to 8 bits.
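One way to test whether a 16-bit value can be sign contracted is a round-trip check: sign extend the L.O. byte back to 16 bits and see whether the original value reappears. This is an equivalent shortcut rather than the byte-by-byte inspection described above, and the sketch assumes the MOVSX sign extension instruction; the program name ContractDemo and the variables value and small are hypothetical.

```hla
program ContractDemo;
#include( "stdlib.hhf" );

static
    value: int16 := -128;       // Bit pattern $FF80.
    small: int8;

begin ContractDemo;

    mov( value, ax );
    movsx( al, bx );            // Sign extend the L.O. byte back to 16 bits.

    if( ax = bx ) then

        // The round trip reproduced the original value, so the
        // H.O. byte held nothing but copies of AL's sign bit.

        mov( al, small );
        stdout.put( value, " contracts to ", small, nl );

    else

        stdout.put( value, " does not fit in eight bits", nl );

    endif;

end ContractDemo;
```

Changing the initial value to -448 ($FE40) or 256 ($0100) should send control to the ELSE branch, matching the last two rows of the examples above.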
Another way to reduce the size of an integer is through saturation. Saturation is useful in situations where you must convert a larger object to a smaller object and you’re willing to live with possible loss of precision. To convert a value via saturation you simply copy the larger value to the smaller value if it is not outside the range of the smaller object. If the larger value is outside the range of the smaller value, then you clip the value by setting it to the largest (or smallest) value within the range of the smaller object. For example, when converting a 16-bit signed integer to an eight-bit signed integer, if the 16-bit value is in the range -128..+127 you simply copy the L.O. byte of the 16-bit object to the eight-bit object. If the 16-bit signed value is greater than +127, then you clip the value to +127 and store +127 into the eight-bit object. Likewise, if the value is less than -128, you clip the final eight bit object to -128. Saturation works the same way when clipping 32-bit values to smaller values. If the larger value is outside the range of the smaller value, then you simply set the smaller value to the value closest to the out of range value that you can represent with the smaller value. Obviously, if the larger value is outside the range of the smaller value, then there will be a loss of precision during the conversion. While clipping the value to the limits the smaller object imposes is never desirable, sometimes this is acceptable as the alternative is to raise an exception or otherwise reject the calculation. For many applications, such as audio or video processing, the clipped result is still recognizable, so this is a reasonable conversion to use.
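The 16-bit to eight-bit case just described can be sketched with the IF statement from the previous chapter. The program name SaturateDemo, the variables big and small, and the starting value of 300 are assumptions made for this illustration.

```hla
program SaturateDemo;
#include( "stdlib.hhf" );

static
    big:   int16 := 300;
    small: int8;

begin SaturateDemo;

    if( big > 127 ) then

        mov( 127, small );      // Too large: clip to the largest int8 value.

    elseif( big < -128 ) then

        mov( -128, small );     // Too small: clip to the smallest int8 value.

    else

        mov( big, ax );         // In range: copy the L.O. byte.
        mov( al, small );

    endif;

    stdout.put( big, " saturates to ", small, nl );

end SaturateDemo;
```

Because big is declared as int16, the comparisons here are signed, which is what the range test against -128..+127 needs.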
3.11
Shifts and Rotates Another set of logical operations which apply to bit strings are the shift and rotate operations. These two categories can be further broken down into left shifts, left rotates, right shifts, and right rotates. These operations turn out to be extremely useful to assembly language programmers. The left shift operation moves each bit in a bit string one position to the left (see Figure 3.8).
Figure 3.8 Shift Left Operation
Bit zero moves into bit position one, the previous value in bit position one moves into bit position two, etc. There are, of course, two questions that naturally arise: “What goes into bit zero?” and “Where does bit seven wind up?” We’ll shift a zero into bit zero and the previous value of bit seven will be the carry out of this operation. The 80x86 provides a shift left instruction, SHL, that performs this useful operation. The syntax for the SHL instruction is the following:
    shl( count, dest );
The count operand is either “CL” or a constant in the range 0..n, where n is one less than the number of bits in the destination operand (i.e., n=7 for eight-bit operands, n=15 for 16-bit operands, and n=31 for 32-bit operands). The dest operand is a typical dest operand, it can be either a memory location or a register. When the count operand is the constant one, the SHL instruction does the following:
    C <-- [ H.O. Bit ... 4 3 2 1 0 ] <-- 0

Figure 3.9 Operation of the SHL( 1, Dest ) Instruction
In Figure 3.9, the “C” represents the carry flag. That is, the bit shifted out of the H.O. bit of the operand is moved into the carry flag. Therefore, you can test for overflow after a SHL( 1, dest ) instruction by testing the carry flag immediately after executing the instruction (e.g., by using “if( @c ) then...” or “if( @nc ) then...”). Intel’s literature suggests that the state of the carry flag is undefined if the shift count is a value other than one. Usually, the carry flag contains the last bit shifted out of the destination operand, but Intel doesn’t seem to guarantee this. If you need to shift more than one bit out of an operand and you need to capture all the bits you shift out, you should take a look at the SHLD and SHRD instructions in the appendices.

Note that shifting a value to the left is the same thing as multiplying it by its radix. For example, shifting a decimal number one position to the left (adding a zero to the right of the number) effectively multiplies it by ten (the radix):

    1234 shl 1 = 12340      (shl 1 means shift one digit position to the left)

Since the radix of a binary number is two, shifting it left multiplies it by two. If you shift a binary value to the left twice, you multiply it by two twice (i.e., you multiply it by four). If you shift a binary value to the left three times, you multiply it by eight (2*2*2). In general, if you shift a value to the left n times, you multiply that value by 2**n.

A right shift operation works the same way, except we’re moving the data in the opposite direction. Bit seven moves into bit six, bit six moves into bit five, bit five moves into bit four, etc. During a right shift, we’ll move a zero into bit seven, and bit zero will be the carry out of the operation (see Figure 3.10).
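As a quick, hedged check of the multiply-by-a-power-of-two rule, the sketch below shifts the value five left by three bit positions. The program name ShlDemo and the variable value are hypothetical names used only for this example.

```hla
program ShlDemo;
#include( "stdlib.hhf" );

static
    value: int32 := 5;

begin ShlDemo;

    mov( value, eax );
    shl( 3, eax );          // Three left shifts: multiply by 2*2*2 = 8.
    mov( eax, value );

    stdout.put( "5 * 8 = ", value, nl );

end ShlDemo;
```

The result is only a correct product while no one bits are shifted out of the H.O. end of the operand; a value this small leaves plenty of room in 32 bits.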
    0 --> [ 7 6 5 4 3 2 1 0 ] --> C

Figure 3.10 Shift Right Operation
As you would probably expect by now, the 80x86 provides a SHR instruction that will shift the bits to the right in a destination operand. The syntax is the same as the SHL instruction except, of course, you specify SHR rather than SHL:

    SHR( count, dest );
This instruction shifts a zero into the H.O. bit of the destination operand, it shifts all the other bits one place to the right (that is, from a higher bit number to a lower bit number). Finally, bit zero is shifted into the carry flag. If you specify a count of one, the SHR instruction does the following:
    0 --> [ H.O. Bit ... 5 4 3 2 1 0 ] --> C

Figure 3.11 SHR( 1, Dest ) Operation
Once again, Intel’s documents suggest that shifts of more than one bit leave the carry in an undefined state. Since a left shift is equivalent to a multiplication by two, it should come as no surprise that a right shift is roughly comparable to a division by two (or, in general, a division by the radix of the number). If you perform n right shifts, you will divide that number by 2**n.

There is one problem with right shifts with respect to division: as described above, a shift right is only equivalent to an unsigned division by two. For example, if you shift the unsigned representation of 254 (0FEh) one place to the right, you get 127 (07Fh), exactly what you would expect. However, if you shift the binary representation of -2 (0FEh) to the right one position, you get 127 (07Fh), which is not correct. This problem occurs because we’re shifting a zero into bit seven. If bit seven previously contained a one, we’re changing it from a negative to a positive number. Not a good thing when dividing by two.

To use the shift right as a division operator, we must define a third shift operation: arithmetic shift right6. An arithmetic shift right works just like the normal shift right operation (a logical shift right) with one exception: instead of shifting a zero into bit seven, an arithmetic shift right operation leaves bit seven alone, that is, during the shift operation it does not modify the value of bit seven as Figure 3.12 shows.
Figure 3.12 Arithmetic Shift Right Operation
This generally produces the result you expect. For example, if you perform the arithmetic shift right operation on -2 (0FEh) you get -1 (0FFh). Keep one thing in mind about arithmetic shift right, however. This operation always rounds the numbers to the closest integer which is less than or equal to the actual result. Based on experiences with high level programming languages and the standard rules of integer truncation, most people assume this means that a division always truncates towards zero. But this simply isn’t the case. For example, if you apply the arithmetic shift right operation on -1 (0FFh), the result is -1, not zero. The exact quotient is -0.5; since -1 is the closest integer that is less than or equal to -0.5, the arithmetic shift right operation rounds towards minus one. This is not a “bug” in the arithmetic shift right operation; it just uses a different (though valid) definition of integer division.
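The difference between the logical and arithmetic right shifts is easy to check by modeling eight-bit values in a high level language. The following Python sketch is only an illustration (it is not HLA code, and the names shr8, sar8, and to_signed8 are made up for this example). It assumes eight-bit operands and reproduces the 0FEh and 0FFh examples above:

```python
def shr8(value, count=1):
    """Logical shift right of an 8-bit value: zeros enter bit seven."""
    return (value & 0xFF) >> count


def to_signed8(value):
    """Interpret an 8-bit pattern as a two's complement number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def sar8(value, count=1):
    """Arithmetic shift right of an 8-bit value: bit seven is preserved."""
    # Python's >> on a negative int copies the sign bit, which models SAR.
    return (to_signed8(value) >> count) & 0xFF


print(hex(shr8(0xFE)), to_signed8(shr8(0xFE)))  # 0x7f 127
print(hex(sar8(0xFE)), to_signed8(sar8(0xFE)))  # 0xff -1
print(hex(sar8(0xFF)), to_signed8(sar8(0xFF)))  # 0xff -1
```

The last line shows the rounding behavior described above: shifting -1 right arithmetically yields -1, not zero.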
6. There is no need for an arithmetic shift left. The standard shift left operation works for both signed and unsigned numbers, assuming no overflow occurs.
The 80x86 provides an arithmetic shift right instruction, SAR (shift arithmetic right). This instruction’s syntax is nearly identical to SHL and SHR. The syntax is
    SAR( count, dest );
The usual limitations on the count and destination operands apply. This instruction does the following if the count is one:
Figure 3.13 SAR( 1, dest ) Operation
Once again, Intel’s documents suggest that shifts of more than one bit leave the carry in an undefined state.
Another pair of useful operations are rotate left and rotate right. These operations behave like the shift left and shift right operations with one major difference: the bit shifted out from one end is shifted back in at the other end.
Figure 3.14 Rotate Left Operation
Figure 3.15 Rotate Right Operation
The 80x86 provides ROL (rotate left) and ROR (rotate right) instructions that do these basic operations on their operands. The syntax for these two instructions is similar to the shift instructions:
    rol( count, dest );
    ror( count, dest );
Once again, these instructions provide a special behavior if the shift count is one. Under this condition these two instructions also copy the bit shifted out of the destination operand into the carry flag as the following two figures show:
Figure 3.16 ROL( 1, Dest ) Operation
Note that Intel’s documents suggest that rotates of more than one bit leave the carry in an undefined state.
Figure 3.17 ROR( 1, Dest ) Operation
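To make the single-bit rotate behavior concrete, here is a small Python model of the two operations on an eight-bit operand. This is a sketch for illustration only (not HLA), and the function names rol1 and ror1 are invented for this example. Each function returns the rotated value together with the bit that would be copied into the carry flag:

```python
def rol1(value):
    """Rotate an 8-bit value left one bit; return (result, carry)."""
    value &= 0xFF
    carry = (value >> 7) & 1                # bit seven leaves the H.O. end...
    result = ((value << 1) | carry) & 0xFF  # ...and re-enters at bit zero
    return result, carry


def ror1(value):
    """Rotate an 8-bit value right one bit; return (result, carry)."""
    value &= 0xFF
    carry = value & 1                       # bit zero leaves the L.O. end...
    result = (value >> 1) | (carry << 7)    # ...and re-enters at bit seven
    return result, carry


print(rol1(0x81))  # (3, 1)
print(ror1(0x81))  # (192, 1)
```

Eight successive calls to either function return the original value, since no bits are ever lost in a rotate.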
It will turn out that it is often more convenient for the rotate operation to shift the output bit through the carry and shift the previous carry value back into the input bit of the shift operation. The 80x86 RCL (rotate through carry left) and RCR (rotate through carry right) instructions achieve this for you. These instructions use the following syntax:
    RCL( count, dest );
    RCR( count, dest );
As is true for the other shift and rotate instructions, the count operand is either a constant or the CL register and the destination operand is a memory location or register. The count operand must be a value that is less than the number of bits in the destination operand. For a count value of one, these two instructions do the following:
Figure 3.18 RCL( 1, Dest ) Operation
Figure 3.19 RCR( 1, Dest ) Operation
Again, Intel’s documents suggest that rotates of more than one bit leave the carry in an undefined state.
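The rotate-through-carry operations treat the carry flag as an extra, ninth bit of an eight-bit operand. The Python sketch below models a count of one under that assumption. It is an illustration rather than HLA code, and the names rcl1 and rcr1 are hypothetical:

```python
def rcl1(value, carry_in):
    """Rotate an 8-bit value left through the carry; return (result, carry)."""
    value &= 0xFF
    carry_out = (value >> 7) & 1                     # bit seven goes to carry
    result = ((value << 1) | (carry_in & 1)) & 0xFF  # old carry enters bit zero
    return result, carry_out


def rcr1(value, carry_in):
    """Rotate an 8-bit value right through the carry; return (result, carry)."""
    value &= 0xFF
    carry_out = value & 1                            # bit zero goes to carry
    result = (value >> 1) | ((carry_in & 1) << 7)    # old carry enters bit seven
    return result, carry_out


print(rcl1(0x80, 0))  # (0, 1)
print(rcl1(0x00, 1))  # (1, 0)
print(rcr1(0x01, 0))  # (0, 1)
```

Because the carry participates in the rotation, it takes nine single-bit rotates (not eight) to bring an eight-bit value and the carry back to their starting state.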
3.12
Bit Fields and Packed Data
Although the 80x86 operates most efficiently on byte, word, and double word data types, occasionally you’ll need to work with a data type that uses some number of bits other than eight, 16, or 32. For example, consider a date of the form “04/02/01”. It takes three numeric values to represent this date: a month, day, and year value. Months, of course, take on the values 1..12. It will require at least four bits (maximum of sixteen different values) to represent the month. Days range between 1..31. So it will take five bits (maximum of 32 different values) to represent the day entry. The year value, assuming that we’re working with values in the range 0..99, requires seven bits (which can be used to represent up to 128 different values).
Four plus five plus seven is 16 bits, or two bytes. In other words, we can pack our date data into two bytes rather than the three that would be required if we used a separate byte for each of the month, day, and year values. This saves one byte of memory for each date stored, which could be a substantial saving if you need to store a lot of dates. The bits could be arranged as shown in the following figure:
    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
     M  M  M  M  D  D  D  D  D  Y  Y  Y  Y  Y  Y  Y
Figure 3.20 Short Packed Date Format (Two Bytes)
MMMM represents the four bits making up the month value, DDDDD represents the five bits making up the day, and YYYYYYY is the seven bits comprising the year. Each collection of bits representing a data item is a bit field. April 2nd, 2001 would be represented as $4101:
    0100  00010  0000001  =  %0100_0001_0000_0001 or $4101
      4      2       01
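The $4101 value above can be checked with a few shifts and ORs. The following Python sketch is only an illustration of the bit layout in Figure 3.20 (it is not HLA, and the name pack_date is made up for this example):

```python
def pack_date(month, day, year):
    """Pack month (1..12), day (1..31), year (0..99) into 16 bits."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in the range 1..12")
    if not 1 <= day <= 31:
        raise ValueError("day must be in the range 1..31")
    if not 0 <= year <= 99:
        raise ValueError("year must be in the range 0..99")
    # Month occupies bits 12..15, day bits 7..11, year bits 0..6.
    return (month << 12) | (day << 7) | year


print(hex(pack_date(4, 2, 1)))  # 0x4101
```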
Although packed values are space efficient (that is, very efficient in terms of memory usage), they are computationally inefficient (slow!). The reason? It takes extra instructions to unpack the data packed into the various bit fields. These extra instructions take additional time to execute (and additional bytes to hold the instructions); hence, you must carefully consider whether packed data fields will save you anything. The following sample program demonstrates the effort that must go into packing and unpacking this 16-bit date format:
program dateDemo;
#include( "stdlib.hhf" );

static
    day:        uns8;
    month:      uns8;
    year:       uns8;

    packedDate: word;

begin dateDemo;

    stdout.put( "Enter the current month, day, and year: " );
    stdin.get( month, day, year );

    // Pack the data into the following bits:
    //
    //  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
    //   m  m  m  m  d  d  d  d  d  y  y  y  y  y  y  y

    mov( 0, ax );
    mov( ax, packedDate );      // Just in case there is an error.
    if( month > 12 ) then

        stdout.put( "Month value is too large", nl );

    elseif( month = 0 ) then

        stdout.put( "Month value must be in the range 1..12", nl );

    elseif( day > 31 ) then

        stdout.put( "Day value is too large", nl );

    elseif( day = 0 ) then

        stdout.put( "Day value must be in the range 1..31", nl );

    elseif( year > 99 ) then

        stdout.put( "Year value must be in the range 0..99", nl );

    else

        mov( month, al );
        shl( 5, ax );
        or( day, al );
        shl( 7, ax );
        or( year, al );
        mov( ax, packedDate );

    endif;

    // Okay, display the packed value:

    stdout.put( "Packed data = $", packedDate, nl );

    // Unpack the date:

    mov( packedDate, ax );
    and( $7f, al );             // Retrieve the year value.
    mov( al, year );

    mov( packedDate, ax );      // Retrieve the day value.
    shr( 7, ax );
    and( %1_1111, al );
    mov( al, day );

    mov( packedDate, ax );      // Retrieve the month value.
    rol( 4, ax );
    and( %1111, al );
    mov( al, month );

    stdout.put( "The date is ", month, "/", day, "/", year, nl );

end dateDemo;
Program 3.19
Packing and Unpacking Date Data
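The unpacking half of Program 3.19 boils down to a shift and a mask for each field. As a rough cross-check, here is the same idea in Python (an illustrative sketch, not HLA; unpack_date is a hypothetical name). Note one deliberate difference: the HLA program rotates the month into the L.O. bits with rol( 4, ax ), whereas this sketch simply shifts right by 12, which yields the same field value:

```python
def unpack_date(packed):
    """Split a 16-bit short packed date into (month, day, year)."""
    year = packed & 0x7F           # bits 0..6
    day = (packed >> 7) & 0x1F     # bits 7..11
    month = (packed >> 12) & 0x0F  # bits 12..15
    return month, day, year


print(unpack_date(0x4101))  # (4, 2, 1)
```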
Of course, having gone through the problems with Y2K, using a date format that limits you to 100 years (or even 127 years) would be quite foolish at this time. If you’re concerned about your software running 100 years from now, perhaps it would be wise to use a three-byte date format rather than a two-byte format. As you will see in the chapter on arrays, however, you should always try to create data objects whose length is an exact power of two (one byte, two bytes, four bytes, eight bytes, etc.) or you will pay a performance penalty. Hence, it is probably wise to go ahead and use four bytes and pack this data into a dword variable. Figure 3.21 shows a possible data organization for a four-byte date.
    31              16 15             8 7              0
      Year (0-65535)     Month (1-12)     Day (1-31)
Figure 3.21 Long Packed Date Format (Four Bytes)
In this long packed date format several changes were made beyond simply extending the number of bits associated with the year. First, since there are lots of extra bits in a 32-bit dword variable, this format allots extra bits to the month and day fields. Since these two fields consist of eight bits each, they can be easily extracted as a byte object from the dword. This leaves fewer bits for the year, but 65,536 years is probably sufficient; you can probably assume without too much concern that your software will not still be in use 63 thousand years from now when this date format will wrap around.
Of course, you could argue that this is no longer a packed date format. After all, we needed three numeric values, two of which fit just nicely into one byte each and one that should probably have at least two bytes. Since this “packed” date format consumes the same four bytes as the unpacked version, what is so special about this format? Well, another difference you will note between this long packed date format and the short date format appearing in Figure 3.20 is the fact that this long date format rearranges the bits so the Year is in the H.O. bit positions, the Month field is in the middle bit positions, and the Day field is in the L.O. bit positions. This is important because it allows you to very easily compare two dates to see if one date is less than, equal to, or greater than another date. Consider the following code:
    mov( Date1, eax );        // Assume Date1 and Date2 are dword variables
    if( eax > Date2 ) then    // using the Long Packed Date format.

        << do something if Date1 > Date2 >>

    endif;
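The reason a single unsigned comparison works is that the most significant field (the year) sits in the most significant bits. A short Python sketch, offered only as an illustration of the layout in Figure 3.21 (the name pack_long_date is hypothetical), shows that comparing the packed values gives the same answer as comparing the dates field by field:

```python
def pack_long_date(year, month, day):
    """Pack a date as year (bits 16..31), month (8..15), day (0..7)."""
    return (year << 16) | (month << 8) | day


date1 = pack_long_date(2001, 4, 2)
date2 = pack_long_date(2001, 3, 31)

print(hex(date1))     # 0x7d10402
print(date1 > date2)  # True: April 2nd follows March 31st
```

Had the day been placed in the H.O. bits instead, the same comparison would claim that March 31st comes after April 2nd.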
Had you kept the different date fields in separate variables, or organized the fields differently, you would not have been able to compare Date1 and Date2 in such a straightforward fashion. Therefore, this example demonstrates another reason for packing data even if you don’t realize any space savings: it can make certain computations more convenient or even more efficient (contrary to what normally happens when you pack data).
Examples of practical packed data types abound. You could pack eight boolean values into a single byte, you could pack two BCD digits into a byte, etc. Of course, a classic example of packed data is the FLAGs register (see Figure 3.22). This register packs nine important boolean objects (along with seven important system flags) into a single 16-bit register. You will commonly need to access many of these flags. For this reason, the 80x86 instruction set provides many ways to manipulate the individual bits in the FLAGs register. Of course, you can test many of the condition code flags using the HLA @c, @nc, @z, @nz, etc., pseudo-boolean variables in an IF statement or other statement using a boolean expression.
In addition to the condition codes, the 80x86 provides instructions that directly affect certain flags. These instructions include the following:
• cld();   Clears (sets to zero) the direction flag.
• std();   Sets (to one) the direction flag.
• cli();   Clears the interrupt flag (disabling maskable interrupts).
• sti();   Sets the interrupt flag (enabling maskable interrupts).
• clc();   Clears the carry flag.
• stc();   Sets the carry flag.
• cmc();   Complements (inverts) the carry flag.
• sahf();  Stores the AH register into the L.O. eight bits of the FLAGs register.
• lahf();  Loads AH from the L.O. eight bits of the FLAGs register.
There are other instructions that affect the FLAGs register as well; these, however, demonstrate how to access several of the packed boolean values in the FLAGs register. The LAHF and SAHF instructions, in particular, provide a convenient way to access the L.O. eight bits of the FLAGs register as an eight-bit byte (rather than as eight separate one-bit values).
Flags shown (H.O. to L.O.): Overflow, Direction, Interrupt, Trace, Sign, Zero, Auxiliary Carry, Parity, Carry. The remaining bits are reserved for system purposes.
Figure 3.22 The FLAGs Register as a Packed Data Type
The LAHF (load AH with the L.O. eight bits of the FLAGs register) and the SAHF (store AH into the L.O. byte of the FLAGs register) use the following syntax:
    lahf();
    sahf();
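Once LAHF has copied the L.O. byte of the FLAGs register into AH, the individual flags are just bit fields in that byte. The Python sketch below decodes such a byte. It is an illustration only: the bit positions (carry = 0, parity = 2, auxiliary carry = 4, zero = 6, sign = 7) come from the standard 80x86 FLAGs layout rather than from the text above, and the names FLAG_BITS and unpack_flags are invented for this example:

```python
# Bit positions of the condition codes in the L.O. byte of FLAGs
# (standard 80x86 layout; an assumption not spelled out in the text).
FLAG_BITS = {
    "carry": 0,
    "parity": 2,
    "auxiliary_carry": 4,
    "zero": 6,
    "sign": 7,
}


def unpack_flags(ah):
    """Decode the byte LAHF would place in AH into named boolean flags."""
    return {name: bool((ah >> bit) & 1) for name, bit in FLAG_BITS.items()}


flags = unpack_flags(0b1100_0001)
print(flags["carry"], flags["zero"], flags["sign"], flags["parity"])
# True True True False
```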
3.13
Putting It All Together
In this chapter you’ve seen how we represent numeric values inside the computer. You’ve seen how to represent values using the decimal, binary, and hexadecimal numbering systems as well as the difference between signed and unsigned numeric representation. Since we represent nearly everything else inside a computer using numeric values, the material in this chapter is very important. Along with the base representation of numeric values, this chapter discusses the finite bit-string organization of data on typical computer systems, specifically bytes, words, and doublewords. Next, this chapter discusses arithmetic and logical operations on the numbers and presents some new 80x86 instructions to apply these operations to values inside the CPU. Finally, this chapter concludes by showing how you can pack several different numeric values into a fixed-length object (like a byte, word, or doubleword).
Absent from this chapter is any discussion of non-integer data. For example, how do we represent real numbers as well as integers? How do we represent characters, strings, and other non-numeric data? Well, that’s the subject of the next chapter, so keep on reading...
Chapter Four
More Data Representation
4.1
Chapter Overview
Although the basic machine data objects (bytes, words, and double words) appear to represent nothing more than signed or unsigned numeric values, we can employ these data types to represent many other types of objects. This chapter discusses some of the other objects and their internal computer representation.
This chapter begins by discussing the floating point (real) numeric format. After integer representation, floating point representation is the second most popular numeric format in use on modern computer systems. Although the floating point format is somewhat complex, the necessity to handle non-integer calculations in modern programs requires that you understand this numeric format and its limitations.
Binary Coded Decimal (BCD) is another numeric data representation that is useful in certain contexts. Although BCD is not suitable for general purpose arithmetic, it is useful in some embedded applications. The principal benefit of the BCD format is the ease with which you can convert between string and BCD format. When we look at the BCD format a little later in this chapter, you’ll see why this is the case.
Computers can represent all kinds of different objects, not just numeric values. Characters are, unquestionably, one of the more popular data types a computer manipulates. In this chapter you will take a look at a couple of different ways we can represent individual characters on a computer system. This chapter discusses two of the more common character sets in use today: the ASCII character set and the Unicode character set.
This chapter concludes by discussing some common non-numeric data types like pixel colors on a video display, audio data, video data, and so on. Of course, there are lots of different representations for any kind of standard data you could envision; there is no way two chapters in a textbook can cover them all. (And that’s not even considering specialized data types you could create.) Nevertheless, this chapter (and the last) should give you the basic idea behind representing data on a computer system.
4.2
An Introduction to Floating Point Arithmetic
Integer arithmetic does not let you represent fractional numeric values. Therefore, modern CPUs support an approximation of real arithmetic: floating point arithmetic. A big problem with floating point arithmetic is that it does not follow the standard rules of algebra. Nevertheless, many programmers apply normal algebraic rules when using floating point arithmetic. This is a source of defects in many programs. One of the primary goals of this section is to describe the limitations of floating point arithmetic so you will understand how to use it properly.
Normal algebraic rules apply only to infinite precision arithmetic. Consider the simple statement “x:=x+1,” where x is an integer. On any modern computer this statement follows the normal rules of algebra as long as overflow does not occur. That is, this statement is valid only for certain values of x (minint = (Value2-error) and Value1 B or B < A.
3     NOT B. Ignores A and returns B’.
4     Inhibition = BA’ (B, not A). Also equivalent to B > A or A < B.
12    Copy B. Returns the value of B and ignores A’s value.
13    Implication, A implies B, or B + A’ (if A then B). Also equivalent to B >= A.
14    Logical OR = A+B. Returns A OR B.
15    One or Set. Always returns one regardless of A and B input values.
Beyond two input variables there are too many functions to provide specific names. Therefore, we will refer to the function’s number rather than the function’s name. For example, F8 denotes the logical AND of A and B for a two-input function and F14 is the logical OR operation. Of course, the only problem is to determine a function’s number. For example, given the function of three variables F=AB+C, what is the corresponding function number? This number is easy to compute by looking at the truth table for the function. If we treat the values for A, B, and C as bits in a binary number with C being the H.O. bit and A being the L.O. bit, they produce the binary numbers in the range zero through seven. Associated with each of these binary strings is a zero or one function result. If we construct a binary value by placing the function result in the bit position specified by A, B, and C, the resulting binary number is that function’s number. Consider the truth table for F=AB+C:
    CBA:       7  6  5  4  3  2  1  0
    F=AB+C:    1  1  1  1  1  0  0  0
If we treat the function values for F as a binary number, this produces the value F8 (hexadecimal) or 248 (decimal). We will usually denote function numbers in decimal. This also provides the insight into why there are 2^(2^n) different functions of n variables: if you have n input variables, there are 2^n bits in the function’s number. If you have m bits, there are 2^m different values. Therefore, for n input variables there are m = 2^n possible bits and 2^m or 2^(2^n) possible functions.
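The function-number computation is mechanical enough to automate. The following Python sketch (an illustration only; function_number is a made-up name) walks the truth table with A as the L.O. input bit, as described above, and sets one bit of the result for each entry where the function returns one:

```python
def function_number(func, n):
    """Compute the function number of an n-input boolean function.

    Input 0 (A) is the L.O. bit of the truth table index, input 1 is B,
    and so on, matching the CBA ordering used in the text.
    """
    number = 0
    for index in range(2 ** n):
        inputs = [(index >> bit) & 1 for bit in range(n)]
        if func(*inputs):
            number |= 1 << index
    return number


print(function_number(lambda a, b, c: (a and b) or c, 3))  # 248
print(function_number(lambda a, b: a and b, 2))            # 8
print(function_number(lambda a, b: a or b, 2))             # 14
```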
3.3
Algebraic Manipulation of Boolean Expressions
You can transform one boolean expression into an equivalent expression by applying the postulates and theorems of boolean algebra. This is important if you want to convert a given expression to a canonical form (a standardized form) or if you want to minimize the number of literals (primed or unprimed variables) or terms in an expression. Minimizing terms and expressions can be important because electrical circuits often consist of individual components that implement each term or literal for a given expression. Minimizing the expression allows the designer to use fewer electrical components and, therefore, can reduce the cost of the system.
Unfortunately, there are no fixed rules you can apply to optimize a given expression. Much like constructing mathematical proofs, an individual’s ability to easily do these transformations is usually a function of experience. Nevertheless, a few examples can show the possibilities:
    ab + ab’ + a’b          =  a(b+b’) + a’b                By P4
                            =  a•1 + a’b                    By P5
                            =  a + a’b                      By Th4
                            =  a + b                        By Th11

    (a’b + a’b’ + b’)’      =  ( a’(b+b’) + b’)’            By P4
                            =  (a’•1 + b’)’                 By P5
                            =  (a’ + b’)’                   By Th4
                            =  ( (ab)’ )’                   By Th8
                            =  ab                           By definition of not

    b(a+c) + ab’ + bc’ + c  =  ba + bc + ab’ + bc’ + c      By P4
                            =  a(b+b’) + b(c + c’) + c      By P4
                            =  a•1 + b•1 + c                By P5
                            =  a + b + c                    By Th4
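Since a boolean function of n variables has only 2^n input combinations, you can always double-check an algebraic simplification by brute force. The Python sketch below (for illustration; the helper names equivalent and NOT are invented here) compares each original expression above against its simplified form for every possible input:

```python
from itertools import product


def NOT(x):
    """Boolean complement of a 0/1 value (the prime in the text)."""
    return 1 - x


def equivalent(f, g, n):
    """Return True if f and g agree on all 2**n combinations of inputs."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n))


# ab + ab' + a'b  =  a + b
print(equivalent(lambda a, b: (a & b) | (a & NOT(b)) | (NOT(a) & b),
                 lambda a, b: a | b, 2))

# (a'b + a'b' + b')'  =  ab
print(equivalent(lambda a, b: NOT((NOT(a) & b) | (NOT(a) & NOT(b)) | NOT(b)),
                 lambda a, b: a & b, 2))

# b(a+c) + ab' + bc' + c  =  a + b + c
print(equivalent(lambda a, b, c: (b & (a | c)) | (a & NOT(b)) | (b & NOT(c)) | c,
                 lambda a, b, c: a | b | c, 3))
```

All three checks print True. A brute-force check like this confirms that two expressions are equivalent, but it says nothing about whether the simplified form is the smallest possible one.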
Although these examples all use algebraic transformations to simplify a boolean expression, we can also use algebraic operations for other purposes. For example, the next section describes a canonical form for boolean expressions. We can use algebraic manipulation to produce canonical forms even though the canonical forms are rarely optimal.
3.4
Canonical Forms
Since there are a finite number of boolean functions of n input variables, yet an infinite number of possible logic expressions you can construct with those n input values, clearly there are an infinite number of logic expressions that are equivalent (i.e., they produce the same result given the same inputs). To help eliminate possible confusion, logic designers generally specify a boolean function using a canonical, or standardized, form. For any given boolean function there exists a unique canonical form. This eliminates some confusion when dealing with boolean functions.
Actually, there are several different canonical forms. We will discuss only two here and employ only the first of the two. The first is the so-called sum of minterms and the second is the product of maxterms. Using the duality principle, it is very easy to convert between these two.
A term is a variable or a product (logical AND) of several different literals. For example, if you have two variables, A and B, there are eight possible terms: A, B, A’, B’, A’B’, A’B, AB’, and AB. For three variables we have 26 different terms: A, B, C, A’, B’, C’, A’B’, A’B, AB’, AB, A’C’, A’C, AC’, AC, B’C’, B’C, BC’, BC, A’B’C’, AB’C’, A’BC’, ABC’, A’B’C, AB’C, A’BC, and ABC. As you can see, as the number of variables increases, the number of terms increases dramatically.
A minterm is a product containing exactly n literals. For example, the minterms for two variables are A’B’, AB’, A’B, and AB. Likewise, the minterms for three variables A, B, and C are A’B’C’, AB’C’, A’BC’, ABC’, A’B’C, AB’C, A’BC, and ABC. In general, there are 2^n minterms for n variables. The set of possible minterms is very easy to generate since they correspond to the sequence of binary numbers:
    Binary Equivalent (CBA)    Minterm
    000                        A’B’C’
    001                        AB’C’
    010                        A’BC’
    011                        ABC’
    100                        A’B’C
    101                        AB’C
    110                        A’BC
    111                        ABC
We can specify any boolean function using a sum (logical OR) of minterms. Given F248=AB+C the equivalent canonical form is ABC+A’BC+AB’C+A’B’C+ABC’. Algebraically, we can show that these two are equivalent as follows:
    ABC+A’BC+AB’C+A’B’C+ABC’  =  BC(A+A’) + B’C(A+A’) + ABC’    By P4
                              =  BC•1 + B’C•1 + ABC’            By Th15
                              =  C(B+B’) + ABC’                 By P4
                              =  C + ABC’                       By Th15 & Th4
                              =  C + AB                         By Th11
Obviously, the canonical form is not the optimal form. On the other hand, there is a big advantage to the sum of minterms canonical form: it is very easy to generate the truth table for a function from this canonical form. Furthermore, it is also very easy to generate the logic equation from the truth table.
To build the truth table from the canonical form, simply convert each minterm into a binary value by substituting a 1 for unprimed variables and a 0 for primed variables. Then place a 1 in the corresponding position (specified by the binary minterm value) in the truth table:
1) Convert minterms to binary equivalents:
    F248 = CBA + CBA’ + CB’A + CB’A’ + C’BA
         = 111 + 110  + 101  + 100   + 011
2) Substitute a one in the truth table for each entry above:
    C  B  A    F = AB+C
    0  0  0
    0  0  1
    0  1  0
    0  1  1    1
    1  0  0    1
    1  0  1    1
    1  1  0    1
    1  1  1    1
Finally, put zeros in all the entries that you did not fill with ones in the first step above:
    C  B  A    F = AB+C
    0  0  0    0
    0  0  1    0
    0  1  0    0
    0  1  1    1
    1  0  0    1
    1  0  1    1
    1  1  0    1
    1  1  1    1
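The table-filling procedure above can be sketched in a few lines of Python. This is only an illustration: the minterm strings use a straight apostrophe for the prime, variables are single letters, and the names minterm_to_index and truth_table are hypothetical:

```python
def minterm_to_index(term, order="CBA"):
    """Convert a minterm such as "CB'A" into its truth table index.

    Unprimed variables contribute a 1, primed variables a 0; `order`
    lists the variables from the H.O. bit down to the L.O. bit.
    """
    bits = {}
    i = 0
    while i < len(term):
        name = term[i]
        primed = i + 1 < len(term) and term[i + 1] == "'"
        bits[name] = 0 if primed else 1
        i += 2 if primed else 1
    index = 0
    for name in order:
        index = (index << 1) | bits[name]
    return index


def truth_table(minterms, order="CBA"):
    """Start with all zeros, then place a one at each minterm's index."""
    table = [0] * (2 ** len(order))
    for term in minterms:
        table[minterm_to_index(term, order)] = 1
    return table


f248 = ["CBA", "CBA'", "CB'A", "CB'A'", "C'BA"]
print([minterm_to_index(t) for t in f248])  # [7, 6, 5, 4, 3]
print(truth_table(f248))                    # [0, 0, 0, 1, 1, 1, 1, 1]
```

The resulting list is indexed by CBA, so entry 3 (CBA = 011) through entry 7 (CBA = 111) hold the ones, exactly as in the table above.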
Going in the other direction, generating a logic function from a truth table, is almost as easy. First, locate all the entries in the truth table with a one. In the table above, these are the last five entries. The number of table entries containing ones determines the number of minterms in the canonical equation. To generate the individual minterms, substitute A, B, or C for ones and A’, B’, or C’ for zeros in the truth table above. Then compute the sum of these items. In the example above, F248 contains one for CBA = 111, 110, 101, 100, and 011. Therefore, F248 = CBA + CBA’ + CB’A + CB’A’ + C’BA. The first term, CBA, comes from the last entry in the table above. C, B, and A all contain ones so we generate the minterm CBA (or ABC, if you prefer). The second to last entry contains 110 for CBA, so we generate the minterm CBA’. Likewise, 101 produces CB’A; 100 produces CB’A’, and 011 produces C’BA. Of course, the logical OR and logical AND operations are both commutative, so we can rearrange the terms within the minterms as we please and we can rearrange the minterms within the sum as we see fit.
This process works equally well for any number of variables. Consider the function F53504 = ABCD + A’BCD + A’B’CD + A’B’C’D. Placing ones in the appropriate positions in the truth table generates the following:
    D  C  B  A    F = ABCD + A’BCD + A’B’CD + A’B’C’D
    0  0  0  0
    0  0  0  1
    0  0  1  0
    0  0  1  1
    0  1  0  0
    0  1  0  1
    0  1  1  0
    0  1  1  1
    1  0  0  0    1
    1  0  0  1
    1  0  1  0
    1  0  1  1
    1  1  0  0    1
    1  1  0  1
    1  1  1  0    1
    1  1  1  1    1
The remaining elements in this truth table all contain zero.
Perhaps the easiest way to generate the canonical form of a boolean function is to first generate the truth table for that function and then build the canonical form from the truth table. We’ll use this technique, for example, when converting between the two canonical forms this chapter presents. However, it is also a simple matter to generate the sum of minterms form algebraically. Using the distributive law and theorem 15 (A + A’ = 1) makes this task easy. Consider F248 = AB + C. This function contains two terms, AB and C, but they are not minterms. Minterms contain each of the possible variables in a primed or unprimed form. We can convert the first term to a sum of minterms as follows:
    AB  =  AB • 1           By Th4
        =  AB • (C + C’)    By Th15
        =  ABC + ABC’       By distributive law
        =  CBA + C’BA       By associative law
Similarly, we can convert the second term in F248 to a sum of minterms as follows:
    C  =  C • 1                               By Th4
       =  C • (A + A’)                        By Th15
       =  CA + CA’                            By distributive law
       =  CA•1 + CA’•1                        By Th4
       =  CA • (B + B’) + CA’ • (B + B’)      By Th15
       =  CAB + CAB’ + CA’B + CA’B’           By distributive law
       =  CBA + CBA’ + CB’A + CB’A’           By associative law
The last step (rearranging the terms) in these two conversions is optional. To obtain the final canonical form for F248 we need only sum the results from these two conversions:
    F248  =  (CBA + C’BA) + (CBA + CBA’ + CB’A + CB’A’)
          =  CBA + CBA’ + CB’A + CB’A’ + C’BA
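The canonical form derived above can be cross-checked by going through the truth table instead of the algebra. The Python sketch below (illustrative only; the name minterms is made up, and a straight apostrophe stands in for the prime) evaluates a function for every input combination and emits one minterm per entry that produces a one:

```python
def minterms(func, names="ABC"):
    """List the minterms of func as strings, in truth table order.

    names[0] is the L.O. input (A); each variable appears unprimed when
    its input bit is 1 and primed when its input bit is 0.
    """
    n = len(names)
    terms = []
    for index in range(2 ** n):
        bits = [(index >> i) & 1 for i in range(n)]
        if func(*bits):
            terms.append("".join(name + ("" if bit else "'")
                                 for name, bit in zip(names, bits)))
    return terms


print(" + ".join(minterms(lambda a, b, c: (a and b) or c)))
# ABC' + A'B'C + AB'C + A'BC + ABC
```

Apart from the ordering of the letters and the terms, this is the same sum of minterms that the algebraic expansion produced.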
Another way to generate a canonical form is to use products of maxterms. A maxterm is the sum (logical OR) of all input variables, primed or unprimed. For example, consider the following logic function G of three variables: G = (A+B+C) • (A’+B+C) • (A+B’+C).
Like the sum of minterms form, there is exactly one product of maxterms for each possible logic function. Of course, for every product of maxterms there is an equivalent sum of minterms form. In fact, the function G, above, is equivalent to
    F248 = CBA + CBA’ + CB’A + CB’A’ + C’BA = AB + C.
Generating a truth table from the product of maxterms is no more difficult than building it from the sum of minterms. You use the duality principle to accomplish this. Remember, the duality principle says to swap AND for OR and zeros for ones (and vice versa). Therefore, to build the truth table, you would first swap primed and non-primed literals. In G above, this would yield:
    G = (A’ + B’ + C’) • (A + B’ + C’) • (A’ + B + C’)
The next step is to swap the logical OR and logical AND operators. This produces
    G = A’B’C’ + AB’C’ + A’BC’
Finally, you need to swap all zeros and ones. This means that you store zeros into the truth table for each of the above entries and then fill in the rest of the truth table with ones. This will place a zero in entries zero, one, and two in the truth table. Filling the remaining entries with ones produces F248.
You can easily convert between these two canonical forms by generating the truth table for one form and working backwards from the truth table to produce the other form. For example, consider the function of two variables, F14 = A + B. The sum of minterms form is F14 = A’B + AB’ + AB. The truth table takes the form:
Table 15: F14 (OR) Truth Table for Two Variables
    A  B    F14
    0  0    0
    0  1    1
    1  0    1
    1  1    1
Working backwards to get the product of maxterms, we locate all entries that have a zero result. This is the entry with A and B equal to zero. This gives us the first step of G=A’B’. However, we still need to invert all the variables to obtain G=AB. By the duality principle we need to swap the logical OR and logical AND operators obtaining G=A+B. This is the canonical product of maxterms form.
Since working with the product of maxterms is a little messier than working with sums of minterms, this text will generally use the sum of minterms form. Furthermore, the sum of minterms form is more common in boolean logic work. However, you will encounter both forms when studying logic design.
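The "locate the zeros, invert the variables, swap AND for OR" recipe can also be expressed as a short routine. The following Python sketch is an illustration under the same conventions as the text (A is the L.O. input); the name maxterms is hypothetical and a straight apostrophe stands in for the prime:

```python
def maxterms(func, names="ABC"):
    """List the maxterms of func, one per truth table entry that is zero.

    In a maxterm a variable appears unprimed when its input bit is 0 and
    primed when its input bit is 1 (the inverse of the minterm rule).
    """
    n = len(names)
    terms = []
    for index in range(2 ** n):
        bits = [(index >> i) & 1 for i in range(n)]
        if not func(*bits):
            literals = [name + ("'" if bit else "")
                        for name, bit in zip(names, bits)]
            terms.append("(" + "+".join(literals) + ")")
    return terms


print(" • ".join(maxterms(lambda a, b, c: (a and b) or c)))
# (A+B+C) • (A'+B+C) • (A+B'+C)
print(" • ".join(maxterms(lambda a, b: a or b, "AB")))
# (A+B)
```

The first line of output is the function G given earlier, and the second is the product of maxterms form of the two-variable OR function.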
3.5
Simplification of Boolean Functions
Since there are an infinite variety of boolean expressions of n variables, but only a finite number of unique boolean functions of those n variables, you might wonder if there is some method that will simplify a given boolean function to produce the optimal form. Of course, you can always use algebraic transformations to attempt to produce the optimal form, but using heuristics does not guarantee an optimal transformation. There are, however, two methods that will reduce a given boolean function to its optimal form: the map method and the prime implicants method. In this text we will only cover the mapping method; see any text on logic design for other methods.
Since for any logic function some optimal form must exist, you may wonder why we don’t use the optimal form for the canonical form. There are two reasons. First, there may be several optimal forms. They are not guaranteed to be unique. Second, it is easy to convert between the canonical and truth table forms.
Using the map method to optimize boolean functions is practical only for functions of two, three, or four variables. With care, you can use it for functions of five or six variables, but the map method is cumbersome to use at that point. For more than six variables, attempting map simplifications by hand would not be wise (see footnote 2).
The first step in using the map method is to build a two-dimensional truth table for the function (see Figure 3.1).
2. However, it’s probably quite reasonable to write a program that uses the map method for seven or more variables.
Two Variable Truth Table
             A=0         A=1
    B=0      B'A'        B'A
    B=1      BA'         BA

Three Variable Truth Table
             BA=00       BA=01       BA=11       BA=10
    C=0      C'B'A'      C'B'A       C'AB        C'BA'
    C=1      CB'A'       CB'A        CAB         CBA'

Four Variable Truth Table
             BA=00       BA=01       BA=11       BA=10
    DC=00    D'C'B'A'    D'C'B'A     D'C'AB      D'C'BA'
    DC=01    D'CB'A'     D'CB'A      D'CAB       D'CBA'
    DC=11    DCB'A'      DCB'A       DCAB        DCBA'
    DC=10    DC'B'A'     DC'B'A      DC'AB       DC'BA'

Figure 3.1 Two, Three, and Four Dimensional Truth Tables
Warning: Take a careful look at these truth tables. They do not use the same forms appearing earlier in this chapter. In particular, the progression of the values is 00, 01, 11, 10, not 00, 01, 10, 11. This is very important! If you organize the truth tables in a binary sequence, the mapping optimization method will not work properly. We will call this a truth map to distinguish it from the standard truth table. Assuming your boolean function is in canonical form (sum of minterms), insert ones for each of the truth map entries corresponding to a minterm in the function. Place zeros everywhere else. For example, consider the function of three variables F=C’B’A + C’BA’ + C’BA + CB’A’ + CB’A + CBA’ + CBA. Figure 3.2 shows the truth map for this function.
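The 00, 01, 11, 10 progression is what is commonly called a Gray code: each value differs from its neighbors (including the wraparound from the last value to the first) in exactly one bit. As a rough illustration, the Python sketch below generates that ordering and uses it to lay out a three-variable truth map. The names gray_sequence and truth_map are invented for this example, and the function passed in is written as A OR B OR C, which has the same seven minterms as the F given above:

```python
def gray_sequence(bits):
    """Return the Gray code ordering of all bit strings of this width."""
    return [format(i ^ (i >> 1), "0{}b".format(bits))
            for i in range(2 ** bits)]


def truth_map(func):
    """Build a three-variable truth map: rows are C = 0, 1; columns are
    the BA values in Gray code order (00, 01, 11, 10)."""
    columns = gray_sequence(2)
    return [[int(bool(func(int(ba[1]), int(ba[0]), c))) for ba in columns]
            for c in (0, 1)]


print(gray_sequence(2))
# ['00', '01', '11', '10']
for row in truth_map(lambda a, b, c: a or b or c):
    print(row)
# [0, 1, 1, 1]
# [1, 1, 1, 1]
```

Because adjacent columns differ in only one variable, any pair of neighboring ones in the map corresponds to two minterms that differ in a single literal, which is what allows that literal to be eliminated.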
            BA
        00   01   11   10
C  0     0    1    1    1
   1     1    1    1    1

F=C'B'A + C'BA' + C'BA + CB'A' + CB'A + CBA' + CBA.
Figure 3.2
A Simple Truth Map
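The construction of a truth map can be sketched in a few lines of code. The following is a minimal illustration in Python (chosen here only for brevity; it is not the language of this text), and the names gray_codes and truth_map are hypothetical. It assumes each minterm is written as a string such as C'B'A, with a prime marking a complemented variable, and that the two low-order variables label the columns.

```python
def gray_codes(bits):
    """Return the labels in truth-map order, e.g. [0, 1, 3, 2] (00, 01, 11, 10)."""
    return [i ^ (i >> 1) for i in range(1 << bits)]


def truth_map(minterms, names):
    """Build a truth map (list of rows of 0/1) for a sum of minterms.

    minterms: strings such as "C'B'A" (a prime marks a complemented variable)
    names:    variable names, high-order first, e.g. "CBA"
    """
    def value(term):
        # Turn "C'BA'" into the integer 0b010 (C=0, B=1, A=0).
        term = term.replace("\u2019", "'")      # accept typographic primes too
        v = 0
        for name in names:
            i = term.index(name)
            primed = term[i + 1:i + 2] == "'"
            v = (v << 1) | (0 if primed else 1)
        return v

    ones = {value(t) for t in minterms}
    col_bits = 2 if len(names) > 2 else 1       # assumption: low-order variables label columns
    row_bits = len(names) - col_bits
    return [[1 if ((r << col_bits) | c) in ones else 0
             for c in gray_codes(col_bits)]
            for r in gray_codes(row_bits)]


f = ["C'B'A", "C'BA'", "C'BA", "CB'A'", "CB'A", "CBA'", "CBA"]
for row in truth_map(f, "CBA"):
    print(row)
```

For the function in Figure 3.2 this prints the rows [0, 1, 1, 1] and [1, 1, 1, 1]. Note that both the rows and the columns are generated in the 00, 01, 11, 10 order rather than in binary order.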
The next step is to draw rectangles around rectangular groups of ones. The rectangles you draw must have sides whose lengths are powers of two. For functions of three variables, the rectangles can have sides whose lengths are one, two, and four. The set of rectangles you draw must surround all cells containing ones in the truth map. The trick is to draw all possible rectangles unless a rectangle would be completely enclosed within another. Note that the rectangles may overlap if one does not enclose the other. In the truth map in Figure 3.3 there are three such rectangles.

            BA
        00   01   11   10
C  0     0    1    1    1
   1     1    1    1    1

Three possible rectangles whose lengths and widths are powers of two: the long rectangle covering the C=1 row, the (blue) square covering the BA=01 and BA=11 columns, and the (red) square covering the BA=11 and BA=10 columns.
Figure 3.3
Surrounding Rectangular Groups of Ones in a Truth Map
Each rectangle represents a term in the simplified boolean function. Therefore, the simplified boolean function will contain only three terms. You build each term using the process of elimination. You eliminate any variables whose primed and unprimed form both appear within the rectangle. Consider the long skinny rectangle above that is sitting in the row where C=1. This rectangle contains both A and B in primed and unprimed form. Therefore, we can eliminate A and B from the term. Since the rectangle sits in the C=1 region, this rectangle represents the single literal C. Now consider the blue square above. This rectangle includes C, C', B, B' and A. Therefore, it represents the single term A. Likewise, the red square above contains C, C', A, A' and B. Therefore, it represents the single term B. The final, optimal, function is the sum (logical OR) of the terms represented by the three rectangles. Therefore, F = A + B + C. You do not have to consider the remaining squares containing zeros.

When enclosing groups of ones in the truth map, you must consider the fact that a truth map forms a torus (i.e., a doughnut shape). The right edge of the map wraps around to the left edge (and vice-versa). Likewise, the top edge wraps around to the bottom edge. This introduces additional possibilities when surrounding groups of ones in a map. Consider the boolean function F=C'B'A' + C'BA' + CB'A' + CBA'. Figure 3.4 shows the truth map for this function.

            BA
        00   01   11   10
C  0     1    0    0    1
   1     1    0    0    1

Figure 3.4
Truth Map for F=C'B'A' + C'BA' + CB'A' + CBA'
At first glance, you would think that there are two possible rectangles here as Figure 3.5 shows.
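            BA
        00   01   11   10
C  0     1    0    0    1
   1     1    0    0    1

[Two separate rectangles: one around the BA=00 column on the left, one around the BA=10 column on the right.]
Figure 3.5
First Attempt at Surrounding Rectangles Formed by Ones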
However, because the truth map is a continuous object with the right side and left sides connected, we can form a single, square rectangle, as Figure 3.6 shows.
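            BA
        00   01   11   10
C  0     1    0    0    1
   1     1    0    0    1

[A single rectangle that wraps around the left and right edges, joining the BA=00 and BA=10 columns.]
Figure 3.6
Correct Rectangle for the Function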
So what? Why do we care if we have one rectangle or two in the truth map? The answer is that the larger the rectangles are, the more variables they eliminate, and the fewer rectangles we have, the fewer terms will appear in the final boolean function. For example, the former example with two rectangles generates a function with two terms. The first rectangle (on the left) eliminates the C variable, leaving A'B' as its term. The second rectangle, on the right, also eliminates the C variable, leaving the term BA'. Therefore, this truth map would produce the equation F=A'B' + A'B. We know this is not optimal (see theorem 13). Now consider the second truth map above. Here we have a single rectangle, so our boolean function will only have a single term. Obviously this is better than an equation with two terms. Since this rectangle includes both C and C' and also B and B', the only term left is A'. This boolean function, therefore, reduces to F=A'.

There are only two cases that the truth map method cannot handle properly: a truth map that contains all zeros or a truth map that contains all ones. These two cases correspond to the boolean functions F=0 and F=1 (the lowest-numbered and highest-numbered functions), respectively. These functions are easy to generate by inspection of the truth map.

An important thing you must keep in mind when optimizing boolean functions using the mapping method is that you always want to pick the largest rectangles whose sides' lengths are a power of two. You must do this even for overlapping rectangles (unless one rectangle encloses another). Consider the boolean function F = C'B'A' + C'BA' + CB'A' + C'AB + CBA' + CBA. This produces the truth map appearing in Figure 3.7.

            BA
        00   01   11   10
C  0     1    0    1    1
   1     1    0    1    1

Figure 3.7
Truth Map for F = C'B'A' + C'BA' + CB'A' + C'AB + CBA' + CBA
The initial temptation is to create one of the sets of rectangles found in Figure 3.8. However, the correct mapping appears in Figure 3.9.

[Two copies of the Figure 3.7 truth map, each with a different pair of rectangles drawn around the ones.]
Figure 3.8
Obvious Choices for Rectangles

[The Figure 3.7 truth map with the correct pair of rectangles drawn around the ones.]
Figure 3.9
Correct Set of Rectangles for F = C'B'A' + C'BA' + CB'A' + C'AB + CBA' + CBA
All three mappings will produce a boolean function with two terms. However, the first two will produce the expressions F= B + A'B' and F = AB + A'. The third form produces F = B + A'. Obviously, this last form is better optimized than the other two forms (see theorems 11 and 12).
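Because a function of three variables has only eight input combinations, you can confirm that all three results describe the same function simply by trying every combination. The following Python sketch does this; Python and the helper names (same_function, NOT, original, first, second, third) are illustrative assumptions, not part of this text's toolset.

```python
from itertools import product


def same_function(f, g, nvars):
    """True if f and g agree on every combination of nvars 0/1 inputs."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=nvars))


def NOT(x):
    return 1 - x


# The sum of minterms from Figure 3.7 (arguments are C, B, A).
def original(c, b, a):
    return ((NOT(c) & NOT(b) & NOT(a)) | (NOT(c) & b & NOT(a)) |
            (c & NOT(b) & NOT(a)) | (NOT(c) & a & b) |
            (c & b & NOT(a)) | (c & b & a))


def first(c, b, a):
    return b | (NOT(a) & NOT(b))        # F = B + A'B'


def second(c, b, a):
    return (a & b) | NOT(a)             # F = AB + A'


def third(c, b, a):
    return b | NOT(a)                   # F = B + A'


print(same_function(original, first, 3),
      same_function(original, second, 3),
      same_function(original, third, 3))
```

All three comparisons print True. The check shows only that the expressions are equivalent; it says nothing about which is smallest, which is what the rectangle sizes tell you.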
For functions of three variables, the size of the rectangle determines the number of literals in the term it represents:
• A rectangle enclosing a single square represents a minterm. The associated term will have three literals (assuming we're working with functions of three variables).
• A rectangle surrounding two squares containing ones represents a term containing two literals.
• A rectangle surrounding four squares containing ones represents a term containing a single literal.
• A rectangle surrounding eight squares represents the function F = 1.
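The relationship in the list above follows directly from the elimination rule: every doubling of a rectangle's size removes one more variable. A small Python sketch makes this concrete. The function name term_for_rectangle and the convention of naming cells by their minterm number (high-order variable in the most significant bit) are assumptions made for this illustration, and the code does not check that the cells really form a legal rectangle.

```python
def term_for_rectangle(cells, names):
    """Derive the product term for a rectangle of truth-map cells.

    cells: minterm numbers covered by the rectangle
    names: variable names, high-order first, e.g. "CBA"
    A variable that takes both values inside the rectangle is eliminated;
    the others survive, primed if their value is 0.
    """
    n = len(names)
    literals = []
    for pos, name in enumerate(names):
        shift = n - 1 - pos
        values = {(cell >> shift) & 1 for cell in cells}
        if len(values) == 1:                      # variable is constant in the rectangle
            literals.append(name if values.pop() == 1 else name + "'")
    return "".join(literals) or "1"               # nothing left means F = 1


print(term_for_rectangle({5}, "CBA"))             # one square
print(term_for_rectangle({4, 5}, "CBA"))          # two squares
print(term_for_rectangle({4, 5, 6, 7}, "CBA"))    # four squares (the C=1 row)
print(term_for_rectangle(set(range(8)), "CBA"))   # all eight squares
```

This prints CB'A, CB', C, and 1: three literals, two literals, one literal, and finally the constant function.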
Truth maps you create for functions of four variables are even trickier. This is because there are lots of places rectangles can hide from you along the edges. Figure 3.10 shows some possible places rectangles can hide.

[A grid of 4x4 truth maps, each with its rows and columns labelled 00, 01, 11, 10 and each marking one possible rectangle.]
Figure 3.10
Partial Pattern List for 4x4 Truth Map
This list of patterns doesn't even begin to cover all of them! For example, these diagrams show none of the 1x2 rectangles. You must exercise care when working with four variable maps to ensure you select the largest possible rectangles, especially when overlap occurs. This is particularly important when you have a rectangle next to an edge of the truth map.

As with functions of three variables, the size of the rectangle in a four variable truth map controls the number of literals in the term it represents:
• A rectangle enclosing a single square represents a minterm. The associated term will have four literals.
• A rectangle surrounding two squares containing ones represents a term containing three literals.
• A rectangle surrounding four squares containing ones represents a term containing two literals.
• A rectangle surrounding eight squares containing ones represents a term containing a single literal.
• A rectangle surrounding sixteen squares represents the function F=1.
This last example demonstrates an optimization of a function containing four variables. The function is F = D'C'B'A' + D'C'B'A + D'C'BA + D'C'BA' + D'CB'A + D'CBA + DCB'A + DCBA + DC'B'A' + DC'BA'; the truth map appears in Figure 3.11.

              BA
         00   01   11   10
DC 00     1    1    1    1
   01     0    1    1    0
   11     0    1    1    0
   10     1    0    0    1

Figure 3.11
Truth Map for F = D'C'B'A' + D'C'B'A + D'C'BA + D'C'BA' + D'CB'A + D'CBA + DCB'A + DCBA + DC'B'A' + DC'BA'
Here are two possible sets of maximal rectangles for this function, each producing three terms (see Figure 3.12). Both functions are equivalent; both are as optimal as you can get[3]. Either will suffice for our purposes.
Figure 3.12
Two Combinations of Surrounded Values Yielding Three Terms
First, let's consider the term represented by the rectangle formed by the four corners. This rectangle contains B, B', D, and D'; so we can eliminate those variables. The remaining literals contained within this rectangle are C' and A', so this rectangle represents the term C'A'.
3. Remember, there is no guarantee that there is a unique optimal solution.
The second rectangle, common to both maps in Figure 3.12, is the rectangle formed by the middle four squares. This rectangle includes the terms A, B, B', C, D, and D'. Eliminating B, B', D, and D' (since both primed and unprimed terms exist), we obtain CA as the term for this rectangle.

The map on the left in Figure 3.12 has a third term represented by the top row. This term includes the variables A, A', B, B', C' and D'. Since it contains A, A', B, and B', we can eliminate these terms. This leaves the term C'D'. Therefore, the function represented by the map on the left is F=C'A' + CA + C'D'.

The map on the right in Figure 3.12 has a third term represented by the top/middle four squares. This rectangle subsumes the variables A, B, B', C, C', and D'. We can eliminate B, B', C, and C' since both primed and unprimed versions appear; this leaves the term AD'. Therefore, the function represented by the map on the right is F=C'A' + CA + AD'.

Since both expressions are equivalent and contain the same number of terms and the same number of operators, neither form is better than the other. Unless there is another reason for choosing one over the other, you can use either form.
3.6 What Does This Have To Do With Computers, Anyway?
Although there is a tenuous relationship between boolean functions and boolean expressions in programming languages like C or Pascal, it is fair to wonder why we're spending so much time on this material. However, the relationship between boolean logic and computer systems is much stronger than it first appears. There is a one-to-one relationship between boolean functions and electronic circuits. Electrical engineers who design CPUs and other computer related circuits need to be intimately familiar with this stuff. Even if you never intend to design your own electronic circuits, understanding this relationship is important if you want to make the most of any computer system.
3.6.1 Correspondence Between Electronic Circuits and Boolean Functions
There is a one-to-one correspondence between electrical circuits and boolean functions. For any boolean function you can design an electronic circuit and vice versa. Since boolean functions only require the AND, OR, and NOT boolean operators[4], we can construct any electronic circuit using these operations exclusively. The boolean AND, OR, and NOT functions correspond to the following electronic circuits: the AND, OR, and inverter (NOT) gates (see Figure 3.13).
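[Gate symbols: an AND gate with inputs A and B and output "A and B"; an OR gate with inputs A and B and output "A or B"; an inverter with input A and output A'.]
Figure 3.13
AND, OR, and Inverter (NOT) Gates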
One interesting fact is that you only need a single gate type to implement any electronic circuit. This gate is the NAND gate, shown in Figure 3.14.
4. We know this is true because these are the only operators that appear within canonical forms.
[Gate symbol: a NAND gate with inputs A and B and output "not (A and B)".]
Figure 3.14
The NAND Gate
To prove that we can construct any boolean function using only NAND gates, we need only show how to build an inverter (NOT), an AND gate, and an OR gate from a NAND (since we can create any boolean function using only AND, NOT, and OR). Building an inverter is easy, just connect the two inputs together (see Figure 3.15).
[A NAND gate with both inputs connected to A; the output is A'.]
Figure 3.15
Inverter Built from a NAND Gate
Once we can build an inverter, building an AND gate is easy – just invert the output of a NAND gate. After all, NOT (NOT (A AND B)) is equivalent to A AND B (see Figure 3.16). Of course, this takes two NAND gates to construct a single AND gate, but no one said that circuits constructed only with NAND gates would be optimal, only that it is possible.
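[A NAND gate with inputs A and B whose output is inverted by a second NAND gate; the final output is "A and B".]
Figure 3.16
Constructing an AND Gate From Two NAND Gates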
The remaining gate we need to synthesize is the logical-OR gate. We can easily construct an OR gate from NAND gates by applying DeMorgan's theorems.

(A or B)'  =  A' and B'        DeMorgan's Theorem.
A or B     =  (A' and B')'     Invert both sides of the equation.
A or B     =  A' nand B'       Definition of NAND operation.
By applying these transformations, you get the circuit in Figure 3.17.
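[Inputs A and B are each inverted by a NAND gate; a third NAND gate combines the two inverted signals to produce "A or B".]
Figure 3.17
Constructing an OR Gate from NAND Gates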
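The three constructions above are easy to check by brute force. The following Python sketch (Python and the function names nand, not_, and_, and or_ are assumptions made for illustration) builds each gate from nothing but a two-input NAND and compares the result with the expected truth table.

```python
def nand(a, b):
    """The only primitive gate: not (a and b), on 0/1 values."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)                    # both inputs tied together (Figure 3.15)


def and_(a, b):
    return not_(nand(a, b))              # invert the NAND's output (Figure 3.16)


def or_(a, b):
    return nand(not_(a), not_(b))        # A or B = A' nand B' (Figure 3.17)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), not_(a))
```

Counting calls to nand shows the cost the text mentions: the inverter takes one gate, the AND gate two, and the OR gate three.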
Now you might be wondering why we would even bother with this. After all, why not just use logical AND, OR, and inverter gates directly? There are two reasons for this. First, NAND gates are generally less expensive to build than other gates.
Second, it is also much easier to build up complex integrated circuits from the same basic building blocks than it is to construct an integrated circuit using different basic gates. Note, by the way, that it is possible to construct any logic circuit using only NOR gates[5]. The correspondence between NAND and NOR logic is orthogonal to the correspondence between the two canonical forms appearing in this chapter (sum of minterms vs. product of maxterms). While NOR logic is useful for many circuits, most electronic designs use NAND logic.
3.6.2 Combinatorial Circuits
A combinatorial circuit is a system containing basic boolean operations (AND, OR, NOT), some inputs, and a set of outputs. Since each output corresponds to an individual logic function, a combinatorial circuit often implements several different boolean functions. It is very important that you remember this fact: each output represents a different boolean function.

A computer's CPU is built up from various combinatorial circuits. For example, you can implement an addition circuit using boolean functions. Suppose you have two one-bit numbers, A and B. You can produce the one-bit sum and the one-bit carry of this addition using the two boolean functions:

S  =  AB' + A'B      Sum of A and B.
C  =  AB             Carry from addition of A and B.
These two boolean functions implement a half adder. Electrical engineers call it a half adder because it adds two bits together but cannot add in a carry from a previous operation. A full adder adds three one-bit inputs (two bits plus a carry from a previous addition) and produces two outputs: the sum and the carry. The two logic equations for a full adder are

S     =  A'B'Cin + A'BCin' + AB'Cin' + ABCin
Cout  =  AB + ACin + BCin
Although these logic equations only produce a single bit result (ignoring the carry), it is easy to construct an n-bit sum by combining adder circuits (see Figure 3.18). So, as this example clearly illustrates, we can use logic functions to implement arithmetic and boolean operations.

[A half adder takes A0 and B0 and produces S0 and a carry. That carry, together with A1 and B1, feeds a full adder producing S1 and a carry, which in turn feeds the full adder for A2 and B2 producing S2 and a carry, and so on.]
Figure 3.18
Building an N-Bit Adder Using Half and Full Adders
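To see that the adder equations really do add, you can transcribe them directly into code and chain them together the way Figure 3.18 suggests. The following is a minimal Python sketch under that assumption; the function names half_adder, full_adder, and add_n_bits are hypothetical, and each bit is represented by the integer 0 or 1.

```python
def half_adder(a, b):
    s = (a & (1 - b)) | ((1 - a) & b)        # S = AB' + A'B
    c = a & b                                # C = AB
    return s, c


def full_adder(a, b, cin):
    na, nb, nc = 1 - a, 1 - b, 1 - cin
    # S = A'B'Cin + A'BCin' + AB'Cin' + ABCin
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    # Cout = AB + ACin + BCin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def add_n_bits(x, y, n):
    """Add two n-bit unsigned values; return (n-bit sum, final carry)."""
    s, carry = half_adder(x & 1, y & 1)      # bit 0 has no carry in
    result = s
    for i in range(1, n):                    # bits 1..n-1 use full adders
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(add_n_bits(9, 5, 4))
print(add_n_bits(9, 8, 4))
```

The first call prints (14, 0). The second prints (1, 1): the four-bit sum wraps around and the final carry reports the overflow.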
5. NOR is NOT (A OR B).

Another common combinatorial circuit is the seven-segment decoder. This is a combinatorial circuit that accepts four inputs and determines which of the segments on a seven-segment LED display should be on (logic one) or off (logic zero). Since a seven segment display contains seven output values (one for each segment), there will be seven logic functions associated with the display (segment zero through segment six). See Figure 3.19 for the segment assignments. Figure 3.20 shows the segment assignments for each of the ten decimal values.

[A seven-segment display with its segments labelled S0 through S6.]
Figure 3.19
Seven Segment Display

[The segment patterns for the ten decimal digits.]
Figure 3.20
Seven Segment Values for "0" Through "9"
The four inputs to each of these seven boolean functions are the four bits from a binary number in the range 0..9. Let D be the H.O. bit of this number and A be the L.O. bit of this number. Each logic function should produce a one (segment on) for a given input if that particular segment should be illuminated. For example, S4 (segment four) should be on for binary values 0000, 0010, 0110, and 1000. For each value that illuminates a segment, you will have one minterm in the logic equation:

S4 = D'C'B'A' + D'C'BA' + D'CBA' + DC'B'A'.

S0, as a second example, is on for values zero, two, three, five, six, seven, eight, and nine. Therefore, the logic function for S0 is

S0 = D'C'B'A' + D'C'BA' + D'C'BA + D'CB'A + D'CBA' + D'CBA + DC'B'A' + DC'B'A
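As a quick check on these two equations, the following Python sketch evaluates each one for the digits zero through nine and lists the digits that light the segment. Python and the names s4, s0, and digits_lit are assumptions made for this illustration only.

```python
def s4(d, c, b, a):
    nd, nc, nb, na = 1 - d, 1 - c, 1 - b, 1 - a
    # S4 = D'C'B'A' + D'C'BA' + D'CBA' + DC'B'A'
    return (nd & nc & nb & na) | (nd & nc & b & na) | (nd & c & b & na) | (d & nc & nb & na)


def s0(d, c, b, a):
    nd, nc, nb, na = 1 - d, 1 - c, 1 - b, 1 - a
    # S0 = D'C'B'A' + D'C'BA' + D'C'BA + D'CB'A + D'CBA' + D'CBA + DC'B'A' + DC'B'A
    return ((nd & nc & nb & na) | (nd & nc & b & na) | (nd & nc & b & a) |
            (nd & c & nb & a) | (nd & c & b & na) | (nd & c & b & a) |
            (d & nc & nb & na) | (d & nc & nb & a))


def digits_lit(segment):
    """Digits 0..9 for which the given segment function outputs one."""
    return [n for n in range(10)
            if segment((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)]


print(digits_lit(s4))
print(digits_lit(s0))
```

The output is [0, 2, 6, 8] for S4 and [0, 2, 3, 5, 6, 7, 8, 9] for S0, matching the values listed above.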
You can generate the other five logic functions in a similar fashion.

Decoder circuits are among the more important circuits in computer system design. They provide the ability to recognize (or 'decode') a string of bits. One very common use for a decoder is memory expansion. For example, suppose a system designer wishes to install four (identical) 256 MByte memory modules in a system to bring the total to one gigabyte of RAM. These 256 MByte memory modules have 28 address lines (A0..A27) assuming each memory module is eight bits wide (2^28 x 8 bits is 256 MBytes)[6]. Unfortunately, if the system designer hooked up those four memory modules to the CPU's address bus they would all respond to the same addresses on the bus. Pandemonium would result. To correct this problem, we need to select each memory module when a different set of addresses appears on the address bus. By adding a chip enable line to each of the memory modules and using a two-input, four-output decoder circuit, we can easily do this. See Figure 3.21 for the details.
6. Actually, most memory modules are wider than eight bits, so a real 256 MByte memory module will have fewer than 28 address lines, but we will ignore this technicality in this example.
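[Address lines A28 and A29 drive a two to four decoder whose four outputs are the chip select lines of the four memory modules; address lines A0..A27 connect to every module.]
Figure 3.21
Adding Four 256 MByte Memory Modules to a System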
The two-line to four-line decoder circuit in Figure 3.21 actually incorporates four different logic functions, one function for each of the outputs. Assume the inputs are A and B (A=A28 and B=A29); then the four output functions have the following (simple) equations:

Q0  =  A' B'
Q1  =  A B'
Q2  =  A' B
Q3  =  A B
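The four decoder equations can be sketched in Python as follows. This is only an illustration using active high logic: the function names decoder_2_to_4 and chip_selects are hypothetical, and the address is assumed to be a 30-bit value whose bits 28 and 29 play the roles of A and B.

```python
def decoder_2_to_4(a, b):
    """Return (Q0, Q1, Q2, Q3) for one-bit inputs a and b."""
    na, nb = 1 - a, 1 - b
    return (na & nb,     # Q0 = A'B'
            a & nb,      # Q1 = AB'
            na & b,      # Q2 = A'B
            a & b)       # Q3 = AB


def chip_selects(address):
    """Chip select lines for the four modules, given a 30-bit address."""
    a = (address >> 28) & 1      # A = A28
    b = (address >> 29) & 1      # B = A29
    return decoder_2_to_4(a, b)


print(chip_selects(0x00000000))
print(chip_selects(0x10000000))
print(chip_selects(0x3FFFFFFF))
```

The three calls print (1, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 0, 1). Exactly one output is high for any address, so exactly one module responds.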
Following standard electronic circuit notation, these equations use "Q" to denote an output (electronic designers use "Q" for output rather than "O" because "Q" looks somewhat like an "O" and is more easily differentiated from zero). Also note that most circuit designers use active low logic for decoders and chip enables. This means that they enable a circuit with a low input value (zero) and disable the circuit with a high input value (one). Likewise, the output lines of a decoder chip are normally high and go low when the inputs select a given output line. This means that the equations above really need to be inverted for real-world examples. We'll ignore this issue here and use positive (or active high) logic[7].

Another big use for decoding circuits is to decode a byte in memory that represents a machine instruction in order to activate the corresponding circuitry to perform whatever tasks the instruction requires. We'll cover this subject in much greater depth in a later chapter, but a simple example at this point will provide another solid example for using decoders.

Most modern (Von Neumann) computer systems represent machine instructions via values in memory. To execute an instruction the CPU fetches a value from memory, decodes that value, and then does the appropriate activity the instruction specifies. Obviously, the CPU uses decoding circuitry to decode the instruction. To see how this is done, let's create a very simple CPU with a very simple instruction set. Figure 3.22 provides the instruction format (that is, it specifies all the numeric codes) for our simple CPU.
7. Electronic circuits often use active low logic because the circuits that employ them typically require fewer transistors to implement.
Instruction (opcode) Format:

Bit:    7   6   5   4   3   2   1   0
        0   i   i   i   s   s   d   d

iii             ss & dd
000 = MOV       00 = EAX
001 = ADD       01 = EBX
010 = SUB       10 = ECX
011 = MUL       11 = EDX
100 = DIV
101 = AND
110 = OR
111 = XOR

Figure 3.22
Instruction (opcode) Format for a Very Simple CPU
To determine the eight-bit operation code (opcode) for a given instruction, the first thing you do is choose the instruction you want to encode. Let’s pick “MOV( EAX, EBX);” as our simple example. To convert this instruction to its numeric equivalent we must first look up the value for MOV in the iii table above; the corresponding value is 000. Therefore, we must substitute 000 for iii in the opcode byte. Second, we consider our source operand. The source operand is EAX, whose encoding in the source operand table (ss & dd) is 00. Therefore, we substitute 00 for ss in the instruction opcode.
Next, we need to convert the destination operand to its numeric equivalent. Once again, we look up the value for this operand in the ss & dd table. The destination operand is EBX and its value is 01. So we substitute 01 for dd in our opcode byte. Assembling these three fields into the opcode byte (a packed data type), we obtain the following bit value: %00000001. Therefore, the numeric value $1 is the value for the "MOV( EAX, EBX);" instruction (see Figure 3.23).
Bit:    7   6   5   4   3   2   1   0
        0   0   0   0   0   0   0   1

iii = 000 (MOV)     ss = 00 (EAX)     dd = 01 (EBX)

Figure 3.23
Encoding the MOV( EAX, EBX ); Instruction
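The encoding procedure just described amounts to three table lookups followed by packing the fields into one byte. Here is a minimal Python sketch of it, assuming the field layout of Figure 3.22; the names OPCODES, REGISTERS, and encode are hypothetical and exist only for this illustration.

```python
# Field values taken from the iii and ss & dd tables in Figure 3.22.
OPCODES = {"MOV": 0b000, "ADD": 0b001, "SUB": 0b010, "MUL": 0b011,
           "DIV": 0b100, "AND": 0b101, "OR": 0b110, "XOR": 0b111}
REGISTERS = {"EAX": 0b00, "EBX": 0b01, "ECX": 0b10, "EDX": 0b11}


def encode(instruction, source, dest):
    """Pack iii into bits 4..6, ss into bits 2..3, and dd into bits 0..1."""
    return (OPCODES[instruction] << 4) | (REGISTERS[source] << 2) | REGISTERS[dest]


opcode = encode("MOV", "EAX", "EBX")
print(format(opcode, "08b"), hex(opcode))
```

This prints 00000001 0x1, the same bit pattern shown in Figure 3.23.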
As another example, consider the "AND( EDX, ECX);" instruction. For this instruction the iii field is %101, the ss field is %11, and the dd field is %10. This yields the opcode %01011110 or $5E. You may easily create other opcodes for our simple instruction set using this same technique.

Warning: please do not come to the conclusion that these encodings apply to the 80x86 instruction set. The encodings in these examples are highly simplified in order to demonstrate instruction decoding. They do not correspond to any real-life CPU, and they especially don't apply to the x86 family.

In these past few examples we were actually encoding the instructions. Of course, the real purpose of this exercise is to discover how the CPU can use a decoder circuit to decode these instructions and execute them at run time. A typical set of decoder circuits for this might look like that in Figure 3.24:
[The two source register bits of the opcode feed a 2 line to 4 line decoder (inputs A and B) whose outputs Q0..Q3 select EAX, EBX, ECX, or EDX (see Note). The three instruction bits feed a 3 line to 8 line decoder (inputs A, B, and C) whose outputs Q0..Q7 enable the circuitry to do a MOV, ADD, SUB, MUL, DIV, AND, OR, or XOR, respectively.]
Note: the circuitry attached to the destination register bits is identical to the circuitry for the source register bits.
Figure 3.24
Decoding Simple Machine Instructions
Notice how this circuit uses three separate decoders to decode the individual fields of the opcode. This is much less complex than creating a seven-line to 128-line decoder to decode each individual opcode. Of course, all that the circuit above will do is tell you which instruction and what operands a given opcode specifies. To actually execute this instruction you must supply additional circuitry to select the source and destination operands from an array of registers and act accordingly upon those operands. Such circuitry is beyond the scope of this chapter, so we’ll save the juicy details for later. Combinatorial circuits are the basis for many components of a basic computer system. You can construct circuits for addition, subtraction, comparison, multiplication, division, and many other operations using combinatorial logic.
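The idea of decoding each field separately can be sketched in software as well. The following Python illustration (the names decoder, INSTRUCTIONS, REGS, and decode are hypothetical) models each decoder as a list of output lines with exactly one line high, then uses the active line to pick the instruction and registers.

```python
INSTRUCTIONS = ["MOV", "ADD", "SUB", "MUL", "DIV", "AND", "OR", "XOR"]
REGS = ["EAX", "EBX", "ECX", "EDX"]


def decoder(value, lines):
    """Model an n-line decoder: output number `value` is 1, the rest are 0."""
    return [1 if i == value else 0 for i in range(lines)]


def decode(opcode):
    """Split the opcode into its fields and decode each field separately."""
    q_instr = decoder((opcode >> 4) & 0b111, 8)   # 3 line to 8 line decoder
    q_src = decoder((opcode >> 2) & 0b11, 4)      # 2 line to 4 line decoder
    q_dst = decoder(opcode & 0b11, 4)             # 2 line to 4 line decoder
    return (INSTRUCTIONS[q_instr.index(1)],
            REGS[q_src.index(1)],
            REGS[q_dst.index(1)])


print(decode(0x01))
print(decode(0x5E))
```

The two calls print ('MOV', 'EAX', 'EBX') and ('AND', 'EDX', 'ECX'). The three small decoders have 8 + 4 + 4 = 16 output lines between them, compared with the 128 lines a single decoder for the whole opcode would need.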
3.6.3 Sequential and Clocked Logic
One major problem with combinatorial logic is that it is memoryless. In theory, all logic function outputs depend only on the current inputs. Any change in the input values is immediately reflected in the outputs[8]. Unfortunately, computers need the ability to remember the results of past computations. This is the domain of sequential or clocked logic.
A memory cell is an electronic circuit that remembers an input value after the removal of that input value. The most basic memory unit is the set/reset flip-flop. You can construct an SR flip-flop using two NAND gates, as shown in Figure 3.25.
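[Two cross-coupled NAND gates: the top gate has input S and output Q; the bottom gate has input R and output Q'. Each gate's output feeds the second input of the other gate.]
Figure 3.25
Set/Reset Flip Flop Constructed from NAND Gates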
The S and R inputs are normally high. If you temporarily set the S input to zero and then bring it back to one (toggle the S input), this forces the Q output to one. Likewise, if you toggle the R input from one to zero back to one, this sets the Q output to zero. The Q' output is generally the inverse of the Q output.

Note that if both S and R are one, then the Q output depends upon the previous value of Q. That is, whatever Q happens to be, the top NAND gate continues to output that value. If Q was originally one, then there are two ones as inputs to the bottom NAND gate (Q nand R). This produces an output of zero (Q'). Therefore, the two inputs to the top NAND gate are zero and one. This produces the value one as an output (matching the original value for Q). If the original value for Q was zero, then the inputs to the bottom NAND gate are Q=0 and R=1. Therefore, the output of this NAND gate is one. The inputs to the top NAND gate, therefore, are S=1 and Q'=1. This produces a zero output, the original value of Q.

Suppose Q is zero, S is zero and R is one. This sets the two inputs to the top NAND gate to one and zero, forcing the output (Q) to one. Returning S to the high state does not change the output at all. You can obtain this same result if Q is one, S is zero, and R is one. Again, this produces an output value of one. This value remains one even when S switches from zero to one. Therefore, toggling the S input from one to zero and then back to one produces a one on the output (i.e., sets the flip-flop). The same idea applies to the R input, except it forces the Q output to zero rather than to one.

There is one catch to this circuit. It does not operate properly if you set both the S and R inputs to zero simultaneously. This forces both the Q and Q' outputs to one (which is logically inconsistent). Whichever input remains zero the longest determines the final state of the flip-flop. A flip-flop operating in this mode is said to be unstable.
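You can trace the behavior described above with a small simulation. The following Python sketch is only a rough model: it assumes the top gate computes Q = S nand Q' and the bottom gate computes Q' = R nand Q, and it evaluates the two gates in turn a few times so the feedback can settle. The names nand and sr_step are hypothetical, and real gates switch concurrently with propagation delays that this model ignores.

```python
def nand(a, b):
    return 1 - (a & b)


def sr_step(s, r, q, q_bar):
    """Apply inputs s and r (active low) to a flip-flop in state (q, q_bar)."""
    for _ in range(3):               # a few passes let the feedback settle
        q = nand(s, q_bar)           # top NAND gate
        q_bar = nand(r, q)           # bottom NAND gate
    return q, q_bar


state = (0, 1)                       # start with Q=0, Q'=1
state = sr_step(0, 1, *state)        # pull S low: sets the flip-flop
print(state)
state = sr_step(1, 1, *state)        # S back high: the value is remembered
print(state)
state = sr_step(1, 0, *state)        # pull R low: resets the flip-flop
print(state)
state = sr_step(1, 1, *state)        # R back high: still remembered
print(state)
```

The four lines printed are (1, 0), (1, 0), (0, 1), and (0, 1). Calling sr_step(0, 0, 0, 1) returns (1, 1), the inconsistent state in which both outputs are one.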
The only problem with the S/R flip-flop is that you must use separate inputs to remember a zero or a one value. A memory cell would be more valuable to us if we could specify the data value to remember on one input and provide a clock input to latch the input value. This type of flip-flop, the D flip-flop (for data) uses the circuit in Figure 3.26.
8. In practice, there is a short propagation delay between a change in the inputs and the corresponding outputs in any electronic implementation of a boolean function.
[A circuit of NAND gates with inputs Clk and Data and outputs Q and Q'.]
Figure 3.26
Implementing a D flip-flop with NAND Gates
Assuming you fix the Q and Q' outputs to either 0/1 or 1/0, sending a clock pulse that goes from zero to one back to zero will copy the D input to the Q output. It will also copy D' to Q'. The exercises at the end of this topic section will expect you to describe this operation in detail, so study this diagram carefully.

Although remembering a single bit is often important, in most computer systems you will want to remember a group of bits. You can remember a sequence of bits by combining several D flip-flops in parallel. Concatenating flip-flops to store an n-bit value forms a register. The electronic schematic in Figure 3.27 shows how to build an eight-bit register from a set of D flip-flops.
[Eight D flip-flops with data inputs D0..D7 and outputs Q0..Q7, all sharing a single Clk line.]
Figure 3.27
An Eight-bit Register Implemented with Eight D Flip-flops
Note that the eight D flip-flops use a common clock line. This diagram does not show the Q’ outputs on the flip-flops since they are rarely required in a register. D flip-flops are useful for building many sequential circuits above and beyond simple registers. For example, you can build a shift register that shifts the bits one position to the left on each clock pulse. A four-bit shift register appears in Figure 3.28.
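[Four D flip-flops sharing a common Clk line. Data In feeds the D input of the first flip-flop, and the outputs are labelled Q0 through Q3, with each flip-flop's output feeding the D input of the next.]
Figure 3.28
A Four-bit Shift Register Built from D Flip-flops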
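A behavioral model shows what the shift register does on each clock pulse. The following Python sketch assumes the usual wiring, in which Data In feeds the flip-flop that produces Q0 and each output feeds the next flip-flop's D input; the class name ShiftRegister and its methods are hypothetical.

```python
class ShiftRegister:
    """Behavioral model of a shift register built from D flip-flops."""

    def __init__(self, bits=4):
        self.q = [0] * bits              # q[0] is Q0, q[3] is Q3

    def clock(self, data_in):
        # All flip-flops latch on the same pulse, so each stage receives
        # the value its neighbor held *before* the pulse.
        self.q = [data_in] + self.q[:-1]
        return self.q

    def value(self):
        """The register contents as a number, with Q0 as the L.O. bit."""
        return sum(bit << i for i, bit in enumerate(self.q))


sr = ShiftRegister()
for bit in (1, 0, 1, 1):
    print(sr.clock(bit), sr.value())
```

After the four pulses the outputs are [1, 1, 0, 1] (Q0 through Q3) and the value is 11, or %1011: the first bit shifted in has travelled all the way to Q3, one position to the left on each pulse.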
You can even build a counter that counts the number of times the clock toggles from one to zero and back to one using flip-flops. The circuit in Figure 3.29 implements a four bit counter using D flip-flops.
[Four D flip-flops, each with Clk and D inputs, with outputs labelled Q0' through Q3'.]
Figure 3.29
Four-bit Counter Built from D Flip-flops
Surprisingly, you can build an entire CPU with combinatorial circuits and only a few additional sequential circuits beyond these. For example, you can build a simple state machine known as a sequencer by combining a counter and a decoder as shown in Figure 3.30. For each cycle of the clock this sequencer activates one of its output lines. Those lines, in turn, may control other circuitry. By "firing" these circuits on each of the 16 output lines of the decoder, we can control the order in which these 16 different circuits accomplish their tasks. This is a fundamental need in a CPU since we often need to control the sequence of various operations (for example, it wouldn't be a good thing if the "ADD( EAX, EBX);" instruction stored the result into EBX before fetching the source operand from EAX (or EBX)). A simple sequencer such as this one can tell the CPU when to fetch the first operand, when to fetch the second operand, when to add them together, and when to store the result away. But we're getting a little ahead of ourselves; we'll discuss this in greater detail in a later chapter.
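The sequencer's behavior is simple enough to model in a few lines. The following Python sketch treats the four-bit counter as an integer that wraps after 15 and the decoder as a list of 16 output lines; the generator name sequencer is hypothetical and the model ignores all timing details.

```python
def sequencer(clock_pulses, bits=4):
    """Yield the decoder's output lines for each successive clock pulse."""
    lines = 1 << bits                    # a 4-bit counter drives 16 lines
    count = 0
    for _ in range(clock_pulses):
        yield [1 if i == count else 0 for i in range(lines)]
        count = (count + 1) % lines      # the counter wraps after state 15


for outputs in sequencer(3):
    print(outputs.index(1), outputs)
```

Each pulse activates exactly one line: state 0, then state 1, then state 2, and so on, returning to state 0 after state 15. Whatever circuitry is attached to a line therefore runs in a fixed, repeating order.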
[A four-bit counter, driven by Clk, whose outputs Q0..Q3 feed the A, B, C, and D inputs of a 4-line to 16-line decoder. The decoder outputs Q0..Q15 are labelled State 0 through State 15.]
Figure 3.30
A Simple 16-State Sequencer

3.7 Okay, What Does It Have To Do With Programming, Then?
Once you have registers, counters, and shift registers, you can build state machines. The implementation of an algorithm in hardware using state machines is well beyond the scope of this text. However, one important point must be made with respect to such circuitry: any algorithm you can implement in software you can also implement directly in hardware. This suggests that boolean logic is the basis for computation on all modern computer systems. Any program you can write, you can specify as a sequence of boolean equations.

Of course, it is much easier to specify a solution to a programming problem using languages like Pascal, C, or even assembly language than it is to specify the solution using boolean equations. Therefore, it is unlikely that you would ever implement an entire program using a set of state machines and other logic circuitry. Nevertheless, there are times when a hardware implementation is better. A hardware solution can be one, two, three, or more orders of magnitude faster than an equivalent software solution. Therefore, some time critical operations may require a hardware solution.

A more interesting fact is that the converse of the above statement is also true. Not only can you implement all software functions in hardware, but it is also possible to implement all hardware functions in software. This is an important revelation because many operations you would normally implement in hardware are much cheaper to implement using software on a microprocessor. Indeed, this is a primary use of assembly language in modern systems: to inexpensively replace a complex electronic circuit. It is often possible to replace many tens or hundreds of dollars of electronic components with a single $5 microcomputer chip. The whole field of embedded systems deals with this very problem. Embedded systems are computer systems embedded in other products.
For example, most microwave ovens, TV sets, video games, CD players, and other consumer devices contain one or more complete computer systems whose sole purpose is to replace a complex hardware design. Engineers use computers for this purpose because they are less expensive and easier to design with than traditional electronic circuitry. You can easily design software that reads switches (input variables) and turns on motors, LEDs or lights, locks or unlocks a door, etc. (output functions). To write such software, you will need an understanding of boolean functions and how to implement such functions in software. Of course, there is one other reason for studying boolean functions, even if you never intend to write software intended for an embedded system or write software that manipulates real-world devices. Many high level languages process boolean expressions (e.g., those expressions that control an IF statement or WHILE loop). By applying transformations like DeMorgan’s theorems or a mapping optimization it is often possible to improve the performance of high level language code. Therefore, studying boolean functions is important even if you never intend to design an electronic circuit. It can help you write better code in a traditional programming language. For example, suppose you have the following statement in Pascal:
    if ((x=y) and (a <> b)) or ((x=y) and (c
    .
    .
    .
    3 ) _then
        stdout.put( "in second _if statement" nl );
    _endif;
    _endif;
    endfor;
end IFDemo;
Program 9.3
Macro Implementation of the IF..ENDIF Statement
9.2.2 The HLA SWITCH/CASE Statement

HLA doesn’t support a selection statement (SWITCH or CASE statement) directly in the language. Instead, HLA’s SWITCH..CASE..DEFAULT..ENDSWITCH statement exists only as a macro in the HLA Standard Library HLL.HHF file. This section discusses HLA’s macro implementation of the SWITCH statement.

The SWITCH statement is very complex, so it should come as no surprise that the macro implementation is long and involved. The example appearing in this section is slightly simplified from the standard HLA version, but not by much. This discussion assumes that you’re familiar with the low-level implementation of the SWITCH..CASE..DEFAULT..ENDSWITCH statement. If you are not comfortable with that implementation, or feel a little rusty, you may want to take another look at “SWITCH/CASE Statements” on page 776 before attempting to read this section. The discussion in this section is somewhat advanced and assumes a fair amount of programming skill. If you have trouble following this discussion, you may want to skip this section until you gain some more experience.

There are several different ways to implement a SWITCH statement. In this section we will assume that the _switch.._endswitch macro we are writing will implement the SWITCH statement using a jump table. Implementation as a sequence of if..elseif statements is fairly trivial and is left as an exercise. Other schemes are possible as well, but this section will not consider them.

A typical SWITCH statement implementation might look like the following:

readonly
    JmpTbl:dword[3] := [ &Stmt5, &Stmt6, &Stmt7 ];
    .
    .
    .
Beta Draft - Do not distribute
© 2001, By Randall Hyde
Chapter Nine
Volume Five
// switch( i )

    mov( i, eax );          // Check to see if "i" is outside the range
    cmp( eax, 5 );          // 5..7 and transfer control directly to the
    jb EndCase;             // DEFAULT case if it is.
    cmp( eax, 7 );
    ja EndCase;
    jmp( JmpTbl[ eax*4 - 5*@size(dword) ] );

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );

    EndCase:
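The dispatch logic in the listing above can be modeled in a few lines of Python, used here only as executable pseudocode (it is not HLA). This is a minimal sketch: the names LOW, HIGH, jmp_tbl, and switch are hypothetical, and the three functions merely stand in for the labels Stmt5 through Stmt7.

```python
# Hypothetical Python model of the jump-table dispatch shown above.
# LOW, HIGH, jmp_tbl, and switch are illustrative names only.

LOW, HIGH = 5, 7                 # smallest and largest case values


def stmt5():
    return "I=5"


def stmt6():
    return "I=6"


def stmt7():
    return "I=7"


# One entry per value in LOW..HIGH, like JmpTbl:dword[3].
jmp_tbl = [stmt5, stmt6, stmt7]


def switch(i):
    # Range check: a value outside LOW..HIGH skips every case,
    # just as the jb/ja pair transfers control to EndCase.
    if i < LOW or i > HIGH:
        return None
    # Subtracting LOW plays the role of "- 5*@size(dword)" in the
    # indexed jump: case 5 must select table entry zero.
    return jmp_tbl[i - LOW]()
```

Calling switch(6) selects the second table entry, while switch(4) and switch(8) fall outside the range and execute no case at all.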
If you study this code carefully, with an eye to writing a macro to implement this statement, you’ll discover a couple of major problems. First of all, it is exceedingly difficult to determine how many cases there are, and the range of values those cases cover, before actually processing each CASE in the SWITCH statement. Therefore, it is really difficult to emit the range check (for values outside the range 5..7) and the indirect jump before processing all the cases in the SWITCH statement. You can easily solve this problem, however, by moving the checks and the indirect jump to the bottom of the code and inserting a couple of extra JMP instructions. This produces the following implementation:

readonly
    JmpTbl:dword[3] := [ &Stmt5, &Stmt6, &Stmt7 ];
    .
    .
    .
// switch( i )

    jmp DoSwitch;           // First jump inserted into this code.

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );
        jmp EndCase;        // Second jump inserted into this code.

    DoSwitch:               // Insert this label and move the range
        mov( i, eax );      // checks and indirect jump down here.
        cmp( eax, 5 );
        jb EndCase;
        cmp( eax, 7 );
        ja EndCase;
        jmp( JmpTbl[ eax*4 - 5*@size(dword) ] );

// All the cases (including the default case) jump down here:

    EndCase:
Since the range check code appears after all the cases, the macro can now process those cases and easily determine the bounds on the cases by the time it must emit the CMP instructions above that check the bounds of the SWITCH value. However, this implementation still has a problem. The entries in the JmpTbl table refer to labels that can only be determined by first processing all the cases in the SWITCH statement. Therefore, a macro cannot emit this table in a READONLY section that appears earlier in the source file than the SWITCH statement. Fortunately, HLA lets you embed data in the middle of the code section using the READONLY..ENDREADONLY and STATIC..ENDSTATIC directives (see footnote 1). Taking advantage of this feature allows us to rewrite the SWITCH implementation as follows:

// switch( i )

    jmp DoSwitch;           // First jump inserted into this code.

// case( 5 )

    Stmt5:
        stdout.put( "I=5" );
        jmp EndCase;

// Case( 6 )

    Stmt6:
        stdout.put( "I=6" );
        jmp EndCase;

// Case( 7 )

    Stmt7:
        stdout.put( "I=7" );
        jmp EndCase;        // Second jump inserted into this code.

    DoSwitch:               // Insert this label and move the range
        mov( i, eax );      // checks and indirect jump down here.
        cmp( eax, 5 );
        jb EndCase;
        cmp( eax, 7 );
        ja EndCase;
        jmp( JmpTbl[ eax*4 - 5*@size(dword) ] );

// All the cases (including the default case) jump down here:

    EndCase:

    readonly
        JmpTbl:dword[3] := [ &Stmt5, &Stmt6, &Stmt7 ];
    endreadonly;
HLA’s macros can produce code like this when processing a SWITCH macro. So this is the type of code we will generate with a _switch.._case.._default.._endswitch macro. Since we’re going to need to know the minimum and maximum case values (in order to generate the appropriate operands for the CMP instructions above), the _case #KEYWORD macro needs to compare the current case value(s) against the global minimum and maximum case values for all cases. If the current case value is less than the global minimum or greater than the global maximum, then the _case macro must update these global values accordingly. The _endswitch macro will use these global minimum and maximum values in the two CMP instructions it generates for the range checking sequence.

1. HLA actually moves the data to the appropriate segment in memory; the data is not stored directly in the CODE section.

For each case value appearing in a _switch statement, the _case macros must save the case value and an identifying label for that case value. This is necessary so that the _endswitch macro can generate the jump table. What is really needed is an arbitrary list of records, each record containing a value field and a label field. Unfortunately, the HLA compile-time language does not support arbitrary lists of objects, so we will have to implement the list using a (fixed size) array of record constants. The record declaration will take the following form:

caseRecord:
    record
        value:uns32;
        label:uns32;
    endrecord;
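The bookkeeping described above (one record per case plus a running minimum and maximum) can be sketched in Python as executable pseudocode. This is only a model of the idea, not the HLA macro itself: the class SwitchState, its method add_case, and the constant MAX_CASES are hypothetical names, and the initial minimum and maximum are assumed to be the largest and smallest 32-bit unsigned values.

```python
# Sketch of the compile-time state the _case macro has to maintain.
# SwitchState, add_case, and MAX_CASES are hypothetical names.

MAX_CASES = 256                      # fixed-size list, as in the text


class SwitchState:
    def __init__(self):
        self.cases = []              # (value, label) records
        self.minval = 0xFFFF_FFFF    # assumed start: largest uns32
        self.maxval = 0              # assumed start: smallest uns32

    def add_case(self, value):
        """Record one case value and return its integer label."""
        if len(self.cases) >= MAX_CASES:
            raise ValueError("Too many cases in statement")
        label = len(self.cases)      # unique integer per case
        self.cases.append((value, label))
        # Update the global bounds used later by the range check.
        self.minval = min(self.minval, value)
        self.maxval = max(self.maxval, value)
        return label
```

After adding the cases 6, 5, and 9 in that order, the state holds three records labeled 0, 1, and 2, with a minimum of 5 and a maximum of 9, which are exactly the operands the two CMP instructions need.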
The value field will hold the current case value. The label field will hold a unique integer value for the corresponding _case that the macros can use to generate statement labels. The implementation of the _switch macro in this section will use a variant of the trick found in the section on the _if macro; it will convert a local macro symbol to a string and append an integer value to the end of that string to create a unique label. The integer value appended will be the value of the label field in the caseRecord list.

Processing the _case macro becomes fairly easy at this point. All the _case macro has to do is create an entry in the caseRecord list, bump a few counters, and emit an appropriate case label prior to the code emission. The implementation in this section uses Pascal semantics, so all but the first case in the _switch.._endswitch statement must first emit a jump to the statement following the _endswitch so the previous case’s code doesn’t fall into the current case.

The real work in implementing the _switch.._endswitch statement lies in the generation of the jump table. First of all, there is no requirement that the cases appear in ascending order in the _switch.._endswitch statement. However, the entries in the jump table must appear in ascending order. Second, there is no requirement that the cases in the _switch.._endswitch statement be consecutive. Yet the entries in the jump table must be consecutive case values (see footnote 2). The code that emits the jump table must handle these inconsistencies.

The first task is to sort the entries in the caseRecord list in ascending order. This is easily accomplished by writing a little SortCases macro to sort all the caseRecord entries once the _switch.._endswitch macro has processed all the cases. SortCases doesn’t have to be fancy. In fact, a bubble sort algorithm is perfect for this because:

• Bubble sort is easy to implement.
• Bubble sort is efficient when sorting small lists and most SWITCH statements only have a few cases.
• Bubble sort is especially efficient on nearly sorted data and most programmers put their cases in ascending order.

After sorting the cases, only one problem remains: there may be gaps in the case values. This problem is easily handled by stepping through the caseRecord elements one by one and synthesizing consecutive entries whenever a gap appears in the list. Program 9.4 provides the full _switch.._case.._default.._endswitch macro implementation.
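Before reading the HLA listing, it may help to see the two table-building steps (sort the records, then fill the gaps) in isolation. The following Python sketch is executable pseudocode under stated assumptions, not a transcription of the macro: sort_cases and build_jump_table are hypothetical names, each record is a (value, label) pair, and the strings "caseN" and "default" stand in for the addresses that the real table would hold. Unlike Program 9.4, this sketch does not append a trailing dummy entry.

```python
# Sketch of the sort-and-fill steps; all names are hypothetical.

def sort_cases(cases):
    """Bubble sort (value, label) pairs by value; reject duplicates."""
    cases = list(cases)
    bound = len(cases) - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(bound):
            if cases[i][0] > cases[i + 1][0]:
                cases[i], cases[i + 1] = cases[i + 1], cases[i]
                swapped = True
            elif cases[i][0] == cases[i + 1][0]:
                raise ValueError(
                    f"Two cases have the same value: ({cases[i][0]})"
                )
        bound -= 1                   # the largest item is now in place
    return cases


def build_jump_table(cases, default="default"):
    """One entry per value from the smallest case to the largest."""
    ordered = sort_cases(cases)
    table = []
    for i, (value, label) in enumerate(ordered):
        table.append(f"case{label}")
        if i + 1 < len(ordered):
            # Synthesize default entries for any gap before the
            # next case value.
            next_value = ordered[i + 1][0]
            table.extend([default] * (next_value - value - 1))
    return table
```

For the cases 7, 5, and 9 (labeled 0, 1, and 2 in source order), the sketch yields five entries: the case for 5, a default entry for 6, the case for 7, a default entry for 8, and the case for 9.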
2. Of course, if there are gaps in the case values, the jump table entries for the missing items should contain the address of the default case.

/**************************************************/
/*                                                */
/* switch.hla                                     */
/*                                                */
/* This program demonstrates how to implement the */
/* _switch.._case.._default.._endswitch statement */
/* using macros.                                  */
/*                                                */
/**************************************************/

program demoSwitch;
#include( "stdlib.hhf" )

const

    // Because this code uses an array to implement the
    // caseRecord list, we have to specify a fixed number
    // of cases. The following constant defines the maximum
    // number of possible cases in a _switch statement.

    maxCases := 256;

type

    // The following data type holds the case value
    // and statement label information for each
    // case appearing in a _switch statement.

    caseRecord:
        record

            value:uns32;
            lbl:uns32;

        endrecord;


// SortCases
//
// This routine does a bubble sort on an array
// of caseRecord objects. It sorts in ascending
// order using the "value" field as the key.
//
// This is a good old fashioned bubble sort which
// turns out to be very efficient because:
//
// (1) The list of cases is usually quite small, and
// (2) The data is usually already sorted (or mostly sorted).

macro SortCases( sort_array, sort_size ):
    sort_i,
    sort_bnd,
    sort_didswap,
    sort_temp;

    ?sort_bnd := sort_size - 1;
    ?sort_didswap := true;
    #while( sort_didswap )

        ?sort_didswap := false;
        ?sort_i := 0;
        #while( sort_i < sort_bnd )

            #if
            (
                sort_array[sort_i].value >
                    sort_array[sort_i+1].value
            )

                ?sort_temp := sort_array[sort_i];
                ?sort_array[sort_i] := sort_array[sort_i+1];
                ?sort_array[sort_i+1] := sort_temp;
                ?sort_didswap := true;

            #elseif
            (
                sort_array[sort_i].value =
                    sort_array[sort_i+1].value
            )

                #error
                (
                    "Two cases have the same value: (" +
                    string( sort_array[sort_i].value ) +
                    ")"
                )

            #endif
            ?sort_i := sort_i + 1;

        #endwhile
        ?sort_bnd := sort_bnd - 1;

    #endwhile;

endmacro;
// HLA Macro to implement a C SWITCH statement (using
// Pascal semantics). Note that the switch parameter
// must be a 32-bit register.

macro _switch( switch_reg ):
    switch_minval,
    switch_maxval,
    switch_otherwise,
    switch_endcase,
    switch_jmptbl,
    switch_cases,
    switch_caseIndex,
    switch_doCase,
    switch_hasotherwise;
    // Just used to generate unique names.

    // Verify that we have a register operand.

    #if( !@isReg32( switch_reg ) )

        #error( "Switch operand must be a 32-bit register" )

    #endif

    // Create the switch_cases array. Allow, at most, 256 cases.

    ?switch_cases:caseRecord[ maxCases ];

    // General initialization for processing cases.

    ?switch_caseIndex := 0;             // Index into switch_cases array.
    ?switch_minval := $FFFF_FFFF;       // Minimum case value.
    ?switch_maxval := 0;                // Maximum case value.
    ?switch_hasotherwise := false;      // Determines if DEFAULT section present.

    // We need to process the cases to collect information like
    // switch_minval prior to emitting the indirect jump. So move the
    // indirect jump to the bottom of the case statement.

    jmp switch_doCase;
    // "case" keyword macro handles each of the cases in the
    // case statement. Note that this syntax allows you to
    // specify several cases in the same _case macro, e.g.,
    // _case( 2, 3, 4 ). Such a situation tells this macro
    // that these three values all execute the same code.

    keyword _case( switch_parms[] ):
        switch_parmIndex,
        switch_parmCount,
        switch_constant;

        ?switch_parmCount:uns32;
        ?switch_parmCount := @elements( switch_parms );

        #if( switch_parmCount
        .
        .
        .
        switch_maxval )

                ?switch_maxval := switch_constant;

            #endif

            // Emit a unique label to the source code for this case:

            @text
            (
                "_case"
                + @string:switch_caseIndex
                + string( switch_caseIndex )
            ):

            // Save away the case label and the case value so we
            // can build the jump table later on.

            ?switch_cases[ switch_caseIndex ].value := switch_constant;
            ?switch_cases[ switch_caseIndex ].lbl := switch_caseIndex;

            // Bump switch_caseIndex value because we've just processed
            // another case.

            ?switch_caseIndex := switch_caseIndex + 1;
            #if( switch_caseIndex >= maxCases )

                #error( "Too many cases in statement" );

            #endif
            ?switch_parmIndex := switch_parmIndex + 1;

        #endwhile
    // Handle the default keyword/macro here.

    keyword _default;

        // If there was not a preceding case, this is an error.
        // If there was one, emit a jmp instruction to skip over
        // the default case.

        #if( switch_caseIndex < 1 )

            #error( "Must have at least one case" );

        #endif
        jmp switch_endcase;

        // Emit the label for this default case and set the
        // switch_hasotherwise flag to true.

        switch_otherwise:
        ?switch_hasotherwise := true;
    // The endswitch terminator/macro checks to see if
    // this is a reasonable switch statement and emits
    // the jump table code if it is.

    terminator _endswitch:
        switch_i_,
        switch_j_,
        switch_curCase_;

        // If the difference between the smallest and
        // largest case values is great, the jump table
        // is going to be fairly large. If the difference
        // between these two values is greater than 256 but
        // less than 1024, warn the user that the table will
        // be large. If it's greater than 1024, generate
        // an error.
        //
        // Note: these are arbitrary limits. Feel free to
        // adjust them if you like.

        #if( (switch_maxval - switch_minval) > 256 )

            #if( (switch_maxval - switch_minval) > 1024 )

                // Perhaps in the future, this macro could
                // switch to generating an if..elseif..elseif...
                // chain if the range between the values is
                // too great.

                #error( "Range of cases is too great" );

            #else

                #print( "Warning: Range of cases is large" );

            #endif

        #endif

        // Table emission algorithm requires that the switch_cases
        // array be sorted by the case values.

        SortCases( switch_cases, switch_caseIndex );
        // Build a string of the form:
        //
        //   switch_jmptbl:dword[ xx ] := [&case1, &case2, &case3...&casen];
        //
        // so we can output the jump table.

        readonly
            switch_jmptbl:dword[ switch_maxval - switch_minval + 2] := [

            ?switch_i_ := 0;
            #while( switch_i_ < switch_caseIndex )

                ?switch_curCase_ := switch_cases[ switch_i_ ].value;

                // Emit the label associated with the current case:

                @text
                (
                    "&"
                    + "_case"
                    + @string:switch_caseIndex
                    + string( switch_cases[ switch_i_ ].lbl )
                    + ","
                )

                // Emit "&switch_otherwise" table entries for any gaps
                // present in the table:

                ?switch_j_ := switch_cases[ switch_i_ + 1 ].value;
                ?switch_curCase_ := switch_curCase_ + 1;
                #while( switch_curCase_ < switch_j_ )

                    &switch_otherwise,
                    ?switch_curCase_ := switch_curCase_ + 1;

                #endwhile
                ?switch_i_ := switch_i_ + 1;

            #endwhile

            // Emit a dummy entry to terminate the table:

            &switch_otherwise];

        endreadonly;

        #if( switch_caseIndex < 1 )

            #error( "Must have at least one case" );

        #endif

        // After the default case, or after the last
        // case entry, jump over the code that does
        // the conditional jump.

        jmp switch_endcase;

        // Okay, here's the code that does the conditional jump.

        switch_doCase:

        // If the minimum case value is zero, we don't
        // need to emit a CMP instruction for it.

        #if( switch_minval <> 0 )

            cmp( switch_reg, switch_minval );
            jb switch_otherwise;

        #endif
        cmp( switch_reg, switch_maxval );
        ja switch_otherwise;
        jmp( switch_jmptbl[ switch_reg*4 - switch_minval*4 ] );

        // If there was no default case, transfer control
        // to the first statement after the "endcase" clause.

        #if( !switch_hasotherwise )

            switch_otherwise:

        #endif

        // When each of the cases complete execution,
        // transfer control down here.

        switch_endcase:

        // The following statement deallocates the storage
        // associated with the switch_cases array (this saves
        // memory at compile time, it does not affect the
        // execution of the resulting machine code).

        ?switch_cases := 0;

endmacro;
begin demoSwitch;

    // A simple demonstration of the _switch.._endswitch statement:

    for( mov( 0, eax ); eax < 8; inc( eax )) do

        _switch( eax )

            _case( 0 )

                stdout.put( "eax = 0" nl );

            _case( 1, 2 )

                stdout.put( "eax = 1 or 2" nl );

            _case( 3, 4, 5 )

                stdout.put( "eax = 3, 4, or 5" nl );

            _case( 6 )

                stdout.put( "eax = 6" nl );

            _default

                stdout.put( "eax is not in the range 0-6" nl );

        _endswitch;

    endfor;

end demoSwitch;
Program 9.4
Macro Implementation of the SWITCH..ENDSWITCH Statement
9.2.3 A Modified WHILE Loop

The previous sections have shown you how to implement statements that are already available in HLA or the HLA Standard Library. While this approach lets you work with familiar statements that you should be comfortable with, it doesn’t really demonstrate that you can create new control statements with HLA’s compile-time language. In this section you will see how to create a variant of the WHILE statement that is not simply a rehash of HLA’s WHILE statement. This should amply demonstrate that there are some useful control structures that HLA (and high level languages) don’t provide and that you can easily use HLA’s compile-time language to implement specialized control structures as needed.

A common use of a WHILE loop is to search through a list and stop upon encountering some desired value or upon hitting the end of the list. A typical HLA example might take the following form:

while( << there are more items in the list >> ) do

    breakif( << this is the desired item >> );
    << select the next item in the list >>

endwhile;
The problem with this approach is that when the statement immediately following the ENDWHILE executes, that code doesn’t know whether the loop terminated because it found the desired value or because it exhausted the list. The typical solution is to test to see if the loop exhausted the list and deal with that accordingly:

while( << there are more items in the list >> ) do

    breakif( << this is the desired item >> );
    << select the next item in the list >>

endwhile;
if( << the list is exhausted >> ) then

    << handle the missing item >>

endif;
The problem with this "solution" should be obvious if you think about it a moment. We’ve already tested to see if the list is exhausted; immediately after leaving the loop we repeat this same test. This is somewhat inefficient. A better solution would be to have something like an "else" clause in the WHILE loop that executes if you break out of the loop and doesn’t execute if the loop terminates because the boolean expression evaluated false. Rather than use the keyword ELSE, let’s invent a new (more readable) term: onbreak. The ONBREAK section of a WHILE loop executes (only once) if a BREAK or BREAKIF statement was the reason for the loop termination. With this ONBREAK clause, you could recode the previous WHILE loop a little bit more elegantly as follows:

while( << there are more items in the list >> ) do

    breakif( << this is the desired item >> );
    << select the next item in the list >>

  onbreak

    << process the item that was found >>

endwhile;

Note that if the ONBREAK clause is present, the WHILE’s loop body ends at the ONBREAK keyword. The ONBREAK clause executes at most once per execution of this WHILE statement. Implementing a _while.._onbreak.._endwhile statement is very easy using HLA’s multi-part macros. Program 9.5 provides the complete implementation of this statement:
/****************************************************/
/*                                                  */
/* while.hla                                        */
/*                                                  */
/* This program demonstrates a variant of the       */
/* WHILE loop that provides a special "onbreak"     */
/* clause. The _onbreak clause executes if the      */
/* program executes a _break clause or it executes  */
/* a _breakif clause and the corresponding          */
/* boolean expression evaluates true. The _onbreak  */
/* section does not execute if the loop terminates  */
/* due to the _while boolean expression evaluating  */
/* false.                                           */
/*                                                  */
/****************************************************/

program Demo_while;
#include( "stdlib.hhf" )

// _while semantics:
//
// _while( expr )
//
//      << statements >>
//
// _onbreak         // This section is optional.
//
//      << statements >>
//
// _endwhile;

macro _while( expr ):falseLbl, breakLbl, topOfLoop, hasOnBreak;

    // hasOnBreak keeps track of whether we've seen an _onbreak
    // section.

    ?hasOnBreak:boolean:=false;

    // Here's the top of the WHILE loop.
    // Implement this as a straight-forward WHILE (test for
    // loop termination at the top of the loop).

    topOfLoop:
        jf( expr ) falseLbl;

  // Ignore the _do keyword.

  keyword _do;

  // _continue and _continueif (with a true expression)
  // transfer control to the top of the loop where the
  // _while code retests the loop termination condition.

  keyword _continue;

    jmp topOfLoop;

  keyword _continueif( expr1 );

    jt( expr1 ) topOfLoop;

  // Unlike the _break or _breakif in a standard WHILE
  // statement, we don't immediately exit the WHILE.
  // Instead, this code transfers control to the optional
  // _onbreak section if it is present. If it is not
  // present, control transfers to the first statement
  // beyond the _endwhile.

  keyword _break;

    jmp breakLbl;

  keyword _breakif( expr2 );

    jt( expr2 ) breakLbl;

  // If we encounter an _onbreak section, this marks
  // the end of the while loop body. Emit a jump that
  // transfers control back to the top of the loop.
  // This code also has to verify that there is only
  // one _onbreak section present. Any code following
  // this clause is going to execute only if the _break
  // or _breakif statements execute and transfer
  // control down here.

  keyword _onbreak;

    #if( hasOnBreak )

        #error( "Extra _onbreak clause encountered" )

    #else

        jmp topOfLoop;
        ?hasOnBreak := true;

        breakLbl:

    #endif

  terminator _endwhile;

    // If we didn't have an _onbreak section, then
    // this is the bottom of the _while loop body.
    // Emit the jump to the top of the loop and emit
    // the "breakLbl" label so the execution of a
    // _break or _breakif transfers control down here.

    #if( !hasOnBreak )

        jmp topOfLoop;
        breakLbl:

    #endif
    falseLbl:

endmacro;
static
    i:int32;

begin Demo_while;

    // Demonstration of standard while loop

    mov( 0, i );
    _while( i < 10 ) _do

        stdout.put( "1: i=", i, nl );
        inc( i );

    _endwhile;

    // Demonstration with BREAKIF:

    mov( 5, i );
    _while( i < 10 ) _do

        stdout.put( "2: i=", i, nl );
        _breakif( i = 7 );
        inc( i );

    _endwhile

    // Demonstration with _BREAKIF and _ONBREAK:

    mov( 0, i );
    _while( i < 10 ) _do

        stdout.put( "3: i=", i, nl );
        _breakif( i = 4 );
        inc( i );

      _onbreak

        stdout.put( "Breakif was true at i=", i, nl );

    _endwhile
    stdout.put( "All Done" nl );

end Demo_while;
Program 9.5
The Implementation of _while.._onbreak.._endwhile
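The label layout that Program 9.5 emits can be checked with a small Python model. This is a sketch in executable pseudocode, not HLA: expand_while is a hypothetical function that returns the emitted instructions as a list of strings, and the body and onbreak arguments are assumed to be lists of already-expanded statements.

```python
# Hypothetical model of the code layout the _while macro emits.

def expand_while(expr, body, onbreak=None):
    """Return the instruction sequence for one _while statement."""
    code = ["topOfLoop:", f"jf({expr}) falseLbl"]
    code += body                     # may contain "jt(...) breakLbl"
    # Bottom of the loop body: emitted by _onbreak when that clause
    # is present, otherwise by _endwhile.
    code.append("jmp topOfLoop")
    code.append("breakLbl:")
    if onbreak is not None:
        code += onbreak              # runs only after a break
    code.append("falseLbl:")         # normal loop exit lands here
    return code
```

Because falseLbl always follows the _onbreak statements, a loop that ends when its expression becomes false skips them, while a _break or _breakif lands on breakLbl and runs them exactly once. Note that Python's own loop else clause has the opposite sense: it runs only when the loop finishes without a break.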
9.2.4 A Modified IF..ELSE..ENDIF Statement

The IF statement is another statement that doesn’t always do exactly what you want. Like the _while.._onbreak.._endwhile example above, it’s quite possible to redefine the IF statement so that it behaves the way we want it to. In this section you’ll see how to implement a variant of the IF..ELSE..ENDIF statement that nests differently than the standard IF statement.

It is possible to simulate short-circuit boolean evaluation involving conjunction and disjunction without using the "&&" and "||" operators if you carefully structure your code. Consider the following example:

// "C" code employing logical-AND operator:

if( expr1 && expr2 )
{
    << statements >>
}

// Equivalent HLA version:

if( expr1 ) then

    if( expr2 ) then

        << statements >>

    endif;

endif;

In both cases ("C" and HLA) the << statements >> block executes only if both expr1 and expr2 evaluate true. So other than the extra typing involved, it is often very easy to simulate logical conjunction by using two IF statements in HLA.

There is one very big problem with this scheme. Consider what happens if you modify the "C" code to be the following:

// "C" code employing logical-AND operator:

if( expr1 && expr2 )
{
    << 'true' statements >>
}
else
{
    << 'false' statements >>
}
Before describing how to create this new type of IF statement, we must digress for a moment and explore an interesting feature of HLA’s multi-part macro expansion: #KEYWORD macros do not have to use unique names. Whenever you declare an HLA #KEYWORD macro, HLA accepts whatever name you choose. If that name happens to be already defined, then the #KEYWORD macro name takes precedence as long as the macro is active (that is, from the point you invoke the macro name until HLA encounters the #TERMINATOR macro). Therefore, the #KEYWORD macro name hides the previous definition of that name until the termination of the macro. This feature applies even to the original macro name; that is, it is possible to define a #KEYWORD macro with the same name as the original macro to which the #KEYWORD macro belongs. This is a very useful feature because it allows you to change the definition of the macro within the scope of the opening and terminating invocations of the macro.

Although not pertinent to the IF statement we are constructing, you should note that parameter and local symbols in a macro also override any previously defined symbols of the same name. So if you use that symbol between the opening macro and the terminating macro, you will get the value of the local symbol, not the global symbol. E.g.,

var
    i:int32;
    j:int32;
    .
    .
    .
#macro abc:i;
    ?i:text := "j";
    .
    .
    .
#terminator xyz;
    .
    .
    .
#endmacro;
    .
    .
    .
    mov( 25, i );
    mov( 10, j );
    abc
        mov( i, eax );      // Loads j’s value (10), not 25 into eax.
    xyz;

The code above loads 10 into EAX because the "mov(i, eax);" instruction appears between the opening and terminating macros abc..xyz. Between those two macros the local definition of i takes precedence over the global definition. Since i is a text constant that expands to j, the aforementioned MOV statement is really equivalent to "mov(j, eax);". That statement, of course, loads 10 into EAX. Since this problem is difficult to see while reading your code, you should choose local symbols in multi-part macros very carefully. A good convention to adopt is to combine your local symbol name with the macro name, e.g.,

#macro abc : i_abc;

You may wonder why HLA allows something so crazy to happen in your source code; in a moment you’ll see why this behavior is useful (and now, with this brief message out of the way, back to our regularly scheduled discussion).
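The hiding rule from this digression can be mimicked with Python's collections.ChainMap, which searches its first mapping before its second. This is only an analogy in executable pseudocode, not a description of how HLA is implemented: memory, global_names, macro_locals, and load are hypothetical names, with each symbol mapped to the variable it refers to.

```python
from collections import ChainMap

# Hypothetical model: values held by the variables i and j after
# "mov( 25, i ); mov( 10, j );".
memory = {"i": 25, "j": 10}

# Each global name refers to its own variable.
global_names = {"i": "i", "j": "j"}

# Inside abc..xyz the local text constant i expands to j.
macro_locals = {"i": "j"}


def load(name, scope):
    """Model of mov( name, eax ): resolve the name, then read it."""
    return memory[scope[name]]


# While the macro is active, its locals are searched first.
inside_macro = ChainMap(macro_locals, global_names)
```

With these definitions, load("i", inside_macro) reads j's value (10), while load("i", global_names) reads 25, which matches the behavior of the MOV instruction inside and outside the abc..xyz pair.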
Before we digressed to discuss this interesting feature in HLA multi-part macros, we were trying to figure out how to efficiently simulate the conjunction and disjunction operators in an IF statement without actually using these operators in our code. The problem in the example appearing earlier in this section is that you would have to duplicate some code in order to convert the IF..ELSE statement properly. The following code shows this problem:

// "C" code employing logical-AND operator:

if( expr1 && expr2 )
{
    << 'true' statements >>
}
else
{
    << 'false' statements >>
}

// Corresponding HLA code using the "nested-IF" algorithm:

if( expr1 ) then

    if( expr2 ) then

        << 'true' statements >>

    else

        << 'false' statements >>

    endif;

else

    << 'false' statements >>

endif;

Note that this code must duplicate the "<< 'false' statements >>" section if the logic is to exactly match the original "C" code. This means that the program will be larger and harder to read than is absolutely necessary.

One solution to this problem is to create a new kind of IF statement that doesn’t nest the same way standard IF statements nest. In particular, we can define the statement such that all IF clauses nested within an outer IF..ENDIF block share the same ELSE and ENDIF clauses. If this were the case, then you could implement the code above as follows:

if( expr1 ) then

    if( expr2 ) then

        << 'true' statements >>

else

    << 'false' statements >>

endif;
If expr1 is false, control immediately transfers to the ELSE clause. If the value of expr1 is true, then control falls through to the next IF statement. If expr2 evaluates false, then the program jumps to the single ELSE clause that all IFs share in this statement. Notice that a single ELSE clause (and corresponding ’false’ statements) appears in this code; hence the code does not necessarily expand in size. If expr2 evaluates true, then control falls through to the ’true’ statements, exactly like a standard IF statement.

Notice that the nested IF statement above does not have a corresponding ENDIF. Like the ELSE clause, all nested IFs in this structure share the same ENDIF. Syntactically, there is no need to end the nested IF statement; its THEN section ends at the ELSE clause, just as the outer IF statement’s THEN block does.

Of course, we can’t actually define a new macro named "if" because you cannot redefine HLA reserved words. Nor would it be a good idea to do so even if this were legal (since it would make your programs very difficult to comprehend if the IF keyword had different semantics in different parts of the program). The following program uses the identifiers "_if", "_then", "_else", and "_endif" instead. It is questionable if these are good identifiers in production code (perhaps something a little more different would be appropriate). The following code example uses these particular identifiers so you can easily correlate them with the corresponding high level statements.
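Separately from the HLA listing that follows, the claim that the shared-ELSE form matches the "C" code without duplicating the false branch can be checked with a short Python model. This is a sketch in executable pseudocode, not HLA: the function names c_version, nested_if, and shared_else are hypothetical, and the strings "true" and "false" stand in for the two statement blocks.

```python
# Hypothetical models of the three control-flow shapes discussed above.

def c_version(expr1, expr2):
    """The "C" statement: if( expr1 && expr2 ) ... else ..."""
    return "true" if (expr1 and expr2) else "false"


def nested_if(expr1, expr2):
    """Standard nesting: the false branch has to appear twice."""
    if expr1:
        if expr2:
            return "true"
        else:
            return "false"           # first copy
    else:
        return "false"               # second copy


def shared_else(expr1, expr2):
    """Modified nesting: every failed test jumps to one ELSE."""
    for test in (expr1, expr2):      # one jf per _if clause
        if not test:
            return "false"           # the single shared ELSE section
    return "true"                    # fell through every test
```

All three functions agree for the four combinations of expr1 and expr2, but only nested_if needs two copies of the false branch.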
/***********************************************/
/*                                             */
/* if.hla                                      */
/*                                             */
/* This program demonstrates a modification of */
/* the IF..ELSE..ENDIF statement using HLA's   */
/* multi-part macros.                          */
/*                                             */
/***********************************************/

program newIF;
#include( "stdlib.hhf" )

// Macro implementation of new form of if..then..else..endif.
//
// In this version, all nested IF statements transfer control
// to the same ELSE clause if any one of them has a false
// boolean expression. Syntax:
//
//      _if( expression ) _then
//
//          << statements >>
//
//      _else // this is optional
//
//          << statements >>
//
//      _endif
//
// Note that nested _if clauses do not have a corresponding
// _endif clause. This is because the single _else and/or
// _endif clauses terminate all the nested _if clauses
// including the first one. Of course, once the code
// encounters an _endif another _if statement may begin.
    // Macro to handle the main "_if" clause.
    // This code just tests the expression and jumps to the _else
    // clause if the expression evaluates false.

    macro _if( ifExpr ):elseLbl, hasElse, ifDone;

        ?hasElse := false;
        jf(ifExpr) elseLbl;

      // Just ignore the _then keyword.

      keyword _then;

      // Nested _if clause (yes, HLA lets you replace the main
      // macro name with a keyword macro). Identical to the
      // above _if implementation except this one does not
      // require a matching _endif clause. The single _endif
      // (matching the first _if clause) terminates all nested
      // _if clauses as well as the main _if clause.

      keyword _if( nestedIfExpr );

        jf( nestedIfExpr ) elseLbl;

        // If this appears within the _else section, report
        // an error (we don't allow _if clauses nested in
        // the else section, that would create a loop).

        #if( hasElse )

            #error( "All _if clauses must appear before the _else clause" )

        #endif

      // Handle the _else clause here. All we need to do is check to
      // see if this is the only _else clause and then emit the
      // jmp over the else section and output the elseLbl target.

      keyword _else;

        #if( hasElse )

            #error( "Only one _else clause is legal per _if.._endif" )

        #else

            // Set hasElse true so we know that we've seen an _else
            // clause in this statement.

            ?hasElse := true;
            jmp ifDone;
            elseLbl:

        #endif

      // _endif has two tasks. First, it outputs the "ifDone" label
      // that _else uses as the target of its jump to skip over the
      // else section. Second, if there was no else section, this
      // code must emit the "elseLbl" label so that the false
      // conditional(s) in the _if clause(s) have a legal target label.

      terminator _endif;

        ifDone:
        #if( !hasElse )

            elseLbl:

        #endif

    endmacro;

    static
        tr:boolean := true;
        f:boolean := false;

    begin newIF;

        // Real quick demo of the _if statement:

        _if( tr ) _then

            _if( tr ) _then

                _if( f ) _then

                    stdout.put( "error" nl );

        _else

            stdout.put( "Success" );

        _endif

    end newIF;
Program 9.6 Using Macros to Create a New IF Statement
Just in case you’re wondering, this program prints "Success" and then quits. This is because the nested "_if" statements are equivalent to the expression "true && true && false" which, of course, is false. Therefore, the "_else" portion of this code should execute.

The only surprise in this macro is the fact that it redefines the _if macro as a keyword macro upon invocation of the main _if macro. The code does this so that any nested _if clauses do not require a corresponding _endif and do not support an _else clause of their own.

Implementing an ELSEIF clause introduces some difficulties, hence its absence in this example. The design and implementation of an ELSEIF clause is left to the more serious reader [3].
3. I.e., I don’t even want to have to think about this problem!
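
The jump-and-label structure that the _if macro emits can be modeled outside of HLA. The following is a minimal sketch in Python (not HLA), assuming a made-up pseudo-instruction format ("jf(cond) label", "jmp label", "label:", and "print text"). The SharedElseIf class and the run function are hypothetical names invented for this illustration; only the labels elseLbl and ifDone, the hasElse flag, and the two error messages come from the macro above.

```python
class SharedElseIf:
    """Hypothetical model of the code emitted by _if/_else/_endif."""

    def __init__(self, first_cond):
        self.code = []          # emitted pseudo-instructions, in order
        self.has_else = False   # mirrors the macro's hasElse flag
        self.code.append(f"jf({first_cond}) elseLbl")

    def nested_if(self, cond):
        # Nested _if clauses share elseLbl and may not follow _else.
        if self.has_else:
            raise SyntaxError("All _if clauses must appear before the _else clause")
        self.code.append(f"jf({cond}) elseLbl")
        return self

    def stmt(self, text):
        self.code.append(text)
        return self

    def else_(self):
        if self.has_else:
            raise SyntaxError("Only one _else clause is legal per _if.._endif")
        self.has_else = True
        self.code += ["jmp ifDone", "elseLbl:"]
        return self

    def endif(self):
        self.code.append("ifDone:")
        if not self.has_else:
            self.code.append("elseLbl:")   # the false jumps still need a target
        return self.code


def run(code, env):
    """Interpret the pseudo-instructions; env maps condition names to booleans."""
    labels = {line[:-1]: i for i, line in enumerate(code) if line.endswith(":")}
    out, pc = [], 0
    while pc < len(code):
        line = code[pc]
        if line.startswith("jf("):
            cond, target = line[3:].split(") ")
            if not env[cond]:
                pc = labels[target]
                continue
        elif line.startswith("jmp "):
            pc = labels[line[4:]]
            continue
        elif line.startswith("print "):
            out.append(line[6:])
        pc += 1
    return out


demo = (SharedElseIf("tr").nested_if("tr").nested_if("f")
        .stmt("print error").else_().stmt("print Success").endif())
print(run(demo, {"tr": True, "f": False}))   # prints ['Success']
```

Every false condition jumps to the one shared elseLbl, so the chain behaves like a short-circuit AND of its conditions, and only one copy of the else code is emitted.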
9.3 Sample Program: A Simple Expression Compiler

This chapter’s sample program is a bit complex. In fact, the theory behind this program is well beyond the scope of this text (since it involves compiler theory). However, this example is such a good demonstration of the capabilities of HLA’s macro facilities and DSEL capabilities, it was too good not to include here. The following paragraphs will attempt to explain how this compile-time program operates. If you have difficulty understanding what’s going on, don’t feel too bad; this code isn’t exactly the type of stuff that beginning assembly language programmers would normally develop on their own.

This program presents a (very) simple expression compiler. This code includes a macro, u32expr, that emits a sequence of instructions that compute the value of an arithmetic expression and leave that result sitting in one of the 80x86’s 32-bit registers. The syntax for the u32expr macro invocation is the following:

    u32expr( reg32, uns32_expression );

This macro emits the code that computes the following (HLL) statement:

    reg32 := uns32_expression;
For example, the macro invocation "u32expr( eax, ebx+ecx*5 - edi );" computes the value of the expression "ebx+ecx*5 - edi" and leaves the result of this expression sitting in the EAX register.

The u32expr macro places several restrictions on the expression. First of all, as the name implies, it only computes the result of an uns32 expression. No other data types may appear within the expression. During computation, the macro uses the EAX and EDX registers, so expressions should not contain these registers as their values may be destroyed by the code that computes the expression (EAX or EDX may safely appear as the first operand of the macro, that is, as the destination register, however). Finally, expressions may only contain the following operators:

    <, <=, >, >=, <>, !=, =, ==
    +, -
    *, /
    (, )

The "<>" and "!=" operators are equivalent (not equals) and the "=" and "==" operators are also equivalent (equals). The operators above are listed in order of increasing precedence; i.e., "*" has a higher precedence than "+" (as you would expect). You can override the precedence of an operator by using parentheses in the standard manner.

It is important to remember that u32expr is a macro, not a function. That is, the invocation of this macro results in a sequence of 80x86 assembly language instructions that computes the desired expression. The u32expr invocation is not a function call to some routine that computes the result.

To understand how this macro works, it would be a good idea to review the section on “Converting Arithmetic Expressions to Postfix Notation” on page 635. That section discusses how to convert floating point expressions to reverse polish notation; although the u32expr macro works with uns32 objects rather than floating point objects, the approach it uses to translate expressions into assembly language uses this same algorithm. So if you don’t remember how to translate expressions into reverse polish notation, it might be worthwhile to review that section of this text.

Converting floating point expressions to reverse polish notation is especially easy because the 80x86’s FPU uses a stack architecture. Alas, the integer instructions on the 80x86 use a register architecture, and efficiently translating integer expressions to assembly language is a bit more difficult (see “Arithmetic Expressions” on page 597). We’ll solve this problem by translating the expressions to assembly code in a somewhat less than efficient manner; we’ll simulate an integer stack architecture by using the 80x86’s hardware stack to hold temporary results during an integer calculation.

To push an integer constant or variable onto the 80x86 hardware stack, we need only use a PUSH or PUSHD instruction. This operation is trivial.
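
The precedence rules and the translation to reverse polish notation described above can be tried out on a few expressions. The following Python sketch (not HLA) is only an illustration under stated assumptions: it uses the well-known shunting-yard method, which is not necessarily how the referenced section or the u32expr macro performs the conversion, and the names PRECEDENCE, tokenize, and to_rpn are invented here. All operators are treated as left-associative.

```python
import re

# Precedence levels taken from the operator list: comparisons are lowest,
# then + and -, then * and /. Parentheses are handled separately.
PRECEDENCE = {op: 1 for op in ("<", "<=", ">", ">=", "<>", "!=", "=", "==")}
PRECEDENCE.update({"+": 2, "-": 2, "*": 3, "/": 3})

# Two-character operators must be tried before one-character operators.
# "%1010"-style tokens are HLA binary constants.
TOKEN = re.compile(r"\s*(<=|>=|<>|!=|==|[-+*/()<>=]|%[01]+|\w+)")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def to_rpn(text):
    """Convert an infix expression to a list of tokens in postfix order."""
    output, ops = [], []
    for tok in tokenize(text):
        if tok in PRECEDENCE:
            # Left-associative: first flush operators of equal or higher precedence.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()                      # discard the "("
        else:
            output.append(tok)             # operand: register, variable, or constant
    while ops:
        output.append(ops.pop())
    return output


print(to_rpn("ebx+ecx*5 - edi"))   # prints ['ebx', 'ecx', '5', '*', '+', 'edi', '-']
```

Reading the postfix list from left to right gives the order in which a stack-based translation pushes operands and applies operators.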
To add two values sitting on the top of stack together, leaving their sum on the stack, all we need do is pop those two values into registers, add the register values, and then push the result back onto the stack. Since addition is commutative, we can do this operation slightly more efficiently by using the following code:

    // Compute X+Y where X is on NOS (next on stack) and Y is on TOS (top of stack):

    pop( eax );             // Get Y’s value.
    add( eax, [esp] );      // Add with X’s value and leave sum on TOS.
Subtraction is handled just like addition. Although subtraction is not commutative, the operands just happen to be on the stack in the proper order to efficiently compute their difference. To compute "X-Y" where X is on NOS and Y is on TOS, we can use code like the following:

    // Compute X-Y where X is on NOS and Y is on TOS:

    pop( eax );
    sub( eax, [esp] );
Multiplication of the two items on the top of stack is a little more complicated since we must use the MUL instruction (the only unsigned multiplication instruction available) and the destination operand must be the EDX:EAX register pair. Fortunately, multiplication is a commutative operation, so we can compute the product of NOS (next on stack) and TOS (top of stack) using code like the following:

    // Compute X*Y where X is on NOS and Y is on TOS:

    pop( eax );
    mul( [esp], eax );      // Note that this wipes out the EDX register.
    mov( eax, [esp] );
Division is problematic because it is not a commutative operation and its operands on the stack are not in a convenient order. That is, to compute X/Y it would be really convenient if X were on TOS and Y were in the NOS position. Alas, as you’ll soon see, it turns out that X is at NOS and Y is on the TOS. Resolving this issue requires slightly less efficient code than the sequences we’ve used above. Since the DIV instruction is so slow anyway, this will hardly matter.

    // Compute X/Y where X is on NOS and Y is on TOS:

    mov( [esp+4], eax );        // Get X from NOS.
    xor( edx, edx );            // Zero-extend EAX into EDX:EAX
    div( [esp], edx:eax );      // Compute their quotient.
    pop( edx );                 // Remove unneeded Y value from the stack.
    mov( eax, [esp] );          // Store quotient to the TOS.
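
To check the operand ordering in the four sequences above, it can help to step through them on a small model. The following Python sketch (not HLA) is a hypothetical simulation: the Machine class and its method names are invented for illustration, the stack is a Python list whose last element plays the role of TOS, and uns32 wraparound is modeled by masking to 32 bits.

```python
MASK32 = 0xFFFFFFFF   # uns32 values wrap modulo 2**32


class Machine:
    """Hypothetical model of EAX, EDX, and the hardware stack."""

    def __init__(self):
        self.stack = []   # stack[-1] is TOS, stack[-2] is NOS
        self.eax = 0
        self.edx = 0

    def push(self, value):
        self.stack.append(value & MASK32)

    def add(self):
        self.eax = self.stack.pop()                              # pop( eax )
        self.stack[-1] = (self.stack[-1] + self.eax) & MASK32    # add( eax, [esp] )

    def sub(self):
        self.eax = self.stack.pop()                              # pop( eax )
        self.stack[-1] = (self.stack[-1] - self.eax) & MASK32    # sub( eax, [esp] ): X-Y

    def mul(self):
        self.eax = self.stack.pop()                              # pop( eax )
        product = self.eax * self.stack[-1]                      # 64-bit result
        self.edx = (product >> 32) & MASK32                      # high half wipes out EDX
        self.eax = product & MASK32
        self.stack[-1] = self.eax                                # mov( eax, [esp] )

    def div(self):
        self.eax = self.stack[-2]                                # mov( [esp+4], eax )
        self.edx = 0                                             # xor( edx, edx )
        dividend = (self.edx << 32) | self.eax
        divisor = self.stack[-1]
        # A zero divisor raises ZeroDivisionError here; the real DIV faults.
        self.eax, self.edx = dividend // divisor, dividend % divisor
        self.edx = self.stack.pop()                              # pop( edx ): discard Y
        self.stack[-1] = self.eax                                # mov( eax, [esp] )


m = Machine()
m.push(10)
m.push(3)
m.sub()
print(m.stack)   # prints [7]
```

Because X is pushed before Y, the subtraction and division sequences both produce "X op Y" rather than "Y op X", which is the point the text makes about operand order.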
The remaining operators are the comparison operators. These operators compare the value on NOS with the value on TOS and leave true (1) or false (0) sitting on the stack based on the result of the comparison. While it is easy to work around the non-commutative aspect of many of the comparison operators, the big challenge is converting the result to true or false. The SETcc instructions are convenient for this purpose, but they only work on byte operands. Therefore, we will have to zero extend the result of the SETcc instructions to obtain an uns32 result we can push onto the stack. Ultimately, the code we must emit for a comparison is similar to the following:

    // Compute X ...

    // MulOps-> terms ( mulOp terms )*
    //
    // The above grammar production tells us that a "MulOps" consists
    // of a "terms" expansion followed by zero or more instances of a
    // "mulOp" followed by a "terms" expansion (like wildcard filename
    // expansions, the "*" indicates zero or more copies of the things
    // inside the parentheses).
    //
    // This code assumes that "terms" leaves whatever operands/expressions
    // it processes sitting on the 80x86 stack at run time. If there is
    // a single term (no optional mulOp/term following), then this code
    // does nothing (it leaves the result on the stack that was pushed
    // by the "terms" expansion). If one or more mulOp/terms pairs are
    // present, then for each pair this code assumes that the two "terms"
    // expansions left some value on the stack. This code will pop
    // those two values off the stack and multiply or divide them and
    // push the result back onto the stack (sort of like the way the
    // FPU multiplies or divides values on the FPU stack).
    //
    // If there are three or more operands in a row, separated by
    // mulops ("*" or "/") then this macro will process them in
    // a left-to-right fashion, popping each pair of values off the
    // stack, operating on them, pushing the result, and then processing
    // the next pair. E.g.,
    //
    //      i * j * k
    //
    // yields:
    //
    //      push( i );                  // From the "terms" macro.
    //      push( j );                  // From the "terms" macro.
    //
    //      pop( eax );                 // Compute the product of i*j
    //      mul( (type dword [esp]));
    //      mov( eax, [esp]);
    //
    //      push( k );                  // From the "terms" macro.
    //
    //      pop( eax );                 // Pop K
    //      mul( (type dword [esp]));   // Compute K* (i*j) [i*j is value on TOS].
    //      mov( eax, [esp]);           // Save product on TOS.
    macro doMulOps( sexpr ):opToken;

        // Process the leading term (not optional). Note that
        // this expansion leaves an item sitting on the stack.

        doTerms( sexpr );

        // Process all the MULOPs at the current precedence level.
        // (these are optional, there may be zero or more of them.)

        ?sexpr := @trim( sexpr, 0 );
        #while( @peekCset( sexpr, MulOps ))

            // Save the operator so we know what code we should
            // generate later.

            ?opToken := lexer( sexpr );

            // Get the term following the operator.

            doTerms( sexpr );

            // Okay, the code for the two terms is sitting on
            // the top of the stack (left operand at [esp+4] and
            // the right operand at [esp]). Emit the code to
            // perform the specified operation.

            #if( opToken.lexeme = "*" )

                // For multiplication, compute
                // [esp+4] = [esp] * [esp+4] and
                // then pop the junk off the top of stack.

                pop( eax );
                mul( (type dword [esp]) );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = "/" )

                // For division, compute
                // [esp+4] = [esp+4] / [esp] and
                // then pop the junk off the top of stack.

                mov( [esp+4], eax );
                xor( edx, edx );
                div( [esp], edx:eax );
                pop( edx );
                mov( eax, [esp] );

            #endif
            ?sexpr := @trim( sexpr, 0 );

        #endwhile

    endmacro;
    // Handle the addition and subtraction operations here.
    //
    // AddOps-> MulOps ( addOp MulOps )*
    //
    // The above grammar production tells us that an "AddOps" consists
    // of a "MulOps" expansion followed by zero or more instances of an
    // "addOp" followed by a "MulOps" expansion.
    //
    // This code assumes that "MulOps" leaves whatever operands/expressions
    // it processes sitting on the 80x86 stack at run time. If there is
    // a single MulOps item then this code does nothing. If one or more
    // addOp/MulOps pairs are present, then for each pair this code
    // assumes that the two "MulOps" expansions left some value on the
    // stack. This code will pop those two values off the stack and
    // add or subtract them and push the result back onto the stack.
    macro doAddOps( sexpr ):opToken;

        // Process the first operand (or subexpression):

        doMulOps( sexpr );

        // Process all the ADDOPs at the current precedence level.

        ?sexpr := @trim( sexpr, 0 );
        #while( @peekCset( sexpr, PlusOps ))

            // Save the operator so we know what code we should
            // generate later.

            ?opToken := lexer( sexpr );

            // Get the MulOp following the operator.

            doMulOps( sexpr );

            // Okay, emit the code associated with the operator.

            #if( opToken.lexeme = "+" )

                pop( eax );
                add( eax, [esp] );

            #elseif( opToken.lexeme = "-" )

                pop( eax );
                sub( eax, [esp] );

            #endif

        #endwhile

    endmacro;
    // Handle the comparison operations here.
    //
    // CmpOps-> AddOps ( cmpOp AddOps )*
    //
    // The above grammar production tells us that a "CmpOps" consists
    // of an "AddOps" expansion followed by zero or more instances of a
    // "cmpOp" followed by an "AddOps" expansion.
    //
    // This code assumes that "AddOps" leaves whatever operands/expressions
    // it processes sitting on the 80x86 stack at run time. If there is
    // a single AddOps item then this code does nothing. If one or more
    // cmpOp/AddOps pairs are present, then for each pair this code
    // assumes that the two "AddOps" expansions left some value on the
    // stack. This code will pop those two values off the stack,
    // compare them, and push the true/false result back onto the stack.
    macro doCmpOps( sexpr ):opToken;

        // Process the first operand:

        doAddOps( sexpr );

        // Process all the CMPOPs at the current precedence level.

        ?sexpr := @trim( sexpr, 0 );
        #while( @peekCset( sexpr, CmpOps ))

            // Save the operator for the code generation task later.

            ?opToken := lexer( sexpr );

            // Process the item after the comparison operator.

            doAddOps( sexpr );

            // Generate the code to compare [esp+4] against [esp]
            // and leave true/false sitting on the stack in place
            // of these two operands.

            #if( opToken.lexeme = "<" )

                pop( eax );
                cmp( [esp], eax );
                setb( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = "<=" )

                pop( eax );
                cmp( [esp], eax );
                setbe( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = ">" )

                pop( eax );
                cmp( [esp], eax );
                seta( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = ">=" )

                pop( eax );
                cmp( [esp], eax );
                setae( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = "=" )

                pop( eax );
                cmp( [esp], eax );
                sete( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #elseif( opToken.lexeme = "<>" )

                pop( eax );
                cmp( [esp], eax );
                setne( al );
                movzx( al, eax );
                mov( eax, [esp] );

            #endif

        #endwhile

    endmacro;
    // General macro that does the expression compilation.
    // The first parameter must be a 32-bit register where
    // this macro will leave the result. The second parameter
    // is the expression to compile. The expression compiler
    // will destroy the value in EAX and may destroy the value
    // in EDX (though EDX and EAX make fine destination registers
    // for this macro).
    //
    // This macro generates poor machine code. It is more a
    // "proof of concept" rather than something you should use
    // all the time. Nevertheless, if you don't have serious
    // size or time constraints on your code, this macro can be
    // quite handy. Writing an optimizer is left as an exercise
    // to the interested reader.
    macro u32expr( reg, expr):sexpr;

        // The "returns" statement processes the first operand
        // as a normal sequence of statements and then returns
        // the second operand as the "returns" value for this
        // macro.

        returns
        (
            {
                ?sexpr:string := @string:expr;
                #if( !@IsReg32( reg ) )

                    #error( "Expected a 32-bit register" )

                #else

                    // Process the expression and leave the
                    // result sitting in the specified register.

                    doCmpOps( sexpr );
                    pop( reg );

                #endif
            },

            // Return the specified register as the "returns"
            // value for this compilation:

            @string:reg
        )

    endmacro;
    // The following main program provides some examples of the
    // use of the above macro:

    static
        x:uns32;
        v:uns32 := 5;

    begin TestExpr;

        mov( 10, x );
        mov( 12, ecx );

        // Compute:
        //
        //      edi := (x*3/v + %1010 == 16) + ecx;
        //
        // This is equivalent to:
        //
        //      edi := (10*3/5 + %1010 == 16) + 12
        //          := (  30/5 + %1010 == 16) + 12
        //          := (     6 + 10    == 16) + 12
        //          := (    16         == 16) + 12
        //          := (     1              ) + 12
        //          := 13
        //
        // This macro invocation emits the following code:
        //
        //      push(x);
        //      pushd(3);
        //      pop(eax);
        //      mul( (type dword [esp]) );
        //      mov( eax, [esp] );
        //      push( v );
        //      mov( [esp+4], eax );
        //      xor( edx, edx );
        //      div( [esp], edx:eax );
        //      pop( edx );
        //      mov( eax, [esp] );
        //      pushd( 10 );
        //      pop( eax );
        //      add( eax, [esp] );
        //      pushd( 16 );
        //      pop( eax );
        //      cmp( [esp], eax );
        //      sete( al );
        //      movzx( al, eax );
        //      mov( eax, [esp+0] );
        //      push( ecx );
        //      pop( eax );
        //      add( eax, [esp] );
        //      pop( edi );

        u32expr( edi, (x*3/v+%1010 == 16) + ecx );
        stdout.put( "Sum = ", (type uns32 edi), nl );
        // Now compute:
        //
        //      eax := x + ecx/2
        //          := 10 + 12/2
        //          := 10 + 6
        //          := 16
        //
        // This macro emits the following code:
        //
        //      push( x );
        //      push( ecx );
        //      pushd( 2 );
        //      mov( [esp+4], eax );
        //      xor( edx, edx );
        //      div( [esp], edx:eax );
        //      pop( edx );
        //      mov( eax, [esp] );
        //      pop( eax );
        //      add( eax, [esp] );
        //      pop( eax );

        u32expr( eax, x+ecx/2 );
        stdout.put( "x=", x, " ecx=", (type uns32 ecx), " v=", v, nl );
        stdout.put( "x+ecx/2 = ", (type uns32 eax ), nl );
        // Now determine if (x+ecx/2) < v
        // (it is not since (x+ecx/2)=16 and v = 5.)
        //
        // This macro invocation emits the following code:
        //
        //      push( x );
        //      push( ecx );
        //      pushd( 2 );
        //      mov( [esp+4], eax );
        //      xor( edx, edx );
        //      div( [esp], edx:eax );
        //      pop( edx );
        //      mov( eax, [esp] );
        //      pop( eax );
        //      add( eax, [esp]);
        //      push( v );
        //      pop( eax );
        //      cmp( [esp+0], eax );
        //      setb( al );
        //      movzx( al, eax );
        //      mov( eax, [esp+0] );
        //      pop( eax );

        if( u32expr( eax, x+ecx/2 < v ) ) then

            stdout.put( "x+ecx/2 < v" nl );

        else

            stdout.put( "x+ecx/2 >= v" nl );

        endif;

    end TestExpr;
Program 9.7 Uns32 Expression Compiler
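
The three-level structure of doMulOps, doAddOps, and doCmpOps (one operand, then zero or more operator/operand pairs, processed left to right) can be mimicked in an ordinary program. The following Python sketch (not HLA) is a hypothetical model that produces the same instruction text the comments in the program list. The Compiler class, its method names, and in particular the handling of terms (identifiers, decimal constants, "%"-prefixed binary constants, and parenthesized subexpressions) are assumptions made for illustration, and characters the tokenizer does not recognize are silently skipped to keep the sketch short.

```python
import re

TOKEN = re.compile(r"\s*(<=|>=|<>|!=|==|[-+*/()<>=]|%[01]+|\w+)")

MUL_CODE = ["pop( eax );", "mul( (type dword [esp]) );", "mov( eax, [esp] );"]
DIV_CODE = ["mov( [esp+4], eax );", "xor( edx, edx );", "div( [esp], edx:eax );",
            "pop( edx );", "mov( eax, [esp] );"]
SETCC = {"<": "setb", "<=": "setbe", ">": "seta", ">=": "setae",
         "=": "sete", "==": "sete", "<>": "setne", "!=": "setne"}


class Compiler:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0
        self.code = []          # emitted instruction text, in order

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def term(self):
        # Assumed behavior: push one operand, or compile a parenthesized expression.
        tok = self.next()
        if tok == "(":
            self.cmp_ops()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
        elif tok is None or not re.fullmatch(r"%[01]+|\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        elif tok[0] == "%":
            self.code.append(f"pushd( {int(tok[1:], 2)} );")
        elif tok.isdigit():
            self.code.append(f"pushd( {tok} );")
        else:
            self.code.append(f"push( {tok} );")

    def binary_level(self, operand, table):
        # operand ( op operand )*  with left-to-right evaluation
        operand()
        while self.peek() in table:
            op = self.next()
            operand()
            self.code += table[op]

    def mul_ops(self):
        self.binary_level(self.term, {"*": MUL_CODE, "/": DIV_CODE})

    def add_ops(self):
        self.binary_level(self.mul_ops, {
            "+": ["pop( eax );", "add( eax, [esp] );"],
            "-": ["pop( eax );", "sub( eax, [esp] );"]})

    def cmp_ops(self):
        table = {op: ["pop( eax );", "cmp( [esp], eax );", f"{cc}( al );",
                      "movzx( al, eax );", "mov( eax, [esp] );"]
                 for op, cc in SETCC.items()}
        self.binary_level(self.add_ops, table)


def u32expr(reg, text):
    c = Compiler(text)
    c.cmp_ops()
    if c.peek() is not None:
        raise SyntaxError(f"unexpected token {c.peek()!r}")
    return c.code + [f"pop( {reg} );"]


for line in u32expr("eax", "x+ecx/2"):
    print(line)
```

Each precedence level calls the next higher level for its operands, which is why "ecx/2" is fully compiled before the code for "+" is emitted.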
9.4 Putting It All Together

The ability to extend the language is one of HLA’s most powerful features. In this chapter you got to explore the use of several tools that allow you to extend the base language. Although a complete treatise on language design and implementation is beyond the scope of this chapter, further study in the area of compiler construction will help you learn new techniques for extending the HLA language. Later volumes in this text, including the volume on advanced string handling, will cover additional topics of interest to those who want to design and implement their own language constructs.
Classes and Objects
Chapter Ten

10.1 Chapter Overview

Many modern imperative high level languages support the notion of classes and objects. C++ (an object version of C), Java, and Delphi (an object version of Pascal) are three good examples. Of course, these high level language compilers translate their high level source code into low-level machine code, so it should be pretty obvious that some mechanism exists in machine code for implementing classes and objects. Although it has always been possible to implement classes and objects in machine code, most assemblers provide poor support for writing object-oriented assembly language programs. Of course, HLA does not suffer from this drawback as it provides good support for writing object-oriented assembly language programs. This chapter discusses the general principles behind object-oriented programming (OOP) and how HLA supports OOP.
10.2 General Principles

Before discussing the mechanisms behind OOP, it is probably a good idea to take a step back and explore the benefits of using OOP (especially in assembly language programs). Most texts describing the benefits of OOP will mention buzz-words like “code reuse,” “abstract data types,” “improved development efficiency,” and so on. While all of these features are nice and are good attributes for a programming paradigm, a good software engineer would question the use of assembly language in an environment where “improved development efficiency” is an important goal. After all, you can probably obtain far better efficiency by using a high level language (even in a non-OOP fashion) than you can by using objects in assembly language. If the purported features of OOP don’t seem to apply to assembly language programming, why bother using OOP in assembly? This section will explore some of those reasons.

The first thing you should realize is that the use of assembly language does not negate the aforementioned OOP benefits. OOP in assembly language does promote code reuse, it provides a good method for implementing abstract data types, and it can improve development efficiency in assembly language. In other words, if you’re dead set on using assembly language, there are benefits to using OOP.

To understand one of the principal benefits of OOP, consider the concept of a global variable. Most programming texts strongly recommend against the use of global variables in a program (as does this text). Interprocedural communication through global variables is dangerous because it is difficult to keep track of all the possible places in a large program that modify a given global object. Worse, it is very easy when making enhancements to accidentally reuse a global object for something other than its intended purpose; this tends to introduce defects into the system.
Despite the well-understood problems with global variables, the semantics of global objects (extended lifetimes and accessibility from different procedures) are absolutely necessary in various situations. Objects solve this problem by letting the programmer decide on the lifetime of an object [1] as well as allowing access to data fields from different procedures. Objects have several advantages over simple global variables: objects can control access to their data fields (making it difficult for procedures to accidentally access the data), and you can create multiple instances of an object, allowing two separate sections of your program to use their own unique “global” object without interference from the other section.

Of course, objects have many other valuable attributes. One could write several volumes on the benefits of objects and OOP; this single chapter cannot do this subject justice. The following subsections present objects with an eye towards using them in HLA/assembly programs. However, if you are a beginner to OOP or wish more information about the object-oriented paradigm, you should consult other texts on this subject.

1. That is, the time during which the system allocates memory for an object.

An important use for classes and objects is to create abstract data types (ADTs). An abstract data type is a collection of data objects and the functions (which we’ll call methods) that operate on the data. In a pure abstract data type, the ADT’s methods are the only code that has access to the data fields of the ADT; external code may only access the data using function calls to get or set data field values (these are the ADT’s accessor methods). In real life, for efficiency reasons, most languages that support ADTs allow, at least, limited access to the data fields of an ADT by external code.

Assembly language is not a language most people associate with ADTs. Nevertheless, HLA provides several features to allow the creation of rudimentary ADTs. While some might argue that HLA’s facilities are not as complete as those in a language such as C++ or Java, keep in mind that these differences exist because HLA is assembly language.

True ADTs should support information hiding. This means that the ADT does not allow the user of an ADT access to internal data structures and routines which manipulate those structures. In essence, information hiding restricts access to an ADT to only the accessor methods provided by the ADT. Assembly language, of course, provides very few restrictions. If you are dead set on accessing an object directly, there is very little HLA can do to prevent you from doing this. However, HLA has some facilities which provide a small degree of information hiding. Combined with some care on your part, you will be able to enjoy many of the benefits of information hiding within your programs.

The primary facilities HLA provides to support information hiding are separate compilation, linkable modules, and the #INCLUDE/#INCLUDEONCE directives.
For our purposes, an abstract data type definition will consist of two sections: an interface section and an implementation section.

The interface section contains the definitions which must be visible to the application program. In general, it should not contain any specific information which would allow the application program to violate the information hiding principle, but this is often impossible given the nature of assembly language. Nevertheless, you should attempt to only reveal what is absolutely necessary within the interface section.

The implementation section contains the code, data structures, etc., to actually implement the ADT. While some of the methods and data types appearing in the implementation section may be public (by virtue of appearance within the interface section), many of the subroutines, data items, and so on will be private to the implementation code. The implementation section is where you hide all the details from the application program.

If you wish to modify the abstract data type at some point in the future, you will only have to change the interface and implementation sections. Unless you delete some previously visible object which the applications use, there will be no need to modify the applications at all.

Although you could place the interface and implementation sections directly in an application program, this would not promote information hiding or maintainability, especially if you have to include the code in several different applications. The best approach is to place the interface section in an include file that any interested application reads using the HLA #INCLUDE directive and to place the implementation section in a separate module that you link with your applications.

The include file would contain EXTERNAL directives, any necessary macros, and other definitions you want made public. It generally would not contain 80x86 code except, perhaps, in some macros.
When an application wants to make use of an ADT it would include this file. The separate assembly file containing the implementation section would contain all the procedures, functions, data objects, etc., to actually implement the ADT. Those names which you want to be public should appear in the interface include file and have the EXTERNAL attribute. You should also include the interface include file in the implementation file so you do not have to maintain two sets of EXTERNAL directives. One problem with using procedures for data access methods is the fact that many accessor methods are especially trivial (typically just a MOV instruction) and the overhead of the call and return instructions is expensive for such trivial operations. For example, suppose you have an ADT whose data object is a structure, but you do not want to make the field names visible to the application and you really do not want to
allow the application to access the fields of the data structure directly (because the data structure may change in the future). The normal way to handle this is to supply a method GetField which returns the desired field of the object. However, as pointed out above, this can be very slow. An alternative for simple access methods is to use a macro to emit the code to access the desired field. Although code to directly access the data object appears in the application program (via macro expansion), it will be automatically updated if you ever change the macro in the interface section, simply by reassembling your application.

Although it is quite possible to create ADTs using nothing more than separate compilation and, perhaps, RECORDs, HLA does provide a better solution: the class. Read on to find out about HLA’s support for classes and objects as well as how to use these to create ADTs.
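
The ideas in this section (accessor methods, information hiding by convention, and independent instances in place of a shared global) are not specific to assembly language. The following Python sketch (not HLA) is a small hypothetical ADT; the Counter class and its method names are invented for this illustration. As in HLA, nothing in the language prevents a caller from touching the private field directly; the leading underscore is only a convention.

```python
class Counter:
    """A tiny abstract data type: callers are expected to use only the methods."""

    def __init__(self, start=0):
        self._value = start      # private by convention, not by enforcement

    def get_value(self):         # accessor method, in the spirit of GetField
        return self._value

    def increment(self, by=1):
        self._value += by


# Two sections of a program each get their own "global" counter,
# so neither can interfere with the other.
a = Counter()
b = Counter(10)
a.increment()
print(a.get_value(), b.get_value())   # prints 1 10
```

If the representation of the counter changes later, only the class body needs to change, provided callers have gone through get_value and increment rather than the field itself.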
10.3 Classes in HLA

HLA’s classes provide a good mechanism for creating abstract data types. Fundamentally, a class is little more than a RECORD declaration that allows the definition of fields other than data fields (e.g., procedures, constants, and macros). The inclusion of other program declaration objects in the class definition dramatically expands the capabilities of a class over that of a record. For example, with a class it is now possible to easily define an ADT since classes may include data and methods that operate on that data (procedures).

The principal way to create an abstract data type in HLA is to declare a class data type. Classes in HLA always appear in the TYPE section and use the following syntax:

    classname : class

        << class declaration section >>

    endclass;
The class declaration section is very similar to the local declaration section for a procedure insofar as it allows CONST, VAL, VAR, and STATIC variable declaration sections. Classes also let you define macros and specify procedure, iterator, and method prototypes (method declarations are legal only in classes). Conspicuously absent from this list is the TYPE declaration section. You cannot declare new types within a class.

A method is a special type of procedure that appears only within a class. A little later you will see the difference between procedures and methods; for now you can treat them as being one and the same. Other than a few subtle details regarding class initialization and the use of pointers to classes, their semantics are identical [2]. Generally, if you don’t know whether to use a procedure or method in a class, the safest bet is to use a method.

You do not place procedure/iterator/method code within a class. Instead you simply supply prototypes for these routines. A routine prototype consists of the PROCEDURE, ITERATOR, or METHOD reserved word, the routine name, any parameters, and a couple of optional procedure attributes (@USE, RETURNS, and EXTERNAL). The actual routine definition (i.e., the body of the routine and any local declarations it needs) appears outside the class.

The following example demonstrates a typical class declaration appearing in the TYPE section:
    TYPE
        TypicalClass: class

            const
                TCconst := 5;

            val
                TCval := 6;

            var
                TCvar : uns32;      // Private field used only by TCproc.

            static
                TCstatic : int32;

            procedure TCproc( u:uns32 ); returns( "eax" );
            iterator TCiter( i:int32 ); external;
            method TCmethod( c:char );

        endclass;

[2] Note, however, that the difference between procedures and methods makes all the difference in the world to the object-oriented programming paradigm. Hence the inclusion of methods in HLA’s class definitions.
As you can see, classes are very similar to records in HLA. Indeed, you can think of a record as being a class that only allows VAR declarations. HLA implements classes in a fashion quite similar to records insofar as it allocates sequential data fields in sequential memory locations. In fact, with only one minor exception, there is almost no difference between a RECORD declaration and a CLASS declaration that only has a VAR declaration section. Later you’ll see exactly how HLA implements classes, but for now you can assume that HLA implements them the same as it does records and you won’t be too far off the mark.

You can access the TCvar and TCstatic fields (in the class above) just like a record’s fields. You access the CONST and VAL fields in a similar manner. If a variable of type TypicalClass has the name obj, you can access the fields of obj as follows:

    mov( obj.TCconst, eax );
    mov( obj.TCval, ebx );
    add( obj.TCvar, eax );
    add( obj.TCstatic, ebx );
    obj.TCproc( 20 );           // Calls the TCproc procedure in TypicalClass.
    etc.
If an application program includes the class declaration above, it can create variables using the TypicalClass type and perform operations using the above methods. Unfortunately, the application program can also access the fields of the ADT data type with impunity. For example, if a program created a variable MyClass of type TypicalClass, then it could easily execute instructions like “MOV( MyClass.TCvar, eax );” even though this field might be private to the implementation section. If you are going to allow an application to declare a variable of type TypicalClass, the field names have to be visible. While there are some tricks we could play with HLA’s class definitions to help hide the private fields, the best solution is to thoroughly comment the private fields and then exercise some restraint when accessing the fields of that class. Specifically, this means that ADTs you create using HLA’s classes cannot be “pure” ADTs, since HLA allows direct access to the data fields. However, with a little discipline, you can simulate a pure ADT by simply electing not to access such fields outside the class’ methods, procedures, and iterators.

Prototypes appearing in a class are effectively FORWARD declarations. Like normal forward declarations, all procedures, iterators, and methods you define in a class must have an actual implementation later in the code. Alternatively, you may attach the EXTERNAL keyword to the end of a procedure, iterator, or method declaration within a class to inform HLA that the actual code appears in a separate module. As a general rule, class declarations appear in header files and represent the interface section of an ADT. The procedure, iterator, and method bodies appear in the implementation section, which is usually a separate source file that you compile separately and link with the modules that use the class.
The following is an example of a class procedure implementation:

    procedure TypicalClass.TCproc( u:uns32 ); nodisplay;

        << local declarations for this procedure >>

    begin TCproc;

        << code that implements this procedure >>

    end TCproc;
There are several differences between a standard procedure declaration and a class procedure declaration. First, and most obvious, the procedure name includes the class name (e.g., TypicalClass.TCproc). This differentiates this class procedure definition from a regular procedure that just happens to have the name TCproc. Note, however, that you do not have to repeat the class name before the procedure name in the BEGIN and END clauses of the procedure (this is similar to procedures you define in HLA NAMESPACEs).

A second difference between class procedures and non-class procedures is not obvious. Some procedure attributes (@USE, EXTERNAL, RETURNS, @CDECL, @PASCAL, and @STDCALL) are legal only in the prototype declaration appearing within the class, while other attributes (@NOFRAME, @NODISPLAY, @NOALIGNSTACK, and ALIGN) are legal only within the procedure definition and not within the class. Fortunately, HLA provides helpful error messages if you put an option in the wrong place, so you don’t have to memorize this rule.

If a class routine’s prototype does not have the EXTERNAL option, the compilation unit (that is, the PROGRAM or UNIT) containing the class declaration must also contain the routine’s definition or HLA will generate an error at the end of the compilation. For small, local classes (i.e., when you’re embedding the class declaration and routine definitions in the same compilation unit) the convention is to place the class’ procedure, iterator, and method definitions in the source file shortly after the class declaration. For larger systems (i.e., when separately compiling a class’ routines), the convention is to place the class declaration in a header file by itself and place all the procedure, iterator, and method definitions in a separate HLA unit and compile them by themselves.
10.4 Objects

Remember, a class definition is just a type. Therefore, when you declare a class type you haven’t created a variable whose fields you can manipulate. An object is an instance of a class; that is, an object is a variable that is some class type. You declare objects (i.e., class variables) the same way you declare other variables: in a VAR, STATIC, or STORAGE section [3]. A pair of sample object declarations follows:

    var
        T1: TypicalClass;
        T2: TypicalClass;
For a given class object, HLA allocates storage for each variable appearing in the VAR section of the class declaration. If you have two objects, T1 and T2, of type TypicalClass then T1.TCvar is unique as is T2.TCvar. This is the intuitive result (similar to RECORD declarations); most data fields you define in a class will appear in the VAR declaration section.

Static data objects (e.g., those you declare in the STATIC section of a class declaration) are not unique among the objects of that class; that is, HLA allocates only a single static variable that all variables of that class share. For example, consider the following (partial) class declaration and object declarations:

    type
        sc: class

            var
                i:int32;

            static
                s:int32;
            .
            .
            .
        endclass;

    var
        s1: sc;
        s2: sc;

[3] Technically, you could also declare an object in a READONLY section, but HLA does not allow you to define class constants, so there is little utility in declaring class objects in the READONLY section.
In this example, s1.i and s2.i are different variables. However, s1.s and s2.s are aliases of one another. Therefore, an instruction like “mov( 5, s1.s );” also stores five into s2.s. Generally you use static class variables to maintain information about the whole class while you use class VAR objects to maintain information about the specific object. Since keeping track of class information is relatively rare, you will probably declare most class data fields in a VAR section.

You can also create dynamic instances of a class and refer to those dynamic objects via pointers. In fact, this is probably the most common form of object storage and access. The following code shows how to create pointers to objects and how you can dynamically allocate storage for an object:

    var
        pSC: pointer to sc;
        .
        .
        .
        malloc( @size( sc ) );
        mov( eax, pSC );
        .
        .
        .
        mov( pSC, ebx );
        mov( (type sc [ebx]).i, eax );
Note the use of type coercion to cast the pointer in EBX as type sc.
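The difference between VAR and STATIC class fields can be seen in a small, complete program. The following is an untested sketch rather than code from the original text; it assumes the HLA Standard Library header stdlib.hhf is available, and the program name staticFieldDemo is hypothetical. It reuses the sc class shown above.

```hla
program staticFieldDemo;
#include( "stdlib.hhf" )

type
    sc: class

        var
            i: int32;       // One copy per object.

        static
            s: int32;       // A single copy shared by every sc object.

    endclass;

var
    s1: sc;
    s2: sc;

begin staticFieldDemo;

    mov( 1, s1.i );         // Changes only s1's copy of i.
    mov( 2, s2.i );         // Changes only s2's copy of i.
    mov( 5, s1.s );         // s1.s and s2.s name the same variable.

    stdout.put( "s1.i = ", s1.i, ", s2.i = ", s2.i, nl );
    stdout.put( "s1.s = ", s1.s, ", s2.s = ", s2.s, nl );

end staticFieldDemo;
```

If the behavior matches the description above, the two i fields print different values while both s fields print the value stored through s1.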
10.5 Inheritance

Inheritance is one of the most fundamental ideas behind object-oriented programming. The basic idea behind inheritance is that a class inherits, or copies, all the fields from some class and then possibly expands the number of fields in the new data type. For example, suppose you created a data type point which describes a point in the planar (two dimensional) space. The class for this point might look like the following:

    type
        point: class

            var
                x:int32;
                y:int32;

            method distance;

        endclass;
Suppose you want to create a point in 3D space rather than 2D space. You can easily build such a data type as follows:

    type
        point3D: class inherits( point );

            var
                z:int32;

        endclass;

The INHERITS option on the CLASS declaration tells HLA to insert the fields of point at the beginning of the class. In this case, point3D inherits the fields of point. HLA always places the inherited fields at the beginning of a class object. The reason for this will become clear a little later. If you have an instance of point3D which you call P3, then the following 80x86 instructions are all legal:

    mov( P3.x, eax );
    add( P3.y, eax );
    mov( eax, P3.z );
    P3.distance();
Note that the P3.distance method invocation in this example calls the point.distance method. You do not have to write a separate distance method for the point3D class unless you really want to do so (see the next section for details). Just like the x and y fields, point3D objects inherit point’s methods.
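Because inherited fields come first, code that knows only about point can still read the x and y fields of a point3D object. The following untested sketch illustrates this; it assumes stdlib.hhf is available, the program name inheritDemo is hypothetical, and the classes omit the distance method so that no virtual method table is needed.

```hla
program inheritDemo;
#include( "stdlib.hhf" )

type
    point: class
        var
            x: int32;
            y: int32;
    endclass;

    point3D: class inherits( point )
        var
            z: int32;       // Follows the inherited x and y fields.
    endclass;

static
    P3: point3D;

begin inheritDemo;

    mov( 10, P3.x );        // x and y are inherited from point.
    mov( 20, P3.y );
    mov( 30, P3.z );

    // Treat the point3D object as a plain point. This works because
    // x and y sit at the same offsets in both classes.
    lea( ebx, P3 );
    mov( (type point [ebx]).x, eax );
    add( (type point [ebx]).y, eax );   // EAX = P3.x + P3.y

    stdout.put( "x + y seen through a point view: ", (type int32 eax), nl );

end inheritDemo;
```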
10.6 Overriding

Overriding is the process of replacing an existing method in an inherited class with one more suitable for the new class. In the point and point3D examples appearing in the previous section, the distance method (presumably) computes the distance from the origin to the specified point. For a point on a two-dimensional plane, you can compute the distance using the function:

$$\mathit{dist} = \sqrt{x^2 + y^2}$$

However, the distance for a point in 3D space is given by the equation:

$$\mathit{dist} = \sqrt{x^2 + y^2 + z^2}$$

Clearly, if you call the distance function for point for a point3D object you will get an incorrect answer. In the previous section, however, you saw that the P3 object calls the distance function inherited from the point class. Therefore, this would produce an incorrect result.

In this situation the point3D data type must override the distance method with one that computes the correct value. You cannot simply redefine the point3D class by adding a distance method prototype:

    type
        point3D: class inherits( point )

            var
                z:int32;

            method distance;    // This doesn't work!

        endclass;
The problem with the distance method declaration above is that point3D already has a distance method: the one that it inherits from the point class. HLA will complain because it doesn’t like two methods with the same name in a single class.

To solve this problem, we need some mechanism by which we can override the declaration of point.distance and replace it with a declaration for point3D.distance. To do this, you use the OVERRIDE keyword before the method declaration:

    type
        point3D: class inherits( point )

            var
                z:int32;

            override method distance;   // This will work!

        endclass;
The OVERRIDE prefix tells HLA to ignore the fact that point3D inherits a method named distance from the point class. Now, any call to the distance method via a point3D object will call the point3D.distance method rather than point.distance. Of course, once you override a method using the OVERRIDE prefix, you must supply the method in the implementation section of your code, e.g.,

    method point3D.distance; nodisplay;

        << local declarations for the distance method >>

    begin distance;

        << code that implements the distance method >>

    end distance;
10.7 Virtual Methods vs. Static Procedures

A little earlier, this chapter suggested that you could treat class methods and class procedures the same. There are, in fact, some major differences between the two (after all, why have methods if they’re the same as procedures?). As it turns out, the differences between methods and procedures are crucial if you want to develop object-oriented programs. Methods provide the second feature necessary to support true polymorphism: virtual procedure calls [4]. A virtual procedure call is just a fancy name for an indirect procedure call (using a pointer associated with the object). The key benefit of virtual procedures is that the system automatically calls the right method when using pointers to generic objects.

Consider the following declarations using the point class from the previous sections:

    var
        P2: point;
        P: pointer to point;
Given the declarations above, the following assembly statements are all legal:

    mov( P2.x, eax );
    mov( P2.y, ecx );
    P2.distance();          // Calls point.distance.

    lea( ebx, P2 );         // Store address of P2 into P.
    mov( ebx, P );
    P.distance();           // Calls point.distance.
Note that HLA lets you call a method via a pointer to an object rather than directly via an object variable. This is a crucial feature of objects in HLA and a key to implementing virtual method calls.

The magic behind polymorphism and inheritance is that object pointers are generic. In general, when your program references data indirectly through a pointer, the value of the pointer should be the address of the underlying data type associated with that pointer. For example, if you have a pointer to a 16-bit unsigned integer, you wouldn’t normally use that pointer to access a 32-bit signed integer value. Similarly, if you have a pointer to some record, you would not normally cast that pointer to some other record type and access the fields of that other type [5]. With pointers to class objects, however, we can lift this restriction a bit. Pointers to objects may legally contain the address of the object’s type or the address of any object that inherits the fields of that type. Consider the following declarations that use the point and point3D types from the previous examples:

    var
        P2: point;
        P3: point3D;
        p: pointer to point;
        .
        .
        .
        lea( ebx, P2 );
        mov( ebx, p );
        p.distance();       // Calls the point.distance method.
        .
        .
        .
        lea( ebx, P3 );
        mov( ebx, p );      // Yes, this is semantically legal.
        p.distance();       // Surprise, this calls point3D.distance.

[4] Polymorphism literally means “many-faced.” In the context of object-oriented programming, polymorphism means that the same method name, e.g., distance, can refer to one of several different methods.

[5] Of course, assembly language programmers break rules like this all the time. For now, let’s assume we’re playing by the rules and only access the data using the data type associated with the pointer.
Since p is a pointer to a point object, it might seem intuitive for p.distance to call the point.distance method. However, methods are polymorphic. If you’ve got a pointer to an object and you call a method associated with that object, the system will call the actual (overridden) method associated with the object, not the method specifically associated with the pointer’s class type.

Class procedures behave differently than methods with respect to overridden procedures. When you call a class procedure indirectly through an object pointer, the system will always call the procedure associated with the underlying class associated with the pointer. So had distance been a procedure rather than a method in the previous examples, the “p.distance();” invocation would always call point.distance, even if p is pointing at a point3D object. The section on object implementation, later in this chapter, explains why methods and procedures are different (see section 10.9, “Object Implementation”).

Note that iterators are also virtual; so, like methods, an object iterator invocation will always call the (overridden) iterator associated with the actual object whose address the pointer contains. To differentiate the semantics of methods and iterators from procedures, we will refer to the method/iterator calling semantics as virtual procedures and the calling semantics of a class procedure as a static procedure.
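The contrast between virtual and static calls can be observed side by side. The following is an untested sketch, not code from the original text. It assumes stdlib.hhf is available; the routine names whoP and whoM and the program name dispatchDemo are hypothetical. It also relies on two details this chapter covers later: the VMT declarations in the READONLY section and the store of each class's VMT address into the object's _pVMT_ field, which objects declared as ordinary variables do not receive automatically.

```hla
program dispatchDemo;
#include( "stdlib.hhf" )

type
    point: class
        var
            x: int32;
            y: int32;
        procedure whoP;             // Static: selected by the pointer's type.
        method whoM;                // Virtual: selected by the object itself.
    endclass;

    point3D: class inherits( point )
        var
            z: int32;
        override procedure whoP;
        override method whoM;
    endclass;

readonly
    VMT( point );                   // Emit the virtual method tables.
    VMT( point3D );

static
    P2: point;
    P3: point3D;
    p:  pointer to point;

procedure point.whoP;
begin whoP;
    stdout.put( "point.whoP", nl );
end whoP;

method point.whoM;
begin whoM;
    stdout.put( "point.whoM", nl );
end whoM;

procedure point3D.whoP;
begin whoP;
    stdout.put( "point3D.whoP", nl );
end whoP;

method point3D.whoM;
begin whoM;
    stdout.put( "point3D.whoM", nl );
end whoM;

begin dispatchDemo;

    // Assumption: no constructor is used here, so each object's VMT
    // pointer field must be filled in by hand before any method call.
    mov( &point._VMT_, P2._pVMT_ );
    mov( &point3D._VMT_, P3._pVMT_ );

    lea( ebx, P2 );
    mov( ebx, p );
    p.whoP();                       // Static call: point.whoP.
    p.whoM();                       // Virtual call: point.whoM.

    lea( ebx, P3 );
    mov( ebx, p );
    p.whoP();                       // Static call: still point.whoP.
    p.whoM();                       // Virtual call: point3D.whoM.

end dispatchDemo;
```

Only the last call should differ between the two halves of the main program: the procedure call follows the declared type of p, while the method call follows the object p currently points at.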
10.8 Writing Class Methods, Iterators, and Procedures

For each class procedure, method, and iterator prototype appearing in a class definition, there must be a corresponding procedure, method, or iterator appearing within the program (for the sake of brevity, this section will use the term routine to mean procedure, method, or iterator from this point forward). If the prototype does not contain the EXTERNAL option, then the code must appear in the same compilation unit as the class declaration. If the EXTERNAL option does follow the prototype, then the code may appear in the same compilation unit or a different compilation unit (as long as you link the resulting object file with the code containing the class declaration). Like external (non-class) procedures and iterators, if you fail to provide the code the linker will complain when you attempt to create an executable file. To reduce the size of the following examples, they will all define their routines in the same source file as the class declaration.

HLA class routines must always follow the class declaration in a compilation unit. If you are compiling your routines in a separate unit, the class declaration must still precede the routines’ code (usually via an #INCLUDE file). If you haven’t defined the class by the time you define a routine like point.distance, HLA doesn’t know that point is a class and, therefore, doesn’t know how to handle the routine’s definition.

Consider the following declarations for a point2D class:

    type
        point2D: class

            const
                UnitDistance: real32 := 1.0;

            var
                x: real32;
                y: real32;

            static
                LastDistance: real32;

            method distance( fromX:real32; fromY:real32 ); returns( "st0" );
            procedure InitLastDistance;

        endclass;
The distance function for this class should compute the distance from the object’s point to (fromX,fromY). The following formula describes this computation:

$$\sqrt{(x - \mathit{fromX})^2 + (y - \mathit{fromY})^2}$$
A first pass at writing the distance method might produce the following code:

    method point2D.distance( fromX:real32; fromY:real32 ); nodisplay;
    begin distance;

        fld( x );           // Note: this doesn't work!
        fld( fromX );       // Compute (x-fromX)
        fsub();
        fld( st0 );         // Duplicate value on TOS.
        fmul();             // Compute square of difference.

        fld( y );           // This doesn't work either.
        fld( fromY );       // Compute (y-fromY)
        fsub();
        fld( st0 );         // Compute the square of the difference.
        fmul();

        fadd();             // Sum the two squares.
        fsqrt();

    end distance;
This code probably looks like it should work to someone who is familiar with an object-oriented programming language like C++ or Delphi. However, as the comments indicate, the instructions that push the x and y variables onto the FPU stack don’t work: HLA doesn’t automatically define the symbols associated with the data fields of a class within that class’ routines.

To learn how to access the data fields of a class within that class’ routines, we need to back up a moment and discover some very important implementation details concerning HLA’s classes. To do this, consider the following variable declarations:

    var
        Origin: point2D;
        PtInSpace: point2D;
Remember, whenever you create two objects like Origin and PtInSpace, HLA reserves storage for the x and y data fields for both of these objects. However, there is only one copy of the point2D.distance method in memory. Therefore, were you to call Origin.distance and PtInSpace.distance, the system would call the same routine for both method invocations. Once inside that method, one has to wonder what an instruction like “fld( x );” would do. How does it associate x with Origin.x or PtInSpace.x? Worse still, how would this code differentiate between the data field x and a global object x? In HLA, the answer is “it doesn’t.” You do not specify the data field names within a class routine by simply using their names as though they were common variables.

To differentiate Origin.x from PtInSpace.x within class routines, HLA automatically passes a pointer to an object’s data fields whenever you call a class routine. Therefore, you can reference the data fields indirectly off this pointer. HLA passes this object pointer in the ESI register. This is one of the few places where HLA-generated code will modify one of the 80x86 registers behind your back: anytime you call a class routine, HLA automatically loads the ESI register with the object’s address. Obviously, you cannot count on ESI’s value being preserved across class routine calls, nor can you pass parameters to the class routine in the ESI register (though it is perfectly reasonable to specify "@USE ESI;" to allow HLA to use the ESI register when setting up other parameters).

For class methods and iterators (but not procedures), HLA will also load the EDI register with the address of the class’ virtual method table (see section 10.9.1, “Virtual Method Tables”). While the virtual method table address isn’t as interesting as the object address, keep in mind that HLA-generated code will overwrite any value in the EDI register when you call a method or an iterator. Again, "EDI" is a good choice for the @USE operand for methods since HLA will wipe out the value in EDI anyway.

Upon entry into a class routine, ESI contains a pointer to the (non-static) data fields associated with the class. Therefore, to access fields like x and y (in our point2D example), you could use an address expression like the following:

    (type point2D [esi]).x
Since you use ESI as the base address of the object’s data fields, it’s a good idea not to disturb ESI’s value within the class routines (or, at least, preserve ESI’s value if you need to access the object’s data fields after some point where you must use ESI for some other purpose). Note that if you call an iterator or a method you do not have to preserve EDI (unless, for some reason, you need access to the virtual method table, which is unlikely).

Accessing the fields of a data object within a class’ routines is such a common operation that HLA provides a shorthand notation for casting ESI as a pointer to the class object: THIS. Within a class in HLA, the reserved word THIS automatically expands to a string of the form “(type classname [esi])” substituting, of course, the appropriate class name for classname. Using the THIS keyword, we can (correctly) rewrite the previous distance method as follows:

    method point2D.distance( fromX:real32; fromY:real32 ); nodisplay;
    begin distance;

        fld( this.x );
        fld( fromX );       // Compute (x-fromX)
        fsub();
        fld( st0 );         // Duplicate value on TOS.
        fmul();             // Compute square of difference.

        fld( this.y );
        fld( fromY );       // Compute (y-fromY)
        fsub();
        fld( st0 );         // Compute the square of the difference.
        fmul();

        fadd();             // Sum the two squares.
        fsqrt();

    end distance;
Don’t forget that calling a class routine wipes out the value in the ESI register. This isn’t obvious from the syntax of the routine’s invocation. It is especially easy to forget this when calling some class routine from inside some other class routine; if you do this, the internal call wipes out the value in ESI and, on return from that call, ESI no longer points at the original object. Always push and pop ESI (or otherwise preserve ESI’s value) in this situation, e.g.,

        .
        .
        .
        fld( this.x );              // ESI points at current object.
        .
        .
        .
        push( esi );                // Preserve ESI across this method call.
        SomeObject.SomeMethod();
        pop( esi );
        .
        .
        .
        lea( ebx, this.x );         // ESI points at original object here.
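The push/pop pattern above can be placed in a complete program. The following untested sketch assumes stdlib.hhf is available; the class counter, its routines touch and report, and the objects a and b are all hypothetical names chosen for illustration. Only class procedures are used, so no virtual method table is involved.

```hla
program esiDemo;
#include( "stdlib.hhf" )

type
    counter: class
        var
            id: int32;
        procedure touch;
        procedure report;
    endclass;

static
    a: counter;
    b: counter;

procedure counter.touch;
begin touch;
    // Nothing to do. Merely calling this routine reloads ESI with
    // the address of the object named in the call.
end touch;

procedure counter.report;
begin report;

    mov( this.id, eax );            // ESI points at the current object.
    stdout.put( "id before the nested call: ", (type int32 eax), nl );

    push( esi );                    // Preserve ESI across the nested call.
    b.touch();                      // Loads ESI with the address of b.
    pop( esi );                     // ESI points at the original object again.

    mov( this.id, eax );
    stdout.put( "id after the nested call:  ", (type int32 eax), nl );

end report;

begin esiDemo;

    mov( 1, a.id );
    mov( 2, b.id );
    a.report();

end esiDemo;
```

With the push and pop in place, both lines should report the id of object a. Removing them would leave ESI pointing at b after the nested call, so the second line would report b's id instead.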
The THIS keyword provides access to the class variables you declare in the VAR section of a class. You can also use THIS to call other class routines associated with the current object, e.g.,

    this.distance( 5.0, 6.0 );
To access class constants and STATIC data fields you generally do not use the THIS pointer. HLA associates constant and static data fields with the whole class, not a specific object. To access these class members, just use the class name in place of the object name. For example, to access the UnitDistance constant in the point2D class you could use a statement like the following:

    fld( point2D.UnitDistance );
As another example, if you wanted to update the LastDistance field in the point2D class each time you computed a distance, you could rewrite the point2D.distance method as follows:

    method point2D.distance( fromX:real32; fromY:real32 ); nodisplay;
    begin distance;

        fld( this.x );
        fld( fromX );       // Compute (x-fromX)
        fsub();
        fld( st0 );         // Duplicate value on TOS.
        fmul();             // Compute square of difference.

        fld( this.y );
        fld( fromY );       // Compute (y-fromY)
        fsub();
        fld( st0 );         // Compute the square of the difference.
        fmul();

        fadd();             // Sum the two squares.
        fsqrt();

        fst( point2D.LastDistance );    // Update shared (STATIC) field.

    end distance;
To understand why you use the class name when referring to constants and static objects but you use THIS to access VAR objects, check out the next section.

Class procedures are also static objects, so it is possible to call a class procedure by specifying the class name rather than an object name in the procedure invocation, e.g., both of the following are legal:

    Origin.InitLastDistance();
    point2D.InitLastDistance();
There is, however, a subtle difference between these two class procedure calls. The first call above loads ESI with the address of the Origin object prior to actually calling the InitLastDistance procedure. The second call, however, is a direct call to the class procedure without referencing an object; therefore, HLA doesn’t know what object address to load into the ESI register. In this case, HLA loads NULL (zero) into ESI prior to calling the InitLastDistance procedure. Because you can call class procedures in this manner, it’s always a good idea to check the value in ESI within your class procedures to verify that ESI contains an object address. Checking the value in ESI is a good way to determine which calling mechanism is in use. Later, this chapter will discuss constructors and object initialization; there you will see a good use for static procedures and calling those procedures directly (rather than through the use of an object).
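The ESI check described above might look like the following inside a class procedure. This is an untested sketch that assumes stdlib.hhf is available; the program name nullThisDemo and the messages are hypothetical, and the body of InitLastDistance (clearing the shared field) is an assumption, since the text does not say what that procedure does.

```hla
program nullThisDemo;
#include( "stdlib.hhf" )

type
    point2D: class
        var
            x: real32;
            y: real32;
        static
            LastDistance: real32;
        procedure InitLastDistance;
    endclass;

static
    Origin: point2D;

procedure point2D.InitLastDistance;
begin InitLastDistance;

    if( esi = 0 ) then

        // Called as point2D.InitLastDistance(): there is no object,
        // so THIS must not be used on this path.
        stdout.put( "Called through the class name (ESI is NULL)", nl );

    else

        // Called as someObject.InitLastDistance(): ESI holds its address.
        stdout.put( "Called through an object", nl );

    endif;

    // Static fields belong to the class, so they are safe to access
    // on either path. Assumed behavior: reset the field to 0.0.
    fldz();
    fstp( point2D.LastDistance );

end InitLastDistance;

begin nullThisDemo;

    Origin.InitLastDistance();
    point2D.InitLastDistance();

end nullThisDemo;
```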
10.9 Object Implementation

In a high level object-oriented language like C++ or Delphi, it is quite possible to master the use of objects without really understanding how the machine implements them. One of the reasons for learning assembly language programming is to fully comprehend low-level implementation details so one can make educated decisions concerning the use of programming constructs like objects. Further, since assembly language allows you to poke around with data structures at a very low level, knowing how HLA implements objects can help you create certain algorithms that would not be possible without a detailed knowledge of object implementation. Therefore, this section, and its corresponding subsections, explains the low-level implementation details you will need to know in order to write object-oriented HLA programs.

HLA implements objects in a manner quite similar to records. In particular, HLA allocates storage for all VAR objects in a class in a sequential fashion, just like records. Indeed, if a class consists of only VAR data fields, the memory representation of that class is nearly identical to that of a corresponding RECORD declaration. Consider the Student record declaration taken from Volume Three and the corresponding class:

    type
        student: record
            Name: char[65];
            Major: int16;
            SSN: char[12];
            Midterm1: int16;
            Midterm2: int16;
            Final: int16;
            Homework: int16;
            Projects: int16;
        endrecord;

        student2: class
            var
                Name: char[65];
                Major: int16;
                SSN: char[12];
                Midterm1: int16;
                Midterm2: int16;
                Final: int16;
                Homework: int16;
                Projects: int16;
        endclass;
Figure 10.1 Student RECORD Implementation in Memory
[Figure: the fields appear in declaration order: Name (65 bytes), Major (2 bytes), SSN (12 bytes), Mid 1 (2 bytes), Mid 2 (2 bytes), Final (2 bytes), Homework (2 bytes), Projects (2 bytes).]

Figure 10.2 Student CLASS Implementation in Memory
[Figure: a VMT Pointer (4 bytes) comes first, followed by the same fields in the same order: Name (65 bytes), Major (2 bytes), SSN (12 bytes), Mid 1 (2 bytes), Mid 2 (2 bytes), Final (2 bytes), Homework (2 bytes), Projects (2 bytes).]
If you look carefully at these two figures, you’ll discover that the only difference between the class and the record implementations is the inclusion of the VMT (virtual method table) pointer field at the beginning of the class object. This field, which is always present in a class, contains the address of the class’ virtual method table which, in turn, contains the addresses of all the class’ methods and iterators. The VMT field, by the way, is present even if a class doesn’t contain any methods or iterators.

As pointed out in previous sections, HLA does not allocate storage for STATIC objects within the object’s storage. Instead, HLA allocates a single instance of each static data field that all objects share. As an example, consider the following class and object declarations:

    type
        tHasStatic: class

            var
                i:int32;
                j:int32;
                r:real32;

            static
                c:char[2];
                b:byte;

        endclass;

    var
        hs1: tHasStatic;
        hs2: tHasStatic;
Figure 10.3 shows the storage allocation for these two objects in memory.
Figure 10.3 Object Allocation with Static Data Fields
[Figure: hs1 and hs2 each hold their own VMT pointer, i, j, and r fields. The static fields tHasStatic.c (c[0] and c[1]) and tHasStatic.b are stored once, outside both objects.]
Of course, CONST, VAL, and #MACRO objects do not have any run-time memory requirements associated with them, so HLA does not allocate any storage for these fields. Like the STATIC data fields, you may access CONST, VAL, and #MACRO fields using the class name as well as an object name. Hence, even if tHasStatic has these types of fields, the memory organization for tHasStatic objects would still be the same as shown in Figure 10.3.

Other than the presence of the virtual method table pointer (VMT), the presence of methods, iterators, and procedures has no impact on the storage allocation of an object. Of course, the machine instructions associated with these routines do appear somewhere in memory. So in a sense the code for the routines is quite similar to static data fields insofar as all the objects share a single instance of the routine.
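One way to check these layout claims is to compare the sizes HLA reports for a record and for a class with the same VAR fields. The following untested sketch assumes stdlib.hhf is available; the type names pairRec and pairCls and the program name layoutDemo are hypothetical.

```hla
program layoutDemo;
#include( "stdlib.hhf" )

type
    pairRec: record
        i: int32;
        j: int32;
    endrecord;

    pairCls: class

        var
            i: int32;
            j: int32;

        static
            shared: int32;      // Stored once, outside every object.

        const
            limit := 100;       // Compile-time only; occupies no storage.

    endclass;

begin layoutDemo;

    mov( @size( pairRec ), eax );
    stdout.put( "record size: ", (type uns32 eax), nl );

    mov( @size( pairCls ), eax );
    stdout.put( "class size:  ", (type uns32 eax), nl );

end layoutDemo;
```

If the description above holds, the class should come out exactly four bytes larger than the record (the VMT pointer), with the STATIC and CONST fields contributing nothing to the object's size.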
10.9.1 Virtual Method Tables

When HLA calls a class procedure, it directly calls that procedure using a CALL instruction, just like any normal non-class procedure call. Methods and iterators are another story altogether. Each object in the system carries a pointer to a virtual method table, which is an array of pointers to all the methods and iterators appearing within the object’s class.

Figure 10.4 Virtual Method Table Organization
[Figure: SomeObject consists of a VMT pointer followed by field1, field2, ..., fieldn. The VMT pointer refers to a table holding the addresses of Method/Iterator #1, Method/Iterator #2, ..., Method/Iterator #n.]
Each iterator or method you declare in a class has a corresponding entry in the virtual method table. That dword entry contains the address of the first instruction of that iterator or method. Calling a class method or iterator is a bit more work than calling a class procedure (it requires one additional instruction plus the use of the EDI register). Here is a typical calling sequence for a method:

    mov( ObjectAdrs, ESI );             // All class routines do this.
    mov( [esi], edi );                  // Get the address of the VMT into EDI.
    call( (type dword [edi+n]) );       // "n" is the offset of the method's entry
                                        // in the VMT.
For a given class there is only one copy of the VMT in memory. This is a static object so all objects of a given class type share the same VMT. This is reasonable since all objects of the same class type have exactly the same methods and iterators (see Figure 10.5).
Figure 10.5 All Objects That are the Same Class Type Share the Same VMT (diagram: Object1, Object2, and Object3, all of the same class type, each point at the single VMT for that class)
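The calling sequence and the shared table can be modelled in C, assuming an array of function pointers stands in for the VMT. This is only a sketch for intuition, not the code HLA emits; the names Object, method_fn, area, perimeter, and call_method are hypothetical.

```c
#include <stddef.h>

/* Hypothetical C model of an HLA object: the VMT pointer sits at offset zero. */
struct Object;
typedef int (*method_fn)(struct Object *self);

struct Object {
    const method_fn *vmt;   /* plays the role of the VMT pointer field */
    int width;
    int height;
};

int area(struct Object *self)      { return self->width * self->height; }
int perimeter(struct Object *self) { return 2 * (self->width + self->height); }

/* One table per class, shared by every object of that class. */
const method_fn object_vmt[] = { area, perimeter };

/* Mirrors: mov( ObjectAdrs, esi ); mov( [esi], edi ); call( (type dword [edi+n]) ); */
int call_method(struct Object *self, size_t index)
{
    const method_fn *table = self->vmt;  /* like "mov( [esi], edi )" */
    return table[index](self);           /* indirect call through entry "index" */
}

/* Builds two objects of the same class and calls one method on each. */
int demo_shared_vmt(void)
{
    struct Object a = { object_vmt, 2, 3 };
    struct Object b = { object_vmt, 4, 5 };

    /* Both objects reference the same table, as in Figure 10.5. */
    if (a.vmt != b.vmt) return -1;
    return call_method(&a, 0) + call_method(&b, 1);
}
```

Calling demo_shared_vmt() adds the area of the first object to the perimeter of the second; the point is that both calls go through the one table the objects share.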
Although HLA builds the VMT record structure as it encounters methods and iterators within a class, HLA does not automatically create the actual run-time virtual method table for you. You must explicitly declare this table in your program. To do this, you include a statement like the following in a STATIC or READONLY declaration section of your program, e.g.,

    readonly
        VMT( classname );
Since the addresses in a virtual method table should never change during program execution, the READONLY section is probably the best choice for declaring VMTs. It should go without saying that changing the pointers in a VMT is, in general, a really bad idea. So putting VMTs in a STATIC section is usually not a good idea.

A declaration like the one above defines the variable classname._VMT_. In section 10.10 (see “Constructors and Object Initialization” on page 1079) you will see that you need this name when initializing object variables. The class declaration automatically defines the classname._VMT_ symbol as an external static variable. The declaration above just provides the actual definition of this external symbol.

The declaration of a VMT uses a somewhat strange syntax because you aren’t actually declaring a new symbol with this declaration; you’re simply supplying the data for a symbol that you previously declared implicitly by defining a class. That is, the class declaration defines the static table variable classname._VMT_; all you’re doing with the VMT declaration is telling HLA to emit the actual data for the table. If, for some reason, you would like to refer to this table using a name other than classname._VMT_, HLA does allow you to prefix the declaration above with a variable name, e.g.,

    readonly
        myVMT: VMT( classname );
In this declaration, myVMT is an alias of classname._VMT_. As a general rule, you should avoid aliases in a program because they make the program more difficult to read and understand. Therefore, it is unlikely that you would ever really need to use this type of declaration. Like any other global static variable, there should be only one instance of a VMT for a given class in a program. The best place to put the VMT declaration is in the same source file as the class’ method, iterator, and procedure code (assuming they all appear in a single file). This way you will automatically link in the VMT whenever you link in the routines for a given class.
10.9.2 Object Representation with Inheritance

Up to this point, the discussion of the implementation of class objects has ignored the possibility of inheritance. Inheritance only affects the memory representation of an object by adding fields that are not explicitly stated in the class declaration. Adding inherited fields from a base class to another class must be done carefully. Remember, an important attribute of a class that inherits fields from a base class is that you can use a pointer to the base class to access the inherited fields from that base class in another class. As an example, consider the following classes:

    type
        tBaseClass: class
            var
                i:uns32;
                j:uns32;
                r:real32;

            method mBase;
        endclass;

        tChildClassA: class inherits( tBaseClass );
            var
                c:char;
                b:boolean;
                w:word;

            method mA;
        endclass;

        tChildClassB: class inherits( tBaseClass );
            var
                d:dword;
                c:char;
                a:byte[3];
        endclass;
Since both tChildClassA and tChildClassB inherit the fields of tBaseClass, these two child classes include the i, j, and r fields as well as their own specific fields. Furthermore, whenever you have a pointer variable whose base type is tBaseClass, it is legal to load this pointer with the address of any child class of tBaseClass; therefore, it is perfectly reasonable to load such a pointer with the address of a tChildClassA or tChildClassB variable, e.g.,

    var
        B1: tBaseClass;
        CA: tChildClassA;
        CB: tChildClassB;
        ptr: pointer to tBaseClass;
            .
            .
            .
        lea( ebx, B1 );
        mov( ebx, ptr );
        << Use ptr >>
            .
            .
            .
        lea( eax, CA );
        mov( eax, ptr );
        << Use ptr >>
            .
            .
            .
        lea( eax, CB );
        mov( eax, ptr );
        << Use ptr >>
Since ptr points at an object of tBaseClass, you may legally (from a semantic sense) access the i, j, and r fields of the object where ptr is pointing. It is not legal to access the c, b, w, or d fields of the tChildClassA or tChildClassB objects since at any one given moment the program may not know exactly what object type ptr references. In order for inheritance to work properly, the i, j, and r fields must appear at the same offsets in all child classes as they do in tBaseClass. This way, an instruction of the form “mov((type tBaseClass [ebx]).i, eax);” will correctly access the i field even if EBX points at an object of type tChildClassA or tChildClassB. Figure 10.6 shows the layout of the child and base classes:
Figure 10.6 Layout of Base and Child Class Objects in Memory (diagram, fields listed from offset zero upward: tBaseClass holds VMT, i, j, r; tChildClassA holds VMT, i, j, r, c, b, w; tChildClassB holds VMT, i, j, r, d, c, a). Derived (child) classes locate their inherited fields at the same offsets as those fields in the base class.
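The offset rule can be checked with a small C model. This sketch assumes that embedding the base structure as the first member of each child structure is a fair stand-in for HLA inheritance; the struct names and the helper functions read_i and inherited_offsets_match are hypothetical, and a C compiler may add padding that HLA would not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the three HLA classes. Field widths follow
   the HLA types: uns32/dword = 4 bytes, real32 = 4, char/boolean/byte = 1,
   word = 2. */
struct BaseClass {
    const void *vmt;        /* VMT pointer at offset zero */
    uint32_t i;
    uint32_t j;
    float r;
};

struct ChildClassA {
    struct BaseClass base;  /* inherited fields come first */
    char c;
    unsigned char b;
    uint16_t w;
};

struct ChildClassB {
    struct BaseClass base;  /* inherited fields come first */
    uint32_t d;
    char c;
    unsigned char a[3];
};

/* Reads field i through a base-class pointer, in the spirit of
   mov( (type tBaseClass [ebx]).i, eax ); */
uint32_t read_i(const struct BaseClass *p) { return p->i; }

/* Returns 1 when i and r sit at the same offsets in all three types. */
int inherited_offsets_match(void)
{
    return offsetof(struct ChildClassA, base.i) == offsetof(struct BaseClass, i)
        && offsetof(struct ChildClassB, base.i) == offsetof(struct BaseClass, i)
        && offsetof(struct ChildClassA, base.r) == offsetof(struct BaseClass, r)
        && offsetof(struct ChildClassB, base.r) == offsetof(struct BaseClass, r);
}
```

Because the inherited fields keep their offsets, read_i works unchanged when handed the base part of either child object. The two fields named c, on the other hand, land at different offsets in the two child structures, since tChildClassB places d ahead of its c.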
Note that the new fields in the two child classes bear no relation to one another, even if they have the same name (e.g., field c in the two child classes does not lie at the same offset). Although the two child classes share the fields they inherit from their common base class, any new fields they add are unique and separate. Two fields in different classes share the same offset only by coincidence. All classes (even those that aren’t related to one another) place the pointer to the virtual method table at offset zero within the object. There is a single VMT associated with each class in a program; even classes that inherit fields from some base class have a VMT that is (generally) different from the base class’ VMT. Figure 10.7 shows how objects of type tBaseClass, tChildClassA and tChildClassB point at their specific VMTs:
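Figure 10.7 Virtual Method Table References from Objects (diagram: given the declarations "var B1: tBaseClass; CA: tChildClassA; CB: tChildClassB; CB2: tChildClassB; CA2: tChildClassA;", the VMT pointer in B1 references tBaseClass:VMT, the VMT pointers in CA and CA2 reference tChildClassA:VMT, and the VMT pointers in CB and CB2 reference tChildClassB:VMT)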
A virtual method table is nothing more than an array of pointers to the methods and iterators associated with a class. The address of the first method or iterator appearing in a class is at offset zero, the address of the second appears at offset four, etc. You can determine the offset value for a given iterator or method by using the @offset function. If you want to call a method or iterator directly (using 80x86 syntax rather than HLA’s high level syntax), you could use code like the following:

    var
        sc: tBaseClass;
            .
            .
            .
        lea( esi, sc );     // Get the address of the object (& VMT).
        mov( [esi], edi );  // Put address of VMT into EDI.
        call( (type dword [edi+@offset( tBaseClass.mBase )]) );
Of course, if the method has any parameters, you must push them onto the stack before executing the code above. Don’t forget, when making direct calls to a method, that you must load ESI with the address of the object. Any field references within the method will probably depend upon ESI containing this address. The choice of EDI to contain the VMT address is nearly arbitrary. Unless you’re doing something tricky (like using EDI to obtain run-time type information), you could use any register you please here. As a general rule, you should use EDI when simulating class iterator/method calls because this is the convention that HLA employs and most programmers will expect this.
Whenever a child class inherits fields from some base class, the child class’ VMT also inherits entries from the base class’ VMT. For example, the VMT for class tBaseClass contains only a single entry – a pointer to method tBaseClass.mBase. The VMT for class tChildClassA contains two entries: pointers to tBaseClass.mBase and tChildClassA.mA. Since tChildClassB doesn’t define any new methods or iterators, tChildClassB’s VMT contains only a single entry, a pointer to the tBaseClass.mBase method. Note that tChildClassB’s VMT is identical to tBaseClass’ VMT. Nevertheless, HLA produces two distinct VMTs. This is a critical fact that we will make use of a little later. Figure 10.8 shows the relationship between these VMTs:
Figure 10.8 Virtual Method Tables for Inherited Classes (diagram of the virtual method tables for derived classes: tBaseClass holds mBase at offset zero; tChildClassA holds mBase at offset zero and mA at offset four; tChildClassB holds mBase at offset zero)
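The three tables of Figure 10.8 can be written out as C arrays. This is a sketch under the assumption that function-pointer arrays model the VMTs; the functions mBase and mA, the slot constants, and slot_offset are hypothetical stand-ins. Note that on a 64-bit host each entry is eight bytes wide rather than the four bytes of a 32-bit 80x86 dword.

```c
#include <stddef.h>

typedef const char *(*method_fn)(void);

/* Stand-ins for the two methods; each just reports its own name. */
const char *mBase(void) { return "tBaseClass.mBase"; }
const char *mA(void)    { return "tChildClassA.mA"; }

/* Entry order follows declaration order; inherited entries come first. */
enum { SLOT_mBase = 0, SLOT_mA = 1 };

const method_fn base_vmt[]   = { mBase };
const method_fn childA_vmt[] = { mBase, mA };
const method_fn childB_vmt[] = { mBase };  /* same contents as base_vmt, but a separate table */

/* Byte offset of a slot within a table, comparable to what @offset reports. */
size_t slot_offset(int slot) { return (size_t)slot * sizeof(method_fn); }
```

Even though base_vmt and childB_vmt hold identical entries, they are two distinct objects with two distinct addresses, which is the property the text says later sections rely on.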
Although the VMT pointer always appears at offset zero in an object (and, therefore, you can access it using the address expression “[ESI]” if ESI points at an object), HLA actually inserts a symbol into the symbol table so you may refer to the VMT pointer symbolically. The symbol _pVMT_ (pointer to Virtual Method Table) provides this capability. So a more readable way to access the VMT pointer (as in the previous code example) is

    lea( esi, sc );
    mov( (type tBaseClass [esi])._pVMT_, edi );
    call( (type dword [edi+@offset( tBaseClass.mBase )]) );
If you need to access the VMT directly, there are a couple of ways to do this. Whenever you declare a class object, HLA automatically includes a field named _VMT_ as part of that class. _VMT_ is a static array of double word objects. Therefore, you may refer to the VMT using an identifier of the form classname._VMT_. Generally, you shouldn’t access the VMT directly, but as you’ll see shortly, there are some good reasons why you need to know the address of this object in memory.
10.10 Constructors and Object Initialization

If you’ve tried to get a little ahead of the game and write a program that uses objects prior to this point, you’ve probably discovered that the program inexplicably crashes whenever you attempt to run it. We’ve covered a lot of material in this chapter thus far, but you are still missing one crucial piece of information – how to properly initialize objects prior to use. This section will put the final piece into the puzzle and allow you to begin writing programs that use classes. Consider the following object declaration and code fragment:

    var
        bc: tBaseClass;
            .
            .
            .
        bc.mBase();
Remember that variables you declare in the VAR section are uninitialized at run-time. Therefore, when the program containing these statements gets around to executing bc.mBase, it executes the three-statement sequence you’ve seen several times already:

    lea( esi, bc );
    mov( [esi], edi );
    call( (type dword [edi+@offset( tBaseClass.mBase )]) );
The problem with this sequence is that it loads EDI with an undefined value assuming you haven’t previously initialized the bc object. Since EDI contains a garbage value, attempting to call a subroutine at address “[EDI+@offset(tBaseClass.mBase)]” will likely crash the system. Therefore, before using an object, you must initialize the _pVMT_ field with the address of that object’s VMT. One easy way to do this is with the following statement:

    mov( &tBaseClass._VMT_, bc._pVMT_ );
Always remember, before using an object, be sure to initialize the virtual method table pointer for that object.

Although you must initialize the virtual method table pointer for all objects you use, this may not be the only field you need to initialize in those objects. Each specific class may have its own application-specific initialization that is necessary. Although the initialization may vary by class, you need to perform the same initialization on each object of a specific class that you use. If you ever create more than a single object from a given class, it is probably a good idea to create a procedure to do this initialization for you. This is such a common operation that object-oriented programmers have given these initialization procedures a special name: constructors.

Some object-oriented languages (e.g., C++) use a special syntax to declare a constructor. Others (e.g., Delphi) simply use existing procedure declarations to define a constructor. One advantage to employing a special syntax is that the language knows when you define a constructor and can automatically generate code to call that constructor for you (whenever you declare an object). Languages like Delphi require that you explicitly call the constructor; this can be a minor inconvenience and a source of defects in your programs. HLA does not use a special syntax to declare constructors – you define constructors using standard class procedures. As such, you will need to explicitly call the constructors in your program; however, you’ll see an easy method for automating this in a later section of this chapter.

Perhaps the most important fact you must remember is that constructors must be class procedures. You must not define constructors as methods (or iterators). The reason is quite simple: one of the tasks of the constructor is to initialize the pointer to the virtual method table and you cannot call a class method or iterator until after you’ve initialized the VMT pointer.
Since class procedures don’t use the virtual method table, you can call a class procedure prior to initializing the VMT pointer for an object.

By convention, HLA programmers use the name Create for the class constructor. There is no requirement that you use this name, but by doing so you will make your programs easier to read and follow by other programmers. As you may recall, you can call a class procedure via an object reference or a class reference. E.g., if clsProc is a class procedure of class tClass and Obj is an object of type tClass, then the following two class procedure invocations are both legal:

    tClass.clsProc();
    Obj.clsProc();
There is a big difference between these two calls. The first one calls clsProc with ESI containing zero (NULL) while the second invocation loads the address of Obj into ESI before the call. We can use this fact to determine, within a class procedure, which calling mechanism the caller used.
10.10.1 Dynamic Object Allocation Within the Constructor

As it turns out, most programs allocate objects dynamically using malloc and refer to those objects indirectly using pointers. This adds one more step to the initialization process – allocating storage for the object. The constructor is the perfect place to allocate this storage. Since you probably won’t need to allocate all objects dynamically, you’ll need two types of constructors: one that allocates storage and then initializes the object, and another that simply initializes an object that already has storage.

Another constructor convention is to merge these two constructors into a single constructor and differentiate the type of constructor call by the value in ESI. On entry into the class’ Create procedure, the program checks the value in ESI to see if it contains NULL (zero). If so, the constructor calls malloc to allocate storage for the object and returns a pointer to the object in ESI. If ESI does not contain NULL upon entry into the procedure, then the constructor assumes that ESI points at a valid object and skips over the memory allocation statements. At the very least, a constructor initializes the pointer to the VMT; therefore, the minimalist constructor will look like the following (this assumes that the tBaseClass declaration includes a class procedure named Create):

    procedure tBaseClass.Create; @nodisplay;
    begin Create;

        if( ESI = 0 ) then

            push( eax );      // Malloc returns its result here, so save it.
            malloc( @size( tBaseClass ));
            mov( eax, esi );  // Put pointer into ESI;
            pop( eax );

        endif;

        // Initialize the pointer to the VMT:
        // (remember, "this" is shorthand for "(type tBaseClass [esi])")

        mov( &tBaseClass._VMT_, this._pVMT_ );

        // Other class initialization would go here.

    end Create;
After you write a constructor like the one above, you choose an appropriate calling mechanism based on whether your object’s storage is already allocated. For pre-allocated objects (i.e., those you’ve declared in VAR, STATIC, or STORAGE sections [footnote 6] or those you’ve previously allocated storage for via malloc) you simply load the address of the object into ESI and call the constructor. For those objects you declare as a variable, this is very easy – just call the appropriate Create constructor:

    var
        bc0: tBaseClass;
        bcp: pointer to tBaseClass;
            .
            .
            .
        bc0.Create();  // Initializes pre-allocated bc0 object.
            .
            .
            .
        malloc( @size( tBaseClass ));  // Allocate storage for bcp object.
        mov( eax, bcp );
            .
            .
            .
        bcp.Create();  // Initializes pre-allocated bcp object.

6. You generally do not declare objects in READONLY sections because you cannot initialize them.
Note that although bcp is a pointer to a tBaseClass object, the Create procedure does not automatically allocate storage for this object. The program has already allocated the storage earlier. Therefore, when the program calls bcp.Create it loads ESI with the address contained within bcp; since this is not NULL, the tBaseClass.Create procedure does not allocate storage for a new object. By the way, the call to bcp.Create emits the following sequence of machine instructions:

    mov( bcp, esi );
    call tBaseClass.Create;
Until now, the code examples for a class procedure call always began with an LEA instruction. This is because all the examples to this point have used object variables rather than pointers to object variables. Remember, a class procedure, method, or iterator call passes the address of the object in the ESI register. For object variables HLA emits an LEA instruction to obtain this address. For pointers to objects, however, the actual object address is the value of the pointer variable; therefore, to load the address of the object into ESI, HLA emits a MOV instruction that copies the value of the pointer into the ESI register.

In the example above, the program preallocates the storage for an object prior to calling the object constructor. While there are several reasons for preallocating object storage (e.g., you’re creating a dynamic array of objects), you can achieve most simple object allocations like the one above by calling a standard Create procedure (i.e., one that allocates storage for an object if ESI contains NULL). The following example demonstrates this:

    var
        bcp2: pointer to tBaseClass;
            .
            .
            .
        tBaseClass.Create();  // Calls Create with ESI=NULL.
        mov( esi, bcp2 );     // Save pointer to new class object in bcp2.
Remember, a call to a tBaseClass.Create constructor returns a pointer to the new object in the ESI register. It is the caller’s responsibility to save the pointer this function returns into the appropriate pointer variable; the constructor does not automatically do this for you.
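The merged-constructor convention can be sketched in C, where a NULL argument plays the role of ESI = 0 and the return value plays the role of the pointer that Create leaves in ESI. This is a model under those assumptions rather than a translation of HLA output; Base, base_mBase, base_vmt, and base_create are hypothetical names.

```c
#include <stdlib.h>

struct Base;
typedef void (*method_fn)(struct Base *self);

struct Base {
    const method_fn *vmt;   /* VMT pointer, must be set before any method call */
    unsigned i;
};

/* A stand-in method so the table has something to point at. */
void base_mBase(struct Base *self) { self->i += 1; }

const method_fn base_vmt[] = { base_mBase };

/* self == NULL models a call through the class name: allocate, then
   initialize. A non-NULL self models a call through an object or a
   pointer: initialize in place. Returns the object's address, or NULL
   if the allocation fails. */
struct Base *base_create(struct Base *self)
{
    if (self == NULL) {
        self = malloc(sizeof *self);
        if (self == NULL) return NULL;
    }
    self->vmt = base_vmt;   /* like mov( &tBaseClass._VMT_, this._pVMT_ ) */
    self->i = 0;            /* other class initialization */
    return self;
}
```

As in the HLA version, the caller is the one who must keep the returned pointer; base_create(NULL) hands back a new object but stores its address nowhere else.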
10.10.2 Constructors and Inheritance

Constructors for derived (child) classes that inherit fields from a base class represent a special case. Each class must have its own constructor but needs the ability to call the base class constructor. This section explains the reasons for this and how to do this.

A derived class inherits the Create procedure from its base class. However, you must override this procedure in a derived class because the derived class probably requires more storage than the base class and, therefore, you will probably need to use a different call to malloc to allocate storage for a dynamic object. Hence, it is very unusual for a derived class not to override the definition of the Create procedure.

However, overriding a base class’ Create procedure has problems of its own. When you override the base class’ Create procedure, you take the full responsibility of initializing the (entire) object, including all the initialization required by the base class. At the very least, this involves putting duplicate code in the overridden procedure to handle the initialization usually done by the base class constructor. In addition to making your program larger (by duplicating code already present in the base class constructor), this also violates information hiding principles since the derived class must be aware of all the fields in the base class (including those that are logically private to the base class). What we need here is the ability to call a base class’ constructor from within the derived class’ constructor and let that call do the lower-level initialization of the base class’ fields. Fortunately, this is an easy thing to do in HLA.

Consider the following class declarations (which do things the hard way):
    type
        tBase: class
            var
                i:uns32;
                j:int32;

            procedure Create(); returns( "esi" );
        endclass;

        tDerived: class inherits( tBase );
            var
                r: real64;

            override procedure Create(); returns( "esi" );
        endclass;

    procedure tBase.Create; @nodisplay;
    begin Create;

        if( esi = 0 ) then

            push( eax );
            mov( malloc( @size( tBase )), esi );
            pop( eax );

        endif;
        mov( &tBase._VMT_, this._pVMT_ );
        mov( 0, this.i );
        mov( -1, this.j );

    end Create;

    procedure tDerived.Create; @nodisplay;
    begin Create;

        if( esi = 0 ) then

            push( eax );
            mov( malloc( @size( tDerived )), esi );
            pop( eax );

        endif;

        // Initialize the VMT pointer for this object:

        mov( &tDerived._VMT_, this._pVMT_ );

        // Initialize the "r" field of this particular object:

        fldz();
        fstp( this.r );

        // Duplicate the initialization required by tBase.Create:

        mov( 0, this.i );
        mov( -1, this.j );

    end Create;
Let’s take a closer look at the tDerived.Create procedure above. Like a conventional constructor, it begins by checking ESI and allocates storage for a new object if ESI contains NULL. Note that the size of a
tDerived object includes the size required by the inherited fields, so this properly allocates the necessary storage for all fields in a tDerived object. Next, the tDerived.Create procedure initializes the VMT pointer field of the object. Remember, each class has its own VMT and, specifically, derived classes do not use the VMT of their base class. Therefore, this constructor must initialize the _pVMT_ field with the address of the tDerived VMT. After initializing the VMT pointer, the tDerived constructor initializes the value of the r field to 0.0 (remember, FLDZ loads zero onto the FPU stack). This concludes the tDerived-specific initialization. The remaining instructions in tDerived.Create are the problem. These statements duplicate some of the code appearing in the tBase.Create procedure. The problem with code duplication becomes really apparent when you decide to modify the initial values of these fields; if you’ve duplicated the initialization code in derived classes, you will need to change the initialization code in more than one Create procedure. More often than not, this results in defects in the derived class Create procedures, especially if those derived classes appear in different source files than the base class. Another problem with burying base class initialization in derived class constructors is the violation of the information hiding principle. Some fields of the base class may be logically private. Although HLA does not explicitly support the concept of public and private fields in a class (as, say, C++ does), well-disciplined programmers will still partition the fields as private or public and then only use the private fields in class routines belonging to that class. Initializing these private fields in derived classes is not acceptable to such programmers. Doing so will make it very difficult to change the definition and implementation of some base class at a later date. 
Fortunately, HLA provides an easy mechanism for calling the inherited constructor within a derived class’ constructor. All you have to do is call the base constructor using the classname syntax, e.g., you could call tBase.Create directly from within tDerived.Create. By calling the base class constructor, your derived class constructors can initialize the base class fields without worrying about the exact implementation (or initial values) of the base class.

Unfortunately, there are two types of initialization that every (conventional) constructor does that will affect the way you call a base class constructor: all conventional constructors allocate memory for the class if ESI contains zero and all conventional constructors initialize the VMT pointer. Fortunately, it is very easy to deal with these two problems.

The memory required by an object of some base class is usually less than the memory required for an object of a class you derive from that base class (because the derived classes usually add more fields). Therefore, you cannot allow the base class constructor to allocate the storage when you call it from inside the derived class’ constructor. This problem is easily solved by checking ESI within the derived class constructor and allocating any necessary storage for the object before calling the base class constructor.

The second problem is the initialization of the VMT pointer. When you call the base class’ constructor, it will initialize the VMT pointer with the address of the base class’ virtual method table. A derived class object’s _pVMT_ field, however, must point at the virtual method table for the derived class.
Calling the base class constructor will always initialize the _pVMT_ field with the wrong pointer; to properly initialize the _pVMT_ field with the appropriate value, the derived class constructor must store the address of the derived class’ virtual method table into the _pVMT_ field after the call to the base class constructor (so that it overwrites the value written by the base class constructor). The tDerived.Create constructor, rewritten to call the tBase.Create constructor, follows:

    procedure tDerived.Create; @nodisplay;
    begin Create;

        if( esi = 0 ) then

            push( eax );
            mov( malloc( @size( tDerived )), esi );
            pop( eax );

        endif;

        // Call the base class constructor to do any initialization
        // needed by the base class. Note that this call must follow
        // the object allocation code above (so ESI will always contain
        // a pointer to an object at this point and tBase.Create will
        // never allocate storage). The call goes through [esi] rather
        // than the bare class name because, as section 10.10 notes, a
        // call of the form tBase.Create() passes NULL in ESI.

        (type tBase [esi]).Create();

        // Initialize the VMT pointer for this object. This code
        // must always follow the call to the base class constructor
        // because the base class constructor also initializes this
        // field and we don’t want the initial value supplied by
        // tBase.Create.

        mov( &tDerived._VMT_, this._pVMT_ );

        // Initialize the "r" field of this particular object:

        fldz();
        fstp( this.r );

    end Create;
This solution solves all the above concerns with derived class constructors.
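The ordering constraints (allocate first, call the base constructor second, overwrite the VMT pointer last) can be modelled in C. This sketch assumes the base object is embedded as the first member of the derived object and uses two tag variables as stand-ins for the two VMTs; all names are hypothetical.

```c
#include <stdlib.h>

struct Base    { const void *vmt; unsigned i; int j; };
struct Derived { struct Base base; double r; };

/* Stand-ins for the two classes' VMTs; only their addresses matter here. */
const char base_vmt_tag = 'B';
const char derived_vmt_tag = 'D';

struct Base *base_create(struct Base *self)
{
    if (self == NULL) {
        self = malloc(sizeof *self);
        if (self == NULL) return NULL;
    }
    self->vmt = &base_vmt_tag;
    self->i = 0;
    self->j = -1;
    return self;
}

struct Derived *derived_create(struct Derived *self)
{
    if (self == NULL) {
        self = malloc(sizeof *self);    /* allocate the larger object first */
        if (self == NULL) return NULL;
    }
    base_create(&self->base);           /* never allocates: the pointer is non-NULL */
    self->base.vmt = &derived_vmt_tag;  /* overwrite the base class's VMT pointer */
    self->r = 0.0;                      /* derived-class initialization */
    return self;
}
```

The derived constructor never mentions i or j, so a change to their initial values in base_create needs no matching change anywhere else.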
10.10.3 Constructor Parameters and Procedure Overloading

All the constructor examples to this point have not had any parameters. However, there is nothing special about constructors that prevents the use of parameters. Constructors are procedures; therefore, you can specify any number and types of parameters you choose. You can use these parameter values to initialize certain fields or control how the constructor initializes the fields. Of course, you may use constructor parameters for any purpose you’d use parameters in any other procedure. In fact, about the only issue you need concern yourself with is the use of parameters whenever you have a derived class. This section deals with those issues.

The first, and probably most important, problem with parameters in derived class constructors actually applies to all overridden procedures, iterators, and methods: the parameter list of an overridden routine must exactly match the parameter list of the corresponding routine in the base class. In fact, HLA doesn’t even give you the chance to violate this rule because OVERRIDE routine prototypes don’t allow parameter list declarations – they automatically inherit the parameter list of the base routine. Therefore, you cannot use a special parameter list in the constructor prototype for one class and a different parameter list for the constructors appearing in base or derived classes. Sometimes it would be nice if this weren’t the case, but there are some sound and logical reasons why HLA does not support this [footnote 7].

Some languages, like C++, support function overloading, letting you specify several different constructors whose parameter list specifies which constructor to use. HLA does not directly support procedure overloading in this manner, but you can use macros to simulate this language feature (see “Simulating Function Overloading with Macros” on page 990). To use this trick with constructors you would create a macro with the name Create.
The actual constructors could have names that describe their differences (e.g., CreateDefault, CreateSetIJ, etc.). The Create macro would parse the actual parameter list to determine which routine to call.
7. Calling virtual methods and iterators would be a real problem since you don’t really know which routine a pointer references. Therefore, you couldn’t know the proper parameter list. While the problems with procedures aren’t quite as drastic, there are some subtle problems that could creep into your code if base or derived classes allowed overridden procedures with different parameter lists.
HLA does not support macro overloading. Therefore, you cannot override a macro in a derived class to call a constructor unique to that derived class. In certain circumstances you can create a small workaround by defining empty procedures in your base class that you intend to override in some derived class (this is similar to an abstract method, see “Abstract Methods” on page 1091). Presumably, you would never call the procedure in the base class (in fact, you would probably want to put an error message in the body of the procedure just in case you accidentally call it). By putting the empty procedure declaration in the base class, the macro that simulates function overloading can refer to that procedure and you can use that in derived classes later on.
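The idea of a Create macro that inspects its argument list and forwards to a descriptively named constructor has a rough counterpart in the C preprocessor. The sketch below selects a function by argument count only; it is an analogy, not a description of how an HLA macro would be written. The struct Obj and the helper macro CREATE_PICK are hypothetical, and CreateBadArgCount is deliberately left undeclared so that a two-argument call does not compile.

```c
struct Obj { int i; int j; };

/* Two constructors whose names describe their differences. */
struct Obj *CreateDefault(struct Obj *self)
{
    self->i = 0;
    self->j = -1;
    return self;
}

struct Obj *CreateSetIJ(struct Obj *self, int i, int j)
{
    self->i = i;
    self->j = j;
    return self;
}

/* CREATE_PICK returns its fourth argument. Placing the candidate names
   after the caller's arguments makes the argument count choose the name.
   The trailing 0 keeps the variadic part non-empty, as C99 requires. */
#define CREATE_PICK(a1, a2, a3, name, ...) name
#define Create(...) \
    CREATE_PICK(__VA_ARGS__, CreateSetIJ, CreateBadArgCount, CreateDefault, 0)(__VA_ARGS__)
```

With these definitions, Create(&obj) expands to CreateDefault(&obj) and Create(&obj, 3, 4) expands to CreateSetIJ(&obj, 3, 4).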
10.11 Destructors

A destructor is a class routine that cleans up an object once a program finishes using that object. As with constructors, HLA does not provide a special syntax for creating destructors nor does HLA automatically call a destructor; unlike constructors, a destructor is usually a method rather than a procedure (since virtual destructors make a lot of sense while virtual constructors do not).

A typical destructor will close any files opened by the object, free the memory allocated during the use of the object, and, finally, free the object itself if it was created dynamically. The destructor also handles any other clean-up chores the object may require before it ceases to exist.

By convention, most HLA programmers name their destructors Destroy. Destructors generally do not have any parameters, so the issue of overloading the parameter list rarely arises. About the only code that most destructors have in common is the code to free the storage associated with the object. The following destructor demonstrates how to do this:

    method tBase.Destroy; @nodisplay;
    begin Destroy;

        push( eax );  // isInHeap uses this

        // Place any other clean up code here.
        // The code to free dynamic objects should always appear last
        // in the destructor.

        /*************/

        // The following code assumes that ESI still contains the address
        // of the object.

        if( isInHeap( esi )) then

            free( esi );

        endif;
        pop( eax );

    end Destroy;
The HLA Standard Library routine isInHeap returns true if its parameter is an address that malloc returned. Therefore, this code automatically frees the storage associated with the object if the program originally allocated storage for the object by calling malloc. Obviously, on return from this method call, ESI will no longer point at a legal object in memory if you allocated it dynamically. Note that this code will not affect the value in ESI nor will it modify the object if the object wasn’t one you’ve previously allocated via a call to malloc.
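A C model of this destructor pattern follows. C has no counterpart to isInHeap, so this sketch assumes the constructor records in a hypothetical on_heap field whether it allocated the object; the other names (Base, base_destroy, base_create, live_objects) are likewise illustrative. The destructor is reached through slot 0 of the table, which models a virtual Destroy method.

```c
#include <stdlib.h>

struct Base;
typedef void (*destroy_fn)(struct Base *self);

struct Base {
    const destroy_fn *vmt;  /* slot 0 holds Destroy in this sketch */
    int on_heap;            /* hypothetical stand-in for isInHeap() */
    int *buffer;            /* a resource the object owns */
};

int live_objects = 0;       /* constructed but not yet destroyed, for illustration */

void base_destroy(struct Base *self)
{
    /* Class-specific clean-up comes first... */
    free(self->buffer);
    self->buffer = NULL;
    live_objects--;

    /* ...and freeing the object itself always comes last. */
    if (self->on_heap) free(self);
}

const destroy_fn base_vmt[] = { base_destroy };

struct Base *base_create(struct Base *self)
{
    int allocated = (self == NULL);
    if (allocated) {
        self = malloc(sizeof *self);
        if (self == NULL) return NULL;
    }
    self->vmt = base_vmt;
    self->on_heap = allocated;
    self->buffer = malloc(16 * sizeof *self->buffer);
    live_objects++;
    return self;
}
```

After the destructor runs on a heap object, the caller's pointer is stale, just as ESI no longer points at a legal object in the HLA version; a pre-allocated object is cleaned up but its storage is left alone.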
10.12 HLA’s “_initialize_” and “_finalize_” Strings

Although HLA does not automatically call constructors and destructors associated with your classes, HLA does provide a mechanism whereby you can cause these calls to happen automatically: by using the _initialize_ and _finalize_ compile-time string variables (i.e., VAL constants) that HLA automatically declares in every procedure.

Whenever you write a procedure, iterator, or method, HLA automatically declares several local symbols in that routine. Two such symbols are _initialize_ and _finalize_. HLA declares these symbols as follows:

    val
        _initialize_: string := "";
        _finalize_:   string := "";
HLA emits the _initialize_ string as text at the very beginning of the routine’s body, i.e., immediately after the routine’s BEGIN clause[8]. Similarly, HLA emits the _finalize_ string at the very end of the routine’s body, just before the END clause. This is comparable to the following:

    procedure SomeProc;
        << declarations >>
    begin SomeProc;

        @text( _initialize_ );

        << procedure body >>

        @text( _finalize_ );

    end SomeProc;
Since _initialize_ and _finalize_ initially contain the empty string, these expansions have no effect on the code that HLA generates unless you explicitly modify the value of _initialize_ prior to the BEGIN clause or you modify _finalize_ prior to the END clause of the procedure. So if you modify either of these string objects to contain a machine instruction, HLA will compile that instruction at the beginning or end of the procedure. The following example demonstrates how to use this technique:

    procedure SomeProc;
        ?_initialize_ := "mov( 0, eax );";
        ?_finalize_ := "stdout.put( eax );";
    begin SomeProc;

        // HLA emits "mov( 0, eax );" here in response to the _initialize_
        // string constant.

        add( 5, eax );

        // HLA emits "stdout.put( eax );" here.

    end SomeProc;
8. If the routine automatically emits code to construct the activation record, HLA emits _initialize_’s text after the code that builds the activation record.

Of course, these examples don’t save you much. It would be easier to type the actual statements at the beginning and end of the procedure than to assign a string containing these statements to the _initialize_ and _finalize_ compile-time variables. However, if we could automate the assignment of some string to these variables, so that you don’t have to explicitly assign them in each procedure, then this feature might be useful. In a moment, you’ll see how we can automate the assignment of values to the _initialize_ and _finalize_ strings. For the time being, consider the case where we load the name of a constructor into the _initialize_ string and we load the name of a destructor into the _finalize_ string. By doing this, the routine will “automatically” call the constructor and destructor for that particular object.

The example above has a minor problem. If we can automate the assignment of some value to _initialize_ or _finalize_, what happens if these variables already contain some value? For example, suppose we have two objects we use in a routine and the first one loads the name of its constructor into the _initialize_ string; what happens when the second object attempts to do the same thing? The solution is simple: don’t directly assign any string to the _initialize_ or _finalize_ compile-time variables; instead, always concatenate your strings to the end of the existing string in these variables. The following is a modification to the above example that demonstrates how to do this:

    procedure SomeProc;
        ?_initialize_ := _initialize_ + "mov( 0, eax );";
        ?_finalize_ := _finalize_ + "stdout.put( eax );";
    begin SomeProc;

        // HLA emits "mov( 0, eax );" here in response to the _initialize_
        // string constant.

        add( 5, eax );

        // HLA emits "stdout.put( eax );" here.

    end SomeProc;
When you assign values to the _initialize_ and _finalize_ strings, HLA almost guarantees that the _initialize_ sequence will execute upon entry into the routine. Sadly, the same is not true for the _finalize_ string upon exit. HLA simply emits the code for the _finalize_ string at the end of the routine, immediately before the code that cleans up the activation record and returns. Unfortunately, “falling off the end of the routine” is not the only way that one could return from that routine. One could explicitly return from somewhere in the middle of the code by executing a RET instruction. Since HLA only emits the _finalize_ string at the very end of the routine, returning from that routine in this manner bypasses the _finalize_ code. Unfortunately, other than manually emitting the _finalize_ code, there is nothing you can do about this[9]. Fortunately, this mechanism for exiting a routine is completely under your control; if you never exit a routine except by “falling off the end” then you won’t have to worry about this problem (note that you can use the EXIT control structure to transfer control to the end of a routine if you really want to return from that routine from somewhere in the middle of the code).

Another way to prematurely exit a routine, one which, unfortunately, you have no control over, is by raising an exception. Your routine could call some other routine (e.g., a standard library routine) that raises an exception and then transfers control immediately to whoever called your routine. Fortunately, you can easily trap and handle exceptions by putting a TRY..ENDTRY block in your procedure. Here is an example that demonstrates this:

    procedure SomeProc;
        << declarations that modify _finalize_ >>
    begin SomeProc;

        << HLA emits the code for the _initialize_ string here >>

        try     // Catch any exceptions that occur:

            << procedure body >>

          anyexception

            push( eax );            // Save the exception #.
            @text( _finalize_ );    // Execute the _finalize_ code here.
            pop( eax );             // Restore the exception #.
            raise( eax );           // Reraise the exception.

        endtry;

        // HLA automatically emits the _finalize_ code here.

    end SomeProc;

9. Note that you can manually emit the _finalize_ code using the statement “@text( _finalize_ );”.
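The EXIT alternative mentioned earlier can be sketched as follows. This is an illustrative fragment under the assumption stated in the text, namely that EXIT transfers control to the end of the routine so the _finalize_ text still executes; the test on EBX is a hypothetical early-out condition.

```hla
procedure SomeProc;
    ?_finalize_ := _finalize_ + "stdout.put( eax );";
begin SomeProc;

    mov( 0, eax );

    // Hypothetical early-out condition. A RET instruction here would
    // skip the _finalize_ text; EXIT goes to the end of the routine.

    if( ebx = 0 ) then

        exit SomeProc;

    endif;

    add( 5, eax );

    // HLA emits "stdout.put( eax );" here.

end SomeProc;
```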
Although the code above handles some problems that exist with _finalize_, by no means does it handle every possible case. Always be on the lookout for ways your program could inadvertently exit a routine without executing the code found in the _finalize_ string. You should explicitly expand _finalize_ if you encounter such a situation.

There is one important place you can get into trouble with respect to exceptions: within the code the routine emits for the _initialize_ string. If you modify the _initialize_ string so that it contains a constructor call and the execution of that constructor raises an exception, this will probably force an exit from that routine without executing the corresponding _finalize_ code. You could bury the TRY..ENDTRY statement directly into the _initialize_ and _finalize_ strings, but this approach has several problems, not the least of which is the fact that one of the first constructors you call might raise an exception that transfers control to the exception handler that calls the destructors for all objects in that routine (including those objects whose constructors you have yet to call). Although no single solution that handles all problems exists, probably the best approach is to put a TRY..ENDTRY block around each constructor call if it is possible for that constructor to raise some exception that is possible to handle (i.e., doesn’t require the immediate termination of the program).

Thus far this discussion of _initialize_ and _finalize_ has failed to address one important point: why use this feature to implement the “automatic” calling of constructors and destructors, since it apparently involves more work than simply calling the constructors and destructors directly? Clearly there must be a way to automate the assignment of the _initialize_ and _finalize_ strings or this section wouldn’t exist. The way to accomplish this is by using a macro to define the class type.
So now it’s time to take a look at another HLA feature that makes it possible to automate this activity: the FORWARD keyword. You’ve seen how to use the FORWARD reserved word to create procedure and iterator prototypes (see “Forward Procedures” on page 567); it turns out that you can declare forward CONST, VAL, TYPE, and variable declarations as well. The syntax for such declarations takes the following form:

    ForwardSymbolName: forward( undefinedID );

This declaration is completely equivalent to the following:

    ?undefinedID: text := "ForwardSymbolName";
Especially note that this expansion does not actually define the symbol ForwardSymbolName. It just converts this symbol to a string and assigns this string to the specified TEXT object (undefinedID in this example).

Now you’re probably wondering how something like the above is equivalent to a forward declaration. The truth is, it isn’t. However, FORWARD declarations let you create macros that simulate type names by allowing you to defer the actual declaration of an object’s type until some later point in the code. Consider the following example:

    type
        myClass: class
            var
                i:int32;

            procedure Create; returns( "esi" );
            procedure Destroy;
        endclass;

    #macro _myClass: varID;
        forward( varID );
        ?_initialize_ := _initialize_ + @string:varID + ".Create(); ";
        ?_finalize_ := _finalize_ + @string:varID + ".Destroy(); ";
        varID: myClass
    #endmacro;
Note, and this is very important, that a semicolon does not follow the “varID: myClass” declaration at the end of this macro. You’ll find out why this semicolon is missing in a little bit.

If you have the class and macro declarations above in your program, you can now declare variables of type _myClass that automatically invoke the constructor and destructor upon entry and exit of the routine containing the variable declarations. To see how, take a look at the following procedure shell:

    procedure HasmyClassObject;
    var
        mco: _myClass;
    begin HasmyClassObject;

        << procedure body >>

    end HasmyClassObject;
Since _myClass is a macro, the procedure above expands to the following text during compilation:

    procedure HasmyClassObject;
    var
        mco:                    // Expansion of the _myClass macro:
            forward( _0103_ );  // _0103_ symbol is an HLA-supplied text symbol
                                // that expands to "mco".

            ?_initialize_ := _initialize_ + "mco" + ".Create(); ";
            ?_finalize_ := _finalize_ + "mco" + ".Destroy(); ";
            mco: myClass;

    begin HasmyClassObject;

        mco.Create();       // Expansion of the _initialize_ string.

        << procedure body >>

        mco.Destroy();      // Expansion of the _finalize_ string.

    end HasmyClassObject;
You might notice that a semicolon appears after the “mco: myClass” declaration in the example above. This semicolon is not actually a part of the macro; instead, it is the semicolon that follows the “mco: _myClass;” declaration in the original code.

If you want to create an array of objects, you could legally declare that array as follows:

    var
        mcoArray: _myClass[10];

Because the last statement in the _myClass macro doesn’t end with a semicolon, the declaration above will expand to something like the following (almost correct) code:

    mcoArray:                   // Expansion of the _myClass macro:
        forward( _0103_ );      // _0103_ symbol is an HLA-supplied text symbol
                                // that expands to "mcoArray".

        ?_initialize_ := _initialize_ + "mcoArray" + ".Create(); ";
        ?_finalize_ := _finalize_ + "mcoArray" + ".Destroy(); ";
        mcoArray: myClass[10];
The only problem with this expansion is that it only calls the constructor for the first object of the array. There are several ways to solve this problem; one is to append a macro name to the end of _initialize_ and _finalize_ rather than the constructor name. That macro would check the object’s name (mcoArray in this example) to determine if it is an array. If so, that macro could expand to a loop that calls the constructor for each element of the array (the implementation appears as a programming project at the end of this chapter). Another solution to this problem is to use a macro parameter to specify the dimensions for arrays of myClass. This scheme is easier to implement than the one above, but it does have the drawback of requiring a different syntax for declaring object arrays (you have to use parentheses around the array dimension rather than square brackets).

The FORWARD directive is quite powerful and lets you achieve all kinds of tricks. However, there are a few problems of which you should be aware. First, since HLA emits the _initialize_ and _finalize_ code transparently, you can be easily confused if there are any errors in the code appearing within these strings. If you start getting error messages associated with the BEGIN or END statements in a routine, you might want to take a look at the _initialize_ and _finalize_ strings within that routine. The best defense here is to always append very simple statements to these strings so that you reduce the likelihood of an error.

Fundamentally, HLA doesn’t support automatic constructor and destructor calls. This section has presented several tricks to attempt to automate the calls to these routines. However, the automation isn’t perfect and, indeed, the aforementioned problems with the _finalize_ strings limit the applicability of this approach. The mechanism this section presents is probably fine for simple classes and simple programs. However, one piece of advice is probably worth following: if your code is complex or correctness is critical, it’s probably a good idea to call the constructors and destructors explicitly.
10.13 Abstract Methods

An abstract base class is one that exists solely to supply a set of common fields to its derived classes. You never declare variables whose type is an abstract base class; you always use one of the derived classes. The purpose of an abstract base class is to provide a template for creating other classes, nothing more. As it turns out, the only difference in syntax between a standard base class and an abstract base class is the presence of at least one abstract method declaration. An abstract method is a special method that does not have an actual implementation in the abstract base class. Any attempt to call that method will raise an exception. If you’re wondering what possible good an abstract method could be, well, keep on reading...

Suppose you want to create a set of classes to hold numeric values. One class could represent unsigned integers, another class could represent signed integers, a third could implement BCD values, and a fourth could support real64 values. While you could create four separate classes that function independently of one another, doing so passes up an opportunity to make this set of classes more convenient to use. To understand why, consider the following possible class declarations:

    type
        uint: class
            var
                TheValue: dword;

            method put;
            << other methods >>
        endclass;

        sint: class
            var
                TheValue: dword;

            method put;
            << other methods >>
        endclass;

        r64: class
            var
                TheValue: real64;

            method put;
            << other methods >>
        endclass;
The implementation of these classes is not unreasonable. They have fields for the data and a put method (which, presumably, writes the data to the standard output device); presumably they also have other methods and procedures to implement various operations on the data. There are, however, two problems with these classes, one minor and one major, both occurring because these classes do not inherit any fields from a common base class.

The first problem, which is relatively minor, is that you have to repeat the declaration of several common fields in these classes. For example, the put method declaration appears in each of these classes[10]. This duplication of effort results in a harder-to-maintain program because nothing encourages you to use a common name for a common function; it’s easy to use a different name in each of the classes.

A bigger problem with this approach is that it is not generic. That is, you can’t create a generic pointer to a “numeric” object and perform operations like addition, subtraction, and output on that value (regardless of the underlying numeric representation).

We can easily solve these two problems by turning the previous class declarations into a set of derived classes. The following code demonstrates an easy way to do this:

    type
        numeric: class
            method put;
            << other common methods >>
        endclass;

        uint: class inherits( numeric )
            var
                TheValue: dword;

            override method put;
            << other methods >>
        endclass;

        sint: class inherits( numeric )
            var
                TheValue: dword;

            override method put;
            << other methods >>
        endclass;

        r64: class inherits( numeric )
            var
                TheValue: real64;

            override method put;
            << other methods >>
        endclass;
10. Note, by the way, that TheValue is not a common field because this field has a different type in the r64 class.

This scheme solves both problems. First, by inheriting the put method from numeric, this code encourages the derived classes to always use the name put, thereby making the program easier to maintain. Second, because this example uses derived classes, it’s possible to create a pointer to the numeric type and load this pointer with the address of a uint, sint, or r64 object. That pointer can invoke the methods found in the numeric class to do functions like addition, subtraction, or numeric output. Therefore, the application that uses this pointer doesn’t need to know the exact data type; it only deals with numeric values in a generic fashion.

One problem with this scheme is that it’s possible to declare and use variables of type numeric. Unfortunately, such numeric variables don’t have the ability to represent any type of number (notice that the data storage for the numeric fields actually appears in the derived classes). Worse, because you’ve declared the put method in the numeric class, you’ve actually got to write some code to implement that method even though one should never really call it; the actual implementation should only occur in the derived classes. While you could write a dummy method that prints an error message (or, better yet, raises an exception), there shouldn’t be any need to write “dummy” procedures like this. Fortunately, there is no reason to do so – if you use abstract methods.

The ABSTRACT keyword, when it follows a method declaration, tells HLA that you are not going to provide an implementation of the method for this class. Instead, it is the responsibility of all derived classes to provide a concrete implementation for the abstract method. HLA will raise an exception if you attempt to call an abstract method directly. The following is the modification to the numeric class to convert put to an abstract method:

    type
        numeric: class
            method put; abstract;
            << other common methods >>
        endclass;
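Returning to the generic-pointer idea, the following fragment is a minimal sketch of how a pointer to numeric might be used with a uint object. The variable names u and pNum are hypothetical, the VMT is initialized by hand here only because no constructor for these classes appears in the text, and the fragment assumes uint.put has been implemented elsewhere.

```hla
// Sketch only: u and pNum are illustrative names.

static
    vmt( numeric );             // Emit the virtual method tables.
    vmt( uint );

    u:      uint;
    pNum:   pointer to numeric;
        .
        .
        .
    mov( &uint._VMT_, u._pVMT_ );   // Normally a constructor does this.
    mov( 12345, u.TheValue );

    mov( &u, pNum );    // A base class pointer now refers to a uint object.
    pNum.put();         // Dispatches through the VMT, so uint.put runs.
```

The calling code never mentions uint at the point of the call; storing the address of a sint or r64 object in pNum would select that class’ put method instead.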
An abstract base class is a class that has at least one abstract method. Note that you don’t have to make all methods abstract in an abstract base class; it is perfectly legal to declare some standard methods (and, of course, provide their implementation) within the abstract base class.

Abstract method declarations provide a mechanism by which a base class enforces the methods that the derived classes must implement. In theory, all derived classes must provide concrete implementations of all abstract methods or those derived classes are themselves abstract base classes. In practice, it’s possible to bend the rules a little and use abstract methods for a slightly different purpose.

A little earlier, you read that one should never create variables whose type is an abstract base class. The reason is that if you attempt to execute an abstract method, the program immediately raises an exception to complain about this illegal method call. In practice, you actually can declare variables of an abstract base type and get away with this as long as you don’t call any abstract methods. We can use this fact to provide a better form of method overloading (that is, providing several different routines with the same name but different parameter lists). Remember, the standard trick in HLA to overload a routine is to write several different routines and then use a macro to parse the parameter list and determine which actual routine to call (see “Simulating Function Overloading with Macros” on page 990). The problem with this technique is that you cannot override a macro definition in a class, so if you want to use a macro to overload a routine’s syntax, then that macro must appear in the base class. Unfortunately, you may not need a routine with a specific parameter list in the base class (for that matter, you may only need that particular version of the routine in a single derived class), so implementing that routine in the base class and in all the other derived classes is a waste of effort.
This isn’t a big problem. Just go ahead and define the abstract method in the base class and only implement it in the derived class that needs that particular method. As long as you don’t call that method in the base class or in the other derived classes that don’t override the method, everything will work fine. One problem with using abstract methods to support overloading is that this trick does not apply to procedures - only methods and iterators. However, you can achieve the same effect with procedures by declaring a (non-abstract) procedure in the base class and overriding that procedure only in the class that actually uses it. You will have to provide an implementation of the procedure in the base class, but that is a minor issue (the procedure’s body, by the way, should simply raise an exception to indicate that you should have never called it). An example of routine overloading in a class appears in this chapter’s sample program.
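The class declarations below sketch this arrangement. All of the names (base, plain, special, put, putAt) are hypothetical; the point is only that putAt is declared abstract in the base class and receives a concrete implementation in the single derived class that needs it.

```hla
// Sketch only: every identifier here is illustrative.

type
    base: class
        method put;   abstract;                     // Every derived class implements this.
        method putAt( x:uns16; y:uns16 ); abstract; // Only "special" implements this.
    endclass;

    plain: class inherits( base )
        override method put;        // putAt stays abstract in this class.
    endclass;

    special: class inherits( base )
        override method put;
        override method putAt;      // The one concrete putAt.
    endclass;
```

Calling putAt through an object of type plain would raise an exception at run time, so the overloading macro in the base class should only ever route such calls to objects of type special.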
10.14 Run-time Type Information (RTTI)

When working with an object variable (as opposed to a pointer to an object), the type of that object is obvious: it’s the variable’s declared type. Therefore, at both compile-time and run-time the program trivially knows the type of the object. When working with pointers to objects you cannot, in the general case, determine the type of an object a pointer references. However, at run-time it is possible to determine the object’s actual type. This section discusses how to detect the underlying object’s type and how to use this information.

If you have a pointer to an object and that pointer’s type is some base class, at run-time the pointer could point at an object of the base class or any derived type. At compile-time it is not possible to determine the exact type of an object at any instant. To see why, consider the following short example:

    ReturnSomeObject();         // Returns a pointer to some class in ESI.
    mov( esi, ptrToObject );
The routine ReturnSomeObject returns a pointer to an object in ESI. This could be the address of some base class object or a derived class object. At compile-time there is no way for the program to know what type of object this function returns. For example, ReturnSomeObject could ask the user what value to return, so the exact type could not be determined until the program actually runs and the user makes a selection.

In a perfectly designed program, there probably is no need to know a generic object’s actual type. After all, the whole purpose of object-oriented programming and inheritance is to produce general programs that work with lots of different objects without having to make substantial changes to the program. In the real world, however, programs may not have a perfect design and sometimes it’s nice to know the exact object type a pointer references. Run-time type information, or RTTI, gives you the capability of determining an object’s type at run-time, even if you are referencing that object using a pointer to some base class of that object.

Perhaps the most fundamental RTTI operation you need is the ability to ask if a pointer contains the address of some specific object type. Many object-oriented languages (e.g., Delphi) provide an IS operator that provides this functionality. IS is a boolean operator that returns true if its left operand (a pointer) points at an object whose type matches the right operand (which must be a type identifier). The typical syntax is generally the following:

    ObjectPointerOrVar is ClassType

This operator returns true if the variable is of the specified class; it returns false otherwise. Here is a typical use of this operator (in the Delphi language):

    if( ptrToNumeric is uint ) then begin
        .
        .
        .
    end;
It’s actually quite simple to implement this functionality in HLA. As you may recall, each class is given its own virtual method table. Whenever you create an object, you must initialize the pointer to the VMT with the address of that class’ VMT. Therefore, the VMT pointer field of every object of a given class type contains the same pointer value, and this pointer value is different from the VMT pointer field of all other classes. We can use this fact to see if an object is some specific type. The following code demonstrates how to implement the Delphi statement above in HLA:

    mov( ptrToNumeric, esi );
    if( (type uint [esi])._pVMT_ = &uint._VMT_ ) then
        .
        .
        .
    endif;

This IF statement simply compares the object’s _pVMT_ field (the pointer to the VMT) against the address of the desired class’ VMT. If they are equal, then the ptrToNumeric variable points at an object of type uint.

Within the body of a class method or iterator, there is a slightly easier way to see if the object is a certain class. Remember, upon entry into a method or an iterator, the EDI register contains the address of the virtual method table. Therefore, assuming you haven’t modified EDI’s value, you can easily test to see if THIS (ESI) is a specific class type using an IF statement like the following:

    if( EDI = &uint._VMT_ ) then
        .
        .
        .
    endif;
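The same comparison extends naturally to a multi-way test. The fragment below is a sketch that assumes the numeric, uint, sint, and r64 classes from the section on abstract methods and a ptrToNumeric variable holding the address of one such object; the strings it prints are purely illustrative.

```hla
// Sketch only: loads the object's VMT pointer once, then compares it
// against the VMT address of each candidate class.

mov( ptrToNumeric, esi );
mov( (type numeric [esi])._pVMT_, eax );

if( eax = &uint._VMT_ ) then

    stdout.put( "object is a uint" nl );

elseif( eax = &sint._VMT_ ) then

    stdout.put( "object is a sint" nl );

elseif( eax = &r64._VMT_ ) then

    stdout.put( "object is an r64" nl );

else

    stdout.put( "object is some other numeric type" nl );

endif;
```

Note that this is an exact-type test: an object of a class derived from uint has a different VMT address and therefore does not match the uint branch.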
10.15 Calling Base Class Methods

In the section on constructors you saw that it is possible to call an ancestor class’ procedure within the derived class’ overridden procedure. To do this, all you needed to do was to invoke the procedure using the call “classname.procedureName( parameters );”. On occasion you may want to do this same operation with a class’ methods as well as its procedures (that is, have an overridden method call the corresponding base class method in order to do some computation you’d rather not repeat in the derived class’ method). Unfortunately, HLA does not let you directly call methods as it does procedures. You will need to use an indirect mechanism to achieve this; specifically, you will have to call the function using the address in the base class’ virtual method table. This section describes how to do this.

Whenever your program calls a method it does so indirectly, using the address found in the virtual method table for the method’s class. The virtual method table is nothing more than an array of 32-bit pointers with each entry containing the address of one of that class’ methods. So to call a method, all you need is the index into this array (or, more properly, the offset into the array) of the address of the method you wish to call. The HLA compile-time function @offset comes to the rescue: it returns the offset into the virtual method table of the method whose name you supply as a parameter. Combined with the CALL instruction, you can easily call any method associated with a class. Here’s an example of how you would do this:

    type
        myCls: class
            .
            .
            .
            method m;
            .
            .
            .
        endclass;
        .
        .
        .
        call( myCls._VMT_[ @offset( myCls.m )]);
The CALL instruction above calls the method whose address appears at the specified entry in the virtual method table for myCls. The @offset function call returns the offset (i.e., index times four) of the address of myCls.m within the virtual method table. Hence, this code indirectly calls the m method by using the virtual method table entry for m.

There is one major drawback to calling methods using this scheme: you don’t get to use the high level syntax for procedure/method calls. Instead, you must use the low-level CALL instruction. In the example above, this isn’t much of an issue because the m method doesn’t have any parameters. If it did have parameters, you would have to manually push those parameters onto the stack yourself (see “Passing Parameters on the Stack” on page 822). Fortunately, you’ll rarely need to call ancestor class methods from a derived class, so this won’t be much of an issue in real-world programs.
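As a rough illustration of the parameter case, the sketch below shows an overridden method that pushes its single dword parameter by hand and then calls the base class version through the base class’ VMT. The class names base and derived and the parameter a are hypothetical, and the fragment assumes the default calling convention in which the callee removes its parameters from the stack.

```hla
// Sketch only: base, derived, and a are illustrative names.

type
    base: class
        method m( a:dword );
    endclass;

    derived: class inherits( base )
        override method m;
    endclass;
        .
        .
        .
method derived.m( a:dword ); @nodisplay;
begin m;

    // On entry ESI holds THIS and EDI holds the object's VMT pointer.
    // Neither register is modified before the call, so base.m sees
    // the same object.

    push( a );                                  // Pass the parameter by hand.
    call( base._VMT_[ @offset( base.m ) ] );    // Run base.m, not derived.m.

    // << work specific to the derived class goes here >>

end m;
```

With more than one parameter, the pushes would have to appear in the order HLA itself uses for the method’s declared calling convention.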
10.16 Sample Program

This chapter’s sample program will present what is probably the epitome of object-oriented programs: a simple “drawing” program that uses objects to represent shapes to draw on the display. While limited to a demonstration program, this program does demonstrate important object-oriented concepts in assembly language.

This is an unusual drawing program insofar as it draws shapes using ASCII characters. While the shapes it draws are very rough (compared to a graphics-based drawing program), the output of this program could be quite useful for creating rudimentary diagrams to include as comments in your HLA (or other language) programs. This sample program does not provide a “user interface” for drawing images (something you would need to effectively use this program) because the user interface represents a lot of code that won’t improve your appreciation of object-oriented programming (not to mention, this book is long enough already). Providing a mouse-based user interface to this program is left as an exercise to the interested reader.

This program consists of three source files: the class definitions in a header file, the implementation of the class’ procedures and methods in an HLA source file, and a main program that demonstrates a simple use of the class’ objects. The following listings are for these three files.
type

    // Generic shape class:

    shape: class

        const
            maxX: uns16 := 80;
            maxY: uns16 := 25;

        var
            x:          uns16;
            y:          uns16;
            width:      uns16;
            height:     uns16;
            fillShape:  boolean;

        procedure create; returns( "esi" ); external;

        method draw; abstract;
        method fill( f:boolean ); external;
        method moveTo( x:uns16; y:uns16 ); external;
        method resize( width: uns16; height: uns16 ); external;

    endclass;

    // Class for a rectangle shape
    //
    //  +------+
    //  |      |
    //  +------+

    rect: class inherits( shape )

        override procedure create; external;
        override method draw; external;

    endclass;

    // Class for a rounded rectangle shape
    //
    //   --------
    //  /        \
    //  |        |
    //  \        /
    //   --------

    roundrect: class inherits( shape )

        override procedure create; external;
        override method draw; external;

    endclass;

    // Class for a diamond shape
    //
    //    /\
    //   /  \
    //   \  /
    //    \/

    diamond: class inherits( shape )

        override procedure create; external;
        override method resize; external;
        override method draw; external;

    endclass;

Program 10.1    Shapes.hhf - The Shape Class Header Files
unit Shapes;
#includeonce( "stdlib.hhf" )
#includeonce( "shapes.hhf" )

// Emit the virtual method tables for the classes:

static
    vmt( shape );
    vmt( rect );
    vmt( roundrect );
    vmt( diamond );

/*********************************************************************/
// Generic shape methods and procedures

// Constructor for the shape class.
//
// Note: this should really be an abstract procedure, but since
// HLA doesn't support abstract procedures we'll fake it by
// raising an exception if somebody tries to call this proc.

procedure shape.create; @nodisplay; @noframe;
begin create;

    // This should really be an abstract procedure,
    // but such things don't exist, so we will fake it.

    raise( ex.ExecutedAbstract );

end create;
// Generic shape.fill method. // This is an accessor function that sets the "fill" field // to the value of the parameter. method shape.fill( f:boolean ); @nodisplay; begin fill; push( eax ); mov( f, al ); mov( al, this.fillShape ); pop( eax ); end fill; // // // //
Generic shape.moveTo method. Checks the coordinates passed as a parameter and then sets the (X,Y) coordinates of the underlying shape object to these values.
method shape.moveTo( x:uns16; y:uns16 ); @nodisplay; begin moveTo; push( eax ); push( ebx );
mov( x, ax ); assert( ax < shape.maxX ); mov( ax, this.x ); mov( y, ax );
Page 1098
© 2001, By Randall Hyde
Beta Draft - Do not distribute
Classes and Objects assert( ax < shape.maxY ); mov( ax, this.y ); pop( ebx ); pop( eax ); end moveTo;
// Generic shape.resize method.
// Sets the width and height fields of the underlying object
// to the values passed as parameters.
//
// Note: Ignores resize request if the size is less than 2x2.

method shape.resize( width:uns16; height:uns16 ); @nodisplay;
begin resize;

    push( eax );
    assert( width 2 ) then

            mov( width, ax );
            mov( ax, this.width );

            mov( height, ax );
            mov( ax, this.height );

        endif;

    endif;
    pop( eax );

end resize;
/*******************/
/*                 */
/* rect's methods: */
/*                 */
/*******************/

// Constructor for the rectangle class:

procedure rect.create; @nodisplay; @noframe;
begin create;

    push( eax );

    // If called as rect.create, then allocate a new object
    // on the heap and return the pointer in ESI.

    if( esi = NULL ) then

        mov( malloc( @size( rect ) ), esi );

    endif;

    // Initialize the pointer to the VMT:

    mov( &rect._VMT_, this._pVMT_ );

    // Initialize fields to create a filled 2x2 square at (0,0).

    sub( eax, eax );
    mov( ax, this.x );
    mov( ax, this.y );
    inc( eax );
    mov( al, this.fillShape );      // Sets fillShape to true.
    inc( eax );
    mov( ax, this.height );
    mov( ax, this.width );

    pop( eax );
    ret();

end create;
// Here's the method to draw a text-based square on the display.

method rect.draw; @nodisplay;
static
    horz: str.strvar( shape.maxX );     // Holds "+------...--+"
    spcs: str.strvar( shape.maxX );     // Holds "  ...  " for fills.

begin draw;

    push( eax );
    push( ebx );
    push( ecx );
    push( edx );

    // Initialize the horz and spcs strings to speed up
    // drawing our rectangle.

    movzx( this.width, ebx );
    str.setstr( '-', horz, ebx );
    mov( horz, eax );
    mov( '+', (type char [eax]));
    mov( '+', (type char [eax+ebx-1]));

    // If the fillShape field contains true, then we
    // need to fill in the characters inside the rectangle.
    // If this is false, we don't want to overwrite the
    // text in the center of the rectangle.  The following
    // code initializes spcs to all spaces or the empty string
    // to accomplish this.

    if( this.fillShape ) then

        sub( 2, ebx );
        str.setstr( ' ', spcs, ebx );

    else

        str.cpy( "", spcs );

    endif;

    // Okay, position the cursor and draw
    // our rectangle.

    console.gotoxy( this.y, this.x );
    stdout.puts( horz );                // Draws top horz line.

    // For each row except the top and bottom rows,
    // draw "|" characters on the left and right
    // hand sides and the fill characters (if fillShape
    // is true) inbetween them.

    mov( this.y, cx );
    mov( cx, bx );
    add( this.height, bx );
    inc( cx );
    dec( bx );
    while( cx < bx) do

        console.gotoxy( cx, this.x );
        stdout.putc( '|' );
        stdout.puts( spcs );
        mov( this.x, dx );
        add( this.width, dx );
        dec( dx );
        console.gotoxy( cx, dx );
        stdout.putc( '|' );
        inc( cx );

    endwhile;

    // Draw the bottom horz bar:

    console.gotoxy( cx, this.x );
    stdout.puts( horz );

    pop( edx );
    pop( ecx );
    pop( ebx );
    pop( eax );

end draw;
/************************/
/*                      */
/* roundrect's methods: */
/*                      */
/************************/

// This is the constructor for the roundrect class.
// See the comments in rect.create for details
// (since this is just a clone of that code with
// minor changes here and there).

procedure roundrect.create; @nodisplay; @noframe;
begin create;

    push( eax );
    if( esi = NULL ) then

        mov( malloc( @size( roundrect ) ), esi );

    endif;
    mov( &roundrect._VMT_, this._pVMT_ );

    // Initialize fields to create a filled 2x2 shape at (0,0).

    sub( eax, eax );
    mov( ax, this.x );
    mov( ax, this.y );
    inc( eax );
    mov( al, this.fillShape );      // Sets fillShape to true.
    inc( eax );
    mov( ax, this.height );
    mov( ax, this.width );

    pop( eax );
    ret();

end create;
// Here is the draw method for the roundrect object.
// Note: if the object is less than 5x4 in size,
// this code calls rect.draw to draw a rectangle
// since roundrects smaller than 5x4 don't look good.
//
// Typical roundrect:
//
//   --------
//  /        \
// |          |
//  \        /
//   --------

method roundrect.draw; @nodisplay;
static
    horz: str.strvar( shape.maxX );
    spcs: str.strvar( shape.maxX );

begin draw;

    push( eax );
    push( ebx );
    push( ecx );
    push( edx );

    if
    ( #{
        cmp( this.width, 5 );
        jb true;
        cmp( this.height, 4 );
        jae false;
    }#) then

        // If it's too small to draw an effective
        // roundrect, then draw it as a rectangle.

        call( rect._VMT_[ @offset( rect.draw ) ] );

    else

        // Okay, it's big enough, draw it as a rounded
        // rectangle object.  Begin by initializing the
        // horz string with a set of dashes with spaces
        // at either end.

        movzx( this.width, ebx );
        sub( 4, ebx );
        str.setstr( '-', horz, ebx );

        if( this.fillShape ) then

            add( 2, ebx );
            str.setstr( ' ', spcs, ebx );

        else

            str.cpy( "", spcs );

        endif;

        // Okay, draw the top line.

        mov( this.x, ax );
        add( 2, ax );
        console.gotoxy( this.y, ax );
        stdout.puts( horz );

        // Now draw the second line as "/" and "\"
        // with optional spaces inbetween (if fillShape
        // is true).

        mov( this.y, cx );
        inc( cx );
        console.gotoxy( cx, ax );
        stdout.puts( spcs );
        console.gotoxy( cx, this.x );
        stdout.puts( " /" );
        add( this.width, ax );
        sub( 4, ax );               // Sub 4 because we added two above.
        console.gotoxy( cx, ax );
        stdout.puts( "\ " );

        // Okay, now draw the bottom line:

        mov( this.x, ax );
        add( 2, ax );
        mov( this.y, cx );
        add( this.height, cx );
        dec( cx );
        console.gotoxy( cx, ax );
        stdout.puts( horz );

        // And draw the second from the bottom
        // line as "\" and "/" with optional
        // spaces inbetween (depending on fillShape)

        dec( cx );
        console.gotoxy( cx, this.x );
        stdout.puts( spcs );
        console.gotoxy( cx, this.x );
        stdout.puts( " \" );
        mov( this.x, ax );
        add( this.width, ax );
        sub( 2, ax );               // Column of the closing "/ " pair.
        console.gotoxy( cx, ax );
        stdout.puts( "/ " );

        // Finally, draw all the lines inbetween the
        // top two and bottom two lines.

        mov( this.y, cx );
        mov( this.height, bx );
        add( cx, bx );
        add( 2, cx );
        sub( 2, bx );
        mov( this.x, ax);
        add( this.width, ax );
        dec( ax );
        while( cx < bx) do

            console.gotoxy( cx, this.x );
            stdout.putc( '|' );
            stdout.puts( spcs );
            console.gotoxy( cx, ax );
            stdout.putc( '|' );
            inc( cx );

        endwhile;

    endif;

    pop( edx );
    pop( ecx );
    pop( ebx );
    pop( eax );

end draw;
/*********************/
/*                   */
/* Diamond's methods */
/*                   */
/*********************/

// Constructor for a diamond shape.
// See pertinent comments for the rect constructor
// for more details.

procedure diamond.create; @nodisplay; @noframe;
begin create;

    push( eax );
    if( esi = NULL ) then

        mov( malloc( @size( diamond ) ), esi );

    endif;
    mov( &diamond._VMT_, this._pVMT_ );

    // Initialize fields to create a 2x2 diamond.

    sub( eax, eax );
    mov( ax, this.x );
    mov( ax, this.y );
    inc( eax );
    mov( al, this.fillShape );      // Sets fillShape to true.
    inc( eax );
    mov( ax, this.height );         // Minimum diamond size is 2x2.
    mov( ax, this.width );

    pop( eax );
    ret();

end create;

// We have to override the resize method for diamonds (unlike the
// other objects) because diamond shapes have to be symmetrical.
// That is, the width and the height have to be the same.  This
// code enforces this restriction by setting both parameters to
// the minimum of the width/height parameters and then it calls
// shape.resize to do the dirty work.

method diamond.resize( width:uns16; height:uns16 ); @nodisplay;
begin resize;

    // Diamonds are symmetrical shapes, so the width and
    // height must be the same.  Force that here:

    push( eax );
    mov( width, ax );
    if( ax > height ) then

        mov( height, ax );

    endif;

    // Call the shape.resize method to do the actual work:

    push( eax );    // Pass the minimum value as the width.
    push( eax );    // Also pass the minimum value as the height.
    call( shape._VMT_[ @offset( shape.resize ) ] );
    pop( eax );

end resize;
// Here's the code to draw the diamond.

method diamond.draw; @nodisplay;
var
    startY: uns16;
    endY:   uns16;
    startX: uns16;
    endX:   uns16;

begin draw;

    push( eax );
    push( ebx );
    push( ecx );
    push( edx );

    if
    (#{
        cmp( this.width, 2 );
        jb true;
        cmp( this.height, 2 );
        jae false;
    }#) then

        // Special cases for small diamonds.
        // Resizing prevents most of these from ever appearing.
        // However, if someone pokes around directly in the
        // width and height fields this code will save us:

        cmp( this.width, 1 );
        ja D2x1;
        cmp( this.height, 1 );
        ja D1x2;

        // At this point we must have a 1x1 diamond

        console.gotoxy( this.y, this.x );
        stdout.putc( '+' );
        jmp SmallDiamondDone;

    D2x1:

        // Okay, we have a 2x1 (WxH) diamond here:

        console.gotoxy( this.y, this.x );
        stdout.puts( "<>" );
        jmp SmallDiamondDone;

    D1x2:

        // We have a 1x2 (WxH) diamond here:

        mov( this.y, ax );
        console.gotoxy( ax, this.x );
        stdout.putc( '^' );
        inc( ax );
        console.gotoxy( ax, this.x );
        stdout.putc( 'V' );

    SmallDiamondDone:

    else

        // Okay, we're drawing a reasonable sized diamond.
        // There is still a minor problem.  The best looking
        // diamonds always have a width and height that is an
        // even integer.  We need to do something special if the
        // height or width is odd.
        //
        //  Odd Height
        //      /\
        //     <  >
        //      \/
Odd Width . 0 )

        // Number of characters to process.
        // Accumulate value here.
        // Power of 16 to multiply by.
        // Checks for overflow.
        // Repeat for each char in string.

        // For each character in the string, verify that it is a legal
        // hexadecimal character and merge it in with the current
        // accumulated value if it is.  Print an error message if we
        // come across an illegal character.
        ?len := len - 1;                            // Next available char.
        ?curch := char( @substr( hs, len, 1 ));     // Get the character.

        #if( curch in {'0'..'9'} )                  // See if valid decimal digit.

            // Accumulate result if decimal digit.

            ?dwval := dwval + (uns8( curch ) - uns8( '0' )) * mplier;

        #elseif( curch in {'A'..'F'} )              // See if valid hex digit.

            // Accumulate result if a hexadecimal digit.

            ?dwval := dwval + (uns8( curch ) - uns8( 'A' ) + 10) * mplier;

        // Ignore underscore characters and report an error for anything
        // else we find in the string.

        #elseif( curch <> '_' )

            #error( "Illegal character in 64-bit hexadecimal constant" )
            #print( "Character = '", curch, "' Rest of string: '", hs, "'" )

        #endif

        // If it's not an underscore character, adjust the multiplier value.
        // If we cross a dword boundary, emit the L.O. value as a dword
        // and reset everything for the H.O. dword.

        #if( curch <> '_' )

            // If the current value fits in 32 bits, process this
            // as though it were a dword object.

            #if( mplier < $1000_0000 )

                ?mplier := mplier * 16;

            #elseif( len > 0 )

                // Down here we've just processed the last hex
                // digit that will fit into 32 bits.  So emit the
                // L.O. dword and reset the mplier and dwval constants.

                ?mplier := 1;
                dword dwval;
                ?dwval := 0;

                // If we've been this way before, we've got an
                // overflow.

                #if( didLO )

                    #error( "64-bit overflow in constant" );

                #endif
                ?didLO := true;

            #endif

        #endif

    #endwhile

    // Emit the H.O. dword here.

    dword dwval;

    // If the constant only consumed 32 bits, we've got to emit a zero
    // for the H.O. dword at this point.

    #if( !didLO )

        dword 0;

    #endif

endmacro;
static
    x:qword; @nostorage;
        qword16( $1234_5678_90ab_cdef );
        qword16( 100 );

begin qwordConstType;

    stdout.put( "64-bit value of x = $" );
    stdout.putq( x );
    stdout.newln();

end qwordConstType;

Program 11.1    qword16 Macro to Process 64-bit Hexadecimal Constants
Although it’s a little bit more difficult, you could also write a qword10 macro that lets you specify decimal constants as the macro operand rather than hexadecimal constants. The implementation of qword10 is left as a programming exercise at the end of this volume.
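To make the algorithm in qword16 easier to follow, here is a minimal run-time sketch of the same conversion in Python. It is an illustrative model, not the macro itself: the function name simply mirrors the macro, and the handling of a leading "$" and of lower-case digits is an assumption about what the macro does before its main loop. Like the macro, the sketch scans the string from the last character to the first, skips underscores, accumulates the L.O. dword first, then the H.O. dword, and rejects anything that is not a hexadecimal digit.

```python
HEX_DIGITS = "0123456789ABCDEF"


def qword16(hs):
    """Return (lo_dword, hi_dword) for a hexadecimal string; '_' is ignored."""
    text = hs.upper().lstrip("$")      # assumed normalization step
    lo = hi = 0
    count = 0                          # hex digits consumed so far
    for ch in reversed(text):          # work from the L.O. digit upward
        if ch == "_":
            continue
        if ch not in HEX_DIGITS:
            raise ValueError("Illegal character in 64-bit hexadecimal constant: %r" % ch)
        if count >= 16:
            raise OverflowError("64-bit overflow in constant")
        digit = HEX_DIGITS.index(ch)
        if count < 8:
            lo |= digit << (4 * count)           # first eight digits: L.O. dword
        else:
            hi |= digit << (4 * (count - 8))     # next eight digits: H.O. dword
        count += 1
    return lo, hi
```

For the two constants in Program 11.1, `qword16("$1234_5678_90ab_cdef")` yields the pair `(0x90ABCDEF, 0x12345678)` and `qword16("100")` yields `(0x100, 0)`; the macro emits the two dwords in that order so the value lands in memory in little-endian form.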
11.7 MMX Technology Instructions

The following subsections describe each of the MMX instructions in detail. The organization is as follows:

• Data Transfer Instructions,
• Conversion Instructions,
• Packed Arithmetic Instructions,
• Comparisons,
• Logical Instructions,
• Shift and Rotate Instructions,
• the EMMS Instruction.

These sections describe what these instructions do, not how you would use them. Later sections will provide examples of how you can use several of these instructions.
11.7.1 MMX Data Transfer Instructions

    movd( reg32, mmi );
    movd( mem32, mmi );
    movd( mmi, reg32 );
    movd( mmi, mem32 );

    movq( mem64, mmi );
    movq( mmi, mem64 );
    movq( mmi, mmi );

The MOVD (move double word) instruction copies data between a 32-bit integer register or double word memory location and an MMX register. If the destination is an MMX register, this instruction zero-extends the value to 64 bits while moving it. If the destination is a 32-bit register or memory location, this instruction copies the L.O. 32 bits of the MMX register to the destination.

The MOVQ (move quadword) instruction copies data between two MMX registers or between an MMX register and memory. If either the source or destination operand is a memory object, it must be a qword variable or HLA will complain.
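The two directions of MOVD can be modelled with plain integers. The following is a small sketch, with the function names `movd_to_mmx` and `movd_from_mmx` invented for this illustration; it treats an MMX register as a 64-bit unsigned integer.

```python
MASK32 = 0xFFFF_FFFF


def movd_to_mmx(value32):
    """movd( reg32/mem32, mmi ): the H.O. 32 bits of the MMX register become zero."""
    return value32 & MASK32


def movd_from_mmx(mm):
    """movd( mmi, reg32/mem32 ): only the L.O. 32 bits are copied out."""
    return mm & MASK32
```

Note that the extension is by zeros even when the 32-bit value has its sign bit set, so a negative dword does not become a negative qword.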
11.7.2 MMX Conversion Instructions

    packssdw( mem64, mmi );
    packssdw( mmi, mmi );

    packsswb( mem64, mmi );
    packsswb( mmi, mmi );

    packuswb( mem64, mmi );
    packuswb( mmi, mmi );

    punpckhbw( mem64, mmi );
    punpckhbw( mmi, mmi );

    punpckhdq( mem64, mmi );
    punpckhdq( mmi, mmi );

    punpckhwd( mem64, mmi );
    punpckhwd( mmi, mmi );

    punpcklbw( mem64, mmi );
    punpcklbw( mmi, mmi );

    punpckldq( mem64, mmi );
    punpckldq( mmi, mmi );

    punpcklwd( mem64, mmi );
    punpcklwd( mmi, mmi );
The PACKSSxx instructions pack and saturate signed values. They convert a sequence of larger values to a sequence of smaller values via saturation. The PACKSSDW instruction packs four double words into four words; the instructions with the wb suffix saturate and pack eight signed words into eight bytes.

The PACKSSDW instruction takes the two double words in the source operand and the two double words in the destination operand and converts these to four signed words via saturation. The instruction packs these four words together and stores the result in the destination MMX register. See Figure 11.3 for details.

The PACKSSWB instruction takes the four signed words from the source operand and the four signed words from the destination operand and converts, via signed saturation, these values to eight signed bytes. This instruction leaves the eight bytes in the destination MMX register. See Figure 11.4 for details.

One application for these pack instructions is to convert UNICODE to ASCII (ANSI). You can convert a UNICODE (16-bit) character to an ANSI (8-bit) character if the H.O. eight bits of the UNICODE character are zero. The PACKUSWB instruction, which packs signed words into unsigned bytes, will take eight UNICODE characters and pack them into a string that is eight bytes long with a single instruction. If a UNICODE character lies in the range $0100..$7FFF, then the PACKUSWB instruction will store $FF in the respective byte; therefore, you can use $FF as a conversion error indication. Because PACKUSWB treats its input words as signed values, characters in the range $8000..$FFFF saturate to $00 rather than $FF.

Another use for the PACKSSWB instruction is to translate a 16-bit audio stream to an eight-bit stream. Assuming you’ve scaled your sixteen-bit values to produce a sequence of values in the range -128..+127, you can use the PACKSSWB instruction to convert that sequence of 16-bit values into a packed sequence of eight-bit values.
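The saturating conversions are easy to get wrong by hand, so the following Python sketch models the three pack instructions on 64-bit integers. It is a reference model written for this discussion, not HLA code; the helper names (`_lanes`, `_join`, `_signed`, `_clamp`) are hypothetical, and the parameter order follows the HLA convention of source first, destination second. The destination's elements end up in the L.O. half of the result and the source's elements in the H.O. half.

```python
def _lanes(q, width):
    """Split a 64-bit value into unsigned lanes, L.O. lane first."""
    mask = (1 << width) - 1
    return [(q >> i) & mask for i in range(0, 64, width)]


def _join(values, width):
    """Reassemble lanes (L.O. first) into one integer, truncating each lane."""
    mask = (1 << width) - 1
    out = 0
    for i, v in enumerate(values):
        out |= (v & mask) << (i * width)
    return out


def _signed(v, width):
    """Reinterpret an unsigned lane as a two's complement value."""
    return v - (1 << width) if v >> (width - 1) else v


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def packssdw(src, dest):
    dwords = _lanes(dest, 32) + _lanes(src, 32)
    return _join([_clamp(_signed(d, 32), -32768, 32767) for d in dwords], 16)


def packsswb(src, dest):
    words = _lanes(dest, 16) + _lanes(src, 16)
    return _join([_clamp(_signed(w, 16), -128, 127) for w in words], 8)


def packuswb(src, dest):
    words = _lanes(dest, 16) + _lanes(src, 16)
    return _join([_clamp(_signed(w, 16), 0, 255) for w in words], 8)
```

As a usage example, `packuswb(0, 0x0041_0100_8000_00FF)` packs the characters $00FF, $8000, $0100, and $0041 into the bytes $FF, $00, $FF, and $41: the out-of-range $0100 saturates to $FF, while $8000 is seen as a negative word and saturates to $00.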
Figure 11.3    PACKSSDW Instruction

Figure 11.4    PACKSSWB Instruction
The unpack instructions (PUNPCKxxx) provide the converse operation to the pack instructions. The unpack instructions take a sequence of smaller, packed values and translate them into larger values. There is one problem with this conversion, however. Unlike the pack instructions, where it took two 64-bit operands to generate a single 64-bit result, the unpack operations produce a 64-bit result from a single 32-bit source. Therefore, these instructions cannot operate directly on full 64-bit source operands. To overcome this limitation, there are two sets of unpack instructions: one set unpacks the data from the L.O. double word of a 64-bit object, the other set unpacks the H.O. double word of a 64-bit object. By executing one instruction from each set you can unpack a 64-bit object into a 128-bit object.

The PUNPCKLBW, PUNPCKLWD, and PUNPCKLDQ instructions merge (unpack) the L.O. double words of their source and destination operands and store the 64-bit result into their destination operand.

The PUNPCKLBW instruction unpacks and interleaves the low-order four bytes of the source (first) and destination (second) operands. It places the L.O. four bytes of the destination operand at the even byte positions in the destination and it places the L.O. four bytes of the source operand in the odd byte positions of the destination operand (see Figure 11.5).
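The byte interleaving that PUNPCKLBW performs can be written out explicitly. This Python sketch is an illustrative model only (the function merely borrows the instruction's name and takes the source first and the destination second, as in HLA); it builds the result one byte pair at a time.

```python
def punpcklbw(src, dest):
    """Interleave the L.O. four bytes of dest (even positions) and src (odd positions)."""
    out = 0
    for i in range(4):
        d = (dest >> (8 * i)) & 0xFF
        s = (src >> (8 * i)) & 0xFF
        out |= d << (16 * i)           # destination byte i goes to byte 2*i
        out |= s << (16 * i + 8)       # source byte i goes to byte 2*i + 1
    return out
```

With a destination of $FFFF_FFFF_A3A2_A1A0 and a source of $FFFF_FFFF_B3B2_B1B0, the result is $B3A3_B2A2_B1A1_B0A0; the H.O. double words of both operands are ignored.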
Figure 11.5    PUNPCKLBW Instruction
The PUNPCKLWD instruction unpacks and interleaves the low-order two words of the source (first) and destination (second) operands. It places the L.O. two words of the destination operand at the even word positions in the destination and it places the L.O. words of the source operand in the odd word positions of the destination operand (see Figure 11.6).
Figure 11.6    The PUNPCKLWD Instruction
The PUNPCKLDQ instruction copies the L.O. dword of the source operand to the H.O. dword of the destination operand and it copies the (original) L.O. dword of the destination operand to the L.O. dword of the destination (i.e., it doesn’t change the L.O. dword of the destination, see Figure 11.7).
Figure 11.7    PUNPCKLDQ Instruction
The PUNPCKHBW instruction is quite similar to the PUNPCKLBW instruction. The difference is that it unpacks and interleaves the high-order four bytes of the source (first) and destination (second) operands. It places the H.O. four bytes of the destination operand at the even byte positions in the destination and it places the H.O. four bytes of the source operand in the odd byte positions of the destination operand (see Figure 11.8).
Figure 11.8    PUNPCKHBW Instruction
The PUNPCKHWD instruction unpacks and interleaves the high-order two words of the source (first) and destination (second) operands. It places the H.O. two words of the destination operand at the even word positions in the destination and it places the H.O. two words of the source operand in the odd word positions of the destination operand (see Figure 11.9).
Figure 11.9    PUNPCKHWD Instruction
The PUNPCKHDQ instruction copies the H.O. dword of the source operand to the H.O. dword of the destination operand and it copies the (original) H.O. dword of the destination operand to the L.O. dword of the destination (see Figure 11.10).
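All six unpack instructions follow one pattern: take half of the elements from each operand (the L.O. half or the H.O. half) and interleave them, destination elements first. The sketch below captures that pattern in a single hypothetical helper, `unpack`, whose `width` and `high` parameters are inventions of this illustration; the four wrappers shown borrow the instruction names and take the source first, as in HLA.

```python
def unpack(src, dest, width, high):
    """Interleave half of the width-bit elements of dest and src into 64 bits."""
    count = 64 // width // 2           # elements taken from each operand
    first = count if high else 0       # index of the first element used
    mask = (1 << width) - 1
    out = 0
    for i in range(count):
        d = (dest >> ((first + i) * width)) & mask
        s = (src >> ((first + i) * width)) & mask
        out |= d << (2 * i * width)            # even position: destination element
        out |= s << ((2 * i + 1) * width)      # odd position: source element
    return out


def punpcklwd(src, dest):
    return unpack(src, dest, 16, high=False)


def punpckhwd(src, dest):
    return unpack(src, dest, 16, high=True)


def punpckldq(src, dest):
    return unpack(src, dest, 32, high=False)


def punpckhdq(src, dest):
    return unpack(src, dest, 32, high=True)
```

For example, with a source of $1111_2222_3333_4444 and a destination of $AAAA_BBBB_CCCC_DDDD, `punpckldq` produces $3333_4444_CCCC_DDDD and `punpckhdq` produces $1111_2222_AAAA_BBBB.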
Figure 11.10    PUNPCKHDQ Instruction
Since the unpack instructions provide the converse operation of the pack instructions, it should come as no surprise that you can use these instructions to perform the inverse algorithms of the examples given earlier for the pack instructions. For example, if you have a string of eight-bit ANSI characters, you can convert them to their UNICODE equivalents by setting one MMX register (the source) to all zeros. You can convert each group of four characters of the ANSI string to UNICODE by loading those four characters into the L.O. double word of an MMX register and executing the PUNPCKLBW instruction. This will interleave each of the characters with a zero byte, thus converting them from ANSI to UNICODE.

Of course, the unpack instructions are quite valuable any time you need to interleave data. For example, if you have three separate images containing the blue, red, and green components of a 24-bit image, it is possible to merge these three bytes together using the PUNPCKLBW instruction (see footnote 3).
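The ANSI to UNICODE conversion described above can be checked with a short Python model. This is a sketch under stated assumptions: `punpcklbw` is a software stand-in for the instruction (source first, destination second), `ansi4_to_unicode` is a hypothetical helper name, and the characters are assumed to sit in memory in little-endian order, as they would on an x86 processor.

```python
def punpcklbw(src, dest):
    """Model of PUNPCKLBW: dest bytes at even positions, src bytes at odd positions."""
    out = 0
    for i in range(4):
        out |= ((dest >> (8 * i)) & 0xFF) << (16 * i)
        out |= ((src >> (8 * i)) & 0xFF) << (16 * i + 8)
    return out


def ansi4_to_unicode(chunk):
    """Widen four 8-bit characters to four 16-bit characters."""
    assert len(chunk) == 4
    mm0 = int.from_bytes(chunk, "little")      # like movd( mem32, mm0 )
    zero = 0                                   # the all-zeros source register
    return punpcklbw(zero, mm0).to_bytes(8, "little")
```

Each character lands in the L.O. byte of a word whose H.O. byte comes from the zeroed source register, which is exactly the 16-bit little-endian encoding of that character.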
3. Typically you would merge in a fourth byte of zero and then store the resulting double word every three bytes in memory to overwrite the zeros.

11.7.3 MMX Packed Arithmetic Instructions

    paddb( mem64, mmi );
    paddb( mmi, mmi );

    paddw( mem64, mmi );
    paddw( mmi, mmi );

    paddd( mem64, mmi );
    paddd( mmi, mmi );

    paddsb( mem64, mmi );
    paddsb( mmi, mmi );

    paddsw( mem64, mmi );
    paddsw( mmi, mmi );

    paddusb( mem64, mmi );
    paddusb( mmi, mmi );

    paddusw( mem64, mmi );
    paddusw( mmi, mmi );

    psubb( mem64, mmi );
    psubb( mmi, mmi );

    psubw( mem64, mmi );
    psubw( mmi, mmi );

    psubd( mem64, mmi );
    psubd( mmi, mmi );

    psubsb( mem64, mmi );
    psubsb( mmi, mmi );

    psubsw( mem64, mmi );
    psubsw( mmi, mmi );

    psubusb( mem64, mmi );
    psubusb( mmi, mmi );

    psubusw( mem64, mmi );
    psubusw( mmi, mmi );

    pmulhuw( mem64, mmi );
    pmulhuw( mmi, mmi );

    pmulhw( mem64, mmi );
    pmulhw( mmi, mmi );

    pmullw( mem64, mmi );
    pmullw( mmi, mmi );

    pmaddwd( mem64, mmi );
    pmaddwd( mmi, mmi );
The packed arithmetic instructions operate on a set of bytes, words, or double words within a 64-bit block. For example, the PADDW instruction computes four 16-bit sums of two operands simultaneously. None of these instructions affect the CPU’s FLAGs register. Therefore, there is no indication of overflow, underflow, zero result, negative result, etc. If you need to test a result after a packed arithmetic computation, you will need to use one of the packed compare instructions (see “MMX Comparison Instructions” on page 1134).

The PADDB, PADDW, and PADDD instructions add the individual bytes, words, or double words in the two 64-bit operands using a wrap-around (i.e., non-saturating) addition. Any carry out of a sum is lost; it is your responsibility to ensure that overflow never occurs. As for the integer instructions, these packed add instructions add the values in the source operand to the destination operand, leaving the sum in the destination operand. These instructions produce correct results for signed or unsigned operands (assuming overflow/underflow does not occur).

The PADDSB and PADDSW instructions add the eight eight-bit or four 16-bit operands in the source and destination locations together using signed saturation arithmetic. The PADDUSB and PADDUSW instructions add their eight eight-bit or four 16-bit operands together using unsigned saturation arithmetic. Notice that you must use different instructions for signed and unsigned values since saturation arithmetic is different depending upon whether you are manipulating signed or unsigned operands. Also note that the instruction set does not support the saturated addition of double word values.
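The difference between wrap-around and saturating addition shows up only when a sum leaves the range of its element, so a small model helps. The Python sketch below implements the three byte-sized additions on 64-bit integers; it is an illustration with hypothetical helper names (`_lanes`, `_join`, `_signed`), and each function takes the source first and the destination second and returns the new destination value.

```python
def _lanes(q, width):
    """Split a 64-bit value into unsigned lanes, L.O. lane first."""
    mask = (1 << width) - 1
    return [(q >> i) & mask for i in range(0, 64, width)]


def _join(values, width):
    """Reassemble lanes into one integer, truncating each lane to its width."""
    mask = (1 << width) - 1
    out = 0
    for i, v in enumerate(values):
        out |= (v & mask) << (i * width)
    return out


def _signed(v, width):
    return v - (1 << width) if v >> (width - 1) else v


def paddb(src, dest):
    """Wrap-around add: any carry out of a byte is simply discarded."""
    return _join([d + s for d, s in zip(_lanes(dest, 8), _lanes(src, 8))], 8)


def paddusb(src, dest):
    """Unsigned saturation: sums above 255 become $FF."""
    return _join([min(d + s, 0xFF) for d, s in zip(_lanes(dest, 8), _lanes(src, 8))], 8)


def paddsb(src, dest):
    """Signed saturation: sums are clipped to the range -128..+127."""
    sums = [_signed(d, 8) + _signed(s, 8)
            for d, s in zip(_lanes(dest, 8), _lanes(src, 8))]
    return _join([max(-128, min(127, t)) for t in sums], 8)
```

Adding $01 to a byte holding $FF gives $00 with `paddb` (and the carry does not spill into the neighbouring byte) but $FF with `paddusb`; adding $01 to $7F gives $80 with `paddb` but stays at $7F with `paddsb`.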
The PSUBB, PSUBW, and PSUBD instructions work just like their addition counterparts, except of course, they compute the wrap-around difference rather than the sum. These instructions compute dest=dest-src. Likewise, the PSUBSB, PSUBSW, PSUBUSB, and PSUBUSW instructions compute the difference of the destination and source operands using saturation arithmetic.

While addition and subtraction can produce a one-bit carry or borrow, multiplication of two n-bit operands can produce as large as a 2*n bit result. Since overflow is far more likely in multiplication than in addition or subtraction, the MMX packed multiply instructions work a little differently than their addition and subtraction counterparts. To successfully multiply two packed values requires two instructions: one to compute the L.O. component of the result and one to produce the H.O. component of the result. The PMULLW, PMULHW, and PMULHUW instructions handle this task.

The PMULLW instruction multiplies the four words of the source operand by the four words of the destination operand and stores the four L.O. words of the four double word results into the destination operand. This instruction ignores the H.O. words of the results. Used by itself, this instruction computes the wrap-around product of an unsigned or signed set of operands; this is also the L.O. words of the four products.

The PMULHW and PMULHUW instructions complete the calculation. After computing the L.O. words of the four products with the PMULLW instruction, you use either the PMULHW or PMULHUW instruction to compute the H.O. words of the products. These two instructions multiply the four words in the source by the four words in the destination and then store the H.O. words of the results in the destination MMX register. The difference between the two is that you use PMULHW for signed operands and PMULHUW for unsigned operands. Because each multiply overwrites its destination operand, you must make a copy of the original destination value before executing the first instruction of the pair. If you compute the full product by using a PMULLW and a PMULHW (or PMULHUW) instruction pair, then there is no overflow possible, hence you don’t have to worry about wrap-around or saturation arithmetic.

The PMADDWD instruction multiplies the four words in the source operand by the four words in the destination operand to produce four double word products. Then it adds the two L.O. double words together and stores the result in the L.O. double word of the destination MMX register; it also adds together the two H.O. double words and stores their sum in the H.O. double word of the destination MMX register.
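The relationship between the low and high halves of the products is easiest to see in a reference model. The following Python sketch implements the four multiply instructions on 64-bit integers; the helper names are hypothetical, the source is passed first and the destination second, and each function returns the value the destination register would hold afterwards.

```python
def _lanes(q, width):
    """Split a 64-bit value into unsigned lanes, L.O. lane first."""
    mask = (1 << width) - 1
    return [(q >> i) & mask for i in range(0, 64, width)]


def _join(values, width):
    """Reassemble lanes into one integer, truncating each lane to its width."""
    mask = (1 << width) - 1
    out = 0
    for i, v in enumerate(values):
        out |= (v & mask) << (i * width)
    return out


def _signed(v, width):
    return v - (1 << width) if v >> (width - 1) else v


def _word_pairs(src, dest):
    return list(zip(_lanes(dest, 16), _lanes(src, 16)))


def pmullw(src, dest):
    """L.O. word of each product (the same for signed and unsigned operands)."""
    return _join([d * s for d, s in _word_pairs(src, dest)], 16)


def pmulhw(src, dest):
    """H.O. word of each signed product."""
    return _join([(_signed(d, 16) * _signed(s, 16)) >> 16
                  for d, s in _word_pairs(src, dest)], 16)


def pmulhuw(src, dest):
    """H.O. word of each unsigned product."""
    return _join([(d * s) >> 16 for d, s in _word_pairs(src, dest)], 16)


def pmaddwd(src, dest):
    """Multiply word pairs, then add adjacent products into two double words."""
    p = [_signed(d, 16) * _signed(s, 16) for d, s in _word_pairs(src, dest)]
    return _join([p[0] + p[1], p[2] + p[3]], 32)
```

For instance, multiplying the word $FFFF by $0002 gives an L.O. word of $FFFE either way, but the H.O. word is $FFFF when the operands are signed (-1 * 2 = -2) and $0001 when they are unsigned (65535 * 2 = $1FFFE).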
11.7.4 MMX Logic Instructions

    pand( mem64, mmi );
    pand( mmi, mmi );

    pandn( mem64, mmi );
    pandn( mmi, mmi );

    por( mem64, mmi );
    por( mmi, mmi );

    pxor( mem64, mmi );
    pxor( mmi, mmi );
The packed logic instructions are examples of MMX instructions that actually operate on 64-bit values. There are no packed byte, packed word, or packed double word versions of these instructions. Of course, there is no need for special byte, word, or double word versions of these instructions since they would all be equivalent to the 64-bit logic instruction. Hence, if you want to logically AND eight bytes together in parallel, you use the PAND instruction; likewise, if you want to logically AND four words or two double words together, you just use the PAND instruction.

The PAND, POR, and PXOR instructions do the same thing as their 32-bit integer instruction counterparts (AND, OR, XOR) except, of course, they operate on two 64-bit MMX operands. Hence, no further discussion of these instructions is really necessary here. The PANDN (AND NOT) instruction is a new logic instruction, so it bears a little bit of discussion. The PANDN instruction computes the following result:

    dest := (not dest) and source;

Note that the instruction inverts the destination operand, not the source operand. As you may recall from the chapter on Introduction to Digital Design, this is the inhibition function. If the destination operand is B and the source operand is A, this function computes B = AB’ (see “Boolean Functions and Truth Tables” on page 205 for details of the inhibition function). If you’re wondering why Intel chose to include such a weird function in the MMX instruction set, well, this instruction has one very useful property: it forces bits to zero in the result everywhere there is a one bit in the original destination operand. That is, the destination operand acts as a mask that selects the source bits to clear. This is an extremely useful function for merging two 64-bit quantities together. The following code sequence demonstrates this (it uses MM2 as a scratch register):

    readonly
        AlternateNibbles: qword; @nostorage;
            qword16( $F0F0_F0F0_F0F0_F0F0 );    // Note: needs qword16 macro!
            .
            .
            .
        // Create a 64-bit value in MM0 containing the odd nibbles from MM1 and
        // the even nibbles from MM0:

        movq( AlternateNibbles, mm2 );  // MM2 := nibble mask.
        pand( mm2, mm1 );               // Clear the even numbered nibbles of MM1.
        pandn( mm0, mm2 );              // MM2 := MM0 with its odd numbered nibbles cleared.
        por( mm1, mm2 );                // Merge the two.
        movq( mm2, mm0 );               // Leave the result in MM0.

The PANDN operation is also useful for computing the set difference of two character sets. Because PANDN inverts its destination, you load the set to remove (cssrc) into the register first. You could implement the cs.difference function using only six MMX instructions:

        // Compute csdest := csdest - cssrc;

        movq( (type qword cssrc), mm0 );
        pandn( (type qword csdest), mm0 );      // MM0 := (not cssrc) and csdest.
        movq( mm0, (type qword csdest ));
        movq( (type qword cssrc[8]), mm0 );
        pandn( (type qword csdest[8]), mm0 );
        movq( mm0, (type qword csdest[8] ));
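Because the operand roles of PANDN are easy to mix up, it is worth checking the two applications against a model. The Python sketch below assumes Intel's definition of the instruction (the destination is complemented and then ANDed with the source); the function names `merge_nibbles` and `cs_difference` are hypothetical, and every instruction model takes the source first and the destination second and returns the new destination value.

```python
MASK64 = (1 << 64) - 1
ALTERNATE_NIBBLES = 0xF0F0_F0F0_F0F0_F0F0


def pand(src, dest):
    return dest & src


def pandn(src, dest):
    """dest := (not dest) and src."""
    return (~dest & MASK64) & src


def por(src, dest):
    return dest | src


def merge_nibbles(mm0, mm1):
    """Odd nibbles (mask $F0 in each byte) from mm1, even nibbles from mm0."""
    mm2 = ALTERNATE_NIBBLES
    mm1 = pand(mm2, mm1)       # keep only the odd nibbles of mm1
    mm2 = pandn(mm0, mm2)      # (not mask) and mm0: the even nibbles of mm0
    return por(mm1, mm2)


def cs_difference(csdest, cssrc):
    """Set difference csdest - cssrc on one 64-bit half of a character set."""
    mm0 = cssrc                # load the set to remove first
    return pandn(csdest, mm0)  # (not cssrc) and csdest
```

Merging $1111_1111_1111_1111 (as MM0) with $2222_2222_2222_2222 (as MM1) yields $2121_2121_2121_2121, and removing the members %0110 from the set %1110 leaves %1000.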
Of course, if you want to improve the performance of the HLA Standard Library character set functions, you can use the MMX logic instructions throughout that module. Examples of such code appear later in this chapter.
11.7.5 MMX Comparison Instructions

    pcmpeqb( mem64, mmi );
    pcmpeqb( mmi, mmi );

    pcmpeqw( mem64, mmi );
    pcmpeqw( mmi, mmi );

    pcmpeqd( mem64, mmi );
    pcmpeqd( mmi, mmi );

    pcmpgtb( mem64, mmi );
    pcmpgtb( mmi, mmi );

    pcmpgtw( mem64, mmi );
    pcmpgtw( mmi, mmi );

    pcmpgtd( mem64, mmi );
    pcmpgtd( mmi, mmi );

The packed comparison instructions compare the destination (second) operand to the source (first) operand to test for equality or greater than; the greater-than tests treat their operands as signed values. These instructions compare eight pairs of bytes (PCMPEQB, PCMPGTB), four pairs of words (PCMPEQW, PCMPGTW), or two pairs of double words (PCMPEQD, PCMPGTD).

The first big difference to notice about these packed comparison instructions is that they compare the second operand to the first operand. This is exactly opposite of the standard CMP instruction (that compares the first operand to the second operand). The reason for this will become clear in a moment; however, you do have to keep in mind when using these instructions that the operands are opposite what you would normally expect. If this ordering bothers you, you can create macros to reverse the operands; we will explore this possibility a little later in this section.

The second big difference between the packed comparisons and the standard integer comparison is that these instructions test for a specific condition (equality or greater than) rather than doing a generic comparison. This is because these instructions, like the other MMX instructions, do not affect any condition code bits in the FLAGs register. This may seem contradictory; after all, the whole purpose of the CMP instruction is to set the condition code bits. However, keep in mind that these instructions simultaneously compare two, four, or eight operands; that implies that you would need two, four, or eight sets of condition code bits to hold the results of the comparisons. Since the FLAGs register maintains only one set of condition code bits, it is not possible to reflect the comparison status in the FLAGs. This is why the packed comparison instructions test a specific condition: so they can return true or false to indicate the result of their comparison.

Okay, so where do these instructions return their true or false values? In the destination operand, of course. This is the third big difference between the packed comparisons and the standard integer CMP instruction: the packed comparisons modify their destination operand. Specifically, the PCMPEQB and PCMPGTB instructions compare each pair of bytes in the two operands and write false ($00) or true ($FF) to the corresponding byte in the destination operand, depending on the result of the comparison. For example, the instruction “pcmpgtb( MM1, MM0 );” compares the L.O. byte of MM0 (A) with the L.O. byte of MM1 (B) and writes $00 to the L.O. byte of MM0 if A is not greater than B. It writes $FF to the L.O. byte of MM0 if A is greater than B (see Figure 11.11).
Figure 11.11    PCMPEQB and PCMPGTB Instructions (PCMPEQB/PCMPGTB operation: each pair of bytes in the 64-bit source and destination produces $00 or $FF in the corresponding byte of the destination)
The PCMPEQW, PCMPGTW, PCMPEQD, and PCMPGTD instructions work in an analogous fashion except, of course, they compare words and double words rather than bytes (see Figure 11.12 and Figure 11.13).
Figure 11.12    PCMPEQW and PCMPGTW Instructions (PCMPEQW/PCMPGTW operation: each pair of words in the 64-bit source and destination produces $0000 or $FFFF in the corresponding word of the destination)
Figure 11.13    PCMPEQD and PCMPGTD Instructions (PCMPEQD/PCMPGTD operation: each pair of double words in the 64-bit source and destination produces $0000_0000 or $FFFF_FFFF in the corresponding double word of the destination)
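The packed comparison semantics described above can be sketched as a small Python model. This is only an illustration, not Intel's or HLA's definition: the function names (pcmp and its helpers) are invented here, a 64-bit MMX register is modelled as a plain Python integer, and the greater-than test is assumed to be signed, as Intel documents for the PCMPGTx instructions.

```python
def _lanes(value, width):
    """Split a 64-bit integer into unsigned lanes of `width` bits, L.O. lane first."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 64, width)]


def _signed(lane, width):
    """Reinterpret an unsigned lane as a two's complement signed integer."""
    return lane - (1 << width) if lane & (1 << (width - 1)) else lane


def pcmp(src, dest, width, greater_than=False):
    """Model pcmpeqX( src, dest ) or pcmpgtX( src, dest ) and return the new dest.

    Each lane of the result is all ones when the dest lane equals the src lane
    (or, with greater_than=True, when dest > src as signed values), else zero.
    """
    true_mask = (1 << width) - 1
    result = 0
    pairs = zip(_lanes(src, width), _lanes(dest, width))
    for index, (s, d) in enumerate(pairs):
        if greater_than:
            hit = _signed(d, width) > _signed(s, width)
        else:
            hit = d == s
        if hit:
            result |= true_mask << (index * width)
    return result
```

Because every lane ends up as all zeros or all ones, the result can be used directly as a mask in a following PAND, PANDN, or POR instruction, which is what makes these instructions useful without any condition code bits.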
You’ve probably already noticed that there isn’t a set of PCMPLTx instructions. Intel chose not to provide these instructions because you can simulate them with the PCMPGTx instructions by reversing the operands. That is, A>B implies B<A. [...] endif; etc.
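The operand reversal mentioned above can be sketched in Python as follows. The helper names are hypothetical and the model covers only the byte form; it assumes the signed comparison that Intel documents for PCMPGTB.

```python
def packed_gt_bytes(src, dest):
    """Model pcmpgtb( src, dest ): $FF in each byte where dest > src (signed)."""
    def signed(byte):
        return byte - 256 if byte >= 128 else byte

    result = 0
    for shift in range(0, 64, 8):
        s = (src >> shift) & 0xFF
        d = (dest >> shift) & 0xFF
        if signed(d) > signed(s):
            result |= 0xFF << shift
    return result


def packed_lt_bytes(a, b):
    """Mask of the bytes where a < b, obtained as b > a with the operands swapped."""
    return packed_gt_bytes(src=a, dest=b)
```

One consequence of the swap is that the result lands in the register that held B rather than the one that held A, so a real MMX sequence may need an extra MOVQ if B must be preserved.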
11.7.6 MMX Shift Instructions

    psllw( mmi, mmi );
    psllw( imm8, mmi );
    pslld( mmi, mmi );
    pslld( imm8, mmi );
    psllq( mmi, mmi );
    psllq( imm8, mmi );
    psrlw( mmi, mmi );
    psrlw( imm8, mmi );
    psrld( mmi, mmi );
    psrld( imm8, mmi );
    psrlq( mmi, mmi );
    psrlq( imm8, mmi );
    psraw( mmi, mmi );
    psraw( imm8, mmi );
    psrad( mmi, mmi );
    psrad( imm8, mmi );
The MMX shift instructions, like the arithmetic instructions, allow you to shift several different values in parallel. The PSLLx instructions perform a packed logical shift left operation, the PSRLx instructions do a packed logical shift right operation, and the PSRAx instructions do a packed arithmetic shift right operation. These instructions operate on word, double word, and quad word operands (the arithmetic shifts have no quad word form). Note that Intel does not provide a version of these instructions that operates on bytes.

The first operand to these instructions specifies a shift count. This should be an unsigned integer value in the range 0..15 for word shifts, 0..31 for double word operands, and 0..63 for quad word operands. If the shift count is outside these ranges, then the logical shift instructions set their destination operands to all zeros, while the arithmetic shift instructions fill each destination element with copies of its sign bit. If the count (first) operand is not an immediate constant, then it must be an MMX register.

The PSLLW instruction simultaneously shifts the four words in the destination MMX register to the left the number of bit positions specified by the source operand. The instruction shifts zero into the L.O. bit of each word and the bit shifted out of the H.O. bit of each word is lost. There is no carry from one word to the other (since that would imply a larger shift operation). This instruction, like all the other MMX instructions, does not affect the FLAGs register (including the carry flag).

The PSLLD instruction simultaneously shifts the two double words in the destination MMX register to the left the number of bit positions specified by the source operand. Like the PSLLW instruction, this instruction shifts zeros into the L.O. bits and any bits shifted out of the H.O. positions are lost.

The PSLLQ instruction is one of the few MMX instructions that operates on 64-bit quantities. This instruction shifts the entire 64-bit destination register to the left the number of bits specified by the count (source) operand. In addition to allowing you to manipulate 64-bit integer quantities, this instruction is especially useful for moving data around in MMX registers so you can pack or unpack data as needed.

Although there is no PSLLB instruction to shift bytes, you can simulate this instruction using a PSLLW and a PAND instruction. After shifting the word values to the left the specified number of bits, all you’ve got to do is clear the L.O. n bits of each byte, where n is the shift count. For example, to shift the bytes in MM0 to the left three positions you could use the following two instructions:

    static
        ThreeBitsZero: qword; @nostorage;
            byte $F8, $F8, $F8, $F8, $F8, $F8, $F8, $F8;
        .
        .
        .
        psllw( 3, mm0 );
        pand( ThreeBitsZero, mm0 );
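A Python sketch of the packed left shift, and of the byte-shift emulation just described, may make the masking step easier to see. The names psll and shift_bytes_left are invented for this illustration, and an MMX register is modelled as a 64-bit Python integer. Note that the emulation uses a plain AND with the $F8-style mask; PANDN would invert the shifted data before applying the mask.

```python
def psll(count, dest, width):
    """Model psllX( count, dest ): shift each `width`-bit lane left, zero filling."""
    if count >= width:
        return 0                          # out-of-range count clears the destination
    lane_mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        lane = (dest >> shift) & lane_mask
        result |= ((lane << count) & lane_mask) << shift   # no carry between lanes
    return result


def shift_bytes_left(count, dest):
    """Emulate the missing byte shift: shift words, then clear each byte's L.O. bits."""
    byte_mask = (0xFF << count) & 0xFF    # $F8 when count is 3
    mask64 = int.from_bytes(bytes([byte_mask] * 8), "little")
    return psll(count, dest, 16) & mask64
```

For example, the word $01FF shifted left three bits is $0FF8; masking with $F8F8 removes the three bits that crossed from the low byte into the high byte and leaves $08F8, which is what two independent byte shifts would produce.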
The PSRLW, PSRLD, and PSRLQ instructions work just like their left shift counterparts except that these instructions shift their operands to the right rather than to the left. They shift zeros into the vacated H.O. positions of the destination values and bits they shift out of the L.O. bits are lost. As with the shift left instructions, there is no PSRLB instruction, but you can easily simulate this with a PSRLW and a PAND instruction (this time clearing the H.O. n bits of each byte).

The PSRAW and PSRAD instructions do an arithmetic shift right operation on the words or double words in the destination MMX register. Note that there isn’t a PSRAQ instruction. While shifting data to the right, these instructions replicate the H.O. bit of each word or double word rather than shifting in zeros. As with the logical shift right instructions, bits that these instructions shift out of the L.O. bits are lost forever.

The PSLLQ and PSRLQ instructions provide a convenient way to shift a quad word to the left or right. However, the MMX shift instructions are not generally useful for extended precision shifts since all data shifted out of the operands is lost. If you need to do an extended precision shift other than 64 bits, you should stick with the SHLD and SHRD instructions. The MMX shift instructions are mainly useful for shifting several values in parallel or (PSLLQ and PSRLQ) repositioning data in an MMX register.
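The difference between the logical and arithmetic right shifts can be sketched with two small Python functions. The names are made up for this illustration; the handling of out-of-range counts follows Intel's documented behavior (logical shifts produce zero, arithmetic shifts leave each element filled with its sign bit), which is an assumption taken from the Intel manuals rather than from this chapter.

```python
def psrl(count, dest, width):
    """Logical right shift of each `width`-bit lane; zeros enter at the H.O. end."""
    if count >= width:
        return 0
    lane_mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        lane = (dest >> shift) & lane_mask
        result |= (lane >> count) << shift
    return result


def psra(count, dest, width):
    """Arithmetic right shift of each lane; the H.O. (sign) bit is replicated."""
    count = min(count, width - 1)         # larger counts leave only sign bits
    lane_mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        lane = (dest >> shift) & lane_mask
        if lane >> (width - 1):           # negative lane: make it a negative int
            lane -= 1 << width
        result |= ((lane >> count) & lane_mask) << shift
    return result
```

With the four words $8000, $7FFF, $0010, and $FFFF in a register, a shift count of four yields $0800, $07FF, $0001, $0FFF from the logical shift and $F800, $07FF, $0001, $FFFF from the arithmetic shift; only the lanes with the H.O. bit set differ.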
11.8 The EMMS Instruction

    emms();
The EMMS (Empty MMX Machine State) instruction restores the FPU status on the CPU so that it can begin processing FPU instructions again after an MMX instruction sequence. You should always execute the EMMS instruction once you complete some MMX sequence. Failure to do so may cause any following floating point instructions to fail.
When an MMX instruction executes, the floating point tag word is marked valid (00s). Subsequent floating-point instructions may then produce unexpected results because the floating-point stack seems to contain valid data. The EMMS instruction marks the floating point tag word as empty. This must occur before the execution of any following floating point instructions.

Of course, you don’t have to execute the EMMS instruction immediately after an MMX sequence if you’re going to execute some additional MMX instructions prior to executing any FPU instructions, but you must take care to execute this instruction if

• You call any library routines or OS APIs (that might possibly use the FPU).
• You switch tasks in a cooperative fashion (for example, see the chapter on Coroutines in the Volume on Advanced Procedures).
• You execute any FPU instructions.

If the EMMS instruction is not used when trying to execute a floating-point instruction, the following may occur:

• Depending on the exception mask bits of the floating-point control word, a floating point exception event may be generated.
• A “soft exception” may occur. In this case floating-point code continues to execute, but generates incorrect results.
The EMMS instruction is rather slow, so you don’t want to execute it unnecessarily, but it is critical that you execute it at the appropriate times. Of course, better safe than sorry; if you’re not sure you’re going to execute more MMX instructions before any FPU instructions, then go ahead and execute the EMMS instruction to clear the state.
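The effect of MMX instructions and EMMS on the tag word can be pictured with a toy Python model. This is a deliberately simplified sketch: the class and method names are invented, the tag word is reduced to the two states this section mentions (all 00s for valid, all 1s for empty, following Intel's documentation), and "safe to push" stands in for the much richer behavior of a real FPU.

```python
class FpuState:
    """Toy model of the FPU tag word: two bits per register, 00 = valid, 11 = empty."""

    EMPTY = 0xFFFF    # every register tagged empty
    VALID = 0x0000    # every register tagged valid

    def __init__(self):
        self.tag_word = self.EMPTY

    def mmx_instruction(self):
        """Any MMX instruction other than EMMS marks the whole tag word valid."""
        self.tag_word = self.VALID

    def emms(self):
        """EMMS marks the whole tag word empty again."""
        self.tag_word = self.EMPTY

    def fpu_push_is_safe(self):
        """A floating point load needs an empty register to push into."""
        return self.tag_word == self.EMPTY
```

In this model, a sequence of MMX instructions followed by a floating point load is only safe if emms() is called between them, which mirrors the rule given in the list above.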
11.9 The MMX Programming Paradigm

In general, you don’t learn scalar (non-MMX) 80x86 assembly language programming and then use that same mindset when writing programs using the MMX instruction set. While it is possible to directly use various MMX instructions the same way you would the general purpose integer instructions, one phrase comes to mind when working with MMX: think parallel. This text has spent many hundreds of pages up to this point attempting to get you to think in assembly language; to think that this small section can teach you how to design optimal MMX sequences would be ludicrous. Nonetheless, a few simple examples are useful to help start you thinking about how to use the MMX instructions to your benefit in your programs. This section will begin by presenting some fairly obvious uses for the MMX instruction set, and then it will attempt to present some examples that exploit the inherent parallelism of the MMX instructions.

Since the MMX registers are 64 bits wide, you can double the speed of certain data movement operations by using MMX registers rather than the 32-bit general purpose registers. For example, consider the following code from the HLA Standard Library that copies one character set object to another:
    procedure cs.cpy( src:cset; var dest:cset ); nodisplay;
    begin cpy;

        push( eax );
        push( ebx );
        mov( dest, ebx );
        mov( (type dword src), eax );
        mov( eax, [ebx] );
        mov( (type dword src[4]), eax );
        mov( eax, [ebx+4] );
        mov( (type dword src[8]), eax );
        mov( eax, [ebx+8] );
        mov( (type dword src[12]), eax );
        mov( eax, [ebx+12] );
        pop( ebx );
        pop( eax );

    end cpy;

Program 11.2    HLA Standard Library cs.cpy Routine
This is a relatively simple code sequence. Indeed, a fair amount of the execution time is spent copying the parameters (20 bytes) onto the stack, calling the routine, and returning from the routine. This entire sequence can be reduced to the following four MMX instructions:

    movq( (type qword src), mm0 );
    movq( (type qword src[8]), mm1 );
    movq( mm0, (type qword dest) );
    movq( mm1, (type qword dest[8]) );
Of course, this MMX sequence assumes two things: (1) it’s okay to wipe out the values in MM0 and MM1, and (2) you’ll execute the EMMS instruction a little later on after the execution of some other MMX instructions. If either, or both, of these assumptions is incorrect, the performance of this sequence won’t be quite as good (though probably still better than the cs.cpy routine). However, if these two assumptions do hold, then it’s relatively easy to implement the cs.cpy routine as an in-line function (i.e., a macro) and have it run much faster. If you really need this operation to occur inside a procedure, you need to preserve the MMX registers, and you don’t know if any MMX instructions will execute shortly thereafter (i.e., you’ll need to execute EMMS), then it’s doubtful that using the MMX instructions will help here. However, in those cases when you can put the code in-line, using the MMX instructions will be faster.

Warning: don’t get too carried away with the MMX MOVQ instruction. Several programmers have gone to great extremes to use this instruction as part of a high performance MOVSD replacement. However, except in very special cases on very well designed systems, the limiting factor for a block move is the speed of memory. Since Intel has optimized the operation of the MOVSD instruction, you’re best off using the MOVSD instruction when moving blocks of memory around.

Earlier, this chapter used the cs.difference function as an example when discussing the PANDN instruction. Here’s the original HLA Standard Library implementation of this function:
    procedure cs.difference( src:cset; var dest:cset ); nodisplay;
    begin difference;

        push( eax );
        push( ebx );
        mov( dest, ebx );
        mov( (type dword src), eax );
        not( eax );
        and( eax, [ebx] );
        mov( (type dword src[4]), eax );
        not( eax );
        and( eax, [ebx+4] );
        mov( (type dword src[8]), eax );
        not( eax );
        and( eax, [ebx+8] );
        mov( (type dword src[12]), eax );
        not( eax );
        and( eax, [ebx+12] );
        pop( ebx );
        pop( eax );

    end difference;

Program 11.3    HLA Standard Library cs.difference Routine
Once again, the high-level nature of HLA is hiding the fact that calling this function is somewhat expensive. A typical call to cs.difference emits five or more instructions just to push the parameters (it takes four 32-bit PUSH instructions to pass the src character set because it is a value parameter). If you’re willing to wipe out the values in MM0 and MM1, and you don’t need to execute an EMMS instruction right away, it’s possible to compute the set difference with only six instructions. That’s about the same number of instructions as (and often fewer than) are needed to call this routine, much less do the actual work. Here are those six instructions:

    movq( (type qword src), mm0 );       // PANDN inverts its destination operand,
    movq( (type qword src[8]), mm1 );    // so load src (the set to remove) first.
    pandn( (type qword dest), mm0 );     // mm0 := (not src) and dest
    pandn( (type qword dest[8]), mm1 );  // mm1 := (not src[8]) and dest[8]
    movq( mm0, (type qword dest) );
    movq( mm1, (type qword dest[8]) );
These six instructions replace 12 of the instructions in the body of the function. The sequence is sufficiently short that it’s reasonable to code it in-line rather than in a function. However, were you to bury this code in the cs.difference routine, preserve MM0 and MM1 (see footnote 4), and execute EMMS afterwards, this would cost more than it’s worth. As an in-line macro, however, it is going to be significantly faster since it avoids passing parameters and the call/return sequence.

If you want to compute the intersection of two character sets, the instruction sequence is identical to the above except you substitute PAND for PANDN. Similarly, if you want to compute the union of two character sets, use the code sequence above substituting POR for PANDN. Again, both approaches pay off handsomely if you insert the code in-line rather than burying it in a procedure and you don’t need to preserve MMX registers or execute EMMS afterwards.

We can continue with this exercise of working our way through the HLA Standard Library character set (and other) routines, substituting MMX instructions in place of standard integer instructions. As long as we don’t need to preserve the MMX machine state (i.e., registers) and we don’t have to execute EMMS, most of the character set operations will be short enough to code in-line. Unfortunately, from a performance point of view we’re not buying that much over coding the standard implementations of these functions in-line (though the code would be quite a bit shorter). The problem here is that we’re not “thinking in MMX.” We’re still thinking in scalar (non-parallel) mode, and the fact that the MMX instruction set requires a lot of set-up (well, “tear-down” actually) negates many of the advantages of using MMX instructions in our programs.
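The half-at-a-time character set operations discussed above can be sketched in Python. This is a model under stated assumptions, not the HLA Standard Library code: a cset is represented as a 128-bit integer with bit n set when character n is a member, the helper names are invented, and pandn follows Intel's definition (destination := NOT destination AND source).

```python
MASK64 = (1 << 64) - 1


def cset_of(chars):
    """Build a 128-bit character set with one bit per member character."""
    value = 0
    for ch in chars:
        value |= 1 << ord(ch)
    return value


def split_cset(cset):
    """Return the (L.O. qword, H.O. qword) halves that MM0 and MM1 would hold."""
    return cset & MASK64, (cset >> 64) & MASK64


def join_cset(lo, hi):
    return lo | (hi << 64)


def pandn(src, dest):
    """PANDN semantics: new dest := (NOT dest) AND src."""
    return (~dest & MASK64) & src


def cset_difference(src, dest):
    """dest := dest - src; the 'register' starts out holding src because PANDN inverts it."""
    src_lo, src_hi = split_cset(src)
    dest_lo, dest_hi = split_cset(dest)
    return join_cset(pandn(dest_lo, src_lo), pandn(dest_hi, src_hi))


def cset_intersection(src, dest):
    """Same shape as the difference, with PAND in place of PANDN."""
    src_lo, src_hi = split_cset(src)
    dest_lo, dest_hi = split_cset(dest)
    return join_cset(dest_lo & src_lo, dest_hi & src_hi)


def cset_union(src, dest):
    """Same shape again, with POR in place of PANDN."""
    src_lo, src_hi = split_cset(src)
    dest_lo, dest_hi = split_cset(dest)
    return join_cset(dest_lo | src_lo, dest_hi | src_hi)
```

Each function touches each half of the set exactly once, which illustrates the point made above: the work is halved relative to 32-bit code, but nothing here processes several independent objects with one instruction.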
The MMX instructions are most appropriate when you compute multiple results in parallel. The problem with the character set examples above is that we’re not even processing a whole data object with a single instruction; we’re actually only processing half of a character set with a sequence of three MMX instructions (i.e., it requires six instructions to compute the intersection, union, or difference of two character sets). At best, we can only expect the code to run about twice as fast since we’re processing 64 bits at a time instead of 32 bits. Executing EMMS (and, God help us, having to preserve MMX registers) negates much of what we might gain by using the MMX instructions. Again, we’re only going to see a speed improvement if we process multiple objects with a single MMX instruction. We’re not going to do that manipulating large objects like character sets.

One data type that will let us easily manipulate up to eight objects at one time is a character string. We can speed up many character string operations by operating on eight characters in the string at one time. Consider the HLA Standard Library str.uppercase procedure. This function steps through each character of a string, tests to see if it’s a lower case character, and if so, converts the lower case character to upper case. A good question to ask is “can we process eight characters at a time using the MMX instructions?” The answer turns out to be yes, and the MMX implementation of this function provides an interesting perspective on writing MMX code.

4. Actually, the code could be rewritten easily enough to use only one MMX register.

At first glance it might seem impractical to use the MMX instructions to test for lower case characters and convert them to upper case. Consider the typical scalar approach that tests and converts a single character at a time:

        // (AL holds the character to convert.)

        cmp( al, 'a' );
        jb noConversion;
        cmp( al, 'z' );
        ja noConversion;

        sub( $20, al );     // Could also use AND($5f, al); here.

    noConversion:
This code first checks the value in AL to see if it’s actually a lower case character (that’s the CMP and Jcc instructions in the code above). If the character is outside the range 'a'..'z' then this code skips over the conversion (the SUB instruction); however, if the character is in the specified range, then the sequence above drops through to the SUB instruction and converts the lower case character to upper case by subtracting $20 from the lower case character’s ASCII code (since lower case characters always have bit #5 set, subtracting $20 always clears this bit).

Any attempt to convert this code directly to an MMX sequence is going to fail. Comparing and branching around the conversion instruction only works if you’re converting one value at a time. When operating on eight characters simultaneously, any mixture of the eight characters may or may not require conversion from lower case to upper case. Hence, we need to be able to perform some calculation that is benign if the character is not lower case (i.e., doesn’t affect the character’s value) while converting the character to upper case if it was lower case to begin with. Worse, we have to do this with pure computation since flow of control isn’t going to be particularly effective here (if we test each individual result in our MMX register we won’t really save anything over the scalar approach).

To save you some suspense, yes, such a calculation does exist. Consider the following algorithm that converts lower case characters to upper case:

    // bl := al >= 'a'
    // bh := al <= 'z'
    // bl := (al >= 'a') && (al <= 'z')
    [...]
    // [...] -> UC.
        .
        .
        .
    movq( ConvFactor, mm4 );    // Eight copies of conversion value.
    movq( A, mm2 );             // Put eight "a" characters in mm2.
    movq( Z, mm3 );             // Put eight "z" characters in mm3.
    movq( [edi], mm0 );         // Get next eight characters of our string.
    movq( mm0, mm1 );           // We need two copies.
    pcmpgtb( mm2, mm1 );        // Generate 1's in MM1 everywhere chars >= 'a'
    pcmpgtb( mm0, mm3 );        // Generate 1's in MM3 everywhere chars <= 'z'
    [...]

    [...]
        // (body of Middle)
    end Middle;

    begin Outer;
        // (body of Outer)
    end Outer;
There are two advantages to this scheme:

1. The identifier Inner is local to the Middle procedure and is not accessible outside Middle (not even to Outer); similarly, the identifier Middle is local to Outer and is not accessible outside Outer. This information hiding feature lets you prevent other code from accidentally accessing these nested procedures, just as for local variables.

2. The local identifiers i and j are accessible to the nested procedures.
Before discussing how to use this feature to access non-local variables in a more reasonable fashion using static links, let’s also consider the issue of the static link itself. The static link is really nothing more than a special parameter to these functions; therefore, we can declare the static link as a parameter using HLA’s high level procedure declaration syntax. Since the static link must always be at a fixed offset in the activation record for all procedures, the most reasonable thing to do is always make the static link the first parameter in the list (see footnote 5); this ensures that the static link is always found at offset "+8" in the activation record. Here are the declarations above with the static links added as parameters:

    procedure Outer( outerStaticLink:dword ); @nodisplay; @noframe;
    var
        i:int32;

        procedure Middle( middleStaticLink:dword ); @nodisplay; @noframe;
        var
            j:int32;

            procedure Inner( innerStaticLink:dword ); @nodisplay; @noframe;
            var
                k:int32;

            begin Inner;
                // (body of Inner)
            end Inner;

        begin Middle;
            // (body of Middle)
        end Middle;

    begin Outer;
        // (body of Outer)
    end Outer;
All that remains is to discuss how one references non-local (automatic) variables in this code. As you may recall from the chapter on Intermediate Procedures in Volume Four, HLA references local variables and parameters using an address expression of the form "[ebp±offset]" where offset represents the offset of the variable into the activation record (parameters typically have a positive offset, local variables have a negative offset). Indeed, we can use the HLA compile-time @offset function to access the variables without having to manually figure out the variable’s offset in the activation record, e.g.,

    mov( [ebp+@offset( i )], eax );

The statement above is semantically equivalent to

    mov( i, eax );

assuming, of course, that i is a local variable in the current procedure. Because HLA automatically associates the EBP register with local variables, HLA will not allow you to use a non-local variable reference in a procedure. For example, if you tried to use the statement "mov( i, eax );" in procedure Inner in the example above, HLA would complain that you cannot access non-local variables in this manner. The problem is that HLA associates EBP with automatic variables, and outside the procedure in which you declare the local variable, EBP does not point at the activation record holding that variable. Hence, if HLA accepted it, the instruction "mov( i, eax );" inside the Inner procedure would actually load k into EAX, not i (because k is at the same offset in Inner’s activation record as i in Outer’s activation record).

While it’s nice that HLA prevents you from making the mistake of such an illegal reference, the fact remains that there needs to be some way of referring to non-local identifiers in a procedure. HLA uses the following syntax to reference a non-local, automatic variable:

    reg32::identifier
5. Assuming, of course, that you’re using the default Pascal calling convention. If you were using the CDECL or STDCALL calling convention, you would always make the static link the last parameter in the parameter list.
reg32 represents any of the 80x86’s 32-bit general purpose registers and identifier is the non-local identifier you wish to access. HLA substitutes an address expression of the form "[reg32+@offset(identifier)]" for this expression. Given this syntax, we can now rewrite the Inner, Middle, and Outer example in a high level fashion as follows:

    procedure Outer( outerStaticLink:dword ); @nodisplay;
    var
        i:int32;

        procedure Middle( middleStaticLink:dword ); @nodisplay;
        var
            j:int32;

            procedure Inner( innerStaticLink:dword ); @nodisplay;
            var
                k:int32;

            begin Inner;

                mov( 3, k );                    // Initialize k.
                mov( innerStaticLink, ebx );    // Static link to previous lex level.
                mov( ebx::j, eax );             // Get j's value.
                add( k, eax );                  // Add in k's value.

                // Get static link to Outer's activation record and
                // add in i's value:

                mov( ebx::middleStaticLink, ebx );
                add( ebx::i, eax );

                // Display the results:

                stdout.puti32( eax );           // Display the sum.
                stdout.newln();

            end Inner;

        begin Middle;

            mov( 2, j );                        // Initialize j.
            mov( middleStaticLink, ebx );       // Get the static link.
            mov( ebx::i, eax );                 // Get i's value.
            add( j, eax );                      // Compute i+j.
            stdout.put( eax, nl );              // Display their sum.

            Inner( ebp );                       // Inner's static link is EBP.

        end Middle;

    begin Outer;

        mov( 1, i );                            // Give i an initial value.
        Middle( ebp );                          // Static link for middle.

    end Outer;
This example provides only a small indication of the work needed to access variables using static links. In particular, accessing ebx::i in the Inner procedure was simplified by the fact that EBX already pointed at Middle’s activation record. In the typical case, it’s going to take one instruction for each lex level the code traverses in order to access a given non-local automatic variable. While this might seem bad, in typical programs you rarely access non-local variables, so the situation doesn’t arise often enough to worry about.

HLA does not provide built-in support for static links. If you are going to use static links in your programs, then you must manually pass the static links as parameters to your procedures (i.e., HLA will not take care of this for you). While it is possible to modify HLA to automatically handle static links for you, HLA provides a different mechanism for accessing non-local variables: the display. To learn about displays, keep reading...
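The cost of following static links, one hop per lex level, can be sketched with a small Python model of the Outer/Middle/Inner example. The Frame class and the lookup helper are invented for this illustration; each Frame stands in for an activation record, and the static link is passed explicitly by the caller, just as the HLA code passes EBP.

```python
class Frame:
    """One activation record: local variables plus the static link the caller passed."""

    def __init__(self, static_link, **local_vars):
        self.static_link = static_link
        self.vars = dict(local_vars)


def lookup(frame, hops, name):
    """Follow `hops` static links, then read `name` from that activation record."""
    for _ in range(hops):
        frame = frame.static_link
    return frame.vars[name]


def outer(results):
    frame = Frame(None, i=1)
    middle(frame, results)                  # Middle's static link is Outer's frame


def middle(static_link, results):
    frame = Frame(static_link, j=2)
    results.append(lookup(frame, 1, "i") + frame.vars["j"])     # i + j
    inner(frame, results)                   # Inner's static link is Middle's frame


def inner(static_link, results):
    frame = Frame(static_link, k=3)
    total = lookup(frame, 1, "j") + frame.vars["k"] + lookup(frame, 2, "i")
    results.append(total)                   # j + k + i


def run():
    results = []
    outer(results)
    return results
```

Reading j from Inner takes one hop and reading i takes two, which corresponds to the one-instruction-per-lex-level cost described above.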
5.2.6 The Display

After reading the previous section you might get the idea that one should never use non-local variables, or limit non-local accesses to those variables declared at lex level zero. After all, it’s often easy enough to put all shared variables at lex level zero. If you are designing a programming language, you can adopt the C language designer’s philosophy and simply not provide block structure. Such compromises turn out to be unnecessary. There is a data structure, the display, that provides efficient access to any set of non-local variables.

A display is simply an array of pointers to activation records. Display[0] contains a pointer to the most recent activation record for lex level zero, Display[1] contains a pointer to the most recent activation record for lex level one, and so on. Assuming you’ve maintained the Display array in the current STATIC segment, it only takes two instructions to access any non-local variable. Pictorially, the display works as shown in Figure 5.7.
Figure 5.7    The Display (display entries 0 through 5 point at the most recent activation records for lex levels 0 through 5; entry 6, which has no active activation record, is shown as ????)
Note that the entries in the display always point at the most recent activation record for a procedure at the given lex level. If there is no active activation record for a particular lex level (e.g., lex level six above), then the entry in the display contains garbage. The maximum lexical nesting level in your program determines how many elements there must be in the display. Most programs have only three or four nested procedures (if that many) so the display is usually quite small. Generally, you will rarely require more than 10 or so elements in the display.
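A single static display can be sketched in Python as a global array with one slot per lex level. This is only a sketch of the general idea: the save-and-restore of the previous slot on entry and exit is a common technique assumed here, not something this section spells out, and the names display, call, and demo are invented. The main program is taken to be lex level zero.

```python
display = [None] * 8    # one slot per lex level; None plays the role of garbage


def call(lex_level, local_vars, body):
    """Run `body` as a procedure at `lex_level`, with its frame in the display."""
    saved = display[lex_level]          # remember the previous record at this level
    display[lex_level] = local_vars     # display now points at the newest record
    try:
        return body()
    finally:
        display[lex_level] = saved      # restore the slot on exit


def demo():
    def inner_body():
        # Any non-local is one indexed load plus one field access away.
        return display[1]["i"] + display[2]["j"] + display[3]["k"]

    def middle_body():
        return call(3, {"k": 3}, inner_body)

    def outer_body():
        return call(2, {"j": 2}, middle_body)

    return call(1, {"i": 1}, outer_body)
```

Unlike the static link version, the access cost in inner_body is the same for i and j: each is an index into the display followed by a read from the selected activation record.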
Another advantage to using a display is that each individual procedure can maintain the display information itself, the caller need not get involved. When using static links the calling code has to compute and pass the appropriate static link to a procedure. Not only is this slow, but the code to do this must appear before every call. If your program uses a display, the callee, rather than the caller, maintains the display so you only need one copy of the code per procedure. Although maintaining a single display in the STATIC segment is easy and efficient, there are a few situations where it doesn’t work. In particular, when passing procedures as parameters, the single level display doesn’t do the job. So for the general case, a solution other than a static array is necessary. Therefore, this chapter will not go into the details of how to maintain a static display since there are some problems with this approach. Intel, when designing the 80286 microprocessor, studied this problem very carefully (because Pascal was popular at the time and they wanted to be able to efficiently handle Pascal constructs). They came up with a generalized solution that works for all cases. Rather than using a single display in a static segment, Intel’s designers decided to have each procedure carry around its own local copy of the display. The HLA compiler automatically builds an Intel-compatible display at the beginning of each procedure, assuming you don’t use the @NODISPLAY procedure option. An Intel-compatible display is part of a procedure’s activation record and takes the form shown in Figure 5.8:
    (higher addresses)

    Previous Stack Contents
    Parameters (if any)
    Return Address
    Dynamic Link (previous EBP value)     <- EBP
    Display[0]
    Display[1]
      . . .
    Display[n]
    Local Variables (if any)              <- ESP

    (lower addresses)

Figure 5.8    Intel-Compatible Display in an Activation Record
If we assume that the lex level of the main program is zero, then the display for a given procedure at lex level n will contain n+1 double word elements. Display[0] is a pointer to the activation record for the main program, Display[1] is a pointer to the activation record of the most recently activated procedure at lex level one, and so on. Display[n] is a pointer to the current procedure’s activation record (i.e., it contains the value found in EBP while this procedure executes). Normally, the procedure would never access element n of Display since the procedure can index off EBP directly; however, as you’ll soon see, we’ll need the Display[n] entry to build displays for procedures at higher lex levels.
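One way to picture how each procedure ends up with its own n+1 entry display is the following Python sketch. It assumes the usual construction (copy entries 0 through n-1 from the caller's display, then append the new activation record as entry n); the text above does not spell out the mechanism, so treat build_display and the frame dictionaries as hypothetical illustrations.

```python
def build_display(caller_display, lex_level, frame):
    """Return the display for a procedure entered at `lex_level`.

    Entries 0..lex_level-1 come from the caller's display; entry `lex_level`
    is the new activation record itself.
    """
    if lex_level > len(caller_display):
        raise ValueError("callee is nested too deeply to be visible to this caller")
    return caller_display[:lex_level] + [frame]


def demo():
    main, outer, middle, inner = ({"proc": name} for name in
                                  ("main", "Outer", "Middle", "Inner"))
    d_main = [main]                                   # lex level 0
    d_outer = build_display(d_main, 1, outer)         # 2 entries
    d_middle = build_display(d_outer, 2, middle)      # 3 entries
    d_inner = build_display(d_middle, 3, inner)       # 4 entries
    # A call from Inner back to Middle (lex level 2) reuses entries 0 and 1.
    middle_again = {"proc": "Middle (second activation)"}
    d_recursive = build_display(d_inner, 2, middle_again)
    return d_outer, d_middle, d_inner, d_recursive
```

The sketch also shows why entry n matters even though the procedure never reads it for its own variables: it is the entry a more deeply nested callee copies in order to reach this procedure's locals.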
One important fact to note about the Intel-compatible display array: its elements appear backwards in memory. Remember, the stack grows downwards from high addresses to low addresses. If you study Figure 5.8 for a moment you’ll discover that Display[0] is at the highest memory address and Display[n] is at the lowest memory address, exactly the opposite of the standard array organization. It turns out that we’ll always access the display using a constant offset, so this reversal of the array ordering is no big deal. We’ll just use negative offsets from Display[0] (the base address of the array) rather than the usual positive offsets.

If the @NODISPLAY procedure option is not present, HLA treats the display as a predeclared local variable in the procedure and inserts the name "_display_" into the symbol table. The offset of the _display_ variable in the activation record is the offset of the Display[0] entry in Figure 5.8. Therefore, you can easily access an element of this array at run-time using a statement like:

    mov( _display_[ -lexLevel*4 ], ebx );
The "*4" component appears because _display_ is an array of double words. lexLevel must be a constant value that specifies the lex level of the procedure whose activation record you’d like to obtain. The minus sign prefixing this expression causes HLA to index downwards in memory as appropriate for the display object. Although it’s not that difficult to figure out the lex level of a procedure manually, the HLA compile-time language provides a function that will compute the lex level of a given procedure for you – the @LEX function. This function accepts a single parameter that must be the name of an HLA procedure (that is currently in scope). The @LEX function returns an appropriate value for that function that you can use as an index into the _display_ array. Note that @LEX returns one for the main program, two for procedures you declare in the main program, three for procedures you declare in procedures you declare in the main program, etc. If you are writing a unit, all procedures you declare in that unit exist at lex level two. The following program is a variation of the Inner/Middle/Outer example you’ve seen previously in this chapter. This example uses displays and the @LEX function to access the non-local automatic variables:
program DisplayDemo;
#include( "stdlib.hhf" )

    macro Display( proc );

        _display_[ -@lex( proc ) * 4 ]

    endmacro;

    procedure Outer;
    var
        i:int32;

        procedure Middle;
        var
            j:int32;

            procedure Inner;
            var
                k:int32;

            begin Inner;

                mov( 4, k );
                mov( Display( Middle ), ebx );
                mov( ebx::j, eax );     // Get j's value.
                add( k, eax );          // Add in k's value.

                // Get static link to Outer's activation record and
                // add in i's value:

                mov( Display( Outer ), ebx );
                add( ebx::i, eax );

                // Display the results:

                stdout.puti32( eax );   // Display the sum.
                stdout.newln();

            end Inner;

        begin Middle;

            mov( 2, j );                    // Initialize j.
            mov( Display( Outer ), ebx );   // Get the static link.
            mov( ebx::i, eax );             // Get i's value.
            add( j, eax );                  // Compute i+j.
            stdout.puti32( eax );           // Display their sum.
            stdout.newln();

            Inner();

        end Middle;

    begin Outer;

        mov( 1, i );    // Give i an initial value.
        Middle();       // Static link for middle.

    end Outer;

begin DisplayDemo;

    Outer();

end DisplayDemo;
Program 5.1
Demonstration of Displays in an HLA Program
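If it helps to see the bookkeeping without any assembly syntax, the following Python sketch models what Program 5.1 does. It is only an illustration under stated assumptions: the names display_offset, new_record, and run_display_demo are hypothetical, each activation record is modeled as a dictionary, and the display is modeled as a list indexed by lex level (index zero unused, index one for the main program, matching the values @LEX returns).

```python
# Hypothetical model of Program 5.1. Lex levels follow @LEX: 1 = main program.
MAIN_LEX, OUTER_LEX, MIDDLE_LEX, INNER_LEX = 1, 2, 3, 4


def display_offset(lex_level, entry_size=4):
    """Byte offset of a display entry relative to Display[0].

    The result is negative because the display grows toward lower addresses.
    """
    return -lex_level * entry_size


def new_record(caller_display, lex_level, **local_vars):
    """Model the entry sequence: copy the caller's display entries below
    lex_level, then add an entry that refers to the new record itself."""
    record = {"vars": dict(local_vars)}
    record["display"] = list(caller_display[:lex_level]) + [record]
    return record


def inner(caller_display, out):
    rec = new_record(caller_display, INNER_LEX, k=4)
    total = rec["display"][MIDDLE_LEX]["vars"]["j"]    # Get j's value.
    total += rec["vars"]["k"]                          # Add in k's value.
    total += rec["display"][OUTER_LEX]["vars"]["i"]    # Add in i's value.
    out.append(total)


def middle(caller_display, out):
    rec = new_record(caller_display, MIDDLE_LEX, j=2)
    out.append(rec["display"][OUTER_LEX]["vars"]["i"] + rec["vars"]["j"])
    inner(rec["display"], out)


def outer(caller_display, out):
    rec = new_record(caller_display, OUTER_LEX, i=1)
    middle(rec["display"], out)


def run_display_demo():
    main_record = {"vars": {}}
    main_record["display"] = [None, main_record]
    out = []
    outer(main_record["display"], out)
    return out


print(run_display_demo())           # [3, 7]
print(display_offset(OUTER_LEX))    # -8
```

The two values correspond to the sums Middle (i+j) and Inner (j+k+i) compute in the HLA program, and the offset of -8 is what the Display macro computes for Outer.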
Assuming you do not attach the @NODISPLAY procedure option to a procedure you write in HLA, HLA will automatically emit the code (as part of the standard entry sequence) to build a display for that procedure. Up to this chapter, none of the programs in this text have used nested procedures [6]; therefore, there has been no need for a display. For that reason, most programs appearing in this text (since the introduction of the @NODISPLAY option) have attached @NODISPLAY to the procedure. Building a display that you never use does not make a program incorrect, but it does make the procedure a tiny bit slower and a tiny bit larger, hence the use of the @NODISPLAY option up to this point.

6. Technically, this statement is not true. Every procedure you’ve written has been nested inside the main program. However, none of the sample programs to date have considered the possibility of accessing the main program’s automatic (VAR) variables. Hence there has been no need for a display until now.
5.2.7 The 80x86 ENTER and LEAVE Instructions

When designing the 80286, Intel’s CPU designers decided to add two instructions to help maintain displays. This was done because Pascal was the popular high level language at the time and Pascal was a block structured language that could benefit from having a display. Since then, C/C++ has replaced Pascal as the most common implementation language, and these two instructions have fallen into disuse because C/C++ does not allow nested procedures (it is not block structured in the Pascal sense). Still, you can take advantage of these instructions when writing assembly code with nested procedures.

Unfortunately, these two instructions, ENTER and LEAVE, are quite slow. C/C++ became popular shortly after Intel designed them, and since few high-performance compilers actually used these instructions, Intel never bothered to optimize them. On today’s processors, it’s actually faster to execute a sequence of instructions that do the same job than it is to use these instructions; hence most compilers that build displays (like HLA) emit a discrete sequence of instructions to build the display. Do keep in mind that, although these two instructions are slower than their discrete counterparts, they are generally shorter. So if you’re trying to save code space rather than write the fastest possible code, using ENTER and LEAVE can help.

The LEAVE instruction is very simple to understand. It performs the same operation as the two instructions:

    mov( ebp, esp );
    pop( ebp );
Therefore, you may use the instruction for the standard procedure exit code. On an 80386 or earlier processor, the LEAVE instruction is faster than the equivalent move and pop sequence. However, the LEAVE instruction is slower on 80486 and later processors.

The ENTER instruction takes two operands. The first is the number of bytes of local storage the current procedure requires, the second is the lex level of the current procedure. The ENTER instruction does the following:

    // enter( Locals, LexLevel );

            push( ebp );            // Save dynamic link.
            mov( esp, tempreg );    // Save for later.
            cmp( LexLevel, 0 );     // Done if this is lex level zero.
            je Lex0;

    lp:     dec( LexLevel );
            jz Done;
            sub( 4, ebp );          // Index into display in previous activation record
            pushd( [ebp] );         // and push the element there.
            jmp lp;

    Done:   push( tempreg );        // Add entry for current lex level.

    Lex0:   mov( tempreg, ebp );    // Pointer to current activation record.
            sub( Locals, esp );     // Allocate storage for local variables.
As you can see from this code, the ENTER instruction copies the display from activation record to activation record. This can get quite expensive if you nest the procedures to any depth. Most high level languages, if they use the ENTER instruction at all, always specify a nesting level of zero to avoid copying the display throughout the stack. The ENTER instruction puts the value for the _display_[n] entry at location EBP-(n*4). The ENTER instruction does not copy the value for display[0] into each stack frame. Intel assumes that you will keep the main program’s global variables in the data segment. To save time and memory, they do not bother copying the _display_[0] entry. This is why HLA uses lex level one for the main program – in HLA the main program can have automatic variables and, therefore, requires a display entry.
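To make the stack effects of these two instructions concrete, here is a small Python simulation of the ENTER steps listed above together with the two-instruction LEAVE equivalent. Treat it as a sketch under assumptions rather than a description of the hardware: the Stack32 class, its mem dictionary, and the starting stack address are all hypothetical, and the model ignores everything except ESP, EBP, and dword pushes and pops.

```python
class Stack32:
    """Toy 32-bit stack machine: 'mem' maps byte addresses to dword values."""

    def __init__(self, top=0x1000):
        self.mem = {}
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def enter(cpu, local_bytes, lex_level):
    """Follow the ENTER steps from the listing above."""
    cpu.push(cpu.ebp)                     # Save dynamic link.
    frame = cpu.esp                       # tempreg
    if lex_level != 0:
        for _ in range(lex_level - 1):    # Copy entries from the previous record.
            cpu.ebp -= 4
            cpu.push(cpu.mem[cpu.ebp])
        cpu.push(frame)                   # Add entry for the current lex level.
    cpu.ebp = frame                       # Pointer to current activation record.
    cpu.esp -= local_bytes                # Allocate storage for local variables.


def leave(cpu):
    cpu.esp = cpu.ebp                     # mov( ebp, esp );
    cpu.ebp = cpu.pop()                   # pop( ebp );


def demo():
    cpu = Stack32()
    cpu.push(0x1111)                      # Pretend return address.
    enter(cpu, 8, 1)                      # Procedure at lex level one.
    frame1 = cpu.ebp
    cpu.push(0x2222)                      # Pretend return address of a nested call.
    before = (cpu.esp, cpu.ebp)
    enter(cpu, 4, 2)                      # Nested procedure at lex level two.
    frame2 = cpu.ebp
    display = [cpu.mem[frame2 - 4 * n] for n in (1, 2)]
    leave(cpu)
    return frame1, frame2, display, before == (cpu.esp, cpu.ebp)


frame1, frame2, display, restored = demo()
print(display == [frame1, frame2], restored)    # True True
```

In this model the entry for lex level n ends up at EBP-(n*4) in the new activation record, and a LEAVE restores ESP and EBP to the values they had just before the matching ENTER.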
The ENTER instruction is very slow, particularly on 80486 and later processors. If you really want to copy the display from activation record to activation record it is probably a better idea to push the items yourself. The following code snippets show how to do this:

    // enter( n, 0 );  (n bytes of local variables, lex level zero.)

        push( ebp );
        mov( esp, ebp );
        sub( n, esp );

    // As you can see, "enter( n, 0 );" corresponds to
    // the standard entry sequence for non-nested
    // procedures.

    // enter( n, 1 );

        push( ebp );            // Save dynamic link (current EBP value).
        pushd( [ebp-4] );       // Push display[1] entry from previous act rec.
        lea( ebp, [esp+4] );    // Point EBP at the base of new act rec.
        sub( n, esp );          // Allocate local variables.

    // enter( n, 2 );

        push( ebp );            // Save dynamic link (current EBP value).
        pushd( [ebp-4] );       // Push display[1] entry from previous act rec.
        pushd( [ebp-8] );       // Push display[2] entry from previous act rec.
        lea( ebp, [esp+8] );    // Point EBP at the base of new act rec.
        sub( n, esp );          // Allocate local variables.

    // enter( n, 3 );

        push( ebp );            // Save dynamic link (current EBP value).
        pushd( [ebp-4] );       // Push display[1] entry from previous act rec.
        pushd( [ebp-8] );       // Push display[2] entry from previous act rec.
        pushd( [ebp-12] );      // Push display[3] entry from previous act rec.
        lea( ebp, [esp+12] );   // Point EBP at the base of new act rec.
        sub( n, esp );          // Allocate local variables.

    // enter( n, 4 );

        push( ebp );            // Save dynamic link (current EBP value).
        pushd( [ebp-4] );       // Push display[1] entry from previous act rec.
        pushd( [ebp-8] );       // Push display[2] entry from previous act rec.
        pushd( [ebp-12] );      // Push display[3] entry from previous act rec.
        pushd( [ebp-16] );      // Push display[4] entry from previous act rec.
        lea( ebp, [esp+16] );   // Point EBP at the base of new act rec.
        sub( n, esp );          // Allocate local variables.

    // etc.
If you are willing to believe Intel’s cycle timings, you’ll find that the ENTER instruction is almost never faster than a straight line sequence of instructions that accomplish the same thing. If you are interested in saving space rather than writing fast code, the ENTER instruction is generally a better alternative. The same is generally true for the LEAVE instruction as well. It is only one byte long, but it is slower than the corresponding "mov( ebp, esp );" and "pop( ebp );" instructions.

The following sample program demonstrates how to access non-local variables using a display. This code does not use the @LEX function in the interest of making the lex level access clear; normally you would use the @LEX function rather than the literal constants appearing in this example.
program EnterLeaveDemo;
#include( "stdlib.hhf" )

    procedure LexLevel2;

        procedure LexLevel3a;
        begin LexLevel3a;

            stdout.put( nl "LexLevel3a:" nl );
            stdout.put( "esp = ", esp, " ebp = ", ebp, nl );
            mov( _display_[0], eax );
            stdout.put( "display[0] = ", eax, nl );
            mov( _display_[-4], eax );
            stdout.put( "display[-1] = ", eax, nl );

        end LexLevel3a;

        procedure LexLevel3b; @noframe;
        begin LexLevel3b;

            enter( 0, 3 );
            stdout.put( nl "LexLevel3b:" nl );
            stdout.put( "esp = ", esp, " ebp = ", ebp, nl );
            mov( _display_[0], eax );
            stdout.put( "display[0] = ", eax, nl );
            mov( _display_[-4], eax );
            stdout.put( "display[-1] = ", eax, nl );
            leave;
            ret();

        end LexLevel3b;

    begin LexLevel2;

        stdout.put( "LexLevel2: esp=", esp, " ebp = ", ebp, nl nl );
        LexLevel3a();
        LexLevel3b();

    end LexLevel2;

begin EnterLeaveDemo;

    stdout.put( "main: esp = ", esp, " ebp= ", ebp, nl );
    LexLevel2();

end EnterLeaveDemo;
Program 5.2
Demonstration of Enter and Leave in HLA
Starting with HLA v1.32, HLA provides the option of emitting ENTER or LEAVE instructions rather than the discrete sequences for a procedure’s standard entry and exit sequences. The @ENTER procedure option tells HLA to emit the ENTER instruction for a procedure; the @LEAVE procedure option tells HLA to emit the LEAVE instruction in place of the standard exit sequence. See the HLA documentation for more details.
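The space savings mentioned in this section are easy to tally by hand. The Python sketch below does the arithmetic for a non-nested procedure with eight bytes of locals. The byte values are the usual 32-bit encodings and are an assumption about what the assembler emits (it may choose an equivalent form for the register-to-register moves); a procedure needing more than 127 bytes of locals would need the six-byte form of SUB instead of the three-byte form shown here.

```python
# Assumed 32-bit machine encodings, written as hex byte strings.
ENTRY_DISCRETE = {
    "push( ebp )":     "55",
    "mov( esp, ebp )": "8B EC",
    "sub( 8, esp )":   "83 EC 08",
}
ENTRY_ENTER = {
    "enter( 8, 0 )":   "C8 08 00 00",
}
EXIT_DISCRETE = {
    "mov( ebp, esp )": "8B E5",
    "pop( ebp )":      "5D",
}
EXIT_LEAVE = {
    "leave":           "C9",
}


def size_in_bytes(sequence):
    """Total number of encoded bytes in an instruction sequence."""
    return sum(len(encoding.split()) for encoding in sequence.values())


print(size_in_bytes(ENTRY_DISCRETE), size_in_bytes(ENTRY_ENTER))   # 6 4
print(size_in_bytes(EXIT_DISCRETE), size_in_bytes(EXIT_LEAVE))     # 3 1
```

Under these assumptions ENTER saves two bytes on entry and LEAVE saves two bytes on exit, which is the trade you make against the slower execution described above.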
5.3 Passing Variables at Different Lex Levels as Parameters

Accessing variables at different lex levels in a block structured program introduces several complexities to a program. The previous section introduced you to the complexity of non-local variable access. This problem gets even worse when you try to pass such variables as parameters to another program unit. The following subsections discuss strategies for each of the major parameter passing mechanisms.

For the purposes of discussion, the following sections will assume that “local” refers to variables in the current activation record, “global” refers to static variables in a static segment, and “intermediate” refers to automatic variables in some activation record other than the current activation record (this includes automatic variables in the main program). These sections will pass all parameters on the stack. You can easily modify the details to pass these parameters elsewhere, should you choose.
5.3.1 Passing Parameters by Value

Passing value parameters to a program unit is no more difficult than accessing the corresponding variables; all you need do is push the value on the stack before calling the associated procedure. To (manually) pass a global variable by value to another procedure, you could use code like the following:

    push( GlobalVariable );     // Assume "GlobalVariable" is a static object.
    call proc;

To pass a local variable by value to another procedure, you could use the following code [7]:

    push( LocalVariable );
    call proc;

To pass an intermediate variable as a value parameter, you must first locate that intermediate variable’s activation record and then push its value onto the stack. The exact mechanism you use depends on whether you are using static links or a display to keep track of the intermediate variable’s activation records. If using static links, you might use code like the following to pass a variable from two lex levels up from the current procedure:

    mov( [ebp+8], ebx );        // Assume static link is at offset 8 in Act Rec.
    mov( [ebx+8], ebx );        // Traverse the second static link.
    push( ebx::IntVar );        // Push the intermediate variable’s value.
    call proc;

Passing an intermediate variable by value when you are using a display is somewhat easier. You could use code like the following to pass an intermediate variable from lex level one:

    mov( _display_[ -1*4 ], ebx );  // Remember each _display_ entry is 4 bytes.
    push( ebx::IntVar );            // Pass the intermediate variable.
    call proc;

It is possible to use the HLA high level procedure calling syntax when passing intermediate variables as parameters by value. The following code demonstrates this:

    mov( _display_[ -1*4 ], ebx );
    proc( ebx::IntVar );

This example uses a display because HLA automatically builds the display for you. If you decide to use static links, you’ll have to modify this code appropriately.
7. The non-global examples all assume the variable is at offset -2 in their activation record. Change this as appropriate in your code.
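The two lookup strategies in this section can be compared side by side in a short Python model. This is a sketch only: the ActivationRecord class and the helper names are hypothetical, each record keeps a reference to the record of its enclosing procedure in place of a static link stored in the activation record, and the display is modeled as a dictionary keyed by lex level.

```python
class ActivationRecord:
    def __init__(self, lex_level, static_link=None, **local_vars):
        self.lex_level = lex_level
        self.static_link = static_link    # Record of the enclosing procedure.
        self.vars = dict(local_vars)


def via_static_links(current, hops):
    """Follow 'hops' static links, one memory access per hop."""
    record = current
    for _ in range(hops):
        record = record.static_link
    return record


def build_display(current):
    """Collect one entry per lex level, as a procedure's display would hold."""
    display = {}
    record = current
    while record is not None:
        display[record.lex_level] = record
        record = record.static_link
    return display


def proc(value):
    """Callee with a value parameter: it only ever changes its own copy."""
    value += 100
    return value


main_rec = ActivationRecord(1, IntVar=10)
outer_rec = ActivationRecord(2, static_link=main_rec)
current = ActivationRecord(3, static_link=outer_rec)

by_links = via_static_links(current, 2).vars["IntVar"]     # Two levels up.
by_display = build_display(current)[1].vars["IntVar"]      # Lex level one.
result = proc(by_display)

print(by_links, by_display, result, main_rec.vars["IntVar"])   # 10 10 110 10
```

Both routes reach the same variable. The display needs a single indexed access no matter how far away the variable’s lex level is, while the static link walk costs one access per level; either way the callee receives a copy, so the intermediate variable itself is unchanged.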
5.3.2 Passing Parameters by Reference, Result, and Value-Result

The pass by reference, result, and value-result parameter mechanisms generally pass the address of the parameter on the stack [8]. In an earlier chapter, you’ve seen how to pass global and local parameters using these mechanisms. In this section we’ll take a look at passing intermediate variables by reference, value/result, and by result.

To pass an intermediate variable by reference, value/result, or by result, you must first locate the activation record containing the variable so you can compute the effective address into the stack segment. When using static links, the code to pass the parameter’s address might look like the following:

    mov( [ebp+8], ebx );        // Assume static link is at offset 8 in Act Rec.
    mov( [ebx+8], ebx );        // Traverse the second static link.
    lea( eax, ebx::IntVar );    // Get the intermediate variable’s address.
    push( eax );                // Pass the address on the stack.
    call proc;

When using a display, the calling sequence might look like the following:

    mov( _display_[ -1*4 ], ebx );  // Remember each _display_ entry is 4 bytes.
    lea( eax, ebx::IntVar );        // Get the intermediate variable’s address.
    push( eax );                    // Pass the address on the stack.
    call proc;

It is possible to use the HLA high level procedure calling syntax when passing parameters by reference, by value/result, or by result. The following code demonstrates this:

    mov( _display_[ -1*4 ], ebx );
    proc( ebx::IntVar );

The nice thing about the high level syntax is that it is identical whether you’re passing parameters by value, reference, value/result, or by result.

As you may recall from the chapter on Low-Level Parameter Implementation, there is a second way to pass a parameter by value/result. You can push the value onto the stack and then, when the procedure returns, pop this value off the stack and store it back into the variable from whence it came. This is just a special case of the pass by value mechanism described in the previous section.
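Because all three mechanisms receive the same address, the difference shows up only in what the callee does with it. The following Python sketch illustrates this with a deliberately aliased call, where the same intermediate variable is passed for both parameters. The function names are hypothetical, and memory is modeled as a dictionary whose keys stand in for addresses.

```python
def bump_both_by_reference(memory, addr_a, addr_b):
    """Every access goes straight through the address."""
    memory[addr_a] += 1
    memory[addr_b] += 1


def bump_both_by_value_result(memory, addr_a, addr_b):
    """Copy in on entry, work on local copies, copy out on exit."""
    a = memory[addr_a]      # Copy in.
    b = memory[addr_b]
    a += 1
    b += 1
    memory[addr_a] = a      # Copy out.
    memory[addr_b] = b


def run(callee):
    memory = {"IntVar": 10}             # The intermediate variable.
    address = "IntVar"                  # What the caller pushes (twice).
    callee(memory, address, address)
    return memory["IntVar"]


print(run(bump_both_by_reference), run(bump_both_by_value_result))   # 12 11
```

With pass by reference both increments land on the same variable, so it ends at 12. With value/result each local copy is incremented once and the last copy out wins, so it ends at 11, even though the caller’s code is identical in both cases.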
5.3.3 Passing Parameters by Name and Lazy-Evaluation in a Block Structured Language

Since you pass a thunk when passing parameters by name or by lazy-evaluation, the presence of global, intermediate, and local variables does not affect the calling sequence to the procedure. Instead, the thunk has to deal with the differing locations of these variables. Since HLA thunks already contain the pointer to the activation record for that thunk, returning a local (to the thunk) variable’s address or value is especially trivial.

About the only catch is what happens if you pass an intermediate variable by name or by lazy evaluation to a procedure. However, the calculation of the ultimate address (pass by name) or retrieval of the value (pass by lazy evaluation) is nearly identical to the code in the previous two sections. Hence, this code will be left as an exercise at the end of this volume.
8. As you may recall, pass by reference, value-result, and result all use the same calling sequence. The differences lie in the procedures themselves.
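As a rough illustration of why the calling sequence does not change for thunk-based parameters, consider the following Python sketch. It assumes the usual meanings of the two mechanisms (a pass by name parameter is re-evaluated on every use, while a lazily evaluated parameter is evaluated at most once), and it models a thunk as a closure that carries the activation record it must run against. All names here are hypothetical.

```python
def make_thunk(record, body):
    """Pair some code with the activation record it must run against."""
    return lambda: body(record)


def next_value(record):
    """Thunk body with a visible side effect on an intermediate variable."""
    record["IntVar"] += 1
    return record["IntVar"]


def use_by_name(param, uses):
    """Pass by name: invoke the thunk every time the parameter is used."""
    return [param() for _ in range(uses)]


def use_lazy(param, uses):
    """Lazy evaluation: invoke the thunk on first use, then reuse the result."""
    results = []
    cache = []
    for _ in range(uses):
        if not cache:
            cache.append(param())
        results.append(cache[0])
    return results


rec_a = {"IntVar": 0}
rec_b = {"IntVar": 0}
print(use_by_name(make_thunk(rec_a, next_value), 3), rec_a["IntVar"])   # [1, 2, 3] 3
print(use_lazy(make_thunk(rec_b, next_value), 3), rec_b["IntVar"])      # [1, 1, 1] 1
```

The caller builds the thunk the same way in both cases; only the callee’s use of it differs. Whether IntVar is local, intermediate, or global matters only inside the thunk body, which reaches it through the record the thunk carries.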
5.4 Passing Procedures as Parameters

Many programming languages let you pass a procedure or function name as a parameter. This lets the caller pass along various actions to perform inside a procedure. The classic example is a plot procedure that graphs some generic math function passed as a parameter to plot. HLA lets you pass procedures and functions by declaring them as follows:

    procedure DoCall( x:procedure );
    begin DoCall;

        x();

    end DoCall;
The statement "DoCall(xyz);" calls DoCall that, in turn, calls procedure xyz. Whenever you pass a procedure’s address in this manner, HLA only passes the address of the procedure as the parameter value. Upon entry into procedure x via the DoCall invocation, the x procedure first creates its own display by copying appropriate entries from DoCall’s display. This gives x access to all intermediate variables that HLA allows x to access. Keep in mind that thunks are special cases of functions that you call indirectly. However, there is a major difference between a thunk and a procedure – thunks carry around the pointer to the activation record they intend to use. Therefore, the thunk does not copy the calling procedure’s display; instead, it uses the display of an existing procedure to access intermediate variables.
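For readers who think more easily in a high level language, the same idea looks like this in Python. The sketch is only an analogy: do_call mirrors DoCall, and the plot and square functions are hypothetical stand-ins for the classic plotting example mentioned above.

```python
def do_call(x):
    """Call whatever procedure the caller handed over."""
    return x()


def plot(func, xs):
    """Crude text plot: one row of asterisks per sample point."""
    return ["*" * func(x) for x in xs]


def xyz():
    return "xyz called"


def square(x):
    return x * x


print(do_call(xyz))                        # xyz called
print("\n".join(plot(square, [1, 2, 3])))
```

Here plot knows nothing about square beyond how to call it, which is exactly the arrangement the procedure parameter gives DoCall.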
5.5 Faking Intermediate Variable Access

As you’ve probably noticed by now, accessing non-local (intermediate) variables is a bit less efficient than accessing local or global (static) variables. High level languages like Pascal that support intermediate variable access hide a lot of effort from the programmer that becomes painfully visible when attempting the same thing in assembly language.

When attempting to write maintainable and readable code, you may want to break up a large procedure into a sequence of smaller procedures and make those smaller procedures local to a surrounding procedure that simply calls these smaller routines. Unfortunately, if the original procedure you’re breaking up contains lots of local variables that code throughout the procedure shares, short of restructuring your code you will have to leave those variables in the outside procedure and access them as intermediate variables. Using the techniques of this chapter may make this task a bit unpleasant, especially if you access those variables a large number of times. This may dissuade you from attempting to break up the procedure into smaller units.

Fortunately, under certain special circumstances, you can avoid the headaches of intermediate variable access in situations like this. Consider the following short code sequence:

    procedure MainProc;
    var
        ALocalVar: dword;

        procedure proc; @nodisplay; @noframe;
        begin proc;

            mov( ebp::ALocalVar, eax );
            ret();

        end proc;

    begin MainProc;

        mov( 5, ALocalVar );
        proc();

        // EAX now contains five...

    end MainProc;
Notice that the proc procedure has the @NOFRAME option, so HLA does not emit the standard entry sequence to build an activation record. This means that upon entry to proc, EBP still points at MainProc’s activation record. Therefore, this code can access the ALocalVar variable by using the syntax ebp::ALocalVar. No other code is necessary. The drawback to this scheme is that proc may not contain any parameters or local variables (which would require setting EBP to point at proc’s activation record). However, if you can live with this limitation, then this is a useful trick for accessing local variables one lex level up from the current procedure.
5.6 Putting It All Together

This chapter introduces the concept of lexical nesting commonly found in block structured languages like Pascal, Ada, and Modula-2. This chapter introduces the notion of scope, static procedure nesting, binding, variable lifetime, static links, the display, intermediate variables, and passing intermediate variables as parameters. Although few assembly programs use these features, they are occasionally useful, especially when writing code that interfaces with a high level language that supports static nesting.
Chapter Six Questions, Projects, and Labs

6.1 Questions

1) What is a First Class Object?
2) What is the difference between deferred and eager evaluation?
3) What is a thunk?
4) How does HLA implement thunk objects?
5) What is the purpose of the HLA THUNK statement?
6) What is the difference between a thunk and procedure variable?
7) What is the syntax for declaring a thunk as a formal parameter?
8) What is the syntax for passing a thunk constant as an actual parameter?
9) Explain how an activation record’s lifetime can affect the correctness of a thunk invocation.
10) What is a trigger and how can you use a thunk to create a trigger?
11) The yield statement in an iterator isn’t a true HLA statement. It’s actually equivalent to something else. What is it equivalent to?
12) What is a resume frame?
13) What is the problem with breaking out of a FOREACH loop using the BREAK or BREAKIF statement?
14) What is the difference between a coroutine and a procedure?
15) What is the difference between a coroutine and a generator?
16) What is the purpose of the coret call in the coroutines class?
17) What is the limitation of a coret operation versus a standard RET instruction?
18) What is the lifetime of the automatic variables declared in a coroutine procedure?
19) Where is the easiest place to pass parameters between two coroutines?
20) Why is it difficult to pass parameters between coroutines on the stack?
21) State seven places you can pass parameters between two procedures.
22) State at least six different ways you can pass parameters.
23) Where is the most efficient place to pass parameters?
24) Where do most high level languages pass their parameters?
25) What are some problems with passing parameters in global variables?
26) What is the difference between the Pascal/HLA and the CDECL parameter passing mechanisms?
27) What is the difference between the Pascal/HLA and the STDCALL parameter passing mechanisms?
28) What is the difference between the STDCALL and the CDECL parameter passing mechanisms?
29) Provide one reason why some assembly code might require the caller to remove the parameters from the stack.
30) What is the disadvantage of having the caller remove procedure parameters from the stack?
31) Explain how to pass parameters in the code stream.
32) Describe how you might pass a reference parameter in the code stream. What is the limitation on such reference parameters?
33) Explain how you might pass a “pass by value/result” or “pass by result” parameter in the code stream.
34) What is a parameter block?
35) What is the difference between pass by value/result and pass by result?
36) What is the difference between pass by name and pass by lazy evaluation?
37) What parameter passing mechanism does pass by name most closely resemble?
38) What parameter passing mechanism does pass by lazy evaluation most closely resemble?
39) When passing a parameter by name or lazy evaluation, what does HLA actually pass on the stack?
40) What is the difference in the calling sequence between pass by reference, pass by value/result, and pass by result (assuming the standard implementation)?
41) Give an example where pass by value/result produces different semantics than pass by reference.
42) What parameter passing mechanism(s) support(s) deferred execution?

For each of the following subquestions, assume that a parameter (in) is passed into one procedure and that procedure passes the parameter on to another procedure (out). Specify how to do this given the following in and out parameter passing mechanisms (if possible):

43) Parameter is passed into the first procedure by value and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
44) Parameter is passed into the first procedure by reference and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
45) Parameter is passed into the first procedure by value/result and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
46) Parameter is passed into the first procedure by result and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
47) Parameter is passed into the first procedure by name and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
48) Parameter is passed into the first procedure by lazy evaluation and passed on to the second procedure by:
    a. value
    b. reference
    c. value/result
    d. result
    e. name
    f. lazy evaluation
49) Describe how to pass a variable number of parameters to some procedure. Describe at least two different ways to do this.
50) How can you return a function’s result on the stack?
51) What’s the best way to return a really large function result?
52) What is a lex level?
53) What is a static link?
54) What does the term “scope” mean?
55) What is a “display”?
56) What does the term “address binding” mean?
57) What is the “lifetime” of a variable?
58) What is an intermediate variable?
59) How do you access intermediate variables using static links? Give an example.
60) How do you access intermediate variables using a display? Give an example.
61) How do you nest procedures in HLA?
62) What does the @lex function return?
63) What is one major difference between the _display_ array and standard arrays?
64) What does the ENTER instruction do? Provide an algorithm that describes its operation.
65) What does the LEAVE instruction do? Provide an equivalent machine code sequence.
66) Why does HLA not emit the ENTER and LEAVE instructions in those procedures that have a display?
67) Provide a short code example that demonstrates how to pass an intermediate variable by value to another procedure.
68) Provide a short code example that demonstrates how to pass an intermediate variable by reference to another procedure.
69) Provide a short code example that demonstrates how to pass an intermediate variable by value/result to another procedure.
70) Provide a short code example that demonstrates how to pass an intermediate variable by result to another procedure.
71) Provide a short code example that demonstrates how to pass an intermediate variable by name to another procedure.
72) Provide a short code example that demonstrates how to pass an intermediate variable by lazy evaluation to another procedure.
6.2 Programming Problems

1) Rewrite Program 1.1 in Chapter One (Fibonacci number generation) to use a pass by reference parameter rather than a thunk parameter.

2) Write a function ifx that has the following prototype:

    procedure ifx( expr:boolean; lazy trueVal:dword; lazy falseVal:dword );

   The function should test expr’s value; if true, it should evaluate and return trueVal, else it should evaluate and return falseVal. Write a main program that tests the execution of this function.

3) Write an iterator that returns all “words” of a given length. The iterator should have the following prototype:

    iterator wordOfLength( length:uns32 );      // returns( "eax" );

   The iterator should allocate a string with length characters on the heap, initialize this string, and return a pointer to the string in the EAX register. On each call to this iterator, it should return the next string of alphabetic characters using a lexicographical ordering. E.g., for strings of length three, the iterator would return aaa, aab, aac, aad, ..., aaz, aba, abb, abc, ..., zzz. Write a main program to test this word generator. Don’t forget to free the storage associated with each string in the main program when you’re done with the string.

4) Modify the program in programming project (3) so that it only returns strings that have a maximum of two consonants in a row and a maximum of three vowels in a row.

5) Write a “Tic-Tac-Toe” game that uses coroutines to make each move. One coroutine should prompt the “X” player for a move, the second coroutine should prompt the “O” player for a move (note that the moves are made by players, not by the computer). The main program/coroutine should call the other two coroutines and determine if there was a win/loss/draw after each move.

6) Modify programming project (5) so that the computer makes the moves for the “O” player.

7) Write a factorial function (n!) that passes a real80 parameter on the FPU stack and returns the real80 result on the FPU stack. (Note: n! = 1*2*3*...*n.)

8) Write the equivalent of cs.difference that passes the two character sets to the function in the MMX registers MM0 / MM1 and MM2 / MM3. Return the character sets’ difference in MM0 / MM1.

9) Write a “printstr” procedure that expects a pointer to a zero-terminated sequence of characters to follow the call to printstr in the code stream. This procedure should print the string to the standard output device. A typical call will look like the following:

    static
        staticStrVar: char; nostorage;
            byte "Hello world", 0;
            .
            .
            .
        call printstr
        dword staticStrVar

   Note that this function must work with any zero terminated string; don’t assume the string is an HLA string. Write a main program that makes several calls to printstr and tests this function.
Appendix A Answers to Selected Exercises
To be written. My apologies that this isn’t ready yet, but other chapters and appendices in this text have a higher priority. I will get around to this appendix eventually. In the meantime, if you have some questions about the answers to any exercises in this text, please feel free to post a question to one of the internet newsgroups like “comp.lang.asm.x86” or “alt.lang.asm”. Because of the high volume of email I receive daily, I will not answer questions sent to me via email. Note that posting the message to the net is very efficient because others get to share the solution. So please post your questions there.
Appendix B Console Graphic Characters

Each chart below gives the character codes (hexadecimal, then decimal) for one box style. The three-by-three grid lists the corner, tee, and intersection characters in the positions they occupy in a box; the line characters follow.

Single-line box:

    $DA 218    $C2 194    $BF 191
    $C3 195    $C5 197    $B4 180
    $C0 192    $C1 193    $D9 217

    Horizontal line: $C4 196    Vertical line: $B3 179

Double-line box:

    $C9 201    $CB 203    $BB 187
    $CC 204    $CE 206    $B9 185
    $C8 200    $CA 202    $BC 188

    Horizontal line: $CD 205    Vertical line: $BA 186

Single-line frame with double-line dividers:

    $DA 218    $D2 210    $BF 191
    $C6 198    $CE 206    $B5 181
    $C0 192    $D0 208    $D9 217

    Frame lines: horizontal $C4 196, vertical $B3 179
    Divider lines: horizontal $CD 205, vertical $BA 186

Double-line frame with single-line dividers:

    $C9 201    $D1 209    $BB 187
    $C7 199    $C5 197    $B6 182
    $C8 200    $CF 207    $BC 188

    Frame lines: horizontal $CD 205, vertical $BA 186
    Divider lines: horizontal $C4 196, vertical $B3 179

Double vertical lines with single horizontal lines:

    $D6 214    $D2 210    $B7 183
    $C7 199    $D7 215    $B6 182
    $D3 211    $D0 208    $BD 189

    Horizontal line: $C4 196    Vertical line: $BA 186

Single vertical lines with double horizontal lines:

    $D5 213    $D1 209    $B8 184
    $C6 198    $D8 216    $B5 181
    $D4 212    $CF 207    $BE 190

    Horizontal line: $CD 205    Vertical line: $B3 179
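If you want to check these codes on a modern system, the following Python sketch draws a small box from them. It assumes the codes are the IBM PC code page 437 values, which Python exposes as the "cp437" codec; the box_rows function and the SINGLE and DOUBLE tables are hypothetical names introduced for this illustration.

```python
# Corner and line codes for two of the box styles (assumed code page 437).
SINGLE = dict(tl=0xDA, h=0xC4, tr=0xBF, v=0xB3, bl=0xC0, br=0xD9)
DOUBLE = dict(tl=0xC9, h=0xCD, tr=0xBB, v=0xBA, bl=0xC8, br=0xBC)


def box_rows(width, height, codes):
    """Return the rows of an empty box as Unicode strings."""
    inner = width - 2
    top = bytes([codes["tl"]] + [codes["h"]] * inner + [codes["tr"]])
    middle = bytes([codes["v"]] + [0x20] * inner + [codes["v"]])
    bottom = bytes([codes["bl"]] + [codes["h"]] * inner + [codes["br"]])
    rows = [top] + [middle] * (height - 2) + [bottom]
    return [row.decode("cp437") for row in rows]


print("\n".join(box_rows(4, 3, SINGLE)))
print("\n".join(box_rows(4, 3, DOUBLE)))
```

Decoding through the codec rather than writing the bytes directly lets the box display correctly on terminals that expect Unicode text.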
Appendix C HLA Programming Style Guidelines

C.1 Introduction

Most people consider assembly language programs difficult to read. While there are a multitude of reasons why people feel this way, the primary reason is that assembly language does not make it easy for programmers to write readable programs. This doesn’t mean it’s impossible to write readable programs, only that it takes an extra effort on the part of an assembly language programmer to produce readable code.

One of the design goals of the High Level Assembler (HLA) was to make it possible for assembly language programmers to write readable assembly language programs. Nevertheless, without discipline, pandemonium will result in any program of any decent size. Even if you adhere to a fixed set of style guidelines, others may still have trouble reading and understanding your code. Just as important as following a set of style guidelines is following a generally accepted set of style guidelines: guidelines that others are familiar with and agree with.

The purpose of this appendix, written by the designer of the HLA language, is to provide a set of guidelines that HLA programmers can apply consistently. Unless you can show a good reason to violate these rules, you should follow them carefully when writing HLA programs; other HLA programmers will thank you for this.
C.1.1 Intended Audience
Of course, an assembly language program is going to be nearly unreadable to someone who doesn’t know assembly language. This is true for almost any programming language. Short of burying a tutorial on 80x86 assembly language in a program’s comments, the only way to address this problem¹ is to assume that the reader is familiar with assembly language programming and specifically HLA.
In view of the above, it makes sense to define an "intended audience" for our assembly language programs. Such a person should:
• Be a reasonably competent 80x86 assembly language/HLA programmer.
• Be reasonably familiar with the problem the assembly language program is attempting to solve.
• Fluently read English².
• Have a good grasp of high level language concepts.
• Possess appropriate knowledge for someone working in the field of Computer Science (e.g., understands standard algorithms and data structures, understands basic machine architecture, and understands basic discrete mathematics).
1. Doing so (inserting an 80x86 tutorial into your comments) would wind up making the program less readable to those who already know assembly language since, at the very least, they’d have to skip over this material; at the worst they’d have to read it (wasting their time).
2. Or whatever other natural language is in use at the site(s) where you develop, maintain, and use the software.
C.1.2 Readability Metrics
One has to ask "What is it that makes one program more readable than another?" In other words, how do we measure the "readability" of a program? The usual metric, "I know a well-written program when I see one," is inappropriate; for most people, this translates to "If your programs look like my better programs then they are readable, otherwise they are not." Obviously, such a metric is of little value since it changes with every person.
To develop a metric for measuring the readability of an assembly language program, the first thing we must ask is "Why is readability important?" This question has a simple (though somewhat flippant) answer:
Readability is important because programs are read (furthermore, a line of code is typically read ten times more often than it is written).
To expand on this, consider the fact that most programs are read and maintained by other programmers (Steve McConnell claims that up to ten generations of maintenance programmers work on a typical real world program before it is rewritten from scratch; furthermore, they spend up to 60% of their effort on that code simply figuring out how it works). The more readable your programs are, the less time these other people will have to spend figuring out what your program does. Instead, they can concentrate on adding features or correcting defects in the code. For the purposes of this document, we will define a "readable" program as one that has the following trait:
• A "readable" program is one that a competent programmer (one who is familiar with the problem the program is attempting to solve) can pick up, without ever having seen the program before, and fully comprehend the entire program in a minimal amount of time.
That’s a tall order! This definition doesn’t sound very difficult to achieve, but few non-trivial programs ever really achieve this status. This definition suggests that an appropriate programmer (i.e., one who is familiar with the problem the program is trying to solve) can pick up a program, read it at their normal reading pace (just once), and fully comprehend the program. Anything less is not a "readable" program.
Of course, in practice, this definition is unusable since very few programs reach this goal. Part of the problem is that programs tend to be quite long and few human beings are capable of managing a large number of details in their head at one time. Furthermore, no matter how well-written a program may be, "a competent programmer" does not suggest that the programmer’s IQ is so high they can read a statement and fully comprehend its meaning without expending much thought. Therefore, we must define readability, not as a boolean entity, but as a scale. Although truly unreadable programs exist, there are many "readable" programs that are less readable than other programs. Therefore, perhaps the following definition is more realistic:
• A readable program is one that consists of one or more modules. A competent programmer should be able to pick a given module in that program and achieve an 80% comprehension level by expending no more than an average of one minute for each statement in the program.
An 80% comprehension level means that the programmer can correct bugs in the program and add new features to the program without making mistakes due to a misunderstanding of the code at hand.
C.1.3 How to Achieve Readability
The "I’ll know one when I see one" metric for readable programs provides a big hint concerning how one should write programs that are readable. As pointed out earlier, this metric suggests that an individual will consider a program to be readable if it is very similar to (good) programs that this particular person has written. This suggests an important trait that readable programs must possess: consistency. If all programmers were to write programs using a consistent style, they’d find programs written by others to be similar to their own, and, therefore, easier to read. This single goal is the primary purpose of this appendix: to suggest a consistent standard that everyone will follow.
Of course, consistency by itself is not good enough. Consistently bad programs are not particularly easy to read. Therefore, one must carefully consider the guidelines to use when defining an all-encompassing standard. The purpose of this paper is to create such a standard. However, don’t get the impression that the material appearing in this document appears simply because it sounded good at the time or because of some personal preferences. The material in this paper comes from several software engineering texts on the subject (including Elements of Programming Style, Code Complete, and Writing Solid Code), nearly 20 years of personal assembly language programming experience, and research that led to the development of a set of generic programming guidelines for industrial use.
This document assumes consistent usage by its readers. Therefore, it concentrates on a lot of mechanical and psychological issues that affect the readability of a program. For example, uppercase letters are harder to read than lowercase letters (this is a well-known result from psychology research). It takes longer for a human being to recognize uppercase characters; therefore, an average human being will take more time to read text written all in uppercase. Hence, this document suggests that one should avoid the use of uppercase sequences in a program. Many of the other issues appearing in this document are in a similar vein; they suggest minor changes to the way you might write your programs that make it easier for someone to recognize some pattern in your code, thus aiding in comprehension.
C.1.4 How This Document is Organized
This document follows a top-down discussion of readability. It starts with the concept of a program. Then it discusses modules. From there it works its way down to procedures. Then it talks about individual statements. Beyond that, it talks about components that make up statements (e.g., instructions, names, and operators). Finally, this paper concludes by discussing some orthogonal issues.
Section Two discusses programs in general. It primarily discusses documentation that must accompany a program and the organization of source files. It also discusses, briefly, configuration management and source code control issues. Keep in mind that figuring out how to build a program (make, assemble, link, test, debug, etc.) is important. If your reader fully understands the "heapsort" algorithm you are using, but cannot build an executable module to run, they still do not fully understand your program.
Section Three discusses how to organize modules in your program in a logical fashion. This makes it easier for others to locate sections of code and organizes related sections of code together so someone can easily find important code and ignore unimportant or unrelated code while attempting to understand what your program does.
Section Four discusses the use of procedures within a program. This is a continuation of the theme in Section Three, although at a lower, more detailed, level.
Section Five discusses the program at the level of the statement. This (large) section provides the meat of this proposal. Most of the rules this paper presents appear in this section.
Section Six discusses comments and other documentation appearing within the source code.
Section Seven discusses those items that make up a statement (labels, names, instructions, operands, operators, etc.). This is another large section that presents a large number of rules one should follow when writing readable programs. This section discusses naming conventions, appropriateness of operators, and so on.
Section Eight discusses data types and other related topics.
C.1.5 Guidelines, Rules, Enforced Rules, and Exceptions
Not all rules are equally important. For example, a rule that you check the spelling of all the words in your comments is probably less important than suggesting that the comments all be in English³. Therefore, this paper uses three designations to keep things straight: Guidelines, Rules, and Enforced Rules.
A Guideline is a suggestion. It is a rule you should follow unless you can verbally defend why you should break the rule. As long as there is a good, defensible reason, you should feel no apprehension about violating a Guideline. Guidelines exist in order to encourage consistency in areas where there are no good reasons for choosing one methodology over another. You shouldn’t violate a Guideline just because you don’t like it -- doing so will make your programs inconsistent with respect to other programs that do follow the Guideline (and, therefore, harder to read). However, you shouldn’t lose any sleep because you violated a Guideline.
Rules are much stronger than Guidelines. You should never break a Rule unless there is some external reason for doing so (e.g., making a call to a library routine forces you to use a bad naming convention). Whenever you feel you must violate a Rule, you should verify that it is reasonable to do so in a peer review with at least two peers. Furthermore, you should explain in the program’s comments why it was necessary to violate the Rule. Rules are just that -- rules to be followed. However, there are certain situations where it may be necessary to violate a Rule in order to satisfy external requirements or even make the program more readable.
Enforced Rules are the toughest of the lot. You should never violate an Enforced Rule. If there is ever a true need to do this, then you should consider demoting the Enforced Rule to a simple Rule rather than treating the violation as a reasonable alternative.
An Exception is exactly that, a known example where one would commonly violate a Guideline, Rule, or (very rarely) Enforced Rule. Although exceptions are rare, the old adage "Every rule has its exceptions..." certainly applies to this document. The Exceptions point out some of the common violations one might expect.
Of course, the categorization of Guidelines, Rules, Enforced Rules, and Exceptions herein is one man’s opinion. At some organizations, this categorization may require reworking depending on the needs of that organization.
3. You may substitute the local language in your area if it is not English.
C.1.6 Source Language Concerns
This document will assume that the entire program is written in 80x86 assembly language using the HLA assembler/compiler. Although this organization is rare in commercial applications, this assumption will, in no way, invalidate these guidelines. Other guidelines exist for various high level languages (including a set written by this paper’s author). You should adopt a reasonable set of guidelines for the other languages you use and apply these guidelines to the 80x86 assembly language modules in the program.
C.2 Program Organization
A source program generally consists of one or more source, object, and library files. As a project gets larger and the number of files increases, it becomes difficult to keep track of the files in a project. This is especially true if a number of different projects share a common set of source modules. This section will address these concerns.
C.2.1 Library Functions
A library, by its very nature, suggests stability. Ignoring the possibility of software defects, one would rarely expect the number or function of routines in a library to vary from project to project. A good example is the "HLA Standard Library." One would expect "stdout.put" to behave identically in two different programs that use the Standard Library. Contrast this against two programs, each of which implements its own version of stdout.put. One could not reasonably assume both programs have identical implementations⁴. This leads to the following rule:
Rule:
Library functions are those routines intended for common reuse in many different assembly language programs. All assembly language (callable) libraries on a system should exist as ".lib" files and should appear in a "\lib" or "\hlalib" subdirectory.
Guideline:
"\hlalib" is probably a better choice if you’re using multiple languages since those other languages may need to put files in a "\lib" directory.
Exception:
It’s probably reasonable to leave the HLA Standard Library’s "hlalib.lib" file in the "\hla\hlalib" directory since most people expect it there.
4. In fact, just the opposite is true. One should get concerned if both implementations are identical. This would suggest poor planning on the part of the program’s author(s) since the same routine must now be maintained in two different programs.
The rule above ensures that the library files are all in one location so they are easy to find, modify, and review. By putting all your library modules into a single directory, you avoid configuration management problems such as having outdated versions of a library linking with one program and up-to-date versions linking with other programs.
C.2.2 Common Object Modules
This document defines a library as a collection of object modules that have wide application in many different programs. The HLA Standard Library is a typical example of a library. Some object modules are not so general purpose, but still find application in two or more different programs. Two major configuration management problems exist in this situation: (1) making sure the ".obj" file is up-to-date when linking it with a program; (2) knowing which programs use the module so one can verify that changes to the module won’t break existing code. The following rules take care of case one:
Rule:
If two different programs share an object module, then the associated source, object, and make files for that module should appear in a subdirectory that is specific to that module (i.e., no other files in the subdirectory). The subdirectory name should be the same as the module name. If possible, you should create a set of links, aliases, or shortcuts to this subdirectory and place them in the main directory of each of the projects that utilize the module. If links are not possible, you should place the module’s subdirectory in a "\common" subdirectory.
Enforced Rule:
Every subdirectory containing one or more modules should have a make file that will automatically generate the appropriate, up-to-date, ".obj" files. An individual, a batch file, or another make file should be able to automatically generate new object modules (if necessary) by simply executing the make program.
Guideline:
Use Microsoft’s nmake program. At the very least, use nmake acceptable syntax in your makefiles.
The other problem, noting which projects use a given module, is much more difficult. The obvious solution, commenting the source code associated with the module to tell the reader which programs use the module, is impractical. Maintaining these comments is too error-prone and the comments will quickly get out of phase and be worse than useless -- they would be incorrect. A better solution is to create an alias and place this alias in the main subdirectory of each program that links the module.
Guideline:
If a project uses a module that is not local to the project’s subdirectory, create an alias to the file in the project’s subdirectory. This makes locating the file very easy.
C.2.3 Local Modules
Local modules are those that a single program/project uses. Typically, the source and object code for each module appears in the same directory as the other files associated with the project. This is a reasonable arrangement until the number of files increases to the point that it is difficult to find a file in a directory listing. At that point, most programmers begin reorganizing their directory by creating subdirectories to hold many of these source modules. However, the placement, name, and contents of these new subdirectories can have a big impact on the overall readability of the program. This section will address these issues.
The first issue to consider is the contents of these new subdirectories. Since programmers rummaging through this project in the future will need to easily locate source files in a project, it is important that you organize these new subdirectories so that it is easy to find the source files you are moving into them. The best organization is to put each source module (or a small group of strongly related modules) into its own subdirectory. The subdirectory should bear the name of the source module minus its suffix (or the main module if there is more than one present in the subdirectory). If you place two or more source files in the same directory, ensure this set of source files forms a cohesive set (meaning the source files contain code that solves a single problem). A discussion of cohesiveness appears later in this document.
Rule:
If a project directory contains too many files, try to move some of the modules to subdirectories within the project directory; give the subdirectory the same name as the source file without the suffix. This will cut the number of files nearly in half. If this reduction is insufficient, try categorizing the source modules (e.g., FileIO, Graphics, Rendering, and Sound) and move these modules to a subdirectory bearing the name of the category.
Enforced Rule:
Each new subdirectory you create should have its own make file that will automatically assemble all source modules within that subdirectory, as appropriate.
Enforced Rule:
Any new subdirectories you create for these source modules should appear within the directory containing the project. The only exceptions are those modules that you share, or anticipate sharing, with other projects. See “Common Object Modules” (section C.2.2) for more details.
Stand-alone assembly language programs generally contain a "main" procedure: the first program unit that executes when the operating system loads the program into memory. For any programmer new to a project, this procedure is the anchor where one first begins reading the code and the point to which the reader will continually refer. Therefore, the reader should be able to easily locate this source file. The following rule helps ensure this is the case:
Rule:
The source module containing the main program should have the same name as the executable (obviously the suffix will be different). For example, if the "Simulate 886" program’s executable name is "Sim886.exe" then you should find the main program in the "Sim886.hla" source file.
Finding the source file that contains the main program is one thing. Finding the main program itself can be almost as hard. Assembly language lets you give the main program any name you want. However, to make the main procedure easy to find (both in the source code and at the O/S level), you should actually name this program "main". See “Module Organization” (section C.3) for more details about the placement of the main program. An alternative is to give the main program’s source file the name of the project.
Guideline:
The name of the main procedure in an assembly language program should be "main" or the name of the entire project.
C.2.4 Program Make Files
Every project, even if it contains only a single source module, should have an associated make file. If someone wants to assemble your program, they should not have to worry about what program (e.g., HLA) to use to compile the program, what command line options to use, what library modules to use, etc. They should be able to type "nmake"⁵ and wind up with an executable program. Even if assembling the program consists of nothing more than typing the name of the assembler and the source file, you should still have a make file. Someone else may not realize that’s all that is necessary.
Enforced Rule:
The main project directory should contain a make file that will automatically generate an executable (or other expected object module) in response to a simple make/nmake command.
Rule:
If your project uses object modules that are not in the same subdirectory as the main program’s module, you should test the ".obj" files for those modules and execute the corresponding make files in their directories if the object code is out of date. You can assume that library files are up to date.
Guideline:
Avoid using fancy "make" features. Most programmers only learn the basics about make and will not be able to understand what your make file is doing if you fully exploit the make language. Especially avoid the use of default rules since this can create havoc if someone arbitrarily adds or removes files from the directory containing the make file.
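As a hedged illustration of the rules in this section, the following nmake-compatible make file sketches a small two-module project. The name Sim886 comes from the earlier example; the "cpu" module, its header "cpu.hhf", and the exact hla command lines are assumptions made for illustration only. The file uses nothing but explicit rules, in keeping with the guideline above.

```makefile
# Hypothetical make file for the Sim886 project (explicit rules only).

# Typing "nmake" builds the first target, the executable.
# hla compiles Sim886.hla and passes cpu.obj through to the linker.
Sim886.exe: Sim886.hla cpu.hhf cpu.obj
	hla Sim886.hla cpu.obj

# "hla -c" compiles to an object file without linking.
cpu.obj: cpu.hla cpu.hhf
	hla -c cpu.hla
```

Because every dependency is spelled out, adding or removing an unrelated file in the directory cannot change what this make file builds.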
5. Or whatever make program you normally use.
C.3 Module Organization
A module is a collection of objects that are logically related. Those objects may include constants, data types, variables, and program units (e.g., functions, procedures, etc.). Note that objects in a module need not be physically related. For example, it is quite possible to construct a module using several different source files. Likewise, it is quite possible to have several different modules in the same source file. However, the best modules are physically related as well as logically related; that is, all the objects associated with a module exist in a single source file (or directory if the source file would be too large) and nothing else is present.
Modules contain several different objects including constants, types, variables, and program units (routines). Modules share many attributes with routines (program units); this is not surprising since routines are the major component of a typical module. However, modules have some additional attributes of their own. The following sections describe the attributes of a well-written module.
Note:
Unit and package are both synonyms for the term module.
C.3.1 Module Attributes
A module is a generic term that describes a set of program-related objects (program units as well as data and type objects) that are somehow coupled. Good modules share many of the same attributes as good program units as well as the ability to hide certain details from code outside the module.
C.3.1.1 Module Cohesion
Modules exhibit the following different kinds of cohesion (listed from good to bad):
• Functional or logical cohesion exists if the module accomplishes exactly one (simple) task.
• Sequential or pipelined cohesion exists when a module does several sequential operations that must be performed in a certain order with the data from one operation being fed to the next in a “filter-like” fashion.
• Global or communicational cohesion exists when a module performs a set of operations that make use of a common set of data, but are otherwise unrelated.
• Temporal cohesion exists when a module performs a set of operations that need to be done at the same time (though not necessarily in the same order). A typical initialization module is an example of such code.
• Procedural cohesion exists when a module performs a sequence of operations in a specific order, but the only thing that binds them together is the order in which they must be done. Unlike sequential cohesion, the operations do not share data.
• State cohesion occurs when several different (unrelated) operations appear in the same module and a state variable (e.g., a parameter) selects the operation to execute. Typically such modules contain a case (switch) or if..elseif..elseif... statement.
• No cohesion exists if the operations in a module have no apparent relationship with one another.
The first three forms of cohesion above are generally acceptable in a program. The fourth (temporal) is probably okay, but you should rarely use it. The last three forms should almost never appear in a program. For some reasonable examples of module cohesion, you should consult “Code Complete”.
Guideline:
Design good modules! Good modules exhibit strong cohesion. That is, a module should offer a (small) group of services that are logically related. For example, a “printer” module might provide all the services one would expect from a printer. The individual routines within the module would provide the individual services.
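To make the two ends of the cohesion scale concrete, here is a minimal HLA sketch contrasting state cohesion with functional cohesion at the routine level. All names (cohesionDemo, doJob, printNumber, doubleNumber, selector, number) are hypothetical and exist only for illustration; they do not come from the HLA Standard Library.

```hla
program cohesionDemo;
#include( "stdlib.hhf" )

    // State cohesion (avoid): "selector" chooses between two unrelated
    // operations, so every caller must know the magic selector values.
    procedure doJob( selector:uns32; number:uns32 );
    begin doJob;

        if( selector = 0 ) then

            stdout.put( "number = ", number, nl );  // job 0: print

        elseif( selector = 1 ) then

            mov( number, eax );                     // job 1: double,
            shl( 1, eax );                          // result left in EAX

        endif;

    end doJob;

    // Functional cohesion (preferred): each routine does exactly one task.
    procedure printNumber( number:uns32 );
    begin printNumber;

        stdout.put( "number = ", number, nl );

    end printNumber;

    procedure doubleNumber( number:uns32 );
    begin doubleNumber;

        mov( number, eax );                         // result left in EAX
        shl( 1, eax );

    end doubleNumber;

begin cohesionDemo;

    doJob( 0, 21 );         // the reader must look up what "0" means
    printNumber( 21 );      // the intent is visible at the call site

    doubleNumber( 21 );
    stdout.put( "doubled = ", (type uns32 eax), nl );

end cohesionDemo;
```

The same reasoning scales up to modules: a module whose entry points each perform one clearly named service is easier to read than one that funnels unrelated work through a selector.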
C.3.1.2 Module Coupling
Coupling refers to the way that two modules communicate with one another. There are several criteria that define the level of coupling between two modules:
• Cardinality - the number of objects communicated between two modules. The fewer objects the better (i.e., fewer parameters).
• Intimacy - how “private” is the communication? Parameter lists are the most private form; private data fields in a class or object are the next level; public data fields in a class or object are next; global variables are even less intimate; and passing data in a file or database is the least intimate connection. Well-written modules exhibit a high degree of intimacy.
• Visibility - this is somewhat related to intimacy above. It refers to how visible the data that you pass between two modules is to the entire system. For example, passing data in a parameter list is direct and very visible (you always see the data the caller is passing in the call to the routine); passing data in global variables makes the transfer less visible (you could have set up the global variable long before the call to the routine). Another example is passing simple (scalar) variables rather than loading up a bunch of values into a structure/record and passing that structure/record to the callee.
• Flexibility - this refers to how easy it is to make the connection between two routines that may not have been originally intended to call one another. For example, suppose you pass a structure containing three fields into a function. If you want to call that function but you only have three data objects, not the structure, you would have to create a dummy structure, copy the three values into the fields of that structure, and then call the function. On the other hand, had you simply passed the three values as separate parameters, you could still pass in structures (by specifying each field) as well as call the function with separate values. The module containing this latter function is more flexible.
A module is loosely coupled if its functions exhibit low cardinality, high intimacy, high visibility, and high flexibility. Often, these features are in conflict with one another (e.g., increasing the flexibility by breaking out the fields from a structure [a good thing] will also increase the cardinality [a bad thing]). It is the traditional goal of any engineer to choose the appropriate compromises for each individual circumstance; therefore, you will need to carefully balance each of the four attributes above.
A module that uses loose coupling generally contains fewer errors per KLOC (thousands of lines of code). Furthermore, modules that exhibit loose coupling are easier to reuse (both in the current and future projects). For more information on coupling, see the appropriate chapter in “Code Complete”.
Guideline:
Design good modules! Good modules exhibit loose coupling. That is, there are only a few, well-defined (visible) interfaces between the module and the outside world. Most data is private, accessible only through accessor functions (see information hiding below). Furthermore, the interface should be flexible.
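The flexibility trade-off described under coupling can be sketched in HLA as follows. This is only an illustration under assumed names: the record type "triple", the procedures "sumRecord" and "sumValues", and all variables are hypothetical.

```hla
program couplingDemo;
#include( "stdlib.hhf" )

type
    triple: record
        x: uns32;
        y: uns32;
        z: uns32;
    endrecord;

static
    rec3: triple;
    v1:   uns32 := 10;
    v2:   uns32 := 20;
    v3:   uns32 := 30;

    // Less flexible: the caller must own (or build) a "triple" record.
    // The sum is left in EAX.
    procedure sumRecord( var t:triple );
    begin sumRecord;

        push( ebx );
        mov( t, ebx );                      // EBX = address of the record
        mov( (type triple [ebx]).x, eax );
        add( (type triple [ebx]).y, eax );
        add( (type triple [ebx]).z, eax );
        pop( ebx );

    end sumRecord;

    // More flexible (but higher cardinality): three scalar parameters.
    // The sum is left in EAX.
    procedure sumValues( x:uns32; y:uns32; z:uns32 );
    begin sumValues;

        mov( x, eax );
        add( y, eax );
        add( z, eax );

    end sumValues;

begin couplingDemo;

    mov( 1, rec3.x );
    mov( 2, rec3.y );
    mov( 3, rec3.z );

    sumRecord( rec3 );                      // works only with a record
    stdout.put( "record sum = ", (type uns32 eax), nl );

    sumValues( v1, v2, v3 );                // separate values work...
    stdout.put( "scalar sum = ", (type uns32 eax), nl );

    sumValues( rec3.x, rec3.y, rec3.z );    // ...and so do record fields
    stdout.put( "field sum  = ", (type uns32 eax), nl );

end couplingDemo;
```

Calling sumRecord with v1, v2, and v3 would require copying them into a dummy record first, which is exactly the inconvenience the flexibility criterion warns about; the price of sumValues is two extra parameters.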
Guideline:
Design good modules! Good modules exhibit information hiding. Code outside the module should only have access to the module through a small set of public routines. All data should be private to that module. A module should implement an abstract data type. All interface to the module should be through a well-defined set of operations.
C.3.1.3 Physical Organization of Modules
Many languages provide direct support for modules (e.g., units in HLA, packages in Ada, modules in Modula-2, and units in Delphi/Pascal). Some languages provide only indirect support for modules (e.g., a source file in C/C++). Others, like BASIC, don’t really support modules, so you would have to simulate them by physically grouping objects together and exercising some discipline. The primary mechanism in HLA for hiding names from other modules is to implement a module as an individual source file and publish only those names that are part of the module’s interface to the outside world (i.e., EXTERNAL directives in a header file).
Rule:
Each module should completely reside in a single source file. If size considerations prevent this, then all the source files for a given module should reside in a subdirectory specifically designated for that module.
Some people have the crazy idea that modularization means putting each function in a separate source file. Such physical modularization generally impairs the readability of a program more than it helps. Strive instead for logical modularization, that is, defining a module by its actions rather than by source code syntax (e.g., separating out functions).
This document does not address the decomposition of a problem into its modular components. Presumably, you can already handle that part of the task. There are a wide variety of texts on this subject if you feel weak in this area.
C.3.1.4 Module Interface
In any language system that supports modules, there are two primary components of a module: the interface component that publicizes the module’s visible names and the implementation component that contains the actual code, data, and private objects. HLA (like most assemblers) uses a scheme that is very similar to the one C/C++ uses. There are directives that let you import and export names. Like C/C++, you could place these directives directly in the related source modules. However, such code is difficult to maintain (since you need to change the directives in every file whenever you modify a public name). The solution, as adopted in the HLA programming language, is to use header files. Header files contain all the public definitions and exports (as well as common data type definitions and constant definitions). The header file provides the interface to the other modules that want to use the code present in the implementation module.
The HLA EXTERNAL attribute is perfect for creating interface/header files. When you use EXTERNAL within a source module that defines a symbol, EXTERNAL behaves like a public directive, exporting the name to other modules. When you use EXTERNAL within a source module that refers to an external name, EXTERNAL declares the object to be supplied in a different module. This lets you place an EXTERNAL declaration of an object in a single header file and include this file into both the modules that import and export the public names.
Rule:
Keep all module interface directives (EXTERNAL) in a single header file for a given module. Place any other common data type definitions and constant definitions in this header file as well.
Guideline:
There should only be a single header file associated with any one module (even if the module has multiple source files associated with it). If, for some reason, you feel it is necessary to have multiple header files associated with a module, you should create a single file that includes all of the other interface files. That way a program that wants to use all the header files need only include the single file.
When designing header files, make sure you can include a file more than once without ill effects (e.g., duplicate symbol errors). The traditional way to do this is to put a #IF statement like the following around all the statements in a header file:

    // Module: MyHeader.hhf

    #if( !@defined( MyHeader_hhf ) )

        ?MyHeader_hhf := true;    // Actual type and value don’t really matter.
            .
            .
        // Statements in this header file.
            .
            .
    #endif

The first time a source file includes "MyHeader.hhf" the symbol "MyHeader_hhf" is undefined. Therefore, the assembler will process all the statements in the header file. In successive include operations (during the same assembly) the symbol "MyHeader_hhf" is already defined, so the assembler ignores the body of the include file.
Why would you ever include a file twice? Easy. Some header files may include other header files. By including the file "YourHeader.hhf" a module might also be including "MyHeader.hhf" (assuming "YourHeader.hhf" contains the appropriate include directive). Your main program, which includes "YourHeader.hhf", might also need "MyHeader.hhf", so it explicitly includes this file, not realizing that "YourHeader.hhf" has already processed "MyHeader.hhf", thereby causing symbol redefinitions.
Rule:
Always put an appropriate #IF statement around all the definitions in a header file to allow multiple inclusion of the header file without ill effect.
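For comparison, here is how the same multiple-inclusion guard might look in C, whose header scheme the text describes as very similar to HLA's. This is only a sketch and not HLA code: the header name my_header.h, the guard symbol MY_HEADER_H, the type my_point_t, and the function my_header_add are all hypothetical. The header body is pasted twice into one file to simulate a second inclusion.

```c
/* Simulates including a hypothetical "my_header.h" twice in one
   translation unit. Without the guard, the second copy would cause
   duplicate-definition errors. */

/* ---- first inclusion of my_header.h: body is processed ---- */
#ifndef MY_HEADER_H
#define MY_HEADER_H   /* the value is irrelevant; only "defined" matters */

typedef struct { int x; int y; } my_point_t;

static int my_header_add(int a, int b)
{
    return a + b;
}

#endif /* MY_HEADER_H */

/* ---- second inclusion of my_header.h: body is skipped ---- */
#ifndef MY_HEADER_H
#define MY_HEADER_H

typedef struct { int x; int y; } my_point_t;

static int my_header_add(int a, int b)
{
    return a + b;
}

#endif /* MY_HEADER_H */
```

Deleting the two guard lines from the second copy should make a C compiler reject the file, which is the C counterpart of the symbol redefinitions described above.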
Guideline:
Use the ".hhf" suffix for HLA header/interface files.
Rule:
Include files for library functions on a system should exist as ".hhf" files and should appear in the "\include" or "\hla\include" subdirectory.
Guideline:
"\hla\include" is probably a better choice if you’re using multiple languages since those other languages may need to put files in a "\include" directory.
Exception:
It’s probably reasonable to leave the HLA Standard Library’s "stdlib.hhf" file in the "\hla\include" directory since most people expect it there.
You can also prevent multiple inclusion of a file by using the #INCLUDEONCE directive. However, it’s safer to use the #IF..#ENDIF approach since that doesn’t rely on the user of your include file to use the right directive.
C.4 Program Unit Organization
A program unit is any procedure, function, coroutine, iterator, subroutine, subprogram, routine, or other term that describes a section of code that abstracts a set of common operations on the computer. This text will simply use the term procedure or routine to describe these concepts. Routines are closely related to modules, since they tend to be the major component of a module (along with data, constants, and types). Hence, many of the attributes that apply to a module also apply to routines. The following paragraphs, at the expense of being redundant, repeat the earlier definitions so you don’t have to flip back to the previous sections.
C.4.1 Routine Cohesion
Routines exhibit the following kinds of cohesion (listed from good to bad; these are mostly identical to the kinds of cohesion that modules exhibit):
• Functional or logical cohesion exists if the routine accomplishes exactly one (simple) task.
• Sequential or pipelined cohesion exists when a routine does several sequential operations that must be performed in a certain order with the data from one operation being fed to the next in a “filter-like” fashion.
• Global or communicational cohesion exists when a routine performs a set of operations that make use of a common set of data, but are otherwise unrelated.
• Temporal cohesion exists when a routine performs a set of operations that need to be done at the same time (though not necessarily in the same order). A typical initialization routine is an example of such code.
• Procedural cohesion exists when a routine performs a sequence of operations in a specific order, but the only thing that binds them together is the order in which they must be done. Unlike sequential cohesion, the operations do not share data.
• State cohesion occurs when several different (unrelated) operations appear in the same routine and a state variable (e.g., a parameter) selects the operation to execute. Typically such routines contain a case (switch) or if..elseif..elseif... statement.
• No cohesion exists if the operations in a routine have no apparent relationship with one another.
The first three forms of cohesion above are generally acceptable in a program. The fourth (temporal) is probably okay, but you should rarely use it. The last three forms should almost never appear in a program. For some reasonable examples of routine cohesion, you should consult “Code Complete”.
Guideline:
All routines should exhibit good cohesiveness. Functional cohesiveness is best, followed by sequential and global cohesiveness. Temporal cohesiveness is okay on occasion. You should avoid the other forms.
C.4.2 Routine Coupling
Coupling refers to the way that two routines communicate with one another. There are several criteria that define the level of coupling between two routines; again these are identical to the types of coupling that modules exhibit:
• Cardinality- the number of objects communicated between two routines. The fewer objects the better (i.e., fewer parameters).
• Intimacy- how “private” is the communication? Parameter lists are the most private form; private data fields in a class or object are the next level; public data fields in a class or object are next; global variables are even less intimate; and passing data in a file or database is the least intimate connection. Well-written routines exhibit a high degree of intimacy.
• Visibility- this is somewhat related to intimacy above. It refers to how visible the data that you pass between two routines is to the entire system. For example, passing data in a parameter list is direct and very visible (you always see the data the caller is passing in the call to the routine); passing data in global variables makes the transfer less visible (you could have set up the global variable long before the call to the routine). Another example is passing simple (scalar) variables rather than loading up a bunch of values into a structure/record and passing that structure/record to the callee.
• Flexibility- this refers to how easy it is to make the connection between two routines that may not have been originally intended to call one another. For example, suppose you pass a structure containing three fields into a function. If you want to call that function but you only have three data objects, not the structure, you would have to create a dummy structure, copy the three values into the fields of that structure, and then call the routine. On the other hand, had you simply passed the three values as separate parameters, you could still pass in structures (by specifying each field) as well as call the routine with separate values.
A function is loosely coupled if it exhibits low cardinality, high intimacy, high visibility, and high flexibility. Often, these features are in conflict with one another (e.g., increasing the flexibility by breaking out the fields from a structure [a good thing] will also increase the cardinality [a bad thing]). It is the traditional goal of any engineer to choose the appropriate compromises for each individual circumstance; therefore, you will need to carefully balance each of the four attributes above.
A program that uses loose coupling generally contains fewer errors per KLOC (thousands of lines of code). Furthermore, routines that exhibit loose coupling are easier to reuse (both in the current and future projects). For more information on coupling, see the appropriate chapter in “Code Complete”.
Guideline:
Coupling between routines in source code should be loose.
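The flexibility-versus-cardinality trade-off described above can be sketched in C (used here only as a compact stand-in for the equivalent HLA procedures). The type box_t and both function names are hypothetical, invented for this illustration.

```c
/* Hypothetical record type used only for this illustration. */
typedef struct {
    int width;
    int height;
    int depth;
} box_t;

/* Less flexible: a caller holding three loose values must first
   build a box_t just to make the call. Cardinality is low (one
   parameter). */
static int volume_from_struct(const box_t *b)
{
    return b->width * b->height * b->depth;
}

/* More flexible: works with loose values and with structure fields
   alike, at the cost of higher cardinality (three parameters). */
static int volume_from_values(int width, int height, int depth)
{
    return width * height * depth;
}
```

A caller that already has a box_t can still use the second form by naming each field, for example volume_from_values( b.width, b.height, b.depth ), whereas the first form forces every caller to construct the structure.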
C.4.3 Routine Size
Sometime in the 1960’s, someone decided that programmers could only look at one page in a listing at a time, therefore routines should be a maximum of one page long (66 lines, at the time). In the 1970’s, when interactive computing became popular, this was adjusted to 24 lines -- the size of a terminal screen. There is, however, very little empirical evidence to suggest that small routine size is a good attribute. In fact, several studies on code containing artificial constraints on routine size indicate just the opposite -- shorter routines often contain more bugs per KLOC (see footnote 6).
6. This happens because shorter functions invariably have stronger coupling, leading to integration errors.
A routine that exhibits functional cohesiveness is the right size, almost regardless of the number of lines of code it contains. You shouldn’t artificially break up a routine into two or more subroutines (e.g., sub_partI and sub_partII) just because you feel a routine is getting to be too long. First, verify that your routine exhibits strong cohesion and loose coupling. If this is the case, the routine is not too long. Do keep in mind, however, that a long routine is probably a good indication that it is performing several actions and, therefore, does not exhibit strong cohesion.
Of course, you can take this too far. Most studies on the subject indicate that routines in excess of 150-200 lines of code tend to contain more bugs and are more costly to fix than shorter routines. Note, by the way, that you do not count blank lines or lines containing only comments when counting the lines of code in a program.
Also note that most studies involving routine size deal with HLLs. A comparable HLA routine will contain more lines of code than the corresponding HLL routine. Therefore, you can expect your routines in assembly language to be a little longer.
Guideline:
Do not let artificial constraints affect the size of your routines. If a routine exceeds about 200-250 lines of code, make sure the routine exhibits functional or sequential cohesion. Also look to see if there aren’t some generic subsequences in your code that you can turn into stand alone routines.
Rule:
Never shorten a routine by dividing it into n parts that you would always call one after another in the same sequence.
C.5 Statement Organization
In an assembly language program, the author must work extra hard to make a program readable. By following a large number of rules, you can produce a program that is readable. However, by breaking a single rule, no matter how many other rules you’ve followed, you can render a program unreadable. Nowhere is this more true than in how you organize the statements within your program.
C.5.1 Writing “Pure” Assembly Code
Consider the following example taken from "The Art of Assembly Language Programming/DOS Edition" and converted to HLA:
The Microsoft Macro Assembler is a free form assembler. The various fields of an assembly language statement may appear in any column (as long as they appear in the proper order). Any number of spaces or tabs can separate the various fields in the statement. To the assembler, the following two code sequences are identical:
______________________________________________________
    mov( 0, ax );
    mov( ax, bx );
    add( dx, ax );
    mov( ax, cx );
______________________________________________________
mov( 0,      ax);
        mov( ax,   bx);
  add(   dx, ax);
             mov( ax, cx );
______________________________________________________
The first code sequence is much easier to read than the second (if you don't think so, perhaps you should go see a doctor!). With respect to readability, the judicious use of spacing within your program can make all the difference in the world.
While this is an extreme example, do note that it only takes a few mistakes to have a large impact on the readability of a program.
HLA is a free-form assembler insofar as it does not place stringent formatting requirements on its statements. For example, you can put multiple statements on a single line as well as spread a single statement across multiple lines. However, the freedom to arrange these statements in any manner is one of the primary contributors to hard-to-read assembly language programs. Although HLA lets you enter your programs in free-form, there is absolutely no reason you cannot adopt a fixed format. Doing so generally helps make an assembly language program much easier to read. Here are the rules you should use:
Guideline:
Only place one statement per source line.
Rule:
Within a given block of code, all mnemonics should start in the same column.
Exception:
See the indentation rules appearing later in this documentation.
Guideline:
Try to always start the comment fields on adjacent source lines in the same column (note that it is impractical to always start the comment field in the same column throughout a program).
Most people learn a high level language prior to learning assembly language. They have been firmly taught that readable (HLL) programs have their control structures properly indented to show the structure of the program. Indentation works great when you have a block structured language. In old-fashioned assembly language this scheme doesn’t work; one of the principal benefits of HLA is that it lets you continue to use the indentation schemes you’re familiar with in HLLs like C/C++ and Pascal. However, this assumes that you’re using the HLA high level control structures. If you choose to work in “pure” assembly language, then these rules don’t apply. The following discussion assumes the use of “pure” assembly language code; we’ll address HLA’s high level control statements later.
If you need to set off a sequence of statements from surrounding code, the best thing you can do is use blank lines in your source code. For a small amount of detachment, to separate one computation from another for example, a single blank line is sufficient. To really show that one section of code is special, use two, three, or even four blank lines to separate one block of statements from the surrounding code. To separate two totally unrelated sections of code, you might use several blank lines and a row of dashes or asterisks to separate the statements. E.g.,

        mov( FileSpec, eax );
        mov( 0, cl );
        call MyFunction;
        jc Error;


    //*********************************************


        mov( &fileRecords, edi );
        mov( &files, ebx );
        sub( 2, ebx );

Guideline:
Use blank lines to separate special blocks of code from the surrounding code. Use an aesthetic looking row of asterisks or dashes if you need a stronger separation between two blocks of code (do not overdo this, however).
If two sequences of assembly language statements correspond to roughly two HLL statements, it’s generally a good idea to put a blank line between the two sequences. This helps clarify the two segments of code in the reader’s mind. Of course, it is easy to get carried away and insert too much white space in a program, so use some common sense here.
Guideline:
If two sequences of code in assembly language correspond to two adjacent statements in a HLL, then use a blank line to separate those two assembly sequences (assuming the sequences are really short).
A common problem in any language (not just assembly language) is a line containing a comment that is adjacent to one or two lines containing code. Such a program is very difficult to read because it is hard to determine where the code ends and the comment begins (or vice-versa). This is especially true when the comments contain sample code. It is often quite difficult to determine if what you’re looking at is code or comments; hence the following enforced rule:
Enforced Rule:
Always put at least one blank line between code and comments (assuming, of course, the comment is sitting on a line by itself; that is, it is not an endline comment; see footnote 7).
C.5.2 Using HLA’s High Level Control Statements
Since HLA’s high level control statements are so similar to high level language control statements, it’s not surprising to discover that you’ll use the same formatting for HLA’s statements as you would with those other HLLs. Most of these statements compile to very efficient machine code (usually matching what you’d write yourself if you were writing “pure” assembly code). Since their use can make your programs more readable, you should use them whenever practical.
Guideline:
Use the HLA high level control structures when they are appropriate in your programs.
There are two problems advanced assembly programmers have with high level control structures: (1) the compiler for such statements (e.g., HLA) doesn’t always generate the best code, and (2) the use of such statements encourages inefficient coding on the programmer’s part.
HLA’s control structures are relatively limited, so point (1) above isn’t as big a problem as you might expect. Nevertheless, there will certainly be situations where HLA does not generate the exact instruction sequence you would write for a given control construct. Therefore, it’s a good idea to become familiar with the low-level code that HLA emits for each of the control structures so that you can intelligently choose whether to use a high level or low level control structure in a given situation. A later appendix explains how HLA generates code for the high level control structures; you should study this material. Also note that HLA emits MASM compatible assembly code, so you can certainly study HLA’s output if you’ve got any questions about the code HLA generates.
Point (2) above is something that HLA has no control over. It is quite true that if you write “C code with MOV instructions” in HLA, the code probably isn’t going to be as efficient as pure assembly code. However, with a little discipline you can prevent this problem from occurring.
One of the benefits of using the high level control structures HLA provides is that you can now use indentation of your statements to better show the structure of the program. Since HLA’s high level control structures are very similar to those found in traditional high level languages, you can use well-established programming conventions when indenting statements in your HLA programs. Here are some suggestions:
Rule:
Indent statements within a high-level control block four spaces. The ENDxxxx clause that matches the statement should begin in the same column as the statement that starts a block.

    // Example of nesting an IF..THEN..ENDIF statement:

    if( eax = 0 ) then

        << statements >>

    endif;    // endif should be at the same level as the if statement.

Guideline:
Avoid putting multiple statements on the same line.
7. See the next section concerning comments for more information.
The HLA programming language contains nine flow-of-control statements: two conditional selection statements (IF..THEN..ELSEIF..ELSE and SWITCH..CASE..DEFAULT..ENDSWITCH), five loops (WHILE..ENDWHILE, REPEAT..UNTIL, FOR..ENDFOR, FOREACH..ENDFOR, and FOREVER..ENDFOR), a program unit invocation (i.e., procedure call), and the statement sequence.
Rule:
If your code contains a chain of if..elseif..elseif.......elseif..... statements, do not use the final else clause to handle a remaining case. Only use the final else to catch an error condition. If you need to test for some value in an if..elseif..elseif.... chain, always test the value in an if or elseif statement.
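A short C sketch of this rule follows (the HLA if..elseif..else chain has the same shape). The command codes and the function dispatch are hypothetical. Every legitimate value is tested explicitly, and the final else only reports an unexpected value.

```c
/* Hypothetical command codes for illustration. */
enum { CMD_OPEN = 1, CMD_CLOSE = 2, CMD_READ = 3 };

/* Returns a handler id for each known command, or -1 for an
   unexpected value. */
static int dispatch(int cmd)
{
    int result;

    if (cmd == CMD_OPEN) {
        result = 10;
    } else if (cmd == CMD_CLOSE) {
        result = 20;
    } else if (cmd == CMD_READ) {   /* last valid case is still tested */
        result = 30;
    } else {                        /* reached only on an error */
        result = -1;
    }
    return result;
}
```

Had CMD_READ been handled by the bare else, a corrupted or newly added command code would silently be treated as a read request.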
The HLA Standard Library implements the multi-way selection statements (SWITCH) using a jump table. This means that the order of the cases within the selection statement is usually irrelevant. Placing the statements in a particular order rarely improves performance. Since the order is usually irrelevant to the compiler, you should organize the cases so that they are easy to read. There are two common organizations that make sense: sorted (numerically or alphabetically) or by frequency (the most common cases first). Either organization is readable; the drawback to ordering by frequency is that it is often difficult to predict which cases the program will execute most often.
Guideline:
When using multi-way selection statements (case/switch) sort the cases numerically (alphabetically) or by frequency of expected occurrence.
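As a small illustration in C, the cases below are listed in numeric order so a reader can locate any one of them at a glance. The status codes and the function status_name are hypothetical, and whether a given C compiler turns this switch into a jump table is an assumption that depends on the compiler.

```c
/* Hypothetical status codes, handled in ascending numeric order. */
static const char *status_name(int code)
{
    switch (code) {
        case 0:  return "ok";
        case 1:  return "warning";
        case 2:  return "error";
        case 3:  return "fatal";
        default: return "unknown";
    }
}
```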
There are three general categories of looping constructs available in common high-level languages: loops that test for termination at the beginning of the loop (e.g., WHILE), loops that test for loop termination at the bottom of the loop (e.g., REPEAT..UNTIL), and those that test for loop termination in the middle of the loop (e.g., FOREVER..ENDFOR). It is possible to simulate any one of these loops using any of the others. This is particularly trivial with the FOREVER..ENDFOR construct:

    /* Test for loop termination at beginning of FOREVER..ENDFOR */

    forever

        breakif( ax = y );
            .
            .
            .
    endfor;

    /* Test for loop termination in the middle of FOREVER..ENDFOR */

    forever
            .
            .
            .
        breakif( ax = y );
            .
            .
            .
    endfor;

    /* Test for loop termination at the end of FOREVER..ENDFOR */

    forever
            .
            .
            .
        breakif( ax = y );

    endfor;

Given the flexibility of the FOREVER..ENDFOR control structure, you might question why one would even burden a compiler with the other loop statements. However, using the appropriate looping structure makes a program far more readable; therefore, you should never use one type of loop when the situation demands another. If someone reading your code sees a FOREVER..ENDFOR construct, they may think it’s okay to insert statements before or after the exit statement in the loop. If your algorithm truly depends on WHILE..ENDWHILE or REPEAT..UNTIL semantics, the program may now malfunction.
Rule:
Always use the most appropriate type of loop (categorized by termination test position). Never force one type of loop to behave like another.
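The three test positions map onto C's while, do..while, and for(;;) with a break, which correspond roughly to HLA's WHILE..ENDWHILE, REPEAT..UNTIL, and FOREVER..ENDFOR. The sketch below picks each form because the task demands it; the three function names are hypothetical.

```c
#include <stddef.h>

/* Test at the top: the body may run zero times (no leading spaces). */
static size_t count_leading_spaces(const char *s)
{
    size_t n = 0;
    while (s[n] == ' ') {
        n++;
    }
    return n;
}

/* Test at the bottom: the body must run at least once, because
   even the value 0 has one decimal digit. */
static int count_digits(unsigned value)
{
    int digits = 0;
    do {
        digits++;
        value /= 10;
    } while (value != 0);
    return digits;
}

/* Test in the middle: fetch an item, decide whether to stop,
   then process it. The array must end with a 0 sentinel. */
static int sum_until_zero(const int *a)
{
    int sum = 0;
    size_t i = 0;
    for (;;) {
        int v = a[i];       /* work before the exit test */
        if (v == 0) {
            break;
        }
        sum += v;           /* work after the exit test */
        i++;
    }
    return sum;
}
```

Writing count_digits with a top-tested loop would need a special case for zero, which is exactly the kind of forcing the rule warns against.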
Many languages provide a special case of the while loop that executes some number of times specified upon first encountering the loop (a definite loop rather than an indefinite loop). This is the “for” loop in most languages. The vast majority of the time a for loop sequences through a fixed range of values, incrementing or decrementing the loop control variable by one. Therefore, most programmers automatically assume this is the way a for loop will operate until they take a closer look at the code. Since most programmers immediately expect this behavior, it makes sense to limit FOR loops to these semantics. If some other looping mechanism is desirable, you should use a WHILE loop to implement it (since the for loop is just a special case of the while loop). There are other reasons behind this decision as well.
Rule:
“FOR” loops should always use an ordinal loop control variable (e.g., integer, char, boolean, enumerated type) and should always increment or decrement the loop control variable by one.
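A brief C sketch of the distinction, with hypothetical function names: the first loop steps an integer control variable by one and so fits the for form, while the second changes its variable by halving and is therefore written as a while loop.

```c
/* Definite loop: integer control variable, incremented by one. */
static int sum_range(int first, int last)
{
    int sum = 0;
    int i;
    for (i = first; i <= last; i++) {
        sum += i;
    }
    return sum;
}

/* The control value is halved rather than stepped by one, so a
   while loop states the intent more honestly than a for loop. */
static int count_halvings(unsigned n)
{
    int steps = 0;
    while (n > 1) {
        n /= 2;
        steps++;
    }
    return steps;
}
```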
Most people expect the execution of a loop to begin with the first statement at the top of the loop. Therefore,
Rule:
All loops should have one entry point. The program should enter the loop with the instruction at the top of the loop.
Likewise, most people expect a loop to have a single exit point, especially if it’s a WHILE or REPEAT..UNTIL loop. They will rarely look closely inside a loop body to determine if there are “break” statements within the loop once they find one exit point. Therefore,
Guideline:
Loops with a single exit point are more easily understood.
Whenever a programmer sees an empty loop, the first thought is that something is missing. Therefore,
Guideline:
Avoid empty loops. If testing the loop termination condition produces some side effect that is the whole purpose of the loop, move that side effect into the body of the loop. If a loop truly has an empty body, place a comment like "/* nothing */" within your code.
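The guideline can be sketched in C as follows; both function names are hypothetical. The first routine keeps the increment in the loop body instead of hiding it in the test, and the second shows a genuinely empty body marked with a comment.

```c
#include <stddef.h>

/* Discouraged form, shown only in this comment:
       while (s[len++] != '\0') ;
   The increment hides in the test and the body is empty. */
static size_t string_length(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0') {
        len++;              /* the real work lives in the body */
    }
    return len;
}

/* A loop whose body really is empty: spin until another agent
   (e.g., an interrupt handler) sets the flag. */
static void wait_until_set(const volatile int *flag)
{
    while (*flag == 0) {
        /* nothing */
    }
}
```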
Even if the loop body is not empty, you should avoid side effects in a loop termination expression. When someone else reads your code and sees a loop body, they may skim right over the loop termination expression and start reading the code in the body of the loop. If the (correct) execution of the loop body depends upon the side effect, the reader may become confused since s/he did not notice the side effect earlier. The presence of side effects (that is, having the loop termination expression compute some other value beyond whether the loop should terminate or repeat) indicates that you’re probably using the wrong control structure. Consider the following WHILE loop in HLA that is easily corrected:

    while( mov( stdin.geti32(), ecx ) != 0 ) do

        << statements >>

    endwhile;

A better implementation of this code fragment would be to use a FOREVER..ENDFOR construct:

    forever

        stdin.geti32();
        mov( eax, ecx );
        breakif( eax = 0 );
            .
            .
            .
    endfor;

Rule:
Avoid side-effects in the computation of the loop termination expression (others may not be expecting such side effects). Also see the guideline about empty loops.
Like functions, loops should exhibit functional cohesion. That is, the loop should accomplish exactly one thing. It’s very tempting to initialize two separate arrays in the same loop. You have to ask yourself, though, “what do you really accomplish by this?” You save about four machine instructions on each loop iteration, that’s what. That rarely accounts for much. Furthermore, the operations on those two arrays are now tied together; you cannot change the size of one without changing the size of the other. Finally, someone reading your code has to remember two things the loop is doing rather than one.
Guideline:
Make each loop perform only one function.
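The two-array case mentioned above might look like this in C; the array names, sizes, and fill values are hypothetical. Because each array has its own loop, either size can change without touching the other loop.

```c
#include <stddef.h>

#define NAME_COUNT  4   /* hypothetical sizes; they can now differ */
#define SCORE_COUNT 6

static char names[NAME_COUNT];
static int  scores[SCORE_COUNT];

/* One loop, one job: fill the names array. */
static void init_names(void)
{
    size_t i;
    for (i = 0; i < NAME_COUNT; i++) {
        names[i] = '?';
    }
}

/* A separate loop for the separate job of filling the scores. */
static void init_scores(void)
{
    size_t i;
    for (i = 0; i < SCORE_COUNT; i++) {
        scores[i] = -1;
    }
}
```

A combined loop would have to run to the larger of the two bounds and guard each store, or else force both arrays to the same size.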
Programs are much easier to read if you read them from left to right, top to bottom (beginning to end). Programs that jump around quite a bit are much harder to read. Of course, the jmp (goto) statement is well-known for its ability to scramble the logical flow of a program, but you can produce equally hard to read code using other, structured, statements in a language. For example, a deeply nested set of if statements, some with and some without ELSE clauses, can be very difficult to follow because of the number of possible places the code can transfer depending upon the result of several different boolean expressions.
Rule:
Code, as much as possible, should read from top to bottom.
Rule:
Related statements should be grouped together and separated from unrelated statements with whitespace or comments.
In theory, a line of source code can be arbitrarily long. In practice, there are several practical limitations on source code lines. Paramount is the amount of text that will fit on a given terminal display device (we don’t all have 21” high resolution monitors!) and what can be printed on a typical sheet of paper. Even with small fonts and wide carriage printers, keep in mind that many people like to print listings two-up or three-up in order to save paper. If this isn’t enough to suggest an 80 character limit on source lines, McConnell suggests that longer lines are harder to read (remember, people tend to look at only the left side of the page while skimming through a listing).
Enforced Rule:
Source code lines will not exceed 80 characters in length.
If a statement approaches the maximum limit of 80 characters, it should be broken up at a reasonable point and split across two lines. If the line is a control statement that involves a particularly long logical expression, the expression should be broken up at a logical point (e.g., at the point of a low-precedence operator outside any parentheses) and the remainder of the expression placed underneath the first part of the expression. E.g., (note that the following involves constant expressions; run-time expressions generally aren’t very long):

    #if
    (
            ( ( x + y * z) < ( ComputeProfits(1980,1990) / 1.0775 ) )
        &&  ( ValueOfStock[ ThisYear ] >= ValueOfStock[ LastYear ] )
    )

        << statements >>

    #endif

Many statements (e.g., IF, WHILE, FOR, and function or procedure calls) contain a keyword followed by a parenthesis. If the expression appearing between the parentheses is too long to fit on one line, consider putting the opening and closing parentheses in the same column as the first character of the start of the statement and indenting the remaining expression elements. The example above demonstrates this for the "IF" statement. The following examples demonstrate this technique for other statements:

    while
    (
        SomeFunctionReturningAValueInEAX
        (
            with,
            lots,
            of,
            parameters
        )
    ) do

        << statements >>

    endwhile;

    fileio.put
    (
        outputFileHandle,
        "Error in module ",
        ModuleName,
        " at line #",
        LineNumber,
        ", encountered illegal value",
        nl
    );

Guideline:
For statements that are too long to fit on one physical 80-column line, you should break the statement into two (or more) lines at points in the statement that will have the least impact on the readability of the statement. This situation usually occurs immediately after low-precedence operators or after commas.
If a procedure, function, or other program unit has a particularly long actual or formal parameter list, each parameter should be placed on a separate line. The following examples demonstrate a procedure declaration and call using this technique:

    procedure MyFunction
    (
            NumberOfDataPoints: int32;
            X1Root:             real32;
            X2Root:             real32;
        var YIntercept:         real32
    );

    MyFunction
    (
        GetNumberOfPoints(RootArray),   // Assume "RETURNS" value is EAX.
        RootArray[ EBX*4 ],
        RootArray[ ECX*4 ],
        Solution
    );

Rule:
If an actual or formal parameter list is too long to fit a function call or definition on a single line, then place each parameter on a separate line and align them so they are easy to read.
Guideline:
If a boolean expression exceeds the length of the source line (usually 80 characters), then break the source line into pieces and align the parentheses associated with the statement underneath the start of the statement.
This usually isn’t a problem in HLA since expressions are very limited. However, if you call a function with a long parameter list you could run into this problem. One area where this problem does occur is when you’re using HLA’s hybrid control structures. For such sequences you should always place the statements associated with the boolean expression on separate lines and align the braces with the high level control structure, e.g.,

    if
    {
        cmp( ax, bx );
        jne true;
        cmp( ax, 5 );
        jl false;
        cmp( bx, 0 );
        je false;
    }

        << statements >>

    endif;

Rule:
Always put a blank line between a high level control statement and the nested statements associated with that statement. Likewise, put a blank line between the end of the nested statements and the corresponding ENDxxx clause of the statement. E.g., if( ax = 0 ) then