Introduction to Embedded Systems Interfacing to the Freescale 9S12
Jonathan W. Valvano University of Texas at Austin
Introduction to Embedded Systems: Interfacing to the Freescale 9S12, 1st Edition
© 2010 Cengage Learning
Director, Global Engineering Program: Chris Carson
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means–graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, information storage and retrieval systems, or in any other manner–except as may be permitted by the license terms herein.
Senior Developmental Editor: Hilda Gowans
For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
Editorial Assistant: Jennifer Dinsmore
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to
[email protected].
Marketing Specialist: Lauren Betsos
Library of Congress Control Number: 2009923271
Media Editor: Chris Valentine
ISBN-13: 978-0-495-41137-6 ISBN-10: 0-495-41137-X
Director, Content and Media Production: Barbara Fuller-Jacobsen
Cengage Learning 200 First Stamford Place, Suite 400 Stamford, CT 06902 USA
Content Project Manager: Emily Nesheim
Production Service: RPK Editorial Services, Inc.
Copyeditor: Shelley Gerger-Knecthl
Proofreader: Harlan James
Indexer: Shelley Gerger-Knecthl
Compositor: Integra Software Services
Senior Art Director: Michelle Kunkler
Internal Designer: John Edeen and Carmela Periera
Cover Designer: Andrew Adams
Cover Image: © Janaka/Shutterstock
Permissions Account Manager, Text: Mardell Glinski Schultz
Permissions Account Manager, Images: John Hill
Text and Image Permissions Researcher: Kristiina Paul
Senior First Print Buyer: Doug Wilke
Printed in Canada 1 2 3 4 5 6 7 13 12 11 10 09
Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region. Cengage Learning products are represented in Canada by Nelson Education Ltd. For your course and learning solutions, visit www.cengage.com/engineering. Purchase any of our products at your local college store or at our preferred online store www.ichapters.com.
Preface

Embedded computer systems are electronic systems that include a microcomputer to perform specific dedicated tasks. The computer is hidden inside these products. Embedded systems are ubiquitous. Every week millions of tiny computer chips come pouring out of factories like Freescale, Microchip, Philips, Texas Instruments, Silicon Labs, and Mitsubishi, finding their way into our everyday products. Our global economy, our production of food, our transportation systems, our military defense, our communication systems, and even our quality of life depend on the efficiency and effectiveness of these embedded systems. Engineers play a major role in all phases of this effort: planning, design, analysis, manufacturing, and marketing.

This book provides an introduction to embedded systems, including both hardware interfacing and software fundamentals. This book employs a bottom-up educational approach. The overall educational objective is to allow students to discover how the computer interacts with its environment. It will provide hands-on experiences of how an embedded system could be used to solve Electrical Engineering (EE) problems. The focus will be on understanding and analysis, with an introduction to design. Optical sensors, motors, sampling ADCs, and DACs are the chosen mechanisms to bridge the Computer Engineering (CE) and EE worlds. EE concepts include Ohm's Law, LED voltage/current, resistance measurement, and stepper motor control. CE concepts include I/O device drivers, debugging, stacks, queues, local variables, and interrupts.

This book is based on the Freescale 9S12. This book can be used effectively with any of the 9S12 derivatives, such as 9S12C32, 9S12DG256, 9S12DP512, and 9S12E128. The hardware construction is performed on a breadboard and debugged using a multimeter (students learn to measure voltage and resistance). Software is developed in 9S12 assembly; labs may be simulated-only or first simulated and then run on the real 9S12 system. Software debugging occurs during the simulation stage. Device testing occurs on the final product.

One way to sort the broad range of topics within EE and CE is to group them into three categories: components, interfaces, and systems. Electrical and Computer Engineering curricula devote considerable effort to teaching how to design the components within a system. Components include physical devices, analog circuits, digital circuits, power circuits, digital signal processing, data structures, and software algorithms. Interfacing in general, and this book in particular, address the important task of connecting these components together. So, one effective way to educate engineering students is to first teach them how to build components, then teach them how to connect components together (this book). After the student learns how to build things and connect them together, the student can be taught how to build systems. Of course, once a system is complete, it can be interfaced with other systems to solve more complex problems.

The book is essentially organized into three parts. Chapters 1 through 4 provide a basic introduction to computer architecture, representation of information, and assembly language programming. Parallel ports, switches, and LEDs are presented early in Chapter 2 so that students can write software that actually does something. Chapters 5, 6, 7, and 10 provide an in-depth treatment of software design as it applies to embedded systems. Interfacing and applications of embedded systems are presented in Chapters 8, 9, 11, and 12.
Objectives of the Book

The overall objective of this book is to present basic computer architecture, teach assembly language programming, and present an introduction to interfacing. Most universities teach assembly language programming not because employers wish to hire engineers and scientists ready to produce assembly code, but rather, because it affords a concrete approach for teaching how software works. Furthermore, an embedded system is an effective vehicle around which to introduce architecture, programming, and interfacing because the components are simple and inexpensive. The book describes both general processes and specific details involved in embedded system design. In particular, detailed case studies are used to illustrate fundamental concepts, and laboratory assignments are provided. The specific objectives of this book include the understanding of:

• The basic procedures involved in hardware/software simulation
• How information is represented on the computer
• The basic arithmetic and logical operations performed by the computer
• The fundamental architecture of the 9S12 family microcomputers
• The input/output operations and synchronization
• Assembly language programming: considering both function and style
• Simple hardware interfaces, including: switches, keyboards, LEDs, LCDs, DC motors, DACs, ADCs, and serial ports
• Debugging techniques: breakpoints, scanpoints, profiles, monitors, voltmeters, oscilloscopes, logic analyzers
• Program structures with a comparison between assembly and C
• Modular programming
• Elementary data structures
• Interrupt programming
This book does not discuss in detail every 9S12 instruction, but rather, it presents some of the instructions and uses them to discuss the general issues of representation of information, computer architecture, and developing embedded systems. In contrast, the Freescale programming reference guides do give details of each assembly instruction. In a similar manner, the Freescale microcomputer technical reference manuals explain all the I/O port functions. In other words, you will use this book along with the manuals from Freescale. A web site http://users.ece.utexas.edu/~valvano/ contains many reference documents for this book.
Prerequisites

This book is intended for an introductory laboratory course in microcomputer programming and/or microcomputer interfacing. It is assumed the student has some knowledge of programming, covering concepts such as conditionals, for-loops, while-loops, functions, parameter passing, and arrays. Specific knowledge of C is not required, but C programs are presented throughout the book in an effort to explain the assembly language programs. In addition, some prior knowledge of digital logic is desired, but not necessary, covering topics such as NOT gates, AND gates, OR gates, and D flip-flops. Students will need a fundamental knowledge of resistors, capacitors, and inductors, as typically covered in a freshman physics class on electromagnetics. Calculus is not required for this book. For a more advanced treatment of microcomputer interfacing and embedded systems, see Embedded Microcomputer Systems: Real Time Interfacing, Second Edition, by Jonathan W. Valvano, published by Thomson, © 2006.
Special Features

This book incorporates a number of special features specifically designed for the beginning engineer. An effective educational approach is to learn by doing. The first action component of the book is the use of checkpoints, which can be found throughout the book. A checkpoint is a short question meant as an immediate feedback mechanism for the reader to evaluate his or her level of comprehension. Checkpoints should be performed while reading the chapter. Answers to checkpoints are given in the solutions manual section at the back of the book.

The second action component of the book is the examples. Design examples are included within each chapter. The purpose of the examples is to apply knowledge presented in that chapter to solve a specific problem.

The third action component is the tutorials. Each tutorial includes a sequence of actions (specific things for the reader to do) and a list of questions. Tutorials are meant to be performed without supervision, and should be performed after reading the chapter, but before attempting the labs or homework. Answers to the tutorial questions are also given in the solutions manual section in the back of the book.

The most important action components of the book are the laboratory assignments, which can be found at the end of each chapter. Additional labs and the tutorials can be found on the web site http://users.ece.utexas.edu/~valvano/. Each laboratory solution can first be built and tested using the TExaS simulator, then downloaded and run on an actual 9S12. Only by performing the laboratory assignments can the reader truly assimilate the hardware and software concepts introduced in this book. Laboratories are meant to be performed under the supervision of an instructor, and involve the classic engineering processes of design, construction, debugging, and evaluation.

Homework problems can also be found at the end of each chapter. These problems are less detailed and are intended to evaluate the reader’s understanding of specific topics introduced in the chapter.
How to Teach a Course Based on This Book

The first step in the design of any course is to create a list of educational objectives. This book, along with the materials on the book web site, could be used to teach introductory microcomputer programming and/or microcomputer interfacing. Specific educational objectives that are supported in this book are microcomputer architecture, number systems, assembly language programming, debugging, I/O device interfacing, I/O device synchronization, subroutines, local variables, elementary data structures, and interrupts.

The next important decision to make is the organization of the student laboratory. Practical “hands on” experience is critical in the educational process. Unfortunately, space, staff, and money constraints force all of us to compromise, doing the best we can. On the other hand, the role of simulation is becoming increasingly important as the race for technological superiority is run with shorter and shorter design cycle times. Consequently, it is important to expose our students to all phases of engineering design, including problem specification, conceptualization, simulation, construction, and analysis. Universities that adopt this book will be allowed to download, rewrite, print out, and distribute the laboratory assignments presented in this book.

The first laboratory configuration is based entirely on material included with the book, and involves no extra costs. Each book allows the student to download and install the TExaS application on a single computer. Students, for the most part, work off campus and come to a TA station for help or lab grading. In this configuration, you can either develop software in assembly using the TExaS assembler or develop C programs using the special version of Metrowerks Codewarrior for the 9S12. The simulator itself becomes the platform on which the lab assignments are developed and tested.
A second laboratory configuration combines simulation with some real microcomputer experiments. Labs can be first simulated, then run on a real microcomputer. Students are given or loaned a 9S12 development board like the Dragon12 board from Wytec (http://www.evbplus.com/index.html) or the Adapt9S12 board from Technological Arts (http://www.technologicalarts.com). Students can work off campus on the simulation aspects of the labs, then come to a laboratory for access to test equipment such as voltmeters and oscilloscopes. In this configuration, students first could write and debug assembly software using the TExaS simulator, then use TExaS to download and test on a real 9S12 board. TExaS can be used with any 9S12 that contains the Serial Monitor in protected EEPROM $F800 to $FFFF. The special version of Metrowerks Codewarrior for the 9S12 could also be used to develop either assembly or C using either the serial monitor or a background debug module (BDM) hardware pod. This is more expensive than the first configuration because actual microcomputer hardware and debugging systems are required.
What’s on the Book Web Site?

1. TExaS installer download. Each student purchasing a book can download and install TExaS. TExaS is a complete editor, assembler, and simulator for the Freescale 9S12 microcomputer. It simulates external hardware, I/O ports, interrupts, memory, and program execution. It is intended as a learning tool for embedded systems. This software is not freeware, but the purchase of the book entitles the owner to install one copy of the program. Once installed, TExaS creates many subdirectories with example applications.
2. There are multiple short video tutorials about developing assembly language programs on TExaS. See http://users.ece.utexas.edu/~valvano/Readme.htm
3. There is a directory containing data sheets in Adobe’s pdf format. This information does not need to be copied to your hard drive; you can simply read the data sheets from the web itself. In particular there are data sheets for microcomputers, digital logic, memory chips, op amps, ADCs, DACs, timer chips and interface chips. See http://users.ece.utexas.edu/~valvano/Datasheets/
4. There is a directory containing example applications. These examples include circuit diagrams and software that can be downloaded and run on the actual 9S12 board. http://users.ece.utexas.edu/~valvano/Starterfiles/
5. There is a directory containing lecture notes and laboratory assignments based on this book. http://users.ece.utexas.edu/~valvano/EE319K/
6. There is a web site containing downloads of materials that can be used with this book. http://www.cengage.com/engineering/valvano
Acknowledgments

Many shared experiences contributed to the development of this book. First, I would like to acknowledge the many excellent teaching assistants I have had the pleasure of working with. Some of these hardworking, underpaid warriors include Dr. Nachiket Kharalkar, Dr. Robin Tsang, John Porterfield, Sri Priya Ponnapalli, Dr. Anil Kottam, Brett Hemes, Priyank Patel, Dr. Byung-geun Lee, Deepak Panwar, Tawfik Chowdhury, Jungho Jo, Usman Tariq, Glen Rhodes, Sandy Hermawan, Jacob Egner, Robby Morrill, and Kyle Hutchens. Ann Meyer developed most of the code for the HD44780 LCD simulation. My teaching assistants have contributed greatly to the contents of this book, especially Nachiket and Robin. In a similar manner, my students have recharged my energy each semester with their enthusiasm, dedication, and quest for knowledge.

Secondly, I appreciate the patience and expertise of my fellow faculty members here at the University of Texas at Austin. From a personal perspective, Dr. John Pearce provided much needed encouragement and support throughout my career. In addition, as instructors of the class around which this book was developed, Dr. Bill Bard, Dr. Nachiket Kharalkar, Dr. Nur Touba, Mr. Mark Welker, Mr. Gary Daniels, and Dr. Ramesh Yerraballi provided insight and substance for this book. Dr. Lizy John and Dr. Yale Patt contributed to the architecture sections in this book.

Thirdly, I would like to thank the experts who reviewed this manuscript. This is the third book I have written, and I was deeply impressed by the quality and quantity of suggestions made by these reviewers. The rough draft had serious flaws in how it was organized, and thanks to their helpful advice, I think this book now flows smoothly. In particular, I want to thank

Bill Bard, University of Texas at Austin
Christopher M. Cischke, Michigan Technological University
Bruce A. Harvey, Florida A & M University
Joseph J. Pfeiffer, New Mexico State University
Karkal S. Prabhu, Drexel University
Eric M. Schwartz, University of Florida

Lastly, I appreciate the valuable lessons of character and commitment taught to me by my parents and grandparents. I recall how hard my parents and grandparents worked to make the world a better place for the next generation. Most significantly, I acknowledge the love, patience and support of my wife, Barbara, and my children, Ben, Dan, and Liz. In particular, Ben helped with the web site and the animations.

Good luck!

JONATHAN W. VALVANO
Contents

1 Introduction to Embedded Microcomputer Systems
  1.1 Basic Components of an Embedded System
  1.2 Applications Involving Embedded Systems
  1.3 Flowcharts and Structured Programming
  1.4 Concurrent and Parallel Programming
  1.5 Product Development Cycle
  1.6 Successive Refinement
  1.7 Quality Design
    1.7.1 Quantitative Performance Measurements
    1.7.2 Qualitative Performance Measurements
    1.7.3 Attitude
  1.8 Debugging Theory
  1.9 Tutorial 1. Getting Started
  1.10 Homework Assignments

2 Introduction to Assembly Language Programming
  2.1 Binary and Hexadecimal Numbers
  2.2 Addresses, Registers, and Accessing Memory
  2.3 Assembly Syntax
    2.3.1 Assembly Language Instructions
    2.3.2 Pseudo Operation Codes
  2.4 Simplified 9S12 Machine Language Execution
  2.5 Simple Addressing Modes
    2.5.1 Inherent Addressing Mode
    2.5.2 Immediate Addressing Mode
    2.5.3 Direct Addressing Mode
    2.5.4 Extended Addressing Mode
    2.5.5 Indexed Addressing Mode
    2.5.6 PC Relative Addressing Mode
  2.6 The Assembly Language Development Process
  2.7 Memory Transfer Operations
  2.8 Subroutines
  2.9 Input/Output
    2.9.1 Direction Registers
    2.9.2 Switch Interface
    2.9.3 LED Interface
  2.10 Tutorial 2. Running with TExaS
  2.11 Homework Assignments
  2.12 Laboratory Assignments

3 Representation and Manipulation of Information
  3.1 Precision
  3.2 Boolean Information
  3.3 8-bit Numbers
  3.4 16-bit Numbers
  3.5 Extended Precision Numbers
  3.6 Logical Operations
  3.7 Shift Operations
  3.8 Arithmetic Operations: Addition and Subtractions
  3.9 Arithmetic Operations: Multiplication and Divide
  3.10 Character Information
  3.11 Conversions
  3.12 Debugging Monitor Using a LED
  3.13 Tutorial 3. Arithmetic and Logical Operations
  3.14 Homework Assignments
  3.15 Laboratory Assignments

4 9S12 Architecture
  4.1 Introduction
    4.1.1 Big and Little Endian
    4.1.2 Memory-Mapped I/O
    4.1.3 *I/O-Mapped I/O
    4.1.4 *Segmented or Partitioned Memory
    4.1.5 Memory Bus Cycles
    4.1.6 Processor Architecture
    4.1.7 I/O Port Architecture
  4.2 *Understanding Software Execution at the Bus Cycle Level
  4.3 9S12 Architecture Details
    4.3.1 9S12C32 Architecture
    4.3.2 9S12DP512 Architecture
    4.3.3 9S12E128 Architecture
    4.3.4 Operating Modes
    4.3.5 Phase-Lock-Loop (PLL)
  4.4 The Stack
  4.5 16-Bit Timer
  4.6 *Memory Allocation
  4.7 Performance Debugging
    4.7.1 Instrumentation
    4.7.2 Measurement of Dynamic Efficiency
  4.8 Tutorial 4. Building a Microcomputer and Executing Machine Code
  4.9 Homework Assignments
  4.10 Laboratory Assignments

5 Modular Programming
  5.1 Modular Design
    5.1.1 Definition and Goals
    5.1.2 Functions, Procedures, Methods, and Subroutines
    5.1.3 Dividing a Software Task into Modules
    5.1.4 How to Draw a Call-Graph
    5.1.5 How to Draw a Data Flow Graph
    5.1.6 Top-Down Versus Bottom-Up Design
  5.2 Making Decisions
    5.2.1 Conditional Branch Instructions
    5.2.2 Conditional if-then Statements
    5.2.3 Conditional if-then-else Statements
    5.2.4 While Loops
    5.2.5 For Loops
  5.3 *Macros
  5.4 *Recursion
  5.5 Writing Quality Software
    5.5.1 Assembly Language Style Guidelines
    5.5.2 Comments
    5.5.3 Inappropriate I/O and Portability
  5.6 *How Assemblers Work
  5.7 Functional Debugging
    5.7.1 Stabilization
    5.7.2 Single Stepping
    5.7.3 Breakpoints Without Filtering
    5.7.4 Conditional Breakpoints
    5.7.5 Instrumentation: Print Statements
  5.8 Tutorial 5a. Editing and Assembling
  5.9 Tutorial 5b. Microcomputer-Based Lock
  5.10 Homework Problems
  5.11 Laboratory Assignments

6 Pointers and Data Structures
  6.1 Indexed Addressing Modes Used to Implement Pointers
    6.1.1 Indexed Addressing Mode
    6.1.2 Auto Pre/Post Decrement/Increment Indexed Addressing Mode
    6.1.3 Accumulator Offset Indexed Addressing Mode
    6.1.4 Indexed Indirect Addressing Mode
    6.1.5 Accumulator D Offset Indexed Indirect Addressing Mode
    6.1.6 Post-Byte Machine Coded for Indexed Addressing
    6.1.7 Load Effective Address Instructions
    6.1.8 Call-by-Reference Parameter Passing
  6.2 Arrays
  6.3 Strings
  6.4 *Matrices
  6.5 Structures
  6.6 *Tables
  6.7 *Trees
  6.8 Finite-State Machines with Statically Allocated Linked Structures
    6.8.1 Abstraction
    6.8.2 Moore Finite-State Machines
    6.8.3 Mealy Finite-State Machines
    6.8.4 Functional Abstraction Within Finite-State Machines
  6.9 *Dynamically Allocated Data Structures
    6.9.1 *Fixed Block Memory Manager
    6.9.2 *Linked List FIFO
  6.10 *9S12 Paged Memory
  6.11 Functional Debugging
    6.11.1 Instrumentation: Dump Into Array without Filtering
    6.11.2 Instrumentation: Dump Into Array with Filtering
  6.12 Tutorial 6. Software Abstraction
  6.13 Homework Assignments
  6.14 Laboratory Assignments

7 Local Variables and Parameter Passing
  7.1 Local Versus Global
  7.2 Stack Rules
  7.3 Local Variables Allocated on the Stack
  7.4 Stack Frames
  7.5 Parameter Passing Using Registers, Stack and Global Variables
    7.5.1 Parameter Passing in C
    7.5.2 Parameter Passing in Assembly Language
    7.5.3 C Compiler Implementation of Local and Global Variables
  7.6 Tutorial 7. Debugging Techniques
  7.7 Homework Problems
  7.8 Laboratory Assignments

8 Serial and Parallel Port Interfacing
  8.1 General Introduction to Interfacing
  8.2 Serial Communications Interface, SCI
    8.2.1 RS232 Protocol
    8.2.2 Transmitting in Asynchronous Mode
    8.2.3 Receiving in Asynchronous Mode
    8.2.4 9S12 SCI Details
  8.3 Synchronous Peripheral Interface, SPI
    8.3.1 SPI Fundamentals
    8.3.2 SPI Details
    8.3.3 9S12DP512 Module Routing Register
    8.3.4 8-bit DAC Interface
  8.4 Scanned Keyboards
  8.5 Parallel Port LCD Interface with the HD44780 Controller
  8.6 Binary Actuators
    8.6.1 Interface
    8.6.2 Electromagnetic and Solid-State Relays
    8.6.3 Solenoids
  8.7 *Pulse-Width Modulation
  8.8 *Stepper Motors
  8.9 Homework Problems
  8.10 Laboratory Assignments

9 Interrupt Programming and Real-Time Systems
  9.1 I/O Synchronization
  9.2 Interrupt Concepts
    9.2.1 Introduction
    9.2.2 Essential Components of Interrupt Processing
    9.2.3 Sequence of Events
    9.2.4 9S12 Interrupts
    9.2.5 Polled versus Vectored Interrupts
    9.2.6 Pseudo-Interrupt Vectors
  9.3 Key Wakeup Interrupts
  9.4 Periodic Interrupt Programming
  9.5 Real-Time Interrupt (RTI)
  9.6 Timer Overflow, Output Compare and Input Capture
    9.6.1 Timer Features and Timer Overflow
    9.6.2 Output Compare Interrupts
    9.6.3 Input Capture Interrupts
  9.7 Pulse Accumulator
  9.8 *Direct Memory Access
  9.9 Hardware Debugging Tools
  9.10 Profiling
    9.10.1 Profiling using a Software Dump to Study Execution Pattern
    9.10.2 Profiling using an Output Port
    9.10.3 *Thread Profile
  9.11 Tutorial 9. Profiling
  9.12 Homework Problems
  9.13 Laboratory Assignments

10 Numerical Calculations
  10.1 Fixed-Point Numbers
  10.2 *Extended Precision Calculations
    10.2.1 Addition and Subtraction
    10.2.2 Shift Operations
    10.2.3 Mathematical Instructions on the 9S12
    10.2.4 Multiplication and Division
    10.2.5 Table Lookup and Interpolation
  10.3 Expression Evaluation
  10.4 *IEEE Floating-Point Numbers
  10.5 Tutorial 10. Overflow and Dropout
  10.6 Homework Problems
  10.7 Laboratory Problems

11 Analog I/O Interfacing
  11.1 Approximating Continuous Signals in the Digital Domain
  11.2 Digital to Analog Conversion
  11.3 Music Generation
  11.4 Analog to Digital Conversion
    11.4.1 9S12 ADC Details
    11.4.2 ADC Data Formats
    11.4.3 ADC Resolution
  11.5 *Multiple Access Circular Queues
  11.6 Real-Time Data Acquisition
  11.7 *Control Systems
  11.8 Tutorial 11. Analog Input Programming
  11.9 Homework Problems
  11.10 Laboratory Assignments

12 Communication Systems
  12.1 Introduction
  12.2 Reentrant Programming and Critical Sections
  12.3 Interthread Communication and Synchronization
    12.3.1 Mailbox
    12.3.2 Producer Consumer Problem
    12.3.3 FIFO Queue Implementation
    12.3.4 Double Buffer
  12.4 Serial Port Interface using Interrupt Synchronization
  12.5 *Distributed Systems
  12.6 *Design and Implementation of a Controller Area Network (CAN)
    12.6.1 The Fundamentals of CAN
    12.6.2 Details of the 9S12 CAN
    12.6.3 9S12 CAN Device Driver
  12.7 *Inter-Integrated Circuit (I2C) Interface
    12.7.1 The Fundamentals of I2C
    12.7.2 I2C Synchronization
    12.7.3 9S12 I2C Details
    12.7.4 9S12 I2C Single Master Example
  12.8 Wireless Communication
  12.9 Tutorial 12. Performance Debugging
  12.10 Homework Problems
  12.11 Laboratory Assignments

Appendix 1 Embedded System Development Using TExaS
  A1.1 Introduction to TExaS
  A1.2 Major Components of TExaS
  A1.3 Embedded System Design Process
  A1.4 Running and Modifying Existing Assembly Language Programs
  A1.5 TExaS Editor
  A1.6 Assembly Language Syntax
    A1.6.1 Overall Structure
    A1.6.2 Label Field
    A1.6.3 Operation Field
    A1.6.4 Operand Field
    A1.6.5 Expressions
    A1.6.6 Comment Field
    A1.6.7 Assembly Listing and Errors
    A1.6.8 Assembler Pseudo-Ops
    A1.6.9 S-19 Object Code
  A1.7 TExaS ViewBox
  A1.8 Microcomputer Interfacing in TExaS

Appendix 2 Running on an Evaluation Board

Appendix 3 Systems Engineering
  A3.1 Design for Manufacturability
  A3.2 Battery Power

Glossary of Terms

Solutions Manual
  Checkpoint Solutions
  Tutorial Solutions

Index
1 Introduction to Embedded Microcomputer Systems

Chapter 1 objectives are to:
• Introduce embedded microcomputer systems
• Outline the basic steps in developing microcomputer systems
• Define data flow graphs, flowcharts and call graphs
It is an effective approach to learn new techniques by doing them. But the dilemma in learning a laboratory-based topic like embedded systems is that there is a tremendous volume of details that first must be learned before microcomputer hardware and software systems can be designed. The approach taken in this book is to learn by doing. One of the advantages of a bottom-up approach to learning is that the student begins by mastering simple concepts. Once the student truly understands simple concepts, he or she can then embark on the creative process of design, which involves putting the pieces together to create a more complex system. True creativity is needed to solve complex problems using effective combinations of simple components.

Embedded systems afford an effective platform to teach new engineers how to program for three reasons. First, there is no operating system. Thus, in a bottom-up fashion the student can see, write, and understand all software running on a system that actually does something. Second, embedded systems involve input/output that is easy for the student to touch, hear, and see. Third, embedded systems are employed in many everyday products, motivating students by showing them how electrical and computer engineering processes can be applied in the real world.

Rather than introduce the voluminous details in an encyclopedic fashion, the book is organized by basic concepts, and the details are introduced as they are needed. We will start with simple systems and progressively add complexity. The overriding theme for Chapter 1 will be to present the organizational framework with which embedded systems will be designed. Chapters 2 through 4 explain how the computer works. Chapters 5, 6, 7, and 10 present the details of software development on an embedded system. Interfacing I/O devices to build embedded systems is presented in Chapters 8, 9, 11, and 12.
1.1 Basic Components of an Embedded System

Information is stored on the computer in binary form. A binary bit can exist in one of two possible states. In positive logic, the presence of a voltage is called the ‘1’, true, asserted, or high state. The absence of a voltage is called the ‘0’, false, not asserted, or low state. Figure 1.1 shows the output of a typical complementary metal oxide semiconductor (CMOS) circuit. The left side shows the condition with a true bit, and the right side shows a false. The output of each digital circuit consists of a p-type transistor “on top of” an n-type transistor. In digital circuits, each transistor is essentially on or off. If the transistor is on, it is equivalent to a short circuit between its two output pins. Conversely, if the transistor is off, it is equivalent to an open circuit between its output pins. On a 9S12 powered with a 5 V supply, a voltage between 3.25 and 5 V is considered high, and a voltage between 0 and 1.75 V is considered low. Separating the two regions by 1.5 V allows digital logic to operate reliably at very high speeds. The design of transistor-level digital circuits is beyond the scope of this book. However, it is important to know that digital data exist as binary bits and are encoded as high and low voltages.
Figure 1.1 A binary bit is true if a voltage is present and false if the voltage is 0. [Figure: two CMOS output stages powered from +5V. True: p-type on, n-type off, Out=5V; equivalent to a short to +5V and an open to ground. False: p-type off, n-type on, Out=0V; equivalent to an open to +5V and a short to ground.]
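The voltage ranges quoted above can be expressed in a few lines of C. This is only an illustrative sketch: the names `LogicLevel` and `ClassifyMillivolts` are hypothetical, the thresholds are the 5 V values given in the text, and voltages are written in millivolts so that no floating point is needed.

```c
#include <stdint.h>

/* Thresholds from the text for a 9S12 on a 5 V supply, in millivolts. */
#define V_HIGH_MIN_MV 3250   /* 3.25 V or more is treated as high */
#define V_LOW_MAX_MV  1750   /* 1.75 V or less is treated as low  */

typedef enum { LOGIC_LOW, LOGIC_HIGH, LOGIC_UNDEFINED } LogicLevel;

/* Classify a pin voltage (hypothetical helper, not a 9S12 function).
   No check is made that the voltage stays within the 0 to 5 V supply range. */
LogicLevel ClassifyMillivolts(int32_t mv){
  if(mv >= V_HIGH_MIN_MV) return LOGIC_HIGH;
  if(mv <= V_LOW_MAX_MV)  return LOGIC_LOW;
  return LOGIC_UNDEFINED;    /* inside the 1.5 V gap between the two regions */
}
```

A reading of 2500 mV, for example, falls in the gap and is reported as `LOGIC_UNDEFINED`, which is the region a digital design tries to avoid.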
If the information we wish to store exists in more than two states, we use multiple bits. For example, a byte contains 8 bits, and is built by grouping 8 binary bits into one object, as shown in Figure 1.2. Information can take many forms, e.g., numbers, logical states, text, instructions, sounds, or images. What the bits mean depends on how the information is organized and, more importantly, how it is used.
Figure 1.2 A byte is comprised of 8 bits. [Figure: eight CMOS output stages, each powered from +5V, labeled Bit 7 through Bit 0.]
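As a small illustration of treating a byte as eight individually numbered bits, the following C sketch reads, sets, and clears one bit at a time. The function names are hypothetical and chosen only for this example; bit 0 is taken to be the least significant bit and bit 7 the most significant.

```c
#include <stdint.h>

/* Return bit n (0 = least significant, 7 = most significant) of a byte. */
uint8_t GetBit(uint8_t value, uint8_t n){
  return (uint8_t)((value >> n) & 1u);
}

/* Return a copy of value with bit n set to 1. */
uint8_t SetBit(uint8_t value, uint8_t n){
  return (uint8_t)(value | (1u << n));
}

/* Return a copy of value with bit n cleared to 0. */
uint8_t ClearBit(uint8_t value, uint8_t n){
  return (uint8_t)(value & ~(1u << n));
}
```

The same eight bits could equally represent a number, a character, or eight independent logical states; the helper functions do not care which.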
Memory is a collection of hardware elements in a computer into which we store information, as shown in Figure 1.3. For most computers in today’s market, each memory cell contains one byte of information, and each byte has a unique and sequential address. The memory is called byte-addressable because each byte has a separate address. The address of a memory cell specifies its physical location, and its contents are the data. When we write to memory, we specify an address and 8 bits of data, causing that information to be stored into the memory. When we read from memory, we specify an address, causing 8 bits of data to be retrieved from the memory. Read Only Memory, or ROM, is a type of memory where the information is programmed or burned into the device, and during normal operation it only allows read accesses. Random Access Memory (RAM) is used
Figure 1.3 Memory is a sequential collection of data storage elements. [Figure: two columns, Address and Contents; the addresses run sequentially from 103 Main St through 108 Main St.]
to store temporary information, and during normal operation we can read data from or write data into RAM. The information in the ROM is nonvolatile, meaning the contents are not lost when power is removed. In contrast, the information in the RAM is volatile, meaning the contents are lost when power is removed. The system can quickly and conveniently read data from a ROM. It takes a comparatively long time to program or burn data into a ROM. In contrast, it is fast and easy to both read data from and write data into a RAM. Software is a set of instructions, stored in memory, that are executed in a complicated but well-defined manner. The processor is the digital hardware device that executes software. A port is a physical connection between the computer and its outside world. Ports allow information to enter and exit the system. Information enters via the input ports and exits via the output ports. Other names used to describe ports are I/O ports, I/O devices, interfaces, or sometimes just devices. A bus is a collection of wires used to pass information between modules. A computer is an electronic device with a processor, memory, and I/O ports, connected together with a bus. A microcomputer is a computer small enough that one person can carry it. Small in this context describes its size not its computing power. Consequently, there can be great confusion over the term microcomputer, because it can refer to a very wide range of devices from a PIC12C508, which is an 8-pin chip with 512 words of ROM and 25 bytes RAM, to the most powerful Pentium-based personal computer. Computers are not intelligent. Rather, you are the true genius. Computers are electronic idiots. They can store a lot of data, but they will only do exactly what we tell them to do. Fortunately, however, they can execute our programs quite quickly, and they don’t get bored doing the same tasks over and over again. 
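The read and write operations on byte-addressable memory described above can be modeled on a PC with two small arrays. This is a sketch under stated assumptions, not a description of the 9S12 memory map: the sizes, the address layout (RAM first, then ROM), the ROM contents, and the names `MemRead` and `MemWrite` are all invented for illustration.

```c
#include <stdint.h>

#define RAM_SIZE 16u
#define ROM_SIZE 16u
#define ROM_BASE RAM_SIZE          /* hypothetical map: RAM first, then ROM */

static uint8_t Ram[RAM_SIZE];                               /* read/write storage */
static const uint8_t Rom[ROM_SIZE] = {0x12, 0x34, 0x56};    /* fixed contents */

/* Read: an address goes in, 8 bits of data come out. */
uint8_t MemRead(uint16_t address){
  if(address < RAM_SIZE) return Ram[address];
  if(address < ROM_BASE + ROM_SIZE) return Rom[address - ROM_BASE];
  return 0xFF;                     /* unmapped address: arbitrary value in this model */
}

/* Write: an address and 8 bits of data go in. Returns 1 if the data was stored. */
int MemWrite(uint16_t address, uint8_t data){
  if(address < RAM_SIZE){ Ram[address] = data; return 1; }
  return 0;                        /* ROM and unmapped addresses ignore writes */
}
```

In this model a write to a ROM address is simply ignored, which mirrors the statement that ROM allows only read accesses during normal operation.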
To better understand the expression embedded microcomputer system, consider each word separately. In this context, the word “embedded” means hidden inside so one can’t see it. The term “micro” means small, and a “computer” contains a processor, memory, and a means to exchange data with the external world. In an embedded system, we use ROM for storing the software and fixed constant data, and RAM for storing temporary information. Many microcomputers employed in embedded systems use EEPROM, which is an electrically erasable programmable ROM, because the information can easily be erased and reprogrammed. The functionality of a digital watch is defined by the software programmed into its ROM. When you remove the batteries from a watch and insert new batteries, it still behaves like a watch because the ROM is nonvolatile storage. As shown in Figure 1.4, the term embedded microcomputer system refers to a device that contains one or more microcomputers inside. Microcontrollers, which are microcomputers incorporating the processor, RAM, ROM and I/O ports into a single package, are often employed in an embedded system because of their low cost, small size, and low power
Figure 1.4 An embedded system includes a microcomputer interfaced to external devices. [Figure: an embedded system containing a 9S12 microcontroller with blocks labeled Processor, RAM, ROM, I/O Ports, Bus, ADC, and DAC, connected to external electrical, mechanical, chemical, or optical devices and to analog signals.]
requirements. Microcontrollers like the 9S12 are available with a large number and wide variety of I/O devices, such as parallel ports, serial ports, timers, digital to analog convertors (DAC), and analog to digital convertors (ADC). The I/O devices are a crucial part of an embedded system, because they provide necessary functionality. The software together with the I/O ports and associated interface circuits give an embedded computer system its distinctive characteristics.
Checkpoint 1.1: What is an embedded system?
A digital multimeter, as shown in Figure 1.5, is a typical embedded system. This embedded system has two inputs, the mode selection dial on the front and the red/black test probes. The output is a liquid crystal display (LCD) showing measured parameters. The large black chip inside the box is a microcontroller. The software that defines its very specific purpose is programmed into the ROM of the microcontroller. As you can see, there is not much else inside this box other than the microcontroller, a fuse, a few interfacing resistors, and a battery.
Figure 1.5 A digital multimeter contains a microcontroller programmed to measure voltage, current and resistance.
As defined previously, a microcomputer is a small computer. One typically restricts the term embedded to refer to systems that do not look and behave like a typical computer. Most embedded systems do not have a keyboard, a graphics display, or secondary storage (disk). There are two ways to develop embedded systems. The first technique uses a microcontroller, like the 9S12. In general, there is no operating system, so the entire software system must be developed. These devices are suitable for low-cost, low-performance systems. On the other hand, one can develop a high-performance embedded system around the Arm or PC architecture. These systems typically employ an operating system, and are first designed on a development platform, and then the software and hardware are migrated to a standalone embedded platform.
Checkpoint 1.2: What is a microcomputer?
The external devices attached to the microcontroller allow the system to interact with its environment. An interface is defined as the hardware and software that combine to allow the computer to communicate with the external hardware. We must also learn how to interface a
wide range of inputs and outputs that can exist in either digital or analog form. This book provides an introduction to microcomputer programming, hardware interfacing, and the design of embedded systems. In general, we can classify I/O interfaces into four categories:
Parallel: binary data is available simultaneously on groups of lines
Serial: binary data is available one bit at a time on a single line
Analog: data is encoded as a variable voltage
Time: data is encoded as a period, frequency, pulse width or phase shift
A device driver is a set of software functions that facilitate the use of an I/O port. One of the simplest I/O ports on the 9S12 is a parallel port called PTT, meaning it is a collection of eight pins that can be used for either input or output. If PTT is an input port, then when the software reads from PTT, it gets eight bits (each bit is 1 or 0), representing the digital levels (high or low) that exist at the time of the read. If PTT is an output port, then when the software writes to PTT, it sets the outputs on the eight pins high (1) or low (0), depending on the data value the software has written.
The other general concept involved in most embedded systems is that they run in real time. In a real-time computer system, we can put an upper bound on the time required to perform the input-calculation-output sequence. A real-time system can guarantee a worst case upper bound on the response time between when the new input information becomes available and when that information is processed. This response time is called interface latency. Another real-time requirement that exists in many embedded systems is the execution of periodic tasks. A periodic task is one that must be performed at equal time intervals. A real-time system can put a small and bounded limit on the time error between when a task should be run and when it is actually run.
Because of the real-time nature of these systems, microcomputers have a rich set of features to handle many aspects of time.
Checkpoint 1.3: An input device allows information to be entered into the computer. List some of the input devices available on a general-purpose computer.
Checkpoint 1.4: An output device allows information to exit the computer. List some of the output devices available on a general-purpose computer.
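The description of PTT given earlier can be sketched as a very small device driver. On real hardware, `PTT` and its direction register `DDRT` would come from the compiler's 9S12 header file; here they are ordinary variables so the sketch compiles on a PC. The function names are hypothetical, and the sketch assumes that a 1 in a `DDRT` bit makes the corresponding pin an output, which should be checked against the device documentation.

```c
#include <stdint.h>

/* Stand-ins for the 9S12 registers so the sketch compiles on a PC. */
volatile uint8_t PTT;    /* Port T data: one bit per pin */
volatile uint8_t DDRT;   /* Port T direction: assumed 1 = output, 0 = input */

/* Make all eight pins outputs. */
void PortT_InitOutput(void){ DDRT = 0xFF; }

/* Make all eight pins inputs. */
void PortT_InitInput(void){ DDRT = 0x00; }

/* Drive the eight pins high (1) or low (0) according to data. */
void PortT_Out(uint8_t data){ PTT = data; }

/* Return the eight digital levels present at the time of the read. */
uint8_t PortT_In(void){ return PTT; }
```

Wrapping the register accesses in functions is the design choice that makes this a device driver: the rest of the software calls `PortT_Out` and `PortT_In` without needing to know which registers are involved.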
The embedded computer systems in this book will contain a Freescale 9S12, which will be programmed to perform a specific dedicated application. Software for embedded systems typically solves only a limited range of problems. The microcomputer is embedded or hidden inside the device. In an embedded system, the software is usually programmed into ROM and therefore fixed. Even so, software maintenance (e.g., verification of proper operation, updates, fixing bugs, adding features, extending to new applications, end user configurations) is still extremely important. In fact, because microcomputers are employed in many safety-critical devices, injury or death may result if there are hardware and/or software faults. Consequently, testing must be considered in the original design, during development of intermediate components, and in the final product. The role of simulation is becoming increasingly important in today’s marketplace as we race to build better and better machines with shorter and shorter design cycles. An effective approach to building embedded systems is to first design the system using a hardware/software simulator, then download and test the system on an actual microcontroller.
1.2
Applications Involving Embedded Systems
An embedded computer system includes a microcomputer with mechanical, chemical and electrical devices attached to it, programmed for a specific dedicated purpose, and packaged up as a complete system. Any electrical, mechanical, or chemical system that involves inputs, decisions, calculations, analyses, and outputs is a candidate for implementation as an embedded system. Electrical, mechanical, and chemical sensors collect information.
Electronic interfaces convert the sensor signals into a form acceptable for the microcomputer. For example, a tachometer is a sensor that measures the revolutions per second of a rotating shaft. Microcomputer software performs the necessary decisions, calculations, and analyses. Additional interface electronics convert the microcomputer outputs into the necessary form. Actuators can be used to create mechanical or chemical outputs. For example, an electrical motor converts electrical power into mechanical power. One automobile may soon employ up to 100 microcontrollers. In fact, upscale homes already contain as many as 150 microcontrollers, and the average consumer now interacts with microcontrollers thousands of times each day. Embedded microcomputers impact virtually all aspects of daily life:
■ Consumer electronics
■ Communication systems
■ Automotive systems
■ Military hardware
■ Business applications
■ Medical devices
Table 1.1 presents typical embedded microcomputer applications and the function performed by the embedded microcomputer. Each microcomputer accepts inputs, performs calculations, and generates outputs.
In contrast, a general-purpose computer system typically has a keyboard, disk and graphics display and can be programmed for a wide variety of purposes. Typical general-purpose applications include word processing, electronic mail, business accounting, scientific computing, and database systems. The user of a general-purpose computer does have access to the software that controls the machine. In other words, the user decides which operating system to run and which applications to launch. Because the general-purpose computer has a removable disk or network interface, new programs can easily be added to the system. The most common type of general-purpose computer is the personal computer, e.g., the Apple Macintosh or the IBM-PC compatible computer. Computers more powerful than the personal computer can be grouped in the workstation category, with prices ranging from $10,000 to $50,000. Supercomputers cost above $50,000. These computers often employ multiple processors and have much more memory than the typical personal computer. The workstations and supercomputers are used for handling large amounts of information (business applications) or performing large calculations (scientific research). This book will not specifically cover the general-purpose computer, although many of the basic principles of embedded computers do apply to all types of computer systems.
Checkpoint 1.5: There is a microcomputer embedded in a digital watch. List three operations the software must perform.
1.3
Flowcharts and Structured Programming
The remainder of this chapter will discuss the art and science of designing embedded systems from a general perspective. If you need to write a paper, you decide on a theme, then begin with an outline. In the same manner, if you design an embedded system, you define its specification (what it does), and begin with an organizational plan. In this chapter, we will present three graphical tools to describe the organization of an embedded system: flowcharts, data flow graphs and call graphs. You should draw all three for every system you design. In this section, we introduce the flowchart syntax that will be used throughout the book. Programs themselves are written in a linear or one-dimensional fashion. In other words, we type one line of software after another in a sequential fashion. Writing programs this way is a natural process, because the computer itself usually executes the program in a top-to-bottom
Table 1.1 Embedded system applications.
Application: Function Performed by the Microcomputer
Consumer electronics
  Washing machine: Controls the water and spin cycles
  Exercise equipment: Measures speed, distance, calories, heart rate
  Remote controls: Accepts key touches, and sends infrared pulses
  Clocks and watches: Maintains the time, alarm, and display
  Games and toys: Entertains the user, accepts joystick input, displays video output
  Audio/video electronics: Interacts with the operator and enhances performance
  Set-back thermostats: Adjusts day/night thresholds saving energy
  Camera, camcorder: Records and organizes images
  Television, VCR, cable box: Accepts inputs and processes audio/visual signals
Communication systems
  Answering machines: Plays outgoing message, saves and organizes messages
  Telephones: Transmits voice and data information
  Fax machines: Sends and receives images
  Radios: Sends and receives audio, noise rejection
  Cellular phones, pagers: Accepts key pad input, outputs sound, and enables communication
Automotive systems
  Automatic braking: Optimizes stopping on slippery surfaces
  Noise cancellation: Improves sound quality
  Locks: Allows keyless entry, detects intruders, activates alarms
  Electronic ignition: Controls sparks and fuel injectors
  Power windows and seats: Remembers preferred settings for each driver
  Cruise control: Maintains constant speed
  Collision avoidance: Reduces accidents
  Climate control: Improves comfort
  Emission control: Reduces pollution
  Instrumentation: Collects and provides necessary information
Military hardware
  Smart weapons: Recognizes friendly targets
  Missile guidance systems: Directs ordnance at the desired target
  Global positioning systems: Determines where you are on the planet
  Surveillance systems: Collects information about enemy activities
Business applications
  Cash registers: Accepts inputs and manages money
  Vending machines: Collects money and dispenses product
  ATM machines: Provides both security and convenience
  Traffic controllers: Senses car positions and controls traffic lights
  Industrial robots: Accepts input from sensors, controls motors
  Bar code readers and writers: Controls inventory and optimizes shipping
  Automatic sprinklers: Controls the wetness of the soil
  Elevator controllers: Maximizes traffic, minimizes waiting time
  RFID systems: Identifies products using radiofrequency tags
  Lighting and heating systems: Maximizes comfort and minimizes cost
Medical devices
  Monitors: Measures important functions
  Drug delivery systems: Administers proper doses
  Cancer treatments: Controls doses of radiation, drugs, or heat
  Pacemakers: Helps the heart beat regularly
  Prosthetic devices: Increases mobility for the handicapped
  Dialysis machines: Performs functions normally done by the kidney
sequential fashion. This one-dimensional format is fine for simple programs, but conditional branching and function calls may create complex behaviors that are not easily observed in a linear fashion. Flowcharts are one way to describe software in a two-dimensional format, specifically providing convenient mechanisms to visualize conditional branching and function calls. Flowcharts are very useful in the initial design stage of a software system to define complex algorithms. Furthermore, flowcharts can be used in the final documentation stage of a project, once the system is operational, in order to assist in its use or modification.
Observation: TExaS is one of the few software development systems that allow you to add flowcharts directly into your software as part of its documentation.
Figures throughout this section illustrate the syntax used to draw flowcharts. The oval shapes define entry and exit points. The main entry point is the starting point of the software. Each function, or subroutine, also has an entry point. The exit point returns the flow of control back to the place from which the function was called. When the software runs continuously, as is typically the case in an embedded system, there will be no main exit point. We use rectangles to specify process blocks. In a high-level flowchart, a process block might involve many operations, but in a low-level flowchart, the exact operation is defined in the rectangle. The parallelogram will be used to define an input/output operation. Some flowchart artists use rectangles for both processes and input/output. Since input/output operations are an important part of embedded systems, we will use the parallelogram format, which will make it easier to identify input/output in our flowcharts. The diamond-shaped objects define a branch point or decision block. The rectangle with double lines on the side specifies a call to a predefined function.
In this book, functions, subroutines and procedures are terms that all refer to a well-defined section of code that performs a specific operation. Functions usually return a result parameter, while procedures usually do not. Functions and procedures are terms used when describing a high-level language, while subroutines are often used when describing assembly language. When a function (or subroutine or procedure) is called, the software execution path jumps to the function, the specific operation is performed, and the execution path returns to the point immediately after the function call. Circles are used as connectors.
Common Error: In general, it is bad programming style to develop software that requires a lot of connectors when drawing its flowchart.
There are a seemingly unlimited number of tasks one can perform on a computer, and the key to developing great products is to select the correct ones. Just like hiking through the woods, we need to develop guidelines (like maps and trails) to keep us from getting lost. One of the fundamental issues when developing software, regardless of whether it is a microcontroller with 1000 lines of assembly code or a large computer system with billions of lines of code, is to maintain a consistent structure. One such framework is called structured programming. A good high-level language will force the programmer to write structured programs. Structured programs are built from three basic building blocks: the sequence, the conditional, and the while-loop. At the lowest level, the process block contains simple and well-defined commands. I/O functions are also low-level building blocks. Structured programming involves combining existing blocks into more complex structures, as shown in Figure 1.6.
Figure 1.6 Flowchart showing the basic building blocks of structured programming. [Figure: three panels. Sequence: Block 1 followed by Block 2. Conditional: a decision selects Block 1 or Block 2. While-loop: a decision repeats Block.]
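The three building blocks can be seen together in one short C function. The function itself (`SumEven`, a hypothetical name) is not from the text; it was chosen only because it needs a sequence, a conditional, and a while-loop.

```c
#include <stdint.h>

/* Sum the even numbers from 0 up to (but not including) limit. */
uint32_t SumEven(uint32_t limit){
  uint32_t sum = 0;          /* sequence: first block */
  uint32_t i = 0;            /* sequence: second block */
  while(i < limit){          /* while-loop: repeat the body while the test is true */
    if((i % 2u) == 0u){      /* conditional: take this path only for even i */
      sum = sum + i;
    }
    i = i + 1;
  }
  return sum;
}
```

Note that the conditional is nested inside the while-loop, which is an instance of combining existing blocks into a more complex structure.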
Example 1.1: Using a flowchart, describe the control algorithm that a toaster might use to cook toast. There will be a start button the user pushes to activate the machine. There is another input that measures toast temperature. The desired temperature is preprogrammed into the machine. The output is a heater, which can be on or off. The toast is automatically lowered into the oven when heat is applied and is ejected when the heat is turned off.
Solution
This example illustrates a common trait of embedded systems: they perform the same set of tasks over and over forever. The program starts at main when power is applied, and the system behaves like a toaster until it is unplugged. Figure 1.7 shows a flowchart for one possible toaster algorithm. The system initially waits for the operator to push the start button. If the switch is not pressed, the system loops back, reading and checking the switch over and over. After the start button is pressed, heat is turned on. When the toast temperature reaches the desired value, heat is turned off and the process is repeated.
Figure 1.7 Flowchart illustrating the process of making toast.
[Flowchart: entry point main; input from the start switch; a decision that loops back while the switch is not pressed; once pressed, output heat is on; input toast temperature; a decision that loops back while toast < desired (too cold); when toast ≥ desired, output heat is off, and the process repeats.]
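One pass through the toaster flowchart can be sketched in C as follows. Everything hardware-related here is simulated and hypothetical: the variables `StartPressed`, `ToastTemp`, and `Heat`, the helper functions, the set point of 100, and the rule that the toast warms by 10 units per reading while the heater is on are all assumptions made so that the sketch runs on a PC.

```c
#include <stdint.h>

/* Simulated hardware (hypothetical; a real toaster would read and write ports). */
int StartPressed;            /* 1 while the start button is down */
int ToastTemp;               /* current toast temperature, arbitrary units */
int Heat;                    /* heater output: 1 = on, 0 = off */
#define DESIRED_TEMP 100     /* preprogrammed set point, arbitrary units */

static int  Switch_In(void){ return StartPressed; }
static int  Temp_In(void){ ToastTemp += 10 * Heat; return ToastTemp; } /* warms only while heating */
static void Heat_Out(int on){ Heat = on; }

/* One pass through the flowchart of Figure 1.7; the real main repeats it forever. */
void Toaster_OneCycle(void){
  while(Switch_In() == 0){}              /* not pressed: keep checking the switch */
  Heat_Out(1);                           /* pressed: heat is on */
  while(Temp_In() < DESIRED_TEMP){}      /* too cold: keep heating */
  Heat_Out(0);                           /* toast >= desired: heat is off */
}
```

In this simulation `StartPressed` must be set to 1 before calling `Toaster_OneCycle`, since nothing else will press the button; an actual main program would wrap the call in `while(1){ }`.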
Checkpoint 1.6: What safety feature might you add to this toaster to reduce the chance of a fire?
Example 1.2: Design a flowchart to illustrate the process of reading a book. The inputs to this system are words read from the book, and definitions looked up in a dictionary. The objective of this system will be to store knowledge into a database. There will be no formal output per se.
Solution
This second example illustrates the concept of a subroutine. We break a complex system into smaller components so that the system is easier to understand, and easier to test. In particular, once we know how to look up definitions of words in a dictionary, we will encapsulate that process into a subroutine, called Lookup. In this example, the main program performs the tasks of reading and remembering. We use a while-loop to read each word of the book in order until the end of the book is reached. After we read a word from the book, we use a conditional to determine whether or not we understand the meaning of the word. If we do not understand the word, we call the Lookup subroutine to find the definition in the dictionary. After we have read and understood each word, we record the knowledge we have learned into a database. The letters A through D in Figure 1.8 specify the software activities in this simple example. In this example, execution is sequential and predictable
(if BD is to occur, it will come after A and before C). A software task is called a thread. More formally, a thread is the execution of software or the action caused by the execution. In this example, there is one thread. Consider a book with 10 words, and we do not know the meaning of word 4 and word 7. The thread caused by the execution when reading this 10-word book will be
A0 C0 A1 C1 A2 C2 A3 C3 A4 B4 D4 C4 A5 C5 A6 C6 A7 B7 D7 C7 A8 C8 A9 C9
where the subscript (written here as the number after each letter) refers to the word number. The main program executes the sequence AC or ABDC over and over as it finishes reading the book.
Figure 1.8 Flowchart illustrating the process of reading a book. [Flowchart: main loops until the end of the book; A reads the next word w; a decision tests whether w is understood; if not, B calls Lookup(w), whose body D reads w in the dictionary and returns; C remembers the word.]
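The single-threaded execution described above can be reproduced with a short C sketch that logs each activity as it happens. The array `Unknown`, the `Log` helper, and the `Trace` buffer are hypothetical additions used only to record the thread; the labels A, B, C, and D follow the activities of Figure 1.8.

```c
#include <stdio.h>
#include <string.h>

#define NUM_WORDS 10
/* Assumption for this sketch: a 1 marks a word whose meaning we do not know. */
static const int Unknown[NUM_WORDS] = {0,0,0,0,1,0,0,1,0,0};
static char Trace[256];

/* Append one activity, such as "A4", to the trace. */
static void Log(char activity, int w){
  char item[16];
  snprintf(item, sizeof item, "%s%c%d", Trace[0] ? " " : "", activity, w);
  strcat(Trace, item);
}

static void Lookup(int w){ Log('D', w); }   /* D: read w in the dictionary */

const char *ReadBook(void){
  int w = 0;
  Trace[0] = 0;
  while(w < NUM_WORDS){        /* more words in the book? */
    Log('A', w);               /* A: read next word */
    if(Unknown[w]){
      Log('B', w);             /* B: call the Lookup subroutine */
      Lookup(w);
    }
    Log('C', w);               /* C: remember */
    w++;
  }
  return Trace;
}
```

Calling `ReadBook()` returns the same sequence listed in the text, with ABDC appearing only for words 4 and 7 and AC for every other word.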
1.4
Concurrent and Parallel Programming
Many problems cannot be implemented using the single-threaded execution pattern described in the previous section. Parallel programming allows the computer to execute multiple threads at the same time. State-of-the-art multi-core processors can execute a separate program in each of their cores. Fork and join are the fundamental building blocks of parallel programming. After a fork, two or more software threads will be run in parallel, i.e., the threads will run simultaneously on separate processors. Two or more simultaneous software threads can be combined into one using a join. The flowchart symbols for fork and join are shown in Figure 1.9. Software execution after the join will wait until all threads above the join are complete. As an analogy, if I want to dig a big hole in my back yard, I will invite three friends over and give everyone a shovel. The fork operation changes the situation from me working alone to four of us ready to dig. The four digging tasks are run in parallel. When the overall task is complete, the join operation causes the friends to go away, and I am working alone again.
Concurrent programming allows the computer to execute multiple threads, but only one at a time. Interrupts are one mechanism to implement concurrency on real-time systems. Interrupts have a hardware trigger and a software action. An interrupt is a parameterless subroutine call, triggered by a hardware event. The flowchart symbols for interrupts are
Figure 1.9 Flowchart symbols to describe parallel and concurrent programming. [Figure: a fork that starts parallel processes and a join that merges them, together with the interrupt symbols: trigger interrupt, process, and return from interrupt.]
also shown in Figure 1.9. The trigger is a hardware event signaling it is time to do something. Examples of interrupt triggers we will see in this book include new input data has arrived, output device is idle, and periodic event. The second component of an interrupt-driven system is the software action called an interrupt service routine (ISR). The foreground thread is defined as the execution of the main program, and the background threads are executions of the ISRs.
Consider the analogy of sitting in a comfy chair reading a book. Reading a book is like executing the main program in the foreground. You start reading at the beginning of the book and basically read one page at a time in a sequential fashion. You might jump to the back and look something up in the glossary, then jump back to where you were, which is analogous to a function call. Similarly, you might read the same page a few times, which is analogous to a program loop. Even though you skip around a little, the order of pages you read follows a logical and well-defined sequence. Conversely, if the telephone rings, you place a bookmark in the book, and answer the phone. When you are finished with the phone conversation, you hang up the phone and continue reading in the book where you left off. The ringing phone is analogous to a hardware trigger and the phone conversation is like executing the ISR.
Example 1.3: Design a flowchart for a system that performs two independent tasks. The first task is to output a pulse on PTT every 1.024 ms in real time. The second task is to find all the prime numbers, and there are no particular time constraints on when or how fast one finds the prime numbers.
Solution
In this example, there are two threads: foreground and background. Real-time means the output pulse must occur every 1.024 ms. Therefore, we will use a periodic interrupt to guarantee this real-time requirement. In particular, the timer system will be configured so that a hardware trigger will occur every 1.024 ms, and the software action will issue the pulse on PTT. The background thread causes the output to go high, then low. Tasks that are not time-critical can be performed in the foreground by the main program. In this example, the foreground thread finds prime numbers. Because both threads are active at the same time, we say the system is multithreaded and the threads are running concurrently. The letters (A through F) in Figure 1.10 specify the software activities in this multithreaded example. In particular, main, Factor, and Record are executed in the foreground. In the foreground, execution is sequential and predictable (if C is to occur, it will come after B and before D). On the other hand, with interrupts, the hardware trigger causes the interrupt service routine to execute. The execution of the ISR is predictable too; in this case it is executed every 1.024 ms, but
Figure 1.10 Flowchart for a multithreaded solution of a system performing two tasks.
[Flowchart: the background thread Clock starts at an interrupt trigger (<), outputs PTT = 1 (E), outputs PTT = 0 (F), and returns from interrupt (>). The foreground thread main sets n=2 (A), calls Factor(n) (B), calls Record(n) if n is prime (C), sets n = n+1 (D), and repeats.]
void interrupt 7 Clock(void){
  PTT = 1;            // E
  PTT = 0;            // F
}
void main(void){ int n=2;   // A
  while(1){
    if(Factor(n))     // B
      Record(n);      // C
    n = n+1;          // D
  }
}
ISR execution does not depend on execution in the foreground. In a single processor system like the 9S12, the interrupt must suspend foreground execution, execute the interrupt service routine in the background, then resume execution of the foreground. The symbol < signifies the hardware halting the main program and launching the ISR. The symbol > signifies the ISR software executing a return from interrupt instruction (rti), which resumes execution in the main program. The execution sequence of this two-threaded system might be something like the following (2, 3, 5, 7 are prime):
Foreground: A B2 C2 D2 B3 C3 D3 B4 D4 B5 C5 D5 B6 D6 B7 C7 D7 B8 D8 B9 D9 B10 D10
Background: EF EF EF
where the subscript (written here as the number after each letter) refers to the current value of n. The main program executes the sequence BCD or BD over and over as it searches for prime numbers. In this example, the periodic timer causes the execution of EF every 1.024 ms. Even though C will come after B and before D, interrupts may or may not inject an EF between any two instructions of the foreground thread. Being able to inject an EF exactly every 1.024 ms is how the real-time constraint is satisfied.
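The interleaving of foreground and background can be imitated on a PC with the sketch below. It is a model, not 9S12 code: real interrupts are triggered asynchronously by the timer hardware, whereas here a hypothetical "timer" simply fires after every `TICK` foreground activities. The names `Step`, `Trace`, `TICK`, and `RunForeground` are invented for this illustration, and `Factor` is assumed to return nonzero when n is prime.

```c
#include <stdio.h>
#include <string.h>

#define TICK 8                  /* hypothetical: one simulated interrupt per 8 activities */
static int PTT;                 /* stand-in for the port register */
static int Steps;               /* number of foreground activities so far */
static char Trace[256];

static void Clock(void){ PTT = 1; PTT = 0; }    /* background: E then F */

/* Log one foreground activity, then let the simulated timer fire if it is due. */
static void Step(const char *label, int n){
  char item[16];
  if(n) snprintf(item, sizeof item, " %s%d", label, n);
  else  snprintf(item, sizeof item, "%s", label);
  strcat(Trace, item);
  if(++Steps % TICK == 0){ Clock(); strcat(Trace, " EF"); }
}

static int Factor(int n){       /* 1 if n is prime, found by trial division */
  for(int d = 2; d*d <= n; d++){ if(n % d == 0) return 0; }
  return n >= 2;
}

/* Foreground thread, stopped at n = last so the sketch terminates. */
const char *RunForeground(int last){
  int n = 2;
  Trace[0] = 0; Steps = 0;
  Step("A", 0);                 /* A: n=2 */
  while(n <= last){             /* the real main loops forever */
    Step("B", n);               /* B: Factor(n) */
    if(Factor(n)) Step("C", n); /* C: Record(n), only for primes */
    Step("D", n);               /* D: n = n+1 */
    n = n + 1;
  }
  return Trace;
}
```

With `TICK` set to 8, `RunForeground(10)` places EF after B4 and after C7. Changing `TICK` moves the EF pairs to other places in the foreground sequence without changing the order of the foreground activities themselves, which is the point made in the text.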
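Figure 1.11 Parallel programming solution for finding the maximum value in a buffer. [Flowchart: after a fork, one thread compares Buf[0] with Buf[1] and sets x to the larger, while a second thread compares Buf[2] with Buf[3] and sets y to the larger; after the join, x and y are compared (x>y gives max = x).]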
x0
0 Solenoid=on
1.7 䡲 Quality Design
19
the switches do not match we will lock the door. We will use a counter (cnt) to make sure the switches match the keycode for at least 1 ms before unlocking the door. The waiting is implemented by decrementing the counter. The hardware and software will be implemented in detail as Tutorial 5B.
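The counter-based wait can be sketched in C. This is only an illustration under stated assumptions, not the Tutorial 5B solution: the function name Lock_Step, the constant MATCH_TIME, and the idea that the function is called at a fixed rate are all hypothetical.

```c
#include <stdint.h>

#define MATCH_TIME 10u   /* hypothetical: number of calls that span at least 1 ms */

uint16_t cnt = MATCH_TIME;

/* Call periodically with the current switch reading.
   Returns 1 to unlock the door, 0 to keep it locked. */
int Lock_Step(uint8_t switches, uint8_t keycode){
  if(switches != keycode){
    cnt = MATCH_TIME;      /* mismatch: restart the wait and lock */
    return 0;
  }
  if(cnt > 0){
    cnt--;                 /* match: keep waiting */
    return 0;
  }
  return 1;                /* matched long enough: unlock */
}
```

Any mismatch reloads the counter, so a brief match shorter than the waiting period never unlocks the door.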
1.7 Quality Design
Embedded system development is similar to other engineering tasks. We can choose to follow well-defined procedures during the development and evaluation phases, or we can meander in a haphazard way and produce code that is hard to test and harder to change. The ultimate goal of the system is to satisfy the stated objectives such as accuracy, stability, and input/output relationships. Nevertheless it is appropriate to separately evaluate the individual components of the system. Therefore in this section, we will evaluate the quality of our software. There are two categories of performance criteria with which we evaluate the “goodness” of our software. Quantitative criteria include dynamic efficiency (speed of execution), static efficiency (memory requirements), and accuracy of the results. Qualitative criteria center on ease of software maintenance. Another qualitative way to evaluate software is ease of understanding. If your software is easy to understand then it will be:
• Easy to debug (fix mistakes)
• Easy to verify (prove correctness)
• Easy to maintain (add features)
Common Error: Programmers who sacrifice clarity in favor of execution speed often develop software that runs fast, but doesn’t work and can’t be changed.
Golden Rule of Software Development Write software for others as you wish they would write for you.
1.7.1 Quantitative Performance Measurements
In order to evaluate our software quality, we need performance measures. The simplest approaches to this issue are quantitative measurements. Dynamic efficiency is a measure of how fast the program executes. It is measured in seconds or CPU cycles. Static efficiency is the number of memory bytes required. Since most embedded computer systems have both RAM and ROM, we specify memory requirement in global variables, stack space, fixed constants and program. The global variables plus the stack must fit into the available RAM. Similarly, the fixed constants plus the program must fit into the available ROM. We can also judge our embedded system according to whether or not it satisfies given requirements and constraints, like accuracy, cost, power, size, reliability, and time-table.
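As a rough illustration of these two measurements, the sketch below times a function with the standard C clock() call and reports the size of a global buffer with sizeof. It assumes a desktop C compiler. On the 9S12 itself we would normally count bus cycles with a hardware timer instead, and the names SumTo, TimeSumTo, Buffer, and BufferBytes are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: sums the integers 1..n. */
uint32_t SumTo(uint32_t n){
  uint32_t i, sum = 0;
  for(i = 1; i <= n; i++){
    sum += i;
  }
  return sum;
}

/* Dynamic efficiency: processor time, in seconds, used by one call. */
double TimeSumTo(uint32_t n, uint32_t *result){
  clock_t start = clock();
  *result = SumTo(n);
  clock_t stop = clock();
  return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Static efficiency: RAM bytes taken by one global buffer. */
int16_t Buffer[100];
uint32_t BufferBytes(void){
  return (uint32_t)sizeof(Buffer);
}
```

The resolution of clock() is coarse, so a short function may report zero seconds. Repeating the call many times and dividing gives a more useful estimate.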
1.7.2 Qualitative Performance Measurements
Qualitative performance measurements include those parameters to which we cannot assign a direct numerical value. Often in life the most important questions are the easiest to ask, but the hardest to answer. Such is the case with software quality. Therefore, we ask the following qualitative questions. Can we prove our software works? Is our software easy to understand? Is our software easy to change? Since there is no single approach to writing the best software, we can only hope to present some techniques that you may wish to integrate into your own software style. In fact, this book devotes considerable effort to the important issue of developing quality software. In particular, we will study self-documented code, abstraction, modularity, and layered software. These issues have a profound effect on the bottom-line financial success of our projects. The effect is quite real, but because there is often not an immediate and direct relationship between a software’s quality and profit, we may be mistakenly tempted to dismiss the importance of quality.
To get a benchmark on how good a programmer you are, take the following two challenges. In the first challenge, find a major piece of software that you have written over 12 months ago, and then see if you can still understand it enough to make minor changes in its behavior. The second challenge is to exchange with a peer a major piece of software that you have both recently written (but not written together), then in the same manner, see if you can make minor changes to each other’s software. Observation: You can tell if you are a good programmer if 1) you can understand your own code 12 months later, and 2) others can make changes to your code.
1.7.3 Attitude
Good engineers employ well-defined design processes when developing complex systems. When we work within a structured framework, it is easier to prove our system works (verification) and to modify our system in the future (maintenance). As our software systems become more complex, it becomes increasingly important to employ well-defined software design processes. Throughout this book, a very detailed set of software development rules will be presented. This book focuses on real-time embedded systems written in assembly language, but most of the comments should apply to other situations as well. At first, it may seem radical to force such a rigid structure on software. We might wonder if creativity will be sacrificed in the process. True creativity is more about good solutions to important problems and not about being sloppy and inconsistent. Because software maintenance is a critical task, the time spent organizing, documenting, and testing during the initial development stages will reap huge dividends throughout the life of the software project.
Observation: The easiest way to debug is to write software without any bugs.
We define clients as programmers who will use our software. A client develops software that will call our functions. We define coworkers as programmers who will debug and upgrade our software. A coworker, possibly ourselves, develops, tests, and modifies our software. Writing quality software has a lot to do with attitude. We should be embarrassed to ask our coworkers to make changes to our poorly written software. Since so much software development effort involves maintenance, we should create software modules that are easy to change. In other words, we should expect each piece of our code will be read by another engineer in the future, whose job it will be to make changes to our code. We might be tempted to quit a software project once the system is running, but this short time we might save by not organizing, documenting, and testing will be lost many times over in the future when it is time to update the code. As project managers, we must reward good behavior and punish bad behavior. A company, in an effort to improve the quality of their software products, implemented the following policies. The employees in the customer relations department receive a bonus for every software bug that they can identify. These bugs are reported to the software developers, who in turn receive a bonus for every bug they fix. Checkpoint 1.8: Why did the above policy fail horribly?
We should demand of ourselves that we deliver bug-free software to our clients. Again, we should be embarrassed when our clients report bugs in our code. We should be mortified when other programmers find bugs in our code. There are a few steps we can take to facilitate this important aspect of software design.
1. Test it now. When we find a bug, fix it immediately. The longer we put off fixing a mistake the more complicated the system becomes, making it harder to find. Remember that bugs do not go away on their own, but we can make the system so complex that the bugs will manifest themselves in a mysterious and obscure fashion. For the same reason, we should completely test each module individually, before combining them into a larger system. We should not add new features before we are convinced the existing system is bug-free. In this way, we start with a working system, add features, then debug this system until it is working again. This incremental approach makes it easier to track progress. It allows us to undo bad decisions, because we can always revert to a previously working system. Adding new features before the old ones are debugged is very risky. With this sloppy approach, we could easily reach the project deadline with 100% of the features implemented, but have a system that doesn’t run. In addition, once a bug is introduced, the longer we wait to remove it, the harder it will be to correct. This is particularly true when the bugs interact with each other. Conversely, with the incremental approach, when the project schedule slips, we can deliver a working system at the deadline that supports some of the features.
Maintenance Tip: Go from working system to working system.
2. Plan for testing. How to test each module should be considered at the start of a project. In particular, testing should be included as part of the design of both hardware and software components. Our testing and the client’s usage go hand in hand. In particular, how we test the module will help the client understand the context and limitations of how our component is to be used. On the other hand, a clear understanding of how the client wishes to use our hardware/software component is critical for both its design and its testing. Maintenance Tip: It is better to have some parts of the system that run with 100% reliability than to have the entire system with bugs.
3. Get help. Use whatever features are available for organization and debugging. Pay attention to warnings, because they often point to misunderstandings about data or functions. Misunderstood assumptions can cause bugs when the software is upgraded, or reused in a different context than originally conceived. Remember that computer time is a lot cheaper than programmer time.
Maintenance Tip: It is better to have a software system that runs slow than one that does not run at all.
4. Deal with the complexity. In the early days of microcomputer systems, software size could be measured in 100’s of lines of source code using 1000’s of bytes of memory. These early systems, due to their small size, were inherently simple. The explosion of hardware technology (both in speed and size) has led to a similar increase in the size of software systems. Some people forecast that by the next decade, automobiles will have 10 million lines of code in their embedded systems. The only hope for success in a large software system will be to break it into simple modules. In most cases, the complexity of the problem itself cannot be avoided. E.g., there is just no simple way to get to the moon. Nevertheless, a complex system can be created out of simple components. A real creative effort is required to orchestrate simple building blocks into larger modules, which themselves are grouped to create even larger systems. Use your creativity to break a complex problem into simple components, rather than developing complex solutions to simple problems.
Observation: There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. C.A.R. Hoare, “The Emperor’s Old Clothes,” CACM Feb. 1981.
1.8 Debugging Theory
The last section of every chapter will address debugging techniques. Every programmer is faced with the need to debug and verify the correctness of his or her software. A debugging instrument is hardware or software used for the purpose of debugging. In this book, we will study hardware-level probes like the logic analyzer and in-circuit emulator (ICE); software-level tools like simulators, monitors, and profilers; and manual tools like inspection and print statements. Nonintrusiveness is the characteristic or quality of a debugger that allows the software/hardware system to operate normally as if the debugger did not exist. Intrusiveness is used as a measure of the degree of perturbation caused in program performance by the debugging instrument itself. For example, a printf statement added to your source code is very intrusive because it significantly affects the real-time interaction of the hardware and software. A debugging instrument is classified as minimally intrusive if it has a negligible effect on the system being debugged. In a real microcomputer system, breakpoints and single-stepping are also intrusive, because the real hardware continues to change while the software has stopped. When a program interacts with real-time events, the performance can be significantly altered when using intrusive debugging tools. On the other hand, dumps, dumps with filter, and monitors (e.g., output strategic information on LEDs) are much less intrusive. A logic analyzer that passively monitors the activity of the software is completely nonintrusive. An in-circuit emulator is also nonintrusive because the software input/output relationships will be the same with and without the debugging tool. Similarly, breakpoints and single-stepping on a simulator like TExaS are nonintrusive, because the simulated hardware and the software are affected together.
Checkpoint 1.9: What does it mean for a debugging instrument to be minimally intrusive? Give both a general answer and a specific criterion.
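To make the contrast with printf concrete, the following is a minimal sketch of a dump and a dump with filter. It assumes the data of interest is an 8-bit value. The names Debug_Dump, Debug_DumpFiltered, DumpBuf, DumpCount, and DUMP_SIZE are hypothetical.

```c
#include <stdint.h>

#define DUMP_SIZE 8            /* hypothetical buffer length */
uint8_t DumpBuf[DUMP_SIZE];    /* recorded values */
uint16_t DumpCount = 0;        /* number of values recorded */

/* Dump: store one value in RAM, with no formatting and no I/O. */
void Debug_Dump(uint8_t value){
  if(DumpCount < DUMP_SIZE){
    DumpBuf[DumpCount] = value;
    DumpCount++;
  }
}

/* Dump with filter: record only values that satisfy a condition. */
void Debug_DumpFiltered(uint8_t value, uint8_t threshold){
  if(value > threshold){
    Debug_Dump(value);
  }
}
```

Each call costs only a comparison and a store, which is why a dump usually disturbs timing far less than a print statement. The buffer is examined after the system has run.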
Research in the area of program monitoring and debugging mirrors the rapid pace of developments in other areas of computer architecture and software systems. Because of the complexity explosion in computer systems, effective debugging tools are essential. Some experts predict the software footprint programmed into the embedded systems of one automobile will soon reach 10 million lines of code. The critical aspect of debugging an embedded system is the ability to see what the software is doing, where it is executing, and when it did it, without the debugger itself modifying system behavior. Terms such as program testing, diagnostics, performance debugging, functional debugging, tracing, profiling, instrumentation, visualization, optimization, verification, performance measurement, and execution measurement have specialized meanings, but they are also used interchangeably, and they often describe overlapping functions. For example, the terms profiling, tracing, performance measurement, or execution measurement may be used to describe the process of examining a program from a time viewpoint. But tracing is also a term that may be used to describe the process of monitoring a program state or history for functional errors, or to describe the process of stepping through a program with a debugger. Usage of these terms among researchers and users varies. Furthermore, the meaning and scope of the term debugging itself is not clear. In this book the goal of debugging is to maintain and improve software, and the role of a debugger is to support this endeavor. The debugging process is defined as testing, stabilizing, localizing, and correcting errors. Although testing, stabilizing, and localizing errors are important and essential to debugging, they are auxiliary processes: the primary goal of debugging is to remedy faults or to correct errors in a program.
Stabilization is the process of fixing the inputs so that the system can be run over and over again yielding repeatable outputs. Although a wide variety of program monitoring and debugging tools are available today, in practice it is found that an overwhelming majority of users either still prefer or rely mainly upon “rough and ready” manual methods for locating and correcting program errors. These methods include desk-checking, dumps, and print statements, with print statements being one of the most popular manual methods. Manual methods are useful because they are readily available, and they are relatively simple to use. But the usefulness of manual methods is limited: they tend to be highly intrusive, and they do not provide adequate control over repeatability, event selection, or event isolation.
A real-time system, where software execution timing is critical, usually cannot be debugged with simple print statements, because the print statement itself will require too much time to execute. A debugging instrument is defined as hardware or software that is added to the system for the purpose of debugging. A print statement is a common example of an instrument. Using the editor, one adds print statements to the code that either verify proper operation or illustrate the programming errors. If we test a system, then remove the instruments, the system may actually stop working, because of the importance of timing in embedded systems. If we leave debugging instruments in the final product, we can use the instruments to test systems on the production line, or test systems returned for repair. On the other hand, sometimes we wish to provide for a mechanism to reliably and efficiently remove all instruments when the debugging is done. Consider the following mechanisms as you develop your own unique debugging style.
• Place all instruments in a unique column, so you can easily distinguish instruments from regular programs.
• Define all debugging instruments as functions that all have a specific pattern in their names. In this way, the find/replace mechanism of the editor can be used to find all the calls to the instruments.
• Define the instruments so that they test a run time global flag. When this flag is turned off, the instruments perform no function. Notice that this method leaves a permanent copy of the debugging code in the final system, causing it to suffer a runtime overhead, but the debugging code can be activated dynamically without recompiling. Many commercial software applications utilize this method because it simplifies “on-site” customer support.
• Use conditional compilation (or conditional assembly) to turn on and off the instruments when the software is compiled. When the assembler or compiler supports this feature, it can provide both performance and effectiveness.
The emergence of concurrent languages and the increasing use of embedded real-time systems place further demands on debuggers. The complexities introduced by the interaction of multiple events or time dependent processes are much more difficult to debug than errors associated with sequential programs. The behavior of non-real-time sequential programs is reproducible: for a given set of inputs their outputs remain the same. In the case of concurrent or real-time programs this does not hold true. Control over repeatability, event selection, and event isolation is even more important for concurrent or real-time environments.
Checkpoint 1.10: Consider the difference between a runtime flag that activates a debugging command versus an assembly/compile-time flag. In both cases it is easy to activate/deactivate the debugging statements. For each method, list one factor for which that method is superior to the other.
Checkpoint 1.11: What is the advantage of leaving debugging instruments in a final delivered product?
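The run-time flag and the conditional compilation mechanisms described in this section can be sketched side by side. The names DEBUG, DebugFlag, Debug_Runtime, Debug_Compiled, and the two hit counters are hypothetical. A real instrument would record or output something more useful than a count.

```c
#include <stdint.h>

#define DEBUG 1              /* compile-time switch: set to 0 to empty Debug_Compiled */

uint8_t DebugFlag = 0;       /* run-time switch: nonzero activates Debug_Runtime */
uint16_t RuntimeHits = 0;    /* times the run-time instrument was active */
uint16_t CompiledHits = 0;   /* times the compiled-in instrument ran */

/* Run-time flag: the code is always present, and the flag is tested on every call. */
void Debug_Runtime(void){
  if(DebugFlag){
    RuntimeHits++;
  }
}

/* Conditional compilation: with DEBUG set to 0 the body compiles to nothing. */
void Debug_Compiled(void){
#if DEBUG
  CompiledHits++;
#endif
}
```

The first form can be switched on in a delivered system without rebuilding it. The second removes the instrument's code from the body entirely, at the price of a recompile.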
1.9 Tutorial 1. Getting Started
Tutorials in this book represent short activities for you to do on your own. Each tutorial allows you to have a hands-on experience to support the basic concepts. An action defines a specific task that you should perform. The answers to the questions can be found at the end of the book. The objective of this first tutorial is to provide an overview of embedded system development in general and of the TExaS simulator in particular. When you are ready to use the TExaS simulator to develop your own programs, first perform this tutorial, then install TExaS and read the Getting Started section found in the TExaS help menu.
Action: Watch the first getting started movie, called Lesson 1. First time users of TExaS should watch the Lesson 1 animation located on the web at http://users.ece.utexas.edu/~valvano/Readme.htm. This lesson introduces the major components of the application. It takes about 11 minutes and provides a narrated overview of the TExaS application. You need not install TExaS, just download and run the Windows media file.
Question 1.1 The branch instruction causes what two instructions to execute ten times?
Question 1.2 Into what type of memory is count defined?
Question 1.3 Into what type of memory is the main program defined?
Question 1.4 What special features of the TExaS editor help in writing assembly programs?
Question 1.5 What microcomputer is being simulated?
Question 1.6 What does the red cursor in the listing file signify?
Question 1.7 Does the TExaS application simulate both hardware and software or just software?
1.10 Homework Assignments
Homework 1.1 In order to reduce power, some microcomputers run on 3.3 V instead of 5 V. Redraw Figure 1.1 using 3.3 V power, and define what logic high and logic low would be for this system. Assume the resistance path from the 5 V supply to ground for 5 V logic is approximately equal to the resistance from the 3.3 V supply to ground for 3.3 V logic. What is the percentage reduction in power occurring by switching from 5 V to 3.3 V?
Homework 1.2 There is a microcomputer embedded in a vending machine. List three operations the software must perform.
Homework 1.3 What is a port?
Homework 1.4 What does nonvolatile mean?
Homework 1.5 What do the acronyms RAM, ROM, I/O, DAC, and ADC mean?
Homework 1.6 What is the difference between a microcomputer and a microcontroller?
Homework 1.7 Using a flowchart describe the control algorithm that a thermostat must use to maintain constant temperature. Assume the inputs are current temperature in °F, the desired temperature in °F, and an AC/off/heat three-way switch. The outputs are AC (on/off) and heat (on/off). Write a brief software requirements document for this system.
Homework 1.8 Using a flowchart describe the cruise control algorithm that a car must use to maintain constant speed. Assume the inputs are current speed in mph, brake (on/off), and a cruise on/off momentary button. The output is accelerator position (0 to 100%). The desired speed is the current speed at the time the cruise control is activated. Touching the brake turns off the system. Write a brief software requirements document for this system.
Homework 1.9 Draw a flowchart of the following C program. Assume PORTB is an output. This is an incremental controller that maintains the motor at a constant speed of 100.
void main(void){ unsigned char power,speed;
  power = 0;
  ADC_Init();              /* turn on ADC power */
  while(1){
    PORTB = power;         /* output to actuator */
    speed = ADC_Read();
    if(speed < 100){       /* too slow */
      if(power < 255) power++;
    }
    else{                  /* too fast */
      if(power > 1) power--;
    }
  }
}
Homework 1.10 Write C code for the flowchart shown in Figure Hw1.10. PORTB is an output connected to a stepper motor. PORTA is an input connected to a toggle switch.
Figure Hw1.10 Flowchart showing a stepper motor controller, used for Homework 1.10.
[Flowchart: main reads PORTA, tests bit 0, and makes the calls step(5), step(9), step(10), and step(6). The function step(n) performs PORTB=n, sets cnt = 10000, then repeats cnt = cnt-1 while cnt is greater than 0 and returns when cnt equals 0.]
Homework 1.11 Draw a data flow graph of the thermostat algorithm developed in Homework 1.7.
Homework 1.12 Draw a data flow graph of the cruise control algorithm developed in Homework 1.8.
Homework 1.13 Draw a flowchart of this C program using just the three basic building blocks of structured programming. In particular, first draw the flowchart in the regular way, then show the groupings that define each basic block.
short data[100],sum;
void calc(void){ short i;
  sum=0;
  for(i=0;i

; 9S12DP512
PTT  equ  $0240
DDRT equ  $0242
     org  $0800    ;RAM
     org  $4000    ;EEPROM
main lds  #$4000   ;SP=>RAM
     ldaa #$07
     staa DDRT     ;PT2,PT1,PT0 outputs
     cli           ;allow debugger
loop ldaa #4
     staa PTT      ;output
     ldaa #2
     staa PTT      ;output
     ldaa #1
     staa PTT      ;output
     bra  loop
     org  $FFFE    ;EEPROM
     fdb  main     ;reset vector

// 9S12DP512
void main(void){
  DDRT = 0x07;     // PT2,PT1,PT0 outputs
  asm cli
  while(1){
    PTT = 0x04;
    PTT = 0x02;
    PTT = 0x01;
  }
}
Program 2.2 Software solution to Example 2.2.
To better understand how the computer translates our program into actions, we will analyze the explicit actions that occur as Program 2.2 executes. After typing in our source code, we will assemble the program generating the machine code and listing file. Program 2.3 is the listing file for Program 2.2. Line numbers were manually added to show instructions that will be executed. Looking at the big picture, we see that lines 1 through 4 are executed once to initialize the system, and lines 5 through 11 are repeated over and over as the system produces the infinite output sequence 4-2-1-4-2-1-4-2-1- . . .
1 $2000 for the 9S12E128 and $3800 for the 9S12C32
50
2 䡲 Introduction to Assembly Language Programming
Program 2.3 Listing file for Program 2.2.
                    ; 9S12DP512
$0240         PTT   equ  $0240
$0242         DDRT  equ  $0242
$0800               org  $0800    ;RAM
$4000               org  $4000    ;EEPROM
$4000 CF4000  main  lds  #$4000   ;SP=>RAM          *Line 1
$4003 8607          ldaa #$07                       *Line 2
$4005 7A0242        staa DDRT     ;PT2,PT1,PT0      *Line 3
$4008 10EF          cli           ;allow debugger   *Line 4
$400A 8604    loop  ldaa #4                         *Line 5
$400C 7A0240        staa PTT      ;output           *Line 6
$400F 8602          ldaa #2                         *Line 7
$4011 7A0240        staa PTT      ;output           *Line 8
$4014 8601          ldaa #1                         *Line 9
$4016 7A0240        staa PTT      ;output           *Line 10
$4019 20EF          bra  loop                       *Line 11
$FFFE               org  $FFFE    ;EEPROM
$FFFE 4000          fdb  main     ;reset vector
The machine code is programmed into the flash EEPROM. In particular, locations $4000 to $401A will contain the machine code for this program, and locations $FFFE to $FFFF will always contain the reset vector, as shown in Figure 2.20.
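The role of the reset vector can be modeled in a few lines of C. This is only a sketch of the idea, not a description of the hardware: the array Mem and the functions LoadVector and Reset_PC are hypothetical. It relies on the fact that the 9S12 stores 16-bit values with the most significant byte at the lower address.

```c
#include <stdint.h>

uint8_t Mem[0x10000];   /* hypothetical model of the 64 KiB address space */

/* Places a reset vector at $FFFE and $FFFF, most significant byte first. */
void LoadVector(uint16_t start){
  Mem[0xFFFE] = (uint8_t)(start >> 8);
  Mem[0xFFFF] = (uint8_t)(start & 0xFF);
}

/* Reset: the PC is loaded with the 16-bit number at $FFFE and $FFFF. */
uint16_t Reset_PC(void){
  return (uint16_t)((Mem[0xFFFE] << 8) | Mem[0xFFFF]);
}
```

With the vector for this program, LoadVector(0x4000) stores $40 at $FFFE and $00 at $FFFF, and Reset_PC returns $4000, the address of main.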
Figure 2.20 Memory model of Program 2.2. After reset, PC=$4000.
[Figure: the processor (RegA, SP, PC) is connected by the address and data bus to the I/O ports (PTT at $0240, DDRT at $0242) and to the EEPROM. EEPROM locations $4000 to $401A contain $CF $40 $00 $86 $07 $7A $02 $42 $10 $EF $86 $04 $7A $02 $40 $86 $02 $7A $02 $40 $86 $01 $7A $02 $40 $20 $EF, and locations $FFFE and $FFFF contain $40 and $00.]
When power is applied to the system, or when the reset button is pushed, the computer reads the 16-bit number from locations $FFFE and $FFFF and places it into the PC. This defines the place the program will begin execution. In this example, the software will start executing at $4000.
Lines 1 through 4: These four lines perform the initialization sequence. Executing the lds instruction will initialize the stack pointer. Although not specifically used in this example, the stack is an important structure and should be initialized in this manner for all our 9S12 software. During the execution of each instruction, the PC is incremented to the next instruction. Executing the ldaa instruction will set register A equal to $07. Since this is immediate mode addressing, the data can be found in the machine code itself. Since it is immediate mode, the data will be fixed, and can only be changed by editing the source code and reassembling the program. Executing the staa instruction will set DDRT equal to $07. Since this is extended mode addressing, the machine code contains the address of DDRT, $0242. DDRT specifies whether each pin of Port T is an input or an output. This store instruction produces a write cycle to address $0242 with data $07, causing PT2, PT1 and PT0 to become output pins. Notice that the load instructions bring data from memory or a port into a register, and the store instructions send data from a register out to memory or a port. Executing the cli instruction will enable interrupts. Although this program does not specifically use interrupts, the debugger needs to have interrupts enabled.
Address  Object code  Source code   Action     After completion
$4000    $CF4000      lds  #$4000   SP=$4000   PC=$4003
$4003    $8607        ldaa #$07     A=$07      PC=$4005
$4005    $7A0242      staa DDRT     DDRT=$07   PC=$4008
$4008    $10EF        cli           I=0        PC=$400A
Lines 5 through 10: These lines perform the body of the program, causing the 4-2-1 output sequence. Executing the ldaa instructions will set register A equal to a constant. The # symbol specifies immediate mode addressing, so the constant data can be found in the machine code itself. Executing the staa instructions will set PTT. This is extended mode addressing, therefore the machine code contains the address of PTT, $0240. The staa instructions produce write cycles to address $0240. When you write 1/0 binary data to Port T, the high/low digital voltages occur on the corresponding output pins. In this example, each staa instruction sets a new output on PTT, and notice the sequence will be 4-2-1, as desired.
Address  Object code  Source code   Action     After completion
$400A    $8604        ldaa #4       A=4        PC=$400C
$400C    $7A0240      staa PTT      PTT=4      PC=$400F
$400F    $8602        ldaa #2       A=2        PC=$4011
$4011    $7A0240      staa PTT      PTT=2      PC=$4014
$4014    $8601        ldaa #1       A=1        PC=$4016
$4016    $7A0240      staa PTT      PTT=1      PC=$4019
Line 11: This line causes the execution of the body of the program to occur over and over. The bra instruction uses PC-relative addressing. During the fetching of the two bytes of machine code, the PC is incremented twice, changing it from $4019 to $401B. The PC-relative offset, $EF, is sign extended to $FFEF, which means -17. This is an unconditional branch, so PC = PC-17 (or $401B+$FFEF), setting PC back to line 5.
Address  Object code  Source code   Action     After completion
$4019    $20EF        bra  loop                PC=$400A
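The PC-relative arithmetic for the bra instruction can be checked with a short C function. This is a sketch of the calculation only. The name Branch_Target is hypothetical, and the function assumes the two-byte form of bra with an 8-bit offset, as used in this program.

```c
#include <stdint.h>

/* Computes the destination of a two-byte bra located at address pc,
   given the 8-bit PC-relative offset from its machine code. */
uint16_t Branch_Target(uint16_t pc, uint8_t offset){
  uint16_t next = (uint16_t)(pc + 2);      /* PC after fetching both bytes */
  uint16_t extended;
  if(offset & 0x80){
    extended = (uint16_t)(0xFF00u | offset);   /* negative: $EF becomes $FFEF */
  } else {
    extended = offset;                         /* positive: upper byte is $00 */
  }
  return (uint16_t)(next + extended);      /* 16-bit add, carry discarded */
}
```

For the instruction at $4019 with offset $EF, the function adds $401B and $FFEF, discards the carry, and returns $400A, the address of loop.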
2.10 Tutorial 2. Running with TExaS
This tutorial explains some of the debugging features available with TExaS. A vast amount of information exists as the computer executes software. A good debugger allows us to selectively filter this information, showing us only data relevant to the problem at hand. There are two aspects of this filter: what information will we see, and when (or how often) will it be collected? The run mode allows us to adjust the level of detail observable during the simulation.
Action: Watch the second movie, called Lesson 2. Lesson 2 is located on the web at http://users.ece.utexas.edu/~valvano/Readme.htm. This lesson introduces some of the debugging features. It takes about 11 minutes and provides a narrated overview of debugging within TExaS. You need not install TExaS, just download and run the Windows media file.
Question 2.1. A good debugger allows us to filter data that we observe. What are the two aspects of this filtering? I.e., in what two ways does the debugger filter data?
Question 2.2. What format code do we use in the ViewBox to see a variable in 8-bit unsigned decimal?
Question 2.3. What does CycleView mode do?
Question 2.4. What does InstructionView mode do?
Question 2.5. What does LogRecord mode do?
Question 2.6. What is a ScanPoint?
2.11 Homework Assignments
Homework 2.1 What are the differences between the following four instructions:
     ldaa 10
     ldaa #10
     ldaa $10
     ldaa #$10
Homework 2.2 What is the difference between the following two instructions:
     ldaa #10
     ldx  #10
Homework 2.3 Identify the addressing mode used in each of the following instructions:
     staa 200
     staa 2000
     staa 200,x
     staa 2000,x
     bra  2000
     jmp  2000
Homework 2.4 Identify the addressing mode used in each of the following instructions:
     subd 2,x
     clra
     ldaa #$36
     ldd  $3800
     bra  loop
Homework 2.5 You will need to look up the address of Ports A and J in your data sheet to answer this question. Identify the addressing mode used in each of the following instructions:
     cli
     subd #0
     bsr  $5000
     jsr  $5000
     ldy  2,y
     ldaa PTJ     ;Port J
     stab PORTA   ;Port A
     rts
The next three homework assignments in this chapter involve hand assembly. Pass1 contains three steps. The first step is to determine the addressing mode for each instruction. Next, you calculate the object code size for the instruction. The third step is to create the symbol table. Pass2 contains two steps. The first step is to determine the object code for each instruction, and the second step is to write the listing (address, data) for each line.
Homework 2.6 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRH equ  $0262   ; Port H Data Direction Register
DDRT equ  $0242   ; Port T Data Direction Register
PTH  equ  $0260   ; Port H I/O Register
PTT  equ  $0240   ; Port T I/O Register
     org  $4000   ; Object code goes in EEPROM
Main ldaa #$FF
     staa DDRT    ; Port T is output
     ldaa #$00
     staa DDRH    ; Port H is input
loop ldaa PTH     ; Read inputs
     staa PTT     ; Set output
     bra  loop    ; Repeat
     org  $FFFE
     fdb  Main    ; Starting address after a RESET
Homework 2.7 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRP  equ  $025A       ; Port P Data Direction Register
PTP   equ  $0258       ; Port P I/O Register
      org  $0800       ; Variables go in RAM
Data  rmb  1
      org  $4000       ; Object code goes in EEPROM
Main  movb #$00,DDRP   ; Port P is input
loop  ldaa PTP         ; Read inputs
      staa Data        ; Save in variable
      bra  loop        ; Repeat
      org  $FFFE
      fdb  Main        ; Starting address after a RESET
Homework 2.8 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
      org  $0800       ; Variables go in RAM
Data  rmb  1
      org  $4000       ; Object code goes in EEPROM
Main  lds  #$4000      ; Initialize stack
      movb #10,Data    ; Data=10
loop  bsr  Add1
      bra  loop        ; Repeat
Add1  ldaa Data
      inca             ; Add one
      staa Data
      rts
      org  $FFFE
      fdb  Main        ; Starting address
Homework 2.9 During an 8-bit memory read bus cycle to address $3800, what memory locations are modified? During an 8-bit memory write bus cycle to address $3800, what memory locations are modified?
Homework 2.10 Consider this assembly instruction
Here  bsr  Lookup      ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state “not possible”.
Here     Lookup   machine code   value pushed
$4040    $4060
$5050    $5020
$5050    $4060
Homework 2.11 Consider this assembly instruction
Here  jsr  Lookup      ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state “not possible”.
Here     Lookup   machine code   value pushed
$4040    $4060
$5050    $5020
$5050    $4060
Homework 2.12 Assume RegX is $3800, RegD is $4647, the PC is $4123, and RAM locations $3800 to $38FF are initially $00, $01, . . . $FF respectively. E.g., location $3856 contains $56. Show the simplified bus cycles occurring when the ldd 2,x instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4123 EC02    ldd  2,x
Homework 2.13 Assume PC is $4120, and the SP is initially $3FF4. Show the simplified bus cycles occurring when the bsr instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4120 07F0    bsr  MyFunction
Homework 2.14 What does the effective address register contain?
Homework 2.15 What is the purpose of each of the following registers: CCR, SP, PC, IR, and EAR?
Homework 2.16 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into the individual bus cycles required to execute it.
      org  $F000
      ldaa #44
      ldy  #$0010
      staa 4,y
Homework 2.17 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into the individual bus cycles required to execute it.
      org  $F000
      ldab #$55
      ldx  #$0020
      stab 5,x
Homework 2.18 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents.
      org  $F000
      fcb  $86,$55,$CE,$02,$50,$F6,$F0,$00,$5A,$01,$6B,$08,$20,$FA
Homework 2.19 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents. Each value is in hexadecimal.
      org  $4000
      fcb  $87,$CE,$02,$40,$F6,$40,$01,$5A,$08,$54,$6B,$02,$20,$FB
Homework 2.20 Write an assembly language subroutine that initializes Port J bits 5, 4, 1, 0 to outputs and bits 7, 6, 3, 2 to inputs. Make all Port H bits inputs, and all Port T bits outputs.
Homework 2.21 Write an assembly language subroutine that initializes Port T bits 7, 4, 3, 0 to outputs and bits 6, 5, 2, 1 to inputs. Make all Port M bits outputs.
Homework 2.22 Write assembly language software that initializes Port T bit 3 to an output. All other bits are inputs.
Homework 2.23 Write assembly language software that initializes Port H bit 1 to an input. All other bits are outputs.
Homework 2.24 Write assembly software that makes Port T bits 1, 3, 5, and 7 outputs and the rest inputs.
Homework 2.25 Interface an LED that requires 1 mA at 2.5 V. A digital output high on PT0 turns on the LED.
Homework 2.26 Interface an LED that requires 2 mA at 2.0 V. A digital output low on PT1 turns on the LED.
Homework 2.27 Interface an LED that requires 15 mA at 2.5 V. Use a 7405 driver and a current limiting resistor. A digital output high on PT2 turns on the LED. The 7405 output voltage VOL is 0.5 V.
Homework 2.28 Interface an LED that requires 30 mA at 1.5 V. Use a 7406 driver and a current limiting resistor. A digital output high on PT3 turns on the LED. The 7406 output voltage VOL is 0.5 V.
2.12 Laboratory Assignments
For each lab in this chapter, you will have two binary switch inputs and one LED output. The LED represents the output, and the operator will toggle the switches in order to set the inputs. Let T be the Boolean variable representing the output (0 means the LED is off and the output is zero, 1 means the LED is on and the output is 1). Let H and J be Boolean variables representing the state of the two switches (0 means the switch is not pressed, and 1 means the switch is pressed).
Use the TExaS simulator to create three files. Lab2.rtf will contain the assembly source code. Lab2.uc will contain the microcomputer configuration. Lab2.io will define the external connections, which should be the two switches and one LED. Use the Mode-Processor command to select the desired processor. You should connect switches to PH0 (Port H bit 0) and to PJ0 (Port J bit 0). You should connect an LED to PT0 (Port T bit 0). The switches should be labeled H and J, and the LED should be labeled T. When the H switch is in the “off” or open position, the signal at PH0 will be 0 V, which is a logic “0”. For this situation, your software will consider H to be false. When the H switch is in the “on” or closed position, the signal at PH0 will be 5 V, which is a logic “1”. In this case, your software will consider H to be true. The J switch, which is connected to PJ0, will operate in a similar fashion. When your software writes a “1” to PT0, the LED will turn on.
You will write assembly code that inputs from PH0 and PJ0, and outputs to PT0. A template structure for your assembly program is shown as Program 2.2. To solve this lab you will need the ldaa, staa, anda, coma, and bra instructions. You can use the movb instruction if you wish. You can copy and paste the address definitions for ports H, J, and T from the port12.rtf file. In particular, you will need to define DDRH, DDRJ, DDRT, PTH, PTJ, and PTT.
The opening comments include: file name, overall objectives, hardware connections, specific functions, author name, and date. The equ pseudo-op is used to define port addresses. Global variables are declared in RAM, and the main program is placed in EEPROM. The 16-bit contents at $FFFE and $FFFF (the reset vector) define where the computer will begin execution after a reset.
Lab 2.1 The specific device you will create is a digital NAND with two binary switch inputs and one LED output. The specific function you will implement is
T = ~(H&J)
This means the output will be zero if and only if both the H switch and the J switch are pressed. Program L2.1 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.1 The C program to illustrate Lab 2.1.
void main(void){
  DDRH = 0x00;          // make Port H an input, PH0 is H
  DDRJ = 0x00;          // make Port J an input, PJ0 is J
  DDRT = 0xFF;          // make Port T an output, PT0 is T
  while(1){
    PTT = ~(PTJ&PTH);   // LED off iff PJ0=1 and PH0=1
  }
}
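The remark that a better solution would affect only the necessary bits can be sketched in C. This is a minimal illustration under stated assumptions, not the lab solution: the helper name nand_bit0 is hypothetical, and the port values are passed in as parameters so the fragment compiles on any machine. On the 9S12 the three values would be read from PTT, PTH, and PTJ.

```c
#include <stdint.h>

/* Hypothetical helper: compute the next Port T value for the NAND lab
   while leaving bits 7-1 of Port T unchanged.
   ptt, pth, ptj are the current values of Ports T, H, and J. */
uint8_t nand_bit0(uint8_t ptt, uint8_t pth, uint8_t ptj) {
    uint8_t t = (uint8_t)(~(pth & ptj)) & 0x01;  /* T = not(H and J), bit 0 only */
    return (uint8_t)((ptt & 0xFE) | t);          /* keep bits 7-1, replace bit 0 */
}
```

In the main loop, the single assignment to PTT would then become PTT = nand_bit0(PTT, PTH, PTJ); the mask 0xFE preserves whatever other devices are connected to PT7 through PT1.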
Lab 2.2 The specific device you will create is a digital NOR with two binary switch inputs and one LED output. The specific function you will implement is
T = (~H)&(~J)
This means the output will be one if and only if both the H switch and the J switch are not pressed. Program L2.2 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.2 The C program to illustrate Lab 2.2.
void main(void){
  DDRH = 0x00;              // make Port H an input, PH0 is H
  DDRJ = 0x00;              // make Port J an input, PJ0 is J
  DDRT = 0xFF;              // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&(~PTH);    // LED on iff PJ0=0 and PH0=0
  }
}
Lab 2.3 The specific device you will create is a digital lock with two binary switch inputs and one LED output. The LED output represents the lock, and the operator will toggle the switches in order to unlock the door. The specific function you will implement is
T = H&(~J)
This means the LED will be on if and only if the H switch is pressed and the J switch is not pressed. Program L2.3 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.3 The C program to illustrate Lab 2.3.
void main(void){
  DDRH = 0x00;           // make Port H an input, PH0 is H
  DDRJ = 0x00;           // make Port J an input, PJ0 is J
  DDRT = 0xFF;           // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&PTH;    // LED on iff PJ0=0 and PH0=1
  }
}
3 Representation and Manipulation of Information
Chapter 3 objectives are:
• Introduce the concept of how numbers are stored on the computer
• Discuss how characters are represented
• Define terms like precision and basis
• Review arithmetic and logic operations
• Explain the usage of condition code bits
• Develop mechanisms to convert between character strings and binary numbers
Numbers, like all information, are stored on the computer in binary form. On most computers, the memory is organized into 8-bit bytes. This means each 8-bit byte stored in memory will have a separate address. In this chapter we will learn about unsigned numbers, signed numbers, characters, and how to perform basic logical and arithmetic calculations. In order to develop reliable systems it is important to understand how the computer can make mistakes during calculations. With this knowledge, we can write software that detects when an error occurs, or better yet, we can write software that does not make mistakes.
3.1 Precision
Precision is the number of distinct or different values. We express precision in alternatives, decimal digits, bytes, or binary bits. Alternatives are defined as the total number of possibilities. For example, an 8-bit number format can represent 256 different numbers. An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs. An 8-bit analog to digital converter (ADC) can measure 256 different analog inputs. Table 3.1 illustrates the relationship between precision in binary bits and precision in alternatives.
Table 3.1 Relationship between bits, bytes and alternatives as units of precision.
Binary Bits   Bytes      Alternatives
8             1          256
10            2          1024
12            2          4096
16            2          65536
20            3          1,048,576
24            3          16,777,216
30            4          1,073,741,824
32            4          4,294,967,296
n             [[n/8]]    2^n
The operation [[x]] is defined as the smallest integer greater than or equal to x (the ceiling of x). E.g., [[2.1]], [[2.9]], and [[3.0]] are all equal to 3. The Bytes column in Table 3.1 specifies how many bytes of memory it would take to store a number with that precision, assuming the data were not packed or compressed in any way.
Checkpoint 3.1: How many bytes of memory would it take to store a 50-bit number?
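The Bytes column of Table 3.1 can be computed with integer arithmetic alone. The following is a small sketch; the function name bytes_for_bits is an assumption made for illustration. Adding 7 before the integer division by 8 rounds the quotient up, which is the same as [[n/8]].

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to store an n-bit number,
   assuming the data are not packed. Equivalent to the ceiling of bits/8. */
uint32_t bytes_for_bits(uint32_t bits) {
    return (bits + 7u) / 8u;   /* integer division truncates, so add 7 first */
}
```

For example, bytes_for_bits(20) evaluates to 3, matching the 20-bit row of Table 3.1.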
Decimal digits are used to specify precision of measurement systems that display results as numerical values, as defined in Table 3.2. A full decimal digit can be any value 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. A digit that can be either 0 or 1 is defined as a ½ decimal digit. The terminology of a ½ decimal digit did not arise from a mathematical perspective of precision, but rather from the physical width of the LED/LCD module used to display a blank or ‘1’ as compared to the width of a full digit. Notice in Figure 3.1 that the 7-segment modules capable of displaying 0 to 9 are about 1 cm wide; however, the corresponding 2-segment modules capable of being blank or displaying a 1 are about half as wide. Similarly, we define a digit that can be + or − also as a ½ decimal digit, because it has two choices. A digit that can be 0, 1, 2, 3 is defined as a ¾ decimal digit, because it is wider than a ½ digit but narrower than a full digit. We also define a digit that can be −1, −0, +0, or +1 as a ¾ decimal digit, because it also has 4 choices. We use the expression 4½ decimal digits to mean 20,000 alternatives and the expression 4¾ decimal digits to mean 40,000 alternatives. The use of a ½ decimal digit to mean twice the number of alternatives or one additional binary bit is widely accepted. On the other hand, the use of a ¾ decimal digit to mean four times the number of alternatives or two additional binary bits is not as commonly accepted. For example, consider the two ohmmeters shown in Figure 3.1. As illustrated in the figure, both are set to the 0 to 200 kΩ range. The 3½ digit ohmmeter has a resolution of 0.1 kΩ with measurements ranging from 0.0 to 199.9 kΩ. On the other hand, the 4½ digit ohmmeter has a resolution of 0.01 kΩ with measurements ranging from 0.00 to 199.99 kΩ.
Table 3.2 Definition of decimal digits as a unit of precision.
Decimal Digits   Alternatives
3                1000
3½               2000
3¾               4000
4                10000
4½               20000
4¾               40000
n                10^n
n½               2•10^n
n¾               4•10^n
Observation: A good rule of thumb to remember is 2^(10•n) ≈ 10^(3•n).
Figure 3.1 Two ohmmeters: the one on the left has 3½ decimal digits and the one on the right has 4½.
Checkpoint 3.2: How many binary bits is equivalent to 3½ decimal digits?
Checkpoint 3.3: About how many decimal digits is 64 binary bits? You can answer this without a calculator, just using the “rule of thumb”.
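The relationship between alternatives and binary bits can also be checked numerically. The sketch below is one possible approach, with the hypothetical function name bits_for_alternatives: it finds the smallest n for which 2^n is at least the requested number of alternatives by doubling a running capacity.

```c
#include <stdint.h>

/* Hypothetical helper: smallest number of binary bits n such that
   2^n >= alternatives. The n < 64 guard stops the loop before the
   64-bit capacity variable can wrap around. */
unsigned bits_for_alternatives(uint64_t alternatives) {
    unsigned n = 0;
    uint64_t capacity = 1;              /* 2^0 alternatives with zero bits */
    while (n < 64 && capacity < alternatives) {
        capacity <<= 1;                 /* one more bit doubles the alternatives */
        n++;
    }
    return n;
}
```

For instance, 4½ decimal digits means 20,000 alternatives, and bits_for_alternatives(20000) evaluates to 15 because 2^14 is 16,384 and 2^15 is 32,768.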
A great deal of confusion exists over the abbreviations we use for large numbers. In 1998 the International Electrotechnical Commission (IEC) defined a new set of abbreviations for the powers of 2, as shown in Table 3.3. These new terms are endorsed by the Institute of Electrical and Electronics Engineers (IEEE) and International Committee for Weights and Measures (CIPM) in situations where the use of a binary prefix is appropriate. The confusion arises over the fact that the mainstream computer industry, such as Microsoft, Apple, and Dell, continues to use the old terminology. According to the companies that market to consumers, 1 GHz is 1,000,000,000 Hz but 1 Gbyte of memory is 1,073,741,824 bytes. The correct terminology is to use the SI-decimal abbreviations to represent powers of 10, and the IEC-binary abbreviations to represent powers of 2. The scientific meaning of 2 kilovolts is 2000 volts, but 2 kibibytes is the proper way to specify 2048 bytes. The term kibibyte is a contraction of kilo binary byte and is a unit of information or computer storage, abbreviated KiB.
1 KiB = 2^10 bytes = 1024 bytes
1 MiB = 2^20 bytes = 1,048,576 bytes
1 GiB = 2^30 bytes = 1,073,741,824 bytes
These abbreviations can also be used to specify the number of binary bits. The term kibibit is a contraction of kilo binary bit, and is a unit of information or computer storage, abbreviated Kibit.
1 Kibit = 2^10 bits = 1024 bits
1 Mibit = 2^20 bits = 1,048,576 bits
1 Gibit = 2^30 bits = 1,073,741,824 bits
A mebibyte (1 MiB is 1,048,576 bytes) is approximately equal to a megabyte (1 MB is 1,000,000 bytes), but mistaking the two has nonetheless led to confusion and even legal disputes. In the engineering community, it is appropriate to use terms that have a clear and unambiguous meaning.
Checkpoint 3.4: A 2 tebibyte storage system can store how many bytes?
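To make the difference between the two families of prefixes concrete, the multipliers can be computed directly. This is an illustrative sketch; the function names iec_multiplier and si_multiplier are assumptions, and the argument is the exponent on 1024 or 1000 (1 for Ki or k, 2 for Mi or M, and so on). With 64-bit arithmetic it is valid for exponents up to 6.

```c
#include <stdint.h>

/* Hypothetical helpers. power = 1 for Ki/k, 2 for Mi/M, 3 for Gi/G, ...
   Both are valid for power <= 6 with 64-bit unsigned arithmetic. */
uint64_t iec_multiplier(unsigned power) {
    return (uint64_t)1 << (10u * power);   /* 1024^power = 2^(10*power) */
}

uint64_t si_multiplier(unsigned power) {
    uint64_t m = 1;
    while (power--) {
        m *= 1000u;                        /* 1000^power */
    }
    return m;
}
```

Comparing iec_multiplier(3) with si_multiplier(3) reproduces the 1,073,741,824 versus 1,000,000,000 discrepancy described above.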
Table 3.3 Common abbreviations for large numbers.
Value     SI   Decimal     Value     IEC   Binary
1000^1    k    kilo-       1024^1    Ki    kibi-
1000^2    M    mega-       1024^2    Mi    mebi-
1000^3    G    giga-       1024^3    Gi    gibi-
1000^4    T    tera-       1024^4    Ti    tebi-
1000^5    P    peta-       1024^5    Pi    pebi-
1000^6    E    exa-        1024^6    Ei    exbi-
1000^7    Z    zetta-      1024^7    Zi    zebi-
1000^8    Y    yotta-      1024^8    Yi    yobi-

3.2 Boolean Information
A Boolean number has two states. The two values represent logical true and false. In Chapter 1, we defined positive logic so that true is a 1 or high, and false is a 0 or low. In C programming, a false is represented by a zero, and a true as any non-zero value. If you were controlling a motor, light, heater, or air conditioner, the Boolean could mean on or off. Figure 3.2 shows the simulation using TExaS of a simple switch connected to PC0 that has two states and a LED that can be on or off. PB0 is a digital output of the microcomputer, which can be either high or low. The output of the LED driver is low or HiZ (shown as z in Figure 3.2). In communication systems, we represent the information as a sequence of Booleans: mark or space. For black or white graphic displays we use Booleans to specify the state of each pixel. The most efficient storage of Booleans on a computer is to map each Boolean into one memory bit. In this way, we can pack eight Booleans into each byte. If we have just one Boolean to store in memory, out of convenience we allocate an entire byte for it. A common positive logic definition for Boolean information is: False is defined as all zeros, and True is defined as any nonzero value.
Figure 3.2 External to the microcomputer, Boolean information is encoded as voltage (0 or 5 V), position of a switch (off, on), and the presence of light (dark, light).
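Packing eight Booleans into one byte is done with bit masks. The following sketch shows one common way to set, clear, and test an individual Boolean; the function names are hypothetical, and bit is assumed to be in the range 0 to 7.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers for eight Booleans packed into one byte.
   bit selects which Boolean (0 to 7). */
uint8_t flag_set(uint8_t flags, unsigned bit) {
    return (uint8_t)(flags | (1u << bit));    /* force the bit to 1 (true) */
}

uint8_t flag_clear(uint8_t flags, unsigned bit) {
    return (uint8_t)(flags & ~(1u << bit));   /* force the bit to 0 (false) */
}

bool flag_test(uint8_t flags, unsigned bit) {
    return (flags & (1u << bit)) != 0;        /* any nonzero result means true */
}
```

The test function follows the positive logic definition given above: a zero result is false, and any nonzero result is true.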
Checkpoint 3.5: Give an example of a switch that is not binary.
In negative logic, the absence of a voltage is the true or asserted state. The presence of a voltage is called the false or not asserted state. In other words, the 0 V or low voltage means true, and the 5 V or high voltage means false. RS232 serial communication uses a negative logic encoding where −12 V means true, and +12 V means false. More about serial interfacing can be found in Chapters 8 and 12.
3.3 8-bit Numbers
We saw 8-bit and 16-bit numbers in Chapter 2, but more formal definitions will be presented in the next few sections. A byte contains 8 bits as shown in Figure 3.3, where each bit b7, . . . , b0 is binary and has the value 1 or 0. We specify b7 as the most significant bit or MSB, and b0 as the least significant bit or LSB.
Figure 3.3 8-bit binary format: b7 b6 b5 b4 b3 b2 b1 b0
If a byte is used to represent an unsigned number, then the value of the number is
N = 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Notice that the significance of bit n is 2^n. There are 256 different unsigned 8-bit numbers. The smallest unsigned 8-bit number is 0 and the largest is 255. For example, %00001010 is 8+2 or 10. Other examples are shown in Table 3.4. The least significant bit can tell us if the number is even or odd.
Table 3.4 Example conversions from unsigned 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation              Decimal
%00000000   $00                            0
%01000001   $41   64+1                     65
%00010110   $16   16+4+2                   22
%10000111   $87   128+4+2+1                135
%11111111   $FF   128+64+32+16+8+4+2+1     255
Checkpoint 3.6: Convert the binary number %01101010 to unsigned decimal. Checkpoint 3.7: Convert the hex number $45 to unsigned decimal.
The basis of a number system is a subset from which linear combinations of the basis elements can be used to construct the entire set. The basis represents the “places” in a “place-value” system. For positive integers, the basis is the infinite set {1, 10, 100, . . .}, and the “values” can range from 0 to 9. Each positive integer has a unique set of values such that the dot-product of the value vector times the basis vector yields that number. For example, 2345 is ( . . . , 2,3,4,5)•(. . . , 1000,100,10,1), which is 2*1000+3*100+4*10+5. For the unsigned 8-bit number system, the basis is
{1, 2, 4, 8, 16, 32, 64, 128}
The values of a binary number system can only be 0 or 1. Even so, each 8-bit unsigned integer has a unique set of values such that the dot-product of the values times the basis yields that number. For example, 69 is (0,1,0,0,0,1,0,1)•(128,64,32,16,8,4,2,1), which equals 0*128+1*64+0*32+0*16+0*8+1*4+0*2+1*1. Conveniently, there is no other set of 0’s and 1’s such that the set of values multiplied by the basis is 69.
One way for us to convert a decimal number into binary is to use the basis elements. The overall approach is to start with the largest basis element and work towards the smallest. More precisely, we start with the most significant bit and work towards the least significant bit. One by one, we ask ourselves whether or not we need that basis element to create our number. If we do, then we set the corresponding bit in our binary result and subtract the basis element from our number. If we do not need it, then we clear the corresponding bit in our binary result. We will work through the algorithm with the example of converting 100 to 8-bit binary, see Table 3.5. We start with the largest basis element (in this case 128) and ask whether or not we need to include it to make 100. Since our number is less than 128, we do not need it, so bit 7 is zero. We go to the next largest basis element, 64, and ask, “do we need it?” We do need 64 to generate our 100, so bit 6 is one and we subtract 100 minus 64 to get 36. Next, we go to the next basis element, 32, and ask, “do we need it?” Again, we do need 32 to generate our 36, so bit 5 is one and we subtract 36 minus 32 to get 4. Continuing along, we do not need basis elements 16 or 8, but we do need basis element 4. Once we subtract the 4, our working result is zero, so basis elements 2 and 1 are not needed. Putting it together, we get %01100100 (which means 64+32+4).
Table 3.5 Example conversion from decimal to unsigned 8-bit binary to hexadecimal.
Number   Basis   Need It?   Bit       Operation
100      128     no         bit 7=0   none
100      64      yes        bit 6=1   subtract 100-64
36       32      yes        bit 5=1   subtract 36-32
4        16      no         bit 4=0   none
4        8       no         bit 3=0   none
4        4       yes        bit 2=1   subtract 4-4
0        2       no         bit 1=0   none
0        1       no         bit 0=0   none
Checkpoint 3.8: In this conversion algorithm, how can we tell if a basis element is needed?
Observation: If the least significant binary bit is zero, then the number is even.
Observation: If the right-most n bits (least significant) are zero, then the number is divisible by 2^n.
Observation: Bit 7 of an 8-bit number determines whether its value is greater than or equal to 128.
Checkpoint 3.9: Give the representations of the decimal 45 in 8-bit binary and hexadecimal.
Checkpoint 3.10: Give the representations of the decimal 200 in 8-bit binary and hexadecimal.
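The basis-element conversion worked through in Table 3.5 can be sketched in C. This is one possible rendering of the algorithm, not code from the text: the function name to_binary8 is hypothetical, and the result is returned as a string of eight characters in a static buffer so the bit pattern can be inspected directly.

```c
#include <stdint.h>

/* Hypothetical helper: convert an unsigned 8-bit number to eight '0'/'1'
   characters by visiting the basis elements 128, 64, ..., 1 in order.
   Returns a static buffer that is overwritten by the next call. */
const char *to_binary8(uint8_t number) {
    static char bits[9];
    uint8_t basis = 128;                 /* start with the largest basis element */
    for (int i = 0; i < 8; i++) {
        if (number >= basis) {           /* this basis element is needed */
            bits[i] = '1';
            number = (uint8_t)(number - basis);
        } else {                         /* not needed, clear the bit */
            bits[i] = '0';
        }
        basis >>= 1;                     /* move to the next smaller element */
    }
    bits[8] = '\0';
    return bits;
}
```

Calling to_binary8(100) yields the string 01100100, the same pattern derived step by step in Table 3.5.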
One of the first schemes to represent signed numbers was called one’s complement. It was called one’s complement because to negate a number, we complement (logical not) each bit. For example, if 25 equals 00011001 in binary, then −25 is 11100110. An 8-bit one’s complement number can vary from −127 to +127. The most significant bit is a sign bit, which is 1 if and only if the number is negative. The difficulty with this format is that there are two zeros: +0 is 00000000, and −0 is 11111111. Another problem is that one’s complement numbers do not have basis elements. These limitations led to the use of two’s complement.
The two’s complement number system is the most common approach used to define signed numbers. It is called two’s complement because to negate a number, we complement each bit (like one’s complement), then add 1. For example, if 25 equals 00011001 in binary, then −25 is 11100111. If a byte is used to represent a signed two’s complement number, then the value of the number is
N = −128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Observation: One usually means two’s complement when one refers to signed integers.
There are 256 different signed 8-bit numbers. The smallest signed 8-bit number is −128 and the largest is 127. For example, %10000010 equals −128+2 or −126. Other examples are shown in Table 3.6.
Checkpoint 3.11: Convert the signed binary number %11101010 to signed decimal.
Checkpoint 3.12: Are the signed and unsigned decimal representations of the 8-bit hex number $45 the same or different?
For the signed 8-bit number system the basis is
{1, 2, 4, 8, 16, 32, 64, −128}
Observation: The most significant bit in a two’s complement signed number will specify the sign.
Table 3.6 Example conversions from signed 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation               Decimal
%00000000   $00                             0
%01000001   $41   64+1                      65
%00010110   $16   16+4+2                    22
%10000111   $87   −128+4+2+1                −121
%11111111   $FF   −128+64+32+16+8+4+2+1     −1
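The signed interpretation used in Table 3.6 differs from the unsigned one only in the weight of bit 7. A minimal sketch follows, with the hypothetical function name signed8_value: the lower seven bits are summed with their usual positive weights, and 128 is subtracted when bit 7 is set.

```c
#include <stdint.h>

/* Hypothetical helper: interpret an 8-bit pattern as a two's complement
   number using the signed basis {1, 2, 4, 8, 16, 32, 64, -128}. */
int signed8_value(uint8_t pattern) {
    int value = pattern & 0x7F;      /* 64*b6 + 32*b5 + ... + b0 */
    if (pattern & 0x80) {
        value -= 128;                /* bit 7 carries a weight of -128 */
    }
    return value;
}
```

For example, signed8_value(0x87) evaluates to -121 and signed8_value(0xFF) evaluates to -1, in agreement with the last two rows of the table.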
Notice that the same binary pattern of %11111111 could represent either 255 or −1. It is very important for the software developer to keep track of the number format. The computer cannot determine whether the 8-bit number is signed or unsigned. You, as the programmer, will determine whether the number is signed or unsigned by the specific assembly instructions you select to operate on the number. Some operations like addition, subtraction, and shift left (multiply by 2) use the same hardware (instructions) for both unsigned and signed operations. On the other hand, multiply, divide, and shift right (divide by 2) require separate hardware (instructions) for unsigned and signed operations. For example, the multiply instruction, mul, operates on unsigned values. Software that employs the mul instruction implements unsigned arithmetic. There is also a signed multiply instruction, smul, and if you use it, you are implementing signed arithmetic.
Similar to the unsigned algorithm, we can use the basis to convert a decimal number into signed binary. We will work through the algorithm with the example of converting −100 to 8-bit binary, as shown in Table 3.7. We start with the most significant bit (in this case −128) and decide whether we need to include it to make −100. Yes (without −128, we would be unable to add the other basis elements together to get any negative result), so we set bit 7 and subtract the basis element from our value. Our new value equals −100 minus −128, which is 28. We go to the next largest basis element, 64, and ask, “do we need it?” We do not need 64 to generate our 28, so bit 6 is zero. Next we go to the next basis element, 32, and ask, “do we need it?” We do not need 32 to generate our 28, so bit 5 is zero. Now we need the basis element 16, so we set bit 4, and subtract 16 from our number 28 (28 − 16 = 12). Continuing along, we need basis elements 8 and 4 but not 2 or 1. Putting it together we get %10011100 (which means −128+16+8+4).
Table 3.7 Example conversion from decimal to signed 8-bit binary.
Number   Basis   Need It   Bit       Operation
−100     −128    Yes       bit 7=1   Subtract −100 − (−128)
28       64      No        bit 6=0   None
28       32      No        bit 5=0   None
28       16      Yes       bit 4=1   Subtract 28 − 16
12       8       Yes       bit 3=1   Subtract 12 − 8
4        4       Yes       bit 2=1   Subtract 4 − 4
0        2       No        bit 1=0   None
0        1       No        bit 0=0   None
Observation: To take the negative of a two’s complement signed number we first complement (flip) all the bits, then add 1.
A second way to convert negative numbers into binary is to first convert the corresponding positive number into unsigned binary, then do a two’s complement negate. For example, we earlier found that +100 is %01100100. The two’s complement negate is a two-step process. First we do a logic complement (flip all bits) to get %10011011. Then we add one to the result to get %10011100.
A third way to convert negative numbers into binary is to first add 256 to the number, then convert the unsigned result to binary using the unsigned method. For example, to find −100, we add 256 plus −100 to get 156. Then we convert 156 to binary, resulting in %10011100. This method works because in 8-bit binary math adding 256 to a number does not change the value. E.g., 256−100 has the same 8-bit binary value as −100.
Checkpoint 3.13: Give the representations of −45 in 8-bit binary and hexadecimal.
Checkpoint 3.14: Why can’t you represent the number 200 using 8-bit signed binary?
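The three ways of finding the 8-bit pattern for a negative number can be compared side by side. The sketch below uses hypothetical function names and assumes the input lies in the range the comment states; it is meant only to show that the signed basis method, complement-and-add-one, and adding 256 all arrive at the same pattern.

```c
#include <stdint.h>

/* Method 1 (signed basis): value must be in the range -128 to 127. */
uint8_t encode_by_basis(int value) {
    uint8_t result = 0;
    if (value < 0) {                 /* the -128 basis element is needed */
        result |= 0x80;
        value += 128;                /* subtracting -128 is adding 128 */
    }
    for (int basis = 64; basis >= 1; basis /= 2) {
        if (value >= basis) {        /* need this positive basis element */
            result |= (uint8_t)basis;
            value -= basis;
        }
    }
    return result;
}

/* Method 2: two's complement negate of an unsigned pattern. */
uint8_t negate_twos(uint8_t x) {
    return (uint8_t)(~x + 1u);       /* flip all bits, then add 1 */
}

/* Method 3: add 256, then keep the unsigned 8-bit result.
   value must be in the range -128 to -1. */
uint8_t encode_by_adding_256(int value) {
    return (uint8_t)(value + 256);
}
```

For −100 all three give $9C: encode_by_basis(-100), negate_twos(0x64), and encode_by_adding_256(-100) each evaluate to 0x9C, which is %10011100.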
Sign-magnitude representation dedicates one bit as the sign, leaving the remaining bits to specify the magnitude of the number. If b7 is 1 then the number is negative, otherwise the number is positive.
N = (−1)^b7 • (64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0)
Unfortunately, there is no basis set for the sign-magnitude number system. For example, %10000010 equals −1•2 or −2. Other examples are shown in Table 3.8.
Table 3.8 Example conversions from sign-magnitude 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation                Decimal
%00000000   $00                              0
%01000001   $41   64+1                       65
%00010110   $16   16+4+2                     22
%10000111   $87   −1•(4+2+1)                 −7
%11111111   $FF   −1•(64+32+16+8+4+2+1)      −127
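For comparison with two’s complement, the sign-magnitude interpretation of Table 3.8 can be sketched as follows. The function name sign_magnitude_value is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: interpret an 8-bit pattern as sign-magnitude.
   Bit 7 is the sign, bits 6-0 are the magnitude. */
int sign_magnitude_value(uint8_t pattern) {
    int magnitude = pattern & 0x7F;              /* 64*b6 + ... + b0 */
    return (pattern & 0x80) ? -magnitude : magnitude;
}
```

Here sign_magnitude_value(0x87) evaluates to -7, whereas the same pattern is −121 in two’s complement and 135 as an unsigned number.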
Another problem with sign-magnitude is that there are two representations of the number 0: “00000000” and “10000000”. But, the biggest advantage of two’s complement signed numbers over sign-magnitude is that the same addition and subtraction hardware (e.g., the adda, suba instructions) can be used for both signed and unsigned numbers. We also can use the same hardware for shift left (e.g., asla is the same instruction as lsla). Although the hardware for these three operations works for both signed and unsigned numbers, the overflow (error) conditions are distinct. The C bit in the condition code register (CCR) signifies unsigned overflow, and the V bit in the CCR means a signed overflow has occurred. Unfortunately, we must use separate signed and unsigned operations for multiply, divide, and shift right. Common Error: An error will occur if you use signed operations on unsigned numbers, or use unsigned operations on signed numbers. Maintenance Tip: To improve the clarity of our software, always specify the format of your data (signed versus unsigned) when defining or accessing the data.
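The distinction between unsigned overflow (C bit) and signed overflow (V bit) for 8-bit addition can be modeled in C. This is a sketch of the usual rules for addition only, not a description of the 9S12 hardware; the type add8_result and the function add8 are hypothetical names. Carry is set when the true sum exceeds 255, and signed overflow occurs when both inputs have the same sign but the result has the opposite sign.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of an 8-bit add and its two error flags. */
typedef struct {
    uint8_t sum;       /* 8-bit result, same bits for signed and unsigned */
    bool carry;        /* models the C bit: unsigned overflow */
    bool overflow;     /* models the V bit: signed overflow */
} add8_result;

add8_result add8(uint8_t a, uint8_t b) {
    add8_result r;
    unsigned wide = (unsigned)a + (unsigned)b;   /* true sum, 0 to 510 */
    r.sum = (uint8_t)wide;
    r.carry = wide > 0xFF;                       /* did not fit in 8 bits */
    /* inputs agree in sign (bit 7) but the result's sign differs */
    r.overflow = ((~(a ^ b)) & (a ^ r.sum) & 0x80) != 0;
    return r;
}
```

Adding 100 and 100 gives the pattern $C8 with carry clear and overflow set: the unsigned answer 200 is correct, but the signed answer would be −56. Adding 200 and 100 gives $2C with carry set and overflow clear: the unsigned answer is wrong, while the signed interpretation (−56 plus 100 equals 44) is correct.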
When communicating with humans (input or output), computers need to store information in an easy-to-read decimal format. One such format is binary coded decimal or BCD. The 8-bit BCD format contains two decimal digits, and each decimal digit is encoded in 4-bit binary. For example, the number 72 is stored as $72 or %01110010. We can represent numbers from 0 to 99 using 8-bit BCD. Checkpoint 3.15: What binary values are used to store the number 25 in 8-bit BCD format?
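Conversion between ordinary binary and 8-bit BCD uses division and remainder by 10. The following is a minimal sketch with the hypothetical names to_bcd and from_bcd; it assumes the binary input is in the range 0 to 99 and that each BCD nibble holds a valid digit.

```c
#include <stdint.h>

/* Hypothetical helper: binary 0 to 99 into 8-bit BCD.
   Tens digit goes in the upper 4 bits, ones digit in the lower 4 bits. */
uint8_t to_bcd(uint8_t value) {
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Hypothetical helper: 8-bit BCD back into binary. */
uint8_t from_bcd(uint8_t bcd) {
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}
```

With these definitions to_bcd(72) evaluates to 0x72, the example given above, and from_bcd(0x72) returns 72.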
3.4 16-bit Numbers
A word or double byte contains 16 bits, where each bit b15, . . . , b0 is binary and has the value 1 or 0, as shown in Figure 3.4.
Figure 3.4 16-bit binary format: b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0
If a word is used to represent an unsigned number, then the value of the number is
N = 32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
There are 65536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0 and the largest is 65535. For example, %0010000110000100 or $2184 is 8192+256+128+4 or 8580. Other examples are shown in Table 3.9.
Table 3.9 Example conversions from unsigned 16-bit binary to hexadecimal and to decimal.
Binary              Hex     Calculation                                                         Decimal
%0000000000000000   $0000                                                                       0
%0000010000000001   $0401   1024+1                                                              1025
%0000110010100000   $0CA0   2048+1024+128+32                                                    3232
%1000111000000010   $8E02   32768+2048+1024+512+2                                               36354
%1111111111111111   $FFFF   32768+16384+8192+4096+2048+1024+512+256+128+64+32+16+8+4+2+1        65535
Checkpoint 3.16: Convert the 16-bit binary number %0010000001101010 to unsigned decimal. Checkpoint 3.17: Convert the 16-bit hex number $1234 to unsigned decimal.
For the unsigned 16-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768} Checkpoint 3.18: Convert the unsigned decimal number 1234 to 16-bit hexadecimal. Checkpoint 3.19: Convert the unsigned decimal number 10000 to 16-bit binary.
There are also 65536 different signed 16-bit numbers. The smallest two’s complement signed 16-bit number is -32768 and the largest is 32767. For example, %1101000000000100 or $D004 is -32768+16384+4096+4 or -12284. Other examples are shown in Table 3.10.
Binary              Hex     Calculation                                                        Decimal
%0000000000000000   $0000                                                                      0
%0000010000000001   $0401   1024+1                                                             1025
%0000110010100000   $0CA0   2048+1024+128+32                                                   3232
%1000010000000010   $8402   -32768+1024+2                                                      -31742
%1111111111111111   $FFFF   -32768+16384+8192+4096+2048+1024+512+256+128+64+32+16+8+4+2+1      -1
Table 3.10 Example conversions from signed 16-bit binary to hexadecimal and to decimal.
If a word is used to represent a signed two’s complement number, then the value of the number is
N = -32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Checkpoint 3.20: Convert the 16-bit hex number $1234 to signed decimal.
Checkpoint 3.21: Convert the 16-bit hex number $ABCD to signed decimal.
For the signed 16-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768}
Common Error: An error will occur if you use 16-bit operations on 8-bit numbers, or use 8-bit operations on 16-bit numbers.
Maintenance Tip: To improve the clarity of your software, always specify the precision of your data when defining or accessing the data.
Checkpoint 3.22: Convert the signed decimal number 1234 to 16-bit hexadecimal.
Checkpoint 3.23: Convert the signed decimal number -10000 to 16-bit binary.
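The two 16-bit value formulas differ only in the weight of b15. The following C sketch makes that explicit by summing the basis elements for each bit that is set. The function names are hypothetical, and the code is meant as an illustration of the formulas rather than an efficient conversion.

```c
#include <stdint.h>

/* Value of a 16-bit pattern interpreted as unsigned:
   every bit i contributes +2^i. */
uint32_t unsigned16_value(uint16_t bits) {
    uint32_t n = 0;
    for (int i = 0; i < 16; i++) {
        if (bits & (1u << i)) {
            n += (uint32_t)1 << i;
        }
    }
    return n;
}

/* Value of the same pattern interpreted as two's complement:
   bits 0..14 contribute +2^i, bit 15 contributes -32768. */
int32_t signed16_value(uint16_t bits) {
    int32_t n = 0;
    for (int i = 0; i < 15; i++) {
        if (bits & (1u << i)) {
            n += (int32_t)1 << i;
        }
    }
    if (bits & 0x8000u) {
        n -= 32768;
    }
    return n;
}
```

For example, unsigned16_value(0x2184) gives 8580 and signed16_value(0xD004) gives -12284, the two worked examples from this section.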
3.5 Extended Precision Numbers
Consider an unsigned number with n bits, where each bit \(b_{n-1}, \ldots, b_0\) is binary and has the value 1 or 0. If an n-bit number is used to represent an unsigned integer, then the value of the number is
\[ N = 2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = \sum_{i=0}^{n-1} 2^i b_i \]
There are \(2^n\) different unsigned n-bit numbers. The smallest unsigned n-bit number is 0 and the largest is \(2^n - 1\). For the unsigned n-bit number system, the basis is
\[ \{1, 2, 4, \ldots, 2^{n-2}, 2^{n-1}\} \]
If an n-bit binary number is used to represent a signed two’s complement number, then the value of the number is
\[ N = -2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = -2^{n-1} b_{n-1} + \sum_{i=0}^{n-2} 2^i b_i \]
There are also \(2^n\) different signed n-bit numbers. The smallest signed n-bit number is \(-2^{n-1}\) and the largest is \(2^{n-1} - 1\). For the signed n-bit number system, the basis is
\[ \{1, 2, 4, \ldots, 2^{n-2}, -2^{n-1}\} \]
Maintenance Tip: When programming in C, we will use the data types char, short, and long when we wish to explicitly specify the precision as 8-bit, 16-bit, or 32-bit. In contrast, we will use the int data type only when we don’t care about precision, and we wish the compiler to choose the most efficient way to perform the operation.
Observation: When programming in assembly, we will always explicitly specify the precision of our numbers and calculations.
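The range limits for n-bit numbers can be checked with a short C sketch. The function names are hypothetical, and the code assumes n is between 1 and 32 so that every result fits in a 64-bit integer.

```c
#include <stdint.h>

/* Largest unsigned n-bit number: 2^n - 1 (assumes 1 <= n <= 32). */
int64_t unsigned_max(int n) {
    return ((int64_t)1 << n) - 1;
}

/* Smallest signed n-bit two's complement number: -2^(n-1). */
int64_t signed_min(int n) {
    return -((int64_t)1 << (n - 1));
}

/* Largest signed n-bit two's complement number: 2^(n-1) - 1. */
int64_t signed_max(int n) {
    return ((int64_t)1 << (n - 1)) - 1;
}
```

For n = 16 these functions reproduce the limits given in Section 3.4: 65535, -32768, and 32767.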
The binary coded decimal or BCD format is convenient for storing data that has just been input or is just about to be output. Each byte of a BCD number contains two decimal digits, and each decimal digit is encoded in four-bit binary. For example, the number 1,234,567 is stored in four bytes as $01234567. If m is the number of bytes, then the numbers from 0 to 100^m - 1 can be stored.
Checkpoint 3.24: What hexadecimal values are used to store the number 3456 in 16-bit BCD format?
3.6 Logical Operations
Software uses logical operations to combine information, to extract information and to test information. A unary operation produces its result given a single input parameter. For example, negate, increment, and decrement are unary operations.
In discrete digital logic, the complement operation is called a NOT gate, as shown in Figure 3.5. The complement function is defined in Table 3.11. CMOS refers to complementary metal oxide semiconductor. The “HC” in 74HC04 stands for high-speed CMOS. Most microcomputers, including the 9S12, are made with high-speed CMOS logic. As we saw in Chapter 1, CMOS circuits are built with p-type and n-type transistors. There are just a few rules one needs to know for understanding how CMOS transistor-level circuits work. Each transistor acts like a switch between its source and drain pins. In general, current can flow from source to drain across an active p-type transistor, and no current will flow if the switch is open. To a first approximation, we can assume no current flows into or out of the gate. For a p-type transistor, the switch will be closed (transistor active) if its gate is low. A p-type transistor will be off (its switch is open) if its gate is high. The gate on the n-type works in a complementary fashion, hence the name complementary metal oxide semiconductor. For an n-type transistor, the switch will be closed (transistor active) if its gate is high. An n-type transistor will be off (its switch is open) if its gate is low.
Therefore, consider the two possibilities for the circuit in Figure 3.5. If A is high (5 V), then the p-type is off and the n-type is active. The closed switch across the source-drain of the n-type will make the output low (0 V). Conversely, if A is low (0 V), then the p-type is active and the n-type is off. The closed switch across the source-drain of the p-type will make the output high (5 V). The 9S12 performs the complement in a bit-wise fashion. For example, the calculation r = ~n means each bit is calculated separately, r7 = ~n7, r6 = ~n6, . . . , r0 = ~n0.
Figure 3.5 Logical NOT operation can be implemented with discrete transistors or digital gates (74HC04).
A       p-type   n-type   ~A
0 V     active   off      +5V
+5V     off      active   0V
Table 3.11 Logical complement.
A   ~A
0   1
1   0
A binary operation produces a single result given two inputs. The logical AND (&) operation yields a true result if both input parameters are true. The logical OR (|) operation yields a true result if either input parameter is true. The exclusive OR (^) operation yields a true result if exactly one input parameter is true. The logical operators are summarized in Table 3.12 and shown as digital gates in Figure 3.6.
Table 3.12 Logical operations.
A   B   A&B   A|B   A^B
0   0   0     0     0
0   1   0     1     1
1   0   0     1     1
1   1   1     1     0
Figure 3.6 Logical operations can be implemented with discrete transistors or digital gates. Panels: AND Gate (74HC08), OR Gate (74HC32), EOR Gate (74HC86).
We can understand the operation of the AND gate by observing the behavior of its six transistors. If both A and B are high, both T3 and T4 will be active. Furthermore, if A and B are both high, T1 and T2 will be off. In this case, the signal labeled ~(A&B) will be low because the T3,T4 switch combination will short this signal to ground. If A is low, T1 will be active and T3 off. Similarly, if B is low, T2 will be active and T4 off. Therefore, if either A is low or B is low, the signal labeled ~(A&B) will be high because one or both of the T1,T2 switches will short this signal to 5 V. Transistors T5 and T6 create a logical complement, converting the signal ~(A&B) into the desired result of A&B.
We can understand the operation of the OR gate by observing the behavior of its six transistors. If both A and B are low, both T1 and T2 will be active. Furthermore, if A and B are both low, T3 and T4 will be off. In this case, the signal labeled ~(A|B) will be high because the T1,T2 switch combination will short this signal to 5 V. If A is high, T3 will be active and T1 off. Similarly, if B is high, T4 will be active and T2 off. Therefore, if either A is high or B is high, the signal labeled ~(A|B) will be low because one or both of the T3,T4 switches will short this signal to ground. Transistors T5 and T6 create a logical complement, converting the signal ~(A|B) into the desired result of A|B.
Checkpoint 3.25: Using just the 74HC gates shown in Figures 3.5 and 3.6, design an equals circuit, such that the output is 1 if and only if input A equals input B. There will be two input signals and one output signal.
Most 8-bit logical instructions take two inputs, one from a register and the other from memory. The 9S12 performs these operations in a bit-wise fashion on two 8-bit parameters yielding an 8-bit result. For example, the calculation r = m&n means each bit is calculated separately, r7 = m7&n7, r6 = m6&n6, . . . , r0 = m0&n0. All but the bita and bitb instructions put the result back in the register. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.
     anda #w   ;RegA=RegA&w                 Logical and RegA with a constant
     anda U    ;RegA=RegA&[U]               Logical and RegA with a memory value
     andb #w   ;RegB=RegB&w                 Logical and RegB with a constant
     andb U    ;RegB=RegB&[U]               Logical and RegB with a memory value
     bita #w   ;RegA&w                      Logical and RegA with a constant
     bita U    ;RegA&[U]                    Logical and RegA with a memory value
     bitb #w   ;RegB&w                      Logical and RegB with a constant
     bitb U    ;RegB&[U]                    Logical and RegB with a memory value
     coma      ;RegA=$FF-RegA, RegA=~RegA   Complement RegA
     comb      ;RegB=$FF-RegB, RegB=~RegB   Complement RegB
     eora #w   ;RegA=RegA ^ w               Exclusive or RegA with a constant
     eora U    ;RegA=RegA ^ [U]             Exclusive or RegA with a memory value
     eorb #w   ;RegB=RegB ^ w               Exclusive or RegB with a constant
     eorb U    ;RegB=RegB ^ [U]             Exclusive or RegB with a memory value
     oraa #w   ;RegA=RegA | w               Logical or RegA with a constant
     oraa U    ;RegA=RegA | [U]             Logical or RegA with a memory value
     orab #w   ;RegB=RegB | w               Logical or RegB with a constant
     orab U    ;RegB=RegB | [U]             Logical or RegB with a memory value
Condition code bits are set, where R is the result of the operation.
N: result is negative  N = R7
Z: result is zero      Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow     V = 0
C: unchanged
Example 3.1 Write software to set bit 4 and clear bits 1 and 0 of an 8-bit variable N.
Solution We use an 8-bit register because we wish to operate on 8-bit data. We “or with 1” to set bits and we “and with 0” to clear bits. The logical function N = $FC&(N|$10) performs the desired effect. Immediate mode addressing is used when operating on fixed constants.
     ldaa N      ;RegA = N
     oraa #$10   ;RegA = N|$10 (set bit 4)
     anda #$FC   ;RegA = $FC&(N|$10) (clears bits 1,0)
     staa N      ;store the result back to N
To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits of variable N. The ldaa instruction brings these values into Register A. The oraa instruction sets bit 4, the anda instruction clears bits 1,0, and the staa instruction stores the result back to N.
b7 b6 b5 b4 b3 b2 b1 b0   value of N
0  0  0  1  0  0  0  0    $10 constant
b7 b6 b5 1  b3 b2 b1 b0   result of the oraa instruction
1  1  1  1  1  1  0  0    $FC constant
b7 b6 b5 1  b3 b2 0  0    result of the anda instruction
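The same set-and-clear function from Example 3.1 can be written as a single C expression. This is an illustrative sketch; the function name set4_clear10 is hypothetical, and the variable is passed by value instead of being a global as in the assembly version.

```c
#include <stdint.h>

/* Set bit 4 and clear bits 1 and 0 of an 8-bit value:
   "or with 1" sets bits, "and with 0" clears bits. */
uint8_t set4_clear10(uint8_t n) {
    return (uint8_t)(0xFC & (n | 0x10));
}
```

For instance, an input of 0xFF produces 0xFC and an input of 0x03 produces 0x10; in both cases bits 7, 6, 5, 3, and 2 pass through unchanged.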
Checkpoint 3.26: Write assembly code that implements RegD = RegD&$0F3C.
Checkpoint 3.27: Write assembly code that implements RegX = RegX|$1234.
Checkpoint 3.28: Let N be an 8-bit location. Write assembly code that clears bit 4.
We can use the AND operation to extract, or mask, individual bits from a value.
Example 3.2 Write software that sets a global variable to true if a switch is pressed.
Solution The first step is to interface a switch to an input port of the 9S12. We will use a positive logic interface because we want the digital signal in to be high if and only if the switch is pressed, as shown in Figure 3.7. In particular, PTT bit 0 contains a signal that is high or low depending on the position of the switch. Some switches bounce, which means there will be multiple open/closed cycles when the switch is changed. This simple solution can be used if the switch doesn’t bounce or if the bouncing doesn’t matter. Bit 0 of the Port T direction register should be made zero during the initialization. When the computer reads PTT it gets all 8 bits of the input port. On the other hand, the expression PTT&0x01 will be zero if and only if bit 0 of PTT is zero.
Figure 3.7 Interface of a switch to a microcomputer input (switch to +5V, 10 kΩ resistor, signal in connected to PT0 of the 9S12).
The following C code will set the variable Pressed to true (nonzero) if the switch is pressed.
     Pressed = PTT&0x01;   // true if the switch is pressed
The following 9S12 assembly code uses the anda instruction to perform the same operation.
     ldaa PTT       ;read input Port T
     anda #$01      ;clear all bits except bit 0
     staa Pressed   ;true iff the switch is pressed
To illustrate how the above program works, let a7 a6 a5 a4 a3 a2 a1 a0 be the values of the 8 individual bits in PTT. The ldaa instruction brings these values into Register A. The anda instruction clears all bits except bit 0, and the staa instruction stores the result into the variable called Pressed.
a7 a6 a5 a4 a3 a2 a1 a0   value of PTT
0  0  0  0  0  0  0  1    $01 constant
0  0  0  0  0  0  0  a0   result of the anda instruction
Often we combine many small systems together to make larger systems. Once we debug a small system we would like to have confidence that it will still work when combined with other systems. One difficulty arises when two or more systems share an I/O port (for example, system 1 uses PT1, and system 2 uses PT2). Friendly software modifies just the bits that need to be modified, making it easier to combine with other software. Conversely, an unfriendly solution modifies all 8 bits of a register when it needs to modify fewer than 8 bits.
Example 3.3 Write software that makes PT4 and PT5 outputs and clears both outputs without affecting the other bits of PTT.
Solution This system uses just bits 4 and 5 of PTT, and the other 6 bits are not needed in this problem. If we implement a friendly solution, this system can be combined with other systems that use the other bits of PTT. We begin by setting DDRT bits 4 and 5, so PT4 and PT5 become outputs. Usually, we set the direction register once at the start of our program. Rather than just setting DDRT=0x30 (unfriendly), we perform a read-modify-write so just bits 4 and 5 are affected. The following C code uses the OR operation to set bits 4 and 5 of the register DDRT. The other six bits of DDRT remain constant.
     DDRT |= 0x30;   // set bits 4 and 5, making PT4 and PT5 outputs
The following 9S12 assembly code uses the oraa instruction to perform the same operation.
     ldaa DDRT   ;read previous value of DDRT
     oraa #$30   ;set bits 4 and 5, other 6 bits left unchanged
     staa DDRT   ;update the actual direction register
To illustrate how the above program works, let c7 c6 c5 c4 c3 c2 c1 c0 be the values of the original 8 bits in DDRT. The ldaa instruction brings these values into Register A. The oraa instruction sets bits 4 and 5, and the staa instruction stores the result back to DDRT.
c7 c6 c5 c4 c3 c2 c1 c0   value of DDRT
0  0  1  1  0  0  0  0    $30 constant
c7 c6 1  1  c3 c2 c1 c0   result of the oraa instruction
We use another read-modify-write to clear bits 4 and 5 of PTT. Notice that ~0x30 is 0xCF. This complement is executed at compile-time rather than at run-time.
     PTT &= ~0x30;   // clear bits 4 and 5, PT4 and PT5 become 0
The following 9S12 assembly code uses the anda instruction to clear the two bits of PTT.
     ldaa PTT    ;read previous value of PTT
     anda #$CF   ;clear bits 4 and 5, other 6 bits left unchanged
     staa PTT    ;update the actual PTT register
Maintenance Tip: When interacting with just some of the bits of an I/O register, it is better to modify just the bits of interest, leaving the other bits unchanged. In this way, the action of one piece of software does not undo the action of another piece.
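The difference between friendly and unfriendly access can be sketched with ordinary C functions operating on a byte that stands in for an I/O register. The function names are hypothetical, and a plain parameter is used in place of a real port so the sketch runs on any host; on the 9S12 the same expressions would be applied to PTT or DDRT.

```c
#include <stdint.h>

/* Friendly set: only the bits in mask become 1, others are preserved. */
uint8_t friendly_set(uint8_t reg, uint8_t mask) {
    return (uint8_t)(reg | mask);
}

/* Friendly clear: only the bits in mask become 0, others are preserved. */
uint8_t friendly_clear(uint8_t reg, uint8_t mask) {
    return (uint8_t)(reg & (uint8_t)~mask);
}

/* Unfriendly write: all 8 bits are overwritten, so any bits owned by
   other software modules are lost. */
uint8_t unfriendly_write(uint8_t reg, uint8_t value) {
    (void)reg;   /* previous contents are ignored */
    return value;
}
```

If the register holds 0xB6 and one module clears bits 4 and 5 with friendly_clear, the result is 0x86, so the bits belonging to other modules survive. An unfriendly write of 0x00 would clear all of them.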
These read-or-write and read-and-write sequences are extremely useful in manipulating individual bits within direction registers and output ports. They are so useful, in fact, that the 9S12 has instructions to perform these logical operations. Notice that these two instructions directly affect memory space without using registers, and that the data size is always 8 bits. These instructions have two addressing modes. The first addressing mode determines the memory location to change. For now this first addressing mode will be direct or extended addressing, but later in Chapter 6, we will see that indexed addressing mode also could be used to specify the memory location. The second addressing mode will always be immediate, specifying which bits to modify. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.
     bclr U,#w   ;[U]=[U]&(~w)   Clear bits in memory
     bset U,#w   ;[U]=[U] | w    Set bits in memory
Condition code bits are set, where R is the result of the operation.
N: result is negative  N = R7
Z: result is zero      Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow     V = 0
C: unchanged
Example 3.4 Write software that toggles a PT3 output without affecting the other bits of PTT. Toggle means change: if it is 1, make it 0; if it is 0, make it 1.
Solution The exclusive or operation can be used to toggle bits. The following C code toggles PT3 by inverting bit 3 of PTT, while the other seven bits remain constant. Notice that 0x08 is %00001000 in binary.
     PTT ^= 0x08;   // toggle PT3 from 0 to 1 or from 1 to 0
The following 9S12 assembly code uses the eora instruction to perform the same operation.
     ldaa PTT    ;read output Port T
     eora #$08   ;toggle just bit 3, other 7 bits left unchanged
     staa PTT    ;update the actual output port
To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits in PTT. The ldaa instruction brings these values into Register A. The eora instruction toggles bit 3, and the staa instruction stores the result back to PTT.
b7 b6 b5 b4 b3  b2 b1 b0   value of PTT
0  0  0  0  1   0  0  0    $08 constant
b7 b6 b5 b4 ~b3 b2 b1 b0   result of the eora instruction
Example 3.5 Generate two out-of-phase squarewaves as shown in Figure 3.8.
Solution Out of phase means one signal goes high when the other one goes low. During the initialization, we specify PT1 and PT0 as outputs, then establish the initial values as 0 and 1 respectively. We use the exclusive or operation to toggle both bits at the same time. The infinite loop program will repeat the exclusive or operation over and over, creating the out-of-phase squarewaves on Port T bits 1 and 0. The other six bits of Port T remain unchanged.
     DDRT |= 0x03;            // make PT1 PT0 output
     PTT = (PTT&0xFD)|0x01;   // PT1=0, PT0=1
     while(1){
       PTT ^= 0x03;           // toggle bits 1 and 0
     }
The following assembly code uses logical instructions to perform the function in a friendly manner. The period of the squarewave is determined by the speed of the microcomputer. Figure 3.8 shows the simulated waveforms running on a 1 MHz 9S12.
main bset DDRT,#$03   ;make PT1, PT0 outputs, leaving other bits as is
     bclr PTT,#$02    ;make PT1=0
     bset PTT,#$01    ;make PT0=1, leaving other bits as is
loop ldaa PTT         ;read previous value of Port T
     eora #$03        ;toggle bits 1,0
     staa PTT         ;change PT1 PT0, leaving other bits as is
     bra  loop
Figure 3.8 Scope window showing the execution of Example 3.5.
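The bit pattern produced by Example 3.5 can be traced on a host computer with the sketch below. The function names are hypothetical, and an ordinary byte is passed in place of the PTT register, so this only models the logic and not the timing of the squarewaves.

```c
#include <stdint.h>

/* Initial state from Example 3.5: PT1=0, PT0=1, other six bits unchanged. */
uint8_t squarewave_init(uint8_t ptt) {
    return (uint8_t)((ptt & 0xFD) | 0x01);
}

/* One pass through the loop: toggle bits 1 and 0 together. */
uint8_t squarewave_step(uint8_t ptt) {
    return (uint8_t)(ptt ^ 0x03);
}
```

Starting from a port value of 0xA4, squarewave_init gives 0xA5, and successive calls to squarewave_step alternate between 0xA6 and 0xA5. Bits 1 and 0 are always complements of each other, and the upper six bits never change.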
Other convenient logical operators are summarized in Table 3.13 and shown as digital gates in Figure 3.9. The NAND operation is defined by an AND followed by a NOT. If you compare the transistor-level circuits in Figures 3.6 and 3.9, it would be more precise to say AND is defined as a NAND followed by a NOT. Similarly, the OR operation is a NOR followed by a NOT. The exclusive NOR operation implements the bit-wise equals operation.
Table 3.13 Convenient logical operations.
A   B   NAND   NOR   Exclusive NOR
0   0   1      1     1
0   1   1      0     0
1   0   1      0     0
1   1   0      0     1
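C has no dedicated operators for NAND, NOR, or exclusive NOR, but each can be built from the operators already introduced, following the definitions above. The function names in this sketch are hypothetical.

```c
#include <stdint.h>

/* NAND: an AND followed by a NOT, applied bit-wise to 8-bit values. */
uint8_t nand8(uint8_t a, uint8_t b) {
    return (uint8_t)~(a & b);
}

/* NOR: an OR followed by a NOT. */
uint8_t nor8(uint8_t a, uint8_t b) {
    return (uint8_t)~(a | b);
}

/* Exclusive NOR: each result bit is 1 where the input bits are equal. */
uint8_t xnor8(uint8_t a, uint8_t b) {
    return (uint8_t)~(a ^ b);
}
```

Using a = 0x0C and b = 0x0A places all four rows of Table 3.13 in the low nibble: nand8 returns 0xF7, nor8 returns 0xF1, and xnor8 returns 0xF9.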
Figure 3.9 Other logical operations can also be implemented with discrete logic. Panels: NAND (74HC00), NOR (74HC02), Ex NOR (74HC7266), open collector NOT (74HC05, 7405 or 7406).
The output of an open collector gate, drawn with the ‘x’, has two states: low (0 V) and HiZ (floating). TExaS signifies this floating state with a z, as seen in Figure 3.2. Consider the operation of the transistor-level circuit for the 74HC05. If A is high (5 V), the transistor is active, and the output is low (0 V). If A is low (0 V), the transistor is off, and the output is neither high nor low. In general, we can use an open collector NOT gate to control the current to a device, such as a relay, an LED, a solenoid, a small motor, or a small light. The 74HC05, the 7405, and the 7406 are all open collector NOT gates. 74HC04 is high speed CMOS and can only sink up to 4 mA when its output is low. Since the 7405 and 7406 are transistor-transistor-logic (TTL) they can sink more current. In particular, the 7405 has a maximum output low current (IOL) of 16 mA, whereas the 7406 has a maximum IOL of 40 mA.
Example 3.6 The goal is to develop a means for the microcontroller to turn on and turn off an AC-powered appliance. The interface will use a solid-state relay with control parameters of 2 V and 10 mA. Write the necessary subroutines to operate the system.
Solution The control portion of the solid-state relay (SSR) is an LED, which we interface using an open collector NOT gate just like Figure 2.17. We choose an electronic circuit that has an output current larger than the 10 mA needed by the SSR. Since the maximum IOL of the 7405 is 16 mA, it can sink the 10 mA required by the SSR. The 7406 could also have been used. The resistor is selected to control the current to the diode. Using the LED design equation, R = (5 - Vd - VOL)/Id = (5 - 2 - 0.5 V)/0.01 A = 250 Ω. The closest standard value 5% resistor is 240 Ω. A 240 Ω resistor will generate Id = (5 - 2 - 0.5 V)/240 Ω = 10.4 mA, which will be close enough to activate the relay. When the input to the 7405 is high (p = 5 V), the output is low (q = 0.5 V), see Figure 3.10. In this state, a 10 mA current is applied to the diode, and the relay switch activates. This causes 120 VAC power to be delivered to the appliance. But, when the input is low (p = 0), the output floats (q = HiZ, which is neither high nor low). This floating output state causes the LED current to be zero, and the relay switch opens. In this case, no power is delivered to the appliance.
Figure 3.10 Solid-state relay interface using a 7405 open collector driver (9S12 PT5 drives the 7405 input p; the 7405 output q connects through the SSR control LED and a 240 Ω resistor to +5V; the SSR switches 120 VAC to the appliance).
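The LED design equation used in Example 3.6 can be evaluated with a short C sketch. The function names are hypothetical, all quantities are in volts, amps, and ohms, and floating point is used only for convenience on a host computer.

```c
/* Resistor needed to set the LED current: R = (Vsupply - Vd - VOL)/Id. */
double led_resistor_ohms(double vsupply, double vd, double vol, double id) {
    return (vsupply - vd - vol) / id;
}

/* Resulting LED current for a chosen resistor: Id = (Vsupply - Vd - VOL)/R. */
double led_current_amps(double vsupply, double vd, double vol, double r) {
    return (vsupply - vd - vol) / r;
}
```

With the values from the example, led_resistor_ohms(5.0, 2.0, 0.5, 0.010) is approximately 250 Ω, and led_current_amps(5.0, 2.0, 0.5, 240.0) is approximately 0.0104 A, which is below the 16 mA IOL limit of the 7405.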
The initialization subroutine will set bit 5 of DDRT to make PT5 an output, see Program 3.1. This function should be called once at the start of the system. After initialization, the on and off functions can be called to control the appliance. Software that operates by affecting only the bits it has to without changing any of the other bits is called friendly. The oraa instruction is used to set bits and the anda instruction clears bits.
Program 3.1 Subroutines to control a solid-state relay.
SSR_Init ldaa DDRT
         oraa #$20
         staa DDRT   ;PT5 output
         rts
SSR_On   ldaa PTT
         oraa #$20
         staa PTT    ;PT5 high
         rts
SSR_Off  ldaa PTT
         anda #$DF   ;$DF = ~$20, clears only bit 5
         staa PTT    ;PT5 low
         rts

// Make PT5 an output
void SSR_Init(void){
  DDRT |= 0x20;
}
// Make PT5 high
void SSR_On(void){
  PTT |= 0x20;
}
// Make PT5 low
void SSR_Off(void){
  PTT &= ~0x20;
}
Checkpoint 3.29: Rewrite the assembly code in Program 3.1 using the bset and bclr instructions.
While we’re introducing digital circuits, we need digital storage devices, which are essential components used to make registers and memory. The simplest storage device is the set-reset flip-flop. One way to build one is shown on the left side of Figure 3.11. If the inputs are S*=0 and R*=1, then the Q output will be 1. Conversely, if the inputs are S*=1 and R*=0, then the Q output will be 0. Normally, we leave both the S* and R* inputs high. We make the signal S* go low, then back high to set the flip-flop, making Q = 1. Conversely, we make the signal R* go low, then back high to reset the flip-flop, making Q = 0. If both S* and R* are 1, the value on Q will be remembered or stored. This flip-flop enters an unpredictable mode when S* and R* are simultaneously low.
Figure 3.11 Digital storage elements. Panels: Set-Reset flip-flop (inputs S*, R*, output Q), Gated D flip-flop (inputs D, W, output Q), 74HC74 D flip-flop, 74HC374 8-bit D flip-flop (with gate G).
The gated D flip-flop is also shown in Figure 3.11. The front-end circuits take a data input, D, and a control signal, W, and produce the S* and R* commands for the set-reset flip-flop. For example, if W = 0, then the flip-flop is in its quiescent state, remembering the value on Q that was previously written. However, if W = 1, then the data input is stored into the flip-flop. In particular, if D = 1 and W = 1, then S*=0 and R*=1, making Q = 1. Furthermore, if D = 0 and W = 1, then S*=1 and R*=0, making Q = 0. So, to use the gated flip-flop, we first put the data on the D input, next we make W go high, then we make W go low. This causes the data value to be stored at Q. After W goes low, the data does not need to exist at the D input anymore. If the D input changes while W is high, then the Q output will change correspondingly. However, the last value on the D input is remembered or latched when W falls, as shown in Table 3.14.
The D flip-flop, shown on the right of Figure 3.11, can also be used to store information. D flip-flops are the basic building block of RAM and registers on the computer. To save information, we first place the digital value we wish to remember on the D input, then give a rising edge to the clock input. After the rising edge of the clock, the value is available at the Q output, and the D input is free to change. The operation of the clocked D flip-flop is defined on the right side of Table 3.14. The 74HC374 is an 8-bit D flip-flop, such that all 8 bits are stored on the rising edge of a single clock. The 74HC374 is similar in structure and operation to a register, which is high speed memory inside the processor. If the gate (G) input on the 74HC374 is high, its outputs will be HiZ (floating), and if the gate is low, the outputs will be high or low depending on the stored values on the flip-flop.
Table 3.14 D flip-flop operation. Qold is the value of the D input at the time of the active edge of W or clock.
Gated D flip-flop      Clocked D flip-flop
D   W   Q              D   clock   Q
0   0   Qold           0   0       Qold
1   0   Qold           1   0       Qold
0   1   0              0   1       Qold
1   1   1              1   1       Qold
0   ↓   0              0   ↑       0
1   ↓   1              1   ↑       1
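The behavior in Table 3.14 can be modeled in software. The sketch below is a behavioral model only, under the assumption that each update call represents one sample of the inputs; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Gated D flip-flop: Q follows D while W is high and holds when W is low. */
typedef struct {
    uint8_t q;
} GatedD;

void gated_d_update(GatedD *ff, uint8_t d, uint8_t w) {
    if (w) {
        ff->q = d & 1;       /* transparent while W = 1 */
    }                        /* W = 0: remember the previous value */
}

/* Clocked D flip-flop: Q changes only on a rising edge of clock. */
typedef struct {
    uint8_t q;
    uint8_t prev_clock;      /* clock level seen on the previous update */
} ClockedD;

void clocked_d_update(ClockedD *ff, uint8_t d, uint8_t clock) {
    if (clock && !ff->prev_clock) {
        ff->q = d & 1;       /* rising edge: capture D */
    }
    ff->prev_clock = clock;
}
```

In this model a gated flip-flop that is updated with D = 1, W = 1 and then D = 0, W = 0 keeps Q = 1, while the clocked version ignores changes on D until the clock goes from 0 to 1 again.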
Second, the tristate driver, shown in Figure 3.12, can be used to dynamically control signals within the computer. The tristate driver is an essential component from which computers are built. To activate the driver, we make its gate (G) low. When the driver is active, its output (Y) equals its input (A). To deactivate the driver, we make its G high. When the driver is not active, its output Y floats independent of A. We saw this floating state with the open collector logic, and it is also called HiZ or high impedance. The HiZ output means the output is driven neither high nor low. The operation of a tristate driver is defined in Table 3.15. The 74HC244 is an 8-bit tristate driver, such that all 8 bits are made active or inactive by a single gate. The 74HC374 8-bit D flip-flop includes tristate drivers on its outputs. Normally, we can’t connect two digital outputs together. The tristate driver provides a way to connect multiple outputs to the same signal, as long as at most one of the gates is active at a time.
Figure 3.12 A 1-bit and an 8-bit tristate driver. Panels: 74HC125 (1-bit, inputs A and G, output Y), 74HC244 (8-bit), transistor-level circuit with transistors T1 through T8.
Table 3.15 Tristate driver operation. HiZ is the floating state, such that the output is not high or low.
A   G   T1    T2    T3    T4    T5    T6    T7    T8    Y
0   0   on    off   on    off   on    off   on    on    0
1   0   on    off   off   on    on    on    off   on    1
0   1   off   on    on    off   off   off   on    off   HiZ
1   1   off   on    off   on    off   on    off   off   HiZ
To understand how a tristate driver works, look at the various pieces of the circuit in Figure 3.12. Transistors T1 and T2 create the logical complement of G. Similarly, transistors T3 and T4 create the complement of A. An input of G = 0 causes the driver to be active. In this case, both T5 and T8 will be on. With T5 and T8 on, the circuit behaves like a cascade of two NOT gates, so the output Y equals the input A. However, if the input G = 1, both T5 and T8 will be off. Since T5 is in series with the 5 V, and T8 is in series with the ground, the output Y will be neither high nor low; that is, it will float.
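The input/output behavior of the tristate driver, and the rule that at most one driver on a shared signal may be active, can be captured in a small behavioral model. The enumeration and function names below are hypothetical; the model describes logic levels only, not the transistor circuit.

```c
/* Possible states of a wire in this model. */
typedef enum { OUT_LOW, OUT_HIGH, OUT_HIZ, OUT_CONFLICT } WireState;

/* Tristate driver: active when G = 0 (Y = A), floating when G = 1. */
WireState tristate(int a, int g) {
    if (g) {
        return OUT_HIZ;
    }
    return a ? OUT_HIGH : OUT_LOW;
}

/* Two driver outputs tied to the same signal. The wire takes the value
   of whichever driver is active; if both are active at once, the model
   reports a conflict, since that violates the one-active-gate rule. */
WireState shared_wire(WireState y1, WireState y2) {
    if (y1 == OUT_HIZ) {
        return y2;
    }
    if (y2 == OUT_HIZ) {
        return y1;
    }
    return OUT_CONFLICT;
}
```

For example, shared_wire(tristate(1, 0), tristate(0, 1)) evaluates to OUT_HIGH because only the first driver is active.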
3.7 Shift Operations
When programming in C, the shift is a binary operation. In other words, the << and >> operators take two inputs and yield one output, e.g., r = m >> n. But at the machine level (i.e., assembly programming), the shift operators are actually unary operations, e.g., r = m >> 1. The assembly instructions used for shifting will shift one bit at a time. If you want to shift multiple times, you will have to execute the instruction multiple times. The logical shift right (LSR) is equivalent to an unsigned divide by 2, as shown in Figure 3.13. A zero is shifted into the most significant position, and the carry flag will hold the bit shifted out.
Figure 3.13 8-bit logical shift right (LSR): a 0 enters the most significant bit, and the least significant bit moves into C.
Consider the top row of 8 D flip-flops of Figure 3.14 as a register containing an 8-bit value. The LSR function can be implemented in hardware as a two-step process. The first step, which occurs on the falling edge of shift (rising edge of copy), is to make a copy of the 8 bits into the lower row of D flip-flops. Then, on the rising edge of the shift signal, the new shifted value is clocked back into the top row.
Figure 3.14 8-bit logical shift right hardware (top row of D flip-flops holding b7 through b0 and C, clocked by shift; lower row of D flip-flops clocked by copy; a 0 feeds the b7 input).
The arithmetic shift right (ASR) is equivalent to a signed divide by 2, as shown in Figure 3.15. Notice that the sign bit is preserved and the carry flag will hold the bit shifted out.
Figure 3.15 8-bit arithmetic shift right (ASR).
Checkpoint 3.30: Use D flip-flops like Figure 3.14 to build an 8-bit ASR function.
The same shift left operation works for both unsigned and signed multiply by 2, as shown in Figure 3.16. In other words, the arithmetic shift left (ASL) is identical to the logical shift left (LSL). A zero is shifted into the least significant position, and the carry bit will contain the bit that was shifted out.
Figure 3.16 8-bit shift left (LSL/ASL): the most significant bit moves into C, and a 0 enters the least significant bit.
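The three 8-bit shifts described above can be modeled in C, returning both the shifted byte and the carry. The structure and function names are hypothetical; the sign bit is handled with explicit masks so the result does not depend on how a particular compiler shifts negative values.

```c
#include <stdint.h>

/* Result of a one-bit shift: the new 8-bit value and the carry out. */
typedef struct {
    uint8_t value;
    uint8_t carry;
} Shift8;

/* Logical shift right: 0 enters bit 7, bit 0 goes to carry. */
Shift8 lsr8(uint8_t x) {
    Shift8 r;
    r.value = (uint8_t)(x >> 1);
    r.carry = (uint8_t)(x & 0x01);
    return r;
}

/* Arithmetic shift right: bit 7 (the sign) is preserved, bit 0 goes to carry. */
Shift8 asr8(uint8_t x) {
    Shift8 r;
    r.value = (uint8_t)((x >> 1) | (x & 0x80));
    r.carry = (uint8_t)(x & 0x01);
    return r;
}

/* Shift left (LSL/ASL): 0 enters bit 0, bit 7 goes to carry. */
Shift8 lsl8(uint8_t x) {
    Shift8 r;
    r.value = (uint8_t)(x << 1);
    r.carry = (uint8_t)(x >> 7);
    return r;
}
```

Applied to $81, lsr8 gives $40 (129/2 = 64), asr8 gives $C0 (-127 becomes -64), and lsl8 gives $02, each with the carry set. Note that the arithmetic shift rounds toward negative infinity, whereas the C expression -127/2 yields -63.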
The roll operations can be used to create multiple-byte shift functions. Roll right and roll left are shown in Figure 3.17. In each case, the carry is shifted into the 8-bit byte, and the carry bit will contain the bit that was shifted out.
Figure 3.17 8-bit roll right and 8-bit roll left.
3.7 䡲 Shift Operations
The simplest way to perform a shift operation on the microcomputer is to use a register like Register A or Register B. The asla and lsla instructions have identical machine codes. The two assembly language names allow the programmer to write clearer code (using lsla for unsigned numbers and asla for signed numbers). The shift instructions use inherent addressing. The N bit is set if the result is negative. The Z bit is set if the result is zero. The V bit is set on a signed overflow, which is detected by a change in the sign bit. The C bit is the carry out after the shift.
      asla ;RegA=RegA*2 Signed shift left, same as lsla
      aslb ;RegB=RegB*2 Signed shift left, same as lslb
      asld ;RegD=RegD*2 Signed shift left, same as lsld
      lsla ;RegA=RegA*2 Unsigned shift left, same as asla
      lslb ;RegB=RegB*2 Unsigned shift left, same as aslb
      lsld ;RegD=RegD*2 Unsigned shift left, same as asld
      asra ;RegA=RegA/2 Signed shift right
      asrb ;RegB=RegB/2 Signed shift right
      asrd ;RegD=RegD/2 Signed shift right
      lsra ;RegA=RegA/2 Unsigned shift right
      lsrb ;RegB=RegB/2 Unsigned shift right
      lsrd ;RegD=RegD/2 Unsigned shift right
      rola ;Rotate RegA (C←A7←...←A0←C)
      rolb ;Rotate RegB (C←B7←...←B0←C)
      rora ;Rotate RegA (C→A7→...→A0→C)
      rorb ;Rotate RegB (C→B7→...→B0→C)
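The shift and roll operations described above can be modeled on a desktop compiler. The following is a minimal C sketch, not 9S12 code: the Shifted type and the function names are hypothetical, chosen only for this illustration. Each function returns the 8-bit result together with the bit that the hardware would leave in the carry, and lsr16 shows how a roll extends a shift across two bytes.

```c
#include <stdint.h>

/* Hypothetical helper type: the 8-bit result plus the carry (C) bit. */
typedef struct {
  uint8_t value;
  uint8_t carry;
} Shifted;

/* LSR: a zero enters bit 7, bit 0 goes to the carry. */
Shifted lsr8(uint8_t a){
  Shifted r = { (uint8_t)(a >> 1), (uint8_t)(a & 0x01) };
  return r;
}

/* ASR: bit 7 (the sign) is preserved, bit 0 goes to the carry. */
Shifted asr8(uint8_t a){
  Shifted r = { (uint8_t)((a >> 1) | (a & 0x80)), (uint8_t)(a & 0x01) };
  return r;
}

/* LSL/ASL: a zero enters bit 0, bit 7 goes to the carry. */
Shifted lsl8(uint8_t a){
  Shifted r = { (uint8_t)(a << 1), (uint8_t)(a >> 7) };
  return r;
}

/* ROR: the old carry enters bit 7, bit 0 goes to the carry. */
Shifted ror8(uint8_t a, uint8_t carryIn){
  Shifted r = { (uint8_t)((a >> 1) | ((carryIn & 0x01) << 7)),
                (uint8_t)(a & 0x01) };
  return r;
}

/* ROL: the old carry enters bit 0, bit 7 goes to the carry. */
Shifted rol8(uint8_t a, uint8_t carryIn){
  Shifted r = { (uint8_t)((a << 1) | (carryIn & 0x01)),
                (uint8_t)(a >> 7) };
  return r;
}

/* 16-bit logical shift right built from two 8-bit steps:
   LSR on the high byte, then ROR on the low byte using that carry. */
uint16_t lsr16(uint16_t n){
  Shifted hi = lsr8((uint8_t)(n >> 8));
  Shifted lo = ror8((uint8_t)(n & 0xFF), hi.carry);
  return (uint16_t)(((uint16_t)hi.value << 8) | lo.value);
}
```

For example, asr8(0xFE) treats the input as -2 and yields 0xFF (-1) with carry 0, while lsr8(0xFE) yields 0x7F.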
Example 3.7 Write assembly code to implement M = N >> 2, where M and N are 16-bit unsigned variables.
Solution We need to use a 16-bit register, because we have 16-bit data. First, we perform a 16-bit read, bringing N into Register D. Second, we divide by 4 using two shift right operations, and lastly we store the result into M. Since the value gets smaller, no overflow can occur. If the variables were signed, then the two lsrd instructions should be replaced with asrd instructions.
      ldd N  ;Reg D = N
      lsrd   ;Reg D = N/2
      lsrd   ;Reg D = N/4
      std M  ;M = N>>2
Checkpoint 3.31: Let N and M be 8-bit signed locations. Write assembly code to implement M = 4*N.
Maintenance Tip: Use the asla instruction when manipulating signed numbers, and use the lsla instruction when shifting unsigned numbers.
Example 3.8 Take two 4-bit nibbles and combine them into one 8-bit value. Solution The solution uses the shift operation to move the bits into position, then it uses the or operation to combine the two parts into one number. Let High and Low be the unsigned 4-bit
3 䡲 Representation and Manipulation of Information
components, which will be combined into a single unsigned 8-bit Result. We will assume both High and Low are bounded within the range of 0 to 15. The expression High127 R=127
R=R16
R=R16
end
end
The C code in Program 3.3 adds and subtracts two 8-bit signed numbers. The compiler will automatically promote A and B to signed 16-bit values before the addition. Program 3.3 Using promotion to detect and compensate for signed overflow errors.
char A,B,R; void add(void){ short result = A+B; /* if(result>127){ /* result = 127; /* } if(result127){ /* result = 127; /* } if(resultProcessor . . . command and select the processor you wish to use. You could short-cut through this tutorial by copying the Tutor5b.rtf, Tutor5b.uc, and Tutor5b.io files from the web instead of building them up from scratch.
5.9 䡲 Tutorial 5b. Microcomputer-Based Lock
Question 5b.1 What are the RAM and ROM locations for your microcomputer?
Action: Type the following assembly code into the Tutor5b.rtf file, replacing RAM with the first RAM address, and ROM with the first ROM address. For the 9S12 you set STCK to the last RAM address plus 1.
      org RAM
; global variables will go here
      org ROM
main  lds #STCK
; program will go here
      stop
; constant data will go here
      org $FFFE
      fdb main
Action: We begin with a list of the inputs and outputs. We specify the range of values and their significance. In this example, we will use PTT, with bits 6-0 being inputs. The seven input signals represent an unsigned integer from 0 to 127. Port T bit 7 will be an output. If PT7 is 1 then the solenoid will activate and the door will be unlocked. Click on the Tutor5b.io window and add seven positive logic switches to PT6 to PT0, and one LED to PT7. The LED will simulate the solenoid. Figure T5b.2 shows the resulting I/O window. The switches in both Figures T5b.1 and T5b.2 are positive logic, meaning if the switch is pushed a logic high is seen at the corresponding input.
Figure T5b.2 I/O window for the microcomputer-controlled lock.
Action: Open the Port12.rtf file. Copy the PTT and DDRT lines, and paste them into your Tutor5b.rtf file. Question 5b.2 What are the addresses of PTT and DDRT?
5 䡲 Modular Programming
Action: Next, we make a list of the required data structures. Data structures are used to save information. If the data needs to be permanent, then it is allocated in global space. If the software will change its value, then it will be allocated in RAM. In this example we need a 16-bit unsigned counter. Add this code to the global variable section. The rmb pseudo-op will reserve multiple bytes.
cnt   rmb 2 ; 16-bit counter
If a data structure can be defined at assembly time and will remain fixed, then it can be allocated in EEPROM. In this example, we will define an 8-bit fixed constant to hold the key code, which the operator needs to set to unlock the door. We will place these lines directly after the program so that they will be defined in ROM or EEPROM memory. The fcb pseudo-op defines an 8-bit constant. Add this code to the constant data section (after the bra loop and before the org $FFFE). This line also assigns the symbolic name key to the corresponding address of the information.
key   fcb %00100011 ; key code
It is not yet clear exactly where in EEPROM this constant will be, but luckily for us, the assembler will calculate the exact address automatically. After the program is assembled, we can look at the line in the listing file or in the symbol table to see where in memory each structure is allocated.
Action: Next we develop the software algorithm, which is a sequence of operations we wish to execute. There are many approaches to describing the plan. Experienced programmers can develop the algorithm directly in assembly language. On the other hand, most of us need an abstract method to document the desired sequence of actions. Flowcharts, pseudo-code, and high-level language code are three common descriptive formats. The TExaS application is unique in that if you draw the flowchart on the computer, you can paste it directly into the program as a comment.
There are no formal rules regarding pseudo-code, rather it is a shorthand for describing what to do and when to do it. We can place our pseudo-code as documentation into the comment fields of our program. Figure T5b.3 shows a flowchart on the left and pseudo-code and C code on the right for our digital lock example. The loop counter (400) is the number of times the loop must be executed to wait 1 ms.
Figure T5b.3 Flowchart, pseudo-code, and C code for a microcomputer-controlled lock.
[Flowchart: main; initialize ports, Solenoid=off, cnt=400; compare the switches with the key. If different: Solenoid=off, cnt=400. If they match: cnt=cnt-1; while cnt>0 repeat the test; when cnt is 0, Solenoid=on.]
Pseudo Code
1) initialize ports: PT6-PT0 inputs, PT7 output
2) turn off solenoid
3) set counter to 400
4) repeat indefinitely
   if switch matches key
     a) decrement counter
     b) if counter is zero, turn on solenoid
   otherwise
     a) turn off solenoid
     b) set counter to 400
C Code
DDRT=0x80;
PTT=0;
cnt=400;
while(1){
  if((PTT&0x7F)==key){
    if((--cnt)==0) PTT |= 0x80;}
  else{
    PTT=0;
    cnt=400;}}
Next we write assembly code to implement the algorithm as illustrated in the above flowchart and pseudo code. In step 1), we initialize Port T so that PT7 is an output and PT6 to PT0 are inputs.
      ldaa #$80
      staa DDRT ; PT6-PT0 input, PT7 output
In step 2), we turn off the solenoid. Remember, writing to an input pin has no effect, so this operation only changes bit 7.
      clr  PTT  ; disable solenoid lock
In step 3), we initialize the counter to 400, which is the number of loops required to wait 1 ms. The 9S12 requires 20 cycles to execute the loop.
      ldx  #400
      stx  cnt  ; 1,000,000ns/(125*20)
In step 4) we implement the indefinite loop. We place an assembly label at the program locations to which we wish to branch. The bra instruction is an unconditional branch.
loop
      bra  loop
Inside the indefinite loop we test to see if the switch pattern matches the key code. In this implementation we branch to off if the switches do not match the key code. If they do match, we will execute the instruction immediately after the bne off.
loop  ldaa PTT  ; [3] input from 7 switches
      anda #$7F ; [1]
      cmpa key  ; [3] match key code?
      bne  off  ; [3]
If the switches match the key code, then the 16-bit counter is decremented.
      ldx  cnt  ; [3]
      dex       ; [1]
      stx  cnt  ; [3]
If the counter becomes zero, then the door is unlocked. The bne instruction will go to loop if cnt is not equal to zero.
      bne  loop ; [3]=20 cycles/loop
      ldaa #$80
      staa PTT  ; enable solenoid lock
      bra  loop
If the switches do not match the key code, then the solenoid is turned off and the cnt set back to 400.
off   ldx  #400
      stx  cnt  ; 1,000,000ns/(125*20)
      clr  PTT  ; disable solenoid lock
      bra  loop
We put the above pieces together to create the source code, as shown in Program T5b.1. The order of the instructions is very important because it determines the sequence of execution. The last two lines will define where the computer will start execution after a reset.
Program T5b.1 Lock program for Tutorial 5b.
; activate solenoid (PT7=1) if switches match key code
PTT   equ  $0240 ; PT6-PT0 switches, PT7 solenoid lock
DDRT  equ  $0242 ; specifies input or output
      org  $0800 ; RAM
cnt   rmb  2     ; 16-bit counter
      org  $4000 ; EEPROM
main  lds  #$4000
      ldaa #$80
      staa DDRT  ; PT6-PT0 input, PT7 output
      clr  PTT   ; disable solenoid lock
      ldx  #400
      stx  cnt   ; 1,000,000ns/(125*20)
loop  ldaa PTT   ; [3] input from 7 switches
      anda #$7F  ; [1]
      cmpa key   ; [3] match key code?
      bne  off   ; [3]
      ldx  cnt   ; [3]
      dex        ; [1]
      stx  cnt   ; [3]
      bne  loop  ; [3]=20 cycles/loop
; 7 switches match key code for more than 1 ms
      ldaa #$80
      staa PTT   ; enable solenoid lock
      bra  loop
off   ldx  #400
      stx  cnt   ; 1,000,000ns/(125*20)
      clr  PTT   ; disable solenoid lock
      bra  loop
key   fcb  %00100011 ; key code
      org  $FFFE
      fdb  main
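The lock algorithm can also be exercised on a desktop computer before it is run on the simulator. The following is a minimal C sketch under stated assumptions: the Lock type and the functions Lock_Init and Lock_Step are hypothetical names invented for this illustration, and the switch value is passed in as a parameter rather than read from PTT. Each call to Lock_Step corresponds to one pass through the assembly loop.

```c
#include <stdint.h>

#define KEY        0x23u  /* %00100011, the key code used in this tutorial */
#define LOOP_COUNT 400u   /* 1,000,000 ns / (125 ns * 20 cycles) */

/* Hypothetical simulation state: stands in for cnt and for PT7. */
typedef struct {
  uint16_t cnt;      /* 16-bit loop counter */
  uint8_t  solenoid; /* 1 means the door is unlocked */
} Lock;

/* Steps 2 and 3 of the pseudo code: solenoid off, counter set to 400. */
void Lock_Init(Lock *lock){
  lock->cnt = LOOP_COUNT;
  lock->solenoid = 0;
}

/* One pass through the loop; switches is the value read from the port.
   Bit 7 is masked off because only bits 6-0 are inputs. */
void Lock_Step(Lock *lock, uint8_t switches){
  if((switches & 0x7F) == KEY){
    if(--lock->cnt == 0){   /* matched for 400 consecutive passes */
      lock->solenoid = 1;
    }
  } else {
    lock->solenoid = 0;     /* any mismatch locks the door again */
    lock->cnt = LOOP_COUNT;
  }
}
```

Calling Lock_Step 400 times in a row with the switches equal to the key turns the solenoid on at the 400th call, and a single mismatch turns it off and restarts the count.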
Action: The last stage is debugging. You should run the system to verify its proper behavior. For a simple system like this, we could test all 128 possible input values, verifying that 0100011 is the only code that unlocks the door (turns on the LED).
Question 5b.3 What switch pattern activates the solenoid (turns on the LED)?
Question 5b.4 How do you change the program so the key is Sw6,5,4 off, Sw3,2,1,0 on?
5.10 Homework Problems
Homework 5.1 Assume you have a 16-bit unsigned global variable H. Write assembly code that implements if(H > 1234)isGreater();
Homework 5.2 Assume you have a 16-bit signed global variable H. Write assembly code that implements if(H > -1234)isGreater();
Homework 5.3 Assume you have an 8-bit unsigned global variable G. Write assembly code that implements if(G < 50) isLess(); else isMore();
Homework 5.4 Assume you have a 16-bit signed global variable H. Write assembly code that implements if(H < -500) isLess(); else isMore();
Homework 5.5 Assume you have an 8-bit global variable G. Write assembly code that implements while(G&0x80)body();
Homework 5.6 Write assembly code that implements while(PTT&0x01)body();
Homework 5.7 You will write four assembly language versions of the following C code n=100; while(n!=0){n--; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as an 8-bit global variable.
c) Assume the variable n is implemented as a 16-bit variable in Register D.
d) Assume the variable n is implemented as an 8-bit variable in Register A.
Homework 5.8 You will write four assembly language versions of the following C code n=0; while(n3
where the divide by 8 is integer math without rounding. Notice that if J is less than or equal to 31, then J divided by 8 will be less than or equal to 3. Let K be the bottom three bits of J. K = J&0x07;
A mask will specify the bit location within the byte. In C, the following array can be used unsigned const char Masks[8]={0x80,0x40,0x20,0x10,0x08,0x04,0x02, 0x01};
In assembly, this array can be defined in ROM as Masks fcb $80,$40,$20,$10,$08,$04,$02,$01
Recall that K is the bottom three bits of J. For example, if K is 010 in binary (K = 2), then we use the bit mask of $20 to access the information stored in the appropriate byte of the Video buffer.
mask = Masks[K];
Program 6.11 takes the row and column index values and calculates the memory address and bit mask to access that bit in the Video matrix. Access is a private function for this module. A helper function is another name for a private function, which is used inside a module but is not called by software outside the module. Conversely, the other four functions of this module are public. Functions to clear, set, and toggle bits in the Video matrix are shown in Program 6.12. For all four public functions, the parameters I, J are passed by value, and the video buffer itself is a private global within this module. A function that tests the current value within the matrix is shown in Program 6.13. In order for the image to appear on the display, there must be a hardware interface that translates the data in the video buffer onto the graphics hardware. A typical way this translation occurs is for the video buffer to exist in the display hardware itself. The software reads and writes this buffer in a similar way as described in this example. The graphics hardware is then responsible for copying the data from the buffer onto the display.
6 䡲 Pointers and Data Structures
Program 6.11 A helper function to access a bit matrix.
; ********* Access ***************
; Access the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
;Output: Reg X points to the byte of interest
;        Reg A is the Mask to access that bit
Access lsla
       lsla        ;4*I
       pshb        ;save a copy of J
       lsrb
       lsrb
       lsrb        ;Reg B = J>>3
       aba         ;Reg A = 4*I + J>>3
       ldx  #Video
       tab
       abx         ;Reg X = Video + 4*I + J>>3
       pulb        ;Reg B = J again
       andb #$07   ;Reg B = K (bottom three bits of J)
       ldy  #Masks
       ldaa B,Y    ;Reg A = mask = Masks[K]
       rts
Program 6.12 Functions that modify the bit matrix.
; Clear the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_ClrBit
       bsr  Access
       coma        ;Not(mask) zero in bit location
       anda 0,x    ;Clear bit
       staa 0,x
       rts
; Set the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_SetBit
       bsr  Access
       oraa 0,x    ;Set bit
       staa 0,x
       rts
; Invert the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_InvBit
       bsr  Access
       eora 0,x    ;Flip bit
       staa 0,x
       rts
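The same bit-matrix operations can be sketched in C. This is a minimal illustration, assuming a 12-row by 32-column matrix packed 4 bytes per row as in the assembly comments. The function names mirror the assembly labels, but the C signatures, including the read function, are assumptions made for this sketch.

```c
#include <stdint.h>

#define ROWS          12  /* I ranges from 0 to 11 */
#define BYTES_PER_ROW 4   /* 32 columns, 8 bits per byte */

static uint8_t Video[ROWS * BYTES_PER_ROW];  /* private video buffer */
static const uint8_t Masks[8] =
  {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};

/* Private helper: returns a pointer to the byte holding bit (i,j)
   and stores the mask for that bit in *mask. */
static uint8_t *Access(uint8_t i, uint8_t j, uint8_t *mask){
  *mask = Masks[j & 0x07];                     /* K = bottom 3 bits of J */
  return &Video[BYTES_PER_ROW * i + (j >> 3)]; /* Video + 4*I + J>>3 */
}

void Display_ClrBit(uint8_t i, uint8_t j){
  uint8_t mask;
  uint8_t *pt = Access(i, j, &mask);
  *pt &= (uint8_t)~mask;   /* zero in the bit location */
}

void Display_SetBit(uint8_t i, uint8_t j){
  uint8_t mask;
  uint8_t *pt = Access(i, j, &mask);
  *pt |= mask;
}

void Display_InvBit(uint8_t i, uint8_t j){
  uint8_t mask;
  uint8_t *pt = Access(i, j, &mask);
  *pt ^= mask;
}

/* Returns zero if the bit is clear, nonzero if it is set. */
uint8_t Display_ReadBit(uint8_t i, uint8_t j){
  uint8_t mask;
  uint8_t *pt = Access(i, j, &mask);
  return (uint8_t)(*pt & mask);
}
```

For example, Display_SetBit(1,10) changes byte 4*1 + (10>>3) = 5 of the buffer, using the mask Masks[2] = 0x20.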
Program 6.13 A function that reads the bit matrix.
; Read the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
;Output: Reg CC zero bit is the value read from the array
;        Reg A is zero or not zero depending on the bit
Display_ReadBit
       bsr  Access
       anda 0,x    ;Z=1 if bit was zero, Z=0 if bit was one
       rts
6.5 Structures
A structure has elements with different types and/or precisions. In C, we use struct to define a structure. The const modifier causes the structure to be allocated in ROM. Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. In the example shown in Figure 6.13, Name is a variable-length ASCII string, but as you can see, we have to specify its maximum size.
const struct port{
  unsigned char AndMask;  // bits that can change
  unsigned char OrMask;   // bits that must stay high
  unsigned char *Addr;    // Port Address
  unsigned char Name[10]; // ASCII string
};
typedef const struct port portType;
portType PortT={0x15,0x82,(unsigned char *)0x0240,"PTT"};
Figure 6.13 A structure collects objects of different sizes into one object.
Address  Contents
$F950    $15                  (AndMask)
$F951    $82                  (OrMask)
$F952    $0240                (Addr)
$F954    "PTT",0,0,0,0,0,0,0  (Name)
Checkpoint 6.13: Most C compilers will align 16-bit elements within structures to an even address. How would Figure 6.13 have been different if the positions of OrMask and Addr had been reversed?
In Program 6.14, we can use the equ pseudo-op to make our software more readable. The subroutine Port_Out uses call by reference for the port structure and call by value for the data written to the port. Program 6.14 Assembly language example of a structure.
AndMask equ 0
OrMask  equ AndMask+1
Addr    equ OrMask+1
Name    equ Addr+2
; Reg A = data to output
; Reg X = pointer to Port structure
Port_Out
      psha
      anda AndMask,x ;modify input with andmask
      oraa OrMask,x  ;modify input with ormask
      ldy  Addr,x    ;get Port address
      staa 0,y       ;output
      leax Name,x    ;pointer to string
      jsr  OutString ;print string
      pula
      rts
;********************************
PortT fcb  $15,$82   ;AndMask,OrMask
      fdb  $0240     ;pointer to PTT
      fcc  "PTT"     ;string
      fcb  0,0,0,0,0,0,0
main  lds  #$4000
      movb #$FF,DDRT
      ldaa #$00      ;data
loop  ldx  #PortT    ;pointer to structure
      bsr  Port_Out
      inca
      bra  loop
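The masking performed by Port_Out can be sketched in C as follows. This is a minimal illustration for a desktop compiler, not the 9S12 code itself: the variable SimulatedPTT is a hypothetical stand-in for the real port at $0240, the PortType name is chosen for this sketch, and the string output step is omitted.

```c
#include <stdint.h>

/* Same fields as the port structure: two masks, an address, a name. */
typedef struct {
  uint8_t AndMask;         /* bits that can change */
  uint8_t OrMask;          /* bits that must stay high */
  volatile uint8_t *Addr;  /* port address */
  char Name[10];           /* ASCII string, padded with zeros */
} PortType;

/* Hypothetical stand-in for PTT so the sketch runs on any computer. */
static volatile uint8_t SimulatedPTT;

static const PortType PortT = {0x15, 0x82, &SimulatedPTT, "PTT"};

/* The structure is passed by reference, the data by value. */
void Port_Out(const PortType *port, uint8_t data){
  data &= port->AndMask;   /* keep only the bits allowed to change */
  data |= port->OrMask;    /* force the bits that must stay high */
  *port->Addr = data;      /* output */
}
```

With these masks, Port_Out(&PortT, 0xFF) writes 0x97 and Port_Out(&PortT, 0x00) writes 0x82.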
Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. If the structure resides in RAM, then the system will have to initialize it explicitly via software execution. Again, most C compilers will implicitly initialize variable structures.
6.6 *Tables
A table is a collection of identically sized structures. Program 6.15 and Figure 6.14 show a table containing a simple data base. Each entry in the table records the name, life span, and the year of inauguration. The names are variable length, but a fixed size will be allocated so that each table entry will be exactly 36 bytes. The C compiler will fill the unused bytes in the Name field with zeros.
Program 6.15 A simple data base with three entries.
const struct entry{
  unsigned char Name[30];  // null-terminated string
  unsigned short life[2];  // birth year, year died
  unsigned short year;     // year of inauguration
};
typedef const struct entry entryType;
entryType Presidents[3]={
  {"George Washington",{1732,1799},1789},
  {"John Adams",{1735,1826},1797},
  {"Thomas Jefferson",{1743,1826},1801}
};
Figure 6.14 A table collects structures of same size into one object.
Name                 Born  Died  Inaugurated
"George Washington"  1732  1799  1789
"John Adams"         1735  1826  1797
"Thomas Jefferson"   1743  1826  1801
Checkpoint 6.14: Why do elements of a table all have to be the same size?
Program 6.16 shows the assembly language definition of the data base. We use equ pseudo-ops to make the software more readable.
Program 6.16 The entries of a table written in assembly language.
NAME equ 0
LIFE equ NAME+30
YEAR equ LIFE+4
SIZE equ YEAR+2
Presidents
     fcb "George Washington",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1732,1799
     fdb 1789
     fcb "John Adams",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1735,1826
     fdb 1797
     fcb "Thomas Jefferson",0
     fcb 0,0,0,0,0,0,0,0,0,0,0,0,0
     fdb 1743,1826
     fdb 1801
To access the inauguration year of the second president in C, we could execute
theyear = Presidents[1].year;
This operation in assembly is
     ldd Presidents+SIZE+YEAR
     std theyear
If we wanted the year the third president died in C, we could execute
theyear = Presidents[2].life[1];
This operation in assembly is
     ldd Presidents+2*SIZE+LIFE+2
     std theyear
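The address arithmetic used by the assembly code (base + n*SIZE + offset) can be checked against the C field access. The following is a minimal sketch for a desktop compiler: the EntryType name and the helper ReadField16 are hypothetical and exist only for this illustration, and fixed-width types are assumed so that each entry occupies 36 bytes as on the 9S12.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  char     Name[30];  /* null-terminated string */
  uint16_t life[2];   /* birth year, year died */
  uint16_t year;      /* year of inauguration */
} EntryType;

static const EntryType Presidents[3] = {
  {"George Washington", {1732, 1799}, 1789},
  {"John Adams",        {1735, 1826}, 1797},
  {"Thomas Jefferson",  {1743, 1826}, 1801}
};

/* Hypothetical helper: read a 16-bit field the way the assembly does,
   by adding n*SIZE and the field offset to the base address. */
uint16_t ReadField16(size_t n, size_t offset){
  const uint8_t *base = (const uint8_t *)Presidents;
  uint16_t value;
  memcpy(&value, base + n * sizeof(EntryType) + offset, sizeof value);
  return value;
}
```

Here ReadField16(1, offsetof(EntryType, year)) returns the same value as Presidents[1].year, and ReadField16(2, offsetof(EntryType, life) + 2) returns the same value as Presidents[2].life[1].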
Program 6.17 shows an assembly language function that prints the name of the nth president. First it calculates the address of the nth entry (Presidents + n*SIZE). In general, the next step would be to add the offset (in this case NAME is zero). This program assumes SIZE*n is less than 256, because abx adds only the 8-bit Register B to Register X.
Program 6.17 A subroutine that prints the name of a president.
;Print the name of the nth entry
;Reg A is the index n ranging from 0 to 2
OutPresident
     ldx  #Presidents ;Reg X points to the table
     ldab #SIZE       ;36 bytes in each entry
     mul              ;Reg D = SIZE*n
     abx              ;Reg X = base +SIZE*n
     jsr  OutString   ;Prints name
     rts
The table, shown in Program 6.18, contains five identically formatted structures. Each structure (e.g., PORTA) contains five entries: an 8-bit ASCII character, two pointers, and two byte values. Again, the equ pseudo-ops clarify access to the table. It could be used to write an I/O port driver, separating the high-level software from the low-level hardware.
Program 6.18 A table containing the information about some 9S12 I/O ports.
PortChar equ 0 ;ASCII character specifying port name
DataPT   equ 1 ;Pointer to port address
DirPT    equ 3 ;Pointer to direction register
InitDir  equ 5 ;8-bit initial value of direction reg
InitData equ 6 ;8-bit initial value to output
Table
PORTA fcb 'A'     ;Port A
      fdb $0000   ;Address of PortA
      fdb $0002   ;DDRA
      fcb 0,0     ;Initially input
PORTB fcb 'B'     ;Port B
      fdb $0001   ;Address of PortB
      fdb $0003   ;DDRB
      fcb $FF,$55 ;Initially output=$55
PORTJ fcb 'J'     ;Port J
      fdb $0268   ;Address of PortJ
      fdb $026A   ;DDRJ
      fcb 0,0     ;Initially input
PORTM fcb 'M'     ;Port M
      fdb $0250   ;Address of PortM
      fdb $0252   ;DDRM
      fcb 0,0     ;Initially input
PORTT fcb 'T'     ;Port T
      fdb $0240   ;Address of PortT
      fdb $0242   ;DDRT
      fcb $FF,$00 ;Initially output=$00
6.7 *Trees
A graph is a general linked structure without limitations, see Figure 6.15. An acyclic graph is a linked structure without loops. Although there may be multiple pathways to access a node in an acyclic graph, all paths have a finite length. A tree is an acyclic graph with a single root node from which a unique path to each node can be traced. The pointers in an acyclic graph do not form closed circles.
Figure 6.15 Graphs and trees have nodes and are linked with pointers. [Panels: graph, acyclic graph, tree.]
Figure 6.16 shows that an arbitrary tree can have a variable number of leaves, while a binary tree consists of nodes with exactly two pointers (i.e., links or branches). One way to implement an arbitrary tree is to place a null-terminated list of pointers in each node. For the binary tree, each node has exactly two links, and we use a null pointer to specify the link is not valid.
Checkpoint 6.15: Neglecting shortcuts and the StartMenu for now, what type of organization best describes the file structure on the Windows OS?
Checkpoint 6.16: Shortcuts and the StartMenu on the Windows OS allow for files and programs to be accessed in multiple ways. Observe the properties of a shortcut on your computer. Does Windows OS implement an acyclic graph?
Checkpoint 6.17: If you made an electronic dictionary where each word in the definition portion of an entry was linked to its definition, what type of structure would you have?
Figure 6.16 A tree can be constructed with only down arrows, and there is a unique path to each node.
[Panels: an arbitrary tree, whose nodes hold lists with 0, 1, 2, ... links, and a binary tree, whose nodes hold lists with exactly 2 links. Each node carries an Info field, and unused links are null.]
A null pointer signifies the end or leaf of the tree. Since each node of a tree has exactly one pointer to it, there is a unique path from the root to each node. One application of a tree is dictionary storage, as shown in Figure 6.17. Each word is stored as a node in the tree. The position of each word in the tree is determined from its alphabetical order. In this simple dictionary, each node contains a name that is a single letter and a value that is an 8-bit number. The binary tree is sorted by name, meaning elements alphabetically before this node can be found using the first link, and elements alphabetically after this node can be found using the second link.
Figure 6.17 A binary tree is constructed so that earlier elements are to the left and later ones to the right.
[Figure 6.17: the root is S ($88); its left son is F ($84) and its right son is V ($86); A ($8B) is the left son of F; T ($8A) is the left son of V; all other links are null.]
Program 6.19 shows the definition of the tree structure drawn in Figure 6.17. If the dictionary is static, then we can define it in ROM. If it needs to be dynamic, then it must be allocated in RAM and initialized at run time. In Program 6.19, the tree is implemented as a constant structure.
Program 6.19 Definition of a simple binary tree.
Name  equ 0       ;name of the node
Data  equ 1       ;data for this node
Left  equ 2       ;pointer to son
Right equ 4       ;pointer to son
Root  equ WS      ;Pointer to top
NULL  equ 0       ;undefined address
WS    fcb 'S',$88 ;name,data
      fdb WF      ;Left son
      fdb WV      ;Right son
WV    fcb 'V',$86
      fdb WT      ;WT is a left son
      fdb NULL    ;no right son
WT    fcb 'T',$8A
      fdb NULL    ;no children
      fdb NULL
WF    fcb 'F',$84
      fdb WA      ;WA is a left son
      fdb NULL    ;no right son
WA    fcb 'A',$8B
      fdb NULL    ;no children
      fdb NULL

#define NULL 0
const struct Node{
  unsigned char Name;
  unsigned char Data;
  const struct Node *Left;
  const struct Node *Right;};
typedef const struct Node NodeType;
typedef NodeType * NodePtr;
#define Root WS
#define WS &Tree[0]
#define WV &Tree[1]
#define WT &Tree[2]
#define WF &Tree[3]
#define WA &Tree[4]
NodeType Tree[5]={
  { 'S',0x88, WF, WV},
  { 'V',0x86, WT, NULL},
  { 'T',0x8A, NULL, NULL},
  { 'F',0x84, WA, NULL},
  { 'A',0x8B, NULL, NULL}};
Program 6.20 presents assembly and C functions that search the binary tree. To look up a word in this dictionary, one starts at the root. The following sequence is repeated until the entry is found (success) or a null pointer is reached (failure). If the current name matches, then it quits, returning the data (its definition) at that node. If the current word is not correct, then we will search left or right. If the look up word is less than the current word, go left. If the look up word is greater than the current word, go right. The program quits with a false result if the pointer becomes null.
;Inputs: Reg A = look up letter ;Outputs: Reg A=0 if not found, ; =data if found ; If fails RegY=>last link Look ldy #Root ldx 0,y ;start at root loop cpx #NULL beq fail cmpa Value,x ;Match beq found ;Skip if found blo golft leay Right,x ;letter>value ldx 0,y ;go right bra loop golft leay Left,x ;letterValue == letter){ return(pt->Data); // good } if(pt->Value < letter){ pt = pt->Right; } else{ pt = pt->Left; } } return 0; /* not in tree */ }
Program 6.20 Binary tree search functions.
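The search procedure described above can be sketched as a short, self-contained C program fragment. This is a minimal illustration for a desktop compiler, not a copy of Program 6.20: it redeclares the five-node dictionary of Figure 6.17 with fixed-width types, and it uses the field name Name for the letter stored in each node.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
  uint8_t Name;               /* single-letter name of the node */
  uint8_t Data;               /* 8-bit value for this node */
  const struct Node *Left;    /* letters alphabetically before Name */
  const struct Node *Right;   /* letters alphabetically after Name */
} NodeType;

/* The dictionary of Figure 6.17, with S at the root. */
static const NodeType Tree[5] = {
  {'S', 0x88, &Tree[3], &Tree[1]},
  {'V', 0x86, &Tree[2], NULL},
  {'T', 0x8A, NULL,     NULL},
  {'F', 0x84, &Tree[4], NULL},
  {'A', 0x8B, NULL,     NULL}
};

/* Returns the data for letter, or 0 if letter is not in the tree. */
uint8_t Look(uint8_t letter){
  const NodeType *pt = &Tree[0];     /* start at the root */
  while(pt != NULL){
    if(pt->Name == letter){
      return pt->Data;               /* found */
    }
    if(letter < pt->Name){
      pt = pt->Left;                 /* look up word is earlier, go left */
    } else {
      pt = pt->Right;                /* look up word is later, go right */
    }
  }
  return 0;                          /* null pointer reached, not found */
}
```

Because a return value of 0 signals failure, this sketch assumes that no node in the dictionary stores 0 as its data.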
In order to add and remove nodes at run time, the tree must be defined in RAM. Program 6.21 shows how to insert a new word into the dictionary. One first searches for the word (the search should fail), then changes the null pointer to point to the new node. If the search fails in the previous Look subroutine, Reg Y contains the address of the null pointer to be changed.
Program 6.21 Program to add a node to a binary tree.
; Inputs : Reg Y points to a new word to be added to the dictionary
;   the new word is already somewhere in memory formatted e.g.,
;       fcb 'J',6
;       fdb NULL
;       fdb NULL
New   pshy
      ldaa 0,Y  ;Reg A is the name of the new word
      bsr  Look
      pulx      ;RegX points to new node to add
      tsta
      bne  OK   ;skip if already defined
      stx  0,Y  ;link into existing tree
OK    rts
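The insertion idea (find the null link where the search fails, then store the address of the new node there) can be sketched in C with a pointer to a pointer, which plays the role of Reg Y. This is a minimal illustration under stated assumptions: the tree lives in RAM, and the names FindLink and Tree_Add are hypothetical, introduced only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
  uint8_t Name;          /* single-letter name of the node */
  uint8_t Data;          /* 8-bit value for this node */
  struct Node *Left;     /* letters alphabetically before Name */
  struct Node *Right;    /* letters alphabetically after Name */
} NodeType;

/* Returns the address of the link that points to the node named letter,
   or the address of the null link where such a node would be attached. */
static NodeType **FindLink(NodeType **root, uint8_t letter){
  NodeType **link = root;
  while(*link != NULL && (*link)->Name != letter){
    if(letter < (*link)->Name){
      link = &(*link)->Left;
    } else {
      link = &(*link)->Right;
    }
  }
  return link;
}

/* Adds newNode to the tree. Returns 1 if it was linked in,
   or 0 if a node with the same name is already defined. */
int Tree_Add(NodeType **root, NodeType *newNode){
  NodeType **link = FindLink(root, newNode->Name);
  if(*link != NULL){
    return 0;            /* already defined, leave the tree unchanged */
  }
  newNode->Left = NULL;  /* a new node starts as a leaf */
  newNode->Right = NULL;
  *link = newNode;       /* change the null pointer to the new node */
  return 1;
}
```

Starting from an empty tree and adding S, F, V, A, T, and then J, this sketch attaches J as the right son of F, because J comes before S and after F.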
Figure 6.18 shows the binary tree as the nodes J, U, and G are added to the dictionary. Notice that after the J and U are added, there is something inefficient about this tree of depth 4 and size 7. A binary tree of depth n is capable of holding $2^n - 1$ nodes. A binary tree is full if it has depth n and contains from $2^{n-1}$ to $2^n - 1$ nodes. The tree in Figure 6.18 after J and U are added is not full; the other three trees are full.
Figure 6.18 Nodes are added to a binary tree such that the alphabetical order is maintained.
[Figure 6.18 panels: the initial tree (S at the root, sons F and V, A under F, T under V); add J (J becomes the right son of F); add U (U becomes the right son of T); add G (G becomes the left son of J).]
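The capacity and fullness rules stated above are simple enough to check numerically. The following is a minimal C sketch, with the hypothetical function names Tree_Capacity and Tree_IsFull chosen for this illustration. It assumes depth is small enough that the shift does not overflow an unsigned long.

```c
/* Largest number of nodes a binary tree of the given depth can hold:
   2^depth - 1. */
unsigned long Tree_Capacity(unsigned depth){
  return (1UL << depth) - 1;
}

/* A tree of the given depth is full if its size lies between
   2^(depth-1) and 2^depth - 1, inclusive. Returns 1 if full, else 0. */
int Tree_IsFull(unsigned depth, unsigned long size){
  if(depth == 0){
    return 0;                           /* an empty tree has no depth */
  }
  return (size >= (1UL << (depth - 1))) &&
         (size <= Tree_Capacity(depth));
}
```

For the trees in Figure 6.18, Tree_IsFull(3,5) and Tree_IsFull(3,6) are true for the initial tree and the tree after J is added, Tree_IsFull(4,7) is false after U is added, and Tree_IsFull(4,8) is true after G is added.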
This may seem like a lot of trouble for such a simple problem. However, the search time for a binary tree increases as the log2 of the size of the dictionary (more precisely, the search time increases linearly with the depth of the tree). For a simple linear structure (e.g., table or linked list), the search time increases linearly with the dictionary size. When the dictionary is millions of words, the time savings can be extraordinary. There are similar savings in the insertion and deletion times. The dynamic efficiency (execution speed) is enhanced at the cost of static efficiency (memory storage).
Checkpoint 6.18: Consider the problem of designing a large address book where each entry has a first name, a last name, and an address field. You wish to be able to search the data base both by first name and by last name. How do you organize the structure?
6.8 Finite-State Machines with Statically Allocated Linked Structures
6.8.1 Abstraction
Software abstraction allows us to define a complex problem with a set of basic abstract principles. If we can construct our software system using these abstract building blocks, then we have a better understanding of both the problem and its solution. This is because we can separate what we are doing (policies) from the details of how we are getting it done (mechanisms). This separation also makes it easier to optimize. Abstraction provides for a proof of correct function, and simplifies both extensions and customization.
The abstraction presented in this section is the Finite-State Machine (FSM). The abstract principles of FSM development are the inputs, outputs, states, and state transitions. The FSM state graph defines the time-dependent relationship between its inputs and outputs. If we can take a complex problem and map it into a FSM model, then we can solve it with simple FSM software tools. Our FSM software implementation will be easy to understand, debug, and modify. Other examples of software abstraction include Proportional Integral Derivative digital controllers, fuzzy logic digital controllers, neural networks, and linear systems of differential equations. In each case, the problem is mapped into a well-defined model with a set of abstract yet powerful rules. Then, the software solution is a matter of implementing the rules of the model. In our case, once we prove our software correctly solves one FSM, then we can make changes to the state graph and be confident that our software solution correctly implements the new FSM.
The FSM controller employs a well-defined model or framework with which we solve our problem. The state graph will be specified using either a linked or table data structure. An important aspect of this method is to create a one-to-one mapping from the state graph into the data structure.
The three advantages of this abstraction are (1) it can be faster to develop because many of the building blocks preexist; (2) it is easier to debug (prove correct) because it separates conceptual issues from implementation; and (3) it is easier to change. In a Moore FSM, the output value depends only on the current state, and the inputs affect the state transitions. On the other hand, the outputs of a Mealy FSM depend both on the current state and the inputs. When designing a FSM, we begin by defining what constitutes a state. In a simple system like a single intersection traffic light, a state might be defined as the pattern of lights (i.e., which lights are on and which are off). In a more sophisticated traffic controller, what it means to be in a state might also include information about traffic volume at this and other adjacent intersections. The next step is to make a list of the various states in which the system might exist. As in all designs, we add outputs so the system can affect the external environment and inputs so the system can collect information about its environment or receive commands as needed.
The execution of a Moore FSM repeats this sequence over and over:
1. Perform output, which depends on the current state
2. Wait a prescribed amount of time (optional)
3. Input
4. Go to next state, which depends on the input and the current state
The execution of a Mealy FSM repeats this sequence over and over:
1. Wait a prescribed amount of time (optional)
2. Input
3. Perform output, which depends on the input and the current state
4. Go to next state, which depends on the input and the current state
There are other possible execution sequences. Therefore, it is important to document the sequence before the state graph is drawn. The high-level behavior of the system is defined by the state graph. The states are drawn as circles. Descriptive state names help explain what the machine is doing. Arrows are drawn from one state to another and labeled with the input value causing that state transition.
Observation: If the machine is such that a specific output value is necessary “to be a state”, then a Moore implementation will be more appropriate.
Observation: If the machine is such that no specific output value is necessary “to be a state”, but rather the output is required to transition the machine from one state to the next, then a Mealy implementation will be more appropriate.
A linked structure consists of multiple identically structured nodes. Each node of the linked structure defines one state. One or more of the entries in the node is a pointer (or link) to other nodes. In an embedded system, we usually use statically allocated fixed-size linked structures, which are defined at compile time and exist throughout the life of the software. In a simple embedded system, the state graph is fixed, so we can store the linked data structure in nonvolatile memory. For complex systems where the control functions change dynamically (e.g., the state graph itself varies over time), we could implement dynamically allocated linked structures, which are constructed at run time and where the number of nodes can grow and shrink in time. We can also use a table structure to define the state graph, which consists of multiple contiguous, identically structured elements. Each element of the table defines one state. One or more of the entries is an index to other elements. An important factor when implementing FSMs is that there should be a clear and one-to-one mapping between the FSM state graph and the data structure. That is, there should be one element of the structure for each state. If each state has four arrows, then each node of the linked structure should have four links.
6.8.2 Moore Finite-State Machines
In a Moore FSM, the outputs are a function of only the current state. In contrast, the outputs of a Mealy FSM are a function of both the input and the current state. Often, in a Moore FSM, the specific output pattern defines what it means to be in the current state. In the first example, the inputs and outputs are simple binary numbers read from and written to a parallel port.
Example 6.6 Design a traffic-light controller for the intersection of two equally busy one-way streets. The goal is to maximize traffic flow, minimize waiting time at a red light, and avoid accidents.
Solution The intersection has two one-way roads with the same amount of traffic: North and East, as shown in Figure 6.19. Controlling traffic is a good example, because we all know what is supposed to happen at the intersection of two busy one-way streets. We begin the design by defining what constitutes a state. In this system, a state describes which road has authority to cross the intersection. The basic idea, of course, is to prevent Southbound cars from entering the intersection at the same time as Westbound cars. In this system, the light pattern defines which road has right of way over the other. Since an output pattern to the lights is necessary to remain in a state, we will solve this system with a Moore FSM. It will have two inputs (car sensors on North and East roads) and six outputs (one for each light in the traffic signal).
Figure 6.19 Traffic light interface.
The six traffic lights are interfaced to Port T bits 7 to 2 and the two sensors are connected to Port T bits 1 to 0, such that
PT1 = 0, PT0 = 0 means no cars exist on either road
PT1 = 0, PT0 = 1 means there are cars on the East road
PT1 = 1, PT0 = 0 means there are cars on the North road
PT1 = 1, PT0 = 1 means there are cars on both roads
The next step in designing the FSM is to create some states. Again, the Moore implementation was chosen because the output pattern (which lights are on) defines which state we are in. Each state is given a symbolic name:
goN, PT7 to 2 = 100001 makes it green on North and red on East
waitN, PT7 to 2 = 100010 makes it yellow on North and red on East
goE, PT7 to 2 = 001100 makes it red on North and green on East
waitE, PT7 to 2 = 010100 makes it red on North and yellow on East
The output pattern for each state is drawn inside the state circle. The time to wait for each state is also included. How the machine operates will be dictated by the input-dependent state transitions. We create decision rules defining what to do for each possible input and for each state. For this design we can list heuristics describing how the traffic light is to operate:
If no cars are coming, we will stay in a green state, but which one doesn’t matter.
To change from green to red, we will implement a yellow light of exactly 5 seconds.
Green lights will last at least 30 seconds.
If cars are only coming in one direction, we will move to and stay green in that direction.
If cars are coming in both directions, we will cycle through all four states.
Before we draw the state graph, we need to decide on the sequence of operations.
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
a) Output to traffic lights, which depends on the state
b) Delay, which depends on the state
c) Input from sensors
d) Change states, which depends on the state and the input
We implement the heuristics by defining the state transitions, as illustrated in Figure 6.20. Instead of using a graph to define the finite-state machine, we could have used a table, as shown in Table 6.4.
Figure 6.20 Graphical form of a Moore FSM that implements a traffic light. (Each state circle shows the state name, the output, and the wait time; each arrow is labeled with the inputs that cause that transition.)
Table 6.4 Tabular form of a Moore FSM that implements a traffic light.
State \ Input        00      01      10      11
goN (100001,30)      goN     waitN   goN     waitN
waitN (100010,5)     goE     goE     goE     goE
goE (001100,30)      goE     goE     waitE   waitE
waitE (010100,5)     goN     goN     goN     goN
The next step is to map the FSM graph onto a data structure that can be stored in EEPROM. Program 6.22 uses a linked data structure, where each state is a node, and state transitions are defined as pointers to other nodes. The four Next parameters define the input-dependent state transitions. The wait times are defined in the software as fixed-point decimal numbers with units of 0.01 seconds, giving a range of 10 ms to about 10 minutes. Using good labels makes the program easier to understand; in other words, goN is more descriptive than &fsm[0]. The main program begins by specifying the Port T bits 1 and 0 to be inputs. The initial state is defined as goN. The main loop of our controller first outputs the desired light pattern to the six LEDs, waits for the specified amount of time, reads the sensor inputs from Port T, then switches to the next state depending on the input data. The timer functions were presented earlier as Program 4.5. The function Timer_Wait10ms will wait 10 ms times the parameter in RegY, and not destroy Registers D or X. We could have eliminated the two shift-left instructions by storing the data in the structure already shifted.
;Linked data structure
      org  $4000          ;Put in ROM
OUT   equ  0              ;offset for output
WAIT  equ  1              ;offset for time
NEXT  equ  3              ;offset for next
goN   fcb  $21            ;North green, East red
      fdb  3000           ;30sec
      fdb  goN,waitN,goN,waitN
waitN fcb  $22            ;North yellow, East red
      fdb  500            ;5sec
      fdb  goE,goE,goE,goE
goE   fcb  $0C            ;North red, East green
      fdb  3000           ;30 sec
      fdb  goE,goE,waitE,waitE
waitE fcb  $14            ;North red, East yellow
      fdb  500            ;5sec
      fdb  goN,goN,goN,goN
Main  lds  #$4000         ;stack init
      bsr  Timer_Init     ;enable TCNT
      ldaa #$FC           ;PT7-2 are lights
      staa DDRT           ;PT1-0 are sensors
      ldx  #goN           ;State pointer
FSM   ldab OUT,x          ;Output value
      lslb
      lslb                ;line up with 7-2
      stab PTT            ;set lights
      ldy  WAIT,x         ;Time delay
      bsr  Timer_Wait10ms
      ldab PTT            ;Read input
      andb #$03           ;just bits 1,0
      lslb                ;2 bytes/address
      abx                 ;add 0,2,4,6
      ldx  NEXT,x         ;Next state
      bra  FSM
      org  $FFFE
      fdb  Main           ;reset vector
// Linked data structure
const struct State {
  unsigned char Out;
  unsigned short Time;
  const struct State *Next[4];};
typedef const struct State STyp;
#define goN   &FSM[0]
#define waitN &FSM[1]
#define goE   &FSM[2]
#define waitE &FSM[3]
STyp FSM[4]={
 {0x21,3000,{goN,waitN,goN,waitN}},
 {0x22, 500,{goE,goE,goE,goE}},
 {0x0C,3000,{goE,goE,waitE,waitE}},
 {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  STyp *Pt;  // state pointer
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC;  // lights and sensors
  Pt = goN;
  while(1){
    PTT = Pt->Out<<2;          // output to lights on PT7-2
    Timer_Wait10ms(Pt->Time);  // wait, units of 10 ms
    Input = PTT&0x03;          // input from sensors on PT1-0
    Pt = Pt->Next[Input];      // next state
  }
}
Program 6.22 Linked data structure implementation of the traffic-light controller.
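The transitions in Table 6.4 can be checked on a desktop compiler before the program is downloaded to the 9S12. The following is a sketch under stated assumptions, not part of Program 6.22: it copies the state data but performs no port I/O and no delays, and the helper functions Traffic_Next and Traffic_PortValue are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Host-side copy of the Program 6.22 state data; no port I/O, no delays. */
typedef const struct State {
  uint8_t Out;                  /* light pattern before shifting to PT7-2 */
  uint16_t Time;                /* wait time in units of 10 ms */
  const struct State *Next[4];  /* next state for inputs 00,01,10,11 */
} STyp;

#define goN   &FSM[0]
#define waitN &FSM[1]
#define goE   &FSM[2]
#define waitE &FSM[3]

STyp FSM[4] = {
  {0x21, 3000, {goN, waitN, goN, waitN}},
  {0x22,  500, {goE, goE, goE, goE}},
  {0x0C, 3000, {goE, goE, waitE, waitE}},
  {0x14,  500, {goN, goN, goN, goN}}
};

/* Hypothetical helper: next state for a given 2-bit sensor reading. */
STyp *Traffic_Next(STyp *pt, uint8_t sensors){
  return pt->Next[sensors & 0x03];
}

/* Hypothetical helper: the byte the controller would write to PTT. */
uint8_t Traffic_PortValue(STyp *pt){
  return (uint8_t)(pt->Out << 2);
}
```

For example, Traffic_Next(goN, 0x01) returns waitN, which matches the goN row of Table 6.4 for input 01.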
6 䡲 Pointers and Data Structures
Program 6.23 implements the same traffic-light controller using a table data structure. In the linked data structure implementation, the Next parameters contained 16-bit pointers to the next state. In the table implementation, the Next parameters contain 8-bit indices specifying the index of the next state. In this machine, the Next field will be 0, 1, 2, or 3. Although each state only requires 7 bytes of storage, 8 bytes will be allocated to simplify the address calculations (it is easier to multiply by 8 than to multiply by 7).
;Table structure
      org  $4000          ;Put in ROM
OUT   equ  1              ;offset for output
WAIT  equ  2              ;offset for time
NEXT  equ  4              ;offset for next
goN   equ  0              ;North green, East red
Fsm   fdb  $21,3000       ;30sec
      fcb  goN,waitN,goN,waitN
waitN equ  1              ;North yellow, East red
      fdb  $22,500        ;5sec
      fcb  goE,goE,goE,goE
goE   equ  2              ;North red, East green
      fdb  $0C,3000       ;30 sec
      fcb  goE,goE,waitE,waitE
waitE equ  3              ;North red, East yellow
      fdb  $14,500        ;5sec
      fcb  goN,goN,goN,goN
Main  lds  #$4000         ;stack init
      bsr  Timer_Init     ;enable TCNT
      ldaa #$FC           ;PT7-2 are lights
      staa DDRT           ;PT1-0 are sensors
      ldab #goN           ;State number n
FSM   ldx  #Fsm
      tba
      lsla
      lsla
      lsla                ;8*n
      leax a,x            ;Fsm[n]
      ldaa OUT,x          ;Output value
      lsla
      lsla                ;line up with 7-2
      staa PTT            ;set lights
      ldy  WAIT,x         ;Time delay
      bsr  Timer_Wait10ms
      ldaa PTT            ;Read input
      anda #$03           ;just bits 1,0
      leax a,x            ;add 0,1,2,3
      ldab NEXT,x         ;Next state
      bra  FSM
      org  $FFFE
      fdb  Main           ;reset vector
// Table implementation
const struct State {
  unsigned char Out;
  unsigned short Time;
  unsigned char Next[4];};
typedef const struct State STyp;
#define goN   0
#define waitN 1
#define goE   2
#define waitE 3
STyp FSM[4]={
 {0x21,3000,{goN,waitN,goN,waitN}},
 {0x22, 500,{goE,goE,goE,goE}},
 {0x0C,3000,{goE,goE,waitE,waitE}},
 {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  unsigned char n;  // state number
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC;  // lights and sensors
  n = goN;
  while(1){
    PTT = FSM[n].Out<<2;          // output to lights on PT7-2
    Timer_Wait10ms(FSM[n].Time);  // wait, units of 10 ms
    Input = PTT&0x03;             // input from sensors on PT1-0
    n = FSM[n].Next[Input];       // next state
  }
}
struct State{
  unsigned short Time;    // wait in ms
  unsigned short Out[2];  // if input=0,1
  struct State *Next[2];  // if input=0,1
};
typedef struct State StateType;
typedef StateType * StatePtr;
#define Trot1 &fsm[0]
#define Trot2 &fsm[1]
#define Trot3 &fsm[2]
#define Trot4 &fsm[3]
StateType fsm[4]={
  {500,{0,0x8484},{ Trot1, Trot2}},
  {500,{0,0x2121},{ Trot2, Trot3}},
  {500,{0,0x4848},{ Trot3, Trot4}},
  {500,{0,0x1212},{ Trot4, Trot1}}
};
void main(void){
  StatePtr Pt;  // Current State
  unsigned char Input;
  Pt = Trot1;      // Initial State
  DDRA = 0xFF;     // Right legs
  DDRB = 0xFF;     // Left legs
  DDRT &= ~0x01;   // Trot switch
  Timer_Init();
  while(1){
    Input = PTT&0x01;          // 0 or 1
    PORTAB = Pt->Out[Input];   // output
    Timer_Wait1ms(Pt->Time);   // wait
    PORTAB = 0;                // motors off
    Pt = Pt->Next[Input];      // next
  }
}
Program 6.24 Mealy FSM.
6.8.4 Functional Abstraction within Finite-State Machines
In the previous examples, the input was obtained by simply reading a parallel port. Similarly, the output was performed by writing to a parallel port. However, finite-state machines can be used in systems where the input and output processes are more complex. In this section, we will develop FSMs where the input is obtained by calling a function, which returns a number to be used by the FSM controller. Similarly, the output process will involve calling a function. The use of function calls adds a layer of abstraction between the high-level FSM and the low-level I/O occurring at the ports.
Example 6.8 Design a vending machine with two outputs (soda, change) and two inputs (dime, nickel).
Solution This vending machine example illustrates additional flexibility that we can build into our FSM implementations. In particular, rather than simple digital inputs, we will create an input subroutine that returns the current values of the inputs. Similarly, rather than simple digital outputs, we will implement general functions for each state. We could have solved this particular vending machine using the approach in the previous examples, but this approach provides an alternative mechanism when the input and/or output operations become complex. Our simple vending machine has two coin sensors: one for dimes and one for nickels. When a coin falls through a slot in the front of the machine, a sensor (modeled by an SPST switch) connects 5 V to a Port A input, as in Figure 6.23. If the digital input is high (1), this means there is a coin currently falling through the slot. When a coin is inserted into the machine, the sensor goes high, then low. Because of the nature of vending machines, we will assume there cannot be both a nickel and a dime at the same time. To create the soda and change dispensers, we will interface two solenoids to Port B. The coil current of the solenoids is less than 40 mA, so we can use the 7406 open collector driver. For example, if the software makes PB0 high, waits 10 ms, then makes PB0 low, one soda will be dispensed.
Figure 6.23 A simulated vending machine interfaced to a Freescale 9S12.
We need to decide on the sequence of operations before we draw the state graph:
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
a) Call an output function, which depends on the state
b) Delay, which depends on the state
c) Call an input subroutine to get the status of the coin sensors
d) Change states, which depends on the state and the input
Figure 6.24 shows the Moore FSM that implements the vending machine. A soda costs 15 cents, and the machine accepts nickels and dimes. We have an input sensor to detect nickels (bit 0) and an input sensor to detect dimes (bit 1). We choose the wait time in each state to be 20 ms, which is smaller than the time it takes the coin to pass by the sensor. Waiting in each state will debounce the sensor, preventing multiple counting of a single event. Notice that we wait in all states, because the sensor may bounce both on touch and release. Each state also has a function to execute. The function Soda will trigger the Port B output so that a soda is dispensed. Similarly, the function Change will trigger the Port B output so that a nickel is returned. The M states refer to the amount of collected money. When we are in a W state, we have collected that much money, but we’re still waiting for the last coin to pass the sensor. For example, we start with no money in state M0. If we insert a dime, the input will go to 10₂, and our state machine will jump to state W10. We will stay in state W10 until the dime passes by the coin sensor. In particular, when the input goes to 00, we go to state M10. If we insert a second dime, the input will go to 10₂, and our state machine will jump to state W20. Again, we will stay in state W20 until this dime passes. When the input goes to 00, we go to state M20. Now we call the function Change and jump to state M15. Lastly, we call the function Soda and jump back to state M0.
Figure 6.24 This Moore FSM implements a vending machine. (Each state shows its name, wait time, and function; each arrow is labeled with the inputs that cause that transition.)
Since this is a layered system, we will begin by designing the low-level input/output functions that handle the operation of the sensors and solenoids, as in Program 6.25.
Coin_Init
      bclr DDRA,#$03      ;PA1,0 sensor in
      rts
Coin_Input                ;0 means none
      ldaa PORTA          ;1 means nickel
      anda #$03           ;2 means dime
      rts
Solenoid_Init
      bset DDRB,#$03      ;PB1,0 solenoid out
      rts
Solenoid_None
      rts
Solenoid_Soda
      bset PORTB,#$01     ;activate solenoid
      ldd  #10000
      jsr  Timer_Wait     ;10 msec
      bclr PORTB,#$01     ;deactivate
      rts
Solenoid_Change
      bset PORTB,#$02     ;activate solenoid
      ldd  #10000
      jsr  Timer_Wait     ;10 msec
      bclr PORTB,#$02     ;deactivate
      rts
void Coin_Init(void){
  DDRA &= ~0x03;       // PA1,0 sensor in
}
unsigned char Coin_Input(void){
  return PORTA&0x03;
}
void Solenoid_Init(void){
  DDRB |= 0x03;        // PB1,0 solenoid out
}
void Solenoid_None(void){
}
void Solenoid_Soda(void){
  PORTB |= 0x01;       // activate solenoid
  Timer_Wait(10000);   // 10 msec
  PORTB &= ~0x01;      // deactivate
}
void Solenoid_Change(void){
  PORTB |= 0x02;       // activate solenoid
  Timer_Wait(10000);   // 10 msec
  PORTB &= ~0x02;      // deactivate
}
Program 6.25 Low-level input/output functions for the vending machine.
The main program, Program 6.26, begins by specifying the Port A bits 1 and 0 to be inputs. The initial state is defined as M0. Our controller software first calls the function for this state, waits for the specified amount of time, reads the sensor inputs from PORTA, then switches to the next state depending on the input data. The Timer_Wait function was defined previously. Notice again the one-to-one correspondence between the state graph in Figure 6.24 and the data structure in Program 6.26.
CmdPt equ  0              ;output function
Time  equ  2              ;wait time
Next  equ  4              ;3 pointers to next
M0    fdb  Solenoid_None,20000
      fdb  M0,W5,W10
W5    fdb  Solenoid_None,20000
      fdb  M5,W5,W5
M5    fdb  Solenoid_None,20000
      fdb  M5,W10,W15
W10   fdb  Solenoid_None,20000
      fdb  M10,W10,W10
M10   fdb  Solenoid_None,20000
      fdb  M10,W15,W20
W15   fdb  Solenoid_None,20000
      fdb  M15,W15,W15
M15   fdb  Solenoid_Soda,20000
      fdb  M0,M0,M0
W20   fdb  Solenoid_None,20000
      fdb  M20,W20,W20
M20   fdb  Solenoid_Change,20000
      fdb  M15,M15,M15
main  lds  #$4000
      jsr  Coin_Init
      jsr  Solenoid_Init
      jsr  Timer_Init
      ldx  #M0            ;Initial State
loop  jsr  [CmdPt,x]      ;output
      ldd  Time,x
      jsr  Timer_Wait     ;wait
      jsr  Coin_Input     ;0,1,2
      lsla                ;0,2,4
      leax a,x
      ldx  Next,x         ;next state
      bra  loop
const struct State {
  void (*CmdPt)(void);          // output
  unsigned short Time;          // wait time
  const struct State *Next[3];};
typedef const struct State StateType;
#define M0  &fsm[0]
#define W5  &fsm[1]
#define M5  &fsm[2]
#define W10 &fsm[3]
#define M10 &fsm[4]
#define W15 &fsm[5]
#define M15 &fsm[6]
#define W20 &fsm[7]
#define M20 &fsm[8]
StateType fsm[9]={
  {&Solenoid_None,   20000,{M0,W5,W10}},     // M0
  {&Solenoid_None,   20000,{M5,W5,W5}},      // W5
  {&Solenoid_None,   20000,{M5,W10,W15}},    // M5
  {&Solenoid_None,   20000,{M10,W10,W10}},   // W10
  {&Solenoid_None,   20000,{M10,W15,W20}},   // M10
  {&Solenoid_None,   20000,{M15,W15,W15}},   // W15
  {&Solenoid_Soda,   20000,{M0,M0,M0}},      // M15
  {&Solenoid_None,   20000,{M20,W20,W20}},   // W20
  {&Solenoid_Change, 20000,{M15,M15,M15}}};  // M20
void main(void){
  StateType *Pt;
  unsigned char Input;
  Coin_Init();
  Solenoid_Init();
  Timer_Init();
  Pt = M0;  // Initial State
  while(1){
    (*Pt->CmdPt)();           // output
    Timer_Wait(Pt->Time);     // wait
    Input = Coin_Input();     // 0,1,2
    Pt = Pt->Next[Input];     // next
  }
}
Program 6.26 Vending machine controller.
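The walk-through of two dimes (M0, W10, M10, W20, M20, M15, M0) can be traced on a desktop compiler. The following is a sketch under stated assumptions rather than the controller itself: the solenoid functions are replaced by counters, the wait is skipped, the coin input comes from an array instead of PORTA, and the names SodaCount, ChangeCount, and Vend_Run are hypothetical.

```c
#include <stdint.h>

int SodaCount, ChangeCount;      /* assumed stand-ins for the two solenoids */
void Solenoid_None(void){ }
void Solenoid_Soda(void){ SodaCount++; }
void Solenoid_Change(void){ ChangeCount++; }

typedef const struct State {
  void (*CmdPt)(void);           /* output function */
  uint16_t Time;                 /* wait time, not used in this sketch */
  const struct State *Next[3];   /* next state for input 0,1,2 */
} StateType;

#define M0  &fsm[0]
#define W5  &fsm[1]
#define M5  &fsm[2]
#define W10 &fsm[3]
#define M10 &fsm[4]
#define W15 &fsm[5]
#define M15 &fsm[6]
#define W20 &fsm[7]
#define M20 &fsm[8]

StateType fsm[9] = {
  {Solenoid_None,   20000, {M0,  W5,  W10}},  /* M0  */
  {Solenoid_None,   20000, {M5,  W5,  W5 }},  /* W5  */
  {Solenoid_None,   20000, {M5,  W10, W15}},  /* M5  */
  {Solenoid_None,   20000, {M10, W10, W10}},  /* W10 */
  {Solenoid_None,   20000, {M10, W15, W20}},  /* M10 */
  {Solenoid_None,   20000, {M15, W15, W15}},  /* W15 */
  {Solenoid_Soda,   20000, {M0,  M0,  M0 }},  /* M15 */
  {Solenoid_None,   20000, {M20, W20, W20}},  /* W20 */
  {Solenoid_Change, 20000, {M15, M15, M15}}   /* M20 */
};

/* Hypothetical driver: runs n controller iterations starting at pt.
   Each entry of inputs[] must be 0 (none), 1 (nickel), or 2 (dime). */
StateType *Vend_Run(StateType *pt, const uint8_t *inputs, int n){
  for(int i = 0; i < n; i++){
    (*pt->CmdPt)();              /* output function for the current state */
    pt = pt->Next[inputs[i]];    /* next state from the simulated sensor */
  }
  return pt;
}
```

Feeding the sequence 2,2,0,2,0,0,0 (a dime that is seen twice while passing, then a second dime) should end back in M0 with one nickel of change and one soda dispensed.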
The next example involves a Mealy FSM with both the input and output processes being performed using function calls. The example also abstracts the high-level FSM from the low-level I/O.
Example 6.9 Design a robot that sits, stands, and lies down (depending on its mood, which can be OK, tired, curious, or anxious).
Solution The goal of this section is to design a robot controller, as illustrated in Figure 6.25. We begin the design by defining what constitutes a state. In this system, a state describes the position of the robot: standing, sitting, or sleeping. Since the outputs are necessary to cause a change in state, we will solve this system with a Mealy FSM. Rather than generate the output as a simple write to a port, the outputs on this robot will be defined as abstract functions, which perform a sequence of operations as needed to complete the task. The output functions are
None, it performs no movement
SitDown, assuming the robot is standing, it will perform a sequence of moves to sit down
StandUp, assuming the robot is sitting, it will perform a sequence of moves to stand up
LieDown, assuming the robot is sitting, it will perform a sequence of moves to lie down
SitUp, assuming the robot is sleeping, it will perform a sequence of moves to sit up
This robot has mood sensors, which are read and processed at the low level. There is an abstract input function, called Sensor_Input, which returns one of four possible conditions
00 OK, the robot is feeling fine
01 Tired, the robot energy levels are low
10 Curious, the robot senses activity around it
11 Anxious, the robot senses danger
Before we draw the state graph, we need to decide on the sequence of operations:
1. Initialize inputs and outputs
2. Specify initial state
3. Perform FSM controller
a) Call the Sensor_Input function to determine the current mood
b) Call the appropriate robot output function, which depends on the input and the state
c) Change states, which depends on the state and the input
Figure 6.25 Robot interface.
The outputs (which output function to call) depend on both the input and the current state. For this design, we can list heuristics describing how the robot is to operate:
If the robot is OK, we will stay in whichever state we are currently in.
If the robot’s energy levels are low (tired), it will go to sleep.
If the robot senses activity around it (curious), it will awaken from sleep.
If the robot senses danger (anxious), it will stand up.
These rules are converted into a finite-state machine graph, as shown in Figure 6.26. Each arrow specifies both an input and an output. For example, the “Tired/SitDown” arrow from the Standing state to the Sitting state means that if we are in the Standing state and the input is Tired, then we will call the SitDown function and go to the Sitting state. Mealy machines can have time delays; this example simply does not use any. The next step is to define the FSM graph using a linked data structure. Program 6.27 shows the implementation of the Mealy FSM using abstract functions to perform the input and output. Pointers to the functions are stored in the output field of the data structure. Similar to the other FSM implementations, the four Next parameters define the input-dependent state transitions.
Figure 6.26 Mealy FSM for a robot controller. (Each arrow between the Standing, Sitting, and Sleeping states is labeled Input/Output, for example Tired/SitDown.)
;Input/output defined as functions
         org  $4000       ;EEPROM
Out      equ  0           ;Pointers to functions
Next     equ  8           ;Next states
Standing fdb  None,SitDown,None,None
         fdb  Standing,Sitting,Standing,Standing
Sitting  fdb  None,LieDown,None,StandUp
         fdb  Sitting,Sleeping,Sitting,Standing
Sleeping fdb  None,None,SitUp,SitUp
         fdb  Sleeping,Sleeping,Sitting,Sitting
Main     lds  #$4000
         jsr  Robot_Init
         ldx  #Standing    ;current state
LL       jsr  Sensor_Input ;0,1,2,3
         lsla              ;0,2,4,6
         leax a,x          ;Base+2*input
         jsr  [Out,x]      ;Call output function
         ldx  Next,x
         bra  LL           ;Infinite loop
         org  $FFFE
         fdb  Main         ;reset vector
// Input/outputs defined as functions
const struct State{
  void (*CmdPt[4])(void);       // outputs
  const struct State *Next[4];  // Next
};
typedef const struct State StateType;
#define Standing &fsm[0]
#define Sitting  &fsm[1]
#define Sleeping &fsm[2]
StateType fsm[3]={
  {{&None,&SitDown,&None,&None},      //Standing
   {Standing,Sitting,Standing,Standing}},
  {{&None,&LieDown,&None,&StandUp},   //Sitting
   {Sitting,Sleeping,Sitting,Standing}},
  {{&None,&None,&SitUp,&SitUp},       //Sleeping
   {Sleeping,Sleeping,Sitting,Sitting}}
};
void main(void){
  StateType *Pt;  // Current State
  unsigned char Input;
  Robot_Init();   // initialize hardware
  Pt = Standing;  // Initial State
  while(1){
    Input = Sensor_Input();    // Input=0-3
    (*Pt->CmdPt[Input])();     // function
    Pt = Pt->Next[Input];      // next state
  }
}
Program 6.27 A Mealy FSM implemented with functional abstraction.
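One pass of the Mealy controller can be exercised without the robot hardware. In this sketch the output functions only append a letter to a log so the sequence of moves can be inspected; the log, the record helper, and Robot_Step are assumptions introduced for illustration, and the mood value is passed in rather than read with Sensor_Input.

```c
#include <stdint.h>

char Log[16];                    /* assumed trace of the moves performed */
int LogN;
static void record(char c){ if(LogN < 15){ Log[LogN++] = c; Log[LogN] = 0; } }

void None(void){ }
void SitDown(void){ record('d'); }
void StandUp(void){ record('U'); }
void LieDown(void){ record('l'); }
void SitUp(void){ record('u'); }

typedef const struct State {
  void (*CmdPt[4])(void);        /* output function for each input */
  const struct State *Next[4];   /* next state for each input */
} StateType;

#define Standing &fsm[0]
#define Sitting  &fsm[1]
#define Sleeping &fsm[2]

StateType fsm[3] = {
  {{None, SitDown, None, None},    {Standing, Sitting, Standing, Standing}},
  {{None, LieDown, None, StandUp}, {Sitting, Sleeping, Sitting, Standing}},
  {{None, None, SitUp, SitUp},     {Sleeping, Sleeping, Sitting, Sitting}}
};

/* Hypothetical single step: mood is 0=OK, 1=Tired, 2=Curious, 3=Anxious. */
StateType *Robot_Step(StateType *pt, uint8_t mood){
  mood &= 0x03;
  (*pt->CmdPt[mood])();          /* output depends on state and input */
  return pt->Next[mood];         /* next state depends on state and input */
}
```

Starting in Standing and applying Tired, Tired, OK, Anxious, Anxious walks the robot down to Sleeping and back up to Standing, one position per step.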
6.9 *Dynamically Allocated Data Structures
In order to reuse memory and provide for efficient use of RAM, we need dynamic memory allocation. The previous examples in this chapter used fixed allocation, meaning the size of the data structures is decided in advance and specified in the source code. In addition, the location of these structures is determined by the assembler at assembly time. With dynamic allocation, the size and location will be determined at run time. To implement dynamic allocation, we will manage a heap. The heap is a chunk of RAM that is
1. Dynamically allocated by the program when it creates the data structure
2. Used by the program to store information
3. Dynamically released by the program when the structure is no longer needed
The heap manager provides the system with two operations:
pt = malloc(size);  // returns a pointer to a block of size bytes
free(pt);           // deallocates the block at pt
The implementation of this general memory manager is beyond the scope of this book. Instead, we will develop a very useful, but simple, heap manager with these two operations:
pt = Heap_Allocate();  // returns a pointer to a block of fixed size
Heap_Release(pt);      // deallocates the block at pt
6.9.1 *Fixed-Block Memory Manager
In general, the heap manager allows the program to allocate a variable block size, but in this section, we will develop a simplified heap manager that handles just fixed-size blocks. In this example, the block size is specified by SIZE. The initialization will create a linked list of all the free blocks (Figure 6.27).
Figure 6.27 The initial state of the heap has all of the free blocks linked in a list. (FreePt points to the first free block, and the link in the last block is null.)
Program 6.28 shows the global structures for the heap. These entries are defined in RAM. SIZE is the number of 8-bit bytes in each block. All blocks allocated and released with this memory manager will be of this fixed size. NUM is the number of blocks to be managed. FreePt points to the first free block.
SIZE   equ 4
NUM    equ 5
NULL   equ 0
FreePt rmb 2
Heap   rmb SIZE*NUM
#define SIZE 4
#define NUM 5
#define NULL 0  // empty pointer
char *FreePt;
char Heap[SIZE*NUM];
Program 6.28 Private global structures for the fixed-block memory manager.
Initialization must be performed before the heap can be used. Program 6.29 shows the software that partitions the heap into blocks and links them together. FreePt points to a linear linked list of free blocks. Initially, these free blocks are contiguous and in order, but as the manager is used, the positions and order of the free blocks can vary. It will be the pointers that will thread the free blocks together.
Heap_Init
       ldx  #Heap
       stx  FreePt        ;FreePt=&Heap[0];
       ldab #SIZE
imLoop pshx
       puly               ;RegY = pt;
       aby                ;pt+SIZE
       sty  0,x           ;*pt=pt+SIZE;
       abx                ;pt=pt+SIZE;
       cpx  #Heap+SIZE*(NUM-1)
       bne  imLoop
       ldy  #NULL
       sty  0,x           ;*pt=NULL;
       rts
void Heap_Init(void){
  char *pt;
  FreePt = &Heap[0];
  for(pt=&Heap[0]; pt!=&Heap[SIZE*(NUM-1)]; pt=pt+SIZE){
    *(short*)pt =(short)(pt+SIZE);
  }
  *(short*)pt = NULL;
}
Program 6.29 Functions to initialize the heap.
To allocate a block, the manager just removes one block from the free list; see Program 6.30. The Heap_Allocate function will fail and return a null pointer when the heap becomes empty. The Heap_Release function returns a block to the free list. This system does not check to verify that a released block actually was previously allocated.
; returns RegX points to new block
; RegX=NULL if no more available
Heap_Allocate
       ldx  FreePt        ;pt=FreePt;
       cpx  #NULL
       beq  aDone         ;if (pt!=NULL)
       ldy  0,x
       sty  FreePt        ;FreePt=*pt;
aDone  rts
; RegX => block being released
Heap_Release
       ldy  FreePt        ;oldFreePt=FreePt;
       stx  FreePt        ;FreePt=pt;
       sty  0,x           ;*pt=oldFreePt;
       rts
void *Heap_Allocate(void){
  char *pt;
  pt = FreePt;
  if(pt != NULL){
    FreePt = (char*) *(char**)pt;
  }
  return(pt);
}
void Heap_Release(void *pt){
  char *oldFreePt;
  oldFreePt = FreePt;
  FreePt = (char*)pt;
  *(short*)pt = (short)oldFreePt;
}
Program 6.30 Functions to allocate and release memory blocks.
Checkpoint 6.20: Consider a system that needs variable-size memory allocation, where the size can range from 2 to a maximum of 20 bytes. How might this simple heap be used?
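The casts to short in Programs 6.29 and 6.30 rely on the 16-bit pointers of the 9S12. To experiment with the same free-list idea on a desktop compiler, one possible sketch is shown below. It assumes SIZE is raised so that a block can hold a full pointer, stores the link with memcpy instead of a cast, and uses the hypothetical names HeapP_Init, HeapP_Allocate, and HeapP_Release so it is not confused with the 9S12 versions.

```c
#include <stddef.h>
#include <string.h>

#define SIZE 16                  /* assumed block size; must be >= sizeof(char*) */
#define NUM  5                   /* number of blocks */
char *FreePt;                    /* first free block, or NULL when none remain */
char Heap[SIZE*NUM];

/* Link every block to the one after it; the last link is NULL. */
void HeapP_Init(void){
  FreePt = &Heap[0];
  for(int i = 0; i < NUM; i++){
    char *pt = &Heap[i*SIZE];
    char *next = (i < NUM-1) ? pt + SIZE : NULL;
    memcpy(pt, &next, sizeof next);        /* *pt = next */
  }
}

/* Remove the first block from the free list; NULL if the list is empty. */
void *HeapP_Allocate(void){
  char *pt = FreePt;
  if(pt != NULL){
    memcpy(&FreePt, pt, sizeof FreePt);    /* FreePt = *pt */
  }
  return pt;
}

/* Put a block back at the front of the free list (no validity check). */
void HeapP_Release(void *pt){
  memcpy(pt, &FreePt, sizeof FreePt);      /* *pt = old FreePt */
  FreePt = (char*)pt;
}
```

As in Program 6.30, a released block goes to the front of the list, so it is the next block handed out.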
6.9.2 *Linked List FIFO
An example application of a dynamically allocated data structure is a FIFO. In this structure, GetPt points to the oldest node (the one to get next), and PutPt points to the newest node: the place to add more data. The Next pointer of the newest node (if it exists) is null. The Fifo_Put operation fails (full) when the heap runs out of space. The Fifo_Get operation fails (empty) when GetPt equals NULL. Program 6.31 shows the global variables defined in RAM. Figure 6.28 shows an example FIFO with three elements (after running with lots of putting and getting). In this example, element 1 is the oldest because it was put first. This system uses Programs 6.28, 6.29, and 6.30 with SIZE equal to 4 bytes.
Next   equ 0              ;next
Data   equ 2              ;16-bit data for node
GetPt  rmb 2              ; GetPt is pointer to oldest node
PutPt  rmb 2              ; PutPt is pointer to newest node
struct Node{
  struct Node *Next;
  short Data;
};
typedef struct Node NodeType;
typedef NodeType *NodePtr;
NodePtr PutPt;  // place to put
NodePtr GetPt;  // place to get
Program 6.31 Definition of the linked list structure.
Figure 6.28 A linked list FIFO after putting 1,2,3.
Program 6.32 shows the three functions that implement the FIFO. Figure 6.29 is a flowchart of the Put and Get functions. The FIFO is full only when the heap has no free blocks (Heap_Allocate returns a failure). The Put operation first allocates space for the new entry, then stores the new information into the Data field. Since this element will be last, its Next field is set to null. The last part of Put links this new node at the end of the linked list. The Get function first checks to make sure the FIFO is not empty. Next, the Data field is retrieved from the node. This node is then unlinked from the linked list, and the memory block is released to the heap. There is a special case that handles the situation where you get the one remaining node in the linked list. In this case both PutPt and GetPt point to this node. When you get this node, both PutPt and GetPt are set to null, signifying the FIFO is now empty.
Fifo_Init
       ldx  #NULL
       stx  GetPt         ;GetPt=NULL
       stx  PutPt         ;PutPt=NULL
       jsr  Heap_Init
       rts
; Inputs: RegD data to put
; Outputs: V=0 if successful
;          V=1 if unsuccessful
Fifo_Put
       jsr  Heap_Allocate
       cpx  #NULL
       beq  PFul          ;skip if full
       std  Data,x        ;store data
       ldy  #NULL
       sty  Next,x        ;next=NULL
       ldy  PutPt
       cpy  #NULL         ;previously MT?
       beq  PMT
       stx  Next,y        ;link to previous
       bra  PCon
PMT    stx  GetPt         ;Now one entry
PCon   stx  PutPt         ;points to newest
       clv                ;success
       bra  PDon
PFul   sev                ;failure, full
PDon   rts
; Inputs: none
; Outputs: RegD data removed
;          V=0 if successful
;          V=1 if empty
Fifo_Get
       ldx  GetPt
       cpx  #NULL
       beq  GMT           ;empty if NULL
       ldd  Data,x        ;read
       ldy  Next,x        ;pointer to next
       sty  GetPt
       cpy  #NULL
       bne  GCon
       sty  PutPt         ;Now empty
GCon   sty  GetPt         ;points to oldest
       jsr  Heap_Release
       clv                ;success
       bra  GDon
GMT    sev                ;failure, empty
GDon   rts
void Fifo_Init(void){
  GetPt = NULL;  // Empty when null
  PutPt = NULL;
  Heap_Init();
}
int Fifo_Put(short theData){
  NodePtr pt;
  pt = (NodePtr)Heap_Allocate();
  if(!pt){
    return(0);  // full
  }
  pt->Data = theData;  // store
  pt->Next = NULL;
  if(PutPt){
    PutPt->Next = pt;  // Link
  }
  else{
    GetPt = pt;  // first one
  }
  PutPt = pt;
  return(1);  // successful
}
int Fifo_Get(short *datapt){
  NodePtr pt;
  if(!GetPt){
    return(0);  // empty
  }
  *datapt = GetPt->Data;
  pt = GetPt;
  GetPt = GetPt->Next;
  if(GetPt==NULL){  // one entry
    PutPt = NULL;
  }
  Heap_Release(pt);
  return(1);  // success
}
Program 6.32 Implementation of the linked list FIFO.
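The full and empty conditions can be observed in a short host-side experiment. The sketch below is an assumption-laden variant, not Program 6.32 itself: to stay self-contained and portable it replaces the byte heap with a typed pool of NUM nodes whose free list is threaded through the Next fields, and NUM is set to 3 only to make the full case easy to reach.

```c
#include <stddef.h>

#define NUM 3                    /* assumed pool size for this demonstration */
typedef struct Node {
  struct Node *Next;
  short Data;
} NodeType;

NodeType Pool[NUM];              /* typed pool stands in for the byte heap */
NodeType *FreePt, *PutPt, *GetPt;

void Fifo_Init(void){
  for(int i = 0; i < NUM-1; i++){
    Pool[i].Next = &Pool[i+1];   /* thread the free list through the pool */
  }
  Pool[NUM-1].Next = NULL;
  FreePt = &Pool[0];
  PutPt = NULL;                  /* empty when null */
  GetPt = NULL;
}

int Fifo_Put(short theData){
  NodeType *pt = FreePt;         /* allocate */
  if(pt == NULL) return 0;       /* full: no free nodes */
  FreePt = pt->Next;
  pt->Data = theData;
  pt->Next = NULL;               /* the newest node ends the list */
  if(PutPt) PutPt->Next = pt;    /* link after the previous newest */
  else GetPt = pt;               /* first node is also the oldest */
  PutPt = pt;
  return 1;
}

int Fifo_Get(short *datapt){
  NodeType *pt = GetPt;
  if(pt == NULL) return 0;       /* empty */
  *datapt = pt->Data;
  GetPt = pt->Next;
  if(GetPt == NULL) PutPt = NULL;  /* removed the only node */
  pt->Next = FreePt;             /* release */
  FreePt = pt;
  return 1;
}
```

With three nodes, a fourth Fifo_Put fails until one Fifo_Get has released a node, and the data come out in the order they were put.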
Figure 6.29 Flowcharts of a linked list FIFO Put and Get operations.
Checkpoint 6.21: Draw a picture like Figure 6.28 of a doubly linked list. How might this more complicated structure be more efficient than the single linked list?
6.10 *9S12 Paged Memory
16-bit pointers can only access up to 64 KiB of memory. The 9S12 uses a paged memory system to access memory beyond this 64 KiB barrier. On most of the 9S12 microcontrollers, the extended address contains 20 bits and thus can access up to 1 MiB of memory. The paged memory system is organized into a maximum of 64 pages with a fixed page size of 16 KiB. The software must first write the page number into PPAGE, which is an 8-bit register located at $0030 (only the bottom 6 bits are used). On the 9S12, addresses in the $8000 to $BFFF window invoke the paged memory system. The top 6 bits of the 20-bit extended address are retrieved from the PPAGE register, and the bottom 14 bits come from the regular 16-bit address, as shown in Figure 6.30. In particular, when the software accesses any address in the $8000 to $BFFF window, the bottom 6 bits of PPAGE are concatenated to the bottom 14 bits of the window address to create the 20-bit extended address used to access memory. This logical to physical address translation occurs automatically whenever an address in the $8000 to $BFFF window is accessed.
Figure 6.30 The address is comprised of two components. (The PPAGE register holds 0, 0, PIX5 to PIX0; the 16-bit address in the $8000 to $BFFF window supplies a13 to a0; the 20-bit address is PIX5 to PIX0 followed by a13 to a0.)
On the 9S12DP512, the full 512 KiB of flash EEPROM can only be accessed using this paged memory system. On the 9S12DP512, there are only 32 pages needed for the 512-KiB flash EEPROM. In particular, it utilizes page numbers $20 through $3F. Page $3E is actually the same as regular EEPROM at $4000 to $7FFF, and page $3F is the same as EEPROM at $C000 to $FFFF.
Observation: If the software sets and leaves PPAGE at $20 (actually any constant value from $20 to $3D), then the EEPROM behaves like a simple 48 KiB memory from $4000 to $FFFF.
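The concatenation of PPAGE and the window address can be written as a short host-side calculation. This is a minimal sketch for checking addresses by hand; the function name Extended_Address is hypothetical, and on the real 9S12 the translation is done by hardware, not by software.

```c
#include <stdint.h>

// Model of the logical to physical translation (illustrative only):
// the low 6 bits of PPAGE become the top 6 bits of the extended address,
// and the low 14 bits of the window address become the bottom 14 bits.
uint32_t Extended_Address(uint8_t ppage, uint16_t addr){
  return ((uint32_t)(ppage & 0x3F) << 14) | (uint32_t)(addr & 0x3FFF);
}
```

For example, Extended_Address(0x20, 0x8101) gives 0x80101, and Extended_Address(0x21, 0x976C) gives 0x8576C, the same values worked out for the call instruction later in this section.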
We will present two applications of paged memory. In this first application, the flash EEPROM on the 9S12DP512 will contain a 256 KiB data buffer. Because these data are located in EEPROM, we will consider them as constant and provide a function to access the data. The buffer will be accessed using a single 18-bit linear address and passed into the subroutine in registers B and X. In this example, we assume the system’s executable object code fits entirely in the 32 KiB space $4000 to $7FFF, $C000 to $FFFF. The 256 KiB buffer will be stored into 16 pages from $20 to $2F. The subroutine, shown as Program 6.33, first sets the PPAGE register to select the correct page, then reads from the $8000 to $BFFF window to retrieve the specified data.
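;****Buf_Read*******
;Read byte from buffer
;Input  B:X is 18-bit linear address
;Output A is data
Buf_Read pshx            ;save low 16 bits of addr
         xgdx
         lsld            ;shift B:X left once
         xgdx
         rolb            ;B=(addr>>15)
         xgdx
         lsld            ;shift B:X left again
         xgdx
         rolb            ;B=(addr>>14)
         addb #$20       ;buffer starts at page $20
         stab PPAGE
         puld
         anda #$3F       ;D=addr&$3FFF
         adda #$80       ;D=$8000+addr&$3FFF
         tfr  d,x        ;X=$8000+addr&$3FFF
         ldaa 0,x        ;A=data from buffer
         rts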
// Read byte from buffer
unsigned char Buf_Read(unsigned long addr){
  unsigned char *pt;
  unsigned char page;
  page = (unsigned char)(addr>>14);
  PPAGE = 0x20+(page&0x0F);
  pt = (unsigned char *)(0x8000+(addr&0x3FFF));  // window $8000 to $BFFF
  return (*pt);
}
Program 6.33 A 256 Kibibyte data buffer implemented in paged memory.
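The split of the 18-bit linear address into a page number and a window address can be checked on a host computer before running on the target. The sketch below assumes the layout described above (16 pages starting at $20, window at $8000); the PagedAddr type and the Buf_Locate name are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>

#define FIRST_PAGE  0x20u    // first of the 16 buffer pages, from the text
#define WINDOW_BASE 0x8000u  // start of the paged window

// Hypothetical helper type: where one byte of the buffer lives.
typedef struct {
  uint8_t  page;    // value to write into PPAGE
  uint16_t window;  // 16-bit address inside $8000 to $BFFF
} PagedAddr;

// Split an 18-bit linear address: bits 17 to 14 select the page,
// bits 13 to 0 select the byte within the 16 KiB window.
PagedAddr Buf_Locate(uint32_t addr){
  PagedAddr p;
  p.page   = (uint8_t)(FIRST_PAGE + ((addr >> 14) & 0x0Fu));
  p.window = (uint16_t)(WINDOW_BASE + (addr & 0x3FFFu));
  return p;
}
```

The first byte of the buffer (address 0) maps to page $20 at $8000, and the last byte (address $3FFFF) maps to page $2F at $BFFF.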
The second application implements a system with a code size of more than 48 KiB. In this system, we will partition the code into separate 16 KiB pieces. The system will be most efficient if the partitioning is done according to access probability. In other words, if module A frequently calls module B, then A and B will be placed into the same 16 KiB page. We will place the most frequently used code and the starting location into the pages $4000 to $7FFF and $C000 to $FFFF. Accessing these locations is simple and uses standard 16-bit pointers. We place the remaining code into paged memory. Subroutine calls within the same page can utilize the standard bsr and jsr instructions. To call a subroutine located in a different page, the call instruction is used. Figure 6.31 shows the stack before and after the call instruction is executed on the 9S12DP512. The call instruction pushes the old PPAGE and PC values on the stack and then loads PPAGE and PC with the address of the subroutine. When op codes are fetched from the $8000 to $BFFF window, the 6-bit PPAGE is combined with the lower 14 bits of the PC to form a 20-bit address. The translation occurs automatically in hardware.
Figure 6.31 The call instruction is used to call a subroutine in paged memory. (Before the call: PC = $8101, PPAGE = $20. After the call: PC = $976C, PPAGE = $21, and the old PPAGE $20 and the return address $8105 are on the stack.)
Consider the case where the PPAGE register equals $20 and the PC is $8101 (left picture of Figure 6.31).
PPAGE = $20 = 00100000
PC = $8101 = 1000000100000001
PPAGE + Lower 14 bits of PC = 100000+00000100000001 = $80101
After the call instruction, the PPAGE register equals $21, and the PC is $976C (right picture of Figure 6.31).
PPAGE = $21 = 00100001
PC = $976C = 1001011101101100
PPAGE + Lower 14 bits of PC = 100001+01011101101100 = $8576C
The rtc instruction will return to the program that called the subroutine. Both the PPAGE and PC values are pulled off the stack. Figure 6.32 shows the stack before and after execution of the rtc instruction.
Figure 6.32 The rtc instruction is used to return from a subroutine in paged memory. (Before the rtc: PC = $976D, PPAGE = $21, with $20, $81, $05 on the stack. After the rtc: PC = $8105, PPAGE = $20.)
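The stack activity of call and rtc can be modeled on a host computer to confirm the values in the two figures. The sketch below is an illustration under stated assumptions: the Cpu structure, the 256-byte stack memory, and the names Cpu_Call and Cpu_Rtc are hypothetical, and the byte order on the stack (old PPAGE on top, then the return address high byte and low byte) is taken from the order $20, $81, $05 shown in the figures.

```c
#include <stdint.h>

// Hypothetical host-side model of the registers involved in call and rtc.
typedef struct {
  uint16_t pc;        // program counter
  uint8_t  ppage;     // page register
  uint16_t sp;        // stack pointer, must stay within mem[]
  uint8_t  mem[256];  // small memory holding the stack, indexed by sp
} Cpu;

// call: push the return address, push the old PPAGE,
// then load PPAGE and PC with the location of the subroutine.
void Cpu_Call(Cpu *c, uint16_t ret, uint16_t sub, uint8_t page){
  c->sp -= 2;
  c->mem[c->sp]   = (uint8_t)(ret >> 8);    // return address, high byte
  c->mem[c->sp+1] = (uint8_t)(ret & 0xFF);  // return address, low byte
  c->sp -= 1;
  c->mem[c->sp] = c->ppage;                 // old page number
  c->ppage = page;
  c->pc = sub;
}

// rtc: pull the old PPAGE, then pull the return address into the PC.
void Cpu_Rtc(Cpu *c){
  c->ppage = c->mem[c->sp];
  c->sp += 1;
  c->pc = (uint16_t)((c->mem[c->sp] << 8) | c->mem[c->sp+1]);
  c->sp += 2;
}
```

Starting with PC = $8101 and PPAGE = $20, calling Cpu_Call with return address $8105, subroutine address $976C, and page $21 leaves $20, $81, $05 on the stack; a following Cpu_Rtc restores PPAGE = $20 and PC = $8105.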
Programs 6.34, 6.35, and 6.36 illustrate the use of call and rtc to create a paged memory system on the 9S12. Program 6.34 will be programmed into main EEPROM.
Program 6.34 Main memory programs for this paged memory system.
func1 equ  0           ; relative offset in paged memory
func2 equ  3           ; relative offset in paged memory
      org  $4000       ; main EEPROM memory
main  lds  #$4000      ; stack in main RAM
      clra
loop  call func1,#$21  ; call function 1 in page $21 (add 1)
      call func1,#$22  ; call function 1 in page $22 (add 3)
      call func2,#$21  ; call function 2 in page $21 (add 2)
      call func2,#$22  ; call function 2 in page $22 (add 4)
      bra  loop
Program 6.35 will be programmed into external page $21.
Program 6.35 Page $21 programs for this paged memory system.
      org  $0000   ; page $21 external memory
      lbra fun1    ; link to actual function
      lbra fun2    ; link to actual function
fun1  adda #1
      rtc
fun2  adda #2
      rtc
Program 6.36 will be programmed into external page $22.
Program 6.36 Page $22 programs for this paged memory system.
      org  $0000   ; page $22 external memory
      lbra fun1    ; link to actual function
      lbra fun2    ; link to actual function
fun1  adda #3
      rtc
fun2  adda #4
      rtc
The TExaS simulator does not support external paged memory, but it will execute the call and rtc instructions much like the regular jsr and rts instructions.
6.11 Functional Debugging
6.11.1 Instrumentation: Dump Into Array Without Filtering
As mentioned in the last chapter, one of the difficulties with print statements is that they can significantly slow down the execution speed in real-time systems. Many times the bandwidth of the print functions cannot keep pace with the existing system. For example, our system may wish to call a function 1000 times a second (or every 1 ms). If we add print statements to it that require 50 ms to perform, the presence of the print statements will significantly affect the system operation. In this situation, the print statements would be considered extremely intrusive. Another problem with print statements occurs when the system is using the same output hardware for its normal operation as is required to perform the print function. In this situation, debugger output and normal system output are intertwined. To solve both of these problems, we can add a debugger instrument that dumps strategic information into an array at run time. We can then observe the contents of the array at a later time. One of the advantages of dumping is that the 9S12 BDM debugger module allows you to visualize memory even when the program is running. So this technique will be quite useful in systems connected to a debugger. Assume happy and sad are strategic 8-bit variables. The first step when instrumenting a dump is to define a buffer in RAM to save the debugging measurements.
#define SIZE 20
unsigned char Buffer[2*SIZE];
unsigned char Cnt;

SIZE   equ 20
Buffer rmb SIZE*2
Cnt    rmb 1
Cnt will be used to index into the buffer. Cnt must be initialized to zero before the debugging begins. The debugging instrument, shown in Program 6.37, saves the strategic variables into the Buffer.
Program 6.37 Instrumentation dump.
Save  pshb
      pshx            ;save
      ldab Cnt
      cmpb #SIZE*2    ;full?
      beq  done
      ldx  #Buffer
      movb happy,B,X  ;save happy
      incb
      movb sad,B,X    ;save sad
      incb
      stab Cnt
done  pulx
      pulb
      rts

void Save(void){
  if(Cnt < SIZE*2){
    Buffer[Cnt] = happy;
    Cnt++;
    Buffer[Cnt] = sad;
    Cnt++;
  }
}
Next, you add jsr Save statements at strategic places within the system. You can either use the debugger to display the results or add software that prints the results after the program has run and stopped.
Observation: You should save registers at the beginning and restore them back at the end, so the debugging instrument itself doesn’t cause the software to crash.
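The option of printing the results after the program has stopped could look like the following host-side sketch. It repeats the buffer definitions and Save so that it compiles on its own; the Dump_Print function, its use of fprintf, and the choice of unsigned char for happy and sad are assumptions for illustration. On the target, the output would go through whatever print routines the system provides.

```c
#include <stdio.h>

#define SIZE 20
unsigned char Buffer[2*SIZE];  // saved happy,sad pairs
unsigned char Cnt;             // next free index into Buffer
unsigned char happy, sad;      // strategic variables (assumed unsigned here)

// The dump instrument: save one happy,sad pair if there is room.
void Save(void){
  if(Cnt < SIZE*2){
    Buffer[Cnt] = happy; Cnt++;
    Buffer[Cnt] = sad;   Cnt++;
  }
}

// Hypothetical helper: print each saved pair after the run has stopped.
// Returns the number of pairs printed.
int Dump_Print(FILE *out){ int i;
  for(i = 0; i+1 < Cnt; i += 2){
    fprintf(out, "%2d: happy=%3u sad=%3u\n", i/2,
            (unsigned)Buffer[i], (unsigned)Buffer[i+1]);
  }
  return Cnt/2;
}
```

Because the printing happens only after the measurements are complete, its execution time does not disturb the real-time behavior that was recorded.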
6.11.2 Instrumentation: Dump Into Array With Filtering
One problem with dumps is that they can generate a tremendous amount of information. If you suspect a certain situation is causing the error, you can add a filter to the instrument. A filter is a software/hardware condition that must be true in order to place data into the array. In this situation, if we suspect the error occurs when another variable gets large, we could add a filter that saves in the array only when the variable is above a certain value. In the example shown in Program 6.38, the instrument saves the strategic variables into the buffer only when sad is greater than 100.
Program 6.38 Instrumentation dump with filter.
Save  pshb
      pshx            ;save
      ldab sad
      cmpb #100       ;save only
      ble  done       ;when sad >100
      ldab Cnt
      cmpb #SIZE*2    ;full?
      beq  done
      ldx  #Buffer
      movb happy,B,X  ;save happy
      incb
      movb sad,B,X    ;save sad
      incb
      stab Cnt
done  pulx
      pulb
      rts

void Save(void){
  if(sad > 100){
    if(Cnt < SIZE*2){
      Buffer[Cnt] = happy;
      Cnt++;
      Buffer[Cnt] = sad;
      Cnt++;
    }
  }
}

6.12 Tutorial 6 Software Abstraction
The purpose of this tutorial is to evaluate two stepper motor interfaces. Tutor6a.rtf spins a stepper motor using the switch statement. Tutor6b.rtf spins a stepper motor using a linked structure. You first will be asked to calculate the execution speed for each example. Then, you will study its ease of modification by adding additional states to the system.
Action: Open and assemble the switch statement program Tutor6a.rtf.
Question 6.1 What is the static efficiency of the step subroutine in the Tutor6a.rtf system in ROM bytes?
Action: Run the Tutor6a.rtf system and observe the stepper motor signals.
Question 6.2 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.
Question 6.3 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.
Question 6.4 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor.
Action: Open and assemble the linked-structure program Tutor6b.rtf.
Question 6.5 What is the static efficiency of the linked structure and the step subroutine in the Tutor6b.rtf system in ROM bytes?
Action: Run the Tutor6b.rtf system and observe the stepper motor signals.
Question 6.6 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.
Question 6.7 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.
Question 6.8 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor. Comment on the differences between the two approaches.
6.13 Homework Assignments
Homework 6.1 Assume Register X contains the address $2000, Register Y contains the address $2080, Register A contains $45, and Register B contains $67. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.
      staa 40,x
      stab $40,x
      std  $66,y
      staa 25,y
      stab $FF,y
      std  $CD,x
Homework 6.2 Assume Register X contains the address $2000, and Register Y contains the address $2080. Assume memory contains the following initial values: $2000 = 0, $2001 = 1, . . . , $20FF = $FF. For each of the following instructions, specify the effective address and the resulting operation. Give all your answers in hexadecimal.
      ldaa 40,x
      ldab $40,y
      ldaa $66,x
      ldaa 25,y
      ldd  $FE,x
      ldd  $0D,y
Homework 6.3 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $02, and Register B contains $67. Assume locations $0802 and $0803 contain the 16-bit value $0A00. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.
      staa b,x
      stab -$40,y
      std  [2,x]
      stx  d,y
      stab 1,-y
      std  2,x+
Homework 6.4 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $03, and Register B contains $67. Assume locations $0804 and $0805 contain the 16-bit value $0B12. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.
      stab a,x
      staa -1,y
      std  [4,x]
      sty  d,x
      staa 1,+x
      std  2,y-
Homework 6.5 Write assembly code that adds 10 to Register X and subtracts 100 from Register Y.
Homework 6.6 Write assembly code that sets Register X equal to Register Y plus 100.
Homework 6.5 Write assembly code that adds Register D to Register X and stores the sum in Register Y.
Homework 6.7 Look up the machine code created by the following instructions. Explain the basic function of each instruction. The first one is completed.
Machine Code   Instruction   Comment
$860A          ldaa #10      RegA = 10
               ldaa 10
               ldaa 10,x
               ldaa 10,y
Homework 6.8 Look up the machine code created by the following 9S12 instructions. Explain the basic function of each instruction. The first one is completed.
Machine Code   Instruction   Comment
$A602          ldaa 2,x      RegA = [X + 2]
               ldaa -2,x
               ldaa 2,+x
               ldaa 2,x+
               ldaa 2,-x
               ldaa 2,x-
               ldaa [2,x]
Homework 6.9 Write a subroutine that converts a null-terminated string to upper case. In particular, convert all lower case ASCII characters to upper case. The original data is in RAM, so this routine overwrites the string. The calling sequence is
      ldx  #string     ; pointer to ASCII string
      jsr  UpperCase
Homework 6.10 Write a subroutine that converts a null-terminated string to lower case. In particular, convert all upper case ASCII characters to lower case. The original data is in RAM, so this routine overwrites the string. The calling sequence is
      ldx  #string     ; pointer to ASCII string
      jsr  LowerCase
Homework 6.11 Write a subroutine that compares two null-terminated strings. Register A will be 0 if the strings do not match and will be nonzero if the strings match. The calling sequence is
      ldx  #string1    ; pointer to first string
      ldy  #string2    ; pointer to second string
      jsr  StringCompare
Homework 6.12 Write a subroutine that adds two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The first array, pointed to by RegX, should be added to the second array, pointed to by RegY, and the sum placed back in the second array. Assume the data is 8-bit unsigned, and implement a ceiling operation (set result to 255) on overflow.
Homework 6.13 Write a subroutine that implements the dot product of two equal-sized arrays. The arrays contain 8-bit unsigned numbers. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is an unsigned 16-bit number in Reg D. For example, consider these two arrays:
Vector1 fcb 10,20,30   ; 3-D vector
Vector2 fcb 1,0,2      ; 3-D vector
The dot product is 10*1+20*0+30*2 = 70. The calling sequence is
      ldaa #3          ; size of arrays
      ldx  #Vector1    ; pointer to first array
      ldy  #Vector2    ; pointer to second array
      jsr  DotProduct
Homework 6.14 Write a subroutine that counts the number of characters in a string. The string is null-terminated. Register X is a call-by-reference pointer to the string. The number of characters in the string is returned in Reg B. For example, consider this string:
Name  fcc  "Valvano"
      fcb  0
The size is 7. The calling sequence is:
      ldy  #Name       ; pointer to string
      jsr  Count
Homework 6.15 Write a subroutine that finds the maximum number in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum value in the array is returned in Reg B. For example, consider this array:
Array fcb  8,-10,20,-30,40,-50,-60,-70,-80
The maximum value is 40. The calling sequence is
      ldy  #Array      ; pointer to array
      jsr  Maximum
Homework 6.16 Write a subroutine that finds the largest absolute value in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum absolute value in the array is returned in Reg B. For example, consider this array:
Array fcb  8,-10,20,-30,40,-50,-60,-70,-80
The maximum absolute value is 80. The calling sequence is
      ldy  #Array      ; pointer to array
      jsr  Maximum
Homework 6.17 Write a subroutine that compares two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is in RegB. RegB is 1 if the arrays are equal and 0 if they are different. For example, consider these two arrays containing 8-bit numbers:
Array1 fcb 10,20,30,40,50,60,70,80
Array2 fcb 10,20,30,41,50,60,70,80
These arrays are different. The calling sequence is
      ldaa #8          ; size of arrays
      ldx  #Array1     ; pointer to first array
      ldy  #Array2     ; pointer to second array
      jsr  ArrayEqual
Homework 6.18 Write a subroutine that counts the frequency of occurrence of letters in a text buffer. Register X points to a null-terminated ASCII buffer. There is a 26-element array into which the frequency data will be entered. For example, the first element of Freq will contain the number of A’s and a’s. Count only the upper case and lower case letters.
Freq  ds.w 26          ;twenty six 16-bit counters
The calling sequence is
      ldx  #buffer     ; pointer to text buffer
      jsr  CalcFreq
Homework 6.19 Write three debugging subroutines that implement a debugging array dump. Assume there are two global 16-bit variables AA and BB that are strategic to the system under test. The first subroutine initializes your system. The second subroutine saves AA, BB, and TCNT in the array. Your system should be able to support up to ten measurements. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine will display the collected data. These three subroutines will be added to the original system with the first being called at the beginning, the second placed at strategic places within the program under test, and the last one called at the end. Estimate the level of intrusiveness of this debugging process. In particular, how long does it take to call the second subroutine? These subroutines will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.
Homework 6.20 Assume we have some 6-row by 8-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 5, and the column index, J, ranges 0 ≤ J ≤ 7. Write the assembly language subroutine which accepts a pointer to the array, the I,J indices, and returns the 16-bit contents. Don’t save/restore registers.
;Inputs  RegA row index I=0,1,...,5
;        RegB column index J=0,1,...,7
;        RegX pointer to a 6 by 8 matrix
;Outputs RegD 16-bit contents at matrix[I,J]
Homework 6.21 Assume we have some 5-row by 10-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 4, and the column index, J, ranges 0 ≤ J ≤ 9. Write the assembly language subroutine which accepts a pointer to the array, the I,J indices, and returns the 16-bit contents. Don’t save/restore registers.
;Inputs  RegA row index I=0,1,...,4
;        RegB column index J=0,1,...,9
;        RegX pointer to a 5 by 10 matrix
;Outputs RegD 16-bit contents at matrix[I,J]
Homework 6.22 Consider the following table structure:
const struct theRoom{
  unsigned char windows;   // number of windows
  unsigned char doors;     // number of doors
  unsigned short size[3];  // x,y,z dimensions
};
typedef const struct theRoom roomType;
roomType Building[4]={
  { 3,2,{16,16,8}},
  { 4,1,{20,20,10}},
  { 5,3,{32,16,12}},
  { 0,1,{18,10,8}}};
a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of windows of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 2, then the number of windows will be 5.
c) Write an assembly program to return the number of doors of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 0, then the number of doors will be 2.
d) Write an assembly program to return the volume of a room. The room number is passed by value in Register A, and the result is returned by value in Register D. For example, if the room number is 1, then the volume will be 20*20*10 = 4000.
Homework 6.23 Consider the following table structure:
const struct thedesk{
  unsigned char legs;      // number of legs
  unsigned char drawers;   // number of drawers
  unsigned short size[2];  // top x,y dimensions 0.1 feet
};
typedef const struct thedesk deskType;
deskType furniture[4]={
  { 4,5,{30,50}},   // 4 legs 5 drawers
  { 4,0,{45,45}},   // square table
  { 6,7,{40,65}},
  { 4,4,{35,55}}};
a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of legs of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 2, then the number of legs will be 6.
c) Write an assembly program to return the number of drawers of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 0, then the number of drawers will be 5.
d) Write an assembly program to return the area of a desk top in square inches. The desk number is passed by value in Register A, and the result is returned by value in Register D. For example, if the desk number is 3, then the desk top area will be (35*55*144)/100 = 2772. Worry about accuracy (divide last) and overflow (use enough bits in the multiply stage to prevent overflow). You could factor the 144/100 terms to calculate (35*55*36)/25 = 2772. Your solution has to work for these four examples.
Homework 6.24 Write an assembly main program that implements this Mealy finite-state machine. The FSM data structure, shown below, is given and cannot be changed. The next state links are defined as 16-bit pointers. Each state has eight outputs and eight next-state links. The input is on Port M bits 2, 1, and 0 and the output is on Port T bits 5, 4, 3, 2, 1, and 0. There are three states (S0, S1, and S2), and the initial state is S0. Show all assembly software required to execute this machine, including the reset vector. You need not be friendly, but do initialize the direction registers. The repeating execution sequence is input, output (depends on input and current state), and next (depends on input and current state).
      org  $4000                    ;EPROM
* Finite State Machine
S0    fcb  0,0,5,6,3,9,3,0          ; Outputs for inputs 0 to 7
      fdb  S0,S0,S1,S1,S1,S2,S2,S2  ; Next states for inputs 0 to 7
S1    fcb  1,2,3,9,6,5,3,3          ; Outputs for inputs 0 to 7
      fdb  S2,S0,S0,S0,S2,S2,S2,S1  ; Next states for inputs 0 to 7
S2    fcb  1,2,3,9,6,5,3,3          ; Outputs for inputs 0 to 7
      fdb  S2,S2,S2,S2,S0,S0,S2,S1  ; Next states for inputs 0 to 7
Homework 6.25 Design a microcomputer-based controller using a linked-list finite-state machine. The system has one input and one output.
Figure Hw6.25 Electronic ignition. (Block diagram: Angle, Machine, and Spark connected to the 9S12 on PT3 and PT2. Timing diagram: Angle has a period of about 1 ms, and the Spark pulse is exactly 50 μs wide.)
The input, Angle, is a periodic signal with a frequency of about 1 kHz (has a period of about 1 ms). The output, Spark, should be a positive pulse (exactly 50 μs wide) every time Angle goes from 0 to 1. The delay between the rising edge of Angle and the start of the Spark pulse should be as short as possible. The period of Angle can vary from 1 ms to 50 ms. Since Angle is an input, you cannot control it, only respond to its rising edge.
a) Design the one input, one output finite-state machine for this system. Draw the FSM graph. Use descriptive state names (i.e., don’t call them S0, S1, S2 . . .)
b) Show the assembly code to create the statically allocated linked list. Include org statement(s) to place it in the proper location on your microcomputer.
c) Show the assembly language controller. Include ORG statement(s) to place it in the proper location on a microcomputer. Assume this is the only task that the microcomputer executes. I.e., show ALL the instructions necessary. Make the program automatically start on a RESET.
Homework 6.26 Implement the following Mealy finite-state machine using linked lists. The initial state is Stop. Do not convert the finite-state machine to an equivalent Moore machine; rather, implement it as a Mealy machine. There is no wait parameter for the states.
Figure Hw6.26 Engine controller. (State graph with states Stop (initial) and Go; edge labels 0/10, 0/00, 1/01, and 1/00.)
There is one input, Control, connected to PT0. There are two outputs: Break connected to PT2, and Gas connected to PT1. Each state has two next states and two outputs which depend on the current input. The controller continuously repeats the sequence:
Input from Control (PT0)
Output to Break,Gas (PT2 and PT1) which depends on the input Control
Next state which depends on the input Control
E.g., if the state is in Stop, and the Control is 0, then the Break output is 1 and the Gas output is 0 and the next state is Stop. Show ALL the assembly language software required to implement this machine on a single chip microcomputer. Use equ statements to clarify the data structure. Use org statements to implement the appropriate segmentation.
Homework 6.27 Write an assembly main program that implements this Moore finite-state machine. The FSM state graph, shown in Figure Hw6.27, is given and cannot be changed. The input is on Port T bits 1 and 0 and the output is on Port M bits 4, 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.
Figure Hw6.27 Finite state graph. (States happy, hungry, and sleepy; edges labeled with input values 0 to 3.)
a) Show the ROM-based FSM data structure
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . output, input, next. . . . Please make your code that accesses Port M friendly.
Homework 6.28 Write an assembly main program that implements this Mealy finite-state machine. The FSM state graph, shown in Figure Hw6.28, is given and cannot be changed. The input is on Port T bit 0 and the output is on Port M bits 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.
Figure Hw6.28 Finite state graph. (States happy, hungry, and sleepy; edge labels 0/3, 0/7, 1/2, 1/8, 0/4, and 1/3.)
a) Show the ROM-based FSM data structure
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . input, output, next. . . . Please make your code that accesses Port M friendly.
Homework 6.29 Write the Stepper_CCW subroutine as described in Example 6.1.
6.14 Laboratory Assignments
Lab 6.1 Minimally Intrusive Debugging
Purpose. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project you will run with a short time delay. After the software is debugged, you will build your hardware and run your software on the real 9S12. During this phase of the project you will run with time delays long enough so you will be able to see the LED flash (slower than 8 Hz).
Description. You will first design a system, and then add debugging instruments to prove the system is functioning properly. The system has one input switch and one output LED. The basic function of the system is to respond to the input switch, causing certain output patterns on the LED. Interface a positive logic switch to PT3. This means the PT3 signal will be 0 (low, 0V) if the switch is not pressed, and the PT3 signal will be 1 (high, 5V) if the switch is pressed. Overall functionality of this system is described in the following rules. The system starts with the LED off (make PT2 0). The system will return to the off state if the switch is not pressed (PT3 is 0). If the switch is pressed (PT3 is 1), then the LED will flash on and off at about 4 Hz. During the first phase of this lab, you will simulate these hardware circuits in TExaS using a positive logic mode for the switch and LED. During the second phase, you will interface a real switch and LED to your 9S12. When visualizing software running in real time on an actual microcomputer, it is important to use minimally intrusive debugging tools. The objective of this lab is to develop debugging methods that do not depend on the simulator. During the first phase of this lab, you will develop and test your program and debugging instruments on the TExaS simulator. In particular, you will write debugging instruments to record input and output information as your system runs in real time.
This software dump should store data into an array while it is running, and the information will be viewed at a later time. Software dumps are an effective technique when debugging software on an actual microcomputer. During the second phase of this lab, you will run your system on the real 9S12 with and without your debugging instruments.

a) Design the hardware interface of the switch and LED first in TExaS, then on the real system.

b) Write a main program that implements the input/output system. To implement the 125 ms delay, use the timer functions from Chapter 4. The basic steps for the main program are shown in Program L6.1.

        Initialize the stack pointer
        Enable interrupts for the Metrowerks debugger, cli
        Set the direction register so PT3 is an input and PT2 is an output
        Set PT2 so the LED is off
loop    delay about 125ms (any delay from 60 to 500 ms is OK)
        read the switch and go to flash if the switch is pressed
wait    Set PT2 so the LED is off
        read the switch and go to wait if the switch is not pressed
flash   toggle the LED (if on turn it off, if off turn it on)
        go to loop

  DDRT &= ~0x08;    // PT3 input
  DDRT |= 0x04;     // PT2 output
  PTT &= ~0x04;     // PT2 off
  while(1){
    Delay();        // you write this
    if((PTT&0x08)==0){
      PTT &= ~0x04; // PT2 off
      while((PTT&0x08)==0){};
    }
    PTT = PTT^0x04; // toggle
  }

Program L6.1 Program used to develop minimally intrusive debugging instruments.

c) Write two debugging subroutines that implement a dump instrument. This is called functional debugging because you are capturing input/output data of the system without information specifying when the input/output was collected. The first subroutine (Debug_Init) initializes your debugging system. The initialization should initialize a 100-byte array (start it at $3880), initializing pointers and/or counters as needed. The second subroutine (Debug_Capture) saves one data point (PT3 input data, and PT2 output data) in the array. Since there are only two bits to save, pack the information into one 8-bit value for storage and ease of visualization. For example:

Input (PT3)   Output (PT2)   Saved Data
0             0              0000,0000 binary, or $00
0             1              0000,0001 binary, or $01
1             0              0001,0000 binary, or $10
1             1              0001,0001 binary, or $11
In this way, you will be able to visualize the entire array in an efficient manner. Place a call to Debug_Init at the beginning of the system, and a call to Debug_Capture just after each time you output to PTT (there will be 3 or 4 places where your software writes to PTT). Within TExaS you can observe the debugging array using a Stack window.

The basic steps involved in designing the data structures for this debugging instrument are as follows:
  Allocate a 100-byte buffer starting at address $3880
  Allocate a 16-bit pointer, which will point to the place to save the next measurement

The basic steps involved in designing Debug_Init are as follows:
  Set all entries of the 100-byte buffer to $FF (meaning no data yet saved)
  Initialize the 16-bit pointer to the beginning of the buffer

The basic steps involved in designing Debug_Capture are as follows:
  Return immediately if the buffer is full (pointer past the end of the buffer)
  Read PTT                             data = PTT
  Mask capturing just bits 3,2         data = ((data&$08)<<1) | ((data&$04)>>2)
  Dump information into buffer         (*pt) = data
  Increment pointer to next address    pt = pt+1

Both routines should save and restore the registers they modify (except CCR), so that the original program is not affected by the execution of the debugging instruments. The temporary variable data may be implemented in a register. However, the 100-byte buffer and the 16-bit pointer, pt, should be permanently allocated in global RAM.

d) By counting cycles in the listing file, estimate the execution time of the Debug_Capture subroutine. Assuming the actual E clock speed, convert the number of cycles to time. This time will be a quantitative measure of the intrusiveness of your debugging instrument.
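The design steps above can be sketched in C before writing the assembly version. This is only an illustration under stated assumptions: PTT is simulated as an ordinary variable so the code runs on a desktop compiler, the names Debug_Buffer and Debug_Pt are hypothetical, and the buffer is not forced to address $3880 as the lab requires.

```c
#include <stdint.h>

#define DEBUG_SIZE 100   // 100-byte dump buffer, as specified in the lab

// Assumption: PTT is simulated as a plain variable; on the 9S12 it is
// the Port T I/O register.
static volatile uint8_t PTT;

// Hypothetical names; both objects are permanently allocated in global RAM
static uint8_t Debug_Buffer[DEBUG_SIZE];
static uint8_t *Debug_Pt;   // points to the place to save the next measurement

// Set every entry to $FF (no data yet saved) and point to the first entry
void Debug_Init(void){
  uint16_t i;
  for(i = 0; i < DEBUG_SIZE; i++){
    Debug_Buffer[i] = 0xFF;
  }
  Debug_Pt = Debug_Buffer;
}

// Save one data point: PT3 is moved to bit 4 and PT2 is moved to bit 0
void Debug_Capture(void){
  uint8_t data;
  if(Debug_Pt >= &Debug_Buffer[DEBUG_SIZE]){
    return;               // buffer is full, so do nothing
  }
  data = PTT;             // read the port
  data = (uint8_t)(((data & 0x08) << 1) | ((data & 0x04) >> 2));
  *Debug_Pt = data;       // dump information into the buffer
  Debug_Pt++;             // next address
}
```

For part d, if Debug_Capture takes N cycles and the E clock runs at f Hz, its execution time is N/f seconds. The cycle count of a compiled C version will generally differ from a hand-written assembly version, so the lab measurement should come from the listing file.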
Lab 6.2 Hand Assembly and Execution

Purpose. In this lab you will learn how to hand-assemble source code. During pass 1 you will create the symbol table. During pass 2 you will create the object code. Another objective is to understand how the microcomputer executes instructions. For each memory cycle during execution, you will predict the R/W line, the 16-bit address, and the 8-bit data bus.

Description. In preparation for this assignment, you should familiarize yourself with the format of the Microcomputer Programming Reference Manual. In particular, you should understand the addressing modes. You need to be able to look up op codes for each instruction. For each instruction, you need to determine the object code and CPU execution cycles. Many instructions have multiple addressing modes, and each addressing mode has its own object code and execution cycles.

a) Pretend you are pass 1 of the cross-assembler and create the symbol table for Program L6.2. Labels start in column 1. A symbol table is a list of symbols and their 16-bit unsigned values. There will be an entry in the symbol table for each label. For all labels except equ or set, the value of a symbol is the beginning address of that line. For labels with equ or set, the value of the label is the 16-bit value of the operand.

b) Pretend you are pass 2 of the cross-assembler and create the object code for Program L6.2. Include four fields for each line of assembly code:
1. The address is the 16-bit unsigned hexadecimal location of the start of this line
2. The object code is a group of 8-bit unsigned hexadecimal values
3. The number of cycles to execute this line (called Cycles in the manual)
4. The execution pattern is called Access Detail in the CPU manual

        org  $900     ; RAM
Result  rmb  2
Index   rmb  1
        org  $F800    ; EEPROM
Main    lds  #$0C00
        ldy  #data
        ldaa #2
        bsr  Sum
        std  Result
        stop
Sum     pshy
        staa Index
        ldd  #0
SLoop   addd 2,y+
        dec  Index
        bne  SLoop
        puly
        rts
        org  $FC00
data    fdb  13,9
        org  $FFFE
        fdb  Main

Program L6.2 The assembly program for Lab 6.2.
Every line has an address. Some pseudo-ops will create object code (e.g., fcb [5], fdb [6]). Since pseudo-ops are not executed, no pseudo-op will have values for the Cycles or Access Detail entries. For example, the 6812 yields

Address   Object Code(s)   Cycles   Access Detail   Source Code
$F800                                               org  $F800
$F800     B6 08 00         [3]      rOP             ldaa $0800

[5] In TExaS, the pseudo-ops fcb, dc.b, and dc are identical.
[6] In TExaS, the pseudo-ops fdb and dc.w are identical.
c) Type the source code into the system and run the cross-assembler. Please correct your part b with a red pen. Please do parts a and b on paper first, then run the machine.

d) Pretend you are the microcomputer and hand-execute this program up until the stop instruction. Perform the pseudo-execution showing the R/W, 16-bit Address, and 8-bit Data in hexadecimal for each cycle. On the 6812 the pseudo-execution will not match the actual 6812 execution. This is because the 6812 has an instruction queue and can actually fetch 16 bits at a time. TExaS does not simulate the 6812 instruction queue and will always fetch 8 bits. TExaS properly simulates the software timing on all its microcomputers. For example, TExaS will show the 6812 instruction ldaa $0800 as four pseudo cycles

R/W    Address   Data   Comment
Read   $F800     B6     fetch opcode
Read   $F801     08     fetch operand
Read   $F802     00     fetch operand
Read   $0800     xx     memory read, xx is the contents at $800

but the simulated time will be correctly incremented by 3. In fact, all timing aspects of the simulation will be accurate. Add in the comment field at the start of each instruction which instruction is being executed. e) Run this program with the simulator and verify your answers to part d. Correct any mistakes with a red pen. Please do part d on paper first, then run the machine. Lab 6.3 Profiling Purpose. The TExaS simulator provides a rich set of debugging tools, but eventually we will be asked to run programs on an actual microcomputer. The objective of this lab is to develop profiling tools that do not depend on the simulator. Even though we will still be using the simulator for this lab, these techniques can be used when debugging software on an actual microcomputer. Procedure. a) Write three debugging subroutines that implement profiling. The first subroutine (Debug_Init) initializes your system. The second subroutine (Debug_Capture) saves a profile point (time, data, and PC position) in an array. The time parameter is the current TCNT value, the data parameter is the hexadecimal value in Register D, and the PC position information can be obtained by reading the return address off the stack. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine (Debug_Display) displays the profile on the SCI/CRT interface. Be careful to save and restore registers so the original subroutine will execute. Program L6.3 shows an example application of these debugging functions. Measure the execution time of the Debug_Capture subroutine. This time will be a quantitative measure of the intrusiveness of the debugging instrument. b) In this part, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to count the number of times sqrt is called. Modify the main program so sqrt is called exactly 15 times. 
Connect unused parallel port bits to an external device that will assist in the visualization (LED, LCD, etc.). Run your instrumented system and use the external device to verify that sqrt is called 15 times. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness. c) Again, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to visualize the execution pattern within sqrt. Modify the main program so sqrt is called once with an input of Reg A = 100. Connect unused parallel port bits to a logic analyzer. Run your instrumented system and observe the execution pattern. In particular, you should see the subroutine start, visualize how many times it loops, and see it finish. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness.

Lab 6.4 MicroForth Interpreter

Purpose. In this lab, you will build a binary-tree data structure. You will design an interpreter that performs simple arithmetic operations. Your system must handle signed overflow and underflow conditions.
        org  $0800
t       rmb  1      transformed to sqrt(s)
cnt     rmb  1      loop counter
s16     rmb  2      16*input
        org  $F000
* binary fixed point squareroot, 2**-4
* Input:  Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt    psha
        clrb
        tsta
        beq  done   ; test for Input==0
        ldab #16
        mul
        std  s16    ; 16*input
        ldaa #32
        staa t      ; t=2.0, initial
        ldaa #4
        staa cnt
next    ldab t
        clra        ; RegD=t
        xgdx        ; RegX=t
        ldaa t
        tab         ; RegB=t
        mul         ; RegD=t*t
        addd s16    ; RegD=t*t+16*s
        idiv        ; RegX=(t*t+16*s)/t
        xgdx        ; RegD=(t*t+16*s)/t
        lsrd        ; RegB=((t*t+16*s)/t)/2
        adcb #0     ; round up?
        stab t      ; t=((t*t+16*s)/t)/2
        dec  cnt
        bne  next
done    pula
        rts         ; RegB=sqrt(s)
main    lds  #$0900
        clra
loop    pshx
        bsr  sqrt
check   nop
        inca
        pulx
        bne  loop
        stop
        org  $FFFE
        fdb  main

* with debugging added
        org  $0800
t       rmb  1      transformed to sqrt(s)
cnt     rmb  1      loop counter
s16     rmb  2      16*input
        org  $F000
* binary fixed point squareroot, 2**-4
* Input:  Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt    jsr  Debug_Capture
        psha
        clrb
        tsta
        beq  done   ; test for Input==0
        ldab #16
        mul
        std  s16    ; 16*input
        ldaa #32
        staa t      ; t=2.0, initial
        ldaa #4
        staa cnt
next    ldab t
        clra        ; RegD=t
        xgdx        ; RegX=t
        ldaa t
        tab         ; RegB=t
        mul         ; RegD=t*t
        addd s16    ; RegD=t*t+16*s
        idiv        ; RegX=(t*t+16*s)/t
        xgdx        ; RegD=(t*t+16*s)/t
        lsrd        ; RegB=((t*t+16*s)/t)/2
        adcb #0     ; round up?
        stab t      ; t=((t*t+16*s)/t)/2
        jsr  Debug_Capture
        dec  cnt
        bne  next
done    pula
        rts         ; RegB=sqrt(s)
main    lds  #$0900
        clra
loop    pshx
        jsr  Debug_Init
        bsr  sqrt
        jsr  Debug_Display
check   nop
        inca
        pulx
        bne  loop
        stop
        org  $FFFE
        fdb  main

Program L6.3 Profiling added to the squareroot program.
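To make the iteration in Program L6.3 easier to follow while profiling, the same calculation can be sketched in C. This is an illustrative translation, not part of the lab; the function name FixedSqrt is hypothetical. Both input and output are binary fixed-point numbers with a resolution of 2**-4, so the integer 32 represents 2.0.

```c
#include <stdint.h>

// Binary fixed-point square root, resolution 2**-4
// Input:  s is 0 to 255, representing 0 to 15.9375
// Output: t approximates sqrt(s) in the same format (0 to 64 means 0 to 4.0)
uint8_t FixedSqrt(uint8_t s){
  uint16_t s16;          // 16*input, like the variable s16
  uint16_t q;            // quotient, like RegX after idiv
  uint8_t t;             // current estimate
  uint8_t cnt;           // loop counter
  if(s == 0){
    return 0;            // test for Input==0
  }
  s16 = (uint16_t)(16u * s);
  t = 32;                // t=2.0, initial guess
  for(cnt = 4; cnt > 0; cnt--){
    q = (uint16_t)((t * t + s16) / t);   // (t*t+16*s)/t
    t = (uint8_t)((q >> 1) + (q & 1));   // divide by 2, round up on carry
  }
  return t;
}
```

For example, an input of 64 (4.0) stays at the initial guess of 32 (2.0) on every pass, while an input of 16 (1.0) moves through 20 and then settles at 16 (1.0). Each pass through the for loop corresponds to one call to Debug_Capture at the bottom of the instrumented assembly loop.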
Description. In preparation for this assignment, review binary trees, command interpreters, and the last-in-first-out queue (stack). See the simple binary interpreter in TREE.rtf (installed with TExaS). The major advantage of a binary tree structure over a linear list is the speed of lookup. In the worst case, the maximum number of compares one must do to find an entry is the maximum depth of the tree. Let size be the number of entries and depth be the maximum distance from the root to any leaf. If the binary tree is full, the maximum depth is less than or equal to the next greatest integer of log2(size). For example, a full tree with 1023 entries requires only 10 searches to find an entry. A linear search on the same 1023 entries would take on average 512 searches. In this assignment, we will have only 15 entries, but still will implement a linked-list binary tree. There are two basic approaches to binary searching: linked lists and indexed table. In the linked list, each entry contains a string called name, a pointer to the function to execute called command, and two pointers: left and right. If both left and right are null, then the node is a leaf.

root "1" push1
  left  "-1" pshm1
    left  "+" add
      left  "*" mult       (leaf, left and right are null)
      right "-" sub        (leaf)
    right "/" divide
      left  "-2" pshm2     (leaf)
      right "0" push0      (leaf)
  right "in" in
    left  "drop" drop
      left  "2" push2      (leaf)
      right "dup" dup      (leaf)
    right "out" out
      left  "mod" mod      (leaf)
      right "over" over    (leaf)

Figure L6.4a Tree structure containing the names and function addresses (depth = 4).

In this procedure, input is a string to find. We begin searching at the root.

  pt = root;
  compare input with pt->name
    input == pt->name:  execute pt->command(); success
    input <  pt->name:  pt = pt->left;
    input >  pt->name:  pt = pt->right;
  if pt != null, compare again
  if pt == null, failure

Figure L6.4b Flowchart for the interpreter.
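The search in Figure L6.4b can be sketched in C as follows. This is a minimal illustration, not the lab solution: the names Node, Tree, Stub, and Tree_Find are hypothetical, every command field points to a single placeholder function, and strcmp is assumed to provide the alphabetical (ASCII) ordering used to arrange the tree.

```c
#include <stddef.h>
#include <string.h>

typedef struct Node {
  const char *name;           // string to match
  void (*command)(void);      // function to execute
  const struct Node *left;    // names alphabetically before this one
  const struct Node *right;   // names alphabetically after this one
} Node;

// Hypothetical placeholder; the lab needs one function per command
static void Stub(void){}

// Same shape as Figure L6.4a; Tree[0] is the root
static const Node Tree[15] = {
  {"1",    Stub, &Tree[1],  &Tree[2]},
  {"-1",   Stub, &Tree[3],  &Tree[4]},
  {"in",   Stub, &Tree[5],  &Tree[6]},
  {"+",    Stub, &Tree[7],  &Tree[8]},
  {"/",    Stub, &Tree[9],  &Tree[10]},
  {"drop", Stub, &Tree[11], &Tree[12]},
  {"out",  Stub, &Tree[13], &Tree[14]},
  {"*",    Stub, NULL, NULL},
  {"-",    Stub, NULL, NULL},
  {"-2",   Stub, NULL, NULL},
  {"0",    Stub, NULL, NULL},
  {"2",    Stub, NULL, NULL},
  {"dup",  Stub, NULL, NULL},
  {"mod",  Stub, NULL, NULL},
  {"over", Stub, NULL, NULL}
};

// Returns the matching node, or NULL if the name is not in the tree
const Node *Tree_Find(const char *input){
  const Node *pt = &Tree[0];          // begin searching at the root
  while(pt != NULL){
    int cmp = strcmp(input, pt->name);
    if(cmp == 0){
      return pt;                      // success
    }
    pt = (cmp < 0) ? pt->left : pt->right;
  }
  return NULL;                        // failure
}
```

A caller would execute the command with something like `pt->command();` after checking that Tree_Find did not return NULL. With 15 entries, no lookup needs more than four string compares.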
If the input is less than the name of the current node (pt->name) (alphabetically before) then the search will go left (pt = pt->left). If the input is greater than the name of the current node (pt->name) (alphabetically after) then the search will go right (pt = pt->right). The second approach (which you will not be implementing, but is included for your consideration) is called an indexed table. In this scheme we start numbering at index 1. The table must be sorted alphabetically. Rather than storing the pointers explicitly as we did in the previous example, notice how the index number when viewed in binary provides the same information. If the size is not exactly a power of two, we must allocate additional entries and place them alphabetically at the beginning or the end.
Table

Index   Binary   Name     Command
1       0001     "*"      mult
2       0010     "+"      add
3       0011     "-"      sub
4       0100     "-1"     pshm1
5       0101     "-2"     pshm2
6       0110     "/"      divide
7       0111     "0"      push0
8       1000     "1"      push1
9       1001     "2"      push2
10      1010     "drop"   drop
11      1011     "dup"    dup
12      1100     "in"     in
13      1101     "mod"    mod
14      1110     "out"    out
15      1111     "over"   over

Figure L6.4c Finite state graph.
Again, input is the string that is used to match the name field of the table. This is also a binary search because the number of tests will be less than or equal to the next greatest integer of log2 size.
  I = 0; mask = 0x08;
  loop: I = I+mask;
  compare input with Table[I].name
    input == Table[I].name:  execute Table[I].command(); success
    input <  Table[I].name:  I = I-mask;
    input >  Table[I].name:  (I is unchanged)
  mask = mask>>1;
  if mask > 0, go to loop
  if mask == 0, failure

Figure L6.4d Flowchart for the indexed table interpreter.
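The indexed table search of Figure L6.4d can be sketched in C as shown below. This is an illustration under stated assumptions: only the name field is stored (the command field is omitted for brevity), the names Table and Table_Find are hypothetical, and entry 0 is an unused placeholder because numbering starts at index 1.

```c
#include <stdint.h>
#include <string.h>

// Sorted alphabetically (ASCII order); entry 0 is unused
static const char * const Table[16] = {
  "", "*", "+", "-", "-1", "-2", "/", "0", "1", "2",
  "drop", "dup", "in", "mod", "out", "over"
};

// Returns the index (1 to 15) of the matching entry, or 0 on failure
uint8_t Table_Find(const char *input){
  uint8_t i = 0;
  uint8_t mask;
  for(mask = 0x08; mask > 0; mask >>= 1){
    int cmp;
    i = (uint8_t)(i + mask);          // try setting this bit of the index
    cmp = strcmp(input, Table[i]);
    if(cmp == 0){
      return i;                       // success
    }
    if(cmp < 0){
      i = (uint8_t)(i - mask);        // too far, clear the bit again
    }
  }
  return 0;                           // mask reached 0, failure
}
```

Each pass decides one bit of the index, from most significant to least significant, which is why the binary form of the index carries the same information as the left and right pointers of the linked-list tree.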
The linked-list lookup will be a little faster to execute, because it is quicker to access pt->name than it is to access Table[I].name. On the other hand, it is easy to make minor changes in the indexed table. If the space is already allocated, then adding a node at run time involves shifting the entries down so that the list remains alphabetical. Deleting a node simply involves shifting the nodes up. The only disadvantage is the size cannot increase so that it exceeds the next power of two. The following table lists the 15 commands your FORTH interpreter will execute. Your software system will have two stacks. The return stack, pointed to by SP, will contain return addresses for the usual jsr rts subroutine call functions. The data stack, pointed to by RegY, will contain the input/output parameters for the functions. Commands will be separated by returns (ASCII 13). The idea is to input an entire line using InString, then look up the command in the tree, and if it is found, execute the function. You should display (without popping) the top data stack entry in the LCD display.

Command   Function
in        Input 8-bit signed number from CRT keyboard, push on data stack
out       Pop from data stack and output 8-bit signed number to CRT display
dup       Duplicates top of data stack
over      Duplicates next to top of data stack
drop      Pop and discard top of data stack
+         Pops two numbers from data stack, add, push result on data stack
-         Pops two numbers from data stack, subtract, push result on data stack
*         Pops two numbers from data stack, multiply, push result on data stack
/         Pops two numbers from data stack, divide, push quotient on data stack
mod       Pops two numbers from data stack, divide, push remainder on data stack
0         Pushes the constant 0 on the stack
1         Pushes the constant 1 on the stack
2         Pushes the constant 2 on the stack
-1        Pushes the constant -1 on the stack
-2        Pushes the constant -2 on the stack
We will create 10 bytes of space for the data stack (Y) separate from the hardware stack (SP). Register Y will always point into this space. You must explicitly test for data stack overflow and underflow. You must also implement ceiling and floor handling during the addition, subtraction, multiply, divide, and modulo functions.

datastack    rmb 8
penultimate  rmb 1
ultimate     rmb 1
bottom       rmb 0

Figure L6.4e Data and return stacks. (Return stack: SP points to the top of stack, with the subroutine return addresses below it and the free area above it. Data stack: the free area starts at datastack, Y points to the top of stack, and the valid data extends down through antepenultimate, penultimate, and ultimate to bottom.)
Notice that we can determine how many bytes are on the data stack by comparing Y to the fixed addresses:

If Y Equals    This Many Bytes Are on the Data Stack
bottom         None (empty)
ultimate       One
penultimate    Two
datastack      Ten (full)
Use reverse-polish format for subtraction and division. E.g., 2 1 - is 1, and 2 1 / is 2. The usual stack rules apply to this data stack as well.
1. Stack accesses (PUSH or PULL) should not be performed outside the allocated area.
2. Stack reads should not be performed from the free area.
3. Stack PUSH should first decrement Y, then store the data (not vice versa).
4. Stack PULL should first read the data, then increment Y (not vice versa).
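The four stack rules, together with the overflow, underflow, ceiling, and floor checks, can be sketched in C as shown below. This is only an illustration: the names DataStack, Data_Init, Data_Push, Data_Pop, and Data_Mult are hypothetical, and the pointer Y stands in for Register Y.

```c
#include <stdint.h>

#define STACK_SIZE 10                 // 10 bytes of data stack space
static int8_t DataStack[STACK_SIZE];
static int8_t *Y;                     // plays the role of Register Y

// Empty stack: Y points just past the allocated area (the label bottom)
void Data_Init(void){
  Y = &DataStack[STACK_SIZE];
}

// Returns 1 on success, 0 if the stack is already full (overflow)
int Data_Push(int8_t data){
  if(Y == &DataStack[0]){
    return 0;
  }
  Y--;                                // first decrement Y
  *Y = data;                          // then store the data
  return 1;
}

// Returns 1 on success, 0 if the stack is empty (underflow)
int Data_Pop(int8_t *data){
  if(Y == &DataStack[STACK_SIZE]){
    return 0;
  }
  *data = *Y;                         // first read the data
  Y++;                                // then increment Y
  return 1;
}

// Multiply the top two entries with ceiling and floor handling
int Data_Mult(void){
  int8_t a, b;
  int16_t product;
  if(Y > &DataStack[STACK_SIZE - 2]){
    return 0;                         // fewer than two elements
  }
  (void)Data_Pop(&a);
  (void)Data_Pop(&b);
  product = (int16_t)(a * b);         // -16256 to 16384 fits in 16 bits
  if(product > 127)  product = 127;   // ceiling
  if(product < -128) product = -128;  // floor
  return Data_Push((int8_t)product);
}
```

Checking the pointer before every access keeps all reads and writes inside the allocated area, which is what rules 1 and 2 require.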
Here are a couple of the routines to get you started:

* duplicate next to top
over    cpy  #penultimate  check for at least 2 elements
        bhi  overend       skip if no data to duplicate
        cpy  #datastack    check for full
        bls  overend       skip if stack is already full
        ldaa 1,y           copy of next to top
        staa 1,-y          push on data stack
overend rts
* push a 2 on the data stack
push2   cpy  #datastack    check for full
        bls  psh2end       skip if stack is already full
        movb #2,1,-y       push 2 on data stack
psh2end rts
* multiply top two entries
mult    cpy  #penultimate  check for at least 2 elements
        bhi  overend       skip if fewer than 2 elements
        ldaa 1,y+          pop top of stack
        sex  a,x           X=multiplicand (-128 to +127)
        ldaa 1,y+          pop next to top
        sex  a,d           D=multiplicand (-128 to +127)
        exg  x,y           Y,D are multiplicands (X is stack pt)
        emuls              D=product (no overflow possible in 16-bit D)
        exg  x,y           Y is stack pt again, -16256 <= D <= 16384
        cpd  #127
        bgt  ceiling
        cpd  #-128
        bge  ok
floor   ldab #-128         since D<-128, set B=-128
        bra  ok
ceiling ldab #127          since D>127, set B=127
ok      stab 1,-y          push result
        rts

a) One by one, write and debug the 15 individual commands. Use stabilization to test each routine.

b) Design the fixed binary tree containing the names and function addresses for all your commands. This structure will exist in EEPROM and cannot be modified unless the source code is edited and the program reassembled. Note, most FORTH interpreters place the binary tree
in RAM and allow commands to be added and subtracted at run time. Use binding (equ) to make the program more readable.

c) Write the main program that interprets input from the CRT keyboard and displays output back to the CRT display. Remember to display the top of the data stack on the LCD display.

Lab 6.5 Traffic Light Controller

Purpose. This lab has these major objectives: using linked-list data structures, creating a segmented software system, and performing real-time synchronization by designing an input-directed traffic light controller. In preparation for this assignment, review finite-state machines, linked lists, and memory allocation. You should also run and analyze the linked-list controllers found in example files moore.rtf and mealy.rtf.

Description. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project, you will run with a fast TCNT clock (TSCR2 = 0). After the software is debugged, you will interface actual lights and switches to the 9S12 and run your software on the real 9S12. During this phase of the project you will run with a slow TCNT clock (TSCR2 = $07). As you have experienced, the simulator requires more actual time to simulate one cycle of the microcomputer. On the other hand, the correct simulation time is maintained in the TCNT register, which is incremented every cycle of simulation time. The simulator speed depends on the amount of information it needs to update into the windows. Unfortunately, even with the least amount of window updates, it would take a long time for the simulator to process the typical 3 minutes it might take for a "real" car to pass through a "real" traffic intersection. Consequently, the cars in this traffic intersection travel much faster than "real" cars. In other words, you are encouraged to adjust the time delays so that the operation of your machine is convenient for you to debug and for the TA to observe during demonstration.
You will create a segmented software system putting global variables into RAM, local variables into RAM, constants and fixed data structures into EEPROM, and program object code into EEPROM. Most microcontrollers have a rich set of timer functions. For this lab, you will need the ability to wait a prescribed amount of time. In general, cycle-counting (simple for loops) has the problem of conditional branches and data-dependent execution times. If an interrupt were to occur during a cycle-counting delay, then the delay would be inaccurate using the cycle-counting method. Using the TCNT timer, however, the timing will be very accurate, even if an interrupt were to occur while the microcomputer was waiting. In more sophisticated systems, other timer modes provide even more flexible mechanisms for microcomputer synchronization. A linked list solution may not run the fastest or occupy the fewest memory bytes, but it is a structured technique that is easy to understand, easy to implement, easy to debug, and easy to upgrade.

Consider a typical four-corner intersection as shown in Figure L6.5. The two one-way streets are labeled South (cars travel North) and West (cars travel East). There are three inputs to your 9S12: two are car sensors, and one is a walk button. The South sensor will be true (1) if one or more cars are near the South intersection. Similarly, the West sensor will be true (1) if one or more cars are near the West intersection. The Walk sensor will be true (1) if a pedestrian wishes to cross in any direction. There are eight outputs from your microcomputer that control the two Red/Yellow/Green traffic lights and the two walk/don't walk lights. The simulator allows you to attach binary switches to simulate the three inputs and LED lights to simulate the eight outputs.

Figure L6.5 Traffic light intersection. (Inputs: South sensor, West sensor, and Walk button. Outputs: R, Y, G lights for South, R, Y, G lights for West, a red Don't walk light, and a green Walk light.)

Traffic should not be allowed to crash. I.e., there should not be a green or yellow on South at the same time there is a green or yellow on West. You should exercise common sense when assigning the length of time that the traffic light will spend in each state, so that the simulated system changes at a speed convenient for the TA (stuff changes fast enough so the TA doesn't get bored, but not so fast that the TA can't see what is happening). Cars should not be allowed to hit the pedestrians. The walk sequence should be realistic (walk, flashing don't, continuous don't). Your system should consider both the average and worst-case waiting time. You may assume the two car sensors remain active for as long as service is required. On the other hand, the walk button may be pushed and released, and the system must remember the walk has been requested.

a) Build an I/O system in TExaS with the appropriate names and colors on the lights and switches. Think about which ports you will be using in part d so that you simulate the exact system you will eventually build.

b) Design a finite-state machine that implements a good traffic-light system. Include a graphical picture of your finite-state machine showing the various states, inputs, outputs, wait times, and transitions. Remember the wait function will return input data collected while it is waiting.

c) Write the assembly code that implements the traffic-light control system. There is no single, "best" way to implement your traffic light. However, your scheme must be segmented into RAM/EEPROM, and you must use a linked-list data structure.
There should be a one-to-one mapping from the FSM states to the linked list elements. A "good" solution has about 10 to 20 states in the finite-state machine and provides for input dependence. Try not to focus on the civil engineering issues. Rather, build a quality computer engineering solution that is easy to understand and easy to change. Do something reasonable, and have 10 to 20 states. A good solution has
1. One-to-one mapping between state graph and data structure
2. No conditional branches in program
3. The state graph defines exactly what it does in a clear and unambiguous fashion
4. The format of each state is the same
5. Good names and labels
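The properties listed above can be illustrated with a small linked-list FSM in C. This is a deliberately reduced sketch, not a lab solution: it has only four states, no walk button, and hypothetical names (State, Fsm, Fsm_Step). The output encoding (bits 5-3 are West R, Y, G and bits 2-0 are South R, Y, G), the input encoding (bit 1 is the South sensor and bit 0 is the West sensor), and the wait times are all assumptions made for the example.

```c
#include <stdint.h>

typedef struct State {
  uint8_t out;                     // light pattern (assumed encoding)
  uint16_t wait;                   // time in 10 ms units
  const struct State *next[4];     // next state for inputs 0 to 3
} State;

#define goS   (&Fsm[0])            // South green,  West red
#define waitS (&Fsm[1])            // South yellow, West red
#define goW   (&Fsm[2])            // West green,   South red
#define waitW (&Fsm[3])            // West yellow,  South red

// Every state has the same format, one array entry per state graph node
static const State Fsm[4] = {
  {0x21, 300, {goS,   waitS, goS,   waitS}},   // goS
  {0x22,  50, {goW,   goW,   goW,   goW  }},   // waitS
  {0x0C, 300, {goW,   goW,   waitW, waitW}},   // goW
  {0x14,  50, {goS,   goS,   goS,   goS  }}    // waitW
};

// The next state comes from a table lookup, not from conditional branches.
// A real controller repeats: output pt->out, wait pt->wait, read input, step.
const State *Fsm_Step(const State *pt, uint8_t input){
  return pt->next[input & 0x03];
}
```

Changing a wait time, a transition arrow, or the initial state only changes the Fsm table. Fsm_Step and the surrounding loop stay the same, which is the benefit the one-to-one mapping is meant to provide.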
Typically in real applications using an embedded system, we put the executable instructions and the finite-state machine linked-list data structure into the nonvolatile memory (flash EEPROM). A good implementation will allow minor changes to the finite-state machine (adding states, modifying times, removing states, moving transition arrows, and changing the initial state) simply by changing the linked list controller, without changing the executable instructions. Making changes to executable code requires you to debug/verify the system again. If there is a one-to-one mapping from FSM to linked-list data structure, then if we just change the state graph and follow the one-to-one mapping, we can be confident our new system still operates properly. Obviously, if we add another input sensor or output light, it may be necessary to update the executable part of the software and re-assemble. During the debugging phase with the TExaS simulator, you can run with a fast TCNT clock (TSCR2 = $00).

d) After the software has been debugged on the simulator, you will implement it on the real board. The first step is to interface three pushbutton switches for the sensors. Do not place or remove wires on the protoboard while the power is on. Build the switch circuits and test the voltages using a digital voltmeter. You can also use the debugger to observe the input pin to verify the proper operation of the interface. The next step is to build six LED output circuits. You can use the two LEDs on the docking module (PT1, PT0) in addition to the six external LEDs you will build on your protoboard. Look up the pin assignments in the 7406 data sheet. Be sure to connect 5 V power to pin 14 and ground to pin 7. You can use the debugger to set the direction register to output. Then, you can set the output high and low, and measure the three voltages (input to 7406, output from 7406 which is the LED cathode voltage, and the LED anode voltage).

e) Debug your combined hardware/software system on the actual 9S12 board. When using the real 9S12, you should run with a slow TCNT clock (TSCR2 = $07). An interesting question that may be asked during checkout is how you could experimentally prove your system works. In other words, what data should be collected and how would you collect it?
7 Local Variables and Parameter Passing

Chapter 7 objectives are to:
• Explain how to implement local variables on the stack
• Show how various C compilers implement local variables and pass parameters
• Compare and contrast call-by-value versus call-by-reference parameter passing
Variables are an important component of software design, and there are many factors to consider when creating variables. Some of the obvious considerations are the size and format of the data. Another factor is the scope of a variable. The scope of a variable defines which software modules can access the data. Variables with an access that is restricted to one software module are classified as private, and variables shared between multiple modules are public. In general, a system is easier to design (because the modules are smaller and simpler), easier to change (because code can be reused), and easier to verify (because interactions between modules are well-defined) when we limit the scope of our variables. However, since modules are not completely independent, we need a mechanism to transfer information from one to another. In this chapter, we will develop parameter passing methodologies. Because their contents are allowed to change, all variables must be allocated in RAM and not ROM. On the one hand, global variables contain information that is permanent and are usually assigned a fixed location in RAM. On the other hand, local variables contain temporary information and are stored in a register or allocated on the stack. One of the important objectives of this chapter is to present design steps for creating, using, and destroying local variables on the stack. In summary, there are three types of variables: public globals (shared permanent), private globals (unshared permanent), and private locals (unshared temporary). Because there is no appropriate way to create a public local variable, we usually refer to private local variables simply as local variables, and the fact that they are private is understood.
7.1 Local Versus Global

A local variable contains temporary information. Since we will implement local variables on the stack or in registers, this information cannot be shared with other software modules. Therefore, under most situations, we can further classify these variables as private. Local variables are allocated, used, then deallocated, in this specific order. For speed reasons, we wish to assign local variables to registers. When we assign a local variable to a register, we can do so in a formal manner. There will be a certain line in the assembly software at which the register begins to contain the variable (allocation), followed by lines where the register contains the information (access or usage), and a certain line in the software after which the register no longer contains the information (deallocation). As an example, consider the register allocation used in a finite-state machine controller, shown earlier as Program 6.22, and again here as Program 7.1. Register B is allocated for holding the Output value in Line 6, used in Lines 6 through 9, then deallocated, such that after Line 9, Register B can be used for other purposes. Registers B and Y are used in this program to temporarily hold information, and hence are classified as local variables. Contrast this to how Register X is used. This is a VERY simple program, and as such, the usage of Register X is unusual. This main program assigns Register X to hold the state pointer (Pt) in Line 5. From that point in time, Register X always contains Pt, and hence we classify this assignment of Register X as global (meaning permanent). It is appropriate to assign a register as a global only in the most simple situations (e.g., less than a 20-line program with no interrupts).

Program 7.1 Register assignments in a finite-state machine controller.
Line  Program                     Register B  Register X  Register Y
 1    Main  lds  #$4000
 2          bsr  Timer_Init
 3          ldab #$FC             $FC
 4          stab DDRT
 5          ldx  #goN                         Pt
 6    FSM   ldab OUT,x            Output      Pt
 7          lslb                  Output      Pt
 8          lslb                  Output      Pt
 9          stab PTT              Output      Pt
10          ldy  WAIT,x                       Pt          Wait
11          bsr  Timer_Wait10ms               Pt          Wait
12          ldab PTT              Input       Pt
13          andb #$03             Input       Pt
14          lslb                  Input       Pt
15          abx                   Input       Pt
16          ldx  NEXT,x                       Pt
17          bra  FSM                          Pt
The information stored in a local variable is not permanent. This means if we store a value into a local variable during one execution of the module, the next time that module is executed the previous value is not available. Examples include loop counters and temporary sums. We use a local variable to store data that is temporary in nature. We can implement a local variable using the stack or registers. Some reasons why we choose local variables over global variables include:
■ Dynamic allocation/release allows for reuse of RAM memory.
■ Limited scope of access (making it private) provides for data protection; only the program that created the local variable can access it.
■ Since an interrupt will save registers and create its own stack frame, the code is reentrant.
■ Since absolute addressing is not used, the code is relocatable.
Some reasons why we place local variables on the stack rather than using registers include:
■ We can use symbolic names for the local variables, making it easier to understand.
■ The number of variables is only limited by the size of the stack, which is more than registers.
■ Because it is more general, it will be easier to add additional variables in the future.
7 ■ Local Variables and Parameter Passing
Checkpoint 7.1: How do you create a local variable in C?
A global variable is allocated at a permanent and fixed location in RAM. A public global variable contains information that is shared by more than one program module. We must use global variables to pass data between the main program (i.e., foreground thread) and an ISR (i.e., background thread). If a function called from the foreground belongs to the same module as the ISR, then a global variable used to pass data between the function and the ISR is classified as a private global (assuming software outside the module does not directly access the data). Global variables are allocated at assembly time and never deallocated. Allocation of a global variable means the assembler assigns the variable a fixed location in RAM. The information they store is permanent. Examples include time of day, date, calibration tables, user name, temperature, FIFO queues, and message boards. We use absolute addressing (direct or extended) to access their information. When dealing with complex data structures like the ones presented in Chapter 6, pointers to the data structures are shared. In general, it is a poor design practice to employ public global variables. On the other hand, private global variables are necessary to store information that is permanent in nature.
Observation: Sometimes we store temporary information in global variables because it is easier to observe the contents using the debugger. This usage is appropriate during the early stages of development, but once the module is tested, temporary information should be converted to local, and the system should be tested again.
Checkpoint 7.2: How do you create a global variable in C?
In C, a static local has permanent allocation, which means it maintains its value from one call to the next. It is still local in scope, meaning it is only accessible from within the function. That is, modifying a local variable with static changes its allocation (it is now permanent), but doesn't change its scope (it is still private). In the following example, count contains the number of times MyFunction is called. The initialization of a static local occurs just once, during startup.
  void MyFunction(void){
    static short count=0;
    count++;
  }
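The difference between a static local and an ordinary local can be observed with a small host-side sketch. The function names CountCalls and Scratch are hypothetical and are used only for illustration; the sketch assumes a standard C compiler rather than any particular 9S12 tool chain.

```c
// Hypothetical example: CountCalls and Scratch are illustration names only.
// count is a static local: permanent allocation, private scope.
short CountCalls(void){
  static short count = 0;   // initialized once, keeps its value between calls
  count++;
  return count;
}
// temp is an ordinary local: allocated and initialized on every call.
short Scratch(void){
  short temp = 0;
  temp++;
  return temp;
}
```

Calling CountCalls three times returns 1, 2, and 3, whereas every call to Scratch returns 1, because temp does not survive from one call to the next.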
In C, we create a private global variable using the static modifier. Modifying a global variable with static does not change its allocation (it is still permanent), but does reduce its scope. Regular globals can be accessed from any function in the system (public), whereas a static global can be accessed only by functions within the same file. Static globals are private. Functions can be static also, meaning they can be called only from other functions in the file. E.g.,
  static short myPrivateGlobalVariable; // accessible by this file only
  static void MyPrivateFunction(void){
  }
In C, a const global is read-only. It is allocated in the ROM portion of memory. Constants, of course, must be initialized at compile time. E.g.,
  const short Slope=21;
  const unsigned char SinTable[8]={0,50,98,142,180,212,236,250};
Common Error: If you leave off the const modifier in the SinTable example, the table will be allocated twice: once in ROM containing the initial values and once in RAM containing data to be used at run time. Upon startup, the system copies the ROM-version into the RAM-version.
Maintenance Tip: It is good practice to specify whether an assembly variable is signed or unsigned in the comments. If the information has units (e.g., volts, seconds, etc.) this should be included also.
7.2 Stack Rules
In the last section, we discussed the important issue of global versus local variables. One of the more flexible means to create local variables will be the stack. In this section, we define a set of rules for proper use of the stack. A last-in-first-out (LIFO) stack is implemented in hardware by most computers. The stack can be used for local variables (temporary storage), saving return addresses during subroutine calls, passing parameters to subroutines, and saving registers during the processing of an interrupt. The first advantage of placing local variables on the stack is that the storage can be dynamically allocated before usage and deallocated after usage. The second advantage is the facilitation of reentrant software. The stack pointer (SP) on the 9S12 points to the top entry of the stack, as shown in Figure 7.1. If it exists, we define the data immediately below the top (larger memory address) as next to top. To push a byte on the stack, we first decrement the stack pointer (SP), then we store the byte at the location pointed to by the SP. To pull a byte from the stack, first we read the byte from memory pointed to by SP, then we increment the SP. To push a 16-bit word on the stack, we first decrement the SP by 2, then we store the word into that location. To pull a 16-bit word from the stack, we first read the word from the location pointed to by SP, then we increment the SP by 2.
Figure 7.1 The 9S12 stack. The white boxes are free spaces, and the shaded boxes contain data.
[Two panels: an empty stack, and a stack with 3 elements in which SP points to the top entry and the next entry lies just below it.]
Checkpoint 7.3: How do we push/pull a 16-bit word onto/from the stack?
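The push and pull sequences described above can be modeled on a host computer. The following is a minimal sketch, not 9S12 code: the array Mem, the variable SP, and the function names are assumptions made for illustration, the initial SP of $4000 is taken from the lds #$4000 used in this chapter, and no overflow or underflow checking is performed.

```c
#include <stdint.h>

#define STACK_TOP 0x4000u            // assumed initial SP, as in lds #$4000
static uint8_t Mem[STACK_TOP];       // simulated memory, $0000 to $3FFF
static uint16_t SP = STACK_TOP;      // simulated stack pointer, empty stack

void Push8(uint8_t data){
  SP = SP - 1;                       // first decrement SP
  Mem[SP] = data;                    // then store the byte
}
uint8_t Pull8(void){
  uint8_t data = Mem[SP];            // first read the byte
  SP = SP + 1;                       // then increment SP
  return data;
}
void Push16(uint16_t data){
  SP = SP - 2;                       // first decrement SP by 2
  Mem[SP]     = (uint8_t)(data >> 8);   // big endian: MSByte at lower address
  Mem[SP + 1] = (uint8_t)(data & 0xFF);
}
uint16_t Pull16(void){
  uint16_t data = (uint16_t)((Mem[SP] << 8) | Mem[SP + 1]); // first read
  SP = SP + 2;                       // then increment SP by 2
  return data;
}
```

After Push8 followed by Push16, the model's SP has moved from $4000 to $3FFD, and pulling in the reverse order returns the same values and restores SP to $4000.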
The instruction tsx will transfer a copy of the stack pointer into Register X. The instruction causes Register X to point to the top element of the stack, as shown in Figure 7.2. The instruction tsy works in a similar manner with Register Y. The tsx and tsy instructions do not modify the stack pointer. Formally, only SP defines what data is on the stack. However, having a second pointer also point into the stack provides additional flexibility for accessing data.
Figure 7.2 The tsx instruction creates a stack frame pointer.
[Two panels: the stack before, with SP pointing to the top entry, and the stack after tsx, with both SP and Register X pointing to the top entry.]
We can read and write previously allocated locations on the stack using indexed mode addressing. For example, to read an 8-bit value from the next to the top byte:
      tsx           ;Reg X points to the top byte of the stack
      ldaa 1,X      ;Reg A = the next to the top byte
Stack pointer indexed mode also can be used to read any data on the stack:
      ldaa 1,SP     ;Reg A = the next to the top byte
The LIFO stack has a few rules (repeated from Chapter 5):
1. Program segments should have an equal number of pushes and pulls.
2. Stack accesses (push or pull) should not be performed outside the allocated area.
3. Stack reads and writes should not be performed within the free area.
4. Stack push should first decrement SP, then store the data.
5. Stack pull should first read the data, then increment SP.
Programs that violate rule number 1 will probably crash when a rts instruction pulls an illegal address off the stack at the end of a subroutine. The TExaS simulator will usually recognize this error as an illegal memory access when the processor tries to fetch an op code at this incorrect address. The backdump command will be useful to retrace the steps leading up to the crash. Figures 7.1 and 7.2 show the free area as white boxes. Violations of rule number 2 can be caused by a stack underflow or overflow. Stack underflow is caused when there are more pulls than pushes and is always the result of a software bug. The TExaS simulator will recognize this error as an illegal memory access when the processor tries to pull data from an address that doesn't exist. A stack overflow can have two causes. If the software mistakenly pushes more than it pulls, then the stack pointer will eventually overflow its bounds. Even when there is exactly one pull for each push, a stack overflow can occur if the stack is not allocated large enough. Stack overflow is a very difficult bug to recognize, because the first consequence occurs when the computer pushes data onto the stack and overwrites data stored in a global variable. At this point, the local variables and global variables exist at overlapping addresses. Setting a breakpoint at the first address of the allocated stack area allows you to detect a stack overflow situation.
Checkpoint 7.4: How do you specify the size of the stack?
The following 9S12 assembly code violates rule 3, and will not work if interrupts are active. The objective is to save Register A onto the stack. When an interrupt occurs, registers automatically will be pushed on the stack, destroying the data.
      staa -1,SP    ;Store Reg A onto the stack (***illegal***)
To use the stack, one first allocates, then saves. The following assembly code also violates rule 3, because it first stores the data on the stack, then allocates space. The objective is to push a zero onto the stack. If an interrupt were to occur between the clr and des instructions in the following example, the zero will be destroyed when registers are pushed on the stack by the interrupt context switch:
      tsx           ;Reg X points to the top of the stack
      clr  -1,X     ;Store zero onto the stack (***illegal***)
      des           ;Make space for the zero
The proper technique is to allocate first, then store:
      des           ;Allocate stack space first
      clr  0,SP     ;Store zero onto the stack
or
      clr  1,-SP    ;Store zero onto the stack
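The hazard of storing before allocating can be reproduced with a host-side model. This is only a sketch under stated assumptions: Mem, SP, and all function names are hypothetical, and SimulatedInterrupt stands in for a real interrupt by pushing the nine bytes the 9S12 stacks on interrupt entry (return address, Y, X, A, B, and CCR) as an arbitrary $AA pattern and then removing them again.

```c
#include <stdint.h>

static uint8_t Mem[0x4000];      // simulated memory, $0000 to $3FFF
static uint16_t SP = 0x4000;     // simulated stack pointer, empty stack

// Models interrupt entry (9 bytes pushed) followed by return from interrupt.
void SimulatedInterrupt(void){
  for(int i = 0; i < 9; i++){
    SP--;
    Mem[SP] = 0xAA;              // arbitrary pattern for the saved registers
  }
  SP += 9;                       // rti pulls the 9 bytes back off
}
// Illegal order: store into the free area, then allocate.
uint8_t StoreThenAllocate(uint8_t value){
  Mem[SP - 1] = value;           // like staa -1,SP (violates rule 3)
  SimulatedInterrupt();          // interrupt arrives before the allocation
  SP--;                          // des, too late
  uint8_t result = Mem[SP];      // read back what is now on the stack
  SP++;                          // deallocate
  return result;
}
// Proper order: allocate, then store.
uint8_t AllocateThenStore(uint8_t value){
  SP--;                          // des
  Mem[SP] = value;               // like staa 0,SP
  SimulatedInterrupt();          // interrupt pushes below the allocated byte
  uint8_t result = Mem[SP];
  SP++;                          // deallocate
  return result;
}
```

In this model StoreThenAllocate(0x55) reads back $AA because the simulated interrupt overwrote the free area, while AllocateThenStore(0x55) reads back $55.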
Constants can be pushed on the stack with the movb and movw instructions. For example, to push the byte 7:
      movb #7,1,-SP ;push a 7 onto the stack
Checkpoint 7.5: Write an assembly instruction that pushes a 16-bit 1000 onto the stack.
7.3 Local Variables Allocated on the Stack
Stack implementation of local variables has four stages: binding, allocation, access, and deallocation.
1. Binding is the assignment of the address (not value) to a symbolic name. The symbolic name will be used by the programmer when referring to the local variable. The assembler binds the symbolic name to a stack index, and the computer calculates the physical location during execution. In the following example, the local variable will be at address SP+0, and the programmer will access the variable using sum,SP addressing:
sum   set  0        ;16-bit local variable, stored on the stack
Checkpoint 7.6: Why is set better than equ for binding?
2. Allocation is the generation of memory storage for the local variable. The computer allocates space during execution by decrementing the SP. In this first example, the software allocates the local variable by pushing a register on the stack. An 8-bit push (e.g., psha) creates an uninitialized 8-bit local variable, and a 16-bit push (e.g., pshx) creates an uninitialized 16-bit local variable. The value in the register is irrelevant; these instructions are used because they are a fast way to decrement the SP.
      pshx          ;allocate 16-bit sum
In this next example, the software allocates the local variable by decrementing the stack pointer. This local variable is also uninitialized. This method is most general, allowing the allocation of an arbitrary amount of data.
      leas -2,SP    ;allocate sum
Checkpoint 7.7: In what way is pshx better than leas -2,sp for allocating a 16-bit local? In what way is leas -2,sp better?
If you wished to allocate a 16-bit local and initialize it to zero, you could execute:
      ldx  #0
      pshx          ;allocate sum=0
or
      movw #0,2,-sp ;allocate sum=0
Checkpoint 7.8: Assume Register A contains the size in bytes of an array, determined at run-time. Write assembly code to allocate the array on the stack.
3. The access to a local variable is a read or write operation that occurs during execution. In the next code fragments, the value of the local variable sum is initialized to 0. One way is
      tsx           ;X points to locals
      ldd  #0
      std  sum,x    ;sum=0
and another way is
      movw #0,sum,sp ;sum=0
In the next code fragment, the local variable sum is incremented. We could use RegX to access the data
      tsx
      ldd  sum,x
      addd #1
      std  sum,x    ;sum=sum+1
or use the SP directly.
      ldd  sum,sp
      addd #1
      std  sum,sp   ;sum=sum+1
4. Deallocation is the release of memory storage for the local variable. The computer deallocates space during execution by incrementing SP. In this first example, the software deallocates the local variable by pulling a register from the stack.
      pulx          ;deallocate sum
Observation: When the software uses the “push-register” technique to allocate and the “pull-register” technique to deallocate, it looks like it is saving and restoring the register. Because most applications of local variables involve storing into the local, the value pulled will NOT match the value pushed.
In this next example, the software deallocates the 16-bit local variable by incrementing the stack pointer by two.
      leas 2,SP     ;deallocate sum
Checkpoint 7.9: Write a 9S12 subroutine that allocates then deallocates three 8-bit locals.
7.4 Stack Frames
Assume the SP is initialized to $4000. By definition, the SP points to the top of the stack. Therefore, all data on the stack exist at addresses between SP and $3FFF, i.e., SP ≤ address ≤ $3FFF. However, sometimes it is convenient to set up a second pointer into the stack, using either Register X or Y, called a stack frame pointer. For example, the stack frame pointer can point to a set of local variables and parameters of the function. It is important in this implementation that once the stack frame pointer is established (e.g., using the tsx instruction), the stack frame register (X) not be modified. The term frame refers to the fact that the pointer value is fixed. If Register X is a fixed pointer to the set of local variables, then a fixed binding (using the equ or set pseudo op) can be established between Register X and the local variables (even if additional information is pushed on the stack). Because the stack frame pointer should not be modified, every subroutine will save the old stack frame pointer of the function that called the subroutine (e.g., pshx at the top) and restore it before returning (e.g., pulx at the bottom). In some cases, the txs instruction can be used to deallocate the local variables. Local variable access uses the indexed addressing mode using Register X.
Observation: One advantage of using a stack frame is that you can push and pull within the body of the function and still be able to access local variables using their symbolic name.
Observation: One disadvantage of using a stack frame is that a register is dedicated as the frame pointer, and thus, it is unavailable for general use.
Programs 7.2, 7.3, and 7.4 all calculate the 16-bit sum of the first 100 numbers. The purpose of these simple programs is to demonstrate various implementations of local variables. In these programs, the result will be returned by value in Register D.
Program 7.2 A simple function with two local 16-bit variables.
  unsigned short calc(void){
    unsigned short sum,n;
    sum = 0;
    for(n=100;n>0;n--){
      sum=sum+n;
    }
    return sum;
  }
Program 7.3 shows two implementations using regular stack pointer addressing, as drawn in Figure 7.3 (left). The implementation on the left of Program 7.3 has no binding and is difficult to understand. In this version, the variable n is accessed using 2,SP addressing mode. The version on the right has exactly the same machine code as the left (same size and execution speed), but is easier to understand because the local variables are referred to by their symbolic names.
Figure 7.3 Local variables on the stack, accessed with indexed addressing modes.
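[Left panel, stack for Program 7.3: sum at 0,SP; n at 2,SP; then the 16-bit return address; SP points to sum. Right panel, stack for Program 7.4: sum at -4,X; n at -2,X; then Old Reg X and the 16-bit return address; SP points to sum and X points to Old Reg X.]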
Left version (no binding):
;***NO BINDING USED*****
; *******allocation phase *********
calc  leas -4,sp      ;allocate n,sum
; ********access phase ************
      movw #0,0,sp    ;sum=0
      movw #100,2,sp  ;n=100
loop  ldd  2,sp       ;RegD=n
      addd 0,sp       ;RegD=sum+n
      std  0,sp       ;sum=sum+n
      ldd  2,sp       ;n=n-1
      subd #1
      std  2,sp
      bne  loop
      ldd  0,sp       ;RegD=sum (RegD held n=0 at loop exit)
; ********deallocation phase *****
      leas 4,sp       ;deallocation
      rts             ;RegD=sum

Right version (with binding):
; *****binding phase***************
sum   set  0          ;16-bit number
n     set  2          ;16-bit number
; *******allocation phase *********
calc  leas -4,sp      ;allocate n,sum
; ********access phase ************
      movw #0,sum,sp  ;sum=0
      movw #100,n,sp  ;n=100
loop  ldd  n,sp       ;RegD=n
      addd sum,sp     ;RegD=sum+n
      std  sum,sp     ;sum=sum+n
      ldd  n,sp       ;n=n-1
      subd #1
      std  n,sp
      bne  loop
      ldd  sum,sp     ;RegD=sum (RegD held n=0 at loop exit)
; ********deallocation phase *****
      leas 4,sp       ;deallocation
      rts             ;RegD=sum
Program 7.3 Stack pointer implementation of a function with two local 16-bit variables. The program on the left is a poor style without binding, and the one on the right is a good style with binding.
Program 7.4 shows two implementations using stack frame pointer addressing. The one on the left has no binding and is difficult to understand. The one on the right has exactly the same machine code but is easier to understand. The program establishes the frame pointer, then allocates the variables. In Program 7.4, the variable n is accessed using -2,X addressing mode, as shown in Figure 7.3 (right). Notice in both cases of Figure 7.3 that valid data on the stack exists in memory at addresses greater than or equal to the stack pointer. In particular, one does not allocate/deallocate stack space by changing Registers X or Y. That is, decrementing SP allocates space, and incrementing SP deallocates space.
Left version (no binding):
;***NO BINDING USED*****
; *******allocation phase *********
calc  pshx            ;save old Reg X
      tsx             ;stack frame pointer
      leas -4,sp      ;allocate n,sum
; ********access phase ************
      movw #0,-4,x    ;sum=0
      movw #100,-2,x  ;n=100
loop  ldd  -2,x       ;RegD=n
      addd -4,x       ;RegD=sum+n
      std  -4,x       ;sum=sum+n
      ldd  -2,x       ;n=n-1
      subd #1
      std  -2,x
      bne  loop
      ldd  -4,x       ;RegD=sum (RegD held n=0 at loop exit)
; ********deallocation phase *****
      txs             ;deallocation
      pulx            ;restore old X
      rts

Right version (with binding):
; *****binding phase***************
sum   set  -4         ;16-bit number
n     set  -2         ;16-bit number
; *******allocation phase *********
calc  pshx            ;save old Reg X
      tsx             ;stack frame pointer
      leas -4,sp      ;allocate n,sum
; ********access phase ************
      movw #0,sum,x   ;sum=0
      movw #100,n,x   ;n=100
loop  ldd  n,x        ;RegD=n
      addd sum,x      ;RegD=sum+n
      std  sum,x      ;sum=sum+n
      ldd  n,x        ;n=n-1
      subd #1
      std  n,x
      bne  loop
      ldd  sum,x      ;RegD=sum (RegD held n=0 at loop exit)
; ********deallocation phase *****
      txs             ;deallocation
      pulx            ;restore old X
      rts
Program 7.4 Stack frame pointer implementation of a function with two local 16-bit variables. The program on the left is a poor style without binding, and the one on the right is a good style with binding.
Example 7.1. Write an assembly subroutine with three 8-bit and one 16-bit local variables allocated on the stack. Name the variables cnt, n, flag, and pt. Solution There are two general approaches for creating local variables on the stack. Stack pointer addressing is faster, but stack frame addressing is more flexible, allowing for additional stack pushes within the body of the subroutine. The solutions in Program 7.5 begin by
Left version (stack pointer addressing):
; *****binding phase***************
cnt   set  0          ;8-bit number
n     set  1          ;8-bit number
flag  set  2          ;8-bit number
pt    set  3          ;16-bit number
; *******allocation phase *********
func  leas -5,sp      ;allocate cnt,n,flag,pt
; ********access phase ************
; ********deallocation phase *****
      leas 5,sp       ;deallocation
      rts

Right version (stack frame pointer addressing):
; *****binding phase***************
cnt   set  -5         ;8-bit number
n     set  -4         ;8-bit number
flag  set  -3         ;8-bit number
pt    set  -2         ;16-bit number
; *******allocation phase *********
func  pshx            ;save old Reg X
      tsx             ;stack frame pointer
      leas -5,sp      ;allocate cnt,n,flag,pt
; ********access phase ************
; ********deallocation phase *****
      txs             ;deallocation
      pulx            ;restore old X
      rts
Program 7.5 Three 8-bit and one 16-bit local variables on the stack. The program on the left uses stack pointer addressing, and the one on the right uses a stack frame pointer.
Figure 7.4 Three 8-bit and one 16-bit local variables on the stack.
[Left panel, stack pointer addressing: cnt at 0,SP; n at 1,SP; flag at 2,SP; pt at 3,SP; then the return address; SP points to cnt. Right panel, stack frame pointer addressing: cnt at -5,X; n at -4,X; flag at -3,X; pt at -2,X; then Old Reg X and the return address; SP points to cnt and X points to Old Reg X. Each row is 8 bits wide.]
allocating five bytes of storage. When using SP addressing, we simply decrement the stack pointer by 5. When using stack frame pointer addressing, we save the frame pointer, copy the SP into the frame pointer, and then decrement the stack pointer by 5. We then draw a picture of the stack at this point, and assign the four variables into the five bytes of storage, as shown in Figure 7.4. There is no particular advantage of one assignment over another, as long as the four variables exist contiguously. We label the addressing mode to be used to access each variable, and use these numbers to assign the bindings in our software.
7.5 Parameter Passing Using Registers, Stack, and Global Variables
Up to this point in the book, we used registers to pass data into and out of subroutines. The input parameters (or arguments) are pieces of data passed from the calling routine into the subroutine during execution. The output parameter (or argument) is information returned from the subroutine back to the calling routine after the subroutine has completed its task. As previously defined in Chapter 6, there are two methods to pass parameters: call by reference and call by value. With call by reference, a pointer to the object is passed. In this way, the subroutine and the module that calls the subroutine have access to the exact same object. Call by reference can be used to pass a large quantity of data and can be used to implement a parameter that is both an input and an output parameter. With call by value, a copy of the data itself is passed. Using the stack to pass parameters provides a much greater flexibility not possible with just the registers.
7.5.1 Parameter Passing in C
The call-by-reference method passes a pointer to the object. In other words, references (pointers) to the actual arguments are passed, instead of copies of the actual arguments themselves. In this scheme, assignment statements have implied side effects on the actual arguments; that is, variables passed to a function are affected by changes to the formal arguments. Sometimes side effects are beneficial, and sometimes they are not. As an example, consider a stepper motor program shown in Program 7.6. Both assembly and C versions are shown. With call-by-reference parameter passing, there is one copy of the information, and the calling program (e.g., main) passes an address (RegX in the assembly version) to the function. The read and write accesses to the parameter affect the original variable. Since C supports only one formal output parameter, we can implement additional output parameters using call by reference. The calling program passes pointers to empty objects
Program 7.6 An input/output parameter is implemented using call by reference.
;RegX points to the angle
next  inc  0,x        ;(*pt)++
      ldaa 0,x        ;RegA=(*pt)
      cmpa #200
      bne  skip
      clr  0,x        ;(*pt) = 0
skip  rts
angle set  0          ;0 to 199
main  lds  #$4000
      clr  1,-SP      ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      leax angle,sp   ;RegX=&angle
      bsr  next
      bra  loop

  void next(unsigned char *pt){
    (*pt)++;
    if((*pt) == 200){
      (*pt) = 0;
    }
  }
  void main(void){
    unsigned char angle=0; // 0 to 199
    Stepper_Init();
    while(1){
      Stepper_Step();
      next(&angle);
    }
  }
(RegX and RegY in the assembly version), and the where function fills the objects with data. Program 7.7 shows a function that returns two parameters using call by reference. Assume global variables Xx and Yy are private to the where function and contain the true current position.
Program 7.7 Multiple output parameters implemented using call by reference.
Xx    rmb  2          ; private to where
Yy    rmb  2
where movw Xx,0,X     ;RegX = xpt
      movw Yy,0,Y     ;RegY = ypt
      rts
myX   set  0          ;16-bit
myY   set  2
func  leas -4,sp      ;allocate
      leax myX,sp     ;RegX=&myX
      leay myY,sp     ;RegY=&myY
      bsr  where
;do something based on myX,myY
      leas 4,sp       ;deallocate
      rts

  short Xx,Yy; /* position */
  void where(short *xpt, short *ypt){
    (*xpt) = Xx; // return Xx
    (*ypt) = Yy; // return Yy
  }
  void func(void){
    short myX,myY;
    where(&myX,&myY);
    // do something based on myX,myY
  }
When we use the call-by-value scheme, the values (not references) are passed to functions. With call by value, copies are made of the parameters. Within a called function, references to formal arguments access the copied values, instead of the original objects from which they were taken. At the time when the computer is executing within next, as shown in Program 7.8, there will be two separate and distinct copies of the angle data. An important point to remember about passing arguments by value in C is that there is no connection between an actual argument and its source. Changes to the arguments made within a function have no effect whatsoever on the objects that might have supplied their values. They can be changed and the original values will not be affected. This removes a burden of concern from the programmer, who may use arguments as local variables without side effects. It also avoids the need to define temporary variables just to prevent side effects. It is precisely because C uses call by value that we can pass expressions, not just variables, as arguments. The value of an expression can be copied, but it cannot be referenced since it has no existence in memory. Therefore, call by value adds important generality to the language. Since expressions may include assignment, increment, and decrement operators, it is possible for argument expressions to affect the values of arguments lying to their right. Consider, for example,
  func(y=x+1, 2*y);
;Input: RegA is theAngle
;Output:RegA is theAngle
next  inca            ;theAngle++
      cmpa #200
      bne  skip
      clra            ;theAngle=0
skip  rts
angle set  0          ;0 to 199
main  lds  #$4000
      clr  1,-SP      ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      ldaa angle,sp   ;copy
      bsr  next
      staa angle,sp
      bra  loop

  unsigned char next(unsigned char theAngle){
    theAngle++;       // next angle
    if(theAngle == 200){
      theAngle = 0;   // one rotation
    }
    return(theAngle);
  }
  void main(void){
    unsigned char angle=0; // 0 to 199
    Stepper_Init();
    while(1){
      Stepper_Step();
      angle = next(angle);
    }
  }
Program 7.8 Parameters are implemented using call by value.
where the first argument has the value x+1 and the second argument is intended to have the value 2*(x+1). The value of the second argument depends on whether the arguments are evaluated right-to-left or left-to-right. This kind of situation should be avoided, since the C language does not guarantee the order of argument evaluation. The safe way to write this is
  y=x+1;
  func(y, 2*y);
The value of the expression is calculated at the time of the call, and that value is passed into the subroutine.
Checkpoint 7.10: What is the difference between call by value and call by reference?
7.5.2 Parameter Passing in Assembly Language
In contrast to C, it is easy to return multiple parameters in assembly language. If just a few parameters need to be returned we can use the registers. In Program 7.9, the values of ports A, B, T, and M are to be returned. Notice that it packs two 8-bit parameters into the 16-bit Register X.
Program 7.9 Multiple return parameters implemented with registers.
; Reg A = Port A, Reg B= Port B
; Reg X = Ports T and M
GetPorts ldaa PTT
      ldab PTM
      xgdx
      ldaa PORTA
      ldab PORTB
      rts
********calling sequence******
      jsr  GetPorts
* Reg A,B,X have four results
      staa first
      stab second
      xgdx
      staa third
      stab fourth
If many parameters are needed, then the stack can be used. Program 7.10 also returns the values of ports A, B, T, and M. Space for the output parameters is allocated by the calling routine, and GetPorts stores the results into those stack locations.
Program 7.10 Multiple return parameters passed on the stack.
dataA set  2
dataB set  3
dataT set  4
dataM set  5
GetPorts movb PORTA,dataA,sp
      movb PORTB,dataB,sp
      movb PTT,dataT,sp
      movb PTM,dataM,sp
      rts
********calling sequence******
      leas -4,sp      ;allocate
      jsr  GetPorts
      pula            ;first
      staa first
      pula            ;second
      staa second
      pula            ;third
      staa third
      pula            ;fourth
      staa fourth
An input parameter is information passed from the calling program into the subroutine before the subroutine is executed. An output parameter is information passed out of the subroutine back to the calling program after the subroutine is executed. A parameter can be both an input and an output. The purpose of the next set of examples is to illustrate parameter passing. The subroutine Add8 calculates M = M + N, and sets the flag P if there is an unsigned overflow. M is a 16-bit input/output parameter, N is an 8-bit input parameter, and P is a 1-bit output parameter. The simplest and fastest method to pass parameters uses registers. In this method, the information is contained in the registers. Because concurrent programs have “separate” registers and stack areas, the subroutine is reentrant. Program 7.11 shows the addition module. Reentrancy will be discussed in Chapter 12.
Program 7.11 Addition function that passes parameters call by value in registers.
; Subroutine Calling Sequence
;    place information in A,X
;    bsr Add8
;    use information in CC,X
; Subroutine Definition
; N is an input parameter, an unsigned 8-bit byte, passed in Reg A
; M is an input/output, a 16-bit number, passed/returned in Reg X
; P is an output parameter, a Boolean flag,
;   returned in Reg CC carry bit
Add8  psha            ;Put N on the stack
      xgdx            ;Place M in Reg D
      addb 1,SP+      ;Add N to the LSByte of M
      adca #0         ;Reg D=M+N, CC(carry bit) = P
      xgdx            ;Return result in Reg X
      rts
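For comparison, the same operation can be sketched in C. This is a host-side model rather than a translation produced by any compiler: the C signature is an assumption, with M passed by reference because it is both an input and an output, N passed by value, and P returned as the function result in place of the carry bit.

```c
#include <stdint.h>
#include <stdbool.h>

// Hypothetical C model of Add8.
// m: input/output (call by reference), n: input (call by value)
// return value: P, true if the unsigned 16-bit addition overflowed
bool Add8(uint16_t *m, uint8_t n){
  uint32_t wide = (uint32_t)(*m) + n;  // add in a wider type to keep the carry
  *m = (uint16_t)wide;                 // M = M + N, truncated to 16 bits
  return wide > 0xFFFFu;               // P = carry out of bit 15
}
```

For example, with M = $FFF0 and N = $20 the model leaves M = $0010 and returns true, matching a carry out of the adca instruction.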
A simple but completely inappropriate method is to pass parameters using global variables. In this method, the information is contained in global memory variables. Because of the writes to global memory M and P, the subroutine, shown in Program 7.12, is not reentrant. Many embedded systems use this approach because the processor has limited or no facilities for handling data on the stack.
Program 7.12 Addition function that passes call-by-value parameters in global variables.
; These three variables can be anywhere in RAM memory
N     rmb  1  ;N is an input parameter, an unsigned 8-bit number
M     rmb  2  ;M is an input/output parameter, 16 bits
P     rmb  1  ;P is an output parameter, a Boolean flag,
              ; 0 means no overflow, -1 means overflow
; Subroutine Calling Sequence
;    place information in N,M
;    bsr Add8
;    use information in M,P
; Subroutine Definition
Add8  clr  P          ;Assume no overflow, P=0
      ldd  M          ;Place M in Reg D
      addb N          ;Add N to the LSByte of M
      adca #0         ;Reg D=M+N, CC(carry bit) = P
      bcc  POK        ;Skip if P should remain zero
      com  P          ;Overflow, P=-1
POK   std  M          ;Return result in M
      rts
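The lack of reentrancy can be illustrated with a host-side C model. This is a sketch under stated assumptions: the names Add8Global, SimulatedISR, and ForegroundAdd are hypothetical, and the interrupt is imitated by an ordinary function call placed between the point where the foreground loads the globals and the point where it calls the subroutine.

```c
#include <stdint.h>

static uint8_t  N;    // input parameter
static uint16_t M;    // input/output parameter
static int8_t   P;    // output flag: 0 means no overflow, -1 means overflow

// Model of the subroutine: all parameters are in global variables.
void Add8Global(void){
  uint32_t wide = (uint32_t)M + N;
  P = 0;                         // assume no overflow
  if(wide > 0xFFFFu){
    P = -1;                      // overflow
  }
  M = (uint16_t)wide;            // return result in M
}
// Hypothetical background thread that also uses the subroutine.
void SimulatedISR(void){
  M = 5; N = 1;
  Add8Global();                  // leaves M=6, N=1
}
// Foreground caller; interrupted selects whether the simulated ISR runs
// after the globals are loaded but before the subroutine is called.
uint16_t ForegroundAdd(uint16_t m, uint8_t n, int interrupted){
  M = m; N = n;                  // place information in N,M
  if(interrupted){
    SimulatedISR();              // globals are overwritten here
  }
  Add8Global();
  return M;                      // use information in M
}
```

Without the simulated interrupt, ForegroundAdd(1000,24,0) returns 1024. With it, the same request returns 7, because the foreground ends up adding the background thread's leftover values.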
A flexible and elegant method is to pass parameters using the stack. In this method, the information is placed on the system or user stack. As we will see later, most high-level languages generate code that passes the first parameter in a register but uses the stack to pass additional parameters. However, most high-level languages have only a single output parameter, which is usually returned in a register. When interrupts are enabled, it is possible to have multiple threads active at the same time. There is still only one processor, so exactly one thread is actually running at a time, but we define concurrent programming as the state where multiple threads are “ready to run” at the same time. The interrupt hardware provides the mechanism to switch from one thread to the next. Because concurrent threads have “separate” registers and stack areas, software that uses the stack will operate properly in a concurrent environment. Conversely, extreme care is required when using global variables (including the I/O ports) in a concurrent environment. The other advantage of using the stack is that memory space is used temporarily, then deallocated. Program 7.13 passes both input and output parameters on the stack. Figure 7.5 shows the stack while the subroutine is being executed.
Program 7.13 Addition function that passes call-by-value parameters on the stack.
; Subroutine Calling Sequence
;   des               Make room on the stack for P
;   push M (16 bits) onto the stack
;   push N (8 bits) onto the stack
;   bsr Add8
;   ins               Discard input-only parameter, N
;   pop M (16 bits) off the stack
;   pop P (8 bits) off the stack
; Subroutine Definition
; N is an input parameter, an unsigned 8-bit number, passed on the top of the stack
; M is an input/output, a 16-bit number, passed/returned on top-1, top-2
; P is an output parameter, a Boolean flag, returned on top-3
;               Access  Contents
;               0,SP    16-bit return address
N    set  2    ;N,SP    8-bit N
M    set  3    ;M,SP    16-bit M
P    set  5    ;P,SP    8-bit P
Add8 clr  P,SP ;Assume no overflow, P=0
     ldd  M,SP ;Place M in Reg D
     addb N,SP ;Add N to the LSByte of M
     adca #0   ;Reg D=M+N, CC(carry bit) = P
     bcc  POK  ;Skip if P should remain zero
     com  P,SP ;Overflow, P=-1
POK  std  M,SP ;Return result in M
     rts       ;Return
Figure 7.5 Stack diagram showing the parameters as passed in Program 7.13.
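Offset  Contents (each entry is 8 bits wide)
0,SP    return address, high byte   (SP points here)
1,SP    return address, low byte
2,SP    N
3,SP    M, high byte
4,SP    M, low byte
5,SP    P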
7.5.3 C Compiler Implementation of Local and Global Variables
One of the most important applications of learning assembly language involves analyzing assembly listings when programming in a high-level language. When one programs in a high-level language, there are many design decisions to be made affecting accuracy (e.g., overflow, dropout), reliability (e.g., buffer overflow, critical section, race condition), speed, and code size. Often, these decisions can be best understood at the assembly language level. In fact, one cannot tell if a section of high-level language code is critical without looking at the associated assembly language generated by the compiler. For another example, assume you are designing a finite-state machine in C. You could implement the FSM using a linked data structure like Program 6.22 or with a table like Program 6.23. If you compiled them both and observed the generated listing files, you could determine which version runs faster. Sometimes we have a high-level language program that we know doesn’t work, but we just can’t seem to find the bug. Often it is easier to visualize bugs by looking at the assembly listing in and around the bugged code. Another application of observing the assembly listing generated by the compiler involves proving program correctness. For example, we might ask if the following C code causes an overflow error, assuming both In and Out are 8-bit (unsigned char).
Out = (99*In)/100;
There are two ways to determine if overflow could occur. First, we could exhaustively test the software giving all possible inputs and verifying the correct output for each test case.
Second, knowing the architecture and assembly language of the machine, we could look at the compiler listing and prove that overflow cannot occur. The following assembly code was generated by the Metrowerks Codewarrior V4.6 compiler. Because Out can never exceed In, the result fits in 8 bits. The multiplication is 8 by 8 into 16 bits, and the division is 16 by 16 into 16 bits, so this software cannot overflow. Furthermore, we see this code takes exactly 23 cycles to execute.
Address  Opcode   Cycles  Instruction
0006     c663     [1]     LDAB  #99
0008     b60000   [3]     LDAA  In
000b     12       [1]     MUL
000c     ce0064   [2]     LDX   #100
000f     1815     [12]    IDIVS
0011     b751     [1]     TFR   X,B
0013     7b0000   [3]     STAB  Out
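The first approach, exhaustive testing, is practical here because there are only 256 possible inputs. The following is a minimal sketch of such a test in C. The function names Scale99 and Scale99_Failures are hypothetical, introduced only for this illustration, and the fixed-width types are an assumption that mirrors the 8-bit and 16-bit operations in the listing.

```c
#include <stdint.h>

// The expression from the text, with the 16-bit intermediate made explicit.
uint8_t Scale99(uint8_t in){
  uint16_t product = (uint16_t)(99u*in);  // 8 by 8 multiply, at most 99*255 = 25245
  return (uint8_t)(product/100u);         // 16 by 16 divide, at most 252
}

// Hypothetical test driver: counts the inputs for which a 16-bit product
// or an 8-bit result would not hold the exact answer.
int Scale99_Failures(void){
  int failures = 0;
  for(unsigned int in = 0; in <= 255u; in++){
    unsigned long product = 99ul*in;      // exact product in a wide type
    unsigned long exact = product/100ul;  // exact quotient
    if(product > 65535ul) failures++;     // would overflow 16 bits
    if(exact > 255ul) failures++;         // would overflow 8 bits
    if(Scale99((uint8_t)in) != exact) failures++;
  }
  return failures;
}
```

A return value of zero from Scale99_Failures means that no input overflows either the 16-bit intermediate or the 8-bit result.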
The specific goal of this section is to study how compilers implement local variables and pass parameters. However, in the big picture, we can improve our understanding of both the machine architecture and our high-level language programs by looking at the assembly code generated by the compiler. Program 7.14 shows a simple C program with a global variable G, two local variables both called z, and function parameters m and n. All three compilers analyzed in this section will pass one parameter in Register D and push the other parameter on the stack. If there were additional parameters, they too would have been pushed on the stack by the calling routine. Furthermore, all three compilers will push the one parameter initially passed in Register D onto the stack at the beginning of the subroutine. In this way, during the execution of the subroutine sub, the parameters are all on the stack. The first two compilers studied in this section will place the local variables on the stack. The third compiler will generate more efficient code by placing the local variables in registers as needed.
Program 7.14 An example used to illustrate the C compiler’s use of the stack.
short G;                 // definition of a global variable
short sub(short n, short m){
  short z;
  z = n-m;
  return(z);
}
int main(void){
  short z;               // definition of a local variable
  G = 5;                 // access global variable
  z = 6;                 // access local variable
  G = sub(z,1);          // call function, pass parameter
  return(0);
}
Observation: Although the local variables of the main program are on the stack, and it IS possible to access them, the compiler will NOT allow the subroutine to access them. In C, there is a clear distinction between the parameters pushed on the stack that are supposed to be accessed by the subroutine and the local variables of the calling program, which are not supposed to be accessed.
Common Error: It would be a grievous programming error to access the local variables of the main program from the subroutine. Therefore, in assembly language, it is essential to make the distinction between local variables and data passed on the stack to the subroutine.
Program 7.15 Assembly code generated for the 6812 by the GCC compiler.
z    set  2
n    set  0
m    set  8
sub  movw $0800,2,-SP  ;1)save previous stack frame pointer
     pshx              ;allocate space for n,z
     pshx
     sts  $0800        ;2)establish stack frame pointer
     ldx  $0800
     std  n,X          ;place n on the stack
     ldx  $0800        ;3) use frame to access n
     ldd  n,X          ;RegD=n
     ldx  $0800        ;3) use frame to access m
     subd m,X          ;RegD=n-m
     ldx  $0800        ;3) use frame to access z
     std  z,X          ;z=n-m
     ldx  $0800        ;3) use frame to access z
     ldd  z,X          ;RegD=z
     pulx              ;deallocate n,z
     pulx
     movw 2,SP+,$0800  ;4)restore previous stack frame pointer
     rts
z    set  0
main movw $0800,2,-SP  ;1)save previous stack frame pointer
     pshx              ;allocate z
     sts  $0800        ;2)establish stack frame pointer
     movw #5,G         ;G=5
     ldx  $0800        ;3) use frame to access z
     movw #6,z,X       ;z=6
     movw #1,2,-SP     ;push second parameter onto stack
     ldx  $0800        ;3) use frame to access z
     ldd  z,X          ;first parameter in RegD
     bsr  sub
     leas 2,SP         ;discard parameter
     std  G            ;G = sub(z,1)
     ldd  #0
     pulx              ;deallocate z
     movw 2,SP+,$0800  ;4)restore previous stack frame pointer
     rts
The first compiler we will study is GCC Release 3.1 for the 6812. The assembly listing, shown as Program 7.15, has been edited to be consistent with the syntax of this book. In particular, the set pseudo-ops were added to help see where information is stored on the stack. The sts instruction establishes a stack frame pointer at global memory $0800. The use of the stack frame pointer follows the typical pattern: (1) save old frame, (2) establish a new frame, (3) use the frame whenever accessing data on the stack, and (4) restore the previous frame. The pshx instruction allocates local variables. The Register X indexing mode is used to access the data on the stack. The pulx instruction deallocates the local variables. The stack pictures for the three compilers at the time of the subd instruction are drawn in Figure 7.6. Although the local variable of main is on the stack, it will not be (and should not be) accessed by the subroutine.
The next compiler we will study is ImageCraft ICCV7 for the Freescale 6812. Again, the disassembled output has been edited to clarify its operation, and is shown as Program 7.16. The global symbol, G, will be assigned or bound by the linker/loader. The leas instruction allocates and deallocates local variables, and stack pointer addressing is used to access parameters and local variables. This compiler passes the first input parameter into the subroutine by placing it in Register D. The remaining parameters are pushed on the stack by the calling routine.
[Figure 7.6 shows three stack layouts, each entry 16 bits wide.]
GCC for the 6812: global area holds the frame pointer and G; stack area holds n at 0,X; z of sub at 2,X; old frame at 4,X; return address at 6,X; m at 8,X; then z of main and the old frame.
ICCV7 for the 6812: global area holds G; stack area holds z of sub at 0,SP; n at 2,SP; return address at 4,SP; m at 6,SP; then z of main.
Metrowerks Codewarrior 4.6: global area holds G; stack area holds m at 0,SP; return address at 2,SP; n at 4,SP.
Figure 7.6 The stack contains local variables, parameters, and the return address.
Program 7.16 Assembly code generated for the 6812 by the ICCV7 compiler.
z    set  0
m    set  6
n    set  2
sub  pshd          ;place n on the stack
     leas -2,SP    ;allocate z
     ldd  n,SP     ;RegD = n
     subd m,SP     ;RegD = n-m
     tfr  D,Y
     sty  z,SP     ;z = n-m
     tfr  Y,D
     leas 4,SP     ;deallocate z,n
     rts
z    set  2
main leas -4,SP    ;allocate z,secondParameter
     movw #5,G     ;G=5
     movw #6,z,SP  ;z=6
     ldy  #1
     sty  0,SP     ;put second parameter on stack
     ldd  z,SP     ;first parameter in RegD
     jsr  sub
     tfr  D,X
     std  G        ;G = sub(z,1)
     ldd  #0
     leas 4,SP     ;deallocate z,secondParameter
     rts
The third compiler we will study is Metrowerks Codewarrior 4.6 for the Freescale 9S12. Again, the disassembled output has been edited to clarify its operation (see Program 7.17). This is a highly optimized compiler. The local variable in both main and sub was implemented in a register. For this compiler, the second (or last) parameter is passed in Register D and the remaining parameters are pushed on the stack.
Program 7.17 Assembly code generated for the 9S12 by the Metrowerks Codewarrior compiler.
m    set  0
n    set  4
sub  pshd        ;place m on the stack
     ldd  n,sp   ;RegD = n
     subd m,sp   ;RegD = n-m
     pulx        ;deallocate m
     rts
main ldab #5
     clra
     std  G      ;G=5
     incb        ;RegD=z=6
     pshd        ;put first parameter on stack
     ldab #1     ;second parameter in RegD
     bsr  sub
     leas 2,sp   ;discard parameter
     std  G      ;G = sub(z,1)
     clrb
     clra
     rts
Observation: Notice the difference in code efficiency between a free compiler (GCC), a compiler costing about $250 (ICCV7), and a compiler costing over $3000 (Metrowerks Codewarrior).
7.6 Tutorial 7 Debugging Techniques
The objective of this tutorial is to illustrate some debugging techniques. In particular, we will use TExaS to visualize stack overflow and stack underflow.
Action: Copy the Tutor7.rtf and Tutor7.uc files from the Web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. This should open the corresponding microcomputer window. This program contains an integer square root subroutine, based on Newton’s method. There is a bug in it that causes a stack overflow. The purpose of this main program is to exhaustively test this function by giving it all possible input patterns and manually checking the validity of all outputs. Being able to evaluate a subroutine with a known and repeatable sequence of inputs is called stabilization. Once a system is stabilized (the inputs are fixed and known), changes to the subroutine can be made with confidence that changes in the output are a result of software modification and not due to changes in the input.
Question 7.1 This is a very easy bug to spot, but it represents a typical programming error. By visual inspection of the main program, identify the programming error that causes the stack overflow, but don’t fix it.
Question 7.2 What’s the difference between a breakpoint and a ScanPoint?
Action: Assemble the program. Notice that Input and Output parameters with unsigned 8-bit decimal format are in the ViewBox. A breakpoint has been added at the location in the main program labeled check. You can add breakpoints in two ways. The first way is to left-click the line in the listing file, then right-click and execute BreakAtCursor. The second way is to type the address (you should use the symbolic address check rather than its numerical value) into the Break/ScanPoints box and click the add button. You could have used its absolute address, but absolute addresses must be recalculated each time the software is modified.
The double red arrow («) points in the listing file to the breakpoint. Make check a ScanPoint by toggling the Mode->BreakMode command until the check mark is removed. Figure T7.1 shows the resulting configuration.
Figure T7.1 A ScanPoint is added to Tutorial 7.
Action: Run the system until the first ten outputs are calculated, then stop the simulation with F12. You should see the following results in the TheLog.rtf file. These results are correct.
Input=0 Output=0
Input=1 Output=1
Input=2 Output=2
Input=3 Output=2
Input=4 Output=2
Input=5 Output=2
Input=6 Output=3
Input=7 Output=3
Input=8 Output=3
Input=9 Output=3
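For comparison with the log above, here is a minimal C sketch of an integer square root based on Newton’s method, together with an exhaustive test over all 8-bit inputs. It is not the tutorial’s assembly routine: the names isqrt8 and isqrt8_Failures are hypothetical, and this version truncates toward zero, so it returns 1 for an input of 2 where the log shows 2.

```c
#include <stdint.h>

// Truncating integer square root using Newton's iteration r = (r + x/r)/2.
// Hypothetical illustration; the tutorial's routine may round differently.
uint8_t isqrt8(uint8_t x){
  uint16_t r, next;
  if(x < 2) return x;          // sqrt(0)=0, sqrt(1)=1
  r = x;                       // initial guess, never below the answer
  next = (uint16_t)((r + x/r)/2);
  while(next < r){             // the guesses decrease until they converge
    r = next;
    next = (uint16_t)((r + x/r)/2);
  }
  return (uint8_t)r;
}

// Stabilized test: every possible input is applied in a fixed order.
// Counts the inputs for which r*r <= x < (r+1)*(r+1) does not hold.
int isqrt8_Failures(void){
  int failures = 0;
  for(unsigned int x = 0; x <= 255u; x++){
    unsigned int r = isqrt8((uint8_t)x);
    if(r*r > x || (r+1)*(r+1) <= x) failures++;
  }
  return failures;
}
```

Because the inputs are fixed and known, any change in the failure count after editing isqrt8 can be attributed to the edit itself, which is the point of stabilization.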
Question 7.3 Explain how these first ten results are correct. In particular, verify how the output is the square root of the input. Are there any minor errors?
Action: Run the system until TExaS gives the “Write to EEPROM address 0x07FF” error. Hit reset, run it again, and this time observe the memory box in the Stack window. Notice locations $0800 (Input) and $0801 (Output). The rest of the memory ($0802 to $0901) is the stack. In particular, watch in the memory box as the stack overflows.
Question 7.4 Look in the TheList.rtf file and identify which instruction caused the error. The cursor arrow (») will point to the instruction after the one that caused the error.
Action: When a stack instruction causes a bug, observing the stack pointer makes sense. Add the SP to the ViewBox, hit reset, and run it again. The last few outputs are shown below.
Input=59 Output=8 SP=$0813
Input=60 Output=8 SP=$080F
Input=61 Output=8 SP=$080B
Input=8 Output=8 SP=$0807
Write to EEPROM address 0x07FF.
Question 7.5 Stack errors can cause weird behavior. Why did Input change from 61 to 8, when it should have been 62?
Action: Fix the bug (change the second pshx to a pulx), assemble, and run the debugged system.
Action: Sometimes a stack error results in the program branching to a location that is not part of your program. Remove the pshb instruction from the first line of the sqrt subroutine. Assemble the software with this new bug and run the system. This stack underflow will cause an error. You should get a Read from uninitialized RAM address error.
Question 7.6 You won’t be able to find the cursor arrow (») in the TheList.rtf file. Add the PC to the ViewBox, hit reset, run the system again, and check the value of the PC at the time of error.
Question 7.7 There are two ways to find this bug. The first way is to execute Action-BackDump. What are the last five instructions to be executed just before the error? Where in the program are these five instructions?
Question 7.8 The second way to visualize the error is to activate Mode-FollowPC. Click this option, reset the computer, and run it again. The rts instruction is highlighted, showing you the last instruction to execute. What does the purple color on the pulb instruction mean?
7.7 Homework Problems
Homework 7.1 What does it mean to say a function is public versus private? Why is this distinction important?
Homework 7.2 What does it mean to say a variable is public versus private? Why is this distinction important?
Homework 7.3 What does it mean to say a variable is local versus global?
Homework 7.4 Write assembly code that finds the average value of a ten-element array. The two parameters are passed by reference on the stack. Local variables must be allocated on the stack.
void average(unsigned short *pt, unsigned short *ave){
  unsigned short sum,n;
  sum = 0;
  for(n=0;nBuffer[j]){
        temp = Buffer[j-1];   /* Exchange */
        Buffer[j-1] = Buffer[j];
        Buffer[j] = temp;
      }
    }
  }
}
A typical calling sequence is
     ldx  #mydata  ; pointer to 20-byte structure (call by reference)
     pshx
     ldaa #20      ; Count (call by value)
     psha
     jsr  Bubble
     pula          ; balance stack
     pulx
b) Use this simple assembly code to debug your Bubble Sort algorithm.
        org  $0800
mydata  rmb  5
main    lds  #$4000
        ldaa #$35
        staa mydata    ; initialize
        ldd  #$3433
        std  mydata+1
        ldd  #$3231
        std  mydata+3  ; mydata[]={'5','4','3','2','1'}
        ldx  #mydata   ; pointer to 5-byte structure (call by reference)
        pshx
        ldab #5        ; Count (call by value)
        pshb
        jsr  Bubble
        ins            ; balance stack
        pulx
        stop
c) Write assembly code that tests the Bubble Sort algorithm. Copy and paste the SCI device driver software from tut2.rtf. This main program will input an ASCII string from a SCI-CRT interface (call SCI_InString), calculate its length, call the bubble sort subroutine, and output the sorted string on the SCI-CRT (call SCI_OutString).
d) Add debugging code to the test software in part c) that measures the elapsed execution time for the sort subroutine. Plot the execution time versus buffer size (using worst-case initial data) for buffer sizes 10, 20, 30, and 40 bytes. Fit this data to a quadratic equation to derive a general solution for all sizes.
Lab 7.2 Heap Sort
Purpose. This lab has these major objectives:
䡲 To evaluate the static and dynamic efficiency of software
䡲 To learn how to pass subroutine parameters on the stack
  䡲 By value, pushing the value onto the stack
  䡲 By reference, pushing a pointer onto the stack
䡲 To implement local variables on the stack
䡲 To study the Heap Sort algorithm
Description.
a) Write assembly code that implements the Heap Sort algorithm. The input parameters are passed on the stack. Local variables must be allocated on the stack. The buffer size (Count) is 1 to 255.
void HeapSort(char *Buffer, unsigned char Count){
// Count is the size of the byte array Buffer[i]
  unsigned char i,j;  // used when sifting
  unsigned char ir;
  unsigned char m;    // used in the hiring phase
  char z;             // temporary, used to sort
  m = (Count>>1)+1;   // initial value Count/2+1
  ir = Count;
  for(;;){
    if(m > 1){
      --m;
      z = Buffer[m];           // still hiring
    }
    else{                      // in retirement and promotion
      z = Buffer[ir];          // clear space at end
      Buffer[ir] = Buffer[1];  // Retire top of heap into it
      if(--ir == 1){           // Done with last promotion?
        Buffer[1] = z;         // least competent worker of all
        break;
      }
    }
    i = m;    // whether in the hiring or promotion phase
    j = m+m;  // we set up to sift down element z to
    while(j keycode[j];
        (*Num)++;
      }
      column>>=1; // shift into position
    }
    pt++;
  }
  return key;
}
once in the initialization. The Key_Scan function returns two parameters. One parameter is the number of keys pressed. If there is exactly one key pressed, the second parameter contains the ASCII code representing that key. A debounced interface is created by scanning the keyboard at a rate slower than the time of the bouncing. For example, if the bounce is less than 5 ms, then scan the keyboard every 10 ms. This way a bouncing key will not be seen as touched/released/touched.
Observation: An n by n matrix keypad has n² keys, but requires only 2n I/O pins. The interface can detect any combination of 0, 1, or 2 pressed keys, but it has trouble when 3 or more are pressed.
Checkpoint 8.3: What happens if the three keys ‘1’, ‘2’, and ‘5’ are all pressed?
Checkpoint 8.4: Why wouldn’t you use a matrix approach when creating a music keyboard for an electric piano?
The key wakeup and input capture will be presented in the next chapter. Either mechanism can be used to generate interrupts on touch and release. We can “arm” this interface for interrupts by driving all the rows to zero.
8.5 Parallel Port LCD Interface with the HD44780 Controller
Microprocessor-controlled LCD displays are widely used, having replaced most of their LED counterparts, because of their low power and flexible display graphics. This example will illustrate how a handshaked parallel port of the microcomputer will be used to output to the LCD display. The hardware for the display uses an industry-standard HD44780 controller, as shown in Figure 8.13. The low-level software initializes and outputs to the HD44780 controller. The 9S12 simply writes ASCII characters to the HD44780 controller. Each ASCII character is mapped into a 5 by 8 bit pixel image, called a font. A 1 by 16 LCD display is 80 pixels wide by 8 pixels high, and the HD44780 is responsible for refreshing the pixels in a raster-scanned manner similar to maintaining an image on a TV screen or computer monitor.
Figure 8.13 Interface of a HD44780 LCD controller.
9S12 pin  LCD pin  HD44780 signal
          1        Vss (ground)
          2        Vdd (power, +5)
          3        Vee (contrast; 10 kΩ to +5 in the figure)
PH0       4        RS
PH1       5        R/W
PH2       6        E
PP0       7        DB0
PP1       8        DB1
PP2       9        DB2
PP3       10       DB3
PP4       11       DB4
PP5       12       DB5
PP6       13       DB6
PP7       14       DB7
The HD44780 controller, with its 5 by 8 bit font, drives the 1 by 16 LCD display.
There are four types of access cycles to the HD44780 depending on RS and R/W, as shown in Table 8.12.
Table 8.12 Two control signals specify the type of access to the HD44780.
RS  R/W  Cycle
0   0    Write to Instruction Register
0   1    Read Busy Flag (bit 7)
1   0    Write data from the µP to the HD44780
1   1    Read data from the HD44780 to the µP
Normally, you write ASCII characters into the data buffer (called DDRAM in the data sheets) to have them displayed on the screen. However, you can create up to eight new characters on the LCD by writing to the CGRAM; notice the University of Texas (UT) symbol in Figure 8.14. These new characters exist as ASCII data 0 to 7.
Figure 8.14 HD44780-based LCD display interfaced to a 9S12. (Courtesy of Jonathan Valvano.)
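As a rough illustration of how the CGRAM is addressed, the sketch below builds the HD44780 Set CGRAM Address command byte for one pixel row of one custom character. The function name LCD_CGRAMCommand is hypothetical and is not part of the LCD driver in this section; the sketch assumes the standard HD44780 layout in which each of the eight custom characters occupies eight consecutive CGRAM bytes, one per row.

```c
#include <stdint.h>

// Set CGRAM Address command: 0x40 plus a 6-bit CGRAM address.
// code is the custom character number (0 to 7), row is the pixel row (0 to 7).
// Hypothetical helper, shown only to illustrate the address arithmetic.
uint8_t LCD_CGRAMCommand(uint8_t code, uint8_t row){
  return (uint8_t)(0x40u | ((code & 0x07u) << 3) | (row & 0x07u));
}
```

To define a character, the software would output this command for row 0 and then write eight data bytes, one per row, with the low five bits of each byte selecting the pixels. A Set DDRAM Address command (0x80 plus the display address) returns the controller to normal character output.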
Two types of synchronization can be used, blind-cycle and busy-waiting. Most operations require 40 µs to complete while some require 1.64 ms. This implementation uses the timer to create the blind-cycle wait. A busy-waiting interface would have provided feedback to detect a faulty interface, but has the problem of creating a software crash if the LCD never finishes. A better interface would have utilized both busy-waiting and blind-cycle, so that the software can return with an error code if a display operation does not finish on time (due to a broken wire or damaged display). First we present a low-level private helper function, see Program 8.6. This function would not have a prototype in the LCD.H file.
E      equ  4           ;PH2
RW     equ  2           ;PH1
RS     equ  1           ;PH0
; Output command to LCD
; Inputs: RegA is command, Outputs: none
OutCmd staa PTP
       movb #0,PTH      ;E=0, RS=0, R/W=0
       movb #E,PTH      ;E=1, RS=0, R/W=0
       movb #0,PTH      ;E=0, RS=0, R/W=0
       ldd  #40
       jsr  Timer_Wait  ;at least 37us
       rts

#define E  4  // on PH2
#define RW 2  // on PH1
#define RS 1  // on PH0
void OutCmd(unsigned char command){
  PTP = command;
  PTH = 0;         // E=0, R/W=0, RS=0
  PTH = E;         // E=1, R/W=0, RS=0
  PTH = 0;         // E=0, R/W=0, RS=0
  Timer_Wait(40);  // at least 37us
}
Program 8.6 Private functions for an HD44780 controlled LCD display.
Next, we show the high-level public functions, see Program 8.7. These functions would have prototypes in the LCD.H file. The initialization sequence is copied from the data sheet of the HD44780. Figure 8.15 shows a rough sketch of the E, RS, R/W and data signals as the LCD_OutChar function is executed.
; Initialize HD44780 LCD display
; Inputs: none, Outputs: none
LCD_Init movb #$FF,DDRP      ;LCD data
         movb #$FF,DDRH      ;PH2=E, PH1=R/W, PH0=RS
         jsr  Timer_Init     ;1us TCNT
         ldy  #15
         jsr  Timer_Wait1ms  ;15ms
         ldaa #$38           ;first time
         jsr  OutCmd
         ldy  #4
         jsr  Timer_Wait1ms  ;4ms
         ldaa #$38           ;second time
         jsr  OutCmd
         ldd  #100
         jsr  Timer_Wait     ;100us
         ldaa #$38           ;third time
         jsr  OutCmd
         ldaa #$38           ;N=1 two line, F=0 5x7
         jsr  OutCmd         ;DL=1 8-bit data
         ldaa #$08           ;display off
         jsr  OutCmd
         jsr  LCD_Clear
         ldaa #$0E           ;set D=1, C=1, B=0
         jsr  OutCmd         ;cursor on, no blink
         ldaa #$06           ;set I/D, S
         jsr  OutCmd         ;inc, no shift
         ldaa #$14           ;cursor move (S/C=0)
         jsr  OutCmd         ;right (R/L=1)
         rts
; Output one character to LCD
; Inputs: RegA is ASCII, Outputs: none
LCD_OutChar
         staa PTP
         movb #RS,PTH        ;E=0, R/W=0, RS=1
         movb #E+RS,PTH      ;E=1, R/W=0, RS=1
         movb #RS,PTH        ;E=0, R/W=0, RS=1
         ldd  #40
         jsr  Timer_Wait     ;at least 40us
         rts
LCD_Clear
         ldaa #$01
         jsr  OutCmd         ;Clear Display
         ldd  #1600
         jsr  Timer_Wait     ;at least 1.52ms
         ldaa #$02
         jsr  OutCmd         ;Cursor to home
         ldd  #1600
         jsr  Timer_Wait     ;at least 1.52ms
         rts
Program 8.7 Public functions for an HD44780 controlled LCD display.
void LCD_Init(void){
  DDRH = 0xFF;
  DDRP = 0xFF;
  Timer_Init();       // 1us TCNT
  Timer_Wait1ms(15);  // 15 ms
  OutCmd(0x38);       // function set
  Timer_Wait1ms(4);   // 4 ms
  OutCmd(0x38);       // second time
  Timer_Wait(100);    // 100us
  OutCmd(0x38);       // third time
                      // now the busy flag could be read
  OutCmd(0x38);       // 8bit, N=1 2line, F=0 5by7
  OutCmd(0x08);       // D=0 displayoff
  LCD_Clear();
  OutCmd(0x0E);       // D=1 displayon,
                      // C=1 cursoron, B=0 blinkoff
  OutCmd(0x06);       // Entry mode
                      // I/D=1 Increment, S=0 nodisplayshift
  OutCmd(0x14);       // S/C=0 cursormove, R/L=1 shiftright
}
void LCD_OutChar(unsigned char letter){
  // letter is ASCII code
  PTP = letter;
  PTH = RS;           // E=0, R/W=0, RS=1
  PTH = E+RS;         // E=1, R/W=0, RS=1
  PTH = RS;           // E=0, R/W=0, RS=1
  Timer_Wait(40);     // 40 us wait
}
void LCD_Clear(void){
  OutCmd(0x01);       // Clear Display
  Timer_Wait(1600);   // 1.6 ms wait
  OutCmd(0x02);       // Cursor to home
  Timer_Wait(1600);   // 1.6 ms wait
}
Figure 8.15 Timing diagram of the LCD signals as data is sent to the HD44780 display.
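[Figure 8.15 plots the data, RS, R/W, and E signals against the following steps of LCD_OutChar.]
PTP = letter;     // data is valid
PTH = RS;         // E=0, R/W=0, RS=1
PTH = E+RS;       // E=1, R/W=0, RS=1
PTH = RS;         // E=0, R/W=0, RS=1
Timer_Wait(40);   // 40 us wait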
Checkpoint 8.5: Assuming the 9S12 is running at 8 MHz, how many µs wide is the E pulse for the assembly language solution in Program 8.7? The movb instruction requires 4 cycles.
8.6 Binary Actuators
8.6.1 Interface
Relays, solenoids, and DC motors are grouped together because their electrical interfaces are similar. We can add speakers to this group if the sound is generated with a square wave. In each case, there is a coil, and the computer must drive (or not drive) current through the coil. To interface a coil, we consider voltage, current, and inductance. We need a power supply at the desired voltage requirement of the coil. If the only available power supply is larger than the desired coil voltage, we use a voltage regulator (rather than a resistor divider) to create the desired voltage. We connect the power supply to the positive terminal of the coil, shown as +V in Figure 8.16. We will use a transistor device to drive the negative side of the coil to ground. The computer can turn the current on and off using this transistor. The second consideration is current. In particular, we must select the power supply and an interface device that can support the coil current. The 7406 is a digital inverter with open collector outputs (hiZ and low). The 2N2222 is a bipolar junction transistor (BJT), NPN type, with moderate current gain. The TIP120 is a Darlington transistor, also NPN type, that can handle larger currents. The IRF540 is a MOSFET transistor that can handle even more current. BJT and Darlington transistors are current-controlled (meaning the output is a function of the input current), while the MOSFET is voltage-controlled (output is a function of input voltage). When interfacing a coil to the microcontroller, we use information like Table 8.13 to select an interface device capable of supplying the current necessary to activate the coil. It is a good design practice to select a driver with a maximum current at least twice the required coil current. When the digital Port output is high, the interface transistor is active and current flows through the coil. When the digital Port output is low, the transistor is not active and no current flows through the coil.
[Figure 8.16 shows two circuits. In the first, a 9S12 Port pin drives a 7406, whose open collector output (IOL, VOL) sinks current from the negative terminal of the coil. In the second, a 9S12 Port pin (VOH) drives a 2N2222, TIP120, or IRF540 through the resistor Rb (IB, VBE), and the transistor (IC, VCE) sinks the coil current. In both circuits the coil is modeled as R, L, and emf, its positive terminal is connected to +V, and a 1N914 diode provides protection.]
Figure 8.16 Binary interface to EM relay, solenoid, DC motor or speaker.
Table 8.13 Four possible devices that can be used to interface a coil compared to the 9S12.
Device  Type            Maximum Current
9S12    CMOS            10 mA
7406    TTL logic       40 mA
2N2222  BJT NPN         500 mA
TIP120  Darlington NPN  5 A
IRF540  power MOSFET    28 A
Similar to the solenoid and EM relay, the DC motor has a frame that remains motionless, and an armature that moves. In this case, the armature moves in a circular manner (shaft rotation). A DC motor has an electromagnet as well. When current flows through the coil, a magnetic force is created causing a rotation of the shaft. Brushes positioned between the frame and armature are used to alternate the current direction through the coil, so that a DC current generates a continuous rotation of the shaft. When the current is removed, the magnetic force stops, and the shaft is free to rotate. The resistance in the coil (R) comes from the long wire that goes from the + terminal to the − terminal of the motor. The inductance in the coil (L) arises from the fact that the wire is wound into coils to create the electromagnet. The coil itself can generate its own voltage (emf) because of the interaction between the electric and magnetic fields. If the coil is a DC motor, then the emf is a function of both the speed of the motor and the developed torque (which in turn is a function of the applied load on the motor). Because of the internal emf of the coil, the current will depend on the mechanical load. For example, a DC motor running with no load might draw 50 mA, but under load (friction) the current may jump to 500 mA.
Observation: It is important to realize that many devices cannot be connected directly to the microcontroller. In the specific case of motors, we need an interface that can handle the voltage and current required by the motor.
The third consideration is inductance in the coil. The 1N914 diode in Figure 8.16 provides protection from the back emf generated when the switch is turned off, and the large dI/dt across the inductor induces a large voltage (on the negative terminal of the coil), according to $V = L \cdot dI/dt$. For example, if you are driving 0.1 A through a 0.1 mH coil (Port output 1) using a 2N2222, then disable the driver (Port output 0), the 2N2222 will turn off in about 20 ns. This creates a dI/dt of at least $5 \cdot 10^6$ A/s, producing a back emf of 500 V! The 1N914 diode shorts out this voltage, protecting the electronics from potential damage. The 1N914 is called a snubber diode. If you are sinking 16 mA (IOL) with the 7406, the output voltage (VOL) will be 0.4 V. However, when the IOL of the 7406 equals 40 mA, its VOL will be 0.7 V. 40 mA is not a lot of current when it comes to typical coils. However, the 7406 interface is appropriate to control small reed relays.
Checkpoint 8.6: A reed relay is interfaced with the 7406 circuit in Figure 8.16. The positive terminal of the coil is connected to 5 V and the coil requires 40 mA. What will be the voltage across the coil when active?
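The back emf estimate above can be checked numerically. The following C sketch simply evaluates L·dI/dt using the values from the example (0.1 mH, 0.1 A, 20 ns); the function name BackEmf is a hypothetical name introduced for this illustration, and the calculation assumes the current falls linearly to zero during the turn-off time.

```c
// Back emf magnitude in volts, assuming the current changes by
// deltaAmps linearly over deltaSeconds (an idealization).
// Hypothetical helper for illustration only.
double BackEmf(double henries, double deltaAmps, double deltaSeconds){
  double dIdt = deltaAmps/deltaSeconds;  // A/s, 0.1 A / 20 ns = 5e6 A/s
  return henries*dIdt;                   // V = L*dI/dt
}
```

Calling BackEmf(0.1e-3, 0.1, 20e-9) evaluates to about 500 V, matching the hand calculation. A faster switch or a larger inductance raises the voltage proportionally, which is why the snubber diode is needed.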
There are lots of motor driver chips, but they are fundamentally similar to the circuits shown in Figure 8.16. For the 2N2222 and TIP120 NPN transistors, if the Port output is low, no current can flow into the base, so the transistor is off, and the collector current, Ic, will be zero. If the Port output is high, current does flow into the base and VBE goes above VBEsat, turning on the transistor. The transistor is in the linear range if \( V_{BE} \le V_{BEsat} \) and \( I_c = h_{fe} \cdot I_b \). The transistor is in the saturated mode if \( V_{BE} \ge V_{BEsat} \), \( V_{CE} = 0.3 \) V, and \( I_c < h_{fe} \cdot I_b \). We select the resistor for the NPN transistor interfaces to operate right at the transition between linear and saturated mode. We start with the desired coil current, Icoil (the voltage across the coil will be \( +V - V_{CE} \), which will be about \( +V - 0.3 \) V). Next, we calculate the needed base current (Ib), knowing the current gain of the NPN (hfe). See Table 8.14.
\[ I_b = I_{coil}/h_{fe} \]
Finally, given the output high voltage of the microcontroller (VOH is about 5 V) and the base-emitter voltage of the NPN (VBEsat) needed to activate the transistor, we can calculate the desired interface resistor.
\[ R_b \le (V_{OH} - V_{BEsat})/I_b = h_{fe} \cdot (V_{OH} - V_{BEsat})/I_{coil} \]
The inequality means we can choose a smaller resistor, creating a larger Ib. Because the hfe of the transistors can vary a lot, it is good design practice to make the Rb resistor about half the value given by the above equation. Since the transistor is saturated, the increased base current produces the same VCE and thus the same coil current.
Table 8.14 Design parameters for the 2N2222 and TIP120.

| Parameter | 2N2222 (IC = 150 mA) | 2N2222 (IC = 500 mA) | TIP120 (IC = 3 A) |
|---|---|---|---|
| hfe | 100 | 40 | 1000 |
| VBEsat | 0.6 V | 2 V | 2.5 V |
| VCE at saturation | 0.3 V | 1 V | 2 V |
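The base resistor calculation can be sketched in C as follows. This is a minimal sketch under the assumptions stated in the text (a saturated NPN switch, Rb chosen at half the maximum to tolerate variation in hfe); the function names are illustrative and do not come from the text.

```c
#include <assert.h>

/* Illustrative helpers for sizing the base resistor of an NPN switch.
   The names are hypothetical; the formulas follow the text. */

/* Base current (A) needed to carry icoil (A) with current gain hfe. */
double BaseCurrent(double icoil, double hfe){
  assert(hfe > 0.0);
  return icoil/hfe;
}

/* Largest base resistor (ohms) that satisfies
   Rb <= (VOH - VBEsat)/Ib = hfe*(VOH - VBEsat)/Icoil. */
double RbMax(double voh, double vbesat, double hfe, double icoil){
  assert(icoil > 0.0 && voh > vbesat);
  return (voh - vbesat)/BaseCurrent(icoil, hfe);
}

/* Design value: half of the maximum, to cover variability in hfe. */
double RbDesign(double voh, double vbesat, double hfe, double icoil){
  return 0.5*RbMax(voh, vbesat, hfe, icoil);
}
```

For a 2N2222 switching 150 mA from a 5 V output (hfe = 100, VBEsat = 0.6 V), RbMax gives about 2.9 kΩ and RbDesign about 1.5 kΩ; in practice the nearest standard resistor value would be used.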
The IRF540 MOSFET is a voltage-controlled device. If the Port output is low, the MOSFET is off, and the coil current will be zero. If the Port output is high, the MOSFET is on, and the voltage across the MOSFET (the equivalent of VCE) will be very close to 0. No resistor is needed between the Port output and the gate of the MOSFET, but often we add a resistor (e.g., Rb = 1 kΩ) to limit current into and out of the 9S12 during the turn on/off transients.
Because of the resistance of the coil, there will not be significant dI/dt when the device is turned on. Consider a DC motor as shown in Figure 8.16 with +V = 12 V, R = 50 Ω, and L = 100 μH. Assume we are using a 2N2222 with a VCE of 1 V at saturation. Initially the motor is off (no current to the motor). At time t = 0, the digital port goes from 0 to 5 V and the transistor turns on. Assuming for this section that the emf is zero (the motor has no external torque applied to the shaft) and that the transistor turns on instantaneously, we can derive an equation for the motor current (Ic) as a function of time. The voltage across L and R together is 12 − VCE = 11 V at time 0+. At time 0+, the inductor is an open circuit. Conversely, at time ∞, the inductor is a short circuit. The Ic at time 0− is 0, and the current will not change instantaneously because of the inductor. Thus, the Ic is 0 at time 0+. The Ic is 11 V/50 Ω = 220 mA at time ∞.
\[ 11\ \mathrm{V} = I_c R + L \, dI_c/dt \]
The general solution to this differential equation is
\[ I_c = I_0 + I_1 e^{-t/\tau} \]
\[ dI_c/dt = -(I_1/\tau) e^{-t/\tau} \]
We plug the general solution into the differential equation and boundary conditions.
\[ 11\ \mathrm{V} = (I_0 + I_1 e^{-t/\tau}) R - L (I_1/\tau) e^{-t/\tau} \]
To solve the differential equation, the time constant must be τ = L/R = 2 μs. Using the initial conditions, we get
\[ I_c = 220\ \mathrm{mA} \cdot (1 - e^{-t/2\,\mu\mathrm{s}}) \]
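As a cross-check on the closed-form result, the same differential equation can be integrated numerically. The sketch below uses forward Euler integration with a fixed 1 ns step, which is an assumption of this example (the step must be much smaller than L/R); the function name is illustrative.

```c
#include <assert.h>

/* Numerically integrate L*dIc/dt = v - Ic*r, starting from Ic(0) = 0,
   using forward Euler. Illustrative sketch, not from the original text.
   t: elapsed time in seconds
   v: voltage across the coil (supply minus VCE) in volts
   r: coil resistance in ohms
   l: coil inductance in henries
   Returns the coil current in amps at time t. */
double MotorCurrent(double t, double v, double r, double l){
  const double dt = 1e-9;                /* 1 ns step, must be << l/r */
  assert(t >= 0.0 && r > 0.0 && l > 0.0);
  long steps = (long)(t/dt + 0.5);       /* number of Euler steps */
  double ic = 0.0;                       /* initial condition */
  for(long i = 0; i < steps; i++){
    ic += dt*(v - ic*r)/l;               /* Ic += dt * dIc/dt */
  }
  return ic;
}
```

For the motor in the text (v = 11 V, r = 50 Ω, l = 100 μH), the simulated current after one time constant (2 μs) is about 139 mA, or 63% of the final value, and it settles at 220 mA, in agreement with the equation above.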
Example 8.3 Design an interface for two 12 V 1A geared DC motors. These two motors will be used to propel a robot with two independent drive wheels as shown in Figure 8.17.
Figure 8.17 Geared DC motors provide a good torque and speed for light-weight robots. (Courtesy of Jonathan Valvano.)
Solution We will use two copies of the TIP120 circuit in Figure 8.16 because the TIP120 can sink at least three times the current needed for this motor. We select a 12 V supply and connect it to +V in the circuit. The needed base current is
\[ I_b = I_{coil}/h_{fe} = 1\ \mathrm{A}/1000 = 1\ \mathrm{mA} \]
The desired interface resistor is
\[ R_b \le (V_{OH} - V_{BEsat})/I_b = (5 - 2.5)/1\ \mathrm{mA} = 2.5\ \mathrm{k\Omega} \]
To cover the variability in hfe, we will use a 1.24 kΩ resistor instead of the 2.5 kΩ. The actual voltage on the motor when active will be 12 − 2 = 10 V. The coils and transistors can vary a lot, so it is appropriate to experimentally verify the design by measuring the voltages and currents.
8.6.2 Electromagnetic and Solid-State Relays
A relay is a device that responds to a small current or voltage change by activating switches or other devices in an electric circuit. It is used to remotely switch signals or power. The input control is usually electrically isolated from the output switch. The input signal determines whether the output switch is open or closed. Relays are classified into three categories depending upon whether the output switches power (i.e., high currents through the switch) or electronic signals (i.e., low currents through the switch). Another difference is how the relay implements the switch. An electromagnetic (EM) relay uses a coil to apply EM force to a contact switch that physically opens and closes. The solid state relay (SSR) uses transistor switches made from solid state components to electronically allow or prevent current flow across the switch. The three types are:
• The classic general purpose relay has an EM coil and can switch AC power
• The reed relay has an EM coil and can switch low level DC electronic signals
• The solid state relay (SSR) has an input-triggered semiconductor power switch
Two solid state relays are shown in Figure 8.18. Interfacing an SSR is identical to interfacing an LED, which was previously described in Section 2.8.3, Figure 2.17. An SSR interface was presented earlier as Figure 3.10. SSRs allow the microcontroller to switch AC loads from 1 to 30 A. They are appropriate in situations where the power is turned on and off many times.
Figure 8.18 Solid state relays can be used to control power to an AC appliance. (Courtesy of Jonathan Valvano.)
The input circuit of an EM relay is a coil with an iron core. The output switch includes two sets of silver or silver-alloy contacts (called poles). One set is fixed to the relay frame, and the other set is located at the end of leaf spring poles connected to the armature. The contacts are held in the “normally closed” position by the armature return spring. When the input circuit energizes the EM coil, a “pull in” force is applied to the armature, the “normally closed” contacts are released (called break), and the “normally open” contacts are connected (called make). The armature pull in can either energize or de-energize the output circuit depending on how it is wired. Relays are mounted in special sockets or soldered directly onto a PC board.
The number of poles (e.g., single pole, double pole, 3P, 4P, etc.) refers to the number of switches that are controlled by the input. Single throw means each switch has two contacts that can be open or closed. Double throw means each switch has three contacts. The common contact will be connected to one of the other two contacts (but not both at the same time). The parameters of the output switch include maximum AC (or DC) power, maximum current, maximum voltage, on resistance, and off resistance. A DC signal will weld the contacts together at a lower current value than an AC signal; therefore, the maximum ratings for DC are considerably smaller than for AC. Other relay parameters include turn on time, turn off time, life expectancy, and input/output isolation. Life expectancy is measured in number of operations. Figure 8.19 illustrates the various configurations available. The sequence of operation is described in Table 8.15.
Figure 8.19 Standard relay configurations.
[Figure 8.19 panels, each with numbered contacts: Form A (SPST-NO), Form B (SPST-NC), Form C (SPDT), Form D (SPDT), Form E (SPDT, B-M-B).]
Table 8.15 Standard definitions for five relay configurations.

| Form | Activation Sequence | Deactivation Sequence |
|---|---|---|
| A | Make 1 | Break 1 |
| B | Break 1 | Make 1 |
| C | Break 1, Make 2 | Break 2, Make 1 |
| D | Make 1, Break 2 | Make 2, Break 1 |
| E | Break 1, Make 2, Break 3 | |
8.6.3 Solenoids
Solenoids are used in discrete mechanical control situations such as door locks, automatic disk/tape ejectors, and liquid/gas flow control valves (on/off type). Much like an EM relay, there is a frame that remains motionless and an armature that moves in a discrete fashion (on/off). A solenoid has an electromagnet. When current flows through the coil, a magnetic force is created, causing a discrete motion of the armature. Each of the solenoids shown in Figure 8.20 has a cylindrically shaped armature that moves in the horizontal direction relative to the photograph. The solenoid on the top is used in a door lock, and the second from the top is used to eject the tape from a video cassette player. When the current is removed, the magnetic force stops, and the armature is free to move. The motion in the opposite direction can be produced by a spring, gravity, or a second solenoid.
Figure 8.20 Photo of four solenoids. (Courtesy of Jonathan Valvano.)
8.7 *Pulse-Width Modulation
In the previous interfaces the microcontroller was able to control electrical power to a device in a binary fashion: either all on or all off. Sometimes it is desirable for the microcontroller to vary the delivered power over a range of values. One effective way to do this is to use pulse width modulation (PWM). The basic idea of PWM is to create a digital output wave of fixed frequency, but allow the microcontroller to vary its duty cycle. Figure 8.21 shows various waveforms that are high for H cycles and low for L cycles. The system is designed in such a way that H + L is constant (meaning the frequency is fixed). The duty cycle is defined as the fraction of time the signal is high:
\[ \mathrm{Duty} = \frac{H}{H + L} \]
Hence, duty cycle varies from 0 to 1. We interface this digital output wave to an external actuator (like a DC motor), such that power is applied to the motor when the signal is high, and no power is applied when the signal is low. We purposely select a frequency high enough so the DC motor does not start/stop with each individual pulse, but rather responds to the overall average value of the wave. The average value of a PWM signal is linearly related to its duty cycle and is independent of its frequency.
Figure 8.21 Pulse width modulation used to vary power delivered to a DC motor.
[Figure: the 9S12 PWM output PP0 drives the motor interface (Rb into a 2N2222, TIP120, or IRF540, with a 1N914 diode; the DC motor is modeled as R, L, and emf, powered from +V). Three PP0 waveforms are shown with H/L = 200/50, 125/125, and 50/200.]
Let P (P = V*I) be the power to the DC motor, shown in Figure 8.21, when the PP0 signal is high. Notice the circuit in Figure 8.21 is one of the examples previously described in Figure 8.16. Under conditions of constant speed and constant load, the delivered power to the motor is linearly related to duty cycle.
\[ \text{Delivered power} = \mathrm{duty} \cdot P = \frac{H}{H + L} \cdot P \]
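The relationship between duty cycle and delivered power can be sketched in C. This is a minimal illustration under the same assumption as the text (constant speed and constant load); the function names are hypothetical.

```c
#include <assert.h>

/* Duty cycle as a fraction from 0 to 1.
   h: number of cycles the output is high
   l: number of cycles the output is low */
double PWM_DutyFraction(unsigned long h, unsigned long l){
  assert(h + l > 0);
  return (double)h/(double)(h + l);
}

/* Average power delivered to the load, in the same units as p,
   where p is the power delivered while the output is high.
   Valid only for constant speed and constant load. */
double PWM_DeliveredPower(unsigned long h, unsigned long l, double p){
  return PWM_DutyFraction(h, l)*p;
}
```

For the three waveforms of Figure 8.21 (H/L of 200/50, 125/125, and 50/200), the duty cycles are 0.8, 0.5, and 0.2, so the motor receives 80%, 50%, and 20% of P respectively.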
Unfortunately, as speed and torque vary, the developed emf will affect delivered power. Nevertheless, PWM is a very effective mechanism, allowing the microcontroller to adjust delivered power.
Appreciating the importance of pulse-width modulation, Freescale added dedicated hardware to handle PWM, which was not available in the 6811. The 9S12C32 has six channels, the 9S12DP512 has eight channels, and the 9S12E128 has 12 channels. This section will present the details on the 9S12DP512. With the exception of the MODRR register, the PWM operation on all 9S12 microcontrollers is identical. Table 8.16 shows the 9S12DP512 registers used to create pulse-width modulated outputs. There are eight 8-bit channels, but two 8-bit channels can be concatenated together to create one 16-bit channel. In particular, each of the 16-bit registers in Table 8.16 could be considered as two separate 8-bit registers. For example, the 16-bit register PWMPER01 could be considered as the two 8-bit registers PWMPER0 (at address $00B4) and PWMPER1 (at address $00B5).

Table 8.16 9S12DP512 registers used to configure pulse-width modulated outputs.

| Address | Bit 7 | 6 | 5 | 4 | 3 | 2 | 1 | Bit 0 | Name |
|---|---|---|---|---|---|---|---|---|---|
| $00A0 | PWME7 | PWME6 | PWME5 | PWME4 | PWME3 | PWME2 | PWME1 | PWME0 | PWME |
| $00A1 | PPOL7 | PPOL6 | PPOL5 | PPOL4 | PPOL3 | PPOL2 | PPOL1 | PPOL0 | PWMPOL |
| $00A2 | PCLK7 | PCLK6 | PCLK5 | PCLK4 | PCLK3 | PCLK2 | PCLK1 | PCLK0 | PWMCLK |
| $00A3 | 0 | PCKB2 | PCKB1 | PCKB0 | 0 | PCKA2 | PCKA1 | PCKA0 | PWMPRCLK |
| $00A4 | CAE7 | CAE6 | CAE5 | CAE4 | CAE3 | CAE2 | CAE1 | CAE0 | PWMCAE |
| $00A5 | CON67 | CON45 | CON23 | CON01 | PSWAI | PFRZ | 0 | 0 | PWMCTL |
| $00A8 | Bit 7 | 6 | 5 | 4 | 3 | 2 | 1 | Bit 0 | PWMSCLA |
| $00A9 | Bit 7 | 6 | 5 | 4 | 3 | 2 | 1 | Bit 0 | PWMSCLB |

| Address | Bits | Name |
|---|---|---|
| $00B4 | 15 (msb) to 0 (lsb) | PWMPER01 |
| $00B6 | 15 (msb) to 0 (lsb) | PWMPER23 |
| $00B8 | 15 (msb) to 0 (lsb) | PWMPER45 |
| $00BA | 15 (msb) to 0 (lsb) | PWMPER67 |
| $00BC | 15 (msb) to 0 (lsb) | PWMDTY01 |
| $00BE | 15 (msb) to 0 (lsb) | PWMDTY23 |
| $00C0 | 15 (msb) to 0 (lsb) | PWMDTY45 |
| $00C2 | 15 (msb) to 0 (lsb) | PWMDTY67 |

On the 9S12DP512, the PWM channels always use outputs on Port P (PP7-PP0). Bits 4, 5, and 6 of the MODRR register are used to map the SPI channels onto Port P, Port H, or Port S, as described in Table 8.10. Since PWM has precedence over SPI (see Table 4.8), a Port P pin will become a PWM output if the corresponding bit in the PWME register is set (regardless of MODRR and SPI).
On the 9S12C32, the six PWM channels use outputs on Port P (PP5 to PP0) or on Port T (PT4 to PT0). PP5 is available on all 9S12C32 packages, but the other five channels can be connected to either Port P or Port T. If a bit in the MODRR register is 1, the corresponding Port T pin is connected to the PWM system (see Table 8.17). If the bit is 0, the corresponding Port T pin is connected to the timer system.

| Address | Bit 7 | 6 | 5 | 4 | 3 | 2 | 1 | Bit 0 | Name |
|---|---|---|---|---|---|---|---|---|---|
| $0247 | 0 | 0 | 0 | MODRR4 | MODRR3 | MODRR2 | MODRR1 | MODRR0 | MODRR |

Table 8.17 9S12C32 MODRR register determines if PWM is on Port P or Port T.
On the 9S12E128, six PWM channels can be created on Port P (PP5 to PP0) and six more on Port U (PU5-PU0). The MODRR register can be used to map the bottom four bits of Port U onto either PWM or a timer module.
The PWME register allows you to enable/disable individual PWM channels. The PWMCTL register is used to concatenate two 8-bit channels into one 16-bit PWM. For example, if the CON23 bit is 1, then channels 2 and 3 become one 16-bit channel with the output generated on PP3. Concatenated channels are controlled using the higher of the two channels. For example, concatenated channel 23 is configured with bits PWME3, PPOL3, PCLK3, and CAE3. The PWMPOL register specifies the polarity of the output. Figure 8.22 shows a PWM output for the case when the PPOLx bit is 1. The output will be high for the number of counts in the PWMDTY register. The PWMPER register contains the number of counts in one complete cycle. The duty cycle, defined as the fraction of time the signal is high and expressed as a percent, depends on PWMPER and PWMDTY.
\[ \text{Duty cycle} = 100\% \cdot \text{PWMDTYx}/\text{PWMPERx} \]
Figure 8.22 PWM output generated when PPOL = 1.
[Figure: PPx waveform, with PWMDTYx marking the high time and PWMPERx the complete period.]
If the PPOLx bit is 0, the output will be low for the number of counts in the PWMDTY register, as illustrated in Figure 8.23. The duty cycle, defined as the fraction of time the signal is high, is
\[ \text{Duty cycle} = 100\% \cdot (\text{PWMPERx} - \text{PWMDTYx})/\text{PWMPERx} \]
Figure 8.23 PWM output generated when PPOL = 0.
[Figure: PPx waveform, with PWMDTYx marking the low time and PWMPERx the complete period.]
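The two duty cycle formulas differ only in which part of the period PWMDTYx measures. A minimal sketch in C, using plain function arguments in place of the hardware registers (the function name and parameters are illustrative, not from the text):

```c
#include <assert.h>

/* Percent of time the PWM output is high.
   dty:  value that would be in PWMDTYx
   per:  value that would be in PWMPERx (must be nonzero)
   ppol: the PPOLx bit; 1 means PWMDTYx counts the high time,
         0 means PWMDTYx counts the low time
   Integer division truncates toward zero. */
unsigned long PWM_DutyPercent(unsigned long dty, unsigned long per, int ppol){
  assert(per > 0 && dty <= per);
  unsigned long high = ppol ? dty : (per - dty);  /* counts spent high */
  return (100UL*high)/per;
}
```

For example, with PWMPERx = 250 and PWMDTYx = 50, the output is high 20% of the time when PPOLx is 1 and 80% of the time when PPOLx is 0.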
There are many possible choices for the clock. The base clock is derived from the E clock. Activating the PLL affects the E clock, and hence will affect the PWM generation. Channels 0, 1, 4, and 5 use either clock A or clock SA. Channels 2, 3, 6, and 7 use either clock B or clock SB. The six bits in the PWMPRCLK register, as shown in Table 8.18, determine the relationship between clocks A and B and the E clock.
Table 8.18 Clock A and Clock B prescale in PWMPRCLK.

| PCKB2 | PCKB1 | PCKB0 | Clock B | PCKA2 | PCKA1 | PCKA0 | Clock A |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | E | 0 | 0 | 0 | E |
| 0 | 0 | 1 | E/2 | 0 | 0 | 1 | E/2 |
| 0 | 1 | 0 | E/4 | 0 | 1 | 0 | E/4 |
| 0 | 1 | 1 | E/8 | 0 | 1 | 1 | E/8 |
| 1 | 0 | 0 | E/16 | 1 | 0 | 0 | E/16 |
| 1 | 0 | 1 | E/32 | 1 | 0 | 1 | E/32 |
| 1 | 1 | 0 | E/64 | 1 | 1 | 0 | E/64 |
| 1 | 1 | 1 | E/128 | 1 | 1 | 1 | E/128 |
It is possible to divide the A and B clocks further using the PWMSCLA and PWMSCLB registers. The SA clock is the A clock divided by two times the value in the PWMSCLA register, so the period of the SA clock is the period of the A clock multiplied by two times that value. Similarly, the SB clock is the B clock divided by two times the value in the PWMSCLB register. If the value in PWMSCLA(B) is 0, then a divide by 512 is selected. The clock used for each channel is determined by the PWMCLK register. The period of the PWM output is the period of the selected clock times the value in the PWMPER register.
PCLKn = 1: Clock SB is the clock source for PWM channel n, where n = 7, 6, 3, or 2
PCLKn = 0: Clock B is the clock source for PWM channel n
PCLKm = 1: Clock SA is the clock source for PWM channel m, where m = 5, 4, 1, or 0
PCLKm = 0: Clock A is the clock source for PWM channel m
Let n be the 3-bit value for PCKA2-0 in the PWMPRCLK register. Let the E clock period be PeriodE. Then if the A clock is selected for channel x, the periods of the A clock and PWM output will be
\[ \text{PeriodA} = 2^n \cdot \text{PeriodE} \]
\[ \text{PeriodPTx} = 2^n \cdot \text{PWMPERx} \cdot \text{PeriodE} \]
If the SA clock is selected for channel x, the periods of the SA clock and PWM output will be
\[ \text{PeriodSA} = 2^n \cdot 2 \cdot \text{PWMSCLA} \cdot \text{PeriodE} \quad \text{or} \quad \text{PeriodSA} = 2^n \cdot 512 \cdot \text{PeriodE} \ \text{(if PWMSCLA equals 0)} \]
\[ \text{PeriodPTx} = 2^n \cdot 2 \cdot \text{PWMSCLA} \cdot \text{PWMPERx} \cdot \text{PeriodE} \quad \text{or} \quad \text{PeriodPTx} = 2^n \cdot 512 \cdot \text{PWMPERx} \cdot \text{PeriodE} \ \text{(if PWMSCLA equals 0)} \]
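The period equations above can be collected into one function. The sketch below is illustrative only: it takes the register values as ordinary arguments rather than reading hardware, and the function name and parameter names are assumptions of this example.

```c
#include <assert.h>

/* PWM output period in nanoseconds.
   n:      3-bit prescale value (PCKA2-0 or PCKB2-0), 0 to 7
   scl:    value in PWMSCLA or PWMSCLB, 0 to 255 (0 selects divide by 512)
   per:    value in PWMPERx
   scaled: nonzero if the channel uses clock SA/SB, zero for clock A/B
   e_ns:   E clock period in nanoseconds */
unsigned long long PWM_PeriodNs(unsigned n, unsigned scl, unsigned per,
                                int scaled, unsigned e_ns){
  assert(n <= 7 && scl <= 255);
  unsigned long long prescale = 1ULL << n;      /* clock A or B: E/2^n */
  if(scaled){
    prescale *= (scl == 0) ? 512u : 2u*scl;     /* clock SA or SB */
  }
  return prescale*per*e_ns;
}
```

For instance, with a 125 ns E clock, n = 5, a scale value of 5, the scaled clock selected, and a period register of 250, the output period is 32 * 10 * 250 * 125 ns = 10 ms.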
The design of a PWM system considers three factors. The first factor is the period of the PWM output. Most applications choose a period, initialize the waveform at that period, and adjust the duty cycle dynamically. The second factor is precision, which is the total number of duty cycles that can be created. An 8-bit PWM channel may have up to 256 different outputs, while a 16-bit channel can potentially create up to 65536 different duty cycles. More specifically, since the duty cycle register must be less than or equal to the period register (i.e., PWMDTYx ≤ PWMPERx), the precision of the system will equal PWMPERx + 1 alternatives. The last consideration is the number of channels. The 9S12DP512 supports up to eight 8-bit channels or four 16-bit channels. It is possible to mix and match, creating for example four 8-bit channels and two 16-bit channels. Different versions of the 9S12 will have different numbers of PWM channels.
Example 8.4 Implement a 10-ms 8-bit PWM.
Solution The software for this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. In this design example, we will create the PWM output using channel 0 generated on the PP0 output, using the hardware shown in Figure 8.21. In order to maximize precision, it is best to create the 10 ms period using as large a value in PWMPER0 as possible. We have the limitation that the prescale and PWMPER0 factors must be integers. Since 10 ms/256 equals 39.0625 μs, we need a clock period just larger than 39 μs. The fastest clock that can be used has a period of 40 μs, resulting in PWMPER0 equal to 250. Assuming the E clock period is 125 ns, the prescale needs to be 40/0.125 or 320. There are a number of ways to make this happen, but one way is to select Clock A to be E/32, create SA = A/10, then select the SA clock for channel 0, as shown in Program 8.8.
Checkpoint 8.7: Give another way to create a prescale of 320 on channel 0.
PWM_Init0                   ;10ms PWM on PP0
      bset PWME,#$01        ;enable chan 0
      bset PWMPOL,#$01      ;high then low
      bset PWMCLK,#$01      ;Clock SA
      ldaa PWMPRCLK
      anda #$F8
      oraa #$05
      staa PWMPRCLK         ;A=E/32
      movb #5,PWMSCLA       ;SA=A/10
      movb #250,PWMPER0     ;10ms period
      clr  PWMDTY0          ;initially off
      rts
PWM_Duty0                   ;RegA is duty cycle
      staa PWMDTY0          ;0 to 250
      rts

// 10ms PWM on PP0
void PWM_Init0(void){
  PWME |= 0x01;      // enable channel 0
  PWMPOL |= 0x01;    // PP0 high then low
  PWMCLK |= 0x01;    // Clock SA
  PWMPRCLK = (PWMPRCLK&0xF8)|0x05; // A=E/32
  PWMSCLA = 5;       // SA=A/10, 0.125*320=40us
  PWMPER0 = 250;     // 10ms period
  PWMDTY0 = 0;       // initially off
}
// Set the duty cycle on PP0 output
void PWM_Duty0(unsigned char duty){
  PWMDTY0 = duty;    // 0 to 250
}
Program 8.8 Implementation of an 8-bit PWM output.
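The reasoning used to pick PWMPER0 (choose the largest period count that still divides the desired period exactly with a prescale the hardware can produce) can be automated. The following is a hedged sketch that runs on a desktop compiler, not on the 9S12; the function names are hypothetical, and it assumes the only available dividers are those described for clocks A/B and SA/SB.

```c
#include <stdbool.h>

/* True if the overall divider p of the E clock can be produced either by
   clock A/B (2^n, n = 0..7) or by clock SA/SB (2^n * 2 * scl, where the
   scale factor scl is 1..256 and 256 is selected by writing 0). */
static bool PrescaleIsRealizable(unsigned long p){
  for(unsigned n = 0; n <= 7; n++){
    unsigned long base = 1UL << n;
    if(p == base) return true;                 /* clock A or B */
    if(p % (2*base) == 0){
      unsigned long scl = p/(2*base);
      if(scl >= 1 && scl <= 256) return true;  /* clock SA or SB */
    }
  }
  return false;
}

/* Find the largest period count (at most max_per) such that
   prescale * count equals total_e_cycles exactly, with a realizable
   prescale. Stores the prescale and returns the count, or returns 0
   if no exact solution exists. */
unsigned long PWM_BestPeriodCount(unsigned long total_e_cycles,
                                  unsigned long max_per,
                                  unsigned long *prescale){
  for(unsigned long per = max_per; per >= 1; per--){
    if(total_e_cycles % per == 0){
      unsigned long p = total_e_cycles/per;
      if(PrescaleIsRealizable(p)){
        *prescale = p;
        return per;
      }
    }
  }
  return 0;
}
```

A 10 ms period with a 125 ns E clock is 80000 E cycles. Calling PWM_BestPeriodCount(80000, 255, &p) returns 250 with p equal to 320, matching the values chosen in Example 8.4.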
Checkpoint 8.8: How would you modify Program 8.8 to have a period of 100 ms?
Example 8.5 Implement a 1-second 16-bit PWM.
Solution Again this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. To create a 16-bit PWM we need to concatenate two 8-bit channels. We could have used channels 01, 23, 45, or 67. In this example, we choose to create the PWM output using concatenated channel 23 with its output generated on the PP3 output. In order to maximize precision, it is best to create the 1 s period using as large a value in PWMPER23 as possible. Since 1 s/65536 equals 15.2587890625 μs, we need a clock period just larger than 15 μs. The fastest clock that can be used has a period of 16 μs, resulting in PWMPER23 equal to 62500. Assuming the E clock period is 125 ns, the prescale needs to be 16/0.125 or 128. There are a number of ways to make this happen, but one way is to make Clock B equal to E/128, then select the B clock for channel 23, as shown in Program 8.9.
PWM_Init3                   ;1s PWM on PP3
      bset PWME,#$08        ;enable chan 3
      bset PWMPOL,#$08      ;high then low
      bclr PWMCLK,#$08      ;Clock B
      bset PWMCTL,#$20      ;concat 2+3
      ldaa PWMPRCLK
      anda #$8F
      oraa #$70
      staa PWMPRCLK         ;B=E/128
      movw #62500,PWMPER23  ;1s period
      movw #0,PWMDTY23      ;off
      rts
PWM_Duty3                   ;RegD is duty cycle
      std  PWMDTY23         ;0 to 62500
      rts

// 1s PWM on PP3
void PWM_Init3(void){
  PWME |= 0x08;      // enable channel 3
  PWMPOL |= 0x08;    // PP3 high then low
  PWMCLK &=~0x08;    // Clock B
  PWMCTL |= 0x20;    // Concatenate 2+3
  PWMPRCLK = (PWMPRCLK&0x8F)|0x70; // B=E/128
  PWMPER23 = 62500;  // 1s period
  PWMDTY23 = 0;      // initially off
}
// Set the duty cycle on PP3 output
void PWM_Duty3(unsigned short duty){
  PWMDTY23 = duty;   // 0 to 62500
}
Program 8.9 Implementation of a 16-bit PWM output.
Checkpoint 8.9: What would be the effect of creating the 1 s output using a 1 ms SB clock and a PWMPER23 value of 1000?
Checkpoint 8.10: Are programs 8.9 and 8.10 friendly enough to be used together?
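Application code often thinks in percent rather than in raw counts. As an illustration only, the helper below converts a duty cycle in units of 0.01% into a count that could be passed to a function such as PWM_Duty3; the helper name and the choice of 0.01% units are assumptions of this sketch, not part of Program 8.9.

```c
#include <assert.h>

/* Convert a duty cycle given in hundredths of a percent (0 to 10000)
   into a duty count for a PWM channel whose period register holds per.
   The result is rounded to the nearest count. */
unsigned short PWM_PercentToCount(unsigned short hundredths,
                                  unsigned short per){
  assert(hundredths <= 10000);
  /* widen before multiplying: 10000*65535 does not fit in 16 bits */
  unsigned long scaled = (unsigned long)hundredths*per + 5000UL;
  return (unsigned short)(scaled/10000UL);
}
```

With the 1 s PWM of Example 8.5, a call such as PWM_Duty3(PWM_PercentToCount(5000, 62500)) would request a 50% duty cycle, which corresponds to a count of 31250.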
8.8 *Stepper Motors
A motor can be evaluated in terms of its maximum speed (RPM), its torque (N-m), and the efficiency with which it translates electrical power into mechanical power. Sometimes, however, we wish to use a motor to control the rotational position (θ = motor shaft angle) rather than the rotational speed (ω = dθ/dt). Stepper motors are used in applications where precise positioning is more important than high RPM, high torque, or high efficiency. Stepper motors are very popular for microcontroller-based embedded systems because of their inherent digital interface. Figure 8.24 shows three stepper motors. The larger motors provide more torque, but require more current. It is easy for a computer to control both the position and velocity of a stepper motor in an open-loop fashion. Although the cost of a stepper motor is typically higher than an equivalent DC permanent magnetic field motor, the overall system cost is reduced because stepper motors may not require feedback sensors. They are used in printers to move paper and print heads, tapes/disks to position read/write heads, and high-precision robots. For example, the stepper motor shown in Figure 6.8 moves the R/W head from one track to another on an audio tape recorder.
Figure 8.24 Photo of three stepper motors. (Courtesy of Jonathan Valvano.)
A bipolar stepper motor has two coils on the stator (the frame of the motor), labeled A and B in Figures 8.25 and 8.26. Typically, there is always current flowing through both coils. When current flows through both coils, the motor does not spin (it remains locked at that shaft angle). Stepper motors are rated in their holding torque, which is their ability to hold stationary against a rotational force (torque) when current is constantly flowing through both coils. To move a bipolar stepper, we reverse the direction of current through one (not both) of the coils, see Figure 8.25. To move it again we reverse the direction of current in the other coil. Remember, current is always flowing through both coils. Let the direction of the current be signified by up and down. To make the current go up, the microcontroller outputs a binary 01 to the interface. To make the current go down, it outputs a binary 10. Since there are 2 coils, four outputs will be required (e.g., 0101₂ means up/up). To spin the motor, we output the sequence 0101₂, 0110₂, 1010₂, 1001₂, . . . over and over. Each output causes the motor to rotate a fixed angle. To rotate the other direction, we reverse the sequence (0101₂, 1001₂, 1010₂, 0110₂, . . .).
Figure 8.25 A bipolar stepper has 2 coils, but a unipolar stepper divides the two coils into four parts.
[Figure: interface and coil diagrams for the bipolar stepper (coils A and B) and the unipolar stepper (coils A, A', B, and B' with center taps to +V), each driven by the sequence 0101, 0110, 1010, 1001.]
Figure 8.26 To rotate this stepper by 18°, the interface flips the direction of one of the currents.
[Figure: stator and rotor positions for successive outputs (0101, 0110, 1010, . . .), with coil B or coil A flipped between panels.]
There is a North and a South permanent magnet on the rotor (the part that spins). The amount of rotation caused by each current reversal is a fixed angle depending on the number of teeth on the permanent magnets. For example, the rotor in Figure 8.26 is drawn with 5 North teeth and 5 South teeth. If there are n teeth on the South magnet (also n teeth on the North magnet), then the stepper will move 90/n degrees per step. This means there will be 4n steps per rotation. Because moving the motor involves accelerating a mass (rotational inertia) against a load friction, after we output a value, we must wait an amount of time before we can output again. If we output too fast, the motor does not have time to respond. The speed of the motor is related to the number of steps per rotation and the time in between outputs. For information on stepper motors see the data sheets web page at http://users.ece.utexas.edu/~valvano/Datasheets.
The unipolar stepper motor provides for bi-directional currents by using a center tap, dividing each coil into two parts. In particular, coil A is split into coil A and A', and coil B is split into coil B and B'. The center tap is connected to the +V power source and the four ends of the coils can be controlled with open collector drivers. Because only half of the electromagnets are energized at one time, a unipolar stepper has less torque than an equivalent-sized bipolar stepper. However, unipolar steppers are easier to interface. For example, you can use four copies of the circuit in Figure 8.16 to interface a unipolar stepper motor.
Figure 8.27 shows a circular linked graph containing the output commands to control a stepper motor. This simple FSM has no inputs, four output bits, and four states. There is one state for each output pattern in the usual stepper sequence 5, 6, 10, 9, . . .
The circular FSM is used to spin the motor in a clockwise direction. Notice the one-to-one correspondence between the state graph in Figure 8.27 and the fsm[4] data structure in Program 8.10.
Figure 8.27 This stepper motor FSM has four states. The 4-bit outputs are given in binary.
[Figure: state graph, each state labeled with its name and output: S5 (0101) → S6 (0110) → S10 (1010) → S9 (1001) → S5.]
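The stepper sequence 5, 6, 10, 9 follows directly from the rule that each step reverses the current in exactly one coil. The sketch below illustrates this with a bitwise exclusive-or; it assumes that bits 3-2 of the output drive coil A and bits 1-0 drive coil B, which is consistent with the sequence in the text but is otherwise an assumption of this example. The function names are illustrative.

```c
#include <assert.h>

/* Return the next 4-bit full-step output after reversing one coil.
   Each 2-bit field is 01 (current up) or 10 (current down), so
   complementing a field reverses the current in that coil.
   out:    current 4-bit output (5, 6, 10, or 9)
   flip_b: nonzero reverses coil B (bits 1-0), zero reverses coil A (bits 3-2) */
unsigned char Stepper_Flip(unsigned char out, int flip_b){
  return (unsigned char)(out ^ (flip_b ? 0x03 : 0x0C));
}

/* Degrees moved per step for a rotor with n teeth on each magnet. */
double Stepper_StepAngle(unsigned n){
  assert(n > 0);
  return 90.0/n;
}
```

Starting from 0101 and alternately flipping coil B and coil A produces 0110, 1010, 1001, and then 0101 again. For the 5-tooth rotor of Figure 8.26, Stepper_StepAngle(5) gives 18 degrees per step, or 20 steps per rotation.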
Example 8.6 Design a stepper motor controller that spins the motor at 6 RPM.
Solution We choose a stepper motor according to the speed and torque requirements of the system. A stepper with 200 steps/rotation will provide a very smooth rotation while it spins. Just like the DC motor, we need an interface that can handle the currents required by the coils. We can use an L293 to interface either unipolar or bipolar steppers that require less than 1 A per coil. In general, the output current of a driver must be large enough to energize the stepper coils. We control the interface using an output port of the microcontroller, as shown in Figure 8.28. The circuit shows the interface of a unipolar stepper, but the bipolar stepper interface is similar except there is no +V connection to the motor. The main program, Program 8.10, begins by initializing the Port T output and the state pointer. Every 50 ms the program outputs a new stepper command. The function Timer_Wait1ms() from Program 4.5 uses the built-in timer to generate an appropriate delay between outputs to the stepper. For a 200 step/rotation stepper, we need to wait 50 ms between outputs to spin at 6 RPM.
Speed = (1 rotation/200 steps)*(1000 ms/s)*(60 s/min)*(1 step/50 ms) = 6 RPM
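The speed calculation in this example generalizes to any step count and delay. A minimal sketch, with an illustrative function name that is not part of Program 8.10:

```c
#include <assert.h>

/* Shaft speed in RPM when one step is output every wait_ms milliseconds.
   steps_per_rotation: number of steps in one full rotation
   wait_ms:            delay between outputs in milliseconds
   Speed = (60000 ms/min) / (steps_per_rotation * wait_ms). */
double Stepper_SpeedRPM(unsigned steps_per_rotation, unsigned wait_ms){
  assert(steps_per_rotation > 0 && wait_ms > 0);
  return 60000.0/((double)steps_per_rotation*(double)wait_ms);
}
```

For the motor in this example, Stepper_SpeedRPM(200, 50) evaluates to 6 RPM. The formula only holds while the motor can keep up; if the delay is too short, the rotor will not have time to respond.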
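Figure 8.28 A unipolar stepper motor interfaced to a Freescale 9S12.
[Figure: 9S12 outputs PT3, PT2, PT1, and PT0 drive L293 inputs 1A (pin 2), 2A (pin 7), 3A (pin 10), and 4A (pin 15). L293 outputs 1Y (pin 3), 2Y (pin 6), 3Y (pin 11), and 4Y (pin 14) connect to the stepper coil ends A, A', B, and B', each with a 1N914 diode. Enables 1,2EN (pin 1) and 3,4EN (pin 9) and pin 16 are tied to +5 V, pin 8 to +V, and pins 4, 5, 12, and 13 to ground.]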
      org  $4000            ; in ROM
Out   equ  0                ; output for this state
Next  equ  1                ; clockwise next
S5    fcb  5
      fdb  S6
S6    fcb  6
      fdb  S10
S10   fcb  10
      fdb  S9
S9    fcb  9
      fdb  S5
main  lds  #$4000
      jsr  Timer_Init
      movb #$FF,DDRT        ;output to stepper
      ldx  #S5              ;initial state
loop  movb Out,x,PTT        ;output
      ldy  #50
      jsr  Timer_Wait1ms
      ldx  Next,x           ;clockwise step
      bra  loop

const struct State {
  unsigned char Out;           // command
  const struct State *next;};  // clockwise
typedef const struct State StateType;
#define S5 &fsm[0]
#define S6 &fsm[1]
#define S10 &fsm[2]
#define S9 &fsm[3]
StateType fsm[4]={
  { 5, S6},   // Out=0101, Next=S6
  { 6,S10},   // Out=0110, Next=S10
  {10, S9},   // Out=1010, Next=S9
  { 9, S5}};  // Out=1001, Next=S5
void main(void){ StateType *Pt;
  Timer_Init();
  DDRT = 0xFF;           // outputs
  Pt = S5;               // initial state
  while(1){              // embedded systems never quit
    PTT = Pt->Out;       // stepper out
    Timer_Wait1ms(50);   // 50ms wait
    Pt = Pt->next;       // Clockwise step
  }
}
Program 8.10 Stepper motor controller.
To illustrate how easy it is to make changes to this implementation, let’s consider these three modifications. To make it spin in the other direction, we simply change pointers to sequence in the other direction. To make it spin at a different rate, we change the wait time. To implement an eight-step sequence (the half-stepping outputs are 5, 4, 6, 2, 10, 8, 9, 1, . . .), we add the four new states and link all eight states in the desired sequence. These changes can be easily made.
Checkpoint 8.11: If the stepper motor were to have 36 steps per rotation, how fast would the motor spin using Program 8.10?
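Two of the modifications described above, reversing direction and half-stepping, can be sketched together. The example below is a hedged illustration that runs on a desktop compiler: it uses array indices instead of the pointers of Program 8.10 so that each state can link in both directions, and it leaves the port write and the delay to the caller. The structure and function names are hypothetical.

```c
/* One state of an eight-state half-stepping FSM. */
struct HalfState {
  unsigned char Out;    /* 4-bit stepper command */
  unsigned char Next;   /* index of the next state in one direction */
  unsigned char Prev;   /* index of the next state in the other direction */
};
typedef const struct HalfState HalfStateType;

/* Half-stepping sequence 5, 4, 6, 2, 10, 8, 9, 1 linked in a circle. */
HalfStateType HalfFsm[8] = {
  { 5, 1, 7},   /* Out=0101 */
  { 4, 2, 0},   /* Out=0100 */
  { 6, 3, 1},   /* Out=0110 */
  { 2, 4, 2},   /* Out=0010 */
  {10, 5, 3},   /* Out=1010 */
  { 8, 6, 4},   /* Out=1000 */
  { 9, 7, 5},   /* Out=1001 */
  { 1, 0, 6}    /* Out=0001 */
};

/* Advance one half step and return the new state index.
   forward nonzero follows Next; zero follows Prev.
   The caller would write HalfFsm[state].Out to the port and then wait. */
unsigned char HalfStep(unsigned char state, int forward){
  return forward ? HalfFsm[state].Next : HalfFsm[state].Prev;
}
```

Storing both links in each state costs one extra byte per state but lets the direction change at run time without a second table. Because each half step moves half the angle of a full step, the same wait time gives half the shaft speed.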
Checkpoint 8.12: What would you change in Program 8.10 to make the motor spin at 30 RPM?
Performance Tip: Use a DC motor for applications requiring high torque or high speed, and use a stepper motor for applications requiring accurate positioning at low speed.
Performance Tip: To get high torque at low speed, use a geared DC motor (the motor spins at high speed, but the shaft spins slowly).
8.9
Homework Problems Homework 8.1 Assume the baud rate is 9600 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “ABC” are transmitted one right after another. What is the total time to transmit the three characters. Homework 8.2 Assume the baud rate is 19200 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “125” are transmitted one right after another. What is the total time to transmit the three characters. Homework 8.3 Assume the 9S12 E clock is 8 MHz. Write an assembly language subroutine that initializes the serial port to communicate at 9600 bits/sec, 8-bit data, 1 start bit, and 1 stop bit. Homework 8.4 Sometimes it is important for the software to know when the SCI transmission is complete. The transmit complete (TC) flag is set after the data in the shift register has been transmitted. Rewrite the SCI_OutChar subroutine so that it first writes to the data register, then waits for the TC flag to be set. The TC flag is cleared by first reading the status register with TC set followed by writing into the transmit data register. Homework 8.5 Design an interface for a 64-key keyboard, which is configured with eight rows and eight columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no keys or one key pressed. Write an input subroutine that returns the key number 0 to 63 if a key is pressed or –1 if no key is pressed. Assume the keys do not bounce. Homework 8.6 Design an interface for a 20-key keyboard, which is configured with four rows and five columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no keys or one key pressed. The keys bounce with a maximum time of 1 ms. Use a periodic interrupt at rate of 2 ms, and scan the keyboard in the ISR. Set a public global variable (called Key) equal to 0 to 19 if a key is pressed or –1 if no key is pressed. 
Homework 8.7 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 500 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 2000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.
Homework 8.8 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 125 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 8000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.
Homework 8.9 Interface an electromagnetic relay (2 wires) to the 9S12 pin PP5. The coil requires 250 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called On, that activates the relay, and a subroutine, called Off, that deactivates the relay.
Homework 8.10 Interface a solenoid (2 wires) to the 9S12 pin PP5. The coil requires 100 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called Pulse, that activates the solenoid for 10 ms (then shuts off). No interrupts are needed; use Timer_Wait.
Homework 8.11 Interface a DC motor (2 wires) to the 9S12. The coil requires 500 mA at 12 V. In addition to the motor output, there are two inputs. When the Go input is high, the motor spins (when Go is low, no power is delivered). When the motor is spinning, the other input (Direction) determines the CCW/CW rotational direction. Use an L293 H-bridge driver.
Homework 8.12 There is a 9S12 digital output connected to a 9S12 digital input across a long cable. The connection has an equivalent capacitance of 25 pF into a 10 MΩ resistance. The capacitance results from the long cable, and the resistance results from the input impedance of the 9S12. What is the time constant of this system? If we operate 10 times slower than the time constant, what is the maximum period allowed for this system? List two ways to speed up this transmission.
Homework 8.13 Considering the voltages shown in Table 8.2, prove that you can connect a 9S12 output (VDD = 5 V) to a 7404 input. Similarly, prove that you cannot connect a 7404 output to a 9S12 input. Which logic family types shown in Table 8.2 allow the output of the digital gate to be connected to a 9S12 input? (By the way, if you wanted to connect a 7404 output to a 9S12 input, you could add a 1 kΩ pull-up resistor on the 7404 output to 5 V, increasing the VOH of the output.)
Homework 8.14 Interface a 12-bit DAC, the MAX539, to the 9S12 SPI port. Connect MAX539 pins 1, 2, and 3 to the 9S12 SPI. Leave pin 4 not connected. Use a REF03 to create a 2.5 V reference and connect it to the MAX539 pin 6 reference input. Pin 8 is 5 V power and pin 5 is ground. Write two functions, one to initialize and one to update the DAC analog output. Updating the DAC output will require three SPI transmissions.
Homework 8.15 Design an 8-bit PWM driver for Port P pin 5. Implement positive logic (PPOL5 equals 1) and left justified (CAE5 equals 0). There will be three functions: one to initialize the system at 1000 Hz 50% duty cycle, one to set the period, and a third function to set the duty cycle. You should fix PWMPER5 to a constant value of 250, then allow the user to modify the clock using the second function. Add comments to your software that explain how the PWM driver can be used.
Homework 8.16 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 1 rps, using Timer_Wait.
Homework 8.17 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 100 mA at 6 V. There are 36 steps per revolution. Write software that spins the motor at 10 rps, using Timer_Wait.
Homework 8.18 Interface a bipolar stepper motor (4 wires) to the 9S12 pins PT3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 5 rps, using Timer_Wait.
Homework 8.19 Interface a 32 Ω speaker (2 wires) to the 9S12 PT0. To make a sound, output a 1 kHz squarewave to the interface, creating about 1 V peak-to-peak on the speaker (about 30 mA pulsed current). Use the 5 V supply and an NPN transistor. Write a main program to activate the sound.
Homework 8.20 Write open-loop software to control power to the robot shown in Figure 8.17. Assume two copies of the TIP120 circuit from Figure 8.16 are connected to two 8-bit PWM channels. Write a Motor_Init subroutine to initialize the two PWM channels. Write a Motor_Left subroutine that adjusts delivered power to the left wheel. Write a Motor_Right subroutine that adjusts delivered power to the right wheel. Assume call by value parameters 0 to 250 in RegA for the left and right subroutines.
8.10
Laboratory Assignments
Lab 8.1 Keyboard Device Driver
Purpose: You will design the hardware interface between a keyboard and a microcomputer, create the low-level device driver, interface a single LED, and implement a keyboard security system.
Description: In this keyboard lab, you will design the keyboard interface using busy-wait synchronization. In the next chapter we will learn interrupts. Placing the key input task into a background thread frees the main program to execute other tasks while the software is waiting for the operator to type something. This security system doesn’t have anything else to do, but in a complex system, it is important to be able to perform multiple tasks. The second advantage of interrupts is the ability to create accurate time delays even with a complex software environment. In this implementation, you will use busy-wait.
One way to solve the switch-bounce problem is to wait in between scanning the keyboard. The time in between scans must be longer than the bounce time of the switch, but shorter than the total time a key is touched or released. For example, if the switch has a bounce time of 500 µs, then you could scan every 1 ms. If there is exactly one key typed and this key is different from the pattern observed at the time of the previous scan, then you will return the ASCII code. This experiment will illustrate how a parallel port of the microcomputer will be used to control a keyboard matrix. In each case your computer will drive the rows (output 0 or HiZ) and read the columns. The low-level software (inputs, scans, debounces, and saves keys in a FIFO) runs in a background periodic interrupt thread. Your system must handle two-key rollover. For example, if the operator were to type “1,2,3”, they could push “1”, push “2”, release “1”, push “3”, release “2”, then release “3”.
Low-level device drivers normally exist in the BIOS ROM and have direct access to the hardware. They provide the interface between the hardware and the rest of the software. Good low-level device drivers allow:
• New hardware to be installed
• New synchronization methods to be implemented (like changing busy-waiting to interrupts)
• New algorithms to be added (error detection, data compression)
• Higher level features to be built on top of the low level
and still maintain the same software interface. In larger systems like the workstation and IBM-PC, where the low-level I/O software is compiled and burned in ROM separate from the code that will call it, it makes sense to implement the device drivers as software traps or software interrupts (swi) and specify the calling sequence in assembly language. In this way, the routines can be called from any program without requiring complicated linking. Linking is the process of resolving addresses to code and programs that have been compiled separately. In other words, when the device driver is implemented with an swi, the linking is built into the operation of the software interrupt instruction. In embedded systems like the ones we use, it is OK to provide a source code file that the user can assemble into their application; the assembler will perform the linking.
The concept of a device driver can be illustrated with a prototype device driver. You are encouraged to modify/extend this example, and define/develop/test your own format. A prototype keyboard device driver follows. The device driver software is grouped into four categories.
1. Data structures: global, private (accessed only by the device driver, not the user)
   openFlag  Boolean that is true if the keyboard port is open
     Initially false, set to true by Key_Open, set to false by Key_Close
     Static storage (or dynamically created at bootstrap time, i.e., when loaded into memory)
2. Initialization routines (called by user)
   Key_Open  Initialization of keyboard port
     Sets openFlag to true
     Initializes hardware
     Returns an error code in RegA if unsuccessful (already open)
     Input Parameters (none)
     Output Parameter (error code)
     Typical calling sequence
          jsr  Key_Open
          tsta             ; 0 if opened correctly
          bne  error
   Key_Close  Release of keyboard port
     Sets openFlag to false
     Returns an error code in RegA if not previously open
     Input Parameters (none)
     Output Parameter (error code)
     Typical calling sequence
          jsr  Key_Close
          tsta             ; 0 if closed correctly
          bne  error
3. Regular I/O calls (called by user to perform I/O)
   Key_In  Input an ASCII character from the keyboard port
     Waits for a key to be pressed, then waits for it to be released (there is bounce and two-key rollover)
     Returns data in RegB if successful
     Returns an error code in RegA if unsuccessful: device not open, hardware failure (probably not applicable here)
     Output Parameters: RegB is data, RegA is error code
     Typical calling sequence
          jsr  Key_In
          tsta             ; 0 if input is OK
          bne  error
          stab data        ; save new key data
   Key_Status  Returns the status of the keyboard port
     Returns a true in RegA if a call to Key_In would return with a key
     Returns a false in RegA if a call to Key_In would not return right away, but rather it would wait
     Returns a true if device not open, hardware failure (probably not applicable here)
     Typical calling sequence
     loop jsr  work        ; perform work until key is typed
          jsr  Key_Status
          tsta             ; true if a key is typed
          beq  loop
          jsr  Key_In      ; read and process the key
4. Support software (private code). If you have any helper functions, these would be considered local to your driver and would be placed in this category. In C++, these helper functions would be defined as private. In C, we could define the helper functions in the .c file, but not place a prototype in the .h file. In this way, the function could only be called from functions in the .c implementation, and not by the user. In assembly language we are very careful not to call a helper function from outside the device driver. An interrupt service routine is an example of support software.
a) Create an I/O window and build a keyboard similar to the one shown in Figure L8.1.
b) Write the low-level keyboard device driver. The main program will implement an access code based security system. Each access code will consist of four digits from 0 to 9.
Figure L8.1 0-9 keyboard with up arrow, down arrow, 2nd, CLEAR, HELP, and ENTER. (Top and bottom views; the connector wires are on 0.1-inch centers.)
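The open/close bookkeeping of the prototype driver can be sketched in C. This is only a minimal sketch, not the lab solution: the error-code names and values are hypothetical, the error code is returned as a function result rather than in RegA, and the hardware initialization is left as a comment because it depends on how your keyboard is wired. Declaring openFlag as static keeps it private to the driver file, which is the C counterpart of the "global, private" data structure in category 1.

```c
#include <stdint.h>

/* Hypothetical error codes; the lab only requires 0 for success. */
#define KEY_OK            0
#define KEY_ERR_OPEN      1   /* Key_Open called while already open   */
#define KEY_ERR_NOT_OPEN  2   /* Key_Close called while not open      */

/* Private to the driver: true (1) if the keyboard port is open. */
static uint8_t openFlag = 0;

/* Open the keyboard port; returns 0 if opened correctly. */
uint8_t Key_Open(void){
  if(openFlag){
    return KEY_ERR_OPEN;      /* already open */
  }
  /* hardware initialization (port direction registers) would go here */
  openFlag = 1;
  return KEY_OK;
}

/* Release the keyboard port; returns 0 if closed correctly. */
uint8_t Key_Close(void){
  if(!openFlag){
    return KEY_ERR_NOT_OPEN;  /* not previously open */
  }
  openFlag = 0;
  return KEY_OK;
}
```

A caller tests the return value the same way the assembly calling sequence tests RegA: zero means success, nonzero means error.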
The security system can recognize up to five access codes. You will specify these codes in global memory. The keyboard will be used to enter access codes. If the last four keys typed match one of the valid codes, checked by searching the access code database, the single LED is turned on. The LED will remain on until the next key is typed. The main program will need its own data structure to hold the last four keys typed. Assume “1257” and “2222” are valid codes. The following example shows the LED status (0 is off, 1 is on) after each key hit.
   Key  1 2 1 2 5 7 8 9 2 2 2 2 2 2 6 1 2 5 7 4
   LED  0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 0
Write a main program to test the keyboard device driver. Collect some latency measurements (time from key touch to return of Key_In).
c) Build the LED display, writing a simple device driver that allows you to turn the LED on and off.
d) Write a main program that implements the security system.
Lab 8.2 Input/Output Interface to a Stepper Motor
Purpose: The purpose of this laboratory is to develop a microcomputer system that spins a stepper motor.
Description:
a) Design the interface between the stepper and the 9S12. Use the simulator to create three files. Stepper.rtf will contain the assembly source code. Stepper.uc will contain the microcomputer configuration. Stepper.io will define the external connections. You should specify the microcomputer and attach one switch and the four signals to the stepper motor controller. The four stepper motor signals are called B, B̄, A, and Ā.
b) You will write assembly code that inputs from the switch, and outputs to the stepper. When the input switch is in the “off” or open position, Port A bit 0 will be “0”. For this situation, your software will not change the Port B stepper motor outputs. When the input switch is in the “on” or closed position, Port A bit 0 will be “1”. In this case, your software will output the sequence 5,6,10,9,5,6,10,9, . . . over and over again to the stepper motor.
The motor will turn 1.8° for every new output to Port B. Instead of a stepper motor, the four outputs will be connected to four LEDs. The following C program describes the software algorithm.
Program L8.2 The C program to illustrate Lab 8.2.
    unsigned char Angle;    // ranges from 0 to 199
    void main(void){
      Angle = 0;            // initialize global
      DDRA = 0;             // make Port A inputs
      DDRB = 0xFF;          // make Port B outputs
      while(1){
        while((PORTA&0x01)==0){};  // stop if PA0=0, continue if PA0=1
        PORTB = 5;  Angle++;
        PORTB = 6;  Angle++;
        PORTB = 10; Angle++;
        PORTB = 9;  Angle++;
        if(Angle==200) Angle = 0;  // wrap after one revolution
      }
    }
The software variable Angle varies from 0 to 199 as the stepper motor angle varies from 0 to 358.2°.
c) During the demonstration, you will be asked to run the program to verify proper operation. Be prepared to use the debugger to determine how fast the simulated motor is spinning. Each output to Port B causes a 1.8° step.
Lab 8.3 Calculator
Purpose: The objectives of this lab are to:
• Interface a matrix keyboard and HD44780 LCD display to the microcomputer
• Write device drivers for the keyboard and HD44780 LCD display
• Implement a four-function integer calculator
Description: In this lab you will design a four-function 8-bit unsigned integer calculator. The matrix keypad will include the numbers ‘0’ to ‘9’, and the keys ‘+’, ‘-’, ‘*’, ‘/’, ‘=’ and ‘C’. The HD44780 LCD display will show both an 8-bit global accumulator and an 8-bit temporary register. You are free to design the calculator functionality in any way you wish, but you must be able to: (1) clear the accumulator and temporary; (2) type numbers in using the matrix keyboard; (3) add, subtract, multiply, and divide; (4) display the results on the HD44780 LCD display. Recall that a device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer and I/O files. Attach a 16-key matrix keyboard and HD44780 display. You can assume the matrix keyboard does not bounce. During the initial debugging stages of the lab, you may disable the HD44780 busy flag, but your final demonstration will have to include the realistic timing for the LCD.
b) Write a device driver for the HD44780. You should be able to: (1) initialize the interface; (2) clear the display; (3) output a character; (4) output an 8-bit integer; and (5) output a string. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.
d) Write the main program that implements the calculator functionality. Include a “call-graph” of the system.
Lab 8.4 Stepper Motor Controller
Purpose: The objectives of this lab are to:
• Interface a matrix keyboard, an LCD display and a stepper motor to the microcomputer
• Write device drivers for the keyboard, LCD display and stepper motor
• Implement a stepper motor controller
Description: In this lab you will design a simple stepper motor controller. The matrix keypad will include the numbers ‘0’ to ‘9’, and the letters ‘c’ and ‘g’. To move the motor, the operator types in the desired angle (0 to 359), then hits the ‘g’ key. As the operator enters the numbers, the digits are displayed on the three-digit LCD. If the operator types ‘c’, the command is cleared, and no motion occurs. The system should move clockwise or counterclockwise, whichever is fewer steps. While the motor is moving, the three-digit LCD display will show the current angle of the stepper motor (0 to 359). Recall that a device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer and I/O files. Attach a 12-key matrix keyboard, a three-digit LCD display and one stepper motor. You can assume the matrix keyboard does not bounce.
b) Write a device driver for the 3-digit LCD. You should be able to initialize the interface and output an angle as a number from 0 to 359. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.
d) Write a device driver for the stepper interface. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the stepper motor must be included in this driver. The names of all the public driver subroutines should start with the letters “Step_”. Draw flowcharts of these subroutines.
e) Write the main program that implements the stepper motor controller functionality. Include a “call-graph” of the system.
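The requirement in Lab 8.4 to move "clockwise or counterclockwise, whichever is fewer steps" comes down to a small piece of modular arithmetic. The following C function is one possible sketch, working in degrees rather than motor steps. The function name ShortestTurn is hypothetical, and the choice that a positive result means clockwise is an assumption; your hardware may define the directions the other way.

```c
#include <stdint.h>

/* Signed shortest rotation, in degrees, from current to target.
   Both inputs are assumed to be in the range 0 to 359.
   Result is in the range -179 to +180:
     positive = rotate in one direction (assumed clockwise),
     negative = rotate in the other direction. */
int16_t ShortestTurn(int16_t current, int16_t target){
  int16_t diff = (int16_t)((target - current + 360) % 360); /* 0 to 359 */
  if(diff > 180){
    diff = (int16_t)(diff - 360);  /* going the other way is shorter */
  }
  return diff;
}
```

For example, moving from 350 to 10 gives +20 rather than -340. When the two directions are equally long (a 180 degree move), this sketch arbitrarily picks the positive direction.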
9
Interrupt Programming and Real-Time Systems
Chapter 9 objectives are to:
• Explain the fundamentals of interrupt programming
• Introduce interrupt-driven I/O, and implement periodic interrupts
• Explain key wakeup interrupts and use them to interface individual switches
• Present the timer-based modules needed for real-time systems
• Use the pulse accumulator and input capture to measure period and pulse width
• Develop methods to debug real-time events
An embedded system uses its input/output devices to interact with the external world. Input devices allow the computer to gather information, and output devices can display information. Output devices also allow the computer to manipulate its environment. The tight coupling between the computer and external world distinguishes an embedded system from a regular computer system. The challenge is that in most situations the software executes much faster than the hardware. For example, the software may ask the hardware to clear the LCD display, but within the hardware this action might take 1 ms to complete. During this time, the software could execute thousands of instructions. Therefore, the synchronization between the executing software and its external environment is critical for the success of an embedded system. This chapter begins with an overview of I/O synchronization. We then present general concepts about interrupts, and specific details for the 9S12. We will then use periodic interrupts to cause a software task to be executed on a periodic basis. This chapter describes the timer-based modules used to design real-time embedded systems.
9.1
I/O Synchronization
Latency is the time between when the I/O device needs service, and the time when service is initiated. Latency includes hardware delays in the digital hardware plus computer software delays. For an input device, software latency (or software response time) is the time between new input data ready and the software reading the data. For an output device, latency is the delay between the output device becoming idle and the software giving the device new data to output. In this book, we will also have periodic events. For example, in our data acquisition systems, we wish to invoke the analog to digital converter (ADC) at a fixed time interval. In this way we can collect a sequence of digital values that approximate the continuous analog signal. Software latency in this case is the time between when
the ADC is supposed to be started, and when it is actually started. The microcomputer-based control system also employs periodic software processing. Similar to the data acquisition system, the latency in a control system is the time between when the control software is supposed to be run, and when it is actually run. A real-time system is one that can guarantee a worst-case latency. In other words, the software response time is small and bounded.
Throughput or bandwidth is the maximum data flow in bytes/second that can be processed by the system. Sometimes the bandwidth is limited by the I/O device, while other times it is limited by computer software. Bandwidth can be reported as an overall average or a short-term maximum. Priority determines the order of service when two or more requests are made simultaneously. Priority also determines if a high priority request should be allowed to suspend a low priority request that is currently being processed. We may also wish to implement equal priority, so that no one device can monopolize the computer. In some computer literature, the term “soft real-time” is used to describe a system that supports priority.
The purpose of our interface is to allow the microprocessor to interact with its external I/O device. There are five mechanisms to synchronize the microprocessor with the I/O device. Each mechanism synchronizes the I/O data transfer to the busy to done transition. The methods are discussed in the following paragraphs.
Blind cycle is a method where the software simply waits a fixed amount of time and assumes the I/O will complete before that fixed delay has elapsed. For an input device, the software triggers (starts) the external input hardware, waits a specified time, then reads data from the device, see the left part of Figure 9.1. For an output device, the software writes data to the output device, triggers (starts) the device, then waits a specified time. We call this method blind, because there is no status information about the I/O device reported to the computer software. It is appropriate to use this method in situations where the I/O delay is short and predictable. One appropriate application of blind cycle synchronization is an ADC. For example, we can ask the ADC to convert, wait exactly 7 µs, then read the digital result. This method works because the ADC conversion time is short and predictable. Another good example of blind cycle synchronization is spinning a stepper motor. If we repeat this 8-step sequence over and over: (1) output a 0x05, (2) wait 1 ms, (3) output a 0x06, (4) wait 1 ms, (5) output a 0x0A, (6) wait 1 ms, (7) output a 0x09, (8) wait 1 ms, the motor will spin at a constant speed. The LCD interface developed in Section 8.5 utilized blind cycle synchronization.
Busy waiting is a software loop that checks the I/O status waiting for the done state. For an input device, the software waits until the input device has new data, then reads it from the input device, see the middle part of Figure 9.1. For an output device, the software writes data, triggers the output device, then waits until the device is finished. Another approach to output device interfacing is for the software to wait until the output device has finished the previous output, write data, then trigger the device. Busy-wait synchronization will be used in situations where the software system is relatively simple and real-time response is not important. The ADC could also have been interfaced with busy-wait synchronization. For example, we can ask the ADC to convert, wait until the sequence conversion flag (SCF) in the ADC is set, then read the digital result.
An interrupt uses hardware to cause special software execution. With an input device, the hardware will request an interrupt when the input device has new data. The software interrupt service will read from the input device and save the data in a global structure, see the right part of Figure 9.1. With an output device, the hardware will request an interrupt when the output device is idle. The software interrupt service will get data from a global structure, then write to the device. Sometimes we configure the hardware timer to request interrupts on a periodic basis. The software interrupt service will perform a special function. For example, a data acquisition system needs to read the ADC at a regular rate. The 9S12 microcomputer will execute special software (trap) when it tries to execute an illegal instruction. Other computers can be configured to request an interrupt on an access to an illegal address or a
divide by zero. The Freescale microcomputers do not provide for a divide by zero trap, but many computers do. Interrupt synchronization will be used in situations where the system is fairly complex (e.g., a lot of I/O devices) or when real-time response is important.
Periodic polling uses a clock interrupt to periodically check the I/O status. At the time of the interrupt the software will check the I/O status, performing actions as needed. With an input device, a ready flag is set when the input device has new data. At the next periodic interrupt after an input flag is set, the software will read the data and save them in a global structure. With an output device, a ready flag is set when the output device is idle. At the next periodic interrupt after an output flag is set, the software will get data from a global structure, and write it. Periodic polling will be used in situations that require interrupts, but the I/O device does not support interrupt requests directly.
DMA, or direct memory access, is an interfacing approach that transfers data directly to/from memory. With an input device, the hardware will request a DMA transfer when the input device has new data. Without the software’s knowledge or permission the DMA controller will read data from the input device and save it in memory. With an output device, the hardware will request a DMA transfer when the output device is idle. The DMA controller will get data from memory, then write it to the device. Sometimes we configure the hardware timer to request DMA transfers on a periodic basis. DMA can be used to implement a high-speed data acquisition system. DMA synchronization will be used in situations where high bandwidth and low latency are important.
One can think of the hardware being in one of three states. The idle state is when the device is disabled or inactive. No I/O occurs in the idle state. When active (not idle) the hardware toggles between the busy and ready states. The interface includes a flag specifying either busy (0) or ready (1) status.
• The hardware will set the flag when the hardware component of the I/O operation is complete.
• The software can read the flag to determine if the device is busy or ready.
• The software can clear the flag, signifying the software component is complete.
• This flag serves as the hardware trigger event for an interrupt.
For an input device, a status flag is set when new input data is available. The “busy to ready” state transition will cause a busy-wait loop to complete, see the middle of Figure 9.1. Once the software recognizes the input device has new data, it will read the data and ask the input device to create more data. It is the busy to ready state transition that signals to the computer that service is required. When the hardware is in the done state the I/O transaction is complete. Often the simple process of reading the data will clear the flag and request another input.
Figure 9.1 The input device sets a flag when it has new data.
[Figure 9.1 shows three input flowcharts: Blind Cycle (wait a fixed time, read data), Busy-Wait (check status until ready, read data), and Interrupt (the interrupt service reads data and puts it in a FIFO; the main program gets data from the FIFO).]
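The busy-wait flag protocol (hardware sets the flag, software polls it, reads the data, and clears the flag to request more) can be sketched on a desktop compiler with a simulated device. Nothing here is a real 9S12 register: the InputDevice structure, its fields, and the function names are all hypothetical, and the device "runs" by being ticked once per poll so the example stays self-contained.

```c
#include <stdint.h>

/* Hypothetical simulated input device (not a 9S12 register map). */
typedef struct {
  uint8_t  flag;       /* 0 = busy, 1 = ready (new data available)  */
  uint8_t  data;       /* most recent input value                   */
  uint8_t  next;       /* next value the device will produce        */
  uint32_t busyTicks;  /* polls remaining until the device is ready */
  uint32_t period;     /* polls needed to create each new input     */
} InputDevice;

/* Start the device; it is busy creating its first input. */
void Device_Start(InputDevice *dev, uint32_t period, uint8_t first){
  dev->flag = 0;
  dev->data = 0;
  dev->next = first;
  dev->busyTicks = period;
  dev->period = period;
}

/* Models the hardware running in parallel: one call per software poll. */
static void Device_Tick(InputDevice *dev){
  if(dev->flag == 0){
    if(dev->busyTicks > 0) dev->busyTicks--;
    if(dev->busyTicks == 0){
      dev->data = dev->next++;  /* new input data is available     */
      dev->flag = 1;            /* busy to ready transition        */
    }
  }
}

/* Busy-wait input: poll the flag, read the data, clear the flag.
   Clearing the flag also asks the device to create more data.
   *polls (if not null) receives how many times the flag was checked. */
uint8_t Device_InBusyWait(InputDevice *dev, uint32_t *polls){
  uint32_t count = 0;
  while(dev->flag == 0){        /* software waits while device is busy */
    Device_Tick(dev);
    count++;
  }
  uint8_t value = dev->data;    /* read data                        */
  dev->flag = 0;                /* software component is complete   */
  dev->busyTicks = dev->period; /* device starts the next input     */
  if(polls) *polls = count;
  return value;
}
```

The poll count returned through polls is a rough stand-in for the time the software spends waiting; on real hardware the loop would simply read a status register until the flag bit is set.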
The problem with I/O devices is that they are usually much slower than software execution. Therefore, we need synchronization, which is the process of the hardware and software waiting for each other in a manner such that data is properly transmitted. A way to visualize this synchronization is to draw a state versus time plot of the activities of the hardware and software. For an input device, the software begins by waiting for new input. When the input
device is busy it is in the process of creating new input. When the input device is ready, new data is available. When the input device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software accepts the input, it can release the input device hardware. The arrows in Figure 9.2 represent the synchronizing events. In this example, the time for the software to read and process the data is less than the time for the input device to create new input. This situation is called I/O bound, meaning the bandwidth is limited by the speed of the I/O hardware. Figure 9.2 The software must wait for the input device to be ready.
[Figure 9.2 is a state versus time plot: the input device alternates between Busy and Ready, while the software alternates between Wait and Read/Process.]
If the input device were faster than the software, then the software waiting time would be zero. This situation is called CPU bound (meaning the bandwidth is limited by the speed of the executing software). From this figure we can see that the bandwidth depends on both the hardware and the software. The busy-wait method is classified as unbuffered because the hardware and software must wait for each other during the transmission of each piece of data. The interrupt solution (shown in the right part of Figure 9.1) is classified as buffered, because the system allows the input device to run continuously, filling a FIFO with data as fast as it can. In the same way, the software can empty the buffer whenever it is ready and whenever there is data in the buffer. We will implement a buffered interface for the serial port input in Chapter 12 using interrupts. For an output device, a status flag is set when the output is idle and ready to accept more data. The “busy to ready” state transition causes a busy-wait loop to complete, see the middle part of Figure 9.3. Once the software recognizes the output is idle, it gives the output device another piece of data to output. It will be important to make sure the software clears the flag each time new output is started. Figure 9.3 The output device sets a flag when it has finished outputting the last data.
[Figure 9.3 shows three output flowcharts: Blind Cycle (write data, wait a fixed time), Busy-Wait (check status until ready, write data), and Interrupt (the main program puts data in a FIFO when it is not full; the interrupt service gets data from the FIFO when it is not empty and writes it to the output).]
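The I/O bound and CPU bound cases described above can be put in numbers with a simplified model. The sketch below assumes an unbuffered interface in which the device creates its next piece of data while the software processes the current one, so each transfer takes as long as the slower of the two. That overlap is an assumption made for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Time the software spends waiting on each transfer.
   Both arguments use the same time units.
   Zero means the software never waits (CPU bound). */
uint32_t WaitTime(uint32_t deviceTime, uint32_t softwareTime){
  return (deviceTime > softwareTime) ? (deviceTime - softwareTime) : 0;
}

/* Transfers per second for the simplified unbuffered model.
   Times are in microseconds; the slower side sets the rate. */
uint32_t Bandwidth(uint32_t deviceTime_us, uint32_t softwareTime_us){
  uint32_t period = (deviceTime_us > softwareTime_us)
                      ? deviceTime_us : softwareTime_us;
  if(period == 0) return 0;     /* avoid dividing by zero */
  return 1000000u / period;
}
```

For instance, a device that needs 1000 µs per input paired with software that needs 200 µs is I/O bound: the software waits 800 µs per transfer and the bandwidth is 1000 transfers per second. Swapping the two times makes the system CPU bound with zero waiting and the same bandwidth.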
Figure 9.4 contains a state versus time plot of the activities of the output device hardware and software. For an output device, the software begins by generating data then sending it to the output device. When the output device is busy it is processing the data. Normally when the software writes data to an output port, that only starts the output process. The time it takes an output device to process data is usually longer than the software execution time. When the output device is done, it is ready for new data. When the output device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software writes data to the output, it releases the output device hardware. The output
9 䡲 Interrupt Programming and Real-Time Systems
Figure 9.4 The software must wait for the output device to finish the previous operation. [State versus time plot: the output device alternates between Busy and Ready while the software cycles through Generate, Wait, and Write.]
interface illustrated in Figure 9.4 is also I/O bound because the time for the output device to process data is longer than the time for the software to generate and write it. The arrows in Figure 9.4 signify the synchronizing events. Again, I/O bound means the bandwidth is limited by the speed of the I/O hardware. The busy-wait solution for this output interface is also unbuffered, because when the hardware is done, it will wait for the software and after the software generates data, it waits for the hardware. On the other hand, the interrupt solution (shown as the right part of Figure 9.3) is buffered, because the system allows the software to run continuously, filling a FIFO as fast as it wishes. In the same way, the hardware can empty the buffer whenever it is ready and whenever there is data in the FIFO. We will implement a buffered interface for the serial port output in Chapter 12 using interrupts.
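The buffered interface described above depends on a FIFO queue shared by the foreground thread and the ISR. The following is a minimal sketch only, not the Chapter 12 implementation; the names Fifo_Put, Fifo_Get, and FIFO_SIZE are assumptions chosen for illustration.

```c
#include <stdint.h>

#define FIFO_SIZE 8u            /* assumed capacity; usable slots = FIFO_SIZE-1 */

static uint8_t Fifo[FIFO_SIZE];
static uint8_t PutI = 0;        /* index where the next byte is stored */
static uint8_t GetI = 0;        /* index of the oldest byte */

/* Store one byte; returns 1 on success, 0 if the FIFO is full. */
int Fifo_Put(uint8_t data){
  uint8_t next = (uint8_t)((PutI + 1u) % FIFO_SIZE);
  if(next == GetI) return 0;    /* full: caller must wait or discard */
  Fifo[PutI] = data;
  PutI = next;
  return 1;
}

/* Remove the oldest byte; returns 1 on success, 0 if the FIFO is empty. */
int Fifo_Get(uint8_t *data){
  if(GetI == PutI) return 0;    /* empty: nothing to output */
  *data = Fifo[GetI];
  GetI = (uint8_t)((GetI + 1u) % FIFO_SIZE);
  return 1;
}
```

For a buffered output interface, the main program would call Fifo_Put and the output ISR would call Fifo_Get. On a real 9S12 the accesses to the shared indexes would also have to be considered for critical sections.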
9.2 Interrupt Concepts
9.2.1 Introduction
An interrupt is the automatic transfer of software execution in response to a hardware event that is asynchronous with the current software execution. This hardware event is called a trigger. The hardware event can be either a busy-to-ready transition in an external I/O device (like the SCI input/output) or an internal event (like an op code fault, memory fault, power failure, or a periodic timer). When the hardware needs service, signified by a busy-to-ready state transition, it will request an interrupt by setting its trigger flag.
A thread is defined as the path of action of software as it executes. The execution of the interrupt service routine is called a background thread. This thread is created by the hardware interrupt request and is killed when the interrupt service routine executes the rti instruction. A new thread is created for each interrupt request. It is important to consider each individual request as a separate thread because local variables and registers used in the interrupt service routine are unique and separate from one interrupt event to the next interrupt. In a multithreaded system, we consider the threads as cooperating to perform an overall task. Consequently we will develop ways for the threads to communicate (e.g., FIFO) and synchronize with each other. Most embedded systems have a single common overall goal. On the other hand, general-purpose computers can have multiple unrelated functions to perform. A process is also defined as the action of software as it executes. Processes do not necessarily cooperate towards a common shared goal. Threads share access to I/O devices, system resources, and global variables, while processes have separate global variables and system resources. Processes do not share I/O devices.
The software has dynamic control over aspects of the interrupt request sequence. First, each potential interrupt trigger has a separate arm bit that the software can activate or deactivate. The software will set the arm bits for those devices it wishes to accept interrupts from, and will deactivate the arm bits within those devices from which interrupts are not to be allowed. In other words it uses the arm bits to individually select which devices will and which devices will not request interrupts. The second aspect that the software controls is the interrupt enable bit, I, which is in the condition code register. The software can enable interrupts by making I = 0, or it can disable interrupts by setting I = 1. An interrupt occurs only when all three conditions are met: trigger, arm and enable. The disabled interrupt state
(I = 1) does not dismiss the interrupt requests, rather it postpones them until a later time, when the software deems it convenient to handle the requests. We will pay special attention to these enable/disable software actions. In particular we will need to disable interrupts when executing nonreentrant code, but disabling interrupts will have the effect of increasing the response time of software.
The interrupt service routine (ISR) is the software module that is executed when the hardware requests an interrupt. There may be one large ISR that handles all requests (polled interrupts), or many small ISRs specific for each potential source of interrupt (vectored interrupts). The design of the interrupt service routine requires careful consideration of many factors. Three conditions must be true for an interrupt to be generated. A device must be armed (e.g., RIE is set), interrupts must be enabled (I = 0), and an external event must occur setting a trigger flag (e.g., new SCI input ready sets RDRF).
An interrupt causes the following sequence of events. First, the current instruction is finished. There are exceptions to this rule: the 9S12 instructions rev, revw, and wav take a long time to execute, hence these three instructions can be interrupted in the middle of their execution. Second, the execution of the main program is suspended, pushing all the registers on the stack. Third, the PC is loaded with the address of the ISR (vector). Lastly, interrupts are disabled (I = 1). These four steps, called a context switch, occur automatically in hardware as the context is switched from foreground to background. Next, the software executes the ISR. When the ISR is done it executes an rti causing the main program execution to be resumed. When the microcomputer accepts an interrupt request, it will automatically save the execution state of the main thread by pushing the registers (CCR, A, B, X, Y, and PC) on the stack.
After the ISR provides the necessary service, it will execute an rti instruction. This instruction pulls these registers from the stack, which returns control to the main program. Since all threads use the same stack pointer, it is imperative that the ISR software balance the stack before exiting via the rti instruction. Execution of the main program will then continue with the exact stack and register values that existed before the interrupt. Although interrupt handlers can create and use local variables, parameter passing between threads must be implemented using shared global memory variables. A private global variable can be used if an interrupt thread wishes to pass information to itself, e.g., from one interrupt instance to another. The execution of the main program is called the foreground thread, and the executions of the various interrupt service routines are called background threads.
An axiom with interrupt synchronization is that the interrupt program should execute as fast as possible. The interrupt should occur when it is time to perform a needed function, and the interrupt service routine should perform that function, and return right away. Placing backward branches (busy-waiting loops, iterations) in the interrupt software should be avoided if possible. The percentage of time spent executing interrupt software should be minimized.
For an input device, the interface latency of an interrupt-driven input device is the time between when new input is available, and the time when the software reads the input data. We can also define device latency as the response time of the external I/O device. For example, if we request that a certain sector be read from a disk, then the device latency is the time it takes to find the correct track and spin the disk (seek) so the proper sector is positioned under the read head.
For an output device, the interface latency of an interrupt-driven output device is the time between when the output device is idle, and the time when the software writes new data. A real-time system is one that can guarantee a worst case interface latency.
Many factors should be considered when deciding the most appropriate mechanism to synchronize hardware and software. One should not always use busy-waiting because one is too lazy to implement the complexities of interrupts. On the other hand, one should not always use interrupts because they are fun and exciting. Busy-waiting synchronization is appropriate when the I/O timing is predictable, and when the I/O structure is simple and fixed. Busy-waiting should be used for dedicated single thread systems where there is
nothing else to do while the I/O is busy. Interrupt synchronization is appropriate when the I/O timing is variable, and when the I/O structure is complex. In particular, interrupts are efficient when there are I/O devices with different speeds. Interrupts allow for quick response times to important events. In particular, using interrupts is one mechanism to design real-time systems, where the interface latency must be short and bounded. They can also be used for infrequent but critical events like power failure, memory faults, and machine errors. Interrupts can be used to assist program development by triggering on stack overflow, invalid op code, and breakpoints. Periodic interrupts will be useful for real-time clocks, data acquisition systems, and control systems. For extremely high bandwidth and low latency interfaces, DMA should be used.
An atomic operation is a sequence that once started will always finish, and can not be interrupted. Most instructions on the 9S12 are atomic. The exceptions are wav, rev, and revw, which can be suspended to process an interrupt. If we wish to make a section of code atomic, we can run that code with I = 1. In this way, interrupts will not be able to break apart the sequence. In particular, to implement an atomic operation we will (1) save the current value of the CCR, (2) disable interrupts, (3) execute the operation, and (4) restore the CCR back to its previous value.
Checkpoint 9.1: What three conditions must be true for an interrupt to occur?
Checkpoint 9.2: How do you enable interrupts?
Checkpoint 9.3: What are the steps that occur when an interrupt is processed?
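The four steps of an atomic operation can be sketched in C as follows. This is a host-side model under stated assumptions: the CCR is simulated by an ordinary variable, and the names StartCritical, EndCritical, and Counter_AtomicAdd are illustrative rather than taken from a 9S12 library. On real hardware, steps 1, 2, and 4 would be done with CCR-manipulating instructions.

```c
#include <stdint.h>

#define CCR_I 0x10u             /* I bit is bit 4 of the 9S12 CCR */

static uint8_t  CCR = 0x00;     /* simulated condition code register (assumption) */
static uint16_t Counter = 0;    /* shared global that must be updated atomically */

/* Steps 1 and 2: remember the old CCR, then set I = 1 to disable interrupts. */
static uint8_t StartCritical(void){
  uint8_t old = CCR;
  CCR |= CCR_I;
  return old;
}

/* Step 4: restore the CCR to exactly the value saved by StartCritical. */
static void EndCritical(uint8_t old){
  CCR = old;
}

/* Step 3 (the operation itself) runs between the save and the restore. */
void Counter_AtomicAdd(uint16_t n){
  uint8_t saved = StartCritical();
  Counter = (uint16_t)(Counter + n);   /* read-modify-write with I = 1 */
  EndCritical(saved);
}
```

Restoring the saved CCR, rather than unconditionally clearing I, means the routine leaves interrupts disabled if they were already disabled when it was called.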
9.2.2 Essential Components of Interrupt Processing
In this section, we will present the specific details for the 9S12 microcomputers. As you develop experience using interrupts, you will come to notice a few common aspects that most computers share. The following paragraphs outline three essential mechanisms that are needed to utilize interrupts. Although every computer that uses interrupts includes all three mechanisms, there is a wide spectrum of implementation methods.
All interrupting systems must have the ability for the hardware to request action from the computer. The interrupt requests can be generated using a separate connection to the microprocessor for each device, or using shared negative logic wire-or requests with open collector logic. The shared interrupt request line on the 9S12 is IRQ, which is on the PE1 pin. The XIRQ line on the PE0 pin can also be shared, but XIRQ is usually reserved for catastrophic errors. The Freescale microcomputers support both types.
All interrupting systems must have the ability for the computer to determine the source. A vectored interrupt system employs separate connections for each device so that the computer can give automatic resolution. You can recognize a vectored system because each device has a separate interrupt vector address. With a polled interrupt system, the interrupt software must poll each device, looking for the device that requested the interrupt.
The third necessary component of the interface is the ability for the computer to acknowledge the interrupt. Normally there is a trigger flag in the interface that is set on the busy to ready state transition. In essence this trigger flag is the cause of the interrupt. Acknowledging the interrupt involves clearing this flag. It is important to shut off the request, so that the computer will not mistakenly request a second (and inappropriate) interrupt service for the same condition. Some Intel systems use a hardware acknowledgment that automatically clears the request.
Most Freescale microcomputers use a software acknowledge. So when designing an interrupting interface on the 9S12, it will be important to know exactly what hardware conditions will set the trigger flag (and request an interrupt) and how the software will clear it (acknowledge) in the ISR. There are no standard definitions for the terms mask, enable, and arm in the professional, Computer Science, or Computer Engineering communities. Nevertheless, in this book we will adhere to the following specific meanings. To arm (disarm) a device means to enable (shut off) the source of interrupts. Each potential interrupting device has a separate arm bit. One arms (disarms) a device if one is (is not) interested in interrupts from
this source. For example, the 9S12 TIE register has eight arm bits for the output compare and input capture interrupts. The Freescale literature calls the arm bit an “interrupt enable mask”. To enable (disable) means to allow interrupts at this time (postponing interrupts until a later time). On the 9S12 there is one interrupt enable bit for the entire interrupt system. We disable interrupts if it is currently not convenient to accept interrupts. In particular, to disable interrupts we set the I bit in the 9S12 condition code register using the sei instruction.
The software interrupt (swi) instruction and illegal instruction trap can not be disarmed or disabled. The XIRQ interrupt can be enabled by clearing the X bit in the CCR, but XIRQ interrupts can not be disabled. In particular, once cleared, the software can not set the X bit. The reset line will halt execution and load the PC with the 16-bit contents at $FFFE, but does not save the current state by pushing registers on the stack. Reset can’t be disarmed or disabled.
Common Error: The system will crash if the interrupt service routine doesn’t either acknowledge or disarm the device requesting the interrupt.
Common Error: The ISR software doesn’t have to explicitly disable interrupts at the beginning (sei) or explicitly reenable interrupts at the end (cli). The disabling and enabling occur automatically.
9.2.3 Sequence of Events
The sequence of events begins with the Hardware needs service (busy to done) transition. This signal is connected to an input of the microcomputer that can generate an interrupt. For example, the key wakeup, input capture, serial communication interface (SCI) and serial peripheral interface (SPI) systems support interrupt requests. Some interrupts are internally generated like output compare, real-time interrupt (RTI), and timer overflow. The second event is the setting of a trigger flag in one of the I/O status registers of the microcomputer. This is the same flag that a busy-waiting interface would be polling on. Examples include the key wakeup (KWIFJn), serial communication interface (RDRF and TDRE), output compare (CnF), real-time interrupt (RTIF), and timer overflow (TOF). In order for an interrupt to be requested the appropriate trigger flag bit must be armed. Examples include the key wakeup (KWIEJn), serial communication interface (RIE and TIE), output compare (CnI), real-time interrupt (RTII), and timer overflow (TOI).
In summary, three conditions must be met simultaneously for an interrupt service to occur. These three conditions can occur in any order.
1. A device is armed (e.g., C3I = 1)
2. Microcomputer interrupts are enabled (I = 0)
3. An interrupting event occurs that sets the trigger flag (e.g., C3F = 1)
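The three conditions listed above can be written as a single Boolean test. The sketch below is an illustration only: the function name and its parameters are assumptions, and the flag, arm, and CCR values are passed in as plain bytes instead of being read from 9S12 registers.

```c
#include <stdint.h>

#define CCR_I 0x10u   /* I bit in the CCR: 1 = disabled, 0 = enabled */

/* Returns 1 when the trigger flag, the arm bit, and the global enable
   all allow the interrupt selected by mask (e.g., 0x08 for channel 3). */
int InterruptRequested(uint8_t flags, uint8_t arms, uint8_t ccr, uint8_t mask){
  int triggered = (flags & mask) != 0;   /* e.g., C3F = 1 */
  int armed     = (arms  & mask) != 0;   /* e.g., C3I = 1 */
  int enabled   = (ccr & CCR_I) == 0;    /* I = 0 */
  return triggered && armed && enabled;
}
```

Because the test is a pure AND of the three conditions, the order in which they become true does not matter, which matches the statement in the text.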
The third event in the interrupt processing sequence is the context switch, or thread-switch. The thread-switch is performed by the microcomputer hardware automatically. First, the microcomputer will finish the current instruction (rev, revw, and wav are interruptible). After the current instruction is complete, it takes 9 more bus cycles on the 9S12 to perform the thread-switch:
1. The 16-bit interrupt vector address is read (eventually this is loaded into the PC)
2. The PC is pushed (return address)
3. The first of three op code fetches is performed to fill the instruction queue
4. Register Y is pushed on the stack
5. Register X is pushed on the stack
6. The second of three op code fetches is performed to fill the instruction queue
7. Registers B and A are pushed on the stack (RegD is pushed little endian)
8. The CCR is pushed, with the I bit still equal to 0, then sets I = 1
9. The third of three op code fetches is performed to fill the instruction queue (queue is full)
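The push order in the nine steps above determines the layout of the interrupt stack frame. The following host-side simulation is a sketch under stated assumptions: Cpu_t, Ram, ContextSwitch, and ReturnFromInterrupt are illustrative names, the op code fetches are not modeled, and 16-bit registers are stored with the high byte at the lower address.

```c
#include <stdint.h>

typedef struct {
  uint8_t  A, B, CCR;
  uint16_t X, Y, PC, SP;
} Cpu_t;                          /* simulated programmer's model (assumption) */

static uint8_t Ram[0x4000];       /* simulated memory $0000-$3FFF */

static void Push8(Cpu_t *c, uint8_t v){
  c->SP = (uint16_t)(c->SP - 1u); /* stack grows toward lower addresses */
  Ram[c->SP] = v;
}
static void Push16(Cpu_t *c, uint16_t v){
  Push8(c, (uint8_t)(v & 0xFFu)); /* low byte first, so high byte ends lower */
  Push8(c, (uint8_t)(v >> 8));
}
static uint8_t Pull8(Cpu_t *c){
  uint8_t v = Ram[c->SP];
  c->SP = (uint16_t)(c->SP + 1u);
  return v;
}
static uint16_t Pull16(Cpu_t *c){
  uint16_t hi = Pull8(c);
  uint16_t lo = Pull8(c);
  return (uint16_t)((hi << 8) | lo);
}

/* Hardware thread-switch: push PC, Y, X, A, B, CCR, then set I and load PC. */
void ContextSwitch(Cpu_t *c, uint16_t vector){
  Push16(c, c->PC);
  Push16(c, c->Y);
  Push16(c, c->X);
  Push8(c, c->A);
  Push8(c, c->B);                 /* B lands below A */
  Push8(c, c->CCR);               /* saved with I still equal to 0 */
  c->CCR |= 0x10u;                /* I = 1 */
  c->PC = vector;
}

/* rti: pull CCR, B, A, X, Y, PC in the reverse order. */
void ReturnFromInterrupt(Cpu_t *c){
  c->CCR = Pull8(c);
  c->B   = Pull8(c);
  c->A   = Pull8(c);
  c->X   = Pull16(c);
  c->Y   = Pull16(c);
  c->PC  = Pull16(c);
}
```

The frame occupies nine bytes, and because the saved CCR holds I = 0, pulling it during the return re-enables interrupts without an explicit cli.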
The fourth event is the software execution of the interrupt service routine (ISR). For a polled interrupt configuration, the ISR must poll each possible device, and branch to the specific handler for that device. The polling order establishes device priority. For a vectored interrupt configuration, you could poll anyway to check for runtime hardware/software errors. The ISR must either acknowledge or disarm the interrupt. We acknowledge an interrupt by clearing the trigger flag that was set in the second event shown above. After we acknowledge a low-priority interrupt, we may re-enable interrupts (cli) to allow higher priority devices to go first. All ISRs must perform the necessary operations (read data, write data etc.) and pass parameters through shared global memory (e.g., FIFO queue).
The last event is another thread-switch in order to return control back to the thread that was running when the interrupt was processed. In particular, the software executes an rti at the end of the ISR, which will pull CCR, B, A, X, Y, and PC off the stack. At the beginning of the interrupt service the CCR was pushed on the stack with I = 0. Therefore, the execution of rti automatically re-enables interrupts. After the ISR executes rti the stack is restored to the state it was before the interrupt. The ISR may change global variables or I/O ports, but the registers and stack are left unchanged by the ISR.
The interrupt hardware will automatically save all registers on the stack during the thread-switch, as shown in Figure 9.5. The thread-switch is the process of stopping the foreground (main) thread and starting the background (interrupt handler). The “oldPC” value on the stack points to the place in the foreground thread to resume once the interrupt is complete. At the end of the interrupt handler, another thread-switch occurs as the rti instruction restores registers from the stack (including the PC).
Checkpoint 9.4: What would happen if the ISR forgot to acknowledge the interrupt?
Checkpoint 9.5: If you didn’t want to or couldn’t acknowledge what else might the ISR do?
Figure 9.5 Stack before and after an interrupt.
[Figure 9.5 panels: "Before interrupt" shows I = 0 with SP at the top of the RAM stack ($3FFF) and PC in main (EEPROM, $4000 to $FFFF). "After Context Switch" lists the steps 1) finish instruction, 2) push registers, 3) PC = {Vector}, 4) I = 1, and shows old CC, old B, old A, old X, old Y, and old PC on the stack with PC pointing to the Handler, which ends with rti.]
9.2.4 9S12 Interrupts
On the 9S12, exceptions include resets, software interrupts and hardware interrupts. Each exception has an associated 16-bit vector that points to the memory location where the ISR that handles the exception is located. Vectors are stored in the upper 128 bytes of the standard 64 kibibyte address map. As we have seen previously, the reset vector points to the main program, but the other vectors will point to interrupt service routines. A hardware priority hierarchy determines which exception is serviced first when simultaneous requests are made. Basically, the exception with the vector at a higher address has priority over an exception with a vector at a lower address. Since the reset vector is at $FFFE, it is the highest priority exception. Six exceptions are
non-maskable, meaning there is no associated arm bit, and the exception is not affected by the I bit in the CCR. The remaining sources have an arm bit that can be activated (armed) or deactivated (disarmed). The priorities of the non-maskable sources are:
1. Power-On-Reset (POR) or regular hardware RESET pin
2. Clock monitor reset
3. Computer-Operating-Properly (COP) watchdog reset
4. Unimplemented instruction (trap)
5. Software interrupt instruction (swi)
6. XIRQ signal (if X bit in CCR = 0)
Maskable interrupt sources include on-chip peripheral systems and external interrupt service requests. Interrupts from these sources are recognized when the interrupt enable bit (I) in the CCR is cleared. The default state of the I bit out of reset is one, but it can be written at any time. The 9S12 has two external requests, XIRQ and IRQ, that are level zero active. Many of the internal I/O devices can generate interrupt requests based on external events (e.g., key wakeup, input capture, SCI, SPI, etc.). Other than the six non-maskable sources listed above, the remaining interrupt requests will temporarily set the I bit in the CCR during the interrupt program to prevent other interrupts (including itself). On the other hand, the XIRQ request temporarily sets both the I and X bits in the CCR during the interrupt program to postpone all other interrupt sources.
The interrupts have a fixed priority, but you can elevate one request to highest priority using the HPRIO, Hardware Priority Interrupt Register ($001F). The relative priorities of the other interrupt sources remain the same. We typically use XIRQ to interface a single highest priority device. XIRQ has a separate interrupt vector ($FFF4) and a separate enable bit (X). Once the X bit is cleared (enabled) the software can not disable it. An XIRQ interrupt is requested when the external XIRQ pin is low and the X bit in the CCR is 0. XIRQ processing will automatically set X = I = 1 (an IRQ can not interrupt an XIRQ service) at the start of the XIRQ handler. Just like regular interrupts, the X and I bits will be restored to their original values by the rti instruction.
The priority is fixed in the order shown in Table 9.1 with Key Wakeup P having the lowest priority and Reset having the highest. Not all interrupt sources are available on every 9S12, but this list defines some of the interrupt sources. Any one particular application usually uses just a few interrupts.
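The fixed priority rule and the HPRIO elevation described above can be modeled in a few lines of C. This is a sketch for the I-bit maskable sources only; the function name NextMaskableVector and the idea of passing the pending requests as an array of vector addresses are assumptions made for illustration.

```c
#include <stdint.h>

/* Returns the vector address of the pending maskable request that would be
   serviced first, or 0 if nothing is pending. hprio is the low byte of the
   vector address to elevate (e.g., $F2 selects IRQ in Table 9.1). */
uint16_t NextMaskableVector(const uint16_t *pending, int count, uint8_t hprio){
  uint16_t best = 0;
  for(int i = 0; i < count; i++){
    if((uint8_t)(pending[i] & 0xFFu) == hprio){
      return pending[i];          /* elevated request goes first */
    }
    if(pending[i] > best){
      best = pending[i];          /* otherwise the higher vector address wins */
    }
  }
  return best;
}
```

With SCI0 ($FFD6), Timer Channel 0 ($FFEE), and Key Wakeup H ($FFCC) all pending, the model selects $FFEE unless HPRIO holds $CC, in which case Key Wakeup H is chosen.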
In particular, those devices that need prompt service should be armed to request an interrupt. The software arms (specific for each possible source) and enables (I = 0 globally) interrupts. The external event triggers the interrupt by setting the trigger flag. The interrupt service routine (ISR) is executed in response to the trigger. The ISR acknowledges the interrupt by clearing the trigger flag. For some interrupt sources, such as the SCI interrupts, flags are automatically cleared during the response to the interrupt requests. For example, the RDRF flag in the SCI system is cleared by the automatic clearing mechanism, consisting of a read of the SCI status register while RDRF is set, followed by a read of the SCI data register. The normal response to an RDRF interrupt request is to read the SCI status register to check for receive errors, then to read the received data from the SCI data register. These two steps satisfy the automatic clearing mechanism without requiring any special instructions. On the other hand, many trigger flags employ a confusing, but effective way for the software to acknowledge them. Flags such as RTIF, CnF, TOF, PIFJn, PIFHn, and PIFPn are cleared when the software writes a 1 into the bit position of that flag. Writing a zero to the flag register has no effect, and writing a $FF clears all the flag bits in the register. Many of the potential interrupt requests share the
same interrupt vector. E.g., there are 8 possible key wakeup interrupt sources (PH7 to PH0) that all use the vector at $FFCC. Therefore, when this request is processed the ISR software must determine which of the 8 possible signals caused the interrupt.
| Vector Address | CW Number | Interrupt Source or Trigger Flag | Enable | Local Arm | HPRIO to Elevate |
|---|---|---|---|---|---|
| $FFFE | 0 | Reset | none | none | – |
| $FFFC | 1 | COP Clock Monitor Fail Reset | none | COPCTL.CME, COPCTL.FCME | – |
| $FFFA | 2 | COP Failure Reset | none | COP rate selected | – |
| $FFF8 | 3 | Unimplemented Instruction Trap | none | none | – |
| $FFF6 | 4 | SWI | none | none | – |
| $FFF4 | 5 | XIRQ | X bit | none | – |
| $FFF2 | 6 | IRQ | I bit | INTCR.IRQEN | $F2 |
| $FFF0 | 7 | Real Time Interrupt, RTIF | I bit | CRGINT.RTIE | $F0 |
| $FFEE | 8 | Timer Channel 0, C0F | I bit | TIE.C0I | $EE |
| $FFEC | 9 | Timer Channel 1, C1F | I bit | TIE.C1I | $EC |
| $FFEA | 10 | Timer Channel 2, C2F | I bit | TIE.C2I | $EA |
| $FFE8 | 11 | Timer Channel 3, C3F | I bit | TIE.C3I | $E8 |
| $FFE6 | 12 | Timer Channel 4, C4F | I bit | TIE.C4I | $E6 |
| $FFE4 | 13 | Timer Channel 5, C5F | I bit | TIE.C5I | $E4 |
| $FFE2 | 14 | Timer Channel 6, C6F | I bit | TIE.C6I | $E2 |
| $FFE0 | 15 | Timer Channel 7, C7F | I bit | TIE.C7I | $E0 |
| $FFDE | 16 | Timer Overflow, TOF | I bit | TIE.TOI | $DE |
| $FFDC | 17 | Pulse Acc. Overflow, PAOVF | I bit | PACTL.PAOVI | $DC |
| $FFDA | 18 | Pulse Acc. Input Edge, PAIF | I bit | PACTL.PAI | $DA |
| $FFD8 | 19 | SPI0 Transfer Complete, SPIF; SPI0 Transmit Empty, SPTEF | I bit | SPI0CR1.SPIE; SPI0CR1.SPTIE | $D8 |
| $FFD6 | 20 | SCI0 Transmit Buff Empty, TDRE; SCI0 Transmit Complete, TC; SCI0 Receiver Buffer Full, RDRF; SCI0 Receiver Idle, IDLE | I bit | SCI0CR2.TIE; SCI0CR2.TCIE; SCI0CR2.RIE; SCI0CR2.ILIE | $D6 |
| $FFD4 | 21 | SCI1 Transmit Buff Empty, TDRE; SCI1 Transmit Complete, TC; SCI1 Receiver Buffer Full, RDRF; SCI1 Receiver Idle, IDLE | I bit | SCI1CR2.TIE; SCI1CR2.TCIE; SCI1CR2.RIE; SCI1CR2.ILIE | $D4 |
| $FFD2 | 22 | ATD0 Sequence Complete, ASCIF | I bit | ATD0CTL2.ASCIE | $D2 |
| $FFD0 | 23 | ATD1 Sequence Complete, ASCIF | I bit | ATD1CTL2.ASCIE | $D0 |
| $FFCE | 24 | Key Wakeup J, PIFJ.[7:6],[1:0] | I bit | PIEJ.[7:6],[1:0] | $CE |
| $FFCC | 25 | Key Wakeup H, PIFH.[7:0] | I bit | PIEH.[7:0] | $CC |
| $FFC8 | 27 | Pulse Acc. Overflow, PBOVF | I bit | PBCTL.PBOVI | $C8 |
| $FFC0 | 31 | I2C | I bit | IBCR.IBIE | $C0 |
| $FFBE | 32 | SPI1 Transfer Complete, SPIF; SPI1 Transmit Empty, SPTEF | I bit | SPI1CR1.SPIE; SPI1CR1.SPTIE | $BE |
| $FFBC | 33 | SPI2 Transfer Complete, SPIF; SPI2 Transmit Empty, SPTEF | I bit | SPI2CR1.SPIE; SPI2CR1.SPTIE | $BC |
| $FFB6 | 36 | CAN wakeup | I bit | CANRIER.WUPIE | $B6 |
| $FFB4 | 37 | CAN errors | I bit | CANRIER.CSCIE; CANRIER.OVRIE | $B4 |
| $FFB2 | 38 | CAN receive | I bit | CANRIER.RXFIE | $B2 |
| $FFB0 | 39 | CAN transmit | I bit | CANTIER.TXEIE[2:0] | $B0 |
| $FF8E | 56 | Key Wakeup P, PIFP.[7:0] | I bit | PIEP.[7:0] | $8E |
Table 9.1 Some of the interrupt vectors for the 9S12. CW stands for CodeWarrior.
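The write-1-to-clear acknowledge used by flags such as RTIF, CnF, and TOF can be confusing, so the following host-side model may help. It is a sketch only: FlagReg_Write is a hypothetical function that mimics what the hardware does when software writes a value to such a flag register.

```c
#include <stdint.h>

/* Model of a write to a write-1-to-clear flag register: every bit position
   written with 1 is cleared, every position written with 0 is unchanged. */
void FlagReg_Write(uint8_t *reg, uint8_t value){
  *reg = (uint8_t)(*reg & (uint8_t)~value);
}
```

With two flags set ($0C), writing $08 leaves $04, writing $00 changes nothing, and writing $FF clears the register. One consequence of this behavior is that a read-modify-write of the form "flags = flags | mask" writes back a 1 in every set position and therefore clears all pending flags, not just the intended one.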
9.2.5 Polled Versus Vectored Interrupts
As we defined earlier, when more than one source of interrupt exists the computer must have a reliable method to determine which interrupt request has been made. There are two common approaches, and the Freescale microcomputers apply a combination of both methods. The first approach is called vectored interrupts. With a vectored interrupt system each potential interrupt source has a unique interrupt vector address. You simply place the correct handler address in each vector, and the hardware automatically calls the correct software when an interrupt is requested, see Table 9.1. The second approach is called polled interrupts. SCI, SPI, and key wakeup must be polled. With a polled interrupt system multiple interrupt sources share the same interrupt vector address (e.g., both RDRF and TDRE share the same vector). Once the interrupt has occurred, the ISR software must poll the potential devices to determine which device needs service. The 9S12 systems have a separate acknowledgment, so that if both interrupts are pending, acknowledging one will not satisfy the other, so the second device will request a second interrupt and get serviced.
Common Error: If two interrupts were requested, it would be a mistake to service just one and acknowledge them both.
Observation: External events are often asynchronous to program execution, so careful thought is required to consider the effect if an external interrupt request were to come in between each pair of instructions.
Observation: The computer automatically sets the I bit during processing, so that an interrupt handler will not interrupt itself.
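A polled ISR for sources that share one vector can be sketched as follows. This is a host-side illustration under stated assumptions: PIFH and PIEH are ordinary variables standing in for the Port H flag and arm registers, Count is a hypothetical stand-in for the per-pin service, and Acknowledge models the write-1-to-clear behavior.

```c
#include <stdint.h>

static uint8_t PIFH = 0;    /* simulated trigger flags, one bit per pin */
static uint8_t PIEH = 0;    /* simulated arm bits, one bit per pin */
static uint8_t Count[8];    /* hypothetical service: count events per pin */

/* Models writing a 1 to the selected flag position (write-1-to-clear). */
static void Acknowledge(uint8_t mask){
  PIFH = (uint8_t)(PIFH & (uint8_t)~mask);
}

/* Shared handler: poll every source, service and acknowledge only the
   sources that are both triggered and armed. */
void KeyWakeupH_Handler(void){
  for(int bit = 7; bit >= 0; bit--){      /* polling order sets priority */
    uint8_t mask = (uint8_t)(1u << bit);
    if((PIFH & PIEH & mask) != 0){
      Count[bit]++;                       /* service this source */
      Acknowledge(mask);                  /* acknowledge exactly this source */
    }
  }
}
```

Each flag is acknowledged only after its own source has been serviced, which avoids the common error of servicing one request while acknowledging two.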
9.2.6 Pseudo-Interrupt Vectors
Table 9.2 Some pseudo-interrupt vectors for the 9S12.
| Real Vector | MON12 Pseudo Vector | D-Bug12 Pseudo Vector | Serial Monitor Pseudo Vector | Interrupt Source or Trigger Flag |
|---|---|---|---|---|
| $FFFE | none | none | none | Reset |
| $FFFC | $0FFC | none | $F7FC | COP Clock Monitor Fail Reset |
| $FFFA | $0FFA | none | $F7FA | COP Failure Reset |
| $FFF8 | $0FF8 | $3E78 | $F7F8 | Unimplemented Instruction Trap |
| $FFF6 | $0FF6 | $3E76 | $F7F6 | SWI |
| $FFF4 | $0FF4 | $3E74 | $F7F4 | XIRQ |
| $FFF2 | $0FF2 | $3E72 | $F7F2 | IRQ |
| $FFF0 | $0FF0 | $3E70 | $F7F0 | Real Time Interrupt, RTIF |
| $FFEE | $0FEE | $3E6E | $F7EE | Timer Channel 0, C0F |
| ... | ... | ... | ... | ... |
Some development boards do not allow you to erase and reprogram the interrupt vectors from $FF80 to $FFFF. In these development systems, the ROM at $FF80 to $FFFF has the interrupt vectors pointing to memory locations that can be set. The locations to which the real vectors point are called pseudo-interrupt vectors. Typically, the pseudo-interrupt vectors are defined in the same order as the real vectors. Pseudo vectors for three debuggers are shown in Table 9.2. In the old 6811 development boards each pseudo vector was in RAM and required 3 bytes. Three bytes were required to place a jmp instruction to your ISR. During a 6811 initialization, the program places jmp instructions into the pseudo vectors. In contrast, most 9S12 debuggers only require 2 bytes for each pseudo vector. The MON12 debugger on 9S12 boards from Axiom (http://www.axman.com) and the D-Bug12 debugger from Technological Arts implement 16-bit pseudo vectors in RAM. During initialization at run-time, your program
must place pointers to your ISRs into the pseudo vectors. Every time an interrupt occurs the Axiom MON12 debugger requires 21 extra bus cycles to implement the indirect jump to your ISR. The Freescale Serial Monitor used by Metrowerks CodeWarrior and TExaS also employs pseudo vectors. The difference is that the Serial Monitor pseudo vectors are in EEPROM. Your software does not have to perform any run-time initialization of the pseudo vector. Rather, the Serial Monitor will automatically translate a “Program ROM” command from $FF80-$FFFF down to $F780-$F7FF. For example, this code is the proper way to set the TC0 interrupt vector in a system without pseudo vectors, see Table 9.1.
            org  $FFEE
            fdb  TC0han
However, when your software is loaded into EEPROM, this vector transparently and automatically ends up being programmed at $F7EE. Every time an interrupt occurs the Serial Monitor requires 19 extra bus cycles to implement the pseudo vector. The actual Serial Monitor code for an interrupt is
uvector08:  bsr  ISRHandler   ;TC0 interrupt starts executing here
            ...
ISRHandler: pulx              ;pull bsr return address off stack
            ldy  -$0636,X     ;get value of pseudo vector
            cpy  #$FFFF       ;is it programmed?
            beq  BadVector
            jmp  ,Y           ;jump to your ISR
SCI interrupts with the serial monitor include an overhead longer than 19 cycles, because the SCI interrupts are used by the debugger itself to perform its actions. In particular, after a SCI interrupt, the debugger will check the LOAD/RUN switch to see if the debugger or user program should process the interrupt.
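The address translation performed by the Serial Monitor, from $FF80-$FFFF down to $F780-$F7FF, amounts to subtracting $0800 from the real vector address. The helper below is a sketch for illustration; its name is an assumption, and it returns 0 for the reset vector because Table 9.2 lists no pseudo vector for reset.

```c
#include <stdint.h>

/* Returns the Serial Monitor pseudo-vector address for a real vector in
   $FF80-$FFFD, or 0 when no pseudo vector applies (outside the vector
   table, or the reset vector at $FFFE). */
uint16_t SerialMonitor_PseudoVector(uint16_t realVector){
  if(realVector < 0xFF80u || realVector >= 0xFFFEu){
    return 0;
  }
  return (uint16_t)(realVector - 0x0800u);
}
```

For the TC0 example in the text, the real vector $FFEE maps to the pseudo vector $F7EE.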
9.3 Key Wakeup Interrupts
The basic idea of key wakeup is to connect an input to the 9S12 and configure the interface so an interrupt is requested on either the rising or falling edge of the input. Using key wakeup allows software to respond quickly to changes in the external world. The 9S12C32 has ten possible key wakeup interrupt sources, which are available on Ports J and P. The 9S12DP512 has twenty key wakeup interrupt sources, which are available on Ports H, J, and P. See Table 9.3. Any or all of these pins can be configured as a key wakeup interrupt. Each of the wakeup lines has a separate I/O pin (PTH, PTJ, PTP), a direction register bit (DDRH, DDRJ, DDRP), a trigger flag bit (PIFH, PIFJ, PIFP), an arm bit (PIEH, PIEJ, PIEP), and a polarity bit (PPSH, PPSJ, PPSP).
First we identify external digital signals containing strategic edges (rising or falling). In particular, strategic means we wish to execute software whenever one of these edges occurs. We connect these digital signals to individual key wakeup pins. To use key wakeup, we must make these lines an input, and configure the strategic edge to be active. Key wakeup interrupts can be configured to be active on either the rising or falling edge. If the corresponding bit in the PPSH/PPSJ/PPSP is 0, then a falling edge will set the trigger flag. Conversely, if the bit in the PPSH/PPSJ/PPSP register is 1, then a rising edge will set the trigger flag. A key wakeup interrupt will be generated if the trigger flag bit is set, the arm bit is set and the interrupts are enabled (I = 0).
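The polarity rule described above (PPS bit 0 selects the falling edge, PPS bit 1 selects the rising edge) can be modeled on a host computer as follows. This is a sketch only; KeyWakeup_NewFlags is a hypothetical function that takes the previous and current pin levels and returns the updated trigger flags.

```c
#include <stdint.h>

/* Returns the trigger flags after the pins change from oldPins to newPins.
   A flag is set on a rising edge where the pps bit is 1, and on a falling
   edge where the pps bit is 0. Flags that are already set stay set. */
uint8_t KeyWakeup_NewFlags(uint8_t flags, uint8_t pps,
                           uint8_t oldPins, uint8_t newPins){
  uint8_t rising  = (uint8_t)(~oldPins & newPins);   /* 0 -> 1 transitions */
  uint8_t falling = (uint8_t)(oldPins & ~newPins);   /* 1 -> 0 transitions */
  uint8_t active  = (uint8_t)((rising & pps) | (falling & (uint8_t)~pps));
  return (uint8_t)(flags | active);
}
```

With pps = $80, a rise on bits 7 and 0 sets only flag 7; a later fall on the same two pins sets only flag 0.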
Address  Bit 7   6       5       4       3       2       1       Bit 0   Name
$0260    PH7     PH6     PH5     PH4     PH3     PH2     PH1     PH0     PTH
$0261    PH7     PH6     PH5     PH4     PH3     PH2     PH1     PH0     PTIH
$0262    DDRH7   DDRH6   DDRH5   DDRH4   DDRH3   DDRH2   DDRH1   DDRH0   DDRH
$0263    RDRH7   RDRH6   RDRH5   RDRH4   RDRH3   RDRH2   RDRH1   RDRH0   RDRH
$0264    PERH7   PERH6   PERH5   PERH4   PERH3   PERH2   PERH1   PERH0   PERH
$0265    PPSH7   PPSH6   PPSH5   PPSH4   PPSH3   PPSH2   PPSH1   PPSH0   PPSH
$0266    PIEH7   PIEH6   PIEH5   PIEH4   PIEH3   PIEH2   PIEH1   PIEH0   PIEH
$0267    PIFH7   PIFH6   PIFH5   PIFH4   PIFH3   PIFH2   PIFH1   PIFH0   PIFH
$0268    PJ7     PJ6     –       –       –       –       PJ1     PJ0     PTJ
$0269    PJ7     PJ6     –       –       –       –       PJ1     PJ0     PTIJ
$026A    DDRJ7   DDRJ6   –       –       –       –       DDRJ1   DDRJ0   DDRJ
$026B    RDRJ7   RDRJ6   –       –       –       –       RDRJ1   RDRJ0   RDRJ
$026C    PERJ7   PERJ6   –       –       –       –       PERJ1   PERJ0   PERJ
$026D    PPSJ7   PPSJ6   –       –       –       –       PPSJ1   PPSJ0   PPSJ
$026E    PIEJ7   PIEJ6   –       –       –       –       PIEJ1   PIEJ0   PIEJ
$026F    PIFJ7   PIFJ6   –       –       –       –       PIFJ1   PIFJ0   PIFJ
$0258    PP7     PP6     PP5     PP4     PP3     PP2     PP1     PP0     PTP
$0259    PP7     PP6     PP5     PP4     PP3     PP2     PP1     PP0     PTIP
$025A    DDRP7   DDRP6   DDRP5   DDRP4   DDRP3   DDRP2   DDRP1   DDRP0   DDRP
$025B    RDRP7   RDRP6   RDRP5   RDRP4   RDRP3   RDRP2   RDRP1   RDRP0   RDRP
$025C    PERP7   PERP6   PERP5   PERP4   PERP3   PERP2   PERP1   PERP0   PERP
$025D    PPSP7   PPSP6   PPSP5   PPSP4   PPSP3   PPSP2   PPSP1   PPSP0   PPSP
$025E    PIEP7   PIEP6   PIEP5   PIEP4   PIEP3   PIEP2   PIEP1   PIEP0   PIEP
$025F    PIFP7   PIFP6   PIFP5   PIFP4   PIFP3   PIFP2   PIFP1   PIFP0   PIFP
Table 9.3 9S12 key wakeup ports (all twenty pins are available on the 9S12DP512, while just ten pins, on Ports J and P, are available on the 9S12C32).
Another convenience of Ports H, J, and P is the available pull-up or pull-down resistors, as shown in Table 9.4. Each of the pins of Ports H, J, and P can be configured separately.
DDRH/DDRJ/DDRP  PPSH/PPSJ/PPSP  PERH/PERJ/PERP  Port Mode
1               –               –               Regular output
0               0               0               Regular input, falling edge
0               1               0               Regular input, rising edge
0               0               1               Input with passive pull-up, falling edge
0               1               1               Input with passive pull-down, rising edge
Table 9.4 Pull up/down modes of Ports H, J and P.
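One way to read Table 9.4 is as a decision on three bits per pin. The following is a minimal sketch that runs on any C compiler; the function name PinMode, the enumeration names, and the idea of passing register values as parameters are illustrative assumptions, not part of the 9S12 software in this chapter.

```c
#include <stdint.h>

/* Hypothetical names for the five rows of Table 9.4 */
typedef enum {
  MODE_OUTPUT,          /* regular output */
  MODE_IN_FALL,         /* regular input, falling edge */
  MODE_IN_RISE,         /* regular input, rising edge */
  MODE_PULLUP_FALL,     /* input with passive pull-up, falling edge */
  MODE_PULLDOWN_RISE    /* input with passive pull-down, rising edge */
} PinMode_t;

/* ddr, pps, per are copies of the DDRx, PPSx, PERx register values;
   mask selects one pin, e.g., 0x20 for bit 5 */
PinMode_t PinMode(uint8_t ddr, uint8_t pps, uint8_t per, uint8_t mask){
  if(ddr & mask){
    return MODE_OUTPUT;                 /* DDR=1: PPS and PER are don't care */
  }
  if(per & mask){                       /* internal resistor enabled */
    return (pps & mask) ? MODE_PULLDOWN_RISE : MODE_PULLUP_FALL;
  }
  return (pps & mask) ? MODE_IN_RISE : MODE_IN_FALL;
}
```

For example, a pin with DDR=0, PPS=1, and PER=0 decodes as a regular input that is active on the rising edge.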
A typical application of pull-up is the interface of simple switches. Using pull-up or pull-down mode eliminates the need for an external resistor when interfacing a switch. The PJ6 and PT6 interfaces in Figure 9.6(a) implement negative logic switch inputs, and the PJ7 and PT7 interfaces in Figure 9.6(b) implement positive logic switch inputs. The Port J interfaces employ internal resistors, while the Port T interfaces use external 10 kΩ resistors.
9 䡲 Interrupt Programming and Real-Time Systems
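Figure 9.6 Key wakeup or input capture can generate interrupts on a switch touch. [Circuit diagram: (a) pull-up interface, with switches on PJ6 and PT6 and an external 10 kΩ resistor on PT6; (b) pull-down interface, with switches on PJ7 and PT7 and an external 10 kΩ resistor on PT7.]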
Checkpoint 9.6: What values do you write into DDRJ, PPSJ and PERJ to configure the switch interfaces of PJ6 and PJ7 in Figure 9.6?
Three conditions must be simultaneously true for a key wakeup interrupt to be requested:
■ The trigger flag bit is set
■ The arm bit is set
■ The I bit in the 9S12 CCR is 0

Even though there are twenty key wakeup lines, there are only three interrupt vectors: one for Port H, one for Port J, and one for Port P. So, if two or more wakeup interrupts are used on the same port, it will be necessary to poll. Interrupt polling is the software function that checks which of the potential sources requested the interrupt. A flag bit is cleared by writing a one to it. For example, to clear Port P trigger flag 7 in C we can execute

    PIFP = 0x80;        // clears flag bit 7 of Port P

In assembly, to clear Port P trigger flag 7

    movb #$80,PIFP      ; clears flag bit 7 of Port P
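The polling described above can be sketched as follows. This is a host-side model, not 9S12 code: SimPIFP is an ordinary variable standing in for the PIFP register, AckPortP models the write-one-to-clear acknowledge, and the counters Count0 and Count7 are hypothetical. On the real device the body of PortPHandler would be placed in the Port P ISR and would access PIFP directly.

```c
#include <stdint.h>

/* Simulated stand-in for the Port P flag register (assumption for this sketch) */
static uint8_t SimPIFP;

/* Models "PIFP = mask;" on the 9S12: each 1 bit written clears that flag */
static void AckPortP(uint8_t mask){
  SimPIFP &= (uint8_t)~mask;
}

unsigned short Count0;   /* hypothetical: edges seen on PP0 */
unsigned short Count7;   /* hypothetical: edges seen on PP7 */

/* Body of a Port P key wakeup ISR that polls two possible sources */
void PortPHandler(void){
  if(SimPIFP & 0x01){    /* did PP0 request service? */
    AckPortP(0x01);      /* acknowledge only PP0 */
    Count0++;
  }
  if(SimPIFP & 0x80){    /* did PP7 request service? */
    AckPortP(0x80);      /* acknowledge only PP7 */
    Count7++;
  }
}
```

Each source is acknowledged separately, so a flag that becomes set while another source is being serviced is not lost.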
Example 9.1 You are asked to design a measurement system for the robot in Figure 8.17 that counts the number of times the wheel turns. This count will be a measure of the total distance travelled. The desired resolution is 1⁄32 of a turn and the desired range is 0 to 2047 31⁄32 revolutions.

Solution Whenever you measure something, it is important to consider the resolution and range. The basic idea is to use an optical sensor (QRB1134) to sense the position of the wheel. A black/white striped pattern is attached to the wheel, and an optical reflective sensor is placed near the stripes. The sensor has an infrared LED output and a light-sensitive transistor. The current to the 1.8 V LED is controlled by the R1 resistor. In this circuit, the LED current will be (5 − 1.8 V)/200 Ω, which is 16 mA. The R2 pull-up resistor on the transistor creates an output swing at V1 depending on whether the sensor sees a black stripe or a white stripe. Unfortunately, the signal V1 is not digital. The rail-to-rail op amp, in open loop mode, creates a clean digital signal at V2, which has the same frequency as V1. The negative terminal is set to a voltage approximately in the center of V1, shown as 2 V in Figure 9.7. In general, we should select the threshold at the place in the wave where the slope is maximum. We then interface V2 to a key wakeup pin, and configure the system to trigger a key wakeup interrupt on each rising edge. This solution uses PP5, such that a rising edge triggers an interrupt on Port P key wakeup, see Program 9.1. Because there are 32 stripes on the wheel, there will be 32 interrupts each time the wheel rotates once. A 16-bit counter is used, because we expect no more than 65535 counts. The count is a binary fixed-point number with a resolution of 2^-5 revolutions. E.g., if the count is 100, this means 100/32 or 3.125 revolutions. We also assume no other key wakeup channels on Port P will be used.
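Figure 9.7 An optical sensor is used to detect rotations on a wheel. [Circuit diagram: the QRB1134 LED is driven from +5 V through R1 (200 Ω); the phototransistor output V1 is pulled up to +5 V through R2 (5 kΩ); a TLC2274 op amp powered from +5 V compares V1 on its + input against 2 V on its − input and drives V2 into PP5 of the 9S12. Waveform inset: V1 swings around the 2 V threshold while V2 switches between 0 V and 5 V.]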
           org  $0800       ;($3800 if 9S12C32)
Count      rmb  2           ;0.03125 revolutions
           org  $4000
;Rising edge on PP5 causes an interrupt
Key_Init   movw #0,Count
           bclr DDRP,#$20   ; PP5 is input
           bclr PERP,#$20   ; no pull down on PP5
           bset PPSP,#$20   ; rising edge active
           bset PIEP,#$20   ; arm PP5
           movb #$20,PIFP   ; clear flag
           cli
           rts
Keyhandler movb #$20,PIFP   ; ack, clear flag
           ldx  Count
           inx
           stx  Count       ; units 1/32 revolution
           rti
           org  $FF8E
           fdb  Keyhandler
// Rising edge on PP5 causes an interrupt
unsigned short Count;   // 1/32 revolutions
void Key_Init(void){
  Count = 0;
  DDRP &= ~0x20;        // PP5 is input
  PERP &= ~0x20;        // no pull down on PP5
  PPSP |= 0x20;         // rising edge active
  PIEP |= 0x20;         // arm PP5
  PIFP = 0x20;          // clear flag
  asm cli               // enable interrupts
}
interrupt 56 void Keyhandler(void){
  PIFP = 0x20;          // clear flag
  Count++;              // 1/32 revolution
}
Program 9.1 Assembly and C implementations of an interrupting key wakeup.
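Because Count is a binary fixed-point number with a resolution of 1/32 revolution, the integer and fractional parts can be separated with a shift and a mask. The following is a minimal sketch using integer math only; the function names WholeTurns, FractionTurns32, and Milliturns are hypothetical helpers, not part of Program 9.1.

```c
#include <stdint.h>

/* count is the Program 9.1 counter, in units of 1/32 revolution */

/* Number of complete revolutions: count/32 */
uint16_t WholeTurns(uint16_t count){
  return (uint16_t)(count >> 5);
}

/* Remaining fraction, still in units of 1/32 revolution (0 to 31) */
uint16_t FractionTurns32(uint16_t count){
  return (uint16_t)(count & 0x1F);
}

/* Same fraction expressed in thousandths of a revolution (0 to 968) */
uint16_t Milliturns(uint16_t count){
  return (uint16_t)(((uint32_t)(count & 0x1F) * 1000) / 32);
}
```

With a count of 100 these return 3, 4, and 125, which corresponds to the 3.125 revolutions computed in Example 9.1.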
Because of the read, modify, write sequence, the following software clears all the flag bits (hence these are inappropriate ways to clear one flag.)
bset PIFP,#$04
PIFP |= 0x04;
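The effect of the read, modify, write sequence can be seen in a small host-side model. This sketch assumes only the write-one-to-clear behavior described in this section; SimFlags, WriteFlags, and ReadFlags are stand-ins invented for the simulation and are not 9S12 registers or library calls.

```c
#include <stdint.h>

static uint8_t SimFlags;   /* simulated contents of a flag register such as PIFP */

/* Hardware model: every 1 bit written clears that flag, 0 bits change nothing */
static void WriteFlags(uint8_t value){
  SimFlags &= (uint8_t)~value;
}

static uint8_t ReadFlags(void){
  return SimFlags;
}

/* Models "PIFP = 0x04;" which writes a single 1 bit */
void ClearFlag2_Good(void){
  WriteFlags(0x04);
}

/* Models "PIFP |= 0x04;" which reads, ORs, then writes back every set flag as a 1 */
void ClearFlag2_Bad(void){
  WriteFlags((uint8_t)(ReadFlags() | 0x04));
}
```

Starting from flags 7, 2, and 0 set ($85), the direct write leaves $81, while the read, modify, write version leaves $00 and so loses two pending requests.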
Observation: All 8 key wakeup lines on Port P use the same interrupt vector, but they have separate polarity, arm, pullup/down, and flag bits.

Checkpoint 9.7: How do you modify Program 9.1 so it counts falling edges?
If a pin is configured as an input, then reads to PTH/PTJ/PTP return the same value as reads to PTIH/PTIJ/PTIP, which will be the digital value at the input. Conversely, if a pin is configured as an output, then reads to PTH/PTJ/PTP return the most recent value written to the output port, while reads to PTIH/PTIJ/PTIP will return the digital value at the output pin. The RDRH/RDRJ/RDRP register determines the drive strength of an output signal. If the bit is 1, then the corresponding output will have 1/3 drive current. This mode is used to reduce supply current to the 9S12.
9.4
Periodic Interrupt Programming

We will continue our interrupt examples with periodic interrupts. Periodic interrupts are both simple to understand and extremely useful for real-time embedded systems. A periodic interrupt is one that is requested on a fixed time basis. Periodic interrupts are required for data acquisition and control systems, because software execution must be performed periodically at accurate time intervals. For a data acquisition system, it is important to establish an accurate sampling rate. The time in between ADC samples must be equal (and known) in order for the digital signal processing to function properly. Similarly for microcomputer-based control systems, it is important to maintain both the timing with the sensors (inputs) and with the actuators (outputs).

One synchronization method that uses periodic interrupts is called “intermittent polling” or “periodic polling”. In regular busy-waiting, the main program polls the I/O devices continuously. With intermittent polling, the I/O devices are polled on a regular basis, established by a periodic interrupt, as shown in the flowchart of Figure 9.8. Assume for a moment that all n devices are simultaneously ready. It is an appropriate design constraint for the time it takes to service all n devices (maximum time to execute the ISR) to be small compared to the interrupt period used for the periodic polling. This constraint will prevent the periodic polling ISR from capturing all the available CPU time. Similarly, the time to execute this ISR will affect the response time of other interrupts in the system. On the other hand, the interrupt frequency used for the periodic polling should be large compared to the bandwidth of the I/O channel, so no data are lost. If no device needs service, then the interrupt simply returns. This method frees the main program from the I/O tasks. The original IBM-PC computer used an 18 Hz periodic interrupt to interface its keyboard.

Figure 9.8 An ISR flowchart that implements periodic polling. [Flowchart: on each periodic interrupt, the ISR tests Device 1 through Device n in turn; if a device is ready its data are input or output, and if it is busy it is skipped; the ISR then acknowledges the interrupt and executes rti.]

It is appropriate to use periodic polling when the following two conditions apply:
1. The I/O hardware cannot generate interrupts directly.
2. We wish to perform the I/O functions in the background.

Observation: The average response time of an event interfaced with periodic polling is 1/2 the period.

Observation: The worst case response time of an event interfaced with periodic polling is the period.
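The flowchart logic of periodic polling can be sketched in C as a loop over the devices. This is a host-side model under stated assumptions: the arrays Ready and Serviced, the constant NUM_DEVICES, and the function PeriodicPoll are hypothetical, and a real ISR would read each device's status register and acknowledge the periodic interrupt source instead.

```c
#include <stdint.h>

#define NUM_DEVICES 3                    /* hypothetical number of polled devices */

static uint8_t  Ready[NUM_DEVICES];      /* simulated status: nonzero means ready */
static uint16_t Serviced[NUM_DEVICES];   /* how many times each device was serviced */

/* Body of the periodic ISR: service every ready device, skip busy ones */
void PeriodicPoll(void){
  int i;
  for(i = 0; i < NUM_DEVICES; i++){
    if(Ready[i]){
      Ready[i] = 0;                      /* transferring the data makes the device busy again */
      Serviced[i]++;
    }
  }
  /* a real ISR would acknowledge the periodic interrupt here */
}
```

If no device is ready the loop falls through and the ISR simply returns, which keeps the execution time short in the common case.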
There are three mechanisms on the 9S12 that generate periodic interrupts: real-time interrupt (RTI), timer overflow (TOF) and output compare (OC).
9.5
Real-Time Interrupt (RTI)

First, the real-time interrupt (RTI) mechanism can generate interrupts at a fixed rate. Seven bits (RTR6-0) in the RTICTL register specify the interrupt rate. The 7-bit value is composed of two parts:
Let RTR6, RTR5, RTR4 be n, which is a 3-bit number ranging from 0 to 7
Let RTR3, RTR2, RTR1, RTR0 be m, which is a 4-bit number ranging from 0 to 15

Table 9.5 shows the 9S12 registers used in RTI interrupts. The entries shown in bold will be used in this section.
Table 9.5 9S12 registers used to configure real time interrupts.

Address  Bit 7  6      5      4       3      2      1      Bit 0  Name
$0037    RTIF   PORF   0      LOCKIF  LOCK   TRACK  SCMIF  SCM    CRGFLG
$0038    RTIE   0      0      LOCKIE  0      0      SCMIE  0      CRGINT
$003B    0      RTR6   RTR5   RTR4    RTR3   RTR2   RTR1   RTR0   RTICTL
If n is zero, then the RTI system is off. A 9S12C32 with an 8 MHz crystal will have an OSCCLK frequency of 8 MHz and a default E clock frequency of 4 MHz. A 9S12DP512 with a 16 MHz crystal will have an OSCCLK frequency of 16 MHz and a default E clock frequency of 8 MHz. Let fcrystal be the crystal frequency, then for n from 1 to 7 the RTI interrupt frequency and period can be calculated using

\[ \text{RTI interrupt frequency} = \frac{f_{crystal}}{512\,(m+1)\,2^{n}} \]

\[ \text{RTI interrupt period} = \frac{512\,(m+1)\,2^{n}}{f_{crystal}} \]

Observation: The phase-lock-loop (PLL) on the 9S12 will not affect the RTI rates.
The interrupt rate is determined by the crystal clock and the RTICTL value. Table 9.6 shows the available interrupt periods, assuming an 8 MHz crystal. Table 9.7 shows the available interrupt periods, assuming a 16 MHz crystal. Basically, the RTIF trigger flag is set periodically. If armed (RTIE = 1), this trigger flag will request an interrupt. To clear the RTIF flag (acknowledge the interrupt), the software writes a one to it.
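The relationship between RTICTL, the crystal frequency, and the interrupt period can be checked with a short calculation. The following is a minimal sketch that runs on a desktop C compiler; the function name RTI_PeriodNs and the choice of nanosecond units are assumptions made for illustration.

```c
#include <stdint.h>

/* Returns the RTI interrupt period in ns for a given RTICTL value and
   crystal frequency in Hz, or 0 if the RTI system is off (n == 0).
   Period = 512*(m+1)*2^n/fcrystal */
uint32_t RTI_PeriodNs(uint8_t rtictl, uint32_t fcrystalHz){
  uint32_t n = (rtictl >> 4) & 0x07;            /* RTR6-4 */
  uint32_t m = rtictl & 0x0F;                   /* RTR3-0 */
  uint64_t cycles;
  if(n == 0){
    return 0;                                   /* RTI is off */
  }
  cycles = (512ULL * (m + 1)) << n;             /* crystal cycles per interrupt */
  return (uint32_t)((cycles * 1000000000ULL) / fcrystalHz);
}
```

For example, RTICTL = $77 with a 16 MHz crystal and RTICTL = $73 with an 8 MHz crystal both give 32,768,000 ns, or 32.768 ms.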
Table 9.6 9S12 real-time interrupt period in ms, assuming an 8 MHz crystal.

                      n [6:4] of the RTICTL
m [3:0]  000   001     010     011     100      101      110      111
0000     off   0.128   0.256   0.512   1.024    2.048    4.096    8.192
0001     off   0.256   0.512   1.024   2.048    4.096    8.192    16.384
0010     off   0.384   0.768   1.536   3.072    6.144    12.288   24.576
0011     off   0.512   1.024   2.048   4.096    8.192    16.384   32.768
0100     off   0.640   1.280   2.560   5.120    10.240   20.480   40.960
0101     off   0.768   1.536   3.072   6.144    12.288   24.576   49.152
0110     off   0.896   1.792   3.584   7.168    14.336   28.672   57.344
0111     off   1.024   2.048   4.096   8.192    16.384   32.768   65.536
1000     off   1.152   2.304   4.608   9.216    18.432   36.864   73.728
1001     off   1.280   2.560   5.120   10.240   20.480   40.960   81.920
1010     off   1.408   2.816   5.632   11.264   22.528   45.056   90.112
1011     off   1.536   3.072   6.144   12.288   24.576   49.152   98.304
1100     off   1.664   3.328   6.656   13.312   26.624   53.248   106.496
1101     off   1.792   3.584   7.168   14.336   28.672   57.344   114.688
1110     off   1.920   3.840   7.680   15.360   30.720   61.440   122.880
1111     off   2.048   4.096   8.192   16.384   32.768   65.536   131.072

Table 9.7 9S12 real-time interrupt period in ms, assuming a 16 MHz crystal.

                      n [6:4] of the RTICTL
m [3:0]  000   001     010     011     100     101      110      111
0000     off   0.064   0.128   0.256   0.512   1.024    2.048    4.096
0001     off   0.128   0.256   0.512   1.024   2.048    4.096    8.192
0010     off   0.192   0.384   0.768   1.536   3.072    6.144    12.288
0011     off   0.256   0.512   1.024   2.048   4.096    8.192    16.384
0100     off   0.320   0.640   1.280   2.560   5.120    10.240   20.480
0101     off   0.384   0.768   1.536   3.072   6.144    12.288   24.576
0110     off   0.448   0.896   1.792   3.584   7.168    14.336   28.672
0111     off   0.512   1.024   2.048   4.096   8.192    16.384   32.768
1000     off   0.576   1.152   2.304   4.608   9.216    18.432   36.864
1001     off   0.640   1.280   2.560   5.120   10.240   20.480   40.960
1010     off   0.704   1.408   2.816   5.632   11.264   22.528   45.056
1011     off   0.768   1.536   3.072   6.144   12.288   24.576   49.152
1100     off   0.832   1.664   3.328   6.656   13.312   26.624   53.248
1101     off   0.896   1.792   3.584   7.168   14.336   28.672   57.344
1110     off   0.960   1.920   3.840   7.680   15.360   30.720   61.440
1111     off   1.024   2.048   4.096   8.192   16.384   32.768   65.536
Example 9.2 Write software that increments a global variable every 32.768 ms.

Solution The solution will use a periodic RTI interrupt that occurs every 32.768 ms. RTI is simple, and accurate if the desired interrupt period matches one of the possibilities shown in Table 9.6 or 9.7. The main program executes RTI_Init to initialize the RTI interrupts, as shown in Program 9.2. The RTI rate is determined by the crystal frequency and the RTICTL register. Bit 7 of the CRGINT register is set to arm the RTI system. The RTI_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.
; 9S12C32 4 MHz, 9S12DP512 8 MHz
         org  $0800        ;($3800 if C32)
Time     rmb  2
         org  $4000
RTI_Init sei               ;make atomic
         movb #$77,RTICTL  ;($73 if C32)
         movb #$80,CRGINT  ;arm RTI
         movw #0,Time
         cli               ;enable IRQ
         rts
; interrupts every 32.768ms
RTIHan   movb #$80,CRGFLG  ;ack
         ldd  Time
         addd #1
         std  Time
         rti
         org  $FFF0
         fdb  RTIHan       ;vector
// 9S12C32 4 MHz, 9S12DP512 8 MHz
unsigned short Time;
void RTI_Init(void){
  asm sei           // Make atomic
  RTICTL = 0x77;    // (0x73 if C32)
  CRGINT = 0x80;    // Arm
  Time = 0;         // Initialize
  asm cli
}
// interrupts every 32.768ms
void interrupt 7 RTIHan(void){
  CRGFLG = 0x80;    // Acknowledge
  Time++;
}
Program 9.2 Implementation of a periodic interrupt using the real time clock feature.
Checkpoint 9.8: How would you modify Program 9.2 to count every 10.24 ms?
9.6
Timer Overflow, Output Compare, and Input Capture
9.6.1 Timer Features and Timer Overflow
Table 9.8 shows the 9S12 registers used in timer overflow, input capture and output compare. The entries shown in bold will be used in this section. The timer overflow interrupt feature can also be used to generate interrupts at a fixed rate, as listed in Table 9.9. The 16-bit TCNT register is incremented at a fixed rate. The TOF trigger flag is set when the counter overflows and wraps back around (automatically) to zero. If armed, the TOF trigger flag will generate an interrupt. Three bits (PR2, PR1, and PR0) in the TSCR2 register determine the rate at which the counter will increment, and hence determine the TOF interrupt rate. To clear the TOF flag (acknowledge the interrupt), the software writes a one to it. To create a TOF periodic interrupt, we enable the timer (TEN = 1), arm the timer overflow (TOI), and set the rate (PR2-0). Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let fE be the frequency of the E clock (adjusted by the PLL). The TOF interrupt rate is

\[ \text{TOF interrupt frequency} = \frac{f_E}{2^{n+16}} \]

\[ \text{TOF interrupt period} = \frac{2^{n+16}}{f_E} \]
Address  msb                                                                lsb  Name
$0044    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TCNT
$0050    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC0
$0052    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC1
$0054    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC2
$0056    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC3
$0058    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC4
$005A    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC5
$005C    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC6
$005E    15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0      TC7

Address  Bit 7   6       5       4       3       2       1       Bit 0   Name
$0240    PT7     PT6     PT5     PT4     PT3     PT2     PT1     PT0     PTT
$0242    DDRT7   DDRT6   DDRT5   DDRT4   DDRT3   DDRT2   DDRT1   DDRT0   DDRT
$0046    TEN     TSWAI   TSBCK   TFFCA   0       0       0       0       TSCR1
$004D    TOI     0       0       0       TCRE    PR2     PR1     PR0     TSCR2
$0040    IOS7    IOS6    IOS5    IOS4    IOS3    IOS2    IOS1    IOS0    TIOS
$004C    C7I     C6I     C5I     C4I     C3I     C2I     C1I     C0I     TIE
$004E    C7F     C6F     C5F     C4F     C3F     C2F     C1F     C0F     TFLG1
$004F    TOF     0       0       0       0       0       0       0       TFLG2
$0048    OM7     OL7     OM6     OL6     OM5     OL5     OM4     OL4     TCTL1
$0049    OM3     OL3     OM2     OL2     OM1     OL1     OM0     OL0     TCTL2
$004A    EDG7B   EDG7A   EDG6B   EDG6A   EDG5B   EDG5A   EDG4B   EDG4A   TCTL3
$004B    EDG3B   EDG3A   EDG2B   EDG2A   EDG1B   EDG1A   EDG0B   EDG0A   TCTL4
Table 9.8 9S12 registers used for timer overflow, input capture, and output compare.
                            E = 4 MHz                 E = 8 MHz                 E = 24 MHz
PR2  PR1  PR0  Divide by    TCNT period  TOF period   TCNT period  TOF period   TCNT period  TOF period
0    0    0    1            250 ns       16.384 ms    125 ns       8.192 ms     42 ns        2.73067 ms
0    0    1    2            500 ns       32.768 ms    250 ns       16.384 ms    83 ns        5.46133 ms
0    1    0    4            1 µs         65.536 ms    500 ns       32.768 ms    167 ns       10.9227 ms
0    1    1    8            2 µs         131.072 ms   1 µs         65.536 ms    333 ns       21.8453 ms
1    0    0    16           4 µs         262.144 ms   2 µs         131.072 ms   667 ns       43.6907 ms
1    0    1    32           8 µs         524.288 ms   4 µs         262.144 ms   1.33 µs      87.3813 ms
1    1    0    64           16 µs        1048.576 ms  8 µs         524.288 ms   2.67 µs      174.763 ms
1    1    1    128          32 µs        2097.152 ms  16 µs        1048.576 ms  5.33 µs      349.525 ms
Table 9.9 Timer overflow periods for various E clock frequencies.
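The TOF periods in Table 9.9 follow directly from the prescale bits and the E clock. The following is a minimal host-side sketch of that calculation; the function name TOF_PeriodUs and the microsecond units are assumptions for illustration, and the result is truncated to a whole number of microseconds.

```c
#include <stdint.h>

/* Returns the TOF interrupt period in us for a given TSCR2 value and
   E clock frequency in Hz. Period = 2^(n+16)/fE, where n is PR2-0. */
uint32_t TOF_PeriodUs(uint8_t tscr2, uint32_t fEHz){
  uint32_t n = tscr2 & 0x07;                 /* PR2, PR1, PR0 */
  uint64_t eCycles = 1ULL << (n + 16);       /* E clock cycles per overflow */
  return (uint32_t)((eCycles * 1000000ULL) / fEHz);
}
```

For example, TSCR2 = $82 at 8 MHz and TSCR2 = $81 at 4 MHz both give 32768 µs, matching the 32.768 ms entries in Table 9.9.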
Example 9.3 Write software that increments a global variable every 32.768 ms.

Solution The solution will use a periodic timer overflow interrupt that occurs every 32.768 ms. The main program executes TOF_Init to initialize the periodic interrupts, as shown in Program 9.3. The interrupt rate is determined by the crystal frequency, the PLL and the TSCR2 register. With an E clock of 8 MHz, 32.768 ms/125 ns is 2^18, so the bottom three bits of TSCR2 should be 2. Bit 7 of the TSCR2 register is set to arm the TOF system. The TOF_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.
; 9S12C32 4MHz, 9S12DP512 8 MHz
         org  $0800        ;($3800 if C32)
Time     rmb  2
         org  $4000
TOF_Init sei               ;make atomic
         movb #$80,TSCR1   ;enable TCNT
         movb #$82,TSCR2   ;($81 if C32)
         movw #0,Time
         cli               ;enable IRQ
         rts
TOFHan   movb #$80,TFLG2   ;acknowledge
         ldd  Time
         addd #1
         std  Time
         rti
         org  $FFDE
         fdb  TOFHan       ;vector
// 9S12C32 4MHz, (9S12DP512 8 MHz)
unsigned short Time;
void TOF_Init(void){
  asm sei           // Make atomic
  TSCR1 = 0x80;     // enable counter
  TSCR2 = 0x82;     // (0x81 if C32)
  Time = 0;         // Initialize
  asm cli           // enable interrupts
}
interrupt 16 void TOFHan(void){
  TFLG2 = 0x80;     // Acknowledge
  Time++;
}
Program 9.3 Implementation of a periodic interrupt using timer overflow.
Checkpoint 9.9: How would you modify Program 9.3 to count approximately every 1 second?
9.6.2 Output Compare Interrupts
The third mechanism to generate periodic interrupts is output compare. There are 8 independent output compare channels, numbered 0 to 7. Let i be the channel number, 0 ≤ i ≤ 7. To enable output compare, the corresponding bit in the TIOS register must be set. When the TCNT register matches TCi, the output compare flag, CiF, is set. If armed (CiI = 1), then it will request an interrupt. To clear the CiF flag (acknowledge the interrupt), the software writes a one to it. The ISR will acknowledge the interrupt and set TCi = TCi + PERIOD, where PERIOD is a constant specifying the time until the next interrupt. The interrupting period is determined by the TCNT period (set by TSCR2) multiplied by the constant PERIOD. Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let fE be the frequency of the E clock (adjusted by the PLL). The output compare interrupt rate is

\[ \text{OC interrupt frequency} = \frac{f_E}{2^{n}\cdot\text{PERIOD}} \]

\[ \text{OC interrupt period} = \frac{\text{PERIOD}\cdot 2^{n}}{f_E} \]

The TCTL1 and TCTL2 registers are also used for output compare. If OMi = OLi = 0, then an output compare event will not directly affect the output pin. If the pair (OMi,OLi) equals (0,1), then the output pin will toggle on each output compare. If the pair (OMi,OLi) equals (1,0), then the output pin will clear on each output compare. If the pair (OMi,OLi) equals (1,1), then the output pin will set on each output compare.
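Two details of the output compare calculation are worth checking numerically: the number of E clock cycles between interrupts, and the fact that TCi + PERIOD is computed with 16-bit arithmetic that wraps in the same way TCNT does. The following host-side sketch illustrates both; the function names OC_PeriodCycles and OC_Next are hypothetical.

```c
#include <stdint.h>

/* E clock cycles between output compare interrupts: PERIOD * 2^n,
   where n is the bottom three bits of TSCR2 */
uint32_t OC_PeriodCycles(uint16_t period, uint8_t tscr2){
  return (uint32_t)period << (tscr2 & 0x07);
}

/* Next compare value; the 16-bit result wraps modulo 65536, matching TCNT */
uint16_t OC_Next(uint16_t tc, uint16_t period){
  return (uint16_t)(tc + period);
}
```

With PERIOD = 62500 and n = 7 the interval is 8,000,000 E cycles, which is 1 s at 8 MHz. Starting from TC6 = 60000, the next compare value is 56964, and the 16-bit difference between the two is still 62500 even though the addition wrapped.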
Example 9.4 Write software that increments a global variable every 1 second.

Solution With an E clock of 8 MHz, 1 s/125 ns is 8,000,000. The only possibility is to make n equal to 7 and PERIOD equal to 62500. Program 9.4 shows a periodic interrupt using output compare 6, incrementing a global variable, Time, every 1 sec. During the initialization, bit 7 of TSCR1 is set to activate the timer system. The TCNT period is set to 16 µs in TSCR2. Bit 6 in TIOS is set to activate output compare on channel 6. The arm bit is set in TIE. The global variable is cleared in the initialization. The initial value of TC6 is set so the first interrupt occurs in 80 µs (subsequent interrupts will occur every 1 s). It is possible the C6F flag might already be set, due to activity occurring before the initialization is executed. Clearing the C6F trigger flag in the initialization guarantees the first interrupt will occur exactly 80 µs later.

; 9S12C32 4 MHz, 9S12DP512 8 MHz
PERIOD   equ  62500        ;in 16usec
         org  $0800        ;($3800 if C32)
Time     rmb  2
         org  $4000
OC6_Init sei               ;make atomic
         movb #$80,TSCR1   ;enable TCNT
         movb #$07,TSCR2   ;($06 if C32)
         bset TIOS,#$40    ;activate OC6
         bset TIE,#$40     ;arm OC6
         movw #0,Time
         ldd  TCNT         ;time now
         addd #5           ;first in 80us
         std  TC6
         movb #$40,TFLG1   ;clear C6F
         cli               ;enable IRQ
         rts
OC6Han   movb #$40,TFLG1   ;acknowledge
         ldd  TC6
         addd #PERIOD
         std  TC6          ;next in 1 s
         ldd  Time
         addd #1
         std  Time
         rti
         org  $FFE2
         fdb  OC6Han       ;vector
// 9S12C32 4 MHz, 9S12DP512 8 MHz
#define PERIOD 62500
unsigned short Time;
void OC6_Init(void){
  asm sei             // Make atomic
  TSCR1 = 0x80;       // enable TCNT
  TSCR2 = 0x07;       // 16us TCNT (0x06 if C32)
  TIOS |= 0x40;       // activate OC6
  TIE |= 0x40;        // arm OC6
  Time = 0;           // Initialize
  TC6 = TCNT+5;       // first in 80us
  TFLG1 = 0x40;       // clear C6F
  asm cli             // enable IRQ
}
interrupt 14 void OC6handler(void){
  TC6 = TC6+PERIOD;   // next in 1 s
  TFLG1 = 0x40;       // acknowledge C6F
  Time++;
}
Program 9.4 Implementation of a periodic interrupt using output compare.

Checkpoint 9.10: How would you modify Program 9.4 to count at 100 Hz?

Observation: The phase-lock-loop (PLL) on the 9S12 will affect the TOF and output compare rates.
Example 9.5 Design an interface for a 32 Ω speaker and use it to generate a loud 1 kHz sound.

Solution At 5 V, a 32 Ω speaker will require a current of about 150 mA. We will use the 2N2222 circuit in Figure 8.16 because it can sink at least three times the current needed for this speaker. In this example the interface will be connected to PT6. We select a 5 V supply and connect it to the +V in the circuit. The needed base current is

\[ I_b = I_{coil}/h_{fe} = 150\ \text{mA}/100 = 1.5\ \text{mA} \]

The desired interface resistor is

\[ R_b = (V_{OH} - V_{be})/I_b = (5 - 0.6)/1.5\ \text{mA} = 2.9\ \text{k}\Omega \]

To cover the variability in hfe, we will use a 1.5 kΩ resistor instead of the 2.9 kΩ. The actual voltage on the speaker when active will be 5 − 0.3 = 4.7 V. We can make the sound quieter by using a larger resistor for Rb.

To generate the 1 kHz sound we need a 1 kHz squarewave. There are two good methods on the 9S12 to generate squarewaves. First, the output compare module can be used to create an interrupt every 0.5 ms, and make the output toggle at each interrupt. The second method uses the pulse width modulator (PWM), which was previously presented in Section 8.6. The output compare method is used here (Program 9.4 adapted), but the PWM approach has the advantage of not requiring a periodic interrupt. The initialization of Program 9.5 selects toggle mode for output compare 6. Specifically, we set the bits (OM6,OL6) to (0,1) in TCTL1. To select the frequency of the sound we simply set the rate at which output compare interrupts are generated. To turn the sound off, we disarm OC6 interrupts. Notice with toggle mode, the output compare hardware changes the PT6 output automatically. Using automatic mode (as compared to having the software set and clear the port) creates a squarewave with a very low jitter (down to the stability of the crystal).
; 9S12C32 4 MHz, 9S12DP512 8 MHz
OC6_Init sei               ;make atomic
         movb #$80,TSCR1   ;enable TCNT
         movb #$03,TSCR2   ;($02 if C32)
         bset TIOS,#$40    ;activate OC6
         bset TIE,#$40     ;arm OC6
         bclr TCTL1,#$20   ;OM6=0
         bset TCTL1,#$10   ;OL6=1
         ldd  TCNT         ;time now
         addd #50          ;first in 50us
         std  TC6
         movb #$40,TFLG1   ;clear C6F
         cli               ;enable IRQ
         rts
OC6Han   movb #$40,TFLG1   ;acknowledge
         ldd  TC6
         addd #500
         std  TC6          ;next in 0.5 ms
         rti
         org  $FFE2
         fdb  OC6Han       ;vector
// 9S12C32 4 MHz, 9S12DP512 8 MHz
void OC6_Init(void){
  asm sei             // Make atomic
  TSCR1 = 0x80;       // enable TCNT
  TSCR2 = 0x03;       // 1 MHz TCNT (0x02 if C32)
  TIOS |= 0x40;       // activate OC6
  TIE |= 0x40;        // arm OC6
  TCTL1 = (TCTL1&0xCF)|0x10;  // OM6=0, OL6=1, toggle
  TC6 = TCNT+50;      // first in 50us
  TFLG1 = 0x40;       // clear C6F
  asm cli             // enable IRQ
}
interrupt 14 void OC6handler(void){
  TC6 = TC6+500;      // next in 0.5 ms
  TFLG1 = 0x40;       // acknowledge C6F
}
Program 9.5 Sound output using output compare.
Observation: To make a quieter sound, we could use a larger resistor between the 9S12 output and the 2N2222 base.
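The resistor calculation in Example 9.5 can be repeated for other loads or transistor gains. The following is a minimal sketch of that arithmetic using floating point on a desktop compiler; the function names and the choice of mA and kΩ units are assumptions for illustration, and the values passed in are the ones quoted in the example.

```c
/* Base current in mA needed to switch a load current (mA) with gain hfe */
double BaseCurrent_mA(double iLoad_mA, double hfe){
  return iLoad_mA / hfe;
}

/* Base resistor in kOhm that supplies ib_mA from an output at voh volts,
   allowing for a base-emitter drop of vbe volts */
double BaseResistor_kOhm(double voh, double vbe, double ib_mA){
  return (voh - vbe) / ib_mA;       /* volts divided by mA gives kOhm */
}
```

With a 150 mA load, hfe = 100, VOH = 5 V and Vbe = 0.6 V, these return 1.5 mA and about 2.93 kΩ, in agreement with the 2.9 kΩ quoted in the example.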
9.6.3 Input Capture Interrupts
We can use input capture to measure the period or pulse width of digital signals. The input capture system can also be used to trigger interrupts on rising or falling transitions of external signals. Table 9.8 shows the registers needed for input capture. TCNT is a 16-bit counter incremented at a fixed rate, determined by the E clock and the TSCR2 register. On most 9S12 microcontrollers, an input capture feature exists for each of the eight Port T inputs (let n be 0 to 7, representing the input PT0 to PT7 respectively). There is a separate 16-bit input capture register for each of the 8 input capture modules (TC0 to TC7). Each input capture module has
■ A direction register bit, DDRTn
■ An external input pin, PTn
■ A flag bit, CnF
■ Two edge control bits, EDGnB EDGnA
■ An interrupt arm bit, CnI
■ A 16-bit input capture register, TCn
In this book, we use the term arm to describe the bit that allows/denies a specific flag from requesting an interrupt. The Freescale manuals refer to this bit as a mask. I.e., the device is armed when the mask bit is 1. Typically, there is a separate arm bit for every flag that can request an interrupt. An external input signal is connected to the input capture pin (PT0 to PT7). The EDGnB, EDGnA bits specify whether the rising, falling or both rising and falling edges of the external signal will trigger an input capture event, see Table 9.10. Two or three actions result from an input capture event:
1. The current TCNT value is copied into the input capture register, TCNT → TCn
2. The input capture flag is set, 1 → CnF
3. An interrupt is requested if the CnI equals 1

This means an interrupt can be requested on a capture event. The input capture mechanism has many uses. Three common applications of input capture are:
1. An interrupt service routine is executed on the active edge of the external signal
2. Perform two rising edge input captures and subtract the measurements to get period
3. Perform rising edge then falling edge captures and subtract the measurements to get pulse width

The flag bits do not behave like a regular memory location. In particular, a flag cannot be set by software. Rather, an input capture or output compare hardware event will set the flag. The other peculiar behavior of the flag is that the software must write a one to the flag in order to clear it. If the software writes a zero to the flag, no change will occur. The pin is selected as input capture by placing a 0 in the corresponding bit of the TIOS register. There is a direction register, DDRT, and we should clear the corresponding bits for the input capture inputs. We specify the active edge (i.e., the edge that latches TCNT and sets the flag) by initializing the TCTL3 and TCTL4 registers, as described in Table 9.10.
We can arm or disarm the input capture interrupts by initializing the TIE register. Our software can determine if an input capture event has occurred by reading the TFLG1 register. Every time the TCNT register overflows from $FFFF to 0, the TOF flag in the TFLG2 register is set. The TOF flag will cause an interrupt if the mask TOI equals 1.

Checkpoint 9.11: When does an input capture event occur?
Table 9.10 Two control bits define the active edge used for input capture.

EDGnB  EDGnA  Active Edge
0      0      None
0      1      Capture on rising
1      0      Capture on falling
1      1      Capture on both rising and falling
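The two edge control bits for each channel are packed four channels per register, as laid out in Table 9.8: channels 0 to 3 in TCTL4 and channels 4 to 7 in TCTL3. The following host-side sketch shows one way to compute a new register value for a given channel; the type Edge_t and the function SetEdge are hypothetical, and the caller is assumed to pick the correct register for the channel.

```c
#include <stdint.h>

/* Values of the (EDGnB,EDGnA) pair from Table 9.10 */
typedef enum {
  EDGE_NONE = 0,
  EDGE_RISING = 1,
  EDGE_FALLING = 2,
  EDGE_BOTH = 3
} Edge_t;

/* Returns tctl with the two edge bits for the channel replaced.
   tctl is a copy of TCTL4 for channels 0-3 or TCTL3 for channels 4-7. */
uint8_t SetEdge(uint8_t tctl, uint8_t channel, Edge_t edge){
  uint8_t shift = (uint8_t)(2 * (channel & 0x03));  /* bit position within the register */
  tctl &= (uint8_t)~(0x03 << shift);                /* clear EDGnB and EDGnA */
  tctl |= (uint8_t)((uint8_t)edge << shift);        /* insert the new pair */
  return tctl;
}
```

For channel 1 and a rising edge this produces the same result as the expression (TCTL4&0xF3)|0x04.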
Checkpoint 9.12: What happens during an input capture event?

Observation: The TCNT timer is very accurate because of the stability of the crystal clock. Therefore, measurements based on the clock will also be very accurate.

Observation: When measuring period or pulse-width, the measurement resolution will equal the TCNT period.
The flags in the TFLG1 and TFLG2 registers are cleared by writing a 1 into the specific flag bit we wish to clear. For example, writing a $FF into TFLG1 will clear all 8 flags. The following is a valid method for clearing C3F. I.e., this acknowledge sequence clears the C3F flag without affecting the other 7 flags in the TFLG1 register.

    TFLG1 = 0x08;

Checkpoint 9.13: Write assembly or C code to clear C6F.

Common Error: Executing TFLG1 |= 0x08; will mistakenly clear all the bits in the TFLG1 register.
Example 9.6 Design a system that measures period with a resolution of 1 μs.

Solution Period is defined as the time from one rising edge to the next rising edge. The input signal will be connected to PT1 (any Port T pin could have been used) and the input capture system will be used to measure period. The initialization function first sets the I bit, so interrupts do not occur until the entire initialization sequence is complete (see Program 9.6). TIOS bit 1 and DDRT bit 1 are cleared so PT1 will be an input capture. Input capture is part of the timer module, which is activated by setting the TEN bit. The resolution of the system is determined by the period of the TCNT, so TSCR2 is set to make the TCNT period equal to 1 μs, assuming the E clock is 8 MHz.

Because the 9S12 must execute the ISR every rising edge, we should not try to use this solution to measure periods less than 50 μs. In particular, it takes 9 bus cycles to perform an interrupt context switch plus 31 cycles to execute this assembly language ISR (the Metrowerks CodeWarrior C ISR executes in 30 cycles), so 40 cycles or 5 μs are required for each edge. If the input wave has a period of 50 μs, then the ISR software consumes 10% of the available processor execution. At the other extreme, this solution will be incorrect for periods over 65.535 ms.

The TCTL4 register is configured so PT1 captures on each rising edge. Global variables are initialized and interrupts are armed and enabled. The 16-bit subtraction in the ISR calculates the number of TCNT clocks between rising edges. Since the ritual does not wait for the first edge, the first period measurement will be incorrect and should be neglected.

Period  rmb  2              ;resolution 1us
First   rmb  2              ;TCNT at first edge
Done    rmb  1              ;set each rising
Init    sei                 ;make atomic
        bclr TIOS,#$02      ;PT1=input capture
        bclr DDRT,#$02      ;PT1 is input
        movb #$80,TSCR1     ;enable TCNT
        movb #$03,TSCR2     ;1us clk
        bclr TCTL4,#$08     ;EDG1BA =01
        bset TCTL4,#$04     ;on rise of PT1
        movw TCNT,First     ;init global
        clr  Done
        movb #$02,TFLG1     ;clear C1F
        bset TIE,#$02       ;Arm C1F
        cli                 ;enable
        rts
TC1Han  ldd  TC1            ;[3]
        subd First          ;[3]
        std  Period         ;1us resolution [3]
        movw TC1,First      ;setup [6]
        movb #$02,TFLG1     ;clear C1F [4]
        movb #$FF,Done      ;[4]
        rti                 ;[8]
        org  $FFEC          ;timer channel 1
        fdb  TC1Han

// Range = 50 us to 65.535 ms,
// no overflow checking
unsigned short Period;  // 1us units
unsigned short First;   // TCNT first edge
unsigned char Done;     // Set each rising
void Init(void){
asm sei                 // make atomic
  TIOS &=~0x02;         // PT1 input capture
  DDRT &=~0x02;         // PT1 is input
  TSCR1 = 0x80;         // enable TCNT
  TSCR2 = 0x03;         // 1us clock
  TCTL4 = (TCTL4&0xF3)|0x04; // rising
  First = TCNT;         // first will be wrong
  Done = 0;             // set on subsequent
  TFLG1 = 0x02;         // Clear C1F
  TIE |= 0x02;          // Arm IC1
asm cli
}
void interrupt 9 TC1Han(void){
  Period = TC1-First;   // 1us resolution
  First = TC1;          // Setup for next
  TFLG1 = 0x02;         // ack by clearing C1F
  Done = 0xFF;
}

Program 9.6 A software system implementing 16-bit period measurement.
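The processor load quoted in the solution follows from simple arithmetic on the cycle counts. The sketch below is a host-side C calculation, not 9S12 code, and reproduces the 5 μs and 10% figures under the stated assumption of an 8 MHz E clock. The function names isr_time_ns and isr_load_percent are illustrative and do not appear in Program 9.6.

```c
#include <stdint.h>

/* Time, in nanoseconds, needed to service one edge:
   interrupt context switch plus the ISR body.
   bus_hz is the E clock frequency (8 MHz in Example 9.6). */
uint32_t isr_time_ns(uint32_t switch_cycles, uint32_t isr_cycles, uint32_t bus_hz){
  uint64_t cycles = (uint64_t)switch_cycles + isr_cycles;
  return (uint32_t)((cycles * 1000000000ULL) / bus_hz);
}

/* Percentage of processor time consumed when the ISR runs once per input period. */
uint32_t isr_load_percent(uint32_t isr_ns, uint32_t period_ns){
  return (uint32_t)(((uint64_t)isr_ns * 100U) / period_ns);
}
```

Calling isr_time_ns(9, 31, 8000000) gives the per-edge cost, and passing that result with the input period to isr_load_percent shows how quickly the load grows as the period shrinks toward the 50 μs lower bound.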
Because the input capture interrupt has a separate vector, the software does not poll. An interrupt is requested on each rising edge of the input signal. Figure 9.9 illustrates the period measurement for one situation with a period of 8192 μs. On the first interrupt, TCNT ($F000) is latched into TC1. The ISR will save the $F000 in the private global called First. On the second interrupt, TCNT ($1000) is latched into TC1. The ISR will perform the 16-bit subtraction $1000 - $F000 = $2000 = 8192, and store the 8192 into the public global called Period. This method is accurate as long as the period is between 50 and 65535 μs.

Figure 9.9 Example measurement of an input with an 8192 μs period. (Timing diagram: TCNT increments every 1 μs; on the first rising edge of PT1, C1F is set and TC1 latches $F000; on the next rising edge, 8192 cycles later, C1F is set and TC1 latches $1000.)
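The subtraction works even though TCNT rolls over between the two edges, because unsigned 16-bit arithmetic is performed modulo 65536. The following host-side C sketch illustrates this with the values from Figure 9.9; the function name elapsed_counts is hypothetical and is not part of Program 9.6.

```c
#include <stdint.h>

/* Elapsed TCNT counts between two captures of a free-running 16-bit timer.
   The cast back to uint16_t makes the subtraction wrap modulo 65536,
   so a rollover from $FFFF to $0000 between captures is handled correctly. */
uint16_t elapsed_counts(uint16_t first, uint16_t second){
  return (uint16_t)(second - first);
}
```

The same property explains the upper limit of the method: two captures exactly 65536 counts apart are indistinguishable from two captures at the same instant.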
Checkpoint 9.14: How would you modify Program 9.6 to implement a 2 μs measurement resolution?
The interface circuit in Figure 9.7 could be combined with Program 9.6 to measure the speed of a spinning motor by connecting V2 to PT1 and calculating Speed = constant/Period.
9.7 Pulse Accumulator

The pulse accumulator is a mechanism on the 9S12 to count events, measure frequency, or measure pulse width on a digital input signal. For example, if we wished to know how fast a motor is spinning, we could use a tachometer, which generates a square wave with a frequency that is related to motor speed. We interface the tachometer output to the PT7 input and use the pulse accumulator to measure either frequency or pulse width. The software then converts the pulse accumulator measurements into motor speed. The 9S12 pulse accumulator is a 16-bit read/write counter that can operate in either of two modes. External event counting mode can be used for counting events or frequency measurement. We will use gated time accumulation mode for pulse width measurement. The I/O ports involved in the 9S12 pulse accumulator are shown in Table 9.11. The bits used in this section are shown in bold.
Address  Bit 7  6      5      4      3     2     1      Bit 0  Name
$0046    TEN    TSWAI  TSBCK  TFFCA  0     0     0      0      TSCR1
$0060    0      PAEN   PAMOD  PEDGE  CLK1  CLK0  PAOVI  PAI    PACTL
$0061    0      0      0      0      0     0     PAOVF  PAIF   PAFLG
$0240    PT7    PT6    PT5    PT4    PT3   PT2   PT1    PT0    PTT
$0242    DDRT7  6      5      4      3     2     1      Bit 0  DDRT

Address  msb                                       lsb  Name
$0062    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0          PACNT

Table 9.11 9S12 I/O ports used by the pulse accumulator.
DDRT7 is the Data Direction bit for PT7. Normally, the DDRT7 bit is cleared so PT7 is an input, but even if it is configured for output, PT7 still drives the pulse accumulator. PAEN is the Pulse Accumulator System Enable bit. Turn this bit on to activate the pulse accumulator. The PAMOD and PEDGE bits select the operation mode, as shown in Table 9.12.
PAMOD  PEDGE  Mode                     Action on Clock                    Sets PAIF
0      0      event counting           PT7 falling edge increments PACNT  Falling edge
0      1      event counting           PT7 rising edge increments PACNT   Rising edge
1      0      gated time accumulation  Counts when PT7 = 1                Falling edge
1      1      gated time accumulation  Counts when PT7 = 0                Rising edge

Table 9.12 9S12 pulse accumulator operation modes on PT7.
In the event counting mode, the 16-bit counter (PACNT) is incremented on either the rising edge or falling edge of PT7. The maximum clocking rate for the external event counting mode is the E clock frequency divided by two. Event counting mode does not require the timer to be enabled. To use counting mode to measure frequency, we count the number of edges in a fixed time, T. We define frequency resolution as the smallest change in frequency the system can recognize. In this approach, the frequency resolution will be 1/T. The range of frequencies that can be measured will be 0 to 65535/T.

In the gated time accumulation mode, a free-running clock (E clock divided by 64) increments the 16-bit counter. In particular, the E clock divided by 64 increments PACNT while the PT7 input is active. Gated accumulation mode does require the TEN bit in the TSCR1 register to be set. We can use gated accumulation mode to measure pulse width. We define pulse width resolution as the smallest change in pulse width the system can recognize. Let tE be the period of the E clock. The pulse width resolution will be 64*tE. The range of pulse widths that can be measured will be 64*tE to 65535*64*tE.

The PAOVF status bit is set each time the pulse accumulator count rolls over from $FFFF to $0000. To clear this status bit, we write a one to bit 1 of the PAFLG register. The PAOVI bit will arm the device so that a pulse accumulator interrupt is requested when PAOVF is set. When PAOVI is zero, pulse accumulator overflow interrupts are disarmed. The PAIF status bit is automatically set each time a selected edge is detected at the PT7 pin (PEDGE = 0 means falling edge, and PEDGE = 1 means rising edge). To clear this status bit, we write a one to bit 0 of the PAFLG register. The PAI bit will arm the device so that a pulse accumulator interrupt is requested when PAIF is set. When PAI is zero, pulse accumulator input interrupts are disarmed.
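The resolution and range formulas for the two modes can be evaluated numerically. The sketch below is host-side C, intended only as a worked calculation; the function names are assumptions made for illustration, and the counting window and E clock frequency are parameters the designer chooses.

```c
#include <stdint.h>

/* Event counting over a fixed window of window_ms milliseconds (T).
   Resolution is 1/T, and the 16-bit counter limits the range to 65535/T. */
double freq_resolution_hz(uint32_t window_ms){
  return 1000.0 / window_ms;
}
double freq_max_hz(uint32_t window_ms){
  return 65535.0 * 1000.0 / window_ms;
}

/* Gated time accumulation: PACNT is clocked by the E clock divided by 64,
   so the resolution is 64*tE and the largest measurable width is 65535*64*tE. */
double pulse_resolution_us(uint32_t e_clock_hz){
  return 64.0 * 1.0e6 / e_clock_hz;
}
double pulse_max_us(uint32_t e_clock_hz){
  return 65535.0 * pulse_resolution_us(e_clock_hz);
}
```

With a 1 s window the frequency resolution is 1 Hz and the range extends to 65535 Hz. With an 8 MHz E clock the pulse width resolution is 8 μs and the range extends to about 0.52 s.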
Observation: The PACNT input and timer channel 7 use the same pin, PT7. To use the pulse accumulator, disconnect PT7 from the output compare logic by clearing bits OM7 and OL7. Also clear the output compare 7 mask bit for channel 7, OC7M7.
Example 9.7 Design a system that measures frequency with a resolution of 1 Hz.

Solution To establish the frequency resolution at 1 Hz, we count the number of falling edges that occur in one second. The signal to be measured will be connected to the pulse accumulator input, which is PT7 on the 9S12. The frequency measurement function, shown in Program 9.7, enables the pulse accumulator and selects event counting mode. When measuring frequency it usually doesn't matter whether we count rising or falling edges, but in this case falling edges will be counted. The approach will be to initialize the pulse accumulator to event counting, clear the count, wait 1 second, then read the counter. Since frequency is defined as the number of edges in one second, the value in PACNT after the one-second time delay will be the frequency in Hz. The 9S12 can measure 0 to 65535 Hz. The frequency resolution (which is the smallest change in frequency that can be distinguished) will be 1 Hz. In general, the frequency resolution will be one divided by the fixed time during which counts are measured. The PAOVF bit will be set if the input frequency exceeds the measurement range. If the input signal has a frequency of 22.1 Hz (as illustrated in Figure 9.10), then the function will return a result of 22.
Freq_Init
        bclr DDRT,#$80      ;PT7 is input
        movb #$40,PACTL     ;count falling
        rts
;measures 0 to 65535 Hz
;returns Reg D = freq in Hz
Freq_Measure
        movw #0,PACNT
        movb #$02,PAFLG     ;clear PAOVF
        ldy  #1000
        bsr  Timer_Wait1ms
        brclr PAFLG,#$02,ok ;check PAOVF
bad     ldd  #65535         ;too big
        bra  out
ok      ldd  PACNT          ;units in Hz
out     rts

void Freq_Init(void){
  DDRT &= ~0x80;   // PT7 input
  PACTL = 0x40;    // count falling
}
// measures 0 to 65535 Hz
// returns result in Hz
unsigned short Freq_Measure(void){
  PACNT = 0;
  PAFLG = 0x02;
  Timer_Wait1ms(1000);
  if(PAFLG&0x02){
    return(65535);
  }
  return PACNT;    // frequency
}

Program 9.7 Frequency measurement using the pulse accumulator.

Figure 9.10 Example measurement with an input with a 22 Hz frequency. (Timing diagram: during the 1 s window, PACNT counts from 0 to 22 on the edges of PT7.)
Checkpoint 9.15: What will be the output of Program 9.7 if the frequency is 1234.56 Hz?
Checkpoint 9.16: How do you modify Program 9.7 so that it measures frequency with a resolution of 1 kHz? What will the output then be if the frequency is 1234.56 Hz?
Example 9.8 Design a system that measures pulse width with a resolution of 8 μs.

Solution Pulse width is defined as the time the input signal is high. Again, the input signal will be connected to the pulse accumulator input, which is PT7 on the 9S12. The pulse width measurement function, shown in Program 9.8, enables the pulse accumulator and selects gated accumulation mode. In this case, PEDGE is set to zero, so PACNT will accumulate when the input is high. With PEDGE equal to zero, PAIF will be set on the falling edge of the input, signaling that the pulse width measurement is complete. The approach will be to initialize the pulse accumulator to gated accumulation mode, clear the count, wait for PAIF to be set, then read the counter. Since PACNT counts while the input is high, the value in this counter will represent the width of the pulse. The pulse width resolution is the smallest change in pulse width that can be distinguished. In general, the pulse width resolution will be the period of the free-running clock used to increment the counter. Assuming the 9S12 E clock period is 125 ns, the pulse width resolution will be 8 μs. The 9S12 can measure 8 μs to 0.52 s. The PAOVF bit will be set if the input pulse width exceeds the measurement range. If the input signal has a pulse width of 152 μs (as illustrated in Figure 9.11), then the function will return a result of 152/8 or 19.
Pulse_Init
        bclr DDRT,#$80      ;PT7 is input
        movb #$60,PACTL     ;measure high
        rts
;returns Reg D = pulse width in 8us
; measures 8us to 0.52s
Pulse_Measure
        movw #0,PACNT
        movb #$03,PAFLG     ;clear PAOVF, PAIF
loop    brclr PAFLG,#$01,loop
        brclr PAFLG,#$02,ok ;check PAOVF
bad     ldd  #65535         ;too big
        bra  out
ok      ldd  PACNT          ;units in 8us
out     rts

void Pulse_Init(void){
  DDRT &= ~0x80;   // PT7 input
  PACTL = 0x60;    // measure high
}
// measures 8us to 0.52 sec
// returns result in 8us
unsigned short Pulse_Measure(void){
  PACNT = 0;
  PAFLG = 0x03;    // clear PAOVF and any stale PAIF
  while((PAFLG&0x01)==0){};
  if(PAFLG&0x02){
    return(65535);
  }
  return PACNT;    // pulse width
}

Program 9.8 Pulse width measurement using the pulse accumulator.

Figure 9.11 Example measurement of an input with a 152 μs pulse width. (Timing diagram: while PT7 is high for 152 μs, PACNT counts E/64 ticks of 8 μs each from 0 to 19; PAIF is set on the falling edge.)
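To predict what the counter should read for a given input, one can model the gated accumulation arithmetic on a host computer. The sketch below assumes the pulse begins on a tick boundary of the E/64 clock (real hardware may differ by one count) and saturates at 65535 to mirror the overflow result returned by Program 9.8. The function name expected_pacnt is hypothetical.

```c
#include <stdint.h>

/* Expected PACNT value for a pulse of width_ns nanoseconds when the counter
   is clocked by the E clock divided by 64. Returns 65535 if the count would
   not fit in 16 bits. */
uint16_t expected_pacnt(uint64_t width_ns, uint32_t e_clock_hz){
  uint64_t tick_ns = (64ULL * 1000000000ULL) / e_clock_hz; /* 8000 ns at 8 MHz */
  uint64_t count = width_ns / tick_ns;                      /* whole ticks only */
  return (count > 65535U) ? 65535U : (uint16_t)count;
}
```

For the 152 μs pulse of Figure 9.11 and an 8 MHz E clock, the model gives 19, in agreement with the example.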
Checkpoint 9.17: The pulse width resolution of the system in Program 9.8 is 8 μs. What does that mean?
Checkpoint 9.18: What will be the output of Program 9.8 if the pulse width is 1234.5 μs?
9.8 *Direct Memory Access

The purpose of this section is to introduce terminology of high-speed I/O interfacing. The bandwidth of an I/O device is the number of bytes/sec that can be transferred. Real-time systems have extremely tight requirements for both latency and bandwidth. The 9S12 has a 16-bit data bus. If it is executing at 24 MHz, the data bus bandwidth is 48 Mbytes/sec. A high speed SCI interface can achieve only 10,000 bytes/sec. The SPI clock can run at 12 Mbps, but peak bandwidth for an SPI interface will be limited by software speed.

One of the limitations of software-based interfaces such as busy-wait and interrupts is that the data must be brought into the processor and manipulated before it can be transferred to memory. If you wish to transfer data from an SPI input device into RAM, you must first transfer it from SPIDR to Register A, then from Register A into RAM. In order to achieve high bandwidth, we need to be able to transfer data directly from input to RAM or RAM to output using Direct Memory Access, or DMA. Because DMA is faster, we will use this method to interface high bandwidth devices like disks and networks.

A key architecture component is the availability of a co-processor that can perform I/O functions in parallel with but separate from the processor execution. We program a co-processor in much the same way as we program the regular processor. For example, there can be a program counter and general purpose registers. However, the instructions are usually very simple, explicitly defining an I/O operation to perform. An architecture with simple and explicit machine codes is called Reduced Instruction Set Computer (RISC). Devices that support DMA include the hard drive controller on the PC, the video graphics controller on the PC, and the XGate peripheral co-processor on the 9S12X series of microcontrollers from Freescale.

During a read DMA cycle (Figure 9.12) data flows directly from the memory to the output device. During the DMA cycles the co-processor drives the address and control bus.
Figure 9.12 A DMA read cycle copies data from RAM, ROM or input device into an output device. (Block diagram: address $3800, control R, and data $98 moving from RAM to the output ports.)
During a write DMA cycle (Figure 9.13) data flows directly from the input device to memory.

Figure 9.13 A DMA write cycle copies data from the input device into RAM, or output device. (Block diagram: address $3800, control W, and data $25 moving from the input ports to RAM.)
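The bandwidth figures quoted at the start of this section can be checked with a short calculation. This is a host-side C sketch of the arithmetic only; the function name bus_bandwidth_bytes_per_sec is an assumption made for illustration, and the model assumes one full-width transfer on every bus cycle.

```c
#include <stdint.h>

/* Peak bandwidth, in bytes per second, of a bus that transfers
   bus_width_bits on every cycle at bus_hz cycles per second. */
uint32_t bus_bandwidth_bytes_per_sec(uint32_t bus_width_bits, uint32_t bus_hz){
  return (bus_width_bits / 8U) * bus_hz;
}
```

For a 16-bit bus at 24 MHz the result is 48,000,000 bytes/sec, which is 4800 times the 10,000 bytes/sec quoted for a high speed SCI interface. The gap between what the bus can carry and what a software-driven serial port delivers is the motivation for moving data without passing it through the processor.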
Prediction: The need for I/O bandwidth will increase faster than the processor execution speed; therefore I/O co-processors will become more prevalent in embedded systems of the future.
Prediction: The power requirements increase linearly with bandwidth, so there will always be a place in the embedded systems market for low-speed, low-power systems.
9.9 Hardware Debugging Tools

Microcomputer-related problems often require the use of specialized equipment to debug the system hardware and software. Two very useful tools are the logic analyzer and the in-circuit emulator (ICE). A logic analyzer is essentially a multiple channel digital storage scope with many ways to trigger (see Figure 9.14). As a troubleshooting aid, it allows the experimenter to observe numerous digital signals at various points in time and thus make decisions based upon such observations. As with any debugging process, it is necessary to select which information to observe out of a vast set of possibilities. Any digital signal in the system can be connected to the logic analyzer. Figure 9.14 shows an 8-channel logic analyzer, but real devices can support 128 or more channels.

One problem with logic analyzers is the massive amount of information they generate. With logic analyzers (similar to other debugging techniques) we must strategically select which signals in the digital interfaces to observe and when to observe them. In particular, the triggering mechanism can be used to capture data at appropriate times, eliminating the need to sift through volumes of output. Sometimes there are extra I/O pins on the microcontroller, not needed for the normal operation of the system (shown as the bottom two wires in Figure 9.14). In this case, we can connect the pins to a logic analyzer, and add software debugging instruments that set and clear these pins at strategic times within the software. In this way we can visualize the hardware/software timing.
Figure 9.14 A logic analyzer and example output. (Block diagram: the logic analyzer is connected to the digital interfaces of the 9S12 and to the spare pins PT1 and PT0.)
Some microcontrollers have external pins that carry the address, R/W, and data signals, which contain bus cycle information as discussed in Section 4.2. In this case, we could connect address, R/W, and data to the logic analyzer. The logic analyzer must be synchronized to the processor, so that the analyzer knows which memory reads are op code fetches. This way the execution location and the data the software calculates can be reconstructed from the bus cycles. This debugging method is nonintrusive. This process doesn't work on high-performance processors such as the Pentium because (1) there is an internal memory cache to contain the data it needs most frequently, and (2) it fetches many op codes that are never actually executed, as it tries to prefetch machine codes it thinks the processor will need in the future.

An in-circuit emulator is a hardware debugging tool that recreates the input/output signals of the processor chip. To use an ICE, we remove the processor chip. One side of the cable is inserted into the vacated processor chip socket, and the other side is connected to the ICE. Figure 9.15 shows the microcomputer system with and without the ICE. Notice the cable between the debugging instrument (ICE) and the microcomputer socket on the target board. In most cases, the emulator/computer system operates at full speed. The emulator allows the programmer to observe and modify internal registers of the processor. Emulators are often integrated into a personal computer, so that its editor, hard drive, and printer are available for the debugging process.

Figure 9.15 In-circuit emulator and example output. (Block diagram: an embedded system with a 9S12 microcomputer and I/O, beside the same system with the emulator cabled into the socket; the emulator displays the registers A, B, X, Y, SP, PC and I/O ports such as PortH, PortJ, PortS, PortT, PortE, and TCNT.)

Observation: Many target microcomputer systems have the microcomputer chip soldered onto the circuit board, and thus it cannot be removed.
To debug a board-level system where the program is stored in an external ROM, we can use another class of emulator called the ROM-emulator (see Figure 9.16). This debugging tool replaces the ROM with a cable that connects to a dual-port RAM within the emulator. While the software is running, it fetches information from the emulator RAM just as if it were the ROM. While the software is halted, you can modify its contents.

Figure 9.16 In-circuit ROM emulator and example output. (Block diagram: the processor's ROM socket is cabled over the address/data bus to RAM inside the emulator; the example output lists addresses $E000 to $E005 containing $B6 $02 $40 $B7 $02 $50, interpreted as ldaa $0240 and staa $0250.)
Observation: An in-circuit ROM emulator can only be used in a microcomputer system that stores the program into an external ROM chip.
The only disadvantage of the in-circuit emulator is its cost. To provide some of the benefits of this high-priced debugging equipment, the 9S12 has a background debug module (BDM). The BDM hardware exists on the microcomputer chip itself and communicates with the debugging computer via a dedicated serial interface, as shown in Figure 9.17. Although not as flexible as an ICE, the BDM can provide the ability to observe software execution in real-time, the ability to set breakpoints, the ability to stop the computer and the ability to read and write registers, I/O ports and memory. The registers can only be observed when the computer is halted, but the memory and I/O ports are accessible while the program is executing.
Figure 9.17 P&E Microcomputer Systems Multilink BDM.

9.10 Profiling

Profiling is similar to performance debugging because both involve dynamic behavior. Profiling is a debugging process that collects the time history of strategic variables. For example, if we could collect the time-dependent behavior of the program counter, then we could see the execution patterns of our software. We can profile the execution of a multiple thread software system to detect reentrant activity. We can profile a software system to see which of two software modules is run first. For a real-time system, we need to guarantee the time between when software should be run and when it actually runs is short and bounded. Profiling allows us to measure when software is actually run, experimentally verifying the system is real-time.
9.10.1 Profiling Using a Software Dump to Study Execution Pattern
In this section, we will use a debugging instrument to study the execution pattern of our software. In order to collect information concerning execution we will define a debugging instrument that saves the time and location in an array (like a dump), as shown in Program 9.9. The debugging session will initialize the private global N to zero. In this profile, the place p will be an integer, uniquely specifying from which place in the software Profile is called. The assembly version of Profile requires 44 cycles to execute (including the ldy and jsr). If the 9S12 is running at 24 MHz, this debugging instrument consumes less than 2 μs per call. This amount of time would usually be classified as minimally intrusive.

Time    rmb  200
Place   rmb  200
N       rmb  1
Profile                     ;RegY contains p
        pshb
        pshx
        ldab N
        cmpb #100           ;full?
        bhs  Pdone
        lslb                ;16-bits each
        ldx  #Time
        movw TCNT,B,x       ;record time
        ldx  #Place
        sty  B,X            ;record place
        inc  N
Pdone   pulx
        pulb
        rts

unsigned short Time[100];
unsigned short Place[100];
unsigned char N;
void Profile(unsigned short p){
  if(N<100){
    Time[N] = TCNT;   // record time
    Place[N] = p;     // record place
    N++;
  }
}

Program 9.9 Debugging instrument for profiling.

//------t=sqrt(s)------
unsigned char sqrt(unsigned char s){
unsigned char t;      // resolution 1/16
unsigned char cnt;    // loop counter
unsigned short s16;
  Profile(0);
  t = 0;              // secant method
  if(s>0) {
    Profile(1);
    s16 = 16*s;
    t = 32;           // guess 2.0
    for(cnt=3; cnt; cnt--){
      Profile(2);
      t = ((t*t+s16)/t)/2;
    }
  }
  Profile(3);
  return t;
}
Observation: Debugging instruments need to save and restore registers so the original function is not disrupted.
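After a run, the Time and Place arrays can be examined to see how long the software spent between instrumented locations. The following is a minimal host-side sketch of that analysis, not 9S12 code: MockTCNT stands in for the hardware TCNT register so the logic can be exercised on a desktop compiler, and Profile_Elapsed is a hypothetical helper that does not appear in Program 9.9.

```c
#include <stdint.h>

#define PROFILE_SIZE 100

/* Stand-in for the hardware timer; on the 9S12 this would be TCNT. */
uint16_t MockTCNT;

uint16_t Time[PROFILE_SIZE];   /* timer value at each call */
uint16_t Place[PROFILE_SIZE];  /* caller-supplied location code */
uint8_t  N;                    /* number of entries recorded so far */

/* Record one (time, place) pair; calls are ignored once the buffer is full. */
void Profile(uint16_t p){
  if(N < PROFILE_SIZE){
    Time[N]  = MockTCNT;
    Place[N] = p;
    N++;
  }
}

/* Timer counts between two recorded entries, computed modulo 65536
   so that a single TCNT rollover between them is tolerated. */
uint16_t Profile_Elapsed(uint8_t from, uint8_t to){
  return (uint16_t)(Time[to] - Time[from]);
}
```

Because the buffer stops filling at 100 entries, the dump captures the beginning of the run; a circular buffer would be one alternative if the most recent history were of more interest.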
9.10.2 Profiling Using an Output Port
In this section, we will discuss a hardware/software combination to visualize program activity. Our debugging instrument will set output port bits. We will place these instruments at strategic places in the software. If we are using a regular oscilloscope, then we must stabilize the system so that the function is called over and over. We connect the output pins to a scope or logic analyzer and observe the program activity. Program 9.11 uses an output port to profile.
;------t=sqrt(s)------
; input s RegA, resolution 1/16
; output t Reg B, 1/16
t       rmb  1              ;8-bit, res=1/16
cnt     rmb  1              ;loop counter
s16     rmb  2              ;16-bit 16*s
sqrt    movb #0,PTT
        clrb                ;sqrt(0)=0
        tsta
        beq  done
        movb #1,PTT
        ldab #16
        mul                 ;16*s
        std  s16            ;s16=16*s
        movb #32,t          ;t=2.0
        movb #3,cnt
next    movb #2,PTT
        ldaa t              ;RegA=t
        tab                 ;RegB=t
        tfr  a,x            ;RegX=t
        mul                 ;RegD=t*t
        addd s16            ;RegD=t*t+16*s
        idiv                ;RegX=(t*t+16*s)/t
        tfr  x,d
        lsrd                ;RegB=((t*t+16*s)/t)/2
        adcb #0
        stab t
        dec  cnt
        bne  next
done    movb #3,PTT
        rts

//------t=sqrt(s)------
unsigned char sqrt(unsigned char s){
unsigned char t;      // resolution 1/16
unsigned char cnt;    // loop counter
unsigned short s16;
  PTT = 0;
  t = 0;              // secant method
  if(s>0) {
    PTT = 1;
    s16 = 16*s;
    t = 32;           // guess 2.0
    for(cnt=3; cnt; cnt--){
      PTT = 2;
      t = ((t*t+s16)/t)/2;
    }
  }
  PTT = 3;
  return t;
}

Program 9.11 A time/position profile using two output bits.
Checkpoint 9.19: Write two friendly debugging instruments, one that sets Port B bit 3 high, and the other makes it low.
9.10.3 *Thread Profile
When more than one thread is active, you could use the previous technique to visualize the thread that is currently running. For each thread, we assign an output pin. The debugging instrument would set the corresponding bit high when the thread starts and clear the bit when the thread stops. We would then connect the output pins to a multiple channel scope or logic analyzer to visualize in real time the thread that is currently running. For an example of this type of profile, run one of the thread.* examples included with the TExaS simulator, and observe the logic analyzer. Program 9.12 shows a simple thread profile of a system with a foreground thread (main program) and a background thread (ISR). PT1 will be high when the software is running in the foreground and PT0 will be high when executing in the background. The debugging instruments are shown in bold. The ISR saves the previous PTT value at the beginning and restores it at the end. The results shown in Figure 9.18 demonstrate that the interrupt occurs every 128 μs and that, most of the time, the software is running in the foreground.
        org  $0800          ;($3800 if C32)
Time    rmb  2
        org  $4000
main    lds  #$4000
        bset DDRT,#$03      ;PT1,PT0 output
        movb #$20,RTICTL    ;($10 if C32)
        movb #$80,CRGINT    ;arm RTI
        movw #0,Time
        cli                 ;enable IRQ
        movb #$02,PTT       ;foreground
loop    bra  loop
; interrupts every 128us
RTIHan  ldab PTT            ;save
        movb #$01,PTT       ;background
        movb #$80,CRGFLG    ;ack
        ldx  Time
        inx
        stx  Time
        stab PTT            ;restore
        rti
        org  $FFF0
        fdb  RTIHan         ;vector

unsigned short Time;
void main(void){
  DDRT |= 0x03;    // PT1,PT0 output
  RTICTL = 0x20;   // (0x10 if C32)
  CRGINT = 0x80;   // Arm
  Time = 0;        // Initialize
asm cli
  PTT = 0x02;      // foreground
  while(1){
  }
}
// interrupts every 128us
void interrupt 7 RTIHan(void){
  char oldPTT=PTT;
  PTT = 0x01;      // background
  CRGFLG = 0x80;   // Acknowledge
  Time++;
  PTT = oldPTT;
}

Program 9.12 Implementation of a periodic interrupt using the real time clock feature.

Figure 9.18 Real-time thread profile measured with a logic analyzer.
Observation: Notice in Figure 9.18 that the time to execute the ISR (when PT0 is high) is short compared to the time between interrupt requests (period of PT0). This represents a good interrupt design.
9.11 Tutorial 9. Profiling

In this tutorial we will profile a real-time system that uses four periodic output compare interrupts. The goal of the system is to periodically execute four separate tasks in the background. Each task is performed at a fixed rate; the four rates are similar but unequal, as shown in Table T9.1. As you can see from the table, in each 1 second the time to execute all four tasks is less than 200 ms. In other words, we plan to use only 20 percent of the available processor time. The TCNT period will be set to 16 μs.
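The 20 percent budget can be checked from the interrupt periods and execution times listed in Table T9.1. The sketch below is a host-side C calculation under the assumption that each task runs for its full execution time once per period; the function names are illustrative and are not part of the tutorial files.

```c
#include <stdint.h>

/* Microseconds of processor time used per second by a periodic task
   that runs for exec_us every period_us. */
uint32_t task_time_per_second_us(uint32_t exec_us, uint32_t period_us){
  return (uint32_t)(((uint64_t)exec_us * 1000000U) / period_us);
}

/* Total processor time per second, in microseconds, for n periodic tasks. */
uint32_t total_time_per_second_us(const uint32_t exec_us[],
                                  const uint32_t period_us[], uint32_t n){
  uint32_t total = 0;
  for(uint32_t i = 0; i < n; i++){
    total += task_time_per_second_us(exec_us[i], period_us[i]);
  }
  return total;
}
```

With four 50 μs tasks at periods of 1264, 1168, 1072, and 800 μs, the total comes to roughly 191.5 ms in each second, which is under the 200 ms budget.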
Task    ISR code        Interrupt period  Time to execute task  Total time in 1 second
Task 0  TC0 = TC0 + 79  1264 μs           50 μs                 39.6 ms
Task 1  TC1 = TC1 + 73  1168 μs           50 μs                 42.8 ms
Task 2  TC2 = TC2 + 67  1072 μs           50 μs                 46.6 ms
Task 3  TC3 = TC3 + 50  800 μs            50 μs                 62.5 ms

Table T9.1 Real-time requirements of an embedded system.

Monitors and memory dumps are minimally intrusive techniques to collect strategic information without slowing the system under test too much. At the start of each ISR, one bit in Port T will be set high, and at the end of the ISR, that bit will be cleared. In addition, the main program will toggle PT4. We profile this system by observing all five bits on a logic analyzer. This profile will allow us to see where and when our tasks are running. We will be studying Task 3 in particular, so we expect PT3 to go high for 50 μs every 800 μs.

The second debugging instrument used in this tutorial is a memory dump. It is a memory dump because the debugging information is not output or displayed, but rather it is just dumped into a memory buffer. In particular, we will measure the time from one execution of Task3 until the next execution of Task3. These measurements are entered into a histogram so we can see the variability in the period. Let I be the difference in TCNT cycles, which we expect to be 50 each time. We calculate the time error or jitter as J = I - 50. Next, we make it unsigned (K = J + 8) and apply upper and lower bounds (if K < 0 then K = 0, if K > 16 then K = 16). A histogram is a count of the number of times an event occurs, so we perform Dbg_Hist[K]++. The first entry, Dbg_Hist[0], is the number of times the time between executing Task3 is more than 96 μs too early. The middle entry, Dbg_Hist[8], is the number of times it is perfect (50 cycles or 800 μs). Similarly, Dbg_Hist[16] is the number of times it is more than 96 μs too late.

Dbg_Hist rmb 34 ;16-bit counts

Question 9.1 What could cause a delay in executing the Task3 ISR?

The debugging instruments shown in Program T9.1 were used to profile the system. FirstFlag is a flag used to skip the first measurement, because there is no previous interrupt to measure the time delay from. PreviousTCNT is the TCNT measurement from the previous execution of Task3. The initialization is called once, and the measurement is called from the start of Task3.

Question 9.2 Why is it important to know the variability in the time between successive executions of a periodic task?
Question 9.3 Consider the situation when two interrupts are requested at the same time. Is one lost or just delayed? If both are executed, which one goes first?

Observation: Profiling is made easier if the subroutine has a single rts exit point at the bottom of the function.

Action: Copy the Tutor9.rtf, Tutor9.uc, and Tutor9.scp files from the web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. Assemble and run the system, observing the logic analyzer. You should see something like Figure T9.1.

Question 9.4 Observe Figure T9.1. PT3 signifies the execution of Task3. The time between the first and second PT3 pulses is noticeably longer than the time between the second and third PT3 pulses. Why?
Dbg_Init
        bset DDRT,#$1F      ;monitors
        movb #1,FirstFlag
        ldx  #Dbg_Hist
        ldy  #34
Dbg.1   clr  1,x+           ;clear
        dbne Y,Dbg.1
        rts
Dbg_Measure
        ldd  TCNT           ;time now
        pshd
        tst  FirstFlag
        bne  Dbg.2
        subd PreviousTCNT
        subd #42            ;means 42*16=672us
        bpl  Dbg.3
        ldd  #0             ;way too small
Dbg.3   cpd  #16
        bls  Dbg.4
        ldd  #16            ;way too big
Dbg.4   ldx  #Dbg_Hist
        lsld                ;16-bit entries
        ldy  D,X
        iny
        sty  D,X
Dbg.2   puld
        std  PreviousTCNT
        clr  FirstFlag
        rts

Program T9.1 Debugging instruments to measure time jitter of a periodic task.
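The binning arithmetic performed by Dbg_Measure may be easier to follow in C. The sketch below is a host-side rendering of the histogram update described in this tutorial, not a translation of the tutorial files: Dbg_Bin and Dbg_Record are hypothetical names, and the interval is treated as an unsigned 16-bit count, so it differs from the assembly for extremely long intervals, where the assembly's signed test would select bin 0.

```c
#include <stdint.h>

#define EXPECTED_COUNTS 50   /* 800 us at 16 us per TCNT count */
#define HIST_CENTER      8   /* bin that records zero jitter */
#define HIST_MAX        16   /* index of the last bin */

uint16_t Dbg_Hist[HIST_MAX + 1];  /* 17 bins of 16 bits, i.e. 34 bytes */

/* Map a measured interval, in TCNT counts, to a histogram bin 0..16. */
uint8_t Dbg_Bin(uint16_t interval){
  int32_t k = (int32_t)interval - EXPECTED_COUNTS + HIST_CENTER; /* K = J + 8 */
  if(k < 0)        k = 0;         /* way too early */
  if(k > HIST_MAX) k = HIST_MAX;  /* way too late */
  return (uint8_t)k;
}

/* Record one measurement from two successive TCNT readings. */
void Dbg_Record(uint16_t previous, uint16_t now){
  uint16_t interval = (uint16_t)(now - previous);  /* modulo 65536 */
  Dbg_Hist[Dbg_Bin(interval)]++;
}
```

Note that subtracting 50 and adding 8 is the same as subtracting 42, which is the single constant that appears in the assembly version.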
Question 9.5 In the original design specification we expected the four tasks to occupy 20% of the available processor time. Does the data in Figure T9.1 support or reject this hypothesis?
Figure T9.1 Profile the system.
Question 9.6 Observing the listing file, estimate the intrusiveness of the Dbg_Measure instrument.

Action: Close the logic analyzer window (so the simulation runs faster). Start the system and let it run for a long time.

Question 9.7 The Dbg_Hist[8] entry will get very large, but does either Dbg_Hist[0] or Dbg_Hist[16] ever get incremented? What does that mean?
Action: If we were to activate the PLL, changing the E clock from 8 MHz to 24 MHz, then the tasks would run 3 times faster. We can quickly simulate this effect by changing the 400 in the Fiftyus function to 133, making all four tasks complete in about 17 μs instead of 50 μs. Changing the E clock does not change how often the tasks should be run. That is, Task 3 should still run every 800 μs, but now takes only 17 μs to complete. Assemble the new system and let it run for a long time.

Question 9.8 Can you say this new system is real time?
Question 9.9 How would you prove Task 3 is now running in real time?
9.12 Homework Problems

Homework 9.1 Your job is to design a device driver for a computer mouse. Assuming it is to be written in C, give the Mouse.h header file that lists the prototypes for the public functions. You show just the header file, not the implementation file.

Homework 9.2 Your job is to design a device driver for a black and white text-based video screen. There are 24 lines and 80 columns. Assuming it is to be written in C, give the Video.h header file that lists the prototypes for the public functions.

Homework 9.3 In this problem you will write an assembly language subroutine that outputs data to the following printer using a busy-waiting handshake protocol.
Figure Hw9.3 Printer interface. (Block diagram: 9S12 pin PA4 connects to the printer Start signal, PA0 to Ack, and PB7-PB0 to Data.)
The following sequence will print one ASCII character:
1. The microcomputer puts the 8-bit ASCII on the Data lines
2. The microcomputer issues a Start pulse (does not matter how wide)
3. The microcomputer waits for the Ack pulse (Printer is done)
a) Show the subroutine that outputs a character. You may assume the Ack pulse is larger than 10 µs. The 8-bit ASCII data to print is passed by value in Reg B. An example calling sequence is
      ldab #'V      ; ASCII 'V'
      jsr  Output
b) How long is your Start pulse? Explain your calculation.
Homework 9.4 Redesign the printer interface of Homework 9.3 using interrupt synchronization. Connect the printer to Ports H and J and use a key wakeup interrupt on the Ack signal. Write three routines: an initialization subroutine to turn it on, a public function that accepts a null-terminated ASCII string pointed to by register X, and an interrupt service routine triggered by the rising edge of Ack. The ASCII string to print is passed by reference in Reg X. An example calling sequence is
      ldx  #String  ; pointer to null-terminated ASCII string
      jsr  OutString
After the string has been printed, the system should disarm.
Homework 9.5 What happens if you forget to execute cli in the initialization in a system using interrupts?
Homework 9.6 What happens if you execute cli as the first instruction in an ISR?
Homework 9.7 What happens if you execute sei as the last instruction in an ISR?
Homework 9.8 Write interrupting software that maintains the time of day. Give the initialization, the ISR, and the interrupt vector. The initial time of day is passed in when initialization is called. Register A contains the initial hour, Register B contains the initial minute. Assume the initial seconds are 0. Implement military time, where the hour goes from 0 to 23.
Homework 9.9 Write interrupting software that counts a global variable at 1 Hz. Give the initialization, the ISR, and the interrupt vector.
Homework 9.10 Assuming the object code is running in RAM, write three debugging subroutines that implement a ScanPoint system. The first subroutine initializes your system. The second subroutine adds a ScanPoint at the address passed into it in Register D. You may assume that the ScanPoint address is the first byte of an op code. When the target program executes that scanned instruction, the values of the registers are displayed, the original instruction is executed, and the program continues execution. Your system should be able to support up to ten ScanPoints. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine removes a ScanPoint at the address passed into it in Register D. For simplicity, you may assume scanpoints are only placed at single-byte instructions.
Homework 9.11 Assuming the object code is running in RAM, write debugging subroutines that implement single stepping. In particular, write a subroutine that executes the target software at the address passed into it in Register D. You may assume that the starting address is the first byte of an op code. Your system should execute the target program one instruction at a time, showing the values of the registers, and pausing for SCI input after each instruction. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. If the operator types ‘q’, then the debugging halts and control is returned to the program that called your subroutine. For any other input, you should execute the next instruction.
This is an advanced topic and will require output compare interrupts to solve. Homework 9.12 Create the repeating waveform on PT7 output as shown in Figure Hw9.12. Design the software system using RTI periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, RTI initialization, main program, RTI ISR, RTI vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The RTI ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places.
Figure Hw9.12 Desired output. (PT7 waveform with alternating intervals of 5.12 ms and 10.24 ms.)
Homework 9.13 Create the repeating waveform on PT1 output as shown in Figure Hw9.13. Design the software system using OC1 periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, OC1 initialization, main program, OC1 ISR, OC1 vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The OC1 ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places. Figure Hw9.13 Desired output.
(PT1 waveform with alternating intervals of 5 ms and 10 ms.)
Homework 9.14 Redesign the FSM in Homework 6.24 to run in the background using RTI interrupts. Execute the FSM every 2.048 ms. There are no backward jumps in the ISR. Homework 9.15 Assume the PLL is running so the E clock is 25 MHz. Redesign the FSM in Homework 6.25 to run in the background using input capture and output compare interrupts. The FSM is run whenever there is a rising edge on PT3. There are no backward jumps in the ISR. Homework 9.16 Redesign the FSM in Homework 6.26 to run in the background using output compare interrupts. Execute the FSM every 10 ms. There are no backward jumps in the ISR. Homework 9.17 Redesign the FSM in Homework 6.27 to run in the background using output compare interrupts. Execute the FSM every 5 ms. There are no backward jumps in the ISR. Homework 9.18 Redesign the FSM in Homework 6.28 to run in the background using TOF interrupts. Execute the FSM every 16.384 ms. There are no backward jumps in the ISR.
Homework 9.19 Assume the PLL is running so the E clock is 24 MHz. Redesign the system in Homework 6.25 without using the FSM to run in the background using input capture and output compare interrupts. An input capture occurs on the rising edge on PT3. The pulse is created with output compare. There are no backward jumps in the ISR.
Homework 9.20 These seven events all occur during each output compare 7 interrupt.
1. The TCNT equals TC7 and the hardware sets the flag bit (e.g., C7F = 1)
2. The output compare 7 vector address is loaded into the PC
3. The I bit in the CCR is set by hardware
4. The software executes movb #$80,TFLG1
5. The CCR, A, B, X, Y, PC are pushed on the stack
6. The software executes something like
      ldd  TC7
      addd #2000
      std  TC7
7. The software executes rti
List one possible order in which the events occur.
Homework 9.21 There is a digital square wave connected to input PT0. Use input capture on PT0 and output compare on channel 1 to measure the frequency on PT0. The range of values is 0 to 10000 Hz, and the desired resolution is 1 Hz. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. For example, if the frequency is 1000 Hz, the variable will be written with 1000. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.
Homework 9.22 There is a digital square wave connected to input PT2. Use input capture on PT2 and output compare on channel 3 to measure the frequency on PT2. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 0.1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. The range of values is 0 to 10000 Hz, and the desired resolution is 10 Hz. For example, if the frequency is 1000 Hz, the variable will be written with 100. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.
Homework 9.23 Interface a switch to PJ7. Use positive logic (switch pressed makes PJ7 = 1). The switch bounce time is 10 ms. Use key wakeup on PJ7 and output compare on channel 0 to count the number of times the switch is pressed. Interrupt on the rising edge of PJ7. In the PJ7 ISR, disarm PJ7 and arm OC0 to interrupt in 15 ms. In the OC0 ISR, if the switch is pressed, increment the global variable, Count. The OC0 ISR software should disarm OC0 and rearm PJ7 key wakeup. Show the ritual, key wakeup ISR and output compare ISR. Assume the E clock is 8 MHz. Each touch will cause one key wakeup and one OC interrupt (Count incremented). Similarly, each release will cause one key wakeup and one OC interrupt (Count not incremented).
9.13 Laboratory Assignments
Lab 9.1 Traffic Light Controller
Purpose. This lab has these major objectives: the usage of linked list data structures, the creation of a segmented software system, and interrupt synchronization, all by designing an input-directed traffic light controller.
Description. Design, implement, and test the traffic light system described in Lab 6.5 with the added constraint that the software runs in the background using periodic interrupts. In particular, there are three components: a data structure containing the state graph, an initialization function that is called once to start the machine, and a periodic interrupt service routine that executes the state machine. All the other specifications and constraints described in Lab 6.5 still apply.
10 Numerical Calculations
Chapter 10 objectives are to:
• Introduce fixed-point and use it to develop numerical solutions
• Develop extended precision mathematical calculations
• Define floating point formats
The overall theme of this chapter is numerical calculations. Non-integer values can be represented on the computer using either fixed-point or floating point. Without hardware support, floating point operations run many times slower than fixed-point. Therefore, on a microcontroller like the 9S12 without floating point hardware, we would rather employ fixed-point. In general, we can use fixed-point for situations where the range of values is known at design time, and this range is small.
10.1 Fixed-Point Numbers
We will use fixed-point numbers when we wish to express values in our software that have noninteger values. A fixed-point number contains two parts. The first part is a variable integer, called I. This integer may be signed or unsigned. An unsigned fixed-point number is one that has an unsigned variable integer. A signed fixed-point number is one that has a signed variable integer. The precision of a number system is the total number of distinguishable values that can be represented. The precision of a fixed-point format is determined by the number of bits used to store the variable integer. On the 9S12, we typically use 8 bits or 16 bits. Extended precision can be implemented, but the execution speed will be slower because the calculations will have to be performed using software algorithms rather than with hardware instructions. This integer part is saved in memory and is manipulated by software. These manipulations include but are not limited to add, subtract, multiply, divide, convert to BCD, and convert from BCD. The second part of a fixed-point number is a fixed constant, called $\Delta$. This value is fixed at design time and cannot be changed at run time. The fixed constant is not stored in memory. Usually we specify the value of this fixed constant using software comments to explain our fixed-point algorithm. The value of the fixed-point number is defined as the product of the two parts:
Fixed-point number $\equiv I \cdot \Delta$
The resolution of a number is the smallest difference that can be represented. In the case of fixed-point numbers, the resolution is equal to the fixed constant ($\Delta$). Sometimes we express the resolution of the number as its units. For example, a decimal fixed-point number with a resolution of 0.001 volts is really the same thing as an integer with units of mV. When inputting numbers from a keyboard or outputting numbers to a display, it may be convenient to use decimal fixed-point. With decimal fixed-point the fixed constant is a power of 10.
Decimal fixed-point number $= I \cdot 10^{m}$ for some constant integer m
Again, the integer m is fixed and is not stored in memory. Decimal fixed-point will be easy to input or output to humans, while binary fixed-point will be easier to use when performing mathematical calculations. With binary fixed-point the fixed constant is a power of 2.
Binary fixed-point number $= I \cdot 2^{m}$ for some constant integer m
Observation: If the range of numbers is known and small, then the numbers can be represented in a fixed-point format.
Checkpoint 10.1: Give an approximation of $\pi$ using the decimal fixed-point ($\Delta = 0.001$) format.
Checkpoint 10.2: Give an approximation of $\pi$ using the binary fixed-point ($\Delta = 2^{-8}$) format.
In the first example, we will develop the equations that a 9S12 would need to implement a digital voltmeter. The 9S12 has a built-in analog to digital converter (ADC) that can be used to transform an analog signal into digital form. The 10-bit ADC analog input range is 0 to 5 V, and the ADC digital output varies from 0 to 1023, respectively. Let Vin be the analog voltage in volts and N be the digital ADC output, then the equation that relates the analog to digital conversion is
$V_{in} = 5 \cdot N/1023 \approx 0.0048876 \cdot N$
Resolution is defined as the smallest change in voltage that the ADC can detect. This ADC has a resolution of about 5 mV. In other words, the analog voltage must increase or decrease by 5 mV for the digital output of the ADC to change by at least one bit. It would be inappropriate to save the voltage as an integer, because the only integers in this range are 0, 1, 2, 3, 4, and 5. Since the 9S12 does not support floating point, the voltage data will be saved in fixed-point format. Decimal fixed-point is chosen because the voltage data for this voltmeter will be displayed. A fixed-point resolution of 0.001 V is chosen because it is slightly smaller (better) than the ADC resolution. Table 10.1 shows the performance of the system. The table shows us that we need to store the variable part of the fixed-point number in a 16-bit variable.
Table 10.1 Performance data of a microcomputer-based voltmeter.
Vin (V)         N                     I (0.001 V)
Analog input    ADC digital output    variable part of the fixed-point data
0.000           0                     0
0.005           1                     5
1.000           205                   1000
2.500           512                   2500
5.000           1023                  5000
One possible software formula to convert N into I is as follows.
I = (5000*N + 512)/1023
It is very important to carefully consider the order of operations when performing multiple integer calculations. There are two mistakes that can happen. The first error is overflow, and it is easy to detect. Overflow occurs when the result of a calculation exceeds the range of the number system. The two solutions of the overflow problem were discussed earlier, promotion and ceiling/floor. The other error is called drop-out. Drop-out occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values. To avoid drop-out, it is very important to divide last when performing multiple integer calculations. If you divided first, e.g., I = 5000*(N/1023), then the values of I would be only 0 or 5000. The addition of “512” has the effect of rounding to the closest integer. The value 512 is selected because it is about one half of the denominator. For example, the calculation (5000*N)/1023 = 4 for N = 1, whereas the “(5000*1 + 512)/1023” calculation yields the better answer of 5. The display algorithm is given as Program 10.1.
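The conversion formula can be sketched in C as follows. This is a minimal illustration, not code from the text: the function names ADC_ToFixed and ADC_ToFixedDropOut are hypothetical, and the 32-bit intermediate is an assumption added here because 5000*N can reach 5,115,000, which does not fit in 16 bits.

```c
#include <stdint.h>

// Hypothetical helper: converts a 10-bit ADC sample N (0..1023) into the
// integer part I of a decimal fixed-point voltage with resolution 0.001 V.
// The product is promoted to 32 bits to avoid overflow, 512 is added to
// round to the closest integer, and the divide is performed last.
uint16_t ADC_ToFixed(uint16_t n) {
  uint32_t product = 5000UL * n + 512;
  return (uint16_t)(product / 1023);
}

// Divide-first version, shown only to illustrate drop-out:
// N/1023 is 0 for every N below 1023, so the result is 0 or 5000.
uint16_t ADC_ToFixedDropOut(uint16_t n) {
  return (uint16_t)(5000 * (n / 1023));
}
```

For N = 1 the first function returns 5, while the divide-first version returns 0 for every input except N = 1023.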
Program 10.1 Print unsigned 16-bit decimal fixed-point number to an output device.
    void OutFDec(unsigned short n){  // fixed constant is 0.001
      OutUDec(n/1000);         // digits to the left of the decimal point
      OutChar('.');            // decimal point
      OutUDec((n%1000)/100);   // tenths digit
      OutUDec((n%100)/10);     // hundredths digit
      OutUDec(n%10);           // thousandths digit
      OutChar('V');}           // units
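For testing on a desktop compiler, the same digit extraction can be sketched with a function that writes into a character buffer instead of calling the output device. The name FormatFDec is hypothetical, and the use of snprintf is an assumption for illustration; on the 9S12 itself the OutUDec and OutChar calls of Program 10.1 would be used.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: formats n (fixed constant 0.001) as text such as
// "5.000". The buffer should hold at least 7 bytes ("65.535" plus the
// terminating null).
void FormatFDec(uint16_t n, char *buf, size_t size) {
  // n/1000 gives the digits left of the decimal point,
  // n%1000 gives the three digits to the right, padded with zeros.
  snprintf(buf, size, "%u.%03u", (unsigned)(n / 1000), (unsigned)(n % 1000));
}
```

With this sketch, the input 5 produces "0.005" and the input 5000 produces "5.000".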
When adding or subtracting two fixed-point numbers with the same $\Delta$, we simply add or subtract their integer parts. First, let x, y, z be three fixed-point numbers with the same $\Delta$. Let $x = I \cdot \Delta$, $y = J \cdot \Delta$, and $z = K \cdot \Delta$. To perform $z = x + y$, we simply calculate $K = I + J$. Similarly, to perform $z = x - y$, we simply calculate $K = I - J$. When adding or subtracting fixed-point numbers with different fixed parts, we must first convert the two inputs to the format of the result before adding or subtracting. This is where binary fixed-point is more convenient, because the conversion process involves shifting rather than multiplication/division. In this next example, let x, y, z be three binary fixed-point numbers with different $\Delta$s. In particular, we define x to be $I \cdot 2^{-5}$, y to be $J \cdot 2^{-2}$, and z to be $K \cdot 2^{-3}$. To convert x to the format of z, we divide I by 4 (right shift twice). To convert y to the format of z, we multiply J by 2 (left shift once). To perform $z = x + y$, we calculate
K = (I >> 2) + (J << 1)
For the general case, we define x to be $I \cdot 2^{n}$, y to be $J \cdot 2^{m}$, and z to be $K \cdot 2^{p}$. To perform any general operation, we derive the fixed-point calculation by starting with the desired result. For addition, we have $z = x + y$. Next, we substitute the definitions of each fixed-point parameter
$K \cdot 2^{p} = I \cdot 2^{n} + J \cdot 2^{m}$
Lastly, we solve for the integer part of the result
$K = I \cdot 2^{n-p} + J \cdot 2^{m-p}$
For multiplication, we have $z = x \cdot y$. Again, we substitute the definitions of each fixed-point parameter
$K \cdot 2^{p} = I \cdot 2^{n} \cdot J \cdot 2^{m}$
Lastly, we solve for the integer part of the result
$K = I \cdot J \cdot 2^{n+m-p}$
For division, we have $z = x/y$. Again, we substitute the definitions of each fixed-point parameter
$K \cdot 2^{p} = (I \cdot 2^{n})/(J \cdot 2^{m})$
Lastly, we solve for the integer part of the result
$K = (I/J) \cdot 2^{n-m-p}$
Again, it is very important to carefully consider the order of operations when performing multiple integer calculations. We must worry about overflow and drop-out. In particular, in the division example, if $(n - m - p)$ is positive then the left shift ($I \cdot 2^{n-m-p}$) should be performed before the divide (/J). We can use these fixed-point algorithms to perform complex operations using the integer functions of our 9S12.
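The three derivations above can be sketched in C. The function names are hypothetical, unsigned operands are assumed so that the shifts are well defined, and the formats are chosen for illustration: the addition and multiplication use the example formats (x in units of 2^-5, y in units of 2^-2, z in units of 2^-3), while the division assumes all three operands share a fixed constant of 2^-8 so that n-m-p is positive.

```c
#include <stdint.h>

// z = x + y with x = I*2^-5, y = J*2^-2, z = K*2^-3
// K = I*2^(n-p) + J*2^(m-p) = (I >> 2) + (J << 1)
uint16_t FixAdd(uint16_t i, uint16_t j) {
  return (uint16_t)((i >> 2) + (j << 1));
}

// z = x * y with the same formats: K = I*J*2^(n+m-p) = (I*J) >> 4
// The product is promoted to 32 bits and the right shift comes last.
uint16_t FixMul(uint16_t i, uint16_t j) {
  return (uint16_t)(((uint32_t)i * j) >> 4);
}

// z = x / y with all three operands in units of 2^-8:
// K = (I/J)*2^(n-m-p) with n-m-p = +8, so shift left before dividing.
// The caller must guarantee j is not zero.
uint16_t FixDivQ8(uint16_t i, uint16_t j) {
  return (uint16_t)(((uint32_t)i << 8) / j);
}
```

As a check, x = 2.0 (I = 64) plus y = 1.0 (J = 4) gives K = 24, which is 3.0 in units of 2^-3.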
Example 10.1 Rewrite the following digital filter using fixed-point calculations.
$y = x - 0.0532672 \cdot x1 + x2 + 0.0506038 \cdot y1 - 0.9025 \cdot y2$
Solution In this case, the variables y, y1, y2, x, x1, and x2 are all integers, but the constants will be expressed in binary fixed-point format. The value $-0.0532672$ can be approximated by $-14 \cdot 2^{-8}$. The value 0.0506038 can be approximated by $13 \cdot 2^{-8}$. Lastly, the value $-0.9025$ can be approximated by $-231 \cdot 2^{-8}$. The fixed-point implementation of this digital filter is
y = x + x2 + ((-14•x1 + 13•y1 - 231•y2) >> 8)
Common Error: Lazy or incompetent programmers use floating-point in many situations where fixed-point would be preferable.
Observation: As the fixed constant is made smaller, the accuracy of the fixed-point representation is improved, but the variable integer part also increases. Unfortunately, larger integers will require more bits for storage and calculations.
Checkpoint 10.3: Using a fixed constant of $2^{-8}$, rewrite the digital equation $F = 1.8 \cdot C + 32$ in binary fixed-point format.
Checkpoint 10.4: Using a fixed constant of $10^{-3}$, rewrite the digital filter $y = x - 0.0532672 \cdot x1 + x2 + 0.0506038 \cdot y1 - 0.9025 \cdot y2$ in decimal fixed-point format.
Checkpoint 10.5: Assume resistors R1, R2, R3 are the integer parts of 16-bit unsigned binary fixed-point numbers with a fixed constant of $2^{-4}$ ohms. Write an equation to calculate R3 = R1 || R2 (parallel combination).
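One way the filter of Example 10.1 might be coded in C is sketched below. Several things here are assumptions rather than part of the text: the coefficient signs (-14, +13, -231) follow the usual notch-filter form, the Filter_t structure and Filter_Step name are hypothetical, and a divide by 256 replaces the right shift by 8 because right-shifting a negative value is implementation-defined in C. The divide truncates toward zero, so negative intermediate results may differ by one from an arithmetic shift.

```c
#include <stdint.h>

// Hypothetical state for the filter: previous inputs and outputs.
typedef struct {
  int16_t x1, x2;  // inputs from one and two samples ago
  int16_t y1, y2;  // outputs from one and two samples ago
} Filter_t;

// Computes one output sample: y = x + x2 + (-14*x1 + 13*y1 - 231*y2)/256
int16_t Filter_Step(Filter_t *f, int16_t x) {
  // The weighted sum is promoted to 32 bits; the divide is performed last.
  int32_t acc = -14L * f->x1 + 13L * f->y1 - 231L * f->y2;
  int16_t y = (int16_t)(x + f->x2 + acc / 256);
  f->x2 = f->x1;  // shift the input history
  f->x1 = x;
  f->y2 = f->y1;  // shift the output history
  f->y1 = y;
  return y;
}
```

The state should be cleared to zero before the first sample, and the function would typically be called once per sample from a periodic interrupt.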
10.2 *Extended Precision Calculations
In this section, we will study various techniques to perform extended precision calculations. Sometimes complex calculations can be performed simply by combining simpler operations, while at other times, more sophisticated algorithms will be required. Three 32-bit local variables are used in the examples of this section. For most situations, local variables are more appropriate than globals, although using globals is often faster and easier to debug. Assume there are 12 bytes allocated on the stack pointed to by the stack pointer SP, and the following local variable binding.
    N  set  0  ;32-bit local
    M  set  4  ;32-bit local
    P  set  8  ;32-bit local
10.2.1 Addition and Subtraction
Program 10.2 gives a 32-bit addition algorithm. The approach starts with the least significant byte and uses the add-with-carry operation to combine the 8-bit additions to form the 32-bit operation.
Program 10.2 A 32-bit addition operation.
    ; 32-bit addition P=N+M
    ; Input: Two 32-bit numbers N,M
    ; Output: One 32-bit sum P
    ; Error: C/V set for unsigned/signed overflow
    add32 ldaa N+3,sp  ; start with least significant byte
          adda M+3,sp
          staa P+3,sp
          ldaa N+2,sp  ; next byte
          adca M+2,sp  ; carry from previous addition
          staa P+2,sp
          ldaa N+1,sp  ; next byte
          adca M+1,sp  ; carry from previous addition
          staa P+1,sp
          ldaa N,sp    ; last byte
          adca M,sp    ; carry from previous addition
          staa P,sp
    ; C bit set if unsigned overflow
    ; V bit set if signed overflow, Z bit is not correct
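The carry chaining in Program 10.2 can be modeled in C, which may help when desk checking the assembly. This is only a sketch: the function name Add32_Bytes is hypothetical, and the arrays are assumed to be stored most significant byte first, matching the 9S12 layout where offset +3 holds the least significant byte.

```c
#include <stdint.h>

// Adds two 32-bit numbers one byte at a time, starting with the least
// significant byte (index 3) and passing the carry to the next byte,
// in the same order as the adda/adca sequence. Returns the final carry,
// which corresponds to the C bit (unsigned overflow).
uint8_t Add32_Bytes(const uint8_t n[4], const uint8_t m[4], uint8_t p[4]) {
  uint16_t carry = 0;
  for (int i = 3; i >= 0; i--) {
    uint16_t sum = (uint16_t)(n[i] + m[i] + carry);
    p[i] = (uint8_t)sum;   // low 8 bits are the result byte
    carry = sum >> 8;      // bit 8 is the carry into the next byte
  }
  return (uint8_t)carry;
}
```

Adding $0000FFFF and $00000001 this way yields $00010000 with a final carry of 0, while adding $FFFFFFFF and $00000001 yields $00000000 with a final carry of 1.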
Checkpoint 10.6: Why isn’t the Z bit correct?
Program 10.3 gives a 32-bit subtraction algorithm. Again, the approach starts with the least significant byte and uses the subtract-with-borrow operation to combine the 8-bit subtractions to form the 32-bit operation. Similar to addition, the V and C bits are properly set, while the Z bit is incorrect.
Program 10.3 A 32-bit subtraction operation.
    sub32 ldaa N+3,sp  ; start with least significant byte
          suba M+3,sp
          staa P+3,sp
          ldaa N+2,sp  ; next byte
          sbca M+2,sp  ; borrow from previous subtraction
          staa P+2,sp
          ldaa N+1,sp  ; next byte
          sbca M+1,sp  ; borrow from previous subtraction
          staa P+1,sp
          ldaa N,sp    ; last byte
          sbca M,sp    ; borrow from previous subtraction
          staa P,sp
    ; C bit set if unsigned overflow
    ; V bit set if signed overflow, Z bit is not correct
Program 10.4 presents functions that add (R = A + B) and subtract (R = A - B) two unsigned 8-bit values, using promotion to detect errors. The assembly language version implements the 16-bit local result in Register D. This C program was previously presented as Program 3.2.
Program 10.4 Using promotion to detect and compensate for unsigned overflow errors.
    add  ldab A     ;promote to 16 bits
         clra
         addb B
         adca #0    ;A+B (16 bits)
         cpd  #255
         bls  aOK
         ldd  #255  ;ceiling
    aOK  stab R     ;demote
         rts
    sub  ldab A     ;promote to 16 bits
         clra
         subb B
         sbca #0    ;A-B (16 bits)
         cpd  #0
         bge  sOK
         ldd  #0    ;floor
    sOK  stab R     ;demote
         rts
    unsigned char A,B,R;
    void add(void){ unsigned short result;
      result = A+B;       /* promote */
      if(result>255){     /* overflow ?*/
        result = 255;     /* yes */
      }
      R = result;         /* demote */
    }
    void sub(void){ short result;
      result = A-B;       /* promote */
      if(result<0){       /* underflow ?*/
        result = 0;       /* floor */
      }
      R = result;         /* demote */
    }
[...] 127){ /* result = 127; /* } if(result127){ /* result = 127; /* } if(result [...]
[...] =M)
    ; C,V bits set on divide by zero (M=0)
    ; modifies Reg A,B,X,Y
    i     set  0       ; loop counter
    div32 leas -1,s    ; allocate i
          ldd  M
          bne  d32A    ; divisor not zero
          ldd  M+2
          bne  d32A    ; divisor not zero
          sev          ; divide by zero
          sec
          bra  d32E
    d32A  movw #0,M+4
          movw #0,M+6  ; divisor is 64 bits, right justified
          movw #0,Q
          movw #0,Q+2  ; quotient=0
          movb #32,i,s ; i=32
    d32B  ldx  #M
          jsr  lsr64   ; M=M>>1
          ldx  #Q
          jsr  lsl32   ; Q=Q [...]
[...] = 10){ LCD_OutDec(n/10); n = n%10; }
4) LCD_OutChar(n+$30); /* n is between 0 and 9 */
5) deallocate variable
LCD_OutFix
0) save any registers that will be destroyed by pushing on the stack
1) allocate local variables letter and num on the stack
2) initialize num to input parameter, which is the integer part
3) if number is less than or equal to 9999, go to step 6
4) output the string “*.*** “ calling LCD_OutString
5) go to step 19
6) perform the division num/1000, putting the quotient in letter, and the remainder in num
7) convert the ones digit to ASCII, letter = letter+$30
8) output letter to the LCD by calling LCD_OutChar
9) output ‘.’ to the LCD by calling LCD_OutChar
10) perform the division num/100, putting the quotient in letter, and the remainder in num
11) convert the tenths digit to ASCII, letter = letter+$30
12) output letter to the LCD by calling LCD_OutChar
13) perform the division num/10, putting the quotient in letter, and the remainder in num
14) convert the hundredths digit to ASCII, letter = letter+$30
15) output letter to the LCD by calling LCD_OutChar
16) convert the thousandths digit to ASCII, letter = num+$30
17) output letter to the LCD by calling LCD_OutChar
18) output ‘ ‘ to the LCD by calling LCD_OutChar
19) deallocate variables
20) restore the registers by pulling off the stack
The third component of a device driver is a main program that calls the driver functions, as shown in Program L10.1c. This software has two purposes. For the developer (you), it provides a means to test the driver functions. It should illustrate the full range of features available with the system. The second purpose of the main program is to give your client or customer (e.g., the TA) examples of how to use your driver. Here is a 9S12DP512 example test program, assuming a positive logic switch is connected to PORTAD0 bit 7 (PAD7).
Program L10.1c Main program used to test the LCD driver.
             org   $4000
    Entry    lds   #$4000
             bset  ATDDIEN,#$80  ;PAD7 digital input
             jsr   LCD_Open      ;***Your function that initializes the LCD***
    start    ldx   #Welcome
             jsr   LCD_OutString ;***Your function that outputs a string***
             ldx   #TestData
    loop     brset PTAD,#$80,*   ;wait for switch release
             brclr PTAD,#$80,*   ;wait for switch touch
             jsr   LCD_Clear     ;***Your function that clears the display***
             ldd   0,x
             jsr   LCD_OutDec    ;***Your function that outputs an integer***
             ldaa  #$40          ;Cursor location of the 8th position
             jsr   LCD_GoTo      ;***Your function that moves the cursor***
             ldd   2,x+
             jsr   LCD_OutFix    ;***Your function outputs a fixed-point***
             cpx   #TestDataEnd
             bne   loop
             jsr   LCD_Clear     ;***Your function that clears the display***
             bra   start
    Welcome  fcc   "Welcome "
             fcc   "                                "  ;32 spaces
             fcc   "to lab! "
             fcb   0
    TestData fdb   0,5,16,123,5432,9876,9999,10000,23456,65535
    TestDataEnd
You will use the TExaS simulator to develop and test your device driver, and then you will test your solutions on the real 9S12. There are many functions to write in this lab, so it is important to develop the device driver in small pieces. One technique you might find useful is desk checking. Basically, you hand-execute your functions with a specific input parameter. For example, using just a pencil and paper think about the sequential steps that will occur when LCD_OutDec or LCD_OutFix processes the input 9876. Later, while you are debugging the actual functions on the simulator, you can single step the program and compare the actual data with your expected data. a) One by one each of the subroutines should be designed, implemented and tested. Successive refinement is a development approach that can be used to solve complex problems. If the problem is too complicated to envision a solution, you should redefine the problem and solve an easier problem. If it is still too complicated, redefine it again, simplifying it even more. You could simplify LCD_OutFix 1. Implement the variables in global variables (rather than as local variables on the stack) 2. Ignore special cases with illegal inputs 3. Implement just one decimal digit During the development phase, you implement and test the simpler problem then refine it, adding back the complexity required to solve the original problem. You could simplify LCD_OutDec in a similar fashion.
b) Draw the hardware circuit diagram. Build the interface using the circuit diagram. Please double check your connections before applying power. c) This lab is sufficiently complex that it should be first debugged on the TExaS simulator. Once the system is debugged on the simulator, download and debug it on the real 9S12. Each time a function is called, an activation record is created on the stack, which includes parameters passed on the stack (none in this lab), the return address, and the local variables. You will be asked to create a stack window and identify the activation records created during the execution of LCD_OutDec.
11 Analog I/O Interfacing
Chapter 11 objectives are to:
• Discuss sampling and the Nyquist Theorem
• Use the DAC to generate sounds and music
• Describe the internal ADC on the 9S12
The common theme of this chapter is analog I/O interfacing. The chapter begins with a discussion of representing continuous signals with digital approximations. A digital to analog converter will be used to generate waveforms and sound. This chapter covers some ADC modes built into the 9S12. The ADC is then used to design measurement systems. Finally, the chapter concludes with a control system, which includes both inputs and outputs.
11.1 Approximating Continuous Signals in the Digital Domain
An analog signal is one that is continuous in both amplitude and time. Neglecting quantum physics, most signals in the world exist as continuous functions of time in an analog fashion (e.g., voltage, current, position, angle, speed, force, pressure, temperature, and flow). In other words, the signal has an amplitude that can vary over time, but the value cannot instantaneously change. To represent a signal in the digital domain we must approximate it in two ways: amplitude quantizing and time quantizing. From an amplitude perspective, we will first place limits on the signal restricting it to exist between a minimum and maximum value (e.g., 0 to 5 V), and second, we will divide this amplitude range into a finite set of discrete values. The range of the system is the maximum minus the minimum value. The precision of the system defines the number of values from which the amplitude of the digital signal is selected. Precision can be given in number of alternatives, binary bits or decimal digits. The resolution is the smallest change in value that is significant. Figure 11.1 shows a temperature waveform (solid line), with a corresponding digital representation sampled at 1 Hz and stored as a 5-bit integer number with a range of 0 to 31°C. Because it is digitized in both amplitude and time, the digital samples (individual dots) in Figure 11.1 must exist at an intersection of grey lines. Because it is a time-varying signal (mathematically, this is called a function), we have one amplitude for each time, but it is possible for there to be 0, 1, or more times for each amplitude. The second approximation occurs in the time domain. Time quantizing is caused by the finite sampling interval. For example, the data are sampled every 1 second in Figure 11.1.
In practice we will use a periodic interrupt to trigger an analog to digital convertor (ADC) to digitize information, converting from the analog to the digital domain. Similarly, if we are converting from the digital to the analog domain, we use the periodic interrupt to trigger a digital to analog convertor (DAC). The Nyquist Theorem states that if the signal is sampled with a frequency of fs, then the digital samples only contain frequency components from 0 to 1⁄2 fs. Conversely, if the analog signal does contain frequency components larger than 1⁄2 fs, then there will be an aliasing error during the sampling process. Aliasing is when the digital signal appears to have a different frequency than the original analog signal.
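The aliasing effect can be illustrated with a short C sketch that predicts the apparent frequency of a single sine wave after sampling. The function name AliasFrequency is hypothetical, integer frequencies in Hz are assumed for simplicity, and the folding rule used here is the standard one for an ideal sampler with no analog filter in front of it.

```c
#include <stdint.h>

// Returns the apparent frequency (Hz) of a sine wave at f Hz after it is
// sampled at fs Hz. Components at or below fs/2 are unchanged; components
// above fs/2 fold back into the 0 to fs/2 range. fs must not be zero.
uint32_t AliasFrequency(uint32_t f, uint32_t fs) {
  uint32_t folded = f % fs;       // sampling cannot distinguish f from f - k*fs
  if (folded > fs - folded) {     // above fs/2: reflect about fs/2
    folded = fs - folded;
  }
  return folded;
}
```

For example, with fs = 1000 Hz a 300 Hz input is represented correctly, but a 700 Hz input appears as 300 Hz and a 1200 Hz input appears as 200 Hz.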
Figure 11.1 An analog signal is represented in the digital domain as discrete samples. (Plot of Temperature (C), 0 to 32, versus Time (s), 0 to 10, showing the continuous analog signal and the discrete digital signal.)
Checkpoint 11.1: Why can’t the digital samples represent the little wiggles in the analog signal? Checkpoint 11.2: Why can’t the digital samples represent temperatures above 31°C?
11.2 Digital to Analog Conversion
A digital to analog convertor (DAC) converts digital signals into analog form, as shown in Figure 11.2. An embedded system uses a DAC to effect changes in its external world (e.g., sound, RF wave, pressure, force, or heat). When interfacing a DAC to the 9S12 we can use the SPI synchronous serial port, which was described previously in Section 8.3. The DAC output can be current or voltage. Additional analog processing may be required to filter, amplify, convert or modulate the signal. The DAC precision is the number of distinguishable DAC outputs. The DAC range is the maximum and minimum DAC output. The DAC resolution is the smallest distinguishable change in output. The units of resolution are in volts or amps depending on whether the output is voltage or current. The resolution is the change that occurs when the digital input changes by 1. The MAX550 interface in Figure 8.10 has a precision of 256 alternatives or 8 bits, a range of 0 to 5 V, and a resolution of about 20 mV.
Range(volts) = Precision(alternatives) • Resolution(volts)
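The relationship between range, precision, and resolution can be checked with a small C sketch. The function name DAC_ResolutionMicrovolts is hypothetical, and integer microvolts are used here only to avoid floating point, in keeping with Chapter 10.

```c
#include <stdint.h>

// Resolution in microvolts of a DAC with the given range (in millivolts)
// and number of bits. The precision in alternatives is 2^bits, and
// resolution = range / precision. The multiply is done before the divide
// to avoid drop-out.
uint32_t DAC_ResolutionMicrovolts(uint32_t range_mV, uint8_t bits) {
  uint32_t alternatives = 1UL << bits;
  return (range_mV * 1000UL) / alternatives;
}
```

For an 8-bit DAC with a 5 V range this gives 19531 µV, which is the "about 20 mV" quoted for the MAX550; a 10-bit converter over the same range gives 4882 µV.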
Figure 11.2 Input/output functions of a 10-bit DAC and a 10-bit ADC. (Staircase plot of the 10-bit digital signal, 0 to 1024, versus the analog signal in volts, 0 to 5, with a zoomed-in inset. The DAC maps a digital input to an analog output, and the ADC maps an analog input to digital outputs.)
Let N be an m-bit digital output of the computer, and hence an input to the m-bit DAC. Let the range of the DAC be from Vmin to Vmax. From an overall perspective, the output of the DAC is a linear function of N:
$$V_{out} = (V_{max} - V_{min}) \cdot N / 2^m + V_{min}$$
The DAC accuracy is (Actual − Ideal) / Ideal, where Ideal is the desired output. A DAC is monotonic if an increase in digital value always causes an increase in analog value. This means if the digital signal increments slowly, then the analog output will never decrease.
Example 11.1 Design a 2-bit DAC with a range of 0 to 5 V using resistors.
Solution We begin the design by specifying the exact input/output relationship of the 2-bit DAC. There are two possible solutions depending upon whether we want a resolution of 1.25 V or 1.67 V, shown as V1 and V2 in Table 11.1. The solution implements the V2 response.
Table 11.1 Specifications of the 2-bit DAC.
N    Q1   Q0   V1 (V)   V2 (V)
0    0    0    0.00     0.00
1    0    5    1.25     1.67
2    5    0    2.50     3.33
3    5    5    3.75     5.00
Assume the output high voltage (VOH) of the 9S12 is 5 V, and its output low voltage (VOL) is 0 V. We choose the resistor ratio to be 2/1 so the Q1 bit is twice as significant as the Q0 bit, as shown in Figure 11.3. If both Q1 and Q0 are 0, the output V2 is zero. If Q1 is 0 and Q0 is 5 V, the output V2 is determined by the resistor divider network (5 V through 20 kΩ to V2, then 10 kΩ to 0 V), which gives V2 = 5 V × 10/(10 + 20) = 1.67 V. If Q1 is 5 V and Q0 is 0, the output V2 is determined by the resistor divider network (5 V through 10 kΩ to V2, then 20 kΩ to 0 V), which gives V2 = 5 V × 20/(10 + 20) = 3.33 V. If both Q1 and Q0 are 5 V, the output V2 is 5 V.
Figure 11.3 A 2-bit DAC.
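(Circuit: 9S12 output bit 1 is Q1, connected through a 10 kΩ resistor to V2; output bit 0 is Q0, connected through a 20 kΩ resistor to V2.)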
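The two responses in Table 11.1 can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the resistor network is assumed to be unloaded, and the output pins are assumed to be ideal 0 V or 5 V sources. ResistorDAC_mV models the resistor network (the V2 response), and LinearDAC_mV evaluates the linear transfer function with 2^m in the denominator (the V1 response).

```c
/* Output of an unloaded resistor-summing DAC, in millivolts.
   pin_mV[i] is the voltage on digital pin i (0 or 5000 here) and
   r_ohm[i] is the resistor from that pin to the common output node.
   With no load the node settles at the conductance-weighted average,
   Vout = sum(Vi/Ri) / sum(1/Ri). */
long ResistorDAC_mV(const long pin_mV[], const long r_ohm[], int bits){
  double weighted = 0.0, conductance = 0.0;
  int i;
  for(i = 0; i < bits; i++){
    weighted    += (double)pin_mV[i] / (double)r_ohm[i];
    conductance += 1.0 / (double)r_ohm[i];
  }
  return (long)(weighted / conductance + 0.5);  /* round to nearest mV */
}

/* Ideal linear DAC: Vout = (Vmax - Vmin)*N/2^m + Vmin, all in mV */
long LinearDAC_mV(unsigned long N, unsigned int m, long vmin_mV, long vmax_mV){
  return ((vmax_mV - vmin_mV) * (long)N) / (1L << m) + vmin_mV;
}
```

With Q0 on 20 kΩ and Q1 on 10 kΩ, the resistor model gives 0, 1667, 3333, and 5000 mV for N equal to 0 through 3, while the linear function gives 0, 1250, 2500, and 3750 mV.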
You can realistically build a 4- or 5-bit DAC using this method.
Checkpoint 11.3: How do you build a 3-bit DAC using this method?
11.3 Music Generation
Most digital music devices rely on high-speed DACs to create the analog waveforms required to produce high-quality sound. In this section, we will discuss a very simple sound generation system that illustrates this application of the DAC. The hardware consists of a DAC and a speaker interface. You can drive headphones directly from a DAC output, but to drive a regular speaker, you will need to add an audio amplifier, as illustrated in Figure 11.4. For more information on the audio amplifier, refer to the data sheet of the MC34119. The quality of the music will depend on both hardware and software factors. The precision of the DAC, external noise, and the dynamic range of the speaker are some of the hardware factors. Software factors include the DAC output rate and the complexity of the stored sound data.
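Figure 11.4 DAC allows the software to create music. (Two circuits: the DAC output driving headphones directly, and the DAC output driving a speaker through an MC34119 audio amplifier with 10 kΩ and 20 kΩ resistors and 0.1 μF capacitors.)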
If you output a sequence of numbers to the DAC that form a sine wave, then you will hear a continuous tone on the speaker, as shown in Figure 11.5. The loudness of the tone is determined by the amplitude of the wave. The pitch is defined as the frequency of the wave. Table 11.2 contains frequency values for the notes in one octave.
Figure 11.5 The loudness and pitch are controlled by the amplitude and frequency. (Sine wave annotated with the loudness as its amplitude and pitch = 1/period.)
Table 11.2 Fundamental frequencies of standard musical notes. The frequency for ‘A’ is exact.
Note   Frequency
C      523 Hz
B      494 Hz
Bb     466 Hz
A      440 Hz
Ab     415 Hz
G      392 Hz
Gb     370 Hz
F      349 Hz
E      330 Hz
Eb     311 Hz
D      294 Hz
Db     277 Hz
C      262 Hz
A 4-bit DAC was made with four resistors of values 1.5, 3, 6, and 12 kΩ. The DAC had a range of 0 to 5 V and was interfaced to four output pins of the 9S12. The measured data in Figure 11.6 was collected using this DAC. The plot on the left was measured with a digital scope (without the headphones being attached). The plot on the right shows the frequency response of this data, plotting amplitude (in dB) versus frequency (in kHz). This measured waveform is approximately 2.7 + 2.3 sin(2π·440t) volts. The two peaks in the spectrum are at DC and 440 Hz (e.g., 20·log10(2.3) = 7.2 dB).
Figure 11.6 A 440 Hz sine wave generated with a 4-bit DAC. The plot on the right is the Fourier Transform (frequency spectrum, dB versus kHz) of the data plotted on the left.
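One common way to produce such a tone is to step through a table holding one cycle of the sine wave, sending one entry to the DAC at each periodic interrupt. The sketch below is an assumption-laden illustration, not the software used to collect Figure 11.6: the 32-entry table size, the names SineWave, Sine_Next, and OutputPeriodTicks, and the 1.5 MHz timer rate in the usage note are all choices made for this example. The table entries are round(8 + 7 sin(2πi/32)).

```c
#include <stdint.h>

#define TABLE_SIZE 32

/* One cycle of a sine wave scaled for a 4-bit DAC (values 1 to 15,
   centered on 8): round(8 + 7*sin(2*pi*i/32)) for i = 0 to 31 */
const uint8_t SineWave[TABLE_SIZE] = {
   8, 9,11,12,13,14,14,15,15,15,14,14,13,12,11, 9,
   8, 7, 5, 4, 3, 2, 2, 1, 1, 1, 2, 2, 3, 4, 5, 7
};

/* Timer ticks between DAC outputs so that one pass through the table
   takes one period of the note (rounded to the nearest tick) */
uint32_t OutputPeriodTicks(uint32_t timerHz, uint32_t noteHz){
  uint32_t outputHz = noteHz * TABLE_SIZE;   /* DAC output rate */
  return (timerHz + outputHz/2) / outputHz;
}

/* Value to send to the DAC at each periodic interrupt */
uint8_t Sine_Next(void){
  static uint8_t index = 0;
  uint8_t sample = SineWave[index];
  index = (uint8_t)((index + 1) % TABLE_SIZE);
  return sample;
}
```

For a 440 Hz tone and an assumed 1.5 MHz timer, the DAC would be updated every 107 ticks. If DAC code 15 corresponds to 5 V, values of 8 ± 7 are roughly 2.7 ± 2.3 V, which is in line with the measured waveform.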
The frequency of each note can be calculated by multiplying the previous frequency by $\sqrt[12]{2}$ (about 1.0595). You can use this method to determine the frequencies of additional notes above and below the ones in Table 11.2. There are twelve notes in an octave, therefore moving up one octave doubles the frequency. Figure 11.7 illustrates the concept of an instrument. You can define the type of sound by the shape of the voltage versus time waveform. Brass instruments have a very large first harmonic frequency.
Figure 11.7 A waveform shape that generates a trumpet sound. (One period of the waveform is marked.)
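The twelfth-root-of-two rule for note frequencies can be sketched as follows. The function name NoteFrequency and the convention of counting semitones from the 440 Hz A are assumptions for this example. Repeated multiplication is used so the code needs no math library.

```c
/* Frequency in Hz of the note n semitones away from A (440 Hz).
   n > 0 moves up in pitch, n < 0 moves down. Each semitone
   multiplies or divides the frequency by the 12th root of 2. */
double NoteFrequency(int n){
  const double semitone = 1.0594630943592953;  /* 2^(1/12) */
  double f = 440.0;
  while(n > 0){ f *= semitone; n--; }
  while(n < 0){ f /= semitone; n++; }
  return f;
}
```

Three semitones above A gives about 523 Hz (high C), nine semitones below gives about 262 Hz (low C), and twelve semitones above gives 880 Hz, one octave up.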
The tempo of the music defines the speed of the song. In 2/4, 3/4, or 4/4 music, a beat is defined as a quarter note. A moderate tempo is 120 beats/min, which means a quarter note has a duration of 1/2 second. A sequence of notes can be separated by pauses (silences) so that each note is heard separately. The envelope of the note defines the amplitude versus time. A very simple envelope is illustrated in Figure 11.8. The 9S12DP512 has plenty of processing power to create these types of waves.
Figure 11.8 You can control the amplitude, frequency and duration of each note (not drawn to scale). (Three notes are shown: 330 Hz for 0.5 s, 330 Hz for 0.5 s, and 523 Hz for 1.0 s.)
The smooth-shaped envelope, as illustrated in Figure 11.9, causes a less staccato and more melodic sound. This type of sound generation is possible to produce in real-time on the 9S12, but it uses most of the available processing capabilities.
Figure 11.9 The amplitude of a plucked string drops exponentially in time. (Three notes are shown: 330 Hz for 0.5 s, 330 Hz for 0.5 s, and 523 Hz for 1.0 s.)
A chord is created by playing multiple notes simultaneously. When two piano keys are struck simultaneously both notes are created, and the sounds are mixed arithmetically. You can create the same effect by adding two waves together in software, before sending the wave to the DAC. Figure 11.10 plots the mathematical addition of a 262 Hz (low C) and a 392 Hz sine wave (G), creating a simple chord.
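Adding two waves in software before the DAC output might look like the sketch below. It assumes a 32-entry 4-bit sine table and uses a 16-bit phase accumulator for each note so that both notes can share one output rate. The names Voice, Voice_Step, and Chord_Next and the 10 kHz output rate in the usage note are hypothetical. The sum is halved so the result still fits in 4 bits.

```c
#include <stdint.h>

#define TABLE_SIZE 32
/* One cycle of a 4-bit sine wave: round(8 + 7*sin(2*pi*i/32)) */
static const uint8_t SineWave[TABLE_SIZE] = {
   8, 9,11,12,13,14,14,15,15,15,14,14,13,12,11, 9,
   8, 7, 5, 4, 3, 2, 2, 1, 1, 1, 2, 2, 3, 4, 5, 7
};

typedef struct {
  uint16_t phase;  /* top 5 bits select the table entry */
  uint16_t step;   /* added to phase at every DAC output */
} Voice;

/* Phase step so the 16-bit phase wraps noteHz times per second
   when the DAC is updated outputHz times per second */
uint16_t Voice_Step(uint32_t noteHz, uint32_t outputHz){
  return (uint16_t)((noteHz*65536UL + outputHz/2) / outputHz);
}

/* Current sample of one note, then advance its phase */
static uint8_t Voice_Next(Voice *v){
  uint8_t sample = SineWave[v->phase >> 11];
  v->phase = (uint16_t)(v->phase + v->step);
  return sample;
}

/* Mix two notes arithmetically; dividing by 2 keeps the sum in 0 to 15 */
uint8_t Chord_Next(Voice *a, Voice *b){
  return (uint8_t)((Voice_Next(a) + Voice_Next(b)) / 2);
}
```

To mix low C and G at an assumed 10 kHz output rate, set one step to Voice_Step(262, 10000) and the other to Voice_Step(392, 10000), then send Chord_Next to the DAC at each periodic interrupt.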
Figure 11.10 A simple chord mixing the notes C and G. (Plot of sound amplitude, −2 to 2, versus time, 0 to 0.02 sec.)
11.4 Analog to Digital Conversion
An analog to digital convertor (ADC) converts an analog signal into digital form, also shown in Figure 11.2. An embedded system uses the ADC to collect information about the external world (data acquisition system). The input signal is usually an analog voltage, and the output is a binary number. The ADC precision is the number of distinguishable ADC inputs (e.g., 1024 alternatives, 10 bits). The ADC range is the maximum and minimum ADC input (e.g., 0 to 5 V). The ADC resolution is the smallest distinguishable change in input (e.g., 5 mV). The resolution is the change in input that causes the digital output to change by 1.
$$\text{Range (volts)} = \text{Precision (alternatives)} \times \text{Resolution (volts)}$$
Normally we don’t specify accuracy for just the ADC, but rather we give the accuracy of the entire system (including transducer, analog circuit, ADC and software). An ADC is monotonic if it has no missing codes. This means if the analog signal is a slow rising voltage, then the digital output will hit all values one at a time. The merit of an ADC involves three factors: precision (number of bits), speed (how fast we can sample), and power (how much energy it takes to operate). How fast we can sample involves both the ADC conversion time (how long it takes to convert), and the bandwidth (what frequency components can be recognized by the ADC). The ADC cost is a function of the number and quality of internal components.
11.4.1 9S12 ADC Details
Most of the 9S12 microcontrollers have a built-in ADC, see Table 11.3. The 9S12C32 has one 8-channel 10-bit ADC using Port AD bits 7 to 0 (the pins are named PAD7 to PAD0). The 9S12DP512 has two 8-channel 10-bit ADC modules: ATD1 uses Port AD1 bits 7 to 0 (the pins are named PAD15 to PAD08) and ATD0 uses Port AD0 bits 7 to 0 (the pins are named PAD07 to PAD00). On the 9S12DP512, the ATD port names include a 0 or 1 to specify which ADC module; otherwise the ADC modules on the various 9S12 microcontrollers operate similarly. For example, the single control register 2 on the 9S12C32 is called ATDCTL2, but on the 9S12DP512 there are two ADC modules, so there are two such registers, called ATD0CTL2 and ATD1CTL2. The ADC on the 9S12 can be operated in 8-bit mode or 10-bit mode.
The 8 pins of 9S12C32 Port AD can be individually defined as analog input, digital output, or digital input. The 16 pins of 9S12DP512 Ports AD1 and AD0 can be individually defined as analog input or digital input (but not digital output). We set the corresponding bit in the ATD0DIEN register to 1 for digital input or 0 for analog input. On the 9S12C32, if we set the corresponding bit in the DDRAD register to 1, then that bit can be used as a digital output. There are no DDRAD registers on the 9S12DP512, because these pins cannot be digital outputs.
The ADC digital output can be right- or left-justified within the 16-bit result register, and it can be in a signed or unsigned format. When the ADC is triggered, it performs a sequence of conversions, with the sequence length being any number from 1 to 8 conversions. When performing a sequence, it can convert the same channel multiple times or it can convert different channels during the sequence.
Address  Bit 7  6      5      4        3       2       1      Bit 0  Name
$0082    ADPU   AFFC   AWAI   ETRIGLE  ETRIGP  ETRIGE  ASCIE  ASCIF  ATD0CTL2
$0083    0      S8C    S4C    S2C      S1C     FIFO    FRZ1   FRZ0   ATD0CTL3
$0084    SRES8  SMP1   SMP0   PRS4     PRS3    PRS2    PRS1   PRS0   ATD0CTL4
$0085    DJM    DSGN   SCAN   MULT     0       CC      CB     CA     ATD0CTL5
$0086    SCF    0      ETORF  FIFOR    0       CC2     CC1    CC0    ATD0STAT0
$008B    CCF7   CCF6   CCF5   CCF4     CCF3    CCF2    CCF1   CCF0   ATD0STAT1
$008D    Bit 7  6      5      4        3       2       1      Bit 0  ATD0DIEN
$008F    PAD07  PAD06  PAD05  PAD04    PAD03   PAD02   PAD01  PAD00  PORTAD0
Address  Contents (msb to lsb)  Name
$0090    bits 15 to 0           ATD0DR0
$0092    bits 15 to 0           ATD0DR1
$0094    bits 15 to 0           ATD0DR2
$0096    bits 15 to 0           ATD0DR3
$0098    bits 15 to 0           ATD0DR4
$009A    bits 15 to 0           ATD0DR5
$009C    bits 15 to 0           ATD0DR6
$009E    bits 15 to 0           ATD0DR7
Table 11.3 9S12 registers used for analog to digital conversion.
We can trigger ADC conversions in three ways. The first way is to use an explicit software trigger (write to ATD0CTL5), and when the conversions are complete, the SCF flag is set. The examples in Programs 11.1 and 11.2 employ the explicit software trigger to start an ADC conversion. The second way to trigger the ADC is continuous mode. In this mode, the software starts it, but the ADC sample sequence is repeated over and over continuously. The third way is to connect an external trigger to the digital input on PAD07. With an external trigger we can use busy-wait synchronization (gadfly) on the SCF flag, or arm interrupts (ASCIE = 1) on the ASCIF flag.
The results of the ADC conversions can be found in the ATD0DR0 to ATD0DR7 result registers, where the register number refers to the sample sequence number. In other words, ATD0DR0 contains the result of the first conversion in the sequence, ATD0DR1 contains the result of the second conversion, . . . and ATD0DR7 contains the result of the eighth conversion.
The ATD0CTL2 register contains bits that activate the ADC module. The 9S12 ADC system is enabled by setting ADPU equal to 1. The ADC will request an interrupt on the completion of a conversion sequence if the arm bit ASCIE is set. ASCIF is the ATD Sequence Complete Interrupt Flag. If ASCIE = 1, the ASCIF flag equals the SCF flag; otherwise ASCIF reads zero. Write operations to ATD0CTL2 have no effect on ASCIF. ETRIGE is the External Trigger Mode Enable bit. This bit enables an external trigger using the digital input from Port AD bit 7. The external trigger allows us to synchronize the ATD conversion with external events. If external triggering is enabled, then the type of trigger is defined in the ETRIGLE and ETRIGP bits as specified in Table 11.4.
Table 11.4 External trigger modes for the 9S12 ADC.
ETRIGLE  ETRIGP  External Trigger mode
0        0       Falling edge of PAD07 starts a conversion sequence
0        1       Rising edge of PAD07 starts a conversion sequence
1        0       Perform ADC conversions when PAD07 is low
1        1       Perform ADC conversions when PAD07 is high
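The trigger options can be collected into a small helper that builds the value to write to ATD0CTL2. This is a sketch under the bit positions listed in Table 11.3. The function name, the TrigMode enumeration, and the mask names are assumptions for this example and are not taken from a Freescale header file.

```c
#include <stdint.h>

/* Masks for ATD0CTL2, from the bit positions in Table 11.3 */
#define CTL2_ADPU    0x80  /* bit 7: power up the ADC */
#define CTL2_ETRIGLE 0x10  /* bit 4: 0 = edge, 1 = level */
#define CTL2_ETRIGP  0x08  /* bit 3: 0 = falling or low, 1 = rising or high */
#define CTL2_ETRIGE  0x04  /* bit 2: enable the external trigger on PAD07 */
#define CTL2_ASCIE   0x02  /* bit 1: arm the sequence complete interrupt */

/* Rows of Table 11.4; bit 1 of the value is ETRIGLE, bit 0 is ETRIGP */
typedef enum {
  TRIG_FALLING = 0, TRIG_RISING = 1, TRIG_LOW = 2, TRIG_HIGH = 3
} TrigMode;

/* Value for ATD0CTL2 with the ADC powered up */
uint8_t ATD_Ctl2Value(int externalTrigger, TrigMode mode, int armInterrupt){
  uint8_t value = CTL2_ADPU;
  if(externalTrigger){
    value |= CTL2_ETRIGE;
    if(mode & 2) value |= CTL2_ETRIGLE;
    if(mode & 1) value |= CTL2_ETRIGP;
  }
  if(armInterrupt) value |= CTL2_ASCIE;
  return value;
}
```

With no external trigger and no interrupt, the result is 0x80, the same value that Program 11.1 writes to ATD0CTL2.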
The ATD0CTL3 and ATD0CTL4 registers contain bits that configure the ADC mode. The bits S8C, S4C, S2C, S1C control the number of conversions per sequence. Let n be the four-bit number specified by these bits. For values of n from 1 to 7, n specifies the sequence length. For values of n equal to 0 or 8 to 15, the sequence length is 8. At reset, the default sequence length is 4 (0100), maintaining software continuity with the HC12 family. This book will not discuss FIFO mode or freeze mode.
SRES8 is the ADC Resolution Select bit. This bit selects the resolution of ADC conversion as either 8 bits (SRES8 = 1) or 10 bits (SRES8 = 0). The ADC has an accuracy of 10 bits; however, if low resolution is acceptable, selecting 8-bit resolution will reduce the conversion time, and may simplify software design. It takes about 10 μs to convert an analog signal into a digital number. The exact time to perform an ADC conversion is determined by the E clock and the ATD0CTL4 register.
The ATD0CTL4 register selects the sample period and PRS-Clock prescaler. SMP1, SMP0 are the Sample Time Select bits. These two bits select the length of the second phase of the sample time in units of ATD conversion clock periods, as listed in Table 11.5. The sample time consists of two phases. The first phase is two ATD conversion clock periods long and transfers the sample quickly (via the buffer amplifier) onto the ADC machine’s storage node. The second phase attaches the external analog signal directly to the storage node for final charging and high accuracy.
Let m be the 5-bit number formed by bits PRS4-0. Let fE be the frequency of the E clock. The ATD conversion clock frequency is calculated as follows:
$$\text{ATD clock frequency} = \tfrac{1}{2} f_E / (m + 1)$$
The default (after reset) prescaler value is 5, which results in a default ATD conversion clock frequency that is the E clock divided by 12.
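The sequence-length rule for S8C, S4C, S2C, S1C described above can be sketched as a decode function. The name ATD_SequenceLength is hypothetical, and the field position (bits 6 to 3 of ATD0CTL3) is taken from Table 11.3.

```c
#include <stdint.h>

/* Number of conversions per sequence encoded in an ATD0CTL3 value.
   S8C, S4C, S2C, S1C occupy bits 6 to 3. */
unsigned int ATD_SequenceLength(uint8_t ctl3){
  unsigned int n = (ctl3 >> 3) & 0x0F;  /* four-bit number n */
  if(n >= 1 && n <= 7){
    return n;                           /* 1 to 7 conversions */
  }
  return 8;                             /* n of 0 or 8 to 15 means 8 */
}
```

For example, the value 0x08 used in Program 11.1 decodes to a sequence length of 1, and the reset pattern 0100 (a register value of 0x20) decodes to 4.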
The choice of these parameters involves a tradeoff between accuracy and speed. Freescale recommends the ADC clock frequency be restricted to the 500 kHz to 2 MHz range. For analog signals with white noise, we can essentially add an analog low pass filter by increasing the ADC sample time, s. To increase conversion speed, we wish to select a faster clock and shorter sample period. The last factor to consider is the slewing rate of the input signal. For signals with a high slope, dV/dt, we need to select a faster conversion time (i.e., shorter sample time). For a 4 MHz E clock, the possible m prescales range from 0 (ADCclock = 2 MHz) to 3 (ADCclock = 500 kHz). For an 8 MHz E clock, the possible m prescales range from 1 (ADCclock = 2 MHz) to 7 (ADCclock = 500 kHz). For a 24 MHz E clock, the possible m prescales range from 5 (ADCclock = 2 MHz) to 23 (ADCclock = 500 kHz). Other choices are not recommended. The time for one ADC conversion is equal to 2(m + 1)(s + n)/fE, where s is the total sample time in ADC clock periods (Table 11.5) and n is the number of ADC bits (e.g., 10).
Table 11.5 Sampling time for the 9S12 ADC.
SMP1  SMP0  First Sample Phase    Second Sample Phase    Total Sample Time, s
0     0     2 ADC clock periods   2 ADC clock periods    4 ADC clock periods
0     1     2 ADC clock periods   4 ADC clock periods    6 ADC clock periods
1     0     2 ADC clock periods   8 ADC clock periods    10 ADC clock periods
1     1     2 ADC clock periods   16 ADC clock periods   18 ADC clock periods
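The clock and conversion-time formulas can be checked with a few lines of C. This is a sketch; the function names are hypothetical. It evaluates the ATD clock as fE/(2(m + 1)), the conversion time as 2(m + 1)(s + n)/fE, and searches for the prescale values that keep the ATD clock within the recommended 500 kHz to 2 MHz range.

```c
#include <stdint.h>

/* ATD conversion clock in Hz, where m is the value of PRS4-0 */
uint32_t ATD_ClockHz(uint32_t fE, uint32_t m){
  return fE / (2*(m + 1));
}

/* Time for one conversion in ns: 2(m+1)(s+n)/fE, where s is the total
   sample time in ATD clocks and n is the number of ADC bits */
uint32_t ATD_ConversionTime_ns(uint32_t fE, uint32_t m, uint32_t s, uint32_t n){
  uint64_t eCycles = 2ULL*(m + 1)*(s + n);       /* E clock cycles */
  return (uint32_t)((eCycles*1000000000ULL + fE/2) / fE);
}

/* Smallest and largest m that keep the ATD clock between 500 kHz and
   2 MHz. Returns 1 if such a prescale exists, 0 otherwise. */
int ATD_PrescaleRange(uint32_t fE, uint32_t *mMin, uint32_t *mMax){
  uint32_t m;
  int found = 0;
  for(m = 0; m < 32; m++){                       /* PRS4-0 is 5 bits */
    uint32_t divider = 2*(m + 1);
    if(fE >= 500000UL*divider && fE <= 2000000UL*divider){
      if(!found){ *mMin = m; found = 1; }
      *mMax = m;
    }
  }
  return found;
}
```

For a 24 MHz E clock with m = 5, s = 4, and n = 10, the ATD clock is 2 MHz and one conversion takes 7000 ns, and the prescale search returns 5 to 23, in agreement with the ranges listed above.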
Observation: The ADC frequency does not determine the data acquisition sampling rate, rather it determines how fast one ADC conversion occurs. The sampling rate is determined by how often the software starts the ADC conversion.
Writing to the ATD0CTL5 register will start an ADC conversion. To begin continuous conversions, we write to ATD0CTL5 with SCAN = 1. On the other hand, if we write to ATD0CTL5 with SCAN = 0, only one sequence occurs. CC, CB, CA select the analog input channel(s) whose signals are sampled and converted to digital numbers. Because the result registers (16 bits) are wider than the ADC digital code (8 or 10 bits), we must choose where in the result register to put the digital code. DJM is the Result Register Data Justification bit, where 1 means right-justified and 0 means left-justified data in the result registers. DSGN selects between signed and unsigned format. We set DSGN to 1 for signed data representation and we set it to 0 for unsigned data representation. Table 11.6 describes the four possible 10-bit data formats for the 9S12.
When MULT is 0, the ATD sequence controller samples only from the specified analog input channel for an entire conversion sequence. When MULT is 1, the ATD sequence controller samples a sequence of different analog input channels. The number of channels sampled is determined by the sequence length value (S8C, S4C, S2C, and S1C). The first analog channel examined is determined by the channel selection code (CC, CB, and CA control bits); subsequent channels sampled in the sequence are determined by incrementing the channel selection code.
The status register contains a status bit, SCF, which we use to poll for the ADC conversion completion. The SCF flag is cleared by writing data into ATD0CTL5 (i.e., starting a new conversion). The SCF flag can also be cleared by writing a 1 to it. The CC2, CC1, CC0 bits are the sequence counter as the ADC steps through a conversion sequence. The CCFn bits are individual flags for each of the conversions.
Performance Tip: If we are interested in a single conversion, we should initialize the ATD0CTL3 register to perform just one conversion.
Common Error: The ADC data register ATD0DR0 does not necessarily contain the result from ADC channel 0. The ADC data register ATD0DR0 contains the result of the first conversion, the ATD0DR1 contains the result of the second conversion, etc.
Observation: Voltages above 5 V or below 0 V will damage the ADC pin on a 9S12.
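The value written to ATD0CTL5 packs the format, mode, and channel choices into one byte. The helper below is a sketch with hypothetical names, using the bit positions from Table 11.3 (DJM bit 7, DSGN bit 6, SCAN bit 5, MULT bit 4, channel in bits 2 to 0).

```c
#include <stdint.h>

#define CTL5_DJM  0x80  /* bit 7: 1 = right-justified, 0 = left-justified */
#define CTL5_DSGN 0x40  /* bit 6: 1 = signed, 0 = unsigned */
#define CTL5_SCAN 0x20  /* bit 5: 1 = continuous, 0 = single sequence */
#define CTL5_MULT 0x10  /* bit 4: 1 = multiple channels, 0 = one channel */

/* Value to write to ATD0CTL5, which also starts the conversion */
uint8_t ATD_Ctl5Value(int rightJustified, int isSigned, int continuous,
                      int multiChannel, uint8_t channel){
  uint8_t value = (uint8_t)(channel & 0x07);  /* CC, CB, CA */
  if(rightJustified) value |= CTL5_DJM;
  if(isSigned)       value |= CTL5_DSGN;
  if(continuous)     value |= CTL5_SCAN;
  if(multiChannel)   value |= CTL5_MULT;
  return value;
}
```

A right-justified, unsigned, single-sequence conversion of channel 2 gives 0x82, which is the command used with ADC_In in Programs 11.1 and 11.2.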
11.4.2 ADC Data Formats
The 9S12DP512 subroutine, ADC_In, will return a 10-bit value representing the analog input. Table 11.6 shows the 8-bit format and the four 10-bit formats available on 9S12.
Analog      8-bit Unsigned   10-bit Unsigned   10-bit Unsigned   10-bit Signed     10-bit Signed
Input (V)   Digital Output   Right-justified   Left-justified    Right-justified   Left-justified
0.000       $00  0           $0000  0          $0000  0          $FE00  −512       $8000  −32768
0.005       $00  0           $0001  1          $0040  64         $FE01  −511       $8040  −32704
0.020       $01  1           $0004  4          $0100  256        $FE04  −508       $8100  −32512
2.500       $80  128         $0200  512        $8000  32768      $0000  0          $0000  0
3.750       $C0  192         $0300  768        $C000  49152      $0100  256        $4000  16384
5.000       $FF  255         $03FF  1023       $FFC0  65472      $01FF  511        $7FC0  32704
Table 11.6 Binary formats used by the Freescale internal ADCs.
For the 9S12 running with 8-bit precision, the analog input range is 0 to 5 V and the analog input resolution is 5 V/256, which is about 20 mV. For the 9S12 running with 10-bit precision, the analog input range is 0 to 5 V and the analog input resolution is 5 V/1024, which is about 5 mV. When the 9S12 inserts 8- or 10-bit data into the 16-bit result register it will pad the extra bits with zeros, but the signed right-justified column in Table 11.6 is shown with bits 15 to 10 having been sign-extended from bit 9. Sign extension must therefore set all six upper bits ($FC in the high byte, 0xFC00 in the 16-bit value). The code to perform this sign extension is
      ldd  ATD0DR0   ; 10-bit
      bita #$02      ; test bit 9
      beq  ok        ; skip if positive
      oraa #$FC      ; sign extend bits 15-10
ok    std  Result
Result = ATD0DR0;        // 10-bit
if(Result&0x200){        // test sign bit
  Result |= 0xFC00;      // sign extend bits 15-10
}
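The relationships among the four 10-bit columns of Table 11.6 can be written as conversions from the unsigned right-justified code. The following is a host-side sketch for checking the table, not code for the 9S12 itself, and the function names are hypothetical. SignExtend10 uses an exclusive-or and subtract, which gives the same result as setting bits 15 to 10 when bit 9 is set.

```c
#include <stdint.h>

/* code is the unsigned right-justified 10-bit sample, 0 to 1023 */

/* Unsigned left-justified: the 10 bits move to bits 15 to 6 */
uint16_t ADC_UnsignedLeft(uint16_t code){
  return (uint16_t)((uint32_t)code << 6);
}

/* Signed right-justified: mid-scale (512) maps to zero */
int16_t ADC_SignedRight(uint16_t code){
  return (int16_t)((int)code - 512);
}

/* Signed left-justified bit pattern: left-justify, then flip the msb */
uint16_t ADC_SignedLeftBits(uint16_t code){
  return (uint16_t)(((uint32_t)code << 6) ^ 0x8000);
}

/* Sign-extend a 10-bit two's complement value held in bits 9 to 0 */
int16_t SignExtend10(uint16_t raw){
  raw &= 0x03FF;                              /* keep the 10 data bits */
  return (int16_t)((int)(raw ^ 0x0200) - 0x0200);
}
```

For example, a code of 1 (0.005 V) becomes $0040 left-justified, −511 signed right-justified, and $8040 signed left-justified, matching the second row of the table.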
Example 11.2 Design a device driver for the internal ADC.
Solution In this simple solution, we will design two functions: one to initialize the ADC and one to sample a channel. The ADC_In function will perform one conversion, and then return the 10-bit result, see Program 11.1. In this initialization routine, bit 7 of ATD0CTL2 is set to enable the ADC. In order to sample the ADC once we set S8C, S4C, S2C, S1C to 0001. In this solution, we choose the fastest possible rate for the ADC clock (a period of 0.5 μs). For an 8 MHz E clock, we would set m equal to 1 to get a 2 MHz ADCclock. However, the solution assumes a 24 MHz E clock, and thus, we set m equal to 5 to get an ADCclock of 2 MHz. The solution chooses the shortest possible sampling time (s = 4), which is appropriate for situations of low noise. For noisier situations we can slow down the ADCclock and increase the sampling time. The time to make one ADC conversion is 2(m + 1)(s + n)/fE = 2(5 + 1)(4 + 10)/24 MHz = 7 μs.
; 9S12DP512, assuming a 24MHz E clock
; bit 7 DJM  1=right , 0=left justified
; bit 6 DSGN 1=signed, 0=unsigned
; bit 5 SCAN 0=single sequence
; bit 4 MULT 0=single channel
; bits 2-0 channel number 0 to 7
; Analog signal connected to PAD7-0
ADC_Init movb #$80,ATD0CTL2 ;power up
         movb #$08,ATD0CTL3 ;1 sample
         movb #$05,ATD0CTL4 ;10-bit
         rts
;In: RegA has channel Number
;RegA=$82 means right-justified channel 2
;Out: RegD has 10-bit ADC result
ADC_In   staa ATD0CTL5      ;Start ADC
Loop     brclr ATD0STAT1,$01,Loop
         ldd  ATD0DR0       ;10-bit result
         rts
// 9S12DP512, assuming a 24MHz E clock
// bit 7 DJM  1=right , 0=left justified
// bit 6 DSGN 1=signed, 0=unsigned
// bit 5 SCAN 0=single sequence
// bit 4 MULT 0=single channel
// bits 2-0 channel number 0 to 7
// Analog signal connected to PAD7-0
void ADC_Init(void){
  ATD0CTL2 = 0x80;   // enable ADC
  ATD0CTL3 = 0x08;   // 1 sample
  ATD0CTL4 = 0x05;   // 10-bit, 2 MHz ADC clock
}
// chan=0x82 means right-justified channel 2
unsigned short ADC_In(unsigned char chan){
  ATD0CTL5 = chan;                // start
  while((ATD0STAT1&0x01)==0){};   // wait for CCF0
  return ATD0DR0;                 // 10-bit result
}
Program 11.1 Assembly and C software to sample data using the ADC.
Observation: To make an ADC driver for the 9S12C32, simply change all the ATD0 to ATD in Program 11.1.
Checkpoint 11.4: Derive an equation for the digital value returned by the function ADC_In versus the analog input voltage.
11.4.3 ADC Resolution
The ADC resolution is the smallest change in input that can be reliably detected by the system. Figure 11.11 illustrates how ADC resolution should be measured. Because of the noise processes, if we set the ADC input to Vin and sample it 1000 times, we will get a distribution of digital outputs. We plot the number of times we got an output as a function of the output sample. The shape of this response is called a probability density function (pdf) characterizing the noise processes. For example, white noise has a Gaussian pdf. The standard deviation of repeated measurements (with units of volts) is a simple measure of ADC resolution (in volts). A better measure of resolution would be to repeat the 1000 measurements with an input slightly larger, Vin + ΔV. If we can demonstrate that the second data set is statistically different from the first (regardless of Vin), we claim the resolution is less than or equal to ΔV. For the 10-bit ADC on the 9S12, we have to increase the input by 5 mV to always be able to statistically recognize the change. Therefore we claim the ADC has a resolution of 5 mV.
Figure 11.11 Experimental determination of ADC resolution. (Plot for the 10-bit ADC of a 9S12: pdf count on a log scale, 1 to 1000, versus ADC output, 504 to 512, with one curve for each input voltage from 2.500 V to 2.505 V in 1 mV steps.)
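The standard-deviation measure of resolution can be sketched as follows. The function names are hypothetical, the sample (n − 1) form of the variance is an assumption, and the conversion to volts assumes the 0 to 5 V, 10-bit format. A small Newton's-method square root is included so the example does not depend on the math library.

```c
/* Square root by Newton's method, adequate for small positive inputs */
static double SquareRoot(double x){
  double r = x;
  int i;
  if(x <= 0.0) return 0.0;
  for(i = 0; i < 30; i++){
    r = 0.5*(r + x/r);
  }
  return r;
}

/* Mean (in ADC counts) and standard deviation (in volts) of repeated
   10-bit ADC readings taken with a constant input voltage */
void ADC_Statistics(const unsigned short data[], int count,
                    double *meanCounts, double *sigmaVolts){
  double sum = 0.0, sumSq = 0.0;
  int i;
  for(i = 0; i < count; i++){
    sum += data[i];
  }
  *meanCounts = sum/count;
  for(i = 0; i < count; i++){
    double d = data[i] - *meanCounts;
    sumSq += d*d;
  }
  /* sample variance in counts squared, then scale by 5 V/1024 per count */
  *sigmaVolts = (count > 1) ? SquareRoot(sumSq/(count - 1))*5.0/1024.0 : 0.0;
}
```

In practice the array would hold the 1000 readings described above. The same calculation repeated at Vin + ΔV gives the second data set for the statistical comparison.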
11.5 *Multiple Access Circular Queues
A multiple access circular queue (MACQ) is used for performing digital signal processing on a data acquisition system. A MACQ is a fixed-length, order-preserving data structure, as shown in Figure 11.12. The source process (producer) places information into the MACQ. Once initialized, the MACQ is always full. The oldest data is discarded when the newest data is entered into a MACQ. The sink process (consumer) can read any of the data from the MACQ. Reading the data in the MACQ is non-destructive. This means that the MACQ is not changed by the read operation.
Figure 11.12 A multiple access circular queue stores the most recent set of measurements. (Diagram: the MACQ before and after a new sample arrives. The new value enters at v[0], each existing value shifts one position toward v[3], and the old v[3] is lost.)
The MACQ is useful for implementing digital filters and digital controllers. The following equation illustrates a robust way to calculate the first derivative of a measured signal. Let v[0], v[1], v[2], and v[3] be the most recent data sampled at a fixed time period Δt. If each v has the units of mV, and Δt has the units of msec, then the derivative, d, will be in mV/msec.
$$d = \frac{v[0] + 3v[1] - 3v[2] - v[3]}{6\,\Delta t}$$
To measure the derivative, the following sequence of operations is executed every 1 ms:
v[3] = v[2]
v[2] = v[1]
v[1] = v[0]
v[0] = new voltage measurement (in mV)
d = (v[0] + 3*v[1] − 3*v[2] − v[3])/6
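The shift-and-calculate sequence can be packaged as a small module. This is a minimal sketch with hypothetical function names. It keeps four samples in mV with v[0] as the newest, and it returns the derivative in mV per sample period (so mV/ms when called every 1 ms).

```c
#define MACQ_SIZE 4

static short v[MACQ_SIZE];  /* v[0] is the newest sample, v[3] the oldest */

/* Enter a new sample; the oldest value is discarded */
void MACQ_Put(short newSample){
  int i;
  for(i = MACQ_SIZE - 1; i > 0; i--){
    v[i] = v[i-1];          /* shift every value one position older */
  }
  v[0] = newSample;
}

/* Non-destructive read of the sample that is 'age' periods old (0 to 3) */
short MACQ_Get(int age){
  return v[age];
}

/* First derivative in mV per sample period */
short MACQ_Derivative(void){
  return (short)((v[0] + 3*v[1] - 3*v[2] - v[3])/6);
}
```

After entering the samples 50, 60, 70, and 80 mV, the derivative is (80 + 210 − 180 − 50)/6 = 10 mV per sample period.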
Checkpoint 11.5: Write an equation to calculate acceleration using inputs v[0], v[1], and v[2].
Checkpoint 11.6: Write a more robust equation to calculate acceleration using inputs d[0], d[1], d[2], and d[3], where d is calculated using the above equation.
11.6 Real-Time Data Acquisition
Whenever we wish to convert a continuous analog signal into discrete time digital samples, the rate at which the sampling process occurs is extremely important.
Nyquist Theorem: If fmax is the largest frequency component of the analog signal, then you must sample at a rate greater than twice fmax in order to faithfully represent the signal in the digital samples. For example, if the analog signal is A + B sin(2πft + φ) and the sampling rate is greater than 2f, you will be able to determine A, B, f, and φ from the digital samples.
The goal of a data acquisition system is to sample the ADC at a regular rate. Let fs be the desired sampling rate, and let ti be the actual time the ADC creates sample number i. In a perfect world, we would like to have $t_i - t_{i-1} = 1/f_s$ for all i. When using periodic interrupts to establish the sampling rate (Programs 9.2, 9.3, and 9.4), there are two factors that lead to fluctuations in the sample period. We define time jitter, δt, as the maximum variation in the sample-to-sample time.
$$1/f_s - \delta t \le t_i - t_{i-1} \le 1/f_s + \delta t$$
We learned in Section 9.2 that it takes 9 cycles to process an interrupt (vector fetch, push registers). These 9 cycles, plus the execution of the ISR itself, are equal for every sample. Thus, the time between samples is not affected by this fixed delay. The first factor that does cause jitter is the instruction currently being executed at the time of the interrupt request. The time to execute an instruction on the 9S12 varies from 1 to 13 cycles. We can neglect the rev, revw, and wav instructions, because these three instructions can be suspended by an interrupt. We do not know which instruction will be executing or when during that instruction the interrupt will be requested. This uncertainty causes a time jitter of at most 13 cycles, or 1.625 μs on an 8 MHz 9S12. This jitter is usually acceptable. The second source of jitter can be much larger. If there are any portions of the main program that disable interrupts (e.g., because of a critical section), then the time running with interrupts disabled will cause time jitter in the sampling. In a similar fashion, if there are other interrupts, then the time to execute the other ISR may cause a time jitter.
Observation: Good software places as little processing as possible in the ISR itself. Perform whatever functions must be done in the ISR, and shift the rest of the processing to the foreground.
Observation: Real-time systems must put an upper bound on the time the software is allowed to run with interrupts disabled.
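Time jitter, as defined above, can be estimated from a log of the times at which the samples were taken. The sketch below is only an illustration. The function name is hypothetical, and the times are assumed to be timer counts that have not wrapped around.

```c
/* Largest deviation of any sample-to-sample interval from the ideal
   period. t[] holds the time of each sample and period is 1/fs,
   both in the same units (for example, timer counts). */
long TimeJitter(const long t[], int count, long period){
  long worst = 0;
  int i;
  for(i = 1; i < count; i++){
    long error = (t[i] - t[i-1]) - period;  /* deviation of this interval */
    if(error < 0) error = -error;
    if(error > worst) worst = error;
  }
  return worst;
}
```

For example, samples logged at counts 0, 15000, 30005, and 44998 with an ideal period of 15000 counts have intervals of 15000, 15005, and 14993, so the jitter is 7 counts.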
Checkpoint 11.7: Assuming the 9S12 is running at 24 MHz, what is the time jitter caused by the uncertainty in which instruction is being executed at the time of the interrupt?
Example 11.3 Design a real-time data acquisition system that samples a voltage signal, performs a digital differentiation and displays the slope on an LCD. The range of inputs is 0 to 5 V with frequency components of 0 to 50 Hz.
Solution We will connect the input to PAD2 (but any ADC input could have been used). According to the Nyquist Theorem we need to sample at 100 Hz or faster. The time to output the result on the LCD is about 0.4 ms (using Program 8.7, 10 characters at 40 μs each), so the reasonable choices of sampling rates are 100 to 2000 Hz. This solution implements 100 Hz sampling using a periodic output compare interrupt, like Program 9.4. The main program will initialize the PLL, OC6 and ADC. The main loop of the foreground thread will wait for a sample, calculate the derivative, then output the slope, see Figure 11.13 and Program 11.2. The multiple access circular queue will contain the most recent four voltage samples. A flag will be used to synchronize data transfer from background to foreground. This simple mechanism can be used whenever the main program is guaranteed to finish every time within the 10 ms interval, before the next sample arrives. The first in first out queue, which will be presented in the next chapter, can be used in situations where the main program is only guaranteed on average to be finished. In this case, the periodic interrupt will trigger the ADC, calculate Voltage as a decimal fixed-point number (units 0.001 V, or mV) and send the data to the foreground. The system implements a heartbeat, by toggling PT6 during each execution of the ISR. The PT6 output is not required for the correct operation of the system, but it is useful to see that the program is running and the interrupt rate is 100 Hz. The function LCD_OutDec will display a signed integer on the LCD (not explicitly given in this book, left as an exercise to the reader). Although the original sample is unsigned (0 to 1023), and the voltage will be unsigned (0 to 5000), the calculations are performed in signed math because the slope may be negative. Since the units of voltage are mV and the sampling occurs every 10 ms, the units of slope will be 0.1 mV/ms, which is 0.1 V/sec.
For example, let the voltage slope be 1 V/s. A typical set of four voltage measurements might be 50, 60, 70, and 80 mV. The calculated slope would be (80 + 3*(70 − 60) − 50)/6 = 10, which means 1.0 V/sec.
Figure 11.13 Flowchart for a real-time data acquisition system. (The main program calls OC6_Init and ADC_Init, then loops: wait until Flag is 1, calculate Slope = dVoltage/dt, set Flag = 0, and call LCD_Out(Slope). The OC6 interrupt, every 10 ms, executes PTT ^= 0x40, Sample = ADC_In, Voltage = 625*sample/128, Flag = 1, TC6 = TC6+15000, and then acknowledges the interrupt.)
; running at 24 MHz
         org  $0800       ;($3800 if C32)
V        rmb  8           ;last four samples
Voltage  rmb  2           ;current sample
Flag     rmb  1           ;true if new data
Slope    rmb  2           ;derivative
         org  $4000
main     lds  #$4000
         jsr  PLL_Init    ;Program 4.4
         jsr  ADC_Init    ;Program 11.1
         jsr  OC6_Init
loop     ldaa Flag
         beq  loop        ;wait for data
         movw V+4,V+6     ;shift MACQ
         movw V+2,V+4
         movw V,V+2
         movw Voltage,V
         clr  Flag        ;done with Voltage
         ldd  V+2         ;V[1]
         subd V+4         ;V[1]-V[2]
         pshd
         asld             ;2*(V[1]-V[2])
         addd 2,SP+       ;3*(V[1]-V[2])
         addd V           ;V[0]+3V[1]-3V[2]
         subd V+6         ;V[0]+3V[1]-3V[2]-V[3]
         ldx  #6
         idivs
         stx  Slope       ;0.1V/sec
         jsr  LCD_OutDec  ;to be written
         bra  loop
OC6_Init bset DDRT,#$40
         movb #$80,TSCR1  ;enable TCNT
         movb #$04,TSCR2  ;24MHz/16 = 1.5 MHz
         bset TIOS,#$40   ;activate OC6
         bset TIE,#$40    ;arm OC6
         clr  Flag
         ldd  TCNT        ;time now
         addd #50         ;first in 50 counts (33us)
         std  TC6
         movb #$40,TFLG1  ;clear C6F
         cli              ;enable IRQ
         rts
OC6Han   ldaa PTT
         eora #$40        ;heartbeat
         staa PTT
         ldaa #$82
         jsr  ADC_In      ;10 bit sample
         ldy  #625
         emuls
         ldx  #128
         edivs
         sty  Voltage     ;0 to 5000
         movb #1,Flag     ;signal
         ldd  TC6
         addd #15000
         std  TC6         ;next in 10 ms
         movb #$40,TFLG1  ;acknowledge
         rti
         org  $FFE2
         fdb  OC6Han      ;vector
// running at 24 MHz
short V[4];              // last four samples
volatile short Voltage;  // current sample, 0.001V, written by the ISR
volatile char Flag;      // true if new data, written by the ISR
short Slope;             // derivative, 0.1V/sec
void OC6_Init(void);     // defined below
void main(void){
  PLL_Init();            // Program 4.4
  ADC_Init();            // Program 11.1
  OC6_Init();
  while(1){
    while(Flag == 0){};  // wait for data
    V[3] = V[2];         // shift MACQ
    V[2] = V[1];
    V[1] = V[0];
    V[0] = Voltage;
    Flag = 0;            // done with Voltage
    Slope = (V[0]+3*V[1]-3*V[2]-V[3])/6;
    LCD_OutDec(Slope);   // 0.1V/sec
  }
}
void OC6_Init(void){
  asm sei                // Make atomic
  DDRT |= 0x40;
  TSCR1 = 0x80;          // 1.5 MHz TCNT
  TSCR2 = 0x04;          // divide by 16
  TIOS |= 0x40;          // activate OC6
  TIE |= 0x40;           // arm OC6
  Flag = 0;              // no data
  TC6 = TCNT+50;         // first in 50 counts (33us)
  TFLG1 = 0x40;          // clear C6F
  asm cli                // enable IRQ
}
interrupt 14 void OC6handler(void){
  short sample;
  PTT ^= 0x40;           // heartbeat
  sample = (short)ADC_In(0x82);
  Voltage = (short)((625L*sample)/128); // long math, 625*1023 exceeds 16 bits
  Flag = 1;              // new data ready
  TC6 = TC6+15000;       // next in 10 ms
  TFLG1 = 0x40;          // acknowledge C6F
}
Program 11.2 Implementation of a periodic interrupt using output compare.
411
412
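The arithmetic in Program 11.2 can be checked on a desktop compiler before it is run on the target. The following is a minimal sketch under stated assumptions, not part of the book's listing: the function names ADC_ToMillivolts and MACQ_Slope are hypothetical, and a 32-bit intermediate is used for the product on the assumption that int may be only 16 bits wide on the 9S12.

```c
#include <stdint.h>

// Convert a 10-bit ADC sample (0 to 1023) into millivolts.
// 5000/1024 reduces to 625/128. The product 625*1023 does not fit
// in 16 bits, so the multiplication is performed in 32 bits.
int16_t ADC_ToMillivolts(int16_t sample){
  return (int16_t)(((int32_t)625*sample)/128);
}

// Slope from the four most recent samples, with the newest in v[0].
// With samples in mV taken every 10 ms, the result is in 0.1 V/sec.
int16_t MACQ_Slope(const int16_t v[4]){
  int32_t sum = (int32_t)v[0] + 3*(int32_t)v[1]
              - 3*(int32_t)v[2] - (int32_t)v[3];
  return (int16_t)(sum/6);
}
```

With the samples from the text (newest first: 80, 70, 60, 50 mV), MACQ_Slope returns 10, that is, 1.0 V/sec. A full-scale sample of 1023 converts to 4995 mV rather than exactly 5000 mV, because the divisor is 1024.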
Checkpoint 11.8: Estimate the time jitter of the assembly version of the real-time system in Program 11.2.
Example 11.4 Design an analog interface between a 30 kΩ thermistor and the 9S12, so that temperatures from 20 to 45°C can be measured.
Solution The software components of this system are left as laboratory exercises. A thermistor is a transducer whose resistance is a function of temperature. A 30 kΩ thermistor has a resistance of 30 kΩ at room temperature (25°C). The basic approach is to use a bridge to convert the thermistor resistance into a voltage difference (V1 − V2); see Figure 11.14. A rail-to-rail instrumentation amplifier (AD623) converts the voltage difference into the 0 to 5 V range of the ADC. Rail-to-rail means the output can swing from one power supply rail (0 V) to the other (5 V). A rail-to-rail analog amplifier has three advantages over analog circuits that use the usual +12 V and −12 V supplies. First, with rail-to-rail analog chips, the entire embedded system can operate on a single supply. Second, these rail-to-rail devices are low power, and hence the system can be run on batteries. Third, the analog output is guaranteed to exist in the 0 to 5 V range; hence broken devices, unplugged transducers, or damaged cables will not destroy the ADC pin of the 9S12.
One way to design a data acquisition system is to create a design table, as shown in Table 11.7. The table is presented with the columns from left to right as the signal traverses the system. However, during the design phase, we begin with the left-most column (expected input) and the right-most column (desired outputs) and work our way into the center. We calibrate the thermistor using an ohmmeter, a reference thermometer, and a water bath. This calibration gives us a thermistor resistance (RT) for various temperatures throughout the measurement range. The nonlinear resistance versus temperature response can be modeled with
$$R_T = R_0 e^{\beta/T}$$
where $R_0$ and $\beta$ are calibration coefficients and T is temperature in Kelvin.
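One way to obtain the two calibration coefficients is sketched below. It assumes that just two calibration points, $(T_a, R_a)$ and $(T_b, R_b)$ with temperatures in Kelvin, are used; a least-squares fit over all of the water-bath measurements would normally be preferred. Here $\beta$ denotes the exponent coefficient of the exponential thermistor model. Taking the logarithm makes the model linear in $1/T$:

```latex
\ln R_T = \ln R_0 + \frac{\beta}{T}
\quad\Longrightarrow\quad
\beta = \frac{\ln(R_a/R_b)}{1/T_a - 1/T_b},
\qquad
R_0 = R_a\, e^{-\beta/T_a}
```

As an illustration, the two end points of Table 11.7 (44.87 kΩ at 293.15 K and 14.89 kΩ at 318.15 K) give a $\beta$ of roughly $4.1 \times 10^3$ K.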
The 10-bit ADC of the 9S12 will be used, so we can work backwards in the table (from the right-most to the second right-most column) using a linear mapping of the output T (in 0.01°C) into the digital sample expected from the ADC. Similarly, knowing how the ADC operates, we can calculate the expected V3 input (in V) that gives the corresponding ADC result. The bridge will be used to convert resistance to voltage. There are three design parameters for the bridge. The first is the bridge input voltage. One could use the 5 V supply as the input, but a less-noisy solution is to use an analog reference voltage of 2.50 V. This highly stable, low-noise voltage is generated by the REF03 chip
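Figure 11.14 Analog interface between a thermistor and the ADC input of a 9S12.
[Figure 11.14 schematic. A REF03 powered from +5 V (with 0.1 µF bypass capacitors) supplies +2.50 V to the bridge. The bridge has two 200 kΩ resistors (R1), the thermistor RT, and R2 = 10 kΩ, and produces V1 and V2. An AD623 with Rg = 7.3 kΩ, powered from +5 V, amplifies V1 − V2 into V3, which drives the 9S12 ADC. The measured temperature T is shown on the LCD, e.g., 25.12C.]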
Table 11.7 Thermistor design table.
T (°C)   RT (kΩ)   V1 (V)   V1 − V2 (V)   V3 (V)   ADC    T (0.01°C)
20.0     44.87     0.458    0.339         5.000    1023   2000
22.5     39.85     0.415    0.296         4.370    894    2250
25.0     35.46     0.376    0.257         3.796    776    2500
27.5     31.61     0.341    0.222         3.277    670    2750
30.0     28.24     0.309    0.190         2.806    574    3000
32.5     25.27     0.280    0.161         2.380    487    3250
35.0     22.66     0.254    0.135         1.996    408    3500
37.5     20.35     0.231    0.112         1.649    337    3750
40.0     18.31     0.210    0.091         1.336    273    4000
42.5     16.50     0.191    0.071         1.054    215    4250
45.0     14.89     0.173    0.054         0.799    163    4500
(this reference is not used to power devices, but is used as an analog constant). The second parameter is R1, which is chosen large enough to prevent the bridge from heating the thermistor, which would cause an error. An R1 value of 200 kΩ creates a power dissipation of about 0.005 mW in the thermistor. Given a dissipation constant of 2.5 mW/°C, the self-heating error will be about 0.002°C (insignificant). The third parameter is R2, which sets the maximum temperature of the system. The value of 10 kΩ is chosen slightly smaller than the thermistor resistance at 45°C. Given the bridge input, R1, and R2, the design parameters V1 and V1 − V2 can be calculated using Ohm's Law. The last step is to select the gain of the instrumentation amplifier so that a V1 − V2 of 0.339 V is converted to a V3 of 5.0 V. The system needs a gain of about 14.7 (5/0.339). The AD623 and AD627 are low-power single-supply rail-to-rail instrumentation amps:
$$V_3 = Gain \cdot (V_1 - V_2) + V_{pin5}$$
$$Gain = 1 + (100\ \mathrm{k\Omega}/R_G) \quad \text{(for the AD623)}$$
$$Gain = 5 + (200\ \mathrm{k\Omega}/R_G) \quad \text{(for the AD627)}$$
$$V_3 = V_{pin5} \quad \text{when } V_1 \text{ equals } V_2$$
For this system, we set pin 5 to ground, and select $R_G = 100\ \mathrm{k\Omega}/(14.7 - 1) = 7.3\ \mathrm{k\Omega}$. If we choose a smaller gain, then the minimum temperature will decrease. After we build the circuit, we recalibrate the system, creating a table like Table 11.7, but filling in actual measured values. The last two columns of the measured response will be used by the software to convert the ADC sample into a fixed-point value to be displayed.
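The conversion from ADC sample to fixed-point temperature can be sketched as a piecewise-linear interpolation over the last two columns of the design table. This is only an illustration under stated assumptions: the function name ADC_ToTemperature and the table names are hypothetical, the values below are the design values from Table 11.7 (a real system would substitute the measured calibration values), and samples outside the table are simply clamped to the end points.

```c
#include <stdint.h>

#define TABLE_SIZE 11
// Last two columns of Table 11.7, sorted by increasing ADC sample.
static const int16_t AdcTable[TABLE_SIZE] =
  {163, 215, 273, 337, 408, 487, 574, 670, 776, 894, 1023};
static const int16_t TempTable[TABLE_SIZE] =   // units of 0.01 C
  {4500, 4250, 4000, 3750, 3500, 3250, 3000, 2750, 2500, 2250, 2000};

// Return temperature in units of 0.01 C for a 10-bit ADC sample.
int16_t ADC_ToTemperature(int16_t sample){
  int i;
  int32_t dx, dy;
  if(sample <= AdcTable[0]){
    return TempTable[0];               // clamp at 45 C
  }
  if(sample >= AdcTable[TABLE_SIZE-1]){
    return TempTable[TABLE_SIZE-1];    // clamp at 20 C
  }
  for(i = 1; i < TABLE_SIZE; i++){     // find the segment holding sample
    if(sample <= AdcTable[i]){
      break;
    }
  }
  dx = AdcTable[i] - AdcTable[i-1];    // segment width in ADC counts
  dy = TempTable[i] - TempTable[i-1];  // segment height in 0.01 C
  return (int16_t)(TempTable[i-1] + ((sample - AdcTable[i-1])*dy)/dx);
}
```

For example, a sample of 835 lies halfway between the table entries 776 and 894, so the function returns 2375, meaning 23.75°C.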
11.7 *Control Systems
This book is an introduction to embedded systems, and thus control theory is beyond its scope. However, the simple example presented in this section knits together many of the topics of this chapter and provides the framework showing how more complex control systems might be implemented with a microcontroller. A control system is a collection of mechanical and electrical devices connected for the purpose of commanding, directing, or regulating a physical plant, as shown in Figure 11.15. An example physical plant is a DC motor. The real state variables are the properties of the physical plant that are to be controlled (e.g., motor speed in RPM). The sensor and state estimator comprise a data acquisition system. A tachometer is an example of a sensor, and its state estimator would be the associated hardware (Figure 9.7) and software (Program 9.6). The goal of this data acquisition system is to estimate the state variables (e.g., measure speed). A closed loop
control system uses the output of the state estimator in a feedback loop to drive the errors to zero. The control system compares the estimated state variables, X’(t), to the desired state variables, X*(t), in order to decide the appropriate action, U(t). The actuator is a transducer that converts the control system commands, U(t), into driving forces, V(t), that are applied to the physical plant. An example actuator is the hardware and software associated with PWM (Figure 8.13 and Program 8.9).
Figure 11.15 A control system employs closed-loop negative feedback.
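[Figure 11.15 block diagram. The desired speed X* and the measured speed X’ are subtracted to form the error e = X* − X’. The incremental controller converts the error into the command U. The actuator (PWM circuit) converts U into V, which drives the plant (a DC motor modeled as R, L, and emf). The sensor (tachometer) and the state estimator (period measurement) convert the actual speed into the measured speed X’.]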
In general, the goal of the control system is to drive the real state variables to equal the desired state variables. In actuality though, the controller attempts to drive the estimated state variables to equal the desired state variables. It is important to have an accurate state estimator, because any differences between the estimated state variables and the real state variables will translate directly into controller errors. If we define the error as the difference between the desired and estimated state variables,
$$e(t) = X^*(t) - X'(t)$$
then the control system will attempt to drive e(t) to zero. We evaluate the effectiveness of a control system by determining three properties: steady state controller error, transient response, and stability. The steady state controller error is the average value of e(t). The transient response is how long the system takes to reach 99% of the final output after X* is changed. A system is stable if steady state (smooth constant output) is achieved. An unstable system oscillates or saturates. In general control theory, X(t), X’(t), X*(t), U(t), V(t) and e(t) refer to multidimensional vectors (e.g., speed of multiple motors, speed&acceleration, (yaw, pitch, roll) or (x,y,z,,,)), but the simple examples in this book control only a single parameter.
Observation: Many control systems operate well when the control equations are executed about 10 times faster than the step response time of the physical plant.
Example 11.5 Design a control system that controls a DC motor in the range of 1000 to 2000 RPM.
Solution The objective of this system is to control the speed of a motor, using the design approach shown in Figure 11.15. One input will be the desired speed, stored as a decimal integer (RPM) in a global variable called Desired. For example, to spin the motor at 1500 RPM, we set Desired to 1500. The second input will be the measured speed, stored as a decimal integer (RPM) in a global variable called Measured. Figure 9.7 and Program 9.6 supply the measured Period with a resolution of 1 µs and a range of 50 µs to 65.535 ms. Assuming there are 32 stripes on the wheel, we can use the following equation to measure speed (in RPM):
Measured = 1,875,000/Period
The units of Period are µs/stripe. The 1,875,000 constant is derived from 1,000,000 µs/sec, 60 sec/min, and 1 rotation/32 stripes. Given motor speeds of 1000 to 2000 RPM, we can expect Period measurements from 1875 to 938. For example, if the speed is 1500 RPM, this is 25 rotations per second. Because there are 32 stripes, the frequency of the tachometer will be 800 Hz, giving a period of 1250 µs. As you can see, the above equation will take the 1250 period measurement and calculate Measured as 1500 RPM. The speed resolution is the smallest change in speed we can reliably detect. We calculate the speed resolution at 1500 RPM as $1875000/1249 - 1875000/1250 = 1.2$ RPM.
The output of the controller is U, which is an integer from 0 (no power) to 250 (full power), representing the duty cycle of the PWM actuator. Figure 8.13 and Program 8.9 comprise the actuator for this system. In this case, the controller calls PWM_Duty0 when it wants to adjust power to the motor. An incremental control algorithm simply adds or subtracts a constant from U depending on the sign of the error, as shown in Figure 11.16. In other words, if the error is positive (too slow) then U is incremented, and if the error is negative (too fast) then U is decremented. It is important to choose the proper rate at which the incremental control software is executed. If it is executed too many times per second, then the actuator will saturate, resulting in a Bang-Bang system (always on or always off). If it is not executed often enough, then the system will not respond quickly to changes in the physical plant or changes in desired speed. In this incremental controller we add or subtract “1” from the actuator, but a value larger than “1” would have a faster response at the expense of introducing oscillations.
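The speed and resolution calculations above can be expressed in integer math. The following is a minimal sketch, assuming Period is the 16-bit count in µs per stripe; the function names Speed_FromPeriod and Speed_ResolutionTenths are hypothetical, and the zero-period guards are an added assumption, not something specified in the text.

```c
#include <stdint.h>

// Speed in RPM from the tachometer period (us per stripe, 32 stripes).
// 1,000,000 us/sec * 60 sec/min / 32 stripes/rotation = 1,875,000.
uint32_t Speed_FromPeriod(uint16_t period){
  if(period == 0){
    return 0;              // assumed guard: no valid measurement yet
  }
  return 1875000UL/period;
}

// Speed resolution in units of 0.1 RPM: the change in speed caused by
// a one-count (1 us) change in the measured period.
uint32_t Speed_ResolutionTenths(uint16_t period){
  if(period < 2){
    return 0;              // assumed guard against division by zero
  }
  return 18750000UL/(period-1) - 18750000UL/period;
}
```

Speed_FromPeriod(1250) gives 1500 RPM, and Speed_ResolutionTenths(1250) gives 12, which corresponds to the 1.2 RPM calculated in the text. Scaling the constant by ten is one way to keep a fractional result without floating point.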
Figure 11.16 Incremental control used to run a DC motor at a constant speed.
[Figure 11.16 flowchart. Input capture ISR: Period = TC1-First; First = TC1; TFLG1 = 0x02. Periodic interrupt: Measured = 1875000/Period; Error = Desired-Measured; if Error < 0 (too fast) and U > 0, then U = U-1; if Error > 0 (too slow) and U < 250, then U = U+1.]
  if(Error < 0){
    if(U > 0){
      U--;           // too fast
    }
  }
  else if(Error > 0){
    if(U < 250){
      U++;           // too slow
    }
  }
  PWM_Duty0(U);      // actuator
  TC5 = TC5+10000;   // 10ms
  TFLG1 = 0x20;      // ack C5F
}
Checkpoint 11.9: In what way would the controller behave differently if we added/subtracted 10 instead of 1?
Checkpoint 11.10: What happens if the TC5 interrupt executes too frequently?
Observation: It is a good debugging strategy to observe the assembly listing generated by the compiler when performing calculations on variables of mixed types (signed/unsigned, char/short).
Observation: Incremental control will work moderately well (accurate and stable) for an extremely wide range of applications. Its only shortcoming is that the controller response time can be quite slow.
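The incremental control law can be isolated as a pure function, which makes it easy to test apart from the timer and PWM hardware. This is a sketch under stated assumptions rather than the book's interrupt service routine: the name Incremental_Step and the U_MIN/U_MAX constants are hypothetical, and the step size is fixed at 1 as in the text.

```c
#include <stdint.h>

#define U_MIN 0      // no power
#define U_MAX 250    // full power

// One execution of the incremental controller.
// Returns the new actuator command given the current command u.
int16_t Incremental_Step(int16_t u, int16_t desired, int16_t measured){
  int16_t error = (int16_t)(desired - measured);
  if(error < 0){          // too fast
    if(u > U_MIN){
      u--;
    }
  }
  else if(error > 0){     // too slow
    if(u < U_MAX){
      u++;
    }
  }
  return u;               // unchanged when the error is zero
}
```

Because the command moves by only one count per execution, going from 0 to 250 takes at least 250 executions; at the 10 ms rate used here, that is 2.5 seconds, which illustrates the slow response noted in the observation above.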
11.8 Tutorial 11 Analog Input Programming
The objective of this tutorial is to illustrate I/O programming of the LCD and ADC. An analog signal is sampled using the ADC, and the results are displayed on an LCD. As our software becomes more complex, we need alternative methods to visualize its complexity. The first part of this tutorial will be to visualize the software in different formats.
Action: Copy the Tutor11.rtf, Tutor11.uc, Tutor11.scp, and Tutor11.io files from the web onto your hard drive. Start a fresh copy of TExaS and open the Tutor11.rtf program file from within TExaS. This should open the corresponding microcomputer, scope and IO Device windows.
Question 11.1 Flowcharts are a convenient way to describe computer algorithms. Look at the Tutor11.rtf file and draw flowcharts of the Main, ADC_In, and LCD_OutChar programs.
Question 11.2 How does the synchronization technique used in ADC_In differ from LCD_OutChar?
Question 11.3 Call-graphs are used to visualize software hierarchy. Look at the Tutor11.rtf file and draw a call-graph of the software system. First, begin by defining the three software modules
and two hardware devices, as shown in Figure T11.1. Ovals are software modules and rectangles are I/O devices. If there were any global data structures, they would also be shown as rectangles. Next, for each situation where one function calls another, draw a call arrow from the function that performs the call to the function it calls. If one function calls a second function more than once, show only one arrow. Finally, for each I/O device access (read or write) draw an arrow from the function to the I/O device. Figure T11.1 The first step when drawing a call-graph is to list the functions and modules.
[Figure T11.1 diagram. Software modules (ovals): Main; ADC_Init and ADC_In; LCD_OutHex, LCD_Init, and LCD_OutChar. I/O devices (rectangles): ADC (analog to digital converter) and LCD display.]
Question 11.4 A data flow graph illustrates the data as it flows from input to output. Draw a data flow graph of this system, starting with the input hardware (ADC), going through each software function that handles the data, leading to the output hardware (LCD). Again, ovals are software modules and rectangles are I/O devices.
Action: Assemble and run Tutor11.rtf. Observe the I/O device registers in the microcomputer window, the analog voltage versus time signal in the scope window, and the status of the external hardware in the IO Device window. The results of the ADC sampling will be displayed on the LCD.
Question 11.5 What is the TCNT clock period? In this system, the default value is not used.
Action: You will measure the time from ADC sample to ADC sample. First, remove all entries in the Microcomputer ViewBox window except TCNT. Next, add a breakpoint at the assembly line that starts the ADC (the first instruction of the ADC_In function). Next, switch the breakpoint system from Break mode to Scan mode. In particular, execute Mode->BreakMode so that the check mark is absent. Run the program and observe the timing results in the TheLog.RTF window. You should get results like the following:
TCNT=1748 TCNT=2047 TCNT=2346 TCNT=2645 TCNT=2942 TCNT=3239 TCNT=3536
Question 11.6 Using the results from Question 11.5, what is the average time in between ADC samples? Ignore the first couple of samples. Give your results in cycles and in µsec.
Question 11.7 Calculate the approximate sample rate in samples/sec. This is not a very appropriate method to sample an ADC. We can achieve very accurate timing using a periodic interrupt to sample the ADC.
Question 11.8 If we still wish to display each ADC sample on the LCD, can we use interrupt synchronization to sample faster?
11.9 Homework Problems
Homework 11.1 The Maxim MAX549 is a 2-channel 8-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX549. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550
example in the chapter, except the DACout function includes a channel number in Register B. If Register B equals 0, then output the value in Register A to DAC channel A. If Register B equals 1, then output the value in Register A to DAC channel B.
Homework 11.2 The Maxim MAX539 is a 1-channel 12-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX539. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550 example in the chapter, except the DACout function takes a 12-bit number in Register D.
Homework 11.3 The Maxim MAX5235 is a 2-channel 12-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX5235. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550 example in the chapter, except the DACout function includes a channel number in Register X. If Register X equals 0, then output the 12-bit value in Register D to DAC channel A. If Register X equals 1, then output the 12-bit value in Register D to DAC channel B.
Homework 11.4 Assume you have the MAX550 interface of Figure 8.10 and the software of 8.3. Write a main program and an output compare interrupt service routine that creates a 100 Hz sine wave analog output. The DAC outputs occur in the ISR, and after the main program initializes the DAC and output compare, it is free to perform other unrelated tasks.
Homework 11.5 Consider the 8-bit R-2R resistor ladder shown in Figure Hw11.5. Assume Port T is an output and the digital output voltages from PTT are 0 or 5 V. Derive a relationship between the 8-bit digital number output to PTT and the current flowing in the resistor labeled Rout. Hint: if one output pin is high and the other pins are low, calculate the current flowing up from the pin through the 20 kΩ resistor.
Show that this current is the same value regardless of which pin is high (assuming the other pins are low). When a current comes up to a node (drawn with the black dot), it can go one way or another. Again assuming exactly one digital output is high, what happens to currents at each node? I.e., how much goes left and how much goes right? Solve for the basis elements of the 8-bit digital number. I.e., what is Iout if the digital number is 1, 2, 4, 8, 16, 32, 64, and 128? Given the responses for these basis elements, use the law of superposition to derive a general relationship.
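Figure Hw11.5 8-bit R-2R resistor ladder.
[Figure Hw11.5 schematic. Each 9S12 output pin PT0 through PT7 connects through a 20 kΩ resistor to a node of the ladder. Adjacent nodes are joined by 10 kΩ resistors, one end of the ladder is terminated with a 20 kΩ resistor, and the output current Iout flows through the 20 kΩ resistor labeled Rout.]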
Homework 11.6 Assume you have a 12-bit signed ADC. Let Vin be the analog voltage in volts and N be the digital ADC output. The input range is −5 ≤ Vin ≤ +5 V. The ADC digital output range is −2048 ≤ N ≤ 2047. First, write a linear equation that relates Vin as a function of N. Next, rewrite the equation in fixed-point math assuming Vin is represented as a decimal fixed-point number with a resolution of 0.001 V.
Homework 11.7 Assume you have an 11-bit signed ADC. Let Vin be the analog voltage in volts and N be the digital ADC output. The input range is −10 ≤ Vin ≤ +10 V. The ADC digital output range is −1024 ≤ N ≤ 1023. First, write a linear equation that relates Vin as a function of N. Next, rewrite the equation in fixed-point math assuming Vin is represented as a decimal fixed-point number with a resolution of 0.01 V.
Homework 11.8 Write an assembly language subroutine that samples ADC channel 2 four times, calculates the average of the four samples, and returns the result in Register A.
Homework 11.9 Write an assembly language subroutine that samples all 8 ADC channels, calculates the average of the eight samples, and returns the result in Register A.
Homework 11.10 Write an assembly language subroutine that samples all 8 ADC channels, calculates the minimum and maximum of the eight samples, and returns the range (maximum-minimum) in Register A.
Homework 11.11 Write an assembly language subroutine that samples ADC channels 0, 1, and 2, calculates the median of the three samples, and returns the result in Register A.
Homework 11.12 Write an assembly language subroutine that samples the ADC and returns a voltage in Register D using decimal fixed point with a resolution of 0.001 V.
Homework 11.13 Write an assembly language subroutine that samples the ADC and returns a voltage in Register D using binary fixed point with a resolution of $2^{-8}$ V.
Homework 11.14 Assume an AC waveform is connected to analog channel 0. Write an initialization ritual. Write a subroutine that samples the analog input 256 times, and returns the DC amplitude (average) in Register A, and the AC amplitude (maximum-minimum) in Register B.
Homework 11.15 Assume an analog input signal is connected to the ADC channel 2 on computer 1. Assume the transmit serial output of computer 1 is connected to the receive serial input of computer 2. A MAX550A is connected to computer 2, like the interface shown in the previous chapter. Write the dedicated software in both systems, so that the analog input is sampled by the first computer, transmitted via the serial link, and converted back to analog form by the DAC on the second computer.
Homework 11.16 An embedded system will use an ADC to measure a parameter. The measurement system range is 0.0 to 9.99 and a resolution of 0.01. What is the smallest number of ADC bits that can be used?
Homework 11.17 An embedded system will use an ADC to measure a distance. The measurement system range is −10 to +10 cm and a resolution of 0.01 cm. What is the smallest number of ADC bits that can be used?
Homework 11.18 An embedded system will use an ADC to measure a force. The measurement system range is 0 to 100 N and a resolution of 0.01 N. What is the smallest number of ADC bits that can be used?
Homework 11.19 An 8-bit ADC (different from the 9S12) has an input range of 0 to 2 volts and an output range of 0 to 255 (called straight binary). What digital value will be returned when an input of 1.5 volts is sampled?
Homework 11.20 A 12-bit ADC (different from the 9S12) has an input range of −2.5 to +2.5 volts and an output range of 0 to 4095 (called offset binary). What digital value will be returned when an input of 1.25 volts is sampled?
Homework 11.21 A 16-bit ADC (different from the 9S12) has an input range of 0 to 2.5 volts and an output range of 0 to 65535 (called straight binary). What digital value will be returned when an input of 0.625 volts is sampled?
Homework 11.22 Assume the ADC sequence length is 3, ATDCTL3 equals $18, and $95 is written into ATDCTL5. Which of the following happens?
a) Channel 5 is sampled and the result is placed in ATDDR0
b) Channel 5 is sampled and the result is placed in ATDDR5
c) Channel 5 is sampled three times and the results are placed in ATDDR0-ATDDR2
d) Channel 5 is sampled three times and the results are placed in ATDDR5-ATDDR7
e) Channels 5,6,7 are sampled and the results are placed in ATDDR0-ATDDR2
f) Channels 5,6,7 are sampled and the results are placed in ATDDR5-ATDDR7
Homework 11.23 The Maxim MAX1247 is a 4-channel 12-bit ADC with an SPI interface. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX1247. Show the circuit diagram connecting the ADC chip to an SPI port. Develop ADC_Init and ADC_In functions to initialize and sample the ADC. The ADC_In function takes the channel number (0-3) in RegA and returns the 12-bit ADC sample in RegD.
11.10 Laboratory Assignments
Lab 11.1 Voltmeter
Purpose: The purpose of this lab is to learn LCD interfacing, to use interrupts to perform real time sampling and to use fixed-point numbers to represent non-integer values.
Description: In this lab you will first develop an accurate way to establish 1 kHz sampling. In particular, the ADC should be started exactly every 1 ms. Second, you will convert the ADC sample into a decimal fixed-point number, with a resolution Δ of 0.01 V. Lastly, you will develop a fixed-point display function, which will be used to display the sampled signal on the LCD.
a) Write initialization routines for the output compare interrupts, the ADC and the LCD display.
b) Write a subroutine that converts a 10-bit binary ADC sample into unsigned fixed-point format. The input parameter to the subroutine will be passed in using Register D, and your subroutine will return the result in Register D. Table L11.1a shows some example results. Do not worry if your answers differ by ±1, because of rounding.
Table L11.1a Example results of the conversion from ADC sample to fixed-point.
Analog input   ADC sample     Fixed-point output
0.000 V        %0000000000    0
1.234 V        %0011111100    123
3.456 V        %1011000000    346
5.000 V        %1111111111    500
c) Write a subroutine that outputs the fixed-point number to the LCD. The input parameter to the subroutine will be passed in using Register D. Table L11.1b shows some example results.
Table L11.1b Example results of the fixed-point display subroutine.
Fixed-point   LCD display
0             0.00 V
123           1.23 V
346           3.46 V
500           5.00 V
d) Write the main program that first initializes the system. The real time measurements occur in the background, results are passed via a global variable to the foreground and the main program converts it to fixed point, then displays the results on the LCD. Lab 11.2 Distance Monitor Purpose: The purpose of this lab is to learn LCD interfacing, to use interrupts to perform real time sampling and to use fixed-point numbers to represent non-integer values. Description: In this lab you will use a Sharp GP2Y0A21YK0F infrared object detector to measure distance (http://www.sharpsma.com). This sensor creates a continuous analog voltage between 0 and 5 V that depends inversely on distance to object, see Figure L11.2. Figure L11.2 Response curve of the Sharp GP2Y0A21YK0F distance sensor.
[Figure L11.2 plot. Output voltage (V), 0.0 to 3.5, versus 1/Distance (1/cm), 0.00 to 0.20, with points labeled from 80 cm down to 5 cm, for white paper (reflectance ratio 90%) and gray paper (reflectance ratio 18%).]
The operational range of this sensor is 10 to 80 cm. Other Sharp sensors have other distance ranges. The response time is 39 ms, so you will use output compare interrupts to establish 10 Hz sampling. In particular, the ADC should be started exactly every 100 ms. Next, you will convert the ADC sample into a decimal fixed-point number, with a resolution Δ of 0.1 cm. Lastly, you will develop a fixed-point display function, which will be used to display the sampled signal on the LCD.
a) Write initialization routines for the output compare interrupts, the ADC and the LCD display.
b) Write a subroutine that converts the ADC sample into distance defined as an unsigned fixed-point number. The input parameter to the subroutine will be passed in using Register D, and your subroutine will return the result in Register D.
c) Write a subroutine that outputs the fixed-point distance to the LCD.
d) Write the main program that first initializes the system. The real time measurements occur in the background, results are passed using a MailBox to the foreground, and the main program converts each result to fixed point, then displays it on the LCD.
Lab 11.3 AC/DC Voltmeter
Purpose: The objectives of this lab are to
• Interface a three-digit LCD display to the microcomputer
• Write device drivers for a switch, an ADC and an LCD
• Implement functions for addition, multiplication, division, and square root
• Implement an AC/DC voltmeter
Description: In this lab you will design an AC/DC voltmeter. The ADC will be used to sample an analog input. The DC amplitude is the simple average of multiple samples. Let v[n] be 256 sampled voltage values.
$$DC = (v[0] + v[1] + \cdots + v[255])/256$$
The AC amplitude is calculated as a root-mean-squared value.
$$AC = \sqrt{\left((v[0]-DC)^2 + (v[1]-DC)^2 + \cdots + (v[255]-DC)^2\right)/256}$$
The LCD output will be in decimal fixed point with a resolution Δ of 0.01 V. You will find it much simpler to perform the AC/DC calculations on the raw integer samples, then convert the integer AC/DC results to fixed-point voltages. A toggle switch will allow the operator to select either AC or DC mode. The three-digit LCD display will show either the calculated DC or AC value as a fixed-point number with Δ equal to 0.01 V. A device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer and I/O files. Attach a toggle switch, an analog signal to the ADC and a simple three-digit LCD display. You can assume the toggle switch does not bounce.
b) Write a device driver for the 3-digit LCD. You should be able to initialize the interface and output a fixed-point number. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the ADC interface. You should design subroutines as needed. The names of all the public driver subroutines should start with the letters “ADC_”. Draw flowcharts of these subroutines.
d) Write a device driver for the switch interface. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the switch must be included in this driver. The names of all the public driver subroutines should start with the letters “Switch_”. Draw flowcharts of these subroutines.
e) Write the main program that implements the voltmeter functionality.
Sample the ADC as fast as possible, and use the TCNT timer to estimate the sampling rate. Calculate the DC and AC results independent of the switch position, so that the sampling rate will be approximately constant. Include a “call-graph” of the system.
f) Evaluate the accuracy of the meter, which is the difference between the true signal and the results measured with the system. Use a signal period that is about 16 times larger than the ADC sample period. In this way there will be about 16 ADC samples per wave, and about 16 waves per block of 256 ADC samples. The first two tests will be performed on a pure sine wave. Determine the mathematical relationship between the peak-to-peak sine wave amplitude and its RMS value. First, keeping the DC value fixed, evaluate the accuracy of the system at 5 different AC values. Second, keeping the AC value fixed, evaluate the accuracy of the system at 5 different DC values. Lastly, keeping the minimum and maximum of the signal constant, test the voltmeter with each of the signal shapes. Explain the differences in the results.
Lab 11.4 Real-Time Position Measurement System
Purpose: This lab has these major objectives:
• An introduction to sampling analog signals using the ADC interface;
• Development of an ADC device driver;
• Data conversion and calibration techniques;
• Development of an interrupt-driven real-time sampling device driver.
Description: You will design a position meter with a range of about 3 cm. A linear slide potentiometer (Alpha RA300BF-10-20D1-B54) converts position into resistance (0 ≤ R ≤ 50 kΩ). You will use an electrical circuit to convert resistance into voltage (Vin). The potentiometer has three leads. The 9S12 ADC will convert voltage into a 10-bit digital number (0 to 1023). Your software will calculate position from the ADC sample as a decimal fixed-point number. The position measurements will be displayed on the LCD. A periodic interrupt will be used to establish the real-time sampling. The left of Figure L11.4a shows the data flow graph of this system. Dividing the system into modules allows for concurrent development and eases the reuse of code. The right of Figure L11.4a shows the call graph.
Figure L11.4a Data flow graph and call graph of the position meter system.
[Figure: data flow from the position sensor (position 0 to 3 cm, voltage 0 to +5 V) through the ADC hardware, ADC driver, and OC ISR (sample 0 to 1023) to the LCD driver and LCD display (fixed-point 0 to 3.000); call graph involving main, OC init, OC ISR, OC hardware, ADC driver, ADC hardware, LCD driver, and LCD hardware.]
You should make the position resolution and accuracy as good as possible. The position resolution is the smallest change in position that your system can reliably detect. In other words, if the resolution were 0.01 cm and the position were to change from 1.00 to 1.01 cm, then your device would be able to recognize the change. Resolution will depend on the amount of electrical noise, the number of ADC bits, and the resolution of the output display software. Considering just the errors due to the 10-bit ADC, we expect the resolution to be 3 cm/1024 or about 0.003 cm. Accuracy is defined as the absolute difference between the true position and the value measured by your device. Accuracy is dependent on the same parameters as resolution, but in addition it is also dependent on the stability of the transducer and the quality of the calibration procedure.
In this lab, you will be measuring the position of the armature (the movable part) on the slide potentiometer. This signal has very few frequency components (0 to 2 Hz). According to the Nyquist Theorem, we need a sampling rate greater than 4 Hz. Consequently, you will create a system with a sampling rate of 5 Hz. You will sample the ADC exactly every 0.2 sec and calculate position using decimal fixed-point with a resolution (Δ) of 0.001 cm. You should display the results on the LCD, including units. An output compare interrupt will be used to establish the real-time periodic sampling.
When a transducer is not linear, you could use a piece-wise linear interpolation to convert the ADC sample to position (Δ of 0.001 cm). The 9S12 assembly language etbl instruction is an efficient mechanism to perform the interpolation. The etbl.RTF assembly program
included with TExaS is an example of a piece-wise linear interpolation using the etbl instruction. There are two small tables, Xtable and Ytable. The Xtable contains the ADC results and the Ytable contains the corresponding positions. The ADC sample is passed into the lookup function. This function first searches the Xtable for two adjacent points that surround the current ADC sample. Next, the function uses the etbl instruction to perform a linear interpolation to find the position that corresponds to the ADC sample. You are free to implement the conversion in any acceptable manner, with the exception that you are not allowed to use the etbl instruction.
The 10-bit ADC converters on the 9S12 are successive approximation devices with a short conversion time. You need to enable the ADC in ATD0CTL2. In particular, you will set ATD0CTL2 = $80. You can define the number of ADC conversions (1 to 8) in a sequence using ATD0CTL3. For this lab, you will only need a single conversion, so you can set the control bits S8C S4C S2C S1C in ATD0CTL3 equal to 0001 respectively. In particular, you will set ATD0CTL3 = $08. Bit 7 of ATD0CTL4 determines if the ADC operates with 8 bits or 10 bits. You will clear bit 7 to specify 10-bit precision. The remaining 7 bits of ATD0CTL4 specify the ADC clock, which will determine the time to perform an ADC conversion. If the 9S12DP512 were running at 8 MHz, you should set ATD0CTL4 = $03. At this setting, the ADC will be clocked at 1 MHz, and the ADC conversion time will be equal to 14 µs. However, in this lab we will be running the 9S12DP512 at 24 MHz, therefore you could set ATD0CTL4 = $05, the ADC will be clocked at 2 MHz, and the ADC conversion time will be equal to 7 µs. In summary, the ADC initialization should set
ATD0CTL2=$80   turns on ADC
ATD0CTL3=$08   specifies ADC sequence will perform one conversion
ATD0CTL4=$05   specifies 10-bit mode, and 7 µs conversion time
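The two ATD0CTL4 settings quoted above follow one pattern, which can be sketched in C. This is only an illustration: the function names are hypothetical, the relation ATD clock = bus clock/(2·(PRS+1)) is taken from the 9S12 ATD module description, and the figure of 14 ATD clocks per 10-bit conversion is inferred from the 14 µs at 1 MHz stated in the text.

```c
#include <stdint.h>

/* Hypothetical helper: choose the 5-bit prescale value for ATD0CTL4 so that
   ATD clock = busHz / (2*(PRS+1)) does not exceed the requested atdHz. */
uint8_t ATD_Prescale(uint32_t busHz, uint32_t atdHz) {
  uint32_t divider;
  if (atdHz == 0) return 31;                          /* slowest setting */
  divider = (busHz + 2u*atdHz - 1u) / (2u*atdHz);     /* ceiling of bus/(2*atd) */
  if (divider == 0) divider = 1;
  if (divider > 32) divider = 32;
  return (uint8_t)(divider - 1u);
}

/* Hypothetical helper: 10-bit conversion time in nanoseconds, assuming
   14 ATD clock periods per conversion (consistent with 14 us at 1 MHz). */
uint32_t ATD_ConversionTimeNs(uint32_t busHz, uint8_t prs) {
  uint32_t atdHz = busHz / (2u * ((uint32_t)prs + 1u));
  return (uint32_t)(14000000000ull / atdHz);
}
```

With an 8 MHz bus and a 1 MHz target the sketch returns 3 (ATD0CTL4 = $03, 14 µs); with a 24 MHz bus and a 2 MHz target it returns 5 (ATD0CTL4 = $05, 7 µs).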
Writing to the ADC Control register (ATD0CTL5) begins a conversion. The ADC chip clocks itself. To perform a right-justified ADC conversion of channel 4, you should write a $84 to ATD0CTL5. After the first sample is complete, CCF0 is set and the result can be read out of the first result register, ATD0DR0. After the entire sequence has been converted, the SCF bit is set. In summary, the ADC conversion of channel 4 requires the following actions:
1) ATD0CTL5=$84 starts the ADC
2) Read ATD0STAT1 and look at bit 0 (CCF0)
3) Loop back to step 2 over and over until CCF0 is set (7 µs)
4) Read 10-bit result in ATD0DR0
The analog signal connected to the microcomputer comes from a position sensor, such that the analog voltage ranges from 0 to 5 V as the position ranges from 0 to 3 cm. First, you will use output compare interrupts to establish 5 Hz sampling. In particular, the ADC should be started exactly every 0.2 s. Second, you will convert the ADC sample (0 to 1023) into a 16-bit unsigned decimal fixed-point number, with a resolution (Δ) of 0.001 cm. Lastly, you will use your LCD_OutFix function from the previous lab to display the sampled signal on the LCD. Include units on your display.
a) You can create a scale by Xerox-copying a metric ruler. There are many ways to build the transducer. One method requires cutting, gluing, and soldering. Start with a piece of wood or plastic a little larger than the potentiometer. Glue the frame (the fixed part) of the potentiometer to this solid object. Tape or glue the metric ruler on the frame but near the armature (the movable part) of the sensor. Attach or draw a hair-line to the armature, which will define the position measurement. Solder three solid wires to the slide potentiometer.
b) Write two subroutines: ADC_Init will initialize the ADC interface and ADC_In4 will sample the ADC channel 4. Use the simulator to test these functions.
c) Write a simple version of the system, which you can use to collect calibration data. In particular, this system should first sample the ADC and then display the results as unsigned decimal numbers. You should use your LCD_OutDec developed in the previous lab. Collect five to ten calibration points and create a table showing the true position (as determined by reading the position of the hair-line on the ruler), the analog input measured with a digital voltmeter, and the ADC sample (like the first three columns of Table L11.4).
Table L11.4 Calibration results of the conversion from ADC sample to fixed-point.
Position     Analog Input    ADC Sample    Fixed-Point Output
0.010 cm     0.000 V         0             10
0.741 cm     1.234 V         252           741
1.500 cm     2.500 V         512           1500
2.074 cm     3.456 V         707           2074
3.000 cm     5.000 V         1023          3000
d) Use this calibration data to write a subroutine that converts a 10-bit binary ADC sample into a 16-bit unsigned fixed-point number. The input parameter (10-bit ADC sample) to the subroutine will be passed in using Register D, and your subroutine will return the result (integer portion of the fixed-point number) in Register D. Table L11.4 shows some example results. You are allowed to use a linear equation to convert the ADC sample into the fixed-point number.
e) Write a subroutine: OC_Init will initialize the output compare system to interrupt at exactly 5 Hz (every 0.2 second). Use the simulator to test these functions. When debugging your code in TExaS it will be more convenient to run with a shorter OC interrupt period, e.g., 10 to 50 ms.
1. Disable interrupts to make the initialization atomic (set I bit in CCR)
2. Enable the timer and an output compare channel, make PT7 an output (interface a LED)
3. Arm output compare
4. Specify when the first output compare interrupt will be
5. Enable interrupts (clear I bit in CCR)
f) Write an output compare interrupt handler that samples the ADC and outputs the data to the LCD. Use the simulator to test these functions. Using the interrupt synchronization, the ADC will be sampled at almost equal time intervals (see footnote 7). The interrupt service routine performs these tasks
1. Acknowledge the output compare interrupt by clearing the flag that requested the interrupt
2. Specify the time for the next interrupt
3. Toggle PT7 (change from 0 to 1, or from 1 to 0)
4. Sample the ADC
5. Convert the sample into a fixed-point number (0 to 3000)
6. Output the fixed-point number on the LCD
7. Return from interrupt
g) Write a simple main program, which initializes the PLL, timer, LCD, ADC and output compare interrupts. After initialization, this main program (foreground) performs a do-nothing loop. All run-time operations occur in the output compare interrupt service routine (background).
h) Use the system to collect another five to ten data points, creating a table showing the true position (xti as determined by reading the position of the hair-line on the ruler), and measured position (xmi using your device). Calculate average accuracy by calculating the average difference between truth and measurement,

Average accuracy (with units in cm) = $\frac{1}{n}\sum_{i=1}^{n} |x_{ti} - x_{mi}|$
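Two calculations from this lab, the linear conversion of part d) and the average accuracy of part h), can be sketched in C as follows. This is a minimal illustration rather than the required assembly solution: the function names and the calibration constants are hypothetical, and the constants are simply the two end points of Table L11.4, so intermediate rows of that example table are not reproduced exactly.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical calibration, taken from the end points of Table L11.4:
   ADC 0 -> 10 (0.010 cm) and ADC 1023 -> 3000 (3.000 cm). */
#define POS_AT_ZERO  10u
#define POS_AT_FULL  3000u
#define ADC_FULL     1023u

/* Convert a 10-bit ADC sample to position in units of 0.001 cm.
   A 32-bit intermediate avoids overflow in the multiply. */
uint16_t ADC_ToPosition(uint16_t sample) {
  uint32_t span = POS_AT_FULL - POS_AT_ZERO;
  if (sample > ADC_FULL) sample = ADC_FULL;           /* clamp bad input */
  return (uint16_t)(POS_AT_ZERO + (span * sample + ADC_FULL / 2u) / ADC_FULL);
}

/* Average accuracy = (1/n) * sum |truth[i] - meas[i]|, same 0.001 cm units.
   The result is truncated to an integer. */
uint16_t AverageAccuracy(const uint16_t truth[], const uint16_t meas[], size_t n) {
  uint32_t sum = 0;
  size_t i;
  if (n == 0) return 0;
  for (i = 0; i < n; i++) {
    sum += (truth[i] > meas[i]) ? (uint32_t)(truth[i] - meas[i])
                                : (uint32_t)(meas[i] - truth[i]);
  }
  return (uint16_t)(sum / n);
}
```

In a real system the two constants would come from your own calibration data instead of the example table.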
Lab 11.5 Music generation using a Digital to Analog Converter
Purpose: The purpose of this lab is to
• Build a DAC,
• Design a data structure to represent music,
• Develop a system to play sounds.
7. More precisely, the output compare flag is set at exact time intervals. There is some variability in when the ISR runs depending on which instruction is being executed at the time when the flag is set.
Description: Most digital music devices rely on high-speed DAC converters to create the analog waveforms required to produce high-quality sound. In this lab you will create a very simple sound generation system that illustrates this application of the DAC. Your goal is to create an embedded system that plays three notes (a digital piano with three keys). The first step is to design and test a 4-bit DAC, which converts 4 bits of digital output from the 9S12 to an analog signal, see Figure L11.5a. You are free to design your DAC with a precision of more than 4 bits. You will convert the binary bits (digital) to an analog output using a simple resistor network. During the static testing phase, you will connect the DAC analog output to your voltmeter and measure resolution, range, precision and accuracy. During the dynamic testing phase you will connect the DAC output to headphones, and listen to sounds created by your software. It doesn’t matter what range the DAC is, as long as there is an approximately linear relationship between the digital data and the speaker current. The performance score of this lab is not based on loudness, but sound quality. The quality of the music will depend on both hardware and software factors. The precision of the DAC, external noise and the dynamic range of the speaker are some of the hardware factors. Software factors include the DAC output rate and the complexity of the stored sound data. You can create a 3 kΩ resistor from two 1.5 kΩ resistors in series. You can create a 6 kΩ resistor from two 12 kΩ resistors in parallel.
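Figure L11.5a DAC allows the software to create music.
[Figure: for static testing, the 9S12 output bits drive the DAC and Vout is measured with a voltmeter; for dynamic testing, the same 9S12 output bits drive the DAC and the output current Iout drives the speaker.]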
The second step is to design a low-level device driver for the DAC. Remember, the goal of a device driver is to separate what the device does (general descriptions of DAC_Init and DAC_Out) from how it does it (implementations of DAC_Init and DAC_Out). The third step is to design a data structure to store the sound waveform. You are free to design your own format, as long as it uses a formal data structure. Compressed data occupies less storage, but requires runtime calculation. The fourth step is to organize the digital piano software into a device driver. Although you will be playing only three notes, the design should allow additional notes to be added with minimal effort. For example, if your system plays C, D, E, then you will need public functions Piano_Stop, Piano_C, Piano_D and Piano_E. The Stop function makes it silent and the other functions activate a sound. A background thread implemented with output compare will fetch data out of your music structure and send them to the DAC. The last step is to write a main program that inputs from binary switches and performs the four public functions. If you output a sequence of numbers to the DAC that form a sine wave, then you will hear a continuous tone on the speaker.
a) Draw the circuit required to interface the DAC to the 9S12. Design the DAC converter using a simple resistor-adding technique. Use resistors in a 1/2/4/8 resistance ratio. Select values in the 1.5 kΩ to 12 kΩ range. For example, you could use 1.5 kΩ, 3 kΩ, 6 kΩ, and 12 kΩ. Notice that you could create double/half resistance values by placing identical resistors in series/parallel. It is a good idea to email your design to your TA and have him/her verify your design before you build it. You can solder 24 gauge solid wires to the jack to simplify connecting your circuit to the headphones.
b) Write a low-level device driver for the DAC interface. Include two functions that implement the DAC interface.
The function DAC_Init() initializes the DAC, and the function DAC_Out sends a new data value to the DAC. You can debug your software in TExaS using the DC motor I/O device. This module allows you to connect a DAC to an output port. You can select the precision of the DAC (4 bits in this case). You can visualize the generated waveform on the scope by selecting the D/A output (or DC motor power). Figure L11.5b shows the TExaS dialog to interface the DAC to PM3,2,1,0, and a sine wave generated by a 4-bit DAC simulated in TExaS.
Figure L11.5b The 4-bit DAC is used to create a sine wave using TExaS.
c) Write a couple of simple main programs that test the DAC interface. This main program can be used for static testing. You can single step this program using the debugger to test the static function of the DAC.
      org  $4000
Entry lds  #$4000
      jsr  DAC_Init
      clra
loop  jsr  DAC_Out     ; output RegA to the DAC
      inca
      anda #$0F        ; keep the value in the 4-bit range 0 to 15
      bra  loop
This main program can be used for dynamic testing. It creates a triangle waveform (adjust the 1000 to affect the frequency).
      org  $4000
Entry lds  #$4000
      jsr  Timer_Init
      jsr  DAC_Init
      clra
      psha             ; allocate local variable n on the stack, n=0
n     equ  0
loop  ldd  #1000
      jsr  Timer_Wait
      ldaa n,sp
      inca
      jsr  DAC_Out
      staa n,sp
      cmpa #15
      bne  loop        ; ramp up until n=15
loop2 ldd  #1000
      jsr  Timer_Wait
      ldaa n,sp
      deca
      jsr  DAC_Out
      staa n,sp
      cmpa #0
      bne  loop2       ; ramp down until n=0
      bra  loop
d) Using Ohm’s law and the fact that the digital output voltages will be approximately 0 and 5 V, make a table of the theoretical DAC voltage as a function of digital value (without the speaker attached). Calculate resolution, range, precision and accuracy.
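The theoretical table asked for in part d) can be generated with a short C sketch. It assumes the example resistor values from part a), with the 1.5 kΩ resistor on the most significant bit, all four resistors joined at the unloaded output node, and ideal 0 V and 5 V digital outputs; the function name and resistor assignment are illustrative only.

```c
#include <stdint.h>

#define DAC_BITS 4

/* Assumed resistor on each output bit, bit 0 first (ohms). */
static const double DAC_R[DAC_BITS] = {12000.0, 6000.0, 3000.0, 1500.0};

/* Unloaded output of a resistor-summing DAC, in mV.
   With no load, Vout = sum(Vi/Ri) / sum(1/Ri), where Vi is 0 or 5000 mV. */
uint16_t DAC_TheoreticalmV(uint8_t data) {
  double num = 0.0, den = 0.0;
  int bit;
  for (bit = 0; bit < DAC_BITS; bit++) {
    double g = 1.0 / DAC_R[bit];                       /* conductance */
    double v = ((data >> bit) & 1) ? 5000.0 : 0.0;     /* ideal logic level */
    num += v * g;
    den += g;
  }
  return (uint16_t)(num / den + 0.5);                  /* round to nearest mV */
}
```

Under these assumptions the output is 5000·n/15 mV for digital value n, so the step size is about 333 mV and the range is 0 to 5 V.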
e) Design and write the piano device driver software. Add minimally intrusive debugging instruments to allow you to visualize when interrupts are being processed. f) Write a main program to run the entire system. Document clearly the operation of the routines. Figure L11.5c shows the data flow graph of the music player.
Figure L11.5c Data flows from the memory and the switches to the speaker.
[Figure: data flow graph involving the push buttons, switch interface, timer hardware, timer interface, main, music data, sound interface, and speaker hardware.]
Figure L11.5d shows a possible call graph of the system. Dividing the system into modules allows for concurrent development and eases the reuse of code.
Figure L11.5d A call graph showing the three modules used by the music player.
[Figure: call graph involving the main program, switch driver, switch hardware, DAC driver, music, OC hardware, and speaker hardware.]
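One possible shape for the music data structure and the output compare timing is sketched below in C. Everything here is an assumption made for illustration: the 16-point table (approximately 7.5 + 7.5·sin), the note frequencies (rounded values for C, D, and E above middle C's octave start), the type and function names, and the timer clock passed to Sound_Period. The lab leaves the format up to you.

```c
#include <stdint.h>

#define WAVE_SIZE 16

/* One cycle of a 4-bit sine wave, values 0 to 15 (illustrative table). */
const uint8_t SineWave[WAVE_SIZE] = {8, 10, 13, 14, 15, 14, 13, 10,
                                     8, 5, 2, 1, 0, 1, 2, 5};

/* A note is described by its frequency; adding a note adds one entry. */
typedef struct {
  char name;            /* 'C', 'D', 'E' */
  uint16_t frequency;   /* Hz, rounded */
} Note_t;

const Note_t Notes[3] = {{'C', 262}, {'D', 294}, {'E', 330}};

/* DAC value for the given position in the waveform; the output compare ISR
   would call this with an index it increments on each interrupt. */
uint8_t Sound_Sample(uint16_t index) {
  return SineWave[index % WAVE_SIZE];
}

/* Timer counts between DAC outputs so that one pass through the table
   takes 1/noteHz seconds: timerHz / (noteHz * size), rounded to nearest. */
uint32_t Sound_Period(uint32_t timerHz, uint32_t noteHz, uint32_t size) {
  uint32_t d = noteHz * size;
  if (d == 0) return 0;
  return (timerHz + d / 2u) / d;
}
```

For example, with an assumed 1 MHz timer clock, the 262 Hz note needs an interrupt about every 239 counts.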
Extra Credit: Extend the system so that it plays your favorite song (a sequence of notes, set at a specific tempo and including an envelope like Figure 11.9). One possible approach is to use two output compare interrupts. A fast output compare ISR outputs the sine wave to the DAC (Figure 8.10). The rate of this interrupt is set to specify the frequency (pitch) of the sound. A second slow output compare ISR occurs at the tempo of the music. For example, if the song has just quarter notes at 120 beats per minute, then this interrupt occurs every 500 ms. If the song has eighth notes, quarter notes and half notes, then this interrupt occurs at 250, 500, 1000 ms respectively. During this second ISR, the frequency of the first ISR is modified according to the note that is to be played next. Compressed data occupies less storage, but requires runtime calculation. On the other hand, a complete list of points will be simpler to process, but requires more storage than is available on the 9S12. The fourth step is to organize the music software into a device driver. Although you will be playing only one song, the song data itself will be stored in the main program, and the device driver will perform all the I/O and interrupts to make it happen. You will need public functions Play and Stop, which perform operations like a cassette tape player. The Play function has an input parameter that defines the song to play. If you complete the extra credit (with input switches that can be used to play and stop), then the piano functionality in parts e) and f) need not be completed. Either way, parts a) b) c) and d) are required.

Lab 11.6 Real-Time Temperature Data Acquisition
Purpose: The objective of this lab is to study analog to digital conversion, real-time sampling, digital filtering, foreground-background communication, table lookup, and LCD display.
Description: In preparation for this assignment, review fixed-point, passing/returning parameters, and ADC converters. Look up how TExaS simulates analog signals and the ADC using the on-line help. In particular, get help for the IO->Analog command. Example programs that apply to this lab include HD44780.rtf, LCD.rtf, tut3.rtf, and tut5.rtf. The ADC will be used to measure oral body temperature. The range is about 90 to 105 °F, and the resolution is 0.1 °F. Real-time data acquisition will use 0.5 ms periodic interrupts. The sampling rate will be 2 kHz. 500 Hz analog noise will be added. A 500 Hz digital reject filter will remove the noise. Software will convert the ADC sample into a decimal fixed-point number, and the result will be displayed on a LCD. The system will implement a digital oral thermometer. The thermistor resistance is nonlinearly related to its temperature in Kelvin.
$R_T = R_0 \exp(\beta/T)$
where R0 is 1.03947E-07 kΩ and β is 5808.1 K for this device. An analog circuit, shown in Figure L11.6a, converts the resistance to voltage, and the ADC converts voltage to digital sample. R1 is 200 kΩ, R2 is 11.1 kΩ, and the gain is 28.6.
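Figure L11.6a Temperature data acquisition circuit.
[Figure: two resistor dividers powered from +5 V, one formed by R1 and the thermistor RT (output V1) and one formed by R1 and R2 (output V2), feeding an AD623 or AD627 instrumentation amplifier with gain resistor RG; the amplifier output V3 goes to the ADC.]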
The overall response is shown in Table L11.6, and plotted in Figure L11.6b. The details of these calculations can be found in the spreadsheet Lab11_06.xls.
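As a check on the calibration, one row can be worked by hand. The sketch below assumes, consistent with the values in Table L11.6, that V1 and V2 are the outputs of simple dividers from +5 V (R1 over RT, and R1 over R2), that V3 is the gain times their difference, and that the 8-bit ADC maps 0 to 5 V onto 0 to 255. All results are rounded.

```latex
\begin{align*}
T &= \frac{98.6 - 32}{1.8} + 273.15 = 310.15\ \mathrm{K} \\
R_T &= R_0 \exp(\beta/T) = 1.03947\times10^{-7}\,\exp\!\left(\frac{5808.1}{310.15}\right) \approx 14.1\ \mathrm{k\Omega} \\
V_1 &= 5\,\frac{R_T}{R_1 + R_T} \approx 0.330\ \mathrm{V}, \qquad
V_2 = 5\,\frac{R_2}{R_1 + R_2} \approx 0.263\ \mathrm{V} \\
V_3 &= 28.6\,(V_1 - V_2) \approx 1.91\ \mathrm{V} \\
\mathrm{ADC} &\approx 255\,\frac{V_3}{5} \approx 97
\end{align*}
```

This agrees with the 1910 mV (ADC 97) that the lab quotes for 98.6 °F.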
Figure L11.6b Temperature calibration.
[Plot: temperature in units of 0.1 °F (vertical axis, 880 to 1080) versus ADC sample (horizontal axis, 0 to 300).]
a) Interface a LCD display (either the simple BCD or the Hitachi HD44780), a switch, and an analog signal to the microcomputer. You should select a slow-varying sine-wave to simulate oral temperature. The peak temperature will be displayed on the LCD. The maximum voltage should be set to 5000 mV (90 °F) and the minimum voltage can be adjusted to change body oral temperature. For example, 1910 mV (ADC 97) represents 98.6 °F. Make the sine-wave period large, e.g., 50000 µsec. In this way, it will take a lot of ADC samples to create an entire cycle. You should add 100 mV of 500 Hz (2000 µsec period) noise.
b) Enter the last two columns of data from Table L11.6 into a constant data structure. Write a subroutine that converts ADC sample to decimal fixed-point temperature using table look up and linear interpolation.

Table L11.6 Temperature calibration.
T (°F)   RT (kΩ)   V1 (V)   V1−V2 (V)   V3 (V)   ADC   Fixed-Point
89.6     19.2      0.438    0.175       5.000    255   896
91.2     18.1      0.416    0.153       4.375    223   912
92.8     17.2      0.395    0.132       3.782    192   928
94.5     16.2      0.375    0.113       3.220    164   945
96.1     15.4      0.357    0.094       2.686    136   961
97.7     14.6      0.339    0.076       2.180    111   977
99.3     13.8      0.322    0.059       1.700    86    993
100.9    13.1      0.306    0.044       1.244    63    1009
102.6    12.4      0.291    0.028       0.813    41    1026
104.2    11.7      0.277    0.014       0.403    20    1042
105.8    11.1      0.263    0.001       0.015    0     1058
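The table lookup with linear interpolation described in part b) might look like the following C sketch. The table contents are the last two columns of Table L11.6, reordered so that the ADC values increase; the names Xtable, Ytable, and Temperature are chosen for illustration, and rounding to the nearest 0.1 °F is a design choice rather than a requirement of the lab.

```c
#include <stdint.h>

#define TABLE_SIZE 11

/* ADC samples (increasing) and the matching temperatures in 0.1 F,
   from the last two columns of Table L11.6. */
static const uint16_t Xtable[TABLE_SIZE] =
  {0, 20, 41, 63, 86, 111, 136, 164, 192, 223, 255};
static const uint16_t Ytable[TABLE_SIZE] =
  {1058, 1042, 1026, 1009, 993, 977, 961, 945, 928, 912, 896};

/* Convert an 8-bit ADC sample to temperature in units of 0.1 F. */
uint16_t Temperature(uint8_t adc) {
  int i;
  int32_t dx, dy, num;
  /* Search for the segment with Xtable[i] <= adc <= Xtable[i+1]. */
  for (i = 0; i < TABLE_SIZE - 2; i++) {
    if (adc < Xtable[i + 1]) break;
  }
  dx = (int32_t)Xtable[i + 1] - Xtable[i];
  dy = (int32_t)Ytable[i + 1] - Ytable[i];   /* negative: Ytable decreases */
  num = dy * ((int32_t)adc - Xtable[i]);     /* zero or negative */
  /* Subtracting dx/2 rounds to nearest because num is never positive. */
  return (uint16_t)(Ytable[i] + (num - dx / 2) / dx);
}
```

An ADC sample of 97 falls between the table entries 86 and 111 and interpolates to 986, that is 98.6 °F.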
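Part c) of this lab uses the filter y(n) = (113·x(n) + 113·x(n−2) − 98·y(n−2))/128, which has zeros at one quarter of the sampling rate (500 Hz when sampling at 2 kHz) and unity gain at DC. A minimal C sketch of that difference equation follows. The names are hypothetical, the two small arrays stand in for the two MACQs, and Filter_Run is only a convenience for trying the filter on a repeating input pattern.

```c
#include <stdint.h>

static int16_t X[3];   /* X[0]=x(n), X[1]=x(n-1), X[2]=x(n-2) */
static int16_t Y[3];   /* Y[0]=y(n), Y[1]=y(n-1), Y[2]=y(n-2) */

/* Clear both histories. */
void Filter_Reset(void) {
  int i;
  for (i = 0; i < 3; i++) { X[i] = 0; Y[i] = 0; }
}

/* Process one sample: y(n) = (113*x(n) + 113*x(n-2) - 98*y(n-2))/128. */
int16_t Filter(int16_t sample) {
  X[2] = X[1]; X[1] = X[0]; X[0] = sample;     /* shift the x history */
  Y[2] = Y[1]; Y[1] = Y[0];                    /* shift the y history */
  Y[0] = (int16_t)((113 * ((int32_t)X[0] + X[2]) - 98 * (int32_t)Y[2]) / 128);
  return Y[0];
}

/* Reset, feed the pattern `repeat` times, and return the last output. */
int16_t Filter_Run(const int16_t pattern[], int length, int repeat) {
  int r, i;
  int16_t out = 0;
  Filter_Reset();
  for (r = 0; r < repeat; r++) {
    for (i = 0; i < length; i++) out = Filter(pattern[i]);
  }
  return out;
}
```

Feeding a constant 100 settles at 100, a pure quarter-rate tone (100, 0, −100, 0, ...) decays to 0, and a constant 100 with a quarter-rate tone of amplitude 50 added settles back at 100.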
c) Write software that samples at 2 kHz, implements the following digital filter, and stores the digital filter output into a FIFO queue. You will need two MACQs, one for x(n) and one for y(n).
y(n) = (113*x(n) + 113*x(n−2) − 98*y(n−2))/128
Observation: If the sampling rate were to be 240 Hz, this filter rejects 60 Hz.
d) Write a main program that waits for the switch to be pressed. Once the switch is pressed, the interrupts are armed, data is removed from the FIFO, converted to decimal fixed point and the maximum temperature is displayed on the LCD. If the switch is released, then the interrupts are disarmed, the display is blanked, and the software waits for the switch to be pressed again.
Observation: If you change the sampling rate, change the simulated noise period to be four times the ADC sample period.

Lab 11.7 AC Voltmeter
Purpose: In this lab you will learn how to convert, subtract, and display decimal fixed-point numbers. You will pass parameters on the stack. The analog to digital converter will be used to convert an analog signal into digital form.
Description: In preparation for this assignment, review fixed-point, passing/returning parameters on the stack, and ADC converters. Look up how TExaS simulates analog signals and the ADC using the on-line help. In particular, get help for the IO->Analog command. Run and analyze the TUT3.rtf example program. Five possible analog waveforms can be connected to the ADC. Your objective is to measure the analog signal 100 times (see footnote 8) and convert each ADC sample to fixed-point voltage. During the 100-sample sequence establish the minimum (min) and maximum (max). At the end of the cycle, you will measure the AC amplitude (max − min) and display it on the LCD display. Activate the appropriate decimal point in the LCD and add appropriate label and units. Your software should reinitialize variables and continuously repeat the 100-sample cycle.
a) Create an I/O file and attach an analog waveform and a LCD display.
You will adjust the number of samples (e.g., 100) in your main program loop, and the period of the analog wave so that about 2 to 5 waveform periods occur in each software loop. You will adjust the minimum, maximum, and noise level during the testing steps of this problem (part f).
b) Write a general purpose ADC sampling subroutine that accepts an 8-bit call by value input parameter, channel number (0 to 7), and returns an 8-bit unsigned binary digital result from the ADC. Both parameters will be on the stack. Typical calling sequences are shown below. Use binding (equ) to make the subroutine more readable.
***** ZZ=A2D(4); *****
      ldab #4
      pshb              ; push chan on the stack
      des               ; allocate space for result
      bsr  A2D
      pulb              ; get result
      stab ZZ
      ins               ; discard input
***** XX=A2D(chan); *****
      movb chan,1,-sp   ; push chan on the stack
      leas -1,sp        ; allocate space for result
      jsr  A2D
      movb 1,sp+,XX     ; get result
      leas 1,sp         ; discard input
c) Write an 8-bit unsigned binary to 16-bit unsigned decimal fixed-point conversion subroutine. The fixed-point constant is 0.01 V. You may choose to pass parameters any way you wish, but please document in the comments how parameters are passed. In the table below, the ADC converts from the first to second column. This subroutine converts the 8-bit unsigned number shown in the second column to the value shown in the third column.
8. You can adjust this number to make sure you are observing at least two full periods of the waveform.
Analog Voltage   ADC Output   16-bit Unsigned Decimal Fixed Point   LCD Display
0.0 V            0            0                                     0.00
0.02 V           1            2                                     0.02
1.25 V           64           125                                   1.25
2.50 V           128          250                                   2.50
4.98 V           255          498                                   4.98
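The two conversions in the table above (second to third column, and third to fourth column) can be sketched in C. The scale factor 500/256 is an assumption inferred from the table, where ADC 255 corresponds to 4.98 V; the function names and the "*.**" overflow marker are illustrative only.

```c
#include <stdint.h>

/* 8-bit ADC sample to fixed-point voltage with a resolution of 0.01 V.
   Assumes full scale is 5.00 V over 256 steps; adding 128 rounds to nearest. */
uint16_t ADC_ToFixed(uint8_t adc) {
  return (uint16_t)(((uint32_t)adc * 500u + 128u) / 256u);
}

/* Format a fixed-point value 0 to 999 as "d.dd" into out (5 bytes).
   Values that do not fit are shown as "*.**". Returns out. */
char *Fixed_ToString(uint16_t n, char out[5]) {
  if (n > 999) {
    out[0] = '*'; out[1] = '.'; out[2] = '*'; out[3] = '*';
  } else {
    out[0] = (char)('0' + n / 100);         /* ones digit of volts */
    out[1] = '.';
    out[2] = (char)('0' + (n / 10) % 10);   /* tenths */
    out[3] = (char)('0' + n % 10);          /* hundredths */
  }
  out[4] = 0;
  return out;
}
```

With these assumptions every row of the table is reproduced, for example ADC 64 gives 125, which is displayed as 1.25.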
d) Write a subroutine that takes a 16-bit unsigned decimal fixed-point number (fixed constant is 0.01 V), and displays it on the LCD display. You may choose to pass parameters any way you wish, but please document in the comments how parameters are passed. In the above table, this subroutine converts the 16-bit integer shown in the third column to the LCD pattern shown in the fourth column.
e) Write the main program that calls the above three subroutines and performs the AC voltmeter measurements, updating the LCD at the end of each cycle.
f) Using the information entered in the IO->Analog command as truth, collect the following measurements. At the time of checkout be prepared to discuss why the last three measurements had more error than the first.
Waveform   Noise            True Max (volts)   True Min (volts)   True AC (volts)   Measured AC (volts)   Percent Error (%) 100•(true−measured)/true
Sine       None             4.000              1.000              3.000
Sine       None             1.005              1.000              0.005
EKG        None             4.000              1.000              3.000
Sine       100 mV, 500 µs   4.000              1.000              3.000
g) In addition to the operations described in part f, extend the main program to also measure the period of a sine wave in µsec. Display the AC amplitude on the LCD display and display the period by simply writing it to a global variable, and observing it in the ViewBox. To implement hysteresis, we define two thresholds at 25 percent and 75 percent (see footnote 9) depending on min and max:
High = (3*(max − min))/4 + min
Low = (max − min)/4 + min
First wait for the signal to go below low,
then wait for the signal to go above high,    first = TCNT;
then wait for the signal to go below low,
then wait for the signal to go above high,    period = (TCNT-first)/8; first = TCNT;
then wait for the signal to go below low,
then wait for the signal to go above high,    period = (TCNT-first)/8; first = TCNT;
Record the TCNT each time the signal goes above high, and the period is the 16-bit unsigned difference between TCNT measurements. After subtracting, divide by 8 to get the answer in µsec. Discuss with the TA at the time of checkout what would happen if you divided first, then subtracted, and why two thresholds were used instead of one. In this first example, TCNT is $1000 the first time the signal goes above high, and $7000 the second time. The period is ($7000−$1000)/8 or 3072 µsec.
9. You may adjust these percentages so that you get only one trigger each period.
[Figure: sine wave with the Max, High, Low, and Min levels marked; TCNT = $1000 and TCNT = $7000 at two successive crossings above High, one period apart.]
This second example illustrates that the system works even if the TCNT rolls over. In this example, the period is now 3584 µsec, but TCNT is $F000 the first time the signal goes above high, and $6000 the second time. The period is ($6000−$F000)/8 or 3584 µsec. This works because both TCNT and the subtraction are unsigned 16-bit values.
[Figure: sine wave with the Max, High, Low, and Min levels marked; TCNT = $F000 and TCNT = $6000 at two successive crossings above High, one period apart.]
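The two worked examples and the two-threshold trigger can be checked with a short C sketch. The function names are hypothetical, and the divide by 8 assumes, as in the text, that TCNT counts 8 times per µsec.

```c
#include <stdint.h>

/* Period in usec from two TCNT readings. The cast back to uint16_t makes
   the subtraction wrap modulo 65536, so a TCNT rollover is harmless. */
uint16_t Period_us(uint16_t first, uint16_t second) {
  return (uint16_t)((uint16_t)(second - first) / 8u);
}

/* Thresholds at 75 percent and 25 percent of the span between min and max. */
uint8_t Threshold_High(uint8_t min, uint8_t max) {
  return (uint8_t)((3u * (uint16_t)(max - min)) / 4u + min);
}
uint8_t Threshold_Low(uint8_t min, uint8_t max) {
  return (uint8_t)((uint16_t)(max - min) / 4u + min);
}

/* Count triggers in a block of samples: a trigger is a sample above high
   that follows a sample below low (hysteresis). */
int CountTriggers(const uint8_t s[], int n, uint8_t low, uint8_t high) {
  int i, armed = 0, count = 0;
  for (i = 0; i < n; i++) {
    if (!armed && s[i] < low) {
      armed = 1;                 /* signal has gone below low */
    } else if (armed && s[i] > high) {
      count++;                   /* signal has gone above high */
      armed = 0;
    }
  }
  return count;
}
```

Because the signal must fall below low before another trigger is accepted, noise near a single level cannot produce extra triggers within one period.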
Lab 11.7 Microcomputer-Based Motor Controller Purpose: The objective of this lab is to study analog to digital conversion, digital to analog conversion, real-time digital control, and LCD display. Description: The objective of this problem is to design a microcomputer-based motor controller. The desired rotation speed x* will be selected interactively by the operator typing on keyboard, either the matrix keyboard or the SCI interface. The output information will be displayed on either a LCD or the SCI interface. You will power the motor with an analog signal from the DAC. You will estimate the motor speed (x’) by measuring the tachometer voltage using the ADC. Your first control software will implement an incremental control algorithm. Your second system will implement a proportional/integrator (PI) control system. The goal of the control software is to maintain the motor speed as close to x* as possible. You will implement two control systems. The first one will be a simple incremental controller. Let u be the DAC output controlling the motor. The power to the motor is directly related to the DAC value. The value is increased by a fixed amount if it is spinning too slowly, and decreased by a fixed amount if it is spinning too fast. The incremental control algorithm executes the following at a regular rate:
u = min(255, u+1)    if x* > x’ (too slow)
or
u = max(0, u−1)    if x* < x’ (too fast)
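One step of the incremental algorithm can be written as a small C function. The name and parameter types are chosen for illustration; the function would be called at a regular rate with the latest speed estimate.

```c
#include <stdint.h>

/* One step of incremental control. u is the current DAC output,
   xstar the desired speed, xprime the measured speed (same units). */
uint8_t Incremental_Step(uint8_t u, uint16_t xstar, uint16_t xprime) {
  if (xstar > xprime) {          /* too slow: u = min(255, u+1) */
    if (u < 255) u++;
  } else if (xstar < xprime) {   /* too fast: u = max(0, u-1) */
    if (u > 0) u--;
  }
  return u;                      /* unchanged when x* equals x' */
}
```

Since the output moves by only one count per step, a change across the full 0 to 255 range takes 255 steps.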
The min and max operations maintain the DAC output within the valid range of 0 to 255. The disadvantage of incremental control is that it has a very slow response. The second system you will implement will be a PI controller. We can use linear control theory to develop the digital controller, see Figure L11.7. We will define the tachometer voltage as the actual motor speed. This speed will be measured with the ADC. Any error in the state estimator will lead to a nonremovable controller error. Just like the data acquisition and digital filter situations, t is the continuous time and n is the discrete time. We will assume the controller is executed at a fixed interval, Δt.
Figure L11.7 Block diagram of a linear control system in the frequency domain.
[Figure: x* and x’(n) are compared to form e(n); the PI controller (kP + kI/s) produces u(n); the actuator (gain c) produces p(t); the DC motor (m/(1 + sτ)) produces f(t); the state estimator (gain 1) feeds x’(n) back to the comparison.]
Theoretically we can choose controller constants, kP and kI, to create the desired controller response. Unfortunately it can be difficult to estimate c, m and τ. If a load is applied to the motor, then m and τ will change. In addition, most motors do not follow a simple single pole relationship. The basic approach is presented in the following equations. Let x(n) be the current tachometer voltage represented as a fixed-point number. Let x* be the desired tachometer voltage also represented as a fixed-point number.
The error is
e(n) = x* − x(n)
The proportional term is
up(n) = (kp*e(n))/100
The integral term is
ui(n) = ui(n−1) + (ki*e(n))/100
if (ui(n) < −50) then ui(n) = −50 (called anti-reset-windup)
if (ui(n) > 50) then ui(n) = 50
The DAC output is the combination of
u(n) = up(n) + ui(n)
if (u(n) > 255) then u(n) = 255
if (u(n) < 0) then u(n) = 0
All calculations are performed as 16-bit signed integers. A simple empirical method can be used to determine the controller constants. This empirical approach starts with just a proportional term (kp). This proportional controller will generate a smooth motor speed (actuator output achieves a constant value), but the speed will not be correct. Try different kp constants until the response times are fast enough. The response time is the delay after x* is changed for the motor to reach a new constant speed. kp is too big if the actuator saturates both at the maximum and minimum after x* is changed. Steady state controller accuracy is defined as the average difference between x* and x. The next step is to add some integral term (ki) a little at a time to improve the steady state controller accuracy without adversely affecting the response time. Don’t change both kp and ki at once. Rather, you should vary them one at a time. Overshoot is defined as the maximum positive error that occurs when x* is increased. Similarly, undershoot is defined as the maximum negative error that occurs when x* is decreased.
If the response time, overshoot, undershoot and accuracy are within acceptable limits, then a PI controller is adequate.
The foreground (main) process:
   Initializes I/O ports and data structures
   Explanations of the various interpreter commands
   Maintain a display of x(n)
The interpreter process (using interrupting keyboard or interrupting SCI):
   Can specify the desired motor speed, 0 ≤ x* ≤ 5000 mV with a resolution of 1 mV
The digital controller (periodic interrupt) process:
   The controller executes every 1 ms.
   Implement a PI control system
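The PI equations given earlier in this lab translate into C roughly as follows. This is a sketch under stated assumptions: the names are hypothetical, the previous integral term is passed in and returned so that the function has no hidden state, and the products use 32-bit intermediates to avoid overflow, which departs from the lab's all 16-bit arithmetic.

```c
#include <stdint.h>

typedef struct {
  int16_t ui;   /* integral term to carry into the next step */
  uint8_t u;    /* value to send to the DAC, 0 to 255 */
} PI_Result_t;

/* One controller step, executed at a fixed interval.
   uiPrev is ui(n-1); xstar and x are fixed-point tachometer voltages. */
PI_Result_t PI_Step(int16_t uiPrev, int16_t xstar, int16_t x,
                    int16_t kp, int16_t ki) {
  PI_Result_t r;
  int32_t e  = (int32_t)xstar - x;          /* e(n) = x* - x(n) */
  int32_t up = (kp * e) / 100;              /* proportional term */
  int32_t ui = uiPrev + (ki * e) / 100;     /* integral term */
  int32_t u;
  if (ui < -50) ui = -50;                   /* anti-reset-windup */
  if (ui >  50) ui =  50;
  u = up + ui;
  if (u > 255) u = 255;                     /* keep within DAC range */
  if (u < 0)   u = 0;
  r.ui = (int16_t)ui;
  r.u  = (uint8_t)u;
  return r;
}
```

The periodic interrupt handler would keep the returned ui in a global variable and pass it back in on the next interrupt.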
12 Communication Systems
Chapter 12 objectives are to:
• Present a general model for data flow problems
• Develop implementations for the first in first out queue
• Discuss methods to support interthread communication
• Design simple networks based on the SCI port
• Introduce the controller area network (CAN), and use it to connect 9S12’s together
• Present the I2C protocol, and use it to interface peripheral devices
The goal of this chapter is to provide a brief introduction to communication systems. Communication theory is a richly developed discipline, and much of it is beyond the scope of this book. Nevertheless, the trend in embedded systems is to employ multiple intelligent devices, therefore the interconnection will be a strategic factor in the performance of the system. A variety of different manufacturers are involved in the development of these devices, thus the interconnection network must be flexible, robust, and reliable. Because the emphasis of this book is on real-time embedded systems, this chapter focuses on implementing communication systems appropriate for embedded systems. The components of an embedded system typically combine to solve a common objective, thus the nodes on the communication network will cooperate towards that shared goal. In particular, requirements of an embedded system, in general, involve relatively low to moderate bandwidth, static configuration, and a low probability of corrupted data. On the other hand, reliability and latency are important for real-time systems.
12.1 Introduction
In Chapter 8, we presented the hardware and software interfaces for the SCI channel. At that time we connected the 9S12 to an I/O device, and used the SCI channel to input/output data to the human. In this chapter, we will build on those ideas and introduce the concepts of networks by investigating a couple of simple networks. In particular, we will use the SCI channel to connect multiple 9S12's together, creating a network. A communication network includes both the physical channel (hardware) and the logical procedures (software) that allow users or software processes to communicate with each other. The network provides the transfer of information as well as the mechanisms for process synchronization. It is convenient to visualize the network in a hierarchical fashion, as shown in Figure 12.1. At the lowest level, frames are transferred between I/O ports of the two (or more) computers along the physical link or hardware channel. Error detection and correction may be
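Figure 12.1 A layered approach to communication systems. [Diagram: at the top level, tasks on Computer 1 and Computer 2 communicate with each other; at the middle level, OS1 and OS2 exchange messages; at the lowest level, the I/O ports of the two computers exchange frames over the physical link.]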
handled at this low level. At the next logical level, the operating system (OS) of one computer sends messages or packets to the OS on the other computer. The message protocol will specify the types and formats of these messages. Error detection and correction may also be handled at this level. Messages typically contain four fields:
1. Address information field
   Physical address specifying the destination/source computers
   Logical address specifying the destination/source processes (e.g., users)
2. Synchronization or handshake field
   Physical synchronization like shared clock, start and stop bits
   OS synchronization like request connection or acknowledge
   Process synchronization like semaphores
3. Data field
   ASCII text (raw or compressed)
   Binary (raw or compressed)
4. Error detection and correction field
   Vertical and horizontal parity
   Checksum
   Block correction codes (BCC)
Observation: Communication systems often specify bandwidth in total bits/sec, but the important parameter is the data transfer rate.
Observation: Often the bandwidth is limited by the software and not the hardware channel.
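The four message fields listed above can be sketched as a C structure. This is only an illustration: the type name Message, the member names, the 8-byte payload length, and the simple additive checksum are all assumptions made for this example and are not a format defined in this chapter.

```c
#include <stdint.h>

#define DATA_SIZE 8  /* hypothetical payload length, chosen for illustration */

/* Hypothetical message layout: one group of members per field listed above */
typedef struct {
  uint8_t dest;             /* 1. address field: destination computer */
  uint8_t source;           /* 1. address field: source computer */
  uint8_t type;             /* 2. handshake field: e.g., data or acknowledge */
  uint8_t data[DATA_SIZE];  /* 3. data field */
  uint8_t checksum;         /* 4. error detection field */
} Message;

/* 8-bit additive checksum over every byte except the checksum itself */
uint8_t Message_Checksum(const Message *m){
  uint8_t sum = (uint8_t)(m->dest + m->source + m->type);
  for(int i = 0; i < DATA_SIZE; i++){
    sum = (uint8_t)(sum + m->data[i]);
  }
  return sum;
}

/* Returns 1 if the stored checksum matches the message contents, 0 otherwise */
int Message_IsValid(const Message *m){
  return Message_Checksum(m) == m->checksum;
}
```

The sender would fill in the checksum member with Message_Checksum just before transmission, and the receiver would call Message_IsValid before accepting the data. An additive checksum detects any single-bit error, but it can miss errors that cancel each other out.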
At the highest level, we consider communication between users or high-level software tasks. Many embedded systems require the communication of command or data information to other modules at either a near or a remote location. Because the focus of this book is embedded systems, we will limit our discussion to communication with devices within the same room. A full duplex channel allows data to transfer in both directions at the same time. In a half duplex system, data can transfer in both directions but only in one direction at a time. Half duplex is popular because it is less expensive (2 wires) and allows the addition of more devices on the channel without change to the existing nodes.
12.2 Reentrant Programming and Critical Sections
In general, if two threads access the same global memory and one of the accesses is a write, then there is a causal dependency between the threads. This means the execution order may affect the outcome. Shared global variables are very important in multithreaded systems because they are required to pass data between threads, but they are complicated, and hard-to-find bugs can result from their use.
A program segment is reentrant if it can be concurrently executed by two (or more) threads. To implement reentrant software, we place variables in registers or on the stack, and avoid storing into global memory variables. When writing in assembly, we use registers or the stack for parameter passing to create reentrant subroutines. Typically each thread will have its own set of registers and stack. A nonreentrant subroutine will have a section of code called a vulnerable window or critical section. An error occurs if
1. One thread calls the nonreentrant subroutine
2. That thread is executing in the critical section when it is interrupted by a second thread
3. The second thread calls the same subroutine
There are a number of scenarios that can happen next. In the most common scenario, the second thread is allowed to complete the execution of the subroutine, control is then returned to the first thread, and the first thread finishes the subroutine. This first scenario is the usual case with interrupt programming. In the second scenario, the second thread executes part of the subroutine and is interrupted, the subroutine is reentered by a third thread, the third thread finishes, control is returned to the second thread and it finishes, and lastly control is returned to the first thread and it finishes. This second scenario can happen in interrupt programming if interrupts are reenabled during the execution of the ISR.
A critical section may exist when two different subroutines access and modify the same memory-resident data structure. Program 12.1 shows an assembly and a C function that are nonreentrant because they use a global variable, num. The assembly language program accepts two 16-bit signed integers in Registers D and X and returns the average in Register D. These functions could have been made reentrant by implementing num as a local variable or in a register, but the purpose of the example is to illustrate what can go wrong when a nonreentrant function is reentered.
Program 12.1 This function is nonreentrant because of the read-modify-write access to a global.
num  rmb  2
Ave  stx  num
     addd num
     asrd
     rts

short num;
short Ave(short first, short second){
  num = first;
  return (num+second)/2;
}
Checkpoint 12.1: Rewrite Program 12.1 so that it is reentrant.
A critical section exists between the stx and the addd instructions. Assume there are two concurrent threads (the main program and a background ISR) that both call this subroutine. Concurrent means that both threads are ready to run. Because there is only one computer, exactly one thread will be running at a time. Typically, the operating system switches execution control back and forth using interrupts. For example, the main program might be executing when an interrupt causes the computer to switch over and execute the ISR. When the ISR is done it executes an rti and the control returns back to the main program. An error occurs if:
1. The main program calls Ave
2. The main program executes the stx instruction saving its second number in num
3. The OS halts the main program (using an interrupt) and starts the interrupt service routine
4. The ISR calls Ave
   The ISR executes the stx saving its second number in num
   The ISR finishes Ave
5. The OS returns control back to the main program
6. The main program executes the addd instruction but gets the wrong num
An atomic operation is one that once started is guaranteed to finish. In most computers, once an instruction has begun, the instruction must be finished before the computer can
process an interrupt. Therefore, the following read-modify-write sequence is atomic because it cannot be halted in the middle of its operation.
    inc counter   ;where counter is an 8-bit global variable
On the other hand, this read-modify-write sequence is not atomic because it can start, then be interrupted.
    ldx counter   ;where counter is a 16-bit global variable
    inx
    stx counter
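The effect of interrupting the three-instruction sequence can be reproduced on any computer by simulating the interrupt with an ordinary function call. The sketch below is not 9S12 code: the names Isr_Increment and Foreground_Increment and the interrupt_midway parameter are hypothetical, introduced only to place the "interrupt" at a chosen point between the read and the write.

```c
#include <stdint.h>

uint16_t counter;   /* shared 16-bit global, as in the sequence above */

/* Stand-in for an ISR that also increments the shared counter */
void Isr_Increment(void){
  counter = counter + 1;
}

/* Foreground increment split into its three steps. If interrupt_midway
   is nonzero, the simulated ISR runs between the read and the write. */
void Foreground_Increment(int interrupt_midway){
  uint16_t copy = counter;   /* ldx counter : read into a private copy */
  copy = copy + 1;           /* inx         : modify the copy */
  if(interrupt_midway){
    Isr_Increment();         /* simulated interrupt arrives here */
  }
  counter = copy;            /* stx counter : write back, discarding the ISR update */
}
```

Starting from counter equal to 0, one foreground increment followed by one ISR increment leaves 2. If the ISR instead runs inside the window, the final value is 1 because the foreground writes back a stale copy and the ISR's increment is lost.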
In general, nonreentrant code can be grouped into three categories all involving 1) nonatomic sequences, 2) writes and 3) global variables. We will classify I/O ports as global variables for the consideration of critical sections. We will group registers into the same category as local variables because each thread will have its own registers and stack.
The first group is the read-modify-write sequence:
1. The software reads the global variable producing a copy of the data
2. The software modifies the copy (at this point the original variable is still unmodified)
3. The software writes the modification back into the global variable.
In the second group, we have a write followed by read, where the global variable is used for temporary storage:
1. The software writes to the global variable (this becomes the only copy of the information)
2. The software reads from the global variable expecting the original data to still be there.
In the third group, we have a non-atomic multi-step write to a global variable:
1. The software writes part of the new value to a global variable
2. The software writes the rest of the new value to a global variable.
Observation: When considering reentrant software and vulnerable windows we classify accesses to I/O ports the same as accesses to global variables.
Observation: Any variable larger than 16 bits on the 9S12 will require at least two instructions to read or write it.
Observation: Sometimes we store temporary information in global variables out of laziness. This practice is to be discouraged because it wastes memory and may cause the module to not be reentrant.
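Program 12.1 is an instance of the second group (write followed by read). A minimal sketch of its failure, assuming the interrupt can be modeled as a plain function call placed inside the critical section, is shown below. The helper names Ave_Part1, Ave_Part2, and Main_Interrupted, and the particular numbers, are hypothetical and exist only for this illustration.

```c
short num;   /* global used as temporary storage, as in Program 12.1 */

/* Program 12.1 split at its critical section */
void Ave_Part1(short first){           /* the write: stx num */
  num = first;
}
short Ave_Part2(short second){         /* the read: addd num, then divide by 2 */
  return (short)((num + second)/2);
}
short Ave(short first, short second){  /* uninterrupted call */
  Ave_Part1(first);
  return Ave_Part2(second);
}

/* The main program wants Ave(10,20), but a simulated ISR calls
   Ave(100,200) after the write and before the read. */
short Main_Interrupted(void){
  Ave_Part1(10);           /* main program writes num = 10 */
  (void)Ave(100, 200);     /* simulated ISR overwrites num with 100 */
  return Ave_Part2(20);    /* main program reads the wrong num */
}
```

The uninterrupted call Ave(10,20) returns 15, whereas Main_Interrupted returns 60, because the read finds the ISR's value in num. The ISR's own result is correct; only the interrupted thread is corrupted.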
Sometimes we can have a critical section between two different software functions (one function called by one thread, and another function called by a different thread). In addition to the above three cases, a non-atomic multi-step read will be critical when paired with a multi-step write. For example, assume a data structure has multiple components (on the 9S12, any variable larger than 16 bits is considered to have multiple components). In this case, the write to the data structure will be atomic because it occurs in an ISR, running with I = 1. The critical section exists in the foreground between steps 1 and 2. In this case, a critical section exists even though no software has actually been reentered.
Foreground Thread                              Background Thread
1. The software reads some of the data         1. The ISR writes to the data structure
2. The software reads the rest of the data
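The multi-step read paired with an ISR write can be illustrated with a 32-bit value held as two 16-bit halves, which is how the 9S12 must access it. In this sketch the type Time32, the variable Time, and the functions Isr_Tick and Foreground_Read are hypothetical names, and the interrupt is again simulated by a function call between the two foreground steps.

```c
#include <stdint.h>

/* A 32-bit value stored as two 16-bit components */
typedef struct {
  uint16_t high;
  uint16_t low;
} Time32;

Time32 Time = {0x0000, 0xFFFF};

/* Simulated ISR: adds 1 to the 32-bit value. On the real machine this
   runs with I = 1, so the foreground never sees it half done. */
void Isr_Tick(void){
  Time.low++;
  if(Time.low == 0){
    Time.high++;           /* carry into the upper half */
  }
}

/* Foreground two-step read. If interrupt_midway is nonzero, the
   simulated ISR runs between step 1 and step 2. */
uint32_t Foreground_Read(int interrupt_midway){
  uint16_t h = Time.high;  /* step 1: read some of the data */
  if(interrupt_midway){
    Isr_Tick();            /* simulated interrupt inside the critical section */
  }
  uint16_t l = Time.low;   /* step 2: read the rest of the data */
  return ((uint32_t)h << 16) | l;
}
```

With Time equal to 0x0000FFFF, an interrupted read returns 0x00000000, which is neither the value before the tick (0x0000FFFF) nor the value after it (0x00010000). No function was reentered, yet the foreground obtained corrupted data.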
In a similar case, a non-atomic multi-step write will be critical when paired with a multi-step read. Again, assume a data structure has multiple components. In this case, the read from the data structure will be atomic because it occurs in an ISR, running with I = 1. The critical section exists in the foreground between steps 1 and 2.
Foreground Thread                              Background Thread
1. The software writes some of the data        1. The ISR reads from the data structure
2. The software writes the rest of the data
When multiple threads are active, it is possible for two threads to be executing the same program. For example, the system may be running in the foreground and call Func. Partway through the execution of Func, an interrupt occurs. If the ISR also calls Func, two threads are simultaneously executing the function. To experimentally determine if a function has been reentered, we could use two output pins. We increment the port at the start and decrement it at the end. The function has been reentered if the port value goes above 1, as shown in Program 12.2. In this example, Port T is not part of the original code, but rather used just for the purpose of debugging. PT0 is 1 when one thread starts executing the function. However, if PT1 becomes 1, then the function has been reentered.
Program 12.2 Detection of re-entrant behavior using two output bits.
; subroutine to be tested
Func inc PTT
     ; the function
     dec PTT
     rts

// function to be tested
void Func(void){
  PTT++;
  // the function
  PTT--;
}
Checkpoint 12.2: What does it mean if both PT1 and PT0 are 1?
If critical sections do exist, we can either eliminate them by removing the access to the global variable or implement mutual exclusion, which simply means only one thread at a time is allowed to execute in the critical section. In general, if we can eliminate the global variables, then the subroutine becomes reentrant. Without global variables there are no "vulnerable" windows because each thread has its own registers and stack. Sometimes one must access global memory to implement the desired function. Remember that all I/O ports are considered global. Furthermore, global variables are necessary to pass data between threads. A simple way to implement mutual exclusion is to disable interrupts while executing the critical section. It is important to disable interrupts for as short a time as possible, so as to minimize the effect on the dynamic performance of the other threads. While we are running with interrupts disabled, time-critical events like power failure and danger warnings cannot be processed. Notice also that the interrupts are not simply disabled then enabled. Before the critical section, the interrupt status is saved, and the interrupts disabled. After the critical section, the interrupt status is restored. You cannot save the interrupt status in a global variable; rather, you should save it either on the stack or in a register. In assembly, we can use the following skeleton to implement mutual exclusion and eliminate the critical section:
    pshc   ;save CCR
    sei    ;disable interrupts
    ;execute the critical section
    pulc   ;restore I bit to its original value
In C, we can use a similar approach to implement mutual exclusion and eliminate the critical section:
void function(void){
  char saveCCR;
  asm tpa            // previous interrupt enable
  asm staa saveCCR   // save previous
  asm sei            // make atomic
  // execute the critical section
  asm ldaa saveCCR   // recall previous
  asm tap            // end critical section
}
Checkpoint 12.3: Consider the situation of nested critical sections. For example, a function with a critical section calls another function that also has a critical section. What would happen if you simply added a sei at the beginning and a cli at the end of each critical section?
Reentrant programming is very important when writing high-level language software too. Obviously, we minimize the use of global variables. But when global variables are necessary, we must be able to recognize potential sources of bugs due to nonreentrant code. We must study the assembly language output produced by the compiler. For example, we cannot determine whether the following read-modify-write operation is reentrant without knowing if it is atomic:
    time++;
If the compiler generates the following object code, then time++; is atomic (therefore not critical)
    inc time
If the compiler generates the following object code, then time++; is not atomic (therefore critical)
    ldd  time
    addd #1
    std  time
Observation: A good compiler generates atomic code when setting or clearing individual bits in the I/O ports. E.g., PTT &= ~0x40; and PTH |= 0x20; will be compiled as bclr PTT,#$40 and bset PTH,#$20.
Another category of timing-dependent bugs, similar to critical sections, is called a race condition. A race condition occurs in a multi-threaded environment when there is a causal dependency between two or more threads. In other words, different behavior occurs depending on the order of execution of two threads. In this first example, thread1 initializes DDRT = 0xF0 because it uses PT7 to PT4 as outputs, and thread2 initializes DDRT = 0x0F because it uses PT3 to PT0 as outputs. In particular, if thread1 initializes first and thread2 initializes second, then PT7 to PT4 will be set to inputs. Conversely, if thread2 initializes first and thread1 initializes second, then PT3 to PT0 will be set to inputs. In a second example, assume two threads are trying to get data from the same input device. When data arrives at the input, the thread that executes first will capture the data.
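One common way to remove the DDRT race in the first example is for each thread to set only the direction bits it owns, leaving the other bits unchanged. The text above does not prescribe this fix, so the sketch below is an assumption offered for illustration. DDRT_sim is an ordinary variable standing in for the real DDRT register, and all four function names are hypothetical.

```c
#include <stdint.h>

uint8_t DDRT_sim;   /* ordinary variable standing in for the DDRT register */

/* Initializations as described in the text: each write affects all 8 bits */
void Thread1_InitUnfriendly(void){ DDRT_sim = 0xF0; }
void Thread2_InitUnfriendly(void){ DDRT_sim = 0x0F; }

/* Alternative: each thread sets only its own bits (assumed fix) */
void Thread1_InitFriendly(void){ DDRT_sim |= 0xF0; }  /* PT7 to PT4 outputs */
void Thread2_InitFriendly(void){ DDRT_sim |= 0x0F; }  /* PT3 to PT0 outputs */
```

With the first pair the final value depends on the order (0x0F or 0xF0); with the second pair both orders give 0xFF. Note that the |= operation is itself a read-modify-write, so on real hardware it is only safe if the compiler emits a single bset instruction, or if the initialization runs before the other thread can interrupt it.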
12.3 Interthread Communication and Synchronization
For regular function calls we use the registers and stack to pass parameters, but interrupt threads have logically separate registers and stack. In particular, all registers are automatically saved by the microcomputer as it switches from main program (foreground thread) to interrupt service routine (background thread). The rti instruction will restore the registers
(including the interrupt enable bits and the PC) back to their previous values. Thus, all parameter passing must occur through shared global memory. One cannot pass data from the main program to the interrupt service routine using registers or the stack. The classic producer/consumer problem has two threads. One thread produces data and the other consumes data. For an input device, the background thread is the producer because it generates new data, and the foreground thread is the consumer because it uses the data up. For an output device, the data flows in the other direction so the producer/consumer roles are reversed.
12.3.1 Mailbox
One simple interthread communication scheme is the mailbox. Figure 12.2 illustrates an input device interfaced using interrupt synchronization. The big arrow in this figure signifies the communication/synchronization link between the background and foreground. The mailbox structure is implemented with two global variables. RxMail contains data, and RxStatus is a flag specifying whether the mailbox is full or empty. The interrupt is requested when its trigger flag is set, signifying new data is ready from the input device. The ISR will read the data from the input device and store it in the global variable RxMail, then update its status as full. The main program will perform other calculations, while occasionally checking the status of the mailbox. When the mailbox has data, the main program will process it. This approach is adequate for situations where the input bandwidth is slow compared to the software processing speed. It is also possible to process the data within the ISR itself, and just report the results of the processing to the main program using the mailbox.
Figure 12.2 A mailbox can be used to pass data between threads. [Flowchart: the main program performs other calculations and checks RxStatus; if it is empty the main program continues, and if it is full the main program processes RxMail and sets RxStatus=empty. The ISR reads data from the input, sets RxMail=data and RxStatus=full, then executes rti.]
One way to visualize the interrupt synchronization is to draw a state versus time plot of the activities of the hardware, the mailbox, and the two software threads. Figure 12.3 shows that at time (a) the mailbox is empty, the input device is idle and the main program is performing other tasks, because the mailbox is empty. When new input data is ready, the trigger flag will be set and an interrupt will be requested. At time (b) the ISR reads data from the input device and saves it in RxMail, then it sets RxStatus to full. At time (c) the main program recognizes RxStatus is full. At time (d) the main program processes data from RxMail, and sets RxStatus to empty. Notice that even though there are two threads, only one is active at a time. The interrupt hardware switches the processor from the main program to the ISR, and the rti instruction switches the processor back.
Figure 12.3 Hardware/software timing of an input interface using a mailbox. [Timing diagram: traces for the input device (trigger set), the interrupt service routine (b, ending with rti), the main program (a, c, d), and RxStatus (alternating between empty and full).]
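The mailbox described above might be sketched in C as follows. RxMail and RxStatus are the two globals named in the text. Everything else is an assumption for illustration: the names MAIL_EMPTY, MAIL_FULL, Rx_Handler, Mailbox_Check, and LostCount are hypothetical, and the data parameter of Rx_Handler stands in for reading the input device register inside a real ISR.

```c
#include <stdint.h>

#define MAIL_EMPTY 0
#define MAIL_FULL  1

volatile uint8_t RxMail;                  /* data passed from ISR to main */
volatile uint8_t RxStatus = MAIL_EMPTY;   /* flag: mailbox full or empty */
uint16_t LostCount;   /* hypothetical debugging counter, not in the text */

/* Body of the input ISR; data stands in for reading the input device */
void Rx_Handler(uint8_t data){
  if(RxStatus == MAIL_FULL){
    LostCount++;          /* previous data was never processed and is overwritten */
  }
  RxMail = data;          /* time (b): save the data */
  RxStatus = MAIL_FULL;   /* then mark the mailbox full */
}

/* Called occasionally by the main program.
   Returns 1 and stores the data if mail was waiting, 0 if empty. */
int Mailbox_Check(uint8_t *datapt){
  if(RxStatus == MAIL_EMPTY){
    return 0;             /* time (a): nothing to do */
  }
  *datapt = RxMail;       /* time (d): take the data */
  RxStatus = MAIL_EMPTY;
  return 1;
}
```

LostCount makes visible the limitation stated in the text: if a second input arrives before the main program has emptied the mailbox, the first value is overwritten. A nonzero count during testing would suggest the input bandwidth is too high for a mailbox.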
12.3.4 Producer Consumer Problem
The first in first out circular queue (FIFO) and double buffer are useful for data flow situations, as shown in Figure 12.4. These data structures can be used to link a source process (the producer is hardware/software that generates data) to a sink process (the consumer is hardware/software that consumes data). In both cases the data is order-preserving, such that the order in which data is saved equals the order in which it is retrieved. There are many producer-consumer applications. In Table 12.1 the activities on the left are producers that create or input data, while the activities on the right are consumers that process or output data.
Figure 12.4 FIFO queues and double buffers can be used to pass data from a producer to a consumer. [Diagram: the source process (producer) calls Put to store data into the FIFO or double buffer, and the sink process (consumer) calls Get to remove it.]
Table 12.1 Producer-consumer examples.
Source/Producer                   Sink/Consumer
Keyboard input                    Program that interprets
Program with data                 Printer output
Program sends message             Program receives message
Microphone and ADC                Program that saves sound data
Program that has sound data       DAC and speaker
The source process puts data into the FIFO or double buffer. If there is room, the Put operation saves data in the structure. If the data structure is full and the user tries to put, the Put routine will return a full error signifying the last (newest) data was not properly saved. The sink process removes data from the FIFO or double buffer. After a Get, the particular information returned from the Get routine is no longer saved. If the structure is empty and the user tries to get, the Get routine will return an empty error signifying no data could be retrieved. The FIFO and double buffer are order preserving, such that the information is returned by repeated calls of Get in the same order as the data was saved by repeated calls of Put. A FIFO typically can store many small chunks of data, whereas a double buffer can store two large fixed-size blocks of data. Checkpoint 12.4: What conditions might cause the FIFO to become full?
The first in first out circular queue (FIFO) is quite useful for implementing a buffered I/O interface. It can be used for both buffered input and buffered output. The order preserving data structure temporarily saves data created by the source (producer) before it is processed by the sink (consumer). After initialization, the FIFO has two functions: Put (enters new data) and Get (removes the oldest data). You have probably already experienced the convenience of FIFOs. For example, when using an editor, you can continue to type characters while other processing is occurring. The ASCII codes are input from the keyboard as they are typed and put in a FIFO. When the editor is active again, it gets more keyboard data to process. A FIFO is also used when you ask the computer to print a file.
Rather than waiting for the actual printing to occur character by character, the print command will put the data in a FIFO. Whenever the printer is free, it will get data from the FIFO. The advantage of the FIFO is it allows you to continue to use your computer while the printing occurs in the background. To implement this magic of background printing we will need interrupts. Figure 12.5 shows a data flow graph with buffered input and buffered output. FIFOs used in this book will be statically allocated global structures. Because they are global variables, it means they will exist permanently and can be carefully shared by more than one program. The advantage of using a FIFO structure for a data flow problem is that we can decouple the producer and consumer threads. Without the FIFO we would have to produce one piece of data, then process it, produce another piece of data, then process it, as described in Figures 12.2 and 12.3. With the FIFO, the producer thread can continue to produce data without having to wait for the consumer to finish processing the previous data. This decoupling can significantly improve system performance.
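Figure 12.5 A data flow graph showing two FIFOs that buffer data between producers and consumers. [Data flow graph: SCI input feeds the RDRF ISR (producer), which calls RxFifo_Put to store into RxFifo; main (consumer) calls SCI_InChar, which calls RxFifo_Get. Main (producer) calls SCI_OutChar, which calls TxFifo_Put to store into TxFifo; the TDRE ISR (consumer) calls TxFifo_Get and sends the data to the SCI output.]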
Let $t_p$ be the time (in sec) between calls to Put, and $r_p$ be the arrival rate (in bytes/sec) into the system. Similarly, let $t_g$ be the time (in sec) between calls to Get, and $r_g$ be the service rate (in bytes/sec) out of the system.
$$r_g = \frac{1}{t_g} \qquad r_p = \frac{1}{t_p}$$
If the minimum time between Put's is greater than the maximum time between Get's,
$$\min t_p \ge \max t_g$$
then a FIFO is not necessary and the data flow problem could be solved with a simple mailbox. On the other hand, if the time between Put's temporarily becomes less than the time between Get's because either
• The arrival rate temporarily increases
• The service rate temporarily decreases
then information will be collected in the FIFO. For example, a person might type very fast for a while followed by a long pause. The FIFO could be used to capture without loss all the data as it comes in very fast. Clearly, on average the system must be able to process the data (the sink process) at least as fast as the average rate at which the data arrives. If the average input rate is larger than the average output rate
$$r_p > r_g$$
then the FIFO will eventually overflow no matter how large the FIFO. If $r_p$ is temporarily high or $r_g$ is temporarily low, and that causes the FIFO to become full, then this problem can be solved by increasing the FIFO size.
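The relationship between burst size and required FIFO size can be explored with a small simulation. This is a sketch under simplifying assumptions that are not part of the text: time is divided into equal ticks, the consumer removes at most one byte per tick, and the producer's arrivals in each tick are given in an array. The function name Fifo_PeakOccupancy is hypothetical.

```c
/* arrivals[t] is the number of bytes Put during tick t; the consumer
   performs at most one Get per tick. Returns the largest number of
   bytes waiting at any time, i.e., the FIFO size needed to avoid a
   full error for this arrival pattern. */
int Fifo_PeakOccupancy(const int arrivals[], int ticks){
  int count = 0;   /* bytes currently stored */
  int peak = 0;    /* largest count seen so far */
  for(int t = 0; t < ticks; t++){
    count += arrivals[t];          /* producer puts its data */
    if(count > peak){
      peak = count;
    }
    if(count > 0){
      count--;                     /* consumer gets one byte */
    }
  }
  return peak;
}
```

For a burst such as {3,3,3,0,0,0,0,0,0}, where the average arrival rate equals the service rate of one byte per tick, the peak is 7 bytes, so a modest FIFO absorbs the burst. For a sustained pattern such as {2,2,2,2}, the occupancy grows by one byte every tick, which illustrates why no FIFO size is sufficient when the average arrival rate exceeds the average service rate.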
There is a fundamental difference between an empty error and a full error. Consider the application of using a FIFO between your computer and its printer. This is a good idea because the computer generates data to be printed at a very high rate followed by long pauses. The printer is like a turtle. It can print at a slow but steady rate. The computer will Put a byte into the FIFO that it wants printed. The printer will Get a byte out of the FIFO when it is ready to print another character. A full error occurs when the computer calls Put at too fast a rate. A full error is serious: either data will be lost or the rest of the computer pauses waiting for there to be room in the FIFO. On the other hand, an empty error occurs when the printer is ready to print but the computer has nothing in mind. An empty error is not serious. If the FIFO is empty the printer just shuts itself off and does nothing.
Checkpoint 12.5: If the FIFO becomes full, can the situation be solved by increasing the size?
An input interface using interrupts, but without a FIFO, isn't any better than a busy-waiting solution. If the next input data arrives before the previous data is processed, then data will be lost. When the I/O bandwidth is fast or unpredictable, it is appropriate to pass data from the producer thread to the consumer thread using a first in first out queue (FIFO). The FIFO will buffer the data between the foreground and background. The presence of the FIFO placed between the producer and consumer greatly improves performance by reducing the time each waits for the other. Figure 12.6 shows an I/O system that uses interrupts for both input and output. When the main program wishes to output, it calls OutChar, which will put the data in the TxFifo and arm the output device. When the main program wishes to input, it calls InChar, which will get data from the RxFifo. This example has been implemented at tut4 as part of the TExaS system.
Figure 12.6 FIFO queues can be used to pass data between threads. [Flowcharts: InChar waits while RxFifo is empty, then calls RxGetFifo and returns (rts). The input ISR reads data from the input and calls RxPutFifo if RxFifo is not full, or reports an ERROR if it is full, then executes rti. OutChar waits while TxFifo is full, then calls TxPutFifo, arms the output, and returns (rts). The output ISR calls TxGetFifo and writes the data to the output if TxFifo is not empty, or disarms the output if it is empty, then executes rti.]
Observation: For systems with interrupt-driven I/O on multiple devices, there will be a separate FIFO for each device.
The incoming serial data will set the input trigger, requesting an interrupt. The ISR (background) will accept the data and put it in the RxFifo. The RxFifo buffers data between the input hardware and the main program that processes the data. If the RxFifo becomes full, then data will be lost. FIFO full errors will always occur if the average input rate (number of bytes arriving per second from the input hardware) exceeds the average processing rate (number of bytes processed per second by the main program). In this situation, either the processing rate must be increased (by using a faster computer or by writing a better software processing algorithm), or the input rate must be decreased (by slowing down the arrival rate of data). The second way the RxFifo could become full is if there is a temporary increase
in the arrival rate or a temporary decrease in the process rate. For this situation, the full errors could be eliminated by increasing the size of the RxFifo. The output trigger occurs when the output device is idle, ready to output more data. If there is data available in the TxFifo, the ISR will get it and write it to the output device. If the TxFifo is empty, the output device is disarmed. The main program puts data in TxFifo and gets data from the RxFifo as desired. If the TxFifo becomes full, then it is appropriate to wait for the interrupts to make room. It is inefficient, but not catastrophic for the main program to wait on a full TxFifo. Efficiency can be improved for the buffered output problem by increasing the TxFifo size. It is also inefficient, but not catastrophic for the main program to wait on an empty RxFifo. Efficiency can be improved for the buffered input problem by performing other tasks while waiting for data. It is important to study the timing behavior of the I/O hardware and software processing when designing an interrupting interface. One simple way to study a problem is to measure the number of elements in the RxFifo when new data is entered by the input ISR. If the time for the software to read and process the data is much faster than the time for the input device to create new input, then there will be very few elements in the RxFifo. For most systems, the producer and consumer rates fluctuate, but during the times when the software waits for the I/O hardware, the system is classified as I/O bound. For an I/O bound input interface the RxFifo has either 0 or 1 entry, and the use of interrupts does not enhance the bandwidth over the busy-waiting implementations. Even with an I/O-bound input device however, it may be more efficient to utilize interrupts because it provides a straightforward approach to servicing multiple devices. 
If the input device generates a burst of high bandwidth activity, then there will be many elements in the RxFifo. As long as the interrupt service routine is fast enough to keep up with the input device and as long as the RxFifo does not become full, no data is lost. Recall the ISR doesn't have to process the input data, just read it and save it in the RxFifo. In this situation, the overall bandwidth is higher than it would be with a busy-waiting implementation, because the input device does not have to wait for each data byte to be processed. This is the classic example of a "buffered" input, because data enters the system (via the interrupts), is temporarily stored in a buffer (put into the RxFifo), and is processed later (by the main program, get from the RxFifo). During the times when the I/O device is faster than the software, the system is called CPU-bound. A system will work if the producer rate only temporarily exceeds the consumer rate (a short burst of high bandwidth input). If the external device sustained the high bandwidth input rate, then the RxFifo would become full and data would be lost. For an output device, we will count the number of elements in the TxFifo when data is removed by the output ISR. If the rate for the software to generate new data is much slower than the rate for the output device to send data, then there will be very few elements in the TxFifo. During this time the system is called CPU-bound. In this situation, the TxFifo has either 0 or 1 entry, and the use of interrupts does not enhance the bandwidth over the busy-waiting implementations. Even with a CPU-bound output device however, it may be more efficient to utilize interrupts because it provides a straightforward approach to servicing multiple devices. If the main program generates a burst of output activity, then there will be many elements in the TxFifo.
In this situation, the overall bandwidth is higher than it would be with a busy-waiting implementation, because the main program does not have to wait for each data byte to be output. This is the classic example of a "buffered" output, because data enters the system (via the main program), is temporarily stored in a buffer (put into the TxFifo), and is processed later (by the output ISR, get from the TxFifo). During the time when the main program is faster than the output hardware, the system is called I/O-bound. Just like the input situation, a system will work if the producer rate only temporarily exceeds the consumer rate. If the main program sustained the output rate, then the TxFifo would become full and the main program would then have to wait. Again, the output situation is most efficient if the TxFifo is big enough to avoid full errors.
12 䡲 Communication Systems
12.3.4 FIFO Queue Implementation
There are many ways to implement a statically allocated FIFO. We can use either a pointer or an index to access the data in the FIFO. We can use either two pointers (or two indices), or two pointers (or two indices) and a counter. The counter specifies how many entries are currently stored in the FIFO. There are even hardware implementations of FIFO queues. If we were to have infinite memory, as shown in Figure 12.7, a FIFO implementation would be easy. GetPt points to the data that will be removed by the next call to Fifo_Get, and PutPt points to the empty space where the data will be stored by the next call to Fifo_Put. Program 12.3 presents the basic idea of a pointer-based FIFO implementation. To put data in the FIFO, the new data is stored at PutPt, then this pointer is incremented. To get data from the FIFO, the value at GetPt is read, then this pointer is incremented.
Figure 12.7 The FIFO implementation with infinite memory. (Figure: the RxFifo, with the valid data lying between GetPt and PutPt.)
Program 12.3 Code fragments showing the basic idea of a FIFO.
;Reg A is data to put into the FIFO
Fifo_Put ldx  PutPt
         staa 1,X+    ;store into FIFO
         stx  PutPt   ;update pointer
         rts
;Reg A returned with byte from FIFO
Fifo_Get ldx  GetPt
         ldaa 1,X+    ;read from FIFO
         stx  GetPt   ;update
         rts

void Fifo_Put(char data){
  *(PutPt++) = data;
}
void Fifo_Get(char *datapt){
  *datapt = *(GetPt++);
}
There are three modifications that are required to these functions. If the FIFO is full when Fifo_Put is called, then the subroutine should return a full error. Similarly, if the FIFO is empty when Fifo_Get is called, then the subroutine should return an empty error. There is never an infinite amount of memory, so a finite number of bytes will be permanently allocated to the FIFO. Figures 12.8 and 12.9 show an example with 10 bytes allocated. The PutPt and GetPt must be wrapped back up to the top when they reach the bottom. The shaded blocks in these two figures represent valid data saved in the FIFO. Figure 12.8 shows how the FIFO changes as four bytes are put into it. Figure 12.9 shows the same FIFO as Fifo_Get is called four times. Observe the order-preserving nature of the FIFO.
Figure 12.8 The FIFO Put operation showing the pointer wrap.
Figure 12.9 The FIFO Get operation showing the pointer wrap.
There are two mechanisms to determine whether the FIFO is empty or full. A simple method is to implement a counter containing the number of bytes currently stored in the FIFO. Fifo_Get would decrement the counter and Fifo_Put would increment the counter. The second method, shown in Figure 12.10 and Program 12.4, is to prevent the FIFO from being completely full. For example, if the FIFO had 10 bytes allocated, then the Fifo_Put subroutine would allow a maximum of 9 bytes to be stored. If there were already 9 bytes in the FIFO and another Fifo_Put were called, then the FIFO would not be modified and a full error would be returned. In this way, if PutPt equals GetPt at the beginning of Fifo_Get, then the FIFO is empty. Similarly, if PutPt+1 equals GetPt at the beginning of Fifo_Put, then the FIFO is full. Be careful to wrap the PutPt+1 before comparing it to GetPt. This second method does not require the length to be stored or calculated. The FIFO global structures must be allocated in RAM. PutPt and GetPt are private, and not accessible by programs outside the FIFO module.
Figure 12.10 Flowcharts of the put and get operations. (Flowchart, Fifo_Put: tempPt = PutPt; store data at tempPt; tempPt++; reset tempPt if it is beyond the buffer; if tempPt equals GetPt the FIFO is full, return(0); otherwise PutPt = tempPt, return(1). Flowchart, Fifo_Get: if GetPt equals PutPt the FIFO is empty, return(0); otherwise retrieve data at GetPt; GetPt++; reset GetPt if it is beyond the buffer; return(1).)
The initialization function, Fifo_Init, is usually called once at the start of the system. The FIFO is empty if the PutPt equals the GetPt. Both pointers should always address locations within the 10-byte allocated area. The Fifo_Put routine enters new data in the FIFO. To check for FIFO full, the Fifo_Put routine attempts to put using a temporary pointer. If putting makes the FIFO look empty, then the temporary pointer is discarded and the routine is exited without saving the data. This is why a FIFO with 10 allocated bytes can only hold 9 data points. If putting doesn’t make the FIFO look empty, then the temporary pointer is stored into the actual PutPt, saving the data as desired. The Fifo_Get routine removes the oldest data from the FIFO. To check for FIFO empty, the Fifo_Get routine simply checks to see if GetPt equals PutPt. If they match at the start of the routine, then Fifo_Get returns with the “empty” condition signified. Otherwise, the information is retrieved from the FIFO, and the GetPt is incremented, signifying that the information is no longer in the FIFO. If incrementing GetPt makes the pointer go beyond the FIFO buffer, the pointer is wrapped back to the beginning.
Program 12.4 Implementation of a two-pointer FIFO.
FIFO_SIZE equ  10
PutPt     rmb  2
GetPt     rmb  2
Fifo      rmb  FIFO_SIZE
Fifo_Init movw #Fifo,PutPt
          movw #Fifo,GetPt
          rts
; Input RegA data to put
; Output RegB 1=OK, 0=full
Fifo_Put  pshx
          ldx  PutPt     ;Temporary
          staa 1,x+      ;Try to put
          cpx  #Fifo+FIFO_SIZE
          bne  skip
          ldx  #Fifo     ;Wrap
skip      clrb
          cpx  GetPt     ;Full if same
          beq  ok
          incb           ;1 means OK
          stx  PutPt
ok        pulx
          rts
; Input none
; Output RegA data from Get
;        RegB 1=ok, 0=empty
Fifo_Get  pshx
          clrb
          ldx  GetPt
          cpx  PutPt     ;Empty?
          beq  done
          incb           ;1=OK
          ldaa 1,x+      ;Data
          cpx  #Fifo+FIFO_SIZE
          bne  no        ;wrap?
          ldx  #Fifo     ;yes
no        stx  GetPt
done      pulx
          rts

#define FIFO_SIZE 10
char static *PutPt;
char static *GetPt;
char static Fifo[FIFO_SIZE];
void Fifo_Init(void){
  PutPt = GetPt = &Fifo[0];
}
int Fifo_Put(char data){
  char *tempPt;
  tempPt = PutPt;
  *(tempPt++) = data;    /* try */
  if(tempPt==&Fifo[FIFO_SIZE]){
    tempPt = &Fifo[0];   /* wrap */
  }
  if(tempPt == GetPt){
    return(0);           /* full! */
  }
  else{
    PutPt = tempPt;      /* OK */
    return(1);
  }
}
int Fifo_Get(char *datapt){
  if(PutPt == GetPt){
    return(0);           /* Empty */
  }
  else {
    *datapt = *(GetPt++);
    if(GetPt==&Fifo[FIFO_SIZE]){
      GetPt = &Fifo[0];  /* wrap */
    }
    return(1);
  }
}
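The text notes that either pointers or indices can be used. As an illustration, here is a sketch of the same "never completely full" method written with two indices instead of two pointers. The names IFifo_Init, IFifo_Put, IFifo_Get, and IFifo_Size are hypothetical, and the size function is an addition that is not part of Program 12.4; it shows that the length can still be calculated from the two indices when it is wanted.

```c
#include <stdint.h>

#define IFIFO_SIZE 10              /* 10 bytes allocated, 9 usable */

static char IFifo[IFIFO_SIZE];
static uint8_t IPutI, IGetI;       /* put and get indices */

void IFifo_Init(void){
  IPutI = IGetI = 0;               /* equal indices mean empty */
}

/* returns 1 if the data was stored, 0 if the FIFO was full */
int IFifo_Put(char data){
  uint8_t next = (uint8_t)((IPutI + 1) % IFIFO_SIZE);  /* wrap first */
  if(next == IGetI){
    return 0;                      /* full: one slot always stays unused */
  }
  IFifo[IPutI] = data;
  IPutI = next;                    /* update index after data is stored */
  return 1;
}

/* returns 1 if a byte was removed, 0 if the FIFO was empty */
int IFifo_Get(char *datapt){
  if(IPutI == IGetI){
    return 0;                      /* empty */
  }
  *datapt = IFifo[IGetI];
  IGetI = (uint8_t)((IGetI + 1) % IFIFO_SIZE);
  return 1;
}

/* number of bytes currently stored, 0 to IFIFO_SIZE-1 */
uint8_t IFifo_Size(void){
  return (uint8_t)((IPutI + IFIFO_SIZE - IGetI) % IFIFO_SIZE);
}
```

Calling IFifo_Put ten times on an empty FIFO succeeds nine times and fails on the tenth, which is the 9-of-10 capacity described above. Each index is written by only one routine, the same property that Program 12.4 has with its two pointers.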
12.3.5 Double Buffer
A double buffer is two buffers of fixed size. One example that uses a double buffer is a disk. Consider the situation where a large amount of data is to be read from a disk. The disk is organized into fixed size blocks. The size of each of the two buffers will match the block size of the disk. In the situation shown in Figure 12.11, the hardware is reading data from the disk, filling Buf1. The hardware is configured to read an entire block. During this time the software is reading the data previously stored in Buf2. The double buffer will preserve order. This means the order in which the characters are input from the disk is the same as the order in which they are processed by the software. The differences between a FIFO queue and a double buffer are data size and queue length. The data size of a FIFO is typically one or two bytes. This means that one puts and gets single bytes into and out of the FIFO queue. The data size of the double buffer is typically large (e.g., 80, 256, and 1024 bytes). This means that one always saves and removes big blocks into and out of the double buffer. The FIFO queue length is large (typically ranging from 16 to 60000 bytes). The double buffer has exactly 2 buffers. When the software finishes processing Buf2 and the hardware finishes filling Buf1, the buffers are switched (the hardware fills Buf2 and the software processes Buf1). This means if the hardware finishes first, then the disk hardware will have to be paused. Maximum disk efficiency occurs only if the disk can continuously read data as the blocks pass under the read head. I/O devices which manipulate data in fixed size blocks are candidates for using double buffer data structures. Other examples of such devices include graphics displays, bar code scanners, UPC readers, credit card readers, and IR receivers. A graphics display uses two buffers called a front buffer and a back buffer. The graphics hardware uses the front buffer to create the visual image on the display, i.e., the front buffer contains the data that you see. The software uses the back buffer to create a new image, i.e., the back buffer contains the data that you will see next. When the new image is ready, and the time is right, the two buffers are switched (the front becomes the back and the back becomes the front). In this way, the user never sees a partially drawn image.
Figure 12.11 A double buffer allows you to store data into one buffer at the same time as retrieving data from the other buffer. (Figure: data read from disk fills Buf1 while the software processes the data in Buf2.)
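The buffer switch described above can be sketched with two pointers that are exchanged. This is a minimal host-side illustration, not the book's implementation: BLOCK_SIZE, DBuf_Fill, DBuf_Swap, and DBuf_Current are hypothetical names, and a real disk or display driver would fill the buffer in hardware or in an ISR rather than with memcpy.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8               /* hypothetical block size in bytes */

static uint8_t BufA[BLOCK_SIZE];
static uint8_t BufB[BLOCK_SIZE];
static uint8_t *FillBuf = BufA;    /* buffer the producer is filling */
static uint8_t *ProcBuf = BufB;    /* buffer the consumer is processing */

/* producer side: store one complete block into the fill buffer */
void DBuf_Fill(const uint8_t *block){
  memcpy(FillBuf, block, BLOCK_SIZE);
}

/* consumer side: the block that is currently safe to process */
const uint8_t *DBuf_Current(void){
  return ProcBuf;
}

/* exchange roles; call only when the fill is complete and the
   consumer has finished with the current block */
void DBuf_Swap(void){
  uint8_t *temp = FillBuf;
  FillBuf = ProcBuf;
  ProcBuf = temp;
}
```

Only the two pointers change at the switch, so no block is copied, and the consumer never sees a partially filled block as long as DBuf_Swap is called only when both sides are done.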
12.4 Serial Port Interface Using Interrupt Synchronization
The objective of this section is to develop software to support bidirectional data transfer using interrupt synchronization, implementing the data flow graph shown previously in Figure 12.5. We could connect the microcontroller to another computer and use this channel to transfer data. For example, if we connect the DB9 cable to a serial port on a PC, we could run HyperTerminal on the PC and communicate with the microcontroller. The RS232 timing is generated automatically by the SCI. In order for data to be properly received, the baud rate must match that of the other module, which in this interface will be 9600 bits/sec. Initially, the two FIFOs are cleared, and just the receiver is armed, see Program 12.5. The transmitter will be armed within the SCI_OutChar routine, when data is available. An interrupt occurs when new incoming data arrives in the receiver data register (RDRF = 1). An interrupt also occurs when the transmit data register is empty (TDRE = 1). TDRE is one when the output channel is idle and needs the software to supply additional data. Notice that the transmit channel is disarmed when the TxFifo is empty and rearmed when new data is put into the TxFifo. When the RxFifo becomes full, data is lost, but when the TxFifo becomes full, the main program simply waits for space to become available.
;9S12C32, 4MHz (9S12DP512 at 8 MHz)
;baud rate=9600
SCI_Init   jsr  RxFifo_Init   ;FIFO is empty
           jsr  TxFifo_Init   ;FIFO is empty
           movb #$2C,SCI0CR2  ;arm just RDRF
           movw #52,SCI0BD    ;(26 if 9S12C32)
           cli
           rts
* Inputs: none  Outputs: RegA is ASCII
SCI_InChar pshb
iloop      jsr  RxFifo_Get    ;B=0 if empty
           tbeq B,iloop
           pulb
           rts                ;A=character
* Inputs: RegA is ASCII  Outputs: none
SCI_OutCh  pshb               ;A=character
oloop      jsr  TxFifo_Put    ;save in FIFO
           tbeq B,oloop       ;B=0 if full
           movb #$AC,SCI0CR2  ;arm TDRE
           pulb
           rts
SCIhandler ldaa SCI0SR1
           bita #$20
           beq  CkTDRE        ;Not RDRF set
           ldaa SCI0DRL       ;ASCII character
           bsr  RxFifo_Put
CkTDRE     ldaa SCI0SR1
           bpl  sdone         ;Not TDRE set
           ldaa SCI0CR2       ;bit 7 is TIE
           bpl  sdone         ;disarmed?
           bsr  TxFifo_Get
           tbeq B,nomore
           staa SCI0DRL       ;start output
           bra  sdone
nomore     movb #$2C,SCI0CR2  ;disarm TDRE
sdone      rti
           org  $FFD6
           fdb  SCIhandler

// 9S12C32, (9S12DP512 at 8 MHz)
// 9600 bits/sec
void SCI_Init(void){
  RxFifo_Init();    // empty FIFOs
  TxFifo_Init();
  SCI0BD = 52;      // (26 if 9S12C32)
  SCI0CR1 = 0;      // M=0, no parity
  SCI0CR2 = 0x2C;   // enable, arm RDRF
asm cli             // enable interrupts
}
// Input ASCII character from SCI
// spin if RxFifo is empty
char SCI_InChar(void){ char letter;
  while (RxFifo_Get(&letter) == 0){};
  return(letter);
}
// Output ASCII character to SCI
// spin if TxFifo is full
void SCI_OutChar(char data){
  while (TxFifo_Put(data) == 0){};
  SCI0CR2 = 0xAC;   // arm TDRE
}
#define TDRE 0x80
#define RDRF 0x20
// RDRF set on new receive data
// TDRE set on empty transmit register
interrupt 20 void SciHandler(void){ char data;
  if(SCI0SR1 & RDRF){
    RxFifo_Put(SCI0DRL);   // clears RDRF
  }
  if((SCI0CR2&0x80)&&(SCI0SR1&TDRE)){
    if(TxFifo_Get(&data)){
      SCI0DRL = data;      // clears TDRE
    }
    else{
      SCI0CR2 = 0x2c;      // disarm TDRE
    }
  }
}
Program 12.5 Assembly and C implementations of an interrupting SCI interface.
Observation: Data is lost when the RxFifo gets full.
Common Error: Notice that the above transmit device driver either acknowledges the interrupt by sending another character or disarms itself because the TxFifo is empty. The software will crash (infinite loop) if it returns from interrupt without acknowledging or disarming.
Checkpoint 12.6: Why didn’t the initialization software arm TDRE?
Checkpoint 12.7: What bad thing would happen if the RDRF ISR waited for there to be room in the RxFifo like SCI_OutChar waits for there to be room in the TxFifo?
Checkpoint 12.8: Modify Program 12.5 so the baud rate is 1200 bits/sec.
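Program 12.5 uses SCI0BD values of 52 (8 MHz) and 26 (4 MHz) for 9600 bits/sec. Both are consistent with the relationship baud rate = bus clock / (16 × SCI0BD). The sketch below assumes that relationship; the helper names SCI_BaudDivisor and SCI_ActualBaud are hypothetical and are not part of Program 12.5.

```c
#include <stdint.h>

/* Assumed relationship: baud = busHz / (16 * divisor).
   Returns the divisor rounded to the nearest integer. */
uint16_t SCI_BaudDivisor(uint32_t busHz, uint32_t baud){
  return (uint16_t)((busHz + 8u * baud) / (16u * baud));
}

/* Baud rate actually produced by an integer divisor (truncated). */
uint32_t SCI_ActualBaud(uint32_t busHz, uint16_t divisor){
  return busHz / (16u * (uint32_t)divisor);
}
```

For an 8 MHz bus clock and 9600 bits/sec the divisor is 52, and the resulting rate is about 9615 bits/sec, an error of roughly 0.2 percent. Because the divisor must be an integer, the achieved baud rate is usually close to, but not exactly, the requested one.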
Consider the interrupting serial port interface shown in Program 12.5, and the FIFO implementations in Program 12.4. Assume also there is a TxFifo implementation separate from, but identical to, the RxFifo. When processing input, it is possible for the main program to start to execute RxFifo_Get and be interrupted, with the ISR then calling RxFifo_Put. To verify correctness of our system, we notice nothing bad would happen (e.g., crash, lost data, extra data, etc.) if the RxFifo_Put subroutine were to execute in between any two assembly instructions of the RxFifo_Get routine. A similar consideration arises when processing outputs. In this situation, the main program may start to execute TxFifo_Put and be interrupted, with the output interrupt routine then calling TxFifo_Get. Again, nothing bad happens if the TxFifo_Get subroutine is executed in between any two assembly instructions within the TxFifo_Put routine. If we are processing both input and output, then two FIFOs would be used. Each FIFO routine (RxFifo_Get, RxFifo_Put, TxFifo_Get, TxFifo_Put) is called from exactly one place in the software system. Even though the functions themselves are nonreentrant, the system has no critical sections because none of the individual functions will be reentered. They can interrupt each other, but not themselves. In conclusion, the FIFO routines as used in the system (you can run this system as tut4 on TExaS) have no critical sections. On the other hand, if the foreground and background threads both were to call the same FIFO function, then there would be a critical section.
12.5 *Distributed Systems
In this section, we will present three simple communication systems that utilize the SCI port. If the distances are short, half-duplex can be implemented with simple open collector or open-drain TTL-level logic. Open collector logic has two output states: low and off. In the off state the output is not driven high or low; it just floats. The 10 kΩ pull-up resistor will passively make the signal high if none of the open collector outputs are low. The 9S12 can make its TxD serial outputs be open collector. This mode allows a half-duplex network to be created without any external logic (although pull-up resistors are often used). Three factors will limit the implementation of this simple half-duplex network: (1) the number of nodes on the network, (2) the distance between nodes, and (3) the presence of corrupting noise. In these situations a half-duplex RS485 driver chip like the SP483 made by Sipex or Maxim can be used.
The first communication system is a master-slave configuration, where the master transmit output is connected to all slave receive inputs, as shown in Figure 12.12. This provides for broadcast of commands from the master. All slave transmit outputs are connected together using wire-or open collector logic, allowing the slaves to respond one at a time. The WOMS1 bit (WOMS bit 1) in the slaves should be set to activate open collector mode on PS1. The low-level device driver for this communication system was presented in the previous section. When the master performs SCI output, it is broadcast to all the slaves. There can be no conflict when the master transmits, because a single output is connected to multiple inputs. When a slave receives input, it knows it is a command from the master. A potential problem exists because multiple slave transmitters are connected to the same signal. If the slaves only transmit after specifically being triggered by the master, no collisions can occur.
Figure 12.12 A master-slave network implemented with multiple microcomputers. (Figure: a master 9S12 and several slave 9S12s connected through their SCI pins PS1/TxD and PS0/RxD with a common ground; the master TxD is a regular digital output, the slave TxD outputs are open collector, and a 10 kΩ resistor pulls up to +5 V.)
Checkpoint 12.9: What voltage level will the master RxD observe if two slaves simultaneously transmit, one making it a logic high and the other a logic low?
The next communication system is a ring network. This is the simplest distributed system to design, because it can be constructed using standard serial ports. In fact, we can build a ring network simply by chaining the transmit and receive lines together in a circle, as shown in Figure 12.13. Building a ring network is as simple as soldering an RS232 cable in a circle with one DB9 connector for each node. Messages will include source address, destination address, and information. If computer A wishes to send information to computer C, it sends the message to B. The software in computer B receives the message, notices it is not for itself, and resends the message to C. The software in computer C receives the message, notices it is for itself, and keeps the message. Although simple to build, this system has slow performance (response time and bandwidth), and it is difficult to add or remove nodes.
Figure 12.13 A ring network implemented with 3 microcomputers. (Figure: three 9S12s, A, B, and C, with each TxD on PS1 wired to the RxD on PS0 of the next node and a common ground; all TxD are regular digital outputs.)
Checkpoint 12.10: Assume the ring network has 10 nodes, the baud rate is 100,000 bits/sec, and there are 10 bits/frame. What is the average time it takes to send a 10 byte message from one computer to another?
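The keep-or-resend decision made by each node in the ring can be sketched as follows. The message layout and the names RingMessage, RingAction, and Ring_Handle are hypothetical. The RING_DROP case, which discards a message that has travelled all the way around without finding its destination, is an added assumption that the text does not discuss.

```c
#include <stdint.h>

#define RING_PAYLOAD 8             /* hypothetical maximum payload size */

typedef struct {
  uint8_t source;                  /* address of the sending node */
  uint8_t destination;             /* address of the intended receiver */
  uint8_t length;                  /* number of valid bytes in data */
  uint8_t data[RING_PAYLOAD];      /* information */
} RingMessage;

typedef enum { RING_KEEP, RING_FORWARD, RING_DROP } RingAction;

/* Decide what the node at myAddress does with a received message. */
RingAction Ring_Handle(uint8_t myAddress, const RingMessage *msg){
  if(msg->destination == myAddress){
    return RING_KEEP;              /* the message is for this node */
  }
  if(msg->source == myAddress){
    return RING_DROP;              /* assumption: went all the way around */
  }
  return RING_FORWARD;             /* resend to the next node */
}
```

With nodes A, B, and C numbered 0, 1, and 2, a message from A to C produces RING_FORWARD at B and RING_KEEP at C, which is the sequence described above.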
The third communication system is a very common approach to distributed embedded systems, called multi-drop, as shown in Figure 12.14. To transmit a byte to the other computers, the software activates the SP483 driver and outputs the frame. Since it is half-duplex, the frame is also sent to the receiver of the computer that sent it. This echo can be checked to see if a collision occurred (two devices simultaneously outputting). If more than two computers exist on the network, we usually send address information first, so that the proper device receives the data. The 6812 SCI has a status bit in the SCISR2 register called RAF that will be true if there is an incoming frame on the RxD line. Many collisions can be avoided by checking this bit before transmitting.
Figure 12.14 Two multi-drop networks implemented with 3 microcomputers. (Figure: in one network the open collector TxD outputs on PS1 and the RxD inputs on PS0 of 9S12s A, B, and C share one line with a 10 kΩ pull-up to +5 V; in the other, each 9S12 connects its TxD and RxD to the shared bus through an RS485 SP483 driver. Both networks continue to other nodes.)
Checkpoint 12.11: How can the transmitter detect that a collision has corrupted its output?
Checkpoint 12.12: How can the receiver detect that a collision has corrupted its input?
There are many ways to check for transmission errors. You could use a longitudinal redundancy check (LRC), also called horizontal even parity. This error check byte is simply the exclusive-or of all the message bytes (except the LRC itself). The receiver also performs an exclusive-or on the message as well as the error check byte. The result will equal zero if the block has been transmitted successfully. Another popular method is a checksum, which is simply the modulo-256 (8-bit) or modulo-65536 (16-bit) sum of the data packet. In addition, each byte could (but doesn’t have to) include even parity. There are two mechanisms that allow the transmission of variable amounts of data. Some protocols use start (STX=$02) and stop (ETX=$03) characters to surround a variable amount of data. The disadvantage of this “termination code” method is that binary data cannot be sent, because a data byte might match the termination character (ETX). Therefore, this protocol is appropriate for sending ASCII characters. Another possibility is to use a byte count to specify the length of a message. Many protocols use a byte count. The S19 records, for example, have a byte count in each line.
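The two error checks described above can be sketched in a few lines. The function names LRC_Compute, Checksum8, and Checksum16 are hypothetical and are used here only for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Longitudinal redundancy check: exclusive-or of all message bytes. */
uint8_t LRC_Compute(const uint8_t *msg, size_t n){
  uint8_t lrc = 0;
  size_t i;
  for(i = 0; i < n; i++){
    lrc ^= msg[i];
  }
  return lrc;
}

/* 8-bit checksum: sum of the bytes, modulo 256. */
uint8_t Checksum8(const uint8_t *msg, size_t n){
  uint8_t sum = 0;
  size_t i;
  for(i = 0; i < n; i++){
    sum = (uint8_t)(sum + msg[i]);   /* wraps modulo 256 */
  }
  return sum;
}

/* 16-bit checksum: sum of the bytes, modulo 65536. */
uint16_t Checksum16(const uint8_t *msg, size_t n){
  uint16_t sum = 0;
  size_t i;
  for(i = 0; i < n; i++){
    sum = (uint16_t)(sum + msg[i]);  /* wraps modulo 65536 */
  }
  return sum;
}
```

For the four-byte message $02, 'H', 'i', $03 the LRC is $20. A receiver that runs LRC_Compute over the message plus the appended LRC byte gets zero when no error is detected.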
12.6 *Design and Implementation of a Controller Area Network (CAN)
12.6.1 The Fundamentals of CAN
In this section, we will design and implement a Controller Area Network (CAN). CAN is a high-integrity serial data communications bus that is used for real-time applications. It can operate at data rates of up to 1 Mbits/second, and it has excellent error detection and confinement capabilities. The CAN was originally developed by Robert Bosch for use in automobiles, and is now extensively used in industrial automation and control applications. The CAN protocol has been developed into an international standard for serial data communication, specifically ISO 11898. Figure 12.15 shows the block diagram of a CAN system, which can have up to 112 nodes. There are four components of a CAN system. The first part is the CAN bus, consisting of two wires (CANH, CANL) with 120 Ω termination resistors on each end. The second part is the transceiver, which handles the voltage levels and interfaces the separate receive (RxD) and transmit (TxD) signals onto the CAN bus. The third part is the CAN controller, which is hardware built into the 9S12, and it handles message timing, priority, error detection, and retransmission. The last part is software running within the 9S12 that handles the high-level functions of generating data to transmit and processing data received from other nodes.
Figure 12.15 Block Diagram of a 9S12-Based CAN communication system. (Figure: nodes 1 through 112, each a 9S12C32 whose CAN controller connects through PM0/RxD and PM1/TxD to an MCP2551 transceiver on the CANH and CANL wires, with a 120 Ω resistor at each end of the bus.)
Each node consists of a 9S12 microcontroller (with an internal CAN controller), and a transceiver that interfaces the CAN controller to the CAN bus. A transceiver is a device capable of transmitting and receiving on the same channel. The CAN is based on the “broadcast communication mechanism”, which follows a message-based transmission protocol rather than an address-based protocol. The CAN provides two communication services: the sending of a message (data frame transmission) and the requesting of a message (remote transmission request). All other services, such as error signaling and automatic retransmission of erroneous frames, are user-transparent, which implies that the CAN interface automatically performs these functions. The 9S12 has an integrated CAN interface (e.g., the 9S12C32 has one CAN channel and the 9S12DP512 has five CAN channels). The physical channel consists of two wires that carry one digital logic bit in differential mode. Because multiple outputs are connected together, there must be a mechanism to resolve simultaneous requests for transmission. In a manner similar to open collector logic, there are dominant and recessive states on the transmitter, as shown in Figure 12.16. The outputs follow a wired-AND mechanism in such a way that if one or more nodes are sending a dominant state, it will override any nodes attempting to send a recessive state.
Checkpoint 12.13: What are the dominant and recessive states in open collector logic?
The CAN transceiver is a high-speed, fault-tolerant device that serves as the interface between a CAN protocol controller (located in the 9S12) and the physical bus. The transceiver is capable of driving the large current needed for the CAN bus and has electrical protection against defective stations. Typically each CAN node must have a device to convert the digital signals generated by a CAN controller to signals suitable for transmission over the bus cabling. The transceiver also provides a buffer between the CAN controller and the high-voltage spikes that can be generated on the CAN bus by outside sources. Examples of CAN transceiver chips include the AMIS-30660 high speed CAN transceiver, Infineon Technologies TLE6250GV33 transceiver, ST Microelectronics L9615 transceiver, Philips Semiconductors AN96116 transceiver, and the Microchip MCP2551 transceiver. These transceivers have similar characteristics and would be equally suitable for implementing a CAN system.
Figure 12.16 Voltage specifications for the recessive and dominant states. (Figure: CANH and CANL plotted against time through recessive, dominant, and recessive states, on a voltage axis marked 0 V, 1.5 V, 2.5 V, 3.5 V, and 5 V.)
In a CAN system, messages are identified by their contents rather than by addresses. Each message sent on the bus has a unique identifier, which defines both the content and the priority of the message. This feature is especially important when several stations compete for bus access, a process called bus arbitration. As a result of the content-oriented addressing scheme, a high degree of system and configuration flexibility is achieved. It is easy to add stations to an existing CAN network. Four message types, or frames, can be sent on a CAN bus. These include the Data Frame, the Remote Frame, the Error Frame, and the Overload Frame. This section will focus on the Data Frame, whose parts in standard format are shown in Figure 12.17. The Arbitration Field determines the priority of the message when two or more nodes are contending for the bus. For the Standard CAN 2.0A, it consists of an 11-bit identifier. For the Extended CAN 2.0B, there is a 29-bit identifier. The identifier defines the type of data. The Control Field contains the DLC, which specifies the number of data bytes. The Data Field contains zero to eight bytes of data. The CRC Field contains a 15-bit checksum used for error detection. Any CAN controller that has been able to correctly receive this message sends an Acknowledgement bit at the end of each message. This bit is stored in the Acknowledge slot in the CAN data frame. The transmitter checks for the presence of this bit, and if no acknowledge is received, the message is retransmitted. To transmit a message, the software must set the 11-bit identifier, set the 4-bit DLC, and give the 0 to 8 bytes of data. The receivers can define filters on the identifier field, so only certain message types will be accepted. When a message is received, the software can read the identifier, length, and data.
Figure 12.17 CAN Standard Format Data Frame. (Figure: the message frame between two Bus Idle periods, in order: SOF; arbitration field with the 11-bit Identifier and RTR; control field with IDE/r1, r0, and DLC; data field of 0–8 bytes; CRC field of 15 bits and its delimiter; ACK slot and its delimiter; EOF; IFS.)
The Intermission Frame Space (IFS) separates one frame from the next. There are two factors that affect the number of bits in a CAN message frame. The ID (11 or 29 bits) and the Data fields (0, 8, 16, 24, 32, 40, 48, 56, or 64 bits) have variable length. The remaining components (36 bits) of the frame have fixed length, including SOF (1), RTR (1), IDE/r1 (1), r0 (1), DLC (4), CRC (15), and ACK/EOF/intermission (13). For example, a Standard CAN 2.0A frame with two data bytes has 11 + 16 + 36 = 63 bits. Similarly, an Extended CAN 2.0B frame with four data bytes has 29 + 32 + 36 = 97 bits.
If a long sequence of 0’s or a long sequence of 1’s is being transferred, the data line will be devoid of the edges that the receiver needs to synchronize its clock to the transmitter. In this case, measures must be taken to ensure that the maximum permissible interval between two signal edges is not exceeded. Bit stuffing achieves this by inserting a complementary bit after five bits of equal value. The number of stuff bits added depends on the data transmitted. Assuming n is the number of data bytes (0 to 8), CAN 2.0A may add 3+n stuff bits and CAN 2.0B may add 5+n stuff bits. Of course, the receiver has to un-stuff these bits to obtain the original data.
The urgency of messages to be transmitted over the CAN network can vary greatly in a real-time system. Typically there are one or two activities that require high transmission rates or quick responses. Both bandwidth and response time are affected by message priority. Low priority messages may have to wait for the bus to be idle. There are two priorities occurring as the 9S12 CANs transmit messages. The first priority is the 11-bit identifier, which is used by all the CAN controllers wishing to transmit a message on the bus. Message identifiers are specified during system design and cannot be altered dynamically. The 11-bit identifier with the lowest binary number has the highest priority.
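The frame-length arithmetic given earlier in this section can be checked with a short calculation. This sketch uses hypothetical function names, CAN_FrameBits and CAN_FrameBitsWithStuffing, and it reads the stuff-bit estimate in the text as 3+n for CAN 2.0A and 5+n for CAN 2.0B, which is an assumption about the intended formula.

```c
#include <stdint.h>

/* Bits in a data frame without stuff bits: identifier + data + 36 fixed.
   extended != 0 selects the 29-bit identifier; n is data bytes, 0 to 8.
   Returns 0 if n is out of range. */
uint16_t CAN_FrameBits(int extended, uint8_t n){
  if(n > 8){
    return 0;
  }
  return (uint16_t)((extended ? 29 : 11) + 8 * n + 36);
}

/* Frame length including the stuff-bit estimate, assumed to be
   3+n bits for CAN 2.0A and 5+n bits for CAN 2.0B. */
uint16_t CAN_FrameBitsWithStuffing(int extended, uint8_t n){
  uint16_t bits = CAN_FrameBits(extended, n);
  if(bits == 0){
    return 0;
  }
  return (uint16_t)(bits + (extended ? 5 : 3) + n);
}
```

CAN_FrameBits(0, 2) gives the 63 bits and CAN_FrameBits(1, 4) gives the 97 bits of the two worked examples. At 1 Mbit/s each bit takes 1 µs, so these values are also the frame times in microseconds before stuffing.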
In order to resolve a bus access conflict, each node in the network observes the bus level bit by bit, a process known as bit-wise arbitration. In accordance with the wired-AND mechanism, the dominant state overwrites the recessive state. All nodes with recessive transmission but dominant observation immediately lose the competition for bus access and become receivers of the message with the higher priority. They do not attempt transmission until the bus is available again. Transmission requests are hence handled according to their importance for the system as a whole. The second priority occurs locally, within each CAN node. When a node has multiple messages ready to be sent, it will send the highest priority messages first.
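Bit-wise arbitration can be simulated on a host computer to see why the lowest identifier wins. The sketch below is a model only: the function name CAN_Arbitrate and the 32-node limit are hypothetical. It assumes that a dominant bit is logic 0, that identifiers are sent most significant bit first, and that the identifiers are unique.

```c
#include <stdint.h>

/* Simulate arbitration among count nodes (1 to 32) that start
   transmitting their 11-bit identifiers at the same time.
   Returns the index of the winning node, or -1 on bad input. */
int CAN_Arbitrate(const uint16_t ids[], int count){
  uint32_t contending;             /* bit i set: node i still transmitting */
  int bit, i;
  if(count < 1 || count > 32){
    return -1;
  }
  contending = (count == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << count) - 1u);
  for(bit = 10; bit >= 0; bit--){  /* most significant bit first */
    int bus = 1;                   /* recessive unless a node drives 0 */
    for(i = 0; i < count; i++){    /* wired-AND of all transmitted bits */
      if((contending >> i) & 1u){
        bus &= (ids[i] >> bit) & 1;
      }
    }
    for(i = 0; i < count; i++){    /* recessive sent, dominant observed */
      if(((contending >> i) & 1u) && (((ids[i] >> bit) & 1) != bus)){
        contending &= ~((uint32_t)1 << i);   /* node i loses, stops */
      }
    }
  }
  for(i = 0; i < count; i++){
    if((contending >> i) & 1u){
      return i;
    }
  }
  return -1;
}
```

For identifiers $123, $0F0, $7FF, and $0F1 the node sending $0F0 wins. The node sending $0F1 stays in contention until the last bit, where it sends recessive, observes dominant, and drops out.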
12.6.2 Details of the 9S12 CAN
Table 12.2 shows the I/O registers used for the CAN. The 9S12 CAN receiver has a FIFO queue, which can hold up to five incoming messages, as shown in Figure 12.18. The 9S12 CAN transmitter uses a priority queue, which can hold up to three outgoing messages. To transmit a message, the software writes the message into addresses $0170 to $017F. The software specifies the priority of the outgoing message (CANTXTBPR at $017F). High priority messages go to the front of the queue and are transmitted next. Low priority messages go to the back of the queue and are transmitted only when no higher priority messages are ready. Once a message is in the queue, the CAN hardware is responsible for handling the priority, timing, transmitting the message, error detection, and retransmission if an error occurs. To retrieve the contents of an incoming message from the receive FIFO queue, the software reads from addresses $0160 to $016F.
Observation: When designing systems that use a sophisticated I/O interface like the CAN, it is easy to confuse the activities automatically handled by the CAN hardware module with the activities your software must perform. The solution to this problem is to look at software examples to see exactly the kinds of tasks your software must perform.
Address      Bit 7    6        5        4        3        2        1        Bit 0    Name
$0140        RXFRM    RXACT    CSWAI    SYNCH    TIME     WUPE     SLPRQ    INITRQ   CANCTL0
$0141        CANE     CLKSRC   LOOPB    LISTEN   0        WUPM     SLPAK    INITAK   CANCTL1
$0142        SJW1     SJW0     BRP5     BRP4     BRP3     BRP2     BRP1     BRP0     CANBTR0
$0143        SAMP     TSEG22   TSEG21   TSEG20   TSEG13   TSEG12   TSEG11   TSEG10   CANBTR1
$0144        WUPIF    CSCIF    RSTAT1   RSTAT0   TSTAT1   TSTAT0   OVRIF    RXF      CANRFLG
$0145        WUPIE    CSCIE    RSTATE1  RSTATE0  TSTATE1  TSTATE0  OVRIE    RXFIE    CANRIER
$0146        0        0        0        0        0        TXE2     TXE1     TXE0     CANTFLG
$0147        0        0        0        0        0        TXEIE2   TXEIE1   TXEIE0   CANTIER
$014A        0        0        0        0        0        TX2      TX1      TX0      CANTBSEL
$014B        0        0        IDAM1    IDAM0    0        IDHIT2   IDHIT1   IDHIT0   CANIDAC
$0150–$0153  AC7      AC6      AC5      AC4      AC3      AC2      AC1      AC0      CANIDAR0–CANIDAR3
$0154–$0157  AM7      AM6      AM5      AM4      AM3      AM2      AM1      AM0      CANIDMR0–CANIDMR3
$0158–$015B  AC7      AC6      AC5      AC4      AC3      AC2      AC1      AC0      CANIDAR4–CANIDAR7
$015C–$015F  AM7      AM6      AM5      AM4      AM3      AM2      AM1      AM0      CANIDMR4–CANIDMR7
$0160        ID10     ID9      ID8      ID7      ID6      ID5      ID4      ID3      CANRXIDR0
$0161        ID2      ID1      ID0      RTR      IDE=0    0        0        0        CANRXIDR1
$0164–$016B  DB7      DB6      DB5      DB4      DB3      DB2      DB1      DB0      CANRXDSR0–CANRXDSR7
$016C        0        0        0        0        DLC3     DLC2     DLC1     DLC0     CANRXDLR
$0170        ID10     ID9      ID8      ID7      ID6      ID5      ID4      ID3      CANTXIDR0
$0171        ID2      ID1      ID0      RTR      IDE=0    0        0        0        CANTXIDR1
$0174–$017B  DB7      DB6      DB5      DB4      DB3      DB2      DB1      DB0      CANTXDSR0–CANTXDSR7
$017C        0        0        0        0        DLC3     DLC2     DLC1     DLC0     CANTXDLR
$017F        PRIO7    PRIO6    PRIO5    PRIO4    PRIO3    PRIO2    PRIO1    PRIO0    CANTXTBPR
Table 12.2 9S12 CAN ports.
Figure 12.18 Data flow through the 9S12 CAN controller. [Figure: the 9S12C32 CAN controller, with a transmit priority queue and a receive FIFO queue (each entry holding identifier, DLC, and data), connected through PM1 (TxD) and PM0 (RxD) to an MCP2551 transceiver that drives CANH and CANL.]
The CANCTL0 and CANCTL1 registers contain flags and control bits. RXFRM is the Received Frame Flag. It is set when a receiver has received a valid message correctly, independently of the filter configuration. Once set, it remains set until cleared by software or reset. Clearing is done by writing a ‘1’ to the bit. RXACT is the Receiver Active Status flag. This read-only flag indicates the CAN is receiving a message. SYNCH is the Synchronized Status flag. This read-only flag indicates whether the CAN is synchronized to the CAN bus and, as such, can participate in the communication process. INITRQ is the Initialization
12 䡲 Communication Systems
Mode Request bit. When this bit is set by the CPU, the CAN skips to Initialization Mode. Any ongoing transmission or reception is aborted and synchronization to the bus is lost. The module indicates entry to Initialization Mode by setting INITAK=1. SLPRQ is the Sleep Mode Request bit. This bit requests the CAN to enter Sleep Mode, which is an internal power saving mode. The Sleep Mode request is serviced when the CAN bus is idle, i.e., the module is not receiving a message and all transmit buffers are empty. The module indicates entry to Sleep Mode by setting SLPAK=1. CANE is the CAN Enable bit, which we set to 1 to enable the CAN module. If it is 0, then the module is disabled. CLKSRC is the CAN Clock Source bit, which defines the clock source for the CAN module. We set it to 1 to use the Bus Clock, and to 0 to use the Oscillator Clock. The frequency of the Oscillator Clock is equal to the frequency of the external crystal. The Bus Clock is the frequency at which data is accessed on the bus and is a function of both the crystal and the PLL. We define the time quantum, Tq, as the period of the selected clock. LISTEN is the Listen Only Mode bit, which configures the CAN as a bus monitor. When the bit is set, all valid CAN messages with matching ID are received, but no acknowledgement or error frames are sent out.
The CANBTR0 and CANBTR1 registers provide for bus timing control, and can only be written in initialization mode. SJW1 and SJW0 are the Synchronization Jump Width bits, which we will set to zero for high speed communication. BRP5-BRP0 are the Baud Rate Prescaler bits; let x be the 6-bit number formed by these bits. The clock period used to create the individual bit timing is (x+1)*Tq. SAMP is the Sampling bit, which determines the number of samples of the serial bus to be taken per bit time. If set, three samples per bit are taken: the regular one (sample point) and two preceding samples, combined using a majority rule.
For higher bit rates, it is recommended that SAMP be cleared, which means that only one sample is taken per bit. There are three time segments for each transmitted bit. Segment 0 is exactly one clock period, but the lengths of the other two segments are programmed using CANBTR1. The input bit is sampled at the time between Segment 1 and Segment 2. TSEG22-TSEG20 are the three Time Segment 2 bits; let y be the 3-bit number formed by these bits. The length of Segment 2 will be y+1 clock periods. TSEG13-TSEG10 are the four Time Segment 1 bits; let z be the 4-bit number formed by these bits. The length of Time Segment 1 will be z+1 clock periods. The time for each bit includes all three segments:
Bit Time = Tq*(x+1)*(3+y+z)
Checkpoint 12.14: What is the relationship between y and z if we wish to sample the input in the middle of the bit interval?
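The bit-time relation, Bit Time = Tq*(x+1)*(3+y+z), can be checked numerically with a short helper. This is a sketch that runs on any host compiler; the function name can_bit_rate is hypothetical, and the field positions are taken from the CANBTR0 and CANBTR1 rows of Table 12.2.

```c
#include <stdint.h>

/* Bit rate in bits/sec for a given CAN clock frequency (Hz) and raw
   CANBTR0/CANBTR1 register values, using
   Bit Time = Tq*(x+1)*(3+y+z), where Tq = 1/clock_hz. */
uint32_t can_bit_rate(uint32_t clock_hz, uint8_t btr0, uint8_t btr1) {
  uint32_t x = btr0 & 0x3F;          /* BRP5-BRP0, prescaler */
  uint32_t y = (btr1 >> 4) & 0x07;   /* TSEG22-TSEG20 */
  uint32_t z = btr1 & 0x0F;          /* TSEG13-TSEG10 */
  return clock_hz / ((x + 1) * (3 + y + z));
}
```

For an 8 MHz oscillator with CANBTR0=0x03 and CANBTR1=0x23, the divisor is 4*8=32 and the result is 250,000 bits/sec.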
CANRFLG is the Receiver Flag Register. The WUPIF, CSCIF, RSTAT1, RSTAT0, TSTAT1, TSTAT0, OVRIF, and RXF flags are cleared by writing a ‘1’ to the corresponding bit position. Every flag has an associated interrupt arm bit in the CANRIER register. For low power applications, we can place the system in Sleep Mode. WUPIF is the Wake-Up Interrupt Flag, which is used to detect bus activity while in Sleep Mode. This bit is 1 when it has detected activity on the bus and requested wake-up. CSCIF is the CAN Status Change Interrupt Flag. This flag is set when the CAN changes its current bus status as shown in the 4-bit (RSTAT[1:0], TSTAT[1:0]) status register.
The coding for the bits RSTAT1, RSTAT0 is:
00 Rx OK:       0 ≤ Receive Error Counter ≤ 96
01 Rx Warning:  96 < Receive Error Counter ≤ 127
10 Rx Error:    127 < Receive Error Counter
11 Bus-Off:     255 < Transmit Error Counter
The coding for the bits TSTAT1, TSTAT0 is:
00 Tx OK:       0 ≤ Transmit Error Counter ≤ 96
01 Tx Warning:  96 < Transmit Error Counter ≤ 127
10 Tx Error:    127 < Transmit Error Counter ≤ 255
11 Bus-Off:     255 < Transmit Error Counter
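The TSTAT coding amounts to classifying the transmit error counter into four ranges. The sketch below shows one way software might mirror that classification; the type and function names are hypothetical, and the exact treatment of the boundary values 96, 127, and 255 is an assumption (96 and 127 are taken as belonging to the lower range).

```c
/* Transmit status codes, numbered to match TSTAT1,TSTAT0. */
typedef enum {
  TX_OK      = 0,   /* 00 */
  TX_WARNING = 1,   /* 01 */
  TX_ERROR   = 2,   /* 10 */
  TX_BUS_OFF = 3    /* 11 */
} can_tx_state_t;

/* Classify a transmit error counter value.
   Assumption: range boundaries are inclusive on the upper end. */
can_tx_state_t can_tx_state(unsigned int tec) {
  if (tec <= 96)  return TX_OK;
  if (tec <= 127) return TX_WARNING;
  if (tec <= 255) return TX_ERROR;
  return TX_BUS_OFF;
}
```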
Excessive transmitter errors will turn off both the receiver and the transmitter. OVRIF is the Overrun Interrupt Flag, which is set when a data overrun condition occurs. In particular, an overrun occurs when five valid messages are in the receive FIFO, and a sixth message is received. RXF is the Receive Buffer Full Flag, which is set by the CAN when a new message is shifted into the receiver FIFO. This flag indicates whether the shifted buffer is loaded with a correctly received message (matching identifier, matching Cyclic Redundancy Code (CRC), and no other errors detected). After the CPU has read that message from locations $0160-$016F, the RXF flag must be cleared to release the buffer. If armed (RXFIE), this bit will request an interrupt.
The software can configure the 9S12 CAN to filter incoming messages. Accepted messages will set the RXF flag and will be available for processing. Dropped messages will not set the RXF flag and will be discarded. CANIDAR0-7 are the Identifier Acceptance Registers. CANIDMR0-7 are the corresponding Identifier Mask Registers. These registers can only be set in initialization mode. CANIDAC is the Identifier Acceptance Control Register. The two bits IDAM1 and IDAM0 specify the Identifier Acceptance Mode. Binary 00 means the eight acceptance registers are configured as two 32-bit filters. Binary 01 means the eight acceptance registers are configured as four 16-bit filters. Binary 10 means the eight acceptance registers are configured as eight 8-bit filters. Binary 11 means the filter is closed, meaning no message will be accepted and the foreground buffer is never reloaded. On reception, each message is written into the background receive buffer. The CPU is only signaled to read the message if it passes the criteria in the identifier acceptance and identifier mask registers (accepted); otherwise, the message is overwritten by the next message (dropped).
The acceptance registers of the CAN are applied on the IDR0 to IDR3 registers of incoming messages in a bit by bit manner. Mask bits AM7-AM0 are set to 0 to specify the corresponding bit will be filtered, and a mask bit of 1 means the corresponding bit will match (be acceptable) regardless of ID bit value. AC7-AC0 comprise a user defined sequence of bits with which the corresponding bits of the related identifier register (IDRn) of the receive message buffer are compared. The result of this comparison is then masked with the corresponding identifier mask register. The three bits IDHIT2, IDHIT1, and IDHIT0 specify which filter applied to the message currently available in the receive FIFO.
Observation: To enable the receiver to accept all messages, set the mask registers to 0xFF.
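The compare-then-mask rule can be written as a one-line test. The following is a host-side sketch for a single 8-bit filter, not the hardware implementation; the function name can_filter_hit is hypothetical.

```c
#include <stdint.h>

/* Return 1 if identifier byte idr passes an 8-bit acceptance filter.
   ac is the acceptance code (AC7-AC0), am is the mask (AM7-AM0).
   A mask bit of 1 means "don't care"; a mask bit of 0 means the
   identifier bit must equal the acceptance bit. */
int can_filter_hit(uint8_t idr, uint8_t ac, uint8_t am) {
  uint8_t differs = (uint8_t)(idr ^ ac);   /* 1 where the bits disagree */
  uint8_t cared   = (uint8_t)~am;          /* 1 where the bit is filtered */
  return (differs & cared) == 0;
}
```

A mask of 0xFF makes every bit a don't-care, so any identifier byte passes, which is consistent with the observation about accepting all messages.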
CANTFLG is the Transmitter Flag Register. The flags are cleared by writing a ‘1’ to the corresponding bit position. Every flag has an associated interrupt arm bit in the CANTIER register. TXE2, TXE1, and TXE0 are the Transmitter Buffer Empty bits, which indicate that the associated transmit message buffer is empty, and thus not scheduled for transmission. The CPU must clear the flag after a message is set up in the transmit buffer and is due for transmission. The CAN sets the flag after the message is sent successfully. The flag is also set by the CAN when the transmission request is successfully aborted due to a pending abort request. There are three transmit buffers in the priority queue, but only one is accessible at addresses $0170-$017F. CANTBSEL is the Transmit Buffer Selection register, defining which buffer will be accessible. In particular, TX2, TX1, and TX0 are the Transmit Buffer Select bits. The lowest numbered bit that is set places the respective transmit buffer in the $0170-$017F space (e.g., if CANTBSEL is binary 011, transmit buffer 0 is selected). Read and write accesses to the selected transmit buffer will be blocked if the corresponding TXEx bit is cleared and the buffer is scheduled for transmission. IDE is the ID Extended bit, which indicates whether the extended or standard identifier format is applied in this buffer. In the case of a receive buffer, the flag is set as received and indicates to the CPU how to process the buffer identifier registers. In the case of a transmit buffer, the flag indicates to the CAN what type of identifier to send. IDE=1 means
Extended format (29 bit), and IDE=0 means Standard format (11 bit). RTR is the Remote Transmission Request bit, which reflects the status of the Remote Transmission Request bit in the CAN frame. In the case of a receive buffer, it indicates the status of the received frame and supports the transmission of an answering frame in software. In the case of a transmit buffer, this flag defines the setting of the RTR bit to be sent. RTR=1 means Remote frame and RTR=0 means Data frame.
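For the standard format, the identifier register layout in Table 12.2 (IDR0 holds ID10-ID3; IDR1 holds ID2, ID1, ID0, RTR, IDE, then three zero bits) suggests the following packing. This is a sketch with hypothetical function names, written for a host compiler rather than taken from the driver.

```c
#include <stdint.h>

/* Pack an 11-bit standard identifier and the RTR flag into two bytes
   laid out like IDR0 and IDR1 in Table 12.2. IDE is left at 0
   (standard format). */
void can_pack_std_id(uint16_t id, int rtr, uint8_t *idr0, uint8_t *idr1) {
  id &= 0x07FF;                                  /* keep 11 bits */
  *idr0 = (uint8_t)(id >> 3);                    /* ID10-ID3 */
  *idr1 = (uint8_t)(((id & 0x07) << 5)           /* ID2-ID0 in bits 7-5 */
                    | ((rtr ? 1 : 0) << 4));     /* RTR in bit 4, IDE=0 */
}

/* Recover the 11-bit identifier from the two register bytes. */
uint16_t can_unpack_std_id(uint8_t idr0, uint8_t idr1) {
  return (uint16_t)(((uint16_t)idr0 << 3) | (idr1 >> 5));
}
```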
12.6.3 9S12 CAN Device Driver
Program 12.6 Initialization of the 9S12 CAN network.
The device driver for the 9S12-based CAN network is divided into three components: initialization, transmission, and reception. Although the 9S12 can handle standard and extended message formats, this software system will be configured to handle only the standard format. Program 12.6 gives the initialization code for the interface. The high-level software on all nodes of the network will call CAN_Open() to initialize the CAN modules. If a node wishes to send 0 to 8 bytes of data to the other nodes, it would pass the information to CAN_Send(), which will transmit the message via the CAN bus. This information would then be retrieved by the receiving nodes by calling CAN_Receive(). The receiver will generate an interrupt when a new message is ready, and a FIFO queue will be used to pass the message from the background to the foreground. Each entry in the FIFO needs to be at least 11 bytes long: 2 bytes for the 11-bit ID, 1 byte for the 4-bit length, and 8 bytes for the data. The CAN is enabled by setting the CANE bit. In order to set the configuration registers, the CAN must be in initialization mode. If the main program calls CAN_Open a second time, there may be transmit or receive messages in progress. In order to prevent errors, this ritual will first request a transfer into Sleep Mode. This request will allow incoming and outgoing messages to complete before acknowledging Sleep Mode has been entered. Once in Sleep Mode, this ritual can safely request the CAN enter Initialization Mode. The initialization sequence turns off Listen Mode, sets the clock, and establishes the acceptance filters. Setting all the acceptance masks to 0xFF means all messages will be accepted.
void CAN_Open(void){
asm sei                           // make atomic
  CANFifo_Init();                 // Initialize FIFO data structure
  CANCTL1 |= 0x80;                // CANE=1, Enable CAN
  CANCTL0 |= 0x02;                // SLPRQ=1, go to sleep first
  while((CANCTL1&0x02)==0){};     // SLPAK signifies Sleep Mode
  CANCTL0 &= ~0x02;               // SLPRQ=0, leave Sleep Mode
  CANCTL0 |= 0x01;                // INITRQ=1, Enter Initialization Mode
  while((CANCTL1&0x01)==0){};     // INITAK signifies Initialization Mode
  CANCTL1 &= ~0x10;               // LISTEN=0, get out of Listen-only mode
  CANCTL1 &= ~0x40;               // CLKSRC=0, use oscillator clock
  CANIDAC = 0x10;                 // four 16-bit filters
  CANIDMR0 = 0xFF; CANIDMR1 = 0xFF; CANIDMR2 = 0xFF; CANIDMR3 = 0xFF;
  CANIDMR4 = 0xFF; CANIDMR5 = 0xFF; CANIDMR6 = 0xFF; CANIDMR7 = 0xFF;
  CANBTR0 = 0x03;                 // (x+1)=4, assume oscillator is 8 MHz
  CANBTR1 = 0x23;                 // (3+y+z)=8, divide by 32 gives 250,000 bits/sec
  CANCTL0 &= ~0x01;               // INITRQ=0, Leave Initialization mode
  while(CANCTL1&0x01){};          // wait for the end of initialization
  CANRIER |= 0x01;                // Arm RxF, interrupt on receive message
asm cli                           // Enable interrupts
}
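The software FIFO that passes received messages from the interrupt service routine to the foreground is initialized by CANFifo_Init() but not shown here. One possible shape for it is sketched below. The type name CanMsg, the MsgFifo_ function names, and the queue depth are all hypothetical, so this should be read as an illustration of the 11-byte entry described in the text rather than as the book's implementation.

```c
#include <stdint.h>

#define FIFO_SIZE 8   /* number of slots, an arbitrary choice */

/* One queue entry: identifier, length, and up to 8 data bytes. */
typedef struct {
  uint16_t id;        /* 11-bit identifier */
  uint8_t  length;    /* number of data bytes, 0 to 8 */
  uint8_t  data[8];   /* message payload */
} CanMsg;

static CanMsg Fifo[FIFO_SIZE];
static volatile uint8_t PutI, GetI;   /* next slot to write / read */

void MsgFifo_Init(void) { PutI = GetI = 0; }

/* Intended for the receive interrupt; returns 0 if the FIFO is full. */
int MsgFifo_Put(const CanMsg *msg) {
  uint8_t next = (uint8_t)((PutI + 1) % FIFO_SIZE);
  if (next == GetI) return 0;          /* full, message is not stored */
  Fifo[PutI] = *msg;
  PutI = next;
  return 1;
}

/* Intended for the foreground; returns 0 if the FIFO is empty. */
int MsgFifo_Get(CanMsg *msg) {
  if (GetI == PutI) return 0;          /* empty */
  *msg = Fifo[GetI];
  GetI = (uint8_t)((GetI + 1) % FIFO_SIZE);
  return 1;
}
```

Because one slot is kept empty to distinguish full from empty, this version stores at most FIFO_SIZE-1 messages.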
Program 12.7 shows the software used to transmit a message. It begins by waiting for an empty transmit buffer. After the first while loop, one or more bits in the CANTFLG register will be set. Each flag bit that is set means its corresponding buffer is free.
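Since more than one flag bit may be set, it helps to see which buffer ends up selected under the lowest-numbered-bit rule described for CANTBSEL. The helper below is a host-side sketch of that rule only; the function name is hypothetical and it does not touch any hardware register.

```c
#include <stdint.h>

/* Given the flag pattern written to the select register (bits 2-0),
   return the index (0, 1, or 2) of the transmit buffer chosen by the
   lowest-numbered-set-bit rule, or -1 if no bit is set. */
int can_selected_buffer(uint8_t flags) {
  uint8_t w = flags & 0x07;                       /* TX2, TX1, TX0 only */
  uint8_t lowest = (uint8_t)(w & (uint8_t)(~w + 1)); /* isolate lowest set bit */
  if (lowest & 0x01) return 0;
  if (lowest & 0x02) return 1;
  if (lowest & 0x04) return 2;
  return -1;
}
```

For example, a pattern of binary 011 selects buffer 0, and binary 110 selects buffer 1.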
Program 12.7 Transmit a message on the 9S12 CAN network.
void CAN_Send(unsigned short id, char length, char *data, char priority) {
  char *pt=(char*)&_CANTXDSR0;    // points to transmit message buffer
  while((CANTFLG&0x07)== 0){};    // Wait for transmit buffer available
  CANTBSEL = CANTFLG;             // Request selection of empty xmt buf
  CANTXIDR0 = id>>3;              // Write Identifier into ID registers
  CANTXIDR1 = id

TextFormat...command. The second unique feature is the extensive error reporting that is included when a syntax error is found. Rather than reporting the usual terse “syntax error” message, this assembler attempts to suggest ways to correct the error. The third unique feature is its expression evaluator, which applies the standard rules of precedence and parentheses. For example
five equ '0'|(3+4*5/10)
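The value of that expression can be worked out by hand or checked in C. The sketch below assumes the assembler's "standard rules" agree with C for these operators (multiplication and division before addition, integer division, parentheses evaluated before the bitwise OR); the function name is hypothetical.

```c
/* The same expression written in C.
   4*5 = 20, 20/10 = 2, 3+2 = 5, and '0' | 5 = 0x30 | 0x05 = 0x35. */
int five_value(void) {
  return '0' | (3 + 4 * 5 / 10);
}
```

Under these assumptions the symbol five is given the value 0x35, which is the ASCII code for the character '5'.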
The application will automatically create object code for the microcomputer you have selected. There are many assemblers on the market for the 9S12. This assembler attempts to support much of their syntax and many of their commands. This means you can import software originally written for these other assemblers into TExaS.
Checkpoint A1.4: What is object code?
Instruction Set Simulator: This part of the TExaS application implements the basic central processing unit (CPU) of the 9S12. The program counter, or PC, is a special register located in the CPU used to control program execution. Software is a sequence of specific instructions to be executed. These instructions are fetched/executed using the current value of the program counter, PC. The simulated memory supports the appropriate amount of RAM, EEPROM, or ROM. The bus is a set of digital signals that connect the CPU, memory and I/O devices. The unique aspects of the instruction set simulator are the bus cycle activity and the extensive error checking. The exact read/write cycles of the 6811 can be viewed. The three-element 16-bit instruction queue of the 9S12 will be explained in Chapter 3. TExaS simulates the 9S12 memory bus activity using simplified 8-bit memory read/write cycles similar to those of the 6811. Although the software/hardware timing is accurate when simulating a 9S12, the memory bus activity is shown in a simplified format. Information collected during simulation is recorded in the TheLog.rtf file. The Mode
A1.2 䡲 Major Components of TExaS
menu commands are used to configure the instruction set simulator. There is a clear tradeoff between the simulation speed (your program runs faster) and the amount of information you can observe (your program runs slower.) Some Mode menu commands are illustrated in Table A1.2.
Table A1.2 TExaS Mode menu commands.
Command            Description
Processor          Specify which processor to simulate
Simulation Speed   Choose the number of instructions between screen updates
Run Mode           Specify various execution modes
Open S19 Mode      Specify parameters for loading externally-compiled S19 code
Follow PC          Cursor in program document is updated
Halt on Error      Halt execution on a program error
Break Mode         Toggle between breakpoints and scanpoints
Cycle View         Show bus cycles in TheLog.RTF during execution
Instruction View   Show instructions in the TheLog.RTF during execution
Log Record         Record View Box data in TheLog.RTF during execution
The Mode->Processor command allows you to select the processor and memory configuration of the microcomputer. The Mode->SimulationSpeed command allows you to choose the number of instructions executed between screen updates. Many of the Mode menu commands can be set in one dialog box using the Mode->RunMode command. You use the Mode->OpenS19Mode command to specify how TExaS imports object code created with a cross-compiler. Toggle the Mode->FollowPC command to enable and disable highlighting the current instruction being executed in the TheList.rtf window. Toggle the Mode->CycleView command to enable and disable showing memory bus cycles in TheLog.rtf during execution. Toggle the Mode->InstructionView command to enable and disable displaying instructions in the TheLog.rtf as they are executed. Toggle the Mode->LogRecord command to enable and disable recording strategic information in TheLog.rtf during execution. While the Mode menu is used to configure the simulation, the Action menu initiates activity. Some Action menu commands are shown in Table A1.3.
Table A1.3 TExaS Action menu commands.
Command         Key        Description
Reset           F9         Hardware reset
Step            F10        Single step, execute one instruction
StepOver        Shft F10   Step over, execute 1 instruction or 1 subroutine
StepOut         Alt F10    Finish subroutine
RuntoCursor     Alt 1      Run to cursor in listing file
Few             F11        Execute a few instructions and stop
Go              F12        Start/stop execution
OpenS19Again    Alt 2      Reload S19 object code and listing file
OpenS19         Alt 3      Load S19 object code and listing file
BackDump        Alt 4      Dump log data from most recent instructions executed
BreakatCursor   Alt 5      Place a breakpoint at listing cursor
The Action->Reset (F9) command performs a hardware reset on the microcomputer system. The Action->Step (F10) command executes one instruction. The Action->StepOver (Shft F10) command will execute one instruction. If that instruction is a subroutine call, then the entire subroutine will be executed. The Action->StepOut (Alt F10) will execute until the subroutine is finished then stop. Toggle the Action->Go (F12) command to start and stop simulation. You use the Action->OpenS19 command to
Appendix 1 䡲 Embedded System Development Using TExaS
import object code created with a cross-compiler. Executing the Action->BackDump (Alt 4) will display the activity generated by the most recent instructions executed. The second unique aspect of this simulator is the error checking. Examples of illegal activity include:
• Execution of an illegal instruction
• Read/write to an undefined address
• Stack underflow (causing a read/write from unimplemented memory)
• Write to ROM, EEPROM
• Read from unprogrammed ROM, EEPROM
• Read from RAM that has not yet been written to
• Read from an unimplemented I/O port
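To make two of the checks in the list above concrete (access to an undefined address, and a read from RAM that has not yet been written), here is a minimal sketch of how a simulated memory could detect them. It is not how TExaS is implemented; the names, the 256-byte size, and the error codes are all hypothetical.

```c
#include <stdint.h>

#define RAM_SIZE 256   /* size of the simulated RAM, arbitrary */

typedef enum {
  MEM_OK,
  MEM_UNDEFINED_ADDRESS,     /* address outside the simulated RAM */
  MEM_UNINITIALIZED_READ     /* read before any write to that byte */
} MemError;

static uint8_t Ram[RAM_SIZE];
static uint8_t Written[RAM_SIZE];   /* 1 once the byte has been stored to */

/* Store a byte, remembering that the location now holds valid data. */
MemError sim_write(uint16_t addr, uint8_t value) {
  if (addr >= RAM_SIZE) return MEM_UNDEFINED_ADDRESS;
  Ram[addr] = value;
  Written[addr] = 1;
  return MEM_OK;
}

/* Load a byte, reporting an error instead of returning garbage. */
MemError sim_read(uint16_t addr, uint8_t *value) {
  if (addr >= RAM_SIZE) return MEM_UNDEFINED_ADDRESS;
  if (!Written[addr]) return MEM_UNINITIALIZED_READ;
  *value = Ram[addr];
  return MEM_OK;
}
```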
These error-checking operations will catch many run-time programming errors. Real 9S12 microcomputers will execute an unimplemented instruction trap interrupt (like a software interrupt) when the processor attempts to execute an illegal instruction. The TExaS application gives you the option of halting simulation or executing the trap interrupt (like the real microcomputer). You select this option using the Mode->RunMode... command. Software bugs often result in one of the illegal activities shown above. Whereas a real computer gives garbage data then continues on when executing an illegal read or write, this simulator will report the error and stop. Debugging, the process of identifying and removing software errors, is an important aspect of embedded system development, and this simulator has many powerful debugging tools.
I/O Port Simulator: This part of the application simulates many of the I/O peripherals on the microcomputer. Simple peripherals include the parallel I/O ports with direction registers as appropriate. Other functions like the timer, timer overflow, input capture, output compare, key wakeup, serial communications interface and ADC are available as supported by the actual microcomputer device. Most but not all peripherals on the 9S12 are supported. For a complete list of implemented features see the Port12.rtf file created when TExaS is installed. Interrupts (flags, masks, priority and vectors) are accurately simulated. The application will automatically simulate the I/O ports for the specific microcomputer you have selected. For the latest implementation details see the readme.txt file. If your software accesses an unimplemented I/O port, a run-time error is generated. As we saw earlier, the command Mode->HaltonError will enable/disable the reporting of run-time errors.
External Device Simulator: This is one of the most complex yet important parts of the TExaS application.
What makes embedded system programming interesting is its interaction with physical devices external to the microcomputer. These devices are configured using the commands in the IO menu. An IO file must be open to create external I/O devices. The user interacts with these devices (e.g., toggling a switch) using the IO window. Logic probes and voltmeters are automatically attached as appropriate. A logic analyzer and oscilloscope can be added to provide visual information about signals outside the microcomputer chip.
A1.3 Embedded System Design Process
Figures A1.1 through A1.4 illustrate the design process of an embedded system with four switches and four light emitting diodes (LEDs) using a simulator like TExaS. The requirement of this system is to have each switch control an LED. If a switch is pressed, the corresponding LED comes on. Figure A1.1 shows the circuit diagram of the system, drawn during the early stages of the design.
Figure A1.1 Circuit diagram of an embedded system with four inputs and four outputs. [Figure: inputs on PH3-PH0, processing by the 9S12, and outputs on PT3-PT0, interfaced with 74HC14 and 7405 devices.]
Next, the system is “built” in the simulator, and Figure A1.2 shows the I/O window within TExaS.
Figure A1.2 TExaS simulation of an embedded system with four inputs and four outputs.
Next, we will discuss the process of software development within the simulator. One way to develop assembly software is to first write the software in a high level language like C, then convert the software by hand into assembly. Program A1.1 shows the C code for this simple system. The line numbers are not part of the program, but were added for this example in order to help with the explanation. Characters between /* and */ are comments, and are added as documentation. Lines 2, 3, and 4 are inserted by the compiler wizard when a new project is created. In particular, Line 3 defines symbols for all the I/O ports on the 9S12DP512. These symbols make the software easier to read. For example, line 11 could have been written as
*(unsigned char volatile *)(0x0240) = Data;
Program A1.1 C language program for the 9S12DP512 written in Metrowerks CodeWarrior.
1  /* ********ChapA1.c**********************/
2  #include /* common defines and macros */
3  #include <mc9s12dp512.h> /* derivative information */
4  #pragma LINK_INFO DERIVATIVE "mc9s12dp512"
5  unsigned char Data;
6  void main(void){
7    DDRH = 0x00;   /* Port H is an input */
8    DDRT = 0xFF;   /* Port T is an output */
9    while(1){
10     Data = PTH;  /* read switch value into Data */
11     PTT = Data;  /* write value to LEDs */
12   }
13 }
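To show how port symbols such as PTT relate to absolute addresses, the sketch below defines them by hand instead of relying on the derivative header. The addresses are the ones used in Program A1.2. Because dereferencing address 0x0240 only makes sense on the real chip, this version indexes a simulated array (SimMem, a hypothetical stand-in) so that it can be compiled and run on a host PC; the macro REG8 and the function name are likewise illustrative.

```c
#include <stdint.h>

/* Stand-in for the 9S12 register space (assumption: host-side test only). */
static volatile uint8_t SimMem[0x0400];

/* On the real chip this would be (*(unsigned char volatile *)(addr)). */
#define REG8(addr) (SimMem[(addr)])

#define PTT  REG8(0x0240)   /* Port T I/O register */
#define DDRT REG8(0x0242)   /* Port T data direction register */
#define PTH  REG8(0x0260)   /* Port H I/O register */
#define DDRH REG8(0x0262)   /* Port H data direction register */

/* One pass of the while(1) loop: copy the switch inputs to the LEDs. */
void copy_switches_to_leds(void) {
  uint8_t data = PTH;   /* read switch value */
  PTT = data;           /* write value to LEDs */
}
```

Writing a test pattern into PTH and calling copy_switches_to_leds() leaves the same pattern in PTT, which mirrors what lines 10 and 11 of the C program do on hardware.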
Line 5 defines a global variable. Line 7, when executed, defines Port H as an input port. Similarly, the execution of Line 8 makes Port T an output. The while(1) code causes lines 10 and 11 to be executed over and over. Executing the code Data = PTH; will bring a copy of the 8 input pins of Port H into the global variable. The code PTT = Data; stores the value from the global out to the output Port T, changing the pattern on the LED lights. Program A1.2 shows the assembly code for the simple system shown previously in Figures A1.1 through A1.4. The line numbers are not part of the program, but were added for this example in order to help with the explanation. Characters that follow a semicolon (;) are comments, and are added as documentation. A line of assembly code has four fields. The label field is optional and starts in the leftmost column. The next field contains the op code (like ldaa) or a pseudo-op code (like equ). Op codes contain actual instructions to be executed by the computer. For example, ldaa brings a value from memory or I/O port into Register A, staa sends a value from Register A out to memory or I/O port, and bra causes the program to branch. Pseudo-op codes give instructions to the assembler and are not executed by the computer. The third field is the operand field, which contains information needed by the instruction. The instruction ldaa #n will load the number n into Register A. For example,
Program A1.2 Assembly language program for the 9S12DP512.
1  ; Appendix 1 tutorial program for the 9S12DP512
2  PTH   equ  $0260   ; Port H I/O Register
3  DDRH  equ  $0262   ; Port H Data Direction Register
4  PTT   equ  $0240   ; Port T I/O Register
5  DDRT  equ  $0242   ; Port T Data Direction Register
6        org  $0800   ;globals go in RAM
7  Data  ds   1       ;copy of Input from Port H switches
8        org  $4000   ;object code goes in ROM
9  main  ldaa #$00
10       staa DDRH    ;make all pins of Port H input
11       ldaa #$FF
12       staa DDRT
13 loop  ldaa PTH     ;read switch values
14       staa Data    ;save a copy in global variable
15       staa PTT     ;output to lights
16       bra  loop    ;repeat
17       org  $FFFE
18       fdb  main    ;starting address after a RESET
the #$00 operand field in line 9 (see footnote 10) specifies that the data to be used in the instruction is the value $00. On the other hand, the instruction ldaa N will load the contents of memory location N into Register A. For example, the PTH operand field in line 13 specifies the place to read the data will be Port H. The instruction staa N will store the contents of Register A out to memory location N. The last field, which is the rightmost field, contains comments. Comments are ignored by the computer, but used by the programmer to clarify the program operation. Tabs and/or spaces delimit the fields, which are nicely lined up in this example. Lines 2 through 5 use the pseudo-op equ to define symbols for I/O ports used on the 9S12DP512. Just like symbols in the C code, these symbols make the software easier to read. For example, line 15 could have been written as staa $0240, but notice how much easier it is to understand staa PTT. Lines 6 and 8 use the pseudo-op org to place the variables in RAM, and the program in ROM. Line 7 defines a global variable. Lines 9 through 16 give the actual instructions the 9S12 will execute. Lines 9 and 10, when executed, define Port H as an input port. The instruction ldaa #$00 places the value $00 into Register A. The instruction staa DDRH stores the value from Register A into the I/O register DDRH, making Port H an input. Executing Lines 11 and 12 makes Port T an output. Executing the instruction ldaa PTH will bring a copy of the 8 input pins of Port H into Register A. The instruction staa PTT stores the value from Register A out to the output Port T, changing the pattern on the LED lights. The bra loop instruction causes lines 13-16 to be executed over and over. Lines 17 and 18 define the reset vector, which specifies where the software will begin when power is applied or when the reset button is pushed. Figure A1.3 shows a prototype of this system constructed on a breadboard using a commercially available development board.
The software would be tested again to verify its correctness. Figure A1.4 shows the final product. The final product must also be tested.
Figure A1.3 Prototype of an embedded system with four inputs and four outputs. (Courtesy of Jonathan Valvano.)
10 The # means immediate mode (take this as data) and the $ means hexadecimal.
Figure A1.4 Final embedded system with four inputs and four outputs. (Courtesy of Jonathan Valvano.)
A1.4 Running and Modifying Existing Assembly Language Programs
The simplest place to begin using the TExaS simulator is to run an existing configuration, as listed in Table A1.1. In fact, the tutorials, located at the end of each chapter, will step you through this process in detail. In this section however, the general approach to running an existing system is presented. First, you choose the general topic of interest. For example, if you were interested in microcomputer software that used the serial port, you could choose the sci example. Next you choose the microcomputer. If you choose the 9S12DP512, open the sci.rtf file located in the MC9S12DP subdirectory. When any of the sci.* files is opened, TExaS will automatically open the related files. For example, sci.rtf is the source code, sci.uc is the microcomputer file, and sci.io is the external I/O devices. On most computers double-clicking sci.rtf will incorrectly start WordPad or Microsoft Word. So to start TExaS with these files, you could double-click the sci.uc icon. The program must be assembled before it can be executed. The assembly process involves converting the human readable assembly source code (sci.rtf) into the machine readable object code. Click on the source code (the sci.rtf window) and execute the Assemble->Assemble command (ctl-B). The TExaS assembler automatically loads the object code into simulated memory. Next, you run the simulation. You start the simulation by executing the Action->Go command (F12). There are many windows you can observe during execution. The ViewBox in the microcomputer window shows strategic information during execution. If the FollowPC mode is active (execute the Mode->FollowPC command to toggle this mode on and off), then the TheList.rtf window will show the current position of the executing software. The TheLog.rtf window can be configured to show a wide range of results during simulation.
If the CycleView mode is active (execute the Mode->CycleView command to toggle this mode on and off), then the address/data bus activity will be logged. If the InstructionView mode is active (execute the Mode->InstructionView command to toggle this mode on and off), then the executed instruction will be logged. If the LogRecord mode is active (execute the Mode->LogRecord command to toggle this mode on and off), then the parameters of the ViewBox will be dumped into the TheLog.rtf during execution. The status of external devices is shown in the IO device window. If a serial port is active then the input/output of this external device is shown in the TheCRT.rtf window. If you plan to modify these files, it makes sense to give them new names. If you didn’t change the names, and were to upgrade to a newer version of TExaS, then the install process might overwrite your programs. Use the File->SaveAs command to change the names of all the files you will be using. You should maintain the appropriate extension (*.rtf, *.uc, *.io, *.stk, *.scp), place these new files all in the same directory (but the directory may be different than the original directory that contained the existing example), and give them all the same first part of the filename. For example, you could save the source file as My.rtf, save the microcomputer file as My.uc, and save the
A1.5 䡲 TExaS Editor
491
external I/O file as My.io. The scope and stack view files if needed are saved as My.scp and My.stk respectively. Leave the TheList.rtf, TheLog.rtf, and TheCRT.rtf files alone, because these are special files that must maintain these exact file names. The second step is to reconfigure the processor and external I/O ports as needed. For example, you may wish to switch from one type of 9S12 to another. Use the Mode-> Processor. . . command to select the new microcomputer. To reconfigure the external devices, click on the I/O window and execute the appropriate command from the IO menu. If you are converting an example from one microcomputer to another, you will have to rebuild all the external I/O and scope connections to be compatible with the I/O port names of the new microcomputer. Because the I/O devices are at different memory-mapped locations, switching between microcomputers will also require you to rebuild the scope connections. The third step is to write assembly code by editing the My.rtf file. For small programs you may wish to enable automatic recolor, and for large programs you may wish to disable it. The program must be assembled (ctl-B) before it can be executed. The fourth step is configuring the simulation modes. A basic tradeoff exists between simulation speed and the ability to observe the system behavior. The commands that affect simulation are grouped in the Mode menu. These configuration settings are saved and restored with the microcomputer file (e.g., in the file My.uc). The last step is running and debugging your software. The ViewBox, Stack Window, IO window, oscilloscope and logic analyzer provide visualization of your running program. The action commands are appropriately grouped into the Action menu. The usual debugging commands of Action->Reset, Action->Go, Action->Step, Action-> StepOver, and Action->StepOut are available. Breakpoints can be set on any address (even I/O ports). 
When a read or write access occurs to a breakpoint address, the simulation can be configured to stop (halt mode) or simply copy the ViewBox parameters into the TheLog.rtf window (scan mode). Some special debugging features include Action->Few (executes a few instructions and stops) and Action->BackDump (displays the simulator state for the previous instructions). You can perform right-click commands in the listing window: Action->RuntoCursor and Action->BreakatCursor.
Sometimes it is more efficient to build a system from scratch. To develop assembly programs you need at least one source code file and one microcomputer file. If you have I/O devices, you will need an I/O file too. The first step is to create new files as needed. After creating a new microcomputer file, use the Mode->Processor . . . command to select the desired microcomputer and specify the clock period. The second step is to perform File->SaveAs operations on the files you will be developing. You should maintain the appropriate extension (*.rtf, *.uc, *.io, *.stk, *.scp), place these new files all in the same directory, and give them all the same first part of the file name. Next, follow the development cycle described in steps 3, 4, and 5 above.
A1.5 TExaS Editor
The editor is a simplified version of WordPad. You can specify fonts, sizes, and colors. Embedded figures can be added to clarify the software. For example, you can add circuit diagrams, flowcharts, and spreadsheet objects into your programs. These embedded objects are ignored by the assembler when creating the object code, but can be quite useful for documentation. The editor uses rich text format (RTF), so formatted text can be cut and pasted from other applications that support rich text format. The following lists the default color settings of the TExaS editor:
The labels are shown in purple
The op codes are shown in blue
The pseudo-op codes are shown in gray
The numbers are shown in dark blue
The strings are shown in magenta
The operands are shown in black
The comments are shown in green
The assembly errors are shown in bold red
These colors can be changed using the Assemble->TextFormat . . . command.
Checkpoint A1.5: How can you tell if an operation is an opcode or pseudo-op?
A1.6 Assembly Language Syntax
A1.6.1 Overall Structure
Programs written in assembly language consist of a sequence of source statements. Each source statement consists of a sequence of ASCII characters ending with a carriage return. Each source statement may include up to four fields: a label, an operation (instruction mnemonic or assembler directive), an operand, and a comment. We use pseudo-op codes in our source code to give instructions to the assembler itself. In the following example, equ is an assembly directive and ldaa is a regular machine instruction.
PORTA equ  $0000   ; Assembly time constant
Inp   ldaa PORTA   ; Read data from fixed address I/O data port
An assembly language statement contains the following fields.
Label Field can be used to define a symbol
Operation Field defines the operation code or pseudo-op
Operand Field specifies either the address or the data
Comment Field allows the programmer to document the software
Sometimes not all four fields are present in an assembly language statement. A line may contain just a comment. The entire line is considered a comment if the first character of the line is a star (*) or a semicolon (;). For example,
; This line is a comment
* This is a comment too
* This line is a comment
Instructions with inherent mode addressing do not have an operand field. For example,
label clra  comment
      deca  comment
      cli   comment
      inca  comment
Recommendation: For small programs, you should enable automatic assembly colors. The editor will then color each field according to its type.
Recommendation: For large programs, you should disable automatic assembly colors, because the system will run too slowly. Instead, use the assembler to color the source code explicitly each time the program is assembled.
A1.6.2 Label Field
The label field occurs as the first field of a source statement. The label field can take one of the following three forms:
A. An asterisk (*) or semicolon (;) as the first character in the label field indicates that the rest of the source statement is a comment. Comments are ignored by the assembler, and are printed on the source listing only for the programmer's information. Examples:
* This line is a comment
; This line is also a comment
B. A white-space character (blank or tab) as the first character indicates that the label field is empty. The line has no label and is not a comment. These assembly lines have no labels:
      ldaa 0
      rmb 10
C. A symbol character as the first character indicates that the line has a label. Symbol characters are the upper or lower case letters a to z, digits 0 to 9, and the special characters, period (.), dollar sign ($), and underscore (_). Symbols consist of at least one and at most 99 characters, the first of which must be alphabetic or the special characters period (.) or underscore (_). All characters are significant and upper and lower case letters are distinct.
A symbol may occur only once in the label field. If a symbol does occur more than once in a label field, then each reference to that symbol will be flagged with an error. The exception to this rule is the set pseudo-op that allows you to define and redefine the same symbol. We typically use set to define the stack offsets for the local variables in a subroutine. The set pseudo-op allows two separate subroutines to re-use the same name for their local variables.
With the exception of the equ and set directives, a label is assigned the value of the program counter of the first byte of the instruction or data being assembled. The value assigned to the label is absolute. Labels may optionally be ended with a colon (:). If the colon is used, it is not part of the label but merely acts to set the label off from the rest of the source line. Thus the following code fragments are equivalent:
here: deca
      bne here

here  deca
      bne here
A label may appear on a line by itself. The assembler interprets this as setting the value of the label equal to the current value of the program counter. A label may occur on a line with a pseudo-op. The size of the symbol table depends on the available PC computer memory, but you are typically allowed to have thousands of labels.
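The three label-field forms described above can be modeled with a short Python sketch. It is only an illustration built from the rules stated in this section, not the TExaS implementation; the function name parse_label and the regular expression are assumptions introduced here.

```python
import re

# Pattern built from the stated rules (an illustrative assumption, not TExaS code):
# first character is a letter, period, or underscore; later characters may also
# be digits or a dollar sign; total length is 1 to 99 characters.
SYMBOL = re.compile(r"[A-Za-z._][A-Za-z0-9.$_]{0,98}")

def parse_label(line):
    """Return the label on a source line, or None if the line has no label."""
    if not line or line[0] in "*;":   # form A: the whole line is a comment
        return None
    if line[0] in " \t":              # form B: the label field is empty
        return None
    field = line.split()[0]           # form C: the first field is the label
    if field.endswith(":"):           # the optional colon is not part of the label
        field = field[:-1]
    if not SYMBOL.fullmatch(field):
        raise ValueError("illegal label: " + field)
    return field

print(parse_label("here: deca"))      # label with a colon
print(parse_label("here  deca"))      # same label without the colon
print(parse_label("      bne here"))  # no label on this line
```

Because case is significant, Loop and loop would be returned as two different labels by this sketch, which matches the rule that upper and lower case letters are distinct.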
A1.6.3 Operation Field
The operation field occurs after the label field, and must be preceded by at least one white-space character. The operation field must contain a legal opcode mnemonic or an assembler directive. Upper case characters in this field are converted to lower case before being checked as a legal mnemonic. Thus nop, NOP, and NoP are recognized as the same mnemonic. Entries in the operation field may be opcodes or directives. Opcodes correspond directly to the machine instructions. The operation code includes any register name associated with the instruction. These register names must not be separated from the opcode with any white-space characters. Thus clra means clear accumulator A, but clr a means clear memory location identified by the label a. The available instructions depend on the microcomputer you are using. Directives or pseudo-ops are special operation codes known to the assembler that control the assembly process rather than being translated into machine instructions. The directives that TExaS supports are described in detail later in this chapter.
A1.6.4 Operand Field
The interpretation of the operand field is dependent on the contents of the operation field. The operand field, if required, must follow the operation field, and must be preceded by at least one white-space character. The operand field may contain a symbol, an expression, or a combination of symbols and expressions separated by commas. There can be no white-spaces in the operand field. For example, the following two lines produce identical object code because of the space between data and + in the first line. The assembler ends the operand field at that space and treats the rest of the line as a comment.
      ldaa data + 1
      ldaa data
Observation: The Metrowerks assembler allows spaces within the operand field, but requires that a semicolon (;) be placed before each comment.
The operand field of machine instructions is used to specify the addressing mode of the instruction, as well as the operand of the instruction. Table A1.4 summarizes the operand field formats on the 9S12.
Table A1.4 Example operands for the 9S12.
Operand               Format                          Example
no operand            inherent                        clra
expression            direct, extended, or relative   ldaa 4
#expression           immediate                       ldaa #4
expression,idx        indexed with address register   ldaa 4,x
expr,#expr            bit set or clear                bset 4,#$01
expr,#expr,expr       bit test and branch             brset 4,#$01,there
expr,idx,#expr,expr   bit test and branch             brset 4,x,#$01,there
The 9S12 assembly language includes some additional operand formats, as shown in Table A1.5. The accumulator offset, acc, is A, B, or D, and the index register, idx, is X, Y, SP, or PC. The PC is not allowed with any of the predecrement, postdecrement, preincrement, or postincrement addressing modes.
Table A1.5 Additional example operands for the 9S12.
Operand            Format                       Example
expression,idx+    indexed, post increment      ldd 2,SP+
expression,idx-    indexed, post decrement      ldaa 4,Y-
expression,+idx    indexed, pre increment       ldaa 4,+X
expression,-idx    indexed, pre decrement       staa 1,-SP
acc,idx            accumulator offset indexed   ldaa A,X
[expression,idx]   indexed indirect             ldaa [4,X]
[D,idx]            RegD indexed indirect        ldaa [D,Y]
The valid syntax of the operand field depends on the microcomputer. For a detailed explanation of the instructions and their addressing modes, see the help system with the TExaS application.
A1.6.5 Expressions
An expression is a combination of symbols, constants, algebraic operators, and parentheses. The expression is used to specify a value that is to be used as an operand. Expressions may consist of symbols, constants, or the character '*' (denoting the current value of the program counter) joined together by one of the operators: + - * / % & | ^.
+   add
-   subtract
*   multiply
/   divide
%   remainder after division
&   bitwise and
|   bitwise or
^   bitwise exclusive or
Expressions may include parentheses and other expressions. Expressions are evaluated using the standard arithmetic precedence. Evaluation occurs left to right for multiple operations with the same precedence. The precedence follows standard mathematical conventions, as shown in Table A1.6. Arithmetic is carried out in signed 32-bit twos-complement integer precision at assembly time.
Table A1.6 Operator precedence.
Precedence   Operation
Highest      parentheses
2            unary - ~
3            binary * / % &
Lowest       binary + - ^ |
Maintenance Tip: It is good programming practice to add parentheses, even when they are not necessary, in order to clarify the operation. For example, (A&B)|(C&D) is clearer than A&B|C&D.
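To see how this precedence differs from other languages, the following Python sketch evaluates decimal expressions with a small recursive-descent parser. It is an illustration only, not the TExaS evaluator. It assumes that binary + and - share the lowest level with ^ and |, that unary - and ~ form level 2, and it does not model the 32-bit wraparound or the treatment of negative operands in division; the function names are hypothetical.

```python
import re

# One token is a decimal number, a parenthesis, or a one-character operator.
TOKEN = re.compile(r"\d+|[()+\-*/%&|^~]")

# Assumed precedence levels (see the lead-in): * / % & bind tighter than + - ^ |
HIGH = {"*": lambda a, b: a * b, "/": lambda a, b: a // b,
        "%": lambda a, b: a % b, "&": lambda a, b: a & b}
LOW = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "^": lambda a, b: a ^ b, "|": lambda a, b: a | b}

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r" % text[pos])
        tokens.append(match.group())
        pos = match.end()
    return tokens

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ended early")
        pos += 1
        return tokens[pos - 1]

    def primary():                  # highest level: parentheses or a number
        token = take()
        if token == "(":
            value = low()
            if take() != ")":
                raise ValueError("missing )")
            return value
        if not token.isdigit():
            raise ValueError("unexpected %r" % token)
        return int(token)

    def unary():                    # level 2: unary - and ~
        if peek() == "-":
            take()
            return -unary()
        if peek() == "~":
            take()
            return ~unary()
        return primary()

    def chain(table, operand):      # left to right within one level
        value = operand()
        while peek() in table:
            op = take()
            value = table[op](value, operand())
        return value

    def high():                     # level 3: binary * / % &
        return chain(HIGH, unary)

    def low():                      # lowest level: binary + - ^ |
        return chain(LOW, high)

    result = low()
    if pos != len(tokens):
        raise ValueError("unexpected %r" % tokens[pos])
    return result

print(evaluate("1+2&4"), evaluate("(1+2)&4"))
```

Under these assumptions 1+2&4 groups as 1+(2&4), whereas C and Python would group it as (1+2)&4. This is one more reason to follow the maintenance tip and write the parentheses explicitly.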
Each symbol is associated with a 16-bit integer value that is used in place of the symbol during the expression evaluation. The asterisk (*) used in an expression as a symbol represents the current value of the location counter (the first byte of a multi-byte instruction). Constants represent numbers that do not vary in value during the execution of a program. Constants may be presented to the assembler in one of four formats: decimal, hexadecimal, binary, or ASCII. The programmer indicates the number format to the assembler with the following prefixes:
0x    hexadecimal, C syntax
$     hexadecimal, assembly syntax
%     binary
'c'   ASCII code for a single letter 'c'
Unprefixed constants are interpreted as decimal. The assembler converts all constants to binary machine code, and they are displayed in the assembly listing as hexadecimal.
A decimal constant consists of a string of numeric digits. The value of an 8-bit decimal constant ranges from -128 to 255. The value of a 16-bit decimal constant must fall in the range from -32768 to 65535. Some valid decimal constants are 12, 1235, and 3200.
A hexadecimal constant consists of a maximum of four characters from the set of digits (0 to 9) and the alphabetic letters (A-F), and is preceded by a dollar sign ($). Hexadecimal constants must be in the range $0000 to $FFFF. Some valid hexadecimal constants are $12, $ABCD, and $001f.
A binary constant consists of a maximum of 16 ones or zeros preceded by a percent sign (%). Some valid binary constants are %00101, %1, and %10100.
A single ASCII character can be used as a constant in expressions. ASCII constants are surrounded by single quotes ('). Any character, except the single quote, can be used as a character constant. Some valid character constants are '*', 'a', and 'Q'. Invalid cases will be identified as syntax errors by the assembler.
Checkpoint A1.6: What is the value of 2+4*6/5-1?
Checkpoint A1.7: The following two expressions evaluate to exactly the same result: $0F&'A'|$F0&'0' and ($0F&'A')|($F0&'0'). Which is better and why?
Checkpoint A1.8: The following two assembly code sequences produce similar results: ldaa #5+6 and ldaa #5 adda #6. How are they different?
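The constant formats can be summarized with a small Python sketch that converts one constant into its integer value. It is an illustration of the prefixes described above, not the TExaS parser; the name parse_constant is an assumption, and the range limits (8-bit, 16-bit, maximum digit counts) are deliberately not checked.

```python
import re

# (pattern, number base) for each numeric prefix described in the text
FORMATS = [
    (re.compile(r"0[xX]([0-9A-Fa-f]+)"), 16),  # 0x  hexadecimal, C syntax
    (re.compile(r"\$([0-9A-Fa-f]+)"), 16),     # $   hexadecimal, assembly syntax
    (re.compile(r"%([01]+)"), 2),              # %   binary
    (re.compile(r"([0-9]+)"), 10),             # no prefix: decimal
]

def parse_constant(text):
    """Return the integer value of one constant, or raise ValueError."""
    quoted = re.fullmatch(r"'([^'])'", text)   # 'c'  ASCII code of one character
    if quoted:
        return ord(quoted.group(1))
    for pattern, base in FORMATS:
        match = pattern.fullmatch(text)
        if match:
            return int(match.group(1), base)
    raise ValueError("syntax error in constant: " + text)

for example in ["12", "$12", "$ABCD", "$001f", "%00101", "%10100", "'a'", "0x12"]:
    print(example, "=", parse_constant(example))
```

All four formats end up as the same kind of integer, which is why a constant such as $41, %01000001, 65, or 'A' can be used interchangeably inside an expression.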
A1.6.6 Comment Field
The last field of an assembler source statement is the comment field. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the operation field if no operand is required) by at least one white-space character. The comment field can contain any printable ASCII characters. Observation: The Metrowerks assembler requires that a semicolon (;) be placed before each comment.
As software developers, our goal is to produce code that not only solves our current problem, but can also serve as the basis for solving our future problems. In order to reuse software we must leave our code in a condition such that future programmers (including ourselves) can easily understand its purpose, constraints, and implementation. Documentation is not something tacked onto software after it is done, but rather a discipline built into it at each stage of the development. We carefully develop a programming style providing appropriate comments. A comment that tells us why we perform certain functions is more informative than comments that tell us what the functions are. An example of bad comments would be:
      clr  Flag    ;Flag=0
      sei          ;Set I=1
      ldaa $0240   ;Read PTT
These are bad comments because they provide no information to help us in the future to understand what the program is doing. An example of good comments would be:
      clr  Flag    ;Signifies no key has been typed
      sei          ;The following code will not be interrupted
      ldaa $0240   ;Bit7=1 iff the switch is pressed
These are good comments because they make it easier to change the program in the future.
Self-documenting code is software written in a simple and obvious way, such that its purpose and function are self-apparent. To write wonderful code like this, we first must formulate the problem, organizing it into clear well-defined subproblems. How we break a complex problem into small parts goes a long way toward making the software self-documenting. Both the concept of abstraction and modular code address this important issue of software organization.
Maintaining software is the process of fixing bugs, adding new features, optimizing for speed or memory size, porting to new computer hardware, and configuring the software system for new situations. It is the MOST IMPORTANT phase of software development. Flowcharts are effective in the design phase of a project. Flowcharts and software manuals are good mechanisms for documenting programs only when these types of documentation are kept up to date when modifications are made.
We should use careful indenting, and descriptive names for variables, functions, labels, and I/O ports. Effective use of equ provides explanation of software function without cost of execution speed or memory requirements. A disciplined approach to programming is to develop patterns of writing that you consistently follow. Software developers are unlike short story writers. It is OK to use the same subroutine outline over and over again. In Program A1.3, notice the following style issues:
1. Begins and ends with a line of *s
2. States the purpose of the subroutine
3. Gives the input/output parameters, what they mean and how they are passed
4. Different phases (submodules) of the code delineated by a line of -s
Program A1.3 An example use of comments.
;****************** Max *******************************
; Purpose: returns the maximum of two 16-bit numbers
; Inputs: RegX and RegY are two 16-bit unsigned numbers
; Output: RegX is the maximum of the two inputs
; Destroyed: CCR
; Calling sequence
;     ldx #100     ;first number
;     ldy #200     ;second number
;     jsr Max
Max     psha       ;Save registers, that will be modified
        pshb
        pshy
; - - - - - - - - - - - - - - - - - - - - - - - - - -
        pshx       ;first number on the stack
        xgdy       ;RegD is second number
        tsx        ;access the stack
        cpd  0,x   ;which is bigger
        bhs  second ;go if second>=first
first   pulx       ;RegX =first
        bra  end
second  pulx
        xgdx       ;RegX = second
end
; - - - - - - - - - - - - - - - - - - - - - - - - - -
        puly       ;Restore registers
        pulb
        pula
        rts
;****************** End of Max *****************************
A1.6.7 Assembly Listing and Errors
The assembler output includes a listing containing the source program, the object code, and any assembly errors. The listing file is created when the TheList.rtf file is open. Each line of the listing contains a reference line number, the address and bytes assembled, and the original source input line. If an input line causes more than 8 bytes to be output (e.g., a long fcc directive), the additional bytes are included in the object code (S19 file or loaded into memory) but not shown in the listing. There are three assembly options, each of which can be toggled on/off using the Assembly->Options command.
(4)     cycles   shows the number of cycles to execute this instruction
[100]   total    gives a running cycle total since last org pseudo-op
{PPP}   type     gives the cycle type
The codes used in the cycle type are presented in Chapter 4. The end of the assembly listing contains a symbol table. The symbol table contains the name of each symbol, along with its defined value. Since the set pseudo-op can be used to redefine the symbol, the value in the symbol table is the last definition.
Programming errors fall into two categories. Simple typing/syntax errors will be flagged by the TExaS assembler as errors when the assembler tries to translate source code into machine code. The more difficult programming errors to find and remove are functional bugs that can be identified during execution, when the program does not perform as expected. Error messages are meant to be self-explanatory. The assembler has a verbose mode (see the Assembler->Options command) that provides more details about the error and suggests possible solutions. The assembler error types are listed below:
1. Label previously defined error: the same label occurs multiple times
   How to fix: check spelling of all the labels
2. Undefined opcode error: operation does not exist
   How to fix: check the spelling/availability of the instruction, verify the correct processor is being used
3. Operand error: syntax error within the operand
   How to fix expression error: check parentheses, start with a simpler expression
   How to fix undefined symbol: check spelling of both the definition and access
   How to fix addressing mode error: look up the addressing modes available for the instruction
4. Phasing error: the value of a symbol changes from pass 1 to pass 2
   How to fix: first remove any undefined symbols, then remove forward references
   If you really need a forward reference: use > and < to force extended or direct addressing
5. Can't program address error
   How to fix: use the org pseudo-op to match available memory
6. Branch too far error: destination address is too far away to use 8-bit PC-relative addressing
   How to fix: switch to long branch version of the instruction
Error diagnostic messages are placed in the listing file just after the line containing the error. If there is no TheList.rtf file, then assembly errors are reported in the TheLog.rtf file. If neither TheList.rtf nor TheLog.rtf exists, then assembly errors are not reported.
A phasing error occurs during Pass 2 of the assembler when the address of a label is different than when it was previously calculated. The purpose of Pass 1 of the assembler is to create the symbol table. In order to calculate the address of each assembly line, the assembler must be able to determine the exact number of bytes each line will take during Pass 1. For most instructions, the number of bytes required is fixed and easy to calculate, but for other instructions, the number of bytes can vary. A phasing error occurs when the assembler calculates the size of an instruction differently in Pass 2 than it previously calculated in Pass 1. A phasing error often occurs on a line further down in the program than where the mistake occurs.
A phasing error usually results from the use of forward references. In this first example, the symbol "size" is not available at the time of assembling the ldaa size. The assembler incorrectly chooses the extended addressing mode version rather than the correct direct mode. One solution is to move the variables to the top, and a second solution is to force direct mode using ldaa <size.
      ldaa size
      ...
      org  0
size  fcb  5 ;
In this example, the symbol "index" is not available at the time of assembling the ldaa index,x. The assembler incorrectly chooses the 2 byte IDX addressing mode version rather than the correct 3 byte IDX1 mode.
      ldaa index,x
index equ  100 ;
      ...
loop  ldaa #0
The listing shows the phasing error
$0000 A6E064       ldaa index,x
$0064        index equ  100 ;
             ...
$0003 8600   loop  ldaa #0
##### Phasing error This line was at address $0002 in pass 1, now in pass 2 it is $0003
***************Symbol Table*********************
index $0064
loop $0002
##### Assembly failed, 1 errors!
When the assembler gets to loop, the Pass 1 and Pass 2 values are off by one, causing a phasing error at the loop ldaa #0 instruction. The solution here is simply to put the index equ 100 first.
Observation: The assembler must be able to accurately determine the object code size of each instruction during pass 1.
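The index example can be reproduced with a toy two-pass model in Python. This is a sketch under stated assumptions rather than the TExaS algorithm: it handles only the three source lines of the example, the tuple format and function names are hypothetical, and it assumes an undefined symbol is sized with the shortest (IDX) form in pass 1, as the text describes.

```python
def indexed_size(offset):
    """Bytes needed for ldaa offset,x: opcode, postbyte, and extension bytes."""
    if offset is None or -16 <= offset <= 15:
        return 2      # IDX: 5-bit offset (also the guess for an unknown symbol)
    if -256 <= offset <= 255:
        return 3      # IDX1: 9-bit offset
    return 4          # IDX2: 16-bit offset

# (label, operation, operand) triples for the three source lines of the example
SOURCE = [
    (None, "ldaa,x", "index"),   # ldaa index,x
    ("index", "equ", 100),       # index equ 100
    ("loop", "ldaa#", 0),        # loop ldaa #0
]

def assemble_pass(known):
    """Assign addresses in one pass; known holds symbols from an earlier pass."""
    symbols = {}
    pc = 0
    for label, operation, operand in SOURCE:
        if operation == "equ":
            symbols[label] = operand
            continue
        if label is not None:
            symbols[label] = pc
        if operation == "ldaa,x":
            offset = symbols.get(operand, known.get(operand))
            pc += indexed_size(offset)
        else:
            pc += 2   # ldaa #0 is always an opcode plus one data byte
    return symbols

pass1 = assemble_pass({})        # index is still undefined at the first line
pass2 = assemble_pass(pass1)     # index is now known to be 100
phasing = [name for name in pass1 if pass1[name] != pass2[name]]
print(pass1["loop"], pass2["loop"], phasing)
```

The model places loop at address 2 in the first pass and at address 3 in the second, the same mismatch reported in the listing. Moving the equ to the top makes both passes see index as 100, so both choose the 3-byte form and the addresses agree.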
A1.6.8 Assembler Pseudo-Ops
Pseudo-ops are specific commands to the assembler that are interpreted during the assembly process. An alternative name for pseudo-op is assembly directive. A few of them create object code, but most do not. There are many assemblers available for developing Freescale assembly code. Although they all use the standard Freescale op codes, the spelling of the pseudo-op codes varies. The TExaS assembler supports many of the various dialects. The pseudo-op codes supported by this assembler are shown in Table A1.7. If you plan to export software developed with TExaS to another application, then you should limit your use to only the pseudo-ops compatible with that application. Group A is supported by Freescale's MCUez and Metrowerks. Group B is supported by Freescale's DOS level AS05, AS08, AS11, and AS12. Group C is used by ImageCraft's ICC11 and ICC12.
Table A1.7 Assembly directives supported by TExaS.
Groups A, B    Group C   Meaning
org            .org      Specify the absolute address at which to put subsequent object code
equ                      Define a constant symbol
set                      Define or redefine a constant symbol
dc.b db fcb    .byte     Allocate byte(s) of storage with initialized values
fcc                      Create an ASCII string (no termination character)
dc.w dw fdb    .word     Allocate word(s) of storage with initialized values
dc.l dl        .long     Allocate 32-bit long word(s) of storage with initialized values
ds ds.b rmb    .blkb     Allocate bytes of storage without initialization
ds.w           .blkw     Allocate 16-bit words of storage without initialization
ds.l           .blkl     Allocate 32-bit words of storage without initialization
end            .end      Signifies the end of the source code (TExaS ignores these)
Equate Symbol to a Value:
<label> equ <expression> (<comment>)
<label> = <expression> (<comment>)
The equ (or =) directive assigns the value of the expression in the operand field to the label. The equ directive assigns a value other than the program counter to the label. The label cannot be redefined anywhere else in the program. The expression cannot contain any forward references or undefined symbols. Equates with forward references are flagged as a phasing error.
Program A1.4 A constant implemented with equ might make the program easier to change.
      org  $0800
size  equ  5
data  rmb  size
      org  $4000
sum   ldaa #size
      ldx  #data
      clrb
loop  addb 1,x+
      dbne A,loop
      rts
The equ pseudo-op is used to define the I/O ports, and to access the elements of a data structure.
Programming Tip: Use equ definitions only if they make the program easier to understand, to debug, or to change.
Redefinable Equate Symbol to a Value:
<label> set <expression> (<comment>)
The set directive assigns the value of the expression in the operand field to the label. The set directive assigns a value other than the program counter to the label. Unlike the equ pseudo-op, the label can be redefined within the program. Although allowed, it is probably a mistake to use forward references. The use of this pseudo-op with forward references will not be flagged with a phasing error. In Program A1.5, the local variable names created with the set directive could be reused in another subroutine.
Program A1.5 Simple functions with local variables using set.
; *****binding phase***************
I        set  0
J        set  1
; *******allocation phase *********
function leas -2,sp ;allocate I,J
; ********access phase ************
         clr  I,sp  ;Clear I
         ldab I,sp  ;Reg B is a copy of I
         staa J,sp  ;store into J
; ********deallocation phase *****
         leas 2,sp  ;deallocate J,I
         rts
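The difference between equ and set can be pictured as two rules on a symbol table. The Python sketch below is only a model of the behavior described in the text (a duplicate equ is an error, a repeated set keeps the last value); the class and method names are hypothetical, and the rule that a symbol may not mix equ and set is an assumption added for the sketch.

```python
class SymbolTable:
    """Toy symbol table: equ defines once, set may redefine (illustrative only)."""

    def __init__(self):
        self.values = {}
        self.redefinable = set()   # names that were created with set

    def define_equ(self, name, value):
        if name in self.values:
            raise ValueError("Label previously defined: " + name)
        self.values[name] = value

    def define_set(self, name, value):
        # Assumption: a name first defined with equ cannot later be changed by set.
        if name in self.values and name not in self.redefinable:
            raise ValueError("Label previously defined: " + name)
        self.redefinable.add(name)
        self.values[name] = value

table = SymbolTable()
table.define_set("I", 0)   # first subroutine binds I to stack offset 0
table.define_set("I", 2)   # a second subroutine may bind I again
table.define_equ("size", 5)
print(table.values)        # the table keeps the last definition of I
```

Because the table keeps only the most recent value, the symbol table printed at the end of a listing shows the last definition of each set symbol.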
Form Constant Byte:
(<label>) fcb <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) dc.b <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) db <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) .byte <expr>(,<expr>,...,<expr>) (<comment>)
The fcb directive may have one or more operands separated by commas. The value of each operand is truncated to eight bits, and is stored in a single byte of the object program. Multiple operands are stored in successive bytes. The operand may be a numeric constant, a character constant, a symbol, or an expression. If multiple operands are present, one or more of them can be null (two adjacent commas), in which case a single byte of zero will be assigned for that operand. If an operand is outside the range of an 8-bit number (-128 to 255), the result is truncated without a warning, and the least significant 8 bits are used. A string can be included, which is stored as a sequence of ASCII characters. The delimiters supported by TExaS are " ' and \. The string does not include a null-termination, so if desired, the programmer must explicitly terminate it. The following three examples produce identical null-terminated strings.
str1 fcb "Hello World",0
str2 fcb 'Hello World',0
str3 fcb \Hello World\,0
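The byte-level effect of these rules can be sketched in Python. The helper below is hypothetical and only mirrors the description above: each numeric operand keeps its least significant 8 bits, a null operand (represented here by None) becomes a zero byte, and a string expands to its ASCII characters with no terminator.

```python
def fcb(*operands):
    """Return the bytes an fcb line would place in memory (illustrative model)."""
    out = bytearray()
    for operand in operands:
        if operand is None:               # null operand: two adjacent commas
            out.append(0)
        elif isinstance(operand, str):    # string: one byte per ASCII character
            out.extend(operand.encode("ascii"))
        else:                             # number: truncate to 8 bits, no warning
            out.append(operand & 0xFF)
    return bytes(out)

print(fcb("Hello World", 0))   # null-terminated string, as in str1
print(fcb(5, 6, 10, 9).hex())  # four small values, one byte each
print(fcb(0x1234, -1).hex())   # truncation keeps only the low 8 bits
```

The last line shows why silent truncation deserves care: $1234 is stored as $34 and -1 as $FF, with no message from the assembler.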
The stepper motor controller shown in Program A1.6 uses the fcb definitions to store the four stepper motor output values.
Program A1.6 A stepper motor controller using fcb.
size  equ  4
PORTB equ  $0001        ;PB3-PB0 to stepper
DDRB  equ  $0003
      org  $4000
main  movb #$FF,DDRB    ;PB3-PB0 outputs
run   ldaa #size
      ldx  #steps
step  movb 1,x+,PORTB   ;step the motor
      dbne A,step
      bra  run
steps fcb  5,6,10,9     ;output sequence
      org  $FFFE
      fdb  main
Form Constant Character String:
(<label>) fcc <delimiter><string><delimiter> (<comment>)
The fcc directive is used to store ASCII strings into consecutive bytes of memory. The byte storage begins at the current program counter. The label is assigned to the first byte in the string. Any of the printable ASCII characters can be contained in the string. The string is specified between two identical delimiters. The first non-blank character after the fcc directive is used as the delimiter. The delimiters supported by TExaS are " ' and \. Examples:
LABEL1 FCC 'ABC'
LABEL2 fcc "Jon Valvano "
LABEL4 fcc /Welcome to FunCity!/
The first line creates the ASCII characters ABC at location LABEL1. Be careful to position the fcc code away from executable instructions. The assembler will produce object code like it would for regular instructions, one line at a time. For example, the following would crash because after executing the ldx instruction, the microcomputer would try to execute the ASCII characters "Trouble" as instructions.
      ldaa 100
      ldx  #Strg
Strg  fcc  "Trouble"
Typically we collect all the fcc, fcb, and fdb definitions together and place them at the end of our program, so that the microcomputer does not try to execute the constant data. For example,
loop  ldaa Con8
      ldy  Con16
      ldx  #Strg
      bra  loop
; Since the bra loop is unconditional,
; the computer won't go beyond this point.
Strg  fcc  "No Trouble"
Con8  fcb  100
Con16 fdb  1000
The ASCII string generated by fcc is not null-terminated, so if a termination is needed, you must add it explicitly using either
Strg1 fcc "happy"
      fcb 0
or
Strg2 fcb "happy",0
Form Double Byte:
(<label>) fdb <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) dc.w <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) dw <expr>(,<expr>,...,<expr>) (<comment>)
(<label>) .word <expr>(,<expr>,...,<expr>) (<comment>)
The fdb directive may have one or more operands separated by commas. The 16-bit value corresponding to each operand is stored into two consecutive bytes of the object program. The storage begins at the current program counter. The label is assigned to the address of the first 16-bit value. Multiple operands are stored in successive bytes. The operand may be a numeric constant, a character constant, a symbol, or an expression. If multiple operands are present, one or more of them can be null (two adjacent commas), in which case two bytes of zeros will be assigned for that operand. The fdb has been used many times so far in the book to define the reset vector.
      org $FFFE
      fdb main
Define 32-Bit Constant:
dc.l <expr>(,<expr>,...,<expr>)
dl <expr>(,<expr>,...,<expr>)
.long <expr>(,<expr>,...,<expr>)
The dl directive may have one or more operands separated by commas. The 32-bit value corresponding to each operand is stored into four consecutive bytes of the object program (big endian). The storage begins at the current program counter. The label is assigned to the first 32-bit value. Multiple operands are stored in successive bytes. The operand may be a numeric constant, a character constant, a symbol, or an expression. If multiple operands are present, one or more of them can be null (two adjacent commas), in which case four bytes of zeros will be assigned for that operand. In the following examples the dl definitions are used to define 32-bit constants.
S1  dl    100000,$12345678
S2  .long 1,10,100,1000,10000,100000,1000000,10000000
S3  dc.l  -1,0,1
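The byte order produced by fdb and dl can be sketched in Python with the standard struct module. This is only a model of the stored bytes, assuming the big endian order of the 9S12; the function names fdb and dl are hypothetical helpers that mirror the directive names.

```python
import struct

def fdb(*values):
    """Model of fdb: each operand becomes two bytes, most significant first."""
    return b"".join(struct.pack(">H", v & 0xFFFF) for v in values)

def dl(*values):
    """Model of dl: each operand becomes four bytes, most significant first."""
    return b"".join(struct.pack(">I", v & 0xFFFFFFFF) for v in values)

print(fdb(1000).hex())               # Con16 fdb 1000
print(dl(100000, 0x12345678).hex())  # S1 dl 100000,$12345678
print(dl(-1, 0, 1).hex())            # S3 dc.l -1,0,1
```

Negative operands are masked to the field width, so -1 is stored as all ones (two's complement).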
Set Program Counter Origin:
org <expression>
.org <expression>
The org directive changes the program counter to the value specified by the expression in the operand field. Subsequent statements are assembled into memory locations starting with the new program counter value. If no org directive is encountered in a source program, the program counter is initialized to zero. Expressions cannot contain forward references or undefined symbols. The org statements in Program A1.4 place the variables in RAM and the programs in EEPROM. The org statement is also used to set the reset vector.
Reserve Multiple Bytes:
rmb <expression>
ds <expression>
ds.b <expression>
.blkb <expression>
The rmb directive causes the location counter to be advanced by the value of the expression in the operand field. This directive reserves a block of memory the length of which in bytes is equal to the value of the expression. The block of memory reserved is not initialized to any
given value. The expression cannot contain any forward references or undefined symbols. This directive is commonly used to reserve a scratchpad or table area for later use.
Checkpoint A1.9: Why can't you use a forward reference in an rmb directive?
Reserve Multiple Words:
ds.w <expression>
.blkw <expression>
The ds.w directive causes the location counter to be advanced by 2 times the value of the expression in the operand field. This directive reserves a block of memory the length of which in words (16-bit) is equal to the value of the expression. The block of memory reserved is not initialized to any given value. The expression cannot contain any forward references or undefined symbols. This directive is commonly used to reserve a scratchpad or table area for later use.
Reserve Multiple 32-Bit Words:
ds.l <expression>
.blkl <expression>
The ds.l directive causes the location counter to be advanced by 4 times the value of the expression in the operand field. This directive reserves a block of memory the length of which in words (32-bit) is equal to the value of the expression. The block of memory reserved is not initialized to any given value. The expression cannot contain any forward references or undefined symbols. This directive is commonly used to reserve a scratchpad or table area for later use.
End of Program (Optional):
end
.end
This directive signifies the end of the source code. The TExaS assembler will ignore these pseudo operation codes. Some other assemblers require one of these directives at the end of every program.
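The way the reserve directives (rmb, ds.w, ds.l and their synonyms) advance the location counter can be sketched as simple arithmetic. The Python below is an illustrative model, not the TExaS implementation; the labels buf and total are hypothetical, while cnt at $0800 follows the org/rmb usage seen elsewhere in this appendix.

```python
# Bytes reserved per unit for each reserve directive (1, 2, or 4).
UNIT_SIZE = {"rmb": 1, "ds": 1, "ds.b": 1, ".blkb": 1,
             "ds.w": 2, ".blkw": 2,
             "ds.l": 4, ".blkl": 4}

def assign_addresses(origin, reservations):
    """Give each label the current location counter, then advance the counter.

    reservations is a list of (label, directive, count) tuples.
    Returns the symbol table and the next free address.
    """
    counter = origin
    symbols = {}
    for label, directive, count in reservations:
        symbols[label] = counter
        counter += UNIT_SIZE[directive] * count
    return symbols, counter

symbols, next_free = assign_addresses(
    0x0800, [("cnt", "rmb", 2), ("buf", "ds.w", 10), ("total", "ds.l", 1)])
for name, address in symbols.items():
    print(f"{name:6s} ${address:04X}")
print(f"next   ${next_free:04X}")
```

Each label's address depends on every count that precedes it, which is why the counts must already be known when the directive is processed.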
A1.6.9 S-19 Object code
The S-19 record output format encodes program and data object modules into a printable (ASCII) format. This allows viewing of the object file with standard tools and allows display of the module while transferring from one computer to the next or during loads between a host and target. The S-record format also includes information for use in error checking to ensure the integrity of data transfers. S-records are character strings made of several fields that identify the record type, record length, memory address, code/data, and checksum. Each byte of binary data is encoded as a 2-character hexadecimal number: the first character representing the high-order 4 bits, and the second the low-order 4 bits of the byte. The 5 fields that comprise an S-record are:
1. Type (S0, S1 or S9)
2. Record Length
3. Address
4. Code/Data
5. Checksum
Eight types of S-records have been defined to accommodate various encoding, transportation, and decoding needs, but only three types are used in most Freescale microcontrollers. The S0 record is a title record containing the ASCII name of the file in the Code/Data field. The address field of this type is usually 0000. The S1 record is a data
record containing the information to be loaded sequentially starting at the specified address. The S9 record is an end-of-file marker, and sometimes contains the starting address to begin execution. In an embedded microcomputer environment, the starting address must be programmed at the appropriate place. For most Freescale microcontrollers, the reset vector is the last two bytes of ROM or EEPROM.
The Record Length contains the count of the character pairs in the record, excluding the type and record length. For S0, S1, and S9 record types, the Address field is a 4-character (2-byte) value. For the S1 record type the address specifies where the data field is to be loaded into memory. There are from 0 to n bytes in the Code/Data field. This information contains executable code, memory loadable data, or descriptive information. The Checksum field is 2 ASCII characters used for error checking. It is the least significant byte of the one's complement of the sum of the values represented by the pairs of characters making up the record length, address, and code/data fields. When generating a checksum, one adds (call the result sum) the record length, address, and code/data fields using 8-bit modulo arithmetic (ignoring overflows). The checksum is calculated as
Checksum = $FF - sum
When verifying a checksum, one adds (call the result sum) the record length, address, code/data, and checksum fields using 8-bit modulo arithmetic (ignoring overflows). The sum should be $FF. Each record may be terminated with a CR/LF/NULL. The following is a typical S-record module:
S1130000285F245F2212226A000424290008237C2A
S11300100002000800082629001853812341001813
S113002041E900084E42234300182342000824A952
S107003000144ED492
S9030000FC
The module consists of four code/data records and an S9 termination record. The first S1 code/data record is explained as follows:
S1    S-record type S1, indicating a code/data record to be loaded/verified at a 2-byte address.
13    Hex 13 (decimal 19), indicating 19 character pairs, representing 19 bytes of binary data, follow.
0000  Four-character 2-byte address: hex address 0000, indicates location where the data is to be loaded.
The next 16 character pairs are the ASCII bytes of the actual program code/data.
2A    Checksum of the first S1 record.
The second and third S1 code/data records each also contain $13 character pairs and are ended with checksums. The fourth S1 code/data record contains 7 character pairs. The S9 termination record is explained as follows:
S9    S-record type S9, indicating a termination record.
03    Hex 03, indicating three character pairs (3 bytes) to follow.
0000  Four-character 2-byte address field, zeroes.
FC    Checksum of S9 record.
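The field layout and checksum rule described above can be sketched in a few lines of Python. This is a minimal sketch for S0, S1, and S9 records with 2-byte addresses; the function names (srec_checksum, parse_srecord, make_s1) are hypothetical and are not part of TExaS or any loader.

```python
def srec_checksum(body):
    """Checksum = $FF - sum, where sum is the 8-bit modulo sum of the
    record length, address, and code/data bytes."""
    return 0xFF - (sum(body) & 0xFF)

def parse_srecord(line):
    """Split one S0/S1/S9 record into (type, address, data, checksum, ok)."""
    line = line.strip()
    rtype = line[:2]
    raw = bytes.fromhex(line[2:])        # length, address, data, checksum
    if raw[0] != len(raw) - 1:           # length counts every pair after itself
        raise ValueError("record length does not match the record")
    address = int.from_bytes(raw[1:3], "big")
    data = raw[3:-1]
    checksum = raw[-1]
    ok = (sum(raw) & 0xFF) == 0xFF       # verification: total should be $FF
    return rtype, address, data, checksum, ok

def make_s1(address, data):
    """Build an S1 record that loads data starting at a 16-bit address."""
    body = bytes([len(data) + 3]) + address.to_bytes(2, "big") + bytes(data)
    return "S1" + (body + bytes([srec_checksum(body)])).hex().upper()

first = "S1130000285F245F2212226A000424290008237C2A"
rtype, address, data, checksum, ok = parse_srecord(first)
print(rtype, f"${address:04X}", len(data), f"${checksum:02X}", ok)
print(make_s1(address, data) == first)
```

Parsing the first record of the module and rebuilding it with make_s1 gives back the same text, which is a convenient self-check when writing a loader.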
Checkpoint A1.10: What loader operation is caused by the following S19 record?
S107F026F08020FE54
Checkpoint A1.11: Is the checksum correct in the following S19 record?
S10CF0DD80CCF0E616F4BC20B876
Checkpoint A1.12: Create an S19 record that stores the value $1234 into location $5678.
A1.7 TExaS ViewBox
You use the ViewBox to specify which registers, memory and I/O devices you wish to view. Usually, the data at this address will be displayed. The exception is when you specify the v or V format; in this case the value of the address itself is displayed. The ViewBox displays numbers as decimal constants (e.g., 0 100 10000), hexadecimal (e.g., $00 $1000 $F000) or binary (e.g., %100 %111000 %1111111111). You can observe the CPU registers (e.g., RegA RegX, RegSP). The register names are not case sensitive (e.g., regA rega REGA) and multiple spellings are allowed (e.g., A RegA PC CC CCR RegCC). The IR and EAR registers are observable. Your program symbols are case sensitive, but your program must be assembled first because the ViewBox needs the symbol table. I/O port names are available as symbols if your program includes the appropriate header file. The file HC12.RTF contains all the I/O ports. The file Port12.RTF contains the specific I/O ports supported by this simulator. Any expression which evaluates to a constant is also allowed:
$F000+5*(1+2)
data+1
assuming data is a valid symbol. The V format can be used to implement a calculator. First, specify one of the value formats for the result in the Format Field (v, V, +v, or +V). Next, type the desired expression in the Address Field, then hit Enter. For example, to calculate Main+50 in hexadecimal (where Main is a symbol in your program), you could
Step 1. Type Main+50 in the address field,
Step 2. Type V in the format field to specify 16-bit unsigned hex,
Step 3. Click the Enter button.
For example, to calculate Program2-Program1 in decimal (where Program1 and Program2 are two symbols in your program), you could
Step 1. Type Program2-Program1 in the address field,
Step 2. Type +v in the format field to specify 16-bit signed decimal,
Step 3. Click the Enter button.
For the CCR register, the data field can be a regular number like above or can specify individual CCR bits like the following. Note that an upper case letter will set the CCR bit and a lower case letter will clear the CCR bit.
CZVNIHSX  to set bits in the 9S12 CCR
czvnihsx  to clear bits in the 9S12 CCR
For example, if you wished to clear the zero bit and set the carry/negative bits, you could
Step 1. Click on the CCR entry in the ViewBox or type CCR in the address field,
Step 2. Type zCN in the data field,
Step 3. Click the Enter button.
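The letter convention above amounts to a set of bit masks. The Python sketch below models it, assuming the usual 9S12 CCR layout of S X H I N Z V C from bit 7 down to bit 0; the function apply_ccr_letters is a hypothetical helper, and how the simulator itself processes the field is not shown here.

```python
# Bit positions in the 9S12 condition code register (bit 7 = S ... bit 0 = C).
CCR_BIT = {"s": 7, "x": 6, "h": 5, "i": 4, "n": 3, "z": 2, "v": 1, "c": 0}

def apply_ccr_letters(ccr, letters):
    """Upper case letters set the named bit, lower case letters clear it."""
    for ch in letters:
        mask = 1 << CCR_BIT[ch.lower()]
        if ch.isupper():
            ccr |= mask
        else:
            ccr &= ~mask & 0xFF
    return ccr

# Start with only Z set, then clear Z and set C and N as in the example.
before = 0b00000100
after = apply_ccr_letters(before, "zCN")
print(f"%{before:08b} -> %{after:08b}")
```

Bits that are not named in the string are left unchanged, so a short entry such as zCN touches only three of the eight flags.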
Any expression that evaluates to a constant is also allowed. For example,
(5+'3')/3
$F000+5*12
$F0|($23&$36)
$F0F0%10
If the source code has been assembled, symbols can also be utilized. For all symbols except those generated by equ or set, the value of a symbol is its address. For example, if
      org  $F800
data  fcb  1,2,3
then the expression data+1 yields the address $F801.
Multiple data values can be entered as a list, separated by commas. All of the above examples (except the null-terminated ASCII string) can be concatenated together as a list. E.g.,
1,2,3,4
(3+3*2)-4,'f',4&-7,data+1
You use the format field to specify the format of the entry. Each address can have a separate format. If the field is blank or does not match one of the possibilities listed below, it will use the 16-bit unsigned hexadecimal format. Table A1.8 shows the ViewBox format field options.
Format      Description                              Range
h           8-bit unsigned hexadecimal               $00 to $FF
d           8-bit unsigned decimal                   0 to 255
b           8-bit unsigned binary                    %00000000 to %11111111
H           16-bit unsigned hexadecimal              $0000 to $FFFF
D           16-bit unsigned decimal                  0 to 65535
B           16-bit unsigned binary                   %0000000000000000 to %1111111111111111
+h or h+    8-bit signed hexadecimal                 -$80 to $7F
+d or d+    8-bit signed decimal                     -128 to 127
+b or b+    8-bit signed binary                      %10000000 to %01111111
+H or H+    16-bit signed hexadecimal                -$8000 to $7FFF
+D or D+    16-bit signed decimal                    -32768 to 32767
+B or B+    16-bit signed binary                     %1000000000000000 to %0111111111111111
b3          3-bit binary (least significant bits)    %000 to %111
b4          4-bit binary (least significant bits)    %0000 to %1111
cc          8-bit binary showing bits in the CCR
c or C      ASCII character
s or S      NULL or EOT terminated ASCII string
v           address itself unsigned decimal          0 to 65535
V           address itself unsigned hexadecimal      $0000 to $FFFF
+v or v+    address itself signed decimal            -32768 to 32767
+V or V+    address itself signed hexadecimal        -$8000 to $7FFF
Table A1.8 ViewBox formats.
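The descriptions in Table A1.8 can be illustrated by interpreting the same memory bytes in several ways. The Python sketch below is only a model of those interpretations (unsigned, signed, and terminated string), assuming big endian 16-bit values; the helper names are hypothetical and the ViewBox format letters themselves are not parsed.

```python
def unsigned(data, offset, size):
    """Unsigned value of size bytes starting at offset (big endian)."""
    return int.from_bytes(data[offset:offset + size], "big")

def signed(data, offset, size):
    """Two's complement signed value of size bytes starting at offset."""
    return int.from_bytes(data[offset:offset + size], "big", signed=True)

def terminated_string(data, offset=0):
    """ASCII string ending at the first NULL ($00) or EOT ($04) byte."""
    end = offset
    while end < len(data) and data[end] not in (0x00, 0x04):
        end += 1
    return data[offset:end].decode("ascii")

memory = bytes.fromhex("313233343536373800")
print(unsigned(memory, 0, 1))               # 8-bit unsigned decimal
print(f"${unsigned(memory, 0, 2):04X}")     # 16-bit unsigned hexadecimal
print(chr(unsigned(memory, 0, 1)))          # ASCII character
print(terminated_string(memory))            # terminated ASCII string
print(signed(b"\x80", 0, 1))                # 8-bit signed decimal
```

The same byte $80 reads as 128 when unsigned and as -128 when signed, which is the difference between the unsigned and signed rows of the table.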
Each of these formats (except for s S v V) can be preceded by a number which will create a list. Examples, assume the data is $313233343536373800
d   shows 49
3h  shows $31,$32,$33
4c  shows '1','2','3','4'
s   shows "12345678"
2H  shows $3132,$3334
A1.8 Microcomputer Interfacing in TExaS
There are three components to microcomputer interfacing. Since many external devices have physical characteristics, the first step is the mechanical design of the physical components. Often, the mechanical design is simply selecting the physical devices from a list of available components. The next step is the analog and digital electronics used to connect the physical devices to the computer. The input/output information may be encoded as simple digital signals or variable analog signals. More complex systems may use frequency, period, phase, or pulse width to represent the signals. The third component of interfacing is
the low-level software that transforms the mechanical and electrical devices into objects that perform the desired tasks. The group of these low-level functions is often designated as an I/O device driver. The IO menu offers the following commands, which allow you to connect external devices to the microcomputer. These commands are available only when an IO file is active.
Switch           Allows you to connect up to 8 toggle switches.
LED              Allows you to connect up to 8 colored light emitting diodes.
CRT              Allows you to connect a CRT terminal via the SCI port.
LCD Display      Allows you to connect a liquid crystal display.
Analog           Allows you to connect an analog input.
Keyboard         Allows you to connect a scanned matrix keyboard.
DC Motor         Allows you to connect a DC motor.
IR Remote        Allows you to connect an IR remote control.
HD44780          Allows you to connect an HD44780-controller LCD display.
Stepper          Allows you to connect stepper motors.
WhiteBackground  Allows you to toggle between white and black backgrounds.
The help system within TExaS gives extensive explanations of the available I/O devices.
Maintenance Tip: New versions of TExaS may include new I/O devices. Check for the availability of TExaS upgrades at: http://users.ece.utexas.edu/~valvano.
Appendix 2 Running on an Evaluation Board
Because embedded systems run in real time and have limited debugging software, it makes sense to test the hardware/software on a simulator first. When first testing the system on the TExaS simulator, we use its cross-assembler running on a PC to convert our source programs into a listing file and object file. We then "run" our program on a simulator that emulates the microcomputer with its external components. Simulation in this environment is more difficult than in other computer applications because the software execution is tightly coupled to (extremely dependent on) the hardware. Simulation of an embedded system is only effective if all the software, computer, external mechanical, and external electrical components are modeled. Another complicating issue is the real-time nature of the external mechanical and electrical devices. Figure A2.1 outlines the software development process using real hardware.
Figure A2.1 Assembly language development process on a real board using TExaS. (Blocks in the figure: Source code, Assembler, Object code, Loader, Microcomputer with Processor, RAM, EPROM and I/O, External circuits and devices.)
Address  Data         Source code
                      PTT   equ  $0240
                      DDRT  equ  $0242
                            org  $0800
                      cnt   rmb  2
                            org  $4000
$4000    CF4000       main  lds  #$4000
$4003    180B800242         movb #$80,DDRT
$4008    1D024080     off   bclr PTT,#$80
$400C    CC115C       look  ldd  #4444
$400F    7C0800             std  cnt
$4012    B60240       loop  ldaa PTT
$4015    847F               anda #$7F
$4017    B1402B             cmpa key
$401A    26EC               bne  off
$401C    FE0800             ldx  cnt
$401F    09                 dex
$4020    7E0800             stx  cnt
$4023    26ED               bne  loop
$4025    1C024080           bset PTT,#$80
$4029    20E1               bra  look
$402B    23           key   fcb  %00100011
                            org  $FFFE
$FFFE    4000               fdb  main
Figures A2.2 and A2.3 show two popular hardware platforms on which you could design, implement, and test embedded systems. The exact details of how to program these boards are not included in this book, because this information is available on the manufacturers' web sites or included with the product. When debugging involves controlling real hardware, we must use an actual microcomputer running in real time. Because the programs are stored in ROM or EEPROM on the single chip microcomputer, they are difficult to debug. To solve this problem, there are evaluation boards that include a debugger. These systems may have RAM in the positions where the final system will have ROM. Some development systems load programs into EEPROM. With this development system, we also use a cross-assembler or a cross-compiler to convert our source programs into a listing file and object file. The object file is transmitted via the serial port to the evaluation board and loaded into memory. One possibility for 9S12 program development uses a development board and a resident debugger. Resident means the debugger exists as 9S12 software that resides along with the regular user program on the system. A popular debugger for the 9S12 is called the Serial Monitor. The debugger allows us to test our software. Sophisticated development environments integrate the editor,
compiler, assembler, serial port communication and debugger into a single application running on the PC. A second possibility is the background debug module (BDM). The BDM is more expensive than the serial monitor, but does provide a richer set of debugging features.
Figure A2.2 shows the Dragon12 board from Wytec (http://www.evbplus.com/index.html). This board provides an integrated approach to teaching embedded systems where most of the I/O devices are included on the board itself. This board includes the 9S12DG512, and includes the following additional components
■ 16 by 2 LCD display module with LED backlight module
■ 4-digit, multiplexed 7-segment display module
■ 4 by 4 matrix keypad
■ Eight LEDs connected to port B
■ An 8-position DIP switch connected to port H
■ Four pushbutton switches
■ IR transceiver with built-in 38 kHz oscillator
■ RS485 communication port with terminal block
■ Speaker driven by timer, or PWM, or DAC for alarm, voice and music applications
■ Potentiometer trimmer pot for analog input
■ Dual SCIs with DB9 connectors
■ Dual 10-bit DAC for testing SPI interface and generating analog waveforms
■ I2C based Real Time Clock DS1307 with backup battery
■ Dual H-Bridge with motor feedback or incremental encoder interface
■ Four robot servo outputs with a terminal block for external 5 V
■ Opto-coupler output
■ DPDT form C relay
■ Temperature sensor for home automation applications
■ Light sensor for home automation applications
■ Logic probe with LED indicator
■ Fast prototyping with on-board solderless breadboard
■ DB9 RS232 cable for connecting to a PC serial port
Figure A2.3 Adapt9S12DP512M0 board from Technological Arts.
If the focus of the course is software development, then the Dragon12 is an excellent platform to teach embedded systems. This board is very cost effective, much cheaper than purchasing the components individually. If embedded systems is taught in more than one class, this board supports both a simple introduction to embedded systems as well as a more sophisticated embedded systems lab including CAN, I2C, motors, and servos. Another possible hardware configuration is shown in Figure A2.3. The Adapt9S12DP512M0 board from Technological Arts (http://www.technologicalarts.com/) provides a more minimal approach to teaching embedded systems. There are few I/O devices on the board itself. Rather, the students are responsible for designing, implementing and testing the hardware components of the interface to the external devices. This board includes the 9S12DP512, and includes
■ Two 50-pin connectors that bring out all I/O pins of the MCU
■ RS232 transceivers provided for both SCI channels
■ One LED tied to an output port
■ A low-voltage inhibit reset circuit plus a reset button
The minimal system requires the student to design, implement and test both the hardware and the software components for the system. Students experience the hardware in a real and physical sense. There are more opportunities for education, but more opportunities for failure. Students must learn how to debug hardware failures (bad design and bad components) at the same time as debugging their software. The initial design, implementation and testing on the simulator should remove most of the software bugs.
Appendix 3 Systems Engineering
The information in this appendix will probably not be required for students taking a class based on this textbook. However, in the future students may be faced with designing a system based on the concepts presented in this book. There are two systems-level considerations for which the reference information in this appendix might be valuable. The first concept is manufacturability and the second concept is power.
A3.1 Design for Manufacturability
Using standard values for resistors and capacitors makes finding parts quicker. Standard values for 1% resistors range from 10 Ω to 2.2 MΩ. We can multiply a number in Table A3.1 by powers of 10 to select a standard value 1% resistor. For example, if we need a 5 kΩ 1% resistor, the closest number is 49.9*100, or 4.99 kΩ.
Table A3.1 Standard resistor values for 1% tolerance
10.0  10.2  10.5  10.7  11.0  11.3  11.5  11.8  12.1  12.4  12.7  13.0
13.3  13.7  14.0  14.3  14.7  15.0  15.4  15.8  16.2  16.5  16.9  17.4
17.8  18.2  18.7  19.1  19.6  20.0  20.5  21.0  21.5  22.1  22.6  23.2
23.7  24.3  24.9  25.5  26.1  26.7  27.4  28.0  28.7  29.4  30.1  30.9
31.6  32.4  33.2  34.0  34.8  35.7  36.5  37.4  38.3  39.2  40.2  41.2
42.2  43.2  44.2  45.3  46.4  47.5  48.7  49.9  51.1  52.3  53.6  54.9
56.2  57.6  59.0  60.4  61.9  63.4  64.9  66.5  68.1  69.8  71.5  73.2
75.0  76.8  78.7  80.6  82.5  84.5  86.6  88.7  90.9  93.1  95.3  97.6
Standard values for 5% resistors range from 10 Ω to 22 MΩ. We can multiply a number in Table A3.2 by powers of 10 to select a standard value 5% resistor. For example, if we need a 25 kΩ 5% resistor, the closest number is 24*1000, or 24 kΩ. Table A3.3 shows standard capacitor values.
Table A3.2 Standard resistor values for 5% tolerance.
10  11  12  13  15  16  18  20  22  24  27  30
33  36  39  43  47  51  56  62  68  75  82  91

Table A3.3 Standard capacitor values for 10% tolerance.
10 pF   100 pF   1000 pF   0.010 µF   0.10 µF   1.0 µF   10 µF
12 pF   120 pF   1200 pF   0.012 µF   0.12 µF   1.2 µF
15 pF   150 pF   1500 pF   0.015 µF   0.15 µF   1.5 µF
18 pF   180 pF   1800 pF   0.018 µF   0.18 µF   1.8 µF
22 pF   220 pF   2200 pF   0.022 µF   0.22 µF   2.2 µF   22 µF
27 pF   270 pF   2700 pF   0.027 µF   0.27 µF   2.7 µF
33 pF   330 pF   3300 pF   0.033 µF   0.33 µF   3.3 µF   33 µF
39 pF   390 pF   3900 pF   0.039 µF   0.39 µF   3.9 µF
47 pF   470 pF   4700 pF   0.047 µF   0.47 µF   4.7 µF   47 µF
56 pF   560 pF   5600 pF   0.056 µF   0.56 µF   5.6 µF
68 pF   680 pF   6800 pF   0.068 µF   0.68 µF   6.8 µF
82 pF   820 pF   8200 pF   0.082 µF   0.82 µF   8.2 µF
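Selecting the closest standard value, as in the 25 kΩ example above, is a small search over the table entries scaled by powers of 10. The Python sketch below uses the 5% values of Table A3.2; the function name nearest_standard and its parameters are hypothetical, and the same approach should work with the 1% values of Table A3.1.

```python
# Base values for 5% resistors, from Table A3.2.
E24 = [10, 11, 12, 13, 15, 16, 18, 20, 22, 24, 27, 30,
       33, 36, 39, 43, 47, 51, 56, 62, 68, 75, 82, 91]

def nearest_standard(target_ohms, series=E24, low=10, high=22_000_000):
    """Return the standard resistance (in ohms) closest to target_ohms.

    Candidates are the series values times powers of 10, limited to the
    10 ohm to 22 Mohm range quoted for 5% resistors.
    """
    candidates = [base * 10 ** decade
                  for decade in range(7)
                  for base in series
                  if low <= base * 10 ** decade <= high]
    return min(candidates, key=lambda r: abs(r - target_ohms))

print(nearest_standard(25_000))   # the 25 kΩ example from the text
```

Closeness is measured here as absolute difference in ohms; a design that cares about ratio error could compare logarithms instead.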
A3.2 Battery Power
A battery is a source of energy that can be used in an embedded system to make the system portable. Another application of batteries is to supply power to a mission-critical system when the regular AC power is lost temporarily. Typically, a battery has three parts. The anode is the negative terminal of the battery, the cathode is the positive terminal, and the electrolyte is a liquid solution that accepts, stores, and releases energy. These three components can be constructed from many different materials and configured in an almost endless array of sizes and shapes. The type, size and shape of the materials play a major role in determining the battery performance. A primary battery is used once and discarded, and a secondary battery can be recharged and reused. There are many parameters to consider when selecting a battery. Nominal voltage is the typical voltage of the battery when fully charged. Some batteries maintain a fairly constant voltage output while energy is being discharged. However, other batteries will drop their voltage steadily during usage. Physical parameters of the battery, such as volume, weight, and shape, often play a significant role in the overall appeal of an embedded system. The energy storage of a battery is typically defined in amp-hours, because the voltage is assumed constant. The standard units of energy are watt-hours (1 W-hr is 3600 J). One can estimate the operation time of a battery-powered embedded system by dividing the energy storage by the required current to run the system. Peak current is the maximum current the battery can deliver. Shelf-life, operating temperature, and storage temperature are other parameters to consider when choosing a battery. Memory effect is an observable condition in some rechargeable batteries that causes them to hold less charge over time. Heavy duty batteries were first made with zinc-carbon in the mid-1800s, but now are made with zinc chloride.
They are a low cost, low performance battery, but are not appropriate for most embedded applications. An alkaline battery is made with alkaline manganese. Alkaline batteries are appropriate for situations that require long shelf life, but where size and weight are not important. There are two kinds of lead acid batteries. Flooded lead acid batteries vent flammable gases and require additional water to maintain the proper specific gravity of the acid. Valve-regulated lead-acid batteries (VRLA, also called sealed lead batteries) have about a two-to-one advantage over the flooded type battery in specific energy and energy density. In the VRLA cell, the vent for the gas space incorporates a pressure relief valve to minimize the gas loss and to prevent direct contact between the headspace and outside air. Lead acid batteries can be used for backing up power on systems that require large currents. Lead acid batteries have a maximum storage time of six months at temperatures between 20° and 30 °C, after which they require a freshening charge. Zinc chloride, alkaline and lead acid batteries all have voltages that drop as energy is drained from them. In these systems the voltage can be monitored as a measure of the energy left in the battery. However, embedded systems that use these types of battery will require a voltage regulator to maintain a constant voltage for the electronics. For example, a 5 V 9S12DP512 will operate with a power supply voltage from 4.5 to 5.25 V. Nickel-cadmium (NiCad) and nickel-metal hydride (NiMH) are low-cost rechargeable batteries that used to be popular for embedded systems. NiMH batteries have about twice the storage capacity of NiCad. Certain NiCad batteries gradually lose their maximum energy capacity if they are repeatedly recharged after being only partially discharged. Most NiMH batteries do not suffer from a memory effect. The NiMH batteries operate between 10° and 55 °C, and have a projected life of seven and a half years at 30 °C.
You should cycle new NiMH batteries three to five times to achieve peak performance. Cycling or conditioning a NiMH battery is performed by completely discharging it and then completely recharging it. At room temperature, NiMH batteries will self-discharge in 30 to 60 days without usage, depending on environmental conditions. In general, you can expect NiMH batteries to last up to 500 recharges.
The search for a lighter battery that uses metallic lithium as its anode was driven by the fact that lithium is the lightest and the most electropositive of metals. The specific energy of lithium metal (1727 Ah/lb) is greater than that of lead (118 Ah/lb) and cadmium (218 Ah/lb). There is a whole range of batteries based on lithium, both single use (used in cameras) and rechargeable. The most common rechargeable type is called lithium-ion (Li-ion). When energy is being discharged, the lithium ion moves from the anode to the cathode. During charging, the lithium ion moves from the cathode to the anode. Because of their excellent energy to weight and energy to size ratios, lithium-ion rechargeable batteries are commonly employed in portable embedded systems. Table A3.4 shows energy storage for typical AA-sized batteries (50 mm tall by 14 mm diameter).
Table A3.4 Energy storage for different battery types.
Battery    Voltage (V)   Energy (mAh)   Type
Alkaline   1.5           2000           Primary
Lithium    1.5           3000           Primary
NiCad      1.2           1200           Secondary
NiMH       1.2           1800           Secondary
Li-ion     3.6           1900           Secondary
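The operating-time estimate described in this section (storage divided by required current) can be sketched with the values in Table A3.4. The Python below is an idealized model that assumes a constant load current and ignores regulator losses, self-discharge, and temperature; the 20 mA load is a hypothetical figure chosen only for illustration.

```python
# (nominal voltage in V, storage in mAh), from Table A3.4.
BATTERIES = {
    "Alkaline": (1.5, 2000),
    "Lithium":  (1.5, 3000),
    "NiCad":    (1.2, 1200),
    "NiMH":     (1.2, 1800),
    "Li-ion":   (3.6, 1900),
}

def runtime_hours(capacity_mAh, load_mA):
    """Idealized operating time: storage divided by the required current."""
    return capacity_mAh / load_mA

def energy_watt_hours(voltage, capacity_mAh):
    """Energy in watt-hours, assuming the voltage stays at its nominal value."""
    return voltage * capacity_mAh / 1000

load_mA = 20   # hypothetical system current
for name, (volts, mAh) in BATTERIES.items():
    print(f"{name:8s} {runtime_hours(mAh, load_mA):6.1f} h "
          f"{energy_watt_hours(volts, mAh):5.2f} W-hr")
```

Comparing watt-hours rather than amp-hours shows why the Li-ion cell stores more energy than the alkaline cell even though its mAh rating is slightly lower.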
Glossary of Terms
1/f noise A fundamental noise in resistive devices arising from fluctuating conductivity. Same as pink noise.
2's complement See two's complement.
60 Hz noise An added noise from electromagnetic fields caused by either magnetic field induction or capacitive coupling.
accumulator High-speed memory located in the processor used to perform arithmetic or logical functions. The accumulators on the 9S12 are A and B.
accuracy A measure of how close our instrument measures the desired parameter referred to the NIST.
acknowledge Clearing the interrupt flag bit that requested the interrupt.
actuator Electro-mechanical or electro-chemical device that allows computer commands to affect the external world.
ADC Analog to digital converter, an electronic device that converts analog signals (e.g., voltage) into digital form (i.e., integers).
address bus A set of digital signals that connect the CPU, memory and I/O devices, specifying the location to read or write for each bus cycle. See also control bus and data bus.
aliasing When digital values sampled at fs contain frequency components above 0.5 fs, then the apparent frequency of the data is shifted into the 0 to 0.5 fs range. See Nyquist theory.
alternatives The total number of possibilities (e.g., an 8-bit number scheme can represent 256 different numbers). An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs.
arithmetic logic unit (ALU) Component of the processor that performs arithmetic and logic operations.
arm Activate so that interrupts are requested.
ASCII American Standard Code for Information Interchange, a code for representing characters, symbols, and synchronization messages as 7-bit, 8-bit or 16-bit binary values.
assembler System software that converts an assembly language program (human readable format) into object code (machine readable format).
assembly directive Operations included in the program that are not executed by the computer at run time, but rather are interpreted by the assembler during the assembly process. Same as pseudo-op.
assembly listing Information generated by the assembler in human readable format, typically showing the object code, the original source code, assembly errors, and the symbol table.
asynchronous communications interface adapter (ACIA) Device to transmit data with asynchronous serial communication protocol (same as UART and SCI).
asynchronous protocol A protocol where the two devices have separate and distinct clocks.
atomic Software execution that cannot be divided or interrupted. Once started an atomic operation will run to its completion without interruption. On most computers the assembly language instructions are atomic.
background mode A 9S12 mode with the background debug module (BDM) active.
bandwidth The information transfer rate, the amount of data transferred per second. Same as throughput.
bang-bang A control system where the actuator has only two states, and the system “bangs” all the way in one direction or “bangs” all the way in the other; same as binary controller.
basis Subset from which linear combinations can be used to reconstruct the entire set.
baud rate In general, the baud rate is the total number of bits (information, overhead, and idle) per time that are transmitted; in a modem application it is the total number of sounds per time that are transmitted.
bi-directional Digital signals that can be either input or output.
biendian The ability to process numbers in both big and little endian formats.
big endian Mechanism for storing multiple byte numbers such that the most significant byte exists first (in the smallest memory address). See also little endian.
binary A system that has two states, on and off.
binary operation A function that produces its result given two input parameters. For example, addition, subtraction, and multiplication are binary operations.
binary recursion A recursive technique that makes two calls to itself during the execution of the function. See also recursion, linear recursion, and tail recursion.
bipolar stepper motor A stepper motor where the current flows in both directions (in/out) along the interface wires; a stepper with 4 interface wires.
bit Basic unit of digital information taking on the value of either 0 or 1.
bit time The basic unit of time used in serial communication.
blind cycle A software/hardware synchronization method where the software waits a specified amount of time for the hardware operation to complete. The software has no direct information (blind) about the status of the hardware.
borrow During subtraction, if the difference is too small, then we use a borrow to pass the excess information into the next higher place. For example, in decimal subtraction 36 - 27 requires a borrow from the ones to tens place because 6 - 7 is too small to fit into the 0 to 9 range of decimal numbers.
break or trap A break or a trap is an instrument that halts the processor. The TExaS application will halt both software and hardware simulation when a specific address is encountered. With a resident debugger, the break is created by replacing a specific op code with a software interrupt instruction. When encountered, it will stop your program and jump into the debugger. Therefore, a break halts the software. The condition of being in this state is also referred to as a break.
breakpoint The place where a break is inserted, the time when a break is encountered, or the time period when a break is active.
buffered I/O A FIFO queue is placed in between the hardware and software in an attempt to increase bandwidth by allowing both hardware and software to run in parallel.
burn The process of programming a ROM, PROM or EEPROM.
bus A set of digital signals that connect the CPU, memory and I/O devices, consisting of address signals, data signals and control signals. See also address bus, control bus and data bus.
bus interface unit (BIU) Component of the processor that reads and writes data from the bus.
busy-waiting A software/hardware synchronization method where the software continuously reads the hardware status waiting for the hardware operation to complete. The software usually performs no work while waiting for the hardware. Same as gadfly.
byte Digital information containing 8 bits.
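The buffered I/O entry above relies on a FIFO queue between the hardware and the software. The following is a minimal sketch of such a queue in C, not a definitive implementation. The names `Fifo_Put`, `Fifo_Get`, and `FIFO_SIZE` are assumptions chosen for illustration, and the sketch assumes a single producer and a single consumer; a real interrupt-driven system would also need to consider critical sections.

```c
#include <stdint.h>

#define FIFO_SIZE 8  /* hypothetical capacity; one slot stays empty to tell full from empty */

static uint8_t Fifo[FIFO_SIZE];
static uint8_t PutI = 0;  /* index where the producer stores next */
static uint8_t GetI = 0;  /* index where the consumer removes next */

/* Store one byte. Returns 1 on success, 0 if the FIFO is full. */
int Fifo_Put(uint8_t data) {
  uint8_t next = (uint8_t)((PutI + 1) % FIFO_SIZE);
  if (next == GetI) {
    return 0;  /* full, data is not stored */
  }
  Fifo[PutI] = data;
  PutI = next;
  return 1;
}

/* Remove the oldest byte. Returns 1 on success, 0 if the FIFO is empty. */
int Fifo_Get(uint8_t *data) {
  if (GetI == PutI) {
    return 0;  /* empty, *data is unchanged */
  }
  *data = Fifo[GetI];
  GetI = (uint8_t)((GetI + 1) % FIFO_SIZE);
  return 1;
}
```

In a typical use, the input device's interrupt handler would call `Fifo_Put` and the main program would call `Fifo_Get`, so neither side has to wait for the other unless the queue is full or empty.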
carry During addition, if the sum is too large, then we use a carry to pass the excess information into the next higher place. For example, in decimal addition 36 + 27 requires a carry from the ones to tens place because 6 + 7 is too big to fit into the 0 to 9 range of decimal numbers.
cathode ray tube (CRT) terminal An I/O device used to input data from a keyboard and output character data to a screen. The electrical interface is usually asynchronous serial.
ceiling Establishing an upper bound on the result of an operation.
closed loop control system A control system that includes sensors to measure the current state variables. These inputs are used to drive the system to the desired state.
CMOS A digital logic system called complementary metal oxide semiconductor. It has properties of low power and small size. Its power is a function of the number of transitions per second.
compiler System software that converts a high level language program (human readable format) into object code (machine readable format).
concurrent programming A computer system that supports two or more software tasks that are simultaneously active. Typically one task executes at a time, and there are mechanisms to suspend one task and execute another task. Compare to parallel programming.
condition code register (CCR) Register in the processor that contains the status of the previous ALU operation, as well as some operating mode flags such as the interrupt enable bit.
control bus A set of digital signals that connect the CPU, memory and I/O devices, specifying when to read or write for each bus cycle. See also address bus and data bus.
control unit (CU) Component of the processor that determines the sequence of operations.
CPU bound A situation where the input or output device is faster than the software. In other words, it takes less time for the I/O device to process data than for the software to process data.
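The carry and ceiling entries above can be illustrated together. The sketch below adds two 8-bit unsigned numbers in a 16-bit intermediate, where bit 8 plays the role of the carry, and then applies a ceiling of 255 instead of letting the result wrap. The function name `AddCeiling8` is an assumption made for this example.

```c
#include <stdint.h>

/* Add two unsigned 8-bit numbers, limiting the result to a ceiling of 255. */
uint8_t AddCeiling8(uint8_t a, uint8_t b) {
  uint16_t sum = (uint16_t)a + (uint16_t)b;  /* bit 8 of sum holds the carry */
  if (sum > 255) {
    return 255;  /* carry occurred, so clamp to the upper bound */
  }
  return (uint8_t)sum;
}
```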
crisp input An input parameter to the fuzzy logic system, usually with units like cm, cm/sec, °C, etc.
crisp output An output parameter from the fuzzy logic system, usually with units like dynes, watts, etc.
critical section Locations within a software module where, if an interrupt were to occur, an error could result (e.g., data lost, corrupted data, program crash, etc.). Same as vulnerable window.
cross-assembler An assembler that runs on one computer but creates object code for a different computer.
cross-compiler A compiler that runs on one computer but creates object code for a different computer.
DAC Digital to analog converter, an electronic device that converts digital signals (i.e., integers) to analog form (e.g., voltage).
data acquisition system A system that collects information, same as instrument.
data bus A set of digital signals that connect the CPU, memory and I/O devices, specifying the value that is being read or written for each bus cycle. See also address bus and control bus.
defuzzification Conversion from the fuzzy logic output variables to the crisp outputs.
denormalized A denormalized number is an unnormalized floating point number with an exponent of the smallest possible value. An unnormalized number has a mantissa value less than one. The mantissa of a normalized floating point number is greater than or equal to 1, but strictly less than 2.
desk checking or dry run We perform a desk check (or dry run) by determining in advance, either by analytical algorithm or explicit calculations, the expected outputs of strategic intermediate stages and final results for typical inputs. We then run our program and compare the actual outputs with this template of expected results.
device driver A collection of software routines that perform I/O functions.
digital signal processing Processing of data with digital hardware or software after the signal has been sampled by the ADC (e.g., filters, detection and compression/decompression).
direct An addressing mode where the data or address value for the instruction is located in memory at address $0000 to $00FF. Contrast with extended.
direction register A bi-directional port configuration register that determines if the port will be an input or an output.
disarm Deactivate so that interrupts are not requested.
DMA Direct Memory Access is a software/hardware synchronization method where the hardware itself causes a data transfer between the I/O device and memory at the appropriate time when data needs to be transferred. The software usually can perform other work while waiting for the hardware. No software action is required for each individual byte.
double byte Two bytes containing 16 bits. Same as word.
double-pole switch Two separate and complete switches that are activated together, same as two-pole. Contrast with single-pole.
double-throw switch A switch with three contact connections. The center contact will be connected to exactly one of the other two contacts. Contrast with single-throw.
download The process of transferring object code from the host (e.g., the PC) to the target microcomputer (e.g., the 9S12).
drop out An error that occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values (e.g., I = 100*(N/51) can only result in the values 0, 100, or 200, whereas I = (100*N)/51 properly calculates the desired result).
dummy PC A computer bus cycle that fetches data pointed to by the PC, but the data is not used.
dummy SP A computer bus cycle that fetches data pointed to by the SP, but the data is not used.
duty cycle For a periodic digital wave, it is the percentage of time the signal is high.
dynamic RAM Volatile read/write storage built from a capacitor and a single transistor having a low cost, but requiring refresh. Contrast with static RAM.
EEPROM Electrically erasable programmable read only memory that is nonvolatile and easy to reprogram. Typically, EEPROM can be erased and reprogrammed over 10,000 times.
effective address register (EAR) A register that contains the address for the current memory cycle.
embedded computer system A system that performs a specific dedicated operation where the computer is hidden or embedded inside the machine.
emulator An in-circuit emulator is an expensive debugging hardware tool that mimics the processor pin outs. To debug with a 6811 emulator, you would remove the 6811 processor chip and attach the emulator cable into the 6811 processor socket. The emulator would sense the processor input signals and recreate the processor output signals on the socket as if a 6811 chip were actually there running at 2 MHz. Inside the emulator you have internal read/write access to the registers and processor state. Most emulators allow you to visualize/record strategic information in real-time without halting the program execution. You can also remove ROM chips and insert the connector of a ROM-emulator. This type of emulator is less expensive, and it allows you to debug ROM-based software systems.
EPROM programmer System hardware/software that burns the object code into the microcomputer’s EPROM.
EPROM Same as PROM. Electrically programmable read only memory that is nonvolatile and requires external devices to erase and reprogram. It is usually erased using UV light.
erase The process of clearing the information in a PROM or EEPROM. The information bits are usually all set to logic 1.
EVB Evaluation Board, a product used to develop microcomputer software.
even parity A communication protocol where the number of ones in the data plus a parity bit is an even number. Contrast with odd parity.
expanded mode The mode where some of the I/O ports are used to create an external data bus (control, address, data) allowing external memory to be connected.
extended An addressing mode where the data or address value for the instruction is located anywhere in memory. Contrast with direct.
fan out The number of inputs that a single output can drive if the devices are all in the same logic family.
fast clear A 9S12 mode where the associated flag is automatically cleared when the data or timer register is accessed.
filter In the debugging context, a filter is a boolean function or conditional test used to make run-time decisions. For example, if we print information only if two variables x,y are equal, then the conditional (x == y) is a filter. Filters can involve hardware status as well. For example, if we halt when the serial port has an overrun error, then (SCSR&0x08) is the filter, and if(SCSR&0x08)asm("swi"); would be the entire instrument.
finite impulse response filter (FIR) A digital filter where the output is a function of a finite number of current and past data samples, but not a function of previous filter outputs.
fixed-point A technique where calculations involving nonintegers are performed using a sequence of integer operations. E.g., 0.123*x is performed in decimal fixed-point as (123*x)/1000 or in binary fixed-point as (126*x)>>10.
flash EEPROM Electrically erasable programmable read only memory that is nonvolatile and easy to reprogram. Flash EEPROMs are typically larger than regular EEPROM, and have fewer erase-reprogram cycles.
floating A logic state where the output device does not drive high or pull low. The outputs of open collector and tristate devices can be in the floating state. Same as HiZ.
floor Establishing a lower bound on the result of an operation.
fork Used in parallel programming to create additional software tasks that will run in parallel. See join.
frame A complete and distinct packet of bits occurring in a serial communication channel.
framing error An error when the receiver expects a stop bit (1) and the input is 0.
friendly Friendly software modifies just the bits that need to be modified, leaving the other bits unchanged.
full-duplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer simultaneously in both directions. Contrast with simplex and half-duplex channels.
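The fixed-point entry above approximates 0.123*x with integer operations only. The sketch below shows both forms: a decimal form that multiplies by 123 and divides by 1000, and a binary form that multiplies by 126 and shifts right by 10 bits (a division by 1024, since 126/1024 is about 0.123). The function names are assumptions for illustration, and the sketch assumes x is small enough that 126*x fits in 32 bits.

```c
#include <stdint.h>

/* Decimal fixed-point: 0.123*x computed as (123*x)/1000. */
uint32_t ScaleDecimal(uint32_t x) {
  return (123u * x) / 1000u;
}

/* Binary fixed-point: 0.123*x computed as (126*x)>>10, i.e., (126*x)/1024. */
uint32_t ScaleBinary(uint32_t x) {
  return (126u * x) >> 10;
}
```

The binary form replaces the divide with a shift, which is usually faster on a small microcontroller, at the cost of a slightly different approximation of the constant. In both forms the multiply is performed before the divide or shift, which avoids the drop out error described earlier in this glossary.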
full-duplex communication A system that allows information (data, characters) to transfer simultaneously in both directions.
functional debugging The process of detecting, locating, or correcting functional and logical errors in a program and the process of instrumenting a program for such purposes is called functional debugging or often simply debugging. Contrast with performance debugging.
fuzzification Conversion from the crisp inputs to the fuzzy logic input variables.
fuzzy logic Boolean logic (true/false) that can take on a range of values from true (255) to false (0). Fuzzy logic and is calculated as the minimum. Fuzzy logic or is the maximum.
gadfly A software/hardware synchronization method where the software continuously reads the hardware status waiting for the hardware operation to complete. The software usually performs no work while waiting for the hardware. Same as busy-waiting.
general purpose computer system A system like the IBM-PC or Macintosh with a keyboard, disk and display that can be programmed for a wide variety of purposes.
half-duplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer in both directions, but in only one direction at a time. Contrast with simplex and full-duplex channels.
half-duplex communication A system that allows information to transfer in both directions, but in only one direction at a time.
handshake A software/hardware synchronization method where control and status signals go both directions between the transmitter and receiver. The communication is interlocked, meaning each device will wait for the other.
hard real-time system A system that can guarantee that a process will complete a critical task within a certain specified range. In a data acquisition system, hard real-time means there is an upper bound on the latency between when a sample is supposed to be taken (every 1/fs) and when the ADC is actually started. Hard real-time also implies that no ADC samples are missed.
hexadecimal A number system that uses base 16.
HiZ A logic state where the output device does not drive high or pull low. The outputs of open collector and tristate devices can be in the HiZ state. Same as floating.
hold time When latching data into a device with a rising or falling edge of a clock, the hold time is the time after the active edge of the clock that the data must continue to be valid. Contrast with setup time.
hysteresis A condition when the output of a system depends not only on the input, but also on the previous output (e.g., a transducer that follows a different response curve when the input is increasing than when the input is decreasing).
I/O bound A situation where the input or output device is slower than the software. In other words, it takes longer for the I/O device to process data than for the software to process data.
I/O device A computer component capable of bringing information from the external environment into the computer (input device), or sending data out from the computer to the external environment (output device).
I/O port A hardware device that connects the computer with external components.
IEEE488 A medium speed handshaking parallel I/O standard used for desktop instruments.
immediate An addressing mode where the operand is a fixed data or address value.
incremental control system A control system where the actuator has many possible states, and the system increments or decrements the actuator value depending on whether the error is positive or negative.
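One pass of the incremental control system described above might be sketched as follows. The function name `IncrementalStep` and the actuator limits `U_MIN` and `U_MAX` are assumptions for illustration. The sketch also assumes that increasing the actuator value increases the measured variable; if the plant responds the other way, the increment and decrement would be swapped.

```c
#include <stdint.h>

#define U_MIN 0    /* hypothetical lowest actuator value */
#define U_MAX 255  /* hypothetical highest actuator value */

/* One controller pass: move the actuator value u one step toward reducing the error.
   Returns the new actuator value, which always stays within U_MIN..U_MAX. */
int16_t IncrementalStep(int16_t u, int16_t desired, int16_t actual) {
  if (actual < desired && u < U_MAX) {
    u++;  /* error is positive (too low), so increment the actuator */
  } else if (actual > desired && u > U_MIN) {
    u--;  /* error is negative (too high), so decrement the actuator */
  }
  return u;  /* unchanged when the error is zero or the limit is reached */
}
```

A controller of this kind would typically call the function at a fixed rate, for example from a periodic interrupt.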
indexed An addressing mode where the data or address value for the instruction is located in memory pointed to by an index register.
infinite impulse response filter (IIR) A digital filter where the output is a function of an infinite number of past data samples, usually by making the filter output a function of previous filter outputs.
inherent An addressing mode where there is no operand or where the operand is implied (not explicitly stated).
input capture A mechanism to set a flag and capture the current time (TCNT value) on the rising, falling or rising&falling edge of an external signal. The input capture event can also request an interrupt.
instruction register (IR) Register in the control unit that contains the op code for the current instruction.
instrument An instrument is the code injected into a program for debugging or profiling. This code is usually extraneous to the normal function of a program and may be temporary or permanent. Instruments injected during interactive sessions are considered to be temporary because these instruments can be removed simply by terminating a session. Instruments injected in source code are considered to be permanent because removal requires editing and recompiling the source. An example of a temporary instrument occurs when the debugger replaces a regular op code with the swi instruction. This temporary instrument can be removed dynamically by restoring the original op code. A print statement added to your source code is an example of a permanent instrument, because removal requires editing and recompiling.
instrument A system that collects information, same as data acquisition system.
instrumentation The process of injecting or inserting a debugging instrument.
interrupt A software/hardware synchronization method where the hardware causes a special software program (interrupt handler) to execute when its operation is complete. The software usually can perform other work while waiting for the hardware.
interrupt flag A status bit that is set by the hardware to signify an external event has occurred. Same as trigger flag.
interrupt mask A control bit that, if programmed to 1, will cause an interrupt request when the associated trigger flag is set. Same as arm.
interrupt service routine (ISR) Program that runs as a result of an interrupt.
interrupt vector 16-bit values at the end of memory specifying where the software should execute after an interrupt request. There is a unique interrupt vector for each type of interrupt.
IRQ An interrupt mechanism on the 9S12 on PE1.
join Used in parallel programming to combine two or more software tasks into one. Execution after a join will continue when all software tasks above the join are complete. See fork.
kibibit Stands for kilo-binary-bits, which is 1024 bits or 128 bytes, abbreviated Kibit.
kibibyte Stands for kilo-binary-bytes, which is 1024 bytes or 8192 bits, abbreviated KiB.
latch As a noun, it means a register. As a verb, it means to store data into the register.
latched input port An input port where the signals are latched (saved) on an edge of an associated strobe signal.
latency In this book latency usually refers to the response time of the computer to external events. One example is the time between new input becoming available and the time the input is read by the computer. Another example is the time between an output device becoming idle and the time the computer writes new data to it. There can also be a latency for an I/O device, which is the response time of the external I/O device hardware to a software command. For a data acquisition system, latency is the time between when the signal should be sampled and when the ADC is actually started.
LCD Liquid Crystal Display, where the computer controls the reflectance or transmittance of the liquid crystal, characterized by its flexible display patterns, low power, low cost, and slow speed.
LED Light Emitting Diode, where the computer controls the electrical power to the diode, characterized by its simple display patterns, medium power, and high speed.
linear filter Means the output is a linear combination of its inputs.
little endian Mechanism for storing multiple byte numbers such that the least significant byte exists first (in the smallest memory address). Contrast with big endian.
linear recursion A recursive technique that makes only one call to itself during the execution of the function. Linear recursive functions are easier to implement iteratively. We draw the execution pattern as a straight or linear path. See also recursion, binary recursion, and tail recursion.
loader System software that places the object code into the microcomputer’s memory. If the object code is stored in EEPROM, the loader is also called an EEPROM programmer.
logic analyzer A hardware debugging tool that allows you to visualize many digital logic signals versus time. Real logic analyzers have at least 32 channels and can have up to 200 channels, with sophisticated techniques for triggering, saving and analyzing the real-time data. In TExaS, logic analyzers have only 8 channels and simply plot digital signals versus time.
LSB The least significant bit in a number system is the bit with the smallest significance, usually the right-most bit. With signed or unsigned integers the significance of the LSB is 1.
maintenance Process of verifying, changing, correcting, enhancing, and extending a system.
mark A digital value of true or logic 1. Contrast with space.
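The little endian entry above, together with the big endian entry earlier in this glossary, can be made concrete by storing the same 16-bit number into a 2-byte buffer in each order. This is a sketch only; the function names `StoreBig16` and `StoreLittle16` are assumptions for illustration.

```c
#include <stdint.h>

/* Big endian: most significant byte goes in the smallest address (buf[0]). */
void StoreBig16(uint8_t *buf, uint16_t value) {
  buf[0] = (uint8_t)(value >> 8);
  buf[1] = (uint8_t)(value & 0xFF);
}

/* Little endian: least significant byte goes in the smallest address (buf[0]). */
void StoreLittle16(uint8_t *buf, uint16_t value) {
  buf[0] = (uint8_t)(value & 0xFF);
  buf[1] = (uint8_t)(value >> 8);
}
```

For the value 0x1234, the big endian buffer holds 0x12 then 0x34, and the little endian buffer holds 0x34 then 0x12.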
mask As a verb, mask is the operation that selects certain bits out of many bits, using the logical and operation. The bits that are not being selected will be cleared to zero. When used as a noun, mask refers to the specific bits that are being selected.
measurand A signal measured by a data acquisition system.
mebibit Stands for mega-binary-bits, which is 1,048,576 bits, abbreviated Mibit.
mebibyte Stands for mega-binary-bytes, which is 1,048,576 bytes, abbreviated MiB.
membership sets Fuzzy logic variables that can take on a range of values from true (255) to false (0).
memory A computer component capable of storing and recalling information.
memory-mapped I/O A configuration where the I/O devices are interfaced to the computer in a manner identical to the way memories are connected. From an interfacing perspective, I/O devices and memory modules share the same bus signals. From a programmer’s point of view, the I/O devices exist as locations in the memory map, and I/O device access can be performed using any of the memory access instructions.
Mibit Stands for mega-binary-bits, which is 1,048,576 bits, same as mebibit.
MiB Stands for mega-binary-bytes, which is 1,048,576 bytes, same as mebibyte.
microcomputer An electronic device capable of performing input/output functions containing a microprocessor, memory, and I/O devices.
microcontroller A single chip microcomputer like the Freescale 6811, Freescale 9S12, Intel 8051, Intel 8096, PIC16, or the Texas Instruments TMS370.
mnemonic The symbolic name of an operation code, like ldaa, psha, stx.
monitor or debugger window A monitor is a debugger feature that allows us to passively view strategic software parameters during the real-time execution of our program. An effective monitor is one that has minimal effect on the performance of the system. When debugging software on a windows-based machine, we can often set up a debugger window that displays the current value of certain software variables.
MSB The most significant bit in a number system is the bit with the greatest significance, usually the left-most bit. If the number system is signed, then the MSB signifies positive (0) or negative (1).
multiple access circular queue (MACQ) A data structure used in data acquisition systems to hold the current sample and a finite number of previous samples.
multi-threaded A system with multiple threads (e.g., main program and interrupt service routines) that cooperate towards a common overall goal.
negative logic A signal where the true value has a lower voltage than the false value. In digital logic true is 0 and false is 1, in TTL logic true is less than 0.7 volts and false is greater than 2 volts, and in RS232 protocol true is -12 volts and false is +12 volts. Contrast with positive logic.
nibble 4 binary bits or 1 hexadecimal digit.
nonatomic Software execution that can be divided or interrupted. Most lines of C code require multiple assembly language instructions to execute, therefore an interrupt may occur in the middle of a line of C code.
nonintrusive A characteristic when the presence of the collection of information itself does not affect the parameters being measured.
non-intrusive/intrusive Non-intrusiveness is the characteristic or quality of a debugger that allows the software/hardware system to operate normally as if the debugger did not exist. Intrusiveness is used as a measure of the degree of perturbation caused in program performance by an instrument. For example, a print statement added to your source code and single-stepping are very intrusive because they significantly affect the real-time interaction of the hardware and software. When a program interacts with real-time events, the performance is significantly altered. On the other hand, an instrument that outputs strategic information on LEDs (that requires just 1 µs to execute) is much less intrusive. A logic analyzer that passively monitors the address and data bus is completely non-intrusive. An in-circuit emulator is also non-intrusive because the software input/output relationships will be the same with and without the debugging tool.
non-invasive/invasive Non-invasiveness is the characteristic or quality of a debugger that makes the order of invocation immaterial. The debugger and the user program co-exist in the same global environment. On the other hand, an invasive debugger requires the user program to execute within an environment defined by the debugger. The debugger is invoked first and the program is then loaded either by the debugger or by the user from within the debugger. Invasiveness is also a measure of the degree of source code modification to debug or monitor a program. A resident debugger like the serial monitor is invasive because it exists first and then your program is loaded on top of it. This program development environment is invasive because the 9S12 SCI interrupts with the serial monitor are different from those of the eventual single chip embedded application. An in-circuit emulator is non-invasive because it can coexist (be added or deleted) from our system without changing the way our system runs.
nonlinear filter A filter where the output is not a linear combination of its inputs (e.g., median, minimum, maximum are examples of nonlinear filters).
nonreentrant A software module that, once started by one thread, cannot be interrupted and executed by a second thread. Nonreentrant modules usually involve nonatomic accesses to global variables or I/O ports: read modify write, write followed by read, or a multistep write.
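The nonlinear filter entry above names the median as an example. A minimal sketch of a 3-sample median in C follows; the function name `Median3` is an assumption for illustration. The result is always one of the three inputs and is not a weighted sum of them, which is why the filter is nonlinear.

```c
#include <stdint.h>

/* Return the middle value of three samples. */
uint8_t Median3(uint8_t a, uint8_t b, uint8_t c) {
  if (a > b) {          /* order the first two so that a <= b */
    uint8_t t = a;
    a = b;
    b = t;
  }
  if (c < a) return a;  /* c is smallest, so a is the middle */
  if (c > b) return b;  /* c is largest, so b is the middle */
  return c;             /* otherwise c lies between a and b */
}
```

In a data acquisition system, such a filter would typically be applied to the current sample and the two previous samples, which removes a single-sample spike without averaging it into the output.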
nonvolatile A condition where information is not lost when power is removed. When power is restored, then the information is in the state that occurred when the power was removed.
nonvolatile RAM Read/write storage that achieves its long term storage ability because it includes a battery.
normalized The mantissa of a normalized floating point number is greater than or equal to 1, but strictly less than 2.
null cycle A computer bus cycle that fetches data at address $FFFF, but the data is not used.
Nyquist Theorem If an input signal is captured by an ADC at the regular rate of fs samples/sec, then the digital sequence will accurately represent the 0 to 0.5 fs frequency components of the original signal.
object code Programs in machine readable format created by the compiler or assembler. The S19 records are examples of object code.
odd parity A communication protocol where the number of ones in the data plus a parity bit is an odd number. Contrast with even parity.
op code, opcode or operation code A specific instruction executed by the computer. The op code along with the operand completely specify the function to be performed. In assembly language programming, the op code is represented by its mnemonic, like ldaa. During execution, the op code is stored as a machine code loaded in memory. The ldaa instruction with immediate addressing has a machine code of $86.
open collector A digital logic output that has two states, low and HiZ.
open loop control system A control system that does not include sensors to measure the current state variables.
operand The second part of an instruction that specifies either the data or the address for that instruction. An assembly instruction typically has an op code (e.g., ldaa) and an operand (e.g., #55). Instructions that use inherent addressing mode have no operand field.
operating system System software for managing computer resources and facilitating common functions like input/output, memory management, and file system.
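The odd parity entry above, and the even parity entry earlier in this glossary, both depend on counting the ones in the data. The sketch below computes the parity bit that would accompany an 8-bit data value under each protocol. The function names are assumptions for illustration; on real hardware the serial port usually generates this bit itself.

```c
#include <stdint.h>

/* Count the number of bits that are 1 in an 8-bit value. */
static uint8_t CountOnes(uint8_t data) {
  uint8_t count = 0;
  while (data) {
    count = (uint8_t)(count + (data & 1));
    data >>= 1;
  }
  return count;
}

/* Parity bit that makes the total number of ones (data plus parity) odd. */
uint8_t OddParityBit(uint8_t data) {
  return (uint8_t)((CountOnes(data) & 1) ^ 1);
}

/* Parity bit that makes the total number of ones (data plus parity) even. */
uint8_t EvenParityBit(uint8_t data) {
  return (uint8_t)(CountOnes(data) & 1);
}
```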
oscilloscope A hardware debugging tool that allows you to visualize one or two analog signals versus time. In TExaS, oscilloscopes can plot up to 8 channels.
output compare A mechanism to cause a flag to be set and an output pin to change when the TCNT matches a preset value. The output compare event can also request an interrupt.
overflow An error that occurs when the result of a calculation exceeds the range of the number system. For example, with 8-bit unsigned integers, 200 + 57 will yield the incorrect result of 1.
overflow When TCNT increments from $FFFF back to $0000, setting the TOF flag. This overflow event can also request an interrupt.
overrun error An error that occurs when the receiver gets a new frame but the data register and shift register already have information.
parallel port A port where all signals are available simultaneously. In this book the parallel ports are 8 bits wide.
parallel programming A computer system that supports simultaneous execution of two or more software tasks. Compare to concurrent programming.
PC relative An addressing mode where the effective address is calculated by its position relative to the current value of the program counter.
performance debugging or profiling The process of acquiring or modifying timing characteristics and execution patterns of a program and the process of instrumenting a program for such purposes is called performance debugging or profiling. Contrast with functional debugging.
periodic polling A software/hardware synchronization method that is a combination of interrupts and busy-waiting. An interrupt occurs at a regular rate (periodic) independent of the hardware status. The interrupt handler checks the hardware device (polls) to determine if its operation is complete. The software usually can perform other work while waiting for the hardware.
personal computer system A small general purpose computer system having a price low enough for individual people to afford and used for personal tasks.
PID Controller A control system where the actuator output depends on a linear combination of the current error (P), the integral of the error (I) and the derivative of the error (D).
polling A software function to look and see which of the potential sources requested the interrupt.
port External pins through which the microcomputer can perform input/output. Same as I/O port.
positive logic A signal where the true value has a higher voltage than the false value. In digital logic true is 1 and false is 0, in TTL logic true is greater than 2 volts and false is less than 0.7 volts, and in RS232 protocol true is +12 volts and false is -12 volts. Contrast with negative logic.
precision For an input signal, it is the number of distinguishable input signals that can be reliably detected by the measurement. For an output signal, it is the number of different output parameters that can be produced by the system. For a number system, precision is the number of distinct or different values of a number system in units of “alternatives.” The precision of a number system is also the number of binary digits required to represent all its numbers in units of “bits.”
priority When two requests for service are made simultaneously, priority determines which order to process them.
private Can be accessed only by software functions in that module. Contrast with public.
private variable A variable that is used by a single module, and not shared with other modules.
process The execution of software that does not necessarily cooperate with other processes.
producer-consumer A multithreaded system where the producers generate new data, and the consumers process or output the data.
program counter (PC) A register in the processor that points to the memory containing the instruction to execute next.
PROM Same as EPROM. Programmable read only memory that is nonvolatile and requires external devices to erase and reprogram. It is usually erased using UV light. Contrast with EEPROM.
promotion Increasing the precision of a number for convenience or to avoid overflow errors during calculations.
pseudo-code A shorthand for describing a software algorithm. The exact format is not defined, but many programmers use their favorite high-level language syntax (like C) without paying rigorous attention to the punctuation.
pseudo op Operations included in the program that are not executed by the computer at run time, but rather are interpreted by the assembler during the assembly process. Same as assembly directive.
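The promotion entry above can be illustrated with the sum 200 + 57, which does not fit in 8 bits. In the sketch below, `Add8` keeps the result in 8 bits and wraps around, while `AddPromoted` promotes both inputs to 16 bits before adding. Both function names are assumptions for illustration.

```c
#include <stdint.h>

/* 8-bit result: the sum wraps modulo 256, so an overflow gives a wrong answer. */
uint8_t Add8(uint8_t a, uint8_t b) {
  return (uint8_t)(a + b);
}

/* Promote both inputs to 16 bits first, so the full sum (0 to 510) fits. */
uint16_t AddPromoted(uint8_t a, uint8_t b) {
  return (uint16_t)((uint16_t)a + (uint16_t)b);
}
```

With inputs 200 and 57, `Add8` returns 1 and `AddPromoted` returns 257.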
Glossary of Terms
public Can be accessed by any software module. Contrast with private. public variable A variable that is shared by multiple programs or threads. pulse width modulation A technique to deliver a variable signal (voltage, power, energy) using an on/off signal with a variable percentage of time the signal is on (duty cycle). Same as variable duty cycle. qualitative DAS A DAS that collects information not in the form of numerical values, but rather in the form of the qualitative senses (e.g., sight, hearing, smell, taste and touch). A qualitative DAS may also detect the presence or absence of conditions. quantitative DAS A DAS that collects information in the form of numerical values. RAM Random Access Memory, a type of memory where the information can be stored and retrieved easily and quickly. Since it is volatile the information is lost when power is removed. range Includes both the smallest possible and the largest possible signal (input or output). The difference between the largest and smallest input that can be measured by the instrument. The units are in the units of the measurand. When precision is in alternatives, range = precision•resolution. real-time A system that can guarantee an upper bound (worst case) on latency. real-time computer system A system where time-critical operations occur when needed. recursion A programming technique where a function calls itself. See also linear recursion, tail recursion, and binary recursion. reentrant A software module that can be started by one thread, interrupted and executed by a second thread. A reentrant module allows multiple threads to properly execute the desired function. registers High-speed memory located in the processor. The registers on the 9S12 are CCR, A, B, X, Y, SP, and PC. Registers do not have memory addresses. reproducibility (or repeatability) A parameter specifying how consistent over time the measurement is when the input remains fixed. 
reset vector The 16-bit value at memory locations $FFFE and $FFFF specifying where the software should start after power is turned on or after a hardware reset. resolution For an input signal, it is the smallest change in the input parameter that can be reliably detected by the measurement. For an output signal, it is the smallest change in the output parameter that can be produced by the system. Range equals precision times resolution, where precision is given in alternatives. ritual Software, usually executed once at the beginning of the program, which defines the operational modes of the I/O ports. ROM Read Only Memory, a type of memory where the information is programmed into the device once, but can be accessed quickly. It is low cost, must be purchased in high volume and can be programmed only once. See also PROM, EEPROM, and flash EEPROM. roundoff The error that occurs in a fixed-point or floating-point calculation when the least significant bits of an intermediate calculation are discarded so the result can fit into the finite precision. sampling rate The rate at which data is collected in a data acquisition system. See Nyquist Theorem. Scan or ScanPoint Any instrument used to produce a side effect without causing a break (halt) is a scan. Therefore, a scan may be used to gather data passively or to modify functions of a program. Examples include software added to your source code that simply outputs or modifies a global variable without halting. A ScanPoint is triggered
in a manner similar to a breakpoint but a ScanPoint simply records data at that time without halting execution. scope A logic analyzer or an oscilloscope, hardware debugging tools that allow you to visualize multiple digital or analog signals versus time. SCSI Small Computer Systems Interface, a high speed handshaking parallel I/O standard. sensitivity The sensitivity of a transducer is the slope of the output versus input response. The sensitivity of a qualitative DAS that detects events is the percentage of actual events that are properly recognized by the system. serial communication A process where information is transmitted one bit at a time. serial communications interface (SCI) A device to transmit data with asynchronous serial communication protocol. Same as UART and ACIA. serial peripheral interface (SPI) A device to transmit data with synchronous serial communication protocol. serial port An I/O port where the bits are input or output one at a time. setup time When latching data into a device with a rising or falling edge of a clock, the setup time is the time before the active edge of the clock that the data must be valid. Contrast with hold time. signed 2’s complement binary A mechanism to represent signed integers where 1 followed by all 0’s is the most negative number, all 1’s represents the value -1, all 0’s represents the value 0, and 0 followed by all 1’s is the largest positive number. sign-magnitude binary A mechanism to represent signed integers where the most significant bit is set if the number is negative, and the remaining bits represent the magnitude as an unsigned binary. simplex channel Hardware that allows bits (information, error checking, synchronization or overhead) to transfer only in one direction. Contrast with half-duplex and full-duplex channels. simplex communication A system that allows information to transfer only in one direction. 
simulator A simulator is a software application, like TExaS, which simulates or mimics the operation of a processor or computer system. Most simulators recreate only simple I/O ports and often do not effectively duplicate the real-time interactions of the software/hardware interface. On the other hand, they do provide a simple and interactive mechanism to test software. Simulators are especially useful when learning a new language, because they provide more control and access to the simulated machine, than one normally has with real hardware. single-pole switch One switch that acts independent from other switches in the system. Contrast with double-pole. single-throw switch A switch with two contact connections. The two contacts may be connected or disconnected. Contrast with double-throw. software interrupt vector The 16-bit value at memory locations $FFF6 and $FFF7 specifying where the software should go after executing a software interrupt instruction, swi. software maintenance Process of verifying, changing, correcting, enhancing, and extending software. source code Programs in human readable format created with an editor. space A digital value of false or logic 0. Contrast with mark. specificity The specificity of a transducer is the relative sensitivity of the device to the signal of interest versus the sensitivity of the device to other unwanted signals. The specificity of a qualitative DAS that detects events is the percentage of events detected by the system that are actually true events.
stabilize The process of stabilizing a software system involves specifying all its inputs. When a system is stabilized, the output results are consistently repeatable. Stabilizing a system with multiple real-time events, like input devices and time-dependent conditions, can be difficult to accomplish. It often involves replacing input hardware with sequential reads from an array or disk file. stack Last in first out data structure located in RAM and used to temporarily save information. stack pointer (SP) A register in the processor that points to the RAM location of the stack. start bit An overhead bit(s) specifying the beginning of the frame, used in serial communication to synchronize the receiver shift register with the transmitter clock. See also stop bit, even parity and odd parity. static RAM Volatile read/write storage built from three transistors having fast speed, and not requiring refresh. Contrast with dynamic RAM. stepper motor A motor that moves in discrete steps. stop bit An overhead bit(s) specifying the end of the frame, used in serial communication to separate one frame from the next. See also start bit, even parity and odd parity. string A sequence of ASCII characters, usually terminated with a zero. symbol table A mapping from a symbolic name to its corresponding 16-bit address, generated by the assembler in pass one and displayed in the listing file. synchronous protocol A system where the two devices share the same clock. tachometer A sensor that measures the revolutions per second of a rotating shaft. tail recursion A technique where the recursive call occurs as the last action taken by the function. See also recursion, binary recursion, and linear recursion. thread The execution of software that cooperates with other threads. A thread embodies the action of the software. One concept describes a thread as the sequence of operations including the input and output data. 
throughput The information transfer rate, the amount of data transferred per second. Same as bandwidth. time constant The time to reach 63.2% of the final output after the input is instantaneously increased. time profile and execution profile Time profile refers to the timing characteristic of a program and execution profile refers to the execution pattern of a program. toggle Change 0 to 1 or 1 to 0. A toggle switch is one that if it is off when you push it, it will turn on. If it is on when you push it, it will turn off. transducer A device that converts one type of signal into another type. trigger flag A status bit that is set by the hardware to signify an external event has occurred. Same as interrupt flag. tristate The state of a tristate logic output when HiZ or not driven. tristate logic A digital logic device that has three output states low, high, and HiZ. truncation The act of discarding bits as a number is converted from one format to another. two-pole switch Two separate and complete switches, which are activated together, same as double-pole. two’s complement A number system used to define signed integers. The MSB defines whether the number is negative (1) or positive (0). To negate a two’s complement number, one first complements (flip from 0 to 1 or from 1 to 0) each bit, then add 1 to the number. unary operation A function that produces its result given a single input parameter. For example, negate, increment, and decrement are unary operations.
unbuffered I/O The hardware and software are tightly coupled so that both wait for each other during the transmission of data. unipolar stepper motor A stepper motor where the current flows in only one direction (on/off) along the interface wires; a stepper with 5 or 6 interface wires. universal asynchronous receiver/transmitter (UART) A device to transmit data with asynchronous serial communication protocol, same as SCI and ACIA. unnormalized An unnormalized floating point number has a mantissa value less than one. The mantissa of a normalized floating point number is greater than or equal to 1, but strictly less than 2. unsigned binary A mechanism to represent unsigned integers where all 0’s represents the value 0, and all 1’s represents the largest positive number. vector An address at the end of memory containing the location of the interrupt service routines. See also reset vector and interrupt vector. volatile A condition where information is lost when power is removed. In C, volatile tells the compiler that the value may change beyond the control of the software itself. vulnerable window Locations within a software module, which if an interrupt were to occur at one of these locations, then an error could occur (e.g., data lost, corrupted data, program crash, etc.) Same as critical section. white noise A fundamental noise in resistive devices arising from the uncertainty about the position and velocity of individual molecules. Same as Johnson noise and thermal noise. word Two bytes containing 16 bits. Same as double byte. workstation A powerful general purpose computer system having a price in the $10K to $50K range and used for handling large amounts of data and performing many calculations. XIRQ A high priority interrupt mechanism available on the 9S12 on PE0. XON/XOFF A protocol used by printers to feedback the printer status to the computer. 
XOFF is sent from the printer to the computer in order to stop data transfer, and XON is sent from the printer to the computer in order to resume data transfer.
Solutions Manual
Checkpoint Solutions
Checkpoint 1.1: An embedded system is a microcomputer with mechanical, chemical and electrical devices attached to it, programmed for a specific dedicated purpose, and packaged up as a complete system.
Checkpoint 1.2: A microcomputer is a small computer that includes a processor, memory and I/O devices.
Checkpoint 1.3: Typical input devices include the keys on the keyboard, mouse and its buttons, joystick, CD reader, and microphone. The floppy disk can be used for input and output.
Checkpoint 1.4: Typical output devices include the LEDs on the keyboard, monitor, speaker, printer, and CD burner. The floppy disk can be used for input and output.
Checkpoint 1.5: The software in a digital watch must maintain time using a real-time clock, output the current time on the LCD display, respond to button pushes updating parameters as required, and check whether the current time matches the alarm time.
Checkpoint 1.6: One safety feature might be to turn it off after a finite amount of time, preventing the system from overheating. Another approach to safety is redundant or backup sensors. One could measure temperature at two or three locations, shutting off the toaster when any one sensor goes above threshold.
Checkpoint 1.7: Both terms refer to parameters of a system, but the differences lie in the level of detail used to describe the parameter. A requirement is usually defined in general terms, whereas a specification entails detailed engineering rigor. A requirement often refers to an objective of the system, while a specification describes how well the actual device works.
Checkpoint 1.8: It failed because employees were rewarded for poor behavior. It is much better to punish poor behavior and reward good behavior.
Checkpoint 1.9: In general, the presence of a minimally intrusive debugging instrument itself only has minimal effect on the parameter being measured. One criterion is that the total execution time required to perform the instrumentation is small compared to the execution times of the original target operation.
Checkpoint 1.10: Runtime debugging can be activated in final production systems. Runtime debugging is quicker to activate/deactivate because an edit/assemble/download cycle is not needed. Assembly time debugging produces a final production system that runs faster and requires less memory.
Checkpoint 1.11: We are sure we debugged the exact system that is being manufactured. The debugging statements can be used to evaluate the proper operation of systems before they are shipped. The instruments can also be used to diagnose and repair systems.
Checkpoint 2.1: This question is significant because it yields the largest 8-bit unsigned integer. We add the powers of 2 for each digit that is 1. 1•2⁷ + 1•2⁶ + 1•2⁵ + 1•2⁴ + 1•2³ + 1•2² + 1•2¹ + 1•2⁰ = 255
Checkpoint 2.2: This is the same question as 2.1. We multiply the hexadecimal digit (0 to 15) by powers of 16. 15•16¹ + 15•16⁰ = 255
Checkpoint 2.3: First, divide the binary into 4-bit nibbles, then convert the two 4-bit nibbles: %0100 = $4 and %0101 = $5. Third, combine the two hex digits into one number $45.
Checkpoint 2.4: First, divide the binary into 4-bit nibbles, then convert the three 4-bit nibbles: %1100 = $C, %1010 = $A and %1011 = $B. Third, combine the three hex digits into one number $CAB
Checkpoint 2.5: First, convert the two 4-bit nibbles: $4 = %0100 and $0 = %0000. Second, combine the 8 binary bits into one binary number %01000000
Checkpoint 2.6: First, convert the three 4-bit nibbles: $6 = %0110, $3 = %0011 and $F = %1111. Second, combine the 12 binary bits into one binary number %011000111111
Checkpoint 2.7: Four binary bits are required for each hex digit. 4*6 is 24 bits.
Checkpoint 2.8: This operation stores the 8-bit value in Register A out to address $0240. Because $0240 happens to be Port T, this will perform an output operation using Port T.
Checkpoint 2.9: First you subtract one number from the other number. Subtraction is an arithmetic operation, so the Z bit will be set if the result of the subtraction is zero. Then, you use a conditional branch on zero, which will branch if the two original numbers were equal.
Checkpoint 2.10: The variable Ptr1 is located at $0800 and is two bytes long, so Data1 will be located at $0802.
Checkpoint 2.11: The addressing mode defines the format for the effective address for that instruction. In other words, it defines how the instruction will access the data it needs.
Checkpoint 2.12: They have different formats for the constant, but they have the same machine code, and therefore perform exactly the same operation when executed. The only difference is programming style. We use hexadecimal formats for addresses and binary values. We use decimal formats for numbers that humans use when counting or measuring.
Checkpoint 2.13: ldaa #$32 loads Register A with the value 50. On the other hand, ldaa $32 loads the 8-bit memory contents at address $0032, which happens to be Port K.
Checkpoint 2.14: The bra instruction is two bytes long. The destination address is equal to the location of the instruction, so the rr field is -(size of the instruction), which is -2 (or $FE). The op code is $20, so the object code is $20,$FE.
Checkpoint 2.15: ldx #$0801 loads the value $0801 into Reg X. ldx $0801 loads the 16-bit memory contents at address $0801 into Reg X.
Checkpoint 2.16: Both perform the same operation, which is to load the memory contents of $0012–$0013 into Register X. The extended mode instruction does require more program memory space and will execute a little slower.
Checkpoint 2.17: ldaa $0810 staa $0820
Checkpoint 2.18: ldaa #%11000111 staa PTT
Checkpoint 2.19: Setting a bit in the direction register to 1 makes that pin an output. So PT7, PT6, PT5, PT4 will be outputs, meaning the software can write to $0240 and make the pin high or low. Clearing a bit in the direction register to 0 makes that pin an input. So PT3, PT2, PT1, PT0 will be inputs, meaning the software can read from $0240 to see if the voltages on these pins are high or low.
Checkpoint 2.20: The VOL of the LED driver is still 0.5 V. R = (5 - 1.7 - 0.5)/0.005 = 560 Ω.
Checkpoint 2.21: The VOL of the 9S12 is still 0.8 V. R = (5 - 1.7 - 0.8)/0.002 = 1250 Ω.
Checkpoint 3.1: 50/8 = 6.25, therefore it would take 7 bytes to store a 50-bit number.
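The resistor arithmetic in Checkpoints 2.20 and 2.21 can be sketched in C. This is only an illustration of the series model R = (supply - LED voltage - VOL)/LED current; the function name led_resistor_ohms and the choice of millivolt and microamp units (to stay in integer math) are assumptions made for this sketch, not part of the text.

```c
#include <assert.h>

/* Hypothetical helper: current-limiting resistor for an LED driven low-side.
   Voltages are in millivolts and current in microamps (assumed units).
   Series model: R = (Vsupply - Vled - VOL) / Iled. */
long led_resistor_ohms(long supply_mv, long led_mv, long vol_mv, long led_ua)
{
    assert(led_ua > 0);                          /* avoid divide by zero */
    long drop_mv = supply_mv - led_mv - vol_mv;  /* voltage left across R */
    return (drop_mv * 1000L) / led_ua;           /* (mV * 1000) / uA = ohms */
}
```

Calling led_resistor_ohms(5000, 1700, 500, 5000) reproduces the 560 Ω of Checkpoint 2.20, and led_resistor_ohms(5000, 1700, 800, 2000) reproduces the 1250 Ω of Checkpoint 2.21.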
Checkpoint 3.2: 3½ decimal digits is about 2000 alternatives, which is about 11 bits.
Checkpoint 3.3: The rule of thumb says 2⁶⁰ is about 10¹⁸, which is 18 decimal digits. 2⁴ is 16, which is about 1½ decimal digits. Together, we have 19½ decimal digits.
Checkpoint 3.4: 2 terabytes is 2*10¹² bytes, but two tebibytes is 2*2⁴⁰, which is 2⁴¹ bytes.
Checkpoint 3.5: Rotary switches, like the wiper speed switches on cars, have multiple positions. There are switches with three contacts (label them A,B,C), and there are three possible positions of the switch: A-B, no contact, or B-C.
Checkpoint 3.6: Add the basis elements for each 1 in the binary number. 64 + 32 + 8 + 2 is 106. From right to left the binary basis elements are 1,2,4,8, . . .
Checkpoint 3.7: Multiply the hex digit value by the corresponding hexadecimal basis element. 4*16 + 5 is 69. From right to left the hexadecimal basis elements are 1,16,256,4096, . . .
Checkpoint 3.8: We start by setting the running total to the number we wish to convert. We start with the basis element associated with the MSB and work towards the basis element for the LSB. We must also subtract basis elements from the running total as we determine they are needed. If the basis element in question is less than or equal to the running total, then we need that basis element.
Checkpoint 3.9: Combine binary basis elements to create the desired value. 45 = 32 + 8 + 4 + 1, so 45 = %00101101 = $2D.
Checkpoint 3.10: Combine binary basis elements to create the desired value. 200 = 128 + 64 + 8, so 200 = %11001000 = $C8.
Checkpoint 3.11: Combine signed binary basis elements to create the desired value. -128 + 64 + 32 + 8 + 2 = -22.
Checkpoint 3.12: They are the same, because bit 7 is zero.
Checkpoint 3.13: Combine signed binary basis elements to create the desired value. -45 = -128 + 64 + 16 + 2 + 1 = %11010011 = $D3.
Checkpoint 3.14: Because the range of 8-bit signed numbers is -128 to +127.
Checkpoint 3.15: Each four bits represent a single decimal digit, $25 = %00100101.
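The running-total method described in Checkpoint 3.8 can be sketched in C for 8-bit unsigned values. The function name to_binary_by_basis and the caller-supplied bits buffer are assumptions made for this sketch; only the algorithm (walk from the MSB basis element to the LSB, subtracting each basis element that fits) comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert value (0..255) to 8-bit binary using the
   basis-element method. Writes eight '0'/'1' characters plus a terminator
   into bits and returns the assembled byte. */
uint8_t to_binary_by_basis(unsigned value, char bits[9])
{
    assert(value <= 255);
    unsigned total = value;          /* running total */
    uint8_t result = 0;
    int i = 0;
    for (unsigned basis = 128; basis >= 1; basis /= 2) {
        if (basis <= total) {        /* this basis element is needed */
            total -= basis;
            result |= (uint8_t)basis;
            bits[i++] = '1';
        } else {
            bits[i++] = '0';
        }
    }
    bits[i] = '\0';
    return result;
}
```

For the value 45 the loop selects 32, 8, 4, and 1, matching Checkpoint 3.9; for 200 it selects 128, 64, and 8, matching Checkpoint 3.10.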
Checkpoint 3.16: 8192 + 64 + 32 + 8 + 2 = 8298.
Checkpoint 3.17: 1*4096 + 2*256 + 3*16 + 4 = 4660.
Checkpoint 3.18: 1234 = 4*256 + 13*16 + 2 = $04D2.
Checkpoint 3.19: 10000 = 8192 + 1024 + 512 + 256 + 16 = %0010011100010000.
Checkpoint 3.20: 1*4096 + 2*256 + 3*16 + 4 = 4660.
Checkpoint 3.21: -32768 + 2*4096 + 11*256 + 12*16 + 13 = -21555.
Checkpoint 3.22: 1234 = 4*256 + 13*16 + 2 = $04D2.
Checkpoint 3.23: -10000 = -32768 + 16384 + 4096 + 2048 + 128 + 64 + 32 + 16 = %1101100011110000.
Checkpoint 3.24: $3456.
Checkpoint 3.25: Out = not(eor(A,B))
[Figure: two implementations with inputs A and B: a 74HC86 followed by a 74HC04, and a single 74HC7266.]
Checkpoint 3.26:
    anda #$0F
    andb #$3C
Checkpoint 3.27: xgdx oraa #$12 orab #$34 xgdx
Checkpoint 3.28:
    ldaa N
    anda #$EF   ;clear bit 4
    staa N
Checkpoint 3.29: Use bset to set bits and bclr to clear bits
SSR_Init bset DDRT,#$20 ;PT5 output
         rts
SSR_On   bset PTT,#$20  ;PT5 high
         rts
SSR_Off  bclr PTT,#$20  ;PT5 low
         rts
Checkpoint 3.30: Set bit 7 to whatever value it used to be, and shift the other bits.
[Figure: D flip-flops holding bits b7 through b0 and the carry bit C, with shift and copy paths.]
Checkpoint 3.31: ldaa N asla asla staa M
Checkpoint 3.32: 9 bits. If 0 ≤ x ≤ 255 and 0 ≤ y ≤ 255 then 0 ≤ (x + y) ≤ 510.
Checkpoint 3.33: 9 bits. If -128 ≤ x ≤ 127 and -128 ≤ y ≤ 127 then -256 ≤ (x + y) ≤ 254.
Checkpoint 3.34: 16 bits. If 0 ≤ x ≤ 255 and 0 ≤ y ≤ 255 then 0 ≤ (x*y) ≤ 65025.
Checkpoint 3.35: 16 bits. If -128 ≤ x ≤ 127 and -128 ≤ y ≤ 127 then -16256 ≤ (x*y) ≤ 16384.
Checkpoint 3.36: Either xgdx addd #100 xgdx
or as we will learn in Chapter 6, it could have been done with indexed addressing leax 100,x
Checkpoint 3.37: -100 + 64 = -36, so V = 0. 156 + 64 = 220, so C = 0. N = 1 (negative) and Z = 0 (not zero).
Checkpoint 3.38: -100 + (-64) = -164, so V = 1 (overflow). 156 + 192 = 348, so C = 1. N = 0 (positive) and Z = 0 (not zero).
Checkpoint 3.39: -56 - 64 = -120, so V = 0. 200 - 64 = 136, so C = 0. N = 1 (negative) and Z = 0 (not zero).
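The flag reasoning used for the additions in Checkpoints 3.37 and 3.38 can be sketched in C: the same 8-bit patterns are added once as unsigned numbers (to find C) and once as signed numbers (to find V). The type AddFlags and the function add8 are names assumed for this sketch; it models only an 8-bit add, not the subtraction case.

```c
#include <stdint.h>

/* Hypothetical model of the condition codes after an 8-bit addition. */
typedef struct {
    uint8_t result;   /* low 8 bits of the sum */
    int n, z, v, c;   /* negative, zero, signed overflow, carry */
} AddFlags;

AddFlags add8(uint8_t a, uint8_t b)
{
    AddFlags f;
    int sa = (a >= 128) ? (int)a - 256 : (int)a;  /* signed view of a */
    int sb = (b >= 128) ? (int)b - 256 : (int)b;  /* signed view of b */
    unsigned wide = (unsigned)a + (unsigned)b;    /* unsigned sum, up to 510 */
    int swide = sa + sb;                          /* signed sum, -256..254 */

    f.result = (uint8_t)wide;
    f.c = (wide > 255);                    /* unsigned result did not fit */
    f.v = (swide > 127 || swide < -128);   /* signed result did not fit */
    f.n = (f.result & 0x80) != 0;          /* bit 7 of the result */
    f.z = (f.result == 0);
    return f;
}
```

With a = 156 (the pattern for -100) and b = 64 the sketch gives V = 0, C = 0, N = 1, Z = 0; with b = 192 (the pattern for -64) it gives V = 1, C = 1, N = 0, Z = 0.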
Checkpoint 3.40: -56 + 64 = 8, so V = 0. 200 + 64 = 264, so C = 1 (overflow). N = 0 (positive) and Z = 0 (not zero).
Checkpoint 3.41: To avoid ambiguity, because there may be two solutions to the equation N equals M*Q + R.
Checkpoint 3.42: Since the values in Registers A and B range from 0 to 255, their product must range from 0 to 65025. Therefore, all potential results will fit in Register D.
Checkpoint 3.43: dividend = quotient*divisor + remainder.
Checkpoint 3.44:
    ldaa N
    ldab #7
    mul         ;D=7*N
    ldx  #31
    idiv        ;X=(7*N)/31
    xgdx
    stab M
Checkpoint 3.45: $30.
Checkpoint 3.46: suba #$30
Checkpoint 3.47: Look up each letter, concatenate, add 0 at end, $48656C6C6F20 576F726C6400
Checkpoint 4.1: 2¹⁶, which is 65536 locations.
Checkpoint 4.2: 1 mebibyte is 2²⁰, so there are 20 address lines.
Checkpoint 4.3: The 9S12 can access individual bytes. Each 8-bit byte has a unique address.
Checkpoint 4.4: CU means control unit, BIU stands for bus interface unit, and ALU means arithmetic logic unit.
Checkpoint 4.5: $80 is the mask value to select bit 7, and $01 is the mask value to select bit 0. We simply change all the $01 to $80, change the PT0 comments to PT7.
Checkpoint 4.6: Because of the queue, the real 9S12 first executes its operation, then fetches the op codes for the next instruction.
Queue Before   Cycle  Address  R/W   LSTRB  Data   Executing
$B608,$0006    r      $0800    Read  1      $55    ldaa $0800
$B608,$0006    O      $FFFE    Read  0      ****
$B608,$0006    P      $F004    Read  0      $F000
$0006,$F000    P      $F006    Read  0      ****   jmp loop
               P      $F000    Read  0      $B608
$B608          P      $F002    Read  0      $0006
Checkpoint 4.7: Two copies, one on the stack and a second copy still in Register A.
Checkpoint 4.8: One copy of the data exists in Register A. The stack does not contain the data anymore.
Checkpoint 4.9:
    ldaa M
    psha
    ldaa N
    staa M
    pula
    staa N
Checkpoint 4.10: If the E clock is 4 MHz, its period is 250 ns. The divisor needs to be 2 µs/250 ns = 8. 2³ = 8, so we set TSCR2 to 3. If the E clock is 8 MHz, its period is 125 ns. The divisor needs to be 2 µs/125 ns = 16. 2⁴ = 16, so we set TSCR2 to 4.
; 9S12DP512/9S12E128 (E clock is 8 MHz)
Init movb #$80,TSCR1 ; enable
     movb #$04,TSCR2 ; PR210=100
     rts
; 9S12C32 (E clock is 4 MHz)
Init movb #$80,TSCR1 ; enable
     movb #$03,TSCR2 ; PR210=011
     rts
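The divisor calculation in Checkpoint 4.10 can be sketched in C. The function names timer_prescale and tcnt_rollover_us are assumptions made for this sketch, as is the choice to return -1 when the requested tick is not an exact power-of-two multiple of the E clock period; the 0 to 7 range reflects the three PR bits named in the listing above.

```c
#include <stdint.h>

/* Hypothetical helper: find the prescale exponent n (0..7) such that
   2^n E-clock periods equal the requested timer tick. Returns -1 if no
   exact power-of-two divisor exists. */
int timer_prescale(uint32_t eclock_hz, uint32_t tick_ns)
{
    /* divisor = tick / (E period) = tick_ns * eclock_hz / 1e9 */
    uint64_t scaled = (uint64_t)tick_ns * eclock_hz;
    if (scaled % 1000000000ULL != 0) {
        return -1;                       /* not a whole number of E cycles */
    }
    uint64_t divisor = scaled / 1000000000ULL;
    for (int n = 0; n <= 7; n++) {
        if (divisor == (1ULL << n)) {
            return n;
        }
    }
    return -1;
}

/* Hypothetical helper: time in microseconds for a 16-bit counter to roll
   over once, given the tick period in nanoseconds. */
uint32_t tcnt_rollover_us(uint32_t tick_ns)
{
    return (uint32_t)((65536ULL * tick_ns) / 1000ULL);
}
```

For a 2 µs tick, timer_prescale(4000000, 2000) gives 3 and timer_prescale(8000000, 2000) gives 4, the two TSCR2 values above; tcnt_rollover_us(2000) gives 131072.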
Checkpoint 4.11: 2 µs*65536 = 131072 µs, which is about 131 ms.
Checkpoint 4.12: We could call the existing functions or do the wait explicitly Timer_Wait10ms ldd TCNT ;end of wait time addd #10000 wloop cpd TCNT ;stop when RegD= 0) isPositive();
skip
Checkpoint 5.9: Choose an 8-bit register, because the number is 8 bits. After a load instruction the N and Z bits specify whether the number is negative/positive and zero/not zero respectively. Also the load instructions clear the V bit. Therefore any of the signed conditional branches can be executed after a load, with the effect of comparing to zero.
     ldaa M               ; test the value of M
     ble  skip            ; do not execute if negative or zero
     jsr  isGreaterThan0  ; if(M > 0) isGreaterThan0();
skip
Checkpoint 5.10: Choose a 16-bit register, because the number is 16 bits. Bring first number into a register, subtract second number. Choose a signed conditional branch because the number is signed.
     ldd  M          ; 16-bit data read
     cpd  #1000      ; 16-bit subtract
     bgt  high       ; branch if M>1000
     jsr  isless     ; M<=1000
     bra  next
high jsr  isGreater
next
Checkpoint 5.11: Choose a 16-bit register, because the number is 16 bits. Unsigned/signed doesn’t matter when testing for equal or not equal.
loop ldd  N      ; 16-bit data read
     cpd  #25    ; 16-bit subtract
     beq  next   ; stop when N==25
     jsr  body   ; execute body of the while loop
     bra  loop
next
Checkpoint 5.12: Convert Register B to Register D. Checkpoint 5.13: Execute Test 1000 times. ldy #1000 loop jsr Test dbne Y,loop
Checkpoint 5.14: for(i = 100; i != 0; i--) Process(); (the 100 can be any value)
Checkpoint 5.15: The macro runs faster. If the subroutine/macro is called/invoked from one or two locations in our software, then the macro will also require less storage.
Checkpoint 5.16: Each time requires 3 bytes, one byte for the parameter and two bytes for the return address. It is called five times, so 15 bytes are required.
Checkpoint 5.17: Public functions have an underline. E.g., SCI_OutString. Private functions do not have an underline. E.g., SetBaud.
Checkpoint 5.18: Local variables begin with a lower case letter. E.g., myKey. Global variables begin with an upper case letter. E.g., TheKey.
Checkpoint 5.19: Each conditional branch creates two potential execution paths. Twenty conditional branches might create 2²⁰, which is about a million, potential paths. In most cases, the actual number of paths will be much less, because taking one branch path usually prevents other conditional branches from being executed.
Checkpoint 5.20: The assembler determines the size of each instruction. Using the org statements it will create a symbol table mapping the symbols into physical addresses.
Checkpoint 5.21: The assembler determines the machine code for each instruction and creates the listing file.
Checkpoint 6.1: rr is 01 signifying Reg Y. As a 5-bit 2’s complement number, -10 is 10110₂. Thus, for this addressing mode, the post-byte xb is rr0nnnnn = 01010110 = $56.
Checkpoint 6.2: rr is 10 signifying Reg SP. The -1 is encoded as nnn = 8 - 1 = 111₂. The post-byte xb is rr111nnn = 10101111 = $AF.
Checkpoint 6.3: rr is 00 signifying Reg X. -1000 doesn’t fit into 5-bit or 9-bit mode, so 16-bit mode is needed. -1000 is $FC18. The post-byte xb is 111rr010ffee = 11100010,$FC18 = $E2FC18.
Checkpoint 6.4: leay 10,X
Checkpoint 6.5: Discarding stack requires adding to the SP leas 5,SP
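The 5-bit constant-offset encoding worked out in Checkpoint 6.1 can be sketched in C. The enum IndexReg and the function postbyte_5bit are names assumed for this sketch; the register codes (X = 00, Y = 01, SP = 10) and the rr0nnnnn layout are taken from Checkpoints 6.1 through 6.3, and only the 5-bit mode is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Register codes for the rr field, as used in Checkpoints 6.1 to 6.3. */
enum IndexReg { REG_X = 0, REG_Y = 1, REG_SP = 2 };

/* Hypothetical helper: build the post-byte rr0nnnnn for a 5-bit signed
   constant offset (-16..+15). */
uint8_t postbyte_5bit(enum IndexReg rr, int offset)
{
    assert(offset >= -16 && offset <= 15);   /* must fit in 5 signed bits */
    /* Converting to unsigned and masking keeps the low 5 bits of the
       two's complement representation. */
    unsigned nnnnn = (unsigned)offset & 0x1Fu;
    return (uint8_t)(((unsigned)rr << 6) | nnnnn);
}
```

postbyte_5bit(REG_Y, -10) returns $56, the value derived in Checkpoint 6.1.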
Checkpoint 6.6: The precision is 32 bits or 4 bytes, the length is 5, and the total size is 20 bytes. Checkpoint 6.7: The termination code might exist in the data itself. Checkpoint 6.8: The amount we add is equal to the size in bytes of one data entry. For example for byte data use 1,x. For 16-bit word data use 2,x. For 32-bit long data use 4,x. Checkpoint 6.9: One byte is discarded by the addb 1,SP instruction. Therefore the stack is balanced. Checkpoint 6.10: For this particular matrix multiplying by 2 is simpler than multiplying by 3. Checkpoint 6.11: An 8-bit value, J, is added to a 16-bit value n*I. This instruction passes the carry from the least significant byte into the most significant byte if necessary. Checkpoint 6.12: n can be any value 1 to 255 because the product n*I is stored in the 16-bit RegD. For the same reason, m can be any value 1 to 255. Checkpoint 6.13: The compiler will skip bytes to align the 16-bit elements. $F950 $F952 $F954 $F956
$15 wasted memory $0240 $82 “PTT”,0,0,0,0,0,0,0
Checkpoint 6.14: It is allocated contiguously, and a simple equation can be used to calculate the address of each entry. Checkpoint 6.15: Tree Checkpoint 6.16: Yes, there can be no cycles. Checkpoint 6.17: General graph Checkpoint 6.18: Two separate binary trees, one for first name, one for last name. Checkpoint 6.19: To make it easier to understand. Checkpoint 6.20: Create it using a 20-byte size, and just waste the space when less than 20 bytes are requested.
Checkpoint 6.21: It would be easier to search from either end.
[Figure: linked list holding nodes 1, 2, and 3, with the pointers GetPt and PutPt and null links at the ends.]
Checkpoint 7.1: Define the variable within the scope of the function. E.g., void MyFunction(void){ short myLocalVariable; }
Checkpoint 7.2: Define the variable outside the scope of the function. E.g., short myGlobalVariable; // accessible by all programs
void MyFunction(void){ }
Checkpoint 7.3: To push a word on the stack, we first decrement the stack pointer (SP) by 2, then we store the word at the location pointed to by the SP. To pull a word from the stack, first we read the word from memory pointed to by SP, then we increment the SP by 2.
Checkpoint 7.4: You don’t explicitly define the size of the stack. It can be implicitly calculated if the global variables are contiguously allocated starting at the beginning of RAM, and the stack pointer is initialized to the end of RAM. Then, the stack size is defined as the total RAM bytes minus the size of global variables. Some systems have a heap, which is used by malloc and free, that also is defined in RAM.
Checkpoint 7.5: movw #1000,2,SP ;Store 1000 onto the stack
Checkpoint 7.6: The same local variable name can be used in other subroutines (just like C).
Checkpoint 7.7: pshx takes less object code (1 byte versus 2 bytes). They both take two cycles to execute. The leas 2,sp may be easier to understand, and it will be easier to change.
Checkpoint 7.8: The basic idea is to subtract size bytes from SP
    nega        ;RegA = -size
    leas A,SP   ;allocate array
Checkpoint 7.9: To allocate space we decrement SP, to deallocate we increment SP
Sub leas -3,SP  ;allocate
    ;use locals
    leas 3,SP   ;deallocate
    rts
Checkpoint 7.10: In call by value, we pass a copy of the data (which may be the result of an expression). In call by reference, we pass a pointer to the data such that the calling program and the subroutine are accessing the same data.
Checkpoint 8.1: The 9S12 boards are classified DCE because the output of the PC is connected to the input of the 9S12, and vice versa.
Checkpoint 8.2: Set BR to 13, baud rate = 8 MHz/BR/16 ≈ 38400 bps. (If you are running with the 9S12C32 at 4 MHz, you would have to adjust the PLL to change the E clock to 8 MHz).
SCI0BD = 13;
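The baud rate relation in Checkpoint 8.2 (baud rate = E clock/BR/16) can be sketched in C. The function names sci_baud and sci_divisor, and the round-to-nearest choice in sci_divisor, are assumptions made for this sketch rather than anything prescribed by the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: baud rate produced by a given divisor BR,
   using baud = E clock / (16 * BR), truncated to an integer. */
uint32_t sci_baud(uint32_t eclock_hz, uint16_t br)
{
    assert(br > 0);
    return (uint32_t)(eclock_hz / (16UL * br));
}

/* Hypothetical helper: divisor BR closest to the requested baud rate. */
uint16_t sci_divisor(uint32_t eclock_hz, uint32_t baud)
{
    assert(baud > 0);
    uint32_t denom = 16UL * baud;
    return (uint16_t)((eclock_hz + denom / 2) / denom);  /* round to nearest */
}
```

With an 8 MHz E clock, sci_divisor(8000000, 38400) returns 13, and sci_baud(8000000, 13) returns 38461, which is within about 0.2% of the nominal 38400 bps.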
Checkpoint 8.3: Looks like 4 is also pressed.
Checkpoint 8.4: Because it can only handle 0, 1, or 2 keys pressed at a time.
Checkpoint 8.5: Since the movb instruction takes 4 cycles, E will be high for 4 cycles, which is 0.5 µs. By the way, the Metrowerks Codewarrior created code that runs faster than this assembly version, making E = 1 for 3 cycles, which is 0.375 µs.
Solutions Manual
Checkpoint 8.6: VOL = 0.7 V at 40 mA, so the coil voltage will be 5 - 0.7 V = 4.3 V, which is large enough to activate a 5 V coil.
Checkpoint 8.7: Select Clock A to be E/16, create SA = A/20. I.e., 16*20 = 320
    PWMPRCLK = (PWMPRCLK&0xF8)|0x04; // A=E/16
    PWMSCLA = 10;                    // SA=A/20, 0.125*320=40us
Checkpoint 8.8: Change prescale from 320 to 3200. E.g., change PWMSCLA from 5 to 50.
    PWMPRCLK = (PWMPRCLK&0xF8)|0x05; // A=E/32
    PWMSCLA = 50;                    // SA=A/100, 0.125*3200=400us
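The clock periods quoted in the comments of Checkpoints 8.7 and 8.8 follow from multiplying the two dividers. Here is a sketch of that calculation, assuming A = E/2^PCKA and SA = A/(2*PWMSCLA) as used above; the function name is hypothetical, and PWMSCLA is assumed to be in the range 1 to 255.

```c
#include <stdint.h>

/* Period of the scaled clock SA in nanoseconds.
   ePeriodNs : period of the E clock in ns (125 for an 8 MHz E clock)
   pcka      : low three bits written to PWMPRCLK, so A = E / 2^pcka
   pwmscla   : value written to PWMSCLA (1..255), so SA = A / (2*pwmscla) */
uint32_t PWM_SA_PeriodNs(uint32_t ePeriodNs, uint8_t pcka, uint8_t pwmscla){
  uint32_t divider = (1u << pcka) * 2u * pwmscla;
  return ePeriodNs * divider;
}
```

For example, pcka = 4 with pwmscla = 10 gives a total divider of 320 and a 40 µs SA period at 8 MHz, while pcka = 5 with pwmscla = 50 gives a divider of 3200 and a 400 µs period.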
Checkpoint 8.9: The data might be easier to communicate with humans because the duty cycle is now a decimal fixed point number 0.000 to 1.000. However, the precision is reduced from about 16 bits (62501 alternatives) to 10 bits (1001 alternatives).
Checkpoint 8.10: Yes, there are no shared registers (one uses clock A and the other uses clock B) and both are written in a friendly manner.
Checkpoint 8.11: Speed = (1 rotation/36 steps)*(1000 ms/s)*(60 sec/min)*(1 step/50 ms) = 33.3 RPM
Checkpoint 8.12: Change the 50 ms to 10 ms, and it will spin 5 times faster. Speed = (1 rotation/200 steps)*(1000 ms/s)*(60 sec/min)*(1 step/10 ms) = 30 RPM
Checkpoint 9.1: 1) external event occurs, 2) the condition is armed, 3) the microcomputer is enabled.
Checkpoint 9.2: Clear the I bit in the CCR with a cli instruction.
Checkpoint 9.3: 1) finish the instruction, 2) push registers on the stack, 3) get interrupt vector, 4) execute the ISR, 5) execute rti instruction returning to the program executing at the time of the interrupt.
Checkpoint 9.4: The software would crash because the ISR would interrupt over and over again.
Checkpoint 9.5: The software could disarm.
Checkpoint 9.6: If we are not worrying about being friendly, we can simply make
    DDRJ=0 because PJ6 and PJ7 are inputs
    PPSJ=0x80 because PJ7 needs a pull-down, and PJ6 needs a pull-up
    PERJ=0xC0 in order to activate pull up/down on PJ6 and PJ7
The following is friendly assembly code
    bclr DDRJ,#$C0 ; because PJ6 and PJ7 are inputs
    bset PPSJ,#$80 ; because PJ7 needs a pull-down
    bclr PPSJ,#$40 ; because PJ6 needs a pull-up
    bset PERJ,#$C0 ; activates pulls on PJ6 and PJ7
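The same friendly initialization can be written in C using read-modify-write operations, so that the other six bits of each register are left unchanged. The sketch below follows the register values listed in Checkpoint 9.6 (DDRJ bits 7 and 6 cleared, PPSJ bit 7 set and bit 6 cleared, PERJ bits 7 and 6 set). The three variables are ordinary stand-ins so the code runs on any computer; on a real 9S12 these names would come from the device header file, and the function name is hypothetical.

```c
#include <stdint.h>

/* Stand-ins for the Port J registers (assumption for host-side testing). */
uint8_t DDRJ, PPSJ, PERJ;

/* Friendly initialization: only bits 7 and 6 of each register are changed. */
void PortJ_InitFriendly(void){
  DDRJ &= (uint8_t)~0xC0;  /* PJ7 and PJ6 are inputs */
  PPSJ |= 0x80;            /* PJ7 needs a pull-down */
  PPSJ &= (uint8_t)~0x40;  /* PJ6 needs a pull-up */
  PERJ |= 0xC0;            /* activate pulls on PJ7 and PJ6 */
}
```

Starting from arbitrary register contents, the low six bits of each register keep their previous values, which is what makes the code friendly.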
Checkpoint 9.7: Clear bit 5 of PPSP.
Checkpoint 9.8: With a 16 MHz crystal, set RTICTL to $59 or $64. With an 8 MHz crystal, set RTICTL to $49 or $54.
Checkpoint 9.9: With an 8 MHz E clock, set TSCR2 to $87. With a 4 MHz E clock, set TSCR2 to $86. In this way TOF interrupts occur about every 1.048 seconds.
Checkpoint 9.10: Set PERIOD to 10000.
Checkpoint 9.11: Input capture occurs on the selected edge of the Port T input (rising, falling, or both rising and falling edges).
Checkpoint 9.12: In hardware, TCNT is copied into the TCn latch, the flag is set (CnF), and if armed and enabled an interrupt will occur.
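The 1.048 second figure in Checkpoint 9.9 can be verified numerically. This sketch assumes that TCNT is a 16-bit counter clocked at the E clock divided by 2^PR, where PR is the low three bits of TSCR2; the function name is hypothetical and the E clock is given in whole MHz.

```c
#include <stdint.h>

/* Time between timer overflows in microseconds.
   Assumes a 16-bit TCNT clocked at E / 2^PR, with PR = TSCR2 bits 2-0,
   and a nonzero E clock frequency given in MHz. */
uint32_t TOF_PeriodUs(uint32_t eClockMHz, uint8_t tscr2){
  uint32_t prescale = 1UL << (tscr2 & 0x07);
  return (65536UL * prescale) / eClockMHz;
}
```

Both settings in Checkpoint 9.9 give 1,048,576 µs: 65536*128 counts at 8 MHz, and 65536*64 counts at 4 MHz.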
Checkpoint 9.13: TFLG1 = 0x40; or movb #$40,TFLG1
Checkpoint 9.14: Change the parameter to TSCR2 to 4, so TCNT counts every 2 µs.
Checkpoint 9.15: 1234 or 1235 depending on the timing between the wave and the software.
Checkpoint 9.16: Change the parameter to Timer_Wait1ms to 1, so it waits 1 ms. 1 or 2, meaning 1 kHz or 2 kHz.
Checkpoint 9.17: It means if the pulse width increases by 8 µs or more, the system can detect the change.
Checkpoint 9.18: 1234.5/8 ≈ 154
Checkpoint 9.19:
    void SetB3(void){ PORTB |= 0x08; }
    void ClrB3(void){ PORTB &= ~0x08; }
SetB3 bset PORTB,#$08
      rts
ClrB3 bclr PORTB,#$08
      rts
Checkpoint 10.1: π*1000 is about 3141.59, so the variable integer part is 3142.
Checkpoint 10.2: π*256 is about 804.2477, so the variable integer part is 804.
Checkpoint 10.3: F = (461•C)/256 + 32.
Checkpoint 10.4: y = (1000•x 53•x1 1000•x2 51•y1 903•y2)/1000.
Checkpoint 10.5: Simply, R3 = (R1*R2)/(R1 + R2), because the fixed constants factor out.
Checkpoint 10.6: The Z-bit is set if and only if the most significant byte of the result is zero.
Checkpoint 10.7: Add (long) before one of the terms, so that the addition itself is performed with 32-bit precision. For example P = (N+M)/10; becomes
    P=((long)N+M)/10;
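The fixed-point ideas in Checkpoints 10.1 through 10.3 can be sketched in C as follows. The function names are hypothetical. The multiplication is promoted to 32 bits before it is performed, in the spirit of Checkpoint 10.7, so the intermediate product cannot overflow for any 16-bit input.

```c
#include <stdint.h>

/* Fahrenheit from Celsius using the binary fixed-point constant
   461/256 (about 1.8). The product is formed in 32 bits. */
int16_t CtoF(int16_t c){
  return (int16_t)(((int32_t)461 * c) / 256 + 32);
}

/* Variable integer part of a nonnegative fixed-point value:
   value*scale rounded to the nearest integer. */
int32_t FixedInteger(double value, int32_t scale){
  return (int32_t)(value * scale + 0.5);
}
```

Because 461/256 is slightly larger than 1.8 and the division truncates, results can differ from the exact conversion by a degree; for instance 37 degrees C converts to 98 rather than 98.6.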
Checkpoint 10.8: The one on the left is a simple 32-bit divide by 2, which is 100001/2 = 50000. The one on the right rounds up because the number is odd, which is 100001/2 + 1 = 50001.
Checkpoint 10.9:
Sub ldaa 1,y+ ;subtract top two,pop
    suba 0,y
    staa 0,y
    rts
Div ldab 1,y+ ;top is divisor
    tfr  b,x  ;X is divisor
    ldab 0,y  ;dividend
    clra
    idiv
    tfr  x,b  ;RegB is quotient
    stab 0,y
    rts
Checkpoint 10.10: You need more than 80 bits to convert, and most calculators do have this precision.
Checkpoint 11.1: Because the frequency components of the wiggles are higher than 1/2 the sampling rate. The Nyquist Theorem is violated.
Checkpoint 11.2: Because temperatures above 31°C are beyond the range, which is defined in this example as 0 to 31°C.
Checkpoint 11.3: Q2 is connected to 10 kΩ, Q1 is connected to 20 kΩ, and Q0 is connected to 40 kΩ; the other ends of the resistors are connected together. Any three resistors with the 1/2/4 ratio would be ok.
Checkpoint 11.4: Approximating the 10-bit ADC as linear, either Dout = 1024*Vin/5 or Dout = 1023*Vin/5.
Checkpoint 11.5: Acceleration is the derivative of the velocity, so use two discrete derivatives.
    d[0] = v[0] - v[1] (mV/ms)
    d[1] = v[1] - v[2] (mV/ms)
    a = d[0] - d[1] = v[0] - 2 v[1] + v[2] (mV/ms²)
Checkpoint 11.6: Acceleration is the derivative of the velocity, so use two discrete derivatives.
    a = (d[0] + 3d[1] - 3d[2] - d[3])/6 (mV/ms²)
Checkpoint 11.7: The minimum time to finish an instruction is 0, and the maximum time to finish an instruction is 13 cycles, causing an uncertainty of 13 cycles; at 42 ns each this jitter is 542 ns.
Checkpoint 11.8: Since the slope calculations and LCD output will always take less than 10 ms, the system is executing the while loop, waiting for the Flag to be set, when the interrupt occurs. This means it is either executing the ldaa Flag or the beq loop instructions. Both instructions take 3 cycles, so the maximum jitter will be 3 cycles.
Checkpoint 11.9: It would react quicker to changes in load or desired speed, but it would be more unstable (oscillations) in the steady state.
Checkpoint 11.10: The output would go all the way to 250 (full on) if too slow, and all the way to 0 (no power) if too fast. This is called bang-bang (like a taxi cab driver in New York City, who has two outputs: full power and full brake).
Checkpoint 12.1: Use local variables or registers instead of global variables
AVE leax D,X ;first+second
    tfr  X,D
    asrd
    rts
short AVE(short first, short second){ short num;
  num = first;
  return (num+second)/2;
}
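Returning to the discrete derivatives of Checkpoint 11.5, the two first differences and their difference can be written as a short reentrant C function that uses only parameters and local variables, in the spirit of Checkpoint 12.1. This is a sketch: the function name and the convention that v[0] is the newest sample are assumptions, and the sample spacing is taken as one time unit.

```c
#include <stdint.h>

/* Second difference of three velocity samples, v[0] newest, v[2] oldest.
   d0 and d1 are the two discrete first derivatives; their difference
   equals v[0] - 2*v[1] + v[2]. Uses only locals, so it is reentrant. */
int32_t Acceleration(const int16_t v[3]){
  int32_t d0 = (int32_t)v[0] - v[1];
  int32_t d1 = (int32_t)v[1] - v[2];
  return d0 - d1;
}
```

A constant slope gives zero acceleration: for samples 3, 2, 1 both first differences are 1, so the result is 0.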
Checkpoint 12.2: PTT = 3 if three threads are executing Func.
Checkpoint 12.3: The cli would reenable interrupts too early.
Checkpoint 12.4: There are two ways a FIFO can get full. If the average rate at which data is put in the FIFO is larger than the average rate at which data is removed from the FIFO, then the FIFO will always fill up. If the temporary rate at which data is put in the FIFO is larger than the temporary rate at which data is removed from the FIFO, and the FIFO size is small, then the FIFO may fill up.
Checkpoint 12.5: No, if the average producer rate exceeds the average consumer rate. Yes, if the temporary producer rate exceeds the temporary consumer rate.
Checkpoint 12.6: The TxFifo is empty and there is nothing to print.
Checkpoint 12.7: The software would crash because the ISR is spinning with interrupts disabled, and no software has the chance to empty the RxFifo.
Checkpoint 12.8: BR = 8,000,000/16/1200 ≈ 417.
Checkpoint 12.9: With open collector outputs, the low will dominate over HiZ. The signal will be low.
Checkpoint 12.10: The minimum is 1, and the maximum is N-1. On average, it will take N/2 transmissions for the message to go from one computer to another. There are 10 bits/frame, so there are 10,000 bytes/sec. Because there are 10 bytes/message, it takes 1 ms to transmit a message. Because it has to be sent 5 times, it takes 5 ms on average.
Checkpoint 12.11: The frame sent by a transmitter is echoed to its own receiver. If the data does not match, or if there are any framing or noise errors, then a collision occurred.
Checkpoint 12.12: Parity could be used to detect collisions. Also the message could have a checksum added. Framing or noise errors can also indicate a collision.
Checkpoint 12.13: With open collector outputs, the low dominates over HiZ.
Checkpoint 12.14: z 2 is equal to y 1. So, make y equal to z 1
Checkpoint 12.15: The rise time of the open collector signal will follow an R-C relationship. As the number of devices increases so does the capacitance. It will rise faster with a smaller R.
Checkpoint 12.16: If no device sends an acknowledgement, the SDA signal will float high, generating a negative acknowledgement.
Checkpoint 12.17: If they both send the same address and the same sequence of data bits, both will finish without getting a lost-arbitration error. If they both send the same address but different data values, an arbitration will occur during the data transfer, and the master with the smaller data value will win arbitration.
Checkpoint 12.18: tE/tbit is 24000/100 = 240. The only solution is to make IBFD equal to $1F, making MUL = 1, scl2start = 6, scl2stop = 9, scl2tap = 6, tap2tap = 8, SCLTap = 15
    MUL • {2•(scl2tap + [(SCLTap - 1) • tap2tap] + 2)} = 1•{2•(6 + [(15 - 1)•8] + 2)} = 240
Checkpoint A1.1: An assembly source code is software in human readable format created with an editor that gives explicit instructions to the computer. It includes both specific functions to perform as well as the order in which to perform them.
Checkpoint A1.2: An oscilloscope plots voltage level on the y-axis versus time on the x-axis.
Checkpoint A1.3: A logic analyzer also plots voltage level versus time, but the difference is that the voltage level just takes on the two digital logic states of high and low.
Checkpoint A1.4: Object code is software in machine readable format loaded into memory that gives explicit instructions to the computer.
Checkpoint A1.5: The op codes are shown in blue, while the pseudo-op codes are shown in gray.
Checkpoint A1.6: 2+4*6/5+1 = 2+24/5+1 = 2+4+1 = 6+1 = 7
Checkpoint A1.7: Using parentheses makes the expression easier for humans to understand, so from a style perspective the second one is better.
Checkpoint A1.8: With ldaa #56 the addition occurs in the IBM-PC at assemble time, while with the second case the addition occurs in the 9S12 microcomputer at run time.
Checkpoint A1.9: The assembler must create the symbol table during pass 1, so it must know the size of each assembly line during pass 1. A forward reference would prevent the assembler from knowing how many bytes to allocate during pass 1. It would probably create a phasing error.
Checkpoint A1.10: The loader stores the data $F0, $80, $20, $FE at addresses $F026 – $F029.
Checkpoint A1.11: No, the checksum should be 66, making it S10CF0DD80CCF0E616F4BC20B866
Checkpoint A1.12: S10556781234E6
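The checksums in Checkpoints A1.11 and A1.12 can be reproduced with a few lines of C. The sketch below assumes the usual S-record convention: the checksum is the one's complement of the low 8 bits of the sum of the count, address, and data bytes. The function names are hypothetical, and the record is passed in without its final checksum byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if the character is not hex. */
static int HexDigit(char c){
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Checksum for an S-record given without its checksum byte, for example
   "S10556781234". Every byte after the two-character type field is summed,
   and the one's complement of the low 8 bits is returned.
   Returns -1 if the text is malformed. */
int SRecordChecksum(const char *record){
  size_t i;
  unsigned sum = 0;
  if(record[0] != 'S' || record[1] == '\0') return -1;
  for(i = 2; record[i] != '\0'; i += 2){
    int hi = HexDigit(record[i]);
    int lo = (record[i + 1] != '\0') ? HexDigit(record[i + 1]) : -1;
    if(hi < 0 || lo < 0) return -1;   /* bad digit or odd number of digits */
    sum += (unsigned)(hi * 16 + lo);
  }
  return (int)(~sum & 0xFFu);
}
```

For the record in Checkpoint A1.12 the bytes $05, $56, $78, $12, $34 sum to $119; the low byte is $19 and its complement is $E6.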
Tutorial Solutions
Answer 1.1. The dec and bne instructions are executed 10 times.
Answer 1.2. The variable is defined in RAM.
Answer 1.3. The program is defined in ROM.
Answer 1.4. The different parts of the program are automatically colored. For example, labels, comments, op codes, pseudo-op codes, and operands have unique colors. Also, misspelled op codes and other syntax errors are flagged in red.
Answer 1.5. 9S12C32.
Answer 1.6. The cursor signifies the current location of the program counter (PC).
Answer 1.7. TExaS simulates a number of hardware devices as well as the software.
Answer 2.1. First, we specify what data to observe, and second we define when to collect the data.
Answer 2.2. To see 8-bit unsigned decimal, we use the d format.
Answer 2.3. We activate CycleView to dump into the TheLog.rtf window the memory bus cycles as the program executes.
Answer 2.4. We activate InstructionView to dump into the TheLog.rtf window the instructions as the program executes.
Answer 2.5. We activate LogRecord to dump into the TheLog.rtf window the information in the ViewBox as the program executes.
Answer 2.6. A ScanPoint is like a breakpoint, but it doesn’t stop. Rather, it copies the information from the ViewBox into the TheLog.rtf window and continues to execute.
Answer 3.1. The value is outside the range of 8-bit numbers. It will be truncated, and only the lower 8 bits will be stored; RegA will be improperly set to 0.
Answer 3.2. As an 8-bit binary number -1 is 11111111, thus RegA will be properly set to 11111111₂. Since the format is set to unsigned decimal (d), RegA will be shown as 255 in the ViewBox. If the format were set to signed decimal (d), RegA would have been shown as -1 in the ViewBox. This points out the reality that within the computer all data is binary. The data can be interpreted in a multitude of ways.
Answer 3.3. The value is outside the range of 16-bit numbers. It will be truncated, and only the lower 16 bits will be stored; RegX will be improperly set to 0.
Answer 3.4. As a 16-bit binary number -1 is 1111111111111111, thus RegX will be properly set to 1111111111111111₂. Since the format is set to unsigned decimal (D), RegX will be shown as 65535 in the ViewBox. If the format were set to signed decimal (D), RegX would have been shown as -1 in the ViewBox.
Answer 3.5. As an 8-bit binary number -128 is 10000000, thus RegA will be properly set to 10000000₂. Since the format is set to signed decimal (d), RegA will be shown as -128 in the ViewBox. If the format were set to unsigned decimal (d), RegA would have been shown as 128 in the ViewBox.
Answer 3.6. The value is outside the range of 8-bit numbers. It will be truncated, and only the lower 8 bits will be stored; RegA will be improperly set to 127.
Answer 3.7. As a 16-bit binary number -32768 is 1000000000000000, thus RegX will be properly set to 1000000000000000₂. Since the format is set to signed decimal (D), RegX will be shown as -32768 in the ViewBox. If the format were set to unsigned decimal (D), RegX would have been shown as 32768 in the ViewBox.
Answer 3.8. The value is outside the range of 16-bit numbers. It will be truncated, and only the lower 16 bits will be stored; RegX will be improperly set to 32767.
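The truncation and the two interpretations described in Answers 3.1 through 3.8 can be demonstrated in C. This is only a sketch with hypothetical function names. The out-of-range inputs used in the usage note (256, -129, 65536, and -32769) are assumed examples that produce the stored values quoted above; the tutorial's own operands are not repeated here.

```c
#include <stdint.h>

/* Keep only the lower 8 bits, as when a value is loaded into RegA. */
uint8_t Truncate8(int32_t value){
  return (uint8_t)value;     /* conversion to unsigned is modulo 256 */
}

/* Keep only the lower 16 bits, as when a value is loaded into RegX. */
uint16_t Truncate16(int32_t value){
  return (uint16_t)value;    /* conversion to unsigned is modulo 65536 */
}

/* Interpret the same 8 bits as a signed number. */
int16_t AsSigned8(uint8_t bits){
  return (bits & 0x80) ? (int16_t)(bits - 256) : (int16_t)bits;
}

/* Interpret the same 16 bits as a signed number. */
int32_t AsSigned16(uint16_t bits){
  return (bits & 0x8000) ? (int32_t)bits - 65536 : (int32_t)bits;
}
```

For instance, Truncate8(256) is 0 and Truncate8(-129) is 127, while the single bit pattern 11111111 reads as 255 unsigned and as -1 through AsSigned8.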
Answer 3.9. R is the result loaded into Register A. N is set if the result is negative, N = R7. Z is set if the result is zero, Z = not(R7)•not(R6)•not(R5)•not(R4)•not(R3)•not(R2)•not(R1)•not(R0). V is cleared to 0. There can be no overflow. The carry bit is not altered.
Answer 3.10.
    $0F&$85 = $05    N=0, Z=0
    $0F|$85 = $8F    N=1, Z=0
    $0F^$85 = $8A    N=1, Z=0
    ~$0F    = $F0    N=1, Z=0
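The results and condition code bits in Answer 3.10 can be checked with a small C model that applies the N and Z definitions from Answer 3.9. The structure and function names below are hypothetical, and only N and Z are modeled; the V and C bits are not represented.

```c
#include <stdint.h>

/* An 8-bit result together with the N and Z bits it would produce. */
typedef struct {
  uint8_t result;
  int n;             /* N = R7 */
  int z;             /* Z = 1 when all eight result bits are 0 */
} LogicResult;

static LogicResult Flags(uint8_t r){
  LogicResult out;
  out.result = r;
  out.n = (r & 0x80) != 0;
  out.z = (r == 0);
  return out;
}

LogicResult And8(uint8_t a, uint8_t b){ return Flags((uint8_t)(a & b)); }
LogicResult Or8(uint8_t a, uint8_t b){  return Flags((uint8_t)(a | b)); }
LogicResult Eor8(uint8_t a, uint8_t b){ return Flags((uint8_t)(a ^ b)); }
LogicResult Com8(uint8_t a){            return Flags((uint8_t)~a); }
```

Calling And8(0x0F,0x85) returns $05 with N=0 and Z=0, matching the first row of the table, and the other three functions reproduce the remaining rows.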
     77    N=0, Z=0, C=1
    100    N=0, Z=0, C=0
    160    N=1, Z=0, C=0
     32    N=0, Z=0, C=1
     96    N=0, Z=0, C=0
    224    N=1, Z=0, C=1

    -51    N=1, Z=0, V=0
   -100    N=1, Z=0, V=0
    +32    N=0, Z=0, V=0
    -96    N=1, Z=0, V=1
    -32    N=1, Z=0, V=0
    +96    N=0, Z=0, V=1
Answer 3.11. 155>>1 501 -50