LOGIC DESIGN for ARRAY-based Circuits
by Donnamaie E. White
for elektroda people
Logic Design for Array-Based Circuits by Donnamaie E. White Copyright © 1996, 2001, 2002 Donnamaie E. White
The original form of this book was published by Academic Press, 1250 Sixth Avenue, San Diego, California 92101-4311 in 1992. ISBN 0-12-746660-6. The figures were reproduced with the permission of Applied Micro Circuits Corporation. The Q20000 Series and other bipolar and BiCMOS series referenced belong to AMCC.
Note that this book is dated as to the AMCC ASIC business. It represents the design flow from the late 1980s to the early 1990s. That design flow bears a remarkable similarity to the current design flow used by Cadence and Synopsys, with the switch from schematic capture to HDL code and synthesis and with more of the validation steps now performed by various software programs. A basic understanding of the methodology underlying what we do today with deep submicron technologies is still worthwhile. Everything that bipolar design had to handle in the early 1980s is what we now must handle for 0.25 micron and below CMOS technologies.
This book was based on classes taught at UCSD and at AMCC's customer Design Center. Customer training courses prepared by high-technology vendors are a required extension to the training available in engineering classes at the college level. The quality of that training can vary with the experience of the instructor. The experience of the instructor with the nuances of the products is one facet. The teaching expertise is another. The purpose behind this book was to document what a proven instructor was adding to the course material and manuals. By providing this supplement, it would be possible for other, less experienced instructors to take over the actual presentation of the seminars while ensuring no loss of insight into the methodology or product taught.
Table of Contents
● Preface
● Overview
  ❍ Introduction
  ❍ Integration Levels
  ❍ Demand and Supply
  ❍ eLearning - the Next Best Thing
● Chapter 1 Introduction
  ❍ Introduction
  ❍ Selection
  ❍ Design Support Issues
    ■ Schematic Rules Checking
    ■ Reformatters
    ■ Design Upgrades
  ❍ Exercises
  ❍ Update 2000
● Chapter 2 Structured Design Methodology
  ❍ Structured Design Methodology
  ❍ Review of the Available Arrays
  ❍ Initial Sizing of the Circuit
  ❍ Create the Preliminary Macro Schematic (when schematics are used)
  ❍ Compute the Path Propagation Delay
  ❍ Compute the Estimated Power
  ❍ Pre-Simulation Steps
  ❍ Simulation
  ❍ Fault Grading
  ❍ Design Submission Through Prototype
    ■ Design Validation Review
    ■ Placement
● Chapter 3 Sizing the Design
  ❍ Functional Specification - A Closer Look
  ❍ Review the Available Arrays
  ❍ Architectural Specification or Hardware Specification
  ❍ Array Sizing
  ❍ Cell Capabilities
  ❍ Array Architecture
  ❍ Netlist
  ❍ Example - AMCC Arrays - Power Supply Options
  ❍ Examples
  ❍ Refining Interface Requirements
  ❍ Dual-Function I/O Macros
  ❍ Thermal Diodes
  ❍ Final Interface Cell Utilization
  ❍ Drivers
  ❍ Exercises
● Chapter 3 Appendix = Case Study in Sizing a Design
● Chapter 4 Design Optimization
  ❍ Introduction
  ❍ Optimization Approaches
  ❍ Design to Improve Speed
  ❍ Example of Silicon Efficiency
  ❍ Internal Net Delays
  ❍ Design to Reduce Internal Cell Utilization
  ❍ Design to Reduce I/O Utilization
  ❍ Design to Fit the Package
  ❍ Design to Reduce Power
  ❍ Design to Reduce Cost
  ❍ Basic Design for Circuit Testability
  ❍ Basic Design for Circuit Reliability
  ❍ Design to Reduce Cost
  ❍ Exercises
● Chapter 5 Timing Analysis for Arrays
  ❍ Introduction
  ❍ Path Propagation Delay Overview
  ❍ Intrinsic Set-Up and Hold Time
  ❍ Interconnect Delays
  ❍ Annotation
  ❍ Manual Computation - One Method
  ❍ Example Equations for Extrinsic Loading - Internal Nets
  ❍ k-Factors
  ❍ Computing Lfo
  ❍ Computing Lwo
  ❍ Computing Lnet
  ❍ Example Equations for Extrinsic Loading - Output Nets
  ❍ Worst-Case Delay Multiplication Factors
  ❍ Front Annotation
  ❍ Intermediate Annotation
  ❍ Back Annotation
  ❍ Exercises
● Chapter 6 External Set-up and Hold Times
  ❍ Introduction
  ❍ Case 1: When the Timing Specifications Are Nominal
  ❍ Case 2: When The Timing Specifications Are Worst-Case
  ❍ Example - AMCC Q1400 BiCMOS Series
  ❍ Example - AMCC Q20000 Bipolar Series
  ❍ Case Study: Preventing Hold Violations Due To Clock Skew
  ❍ Computing Hold Time in the Register Example
● Chapter 7 Power Considerations
  ❍ Introduction
  ❍ DC Power
  ❍ AC Power
  ❍ Worst-Case Power
  ❍ Power Reduction Techniques
  ❍ Design Rules for Power Reduction (AMCC)
  ❍ Computing DC Power Dissipation
  ❍ AC Macro Power Dissipation
● Case Study: DC Power Computation
  ❍ Step 10: Determine ECL Static Power PEO
● Case Study: AC Power Computation
  ❍ Total Power Dissipation
● Chapter 8 Simulation
  ❍ Introduction
  ❍ Simulators - The Tools
  ❍ Wafer Sort/Packages Part Sort
  ❍ Functional Simulation
  ❍ At-Speed Simulation For Timing Verification
  ❍ AC Tests
  ❍ Parametric Vectors
  ❍ Hazards
● Case Study: Simulation
  ❍ Functional Simulation
  ❍ Parametric Simulation
  ❍ At-Speed Simulation
  ❍ AC Test
● Chapter 9 Faults and Fault Detection
  ❍ Introduction
  ❍ Fault Types
  ❍ The Problem
  ❍ Selecting a Chain
  ❍ Case Study - 16:1 MUX
  ❍ 2:1 MUX Example
  ❍ 3:1 MUX
  ❍ Actual Test Vector Sequence for a 16:1 MUX with Clocked Output
● Chapter 10 Design Submission
  ❍ Case Study: AMCC Design Submission
  ❍ Continued
  ❍ Functional Simulation Submission (Required)
  ❍ At-Speed Simulation Submission (Required)
  ❍ AC Tests Path Delay Vector Submission (Optional)
  ❍ Parametric Vectors (Required or Optional)
● ASIC Glossary
  ❍ Glossary A-D
  ❍ Glossary E-K
  ❍ Glossary L-R
  ❍ Glossary S-Z
Preface
This book was based on classes taught at UCSD and at AMCC's customer Design Center from 1984 to 1994. It is based on schematic capture rather than Verilog or VHDL input and on manual support rather than the newer tools available for static timing verification, test generation, synthesis, etc. The Design Flow is, however, still the same. The steps must be done, with or without tools. This book details the theory behind the new EDA tools.
Customer training courses prepared by high-technology vendors are a required extension to the training available in engineering classes at the college level. The quality of that training can vary with the experience of the instructor. The experience of the instructor with the nuances of the products is one facet. The teaching expertise is another. The purpose behind this book was to document what a proven instructor was adding to the course material and manuals. By providing this supplement, it would be possible for other, less experienced instructors to take over the actual presentation of the seminars while ensuring no loss of insight into the methodology or product taught.
The course on which the book was based was rewritten for each new array series and technology change made by Applied Micro Circuits Corporation. The array series covered began with the bipolar Q700 (1000 gates on a chip running at 200 MHz) and ended with the Q20000 (20000 gates on a chip running at 1.2 GHz), with CMOS and BiCMOS added along the way.
Structured Design
In the process of these rewrites, it became obvious that a certain core of the seminar remained inviolate: the structured, orderly, logical approach to circuit design. This approach was taken from discrete board design, from SSI-MSI logic design, from bit-slice design, from structured software and firmware programming, and from systems concepts. This core material is represented in this text. The emphasis is on the total design picture, all the myriad details that interleave. What was also obvious is that examples that are wrong, as well as those that are right, are essential to rapid assimilation of the material. The last array series turned to for examples was the 1994-1995 AMCC Q20000 Bipolar array series.
The goal has been to create a book that can be used with any vendor's array series. It applies to those designing circuits for an ASIC (Application Specific Integrated Circuit) vendor to produce, and to those vendors who are designing ASSPs (Application Specific Standard Products), standard products that are designed to be built on an array base wafer. (ASSPs are the latest addition to the designer's toolbox.)
Acknowledgments
The author would like to thank the AMCC staff for their time and energy expended in the compilation of this material, with specific thanks to Richard W. Spehn for his expertise and encouragement.
Overview Last Edit July 22, 2001
Introduction
Each of the last six decades has seen a new technology come forward as the leading edge for that era. Table 1 provides a summary of this evolution by decade and integration level.
Table 1 - Integrated Circuit Evolution
Approx. Date | Size | Description
1950s | gate level | A few transistors and other components combined to form AND, OR or NOR gates
mid 1960s | SSI | 4 or more gates; NAND, NOR, OR, AND, EXOR, NOT or INVERT
early 1970s | MSI | up to 200 gates; registers, decoders, multiplexors, etc.
late 1970s | LSI | several hundred gates; ALUs with scratch-pad registers, interrupt controllers, microprogram sequencers, ROMs, PROMs
1980s | VLSI | 700 gates and up; CPUs, complex functions
1980s | ASIC | up to 30,000 gates; multiple functions
early 1990s | ASIC | up to 100,000 gates and increasing, with speeds at 1.4 GHz and higher
1980-1990s | EPAC | The development of analog circuit arrays
1990s | DSM, SoC, IP | Deep SubMicron (< 0.18µ) designs; 1 Million gate arrays; System on a Chip; Intellectual Property - soft and hard IP building blocks
2000 and up | Design Reuse (IP); High-speed | Deep SubMicron (0.13µ) designs; 410 Million gate arrays; more gates, faster designs; improved test methodology; faster synthesis; the 1 GHz and faster CPUs
Each technology change has led to a period where those designers who are state-of-the-art oriented, those who readily delve into new developments, accept and begin to use the newest devices in designs. For successful technologies, this is followed by an intense application and development phase where the demand for engineers who can design with the devices typically exceeds the supply of those engineers. This is the driving force behind the evolution of IP (Intellectual Property) blocks, predesigned mega-function blocks that can be re-used in more than one chip. These mega-functions become part of the design library. They may be hard-IP, where all levels of the base die are involved, or soft-IP, where only the metallization layers (currently about 6-8 layers of the die) are involved.
Integration Levels
From the mid-1960s, there are small-scale integration (SSI) gates: NAND, NOR, EXOR, and NOT or INVERT. SSI can be defined to be about 2-10 gates on a single chip. Anything can be built from SSI, but the design time, power, and size make this approach obsolete for designs that must be built quickly and in quantity. Custom design at the transistor and resistor level is reserved for special projects.
From the early 1970s there are larger blocks, medium-scale integration (MSI): registers, decoders, multiplexors, counters, adders, comparators, etc. MSI is loosely defined as approximately 20-100 gates. MSI allows more modular designs, speeding the design process when the blocks could be applied.
In the late 1970s arithmetic logic units (ALUs) with on-board registers, microprogrammable sequencers and interrupt controllers in a bit-slice format became available. Memory chips (ROM, PROM, RAM) in increasing sizes became readily available. Large-scale integration (LSI) culminated in the one-chip microprocessors. LSI is loosely defined as approximately 200-1000+ gates.
Very large scale integration (VLSI) has reached 20,000 gates and higher. LSI and VLSI further increase the modular block size, reducing design time, space, and power considerations and increasing reliability as connections are moved inside the components. Many LSI and VLSI blocks are designed by their manufacturers and referred to as fixed-instruction-set modules.
Bit-Slice Design
For any given design, if the architecture of the fixed LSI and VLSI blocks suits the application, then the design time is considerably shortened. When a one-chip microprocessor is not quite suitable, microprogrammable architectures can often provide sufficient customization. Microprogrammable architectures, such as bit-slice, allow closer control over the architecture but not total control. The basic building blocks are still designed by the chip manufacturer for generic applications. Bit-slice architectures include interruptable sequencers and 32-bit ALUs. The customization of the bit-slice modules to an application is done through the customer-designed module interconnection and through the implemented commands and their sequences. The set of commands, or instruction set, is called the microprogram for the design.
ASIC
The 1980s saw the acceptance of ASICs (application specific integrated circuits), VLSI devices large enough to allow designers to implement architectures that were suited to solving the design problem rather than forcing one architecture to solve everything. It was the natural extension to the bit-slice architectures, where some control of architecture was possible through microprogramming but where the basic building blocks were fixed designs. The application-specific customization of the design solution allows the designer to have the creative power of a gate-level breadboard design while keeping the production advantages of VLSI.
Not far behind the ASIC developments, multimedia and design integration saw a need to incorporate analog functions into digital systems. For years the trend had been away from analog design as a chosen career and now there was a shortage of design engineers. First came massive retraining of internal staff as companies struggled to cope. Then came the creation of the Electrically Programmable Analog Circuit (EPAC) and related devices.
Now designers are coping with 8-12 inch wafers, 1 million gate chips, and deep submicron technologies with a shrinking design time window. For example, the next-generation Pentium chips are mandated to be first-time silicon success. The first took four tapeouts to achieve success.
Table 2 - Integration Sizing Terminology
Acronym | Definition
SSI | small scale integration, where a few gates were lumped together as a means of improving the design and the design process
MSI | medium scale integration, where more gates were packed together in a single chip for the same reasons
LSI | large scale integration, where functional blocks could be contained on a chip
VLSI | very large scale integration and its various offshoots (VHLSI, etc.), where larger functional blocks and their related circuitry could be brought together in lower power, faster chips
ASIC | application-specific integrated circuit
ASSP | application specific standard product
EPAC(tm) | Electrically Programmable Analog Circuit
ALU | arithmetic-logic unit
CPU | central processor unit
DSM | Deep SubMicron
VDSM | Very Deep SubMicron
SoC | System-on-a-chip
IP | Intellectual Property - precoded functional block for design reuse (Hard-IP, Soft-IP)
Business Systems
Demand And Supply
The number of designers who can successfully complete the design of an array-based circuit through design submission and prototype acceptance is limited. Some estimates as of 1998 are as low as 50,000 engineers in the USA. The demand for designers capable of fast, efficient and successful design with ASICs already exceeds the supply of trained engineers, and the periodicals predict a continuing shortage. In addition to adding engineers to meet the demand, the productivity of each designer will need to be drastically increased.
Designers must choose from a complex array of new products, new technologies, changing standards, a wide range of support, changes in packaging, varied design tools, and changing design rules, while evaluating cost-effectiveness of the final product. Workstations are evolving, changing platforms, expanding features, and moving from device to board to system level capabilities.
Note: While this book was being written, Daisy went from one of the leading vendors to nothing, Valid transferred to the SUN platform, obsoleting the SCALD system, hardware emulators were beginning to be interesting, virtual memory was recognized as probably useful for the big designs, the average array speed went from 280 MHz to over 1.2 GHz, the ASIC array size went from 1000 gates to over 100,000 gates (30,000 useable), and design rules for the newer arrays were rated as four times more complicated than before.
In the time since, we have reached successful 750,000 gate designs and higher, have reduced technology from 0.35 to 0.18 micron and switched from schematic capture to Verilog or VHDL input. Design tools have advanced to pick up the intermediate steps between the larger packages and tools, to remove manual operations and make on-screen design a reality. Array vendors start as many FPGAs and ASICs and are outsourcing their libraries. EDA houses are supplying libraries along with a full design-flow tool set, usually with the intention of being the sole vendor for all of the array designer's needs.
With the increase in size, simulations became longer and 4K vectors were no longer a reasonable limit for test vectors, packaging was pushed to its limits and beyond, simulators were faced with the need for hardware-assist, timing verifiers became non-unique in the design cycle, frameworks began to be spoken of if not heavily used, and behavioral languages (HDL, VHDL) were accepted in marketing vocabulary and then supported, and are now the accepted design start. These changes are only some of the ongoing evolution made over the past five years.
Pick up any magazine or newspaper devoted to ASICs and at least one article will decry the monumental task facing the design engineer in the 90s and forward. There is a constant need to acquire new skills, understand and master new tools and accept new array design restrictions and features. And not only is the designer faced with the choice of which vendor and what product, but also with the management of the design once started. The design tools that do exist may not work together, making design management a complex and error-prone process.
As with any new technology, the engineer can choose to study the product and its support from the design manuals, datasheets and related literature. ASIC array vendors provide design manuals to assist the designer in completing a successful design submission, that point of transfer between the design and the vendor. Vendors maintain applications support engineers to answer questions and to guide the customer-designer through the submission process. This "earn while you learn" approach is acceptable in some cases, where design schedules will allow the weeks or months it takes for the engineer to "get up to speed" and to redo those design phases that failed due to misunderstanding of the technology and its limits.
eLearning - Next Best Thing
When I first composed this text, back in the early 1990s, little did I realize how much the industry would leap forward. None of us were prepared for the advance of the Internet, although e-mail had been with the engineering community since the 1960s and FTP had been in use since at least that time. HTML burst upon the scene and several of us clicked on the concept of "living" classrooms on the web almost instantly.
In 2001, Harvard put its entire curriculum on-line (for free). Cadence has put all of its technical training classes on-line (for a fee). Synopsys has begun to put its technical training on the web. The industry has spawned expensive-to-produce CD ROM training, which has not been widely accepted, page-turners (PowerPoint presentations with/without audio and with/without video assist), "live" webcasts or update training, and fully-integrated, true computer-based instruction. The goal is to have "living" technical material that can be updated faster than the two-year cycle for a technical book or the six-month cycle of a technical journal. The web is immediate.
This author has just completed the conversion of the Synopsys Advanced Chip Synthesis 3-day lecture-lab Workshop into the Advanced Chip Synthesis eLearning Workshop, hosted at Vitalect. This is the first of several planned course conversions. The workshops will still be available in ILT form (Instructor Led Training) as well. There is a free on-line Advanced Chip Synthesis Demo featuring one of the workshop Units. You can view the demo at Vitalect, but your browser must be configured with RealAudio and Flash for proper display of animations and to hear the audio scripts. Vitalect features a "Set-Up" page to help you.
Training Classes - Historical Review
ASIC, library and EDA vendors offer training classes where the array product and its peripheral requirements for design submission are presented in intense two to five day seminars and workshops. Because of the structure of a class, the array vendor can attempt to ensure that important issues are discussed or at least brought to the attention of the designers. This reduces the problems that could occur during the acceptance review of the design submission, which shortens the first-time design cycle.
AMCC - Applied Micro Circuits Corporation - offered a three-day array design class and a two-day workstation lab workshop to its customers. This class was taught for seven years, using the same methodology for a range of evolving products: Bipolar Q700 Series, Q1500, QH1500, Q3500 Series, Q5000 Series, Q20000 Series; CMOS Q6000 Series, Q6000A Series, Q9000 Series; BiCMOS Q14000 Series; and Q24000 Series.
The workstations covered included the Daisy Logician (now obsolete), Dazix SUN, VALID SCALD (now obsolete), Valid SUN and Mentor on Apollo. Simulators included Tegas 5 (discontinued at AMCC), Lasar 6 and Verilog. The seminar was also taught at UCSD - University of California at San Diego - as an extension graduate engineering class with credit. The range of series and the variability of the platforms and tools listed for just one vendor demonstrate some of the problems associated with maintaining currency.
Vendor-Independent Training - the Design Flow
With the range of technologies and array families within any technology and the number of workstations, platforms and simulators that support them, a basic design methodology was developed at AMCC to ensure a successful design the first time. This design flow is reflected in the wide variety of design tools and customer education classes offered at Synopsys and Cadence, the two biggest EDA firms. At this moment Synopsys is the industry leader in synthesis tools (Design Compiler, Module Compiler, Behavioral Compiler, DesignWare Foundation, RTL Analyzer, etc.) and Cadence is the leader in Place and Route tools (Gate Ensemble, Silicon Ensemble). The same flow is represented in the design-reuse concept supported by Synopsys and other companies. The flow is used with little variation for the design of a full chip (core and I/O) and for the design of an IP module (core only).
Structured design works. The methods have been developed and tested with hundreds of designs. Any problems seen on submissions and prototypes can usually be traced back to some violation of the stated design methodology. In addition to AMCC and its arrays and the Synopsys CBA Design System, design manuals from other vendors were obtained and reviewed to verify that the basic approach is generic, i.e., technology and vendor independent.
Once a structured design methodology was developed, it was imperative that the presentation be consistent across several instructors. Class notes, usually in the form of slides or overheads, are merely topical outlines and supplements. Few instructors last long if everything is written on the overhead and instruction is a "reading of the screen". The usual procedure is to keep key words and phrases on the screen while the instructor speaks "off the cuff". This approach is acceptable for most subjects. ASIC design, however, is so complex and encompassing an issue that the class content can be driven by active students, so that it emphasizes those areas questioned and de-emphasizes the rest. Classes will therefore vary in the depth of topics covered depending on the students in attendance.
About This Text
An improvement to the process is a textbook that can survive the evolving technologies and changing equipment. This text is an attempt to capture the verbal lecture used by the AMCC/UCSD instructor for this course, to provide consistency for the classroom and for those who choose to be self-taught. It does not try to duplicate what can be found in the design manuals per se beyond using design examples. The student cum reader is always referred to the most current design manual and datasheet for the array, array series or vendor support software of interest.
This text will present the basic structured design flow, show how the various steps interconnect and how they may be performed, and provide checklists of items that should be known prior to design start. It is designed to support any vendor and any array. For those who were taking the class for credit, chapter exercises were provided to allow the students to work with the equipment and materials of interest to them.
At the time, schematic capture was the methodology; today we have VHDL and Verilog. Daisy, Mentor and Valid workstations centered on schematic capture are no longer found in most engineering environments. The SUN workstation, the NT platform, HP and LINUX are the modes of operation today, using advanced software tools to complete the tasks formerly done by hand.
Introduction
Introduction to Chapter 1
Application-Specific Integrated Circuits (ASIC) [1996]
Application-specific integrated circuits (ASICs) fit between the detailed full-custom circuit designs and the off-the-shelf predesigned components. They offer the designer a faster method of tailoring the circuit to the task while retaining most of the fast design turn-around time offered by predesigned parts.
The Array
An ASIC array is a single die from a production wafer. In the 1990s, it was generally two or three layers of metalization placed on top of a base array. Figure 1-1 provides an overview of the steps involved in building a semi-custom array. By 2001, the levels of metalization had climbed to an average of six layers. At least two layers are usually reserved for power-ground planes. The layers in the base array varied with the process, with 26-28 layers in the base die being a reasonable assumption.
Figure 1-1 Semicustom Array Processing
The base array is predesigned by the array vendor. It consists of the layers required to define the cells and the components within them. These components vary depending on the type of cell and the array family. They are resistors, diodes and transistors (bipolar or CMOS), with capacitance and impedances implied in the layering. The threshold voltage generators and other overhead circuitry will also be included in the base design.
[Figure: WAFER (multiple die) -> DIE (individual array)]
The array designer will have already determined where the fixed power and ground pads are located, how many types of cell and how many of each type there are per array, and what design rules are required in the use of the array. The base array is premanufactured, reducing the turn-around time of the design between design acceptance and prototype or production. CBA Design System designers had the privilege of designing their own base die, including punch-outs for hard IP blocks, and power-ground routing for RAMs and soft IPs.
The wafer is put through wafer-sort to determine good and bad die. The die is a pre-packaged part which can be and is tested. When packaging is completed, the packaged part is retested. Wafer verification software (Dracula comes to mind) must verify all layers of the wafer, metalization and the base die, and verify that all IP blocks and memory blocks are properly connected. Hard IP blocks interconnect or "stitch" into all levels of the base die.
Customization
The customization of the array comes from the interconnect of the base array components. The interconnect is both the intraconnect between components within a cell to form a function, called a macro, and the interconnect between the macros to form the circuit module. One or more modules may be placed on an array. The interconnect between macros is considered the routing or nets. Routability is a measure of the ability to transform the design to physical metal etch patterns, the metalization of the array.
The macros are formed by a predefined layout pattern that is not considered part of the routing problem. Macros may exist with several "footprints", which allow them to be positioned with different layout aspects. They also exist in different drive versions, which may also cause differences in the layout pattern. Switching a macro from one drive configuration to another may require its relocation in the circuit layout.
With the high-speed arrays already available, the time delay or propagation delay through an interconnect net under heavy loading conditions may exceed the propagation delay through a macro. Priority pre-placement, design optimization for speed and other design approaches must be used to control the interconnect delays.
For DSM technologies, meaning any technology below 0.18 micron, it is a given that the interconnect delays will represent approximately 70% or more of the timing path delay. These technologies require pro-active design methodologies to be successful. Design partitioning, placement, and careful constraints are all required for a successful DSM design.
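As a rough illustration of the 70% figure, the interconnect share of a path can be estimated by summing macro delays and net delays separately. The sketch below assumes a path delay that is simply the sum of its macro and net delays; the function name and the delay values are hypothetical and are not taken from any AMCC design manual.

```python
def interconnect_share(macro_delays_ns, net_delays_ns):
    """Return the fraction of total path delay contributed by interconnect nets.

    Assumes the path delay is the plain sum of macro and net delays.
    """
    macro_total = sum(macro_delays_ns)
    net_total = sum(net_delays_ns)
    path_total = macro_total + net_total
    if path_total <= 0:
        raise ValueError("path delay must be positive")
    return net_total / path_total


# Hypothetical three-macro path; delays in nanoseconds, invented for illustration.
macro_delays = [0.10, 0.15, 0.05]
net_delays = [0.30, 0.25, 0.15]

share = interconnect_share(macro_delays, net_delays)
print(f"interconnect share of path delay: {share:.0%}")
```

With these made-up numbers the nets account for about 70% of the path, which is the situation in which placement and partitioning decisions matter more than macro selection.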
Design Tools - [2001]
In the 1990s the industry began to shift to EDA tools to handle the increased complexity of ASIC designs. Any reasonable engineer can handle a design of up to 30,000 gates. When 6 million gates are involved, it would take multiple engineers years to complete one design. By the 1990s designs shifted from schematic capture, with the engineer selecting the appropriate macros from a library, to HDL code. VHDL is currently used in Europe and Verilog is currently used in the United States.
Design Tools - [1990s] To perform a logical circuit design for an array-based circuit, the designer may choose between schematic capture, direct netlist creation, and the use of behavioral languages such as HDL and VHDL. Netlist generation as was done using Tegas is too tedious an approach for ASIC-based circuits past a minimal size. Netlist generation via a behavioral language or from schematic capture is the more usual approach. Translation programs exist to move a netlist in one format to a netlist in another format. The industry is still trying to expand the idea of EDIF, a common netlist that would allow input to any simulator and any placement system. For example, Verilog to Mentor translation is now possible using a Verilog netlist to create Mentor schematics. (Back-generation of schematics will remain a necessary step in spite of the push for behavioral descriptions as the preferred design tool.) Once an acceptable netlist has been generated by whatever means, the designer needs to check or verify that the design rules have not been violated. When the circuit is certified as acceptable and buildable, the circuit must be simulated according to the design submission requirements of the chosen vendor. The simulation must be checked. The design must be documented. Simulations involve control programs, stimulus generation, annotation delay files and descriptions. AC test analysis requires additional documentation. Which simulator can be used, and whether any timing verifier or other tools are available, is limited to what the array vendor supports. The simulation output files must be formatted according to vendor rules to allow the generation of test vectors. These will be transferred to the placement software and to test-generation software. A submission may include dozens of files that must be tracked, controlled for revision level and managed to verify that the design submitted to the vendor is the one intended to be submitted. And yes, errors do occur.
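The file-tracking problem described above can be illustrated with a small checksum manifest. This is only a sketch under stated assumptions: the function names and the idea of a per-directory manifest are hypothetical and are not part of any vendor's submission procedure. It records a digest for every file at sign-off and later reports anything changed, missing, or added.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of one file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(directory):
    """Map each file name in a (hypothetical) submission directory to its digest."""
    root = Path(directory)
    return {p.name: file_digest(p) for p in sorted(root.iterdir()) if p.is_file()}


def compare_manifests(recorded, current):
    """Report files changed, missing, or added since the manifest was recorded."""
    return {
        "changed": sorted(n for n in recorded if n in current and recorded[n] != current[n]),
        "missing": sorted(n for n in recorded if n not in current),
        "added": sorted(n for n in current if n not in recorded),
    }
```

Recording the manifest when the design is signed off and comparing it again just before submission is one way to confirm that the files sent to the vendor are the ones that were validated.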
Framework Systems Framework systems are under development as the means of alleviating the design management problem, but they are in their infancy and industry sages are predicting at least five years before they meet any goals. Further, those developing framework systems disagree about those goals. There are four basic functions of a framework agreed upon:
● integrate the design tools
● provide a common user interface
● manage the design data
● manage the design process
The integration of design tools includes tools from non-framework vendors. Allowing access to different design tools requires that the interface to those tools be reasonably similar and easy to use. (The Macintosh computers have proven the merit of similar and easy interface to tools and common databases.)
Array Selection as the First Task Whatever the framework systems end up providing, the basic design flow that exists today will remain intact. The first and most difficult task of array selection will not change, nor will the basic goals of the current design methodology. It is the ease of satisfying those goals that will change. The process of selecting an implementation for a circuit involves two basic decision processes.
● First, a decision must be made on the technology that will satisfy the design criteria for power and speed.
● Second, a selection must be made from the components (arrays, macro, IP, I/O, etc.) available within those technologies.
Even with all the changes made in software tools, these two key items remain unchanged. Choose the process, which defines the technology, and then choose the components. Even with high-level synthesis, the astute designer can "guide" the software to a better solution. The software (Synopsys, Cadence, Avant! are the big three) is chosen by the designing group with input from the selected foundry as to the product design flow.
Design Options The choices listed in Table 1-1 are available to the designer for whom off-the-shelf and bit-slice microprogrammable architectures are not good enough: full-custom arrays, semi-custom arrays, and simple-custom (gate) arrays.
Full Custom Arrays If the bit-slice or off-the-shelf microprocessor solution is not adequate, the next option may be a customized design. Full customization for an application-specific design is not practical in individual components at the SSI/MSI level. Instead, one or more custom semiconductors can be designed that are specifically for and only for the application. The customized VLSI chip may be totally designed by the customer from the design of the components present in the individual cells (resistors, diodes, transistors, etc.) to the interconnect between these components in one cell and other cells.
Table 1-1 Design Approach Comparisons
FULL CUSTOM                  SEMI-CUSTOM                   PREFAB
multiple layers (18-20+)     2-3 layers                    0 layers
fastest (maybe)              faster (maybe)                fast
smallest (maybe)             smaller (maybe)
longest design cycle         moderate design cycle         fastest design cycle
most control over design     moderate control over design  no control (fixed architecture)
All mask layers required to implement the full custom design must be generated specific to the application. Prototype and debug must encompass all layers. This approach will provide the smallest silicon and the most nearly optimal solution if the designer is experienced. It can have the longest prototype time. The key is the required expertise of the designer. The number of designers that can successfully design a fully customized array is significantly smaller than the number that can successfully design an MSI/LSI PC board. Depending on the manufacturer, a macro or standard cell library may exist that can speed the design time if the cells and macros are suitable for the application. The internal macro interconnects would still run through all mask layers. Design time may be reduced at the cost of some flexibility, but prototype time would remain lengthy. The advantage of the macro library is to help the designer by providing common functions while lessening the experience level required for a successful design.
Semi-Custom Arrays A compromise between off-the-shelf modules and full custom semiconductors is semi-custom design. Semi-custom combines a manufacturer-designed base wafer with all components in place (resistors, diodes, transistors, etc.) and a customer-generated interconnect pattern to implement the desired circuit. (Refer to Figure 1-2.) A SEMI-CUSTOM ARRAY CONSISTS OF:
● Base Wafer
● Macro Intra-connects
● Placement
● Interconnects
The interconnect pattern, also called a netlist, is generated from the customer-designed schematic and restricted to the topmost mask layers. Most arrays require two metal layers and a via (through hole) mask layer. Some arrays require three metal and two via layers. Three-layer arrays may use two layers for global interconnect and the third for macro intraconnect, but there are no hard rules. The more layers, the more prototype debug time required. This may be traded off against the significant gain in power management possible with the third layer. The schematic for a semi-custom array-based circuit is built up from a released library of macros that represent SSI, MSI, and sometimes LSI functions. If a different macro is needed from those in a released library, the manufacturer, for a fee, can usually generate a special custom macro (cell and component dependent). Most manufacturers prefer that the released macros be used.
Figure 1-2 Circuit Composition
Semi-custom arrays allow a designer to create at the SSI-MSI level, with familiar functions, without a detailed knowledge of the underlying technology. Semi-custom arrays may themselves contain elements of bit-slice components, allowing both the hardware and the software to be tailored to the application. For example, at least one CMOS array uses the AMD Am2909 sequencer as a macro. If the designer is experienced and familiar with the macro library, the resulting silicon usage may approach that required by the best full custom design. CBA (cell-based arrays) and the now more-popular standard-cell libraries [popular as of 1999] are macro collections. The differences between them involve how they are built in the substrata of the base die. CBA designs led the size war for some time; standard cells now typically produce smaller die sizes. Approximate estimates were for 10,000 array-starts in 2000, split 50-50 between these two technologies, the first time standard cells had come on so strong. Metalization layers are the customizable layers in a semi-custom array. Metalization, which sits on top of the base die, currently runs to 6 layers, 4 for interconnect and 2 for power-ground, although this may vary. The number of layers of metalization is expected to increase. Keep in mind that between each pair of routing metalization layers is a layer of vias, the vertical interconnects.
Simple Semi-custom Devices At the simplest end of the semi-custom spectrum are gate arrays, providing one level of interconnect to the user for specification with all other connections defined. Programmable devices such as PLAs (programmable logic arrays), PALs (programmable array logic), field-programmable muxes, sequencers, gate arrays, and other modules are available for limited quantity applications. (PLAs allow both AND and OR gates to be programmed; PALs allow only the AND gates to be programmed.) Programmable devices are restrictive in the functionality provided. They are suitable and competitive when there is a match between a module and the current application. Field-programmable devices are to VLSI what the ROM/PROM is to the microprocessor, i.e., they support and enhance the design project. These devices provide board-clean-up functions, incorporating the simple functions that do not fit into a full semi-custom array or that were found necessary to augment a bit-slice or fixed instruction set design. They are still with us.
Selection The choice between full-custom, semi-custom, fixed or simple gate-level custom is based on several factors. These include: architectural requirements, interface technology requirements, size restrictions, speed (maximum worst-case operating frequency), power limitations, power supply options, manufacturing cycle time, cost, packaging options, and design time. Figure 1-3 characterizes the problem.
Basis for Discussion The discussion in this text will refer primarily to Applied Micro Circuits Corporation arrays for examples of current technology. These include: Bipolar arrays: the Q5000 and the Q20000 Series; and BiCMOS arrays: the Q14000 and Q24000 Series. However, the design methodology can be applied to any arrays from any vendor for any array technology and to any future arrays developed by AMCC and the other array vendors.
The design methodology is generic. It is vendor and technology-independent.
WHERE DO YOU START? Figure 1-3 The Selection Problem
Note: Later chapters in this text refer to engineering workstations (EWS) and the methodology for their use in the design process. Workstations that are specifically referred to are: the Mentor Graphics System on Apollo and the Valid on SUN. Simulators referenced include Verilog on SUN4 and Lasar 6 on the VAX under VMS. The basic tools required for a design remain the same regardless of the workstation, platform, framework or mainframe used.
Circuit Architecture A fixed-instruction set microprocessor or sequencer has a predefined architecture and instruction set. A bit-slice solution places some constraints on the designer in terms of architecture but leaves most of the definition to the user by way of the selected interconnections between bit-slice modules and the microprogram control. An SSI/MSI implementation allows the designer to specify in complete, exact detail the architecture desired. The SSI/MSI design can be implemented in full custom or semi-custom VLSI. Bit-slice modules can be emulated on arrays. The ASIC arrays are big enough to support a complex ALU module but not yet large enough for one array to replace a full microprocessor.
Which Array Technology? The broad categories of technologies are CMOS, BiPOLAR, BiCMOS, and GaAs. Figure 1-4 provides a family tree of the most common technologies, at least at this moment. Array technology is a subject in itself and the reader is referred elsewhere for detailed discussions on any specific process. Figure 1-4a The Dominant Technologies
Bipolar as used in conjunction with arrays in this text refers to ECL internal with TTL, ECL 10K, ECL 100K I/O modes, or mixed ECL/TTL interface capability. Not all arrays offer the ability to mix TTL and ECL or to mix ECL 10K and ECL 100K on one chip. Some arrays may limit the types of macros that can be placed on the I/O cells. Design limits imposed by these restrictions are generally based on the array technology. The AMCC BiCMOS has the same interface capability as the bipolar arrays while providing a CMOS internal core. BiCMOS interfaces include CMOS, TTL, ECL 10K and ECL 100K and combinations of all. Not all BiCMOS arrays offer the ability to mix TTL and ECL or ECL 10K and ECL 100K on one chip. Figure 1-4b Relations among Silicon Technologies
Ref: Design of VLSI Gate Array ICs by Ernest E. Hollis
Technology differences for VLSI are primarily speed and power. CMOS is lower speed, lower power. Bipolar at 600MHz or 1.2GHz and up is faster with a high power dissipation (5-7 watts, and up to 16 watts for the fastest arrays, is not unusual). BiCMOS is intended to be a combination of these two, providing a reasonable speed (about 130MHz and up) at greatly reduced power dissipation. The actual maximum frequency of operation and the power dissipation will vary from series to series even within the technologies. Data sheets for the array series of interest should be reviewed and compared as a first method of estimation for applicability.
Obtain Data Sheets from several vendors Note: One array series may be lower power at one frequency and higher power at another. Comparisons must be made using equivalent conditions. When the conditions are not specified, ask! All vendors maintain Field-Application Engineers that can explain how measurements were taken or what assumptions were used. Figure 1-4c Relations among Technologies
Size The physical size limitations imposed on a design can dictate the design approach.
Base Arrays Base arrays come in a variety of sizes, usually specified in terms of equivalent gates. The arrays discussed herein range from 250 to 28000 gates, depending on the computational approach used. Equivalent gates allow a relative sizing between arrays of the same technology. The gate used as an equivalent gate for bipolar arrays is the NOR gate; that used for BiCMOS arrays is the NAND gate. Equivalent gate sizing can be misleading. For a CMOS array, one gate is typically one cell. For the Q24000 Series BiCMOS arrays, one internal cell is approximately 4 equivalent gates. Today's arrays are custom-designed to the project. The determination of the die size and the number of I/O is computed from initial evaluations based on the specification.
Cells The actual cells available on a bipolar array are larger and more complex, and can support a large variety of macros. A Q5000 Series logic cell (internal) can support: a 4:1 MUX, a 1:4 decoder, a scan-set D F/F, an 8-input OR/NOR, three latches or 2 D flip/flops. A 4-bit universal register (4 4:1 MUXs and 4 D flip/flops) requires 4.5 logic cells. A 4-bit carry-look-ahead adder with carry-out requires 5 logic cells. The 4-bit carry-look-ahead adder in the Q14000 Series BiCMOS arrays macro library requires 14 basic cells or 56 gates. The Q20000 Series L-cell is sized based on one Turbo output per cell and is smaller than a Q5000 cell. A flip/flop that uses 1 cell in the Q5000 Series may use 3 cells in the Q20000 Series. Estimating cell counts requires access to the macro library. Basic sizing information such as cell counts and die sizes can be obtained from the data sheets. Many circuit modules can be equated to cell counts by the specific array vendor. These estimates can be used for initial circuit sizing.
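As an illustration of initial circuit sizing, the following sketch totals cell usage from a macro inventory. The cell counts are the Q5000 Series figures quoted above; the macro names, the function names, and the choice to round the total up to whole cells are assumptions made for the example, not part of any AMCC tool.

```python
import math

# Cells per macro, from the Q5000 Series figures quoted in the text.
# The macro names are hypothetical labels, not macro library identifiers.
Q5000_CELLS_PER_MACRO = {
    "universal_register_4bit": 4.5,
    "cla_adder_4bit": 5.0,
    "mux_4to1": 1.0,
}


def estimate_cells(macro_counts, cells_per_macro):
    """Sum cell usage over a macro inventory, rounded up to whole cells (assumption)."""
    total = sum(cells_per_macro[name] * count for name, count in macro_counts.items())
    return math.ceil(total)


def equivalent_gates(cells, gates_per_cell):
    """Convert a cell count to equivalent gates for a given array series."""
    return cells * gates_per_cell


# A 16-bit register-plus-adder datapath built from four 4-bit slices of each macro.
datapath = {"universal_register_4bit": 4, "cla_adder_4bit": 4}
print(estimate_cells(datapath, Q5000_CELLS_PER_MACRO))  # 4*4.5 + 4*5 = 38 cells
```

With a factor of 4 gates per cell, `equivalent_gates(14, 4)` reproduces the 56 gates quoted for the 14-cell Q14000 adder. Real estimates should use the cell counts in the vendor's macro library.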
Array Size - Die Size A full custom design may or may not be smaller in die size than a semi-custom design. For a heavily populated array, the differences may be insignificant. The comparison must be based on the specific
application and the skill of the designer.
Packaging For arrays, the die size, the number of I/O pads, and the number of power and ground pads used affect available packaging. A number of standard packages are usually available for each array and the data sheet for an array series will provide the designer with an initial table of available packages. If less than the maximum number of I/O cells is used, some smaller packages may be usable. The package selection affects package pin capacitance (which affects loading delay for output pins), junction temperature computations and cooling considerations, and final cell placement (which also depends on the pin capacitance). It should be made well before final design completion.
Word Length The word length necessary for the system, whether a computer, controller, signal processor, etc., is known in advance. This is seen as the width of registers, partitioning of counters, width of adders, and number of simultaneously switching outputs (SSOs). It affects the partitioning and modularity of the design. The adders available with a macro library are typically 4-bit adders, cascadable with the carry-look-ahead macro to build a range of standard adder sizes. With a macro design, the available MSI macros and SSI logic can be used to provide a range of nonstandard word lengths. Counters are typically 4-bits wide, expandable to 12 or 16 bits in width. Comparators are modulo 6. Registers come as 4-bit widths and latches as 8-bits (octal latch). Larger macros are also under development or custom structures may be possible.
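The partitioning of a word into 4-bit adder macros can be modeled functionally. The sketch below is an assumption-laden illustration: it chains the carry from slice to slice (a simple ripple between slices) rather than modeling the carry-look-ahead macro, and the function names are hypothetical.

```python
import math

SLICE_BITS = 4  # width of the adder macro described in the text


def slices_needed(word_length, slice_bits=SLICE_BITS):
    """Number of adder macros required to cover the word length."""
    return math.ceil(word_length / slice_bits)


def add_with_slices(a, b, word_length, slice_bits=SLICE_BITS):
    """Add two unsigned words slice by slice, passing the carry upward.

    Returns (sum over the padded width, carry out of the top slice).
    """
    mask = (1 << slice_bits) - 1
    carry = 0
    result = 0
    for i in range(slices_needed(word_length, slice_bits)):
        shift = i * slice_bits
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift  # keep this slice's sum bits
        carry = partial >> slice_bits        # carry into the next slice
    return result, carry
```

A nonstandard 10-bit word needs `slices_needed(10)` = 3 macros, leaving two unused bit positions in the top slice that the SSI logic or a narrower macro could absorb.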
Instruction Set The instruction set that the system is to support is another major impact on the design implementation selection. By building a custom or semi-custom array, the hardware can be configured to support any instruction set yet have the advantages of still being a VLSI solution.
Speed The maximum frequency of operation specified for the circuit must be compared to that available for the array series or the off-the-shelf components. The nature of the design may make it necessary to look at the toggle frequency of the internal functions. The maximum frequency of operation, of interface as well as internal macros, is very important but it is not the only consideration when evaluating the performance that can be achieved. Due to loading delays, the final performance will depend heavily on the implementation possible with the given macros or possible custom macros, their drive factors and load limits.
Achievable speed is a function of both the experience of the designer in general and the macro library in specific. As an example, three implementations of a test circuit were made with the Q3500 Series and they varied from 145MHz to 233MHz (worst case maximum speed limits). The variance was found to be solely a function of the macros selected. This type of performance variance can be repeated for almost any circuit of any reasonable size. Speed, cell utilization (silicon density) and power can be traded off among the different possible implementations. This diversity is an advantage as well as a design challenge.
Macros - Libraries - Etc. The existence of an extensive macro library, or even one that supports the circuit function for the application at hand, can sway a decision as to which product to select. For the arrays of interest, the designer needs to review the existence of a macro library. If the array has a macro library, review the macros available for application to the intended design.
Macro Library In an array macro library, macros already released are available without delay. They represent pre-modeled, pre-simulated, pre-verified logic blocks. Their interconnect patterns are already defined for the various mask levels.
Custom Macros If custom macros are needed for a semi-custom array library, they involve 2-3 mask layers. If a custom macro has to be built for addition to a full custom array library, it is a multi-mask level design task.
Silicon Compilers Silicon compilers provide a translation from a design description to pre-defined macros. They provide support for designers who wish to stay at a higher level in the design process. A silicon compiler can be compared to a software compiler: it will speed the design process for the engineer at the cost of some flexibility. As with framework systems, the industry has no set standard to measure or define exactly what a silicon compiler can do. They remain in isolated use, faced with the same resistance that software compilers met on their first introduction.
Other Support Regardless of the design implementation, a certain amount of software design support is required. Error checking, annotation, simulation, testability analysis, fault-grading, and vector rules checking are some of the support areas pre-layout. After placement, there are placement rules checking, bus current checking for those arrays which require it, finalization of overhead current computations (for those arrays with programmable overhead), and finalization of power dissipation computations.
Design-Support Issues The basic questions involving design support which must be asked when selecting any array include:
1.) Which workstations are a prospective library or parts catalog available on? What mainframe? Is the library accessible at the customer site or must dial-up be used?
2.) What error checking at the schematic level is available? Are there engineering rules checks (ERCs)? Invalid names, excessive fan-out loading, population counts, current sums, power dissipation, technology mix-ups, array pad count, and interconnection restriction violations need to be caught before simulation.
3.) What about Front-, Intermediate- and Back-Annotation? These are needed for metal length and load evaluation and the impact of these on the timing. The ability of the annotation software to handle rise and fall load factor differences and metal layer differences needs to be clearly identified. Is there provision for output capacitive load (system and package pin capacitance)?
4.) Are there support tools for simulation? Simulation control files, reformatters, and vector checking are required. Timing verifiers are important when path matching is required. Other software that is useful for bit-slice, all arrays and any microprogrammable architecture device is a meta-assembler. This software allows a program or vector set to be described in a user-defined language (a pseudo-assembler) and compiled to ones and zeros. It provides the designer with the ability to code the vectors in pseudo-English for readability. An example is MICRO2 from Digital Equipment Corporation. Also for simulation, what about automatic test generation (ATG)? Are design-for-test (DFT) macros and support software available to allow the use of this tool?
5.) How does placement enter into the design sequence? This would be board placement for components or cell placement for a semi- or full-custom design. Does the software offer some assistance to the user in drafting a placement file? What checking software is provided either on the workstation or is accessible by dial-up?
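The meta-assembler idea in the fourth question, vectors written in a user-defined language and compiled to ones and zeros, can be sketched in a few lines. Everything specific here is hypothetical: the field layout, the mnemonics, and the `field=value` syntax are invented for illustration and do not represent MICRO2 or any other real meta-assembler.

```python
# Hypothetical field layout for one vector: (field name, width in bits).
FIELDS = [("op", 2), ("sel", 2), ("enable", 1)]

# Hypothetical user-defined mnemonics that make the vectors readable.
MNEMONICS = {
    "op": {"HOLD": 0, "LOAD": 1, "SHIFT": 2, "CLEAR": 3},
    "enable": {"OFF": 0, "ON": 1},
}


def assemble_line(line):
    """Translate 'field=value' tokens into one binary vector string."""
    widths = dict(FIELDS)
    values = {}
    for token in line.split():
        name, _, text = token.partition("=")
        if name not in widths:
            raise ValueError(f"unknown field: {name}")
        table = MNEMONICS.get(name, {})
        values[name] = table[text] if text in table else int(text, 0)
    bits = []
    for name, width in FIELDS:
        value = values.get(name, 0)  # unspecified fields default to zero
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits.append(format(value, f"0{width}b"))
    return "".join(bits)


def assemble(program):
    """Assemble a multi-line program, skipping blank lines and '#' comments."""
    lines = (ln.split("#")[0].strip() for ln in program.splitlines())
    return [assemble_line(ln) for ln in lines if ln]


print(assemble("op=LOAD sel=3 enable=ON\nop=SHIFT sel=1  # enable defaults to 0"))
# prints ['01111', '10010']
```

The readable source stays under revision control, and the binary vectors are regenerated whenever the field layout changes.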
Workstations, Mainframes, Dial-up When evaluating an array library on a workstation, there must be a match between the operating system, the graphics editor and simulators and the macro library. Each installation document for a line of workstations specifies the versions of the vendor software with which the library is compatible. Check the vendor summaries published by several technical magazines for an initial review or check with the array vendor for a more up-to-date list of equipment and software compatibility. Most array vendors offer support for several workstations. The workstations are not restricted to semi-custom or single array design support. They offer component libraries for board design through simulation. Multiple-array simulations are possible if the array is correctly modeled and there is enough memory.
Schematic Rules Checking Each workstation has a modest schematic checking pass that it makes on the way to generating the workstation-specific netlist. The error reports from these checking routines should be checked and all pertinent errors removed. If a partial circuit is being compiled, there may be interconnect errors that need to be ignored. The checks are not exhaustive, but later software will assume that these checking routines were successfully passed. Workstation checks include one-ended nets, undriven page inputs, page outputs with no destination, naming confusion, missing blocks, and an attempt at duplicate name detection. AMCC provides engineering rules checking (AMCCERC) for commonly made schematic interconnect and design errors including too many cells for the array (checked by cell type and macro type), too many fan-out loads, improper connections for 3-state and bidirectional enables, improper characters in names or names that are too long, improperly connected wire-ORs, dangling pins, grounded outputs, and terminated inputs. It is one of the most complete packages in the industry today. As a part of the AMCCERC package, internal current, worst-case power dissipation for bipolar arrays, fan-out loading tables, simultaneously switching outputs reporting and power-ground checking, an I/O list, a package data list and a detailed population report are generated. Once placement is completed, these reports have a final form that becomes part of the device specification.
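Two of the checks named above, one-ended nets and excessive fan-out, are simple enough to sketch. This is a minimal illustration, not the AMCCERC implementation: the `Net` record, the pin naming, and the rule that each input pin counts as one load are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    """Hypothetical netlist record: one net with its driving and driven pins."""
    name: str
    drivers: list = field(default_factory=list)  # output pins driving the net
    loads: list = field(default_factory=list)    # input pins the net fans out to


def check_nets(nets, max_fanout):
    """Return (net name, message) for each rule violation found."""
    errors = []
    for net in nets:
        if not net.drivers:
            errors.append((net.name, "undriven net"))
        if not net.loads:
            errors.append((net.name, "one-ended net (no destination)"))
        # Assumption: every input pin counts as one unit load.
        if len(net.loads) > max_fanout:
            errors.append((net.name, f"fan-out {len(net.loads)} exceeds limit {max_fanout}"))
    return errors
```

A production checker would weight each load by the load factor given in the macro library and apply the drive limit of the specific driving macro.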
Annotation Front-Annotation is the estimation of interconnect (pin to pin) delays in an array due to electrical fan-out loading, electrical wire-OR loading and estimated metal loading. The metal load delay estimate is a statistical estimate based on the net size. It is available pre-placement. Intermediate-Annotation uses a refined estimation of the metal load delay based on the relative placement of the individual macros in an array. The electrical fan-out loads and electrical wire-OR loads remain the same. Intermediate-Annotation is generated post-placement but pre-routing. Back-Annotation uses the final, actual metal load delay computed from the known metal lengths for the metal layers involved in the interconnect. It is available post-routing.
The availability of the annotation software, its ease of use, and the ease of integration into the simulation database are important concerns. Output capacitive load delays for system capacitive load and package pin capacitance affect the overall path delay. The ability to specify these loads and to have their delays included in the simulation database is another item of concern. If this feature is not available, the computation must be manually performed.
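The three annotation stages differ only in where the metal load estimate comes from. The sketch below makes that explicit under loudly stated assumptions: the linear delay model, every coefficient, the per-pin statistical factor, and the use of a half-perimeter bounding box for the placed estimate are hypothetical choices for illustration, not the vendor's delay equations. Metal-layer differences are not modeled.

```python
def front_metal_load(net_size, load_per_pin=2.0):
    """Pre-placement: statistical estimate from net size alone (hypothetical factor)."""
    return load_per_pin * net_size


def intermediate_metal_load(pin_positions):
    """Post-placement, pre-routing: half-perimeter of the box enclosing the placed pins."""
    xs = [x for x, _ in pin_positions]
    ys = [y for _, y in pin_positions]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def back_metal_load(segment_lengths):
    """Post-routing: sum of the actual routed metal lengths."""
    return sum(segment_lengths)


def net_delay(intrinsic, fanout_loads, wire_or_loads, metal_load,
              k_fanout=0.05, k_wire_or=0.10, k_metal=0.02):
    """Hypothetical linear model: intrinsic delay plus one term per load type."""
    return (intrinsic
            + k_fanout * fanout_loads
            + k_wire_or * wire_or_loads
            + k_metal * metal_load)
```

The fan-out and wire-OR terms are identical across the three stages; only the `metal_load` argument is refined as placement and routing data become available.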
Simulation Support Every simulator has its own unique format requirements for simulation input files. These cover the stimulus, its switching waveform, the operating condition (military, commercial, nominal or minimum) library, sampling rates or print-on-change recording, output file format, and input file format if a binary file can be read. The workstation may offer several methods of simulation and timing verification. The vendor may only accept certain files or file formats. List and waveform displays are available on the three previously listed workstations. Data can be displayed in binary, octal, decimal and hex format.
Reformatters If a standard simulation vector format is required by the array vendor or by software to which the simulation results must be submitted as data, some means of reformatting must be available. For arrays, the functional, parametric, and AC test simulation results are generally used as input to test vector generation software, and the allowed input formats may be restricted.
Example AMCC accepts only binary results for specific signals (input, output, bidirectional, 3-state and bidirectional enable internal signals). Sample size is restricted. No print-on-change results are used for functional simulations, only sampled results. No waveforms are requested. Since there are different simulation output formats, AMCC customers use a reformatter to translate Dazix, MENTOR, Verilog, Lasar and VALID simulation output files into a generic format. If any other workstation is used, the output of that simulator must also be reformatted. AMCC uses their AMCCSIMFMT software to translate output files into an AMCC generic interface format that their test software programs can read.
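One job a reformatter may have to do is turn print-on-change output into evenly sampled binary vectors. The sketch below illustrates that conversion only; the event tuple layout and the `x` marker for a signal that has not yet been assigned are assumptions, and the output is not the AMCCSIMFMT format.

```python
def sample_events(events, signals, step, end_time):
    """Convert print-on-change events into evenly sampled vectors.

    events:   iterable of (time, signal name, value) with value 0 or 1
    signals:  column order for the output vectors
    Returns one string per sample taken at t = 0, step, 2*step, ... up to end_time.
    """
    ordered = sorted(events, key=lambda e: e[0])
    state = {name: "x" for name in signals}  # 'x' until a signal is first driven
    vectors = []
    idx = 0
    for t in range(0, end_time + 1, step):
        # Apply every change that has occurred up to and including this sample.
        while idx < len(ordered) and ordered[idx][0] <= t:
            _, name, value = ordered[idx]
            state[name] = str(value)
            idx += 1
        vectors.append("".join(state[name] for name in signals))
    return vectors


events = [(0, "a", 0), (0, "b", 1), (15, "a", 1), (20, "b", 0)]
print(sample_events(events, ["a", "b"], step=10, end_time=30))
# prints ['01', '01', '10', '10']
```

Note that the change at time 15 first appears in the sample at time 20, which is why the sampling step has to be chosen with the vendor's vector rules in mind.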
Rules Checking Regardless of the implementation selected, the design must be simulated and the parts tested. There may be a number of functional, parametric and AC test simulation vector rules that must be followed to ensure correctness in the test program. The rules are based on tester limitations, test procedures and test objectives. The rules required by the array vendor must be clearly stated and it is increasingly desirable to have some form of rules check software available to help the designer. AMCC supplies a vector checker, AMCCVRC, to catch the more blatant vector rule violations such as missing required signals, too many signals switching in one vector causing noise, race conditions, undesired internal signals in the output and uneven sampling steps. Some basic toggle tests are also included.
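Two of the violations listed above, uneven sampling steps and too many signals switching in one vector, lend themselves to a short sketch. This is an illustration under assumptions, not AMCCVRC: the function name, the message wording, and the single switching limit applied to all signals are hypothetical.

```python
def check_vectors(times, vectors, max_switching):
    """Return a list of messages for the vector rule violations found.

    times:    sample time of each vector
    vectors:  equal-length strings of '0'/'1', one per sample
    """
    errors = []
    # Rule 1: every sample must be the same distance from the previous one.
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) > 1:
        errors.append(f"uneven sampling steps: {sorted(steps)}")
    # Rule 2: limit how many signals change between consecutive vectors.
    for i in range(1, len(vectors)):
        switching = sum(1 for p, q in zip(vectors[i - 1], vectors[i]) if p != q)
        if switching > max_switching:
            errors.append(f"vector {i}: {switching} signals switch (limit {max_switching})")
    return errors
```

Running such a check before submission is cheaper than having the vendor reject the vector set after reformatting.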
Submission Assistance The design submission process for custom and semi-custom arrays requires that a number of specific forms, files and validation procedures be followed, and the process is increasingly complex. Automation of that procedure is one desirable goal. Automation support is feasible for the I/O signal list, package pad-pin-post, capacitive load and I/O toggle frequency descriptions, design validation checklists and design submission checklists, including simulation submission. If no automated support is available, the necessary forms must be reviewed and filled in manually. Errors and incomplete information can lead to schedule delays. (Refer to the framework systems.)
Placement As a part of the submission process for custom and semi-custom arrays, the designer may wish to submit a desired placement or partial placement. The vendor must supply placement rules and restrictions for the particular array in the selected package as well as a placement worksheet. The user may be able to choose between a full graphic interface to the placement system or be content to supply the vendor with an ASCII list for placing some or all the macros, and let the vendor complete the placement process. The options and the control over placement become an issue when performance is driven to the limits of the array technology. I/O placement is an issue when an array will emulate an older technology and the PC board array pin out pattern must remain unchanged.
Design-Support Issues Design Upgrades A semi-custom array, full array or bit-slice design can be upgraded more easily than an LSI/MSI/SSI component or a fixed-instruction set microprocessor design. For bit-slice, if the design enhancements are known at the time of the original design, allowances can be made through interconnections and functional capabilities that are not accessed until a microprogram accessing these features is incorporated. Many changes can be made with microprogram changes alone. For semi-custom arrays, if the design enhancements are known in advance, the arrays can be partitioned to leave room for future macro additions or the macro functions could even be incorporated. As with bit-slice, the added capability is simply not accessed until required. If the design enhancements (evolution) are not known, but are anticipated to occur, the allowances for expansion may be anticipated. The designer may provide room for the design changes to be incorporated onto the older schematics, with additional vectors to be added to the existing simulations. The design is thus easily revised.
Tradeoffs The designer must evaluate all the items discussed in this chapter to select the best method of implementation for a specific circuit design. From there, the designer must further evaluate to find the best components available within the chosen category of implementation.
Exercises 1. To select a design approach, the following are questions that may need to be answered:
● What architecture does the design require?
● What flexibility can be allowed in the implementation?
● What package types are desired versus what package types are available?
● What operating environment (Commercial, Industrial or Military)?
● What cooling considerations have been made (heat sinks, air flow)?
● What is the required interface to the outside world?
● What is the required I/O mode (ECL, TTL, CMOS, MIXED ECL/TTL)?
● What power supplies are available (+5V, -5.2V, -4.5V, +5V with -4.5V or +5V with -5.2V)?
● How many of the required I/O signals are inputs?
● How many of the required I/O signals are outputs?
● How many of the required I/O signals are bidirectional?
● What type(s) of TTL: Totem pole, open collector or 3-stated?
● What type(s) of ECL: ECL 10K, ECL 100K, on-chip series termination, off-chip series termination, differential, open collector, Darlington, etc.?
● What about CML?
● What about CMOS?
● What are the physical size limitations imposed on the design?
● What word length is required for an adder, ALU, counter, sequencer?
● What instruction set or commands are to be supported?
● How big is the design (equivalent gates)?
● What is the intended maximum frequency of operation including I/O toggle frequency for the circuit, i.e., what are the performance requirements?
● How much design time has been allowed?
● What design support is available?
● How much debug time has been allowed?
● What debug support is available?
● What simulation support is available?
● What simulators?
● What timing verifiers?
● What about testability support?
● Are upgrades to the design planned and if so, how easily can a design be revised?
● What upgrades to the component series are planned?
● What are the possible time schedules for
  ❍ design review
  ❍ design to prototype
  ❍ prototype to production
● What are the overall cost limitations?
Review these questions. Catalog them as to design-specific, array-specific, component vendor-specific and workstation-specific. What other questions might need to be asked before a design implementation approach (semi-custom, full-custom, fixed components, bit-slice, or gate array) can be selected?
2. Review the latest issues of ASIC News and at least one other ASIC-related magazine.
a. Locate two articles on framework systems.
b. Locate two articles on HDL and VHDL.
c. Locate at least one survey on expected growth of demand for ASIC arrays: bipolar, BiCMOS, CMOS and GaAs.
Update 2000 When I first wrote this book, I had spent a considerable amount of time in the Bit-Slice and ASIC industry. The book reflected the design procedures at that moment. It was a reflection of the class I taught at AMCC and at UCSD for 11.5 years. From there I developed and taught the technical training classes for CBA (cell-based array) designs. Now, even CBA is beginning to fade as standard cells take over as the dominant array-based technology. As of January 2001, there have been some changes, although not as many as people would like to think. The basic design flow? - It is still with us.
● The size of the designs has changed from what could be handled by a human (up to 50,000 gates) to what must be handled by a computer (12 million gates). We are seeing designs that run 4-6 million gates per ASIC and will be seeing 10-12 million gate designs shortly.
● Wafers have gone from 3" to 6-8" and are headed for 12".
● From primarily bipolar, I now work almost exclusively with CMOS.
● The process technology has gone from 5 microns to 0.18 micron deep sub-micron, and everything bipolar designers worried about is now the headache that concerns CMOS designers, namely, that gate delays are practically negligible compared to interconnect delays. DSM (deep sub-micron) refers to this phenomenon.
● By 1998, CBA ASICs still ruled, but by 2000 standard cells had become dominant, producing smaller and faster designs.
● Libraries still exist but now macro selection is by a synthesis software package and 85% of designs are done with the Synopsys Design Compiler package. Cadence now has a competitive synthesis package.
● Schematic capture is pretty much an anachronism and designs are specified in Verilog or VHDL (RTL).
● Software tools take the design from RTL (register-transfer level) through wafer-verification, with software (like Dracula, Vampire) performing ATT (antenna checking), ERCs (electrical rules checking), DRCs (design rule checking).
● Manual operations are no longer feasible.
● RAM and ROM onboard an ASIC is no longer unusual.
● IP (intellectual property) blocks are in common use. DesignReuse is a buzzword. IP may be multi-layered and fixed (hard IP), or soft IP where a netlist is incorporated and the block may be altered by synthesis steps.
● Vectors are generated by automatic test vector generation software.
● Designs are made DFT (design for test) as a routine step in the compile process. DFT is no longer optional. Software can tell you if your design is testable and if it is routable.
● Place and Route are no longer a last step in the process.
● Floorplanners are required for any sizable design. Cadence's Gate Ensemble and Silicon Ensemble were the leaders in Place and Route. Synopsys has the Chip Architect floorplanner. Cadence has the Logical and Physical design planners. Avant! has Planit! for floorplanning tasks.
● Everybody's floorplanner has to talk to everybody's synthesis tool and they both have to talk to everybody's place&route software. Everybody has to talk to PrimeTime.
● What AMCC called "intermediate annotation" is what is produced during floorplanning and its use is not optional.
● EDIF became the standard for netlists. There is no more concern about whose netlist went where. EDIF is used to input to floorplanners, and EDIF is produced by the synthesis tools and by the place&route tool.
● DB has become a standard format.
● DEF (design exchange format - Cadence) became the standard for input to place&route.
● PDEF 2.0 is the standard output of the floorplanners and is now standard input to place&route software. PDEF 3.0 is on the horizon.
● SDF 2.0 is used for delay files and can be created by PrimeTime from delay information from most floorplanners and place&route tools.
● SPICE files are still with us.
● VCS, VSS and related tools perform simulation.
● GDSII is for building the basedie layers and GDSII is produced by the place&route software.
● PrimeTime is the standard for static timing analysis (85% of designs are verified with that software).
● Designers no longer have to "fit" their designs into a fixed-size chip with a fixed I/O count (cells in the I/O ring). Dies are designed to fit the design. Holes are punched through the layers to accommodate IP blocks (Hard IP blocks) and software exists to "stitch" the IP blocks into the basedie.
● You may not "diddle" with parametric specifications. If you have a different set of operating conditions, you must go back to the library vendor for new library specifications. The slightest variation can have dire consequences in the results.
● The axiom that the engineer who knew the library could do better designs than a more experienced engineer who did not, still holds. You may "direct" the use of macros by the synthesis tools.
● Tcl has become the standard interface scripting language.
In fact, Synopsys alone has approximately 42 different software tools available to help create an ASIC design. Design flow from RTL to wafer fab is the focus of most vendors today.
Structured Design Methodology
Introduction To The Overview The Structured Design Methodology, as developed here for the design of Bipolar, CMOS or BiCMOS logic arrays, applies to any array design effort regardless of technology or vendor. The designer who follows this methodology will ensure a smooth design flow between milestones that will help ensure a successful design the first time. The design flow is presented in this chapter at the introductory level. Following chapters will detail specific areas such as timing analysis, simulation and power computation.
Design Sequence - Pre-Capture The Structured Design Methodology stresses a certain design flow sequence of events, developed for use by the beginning array designer, the beginning user of an Engineering Workstation (EWS) or the designer experienced in both. Each step will be discussed in more detail after the design flow is fully outlined.
Circuit functional specification The circuit functional specification is the target specification; it describes what it is that is to be implemented on one or more arrays. This includes: a block diagram of the system or circuit, overall performance requirements, I/O interface, testability, environmental and packaging requirements. (See Table 2-1.) Once the functional specification identifies the need for more than one array, partitioning of the overall circuit modules to ensure proper boundary conditions must be made and then the functional specifications of the individual array circuits must be created. The specifications must be defined to be independent of each other to allow parallel circuit development. Note that there is no constraint at this point as to the product to be used beyond operating specifications. The technology of the array is defined by the performance requirements. As a basic guideline, high speed requires ECL bipolar, slower speeds and low power require CMOS, and moderate speeds and bipolar drive capability without the price of bipolar power dissipation require BiCMOS. Where the boundaries are is subjective and subject to continual evolution and change. Table 2-1 Components Of The Target Specification
Target Specification
● Block Diagram Showing Modules and Their Interface to Each Other and to the Rest of the System
● Functional Description of Modules
● Maximum Frequency of Operation
● Performance Requirements
● I/O Interface
● Environmental Requirements
● Physical Restrictions
● Power Restrictions
● Packaging Restrictions
Circuit hardware specification The circuit hardware specification is the planned hardware approach to satisfying the target functional specification. For multiple array designs, this may involve another level of specification, one specification for each circuit intended for a different array. This implies that project partitioning has been completed, and defines all required I/O and throughput performance. (See Table 2-2.)
Table 2-2 Components Of The Hardware Specification
Hardware Specification
● Selected Technology
● Potential Array Series
● Modules Detailed into Functional Sub-Modules
● Functional Description of Sub-Modules
● Functional Block Sizing - Cell Counts (Rough)
● I/O Interface Details - Cell Counts (Rough)
● Toggle Frequency for I/O
● Initial Packages
● Critical Path Throughput Estimates
● Power Estimates
A hardware architecture specification equates to PDL (program design language) for software. It identifies modules and closely defines how the modules will work together. HDL (hardware description language) and VHDL have been developed to formalize this specification. From this level of specification it is possible to estimate I/O signal requirements and internal cell utilization. At this point, the estimates are very rough and will only serve to allow a first cut at reducing the number of arrays that need to be considered. Some compromises or engineering tradeoffs may have been made, refining the functional specification.
Review of the available arrays The arrays available at the time of a design evaluation need to be reviewed using the outline in Table 2-3 as an initial basis of comparison.
Table 2-3 Array Checklist - Initial Review
Initial Review Checklist
● technology
● I/O resources - number of available I/O pads and pins
● internal density or equivalent gate limits
● I/O mode configurations including power supplies supported
● placement support, options
● power dissipation limits
● available packaging
● maximum operating frequencies
  ❍ internal toggle frequencies
  ❍ interface toggle frequencies
● design support:
  ❍ EWS libraries - the macros available
  ❍ Netlister - the macros available
  ❍ annotation support
  ❍ design-correctness software
  ❍ user-friendly interface with test
● turnaround time from design submission to wafer prototype
● cost
Figure 2-1 indicates the interdependencies between functional specification, hardware specification and the arrays. Figure 2-1 The Array Selection Process
This review must compare what is available with the circuit specifications and produce a list of the available arrays that could be used to support those specifications. As the number of potential arrays is reduced, preliminary implementations of some of the critical paths for the circuit, constructed from the macro libraries under consideration, should be evaluated.
Initial sizing of the circuit Before an array or array series has been chosen, estimate the size of the circuit or circuits to be placed on the array. Estimate the number of I/O connections, the types of I/O connections and the I/O cell count. The I/O cell count and the pad count may both be required. Estimate the internal cell count. (See Table 2-4.)
Table 2-4 Sizing Review
Initial Sizing Review
● Types of I/O Interface
● Number of Each Type
● Equivalent Gate Count or Internal Cell Utilization, Estimated by Cell Type
For standard functions, equivalent gate counts may exist that can be used in place of internal cell count to estimate the size of the internal array area that will be required. Internal cell counts are more useful than equivalent gate counts where the cells are more complex than one or two gates. Compare these estimates to the review of the arrays still under consideration and their I/O resources, their internal density and their maximum frequency of operation. Note that, at this stage in the design, the sizing estimates for the circuit may be off by a considerable margin. Historically, device cell utilization at the estimate stage of a design is 20-30% below the final value. Q2000 Series Approximate Equivalent Gate Size (Historical)
Internal cell utilization The first population checks can be made before the circuit is designed. Internal cell utilization is one of these checks. Internal cell utilization is the number of cells required by a circuit divided by the number of cells available.
Internal Cell Utilization = Number of Internal Cells Used / Number of Internal Cells Available
Macros that are suitable can be listed and a rough estimate of internal cell utilization computed. This step includes a review of the available macros in the various libraries with emphasis on the requirements of the specific circuit application. Reviewing the macros available allows a match to be made between functional macros that exist and what is required to implement the design in the least silicon for the highest performance. All other things being equal, the convenience of the macro library can be a decisive factor in the final array selection. Do the macros available support the circuit modules? Large macros may include adders, carry-look-ahead, comparators, up and down counters, universal registers, large multiplexors and decoders. Internal cell utilization should be 60-70% at the initial stages of sizing estimates to allow for expansion due to buffers, fan-out load distribution, path balancing or specification changes. The internal cell utilization limit for a completed design is array-specific. (See Table 2-5.) AMCC arrays have an upper limit of 95-100%.
Table 2-5 Internal Cell Utilization Limit
Preliminary Circuit: 60-70%
Final Circuit: 80-100%
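The utilization check described above can be sketched in a few lines of Python. This is only an illustration: the function names, the macro names and the cell counts are hypothetical, and the 70% default simply takes the upper end of the preliminary range quoted in the text. Real cell counts come from the vendor's macro library.

```python
def internal_cell_utilization(cells_used, cells_available):
    """Internal cell utilization = cells used / cells available."""
    if cells_available <= 0:
        raise ValueError("cells_available must be positive")
    return cells_used / cells_available


def leaves_expansion_room(utilization, preliminary_limit=0.70):
    """True when an early estimate stays at or under the preliminary limit."""
    return utilization <= preliminary_limit


# Hypothetical macro occurrence list: name -> (occurrences, cells per macro).
macro_occurrences = {
    "ADDER4": (8, 6),
    "MUX8": (12, 4),
    "DFF": (200, 2),
}
cells_used = sum(n * cells for n, cells in macro_occurrences.values())
utilization = internal_cell_utilization(cells_used, cells_available=800)
print(f"{cells_used} cells, {utilization:.0%} utilization, "
      f"room to grow: {leaves_expansion_room(utilization)}")
```

The same macro occurrence list can be reused later for the power estimate, which is why it is kept as a separate data structure here.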
Interface cell utilization The I/O requirements to the outside world are the second size determination. The array for a circuit must provide sufficient I/O capability to handle all signals, all other interface-placed circuit support such as three-state enable drivers, test enable controls and added power and ground pads to support simultaneously switching outputs (SSO) and high-speed inputs. As with internal cell utilization, only an estimate of final interface cell utilization can be made. The array should not use 100% of the I/O or the design will become I/O bound. Pad utilization, for cases where the I/O cells and pads are not one for one, must also be kept under 100%. A check on array symmetry should be made. The Q20000 Series arrays do not provide the same number of I/O cells in each array quadrant. This may affect placement and added power and ground usage. The Q24008 is not square and has variable power and ground bonding. Check for these and other variations that might affect allowable utilization of the I/O pads and cells.
Selection of the array series Integrate the hardware specification, the available arrays and the initial sizing estimates to select the target array series. The final choice is usually based on the performance - cost - availability - support matrix. In cases of equivalence between two or more array series, the final choice may be subjective. Package availability should be considered in the early decisions since customized packages, especially for large arrays, take months to develop. The specified performance and requirements for on-chip memory will assist in reducing the number of options. Only a limited number of arrays support on-chip memory, such as the QM1600T. CMOS and BiCMOS do not yet support designs operating at 300MHz (although individual macros can toggle at these speeds). High-speed bipolar arrays support paths operating over 1.4GHz.
Combine all of the information gathered to date and select one or more series for final evaluation.
Compute the path propagation delay Compute the path propagation delay for the most critical (time sensitive) paths in the circuit. Make adjustments to the schematic in terms of macro options for speed where needed. Does the estimated performance satisfy the specification?
Path Delay = Sum of Macro Intrinsic Delays + Sum of Macro Extrinsic Loading Delays
For the arrays that use typical specifications, be certain to use the correct multiplication factor (WCM) for this worst-case analysis. Review the assumptions made in establishing the multiplication factors and adjust them if these assumptions are not expected to be met (i.e., derate the performance by a higher factor). Some vendors call these multiplication factors "adjustment factors". Be clear as to what is being adjusted and why. There may be different multipliers for the different product grades, Commercial and Military, and for different power supplies within the product grade. The multiplier may depend on the macro type. Many arrays are specified without worst-case timing multipliers. They are specified with min/max ranges for each macro propagation delay. Maximum path delay is found using the MAX data although the conditions for a maximum propagation delay for an individual macro will vary. Minimum delays are found using the MIN data. Be certain that the proper fan-out loading and performance specifications are selected when doing this computation. Because of the high degree of variation in the way a library is documented between vendors and between array series from the same vendor, be certain that the rules regarding the methods of specifying timing delays for the macros for the array series selected are clearly understood. Internal extrinsic loading delays are composed of metal load (Lnet), electrical fan-out load, the sum of all loads driven (Lfo), wire-OR electrical loading if the array allows wire-ORs and if one was used in the net (Lwo) and the k-factors for each. The k-factors, expressed in ns/LU, convert the load units into time units. Table 2-7 shows the extrinsic load equations for internal nets as they are used by AMCC and other vendors. K-factors may be specified as tables, graphs, or broken down into parts for temperature, voltage and processing. Check with the specific vendor.
Will the array support the maximum frequency of operation and the critical path performance requirements?
Table 2-7 Components Of Path Delay - Internal Loading
General Equation for Internal Extrinsic Delay:
No wire-OR allowed: tex = knet * Lnet + kfo * Lfo
Wire-OR allowed: tex = knet * Lnet + kfo * Lfo + kwo * Lwo
Worst-case Internal Extrinsic Delay:
For Arrays with a Worst-Case Multiplier: texwc = WCM * tex
For Arrays with no Worst-Case Multiplier: tex is already worst-case
External extrinsic loading delays are composed of the system load capacitance and the package pin capacitance (Lcap) and the k-factor. The k-factor, expressed in ns/pF, converts the load capacitance into time units. The equations used by AMCC for this delay are listed in Table 2-8.
Table 2-8 Components Of Path Delay - External Loading
General Equation for External Extrinsic Delay: tex = kcap * Lcap
Worst-case External Extrinsic Delay:
For Arrays with a Worst-Case Multiplier: texwc = WCM * tex
For Arrays with no Worst-Case Multiplier: tex is already worst-case
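As a rough illustration of how the path delay equation and the extrinsic delay equations of Tables 2-7 and 2-8 combine, consider the Python sketch below. All names and numeric values are hypothetical. The sketch assumes the intrinsic delays supplied are already worst-case and applies WCM to the extrinsic terms only, as the tables show; whether the multiplier also scales intrinsic delays depends on the vendor and must be confirmed from the library documentation.

```python
def internal_extrinsic_delay(l_net, l_fo, k_net, k_fo, l_wo=0.0, k_wo=0.0):
    """tex = knet*Lnet + kfo*Lfo, plus kwo*Lwo when the net has a wire-OR."""
    return k_net * l_net + k_fo * l_fo + k_wo * l_wo


def external_extrinsic_delay(l_cap_pf, k_cap):
    """tex = kcap*Lcap for an output driving an off-chip capacitive load."""
    return k_cap * l_cap_pf


def path_delay(stages, wcm=1.0):
    """Sum intrinsic delays plus WCM-scaled extrinsic delays along one path.

    stages is a list of (intrinsic_ns, extrinsic_ns) pairs. Use wcm=1.0 for
    libraries whose numbers are already worst-case.
    """
    intrinsic = sum(t_in for t_in, _ in stages)
    extrinsic = sum(t_ex for _, t_ex in stages)
    return intrinsic + wcm * extrinsic


# Hypothetical three-stage path: two internal macros and one output macro.
# Loads are in load units (LU) or pF; k-factors are in ns/LU or ns/pF.
stages = [
    (0.40, internal_extrinsic_delay(l_net=4, l_fo=3, k_net=0.05, k_fo=0.10)),
    (0.60, internal_extrinsic_delay(l_net=2, l_fo=1, k_net=0.05, k_fo=0.10,
                                    l_wo=1, k_wo=0.20)),
    (1.20, external_extrinsic_delay(l_cap_pf=15, k_cap=0.02)),
]
print(f"worst-case path delay: {path_delay(stages, wcm=1.5):.2f} ns")
```

Keeping the per-stage pairs separate makes it easy to see which macro or net dominates the path before changing macro options for speed.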
Compute the estimated power Use the macro occurrence list compiled for cell utilization to compute power. Determine the worst-case current multipliers used by the array and what voltage variations will be used by the circuit for DC power computations. Review the AC power equation if AC power must be computed. ECL output macros use a termination current and that power element must be included with the DC power computation. Different technologies use different methods to compute power as seen by the examples in Table 2-9. Table 2-9 Example Technology Approaches To Power Computation - AMCC Arrays
● Bipolar (pre-Q20000) uses a current dissipation for each macro regardless of operating frequency (DC power only).
● CMOS uses internal and output macros and their operating frequency to find AC power dissipation.
● BiCMOS uses a combination of these techniques, DC power for bipolar interface macros and AC power for internal macros.
● Q20000 Series uses DC power for all macros and AC power computation for ECL inputs, Darlington outputs and all internal macros.
Some bipolar arrays have power-down capabilities that can reduce the current dissipated when macro output pins are not used (conditional geometry). Other arrays may have programmable overhead current. Before actual placement, an estimate of the overhead current will need to be used.
Are the estimated power and estimated maximum current acceptable for this design on this array? Actual DC power computations and maximum current checks are available through the MacroMatrix AMCCERC once the circuit has been captured on an AMCC-supported EWS or netlister. A worksheet is provided for AC power computation.
Compute maximum internal current A maximum internal current may be specified for bipolar arrays. It is possible for the total core current to be computed and compared to array limits. It does not guarantee that the design will later pass layout row current limits. If the circuit internal core current is high and the cell utilization is also high, and other placement constraints are required, then the placement process will be difficult and may be unsuccessful.
Before placement, a global check is used, verifying that the core as a whole can handle the current required by the macros. A more detailed bus-check, or row, half-row, and quadrant current check, can be made after placement for those arrays which require this type of checking. BiCMOS and CMOS arrays typically have no internal current limit. The development of three-layer metal arrays reduced the concern for this check for bipolar arrays as well, leaving the final control of the power used in the design to be a function of the ability to keep the junction temperature of the packaged part within limits.
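The global pre-placement check amounts to summing the internal current of every macro occurrence and comparing the total against the array's core limit. A minimal Python sketch follows; the macro names, per-macro currents and the limit are made-up values for illustration, not figures from any AMCC library, and the row, half-row and quadrant checks made after placement are not modeled.

```python
def total_core_current_ma(macro_counts, current_per_macro_ma):
    """Sum the internal current of every macro occurrence, in mA."""
    return sum(count * current_per_macro_ma[name]
               for name, count in macro_counts.items())


def passes_global_check(total_ma, core_limit_ma):
    """Global pre-placement check: total core current within the array limit."""
    return total_ma <= core_limit_ma


# Hypothetical macro counts and per-macro internal currents (mA).
macro_counts = {"NAND2": 300, "DFF": 120, "MUX4": 40}
current_per_macro_ma = {"NAND2": 0.5, "DFF": 1.5, "MUX4": 1.0}

total = total_core_current_ma(macro_counts, current_per_macro_ma)
print(f"core current {total:.0f} mA, "
      f"within limit: {passes_global_check(total, core_limit_ma=500.0)}")
```

Passing this check says nothing about local current density, so a design close to the limit with high cell utilization should still be expected to have placement difficulties.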
Make the final package selection Make the final package selection based on the array chosen and the estimated power. Refer to the Packaging Brochure from the chosen vendor. For packages with internal power and ground planes, the package selected will control the placement of added power and grounds if the use of package signal pins is to be avoided. A package must accommodate all signal pins required for the circuit plus any signal pins required by added power and grounds not placed to connect to the internal power/ground planes of the package. When a package has no internal bonding planes, the selected package signal pins must be sufficient to include all circuit signals and all added power and grounds. Review the array for any other pads that need package signal pins before making the package selection. The Q20000 Series arrays have four fixed pads, two for the thermal diode anode and cathode and two for the AC speed monitor. These array pads must reach external package signal pins, decreasing what is available for the circuit proper.
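The pin budget described above can be written out as a small calculation. The Python sketch below is an illustration only: the function and parameter names are hypothetical, and the example counts are invented except for the four fixed pads of the Q20000 Series mentioned in the text.

```python
def required_signal_pins(circuit_signals, added_power_ground,
                         on_internal_planes=0, fixed_pads=0):
    """Package signal pins needed for one array in one package.

    on_internal_planes is the number of added power/ground pads bonded to
    the package's internal planes; those do not consume signal pins. For a
    package with no internal planes, leave it at 0.
    """
    if not 0 <= on_internal_planes <= added_power_ground:
        raise ValueError("on_internal_planes must be between 0 and "
                         "added_power_ground")
    return (circuit_signals
            + (added_power_ground - on_internal_planes)
            + fixed_pads)


# Hypothetical circuit: 120 signals, 16 added power/ground pads, and the
# 4 fixed pads (thermal diode and AC speed monitor) of a Q20000 array.
with_planes = required_signal_pins(120, 16, on_internal_planes=12,
                                   fixed_pads=4)
without_planes = required_signal_pins(120, 16, fixed_pads=4)
print(with_planes, without_planes)
```

Comparing the two results against the signal pin counts in the vendor's packaging brochure shows quickly whether a package without internal planes is still an option.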
Compute the junction temperature Compute the estimated junction temperature based on the power dissipation, the packages available that meet specifications and the operating environment, including any heat sinking and air flow as specified in the functional specifications. If possible, several options should be evaluated. The allowed packages for an array should also have their thermal coefficients for junction-case (θjc) and junction-ambient (θja) specified. Tables or some other means of computing the coefficient for case-ambient (θca) as a function of the heatsink, the array, the package and airflow should also be provided. For most Military applications, Tc can be maintained at 125°C. For most Commercial applications, Ta can be maintained at 70°C.
Military: Tj = Pd * θjc + Tc
Commercial: Tj = Pd * θja + Ta, with θja = θjc + θca
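The two junction temperature equations translate directly into code. In the Python sketch below the function names and the example power and thermal coefficient values are hypothetical; the default case and ambient temperatures are the 125 and 70 degree figures quoted in the text. Power is in watts and the thermal coefficients are in degrees C per watt.

```python
def junction_temp_military(pd_w, theta_jc, tc=125.0):
    """Tj = Pd * theta_jc + Tc, with the case temperature held at Tc."""
    return pd_w * theta_jc + tc


def junction_temp_commercial(pd_w, theta_jc, theta_ca, ta=70.0):
    """Tj = Pd * theta_ja + Ta, where theta_ja = theta_jc + theta_ca."""
    theta_ja = theta_jc + theta_ca
    return pd_w * theta_ja + ta


# Hypothetical 2 W design in a package with theta_jc = 5 C/W, evaluated
# for two heat sink and airflow options (theta_ca = 15 and 8 C/W).
for theta_ca in (15.0, 8.0):
    tj = junction_temp_commercial(2.0, theta_jc=5.0, theta_ca=theta_ca)
    print(f"theta_ca={theta_ca}: Tj={tj} C")
print(f"military: Tj={junction_temp_military(2.0, theta_jc=5.0)} C")
```

Running the calculation for several package and cooling options, as the text suggests, is just a matter of looping over the candidate coefficient values.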
With the completion of both timing and power analysis, changes in macro options, or optional functions within the circuit can be evaluated and the speed-power curve managed before full schematic capture and simulation have been performed.
Optional - Bonding diagram (custom bonding), Pinout request As an option, a bonding diagram (pin out) request can be submitted to the vendor for approval. Both pin out requests and placement requests can be initiated by the designer and both must be approved by the vendor after layout and Back-Annotation evaluation.
Review the design submission requirements Review the requirements for the array series design submission as specified by the vendor.
● Are schematics required?
● What schematic format is required by the vendor?
● What simulation must be run and submitted?
● What other procedures are requested by the vendor?
Clarify what is to be done to actually perform a design submission to your vendor.
Pre-Simulation Steps Once an array or array series has been selected, the design can be captured and all checking performed with packaged or vendor software. For non-schematic designs, the steps leading to the netlist are performed per the system requirements. Once a netlist exists, the design steps are the same.
Perform schematic capture through netlist generation Perform the schematic capture using the Dazix, Mentor, Valid or other EWS (Engineering Workstation) system, or LASAR 6, Verilog or another netlister equipped with schematic-generation software. Perform the schematic capture following vendor schematic rules and conventions. Perform the vendor-software steps through netlist generation. Each workstation has a different netlist format and a different procedure to generate it. Each workstation has its own simulator that uses the workstation-specific netlist as an input file. LASAR 6 (Vax/VMS) and Verilog each have a specific netlist format. Communication of a design from the design workstation to a vendor must be done using a netlist the array vendor can recognize. (See Figure 2-2.) In the 90s, an array vendor was limited to accepting only those designs created on a workstation that matched the equipment that the vendor had in-house. Most design input today is done without schematic capture. Cadence Composer can handle schematics. Design Compiler from Synopsys will display a schematic after synthesis (best used at the module level). Today's engineers use a Verilog or VHDL netlist to input a circuit description. Design Compiler produces a Verilog netlist, an EDIF netlist and a Synopsys .db formatted file for design transfer. Figure 2-2 Netlister Confusion
Another solution is the use of a dial-up design system based on a mainframe. The array vendor provides the account access for a fee and provides all required support and the designer provides an acceptable terminal. The problem is the access to a compatible terminal when a graphics terminal is required and the costs of the design in connect time. To combat the problem of multiple formats without moving to a dial-up solution, netlist reformatters or translation programs have been written. AMCC has a netlist formatter, AGIF, which is customized to each supported workstation and netlister. The AMCC Generic Interface Format file produced is called circuit.sdi and it is the means of communication between the customer and all AMCC software, including the MacroMatrix components: AMCCERC, AMCCANN, AMCCVRC, AMCCSIMFMT, AMCCSUBMIT and AMCCAD for placement.
Perform design rules checking For systems and vendors without software support or with support that is less than complete, the design checks must be performed manually. EWS-based checking provided by the EWS vendor is minimal and should only be used as a first step in the validation process. Intelligent checkers are evolving. These may be interactive with a schematic capture or work on the standardized netlist. The checker must be successfully completed before proceeding. Remove all errors if possible, and document those that remain. The vendor may require a waiver before submission if errors are not removed from the circuit. AMCC customers must run AMCCERC and remove errors. The program output, AMCCERC.LST, provides reports on population, I/O types and mixes, utilization, package signal pin requirements, DC power, internal pin count, and SSO power-ground evaluation while listing naming violations, unconnected pins, pin connect violations, fan-out loading violations with derated loads, and technology (array, power-supply, and macro mismatch) errors. AMCCERC.LST must be included with the design submission package.
Generate extrinsic load time delays (Annotation) The need for annotation software came from the change in the ratio between the delays caused by the interconnect between macros and the macro internal (intrinsic) delays. Once it was common for an interconnect net delay to exceed one half of the intrinsic delay, or even to exceed the intrinsic delay, it became necessary to produce a reasonable estimate of the interconnect delay. Figure 2-3 Schematic And Netlist Paths Into AMCC
By 2000, the netlist standard had become EDIF. Verilog, VHDL, EDIF, db, and PDEF are transfer standards now. Front-Annotation is the term used for pre-placement, pre-route interconnect delay estimation. The estimate is based on the net size, number of fan-out loads, both physical and electrical, or the capacitive load on an output macro. The Front-Annotation programs such as AMCCANN compute the fan-out loading delay, the wire-OR loading delay, and provide an estimate of the metal etch delay due to the size of the nets. The estimate is based on a statistical evaluation of previously built circuits and the average etch length used to connect same-sized nets. It is too large some of the time and too small at other times. Front-Annotation is not a specification. Where Intermediate-Annotation is available (a Manhattan-Distance algorithm based on a placement file), it should be used. It is more accurate in more cases but it is still an estimate. Only after place and route can the actual metal etch delays be known. Annotation after place and route is called Back-Annotation.
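As an illustration of the Intermediate-Annotation idea, the following Python sketch estimates the etch length of one net from placement coordinates and converts it to a delay. It is only a sketch under stated assumptions: the star model (summing driver-to-load Manhattan distances), the function names, the grid coordinates and both delay coefficients are hypothetical and do not represent the algorithm or data of AMCCANN or any other vendor tool.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) placement coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def estimate_net_delay(driver, loads, ns_per_unit=0.02, ns_per_load=0.05):
    """Estimate the extrinsic delay (ns) of one net from placement data.

    Assumption: etch length is the sum of driver-to-load Manhattan
    distances (a star model). Both coefficients are made-up values
    chosen only for illustration.
    """
    etch_length = sum(manhattan(driver, load) for load in loads)
    etch_delay = etch_length * ns_per_unit      # metal etch contribution
    fanout_delay = len(loads) * ns_per_load     # fan-out loading contribution
    return etch_delay + fanout_delay


# Hypothetical placement: one driving macro and three loads on a cell grid.
driver = (2, 3)
loads = [(5, 3), (2, 10), (8, 7)]
print(round(estimate_net_delay(driver, loads), 3))
```

Like any pre-route estimate, the result depends entirely on the coefficients supplied; only Back-Annotation reflects the routed metal.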
Perform testability analysis on the circuit. All testability measures have one common goal: to enhance controllability and observability of the circuit. It is a grade on the logic design itself. Controllability is a measure of the ease in setting a particular node to a logic level of zero or one, while observability determines the ease of propagating the node's state to one or more primary outputs. After a netlist has been created and logic simulation has verified correct functional performance, testability can be verified by running testability analysis programs. This optional step is highly recommended if there is software available to perform the analysis. For a modular design, a manual review should be performed if there is no software support. The purpose is to identify those parts of the circuit that are difficult or impossible to reach by way of primary inputs (controllability), and those parts of the circuit that may change state but that are difficult or impossible to observe at a primary output (observability). Steps should be taken to make hard to reach nodes controllable by adding test control signals and degating logic. Make hard to observe nodes observable by adding test points. Make any adjustments or changes to the schematic as necessary to improve testability to acceptable limits. Changing the schematic will mean repeating the error-checking and annotation software steps.
Testability analysis should be done before simulation since the result will be to simplify the functional simulation vector set development.
Simulation Once the circuit has been checked for design rule violations, sizing, power, package fit, optimization, functionality and other non-simulation dependent checking, the simulations required by the array vendor may be performed. There are several types of simulations: functional (all etch is connected without SA0, SA1 faults); at-speed (the array-implemented design runs at the specified maximum operating frequency of the circuit); AC test (path propagation delay) and parametric (VIH, VIL). The array vendor may specify the simulations, formats required, and vector rules to be followed.
Modular simulations - Debug only During logical debug of the original design it is better to simulate modular segments of the circuit, verifying basic logical operation and debugging the immediately obvious design errors. Once the circuit is considered to be a successful logical design, then perform the functional simulation that will form the basis of the test vectors submitted with the design. Multiple fragmented functional simulations cannot be submitted.
Functional simulation The object of functional testing is to detect a single SA1 (stuck-at-1) or SA0 (stuck-at-0) fault in the circuit if one exists. This ideally requires sufficient vectors to "cover" all possible SA1 and SA0 fault locations. The percentage of coverage is the fault grade of the vector set. For a high fault-grade score (95% and up), other types of circuit failures are assumed to be "covered". Functional test vectors are initially created from the functional simulation sampled results file. Functional simulations are run using Front-Annotation or Intermediate-Annotation with timing checks enabled. They are re-executed when Back-Annotation is available. It is the Back-Annotated simulation result file that goes to test. The functional vector set for a circuit should detect any single fault occurring on a single path. In theory, triple faults, odd faults of 5, etc., per path are covered by the vectors detecting single faults provided the faults do not mask each other. Even-numbered sets of faults on a path (double faults, quad faults, etc.) are assumed to mask each other and not to be detectable. The probability of multiple faults on a path is significantly less than the probability of a single fault. (Multiple faults that signal a catastrophic failure are detected within the basic wafer screening.) Figure 2-4a Simulations - Types And Forms
                 SAMPLED          PRINT ON CHANGE
                 MAX     MIN      MAX     MIN
FUNCTIONAL       X       X
AT-SPEED         X       X        X       X
AC TEST          X       X        X       X
PARAMETRIC       X       X
Figure 2-4b Circuit Simulation Requirements
Redundant circuit logic will cause some faults to be masked (prevent their detection) and should be avoided. Where redundancy is desired for other reasons, the designer should add test points to make masked faults visible. One extreme approach used to develop functional vectors is to cycle all inputs and outputs through all combinations of 1-0 and 0-1 transitions as a first check after initialization. (Theoretically, this should cycle all internal nodes in a combinatorial circuit as well.) This 2^n (where n = number of inputs) brute force approach is not necessary. Minimum vector test sets and minimum vector test sequences will cover 100% of all observable faults. A fault cannot be detected by any test methodology if it is a masked fault. A masked fault cannot be seen at a primary output due to redundancy in the logic. Logic minimization is therefore a requirement if high fault grade scores are desired.
The functional simulation vectors may have been developed for an earlier technology version of the array circuit or may be developed from scratch. They need to be constructed in pages (AMCC uses a 4K or a 128K page depending on the tester), begin with initialization of the array, and initialize periodically within the page between test modules. Begin by initializing every I/O pin (preferred initialization is within 25-100 simulation steps, depending on array size). Proceed to "home" the circuit. For testability, a master reset or master set is desirable since it will allow a circuit to be placed in a known state quickly. For circuits that combine reset or set with non-resettable logic, the flip-flops and latches that are not cleared by the set or reset should be initialized after the set or reset has executed and the components have settled. A circuit will need to be placed in a known state between groups of tests, at tester page boundaries and before any long or complex test.
Functional simulation execution Functional simulations must be done for the maximum and minimum worst-case timing and are sampled with a step long enough to ensure that all changes caused by the controlling data or clock signal have stabilized. (AMCC uses a step of 100ns.) The rule of thumb is to measure the longest path in the circuit, compute its worst-case maximum time delay, add 50ns and round to the nearest 100. For BiCMOS and bipolar arrays, 100ns is more than adequate. The 100ns step size equates to 50MHz, the limit for the SENTRY tester. Different vendors may specify different step size approaches but the necessity of all signals being stable will remain. Functional simulation results for the maximum and minimum libraries should be compared as a check on hazards and races. The results for the minimum library should match those obtained with the maximum library. If they do not match, stop and evaluate why they do not.
Table 2-10 Functional Simulations
minimum worst-case: sampled
maximum worst-case: sampled
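The sampling-step rule of thumb for functional simulation (worst-case maximum delay of the longest path, plus 50ns, rounded to the nearest 100) can be written out as a small Python sketch. The function name and its parameters are hypothetical, and the half-up rounding is an assumption about how "nearest 100" is meant; a given vendor may specify the step differently.

```python
import math


def sample_step_ns(longest_path_max_delay_ns, margin_ns=50, grid_ns=100):
    """Sampling step (ns) from the rule of thumb for functional simulation.

    Adds the settling margin to the worst-case maximum delay of the
    longest path, then rounds half up to the nearest multiple of grid_ns.
    """
    padded = longest_path_max_delay_ns + margin_ns
    return math.floor(padded / grid_ns + 0.5) * grid_ns


# Hypothetical longest-path delays in ns.
for delay in (38, 120, 230):
    print(delay, "->", sample_step_ns(delay))
```

Because 50ns is added before rounding to the nearest 100, the resulting step is never shorter than the longest path delay itself.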
Simulation outputs Each simulator produces a data file or a list file that represents the signals the designer specified and the time step at which they were recorded. Most provide a waveform of the results as well. The formats of these output files are not standardized. To submit them to the array vendor, some reformatting must take place.
Reformatting simulation outputs To allow vector format checking and to simplify test transfer, AMCC developed the AMCCSIMFMT (AMCC simulation format) program. It reformats the output of the logical simulator into a form acceptable to AMCC test and to programs that need to read the files. (This standard format allows the simulation sampled output file to be used as a data input file to other software.) Each supported EWS and netlister has a unique AMCCSIMFMT program.
Vector Rules Checking The AMCC Vector Rules Checker (AMCCVRC) can be run against any AMCCSIMFMT (AMCC Simulation Format) sampled simulation output file from any simulator. AMCCVRC will issue a count of the number of test vectors and simulation vectors for the particular file being scanned. AMCCVRC will check for missing primary I/O signals and missing 3-state or bidirectional enable internal signals. It will identify differential signals, verify that related clock and data signals do not change in the same vector (race conditions for the tester), check the number of simultaneously switching outputs per vector against some established limit, look for internal signals that should not be present, and print a summary of warnings and errors. Figure 2-5 Using A Formatter For Simulation Output
AMCCVRC will also identify primary signals that did not change in both directions during the vector set (toggle test). It produces a report and error listing called AMCCVRC.LST that is a required part of the design submission package. Every maximum worst-case functional simulation file must be processed through AMCCSIMFMT and AMCCVRC.
simulator output ----> AMCCSIMFMT formatter ----> AMCCVRC ----> AMCCVRC.LST report
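Two of the vector rules described above, the toggle test and the check that related clock and data signals do not change in the same vector, can be sketched in Python. This is a minimal illustration only: the vector representation (a list of dictionaries mapping signal names to 0 or 1), the function names and the sample signals are all hypothetical and are not the AMCCSIMFMT file format or the AMCCVRC implementation.

```python
def untoggled_signals(vectors):
    """Return signals that did not make both a 0->1 and a 1->0 transition."""
    rose, fell = set(), set()
    for prev, cur in zip(vectors, vectors[1:]):
        for name, value in cur.items():
            if prev[name] == 0 and value == 1:
                rose.add(name)
            elif prev[name] == 1 and value == 0:
                fell.add(name)
    return sorted(set(vectors[0]) - (rose & fell))


def clock_data_races(vectors, clock, data):
    """Return indices of vectors in which clock and data both change."""
    return [i for i in range(1, len(vectors))
            if vectors[i][clock] != vectors[i - 1][clock]
            and vectors[i][data] != vectors[i - 1][data]]


# Hypothetical sampled vectors for a single flip-flop.
vectors = [
    {"CLK": 0, "D": 0, "Q": 0},
    {"CLK": 0, "D": 1, "Q": 0},
    {"CLK": 1, "D": 1, "Q": 1},
    {"CLK": 0, "D": 0, "Q": 1},
]
print(untoggled_signals(vectors))             # Q never falls in this set
print(clock_data_races(vectors, "CLK", "D"))  # CLK and D both change in vector 3
```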
Fault grading There are fault grading programs that score the vectors as to percent faults covered. There are a number of fault-grading packages appearing on the workstations and on mainframes. Fault-grading is used to verify that the simulation bit vectors sufficiently exercise nodes within the circuit to assure that the outgoing product matches the customer specification. If an array vendor does not support a particular package, it is likely that the software will give misleading fault grade scores. Fault grade scores depend on the modeling approach used as well as the vectors themselves. Most fault-graders need a file or support program to reduce errors due to global ground not switching, VCC, VSS or VDD not switching, or a terminated output not switching and other, similar exceptions. Insufficient fault coverage as determined in a fault grading analysis may require the addition of vectors to the graded set. Functional simulation vector fault-grading can be performed at AMCC using the LASAR 6 simulator. AMCC looks for scores based on the interconnect nets and not on the internal macro component interconnect links. MSI macro modeling (and whether the macro is hard or soft) will affect fault grade scores. AMCC recommends the creation of enough vectors to achieve a fault coverage of 90% or higher.
simulation stimuli and netlist ----> fault-grader ----> report grade
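To make the idea of a fault grade concrete, the following Python sketch grades vector sets against single stuck-at faults on the nets of a tiny hypothetical circuit, y = (a AND b) OR c. Every name in it is an assumption made for illustration; it is a toy serial fault simulator, not a model of LASAR or of any commercial fault-grader.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "y"]  # interconnect nets of the toy circuit


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; fault is (net, stuck_value) or None."""
    def force(net, value):
        # Override the net value when this net carries the injected fault.
        return fault[1] if fault and fault[0] == net else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def fault_grade(vectors):
    """Percentage of single stuck-at faults detected at the output y."""
    faults = list(product(NETS, (0, 1)))  # SA0 and SA1 on every net
    detected = [f for f in faults
                if any(simulate(*v, fault=f) != simulate(*v) for v in vectors)]
    return 100.0 * len(detected) / len(faults)


print(fault_grade([(1, 1, 0)]))
print(fault_grade([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]))
```

In this toy case four well-chosen vectors detect every net fault, while the full exhaustive set would need eight.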
At-speed simulation In addition to functional simulation, the designer must perform some at-speed verification of circuit operation. One method is to perform a simulation that is executed at the specified maximum frequency of operation of the circuit with timing checks enabled. At the minimum, these vectors should cover the critical performance paths of the circuit and may cover the entire circuit. The at-speed simulations are run using Front-Annotation. The Front-Annotation results are not to be considered to be a specification of the final results. The at-speed simulation is re-executed when Back-Annotation files are available. For conventionally specified array series, at-speed timing analysis is done with the worst-case military or commercial (maximum) and with the minimum library. At-speed simulations are run with the print on change option for the simulator (print_on_change, -c, list -change, etc.), monitoring the same signals monitored by the functional simulation. Because these are complex to evaluate, they are also performed in the sampled mode. They are run using the maximum library and the minimum library.
Table 2-11 At-Speed Simulations
minimum worst-case: sampled, print_on_change
maximum worst-case: sampled, print_on_change
Timing Verifiers - An at-speed option If they are supported by the array vendor, timing verifiers can be substituted for at-speed simulation. Not all timing verifiers are supported by the array vendors even if the corresponding simulators are supported. (The Valid timing verifier is the only one currently supported by AMCC and then only with certain libraries.) Check with the array vendor. Verifiers can run min-max analysis against either the maximum or minimum delay library. The min-max spread is the process, temperature, and voltage variation for the library and is about 10-40%, as specified by the vendor. This type of analysis can highlight spikes, ambiguity on clock paths, and marginal timing performance.
Supported and non-supported EWS features Timing verifiers emphasize the need to communicate clearly with the array vendors. When evaluating an EWS or netlist purchase, consult the intended array vendors for a list of systems and system features that the target libraries support before committing to a design approach. The EWS system may have software for which the vendor has not created models, rendering that software useless without extensive further development. There is a growing pool of independent workstation tool suppliers. For these packages, the array vendor must also be consulted before assuming that they can be used. Some of them alter the netlist that the vendor may be using as input to the layout system, destroying the circuit interface. Always refer to allowed equipment and EWS configuration supplied by the target array vendors. Consult with them before starting a purchase or a design.
Create the AC test simulation vectors - Optional AC tests are optional and may be written to check either propagation path delay in a non-memory path or external set-up and hold time for memory elements. Both rising and falling edges should be checked. AC test simulations may be concatenated into one simulation file provided clear documentation of the start and stop time addresses is provided. Each test (one pair of input-output pads, one edge direction) must initialize the circuit so that the test can be performed, provide the stimuli and run until the effect of the stimuli is seen at the circuit output. AC test simulations are run using the maximum and then the minimum library. In each case, run once for sampled results and once for print on change. AMCC performs only path propagation delay AC tests. For older AMCC arrays, there is a limit of 20 tests over 10 paths, with bus lines handled as multiple paths. AMCCVRC is used by AMCC customers to screen AC Test simulation vectors.
Table 2-12 AC-Test Simulations
minimum worst-case: sampled, print_on_change
maximum worst-case: sampled, print_on_change
AC Speed Monitor The AC speed monitor that AMCC built into the Q20000 Series base arrays removes the requirement for customer-generated AC test simulation vectors. This on-chip device will be added to all future arrays. The basis of the AC monitor is a 9-stage ring oscillator followed by a 2-stage divide by 4 counter. Each stage uses 100 mils of second and third layer metal to evaluate metal loading. The accuracy of the counter is 0.005% up to 100MHz. The AC speed monitor uses two pads, a power supply pin and the output pad, that are bonded out to external package pins. (See Figure 2-6.) Figure 2-6 AC Speed Monitor - Q20000 Series Arrays
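The relationship between stage delay and the frequency seen at the monitor output can be sketched with the standard ring-oscillator relation, f = 1 / (2 x N x stage delay), followed by the divider. In the Python sketch below the function name and the stage delay values are hypothetical; no actual Q20000 Series stage delay is implied.

```python
def monitor_output_mhz(stage_delay_ns, stages=9, divide_by=4):
    """Output frequency (MHz) of an N-stage ring oscillator after a divider.

    One oscillator period is two trips around the ring, so the period is
    2 * stages * stage_delay. The stage delay includes the metal loading.
    """
    period_ns = 2 * stages * stage_delay_ns
    oscillator_mhz = 1000.0 / period_ns
    return oscillator_mhz / divide_by


# Hypothetical loaded stage delays in ns: slower stages lower the output frequency.
for delay in (0.5, 1.0):
    print(delay, "ns ->", round(monitor_output_mhz(delay), 2), "MHz")
```

Measuring the output frequency on the bonded-out pad therefore gives a direct reading of how fast or slow a particular die is.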
Parametric testing - Optional Parametric testing for VIH, VIL is optional. There are several different methods of setting up a parametric simulation. One approach is the use of a parametric gate tree, where all circuit inputs (clocks and set and reset included) are treed by NOR, AND or OR gates (SSI logic) to a single output. The cost is the number of internal cells needed to implement the gate tree, one output and an added load on the primary input signals. The vectors are the minimal test sequence (100% fault coverage) for that gate tree. A minimal sequence changes one input per vector and the output toggles every vector. Every input is switched from 1-0 and from 0-1, one by one. The parametric vector set is combined with the functional simulation vector set for fault-grading. Parametric simulation is run once, using the maximum worst-case library and a sampled output. The vendor may require that the minimum simulation also be run. AMCCVRC can be run to check parametric testing simulations.
Table 2-13 Parametric Simulations
minimum worst-case: sampled
maximum worst-case: sampled
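A vector sequence with the properties described for the parametric gate tree (one input changes per vector, the output toggles every vector, every input makes both a 1-0 and a 0-1 transition) can be sketched in Python for an AND tree. The function names are hypothetical, and the walking-zero pattern is one sequence that satisfies those properties, not necessarily the one a vendor would require.

```python
def gate_tree_vectors(n_inputs):
    """Walking-zero sequence for an n-input AND tree.

    Starts with all inputs at 1, then takes each input to 0 and back
    to 1 in turn, so exactly one input changes between vectors.
    """
    current = [1] * n_inputs
    vectors = [tuple(current)]
    for i in range(n_inputs):
        current[i] = 0
        vectors.append(tuple(current))
        current[i] = 1
        vectors.append(tuple(current))
    return vectors


def and_tree(inputs):
    """Single output of the AND gate tree."""
    return int(all(inputs))


vectors = gate_tree_vectors(3)
outputs = [and_tree(v) for v in vectors]
print(len(vectors), outputs)
```

The sequence grows as 2n + 1 vectors for n inputs, compared with 2^n for cycling through every input combination.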
The Design Submission Through Prototype Complete the design validation review Once the simulations are completed, the entire circuit package should be reviewed for completeness. This is a preliminary design acceptance review. AMCC currently provides a Design Validation form that identifies areas that are characteristically problems in a design submission, a list of items that previous design submissions had in error. After AMCCERC errors, timing check errors and AMCCVRC errors have been resolved, the designer should work through the checks in the Design Validation section of the design manual. These are checks that have not yet been or can never be automated. The questionnaire is incorporated into AMCCSUBMIT, a design submission program that queries the designer for file names and conditions, and produces reports for use in design submission. For array vendors without such a list or automated support, review the submission procedures and the array design rules.
Complete the design submission checklist The array vendor will have a submission procedure, a list of the files and documents required for submission before the vendor can accept the design and proceed to layout. Check off the required items as they are assembled into the package. A design cannot be submitted without the completion of the required items. Optional items must be complete if the option is chosen. Last of all, make sure some media index exists in both media and hardcopy form that identifies what is in the submission package. AMCC has created a generic design submission form as a first step in creating a user-interface automation of the design submission - design validation process. It and its accompanying document detail what is required to be submitted in hardcopy and what is to be submitted on media (disk or tape). (The first version of the form is now part of AMCCSUBMIT.) AMCCSUBMIT generates a report that alerts the reviewer to problem areas in the design or the design submission package.
Submit the circuit - acceptance design review When the design submission package is complete, submit it. The vendor will review the package for completeness and correctness. The submission will include all pre-approved design waivers negotiated before submission.
● If everything has been done according to the vendor's rules, the design will be accepted and move into its proposed schedule.
● If only minor things are missing or incorrect, the array vendor may make the changes under the designer's approval.
● If serious errors or omissions exist, the design may be rejected, i.e., returned to the designer with instructions on what is missing or incorrect.
Implementation Engineering - the Array Vendor On design acceptance, the Implementation Engineer assigned to the design will rerun all simulations against the in-house library. The object is to identify any macro design changes implemented after the library release date that would affect the design under review. A second objective is to verify that the circuit used on the schematics, netlist and simulations all match since it is easy to violate file consistency.
Placement After processing by Implementation Engineering, the circuit will be submitted for layout. Preplacement requests that were approved by the array vendor are input to the layout system in this phase. For customers who wish a particular package pin-out, a specific pad placement may be required. Vendors attempt to honor these requests if they do not violate other placement criteria. Placement restrictions may be I/O mode and package specific. They may be driven by the type of macro, such as the dual-cell differentials. They may be driven by MSI (multiple-cell) placement requirements, whether these are hard or soft macros. Timing specifications and clock distribution requirements are another factor as are the particular restrictions induced by simultaneously switching outputs (SSO). Packages that use internal power and ground planes may restrict where added power and ground macros are placed, and this may conflict with the SSO requirements. All of these factors must be reviewed before approving a placement. A placement is not usually considered final until after routing and then only after the at-speed Back-Annotated simulation is approved by the customer. On the average, a circuit requires a first-pass placement (90-95% auto-placement is the goal) and some adjustments in a second pass.
- Intermediate Annotation Some vendors may have an Intermediate-Annotation software package capable of providing Manhattan-Distance algorithm-based Intermediate Annotation delay files. They allow simulations to be performed with time delay data that is much closer to reality than the generic, "every-same-sized-net-is-the-same-length" Front-Annotation software. For a circuit where the technology is being pushed to the limit, and Back-Annotation will take more than a week to obtain, it might be a good idea to run Intermediate-Annotation simulations. They are still not accurate enough to be treated as a specification, but they could identify gross errors in placement that could be corrected before routing.
Routing Routing is the longer process. For circuits meeting the internal pin count and cell utilization limits for the array, 95% of the nets can usually be routed automatically. The last few are closed by a human operator at a graphics interface terminal. Some array vendors will not accept an array that cannot be 95% autorouted. As a guideline, AMCCERC will report warnings for those circuits exceeding recommended internal cell utilization limits and recommended internal pin count limits. It will report an error for those circuits that exceed the limits so far as to be considered impossible to route. It cannot cross-check package pad-pin requirements or make any assumptions about the physical location of the macros.
- Back-Annotation After layout, the Back-Annotation delay files are available to the designer to rerun the logical and at-speed simulations, plus any of the optional simulations originally submitted. These files provide the actual metal lengths in the circuit nets as opposed to the estimated metal length, and the actual (as far as is known) package pin capacitance for the output nets. Critical paths must be checked with this data. At this point, a failure to meet specification timing requirements by a small amount may be correctable with a layout adjustment or a routing change. Serious failures may signal the need to re-design. AMCC and most vendors guarantee the maximum worst-case Back-Annotation at-speed simulation, i.e., guarantee that the silicon will not be slower than the results. The careful evaluation of the critical paths early in the design phase, the proper derating of the fan-out loading, careful selection of the macros and the options for those macros, preplacement for critical and sensitive paths (balanced against the placement restrictions and rules for the array), and the careful simulation and timing validation before layout, will all help ensure a successful design experience. Re-simulation and timing validation after layout (place and route) help ensure a successful wafer.
Prototyping After the Back-Annotated simulations are approved by the customer, the vendor can proceed to produce prototypes.
Array Design Acceptance After prototyping and testing per the testing specification supplied at design submission, including the functional vectors, the customer would perform the final array acceptance as desired. At this point, full fabrication of the final product can begin.
Functional Specification - A Closer Look The functional or target specification is the first level of description of the project that may encompass one or more arrays when the design is partitioned. There may be a specification tree with the total project at the top node and individual circuit blocks or modules detailed underneath. Topics included in a functional specification are listed in Table 3-1. At this early stage, a functional description of what is to be accomplished is created along with some of the top-level circuit requirements.
Array Interfacing For the partitioned project (multiple arrays), the individual array specifications would include a description of array interfacing. Interconnection between arrays is faster when done with ECL. When choosing single or dual (differential) rail ECL use the following guidelines:
● If the arrays will be placed on the same board and will be adjacent to each other, single rail (non-differential) ECL may be acceptable.
● If the arrays will communicate across a backplane or be remote on the board, differential ECL may be required.
● Differential ECL is required if the operating speeds exceed the maximum frequency specifications for single rail ECL.
The potential need for differential ECL should be indicated at the functional specification level.
Partitioned circuits should attempt to balance the distribution of I/O and internal cell usage between the different arrays while maintaining critical paths within one array if possible. This is still the rule to follow - no matter how big the arrays get. It is also a good guideline for how to break up a 6-8 million gate array into top-level blocks - keep the critical paths inside the block if possible. Interblock connections today are what interarray connections were yesterday.
Table 3-1 Components of The Functional Specification
Functional Specification
Block diagram to the module level - including any partitioning into more than one array
Description of the boundaries between the modules and the rest of the system
Initial sizing of the I/O interface by type - ECL, TTL, etc.
Functional description of the modules
Description of the interface between the circuit modules - busses, control, critical interconnects
The overall performance requirements
--- the maximum frequency of operation
--- target clock speed (per clock)
--- path propagation delay requirements set by modules external to this design
I/O toggle rates
Synchronous/asynchronous signals
Allowed or available power supplies
Power restrictions
Physical size restrictions
Environmental requirements - Commercial, Military, Industrial, Other
Packaging requirements
Derating for junction temperature
Prioritized design objectives
Hard Specifications Design criteria that are considered as hard (inflexible) specifications should be clearly documented as such. Specifications that might be alterable should also be clearly identified. If a tradeoff or judgment call needs to be made during the remainder of the design project, such information can save time and possibly the project.
Design Objectives Overall design objectives should be clearly identified and documented. These include optimization for speed, power or die size, which translates to minimized internal cell utilization and minimized I/O utilization. Since these objectives are in conflict, they should be prioritized. As a last step, there should be a careful design review of the circuit and system functional specifications, and the partitioning.
Review the Available Arrays With a clear understanding of the design description and overall objectives, review the arrays currently available that could be used. For a listing of currently-available array series, check the latest ASIC vendor surveys run by several of the engineering magazines. These buyer's guides provide a cursory look at what is available and allow a first-pass sort of available arrays into feasible and non-feasible, a starting point from which the designer can proceed. They have limited space to review technology, die size, cell counts, metal layers, number of macros, interface levels, second sources and the EWS workstations the array vendor supports. They may not have the latest updates on an array series. They can provide addresses and phone numbers for array vendors. Once one or more vendors have been selected, the designer should obtain data sheets and design guides from the prospective vendors for the most promising array series and begin a more in-depth review.
Example - The AMCC Arrays - as of 1991 The industry shows an evolutionary trend as designers drive vendors to develop larger, faster and cooler arrays. There have been five bipolar array families from AMCC since 1984 (see Table 3-2), increasing in cell size and speed while reducing die size and power. The most recent is the AMCC Q20000 series, officially released in September 1989.
Table 3-2 AMCC Bipolar Array Series
AMCC Array Series    Year
Q20000               1989
Q5000                1987
Q3500                1986
Q1500                1985
Q700                 1981
The Q20000 Series speed estimates list its internal toggle rate, at least twice as fast as that of the previous Q5000 Series, at 1.25GHz, with an enhanced drive and much lower power. Individual macros have been found to run at 1.4GHz and higher.
There are two AMCC BiCMOS array families, the Q14000 Series and the Q24000 Series, a partial shrink of the Q14000, as shown in Table 3-3.
Table 3-3 AMCC BiCMOS Array Series
AMCC Array Series    Year
Q24000 Series        1990
Q14000 Series        1988
The current BiCMOS families were preceded by three CMOS array series, each faster than its predecessor. The BiCMOS arrays combine the drive and interface ability of bipolar with the cooler operation of CMOS. The newer BiCMOS Series must be larger, faster and cooler.
Comparing the arrays The items that define the differences between array series include those shown in Table 3-4.
Table 3-4 Features for Array Series Comparison
Array Series Comparison Topics
The process technology
Metal layers routed (2, 3)
Series gating techniques
Sea-of-cells versus routing track architectures, also called channelless vs. channelled
Overall maximum speed of operation, specified as I/O and internal toggle rates
Frequency Ft (frequency at which beta for transistor becomes unity)
Noise immunity
Edge rates - programmable or not
Symmetry in rise and fall times
Power-supply options allowed
Power-supply variation stability
Maximum number of I/O cells available
I/O modes allowed (TTL, ECL, MIXED, etc.)
ECL terminations
On-chip translators
Maximum number of internal cells or gates available
Macro options - Standard (S); Power (P); Low-power (L); High-speed (H); Fast (V); Drivers (D); superdrivers (D) --- or lack of options; i.e., speed-power programmability
Variety in the macro library available
Wire-ORs (dot-wire) allowed or not
Design constraints
Power dissipation per gate
Packaging available
Autoplace, Autoroute
Engineering Workstation support
Simulators supported
Second source
Military compatible
Commercial compatible
Military qualified testing
Other topics as dictated by the arrays, their technology and the design issues
The arrays within a series refine these differences with specific information on size, number of cells by type, and details about interfacing, as shown in Table 3-5. Data sheets, product profiles and macro library design guides or design manuals supply the specific information for an array series. The design manual, supplied with the array library media, is the controlling document.
Architectural Specification or Hardware Specification Once a clear definition exists of the circuit or circuits that will be placed on one array, then the planned design can be developed. This is on a smaller module scale than the block-level functional specification, e.g., at the level of counters, adders, latches, registers, sequencers, etc. The performance requirements defined in the functional specification can be used to select the technology.

Table 3-5 Array-Specific Specifications

Hardware or Architectural Specification
Number of internal cells
Number of I/O cells
Number of outputs
Number of bidirectional macros
Number of fixed power and grounds
Rules for adding power and ground
Packaging Options
Maximum internal current limits
On-chip memory
Macro-type design use restrictions such as number of Darlington; CML outputs
Placement rules that affect design
Variable bonding

The review of available arrays is conducted in parallel with the creation of the hardware specification. With the descriptions developed for the modules, equivalent gate estimates can be made for the circuit, or estimated cell usages can be computed for the circuit on a specific array. The array vendor Applications Engineer can help with the sizing estimate. The hardware design specification details what the designer intends to do to meet the target functional specification. This level of specification can be equated to a PDL (program definition language) description of software and is the basis for the evolution of HDL, hardware description language, and its derivatives. If a particular testing methodology is being enforced, the sizing estimates must take this additional logic into account. If additional testing logic, such as a parametric gate tree or parity logic, is to be used, it must be included in the sizing estimates. The specification may include proposed vendors and arrays.

Table 3-6 Components of the Hardware Specification

Hardware Specification Components
The selected technology or technologies
Potential array series (1-3 at the most)
Block level diagram to the sub-module level
The functional description of the different circuit sub-modules such as adders, counters, registers, etc.
Sub-module sizing
--- equivalent gates or estimated internal cell utilization
--- estimated I/O cell utilization
--- estimated pad utilization
--- estimated internal pin counts
Refined details on the array interface
--- number of CMOS I/O
--- number of TTL I/O
--- number of ECL 10K I/O
--- number of ECL 100K I/O
--- all four types partitioned into inputs and outputs and bidirectionals
--- number of outputs switching simultaneously (by type) (SSOs)
--- maximum toggle frequencies for each I/O
--- external set-up and hold window unless this circuit will establish the window specification for the driving circuit
Critical path throughput performance
Estimated power - DC and AC as required
Package to be used
Heatsinks required and/or air cooling required
Estimated junction temperature

There should be a design review of the architectural or hardware specification before final selection of an array series. On final selection, the specification should be revised to show that series and all computations performed for that series. Note that a workstation can provide some assistance. The critical path may be captured in more than one version and comparisons made based on an annotated simulation. Power and sizing details can be run against a macro list rather than a full interconnect netlist. (This tool is vendor-dependent.) Check if such a precapture tool is available to help size the circuit.
Array Sizing Cell Structure Each cell in an array consists of a number of uncommitted transistors, resistors and other discrete components and is designed around the performance criteria for the intended macro library. The cells will vary between array series, regardless of the vendor.
Equivalent gates The number of equivalent gates has been a design measure dating from the days of discrete designs first converting into SSI-level ICs. Integrated circuits were classed as SSI, MSI and LSI based on their equivalent gate counts. Circuits were "sized" based on the number of equivalent gates it would take to create them. CMOS arrays carried on with the equivalent gate count and it was reasonable because the internal cell in a CMOS array can be sized as 1, 2 or 3 gates. Bipolar arrays carry equivalent gate counts on their data sheets as a sizing measure but it serves only to show relative sizing between arrays in the same series. Bipolar array cell complexities render equivalent gates a rough measure at best. BiCMOS cells are more complex than CMOS and equivalent gate estimates are not recommended for them either. To complicate the problem, vendors use many different methods for computing equivalent gates. The designer would need the algorithms before a rational comparison based on equivalent gates can be made between any two array series, even from the same vendor.
Example - Method 1 One approach to array sizing is to count the number of transistors in the internal core cells, assume that 2.5 transistors are equivalent to a gate (Digital Equipment's definition), and compute the number of equivalent gates per cell. The product of the number of cells times the number of gates per cell provides the equivalent gates per array. equivalent gates = ( number of transistors in core / 2.5 )
Example - Method 2 Another method is to use the D flip/flop. Sizing the D flip/flop as 11 gates, the Q20000 Series D flip/flop uses 2 internal cells.
equivalent gates = ( number of internal cells / 2 ) * 11
Example - Method 3 The usual AMCC method is to size a 3:1 MUX-D flip/flop macro as 11 gates. The Q20000 Series 3:1 MUX-D flip/flop uses 3 internal cells. equivalent gates = (number of internal cells / 3) * 11
Example - Method 4 The last method discussed here is to size a full adder at 16 gates. For the Q20000 Series, a 1-bit full adder takes 3 internal cells.
equivalent gates = (number of internal cells / 3) * 16
or, in general:
equivalent gates = (number of internal cells / number of cells required for measuring function) * number of gates in function

AMCC ASIC Product Selection Guide with Equivalent Gates Listed (1996)

Part Number   Technology         Equivalent Gates        Number    Structured
                                 (Full Adder Method)     of I/O    Array Blocks
Q20004        1 Micron Bipolar   671                     28        None
Q20010        1 Micron Bipolar   1469                    66        None
Q20025        1 Micron Bipolar   4032                    100       None
Q20045        1 Micron Bipolar   6782                    128       None
Q20080        1 Micron Bipolar   11242                   162       None
Q20120        1 Micron Bipolar   18777                   198       None
Q20P010       1 Micron Bipolar   928                     34        1 GHz PLL
Q20P025       1 Micron Bipolar   3120                    51        1 GHz PLL
Q20M100       1 Micron Bipolar   13475                   195       RAM
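The four sizing methods above all reduce to the same ratio, so they can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function and dictionary names are hypothetical, and the cell counts per measuring function are the Q20000 Series figures quoted in the text.

```python
def equivalent_gates(internal_cells, cells_per_function, gates_per_function):
    """General form: (cells / cells needed by the measuring function) * gates in that function."""
    return internal_cells / cells_per_function * gates_per_function


def equivalent_gates_from_transistors(core_transistors, transistors_per_gate=2.5):
    """Method 1: transistor count in the core divided by 2.5 transistors per gate."""
    return core_transistors / transistors_per_gate


# Measuring functions quoted in the text for the Q20000 Series:
# name -> (internal cells used, equivalent gates assigned)
Q20000_MEASURES = {
    "d_flip_flop": (2, 11),          # Method 2
    "mux_d_flip_flop": (3, 11),      # Method 3
    "full_adder": (3, 16),           # Method 4
}


def all_estimates(internal_cells):
    """Return the estimate produced by each measuring function for one array."""
    return {
        name: equivalent_gates(internal_cells, cells, gates)
        for name, (cells, gates) in Q20000_MEASURES.items()
    }


if __name__ == "__main__":
    # 267 internal cells is the Q20010 figure from Table 3-10.
    for name, gates in all_estimates(267).items():
        print(f"{name}: {gates:.1f} equivalent gates")
```

Run on a single array, the three cell-based methods give visibly different totals, which is the reason the text insists on knowing the vendor's algorithm before comparing equivalent gate counts.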
I/O cell contributions None of these methods for estimating equivalent gates take the logic capability of the interface cells into account. Some vendors do count them in their published equivalent gate counts and others do not.
Example - AMCC cell design AMCC cell design is optimized for MUX, latch and flip/flop implementations. Each cell is designed to support high-speed requirements so that there are no placement restrictions on the high-speed option macros due to cell limitations. No power is used by a cell in its base configuration. For the AMCC BiCMOS arrays, a cell is roughly 3 gates. For the bipolar arrays, a logic cell is a more complex structure and varies with the series.
Cell capabilities Cells for each array have different capabilities. The cells for different array series, same technology (bipolar, BiCMOS or CMOS), from the same vendor may also differ widely in the approach used in their design and in their functional complexity.
Example - AMCC cell capabilities An internal cell for the Q5000 Series can support a complex D flip/flop, a 3:1 MUX and D flip/flop, a triple latch, two simple (no RESET, single output) D flip/flops, or triple 2:1 MUXs with common select. The Q20000 Series internal cell alone cannot support a D flip/flop. S- and L-option D flip/flops use two cells while H-option D flip/flops require three. The Q20000 Series internal cell is roughly comparable to a half-cell for the Q5000 Series if size of function alone is considered as the basis for comparison. The logic cell for the Q20000 Series is defined as the smallest partition possible and each internal cell supports one Turbo macro output. Turbo is a Q20000 feature that provides high drive (18 loads) with less power and less skew.
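As a small illustration of how the macro option changes the cell count, the sketch below totals the internal cells used by D flip/flops on a Q20000 Series array. The cells-per-option values come from the paragraph above; the function name and the design mix in the usage line are hypothetical.

```python
# Internal cells per D flip/flop on the Q20000 Series, by macro option (from the text):
# S (standard) and L (low-power) use two cells, H (high-speed) uses three.
DFF_CELLS_BY_OPTION = {"S": 2, "L": 2, "H": 3}


def dff_cell_usage(counts):
    """counts maps an option letter to the number of D flip/flops using that option."""
    return sum(DFF_CELLS_BY_OPTION[option] * n for option, n in counts.items())


if __name__ == "__main__":
    # Hypothetical design: 40 standard and 8 high-speed D flip/flops.
    print(dff_cell_usage({"S": 40, "H": 8}))
```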
Cell types and resources The vendor data sheet and design guide or design manual should clearly identify cell types and the number of each on each array in the series. Any restrictions in the use of the cells, either utilization limits or cell count limits, should also be readily available. Included in these descriptions should be a measure of cell functionality, either in a table summarizing the array cell capability or through the macro library documentation. As a part of the cell resources identification, the vendor should supply a clear description of the fixed power and ground pads and the procedures to add additional power and ground pads. These added power and ground macros usually reside on an I/O cell and pad and can affect the number of cells left for circuit signals.
Example - AMCC Cell types The basic AMCC logic array is composed of two classes of cells: the internal cells, which are composed of logic (L) and memory (M) cells for bipolar arrays or basic (B) cells for BiCMOS arrays; and the perimeter cells, composed of input, output or bidirectional (I/O) cells. Older AMCC arrays had buffer cells internally and specialized input-only or output-only interface cells. An array may or may not have specialized I/O cells. AMCC cell types are shown in Table 3-7. The QM1600S (now the QM1600T) was the first of the AMCC arrays to incorporate memory on a logic array.

Table 3-7 Cell Types

INTERNAL:     Logic, Basic, Buffer, Memory
PERIPHERAL:   Input, Output, I/O, Special-I/O
Refer to the cell resources table for an approximate idea of the array cell capacity for three series and note the differences. Cell resources for the Q24000 Series are shown in Table 3-8, for the Q5000 Series in Table 3-9 and for the Q20000 Series in Table 3-10. Note that no two series are alike!

Table 3-8 AMCC Q24000 Series Arrays - Cell Resources

Array Name   Internal B Cells   I/O Cells   Pads
Q24280       6880               300         256
Q24140       3360               226         226
Q24091       2268               160         160
Q24060       1440               132         132
Q24021       540                80          80
Q24008       190                66          44
Usage restrictions: Refer to the Q24000 Design Manual for details.

Table 3-9 AMCC Q5000 Series Arrays - Cell Resources

Array Name   Internal L Cells   I/O Cells   Output Limit   Memory Cells
Q5000T       352                160         120            -
Q3500T       242                120         -              -
Q1300T       84                 76          -              -
QM1600T      114                106         -              2 (1240 bits)
Table 3-10 AMCC Q20000 Series Arrays - Cell Resources

Array Name   Internal Cells   I/O Cells       I/O Cells     Signals -         Signals -
                              (For Signals)   (Fixed) (1)   Loop Filter (2)   PLL Related
Q20120       3414             198             4             -                 -
Q20080       2044             162             4             -                 -
Q20045       1227             128             4             -                 -
Q20P025      595              76              4             13                8
Q20025       733              100             4             -                 -
Q20P010      177              54              4             13                8
Q20010       267              66              4             -                 -

(1) Two pads are used by the AC Speed Monitor and two by the thermal diode.
(2) Only for the largest arrays, 100_LDCC for the Q20P010 and 132_LDCC for the Q20P025.
Add last four columns to find total I/O cells and pads.
Array Name   ECL Outputs   TTL Outputs   PLL            Power/Ground (1)
             Limit         Limit         Power/Ground
Q20120       172           100           -              78
Q20080       130           80            -              52
Q20045       100           64            -              52
Q20P025      45 (2)        45 (2)        8              26
Q20025       80            48            -              36
Q20P010      23 (3)        23 (3)        8              20
Q20010       50            24            -              32

(1) Add last two columns to find total number of fixed power and grounds.
(2) 51 for external loop
(3) 34 for external loop
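A first-pass array selection against these resource tables can be sketched as a simple lookup. The following Python is a minimal illustration, not a vendor tool: the record type, the function name and the `max_utilization` parameter are assumptions, and the figures are the non-PLL Q20000 Series values as read from Table 3-10.

```python
from typing import NamedTuple, Optional


class ArrayResources(NamedTuple):
    name: str
    internal_cells: int
    io_cells: int        # I/O cells available for signals
    ecl_out_limit: int
    ttl_out_limit: int


# Non-PLL Q20000 Series arrays, values as read from Table 3-10.
Q20000_ARRAYS = [
    ArrayResources("Q20010", 267, 66, 50, 24),
    ArrayResources("Q20025", 733, 100, 80, 48),
    ArrayResources("Q20045", 1227, 128, 100, 64),
    ArrayResources("Q20080", 2044, 162, 130, 80),
    ArrayResources("Q20120", 3414, 198, 172, 100),
]


def smallest_fit(cells, io, ecl_out=0, ttl_out=0,
                 max_utilization=1.0) -> Optional[ArrayResources]:
    """Return the smallest array meeting the estimates, or None if none fits.

    max_utilization is a hypothetical derating factor on internal cells;
    the real utilization limits come from the design manual.
    """
    for array in sorted(Q20000_ARRAYS, key=lambda a: a.internal_cells):
        if (cells <= array.internal_cells * max_utilization
                and io <= array.io_cells
                and ecl_out <= array.ecl_out_limit
                and ttl_out <= array.ttl_out_limit):
            return array
    return None


if __name__ == "__main__":
    # Hypothetical estimate: 600 internal cells, 90 signal I/O, 60 TTL outputs.
    choice = smallest_fit(600, 90, ttl_out=60)
    print(choice.name if choice else "no array fits")
```

The TTL output limit, not the cell count, decides the outcome in the usage line, which is the kind of interaction the cell resources tables are meant to expose.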
Array architecture The base arrays for the various series are similar in their design concept in that the core of most arrays is composed of an array or matrix of logic or basic cells organized in a row-column configuration. Arrays that contain memory place the RAM blocks in the core area, with the rest of the core designated for internal logic cells. Phase-Lock loop arrays, the PLL arrays, have PLL locations that straddle both core and interface areas. Interface (I/O) cells are placed around the perimeter of the array interspersed with power and ground. There are different base arrays for different power supply configurations. The base array for a single +5V supply will be different from that for a mixed-mode +5V/-5.2V dual supply. A generic die plot for the Q20080 array is shown in Figure 3-1 and one for the BiCMOS Q24091 is shown in Figure 3-2, with the interconnect pattern in Figure 3-3. Figure 3-1 Q20080 Die Plot
Figure 3-2 Q24140 Die Plot
Figure 3-3 BiCMOS Macro Interconnect Pattern
Macro configurations Macros are individually configured by interconnecting the components within a cell with one layer of metal to form the selected macro function. Macros can occupy a cell, a partial cell (usually 0.5 cell), or require several cells. The internal interconnect for a simple macro is generally confined to one layer of metal. The particular layer will depend on the array series.
Cell Interconnect The process of interconnecting macros is called routing. For channelled architectures, routing is performed following specific routing tracks. The interconnect is on the first and second layers of metal in a two-layer metalization array. Horizontal and vertical tracks are assigned to specific metal layers. For an array with three layers of metal, the second and third layers will be used for inter-macro routing and the first layer for intra-macro routing. In practice, the hard definition of which layer of metalization is restricted to which operation can be blurred.
Channelless architecture Channelless architectures have been developed to avoid some of the limitations imposed by a restricted number of routing tracks. The Q24000 sea-of-gates and Q20000 sea-of-cells (channelless) architectures use three layers of metal. Macros are interconnected on one level and interconnect between macros occurs on the other two, the specific layers being array and series dependent. For the Q20000 Series arrays, the internal macro connects (intraconnects) are on second and third metal with macro and I/O interconnects on the first layer. Routing on all three layers is possible and four layers of metal is a future possibility.
Netlist The combination of the macro layout patterns (component interconnect) and the macro interconnect forms the metalization pattern required to implement the circuit on a given array. This pattern is described in a netlist. Each workstation produces a netlist in its own format, carrying along whatever information the workstation vendor has decided is necessary. There is no standard workstation or simulator netlist format, although efforts are directed toward that goal (see EDIF) and some success has recently been attained. Parametric information that is included in the netlist is array and array-vendor dependent. A library such as the Q20000 is shipped to customers with a Macro Parameter File, which supplies the parameters for each macro in the library. These parameters are included in the netlist for each occurrence of each macro used in the design.
Example The AMCC netlist To accommodate transfer of designs from any workstation or from any of the supported netlisters (Laser 6 and Verilog) to the mainframe-based place and route system, netlist conversion is performed, where the workstation netlist is translated into a standard interface format. AMCC refers to this as AGIF - AMCC generic interface format. A different conversion program is required for each workstation or simulator that AMCC supports. The standardized netlist is named circuit.sdi. This netlist is used as input to the AMCC MacroMatrix software as listed in Table 3-11.

Table 3-11 AMCC MacroMatrix and Design Support Software - using circuit.sdi

MacroMatrix AMCCERC rules check
MacroMatrix AMCCPACKAGE (Package Check and Data)
MacroMatrix AMCCANN annotation
MacroMatrix AMCCSIMFMT simulation file formatter
MacroMatrix AMCCVRC vector check
MacroMatrix AMCCSUBMIT submission check
AMCCAD place and route system
Test vector transfer software
Verilog simulator
Interface options - I/O modes Interface combinations required for the design should be compared to those offered by the arrays under evaluation. The power supply and the interface combination define the I/O mode of the array. Not all arrays support all possible I/O modes with all possible power-supply combinations.
Interface types Once it is seen that the interface mix can be supported on an array series, the type of TTL and ECL outputs that will be required is used to help size the I/O requirements of the array.
Example: AMCC interface options For all AMCC arrays, TTL and ECL translators are included in the I, O, or I/O cells for external interfacing to both ECL and TTL. Each I, O, or I/O cell can be configured to be either TTL, ECL 10KH, ECL 10K, ECL 100K or as a power or ground pad. I/O cells can usually be used for input macros, output macros or bidirectional macros. Table 3-12 shows the possible I/O combinations allowed on AMCC arrays while Table 3-13 details the TTL output options and Table 3-14 the ECL output options.

Table 3-12 AMCC Interface Combinations

IF INPUT IS OF TYPE (columns): TTL; ECL 10K; ECL 100K
OUTPUT CAN BE ANY OF (columns): TTL; ECL 10K; ECL 100K
Table 3-13 TTL OUTPUT OPTIONS

standard TTL
open-collector
three-state or 3-state, also called tri-state
standard TTL output bidirectional
open-collector output bidirectional

The 3-state outputs and TTL bidirectional macros have an enable pin that is either restricted to being driven by a specific macro type (a 3-state enable driver) or unrestricted and driveable by any internal-level signal. The restriction depends on the array and on the mode (100% TTL or Mixed ECL/TTL) of the circuit.

Table 3-14 ECL Output Options

ECL 10K, 25 ohm termination
ECL 100K, 25 ohm termination
ECL 10K, 50 ohm termination bidirectional
ECL 100K, 50 ohm termination bidirectional
CML outputs (> 600MHz), ECL's version of an open-collector
On-chip 50 ohm series termination ECL 10K
On-chip 50 ohm series termination ECL 100K
On-chip 100 ohm series termination ECL 10K
On-chip 100 ohm series termination ECL 100K
Darlington ECL 10K, 50 ohm termination
Darlington ECL 100K, 50 ohm termination
Darlington ECL 10K, 25 ohm termination
Darlington ECL 100K, 25 ohm termination
Darlington ECL Hi-Z 10K
Darlington ECL Hi-Z 100K
Darlington On-chip 50 ohm series termination ECL 10K
Darlington On-chip 50 ohm series termination ECL 100K
Darlington On-chip 100 ohm series termination ECL 10K
Darlington On-chip 100 ohm series termination ECL 100K

The entries from CML onward in the above list are types identified as possible for the Q20000 Series. Standard ECL 10K, 100K, CML and Darlington outputs were in the first release of the macro library for the series.
Power supply options In addition to the types of interface required, the power supply or power supplies to be used should be compared to the supplies allowed for the array. The supplies, the number of fixed power and ground pads and their locations should be reviewed for their applicability to the design in question. There is often a need to have an array interface with several types of I/O while keeping power supply requirements in line with what is already provided on the target PCB (printed circuit board). This can lead to operation of a technology with non-standard voltages.
Effects on Parametrics When non-standard voltages are used, such as -4.5V with ECL 10K and -5.2V with ECL 100K, the DC parametrics for the array will be affected. The data sheet for the array series will call out the parametrics for standard supplies.
The vendor must be consulted for computational procedures to be used when non-standard power supplies are used.
Example - AMCC Arrays - Power Supply Options The power-supply and interface type matrix for the AMCC arrays shows a very flexible approach to solving interface problems. Many of the AMCC arrays can be used with a single power supply (+5V) or dual supplies (+5V/-5.2V or +5V/-4.5V) as shown in Table 3-15. The Q5000 and Q20000 Series arrays are bipolar arrays. They use an internal ECL core (0.5V ECL) and can externally interface to either Schottky TTL, ECL 10K or to ECL 100K. AMCC arrays allow for the mixed mode operation of ECL/TTL on the same array, either ECL 10K/TTL or ECL 100K/TTL or all three. Only one type of ECL may be used for input on a single array. Both ECL types may be used for output on the same array.

Table 3-15 AMCC Power Supply Options

I/O MODE (rows): 100% TTL; 100% ECL 10K; 100% ECL 100K; ECL10K/TTL; ECL100K/TTL
SINGLE POWER SUPPLY (columns): +5V; -5.2V; -4.5V
DUAL POWER SUPPLY (columns): +5V/-5.2V; +5V/-4.5V
100% ECL run with dual power supplies is called "DECL". Table 3-15, with the exception of "DECL", also applies to the Q24000 Series BiCMOS arrays. They have a CMOS core and bipolar I/O and they can interface to CMOS. The concept of a mixed ECL-TTL interface on a single array originated as a result of customer demand. The idea of operating ECL 10K at ECL 100K power supplies and vice versa was also the result of customer requests.
Example- communicating to the software AMCC uses dummy macros called chip macros that allow a user to specify precisely what array is to be used in what interface mode with what power supplies. (See Figure 3-4.) The chip macro communicates parameters to the AMCC MacroMatrix software modules that are performing design validation, including population and cell type limit checks. The array-specific checks use chip macro parameters to set limits for TTL outputs, Darlington outputs, simultaneously switching outputs, bidirectional macro counts, and other checks. The AMCCERC software can spot mismatched interface macros and exceeded macro type limits and issue appropriate error messages. It can also adjust the DC power module to use the correct power supply in the power computation. Figure 3-4 A Chip Macro (1994)
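The kind of limit and mismatch checking described above can be sketched as follows. This is a hypothetical illustration only: it is not the AMCCERC implementation, and the class, field and macro-kind names are invented for the example. The output limits in the usage lines are the Q20080 values from Table 3-10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChipParams:
    """Hypothetical stand-in for the parameters a chip macro carries."""
    array: str
    io_mode: str            # e.g. "100% TTL", "100% ECL 10K", "ECL10K/TTL"
    ttl_output_limit: int
    ecl_output_limit: int


def check_interface(chip: ChipParams, macros: List[str]) -> List[str]:
    """Return error messages for exceeded output limits and mode mismatches.

    macros is a list of interface macro kinds such as "TTL_IN", "TTL_OUT",
    "ECL_IN" or "ECL_OUT" (names assumed for this sketch).
    """
    errors = []
    ttl_out = macros.count("TTL_OUT")
    ecl_out = macros.count("ECL_OUT")
    if ttl_out > chip.ttl_output_limit:
        errors.append(f"{ttl_out} TTL outputs exceed the limit of {chip.ttl_output_limit}")
    if ecl_out > chip.ecl_output_limit:
        errors.append(f"{ecl_out} ECL outputs exceed the limit of {chip.ecl_output_limit}")
    # An interface macro whose family is absent from the I/O mode is a mismatch.
    for family in ("TTL", "ECL"):
        if family not in chip.io_mode and any(m.startswith(family) for m in macros):
            errors.append(f"{family} interface macro used in {chip.io_mode} mode")
    return errors


if __name__ == "__main__":
    chip = ChipParams("Q20080", "100% TTL", ttl_output_limit=80, ecl_output_limit=130)
    for message in check_interface(chip, ["TTL_OUT"] * 81 + ["ECL_OUT"]):
        print(message)
```

Keeping the limits in one record mirrors the role the text gives the chip macro: the checks themselves stay generic and only the parameters change from array to array.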
Interface cell functionality Interface cells are designed to support TTL-translators, ECL-translators and most of the required buffers for external interfacing to both ECL and TTL. The amount of buffering, the capability of the cell to support high fan-out drivers, single-cell bidirectional macros, ECL output terminations to 25 or 50 ohms, and the elementary logic possible in an interface cell vary by array series. For many of the arrays, the input macros also provide simple AND/NAND or OR/NOR logic or high fan-out driver operations. The output macros for TTL contain OR or NOR operations and those for ECL may contain these operations plus others as complex as a latch or a 2:1 MUX. This is in addition to level translation and buffering functions. The amount of logic contained within an interface macro is series-dependent; it is a function of the I/O cell complexity and the components available within the cell.
Variability in I/O design The various array series and even arrays within a series differ in their approach to interface. The following gives an idea of the choices that have existed on the arrays from one vendor. Similar variability and evolution can be traced for other vendors.
● The Q700 Series used unbuffered I (input-only) and I/O (input, output or bidirectional) cells that required a buffer for each input and each output macro. The buffer macros were placed on internal cells (L or B), reducing the L-cells available for internal logic functions. There was a D-cell on one array in the series to provide a pin-restrictive three-state enable driver that could drive more than eight loads. A bidirectional macro was composed of one interface and two buffer cells.
● The Q1500A array used I (input-only) and O (output-only) cells, with buffering either in the input or output macro or in a separate macro. The BExx macros were for ECL output buffering, for example, and were placed on a B cell. TTL input buffers were part of the input macro that was placed on an I cell. Bidirectionals were constructed from two macros on two adjacent cells using the same methods now used on the Q14000 Series arrays.
● The QH1500A array used I and I/O cells, with buffering included in the input, output, and bidirectional macros; this was the first time all buffers were removed from the internal cell area. The I/O cell could support single-cell bidirectional macros.
● The Q3500 and Q5000 Series use I/O cells only, with buffering included in the input, output and bidirectional macros.
● The Q1500, Q3500 and Q5000 Series also provide unbuffered ECL input and the buffered logic macros to support it. The BIxx series macros are made up of representative logical functions from the rest of the macro library (gates, EXOR networks, latches, flip/flops, MUXs and decoders) and also include the ECL input buffering function on selected input pins. The BIxx macros are placed on internal cells (L or B). The selected pins are pin-restricted to be driven by any unbuffered ECL input macro.
● The unbuffered ECL input macro does not suffer any degradation in speed due to loading delay, the only macro to behave in this manner. It can drive eight loads. Load capacitance presented to the source driving the unbuffered ECL input increases by 1 pF per fan-out.
● The Q14000 Series uses I/O cells, with buffering and logic as is used in the Q5000 Series. Single-cell bidirectional macros can only be used on the Q9100B or Q2100B and then only in specific "special-I/O" cell locations. Additional bidirectional macros must be built from one input and one output macro.
● The Q20000 Series uses I/O cells, with buffering but no logic functions. TTL outputs (output macros and bidirectional macros) are limited to a number that varies per array. ECL outputs are also limited. The bidirectional macros use two cells and provide an added ground pad by using the left-over pad.
● Most 25 ohm termination macros require two I/O cells. The Q20000 Series provides a single-cell 25 ohm termination macro but limits its use to arrays using two power supplies. Darlington macros are limited to arrays with two power supplies.
● The Q20000 Series uses four fixed I/O signals per array. These signals are used by the on-chip thermal diode (one anode and one cathode) and the on-chip AC speed monitor (one is power and the other is an output signal). These four pads and cells are not available for use with any other function.
Bidirectional macros Bidirectional macros can be two-pin, one-pin, one-cell or two-cell macros. If an array series has no bidirectional macros, they may need to be constructed. Watch out for incompatibility with the workstations - a work-around may be required for proper simulation of bidirectional macros. If more bidirectional macros are needed, they are constructed from two macros, one input and one output, and placed on two adjacent I/O cells. The two macros can be tied together into one package pin, but this requires two test vector sets, one for wafer sort and one for packaged part testing. They are usually tied together outside of the package to keep testing simplified, but this requires two package pins. A third approach not liked by the array vendors is to stitch two macros together in the interconnect so that only one pad and one pin are used. Anytime that hand-edits or customization of the interconnect or base is involved, both time and money are required, and debugging time may need to be increased.
Examples The Q20000 Series arrays support a bidirectional macro that sits on two I/O cells, unlike the single-cell approach of the Q5000 Series. In this case, the internal macro routing eliminates the need for two sets of test vectors or an extra bonded-out pad. Each bidirectional macro also contains either an IEVCC pad (ECL VCC) or an ITGND (TTL GROUND) pad. (Refer to "Added power and grounds" for a discussion of pad-plane interconnections for added power and ground pads.)
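Because a Q20000 Series bidirectional macro occupies two I/O cells, the I/O cell estimate is not simply the signal count. A minimal sketch of the bookkeeping follows, with a hypothetical function name and the simplifying assumptions that every input or output macro uses one I/O cell (two-cell 25 ohm termination macros are ignored) and that each separately added power or ground macro also uses one.

```python
def q20000_io_cells_needed(inputs, outputs, bidirectionals, added_power_ground=0):
    """Estimate I/O cells used on a Q20000 Series array.

    Assumes one cell per input or output macro, two cells per bidirectional
    macro (per the text), and one cell per separately added power or ground
    macro. The power or ground pad contained in each bidirectional macro is
    already covered by its two cells.
    """
    return inputs + outputs + 2 * bidirectionals + added_power_ground


if __name__ == "__main__":
    # Hypothetical interface: 40 inputs, 30 outputs, 8 bidirectionals, 4 added grounds.
    print(q20000_io_cells_needed(40, 30, 8, added_power_ground=4))
```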
Internal Cell Functionality The logic (bipolar) and basic (BiCMOS) cells are organized to provide logic functions such as basic logic gates and buffers, high-fan-out drivers, EXOR and EXNOR networks, gate networks, multiplexors, decoders, latches, and flip/flops. These cells can support a 3:1 MUX-D flip/flop combination, triple latch-common clock, triple 2:1 MUX-common select and dual D F/Fs. As stated before (see "Cell structure"), the number of cells required for any of these functions will vary by array series. The number of cells required to implement a function depends on the component mix present in the cell and that required by the function. Arrays are designed for a specific set of applications or targets and base array design is optimized for those applications. An array cell size may be divisible so that half-cell macros are possible, which also allows sizes such as 4.5 cells. A cell may be designated as the smallest divisible or addressable unit (SAU), in which case a one-cell macro is the smallest macro allowed.
Multi-Cell Macros Groups of internal and/or interface cells can also be combined into large multi-cell macros for higher functionality. The larger multi-cell macros, named MSI macros by AMCC, interconnect components spread across several cells more efficiently than the schematic interconnection of the equivalent function formed from basic macros. The result is a denser functionality with the resultant speed improvement. Design density, measured by the cell utilization per functionality, can be increased by 20-40% while reducing design partitioning and macro conversion efforts. The large MSI macros include MSI and LSI functions.
Example MSI macros are 6-bit comparators, 4-bit carry-look-ahead adders and their companion carry-look-ahead generator, 4-bit up and down counters, 4-bit registers and 8-bit latches. Different array series offer different MSI macros. The simple and MSI macros available with a specific array series are documented, along with any use or placement restrictions, in the appropriate Design Guide or Design Manual. Always refer to the latest version of these manuals when performing an evaluation.
Hard and soft macros There are two types of MSI or multi-cell macros. One type is hard, where the cell interconnect is treated as one large macro and no variations in layout are permitted. The other type is soft, where the cells composing the macro have a preferred, specified-to layout pattern but the interconnect must be routed as if it were any other interconnect net. The MSI macros in the Q5000 Series were originally designed to allow placement in several different configurations to facilitate the auto-place algorithm (best-fit approach), while closely maintaining the specified performance for the macro. This is a soft macro. A preferred placement is documented. Two problems make the soft MSI macro approach unattractive: improper placement, which invalidates the timing specifications and therefore the simulation model, and the need to weight and prioritize the internal nets for the router so that the interconnect delay can be kept minimal. Both the BiCMOS Q14000 and Q20000 Series MSI macros use hard macros, where an MSI macro is laid out as a single multi-cell unit and handled by the placement software as an inflexible black box. Hard macros facilitate automated placement. Future AMCC arrays will use the hard macro approach. Figure 3-5 shows an MSI-based 16-bit adder. Figure 3-5 16-Bit MSI Adder (1994)
Refining Interface Requirements When the interface types and their power supply requirements are documented and one or more arrays chosen as candidates for the final selection, the interface requirements must be refined. There are several conditions under which additional power or ground pads will need to be added to an array beyond the fixed power and ground pads provided. These include:
● simultaneously switching outputs,
● package restrictions,
● high-speed signal isolation and
● ECL - TTL isolation.
Simultaneously switching TTL or ECL outputs are a potential source of system noise, which can be reduced by the addition of TTL VCC - TTL Ground pairs and/or ECL VCC pads. Some arrays require that drivers be placed next to ground. Others require that a ground exist between simultaneously switching TTL outputs and ECL inputs, or between any TTL output and an ECL input. Isolation of CMOS inputs from the faster switching TTL and ECL signals may also be required. When a fixed ground is not available, one must be added. The design rules for any array series are called out in the Design Manual for the array.
Variable Requirements for Power and Ground

Bipolar arrays require that all fixed power and ground be used or bonded out to the package. Additional power and grounds are based on simultaneously switching outputs or isolation requirements. CMOS arrays have some or all of their fixed power and ground pads under user-placement control. The vendor provides a list of how many need to be used depending on the signals used by the design. This type of flexibility is detrimental to standard packaging; it is time consuming and expensive.

In spite of the drawbacks, recent BiCMOS designs have returned to this approach, providing the minimal number of power and grounds and allowing other fixed-position power and grounds to go unbonded (unconnected). The criteria for requiring that these fixed positions be used or that additional power and grounds be added are based on the number and types of interface macros used.

When the power busses supporting the internal core are isolated from the busses supporting the peripheral I, O or I/O cells, noise feedback due to output switching is minimized. The threshold and reference voltage generators for the logic array internal cells and I, O and I/O cells should also be independent to ensure steady operation.
Adding Extra Power and Ground pads

Adding a power pad or a ground pad to an array can be accomplished by placing a power or ground macro on the desired pad (an array-specific procedure). AMCC arrays use the ITPWR (+5V), ITGND (0V) and IEVCC (ECL VCC) macros to add power or ground. (See Figure 3-6.) For standard reference ECL, IEVCC represents a ground pad. For +5V REF ECL, IEVCC represents a power pad.

Figure 3-6 Added power and Ground macros (AMCC)
Dual-Function I/O Macros

Each added power and ground macro uses a pad and disables the cell that is associated with that pad, reducing the number of these cells and pads available for I/O operations. To offset this waste, many macro libraries include dual-function macros that use the I/O cell for one function and the pad for added ground.

Silicon efficiency can be achieved with the dual-function macros. The macros available are array series-specific and vary widely. If any of these functions applies to the design, they can reduce silicon requirements while maintaining functionality. (See Figure 3-7.) Example macros include:

● input function with 3-state enable driver
● 3-state enable driver with added ground
● bidirectional input with added ground
Figure 3-7 Example Dual-Function I/O Macro
Example - Simultaneously Switching Outputs

All AMCC arrays, with the exception of the Q20000 Bipolar Series and the BiCMOS Q24008 array, use the following rules for adding power and ground due to a group of simultaneously switching outputs (SSO), called an output group.

Allow 8 TTL SSO outputs per quadrant, then add one TTLPWR and one TTLGND macro for each group of 1-8 after the first eight. Each added pair requires two cells, two pads and, depending on the package, two package pins. Add another pair for the next group of 1-8 and so on. All TTL output counts are converted to "equivalent" 8 mA outputs. (See Table 3-16.) For packages with internal power and ground planes, place the TTLPWR and TTLGND macros so that they are interspersed with the simultaneously switching outputs and can be bonded to the power or ground package plane.

Table 3-16 Sample Rules for Adding TTL Power and Ground

PER TTL SSO    ADD TTLPWR, TTLGND PAIRS
0-8            do nothing
9-16           add 1 pair
17-24          add 2 pairs
Etc.

Allow 8 ECL SSO outputs per quadrant, then add one ECLVCC macro for each group of 1-8 after the first eight. Each added macro requires one cell, one pad and, depending on the package, one package pin. Add another ECLVCC macro for the next group of 1-8 and so on. For packages with internal power and ground planes, place the ECLVCC macro so that it is interspersed with the simultaneously switching outputs and can be bonded to the power or ground package plane as required. Note that ECLVCC is a power pad in a +5V reference ECL circuit (5V REF ECL) and a ground pad in a standard reference ECL circuit. (See Table 3-17.)

Table 3-17 Sample Rules for Adding ECL Power OR Ground

PER ECL SSO    ADD ECLVCC    Q20000 Rules
0-4            do nothing    do nothing
5-8            do nothing    add 1
9-12           add 1         add 2
13-16          add 1         add 3
17-20          add 2         add 4
21-24          add 2         add 5
Etc.           Etc.          Etc.
The Q20000 Series requires one ECLVCC per additional 1-4 ECL SSO after the first group of four. All output counts are converted to "equivalent" 50 ohm outputs. The extremely high speeds of these arrays require design procedures to ensure minimal noise.
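The sample rules in Tables 3-16 and 3-17 reduce to a ceiling division. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, the group sizes come from the sample tables rather than from any current Design Manual, and the counts are assumed to have already been converted to "equivalent" 8 mA (TTL) or 50 ohm (ECL) outputs.

```python
def added_supply_macros(sso_count, group_size=8):
    """Number of added power/ground macros (ECL) or macro pairs (TTL)
    for one group of simultaneously switching outputs.

    The first `group_size` outputs need nothing extra; every further
    group of 1..group_size outputs adds one macro or one pair.
    This mirrors the sample tables only (an assumption, not a vendor rule).
    """
    if sso_count <= group_size:
        return 0
    groups = -(-sso_count // group_size)  # ceiling division
    return groups - 1


# Sample rule: 8 outputs per group (TTL pairs, or ECLVCC on most series)
print(added_supply_macros(16))                # 1
print(added_supply_macros(24))                # 2
# Q20000 Series rule: 4 ECL outputs per group
print(added_supply_macros(12, group_size=4))  # 2
```

Always confirm the group size and the equivalent-output conversion against the Design Manual for the array in use.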
Systems
Thermal Diodes

As the arrays have become larger and dissipate more power, thermal characterization becomes an increasingly important issue. Some means of evaluating array junction temperature must be developed for each array series. For some of these series, macros have been developed that allow the designer to add one or more thermal diodes to the design. The macros are treated as any other macro and are placed on interface cells.

Newer arrays, such as the Q20000 Series, have thermal diodes built into the base array. The Q20000 Series arrays have a thermal diode structure embedded in the base and brought out to dedicated or fixed pads. These pads must be brought out to package pins. These pads are not accessible to any other macro function.
Example - AMCC thermal diodes (1994)

Thermal diode macros exist for the Q14000 and Q5000 Series libraries and the designer is required to add one thermal diode macro pair per circuit. Using more than one was found to be unnecessary as the thermal gradient across the chips was found to be insignificant. Where there might be doubt, additional thermal diode pairs can be added. Each pair uses two I/O cells. (See Figure 3-8.) One earlier version of the implementation also used one internal cell. No differences were found to exist between these two versions. Thermal diode macros also exist for the Q20000 Series for those cases where a second thermal diode measurement is felt to be necessary.

Figure 3-8 Thermal Diode Pair
The AMCC AC Speed Monitor

AC testing is a problem for both the designer and the vendor. To reduce the problems associated with it, the Q20000 Series arrays each have a built-in AC speed monitor with two fixed pads assigned to it. These pads must be brought out to package pins.
Threshold generators - routable generators The designer is not usually concerned with the threshold generators. In cases where they are required, they may only need identification and routing connections rather than actual cell placement.
VBB Reference voltages

There are some instances where VBB reference voltages are desired, where I/O utilization is high and the designer is using single-rail ECL where differential ECL is required. These reference voltages are supplied with a macro and are placed on an interface cell. They will connect to external package pins.
Speed and testing interface cell utilization

Maximum speed of operation and testing requirements will have an effect on the final interface cell count. For very high speeds, differential ECL may be required by the array vendor, doubling the cell and pad counts of those signals. Testing may require that parts of the circuit be degated while other parts are being tested. This will occur when a simultaneously switching group is very large, including the simultaneous enable-disable of three-state or bidirectional macros. Test enables may be required to partition the circuit for testing, and test enables will use cells and pads.
Population or cell type limits and utilization

Where population restrictions exist, circumvention of the limits may include the addition of interface macros. For example, a single-cell bidirectional macro limit would result in two-cell bidirectional macros being used for additional bidirectional signals. A limit on the single-cell 25 ohm ECL termination, if dual power supplies are not available, would result in two-cell 25 ohm terminations.
Placement restrictions

High-frequency signals in particular will often require placement in specific cell locations and require that these macros be isolated with added grounds. Added grounds use pads and disable the accompanying cell. Where placement restrictions require the addition of macros or a change in the macros selected, the effects on cell utilization must be anticipated in the initial estimate.
Final Interface Cell Utilization

The final interface cell count for the circuit in its estimated stage should look at all the factors that could increase interface cell requirements. The interface cell utilization for a non-captured circuit should be less than 100% if possible to allow for adjustments and expansion. If this is not possible, then the rest of the design must be completed using I/O cell utilization minimization as a priority design objective.

In the ideal situation, an array chosen for a design should be somewhere in the middle of an array series. This is to provide a smaller option if I/O minimization can reduce the requirements and to provide a larger option should the interface requirements grow out of the original selection. If not, then the interface utilization should be no more than 90% during development, with no more than 100% interface utilization for the final design.
Interface Cell Utilization (general)

To find interface cell utilization, add the items in the list in Table 3-18.

Table 3-18 Interface Cell Utilization

Interface Cell Utilization
cells for input signals
cells for output signals
cells for bidirectional signals
cells for thermal diodes (I/O)
cells for AC speed monitor (I/O)
cells for reference generators
cells blocked by added power pads
cells blocked by added ground pads
cells dedicated to fixed I/O signals

Divide this sum by the number of interface cells available on the array of choice.

Interface cell utilization = (number of interface cells used by the circuit) / (number of interface cells available on the array)
Example - BiCMOS Cell/PAD Utilization

When an array does not have a one-to-one ratio of I/O cells and pads, then PAD utilization may also be required. The Q24008 and Q24280 arrays have 2-cell-1-pad structures. Certain macros placed on these structures are very efficient, others are not. Depending on the macros used, single-cell or multi-cell, either pads or cells may be rendered inaccessible. These arrays have a complex algorithm available to allow sizing. The algorithm requires a check on both cells and pads.

PAD utilization = (number of PADS used by the circuit) / (number of PADS available on the array)
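The cell and PAD utilization ratios above, together with the 90% development and 100% final guidelines for interface cells, can be sketched in a few lines of Python. The function names and the sample counts are hypothetical and chosen only for illustration; they do not describe any particular array.

```python
def utilization(used, available):
    """Fraction of a resource (interface cells or pads) used by the circuit."""
    if available <= 0:
        raise ValueError("available must be positive")
    return used / available


def interface_within_guideline(used, available, final=False):
    """Check the guideline stated earlier: no more than 90% during
    development, no more than 100% for the final design."""
    limit = 1.0 if final else 0.9
    return utilization(used, available) <= limit


# Hypothetical counts, for illustration only
print(utilization(120, 160))                             # 0.75
print(interface_within_guideline(150, 160))              # False (93.75% > 90%)
print(interface_within_guideline(150, 160, final=True))  # True
```

For arrays without a one-to-one cell-to-pad ratio, the same ratio would be computed twice, once with cell counts and once with pad counts, and both results checked.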
Fan-out load limits

Internal cell usage will depend on the macros required to implement the desired functions. Refinements to that estimate come when the fan-out load limits, hook-up and pin restrictions for those macros are evaluated. If an interface macro is driving too many loads, internal macro buffers will be needed to divide that load or additional interface macros will be needed. If internal macros are driving too many loads, the same approach is used. These buffer trees use cells and current.

Macros will be specified with both fan-in and fan-out load limits. The fan-in numbers represent the load that the macro presents to the macro driving it. The fan-out limit is the number of loads that the macro can safely drive before signal degradation becomes a predominant factor. A load unit can be considered to be equivalent to one picofarad. Check with the array vendor for their definition.
Derated fan-out load limits

Clock paths, distortion-sensitive and high-speed paths should be designed with a derated fan-out load limit, i.e., with macros operating well below their specified limits. The array may be specified with a guideline as to the frequency-derating schedule. Each AMCC array series is different in the value of the breakpoint frequency but each has the same basic rule. For sensitive and clock paths, derate the fan-out load limit by 20% up to the breakpoint and 40% at or above the breakpoint frequency.
Example - fan-out derating

For the Q20000 Series, all internal macros have the Turbo speed enhancement allowing a fan-out load limit of 18 loads. The TTL input and bidirectional input macros are the only interface macros that do not have this Turbo enhancement and their fan-out load limit is 9 loads. Assume that the breakpoint frequency is currently set at 400MHz. For an ECL input toggling at 500MHz, the derated fan-out load limit would be:

(1.0 - 0.4) * 18 = 10.8, truncated to 10 loads
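The derating rule can be expressed as a small helper. This is a minimal sketch assuming the 20%/40% schedule and truncation described above; the function name and argument names are hypothetical, and integer arithmetic is used to avoid floating-point rounding at the truncation step.

```python
def derated_fanout(limit, freq_mhz, breakpoint_mhz):
    """Derated fan-out load limit for a clock or sensitive path.

    Derate by 20% below the breakpoint frequency and by 40% at or
    above it, then truncate to a whole number of loads.
    """
    percent = 40 if freq_mhz >= breakpoint_mhz else 20
    return limit * (100 - percent) // 100


# ECL input toggling at 500MHz, breakpoint assumed at 400MHz, limit of 18 loads
print(derated_fanout(18, 500, 400))  # 10
```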
Drivers

Special driver macros may be provided in a library. These "super-drivers" are not derated. They are designed to provide a clean edge even when loaded to their rated limit. These drivers will use more current and more cells than the non-driver macros but fewer of them are required to drive the same load. The result may be the same cell utilization and the same power.

Another feature of drivers should be considered. When timing analysis is performed, the super-drivers and drivers will be seen to have a lower k-factor (drive factor) than the non-driver macros, resulting in lower inter-macro delays for the same load than a non-driver macro could provide. Drivers may be interface macros or internal macros.
Hook-up or interconnect restrictions Hook-up is used here to define the rules on grounding an input pin to a macro. CMOS and BiCMOS technologies require that all unused macro input pins (non-primary array inputs) be clipped to VDD or VSS, no exceptions. Bipolar technologies allow the unused input pins to be tied to global ground. The ground symbol on the schematic is for human comprehension and to allow checking software to understand that the designer meant to leave the pin unattached. For some arrays, a macro input pin connected to global ground on a schematic will mean that the pin "floats", or is unattached to anything when silicon is built. For others, these pins are physically attached to a confirmed logical low by connecting to a rail (CMOS) or by strapping the base to the emitter (bipolar) through conditional geometry. For the Q20000 Series these pins are base input to transistors and when unused are tied to the emitter to ensure a logical low. For the Q5000 Series, the pins were allowed to float. Whether or not the pins are allowed to float, there will be cases where specific macro pins are restricted, i.e., these pins cannot be attached to global ground but must be driven low by another macro. This is a hookup restriction. When hook-up restrictions exist, some macro must be added to the schematic to drive these pins low (or high). The number added will depend on the number of loads that must be driven low or high.
Pin restrictions - interconnect restrictions

Some macros are pin-restricted in that they may not be freely connected to any other macro but must be driven by or drive a specific class of macro. As an example, TTL three-state outputs and TTL bidirectional macros in some macro libraries must have their enable pins driven by a macro known as a three-state enable driver. No other macro may drive that enable pin. The three-state enable drivers can only be connected to drive these specific pins; they may not be used to drive other macros. In the Q5000 library, three-state enable drivers may only be placed on interface I/O cells, even when they are driven by internal signals, leaving the pad unused in this case.

When pin restrictions cause the use of specific macros and these macros have restricted placements, the impact on cell utilization must be considered.
Internal cell utilization

When the paths have all been checked for fan-out, pin restrictions, hook-up restrictions, placement rules, etc., the internal cell utilization can be estimated. As stated in Chapter 2, this is the sum of all the internal cells used divided by the number of internal cells available.

Internal cell utilization = (number of internal cells used by the circuit) / (number of internal cells available on the array)
Further changes

Other factors that can change the estimated cell utilization include adjustments made for power reduction, for speed enhancement, or for cell utilization reduction for either interface or internal cells.
Exercises

1. Select a semi-custom array series (any). List:
● the processing technology available
● power supply configurations
● types of TTL input and outputs allowed
● types of ECL input and output allowed
● how bidirectional macros are handled

2. For the selected series, what cell usage restrictions exist?
● a. Any limits on inputs
● b. Any limits on outputs
  ❍ TTL
  ❍ ECL
● c. Any limits on bidirectionals
● d. Any rules for simultaneously switching outputs
● Are the rules easy to find?

3. For the selected series, how many fixed power and ground pads are on each array in the series? How are additional power and ground pads added?

4. For the selected series, what types of cells are available on each array and how many of each type?

5. How many internal cells would be required by the selected array series macros to implement an 8-bit barrel shift register (8 2:1 MUXs with 8 4:1 MUXs, 8 D flip/flops)?

6. Given a 16-bit fast adder design using carry-look-ahead, 16 DATAA and 16 DATAB inputs, necessary controls (clock, reset, carry-in), a registered output, 17 outputs (sum plus carry out), size the design for the macro library for the selected array series. Assume a COMMERCIAL environment, single -5.2V power supply, ECL is ECL 10K or ECL 10KH. Fast adder: four 4-bit fast adders with carry-propagate outputs; one 4-bit carry-look-ahead unit; 17 D flip/flops; 35 ECL inputs; 17 outputs; buffers and gates as required; added power/ground as required.

7. Given a 32-bit register, 35 ECL inputs (32 data, clock, reset, 3-state enable), dual ECL-TTL outputs (32 TTL 3-state and 32 ECL, same signals), size the design for the selected array series. Assume a MILITARY environment, dual-power supplies of +5V and -5.2V, ECL is ECL 10K or ECL 10KH. Register: 32 D flip/flops, 35 ECL inputs; 64 ECL outputs; buffers and gates as required; added power and ground as required.
Case Study: Sizing A Design
TARGET ARRAY: AMCC's Q20080 {Based on 1994 data}

The following exercise is not intended as a practical circuit for actual construction on an array. However, it will examine nearly every design rule and restriction for the example array series. It will be solved here using a Q20080 array as the intended target solution but could be solved with any macro library provided one of the supported arrays in that series can accommodate approximately 160 I/O signals and toggle at 500MHz. See Figure A-1.
THE DESIGN

Using the following list of requirements, design a circuit using AMCC macros for the Q20000 Series and size the design to fit the Q20080 array in that series:

● A pipelined structure two flip/flops deep is to be 32 bits wide.
● Each data input to the first flip/flop stage is to be driven by a 2:1 MUX, the inputs of which are driven by ECL 10KH inputs.
● All flip/flops are required to be reset by way of a master reset signal.
● The common clock is to be a differential signal, if possible.
● All 32 multiplexors are to have a common select.
● The target maximum speed of operation is 500MHz. (Design Objective.)
● All dataA inputs (32 of them) are to be fed in groups of four into two 16:1 multiplexors. There are four common select lines for the two 16:1 multiplexors and two outputs, controlled by enables (one per signal).
● All input signals, data and controls, are to be fed into a parity tree, a gate tree that will produce a single output. This structure is to be used for parametric testing.
● A six-bit pass-through bus (input to output without logic) is included which uses ECL inputs and outputs.
● The flip/flop output stage is connected to non-Darlington ECL 10KH outputs. Both true and complementary outputs are to be brought out to external pins.
● This is a military, standard reference ECL -5.2V single-supply circuit.
Note: Keep your data. This problem or a similar one will be referred to in other chapters.
Exercise Review the selected design manual, select macros and compute cell utilization. Pick an array from the series that would fit the design. Perform all required population checking for that series.
SOLUTION - Q20000 Check for I/O mode and power supply. This is a 100% ECL circuit and uses no Darlingtons so that a single -5.2V supply is allowed. The AMCC chip macro is Q20080ECL10K, which sets the I/O mode at 100% ECL with ECL 10K/KH inputs. The power supply parameter is set at STD5 for standard reference -5.2V supply. The product grade parameter is set at MIL for military. Between them, these parameters define this circuit as a MIL5 circuit, using the MIL5 library and annotation data. The chip macro is shown in Figure A-2. Figure A-2 AMCC Icon for the Chip Macro
Selecting a flip/flop - first pass

The need for a master reset will reduce the set of available flip/flop macros that could be used to those with a synchronous or asynchronous reset (or set). The use of a 2:1 MUX - flip/flop combination will further reduce the choices for the first stage of the circuit. For the chosen Q20000 macro library, FF46S is a D flip/flop with a 2:1 MUX on the data input and an asynchronous reset. It is more silicon-efficient to use a combination MUX-F/F macro than to implement the design with individual multiplexor and flip/flop macros.

The second stage flip/flop needs a reset and at this stage in the design process needs both Q and QN outputs. FF10S was chosen as the appropriate macro. See Figure A-3.

Figure A-3 MUX and two F/Fs in Two Macros
Selecting the ECL input All inputs (reset, selects, output enables and data) except the clock will use the IE93S, a simple buffered input that produces both Y and YN outputs shown in Figure A-4. The YN output will be used to input to the gate tree to keep loading off the Y path. To reduce power, the IE94 version with only the Y output could have been chosen. This option would use three loads on the Y path, two to the main circuit (register input and 16:1 MUX input) and one to the parametric tree. Figure A-4 Output Macro with Complementary Outputs
For this circuit, the saving of one load is not significant in that the loads are not in the critical path. In another instance, the reduction of one load could be the difference between meeting or failing specification.

There are 64 data inputs (32 dataA and 32 dataB), one reset, five MUX selects (one for the input 2:1 MUXs and four for the 16:1 MUXs), two output enables and six pass-through inputs, for a total of 78 IE93S macros. Each macro uses one I/O cell and one pad. (See Table A-1.)

Table A-1 Required IE93S Inputs

32    data A
32    data B
1     reset control
5     data MUX control select
2     output enables (MUX outputs)
6     pass-through inputs
78    IE93S inputs
Clock input The clock input will use IE34H, a differential high-speed input with a Y and YN output. For CML-compatible input, use IE31H. The clock will have two loads. It uses two I/O cells and two pads. The clock is in the critical path. Other options that could be considered include the use of the driver version of the differential input, IE32D. The driver handles 32 loads and has k-factors with less skew than those of the H-option IE34H. If the IE34H proves to be too slow or the inter-macro delays too long, the IE32D would be the choice for a speed upgrade. The driver is shown in Figure A-5. Figure A-5 Differential Input Macro
ECL outputs - first pass

All outputs in the initial version of the circuit were the OE42S, a cut-off (ECL output with an enable) macro used with the enable tied low (always on) except for the two controlled outputs. (This macro was the only 50 ohm non-Darlington termination in the initial release of the library.) The (111) version of the library added OE11S, a NOR-input 50 ohm termination, rated for 350MHz. The other option is to have a custom 50 ohm macro created, not worth the effort for the case study but something that should be reviewed in a real circuit where power and cell space are at a premium.

The OE42S enable is tied low by way of the GT87D static driver, a macro that supplies steady HIGH and LOW signals when unused macro pins cannot be "clipped" low or allowed to float.

There will be 64 data outputs for the pipeline, six outputs for the pass-through signals, two MUX outputs and an output for the parametric gate tree for a total of 73 outputs. Each OE42S uses one I/O cell and one pad. The fan-out load limit for the GT87D is 50 loads so two will be required to supply the OE42S enable pins in this first version of the design. The basic module is shown in Figure A-6. Note that the OE11S is easier to use and uses less power - reasons to consider challenging the initial solution.
16:1 MUX The 16:1 MUX is constructed from five MX21S macros, each a 4:1 MUX with two selects. This is the largest multiplexor in the first release. Four of these will feed into the fifth to form the 16:1 MUX structure. Since there are two 16:1 MUXs, there will be 10 MX21S macros required. An 8:1 or 16:1 MUX MSI macro would simplify the design. The basic design is shown in Figure A-7. Figure A-7 Schematic Page for the 16:1 MUX
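The macro count for a multiplexor tree built from identical smaller multiplexors can be checked with a short sketch. The helper below is hypothetical and assumes the tree is built level by level from one macro type only, as in the MX21S construction described above.

```python
def mux_tree_size(inputs, mux_width):
    """Count the mux_width:1 macros needed to build an inputs:1
    multiplexor as a tree of identical macros."""
    count = 0
    signals = inputs
    while signals > 1:
        signals = -(-signals // mux_width)  # macros needed at this level
        count += signals
    return count


per_mux = mux_tree_size(16, 4)
print(per_mux)      # 5 MX21S macros per 16:1 MUX
print(2 * per_mux)  # 10 MX21S macros for the two 16:1 MUXs
```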
Parity tree

A parity tree of all inputs (required for parametric VIL, VIH testing) can be formed from NOR gates using the GT60L or GT60S, an 8-input NOR macro. The L-option is slower and uses less power. The speed of the gate tree is not important since testing is functional at 100ns intervals. The first estimate for the tree is to use eleven GT60S macros in a three-level structure to accommodate the 79 input signals. (The 78 data signals plus the clock are required.) The parity tree is shown in Figure A-8.

Figure A-8 Parity Tree
REVIEW STATUS SO FAR

The first sizing estimate provides the cell counts shown in Table A-2.

Table A-2 First Sizing Estimates

# MACROS    Macro    # I/O Cells Required
78          IE93S    78
73          OE42S    73
1           IE31H    2
TOTAL I/O CELLS: 153

# MACROS    Macro    # L Cells Required
10          MX21S    20
11          GT60L    33
32          FF10S    96
32          FF46S    96
TOTAL L CELLS: 245

The number of macros is not the same as the number of cells, even for the I/O macros.
Exercise Check the cell counts against the current design manual for the Q20000 Series. Check for new MSI macros or new I/O macros that might be used in place of those selected (such as OE11S). Consider size, speed and power in making changes. (Changes should be made!) If you are designing with a different array series, create the same table for the chosen library.
SIMULTANEOUSLY SWITCHING OUTPUTS Since 64 outputs are switching simultaneously in the worst case (master reset is one example), additional IEVCC macros (added ground) will be required according to the Q20000 Series design rules. A total of 16 IEVCC macros is required for these outputs and each blocks off one I/O cell and uses one pad. This is the minimum number of added power and grounds recommended for worst-case conditions.
Adding two more outputs for the 16:1 MUX Y outputs, six for the pass-through and one for the gate tree, requires two more IEVCC macros.

● If the outputs switch within one macro delay (or within 2 ns, whichever is larger) of the other switching group, additional IEVCC is required.
● If they switch well separated in time from the other group, then the added IEVCC for this group will not be required.
By tagging the switching groups and the added power and ground macros that belong to the groups with a SWGROUP parameter or property, the AMCC MacroMatrix can check for sufficient added power and grounds. For this design, assume that the groups are not simultaneously switching more than 32 signals, allowing a reduction in added ground. Allowing 8 IEVCC for the pipeline outputs (switching group AAA) and one for the rest of the circuit (switching group BBB), nine IEVCC macros are required. Adding these 9 IEVCC macros to the previous counts (153 + 9), the number of I/O cells used is 162. This is exactly the number of I/O cells available for circuit use on the Q20080 array. (This does not count the four fixed I/O signals for the AC Speed Monitor and the thermal diode that have pre-assigned PADs.) The added ground macro is shown in Figure A-9. Figure A-9 Added IEVCC Macro
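The I/O cell tally at this point can be reproduced with a short sketch. The macro counts and cells-per-macro values are taken from the text (the IE31H clock input uses two I/O cells); the dictionary layout and names are hypothetical.

```python
# (number of macros, I/O cells per macro), from the sizing discussion
io_macros = {
    "IE93S": (78, 1),  # single-ended ECL inputs
    "OE42S": (73, 1),  # ECL outputs
    "IE31H": (1, 2),   # differential clock input, two cells
    "IEVCC": (9, 1),   # added grounds, each blocks one cell
}

AVAILABLE_IO_CELLS = 162  # Q20080 cells available for circuit use, per the text


def io_cells_used(macros):
    """Sum the I/O cells consumed by a table of (count, cells) entries."""
    return sum(count * cells for count, cells in macros.values())


used = io_cells_used(io_macros)
print(used)                       # 162
print(used / AVAILABLE_IO_CELLS)  # 1.0
```

A utilization of exactly 1.0 leaves no margin, which is why the note that follows urges looking for another solution rather than trimming added grounds.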
Note: Using fewer than the recommended number of added grounds is not a good idea. It will require engineering approval before design submission and could cause other problems later. Think about another solution!
FAN-OUT LOADS

The final step toward an estimate of circuit size requires that fan-out loads be examined. Most macros in the Q20000 library will have a fan-in of one except for H-option macros that will have a higher fan-in (and larger cell size). This is not always the case but should be considered when examining macro options.

Select lines for 16:1 MUX

Select lines to each 16:1 MUX have at most four loads. No buffering is required for the IE93S macros that can drive 18 loads each.

Select lines to 2:1 MUX structure

The select to the 2:1 MUX structure has 32 loads and will need buffering. One macro can drive 18 loads; adding a gate buffer tree such as two GT09S macros allows one primary input to drive 32 loads. (See Figure A-10.)

Figure A-10 Buffer Tree for the 2:1 MUX (32 Loads)
The other option is to switch the IE93S for an IE23D driver that can drive 32 loads directly. The IE23D driver uses twice as much current as an IE93S macro but would save the internal cells that the GT09S macros would have used.

Reset loading

RESET requires the same decision process. In this case, the signal goes to 64 flip/flops. The AR pin for the FF46S is two loads and the AR pin for the FF10S is one load, for a total of 96 loads. Either six GT09S macros or three GT55D macros can provide the drive. The GT55D driver uses twice as much current as a GT09S macro and is twice as large. Since half as many are required, these two solutions are equivalent in cell usage and power. On the schematic, eight GT09S macros were used to simplify the schematic design (eight pages are replicated). (See Figure A-11.)

Figure A-11 Reset Signal Buffer Tree
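The buffer counts for the select and reset nets follow from dividing the total load by the fan-out limit of the buffer and rounding up. The sketch below is illustrative: the helper name is hypothetical, the 18-load limit for the GT09S is the Q20000 internal macro limit quoted earlier, and the 32-load limit for the GT55D is an assumption inferred from the statement that three of them drive 96 loads.

```python
def buffers_needed(total_loads, fanout_limit):
    """Minimum number of buffers so that no buffer exceeds its fan-out limit."""
    return -(-total_loads // fanout_limit)  # ceiling division


# 2:1 MUX select: 32 loads
print(buffers_needed(32, 18))           # 2 GT09S

# Reset: FF46S AR pin = 2 loads, FF10S AR pin = 1 load, 32 of each
reset_loads = 32 * 2 + 32 * 1
print(reset_loads)                      # 96
print(buffers_needed(reset_loads, 18))  # 6 GT09S
print(buffers_needed(reset_loads, 32))  # 3 GT55D (32-load limit assumed)
```

The driving input must also be checked: six GT09S macros at a fan-in of one each present six loads, well inside the 18 loads an IE93S can drive.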
RESET STRUCTURE - ONE OPTION

Reset structures are often treated as clock structures without the need for speed. This structure is only one level in depth. Current synthesis systems will create the necessary buffer trees to support the load being driven.

Clock

The clock is handled differently since all clock nets must be derated. There are 64 loads from the flip/flops, plus 1 load due to the parametric gate tree, for a total of 65 loads. The IE31H can drive 10 loads with a 40% derating. The GT55D driver, derated, drives 19 loads and presents a fan-in load of two to the driving macro. Four GT55D macros would provide the drive capability with full 40% derating down the path as shown in Figure A-12.

Figure A-12 Clock Tree
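The one-level clock tree can be checked the same way, using the derated limits quoted in the text (10 loads for the IE31H, 19 loads for the GT55D) and the GT55D fan-in of two. The function and argument names below are hypothetical.

```python
def clock_tree(total_loads, driver_limit, driver_fan_in, input_limit):
    """Size a one-level clock tree using derated fan-out limits.

    Returns (number of drivers, load presented to the clock input,
    whether the clock input stays within its derated limit).
    """
    drivers = -(-total_loads // driver_limit)  # ceiling division
    input_load = drivers * driver_fan_in
    return drivers, input_load, input_load <= input_limit


# 64 flip/flop loads + 1 gate-tree load; GT55D derated to 19, IE31H to 10
print(clock_tree(65, 19, 2, 10))  # (4, 8, True)
```

Four GT55D macros present eight loads to the IE31H, which is within its derated limit of 10, so no second buffer level is needed under these assumptions.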
CLOCK STRUCTURE - ONE OPTION

Derating guidelines are part of the array design rules. Macro load limits are listed in the macro documentation. Place & Route software today creates the clock tree structure based on the commands in a control script. The commands involve the suggested buffer or macro to be used and the clock tree depth. In the near future, Floorplanners will incorporate this function. Clock trees have priority during layout, depending on the design constraints supplied to the Place & Route tool. When the clock tree is to be constructed by the Place & Route software, all timing analysis prior to the routing is done using a modeled clock, approximating what the final clock tree behavior might be.

Static Driver

The static driver required to drive the always-on output enable inputs can handle 50 loads but 64 are required in this version of the design. Two GT87D macros can be used. One is shown in Figure A-13.

Figure A-13 Static Driver
Static driver is not a term that shows up in macro lists today. Rather, high-drive options on various macros are used. If no one macro can handle the load to be driven, then a buffer tree is constructed by the synthesis tool.
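The reset, clock and static-driver decisions above all reduce to the same arithmetic: total the loads on the net, derate the driver if the net is a clock, and divide. The sketch below repeats that arithmetic in Python. The undearated drive limits used for the GT09S (16 loads) and GT55D (32 loads) are assumptions inferred from the statement that six GT09S or three GT55D macros cover 96 loads; the actual limits come from the macro documentation. The function name is hypothetical.

```python
import math

def drivers_needed(total_loads, drive_limit, derating=0.0):
    """Return (usable loads per driver, number of drivers required)."""
    usable = math.floor(drive_limit * (1.0 - derating))
    return usable, math.ceil(total_loads / usable)

# ASSUMED drive limits, inferred from the text rather than a data sheet.
GT09S_DRIVE = 16
GT55D_DRIVE = 32

# Reset: 32 FF46S AR pins at 2 loads each plus 32 FF10S AR pins at 1 load each.
reset_loads = 32 * 2 + 32 * 1
_, reset_gt09s = drivers_needed(reset_loads, GT09S_DRIVE)
_, reset_gt55d = drivers_needed(reset_loads, GT55D_DRIVE)

# Clock: 64 flip/flop loads plus 1 load from the parametric gate tree,
# with the 40% derating that applies to clock nets.
clock_loads = 64 + 1
gt55d_derated, clock_gt55d = drivers_needed(clock_loads, GT55D_DRIVE, derating=0.40)

# Each GT55D presents a fan-in load of two to the IE31H,
# which can drive 10 loads when derated.
ie31h_load = clock_gt55d * 2
ie31h_ok = ie31h_load <= 10

# Static driver: 50 loads per driver, 64 always-on enables to drive.
_, static_drivers = drivers_needed(64, 50)

print("reset loads:", reset_loads, "-> GT09S:", reset_gt09s, "or GT55D:", reset_gt55d)
print("clock loads:", clock_loads, "-> GT55D (derated to", gt55d_derated, "loads):", clock_gt55d)
print("IE31H load:", ie31h_load, "within limit:", ie31h_ok)
print("static drivers:", static_drivers)
```

Under these assumptions the sketch reproduces the counts in the text: six GT09S or three GT55D for reset, four derated GT55D for the clock, and two static drivers.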
Parity tree
A parity tree of all inputs (required for parametric VIL, VIH testing) can be formed from NOR gates using the GT60L or GT60S, an 8-input NOR macro. The L-option is slower and uses less power. The speed of the gate tree is not important since testing is functional at 100ns intervals. The first estimate for the tree is to use eleven GT60S macros in a three-level structure to accommodate the 79 input signals. (The 78 data signals plus the clock are required.) The parity tree is shown in Figure A-8.
Figure A-8 Parity Tree
REVIEW OF SIZE - SECOND PASS
The revised estimate (one version of the solution) shows the circuit requirements as they are now understood.
Table A-3 Second Sizing Estimates
Number of Cells Required
#MACROS   MACRO   CELLS   TOTAL
79        IE93S   1       79
73        OE42S   1       73
1         IE31H   1       2
9         IEVCC   1       9
TOTAL I/O CELLS REQUIRED  162
10        MX21S   2       20
11        GT60S   3       33
10        GT09S   1       10
4         GT55D   2       8
2         GT87D   2       4
32        FF10S   3       96
32        FF46S   3       96
TOTAL L CELLS REQUIRED  267
Change OE42S to OE11S and delete the 2 GT87Ds. This fits into the Q20080 array that has 162 I/O cells and 2044 L cells. This is a severely I/O-bound design (of course!). A design is either core-limited or I/O-limited.
Note: When vectors are written for this array, they should be designed so that no more than 16-32 of the outputs switch at any one time. These are AMCC-specific vector design rules.
Table A-4 AMCCERC Population ERC
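The sizing totals in Table A-3 can be cross-checked with a short tally. The sketch below multiplies each macro count by its cells per macro and compares the sums against the Q20080 limits quoted above. It covers the table as listed, before the OE11S substitution and the GT87D deletion. Note that the TOTAL column shows 2 for the single IE31H; the sketch uses count times cells, which is what reproduces the stated 162 I/O cells. The dictionary layout and function name are illustrative only.

```python
# (count, cells per macro) as listed in Table A-3
io_macros = {
    "IE93S": (79, 1),
    "OE42S": (73, 1),
    "IE31H": (1, 1),
    "IEVCC": (9, 1),
}
logic_macros = {
    "MX21S": (10, 2),
    "GT60S": (11, 3),
    "GT09S": (10, 1),
    "GT55D": (4, 2),
    "GT87D": (2, 2),
    "FF10S": (32, 3),
    "FF46S": (32, 3),
}

# Q20080 capacity as quoted in the text
Q20080_IO_CELLS = 162
Q20080_L_CELLS = 2044

def total_cells(macros):
    """Sum count * cells-per-macro over a macro dictionary."""
    return sum(count * cells for count, cells in macros.values())

io_total = total_cells(io_macros)
l_total = total_cells(logic_macros)
fits = io_total <= Q20080_IO_CELLS and l_total <= Q20080_L_CELLS

io_utilization = io_total / Q20080_IO_CELLS
l_utilization = l_total / Q20080_L_CELLS

print(f"I/O cells: {io_total} of {Q20080_IO_CELLS} ({io_utilization:.0%})")
print(f"L cells:   {l_total} of {Q20080_L_CELLS} ({l_utilization:.0%})")
print("fits Q20080:", fits)
```

The two utilization figures make the I/O-bound nature of the design visible: every I/O cell is used while most of the internal cells are not.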
PACKAGE SIZE
The minimum number of signal pins that should be available on a package for this circuit is 157 (162 signals plus the 4 fixed signals minus the 9 added grounds). The worst-case number of signal pins that could be required on a package for this circuit is 166 (162 signals plus the 4 fixed signals). The truth is in the middle and is placement-dependent.
PROBLEMS
● The OE42S is limited to a toggle frequency of 350MHz. If the clock is running at 500MHz, the outputs could be toggling slower. If not, then the OE42S is not a correct choice if speed is to be maintained. Neither is the OE11S!
● Insufficient added grounds is not a minor problem.
● The circuit uses nearly 8 Watts - much too high.
ALTERNATIVE SOLUTION
The differential output OE14S could be used in place of two OE42S macros and the GT87D driver (at least one) could be deleted. This reduces the OE42S macros from 73 to 9, and the 7 always-on enables could be driven by a GT08L NOR gate instead of a static driver macro. The use of OE14S provides a cleaner solution (less skew) plus it frees internal cells. The maximum frequency of the OE14S is 1.2GHz. One output pad can be used as the true signal and the other as the complement.
Another advantage is the reduced requirement for added grounds. The 32 differential outputs count as 32 outputs and not as 64, reducing the requirement for this group to 8 added IEVCC, which is what was provided. The ninth IEVCC applies to the miscellaneous other outputs. There will be a warning issued by AMCCERC that there might not be sufficient added grounds for these miscellaneous outputs - the algorithm defined by AMCC requires that two IEVCC macros be added.
Table A-5 OE42S Solution
MACRO   COUNT
IE93S   78
OE42S   73
IE31H   1
IEVCC   9
MX21S   10
GT87D   2
GT60S   11
GT09S   8
GT55D   4
FF10S   32
FF46S   32
Table A-6 OE14S Solution
MACRO   COUNT
IE93S   78
OE42S   9
OE14S   32
IE31H   1
IEVCC   9
MX21S   10
GT87D   1
GT09S   8
GT55D   4
FF10S   32
FF46S   32
POWER
The maximum worst-case MILITARY DC power dissipation for the OE42S version of the circuit was estimated to be over 8 Watts. The DC power computation for the OE14S version, same conditions, is estimated to be 5.88 Watts. (This number is based on the circuit as shown in the schematics and the February 1991 library specifications.) Reducing the GT08S macros to GT08L macros can further reduce power.
FURTHER THOUGHT
For cell usage, timing, power, and added ground requirements, the basic OE14S solution is the best proposed so far.
Table A-7 OE14S Solution
Table A-8 OE14S Solution
This version used GT87D instead of a GT08L. It uses GT60S macros in the gate tree instead of GT60L macros. Do the MUX and reset buffer trees need S-macros or could L-option macros be used? (Watch it - the options have different maximum frequency of operation numbers! This is often overlooked in choosing options.)
The DC power computed by the AMCCERC program is summarized below. Remember - AC power dissipation must be added to this. The AC power computations required depend on the array series.
Table A-8b Macro Occurrence Report Continued
Exercise
Add a design objective to reduce power to 5 Watts or as close to it as possible and modify this circuit using the latest library information. The frequency of operation requirement remains.
This same exercise was used in the AMCC training classes through several library releases. This problem, or one close to it, was actually used for over eleven years with several technology libraries, bipolar, Bisquared MOS and CMOS. It demonstrates nearly 85% of the array design rules.
Today's designers would create this circuit in Verilog or VHDL and a control script for the synthesis tool. Constraints can drive area reduction, speed improvements or power reduction. The script can also set the priority for the different design constraints.
THE SCHEMATICS Page 1 - Chip Macro and added Ground (IEVCC for ECL VCC); AAA is switch group tag; GT87D a static driver
Page 2 - Clock tree; RESET tree; 2:1 MUX select tree. Buffer trees go to various pages. Note the inputs to the parametric gate tree. "40"s are FOD values. (Figure A-10, Figure A-11, Figure A-12, Figure A-13)
Page 3 - 2:1 MUX selects and enable controls; 6-bit input-output path. OE42S macros should be replaced and VLO signal deleted.
Page 4 - Using MX21S 4:1 MUX macros to build a 16:1 MUX. OE42 should be changed.
Page 5 - pipelined register: 2:1 MUX-D F/F FF46S feeds FF10S which drives OE14S. OE14S connection could be improved to remove need for VLO signal.
Page 6 - Same as page 5 except for names. Note output to parametric gate tree. AAA is the switch group tag (matches IEVCC on page 1).
Page 7 - Next four bits.
Page 8 - Next four bits.
Page 9 - Next four bits. Note how the page number has been incorporated into the macro instance names - FF0905, FF0906, etc. - to prevent duplicate names.
Page 10 - Next four bits.
Page 11 - Next four bits.
Page 12 - Last four bits for 32-bit registers.
Page 13 - The parametric gate tree - all inputs fed into a combinatorial gate tree and tied to one output. PGATE is the GTO parameter value. OE42S should be changed. Note how page references make it possible to trace the connections. (Figure A-8)
Page 14 - The second 16:1 MUX - this page should have been grouped with the other MUX page for better schematic set readability. Group functions together. (Figure A-7)
Design Optimization Last Edit July 22, 2001
The initial version of any design is almost guaranteed not to be the best solution. It is always possible to improve on an existing design, hardware or software, just as it is always possible to edit a manuscript. The trick is in knowing when to start and stop the process, also known as the endless loop.
Introduction
Design optimization should be performed once an initial version of the design has been drafted at the block-diagram level. The design should be reviewed for optimization under the constraints of the established design objectives. It should also be reviewed for optimization using the particular characteristics of the technology and array series selected.
A second design optimization review should be performed once the macro conversion has been accomplished. The first step in this process is another review of the chosen macro library. Familiarity with the macros available will be invaluable in contributing to an optimized final design. The process is shown in Figure 4-1.
Overview of Design Process - Objectives and Optimization
It has been shown that familiarity with the macro library is even more important than previous design experience, something many designers argue with until faced with a case in point. After reviewing the steps required to solve the simple case study example, it should be obvious that the selection of macros to solve a circuit implementation is much more complex than simply selecting the macros that appear to solve the equation. Timing, cell utilization and power dissipation are integrated elements that must be considered in parallel during the design process.
Design automation tools are moving in the direction of design synthesis and design review-for-criteria. An example system is the NCR ViSys Design Advisor, available on the Mentor Graphics and VCR-supported CAE workstations. A future expansion to that system is the NCR design synthesis tool. Design synthesis tools will become more prevalent over the next few years. They should be considered as a tool to assist the designer, not to replace the designer. [This was written in 1991-4 so there are presumably more systems available. Check the current literature for other references.]
Optimization Approaches
There are several approaches to optimizing a design as shown in Table 4-1.
Table 4-1 Design Optimization Objectives
General Design Optimization Objectives
● improve speed and minimize distortion
● balance speed (tracking)
● reduce internal cell utilization
● reduce I/O pin count
● reduce power (lower the junction temperature)
● increase circuit testability
● increase circuit reliability
● reduce cost
These "design objectives" are often incompatible. Each design will have its own priority order for these items, establishing the basis for decisions where choices must be made. For example, design requirements may include a power dissipation limit or a junction temperature restriction. Solving the power equation may violate the speed requirements. The maximum specified operating frequency and critical path performance are usually clearly defined. Balanced path design is essential in communications circuits and has its own restrictions and tracking requirements. Macros selected to allow speed to be achieved may increase power while macros chosen to allow balanced delays may increase cell utilization. Cell utilization can determine which array in a series is acceptable, and the larger arrays do cost more. Cell reduction techniques may affect final speed, power and cost.
The reliability of the circuit is a question of the "trickiness" of the timing and the logic design. The so-called "hot dog" designers are as welcome in a company as their software counterparts - their designs are difficult to build, test or maintain. Modular designs are an important reliability and testability issue. Modularity may require additional macros for degating while testability may require additional macros for test point monitoring or circuits such as scan-path. A circuit must be testable to at least the 90% confidence level, preferably higher, and testability issues should have been in place before the design start. Refinements to the circuit testability are what should be required at this point in the design cycle.
Circuits with design for test (DFT) modules will average 10-20% more cells than circuits that do not use DFT. DFT circuits are easier to develop test vectors for than non-DFT circuits and they require significantly fewer vectors. (See Chapter 8.) The size of the test vector set also has an effect on cost. A large set of vectors is more costly to develop, more time consuming to fault-grade and takes more tester time to test the die.
DESIGN FOR SPEED
There are several basic design approaches that can help the designer achieve the desired speed from the circuit. A list is provided in Table 4-2.
Table 4-2 Designing For A High-Speed Circuit
Design Procedures for a High-Speed Circuit
● Minimize the circuit logic.
● Evaluate several implementations of speed-critical paths.
● Reduce interconnections, gate count and chip area.
● Use dense, silicon-efficient macros where possible.
● Use the correct macro-option or version.
● Place signals efficiently in multilevel gating structures.
● Place late-arriving signals as last-level inputs.
● Place critical signals on the fastest paths when the macro has more than one propagation delay.
● Where there are complementary outputs on a macro, alternate the signal to balance loading delays, watching loads, drive-factors.
● Where there are complementary outputs, use the fastest path to propagate a critical signal.
● Duplicate logic to reduce the fan-out and metal load delay in the critical path (balance this against the added load delay to duplicate the signal and added silicon).
● Use the correct-drive macro for the fan-out at each net (do not overload).
● Do not use high fan-out drivers for low loads.
● Use fan-out derating on all macros except identified super-drivers.
● Avoid wire-ORs when available as they add to metal loading.
● Use binary counters instead of shift counters (parallel versus serial).
● Use parallel load counters and parallel data transfer.
● Use carry-look ahead, carry generate-propagate and other fast-adder techniques.
● Choose the design for high-input logic gates carefully. Investigate design options and their speeds.
● Perform a pulse-distortion review and minimize distortion in critical paths.
● Minimize output pin capacitive loading, both system load and package pin capacitance.
● Last resort: specify placement restrictions for difficult paths (less than 20% of the paths in the array).
● Last resort: ask the vendor for a custom macro to enhance speed.
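One of the procedures above, placing late-arriving signals as last-level inputs, can be illustrated with a simple arrival-time calculation. The sketch below is a simplified model, not a timing analyzer: it assumes each gate output settles one fixed gate delay after its latest input and ignores net delay and fan-out. The arrival times and the 500 ps gate delay are hypothetical values chosen for illustration.

```python
def gate_output_time(input_arrivals_ps, gate_delay_ps):
    """Output settles one gate delay after the latest input arrives."""
    return max(input_arrivals_ps) + gate_delay_ps

GATE_DELAY_PS = 500                            # hypothetical intrinsic delay
arrival_ps = {"A": 0, "B": 0, "LATE": 800}     # hypothetical input arrival times

# Option 1: the late signal enters at the first level of a two-level structure,
# so it passes through both gates.
level1 = gate_output_time([arrival_ps["A"], arrival_ps["LATE"]], GATE_DELAY_PS)
late_first_level = gate_output_time([level1, arrival_ps["B"]], GATE_DELAY_PS)

# Option 2: the early signals are combined first and the late signal
# enters at the last level, so it passes through one gate only.
level1 = gate_output_time([arrival_ps["A"], arrival_ps["B"]], GATE_DELAY_PS)
late_last_level = gate_output_time([level1, arrival_ps["LATE"]], GATE_DELAY_PS)

print("late signal at first level:", late_first_level, "ps")
print("late signal at last level: ", late_last_level, "ps")
```

In this model the second arrangement saves one full gate delay on the path that sets the output time, which is the reason for the guideline.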
Design to Improve Speed
If a macro library provides speed-power options for the macros available, an initial design is done selecting the standard macro options (the middle choice) and is also done from the perspective of logic minimization. Once the array technology has been selected, circuit speed is affected by the major factors shown in Table 4-3.
Table 4-3 Designing To Improve Speed
Optimize for Speed
● Review the macro chosen
● Review the macro option
● Review macro functionality
● Perform logic minimization
● Reduce the fan-out loading
● Reduce wire-OR loading (if allowed)
In the case study, the change of the output macro to allow the circuit to operate at specification did not affect cell count and reduced the power by three Watts. The solution of timing problems is not often so simple or beneficial. When faced with a circuit that does not meet timing specifications, changing macros can increase the cell count, leave it unchanged or decrease it, and can likewise raise the power dissipation, leave it unchanged or reduce it.
Macro options
When an array provides macro options, those options should be reviewed for applicability to the design problem. If a macro comes in low-power, high-speed and standard options, they will each have a different toggle frequency or maximum frequency of operation. Each option may have a different fan-out load or drive capability. Each option will have a different power dissipation. In some cases, the different options may have different cell counts.
The selection of a high-speed macro option may carry power dissipation penalties that may in turn lead to other macros needing to be downgraded to low-power options. This may be necessitated by an internal current limit for the array or by the early estimates of the junction temperature. As long as the toggle frequency of the low-power macros is not violated and the fan-out load limits are not exceeded, the use of low-power options is acceptable.
When a library has driver macros with balanced load delay drive factors (minimal skew) and faster intrinsic (internal) delays, the use of the driver should be justified. Drivers typically carry a power dissipation penalty. They should be used in clock distribution lines and for heavily loaded paths that have tight timing specifications.
Macro functionality
Another reason to review the macro library before finalizing a design is that speed is usually a function of density. In general, a high-functionality multiple-cell macro will perform better than a circuit module formed from equivalent macros. Intra-macro nets (connecting components) are shorter than inter-macro interconnects, so their delays are smaller. A hard macro, where the routing is always in the same pattern, can guarantee its worst-case speed. A soft macro has placement requirements and priority routing that must be used if it is to meet its specifications.
High-functionality macros also include those that combine functions, such as the 3:1 MUX-D flip/flop macros (supporting testing), dual flip/flops, triple latches or triple multiplexors, internal-I/O dual function macros that ensure the maximum utilization of the complex I/O cells, or combined two-cell bidirectional-added ground macros that keep the second pad from being wasted. High-functionality macros are those that prevent or minimize wasted (unused) silicon, pads or cells.
Other design techniques to increase the circuit density include replacing gate structures with multiplexors where the speed and gate count would be reduced. (See Digital Design with Standard MSI and LSI, 2nd ed., by T.R. Blakeslee, 1979, Wiley, New York.)
Example of silicon efficiency
As an example of high-functionality silicon-efficient macros, the AMCC REG00 universal register is the equivalent of four 4:1 MUXs and four D F/F (8 macros) and is available in several libraries. In the Q5000 library, it uses 4.5 internal cells whereas the individual 4:1 MUX and D Flip/Flop macros would use 8 internal cells. The number of inter-macro interconnects is 15 and their delays are kept constant for each occurrence of the MSI macro due to the required placement pattern. REG00 is a soft macro in the Q5000 library and a hard macro in another.
Table 4-4 Basic Macro-Selection Guidelines *
Path Type       Use these macro types:             Penalty:
Clock paths     High fan-out drivers               more power, more cells
                High-speed, minimum skew macros    more power
Fast paths      High-speed, minimum skew macros    more power
Loaded paths    High fan-out drivers               more power, more cells
                Parallel structures                more power, more cells
Slow paths      Low-power, slower macros           (less power) (less cells)
Average paths   Standard macros                    --------
* For libraries with high-speed, low-power, standard and driver macros. Comparable tables can be generated for libraries with other combinations of variations, versions and options.
Alternative implementations
One way to ensure that the best version of a critical path has been created with any given library is to create more than one solution. Alternative implementations should be reviewed within the confines of a specific set of prioritized design objectives. They should always be evaluated for any critical path.
Paths (partial circuits) can be captured and simulated for detailed comparison of timing, including timing check analysis using the current workstations. Some jury-rig of connectors or dummy loading may be required depending on the peculiarities of the CAD/CAE workstation chosen and the part of the circuit being captured. This is a minor inconvenience in exchange for which the designer can easily perform timing analysis, such as checking on pulse width distortion, set-up and hold violations and path delays.
Examples
As an example of the value of checking various implementations, a test circuit was given to students, applications and macro design engineers to implement using the Q5000 library. The design objective was stated as speed at all costs. The students (unfamiliar with the macro library) produced circuits running at 125-145MHz. Applications engineers produced a version running at 183MHz. The array-macro designers produced a version running at 235MHz. The variable was macro library familiarity; no custom macros were allowed.
The case study in the previous chapter was another example. It showed two different implementations of that circuit, one with an output toggle rate of 350MHz and one with 600MHz - the only difference was the output macro.
Internal Net Delays
The delay in a heavily loaded net tends to be longer than the macro intrinsic delay, the delay through the macro that drives the net. The net delay is a result of the electrical effects of fan-out load, wire-OR load and the capacitance of the metal interconnect length. Wire-ORs, if allowed in the library, will add to the delay in the net with both an electrical load and with additional metal.
Front-Annotation delays (pre-place and route) due to metal length for a large array are on the average larger than for a small array. This is reasonable since the side to side distances are larger for the larger array. The break-up of heavily loaded paths into identical parallel paths can result in significant propagation delay improvement, regardless of array size.
Table 4-5 Components Of Internal Net Delays
Internal Net Delays
● electrical fan-out load
● electrical wire-OR load
● capacitive load of the metal etch
Fan-out loading is the same regardless of the array size. A macro driving 6 loads on a large array would see the same load if the circuit were placed on a smaller array.
Wire-ORs when allowed
Some array libraries allowed dot-connects such as wire-ORs or wire-ANDs. These may save gate delays but add a wire or metal length penalty. A wire-OR driven by four macros and outputting to six other macros has the metal equivalent of a ten-output net, using lumped Front-Annotation computations. This is considered to be a heavily loaded net. The added net delay will probably exceed the "saved" gate delay.
The use of dot-connects should be carefully evaluated. Verify that they are allowed on the schematics before evaluating their usefulness. They may be allowed on some arrays and not on others from the same vendor. For example, the AMCC Q5000 has wire-ORs but the Q20000 bipolar and Q24000 BiCMOS arrays do not allow their use.
Figure 4-2 Optimization - Speed
Optimization - Speed Considerations
Design To Reduce Internal Cell Utilization
Reduction of the internal cell utilization or equivalent gate count is also called logical circuit minimization. Factoring of common terms from the logical equation and the removal of redundant logic help reduce cell counts by reducing the logic that must be implemented. Minimization is critical when a high fault-grade score is desired since redundant logic will lower the potential test score (fault masking).
Use of the higher functionality MSI macros and the design approach discussed earlier, of selecting higher functionality macros first and working back towards the SSI macros in the library, will contribute to a cell-efficient design. The design approaches for internal cell minimization are shown in Table 4-6.
Table 4-6 Minimizing Internal Cell Utilization
Reducing Internal Cell Utilization
● Logic minimization.
● High-functionality internal macros.
● Use shift counters instead of parallel counters.
● Use ripple counters if the propagation delay meets specification delays.
● Use ripple-carry adders (between MSI blocks).
● Use single polarity between macros.
● Use serial data transfer.
● Use a scan-test F/F or latch to replace a MUX-F/F or MUX-LATCH combination.
● Avoid extraneous inverters (those added just to invert signals). Many macros are available in complementary form or use DeMorgan's theorem.
● When converting from TTL or ECL, do not implement unused functions. Keep the macro design application-specific. Avoid:
  ● unused preset, clear
  ● multiple enables
  ● excess carry logic
  ● excess load logic
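The guideline on extraneous inverters relies on DeMorgan's theorem: an OR followed by an inverter is the same function as an AND of the complemented inputs, and an AND followed by an inverter is the same function as an OR of the complemented inputs. When complementary signals or complementary-form macros are already available, the added inverter can be dropped. The sketch below simply checks both identities over every input combination; the function names are illustrative and do not correspond to library macros.

```python
from itertools import product

def or_then_invert(a, b):
    """OR gate followed by an added inverter."""
    return not (a or b)

def and_of_complements(a, b):
    """AND gate fed with the complemented inputs, no added inverter."""
    return (not a) and (not b)

def and_then_invert(a, b):
    """AND gate followed by an added inverter."""
    return not (a and b)

def or_of_complements(a, b):
    """OR gate fed with the complemented inputs, no added inverter."""
    return (not a) or (not b)

input_pairs = list(product([False, True], repeat=2))

nor_equivalent = all(
    or_then_invert(a, b) == and_of_complements(a, b) for a, b in input_pairs
)
nand_equivalent = all(
    and_then_invert(a, b) == or_of_complements(a, b) for a, b in input_pairs
)

print("NOT(a OR b)  == (NOT a) AND (NOT b):", nor_equivalent)
print("NOT(a AND b) == (NOT a) OR (NOT b): ", nand_equivalent)
```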
Internal cell minimization is not fully compatible with the approaches used to improve speed. While logic minimization does help speed, as does the use of high-functionality macros, serial operations are slower than parallel operations. The designer must be guided by the priority assigned to the conflicting design objectives.
Example
An experienced ECL designer chose the Q3500 array and converted a standard-part design into macros from the chosen library. He was careful to duplicate the parts exactly. When he was finished, he had 124% cell utilization. At the time, there was no larger array. The solution came when the logic was minimized and the unused functionality of the individual standard parts was deleted from the design. By changing the design from a direct conversion to an application-specific implementation, the cell utilization was reduced to 98% and the circuit was built. {True story.}
Figure 4-3 Optimization - Cell Utilization (Sizing)
Optimization Issues - Cell Utilization
Design To Reduce I/O Utilization
There are several techniques used to reduce the I/O cell utilization, shown in Table 4-7.
Table 4-7 Reducing I/O Cell And Pad Utilization
Reducing I/O Utilization
● Use the high-functionality interface macros.
● Use bidirectional macros if possible.
● Partition a system by bits (bit-slice) rather than function. This also allows the development of circuit sub-modules which reduce schematic capture and test efforts.
● When speed requirements will permit it, use serial data transfer rather than parallel data transfer.
● Multiplex test points to reduce test pinouts.
● Multiplex non-critical outputs.
● Transfer only one polarity of a signal on and off chip (single rail transfer) rather than differential if other factors permit it - use a VBB source and single-rail ECL.
● Decode input signals on-chip.
● Use local (to the array) counters; duplicate the counter on several arrays and synchronize.
● Use bus architecture, where one or more I/O signals serve several signal sources.
● Use external serial-input registers.
The difficulty with array-based circuit design is that, should a circuit require just one more I/O connection, there is no way to obtain it save by the selection of a larger array. There are no jumpers, piggy-backed components and other quick-and-dirty board design tricks that can apply. The I/O signal count must fit the target array, and be equal to or less than the available array signal pads.
Design To Fit The Package
Also at issue at this point in the design stage is the desired package. When the package is selected, the package methodology for handling added power and grounds can be determined. Packages can be one-on-one, where each added power and ground pad reaches an external package pin, or they may have internal power and ground planes. Not all array pads will reach an internal power or ground plane; therefore there are placement restrictions on the locations of the added power and ground macros. If these restrictions cannot be met due to other placement requirements or if the package does not have enough pads that can bond to internal planes, then the added power or ground macros will require external package pins. (For example, some packages offer ground planes and no power planes.)
There are two approaches to checking the package against the design.
Case 1 - Count of all array pads less than or equal to the number of total package pins
The first is when there are no internal power or ground planes. In this case, count the number of inputs, outputs, bidirectionals, added power, added ground, fixed power, fixed ground, and any fixed I/O signals (such as on-chip thermal diodes and AC speed monitors). This number should be less than or equal to the total number of package pins.
Case 2 - Count of all signals less than or equal to the number of package signal pins
The second case involves making an estimate which can be refined after placement is completed and approved. In this case, count the number of array pads used by inputs, outputs, bidirectionals, added power, added ground and any fixed I/O signals such as on-chip thermal diodes or AC speed monitors. This number should be less than or equal to the total number of package signal pins. The package power and ground pins connect to the internal power and ground planes. After placement, the number of signals will be reduced by the number of added power and ground macros that were placed to connect to internal package power and ground planes. Those macros will not use the external package signal pins.
Example
A designer submitted a design with a desired package (a 149 PGA with internal power and ground planes). The package has 120 signal pins. He used 132 array pads for I/O signals and added power and grounds. There were no fixed thermal diode or AC speed monitor signals on the array. There were eight added power and grounds. The design, after careful placement of the added power and grounds, had four signals more than there were package signal pins.
This problem was not discovered until placement, i.e., until after all simulations and design validations were performed. One solution is to look for expendable I/O cell usage. If there are extra grounds beyond the minimum, or more VBB or other voltage sources than are really required, they can be reduced. Another solution is to add an 8:1 MUX, place eight non-critical outputs as inputs to the MUX, add three input signals to control the select lines and one output for the MUX. This reduces the total number of signal pins required to 120, which would fit the package.
What if neither of these solutions is acceptable? Then some other design change is in order if the package cannot be changed. A design change requires that all simulations and all checking be repeated.
Could this situation have been prevented? By checking the package limits during the optimization phase and using the package limits as a guide, the design changes or package changes could have been identified earlier, saving the iteration of the simulation loop. Remember that simulation is estimated to use approximately 50% of the CPU time used in a design process.
Figure 4-4 Optimization - Packaging Issues
Optimization Issues - Packaging
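The Case 2 check and the 8:1 MUX remedy from the packaging example can be written out as arithmetic. The sketch below uses only the numbers given in that example (132 array pads, 120 package signal pins, eight added power and grounds). It assumes that all eight added power and ground macros were placed on internal planes, which is what the stated overshoot of four pins implies. The function names are hypothetical.

```python
def signal_pins_required(pads_used, plane_connected=0):
    """Array pads that still need an external package signal pin."""
    return pads_used - plane_connected

def mux_pin_saving(n_outputs):
    """Pins saved by routing n_outputs through a single n:1 MUX.

    The MUX needs one output pin plus enough select inputs
    to address n_outputs sources.
    """
    select_inputs = (n_outputs - 1).bit_length()
    return n_outputs - (1 + select_inputs)

PACKAGE_SIGNAL_PINS = 120
PADS_USED = 132            # I/O signals plus added power and grounds
ADDED_POWER_GROUND = 8

# Case 2 estimate before placement: every used pad is counted.
pre_placement_fits = signal_pins_required(PADS_USED) <= PACKAGE_SIGNAL_PINS

# After placement. ASSUMPTION: all eight added power/ground macros
# reached internal planes and so need no signal pin.
after_placement = signal_pins_required(PADS_USED, ADDED_POWER_GROUND)
pins_over = after_placement - PACKAGE_SIGNAL_PINS

# Remedy: eight non-critical outputs share one 8:1 MUX.
after_mux = after_placement - mux_pin_saving(8)

print("pre-placement estimate fits:", pre_placement_fits)
print("signal pins after placement:", after_placement, "over by", pins_over)
print("signal pins after 8:1 MUX:  ", after_mux)
```

The pre-placement estimate already fails the Case 2 test, which is the point made in the text: checking the package limits during optimization would have flagged the problem before the simulation loop was run.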
Design To Reduce Power
Power dissipation is controlled by the number and type of macros chosen. Some of the variations are summarized in Table 4-8.
Table 4-8 Power And Macro Choices
Macro Options That Affect Power Consumption
● Macro Option, if available
● Macro Drive capability
● Functionality
● Outputs used or terminated
● I/O mode
When the array library has options, the high-speed macros will draw more current than the standard option macros. The low-power macros are slower and draw less current than the standard option macros. Driver macros use more current than non-drivers, while super-drivers may use 2-3 times the current used by a driver that can handle fewer loads. High-functionality macros are using more of the components in a macro cell so the cells these macros occupy use more current. MSI macros are usually pre-placed to avoid hot-spots and to maintain their timing specifications.
Some arrays (such as the AMCC Q5000 Series) offer a power-down feature if a macro output is unused. The newer [1994] Q20000 Series does not have this feature.
For DC power dissipation, overhead current is also a factor. Overhead current is that current that is used when an unpopulated array is plugged into the power supplies. It supplies the internal voltage regulators and reference generators. It is a function of the I/O mode and the power supply configuration.
AC power dissipation computations depend heavily on the switching frequency of the various macros. Depending on the vendor, the number of outputs on the macro is a factor in the AC power equation. If one output is required and the other terminated, the terminated output contributes to the AC power equation. When several variations of a macro exist, and there is a choice, use the macro with a single output if that is what is needed.
The DC power computation can be performed by software supplied by the array vendor. AC power computations are still primarily a manual estimate. (Hardware emulators are in the early stages of AC computation support. Refer to the chapter on power computation.)
Figure 4-5 Optimization - Power Considerations
Optimization Issues - Power
Design To Reduce Cost
The cost of a design is a function of all factors involved in that design, from the initial design decisions on who will do the macro conversion to the special testing requirements. Anything that is not within the standard, routine design flow will usually cost more. Design iterations cost more. A redesign averages the same amount of CPU time spent in simulation. Items that increase costs are listed in Table 4-9. Some guidelines to keep costs down are shown in Table 4-10.
Table 4-9 Items That Increase Cost *
Items That Increase Circuit Costs
● Training classes - usually credited against NRE
● Amount of design support from vendor
  ❍ macro conversion
  ❍ performing validation
  ❍ performing simulations
● Array series chosen
● Size of the array
● Custom macros
● Use of the vendor's design center longer than a nominal time (4 weeks)
● Functional vectors that exceed 4K
  ❍ charged on a per page (4K) basis (based on the SENTRY tester) or whatever limit is specified
● Functional vectors that contain races and hazards (require rework)
● Fault grading - more than 2 passes
● Net matching required
● Pre-defined pin-out (causes iterations)
● Design iteration
  ❍ after first place and route
  ❍ when it was not the vendor's error
● Non-standard bonding
● Non-standard package
● Optional heatsink
● Custom DUT board
● Custom test software
● Bench tests
  ❍ charged per path
● MIL screening
  ❍ tri-temp testing
  ❍ burn-in
  ❍ qual
● Optional Commercial circuit burn-in
● Optional Commercial circuit - qual
● Optional PIND testing (post-package)
● Expedite on schedule
* AMCC measures used for example
Table 4-10 Keeping The Costs Down
How to Keep Circuit Costs Down
● Allow adequate time in the design schedule and keep to the schedule
● Allow time for several design reviews at various steps in the design flow
● Allow time for the vendor's steps and to review those steps
● Follow the previously outlined steps for the design flow
● Choose an array that is adequate for the design - within population and cell utilization bounds
● Keep the junction temperature within bounds
  ❍ so that a heatsink is not required
  ❍ to minimize placement problems
● Use added power and grounds as required for SSOs; for high frequency, use additional grounds
● Keep the design modular to keep functional vector set size down
● Plan for standard packaging, standard bonding
● Do not commit the PCB layout until the array place and route is approved
● Plan the design (through a design review) before showing up at the design center
● Design for testability
● Work with the vendor to avoid custom macros
● Plan for standard testing, standard DUT board
● Plan to avoid costly redesigns
Design Reviews A design review should be held at the initial optimization at the block diagram-functional description stage of the design. Another design review should be held on the completion of the design optimization at the macro level, before any lengthy simulations are performed. This procedure will help to reduce iterations of the simulation design step.
Summary Design optimization is not something that is just applied to a design after it has been created. Optimization should be considered as a design is being planned, as the block diagram is being sketched, and as the macro conversion is being performed. The design criteria, and the priorities of those criteria should be established at the start of the design cycle and referred to at every step in that cycle. The design criteria are not compatible items. In satisfying one objective, another may be compromised. The engineering task is to satisfy as many as feasible for the given situation. It requires tradeoffs and compromise.
Exercises
1. Select an array series and macro library to evaluate.
2. Choose a design project of your own or try a 16-bit adder with latched input and registered output (the adder portion of the Am2901 without memory).
3. Establish a set of prioritized design objectives including the target array and the target speed (such as add-with-carry). Block out a solution for the design.
4. Change the priority of the design objectives. For example, instead of speed being the most important, make cell reduction and power reduction the most important objectives. Block out a second solution for the design.
5. What effect did this change of perspective have on your approach to the design?
Basic Design For Circuit Testability
There are several formal methods for design-for-test. These include scan path and level sensitive scan design. In addition to these, and a part of the design-for-test requirements, are the following suggestions for improved circuit testability.
● Become familiar with the macro library BEFORE beginning the macro conversion or design.
● Use synchronous rather than asynchronous circuits whenever possible - functional tests are synchronous.
● Partition the design (use structured design techniques) into smaller, testable sections, usually along a functional boundary. In partitioning:
  ❍ Use degating logic to isolate modules for test.
  ❍ Use modular architecture, bus structures.
  ❍ Break up long counters (>8). Don't bury states.
● Use transparent latches instead of flip/flops where possible.
● Use macros, especially flip/flops and latches, with RESET or SET controls where possible to simplify initialization.
● Avoid feedback loops. If unavoidable, provide a means to break up feedback loops during test (degating, enables).
● Avoid redundant logic - minimize! - or add test points to unmask masked faults.
● Avoid derived clocks - they complicate testing.
● Design in test points, especially in sequential logic.
● Add test points to improve controllability and observability.
● Perform testability analysis.
● If I/O pins are limited, use demultiplexors to control and multiplexors to observe internal nodes with otherwise poor observability (buried states).
● Any 3-state enable control signal that is internally generated must be externally observable, and should be externally controllable during test.
● Add parity trees for error detection.
● Use Scan Path Design to simplify test sequence generation.
● Use Level Sensitive Scan Design to simplify test sequence generation.
● Use some variation of the Scan Path or LSSD DFT procedures.
● Keep test generation in mind while designing the circuit.
Figure 4-6 Optimization - Circuit Testability
Optimization Issues - Testability
Basic Design For Circuit Reliability
Some specific design suggestions for improved circuit reliability are:
● Become familiar with the macro library BEFORE beginning the macro conversion or design.
● Be aware of "glitch" circuits. Do not use potential glitch circuits to drive clock inputs.
● Avoid one-shot pulse generators.
● Avoid gated and derived clocks.
● Avoid race and hazard conditions. (Print-on-change files can help identify these.) These are generated by having a signal follow two or more paths to a common circuit element (a.k.a. reconvergent fan-out).
● Avoid feedback loops. If unavoidable, provide a means to break up feedback loops during test (using degating, enables).
● Avoid feedback paths between registers. If present, compute the worst-case set-up and hold times and verify operation. (Feedback from the ECL output macros must be handled with care if used to input to internal latches and flip/flops.)
● Add sufficient GROUND for the number of simultaneously switching outputs and distribute among these outputs (similar to distributed ground in a ribbon cable). Add extra ground if there are extra I/O pins available.
● Add extra VCC as needed for the number of simultaneously switching outputs.
● Properly derate fan-out on all distortion-sensitive paths and all clock paths.
● Keep clock path loading balanced.
● Avoid floating nodes on internal 3-state busses or external bidirectional busses.
● Use Johnson (a.k.a. Mobius, Ring or Twisted-tail) counters or separate flip/flops to decode terminal counts. The loading on the Q outputs is identical, eliminating the loading skew (not the metal skew), and the outputs are a Gray code - only one output changes state per clock cycle. (Binary counter decoding can cause glitches.)
● Compensate for rising and falling edge loading skews and the reversed TTL input translator rising and falling edge skews by inversion as needed to reduce pulse stretch and pulse shrink phenomena.
● Reduce heavy loading on high-speed paths by creating parallel paths with identical macros.
● Use ECL differential inputs and outputs when the frequency exceeds 200-300 MHz. The actual frequency boundary will be series-specific.
Figure 4-7 Optimization - Circuit Reliability
Optimization Issues - Reliability
Timing Analysis for Arrays
Introduction
There are two types of timing analysis required for array verification:
● path propagation delay and
● set-up and hold time analysis.
Path propagation delay is covered in this chapter. External set-up and hold time is covered in Chapter 6. One circuit design objective is speed and, before schematic capture, a preliminary timing analysis of the critical paths of a circuit is performed. It may be done by simulation or even by hand, if necessary, to assure that the circuit as implemented on the chosen array will be successful. This initial analysis may dictate the design optimization techniques required to ensure that the circuit will meet the specified performance requirements. To reduce manual effort, as soon as a macro library is available for evaluation, the critical performance paths of the circuit should be captured and a detailed, annotated simulation performed. (Manual computations may still be required for external set-up and hold time analysis.)
A detailed timing analysis of the complete circuit and its critical paths is required before a circuit can be submitted for place and route. Timing analysis of a circuit includes:
● worst-case path propagation delays for both rising and falling edge inputs for all critical or suspected critical paths;
● external and internal set-up, hold and recovery times;
● pulse skew and tracking due to placement (process variation);
● pulse distortion due to wire-OR, fan-out and metal loading (pulse stretch, pulse shrink) for all input and internal macros;
● pulse distortion due to output macro loading (package pin capacitance and system capacitive loading).
The macros selected, the options of those macros, the loading on the macros, and the final layout of the circuit are all factors in the propagation delay of any path. The loading may be the interconnect capacitance or the external load capacitance due to system loading and package pin capacitance.
Path Propagation Delay Overview
Computation of the propagation delay for a circuit path includes an evaluation of the following:
● input, logic and output macro intrinsic propagation delays
● extrinsic loading devices
● an adjustment for environmental effects and processing variations (worst-case timing multiplier) for those arrays which specify typical intrinsic delays
Intrinsic Delays
Array vendors use a variety of design manual documentation formats to specify intrinsic delays. Intrinsic delay (Tpd) is the time required for a signal to propagate from a macro input pin to a macro output pin. The delay may be different for each input to output path through the macro. The delay may be specified as dependent on the input and output edges such as rising to rising (++) or rising to falling edges (+-, inversion). The delay may be a function of other input states or simultaneous input switching. This information is usually detailed in the documentation for the macro library.
The four input-output edge combinations may be identified as:
Tpd ++    rising edge input, rising edge output
Tpd +-    rising edge input, falling edge output
Tpd --    falling edge input, falling edge output
Tpd -+    falling edge input, rising edge output
Intrinsic delay may be specified as typical, with adjustment factors or worst-case delay multipliers supplied to allow maximum and minimum delay computations for specific operating conditions. The delays may be specified as worst-case maximum for one set of operating conditions with adjustment factors to convert to other conditions. Another option is to specify the delays with a worst-case min-max range for one or more sets of operating conditions. (See Table 5-1.)
Table 5-1 Tpd Specifications And Adjustment Factors (Historical)
Tpd Specified:    For Specific Operating Conditions Use:
Typical           Adjustment Factors
Worst-Case        Factors for other Conditions
MIN/MAX Range     (Specific to Conditions)
Macro intrinsic delay values may assume no loading, one load, or several loads on the macro output pin. When annotation software is available, macros are specified as unloaded. For some macros, delays are dependent on how many other pins on the macro are also switching. The actual macro path delay may be a function of:
1. the state of the input data (low data may have different set-up and hold times than high data);
2. the edge direction (low to high (rising edge) propagation may be different from high to low (falling edge) propagation);
3. multiple inputs changing state (when several OR/NOR inputs change simultaneously, the delay increases).
Three-state macros have specifications for high-Z, representing switching delays for TPHZ, TPZH, TPZL and TPLZ. The propagation delay supplied in a design manual is for the delay from input to output measured at the 50% level. For TTL I/O macros, the measurement is at the 1.5V level. Rise and fall time is measured between 10% and 90%. The data sheet for the array series should indicate measurement points and levels. If this information is important to the design, check to see if it is available, under what conditions the measurements were taken and under what load. Adjust according to the intended operating environment and load conditions.
Intrinsic Set-Up and Hold Time The intrinsic set-up and hold times represent the required behavior of the signals coming into the macro, observed at its input and output nodes. External set-up and hold times are concerned with the signals at the external pins of the array. Set-up time (Tsu) is the length of time that a data signal must be stable before the 50% point of the next active clock edge. A negative set-up time indicates that the data does not have to become stable until after the active clock edge. Hold time (Th) is the length of time that a data signal must be held stable after the active clock edge. A negative hold time indicates that the data can be removed before the active clock edge. Whether a set-up or hold time is negative is a function of the disparity in the delays in the clock and data paths between the input and the actual functional use of the signal. For example, a complex macro may have a multiplexor in the data path and nothing in the clock path for an internal flip/flop. Today's libraries tend toward rising-edge active clocks, zero hold times, and positive set-up times.
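The set-up and hold definitions above can be expressed as a simple window check around the active clock edge. The following is a minimal sketch, not a vendor tool: the function name, the argument names and the numbers are hypothetical, and all times are assumed to be in nanoseconds measured at the macro pins.

```python
def meets_setup_and_hold(data_stable_from, data_stable_until, clock_edge, tsu, th):
    """Check one active clock edge against intrinsic set-up and hold times.

    Data must be stable from (clock_edge - tsu) until (clock_edge + th).
    A negative tsu or th simply moves the matching window boundary to the
    other side of the clock edge, as described in the text.
    """
    window_start = clock_edge - tsu
    window_end = clock_edge + th
    return data_stable_from <= window_start and data_stable_until >= window_end


# Hypothetical values: clock edge at 10.0 ns, Tsu = 1.5 ns, Th = 0.5 ns.
print(meets_setup_and_hold(8.0, 11.0, 10.0, 1.5, 0.5))  # data stable early enough
print(meets_setup_and_hold(9.0, 11.0, 10.0, 1.5, 0.5))  # data arrives inside the set-up window
```

With a negative hold time the window closes before the clock edge, so data removed slightly before the edge still passes the check.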
Start and End Points for Set-Up and Hold Time Computations
Set-up time is considered to be the minimum time required for a signal to travel from the circuit data pin input to the data port of the flip/flop or register. It can also be the minimum time for a signal to leave the Clk->q port of one register and arrive at the data port of a sequential register (register-to-register delay time). When the start point is the external data pin (and the clock from the external clock pin), this is considered external set-up and hold time. This is discussed in Chapter 6.
Intrinsic Recovery Time Recovery time (Trec) is specified for any latch or flip/flop macro that has a set or reset pin, or any complex macro that contains the equivalent function. It is the length of time that a reset or set signal has to have been inactive before an active clock edge. Clocking within the recovery time will result in unpredictable behavior.
Set-up, Hold and Recovery Specifications Set-up time (Tsu), hold time (Th), and recovery time (Trec) may be specified as typical, in which case adjustment factors or worst-case multipliers must be used to adjust them to the specific operating conditions of the circuit. They may be specified as worst-case for common operating conditions, i.e., Commercial and Military. In cases where differences between Commercial and Military are minimal, only one value may be specified.
Maximum Operating Frequency; Toggle Frequency The maximum operating frequency (fmax) is specified as the maximum switching rate for the macros and is technology dependent. Macros must not be driven beyond their specified operating limits. The actual frequency at which a circuit may operate must be computed from the worst-case critical path propagation delay timing analysis and set-up and hold analysis. The limit for a macro is its minimum pulse width: the minimum pulse that can be successfully propagated through the macro. The pulse reaching a macro input pin is a function of the frequency of the signal on that pin and the pulse stretch-shrink distortion of the signal.
Intrinsic Pulse Width
Minimum pulse width (PW) is the inverse of the specification of the toggle frequency for the macro. It defines how close any two edges of a pulse passing through the macro may be. Latches and flip/flops and the complex macros that include these devices will normally be specified with a minimum pulse width, a maximum frequency of operation, or both, possibly differentiated for military and commercial operation. (The frequency is the reciprocal of twice the pulse width, assuming a 50% duty cycle.) For most macros, the generic maximum frequency of operation for a given class of macros defines the limits. For special cases, such as latches, flip/flops, and complex macros, the macro may have its own specific limits. Complex libraries with a range of performance within the macro set may specify a maximum frequency and pulse width for each macro.
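The relationship between minimum pulse width and maximum frequency can be sketched as follows. This assumes a 50% duty cycle, so that one period is two pulse widths. The function names and the optional shrink term are illustrative assumptions, not part of any vendor library.

```python
def fmax_mhz(min_pw_ns):
    """Maximum toggle frequency in MHz for a minimum pulse width in ns (50% duty cycle)."""
    return 1000.0 / (2.0 * min_pw_ns)


def min_pulse_width_ns(freq_mhz):
    """Pulse width in ns of a 50% duty cycle signal at the given frequency in MHz."""
    return 1000.0 / (2.0 * freq_mhz)


def pulse_propagates(signal_mhz, macro_min_pw_ns, shrink_ns=0.0):
    """True if the pulse reaching the macro pin is at least the macro's minimum pulse width.

    shrink_ns is a hypothetical allowance for pulse-shrink distortion picked up
    along the path before the signal reaches the macro input pin.
    """
    arriving_pw_ns = min_pulse_width_ns(signal_mhz) - shrink_ns
    return arriving_pw_ns >= macro_min_pw_ns


print(fmax_mhz(1.0))                                 # 1 ns minimum pulse width
print(pulse_propagates(400.0, 1.0))                  # 1.25 ns pulse, no distortion
print(pulse_propagates(400.0, 1.0, shrink_ns=0.5))   # 0.75 ns after pulse shrink
```

The last call shows why distortion matters: a signal that is within the frequency limit on paper can still fall below the minimum pulse width once pulse shrink is taken into account.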
Examples
As an example of specification approaches, the Q20000 [as of 19924] specification approach is detailed below.
For Q20000 Bipolar Array Series:
● The pulse width specifications in the AMCC Q20000 design manual are worst-case times, computed from the maximum frequency of operation (assuming a 50% duty cycle).
● PW is specified for Military and Commercial conditions (originally specified for Hot and Cold conditions).
● Tpd delays are specified as unloaded delays.
● Set-up and Hold times are specified as worst-case.
● Although Tsu, Th, and pulse width are specified as single values for the given conditions, the Tpd delays are specified as a min/max range.
For today's Arrays
● Pulse width and set-up and hold times are computed by today's synthesis programs using data specified in the chosen design library.
● Libraries are specified for specific operating conditions - NO ADJUSTMENTS.
● If variations on the library operating conditions are desired, get the vendor to supply a new library. [We will repeat this warning.]
● Each library has 1-2 interconnect models (worst-case being one of them).
● MIN-MAX analysis of hold time is commonly computed pre-layout to identify serious hold-time issues.
● Individual macros contain set-up and hold time information for the macro.
● Pulse-swallowing refers to the condition where the signal is too fast for the macro to "see", i.e., the signal frequency exceeds the limit set by the minimum pulse width of the macro.
Interconnect Delays
Path delays are composed of the intrinsic (ti, internal to the macro) delays specified for the macros and extrinsic delays (Tex). Extrinsic delays are composed of the path propagation delays for the macro interconnect (macro to macro routing) and the output macro capacitive load. For any array, once the macro intrinsic propagation delays for macros in a given path for a given edge direction are listed, the next step in performing timing analysis on a circuit is to evaluate the loading for each macro in the path. Each macro output pin belongs to a separate timing path.
Interconnect delays are the delays incurred by a signal when propagating from a driving macro to the inputs of its load that are caused by the RC time constants of the metal etch interconnect. The delay will be different for rising and falling edges.
For small arrays, the delay was assumed to be dominated by the fan-out loading, allowing linear estimates based on that load to be used to approximate the interconnect delays. The interconnect delays in the small arrays were not as large as the intrinsic macro delays themselves; therefore, the simplification could be justified. As the arrays became larger and faster, the interconnect delay in a heavily loaded net became larger than the intrinsic delay for the macro. It became important to obtain a closer estimate of circuit performance pre-layout. This led to the development of Front-Annotation software, capable of estimating the interconnect delays based on statistical tables of physical fan-out loads versus etch length.
Types of Extrinsic Loading
There are two types of extrinsic loading. The first type exists for those macros that drive internal nets (texint). Loading for an internal net involves:
● the electrical load due to the fan-out for the net;
● the electrical load due to the presence of wire-ORed inputs to the net;
● the physical load due to the first and second layer metal used to interconnect the net.
The net that connects the output macro to the outside world will be subject to a capacitive load. This second type of extrinsic load (texout) is due to system capacitive loading and package pin capacitance.
Annotation
There are three approaches that can be used to compute propagation delay due to the interconnect nets:
● Front-Annotation, where a statistical estimate of metal delays based on the physical net size is used;
● Intermediate-Annotation, where a refined statistical estimate based both on the physical size of the net and the relative positions of the macros is used;
● Back-Annotation, based on the actual lengths of metal 1 and metal 2.
In all three cases, the interconnect delays due to the electrical fan-out load and the electrical loading due to wire-ORs are accurate. The part of the delay due to the metal lengths will vary. When package pin capacitance is included in the output macro extrinsic load, Front-Annotation uses an estimate since actual placement is unknown. Back-Annotation uses the actual pin capacitance for the assigned package pin. For critical circuits, since routing is the longest process, Intermediate-Annotation can reduce the routing passes or edits required. This tool, however, is not always available.
Drive Factors Macros are specified with drive capability, drive factors or adjustment factors that apply to extrinsic loading with the same variability that is found in the intrinsic delay specifications. Alternatively, they may have delays directly specified for several loads allowing an extrapolation to be made, often by reading a graph.
Manual Computation - One Method
The path propagation delays can be estimated using:
● the statistical wire delay tables (Lnet),
● fan-out loading (Lfo),
● wire-OR loading (Lwo), if any, and
● the appropriate k-factors or drive factors (k) for the macros chosen.
One equation for the typical extrinsic (load) delay for a single internal net is shown below and discussed in detail on the following pages.
texint = kfo * Lfo + knet * Lnet + kwo * Lwo
One equation for the typical extrinsic (load) delay for a single output net is shown below and discussed in detail on the following pages.
texout = kcap * (Csystem + Cpackage)
The form and notation used for these equations vary widely among the array vendors. The typical intrinsic macro delays in the path (tin; specified as Tpd in the macro documentation) and all typical extrinsic loading delays,
texi = (sum of texintj for j = 1...n, over all n internal nets in the path) + texout
are multiplied by the proper worst-case timing multiplication factor and the results summed. Worst-case macro delays are used directly.
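The summation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the delay values and the worst-case multiplier of 1.5 are made-up numbers, and the function name is hypothetical. Real multipliers come from the vendor's design manual for the chosen operating conditions.

```python
def worst_case_path_delay(intrinsic_typ_ns, extrinsic_typ_ns, wc_multiplier):
    """Worst-case path delay from typical delays.

    intrinsic_typ_ns: typical intrinsic delay (Tpd) of each macro in the path
    extrinsic_typ_ns: typical extrinsic delay of each net in the path
                      (every texint plus the final texout, if any)
    wc_multiplier:    worst-case timing multiplication factor
    """
    typical_total = sum(intrinsic_typ_ns) + sum(extrinsic_typ_ns)
    return wc_multiplier * typical_total


# Hypothetical three-macro path: input macro, one logic macro, output macro.
intrinsic = [0.30, 0.45, 0.60]   # ns, typical Tpd per macro (made-up values)
extrinsic = [0.77, 0.20]         # ns, one internal net plus the output load (made-up values)
print(round(worst_case_path_delay(intrinsic, extrinsic, 1.5), 2))
```

When a library already specifies worst-case delays, the multiplier would be 1.0, since those values are used directly.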
Example Equations for Extrinsic Loading
Internal Nets
Loading delays may be computed for Front-Annotation analysis by the general equation:
texint = kfo * Lfo + knet * Lnet + kwo * Lwo
where:
k = the k-factor for the series and the macro option
    kfo is for fan-out load;
    knet is for metal load;
    kwo is for wire-OR load
Lfo = the sum of the electrical fan-out loads in a net (pins with a fan-in of 2 count as 2 electrical loads);
Lnet = the estimated metal delay from Front-Annotation tables or equations;
Lwo = the electrical load due to wire-OR
The k-Factors are the conversion factors for changing load units into time units. These k-Factors are expressed in ns/LU. The load units are computed for electrical fan-out, net metalization and electrical wire-OR loads. The k-Factors may be assumed to be identical for Front-Annotation estimation. If the array does not allow wire-OR structures, the equation reduces to:
tex = k * [ Lfo + Lnet ]    (reduced equation)
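The general and reduced equations above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the sample load values are the ones used in the Q5000T exercise at the end of this section.

```python
def texint(k_fo, l_fo, k_net, l_net, k_wo=0.0, l_wo=0.0):
    """Typical extrinsic delay of one internal net: kfo*Lfo + knet*Lnet + kwo*Lwo.

    k-factors are in ns/LU and loads are in load units (LU), so the result is in ns.
    """
    return k_fo * l_fo + k_net * l_net + k_wo * l_wo


def texint_reduced(k, l_fo, l_net):
    """Reduced form for arrays without wire-OR, using a single k-factor."""
    return k * (l_fo + l_net)


# Identical k-factors for all three terms, as assumed for Front-Annotation estimates.
k = 0.04  # ns/LU
print(round(texint(k, 4.0, k, 14.14, k, 1.2), 2))   # with a wire-OR load
print(round(texint_reduced(k, 4.0, 14.14), 2))      # same net without the wire-OR term
```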
Lnet - Front-Annotation
Lnet is the statistical wire load taken from the Front-Annotation Statistical Wire Load table supplied by the vendor or from an equation that is supplied by the vendor, using the number of physical pins in the net minus 1 as an index. Lnet is expressed in load units. This load may also be specified through graphs.
For Front-Annotation, Lnet tables are derived from empirical measurement of hundreds of nets of equal size in actual circuits on the same array and the resulting 50% point in the normal distribution (the median) is used as the table entry. This means that 50% of the net delays computed with this number will be equal to or smaller than this number and 50% will be equal to or larger than the computed number.
When an array is preliminary, not many circuits will have been tested or measured. This means that the Front-Annotation values are estimates of expected delays. The errors could be in either direction. For any array, Front-Annotation accuracy decreases with net size. Critical paths that are pre-placed or given a higher priority in place and route operations than the rest of the circuit can be kept within the Front-Annotation limits. Other circuits that are not critical will have longer metal lengths.
A reasonable number of paths in the circuit can be prioritized, with the allowed number a function of the other placement restrictions for the circuit, cell utilization and internal pin count. A limit of 20% on the number of pre-placed and priority-routed paths is satisfactory for most circuits. Pre-placement should not be considered as a solution to a timing problem. It is available as an aid, depending on other placement considerations.
Fan-In The loading that a typical macro presents to its driving source is typically one load for bipolar arrays and higher for BiCMOS arrays. Some vendors specify fan-in in tabular form with other specification data, or indicate a general rule. Some macros look as if they present two or more loads to their driving sources when they do not. The graphic representation is a logical picture of the function, not a physical representation of how the function is constructed.
Fan-Out - Lfo Fan-in affects the electrical loading presented to the driving macro. The electrical fan-out load count may be higher than the physical fan-out load count. For example, a pin with a fan-in of 2 counts as two electrical loads in Lfo and one physical pin when looking up Lnet. Fan-out violations or fan-out in excess of derated levels should have been checked during the design review of the circuit. Derated fan-out limits are used by the AMCCERC software when performing fan-out load limit violation checking. If a fan-out load is found to be excessive, the circuit must be corrected before proceeding with the timing analysis. Lfo is the sum of all fan-out loads. This is the sum of all electrical fan-out loads - the sum of the fan-in for each pin connected to the net. Lfo is expressed in load units. The load a macro presents to a driving macro is part of the macro specifications.
Wire-OR - Lwo For arrays that allow wire-ORs, Lwo is W * (n-1) where W is the wire-OR load factor for the array and n is the wire-OR size. Lwo is expressed in load units. This term only exists for those arrays that allow a wire-OR. Not all arrays in all technologies allow the use of the wire-OR. If it is legal for the array, the presence of a wire-OR in a net will affect both the electrical and the physical loading in the net. Wire-ORing two outputs will not increase the fan-out load limit in the bipolar arrays as it would have in a CMOS array. AMCC Q5000 arrays, which allow wire-ORs, power-down the additional current sources by way of conditional geometry software. Load units for the Q5000 wire-OR macros are shown in Table 5-2.
Example
Table 5-2 Q5000 Wire-OR Loading (Lwo)
WIRE-OR SIZE    LU
WIREOR2         0.40
WIREOR3         0.80
WIREOR4         1.20
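The entries in Table 5-2 follow from Lwo = W * (n-1) with a wire-OR load factor of W = 0.40 LU, which is the value implied by the WIREOR2 row. A short sketch, with a hypothetical function name, reproduces the table:

```python
def wire_or_load(n_sources, w=0.40):
    """Wire-OR load Lwo in load units for a wire-OR of n_sources outputs.

    w is the wire-OR load factor for the array; 0.40 LU is the value implied
    by Table 5-2 for the Q5000. A net with a single source has no wire-OR load.
    """
    if n_sources < 2:
        return 0.0
    return w * (n_sources - 1)


for size in (2, 3, 4):
    print(f"WIREOR{size}: {wire_or_load(size):.2f} LU")
```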
k-Factors
The k-factor for an internal macro output pin is the drive factor expressed in ns/LU (Load Units) that is used to convert the total net load units into time. As detailed above, load units are attributed to electrical fan-in loading, physical metal length loading and electrical wire-OR loading. Within an array series, the k-Factors vary by macro option and by edge direction. Examples are shown in Table 5-3 for the Q5000 Series (bipolar) and the Q20000 Series (high-speed bipolar) arrays.
Table 5-3a Example K-Factors - Option Specific *
Macro k-factors             rising edge    falling edge
Low Power Option Macro      0.04 ns/LU     0.08 ns/LU
Standard Option Macro       0.04 ns/LU     0.04 ns/LU
High-Speed Option Macro     0.02 ns/LU     0.04 ns/LU
15-Load Driver              0.02 ns/LU     0.02 ns/LU
25-Load Driver              0.01 ns/LU     0.01 ns/LU
* Q5000 Library, internal macros, typical values
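As an illustration of how option-specific and edge-specific k-factors are applied, the Table 5-3a values can be placed in a lookup table and multiplied by the total net load. The dictionary keys and function name below are hypothetical. The k-factor values are the typical Q5000 values from the table.

```python
# Typical Q5000 k-factors from Table 5-3a, in ns/LU, keyed by macro option and edge.
K_FACTORS_NS_PER_LU = {
    "low_power":  {"rising": 0.04, "falling": 0.08},
    "standard":   {"rising": 0.04, "falling": 0.04},
    "high_speed": {"rising": 0.02, "falling": 0.04},
    "driver_15":  {"rising": 0.02, "falling": 0.02},
    "driver_25":  {"rising": 0.01, "falling": 0.01},
}


def extrinsic_delay_ns(option, edge, total_load_units):
    """Convert a net's total load (Lfo + Lnet + Lwo, in LU) into a typical delay in ns."""
    return K_FACTORS_NS_PER_LU[option][edge] * total_load_units


# The same 10 LU net driven by different options (10 LU is an arbitrary example load).
for option in K_FACTORS_NS_PER_LU:
    rise = extrinsic_delay_ns(option, "rising", 10.0)
    fall = extrinsic_delay_ns(option, "falling", 10.0)
    print(f"{option:10s} rising {rise:.2f} ns  falling {fall:.2f} ns")
```

Options whose rising and falling k-factors differ produce an edge-dependent delay on the same net, which is one source of the pulse stretch and shrink discussed earlier.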
Table 5-3b Example Com5max Library K-Factors For The AMCC Q20000 Series
kfo = knet; kcap**
Macro Type                                         Description     min/max       units
ECL input, Bidi and internal macros - L option     rising edge     2.3/5.1       ps/LU
                                                   falling edge    3.9/8.5       ps/LU
ECL input, Bidi and internal macros - S option     rising edge     1.8/3.9       ps/LU
                                                   falling edge    3.0/6.5       ps/LU
ECL input, Bidi and internal macros - H option     rising edge     1.4/3.1       ps/LU
                                                   falling edge    2.4/5.2       ps/LU
ECL input, Bidi and internal macros - drivers      rising edge     1.1/2.3       ps/LU
                                                   falling edge    1.5/3.3       ps/LU
ECL output - 50 ohm standard                       rising edge     16.0/36.0     ps/pF
                                                   falling edge    20.0/44.0     ps/pF
ECL output - 25 ohm Darlington                     rising edge     13.0/30.0     ps/pF
                                                   falling edge    15.0/34.0     ps/pF
ECL output - 50 ohm Darlington                     rising edge     13.0/30.0     ps/pF
                                                   falling edge    21.0/46.0     ps/pF
TTL input and Bidi (Non-Turbo) S, L, H options     rising edge     3.0/6.5       ps/LU
                                                   falling edge    6.0/13.0      ps/LU
TTL input and Bidi (Turbo) S, L, H options         rising edge     1.4/3.1       ps/LU
                                                   falling edge    2.4/5.2       ps/LU
TTL output - 20 mA                                 rising edge     33.0/72.0     ps/pF
                                                   falling edge    33.0/72.0     ps/pF
TTL output - 8 mA                                  rising edge     33.0/72.0     ps/pF
                                                   falling edge    54.0/117.0    ps/pF
** Individual macro k-factors are specified in the macro library documentation in the AMCC Q20000 Design Manual, Volume I, Section 6. Units are min/max spread for the Commercial 5V library only.
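The kcap entries in Table 5-3b plug into the output-net equation given earlier, texout = kcap * (Csystem + Cpackage). The sketch below computes the min/max spread for the two TTL output macros. The capacitance values (15 pF system load, 5 pF package pin) are hypothetical, and the dictionary keys and function name are illustrative.

```python
# kcap (min, max) in ps/pF from Table 5-3b, Commercial 5V library.
KCAP_PS_PER_PF = {
    "ttl_out_20ma": {"rising": (33.0, 72.0), "falling": (33.0, 72.0)},
    "ttl_out_8ma":  {"rising": (33.0, 72.0), "falling": (54.0, 117.0)},
}


def texout_range_ps(macro, edge, c_system_pf, c_package_pf):
    """Return (min, max) output extrinsic delay in ps: kcap * (Csystem + Cpackage)."""
    k_min, k_max = KCAP_PS_PER_PF[macro][edge]
    c_total_pf = c_system_pf + c_package_pf
    return k_min * c_total_pf, k_max * c_total_pf


# Hypothetical load: 15 pF of system capacitance plus 5 pF of package pin capacitance.
for macro in KCAP_PS_PER_PF:
    for edge in ("rising", "falling"):
        lo, hi = texout_range_ps(macro, edge, 15.0, 5.0)
        print(f"{macro:13s} {edge:8s} {lo:.0f} to {hi:.0f} ps")
```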
Computing Lfo
Compute Lfo by summing the electrical loads of all pins driven. If a destination pin has a fan-in of 2, it counts as two electrical loads and as one physical pin. A destination may appear to have two physical loads internal to the macro. In these cases, the macro documentation will clearly identify the fan-in load represented by that pin. BiCMOS libraries have a higher average fan-in than do bipolar libraries.
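The distinction between electrical loads (used for Lfo) and physical pins (used to look up Lnet) can be made concrete with a short sketch. The function names and the sample fan-in list are hypothetical.

```python
def fan_out_load(fan_ins):
    """Lfo in load units: the sum of the fan-in of every destination pin on the net."""
    return sum(fan_ins)


def physical_pins_driven(fan_ins):
    """Number of physical destination pins; each pin counts once regardless of fan-in."""
    return len(fan_ins)


# Hypothetical net driving three pins, one of which has a fan-in of 2.
destination_fan_ins = [1, 1, 2]
print(fan_out_load(destination_fan_ins))          # electrical loads for Lfo
print(physical_pins_driven(destination_fan_ins))  # physical pins for the Lnet lookup
```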
Computing Lwo
Compute Lwo by multiplying the wire-OR load factor by the size of the wire-OR minus one (Lwo = W * (n-1)). For libraries that do not allow a wire-OR, this term becomes zero. Example: For the AMCC Q5000, WIREOR4 = 1.2 loads.
Computing Lnet For a non-RC tree, non-distributed estimate of metal delays, use the vendor-supplied equation to find the metal loading. The cell sizes on the larger arrays in the same family are the same as for the smaller arrays. Since the distance from edge to edge of the array is larger, the average distance for an interconnect is larger. Therefore, the same macro path would be estimated as longer on the larger array than on the smaller one. Note that this is strictly the estimate - the actual delay will depend upon the macro positions and the actual routing paths.
Example AMCC uses: Lnet = a * (net size - 1)** b
where (net size - 1) is the physical pin count of the loads driven plus the number of sources on the net (assuming a wire-OR) minus one. When there is no wire-OR, (net size - 1) reduces to pins driven. Q2000 Series a and b factors (Historical)
For the Q5000, b = 0.67 and a varies by array. The Q5000T uses a = 3.84 and the Q1300T uses a = 1.96. For a macro driving a net sized 8 (net size - 1 = 7), this converts to 14.14 load units for the Q5000T and 6.63 load units for the Q1300T.
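The vendor equation Lnet = a * (net size - 1)** b can be evaluated directly. The sketch below uses the Q5000T factors quoted above (a = 3.84, b = 0.67). The function names are hypothetical, and the net-size helper simply follows the definition given in the text (pins driven plus sources on the net).

```python
def net_size(n_pins_driven, n_sources=1):
    """Physical net size: pins driven plus the number of sources on the net."""
    return n_pins_driven + n_sources


def lnet(size, a, b=0.67):
    """Front-Annotation metal load estimate in load units: a * (net size - 1) ** b."""
    return a * (size - 1) ** b


# Q5000T example from the text: a net of size 8, so (net size - 1) = 7.
size = net_size(n_pins_driven=4, n_sources=4)   # four loads sourced by a four-input wire-OR
print(size)
print(round(lnet(size, a=3.84), 2))
```

Because a grows with the physical size of the array, the same logical net receives a larger estimate on a larger array in the same family.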
Exercises
1. For a Q5000T array, using the above information, what is the typical estimated delay in a net that is driven by a standard macro, if it is sourced by a four-input wire-OR (all sources are standard macros), and drives four other macros each of which has a fan-in of 1?
Answer:
k-factor = 0.04 ns/LU rising or falling
Lwo = 1.2 LU
Lfo = (4 macros * (fan-in = 1)) = 4 LU
Lnet = 3.84 * (8-1)** 0.67 = 14.14 LU
texint = 0.04 * ( 4 + 14.14 + 1.2 ) ns = 0.77 ns
2. Repeating for the Q1300T array:
Answer:
k-factor = 0.04 ns/LU rising or falling
Lwo = 1.2 LU
Lfo = (4 macros * (fan-in = 1)) = 4 LU
Lnet = 1.96 * (8-1)** 0.67 = 6.63 LU