POWER-CONSTRAINED TESTING OF VLSI CIRCUITS
FRONTIERS IN ELECTRONIC TESTING Consulting Editor Vishwani D. Agrawal
Books in the series:
High Performance Memory Testing, R. Dean Adams, ISBN: 1-4020-7255-4
SOC (System-on-a-Chip) Testing for Plug and Play Test Automation, K. Chakrabarty, ISBN: 1-4020-7205-8
Test Resource Partitioning for System-on-a-Chip, K. Chakrabarty, Iyengar & Chandra, ISBN: 1-4020-7119-1
A Designers’ Guide to Built-in Self-Test, C. Stroud, ISBN: 1-4020-7050-0
Boundary-Scan Interconnect Diagnosis, J. de Sousa, P. Cheung, ISBN: 0-7923-7314-6
Essentials of Electronic Testing for Digital, Memory, and Mixed Signal VLSI Circuits, M.L. Bushnell, V.D. Agrawal, ISBN: 0-7923-7991-8
Analog and Mixed-Signal Boundary-Scan: A Guide to the IEEE 1149.4 Test Standard, A. Osseiran, ISBN: 0-7923-8686-8
Design for At-Speed Test, Diagnosis and Measurement, B. Nadeau-Dostie, ISBN: 0-79-8669-8
Delay Fault Testing for VLSI Circuits, A. Krstic, K-T. Cheng, ISBN: 0-7923-8295-1
Research Perspectives and Case Studies in System Test and Diagnosis, J.W. Sheppard, W.R. Simpson, ISBN: 0-7923-8263-3
Formal Equivalence Checking and Design Debugging, S.-Y. Huang, K.-T. Cheng, ISBN: 0-7923-8184-X
Defect Oriented Testing for CMOS Analog and Digital Circuits, M. Sachdev, ISBN: 0-7923-8083-5
Reasoning in Boolean Networks: Logic Synthesis and Verification Using Testing Techniques, W. Kunz, D. Stoffel, ISBN: 0-7923-9921-8
Introduction to Testing, S. Chakravarty, P.J. Thadikaran, ISBN: 0-7923-9945-5
Multi-Chip Module Test Strategies, Y. Zorian, ISBN: 0-7923-9920-X
Testing and Testable Design of High-Density Random-Access Memories, P. Mazumder, K. Chakraborty, ISBN: 0-7923-9782-7
From Contamination to Defects, Faults and Yield Loss, J.B. Khare, W. Maly, ISBN: 0-7923-9714-2
POWER-CONSTRAINED TESTING OF VLSI CIRCUITS
by
NICOLA NICOLICI McMaster University, Hamilton, Canada and
BASHIR M. AL-HASHIMI University of Southampton, U.K.
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48731-4
Print ISBN: 1-4020-7235-X
©2004 Springer Science + Business Media, Inc. Print ©2003 Kluwer Academic Publishers Dordrecht. All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher. Created in the United States of America.
Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Foreword
Increased levels of chip integration, combined with the physical limitations of heat removal devices, cooling mechanisms and battery capacity, have established energy efficiency as an important design objective in the implementation flow of modern electronic products. To meet these low energy objectives, new low power techniques, including circuits, architectures, methodologies, algorithms and computer-aided design tool flows, have emerged. If the integration trend continues in the coming decade, i.e., transistors on lead microprocessors double every two years, die size grows by 14% every two years, supply voltage scales meagerly, and frequency doubles every two years, then what will happen to power and energy? The expected power consumption of such microprocessors, which already exceeds 100 watts today, will grow by an order of magnitude every two years, reaching 10 kilowatts in 2008. It is clear that excessive power usage may become prohibitive and that total power consumption will be a limiting factor in the near future. These two factors will become even more critical for lower performance applications, such as portable products, where low power techniques become a necessity. Planning for power needs to be incorporated into the design flow of such systems.
Since most of the existing low power techniques aim to reduce the switching activity during functional operation, they may conflict with the state-of-the-art manufacturing test flow. In fact, power management is not limited to the design space; testing chips with high power consumption is a major problem too. For example, a complex chip may consume three or four times more power during testing than during its functional operation. This leads to a reliability problem, since overheating can make the test destructive. An additional concern for testing low power circuits is caused by the interaction between the existing design-for-test methods and the voltage drop on power/ground networks.
Due to high circuit activity when employing scan or built-in self-test, the voltage drop which occurs only during test will cause some good circuits to fail the testing process, thus leading to unnecessary manufacturing yield loss. Therefore, accounting for power dissipation during test is emerging as a necessary step in the implementation flow, which will ultimately influence both the quality and the cost of test. To keep pace with low power design practices, it is essential that the emerging system-on-a-chip test methodologies regard power-constrained testing as an important parameter when establishing the manufacturing test requirements. Today, we have started to see certain embedded test solutions that are designed with infrastructure IP to perform power management on-chip. With the increasing use of embedded cores from third party IP providers, it is expected that power-constrained test solutions will be implemented at the core level by the third party IP providers.
This book is the first comprehensive book that covers all aspects of power-constrained test solutions. It is a reflection of the authors’ own research and also a survey of the major contributions in this domain. I strongly recommend this book to all engineers involved in the design and test of systems-on-chip who want to understand the impact of power on test and design-for-test.
Fremont, November 2002
Dr Yervant Zorian, Vice President & Chief Scientist, Virage Logic Corp, Fremont, California, U.S.A.
Contents
Foreword
Preface
Acknowledgments
1. DESIGN AND TEST OF DIGITAL INTEGRATED CIRCUITS
1.1 Introduction
1.2 VLSI Design Flow
1.3 External Testing Using Automatic Test Equipment
1.4 Internal Testing Using Built-In Self-Test
1.5 Power Dissipation During Test Application
1.6 Organization of the Book
2. POWER DISSIPATION DURING TEST
2.1 Introduction
2.2 Test Power Modeling and Preliminaries
2.3 Power Concerns During Test
2.4 Sources of Higher Power Dissipation During Test Application
2.5 Summary
3. APPROACHES TO HANDLE TEST POWER
3.1 Introduction
3.2 A Taxonomy of the Existing Approaches for Power-Constrained Testing
3.3 Test Set Dependent vs. Test Set Independent Approaches
3.4 Test-per-Clock vs. Test-per-Scan
3.5 Internal Test vs. External Test
3.6 Single vs. Multiple Test Sources and Sinks
3.7 Power-Constrained Test Scheduling
3.8 Summary
4. BEST PRIMARY INPUT CHANGE TIME
4.1 Introduction
4.2 Scan Cell and Test Vector Reordering
4.3 A Technique for Power Minimization
4.4 Algorithms for Power Minimization
4.5 Experimental Results
4.6 Summary
5. MULTIPLE SCAN CHAINS
5.1 Introduction
5.2 Multiple Scan Chain-Based DFT Architecture
5.3 Multiple Scan Chains Generation
5.4 Experimental Results
5.5 Summary
6. POWER-CONSCIOUS TEST SYNTHESIS AND SCHEDULING
6.1 Introduction
6.2 Power Dissipation in BIST Data Paths
6.3 Effect of Test Synthesis and Scheduling
6.4 Power-Conscious Test Synthesis and Scheduling Algorithm
6.5 Experimental Results
6.6 Summary
7. POWER PROFILE MANIPULATION
7.1 Introduction
7.2 The Global Peak Power Approximation Model
7.3 Power Profile Manipulation
7.4 Power-Constrained Test Scheduling
7.5 Experimental Results
7.6 Summary
8. CONCLUSION
References
About the Authors
Index
Preface
Multi-billion transistor chips will be manufactured by the end of this decade, and new design and test methodologies will emerge to cope with the product complexity. Power dissipation has already become a major design concern and it is turning into a key challenge for deep submicron digital integrated circuits. Power dissipation concerns cover a large spectrum of products, ranging from high performance computing to wireless communication. The continuous growth in power requirements is driven by the ever increasing chip complexity. While smaller devices operate at lower voltages, the currents grow to drive the large number of transistors on a complex circuit, and consequently the power dissipation increases. Since devices are sources of heat, chip temperatures are also increasing. If the temperature increase is excessive (i.e., above heat removal limits), the devices being heated may either permanently degrade or totally fail. Also, because chip temperatures vary and the slew rate (slope) is temperature dependent, differences in delay between different parts of the circuit can occur, and they may lead to skew problems that can affect the functionality. Hence, placing more and more functions on a silicon die has resulted in higher power/heat densities, which impose stringent constraints on packaging and thermal management in order to preserve performance and reliability.
If the packaging and thermal management parameters (e.g., heat sinks) are determined based only on the normal (functional) operating conditions, then what are the implications of the high circuit activity during test on yield and/or reliability? To answer this question, this book discusses the factors responsible for power dissipation in digital integrated circuits and the solutions that address excessive power dissipation during test.
Acknowledgments
This book could not have been completed without the help and assistance of many people. The members of the Computer-Aided Design and Test Group at McMaster University and the Electronic Systems Design Group in the Department of Electronics and Computer Science of the University of Southampton have inspired us with their support. We are thankful for their constant friendship and for the valuable feedback they provided on several occasions. The authors would like to express sincere gratitude to two very special friends who have contributed generously to this book: Paul Rosinger and Theo Gonciari. This venture was initiated with the enthusiastic encouragement of Mark de Jongh of Kluwer Academic Publishers. The editorial assistance and patience of the staff at Kluwer have been invaluable. We are particularly indebted to our editor, Vishwani Agrawal of Agere Systems, who has provided valuable comments and helpful corrections, which have been incorporated with deep appreciation. Finally, we wish to thank our families for understanding the time we needed to complete this project. Without their support this book would not have been possible.
Nicola Nicolici (http://www.ece.mcmaster.ca/~nicola/)
Bashir M. Al-Hashimi (http://www.ecs.soton.ac.uk/~bmah/)
Chapter 1 DESIGN AND TEST OF DIGITAL INTEGRATED CIRCUITS
1.1 Introduction
The topic of this book is power-constrained testing of very large scale integrated (VLSI) circuits. This is a sub-problem of the general goal of testing VLSI circuits. Testing VLSI circuits bridges the gap between the imperfection of the manufacturing process for integrated circuits (IC) and the end user’s expectations of defect-free chips. Manufacturers test their products to discard the faulty components and to ensure that only defect-free chips make their way to the consumer [1, 15]. With the advent of deep sub-micron technology [69], the tight constraints on power dissipation of VLSI circuits have created new challenges for testing low power VLSI circuits, which must move beyond traditional test techniques that do not account for power dissipation during test application. Since much of the power consumed by the circuit is dissipated as heat, the relationship between the test activity and the cooling capacity needs to be taken into consideration in order to avoid destructive test [148]. Also, in the long term, power limitations will be driven more by system level cooling and test constraints than by packaging [70].
The aim of this chapter is to place the problem of testing low power VLSI circuits within the general context of the VLSI design flow. The rest of the chapter is organized as follows. Section 1.2 overviews the VLSI design flow and outlines the importance of testing integrated circuits. External testing using automatic test equipment (ATE) and the need for design for test (DFT) methods are described in Section 1.3. Section 1.4 introduces built-in self-test (BIST) and provides the terminology used throughout the book with the help of detailed examples. Section 1.5 describes the importance of power minimization during test and Section 1.6 provides an overview of the book.
1.2 VLSI Design Flow
In complementary metal-oxide semiconductor (CMOS) technology, process technologies race to keep pace with Moore’s law, which observes that chip processing power doubles every 18 months [69]. While the increase in integration comes with numerous beneficial effects, the perception of faulty behavior is changing. This section introduces the design and test flow of integrated circuits built in CMOS, which is the dominant fabrication technology for implementation of VLSI circuits that contain more than transistors [39].
As shown in Figure 1.1, the design flow of VLSI circuits is divided into three main steps: specification, implementation and manufacturing [113].
Specification is the step of describing the functionality of the VLSI circuit. The specification is written in hardware description languages (HDLs), such as VHDL or Verilog [39], in two different design domains, the behavioral domain or the structural domain [39], at various levels of abstraction. For example, the logic level of abstraction is represented by expressions in Boolean algebra in the behavioral domain, or by an interconnection of logic gates in the structural domain. Going up in abstraction level, one reaches the register-transfer level (RTL), where an integrated circuit is seen as sequential logic consisting of registers and functional units that compute the next state given the present state. The highest level for system specification is the algorithmic level, where the specification consists of tasks that describe the abstract functionality of the system.
Implementation is the step of generating a structural netlist of components that perform the functions required by the specification. According to the design methodology, the implementation can be either full custom or semi-custom [34].
In the full custom design methodology the design is hand-crafted, requiring extensive effort from a design team to optimize each detailed feature of the circuit. In the semi-custom design methodology, which can be either library cell-based or gate array-based, a significant portion of the implementation is done automatically using computer-aided design (CAD) tools. CAD tools are used to capture the initial specification in hardware description languages, to translate the initial specification into an internal representation, to translate the behavior into a structural implementation, to optimize the resulting netlist, to map the circuit into physical logic gates, and to route the connections between gates.
Manufacturing is the final step of the VLSI design flow and it results in a physical circuit realized in a fabrication technology, with transistors connected as specified by the implementation. The term fabrication technology refers to the semiconductor process used to produce the circuit, which can be characterized by the type of semiconductor (e.g., silicon), the type of transistors (e.g., CMOS), and the details of a certain transistor technology (e.g., 0.13 micron). CMOS technology is the dominant technology for manufacturing VLSI circuits and it is considered throughout this book.
Due to significant improvement in the fabrication technology, designers can place millions of transistors on a single piece of silicon that accommodated only thousands of transistors a few decades ago. However, complex designs are more failure-prone during the design flow. Therefore, to increase the reliability of the final manufactured product, two more problems have to be addressed during the early stages of the VLSI design flow shown in Figure 1.1: verification and testing.
Verification involves comparing the implementation to the initial specification. If there are mismatches during verification, then the implementation may need to be modified to match the specification more closely [72]. In the traditional VLSI design flow, the comparison between specification and implementation is accomplished through simulation. Because exhaustive simulation of complex designs is practically infeasible, simulation provides at best only a probabilistic assurance. Formal verification, in contrast to simulation, uses rigorous mathematical reasoning to prove that an implementation meets all or parts of its specification [72].
Testing assures that the function of each manufactured circuit corresponds to the function of the implementation [1]. Producing reliable VLSI circuits depends strongly on testing to eliminate various defects caused by the manufacturing process. The basic types of defects in VLSI circuits [99] are the following: particles (small bits of material that bridge two lines), incorrect spacing, incorrect implant value, misalignment, holes (exposed area that is unexpectedly etched), weak oxides, and contamination. The defects lead to faulty behavior of the circuit, which can be determined either by parametric testing or by logic testing [15]. Testing circuits parametrically includes measuring the current flowing through the power supply in the quiescent or static state [15]. As CMOS technology scales down, parametric testing is no longer practical due to an increase in sub-threshold leakage current.
Logic testing involves modeling manufacturing defects at the logic level of abstraction of the VLSI design flow, where faulty behavior is measured by the logic value of the primary outputs of the circuit [1]. The basic fault models for logic testing are the stuck-at fault model, the bridging fault model, the open fault model, and timing related fault models such as the gate delay and path delay fault models [15]. The earliest and most common fault model is the stuck-at fault model, where single nodes in the structural netlist of logic gates are assumed to have taken a fixed logic value (and thus are stuck at either 0 or 1). From now onwards throughout this book, testing VLSI circuits refers to the most common and generally accepted logic testing for the stuck-at fault model.
Having described manufacturing defects and their fault models, the following two sections describe how test patterns are applied to the circuit under test to distinguish fault-free from faulty circuits. The application of test patterns to detect faulty circuits can be done either externally using ATE or internally using BIST.
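The single stuck-at fault model described above can be illustrated with a minimal sketch. The two-gate netlist, the node names and the helper functions below are hypothetical and chosen only for illustration; they are not taken from the book. A test vector detects a fault when the faulty circuit and the fault-free circuit produce different primary output values, and fault coverage is the fraction of the fault list detected by a set of vectors.

```python
from itertools import product

# Hypothetical netlist (an assumption for illustration):
#   n1 = a AND b,  out = n1 OR c
NODES = ["a", "b", "c", "n1", "out"]
# Single stuck-at fault list: every node stuck at 0 and stuck at 1.
FAULTS = [(node, value) for node in NODES for value in (0, 1)]


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is an optional (node, stuck_value) pair."""
    def val(node, computed):
        # Force the node to its stuck value if the fault sits on it.
        if fault is not None and fault[0] == node:
            return fault[1]
        return computed

    a, b, c = val("a", a), val("b", b), val("c", c)
    n1 = val("n1", a & b)
    return val("out", n1 | c)


def detects(vector, fault):
    """A vector detects a fault if the primary output differs from fault-free."""
    return simulate(*vector) != simulate(*vector, fault=fault)


def fault_coverage(vectors):
    """Fraction of the single stuck-at fault list detected by `vectors`."""
    detected = {f for f in FAULTS if any(detects(v, f) for v in vectors)}
    return len(detected) / len(FAULTS)


# All 8 input combinations, for comparison against a hand-picked test set.
EXHAUSTIVE = list(product((0, 1), repeat=3))
```

For this toy circuit, the four vectors (1,1,0), (0,1,0), (1,0,0) and (0,0,1) already detect all ten faults, so a short, well-chosen test set can match the coverage of the exhaustive set.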
1.3 External Testing Using Automatic Test Equipment
Given the design complexity of state-of-the-art VLSI circuits, the manufacturing test process relies heavily on automation. Figure 1.2 shows the basic principle of external testing using automatic test equipment with its three basic components: the circuit under test (CUT) or device under test (DUT), which is the component tested for manufacturing defects; the ATE, including the control processor, timing module, power module, and format module; and the ATE memory, which supplies test patterns and measures test responses. In the following, an overview [99] of each of these components is presented.
The CUT is the part of the silicon wafer or packaged device to which tests are applied to detect manufacturing defects. The connections of the CUT pins and bond pads to the ATE must be robust and easily changed, since testing will connect and disconnect millions of parts to the ATE to test each part individually. Due to the complexity of state-of-the-art circuits, many modern CUTs are heterogeneous systems-on-a-chip (SOCs), which combine different types of entities such as digital cores, embedded processors, memory macro-cells and analog interfaces.
The ATE includes the control processor, timing module, power module, and format module. The control processor is a host computer that controls the flow of the test process and communicates to the other ATE modules whether the CUT is faulty or fault free. The timing module defines the clock edges needed for each pin of the CUT. The format module extends the test pattern information with timing and format information that specifies when the signal to a pin will go high or low, and the power module provides the power supply to the CUT and is responsible for accurately measuring currents and voltages.
The ATE memory contains the test patterns supplied to the CUT and the expected fault-free responses, which are compared with the actual responses during testing. State-of-the-art ATE measures voltage response with millivolt accuracy at a timing accuracy of hundreds of picoseconds [15].
Test patterns or test vectors stored in the ATE memory are obtained using automatic test pattern generation (ATPG) algorithms [1]. From now onwards throughout this book, the terms test patterns and test vectors are used interchangeably. The number and size of the test patterns and responses which need to be provided to, or analyzed from, the CUT determine the volume of test data.
ATPG algorithms can broadly be classified into random and deterministic algorithms. Random ATPG algorithms generate random vectors, and test efficiency (test quality quantified by fault coverage) is determined by fault simulation [1]. Deterministic ATPG algorithms generate tests by processing a structural netlist at the logic level of abstraction using a specified fault list from a fault universe (defined by an explicit fault model such as the stuck-at fault model). Compared to random ATPG algorithms, deterministic ATPG algorithms produce shorter and higher quality tests in terms of test efficiency, at the expense of longer computation time. The high computation time associated with deterministic ATPG algorithms is caused by low controllability and observability of the internal nodes of the circuit.
This problem is more severe for sequential circuits where, despite recent advancements in ATPG [15], computation time is large and test efficiency is not satisfactory. Further, the growing disparity between the number of transistors on a chip and the limited number of input/output pins makes the problem of achieving high test efficiency very complicated and time consuming.
DFT is a methodology that improves testability, in terms of controllability and observability, by adding test hardware and introducing specific test oriented decisions during the VLSI design flow shown in Figure 1.1. This often results in shorter test application time, higher fault coverage and hence test efficiency, and easier ATPG. The most common DFT methodology is scan based DFT, where sequential elements are modified to scan cells and connected into a serial shift register. This is done by providing a scan mode for each scan cell, in which data is not loaded in parallel from the combinational part of the circuit but is shifted in serially from the previous scan cell in the shift register. Scan based DFT can further be divided into full scan and partial scan. The main advantage of full scan is that, by modifying all the sequential elements to scan cells, it reduces the ATPG problem for sequential circuits to the more computationally tractable ATPG for combinational circuits. On the other hand, partial scan modifies only a small subset of sequential elements, leading to lower test area overhead at the expense of more complex ATPG.
The introduction of scan based DFT leads to the modification of the test application strategy, which describes how test patterns are applied to the CUT. Unlike the case of combinational circuits or non-scan sequential circuits, where a test pattern is applied every clock cycle, when scan based DFT is employed each test pattern is applied in a scan cycle. In a scan cycle, the number of clock cycles required to shift in the present state (pseudo input) part of a test vector equals the total number of scan latches, and then the test response is loaded in a single clock cycle.
Figure 1.3 illustrates the application of a test pattern at time (clock cycle) t + m after shifting out the test response of the test pattern applied at t – 1, where p is the number of primary inputs and m is the number of memory elements modified to scan cells. The scan cycle lasts for m + 1 clock cycles, of which m clock cycles are required to shift out the pseudo output part of the test response for the previous test vector (time t to t + m – 1) and one clock cycle is required to apply the current test vector (time t + m). When ATE channels directly control the primary inputs of the CUT, the redundant information at the primary inputs while the pseudo input part of the test vector is being shifted in can be exploited for defining new test application strategies which do not affect test efficiency.
This section has described the basic principles of external testing using ATE and the concepts of the scan based DFT method. Finally, it should be noted that the five main test parameters which assess the quality of a scan DFT method when using external ATE are: test area required by extra DFT hardware, performance, test efficiency, test application time and volume of test data.
1.4 Internal Testing Using Built-In Self-Test
Despite its benefits in detecting manufacturing defects, external testing using ATE has two problems. Firstly, ATE is extremely expensive and its cost is expected to grow in the future as the number of chip pins increases [15]. Secondly, when applying the generally accepted scan based DFT, test patterns cannot be applied to the circuit under test in a single clock cycle since they need to be shifted through the scan chain in a scan cycle. This makes at-speed testing difficult. These problems have led to the development of BIST [1, 7, 15], which is a DFT method where parts of the circuit are used to test the circuit itself. Therefore test patterns are not generated externally as in the case of ATE (Figure 1.2), but are generated internally using BIST circuitry. To a large extent this alleviates the reliance on ATE, and testing can be carried out at normal functional speed. In some cases this not only substantially reduces the cost of external ATE, but also enables the detection of timing related faults.
The basic principle of BIST is illustrated in Figure 1.4. The heavy reliance on external ATE, including the ATE memory that stores the test patterns (Figure 1.2), is eliminated by BIST, which employs an on-chip test pattern generator (TPG) and signature analyzer (SA). When the circuit is in the test mode, the TPG generates patterns that set the CUT lines to values that differentiate the faulty and fault-free circuits, and the SA evaluates circuit responses.
The most relevant approach for exhaustive, pseudoexhaustive, or pseudorandom generation of test patterns is the use of a linear feedback shift register (LFSR) as TPG [15]. The LFSR is widely employed by BIST methods mainly due to its simple and fairly regular structure, its pseudorandom properties which lead to high fault coverage and test efficiency, and its shift property that leads to easy integration with serial scan. The typical components of an LFSR are memory elements (latches or flip-flops) and exclusive OR (XOR) gates. Despite their simple appearance, LFSRs are based on complex mathematical theory [7] that helps explain their behavior as test pattern generators and response analyzers. While an LFSR can be used to compact and analyze the test responses of a single-output CUT, its simple extension to a multiple-input signature register (MISR) compacts and analyzes test sequences for a multiple-output CUT. The MISR can be extended to a built-in logic block observer (BILBO) or to a concurrent BILBO (CBILBO) [1] to perform both test pattern generation and signature analysis. An alternative to the LFSR for test pattern generation is the cellular automaton [15], in which each cell, consisting of a memory element, is connected only to its neighboring cells.
Based on the relation to functional operation, BIST can be on-line BIST or off-line BIST. On one hand, in on-line BIST testing occurs during functional operation. Despite its benefits for in-field testing and on-line fault detection for improving fault coverage, on-line BIST leads to excessive power dissipation, which increases packaging cost and reduces circuit reliability. Moreover, on-line testing conflicts with the power management policies implemented in state-of-the-art deep sub-micron VLSI circuits to reduce power dissipation. This makes on-line BIST inefficient for testing low power VLSI circuits. On the other hand, the same test efficiency is achieved by off-line BIST, which deals with testing a circuit when it is not performing its normal functions. From now onwards throughout this book, unless explicitly specified, the term BIST refers to off-line BIST.
Based on the trade-off between the test application time required to achieve satisfactory fault coverage and the BIST area overhead associated with extra test hardware, BIST methods can broadly be classified into scan BIST and parallel BIST. BIST embedding is a particular case of parallel BIST where functional registers are modified to test registers to generate test patterns and analyze test responses when the circuit is in the test mode. Scan BIST and BIST embedding methodologies are described in the following two subsections.
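The use of an LFSR as a pseudorandom TPG can be sketched as follows. The 4-cell register, the seed and the tap positions are assumptions chosen for illustration (they correspond to the feedback polynomial x^4 + x + 1, a common textbook choice); the book does not prescribe this particular configuration. The register is built only from memory elements and XOR feedback, as described above.

```python
def lfsr_patterns(seed, taps, count):
    """Yield `count` successive states of a Fibonacci-style LFSR.

    seed : tuple of bits, must not be all zero (the all-zero state locks up)
    taps : indices of the cells XORed together to form the feedback bit
    """
    state = list(seed)
    for _ in range(count):
        yield tuple(state)
        feedback = 0
        for t in taps:
            feedback ^= state[t]          # XOR of the tapped cells
        # Shift by one cell; the feedback bit enters cell 0.
        state = [feedback] + state[:-1]


# Hypothetical 4-bit TPG: taps on cells 2 and 3, seed 1000.
PATTERNS = list(lfsr_patterns((1, 0, 0, 0), (2, 3), 16))
```

With these taps the register steps through all 15 non-zero 4-bit states before repeating, which is why such a small structure can stand in for a stored pseudoexhaustive pattern set. Other tap choices may give shorter cycles.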
1.4.1 Scan BIST Methodology
Scan BIST methodology is an extension of the scan DFT method, where test patterns are not shifted into the scan chain using external ATE but are generated on-chip using a TPG. The basic principle of scan BIST is shown in Figure 1.5. In order to provide pseudorandom patterns, an LFSR is used as the TPG, and the serial output of the LFSR is connected to a shift register (SR), which is connected to the primary inputs of the CUT in order to supply the test pattern. The serial output of the SR is connected to the serial input of the internal scan chain, and the serial output of the scan chain and the primary outputs of the CUT are analyzed using an MISR. A counter is employed to indicate when shifting is complete, so that the pattern stored in the SR and scan chain can be applied to the CUT by activating N/T, which is the signal that switches between the scan and capture modes. The extra hardware required by the counter, SR, LFSR, and MISR has only a minor impact on performance, but usually at the expense of a long test application time to achieve satisfactory fault coverage [7]. The long test application time is due to applying each test pattern in a scan cycle, which comprises the time required to shift the patterns into the SR and the internal scan chain. Therefore, the test scheme for a scan BIST methodology is called test-per-scan [15]. This is unlike the test-per-clock testing scheme, where a test pattern is applied every clock cycle, as in the case of the BIST embedding methodology explained in the following subsection.
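Response compaction with an MISR can be sketched in the same style. The 4-bit width, the tap positions and the example response sequences below are hypothetical values chosen for illustration. Each clock cycle the register shifts with XOR feedback, as in an LFSR, and the parallel response bits from the CUT are XORed into the cells; the final state is the signature that is compared with the fault-free signature.

```python
def misr_signature(responses, taps, width):
    """Compact a sequence of `width`-bit responses into one signature.

    responses : iterable of bit tuples, one per clock cycle
    taps      : indices of the cells XORed together to form the feedback bit
    """
    state = [0] * width                   # known initial state
    for response in responses:
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        shifted = [feedback] + state[:-1]  # LFSR-style shift
        # XOR the parallel CUT outputs into the shifted register contents.
        state = [s ^ r for s, r in zip(shifted, response)]
    return tuple(state)


# Hypothetical fault-free responses and a faulty run with one flipped bit.
GOOD = [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
FAULTY = [(0, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
```

Here the single flipped bit changes the signature from 1101 to 1111, so the faulty run is flagged. In general, compaction is lossy, and different response sequences can map to the same signature (aliasing), so a matching signature is strong but not absolute evidence of a fault-free circuit.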
1.4.2
BIST Embedding Methodology
In a parallel BIST methodology‚ test patterns are applied to the CUT every clock cycle which leads to a substantial reduction in test application time when compared to the scan BIST methodology (Figure 1.5). Figure 1.6 shows a circuit under test having p inputs and q outputs which is tested as one entity using an LFSR for test pattern generation and an MISR for signature analysis. Since most practical circuits are too complex to be tested as one entity‚ a circuit is partitioned into modules [32]. BIST embedding is the parallel BIST methodology where each module is a test primitive in the sense that test patterns are generated and output responses are compressed using test registers for each module [85]. This methodology is particularly suitable for data path circuits described at register-transfer level of the VLSI design flow where modules are tested using test registers which are a subset of functional registers. The following example overviews the BIST embedding methodology for RTL data paths.
Example 1.1 Consider the data path shown in Figure 1.7 which was described initially in [32]. The data path consists of six modules and nine registers that are modified into test registers. To make a module testable‚ each input port is directly or indirectly (through a driving path as multiplexer network or a bus) fed by a TPG and every output port directly or indirectly feeds a SA. For example in the case of acts as TPG and operates as SA. These TPGs and SAs are said to be associated with module TPGs and SAs are configured as one of the following: LFSRs‚ MISRs‚ BILBOs‚ and CBILBOs [1]. During the test of modules k = 1…6‚ the associated TPGs and SAs are first initialized to known states‚ then a sufficient number of test patterns are generated by the TPGs and applied to Outputs from are compressed in SAs to form a signature. After all patterns are applied to the final signature is shifted out of the SAs and compared with the fault-free signature.
Test hardware is allocated such that each module receives test patterns and its output responses are observable during test. The process of allocating test hardware (test resources) to each module is referred to as test synthesis. Since‚ within the context of this book‚ test hardware is allocated for BIST‚ the terms test synthesis and BIST synthesis are used interchangeably throughout this book. Due to the test hardware required by TPGs and SAs‚ a BIST data path has greater area than the original circuit. This extra area is referred to as BIST area overhead. Also‚ test hardware often increases circuit delays that may lead to performance degradation. Depending on test hardware allocation generated by test synthesis‚ some modules from the data path may be tested at the same time while others are not. This is due to the conflicts which may arise between different modules that need to use the same test resources. A test schedule specifies the order of testing all the modules by eliminating all the conflicts between resources. A test schedule is divided into several test sessions‚ where in each test session one or more modules are tested. Data paths with many modules in conflict have a higher number of test sessions and hence longer test application time. The test application time of a built-in self-testable data path is the time to complete the test schedule added to the shifting time required to shift in the seeds for test pattern generators and shift out signatures stored in signature analyzers as described in Example 1.1. In the following the concepts defined in [32] are introduced based on the example data path shown in Figure 1.7. These concepts are necessary to understand how a test schedule is generated and serve as a basis for the technique detailed in Chapter 6. 
A test for a module has an allocation relation with a test register if the register generates test patterns for or analyzes test responses of In general, the allocation between modules and test registers can be represented by a bipartite graph [34] with a node set consisting of tests and resources. The resource allocation graph for the data path example from Figure 1.7 is shown in Figure 1.8(a). If there is an allocation relation between and then there are edges between and in the resource allocation graph. For example in the case of from Figure 1.7 and generate test patterns, and analyzes test responses. Therefore, in the resource allocation graph shown in Figure 1.8(a) there is an edge between and between and and between and If a resource node (register) is connected to more than one test, this indicates a conflict between the tests that require that resource. A pair of tests that share a test resource cannot be run concurrently and are referred to as incompatible. Otherwise, they are compatible. Pairs of compatible tests form a relation on the set of tests which is a compatibility relation. Such a relation can be represented by a test compatibility graph (TCG) shown in Figure 1.8(b). In a TCG a node appears for each test and an edge exists between two nodes if the corresponding two tests are compatible. For example, in the case of TCG from Figure 1.8(b) there is an edge between and since the
two tests do not share any resources in the resource allocation graph shown in Figure 1.8(a). The test compatibility graph indicates which tests can be run concurrently. The complement of the test compatibility graph is the test incompatibility graph (TIG) shown in Figure 1.8(c). Unlike the TCG where there is an edge between two compatible tests‚ an edge appears in the TIG if the corresponding two tests are incompatible‚ i.e.‚ they share the same resources in the resource allocation graph shown in Figure 1.8(a). For example‚ since (or simply generates test patterns for both and there is an edge between between and and another edge between and in the resource allocation graph. This will lead to a conflict between and and an edge between and will be introduced in the TIG shown in Figure 1.8(c). The TCG and the TIG can be used as bases for scheduling the tests such that the
number of test sessions and hence the total test application time are minimized. A clique (a complete subgraph of a graph [34]) of the TCG represents a set of tests which can run concurrently. For example, in the case of the TCG shown in Figure 1.8(b), is a clique which means that modules and can be tested concurrently. Thus, the test scheduling problem reduces to finding all the cliques in the TCG and covering all the nodes in the TCG with the minimum number of cliques. This problem can also be thought of as finding the minimum number of colors required to color the TIG. This is because the graph coloring problem aims to minimize the number of colors in a graph such that two adjacent nodes do not have the same color. Since all the nodes with the same color in the TIG belong to the same clique in the TCG, the minimum number of colors in the TIG indicates the minimum number of test sessions, which leads to the lowest test application time. The test scheduling problem to minimize the test application time was shown to be NP-hard [32] and therefore fast heuristics must be developed. It should be noted that test scheduling differs fundamentally from traditional operation scheduling in high level synthesis (HLS) [34, 39]. Unlike operation scheduling, which is based on a data dependency graph, test scheduling is based on the TCG and the TIG shown in Figures 1.8(b) and 1.8(c). Therefore, in test scheduling there is no concern with regard to the precedence and the order of execution. The main objective in test scheduling is to minimize the number of test sessions and hence test application time by increasing the test concurrency based on the conflict information derived from the resource allocation graph (Figure 1.8(a)). It is desirable to allocate test hardware (test synthesis) such that both test application time and BIST area overhead are reduced.
For each testable data path there are one or more test schedules according to the resource allocation and test incompatibility graphs (test scheduling). Test synthesis and test scheduling are strictly interrelated since each test resource allocation determines the number of conflicts between different tests. Example 1.2 outlines this interrelation between test synthesis and test scheduling.
Example 1.2 Figure 1.9 shows a data path with two registers‚ and and two modules‚ and If only register is modified to a for generating test patterns for both and (through a multiplexer)‚ then due to the test resource conflict it is necessary to schedule tests and at different test times. This leads to an increase in test application time. However‚ if both registers and are modified to and then no test resource conflict occurs and and may be scheduled at the same time. The use of two test registers leads to lower test application time at the expense of higher BIST area overhead.
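The link between resource allocation and the number of test sessions can be sketched in a few lines. The sketch below derives a test incompatibility graph from an allocation of registers to tests and colors it greedily, one color per test session. It is an illustrative heuristic only, not the scheduling algorithm of [32]; the test and register names (T1, R1, and so on) and the allocations are hypothetical.

```python
from itertools import combinations


def build_tig(allocation):
    """Return the test incompatibility graph as {test: set of conflicting tests}.

    allocation maps each test to the set of test registers (TPGs and SAs) it uses.
    """
    tig = {test: set() for test in allocation}
    for a, b in combinations(allocation, 2):
        if allocation[a] & allocation[b]:      # a shared register means a conflict
            tig[a].add(b)
            tig[b].add(a)
    return tig


def schedule(tig):
    """Greedy coloring of the TIG; each color is one test session."""
    session_of = {}
    # Visit the most constrained tests first (a common heuristic, not optimal).
    for test in sorted(tig, key=lambda t: (-len(tig[t]), t)):
        taken = {session_of[n] for n in tig[test] if n in session_of}
        session = 0
        while session in taken:
            session += 1
        session_of[test] = session
    sessions = [[] for _ in range(len(set(session_of.values())))]
    for test in sorted(session_of):
        sessions[session_of[test]].append(test)
    return sessions


# Hypothetical allocations in the spirit of Example 1.2.
shared = {"T1": {"R1"}, "T2": {"R1"}}          # one register feeds both modules
separate = {"T1": {"R1"}, "T2": {"R2"}}        # one register per module
print(schedule(build_tig(shared)))             # [['T1'], ['T2']]
print(schedule(build_tig(separate)))           # [['T1', 'T2']]
```

With the shared register the two tests need two sessions, while the second allocation tests both modules in one session at the cost of an extra test register.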
The set of feasible test resource allocations and test schedules defines a testable design space. For complex circuits, with a large number of registers and modules, the size of the testable design space is huge due to the enormous number of test resource allocations and test schedules. Exploring different alternatives in the design space in order to minimize one or more test parameters, such as test application time or BIST area overhead, is referred to as testable design space exploration. To achieve high quality solutions with both low test application time and low BIST area overhead, efficient testable design space exploration is required. Further, efficient testable design space exploration is also important from the computation time standpoint, since for complex circuits the size of the testable design space is huge. After test resources are allocated (test synthesis) and the test schedule is generated (test scheduling), the final step is to synthesize a BIST controller that controls the execution of test sessions, shifts in the seeds for TPGs, and shifts out the signatures stored in SAs. In order to achieve minimum area overhead, the BIST controller is merged with the functional controller into a single control unit for the data path. Figure 1.10 shows the extension of a functional data path (Figure 1.10(a)) to a self-testable data path (Figure 1.10(b)) with merged functional and BIST controllers. A particular advantage of specifying a circuit at RTL is that control and status signals during the functional specification are merged and optimized with the test signals that operate the data path during testing.
In addition to test application time, BIST area overhead, and performance degradation, other BIST parameters include volume of test data and fault-escape probability. Volume of test data affects storage requirements and the shifting time required to shift in the seeds for the TPGs and to shift out the signatures stored in SAs. Hence, volume of test data has an influence on test application time, which is the sum of the shifting time and the time required to complete the test sessions. High aliasing probability in signature analysis registers [101] leads to data paths with high fault-escape probability, which lowers fault coverage and hence decreases test efficiency. Finally, it should be noted that the six main test parameters which assess the quality of the BIST embedding methodology are: test application time, BIST area overhead, performance degradation, volume of test data, fault-escape probability, and efficiency of testable design space exploration.
1.5
Power Dissipation During Test Application
The ever increasing demand for portable computing devices and wireless communication systems requires low power VLSI circuits. Minimizing power dissipation during the VLSI design flow increases lifetime and reliability of the circuit [114‚122]. Numerous techniques for low power VLSI circuit design were reported [114] for CMOS technology where the dominant factor of power dissipation is dynamic power dissipation caused by switching activity [122]. While these techniques have successfully reduced the circuit power dissipation during functional operation‚ testing of such low power circuits has recently become an area of concern as detailed in Chapter 2. Therefore‚ addressing the problems associated with testing low power VLSI circuits has become an important issue. Most of the solutions reported for power minimization during normal operation reduce spurious transitions during functional operation (glitches) which do not carry any useful functional information and cause useless power dissipation. Consequently‚ power can be minimized during test application by eliminating spurious transitions during test application which do not carry any useful test operation. Since dynamic power dissipation caused by switching activity is the dominant factor of power dissipation in CMOS VLSI circuits [114‚ 122]‚ from now onwards‚ unless explicitly specified‚ the terms dynamic power dissipation and power dissipation are used interchangeably throughout this book.
1.5.1
Three Dimensional Testable Design Space
The testable design space exploration involves a trade-off between test application time and BIST area overhead, as shown in Figure 1.11 for a 32 point discrete cosine transform data path with 60 registers, 9 multipliers, 12 adders, and an execution time constraint of 30 control steps. The results were obtained by synthesizing 35,000 BIST data paths, a large statistical sample of the entire design space of BIST data paths, and technology mapping them [34] into 0.35 micron AMS technology [5]. The BIST data paths were specified in VHDL [34], and test application time (in clock cycles) and BIST area overhead (in square mils) were obtained using the experimental validation flow detailed in [106, 109]. BIST area overhead in square mils reflects not only the additional test hardware required by test registers, but also the additional gates required to integrate the functional and test controllers as shown in Figure 1.10. Figure 1.11 shows that as test application time decreases there is an increase in BIST area overhead. However, there are many test resource allocations leading to identical values in test application
time with significantly different values in BIST area overhead (each point in Figure 1.11 represents a testable design). For example‚ in the case of the lowest test application time equal to 1064 clock cycles‚ BIST area overhead varies from approximately 130 square mils to 180 square mils. The main disadvantage of trading off only test application time and BIST area overhead is that testable data paths are selected without providing the flexibility of exploring alternative solutions in terms of power dissipation. Indeed‚ a large number of optimum or near-optimum solutions in terms of test application time and BIST area overhead may be found‚ but with different power dissipation. Thus‚ power dissipation is a new parameter which should be considered during testable design space exploration. Figure 1.12 shows the tradeoff between test application time and power dissipation for the 32 point discrete cosine transform data path. In the case of the lowest test application time equal to 1064 clock cycles‚ power dissipation varies from approximately 40mW to 130mW. The different values in power dissipation during test application are not caused only by different values in BIST area overhead (Figure 1.11). Since power dissipation is dependent on switching activity of the active elements during each test session‚ the variation in power dissipation is also due to useless power dissipation defined in Chapter 6.
Finally‚ Figure 1.13 shows the three dimensional testable design space for the 32 point discrete cosine transform data path. Unlike the case of exploring only test application time and BIST area overhead (Figure 1.11) or only test application time and power dissipation (Figure 1.12)‚ the exploration of the three dimensional design space accounts for all the three parameters: test application time‚ BIST area overhead and power dissipation (Figure 1.13). The aim of the techniques proposed in Chapter 6 is to efficiently explore the three dimensional design space and eliminate useless power dissipation without any effect on test application time or BIST area overhead.
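Exploring three parameters at once amounts to keeping only those designs that no other design beats on all of test application time, BIST area overhead, and power dissipation. The following minimal sketch filters such non-dominated designs. The data points are hypothetical and only loosely placed in the ranges quoted above for the discrete cosine transform data path; the function name is an assumption, and this is not the exploration algorithm of Chapter 6.

```python
def pareto_front(designs):
    """Return the designs not dominated in (time, area, power); lower is better."""
    def dominates(a, b):
        # a dominates b if it is no worse in every parameter and differs from b.
        return all(x <= y for x, y in zip(a, b)) and a != b

    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical designs: (clock cycles, square mils, mW).
designs = [
    (1064, 130, 130),
    (1064, 130, 40),    # same time and area as the first design, far lower power
    (1064, 180, 90),
    (2100, 110, 60),
]
print(pareto_front(designs))   # [(1064, 130, 40), (2100, 110, 60)]
```

The first two hypothetical designs are indistinguishable when only test application time and BIST area overhead are compared; adding power dissipation as a third parameter separates them.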
1.6
Organization of the Book
This book presents a set of recently proposed techniques for power-constrained testing of VLSI circuits. The rest of the book is organized as follows. Motivation for power-constrained testing and a background on test power modeling are given in Chapter 2. Chapter 3 provides a comprehensive review of the existing research approaches for testing low power VLSI circuits. Chapter 4 introduces a test set dependent technique [104, 108, 112] for power minimization during test application in scan sequential circuits with no penalty in area overhead, test application time, test efficiency, performance, or volume of test data when compared to the standard scan method. Chapter 5 introduces a test set independent technique [105, 110] applicable to large scan sequential circuits and shows how, with low overhead in test area and volume of test data, and with no penalty in test application time, test efficiency, or performance, considerable savings in power dissipation during test application can be achieved. Chapter 6 shows how power dissipation during test application is minimized at the register-transfer level of abstraction of the VLSI design flow [103, 107]. The three dimensional testable design space described in Figure 1.13 is explored using power-conscious test synthesis and test scheduling algorithms. Chapter 7 emphasizes that the shape of the power profile is an important input to power-constrained test scheduling algorithms, which can increase test concurrency by exploiting the position and size of the higher and lower power parts in the power profile of every block in the system. It is shown how, by manipulating the power profile during test scheduling, test concurrency is increased under given power constraints [120]. Finally, conclusions are given in Chapter 8.
Chapter 2 POWER DISSIPATION DURING TEST
2.1
Introduction
Personal mobile communications and portable computing systems are the fastest growing sectors of the consumer electronics market. The electronic devices at the heart of such products need to dissipate low power in order to conserve battery life and meet packaging reliability constraints. Low power design in terms of algorithms, architectures, and circuits has received significant attention and research input over the last decade [114]. Although low power design methodologies will solve the problem of designing complex digital VLSI circuits, such circuits will still be subject to manufacturing defects. It was implicitly assumed that traditional DFT methodologies are suitable for CMOS digital integrated circuits designed using low power methods. However, recent research has shown that this assumption is not valid and leads to lower circuit reliability and reduced manufacturing yield [42, 134, 138]. For example, it was reported in [148] that a VLSI chip can dissipate up to three times higher power during testing when compared to normal (functional) operation. While some over-stressing of devices during a burn-in phase may be desirable, increasing the power dissipation by several times can be destructive. The additional power dissipation is caused by significantly higher switching activity during testing than in functional operation. This is due to a fundamental conflict between the aims of low power design, where the correlation between input patterns is increased, and traditional DFT methodologies, where correlation between test vectors is decreased in order to reduce test application time.
2.2
Test Power Modeling and Preliminaries
This section overviews test power modeling and provides the preliminary terminology used throughout this book.
2.2.1
Power Dissipation in CMOS Digital Circuits
Total power dissipation in CMOS circuits can be divided into static‚ short circuit‚ leakage and dynamic power dissipation. The static power dissipation is low when compared to the other components [114]. Short circuit power dissipation caused by short circuit current during switching and power dissipated by leakage currents contribute a small fraction of the total power dissipation. The dominant component of total power dissipation is attributed to dynamic power dissipation caused by switching of the gate outputs [122]. If the gate is part of a synchronous digital circuit controlled by a global clock‚ it follows that the dynamic power required to charge and discharge the output capacitance load of every gate is:
$$P_d = \frac{1}{2} \, C_{load} \, \frac{V_{DD}^2}{T_{cyc}} \, N_G \qquad (2.1)$$
where $C_{load}$ is the load capacitance, $V_{DD}$ is the supply voltage, $T_{cyc}$ is the global clock period, and $N_G$ is the total number of gate output transitions ($0 \rightarrow 1$ and $1 \rightarrow 0$). The vast majority of power reduction techniques concentrate on minimizing the dynamic power dissipation by minimizing switching activity. Thus, node transition count
$$NTC = \sum_{\text{all gates } G} N_G \, C_{load} \qquad (2.2)$$
is used as quantitative measure for power dissipation in Chapters 4 and 5. It is assumed that the load capacitance for each gate is equal to the number of fan-outs. The node transition count in scan cells, is considered as in [33], where it was considered that for input changes and whilst for input changes and Similarly, the node transition count in non-scan cells, is considered and It should be noted that non-scan cells are not clocked while shifting out test responses which leads to zero value in NTC. The average value of node transition count (NTC) reported throughout Chapters 4 and 5, is calculated under the assumption of the zero delay model using Equation 2.2. The use of zero delay model is motivated by very rapid computation of NTC required by the algorithms presented in this book, and by the observation that the average power dissipation under the zero delay model has a high correlation to the average power dissipation under the real delay model [129]. Although the power due to glitches is neglected in Equation 2.1, the zero delay model provides reliable relative power information. This means that the savings in NTC, and
savings in power dissipation obtained after technology mapping the circuit and accounting for glitching activity during test application, are within the same range. Since for test-per-scan testing the simulation time to compute NTC increases by a factor of m, a simpler power estimation method is needed. It was shown that weighted transition count (WTC) described in [125] is well correlated with power dissipation. Experiments were performed on ISCAS85 benchmark circuits synthesized in Alcatel MTC35000 technology [3] to verify the suitability of this power estimation method by determining the degree of correlation between WTC values and transistor-level power estimations. As shown later in Chapter 7, the correlation is confirmed since the Pearson correlation coefficients [130] range from 0.86 to 0.98 for WTC values vs. PowerMill power estimations [61]. The WTC values corresponding to scan-in and scan-out respectively are given by:
$$WTC_{scan\text{-}in}(V_j) = \sum_{i=1}^{m-1} \left( S_j(i) \oplus S_j(i+1) \right) (m - i)$$
$$WTC_{scan\text{-}out}(V_j) = \sum_{i=1}^{m-1} \left( R_j(i) \oplus R_j(i+1) \right) i$$
where $S_j(i)$ represents the $i$-th bit from vector $V_j$, $R_j(i)$ is the $i$-th bit of the corresponding response, and $m$ is the number of cells in the scan chain.
2.2.2
Average vs. Peak Power Dissipation
Average power dissipation accurately estimates the average switched capacitance at the internal nodes in the circuit. To compute the average power dissipation, simulation is performed to determine the current waveforms from the supply voltage for a large number of input patterns. The average power dissipation is then computed by determining the average current from the supply. In the functional mode, the average power dissipation is important for mobile computing systems where the battery life is important. Also, average power dissipated during test impacts the reliability of the circuit. Peak power dissipation, given by the maximum sustained power in a circuit, is important for the design of power and ground lines. Furthermore, high values of peak power cause voltage drops which can also invalidate the correctness of the test operation [122]. In order to consider power during test scheduling, the power dissipated by the block under test needs to be modeled using generic power models. The power profiles capture the power dissipation of a block over time when applying a sequence of test vectors to the primary and pseudo inputs of the block. The power profiles give cycle-accurate descriptions of power dissipation, which makes them too complex to be considered in the test scheduling process. Therefore, simpler approximate power models are needed. A suitable model for the power dissipation of system blocks needs to satisfy the following conditions:
Simple - the power model should add very little computational cost to test scheduling.
Reliable - the power model should guarantee that it does not under-estimate power at any time.
Accurate - the power model should not introduce large approximation errors that decrease concurrency and undermine the performance of the power-constrained test scheduling algorithms.
The power approximation model currently used by most of the existing power-constrained test scheduling algorithms is the global peak power approximation model. This power model basically flattens the power profile of a module to the worst case instantaneous power dissipation value. This model is very simple and reliable; however, its low accuracy decreases the test concurrency which can be achieved by the system level test schedule, as emphasized in Chapter 7.
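The loss of concurrency caused by the global peak power approximation can be seen in a small sketch. Two blocks are tested concurrently only if their combined power stays under a power constraint; the check is done once with each profile flattened to its peak and once cycle by cycle. The profiles, the constraint of 70 units, and the function names are hypothetical, and the tests are assumed to start in the same cycle.

```python
def fits_with_global_peak(profiles, p_max):
    """Flatten every profile to its worst-case value, then sum the peaks."""
    return sum(max(profile) for profile in profiles) <= p_max


def fits_cycle_accurate(profiles, p_max):
    """Sum the profiles cycle by cycle; shorter tests contribute nothing once done."""
    length = max(len(profile) for profile in profiles)
    for cycle in range(length):
        total = sum(profile[cycle] for profile in profiles if cycle < len(profile))
        if total > p_max:
            return False
    return True


# Hypothetical power profiles (arbitrary units per clock cycle).
block_a = [10, 50, 10, 10]   # peak early in the test
block_b = [10, 10, 10, 50]   # peak late in the test
print(fits_with_global_peak([block_a, block_b], p_max=70))   # False: 50 + 50 > 70
print(fits_cycle_accurate([block_a, block_b], p_max=70))     # True: worst cycle is 60
```

The flattened model never under-estimates power, so it is reliable, but here it rejects a pair of tests whose peaks do not coincide and which could have run concurrently.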
2.2.3
Terminology Overview
A brief review of the standard terminology used throughout this book is presented. The controlling value for a gate is a single input value that uniquely determines the output to a known value independent of the other inputs to the gate. For example‚ the controlling value for an OR gate is 1‚ and for an AND gate is 0. If the value of an input is the complement of the controlling value‚ then the input has a non-controlling value. A path is a set of connected gates and wires. A path is defined by a single input wire and a single output wire per gate. A signal is a side input to a gate which is on a path that starts from scan latches‚ if the primary inputs can justify the signal’s value. If two faults can be detected by a single test vector‚ then they are compatible faults. Consequently‚ two faults are incompatible faults‚ if they cannot be detected by a single test vector. A test vector from a given test set is an essential test vector‚ if it detects at least one fault that is not detected by any other test vector in this test set. A test vector is non-essential with respect to a given test set if all the faults detected by it are also detected by other test vectors in the given test set. A test set dependent approach for power minimization is dependent on the size and the type of the test set employed during test application. A test set independent approach for power minimization depends only on the circuit structure and savings are guaranteed regardless of the size and the type of the test set.
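The definitions of essential and non-essential test vectors can be made concrete with a short sketch. It assumes that fault simulation has already produced, for every test vector, the set of faults it detects; the vector and fault names and the detection table are hypothetical.

```python
def essential_vectors(detects):
    """Return the vectors that detect at least one fault no other vector detects.

    detects maps each test vector to the set of faults it detects.
    """
    essential = []
    for vector, faults in detects.items():
        # Faults covered by every other vector in the test set.
        covered_elsewhere = set().union(
            *(f for other, f in detects.items() if other != vector)
        )
        if faults - covered_elsewhere:
            essential.append(vector)
    return essential


# Hypothetical detection table for a test set of three vectors.
detects = {
    "v1": {"f1", "f2"},
    "v2": {"f2", "f3"},
    "v3": {"f2"},
}
print(essential_vectors(detects))   # ['v1', 'v2']
```

In this table v3 is non-essential with respect to the test set, since its only detected fault f2 is also detected by v1 and v2.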
2.3
Power Concerns During Test
With the advent of deep sub-micron technology and tight yield and reliability constraints‚ in order to perform a non-destructive test for high performance VLSI circuits power dissipation during test application should not exceed the power constraint set by the power dissipated during functional operation of the circuit [18‚ 42‚ 69‚ 99‚ 149]. This is because excessive power dissipation during test application caused by high switching activity may lead to the following two problems [134‚ 138‚ 148]: (i) Destructive testing may be caused by the excessive heat dissipation during test. The use of special cooling equipment to remove the heat during wafer probing is difficult and costly due to the expensive interface between the ATE and the CUT. Also‚ heat removal limitations hinder multi-site testing‚ thus lowering the test cell throughput. Further‚ when tests are executed at higher levels of integration (board-level test or in-field diagnosis)‚ manufacturing specifications (e.g.‚ packaging constraints‚ heat sinks) may be violated. If the temperature of the circuit under test increases too much‚ it will cause overheating and the circuit will fail. Therefore‚ excessive heat dissipation may lead to permanent damage‚ and thus destructive testing of the CUT. The main reason is that power ratings are underestimated by state of the art simulation/estimation approaches‚ which assume functional signal correlations that are eliminated when DFT methods such as scan or BIST are employed. (ii) Manufacturing yield loss can be caused by the power/ground noise and/or the voltage (IR) drop. Since packaging adds extra cost to the IC‚ wafer probing is an important step used to eliminate the defective chips. However‚ to test unpackaged components at the wafer level using ATE‚ power must be supplied through probes which have higher inductance than the power and ground pins of the circuit package‚ thus leading to greater power/ground noise. 
This noise may cause circuit malfunctioning only during test, thus eliminating good unpackaged chips which function correctly under normal conditions. The same yield reduction problem can happen during the packaged component test (pre-burn-in, post-burn-in, board-level or in-field) due to the large voltage drop. The voltage drop problem is important in the deep-submicron era due to the increased current and wire resistance, and the decreased supply voltage. For example, at a 1.2 V supply, a 0.6 A current flowing through a 0.4 ohm resistance will cause a voltage drop of 0.24 V, which is 20% of the supply voltage. This large voltage drop may cause the devices to run at lower speed, thus leading to performance failures, setup/hold violations or unreliable operation due to a lower noise margin. Voltage drop analysis tools are indeed used to re-design the power/ground networks; however, they are based on the maximum instantaneous current, which assumes the signal correlations present during normal operation. Since these signal correlations do not hold when scan or BIST methodologies are employed, the voltage drop which occurs only during test will cause some good circuits to fail the test, thus leading to unnecessary yield loss.
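The voltage drop figures quoted above follow directly from Ohm's law, as the short sketch below reproduces. The function names are hypothetical, and a single lumped resistance is assumed rather than a full power/ground network.

```python
def ir_drop(current_a, resistance_ohm):
    """Voltage drop (V) across a lumped resistance carrying the given current."""
    return current_a * resistance_ohm


def drop_fraction(current_a, resistance_ohm, supply_v):
    """Voltage drop expressed as a fraction of the supply voltage."""
    return ir_drop(current_a, resistance_ohm) / supply_v


drop = ir_drop(0.6, 0.4)
share = drop_fraction(0.6, 0.4, 1.2)
print(f"{drop:.2f} V, {share:.0%} of the supply")   # 0.24 V, 20% of the supply
```

If the current during test were higher than the functional maximum used for the power/ground network design, the same calculation would give a proportionally larger drop.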
2.4
Sources of Higher Power Dissipation During Test Application
This section reviews low power design techniques and methodologies which lead to the conflict between low power dissipation during functional operation and achieving high testability of the circuit under test. Dynamic power dissipation in CMOS VLSI circuits depends on three parameters: supply voltage‚ clock frequency‚ and switching activity (see Equation 2.1). While the first two parameters reduce power dissipation at the expense of circuit performance‚ power reduction by minimizing switching activity and hence switched capacitance does not introduce performance degradation and it is the main technique researched over the last decade [114]. Depending on the level of abstraction‚ sources of high power dissipation during test application due to increased switching activity can broadly be classified into logic level sources and register-transfer level sources: (i) Sources of high power dissipation during test application caused by design techniques at the logic level of abstraction can further be classified: (a) Low power combinational circuits are synthesized by algorithms [114] which seek to optimize the signal or transition probabilities of circuit nodes using the spatial dependencies inside the circuit (spatial correlation)‚ and assuming the transition probabilities of primary inputs to be given (temporal correlation). The exploitation of spatial and temporal correlations during functional operation for low power synthesis of combinational circuits leads to high switching activity during test application since correlation between consecutive test patterns generated by ATPG algorithms is very low [124]. This is because a test pattern is generated for a given target fault without any consideration of the previous test pattern in the test sequence. Therefore‚ lower correlation between consecutive test patterns during test application may lead to higher switching activity and hence higher power dissipation when compared to functional operation [138]. 
(b) Low power sequential circuits are synthesized by state assignment algorithms which use state transition probabilities [114]. The state transition probabilities are computed assuming the input probability distribution and the state transition graph which are valid during functional operation. These two assumptions are not valid during the test mode of operation when the scan DFT technique is employed. While shifting out test responses, the scan cells are assigned uncorrelated values that destroy the correlation between successive functional states. Furthermore, in the case of data path circuits with a large number of states that are synthesized for low power using the correlations between data transfers [77], in the test mode scan registers are assigned uncorrelated values that are never reached during functional operation, which may lead to higher power dissipation than during the functional operation.
(ii) High power dissipation during test application caused by design techniques at the register-transfer level of abstraction is due to the following. Systems which comprise a large number of memory elements and multi-functional execution units employ power-conscious architectural decisions such as power management where blocks are not simultaneously activated during functional operation [77]. Hence, inactive blocks do not contribute to dissipation during the functional operation. The fundamental premise for power management is that systems and their components experience nonuniform workload during the functional operation [8]. However, such an assumption is not valid during test application. In order to minimize test application time, concurrent execution of tests is required. Therefore, by concurrently executing tests many blocks will be active at the same time, leading to a conflict with the power management policy. This will result in higher power dissipation during test application when compared to functional operation.
The following two examples illustrate the sources of higher switching activity during test application than during normal operation at two different levels of abstraction of the VLSI design flow: logic level (Example 2.1) and register-transfer level (Example 2.2).
Example 2.1 Consider the state transition graph and its circuit implementation shown in Figure 2.1.
The functional description of the state transition graph comprises five states and the circuit implementation consists of the combinational part C and the sequential part S. In order to achieve high test efficiency, scan based DFT is employed and sequential elements are transformed into scan cells with serial input, Scan In, and serial output, Scan Out. To reduce power dissipation during functional operation, state assignment algorithms for low power, outlined in item (i)(b), allocate a code to each state such that the number of transitions (nt) is minimized. However, when scan based DFT is employed, state transition correlations that exist during functional operation are destroyed. This leads to a larger number of transitions during testing, and hence higher power dissipation, as in the case when the test response is shifted out of the sequential part while the next test pattern is shifted in. For example, during testing, some of the resulting state transitions lead to nt = 3, which is higher than any number of transitions during functional operation.
Example 2.2 In order to show that the high test concurrency required for low test application time aimed for by the BIST embedding methodology (Section 1.4.2) leads to higher power dissipation during test application, consider the data flow graph from Figure 2.2 and its low power implementation shown in Figure 2.3. The 13 variables in the data flow graph are mapped to 8 registers and the 6 operations are mapped to 3 functional units (modules) {(*)(+)(–)} (Figures 2.2 and 2.3). According to the variable assignment shown in Figure 2.2, the multiplier (*) is active only in clock cycles 1 and 4, and the adder (+) and the subtracter (–) are active only in clock cycles 2 and 3. Similarly, each register is active only in a subset of the clock cycles: only in clock cycles 1 and 4, only in clock cycle 2, only in clock cycle 3, only in clock cycles 2 and 3, or only in clock cycles 2, 3, and 5. This implies that not all the data path elements are active at the same time, which leads to low power dissipation during functional operation as shown in Figure 2.3. However, if tests for {(*)(+)} are executed at the same time during test application by employing the BIST embedding methodology, modifying some registers to LFSRs and others to MISRs, then modules {(*)(+)} and these registers are active at the same time. Higher switching activity caused by high test concurrency leads to higher power dissipation during test application than during the functional operation.
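The effect described in Example 2.1 can be sketched in a few lines of Python. The state codes, the helper names and the shift convention below are illustrative assumptions and are not taken from Figure 2.1; the sketch only counts how many scan cells toggle per clock cycle (nt) for a functional state sequence and for a scan shift operation.

```python
def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_shift_states(response: str, next_pattern: str) -> list[str]:
    """Contents of the scan chain while `response` is shifted out and
    `next_pattern` is shifted in, one bit per clock cycle.

    Assumed convention: the chain shifts to the right, the rightmost cell
    drives Scan Out, and the rightmost bit of `next_pattern` enters first.
    """
    chain = response
    states = [chain]
    for bit in reversed(next_pattern):
        chain = bit + chain[:-1]
        states.append(chain)
    return states


def max_transitions(states: list[str]) -> int:
    """Largest number of cells that toggle between two consecutive states."""
    return max(hamming(a, b) for a, b in zip(states, states[1:]))


# Hypothetical low power state codes: consecutive functional states
# differ in a single bit.
functional = ["000", "001", "011", "010"]
# Hypothetical test response and next test pattern for a 3-cell chain.
shifting = scan_shift_states("101", "010")

print(max_transitions(functional), max_transitions(shifting))
```

With these assumed codes the functional sequence never toggles more than one cell per cycle, whereas the scan shift toggles all three cells, which mirrors the loss of correlation discussed in the example.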
2.5
Summary
This chapter has motivated the need for low power testing in order to maintain high circuit yield and reliability. As emphasized in the International Technology Roadmap for Semiconductors (ITRS 2001) [4, 70], nanometer technologies with feature sizes under 50nm by the end of the decade, increasing clock frequencies up to 10GHz and SOC integration present severe challenges for DFT methodologies. Therefore, the electronic test industry must handle a large number of issues ranging from high level test methodologies to large in-field test power dissipation for high-performance electronics. ITRS [70] also anticipates that test power management will lower the manufacturing test cost by enabling test cell throughput enhancements in the near-term. Furthermore, in the long-term, decreasing the die thermal density is a major challenge for wafer probe and component test, whose solution will also lower the cost of the DUT to ATE interface.
Chapter 3 APPROACHES TO HANDLE TEST POWER
3.1
Introduction
This chapter gives a review of the recently proposed solutions for dealing with power dissipation during test application. Based on different classification criteria, Section 3.2 gives a taxonomy of the existing approaches which handle test power. Section 3.3 explains the differences between test set dependent and test set independent approaches. Solutions for test-per-clock and test-per-scan testing schemes are outlined in Section 3.4. Since embedded test technology overcomes the problems associated with expensive ATE, the classification based on internal and external test is described in Section 3.5. Two problems which arise in embedded testing are the choice and the number of test sources and sinks and the type of the test control, which are outlined in Section 3.6. Section 3.7 describes power-constrained test scheduling and Section 3.8 gives the summary of this chapter.
3.2
A Taxonomy of the Existing Approaches for Power-Constrained Testing
This section provides a taxonomy of the existing approaches for power-constrained testing of VLSI circuits. Since the test methodology directly influences the techniques used for handling test power dissipation, different criteria can be used for classifying the existing approaches, as shown in Figure 3.1:
Are the values of test vectors used for manufacturing test exploited for power minimization? If the circuit activity is reduced by manipulating the type and order of the manufacturing test vectors then the approach is test set dependent. A rather different approach is based on test set independence, where special architectural decisions, such as multiple scan chains or lower test frequency, are employed (Section 3.3).
How many clock cycles are necessary to apply one test vector? Depending on the testing scheme there can be long periods during which the inputs to the circuit do not carry any useful information, thus leading to unnecessary circuit activity (Section 3.4).
Where are the manufacturing test vectors generated and analyzed? Since the manufacturing test vectors can be generated (or expanded) and analyzed (or compacted) on-chip, thus lowering the necessity and cost of the ATE, it is very important to outline the differences between power-constrained internal and external testing (Section 3.5).
When internal testing is employed, another important issue is how many test sources (generators) and sinks (analyzers) are used. When employing multiple on-chip generators and analyzers, an important question is how the test control is implemented. A single complex test controller defines a centralized approach, which is in contrast to the distributed approach where multiple less complex controllers deal separately with subgroups of test entities (Section 3.6).
The relationship between different test solutions is illustrated in Figure 3.2. The first three classification criteria are orthogonal to each other. For example, on the one hand, one of the first approaches for test power minimization in sequential circuits [17] exploits the values of test vectors and hence it is test set dependent, it uses the redundancies existing in the test-per-scan testing scheme and it employs the external testing approach where test vectors are provided from the ATE. On the other hand, the early work by Zorian [148], that introduces a distributed BIST architecture for handling test power, can be classified as a test set independent approach which applies test vectors in a test-per-clock fashion and uses embedded test technology, such as BIST, for internal testing. Power-constrained test scheduling (Section 3.7) is an additional necessity for the given power ratings and the system-specific test resource allocation conflicts, caused by sharing test sinks/sources (e.g., pattern generators, signature analyzers, ...) or test access mechanisms (e.g., test buses, chip pins, ...). It should be noted that within the previously described criteria there are two main research directions in power-constrained testing. The first direction considers power dissipation during test as the minimization objective. By lowering power dissipation during test, it is guaranteed that the power ratings corresponding to the functional mode are not exceeded. However, in most of the situations this direction leads to higher test application time, which can increase the cost of test. To overcome this problem, the second direction considers power dissipation as a design constraint and the test application time as the minimization objective. These two directions can be merged, thus leading to both low test application time and non-destructive testing.
3.3
Test Set Dependent vs. Test Set Independent Approaches
This section surveys several test set dependent and test set independent approaches to power minimization during test. Note that, due to the orthogonality of the solutions (see Figure 3.2), all of the approaches summarized in this section belong also to one of the classes detailed in the following two sections (i.e., test-per-clock vs. test-per-scan and internal vs. external). The vast majority of power reduction techniques concentrate on minimizing the dynamic power dissipation by minimizing the switching activity. Therefore, since the order in which test vectors are applied to the circuit influences the switching activity, a test set dependent approach for power minimization depends on the size and the type of the test set employed during test application. In the following, a summary of several approaches is given.
An ATPG tool [134, 135, 138] was proposed to overcome the low correlation between consecutive test vectors during test application in combinational circuits. The ATPG tool is based on the modified PODEM algorithm where three new cost functions are introduced: transition probability, transition observability and transition test generation. Since the generated test vector has don’t care bits, a new procedure is used to fill the vector such that the transition activity is reduced. For example, if a bit position of two consecutive vectors changes its value from a controlling value to a non-controlling value of a gate, then the unassigned side inputs of the gate need to have a controlling value in the second test vector. The solution proposed in [135, 138] reduces the switching activity at the expense of higher test application time. To overcome this problem, a very efficient solution has been proposed recently [71]. The method first identifies a set of don’t care inputs for a given compacted test set and then re-assigns the don’t care inputs such that the switching activity is reduced.
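As an illustration of don't care re-assignment, the following Python sketch fills each unspecified bit with the value the same input carried in the previous vector. This simple repeat-fill heuristic is an assumption chosen for brevity and is not the procedure of [71]; the function names and the 'X' notation for don't care bits are likewise hypothetical.

```python
def fill_dont_cares(test_set: list[str]) -> list[str]:
    """Replace every 'X' by the value the same input had in the previous
    vector. Leading X's take the first specified value of that input,
    or '0' if the input is never specified."""
    if not test_set:
        return []
    n_inputs = len(test_set[0])
    columns = []
    for i in range(n_inputs):
        column = [vector[i] for vector in test_set]
        last = next((b for b in column if b != "X"), "0")
        filled = []
        for b in column:
            if b == "X":
                b = last          # repeat the previous value: no transition
            last = b
            filled.append(b)
        columns.append(filled)
    return ["".join(col[j] for col in columns) for j in range(len(test_set))]


def transitions(test_set: list[str]) -> int:
    """Total number of input bit changes between consecutive vectors."""
    return sum(
        sum(a != b for a, b in zip(u, v))
        for u, v in zip(test_set, test_set[1:])
    )


vectors = ["1X0", "X10", "0X1"]      # hypothetical compacted test set
filled = fill_dont_cares(vectors)
print(filled, transitions(filled))
```

Specified bits are never altered, so the fill only affects input activity and leaves the targeted fault detections of each vector in place.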
A different test set dependent approach for minimizing power is based on test vector reordering [17, 33, 38, 45, 50, 51]. The basic idea behind test vector reordering is to find a new order of the set such that correlation between consecutive test patterns is increased as shown in Figure 3.3. For example, considering a p input combinational circuit with a test set of n test vectors (Figure 3.3(a)), swapping the positions of two test vectors will lead to lower power dissipation (Figure 3.3(b)). Test vector reordering is done in a post-ATPG phase with no overhead in test application time since test vectors are reordered such that correlation between consecutive test vectors matches the assumed transition probabilities of primary inputs used for switching activity computation during low power logic synthesis. However, the computation time in [17, 33] is high due to the complexity of the test vector reordering problem which is reduced to finding a minimum cost Hamiltonian path in a complete, undirected, and weighted graph. The high computation time is overcome by the techniques proposed in [38, 45, 50], where test vector reordering assumes a high correlation between switching activity in the CUT and the Hamming distance [38, 50] or transition density [45] at circuits’ primary inputs.
A test set independent approach for power minimization depends only on the circuit structure and savings are guaranteed regardless of the size and the type of the test set. Since power dissipation is dependent on the clock period, a method to reduce power in sequential circuits is to decrease the test frequency [132]. The main shortcoming of this direction is that the testing time increases as the clock frequency decreases. Note that the power supply voltage can also be lowered during test; however, in addition to implicitly increasing the circuit delay and hence decreasing the test frequency, this approach also causes high leakage current and consequently large stand-by power dissipation. To avoid the unnecessary power dissipation in the combinational block during the scan cycle, blocking circuitry can be used for gating the scan cell output [142, 60, 40, 41]. Significant savings in scan power are achieved along with an undesirable delay penalty which has a negative effect on the performance of the circuit in the normal mode of operation. To eliminate the performance degradation, a modified clocking scheme uses two non-overlapping clocks, which work at half of the initial frequency, to operate the odd and the even scan cells of the scan chain [11, 49, 10]. In addition to minimizing the clock tree power, the technique reduces the scan power by a factor of approximately two, and it does not incur any test time penalty. Several gated scan clock-based approaches achieve similar results with the advantage of a variable number of scan chains [127, 140]. The approaches in [127, 140] adapt the scan chain for low power using equal multiple scan chain divisions.
A two dimensional scan array solution was proposed in [141], while an interleaving architecture that adds delay buffers between the scan chains to reduce the peak power of the capture cycle was introduced in [84].
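The Hamming-distance-based test vector reordering discussed earlier in this section can be sketched as follows. The nearest-neighbour rule used here is a generic heuristic for the minimum cost Hamiltonian path problem and is only an assumption for illustration; it is not claimed to be the algorithm of [38, 50], and the example vectors are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of primary inputs that change between two test vectors."""
    return sum(x != y for x, y in zip(a, b))


def total_transitions(vectors: list[str]) -> int:
    """Sum of Hamming distances between consecutive test vectors."""
    return sum(hamming(u, v) for u, v in zip(vectors, vectors[1:]))


def greedy_reorder(vectors: list[str]) -> list[str]:
    """Nearest-neighbour ordering: start from the first vector and always
    append the remaining vector closest to the one applied last."""
    if not vectors:
        return []
    remaining = list(vectors)
    order = [remaining.pop(0)]
    while remaining:
        nearest = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nearest)
        order.append(nearest)
    return order


test_set = ["0000", "1111", "0001", "1110"]   # hypothetical ATPG order
reordered = greedy_reorder(test_set)
print(total_transitions(test_set), total_transitions(reordered))
```

The heuristic runs in quadratic time in the number of vectors and gives no optimality guarantee; it only relies on the assumed correlation between input Hamming distance and switching activity in the CUT.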
3.4
Test-per-Clock vs. Test-per-Scan
In a scan environment, a test pattern is applied after shifting in the test vector and shifting out the test response. Therefore, the testing scheme for a scan methodology is called test-per-scan, which is unlike the test-per-clock testing scheme where a test pattern is applied in every clock cycle. In a test-per-scan scheme, the values of the primary inputs are important only in the capture cycle. A fast procedure to compact scan vectors as much as possible without exceeding the power ratings was proposed in [125]. The trade-offs between the test data and test power have been examined in [115] and the results confirm
that both test logic power and test scan power need to be tackled (see also [10, 11, 49, 127, 140]). Consequently a new test-per-scan technique was proposed in [126], where scan chain disable has been employed to prevent the scan cells from transitioning during some parts of the test. In a test-per-clock scheme, to minimize power dissipation in non-scan sequential circuits during test application a test pattern generation methodology for low power dissipation was proposed in [26]. The methodology is based on three independent steps comprising redundant test pattern generation, power dissipation measurement and optimum test sequence selection. The solution proposed in [26], which is based on genetic algorithms, achieves savings in power dissipation, however, cannot be applied to scan sequential circuits where shifting power dissipation is the major contributor to the total power dissipation. Many other intrusive BIST techniques, summarized in the following section, exploit the features of a test-per-clock scheme.
3.5
Internal Test vs. External Test
The external test technique presented in [33] is based on test vector and scan cell reordering and it minimizes power dissipation in full scan sequential circuits without any overhead in test area or performance degradation as shown in Figure 3.7. The input sequence at the primary and pseudo inputs of the CUT while shifting out test response in the case of standard scan design (Figure 1.3 from Section 1.3) is significantly modified when reordering scan cells and test vectors.
The new sequence obtained after reordering will lead to lower switching activity and hence lower power dissipation due to higher correlation between consecutive patterns at the primary and pseudo inputs of the CUT. A further benefit of the post-ATPG technique proposed in [33] is that minimization of power dissipation during test application is achieved without any decrease in fault coverage and/or increase in test application time. The technique is test set dependent, which means that power minimization depends on the size and the value of the test vectors in the test set. Due to its test set dependence, the technique proposed in [33] is computationally infeasible due to the large computation time required to explore the large design space. A different approach to achieve power savings is the use of extra primary input vectors, which leads to a supplementary volume of test data [62, 64, 136]. The technique proposed in [136] exploits the redundant information that occurs during scan shifting to minimize switching activity in the CUT as shown in Figure 3.8. While shifting out the pseudo output part of the test response during the clock cycles of the scan cycle, the value of the primary inputs is redundant. Therefore, this redundant information can be exploited by computing an extra primary input vector for each clock cycle t + k, with k = 0, ..., m – 1, of the scan cycle of every test pattern.
However, despite achieving considerable power savings the technique requires large test application time which is related to a long computation time and a large volume of test data. The volume of test data is reduced in [62] where a D-algorithm like ATPG [1] is developed to generate a single control vector to mask the circuit activity while shifting out the test responses. Unlike the technique proposed in [136] based on a large number of extra primary input vectors, the solution presented in [62] employs a single extra primary input vector for all the clock cycles of the scan cycle of every test pattern (Figure 3.9). The input control technique proposed in [62] can further be combined with scan cell and test vector reordering [33] to achieve, however, modest savings in power dissipation despite a substantial reduction in volume of test data when compared to [136]. When employing internal test the most popular test method is BIST. For combinational circuits employing BIST several techniques for minimizing power were proposed [12, 27, 28, 30, 31, 44, 46, 48, 88, 89, 90, 137, 143, 144, 146, 147]. In [137] the use of dual speed linear feedback shift register (DS-LFSR) lowers the transition density at the circuit inputs leading to minimized power dissipation. The DS-LFSR operates with a slow and a normal speed LFSR in
order to increase the correlation between consecutive patterns. It was shown in [137] that test efficiency of the DS-LFSR is higher than in the case of the LFSR based on a primitive polynomial with a reduction in power dissipation at the expense of more complex control and clocking. In [146] optimum weight sets for input signal distribution are determined in order to minimize average power, while the peak power is reduced by finding the best initial conditions in the cellular automata cells used for pattern generation [143]. It was proven in [12] that all the primitive polynomial LFSR of the same size, produce the same power dissipation in the circuit under test, thus advising the use of the LFSR with the smallest number of XOR gates since it yields the lowest power dissipation by itself. A mixed solution based on re-seeding LFSRs and test vector inhibiting to filter a few non-detecting sub-sequences of a pseudorandom test sequence was proposed in [44, 48]. A sub-sequence is non-detecting if all the faults found by it are also observed by other detecting sub-sequences from the pseudorandom test sequence. An enhancement of the test vector inhibiting technique was presented in [48, 88, 89, 90] where all the non-detecting subsequences are filtered. The basic principle of filtering non-detecting sequences is to use decoding logic to detect the first and the last vectors of each nondetecting sequence. After the detection of the first vector of a non-detecting
sequence, the inhibiting structure using a transmission gate network enabling signal propagation, prevents the application of test vectors to the CUT. To increase the test efficiency by detecting random pattern resistant faults with a small test sequence, an enhanced BIST structure based on re-seeding the LFSR was proposed. The particular feature of the proposed BIST structure is that the seed memory is composed of two parts: the first part contains seeds for random pattern resistant faults and the second part contains seeds to inhibit the non-detecting sequences [44]. The seed memory combined with the decoding logic (Figure 3.10(b)) is better than only decoding logic (Figure 3.10(a)) in terms of low power dissipation and high fault coverage, at the expense of higher BIST area overhead.
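The principle of test vector inhibiting can be sketched in Python under a few stated assumptions: a 4-bit Fibonacci LFSR with feedback polynomial x^4 + x^3 + 1, a hypothetical set of non-detecting pattern indices (in practice obtained by fault simulation), and the simplification that the CUT inputs hold the last applied vector while the LFSR keeps running. None of these choices is taken from [44, 48].

```python
def lfsr_sequence(seed: int, taps: tuple[int, ...], width: int, count: int) -> list[int]:
    """First `count` states of a Fibonacci LFSR. `taps` are 0-based bit
    positions whose XOR is shifted into bit 0 on every clock."""
    mask = (1 << width) - 1
    state = seed & mask
    states = []
    for _ in range(count):
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
    return states


def inhibit(patterns: list[int], blocked: set[int]) -> list[int]:
    """Vectors seen by the CUT when the pattern indices in `blocked` are
    inhibited: the inputs keep the last vector that was applied."""
    applied = []
    for i, p in enumerate(patterns):
        if i in blocked and applied:
            applied.append(applied[-1])   # decoding logic blocks this vector
        else:
            applied.append(p)
    return applied


def input_transitions(vectors: list[int]) -> int:
    """Total number of CUT input bit changes over the sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(vectors, vectors[1:]))


patterns = lfsr_sequence(seed=0b0001, taps=(3, 2), width=4, count=15)
non_detecting = {3, 4, 5, 6}              # hypothetical sub-sequence
print(input_transitions(patterns),
      input_transitions(inhibit(patterns, non_detecting)))
```

Because holding the inputs replaces a path of several vector changes by a single change, the inhibited sequence can never produce more input transitions than the original one in this model.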
A different approach for filtering non-detecting vectors inspired by the precomputation architecture is presented in [28]. The MASK block shown in Figure 3.11 is a circuit with a latch-based architecture or AND-based architecture which either eliminates or keeps unaltered the vectors produced by the LFSR. The enable logic implements an incompletely specified Boolean function whose on-set is the set of the unaltered vectors and whose off-set is the set of the eliminated (non-detecting) vectors [28]. An improvement in area overhead associated with filtering non-detecting vectors without penalty in fault coverage or test application time was achieved using a non-linear hybrid cellular automata [27]. The hybrid cellular automata generate test patterns for the CUT using cell configurations optimized for low power dissipation under given fault coverage and test application time constraints. The regularity of multiplier modules and linear sized test set required to achieve high fault coverage lead to efficient low power BIST implementations for data paths [6, 52, 53, 54, 68, 76]. Regardless of the implementation type of the test pattern generator, BIST architectures differ one from another in terms of power dissipation [118]. The three different architectures were evaluated for power dissipation, BIST area overhead and test application time. It was found in [118] that the architecture consisting of an LFSR and a shift register (SR) produces lower power dissipation, BIST area overhead and test application time when compared to a single LFSR and two LFSRs with reciprocal characteristic polynomials. However, this is achieved at the expense of lower fault coverage and hence reduced test efficiency due to the modified sequence of patterns applied to the CUT, which does not detect all the random pattern resistant faults. To minimize shifting power dissipation in scan BIST circuits, some extra logic is introduced between the combinational logic and the scan chain and the
scan cells are reordered such that the number of signal transitions is minimized [145]. In another approach the test vector inhibiting techniques proposed for combinational circuits are extended to scan sequential circuits [29]. The scan BIST methodology needs to be extended with decoding logic and an AND gate to enable pattern shifting. Based on the content of the LFSR the decoding logic detects whether the test pattern to be shifted belongs to the subset of the detecting sequences. If the pattern is non-detecting the propagation through the SR and scan chain is stopped. In [40, 41] the test vector inhibiting technique is extended where the modules and modes with the highest power dissipation are identified, and gating logic is introduced in order to reduce power dissipation. Highly correlated patterns are employed in the low transition random test pattern generator (LT-RTPG) proposed in [139], where neighboring bits of the test vectors are assigned identical values in most test vectors. Another approach suggested to operate LFSRs in a parallel mode, such that, instead of updating scan cells in every clock cycle, the taps are moved while the sequential elements remain static [86]. The goal of clocking the least number of sequential elements was achieved, a goal which was further improved in [57] using non-primitive polynomials with two taps. Other multiphase clocking techniques based on a hybrid single/multiphase approach and on token scan cells have been proposed in [63] and [65, 66].
A mixed internal/external test solution is the use of the recently proposed [19, 56, 121] test data compression/decompression methods. These methods do not introduce performance penalty and guarantee full reuse of the existing embedded cores, as well as the ATE infrastructure, with minor modifications required during system test preparation (compression) and test application (decompression). Hence, test data compression/decompression is an efficient complementary alternative to BIST, and requires further investigation [56]. In order to lower test power, the approach reported in [19] has shown that Golomb coding achieves high compression ratios along with savings in scan-in power. The ’0’ don’t care (DC) mapping was chosen since long runs of ’0’s have short Golomb codes, thus high compression ratios can be attained on test sets containing long and frequent runs of ’0’s. However, the ’0’ DC mapping exhibits higher transition counts than the minimum transition count (MTC) DC mapping due to the occurrences of runs of ’0’s bounded by ’1’s in the ’0’ DC mapped test set. This is addressed by the coding scheme proposed in [121], which exploits the characteristics of the low power test sets generated using MTC DC mapping. The low power code is composed of a “sign-bit” followed by the traditional Golomb code corresponding to a run of ’0’s. The value of the “sign-bit” differentiates the two types of runs: value ’0’ for a run of ’0’s and value ’1’ for a run of ’1’s. Comparing columns 4 and 7 shown in Table 3.1, it can easily be observed that for runs of ’0’s the coding proposed in [121] has
1 bit longer codes than Golomb, however for runs of ’1’s the proposed coding provides significant improvement.
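The relationship between the two codes can be made concrete with a short Python sketch. It assumes the usual Golomb construction for a group size m that is a power of two (a unary prefix of '1's closed by a '0', followed by a fixed-length binary tail); the function names are hypothetical and the exact codeword tables of [19] and [121] (Table 3.1) are not reproduced here.

```python
def golomb_code(run_length: int, m: int) -> str:
    """Golomb codeword of a run length for group size m (a power of two):
    quotient in unary ('1' * q + '0'), remainder in log2(m) binary bits."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two and at least 2")
    tail_bits = m.bit_length() - 1
    quotient, remainder = divmod(run_length, m)
    return "1" * quotient + "0" + format(remainder, f"0{tail_bits}b")


def low_power_code(run_value: str, run_length: int, m: int) -> str:
    """Sign-bit variant: '0' marks a run of 0s, '1' a run of 1s, followed
    by the Golomb codeword of the run length."""
    if run_value not in ("0", "1"):
        raise ValueError("run_value must be '0' or '1'")
    return run_value + golomb_code(run_length, m)


for length in (0, 5, 9):
    plain = golomb_code(length, 4)
    signed = low_power_code("0", length, 4)
    print(length, plain, signed, len(signed) - len(plain))
```

For every run of ’0’s the sign-bit version is exactly one bit longer than the plain Golomb codeword, which is the overhead noted above; the benefit appears for runs of ’1’s, which the plain code cannot represent as a single run.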
3.6
Single vs. Multiple Test Sources and Sinks
Circuit partitioning into sub-circuits and power-conscious sub-circuit test planning have an important influence on power dissipation [43]. The main justification for circuit partitioning is to obtain two different structural circuits of approximately the same size, so that each circuit can be successively tested in two different sessions. In order to minimize the BIST area overhead of the resulting BIST scheme, the number of connections between the two subcircuits has to be minimum. It was shown in [43, 47] that by partitioning a single circuit into two sub-circuits and executing two successive tests, savings in power dissipation can be achieved with roughly the same test application time as in the case of the single circuit. Since the test application time and power reduction depend on the circuit partitioning method, multilevel graph partitioning has proved to be an efficient method. Unlike the case of external testing where there is only one test source and sink, the ATE, in the case of internal testing the potentially large number of onchip test sinks/sources and test access mechanisms (TAMs) can also determine the type of test control: centralized or distributed. Having identified the test resources, including their sharing and sequencing, the test schedules can be determined such that the resource conflicts are resolved.
3.7
Power-Constrained Test Scheduling
Numerous power-constrained test scheduling algorithms were proposed [24, 25, 78, 79, 80, 81, 92, 93, 94, 95, 96, 97, 98, 117, 119, 120, 128, 148]. The approach in [148] schedules the tests under power constraints by grouping and ordering based on floorplan information. A further exploration in the solution space of the scheduling problem is provided in [25] where a resource allocation graph formulation (Figure 1.8(a) from Section 1.4.2) for the test scheduling problem is given and tests are scheduled concurrently without exceeding their power constraint during test application. To simplify the scheduling problem the worst case power dissipation (maximum instantaneous power dissipation) is used to characterize the power constraint of each test as shown in Figure 3.14(a). The test compatibility graph introduced in Figure 1.8(b) is annotated with power and test application time information as shown in Figure 3.14(b). The power rating characterized by maximum power dissipation (Figure 3.14(a)) and test application time are used for scheduling unequal length tests under a power constraint. To overcome the clique covering problem [25], which is a well known NP-hard problem, the solutions proposed in [92, 93,
96] use list scheduling, left edge algorithm and a tree growing technique as heuristics for the block test scheduling problem. Power-constrained test scheduling is extended to SOCs in [16, 67, 80, 117, 119]. A test infrastructure and power-constrained test scheduling algorithms for a scan-based architecture are presented in [78, 79]. Also, following the initial work on BIST for low power RAM testing [22], new approaches [21, 133] are emerging where the focus shifts towards considering power-constrained testing of SRAM clusters and heterogeneous SOC designs. Most of the approaches have assumed that a fixed amount of power dissipation is associated with each test. This is an optimistic assumption which is not always valid due to the useless power which “leaks” through the untested modules [128]. Also, the simplicity of the generally accepted global peak approximation [25] used
Approaches to Handle Test Power
49
as a power model leads to reduced test concurrency. As shown in Chapter 7, this problem can be overcome by using a simple, reliable, and accurate power model based on two local peaks and incorporated in the power profile manipulation approach [120].
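As a rough illustration of scheduling under the global peak approximation, the following Python sketch packs tests into sessions so that the sum of the peak power ratings in a session never exceeds a limit and tests that share a resource are never run together. It is a simple first-fit heuristic written for this illustration, not any of the cited algorithms; the function names, the data layout and the example values are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical data model: test name -> (peak power, test application time).
Tests = Dict[str, Tuple[float, int]]


def schedule_sessions(tests: Tests,
                      conflicts: Set[FrozenSet[str]],
                      p_max: float) -> List[List[str]]:
    """First-fit packing of tests into sessions under a peak power limit."""
    # Place the longest tests first so shorter tests can use the remaining headroom.
    order = sorted(tests, key=lambda name: tests[name][1], reverse=True)
    sessions: List[List[str]] = []
    for name in order:
        power = tests[name][0]
        if power > p_max:
            raise ValueError(f"{name} alone exceeds the power limit")
        for session in sessions:
            used = sum(tests[t][0] for t in session)
            compatible = all(frozenset((name, t)) not in conflicts for t in session)
            if compatible and used + power <= p_max:
                session.append(name)
                break
        else:
            sessions.append([name])  # no existing session fits: open a new one
    return sessions


def total_time(tests: Tests, sessions: List[List[str]]) -> int:
    """Sessions run one after another; each lasts as long as its longest test."""
    return sum(max(tests[t][1] for t in s) for s in sessions)


# Made-up example: four tests, t1 and t2 share a test resource.
example = {"t1": (4, 10), "t2": (3, 8), "t3": (5, 6), "t4": (2, 4)}
shared = {frozenset(("t1", "t2"))}
plan = schedule_sessions(example, shared, p_max=9)
print(plan, total_time(example, plan))
```

Because every test is charged its peak power for the whole session, short low-power tests cannot overlap the quiet phases of longer tests, which is the loss of concurrency attributed above to the global peak approximation.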
3.8
Summary
Throughout this chapter a number of approaches to handle test power have been outlined. Three orthogonal directions have been identified based on test set dependence, the number of clock cycles required to apply a test vector, and the location of test sources and sinks. Since test resource sharing determines resource allocation conflicts, power-constrained test scheduling was also summarized.
Chapter 4 POWER MINIMIZATION BASED ON BEST PRIMARY INPUT CHANGE TIME
4.1
Introduction
To decrease the complexity of ATPG for sequential circuits, structured DFT is required. The most commonly used DFT method for increasing the testability of VLSI digital circuits is the scan method [1]. The scan method makes sequential elements (latches or flip flops) controllable and observable by chaining them into a scan chain (shift register). To reduce the performance degradation, test area overhead and test application time associated with full scan, partial scan has been investigated over the last two decades [2]. Partial scan selects a small number of scan cells, which allows ATPG to achieve a high fault coverage in a short computation time. Due to the increasing complexity of very deep sub-micron VLSI circuits, scan chains are inserted in a structural network of logic gates at the logic level of abstraction of the VLSI design flow. Therefore, alternative solutions for power minimization in scan sequential circuits are most effectively explored at the logic level of abstraction. This is illustrated in Figure 4.1, where scan cells can be inserted either prior to or after the logic optimization phase. The design is specified in an HDL at the register-transfer level (RTL) of abstraction of the VLSI design flow, and RTL synthesis translates the initial design into a network of logic gates before logic optimization satisfies the area and delay constraints and prepares the design for the physical design automation tools [14]. This chapter addresses power minimization during test application in small to medium sized scan sequential circuits by analyzing and exploiting the influence of primary input change time on power dissipation during test application. When ATE channels directly control the primary inputs of the CUT, the primary input values are redundant while shifting out the test responses, and consequently this redundancy can be exploited
for defining new test application strategies which do not affect test efficiency. A test application strategy based on best primary input change (BPIC) time in scan sequential circuits is described. Also, the effect on power dissipation of combining the primary input change time with test vector and scan cell reordering [33] is discussed. The rest of the chapter is organized as follows. In Section 4.2, the parameters which are accountable for power dissipation in scan circuits during test application are described. Section 4.3 explains why the primary input change time has a strong impact on reducing spurious transitions during test application. Algorithms for exploiting the parameters which lead to savings in power dissipation are introduced in Section 4.4. Experimental results and a comparative study of full scan and partial scan from the power dissipation standpoint are presented in Section 4.5. Finally, concluding remarks are given in Section 4.6.
4.2
Scan Cell and Test Vector Reordering
This section reviews scan cell and test vector reordering described in [33] for full scan sequential circuits and investigates its applicability to partial scan sequential circuits.
4.2.1
Full Scan Sequential Circuits
Previous research has established that the node transition count, introduced in Chapter 2, in full scan sequential circuits depends on two factors, test vector reordering and scan cell reordering, when the circuit is in the test mode [33]. The following example shows how test vector and scan cell reordering affect the circuit activity during test application in full scan sequential circuits.
Example 4.1 To illustrate the factors accountable for power dissipation, consider the s27 circuit (Figure 4.2) from the ISCAS89 benchmark set [13], which comprises primary inputs, scan cells with their present state lines, and a circuit output. Using the GATEST [123] ATPG tool, it was shown that 5 test vectors are needed to achieve 100% fault coverage. The test vectors are {1101011, 0000000, 0010010, 0111111, 1100010}, and each consists of primary and pseudo inputs. Assuming that initially all the primary and pseudo inputs are set to 0, the node transition count is calculated as NTC = 372. By reordering the test vectors, a lower value of the node transition count is obtained, NTC = 352. This shows that reordering of test vectors reduces power dissipation during test application by increasing the correlation between consecutive test vectors. Note that the NTC is computed over the entire test application period of n × (m + 1) + m clock cycles, where n is the number of test vectors and m is the number of scan cells. Now the effect of scan cell reordering on power savings is examined. Starting from the reordered test vector set and also reordering the scan cells as shown in Figure 4.3, the value of node transition count is further reduced to NTC = 328. This reduction is due to the higher correlation between successive states during shifting in test vectors and shifting out test responses. If test vector reordering and scan cell reordering are done simultaneously, a further reduction in node transition count is achieved, NTC = 296. This shows that scan cell reordering and test vector reordering are interrelated and can lead to higher savings than when either scan cell reordering or test vector reordering is considered separately.
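The length of the test application period and the notion of correlation between consecutive test vectors can be made concrete with a short Python sketch. The Hamming distance between successive vectors is used here only as a crude stand-in for correlation: it is an assumption of this illustration and is not the node transition count of Chapter 2, which requires simulating the circuit. All function names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple

# Test vectors of Example 4.1, in the order listed there.
S27_VECTORS = ["1101011", "0000000", "0010010", "0111111", "1100010"]


def application_cycles(n: int, m: int) -> int:
    """Clock cycles of the whole test application: n x (m + 1) + m."""
    return n * (m + 1) + m


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def order_cost(vectors: Sequence[str], order: Sequence[int]) -> int:
    """Sum of Hamming distances along an order, starting from all zeros."""
    prev = "0" * len(vectors[0])
    cost = 0
    for idx in order:
        cost += hamming(prev, vectors[idx])
        prev = vectors[idx]
    return cost


def best_order(vectors: Sequence[str]) -> Tuple[int, ...]:
    """Exhaustive search over all n! orders; only feasible for tiny test sets."""
    return min(permutations(range(len(vectors))),
               key=lambda order: order_cost(vectors, order))


print(application_cycles(n=5, m=3))           # 5 vectors, 3 scan cells
print(order_cost(S27_VECTORS, range(5)))      # proxy cost of the listed order
print(best_order(S27_VECTORS))                # an order with the lowest proxy cost
```

The order returned by this proxy need not coincide with the order that minimizes NTC; it only shows why placing similar vectors next to each other tends to reduce switching.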
4.2.2
Partial Scan Sequential Circuits
It was shown in Example 4.1 how test vector reordering affects circuit activity and hence power dissipation in full scan sequential circuits. However, the test vector reordering described in [33] cannot be used for partial scan, because the fault activation and fault-effect propagation sequences through non-scan cells require a fixed test vector order [2]. On the other hand, scan cell reordering can be applied to partial scan sequential circuits as shown in the following example.
Example 4.2 To investigate the influence of scan cell reordering on power dissipation during test application in a partial scan sequential circuit, consider the simple circuit shown in Figure 4.4, which comprises primary inputs, scan cells, a non-scan cell, present state lines, and a circuit output. The scan cells are selected using the logic level partial scan tool OPUS [23]. Using the logic level ATPG tool GATEST [123], 6 test vectors are generated to achieve 100% fault coverage. The test vectors are {1011110, 0001010, 0111010, 0110100, 1010111, 0100101}. Each test vector consists of a primary input part and a present state part. Initially all the primary inputs and present state lines are considered 0, and using Equation 2.2 the node transition count is calculated as NTC = 224. By reordering the scan cells as shown in Figure 4.5, the value of the node transition count is reduced to NTC = 216. The techniques shown in Examples 4.1 and 4.2 yield savings in NTC, and hence in power dissipation. To further reduce power dissipation during test application in the circuit under test, a test application strategy is described in the following section.
4.3
A Technique for Minimization of Power Dissipation
In this section the influence of primary input change time on the reduction of spurious transitions, and hence savings in the total number of transitions, is demonstrated through detailed examples.
4.3.1
Full Scan Sequential Circuits
An overview of testing scan sequential circuits is provided first. For a scan sequential circuit, each test vector applied to the circuit under test is the concatenation (denoted by @) of a primary input part and a pseudo input (present state) part. Given m scan cells, the present state part of each test vector is shifted in over m clock cycles. In the case of partial scan sequential circuits, the non-scan cells preserve their value during these m clock cycles. In the next clock cycle the entire test vector is applied to the circuit under test. A scan cycle represents the m + 1 clock cycles required to shift in the present state part of the test vector and apply the entire test vector to the circuit under test. In the first m clock cycles of the next scan cycle the test response is shifted out simultaneously with shifting in the present state part of the next test vector. The values of the primary inputs are important only in the last clock cycle of the scan cycle, when the entire test vector is applied. Therefore, the primary inputs can be changed in any clock cycle of the scan cycle, up to and including the one in which the test vector is applied, without affecting test efficiency. The transitions which occur in the combinational part, without any influence on test efficiency or test data, are defined as follows. Definition 4.1 A spurious transition during test application in a scan sequential circuit is a transition which occurs in the combinational part of the circuit under test while shifting out the test response and shifting in the present state part of the next test vector. These transitions do not have any influence on test
efficiency since the values at the input and output of the combinational part are not useful test data. In Example 4.1 it was assumed that the primary inputs change in the first clock cycle of each scan cycle. The following definitions introduce two test application strategies that will be used throughout this book: Definition 4.2 The test application strategy where primary inputs change in the first clock cycle of the scan cycle is called as soon as possible (ASAP). Definition 4.3 The test application strategy where primary inputs change in the last clock cycle of the scan cycle, after the m shift cycles, is called as late as possible (ALAP), where m is the number of sequential elements converted to scan cells. Having introduced the ASAP and ALAP test application strategies, the following example shows their shortcomings and the need for an improved test application strategy. Example 4.3 For the particular example in Figure 4.6, where the number of scan cells is 3, the scan cells are in the shift mode during the three shift clock cycles and the values on the input lines of the combinational part of the circuit are irrelevant. The value of primary inputs is important only in the following clock cycle, when the entire test vector is applied to the combinational part of the circuit. Therefore, the primary inputs can keep the value of the previous test vector during the shift cycles without affecting the testing process. To illustrate the importance of primary input change time, consider the application of test vector {0000000} followed by test vector {1101011}. The circuit lines are described in terms of three values, one per clock cycle. For example, in Figure 4.6(a), a primary input value of 0/1/1 denotes value 0 when applying {0000000} and value 1 in the following two clock cycles, when shifting in the second test vector {1101011}. When primary inputs change under the ASAP test application strategy, as shown in Figure 4.6(a), the two marked boxes illustrate spurious transitions 0/1/0 and 1/0/1 at the output of the marked NOR and NOT gates respectively.
Since the value of primary inputs is irrelevant during shifting out the test response, if the primary inputs are changed later the controlling value 1 at the input of the marked NOR gate is preserved and no spurious transitions at the output of the marked NOR and NOT gates will occur, as shown in Figure 4.6(b). The primary inputs can keep their values until test vector {1101011} is applied to the circuit (ALAP test application strategy). However, changing the primary inputs as late as possible will not yield the minimum number of transitions either, as demonstrated in Figures 4.6(c) and 4.6(d) using the same test vectors. In Figure 4.6(c), a primary input value of 0/0/1 denotes value 0 in two clock cycles when shifting in {1101011} and value 1 when applying {1101011}. When primary inputs
are changed as late as possible, as shown in Figure 4.6(c), the marked box illustrates a spurious transition 0/1/0 at the output of the marked AND gate. However, if the primary inputs are changed earlier, the controlling value 0 at the input of the marked AND gate is preserved and no spurious transition at the output of the marked AND gate will occur, as shown in Figure 4.6(d). It was shown in Example 4.3 that both the ASAP and ALAP test application strategies lead to spurious transitions during shifting in test vectors and shifting out test responses. The question now is when the primary inputs should change such that the smallest number of spurious transitions occurs, which leads to lower power dissipation. Before introducing the test application strategy which reduces spurious transitions during test application, the following necessary definition is given. Definition 4.4 The best primary input change time of a test vector is the time when the primary input part of the previous test vector changes to the primary input part of the actual test vector, leading to the smallest value of node transition count during the scan cycle in which the test vector is applied after its predecessor.
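The ASAP and ALAP strategies, and every change time in between, can be pictured as the choice of one index per scan cycle. The Python sketch below is an illustration only: the function name, the zero-based index k (k = 0 for ASAP, k = m for ALAP) and the assumption that the first four bits of each 7-bit s27 vector form the primary input part are choices made here, not notation taken from the text.

```python
from typing import List


def primary_input_schedule(prev_pi: str, next_pi: str, m: int, k: int) -> List[str]:
    """Primary input values for the m + 1 clock cycles of one scan cycle.

    The inputs hold the previous vector's primary input part until cycle k
    and carry the next vector's primary input part from cycle k onwards.
    Cycle m is the one in which the whole test vector is applied, so any
    k in 0..m leaves the applied vector unchanged.
    """
    if not 0 <= k <= m:
        raise ValueError("change time must lie within the scan cycle")
    return [prev_pi if t < k else next_pi for t in range(m + 1)]


# {0000000} followed by {1101011}, three scan cells (m = 3).
prev_pi, next_pi, m = "0000", "1101", 3
for k in range(m + 1):
    print(k, primary_input_schedule(prev_pi, next_pi, m, k))
```

All m + 1 schedules end with the same primary input values in the last cycle, which is why the choice of change time affects only the spurious transitions and not the test itself.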
Finding the best primary input change time will lead to higher correlation between consecutive values on the input lines of the combinational part of the circuit. This leads to the minimum value of NTC during the scan cycle, and yields savings in power dissipation. Definition 4.5 is used to introduce the test application strategy. Definition 4.5 The test application strategy where the best primary input change time for each of the n test vectors is determined such that the minimum value of node transition count over the entire test application period is achieved, is referred to as the BPIC test application strategy. Figures 4.6(a) to 4.6(d) have illustrated the reduction of spurious transitions over a period of three clock cycles. The following example gives insight into the BPIC technique for power dissipation minimization during the entire test application period when applied to full scan sequential circuits. Example 4.4 Tables 4.2(a) and 4.2(b) show the flow of test data for the benchmark circuit s27 of Figure 4.6 for the ASAP and BPIC test application strategies respectively. In Table 4.2(a) consider the scan cell order and
test vector order after simultaneous test vector reordering and scan cell reordering was carried out. The first column shows the clock cycle index and the second column outlines the test vector which is scanned in during the shift cycles and applied in the last clock cycle of its scan cycle. The operation type (Scan or Load) is shown in column 3. In the case of a Scan operation the fourth column gives the ScanIn value. Columns 5-8 show the values of primary inputs and columns 9-11 show the next state values. The last column shows the value of the node transition count NTC for each clock cycle. The NTC is calculated as follows. In clock cycle i the NTC in the combinational part is computed by considering the primary inputs of clock cycle i and the next state values of clock cycle i – 1, which are present state values at clock cycle i. The NTC in the sequential part is the sum of the NTC of each scan cell when scanning/loading the next state values in clock cycle i, using the values outlined in Chapter 2. Initially all the primary and scan inputs are set to 0 and the node transition count over the entire test application period under the ASAP test application strategy is NTC = 296. This value can be reduced if spurious transitions during shifting in test vectors and shifting out responses are avoided by modifying the primary input change time. The
change of the primary input part of a test vector is marked at the corresponding clock cycle in Table 4.2(b). If the primary input change times are set as shown in the marked boxes of Table 4.2(b), the node transition count reduces to NTC = 266. The reason for the reduced number of transitions is the increased correlation between consecutive values of primary and pseudo inputs while test vectors are scanned in and test responses are scanned out. For example, by changing the primary inputs at the best time, the NTC in clock cycles 12 and 14 is reduced from 16 and 24, respectively, in the case of ASAP (Table 4.2(a)) to 10 and 14, respectively, in the case of BPIC (Table 4.2(b)). Note that in clock cycles when test responses are loaded in scan cells (L in column 3), the correct test response values from Table 4.2(a) are preserved. By combining primary input change time with simultaneous scan cell reordering and test vector reordering, further improvements are achieved: the node transition count is reduced further to NTC = 251 (Table 4.2(c)). This highlights the importance of combining the best primary input change time with simultaneous scan cell and test vector reordering for NTC reduction.
4.3.2
Applicability of BPIC Strategy to Partial Scan Circuits
Having introduced the BPIC test application strategy for full scan sequential circuits, this subsection gives two detailed examples showing that the BPIC test application strategy is applicable to partial scan sequential circuits. The importance of combining the BPIC test application strategy with scan cell reordering is outlined. So far it was assumed that the primary inputs of the circuit shown in Figure 4.4 change at a fixed clock cycle of each scan cycle. To illustrate the importance of primary input change time on the number of spurious transitions in a partial scan sequential circuit, consider the following example.
Example 4.5 Consider the application of two consecutive test vectors to the circuit of Figure 4.7. The circuit data lines are described in terms of four values, one per clock cycle. For example, in Figure 4.7(a), a primary input value of 0/0/0/1 denotes value 0 when applying the first test vector, value 0 in each of the two clock cycles when shifting in the present state part of the second test vector, and value 1 when applying the second test vector. When primary inputs change this late, as shown in Figure 4.7(a), the two marked boxes illustrate the spurious transition 0/0/1/0 at the output of the marked AND gate, which further propagates to the output of the marked OR gate. However, the value of primary inputs is irrelevant during shifting out the test response. Thus, the primary inputs can be changed as early as the first clock cycle after the first test vector is applied to the circuit under test. When primary inputs change at this earliest time,
as shown in Figure 4.7(b), the controlling value 0 at the input of the marked AND gate is preserved and no spurious transitions at the output of the marked AND and OR gates will occur. However, changing the primary inputs this early does not yield the minimum value of node transition count. The marked box in Figure 4.7(b) illustrates a spurious transition 1/0/1/0 at the output of the marked NAND gate. The value of NTC = 41 over the scan cycle period in the case of the ALAP test application strategy is reduced to NTC = 37 in the case of the ASAP test application strategy. However, both ALAP and ASAP test application strategies fail to achieve the minimum NTC. If the primary inputs change at an intermediate clock cycle, the controlling value 0 at the input of the marked NAND gate is preserved and no spurious transitions at the output of the marked NAND gate will occur, as shown in Figure 4.7(c). Furthermore, the
controlling value 0 at the input of the marked AND gate is preserved and no spurious transitions at the output of the marked AND and OR gates will occur, as shown in the marked boxes in Figure 4.7(c). Thus, the minimum value of NTC = 35 is achieved at this primary input change time. Figures 4.7(a)-4.7(c) have illustrated the reduction of spurious transitions in an interval of four clock cycles. Now, to give insight into the BPIC test application strategy during the entire test application period in a partial scan sequential circuit, consider the following example. Example 4.6 To outline the advantage of controlling the primary input change time of each test vector, Tables 4.3(a) and 4.3(b) show the flow of test data in the circuit of Figure 4.7 for the ALAP and BPIC test application strategies respectively, during the entire test application period. The first column shows the clock cycle index and the second column outlines the test vector which is scanned in during the shift cycles and applied in the last clock cycle of its scan cycle. The operation type (Scan or Load) is shown in the third column. In the case of a Scan operation the fourth column gives the value on the scan input line ScanIn. Columns 5-9 show the values of primary inputs and columns 10-12 show the next state values.
The last column 13 shows the value of NTC for each clock cycle. The NTC is calculated as follows. In clock cycle i the NTC in the combinational part is computed by considering the primary inputs of clock cycle i and the next state values at clock cycle i – 1, which are present state values at clock cycle i. The NTC in the sequential part is the sum of the NTC of each cell (the scan cells and the non-scan cell) when scanning/loading the next state values in clock cycle i, using the values given in Chapter 2. Note that when shifting out test responses the non-scan cell is not clocked and therefore no transitions occur in it. Initially all the primary inputs and present state lines are considered 0 and the node transition count over the entire test application period under the ALAP test application strategy is NTC = 224. This value can be reduced if spurious transitions are avoided by determining the best primary input change time for each test vector. If the primary input change times are set as shown in the marked boxes of Table 4.3(b), the node transition count reduces to NTC = 214. The reason for the reduction in NTC is the increased correlation between consecutive values of primary inputs and present state lines. For example, by changing the primary inputs of a test vector at its best time, the NTC in clock cycles 13
and 14 is reduced from 16 and 12, respectively, in the case of ALAP (Table 4.3(a)) to 13 and 9, respectively, in the case of BPIC (Table 4.3(b)). It should be noted that for the particular circuit of Figure 4.7 the value of NTC = 214 obtained when applying the BPIC test application strategy by itself is better than that obtained when applying scan cell reordering by itself (NTC = 216 as shown in Example 4.2). When the BPIC test application strategy is combined with scan cell reordering, further improvements are achieved: the value of node transition count is further reduced to NTC = 206 (Table 4.3(c)).
4.3.3
Extension of BPIC Strategy to Scan BIST
So far the BPIC test application strategy was applied to full and partial scan sequential circuits using external automatic test equipment (ATE) (Figure 1.2). This is summarized in Figure 4.8, where the best primary input change time k is highlighted. However, the BPIC test application strategy is applicable not only to standard full and partial scan sequential circuits using external ATE. In the following, the minor modifications which need to be considered when using
scan BIST methodology (Figure 1.5) are outlined. Figure 4.9 shows that the serial output of the LFSR is fed directly into the scan chain and the primary inputs are directly controllable. Therefore, primary inputs can be changed at the best primary input change time. This will lead to a lower area overhead associated with scan BIST methodology at the expense of higher interference from ATE which needs to store the primary input part of each test vector.
4.4
Algorithms for Power Minimization
Having described the BPIC test application strategy, this section considers algorithms which compute the best primary input change times it uses. First, an algorithm which computes the best primary input change time for each test vector with respect to a given test vector and scan cell order is given. Then, it is shown how combining the BPIC test application strategy
with scan cell and test vector reordering using a simulated annealing algorithm leads to further reduction in power dissipation.
4.4.1
Best Primary Input Change (BPIC) Algorithm
Spurious transitions induced by fixed primary input change times are eliminated by changing the primary inputs of each test vector at the time for which the minimum number of transitions is achieved. For a given scan cell order with m scan cells, the total number of primary input change times is (m + 1). Considering n test vectors in a given test vector order, the total number of configurations of primary input change times for all the test vectors is (m + 1)^n. The Best Primary Input Change Algorithm (BPIC-ALG) computes the best primary input change time for each test vector for a given scan cell order and test vector order. Figure 4.10 illustrates the pseudocode of the BPIC-ALG algorithm. The function accepts as input a test set S and a circuit C. The outer loop represents the traversal of all the test
vectors from test set S. All the m + 1 primary input change times for the current test vector are then considered in the inner loop. For each primary input change time, circuit C is simulated and the node transition count is registered. After the completion of the inner loop the best primary input change time, for which the node transition count is minimum, is retained and the outer loop continues until the entire test set is examined. The algorithm computes the best solution in a computation time which is polynomial in the number of test vectors n, the number of scan cells m, and the circuit size |C| (number of gates). It should be noted that BPIC-ALG is test set dependent and hence it is applicable only to small to medium sized sequential circuits.
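The two nested loops described for BPIC-ALG can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pseudocode of Figure 4.10: the two-input, two-scan-cell `toy_circuit` is invented for the illustration, the cost is a plain count of toggling nets and scan cells rather than the NTC of Chapter 2, and the timing model (response captured, then m shift cycles, then one apply cycle) is a simplification.

```python
from typing import Callable, Dict, List, Tuple

Bits = Tuple[int, ...]
Vector = Tuple[Bits, Bits]  # (primary input part, present state part)
Circuit = Callable[[Bits, Bits], Dict[str, int]]


def toy_circuit(pi: Bits, ps: Bits) -> Dict[str, int]:
    """Hypothetical combinational block; ns0/ns1 are the next state lines."""
    a, b = pi
    s0, s1 = ps
    n1 = a & s0          # AND gate
    n2 = 1 - (b | s1)    # NOR gate
    n3 = n1 | n2         # OR gate
    return {"n1": n1, "n2": n2, "n3": n3, "ns0": n3, "ns1": n1 ^ b}


def scan_cycle_cost(circuit: Circuit, prev: Vector, nxt: Vector, k: int) -> int:
    """Toggles over the m + 1 cycles of one scan cycle, inputs changing at cycle k."""
    (prev_pi, prev_ps), (next_pi, next_ps) = prev, nxt
    m = len(next_ps)
    nets = circuit(prev_pi, prev_ps)
    chain = [nets[f"ns{j}"] for j in range(m)]  # response captured by the scan cells
    cost = 0
    for t in range(m + 1):
        # Cycles 0..m-1 shift one bit in; cycle m applies the complete vector.
        new_chain = [next_ps[m - 1 - t]] + chain[:-1] if t < m else chain
        pi = prev_pi if t < k else next_pi
        new_nets = circuit(pi, tuple(new_chain))
        cost += sum(nets[name] != new_nets[name] for name in nets)
        cost += sum(x != y for x, y in zip(chain, new_chain))
        chain, nets = new_chain, new_nets
    return cost


def bpic_alg(circuit: Circuit, vectors: List[Vector],
             initial: Vector) -> Tuple[List[int], int]:
    """Pick, for every test vector, the change time with the fewest toggles."""
    times, total, prev = [], 0, initial
    for vec in vectors:                                   # outer loop: test vectors
        m = len(vec[1])
        costs = [scan_cycle_cost(circuit, prev, vec, k)   # inner loop: m + 1 times
                 for k in range(m + 1)]
        k_best = min(range(m + 1), key=costs.__getitem__)
        times.append(k_best)
        total += costs[k_best]
        prev = vec
    return times, total


INITIAL: Vector = ((0, 0), (0, 0))
VECTORS: List[Vector] = [((1, 0), (1, 1)), ((0, 1), (0, 1)), ((1, 1), (1, 0))]
times, total = bpic_alg(toy_circuit, VECTORS, INITIAL)
print(times, total)
```

In this model the circuit state at the end of a scan cycle does not depend on the chosen change time, so the per-vector choices are independent and n × (m + 1) simulations suffice instead of enumerating all (m + 1)^n configurations.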
4.4.2
Simulated Annealing-Based Design Space Exploration
High power dissipation problems caused by an inadequate test vector reordering and scan cell reordering for full scan sequential circuits are solved by
a simulated annealing algorithm, which can escape local minima. Since test vector reordering is not applicable to partial scan, the simulated annealing algorithm for partial scan sequential circuits uses only scan cell reordering. For a test set which consists of n test vectors, there are n! test vector orders. Furthermore, for each test vector order there are m! scan cell orders, where m is the number of scan cells. Finding the optimum test vector and scan cell order is NP-hard [33]. The total size of the design space, defined by the set of scan cell and test vector orders, is n! × m!, which is computationally expensive to explore exhaustively even for small design problems with 15 test vectors and 15 scan cells. Figure 4.11 illustrates the basic steps of simulated annealing-based optimization [39]. The optimization function accepts as input a test set S and a circuit C, which are set to their initial configurations. The calculation of the initial control parameter value is based on the assumption that a sufficiently large number of generated solutions (for example 95%) should be accepted at the beginning of the annealing process. The outer loop modifies the control parameter of the simulated annealing algorithm, which is gradually lowered as the annealing process proceeds. Within the inner loop a new sequence of solutions is generated at a constant control parameter value. The length of a sequence of solutions is set to 20. The control parameter is
decreased in such a way that the stationary distributions at the end of the sequences of solutions are close to each other. By evaluating information about the cost distribution within each sequence of solutions, a fast decrease of the control parameter is given according to the cooling schedule. Each new solution is generated using one of the following moves: (i) randomly choose two scan cells from the actual scan cell order and exchange their positions, generating a new scan cell order over the m scan cells in the circuit; (ii) randomly choose two test vectors from the actual test vector order and exchange their positions, generating a new test vector order over the n test vectors in the test set. Alternating the exchanges between randomly chosen test vectors and scan cells proves to be efficient in exploring the discrete design space. It should be noted that for partial scan sequential circuits the exchanges between test vectors are prohibited and SA-Optimization from Figure 4.11 does not execute line 6. For each new solution the BPIC-ALG algorithm is called to determine the best primary input change times for all the test vectors of the current test set according to the current scan cell order. New solutions are either accepted or rejected depending on the acceptance criterion defined in the simulated annealing algorithm [74]. If a new best solution is reached, it is saved, and it is returned together with its best primary input change times at the end of the optimization process. The optimization process is terminated after the variation in the average cost for a specified number of sequences of solutions falls below a given value [35]. It is important to note that the size of the design space for partial scan sequential circuits is m!, and it is significantly smaller than the size of the design space for full scan sequential circuits, which is m! × n!, where m is the number of scan cells and n is the number of test vectors.
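The exploration described above can be sketched in Python as a generic simulated annealing loop over two permutations. Several simplifications are assumptions of this sketch and differ from the text: a geometric cooling schedule with fixed parameters replaces the adaptive schedule and stopping rule, and `toy_cost` is a made-up transition proxy that stands in for the call to BPIC-ALG. Only the swap moves, the sequence length of 20 and the restriction to scan cell swaps for partial scan follow the description.

```python
import math
import random
from typing import Callable, List, Tuple

Cost = Callable[[List[int], List[int]], float]


def sa_optimize(cost: Cost, n: int, m: int, reorder_vectors: bool = True,
                seq_len: int = 20, t0: float = 10.0, alpha: float = 0.9,
                t_min: float = 0.01, seed: int = 0) -> Tuple[float, List[int], List[int]]:
    """Return (best cost, test vector order, scan cell order)."""
    rng = random.Random(seed)
    tv, sc = list(range(n)), list(range(m))
    cur = cost(tv, sc)
    best = (cur, tv[:], sc[:])
    temp, step = t0, 0
    while temp > t_min:                  # outer loop: lower the control parameter
        for _ in range(seq_len):         # inner loop: one sequence of solutions
            new_tv, new_sc = tv[:], sc[:]
            step += 1
            # Alternate between vector swaps and scan cell swaps (cells only for partial scan).
            target = new_tv if (reorder_vectors and step % 2 == 0) else new_sc
            if len(target) >= 2:
                i, j = rng.sample(range(len(target)), 2)
                target[i], target[j] = target[j], target[i]
            new = cost(new_tv, new_sc)
            # Always accept improvements; accept worse solutions with a probability
            # that shrinks as the control parameter decreases.
            if new <= cur or rng.random() < math.exp((cur - new) / temp):
                tv, sc, cur = new_tv, new_sc, new
                if cur < best[0]:
                    best = (cur, tv[:], sc[:])
        temp *= alpha
    return best


VECS = ["1101011", "0000000", "0010010", "0111111", "1100010"]
PI_BITS = 4  # assumed split: 4 primary input bits followed by 3 scan bits


def toy_cost(tv: List[int], sc: List[int]) -> float:
    """Made-up proxy: vector-to-vector toggles plus toggles inside the shifted scan part."""
    total, prev = 0, "0" * len(VECS[0])
    for idx in tv:
        vec = VECS[idx]
        total += sum(a != b for a, b in zip(prev, vec))
        scan = [vec[PI_BITS + j] for j in sc]
        total += sum(a != b for a, b in zip(scan, scan[1:]))
        prev = vec
    return total


print(sa_optimize(toy_cost, n=len(VECS), m=3))
print(sa_optimize(toy_cost, n=len(VECS), m=3, reorder_vectors=False))
```

Passing `reorder_vectors=False` restricts the moves to scan cell exchanges, which mirrors the smaller m! design space of partial scan circuits.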
4.5
Experimental Results
Experimental results are divided into three separate subsections. The first one gives the results for a number of full scan sequential circuits. The second subsection provides the experimental results for partial scan sequential circuits, and the third subsection highlights further benefits of partial scan in minimizing power dissipation during test application.
4.5.1
Experimental Results for Full Scan Sequential Circuits
This subsection shows through a set of benchmark examples that the BPIC test application strategy yields savings in power dissipation during test application in full scan sequential circuits. Furthermore, the savings can be substantially improved when BPIC is integrated with test vector reordering and scan cell reordering. The BPIC-ALG and SA-Optimization algorithms were implemented within the framework of a low power testing system on a 350 MHz Pentium II PC with 64 MB RAM running Linux and using GNU CC version 2.7. Table 4.3 shows the results when the BPIC test application strategy is applied by itself (i.e., without scan cell and test vector reordering) for 24 commonly accepted ISCAS89 benchmark circuits. The first and second columns give the circuit name and the number of scan cells (SC), respectively. The third column gives the number of test vectors (TV) generated by the ATPG tool ATOM [58]. The average value of node transition count (NTC), which is the total value of NTC divided by the total number of clock cycles, is given for the ASAP, ALAP, and BPIC test application strategies in columns 4, 5 and 6, respectively. It can be seen from Table 4.3 that the BPIC test application strategy has the lowest average value of NTC for all the benchmark circuits when compared to the ASAP and ALAP test application strategies. To give an indication of the reductions in average value of NTC, columns 7 and 8 show the percentage reduction of BPIC over the ASAP and ALAP test application strategies. The reduction varies from approximately 10% as in the case of s641 down to under 1% as in the case of s526. Table 4.3 has shown the reductions in node transition count using a non-compact test set. In order to reduce test application time while maintaining the same test quality, compact test sets are used. Compact test sets may lead to higher power dissipation because of an increased number of paths sensitized by each test vector.
However, using the BPIC test application strategy, similar average values of NTC are achieved for all the benchmark circuits when comparing compact test sets to non-compact test sets. Table 4.4 shows experimental results for compact test sets generated by MINTEST [59] when applying the BPIC test application strategy without scan cell and test vector reordering. For example, in the case of s298 the average value of NTC for the non-compact test set is 114.73 (Table 4.3), whereas for the compact test set the average value of NTC is 107.85 (Table 4.4), despite a reduction in the number of test vectors from 52 in the non-compact test set to 23 in the compact test set. The similar values of NTC are due to finding the best primary input change time for reducing spurious transitions during shifting in test vectors and shifting out responses. This shows that using compact test sets and hence decreasing the test application time will not increase the power dissipation during test application in full scan sequential circuits.
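The two quantities used throughout these tables, the average NTC and the percentage reduction between two strategies or test sets, can be made concrete with a short sketch. The function names below are hypothetical helpers introduced only for illustration; the two s298 values are the ones quoted above from Tables 4.3 and 4.4.

```python
def average_ntc(total_ntc, clock_cycles):
    """Average NTC: total node transition count divided by the number of clock cycles."""
    if clock_cycles <= 0:
        raise ValueError("clock_cycles must be positive")
    return total_ntc / clock_cycles


def percentage_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Average NTC values for s298 under BPIC, as quoted in the text.
s298_non_compact = 114.73  # Table 4.3, 52 test vectors
s298_compact = 107.85      # Table 4.4, 23 test vectors

difference = percentage_reduction(s298_non_compact, s298_compact)
print(f"s298: compact test set differs by {difference:.2f}% in average NTC")
```

For s298 the difference is roughly 6%, which is small compared with the reduction in test set size from 52 to 23 vectors.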
While the application of the BPIC test application strategy reduces the average value of NTC when compared to the ASAP and ALAP test application strategies, as shown in Tables 4.3 and 4.4 for non-compact and compact test sets, respectively, further savings can be achieved when BPIC is combined with test vector reordering and scan cell reordering. The results of the simulated annealing-based design space exploration for solving simultaneous test vector reordering, scan cell reordering and primary input change time for the non-compact test set are shown in Table 4.5. In order to assess the impact of the factors accountable for power dissipation, three experiments were carried out. In the first experiment the average value of NTC is computed using scan cell and test vector reordering under the ASAP test application strategy. The results are given in columns 2 and 3. The reduction in column 3 is computed over the node transition count of the initial scan cell and test vector order. In the second experiment the test application strategy was changed to ALAP, and the results are shown in columns 4 and 5. In the third experiment the BPIC test application strategy is combined with scan cell and test vector reordering and the results are given in columns 6 and 7. Note that BPIC always produces better results than ASAP and ALAP due to higher correlation between successive states during shifting in test vectors and shifting out test responses. This shows the importance of integrating all the factors accountable for power dissipation in the optimization process. The reduction value depends on the type of the circuit and the average value of NTC for the initial scan cell and test vector order. For example, in the case of s713 the reduction is 34% and it goes down to 4% as in the case of s838. However, this still presents an improvement when compared to ASAP and ALAP, which yield reductions of only 3%. Table 4.6 shows the results for the ASAP, ALAP and BPIC test application strategies when using compact test sets. Again the
BPIC provides better results than ASAP and ALAP for all the circuits from the benchmark set. It is interesting to note that NTC values for the BPIC test application strategy when using non-compact and compact test sets are similar. This indicates that the test set size has no influence on the average value of NTC, confirming that the only three accountable factors for power dissipation during test application in full scan circuits are test vector reordering, scan cell reordering and primary input change time. For some of the examples the computation time for completing the optimization may exceed 40,000 s using a Pentium II processor at 350 MHz, as shown in Table 4.5. This is due to the huge size of the design space and a small number of solutions with identical NTC, which lead to longer times for exploration and convergence of the simulated annealing algorithm. However, when using compact tests as shown in Table 4.6, the smaller number of test vectors and consequently the smaller design space mean that less time is required for completion. This leads to the conclusion that compact test sets have benefits in both test application time and computation time, with similar reductions in power dissipation.
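The simulated annealing-based exploration of an ordering can be pictured with a generic sketch. The code below is not the SA-Optimization algorithm itself: the swap move, the geometric cooling schedule, the parameter values and the toy cost function `toy_cost` are all assumptions chosen for illustration. In the flow described above the cost would be the average NTC obtained for a given scan cell and test vector order.

```python
import math
import random


def anneal_order(n_items, cost, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Generic simulated annealing over permutations of range(n_items), n_items >= 2."""
    rng = random.Random(seed)
    order = list(range(n_items))
    current_cost = cost(order)
    best, best_cost = order[:], current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(range(n_items), 2)
        order[i], order[j] = order[j], order[i]  # propose a swap move
        new_cost = cost(order)
        delta = new_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = new_cost  # accept the move
            if new_cost < best_cost:
                best, best_cost = order[:], new_cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best, best_cost


# Toy stand-in for the NTC cost: count value changes between neighbouring cells.
toy_values = [0, 1, 0, 1, 1, 0, 0, 1]


def toy_cost(order):
    return sum(toy_values[a] != toy_values[b] for a, b in zip(order, order[1:]))


best_order, best_cost = anneal_order(len(toy_values), toy_cost)
print(best_order, best_cost)
```

Keeping the best order seen so far, rather than only the final state, guarantees that the returned solution is never worse than the initial order.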
4.5.2
Experimental Results for Partial Scan Sequential Circuits
This subsection demonstrates through a set of benchmark examples that BPIC yields savings in power dissipation during test application in partial scan sequential circuits. Furthermore, the savings can be substantially improved when the BPIC test application strategy is combined with scan cell reordering. Table 4.7 shows circuit and test set characteristics for 17 circuits from the ISCAS89 benchmark set [13]. The first and second columns give the circuit name and the number of primary inputs, respectively. The third and fourth columns give the number of scan cells and non-scan cells, respectively. The scan cells are selected using the logic level partial scan tool OPUS [23] by cutting all the cycles in the circuit. Column 5 gives the number of test vectors generated by the logic level ATPG tool GATEST [123] to achieve the fault coverage shown in the last column. Table 4.8 shows the results when the BPIC test application strategy is applied by itself (i.e., without scan cell reordering) for the 17 benchmark circuits described in Table 4.7. The average value of NTC, which is the total value of NTC divided by the total number of clock cycles over the entire test application period, for ALAP and the BPIC test application
strategies are given in columns 2 and 3, respectively. It can be seen from Table 4.8 that the BPIC test application strategy has a smaller average value of NTC for all the benchmark circuits when compared to the ALAP test application strategy [1]. To give an indication of the reductions in average value of NTC, column 4 shows the percentage reduction of BPIC over the ALAP test application strategy. The reduction value depends on the type of the circuit and the average value of NTC for the initial scan cell order. The reduction varies from approximately 15% as in the case of s713 down to under 1% as in the case of s349. The last column gives the computation time for the exact BPIC-ALG algorithm, which computes the best primary input change times used by the BPIC test application strategy. For most of the circuits it took less than 1s to find the best primary input change times for all the test vectors. This indicates that BPIC-ALG can be used for fast computation of the cost function in the optimization process when combined with scan cell reordering. There are exceptional circuits with a very large number of test vectors (1465 in the case of s5378) where the computation time is 380s (last row of Table 4.8). However, this is still an acceptably low computation time for the calculation of the cost function. While the application of the BPIC test application strategy reduces the average value of NTC when compared to the ALAP test application strategy, as shown in Table 4.8, further savings can be achieved when the BPIC test application
strategy is combined with scan cell reordering. Before combining scan cell reordering and the BPIC test application strategy, the influence of scan cell reordering under the ALAP test application strategy [1] is examined as shown in Table 4.9. For circuits with small number of scan cells, the exploration of the entire design space is computationally inexpensive. In the case of s713, where for 7 scan cells there are 7! = 5040 possible scan cell orders, it took 3191s to find the optimum scan cell order which yields 11.42% reduction in average value of NTC. However for larger circuits as in the case of s1423 and s5378, where the sizes of the design space are and respectively, SA-Optimization is required for efficient design space exploration of the discrete, degenerate and highly irregular design space. For example, it takes up to 44830s to find a sub-optimum scan cell order as in the case of s5378. This is due to the huge size of the design space and the low number of solutions with identical NTC which leads to long computation times for the convergence of the simulated annealing algorithm. It should be noted that for most of the benchmark circuits scan cell reordering under ALAP test application strategy (Table 4.9) yields higher reductions in average value of NTC than when the BPIC test application strategy is applied by itself (Table 4.8), at the expense of greater computation time. Furthermore, there are circuits
(s641 and s713) where the BPIC test application strategy applied by itself generates higher reductions in NTC with computation time which is three orders of magnitude lower as compared to scan cell reordering under the ALAP test application strategy. For example, in the case of s713 it took 3191s to achieve 11.42% reduction in average value of NTC by scan cell reordering (last two columns of Table 4.9) when compared to only 0.80s of computation time to achieve 14.40% reduction in average value of NTC by the BPIC test application strategy (last two columns of Table 4.8). To achieve maximum reductions in average value of NTC, the BPIC test application strategy and scan cell reordering are combined as shown in Table 4.10. For all the benchmark circuits the combination of the BPIC test application strategy and scan cell reordering leads to higher reductions than when any parameter is considered by itself. For example, in the case of s5378 the reduction in average value of NTC is 28.69% at the expense of high computation time which is due to the large number of scan cells and hence a large design space and slow convergence of the simulated annealing algorithm.
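The arithmetic behind these comparisons can be checked with a few lines. The sketch below only uses figures quoted in the text (7 scan cells, 3191s, 0.80s); the per-order time is a rough estimate that assumes the whole 3191s was spent evaluating the 5040 orders, which the text does not state explicitly, and the helper name is hypothetical.

```python
import math


def design_space_size(num_scan_cells):
    """Number of possible scan cell orders for a single scan chain."""
    return math.factorial(num_scan_cells)


orders_s713 = design_space_size(7)           # 7! possible orders for s713
seconds_per_order = 3191 / orders_s713       # rough cost of one evaluation (assumption)
speedup = 3191 / 0.80                        # reordering time vs. BPIC-ALG time
orders_of_magnitude = math.log10(speedup)

print(orders_s713)
print(f"{seconds_per_order:.3f} s per order, speedup 10^{orders_of_magnitude:.1f}")

# The design space grows factorially, which is why exhaustive search stops
# being practical for circuits with many scan cells.
for cells in (7, 10, 15, 20):
    print(cells, design_space_size(cells))
```

The ratio between the two computation times is a little under 4000, consistent with the three orders of magnitude mentioned above.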
4.5.3
Further Benefits of Partial Scan in Minimizing Power
It is known that partial scan has advantages in terms of test area overhead and test application time when compared to full scan [2]. This subsection shows how the BPIC test application strategy for partial scan provides further benefits in terms of power dissipation and computation time required for design space exploration when compared to full scan. Figures 4.12(a) and 4.12(b) show a comparison of average value of NTC and computation time for partial and full scan sequential benchmark circuits. The results for full scan sequential circuits were outlined in Table 4.5 (ATOM [58]), and Table 4.6 (MINTEST [59]). The average value of NTC for partial scan is smaller when compared to full scan, as shown in Figure 4.12(a). The reduction is due to partial scan DFT methodology, which in the test mode of operation does not clock the non-scan cells while test responses are shifted out, leading to savings in power dissipation. It is interesting to note that for benchmark circuits s820 and s832 the average value of NTC is lower for full scan sequential circuits. This is due to the fact that for both circuits 4 out of 5 sequential elements are modified to scan cells and full scan sequential circuits allow test vector reordering which gives higher degree of freedom during the optimization process. However, the
processing time is lower for partial scan sequential circuits as shown in Figure 4.12(b), which gives a comparison of computational overhead. It should be noted that large circuits (s1423 and s5378) are not handled in the case of the full scan sequential circuits due to the huge design space where both scan cell and test vector reordering are considered. Furthermore, for all the benchmark circuits shown in Figure 4.12(b) the computation time required for exploring the design space of partial scan is substantially smaller (orders of magnitude) than the computation time required for exploring the design space of full scan. This is caused by the reduction in the size of the design space to be explored due to the smaller number of scan cells and by the exact and polynomial time BPIC-ALG algorithm. Finally, based on the results shown in Figures 4.12(a) and 4.12(b), it may be concluded that partial scan has advantages not only in lower test area overhead and test application time, but also in lower power dissipation during test application (i.e., average value of NTC in Figure 4.12(a)) and computation time required for design space exploration (CPU time in Figure 4.12(b)) when compared to full scan.
4.6
Summary
This chapter has described a technique for minimizing power dissipation in scan sequential circuits during test application. The BPIC test application strategy is equally applicable to minimizing power dissipation in partial scan sequential circuits. Since the test application strategy depends only on controlling primary input change time, power is minimized with no penalty in test area, performance, test efficiency, test application time or volume of test data. It is shown that combining the BPIC test application strategy and scan cell reordering using a simulated annealing-based design space exploration algorithm yields reductions in power dissipation during test application in partial scan sequential circuits. This chapter has also shown that partial scan does not provide only the commonly known benefits such as lower test area overhead and test application time, but also lower power dissipation during test application and reduced computation time required for design space exploration, when compared to full scan. This reinforces that partial scan should be the preferred choice as design for test methodology for sequential circuits when low power dissipation during test application is of prime importance for high yield and reliability.
Chapter 5 TEST POWER MINIMIZATION USING MULTIPLE SCAN CHAINS
5.1
Introduction
The previous chapter has shown how power dissipation during test application can be minimized in scan sequential circuits with no penalty in area overhead, test application time, test efficiency, performance, or volume of test data. However, the computation of the BPIC time is dependent on the size and the value of the test vectors in the test set. Therefore, integrating the best primary input change time with scan cell and test vector reordering leads to a discrete, degenerate and highly irregular design space, and hence high computation time, which limits the applicability of the BPIC test application strategy to small to medium sized scan sequential circuits. This chapter introduces a test set independent technique based on multiple scan chains and shows how, with low overhead in test area and volume of test data, and with no penalty in test application time, test efficiency, or performance, savings in power dissipation during test application in large scan sequential circuits can be achieved with reduced computation time. The extra test hardware required by the described technique employing multiple scan chains can be specified at the logic level and synthesized with the rest of the circuit. This makes the multiple scan chain-based power minimization technique easily embeddable in the existing VLSI design flow. The rest of this chapter is organized as follows. Section 5.2 introduces the technique for power minimization in large scan sequential circuits based on multiple scan chains including the DFT architecture, scan cell classification, clock tree power minimization and extension to scan BIST methodology. Partitioning scan cells in multiple scan chains based on their classification, and a test application strategy based on the DFT architecture described in the previous section are introduced in Section 5.3. Experimental results and a comparative study of the BPIC from Chapter 4 and the multiple scan chain-based technique are presented in Section 5.4. The chapter concludes in Section 5.5.
5.2
Power Minimization in Scan Sequential Circuits Based on Multiple Scan Chains
In this section, a technique for power minimization in full scan sequential circuits based on multiple scan chains is introduced. The first subsection overviews the DFT architecture for power minimization. The second subsection defines compatible, incompatible and independent scan cells and outlines their importance for partitioning scan cells into multiple scan chains. The third subsection shows the advantage of the DFT architecture from the clock tree power dissipation standpoint and the last subsection discusses how the described technique can be extended to scan BIST.
5.2.1
Design for Test Architecture Using Multiple Scan Chains
The DFT architecture using multiple scan chains is illustrated in Figure 5.1. The scan input ScanIn is routed to all the scan chains while the scan output ScanOut is selected from the output of each scan chain. Scan chains are operated using non-overlapping enable signals for clocks . Non-overlapping enable signals gate the system clock CLK using a scan control register where the number of flip flops equals the number of scan chains. This implies that scan chains are enabled one by one during each scan cycle. While shifting out test responses through scan chain only the bit position i of scan control register is set to 1 while the other positions are set to 0. This is easily implemented by shifting a 1 through scan control register using the extra scan clock SCLK. Before starting the first scan cycle, the initial vector 10…00 is set up in the scan control register using the scan input ScanIn. Thereafter, for each scan cycle, the 10. . .00 value is propagated circularly through the scan control register. During the normal operation of the circuit are active at the same time, since when normal/test signal N/T is 1 the outputs of the extra OR gates are 1 and CLK is not gated by the scan control register. Although the DFT architecture is described for single clock circuits, it is equally applicable to multiple clock domains. The extra logic needs no modification, however there is an additional constraint which needs to be considered when generating multiple scan chains. The additional constraint guarantees that two compatible scan cells are merged into a scan chain only if they belong to the same clock domain. In the following, a brief overview of the test application strategy for the DFT architecture is given. While shifting out the test response present in scan chain the primary inputs are set to the extra test vector which reduces the spurious transitions that originate from all the scan cells within scan chain
Since is applied during the shifting time, this implies that there are no extra clock cycles and hence no penalty on test application time. Note that the DFT architecture has no penalty on performance since extra test hardware is not inserted on critical paths. Further, the extra test hardware required by the scan control register and the selection logic can be specified at the logic level and synthesized with the rest of the circuit, which makes the DFT architecture easily embeddable in the existing VLSI design flow. What makes the multiple scan chain-based DFT architecture particularly suitable for large scan sequential circuits is that partitioning scan cells into multiple scan chains is test set independent, and it depends only on the circuit size and structure unlike the test set dependent approaches which strongly rely on the size of the test set and hence are applicable only to small to medium sized scan sequential circuits. It should be noted that when the circuit under test is in the test mode all the faults in the extra logic are observable through ScanOut. For example, on the one hand, the faults in the selection logic are observable using test data which is shifted through the k scan chains and control data shifted through the
scan control register. On the other hand, the faults in the extra AND gates that generate the clocks , will affect the data that is loaded in scan chains, which will also be propagated to ScanOut. For example, if is never active then the constant value in will be propagated to ScanOut while shifting out the test response. However, if is active during the entire scan cycle then ScanIn will load an invalid value in which, when applied to the circuit under test, will generate an invalid test response that will be observed when shifted out. Therefore, the extra test hardware, including the selection logic shown in Figure 5.1, has no penalty on test efficiency. Before describing generation of multiple scan chains, scan cells need to be classified into three broad classes as described in the following subsection.
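The behaviour of the scan control register and the gated clocks described in this subsection can be sketched as a small behavioural model. It assumes, as stated above, a one-hot register with one flip flop per scan chain, an OR of the N/T signal with each register bit, and an AND of that result with CLK; the function names and the list-based representation are hypothetical and chosen only for illustration.

```python
def rotate_one_hot(register):
    """Circularly shift the scan control register by one position (one SCLK pulse)."""
    return [register[-1]] + register[:-1]


def chain_clock_enables(register, normal_mode):
    """Per-chain clock enable: N/T OR register bit; CLK is ANDed with this value."""
    return [1 if (normal_mode or bit) else 0 for bit in register]


def simulate_scan_cycle(chain_lengths):
    """Return, for every clock cycle of one scan cycle, the index of the clocked chain."""
    k = len(chain_lengths)
    register = [1] + [0] * (k - 1)  # initial vector 10...00
    schedule = []
    for length in chain_lengths:
        enables = chain_clock_enables(register, normal_mode=False)
        schedule.extend([enables.index(1)] * length)  # only one chain is clocked
        register = rotate_one_hot(register)           # move the 1 to the next chain
    return schedule


print(simulate_scan_cycle([2, 3, 1]))
print(chain_clock_enables([0, 1, 0], normal_mode=True))
```

In test mode exactly one chain is clocked in each clock cycle, while in normal mode all enables are 1 and the register contents are irrelevant.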
5.2.2
Compatible, Incompatible and Independent Scan Cells
In order to partition scan cells into multiple scan chains, they are first classified into three broad classes: compatible, incompatible and independent scan cells. It should be noted that scan cell classification is not done explicitly by enumeration or exhaustive search, but it is done implicitly by the partitioning algorithm explained later in Figure 5.9. The distinctive feature of the implicit scan cell classification is the relation between the scan cell compatibility and the fault compatibility in a reduced circuit whose lines can be justified only by the primary inputs. A byproduct of the ATPG process required to determine fault compatibility is the generation of the extra test vectors associated with every scan chain. The application of these extra test vectors reduces the spurious transitions (Definition 4.1 in Chapter 4). First the compatible and incompatible scan cells are introduced.
Definition 5.1 Two scan cells and are compatible if all primary inputs are assigned values that eliminate the spurious transitions which originate from both and . The values of primary inputs constitute the extra test vector which eliminates spurious transitions originating from both and . Note that the sole purpose of extra test vectors is to reduce the spurious transitions during test application and they have no effect on fault coverage, which is determined by the original test set. The application of extra test vectors defines a multiple scan chain-based test application strategy for power minimization which is detailed later in Figure 5.11. Since a single extra test vector is used for each scan chain, regardless of values loaded in scan cells, the volume of extra test data is dependent only on the number of scan chains and not on the number of scan cells and/or the size of the original test set.
Definition 5.2 Two scan cells and are incompatible if at least one primary input that is assigned value to eliminate the spurious transitions
originating from will propagate the transitions originating from . Two incompatible scan cells cannot be assigned to the same scan chain since there is no extra test vector that can eliminate spurious transitions which originate from both of them. It should be noted that the terms “eliminate” and “propagate” are relative to the freezing signals, which are signals that depend on primary inputs. Freezing signals are set to the controlling value as side inputs to the gates which block transitions that originate from scan cells. The procedure for finding the freezing signals is detailed in Figure 5.10. The following example illustrates the compatible and incompatible scan cells.
Example 5.1 Consider the simple circuit of Figure 5.2, which comprises primary inputs, scan cells, present state lines and circuit outputs, as labeled in the figure. To eliminate spurious transitions at gate while shifting out test responses through scan cell , primary input must be assigned the controlling value 0 of gate . Similarly, to eliminate spurious transitions that originate from scan cell , primary input must be assigned the controlling value 1 of gate . Different values must be assigned to to eliminate spurious transitions which originate from scan cells and . Therefore, scan cells and are incompatible and are assigned to different scan chains and . On the other hand, by assigning to the controlling value 0 of gates and the spurious transitions which originate from both scan cells and are eliminated. Thus, by introducing and into and applying, for example, extra test vector while shifting out test responses from no spurious transitions will occur at gates and . Similarly, scan cells and are compatible since assigning 1 to the primary input eliminates spurious transitions at gates and . By introducing and into and applying extra test vector while shifting out test responses from no spurious transitions will occur at gates , and . It should be noted that there is a strict interrelation between extra test vector value and scan chain and and scan chain . While for the sake of simplicity the extra test vectors and have been described explicitly in this particular example, the extra test vectors and the multiple scan chains are derived implicitly using a reduced circuit, specified fault list and ATPG tool as described in the next section. Finally, note that output signals of scan chain and of are fed into the selection logic of the DFT architecture from Figure 5.1. The previous example has assumed a simple circuit where all the spurious transitions are eliminated by partitioning scan cells in two scan chains and . However, some of the spurious transitions cannot be eliminated as described in the following example.
Example 5.2 Consider the circuit shown in Figure 5.3.
The spurious transitions which originate in scan cells and cannot be eliminated at gate since both inputs are present state lines. However, by assigning and/or to the controlling value 0 of gate the spurious transitions will be eliminated at gate . Scan cells and are compatible since same primary input values eliminate the spurious transitions of gate . Example 5.2 has illustrated that some of the spurious transitions cannot be eliminated since all the gate inputs depend on present state lines. Computing primary input values that eliminate spurious transitions (extra test vectors) can be viewed as an ATPG problem to a reduced circuit with a specified fault list
which are detailed in the algorithms presented in the following section. The following example briefly illustrates the generation of the reduced circuit required to compute extra test vectors. Example 5.3 For the circuit shown in Figure 5.3 the reduced circuit is generated as follows. Initially the signal at the input of gate is identified to eliminate spurious transitions that originate from scan cells and . Then scan cells and and the AND gate are excluded from the reduced circuit as shown in Figure 5.4. Furthermore, gate is modified to a buffer (signals and are identical). The targeted fault in the reduced circuit is which eliminates the spurious transitions at gate in the original circuit. Finally, the extra test vectors that eliminate the spurious transitions during test application are computed. A particular case of a scan cell is the self-incompatible scan cell defined as follows.
Definition 5.3 A scan cell is self-incompatible if at least one primary input that is assigned value to eliminate the spurious transitions which originate from on one fanout path will propagate the transitions which originate from on a different fanout path. Now a question arises, whether the spurious transitions which originate from self-incompatible scan cells can be eliminated? In order to provide an answer consider the following example. Example 5.4 Consider the circuit of Figure 5.5 where are primary inputs, is scan cell, is present state line, and are circuit lines. To eliminate spurious transitions at gate while shifting out test responses through scan cell , primary input must be assigned the controlling value 1 of gate . However, to eliminate spurious transitions at gate primary input must be assigned the controlling value 0 of gate Different values must be assigned to to eliminate spurious transitions which originate from the same scan cell and hence scan cell is self-incompatible. However, if primary input is assigned the controlling value 0 of gate the spurious transitions which originate in and propagate on path will be eliminated. Therefore, by assigning extra test vector spurious transitions propagating on both paths and will be eliminated. The previous example has shown that following a careful examination of fanout branches of self-incompatible scan cells, most of the spurious transitions originating in self-incompatible scan cells can be eliminated using a single value for the extra test vector. However, the single extra test vector is computed at the expense of a small number of transitions that cannot be eliminated, as in the case of transitions on line in the simple circuit of Figure 5.5. Finally, independent scan cells are introduced.
Definition 5.4 A scan cell is independent if none of the gates on all the paths which originate from it has at least one side input which can be justified by primary inputs.
The independent scan cells are grouped in the extra scan chain (ESC) for which no extra test vector can be computed and hence the spurious transitions cannot be eliminated. The following example illustrates independent scan cells. Example 5.5 Consider the circuit shown in Figure 5.6. Output depends only on scan cells and The next state of scan cell depends on scan cells and There are no side inputs of gates and that can be justified by primary inputs such that spurious transitions originated from and are eliminated. Therefore, scan cells and are independent.
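The four notions introduced in this subsection (compatible, incompatible, self-incompatible and independent scan cells) can be illustrated with an explicit toy model. This is only a sketch under stated assumptions: the technique described here performs the classification implicitly through ATPG on a reduced circuit, whereas the code below assumes each scan cell is given as a hypothetical list of primary input assignments, one per fanout path, and groups cells with a greedy first-fit rule. The cell and input names are invented for the example, and self-incompatible cells are merely flagged rather than resolved.

```python
def merge_assignments(a, b):
    """Merge two primary input assignments; return None if they conflict."""
    merged = dict(a)
    for name, value in b.items():
        if merged.get(name, value) != value:
            return None
        merged[name] = value
    return merged


def classify_and_partition(cells):
    """cells maps a scan cell name to a list of per-path assignments {input: 0 or 1}."""
    chains = []             # each entry: [extra test vector, list of scan cells]
    extra_scan_chain = []   # independent cells: no side input justifiable by inputs
    self_incompatible = []  # cells whose own fanout paths need conflicting values
    for name, paths in cells.items():
        if not paths:
            extra_scan_chain.append(name)
            continue
        need = {}
        for path in paths:
            need = merge_assignments(need, path)
            if need is None:
                break
        if need is None:
            self_incompatible.append(name)
            continue
        for chain in chains:
            merged = merge_assignments(chain[0], need)
            if merged is not None:  # compatible with every cell already in the chain
                chain[0] = merged
                chain[1].append(name)
                break
        else:
            chains.append([need, [name]])
    return chains, extra_scan_chain, self_incompatible


example_cells = {
    "S0": [{"I0": 0}],
    "S1": [{"I0": 1}],             # incompatible with S0
    "S2": [{"I0": 0}],             # compatible with S0
    "S3": [{"I0": 1}],             # compatible with S1
    "S4": [],                      # independent
    "S5": [{"I1": 1}, {"I1": 0}],  # self-incompatible
}
chains, esc, self_inc = classify_and_partition(example_cells)
print(chains, esc, self_inc)
```

Each chain carries a single merged assignment, which mirrors the observation that the volume of extra test data depends only on the number of scan chains.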
5.2.3
Power Dissipated by the Buffered Clock Tree
It has been established that power dissipated in the clock tree is typically one third of the total power dissipation [131] and hence it is necessary to minimize power dissipated in the clock tree not only during functional operation but also during test application. The DFT architecture using multiple scan chains (Figure 5.1) reduces clock tree power for all the test vectors of a very small test set where each test vector is essential (i.e., detects at least one fault). Since for large dies the clock power dissipation changes from square-root dependence on the number of scan cells to a linear dependence [131], power dissipated by each scan chain can be approximated to where is dependent on clock frequency, supply voltage and wire lengths, is the size of every scan chain with where m is the total number of scan cells and k is the number of scan chains. When computing the average power dissipated by the buffered clock tree per clock cycle, while shifting test responses over an entire scan cycle (m clock cycles) for the DFT architecture is since for clock cycles only the buffered clock tree feeding is active. On the other hand, the average power dissipated per clock cycle by the buffered clock tree in the traditional full scan architecture is (considering m clock cycles for a scan cycle). Using the above formulas it can be shown that is as low as when all the scan chains have an equal number of scan cells [100].
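One way to work through this argument is the short derivation below. It is a sketch under stated assumptions rather than the formulas of [100]: the symbols $\alpha$ (the factor depending on clock frequency, supply voltage and wire lengths), $n$ (the size of each scan chain), $P_{MSC}$ and $P_{FS}$ are labels chosen here for illustration, and only the linear dependence on the number of scan cells, the total of $m$ scan cells and the $k$ equally sized scan chains are taken from the text.

```latex
% Assumed notation: alpha, n, P_chain, P_MSC and P_FS are illustrative labels.
\begin{align*}
  n &= \frac{m}{k}
    && \text{(equal-sized scan chains)} \\
  P_{chain} &\approx \alpha\, n
    && \text{(linear dependence on the number of scan cells)} \\
  P_{MSC} &= \frac{1}{m} \sum_{i=1}^{k} n \cdot \alpha\, n
           = \frac{k\, \alpha\, n^{2}}{m}
           = \frac{\alpha\, m}{k}
    && \text{(each chain clocked for $n$ of the $m$ cycles)} \\
  P_{FS} &\approx \alpha\, m
    && \text{(single clock tree active in all $m$ cycles)} \\
  \frac{P_{MSC}}{P_{FS}} &= \frac{1}{k}
\end{align*}
```

Under these assumptions the average clock tree power during shifting scales inversely with the number of scan chains, which matches the statement that the best case is reached when all scan chains have an equal number of scan cells.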
5.2.4
Extension of the DFT Architecture Based on Multiple Scan Chains to Scan BIST Methodology
So far the DFT architecture based on multiple scan chain-technique was applied to full scan sequential circuits using external automatic test equipment ATE (Figure 1.2). This is summarized in Figure 5.7 where the extra test vectors for scan chains and are highlighted. Further, it is shown that the test response which is the test response in the first scan cell from scan chain has yet to be shifted after the test responses from scan chains were already shifted out. However, the DFT architecture based
on multiple scan chains is applicable not only to standard scan sequential circuits using external ATE. The minor modifications needed when using the scan BIST methodology (Figure 1.5) are given in the following. Figure 5.8 shows that the serial output of the LFSR is fed directly into the scan chain, which makes the primary inputs directly controllable while shifting out test responses from each scan chain. Therefore, the extra test vector associated with each scan chain can be applied to the primary inputs while shifting in the present state part of the next test vector through that scan chain. Scan cells are partitioned into multiple scan chains and extra test vectors are calculated in the same way as for scan sequential circuits, as described in the following section. This leads to a lower area overhead associated with the scan BIST methodology (Figure 1.5) at the expense of higher interference from the ATE, which needs to store the primary input part of each test vector and the extra test vector associated with each scan chain.
5.3
Multiple Scan Chain Generation and Test Application Strategy
This section describes how scan cells are partitioned into multiple scan chains. Then, a test application strategy for power minimization during test application, based on the DFT architecture, is introduced.
5.3.1
Partitioning Scan Cells into Multiple Scan Chains
The Multiple Scan Chain Partitioning (MSC-PARTITIONING) algorithm identifies compatible scan cells, groups them in multiple scan chains and computes an extra test vector for every scan chain. Figure 5.9 gives the flow of the MSC-PARTITIONING algorithm, which is divided into five parts identified in boxes marked from (a) to (e). In order to facilitate the elimination of spurious transitions by computing an extra test vector for each scan chain, the initial circuit C needs to be transformed to a reduced circuit C’ whose lines can be justified only by the primary inputs (box (a)). A byproduct of the reduction procedure is a specified fault list L (box (b)) which is targeted by an ATPG process on the reduced circuit C’ (box (c)). Associated with each fault in the specified fault list L (a stuck-at fault at the noncontrolling value on a wire) is a set of scan cells whose spurious transitions will be eliminated in the original circuit C by applying the extra test vector which detects that fault in the reduced circuit C’. Therefore, based on the fault compatibility in the reduced circuit C’, scan cell classification in the original circuit C is done implicitly. This is because, using the list of scan cells associated with every fault from L, compatible faults in C’ define the partitions in the scan chain from C (box (d)). However, some scan cells may be self-incompatible, which leads to iterations through the ATPG process with a respecified fault list (box (e)) until no self-incompatible scan cells are left. At the end of the algorithm the multiple scan chains and the extra test set will be used by the test application strategy described later in this chapter. In the following each part of the MSC-PARTITIONING algorithm is explained in detail.
(a) In the first part, the initial circuit C is transformed into a reduced circuit C’ as described in the CIRCUIT-REDUCTION algorithm of Figure 5.10.
The algorithm also identifies the freezing signals, which are the signals that depend on primary inputs and should be set to the controlling value as side inputs to the gates which block transitions that originate from scan cells, as described in the following parts. Two lists, eliminated_gates and modified_gates, contain the gates which ought to be eliminated and modified respectively in the reduced circuit C’. Initially, eliminated_gates contains all the scan cells whereas modified_gates is void (lines 1-2). The circuit is traversed in breadth first search order using two lists, current_frontier and new_frontier. While current_frontier is set initially to all the scan cells of C (line 3), new_frontier is initially void (line 4). In the inner loop (lines 6-13), for all the gates that are neighbors of the current frontier, it is checked whether all the input gates already belong to eliminated_gates (i.e., depend on scan cells). If this is the case then the currently evaluated gate is introduced into eliminated_gates, removed from modified_gates
(if applicable) and introduced to new_frontier. If at least one input does not belong to eliminated_gates then the currently evaluated gate is introduced to modified_gates. In the outer loop (lines 5-16), the inner loop is repeated while current_frontier is not void (i.e., while there are still gates to be examined). At the end of each iteration of the outer loop the current_frontier and the new_frontier are updated (lines 14 and 15). Finally, using the eliminated_gates and the modified_gates, the initial circuit C is modified to the reduced circuit C’ (lines 16 and 17) as follows: gates that belong to eliminated_gates (depend only on scan cells) are excluded; gates that belong to modified_gates (depend on both scan cells and primary inputs) are modified to gates with input signals dependent only on primary inputs (in the case of gates with two inputs of which one is a freezing signal the gate is modified to a buffer); all the freezing signals identified in the first step are set as the primary outputs of the reduced circuit C’. Freezing signals, which are the outputs of the gates present in modified_gates, are determined simultaneously with identifying independent scan cells. The independent scan cells are grouped into the extra scan chain ESC, which consists of scan cells whose spurious transitions cannot be eliminated by computing an extra test vector. The algorithm returns not only the reduced circuit C’ but also the list of the freezing signals that will be used in the following part of the algorithm.
(b) In the second part, a specified fault list L is created which will be provided together with the reduced circuit C’ to an ATPG tool. The specified fault list L comprises, for every freezing signal, the fault stuck-at the noncontrolling value of the corresponding gate from the modified_gates list of the CIRCUIT-REDUCTION algorithm from Figure 5.10. It is important to note that each fault has attached a list of scan cells whose spurious transitions in the initial circuit C are eliminated when the freezing signal of the corresponding gate is set to the controlling value. The list of scan cells is required during the generation of the scan chains in part (d) of the MSC-PARTITIONING algorithm.
(c) In the third part, having generated the reduced circuit C’ and the specified fault list L, any state-of-the-art combinational ATPG tool can be used to generate test vectors for the faults from L for C’. Test vectors for the faults from L are the extra test vectors required to eliminate spurious transitions while shifting test responses in the initial circuit C as described in part (d). Since the freezing signals are primary outputs in C’ as described in part (a) then L contains faults only on primary outputs. This will speed up the ATPG process since only backward justification with no forward propagation is required. Moreover, the specified fault list is significantly smaller than the entire fault set which will further reduce the ATPG computation time for computing extra test vectors. It should be noted that some faults from L are redundant which implies that no extra test vector can be computed to stop the propagation of the spurious transitions from scan cells
associated with the respective fault. However, these scan cells are treated as self-incompatible and handled by re-specifying the fault list as described in the last part (e) of the MSC-PARTITIONING of Figure 5.9.
(d) Given the extra test set together with the list of faults from L detected by each extra test vector, the scan cell classification is done as follows. If two faults from L are incompatible (i.e., they are not detected by the same extra test vector) then every scan cell in the list associated with one fault is incompatible with every scan cell in the list associated with the other fault. Otherwise they are compatible. This leads to grouping all the scan cells, associated with faults detectable by a single extra test vector, into a single scan chain. However, this may lead to self-incompatible scan cells when different extra test vectors eliminate spurious transitions from the same scan cell. Consequently, if there are any self-incompatible cells, the MSC-PARTITIONING algorithm will iterate through parts (e), (c), (d) as explained next.
(e) In the case that there are self-incompatible scan cells after the generation of multiple scan chains, the problem needs to be addressed as was briefly explained in Example 5.4. The faults which have attached self-incompatible scan cells are removed from fault list L and new faults are specified on the lines in the fanout paths of the removed faults. The respecified fault list L is then provided back to the ATPG process for computing extra test vectors (part (c)), which is followed by new multiple scan chain generation based on fault compatibility (part (d)). This iterative process continues until there is no self-incompatible scan cell left.
The MSC-PARTITIONING algorithm of Figure 5.9 returns the scan chains of compatible scan cells, the extra scan chain ESC and the extra test set, which are used to define a test application strategy as described in the following section.
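Parts (a) and (d) above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of Figures 5.9 and 5.10: the netlist is given as a hypothetical `fanin` dictionary (gate name to list of driver names), scan cell outputs are treated as pseudo-inputs of an acyclic combinational block, and a scan cell is flagged as self-incompatible when its faults are detected by more than one extra test vector. All identifiers (`classify_gates`, `group_scan_cells`, the gate, fault and vector names) are invented for illustration.

```python
from collections import defaultdict


def classify_gates(fanin, scan_cells):
    """Breadth-first split of gates into eliminated and modified sets.

    fanin maps each gate to the names driving it (gates, scan cells or
    primary inputs). A gate is eliminated when every driver is already
    eliminated, and modified when at least one driver is not.
    """
    fanout = defaultdict(list)
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].append(gate)

    eliminated = set(scan_cells)
    modified = set()
    current_frontier = list(scan_cells)
    while current_frontier:
        new_frontier = []
        for node in current_frontier:
            for gate in fanout[node]:
                if gate in eliminated:
                    continue
                if all(d in eliminated for d in fanin[gate]):
                    eliminated.add(gate)
                    modified.discard(gate)   # may have been marked earlier
                    new_frontier.append(gate)
                else:
                    modified.add(gate)
        current_frontier = new_frontier
    return eliminated, modified


def group_scan_cells(detected_by, cells_of_fault):
    """Form one scan chain per extra test vector from the faults it detects."""
    chains = {}
    vectors_of_cell = defaultdict(set)
    for vector, faults in detected_by.items():
        chain = set()
        for fault in faults:
            chain.update(cells_of_fault[fault])
        chains[vector] = chain
        for cell in chain:
            vectors_of_cell[cell].add(vector)
    # Assumed reading of part (d): a cell claimed by several vectors is
    # self-incompatible and is handled by respecifying faults (part (e)).
    self_incompatible = {c for c, vs in vectors_of_cell.items() if len(vs) > 1}
    return chains, self_incompatible


# Hypothetical netlist: s1, s2 are scan cells; a, b are primary inputs.
fanin = {"g1": ["s1", "s2"], "g2": ["g1", "a"], "g3": ["s2", "b"],
         "g4": ["g1"], "g5": ["g4", "g1"]}
eliminated, modified = classify_gates(fanin, ["s1", "s2"])

# Hypothetical ATPG outcome: which faults each extra test vector detects.
chains, clashes = group_scan_cells(
    {"v0": ["f_g2"], "v1": ["f_g3"]},
    {"f_g2": ["s1", "s2"], "f_g3": ["s2"]},
)
```

In this made-up example `s2` ends up in both chains, which is the situation that would trigger another pass through parts (e), (c) and (d).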
5.3.2
Test Application Strategy Using Multiple Scan Chains and Extra Test Vectors
Having partitioned the scan cells into multiple scan chains with an extra test vector for each scan chain, this section introduces a test application strategy for power minimization during test application in full scan sequential circuits. The Multiple Scan Chain Test Application (MSC-TEST APPLICATION) algorithm computes the node transition count during the entire test application period for a given test set S, circuit C, multiple scan chains and extra test set. Figure 5.11 gives the pseudocode of the MSC-TEST APPLICATION algorithm. The value of NTC is 0 at the beginning of the algorithm and it is gradually increased as the entire test set is traversed. The outer loop represents the traversal of all the test vectors from test set S. Shifting out test responses through all the scan chains is then considered in the inner loop. For each scan chain, circuit C is simulated by applying the extra test vector of that scan chain to the primary inputs, and the node transition count obtained while shifting in the present state part of the test vector through the scan chain and applying the extra test vector to the primary inputs is added to NTC. After shifting out the test responses through each scan chain, the primary input part of the test vector is applied to the primary inputs and the node transition count is computed while shifting out the test response through the extra scan chain ESC. Finally, the entire test vector is applied to the circuit under test and the node transition count required to load the test response in the scan cells is added to NTC. After the completion of the inner loop, the outer loop continues until the entire test set is examined. The algorithm returns the value of NTC over the entire test application period.
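The accumulation performed by MSC-TEST APPLICATION can be sketched as follows. The three callbacks stand in for a logic simulator and are hypothetical; Figure 5.11 is not reproduced here, so the argument names and the choice to add the ESC contribution to NTC are assumptions based on the description above.

```python
def msc_test_application(test_set, chains, extra_vectors,
                         shift_ntc, esc_ntc, capture_ntc):
    """Total node transition count (NTC) over the whole test application.

    shift_ntc(tv, chain, extra): transitions while shifting through one
        scan chain with its extra test vector on the primary inputs.
    esc_ntc(tv): transitions while shifting through the extra scan chain
        with the primary input part of tv applied.
    capture_ntc(tv): transitions when the entire test vector is applied
        and the test response is loaded into the scan cells.
    """
    ntc = 0
    for tv in test_set:                                  # outer loop
        for chain, extra in zip(chains, extra_vectors):  # inner loop
            ntc += shift_ntc(tv, chain, extra)
        ntc += esc_ntc(tv)
        ntc += capture_ntc(tv)
    return ntc


# Stub "simulator" with made-up transition counts, for illustration only.
total = msc_test_application(
    test_set=["t0", "t1"],
    chains=["SC0", "SC1", "SC2"],
    extra_vectors=["e0", "e1", "e2"],
    shift_ntc=lambda tv, chain, extra: 3,
    esc_ntc=lambda tv: 2,
    capture_ntc=lambda tv: 5,
)
average = total / 20   # hypothetical number of clock cycles in the test
```

Dividing the total by the number of clock cycles, as in the last line, gives the kind of average NTC figure reported in the experimental results.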
5.4
Experimental Results
This section demonstrates through a set of benchmark examples that multiple scan chains combined with extra test vectors yield savings in power dissipation during test application. The algorithms described in Section 5.3 were implemented on a 500 MHz Pentium III PC with 128 MB RAM running Linux and using GNU CC version 2.91. The first subsection shows the reduction in power dissipation at the expense of low overhead in test area and volume of test data, when the multiple scan chain-based technique is employed for power minimization. The second subsection provides a comparison with the BPIC test application strategy presented in Chapter 4.
5.4.1
Experimental Results for Multiple Scan Chain-Based Power Minimization
The average value of NTC reported throughout this section is calculated using Equation 2.1 from Chapter 2 under the assumption of the zero delay model. The first column of Table 5.1 gives the number of scan cells of all full scan sequential circuits from the ISCAS89 benchmark set [13]. The second and third columns give the number of scan chains (SC) and the length of the ESC, respectively, computed using the MSC-PARTITIONING algorithm. The number of scan chains varies from 2, as in the case of s208, up to 7, as in the case of s38584. The small number of scan chains implies that both the area overhead required to control multiple scan chains and the volume of test data overhead caused by extra test vectors are very low since they are proportional to the number of scan chains. For most of the examples the size of the extra scan chain is nil or very low. However, there are two extreme cases, s13207 and s38417, where the number of independent scan cells is very high leading to an increase in ESC length and hence insignificant penalty in power reduction. Furthermore, the computation time is very low (< 1s) for small circuits. For large circuits, such as s38584, it takes < 3600s to achieve reduction in average value of NTC.
Tables 5.2, 5.3 and 5.4 show the experimental results for all the circuits from the ISCAS89 benchmark set [13] using three different ATPG test tools [58, 59, 83]. The first and second columns of Table 5.2 give the circuit name and the number of test vectors (TV) generated using the ATALANTA test tool [83]. The third column shows the initial average value of NTC (traditional NTC), which is the total value of NTC using the traditional single scan chain design [1] and ALAP test application strategy (see Chapter 4) divided by the total number of clock cycles over the entire test application period. Column 4 shows the final average value of NTC when using multiple scan chains and extra test vectors. The same experiment was completed for non-compact test sets generated by the ATOM test tool [58] (Table 5.3) and compact test sets generated by the MINTEST compaction tool [59] (Table 5.4), respectively. It should be noted that all the three test sets [58, 59, 83] achieve 100% fault coverage. It can be seen that the test application strategy MSC-TEST APPLICATION has a smaller average value of NTC for all the benchmark circuits when compared to the single scan chain case.
To give an indication of the reductions in power dissipation, Table 5.5 shows the percentage reduction in power dissipation and Table 5.6 shows the percentage overhead in volume of test data (columns 2-4) and test area (column 5). The test area overhead represents the extra logic required to multiplex the scan output signal (Figure 5.1) and it is computed accurately by synthesizing and technology mapping the ISCAS89 circuits to AMS 0.35 micron technology [5]. The volume of test data overhead represents the number of extra bits required for the extra test vectors (the number of scan chains multiplied by the number of primary inputs). Note that test area overhead decreases as the complexity of the circuit increases. This is due to the fact that the extra area occupied by the scan control register and selection logic (Figure 5.1) required to control multiple scan chains is very small when compared to the size of large sequential circuits. The power reduction varies from approximately 82%, as in the case of s15850, down to under 17%, as in the case of s832, when employing MINTEST [59]. It should be noted that the moderate power reduction in the case of s386, s510, s820, s832, s1488 and s1494 is due to the very small number of scan cells (5 to 6 scan cells only, as shown in Table 5.1), which are difficult to partition into multiple scan chains. However, for modern complex digital circuits where the number of scan cells is significantly higher (1426 as in the case of s38584) the power reduction is up to 69% at the expense of insignificant (< 1%) volume of test data and test area overhead. This shows the advantage of the multiple scan chain-based power minimization technique for large scan sequential circuits.
A further advantage of the discussed technique is that due to its test set independence the final average value of NTC is predictable within a given range of values regardless of test vectors applied to the circuit. This is justified by the fact that the low overhead area associated with the multiple scan chain architecture is not overly sensitive to the values of test vectors since only a single chain is active at a time and the spurious transitions within the combinational circuit are eliminated by the extra primary input vector regardless of the value loaded in non-active scan chains. This is shown in Figure 5.12 where the graphs for average value of NTC for the 7 largest ISCAS89 benchmarks under three different size test sets are given. For all three test sets, MINTEST [59], ATALANTA [83] and ATOM [58], the average values of NTC are approximately equal.
This implies that the multiple scan chains can further be applied to more DFT methodologies such as scan-based BIST [1] where regardless of the value of the pseudorandom test set the savings in power dissipation are guaranteed and final values of NTC are predictable.
5.4.2
Comparison with BPIC Test Application Strategy
In order to show the suitability of the multiple scan chain-based technique for large scan sequential circuits, a comparison, in terms of power reduction, overhead in volume of test data, test area, and computation time, for BPIC test application strategy (see Chapter 4) and multiple scan chain-based technique,
is given in Figure 5.13. It can be concluded from Figure 5.13(a) that, in the case of large sequential circuits for which it is infeasible to compute the best primary input change time using the test set dependent approach from Chapter 4, the test set independent solution presented in this chapter is applicable with low computation time. For example, BPIC is computationally intractable for all the large scan sequential circuits, while multiple scan chain-based power minimization yields savings in power dissipation with low overhead in volume of test data and test area. For circuit s953, for example, the 25% saving in power dissipation in the case of BPIC (Chapter 4) is significantly smaller than the 54% saving in the case of the multiple scan chain-based technique. This is achieved at the expense of additional overhead of 6% in test area, and 1% in volume of test data, as shown in Figure 5.13(b). However, for small to medium sized circuits where design space exploration of a small number of scan cell and test vector orderings is feasible within reasonable computational limits, BPIC is applicable since it has no penalty in test area, performance, test efficiency, test application time or volume of test data.
5.5
Summary
This chapter has presented a technique based on multiple scan chains and shown how with low overhead in test area and volume of test data and with no penalty in test application time, test efficiency, or performance, savings in power dissipation during test application in large scan sequential circuits can be achieved in very low computation time. The technique is based on a multiple scan chain-based DFT architecture and a multiple scan chain-based test application strategy. The multiple scan chain-based technique is test set independent with no penalty in test application time or test efficiency. The DFT architecture requires low overhead in test area to control multiple scan chains, which are successfully combined with extra test vectors in defining a test application strategy. Due to the efficient scan chain partitioning algorithms the multiple scan chains technique is computationally inexpensive, which makes it suitable for large sequential circuits. Finally, the easily synthesizable extra hardware required by the DFT architecture, the efficient partitioning algorithms, and the multiple scan chain-based test application strategy described, make the solution described in this chapter easily embeddable in the existing VLSI design flow using the state of the art third party electronic design automation tools.
Chapter 6 POWER-CONSCIOUS TEST SYNTHESIS AND SCHEDULING
6.1
Introduction
To avoid unnecessary iterations in the design flow, research interests have shifted towards addressing testability at higher levels of abstraction during the early stages of the VLSI design flow [36]. The complexity of designs that employ BIST methodology makes BIST hardware insertion particularly suitable at the RTL. This is illustrated in Figure 6.1 where the initial design is specified in an HDL at register-transfer level of the VLSI design flow [14]. BIST hardware, which includes test registers and a BIST controller as described in Figure 1.10(b), is inserted at RTL. This makes it possible for RTL synthesis to translate the initial design into a BIST network of logic gates prior to logic optimization which satisfies the area and delay constraints and prepares the design for the physical design automation tools. To fully exploit the testability benefits of BIST data paths, power dissipation during test application in BIST data paths needs to be accounted for, and power-conscious test synthesis and scheduling algorithms equally applicable to BIST embedding need to be developed. This is of particular importance when power dissipation during the functional operation does not exceed a given power constraint, as is the case for RTL data paths synthesized using low power high level synthesis algorithms [77] outlined in Figures 2.2 and 2.3 in Chapter 2. This chapter introduces power-conscious test synthesis and test scheduling algorithms that account for power dissipation during the testable design space exploration. It was established in Chapter 1 that power dissipation and BIST area overhead decrease as test application time increases. Since power dissipation is dependent on switching activity of all the active elements during every test session, there is a significant variation in power dissipation due to useless power dissipation. This chapter shows that by considering the interrelation
between test synthesis and test scheduling described in Chapter 1, and investigating their impact on power dissipation during test application, useless power dissipation is eliminated which leads to low power BIST data paths. This chapter explains how power-conscious test synthesis and test scheduling algorithms can be integrated in the testable design space exploration, and how this leads to savings in power dissipation both during test application and while shifting out test responses. The rest of the chapter is organized as follows. Section 6.2 accounts for sources of power dissipation in BIST data paths and outlines a taxonomy for power dissipation in BIST data paths. The effect of test synthesis and test scheduling on power dissipation during test application is investigated in Section 6.3. Power-constrained testable design space exploration, using power-conscious test synthesis and test scheduling algorithms, is described in Section 6.4. Experimental results are described in Section 6.5 and the concluding remarks are given in Section 6.6.
6.2
Power Dissipation in BIST Data Paths
This section explains why power dissipation during test needs to account for additional factors, such as useless power dissipation, and gives a taxonomy of power dissipation during test application in BIST data paths. Considering power minimization at higher levels of abstraction is of great importance when various alternatives in the functional low power design space are explored at higher levels of design abstraction [77]. The existing power-constrained test scheduling approaches are optimistic for BIST data paths due to the following observations: (a) test scheduling assumes fixed amount of power dissipation associated with each test which is not the case for BIST data paths. (b) test scheduling is performed on a fixed test resource allocation without considering the strong interrelation between test synthesis and test scheduling as outlined in Figure 1.9 from Chapter 1. The fixed amount of power dissipation assumption (observation (a)) is not valid in the case of BIST data paths where transitions associated with necessary power dissipation required for testing each module can propagate to other registers and further to untested modules leading to useless power dissipation. This useless power dissipation does not have any influence on test efficiency and it is not caused only by test scheduling but also by test synthesis as shown in Section 6.3. Since test synthesis and test scheduling are strictly interrelated as described in Chapter 1, the existing power-constrained test scheduling algorithms based on fixed test resource allocation (observation (b)) will lead to prohibitively large computation time hindering efficient exploration of the testable design space. 
According to the necessity for achieving the required test efficiency, power dissipation is classified into necessary and useless power dissipation: Definition 6.1 Necessary power dissipation is the power dissipated in test registers and tested modules during each test session and the power dissipated in test registers while shifting in seeds for test pattern generators and shifting out responses from signature analyzers. Necessary power dissipation is compulsory for achieving the required test efficiency, however, the useless power dissipation must be eliminated. In order to introduce useless power dissipation, first spurious transitions in BIST data paths are defined. While Chapter 4 has introduced spurious transitions during test application in scan sequential circuits at the logic level of abstraction (Definition 4.1), the following definition introduces spurious transitions when using BIST for RTL data paths.
Definition 6.2 A spurious transition when employing BIST for RTL data paths is a transition which occurs in modules and/or registers that are not used in the current test session. These transitions do not have any influence on test efficiency since the values at the input and output of modules and/or values loaded in registers are not useful test data. Definition 6.3 Useless power dissipation is the power dissipated in registers and untested modules due to spurious transitions which cannot be eliminated by any configuration of control signals for data path multiplexers.
Example 6.1 To show sources of useless power dissipation, consider the BIST data path shown in Figure 6.2, where and are active simultaneously. For the sake of simplicity, the modules whose output responses are analyzed by and do not appear in Figure 6.2. The output of is connected to while is connected to and In the case of the inactive register can be selected by using the appropriate value of control signal which stops the propagation of transitions that occur in However, in the case of there is no value for control signal such that transitions which occur in or are eliminated at the input of Thus, spurious transitions occur in without any influence on test efficiency leading to useless power dissipation.
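The distinction drawn in Example 6.1 can be expressed as a simple check. This is a simplified sketch with invented names (`select_quiet_source`, `LFSR_A`, `R_idle` and so on), assuming each register is fed by one multiplexer and that every source is either active or inactive in the current test session.

```python
def select_quiet_source(mux_sources, active):
    """Return an inactive source the multiplexer can select, else None.

    If every source of the multiplexer is active in the current test
    session, no control value stops the transitions, so the register
    behind it suffers useless power dissipation.
    """
    for source in mux_sources:
        if source not in active:
            return source
    return None


active = {"LFSR_A", "LFSR_B"}                   # hypothetical active registers
reg_x = select_quiet_source(["LFSR_A", "R_idle"], active)
reg_y = select_quiet_source(["LFSR_A", "LFSR_B"], active)
useless_power_in_y = reg_y is None
```

The first register can be isolated by selecting its inactive source, whereas the second has only active sources and therefore dissipates useless power in this simplified model.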
According to the occurrence during the testing process power dissipation is classified into test application and shifting power dissipation, as defined in the following. Definition 6.4 Test Application Power dissipation is the power used during execution of each test session when test patterns, necessary to achieve the required test efficiency, are applied to modules. Definition 6.5 Shifting Power dissipation is the power used while shifting in the seeds for test pattern generators, required for next test session, and shifting out the responses stored in signature analyzers at the end of the previous test session. It is interesting to note that multiplexer power is very low during test application and while shifting in seeds and shifting out responses. This is unlike the case of functional operation where power dissipated by multiplexers is large [116]. This is due to the fact that multiplexer control signals are modified only at the start and at the end of every test session, thus avoiding any glitching activity which can propagate from control logic.
6.3
Effect of Test Synthesis and Scheduling on Useless Power Dissipation
In order to eliminate useless power dissipation, the effect of test synthesis and scheduling on useless power dissipation is analyzed through three detailed examples. The first example analyzes the effect of test synthesis on both test application and shifting power in modules and registers. The second example examines the effect of module selection during test scheduling on elimination of useless power dissipation in both registers and modules. The third example illustrates the effect of power-conscious test synthesis on BIST area overhead, performance degradation and volume of test data for the example data path illustrated in Figure 2.3 from Chapter 2. In order to show the need for eliminating useless power dissipation, Example 6.2 investigates the effect of test synthesis on test application and shifting power dissipation.
Example 6.2 Consider the BIST data path shown in Figure 6.3 and assume that modules and are tested simultaneously without exceeding the given power constraints. When linear feedback shift register generates test patterns for any configuration of control signals for multiplexer at the input of will lead to useless power dissipation in (Figure 6.3(a)). Moreover, the transitions causing the useless power dissipation in will be propagated to leading to useless power dissipation in both modules and registers. However, when generates test patterns for by selecting the inactive register at the input of will lead to the elimination of useless power in without any penalty in test area or test efficiency (Figure 6.3(b)). selection also implies that useless power dissipation in is eliminated. It should be noted that test synthesis has a profound impact on shifting power since an inappropriate selection of test registers may lead to useless power dissipation in and in modules while shifting in the seeds for test pattern generators and (Figure 6.3(a)). Having described the effect of test synthesis on useless power dissipation, the following example examines the effect of module selection during test scheduling on useless power elimination in both registers and modules.
Example 6.3 Consider the BIST data path shown in Figure 6.4. Assume that module is already scheduled in the current test session and the selection of and is examined. For the sake of simplicity signature analysis registers for and are not shown in Figure 6.4 and registers and are not used as analyzers in the current test session. By selecting to be tested simultaneously with and by choosing the inactive module at the input of register useless power is eliminated in However, any configuration of control signals for the multiplexer at the input of will lead to useless power dissipation in (Figure 6.4(a)). The useless power in both and is eliminated by selecting to be tested simultaneously with and setting the appropriate values on control signals of multiplexers at the inputs of and (Figure 6.4(b)). So far the effect of test scheduling on useless power in registers has been outlined. Now, to examine the effect of test scheduling on useless power in modules consider the circuit shown in Figure 6.5.
Assume that analyzes test responses from and can analyze test responses from either or Scheduling the test for at the same time with the test for will lead not only to useless power dissipation in but also in This is because both and are active at the same time which leads to propagation of spurious transitions from to (Figure 6.5(a)). The useless power is eliminated in both and by selecting to be tested simultaneously with and setting the appropriate values on control signals of multiplexers at the input of (Figure 6.5(b)). This shows that test scheduling has an effect on useless power dissipation in both registers and modules. The following example investigates the impact of power-conscious test synthesis and test scheduling on BIST area overhead, performance degradation and volume of test data. Example 6.4 It was shown in Chapter 2, how data flow graph shown in Figure 2.2 is synthesized in the data path shown in Figure 2.3 such that low power
dissipation during functional operation is achieved. As described in Example 2.2 from chapter 2, during clock cycles 1 and 4 the active elements are registers and multiplier (*). Since excessive power dissipation during BIST can damage the circuit under test it is important that data path circuit is tested in two separate sessions, one for multiplier (*), and the other for adder (+) and subtracter (–). Figure 6.6 illustrates the self-testable data path in two test sessions: first session for the multiplier ( * ) (Figure 6.6(a)) and second session for the adder (+) and subtracter (–) (Figure 6.6(b)). The BIST data path shown in Figure 6.6 is obtained such that the given power constraint derived from functional operation is not exceeded during test application and considers test application time and BIST area overhead as minimization objectives without accounting for useless power dissipation. This algorithm will be referred to as Time and Area Test Synthesis and Scheduling (TA-TSS). Figure 6.7 illustrates the BIST data path in the two test sessions when applying the Power-Conscious Test Synthesis and Scheduling (PC-TSS) algorithm detailed in the next section using the observations from Examples 6.2 and 6.3. The main objective of TA-TSS is to minimize test application time under the given power constraint with BIST area overhead used as tie-breaking mechanism among many possible solutions with same test application time. Unlike TA-TSS, the main objective of PC-TSS is to eliminate useless power dissipation, and then use test application time and BIST area overhead as tie-breaking mechanism as outlined in Section 6.4. Therefore, PC-TSS leads to more test registers than TA-TSS with the benefit of eliminating useless power dissipation. 
It should be noted that the power constraint is exceeded in both test sessions when TA-TSS is employed due to useless power dissipation shown in registers and subtracter (–) of Figure 6.6(a) and registers and multiplier (*) of Figure 6.6(b). In order to show that TA-TSS ignores useless power dissipation and hence exceeds power constraints the following experiment was conducted. Registers, test registers, and functional units (modules) were synthesized and implemented using AMS 0.35 micron technology [5]. Using a real delay model simulator [91] and cell library timing and power information operating at supply voltage 3.3V and clock frequency 100MHz, the following power values were obtained for 8 bit data path width using pseudorandom sequences applied during testing: and Using the register and module activity from Example 2.2 of Chapter 2, the power dissipated during functional operation of the data path from Figure 6.6 is 16.5mW in clock cycles 1 and 4, due to the activity of the following elements: and (*). Considering manufacturing process tolerance the power constraint during testing is set to 20mW. When using TA-TSS, due to ignorance of useless power dissipation during test synthesis and scheduling, power dissipation for the first test session (Figure 6.6(a)) is 24.1mW, and 30.1mW for
the second test session (Figure 6.6(b)). This shows that ignoring useless power dissipation can lead to a significant violation of the power constraint due to substantially higher power dissipation during testing. However, when employing PC-TSS the data path circuit from Figure 6.7 dissipates 13.5mW in clock cycles 1 and 4, due to the activity of the following elements: and (*). During testing, useless power dissipation is eliminated and 16mW are dissipated in the first test session (Figure 6.7(a)) and 16.5mW during the second test session (Figure 6.7(b)). It should be noted that for both self-testable data paths of Figures 6.6 and 6.7 the volume of test data consists of 6 seeds for test pattern generators and 3 signatures to be shifted out and compared with the fault-free responses. So far the effect of power-conscious test synthesis on BIST area overhead and volume of test data has been analyzed. On the other hand, power-conscious test scheduling has a direct impact on test application time and an indirect impact on BIST area overhead due to the merged implementation of the functional and BIST controller (Figure 1.10), which controls the execution of test sessions, the shifting in of seeds to LFSRs and the shifting out of signatures from MISRs. Therefore, when both circuits from Figures 6.6 and 6.7 are synthesized and implemented in AMS 0.35 micron technology [5] the following results are obtained for 8 bit data path width. Total area of the circuit from Figure 6.6 is 96 sqmil, whereas total area of the circuit from Figure 6.7 is 97 sqmil. This leads to a minor increase in BIST area overhead with the benefit of an improvement in performance from 145 MHz for the circuit of Figure 6.6 to 147 MHz for the circuit of Figure 6.7, due to fewer performance degrading test registers such as BILBOs. Therefore, for the data path example shown in Figures 6.6 and 6.7, PC-TSS has a minor impact on BIST area overhead and performance degradation, and no effect on volume of test data.
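The arithmetic behind Example 6.4 can be checked with a few lines of Python. The sketch below only restates the figures quoted in the example (a 20mW test power constraint and the per-session power values for TA-TSS and PC-TSS); the function and variable names are illustrative and are not part of the original algorithms.

```python
# Figures quoted in Example 6.4, all in mW.
POWER_CONSTRAINT = 20.0
session_power = {
    "TA-TSS": [24.1, 30.1],  # Figure 6.6(a), Figure 6.6(b)
    "PC-TSS": [16.0, 16.5],  # Figure 6.7(a), Figure 6.7(b)
}


def overshoot_percent(power_mw, limit_mw=POWER_CONSTRAINT):
    """Return by how many percent a session exceeds the limit (0 if it does not)."""
    return max(0.0, (power_mw - limit_mw) / limit_mw * 100.0)


# Per-algorithm list of overshoot percentages, one entry per test session.
report = {
    name: [round(overshoot_percent(p), 1) for p in values]
    for name, values in session_power.items()
}

for name, overshoots in report.items():
    print(name, overshoots)
```

Under these figures both TA-TSS sessions exceed the constraint, while both PC-TSS sessions stay below it.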
6.4
Power-Conscious Test Synthesis and Scheduling Algorithm
Having outlined in Section 6.3 the effect of test synthesis and test scheduling on useless power dissipation, power-conscious test synthesis and scheduling is now described. PC-TSS has been integrated into an efficient tabu search-based testable design space exploration which combines the accuracy of incremental test scheduling algorithms with the exploration speed of test scheduling algorithms based on fixed test resource allocation [111, 102]. Elimination of useless power dissipation introduced in Section 6.2 is carried out in two steps. The first step is based on power-conscious test synthesis moves during the testable design space exploration, while the second step describes module selection during power-conscious test scheduling. In order to provide a meaningful understanding of the two steps, an overview of the tabu search-based testable design space exploration is given first.
6.4.1
Tabu Search-Based Testable Design Space Exploration
This subsection summarizes tabu search-based testable design space exploration [100, 111] in order to provide a general framework for understanding how useless power dissipation is eliminated in BIST data paths. Tabu search [55] was proposed as a general combinatorial optimization technique. Tabu search falls under the larger category of move-based heuristics, which iteratively construct new candidate solutions based on the neighborhood that is defined over the set of feasible solutions and the history of optimization. The neighborhood is implicitly defined by a move that specifies how one solution is transformed into another solution in a single step. The philosophy of tabu search is to derive and exploit a collection of principles of intelligent problem solving. Tabu search controls uphill moves and stimulates convergence toward global optima by maintaining a tabu list of its r most recent moves, where r is called tabu tenure and it is a prescribed constant. Occasionally, it is useful to override the tabu status of a move when the move is aspirated (i.e., improves the search and does not produce cycling near a local minima). Tabu search based heuristics are simple to describe and implement. Furthermore, a well defined cost function and the use of topological information of the design space will lead to an intelligent search of high quality solutions in very low computation time. A solution in the testable design space is a testable data path T-DP where the test pattern generators and the signature analysis register are allocated for each data path module. The tabu search-based testable design space exploration is summarized in Figure 6.8. The algorithm starts with an initial solution which is a testable data path obtained by randomly assigning a single test pattern generator to each input port of every module from the data path as expressed by lines 1 to 4. 
During the optimization process (lines 5 to 22) for each current solution neighbor solutions are generated by moves in the design space (line 7). The neighborhood of the current solution in the testable design space is defined with feasible neighbor solutions, where is the number of data path registers. For each data path register there is a single neighbor solution. Each of the solutions is provided by an independent subroutine designed to identify better configuration of test registers based on two metrics which measure the potential of each solution to reduce test application time and BIST area overhead. A detailed description of moves in the design space and speedup techniques for fast testable design space exploration are given in [100, 111]. If the new testable design does not increase useless power dissipation (algorithm ACCEPT-MOVE), test application time and BIST area overhead are computed after a test schedule is generated, by execution of lines 8 to 12. The optimization process is guided toward the objective of minimal test application time design by a cost function which is defined as follows. The cost function is a 2-tuple where
$T$ is the test application time and $A$ is the BIST area overhead, so that $C = (T, A)$, and the following relations are defined:
(a) $C_1 = C_2$ if $T_1 = T_2$ and $A_1 = A_2$;
(b) $C_1 < C_2$ if $T_1 < T_2$, or $T_1 = T_2$ and $A_1 < A_2$;
(c) $C_1 > C_2$ if $T_1 > T_2$, or $T_1 = T_2$ and $A_1 > A_2$.
The main focus of the cost function is test application time with BIST area overhead used as tie-breaking mechanism among many possible solutions with the same test application time. It should be noted that the minimization of other parameters such as performance degradation, volume of output data, overall test application time and fault escape probability, is a by-product of the design space exploration using the previously defined cost function. Based on the value of the cost function and on the tabu status of a move, a new solution is accepted or rejected as described by lines 15 to 20 in Figure 6.8. The tabu list contains registers involved in a move. A move is classified as tabu if a register involved in the move is present in the tabu list. The tabu tenure (length of the tabu list) varies from 5 (small designs) to 10 (complex designs). A move is aspirated as shown in line 15 if it has produced a solution which is better than the best solution reached so far. The testable design space exploration continues until the number of iterations since the previous best solution exceeds a predefined limit. It should be noted that in order to explore the testable design space under power constraints, generic power models need to be considered. Estimating power dissipation at a lower level of abstraction for each solution will hinder the efficient exploration of the testable design space. This is due to the fact that module selection during test scheduling (algorithm SELECT-MODULE) requires power computation in each stage of the test scheduling algorithm for each solution in the design space. Thus, high level power models lead to quick computation of power dissipation and fast examination of different alternatives in the solution space.
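The exploration loop summarized in this subsection can be sketched in Python as follows. This is a minimal illustration and not the algorithm of Figure 6.8: the toy cost model, the move generator and all names (`tabu_search`, `toy_cost`, `toy_neighbors`, `max_stall`) are assumptions made for the example. The cost is a (test application time, BIST area overhead) pair, and Python tuples compare lexicographically, so area only breaks ties in test time.

```python
from collections import deque


def tabu_search(initial, neighbors, cost, tenure=5, max_stall=20):
    """Generic tabu search over solutions.

    neighbors(solution) yields (register, new_solution) pairs, one per move.
    cost(solution) returns a (test_time, area) tuple.
    """
    current = initial
    best, best_cost = initial, cost(initial)
    tabu = deque(maxlen=tenure)  # registers involved in the most recent moves
    stall = 0                    # iterations since the best solution was found
    while stall <= max_stall:
        candidates = []
        for register, candidate in neighbors(current):
            c = cost(candidate)
            aspirated = c < best_cost  # better than the best solution so far
            if register in tabu and not aspirated:
                continue               # tabu move that is not aspirated
            candidates.append((c, register, candidate))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is an uphill move.
        c, register, current = min(candidates, key=lambda item: item[0])
        tabu.append(register)
        if c < best_cost:
            best, best_cost, stall = current, c, 0
        else:
            stall += 1
    return best, best_cost


# Toy model (assumption): a solution is a tuple of 0/1 flags, one per register,
# where 1 means the register is converted into a test register.
def toy_cost(solution):
    test_time = 100 - 10 * min(sum(solution), 3)  # gains saturate at 3 registers
    area = sum(solution)
    return (test_time, area)


def toy_neighbors(solution):
    for register in range(len(solution)):
        flipped = list(solution)
        flipped[register] ^= 1
        yield register, tuple(flipped)


best, best_cost = tabu_search((0, 0, 0, 0, 0, 0), toy_neighbors, toy_cost)
print(best, best_cost)
```

In the toy model the search settles on three test registers: a fourth one no longer shortens the test and is rejected by the area tie-break.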
6.4.2
Move Acceptance During Power-Conscious Test Synthesis
The previous subsection has overviewed the tabu search-based testable design space exploration. In order to minimize power dissipation, move acceptance criteria (line 8 of Figure 6.8) must be modified to examine if the newly generated testable design leads to useless power dissipation (see Example 6.2). If a move generates a testable design with useless power dissipation then it is rejected. Figure 6.9 shows the ACCEPT-MOVE algorithm. Given the testable data path T-DP and the test registers of the current solution (i.e., left and right test pattern generators, and and signature analyzers SA), the algorithm accepts or rejects the new testable designs by analyzing the interconnect between test registers and modules. For every module from the output module set of test registers of the current solution, the left and right input register sets (LIRS and RIRS) are examined. If all the registers from either input register set are test registers then the move is rejected. This is because there will be no value of control signals for multiplexers at the input of data
path modules that will eliminate the propagation of spurious transitions. By rejecting the testable data paths using the ACCEPT-MOVE algorithm useless power will be eliminated both during test application and while shifting out test responses. It should be noted that if all the moves lead to useless power dissipation then the move which leads to lowest test application time is accepted and the useless power dissipation is minimized using power-conscious test scheduling described in the following subsection.
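The rejection rule of ACCEPT-MOVE can be illustrated with a short Python sketch. It is not the algorithm of Figure 6.9: the dictionary-based data path representation and every name in the code are hypothetical, and the sketch assumes that the module under test is excluded from the check because its switching is necessary rather than useless.

```python
def accept_move(target_module, test_registers, output_modules, lirs, rirs):
    """Return False if the move forces useless power dissipation.

    target_module : module whose test uses `test_registers` (assumption)
    test_registers: set of registers acting as pattern generators or analyzers
    output_modules: dict register -> set of modules driven by that register
    lirs, rirs    : dict module -> set of registers on its left/right input
    """
    for register in test_registers:
        for module in output_modules.get(register, set()):
            if module == target_module:
                continue  # assumption: activity in the tested module is necessary
            for input_set in (lirs.get(module, set()), rirs.get(module, set())):
                # Every register on this port is a test register, so no
                # multiplexer setting can select an inactive register.
                if input_set and input_set <= test_registers:
                    return False
    return True


# Hypothetical data path: R1 and R2 generate patterns for M1, R3 analyzes M1.
test_regs = {"R1", "R2", "R3"}
drives = {"R1": {"M1", "M2"}, "R2": {"M1"}, "R3": set()}

# M2 can only take its left operand from R1: spurious transitions are unavoidable.
rejected = accept_move("M1", test_regs, drives,
                       lirs={"M1": {"R1"}, "M2": {"R1"}},
                       rirs={"M1": {"R2"}, "M2": {"R4"}})

# M2 can also select the inactive register R5 on its left input.
accepted = accept_move("M1", test_regs, drives,
                       lirs={"M1": {"R1"}, "M2": {"R1", "R5"}},
                       rirs={"M1": {"R2"}, "M2": {"R4"}})
print(rejected, accepted)
```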
6.4.3
Module Selection During Power-Conscious Test Scheduling
Testable design space exploration aims to minimize test application time under power constraints with BIST area overhead used as tie-breaking mechanism among many possible solutions with same test application time. In order to satisfy the power constraints during test application, the test application time is computed by carrying out the following modifications to the test scheduling algorithm based on partitioned testing with run to completion [32]. (a) A module selection algorithm SELECT-MODULE is described such that useless power is eliminated.
(b) The power dissipated by scheduling the selected module is computed and if the power constraint is not satisfied during the current test time then test for is removed from the candidate node set [32] being postponed for a later test time. Figure 6.10 shows the algorithm for module selection during power-conscious test scheduling. The module selection aims to eliminate useless power dissipation not only in useless registers (UR) at the output of currently tested modules, but it considers also the useless power dissipation in useless modules (UM) to which spurious transitions are propagated through useless registers (see Example 6.3). Given the testable data path, the modules scheduled at the current test time (tested modules) and the candidate modules to be scheduled according to the resource conflict graph [32, 111], the algorithm SELECT-MODULE selects the candidate module which when scheduled at the current test time will lead to the minimum increase in power dissipation. Initially the active module set (AMS) contains the tested modules, while the active register set (ARS) contains the test registers which generate test patterns and analyze test responses for
currently tested modules. For each candidate module the power dissipation is computed by recursively propagating spurious transitions through UR and UM (lines 4 to 11 in Figure 6.10). Initially both sets of useless registers and useless modules are null. UM is computed using ARS and UR. A module is assigned to UM if all the registers in its left or right input register set are active at the current test time. The useless modules are considered for detecting the propagation of spurious transitions to useless registers. Once AMS is updated with UM, useless registers UR are computed using the updated AMS. A register is assigned to UR if all the modules in its input module set are active at the current test time. All the useless registers detected in the current iteration are used to update the set of active registers ARS in the next iteration. Once ARS is updated, new useless modules are detected and this recursive propagation of spurious transitions continues until no new useless registers are detected. At the end of the recursive propagation of spurious transitions (lines 5 to 10 in Figure 6.10) AMS and ARS contain not only the tested modules and their test registers, but also all the active data path elements during the current test time. AMS and ARS are used to compute both necessary and useless power dissipation associated with selecting candidate module to be scheduled at the current test time. Finally, the candidate module which leads to minimum power dissipation is selected to be scheduled at the current test time. It is interesting to note that both algorithms ACCEPT-MOVE and SELECT-MODULE introduced in this section guarantee that, as long as there is a potential solution which leads to lower test application time and BIST area overhead and which eliminates useless power dissipation, it will be selected.
The problem of fixed amount of power dissipation associated with each test is overcome by accounting for useless power dissipation during test register allocation and test scheduling. It should also be noted that for designs where gated clocks are employed at register-transfer level, useless power dissipation in registers can also be eliminated by gating the clock of the inactive registers. However, elimination of useless power dissipation in modules by controlling multiplexer inputs is necessary even in the case when modules are highly sequential and use power enable/disable signals to turn off the modules which are not targeted. This is due to the fact that useless power dissipation is eliminated in the combinational logic up to the first sequential boundary in the module only where power enable/disable signals take effect. Therefore, although the techniques described in this chapter are geared towards design styles which do not employ gated clocks and/or power enable/disable signals they can successfully be combined with clock gating power reduction methodologies leading to further savings in power dissipation. Finally, it should be noted that the amount of power dissipation during test scheduling is modeled using test power models, which have been discussed in Chapter 2. Since the techniques described in this chapter focus on the elimination of useless power dissipation, which is
equally applicable to any test power model, the experimental validation uses a simple test power model represented by a single value associated to every data path entity. The value represents the power dissipation over 128 clock cycles when pseudorandom patterns are applied to the entity. Also, more accurate test power models for deterministic BIST, which can further increase test concurrency when exploited during test scheduling, can also be embedded into PC-TSS.
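The propagation of spurious transitions used by SELECT-MODULE, together with the single-value test power model, can be sketched in Python as below. This is an illustration under stated assumptions rather than the algorithm of Figure 6.10: the set-based data path representation, the toy data path and all names and power values are hypothetical.

```python
def all_active(register_set, active_registers):
    """True if a port has source registers and all of them are active."""
    return bool(register_set) and register_set <= active_registers


def active_elements(modules, registers, lirs, rirs, input_modules):
    """Grow the active module set (AMS) and active register set (ARS)
    until no new useless registers are detected."""
    ams, ars = set(modules), set(registers)
    while True:
        useless_modules = {
            m for m in lirs
            if m not in ams
            and (all_active(lirs[m], ars) or all_active(rirs[m], ars))
        }
        ams |= useless_modules
        useless_registers = {
            r for r, sources in input_modules.items()
            if r not in ars and sources and sources <= ams
        }
        if not useless_registers:
            return ams, ars
        ars |= useless_registers


def select_module(candidates, tested, registers_of, lirs, rirs, input_modules, power):
    """Pick the candidate whose scheduling gives the lowest total power."""
    def session_power(candidate):
        modules = set(tested) | {candidate}
        registers = set().union(*(registers_of[m] for m in modules))
        ams, ars = active_elements(modules, registers, lirs, rirs, input_modules)
        return sum(power[element] for element in ams | ars)
    return min(candidates, key=session_power)


# Hypothetical data path: M4 takes its left operand only from R6, the
# analyzer of M2, so testing M2 makes M4 and its output register R10 active.
lirs = {"M1": {"R1"}, "M2": {"R4"}, "M3": {"R7", "R3"}, "M4": {"R6"}}
rirs = {"M1": {"R2"}, "M2": {"R5"}, "M3": {"R8"}, "M4": {"R3", "R10"}}
input_modules = {"R3": {"M1"}, "R6": {"M2"}, "R9": {"M3"}, "R10": {"M4"}}
registers_of = {"M1": {"R1", "R2", "R3"},
                "M2": {"R4", "R5", "R6"},
                "M3": {"R7", "R8", "R9"}}
power = {f"M{i}": 5.0 for i in range(1, 5)}          # one value per module
power.update({f"R{i}": 1.0 for i in range(1, 11)})   # one value per register

chosen = select_module(["M2", "M3"], ["M1"], registers_of,
                       lirs, rirs, input_modules, power)
print(chosen)
```

With M1 already scheduled, selecting M2 would activate M4 and R10 as well, so the sketch prefers M3, which adds only necessary power.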
6.5
Experimental Results
Power-conscious test synthesis and test scheduling were successfully implemented on a SUN SPARC 20 workstation and integrated within the testable design space exploration environment described in Section 6.4.1. To give insight into the importance of considering the effect of test synthesis and test scheduling on useless power dissipation, Figures 6.11 to 6.13 show a comparison of test application time (TAT), BIST area overhead (BAO), test application power (TAP) and shifting power (SP) when using PC-TSS and TA-TSS for BIST data paths. The main objective of TA-TSS is to minimize test application time under the given power constraint with BIST area overhead used as tie-breaking mechanism among many possible solutions with the same test application time, without any consideration of useless power dissipation during test application [111]. Unlike TA-TSS, the main objective of PC-TSS is to eliminate useless power dissipation, and then use test application time and BIST area overhead as tie-breaking mechanism. It should be noted that although PC-TSS is compared with TA-TSS where both use tabu search-based testable design space exploration, the power-conscious test synthesis and test scheduling techniques are equally applicable to any test register allocation algorithm, any test scheduling algorithm and any testable design space exploration algorithm. This is because the power-conscious technique improves the decision making during the exploration process by accounting for useless power dissipation in the search objectives. A comparison is carried out for a number of benchmark examples including elliptic wave digital filter (EWF) with execution time constraint of 17 control steps (Figure 6.11), 8 point discrete cosine transform (8DCT) with execution time constraint of 10 control steps (Figure 6.12), and 32 point discrete cosine transform (32DCT) with execution time constraint of 30 control steps (Figure 6.13).
The benchmarks were synthesized using the ARGEN high level synthesis system [75]. The synthesized data paths have 3 adders, 3 multipliers, and 15 registers for the EWF, 4 adders, 4 multipliers, and 12 registers for the 8DCT and 12 adders, 9 multipliers and 60 registers for the 32DCT. When integrating power-conscious test synthesis and test scheduling algorithms into testable design space exploration, generic values for test application time and power dissipation need to be considered. Therefore, TAT for adders and multipliers are assumed to be and respectively, where for achieving 100% fault coverage for 8 bit data path modules. To validate the assumption regarding test length, 8 bit width adder and multiplier modules were synthesized and technology mapped onto AMS 0.35 micron technology [5], and subsequently a parallel pattern single fault propagation fault simulator [82] showed that is a valid assumption. Similarly, during the power-conscious testable design space exploration power dissipation for registers, adders and multipliers is assumed to be and where can be derived using the techniques from [87]. The generic high level model for power dissipation provides the flexibility of applying the power-conscious techniques to various library modules with different power characterization. To validate the generic power model, registers, test registers, and functional modules were synthesized and technology mapped to AMS 0.35 micron technology [5]. Using a real delay model simulator [91] and AMS 0.35 micron timing and power information operating at supply voltage 3.3V and clock frequency 100MHz, and hence accounting for glitching activity, the following power values were obtained for 8 bit data path and pseudorandom sequences applied during testing: and To compute the power dissipation during test application in the entire data path (Figures 6.11, 6.12, and 6.13), the power dissipation of all the active elements is summed. This hierarchical power dissipation computation provides a trade-off between the accuracy of low level power simulators such as SPICE and the computational complexity for large circuits such as elliptic wave digital filters and discrete cosine transform. Finally in order to compute BIST area overhead, BIST data paths are specified in VHDL code at RTL and synthesized and technology mapped using [37] onto AMS 0.35 micron technology [5].
It should be noted that BIST area overhead includes not only the overhead caused by data path test registers, but also the overhead caused by the merged functional and BIST controller. To emphasize the importance of considering useless power dissipation, PC-TSS and TA-TSS were compared for 10 power constraints ranging from to Therefore, the experiments were performed on a total of 30 testable circuits: 3 examples EWF, 8DCT and 32DCT synthesized for 10 different power constraints. For all the evaluated circuits PC-TSS produces less TAP and less SP when compared to TA-TSS, which does not account for useless power dissipation. For example, as shown in Figure 6.11(c) in the case of elliptic wave digital filter, TAP varies from 20mW to 45mW in the case of PC-TSS, whereas in the case of TA-TSS it varies from 23mW to 48mW. While TAP, SP and BAO increase simultaneously with power constraint, TAT in terms of clock cycles decreases as the power constraint increases, as shown in Figures 6.11(a), 6.12(a), and 6.13(a). This is because by increasing the power
constraint, more modules are tested concurrently leading to lower test application time. It should be noted that TA-TSS which assumes fixed amount of power dissipation with each test yields lower test application time and BIST area overhead when compared to PC-TSS for most of the power constraints. However, lower test application time and BIST area overhead in the case of TA-TSS leads consistently to higher power dissipation in test application due to ignorance of useless power dissipation during test synthesis and test scheduling, and hence causes a violation of the power constraint which can decrease
the reliability of the circuit and lead to manufacturing yield loss [138, 148]. Further, by relaxing the power constraints for TA-TSS, useless power dissipation can actually increase without any reduction in test application time. This can be seen, for example, in Figures 6.11(a) and 6.11(c), where the power constraints increase from to The explanation comes from the fact that relaxed power constraints allow higher activity in the circuit during test, which consequently increases the number of configurations where useless power dissipation cannot be eliminated, unless it is explicitly analyzed during the search process. Further, as the complexity of the circuit increases, as is the case for the 32 point discrete cosine transform, PC-TSS has comparable test application time (Figure 6.13(a)) and BIST area overhead (Figure 6.13(b)) to TA-TSS with the benefit of lower power dissipation. This is because the testable design space increases and therefore PC-TSS can successfully find test register allocations and test schedules which lead to elimination of useless power dissipation, as well as BIST area overhead and test application time which are comparable to TA-TSS. The integration of algorithms ACCEPT-MOVE and SELECT-MODULE into the testable design space exploration algorithm comes at the expense of a small overhead in computation time. For example, the computation time for elliptic wave digital filter and 8 point discrete cosine transform is below 10s, whereas for the complex 32 point discrete cosine transform it is under 500s, which is within reasonable computational limits.
6.6
Summary
This chapter has shown how power dissipation during test application is minimized at higher levels of abstraction than the logic level of abstraction (Chapters 4 and 5) of the VLSI design flow. When using BIST for low power RTL data paths [77], useless power dissipation during test application is eliminated using power-conscious test synthesis and test scheduling. In order to achieve this goal power dissipation was classified into necessary and useless power. Then the effect of test synthesis and test scheduling on power dissipation was analyzed and, using this analysis, power minimization was achieved by power-conscious test synthesis moves during the testable design space exploration and by power-conscious module selection during test scheduling. The algorithms described in this chapter can increase yield and reliability of BIST data paths by satisfying power constraints during test application.
Chapter 7
POWER PROFILE MANIPULATION
PAUL M. ROSINGER, BASHIR M. AL-HASHIMI AND NICOLA NICOLICI
7.1
Introduction
The previous chapter has described how useless power can be eliminated in BIST data paths described at the register-transfer level (RTL) of abstraction. This chapter introduces an approach for reducing test application time by manipulating power profiles of test sets that are applied to every block in the system. The described solution is a fusion between the two existing research directions in low power testing: minimizing test power dissipation and minimizing test application time under given power constraints, which were outlined in Chapter 3. Hence, this chapter shows how complementary techniques can easily be combined to increase test concurrency under given power constraints. This is achieved in two steps: in the first step power dissipation is considered as a design objective and is consequently minimized, a result which is further exploited in the second step where power dissipation becomes a design constraint under which the test concurrency is increased. The distinctive benefit of the power profile manipulation approach is that it depends neither on the initial test sets nor on the test scheduling policy. Consequently, it can be embedded into any existing PCTS algorithm to leverage its performance. The rest of the chapter is organized as follows. Section 7.2 motivates power profile manipulation, which is detailed in Section 7.3. Section 7.4 shows the effectiveness of the power profile manipulation by explaining how it can be fully exploited by power-constrained test scheduling algorithms. Extensive experimental data is given in Section 7.5, while Section 7.6 concludes the chapter.
7.2
The Global Peak Power Approximation Model
As discussed in Chapter 2‚ in order to consider power during test scheduling‚ the power dissipated by the block under test needs to be modeled using generic 139
140
POWER-CONSTRAINED TESTING OF VLSI CIRCUITS
power models. The commonly used Global Peak Power Approximation Model (GP-PAM) is analyzed in this section. As shown in Figure 7.1, the GP-PAM flattens the power profile of a block to its worst-case instantaneous power dissipation, i.e., its peak value. According to this model, the power profile of a block is described by the pair (GP, L), where GP is the global peak value of the power profile and L is the test sequence length. The GP-PAM clearly satisfies the simplicity and reliability requirements for a good power approximation model. However, its low complexity comes at the expense of a relatively high approximation error, as indicated by the large false power region in Figure 7.1. For regular power profiles, which can be described with high accuracy using a simple approximation model, the simplicity of the GP-PAM does not justify the high error it introduces, because simple and reliable power models with higher approximation accuracy can be derived, as described in the following section. The disadvantage of using the GP-PAM is illustrated through the following example.

Example 7.1 Consider the two power profiles shown in Figure 7.2. Power profiles 1 and 2 correspond to two resource compatible tests. The joint power profile of the two tests is at all times lower than the power constraint, which means that the two tests could be assigned to the same test session without violating the power requirements. However, the GP-PAM ignores the fact that the higher parts of the two power profiles do not overlap, thus falsely preventing the two tests from being scheduled in the same test session: the given power constraint is exceeded by the sum of their global peak values (GP1 + GP2). The error in the power approximation model leads to lower test concurrency, and hence to longer test application time.

Therefore, a power approximation model which considers both the value and the position of the peak of the power profile of every test set would produce power descriptions with similar simplicity and reliability, while providing higher approximation accuracy. A solution that considers not only the average or peak power values but also the shape of the power profile would allow any PCTS algorithm to increase test concurrency by exploiting the position and the size of the higher and lower power parts in each power profile. Consequently, this leads to shorter test application time.
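The effect described in Example 7.1 can be reproduced with a small numerical sketch. The Python fragment below is illustrative only: the two profiles and the constraint value are made-up numbers rather than the data behind Figure 7.2, and the helper names are hypothetical.

```python
def gp_pam_allows(profiles, p_max):
    """GP-PAM check: the sum of the global peaks must stay within p_max."""
    return sum(max(p) for p in profiles) <= p_max


def actual_joint_peak(profiles):
    """Peak of the cycle-by-cycle sum; a finished test contributes nothing."""
    length = max(len(p) for p in profiles)
    joint = [sum(p[t] for p in profiles if t < len(p)) for t in range(length)]
    return max(joint)


# Hypothetical profiles: the high power parts do not overlap in time.
profile_1 = [2, 2, 2, 6, 6]   # high part at the end
profile_2 = [5, 5, 1, 1, 1]   # high part at the start
P_MAX = 8                     # assumed power constraint

joint_peak = actual_joint_peak([profile_1, profile_2])   # 7, within P_MAX
gp_ok = gp_pam_allows([profile_1, profile_2], P_MAX)     # False: 6 + 5 > 8
```

With these numbers the real joint profile never exceeds the constraint, yet the GP-PAM check rejects the pair, which is the false conflict the example describes.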
7.3
Power Profile Manipulation
The previous section has shown that, despite its simplicity and reliability, the global peak power approximation model leads to large approximation errors and consequently to low test concurrency. This can be avoided if the power profile is manipulated such that the power model's simplicity and reliability are maintained while its accuracy increases. The aim of this section is to show how this power profile manipulation can be performed so that power-constrained test scheduling algorithms can fully exploit its advantages. The power profile manipulation approach is introduced using the following components:
test vector reordering - initially, power profiles for the block tests are lowered by increasing the correlation between successive vectors in the test sequence; test vector reordering is used for peak and average power reduction, as well as for power profile molding that is exploited by efficient modeling using the enhanced approximation model.

enhanced power approximation model - when test vector reordering is completed the power profile consists of an initial low power part followed by a high power part; hence a simple and reliable approximation model exploiting these two parts will provide more accurate descriptions of the power profile than the GP-PAM.

test sequence rotation - finally, the low power profiles are rotated and piled up together such that the high power parts do not overlap in order to obtain improved usage of the power constraint; test sequence rotation is used to avoid overlapping of high power parts of power profiles in the same test session.
7.3.1
Test Vector Reordering
At this stage, power dissipation is seen as a design objective and is consequently minimized. Since dynamic power is the main component of power dissipation in CMOS circuits, the order in which the test vectors are applied to the primary and pseudo-inputs of a circuit influences the power dissipated in the circuit. Ordered test sequences can easily be provided from the ATE when using external testing, or generated on-chip using embedded deterministic test when internal testing is preferred. It should be noted that in the case of external testing, the resource allocation conflicts (used to generate the test compatibility graph) are caused by sharing ATE channels and CUT pins, whereas in embedded deterministic test the resource conflicts are caused by sharing test sources, test sinks and test access mechanisms.

The power profile of a test sequence changes with the order of its vectors. Thus, certain test vector orderings that produce regularly shaped power profiles can be determined. A shape is considered regular if it can be described with high accuracy using a simple approximation model. Accordingly, the test vector reordering algorithm described below has two objectives: to minimize the average and peak power dissipation values, and to produce a power profile suitable for simple, reliable and accurate test power modeling.

The input to the test vector reordering algorithm is the input transition graph (ITG). Given a test sequence TS with N test vectors, the ITG is a complete directed graph with |V| = N nodes and |E| = N(N – 1) edges, where each node represents a vector in TS and each edge represents a transition at the
primary inputs from one vector v_i to another vector v_j. The ITG edges are labeled with an estimate of the power dissipated in the circuit by the corresponding input transition. The edge weights are computed differently depending on the adopted testing scheme, test-per-clock or test-per-scan (see Chapter 1.1):

In a test-per-clock testing scheme, the test vectors are applied to the primary inputs one vector per clock cycle. Each edge in the ITG is weighted with the power P consumed in the circuit during the transition of the primary inputs from v_i to v_j. All the possible ordered pairs of vectors have to be simulated using a power estimation tool (e.g., PowerMill [61]) in order to compute the ITG edge weights. The simulation length in this case is on the order of N(N – 1) clock cycles, where N is the sequence length.

In a test-per-scan testing scheme, a test vector is first shifted in during m clock cycles, where m is the number of inputs (i.e., cells in the scan chain); it is then applied to the block during clock cycle m + 1, and the circuit response is shifted out during the next m clock cycles while the next test vector is simultaneously shifted in. Each edge in the ITG is weighted with the power consumed by this simultaneous scan-out and scan-in. Because the weight covers m clock cycles in the test-per-scan scheme, compared with the single-cycle transition of the test-per-clock scheme, the simulation time increases by a factor of m.

Since the simulation time for N(N – 1) ordered vector pairs using a transistor-level power simulator such as PowerMill [61] becomes prohibitively high, a simpler power estimation method is needed. The weighted transition count (WTC) described in [125] is well correlated with power dissipation. Experiments were performed on ISCAS85 benchmark circuits synthesized in Alcatel MTC35000 technology [3] to verify the suitability of this power estimation method by determining the degree of correlation between WTC values and transistor-level power estimations.
The correlation is confirmed by the results reported in Table 7.1, where the Pearson correlation coefficients [130] between WTC values [125] and PowerMill power estimations [61] range from 0.86 to 0.98. The WTC values corresponding to scan-in and scan-out are given by Equations (2.3) and (2.4) in Chapter 2, and the ITG edge weights for test-per-scan are computed from these two values.
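As a rough illustration of how a weighted transition count can serve as an edge weight, the sketch below uses the commonly cited form of the WTC, in which a transition between adjacent scan cells is weighted by how far it travels through the chain. This is an assumption: Equations (2.3) and (2.4) are not reproduced in this section, so the indexing convention, the function names and the choice to sum the scan-out and scan-in counts are illustrative only.

```python
def wtc_scan_in(vector):
    """Assumed scan-in WTC: a transition between bits j and j+1 (1-based)
    is weighted by m - j, where m is the scan chain length."""
    m = len(vector)
    return sum((m - j) * (vector[j - 1] ^ vector[j]) for j in range(1, m))


def wtc_scan_out(response):
    """Assumed scan-out WTC: a transition between bits j and j+1 is weighted by j."""
    m = len(response)
    return sum(j * (response[j - 1] ^ response[j]) for j in range(1, m))


def edge_weight(scanned_out, scanned_in):
    """Hypothetical ITG edge weight: scan-out count plus scan-in count."""
    return wtc_scan_out(scanned_out) + wtc_scan_in(scanned_in)


# Made-up 5-bit patterns, given as lists of 0/1 values.
w_in = wtc_scan_in([1, 0, 0, 0, 1])     # transitions at j=1 and j=4: 4 + 1
w_out = wtc_scan_out([1, 0, 0, 0, 0])   # one transition at j=1: weight 1
w_edge = edge_weight([1, 0, 0, 0, 0], [1, 0, 0, 0, 1])
```

Because each count is a single pass over the bits, all N(N – 1) edge weights can be obtained without any circuit simulation, which is the motivation given in the text for replacing transistor-level estimation.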
Having computed the ITG edge weights, reordering the test sequence for low power reduces to the problem of finding a low-cost Hamiltonian tour in the ITG, i.e., a tour that visits every vector exactly once with a low total edge weight. As the ITG is a complete directed graph, finding a low-cost Hamiltonian cycle in it is an instance of the asymmetric traveling salesman problem, which is known to be NP-hard. Therefore, the greedy depth-first search heuristic shown in Figure 7.3 was implemented to determine a good solution to this problem. The algorithm starts from a randomly selected vector in the sequence and at each iteration selects the neighboring node that generates the lowest power dissipation, i.e., the outgoing edge with the smallest weight. Due to the greedy nature of the method adopted for traversing the ITG, the power profile corresponding to the resulting path exhibits an initial long low power part followed by a short high power part towards the end of the sequence. This is because the edges
with lower weights are added to the path in early iterations, leaving the edges with higher weights for the end of the profile. This particular shape of the power profile has the following advantages:

it has lower peak and average power values than a random path in the graph (such as the initial unordered test sequence);

in conjunction with the enhanced power approximation model introduced later in the chapter, it improves approximation accuracy over its GP-PAM representation, as shown later in Table 7.2.
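The greedy traversal described above can be sketched as a nearest-neighbour walk over the ITG. The fragment below is not the algorithm of Figure 7.3, which is not reproduced here; it is a minimal sketch that assumes the edge weights are already available as a matrix, uses the Hamming distance between vectors as a stand-in weight, and fixes the start vector instead of choosing it at random so that the result is reproducible.

```python
def hamming(a, b):
    """Stand-in edge weight: number of bit positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_reorder(weights, start=0):
    """Visit every vector once, always following the cheapest outgoing edge."""
    n = len(weights)
    order = [start]
    visited = {start}
    while len(order) < n:
        current = order[-1]
        nxt = min((j for j in range(n) if j not in visited),
                  key=lambda j: weights[current][j])
        order.append(nxt)
        visited.add(nxt)
    return order


def power_profile(weights, order):
    """Edge weights along the chosen path, one value per applied transition."""
    return [weights[a][b] for a, b in zip(order, order[1:])]


# Made-up 4-bit test vectors.
vectors = ["0000", "0001", "0011", "1100"]
weights = [[hamming(a, b) for b in vectors] for a in vectors]

order = greedy_reorder(weights, start=0)   # [0, 1, 2, 3]
profile = power_profile(weights, order)    # [1, 1, 4]
```

Even in this tiny case the cheap edges are consumed first and the expensive transition is left for the end of the path, which is the low-then-high shape the text attributes to the greedy choice.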
7.3.2
Enhanced Power Approximation Model
So far it was shown how, by considering power as a design objective, test vector reordering can generate a test sequence with a regular power profile that has an initial long low power part followed by a short high power part towards the end of the sequence. From now on, power is viewed as a design constraint and test application time is the minimization goal. This regularly shaped power profile can be accurately described using simple approximation models, as shown in the example in Figure 7.4. By modeling the low and high power parts of the profile using their local power peaks and lengths, (P_lo, L_lo) and (P_hi, L_hi), the value, position and size of each part of the profile become available as inputs for power-constrained test scheduling algorithms. The improvement in approximation accuracy compared to the GP-PAM, which is represented by the dashed rectangle in Figure 7.4, is used below to compare alternative descriptions of the same profile. The enhanced power approximation model will be further referred to as the two local peak power approximation model (2LP-PAM) and will be represented by the 4-tuple (P_lo, L_lo, P_hi, L_hi). Several 4-tuple descriptions are possible for the same power profile; the optimum is the one with the highest accuracy improvement. This optimum 4-tuple approximation can be computed in time linear in the length of the test sequence. The test vector reordering presented above generates the type of power profile which, in conjunction with the 2LP-PAM, consistently exhibits a high accuracy improvement over GP-PAM representations. The GP-PAM, which is currently used by PCTS algorithms, represents the upper approximation bound for the enhanced power approximation model. Thus, using the enhanced approximation model in conjunction with an existing PCTS algorithm guarantees at least the performance of the original algorithm. Further, it is interesting to note that, because the test reordering algorithm always adds low weight edges in early iterations and leaves the edges with higher weights to the end of the profile (Figure 7.4), the 2LP-PAM depends neither on the test set values nor on the initial test vector order in the test sequence. On the one hand, test set independence guarantees the existence of two local power peaks, exploited by the power-constrained test scheduling algorithms, regardless of the system and block structural or functional information. On the other hand, vector order independence facilitates test sequence rotation for increasing test concurrency in every test session, as explained next.
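One way to obtain a two-part description in linear time is to precompute running maxima from both ends of the profile and then try every split point. The sketch below assumes that the accuracy improvement is measured as the area removed from the GP-PAM rectangle; that measure, the function name and the sample profile are assumptions made for illustration, not the book's definition.

```python
def fit_2lp_pam(profile):
    """Return ((p_lo, l_lo, p_hi, l_hi), saved_area) for a non-empty profile.

    The low part covers the first l_lo samples and the high part the rest.
    Each part is bounded by its own local peak, so the model never
    underestimates the real power.
    """
    n = len(profile)
    prefix_max, running = [], float("-inf")
    for p in profile:
        running = max(running, p)
        prefix_max.append(running)
    suffix_max, running = [0] * n, float("-inf")
    for i in range(n - 1, -1, -1):
        running = max(running, profile[i])
        suffix_max[i] = running

    gp = prefix_max[-1]
    gp_area = gp * n                      # area of the GP-PAM rectangle
    best, best_area = (gp, 0, gp, n), gp_area   # fallback: plain GP-PAM
    for k in range(1, n):                 # k = length of the low power part
        p_lo, p_hi = prefix_max[k - 1], suffix_max[k]
        area = p_lo * k + p_hi * (n - k)
        if area < best_area:
            best, best_area = (p_lo, k, p_hi, n - k), area
    return best, gp_area - best_area


# Made-up reordered profile: long low part, short high part at the end.
model, saved = fit_2lp_pam([2, 3, 2, 3, 2, 3, 8, 9])
# model == (3, 6, 9, 2); saved == 72 - 36 == 36
```

The fallback tuple reproduces the GP-PAM, so the fit can never be worse than the single-peak model, in line with the upper-bound remark in the text.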
7.3.3
Test Sequence Rotation
Having lowered and reshaped the test power profiles using test vector reordering and modeled them using the 2LP-PAM, this section explains how compatible tests can be combined into a test session to increase test concurrency under given power constraints. Since the 2LP-PAM offers information on the position and size of both low and high power parts, the power profiles can be rotated such that, when added to a test session, their high power parts do not overlap with the high power parts of the profiles already in the test session. This leads to higher test concurrency under given power constraints, as explained in the following example.

Example 7.2 Consider the power profiles shown in Figure 7.5, belonging to two compatible test sequences TS1 and TS2 that can be merged into the same test session. Figures 7.5(a) and 7.5(b) show the 2LP-PAM power profile approximations corresponding to TS1 and TS2 with the vectors reordered using the algorithm in Figure 7.3. First, TS2 is added to the empty test session. Then, TS1 is rotated left, as illustrated in Figure 7.5(c). The joint power profile obtained by adding the rotated TS1 to the test session is shown in Figure 7.5(d). With the GP-PAM based approach, the maximum power dissipation for the test session composed of the two tests would be given by the sum of their global peaks; by using the 2LP-PAM, the maximum test session power dissipation becomes the peak of the joint profile in Figure 7.5(d), in which the two high power parts no longer coincide. Thus, the session peak obtained with the 2LP-PAM is lower than the one obtained with the GP-PAM.

This example has shown how, by controlling the rotation of the test sequences before adding them to a test session, the high power parts of their power profiles are spread uniformly over the entire test session length, rather than being piled up on top of each other as in the case of the GP-PAM approach. Therefore, the joint power profile of the test session becomes flatter when using the 2LP-PAM and can fit more tests under the same power constraint. It should be noted that cyclic power profiles are needed for test sequence rotation. Compared to the normal power profile, the cyclic power profile of a test sequence TS has one extra element: the power dissipation corresponding to the transition between the last and the first vector in TS. Having the power dissipation values for all pairs of adjacent vectors in the test sequence allows any vector in the sequence to be selected as the first vector in the rotated sequence. These cyclic power profiles are exploited by the power-constrained test scheduling algorithms described next.
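The effect of rotation on the session peak can be checked numerically. The sketch below treats each list as a cyclic power profile, so any rotation is a valid vector order; the two profiles and the rotation amount are made-up values, not those of Figure 7.5, and the function names are hypothetical.

```python
def rotate_left(cyclic_profile, k):
    """Rotate a cyclic power profile left by k positions."""
    k %= len(cyclic_profile)
    return cyclic_profile[k:] + cyclic_profile[:k]


def joint_peak(profiles):
    """Maximum of the cycle-by-cycle sum of the profiles in a test session."""
    length = max(len(p) for p in profiles)
    return max(sum(p[t] for p in profiles if t < len(p)) for t in range(length))


# Both reordered profiles have their high power part at the end.
ts2 = [2, 2, 2, 2, 6, 6]
ts1 = [1, 1, 1, 1, 5, 5]

peak_unrotated = joint_peak([ts2, ts1])        # 11: high parts coincide
ts1_rotated = rotate_left(ts1, 4)              # [5, 5, 1, 1, 1, 1]
peak_rotated = joint_peak([ts2, ts1_rotated])  # 7: high parts are separated
```

Without rotation the session peak equals the sum of the two global peaks, which is exactly what the GP-PAM assumes; after rotation it drops to the largest sum of a high part and a low part.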
7.4
Power-Constrained Test Scheduling Using Power Profile Manipulation
So far‚ the enhanced power profile manipulation approach was introduced using the following components: test vector reordering‚ two local peak power approximation model and test sequence rotation. Test vector reordering has considered power dissipation as a design objective‚ which was minimized. Using the minimized power profile‚ the enhanced power approximation has reshaped it into a simple‚ reliable and accurate model‚ such that test sequence rotation can exploit this model in order to increase test concurrency under a power constraint. This section shows how power profile manipulation can be integrated into existing power-constrained test scheduling algorithms. The non-partitioned test scheduling algorithm for unequal test lengths described in [25] is extended for use in conjunction with power profile manipulation. The choice of using the simple non-partitioned test scheduling algorithm for
unequal test lengths from [25] is justified by its simplicity and by the need for comparison. It is important to note that power profile manipulation can be embedded into any other, more complex power-constrained test scheduling algorithm, including the PC-TSS described in Chapter 6. The extended PCTS algorithm starts with a preprocessing step (lines 1 to 5 in Figure 7.6), where all the test sequences are reordered according to the algorithm described in Figure 7.3. The resulting power profiles are modeled using the 2LP-PAM. The high power parts of the power profiles are then moved to the beginning of the test sequences by rotating them to the left by the length of their low power part (line 4). Next, the algorithm shown in Figure 7.6 determines all the cliques (i.e., completely connected subgraphs) of the test compatibility graph (TCG) (line 6). The TCG cliques represent the maximal groups of test compatible blocks. For each TCG clique, the algorithm computes all the maximal ordered subsets which comply with the given power constraint, referred to as the power compatible lists (PCLs) (lines 7 and 8). The following example explains PCLs.

Example 7.3 Consider the system with the TCG and 2LP-PAM power profiles shown in Figure 7.7. The cliques in this case follow from the TCG in Figure 7.7.
The maximality requirement of the PCLs means that no other test can be added to them without exceeding the power constraint. The tests in a PCL are arranged in descending order of their length. Consider one of the test compatible cliques. Figures 7.8(a) and 7.8(b) show the PCLs corresponding to this clique under a power constraint of 10, using the 2LP-PAM and GP-PAM approximations respectively. The tests are first sorted in descending order of their length. Then all tests are rotated left by the length of their low power part, so that their high power parts are moved to the beginning of the power profiles. The first test in the ordered list is added to the empty test session and the offset variable is initialized. The next test in the ordered list is rotated right by offset vectors and added to the test session, and the offset variable is increased. Finally, the remaining test is rotated right by the new value of the offset variable, offset = 50. The maximum value of the resulting power profile is lower than the power constraint. This means that by using the 2LP-PAM all tests in the clique can be scheduled in the same test session under the given power constraint, while by using the GP-PAM two test sessions are required to cover all tests in the clique. Using the 2LP-PAM during PCL construction resulted in a test schedule of 100 clock cycles for these tests, compared
with the 160 clock cycles test schedule corresponding to the GP-PAM based approach.

The PCLs are determined for each clique C of the TCG and the given power constraint using the algorithm shown in Figure 7.9. For each subset of C, the algorithm computes the optimum arrangement of its tests using test sequence rotation. The offset variable guides the rotation of the test sequences to be inserted into the current test session. If the resulting power profile complies with the power constraint, the algorithm checks whether the subset is maximal (lines 10 to 17), i.e., whether no other test in C can be added to the subset without violating the power constraint. The maximal power compatible subsets are then added to the set of PCLs.

Having explained how PCLs are computed (lines 7 and 8 in Figure 7.6, Example 7.3 and the algorithm in Figure 7.9), the derived power compatible lists (DPCLs) are computed next (lines 9 and 10 in Figure 7.6). A DPCL is generated recursively from a PCL or a DPCL by removing from it the longest test (or tests, if several have the same length). Thus, the DPCL derived from a PCL or DPCL H is an ordered subset of H such that the test length of its first element is strictly less than the test length of the first element in H. The process of deriving DPCLs is repeated on each
newly generated DPCL until no further derivation is possible‚ i.e.‚ the resulting DPCL has no elements. Finally‚ finding the optimum test schedule under the given power constraint reduces to the problem of finding a minimum cost cover for the DPCLs set (line 11 in Figure 7.6)‚ where the cost associated with each DPCL is the length of the longest test in the DPCL. The minimum cost covering problem was formulated as an integer linear programming (ILP) problem [39] and solved
using lp_solve. Although ILP is based on a branch and bound search algorithm‚ this problem is tractable‚ since the number of variables corresponding to the benchmark systems used in our experiments is relatively small. This section has shown how power profile manipulation can be integrated into an existing simple PCTS algorithm [25]. By combining power profile manipulation and power-constrained test scheduling it was demonstrated that‚ for the simple system shown in Example 7.3‚ test application time can be reduced (e.g.‚ from 160 clock cycles to 100 clock cycles).
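The DPCL derivation and the final covering step can be sketched as follows. The book formulates the cover as an ILP solved with lp_solve; the fragment below swaps in an exhaustive search, which is only practical for a handful of lists. The test names, lengths and PCLs are made up, and treating the PCLs themselves as candidates alongside their DPCLs is an assumption.

```python
from itertools import combinations


def derive_dpcls(pcl, length):
    """Repeatedly drop the longest test(s) until nothing is left."""
    derived = []
    current = tuple(pcl)
    while current:
        longest = max(length[t] for t in current)
        current = tuple(t for t in current if length[t] < longest)
        if current:
            derived.append(current)
    return derived


def min_cost_cover(candidates, length, tests):
    """Exhaustive minimum-cost cover; each list costs its longest test."""
    best_cost, best_cover = None, None
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            if set().union(*combo) >= set(tests):
                cost = sum(max(length[t] for t in lst) for lst in combo)
                if best_cost is None or cost < best_cost:
                    best_cost, best_cover = cost, combo
    return best_cost, best_cover


# Hypothetical test lengths (clock cycles) and power compatible lists.
length = {"t1": 100, "t2": 60, "t3": 40, "t4": 30}
pcls = [("t1", "t2", "t3"), ("t1", "t4")]

candidates = list(pcls)
for pcl in pcls:
    candidates.extend(derive_dpcls(pcl, length))
candidates = list(dict.fromkeys(candidates))   # drop duplicates, keep order

cost, cover = min_cost_cover(candidates, length, length.keys())
# cost == 130: one session for (t1, t2, t3) and a short one for (t4,)
```

The derived list containing only t4 is what lets the second session last 30 cycles instead of 100, which illustrates why DPCLs are added to the candidate set before the cover is computed.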
7.5
Experimental Results
The power profile manipulation approach was validated experimentally on hypothetical systems with randomly generated test compatibility graphs. The ISCAS85 benchmark circuits synthesized in Alcatel MTC35000 technology [3] were used as system blocks. The power profile manipulation approach was integrated into the PCTS algorithm from [25], and the algorithms were implemented in C++ on a 500 MHz Pentium III Linux machine with 128 MB of RAM. The power simulations were performed using the PowerMill tool from Synopsys [61] on a Sun UltraSparc 10 (450 MHz) workstation with 512 MB of RAM.

First, the benefits of the 2LP-PAM over the GP-PAM are outlined. Both the test-per-clock and test-per-scan testing schemes are analyzed. Column 2 in Table 7.2 shows the improvement in approximation accuracy of the 2LP-PAM over the traditional GP-PAM on the ISCAS85 benchmark suite for the test-per-clock testing scheme. Column 3 in Table 7.2 gives the improvement in approximation accuracy for the test-per-scan testing scheme.

Next, for systems having from 8 to 18 blocks, three sets of experiments were performed for a wide range of power constraints. The first set of experiments was performed using the original PCTS presented in [25]. The power model used in this case was the GP-PAM and no test sequence reordering was performed. The test application times obtained for this set of experiments for the test-per-clock and test-per-scan testing schemes are given in column 3 of Tables 7.3 and 7.4, respectively. The second set of experiments demonstrates the impact of test sequence reordering on the test application time. The power model used was, once again, the traditional GP-PAM, since the aim of these experiments is to show how lowering the power profiles of the blocks affects the concurrency in test sessions. The results obtained from this experiment for the test-per-clock and test-per-scan testing schemes are reported in column 4 of Tables 7.3 and 7.4, respectively.
The results in column 5 of Tables 7.3 and 7.4 correspond to the final set of experiments, which used all the features of power profile manipulation: test vector reordering, test sequence rotation and the 2LP-PAM power approximation model. Columns 6, 7 and 8 in Tables 7.3 and 7.4 give the reduction in test application time between pairs of the three sets of experiments. The results show that reordering the test vectors while still using the GP-PAM improves the total test application time by anywhere from a marginal amount up to 23%. However, when using power profile manipulation with all its components (test vector reordering, rotation and 2LP-PAM; column 7), the test application time is always reduced, with savings of up to 41%.

Figures 7.10(a) and 7.11(a) compare the test application times obtained in the three previously mentioned sets of experiments when the power constraint and the number of blocks per system are varied. The results corresponding to each experimental set are represented as a surface. The highest surface plot is the one corresponding to the original PCTS algorithm. Underneath it,
the surface plot corresponding to the original PCTS combined with the reordered test sequences under the traditional GP-PAM already shows reductions in test application time, due to the increased concurrency per test session produced by the lower power profiles. Finally, the lowest surface plot shows the results obtained by combining the power profile manipulation approach with the exact PCTS algorithm [25]. For this last experiment, savings and consistently minimum values in test application time can be observed when compared to the previous two experimental sets.

Figures 7.10(b) and 7.11(b) show the variation in test application time with the power constraint for a given number of blocks (planes from the surfaces shown in Figures 7.10(a) and 7.11(a)). As the power constraint becomes less restrictive (the power constraint value increases), the test application times obtained for the three sets of experiments converge to a minimum given by the maximum test concurrency achievable under the existing resource sharing relationships among the system's blocks. However, for tight power constraints, power profile manipulation brings savings in the test application time when compared to the original PCTS algorithm.

Finally, the influence of the number of blocks on the test application time is examined for a given power constraint. Since power profile manipulation is a complementary technique introduced to improve PCTS algorithms, the test application time when using power profile manipulation always maintains or improves on the result of the original PCTS algorithm, as shown in Figures 7.10(c) and 7.11(c).
7.6
Summary
This chapter has presented a power profile manipulation approach based on the following components: test vector reordering, a two local peak power approximation model (2LP-PAM) and test sequence rotation. Initially, test vector reordering considered power dissipation as a design objective. Then, using the minimized power profile, the two local peak power approximation model has reshaped the power profile into a simple, reliable and accurate test power model, which can be exploited by test sequence rotation in order to increase test concurrency
under a given power constraint. Since power profile manipulation is orthogonal to the test scheduling policy and the test set values, its distinctive feature is that it can be included equally well in any power-constrained test scheduling algorithm to improve its performance.
Chapter 8 CONCLUSION
The demand for low power VLSI digital circuits in the growing area of portable communications and computing systems will continue to increase in the future. The cost and life cycle of these products will depend not only on low power synthesis techniques, but also on new DFT methods targeting power minimization during test application. This is because the traditional DFT methods are not suitable for testing low power VLSI circuits, since they reduce the reliability and manufacturing yield. This book has focused on power-constrained testing of VLSI circuits. The results presented in this book further motivate the need for new DFT methods for testing low power VLSI circuits, especially for SOCs.

Recent advances in manufacturing technology have provided the opportunity to integrate millions of transistors on an SOC. Even though the design process of embedded core-based SOCs is conceptually analogous to traditional board design, their manufacturing processes are fundamentally different [149]. True inter-operability can be achieved only if the tests for these cores can also be reused. Therefore, novel problems need to be addressed and new challenges arise for the research community in order to provide unified solutions that will simplify the VLSI design flow and provide a plug-and-play methodology for the core-based design paradigm. Due to the increasing complexity of future SOCs combined with the multitude of factors that influence the cost of test, a new generation of power-conscious test techniques, DFT automation tools, structured methodologies with seamless integration into the design flow, and novel architecture-specific, software-centered, embedded, structural and deterministic test approaches are anticipated.

Low threshold devices have an increased leakage current that flows through a transistor which is turned off but is powered. For high supply and threshold voltages this current is insignificant when compared to the dynamic component. However, due to circuit miniaturization and lower supply voltages, this leakage current emerges as an important obstruction to power efficient chip design in deep submicron manufacturing processes. Further, due to increased leakage, the effectiveness of IDDQ testing for low voltage circuits is also reduced. Several approaches based on two-parameter test techniques (IDDQ and maximum operating frequency) [73] and leakage control techniques [20] have recently been proposed to overcome this problem. More challenges lie ahead, especially for performance testing. This is because many low power designs trade off the performance of a circuit against the power dissipation by reducing the leakage in non-critical paths using high threshold cells. Consequently, the use of multiple threshold circuits combined with modeling inaccuracies can cause the causality relationship between signals to be lost, thus inducing erroneous test results.

Low power design automation is seeking simultaneous optimization of timing and power, which will enable, in addition to timing sign-off, a power sign-off. It is important that test automation tools interact with low power design tools and exploit this power closure information. For example, the power reports containing voltage drop and thermal data can be used for tuning the current DFT support towards meeting the operating power ratings. In addition, ATPG algorithms can be enhanced to support high test sequence correlation along with graceful degradation in testing time. To take full advantage of low power DFT features, a structured and seamless integration into the design flow is required. Decisions made during the early stages of product specification and refinement may impact the testability of the design.
For example, preliminary package decisions and test plan preparation should include power ratings, since they will also influence the DFT strategy to be coupled with the design flow. New test methodologies need to track power related decisions across the entire VLSI design flow, from architectural exploration, such as partitioning the design into multiple supply voltage domains, down to scan clock tree synthesis and test access routing.

By the end of the decade the on-chip interconnections will be a limiting factor for performance and energy consumption; thus new SOC paradigms, such as Networks on Chips [9], are currently emerging. This paradigm postulates that on-chip micronetworks will address problems caused by physical interconnections and that the overall design goal for most SOCs will be to satisfy the Quality of Service metrics with the least energy consumption. Since design and test methodologies need to be coupled in the early stages of the design flow, these new SOC paradigms will trigger appropriate test methodologies where handling power during performance testing and in-field diagnosis will be of prime importance. These in-field power-conscious test methodologies should include the following features: architecture-specific, software-centered, embedded, structural and deterministic.

The low power design philosophy has expanded over the last several years, as an increasing number of hardware designers and embedded software developers have become aware of the importance and benefits of considering power issues in the very early stages of product development. To keep pace with low power design, it is expected that test methodologies will soon regard power-constrained testing as an important parameter in establishing the manufacturing test flow. This will ultimately reduce the cost of test by lowering thermal density and thus increasing the test throughput, as well as providing reliable in-field self-test and diagnosis for high performance/low energy electronic products.
References
[1] M. Abramovici, M.A. Breuer, and A.D. Friedman. Digital Systems Testing and Testable Design. IEEE Press, 1990.
[2] V. D. Agrawal. Editorial - special issue on partial scan design. Journal of Electronic Testing: Theory and Applications (JETTA), 7(5):5–6, August 1995.
[3] Alcatel. MTC35000 Standard Cell Library. Alcatel, 1998.
[4] A. Allan, D. Edenfeld, W. H. Joyner Jr., A. B. Kahng, M. Rodgers, and Y. Zorian. 2001 technology roadmap for semiconductors. Computer, 12(1):42–53, January 2002.
[5] AMS. 0.35 Micron CMOS Process Parameters. Austria Mikro Systeme International AG, 1998.
[6] D. Bakalis, H. T. Vergos, D. Nikolos, X. Kavousianos, and G. P Alexiou. Low power dissipation in BIST schemes for modified booth multipliers. In International Symposium on Defect and Fault Tolerance in VLSI Systems, pages 121–129, 1999.
[7] P.H. Bardell, W.H. McAnney, and J. Savir. Built-in Self Test - Pseudorandom Techniques. John Wiley & Sons, 1986.
[8] L. Benini, A. Bogliolo, and G. de Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 8(3):299–316, June 2000.
[9] L. Benini and G. de Micheli. Networks on chips: A new SoC paradigm. Computer, 12(1):70–78, January 2002.
[10] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. A gated clock scheme for low power scan-based BIST. In Proc. IEEE On-Line Testing Workshop, pages 87–89, 2001.
[11] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. A gated clock scheme for low power scan testing of logic ICs or embedded cores. In Proc. IEEE Asian Test Symposium, pages 253–258, November 2001.
[12] M. Brazzarola and F. Fummi. Power characterization of LFSRs. In International Symposium on Defect and Fault Tolerance in VLSI Systems, pages 138–146, 1999.
[13] F. Brglez, D. Bryan, and K. Kozminski. Combinational profiles of sequential benchmark circuits. In Proc. International Symposium on Circuits and Systems, pages 1929–1934, 1989.
[14] R. E. Bryant, K. T. Cheng, A. B. Kahng, K. Keutzer, W. Maly, R. Newton, L. Pileggi, J. M. Rabaey, and A. Sangiovanni-Vincentelli. Limitations and challenges of computer-aided design technology for CMOS VLSI. Proceedings of the IEEE, 89(3):341–365, March 2001.
[15] M.L. Bushnell and V.D. Agrawal. Essentials of Electronic Testing. Kluwer Academic Publishers, 2000.
[16] K. Chakrabarty. Design of system-on-a-chip test access architectures under place-and-route and power constraints. In Proc. IEEE/ACM Design Automation Conference, pages 432–437, 2000.
[17] S. Chakravarty and V. Dabholkar. Two techniques for minimizing power dissipation in scan circuits during test application. In Proc. IEEE Asian Test Symposium, pages 324–329, 1994.
[18] S. Chakravarty, J. Monzel, V. D. Agrawal, R. Aitken, J. Braden, J. Figueras, S. Kumar, H. J. Wunderlich, and Y. Zorian. Power dissipation during testing: Should we worry about it? In Proc. IEEE VLSI Test Symposium, page 456, 1997.
[19] A. Chandra and K. Chakrabarty. Combining low-power scan testing and test data compression for system-on-a-chip. In Proc. IEEE/ACM Design Automation Conference, pages 166–169, 2001.
[20] Z. Chen, L. Wei, and K. Roy. On effective IDDQ testing of low-voltage CMOS circuits using leakage control techniques. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(5):718–725, October 2001.
[21] K. L. Cheng, C. M. Hsueh, J. R. Huang, J. C. Yeh, C. T. Huang, and C. W. Wu. Automatic generation of memory built-in self-test cores for system-on-chip. In Proc. IEEE Asian Test Symposium, pages 91–96, November 2001.
[22] H. Cheung and S. Gupta. A BIST methodology for comprehensive testing of RAM with reduced heat dissipation. In Proc. IEEE International Test Conference, pages 386–395, 1996.
[23] V. Chickermane and J. H. Patel. A fault oriented partial scan design approach. In Proc. International Conference on Computer Aided Design, pages 400–403, 1991.
[24] R. M. Chou, K. K. Saluja, and V. D. Agrawal. Power constraint scheduling of tests. In Proc. 7th International Conference VLSI Design, pages 271–274, 1994.
[25] R. M. Chou, K. K. Saluja, and V. D. Agrawal. Scheduling tests for VLSI systems under power constraints. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 5(2):175–184, June 1997.
[26] F. Corno, P. Prinetto, M. Rebaudengo, and M. Sonza-Reorda. A test pattern generation methodology for low power consumption. In Proc. IEEE VLSI Test Symposium, pages 453–460, 1998.
[27] F. Corno, M. Rebaudengo, M. Sonza Reorda, G. Squillero, and M. Violante. Low power BIST via non-linear hybrid cellular automata. In Proc. IEEE VLSI Test Symposium, pages 29–34, 2000.
[28] F. Corno, M. Rebaudengo, M. Sonza Reorda, and M. Violante. A new BIST architecture for low power circuits. In Proc. IEEE European Test Workshop, pages 160–164, 1999.
[29] F. Corno, M. Rebaudengo, M. Sonza Reorda, and M. Violante. Optimal vector selection for low power BIST. In Proc. IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pages 219–226, 1999.
[30] J. Costa, P. Flores, H. Neto, J. Monteiro, and J. Silva. Exploiting don’t cares in test patterns to reduce power during BIST. In Proc. IEEE European Test Workshop, 1998.
[31] J. Costa, P. Flores, H. Neto, J. Monteiro, and J. Silva. Power reduction in BIST by exploiting don’t cares in test patterns. In Proc. IEEE International Workshop on Logic Synthesis, 1998.
[32] G. L. Craig, C. R. Kime, and K. K. Saluja. Test scheduling and control for VLSI built-in self-test. IEEE Transactions on Computers, 37(9):1099–1109, September 1988.
[33] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S. M. Reddy. Techniques for minimizing power dissipation in scan and combinational circuits during test application. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(12):1325–1333, December 1998.
[34] G. de Micheli. Synthesis and Optimization of Digital Circuits. McGraw-Hill International Editions, 1994.
[35] A. Dekkers and E. Aarts. Global optimization using simulated annealing. Mathematical Programming, 50(1):367–393, 1991.
[36] S. Dey, A. Raghunathan, and K. D. Wagner. Design for testability techniques at the behavioural and register-transfer levels. Journal of Electronic Testing: Theory and Applications (JETTA), 13(2):79–91, October 1998.
[37] Exemplar Logic. Leonardo Spectrum Release Notes. Exemplar Logic Incorporated, 1999.
[38] P. Flores, J. Costa, H. Neto, J. Monteiro, and J. Marques-Silva. Assignment and reordering of incompletely specified pattern sequences targeting minimum power dissipation. In Proc. IEEE International Conference on VLSI Design, pages 37–41, 1999.
[39] S.H. Gerez. Algorithms for VLSI Design Automation. John Wiley & Sons, 1999.
[40] S. Gerstendorfer and H. J. Wunderlich. Minimized power consumption for scan-based BIST. In Proc. IEEE International Test Conference, pages 77–84, 1999.
[41] S. Gerstendorfer and H. J. Wunderlich. Minimized power consumption for scan-based BIST. Journal of Electronic Testing: Theory and Applications (JETTA), 16(3):203–212, June 2000.
[42] P. Girard. Low power testing of VLSI circuits: Problems and solutions. In Proc. IEEE International Symposium on Quality of Electronic Design, pages 173–180, 2000.
[43] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. Circuit partitioning for low power BIST design with minimized peak power consumption. In Proc. IEEE Asian Test Symposium, pages 89–94, 1999.
[44] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. A test vector inhibiting technique for low energy BIST design. In Proc. IEEE VLSI Test Symposium, pages 407–412, 1999.
[45] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. A test vector ordering technique for switching activity reduction during test operation. In Proc. IEEE Great Lakes Symposium on VLSI, pages 24–27, 1999.
[46] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. An adjacency-based test pattern generator for low power BIST design. In Proc. IEEE Asian Test Symposium, pages 459–464, November 2000.
[47] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch. Low power BIST design by hypergraph partitioning: Methodology and architectures. In Proc. IEEE International Test Conference, pages 652–661, October 2000.
[48] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, J. Figueras, S. Manich, P. Teixeira, and M. Santos. Low power pseudo-random BIST: on selecting the LFSR seed. In Design of Circuits and Integrated Systems Conference, pages 166–172, 1998.
[49] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, and H. J. Wunderlich. A modified clock scheme for a low power BIST test pattern generator. In Proc. IEEE VLSI Test Symposium, pages 306–311, 2001.
[50] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac. Reduction of power consumption during test application by test vector ordering. IEE Electronics Letters, 33(21):1752–1754, 1997.
[51] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac. Reducing power consumption during test application by test vector ordering. In Proc. IEEE International Symposium on Circuits and Systems, pages 296–299, 1998.
[52] D. Gizopoulos, N. Kranitis, A. Paschalis, M. Psarakis, and Y. Zorian. Effective low power BIST for datapaths. In Proc. of the Design, Automation and Test in Europe Conference, page 757, 2000.
[53] D. Gizopoulos, N. Kranitis, A. Paschalis, M. Psarakis, and Y. Zorian. Low power/energy BIST for datapaths. In Proc. IEEE VLSI Test Symposium, pages 23–28, 2000.
[54] D. Gizopoulos, M. Psarakis, A. Paschalis, N. Kranitis, and Y. Zorian. Low power built-in self-test for datapath architectures. In Proc. of the 2nd International Workshop on Microprocessor Test and Verification, pages 23–28, 1999.
[55] F. Glover and M. Laguna. Tabu search. In C.R. Reeves, editor, Modern Heuristic Techniques for Combinatorial Problems, pages 70–150. McGraw-Hill Book Company, 1995.
[56] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici. Improving compression ratio, area overhead, and test application time for system-on-a-chip test data compression/decompression. In Proc. IEEE/ACM Design Automation and Test in Europe, pages 604–611, March 2002.
[57] M. E. Hamid and C. I. H. Chen. A note to low-power linear feedback shift registers. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 45(9):1304–1307, September 1998.
[58] I. Hamzaoglu and J. H. Patel. New techniques for deterministic test pattern generation. Journal of Electronic Testing: Theory and Applications (JETTA), 15(1/2):63–73, August 1999.
[59] I. Hamzaoglu and J. H. Patel. Test set compaction algorithms for combinational circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(8):957–962, August 2000.
[60] A. Hertwig and H. J. Wunderlich. Low power serial built-in self-test. In Proc. IEEE European Test Workshop, pages 49–53, 1998.
[61] C. X. Huang, B. Zhang, A. C. Deng, and B. Swirski. The design and implementation of PowerMill. In Proc. IEEE International Symposium on Low Power Design (ISLPED), pages 105–110, 1995.
[62] T. C. Huang and K. J. Lee. An input control technique for power reduction in scan circuits during test application. In Proc. IEEE Asian Test Symposium, pages 315–320, 1999.
[63] T. C. Huang and K. J. Lee. A low-power LFSR architecture. In Proc. IEEE Asian Test Symposium, page 470, November 2001.
[64] T. C. Huang and K. J. Lee. Reduction of power consumption in scan-based circuits during test application by an input control technique. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(7):911–917, July 2001.
[65] T. C. Huang and K. J. Lee. A token scan architecture for low-power testing. In Proc. IEEE International Test Conference, pages 661–669, October 2001.
[66] T. C. Huang and K. J. Lee. Token scan cell for low power testing. IEE Electronics Letters, 37(11):678–679, May 2001.
[67] Y. Huang, W. T. Cheng, C. C. Tsai, N. Mukherjee, O. Samman, Y. Zaidan, and S. M. Reddy. Resource allocation and test scheduling for concurrent test of core based SOC design. In Proc. IEEE Asian Test Symposium, pages 265–270, November 2001.
[68] S. A. Hwang and C. W. Wu. Low-power testing for C-testable iterative logic arrays. In Proceedings International Conference on VLSI Technology, Systems, and Applications, pages 355–358, 1997.
[69] International SEMATECH. The International Technology Roadmap for Semiconductors (ITRS): 1999 Edition. http://public.itrs.net/1999_SIA_Roadmap/Home.htm, 1999.
[70] International SEMATECH. The International Technology Roadmap for Semiconductors (ITRS): 2001 Edition. http://public.itrs.net/Files/2001ITRS/Home.htm, 2001.
[71] S. Kajihara, K. Ishida, and K. Miyase. Test power reduction for full scan sequential circuits by test vector modification. In The Second Workshop on RTL ATPG & DFT, pages 140–145, November 2001.
[72] C. Kern and M. R. Greenstreet. Formal verification in hardware design: A survey. ACM Transactions on Design Automation of Electronic Systems (TODAES), 4(2):123–193, April 1999.
[73] A. Keshavarzi, K. Roy, and C. F. Hawkins. Intrinsic leakage in deep submicron CMOS ICs-measurement-based test solutions. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 8(6):717–723, December 2000.
[74] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4698):671–680, 1983.
[75] P. Kollig and B. M. Al-Hashimi. A new approach to simultaneous scheduling, allocation and binding in high level synthesis. IEE Electronics Letters, 33(18):1516–1518, August 1997.
[76] N. Kranitis, D. Gizopoulos, A. Paschalis, M. Psarakis, and Y. Zorian. Power/energy-efficient BIST schemes for processor data paths. IEEE Design and Test of Computers, 17(4):15–28, December 2000.
[77] G. Lakshminarayana, A. Raghunathan, N. K. Jha, and S. Dey. Power management in high level synthesis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(1):7–15, March 1999.
[78] E. Larsson and Z. Peng. An estimation-based technique for test scheduling. In Electronic Circuits and Systems Conference, pages 25–28, 1999.
[79] E. Larsson and Z. Peng. Test infrastructure design and test scheduling optimization. In Proc. IEEE European Test Workshop, 2000.
[80] E. Larsson and Z. Peng. The design and optimization of SOC test solutions. In Proc. IEEE/ACM International Conference on Computer Aided Design, pages 523–530, November 2001.
[81] E. Larsson and Z. Peng. Test scheduling and scan-chain division under power constraint. In Proc. IEEE Asian Test Symposium, pages 259–264, November 2001.
[82] H. K. Lee and D. S. Ha. An efficient forward fault simulation algorithm based on the parallel pattern single fault propagation. In Proc. International Test Conference, pages 946–955, 1991.
[83] H. K. Lee and D. S. Ha. On the generation of test patterns for combinational circuits. Technical Report No. 12-93, Department of Electrical Engineering, Virginia Polytechnic Institute and State University, 1991.
[84] K. J. Lee, T. C. Huang, and J. J. Chen. Peak-power reduction for multiple-scan circuits during test application. In Proc. IEEE Asian Test Symposium, pages 453–458, November 2000.
[85] S. P. Lin, C. A. Njinda, and M. A. Breuer. Generating a family of testable designs using the BILBO methodology. Journal of Electronic Testing: Theory and Applications (JETTA), 4(2):71–89, 1993.
[86] M. Lowy. Parallel implementation of linear feedback shift registers for low power applications. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 43(6):458–466, June 1996.
[87] E. Macii, M. Pedram, and F. Somenzi. High level power modeling, estimation, and optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(11):1061–1079, November 1998.
[88] S. Manich, A. Gabarro, M. Lopez, J. Figueras, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, P. Teixeira, and M. Santos. Energy and average power consumption reduction in LFSR based BIST structures. In Design of Circuits and Integrated Systems Conference, pages 651–656, 1999.
[89] S. Manich, A. Gabarro, M. Lopez, J. Figueras, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, P. Teixeira, and M. Santos. Low power BIST by filtering non-detecting vectors. In Proc. IEEE European Test Workshop, pages 165–170, 1999.
[90] S. Manich, A. Gabarro, M. Lopez, J. Figueras, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, P. Teixeira, and M. Santos. Low power BIST by filtering non-detecting vectors. Journal of Electronic Testing: Theory and Applications (JETTA), 16(3):193–202, June 2000.
[91] Model Technology. ModelSim Tutorial. Model Technology Incorporated, 2000.
[92] V. Muresan, V. Muresan, X. Wang, and M. Vladutiu. The left edge algorithm and the tree growing technique in block-test scheduling under power constraints. In Proc. IEEE VLSI Test Symposium, pages 417–422, 2000.
[93] V. Muresan, V. Muresan, X. Wang, and M. Vladutiu. The left edge algorithm in block-test scheduling under power constraints. In Proc. IEEE International Symposium on Circuits and Systems, pages 351–354, 2000.
[94] V. Muresan, X. Wang, V. Muresan, and M. Vladutiu. A comparison of classical scheduling approaches in power-constrained block-test scheduling. In Proc. IEEE International Test Conference, pages 882–891, 2000.
[95] V. Muresan, X. Wang, V. Muresan, and M. Vladutiu. Distribution-graph based approach and tree growing technique in power-constrained block-test scheduling. In Proc. IEEE Asian Test Symposium, pages 465–470, 2000.
[96] V. Muresan, X. Wang, V. Muresan, and M. Vladutiu. Power-constrained block-test list scheduling. In Proc. IEEE International Workshop on Rapid System Prototyping, pages 182–187, 2000.
[97] V. Muresan, X. Wang, V. Muresan, and M. Vladutiu. A combined tree growing technique for block-test scheduling under power constraints. In Proc. IEEE International Symposium on Circuits and Systems, pages (V)255–(V)258, 2001.
[98] V. Muresan, X. Wang, V. Muresan, and M. Vladutiu. Mixed classical scheduling algorithms and tree growing technique in block-test scheduling under power constraints. In Proc. IEEE International Workshop on Rapid System Prototyping, pages 162–167, 2001.
[99] W. M. Needham. Nanometer technology challenges for test and test equipment. Computer, 32(11):52–57, November 1999.
[100] N. Nicolici. Power Minimisation Techniques for Testing Low Power VLSI Circuits. PhD thesis, University of Southampton, UK, http://www.bib.ecs.soton.ac.uk/records/4937/, October 2000.
[101] N. Nicolici and B. M. Al-Hashimi. Correction to the proof of theorem 2 in Parallel Signature Analysis Design with Bounds on Aliasing. IEEE Transactions on Computers, 47(12):1426, December 1998.
[102] N. Nicolici and B. M. Al-Hashimi. Efficient BIST hardware insertion with low test application time for synthesized data paths. In Proc. IEEE/ACM Design Automation and Test in Europe, pages 289–295, 1999.
[103] N. Nicolici and B. M. Al-Hashimi. Power conscious test synthesis and scheduling for BIST RTL data paths. In Proc. IEEE International Test Conference, pages 662–671, 2000.
[104] N. Nicolici and B. M. Al-Hashimi. Power minimisation techniques for testing low power VLSI circuits. In IEE/EPSRC Postgraduate Research in Electronics and Photonics, pages 7–12, 2000.
[105] N. Nicolici and B. M. Al-Hashimi. Scan latch partitioning into multiple scan chains for power minimization in full scan sequential circuits. In Proc. IEEE/ACM Design Automation and Test in Europe, pages 715–722, 2000.
[106] N. Nicolici and B. M. Al-Hashimi. Exploring testability trade-offs for BIST RTL data paths: The case for three dimensional design space. In Proc. IEEE/ACM Design Automation and Test in Europe, page 802, March 2001.
[107] N. Nicolici and B. M. Al-Hashimi. Low power test compatibility classes: Exploiting regularity for simultaneous reduction in test application time and power dissipation. In The Second Workshop on RTL ATPG & DFT, pages 134–139, November 2001.
[108] N. Nicolici and B. M. Al-Hashimi. Minimising power dissipation in partial scan sequential circuits. IEE Proceedings - Computers and Digital Techniques, 148(4):163–166, September 2001.
[109] N. Nicolici and B. M. Al-Hashimi. Tackling test trade-offs for BIST RTL data paths: BIST area overhead, test application time and power dissipation. In Proc. IEEE International Test Conference, pages 72–81, October 2001.
[110] N. Nicolici and B. M. Al-Hashimi. Multiple scan chains for power minimization during test application in sequential circuits. IEEE Transactions on Computers, 51(6):721–734, June 2002.
[111] N. Nicolici, B. M. Al-Hashimi, A. D. Brown, and A. C. Williams. BIST hardware synthesis for RTL data paths based on test compatibility classes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(11):1375–1385, November 2000.
[112] N. Nicolici, B. M. Al-Hashimi, and A. C. Williams. Minimisation of power dissipation during test application in full scan sequential circuits using primary input freezing. IEE Proceedings - Computers and Digital Techniques, 147(5):313–322, September 2000.
[113] R.B. Norwood. Synthesis-for-Scan: Reducing Scan Overhead with High Level Synthesis. PhD thesis, Stanford University, November 1997.
[114] M. Pedram. Power minimization in IC design: Principles and applications. ACM Transactions on Design Automation of Electronic Systems (TODAES), 1(1):3–56, January 1996.
[115] B. Pouya and A. L. Crouch. Optimization trade-offs for vector volume and test power. In Proc. IEEE International Test Conference, pages 873–881, October 2000.
[116] A. Raghunathan, N. K. Jha, and S. Dey. Register transfer level power optimization with emphasis on glitch analysis and reduction. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(8):1114–1131, August 1999.
[117] C. P. Ravikumar, G. Chandra, and A. Verma. Simultaneous module selection and scheduling for power-constrained testing of core based systems. In Proc. IEEE International Conference on VLSI Design, pages 462–467, 2000.
[118] C. P. Ravikumar and N. S. Prasad. Evaluating BIST architectures for low power. In Proc. IEEE Asian Test Symposium, pages 430–434, 1998.
[119] C. P. Ravikumar, A. Verma, and G. Chandra. A polynomial-time algorithm for power constrained testing of core based systems. In Proc. IEEE Asian Test Symposium, pages 107–112, 1999.
[120] P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici. Power constrained test scheduling using power profile manipulation. In Proc. IEEE International Symposium on Circuits and Systems, pages (V)251–(V)254, May 2001.
[121] P. M. Rosinger, P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici. Simultaneous reduction in volume of test data and power dissipation for systems-on-a-chip. IEE Electronics Letters, 37(24):1434–1436, November 2001.
[122] K. Roy and S. Prasad. Low-Power CMOS VLSI Circuit Design. John Wiley & Sons, 2000.
[123] E. M. Rudnick, J. H. Patel, G. S. Greenstein, and T. M. Niermann. A genetic algorithm framework for test generation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 16(9):1034–1044, September 1997.
[124] D. G. Saab, Y. G. Saab, and J. A. Abraham. Automatic test vector cultivation for sequential VLSI circuits using genetic algorithms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(10):1278–1285, October 1996.
[125] R. Sankaralingam, R. R. Oruganti, and N. A. Touba. Static compaction techniques to control scan vector power dissipation. In Proc. IEEE VLSI Test Symposium, pages 35–40, 2000.
[126] R. Sankaralingam, B. Pouya, and N. A. Touba. Reducing power dissipation during test using scan chain disable. In Proc. IEEE VLSI Test Symposium, pages 319–324, 2001.
[127] J. Saxena, K. Butler, and L. Whetsel. A scheme to reduce power consumption during scan testing. In Proc. IEEE International Test Conference, pages 670–677, October 2001.
[128] T. Schuele and A. P. Stroele. Test scheduling for minimal energy consumption under power constraints. In Proc. IEEE VLSI Test Symposium, pages 312–318, 2001.
[129] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On average power dissipation and random pattern testability of CMOS combinational logic networks. In Proc. IEEE/ACM International Conference on Computer Aided Design, pages 402–407, 1992.
[130] D. W. Stockburger. Multivariate Statistics: Concepts, Models, and Applications. Southwest Missouri State University, 1998.
[131] A. Vittal and M. Marek-Sadowska. Low-power buffered clock tree design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 16(9):965–975, September 1997.
[132] H. Vranken, T. Waayers, H. Fleury, and D. Lelouvier. Enhanced reduced-pin-count test for full-scan design. In Proc. IEEE International Test Conference, pages 738–747, October 2001.
[133] C. W. Wang, R. S. Tzeng, C. F. Wu, C. T. Huang, C. W. Wu, S. Y. Huang, S. H. Lin, and H. P. Wang. A built-in self-test and self-diagnosis scheme for heterogeneous SRAM clusters. In Proc. IEEE Asian Test Symposium, pages 103–108, November 2001.
[134] S. Wang. Minimizing Heat Dissipation During Test Application. PhD thesis, University of Southern California, May 1998.
[135] S. Wang and S. K. Gupta. ATPG for heat dissipation minimization during test application. In Proc. IEEE International Test Conference, pages 250–258, 1994.
[136] S. Wang and S. K. Gupta. ATPG for heat dissipation minimization during scan testing. In Proc. 34th Design Automation Conference, pages 614–619, 1997.
[137] S. Wang and S. K. Gupta. DS-LFSR: A new BIST TPG for low heat dissipation. In Proc. IEEE International Test Conference, pages 848–857, 1997.
[138] S. Wang and S. K. Gupta. ATPG for heat dissipation minimization during test application. IEEE Transactions on Computers, 47(2):256–262, February 1998.
[139] S. Wang and S. K. Gupta. LT-RTPG: A new test-per-scan BIST TPG for low heat dissipation. In Proc. IEEE International Test Conference, pages 85–94, 1999.
[140] L. Whetsel. Adapting scan architectures for low power operation. In Proc. IEEE International Test Conference, pages 863–872, October 2000.
[141] L. Xu, Y. Sun, and H. Chen. Scan array solution for testing power and testing time. In Proc. IEEE International Test Conference, pages 652–659, October 2001.
[142] M. Zellerohr, A. Hertwig, and H. J. Wunderlich. Pattern selection for low-power serial built-in self-test. In Proc. IEEE International Test Synthesis Workshop, 1998.
[143] X. Zhang and K. Roy. Design and synthesis of low power weighted random pattern generator considering peak power reduction. In Proc. IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pages 148–156, 1999.
[144] X. Zhang and K. Roy. Peak power reduction in low power BIST. In Proc. IEEE First International Symposium on Quality Electronic Design, pages 425–432, 2000.
[145] X. Zhang and K. Roy. Power reduction in test-per-scan BIST. In Proc. IEEE On-Line Testing Workshop, pages 133–138, 2000.
[146] X. Zhang, K. Roy, and S. Bhawmik. POWERTEST: A tool for energy conscious weighted random pattern testing. In Proc. IEEE International Conference on VLSI Design, pages 416–422, 1999.
[147] X. Zhang, W. Shan, and K. Roy. Low-power weighted random pattern testing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(11):1389–1398, November 2000.
[148] Y. Zorian. A distributed BIST control scheme for complex VLSI devices. In Proc. 11th IEEE VLSI Test Symposium, pages 4–9, 1993.
[149] Y. Zorian, S. Dey, and M. Rodgers. Test of future system-on-chips. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pages 392–398, 2000.
About the Authors
Nicola Nicolici is an Assistant Professor in the Department of Electrical and Computer Engineering at McMaster University, Canada. He received the Dipl. Ing. degree in Computer Engineering from the University of Romania (1997) and a Ph.D. in Electronics and Computer Science from the University of Southampton, U.K. (2000). Nicola’s current research interests are in the broad area of computer-aided design and test technologies with special emphasis on systems-on-a-chip. He has authored a number of papers in these areas and received the IEEE TTTC Beausang Award for the Best Student Paper at the International Test Conference (ITC 2000). He is a member of the ACM SIGDA and the IEEE Computer and CAS Societies.

Bashir M. Al-Hashimi received the B.Sc. degree in Electrical and Electronics Engineering from Bath University, UK, in 1984, and the D.Phil. degree from York University, UK, in 1989. Following this he spent over 5 years in industry working mainly in the areas of mixed-signal IC design and test, SPICE modeling and simulation, and design automation of electronic systems. In 1994, he joined Staffordshire University, UK, where he formed the VLSI Signal Processing research group with funding from industry and government. In 1999, he joined the Department of Electronics and Computer Science, Southampton University, UK, as a Senior Lecturer. His current research interests are analog and digital VLSI design and test, SOC low power testing, and HW/SW co-design. He has more than 70 publications in refereed journals and conferences, including a paper that received the IEEE TTTC Beausang Award at the International Test Conference. He has served on numerous IEE and IEEE technical committees of national and international conferences, most recently IEEE ISCAS, ETW, and DATE. Dr Al-Hashimi is a senior member of the IEEE, a member of the IEE, and a Chartered Engineer.
Index
automatic test pattern generation (ATPG)
    ATALANTA, 105, 111
    ATOM, 75, 84, 106, 111
    GATEST, 53, 55, 79
    MINTEST, 75, 84, 106, 109, 111
    post-ATPG, 34, 39
backward justification, 101
BIST area overhead (BAO), 17–19, 46, 115, 119, 124, 127–131, 133–136, 138, 139
built-in logic block observer (BILBO), 8
circuit reliability, 3, 8, 21, 23, 25, 30, 85, 137, 139, 142, 143, 161
circuit under test (CUT), 4–10, 25, 26, 36, 38, 39, 41, 55, 56, 89, 90, 104, 144
clock tree, 36, 87, 88, 95, 96, 162
compact test set, 75–78, 106
complementary metal-oxide semiconductor (CMOS), 2, 4, 21, 22, 26, 144
concurrent built-in logic block observer (CBILBO), 8
design for test (DFT), 1, 6, 7, 9, 21, 25, 27, 30, 51, 84, 85, 87–89, 92, 95–97, 111, 112, 161, 162
discrete cosine transform (DCT), 17–19, 134, 135, 138, 139
elliptic wave filter (EWF), 134, 135, 139
fault coverage, 5, 6, 8–10, 39, 51, 53, 55, 79, 90, 107, 135
freezing signal, 91, 92, 100, 101
full scan, 6, 38, 51–54, 59, 62, 70, 73, 75, 78, 82, 84, 85, 88, 96, 102, 104
genetic algorithms, 38
glitches, 22
hardware description language (HDL), 2, 51, 115
high level synthesis (HLS), 115, 134
    ARGEN, 134
IDDQ, 162
ISCAS89 benchmark circuits, 53, 75, 79, 104, 105, 108, 111
linear feedback shift register (LFSR), 8–10, 40, 41, 68, 96
logic level, 2, 4, 5, 26, 51, 55, 79, 87, 89, 117, 139
low power, 1, 9, 19, 21, 26–29, 34, 36, 38, 45, 47, 85, 115, 139, 144, 146, 147, 161, 162
    BIST, 116
    design, 21, 117, 162, 163
    synthesis, 26, 161
    testing, 30, 75, 141
manufacturing yield, 21, 25, 137, 161
Moore’s law, 2
multiple-input signature analyser (MISR), 8, 10
necessary power dissipation, 36, 117
node transition count (NTC), 22, 23, 53–55, 58–61, 66, 67, 70, 74–82, 84, 85, 102–108, 111
non-compact test set, 75, 76, 106
NP-hard, 72, 146
partial scan, 6, 51, 52, 54–56, 62, 67, 72, 73, 79, 82, 84, 85
    OPUS, 55, 79
performance degradation (PD), 26, 36, 38, 51, 119, 127, 130
power conscious test synthesis and scheduling (PC-TSS), 124, 127, 134–136, 138, 150
power management, 8, 27, 30
PowerMill, 23, 145, 155
primary inputs, 6, 9, 24, 26, 34, 36, 37, 39, 40, 51–53, 55–62, 65–70, 72–76, 78–80, 85, 87–98, 100, 103, 104, 108, 111, 112, 145
primary outputs, 4, 9, 100
pseudo inputs, 6, 23, 38, 39, 53, 56, 61, 144
pseudo outputs, 6, 39
random access memory (RAM), 47, 75, 104, 155
real delay model, 22, 124, 135
reduced circuit, 89, 92, 93, 97, 98, 100, 101
register-transfer level (RTL), 2, 10, 20, 27, 51, 115, 117, 118, 133, 135, 139, 141
resource allocation graph, 12
scan cells, 6, 22, 27, 36, 38, 40, 51–57, 60, 61, 66, 69, 70, 72–77, 79–82, 84, 85, 87, 96, 103, 104, 110, 112
    compatible scan cells, 90
    independent scan cells, 104
    scan cell order, 54, 59, 61, 67–69, 72–74, 80, 81
    scan cell ordering, 38, 52–54, 60–62, 67, 70, 72, 75, 76, 78–82, 85
scan chain, 7, 9, 10, 23, 36, 38, 51, 68, 88, 89, 91, 92, 95–98, 100, 102–104, 106–108, 111, 112, 145
    extra scan chain (ESC), 95, 101, 102, 104
    multiple scan chains (MSC), 32, 36, 87–89, 92, 95–98, 102, 104, 106, 109–112
scan cycle, 6, 7, 10, 36, 39, 40, 56, 58, 59, 72, 88, 89, 96
shift register (SR), 6, 8–10, 40, 51
shifting power (SP), 38, 119, 134, 135
signature analyser (SA), 7, 8, 11, 74
signature analysis, 8, 10, 129
simulated annealing, 69, 72, 73, 78, 81, 82
spurious transitions, 52, 55–60, 62, 66, 69, 75, 88–95, 97, 100, 102, 111, 117, 118, 131–133
structural domain, 2
tabu search, 129
test application power (TAP), 134, 135
test application time (TAT), 10, 12, 17–21, 27, 28, 33, 34, 39, 40, 46, 51, 75, 78, 80, 82, 85, 87, 89, 112, 115, 124, 127–131, 133–139, 141, 143, 147, 155–158
test compatibility graph (TCG), 144, 151, 152
test controller, 32
    BIST controller, 115, 127, 135
    test control, 31, 32
test efficiency, 5, 6, 8, 9, 19, 20, 27, 41, 52, 56, 57, 85, 87, 89, 90, 112, 117–119
test pattern generator (TPG), 7–9, 11, 129
test scheduling, 20, 23, 24, 31, 33, 46, 49, 115–117, 119, 127, 130–134, 136, 139, 141, 142, 144, 148–150, 155, 158
test set dependent, 19, 31, 34, 39, 49, 89, 112
test set independent, 20, 31, 32, 87, 89, 105, 111, 112
test synthesis, 20, 115–117, 119, 124, 127, 134, 136, 139
    test hardware allocation, 12
    test register allocation, 133, 134
    test resource allocation, 33, 117, 127
test vector (test pattern), 4–12, 21, 23, 24, 26, 27, 31–34, 37–41, 49, 52–61, 65–70, 72–80, 87, 89–96, 100–106, 108, 111, 112, 117, 119, 127, 129, 130, 132, 144–146, 148, 149, 156, 158
    essential test vector, 95
    extra test vector, 88, 91, 92, 94–98, 100, 102, 104
    test vector order, 54, 60, 61, 69, 73, 74, 77, 148
    test vector ordering, 34, 40, 52–54, 60, 61, 69, 70, 72, 75–78, 84, 87, 144
testable design space, 19, 20, 117, 129–131, 138
testable design space exploration, 17, 18, 115, 116, 127–130, 134, 135, 139
time and area test synthesis and scheduling (TATSS), 124, 134–139
useless power dissipation, 19, 117–119, 124, 127–139
Verilog, 2
very large scale integration (VLSI), 1–4, 6, 8–10, 19–21, 25–27, 31, 51, 87, 89, 112, 115, 139, 161, 162
VHDL, 2, 17, 135
volume of test data (VTD), 19, 20, 39, 40, 85, 87, 104, 107, 108, 110–112, 119, 127
zero delay model, 22, 104