Test and Diagnosis for Small-Delay Defects
Mohammad Tehranipoor • Ke Peng • Krishnendu Chakrabarty
Mohammad Tehranipoor
ECE Department, University of Connecticut
371 Fairfield Way, Unit 2157, Storrs, CT 06269, USA
[email protected]

Ke Peng
Microcontroller Solutions Group, Freescale Semiconductor
6501 William Cannon W Dr, OE320, Austin, TX 78735, USA
[email protected]

Krishnendu Chakrabarty
ECE Department, Duke University
130 Hudson Hall, Durham, NC 27708, USA
[email protected]

ISBN 978-1-4419-8296-4
e-ISBN 978-1-4419-8297-1
DOI 10.1007/978-1-4419-8297-1
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011935996
© Springer Science+Business Media, LLC 2011
To my parents
MT

To my parents, Maojun and Cuicui: thanks for your encouragement and support
KP

To my students over the years and to the tradition of excellence that they have established
KC
Preface
Electronic devices play a very important part in modern human life. As market demand increases and manufacturing technologies advance, more and more transistors are being packed into chips, operating at ever-increasing frequencies, for higher functional density. Nanometer-scale technology poses new challenges for both design and test engineers: technology scaling provides us not only with higher integration and enhanced performance, but also with more manufacturing-related defects. The shrinking of technology has introduced more variation into designs and has made design features more probabilistic. Furthermore, shrinking feature sizes, along with the long interconnects required by very large scale designs, have also increased on-chip coupling capacitances. Scaled power supply voltages can be applied to lower power consumption in the circuit; however, reducing the power supply voltage also compromises noise immunity, impacting the signal integrity of the design. At the same time, the market demands ever-higher test quality and lower failure rates, measured in defects per million (DPM). As a result, testing has become one of the most challenging tasks for nanometer-technology designs, and the test cost per transistor is increasing as we try to meet these challenges while keeping product quality high.

Due to the lack of high-quality functional tests, several fault models and testing methodologies have been developed for performing structural tests. At-speed delay testing using the transition delay fault (TDF) model has been used for decades to detect timing-related defects and ensure high test quality and in-field reliability. The small-delay defect (SDD) is one such type of timing defect; it can be introduced by imperfect manufacturing processes as well as by pattern-induced on-chip noise, e.g., power supply noise (PSN) and crosstalk, and it causes chip failures by adding extra delay to the design. As technology scales to 45 nm and below, testing for SDDs is necessary to ensure the quality and reliability of high-performance integrated circuits. Traditional at-speed test methods cannot ensure high test coverage for SDDs with a reasonable pattern count. In response to semiconductor industry demand for high-quality patterns, commercial timing-aware automatic test pattern generation (ATPG) tools have been developed for SDD detection. However, these ATPG tools suffer from large pattern counts and long CPU runtimes. Furthermore, none of these methodologies takes into account the process parameters, variations, and on-chip noise (e.g., process variations, PSN, and crosstalk) that are potential sources of SDDs. It is vital to diagnose these SDD failures and identify the major causes of chip failures.

This book presents new techniques and methodologies to improve overall SDD detection with very small pattern sets. Based on implementations of these procedures on both academic and industrial circuits, these methods can yield pattern counts as low as a traditional 1-detect pattern set, with long-path sensitization and SDD detection similar to, or even better than, n-detect or timing-aware pattern sets. Important design parameters and pattern-induced noise sources such as process variations, PSN, and crosstalk are taken into account in the proposed methodologies. A diagnostic flow is also presented to identify whether a failure is caused by PSN, crosstalk, or a combination of the two.

Despite increasing concerns regarding SDDs in integrated circuits fabricated using the latest technologies, the area lacks a comprehensive book that introduces effective and scalable methodologies for screening and diagnosing SDDs that can be used by researchers and students in academia as well as by design and design-for-test (DFT) engineers in industry. This book will greatly benefit readers interested in SDD detection and diagnosis. Instructors and students can use it as a textbook or reference for a testing course, and DFT engineers can use it to increase the efficiency of their SDD test patterns and reduce testing costs.

Storrs, CT, USA    Mohammad Tehranipoor
Austin, TX, USA    Ke Peng
Durham, NC, USA    Krishnendu Chakrabarty
Acknowledgements
The authors would like to thank Nicholas Tuzzio, a PhD student in the ECE Department at the University of Connecticut, for his help with reviewing the chapters; Dr. Junxia Ma of LSI Corporation for her contribution to Chap. 10; Dr. Nisar Ahmed of Freescale Semiconductor for his contribution to Chap. 11; and Dr. Jeremy Lee of Texas Instruments for his contribution to Chap. 9. Thanks to Dr. Mahmut Yilmaz of Advanced Micro Devices Inc. for his contribution to developing the pattern grading techniques and the path-tracing tool used in this book, as well as for his valuable discussions. Thanks to the National Science Foundation (NSF) and the Semiconductor Research Corporation (SRC) for supporting the projects related to the topics covered in this book. The authors are also grateful to LeRoy Winemberg, Geoff Shofner, and Drew Payne of Freescale Semiconductor for their support and valuable discussions. Thanks to Wu-Tung Cheng, Yu Huang, Ruifeng Guo, and Pinki Mallick of Mentor Graphics for their support and help in the development of the diagnosis flow in this book. Thanks to Fang Bao of the ECE Department at the University of Connecticut for her contribution to the pattern grading techniques. Finally, special thanks to Charles Glaser of Springer for making this book possible.
Contents

1 Introduction to VLSI Testing
   1.1 Background on Defects and Fault Models
      1.1.1 Defects, Errors, and Faults
      1.1.2 Fault Models
   1.2 Fault Simulation
      1.2.1 Serial Fault Simulation
      1.2.2 Parallel Fault Simulation
      1.2.3 Deductive Fault Simulation
      1.2.4 Concurrent Fault Simulation
      1.2.5 Other Fault Simulation Algorithms
   1.3 Background on Testing
      1.3.1 Test Principle
      1.3.2 Types of Testing
   1.4 Automatic Test Pattern Generation
   1.5 Design for Testability
      1.5.1 Scan-Based Design
      1.5.2 Built-In Self-Test (BIST)
   1.6 Test Cost
   References

2 Delay Test and Small-Delay Defects
   2.1 Delay Test Challenges
      2.1.1 Process Variation Effects
      2.1.2 Crosstalk Effects
      2.1.3 Power Supply Noise Effects
   2.2 Test for Transition-Delay Faults
   2.3 Test for Path-Delay Faults
   2.4 Small-Delay Defects (SDDs)
   2.5 Prior Work on SDD Test
      2.5.1 Limitations of Commercial ATPG Tools
      2.5.2 Newly Proposed Methodologies for SDD Test
   2.6 Book Outline
   References

3 Long Path-Based Hybrid Method
   3.1 Introduction
   3.2 Pattern Grading and Selection
      3.2.1 Path Classification and Pattern Grading
      3.2.2 Pattern Selection
   3.3 Experimental Results on Long Path-Based Hybrid Method
      3.3.1 Experimental Setup
      3.3.2 Pattern Selection Efficiency Analysis
      3.3.3 Pattern Set Comparison
      3.3.4 CPU Runtime Analysis
      3.3.5 Long Path Threshold Analysis
   3.4 Critical Fault-Based Hybrid Method
      3.4.1 Identification of Critical Faults
      3.4.2 Critical Fault-Based Pattern Selection
   3.5 Experimental Results on Critical Fault-Based Hybrid Method
      3.5.1 Experimental Benchmarks
      3.5.2 Effectiveness in Critical Path Sensitization
      3.5.3 CPU Runtime Comparison on TF- and CF-Based Methods
      3.5.4 CF-Based Pattern Generation vs. Timing-Aware ATPG
      3.5.5 Multiple Detection Analysis on Critical Faults
      3.5.6 Trade-Off Analysis
   3.6 Summary
   References

4 Process Variations- and Crosstalk-Aware Pattern Selection
   4.1 Introduction
      4.1.1 Prior Work on PV and Crosstalk
      4.1.2 Chapter Contents and Organization
   4.2 Analyzing Variation-Induced SDDs
      4.2.1 Impact of Process Variations on Path Delay
      4.2.2 Impact of Crosstalk on Path Delay
   4.3 PV- and Crosstalk-Aware Pattern Selection
      4.3.1 Path PDF Analysis
      4.3.2 Pattern Selection
   4.4 Experimental Results
      4.4.1 Validation of Process Variations Calculation
      4.4.2 Validation of Crosstalk Calculation
      4.4.3 Pattern Selection Efficiency Analysis
      4.4.4 Pattern Set Comparison
      4.4.5 Long Path Threshold Analysis
      4.4.6 CPU Runtime Analysis
   4.5 Summary
   References

5 Power Supply Noise- and Crosstalk-Aware Hybrid Method
   5.1 Introduction
      5.1.1 Prior Work on PSN and Crosstalk
      5.1.2 Chapter Contents and Organization
   5.2 Analyzing Noise-Induced SDDs
      5.2.1 Impact of PSN on Circuit Performance
      5.2.2 Impact of Crosstalk on Circuit Performance
   5.3 Pattern Grading and Selection
      5.3.1 Sensitized Path Identification and Classification
      5.3.2 Pattern Selection
   5.4 Experimental Results
      5.4.1 Experimental Setup
      5.4.2 Validation of PSN Calculation
      5.4.3 Validation of Crosstalk Calculation
      5.4.4 Pattern Selection Efficiency Analysis
      5.4.5 Pattern Set Comparison
      5.4.6 The Impact of PSN and Crosstalk
      5.4.7 CPU Runtime Analysis
   5.5 Summary
   References

6 SDD-Based Hybrid Method
   6.1 Introduction
   6.2 Techniques for Reducing Runtime and Memory
      6.2.1 Critical Faults Identification
      6.2.2 Parallel Fault Simulation
      6.2.3 Fault Merging
   6.3 Pattern Evaluation and Selection
      6.3.1 Pattern Evaluation
      6.3.2 Pattern Selection
   6.4 Experimental Results
      6.4.1 Pattern Set Analysis
      6.4.2 Comparison with LP-Based Method
      6.4.3 Multiple Detection Analysis for Critical Faults
      6.4.4 Experiments on Industry Circuits
   6.5 Summary
   References

7 Maximizing Crosstalk Effect on Critical Paths
   7.1 Introduction
      7.1.1 Related Prior Work
      7.1.2 Chapter Contents and Organization
   7.2 Preliminary Crosstalk Analysis: Proximity and Transition Direction
      7.2.1 Victim/Aggressor Proximity
      7.2.2 Victim/Aggressor Transition Direction
   7.3 Inducing Maximum Coupling Effects on Delay-Sensitive Paths
      7.3.1 Identifying Nearby Nets
      7.3.2 Xtalk-TDF ATPG
      7.3.3 Virtual Test Point Insertion
      7.3.4 Weighted-Xtalk-TDF ATPG
   7.4 Weighted-Xtalk-TDF ATPG Framework
   7.5 Experimental Results and Analysis
      7.5.1 Framework Run-Time
      7.5.2 Targeting Multiple Delay-Sensitive Paths
   7.6 Summary
   References

8 Maximizing Power Supply Noise on Critical Paths
   8.1 Introduction
   8.2 Supply Voltage Noise Induced Delay Analysis
      8.2.1 Localized Voltage Drop Analysis
      8.2.2 Voltage Drop Effects on Path Delay
   8.3 Pattern Generation
      8.3.1 Cell Identification
      8.3.2 Virtual Test Point Insertion
      8.3.3 PDF-Constrained TDF ATPG
   8.4 Experimental Results
   8.5 Summary
   References

9 Faster-Than-At-Speed Test
   9.1 Introduction
      9.1.1 Chapter Contents and Organization
   9.2 Design Implementation
   9.3 Test Pattern Delay Analysis
      9.3.1 Dynamic IR-Drop Analysis at Functional Speed
      9.3.2 Dynamic IR-Drop Analysis at Faster-Than-At-Speed Test
   9.4 IR-Drop Aware Faster-Than-At-Speed Test Technique
      9.4.1 Pattern Grouping
      9.4.2 Estimation of Performance Degradation (ΔTGi)
   9.5 Experimental Results
   9.6 Summary
   References

10 Introduction to Diagnosis
   10.1 Introduction
   10.2 Diagnosis of Combinational Logic
      10.2.1 Static Fault Diagnosis
      10.2.2 Dynamic Fault Diagnosis
      10.2.3 Inject-and-Evaluate Technique
   10.3 Diagnosis of Scan Chain
      10.3.1 Preliminary Scan Chain Diagnosis
      10.3.2 Hardware-Assisted Diagnosis
      10.3.3 Inject-and-Evaluate Scan Chain Diagnosis
   10.4 Chip-Level Diagnosis
   References

11 Diagnosing Noise-Induced SDDs by Using Dynamic SDF
   11.1 Introduction
      11.1.1 Techniques for Timing Analysis
      11.1.2 Prior Work on PSN and Crosstalk
      11.1.3 Chapter Contents and Organization
   11.2 IR-Drop Analysis
   11.3 IR2Delay Database
      11.3.1 Transition Analysis
      11.3.2 Driving Strength Analysis
      11.3.3 Power Voltage-Delay Map
   11.4 Mixed-Signal Simulation-Based Validation
      11.4.1 Mixed-Signal Simulation
      11.4.2 Simulation Results Extraction
   11.5 Experimental Results on IR2Delay Database Validation
      11.5.1 Experimental Setup
      11.5.2 Comparison with Full-Circuit SPICE Simulation
      11.5.3 Complexity Analysis
   11.6 Diagnosis for Failure Paths
   11.7 Experimental Results on Diagnosing IR-Drop SDDs
      11.7.1 Diagnosis Flow and Experimental Setup
      11.7.2 Circuit Performance in Presence of IR Drop
      11.7.3 Failures from IR Drop
      11.7.4 Timing-Aware IR-Drop Diagnosis
   11.8 Summary
   References
Acronyms

ALAPTF   As late as possible transition
ASIC     Application specific integrated circuit
ATE      Automatic test equipment
ATPG     Automatic test pattern generation
BIST     Built-in self-test
CF       Critical fault
CLT      Central limit theorem
CMOS     Complementary metal oxide semiconductor
CPU      Central processing unit
CUD      Circuit under diagnosis
CUT      Circuit under test
DC       Design Compiler
DDP      Delay defect probability
DDPM     Delay defect probability matrix
DEF      Design exchange format
DFF      Data flip-flop
DFM      Design-for-manufacturability
DFT      Design for test
DPM      Defective parts per million
DS       Detected by simulation
DSPF     Detailed standard parasitic format
DTC      Delay test coverage
DTPG     Diagnostic test pattern generation
DUT      Design under test
EDA      Electronic design automation
EMD      Embedded multi-detection
FCFI     First come first impact
GB       Gigabyte
HB       Hybrid method
IC       Integrated circuit
IEEE     Institute of electrical and electronics engineers
IP       Intermediate path
IWLS     International workshop on logic and synthesis
LOC      Launch-on-capture
LOS      Launch-on-shift
LP       Long path
LPthr    Long path threshold
MB       Megabyte
NLDM     Non-linear delay model
PDF      Probability density function
PDN      Power distribution network
PI       Primary input
PLL      Phase locked loop
PO       Primary output
PPSFP    Parallel-pattern single-fault propagation
PSN      Power supply noise
PV       Process variation
PVT      Process-voltage-temperature
PathDF   Path delay fault
RTL      Register-transfer level
SCAP     Switching cycle average power
SDD      Small-delay defect
SDF      Standard delay format
SDQL     Statistical delay quality level
SDQM     Statistical delay quality model
SE       Scan enable
SI       Signal integrity
SLAT     Single location at-a-time
SLthr    Slack threshold
SOC      System on chip
SP       Short path
SPEF     Standard parasitic exchange format
SRC      Semiconductor Research Corporation
SSTA     Statistical static-timing analysis
STA      Static-timing analysis
STAFAN   Statistical fault analysis
TA       Timing-aware
TDF      Transition-delay fault
TF       Total fault
TPG      Test pattern generator
TPI      Test point insertion
VCD      Value change dump
VDSM     Very deep sub-micron
VLSI     Very large scale integration
VLV      Very-low-voltage
Chapter 1
Introduction to VLSI Testing
Due to the complex mechanical and chemical steps involved in today's manufacturing processes, inaccuracies and imperfections can be introduced into fabricated chips, or integrated circuits (ICs), so they may not perform exactly as the design specification intended. This can result in chip failures and profit loss. Testing is therefore necessary to ensure that ICs operate correctly before being delivered to customers. Testing is the process of identifying ICs containing imperfections or manufacturing defects that may cause failures, by applying test patterns to circuits and analyzing their responses. This book focuses on manufacturing test and presents techniques and solutions for increasing test quality for timing-related defect detection in integrated circuits.
1.1 Background on Defects and Fault Models

1.1.1 Defects, Errors, and Faults

A defect can be described as an unintended difference between the fabricated circuit and its intended design [19]. These defects, including gate-oxide shorts, missing contact windows, oxide breakdown, surface impurities, metal opens or bridges, contact degradation, etc., can be introduced at any step of the manufacturing process. Under some specific conditions, the defective device may produce an incorrect output, which is called an error or a failure. In other words, an error or failure is an "effect" of some "defect" in the IC.

Different types of defects introduce different types of failures in the circuit. The first type of failure is usually called a "logic" or "functional" failure, which causes impaired functionality, such as failures caused by complete opens or bridges. There are also some defects that may only result in parametric failures rather than logical or functional failures. These parametric failures are generally caused by process variations, weak/resistive opens or bridges, and/or on-chip noise. They may not necessarily produce an incorrect logical operation, but they can degrade the circuit from its expected specifications, resulting in performance degradation, unexpectedly high current leakage, etc.

A fault is a representation of a defect at an abstracted functional level. Figure 1.1 shows an example of the difference between a defect, a fault, and an error. The example circuit has two inputs, a and b, and one output c for the two-input OR gate. The input signals at pins a and b are both 0. If the circuit is fault-free, the output response would be 0. Assume that, due to a manufacturing defect, there is a complete open between pin a and the input pin of the OR gate, and that the floating gate input is shorted to the power supply VDD. The actual output response is therefore 1 instead of the correct output c = a + b = 0. This circuit has:
• Defect: a short to the power supply VDD.
• Error: with inputs a = 0 and b = 0, the output response becomes 1, while the correct response should be 0.
• Fault: the OR gate input pin "sticks" at logic 1.

Fig. 1.1 An example defective circuit, with an output error
1.1.2 Fault Models

As mentioned in Sect. 1.1.1, a fault model is an abstraction of a defect or defects. The advantages of modeling defects as faults are:
1. Analyzing the problem is simplified: there is no need to target and describe complex physical and parametric effects;
2. A single fault model may cover many different kinds of defects;
3. Fault models allow us to develop algorithms and automatically generate structural test patterns to detect these defects at earlier stages of the design cycle and at a lower resource cost;
4. Fault models enable test pattern grading in terms of fault coverage metrics.

The fault models commonly used in academia and industry are stuck-at faults, bridging faults, transition-delay faults (TDFs), and path-delay faults (PathDFs).

1.1.2.1 Stuck-At Fault Model

The single stuck-at fault model is one of the most commonly used fault models in practice. There are two types of stuck-at faults: stuck-at-1 (s-a-1), for which the faulty net is permanently set to 1, and stuck-at-0 (s-a-0), for which the faulty net is permanently set to 0. The circuit shown in Fig. 1.1 is an example of a stuck-at-1 fault. The assumptions for the single stuck-at fault model are:
1. The fault only affects the interconnections between gates;
2. Only one line in the circuit is faulty;
3. The fault is permanently set to either 1 or 0;
4. The fault can be at the input or output of a gate;
5. The fault does not affect the functionality of gates in the circuit.

Due to its simplicity, the single stuck-at fault model offers many advantages in fault detection, which has made it the most commonly used fault model in industry. Some of these advantages are:
1. The model covers a large portion of manufacturing defects;
2. It is comparatively easy to develop algorithms to automatically generate test patterns for stuck-at faults, and current algorithms for generating stuck-at patterns are well developed and very efficient;
3. It results in a reasonable fault count (at most 2n single stuck-at faults for an n-net circuit), which can be further reduced by fault collapsing techniques [19];
4. Some other fault models can be mapped into a series of stuck-at faults.

Empirically, silicon data demonstrates that stuck-at test patterns are even capable of detecting some unmodeled defects. However, the stuck-at fault model does not cover all defect types in the circuit (resistive defects, parametric defects, etc.). As a result, the stuck-at fault model alone may not be able to meet the highest defect coverage requirements, and additional fault models are needed.
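As an illustration (ours, not code from the book), the following minimal Python sketch injects the stuck-at-1 fault of Fig. 1.1 into a two-input OR gate and shows that the input vector (a, b) = (0, 0) detects it; the function name and structure are purely illustrative.

```python
# Minimal sketch of single stuck-at fault injection for the OR gate of
# Fig. 1.1; `stuck_at` forces the gate input pin driven by `a` to a
# fixed logic value, independent of the applied signal.
def or_gate_output(a, b, stuck_at=None):
    pin_a = a if stuck_at is None else stuck_at
    return pin_a | b

# Fault-free circuit: inputs a = b = 0 give the correct response c = 0.
assert or_gate_output(0, 0) == 0
# Faulty circuit: with the pin stuck at 1, the same vector produces 1,
# so an error is observed and the s-a-1 fault is detected.
assert or_gate_output(0, 0, stuck_at=1) == 1
```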
1.1.2.2 Bridging Fault Model

A bridging fault represents a short between a group of signals, and is usually modeled at the gate or transistor level. Bridging faults are commonly found between lines that are physically close to each other in the layout of a circuit. A bridging fault can be 1-dominant (also called OR bridging), 0-dominant (also called AND bridging), or indeterminate bridging, depending upon the technology used for the circuit and the characteristics of the short on site. Bridging faults can also be classified as non-feedback and feedback bridging faults. Non-feedback bridging faults are combinational, and most of them can be detected by stuck-at test patterns. Feedback bridging faults produce memory states in the otherwise combinational logic. The bridging fault is a defect-oriented fault model, corresponding more closely to real defects [25, 36]; however, layout information is required to model these kinds of defects accurately.

Fig. 1.2 Potential bridging fault examples

Figure 1.2 shows a few examples of potential bridging fault locations in the layout. It is obvious that with (1) smaller distances between metal lines as a result of shrinking technology and (2) long parallel lines, the probability of a bridging fault occurring increases. Conversely, it is possible to decrease bridging fault occurrence rates by increasing line distances and avoiding long parallel lines; however, this is always constrained by performance requirements and design costs. Theoretically, a bridging fault can occur between any pair of physically close metal wires. However, the bridging probability between lines on two different metal layers is low, since oxide layers in between isolate them. Therefore, in most cases a bridging fault refers to a short between lines in the same metal layer.
1.1.2.3 Transition-Delay Fault Model

As technology scales down, some defects tend to alter the performance of the circuit instead of changing its logical functionality; these cannot be detected by stuck-at fault patterns. The delay fault model, which assumes that the faulty circuit element makes signal propagation slower, was developed to target these kinds of defects. The delay fault model also covers many physical defects in real silicon, including the effects of process variations, temperature, on-chip power supply noise, crosstalk, resistive opens, resistive shorts, etc. [2, 9].

The transition-delay fault (TDF) is one such delay fault model, and is widely used in industry. A TDF on a line makes the signal change on that line slower, thereby degrading the circuit performance and causing signal propagation errors. TDFs are mainly used to model defects on gates and interconnects whose delay is large enough to cause a signal propagation error (also called a gross-delay defect) on a path running through the fault site to the test observation points, i.e., flip-flops (FFs) or primary outputs (POs). For each fault site, there are two possible faults: slow-to-rise and slow-to-fall.

Fig. 1.3 An example of the transition delay fault: (a) sample circuit; (b) timing diagram of the fault-free circuit; (c) timing diagram of the faulty circuit

Figure 1.3 shows a TDF's impact on signal propagation. In the example, the input flip-flop (shown as "Input DFF") launches a rising transition on its output pin Q1 at the rising edge of CLK1. The rising transition at Q1 is then propagated to the output flip-flop (shown as "Output DFF"). If the circuit is well designed and fault-free, the rising transition will arrive at D2 before the capture clock edge on the output DFF ("CLK2"), and the timing diagram will be similar to Fig. 1.3b. However, if there is a TDF (or a gross-delay defect) in the circuit, as shown in Fig. 1.3a, the signal propagation
will be slowed down and the transition will exceed the specified clock period, as shown in Fig. 1.3c. As a result, the output DFF is not able to capture the correct value at D2 at the rising edge of CLK2. In other words, the TDF causes the circuit to fail.

The advantages of the TDF model are:
1. The number of faults is bounded by twice the number of nets in the circuit;
2. It is easy to modify a stuck-at fault test pattern generator to produce patterns for TDFs [1];
3. A circuit that has high stuck-at fault testability usually has high TDF testability as well;
4. The TDF model covers many physical defects in real silicon, and can achieve a defect coverage that the stuck-at fault model alone cannot.
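The capture condition of Fig. 1.3 reduces to a simple timing comparison. The sketch below is our simplification (a single sensitized path with a lumped defect delay; all numbers are hypothetical); it also previews why small-delay defects, the subject of this book, escape detection on paths with slack.

```python
# Minimal sketch of the TDF detection condition: the transition launched
# at CLK1 is captured incorrectly at CLK2 only if the path delay plus
# the extra defect delay exceeds the clock period.
def tdf_detected(path_delay_ns, defect_delay_ns, clock_period_ns):
    return path_delay_ns + defect_delay_ns > clock_period_ns

# On a 9.5 ns path under a 10 ns clock, a 1 ns gross-delay defect misses
# the capture edge (detected), but a 0.3 ns small-delay defect still
# meets timing on this path and escapes.
assert tdf_detected(9.5, 1.0, 10.0) is True
assert tdf_detected(9.5, 0.3, 10.0) is False
```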
1.1.2.4 Path-Delay Fault Model

The path-delay fault (PathDF) model is another widely used delay fault model, first proposed in [17]. It assumes that there is a cumulative delay defect along a combinational path which causes the path delay to exceed some specified duration. This combinational path can begin at a primary input or a clocked flip-flop and can end at a primary output or a clocked flip-flop, with a chain of combinational gates between the start and end points. The timing duration can be a specified clock period or the vector period. Similar to the slow-to-rise and slow-to-fall TDFs, for each combinational path there are two possible path-delay faults, corresponding to the rising and falling transitions at the input of the path. As a result, the number of possible path-delay faults is twice the number of physical combinational paths in the circuit; note that the total number of paths can be far larger than the total number of nets.

The path-delay fault model addresses cumulative defects that affect an entire path, which makes it superior to the TDF model in its modeling capacity. However, it is almost impossible to enumerate all possible paths in large industrial designs, because the number of paths grows exponentially with circuit size. As a result, the path-delay fault model is only applied to a small portion of selected critical paths, to generate path-delay test patterns and to verify performance.
1.2 Fault Simulation

In the electronic design and test world, there are two types of simulation, serving two distinct purposes. The first type uses a true-value simulator to verify the correctness of the design. Its advantages are:
1. It can simulate the details of circuit behavior, including logic, timing, and analog behaviors;
2. It can be performed at different circuit levels to facilitate and speed up the simulation.
However, the weakness of this kind of simulation is the difficulty of generating test patterns that can detect all possible defects in the design.

The second type of simulation is called fault simulation, which verifies test patterns using a fault simulator; this section focuses only on fault simulation. Fault simulation is generally performed after verifying the design and generating test patterns, since it needs the verified design netlist and test patterns as inputs. The fault simulation engine can help to:
1. Generate test patterns that meet a given fault coverage for a given fault model, with the help of other programs, i.e., the test pattern generator;
2. Determine the fault coverage of a given set of test patterns for a given fault model or fault list;
3. Evaluate test patterns using the fault coverage metric.
In fact, the pattern grading and selection procedures in this book are based on fault simulation. Several algorithms have been developed for fault simulation; some of them are briefly described below.
1.2.1 Serial Fault Simulation

Serial fault simulation is the simplest fault simulation algorithm. At the very beginning of serial fault simulation, the fault-free circuit is simulated and the responses are stored in a file as the reference. Next, the serial fault simulator repeatedly uses the true-value simulator to simulate the faulty circuits, comparing the results with the responses of the fault-free circuit. Each faulty circuit is derived from the fault-free circuit according to the target fault specification. Simulation of a faulty circuit stops as soon as the comparison indicates that the target fault has been detected. Figure 1.4 shows an example of serial fault simulation; note that each circuit (fault-free or faulty) is simulated one at a time.

Fig. 1.4 An example of serial fault simulation

The advantage of serial fault simulation is that it can easily simulate any type of fault or fault condition, provided the fault and fault condition can be introduced in the circuit description; this includes stuck-at faults, bridging faults, delay faults, and even analog faults. Therefore, many analog circuit faults can be simulated by the serial fault simulation method. The disadvantage is its runtime requirement. As can be seen in Fig. 1.4, for a circuit with n faults, n + 1 simulations are needed for each test pattern, and the total runtime can be almost n + 1 times that of true-value simulation. Therefore, more intelligent algorithms with shorter runtimes are needed to reduce the effort of fault simulation.
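The following minimal sketch (ours, not code from the book) runs the serial algorithm on the small three-gate circuit used in Figs. 1.5-1.7 (d = NOT a, e = d AND b, f = e OR c): one full true-value simulation per fault, each compared against the stored fault-free response.

```python
# Minimal sketch of serial fault simulation: re-simulate the whole
# circuit once per target fault and compare against the good response.
def simulate(a, b, c, fault=None):
    """True-value simulation with an optional (node, stuck_value) fault."""
    fix = lambda node, value: fault[1] if fault and fault[0] == node else value
    a, b, c = fix("a", a), fix("b", b), fix("c", c)
    d = fix("d", 1 - a)     # inverter
    e = fix("e", d & b)     # AND gate
    return fix("f", e | c)  # OR gate driving primary output f

pattern = (1, 1, 0)                       # input vector {abc} = {110}
good = simulate(*pattern)                 # fault-free response: 0
faults = [("d", 1), ("b", 0), ("c", 1)]   # the three faults of Fig. 1.5
detected = [f for f in faults if simulate(*pattern, fault=f) != good]
print(detected)                           # [('d', 1), ('c', 1)]
```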
1.2.2 Parallel Fault Simulation

The parallel fault simulation algorithm takes advantage of the bit-parallelism of logic operations in a digital computer to simulate faulty circuits in parallel with the fault-free circuit. For an n-bit machine (e.g., 32-bit or 64-bit), n simulations (1 for the fault-free circuit and n − 1 for faulty circuits) with different signal values can be run simultaneously. This method is straightforward, like serial fault simulation, but is more memory efficient. Obviously, the number of faulty circuits or faults that can be processed in parallel depends on the word length of the machine.

Fig. 1.5 An example of parallel fault simulation

Figure 1.5 shows an example of parallel fault simulation. Assume that the computer has four-bit words, and the fault simulation is run on three faults: d stuck-at-1 (s-a-1), b stuck-at-0 (s-a-0), and c stuck-at-1 (s-a-1). The input vector is {abc} = {110}. In order to simulate the three faulty circuits and the fault-free circuit in parallel, the signal on each line is expressed as one word. The first (left-most) bit is for the d s-a-1 faulty circuit, the second bit is for the b s-a-0 faulty circuit, the third bit is for the c s-a-1 faulty circuit, and the last bit is for the fault-free circuit. The output of the inverter should be 0 if the circuit is fault-free. When the circuit has the d s-a-1 fault, the first bit at node d is forced to 1; the b s-a-0 and c s-a-1 faults do not affect node d. Therefore, the word at node d is d = {1000}. Similarly, the word at the other input of the AND gate is {1011}. The output of the AND gate (node e) is obtained by a bit-by-bit AND of its inputs, so e = {1000}. In the same way, the word at node f is obtained: f = {1010}. The first bit (for fault d s-a-1) and the third bit (for fault c s-a-1) of node f differ from the last bit (the fault-free circuit), so these two faults are detected. The fault b s-a-0 is not detected, since the second bit of node f matches the fault-free bit.

Since the parallel fault simulator computes the signal changes of several circuits (according to the word length of the host machine) together, it may not be able to model rise and fall transition delays accurately. In general, the parallel fault simulator models the circuit with zero delay or unit delay, which is not acceptable for accurate fault slack calculation. The parallel fault simulation algorithm was proposed in [29] and was widely used in the 1960s and 1970s.
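The word-level evaluation of Fig. 1.5 maps directly onto bitwise machine operations, as this hedged sketch shows (our bit ordering follows the figure: the left-most bit is the d s-a-1 copy and the right-most bit is the fault-free copy).

```python
# Minimal sketch of parallel fault simulation with 4-bit words: one bit
# per circuit copy (d s-a-1, b s-a-0, c s-a-1, fault-free), evaluated
# simultaneously by bitwise gate operations.
WORD = 0b1111
word = lambda bits: int(bits, 2)          # "1000" -> d s-a-1 copy only

# Input vector {abc} = {110}, with stuck values forced per faulty copy:
a = word("1111")                          # a = 1 in every copy
b = word("1011")                          # b forced to 0 in its s-a-0 copy
c = word("0010")                          # c forced to 1 in its s-a-1 copy

d = (~a & WORD) | word("1000")            # inverter; d forced to 1 in d s-a-1 copy
e = d & b                                 # AND gate on all copies at once
f = e | c                                 # OR gate on all copies at once

fault_free = -(f & 1) & WORD              # replicate fault-free bit to all positions
detected = f ^ fault_free                 # bits differing from the good circuit
print(f"{f:04b} {detected:04b}")          # 1010 1010 -> d s-a-1 and c s-a-1 detected
```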
1.2.3 Deductive Fault Simulation

The deductive fault simulation method simulates only the fault-free circuit; for the faulty circuits, all signal values are deduced from the fault-free values and the circuit structure. Since the circuit structure is the same for all faulty circuits, the deductive fault simulator can process all of the faults in a single pass of true-value simulation [4]. During fault simulation, the fault list of each node contains the names of all the faults that can change the state of that node and make it differ from its fault-free value. The propagation of these fault lists is based on a set of operations determined by the corresponding gate types and values.

Fig. 1.6 An example of deductive fault simulation

Figure 1.6 presents an example of deductive fault simulation. The input vector of this circuit is {abc} = {110}. For simplicity, ak is used to represent a stuck-at-k fault at node a, where k = 0 or 1. Since a, b, and c are primary inputs, their fault lists contain only their own faults activated by the input vector: La = {a0}, Lb = {b0}, Lc = {c1}. The fault effect on node a can be propagated to node d through the inverter; therefore, a0 is included in the fault list of node d. In addition, this list contains the fault d s-a-1, since the current state of d is 0 in the fault-free circuit. Since b = 1, the path d → e is sensitized, and faults a0 and d1 can be propagated to node e. Since d = 0, the path b → e is not sensitized, and fault b0 therefore cannot be propagated to node e. In addition to these two faults, the fault list of node e also contains e1, since the current state of e is 0 in the fault-free circuit. Thus, Le = {a0, d1, e1}. Similarly, Lf is obtained as Lf = {a0, d1, e1, c1, f1}, and all faults in Lf are detected by this test vector.

As can be seen, the fault lists may grow dramatically in size, especially for large designs with long paths, which can incur extremely large memory usage. Furthermore, the fault lists must be recomputed for each new test pattern, which is computationally inefficient.
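The list propagation of Fig. 1.6 can be written out explicitly. The sketch below is ours and hard-codes the standard deductive rules for this one vector: through a gate with inputs at the controlling value, the lists of those inputs propagate minus the lists of the non-controlling inputs; with no input at the controlling value, all input lists propagate.

```python
# Minimal sketch of deductive fault simulation for Fig. 1.6 under
# {abc} = {110}; each L-set holds the faults that flip that node.
def own(node, good_value):
    """The node's own stuck-at fault activated by its fault-free value."""
    return {(node, 1 - good_value)}

a, b, c = 1, 1, 0                 # fault-free values give d = 0, e = 0, f = 0
La, Lb, Lc = own("a", a), own("b", b), own("c", c)

Ld = La | own("d", 0)             # inverter: the input list always propagates
# AND gate with d = 0 (controlling) and b = 1 (non-controlling):
Le = (Ld - Lb) | own("e", 0)      # controlling-input list minus Lb
# OR gate with e = 0 and c = 0 (no input at the controlling value 1):
Lf = Le | Lc | own("f", 0)        # all input lists propagate
print(sorted(Lf))  # [('a',0), ('c',1), ('d',1), ('e',1), ('f',1)], all detected at f
```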
1.2.4 Concurrent Fault Simulation

The concurrent fault simulation algorithm is one of the most general fault simulation methods. It proceeds in exactly the same manner as event-driven simulation [7], but extends the event-driven method to fault simulation in a highly efficient way; the details of the algorithm can be found in [7].

Fig. 1.7 An example of concurrent fault simulation

Figure 1.7 shows an example of concurrent fault simulation on a three-gate circuit with input vector 110, in which all single stuck-at faults are simulated concurrently. As in Sect. 1.2.3, ak represents a stuck-at-k fault at node a, where k = 0 or 1. The figure shows the steady state of each node after the input vector is applied, with the signal values at the inputs and output of each gate written inside the gate. To each good gate (drawn with solid lines), a number of bad (faulty) gates are attached (shaded and drawn with dashed lines), with the corresponding fault written above each bad gate. At least one value of each bad gate (an input or the output) differs from the good gate, the difference being caused by that fault. At the primary output f, if the output of a bad gate differs from that of the good gate, the corresponding fault is detected. Thus, faults a0, d1, e1, c1, and f1 are detected in this example.
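The per-gate bookkeeping of Fig. 1.7 is sketched below. For clarity, the diverged faulty copies are found by brute-force resimulation; a real concurrent simulator builds and updates these bad-gate lists incrementally, event by event, which is where its efficiency comes from.

```python
# Minimal sketch of the concurrent data structure: for each gate, keep
# only the faulty machines whose pin values diverge from the good gate.
def eval_nodes(a, b, c, fault=None):
    fix = lambda n, v: fault[1] if fault and fault[0] == n else v
    a, b, c = fix("a", a), fix("b", b), fix("c", c)
    d = fix("d", 1 - a); e = fix("e", d & b); f = fix("f", e | c)
    return {"a": a, "b": b, "c": c, "d": d, "e": e, "f": f}

gates = {"INV": ("a", "d"), "AND": ("d", "b", "e"), "OR": ("e", "c", "f")}
vector = (1, 1, 0)
faults = [(n, s) for n in "abcdef" for s in (0, 1)]
good = eval_nodes(*vector)

bad_gates = {g: {} for g in gates}        # gate -> {fault: diverged pin values}
for flt in faults:
    bad = eval_nodes(*vector, fault=flt)
    for g, pins in gates.items():
        if any(bad[p] != good[p] for p in pins):
            bad_gates[g][flt] = tuple(bad[p] for p in pins)

# A fault is detected if its bad copy of the output gate diverges at f:
detected = sorted(flt for flt, vals in bad_gates["OR"].items()
                  if vals[-1] != good["f"])
print(detected)   # [('a', 0), ('c', 1), ('d', 1), ('e', 1), ('f', 1)]
```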
1.2.5 Other Fault Simulation Algorithms

There are other fault simulation algorithms not described here, such as the D-calculus-based test-to-detect algorithm [14]. The differential fault simulation algorithm is an enhancement of the test-to-detect algorithm: it eliminates the use of D-calculus and the explicit true-value restoration that the test-to-detect algorithm requires, relying only on logic events; its details can be found in [40]. Other techniques worth mentioning include the critical path tracing technique [41, 42], the statistical fault analysis (STAFAN) method [43, 44], and the parallel-pattern single-fault propagation (PPSFP) method [10].
1.3 Background on Testing

1.3.1 Test Principle

The goal of manufacturing test is to detect any defects that occur in the fabricated circuits; ideally, faulty circuits and fault-free circuits can be differentiated after the manufacturing test. Figure 1.8 illustrates the basic principle of chip testing. Test vectors are applied to the inputs of the circuit-under-test (CUT), and the responses are collected and compared with the expected values. If the responses match, the circuit is considered "good"; otherwise, it is considered "bad." Automatic test equipment (ATE) is used to apply the tests. Test quality depends on the thoroughness of the test vectors, but test quality and test cost are interdependent: a large number of test vectors/patterns may yield good test quality, but it also increases test time and test cost, as well as time-to-market.

Fig. 1.8 The principle of testing chips: input vectors are applied to the circuit-under-test (CUT), and a comparator checks the output responses against the expected responses to produce the test result
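A minimal sketch of the decision rule in Fig. 1.8, with a hypothetical one-bit "chip" standing in for the CUT and a plain comparison standing in for the ATE comparator:

```python
# Minimal sketch of the pass/fail test principle: apply vectors and
# compare responses with the expected (fault-free) values.
def test_chip(cut, vectors, expected):
    return "good" if all(cut(v) == e for v, e in zip(vectors, expected)) else "bad"

good_chip = lambda v: 1 - v        # a correct 1-bit inverter
bad_chip = lambda v: 0             # a defective chip with output stuck at 0
vectors = [0, 1]
expected = [good_chip(v) for v in vectors]
print(test_chip(good_chip, vectors, expected))   # good
print(test_chip(bad_chip, vectors, expected))    # bad
```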
1.3.2 Types of Testing

A variety of test methods have been developed for manufacturing test, and they can be categorized in different ways. In general, each fabricated circuit should be subjected to parametric testing and functional testing.
1.3.2.1 Parametric Testing

Parametric tests are mainly based on current or voltage measurements. Current-based tests include the short test, open test, maximum current test, IDDQ test, and output driving current test. The IDDQ test has proven effective in achieving low defect levels [15]. This test method observes the drain current IDD of the transistors after
completion of switching. This current is referred to as the quiescent current, IDDQ. For a good circuit, IDDQ falls to a negligible value after switching is completed; in a defective circuit, IDDQ remains elevated long after switching completes. Therefore, an elevation in this current indicates the presence of a defect, a design error, or a variation in circuit parameters. Voltage-based tests include the propagation delay test, setup and hold test, functional speed test, access time test, refresh and pause time test, and rise and fall time test, which are usually technology-dependent.
1.3.2.2 Functional Testing

Functional testing can be considered an extension of design verification. In design verification, test patterns for specific functionality are generated and applied to the circuit to verify the correctness of the design. In functional testing, the same test patterns are applied to the CUT, and the responses are collected by the tester for analysis. Functional testing assumes that if a defect has occurred in the CUT during the manufacturing process, it will manifest as a functional failure; in other words, if a circuit passes all functional tests, it can be considered defect-free. However, this may not be entirely true in real applications, since functional test patterns are generated for particular functional scenarios rather than for catching manufacturing defects. As a result, functional patterns may not be able to detect all possible defects in the design, even though many defects can be detected by a functional pattern set. The test quality of functional patterns can always be increased by adding more test patterns, e.g., using exhaustive testing. As mentioned in Sect. 1.3.1, test quality depends upon the thoroughness of the test patterns. With VLSI technology scaling down to nanometer dimensions, more and more transistors are being packed into each chip to meet power, area, and performance requirements. As a result, verifying all possible input combinations in a reasonable time becomes an impossible task, since the number of possible input combinations grows exponentially as the number of input pins increases linearly. Therefore, structural testing has been widely used in industry over the last decade.
1.3.2.3 Structural Testing

Structural testing is based on fault models and focuses on detecting manufacturing defects in the circuit by considering how a defect manifests at the logic level. Instead of targeting the functionality of the CUT, it targets each node in the CUT and tests it in the presence of a fault (e.g., stuck-at-1, stuck-at-0, or a TDF at each node). Since each fault represents a possible defect manifestation at the logic level, structural testing serves as an explicit means of detecting manufacturing defects, as opposed to functional testing. With proper fault models, the structural test can detect
manufacturing defects effectively. The most widely used structural fault models are the stuck-at fault model, bridging fault model, transition-delay fault model, and path-delay fault model. Furthermore, structural testing is tractable and measurable in terms of fault detection, and it can significantly reduce the pattern count (and thus test cost) for a given defect coverage compared to functional testing. In practice, most defects are detected using structural testing.
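As a toy illustration of structural testing, the sketch below injects a stuck-at-0 fault on the internal node of a two-gate circuit and checks which vectors detect it. The circuit and vectors are illustrative, not from the text.

```python
# A minimal sketch of stuck-at fault detection on y = (a AND b) OR c.
# A vector detects the fault on internal node n = a AND b if the faulty
# output differs from the fault-free output.

def circuit(a, b, c, stuck_at=None):
    n = a & b
    if stuck_at is not None:
        n = stuck_at              # inject the fault on the internal node
    return n | c

for vec in [(0, 0, 0), (1, 1, 0), (1, 0, 0)]:
    good = circuit(*vec)
    faulty = circuit(*vec, stuck_at=0)  # n stuck-at-0
    print(vec, "detects" if good != faulty else "misses", "n stuck-at-0")
# Only (1, 1, 0) both activates the fault (n = 1 in the good circuit) and
# propagates it (c = 0), so only that vector detects it.
```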
1.4 Automatic Test Pattern Generation

Automatic test pattern generation (ATPG) is the process of automatically generating a set of test patterns for detecting a specific group of faults. The inputs of the ATPG procedure are the design data (e.g., the netlist), the fault group (specifying which faults are targeted), and the test protocol and test constraints; the output is a set of test patterns. The test patterns are then applied to the design for fault detection. If a fault can be detected by the input test patterns, it is called a detected fault; otherwise, it is called an undetected fault. Note that there may be some faults in the design that cannot be detected by any structural pattern; these are called undetectable faults. ATPG algorithms inject a fault into the CUT and then use a variety of mechanisms to activate the fault and propagate its effect to the circuit output. The output signal then differs from the value expected for the fault-free circuit, which causes the fault to be detected. Various ATPG algorithms can be found in the literature. Some of them are listed below:
1. Roth’s D-Algorithm (D-ALG) [13];
2. Goel’s PODEM algorithm [27];
3. Fujiwara and Shimono’s FAN algorithm [8];
4. Kirkland and Mercer’s dominator ATPG program TOPS [32];
5. Schulz’s learning ATPG program SOCRATES [20–22];
6. Giraldi and Bushnell’s EST methodology [11, 12, 24];
7. Kunz and Pradhan’s recursive learning methodology [37–39];
8. Chakradhar’s NNATPG algorithm family [30];
9. BDD-based ATPG algorithms [28, 33].
The following terms are important test generation definitions and are commonly used in the ATPG literature:
• Controllability: A testability metric that measures how difficult it is to drive a node to a specific value.
• Observability: A testability metric that measures how difficult it is to propagate the value on a node to a primary output or scan flip-flop.
• Sensitization: The process of sensitizing the circuit and enabling the fault to cause an actual erroneous value at the point of the fault.
• Propagation: The process of propagating error effects to a primary output or scan flip-flop.
• Justification: The process of finding the input combination required to drive an internal circuit node to a specified value.
The fault coverage (FC) is an important metric in the ATPG procedure. It is calculated using (1.1), where DT is the number of detected faults and TF is the total number of faults in the CUT.

FC = (DT / TF) × 100%   (1.1)
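A small sketch of the computation in (1.1) follows; the counts are illustrative. The variant that excludes undetectable faults from the denominator (often called test coverage) is a common industry convention, not something defined in the text above.

```python
# A minimal sketch of fault coverage per (1.1), plus the common variant that
# excludes undetectable faults from the total. All counts are illustrative.

def fault_coverage(dt, tf):
    return dt / tf * 100.0                   # FC = DT / TF x 100%

def test_coverage(dt, tf, undetectable):
    return dt / (tf - undetectable) * 100.0  # excludes untestable faults

print(f"FC = {fault_coverage(9500, 10000):.2f}%")      # -> FC = 95.00%
print(f"TC = {test_coverage(9500, 10000, 200):.2f}%")  # -> TC = 96.94%
```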
1.5 Design for Testability

The purpose of design for testability (DFT) is to make testing of the CUT easier and more efficient. Many DFT techniques exist in the literature, including ad-hoc DFT methods [18], scan-based designs [6, 23], built-in self-test (BIST) [34, 35], and printed-circuit-board-level boundary-scan methods [16]. This section focuses only on scan-based design and BIST, which are the most commonly used in industry.
1.5.1 Scan-Based Design

As the size of circuits increases, it becomes extremely difficult to control and observe all fault locations inside the circuit; as a result, deterministic ATPG time would be too long to be feasible. The main idea in scan-based design is to obtain controllability and observability over the flip-flops, which significantly reduces the complexity of test generation and increases the overall test coverage. This is done by:
1. Using scan flip-flops instead of normal flip-flops in the circuit;
2. Adding a test mode to the circuit such that, in this mode, all flip-flops form one or more scan chains (scan registers);
3. Connecting the inputs and outputs of these scan registers to the primary inputs and outputs, so that in test mode all flip-flops can be set to any desired state by shifting those logic states into the scan registers. Similarly, the states of the flip-flops can be shifted out to the primary outputs for observation.
Figure 1.9b shows the equivalent scan flip-flop that replaces the normal flip-flops (Fig. 1.9a) in the design for the purpose of testing. Two additional pins, scan-in (SD) and scan enable (SE), are added to the flip-flop. The SE signal controls the operation mode of the scan flip-flop: in test mode, the SD-Q path is activated at the clock edge, while in normal mode, the D-Q path is exercised and the scan flip-flop acts as a normal D flip-flop.
Fig. 1.9 (a) Normal D flip-flop, (b) equivalent scan flip-flop, and (c) a scan chain example
It can also be seen from Fig. 1.9a, b that, due to the added multiplexer, (1) the area overhead increases and (2) the performance degrades when a normal flip-flop is replaced with a scan flip-flop. However, considering the multiplexer's share of the flip-flop area, and the flip-flops' share of the entire die area, the area overhead added by the multiplexer is negligible. In practice, techniques such as partial-scan design can be used to avoid timing degradation on critical paths; note, however, that partial scan may impact the fault coverage. The scan-in (SD) pin of a scan flip-flop is connected to the output (Q pin) of another scan flip-flop, so that they form a shift register in test mode, as shown in Fig. 1.9c. This makes each flip-flop deep in the design logic controllable to any value, as if it were accessible from a primary input, so that the scan flip-flop can drive the combinational logic with the shifted-in values. Similarly, the responses of the combinational circuits are captured by the scan flip-flops in normal operation mode and then shifted out in test mode, making them observable at the primary output. In addition to added controllability and observability, scan-based design also enables the test pattern generation process to be fully automated, and it enables the ATPG tool to generate test patterns with very high fault coverage. Nowadays, scan-based design and ATPG techniques are the mainstream solutions for ensuring high-quality manufacturing test. Modern industrial designs with millions of gates contain a very large number of flip-flops. As a result, multiple scan chains are used (as shown in Fig. 1.10a) to limit the test time, estimated by (1.2),

Ttest = N × M × Tscan,   (1.2)

where Ttest is the test time, N is the total number of test patterns, M is the maximum scan chain length, and Tscan is the shift clock period.
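The effect of the scan chain count on (1.2) can be sketched as follows; the pattern count, flip-flop count, and shift frequency are illustrative.

```python
# A minimal sketch of the scan test time estimate Ttest = N x M x Tscan
# from (1.2). Splitting the flip-flops into more, shorter chains reduces the
# maximum chain length M and therefore the shift-dominated test time.

def scan_test_time_ns(n_patterns, max_chain_len, t_scan_ns):
    return n_patterns * max_chain_len * t_scan_ns

n, flops, t_scan = 10_000, 100_000, 25.0   # patterns, flip-flops, 40 MHz shift
for chains in (1, 10, 100):
    m = flops // chains                    # balanced maximum chain length
    t_ms = scan_test_time_ns(n, m, t_scan) / 1e6
    print(f"{chains:>3} chains: M = {m:>6}, Ttest = {t_ms:>8,.0f} ms")
```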
Fig. 1.10 (a) An example of the scan-based design structure, (b) an example of the TestKompress-based design structure
It can be seen from Fig. 1.10a that a large number of scan chains requires the same number of scan input and scan output pins. In addition, a large number of scan chains combined with long scan chains directly produces a huge test data volume. This is a major concern for scan architectures, given the limitations of ATE memory and the associated test cost. The TestKompress logic from Mentor Graphics is an efficient methodology for reducing test pattern volume and test application time [26]; Fig. 1.10b shows an example of the structure of a TestKompress-based design. Furthermore, a large number of flip-flops switch simultaneously when shifting in the test patterns, so power consumption during test is another important issue for scan architectures. Several works have addressed this issue [3, 5, 31].
1.5.2 Built-In Self-Test (BIST)

BIST is a DFT technique that aims at detecting faulty components in a system by incorporating the test logic on-chip [34, 35]. BIST has become a promising solution to VLSI testing problems and has been widely used in industry.
Fig. 1.11 Built-in self-test architecture
Figure 1.11 shows a typical BIST hardware structure. In BIST, a test pattern generator (TPG)—commonly a linear feedback shift register (LFSR), sketched after the lists below—generates test patterns and applies them to the CUT. The output signature from the CUT is then compared with the reference signature stored in the ROM during BIST. The entire process is controlled by the BIST controller. Note that the paths from primary inputs (PIs) to flip-flops and from flip-flops to primary outputs (POs) cannot be tested by BIST. The BIST technique can be used for both logic and memory test. Some of the advantages of BIST are listed below:
1. Supports concurrent testing;
2. Can be used at different test levels;
3. Low test cost;
4. Improved testability.
However, there are also some disadvantages of using BIST, as listed below:
1. Area overhead for the additional hardware;
2. Performance degradation;
3. Extra power consumption;
4. Limited ability to achieve full fault coverage.
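As promised above, here is a minimal sketch of an LFSR-based TPG. The 4-bit register and tap positions are illustrative; production BIST TPGs choose characteristic polynomials that guarantee maximal-length sequences for the chosen width.

```python
# A minimal sketch of a Fibonacci LFSR used as a BIST test pattern generator.
# The 4-bit width and tap positions are illustrative.

def lfsr_patterns(seed, taps, width, count):
    """Yield pseudo-random test patterns from a Fibonacci LFSR."""
    state = seed
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:                 # XOR together the tapped bits
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)

# Taps at bits 3 and 0 give a maximal-length sequence: all 15 nonzero states.
for p in lfsr_patterns(seed=0b1000, taps=(3, 0), width=4, count=6):
    print(f"{p:04b}")   # -> 1000, 0001, 0011, 0111, 1111, 1110, ...
```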
1.6 Test Cost

Fabricated chips must be tested before being delivered to customers, and test quality and test cost are interdependent; several tradeoffs are often necessary to obtain the required quality level at minimal cost. Several costs are unavoidable during testing, including the cost of acquiring and running automatic test equipment (ATE), development cost (CAD tools, test vector generation, test programming), DFT cost, etc. Test cost has received serious attention in industry: for large electronic systems, testing accounts for 30% or more of the total cost, and many companies have claimed that 50–60% of their cost goes to manufacturing test [19]. As technology scales down, the cost of manufacturing test will keep increasing. Reducing the number of test patterns is an effective way to reduce manufacturing test cost. However, smaller pattern counts mean lower test quality in most cases, especially for newly emerging defects such as small-delay defects (SDDs). In fact, many companies use n-detect pattern sets, which in turn result in much larger pattern
counts, compared with 1-detect pattern sets, to ensure high test quality. This book presents techniques to reduce the pattern count while keeping a very high test quality for screening SDDs.
References
1. A. Krstic and K.-T. Cheng, “Delay Fault Testing for VLSI Circuits”, Boston: Kluwer Academic Publishers, 1998
2. A. Krstic, Y.-M. Jiang, and K.-T. Cheng, “Delay testing considering power supply noise effects”, in IEEE International Test Conference (ITC’99), pp. 181–190, 1999
3. C. Hay, “Testing Low Power Designs with Power-Aware Test: Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX”, Synopsys White Paper, 2010
4. D. B. Armstrong, “A Deductive Method for Simulating Faults in Logic Circuits”, in IEEE Trans. on Computers, vol. C-21, no. 5, pp. 464–471, 1972
5. D. Czysz, G. Mrugalski, J. Rajski, and J. Tyszer, “Low Power Embedded Deterministic Test”, in 25th IEEE VLSI Test Symposium (VTS’07), 2007
6. E. B. Eichelberger, E. Lindbloom, J. A. Waicukauski, and T. W. Williams, “Structured Logic Testing”, Englewood Cliffs, New Jersey: Prentice-Hall, 1991
7. E. G. Ulrich, V. D. Agrawal, and J. H. Arabian, “Concurrent and Comparative Discrete Event Simulation”, Boston: Kluwer Academic Publishers, 1994
8. H. Fujiwara, “FAN: A Fanout-Oriented Test Pattern Generation Algorithm”, in Proc. of the International Symp. on Circuits and Systems, pp. 671–674, 1985
9. H. Li, P. Shen, and X. Li, “Robust test generation for precise crosstalk-induced path delay faults”, in Proc. VLSI Test Symp. (VTS’06), 2006
10. J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy, “Fast Simulation for Structured VLSI”, in VLSI Systems Design, vol. 6, no. 12, pp. 20–32, 1985
11. J. Giraldi and M. L. Bushnell, “EST: The New Frontier in Automatic Test Pattern Generation”, in Proc. of the 27th Design Automation Conf., pp. 667–672, 1990
12. J. Giraldi and M. L. Bushnell, “Search State Equivalence for Redundancy Identification and Test Generation”, in Proc. of the International Test Conf., pp. 184–193, 1991
13. J. P. Roth, “Diagnosis of Automata Failures: A Calculus and a Method”, in IBM Journal of Research and Development, vol. 10, no. 4, pp. 278–291, 1966
14. J. P. Roth, W. G. Bouricius, and P. R. Schneider, “Programmed Algorithms to Compute Tests to Detect and Distinguish Between Failures in Logic Circuits”, in IEEE Trans. on Electronic Computers, vol. EC-16, no. 5, pp. 567–580, 1967
15. J. Soden and C. F. Hawkins, “IDDQ Testing: A Review”, in Journal of Electronic Testing: Theory and Applications (JETTA), vol. 3, no. 4, pp. 291–304, 1992
16. K. P. Parker, “The Boundary-Scan Handbook”, Boston: Kluwer Academic Publishers, second edition, 1998
17. L. Smith, “Model for Delay Faults Based upon Paths”, in IEEE International Test Conference (ITC’85), pp. 342–349, 1985
18. M. Abramovici, M. A. Breuer, and A. D. Friedman, “Digital Systems Testing and Testable Design”, Piscataway, New Jersey: IEEE Press, 1994 (revised printing)
19. M. Bushnell and V. Agrawal, “Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits”, ISBN 0-792-37991-8, Kluwer Publishers, 2000
20. M. H. Schulz and E. Auth, “Advanced Automatic Test Pattern Generation and Redundancy Identification Techniques”, in Proc. of the International Fault-Tolerant Computing Symp., pp. 30–35, 1988
21. M. H. Schulz and E. Auth, “Improved Deterministic Test Pattern Generation with Applications to Redundancy Identification”, in IEEE Trans. on Computer-Aided Design, vol. 8, no. 7, pp. 811–816, 1989
22. M. H. Schulz and E. Auth, “SOCRATES: A Highly Efficient Automatic Test Pattern Generation System”, in IEEE Trans. on Computer-Aided Design, vol. CAD-7, no. 1, pp. 126–137, 1988
23. M. J. Y. Williams and J. B. Angell, “Enhancing Testability of Large-Scale Integrated Circuits via Test Points and Additional Logic”, in IEEE Trans. on Computers, vol. C-22, no. 1, pp. 46–60, 1973
24. M. L. Bushnell and J. Giraldi, “A Functional Decomposition Method for Redundancy Identification and Test Generation”, in Journal of Electronic Testing: Theory and Applications, vol. 10, no. 3, pp. 175–195, 1997
25. M. Sachdev, “Defect Oriented Testing for CMOS Analog and Digital Circuits”, ISBN 0-7923-8083-5, Boston: Kluwer Academic Publishers, 1998
26. Mentor Graphics, “Tessent TestKompress, ATPG with embedded compression”, in Silicon Test and Yield Analysis Datasheet, 2009
27. P. Goel, “An Implicit Enumeration Algorithm to Generate Tests for Combinational Logic Circuits”, in Proc. of the International Fault-Tolerant Computing Symp., pp. 145–151, 1980
28. R. K. Gaede, M. R. Mercer, K. M. Butler, and D. E. Ross, “CATAPULT: Concurrent Automatic Testing Allowing Parallelization and Using Limited Topology”, in Proc. of the 25th Design Automation Conf., pp. 597–600, 1988
29. S. Seshu, “On an Improved Diagnosis Program”, in IEEE Trans. on Electronic Computers, vol. EC-14, no. 1, pp. 76–79, 1965
30. S. T. Chakradhar, V. D. Agrawal, and M. L. Bushnell, “Neural Models and Algorithms for Digital Testing”, Boston: Kluwer Academic Publishers, 1991
31. S. Ravi, V. R. Devanathan, and R. Parekhji, “Methodology for Low Power Test Pattern Generation Using Activity Threshold Control Logic”, in Proc. of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’07), 2007
32. T. Kirkland and M. R. Mercer, “A Topological Search Algorithm for ATPG”, in Proc. of the 24th Design Automation Conf., pp. 502–508, 1987
33. T. Stanion and D. Bhattacharya, “TSUNAMI: A Path Oriented Scheme for Algebraic Test Generation”, in Proc. of the International Fault-Tolerant Computing Symp., pp. 36–43, 1991
34. V. D. Agrawal, C. R. Kime, and K. K. Saluja, “A Tutorial on Built-In Self-Test, Part 2: Applications”, in IEEE Design & Test of Computers, vol. 10, no. 2, pp. 69–77, 1993
35. V. D. Agrawal, C. R. Kime, and K. K. Saluja, “A Tutorial on Built-In Self-Test, Part 1: Principles”, in IEEE Design & Test of Computers, vol. 10, no. 1, pp. 73–82, Mar. 1993
36. V. R. Sar-Dessai and D. M. H. Walker, “Resistive bridge fault modeling, simulation and test generation”, in IEEE International Test Conference (ITC’99), pp. 596–605, 1999
37. W. Kunz and D. K. Pradhan, “Recursive Learning: A New Implication Technique for Efficient Solution to CAD Problems”, in IEEE Trans. on Computer-Aided Design, vol. 13, no. 9, pp. 1143–1158, 1994
38. W. Kunz and D. K. Pradhan, “Recursive Learning: An Attractive Alternative to the Decision Tree for Test Generation in Digital Circuits”, in Proc. of the International Test Conf., pp. 816–825, 1992
39. W. Kunz and D. Stoffel, “Reasoning in Boolean Networks: Logic Synthesis and Verification Using Testing Techniques”, Boston: Kluwer Academic Publishers, 1997
40. W.-T. Cheng and M.-L. Yu, “Differential Fault Simulation for Sequential Circuits”, in Journal of Electronic Testing: Theory and Applications, vol. 1, no. 1, pp. 7–13, 1990
41. M. Abramovici, P. R. Menon, and D. T. Miller, “Critical Path Tracing: An Alternative to Fault Simulation”, in IEEE Design & Test of Computers, vol. 1, no. 1, pp. 83–93, Feb. 1984
42. P. R. Menon, Y. H. Levendel, and M. Abramovici, “Critical Path Tracing in Sequential Circuits”, in Proc. of the International Conf. on Computer-Aided Design, Nov. 1988, pp. 162–165
43. S. K. Jain and V. D. Agrawal, “Statistical Fault Analysis”, in IEEE Design & Test of Computers, vol. 2, no. 1, pp. 38–44, Feb. 1985
44. J. Villoldo, P. Agrawal, and V. D. Agrawal, “STAFAN Algorithms for MOS Circuits”, in Proc. of the International Conf. on Computer Design, Oct. 1991, pp. 56–59
Chapter 2
Delay Test and Small-Delay Defects
As technology scales down, new challenges are emerging for test engineers. Deep-submicron effects are becoming more prominent with shrinking technology, thereby increasing the probability of timing-related defects [22, 24]. As a result, stuck-at and IDDQ tests alone cannot ensure a high quality level for chips, and at-speed test is needed to cover these timing-related defects. In the past, functional patterns were used for at-speed test. However, functional test generation is difficult and time-consuming for large, complex designs, and, as mentioned previously, functional patterns also have pattern count and coverage issues. A cost-effective alternative is the scan-based structural tests generated by at-speed automatic test pattern generators. The transition fault model and the path-delay fault model together provide relatively good coverage for delay-induced defects [5, 13, 26, 29].
2.1 Delay Test Challenges

As mentioned above, the transition and path-delay fault models provide better defect coverage, increasing production test quality and reducing defective parts per million (DPM) levels; thus, these fault models have become very popular in the past decade. The path-delay test targets the accumulated delay defects on critical paths in a design and generates test patterns to detect them. Traditionally, static timing analysis (nominal, best-case, or worst-case) is performed to identify the critical paths in the circuit. As technology continues to scale down, more and more delay variation is introduced, which can affect the performance of the target circuit to a large extent. In this situation, static timing analysis becomes quite inaccurate, since it cannot fully address these effects. This section briefly describes some of the sources of delay variation in nanometer technologies.
Fig. 2.1 An example of process variation effects on interconnect characteristics: (a) the interconnect as specified in the layout; (b) and (c) possible fabricated interconnects on real silicon
2.1.1 Process Variation Effects

In reality, the parameters of fabricated transistors are not exactly the same as the design specifications, due to process variations; in fact, the parameters differ die-to-die, wafer-to-wafer, and lot-to-lot. These variations are systematic and independent in most cases. They include impurity concentration densities, oxide thicknesses, and diffusion depths, caused by nonuniform conditions during the deposition and/or diffusion of the impurities. They directly result in deviations in transistor parameters, such as threshold voltage, oxide thickness, and W/L ratios, as well as variation in the widths of interconnect wires [9], and they impact performance (increasing or decreasing delays) to a large extent in the latest technologies. In practice, designers usually develop process technology files (with nominal, best-case, and worst-case conditions) to deal with the variations introduced by the manufacturing process. They then simulate their design at the different corner cases specified by these process files to ensure that the design is functional in all corners and that the specified timing behaviors are met in static timing analysis. Figure 2.1 shows an example of process variation effects on interconnect characteristics. In this example, the interconnect is specified as in (a). However, due to the imperfect fabrication process, the fabricated interconnect could be thinner, as in (b), or wider, as in (c). Case (b) produces higher resistance, and case (c) may result in higher coupling capacitance between this interconnect and its neighboring nets. Both cases affect the interconnect delay, and further variation can have a larger impact on the performance of high-speed designs. Process variations have a similar impact on transistor characteristics. In academia, Monte Carlo simulation is often used to emulate the effects of process variation. After 1,000 Monte Carlo simulation runs, it is clear that the delay of a NAND3X1 gate becomes a random variable with a certain distribution, rather than a fixed value, even with the same input and output load capacitances (Fig. 2.2). Of these 1,000 runs, only 170 simulations result in the nominal delay of this gate (1.5 ns); the rest of the simulation results are distributed around this nominal delay.
Fig. 2.2 Monte Carlo simulation results for a NAND3X1 gate (1,000 simulation runs; load capacitance: 1 pF; y-axis: number of occurrences; x-axis: delay in ns)
In some extreme cases, the delay variation can reach 50% of the 1.5 ns nominal delay, and the spread between the minimum and maximum gate delay can reach 2X. As technology scales down, the delay variation introduced by process variations increases even more. Not only must designers fully understand process variations and take them into account during design, but so must test engineers. For example, it is important for test engineers to identify and select the timing-critical paths accurately for path-delay pattern generation. As a result of process variability, more paths are likely to become timing-sensitive and to require testing; without considering process variations, engineers may fail to identify all timing-critical paths for test generation.
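The Monte Carlo experiment described above can be sketched at a high level as follows; here gate delay is drawn from a Gaussian around its nominal value, whereas a real flow would perturb SPICE device parameters. The nominal delay matches the text; the sigma is an assumed value.

```python
# A minimal sketch of the Monte Carlo gate-delay experiment behind Fig. 2.2:
# the gate delay is modeled as a Gaussian random variable. The 0.25 ns sigma
# is an illustrative assumption.

import random

NOMINAL_NS, SIGMA_NS, RUNS = 1.5, 0.25, 1000
random.seed(1)  # reproducible runs

delays = [random.gauss(NOMINAL_NS, SIGMA_NS) for _ in range(RUNS)]
print(f"min = {min(delays):.2f} ns, max = {max(delays):.2f} ns")
print(f"max/min spread = {max(delays) / min(delays):.1f}x")
# Even with identical loading, the delay spreads into a distribution around
# the 1.5 ns nominal value rather than staying fixed.
```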
2.1.2 Crosstalk Effects

Signal integrity can be affected significantly by crosstalk, which is introduced by parasitic coupling capacitances between parallel interconnects. Crosstalk between a target net (the victim) and its neighboring nets (the aggressors) may either increase or decrease the delays on both the victim and aggressor nets, depending on the transition direction, transition arrival time, and coupling capacitance between them [27]. As more transistors are integrated on a chip, interconnects become longer and interconnect delay becomes dominant over gate delay. As technologies shrink beyond the ultra-deep-submicron level, interconnects also become narrower; to keep wire resistance low, they are made taller, resulting in large cross-coupling capacitances, as shown in Fig. 2.3. In the near future, crosstalk is expected to be a major contributor to interconnect delay, further increasing chip delay.
Fig. 2.3 Sidewall capacitances between parallel interconnects for (a) 180 nm technology and (b) 45 nm technology (for simplicity, the resistance is not shown)
Fig. 2.4 Impact of coupling capacitance on victim propagation delay with identical arrival times, opposite transition directions, and different load capacitances (load capacitance unit: pF)
It is necessary for both design and test engineers to analyze and assess crosstalk effects both before signing off on a tape-out and after fabrication (during delay testing). Unfortunately, it is impossible to accurately analyze crosstalk effects without test pattern information. Note that there may be tens or even hundreds of aggressors for a target victim net; without test pattern information, there is no way to determine how many aggressors are switching together with the victim net, or what the active coupling capacitance between the victim net and its aggressors is. The coupling capacitance has a direct impact on the victim's delay, as shown in Fig. 2.4: for each load capacitance case, the propagation delay on the victim net increases linearly with the coupling capacitance. Here, d_coupling_arrival denotes the victim net delay considering the impact of the coupling capacitance, and C_a-v is the coupling capacitance between the aggressor and victim nets. When the transitions are in the same direction, the crosstalk delay instead decreases linearly. In real applications, the load capacitance of the target victim net (the capacitance between the target net and the substrate) is fixed, but the coupling capacitance of the target victim net depends on which of its aggressors are activated.
Fig. 2.5 Impact of aggressor arrival time on victim propagation delay when the victim and aggressor nets have (a) the same transition direction and (b) opposite transition directions (coupling capacitance: 0.1 pF)
Besides the coupling capacitance, the arrival times and transition directions on the victim and its aggressors can also impact the victim's delay dramatically. Figure 2.5 shows SPICE simulation results for crosstalk between two neighboring interconnects (one victim and one aggressor) with a fixed coupling capacitance. The parameter t_a-v denotes the arrival-time difference between the transitions on the aggressor and victim nets, and d_arrival represents the victim net delay considering the impact of this arrival-time difference. When the aggressor and victim nets have the same transition direction (Fig. 2.5a), the victim net is sped up; otherwise, it is slowed down (Fig. 2.5b). Furthermore, the crosstalk effect on the victim net is maximized when the transition arrival times of the aggressor and victim nets are almost the same (t_a-v ≈ 0). Again, without test pattern information, the transitions on the victim and its aggressors cannot be determined.
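A first-order model consistent with the trends in Figs. 2.4 and 2.5 can be sketched as below. The coefficients k and spread_ps are illustrative assumptions, not values fitted to the book's SPICE data.

```python
# A minimal sketch of a first-order crosstalk delay model: the victim delay
# shifts linearly with coupling capacitance (slow-down for opposite-direction
# transitions, speed-up for same-direction), and the effect decays as the
# aggressor/victim arrival times separate. All coefficients are illustrative.

import math

def victim_delay_ps(base_ps, c_av_pf, same_direction, t_av_ps,
                    k=4000.0, spread_ps=100.0):
    sign = -1.0 if same_direction else 1.0          # speed-up vs. slow-down
    align = math.exp(-(t_av_ps / spread_ps) ** 2)   # maximal when t_a-v ~ 0
    return base_ps + sign * k * c_av_pf * align

for t_av in (0.0, 50.0, 200.0):
    d = victim_delay_ps(300.0, 0.1, same_direction=False, t_av_ps=t_av)
    print(f"t_a-v = {t_av:5.0f} ps -> victim delay = {d:.0f} ps")
# -> 700 ps at aligned transitions, decaying back toward the 300 ps base as
#    the aggressor transition moves away from the victim's.
```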
Fig. 2.6 IR-drop plot of a test pattern applied to the wb_conmax benchmark
2.1.3 Power Supply Noise Effects

Technology scaling allows us to pack more transistors into one chip and to increase their operating frequency, which increases both switching activity and power density. The increase in frequency and the decrease in rise/fall transition times in today's designs cause more simultaneous switching activity within a small time interval, further increasing the current density and voltage drop along the power supply nets. As a result, power supply noise (PSN) has become an important factor for nanometer-technology designs. PSN can be introduced by inductive or resistive parasitics, or by a combination of the two. Inductive noise is calculated as L·di/dt, and depends on the inductance L and the instantaneous rate of current change; the package leads and wire/substrate parasitics are its main sources. Resistive noise is referred to as IR drop and depends on the current and the distributed resistance in the power distribution network. This book focuses on the resistance-induced power supply noise, i.e., IR drop, and its impact on circuit performance. Figure 2.6 shows the IR-drop plot for the wb_conmax benchmark [7] for a randomly selected TDF test pattern. The pattern set for this benchmark was generated using a commercial ATPG tool; the launch-off-capture (LOC) scheme with random-fill was used to minimize the pattern count. The plot shows the average IR drop calculated during the at-speed launch-to-capture cycle, obtained from the Cadence SOC Encounter tool. Power pads are located at the four corners of the design, and the areas far from the power pads (in the center of the design) exhibit a large IR drop. As shown, different gates in the design experience different voltage drops, resulting in extra delay and performance degradation. As the circuit size increases, even more severe IR drop is expected in the design.
Fig. 2.7 Average delay increase of a gate as a result of IR-drop increase (180 nm Cadence generic standard cell library, nominal power supply voltage = 1.8 V; y-axis: gate delay in ps; x-axis: power supply voltage drop in V)
The voltage drop on a gate directly impacts its performance, and can further result in performance degradation or functional failures of the circuit. Figure 2.7 presents simulation results for an AND gate under different power supply voltages, with an output load capacitance of 0.1 pF. With a 20% IR drop (0.36 V), the average gate delay increase is approximately 21%. This experiment is based on the 180 nm Cadence generic standard cell library with nominal Vdd = 1.8 V. Note that at smaller technology nodes, the percentage increase in gate delay will be much higher [10], and when more than one gate on a path experiences voltage drop, the performance degradation becomes profound. Power supply noise is therefore a major issue when generating at-speed delay test patterns. After the test pattern is shifted in at a lower frequency, the functional frequency is applied during the launch-to-capture cycle of the at-speed test. In general, power supply noise during at-speed delay test is much larger than during functional circuit operation, because a larger number of transitions occur within a short time interval in the structural at-speed delay test. Novel frameworks and methods are needed to accurately analyze power supply noise effects for delay-fault test pattern generation. This book will present new power supply noise calculation and diagnosis flows for delay test pattern analysis.
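In the spirit of Fig. 2.7, the supply-droop-to-delay relationship can be sketched with a simple linear model; the linear coefficient is an assumption calibrated only to the single data point quoted above (20% droop, ~21% delay increase), and the true dependence is nonlinear and technology-specific.

```python
# A minimal sketch relating IR drop to gate delay. A linear model is assumed,
# anchored to the quoted point: 0.36 V droop on a 1.8 V supply -> ~21% more delay.

VDD_NOMINAL_V, D_NOMINAL_PS = 1.8, 200.0   # nominal supply and gate delay
K = 0.21 / 0.20                            # ~1.05% delay per 1% droop (assumed)

def gate_delay_ps(vdd_v):
    droop = (VDD_NOMINAL_V - vdd_v) / VDD_NOMINAL_V
    return D_NOMINAL_PS * (1.0 + K * droop)

for drop_v in (0.0, 0.18, 0.36):
    d = gate_delay_ps(VDD_NOMINAL_V - drop_v)
    print(f"IR drop = {drop_v:.2f} V -> gate delay = {d:.0f} ps")
# -> 200 ps, 221 ps, 242 ps: the 0.36 V (20%) droop adds about 21% delay.
```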
2.2 Test for Transition-Delay Faults

As mentioned in the previous chapter, the transition-delay fault (TDF) models a slow signal-change defect in the circuit; for each fault site, there are two possible faults, slow-to-rise and slow-to-fall. To test a TDF, a vector pair (V1, V2) is required.
Fig. 2.8 Waveform for test with the launch-off-shift method
Therefore, there are two test vectors in a single TDF test pattern (a path-delay test pattern also has two test vectors). Vector V1 is called the “initialization vector” and V2 is called the “launch vector.” The response of the CUT to vector V2 is captured at the functional operating speed. The entire process for testing a TDF can be divided into three cycles:
1. Initialization cycle, where the CUT is initialized to a particular state by vector V1;
2. Launch cycle, where a transition is launched at the target gate terminal (V2 is applied);
3. Capture cycle, where the transition is propagated and captured at an observation point.
Depending on how the transition is launched and captured, there are three transition-fault pattern generation methods, referred to as launch-off-shift (LOS) or skewed-load [12], launch-off-capture (LOC) or broadside [11], and enhanced scan [3]. In the LOS method, the transition at the gate output is launched in the last shift cycle of the shift operation; the scan enable (SEN) then goes low to enable response capture at the capture clock edge. Figure 2.8 shows the LOS waveform for a scan flip-flop design. LOS requires the SEN signal to be timing-critical (it must go low between the last shift clock cycle and the capture clock cycle), which may make the DFT design more expensive. The LOC method does not need an at-speed SEN signal. In LOC, the launch cycle is separated from the shift operation: at the end of scan-in (shift mode), vector V1 is applied and the CUT is set to an initialized state, and vector V2 depends on the functional response of the circuit to the initialization vector V1. As a result, the launch path is less controllable and the test coverage is lower than with the LOS method. Figure 2.9 shows the LOC waveform for a scan flip-flop design. The enhanced-scan technique allows application of any arbitrary vector pair by inserting a hold latch between the scan flip-flops; it requires that the two vectors V1 and V2 be shifted into the scan flip-flops simultaneously. Using the enhanced-scan method, delay tests can be generated by considering the combinational logic alone, which makes test generation easier. Figure 2.10 shows the architecture of the enhanced-scan delay test; due to the hold latches, an additional HOLD signal is needed.
Fig. 2.9 Waveform for test with the launch-off-capture method
Fig. 2.10 Architecture for the enhanced-scan delay test application
The drawbacks of enhanced scan are:
1. The area-intensive hold latches increase the area overhead;
2. The hold latches add delay to signal paths and degrade circuit performance.
2.3 Test for Path-Delay Faults

The path-delay fault assumes a cumulative delay defect along a combinational path, which causes the path delay to exceed some specified duration. For each combinational path, there are two possible path-delay faults, corresponding to rising and falling transitions at the input of the path. Two test vectors are required to test each path-delay fault: the first vector, V1, initializes the target path to a specific state, and the second vector, V2, launches a transition at the input of the target path. The circuit setup must ensure that the transition at the input of the path can be propagated to the end of the path. The terms robust and non-robust path-delay test appear frequently in the path-delay test literature. A robust path-delay test can guarantee
that an incorrect value is produced at the end of the path if the delay of the path under test exceeds the specified duration. A non-robust path-delay test can detect the path-delay fault only when no other path-delay fault is present; therefore, to effectively detect the fault, the expected output value must be uniquely controlled by the transition propagating through the target path. Besides the robustness problem, the number of paths in the circuit is also a major concern, since it is known to increase exponentially with circuit size. Therefore, the path-delay fault model is usually applied only to a small set of selected critical paths when generating path-delay test patterns.
2.4 Small-Delay Defects (SDDs)

Small-delay defects (SDDs) are timing defects that introduce a small amount of extra delay into the design. The SDD was first alluded to in [4]. Because their size is small relative to the timing margins allowed by the maximum operating frequency of a design, SDDs were not seriously considered when testing designs at larger technology nodes. Although the delay introduced by each SDD is small, the overall impact can be significant if the sensitized path is a long/critical path, especially as technology scales to 45 nm and below [21]. As technology geometries shrink and operating frequencies increase, the available timing slack becomes smaller; SDDs therefore have a good chance of adding enough delay to a path to adversely impact circuit timing and make the part deviate from its functional specifications. Studies have shown that in the latest technologies a large portion of the failures in delay-defective parts are due to SDDs [19, 21]. SDDs thus require serious consideration in order to increase defect coverage and test quality, and to decrease the number of test escapes (i.e., increase in-field reliability), measured in DPM. Because their delays are small, it is commonly recommended that SDDs be detected via long paths running through the fault site.
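The long-path requirement can be made concrete with a short sketch: an SDD is observable only if its extra delay exceeds the slack of the sensitized path. The clock period, path delays, and defect size below are illustrative.

```python
# A minimal sketch of why SDDs must be detected through long paths: the
# transition is captured late only if the defect delay exceeds the path slack.

T_CLK_NS = 2.0
SDD_NS = 0.15    # small extra delay introduced at the fault site

for name, path_delay_ns in [("short path", 0.9), ("long path", 1.9)]:
    slack_ns = T_CLK_NS - path_delay_ns
    detected = SDD_NS > slack_ns   # arrives after the capture edge?
    print(f"{name}: slack = {slack_ns:.2f} ns -> "
          f"{'detected' if detected else 'escapes'}")
# The 0.15 ns defect escapes via the short path (slack 1.10 ns) but is
# caught via the long path (slack 0.10 ns).
```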
2.5 Prior Work on SDD Test

2.5.1 Limitations of Commercial ATPG Tools

Experiments have demonstrated that TDF test pattern sets can achieve a defect coverage level that stuck-at patterns alone cannot, and they can also detect some SDDs. Unfortunately, such pattern sets have shown limited ability to detect SDDs in devices and to meet the high SDD test coverage requirements in industry, where very low or close-to-zero DPM is required for critical applications such as automotive or medical systems. Traditional ATPG tools were developed to target gross delay
defects with minimal runtime and pattern count, rather than to target SDDs. To minimize runtime and pattern count, TDF ATPGs were developed to be timing-unaware: they activate TDFs via the easiest sensitization and propagation paths, which are often short paths [19]. Note that a transition-delay fault can only be detected if the extra delay it causes exceeds the slack of the sensitized path. Consequently, an SDD may escape traditional TDF testing and yet cause a failure in the field if a long path passes through it [1, 19]; it is therefore necessary to detect SDDs through long paths. Commercial timing-aware ATPG tools, e.g., the latest versions of Synopsys TetraMAX [25] and Mentor Graphics FastScan [17], have been developed to deal with these deficiencies of traditional timing-unaware ATPGs. A timing-aware ATPG targets each undetected fault along a path with minimal timing slack (or small timing slack, according to user specification), i.e., a long path running through the fault site. If a fault is detected through a minimal-slack path, it is categorized as detected by simulation (DS) and removed from the fault list. The generated pattern is then either filled randomly or fed to the compression engine, and fault simulation is run to identify all detected SDDs. If a fault is detected along a path with slack larger than the specified threshold (least slack), it is identified as partially detected. Such faults continue to be targeted every time the ATPG generates a new pattern; a fault is dropped from the fault list if, at some point during pattern generation, it is detected through a path that meets the slack requirement. The main drawback of timing-aware ATPG algorithms is that they waste a great deal of time operating on faults that do not contribute to SDD coverage, resulting in a large number of patterns. Experimental results have demonstrated that timing-aware ATPGs incur significantly larger CPU runtime and pattern count. Furthermore, they appear to be (1) ineffective at sensitizing large numbers of long paths and (2) incapable of taking into account important design parameters such as process variations, crosstalk, and power supply noise, which are important sources of small delays in very deep-submicron designs [15, 16]. Due to its underlying algorithms, n-detect ATPG can also be an effective method for SDD detection, even without timing information about the design. For each target fault, n-detect ATPG generates patterns that try to detect the fault n times, through different paths [14, 30]. Therefore, if a sensitizable long path runs through the target fault site, the tool has a good chance, for large n, of detecting the fault via its possible long paths; in other words, n-detect ATPG can produce high-quality patterns for screening SDDs. Furthermore, experiments have demonstrated that n-detect ATPG requires much lower CPU runtime than timing-aware ATPG. However, the significantly larger pattern count for large n limits the use of n-detect ATPG in practice. Figure 2.11 presents the normalized pattern count, number of detected SDDs, and CPU runtime of 1-detect, n-detect (n = 5, 10, 20), and timing-aware (ta) pattern sets for the IWLS ethernet benchmark (138,012 gates and 11,617 flip-flops) [7]. A detected SDD is defined as a detected TDF with slack equal to or smaller than 0.3T, where T is the clock period of the design.
Fig. 2.11 Normalized pattern count, detected SDDs, and CPU runtime of different pattern sets for ethernet; the results are normalized with respect to the 1-detect pattern set
It can be seen that the n-detect and timing-aware pattern sets detect more SDDs than the traditional 1-detect timing-unaware pattern set (1.5–1.9X); however, the penalty is a large pattern count (3.6–12.2X) and CPU runtime (3.0–12.0X). As n increases, the growth in pattern count and CPU runtime for n-detect ATPG is approximately linear. For this design, the timing-aware ATPG produces a pattern count comparable to 5-detect ATPG, but its CPU runtime exceeds even that of 20-detect ATPG. Another method for detecting SDDs is simply to perform faster-than-at-speed testing. This class of techniques raises the test frequency to reduce the positive slack of paths and thereby detect SDDs on target paths. However, its application is limited because (1) the on-chip clock-generation circuits must be over-designed to provide the various high-frequency steps and frequency sweep ranges required by faster-than-at-speed testing, which is expensive and difficult given process variations, and (2) the methodology may falsely identify good chips as faulty due to the reduced cycle time and increased IR drop [18], leading to unnecessary yield loss. In summary, current tools, whether timing-unaware TDF ATPGs or timing-aware ATPGs, are either inefficient at detecting SDDs or suffer from large pattern counts and CPU runtimes. Furthermore, none of these methodologies take into account the impact of important design parameters, e.g., process variations, power supply noise, and crosstalk, which are potential sources of SDDs.
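For reference, the detected-SDD metric used for Fig. 2.11 can be sketched as follows, using the 0.3T slack threshold stated above; the fault list is an illustrative placeholder for fault-simulation output.

```python
# A minimal sketch of counting detected SDDs: a detected TDF counts as a
# detected SDD only if the slack of its detection path is at most 0.3T.

T_NS = 2.0                             # clock period (illustrative)
SLACK_THRESHOLD_NS = 0.3 * T_NS

# (fault, slack of the least-slack path through which it was detected)
detected_tdfs = [("f1", 0.1), ("f2", 0.8), ("f3", 1.2), ("f4", 0.25)]

detected_sdds = [f for f, s in detected_tdfs if s <= SLACK_THRESHOLD_NS]
print(detected_sdds)                   # -> ['f1', 'f4']
```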
2.5.2 Newly Proposed Methodologies for SDD Test

In recent years, several techniques have been proposed for screening SDDs, with most attempts focused on developing algorithms to target a delay fault via the longest path.
• In [20], the authors proposed an as-late-as-possible transition fault (ALAPTF) model to launch one or more transition faults at the fault site as late as possible, detecting the faults through the least-slack path. This method requires a large CPU runtime compared to traditional ATPGs.
• A method to generate the K longest paths per gate for testing transition faults was proposed in [28]. However, the longest path through a gate may still be a short path for the design, and thus may not be efficient for SDD detection; this method also suffers from high complexity, long CPU runtime, and high pattern count.
• The authors in [2] proposed a delay-fault coverage metric based on detecting the longest sensitized path through a TDF fault site. It builds on the robust path-delay test and attempts to find the longest sensitizable path passing through the target fault site while generating a slow-to-rise or slow-to-fall transition. It is impractical to apply this method to large industrial circuits, since the number of long paths increases exponentially with circuit size.
• The authors in [6] proposed path-based and cone-based metrics for estimating the path delay under test, which can be used for path-length analysis. This method is inaccurate due to its dependence on gate delay models (unit gate delay and differential gate delay models), which are determined by the gate type, the number of fan-in and fan-out nets, and the transition type at the outputs. Furthermore, the method is based on static timing analysis and does not take into account pattern-induced noise, process variations, and their impact on path delays.
• The authors in [23] proposed two hybrid methods using 1-detect and timing-aware ATPGs to detect SDDs with a reduced pattern count. These methods first identify a subset of transition faults that are critical and should be targeted by the timing-aware ATPG; top-off ATPG is then run on the faults left undetected by the timing-aware ATPG to meet the fault coverage requirement. The efficiency of this approach is questionable, since it still results in a pattern count much larger than traditional 1-detect ATPG.
• In [19], a static-timing-analysis-based method was proposed to generate and select patterns that sensitize long paths. It finds the long paths (LPs), intermediate paths, and short paths to each observation point using static timing analysis tools. The observation points of intermediate and short paths are then masked during pattern generation to force the ATPG tool to generate patterns for LPs, and a pattern selection procedure is applied to ensure pattern quality.
• An output-deviation-based method was proposed in [16]. This method defines gate-delay defect probabilities (DDPs) to model delay variations in a design: a Gaussian gate-delay distribution is assumed and a delay defect probability matrix (DDPM) is assigned to each gate. The signal-transition probabilities are then propagated to the outputs to obtain the output deviations, which are used for pattern evaluation and selection. However, when paths contain a large number of gates, the calculated output-deviation metric can saturate, yielding similar output deviations (close to 1) for both long and intermediate paths (relative to the clock cycle). Since in modern designs, there
exists a large number of paths of large depth in terms of gate count, the output-deviation-based method may not be very effective. A similar method was developed in [15] to take into account the contribution of interconnects to the total delay of sensitized paths; unfortunately, it suffers from the same saturation problem.
• A false-path-aware statistical timing analysis framework was proposed in [8]. It selects all logically sensitizable long paths using worst-case statistical timing information and obtains the true timing information of the selected paths. After obtaining the critical long paths, it targets them with path-delay fault test patterns. This method is limited by the same constraints as the path-delay fault test.
In general, the most effective way to detect an SDD is via long paths. The commercial timing-aware ATPG tools and most previously proposed SDD detection methods rely on standard delay format (SDF) files generated during the physical design flow [17, 25]. However, path length is greatly affected by process variations, crosstalk, and power supply noise: a short path may become long because of one or a combination of these effects, and some of them may also cause a long path to become short (for instance, in certain circumstances, crosstalk can speed up signal propagation and shorten a path). SDF files are pattern-independent and cannot take these effects into consideration. Furthermore, the complexity of today's ICs and shrinking process technologies have made design features more probabilistic; it is therefore necessary to perform statistical timing analysis when evaluating path length. Neither traditional nor timing-aware ATPGs are capable of addressing these statistical features. This book presents various hybrid pattern grading and selection methodologies for screening SDDs caused by physical defects as well as by delay added to the design by process variations, power supply noise, and crosstalk. Implementations of the presented procedures on both academic and industrial circuits show that these methods can achieve pattern counts as low as a traditional 1-detect pattern set, with long-path sensitization and SDD detection similar to or even better than n-detect or timing-aware pattern sets. The procedure is capable of considering important design parameters such as process variations, crosstalk, power supply noise, on-chip temperature, L·di/dt effects, etc.; in this book, process variations, crosstalk, and power supply noise are included in the pattern evaluation and selection procedures.
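The selection side of such hybrid flows can be sketched as a greedy covering procedure: from a large repository, repeatedly pick the pattern that sensitizes the most long paths not yet covered. The path sets below are illustrative placeholders for what timing-annotated simulation of each pattern would report; the actual metrics used in this book are developed in the following chapters.

```python
# A minimal sketch of greedy pattern selection to minimize sensitized-path
# overlap: keep picking the pattern that covers the most uncovered long paths.

def select_patterns(pattern_paths):
    """pattern_paths: dict of pattern id -> set of sensitized long paths."""
    uncovered = set().union(*pattern_paths.values())
    selected = []
    while uncovered:
        best = max(pattern_paths, key=lambda p: len(pattern_paths[p] & uncovered))
        gain = pattern_paths[best] & uncovered
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
    return selected

repo = {"p1": {"A", "B"}, "p2": {"B", "C", "D"}, "p3": {"D"}, "p4": {"A", "C"}}
print(select_patterns(repo))   # -> ['p2', 'p1'] (all four paths, two patterns)
```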
2.6 Book Outline

The remainder of the book is organized as follows. Chapter 3 presents a long-path SDF-based hybrid method for screening SDDs and presents metrics for pattern evaluation and selection. Chapter 4 adds process variations and crosstalk effects to the pattern evaluation and selection flow; it also evaluates the efficiency
and accuracy of the procedures for process variation and crosstalk calculation. In Chap. 5, a power supply noise- and crosstalk-aware hybrid method is presented, along with a new crosstalk calculation procedure that is more accurate than the methodology used in Chap. 4. An SDD-based hybrid method is presented in Chap. 6; it is very fast and efficient, and can easily be applied to large industrial designs with millions of gates. Chapters 7 and 8 present techniques for maximizing crosstalk and power supply noise effects on critical paths when generating path-delay test patterns. Chapter 9 introduces the faster-than-at-speed test technique, in which power supply noise is also considered. Chapter 10 introduces techniques for combinational circuit diagnosis and scan chain diagnosis, as well as a chip-level diagnosis strategy. Chapter 11 presents a timing-based SDD diagnosis flow.
References
1. A. K. Majhi, V. D. Agrawal, J. Jacob, and L. M. Patnaik, “Line Coverage of Path Delay Faults,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5, pp. 610–614, 2000
2. A. K. Majhi, V. D. Agrawal, J. Jacob, and L. M. Patnaik, “Line Coverage of Path Delay Faults,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5, pp. 610–614, 2000
3. B. Dervisoglu and G. Stong, “Design for Testability: Using Scanpath Techniques for Path-Delay Test and Measurement,” in Proc. Int. Test Conf. (ITC’91), pp. 365–374, 1991
4. E. S. Park, M. R. Mercer, and T. W. Williams, “Statistical Delay Fault Coverage and Defect Level for Delay Faults,” in Proc. IEEE International Test Conference (ITC’88), 1988
5. G. Aldrich and B. Cory, “Improving Test Quality and Reducing Escapes,” in Proc. Fabless Forum, Fabless Semiconductor Assoc., pp. 34–35, 2003
6. H. Lee, S. Natarajan, S. Patil, and I. Pomeranz, “Selecting High-Quality Delay Tests for Manufacturing Test and Debug,” in Proc. IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT’06), 2006
7. IWLS 2005 Benchmarks, http://iwls.org/iwls2005/benchmarks.html
8. J. Liou, A. Krstic, L. Wang, and K. Cheng, “False-Path-Aware Statistical Timing Analysis and Efficient Path Selection for Delay Testing and Timing Validation,” in Design Automation Conference (DAC’02), pp. 566–569, 2002
9. J. M. Rabaey, A. Chandrakasan, and B. Nikolic, “Digital Integrated Circuits: A Design Perspective (Second Edition),” Prentice Hall, 2003
10. J. Ma, J. Lee, and M. Tehranipoor, “Layout-Aware Pattern Generation for Maximizing Supply Noise Effects on Critical Paths,” in Proc. IEEE VLSI Test Symposium (VTS’09), 2009
11. J. Savir and S. Patil, “On Broad-Side Delay Test,” in Proc. VLSI Test Symp. (VTS’94), pp. 284–290, 1994
12. J. Savir, “Skewed-Load Transition Test: Part I, Calculus,” in Proc. Int. Test Conf. (ITC’92), pp. 705–713, 1992
13. K. Cheng, “Transition Fault Testing for Sequential Circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 12, pp. 1971–1983, Dec 1993
14. M. E. Amyeen, S. Venkataraman, A. Ojha, and S. Lee, “Evaluation of the Quality of N-Detect Scan ATPG Patterns on a Processor,” in IEEE International Test Conference (ITC’04), pp. 669–678, 2004
15. M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, “Interconnect-Aware and Layout-Oriented Test-Pattern Selection for Small-Delay Defects,” in Proc. IEEE Int. Test Conference (ITC’08), 2008
16. M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, “Test-Pattern Grading and Pattern Selection for Small-Delay Defects,” in Proc. IEEE VLSI Test Symposium (VTS’08), 2008
17. Mentor Graphics, “Understanding how to run timing-aware ATPG,” Application Note, 2006
18. N. Ahmed, M. Tehranipoor, and V. Jayaram, “A Novel Framework for Faster-than-at-Speed Delay Test Considering IR-Drop Effects,” in Proc. Int. Conf. on Computer-Aided Design (ICCAD’06), 2006
19. N. Ahmed, M. Tehranipoor, and V. Jayaram, “Timing-Based Delay Test for Screening Small Delay Defects,” in IEEE Design Automation Conf., pp. 320–325, 2006
20. P. Gupta and M. S. Hsiao, “ALAPTF: A New Transition Fault Model and the ATPG Algorithm,” in Proc. Int. Test Conf. (ITC’04), pp. 1053–1060, 2004
21. R. Mattiuzzo, D. Appello, and C. Allsup, “Small Delay Defect Testing,” http://www.tmworld.com/article/CA6660051.html, Test & Measurement World, 2009
22. R. Wilson, “Delay-Fault Testing Mandatory, Author Claims,” EE Design, Dec 2002
23. S. Goel, N. Devta-Prasanna, and R. Turakhia, “Effective and Efficient Test Pattern Generation for Small Delay Defects,” in IEEE VLSI Test Symposium (VTS’09), 2009
24. S. Natarajan, M. A. Breuer, and S. K. Gupta, “Process Variations and Their Impact on Circuit Operation,” in Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 73–81, 1998
25. Synopsys Inc., “SOLD Y-2007, Vol. 1–3,” Synopsys Inc., 2007
26. T. M. Mak, A. Krstic, K. Cheng, and L. Wang, “New Challenges in Delay Testing of Nanometer, Multigigahertz Designs,” IEEE Design & Test of Computers, pp. 241–248, May–Jun 2004
27. W. Chen, S. Gupta, and M. Breuer, “Analytic Models for Crosstalk Delay and Pulse Analysis Under Non-Ideal Inputs,” in Proc. IEEE International Test Conference (ITC’97), pp. 808–818, 1997
28. W. Qiu, J. Wang, D. Walker, D. Reddy, L. Xiang, L. Zhou, W. Shi, and H. Balachandran, “K Longest Paths Per Gate (KLPG) Test Generation for Scan-Based Sequential Circuits,” in Proc. IEEE Int. Test Conf. (ITC’04), pp. 223–231, 2004
29. X. Lin, R. Press, J. Rajski, P. Reuter, T. Rinderknecht, B. Swanson, and N. Tamarapalli, “High-Frequency, At-Speed Scan Testing,” IEEE Design & Test of Computers, pp. 17–25, Sep–Oct 2003
30. Y. Huang, “On N-Detect Pattern Set Optimization,” in Proc. IEEE 7th International Symposium on Quality Electronic Design (ISQED’06), 2006
Chapter 3
Long Path-Based Hybrid Method
3.1 Introduction

As discussed in the previous chapter, small-delay defects (SDDs) introduce a small amount of extra delay to the design, and it is commonly recommended to detect SDDs via long paths running through the fault sites [3-10]. Therefore, if a pattern sensitizes a large number of long paths, it can detect all the SDDs along these long paths and can be considered an effective pattern. This observation is the basis of this book. Due to the complexity of automatic test pattern generation (ATPG) algorithms, it is very difficult, or even impossible, to develop an ATPG tool that produces a high-quality SDD pattern set with (1) a large number of sensitized long paths, (2) a low pattern count, (3) reasonable CPU runtime, and (4) consideration of layout and on-chip noise. Instead of developing a new ATPG algorithm, this book presents pattern evaluation and selection methodologies to
1. Take advantage of existing commercial ATPG, e.g., n-detect ATPG, in terms of SDD detection efficiency and CPU runtime,
2. Reduce the pattern count by selecting only high-quality test patterns for screening SDDs.
This chapter presents a hybrid method based on standard delay format (SDF) timing information to grade and select the most effective patterns for screening SDDs. Process variations and on-chip noise, i.e., power supply noise and crosstalk, will be considered in the following chapters. n-detect pattern sets are used as the original pattern repository for pattern evaluation and selection. Several techniques are presented to reduce pattern count and CPU runtime. Instead of evaluating the lengths of all paths in the design, which is infeasible for large industrial designs, this chapter evaluates only the paths sensitized by each pattern and grades the pattern based on SDF timing; this saves significant CPU runtime. The contents of the chapter include:
1. Presenting a hybrid method based on pattern grading and selection from a large repository of patterns generated using timing-unaware ATPG.
2. A procedure is introduced to identify all the paths sensitized by each TDF pattern.
3. A procedure and a metric are introduced to evaluate each TDF pattern in terms of its effectiveness for detecting SDDs.
4. A critical fault identification procedure is presented to reduce ATPG runtime and hardware resources before generating the original pattern repository.
5. An SDD pattern selection procedure is presented. The patterns are selected from a large pattern repository, such as n-detect patterns generated using traditional timing-unaware TDF ATPG or a randomly generated pattern set. This makes pattern generation much faster than timing-aware ATPG. The pattern selection procedure minimizes the overlap of sensitized paths between patterns, which ensures that only the most effective patterns with minimum sensitized-path overlap are selected and further reduces the selected pattern count. The final pattern set ensures the same fault coverage as timing-aware ATPG by running 1-detect top-off ATPG on top of the patterns selected from the n-detect pattern set.
The presented procedure is fast and effective in terms of CPU runtime, pattern count, and the number of sensitized long paths. Although the primary objective is to apply this method to an n-detect pattern set, applying it to 1-detect and timing-aware pattern sets has also shown significant improvement in pattern count and the number of sensitized long paths.
The remainder of this chapter is organized as follows. Section 3.2 presents the pattern grading and selection procedure based on SDF timing. To validate this procedure, it is applied to several benchmarks, and the experimental results are presented in Sect. 3.3. As an extension, a critical fault-based hybrid method is presented in Sect. 3.4. The experimental results on the critical fault-based method are presented in Sect. 3.5. Finally, Sect. 3.6 concludes this chapter.
The authors would like to thank the following for their help with the procedures presented in this chapter, as well as their insightful feedback and discussions: Fang Bao from the University of Connecticut and Mahmut Yilmaz from Advanced Micro Devices, Inc.
3.2 Pattern Grading and Selection

3.2.1 Path Classification and Pattern Grading

Since it is desirable to detect SDDs via long paths running through the fault sites, a technique is needed that targets SDDs via long paths and gross delay defects via paths of any length [11-15]. An SDD is defined as a TDF on a long path. How a long path is defined therefore greatly impacts pattern grading and selection. Whether a path is long or short is determined by comparing its length to the functional clock period. In general, when the length of a path is close to the clock period, i.e., when it has a small slack, it is considered a long path. Otherwise, when the path length is short compared with the clock period, it is considered a short path with large slack. Thus, it is necessary to define a threshold (named the long path threshold LPthr in this chapter) relative to the clock period to classify paths as long or short. This chapter considers a path to be long if its length is equal to or greater than 0.7T, where T is the clock period. Note that any other long path threshold can be used in practice as well; for example, LPthr = 0 means every fault must be targeted via its longest sensitizable path, while LPthr = 0.95T means that only critical paths are targeted.

The effectiveness of a TDF pattern is determined by its ability to sensitize long paths: the greater the number of sensitized long paths, the more effective the pattern is considered to be. Due to the large pattern count in the original pattern repository, a procedure is needed to select the most effective patterns; the pattern grading and selection procedure presented in this chapter is based on this observation. For each TDF pattern, fault simulation is performed to identify all of its detected TDFs. With this information, the paths sensitized by the pattern can be obtained by searching the topology of the design. Then, timing information for each sensitized path is added from the corresponding SDF file. The timing information makes it possible to differentiate the sensitized long paths (LPs) from the short paths (SPs). In this chapter, paths with length equal to or greater than LPthr are referred to as long paths; the remaining paths are short paths. Figure 3.1 presents the flow of sensitized path identification and classification. The procedure provides a complete list of all sensitized paths and their respective lengths.

Fig. 3.1 Sensitized path classification of a pattern

Given the long path threshold LPthr, a path can be graded by assigning it a weight according to its length, and a TDF pattern can then be evaluated by summing the weights of all its sensitized paths. Assume that W_{P_i} is the weight of pattern P_i, N is the total number of paths sensitized by P_i, and W_{path_j} is the weight of the j-th sensitized path. The weight of pattern P_i is calculated by (3.1):

W_{P_i} = \sum_{j=1}^{N} W_{path_j}.    (3.1)
This chapter assigns a weight of 1.0 to each long path (W_{path} = 1.0) and a weight of 0.0 to each short path (W_{path} = 0.0). Therefore, in this chapter, the weight of a pattern equals the number of its sensitized long paths. It must be acknowledged that the definition of pattern weight is open-ended and depends on the requirements of the application. For example, one can select patterns that target very small delay defects in the design by setting LPthr = 0.8T or 0.9T. One can also select patterns that target the SDDs on long paths with first priority and the SDDs on intermediate paths with second priority. The sensitized paths can then be differentiated into long paths (LPs, i.e., paths longer than LPthr), intermediate paths (IPs, i.e., paths between a defined SPthr and LPthr), and short paths (SPs, i.e., paths shorter than SPthr), and different weights can be applied to LPs, IPs, and SPs. As before, the weight of a pattern is defined as the sum of the weights of all its sensitized paths, so patterns are evaluated according to their sensitized LPs, IPs, and SPs. If the absolute length of a path is of concern, the path weight can instead be defined as the ratio between the path length and the clock period. In any case, the pattern is evaluated based on its sensitized paths, and each path is weighted by its length. Furthermore, this chapter uses the typical delay values in the SDF file for path evaluation; depending on the requirements of the application, the max/min delay values in the SDF file can also be used for worst-/best-case path evaluation.
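To make the grading concrete, the following is a minimal Python sketch of path weighting and pattern grading per (3.1). It is illustrative only: the function names and the intermediate-path weight value (w_ip = 0.5) are assumptions for the three-tier variant, not part of the original procedure.

```python
# Minimal sketch of path grading and pattern weighting (illustrative).
# Assumes sensitized path lengths have already been obtained from fault
# simulation and SDF back-annotation, as described above.

def path_weight(length, T, lp_thr=0.7, sp_thr=None, w_lp=1.0, w_ip=0.5, w_sp=0.0):
    """Weight a path by its length relative to the clock period T.

    With sp_thr=None this reproduces the binary LP/SP scheme of this
    chapter (1.0 for paths >= lp_thr*T, else 0.0); supplying sp_thr
    enables the three-tier LP/IP/SP variant discussed above.
    """
    if length >= lp_thr * T:
        return w_lp                      # long path (LP)
    if sp_thr is not None and length >= sp_thr * T:
        return w_ip                      # intermediate path (IP)
    return w_sp                          # short path (SP)

def pattern_weight(path_lengths, T, **kw):
    """Implements (3.1): the pattern weight is the sum of its path weights."""
    return sum(path_weight(l, T, **kw) for l in path_lengths)

# Example: clock period 10 ns, LPthr = 0.7T
print(pattern_weight([9.1, 7.3, 4.0, 2.5], T=10.0))  # -> 2.0 (two long paths)
```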
3.2.2 Pattern Selection

From the pattern evaluation procedure in the previous subsection, if a pattern sensitizes a large number of long paths (i.e., it is efficient in detecting SDDs), it will have a large weight. The pattern selection procedure therefore selects the patterns with the largest weights, which ensures that the most effective SDD-detecting patterns are chosen. It must be noted that many paths in the design can be sensitized by multiple patterns. If a path has already been sensitized by a previously selected pattern, it need not be counted when evaluating the remaining patterns, assuming one does not want to target the faults on that path multiple times. The pattern selection procedure checks the overlap of sensitized paths between patterns and ensures that only the unique sensitized paths of a pattern are used in its evaluation. The patterns are first sorted according to their unique weights before the most effective ones are selected. The pattern selection and sorting procedure is shown in Fig. 3.2. The pattern with the largest weight is selected first and placed at the top of the sorted list. The weights of the remaining patterns are then re-calculated, excluding paths that have already been sensitized by patterns in the sorted list, and the pattern with the largest remaining weight is placed in the second position of the sorted list. This procedure is repeated to sort the patterns in the n-detect pattern set, ensuring that every pattern is evaluated by its unique sensitized paths, i.e., those not sensitized by the patterns ahead of it in the sorted list.
Fig. 3.2 Selection and sorting procedure for TDF patterns based on their unique weight
This unique-path evaluation ensures that there is as little overlap as possible between patterns in terms of sensitized paths, which further reduces the selected pattern count. The pattern-sorting iteration stops when the largest weight among the remaining patterns falls below the pattern selection threshold (denoted Wthr in Fig. 3.2). This chapter sets Wthr = 1 to ensure that the selected patterns sensitize all the long paths sensitized by the original pattern repository. The remaining patterns are appended to the sorted list in their original order, which saves significant CPU runtime; experiments show that fewer than 20% of the patterns at the top of the sorted list sensitize all the long paths sensitized by the n-detect pattern set, and the unique weights of the remaining patterns are all zero under the pattern weight definition used in this chapter. The sorting algorithm thus returns a pattern list in decreasing order of test efficiency, and only patterns with weights no smaller than the pattern selection threshold (Wthr) are selected.
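A minimal sketch of this greedy, unique-path selection is shown below (Python, illustrative only). It assumes each pattern's set of sensitized long paths has already been computed; under the binary weighting scheme, a pattern's unique weight is simply the number of its not-yet-covered long paths, and selection stops once no remaining pattern reaches the threshold.

```python
# Minimal sketch of unique-path greedy pattern selection (illustrative).
# `patterns` maps a pattern id to the set of long paths it sensitizes;
# short paths carry weight 0 under the binary scheme and are ignored.

def select_patterns(patterns, w_thr=1):
    covered = set()              # long paths already sensitized
    selected = []
    remaining = dict(patterns)
    while remaining:
        # Re-evaluate each pattern on its *unique* (not-yet-covered) paths.
        pid, paths = max(remaining.items(),
                         key=lambda kv: len(kv[1] - covered))
        weight = len(paths - covered)
        if weight < w_thr:       # no pattern adds enough new long paths
            break
        selected.append(pid)
        covered |= paths
        del remaining[pid]
    return selected, covered

# Example: p2 is picked first (3 unique LPs), then p1 adds one more.
pats = {"p1": {"a", "b"}, "p2": {"b", "c", "d"}, "p3": {"c"}}
sel, cov = select_patterns(pats)
print(sel, sorted(cov))          # ['p2', 'p1'] ['a', 'b', 'c', 'd']
```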
3.3 Experimental Results on Long Path-Based Hybrid Method

3.3.1 Experimental Setup

This section presents experimental results for several IWLS [1] benchmarks. The experiments were performed on a Linux x86 server with 8 processors and 24 GB of available memory. Four IWLS benchmarks were used; their characteristics are shown in Table 3.1. The number of logic gates is shown in Column 2, the number of flip-flops in Column 3, and the total number of standard cells in Column 4. Note that the data in Table 3.1 are obtained from the original Verilog RTL code of each benchmark and may differ slightly after synthesis, since the tool may optimize the design according to the specified optimization options.

Table 3.1 Benchmark characteristics

  Benchmark     # of gates   # of FFs   # of total standard cells
  tv80          6,802        359        7,161
  mem_ctrl      10,357       1,083      11,440
  systemcaes    7,289        670        7,959
  usb_funct     11,262       1,746      12,808

Commercial tools were used for circuit synthesis, physical design, and pattern generation. To obtain accurate timing information, the post-layout SDF file of each design is extracted for pattern evaluation. The 180 nm Cadence Generic Standard Cell Library was used for physical design. Note that the post-layout SDF still cannot reflect some parasitic effects, such as power supply noise and crosstalk; these effects will be taken into consideration to update the post-layout SDF in the following chapters. All patterns are generated using the transition delay fault (TDF) launch-off-capture (LOC) method. After pattern selection, top-off ATPG is run for the undetected faults to ensure that all detectable faults in the design are detected by the final pattern set and the fault coverage requirements are met. The entire flow of this hybrid method is shown in Fig. 3.3.

Fig. 3.3 The entire flow of the hybrid method
3.3.2 Pattern Selection Efficiency Analysis

How the long path is defined greatly impacts the experimental results. Whether a path is long or short is determined by comparing its length to the functional clock period. Generally speaking, all paths should be shorter than the clock period; otherwise, the design cannot operate properly at the intended frequency. When the length of a path is close to the clock period, i.e., it has a small slack relative to the clock period, it is considered a long path. Otherwise, when the path length is small compared with the clock period, it is considered a short path with large slack. Thus, a threshold (the long path threshold LPthr) must be defined according to the clock period and the application to classify paths as long or short. As discussed in Sect. 3.2.1, this chapter considers a path to be long if its length is equal to or larger than 0.7T.

Figure 3.4 presents the relation between the number of selected patterns and the number of unique long paths sensitized by the selected pattern sets (selected from 1-detect, 3-detect, 5-detect, 8-detect, and 10-detect pattern sets) for the IWLS benchmark usb_funct [1]. It can be seen that only 781 patterns (6.3% of the total pattern count of 12,345) are needed to sensitize all the long paths sensitized by the 10-detect pattern set. Likewise, 7.0% (697 out of 9,958), 9.3% (593 out of 6,397), 12.2% (489 out of 4,006), and 18.7% (293 out of 1,569) of the patterns are needed to sensitize all the long paths sensitized by the 8-detect, 5-detect, 3-detect, and 1-detect pattern sets, respectively. The pattern counts of the different pattern sets are shown in Table 3.2. It can be seen from Fig. 3.4 that as n (of the n-detect pattern set) increases, the percentage of patterns needed to sensitize all the long paths of the pattern set decreases, even though the absolute number of needed patterns increases. This is because the overlap of sensitized paths between patterns is eliminated in the pattern selection procedure. It can also be seen from the figure that with increasing n, the number of sensitized unique long paths increases, which demonstrates the SDD-detection efficiency of the n-detect pattern set.

Fig. 3.4 Relation between number of selected patterns and unique sensitized long paths
Table 3.2 Number of patterns for different pattern sets

  Benchmark    Pattern set   n=1     n=3     n=5     n=8      n=10     ta
  tv80         Original      1,435   3,589   5,775   8,931    11,072   12,931
               Selected      50      108     133     196      193      193
               topoff        1,332   1,260   1,213   1,142    1,174    1,173
               Total         1,382   1,368   1,346   1,338    1,367    1,366
  mem_ctrl     Original      1,595   4,187   6,863   10,740   13,142   8,769
               Selected      27      96      170     269      309      156
               topoff        1,404   1,309   1,212   1,147    1,109    1,252
               Total         1,431   1,405   1,382   1,416    1,418    1,408
  systemcaes   Original      591     1,318   2,005   3,028    3,686    6,166
               Selected      304     612     885     1,206    1,430    1,918
               topoff        234     78      26      8        4        25
               Total         538     690     911     1,214    1,434    1,943
  usb_funct    Original      1,569   4,006   6,397   9,958    12,345   11,530
               Selected      293     489     593     697      781      788
               topoff        1,180   1,016   929     823      767      1,006
               Total         1,473   1,505   1,522   1,520    1,548    1,794
3.3.3 Pattern Set Comparison

After setting a long path threshold LPthr, each sensitized path, and in turn each pattern in the pattern set, can be evaluated. A pattern weight threshold is needed to terminate the pattern selection procedure and ensure that only the most effective patterns are selected. These experiments set the pattern weight threshold Wthr = 1. Since a weight of 1.0 is assigned to each long path and a weight of 0.0 to each short path, this threshold ensures that only patterns contributing unique long path sensitizations are selected. Changing this threshold impacts the number of selected patterns. For example, if the pattern weight threshold is 0, all patterns are selected; in this case, no top-off ATPG is necessary, and the final pattern set is the original repository. Conversely, if Wthr is large, only a few patterns in the original repository can be selected, and more top-off patterns are needed. In the extreme case where Wthr is larger than the largest pattern weight in the original repository, no pattern can be selected, and the final pattern set consists entirely of top-off patterns.

These experiments evaluate and select patterns from different pattern sets (i.e., different n-detect and timing-aware pattern sets). Table 3.2 presents the number of patterns in these pattern sets (shown as "Original"), as well as the number of selected patterns (shown as "Selected") and top-off patterns (shown as "topoff") based on them. It is clear that as n increases, the number of original patterns increases for each benchmark. Furthermore, the pattern count of timing-aware ATPG is larger than that of 1-detect and 3-detect timing-unaware ATPG, and sometimes even larger than the 10-detect pattern set, as is the case for tv80 and systemcaes. For the tv80 and mem_ctrl benchmarks, the final pattern set (shown as "Total"), i.e., the selected patterns plus top-off patterns, is largely independent of n and similar in pattern count; this final pattern count is, in fact, even smaller than that of the 1-detect pattern set, except for systemcaes. For the systemcaes and usb_funct benchmarks, the final pattern set varies more across pattern repositories, especially for results obtained from timing-aware pattern sets. Other notable observations are:
1. The "Total" number of patterns (i.e., the selected patterns plus top-off patterns) when n = 1 is smaller than the original 1-detect pattern set. Therefore, this procedure can be used to reduce the pattern count of a 1-detect pattern set without any fault coverage penalty.
2. The "Total" number of patterns is considerably smaller than the n-detect (shown as "Original") or timing-aware (shown as "ta") ATPG pattern sets. Therefore, the procedure can reduce the pattern counts of these pattern sets significantly while maintaining or even increasing their SDD-detection effectiveness.

The number of sensitized unique long paths and detected unique SDDs of these pattern sets are shown in Tables 3.3 and 3.4, respectively, for LPthr = 0.7T. To make a fair comparison, the slack was set to 0.3T when running timing-aware ATPG. From these two tables, it can be seen that for each benchmark, as n increases, the number of sensitized unique long paths and detected unique SDDs increase, except for tv80, where the 8-detect pattern set performs slightly better than the 10-detect pattern set; both of these pattern sets, however, sensitize a large number of long paths and detect many SDDs. Furthermore, even though the final pattern count does not change significantly across pattern repositories (see Table 3.2), the numbers of sensitized long paths and detected SDDs increase significantly from the 1-detect to the 10-detect pattern set.

Table 3.3 Number of sensitized long paths for different pattern sets

  Benchmark             n=1     n=3     n=5     n=8     n=10    ta
  tv80        Original  218     428     431     724     633     695
              Selected  218     428     431     724     633     695
              topoff    96      69      114     37      74      77
              Total     314     497     545     761     707     772
  mem_ctrl    Original  208     601     1,117   1,825   2,053   1,082
              Selected  208     601     1,117   1,825   2,053   1,082
              topoff    285     137     210     91      125     241
              Total     493     738     1,327   1,916   2,178   1,323
  systemcaes  Original  945     2,049   2,793   3,775   4,520   6,135
              Selected  945     2,049   2,793   3,775   4,520   6,135
              topoff    379     72      24      11      1       26
              Total     1,324   2,121   2,817   3,786   4,521   6,161
  usb_funct   Original  1,903   2,764   3,264   3,738   4,106   4,338
              Selected  1,903   2,764   3,264   3,738   4,106   4,338
              topoff    593     206     154     73      79      98
              Total     2,496   2,970   3,418   3,811   4,185   4,436
Table 3.4 Number of detected unique SDDs for different pattern sets

  Benchmark             n=1     n=3     n=5     n=8     n=10    ta
  tv80        Original  1,190   2,012   2,005   2,591   2,558   3,052
              Selected  1,190   2,012   2,005   2,591   2,558   3,052
              topoff    362     170     164     17      52      166
              Total     1,552   2,182   2,169   2,608   2,610   3,218
  mem_ctrl    Original  1,148   1,918   2,102   2,358   2,856   2,692
              Selected  1,148   1,918   2,102   2,358   2,856   2,692
              topoff    561     139     263     90      32      122
              Total     1,709   2,057   2,365   2,448   2,888   2,814
  systemcaes  Original  4,073   4,782   5,687   5,909   6,321   6,672
              Selected  4,073   4,782   5,687   5,909   6,321   6,672
              topoff    537     24      5       0       0       0
              Total     4,610   4,806   5,692   5,909   6,321   6,672
  usb_funct   Original  5,362   6,317   6,515   6,934   7,105   7,570
              Selected  5,362   6,317   6,515   6,934   7,105   7,570
              topoff    590     185     86      10      6       6
              Total     5,952   6,502   6,601   6,944   7,111   7,576
This is because the n-detect pattern set is more effective in sensitizing long paths and detecting SDDs, and the effectiveness of the final pattern set depends on the effectiveness of the original pattern repository. The penalty for running the pattern selection procedure on an n-detect pattern set is a comparably larger CPU runtime and memory usage for ATPG and pattern evaluation. Since the selected pattern set is a subset of the original pattern repository, the number of long paths it sensitizes is upper-bounded by that of the original repository; it cannot sensitize more long paths than the original repository, but due to the Wthr definition, it sensitizes all of the repository's long paths. It is also interesting to note that:
1. The hybrid method is able to increase the number of sensitized long paths for the 1-detect pattern set (e.g., from 218 at "Original" to 314 at "Total" for tv80 in Table 3.3) with a smaller pattern count (1,382 vs. 1,435).
2. It also improves the results obtained from the timing-aware pattern set (e.g., from 695 at "Original" to 772 at "Total" for tv80 in Table 3.3) with a much smaller pattern count (1,366 vs. 12,931).
This is because the top-off pattern set incidentally detects some additional long paths. Therefore, the final pattern set can sensitize more long paths than the original n-detect pattern set, and in some cases more than the timing-aware pattern sets.
3.3.4 CPU Runtime Analysis

Table 3.5 presents the CPU runtime for implementing the hybrid method (shown as "CPU(HB)") on different pattern sets for the usb_funct benchmark. Row 2 presents the number of patterns generated using n-detect and timing-aware ATPG, and Row 3 presents their respective CPU runtimes. From the table, it can be seen that the CPU runtime of n-detect timing-unaware ATPG is small and remains negligible even as n increases, whereas timing-aware ATPG needs much more CPU runtime than timing-unaware ATPG (Row 3). Rows 4 and 5 present the CPU runtime of the hybrid method with and without file input/output (I/O) operations (shown as "CPU(HB+I/O)" and "CPU(HB)"), respectively. When n is small (n = 1, 3, 5, and 8), the total CPU runtime of this procedure (Row 4) is lower than that of timing-aware ATPG for this benchmark. For a larger n (n = 10), the procedure consumes more CPU runtime than timing-aware ATPG; however, running the procedure on a larger n results in a larger number of sensitized long paths with only a small increase in the final pattern count. Comparing Row 4 and Row 5 shows that most of the CPU time is consumed by file I/O operations, because the procedure has to deal with the fault list of each pattern. Therefore, if the presented method were integrated into an ATPG tool, it would be much faster than timing-aware ATPG even when based on the 10-detect pattern set, since no file I/O would be needed. Furthermore, it is not entirely fair to compare the CPU runtime of this hybrid method against the commercial timing-aware ATPG tool, since it is a comparison between experimental, non-optimized code and a highly optimized commercial tool. The method's CPU runtime is expected to be further reducible through better programming and optimized data structures and algorithms.

Table 3.5 CPU runtime of different pattern sets for the usb_funct benchmark

  Pattern set    1-detect   3-detect    5-detect    8-detect    10-detect   ta
  # pat.         1,569      4,006       6,397       9,958       12,345      11,530
  CPU(ATPG)      15 s       37 s        57 s        1 m 28 s    1 m 49 s    21 m 08 s
  CPU(HB+I/O)    4 m 19 s   11 m 25 s   18 m 30 s   29 m 40 s   38 m 30 s   35 m 02 s
  CPU(HB)        56 s       2 m 24 s    3 m 51 s    6 m 05 s    7 m 49 s    7 m 04 s
3.3.5 Long Path Threshold Analysis

As discussed earlier, the long path threshold LPthr is an important parameter of this procedure. If the long path threshold changes, the number of sensitized long paths changes as well, which impacts the number of selected patterns and the number of top-off patterns. Table 3.6 presents the experimental results for different long path thresholds (LPthr = 0.5T, 0.6T, 0.7T, 0.8T, and 0.9T) on the 10-detect pattern set of the tv80 benchmark. It is clear that as LPthr increases from 0.5T to 0.9T:
1. The number of sensitized unique long paths decreases significantly (from 12,368 to 13) even though the original pattern set is the same (shown in Row 2). This is because, with a larger LPthr, the total number of long paths in the design decreases.
Table 3.6 Number of unique sensitized long paths and number of patterns for tv80 (10-detect) with different long path thresholds LPthr

              LPthr = 0.5T       LPthr = 0.6T      LPthr = 0.7T      LPthr = 0.8T      LPthr = 0.9T
              # LPs    # pat.    # LPs   # pat.    # LPs   # pat.    # LPs   # pat.    # LPs   # pat.
  Original    12,368   11,072    4,086   11,072    633     11,072    28      11,072    13      11,072
  Selected    12,368   1,761     4,086   845       633     193       28      8         13      2
  topoff      62       306       65      654       74      1,174     8       1,374     0       1,370
  Total       12,430   2,067     4,151   1,499     707     1,367     36      1,382     13      1,372
2. The number of selected patterns also decreases significantly, since fewer patterns can sensitize the extremely long paths specified by the long path threshold. Due to the Wthr definition, the selected patterns sensitize all the long paths specified by the long path threshold (shown in Row 3).
3. As the number of selected patterns decreases, the number of top-off patterns increases to meet the fault coverage requirement. As a result, with a large LPthr, most patterns in the final pattern set (shown as "Total" in Row 5) are top-off patterns, generated using timing-unaware 1-detect ATPG. However, the number of long paths sensitized by the top-off pattern set does not necessarily increase, as can be seen in Row 4.
4. The final pattern set size does not change much, except when LPthr = 0.5T, where many patterns from the original 10-detect pattern set are selected because of the small LPthr.
3.4 Critical Fault-Based Hybrid Method

In the hybrid method presented in the previous sections, n-detect ATPG is run on the entire fault list to generate the original pattern repository. In fact, a large portion of the faults in a design may never be timing-critical, yet when run on the entire fault list, n-detect ATPG still generates a large number of patterns to detect each of these faults n times, which does not contribute to SDD coverage. Running n-detect ATPG on these non-timing-critical faults is thus an unproductive use of CPU runtime and memory resources. This section presents a critical fault (CF)-based procedure to efficiently generate a high-quality original SDD pattern repository.
3.4.1 Identification of Critical Faults

Before running n-detect ATPG to generate the original pattern repository, the timing-critical faults are selected by calculating the minimum slack of each fault using a static timing analysis (STA) tool, with post-layout SDF information used to calculate the slack of each fault. Note that the fault slack reported by the STA tool is the minimum slack of a fault, obtained from the longest path running through it. In practice, the actual fault slack after pattern generation may not equal this minimum value, since (1) the longest path running through the fault may not be testable, or (2) the ATPG tool may not generate patterns that detect the target fault via the longest path.

A slack threshold (SLthr) is needed for timing-critical fault selection: all TDFs with minimum slack equal to or smaller than the pre-defined slack threshold are selected as timing-critical faults. This section considers a fault timing-critical if its minimum slack is equal to or smaller than SLthr = 0.3T, where T is the clock period. This means that all faults on paths equal to or longer than 0.7T are selected for n-detect pattern generation. As with the long path threshold (LPthr) defined in Sect. 3.2.1, any other slack threshold can be used for timing-critical fault selection as well.

Note that STA tools may not directly provide the exact fault slack needed in this application: in STA, a fault may have a very small slack due to timing constraints rather than because it lies on a very long path. Such a slack value is acceptable during design but not precise enough for identifying critical faults. Therefore, this procedure runs an extra STA iteration to correlate the slack of a fault precisely with the longest path length running through it. The first STA run calculates, for each fault, the slack inaccuracy introduced by timing constraints; the timing constraints file for the second STA run is then modified to compensate for this inaccuracy and make the slack precise. Such a timing constraints file is used only for critical fault identification and is not suitable for circuit design.

Table 3.7 presents the efficiency of the critical faults selected with different thresholds on one of the largest IWLS benchmarks, ethernet [1]. TF represents the total number of faults in the design, while CF represents the number of critical faults for different values of SLthr (0.35T, 0.3T, and 0.25T). This experiment used 10-detect ATPG to generate TDF patterns for the TFs and CFs. As seen, CPU runtime and pattern count decrease significantly when pattern generation focuses only on critical faults.

Table 3.7 Comparison between the TF and CF methods in pattern count and CPU runtime of 10-detect ATPG. The numbers in parentheses refer to the CF percentage with respect to TF

  Fault list      TF         CF
  SLthr           -          0.35T            0.3T            0.25T
  # of faults     771,840    96,640 (12.5%)   48,116 (6.2%)   23,474 (3.0%)
  # of patterns   21,309     13,937           9,766           6,472
  CPU(ATPG)       7 m 24 s   3 m 31 s         2 m 22 s        1 m 01 s

This procedure leaves the slack threshold open to the user: a slightly looser SLthr, such as 0.35T, can be set to tolerate the delay induced by process variations, while a slightly tighter SLthr, such as 0.25T, can be set for a smaller pattern count and CPU runtime. Note that if SLthr is changed for an application, the other thresholds used in the subsequent procedure, such as the long path threshold and the pattern selection threshold, should be changed correspondingly.
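The slack-based selection itself is straightforward once per-fault minimum slacks are available. The following is a minimal sketch (illustrative only; the fault names and the fault_slack input are assumptions, standing in for STA-reported data):

```python
# Minimal sketch of critical-fault (CF) selection by slack (illustrative).
# `fault_slack` maps each TDF site to its minimum slack, as reported by an
# STA run on the post-layout SDF (with constraint-induced slack inaccuracy
# already compensated, as described above).

def select_critical_faults(fault_slack, T, sl_thr=0.3):
    """Keep faults whose minimum slack is <= SLthr * T, i.e., faults lying
    on at least one path of length >= (1 - SLthr) * T."""
    return {f for f, slack in fault_slack.items() if slack <= sl_thr * T}

# Example: clock period 10 ns, SLthr = 0.3T -> faults on paths >= 7 ns
slacks = {"U1/A": 1.2, "U7/B": 4.8, "U9/Z": 2.9}
print(sorted(select_critical_faults(slacks, T=10.0)))  # ['U1/A', 'U9/Z']
```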
3.4.2 Critical Fault-Based Pattern Selection

By targeting critical faults, this procedure ensures that the n-detect pattern generation effort is invested productively. However, n-detect ATPG cannot guarantee that each fault is detected via a long path, because its main engine is a traditional ATPG that usually detects a fault via the easiest path, which tends to be a short one. Thus, the n-detect pattern set based on critical faults still contains redundant patterns that detect faults via short paths. To remove these inefficient patterns while maintaining the SDD detection efficiency of the original pattern set, the pattern grading and selection procedure presented in Sect. 3.2.1 is applied to the n-detect patterns. As in the previous sections, after pattern selection, top-off ATPG is run on the undetected faults to ensure that all testable TDFs are detected by the final pattern set and the fault coverage requirement is met. Figure 3.5 shows the entire flow of the presented critical fault-based pattern generation. It is very similar to the hybrid method flow shown in Fig. 3.3, except that the critical faults are first identified before running n-detect ATPG to generate the original pattern repository.

Fig. 3.5 Critical fault-based pattern generation flow
3.5 Experimental Results on Critical Fault-Based Hybrid Method

3.5.1 Experimental Benchmarks

This section applies the presented procedure to 9 IWLS benchmarks [1]. The experiments are performed on a Linux x86 server with 8 processors and 24 GB of available memory. Commercial EDA tools are used for logic synthesis, physical design, timing analysis, and pattern generation [2]. The 180 nm Cadence Generic Standard Cell Library is used for physical design. Table 3.8 shows the characteristics of these benchmarks. The numbers of logic gates, flip-flops, and total standard cells, shown in Columns 2, 3, and 4, respectively, are obtained from the post-layout netlists. Columns 5 and 6 show the numbers of TFs and CFs for each benchmark. All CFs are selected based on a 0.3T slack threshold, where T is the clock period of the circuit.
Table 3.8 Characteristics of the experimental benchmarks used for validating the CF-based hybrid method

  Benchmark    # gates   # FFs    # Total cells   # TFs     # CFs    CFs/TFs (%)
  tv80         8,353     798      9,151           52,916    15,050   28.4
  wb_dma       8,676     1,423    10,099          56,286    6,119    10.9
  systemcaes   12,570    1,609    14,179          81,018    26,241   32.4
  mem_ctrl     13,408    1,730    15,138          84,986    10,713   12.6
  aes_core     21,515    2,322    23,837          146,402   44,139   30.1
  dma          21,540    2,322    23,862          146,582   27,615   18.8
  ac97_ctrl    24,803    3,059    27,862          150,074   7,519    5.0
  wb_conmax    44,663    4,563    49,226          309,962   53,539   17.3
  ethernet     132,369   11,642   144,011         771,840   48,116   6.2

3.5.2 Effectiveness in Critical Path Sensitization

This subsection performs a series of experiments to demonstrate the necessity of CF selection for pattern generation. All patterns used in these experiments are generated with the launch-off-capture (LOC) method. The fault selection threshold SLthr is set to 0.3T, i.e., LPthr = 0.7T. After CF selection, n-detect ATPG (n = 5, 10, 20) is performed on both the CF fault list (CF-n) and the TF fault list (TF-n) of each design, to compare the results in terms of pattern count and long path sensitization. Figure 3.6 presents the results from two pattern sets generated (1) when 5-detect ATPG is run on the total faults (TF-5) and (2) when 5-detect ATPG is run on the critical faults (CF-5).
Fig. 3.6 Comparison between CF-5 and TF-5 pattern sets
It can be seen that the CF-5 pattern set is much smaller than the TF-5 pattern set (the ratio is shown as PCF/PTF in Fig. 3.6). This is because only a small portion of the faults are selected as CFs under the pre-defined SLthr (the ratio is shown as CF/TF). In terms of long path sensitization (the ratio is shown as LPCF/LPTF in Fig. 3.6), however, the smaller CF-5 pattern set sensitizes a large portion of the long paths sensitized by the larger TF-5 pattern set; for wb_dma, the CF-5 pattern set even sensitizes more long paths than the TF-5 pattern set. Except for systemcaes, the long path sensitization ratio between the CF-5 and TF-5 pattern sets is much larger than the pattern count ratio. On average over all benchmarks, a PCF/PTF of 35.9% yields an LPCF/LPTF of 75.5% when n = 5, indicating that the per-pattern long path sensitization effectiveness of PCF reaches 2.1X that of PTF. Note that the long paths sensitized by the CF-5 pattern set are not necessarily a subset of those sensitized by the TF-5 pattern set; CF-5 can sensitize some long paths that TF-5 does not.

With an increase in n, CF-n performs better than TF-n in terms of both the pattern count ratio and the long path sensitization ratio. Figures 3.7 and 3.8 present the results from CF-n and TF-n when n = 10 and 20. For most benchmarks, PCF/PTF decreases while LPCF/LPTF increases as n goes up. For systemcaes, LPCF/LPTF improves steeply from 55.4 to 94.2%. On average, when n = 10 and 20, PCF/PTF values of 33.2% and 37.8% yield LPCF/LPTF values of 73.1% and 94.0%, respectively. In summary, the long path sensitization efficiency of the CF-based method increases significantly with n.

Fig. 3.7 Comparison between CF-10 and TF-10 pattern sets

Fig. 3.8 Comparison between CF-20 and TF-20 pattern sets

Note that the CF-n method is much faster and requires far fewer hardware resources than the TF-n method, leaving room to increase the value of n for a larger number of sensitized long paths. Figure 3.9 compares the pattern count and number of sensitized long paths obtained from TF-10 with those obtained from CF-n for n = 10, 20, 30, and 40 on the circuit ac97_ctrl. It can be seen that even with n = 40, the pattern count of the CF-40 method is still smaller than that of TF-10 (1,218 vs. 1,770), while the number of sensitized long paths increases by approximately 19% (from 272 to 323).

Fig. 3.9 Comparison between different pattern sets on ac97_ctrl
3.5.3 CPU Runtime Comparison of TF- and CF-Based Methods

After performing n-detect ATPG on the selected CFs, the hybrid pattern selection procedure is applied to the generated pattern repository to evaluate the patterns and select the most effective ones. Top-off 1-detect ATPG is then run to meet the fault coverage requirement, which is the same as that of timing-aware ATPG. Due to CF identification, each step of this procedure saves significant CPU runtime.
Table 3.9 CPU runtime comparison between TF-based and CF-based pattern generation and selection. The numbers in parentheses refer to the CPU runtime saving of the CF-based method over the TF-based method

  Benchmark       mem_ctrl                 wb_conmax                    systemcaes
  Fault list      TF         CF            TF          CF               TF         CF
  # of pat.       11,025     3,606         28,354      4,049            13,039     3,507
  CPU(ATPG)       41 s       13 s (68%)    4 m 18 s    1 m 07 s (74%)   43 s       17 s (60%)
  CPU(EVA+SEL)    1 m 09 s   8 s           12 m 05 s   2 m 20 s         1 m 27 s   12 s
  CPU(TOTAL)      1 m 50 s   21 s (81%)    16 m 23 s   3 m 27 s (79%)   2 m 10 s   29 s (78%)
Table 3.9 presents the CPU runtime for three benchmarks to demonstrate the efficiency of CF-based pattern generation. In this experiment, TF-40 and CF-40 ATPG and pattern selection are performed on the benchmarks. It can be seen that the CF-based method reduces the CPU runtime of n-detect ATPG (shown as "CPU(ATPG)") by about 60-75% and produces a much smaller pattern set, saving substantial CPU resources compared with the TF-based method. Note that this CPU runtime includes the cost of both n-detect ATPG and top-off ATPG. Consequently, the CPU runtime for pattern evaluation and selection (shown as "CPU(EVA+SEL)") is also reduced, by about 60-80%. As a result, the total CPU runtime (shown as "CPU(TOTAL)"), i.e., the pattern generation runtime plus the pattern evaluation and selection runtime, is reduced by approximately 80% with the CF-based method.
3.5.4 CF-Based Pattern Generation vs. Timing-Aware ATPG

This subsection compares the final pattern set with two different timing-aware pattern sets generated by (1) timing-aware ATPG on the TF list (shown as TAT) and (2) timing-aware ATPG on the CF list (shown as TAC). In the pattern selection procedure, the pattern selection threshold is PSthr = 1, i.e., only patterns that sensitize at least one long path can be selected; changing this threshold impacts the number of selected patterns. Table 3.10 shows the pattern counts for five benchmarks. It can be seen that the CF-10 pattern sets (shown as "ori.") are comparable in size to TAC, both of which are much smaller than TAT. As n increases from 10 to 40, the CF-n pattern sets become even larger than TAT. However, after pattern evaluation and selection, the selected pattern sets (shown as "sel.") are much smaller than TAC. Furthermore, the final pattern sets (shown as "Final", equal to the corresponding "sel." plus "topoff") are still much smaller than TAC, except for wb_conmax. For wb_conmax, when selecting patterns from the 40-detect pattern set, the 1,713 selected patterns exceed the 1,703 patterns of TAT. This is because the original pattern repository contains more high-quality patterns; in terms of long path sensitization and SDD detection, its efficiency is the best among all the pattern sets, as will be seen in Tables 3.11 and 3.12 and discussed in Sect. 3.5.6.
Table 3.10 Number of patterns for different pattern sets

  Benchmark            CF-10   CF-20    CF-30    CF-40    TAT      TAC
  ac97_ctrl   ori.     272     584      891      1,218    593      266
              sel.     58      71       67       66
              topoff   170     181      182      177
              Final    228     253      249      243
  mem_ctrl    ori.     803     1,526    2,440    3,606    1,126    813
              sel.     79      117      187      262
              topoff   412     414      378      339
              Final    491     531      565      611
  wb_conmax   ori.     995     2,062    3,026    4,049    1,703    963
              sel.     718     1,226    1,510    1,713
              topoff   302     255      246      237
              Final    1,020   1,481    1,756    1,950
  systemcaes  ori.     838     1,966    2,625    3,507    1,560    1,002
              sel.     308     577      672      809
              topoff   272     265      263      247
              Final    580     842      935      1,056
  ethernet    ori.     9,766   20,751   32,740   40,079   12,065   8,220
              sel.     1,432   1,472    1,479    1,703
              topoff   2,360   2,345    2,340    2,295
              Final    3,792   3,817    3,819    3,998
Table 3.11 Number of sensitized long paths for different pattern sets

  Benchmark           CF-10    CF-20    CF-30    CF-40    TAT      TAC
  ac97_ctrl   sel.    250      290      321      323      262      245
              Final   258      295      322      324
  mem_ctrl    sel.    821      1,232    1,777    2,953    2,228    2,013
              Final   969      1,410    1,881    3,069
  wb_conmax   sel.    7,938    14,800   17,938   19,719   12,297   10,296
              Final   9,103    15,349   18,304   19,978
  systemcaes  sel.    964      1,761    2,707    3,584    3,046    2,884
              Final   1,099    1,882    2,804    3,683
  ethernet    sel.    9,753    10,714   11,035   12,337   10,347   10,079
              Final   12,821   12,765   13,020   14,316
When it comes to the number of sensitized long paths and detected SDDs, shown in Tables 3.11 and 3.12, respectively, the effectiveness of the pattern sets is obvious. As n increases, the final pattern sets sensitize more long paths and detect more SDDs than TAT and TAC for all benchmarks. The long path sensitization and SDD detection efficiency of the final pattern sets depends mainly on the original CF-n pattern sets. SDD detection does not increase as fast as long path sensitization, because a large number of SDDs are shared among long paths. For wb_conmax, the number of sensitized long paths in the final pattern set is nearly double that of TAC; correspondingly, its SDD detection ability is 1.31X that of TAC.
Table 3.12 Number of detected SDDs for different pattern sets

  Benchmark           CF-10    CF-20    CF-30    CF-40    TAT      TAC
  ac97_ctrl   sel.    2,882    3,003    3,213    3,191    2,922    2,857
              Final   2,934    3,055    3,222    3,199
  mem_ctrl    sel.    4,249    5,384    5,788    6,510    7,870    7,665
              Final   4,653    5,553    5,942    6,590
  wb_conmax   sel.    18,470   22,081   23,440   23,419   19,324   17,999
              Final   19,293   22,273   23,568   23,526
  systemcaes  sel.    5,337    6,294    7,476    7,983    7,465    6,925
              Final   5,640    6,552    7,687    8,141
  ethernet    sel.    35,393   36,916   37,224   39,075   37,421   37,331
              Final   38,919   40,355   40,556   41,781
Tables 3.11 and 3.12 also show that most of the sensitized long paths and detected SDDs of the final pattern sets are contributed by the selected pattern sets (shown as "sel."), which are subsets of the original CF-n pattern sets. Take mem_ctrl as an example: when n = 40, 96.2% (2,953 out of 3,069) of the sensitized long paths and 98.8% (6,510 out of 6,590) of the detected SDDs come from the patterns selected from the original pattern set. This is because (1) after n-detect ATPG, few unsensitized long paths are left, and (2) top-off ATPG, which is timing-unaware 1-detect TDF ATPG, tends to detect faults via short paths rather than long paths.
3.5.5 Multiple Detection Analysis on Critical Faults

In the presence of uncertainties such as process variations and on-chip noise, the length of a path can vary to a large extent. As a result, it is desirable to test each critical fault via various long paths. This subsection therefore compares the number of different sensitized long paths running through each detected critical fault to show the effectiveness of this approach. Table 3.13 presents a comparison between the final CF-n pattern sets (n = 10, 20, 30, and 40) and the TAT and TAC pattern sets.

Table 3.13 Number of different long paths sensitized through the same CF

  Benchmark               CF-10   CF-20   CF-30   CF-40   TAT     TAC
  ac97_ctrl    NLP_DCF    2.23    2.50    2.59    2.63    2.27    2.16
               NLP_TCF    0.76    0.89    0.98    0.98    0.78    0.72
  mem_ctrl     NLP_DCF    5.25    9.53    11.94   15.76   8.72    7.43
               NLP_TCF    1.24    3.23    4.94    6.97    3.07    1.13
  wb_conmax    NLP_DCF    18.25   28.11   32.04   35.23   20.55   17.50
               NLP_TCF    6.06    10.73   12.93   14.11   8.38    6.95
  systemcaes   NLP_DCF    8.00    9.14    17.07   20.10   6.14    5.14
               NLP_TCF    1.15    1.41    3.77    4.62    0.75    0.52
  ethernet     NLP_DCF    16.85   17.80   18.32   21.01   18.43   17.95
               NLP_TCF    12.41   13.40   13.81   16.00   14.14   13.75
NLP_TCF represents the average number of different long paths running through each critical fault, while NLP_DCF represents that number for each detected critical fault. It can be seen that NLP_TCF and NLP_DCF of the CF-based pattern sets increase as n goes up from 10 to 40; from CF-20 onward, the CF-based sets are superior to TAT and TAC in detecting critical faults through different long paths. The CF-based method concentrates more of the n-detect effort on critical faults. Take wb_conmax for example: the TAT and TAC pattern sets detect each critical fault 20.55 and 17.50 times, respectively, through different long paths on average, whereas the CF-40 pattern set detects each critical fault 35.23 times on average, which is much higher and thus provides reliable detection of critical faults via different long paths.
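These metrics can be computed directly from per-fault detection data; a minimal sketch follows (illustrative only, under the interpretation of the definitions above; the detections mapping is an assumed intermediate produced by path tracing, not a tool output):

```python
# Illustrative computation of the NLP_DCF / NLP_TCF metrics defined above.
# `detections` maps each critical fault to the set of distinct long paths
# through which the pattern set detects it (empty set if undetected).

def nlp_metrics(detections):
    total = sum(len(paths) for paths in detections.values())
    detected = [f for f, paths in detections.items() if paths]
    nlp_tcf = total / len(detections)                      # per critical fault
    nlp_dcf = total / len(detected) if detected else 0.0   # per detected CF
    return nlp_dcf, nlp_tcf

dets = {"f1": {"pA", "pB"}, "f2": {"pC"}, "f3": set()}
print(nlp_metrics(dets))   # (1.5, 1.0)
```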
3.5.6 Trade-Off Analysis

The presented pattern generation flow leaves variables such as SLthr, LPthr, and PSthr open to the user, so the generated pattern set can be configured to meet different requirements. SLthr and LPthr can be adjusted to apply a different amount of n-detect effort to a fault, and changing PSthr can significantly impact the pattern count and SDD detection efficiency. Different decisions can thus be made to meet different goals, such as minimizing the pattern count or maximizing long path sensitization.

Figure 3.10 presents the long path sensitization results of the original CF-n pattern sets (n = 10, 20, 30, 40) for wb_conmax, with the long path sensitization result of TAT shown as a reference.

Fig. 3.10 Tradeoff between pattern count and LP sensitization. The number pair (x, y) annotated in the figure shows the pattern count and number of sensitized long paths at each point

It can be seen that after pattern selection, the 10-detect pattern set sensitizes 7,938 long paths, which is fewer than TAT. As n increases, the number of sensitized long paths increases considerably. Take CF-20 for instance: to maximize long path sensitization, the pattern set at (1,226, 14,800) should be selected
since it uses 477 fewer patterns to sensitize 2,603 more long paths compared with TAT at (1,703, 12,297). To minimize pattern count and CPU resources, one can instead select the pattern set at (339, 12,297), which is only 19.9% of TAT's 1,703 patterns. Similarly, if selecting patterns based on the CF-30 or CF-40 pattern set, 223 (13.1% of TAT) or 190 (11.2% of TAT) patterns are needed. In fact, for n = 20, any point between (339, 12,297) and (1,226, 14,800) could be selected to balance the different factors during pattern generation. Note that a smaller selected pattern set requires a larger number of top-off patterns to meet the fault coverage requirement; however, the top-off pattern set does not grow significantly, since 1-detect TDF ATPG is capable of detecting faults with minimal runtime and pattern count.

Table 3.14 presents the experimental results on wb_conmax. In this experiment, pattern selection is based on the CF-n (n = 20, 30, 40) pattern sets, and the selection terminates when the selected patterns sensitize the same long paths as the TAT pattern set. After that, top-off ATPG is performed on the undetected faults. In this case, the procedure needs a much smaller final pattern count (682 for CF-20, 592 for CF-30, and 596 for CF-40) to obtain the same or a slightly better long path sensitization performance than TAT, owing to the fact that top-off patterns can fortuitously detect some additional unique long paths as well. Comparing Table 3.14 with Tables 3.10 and 3.11 for wb_conmax, it can be seen that with CF-40 pattern generation, approximately two-thirds of the patterns can be eliminated at the penalty of roughly 6,000 fewer sensitized long paths.

Table 3.14 Pattern generation results for minimizing pattern count on circuit wb_conmax

  n-detect   sel.   topoff   Final   # Long paths   # TAT patterns   # TAT long paths
  CF-20      339    343      682     13,183         1,703            12,297
  CF-30      223    369      592     13,489
  CF-40      190    406      596     13,798
3.6 Summary

This chapter has presented an efficient pattern grading and selection procedure for screening SDDs using SDF timing information. n-detect pattern sets were used as the original pattern repository for pattern selection. The procedure takes advantage of n-detect ATPG for SDD detection and reduces the pattern count significantly by selecting the most effective patterns in the pattern repository. To further save ATPG runtime and hardware resources, a critical fault-based hybrid method was also presented. The methods were implemented on several IWLS benchmarks, and the experimental results demonstrate their efficiency.
References

1. IWLS 2005 Benchmarks, http://iwls.org/iwls2005/benchmarks.html
2. Synopsys Inc., "SOLD Y-2007, Vol. 1-3," Synopsys Inc., 2007
3. J. Savir and S. Patil, "On Broad-Side Delay Test," in Proc. VLSI Test Symp. (VTS'94), pp. 284-290, 1994
4. R. Mattiuzzo, D. Appello, C. Allsup, "Small Delay Defect Testing," http://www.tmworld.com/article/CA6660051.html, Test & Measurement World, 2009
5. M. E. Amyeen, S. Venkataraman, A. Ojha, S. Lee, "Evaluation of the Quality of N-Detect Scan ATPG Patterns on a Processor," in Proc. IEEE International Test Conference (ITC'04), pp. 669-678, 2004
6. Y. Huang, "On N-Detect Pattern Set Optimization," in Proc. 7th IEEE International Symposium on Quality Electronic Design (ISQED'06), 2006
7. Mentor Graphics, "Understanding how to run timing-aware ATPG," Application Note, 2006
8. P. Gupta and M. S. Hsiao, "ALAPTF: A New Transition Fault Model and the ATPG Algorithm," in Proc. Int. Test Conf. (ITC'04), pp. 1053-1060, 2004
9. S. Goel, N. Devta-Prasanna, and R. Turakhia, "Effective and Efficient Test Pattern Generation for Small Delay Defects," in Proc. IEEE VLSI Test Symposium (VTS'09), 2009
10. W. Qiu, J. Wang, D. Walker, D. Reddy, L. Xiang, L. Zhou, W. Shi, and H. Balachandran, "K Longest Paths Per Gate (KLPG) Test Generation for Scan-Based Sequential Circuits," in Proc. IEEE ITC, pp. 223-231, 2004
11. A. K. Majhi, V. D. Agrawal, J. Jacob, L. M. Patnaik, "Line Coverage of Path Delay Faults," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5, pp. 610-614, 2000
12. H. Lee, S. Natarajan, S. Patil, I. Pomeranz, "Selecting High-Quality Delay Tests for Manufacturing Test and Debug," in Proc. IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT'06), 2006
13. N. Ahmed, M. Tehranipoor, and V. Jayaram, "Timing-Based Delay Test for Screening Small Delay Defects," IEEE Design Automation Conf., pp. 320-325, 2006
14. M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, "Test-Pattern Grading and Pattern Selection for Small-Delay Defects," in Proc. IEEE VLSI Test Symposium (VTS'08), 2008
15. M. Yilmaz, K. Chakrabarty, and M. Tehranipoor, "Interconnect-Aware and Layout-Oriented Test-Pattern Selection for Small-Delay Defects," in Proc. Int. Test Conference (ITC'08), 2008
Chapter 4
Process Variations- and Crosstalk-Aware Pattern Selection
4.1 Introduction

The complexity of today's ICs and shrinking process technologies have made design features more probabilistic. Thus, it is necessary to perform statistical timing analysis when evaluating path lengths under process variations. Statistical static timing analysis (SSTA) methods have been proposed and SSTA tools developed to deal with these issues [1, 14]. However, these methods are pattern-independent, i.e., they estimate path length using the delays of the components on the path without considering pattern-dependent parameters. Note that power supply noise and crosstalk are pattern-dependent effects, and they can significantly impact the delay of the components on a path.

As discussed in Chap. 3, it is extremely difficult to develop an ATPG that takes into account all important design features, such as process variations (PV) and on-chip noise, e.g., crosstalk and power supply noise, and to ensure that all generated patterns are high-quality SDD detection patterns. Post-ATPG pattern grading and selection procedures, on the other hand, are open to incorporating the impacts of these important effects and can ensure that the selected patterns are few in number and of high quality. This chapter presents the use of pattern-dependent statistical timing analysis considering process variations and crosstalk for path length evaluation; the impact of power supply noise will be added in the following chapter. Similar to the hybrid method presented in Chap. 3, the pattern selection procedure is long-path based and ensures that only high-quality patterns for screening SDDs are selected. Although the n-detect pattern set is again used as the original pattern repository, the pattern selection procedure can be applied to any kind of pattern set. Since it is a post-ATPG procedure, the impacts of process variations and crosstalk can be calculated dynamically, making the procedure more accurate.
4.1.1 Prior Work on PV and Crosstalk

Several statistical static timing analysis (SSTA) algorithms have been proposed, which can be classified as path-based or block-based. In [6], the authors provide a simple method to perform statistical timing analysis using a path-based scheme; this procedure relies on deterministic static timing analysis to identify critical paths. A parameterized block-based SSTA method is proposed in [7], which assumes Gaussian-distributed parameters for efficient statistical analysis. In [21], a statistical quality model reflecting fabrication process quality, design delay margin, and test timing accuracy was proposed for delay testing, based on which effective test vectors were generated. A statistical fault coverage metric combining local and global delay faults was proposed in [20]; it can only be used to evaluate the coverage of an existing test set. The authors in [8] proposed a statistical delay fault coverage model based on the propagation delay of a path and the delay defect size. They assumed an independent delay distribution for each gate and derived a Gaussian distribution for the path delay according to the central limit theorem (CLT). However, the analysis of test effectiveness in this work requires information on the delay distribution of the path under test, the delay defect size, and the delay distribution of the longest path passing through the fault site. Obtaining the delay defect size requires considerable analysis using circuit testability or silicon data, and since the path length is a random variable in statistical timing analysis, it is extremely difficult to find the longest path passing through the target fault site.

Crosstalk effects can be reduced using existing techniques such as the circuit redesign presented in [4]. However, it may not be possible to eliminate crosstalk effects entirely, especially for high-density designs fabricated with the latest technologies. The authors in [5] show that simultaneous-switching-induced crosstalk can cause up to 40% stage delay error on coupled nets, and should therefore be taken into account in test generation and validation. Furthermore, [13] shows that process variations can aggravate crosstalk and ground-bounce effects. Techniques have consequently been proposed for test generation with crosstalk consideration, e.g., [17, 19], mostly focusing on a single-aggressor scenario. The approach proposed in [2] uses a genetic algorithm to induce crosstalk into delay test patterns. Pattern generation procedures that consider crosstalk and transition arrival times were proposed in [3, 18]; however, these approaches are computationally intensive. Moreover, prior work on crosstalk test generation did not consider process variations explicitly; at the time crosstalk modeling and test generation were studied, process variations were not deemed as serious a problem as they are today.
4.1.2 Chapter Contents and Organization

This chapter focuses on an effective pattern selection procedure for screening SDDs, while dynamically and accurately considering the impacts of PV and crosstalk. This approach is compatible with existing ATPG flows and does not require new ATPG techniques. The main contents of this chapter include:

1. It uses a probability density function (PDF)-based method rather than a DDPM-based method for pattern evaluation; this solves the saturation problem associated with output deviations.
2. It considers process variations and crosstalk effects as sources of SDDs in nanometer-technology designs. The impacts of these two design features are taken into account dynamically during the pattern evaluation procedure using PDF-based analysis.
3. The PDF propagation and crosstalk calculation procedures are validated by comparison with SPICE simulation.
4. The presented procedure checks the overlap between the paths sensitized by different patterns; this helps in selecting the most effective patterns with minimum overlap in terms of sensitized paths. The n-detect pattern set is used as the original pattern repository.
5. The new pattern set increases the number of sensitized long paths and reduces the pattern count; therefore, it can detect more SDDs at a reduced test cost. Its CPU runtime is also lower than that of a commercial timing-aware ATPG tool.

The remainder of this chapter is organized as follows. Section 4.2 presents the variation sources that induce SDDs. Section 4.3 presents the pattern grading and selection procedure considering PV and crosstalk. The procedure validations and experimental results are presented in Sect. 4.4. Finally, Sect. 4.5 concludes this chapter.
4.2 Analyzing Variation-Induced SDDs

As mentioned earlier, SDDs can be introduced by both physical defects and variations. Physical defects include resistive opens and shorts. Variation-induced SDDs in a circuit result from process variations, crosstalk, power supply noise, etc. In this chapter, the procedure targets physical SDDs as well as SDDs introduced by process variations and crosstalk.
4.2.1 Impact of Process Variations on Path Delay

In reality, the parameters of fabricated transistors are not exactly the same as the design specifications, due to process variations. In fact, the parameters differ from die to die, wafer to wafer, and lot to lot. These variations include impurity concentration densities, oxide thicknesses, and diffusion depths, caused by non-uniform conditions during the deposition and/or diffusion of the impurities. They directly result in deviations in transistor parameters, such as threshold voltage, oxide thickness, and W/L ratios, as well as variation in the widths of interconnect wires [11], and they impact performance (increasing or decreasing delays) to a large extent in the latest technologies. Due to the impact of process variations, the delay of each path segment (gate or interconnect) is treated as a random variable X with mean value μ and standard deviation σ, rather than as a fixed value. Monte-Carlo simulations were run using a 180 nm Cadence Generic Standard Cell Library to obtain the delay distributions for all gates in the library. For each gate, Monte-Carlo simulations were run with:

1. Different input switching combinations,
2. Different output load capacitances,
3. Process-variation parameters (a sampling sketch follows this list):
   • Transistor gate length L: 3σ = 10%,
   • Transistor gate width W: 3σ = 10%,
   • Threshold voltage Vth: 3σ = 20%,
   • Gate-oxide thickness tox: 3σ = 3%.
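To make the Monte-Carlo setup concrete, the sketch below shows one way the per-run process parameters could be drawn before being substituted into a SPICE netlist. It is a minimal illustration, not the actual characterization script; the nominal values are placeholders, and only the 3σ percentages come from the list above.

```python
import random

# Nominal parameter values are illustrative placeholders; the 3-sigma
# percentages are the ones quoted in the text above.
PARAMS = {
    "L":   (0.18e-6, 0.10),   # transistor gate length: 3*sigma = 10%
    "W":   (0.54e-6, 0.10),   # transistor gate width:  3*sigma = 10%
    "Vth": (0.45,    0.20),   # threshold voltage:      3*sigma = 20%
    "tox": (4.0e-9,  0.03),   # gate-oxide thickness:   3*sigma = 3%
}

def sample_parameters(rng):
    """Draw one Monte-Carlo sample; each parameter is Gaussian around nominal."""
    sample = {}
    for name, (nominal, three_sigma_pct) in PARAMS.items():
        sigma = nominal * three_sigma_pct / 3.0
        sample[name] = rng.gauss(nominal, sigma)
    return sample

rng = random.Random(0)
# 250 runs per input/output combination, as used in this chapter.
runs = [sample_parameters(rng) for _ in range(250)]
print(runs[0])
```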
The slew rate is an important parameter for measuring propagation delay on standard cells: different input slew rates may result in different propagation delays on the cell [22]. To obtain a simple model for these experiments, a fixed input slew rate was applied to all the standard cells, corresponding to each cell being driven by a medium-size cell in the library. Figure 4.1 shows an example of Monte-Carlo simulation results for a NAND3X1 gate, with the above variations, input switching from 011 to 111, and an output load capacitance of 0.1 pF. It is clear that the mean value μ and standard deviation σ of the random variable corresponding to the gate delay are bounded (the gate delay values fall within a certain range, e.g., 0.1–0.3 ns for this example), and as the number of Monte-Carlo runs increases, the probability density function (PDF) of the gate delay becomes closer to a Gaussian distribution. With a large number of Monte-Carlo runs, more accurate PDFs can be obtained, at the cost of more CPU runtime. To trade off accuracy against CPU runtime, 250 Monte-Carlo simulations were run for each input-output combination of a gate.

For interconnects, SPICE simulation is used instead of Monte-Carlo simulation to obtain the delay distributions. Different variations between metal layers and vias are taken into consideration when calculating these delay distributions. First, SPICE simulations were run to obtain the delay of each metal layer with unit length and of each single via, according to their RC parameters from the library database. Then the metal lengths and the vias between metal layers of each interconnect are extracted from the layout, from which the nominal delay of each interconnect can be calculated. There are six available metal layers in the library. This chapter uses 3σ variations of 30%, 30%, 20%, 15%, 10%, and 5% for Metal 1 to Metal 6, respectively. The 3σ variation for vias in Metal layers 1–5 is 50% [10]. In this way, the nominal delay and 3σ variation of each interconnect in the design can be obtained.
Fig. 4.1 Delay histograms obtained for NAND3X1 gate after (a) 250, and (b) 2,000 Monte-Carlo simulation runs
In this chapter, the delay distributions of metal layers and vias are assumed to be independent of each other. The mean and variance of the delay of each interconnect segment are bounded; therefore, Lindeberg's condition is satisfied and the CLT holds (the detailed proof is given in Appendix A). Assume that the delays of Metal layers 1–6 with unit length L are μ_M1, μ_M2, μ_M3, μ_M4, μ_M5, and μ_M6, respectively, and the delays of single vias in Metal layers 1–5 are μ_V1, μ_V2, μ_V3, μ_V4, and μ_V5, respectively. If an interconnect contains Metal 1 (length l₁) and Metal 2 (length l₂), with a single Via 1 connecting the two layers, the mean μᵢ and 3σ deviation 3σᵢ of its delay distribution can be calculated using (4.1) and (4.2), respectively:
$$\mu_i = \frac{\mu_{M1}}{L}\, l_1 + \mu_{V1} + \frac{\mu_{M2}}{L}\, l_2 \qquad (4.1)$$

$$3\sigma_i = \sqrt{\left(0.3\,\frac{\mu_{M1}}{L}\, l_1\right)^2 + \left(0.5\,\mu_{V1}\right)^2 + \left(0.3\,\frac{\mu_{M2}}{L}\, l_2\right)^2} \qquad (4.2)$$
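As a worked illustration of (4.1) and (4.2), the sketch below computes the mean and 3σ of a Metal 1 / Via 1 / Metal 2 interconnect. The unit-length and via delays are made-up numbers standing in for the library-characterized values; the 30% and 50% 3σ factors are the ones quoted above.

```python
import math

UNIT_DELAY = {1: 2.0, 2: 1.8}      # mu_M/L: delay per um of Metal 1, Metal 2 (ps, placeholder)
VIA_DELAY = {1: 0.5}               # mu_V1: delay of a single Via 1 (ps, placeholder)
METAL_3SIGMA = {1: 0.30, 2: 0.30}  # 30% 3-sigma for Metal 1 and Metal 2
VIA_3SIGMA = {1: 0.50}             # 50% 3-sigma for vias

def interconnect_pdf(l1_um, l2_um):
    """Mean and 3-sigma of a Metal1-Via1-Metal2 interconnect, per (4.1)-(4.2)."""
    d1 = UNIT_DELAY[1] * l1_um     # Metal 1 segment delay
    dv = VIA_DELAY[1]              # via delay
    d2 = UNIT_DELAY[2] * l2_um     # Metal 2 segment delay
    mean = d1 + dv + d2                                        # (4.1)
    three_sigma = math.sqrt((METAL_3SIGMA[1] * d1) ** 2 +
                            (VIA_3SIGMA[1] * dv) ** 2 +
                            (METAL_3SIGMA[2] * d2) ** 2)       # (4.2)
    return mean, three_sigma

print(interconnect_pdf(10.0, 15.0))   # e.g., 10 um of Metal 1, 15 um of Metal 2
```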
Note that the delay of each interconnect segment (metal segment or via) can follow any distribution; as the number of segments increases, the delay distribution of the interconnect approaches a normal distribution. It is also assumed that the delay variations of path segments (gates and interconnects) are independent of each other. Similar to the interconnect distribution calculation, the path delay distribution can be calculated using (4.3) and (4.4):
$$\mu_p = \sum_{i=1}^{N} \mu_{si} \qquad (4.3)$$

$$\sigma_p = \sqrt{\sum_{i=1}^{N} \sigma_{si}^2} \qquad (4.4)$$
where μ_p and σ_p are the mean and standard deviation of the target path, respectively, and μ_si and σ_si are the mean delay and standard deviation of segment i, respectively. The calculation accuracy is evaluated and validated in Sect. 4.4.1.
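Equations (4.3) and (4.4) translate directly into a small routine that accumulates segment statistics into a path-level PDF, assuming independent segment delays as stated above. A minimal sketch with placeholder segment values:

```python
import math

def path_pdf(segments):
    """segments: (mu_si, sigma_si) pairs for the gates and interconnects on a path.
    Returns (mu_p, sigma_p) per (4.3)/(4.4), assuming independent segments."""
    mu_p = sum(mu for mu, _ in segments)                   # (4.3)
    sigma_p = math.sqrt(sum(s * s for _, s in segments))   # (4.4)
    return mu_p, sigma_p

# Four gates and three interconnects (all values in ps, illustrative only).
segs = [(120, 10), (5, 1), (95, 8), (4, 1), (110, 9), (6, 1), (130, 11)]
print(path_pdf(segs))
```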
4.2.2 Impact of Crosstalk on Path Delay

Millions of interconnect segments run in parallel in a design, with parasitic coupling capacitances between them, introducing crosstalk effects and impacting the circuit's delay characteristics and performance. The crosstalk introduced by the parasitic coupling capacitance between a target net (victim) and its neighboring nets (aggressors) may either speed up or slow down the transitions on both the victim and aggressor nets, depending on the transition directions, the transition arrival times, and the coupling capacitance between the victim and aggressor nets [16]. To take crosstalk effects into account during path length analysis and pattern selection, this chapter performs various analyses to obtain a realistic model of crosstalk. Since transitions on the aggressors and the victim can have different directions and arrival times, a set of SPICE simulations was performed to analyze their impact on each other. Figure 4.2 shows the simulation results for crosstalk between two neighboring interconnects (one victim and one aggressor) with a fixed coupling capacitance. The times t1, t2, and t3 in the figure represent the break-points of the curve fitting. The parameter t_{a−v} denotes the arrival-time difference between the transitions on the aggressor and victim nets, and d_arrival represents the victim net delay considering the impact of this arrival-time difference. When the aggressor and victim nets have the same transition direction (see Fig. 4.2a), the victim net is sped up; otherwise, the victim net is slowed down (see Fig. 4.2b). Furthermore, the crosstalk effect on the victim net is maximized when the transition arrival times of the aggressor and victim nets are almost the same (t_{a−v} ≈ 0).
Fig. 4.2 Impact of aggressor arrival time on victim propagation delay when victim and aggressor nets have (a) same transition direction and (b) opposite transition direction. Coupling capacitance: 0.1 pF
This chapter performs another set of simulations and analyses to take into account the impact of the coupling-capacitance size, given a fixed arrival-time difference between the transitions on the aggressor and victim nets. The simulations were done considering one aggressor for the victim net. As shown in Fig. 4.3, for each load-capacitance case the propagation delay on the victim net increases linearly with the coupling capacitance; d_coupling_arrival denotes the victim net delay considering the impact of the coupling-capacitance size, and C_{a−v} is the coupling capacitance between the aggressor and victim nets. For the same-transition-direction case, the crosstalk delay decreases linearly.

Fig. 4.3 Impact of coupling capacitance on victim propagation delay with same arrival times, opposite transition direction, and different load capacitances. Load capacitance unit is pF

In the statistics literature, it has been demonstrated that the least-squares technique is useful for fitting data to a curve [9]. Instead of solving the equations exactly, the least-squares technique minimizes the sum of the squares of the residuals. Similar to [1], this chapter applies least-squares curve fitting to the simulation results shown in Fig. 4.2 and approximates a piecewise function relationship between the interconnect delay and the aggressor-victim arrival-time difference, as shown in (4.5):
$$d_{arrival} = \begin{cases} d_{orig}, & 0 \le t_{a-v} < t_1 \\ a_0\, t_{a-v} + a_1, & t_1 \le t_{a-v} < t_2 \\ b_0\, t_{a-v} + b_1, & t_2 \le t_{a-v} < t_3 \\ d_{orig}, & t_3 \le t_{a-v} \end{cases} \qquad (4.5)$$
where d_orig is the original interconnect delay without crosstalk effects, and t_{a−v} is the arrival-time difference between the aggressor and victim nets. a₀ and b₀ are the curve slopes within the timing windows [t₁, t₂] and [t₂, t₃] (see Fig. 4.2), respectively. a₁ is the delay of the victim net under the conditions t_{a−v} = 0 and t₂ > 0; similarly, b₁ is the delay when t_{a−v} = 0 and t₂ < 0. After considering the impact of the transition direction and arrival time, the impact of the coupling capacitance is taken into account, as approximated in (4.6), which is also obtained from least-squares curve fitting:
$$d_{coupling\_arrival} = a \cdot d_{arrival} \cdot C_{a-v} \qquad (4.6)$$
where d_coupling_arrival is the propagation delay considering the impact of the transition direction, the arrival time, and the coupling capacitance between the aggressor net and the target victim net. The factor a is negative for the same-transition-direction case and positive for the opposite-transition-direction case. These fitted parameters are highly technology-dependent and must be recharacterized for each technology node.
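The two fitted relations (4.5) and (4.6) can be captured in a pair of small functions. The sketch below is a hedged illustration: the window handling follows Fig. 4.2 (no crosstalk effect outside the coupling window), and every coefficient in the usage example is a placeholder rather than a value from the actual 180 nm characterization.

```python
def d_arrival(d_orig, t_av, t1, t2, t3, a0, a1, b0, b1):
    """Victim delay vs. aggressor-victim arrival-time difference, per (4.5).
    Outside the [t1, t3) coupling window the original delay is returned."""
    if t_av < t1 or t_av >= t3:
        return d_orig
    if t_av < t2:
        return a0 * t_av + a1    # first fitted segment
    return b0 * t_av + b1        # second fitted segment

def d_coupling_arrival(d_arr, c_av, a):
    """Scale by coupling capacitance, per (4.6); the sign of `a` encodes
    speed-up (same direction) vs. slow-down (opposite direction)."""
    return a * d_arr * c_av

# Placeholder coefficients; an aggressor arriving 20 ps after the victim.
d = d_arrival(d_orig=50.0, t_av=20.0, t1=-300.0, t2=0.0, t3=300.0,
              a0=0.05, a1=65.0, b0=-0.05, b1=65.0)
print(d_coupling_arrival(d, c_av=0.05, a=0.8))
```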
For a target victim net, the aggressors and the coupling capacitances between the victim net and these aggressors are extracted using Synopsys PrimeTime SI [15]. After aggressor extraction, a coupling-capacitance threshold is set to minimize the number of aggressors: only aggressors with a coupling capacitance larger than this threshold are considered effective aggressors. A first-come-first-impact (FCFI) procedure is then used for the calculation of multiple-aggressor cases. For each test pattern, the FCFI procedure lists all the sensitized aggressors of the target victim net and sorts them by arrival time. The impact of the first-arriving aggressor is applied using (4.5) and (4.6), and the arrival time of the victim net is updated; the second-arriving aggressor is applied next. This procedure iterates until the impacts of all the sensitized aggressors have been applied. For simplicity, it is assumed that there are no crosstalk effects among the aggressors themselves. The validation of the crosstalk calculation procedure is presented in Sect. 4.4.2. This chapter also assumes that, for each victim net, crosstalk affects only its mean delay value, not its standard deviation; thus only the mean delay value of the victim net is updated when measuring the path delay.
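The FCFI iteration itself is simple to state in code. The sketch below assumes a `model` callback wrapping (4.5)/(4.6); both the callback shown and the aggressor values are hypothetical placeholders (the actual implementation in this chapter is in C/C++).

```python
def fcfi_victim_delay(victim_launch, victim_delay, aggressors, model):
    """First-come-first-impact: sort the sensitized aggressors by arrival time
    and apply them one at a time, updating the victim arrival time after each.

    aggressors: list of (arrival_time_ps, coupling_cap_pF, same_direction)
    model(delay, t_av, c_av, same_dir) -> updated victim delay
    """
    for agg_arrival, c_av, same_dir in sorted(aggressors):
        t_av = agg_arrival - (victim_launch + victim_delay)  # transition alignment
        victim_delay = model(victim_delay, t_av, c_av, same_dir)
    return victim_delay

def toy_model(delay, t_av, c_av, same_dir):
    """Crude placeholder for (4.5)/(4.6): the effect peaks at t_av = 0,
    fades over a 500 ps window, and scales linearly with coupling."""
    align = max(0.0, 1.0 - abs(t_av) / 500.0)
    factor = -0.3 if same_dir else 0.4   # speed-up vs. slow-down
    return delay * (1.0 + factor * align * (c_av / 0.1))

print(fcfi_victim_delay(0.0, 100.0,
                        [(120.0, 0.05, False), (40.0, 0.02, True)], toy_model))
```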
4.3 PV- and Crosstalk-Aware Pattern Selection

Since it is desirable to detect SDDs via long paths running through the fault sites, a technique is needed that targets SDDs via long paths and gross delay defects via paths of any length. An SDD is defined as a TDF on a long path; therefore, a TDF pattern is considered more efficient in detecting SDDs when it sensitizes a large number of long paths. Thus, all the paths sensitized by each pattern must be identified for pattern evaluation and selection. An in-house tool was used to list all the sensitized paths of a TDF pattern. Based on the TDF fault list of each pattern, the tool searches the topology of the design and reports a path as sensitized if all of its segments are sensitized. Note that, without timing information during sensitized-path identification, the tool may report some non-robust paths as sensitized paths of the target pattern. With the sensitized-path report, one can evaluate and weight the delays of the sensitized paths, ensuring that a TDF pattern that sensitizes a large number of long paths in the design is considered an effective pattern and receives a large weight. Whether a path is long or short is determined by comparing its length to the functional clock period. The path length is calculated in the presence of process variations and crosstalk.
4.3.1 Path PDF Analysis

In general, when the length of a path is close to the clock period, i.e., it has a small slack, it is considered a long path. Otherwise, when the path length is short compared with the clock period, it is considered a short path with large slack.
Fig. 4.4 Path PDF and path weight definition
Thus, it is necessary to define a threshold (named the long path threshold, LPthr, in this chapter) relative to the clock period T in order to classify paths as long or short. However, in the presence of process variations, the path length is a random variable with mean μ_p and standard deviation σ_p, rather than a fixed value. A path is therefore evaluated by the probability that it is longer than the LPthr derived from the clock period T. Figure 4.4 shows an example of the path weight definition. As mentioned in Sect. 4.2, the PDF of a sensitized path is calculated using (4.3) and (4.4), considering process variations, and is then updated with crosstalk effects. In these experiments, the long path threshold is defined as LPthr = 0.7T, which means that the path weight is the probability that the path is longer than 0.7T. Therefore, if a path has a large weight, there is a large probability that it is a long path, and it contributes more to the evaluation of the pattern sensitizing it. Note that any other long path threshold can be used in practice as well. For instance, if LPthr = 0, every sensitized path has a weight of 1 and contributes equally to the pattern evaluation, while if LPthr is very close to the clock period T, only critical paths have non-zero weights and are used for pattern evaluation. After calculating the weight of each path sensitized by pattern_i, the weight of pattern_i (W_pattern_i) is calculated using (4.7), where M_i is the total number of paths sensitized by pattern_i. This ensures that a long path has a large weight for pattern evaluation. This chapter considers a path long if the mean value of its delay distribution is larger than LPthr, i.e., if the path weight is larger than 0.5.

$$W_{pattern_i} = \sum_{j=1}^{M_i} W_{path_j} \qquad (4.7)$$
Figure 4.5 shows an example of path and pattern weight calculation. In this example, assume that pattern_i sensitizes four different paths, whose PDFs are shown in Fig. 4.5 as PDF1, PDF2, PDF3, and PDF4, and assume that LPthr is 0.7T. The weights of the four paths are calculated as W_path1 = 1, W_path2 = 0.65, W_path3 = 0.3, and W_path4 = 0. The weight of the pattern is then calculated using (4.7): W_pattern_i = 1 + 0.65 + 0.3 + 0 = 1.95.
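For a Gaussian path PDF, the path weight is simply the Gaussian survival function evaluated at LPthr, and the pattern weight is the sum in (4.7). The sketch below reproduces the worked example above; the (μ, σ) pairs are illustrative values chosen so that the weights come out to approximately 1, 0.65, 0.3, and 0.

```python
import math

def path_weight(mu_ps, sigma_ps, lp_thr_ps):
    """Weight = P(path length > LPthr) for a Gaussian path-delay PDF."""
    z = (lp_thr_ps - mu_ps) / sigma_ps
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # Gaussian survival function

T = 1000.0           # clock period in ps, illustrative
lp_thr = 0.7 * T
# (mu, sigma) for the four sensitized paths; values chosen to match the example.
paths = [(900, 40), (715, 40), (680, 38), (450, 35)]
weights = [path_weight(mu, s, lp_thr) for mu, s in paths]
w_pattern = sum(weights)                          # (4.7)
print([round(w, 2) for w in weights], round(w_pattern, 2))
# -> [1.0, 0.65, 0.3, 0.0] 1.95
```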
4.3.2 Pattern Selection

From the analysis and calculation of pattern and path weights in the previous subsection, it can be concluded that a pattern with a large weight is more effective in detecting SDDs. Therefore, the patterns with the largest weights should be selected. However, some paths may be sensitized by multiple patterns.
Fig. 4.5 An example of path and pattern evaluation
In this procedure, if a path has already been detected by the selected patterns, it is not considered during the evaluation of the remaining patterns. The pattern with the largest weight is selected first. After selecting a pattern, the procedure re-evaluates all the remaining patterns, excluding the paths that have been sensitized by the already-selected patterns, and then selects the pattern with the largest weight in the remaining set. This procedure is repeated until a stopping criterion is met, for instance, when the largest pattern weight falls below a specific threshold. This pattern selection procedure ensures that the best patterns are selected from the initial pattern repository, with as little overlap as possible between patterns in terms of sensitized paths, thereby reducing the pattern count. The pattern-sorting algorithm is shown in Fig. 4.6.

Fig. 4.6 Pattern sorting algorithm

In this algorithm, each pattern is evaluated by the long paths not already sensitized, i.e., those not sensitized by the previously selected patterns. The algorithm returns a pattern list in decreasing order of test efficiency, from which the best patterns in the initial pattern repository can be selected. Assume that N patterns are processed by the above algorithm, that a maximum of M paths are sensitized by a pattern, and that a maximum of K segments exist on a sensitized path in the target circuit. The worst-case time complexity of the pattern-sorting algorithm is O(N²MK), where N ≫ M and N ≫ K for large designs. In practice, this worst case is rarely reached, since several techniques are added to the procedure to speed it up: (1) inefficient patterns are removed before performing pattern selection; (2) once a pattern is selected, all of its sensitized long paths are removed from the long-path lists of the remaining patterns, which significantly reduces the size of each pattern's long-path list after several patterns have been selected; (3) after re-evaluation, newly inefficient patterns in the remaining pattern set are removed, since they would never be selected; and (4) pattern selection is terminated if the largest weight in the remaining patterns is smaller than the pattern-weight threshold, which is used to terminate the pattern-selection iteration.
CPU runtime can also be traded off against pattern-selection granularity. For instance, assume that there are 10,000 test patterns in the original pattern repository. These patterns can be divided into 1,000 groups of ten patterns each, according to pattern ID, and the groups themselves can be evaluated by their sensitized paths. After evaluating the pattern groups, the pattern evaluation and selection procedure can be applied to them; in this way, the time complexity is reduced by 100X. Furthermore, it is also viable to bypass the procedure used for checking sensitized-path overlap, so that the time complexity of the algorithm becomes O(NMK), significantly reducing CPU runtime. In this case, some faults may be detected multiple times, and therefore the test quality of the selected pattern set may also increase; the penalty is that the pattern count increases as well.
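The selection loop described above is essentially a greedy set-cover with a weight threshold. A minimal sketch follows; the real implementation is in C/C++ and includes the speed-up techniques just listed, and the toy weight function here simply counts newly covered long paths.

```python
def select_patterns(pattern_long_paths, weight_fn, w_thr=1.0):
    """Greedy selection: repeatedly pick the pattern whose not-yet-covered
    long paths give the largest weight; stop when the best weight < w_thr.

    pattern_long_paths: {pattern_id: set of long-path ids it sensitizes}
    weight_fn: maps a set of path ids to a pattern weight (e.g., per (4.7))
    """
    remaining = {p: set(paths) for p, paths in pattern_long_paths.items()}
    covered, selected = set(), []
    while remaining:
        best, best_w = max(((p, weight_fn(paths - covered))
                            for p, paths in remaining.items()),
                           key=lambda pw: pw[1])
        if best_w < w_thr:
            break                       # remaining patterns are inefficient
        selected.append(best)
        covered |= remaining.pop(best)  # exclude already-sensitized paths
    return selected

# Toy usage: weight = number of newly covered long paths.
pats = {"p1": {1, 2, 3}, "p2": {2, 3}, "p3": {4}}
print(select_patterns(pats, len))   # -> ['p1', 'p3']
```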
4.4 Experimental Results

The complete flow of the presented method is shown in Fig. 4.7. Commercial EDA tools are used for circuit synthesis, physical design, parasitic parameter extraction, and pattern generation. In the experiments, Synopsys Design Compiler [15] was used for circuit synthesis, and Astro [15] was used to perform the placement and routing of standard cells. After physical synthesis, n-detect ATPG with TetraMAX [15] is run to generate the source patterns (the n-detect patterns in Fig. 4.7) for the pattern evaluation and selection procedure. All patterns are generated using the launch-off-capture (LOC) method. Although the flow is presented based on an n-detect pattern set, it can be run on any kind of pattern repository, e.g., a random pattern set.
Fig. 4.7 Flow diagram of pattern generation, evaluation and selection
Table 4.1 Details of experimental benchmarks

Benchmark     # of gates   # of FFs   Total # of standard cells
ethernet        138,021     11,617            149,638
wb conmax        43,219      4,538             47,757
tv80              8,648        773              9,421
ac97 ctrl        25,488      3,034             28,522
mem ctrl         13,388      1,705             15,093
systemcaes       12,529      1,584             14,113
wb dma            8,474      1,398              9,872
s13207            1,355        625              1,980
s9234               511        145                656
PrimeTime SI [15] is used for crosstalk analysis. Monte-Carlo simulation is run to obtain the PDFs of the standard cells, considering the different input combinations and different output load capacitances. After pattern selection, top-off ATPG (1-detect timing-unaware ATPG) is run to meet the fault-coverage requirement for the TDF fault model. As a result, the final pattern set of this procedure is the selected pattern set plus the top-off ATPG pattern set. The programs for crosstalk calculation and for pattern evaluation and selection were implemented in C/C++; the validation of the process variations and crosstalk calculation procedures was implemented in Perl. The experiments were performed on Linux x86 servers with 8 processors and 24 GB of available memory. Seven IWLS benchmarks and two ISCAS benchmarks were used in the experiments; their details are shown in Table 4.1. The number of logic gates is shown in Column 2, the number of flip-flops in Column 3, and the total number of standard cells in Column 4. Note that the data in Table 4.1 are obtained from the synthesized circuits and may differ slightly after placement and routing, since the physical design tool may add buffers for routing optimization.

Table 4.2 Comparison between the calculated path delay distribution, using (4.3) and (4.4), and the simulated path delay distribution; 1,000 SPICE simulations were run for each path

                  Mean μ (ps)                  Standard deviation σ (ps)
Path ID    Cal.      Sim.       μ%         Cal.     Sim.     σ%
1          340.92    340.91      0.0029    30.82    31.81    −3.1122
2          489.52    489.53     −0.0020    36.03    37.05    −2.7530
3          668.69    668.70     −0.0015    39.16    40.57    −3.4755
4          441.40    441.39      0.0023    32.72    33.97    −3.6797
4.4.1 Validation of Process Variations Calculation

As mentioned in Sect. 4.2.1, (4.3) and (4.4) can be used to calculate the mean and standard deviation of a path length. This subsection validates these equations by comparing them with full-circuit Monte-Carlo simulation results. Sample paths (with 6–10 gates along each path) were randomly generated, and Monte-Carlo simulation was run on them to obtain the mean value and standard deviation of each path, which serve as references. The calculation procedure was then applied to each target path, and the results were compared with the references, as shown in Table 4.2. Columns 2–4 give the calculated mean value, the mean value obtained from simulation, and the difference between them, respectively. Columns 5–7 give the standard deviations obtained using (4.4) and from simulation, and the difference between them, respectively. The calculated mean values are very close to the simulation results, while the standard-deviation differences are somewhat larger; still, for all the paths in these experiments, the difference between the calculated and simulated standard deviation is under 4%. The calculated values track the simulation closely because the delay distribution of each path segment is close to a Gaussian distribution.
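The validation methodology can be mimicked in a few lines: sample every segment delay per run to get a full-path Monte-Carlo reference, then compare against the closed-form (4.3)/(4.4) values. In this sketch the segment delays are drawn as Gaussians for simplicity, whereas the experiments above sample actual SPICE-characterized distributions; all segment values are placeholders.

```python
import math, random

def mc_path_delay(segments, runs=1000, seed=1):
    """Full-path Monte-Carlo reference: sample every segment delay per run."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mu, s) for mu, s in segments) for _ in range(runs)]
    mean = sum(totals) / runs
    var = sum((t - mean) ** 2 for t in totals) / (runs - 1)
    return mean, math.sqrt(var)

segments = [(55, 5), (60, 6), (48, 4), (70, 7), (52, 5), (65, 6)]  # 6 gates, ps
mu_calc = sum(mu for mu, _ in segments)                      # (4.3)
sigma_calc = math.sqrt(sum(s * s for _, s in segments))      # (4.4)
mu_sim, sigma_sim = mc_path_delay(segments)
print(f"calc: mu={mu_calc:.1f} ps, sigma={sigma_calc:.2f} ps")
print(f"sim : mu={mu_sim:.1f} ps, sigma={sigma_sim:.2f} ps")
```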
4.4.2 Validation of Crosstalk Calculation

This subsection first demonstrates the necessity of performing crosstalk analysis, and then validates the crosstalk calculation procedure by comparison with SPICE simulation. To show the impact of crosstalk effects, a circuit with one victim net and ten aggressor nets was set up. The victim delay under different aggressor sensitizations is shown in Table 4.3. SPICE simulations were run to measure the victim delay in five cases. Case 0 gives the victim delay without crosstalk effects (all aggressors static). Case 1 is an extreme case in which all aggressors have the same transition direction and arrival time as the victim net (speed-up), while Case 4 is the opposite extreme (slow-down), in which all aggressors have the opposite transition direction but the same arrival time as the victim net. Case 2 and Case 3 are two random cases, with randomly selected transition directions and arrival times. The absolute delays are shown in Row 2, and the delay variations relative to the no-crosstalk case (Case 0) are shown in Row 3.

Table 4.3 Delay comparison between different crosstalk and aggressor transition scenarios

Case         0        1         2         3       4
Delay (ps)   49.52    34.58     42.33     53.02   70.90
Var. (%)     0.00     −30.17    −14.52    7.07    43.17

Table 4.3 shows that, in the extreme cases, the delay on the victim can be slowed down by over 40% (Case 4) or sped up by over 30% (Case 1) compared with Case 0; the delay of Case 4 (extreme slow-down) is over 2X that of Case 1 (extreme speed-up). Even for the cases with random transition directions and arrival times on the aggressors (Case 2 and Case 3), delay variations of around 10% are easily reached. Note that this is just a sample circuit with only ten aggressors; in a design where a victim has more aggressors, the situation may be even worse. Therefore, crosstalk effects clearly need to be taken into consideration in order to evaluate path lengths accurately.

To obtain a relatively simple model, each interconnect is modeled as a uniformly distributed RC network, i.e., the resistance and capacitance of the interconnect are equally distributed along its length. It is also assumed that the resistance of an interconnect is proportional to its load capacitance. A tool is used that automatically generates circuits with:

• One victim net;
• A random number of aggressors, in the range 2–10;
• Random load capacitances on the victim and aggressor nets, in the range 0–0.1 pF (the number of RC segments of the victim and each aggressor is the same, and can be defined by user input);
• Independent random coupling capacitances between the victim and aggressor nets, in the range 0–0.01 pF, equally distributed along the RC networks of the victim and aggressors;
• Independent random arrival-time differences between the victim and aggressor nets, in the range −5,000 to 5,000 ps;
• Independent random transition directions on the aggressors.
Fig. 4.8 Delay comparison between SPICE simulation and the crosstalk calculation procedure
SPICE simulation is then run on each generated circuit to measure the delay on the victim. Next, the curve-fitting-based crosstalk calculation procedure is applied to calculate the victim delay on the same circuit, so that the calculated results can be compared with the SPICE simulation results. The results obtained from simulation (SimDelay) and from curve fitting (CalDelay) are presented in Fig. 4.8 for 100 randomly generated cases; the x axis gives the case ID, and the y axis the absolute delay on the victim. The figure shows that the curve-fitting-based crosstalk calculation procedure slightly overestimates the crosstalk effects (i.e., the delay). Over the 100 random cases, the minimum, maximum, and average percentage errors of the calculation are 3.97%, 35.19%, and 15.15%, respectively. Even though the absolute values differ, the crosstalk calculation procedure correlates very well with the SPICE simulation results: the two data sets (simulated and calculated) have a correlation coefficient of 0.9829. The correlation coefficient is calculated with (4.8):
$$\rho_{X,Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X\, \sigma_Y} \qquad (4.8)$$
where ρ_{X,Y} is the calculated correlation coefficient, X and Y are the two random variables used for the correlation calculation, μ_X and μ_Y are their mean values, and σ_X and σ_Y are their standard deviations, respectively [12].

Fig. 4.9 Long path sensitization by the selected patterns for tv80
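The correlation check in (4.8) is a standard sample Pearson coefficient. A small sketch with made-up SimDelay/CalDelay values, illustrating how the 0.9829 figure would be computed over the 100 cases:

```python
import math

def correlation(xs, ys):
    """Sample Pearson correlation coefficient, per (4.8)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

sim = [100.0, 150.0, 210.0, 260.0, 330.0]   # SimDelay (ps), illustrative
cal = [115.0, 170.0, 245.0, 300.0, 385.0]   # CalDelay (ps), slight overestimate
print(round(correlation(sim, cal), 4))
```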
4.4.3 Pattern Selection Efficiency Analysis

This subsection presents experimental results validating the efficiency of the pattern selection procedure. The validation is done by counting the total number of unique long paths sensitized by the selected patterns. The experiments set the long path threshold to LPthr = 0.7T (T being the clock period), and consider a path long if its weight is larger than 0.5, i.e., if its mean value is larger than the long path threshold. Figure 4.9 presents the relation between the number of selected patterns and the number of unique long paths sensitized by the selected pattern set for the IWLS benchmark tv80. In this experiment, the 10-detect pattern set is used as the initial pattern repository; it has 11,072 patterns and sensitizes 1,718 LPs with LPthr = 0.7T. The results in Fig. 4.9 show that only 520 patterns (4.7% of the total pattern count of 11,072) are needed to cover all the long paths sensitized by the 10-detect pattern set, and only 316 patterns (2.8%) are needed to cover 1,547 (90.0%) of those long paths.
Table 4.4 Number and percentage of selected patterns for long path sensitization (based on the 10-detect pattern set)

                90% LP sensitization     100% LP sensitization
Benchmark       # pat.     % pat.        # pat.     % pat.
ethernet        1,204       3.73         2,785       8.63
wb conmax       1,487      10.96         2,932      21.62
tv80              316       2.80           520       4.70
ac97 ctrl         113       1.77           165       2.58
mem ctrl          230       1.75           390       2.97
systemcaes        750      20.35         1,139      30.90
wb dma             66       1.83           146       4.06
s13207             18       0.27            41       0.61
s9234              34       1.23            82       2.97
Note that these percentages are design-dependent: for different designs, the percentage of selected patterns will differ. Table 4.4 presents the pattern percentages for all the benchmarks. Columns 2 and 3 give the number and percentage of selected patterns for 90% long-path sensitization, respectively; Columns 4 and 5 give the number and percentage for 100% long-path sensitization. All the pattern selections in this table are based on the 10-detect pattern set; the total pattern counts of these benchmarks can be found later in Table 4.7. The results show that for most of the benchmarks only a small portion of the patterns is selected for long-path sensitization, except for wb conmax and systemcaes. Many patterns are selected for these two benchmarks because (1) there are many intermediate-length paths in these designs; (2) although these intermediate paths are not recognized as long paths according to the definition, they still have non-zero weights and contribute to the pattern weights; and (3) the pattern selection procedure is based on the pattern weight. The selected patterns are therefore effective in detecting SDDs (TDFs with slack smaller than 0.3T, according to the definition used in this experiment) as well as TDFs with slack larger than, but close to, 0.3T. If the pattern weight were defined as the number of sensitized LPs, the number of patterns selected for these benchmarks would be significantly reduced. For the benchmark s13207, fewer than 1% of the patterns sensitize all the LPs sensitized by the 10-detect pattern set; this is because there are few LPs and intermediate paths in that design, and most of its paths are short.
4.4.4 Pattern Set Comparison

In the pattern selection procedure, a pattern-weight threshold is set as the termination criterion. The threshold used in this procedure is W_pattern_thr = 1, i.e., only patterns with a weight larger than 1 can be selected. Changing this threshold changes the total number of selected patterns.
Table 4.5 Number of sensitized long paths for different pattern sets

Benchmark     1-detect   10-detect   t.aware   sel.     tff   sel+tff
ethernet        10,882      16,559    15,598   16,559   155    16,714
wb conmax        9,401      18,826    13,085   18,826    70    18,896
tv80               976       1,718     1,578    1,623   613     2,236
ac97 ctrl          343         400       360      373    92       465
mem ctrl           237       2,739     1,168    2,707   186     2,893
systemcaes         772       2,110     1,763    1,940    28     1,968
wb dma           1,036       1,890     1,209    1,881    98     1,979
s13207             143         155       136      152    25       177
s9234              172         309       203      305     3       308

Table 4.6 Number of detected SDDs for different pattern sets

Benchmark     1-detect   10-detect   t.aware   sel.     tff      sel+tff
ethernet        48,385      63,829    66,622   63,829      646    64,475
wb conmax       21,571      28,608    24,063   28,608       37    28,645
tv80            17,420      32,856    34,199   31,080   10,171    41,251
ac97 ctrl        3,825       4,704     4,183    4,376    1,007     5,383
mem ctrl         4,614      53,650    22,156   53,024    3,416    56,440
systemcaes      13,366      38,380    32,988   35,334      440    35,774
wb dma          13,842      26,701    20,131   26,594    1,299    27,893
s13207           2,167       2,629     1,881    2,682      306     2,988
s9234            2,128       3,724     2,442    3,679       44     3,723
Tables 4.5 and 4.6 show the results for the number of sensitized unique long paths and detected unique SDDs, respectively. In general, the n-detect and timing-aware pattern sets are expected to perform better than 1-detect timing-unaware ATPG in sensitizing unique long paths and detecting unique SDDs, as indicated by the results in Columns 2, 3, and 4 of both tables. In Table 4.5, Column 5 gives the number of unique long paths sensitized by the selected pattern set, and Column 6 gives the number of unique long paths sensitized by the top-off ATPG pattern set but not by the selected pattern set; top-off patterns are generated using 1-detect timing-unaware ATPG. Column 7 gives the total number of unique long paths sensitized by the final pattern set, i.e., the selected patterns plus the top-off ATPG patterns. The results in Table 4.5 show that the timing-aware and 10-detect ATPG pattern sets sensitize a significantly higher number of long paths than the 1-detect pattern set, except for the s13207 benchmark. On the other hand, timing-aware ATPG is not as effective as 10-detect ATPG in long-path sensitization for these circuits. The presented pattern set, in turn, is more efficient than the 10-detect ATPG pattern set in terms of long-path sensitization, except for the s9234 benchmark, for which the numbers of sensitized long paths are very close. Table 4.6 presents the number of SDDs detected by the different pattern sets; since SDDs are TDFs on long paths, a pattern set that sensitizes many long paths also detects many SDDs. Table 4.7 presents the pattern counts for the 1-detect, 10-detect, timing-aware, and presented pattern sets; these patterns are used for obtaining the numbers of sensitized long paths and detected SDDs shown in Tables 4.5 and 4.6.
Table 4.7 Comparison between the number of patterns

Benchmark     1-detect   10-detect   t.aware   sel.    tff     sel+tff
ethernet         6,479      39,844    32,275   2,785   3,816     6,601
wb conmax        1,931      13,562    19,766   2,932     195     3,127
tv80             1,435      11,072    17,107     393     924     1,317
ac97 ctrl        1,032       6,393     4,087     126     834       960
mem ctrl         1,595      13,142     6,577     352   1,032     1,384
systemcaes         591       3,686     5,590     800      30       830
wb dma             483       3,600     4,460     131     354       485
s13207             810       6,712     1,108      36     775       811
s9234              343       2,763       428      64     271       335
Table 4.8 Long path threshold impact on pattern selection for tv80

LPthr   # of sel. patterns   # of top-off patterns   Total # patterns   # of LPs   # of SDDs
0.7T           393                   924                  1,317           2,236      82,502
0.8T            60                 1,279                  1,339           1,765      63,428
From these results, it can be seen that timing-aware ATPG results in a large pattern count compared to the 1-detect set for the large IWLS benchmarks; in some cases, e.g., the wb conmax, tv80, and wb dma benchmarks, its pattern counts are even larger than those of the corresponding 10-detect pattern sets. In all cases, the presented pattern set results in a significantly smaller number of patterns than the 10-detect and timing-aware pattern sets. In short, the presented pattern set detects a large number of long paths with a pattern count close to that of the 1-detect pattern set.
4.4.5 Long Path Threshold Analysis

The long path threshold LPthr is an important parameter of this procedure: if it changes, the path-weight calculation threshold changes accordingly. Although this does not impact the effectiveness of the selected patterns, it may change the number of selected patterns and the numbers of detected long paths and SDDs. If the long path threshold increases, the number of selected patterns decreases and the number of top-off ATPG patterns increases to meet the fault-coverage requirement; conversely, if the long path threshold is reduced, the number of selected patterns increases and the top-off ATPG pattern count decreases. The numbers of sensitized long paths and SDDs change accordingly. The results in Sect. 4.4.4 are for a fixed long path threshold (LPthr = 0.7T). Table 4.8 presents the results for two different long path thresholds (0.7T and 0.8T) for the tv80 benchmark: when the long path threshold increases, the weight of each pattern decreases, and the number of selected patterns decreases as well.
Table 4.9 CPU runtime of different pattern sets for tv80

n-detect         1-detect    3-detect    5-detect       8-detect       10-detect
# patterns       1,435       3,589       5,775          8,931          11,072
CPU (PV)         48 s        2 m 17 s    4 m 2 s        6 m 32 s       8 m 3 s
CPU (PV+Xtalk)   18 m 57 s   48 m 28 s   1 h 19 m 2 s   2 h 1 m 59 s   2 h 30 m 3 s
4.4.6 CPU Runtime Analysis

Table 4.9 presents the CPU runtime of the method on n-detect pattern sets (n = 1, 3, 5, 8, and 10) for the tv80 benchmark. As n increases, the pattern count increases, and the CPU runtime of the pattern evaluation and selection procedure increases with it, both when considering (1) only process variations (Row 3 in Table 4.9) and (2) process variations and crosstalk (Row 4 in Table 4.9). The CPU runtime increases significantly when crosstalk effects are considered: during crosstalk calculation, for each net on a path and for each pattern, the procedure extracts (1) all neighboring nets with their coupling capacitances from the layout database, (2) the arrival times, and (3) the transition directions on the neighboring nets (if any), which makes the pattern selection procedure much more complex. Note that timing-aware TDF ATPG on the tv80 benchmark takes about 1 h 2 min, while n-detect timing-unaware TDF ATPG (n = 1, 3, 5, 8, and 10) on this benchmark takes less than 2 min. As the table shows, the pattern evaluation and selection procedure consumes considerably less CPU time when only process variations are considered; the top-off ATPG is quite fast and consumes negligible CPU time. When crosstalk is taken into consideration, the CPU time for evaluating the 5-detect pattern set is close to that of timing-aware ATPG. A direct comparison with timing-aware ATPG is not entirely fair, however, since this method takes extra design features into account. Furthermore, the method's CPU runtime could be further reduced by better programming and by optimizing the data structures and algorithms.
4.5 Summary

This chapter has presented an effective pattern evaluation and selection procedure for screening small-delay defects. The procedure takes process variations and crosstalk into account when evaluating their impact on path delay, and the accuracy of the calculation procedures is validated by comparison with SPICE simulation. Although an n-detect pattern set was used as the initial pattern repository, the flow can be applied to any kind of pattern repository and efficiently identifies high-quality patterns that sensitize a large number of long paths. The method was implemented on several ISCAS and IWLS benchmarks, and the results demonstrated its effectiveness in reducing the pattern count while significantly increasing the number of sensitized long paths.
References

1. A. B. Kahng, B. Liu, and X. Xu, "Statistical Timing Analysis in the Presence of Signal-Integrity Effects," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 10, October 2007
2. A. Krstic, J. Liou, Y. Jiang, and K. T. Cheng, "Delay testing considering crosstalk induced effects," in Proc. Int. Test Conf., pp. 558–567, 2001
3. A. Sinha, S. K. Gupta, and M. Breuer, "An enhanced test generator for capacitance induced crosstalk delay faults," in Proc. Asian Test Conf., pp. 174–177, 2003
4. A. Vittal and M. Marek-Sadowska, "Crosstalk reduction for VLSI," IEEE Trans. on Computer-Aided Design for Circuits and Systems, vol. 16, no. 3, pp. 290–298, 1997
5. R. Arunachalam, K. Rajagopal, and L. T. Pileggi, "Taco: Timing analysis with coupling," in Proc. Des. Autom. Conf., pp. 266–269, 2000
6. C. Amin, N. Menezes, K. Killpack, F. Dartu, U. Choudhury, N. Hakim, and Y. I. Ismail, "Statistical static timing analysis: How simple can we get?" in Proc. ACM/IEEE Des. Autom. Conf., pp. 652–657, 2005
7. C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan, "First-order incremental block-based statistical timing analysis," in Proc. ACM/IEEE Des. Autom. Conf., pp. 331–336, 2004
8. E. S. Park, M. R. Mercer, and T. W. Williams, "Statistical Delay Fault Coverage and Defect Level for Delay Faults," in Proc. IEEE International Test Conference (ITC'88), 1988
9. G. W. Recktenwald, "Numerical Methods with MATLAB: Implementations and Applications," Prentice-Hall, 2000
10. ITRS 2008, "http://www.itrs.net/Links/2008ITRS/Home2008.htm"
11. J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits, A Design Perspective (Second Edition)," Prentice Hall, 2003
12. L. B. Koralov and Y. G. Sinai, "Theory of Probability and Random Processes," Second Edition, Springer, ISBN: 978-3-540-25484-3
13. M. A. Breuer and S. K. Gupta, "Process aggravated noise (PAN): New validation and test problems," in Proc. Int. Test Conf., pp. 914–923, 1996
14. R.-B. Lin and M.-C. Wu, "A New Statistical Approach to Timing Analysis of VLSI Circuits," in Proc. IEEE Design Automation Conf. (DAC'01)
15. Synopsys Inc., "SOLD Y-2007, Vol. 1–3," Synopsys Inc., 2007
16. W. Chen, S. Gupta, and M. Breuer, "Analytic Models for Crosstalk Delay and Pulse Analysis Under Non-Ideal Inputs," in Proc. IEEE International Test Conference (ITC'97), pp. 808–818, 1997
17. W. Chen, S. K. Gupta, and M. A. Breuer, "Test generation for crosstalk-induced delay in integrated circuits," in Proc. Int. Test Conf., pp. 191–200, 1999
18. W. Chen, S. K. Gupta, and M. A. Breuer, "Test generation for crosstalk-induced faults: Framework and computational results," in Proc. Asian Test Conf., pp. 305–310, 2000
19. W. Chen, S. K. Gupta, and M. A. Breuer, "Test generation in VLSI circuits for crosstalk noise," in Proc. Int. Test Conf., pp. 641–650, 1998
20. W. Qiu, X. Lu, J. Wang, Z. Li, D. Walker, and W. Shi, "A statistical fault coverage metric for realistic path delay faults," in Proc. VLSI Test Symp., pp. 37–42, 2004
21. Y. Sato, S. Hamada, T. Maeda, A. Takatori, and S. Kajihara, "A Statistical Quality Model for Delay Testing," IEICE Trans., vol. E89-C, pp. 349–355, 2006
22. "Liberty User Guide, Vol. 1, 2," Version 2007.12
Chapter 5
Power Supply Noise- and Crosstalk-Aware Hybrid Method
5.1 Introduction

Nanometer technology allows packing more transistors into one chip and increasing their operating frequency, which in turn results in increased switching activity, power density, and coupling noise. With process technologies scaling down to 45 nm and below, a large number of SDDs are introduced by on-chip power supply noise (PSN) and crosstalk. These pattern-induced noise effects must be considered, since their impact on path delays becomes increasingly significant as operating frequency and power density increase [2, 6, 13].
5.1.1 Prior Work on PSN and Crosstalk

The impact of power supply noise and crosstalk on circuit performance has been addressed in many publications, such as [2, 3, 8]. In [10, 11], the authors proposed power-supply-noise-aware ATPGs for minimizing the impact of power supply noise on path delays. In [7], an ATPG method was proposed to generate path-delay test patterns that maximize power supply noise effects. All these methods can only be applied to selected critical paths, since the number of paths in a design increases exponentially with circuit size. The authors in [9] built a look-up table for calculating the propagation delay of target paths under power supply noise effects, assuming a linear relationship between delay and voltage. Crosstalk effects have also been addressed in much prior work, as discussed in Chap. 4; some additional prior work is discussed here. In [13], the authors proposed an analytical model for crosstalk, which was used in [14] for generating patterns with crosstalk consideration; this approach considers only the single-aggressor case. The authors in [1] proposed a statistical procedure for gate delay variation calculation with crosstalk-alignment consideration. It first established a relationship between gate delay and crosstalk alignment based on deterministic
delay calculation, and then performed the probabilistic delay calculation. In [12], a path selection procedure was proposed with coupling-noise consideration. The authors in [6] proposed a flow for path-delay pattern generation with maximum crosstalk effects; similar to the PSN-aware work in [7], this method is applied only to selected critical paths.
5.1.2 Chapter Contents and Organization

This chapter presents a PSN- and crosstalk-aware hybrid method to grade and select the most effective patterns for screening SDDs, based on the technique presented in Chap. 3. Unlike the curve-fitting crosstalk calculation procedure presented in Chap. 4, this chapter presents a new Xtalk2Delay database-based procedure for crosstalk calculation, which is more accurate. Similarly, an IR2Delay database is presented for accurately mapping IR-drop effects to gate delay increase. The accuracy of the database-based calculation is validated by comparison with full-circuit SPICE simulation results. The pattern grading criterion in this chapter is based on the number of long paths sensitized by each pattern. The main contents of the chapter include:

1. A metric to evaluate each TDF pattern based on its sensitized paths, motivated by the observation that detecting SDDs via the long paths running through the fault sites is more efficient.
2. A procedure to identify all the paths sensitized by each TDF pattern.
3. Dynamic consideration of PSN and crosstalk effects to accurately evaluate the paths sensitized by each pattern.
4. Validation of the PSN and crosstalk calculation procedures by comparison with full-circuit SPICE simulation.
5. An efficient pattern selection procedure. It is based on an n-detect pattern repository, taking advantage of its lower CPU runtime and high test quality in terms of long-path sensitization, and it minimizes the overlap of sensitized paths between patterns (i.e., it identifies unique sensitized long paths) to minimize the selected pattern count. 1-detect top-off TDF ATPG is run on the undetected faults after pattern selection to ensure the same fault coverage as timing-aware ATPG.

Note that this procedure can be applied to any kind of pattern set, e.g., a path-delay pattern set, a random pattern set, etc. The remainder of this chapter is organized as follows. Section 5.2 discusses the PSN and crosstalk effects and their impact on circuit performance. The pattern grading and selection procedure is presented in Sect. 5.3. The procedure validations and experimental results are presented in Sect. 5.4. Finally, Sect. 5.5 concludes this chapter.
5.2 Analyzing Noise-Induced SDDs

As technology scales to 45 nm and below, the impact of PSN and crosstalk on circuit performance has become increasingly significant: PSN and crosstalk are important sources of added small delay in devices manufactured with the latest technologies. Thus, the goal in this chapter is to target physical SDDs as well as SDDs introduced by PSN and crosstalk. These effects are added to the procedure dynamically, when evaluating the delay of the paths sensitized by each pattern.
5.2.1 Impact of PSN on Circuit Performance

Technology scaling allows packing more transistors into one chip and increasing their operating frequency, which results in increased switching activity and power density. PSN can be introduced by inductive or resistive parameters, or a combination of the two. The inductive noise is referred to as L·di/dt, depending on the inductance L and the instantaneous rate of change of the current. The resistive noise is referred to as IR drop, depending on the current and the distributed resistance of the power distribution network. This chapter focuses only on the resistance-introduced power supply noise, i.e., IR drop, and its impact on circuit performance.

Figure 5.1 shows the IR-drop plot of the wb conmax benchmark [5] for a randomly selected TDF test pattern. The pattern set for this benchmark was generated using a commercial ATPG tool, with the launch-off-capture (LOC) scheme and random-fill to minimize the pattern count. The plot shows the average IR drop calculated during the launch-to-capture cycle. Power pads are located at the four corners of the design, and the area far from the power pads (the center of the design) exhibits a large IR drop. As seen, different gates in the design experience different voltage drops, and hence different delay increases and performance degradation. As the circuit size increases, a more severe IR drop is expected.

Fig. 5.1 IR-drop plot of a test pattern applied to wb conmax benchmark

The voltage drop on a gate directly impacts its performance. Figure 5.2 presents the simulation results for an AND gate under different power supply voltages, with an output load capacitance of 0.1 pF. With a 20% IR drop (0.36 V), the average gate delay increase is approximately 21%. This experiment is based on a 180 nm Cadence Generic Standard Cell Library with nominal Vdd = 1.8 V; in smaller technology nodes, the percentage of gate delay increase will be much higher [7]. Such an effect manifests as an SDD, introducing extra delay on any path running through the affected gate. Note that when more than one gate on a path experiences voltage drop, the performance degradation is compounded.

Fig. 5.2 Average delay increase of a gate as a result of IR-drop increase (180 nm Cadence generic standard cell library, nominal power supply voltage = 1.8 V)

An IR2Delay database based on SPICE simulation is introduced in this chapter to accurately map IR-drop effects to gate delay increase. To obtain a relatively simple model, only the lumped capacitance of the interconnect is used to calculate the output load of the gate; the interconnect resistance is ignored.
Fig. 5.1 IR-drop plot of a test pattern applied to wb conmax benchmark

Fig. 5.2 Average delay increase of a gate as a result of IR-drop increase (180 nm Cadence Generic Standard Cell Library, nominal power supply voltage = 1.8 V)
The slew rate is an important parameter for measuring propagation delay on standard cells, and different input slew rates may result in different propagation delays on a cell [15].
To obtain a simple model for these experiments, a fixed input slew rate, corresponding to the cell being driven by a medium-size cell in the library, was applied to all the standard cells. For each standard cell in the library, SPICE simulation is performed to measure its propagation delay with:

1. Different propagation paths, from each input pin to the output pin;
2. Different transition directions, including rising and falling transitions;
3. Different power supply voltages on the cell; and
4. Different load capacitances.
The results are collected and stored in the IR2Delay database for delay calculation. More details on setting up the IR2Delay database are presented in Chap. 11. In a real application, one can take advantage of the database for PSN-introduced path delay analysis. For each gate along a sensitized path, the procedure:

1. Obtains the transitioning pins and transition direction according to the pattern information;
2. Performs IR-drop analysis and obtains the gate's IR drop using a commercial EDA tool; and
3. Extracts the gate's output load capacitance from the layout.

With all this information, it searches the IR2Delay database for the PSN-introduced delay of the gate, which is then used to evaluate the sensitized path.
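To make the lookup concrete, the following is a minimal Python sketch of how such an IR2Delay query might be organized. The class name, table layout, and nearest-grid-point lookup are illustrative assumptions, not the actual database implementation described in Chap. 11.

```python
class IR2DelayDB:
    """Illustrative IR2Delay table (assumed layout): delays are
    pre-characterized by SPICE over (supply voltage, output load)
    grid points for each (cell, input pin, transition) combination."""

    def __init__(self):
        # (cell, input_pin, transition) -> list of (vdd, load, delay_ps)
        self.table = {}

    def add_entry(self, cell, pin, transition, vdd, load, delay_ps):
        key = (cell, pin, transition)
        self.table.setdefault(key, []).append((vdd, load, delay_ps))

    def lookup(self, cell, pin, transition, vdd, load):
        """Return the characterized delay at the grid point closest to
        the queried operating condition (a real flow might interpolate)."""
        entries = self.table[(cell, pin, transition)]
        return min(entries,
                   key=lambda e: abs(e[0] - vdd) + abs(e[1] - load))[2]

# Example: an AND2 gate under 10% IR-drop (nominal Vdd = 1.8 V);
# the delay values are made up for illustration.
db = IR2DelayDB()
db.add_entry("AND2", "A", "rise", 1.80, 0.10, 200.0)
db.add_entry("AND2", "A", "rise", 1.62, 0.10, 221.0)
print(db.lookup("AND2", "A", "rise", vdd=1.62, load=0.10))  # -> 221.0
```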
5.2.2 Impact of Crosstalk on Circuit Performance

Parasitic coupling capacitance between parallel interconnects introduces crosstalk effects, which may either speed up or slow down the transitions on both nets, depending on the transition directions, transition arrival times, and the coupling capacitance [6, 13, 14]. As technology scales, the distance between interconnects becomes much smaller, resulting in larger coupling capacitances and crosstalk effects. Crosstalk has become a major concern in modern designs, especially for critical paths with minimum slack.

As discussed in Chap. 4, different transition directions, coupling capacitances, and arrival time differences between neighboring nets result in different crosstalk effects, as shown in Figs. 5.3 and 5.4. In these two figures, the target net is referred to as the victim net and its neighboring net as the aggressor net. Figure 5.3 shows the simulation results for crosstalk effects between two neighboring nets with a fixed coupling capacitance. It is seen that when the aggressor and victim nets have the same transition direction (see Fig. 5.3a), the victim net is sped up; otherwise, the victim net is slowed down (see Fig. 5.3b). Furthermore, the crosstalk effect on the victim net is maximized when the transition arrival times of the aggressor and victim nets are almost the same (t_{a−v} ≈ 0). Figure 5.4 shows the impact of the coupling capacitance on the victim net, with a fixed arrival time difference between the transitions on the victim and aggressor nets. The aggressor net has the opposite transition direction to the victim net in this experiment.
Fig. 5.3 Impact of aggressor arrival time on victim propagation delay when victim and aggressor nets have (a) same transition direction and (b) opposite transition direction. Coupling capacitance: 0.1 pF
Fig. 5.4 Impact of coupling capacitance on victim propagation delay with same arrival times, opposite transition direction, and different load capacitances. Load capacitance unit: pF
It can be seen that the propagation delay on the victim net increases linearly with the coupling capacitance for the different load capacitance cases. For the same transition direction case, the crosstalk delay decreases linearly.

Instead of using the curve-fitting method presented in Chap. 4, this chapter introduces an Xtalk2Delay database based on SPICE simulation for crosstalk calculation. When setting up the Xtalk2Delay database, both the parasitic resistance and the capacitance of the interconnect are taken into account. In this chapter, the interconnect is modeled as an equally distributed RC network, i.e., the resistance and capacitance of the interconnect are distributed uniformly over its segments. The number of RC segments is the same for the victim and each aggressor, and can be defined by user input.
It is also assumed that the resistance of an interconnect is proportional to its load capacitance; the difference in RC ratio between metal layers is ignored in this chapter. The Xtalk2Delay database is set up for the single-aggressor case. The SPICE simulation is run with:

1. Different arrival time differences between the victim and aggressor transitions;
2. Different load capacitances, and corresponding resistances, of the victim and aggressor nets;
3. Different coupling capacitances, where the coupling capacitance is also equally distributed along the RC networks of the victim and aggressor; and
4. Different transition combinations, including both rising, both falling, victim rising – aggressor falling, and victim falling – aggressor rising cases.

The Xtalk2Delay database stores the simulation results for all these parameter combinations. The load and coupling capacitances used for crosstalk calculation are extracted from the layout, and the interconnect resistance is calculated from the ratio to its load capacitance. The transition directions and arrival times are obtained from the test pattern. For single-aggressor crosstalk calculation, the procedure searches the Xtalk2Delay database with all the above parameters and calculates the crosstalk-introduced delay.

For the multiple-aggressor case, a first come first impact (FCFI) procedure, similar to the one used in Chap. 4, is used to calculate the impact of the aggressors one by one. The FCFI procedure first obtains the arrival times of all the sensitized aggressors of the target victim net according to the test pattern, and sorts them. It then applies the impact of the first-arriving aggressor using the Xtalk2Delay-based calculation procedure, followed by the second-arriving aggressor, and so on until the impact of all the sensitized aggressors has been applied. A coupling capacitance threshold is specified so that only aggressors with coupling capacitance larger than this threshold are considered by the FCFI procedure. This minimizes the number of aggressors per victim and speeds up the calculation. This chapter does not consider the impact of aggressors on each other.
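As a rough illustration of the FCFI idea, the Python sketch below applies aggressor effects in arrival-time order. The `Net` record and the `xtalk2delay` query function are hypothetical stand-ins for the layout data and the database search described above.

```python
from collections import namedtuple

# Hypothetical net record; coupling_cap is the coupling to the victim.
Net = namedtuple("Net", "arrival_time transition load_cap coupling_cap")

def fcfi_extra_delay(victim, aggressors, xtalk2delay, cc_threshold):
    """First come first impact (FCFI) sketch: apply each aggressor's
    Xtalk2Delay contribution one at a time, earliest arrival first.
    Aggressor-on-aggressor interaction is ignored, as in the chapter."""
    # Consider only aggressors coupled above the capacitance threshold.
    active = sorted((a for a in aggressors if a.coupling_cap > cc_threshold),
                    key=lambda a: a.arrival_time)
    extra, victim_arrival = 0.0, victim.arrival_time
    for agg in active:
        # Signed extra victim delay for one victim/aggressor pair,
        # looked up from the (assumed) Xtalk2Delay database interface.
        d = xtalk2delay(arrival_diff=agg.arrival_time - victim_arrival,
                        victim_load=victim.load_cap,
                        aggressor_load=agg.load_cap,
                        coupling_cap=agg.coupling_cap,
                        transitions=(victim.transition, agg.transition))
        extra += d
        victim_arrival += d  # victim transition shifts before the next impact
    return extra
```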
5.3 Pattern Grading and Selection

5.3.1 Sensitized Path Identification and Classification

As mentioned earlier, it is desirable to detect SDDs via the long paths running through the fault sites, while gross delay defects can be detected via paths of any length. An SDD is defined here as a TDF on a long path. The pattern grading and selection procedure in this chapter is based on this observation.
Fig. 5.5 Sensitized path identification and classification for each pattern
In the presented procedure, each TDF pattern is evaluated by its sensitized paths. If a pattern sensitizes a large number of long paths, it is considered an effective pattern for SDD detection. The challenges are:

1. Identification of all the paths sensitized by each TDF pattern;
2. Classification of the sensitized paths, i.e., into long paths and short paths; and
3. Dynamic consideration of pattern-induced noise effects on path delay.

An in-house tool is used to report all the sensitized paths of each pattern. For a test pattern, fault simulation is run to identify all the TDFs it detects. With this information, the tool searches the topology of the design for the paths sensitized by the pattern. A path is reported as sensitized if the TDFs at all nodes along the path are detected.

Path classification identifies long and short paths by comparing the path length to the functional clock period. In general, when the length of a path is close to the clock period, i.e., it has a small slack, it is considered a long path. Otherwise, when the path length is short compared with the clock period, it is considered a short path with large slack. A threshold (called the long path threshold LPthr in this chapter) is defined relative to the clock period to differentiate long from short paths. This chapter defines the long path threshold to be 0.7T, where T is the functional clock period. Thus, a path is considered long if its length is equal to or greater than this threshold; otherwise, it is considered a short path. Note that any other long path threshold can be used in practice as well. For example, LPthr = 0 means that every fault must be targeted via its longest sensitizable path, while LPthr = 0.95T means that only critical paths with length equal to or greater than 0.95T are targeted. The original path length is obtained from the SDF database, and it is dynamically updated based on pattern-dependent PSN and crosstalk calculation. Figure 5.5 shows the flow of the sensitized path identification and classification. Note that, for different test patterns, even the same path may have different lengths due to the impact of PSN and crosstalk effects.

With the definition of the long path threshold LPthr, one can grade a path by assigning it a weight according to its length.
This chapter assigns a weight of 1.0 to each long path and a weight of 0.0 to each short path. A test pattern is then evaluated by summing the weights of all its sensitized paths. Assume that $W_{P_i}$ is the weight of pattern $P_i$, $N$ is the total number of paths sensitized by pattern $P_i$, and $W_{path_{ij}}$ is the weight of the $j$th sensitized path of pattern $P_i$. The weight of pattern $P_i$ is calculated as

$$W_{P_i} = \sum_{j=1}^{N} W_{path_{ij}}.$$

In this way, the weight of a pattern is equal to the number of its sensitized long paths. It should be noted that other weight assignments can also be used, varied according to the requirements of the application. Since the weight of a pattern is defined as the sum of the weights of all its sensitized paths, patterns will have different weights under different path weight definitions. For example, it is viable to define the path weight according to the ratio between its length and the clock period T, rather than against the long path threshold LPthr; this also ensures that a long path has a large weight while a short path has a small weight. Setting LPthr = 0.8T or 0.9T to target the very small delay defects in the design is another way of assigning weights. Another method would be to define the long path threshold LPthr as well as a short path threshold SPthr to classify sensitized paths into long paths (LPs, i.e., paths longer than LPthr), intermediate paths (IPs, i.e., paths between SPthr and LPthr), and short paths (SPs, i.e., paths shorter than SPthr). Different weights can then be applied to LPs, IPs, and SPs. In this case, the selected patterns target the SDDs on long paths with first priority and the SDDs on intermediate paths with second priority. In any case, the pattern is evaluated based on its sensitized paths, and each path is weighted by its length.
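A minimal sketch of this weighting scheme, assuming path lengths that already include the noise-aware adjustments (all names are illustrative):

```python
def path_weight(length, T, lp_thr=0.7):
    """1.0 for a long path (length >= LPthr * T), 0.0 for a short path."""
    return 1.0 if length >= lp_thr * T else 0.0

def pattern_weight(path_lengths, T, lp_thr=0.7):
    """W_Pi = sum of the weights of all sensitized paths; under the
    1.0/0.0 scheme this equals the number of sensitized long paths."""
    return sum(path_weight(l, T, lp_thr) for l in path_lengths)

# Example: T = 3,000 ps, so LPthr = 0.7T = 2,100 ps -> weight 2.0
print(pattern_weight([2500.0, 2200.0, 900.0], T=3000.0))
```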
5.3.2 Pattern Selection

The path/pattern evaluation procedure in the previous subsection ensures that a pattern sensitizing a large number of long paths has a large weight. Therefore, the patterns with the largest weights should be selected for screening SDDs. It must be noted that many paths in the design can be sensitized by multiple patterns. In this case, if a path has already been sensitized by a previously selected pattern, it should not be used for evaluating the remaining patterns, unless one wants to target the faults on that path multiple times. The pattern selection procedure therefore checks the overlap of sensitized paths between patterns and ensures that only the unique sensitized long paths of a pattern contribute to its evaluation.

The pattern selection procedure is very similar to the one presented in Chap. 3, as shown in Fig. 5.6. Before selection, patterns that sensitize no or very few long paths are removed from the pattern list, since they are inefficient in detecting SDDs and would never be selected; this saves CPU runtime significantly. Then the pattern with the largest weight is selected first. All the remaining patterns are re-evaluated, excluding the paths detected by the selected pattern, and the pattern with the largest weight in the remaining set is selected next. This procedure is repeated until a stopping criterion is met.
Fig. 5.6 Selection and sorting procedure for TDF patterns based on their unique weight
For instance, the iteration terminates when the largest pattern weight, evaluated by the unique sensitized long paths of the remaining patterns, is smaller than a specific threshold; this saves additional CPU runtime. The pattern selection procedure ensures that:

1. The best patterns for screening SDDs in the original pattern repository are selected; and
2. Every pattern is evaluated by its unique sensitized paths, so there is as little overlap as possible between selected patterns in terms of sensitized paths. This further reduces the pattern count.

Experimental results demonstrate that only a small portion of the patterns is needed to sensitize all the long paths sensitized by the original pattern repository.

Assume that the pattern selection procedure is applied to a pattern set with N patterns, that at most M paths are sensitized by a pattern, and that at most K segments exist on a sensitized path. The worst-case time complexity of the pattern sorting algorithm is O(N²MK), where N ≫ M and N ≫ K for large designs. However, the actual CPU runtime is much smaller than the worst case suggests, since:

1. The procedure removes the inefficient patterns before pattern selection;
2. Once a pattern is selected, all its sensitized long paths are removed from the long path lists of the remaining patterns, which significantly reduces the size of each pattern's long path list after several patterns have been selected;
3. After re-evaluation, newly inefficient patterns in the remaining pattern set are removed, since they would never be selected; and
4. The pattern selection iteration terminates at the pattern selection threshold.

CPU runtime can always be traded off against pattern selection efficiency. For instance, assume there are 10,000 test patterns in the original pattern repository. These patterns can be divided into 1,000 groups of ten patterns each according to pattern ID, and the pattern groups can be evaluated by their sensitized paths. After evaluating these pattern groups, the pattern evaluation and selection procedure can be applied at the group level. In this way, the time complexity is reduced by 100X. Furthermore, it is also viable to bypass the check for sensitized path overlap, so that the time complexity of the algorithm becomes O(NMK); this significantly reduces the CPU runtime. In this case, some faults may be detected multiple times, and the test quality of the selected pattern set may therefore increase; the penalty is that the pattern count will also increase.
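The following sketch illustrates, under assumed data structures, the greedy selection loop with unique-long-path bookkeeping described above; it is not the book's actual implementation.

```python
def select_patterns(pattern_long_paths, weight_threshold=1):
    """pattern_long_paths: pattern ID -> set of sensitized long paths
    (after noise-aware evaluation). Greedily pick the pattern with the
    largest unique weight until the best remaining weight drops below
    the threshold."""
    remaining = {p: set(lp) for p, lp in pattern_long_paths.items() if lp}
    selected, covered = [], set()
    while remaining:
        best = max(remaining, key=lambda p: len(remaining[p]))
        if len(remaining[best]) < weight_threshold:
            break  # stopping criterion: best unique weight below threshold
        selected.append(best)
        covered |= remaining.pop(best)
        for p in list(remaining):      # re-evaluate the remaining patterns
            remaining[p] -= covered    # drop already-covered long paths
            if not remaining[p]:       # remove newly inefficient patterns
                del remaining[p]
    return selected, covered
```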
5.4 Experimental Results

5.4.1 Experimental Setup

The noise-aware hybrid method is shown in Fig. 5.7. In these experiments, an n-detect pattern set is used as the original pattern repository. Commercial EDA tools were used for physical design and n-detect test pattern generation. During the pattern evaluation and selection procedure, layout information as well as pattern-dependent noise (i.e., PSN and crosstalk) were taken into consideration.
Fig. 5.7 The entire flow of the noise-aware hybrid method
Table 5.1 Benchmark characteristics

Benchmark    # of gates   # of FFs   Total # of standard cells
wb conmax    43,219       4,538      47,757
ac97 ctrl    25,488       3,034      28,522
The programs for performing PSN and crosstalk calculation, pattern evaluation, and pattern selection were implemented using C/C++. After pattern selection, top-off ATPG is run to meet the fault coverage requirement for the TDF fault model. As a result, the final pattern set of the procedure is the selected pattern set plus the top-off ATPG pattern set.

This section applies the procedure to two IWLS benchmarks [5], namely wb conmax and ac97 ctrl. The experiments were performed on a Linux x86 server with 8 processors and 24 GB of available memory. The characteristics of these benchmarks are shown in Table 5.1. Note that the data in Table 5.1 were obtained after the benchmarks were synthesized by the commercial synthesis tool; they may be slightly different after placement and routing, since the physical design tool may add buffers for routing optimization and for meeting timing requirements. The 180 nm Cadence Generic Standard Cell Library was used for physical design.
5.4.2 Validation of PSN Calculation

As mentioned in Sect. 5.2.1, the SPICE simulation-based IR2Delay database is introduced to calculate the impact of IR drop on gate delay. This subsection validates the accuracy of the IR2Delay database-based procedure by comparison with full-circuit SPICE simulation results. Since it is very time-consuming to run full-circuit SPICE simulation on a large circuit, a small ISCAS benchmark circuit [4], s344, is used for validation purposes. One test pattern was chosen from the TDF pattern set generated for s344 as an example. Under this pattern, 32 gates are sensitized. Full-circuit SPICE simulation is run on the design to extract the delays of these sensitized gates, and the validation procedure then searches the IR2Delay database for the delays of the same gates. To assess the accuracy of the IR2Delay database-based procedure, the SDF file is also searched for the delays of these sensitized gates. The comparison results are shown in Fig. 5.8, where the x axis represents the sensitized gate ID and the y axis represents the absolute delays of the sensitized gates. It can be seen that the delays calculated with the IR2Delay database are much closer to the full-circuit SPICE results than the SDF delays are. Furthermore, the IR2Delay database slightly underestimates the gate delays compared with the full-circuit SPICE results. There could be several reasons for this difference, including the approximations made in the IR2Delay calculation, as well as ignoring other important signal integrity effects, such as crosstalk due to coupling capacitance between neighboring nets. Section 5.4.3 will show that the calculation becomes much more accurate once crosstalk effects are added using the Xtalk2Delay database.
Fig. 5.8 Delay comparison between full-circuit SPICE, SDF, and IR2Delay database results

Table 5.2 Error percentage of the IR2Delay database-based and the SDF-based delay calculation procedures when compared with full-circuit SPICE simulation

% error          Max (%)   Min (%)   Average (%)
IR2Delay-based   42.51     4.29      20.72
SDF-based        72.77     0.12      31.82
Table 5.2 shows the error percentages of the delay calculation procedures with respect to the full-circuit SPICE simulation results. It can be seen that in some cases the error percentage of the SDF-based delay calculation can be over 70%, while the IR2Delay-based delay calculation procedure has a maximum error percentage of 42.51% for this test case. On the other hand, for some specific gates the SDF delay can fortuitously be very close to the full-circuit SPICE delay (the minimum error percentage is 0.12%). On average, however, the IR2Delay-based delay calculation procedure is much more accurate than the SDF-based procedure. Furthermore, the IR2Delay database correlates much better with the full-circuit SPICE simulation results than the SDF delays do: the correlation coefficient of IR2Delay vs. SPICE is 0.947, while that of SDF vs. SPICE is 0.278. The correlation coefficient is calculated with (5.1):
$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \, \sigma_Y} \qquad (5.1)$$
where $\rho_{X,Y}$ is the correlation coefficient, $X$ and $Y$ are the two random variables used for the correlation calculation, and $\mu_X$, $\mu_Y$, $\sigma_X$, and $\sigma_Y$ are the mean values and standard deviations of $X$ and $Y$, respectively. The computational complexity of searching the IR2Delay database is O(log n), where n is the number of logic cells in the technology library. The lookup is therefore very fast and scales easily to larger industrial designs.
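For reference, a direct Python transcription of (5.1); feeding it the per-gate delay lists from the database-based calculation and from SPICE reproduces the kind of correlation figures quoted above.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient per (5.1)."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    sd_x = (sum((x - mu_x) ** 2 for x in xs) / n) ** 0.5
    sd_y = (sum((y - mu_y) ** 2 for y in ys) / n) ** 0.5
    return cov / (sd_x * sd_y)

# e.g., correlation(ir2delay_delays, spice_delays) -> 0.947 for the
# s344 experiment (the input delay lists themselves are not reproduced here).
```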
5.4.3 Validation of Crosstalk Calculation

This subsection validates the accuracy of the Xtalk2Delay database-based procedure, again by comparison with full-circuit SPICE simulation. As mentioned in Sect. 5.2.2, in this chapter (1) the interconnect is modeled as an equally distributed RC network, with the number of RC segments defined by user input; (2) the resistance of an interconnect is assumed to be proportional to its load capacitance; and (3) the RC ratio difference between metal layers is ignored. A tool was developed that automatically generates circuits with:

• One victim net;
• A random number of aggressors, in the range of 2–10;
• Random load capacitances on the victim and aggressor nets, in the range of 0–0.1 pF, with the same user-defined number of RC segments on the victim and each aggressor;
• Independent random coupling capacitances between the victim and aggressor nets, in the range of 0–0.01 pF, equally distributed along the RC networks of the victim and aggressors;
• Independent random arrival time differences between the victim and aggressor nets, in the range of −5,000 to 5,000 ps; and
• Independent random transition directions on the aggressors.

The validation procedure then ran full-circuit SPICE simulation and the Xtalk2Delay-based calculation procedure on each case circuit and compared the results, as shown in Fig. 5.9. One hundred case circuits were created to validate the crosstalk calculation procedure, each with one victim and multiple aggressor nets. In Fig. 5.9, the x axis represents the case ID and the y axis represents the absolute delay on the victim. It is seen that the Xtalk2Delay-based calculation procedure is very accurate compared with the full-circuit SPICE simulation results.

The Xtalk2Delay database-based method was also compared with the curve fitting-based method presented in Chap. 4, using the full-circuit SPICE simulation results as the reference. The comparison results are shown in Table 5.3. It can be seen from the table that the curve fitting-based method of Chap. 4 always underestimates the crosstalk delay (always positive error percentage), and its absolute error percentage is large (35.19% at maximum and 15.15% on average).
Fig. 5.9 Delay comparison between the Xtalk2Delay-based calculation and full-circuit SPICE simulation results

Table 5.3 Error percentage of the Xtalk2Delay database-based and the curve fitting-based (Chap. 4) delay calculation procedures when compared with full-circuit SPICE simulation

% error               Max (%)   Min (%)   Average (%)
Xtalk2Delay-based     23.92     −21.02    1.30
Curve fitting-based   35.19     3.97      15.15
The Xtalk2Delay database-based method presented in this chapter has a smaller maximum error percentage (23.92%) than the curve-fitting method, and is therefore much closer to the full-circuit SPICE simulation reference. With the Xtalk2Delay database-based method, both positive and negative error percentages occur, so on average it is very close to the full-circuit SPICE results (1.30% average error). In fact, the Xtalk2Delay database-based results also correlate better with the full-circuit SPICE results, with a correlation coefficient of 0.9925, versus 0.9829 for the curve fitting-based method. The correlation coefficient is calculated with (5.1).
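A plausible sketch of the case generator, matching the parameter ranges listed above; the real tool emits SPICE netlists, whereas this stand-in returns plain records, and all field names are assumptions.

```python
import random

def random_crosstalk_case(rng):
    """One validation case: a victim plus 2-10 aggressors with random
    loads, couplings, arrival-time differences, and transitions."""
    return {
        "victim_load_pF": rng.uniform(0.0, 0.1),
        "aggressors": [{
            "load_pF": rng.uniform(0.0, 0.1),
            "coupling_pF": rng.uniform(0.0, 0.01),
            "arrival_diff_ps": rng.uniform(-5000.0, 5000.0),
            "transition": rng.choice(["rise", "fall"]),
        } for _ in range(rng.randint(2, 10))],
    }

rng = random.Random(0)                 # fixed seed for reproducibility
cases = [random_crosstalk_case(rng) for _ in range(100)]  # 100 case circuits
```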
5.4.4 Pattern Selection Efficiency Analysis

Table 5.4 presents the number and percentage of selected patterns for 90% and 100% long path sensitization, respectively, when applying the pattern selection procedure to different pattern sets of wb conmax.
Table 5.4 Number and percentage of selected patterns for long path sensitization, applied to wb conmax

                                90% LP sensitization          100% LP sensitization
n-detect   Original patt. set   # sel. patt.   % sel. patt.   # sel. patt.   % sel. patt.
n = 1      1,931                734            38.01          1,410          73.02
n = 3      4,612                1,244          26.97          2,451          53.14
n = 5      7,151                1,655          23.14          3,052          42.68
n = 8      11,028               1,889          17.13          3,589          32.54
n = 10     13,562               1,929          14.22          3,847          28.37
The long path threshold LPthr in this experiment is 0.7T: a sensitized path equal to or longer than 0.7T is considered a long path; otherwise, it is a short path. It can be seen from the table that only 38.01% of the patterns are needed to sensitize 90% of the long paths sensitized by the 1-detect pattern set. As n increases, the percentage of patterns needed for long path sensitization decreases, even though the absolute number of selected patterns increases. This is because the procedure eliminates the overlap of sensitized paths between patterns during pattern selection. The following subsection will show that as n increases, the number of sensitized unique long paths increases, which demonstrates the SDD-detection efficiency of the n-detect pattern set. The table demonstrates that this pattern selection procedure is effective in selecting high-quality SDD patterns. For example, only 1,929 patterns (14.22%) and 3,847 patterns (28.37%) of the 13,562 10-detect patterns are selected to sensitize 90% and 100% of all the sensitized long paths, respectively. Note that this chapter selects patterns from the 10-detect pattern set, with 100% long path sensitization, for screening SDDs.
5.4.5 Pattern Set Comparison

With the long path threshold LPthr, one can evaluate each sensitized path and, further, each pattern in the pattern set. As mentioned in Sect. 5.3, the pattern selection procedure classifies the sensitized paths as long or short based on LPthr, and assigns a weight of 1.0 or 0.0 to each long or short path, respectively. Also, as mentioned in Sect. 5.1, the objective is to select patterns that sensitize a large number of long paths for SDD detection, i.e., the patterns with large weights under this definition. The pattern selection procedure starts from the pattern with the largest weight. A pattern weight threshold is needed to terminate the procedure and to ensure that only the most effective patterns are selected. These experiments set the pattern weight threshold Wthr = 1, which means that the procedure selects patterns until all the long paths sensitized by the original pattern set are covered. The pattern selection procedure also eliminates the sensitized long path overlap between patterns, which ensures that only the unique sensitized long paths are considered for pattern evaluation.
This significantly reduces the selected pattern count. After pattern selection, top-off ATPG (1-detect timing-unaware TDF ATPG) is run on the undetected faults to meet the fault coverage requirements. The final pattern set is therefore the selected patterns plus the top-off ATPG patterns.

Table 5.5 compares different kinds of pattern sets (n-detect pattern sets, the timing-aware pattern set, the selected patterns (shown as "sel."), the top-off ATPG patterns (shown as "topoff"), and the final pattern set (shown as "sel.+topoff")) in terms of pattern count, sensitized unique long paths, and detected unique SDDs for the IWLS benchmarks wb conmax and ac97 ctrl. It is clear that as n increases, the number of original patterns, as well as the number of sensitized long paths and detected SDDs, increases for both benchmarks. The timing-aware ATPG pattern set sensitizes more long paths and detects more SDDs than the timing-unaware 1-detect ATPG pattern set. However, its pattern count is significantly larger than that of the 1-detect set, and for wb conmax even larger than that of the 10-detect pattern set. Since the selected pattern set is a subset of the original pattern repository, the number of long paths it sensitizes is upper bounded by that of the original repository; it cannot sensitize more. However, due to the Wthr definition, it sensitizes all the long paths of the original pattern repository. Therefore, with the top-off patterns, which may incidentally sensitize some extra long paths, the final pattern set can sensitize more long paths and detect more SDDs than the original pattern repository (the 10-detect pattern set in these experiments) and even the timing-aware pattern set, as for ac97 ctrl. For wb conmax, the sensitized long paths and detected SDDs of the final pattern set are very close to those of the timing-aware pattern set. Comparing the pattern counts, it is interesting that the final pattern count is much smaller (by about 5X) than that of the original 10-detect pattern repository or the timing-aware pattern set, and sometimes even smaller than that of the 1-detect pattern set, as for ac97 ctrl.
5.4.6 The Impact of PSN and Crosstalk

To show the impact of PSN and crosstalk on path delay, this subsection ran the procedure on a selected number of paths:

1. Without considering noise impacts;
2. Considering only crosstalk effects;
3. Considering only PSN effects; and
4. Considering both crosstalk and PSN effects.
Table 5.6 shows the delays of four sensitized paths for the above four cases, when applying the 1-detect pattern set to the wb conmax benchmark. It is seen that the crosstalk effects (Column 3, shown as "xtalk") can either speed up a path, as for Path1 and Path4, or slow it down, as for Path2 and Path3. The PSN effect (Column 4, shown as "PSN"), however, always slows down the sensitized paths.
Table 5.5 Comparison of pattern count, number of sensitized long paths, and number of detected SDDs (LPthr = 0.7T)

wb conmax   n = 1    n = 3    n = 5    n = 8    n = 10   ta       sel.     topoff   sel.+topoff
# patt.     1,931    4,612    7,151    11,028   13,562   19,766   3,847    92       3,939
# LPs       12,867   19,836   22,933   26,417   27,896   28,012   27,896   43       27,939
# SDDs      25,368   29,988   31,997   33,664   34,595   34,653   34,595   51       34,646

ac97 ctrl   n = 1    n = 3    n = 5    n = 8    n = 10   ta       sel.     topoff   sel.+topoff
# patt.     1,032    2,335    3,570    5,260    6,393    4,087    184      755      939
# LPs       532      548      601      658      652      580      652      1        653
# SDDs      5,894    5,994    6,264    6,711    6,747    6,297    6,747    14       6,761
Table 5.6 The impact of PSN and crosstalk on path delay for four randomly selected paths in wb conmax

Delay (ps)   ori.      xtalk     PSN       xtalk+PSN
Path1        2,030.6   2,026.7   2,039.7   2,035.9
Path2        2,705.8   2,768.8   2,861.9   2,928.6
Path3        2,075.5   2,095.2   2,200.7   2,221.6
Path4        1,893.4   1,884.1   1,972.5   1,962.9
Table 5.7 The impact of PSN and crosstalk effects on the pattern selection results for wb conmax

wb conmax      ori.     xtalk    PSN      xtalk+PSN
# sel. patt.   1,240    1,335    1,350    1,410
# LPs          9,130    10,060   11,947   12,867
# SDDs         19,724   21,493   23,679   25,368
The combination of the two effects (Column 5, shown as "xtalk+PSN") slows down the target paths overall, even though crosstalk may speed up some paths, since the PSN effect is dominant at the 180 nm technology node. For smaller technology nodes, however, crosstalk and PSN are expected to have comparable impact on path delay.

Because path delays change when pattern-induced noise is considered, the number of selected patterns, the number of sensitized long paths, and the number of detected SDDs all change even if the long path threshold LPthr is kept intact. Table 5.7 shows the pattern selection results for the above four cases when applying the 1-detect pattern set to wb conmax. Row 2 presents the number of selected patterns in the four cases; Rows 3 and 4 present the number of sensitized long paths and the number of detected SDDs for the selected patterns. It is clearly seen that the number of long paths in the design increases as a result of the noise impact; therefore, more patterns are selected for long path sensitization and SDD detection. In fact, 2,099 short paths become long as a result of the crosstalk slow-down effect, while at the same time 1,169 long paths become short due to crosstalk speed-up for this pattern set. Therefore, 2,099 − 1,169 = 930 new long paths are introduced by crosstalk effects, as can be seen in Row 3, Columns 2 and 3. PSN can only slow paths down, and it turns 11,947 − 9,130 = 2,817 short paths into long paths, i.e., paths longer than LPthr. Note that the sum of the new long paths caused by the crosstalk and PSN effects individually (930 + 2,817 = 3,747) is not exactly equal to the number obtained when considering both effects together (12,867 − 9,130 = 3,737). This is because a few paths are pushed beyond LPthr under either effect alone; such paths are counted once in each individual case but only once in the combined case, i.e., the procedure reports only the unique new long paths.
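The unique-path accounting can be illustrated with a toy set example (the path IDs are hypothetical): paths made long under either effect alone appear in both individual counts, so the per-effect counts sum to more than the number of unique new long paths.

```python
# Toy example: which paths newly exceed LPthr under each noise model.
new_by_xtalk = {"p1", "p2", "p3"}          # crosstalk-only run
new_by_psn   = {"p3", "p4", "p5", "p6"}    # PSN-only run ("p3" in both)
print(len(new_by_xtalk) + len(new_by_psn))   # 7: per-effect counts summed
print(len(new_by_xtalk | new_by_psn))        # 6: unique new long paths
```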
5.4.7 CPU Runtime Analysis

Table 5.8 presents the CPU runtime for pattern generation with the commercial ATPG tool (Row 3, shown as "CPU (ATPG)") and for running the hybrid method (pattern grading and selection after sensitized path identification) on different pattern sets of the wb conmax benchmark.
Table 5.8 CPU runtime of different pattern sets for the wb conmax benchmark

pat. sets            1-detect    3-detect        5-detect        8-detect        10-detect       ta
# patt.              1,931       4,612           7,151           11,028          13,562          19,766
CPU (ATPG)           42 s        1 m 43 s        2 m 43 s        4 m 19 s        5 m 24 s        34 m 15 s
CPU (HB)             1 m 43 s    2 m 40 s        3 m 18 s        4 m 8 s         4 m 32 s        3 m 55 s
CPU (HB+xtalk)       45 m 50 s   1 h 50 m 24 s   2 h 47 m 20 s   4 h 22 m 18 s   5 h 21 m 28 s   5 h 40 m 33 s
CPU (HB+PSN)         9 m 16 s    13 m 59 s       17 m 48 s       19 m 34 s       24 m 8 s        26 m 40 s
CPU (HB+xtalk+PSN)   54 m 36 s   2 h 0 m 25 s    3 h 0 m 24 s    4 h 37 m 3 s    5 h 38 m 52 s   5 h 59 m 21 s
The hybrid method is run (1) without considering PSN and crosstalk noise (Row 4, shown as "CPU (HB)"), (2) considering only the crosstalk effect (Row 5, shown as "CPU (HB+xtalk)"), (3) considering only the PSN effect (Row 6, shown as "CPU (HB+PSN)"), and (4) considering both crosstalk and PSN effects (Row 7, shown as "CPU (HB+xtalk+PSN)"). From the table, it can be seen that the CPU runtime of n-detect timing-unaware ATPG is small compared to that of timing-aware ATPG (TA-ATPG), even as n increases up to 10. The CPU runtime of the hybrid method increases approximately linearly with the pattern count of the original pattern repository. Without noise calculation (Row 4, CPU (HB)), the hybrid method is much faster than timing-aware ATPG. The noise calculation, especially for crosstalk, consumes most of the CPU runtime. For example, applying the hybrid method alone to the 10-detect pattern set of this circuit takes less than 5 min, but it requires more than 5 h when considering crosstalk (Column 6). This is due to the fact that (1) each sensitized path has multiple net segments, (2) each net segment may have multiple aggressors, and (3) the procedure must extract all the information of the victim and aggressors (arrival times, transition directions, load capacitances, and the coupling capacitances between the victim and aggressors) for accurate crosstalk calculation. The PSN calculation considers only individual gates, which makes it much simpler; therefore, PSN calculation consumes much less CPU runtime than crosstalk calculation (24 min vs. 5 h 21 min for the 10-detect pattern set, as shown in Column 6). CPU runtime can be traded off against calculation accuracy: for example, if the procedure calculates the PSN and crosstalk effects only on critical long paths, it can save significant runtime, since only a small number of paths in the design are critical. Furthermore, better programming and optimized data structures and algorithms can speed up the procedure significantly.
5.5 Summary

This chapter has presented a hybrid method for SDD pattern grading and selection that considers the impact of pattern-induced noise, e.g., power supply noise and crosstalk. PSN and crosstalk calculation procedures are introduced to take their impact on path delay into consideration, and the accuracy of these procedures is validated by comparison with full-circuit SPICE simulation. The pattern selection procedure operates on n-detect pattern sets, taking advantage of the low CPU runtime and effective SDD detection of n-detect ATPG. The procedure reduces the pattern count significantly by selecting the most effective patterns in the pattern repository for SDD detection. The method was applied to two IWLS benchmarks, and the experimental results demonstrate its efficiency.
References

1. A. B. Kahng, B. Liu, X. Xu, "Statistical Gate Delay Calculation with Crosstalk Alignment Consideration," in Proc. of the 16th ACM Great Lakes Symposium on VLSI (GLSVLSI'06), pp. 223–228, 2006
2. A. H. Ajami, K. Banerjee, A. Mehrotra, and M. Pedram, "Analysis of IR-Drop Scaling with Implications for Deep Submicron P/G Network Designs," in Proc. of the Fourth International Symposium on Quality Electronic Design (ISQED'03), pp. 35–40, 2003
3. C. Tirumurti, S. Kundu, S. K. Susmita, and Y. S. Change, "A Modeling Approach for Addressing Power Supply Switching Noise Related Failures of Integrated Circuits," in Proc. of the Design, Automation and Test in Europe Conference and Exhibition (DATE'04), 2004
4. ISCAS 89 Benchmarks, "http://www.fm.vslib.cz/ kes/asic/iscas/"
5. IWLS 2005 Benchmarks, "http://iwls.org/iwls2005/benchmarks.html"
6. J. Lee and M. Tehranipoor, "A Novel Test Pattern Generation Framework for Inducing Maximum Crosstalk Effects on Delay-Sensitive Paths," in IEEE International Test Conference (ITC'08), 2008
7. J. Ma, J. Lee, and M. Tehranipoor, "Layout-Aware Pattern Generation for Maximizing Supply Noise Effects on Critical Paths," in Proc. IEEE VLSI Test Symposium (VTS'09), 2009
8. J. Saxena, K. M. Butler, V. B. Jayaram, S. Kundu, N. V. Arvind, P. Sreeprakash, and M. Hachinger, "A Case Study of IR-Drop in Structured At-Speed Testing," in Proc. Intl. Test Conf., pp. 1098–1104, 2003
9. J. Wang, D. M. Walker, A. Majhi, B. Kruseman, G. Gronthoud, L. E. Villagra, P. van de Wiel, and S. Eichenberger, "Power Supply Noise in Delay Testing," in IEEE International Test Conference (ITC'06), 2006
10. N. Ahmed, M. Tehranipoor, and V. Jayaram, "Supply Voltage Noise Aware ATPG for Transition Delay Faults," in 25th IEEE VLSI Test Symposium (VTS'07), 2007
11. N. Ahmed, M. Tehranipoor, and V. Jayaram, "Transition Delay Fault Test Pattern Generation Considering Supply Voltage Noise in a SoC Design," in Proc. Design Automation Conference (DAC'07), 2007
12. R. Tayade, J. A. Abraham, "Critical Path Selection for Delay Test Considering Coupling Noise," in Proc. IEEE European Test Symposium, pp. 119–124, 2008
13. W. Chen, S. Gupta, and M. Breuer, "Analytic Models for Crosstalk Delay and Pulse Analysis Under Non-Ideal Inputs," in Proc. of International Test Conf. (ITC'97), pp. 809–818, 1997
14. W. Chen, S. K. Gupta, and M. A. Breuer, "Test Generation for Crosstalk-Induced Delay in Integrated Circuits," in Proc. Int. Test Conf., pp. 191–200, 1999
15. "Liberty User Guide, Vol. 1, 2," Version 2007.12
Chapter 6
SDD-Based Hybrid Method
6.1 Introduction

Chapter 3 presented an SDF-based hybrid method to generate patterns with minimized pattern count and large long path sensitization. The pattern grading and selection is based on the long paths sensitized by each test pattern in the original pattern repository. This method was enhanced in Chaps. 4 and 5: Chap. 4 is based on statistical timing analysis and takes process variations and crosstalk effects into consideration when evaluating sensitized path lengths, while Chap. 5 takes the impact of power supply noise and crosstalk into consideration when calculating path delays. All these methods use sensitized long paths as the criterion for grading and selecting patterns, which makes them slow to scale to large industrial designs with millions of gates, since the number of paths in a design increases exponentially with circuit size [17–19].

This chapter presents a new pattern grading and selection procedure for screening SDDs, based on detected SDDs and their actual slack rather than on sensitized long paths as in the previous chapters. This significantly reduces the CPU runtime for path tracing and the memory needed for storing sensitized paths. Several techniques are used in this chapter to make the procedure scalable to very large industrial circuits. Before generating the original pattern repository, static timing analysis (STA)-based timing-critical fault selection is performed to save ATPG runtime and hardware resources. After pattern generation, fault simulation is performed on each individual test pattern for pattern evaluation and selection; parallel fault simulation is used to further reduce the CPU runtime. A new fault merging technique is also presented to reduce the hardware resources and data processing runtime in the pattern selection procedure. The pattern selection procedure minimizes the overlap of detected SDDs between patterns, ensuring that only the most effective patterns, with minimum overlap between detected SDDs, are selected; this further reduces the selected pattern count. 1-detect top-off ATPG is performed after pattern selection to
ensure that the final pattern set can detect all the testable TDFs in the design. Several new metrics are used to evaluate the efficiency of the final pattern set for screening SDDs on large designs.

The remainder of this chapter is organized as follows. Section 6.2 presents the techniques used for reducing CPU runtime and memory. The pattern grading and selection procedure is presented in Sect. 6.3. The procedure introduced in this chapter is applied to several academic and industrial circuits, and the experimental results are presented in Sect. 6.4. Finally, Sect. 6.5 concludes this chapter.
6.2 Techniques for Reducing Runtime and Memory

6.2.1 Critical Fault Identification

In this chapter, an n-detect pattern set is used as the original pattern repository, since it has demonstrated its efficiency for screening SDDs [3–8]. However, the CPU runtime of n-detect ATPG may still be significant when n is large. Furthermore, n-detect ATPG on the entire fault list results in a significantly large pattern count [10–16], which requires large hardware resources and long CPU runtime for the subsequent fault simulation step. In fact, a large portion of the faults in a design may never be timing-critical, and it is not necessary to run n-detect ATPG on them. Therefore, the procedure identifies and selects the timing-critical faults before running n-detect ATPG, to avoid unproductive consumption of CPU runtime and hardware resources. In practice, the presented procedure can be applied to any kind of pattern set.

This chapter uses a static timing analysis (STA) tool for critical fault selection. Note that the STA tool reports the minimum fault slack by calculating the length of the longest path running through the fault site. In reality, the actual fault slack after pattern generation may not equal this minimum value, since (1) the longest path running through the fault may not be testable, and (2) the ATPG tool does not necessarily generate patterns that detect the target fault via the longest path. A slack threshold is needed for timing-critical fault selection: all TDFs with minimum slack equal to or smaller than the pre-defined threshold are selected as timing-critical faults. This chapter considers a fault timing-critical if its minimum slack is equal to or smaller than SSLthr = 0.3T, where T is the clock period. This means that all faults on paths equal to or longer than 0.7T are selected for n-detect pattern generation. In practice, other slack thresholds can be used for critical fault selection according to the application requirements; a sketch of this selection step follows Table 6.1.

Table 6.1 shows the efficiency of the critical fault selection method on two academic circuits (ethernet and wb conmax) and one industrial circuit (shown as "Circuit A"). The slack threshold for critical fault selection is 0.3T, where T is the clock period. 10-detect ATPG is performed on both the total fault (TF) list and the selected critical fault (CF) list of each circuit. It is clearly seen that the number of faults for 10-detect ATPG is significantly reduced after critical fault selection for all three circuits.
Table 6.1 Comparison between TF- and CF-based methods for 10-detect ATPG on two academic circuits and one industry circuit

Circuit      Metric       TF-based    CF-based         Percent (%)
wb conmax    # faults     347,300     28,981           8.34
             # patterns   3,302       771              23.35
             CPU          2 m 11 s    38 s             29.01
ethernet     # faults     868,248     28,209           3.25
             # patterns   17,582      5,906            33.59
             CPU          8 m 37 s    2 m 26 s         28.24
Circuit A    # faults     3,396,938   377,857          11.12
             # patterns   >500 K      56,784
             CPU          >5 days     17 h 30 m 03 s
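The slack-threshold filter referenced above can be sketched as follows; the fault-slack map is an assumed input produced by the STA tool, and the identifiers are illustrative.

```python
def select_critical_faults(fault_slacks, T, ssl_thr=0.3):
    """Keep TDFs whose STA-reported minimum slack is <= SSLthr * T,
    i.e., faults on paths of length >= (1 - SSLthr) * T."""
    limit = ssl_thr * T
    return [f for f, slack in fault_slacks.items() if slack <= limit]

# Example with T = 1,000 ps: faults on paths >= 700 ps long survive.
slacks = {"U1/A (rise)": 120.0, "U2/Z (fall)": 450.0, "U3/B (rise)": 299.9}
print(select_critical_faults(slacks, T=1000.0))
# -> ['U1/A (rise)', 'U3/B (rise)']
```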
Fig. 10.6 An example of the cure injection and curable output
Checking whether an output is curable by a signal f can be done by fault simulation. Since there may be millions of signals in a CUD and thousands of test patterns, this process can be very time-consuming, and several approximation techniques have been proposed to speed it up. The authors in [22] proposed a back propagation heuristic that obtains candidate signals by tracing back from the target primary output through its entire fanin cone. This technique can collect all possible single-defect candidates in one traversal pass of the circuit under diagnosis (CUD), and is therefore much faster than fault simulation; the penalty is a loss of accuracy. The authors in [16] improved the accuracy of the back propagation technique. In general, a signal with many curable outputs is likely to be a fault location [13]. In fact, the number of curable outputs is a major ranking metric for evaluating the possibility of a signal being a defect site in the inject-and-evaluate technique: if a signal has many curable outputs, it is more likely to be a defect site and therefore receives a higher rank.
10.2.3.3 Curable Vector

A test pattern (test vector) p is called a curable vector of a signal f if the injection at f recreates all the output mismatches of the circuit under diagnosis with respect to p (i.e., produces exactly the same output responses as the defective chip). A curable vector is also referred to as a single location at a time (SLAT) pattern [18]. The number of curable vectors is a better ranking metric than the number of curable outputs for the reasons listed below:
Fig. 10.7 An illustration of a curable vector ((a) failing chip; (b) circuit under diagnosis)
1. The curable vector is a very stringent condition. It checks not only the recreation of the failing syndrome but also the side effects of the injection (i.e., whether any new mismatch is created) when grading the signal. If a signal is able to cure a failing vector, this strongly indicates that the signal is one of the defect sites.
2. The curable vector-based metric checks all the output responses simultaneously, instead of one by one as the curable output-based metric does.
3. The curable vector-based metric can be used to check whether a signal f is a single-fault candidate. Generally, if a failing chip has only one single fault, there will always be a signal in the CUD that can cure all failing test vectors.

Figure 10.7 illustrates an example of a curable vector p1. When applying the test pattern p1 to the faulty circuit, three output mismatches are observed (compared with the fault-free simulation results), as can be seen in Fig. 10.7a. After flipping the value of signal f in the CUD, the value-change event propagates to all three mismatched outputs but not to any originally matched output. Therefore, p1 is a curable vector with respect to signal f. The vector in Fig. 10.8 is not a curable vector, since it creates a new mismatched output.

In the inject-and-evaluate procedure, both the number of curable vectors and the number of curable outputs are calculated for each signal and used to rank the signals. The ranking process follows the rules below:

1. A signal with a larger number of curable vectors has a higher rank.
2. When two signals have the same number of curable vectors, the number of curable outputs is used to break the tie.
3. When a new mismatch is created, as in Fig. 10.8, a penalty is imposed on the signal; for example, rank = #curable outputs − 0.5 × #new mismatched outputs [23].
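These rules translate directly into a sort key. The sketch below assumes the per-candidate counts have already been collected by fault simulation; the record field names are illustrative.

```python
def rank_candidates(candidates):
    """Sort defect-site candidates: curable vectors first, then curable
    outputs penalized by 0.5 per newly created mismatch (as in [23])."""
    def key(c):
        return (c["curable_vectors"],
                c["curable_outputs"] - 0.5 * c["new_mismatches"])
    return sorted(candidates, key=key, reverse=True)

suspects = [
    {"name": "n17", "curable_vectors": 3, "curable_outputs": 9,
     "new_mismatches": 0},
    {"name": "n42", "curable_vectors": 3, "curable_outputs": 11,
     "new_mismatches": 6},
]
# Tie on curable vectors; n17's penalized output score (9.0) beats
# n42's (11 - 0.5 * 6 = 8.0).
print([c["name"] for c in rank_candidates(suspects)])  # ['n17', 'n42']
```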
Fig. 10.8 An illustration of a non-curable vector ((a) failing chip; (b) circuit under diagnosis)
10.3 Diagnosis of Scan Chain

Combinational logic diagnosis assumes that the sequential elements of the chip (scan chains and flip-flops) are fault-free. However, it has been reported that in some cases scan chain failures account for over 50% of chip failures [23]. Identifying and locating scan chain defects is therefore also very important for in-field reliability and yield improvement.
10.3.1 Preliminary Scan Chain Diagnosis

As with combinational circuit diagnosis, both cause-effect and effect-cause analysis can be used for scan chain diagnosis, and certain fault types are commonly targeted [24, 25]. Fault types are usually classified as functional faults or timing faults: stuck-at and bridging faults are typically used for diagnosing functional faults, while slow-to-fall and slow-to-rise transition faults are used for diagnosing timing faults. Faults can also be classified as permanent or intermittent. A permanent fault occurs in any situation in which it is sensitized, such as a stuck-at fault. In contrast, an intermittent fault occurs only in certain operating environments, such as a timing fault under severe power supply noise or crosstalk, or a bridging fault.

Several scan chain faults can be detected, or even diagnosed, with flush patterns.
Fault          Scan inputs   Scan outputs
Stuck-at-0     001100        000000
Stuck-at-1     001100        111111
Slow-to-rise   001100        001000
Slow-to-fall   001100        011100
Fig. 10.9 An example of the scan chain fault syndromes under a flush pattern. (a) the faulty scan chain, and (b) the observed outputs with different fault types at the fault site
Figure 10.9 shows an example of a faulty scan chain; as seen in Fig. 10.9a, the fault is located at the output of the third flip-flop counted from the leftmost side. The pattern flushed into the scan chain is {001100}, where the rightmost bit of the pattern enters the scan chain first and shifts out of the scan chain first. Depending on the fault type at the fault site, the observed output differs, as can be seen in Fig. 10.9b. Here, it is assumed that the extra delay added by the slow-to-fall or slow-to-rise fault is larger than one clock period, so that it fails the flush test, but smaller than two clock periods, so that it behaves as a timing fault rather than a permanent stuck-at fault.

It should be mentioned that the flush test pattern is not sufficient to locate the fault: if the same fault type is located at a different flip-flop, the observed outputs are the same as those shown in Fig. 10.9b. The flush test pattern does, however, help classify the fault type. For example, an all-0 syndrome indicates a stuck-at-0 fault in the scan chain, while an all-1 syndrome indicates a stuck-at-1 fault; the slow-to-fall and slow-to-rise faults can be recognized as shown in Fig. 10.9. The diagnosis of bridging faults in the scan chain is usually more challenging due to the large candidate space and their intermittent nature.
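The syndromes in Fig. 10.9b can be reproduced with a simple bit-level model. The sketch below is an assumed simplification (each late transition misses exactly one shift clock) rather than a circuit-accurate simulator; note that the fault position does not appear at all, which is exactly why a flush pattern classifies but cannot locate the fault.

```python
def flush_syndrome(pattern, fault):
    """Observed scan output for a flush pattern through a chain with a
    single fault; the rightmost bit enters and leaves the chain first."""
    seq = [int(b) for b in reversed(pattern)]  # bits in shift order
    out, prev = [], 0
    for bit in seq:
        if fault == "stuck-at-0":
            obs = 0
        elif fault == "stuck-at-1":
            obs = 1
        elif fault == "slow-to-rise" and (prev, bit) == (0, 1):
            obs = 0  # rising edge at the fault site misses one shift clock
        elif fault == "slow-to-fall" and (prev, bit) == (1, 0):
            obs = 1  # falling edge at the fault site misses one shift clock
        else:
            obs = bit
        out.append(obs)
        prev = bit  # the delayed transition still completes before the next bit
    return "".join(str(b) for b in reversed(out))

for f in ("stuck-at-0", "stuck-at-1", "slow-to-rise", "slow-to-fall"):
    print(f, flush_syndrome("001100", f))
# stuck-at-0 000000, stuck-at-1 111111,
# slow-to-rise 001000, slow-to-fall 011100  (matches Fig. 10.9b)
```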
10.3.2 Hardware-Assisted Diagnosis

Generally speaking, the diagnosis of a scan chain is more difficult than that of the combinational circuit, due to the following facts: (1) when diagnosing the combinational circuit, the scan chain is assumed to be fault-free, so the responses of the combinational circuit can be captured by the flip-flops and shifted out properly; (2) the same assumption cannot be made when diagnosing scan chains, so the observability of a scan chain is limited to its scan outputs; and (3) tracing back from a scan chain failure to locate the fault site is more difficult than in combinational logic, because every observed bit passes through the fault site and may be distorted.
Fig. 10.10 (a) Normal D flip-flop, (b) equivalent scan flip-flop, and (c) hardware-assisted scan flip-flop used for diagnosis
The basic idea of hardware-assisted diagnosis is to insert extra logic into the scan chain to facilitate the diagnosis process [26, 27] and locate the fault site. As mentioned in Chap. 1, a MUX is added to a normal D flip-flop (as shown in Fig. 10.10a) to produce a scan flip-flop (as shown in Fig. 10.10b) for test purposes; two additional pins, scan-in (SD) and scan enable (SE), are added to the flip-flop. The hardware-assisted diagnosis method inserts additional logic into the scan flip-flop, i.e., an XOR gate as shown in Fig. 10.10c, for the purpose of diagnosis. In this case, an extra pin (named INVERT in Fig. 10.10c) is added to control the XOR gate. The hardware-assisted scan flip-flop has three operation modes:

1. Normal operation mode, when SE = 0. In this mode, the input of the flip-flop comes from the logic circuit.
2. Scan operation mode, when SE = 1 and INVERT = 0. In this mode, the input of the flip-flop comes from its scan input, connected to the output of the previous flip-flop or to the primary scan input of the circuit.
3. Inverted scan operation mode, when SE = 1 and INVERT = 1. In this mode, the input of the flip-flop is the inverted value of its scan input.

Scan chain diagnosis can then be done by comparing the scan outputs in the scan operation mode and the inverted scan operation mode. The following definitions help in understanding the hardware-assisted diagnosis method:
10.3.2.1 Snapshot Image

At a certain time instant, each scan flip-flop holds a logic value, and the combination of these logic values is called the snapshot image of the scan chain at that particular time instant.
It should be noted that the snapshot image of a faulty scan chain may not be observable, since (1) the scan flip-flop values can be observed only if they can be shifted out correctly, and (2) the shifted-out values may be distorted by the fault in the scan chain.
10.3.2.2 Observed Image

The shifted-out values observed at the scan output are called the observed image of the scan chain; it is the scan-out version of the snapshot image. The two images can differ in the presence of faults in the scan chain. With the hardware-assisted scan flip-flop, scan chain diagnosis is performed in the following steps:

1. Scan in a flush pattern, e.g., an all-1 pattern for a stuck-at-0 fault and an all-0 pattern for a stuck-at-1 fault.
2. Invert the scan chain by enabling the INVERT signal.
3. Apply one clock so that the scan chain records the inversion.
4. Disable the INVERT signal and scan out the snapshot image of the scan chain.

Here, the circuit shown in Fig. 10.9a is used as an example of how the hardware-assisted scan flip-flop helps in diagnosing the scan chain. First, the inversion XOR gate is inserted at each flip-flop; assume that the scan output is also inverted by an XOR gate, as shown in Fig. 10.11. It is also assumed that the fault is a stuck-at-0 located at the output of the third flip-flop, counted from the leftmost side, as shown in Fig. 10.11a. The snapshot image of the scan chain is captured at the output of the XOR gate. Therefore, after scanning in the all-1 pattern, the snapshot image of the scan chain is {110000}. The INVERT signal is then enabled to invert the snapshot image to {001111}, and disabled after one shift clock to ensure proper scan-out (the waveform of the INVERT signal is shown in Fig. 10.11b). Continuing to apply the shift clock yields the observed image {001111}. The fault is evidently located at the boundary between the 0's and 1's. A stuck-at-1 fault can be located in the same way by applying an all-0 pattern. A behavioral sketch of this procedure is given below.
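The sketch below simulates the scan-in, invert, and scan-out steps for a chain with a single stuck-at fault and locates the fault at the 0/1 boundary of the observed image. The chain model is our own simplification of the hardware in [26, 27], not RTL from those works; the exact boundary position can shift by one cell depending on where the defect sits relative to the XOR.

```python
# Behavioral sketch of invert-based scan-chain diagnosis. Cell indices are
# 0-based from the scan-in side; the fault is a stuck value `v` on the Q
# output of cell `k`. The chain abstraction is illustrative only.

def shift_out_through_fault(states, k, v):
    """Scan out all bits; any bit crossing the stuck node is forced to v.
    Returns the observed image re-ordered so index 0 = cell nearest SI."""
    n, chain, observed = len(states), list(states), []
    for _ in range(n):
        observed.append(v if k == n - 1 else chain[-1])
        for j in range(n - 1, 0, -1):          # shift one step toward SO
            chain[j] = v if j - 1 == k else chain[j - 1]
        chain[0] = 0                           # don't-care fill from SI
    return observed[::-1]

def diagnose_stuck_at(n, k, v):
    flush = 1 - v                              # all-1 for SA0, all-0 for SA1
    # after scan-in, every cell loaded through the stuck node holds v
    snapshot = [flush if j <= k else v for j in range(n)]
    outputs  = [v if j == k else b for j, b in enumerate(snapshot)]
    inverted = [1 - b for b in outputs]        # one clock with INVERT = 1
    observed = shift_out_through_fault(inverted, k, v)
    # cells before the fault shift out the stuck value; cells after it
    # shift out inverted bits, so the first non-v bit marks the fault site
    boundary = next((j for j, b in enumerate(observed) if b != v), n - 1)
    return observed, boundary

obs, site = diagnose_stuck_at(n=6, k=2, v=0)   # SA0 at the third flip-flop
print(obs, "-> fault at the output of flip-flop", site)  # [0,0,0,1,1,1] -> 3
```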
10.3.3 Inject-and-Evaluate Scan Chain Diagnosis

As mentioned above, although the flush test pattern is ineffective for locating the fault, it does help classify the fault type, which benefits the subsequent diagnosis process. Once the fault type is recognized, the inject-and-evaluate paradigm can be used for scan chain diagnosis as well [28]. As in combinational circuit diagnosis, the basic idea of inject-and-evaluate scan chain diagnosis is to inject a fault, run simulation, and based on the result determine which signals are more responsible for the scan chain failure and thus more likely to be the defect sites. Inject-and-evaluate scan chain diagnosis is a software method without any area overhead.
Fig. 10.11 Scan chain diagnosis with hardware-assisted scan flip-flops: (a) the faulty scan chain, (b) the signal waveform, and (c) the responses with different fault types at the fault site

Fault                              Stuck-at-0   Stuck-at-1
Scan inputs                        111111       000000
Snapshot image (after scan-in)     110000       001111
Snapshot image (after inversion)   001111       110000
Observed image (after scan-out)    001111       110000
Generally, the following operations are performed for inject-and-evaluate scan chain diagnosis:

1. Apply the flush test patterns to classify the fault type.
2. For each failing test pattern, shift it into the scan chain of a fault-free CUD, apply the test pattern to the combinational circuit and capture the responses, and then shift out and observe the response (i.e., scan-capture-scan). The internal values of the circuit can also be obtained by simulation on the fault-free CUD.
3. For each fault candidate, inject the fault, perform the same scan-capture-scan process as in the previous step, and derive the observed image by event-driven simulation.
4. Compare the observed image with the failing syndrome on chip. A signal is most likely to be a fault candidate if it reproduces the fault syndrome.

Figure 10.12 shows an example of the scan-capture-scan process on one scan chain. Assume that the shifted-in test vector is {101110} and that there is a stuck-at-0 fault at the output of the third flip-flop, counted from the leftmost side. The snapshot image of the scan chain after scanning in is therefore {100000}, which is applied to the combinational circuit, as shown in Fig. 10.12a.
Fig. 10.12 The scan-capture-scan procedure in the inject-and-evaluate scan chain diagnosis. (a) Scan in and apply the test vector (scan-enable (SE) = 1), (b) capture the response of the combinational circuit (SE = 0), (c) scan out the captured values (SE = 1)
The responses of the combinational circuit (assumed to be {010110}) are then captured into the scan chain after the scan-enable (SE) signal is disabled, as shown in Fig. 10.12b. After scanning out, the observed image is {000110} because of the stuck-at-0 fault in the scan chain, as shown in Fig. 10.12c. If a failing chip produces the same response as the inject-and-evaluate simulation ({000110} in this example) for the test vector {101110}, it is most likely that the chip contains the injected fault. When no perfect match can be found, the fault-ranking methodologies discussed in the combinational logic diagnosis section can still be used to estimate the likelihood that a signal is a fault site: the higher a signal's ranking score, the more likely it is to be a fault site. A sketch of the scan-capture-scan step appears below. Several other scan chain diagnosis techniques are not covered in this book, such as the signal-profiling-based method [29], which selects diagnostic test sequences in functional mode to analyze the fault effects. Compared with traditional cause-effect or effect-cause analysis, the signal-profiling-based method is much faster, since it does not need to simulate a large number of fault candidates.
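The following sketch reproduces one inject-and-evaluate pass on the example above. The combinational logic is stubbed out by a fixed response, standing in for the event-driven netlist simulation a real tool would run; the fault model (a stuck node at the output of cell k) and all names are illustrative.

```python
# Sketch of one inject-and-evaluate scan-capture-scan pass. `comb` stands in
# for event-driven netlist simulation; the fault is a stuck value v on the
# output net of scan cell k (0-based from the scan-in side).

def scan_in(vector, k, v):
    """Chain state after shifting: cells past the stuck node load v."""
    return [v if j > k else bit for j, bit in enumerate(vector)]

def visible_outputs(state, k, v):
    """What the combinational logic sees; cell k's output reads v."""
    return [v if j == k else b for j, b in enumerate(state)]

def scan_out(state, k, v):
    """Observed image: bits from cells at or before the node cross it."""
    return [v if j <= k else bit for j, bit in enumerate(state)]

def inject_and_evaluate(vector, comb, k, v):
    applied  = visible_outputs(scan_in(vector, k, v), k, v)   # SE = 1
    response = comb(applied)                                  # SE = 0, capture
    return scan_out(response, k, v)                           # SE = 1

comb = lambda applied: [0, 1, 0, 1, 1, 0]   # stub for the logic's response
observed = inject_and_evaluate([1, 0, 1, 1, 1, 0], comb, k=2, v=0)
print(observed)   # [0, 0, 0, 1, 1, 0]: matches the failing syndrome, so the
                  # candidate "stuck-at-0 after flip-flop 3" is ranked high
```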
10.4 Chip-Level Diagnosis

This section introduces chip-level strategies for diagnosing multiple faults on a large chip simultaneously. The terms structurally independent fault and structurally dependent fault are often used in chip-level diagnosis: if a fault's fanout cone does not overlap with the fanout cone of any other fault, it is referred to as a structurally independent fault; otherwise, it is a structurally dependent fault. An independent fault is easier to identify than a dependent fault, because an independent fault is the sole cause of the mismatches at its reachable outputs. The multiple-fault diagnosis problem can be decomposed into several block-level diagnosis problems using a divide-and-conquer strategy [20]. In general, chip-level diagnosis is performed in two phases. The first phase performs fault simulation with the failing test patterns to identify the prime candidates. The second phase performs structural analysis and decomposes the CUD into several diagnosis blocks, so that each block can be diagnosed using the block-level techniques [23]; a sketch of the structural-independence check follows this paragraph. Note that the diagnosis resolution depends not only on the diagnosis algorithm but also on the test pattern quality. During test pattern generation, various techniques are used to maximize fault coverage and minimize pattern count. As a result, most failing patterns are these high fault-coverage test patterns, and they are the ones used for diagnosis; however, high fault-coverage patterns do not necessarily guarantee high diagnostic resolution. If the resolution of the existing failing patterns is insufficient, extra diagnosis patterns can be generated using a diagnostic test pattern generation (DTPG) process. In general, DTPG tries to generate diagnostic patterns that further distinguish fault candidates that are indistinguishable under the existing failing patterns.
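The structural check itself reduces to fanout-cone reachability. Below is a minimal sketch under our own assumptions: the netlist is represented as a successor map from each gate to the gates it drives, which is an illustrative abstraction rather than how any particular tool stores the design.

```python
# Sketch of the structural-independence check used to partition a
# multiple-fault diagnosis problem into independent blocks.

def fanout_cone(netlist, gate):
    """All gates reachable from `gate`, i.e., its fanout cone."""
    seen, stack = set(), [gate]
    while stack:
        g = stack.pop()
        for succ in netlist.get(g, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

def independent_candidates(netlist, candidates):
    """Candidates whose fanout cones overlap no other candidate's cone;
    these can be diagnosed in isolation, block by block."""
    cones = {c: fanout_cone(netlist, c) | {c} for c in candidates}
    result = []
    for c in candidates:
        others = set().union(*(cones[o] for o in candidates if o != c))
        if not (cones[c] & others):
            result.append(c)
    return result

# toy netlist: g1 and g2 share gate g4, so only g3 is independent
netlist = {"g1": ["g4"], "g2": ["g4"], "g3": ["g5"], "g4": [], "g5": []}
print(independent_candidates(netlist, ["g1", "g2", "g3"]))   # ['g3']
```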
References

1. J. B. Khare, W. Maly, S. Griep, and D. Schmitt-Landsiedel, "Yield-oriented computer-aided defect diagnosis", in IEEE Trans. Semiconductor Manufacturing, pp. 195–206, 1995
2. J. Segura, A. Keshavarzi, J. Soden, and C. Hawkins, "Parametric failures in CMOS ICs: A defect-based analysis", in Proc. International Test Conf., pp. 90–99, 2002
3. R. E. Tulloss, "Size optimization of fault dictionaries", in Proceedings of the 1978 Semiconductor Test Conference, pp. 264–265, 1978
4. I. Pomeranz and S. M. Reddy, "On the generation of small dictionaries for fault location", in Proceedings of the 1992 International Conference on Computer-Aided Design, pp. 272–279, 1992
5. V. Boppana and W. K. Fuchs, "Fault dictionary compaction by the elimination of output sequences", in Proceedings of the 1994 International Conference on Computer-Aided Design, pp. 576–579, 1994
6. P. G. Ryan, W. K. Fuchs, and I. Pomeranz, "Fault dictionary compression and equivalence class computation for sequential circuits", in Proceedings of the 1993 International Conference on Computer-Aided Design, pp. 508–511, 1993
7. P. G. Ryan, "Compressed and dynamic fault dictionaries for fault isolation", CRHC Technical Report UILU-ENG-94-2234, 1994
8. V. Boppana, I. Hartanto, and W. K. Fuchs, "Full fault dictionary storage based on labeled tree encoding", in Proceedings of VLSI Test Symposium, 1996
9. J. Wu and E. M. Rudnick, "Bridging fault diagnosis using stuck-at fault simulation", in IEEE Trans. on Computer-Aided Design, pp. 489–495, 2000
10. R. Davis and H. E. Shrobe, "Diagnostic reasoning based on structure and behavior", in Artificial Intelligence, vol. 24, pp. 347–410, 1984
11. S. J. Sangwine, "Fault diagnosis in combinational circuits using a backtrack algorithm to generate fault location hypotheses", in Proceedings IEE-G, vol. 135, pp. 247–252, 1988
12. J. A. Waicukauski and E. Lindbloom, "Failure diagnosis of structured VLSI", in IEEE Design Test Comput., pp. 49–60, 1989
13. I. Pomeranz and S. M. Reddy, "On correction of multiple design errors", in IEEE Trans. Computer-Aided Design, pp. 255–264, 1995
14. S.-Y. Huang, K.-T. Cheng, K.-C. Chen, and D.-T. Cheng, "ErrorTracer: A fault simulation based approach to design error diagnosis", in Proc. IEEE International Test Conference, pp. 974–981, 1997
15. S. Venkataraman and W. K. Fuchs, "A deductive technique for diagnosis of bridging faults", in Proc. IEEE International Conference on Computer-Aided Design, pp. 313–318, 2000
16. A. G. Veneris and I. N. Hajj, "A fast algorithm for locating and correcting simple design errors in VLSI digital circuits", in Proc. Great Lakes Symposium, pp. 45–50, 1997
17. B. Boppana, R. Mukherjee, J. Jain, and M. Fujita, "Multiple error diagnosis based on Xlists", in Proc. Design Automation Conference, pp. 100–110, 1999
18. T. Bartenstein, D. Herberlin, L. Huisman, and D. Sliwinski, "Diagnosing combinational logic design using the single location at-a-time (SLAT) paradigm", in Proc. IEEE International Test Conference, pp. 287–296, 2001
19. S.-Y. Huang, "On improving the accuracy of multiple fault diagnosis", in Proc. IEEE VLSI Test Symp., pp. 34–39, 2001
20. Z. Wang, K. H. Tsai, M. Marek-Sadowska, and J. Rajski, "An efficient and effective methodology on the multiple fault diagnosis", in Proc. IEEE International Test Conference, pp. 329–338, 2003
21. J. B. Liu and A. Veneris, "Incremental fault diagnosis", in IEEE Trans. Computer-Aided Design, pp. 240–251, 2005
22. A. Kuehlmann, D. I. Cheng, A. Srinivasan, and D. P. Lapotin, "Error diagnosis for transistor-level verification", in Proc. of Design Automation Conf., pp. 218–223, 1994
23. L. Wang, C. Wu, and X. Wen, "VLSI Test Principles and Architectures – Design for Testability", Morgan Kaufmann Publishers, 2006
24. S. Kundu, "On diagnosis of faults in a scan chain", in Proc. IEEE VLSI Test Symposium, pp. 303–308, 1993
25. Y. Huang, W.-T. Cheng, S.-M. Reddy, C.-J. Hsieh, and Y.-T. Hung, "Statistical diagnosis for intermittent scan chain hold-time fault", in Proc. IEEE Int. Test Conference, pp. 319–328, 2003
26. J. L. Schafer, "Partner SRLs for improved shift register diagnosis", in Proc. IEEE VLSI Test Symposium, pp. 198–201, 1992
27. Y. Wu, "Diagnosis of scan chain failures", in Proc. Int. Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 217–222, 1998
28. K. Stanley, "High-accuracy flush-and-scan software diagnostics", in IEEE Design Test Comput., pp. 56–62, 2001
29. J.-S. Yang and S.-Y. Huang, "Quick scan chain diagnosis using signal profiling", in Proc. IEEE Int. Conference on Computer Design, pp. 157–160, 2005
30. "YieldAssist User Guide Version 8.2009 1", Mentor Graphics Inc., 2009
Chapter 11
Diagnosing Noise-Induced SDDs by Using Dynamic SDF
11.1 Introduction

Timing analysis is a very important step in validating both an IC design's performance and the quality of the test patterns [2] applied to that IC. Many signal integrity (SI) issues in a design may impact its timing performance, often in the form of small-delay defects (SDDs); IR-drop and crosstalk effects are prominent examples. These SI issues are pattern-dependent parasitic effects that may significantly impact design performance in the latest technologies.
11.1.1 Techniques for Timing Analysis

Several techniques are used for timing analysis and validation, such as transistor-level simulation with SPICE, gate-level simulation with Standard Delay Format (SDF) annotation, and the pattern-independent Static Timing Analysis (STA) technique [1, 3, 9].

11.1.1.1 SPICE Simulation

The Simulation Program with Integrated Circuit Emphasis (SPICE) [9] is the most trusted and comprehensive analog circuit simulator in industry and is often used as a "gold standard." Unfortunately, due to its computational complexity, SPICE cannot simulate an entire design with millions of gates.

11.1.1.2 SDF-Based Simulation

Gate-level netlist simulation with SDF annotation is widely used for logic and timing verification of designs [1]. For this kind of
simulation, a testbench is built to provide stimuli and to check the responses of the design. The timing information of each gate and interconnect in the design, extracted from a standard library, is annotated to the design during simulation for performance analysis and evaluation. SDF-based digital simulation is much faster than SPICE for design analysis and verification, and it scales easily to large designs.
11.1.1.3 Static Timing Analysis

The STA technique is also SDF-based: the timing information of each gate and interconnect is extracted from the standard library and annotated to the design during performance analysis and evaluation [3]. Unfortunately, the SDF-based methods can be inaccurate because they ignore important parasitic effects such as IR drop and crosstalk, the SDF being pattern-independent. In this chapter, the conventional pattern-independent SDF is called static SDF. Although the static SDF provides min/typical/max delay values for best/typical/worst cases, it still may not reflect the real situation accurately, since it does not consider the test patterns, the pattern-dependent parasitic effects, or environmental variations.
11.1.2 Prior Work on PSN and Crosstalk

The impact of power supply noise (PSN) and crosstalk on circuit performance has been addressed in many prior works, as discussed in Chaps. 4 and 5. Most of the previous works target IR-drop modeling or IR-drop-aware pattern generation rather than the diagnosis of IR-drop-induced defects. Furthermore, none of the previous works validated the accuracy of their methods by comparison with more trusted references, such as silicon data or SPICE simulation results.
11.1.3 Chapter Contents and Organization

This chapter presents the details of the IR2Delay database method and, based on it, demonstrates a flow that uses dynamic SDF to model IR drop. The flow enables fast and accurate IR-drop-aware digital simulation for verifying design specifications and test patterns. It can also be used to diagnose IR-drop-induced defects by reporting the gates that experience severe IR drop and delay increase, thereby improving the diagnosis resolution for IR-drop-caused failures. A mixed-signal simulation-based flow for validating the accuracy of the IR2Delay database against full-circuit SPICE simulation results is also demonstrated. The setup of the Xtalk2Delay database and the generation of crosstalk-aware dynamic SDF are
very similar to the IR2Delay database-based work. Therefore, this chapter focuses only on setting up and validating the IR2Delay database, and on diagnosing SDDs induced by IR drop. The remainder of this chapter is organized as follows. Section 11.2 presents the IR-drop analysis flow using a commercial EDA tool. Section 11.3 presents the setup of the IR2Delay database. Section 11.4 introduces the mixed-signal simulation procedure used for validating the IR2Delay database-based delay calculation. The experimental results on validating the IR2Delay database are presented in Sect. 11.5. In Sect. 11.6, diagnosis is performed to validate the failure paths and to pinpoint key gates responsible for IR-drop failures. Experimental results on diagnosing IR-drop-induced SDDs are presented in Sect. 11.7. Finally, Sect. 11.8 concludes this chapter.
11.2 IR-Drop Analysis

As technology scales, the circuit density and operating frequency of current VLSI designs increase, leading to severe power density problems. Power supply noise can be introduced by inductive and resistive parameters. The inductance-introduced noise, usually referred to as Ldi/dt noise, depends on the rate of change of the instantaneous current flowing through the power distribution network (PDN), as well as on the inductance L, which is mainly introduced by packaging. The resistance-introduced noise, usually referred to as IR drop, depends on the current and the distributed resistance of the PDN. This chapter focuses only on the resistance-introduced power supply noise and its impact on design performance.

Figure 11.1 shows a simplified PDN for a standard-cell-based design. In this design, standard cells are placed side by side in rows. Local power rails between standard cell rows are formed in a lower metal layer (Metal 1 or Metal 2, depending on the cells in the library). The global power rails are routed in upper metal layers, and power vias connect the global and local power rails. When cells switch, they either draw current from the power rails (to charge their output nodes and capacitors) or dump current into the ground rails (to discharge their output nodes). Due to the resistance of the PDN, this current results in a voltage drop on the power network and/or a voltage increase (also known as ground bounce) on the ground network. For simplicity, only the voltage drop on the power network is considered in this chapter; the voltage increase on the ground network could easily be added to the procedure using the same flow.

The IR-drop analysis flow is illustrated in Fig. 11.2. In the first step, ATPG is run to generate at-speed patterns targeting TDFs. For each TDF pattern, simulation is run to obtain its value change dump (VCD) file, with which IR-drop analysis is performed using a commercial EDA tool [6] to obtain the average IR drop, within a specified timing window, for each gate in the design. The specified timing window for each at-speed pattern lies within the launch and capture clock cycles.
Fig. 11.1 A simplified view of PDN

Fig. 11.2 The flow for IR-drop analysis
Note that the actual voltage at a specific gate varies dynamically during the launch-to-capture window. Figure 11.3 illustrates an example of the real voltage between the launch and capture cycles for one gate in this experiment. The IR-drop analysis tool can only report the average voltage drop of gates in a user-defined timing window. Therefore, the timing window for IR-drop analysis should be carefully selected so that the IR-drop results accurately reflect the real situation in terms of performance impact. Our timing window for IR-drop analysis starts at the launch clock cycle.
Fig. 11.3 Dynamic IR-drop on a gate during the launch and capture cycles
The end point of the timing window is chosen based on SPICE simulation, so that the delay induced by the average IR drop and the delay induced by the real dynamic IR drop are as close as possible. The average IR drop can then be used to evaluate the extra delay on each gate and interconnect in the design, as in the sketch below.
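As a concrete illustration of the window averaging, the sketch below computes a gate's mean supply voltage over the launch-to-capture window from sampled waveform points. The function and the sample trace are our own illustrations; the commercial tool performs this computation internally.

```python
# Sketch: averaging a gate's sampled VDD over a user-defined timing window,
# as the IR-drop analysis tool does. Sample points would come from
# simulation; the values below are placeholders.

def average_vdd(samples, t_launch, t_end):
    """Mean VDD over [t_launch, t_end] from (time, volt) samples, using
    trapezoidal integration so uneven sample spacing is handled."""
    window = [(t, v) for t, v in samples if t_launch <= t <= t_end]
    if len(window) < 2:
        raise ValueError("need at least two samples inside the window")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(window, window[1:]):
        area += 0.5 * (v0 + v1) * (t1 - t0)
    return area / (window[-1][0] - window[0][0])

# e.g., a gate nominally at 1.8 V sagging during the launch cycle (times in ns)
trace = [(0.0, 1.80), (1.0, 1.62), (2.0, 1.55), (3.0, 1.66), (4.0, 1.74)]
avg = average_vdd(trace, t_launch=0.0, t_end=4.0)   # ~1.65 V
ir_drop = 1.8 - avg                                 # average IR drop
```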
11.3 IR2Delay Database

This section sets up the IR2Delay database, which maps the average power supply voltage drop to the delay increase for each cell model in the library, based on SPICE simulation. Mentor Graphics Eldo [7] is used for SPICE simulation, and an in-house tool was developed to automatically extract gate SPICE models from the library, set up test circuits, run the simulations, and extract the simulation results.
11.3.1 Transition Analysis

For each cell model in the library, the delay from every input to the output is measured for both rising and falling transitions. Clearly, when the cell has multiple input pins, there must be a transition on the targeted input pin when measuring the propagation delay from it to the output of the cell. All the other input pins, called off-path pins, must hold non-controlling values so that the transition on the targeted input pin propagates to the output. Furthermore, the status of the off-path pins may affect the results. Consider a 2-input AND gate as an example.
Fig. 11.4 Rising edge transition propagation of an AND2X1 gate

Table 11.1 Propagation delay from A to Y for an AND2X1 gate (load capacitance: 50 fF)

Test case    Case (1)  Case (2)  Case (3)  Case (4)  Case (5)  Case (6)  Case (7)
Delay (ns)   0.1288    0.1288    2.132     0.1371    –         0.1288    –
As illustrated in Fig. 11.4, a total of seven cases should be considered when measuring the rising-edge propagation delay from input pin A to output pin Y. The propagation delays of all these cases, measured via SPICE simulation, are listed in Table 11.1. Measuring the delay in case (1), where off-path pin B is stable, is straightforward. If off-path pin B has a rising transition that arrives earlier than the transition on pin A, as shown in case (2), pin B is already stable when the transition on pin A arrives, which is the same as case (1). Similarly, in case (3), the output transition is determined by the transition on pin B, and pin A should be considered the off-path pin; for instance, due to a 2 ns difference between the A and B transitions, a 2.1320 ns delay from pin A to pin Y is obtained. When pin B has a rising transition simultaneously with pin A, as shown in case (4), it affects the propagation delay from pin A: Table 11.1 shows that the delay increase is about 6.4% compared with case (1) or (2). When pin B has a falling transition, cases (5) and (7) need not be considered, since they cannot ensure that the transition on pin A propagates to output pin Y. In other words, the falling transition on pin B has to arrive later than pin A's transition, as shown in case (6); in this case, the rising-transition propagation delay from A to Y is also the same as in case (1).
Table 11.2 Propagation delay comparison for BUFX3 gate with different pre-drivers (Vdd: 1.8 V, load capacitance: 100 fF, delay type: rising edge)

Pre-driver          Pulse    INV1    INV2    NAND2X1  NOR2X1   OAI2X1
Delay (ns)          0.0985   0.1069  0.1047  0.1079   0.1114   0.1117
Diff vs. pulse (%)  0.00     8.58    6.35    9.60     13.15    13.46
Diff vs. INV1 (%)   −8.58    0.00    −2.06   0.94     4.21     4.49
In summary, the rising-edge propagation delay from pin A to pin Y can be measured by setting pin B to 1, as in case (1). Although simultaneous transitions (e.g., case (4)) affect the propagation delay, they can be ignored, since the delay variation is small (6.4% in our experiments). The same reasoning applies to the falling-transition propagation delay. In this chapter, all off-path pins are set to non-controlling values when measuring the propagation delay from one input pin to the output.
11.3.2 Driving Strength Analysis

In any design, a non-primary-input (non-PI) gate is driven by another gate with finite driving strength; a non-PI gate is one none of whose inputs is directly connected to a primary input. The driving strength of the driving gate also affects the propagation delay of the targeted driven gate. Consider a buffer cell (BUFX3) as an example. Table 11.2 presents the propagation delay of this gate when driven by a pulse source and by various logic gates. The pulse source in this chapter refers to an ideal pulse signal with infinite driving strength in the SPICE simulator. Row 2 of Table 11.2 presents the absolute propagation delay of the targeted BUFX3 gate with different driving gates. Row 3 presents the delay variation of the BUFX3 gate when driven by each logic gate vs. the pulse source. Row 4 presents the delay variation of the BUFX3 gate when driven by the various logic gates vs. a specific INV1 gate. Table 11.2 demonstrates that the delay variation between a gate driver and the ideal pulse source driver is significant (6.35–13.46% in our experiments), while the delay variation among the various gate drivers, relative to the INV1 driver, is small (below 4.5% in this experiment). Furthermore, the output load capacitance of the targeted driven gate also affects these delay variations: as the load capacitance increases, the delay variations among different driving gates are reduced. Figure 11.5a shows the delay variations of the driving gates vs. the pulse source, and Fig. 11.5b shows the delay variations of the driving gates vs. the INV1 gate, for different output load capacitances of the targeted driven gate.
Fig. 11.5 Delay variations of BUFX3 gate when driven by (a) different logic gates vs. pulse source driver and (b) different logic gates vs. INVX1 gate
Figure 11.5 confirms that, as the load capacitance increases, the delay variations among different driving gates are reduced. Regardless of the output load capacitance, the delay variation between a gate driver and the pulse source driver is significantly larger than the delay variation between the various gate drivers and the INV1 driver. It is therefore more accurate to drive the targeted gate with a logic gate, rather than with a pulse source, when measuring its delay. The measured delays vary slightly across different driving gates, but this variation is small and can be ignored to simplify and speed up the procedure. In these experiments, an appropriate driving gate, i.e., one with sufficient driving strength to drive the gate under test, is selected to drive the targeted gates when measuring their delays.
Fig. 11.6 The flow for setting up the power supply voltage-delay map
11.3.3 Power Voltage-Delay Map

An in-house tool was developed to perform SPICE simulation and set up the power supply voltage-delay map (see Fig. 11.6). For each cell model in the library, SPICE simulations are run to measure its propagation delays with:

1. Different propagation paths, from all input pins to the output pin;
2. Different transition directions, including rising and falling transitions;
3. Different power supply voltages;
4. Different load capacitances.
As mentioned in Sect. 11.3.1, when measuring the propagation delay from one input pin to the output pin, the off-path pins are held at stable non-controlling values, and a logic gate with proper driving strength is selected to drive the targeted gate, as discussed in Sect. 11.3.2. The simulation results are written into the IR2Delay database, to be used later for generating dynamic SDF. Generally speaking, a large IR drop results in a large delay increase; however, the same IR drop results in a different extra delay for different gates. The relationship between IR drop and delay increase for all cells is captured in the IR2Delay database, which can be organized as in the sketch below.
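Here is a minimal sketch of how such a database can be organized and queried, with bilinear interpolation between the characterized voltage/load grid points. The key structure and every number below are illustrative placeholders, not values from the actual IR2Delay database.

```python
# Sketch of an IR2Delay-style lookup: (cell, input->output path, edge)
# maps to a grid of {vdd: {load_cap: delay_ns}}. All numbers are made up.
import bisect

IR2DELAY = {
    ("AND2X1", "A->Y", "rise"): {
        1.50: {0.05: 0.158, 0.10: 0.214},
        1.65: {0.05: 0.141, 0.10: 0.192},
        1.80: {0.05: 0.129, 0.10: 0.176},   # lower VDD -> larger delay
    },
}

def _interp(x, x0, x1, y0, y1):
    return y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def lookup_delay(cell, path, edge, vdd, cap):
    """Bilinear interpolation between characterized grid points."""
    grid = IR2DELAY[(cell, path, edge)]
    vs = sorted(grid)
    i = min(max(bisect.bisect_left(vs, vdd), 1), len(vs) - 1)
    v0, v1 = vs[i - 1], vs[i]

    def at_v(v):
        caps = sorted(grid[v])
        j = min(max(bisect.bisect_left(caps, cap), 1), len(caps) - 1)
        c0, c1 = caps[j - 1], caps[j]
        return _interp(cap, c0, c1, grid[v][c0], grid[v][c1])

    return _interp(vdd, v0, v1, at_v(v0), at_v(v1))

# e.g., a gate whose average VDD sagged to 1.62 V while driving 80 fF:
d = lookup_delay("AND2X1", "A->Y", "rise", vdd=1.62, cap=0.08)
```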
11.4 Mixed-Signal Simulation-Based Validation

As discussed in Sect. 11.3, several approximations are introduced when setting up the IR2Delay database to make the procedure feasible for industrial designs:

1. Using a lumped output load capacitance instead of the distributed RC network of the real circuit;
2. Using a fixed driving gate to drive the target gate under test instead of the various driving gates, with different driving strengths, present in the design;
3. Using average IR-drop values instead of the dynamic, instantaneous IR-drop values of the real situation.

Therefore, the accuracy of the IR2Delay database has to be validated before it is applied to real designs; this section focuses on that validation. The best reference for validating the IR2Delay database would be real silicon delays, which are very difficult to measure. However, it is feasible to validate the procedure against SPICE simulation, which has been shown to be very close to real silicon and is often used as a "golden reference" in industry.
11.4.1 Mixed-Signal Simulation

The IR2Delay database enables accurate delay calculation for a given pattern set. To validate its results, the same patterns must be applied to the design in a full-circuit SPICE simulation. A design may have a large number of input/output pins, and in SPICE simulation each input signal waveform has to be specified individually, so it is very difficult to translate existing patterns into analog stimuli for SPICE simulation. Figure 11.7 shows a sample waveform and its description in the SPICE netlist. If a signal is not a regular pulse with a fixed frequency, like the clock signal, it has to be described in piecewise-linear form as shown in Fig. 11.7, in which every transition point of the signal has to be specified.
Fig. 11.7 A sample waveform and its description in the SPICE netlist:
Vsample Pin1 Pin2 PWL(t0 V1 t1 V1 t2 V2 t3 V2 t4 V1 t5 V1 t6 V2 t7 V2 t8 V1)

Fig. 11.8 The structure of mixed-signal simulation

Fig. 11.9 Sample command lines for running Mentor Graphics ADMS
Writing such stimuli is tedious and time-consuming, given that a signal may have multiple transitions and a design may have many input pins to specify. Furthermore, no available tool appears to translate digital test patterns into analog stimuli in SPICE format. Therefore, a mixed-signal simulation method is used to activate and simulate the SPICE design, as shown in Fig. 11.8. The Verilog testbench, dumped from the ATPG tool, is used to stimulate the SPICE design under test; this reuses the ATPG testbench and combines the flexibility of Verilog assignments with the accuracy of SPICE simulation. Mentor Graphics ADMS [4] was used for the mixed-signal simulation. Virtual D/A and A/D converters are required between the Verilog testbench and the SPICE design: the D/A converter translates the digital stimuli from the Verilog testbench into analog signals that can be applied to the SPICE design, and the A/D converter translates the output signals of the SPICE design into digital data that the Verilog testbench can read. Figure 11.9 shows example command lines for running Mentor Graphics ADMS. Lines 1 and 3 create the digital library and the ADMS library, respectively. Line 2 compiles the s344 module file s344_define.v, which contains the s344 pin definitions in Verilog format; the pin definitions in the SPICE netlist are consistent with it. Line 4 compiles the s344 sub-circuit into the s344 module, connecting the SPICE sub-circuit to the corresponding pins of the digital module compiled in Line 2. Line 5 compiles the test pattern file in Verilog format to obtain the testbench module s344_pat_v_ctl, which is used in Line 6, where the ADMS simulator is invoked. The command file mycmd.cmd includes the necessary SPICE libraries, SPICE .MEASURE commands for delay and average power voltage measurement, definitions of the D/A and A/D converters, power definitions, and any user-defined options for SPICE simulation. Line 7 runs the simulation.
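For illustration only, the sketch below generates a PWL source line, like the one in Fig. 11.7, from a digital bit sequence. It is a toy translator under our own assumptions (one bit per clock period, a fixed linear ramp per edge, placeholder node names); it is not the ADMS flow, which avoids this translation entirely.

```python
# Sketch: translating a bit sequence into a SPICE PWL voltage source.
# Node names, bit rate, and ramp time are hypothetical parameters.

def bits_to_pwl(name, pos, neg, bits, period, vdd, t_ramp=0.05e-9):
    """Return a PWL source line driving `bits`, one bit per `period`
    seconds, with `t_ramp`-long linear edges."""
    pts, prev = [(0.0, bits[0] * vdd)], bits[0]
    for i, b in enumerate(bits[1:], start=1):
        t = i * period
        if b != prev:                       # emit an edge as a short ramp
            pts.append((t, prev * vdd))
            pts.append((t + t_ramp, b * vdd))
            prev = b
    pts.append((len(bits) * period, prev * vdd))
    body = " ".join(f"{t:.4g} {v:.3g}" for t, v in pts)
    return f"V{name} {pos} {neg} PWL({body})"

print(bits_to_pwl("si", "SI", "0", [0, 1, 1, 0], period=10e-9, vdd=1.8))
```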
Fig. 11.10 The flow of the mixed-signal simulation-based validation procedure
11.4.2 Simulation Results Extraction

It is very time-consuming, and for large designs sometimes impossible, to measure the delay of each gate manually in the waveform database produced by SPICE simulation. Therefore, two procedures are used to automate the validation work, as shown in Fig. 11.10.

11.4.2.1 Procedure 1

This procedure pre-processes the SPICE netlist of the design. It parses the SPICE netlist for information about all the gates and their corresponding pins (input, output, and power pins). This enables the simulator to measure the delay and average power supply voltage of each gate in a specific timing window using the SPICE .MEASURE command. The timing window lies between the launch and capture clock cycles of the test pattern.

11.4.2.2 Procedure 2

This procedure post-processes the simulation results. In Procedure 1, the delay and average power supply voltage of all gates in the design are measured; however, for a specific test pattern, not all gates are sensitized. This procedure identifies the gates sensitized between the launch and capture clock cycles and extracts their valid delays and average power supply voltages. For a sensitized gate with multiple inputs, if more than one input has a transition, the procedure determines which input transition is the valid one and extracts the delay from that input pin to the output pin. The measured average power supply voltage is used to search the IR2Delay database for the validation. After the simulation results are extracted, the procedure searches the IR2Delay database for gate delays according to the power supply voltage, propagation path, transition direction, and output load capacitance of the sensitized gates. The IR2Delay database results can then be compared with the full-circuit SPICE simulation results to determine whether the IR2Delay database is accurate enough for mapping IR drop to real gate delays.
11.5 Experimental Results on IR2Delay Database Validation

11.5.1 Experimental Setup

Since full-circuit SPICE simulation is very time-consuming for large circuits, a small ISCAS benchmark circuit, s344, is used for validation purposes. The 180 nm Cadence Generic Standard Cell Library with a typical 1.8 V power supply voltage was used in these experiments. Synopsys Design Compiler [3] was used for logic synthesis, and Astro was used for physical design. Mentor Graphics FastScan [5] was used for pattern generation. Mentor Graphics Eldo [7] and ADMS were used for SPICE simulation and mixed-signal simulation, respectively. The IR2Delay database procedure was implemented in Perl, and the pre-processing and post-processing procedures for mixed-signal simulation were implemented in C/C++.
11.5.2 Comparison with Full-Circuit SPICE Simulation

A test pattern was chosen from the transition delay fault test pattern set generated for s344; it sensitizes 32 gates. Full-circuit SPICE simulation was run on the design to extract the delays of these sensitized gates, and the IR2Delay database was then searched for the same gates' delays. The comparison between the two results is shown in Fig. 11.11, where the x axis represents the sensitized gate ID and the y axis represents the absolute delays of all the sensitized gates.
Fig. 11.11 Delay comparison between SPICE simulation and IR2Delay database results

Fig. 11.12 Delay comparison between SPICE simulation and SDF file

Table 11.3 Delay correlation coefficients of different data pairs

Data pair      IR2Delay vs. SPICE   SDF vs. SPICE
Corr. coeff.   0.947                0.278
The figure shows that the delays calculated from the IR2Delay database are smaller than the delays obtained from full-circuit SPICE simulation. There are several reasons for this difference, including the approximations made in the IR2Delay calculation, as well as the neglect of other important design parameters such as crosstalk effects, which affect the capacitance of the wires connected to the target gate and thereby the gate delay. When crosstalk effects are taken into account, the calculation is much more accurate, as can be seen in Chap. 5. Figure 11.12 presents the delay comparison between full-circuit SPICE simulation and the SDF delays of the sensitized gates in the design; the SDF delays were extracted using commercial EDA tools. From Figs. 11.11 and 11.12, it is evident that even though the absolute values differ, the IR2Delay database correlates far better with the full-circuit SPICE simulation results than the SDF delays do. Table 11.3 presents the correlation coefficients of these two data pairs. The correlation coefficient is calculated with (11.1):
$$\rho_{X,Y} = \frac{E\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sigma_X\,\sigma_Y} \qquad (11.1)$$
where ρ_{X,Y} is the correlation coefficient, X and Y are the two random variables being correlated, and μ_X, μ_Y and σ_X, σ_Y are their mean values and standard deviations, respectively. The experimental results make clear that the IR2Delay database provides more accurate delay calculation than the SDF database, so it can be used for more accurate dynamic delay estimation. The calculation of (11.1) is sketched below.
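Evaluating (11.1) on the two delay vectors, e.g. the sensitized-gate delays from SPICE and from the IR2Delay database, takes only a few lines. The sample values below are placeholders, not the published measurements.

```python
# Sketch: Pearson correlation (11.1) between two delay vectors.
import numpy as np

spice    = np.array([102.0, 96.5, 118.3, 74.2])   # ps, from full-SPICE
ir2delay = np.array([ 88.1, 84.0, 101.7, 65.9])   # ps, from the database

def corr(x, y):
    """rho = E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    mx, my = x.mean(), y.mean()
    return ((x - mx) * (y - my)).mean() / (x.std() * y.std())

rho = corr(spice, ir2delay)       # equivalently np.corrcoef(x, y)[0, 1]
```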
11.5.3 Complexity Analysis

The CPU runtime for setting up the IR2Delay database is approximately 6 h, since tens of thousands of SPICE simulations are needed to obtain the delays of each gate under different conditions and populate the database. However, once the database is set up, it can be reused for any design that uses the target library. The computational complexity of searching the IR2Delay database is O(log n), where n is the number of logic cells in the technology library. The search is therefore very fast, and the approach scales easily to larger industrial designs.
11.6 Diagnosis for Failure Paths

Whether an at-speed test pattern fails on the tester depends on the real delays of the gates and interconnects in silicon, as well as on the test clock frequency, assuming there is no physical defect (e.g., a resistive open) in the design. For the TDF test patterns, simulation was run and an at-speed frequency was selected, based on the critical path delay, such that there are no failures with the original static SDF file of the design. Dynamic SDF files reflecting the IR-drop effect are then quickly generated for each pattern, and simulation is rerun. When the operating clock frequency is pushed to its limit, some patterns fail with the dynamic SDF files; in this controlled experiment, the IR-drop effect is the only cause of the failures. A commercial diagnosis tool [10] is then run to report the suspect failure paths. The diagnosis tool checks the failed bits in the failure log and traces back through the design, running logic simulation to see which paths could potentially fail the target bits. However, for a specific failed bit there may be several logic paths that could cause it; without accurate timing calculation, the tool cannot tell which path is actually responsible, so it reports all possible paths as suspect failure paths. With the dynamic SDF, accurate timing analysis can be performed on the suspect failure paths to determine which of them actually violate the timing constraints of the design, thereby improving the resolution of the current diagnosis tool. The timings in the dynamic SDF and the original static SDF can also be compared to determine which gate(s) are the major cause of the timing failure, so that the failing gate(s) in the design can be pinpointed exactly; this, too, improves the diagnosis resolution.
11.7 Experimental Results on Diagnosing IR-Drop SDDs

11.7.1 Diagnosis Flow and Experimental Setup

The entire flow is illustrated in Fig. 11.13. The flow was verified on the IWLS benchmark ac97_ctrl, which contains 9,656 logic gates and 2,199 flip-flops. The experiments use the 180 nm Cadence Generic Standard Cell Library with a typical 1.8 V power supply voltage. Synopsys Design Compiler [3] was used for logic synthesis. Cadence SoC Encounter [6] was used for placement and routing, as well as for IR-drop analysis. Mentor Graphics FastScan [5] was used for pattern generation. Mentor Graphics Eldo [7] and ModelSim [8] were used for SPICE simulation and digital simulation, respectively. Mentor Graphics YieldAssist [10] was used for diagnosis. The power supply voltage-delay map procedure was implemented in Perl, and the SDF Updater and the post-diagnosis timing analysis tool were implemented in C/C++.
Fig. 11.13 The flow for emulating and diagnosing IR-drop effects
Table 11.4 Profiles of sample gates in the ac97_ctrl benchmark

Instance/model             U8701/AND2X1        U8714/NOR2X1        U12160/INVX1
Power volt. (V)            1.59                1.61                1.76
Path                       A→Y      B→Y        A→Y      B→Y        A→Y
0→1 delay (ns)  Original   0.0716   0.3208     0.0847   0.0693     0.0158
                Updated    0.0845   0.3811     0.0946   0.0761     0.0166
                Incr. (%)  18.0     18.8       11.7     9.8        5.1
1→0 delay (ns)  Original   0.0560   0.1256     0.0957   0.0873     0.0167
                Updated    0.0661   0.1492     0.1068   0.0959     0.0165
                Incr. (%)  18.0     18.8       11.6     9.8        −1.25
11.7.2 Circuit Performance in the Presence of IR Drop

For a specific test pattern, the IR-drop analysis flow reports the average power supply voltage of each gate in the design. Combined with the load capacitance information extracted from the layout and the transition direction from the pattern, the rising or falling delay increase for each gate, from every input pin to its output pin, can be obtained by searching the IR2Delay database. The SDF Updater (see Fig. 11.13) then updates the gate and interconnect delays in the original static SDF file; for each pattern, it dynamically produces a new SDF file (also called a dynamic SDF file). A sketch of this update step is given below. Table 11.4 shows the power voltage and delay profiles of several sample gates in the experimental benchmark for one test pattern. The "original" delays in this table are extracted from the original static SDF file of the design, without considering the IR-drop effect, while the "updated" delays are extracted from the SDF file updated for this test pattern with IR drop taken into account. The "Incr." rows show that the delays of some gates increase significantly due to the IR-drop effect (for example, gate U8701 incurs over 18% extra delay); it is therefore necessary to take the IR-drop effect into account for accurate performance evaluation. Furthermore, contrary to the common belief that IR drop always increases a gate's delay, these experiments show that for a specific transition direction the delay of a gate may decrease due to the IR-drop effect (see the 1→0 delay of gate U12160). This is because the output of the gate sits at a lower VDD, so when there is a falling transition at the output pin, discharging the output is faster. If the total delay change along a path is computed, however, the path delay may still increase. In short, analysis with the original static SDF overestimates the design performance.
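To illustrate the SDF Updater's role, the sketch below rescales the rise/fall delay triples of a single IOPATH entry by factors looked up for the pattern at hand. Real SDF contains many more constructs (INTERCONNECT entries, conditional paths, timing checks), so this regex-based toy handles only the simplest form; the scale factors shown reproduce gate U8701's row of Table 11.4 and are otherwise our own.

```python
# Sketch: rewriting the delay triples of a simple SDF IOPATH entry with
# IR-drop-aware values. Purely illustrative, not the actual SDF Updater.
import re

TRIPLE = re.compile(r"\(([\d.]+):([\d.]+):([\d.]+)\)")

def update_iopath_line(line, scale_rise, scale_fall):
    """Scale the rise and fall triples of one IOPATH entry by the factors
    obtained from the IR2Delay database for this gate and pattern."""
    scales = iter((scale_rise, scale_fall))
    def repl(m):
        s = next(scales)
        vals = (float(v) * s for v in m.groups())
        return "({:.4f}:{:.4f}:{:.4f})".format(*vals)
    return TRIPLE.sub(repl, line, count=2)

line = "(IOPATH A Y (0.0716:0.0716:0.0716) (0.0560:0.0560:0.0560))"
print(update_iopath_line(line, 1.180, 1.180))
# -> (IOPATH A Y (0.0845:0.0845:0.0845) (0.0661:0.0661:0.0661))
```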
11.7.3 Failures from IR Drop

As mentioned in Sect. 11.6, whether an at-speed test pattern fails on the tester depends on the real delays of the gates and interconnects in silicon, as well as on the test clock frequency.
Table 11.5 Failure logs of some failed patterns

Pattern no.   Chain no.   Pin no.   Expected   Simulated
28            17          18        1          X
28            20          22        1          X
79            20          18        0          X
115           17          22        1          X
115           20          18        0          X
167           20          18        0          X

Table 11.6 The timing of suspect failure paths for test pattern 28, based on the IR-drop-aware dynamic SDF file (clock cycle: 7.63 ns)

Suspect path #     1      2      3      4
Path length (ns)   7.66   7.56   3.90   3.28
The clock frequency in the following experiments is likewise selected based on the critical path delay, ensuring that there are no failures with the original static SDF file of the design. However, when the operating clock frequency is pushed to its limit, some patterns fail with the dynamic SDF files. An example failure log from the test patterns applied to the benchmark circuit is shown in Table 11.5. The simulated value is "X" because the end-point transition of the failed paths did not meet the setup-time constraint of those flip-flops. In fact, only a couple of gates in the failed paths experience severe IR drop and large delay increases. With this failure log, diagnosis can be performed to locate these defects and to conclude whether or not the failures are caused by IR drop. In these experiments, 6 of the 203 patterns fail at the selected frequency; with a different guard-banding frequency, the number of failing patterns would change. This method could therefore also be used for efficient guard-band selection against IR-drop failures during design validation.
11.7.4 Timing-Aware IR-Drop Diagnosis

With the failure log of the test patterns, as well as the IR-drop-aware dynamic SDF files, timing-aware diagnosis can be performed to report the suspect failure paths and analyze their timing accurately. The accurate timing analysis shows that not all the suspect paths violate the timing constraints; those that do not should be eliminated from the IR-drop-caused failures. Take pattern 28 in the above experiment as an example: 4 suspect failure paths are reported, and their path lengths based on the dynamic SDF are listed in Table 11.6. The table shows that only Path 1 is a real failure path; the other paths are not the cause of the failed bit. Therefore, given that the design fails because of IR drop, the IR-drop-aware flow can identify exactly which paths caused the failures; conversely, given only the failure log, it can judge whether or not the failures are caused by IR drop. A sketch of this timing check follows.
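The check itself is a one-line filter once per-path delays have been extracted from the dynamic SDF; the dictionary below simply restates Table 11.6.

```python
# Sketch of the timing check behind Table 11.6: keep only suspect paths
# whose dynamic-SDF length exceeds the clock period. Path delays are
# assumed to have been extracted from the dynamic SDF already.

CLOCK_NS = 7.63

suspects = {1: 7.66, 2: 7.56, 3: 3.90, 4: 3.28}   # path id -> length (ns)

real_failures = {p: d for p, d in suspects.items() if d > CLOCK_NS}
print(real_failures)   # {1: 7.66}: only Path 1 violates the cycle time
```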
Fig. 11.14 IR-drop plots and failed gates of a test pattern
In fact, comparing the original static and updated dynamic SDFs along the failure paths shows that only a couple of gates (two, for the failure path in Table 11.6) experience severe IR drop and a large delay increase (about 20%); the delay increase of the other gates is minor and negligible. Figure 11.14 pinpoints these gates with large delay increases in the layout. The two gates are close to each other, and both are located in an area suffering from severe IR drop. To induce a comparably severe IR drop in the small benchmark, only one pair of power/ground pins is placed, at the top-right corner of the design. It must be noted that the delay information in the SDF file may still not match real silicon even when all parasitic parameters are considered. However, with silicon data, a correlation between the SDF and real silicon delays can be established, so that the SDF can be scaled to match real silicon delays. Furthermore, the IR-drop-induced extra delay is obtained from SPICE simulation with real parasitics considered, which is much closer to real silicon; hence it can accurately reflect the impact of IR drop on real silicon.
11.8 Summary This chapter has presented an efficient IR-drop modeling and injection procedure for design performance evaluation. For each test pattern, the IR-drop analysis is performed to obtain its power voltage drop during the launch and capture cycles. SPICE simulation is performed to build up a database to map the power voltage drop
of a gate to the delay increase under different output load capacitances. The database was validated by comparison with full-circuit SPICE simulation results. Mixed-signal simulation was used to combine the flexibility of Verilog signal assignment with the accuracy of SPICE simulation, and to make the procedure automatic and easy to use. The validation results demonstrate that the IR2Delay database is more accurate than the current SDF-based delay calculation. Furthermore, calculating gate delays with the IR2Delay database is very fast, which makes it easy to scale to larger industrial designs. The static SDF file of the design is updated with IR-drop effects, and dynamic SDF files are generated for each pattern, yielding a fast and accurate simulation flow that can be applied to large VLSI circuits. Based on the dynamic SDF files, IR-drop-related diagnosis can be performed, improving the resolution of the current diagnosis tool. The dynamic, pattern-dependent SDF files can also be used for performance evaluation of a design under the dynamic parametric variations caused by test patterns, or used as constraints during ATPG to improve pattern quality.
References

1. C. Hsu, S. Ramasubbu, M. Ko, J. L. Pino, and S. S. Bhattacharyya, "Efficient simulation for critical synchronous dataflow graphs", DAC 2006
2. D. A. Stuart, M. Brockmeyer, A. K. Mok, and F. Jahanian, "Simulation-verification: Biting at the state explosion problem", in IEEE Transactions on Software Engineering, vol. 27, no. 7, July 2001
3. Synopsys Inc., "SOLD Y-2007, Vol. 1–3", Synopsys Inc., 2007
4. "ADVance MS User Manual Version 2008.2", Mentor Graphics Inc., 2008
5. "ATPG and Failure Diagnosis Tools Reference Manual", Mentor Graphics Inc., 2008
6. "Cadence Encounter Manual", Cadence Inc., 2008
7. "Eldo User's Manual Version 2008.2a", Mentor Graphics Inc., 2008
8. "ModelSim Reference Manual", Mentor Graphics Inc., May 2008
9. "SPICE Home Page", [Online] Available: http://bwrc.eecs.berkeley.edu/Classes/icbook/SPICE
10. "YieldAssist User Guide Version 8.2009 1", Mentor Graphics Inc., 2009