0521865816pre
CUUK838-Tan
0 52186581 6
February 15, 2007
16:21
Char Count= 0
Advanced Model Order Reduction Techniques in VLSI Design
Model order reduction (MOR) techniques are important in reducing the complexity of nanometer VLSI designs, and consequently controlling “parasitic” electromagnetic effects, so that higher operating speeds and smaller feature sizes can be achieved. This book presents a systematic introduction to, and treatment of, the key MOR methods used in general linear circuits, using real-world examples to illustrate the advantages and disadvantages of each algorithm. Starting with a review of traditional projection-based techniques and proofs of some fundamental theories, coverage progresses to advanced “state-of-the-art” MOR methods for VLSI design. These include HMOR, passive truncated balanced realization (TBR) methods, efficient inductance modeling via the VPEC model, general model optimization and passivity enforcement methods, passive model realization techniques, and structure-preserving MOR techniques. Numerical methods have been used throughout and, where possible, approached from the CAD engineer’s perspective. This avoids complex mathematics, and allows the reader to take on real design problems and develop more effective tools. With practical examples and over 100 illustrations, this book is suitable for researchers and graduate students of electrical and computer engineering, as well as for practitioners working in the VLSI design and design automation industries.
Sheldon X.-D. Tan is an associate professor in the Department of Electrical Engineering, and cooperative faculty member in the Department of Computer Science and Engineering, at the University of California, Riverside. He received his Ph.D. in electrical and computer engineering in 1999 from the University of Iowa, Iowa City. His current research interests focus on design automation for VLSI integrated circuits.
Lei He is an associate professor in the Department of Electrical Engineering at the University of California, Los Angeles, where he was also awarded his Ph.D. in computer science in 1999. His current research interests include computer-aided design of VLSI circuits and systems.
Advanced Model Order Reduction Techniques in VLSI Design
SHELDON X.-D. TAN University of California, Riverside
LEI HE University of California, Los Angeles
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521865814
© Cambridge University Press 2007
This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published in print format 2007
ISBN-13 978-0-511-29032-9 eBook (NetLibrary)
ISBN-10 0-511-29032-2 eBook (NetLibrary)
ISBN-13 978-0-521-86581-4 hardback
ISBN-10 0-521-86581-6 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
Figures
Tables
Foreword
Acknowledgments
1 Introduction
  1.1 The need for compact modeling of interconnects
  1.2 Interconnect analysis and modeling methods in a nutshell
  1.3 Book outline
  1.4 Summary
2 Projection-based model order reduction algorithms
  2.1 Moments and moment-matching methods
  2.2 Moment computation in MNA formulation
  2.3 Asymptotic waveform evaluation
  2.4 Projection-based model order reduction methods
  2.5 Numerical examples
  2.6 Historical notes
  2.7 Summary
  2.8 Appendices
3 Truncated balanced realization methods for MOR
  3.1 Introduction
  3.2 The singular value decomposition (SVD)
  3.3 Proper orthogonal decomposition (POD)
  3.4 Classic truncated balanced realization methods
  3.5 Passive-preserving truncated balanced realization methods
  3.6 Hybrid TBR and combined TBR-Krylov subspace methods
  3.7 Empirical TBR and poor man’s TBR
  3.8 Computational complexities of TBR methods
  3.9 Practical implementation and numerical issues
  3.10 Numerical examples
  3.11 Summary
4 Passive balanced truncation of linear systems in descriptor form
  4.1 Introduction
  4.2 The passive balanced truncation algorithm: PriTBR
  4.3 Structure-preserved balanced truncation
  4.4 Numerical examples
  4.5 Summary
5 Passive hierarchical model order reduction
  5.1 Overview of hierarchical MOR algorithm
  5.2 DDD-based hierarchical decomposition
  5.3 Hierarchical reduction versus moment-matching
  5.4 Preservation of reciprocity
  5.5 Multi-point expansion hierarchical reduction
  5.6 Numerical examples
  5.7 Summary
  5.8 Historical notes on node-elimination-based reduction methods
6 Terminal reduction of linear dynamic circuits
  6.1 Review of the SVDMOR method
  6.2 Input and output moment matrices
  6.3 The extended-SVDMOR (ESVDMOR) method
  6.4 Determination of cluster number by SVD
  6.5 K-means clustering algorithm
  6.6 TermMerg algorithm
  6.7 Numerical examples
  6.8 Summary
7 Vector-potential equivalent circuit for inductance modeling
  7.1 Vector-potential equivalent circuit
  7.2 VPEC via PEEC inversion
  7.3 Numerical examples
  7.4 Inductance models in hierarchical reduction
  7.5 Summary
8 Structure-preserving model order reduction
  8.1 Introduction
  8.2 Chapter overview
  8.3 Background
  8.4 Block-structure-preserving model reduction
  8.5 TBS method
  8.6 Two-level analysis
  8.7 Numerical examples
  8.8 Summary
9 Block structure-preserving reduction for RLCK circuits
  9.1 Introduction
  9.2 Block structure-preserving model reduction
  9.3 Structure preservation for admittance transfer-function matrices
  9.4 General block structure-preserving MOR method
  9.5 Numerical examples
  9.6 Summary
  9.7 Appendix
10 Model optimization and passivity enforcement
  10.1 Passivity enforcement
  10.2 Model optimization for active circuits
  10.3 Optimization for magnitude and phase responses
  10.4 Numerical examples
  10.5 Summary
11 General multi-port circuit realization
  11.1 Review of existing circuit-realization methods
  11.2 General multi-port network realization
  11.3 Multi-port non-reciprocal circuit realization
  11.4 Numerical examples
  11.5 Summary
12 Reduction for multi-terminal interconnect circuits
  12.1 Introduction
  12.2 Problems of subspace projection-based MOR methods
  12.3 Model order reduction for multiple-terminal circuits: MTermMOR
  12.4 Numerical examples
  12.5 Summary
13 Passive modeling by signal waveform shaping
  13.1 Introduction
  13.2 Passivity and positive-realness
  13.3 Conditional passivity and positive-realness
  13.4 Passivity enforcement by waveform shaping
  13.5 Numerical examples
  13.6 Summary
References
Index
Figures
2.1 The network of an ideal delay of T.
2.2 The unit impulse and unit step responses.
2.3 Block diagram of (2.54).
2.4 Arnoldi method based on modified Gram–Schmidt orthonormalization for SISO systems.
2.5 Non-symmetric Lanczos method for SISO systems.
2.6 Transient response of a non-passive circuit.
2.7 Block Arnoldi method for MIMO systems.
2.8 A two-port large lumped RCL circuit.
2.9 Comparison of the magnitudes of Y(11) for different reduction orders for the lumped RLC circuit.
2.10 Comparison of the magnitudes of Y(12) for different reduction orders for the lumped RLC circuit.
3.1 Frequency responses of a reduced model and its original system.
3.2 Frequency response of the input impedance of a reduced model and its original system.
4.1 Frequency responses of TBR, PriTBR, and PRIMA reduced models and the original circuit.
4.2 Nyquist plots of the TBR reduced model and the PriTBR reduced model.
4.3 Pole zero map of system before mapping.
4.4 Frequency responses of PRIMA and combined PRIMA and PriTBR reduced models and the original circuit.
4.5 Frequency responses of SPRIM and SP-PriTBR reduced models and the original circuit.
5.1 A hierarchical circuit. Reprinted with permission from [126] (c) 2000 IEEE.
5.2 A simple RC circuit.
5.3 A matrix determinant and its DDD.
5.4 Illustration of Theorem 5.1. Reprinted with permission from [122] (c) 2005 IEEE.
5.5 A determinant and its YDDD. Reprinted with permission from [122] (c) 2005 IEEE.
5.6 Y-expanded DDD construction. Reprinted with permission from [122] (c) 2005 IEEE.
5.7 The general hierarchical model order algorithm flow.
5.8 Frequency responses of µA741 circuit under different reduction orders. Reprinted with permission from [94] (c) 2006 IEEE.
5.9 Frequency responses of an RC tree circuit under different reduction orders. Reprinted with permission from [94] (c) 2006 IEEE.
5.10 Responses of a typical k_i/(s − p_i). Reprinted with permission from [94] (c) 2006 IEEE.
5.11 Frequency responses of the three-turn spiral inductor and its reduced model by using waveform matching and the common-pole method. Reprinted with permission from [94] (c) 2006 IEEE.
5.12 Colpitts LC oscillator with spiral inductors. Reprinted with permission from [94] (c) 2006 IEEE.
5.13 Time-domain comparison between original and synthesized models for a Colpitts LC oscillator with a three-turn spiral inductor. Reprinted with permission from [94] (c) 2006 IEEE.
5.14 Frequency responses of Y11 of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
5.15 Frequency responses of Y12 of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
5.16 Transient responses of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
5.17 Frequency responses of a two-bit transmission line at two ports. Reprinted with permission from [94] (c) 2006 IEEE.
6.1 Terminal reduction versus traditional model order reduction.
6.2 Frequency responses from SVDMOR and ESVDMOR for net27 circuit.
6.3 Frequency response from SVDMOR and ESVDMOR with different terminals for net27 circuit.
6.4 K-means clustering algorithm. Reprinted with permission from [75] (c) 2005 IEEE.
6.5 The reduction flow of combined terminal and model order reductions.
6.6 Simple interface circuit.
6.7 Frequency impedance responses from the SVDMOR method for net1026 circuit.
6.8 Output terminal distribution for each cluster for net1026 circuit. Reprinted with permission from [75] (c) 2005 IEEE.
6.9 Step responses of representative output terminals. Reprinted with permission from [75] (c) 2005 IEEE.
6.10 Comparison of 50% delay time among representative output terminals. Reprinted with permission from [75] (c) 2005 IEEE.
6.11 Step responses of representative output terminals and two suppressed outputs. Reprinted with permission from [75] (c) 2005 IEEE.
6.12 Comparison of 50% delay time among representative output terminals and two suppressed outputs. Reprinted with permission from [75] (c) 2005 IEEE.
6.13 Output terminal distribution for each cluster for net27 circuit. Reprinted with permission from [75] (c) 2005 IEEE.
6.14 Input terminal distribution for each cluster for circuit net38.
6.15 Output terminal distribution for each cluster for circuit net38.
7.1 (a) Electronic current-controlled vector-potential current source; (b) The Kirchhoff current law for vector potential circuit. An invoking vector potential current source is employed at a_i, and the responding vector potential at a_j is A_kj, determined by the full effective resistance network. Reprinted with permission from [135] (c) 2005 IEEE.
7.2 Vector potential equivalent circuit model for three filaments. Reprinted with permission from [135] (c) 2005 IEEE.
7.3 For five-bit bus, (a) a 1-V step voltage with 10 ps rising time and (b) a 1-V ac voltage are applied to the first bit and all other bits are quiet. The responses of the PEEC model, full VPEC model, and localized VPEC model are measured at the far end of the second bit. Reprinted with permission from [135] (c) 2005 IEEE.
7.4 For 128-bit bus by numerical truncation, a 1-V step voltage with 10 ps rising time is applied to the first bit, and all other bits are quiet. The responses of the PEEC model, the full VPEC model, and the tVPEC model are measured at the far end of the second bit. Reprinted with permission from [135] (c) 2005 IEEE.
7.5 Example of a coupled two-bit RLCM circuit under the PEEC model. Reprinted with permission from [135] (c) 2005 IEEE.
7.6 Example of a coupled two-bit RLCM circuit under the nodal susceptance model. Reprinted with permission from [135] (c) 2005 IEEE.
7.7 Frequency responses of PEEC model in SPICE, susceptance under NA and VPEC models for the two-bit bus. Reprinted with permission from [135] (c) 2005 IEEE.
7.8 Stamp of the second-order admittance in the NA matrix, where (a), (b) and (c) represent G, Γ, C and B. G (rank=4) and Γ (rank=4) are both singular for 6 × 6 matrices. Reprinted with permission from [135] (c) 2005 IEEE.
7.9 Example of a coupled two-bit RLCM circuit under the VPEC model. Reprinted with permission from [135] (c) 2005 IEEE.
8.1 Pole matching comparison: mq poles matched by TBS and BSMOR, and q poles matched by HiPRIME. Reprinted with permission from [138] (c) 2006 ACM.
8.2 Non-zero (nz) pattern of conductance matrices: (a) original system, (b) triangular system, (c) reduced system by TBS. (a)–(c) have different dimensions, but (b)–(c) have the same triangular structure and the same diagonal block structure. Reprinted with permission from [138] (c) 2006 ACM.
8.3 Comparison of time-domain responses between HiPRIME, BSMOR, [139], TBS and the original. TBS is identical to the original. Reprinted with permission from [138] (c) 2006 ACM.
8.4 Comparison of frequency-domain responses between HiPRIME, BSMOR, TBS, and the original. TBS is identical to the original. Reprinted with permission from [138] (c) 2006 ACM.
8.5 Comparison of runtime under similar accuracy. (a) macro-model building time (log scale) comparison; (b) macro-model time-domain simulation time (log scale) comparison. Reprinted with permission from [138] (c) 2006 ACM.
8.6 A P/G voltage bounce map without decoupling capacitor allocations. Reprinted with permission from [138] (c) 2006 ACM.
8.7 A P/G voltage bounce map with decoupling capacitors allocated at the centers of four blocks. Reprinted with permission from [138] (c) 2006 ACM.
9.1 Comparison between SPRIM, PRIMA, and BSPRIM for impedance form.
9.2 Sparsity preservation of BSPRIM.
9.3 Comparison between PRIMA and structure-preserving algorithm (BSPRIM) for admittance form.
9.4 Comparison between PRIMA and BSPRIM with and without reorthonormalization for admittance form.
10.1 Admittance Y21 response of the µA725 opamp without considering phase. Reprinted with permission from [73] (c) 2005 IEEE.
10.2 Frequency response of Y12 of opamp model. Reprinted with permission from [73] (c) 2005 IEEE.
10.3 Active Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
10.4 Frequency response of Y21 of the Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
10.5 Frequency response of Y21 of the Sallen–Key topology low-pass filter without considering phase. Reprinted with permission from [73] (c) 2005 IEEE.
10.6 Frequency response of the transfer function of the Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
10.7 Transient response of the Sallen–Key topology low-pass filter with different excitations. Reprinted with permission from [73] (c) 2005 IEEE.
10.8 Active low-pass FDNR filter. Reprinted with permission from [73] (c) 2005 IEEE.
10.9 Frequency response of the transfer function of the low-pass FDNR filter. Reprinted with permission from [73] (c) 2005 IEEE.
10.10 Transient response of the FDNR filter with different excitations. Reprinted with permission from [73] (c) 2005 IEEE.
11.1 Realization of Z(s) in (11.2) and Y(s) in (11.3).
11.2 Realization of Z(s) in (11.4) and Y(s) in (11.5).
11.3 Realization of Z(s) in (11.6).
11.4 Real-part responses of Z(s) and the remainder Z1(s) = Z(s) − Rmin.
11.5 Brune’s driving point synthesis by multiple-stage RLCM ladders (Brune’s cycle).
11.6 Brune’s multiple level ladder macromodel synthesis.
11.7 Example of Brune’s synthesis with passivity-preserved transformation: the non-passive T circuit is transformed to a passive coupled-inductor circuit.
11.8 One-port Foster admittance realization. Reprinted with permission from [94] (c) 2006 IEEE.
11.9 General two-port realization Π model. Reprinted with permission from [94] (c) 2006 IEEE.
11.10 Six-port realization based on Π-structure. Reprinted with permission from [94] (c) 2006 IEEE.
11.11 General two-port non-reciprocal active realization. Reprinted with permission from [73] (c) 2005 IEEE.
11.12 Comparison between the transfer function Y1-port(s) and its circuit realization.
11.13 Comparison between the transfer function Y12(s) and its circuit realization.
11.14 Comparison between the transfer function Y22(s) and its circuit realization.
12.1 Frequency response of the three-input circuit. Reprinted with permission from [76] (c) 2006 IEEE.
12.2 Frequency response of the three-input circuit with different approximation. Reprinted with permission from [76] (c) 2006 IEEE.
12.3 Comparison of computation cost for admittance. Reprinted with permission from [76] (c) 2006 IEEE.
12.4 Frequency response comparison among the original circuit, PRIMA model, and MTermMOR model of the circuit clktree50. Reprinted with permission from [76] (c) 2006 IEEE.
12.5 Frequency response comparison among the original circuit, PRIMA model, and MTermMOR model of the circuit sram1026. Reprinted with permission from [76] (c) 2006 IEEE.
13.1 Transient response of a non-passive circuit.
13.2 Frequency responses of a reduced model and its original RC circuit.
13.3 Transient responses of a reduced model and its original RC circuit for a 10 GHz input.
13.4 Transient responses of a reduced model and its original RC circuit for a 60 GHz input.
13.5 Algorithm flow of FFT-IFFT-based waveform shaping.
13.6 Algorithm of FFT-IFFT-based waveform shaping.
13.7 Ramp signal shaped at different frequencies.
13.8 Low-pass-filter-based waveform shaping.
13.9 Group-delay characteristic and magnitude response for different order Bessel filters (normalized frequency).
13.10 Comparison of responses of different models in time domain for the first example.
13.11 Comparison in time domain between reduced models based on Bessel filters and ellipse filters.
13.12 Comparison of responses of different models in time domain for second example.
Tables
3.1 The Hankel singular values for a six-port linear interconnect circuit.
5.1 Simulation efficiency comparison between original and synthesized model (part I). Reprinted with permission from [94] (c) 2006 IEEE.
5.2 Simulation efficiency comparison between original and synthesized model (part II). Reprinted with permission from [94] (c) 2006 IEEE.
5.3 Comparison of reduction CPU times. Reprinted with permission from [94] (c) 2006 IEEE.
6.1 Singular values of DC moment, input moment matrix and output moment matrix of the circuit net27.
6.2 Singular values of the DC admittance moment, 1st order admittance moment matrices of the circuit net1026 when all the terminals are treated as bidirectional.
6.3 Singular values of DC admittance moment, input moment matrix and output moment matrix of the circuit net1026.
6.4 Output clustering results for the one-bit lines circuit net1026. Reprinted with permission from [75] (c) 2005 IEEE.
7.1 Table of notations. Reprinted with permission from [135] (c) 2005 IEEE.
7.2 Settings and results of geometrical tVPEC models. Reprinted with permission from [135] (c) 2005 IEEE.
7.3 Settings and results of numerical tVPEC models. Reprinted with permission from [135] (c) 2005 IEEE.
8.1 Time-domain waveform error of reduced models by HiPRIME, BSMOR, and TBS under the same order (number of matched moments). Reprinted with permission from [138] (c) 2006 ACM.
Foreword
Interconnect model reduction has emerged as a crucial operation for circuit analysis in the last decade as a result of the interconnect dominance of advanced VLSI technologies. Because interconnect contributes a significant portion of the system performance, we have to take into account the coupling effects between subcircuit modules. However, the extraction of the coupling yields many small fragments of parasitics. While the values of the parasitics are small, the number of fragments is huge, and this makes the accumulated effect non-negligible. If left untreated, the parasitics can gobble up the memory capacity and consume long CPU time during circuit analysis.
Model reduction transforms a system into a circuit of much smaller size that approximates the behavior of the original description. Many researchers have contributed to the advancement of the techniques and have demonstrated drastic reduction of circuit sizes with satisfactory output responses in published reports. Many of these techniques have also been implemented in software tools for applications. However, it is important for users to understand the techniques in order to use the package properly. To adopt these approaches, we need to inspect the following features.
1. Efficiency of the reduction: the complexity of the reduction algorithm determines the CPU time of the model reduction. The size of the reduced circuit affects the simulation time.
2. Reduction of both model order and terminals of circuits: reduction of terminals was investigated less in the past, and combined terminal and model order reduction leads to more compact models.
3. Robustness of the algorithms: the numerical stability of the reduction algorithm ensures the robustness of the operation.
4. Structure of the reduced systems: the reduced systems may or may not preserve important characteristics like symmetry, reciprocity, etc. Those structural characteristics are important for the reduction itself and for systems using the models.
5. Realizability of the reduced system: the reduced system is realizable if it is passive and we can implement it using electrical elements with positive or negative values. We can simulate a realizable system with general simulation tools. Otherwise, we need to check whether the reduced system satisfies the constraints of the simulation package.
6. Passivity of the reduced circuits: passivity ensures that the simulation outputs are bounded for bounded inputs even if the reduced circuit is combined with other passive subcircuits.
7. Error bounds: the error bounds of the output responses provide users with confidence in the results.
In this book, Professors Sheldon X.-D. Tan and Lei He present a comprehensive description of the reduction techniques. As active researchers in the field, they provide motivations for the approaches and insights into the algorithms. I found that the treatment of the subject is innovative and the general description is pleasant to read. The book covers the contemporary results and opens windows on future research directions in the field.
Chung-Kuan Cheng
Department of Computer Science and Engineering, The University of California at San Diego
Acknowledgments
Sheldon Tan is deeply indebted to his family for their support, encouragement and love: his wife, Yan Ye, and his lovely daughters, Felicia Tan and Lesley Tan. Sheldon Tan is also grateful to his parents, Linsen Tan and Wanhua Mao, for their constant encouragement and inspiration over the years.
Many people contributed to this book. The authors would like to thank some of their students, Boyuan Yan, Pu Liu, and Duo Li from the University of California at Riverside, for their dedicated work and the implementation of some of the algorithms discussed in this book. Boyuan Yan also made significant contributions to Chapters 3 and 4, and Pu Liu made noticeable contributions to Chapter 6. We also thank Dr. Hao Yu from UCLA for his contribution to Chapters 7 and 8. Some of the material in this book comes from the graduate-level course notes of EE213, Computer-aided electronic circuit simulation, taught by Sheldon Tan at UC Riverside, who would like to thank his students in this course for developing those notes over the past three years.
This book would never have been completed without support from many other sources. Sheldon Tan is grateful to Dr. Zhihong Liu, Dr. Lifeng Wu, and Dr. Bruce McGaughy from Cadence Design Systems Inc. for their strong support and collaboration on model order reduction related projects at the University of California at Riverside. Sheldon Tan would like to acknowledge the support of the National Science Foundation CAREER Award under grant No. CCF-0448534 and NSF awards under grants No. CCF-0541456 and No. OISE-0623038; the University of California MICRO program under grants No. 04-088, No. 05-111 and No. 06-252 via Cadence Design Systems; and the University of California at Riverside through the UC Regent’s Faculty Fellowship and UC Senate Research Funds. This support was essential for the development of the book.
Special thanks also to Ms. Anna Littlewood, Ms. Jeanette C. Alfoldi and Dr. Philip Meyler at Cambridge University Press for soliciting the proposal for this book and for their dedicated help, hard work and encouragement throughout the course of writing this book.
1 Introduction
1.1 The need for compact modeling of interconnects
As VLSI technology advances into the sub-100 nm regime with increased operating frequencies and decreased feature sizes, the nature of VLSI design has changed significantly. One fundamental paradigm change is that parasitic interconnect effects dominate both the chip’s performance and the growth in design complexity. As feature sizes become smaller, electromagnetic couplings between interconnects become more pronounced, and their adverse impact on circuit performance and power becomes more significant. Signal integrity, crosstalk, skin effects, substrate loss, and digital and analog substrate couplings are now adding severe complications to design methodologies already stressed by increasing device counts. It has been observed that today’s high-performance digital design essentially becomes analog circuit design [24], because a finer level of detail must be observed.
In addition to dominant deep submicron effects, the exponential increase of device counts causes a move in the opposite direction: we need to raise the design abstraction level to cope with the growth in design capacity. It is widely believed that behavioral and compact modeling for the purpose of synthesis, optimization, and verification of complicated systems-on-a-chip is a viable solution to these challenging design problems [66].
In this book, we focus on the compact modeling of on-chip interconnects and general linear time-invariant (LTI) systems, because interconnect parasitics, which are modeled as linear RLCM circuits (M here denotes the mutual inductances), are the dominant factors in complexity growth. Unchecked parasitics from on-chip interconnects and off-chip packaging will de-tune the performance of high-speed circuits in terms of slew rate, phase margin and bandwidth [2]. Reduction of design complexity, especially for the extracted high-order RLCM networks, is crucial for narrowing the rapidly widening design productivity gap in nanometer VLSI design and verification.
This book does not by any means intend to be comprehensive. The absence of coverage of work by other researchers should not diminish their contributions.
1.2 Interconnect analysis and modeling methods in a nutshell
Compact modeling of passive RLC interconnect networks has been an intensive research area in the past decade owing to increasingly adverse deep submicron effects and interconnect-dominant delays in current high-performance VLSI designs [23, 72]. A number of projection-based model order reduction (MOR) techniques have been introduced [32, 33, 38, 85, 91, 113, 114] to analyze the transient behavior of interconnects. The asymptotic waveform evaluation (AWE) algorithm was proposed first [91, 92]; it uses explicit moment matching to compute the dominant poles via Padé approximation. The AWE algorithm used the moment concept to measure and control the accuracy of the reduced system relative to the original system, which was an advantage over many previous black-box fitting methods. The AWE method also shows that the widely popular interconnect delay model, the Elmore delay model, is just the first-order moment of a circuit [28]. The success of the AWE method led to intensive research efforts on model order reduction of interconnect circuits.
The AWE method, however, is numerically unstable for higher-order moment approximation, and its approximation is carried out around s = 0. The same authors introduced some remedial methods to overcome this problem by frequency shifting and by expanding around s = ∞ [92], but a more effective method involved carrying out multiple-point expansions along the imaginary axis (called frequency hopping) and combining the expansion results, at a higher computing cost [19].
A more elegant solution to the numerical problem of AWE is to use projection-based model order reduction (MOR) methods, which are based on implicit moment matching. The main idea is to project the explicit moment space onto an orthonormal subspace, called the Krylov subspace.
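
The two observations above, that the Elmore delay is the first moment and that explicit moments are numerically poor while an orthonormal Krylov basis is not, can be checked on a toy circuit. The following is a minimal sketch and not code from the book: it assumes a uniform 10-section RC ladder with unit resistors and capacitors driven by a unit source, uses the nodal form (G + sC)x = bu so that the moment vectors are A^k G^{-1} b with A = −G^{-1}C, and all function names (`rc_ladder`, `moment_vectors`, `arnoldi_basis`) are hypothetical helpers written for this illustration.

```python
import math

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rc_ladder(n, r=1.0, c=1.0):
    """Nodal matrices of an n-section RC ladder driven through its first resistor (assumed test circuit)."""
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = (2.0 if i < n - 1 else 1.0) / r
        if i > 0:
            G[i][i - 1] = G[i - 1][i] = -1.0 / r
    C = [c] * n                      # diagonal capacitance matrix, stored as a vector
    b = [1.0 / r] + [0.0] * (n - 1)  # unit voltage source behind the first resistor
    return G, C, b

def apply_A(G, C, x):
    """Return A x with A = -G^{-1} C."""
    return [-v for v in solve(G, [ci * xi for ci, xi in zip(C, x)])]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def moment_vectors(G, C, b, q):
    """Explicit moment vectors m_k = A^k G^{-1} b for k = 0..q-1 (the AWE-style quantities)."""
    vecs = [solve(G, b)]
    for _ in range(q - 1):
        vecs.append(apply_A(G, C, vecs[-1]))
    return vecs

def arnoldi_basis(G, C, b, q):
    """Orthonormal basis of the same q-dimensional space, built by modified Gram-Schmidt."""
    r0 = solve(G, b)
    norm = math.sqrt(dot(r0, r0))
    V = [[x / norm for x in r0]]
    for _ in range(q - 1):
        w = apply_A(G, C, V[-1])
        for v in V:
            h = dot(w, v)
            w = [wi - h * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(dot(w, w))
        V.append([x / norm for x in w])
    return V

G, C, b = rc_ladder(10)
m = moment_vectors(G, C, b, 6)
V = arnoldi_basis(G, C, b, 6)
elmore_far_end = -m[1][-1]  # first moment at the last node, sign flipped
print("Elmore delay at far end:", elmore_far_end)
print("|cos| between last two explicit moment vectors:", abs(cosine(m[-2], m[-1])))
print("largest off-diagonal entry of V^T V:",
      max(abs(dot(V[i], V[j])) for i in range(6) for j in range(i)))
```

For this ladder the hand-computed Elmore delay at the far end is the sum over resistors of R times the downstream capacitance, that is 1 + 2 + ... + 10 = 55, which is what the negated first moment reproduces. The successive explicit moment vectors become almost parallel, because repeated multiplication by A converges to the dominant eigenvector, whereas the Gram-Schmidt basis spans the same space with mutually orthogonal vectors.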
The projection process basically preserves the moment information, but the Krylov vectors contain much less numerical noise compared with the explicit moments owing to the generation of Krylov subspace. The Pade via Lanczos (PVL) method was the first projection-based method [32], where the Lanczos process, which is a numerically stable method for computing eigenvalues of a matrix, was used to compute the Krylov subspace. Feldmann also proved that the reduced system implicitly matches the original system to a certain order of moments. Later on, the PVL method was extended to deal with multiple input and multiple output cases by MPVL [33], and to deal with circuits with symmetric matrices by SyPVL algorithm [38]. The Krylov subspace can also be generated by the Arnoldi process, which is based on the so-called orthogonal projection. Examples include the Arnoldi method [114] and Arnoldi transformation method [113]. But Arnoldi methods only match the half order of the moments or block moments for the same reduced order. To ensure the passivity of the reduced model further, it was shown in [63] that congruence transformation can preserve the model’s passivity if the system matrices are in a passive form. Later PRIMA [85] used the Krylov subspace vectors to form the projector for the congruence transformation, which leads to passive models with
the matched moments in the rational approximation paradigm. Projection-based methods, however, have several drawbacks. First, they are not efficient for circuits with many input and output terminals. This is reflected in the fact that the reduction cost is tied to the number of terminals; the number of poles of the reduced models is also proportional to the number of terminals. Second, PRIMA-like methods do not preserve structural properties such as the reciprocity of a network. Third, it is difficult to apply PRIMA-like methods to model very high frequency circuits where the circuit parameters are frequency dependent or where only measured data are available in terms of scattering parameters. Another approach to circuit-complexity reduction is by means of local node reduction. The main idea is to reduce the number of nodes in the circuits and approximate the newly added elements in the circuit matrix in reduced rational forms. The major advantage of these methods over projection-based methods is that the reduction can be made in a local manner and no overall solution of the whole circuit is required (with some circuit realization or synthesis techniques), which makes those methods very amenable to attacking large linear networks. This idea has been explored by approximate Gaussian elimination for RC circuits [27], by the TICER program [107], which is also based on Gaussian elimination but keeps only the first two moments, and by the extension of the TICER method to RLC circuits [3]. The rational approximation is also explored by the direct truncation of the transfer function algorithm (DTT) [56] for tree-structured RLC circuits and by an extended DTT method for non-tree structured RLC circuits [125]. Recently, a more general topology-based node-reduction method was proposed [96, 98], in which nodes are reduced one at a time (topologically, it is called Y-∆ transformation) and the generated admittance in the reduced network is represented as an order-reduced rational function of s.
This method is equivalent to symbolic Gaussian elimination (s is the only symbol), but the reduction is made on circuit topologies only, which corresponds to the nodal analysis (NA) formulation of a circuit. The stability is enforced by Hurwitz polynomial approximation. But this method only works for linear circuits with limited element types (RCLKVJ) and cannot be applied to reduce general linear circuits owing to the NA formulation requirement. A more general multi-node or block version of Y-∆ transformation, named hierarchical model order reduction or HMOR, was proposed by Tan [121, 124]. Since a number of nodes can be reduced at the same time, this method essentially leads to the general node-reduction based hierarchical model order reduction. The third major development for model order reduction of LTI systems is by means of control-theory-based truncated balanced realization (TBR) methods, where the weakly controllable and weakly observable state variables are truncated to achieve the reduced models [81, 87, 89, 131]. The TBR methods can produce nearly optimal models but they are more computationally expensive than projection-based methods. Also, TBR can produce passive models by the so-called positive real TBR methods [87, 131]. Recently, an empirical TBR method, named poor man’s TBR, was proposed to improve the scalability of the TBR methods; it also shows the connection with the generalized projection-based reduction methods [89].
1.3
Book outline
Model order reduction of linear time-invariant systems is still an active research area. There are many excellent books covering the classic MOR methods such as moment matching, Krylov subspace projection-based methods and node-reduction based methods [14, 20, 72, 97]. In this book, we look at some important developments in this area since the PRIMA method was introduced in 1997, but we make no attempt to be comprehensive. Instead, we primarily present several important methods that give new perspectives on model order reduction techniques in terms of improved efficiency, accuracy, and more compact reduced model sizes over the existing projection-based methods. For instance, we look at truncated balanced realization-based methods, the hierarchical model order reduction method, MOR methods for linear circuits with multiple terminals, MOR methods for highly inductive circuits, general passivity enforcement and circuit realization techniques, and terminal reduction methods. In the following, we give the outline of this book.
• Chapter 2 will review the concepts of model order reduction, moment matching, and classic explicit moment-matching methods like AWE for model reduction. Then we will review Krylov subspace projection-based model order reduction techniques, which are still the most widely used reduction techniques. We will present the basic concepts of Krylov subspace and passivity, numerical algorithms such as the Arnoldi and Lanczos methods for obtaining an orthogonal Krylov basis and reduction matrices, and the PRIMA method. We also present some important theoretical results regarding the Krylov subspaces. Some of the concepts introduced here will be used throughout this book.
• Chapter 3 studies the SVD-based model order reduction technique based on classic control theory, called truncated balanced realization (TBR), which leads to more compact models than the Krylov subspace projection MOR methods but at much higher computation costs. We will review the basic concepts of truncated balanced realization methods in terms of controllability and observability from the control-theory perspective. We then present the positive real TBR methods, which can produce passive models. After this, the empirical TBR method named poor man’s TBR is also presented, which can scale to reduce large circuits. Finally, some numerical and implementation issues with TBR methods are discussed.
• Chapter 4 presents a new passive TBR method, called PriTBR, for interconnect modeling. Different from existing passive truncated balanced realization (TBR) methods where numerically expensive Lur’e or algebraic Riccati equations (AREs) are solved, the new method performs balanced truncation on the linear systems in descriptor form by solving generalized Lyapunov equations. Passivity preservation is achieved by congruence transformation instead of simple truncations. The PriTBR method can be applied as a second stage model
order reduction to work with Krylov subspace methods to generate a nearly optimal reduced model from a large scale interconnect circuit while passivity, structure, and reciprocity are preserved at the same time.
• In Chapter 5, we present the hierarchical model order reduction method, named HMOR, which is based on multiple-point expansion. We will review the basic steps of the hierarchical reduction technique and describe the flow of the multiple-point expansion based on hierarchical reduction. The concept of symbolic analysis based on a determinant decision diagram (DDD) will be reviewed; this is the core algorithm for the HMOR. We also discuss the new pole search algorithm and some important properties of hierarchical reductions such as structure preservation and numerical stability for tree-like circuits.
• Chapter 6 first reviews a terminal reduction algorithm named SVDMOR, which performs the reduction on the input and output position matrices of a transfer function matrix. Then we present another general terminal reduction algorithm, named TermMerg, to efficiently reduce the terminal number of general linear interconnect circuits with a large number of input and output terminals considering delay variations. TermMerg can reduce many similar terminals and keep a small number of representative terminals. It can also work with passive model reduction algorithms to generate passive compact models. This is in contrast to SVDMOR, which may not produce passive models. After terminal reduction, traditional model order reduction methods can be applied to achieve more compact models and improve simulation efficiency.
• Chapter 7 deals with a new inductance modeling technique, the vector potential equivalent circuit (VPEC), and its application in the HMOR. We will discuss the concept of VPEC models and the VPEC-based mutual inductance sparsification technique. Some theoretical results on the passivity of VPEC models and their application in hierarchical model reduction will be presented.
• Chapter 8 presents a structure-preserving projection-based MOR method. It starts with the SPRIM method, which is the first structure-preserving reduction algorithm based on 2 × 2 partitioning of circuit matrices. Then we present a general block structure-preserving projection-based model order reduction technique, called TBS, which is an extension of the SPRIM-based algorithm. The new algorithm can preserve the structure of the reduced circuits, which makes it easier and more efficient to realize the reduced circuits. Also, we show that by partitioning the original circuits into many disjoint subcircuits, not only can we preserve the sparsity of the reduced circuits, but we can also match more poles of the original systems, thus improving the model accuracy.
• Chapter 9 introduces another generalized block structure-preserving reduced order interconnect macromodeling method (BSPRIM). The new approach extends the structure-preserving model order reduction (MOR) method SPRIM [37] into more general block forms. The chapter first shows how a SPRIM-like structure-preserving MOR method can be extended to deal with admittance RLC circuit matrices and shows that the 2q moments are still matched and symmetry is preserved. It then shows that 2q moment matching cannot be achieved when the RLC circuits are driven by both current and voltage sources. BSPRIM improves on SPRIM by introducing a re-orthonormalization process on the partitioned projection matrix. The BSPRIM method can deal with more circuit partitions and can perform the general block structure-preserving MOR for circuits formulated in impedance and admittance forms. The reduced models produced by the proposed BSPRIM will still match the 2q moments and preserve the circuit structure properties, like symmetry, as SPRIM does.
• Chapter 10 examines some effective methods to enforce the passivity of models and optimize models. Model passivity and passivity enforcement are widely used for building realizable models from direct measurements and simulation data for radio-frequency and microwave applications. They are also used in the HMOR and TermMOR methods. The studied methods include convex-based passivity enforcement and optimization and least-square-based methods for active model optimization.
• Chapter 11 studies the problem of realizing a reduced model as a SPICE-compatible netlist. This process is called model realization. We first present a traditional one-port network synthesis technique, Brune’s method, for realizing a passive circuit from its mathematical model. The concept of traditional network realization will be covered. We then present a general multiport network-based realization technique, which includes a one-port realization based on Foster’s method and a general multiple-port impedance realization based on one-port realization techniques. We also discuss how to realize general non-symmetric circuits.
• Chapter 12 presents a novel compact reduced modeling technique, called TermMOR, to reduce interconnect circuits with many external ports. The proposed method overcomes the difficulty associated with subspace projection-based MOR methods for reducing circuits with many ports.
The new method can lead to much smaller reduced models for a given frequency range or much higher accuracy given the same model sizes than subspace projection-based methods. Like HMOR, the TermMOR method is a closed-loop method, as it can produce models matching the desired frequency range precisely.
• Chapter 13 presents an approach to enforcing the passivity of a reduced system of general passive linear time-invariant circuits. Instead of making the reduced models passive for infinite frequencies, the method works on the signal waveform driving reduced models. It slightly shapes the waveforms of the signal such that the resulting signal spectra are band limited to the frequency range in which the reduced system is passive. As a result, the reduced models only need to be band-limited passive (also called conditionally passive), which can be achieved much more easily than traditional passivity for a reduced system, especially for one with many terminals or requiring wide band accuracy (more poles). We propose to use spectrum truncation via FFT and IFFT and low-pass-filter-based approaches for transient waveform shaping processing. We analyze the delay and distortion effects caused by using low-pass filters and present methods
to mitigate the two effects.
1.4
Summary
In this chapter, we first present the case for compact modeling of interconnect circuits. We then briefly survey the previous developments on this topic and outline what will be covered in each chapter of this book. Throughout the book, numerical examples are provided to shed light on the discussed topics and to help the reader gain more insight into the discussed algorithms. Our treatment of many topics may not be mathematically rigorous. Instead, we try to present the topics from a typical computer-aided design (CAD) engineer’s perspective and try to help the reader apply those techniques to solve real VLSI design problems and develop more efficient simulation tools.
2 Projection-based model order reduction algorithms
Compact modeling of passive RLC interconnect networks has been an intensive research area in the past decade owing to increasing signal integrity effects and interconnect-dominant delay in current system-on-a-chip (SoC) design [72]. In this chapter, we briefly review the existing model order reduction (MOR) algorithms for linear time-invariant (LTI) systems developed over the past two decades in the electrical computer-aided design community. Since compact modeling of LTI systems is a well researched and studied field, many efficient approaches have been proposed over the years. Given the space in this book, we cannot review all of them and neither do we attempt to be complete in our review. Instead, we mainly review the Krylov subspace projection-based model order reduction methods, which are widely used MOR methods and are closely related to the rest of this book. Although there exists an excellent and detailed treatment of Krylov subspace projection-based methods already [14], for the completeness of this book, we still present some basic concepts, algorithms and important results for Krylov subspace projection-based MOR methods. We try to present them in a way that can be easily understood from the practical application point of view.
2.1
Moments and moment-matching methods In this section, we briefly review the concepts of time-domain moments, the Elmore delay and Pade-approximation-based moment-matching method, which are important concepts for subspace projection-based model order reduction methods.
2.1.1
Concept of moments
In the s domain, the transfer function of a linear network H(s) is defined as the ratio of the output to the input under zero initial conditions:
$$H(s) = \frac{Y(s)}{X(s)}. \tag{2.1}$$
If the input is the impulse function δ(t), its Laplace transformation is 1. So the transfer function is also the impulse response at the port. If we expand H(s) around s = 0 by the Taylor series expansion, we have
$$H(s) = \sum_{k=0}^{\infty} m_k s^k, \tag{2.2}$$
where
$$m_k = \frac{1}{k!} \times \left.\frac{d^k H(s)}{ds^k}\right|_{s=0}. \tag{2.3}$$
The kth coefficient of H(s), $m_k$, is called the kth moment. Assuming that h(t) is the corresponding time-domain impulse response, we have
$$H(s) = \int_0^{\infty} e^{-st} h(t)\,dt. \tag{2.4}$$
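We rewrite the moments defined in (2.3) in terms of h(t) by using the Taylor expansion of $e^{-st}$ in the Laplace transform (2.4), and we have
$$\begin{aligned}
H(s) &= \int_0^{\infty} h(t)\,e^{-st}\,dt \\
&= \int_0^{\infty} h(t)\left[1 - st + \frac{s^2 t^2}{2} + \cdots + \frac{(-1)^k}{k!} s^k t^k + \cdots\right] dt \\
&= \sum_{k=0}^{\infty} s^k\,\frac{(-1)^k}{k!} \int_0^{\infty} t^k h(t)\,dt.
\end{aligned} \tag{2.5}$$
Comparing (2.5) with the definition $H(s) = \sum_{k=0}^{\infty} m_k s^k$, the moments can be rewritten as
$$m_k = \frac{(-1)^k}{k!} \int_0^{\infty} t^k h(t)\,dt, \tag{2.6}$$
or
$$m_0 = \int_0^{\infty} h(t)\,dt, \tag{2.7}$$
$$m_1 = -\int_0^{\infty} t\,h(t)\,dt, \tag{2.8}$$
$$m_2 = \frac{1}{2!} \int_0^{\infty} t^2 h(t)\,dt, \tag{2.9}$$
and so on.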
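As a quick check of (2.6), consider a single-pole response. The time constant τ and this particular h(t) are illustrative assumptions, not an example taken from the text:

```latex
h(t) = \frac{1}{\tau}\,e^{-t/\tau}, \qquad
H(s) = \frac{1}{1 + s\tau} = \sum_{k=0}^{\infty} (-\tau)^k s^k ,
\\[4pt]
m_k = \frac{(-1)^k}{k!} \int_0^{\infty} t^k\,\frac{1}{\tau}\,e^{-t/\tau}\,dt
    = \frac{(-1)^k}{k!}\,\tau^k\,k! = (-\tau)^k .
```

Both routes give the same result, $m_0 = 1$, $m_1 = -\tau$, $m_2 = \tau^2$: the Taylor coefficients of H(s) coincide with the scaled time-domain integrals, and the moments alternate in sign for a stable real pole.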
2.1.2
Elmore delay
Figure 2.1 The network of an ideal delay of T: the input f (t) passes through an ideal delay T to produce the output f (t − T ).
For an ideal delay network as shown in Figure 2.1, the response of the network for an input function f (t) is f (t − T ). We take the Laplace transformation of f (t − T ) and we have
$$\mathcal{L}(f(t - T)) = e^{-sT} F(s), \tag{2.10}$$
where F (s) is the Laplace transformation of f (t) and $\mathcal{L}(\cdot)$ is the Laplace transformation operator. So the ideal delay element’s transfer function is
$$H_d(s) = e^{-sT}. \tag{2.11}$$
If we take the derivative of $H_d(s)$ with respect to s, we have
$$\left.\frac{dH_d(s)}{ds}\right|_{s=0} = -T. \tag{2.12}$$
As a result, we may use $-\,dH(s)/ds\,|_{s=0}$ as an approximation for the delay of a general linear network described by H(s), i.e.,
$$T_d \approx -\left.\frac{dH(s)}{ds}\right|_{s=0}. \tag{2.13}$$
$T_d$ is the so-called Elmore delay [28], which is, up to sign, the first-order moment in (2.3). So, by (2.8), we have
$$T_d = -m_1 = \int_0^{\infty} h(t)\,t\,dt = -\left.\frac{dH(s)}{ds}\right|_{s=0}. \tag{2.14}$$
Another popular mathematical interpretation of the Elmore delay comes from a probability perspective. Physically, the delay of a network can be measured using the 50% point delay of the monotonic step response to a unit step input. If h(t) is the unit impulse response, the unit step response is $\int_0^{t} h(x)\,dx$. The 50% point delay τ is then defined by
$$\int_0^{\tau} h(t)\,dt = 0.5. \tag{2.15}$$
If we treat h(t), which is assumed to be non-negative for all t ≥ 0, as a probability density function (p.d.f.), i.e., $\int_0^{\infty} h(t)\,dt = 1$, the Elmore delay, $T_d$, is essentially the mean of the p.d.f. h(t), the impulse response of the network,
$$T_d = -m_1 = \int_0^{\infty} h(t)\,t\,dt, \tag{2.16}$$
which, by (2.8), is the negative of the first-order moment m1. It can be shown that for an ideal delay network or a network whose impulse response is symmetric, the Elmore delay is exactly the actual 50% delay of the network. For practical networks, whose impulse responses are always skewed as shown in Figure 2.2, the Elmore delay is just an estimate. In fact, Gupta, Pileggi, et al. proved that the Elmore delay is an upper bound on the 50% delay for general RC circuits [48]. The Elmore delay was first introduced by Elmore in 1948 for estimating the delay of active circuits. It was popularized by Penfield and Rubinstein [100] as it can be computed directly and efficiently for RC trees by using the R and C values
Figure 2.2 The unit impulse and unit step responses (vertical axis: Volts; horizontal axis: Time).
from the tree circuits in linear time. Because of its high fidelity as a delay metric, it has been used intensively for the delay and timing estimation of interconnects during the physical layout design process [23].
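The following minimal sketch illustrates how the Elmore delay follows from R and C values alone. It assumes the standard Elmore formula for an RC ladder (the delay at node i is the sum over all capacitors of the capacitance times the resistance shared by the source-to-i and source-to-k paths), which is not spelled out in the text; the function name `elmore_delays` and the element values are hypothetical. The double loop is quadratic for clarity; a linear-time version would accumulate downstream capacitance in one traversal.

```python
import math


def elmore_delays(resistances, capacitances):
    """Elmore delay at every node of an RC ladder driven from one end.

    resistances[i] connects node i-1 (the source for i = 0) to node i;
    capacitances[i] connects node i to ground.
    """
    n = len(resistances)
    delays = []
    for i in range(n):
        t_d = 0.0
        for k in range(n):
            # Resistance shared by the source-to-i and source-to-k paths.
            shared = sum(resistances[: min(i, k) + 1])
            t_d += shared * capacitances[k]
        delays.append(t_d)
    return delays


# Single RC stage: the Elmore delay is RC, the exact 50% delay is RC*ln(2).
R, C = 2.0, 3.0
single_stage = elmore_delays([R], [C])[0]
exact_50 = R * C * math.log(2.0)

# Two identical stages with unit R and C.
two_stage = elmore_delays([1.0, 1.0], [1.0, 1.0])
```

For the single stage the Elmore value RC exceeds the exact 50% delay RC ln 2, which is consistent with the Elmore delay acting as a pessimistic estimate for a skewed impulse response.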
2.2
Moment computation in MNA formulation In this section, we present the method for efficient computation of moments using classic circuit analysis techniques. We start with the modified nodal analysis (MNA) formulation of the general RLC linear circuits and derive the recursive moment computation formula, which is the most critical step for all the moment-matching methods and Krylov subspace projection-based model order reduction methods.
2.2.1
Recursive moment computation
For a general linear network, we can apply modified nodal analysis to formulate it in the state space equation form
$$G x(t) + C \dot{x}(t) = B u(t), \qquad y(t) = L^T x(t), \tag{2.17}$$
where G and C are the conductive and storage element matrices; B and L are the input and output positions matrices; and state variables x can be nodal voltage or branch currents of the linear circuit. Upon applying the Laplace transformation of the state equation, we have the
state equations in the s domain
$$G X(s) + sC X(s) - C x(0) = B U(s), \qquad Y(s) = L^T X(s). \tag{2.18}$$
Assuming the initial condition is zero, x(0) = 0, and an impulse input (U (s) = 1) is applied, the state equation will become
$$(G + sC) X(s) = B, \qquad Y(s) = L^T X(s). \tag{2.19}$$
Expanding X(s) using Taylor’s series at s = 0, we obtain
$$(G + sC)(x_0 + x_1 s + x_2 s^2 + \cdots + x_q s^q + \cdots) = B. \tag{2.20}$$
We then obtain the state moment computation formula in a recursive form
$$\begin{aligned}
x_0 &= G^{-1} B, \\
x_1 &= -G^{-1} C x_0, \\
&\;\;\vdots \\
x_i &= -G^{-1} C x_{i-1} \quad \text{for } i > 0.
\end{aligned} \tag{2.21}$$
Notice that $G^{-1}$ here means that we solve the linear system Gx = b, and the same G is used in solving for all the moments. Numerically, we only need to perform one LU decomposition of G and then use the resulting triangular factors to solve for all moments sequentially. Typically, we only need a few orders, say q, of moments to achieve the required accuracy in terms of transient waveforms for RC or RLC circuits, where q ≪ n and n is the size of the state vector x. As a result, transient analysis using the moment method is much faster than integration-based numerical analysis as it is independent of time steps and time intervals. The output moments are the moments of the transfer function
$$H(s) = L^T (G + sC)^{-1} B. \tag{2.22}$$
The moments $m_i$ are related to the state moments by
$$m_i = L^T x_i. \tag{2.23}$$
As a result, the transfer function moments $m_i$ can be computed directly in a recursive way:
$$\begin{aligned}
x_0 &= G^{-1} B; & m_0 &= L^T x_0, \\
x_1 &= -G^{-1} C x_0; & m_1 &= L^T x_1, \\
&\;\;\vdots \\
x_i &= -G^{-1} C x_{i-1}; & m_i &= L^T x_i \quad \text{for } i > 0.
\end{aligned} \tag{2.24}$$
In general, the ith block moment is given by
$$m_i = L^T (-G^{-1} C)^i G^{-1} B. \tag{2.25}$$
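The recursion (2.24) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the book's implementation: the function name `transfer_moments` and the two-node RC ladder values are assumptions, and `np.linalg.solve` refactors G on every call, whereas a production code would factor G once and reuse the factors as described above.

```python
import numpy as np


def transfer_moments(G, C, B, L, num_moments):
    """Return [m_0, ..., m_{num_moments-1}] following recursion (2.24)."""
    x = np.linalg.solve(G, B)              # x_0 = G^{-1} B
    moments = [L.T @ x]                    # m_0 = L^T x_0
    for _ in range(1, num_moments):
        x = -np.linalg.solve(G, C @ x)     # x_i = -G^{-1} C x_{i-1}
        moments.append(L.T @ x)            # m_i = L^T x_i
    return moments


# Hypothetical two-node RC ladder: a current source drives node 1, R1 ties
# node 1 to ground, R2 joins nodes 1 and 2, C1 and C2 go to ground.
R1, R2, C1, C2 = 1.0, 2.0, 0.5, 0.25
G = np.array([[1 / R1 + 1 / R2, -1 / R2],
              [-1 / R2, 1 / R2]])
C = np.diag([C1, C2])
B = np.array([[1.0], [0.0]])               # input enters at node 1
L = np.array([[0.0], [1.0]])               # output observed at node 2

moments = transfer_moments(G, C, B, L, 4)

# Cross-check against the closed form (2.25).
A = -np.linalg.solve(G, C)
x0 = np.linalg.solve(G, B)
closed_form = [L.T @ np.linalg.matrix_power(A, i) @ x0 for i in range(4)]
```

Here `moments[0]` is the DC gain from the source current to the node-2 voltage; no DC current flows through R2, so it equals R1.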
2.3
Asymptotic waveform evaluation Asymptotic waveform evaluation (AWE) is an efficient frequency-domain analysis approach and was proposed in 1990 by L. Pileggi [91, 92]. It basically combines the fast recursive moment computation methods presented in Subsection 2.2.1 with Pade approximation to compute the poles and residues of the order truncated transfer functions. We first present the Pade approximation method.
2.3.1
Pade approximation
The idea of Pade approximation is to approximate a transfer function H(s) by an order-limited rational function $H_q(s)$, where q is the order. Specifically, after the moments are generated, a general multi-input multi-output (MIMO) transfer function H(s) is represented in a Taylor series expansion form, or block moment form,
$$H(s) = m_0 + m_1 s + m_2 s^2 + \cdots, \tag{2.26}$$
where $m_i$ is the ith block moment of the circuit, and
$$m_i = L^T x_i, \tag{2.27}$$
where $x_i$ is the ith-order state block moment vector. Once the moment expansion is available, a Pade approximation is calculated. For a qth-order approximation, 2q moments must be computed. Without loss of generality, we consider a single-input single-output transfer function. Consider the transfer function at entry (j, k) of the block moments and let $m_i = m_{i,jk}$ for i = 0, 1, 2, ...; the scalar moment expansion can then be written as
$$H(s) = m_0 + m_1 s + m_2 s^2 + \cdots + m_{2q-1} s^{2q-1} + O(s^{2q}). \tag{2.28}$$
Then we can use a qth-order Pade approximation rational function $H_q(s)$ to match H(s),
$$H_q(s) = \frac{P(s)}{Q(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \cdots + a_{q-1} s^{q-1}}{1 + b_1 s + b_2 s^2 + \cdots + b_q s^q}, \tag{2.29}$$
such that they agree on the first 2q terms in the moment form, i.e.,
$$H(s) = H_q(s) + O(s^{2q}). \tag{2.30}$$
To compute the coefficients $a_i$ and $b_i$ in (2.29), we have
$$P(s) = H_q(s) Q(s), \tag{2.31}$$
or, using (2.30) to replace $H_q(s)$ by the first 2q terms of the moment expansion,
$$\sum_{k=0}^{q-1} a_k s^k = \left(\sum_{k=0}^{2q-1} m_k s^k\right)\left(\sum_{k=0}^{q} b_k s^k\right) + O(s^{2q}). \tag{2.32}$$
Here $b_0 = 1$. The equation can be solved for different powers of s separately. If we compare the coefficients of $s^q$ on the left-hand side and the right-hand side of (2.32),
we have
$$0 = m_0 b_q + m_1 b_{q-1} + \cdots + m_{q-1} b_1 + m_q. \tag{2.33}$$
Doing this for the coefficients of $s^q$ to $s^{2q-1}$ yields a set of linear equations
$$\begin{bmatrix}
m_0 & m_1 & \cdots & m_{q-1} \\
m_1 & m_2 & \cdots & m_q \\
\vdots & \vdots & \ddots & \vdots \\
m_{q-1} & m_q & \cdots & m_{2q-2}
\end{bmatrix}
\begin{bmatrix} b_q \\ b_{q-1} \\ \vdots \\ b_1 \end{bmatrix}
= -\begin{bmatrix} m_q \\ m_{q+1} \\ \vdots \\ m_{2q-1} \end{bmatrix}. \tag{2.34}$$
Through (2.34), we can solve for all the coefficients $b_i$ of Q(s). To solve for the coefficients $a_k$ of the numerator P (s), we use the following equations, which are obtained by comparing the coefficients of $s^0$ to $s^{q-1}$ on the left-hand side and the right-hand side of (2.32):
$$\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{q-1} \end{bmatrix}
= \begin{bmatrix}
m_0 & 0 & 0 & \cdots & 0 \\
m_1 & m_0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
m_{q-1} & m_{q-2} & m_{q-3} & \cdots & m_0
\end{bmatrix}
\begin{bmatrix} 1 \\ b_1 \\ \vdots \\ b_{q-1} \end{bmatrix}. \tag{2.35}$$
2.3.2
Partial fraction decomposition and time-domain response
After the Pade approximation, we obtain the order-reduced transfer function in (2.29). To compute the transient response waveforms, we need to derive the partial fraction form from the rational function form. Partial fraction decomposition of a transfer function H(s) represents it in the following form,
$$H(s) = \frac{k_1}{s - p_1} + \frac{k_2}{s - p_2} + \cdots + \frac{k_q}{s - p_q} = \sum_{i=1}^{q} \frac{k_i}{s - p_i}, \tag{2.36}$$
where $p_i$ and $k_i$ are the poles and residues of the transfer function. From this representation of H(s), the time-domain impulse response h(t) can be computed by the closed form expression
$$h(t) = \sum_{i=1}^{q} k_i e^{p_i t} u(t), \tag{2.37}$$
where u(t) is the unit step function. Once the impulse response poles and residues are known, the responses for some ideal signals such as the step and limited ramp inputs can be expressed easily. For instance, for unit step input U (s) = 1/s, the output response in the s-domain
is
$$\begin{aligned}
Y(s) &= H(s) U(s) = \frac{1}{s} H(s) \\
&= \sum_{i=1}^{q} \frac{k_i}{(s - p_i)\,s}
= \sum_{i=1}^{q} \frac{k_i}{p_i}\left(\frac{1}{s - p_i} - \frac{1}{s}\right),
\end{aligned} \tag{2.38}$$
and the corresponding time-domain response, y(t), is
$$y(t) = \left[\sum_{i=1}^{q} \frac{k_i}{p_i}\left(e^{p_i t} - 1\right)\right] u(t). \tag{2.39}$$
2.3.3
Derivation of poles and residues
To derive the partial fraction representation of a rational transfer function H(s), we need to compute the poles $p_i$ and residues $k_i$. The poles are the roots of the denominator polynomial Q(s) in (2.29), $Q(s) = 1 + b_1 s + b_2 s^2 + \cdots + b_q s^q$, and can be obtained by many numerical root-finding procedures [44]. To compute the residues $k_i$, there are two methods. The first method requires that we know both the numerator P (s) and the denominator Q(s) (i.e., their coefficients). The second method does not need the numerator polynomial P (s); it computes the residues directly once the poles are available. We first present the first method. Since Q(s) in (2.29) is normalized to have a constant term of 1, its factored form is $Q(s) = b_q \prod_{i=1}^{q} (s - p_i)$. For the residues $k_j$ in (2.36), we therefore have
$$H(s) = \frac{P(s)}{b_q \prod_{i=1}^{q} (s - p_i)} = \sum_{i=1}^{q} \frac{k_i}{s - p_i}.$$
If we multiply both sides by the factor $(s - p_j)$, we obtain
$$\frac{P(s)}{b_q \prod_{i=1, i \neq j}^{q} (s - p_i)} = (s - p_j) \sum_{i=1, i \neq j}^{q} \frac{k_i}{s - p_i} + k_j.$$
To derive $k_j$, we substitute $s = p_j$ in the above equation, thus
$$k_j = \frac{P(p_j)}{b_q \prod_{i=1, i \neq j}^{q} (p_j - p_i)}. \tag{2.40}$$
In the second method for computing residues ki , we first expand each of the
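partial fraction terms in (2.36):
$$H(s) = \sum_{i=1}^{q} \frac{k_i}{s - p_i} = -\sum_{i=1}^{q} \frac{k_i}{p_i}\left(1 + \frac{s}{p_i} + \frac{s^2}{p_i^2} + \cdots\right). \tag{2.41}$$
Comparing the coefficients of (2.41) with those of H(s) in its moment expansion form $H(s) = m_0 + m_1 s + m_2 s^2 + \cdots + m_{2q-1} s^{2q-1}$, we arrive at
$$\begin{aligned}
-\left(\frac{k_1}{p_1} + \frac{k_2}{p_2} + \cdots + \frac{k_q}{p_q}\right) &= m_0, \\
-\left(\frac{k_1}{p_1^2} + \frac{k_2}{p_2^2} + \cdots + \frac{k_q}{p_q^2}\right) &= m_1, \\
&\;\;\vdots \\
-\left(\frac{k_1}{p_1^{2q}} + \frac{k_2}{p_2^{2q}} + \cdots + \frac{k_q}{p_q^{2q}}\right) &= m_{2q-1}.
\end{aligned} \tag{2.42}$$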
Since the number of unknown residues $k_i$ is q, we only need to select q equations from (2.42) to solve for $k_i$, i = 1, ..., q. We summarize the steps involved in AWE using moment matching in the following algorithm:
Asymptotic waveform evaluation
1. Compute 2q moments using (2.21) and (2.23); the choice of q depends on the accuracy requirement. In practice, q < 5 is typically used.
2. Obtain the coefficients of the denominator polynomial Q(s), and the coefficients of the numerator polynomial P (s) if the first method is used to compute residues, in (2.29) by solving (2.34) and (2.35).
3. Obtain the roots (thus the poles $p_i$, i = 1, ..., q) of the denominator polynomial Q(s).
4. Find the corresponding residues $k_i$, i = 1, ..., q, using (2.40) or (2.42).
5. Compute the time-domain responses for given inputs using closed form expressions for ideal input signals or recursive convolution for general input signals.
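Steps 2 to 5 above can be sketched as follows, assuming the moments are already available. This is a minimal illustration rather than the original AWE code: the function names are hypothetical, the two-pole test system is made up, and the residues are evaluated as $P(p_j)/Q'(p_j)$, which equals the product form in (2.40) when Q(s) has constant term 1 and leading coefficient $b_q$.

```python
import numpy as np


def pade_coefficients(m, q):
    """Solve (2.34) for b_1..b_q and (2.35) for a_0..a_{q-1} from 2q moments."""
    m = np.asarray(m, dtype=float)
    hankel = np.array([[m[i + j] for j in range(q)] for i in range(q)])
    b_desc = np.linalg.solve(hankel, -m[q:2 * q])     # [b_q, ..., b_1]
    b = np.concatenate(([1.0], b_desc[::-1]))         # [b_0 = 1, b_1, ..., b_q]
    a = np.array([sum(m[k - j] * b[j] for j in range(k + 1)) for k in range(q)])
    return a, b


def poles_and_residues(a, b):
    """Poles are the roots of Q(s); residues are P(p_j) / Q'(p_j)."""
    q_desc = b[::-1]                  # np.roots/np.polyval expect highest power first
    poles = np.roots(q_desc)
    residues = np.polyval(a[::-1], poles) / np.polyval(np.polyder(q_desc), poles)
    return poles, residues


def step_response(poles, residues, t):
    """Closed-form unit step response (2.39) at the time points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    terms = (residues / poles)[:, None] * (np.exp(np.outer(poles, t)) - 1.0)
    return terms.sum(axis=0).real


# Hypothetical two-pole system; its moments follow from (2.42).
true_poles = np.array([-1.0, -5.0])
true_residues = np.array([2.0, 3.0])
m = [-(true_residues / true_poles ** (j + 1)).sum() for j in range(4)]

a, b = pade_coefficients(m, q=2)
poles, residues = poles_and_residues(a, b)
order = np.argsort(poles.real)
y_final = step_response(poles, residues, [50.0])[0]
```

Because the test system has exactly q = 2 poles, the Pade fit recovers them, and the step response settles at the DC gain $m_0$. For real circuits with n ≫ q states the recovered poles are only approximations of the dominant ones.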
One important observation from (2.42) is that the moment $m_i$ is a power function of the poles, $m_i(p_1, p_2, \ldots, p_q)$. As a result, the higher-order (less dominant) poles, which have large magnitudes, will soon be lost numerically in the moments, and high-order moments will approximate the dominant poles (small poles). Practically, this reflects the fact that explicit moment methods like AWE cannot reliably obtain high-order poles and AWE can easily generate unstable positive poles when high-order moments are computed [91]. Typically, the reliable orders that can be obtained by AWE are approximately 5-6 based on our experiments. That implies two or three poles can be reliably computed by AWE.
To mitigate this problem, many approaches have been proposed in the past. Examples are frequency scaling, frequency shifting [92] and multiple point expansions such as complex frequency hopping (CFH) [19]. In this book, we present two recent developments for improving the numerical stability of direct moment matching in the sequel.
2.3.4
Multi-node moment matching (MMM)
As mentioned in the previous section, the major issue with direct moment-matching methods like AWE is that the higher-order moments become less accurate: higher-order moments are power functions of the poles, as shown in (2.42), so the information on higher-order poles is soon lost numerically. One effective way to partially mitigate this problem is the multi-node moment matching (MMM) technique, which can obtain a numerically more stable estimation of the system poles [54]. The main idea of the MMM method is to estimate the poles by using the moment information from different nodes or from different input stimuli, instead of using a single node with a single stimulus as done in the traditional moment-matching method [91]. The rationale behind this method is that the poles seen from different nodes and from different input stimuli are the same. By using MMM, we need fewer orders of moments to compute the same number of poles. For instance, if moments from p different nodes with a single input are used, we can get the p poles by using just p + 1 moments. If k inputs are allowed, then we can get the p poles by using just p/k + 1 moments for each input with about the same computing cost. Since the pole information is better preserved numerically in lower-order moments, as shown in (2.42), we can obtain a much better and numerically more stable estimation of the poles. Specifically, we start with a linear dynamic system with n states:
$$\dot{x} = Ax + Bu, \tag{2.43}$$
where A is the n × n original system matrix and B is the n × 1 position matrix. Assume that there exists a reduced system, $A_{q \times q}$, of order q to be determined that approximates the original circuit by simultaneously matching the moments of the selected q state variables x. This system is given by
$$\dot{x} = A_{q \times q}\,x + B_q u, \tag{2.44}$$
where $A_{q \times q}$ is a q × q reduced system matrix and $B_q$ is a q × 1 reduced position matrix. We select q nodes, which have one-to-one correspondence with the q variables selected in $A_{q \times q}$, and their corresponding moments in the original circuit become the moment vectors $m_0, m_1, m_2, \ldots, m_q$. According to (2.44), for a unit impulse input (U (s) = 1), we have
$$s\,[m_0 + m_1 s + m_2 s^2 + \cdots + m_q s^q + \cdots] = A_{q \times q}\,[m_0 + m_1 s + m_2 s^2 + \cdots + m_q s^q + \cdots] + B_q. \tag{2.45}$$
By comparing the coefficients of equal powers of s on both sides, we have the following equations:
$$\begin{aligned}
B_q &= -A_{q \times q}\,m_0, \\
m_0 &= A_{q \times q}\,m_1, \\
m_1 &= A_{q \times q}\,m_2, \\
&\;\;\vdots \\
m_{q-1} &= A_{q \times q}\,m_q.
\end{aligned} \tag{2.46}$$
Excluding the first equation, the equations above can be put into a matrix form as
$$A_{q \times q}\,[m_1, m_2, \ldots, m_q] = [m_0, m_1, m_2, \ldots, m_{q-1}]. \tag{2.47}$$
As a result, we can compute $A_{q \times q}$ as
$$A_{q \times q} = [m_0, m_1, m_2, \ldots, m_{q-1}]\,[m_1, m_2, \ldots, m_q]^{-1}. \tag{2.48}$$
Then the eigenvalues of Aq×q, which can be obtained by performing the eigen-decomposition on Aq×q, approximate the q dominant poles of the original system in (2.43); equivalently, the eigenvalues of the inverse matrix [m1, m2, ..., mq][m0, m1, ..., mq−1]−1 are the reciprocals of those poles. Notice that only q + 1 moments are required to compute the q dominant poles when we select q nodes. If we use the moments from different inputs, then the order of moments used in (2.47) can be further reduced, as shown in [54].
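A minimal sketch of the single-input MMM idea follows. It is not the implementation of [54]: the function name `mmm_poles` and the 3 × 3 test matrix are assumptions. With the matrix built as in (2.48), so that $m_{k-1} = A_{q \times q}\,m_k$, the sketch takes its eigenvalues directly as the pole estimates; if the matrix is formed the other way round, as $[m_1, \ldots, m_q][m_0, \ldots, m_{q-1}]^{-1}$, its eigenvalues are the reciprocals of the poles.

```python
import numpy as np


def mmm_poles(A, B, nodes):
    """Estimate len(nodes) dominant poles of x' = A x + B u from the
    state moments observed at the selected nodes (single input)."""
    q = len(nodes)
    # Moments of X(s) = (sI - A)^{-1} B about s = 0:
    # m_0 = -A^{-1} B and m_k = A^{-1} m_{k-1}, i.e. m_{k-1} = A m_k.
    m = [-np.linalg.solve(A, B)]
    for _ in range(q):
        m.append(np.linalg.solve(A, m[-1]))
    selected = [mk[nodes] for mk in m]          # keep only the chosen nodes
    M0 = np.column_stack(selected[:q])          # [m_0, ..., m_{q-1}]
    M1 = np.column_stack(selected[1:q + 1])     # [m_1, ..., m_q]
    Aq = M0 @ np.linalg.inv(M1)                 # (2.48)
    return np.linalg.eigvals(Aq)


# Hypothetical stable 3-state system with a single input at state 0.
A = np.array([[-1.0, 0.5, 0.0],
              [0.5, -3.0, 1.0],
              [0.0, 1.0, -10.0]])
B = np.array([1.0, 0.0, 0.0])

# Selecting all n nodes reproduces A itself, so the poles are exact.
estimated = np.sort(np.real(mmm_poles(A, B, [0, 1, 2])))
exact = np.sort(np.real(np.linalg.eigvals(A)))
```

Choosing fewer nodes than states, for example `mmm_poles(A, B, [0, 1])`, returns two estimates that approximate the dominant (smallest-magnitude) poles using only q + 1 = 3 moments; how close they are depends on how well separated the poles are.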
2.3.5
Projection-based methods for pole computation
Another way of finding poles in a numerically reliable way is by means of projection-based model order reduction methods, where the poles are computed from the eigendecomposition of the reduced system. This approach makes sense when we are interested in a fast transient response obtained by explicit moment matching rather than in a reduced model of the system: the results of explicit moment matching can be transformed into time-domain waveforms very easily, without simulating a reduced model. Hence, explicit moment matching is still appealing if the poles can be computed reliably. Krylov subspace projection-based methods will be discussed in detail in the following sections. Here, we only briefly mention how these methods can be used to generate the dominant poles of a system. In the Krylov subspace projection-based model order reduction process, the moment vectors generated recursively in (2.21) are first orthonormalized during the generation process and then used to build a projection matrix. The projection matrix is then used to reduce the original circuit matrices by congruence transformation, which ensures that the reduced system is passive (and thus stable) if the original matrices are formulated properly [63, 85]. With this method, we require only q moments to find q poles. The method guarantees that all the computed poles are stable (negative real part) [63, 85], owing to the nature of the congruence transformation and the MNA formulation of the original RLC circuit matrices. Specifically, we obtain the first q moment vectors through (2.21). Then we form the following n × q matrix, in which each moment vector is a column:
$$M = [m_0, m_1, \cdots, m_{q-1}]_{n\times q}, \tag{2.49}$$
where q ≪ n, and n is the number of state variables (nodes) in the original circuit and also the dimension of the moment vectors. Then we orthonormalize M into an n × q projection matrix V such that the columns of V are orthonormal, i.e., $v_i^T v_j = \delta_{ij}$. Such an orthogonalization can be carried out with the standard Gram–Schmidt method [44]. In practice, the moments can be orthonormalized during the generation process, as in the Arnoldi or Lanczos methods, which are numerically more stable than the standard Gram–Schmidt method [44]. We will discuss the subspace projection methods in detail in the following sections. Once we have obtained the projection matrix V, the original circuit matrices G and C in (2.17) can be reduced to two q × q matrices by the congruence transformation:
$$\hat{G} = V^T G V, \qquad \hat{C} = V^T C V. \tag{2.50}$$
After this reduction process, the eigenvalues of the matrix $\hat{G}^{-1}\hat{C}$ are related to the dominant poles we are looking for by
$$p_i = -\frac{1}{\lambda_i}, \tag{2.51}$$
where $p_i$ and $\lambda_i$ are the ith pole and eigenvalue. The eigenvalues are easily obtained by performing the eigendecomposition of $\hat{G}^{-1}\hat{C}$. Once all the poles are computed, we compute the residues at any node using the aforementioned residue computation methods.
We want to stress that the major difference between the projection methods used for pole computation here and full-blown subspace projection-based model order reduction methods like PRIMA [85] is that here we only need to compute the dominant poles, not the reduced system. Since the dominant poles are shared by all the transfer functions, we need only one transfer function to compute them. As a result, we can perform the projection-based model order reduction assuming one terminal. In this case, we need to make sure that the number of moments is equal to or larger than the required number of poles. If we use multiple terminals, we end up with block moments, but we need only a few orders of block moments for the reduced model to have more poles than required. This case is similar to the multiple-input matching in the aforementioned MMM method. The bottom line is that the CPU cost of the pole-computation reduction process does not depend on the number of terminals of the system, which is not the case for the full-blown Krylov subspace projection-based model order reduction methods to be discussed later. We note that Krylov subspace projection-based methods for pole computation have been used to compute transient thermal profiles by explicit moment matching at the microprocessor architecture level [74].
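As a rough illustration of (2.49)–(2.51), the following Python/NumPy sketch computes poles from an orthonormalized moment basis. Several things here are assumptions rather than material from the text: the function name `projection_poles`, the small symmetric test matrices, the moment recursion $m_0 = G^{-1}b$, $m_k = -G^{-1}Cm_{k-1}$ (taken to be the form of (2.21) for $(G + sC)x = b$), and the use of a QR factorization in place of an explicit Gram–Schmidt loop.

```python
import numpy as np

def projection_poles(G, C, b, q):
    """Poles from q moment vectors, a congruence transform, and p_i = -1/lambda_i."""
    m = np.linalg.solve(G, b)                    # m_0 (assumed recursion, see lead-in)
    cols = [m]
    for _ in range(q - 1):
        m = -np.linalg.solve(G, C @ m)           # m_k = -G^{-1} C m_{k-1}
        cols.append(m)
    V, _ = np.linalg.qr(np.column_stack(cols))   # orthonormal basis of the moment space
    G_hat = V.T @ G @ V                          # congruence transformation (2.50)
    C_hat = V.T @ C @ V
    lam = np.linalg.eigvals(np.linalg.solve(G_hat, C_hat))
    return -1.0 / lam                            # pole formula (2.51)

# Hypothetical RC-ladder-like matrices: G is symmetric positive definite, C is diagonal.
n = 6
G = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = np.diag([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
b = np.zeros(n)
b[0] = 1.0

poles_q3 = projection_poles(G, C, b, q=3)        # three approximate poles
poles_full = projection_poles(G, C, b, q=n)      # q = n recovers all poles
exact = -1.0 / np.linalg.eigvals(np.linalg.solve(G, C))
print(np.sort(poles_q3.real))
print(np.sort(exact.real))
```

Because G and C are symmetric positive definite in this example, every computed pole has a negative real part for any q, which mirrors the stability argument above.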
2.4
Projection-based model order reduction methods
In this section, we present projection-based model order reduction techniques, which are widely used for macromodeling and reduction of parasitic interconnect circuits. We mainly focus on model order reduction approaches based on Krylov subspace projection methods.
2.4.1
Framework of projection-based model order reduction
We consider a general linear time-invariant state-space model with only one input and one output (we will extend our discussion to multi-input, multi-output cases later),
$$\dot{x}(t) = A x(t) + b\,u(t), \qquad y(t) = l^T x(t), \tag{2.52}$$
where u is the input variable, y is the output variable, A is an n × n matrix, b and l are n × 1 input and output vectors, and x is an n × 1 vector of state variables. Then the transfer function from u(t) to y(t) is given by
$$H(s) = l^T (sI - A)^{-1} b. \tag{2.53}$$
Typically, the number of state variables, n, is very large, so that simulation and synthesis of the whole system are very slow. We want to build a much smaller system such that its transient response y(t) to a given input signal u(t) approximates that of the original system. One question we may ask is which parts of the system can be discarded without changing the transfer function H(s) significantly. The concepts of controllability and observability give good answers: uncontrollable and unobservable parts of the system can be removed without affecting the transfer function [15]. To this end, we can perform the state transformation x = Tz, where T is the matrix of eigenvectors of A, and we assume that the eigenvalues of A, denoted $\lambda_1, \ldots, \lambda_n$, are distinct. Then (2.52) becomes
$$\dot{z}(t) = T^{-1} A T\, z(t) + T^{-1} b\, u(t) = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} z(t) + \begin{bmatrix} \bar{b}_1 \\ \vdots \\ \bar{b}_n \end{bmatrix} u(t), \qquad y(t) = l^T T\, z(t). \tag{2.54}$$
Figure 2.3 illustrates the eigen-decomposed system, which consists of n independent transfer paths. If $\bar{b}_i$ is zero, then the state variable $z_i$ is not controllable and can be removed. Similarly, let $\bar{c} = l^T T$; if $\bar{c}_i$ is zero, then the state variable $z_i$ is unobservable and can be removed. So the key issue of model order reduction is to remove the uncontrollable or unobservable parts or, in practice, the weakly controllable or weakly observable parts. One way to perform the direct state removal is by using a balancing and truncation
method based on balanced realization instead of the diagonal representation [81], which will be discussed in detail in Chapter 3.
Figure 2.3 Block diagram of (2.54).
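The diagonal form (2.54) can be explored numerically with a short Python/NumPy sketch. The helper names `modal_form` and `modal_transfer`, the 2-state example, and the threshold `1e-12` are illustrative assumptions. In this example the mode at −2 is neither excited by b nor seen by l, so dropping it leaves the transfer function unchanged.

```python
import numpy as np

def modal_form(A, b, l):
    """Return eigenvalues and the transformed input/output vectors of (2.54)."""
    lam, T = np.linalg.eig(A)
    b_bar = np.linalg.solve(T, b)      # T^{-1} b
    c_bar = T.T @ l                    # entries of l^T T
    return lam, b_bar, c_bar

def modal_transfer(lam, b_bar, c_bar, s):
    """H(s) as a sum over independent transfer paths c_i * b_i / (s - lambda_i)."""
    return np.sum(c_bar * b_bar / (s - lam))

# Hypothetical 2-state system.
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
b = np.array([1.0, 0.0])
l = np.array([1.0, 1.0])

lam, b_bar, c_bar = modal_form(A, b, l)
keep = np.abs(b_bar * c_bar) > 1e-12   # paths that actually contribute to H(s)

s = 1j
H_full = l @ np.linalg.solve(s * np.eye(2) - A, b)
H_reduced = modal_transfer(lam[keep], b_bar[keep], c_bar[keep], s)
print(lam[keep], H_full, H_reduced)
```

In practice the product $\bar{c}_i\bar{b}_i$ is rarely exactly zero, which is why the text speaks of weakly controllable or weakly observable parts and why balanced truncation is usually preferred over this diagonal test.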
Another way to do the order reduction is by means of projection. Specifically, if we assume that the first q state variables in (2.54) are to be preserved and the remaining variables are to be removed, we can form a new projection matrix $T_q$ consisting of the first q columns of T, and we write $T_q^{-1}$ for the corresponding first q rows of $T^{-1}$; we have
$$\dot{z}(t) = T_q^{-1} A T_q\, z(t) + T_q^{-1} b\, u(t), \qquad y(t) = l^T T_q\, z(t). \tag{2.55}$$
Notice that $A_r = T_q^{-1} A T_q$ is a q × q matrix, and $b_r = T_q^{-1} b$ is a q × 1 matrix. Thereby, the original state space of dimension n is projected onto the space of the reduced model with dimension q. In general, for a single-input and single-output (SISO) system
$$E\dot{x}(t) = A x(t) + b\,u(t), \qquad y(t) = l^T x(t), \tag{2.56}$$
instead of using the biorthogonal matrices $T_q$ and $T_q^{-1}$, we can use other n × q matrices V and W for the projection. As a result, we obtain more general projection-based reduction methods in terms of V and W for a SISO system,
$$W^T E V \dot{z}(t) = W^T A V z(t) + W^T b\, u(t), \qquad y(t) = l^T V z(t). \tag{2.57}$$
In projection theory, the q × q matrix $W^T A V$ can be interpreted as projecting A onto the subspace L spanned by V, orthogonally to the subspace M spanned by W [102]. The projection operation approximates a solution to a system from a search subspace L of dimension q, so that q constraints are met at the same time. Projection operations always involve two subspaces. If L = M, the projection is said to be orthogonal; otherwise, the projection is oblique. For a linear dynamic system, the resulting projection operation leads to the matrix transformation shown in (2.57). To see this in terms of a general projection operation, we first project the state variable x onto the subspace L represented by V, i.e., span(V) = L, so that
$$x = Vz. \tag{2.58}$$
Since (2.58) is an approximate solution in the subspace L, we require that the residual of (2.56) be orthogonal to the subspace M = span(W), which is the so-called Petrov–Galerkin condition. In the s-domain, the residual of (2.56) can be written as (assuming an impulse input, u(s) = 1)
$$e_r = b + (A - sE)x. \tag{2.59}$$
If we multiply (2.59) by $W^T$ and replace x with Vz, we have
$$W^T e_r = W^T b + W^T (A - sE) V z = 0, \tag{2.60}$$
which is the s-domain form of the reduced state equation in (2.57). One remaining question is how to select the two subspaces and their corresponding projection matrices V and W. Krylov subspaces are the most useful and popular choice owing to their moment-matching connection with the original system [32].
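The Petrov–Galerkin condition (2.60) can be checked numerically. The sketch below, in Python with NumPy, uses a hypothetical 5-state system, an arbitrary orthonormal V (drawn from a seeded random matrix, not a Krylov basis) and the choice W = V; the helper name `project` is also an assumption. It solves the reduced equation at one frequency and verifies that the residual of the full system is orthogonal to the columns of W.

```python
import numpy as np

def project(E, A, b, l, V, W):
    """Reduced matrices of (2.57): W^T E V, W^T A V, W^T b and V^T l."""
    return W.T @ E @ V, W.T @ A @ V, W.T @ b, V.T @ l

# Hypothetical system with E = I and a symmetric negative definite A.
n, q = 5, 2
E = np.eye(n)
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
b = np.ones(n)
l = np.ones(n)

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((n, q)))   # arbitrary orthonormal basis
W = V                                              # orthogonal projection (L = M)

E_r, A_r, b_r, l_r = project(E, A, b, l, V, W)

s = 2.0j
z = np.linalg.solve(s * E_r - A_r, b_r)   # reduced equation: s E_r z = A_r z + b_r
x_approx = V @ z                          # approximate state, x = V z as in (2.58)
e_r = b + (A - s * E) @ x_approx          # residual (2.59)
galerkin = W.T @ e_r                      # vanishes by construction, see (2.60)
print(np.linalg.norm(e_r), np.linalg.norm(galerkin))
```

The residual itself is generally not zero; only its component in span(W) is removed, which is why the choice of V and W determines the quality of the reduced model.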
2.4.2
Krylov subspaces
A subset of a vector space that is itself a vector space is called a vector subspace, or simply a subspace. A subspace can be defined by a set of vectors $V = \{v_1, v_2, \ldots, v_n\}$: the set of all linear combinations of these vectors is referred to as the span of V,
$$\mathrm{span}\{V\} = \mathrm{span}\{v_1, v_2, \ldots, v_n\} \tag{2.61}$$
$$\phantom{\mathrm{span}\{V\}} = \Big\{ x \;\Big|\; x = \sum_{i=1}^{n} \alpha_i v_i \Big\}, \tag{2.62}$$
where the $\alpha_i$ are real numbers. If the $v_i$ are linearly independent, then each vector of span{V} admits a unique expression as a linear combination of the $v_i$. The set V is then called a basis of the subspace span{V}.
Concept of Krylov subspaces
For an n × n matrix A and a vector b, the Krylov subspace $K_q(A, b)$ is defined as
$$K_q(A, b) = \mathrm{span}\{b, Ab, A^2 b, \ldots, A^{q-1} b\}, \tag{2.63}$$
where q is a given positive integer. For an n × q matrix $T_q$ whose columns form a basis for $K_q(A, b)$, we write
$$\mathrm{colsp}\,T_q = K_q(A, b), \tag{2.64}$$
where colsp(G) indicates the column space of G. With this definition, the two projection matrices V and W determining the reduced-order model for (2.56) are chosen as follows:
$$\begin{aligned} \mathrm{colsp}\,V &= K_q(A^{-1}E,\, A^{-1}b) = \mathrm{span}\{A^{-1}b, \ldots, (A^{-1}E)^{q-1}A^{-1}b\}, \\ \mathrm{colsp}\,W &= K_q(A^{-T}E^T,\, A^{-T}l) = \mathrm{span}\{A^{-T}l, \ldots, (A^{-T}E^T)^{q-1}A^{-T}l\}, \end{aligned} \tag{2.65}$$
where $A^{-T} = (A^{-1})^T$ and V and W both have rank q. In this book, we call V the output Krylov subspace and W the input Krylov subspace. With this choice, the reduction method is called a two-side method, as both the input and output relations of the system are involved. If only one of the two matrices V or W is chosen in this way and the other is chosen arbitrarily such that $W^T A V$ is non-singular, then the method is called a one-side MOR method.
If we compute V and W directly from their definitions in (2.65), the numerical calculation of the matrix-vector products involved turns out to be unstable. This is no surprise, as the vectors $(A^{-1}E)^{i}A^{-1}b$ are the circuit moment vectors, which numerically lose the information about the small eigenvalues as the order becomes large, as shown in the previous section. The reduction process may lead to unstable poles as a result. In practice, the Krylov subspaces are obtained by numerically stable processes like the Arnoldi and Lanczos algorithms [44], which are discussed in the next subsections.
Moment connection of Krylov subspaces
One important property of the Krylov subspaces is that the output response of the reduced model matches that of the original circuit in terms of moments [32]. To see this, we rewrite the transfer function for the state-space equation in (2.56) as
$$H(s) = l^T (sE - A)^{-1} b = l^T (s\bar{A} - I)^{-1}\bar{b}, \tag{2.66}$$
where $\bar{A} = A^{-1}E$ and $\bar{b} = A^{-1}b$. As a result, we can obtain the coefficients of the Taylor series of (2.66) around $s_0 = 0$,
$$\begin{aligned} H(s) &= l^T (s\bar{A} - I)^{-1}\bar{b} \\ &= -l^T\bar{b} - l^T\bar{A}\bar{b}\,s - \cdots - l^T\bar{A}^i\bar{b}\,s^i - \cdots \\ &= -l^T A^{-1}b - l^T (A^{-1}E)A^{-1}b\,s - \cdots - l^T (A^{-1}E)^i A^{-1}b\,s^i - \cdots. \end{aligned} \tag{2.67}$$
In the two-side methods, the first 2q moments of the original and the reduced model match. In one-side methods, only the first q moments match. In the sequel, we first present the moment-matching theorem and then give a simple proof for the $s_0 = 0$ case. A more general proof can be found in [45].
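The q versus 2q statement can be verified on a tiny example. The Python/NumPy sketch below is only an illustration under stated assumptions: the helper names, the 3-state diagonal system and the choice q = 1 are hypothetical, and the Krylov bases are built by a QR factorization of the raw vectors, which is acceptable only for very small q. It counts how many leading moments of (2.67) agree between the original and the reduced models.

```python
import numpy as np

def krylov_basis(F, r, q):
    """Orthonormal basis of span{r, F r, ..., F^(q-1) r} via QR of the raw vectors."""
    cols = [r]
    for _ in range(q - 1):
        cols.append(F @ cols[-1])
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def moments(E, A, b, l, count):
    """m_i = -l^T (A^{-1} E)^i A^{-1} b, the Taylor coefficients in (2.67)."""
    x = np.linalg.solve(A, b)
    out = []
    for _ in range(count):
        out.append(-(l @ x))
        x = np.linalg.solve(A, E @ x)
    return np.array(out)

def project(E, A, b, l, V, W):
    """Reduced matrices of (2.57)."""
    return W.T @ E @ V, W.T @ A @ V, W.T @ b, V.T @ l

def count_matched(m_full, m_red, rtol=1e-8):
    """Number of leading moments that agree."""
    matched = 0
    for a, c in zip(m_full, m_red):
        if not np.isclose(a, c, rtol=rtol, atol=0.0):
            break
        matched += 1
    return matched

# Hypothetical system with different input and output vectors.
E = np.eye(3)
A = -np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 1.0])
l = np.array([1.0, 0.0, 1.0])
q = 1

A_inv = np.linalg.inv(A)
V = krylov_basis(A_inv @ E, A_inv @ b, q)          # colsp V in (2.65)
W = krylov_basis(A_inv.T @ E.T, A_inv.T @ l, q)    # colsp W in (2.65)

m_full = moments(E, A, b, l, 4)
one_sided = count_matched(m_full, moments(*project(E, A, b, l, V, V), 4))
two_sided = count_matched(m_full, moments(*project(E, A, b, l, V, W), 4))
print(one_sided, two_sided)   # 1 2, i.e. q and 2q
```

For larger q the raw Krylov vectors become nearly parallel, which is the numerical problem that the Arnoldi and Lanczos processes are designed to avoid.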
Theorem 2.1 (Moment matching connection of one-side Krylov subspace). Consider the two Krylov subspaces colsp V = $K_q(A^{-1}E, A^{-1}b)$ and colsp W = $K_q(A^{-T}E^T, A^{-T}l)$ defined in (2.65). For the one-side Krylov subspace method, in which only one of them is used, the reduced system obtained by projection matches q moments of the transfer function of the original system defined in (2.66).
The proof can be found in Section 2.8. If we use both Krylov subspaces colsp V = $K_q(A^{-1}E, A^{-1}b)$ and colsp W = $K_q(A^{-T}E^T, A^{-T}l)$, then we have the following result:
Corollary 2.1 (Moment matching connection of two-side Krylov subspace methods). If both subspaces V and W are used, the reduced model matches 2q moments of the transfer function of the original system defined in (2.66).
See Section 2.8 for a simple proof. In the one-side Krylov methods, since we use only one Krylov subspace, say V, a good choice for W is W = V. As a result, the matrix transformation $W^T A V$ becomes $V^T A V$, which is called a congruence transformation. It has been shown that congruence transformation preserves the passivity of the original system if the state-space matrices are formulated in a passive form. We will discuss this in a later subsection. Here we present a result for symmetric systems.
Corollary 2.2 (Moment matching connection for symmetric systems). For a symmetric system {A, E, b, l} in (2.56) with the same input and output ports, i.e., A and E are symmetric and b = l, the one-side Krylov subspace method with congruence transformation (i.e., V = W) yields a reduced model that matches 2q moments of the transfer function of the original system defined in (2.66).
Proof: In this case, the two subspaces colsp V = $K_q(A^{-1}E, A^{-1}b)$ and colsp W = $K_q(A^{-T}E^T, A^{-T}l)$ are the same because of the symmetric nature of the system. Since we set V = W, the reduction is equivalent to the two-side method and, thus, 2q moments are matched by Corollary 2.1. QED.
The significance of Corollary 2.2 is that for many practical RC circuits, whose MNA matrices are symmetric, using a one-side method such as the Arnoldi method leads to more accurate results, and the reduced models are also passive. For general RLC circuits, the matrices are not symmetric, but by splitting the projection matrices, which makes the matrices symmetric at the subblock level, the reduction process can still achieve double accuracy (matching 2q instead of q moments), as shown in [37] and in Chapter 8 for general structure-preserving model order reduction methods. Another way is to use the second-order formulation of RLC circuits, which is also symmetric and can lead to 2q moment matching.
Note that instead of $s_0 = 0$, the expansion can be carried out at another point $s_0 \neq 0$. Then, with a slight modification of the Krylov subspaces, the moments at $s_0$ can still be matched. In the following subsections, we show two numerically stable algorithms for finding the Krylov subspaces, the Arnoldi method and the Lanczos method. The Arnoldi method is a one-side method, while the Lanczos method is a two-side method. The fundamental idea is to orthonormalize the basis vectors of the Krylov subspaces defined by colsp V = $K_q(A^{-1}E, A^{-1}b)$ or colsp W = $K_q(A^{-T}E^T, A^{-T}l)$. Such orthogonal vectors contain much less numerical noise than the circuit moments, as the components along lower-order moment vectors are subtracted during the orthonormalization process.
2.4.3
Arnoldi algorithms
To avoid numerical problems when constructing the Krylov subspace, it is advantageous to build an orthogonal basis for the given subspace. The Arnoldi method, proposed by W. E. Arnoldi in 1951 [8], is the classic method for finding a set of orthonormal vectors as a basis for a given Krylov subspace. Specifically, given the Krylov subspace $K_q(A, b)$, the Arnoldi method using modified Gram–Schmidt orthogonalization [36, 102, 104] is as follows:
Arnoldi Algorithm (A, b)
    Compute v_1 = b / ||b||_2                    # normalize b
    For j = 1, ..., q Do
        Compute w_j = A v_j                      # calculate the next vector
        For i = 1, ..., j Do
            h_{ij} = v_i^T w_j
            w_j = w_j − h_{ij} v_i               # orthogonalization
        EndDo
        If w_j = 0, stop; otherwise h_{j+1,j} = ||w_j||_2
        v_{j+1} = w_j / h_{j+1,j}                # normalization
    EndDo
Figure 2.4 Arnoldi method based on modified Gram–Schmidt orthonormalization for SISO systems.
After the Arnoldi process, we obtain a new set of vectors $\{v_1, v_2, \ldots, v_q\}$ and a new orthogonal projection matrix $V = [v_1, v_2, \ldots, v_q]$ with
$$V^T V = I, \tag{2.68}$$
whose columns are a basis for the given Krylov subspace. Furthermore,
$$V^T A V = H, \tag{2.69}$$
where H is a q × q matrix, called an upper Hessenberg matrix, in which $h_{ij} = 0$ for i > j + 1 for all 1 ≤ i, j ≤ q. The Arnoldi method presented in Figure 2.4 is just for a single-input and single-output (SISO) system. For the multi-input and multi-output case, block Arnoldi methods can be applied [8, 14].
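The Arnoldi process of Figure 2.4 translates almost line by line into Python with NumPy. The sketch below is one possible rendering, not a reference implementation: the function name `arnoldi`, the breakdown tolerance `tol`, and the small non-symmetric test matrix are assumptions. It checks the two properties (2.68) and (2.69).

```python
import numpy as np

def arnoldi(A, b, q, tol=1e-12):
    """Modified Gram-Schmidt Arnoldi: returns V (n x q) and H (q x q)."""
    n = b.shape[0]
    V = np.zeros((n, q + 1))
    H = np.zeros((q + 1, q))
    V[:, 0] = b / np.linalg.norm(b)            # normalize b
    for j in range(q):
        w = A @ V[:, j]                        # next Krylov direction
        for i in range(j + 1):                 # orthogonalize against v_1 ... v_j
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:                  # invariant subspace found, stop early
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]          # normalization
    return V[:, :q], H[:q, :q]

# Hypothetical non-symmetric test matrix.
n, q = 6, 3
A = np.diag(np.arange(1.0, n + 1)) + 0.5 * np.eye(n, k=1)
b = np.ones(n)

V, H = arnoldi(A, b, q)
print(np.round(V.T @ V, 12))
print(np.round(H, 6))
```

For model order reduction of (2.56), the same routine would be applied to the operator $A^{-1}E$ and the starting vector $A^{-1}b$, typically with a factorization of A reused for every product instead of an explicit inverse.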
2.4.4
Lanczos algorithms
The Lanczos algorithm is a well-known two-side Krylov method proposed by C. Lanczos in 1950 [67]. Specifically, given the two Krylov subspaces $K_q(A, b)$ and $K_q(A^T, c)$, the non-symmetric Lanczos method [32, 102] is as follows:
Lanczos Algorithm (A, b, c)
    Compute ρ_1 = ||b||_2; η_1 = ||c||_2; v_1 = b/ρ_1; w_1 = c/η_1      # normalize b and c
    Set v_0 = w_0 = 0 and τ_0 = 1
    For j = 1, ..., q Do
        Compute τ_j = w_j^T v_j
        Compute α_j = w_j^T A v_j / τ_j; β_j = (τ_j/τ_{j−1}) η_j; γ_j = (τ_j/τ_{j−1}) ρ_j
        Compute v = A v_j − v_j α_j − v_{j−1} β_j
        Compute w = A^T w_j − w_j α_j − w_{j−1} γ_j                    # orthogonalization
        If v = 0 or w = 0, stop; otherwise ρ_{j+1} = ||v||_2 and η_{j+1} = ||w||_2
        v_{j+1} = v/ρ_{j+1} and w_{j+1} = w/η_{j+1}                    # normalization
    EndDo
Figure 2.5 Non-symmetric Lanczos method for SISO systems.
After the Lanczos process, we have two matrices $V = [v_1, v_2, \ldots, v_q]$ and $W = [w_1, w_2, \ldots, w_q]$, and it can be shown that they are biorthogonal,
$$W^T V = D = \mathrm{diag}(\tau_1, \tau_2, \ldots, \tau_q), \tag{2.70}$$
where the columns of V and W are bases for the Krylov subspaces $K_q(A, b)$ and $K_q(A^T, c)$, respectively (with a different scaling of the vectors, D becomes the identity). In addition to the two biorthogonal projection matrices, we also obtain two tridiagonal matrices
$$T_q = \begin{bmatrix} \alpha_1 & \beta_2 & 0 & \cdots & 0 \\ \rho_2 & \alpha_2 & \beta_3 & \ddots & \vdots \\ 0 & \rho_3 & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \beta_q \\ 0 & \cdots & 0 & \rho_q & \alpha_q \end{bmatrix} \tag{2.71}$$
and
$$\tilde{T}_q = \begin{bmatrix} \alpha_1 & \gamma_2 & 0 & \cdots & 0 \\ \eta_2 & \alpha_2 & \gamma_3 & \ddots & \vdots \\ 0 & \eta_3 & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \gamma_q \\ 0 & \cdots & 0 & \eta_q & \alpha_q \end{bmatrix}, \tag{2.72}$$
where
$$W^T A V = D\,T_q, \tag{2.73}$$
$$V^T A^T W = D\,\tilde{T}_q. \tag{2.74}$$
The Lanczos algorithm given in Figure 2.5 is just for the reduction of single-input and single-output systems. For multi-input and multi-output systems, block Lanczos methods are needed [14, 33, 38].
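A two-sided Lanczos process can be sketched in Python with NumPy as follows. Note that this sketch deliberately uses a different, also standard, normalization from Figure 2.5: the vectors are scaled so that $W^T V = I$, in which case $W^T A V$ is directly the tridiagonal matrix. The function name `two_sided_lanczos`, the variable names, and the nearly diagonal test matrix are assumptions; no look-ahead is implemented, so the routine simply raises an error on breakdown.

```python
import numpy as np

def two_sided_lanczos(A, b, c, q):
    """Biorthogonal Lanczos with the scaling W^T V = I; returns V, W and tridiagonal T."""
    n = b.shape[0]
    V = np.zeros((n, q))
    W = np.zeros((n, q))
    T = np.zeros((q, q))
    s = b @ c
    if s == 0.0:
        raise ValueError("b and c must not be orthogonal")
    V[:, 0] = b / np.sqrt(abs(s))
    W[:, 0] = c * np.sqrt(abs(s)) / s          # ensures w_1^T v_1 = 1
    beta = delta = 0.0
    v_prev = np.zeros(n)
    w_prev = np.zeros(n)
    for j in range(q):
        v, w = V[:, j], W[:, j]
        alpha = w @ (A @ v)
        T[j, j] = alpha
        if j == q - 1:
            break
        v_hat = A @ v - alpha * v - beta * v_prev      # three-term recurrences
        w_hat = A.T @ w - alpha * w - delta * w_prev
        s = v_hat @ w_hat
        if s == 0.0:
            raise ValueError("Lanczos breakdown (no look-ahead in this sketch)")
        delta = np.sqrt(abs(s))
        beta = s / delta
        V[:, j + 1] = v_hat / delta
        W[:, j + 1] = w_hat / beta
        T[j + 1, j] = delta                    # subdiagonal entry
        T[j, j + 1] = beta                     # superdiagonal entry
        v_prev, w_prev = v, w
    return V, W, T

# Hypothetical, mildly non-symmetric test problem.
n, q = 8, 4
A = np.diag(np.arange(1.0, n + 1)) + 0.02 * np.triu(np.ones((n, n)), k=1)
b = np.ones(n)
c = np.ones(n)
c[0] = 2.0

V, W, T = two_sided_lanczos(A, b, c, q)
print(np.round(W.T @ V, 10))
print(np.round(T, 6))
```

In finite precision the biorthogonality degrades as q grows, which is one reason practical implementations add look-ahead or re-biorthogonalization.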
2.4.5
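RLC circuit formulation for reduction
For a general linear RLC network, we can apply modified nodal analysis (MNA) to formulate the state-space equations
$$G x(t) + C\dot{x}(t) = B u(t), \qquad y(t) = L^T x(t), \tag{2.75}$$
where G and C are the n × n conductance and storage element matrices, B and L are the n × N input and output position matrices (typically, B = L or B = −L), and N is the number of input or output ports. The state variables x can be nodal voltages or branch currents of the linear circuit. The matrices G and C can be further expressed as [14, 129]
$$G = \begin{bmatrix} A_g \mathcal{G} A_g^T & A_l & A_v \\ A_l^T & 0 & 0 \\ A_v^T & 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} A_c \mathcal{C} A_c^T & 0 & 0 \\ 0 & -\mathcal{L} & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad x = \begin{bmatrix} x_n \\ x_l \\ x_v \end{bmatrix}, \tag{2.76}$$
where $\mathcal{G}$, $\mathcal{C}$, and $\mathcal{L}$ are the conductance, capacitance, and inductance sub-matrices, and $A_g$, $A_c$, $A_l$, and $A_v$ are the incidence matrices for the conductive, capacitive, inductive, and voltage source elements [129]. The matrices $A_g\mathcal{G}A_g^T$, $A_c\mathcal{C}A_c^T$, and $\mathcal{L}$ are symmetric and positive semidefinite. As a result, G and C in (2.76) are symmetric, although in this form they are not, in general, positive semidefinite. We can further rewrite (2.76) in the following condensed form:
$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{12}^T & 0 \end{bmatrix}, \quad C = \begin{bmatrix} C_{11} & 0 \\ 0 & -C_{22} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \tag{2.77}$$
where
$$G_{11} = A_g\mathcal{G}A_g^T, \quad G_{12} = \begin{bmatrix} A_l & A_v \end{bmatrix}, \quad C_{11} = A_c\mathcal{C}A_c^T, \quad C_{22} = \begin{bmatrix} \mathcal{L} & 0 \\ 0 & 0 \end{bmatrix}. \tag{2.78}$$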
Notice that $G_{11}$ and $C_{11}$ are symmetric and positive semidefinite. If we change the sign of $G_{12}^T$ and the sign of $x_2$, then we have
$$C = \begin{bmatrix} C_{11} & 0 \\ 0 & C_{22} \end{bmatrix}, \quad G = \begin{bmatrix} G_{11} & G_{12} \\ -G_{12}^T & 0 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ -x_2 \end{bmatrix}. \tag{2.79}$$
The resulting G and C are positive semidefinite, although G is no longer symmetric. This can easily be seen from
$$x^T G x = \begin{bmatrix} x_1^T & x_2^T \end{bmatrix} \begin{bmatrix} G_{11} & G_{12} \\ -G_{12}^T & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^T G_{11} x_1 + x_1^T G_{12} x_2 - x_2^T G_{12}^T x_1 = x_1^T G_{11} x_1 \ge 0, \tag{2.80}$$
where the last inequality holds because $G_{11}$ is positive semidefinite. It will be shown in a later subsection that this property is important for passive model order reduction. In the case of RC and RL circuits, if the circuits are excited by current sources only (so that the transfer function is an impedance), then G, C, and x become
$$G = A_g\mathcal{G}A_g^T, \qquad C = A_c\mathcal{C}A_c^T, \qquad x = x_n, \tag{2.81}$$
with B = L. As a result, G and C are symmetric and positive semidefinite.
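The passive form (2.79) can be assembled for a very small circuit to see the property (2.80) at work. The following Python/NumPy sketch uses a hypothetical three-unknown circuit (a resistor between nodes 1 and 2, a capacitor from each node to ground, and an inductor from node 2 to ground, with no voltage sources); the element values and variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical element values (R = 5 ohm, C = 1 pF, L = 10 nH).
g, c1, c2, ind = 1.0 / 5.0, 1e-12, 1e-12, 10e-9

A_g = np.array([[1.0], [-1.0]])   # incidence of the resistor (node 1 to node 2)
A_c = np.eye(2)                   # one grounded capacitor per node
A_l = np.array([[0.0], [1.0]])    # incidence of the inductor (node 2 to ground)

G11 = A_g @ np.array([[g]]) @ A_g.T
C11 = A_c @ np.diag([c1, c2]) @ A_c.T
G12 = A_l
C22 = np.array([[ind]])

# Passive form (2.79): sign of the second block row flipped.
G = np.block([[G11, G12],
              [-G12.T, np.zeros((1, 1))]])
C = np.block([[C11, np.zeros((2, 1))],
              [np.zeros((1, 2)), C22]])

sym_eigs = np.linalg.eigvalsh(G + G.T)       # symmetric part of G
rng = np.random.default_rng(1)
x = rng.standard_normal(3)
quad_full = x @ G @ x                        # x^T G x
quad_block = x[:2] @ G11 @ x[:2]             # x_1^T G_11 x_1, as in (2.80)
print(sym_eigs, quad_full, quad_block)
```

The skew-symmetric coupling blocks cancel in the quadratic form, so only the conductance block contributes, which is exactly the cancellation used in (2.80).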
2.4.6
Passivity preservation
In this subsection, we present some important results regarding passivity preservation in Krylov subspace projection-based model order reduction. Passivity is an important property of many physical systems: a passive network does not generate energy. If the reduced-order model loses its passivity, it may produce unbounded responses in transient simulation, which means that new energy has been generated in the network. Figure 2.6 shows a transient simulation result of a non-passive circuit under a sinusoidal excitation. Brune [12] proved that the admittance and impedance matrices of an electrical circuit consisting of an interconnection of a finite number of positive R, positive C, positive L, and transformers are passive if and only if their rational functions are positive real. A network with transfer matrix function H(s) is said to be positive real iff
$$H(s) \text{ is analytic for } \mathrm{Re}(s) > 0, \tag{2.82}$$
$$H(\bar{s}) = \overline{H(s)} \text{ for } \mathrm{Re}(s) > 0, \tag{2.83}$$
$$H_H(s) = H(s) + H(s)^H \ge 0 \text{ for } \mathrm{Re}(s) > 0, \tag{2.84}$$
where $H_H(s)$ is called the Hermitian part of the matrix H(s), as defined in (2.84). Mathematically, condition (2.82) means that there are no unstable poles (poles lying in the right half-plane (RHP) of the s-domain). Condition (2.83) refers to a system that has a real response, and condition (2.84) is equivalent to stating that the real part of H(s) is a positive semidefinite matrix at all frequencies. So having no poles in the RHP is only a necessary condition for a function to be positive real. Conditions (2.82) and (2.83) are easily satisfied by the transfer functions of general RLC circuits when the matrices G and C are real. Condition (2.84) is not satisfied in general. But if the matrices G and C are formulated in the passive form shown in (2.79), then the transfer function is positive real and we have the following results [14, 85]:
Figure 2.6 Transient response of a non-passive circuit (current in A versus time in s; time axis scaled by 10^-9).
Theorem 2.2. Consider a state-space representation of an RLC system {G, C, B} in (2.75) where B = L. If $G + G^T \ge 0$, $C = C^T \ge 0$, and G + sC is invertible at at least one point s with Re(s) > 0, then the following matrix transfer function is positive real:
$$H(s) = B^T (G + sC)^{-1} B. \tag{2.85}$$
The proof of the theorem can be found in [14]. The key step in the proof is the fact that the Hermitian part of W(s) = G + sC, i.e., $W(s) + W(s)^H = G + G^T + 2\delta C$ at $s = \delta + j\omega$, is positive semidefinite. Matrix inversion and congruence transformation do not change positive semidefiniteness. Consequently, after we apply the same congruence transformation to both G and C, the reduced system is still passive. We have the following result [14]:
Theorem 2.3. Consider a state-space representation of an RLC system {G, C, B} in (2.75) where B = L, and let V be any n × q real reduction matrix of full rank. If $G + G^T \ge 0$ and $C = C^T \ge 0$, and we let $G_q = V^T G V$, $C_q = V^T C V$, and $B_q = V^T B$, then the matrix transfer function
$$H_q(s) = B_q^T (G_q + sC_q)^{-1} B_q \tag{2.86}$$
is positive real, and thus passive.
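Theorem 2.3 can be spot-checked numerically. The Python/NumPy sketch below uses a hypothetical three-state system in passive form with normalized element values and an arbitrary full-rank V (not a Krylov basis); the sample points in the right half-plane and all names are assumptions. A finite set of samples is of course only a sanity check, not a proof of positive realness.

```python
import numpy as np

# Hypothetical passive-form matrices: G + G^T >= 0 and C = C^T > 0.
G = np.array([[1.0, -1.0, 0.0],
              [-1.0, 1.0, 1.0],
              [0.0, -1.0, 0.0]])
C = np.diag([1.0, 1.0, 1.0])
B = np.array([[1.0], [0.0], [0.0]])

# Any real, full-rank n x q matrix may be used for the congruence transformation.
V = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

G_q = V.T @ G @ V
C_q = V.T @ C @ V
B_q = V.T @ B

def H_q(s):
    """Reduced transfer function (2.86) evaluated at a complex frequency s."""
    return B_q.T @ np.linalg.solve(G_q + s * C_q, B_q)

samples = [0.1 + 0.0j, 0.1 + 1.0j, 1.0 + 10.0j, 5.0 - 3.0j]   # points with Re(s) > 0
herm_min = [np.linalg.eigvalsh(H_q(s) + H_q(s).conj().T).min() for s in samples]
print(np.linalg.eigvalsh(G_q + G_q.T), herm_min)
```

The same check applied to a reduction that uses different left and right matrices (W ≠ V) can fail, which is the practical motivation for congruence transformations in passive reduction.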
2.4.7
PRIMA algorithm
In this section, we summarize projection-based model order reduction by presenting the PRIMA (passive reduced-order interconnect macromodeling algorithm) algorithm [85], which performs general passive projection-based MOR on multi-input and multi-output (MIMO) linear RLC dynamic systems. The PRIMA algorithm starts with the MNA formulation of a linear RLC dynamic MIMO system
$$G x(t) + C\dot{x}(t) = B u(t), \qquad i(t) = B^T x(t), \tag{2.87}$$
where G and C are the n × n conductance and storage element matrices in their passive form as shown in (2.79), u(t) and i(t) are the vectors of port voltages (the inputs) and port currents (the outputs, to be computed) at the same ports, B is the n × N position matrix for the input and output ports, and N is the number of terminals (ports). The transfer function, which is the admittance matrix, becomes
$$Y(s) = B^T (G + sC)^{-1} B. \tag{2.88}$$
If we define
$$A = -G^{-1}C, \qquad R = G^{-1}B, \tag{2.89}$$
then the transfer function can be written as
$$Y(s) = B^T (I - sA)^{-1} R. \tag{2.90}$$
Assume that we have an n × q projection matrix V. Unlike some previous methods such as Pade via Lanczos (PVL) [32] or block PVL (MPVL) [33, 38], which perform the reduction on I − sA and R in (2.90), PRIMA performs the reduction on G + sC and B directly. The major benefit of this reduction process is that G + sC can be put into a passive form as required by Theorem 2.2. After the reduction by congruence transformation, the resulting reduced models are still passive, as shown in Theorem 2.3. Specifically, the reduced system becomes
$$V^T G V z(t) + V^T C V \dot{z}(t) = V^T B u(t), \qquad i(t) = B^T V z(t), \tag{2.91}$$
where z is the reduced state vector of size q. The reduced transfer function becomes
$$Y_q(s) = B_q^T (G_q + sC_q)^{-1} B_q, \tag{2.92}$$
where
$$G_q = V^T G V, \qquad C_q = V^T C V, \qquad B_q = V^T B. \tag{2.93}$$
PRIMA applies a block Arnoldi algorithm to generate the projection matrix V. The reduced model matches the original system in terms of the block moments defined by
$$Y(s) = M_0 + M_1 s + M_2 s^2 + \cdots, \tag{2.94}$$
where $M_i \in \mathbb{R}^{N\times N}$. The block moments can be computed as
$$M_i = B^T A^i R. \tag{2.95}$$
Accordingly, we can define a block Krylov subspace for a given matrix A and $R = [r_1, r_2, \ldots, r_N]$,
$$K_q(A, R) = \mathrm{span}\{R, AR, \ldots, A^{k-1}R\}, \tag{2.96}$$
where k = q/N. If q/N is not an integer, we set k = ⌊q/N⌋, where ⌊·⌋ denotes truncation to the nearest integer towards zero, and the corresponding Krylov subspace is defined as
$$K_q(A, R) = \mathrm{span}\{R, AR, \ldots, A^{k-1}R, A^k r_1, A^k r_2, \ldots, A^k r_l\}, \tag{2.97}$$
where $r_i$ is the ith column vector of R, and l = q − kN. With these concepts, given two matrices A and R, the block Arnoldi algorithm is as follows [14]:
Block Arnoldi Algorithm (A, R)
    Compute QR factorization R = V_0 X
    For j = 1, ..., q/N Do
        Compute V_j^(0) = A V_{j−1}
        For i = 1, ..., j Do
            Compute H_{j−i,j−1} = V_{j−i}^T V_j^(i−1)
            Compute V_j^(i) = V_j^(i−1) − V_{j−i} H_{j−i,j−1}
        EndDo
        Compute QR factorization V_j^(j) = V_j H_{j,j−1}
    EndDo
Figure 2.7 Block Arnoldi method for MIMO systems.
After the block Arnoldi process, we have the projection matrix $V = [V_0, V_1, \ldots, V_{k-1}]$, whose columns form a basis for the Krylov subspace generated by A and R, i.e.,
$$\mathrm{colsp}(V) = K_q(A, R) \tag{2.98}$$
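and $V^T V = I$. The algorithm also generates a q × q block upper Hessenberg matrix
$$H_q = \begin{bmatrix} H_{00} & H_{01} & \cdots & H_{0,k-1} \\ H_{10} & H_{11} & \cdots & H_{1,k-1} \\ & \ddots & \ddots & \vdots \\ 0 & \cdots & H_{k-1,k-2} & H_{k-1,k-1} \end{bmatrix}, \tag{2.99}$$
which satisfies
$$V^T A V = H_q. \tag{2.100}$$
2.5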
Numerical examples
In this section, we present one numerical example of compact modeling by projection-based model order reduction. The example is a large two-port lumped RLC circuit, with the element values marked in Figure 2.8.
Figure 2.8 A two-port large lumped RLC circuit.
We perform projection-based model order reduction on the circuit with different orders (different numbers of poles). The results are shown in Figure 2.9 for the Y(11) response and in Figure 2.10 for the Y(12) response. As more orders (more poles) are used, the reduced models match the original model better.
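A PRIMA-style reduction can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the code behind the figures: the ladder built by `rlc_ladder` is a hypothetical two-port circuit with normalized element values (it is not the circuit of Figure 2.8), the series resistance r placed in the lower-right block of G is a modeling assumption that keeps G and its reductions non-singular, and the block Arnoldi routine performs no deflation. The check at the end confirms that the first k block moments of the original and reduced models agree.

```python
import numpy as np

def rlc_ladder(m, g=1.0, r=1.0, c=1.0, ind=1.0):
    """Hypothetical ladder: m nodes (shunt g, c) joined by m - 1 series r-ind branches."""
    A_l = np.zeros((m, m - 1))
    for k in range(m - 1):
        A_l[k, k], A_l[k + 1, k] = 1.0, -1.0          # branch k leaves node k, enters k + 1
    G = np.block([[g * np.eye(m), A_l],
                  [-A_l.T, r * np.eye(m - 1)]])        # non-symmetric, G + G^T >= 0
    C = np.block([[c * np.eye(m), np.zeros((m, m - 1))],
                  [np.zeros((m - 1, m)), ind * np.eye(m - 1)]])
    B = np.zeros((2 * m - 1, 2))
    B[0, 0] = 1.0                                      # port 1 at the first node
    B[m - 1, 1] = 1.0                                  # port 2 at the last node
    return G, C, B

def block_arnoldi(apply_A, R, k):
    """Orthonormal basis of span{R, AR, ..., A^(k-1) R} (no deflation handling)."""
    V0, _ = np.linalg.qr(R)
    blocks = [V0]
    for _ in range(k - 1):
        Z = apply_A(blocks[-1])
        for _sweep in range(2):                        # two Gram-Schmidt sweeps
            for Vi in blocks:
                Z = Z - Vi @ (Vi.T @ Z)
        Q, _ = np.linalg.qr(Z)
        blocks.append(Q)
    return np.hstack(blocks)

def prima(G, C, B, k):
    """Congruence reduction with V spanning the block Krylov space of (-G^{-1}C, G^{-1}B)."""
    R = np.linalg.solve(G, B)
    V = block_arnoldi(lambda X: -np.linalg.solve(G, C @ X), R, k)
    return V.T @ G @ V, V.T @ C @ V, V.T @ B

def block_moments(G, C, B, count):
    """M_i = B^T A^i R with A = -G^{-1} C and R = G^{-1} B, as in (2.95)."""
    X = np.linalg.solve(G, B)
    out = []
    for _ in range(count):
        out.append(B.T @ X)
        X = -np.linalg.solve(G, C @ X)
    return out

G, C, B = rlc_ladder(m=10)
k = 3                                                  # q = k * N = 6 reduced states
G_q, C_q, B_q = prima(G, C, B, k)
full = block_moments(G, C, B, k)
reduced = block_moments(G_q, C_q, B_q, k)
moments_match = all(np.allclose(a, b_, rtol=1e-8, atol=1e-10)
                    for a, b_ in zip(full, reduced))
print(G_q.shape, moments_match)
```

Increasing k enlarges the reduced model and matches more block moments, which is the mechanism behind the improved agreement reported for the models with more poles.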
Figure 2.9 Comparison of the magnitudes of Y(11) for different reduction orders for the lumped RLC circuit (magnitude in 1/ohm, scaled by 10^-3, versus frequency, scaled by 10^9; curves: exact response, reduced model with 12 poles, reduced model with 24 poles).
Figure 2.10 Comparison of the magnitudes of Y(12) for different reduction orders for the lumped RLC circuit (magnitude in 1/ohm, scaled by 10^-3, versus frequency, scaled by 10^9; curves: exact response, reduced model with 12 poles, reduced model with 24 poles).
2.6
Historical notes
In this section, we briefly review model order reduction methods based on Krylov subspaces from a historical perspective in the computer-aided design community. Projection-based model order reduction techniques have been intensively studied in the past two decades [32, 37, 63, 85, 91, 113]. Projection-based methods were pioneered by the asymptotic waveform evaluation (AWE) algorithm [91], where explicit moment matching was used to compute dominant poles at low frequency. The Pade via Lanczos (PVL) [32], block PVL (MPVL) [33], symmetric block PVL (SyMPVL) [38], and Arnoldi transformation [113] methods improved the numerical stability of AWE, while the split congruence transformation [63] method and PRIMA [85] can further produce passive models. However, the reduced circuit matrices produced by PRIMA are larger than those from direct pole matching (having more poles than necessary) [1], and PRIMA does not preserve certain important circuit properties such as reciprocity [37]. The latest development based on structured projection can preserve reciprocity [37], but it does not realize the reduced circuit matrices. The extension of structure-preserving MOR can be found in Chapter 8 of this book.
2.7
Summary
In this chapter, we presented the basic concepts of model order reduction of interconnect circuits based on Krylov subspaces. We started with the concepts of moments, Elmore delays, and the Pade approximation for moment matching. We introduced classic moment-matching techniques like AWE, which suffer from numerical issues. Then we presented the projection-based reduction framework and the concepts of Krylov subspaces, which lead to numerically more stable model order reduction methods. Finally, we showed how model order reduction can be applied to RLC circuits to produce passive models via numerically stable processes like the Arnoldi and Lanczos methods for both SISO and MIMO systems.
2.8
Appendices

Theorem 2.1 (Moment-matching connection of one-sided Krylov subspace methods). Consider the two Krylov subspaces $\mathrm{colsp}\,V = K_q(A^{-1}E, A^{-1}b)$ and $\mathrm{colsp}\,W = K_q(A^{-T}E^T, A^{-T}l)$ defined in (2.65). For the one-sided Krylov subspace method, the reduced system obtained by projection-based reduction matches $q$ moments of the transfer function of the original system defined in (2.66).

Proof: We first rewrite the reduced system (2.57) as
\[ E_r\dot{z}(t) = A_r z(t) + b_r u(t), \qquad y(t) = l_r^T z(t), \]
where $E_r = W^TEV$, $A_r = W^TAV$, $b_r = W^Tb$, and $l_r^T = l^TV$. Then the reduced system transfer function can be written as
\[ H_r(s) = -l_r^TA_r^{-1}b_r - l_r^T(A_r^{-1}E_r)A_r^{-1}b_r\,s - \cdots - l_r^T(A_r^{-1}E_r)^iA_r^{-1}b_r\,s^i - \cdots. \]

We first compare moment 0 of the reduced system, $m_{r0}$, with that of the original system by using the Krylov subspace $\mathrm{colsp}\,V = K_q(A^{-1}E, A^{-1}b)$:
\[ \begin{aligned} m_{r0} &= l_r^TA_r^{-1}b_r = l^TV(W^TAV)^{-1}W^Tb \\ &= l^TV(W^TAV)^{-1}(W^TAV)r_0 \\ &= l^TVr_0 = l^TA^{-1}b = m_0. \end{aligned} \]
The critical step in the above derivation is to realize that $A^{-1}b$ belongs to the Krylov subspace $K_q(A^{-1}E, A^{-1}b)$ and, therefore, we can write $A^{-1}b = Vr_0$ and $b = AVr_0$.

We can also prove $m_{r0} = m_0$ by using the Krylov subspace $\mathrm{colsp}\,W = K_q(A^{-T}E^T, A^{-T}l)$:
\[ \begin{aligned} m_{r0} &= l_r^TA_r^{-1}b_r = l^TV(W^TAV)^{-1}W^Tb \\ &= g_0^T(W^TAV)(W^TAV)^{-1}W^Tb \\ &= g_0^TW^Tb = l^TA^{-1}b = m_0. \end{aligned} \]
In this derivation, we use the fact that $A^{-T}l$ belongs to the Krylov subspace $K_q(A^{-T}E^T, A^{-T}l)$ and, therefore, $A^{-T}l = Wg_0$ and $l^T = g_0^TW^TA$.

Then we compare moment 1 of the reduced system, $m_{r1}$, with that of the original system, $m_1$. Again we first use the Krylov subspace $\mathrm{colsp}\,V = K_q(A^{-1}E, A^{-1}b)$, which gives
\[ \begin{aligned} m_{r1} &= l_r^T(A_r^{-1}E_r)A_r^{-1}b_r \\ &= l^TV(W^TAV)^{-1}W^TEV(W^TAV)^{-1}W^Tb \\ &= l^TV(W^TAV)^{-1}W^TEVr_0 \\ &= l^TV(W^TAV)^{-1}W^TA(A^{-1}E)A^{-1}b \\ &= l^TV(W^TAV)^{-1}(W^TAV)r_1 \\ &= l^TVr_1 = l^T(A^{-1}E)A^{-1}b = m_1, \end{aligned} \]
where $Vr_1 = (A^{-1}E)A^{-1}b$, as $(A^{-1}E)A^{-1}b$ belongs to $K_q(A^{-1}E, A^{-1}b)$.

In general, we have $(A^{-1}E)^iA^{-1}b = Vr_i$, $i < q$, for the Krylov subspace $K_q(A^{-1}E, A^{-1}b)$. Then we can prove that
\[ \begin{aligned} m_{ri} &= l_r^T(A_r^{-1}E_r)^iA_r^{-1}b_r \\ &= l^TV\left[(W^TAV)^{-1}W^TEV\right]^i(W^TAV)^{-1}W^Tb \\ &= l^TVr_i = l^T(A^{-1}E)^iA^{-1}b = m_i. \end{aligned} \]

Similarly, if we use the Krylov subspace $\mathrm{colsp}\,W = K_q(A^{-T}E^T, A^{-T}l)$, we have $(A^{-T}E^T)^iA^{-T}l = Wg_i$ for $i < q$. We then show that
\[ \begin{aligned} m_{ri} &= l_r^T(A_r^{-1}E_r)^iA_r^{-1}b_r \\ &= l^TV\left[(W^TAV)^{-1}W^TEV\right]^i(W^TAV)^{-1}W^Tb \\ &= g_i^TW^Tb = l^TA^{-1}(EA^{-1})^ib = l^T(A^{-1}E)^iA^{-1}b = m_i. \end{aligned} \]
QED.

Corollary 2.1 (Moment-matching connection of two-sided Krylov subspace methods). If both subspaces $V$ and $W$ are used, the reduced model matches $2q$ moments of the transfer function of the original system defined in (2.66).

Proof: To prove the corollary, we only need to look at the highest-order moment of the reduced system that is expected to match, $m_{2q-1}$:
\[ \begin{aligned} m_{r,2q-1} &= l_r^T(A_r^{-1}E_r)^{2q-1}A_r^{-1}b_r \\ &= l_r^T(A_r^{-1}E_r)^{q-1}(A_r^{-1}E_r)(A_r^{-1}E_r)^{q-1}A_r^{-1}b_r \\ &= l^TV\left[(W^TAV)^{-1}W^TEV\right]^{q-1}(W^TAV)^{-1}W^TEV\left[(W^TAV)^{-1}W^TEV\right]^{q-1}(W^TAV)^{-1}W^Tb \\ &= g_{q-1}^T(W^TAV)(W^TAV)^{-1}W^TEVr_{q-1} \\ &= g_{q-1}^TW^TEVr_{q-1} \\ &= l^TA^{-1}(EA^{-1})^{q-1}E(A^{-1}E)^{q-1}A^{-1}b \\ &= l^T(A^{-1}E)^{q-1}A^{-1}E(A^{-1}E)^{q-1}A^{-1}b \\ &= l^T(A^{-1}E)^{2q-1}A^{-1}b = m_{2q-1}. \end{aligned} \]
QED.
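The moment-matching statements above can be checked numerically. The following is a minimal sketch, assuming a small random test system and illustrative helper names (`krylov_basis`, `moments`, `project`) that are not from the book; it uses the sign convention $m_i = l^T(A^{-1}E)^iA^{-1}b$ of the proof, and takes $W = V$ for the one-sided case.

```python
import numpy as np

def krylov_basis(M, v, q):
    """Orthonormal basis of span{v, Mv, ..., M^(q-1) v} via Gram-Schmidt."""
    cols = []
    w = v
    for _ in range(q):
        for u in cols:                      # orthogonalize against earlier vectors
            w = w - (u @ w) * u
        w = w / np.linalg.norm(w)
        cols.append(w)
        w = M @ w                           # next Krylov direction
    return np.column_stack(cols)

def moments(E, A, b, l, count):
    """m_i = l^T (A^{-1} E)^i A^{-1} b for i = 0..count-1."""
    r = np.linalg.solve(A, b)
    out = []
    for _ in range(count):
        out.append(l @ r)
        r = np.linalg.solve(A, E @ r)
    return np.array(out)

def project(E, A, b, l, W, V):
    """Reduced matrices E_r, A_r, b_r, l_r of the projected system."""
    return W.T @ E @ V, W.T @ A @ V, W.T @ b, V.T @ l

# Hypothetical test system (not an actual circuit).
rng = np.random.default_rng(0)
n, q = 12, 3
A = -4.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b, l = rng.standard_normal(n), rng.standard_normal(n)

V = krylov_basis(np.linalg.solve(A, E), np.linalg.solve(A, b), q)
W = krylov_basis(np.linalg.solve(A.T, E.T), np.linalg.solve(A.T, l), q)

m_full = moments(E, A, b, l, 2 * q)
m_one = moments(*project(E, A, b, l, V, V), 2 * q)   # one-sided, W = V
m_two = moments(*project(E, A, b, l, W, V), 2 * q)   # two-sided

print("one-sided matches first q moments:", np.allclose(m_one[:q], m_full[:q]))
print("two-sided matches first 2q moments:", np.allclose(m_two, m_full))
```

Explicit moment computation is used here only to verify the theorem on a tiny example; as discussed in Chapter 2, practical reduction avoids forming moments directly.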
3 Truncated balanced realization methods for MOR
3.1
Introduction

Model order reduction methods for linear and non-linear dynamic systems can in general be classified into two categories [6]:

1. Singular-value-decomposition (SVD) based approaches
2. Krylov-subspace-based approaches.

Krylov-subspace-based methods were reviewed in Chapter 2. In this chapter, we focus on the SVD-based reduction methods. Singular value decomposition is based on low-rank approximation, which is optimal in the 2-norm sense. The quantities that decide how well a given system can be approximated by a lower-rank system are called singular values, which are the square roots of the eigenvalues of the product of the system matrix and its adjoint.

The major advantage of SVD-based approaches over Krylov subspace methods lies in their ability to guarantee that the errors satisfy an a priori upper bound. Also, SVD-based methods typically lead to optimal or near-optimal reduction results, as the errors are controlled in a global way. However, SVD-based methods suffer from scalability issues, as SVD is a computationally intensive process and in general cannot deal with very large dynamic systems. In contrast, Krylov-subspace-based methods can scale to very large systems owing to efficient computation methods for moment vectors and their orthogonal forms.

SVD-based approaches consist of several reduction methods [6]. In this chapter, we mainly focus on the truncated-balanced-realization (TBR) approach, first introduced by Moore [81], and its variants. In the TBR method, a system is mapped onto a basis where the states that are difficult to reach are also difficult to observe. Then, the reduced model is obtained simply by truncating those weak states. TBR-based approaches have been applied to reduce linear VLSI systems in the past [87, 89, 131]. Also, TBR methods can produce passive models by using positive-real TBR methods [87, 89, 131]. A more recent development that uses empirical Gramians improves the scalability of the TBR methods [89, 90].
Here, we first review the singular value decomposition method in Section 3.2. Then, in Section 3.3, we briefly discuss the proper orthogonal decomposition (POD) method, as it can be viewed as a simple application of SVD to the responses of a dynamic system. After this we proceed to the TBR-based reduction methods in the remaining sections.
3.2
The singular value decomposition (SVD)

In this section, we briefly review the singular value decomposition method, as it is the key reduction step used in this chapter. For an $m \times n$ matrix $A$, the SVD of $A$ is
\[ A = U_{m\times m}\,\Sigma\,V_{n\times n}^T, \tag{3.1} \]
where $U_{m\times m}$ and $V_{n\times n}$ are orthogonal matrices, $U_{m\times m}^TU_{m\times m} = I$ and $V_{n\times n}^TV_{n\times n} = I$, and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{\min(m,n)})$. The $\sigma_i$ are called singular values: $\sigma_i = \sqrt{\lambda_i(A^TA)} \ge \sigma_{i+1}$, so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)}$.

SVD can lead to the best approximation in terms of the 2-norm [44], which is expressed by the following theorem:

Theorem 3.1 (Schmidt–Mirsky, Eckart–Young). For an SVD of a matrix $A$, if $k < r = \mathrm{rank}(A)$ and
\[ A_k = \sum_{i=1}^{k}\sigma_iu_iv_i^T, \tag{3.2} \]
then
\[ \min_{\mathrm{rank}(B)=k}\|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}, \tag{3.3} \]
where $\{u_i\}$ and $\{v_i\}$ are the left and right singular vectors respectively.

Equation (3.3) reflects the fact that the rank-$k$ approximation matrix $A_k$ is just $\sigma_{k+1}$ away from the original matrix $A$ in terms of the 2-norm distance. Hence, SVD gives a very good low-rank approximation to the original system, with errors bounded by the known singular values.
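As a quick illustration of Theorem 3.1, the sketch below builds the rank-$k$ truncation $A_k$ of a small random matrix with NumPy and compares the 2-norm error with $\sigma_{k+1}$. The matrix and the choice $k = 2$ are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))          # arbitrary m x n test matrix

U, s, Vt = np.linalg.svd(A)              # A = U diag(s) V^T, s sorted in decreasing order
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # sum of the k leading terms sigma_i u_i v_i^T

err = np.linalg.norm(A - A_k, 2)         # spectral (2-norm) error of the truncation
print("||A - A_k||_2 =", err, " sigma_{k+1} =", s[k])

# Singular values are the square roots of the eigenvalues of A^T A.
eig_desc = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eig_desc)))
```

Note that `s[k]` is $\sigma_{k+1}$ because NumPy arrays are zero-indexed.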
3.3
Proper orthogonal decomposition (POD)

We start with the proper orthogonal decomposition method, as it is a straightforward way to apply the SVD method to model order reduction. This method can be used for the reduction of both linear and non-linear systems. Consider the dynamic (linear or non-linear) system given by
\[ \dot{x}(t) = f(x(t), u(t)), \tag{3.4} \]
\[ y(t) = h(x(t), u(t)), \tag{3.5} \]
where $u(t)$ is the input vector of the system. For a fixed input $u(t)$, the state trajectory at certain instances of time ($m$ samples) is measured:
\[ X = [x(t_1), x(t_2), \ldots, x(t_m)]. \tag{3.6} \]
The matrix $X$ can be viewed as the snapshots of the state. In general $m \ll n$. Next we perform SVD on $X$. If the singular values of this matrix fall off rapidly, a low-order approximation of this system can be computed,
\[ X = U\Sigma V^T \approx U_k\Sigma_kV_k^T, \tag{3.7} \]
where $\Sigma$ and $\Sigma_k$ are diagonal matrices with the singular values in decreasing order along the diagonal, and $\Sigma_k$ is a $k \times k$ diagonal matrix. As a result, we have the eigendecomposition of the response Gramian as
\[ XX^T = U\Sigma\Sigma^TU^T \approx U_k\Sigma_k\Sigma_k^TU_k^T. \tag{3.8} \]
Let $x(t) \approx U_k\eta(t)$, $\eta(t) \in \mathbb{R}^k$; then we have the following reduced system in terms of $\eta(t)$:
\[ \dot{\eta}(t) = U_k^Tf(U_k\eta(t), u(t)), \tag{3.9} \]
\[ y(t) = h(U_k\eta(t), u(t)). \tag{3.10} \]
Thus the approximation of the state $x(t)$ uses a low-dimensional space that is spanned by the $k$ leading columns of $U$, which are essentially the dominant eigenvectors of the response Gramian $XX^T$. This reduction process can be used directly for both linear and non-linear dynamic systems. However, the state trajectory depends on the input signals. The POD method is therefore more suitable for autonomous systems such as oscillators, as we perform the decomposition directly on the state trajectory of the system. For systems with input and output terminals, it is more desirable to perform the reduction on the controllability and observability Gramians, as they describe the dynamic behavior of the system from the perspectives of the inputs and outputs. This leads to the truncated balanced realization (TBR) methods, to be discussed in the following sections.
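The POD procedure of (3.6)-(3.10) can be sketched for the linear special case $f(x,u) = Ax + Bu$, $h(x,u) = Cx$. The example below is a minimal illustration under assumptions of our own: a tridiagonal test matrix, a unit step input, backward-Euler time stepping, and the helper name `simulate`; none of these come from the book.

```python
import numpy as np

def simulate(A, B, u, dt, steps):
    """Backward-Euler integration of x' = A x + B u(t) from x(0) = 0; states as columns."""
    n = A.shape[0]
    M = np.eye(n) - dt * A
    x = np.zeros(n)
    X = np.empty((n, steps))
    for j in range(steps):
        x = np.linalg.solve(M, x + dt * B * u((j + 1) * dt))
        X[:, j] = x
    return X

n, k = 200, 6
A = -2.0 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
B = np.zeros(n); B[0] = 1.0              # input drives the first state
C = np.zeros(n); C[0] = 1.0              # output observes the first state
u = lambda t: 1.0                        # fixed input: unit step

X = simulate(A, B, u, dt=0.5, steps=40)  # snapshot matrix (3.6), m = 40 samples
U, s, _ = np.linalg.svd(X, full_matrices=False)
Uk = U[:, :k]                            # k leading left singular vectors

# Reduced system (3.9)-(3.10) for the linear case.
Ar, Br, Cr = Uk.T @ A @ Uk, Uk.T @ B, C @ Uk
Eta = simulate(Ar, Br, u, dt=0.5, steps=40)

y_full, y_red = C @ X, Cr @ Eta
rel_err = np.linalg.norm(y_full - y_red) / np.linalg.norm(y_full)
proj_err = np.linalg.norm(X - Uk @ (Uk.T @ X), "fro")
print("relative output error:", rel_err)
print("snapshot projection error:", proj_err, "vs", np.sqrt(np.sum(s[k:] ** 2)))
```

The second printed line compares the error of projecting the snapshots onto the POD basis with the discarded singular values, which is the Frobenius-norm counterpart of (3.3). The reduced model is only trustworthy for inputs similar to the one used to collect the snapshots, as noted above.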
3.4
Classic truncated balanced realization methods In this section, we review the classic TBR method for general linear dynamic systems.
3.4.1
State-space models

Given a linear system in its state-space form,
\[ \frac{dx}{dt} = Ax(t) + Bu(t), \tag{3.11} \]
\[ y(t) = Cx(t) + Du(t), \tag{3.12} \]
where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times p}$, $C \in \mathbb{R}^{p\times n}$, $D \in \mathbb{R}^{p\times p}$, and $y(t), u(t) \in \mathbb{R}^p$, model reduction algorithms seek to produce a similar system
\[ \frac{d\tilde{x}}{dt} = \tilde{A}\tilde{x}(t) + \tilde{B}u(t), \tag{3.13} \]
\[ \tilde{y}(t) = \tilde{C}\tilde{x}(t) + \tilde{D}u(t), \tag{3.14} \]
where $\tilde{A} \in \mathbb{R}^{q\times q}$, $\tilde{B} \in \mathbb{R}^{q\times p}$, $\tilde{C} \in \mathbb{R}^{p\times q}$, and $\tilde{D} \in \mathbb{R}^{p\times p}$, of order $q$ much smaller than the original order $n$, but for which the outputs $y(t)$ and $\tilde{y}(t)$ are approximately equal for inputs $u(t)$ of interest. Often the transfer functions
\[ H(s) = D + C(sI - A)^{-1}B, \tag{3.15} \]
\[ \tilde{H}(s) = \tilde{D} + \tilde{C}(sI - \tilde{A})^{-1}\tilde{B}, \tag{3.16} \]
are used as a metric for approximation: if $\|H(s) - \tilde{H}(s)\| < \epsilon$, in some appropriate norm, for some given allowable error $\epsilon$ and allowed domain of the complex frequency variable $s$, the reduced model is accepted as accurate.
3.4.2
Basic idea of balanced truncation

Consider a system as in (3.11) where $A$ is stable or, equivalently, its spectrum is in the open left half plane. We consider the output response to a particular input signal, namely the impulse $u = \delta$. In this case the output is called the impulse response and is denoted by $h(t) = Ce^{At}B$, $t \ge 0$. This can be decomposed into an input-to-state map $x(t) = e^{At}B$ and a state-to-output map $\eta(t) = Ce^{At}$. Thus the input $\delta$ causes the state $x(t)$, while the initial condition $x(0)$ causes the output $y(t) = \eta(t)x(0)$. The Gramians corresponding to $x$ and $\eta$ are
\[ W_c = \sum_t x(t)x(t)^T = \int_0^\infty e^{At}BB^Te^{A^Tt}\,dt, \tag{3.17} \]
\[ W_o = \sum_t \eta(t)^T\eta(t) = \int_0^\infty e^{A^Tt}C^TCe^{At}\,dt, \tag{3.18} \]
where $W_c$ and $W_o$ are called the controllability and observability Gramians respectively. A linear system in state-space form is called balanced if the two Gramians are equal and diagonal:
\[ W_c = W_o = \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n). \tag{3.19} \]
It turns out that every controllable and observable system can be transformed to the so-called balanced form by means of a basis change $\tilde{x} = Tx$. The two Gramians are transformed by congruence transformations:
\[ \tilde{W}_c = TW_cT^T, \tag{3.20} \]
\[ \tilde{W}_o = T^{-T}W_oT^{-1}, \tag{3.21} \]
\[ \tilde{W}_c\tilde{W}_o = TW_cW_oT^{-1}. \tag{3.22} \]
Since any two symmetric positive definite matrices can be simultaneously diagonalized by an appropriate congruence transformation, it is possible to find a $T$ that makes $\tilde{W}_c$ and $\tilde{W}_o$ equal and diagonal. Since the product $W_cW_o$ undergoes a similarity transformation at the same time, the eigenvalues of the product $W_cW_o$ are invariant. The square roots of these eigenvalues, $\sqrt{\lambda_i(W_cW_o)}$, are called the Hankel singular values, and contain useful information about the input-output behavior of the system. In particular, small eigenvalues of $W_cW_o$ correspond to internal sub-systems that have a weak effect on the input-output behavior of the system and are almost non-observable or non-controllable or both. The Hankel singular values are fundamental invariants that determine how well a model can be approximated by a reduced-order model. They play the same role for dynamical systems that the singular values play for finite-dimensional matrices.

The key to the computation of the Hankel singular values is the observation that the Gramians $W_c$, $W_o$ are the unique Hermitian positive definite solutions to the following linear matrix equations, which are known as Lyapunov equations:
\[ AW_c + W_cA^T + BB^T = 0, \tag{3.23} \]
\[ A^TW_o + W_oA + C^TC = 0. \tag{3.24} \]
Then, the Hankel singular values of the system are the square roots of the eigenvalues of the product $W_cW_o$. If a system is controllable and observable, the two Gramians $W_c$ and $W_o$ are positive definite and can be factored as $W_c = L_cL_c^T$ and $W_o = L_oL_o^T$ respectively by Cholesky factorization, where $L_c$ and $L_o$ are lower triangular matrices. Let $L_o^TL_c = U\Sigma V^T$ be the SVD of $L_o^TL_c$. The balancing transformation matrix can be obtained as
\[ T = L_cV\Sigma^{-1/2}, \tag{3.25} \]
\[ T^{-1} = \Sigma^{-1/2}U^TL_o^T. \tag{3.26} \]
Then, under a similarity transformation, the state-space model becomes
\[ \tilde{A} = T^{-1}AT, \tag{3.27} \]
\[ \tilde{B} = T^{-1}B, \tag{3.28} \]
\[ \tilde{C} = CT. \tag{3.29} \]
We may partition $\Sigma$ into
\[ \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \tag{3.30} \]
and partition the transformed matrices conformally as
\[ \tilde{A} = \begin{bmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix}, \tag{3.31} \]
\[ \tilde{B} = \begin{bmatrix} \tilde{B}_1 \\ \tilde{B}_2 \end{bmatrix}, \tag{3.32} \]
\[ \tilde{C} = \begin{bmatrix} \tilde{C}_1 & \tilde{C}_2 \end{bmatrix}. \tag{3.33} \]
The reduced-order model is obtained by simple truncation, that is, by taking the $k \times k$, $k \times m$, $p \times k$ leading blocks of $\tilde{A}$, $\tilde{B}$, $\tilde{C}$, respectively; the system satisfies $k$th-order Lyapunov equations with diagonal solution $\Sigma_1$. This truncation leads to a balanced reduced-order system $(\tilde{A}_{11}, \tilde{B}_1, \tilde{C}_1, D)$.

Algorithm 3.1: classic truncated balanced realization (TBR)
1. Solve $AW_c + W_cA^T + BB^T = 0$ for $W_c$.
2. Solve $A^TW_o + W_oA + C^TC = 0$ for $W_o$.
3. Compute Cholesky factors, $W_c = L_cL_c^T$, $W_o = L_oL_o^T$.
4. Compute the SVD of the Cholesky factors, $U\Sigma V^T = L_o^TL_c$, where $\Sigma$ is diagonal positive and $U$, $V$ have orthonormal columns.
5. Compute the balancing transformation matrices $T = L_cV\Sigma^{-1/2}$, $T^{-1} = \Sigma^{-1/2}U^TL_o^T$.
6. Form the balanced realization as $\tilde{A} = T^{-1}AT$, $\tilde{B} = T^{-1}B$, $\tilde{C} = CT$.
7. Select the reduced model order and partition the realization $\tilde{A}$, $\tilde{B}$, $\tilde{C}$ conformally.
8. Truncate $\tilde{A}$, $\tilde{B}$, $\tilde{C}$ to form the reduced realization.

3.4.3
Error bounds

One of the attractive aspects of TBR methods is that computable error bounds are available. If the truncation is such that the resulting Gramians contain the largest $q$ singular values $\sigma_1$ through $\sigma_q$, then the $H_\infty$ norm of the error system has the following lower and upper bounds [29, 43]:
\[ \sigma_{q+1} \le \|H(s) - \tilde{H}_q(s)\|_\infty \le 2\sum_{k=q+1}^{n}\sigma_k. \tag{3.34} \]
The $H_\infty$ norm of a system is defined as the highest peak of the frequency response, i.e., as the largest singular value of the transfer function evaluated on the imaginary axis: $\sup_\omega \sigma_{\max}[D + C(j\omega I - A)^{-1}B]$. This result provides rigorous justification for using the reduced-order model to predict the behavior of, and enact control of, the full system. The upper bound for the error system given above translates to an upper bound for the energy of the output error signal.
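The steps of Algorithm 3.1 and the upper bound in (3.34) can be sketched with SciPy as follows. This is a minimal illustration, not a production implementation: the function names `tbr` and `freq_resp`, the random six-state test system with $D = 0$, and the finite frequency grid used to sample the error are all assumptions of the example. A sampled maximum can only underestimate the true $H_\infty$ norm, so only the upper bound is checked.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def tbr(A, B, C, q):
    """Balanced truncation to order q, following the steps of Algorithm 3.1."""
    Wc = solve_continuous_lyapunov(A, -B @ B.T)       # A Wc + Wc A^T + B B^T = 0
    Wo = solve_continuous_lyapunov(A.T, -C.T @ C)     # A^T Wo + Wo A + C^T C = 0
    Lc = cholesky((Wc + Wc.T) / 2, lower=True)        # Wc = Lc Lc^T
    Lo = cholesky((Wo + Wo.T) / 2, lower=True)        # Wo = Lo Lo^T
    U, s, Vt = svd(Lo.T @ Lc)                         # Lo^T Lc = U S V^T
    T = (Lc @ Vt.T) / np.sqrt(s)                      # T = Lc V S^{-1/2}
    Tinv = (U / np.sqrt(s)).T @ Lo.T                  # T^{-1} = S^{-1/2} U^T Lo^T
    Ab, Bb, Cb = Tinv @ A @ T, Tinv @ B, C @ T        # balanced realization
    return Ab[:q, :q], Bb[:q], Cb[:, :q], s           # truncated model, Hankel singular values

def freq_resp(A, B, C, w):
    """H(jw) = C (jw I - A)^{-1} B, with D = 0 in this example."""
    return C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B)

# Hypothetical stable test system.
rng = np.random.default_rng(2)
n, p, q = 6, 2, 3
A = -np.diag(np.arange(1.0, n + 1)) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))

Ar, Br, Cr, hsv = tbr(A, B, C, q)
grid = np.logspace(-2, 3, 300)
err = max(np.linalg.norm(freq_resp(A, B, C, w) - freq_resp(Ar, Br, Cr, w), 2)
          for w in grid)
bound = 2.0 * hsv[q:].sum()
print(f"sampled error {err:.3e}, upper bound {bound:.3e}")
```

The symmetrization `(Wc + Wc.T) / 2` before the Cholesky step is a small safeguard against round-off; for larger or poorly conditioned systems the Cholesky factorization can still fail, which is the issue taken up in Section 3.9.1.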
3.5
Passive-preserving truncated balanced realization methods
3.5.1
Passivity and positive-realness

A passive system cannot produce energy internally. It is desirable that reduced models also be passive. Otherwise, the reduced models may cause non-physical behavior when used in system simulations, for example by generating energy at high frequencies, which causes erratic or unstable time-domain behavior. For many electrical systems of interest, passivity is implied by positive-realness of the transfer function. The matrix $H(s)$ is said to be positive real (PR) if:
\[ H(\bar{s}) = \overline{H(s)}, \tag{3.35} \]
\[ H(s) \text{ is analytic in } \{s : \mathrm{Re}(s) > 0\}, \tag{3.36} \]
\[ H(s) + H(s)^H \ge 0 \text{ in } \{s : \mathrm{Re}(s) > 0\}. \tag{3.37} \]
In the above, $\overline{H}$ denotes the complex conjugate, $H^H$ denotes the Hermitian (complex conjugate and transpose), and $\ge$ and $\le$ in a matrix context denote semidefiniteness.
3.5.2
Positive-realness constraints

The central tool in relating passivity of state-space models to positive-realness of the transfer function is the positive-real lemma [4]. Let $(A, B, C, D)$ describe a state-space model. The positive-real lemma states that the system is passive if and only if there exists a $P \in \mathbb{R}^{n\times n}$, $P \ge 0$, satisfying the linear matrix inequality
\[ \begin{bmatrix} A^TP + PA & PB - C^T \\ B^TP - C & -(D + D^T) \end{bmatrix} \le 0. \tag{3.38} \]
Two important methods for generating guaranteed passive reduced models, positive-real TBR [87] and the convex optimization method for passivity enforcement [22], are both based on this lemma. We will introduce positive-real TBR (PR-TBR) in this section.
3.5.3
Lur’e equations

We assume $D + D^T > 0$. For RLC models in a modified nodal analysis (MNA) format, we have $A + A^T \le 0$, $B = C^T$, and $D = 0$. The MNA system can then be transformed into an equivalent form with $D + D^T > 0$. We will discuss this case in the following sections. Using the Schur complement, the solution of the positive-real lemma is equivalent to the solution of the following Lur’e equations. Therefore, the positive-real lemma means that $H(s)$ is positive real if and only if there exist matrices $X_c = X_c^T$, $J_c$, $K_c$ such that the following Lur’e equations [5] are satisfied:
\[ AX_c + X_cA^T = -K_cK_c^T, \tag{3.39} \]
\[ X_cC^T - B = -K_cJ_c^T, \tag{3.40} \]
\[ J_cJ_c^T = D + D^T, \tag{3.41} \]
and $X_c \ge 0$ ($X_c$ is positive semidefinite). $X_c$ is analogous to the controllability Gramian. In fact, it is the controllability Gramian for a system with the input-to-state mapping given by the matrix $K_c$ ($K_c$ is treated as the input matrix).

Similarly, there is a dual set of Lur’e equations for $X_o = X_o^T > 0$, $J_o$, $K_o$, obtained from the above equations by the substitutions $A \to A^T$, $B \to C^T$, $C^T \to B$. The dual equations
\[ A^TX_o + X_oA = -K_o^TK_o, \tag{3.42} \]
\[ X_oB - C^T = -K_o^TJ_o, \tag{3.43} \]
\[ J_o^TJ_o = D + D^T, \tag{3.44} \]
have a corresponding observability quantity $X_o \ge 0$ for a positive-real $H(s)$. It can be verified that $X_cX_o$ transforms under similarity just as $W_cW_o$ does, so that its eigenvalues are invariant; in fact $X_c$ and $X_o$ behave as the Gramians $W_c$ and $W_o$.
3.5.4
Algebraic Riccati equations

Combining the three equations (3.39)-(3.41) gives
\[ AX_c + X_cA^T + (B - X_cC^T)(D + D^T)^{-1}(B - X_cC^T)^T = 0, \tag{3.45} \]
which is the so-called algebraic Riccati equation (ARE) [7]. Taking the matrix root $LL^T = (D + D^T)^{-1}$ and defining $B_0 = BL$, $C_0 = L^TC$, and $A_0 = A - B_0C_0$, expanding the product in (3.45) rewrites the ARE as
\[ A_0X_c + X_cA_0^T + X_cC_0^TC_0X_c + B_0B_0^T = 0. \tag{3.46} \]
Also, there is a dual ARE that is obtained from the above equation by the substitutions $A_0 \to A_0^T$, $B_0 \to C_0^T$, $C_0^T \to B_0$. The dual equation is
\[ A_0^TX_o + X_oA_0 + X_oB_0B_0^TX_o + C_0^TC_0 = 0. \tag{3.47} \]

3.5.5
Passivity-preserving truncated balanced realization

A passivity-preserving reduction procedure is to solve the Lur’e equations or the algebraic Riccati equations (AREs) for the quantities $X_c$, $X_o$, which may then be used as the basis of a TBR procedure. We may find a coordinate system in which $\tilde{X}_c = \tilde{X}_o = \Sigma$, with $\Sigma$ again being diagonal. In this coordinate system, the matrices $\tilde{A}$, $\tilde{B}$, $\tilde{C}$ may be partitioned and truncated, similar to the standard TBR procedure. Here is the outline of this algorithm [87].
Algorithm 3.2: passive truncated balanced realization
1. Solve Lur’e equations or AREs for $X_c$ and $X_o$.
2. Proceed with steps 3 to 8 in Algorithm 3.1, substituting $X_c$ for $W_c$ and $X_o$ for $W_o$.
3.6
Hybrid TBR and combined TBR-Krylov subspace methods

A hybrid TBR method was proposed in [87], in which the classic TBR method is tried first. If the reduced model is not passive, then positive-real TBR is tried. The rationale is that TBR approximations are more accurate for a given order than PR-TBR approximations, and PR-TBR is more expensive to compute than TBR. The hybrid TBR is presented as follows:

Algorithm 3.3: hybrid truncated balanced realization
1. Perform Algorithm 3.1.
2. Using the reduced model matrices $\tilde{A}$, $\tilde{B}$, $\tilde{C}$, solve Lur’e equations or AREs for $\tilde{X}_c$.
3. If those equations are solvable and $\tilde{X}_c \ge 0$, then terminate and return $\tilde{A}$, $\tilde{B}$, $\tilde{C}$; otherwise discard the TBR-reduced model and proceed with Algorithm 3.2.

Truncated balanced realization algorithms are important from the theoretical point of view. For small systems (a few hundred states or so) they are superior in accuracy to the Krylov subspace methods and other parameter-matching techniques, and they also provide computable bounds on the reduction errors. For large systems, direct application of the techniques used to balance and truncate the systems is computationally infeasible, since the computations required have $O(n^3)$ complexity when performed directly ($n$ being the order of the system to be reduced). Therefore, the TBR methods are of more interest when combined with iterative Krylov-subspace procedures. The idea is to obtain an initial reduced model via some initial reduction or approximation technique and then further compress it using a TBR method. The initial approximation can be generated by any desired method, for example rational fitting or a Krylov-subspace technique. A combined TBR and Krylov approach was investigated in [60] for interconnect and packaging modeling, and nearly optimal reduced models were obtained using such a combined reduction scheme.
3.7
Empirical TBR and poor man’s TBR

As mentioned already, the TBR method is a computationally expensive reduction process owing to the cubic time complexity of computing the Gramians. One strategy to overcome this problem is to use approximate Gramians, which can be computed more cheaply. Such a strategy is generally referred to as empirical TBR and has been explored in the poor man’s TBR method for interconnect model order reduction [89, 90]. Poor man’s TBR can also be viewed as a special form of the proper orthogonal decomposition method applied to linear dynamic systems of the form (3.11).
3.7.1
Poor man’s TBR

In this subsection, we briefly review the poor man’s TBR method [90]. The basic idea of the poor man’s TBR method is to recognize that the Gramian in its frequency-domain form can be approximated by using the state responses to impulse inputs in the frequency domain. Specifically, consider a linear system in its state-space form
\[ \frac{dx}{dt} = Ax(t) + Bu(t), \tag{3.48} \]
\[ y(t) = Cx(t), \tag{3.49} \]
where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times p}$, $C \in \mathbb{R}^{p\times n}$, and $y(t), u(t) \in \mathbb{R}^p$. We further assume that the stable circuit under discussion has a symmetric state matrix $A = A^T$ and the same input and output terminals, i.e., $C = B^T$. In this case, both Gramians are equal. As a result, we only need to look at the controllability Gramian, $X$,
\[ X = \int_0^\infty e^{At}BB^Te^{A^Tt}\,dt. \tag{3.50} \]
Since the Laplace transform of $e^{At}$ is $(sI - A)^{-1}$, the Gramian $X$ in the frequency domain becomes
\[ X = \int_{-\infty}^{\infty}(j\omega I - A)^{-1}BB^T(j\omega I - A)^{-H}\,d\omega, \tag{3.51} \]
where the superscript $H$ denotes the Hermitian transpose. We now consider evaluating $X$ by applying numerical quadrature to the above equation, given a quadrature scheme with nodes $\omega_k$ and weights $w_k$, and defining
\[ z_k = (j\omega_kI - A)^{-1}B. \tag{3.52} \]
Notice that (3.52) can essentially be viewed as the state response of (3.48) to impulse inputs. For general POD methods, we can apply typical inputs to the system and obtain the responses. An approximation $\hat{X}$ to $X$ can then be computed as
\[ \hat{X} = \sum_k w_kz_kz_k^H, \tag{3.53} \]
which can be viewed as the Gramian of the response matrix. Let $Z$ be a matrix whose columns are $z_k$, and $W$ a diagonal matrix with diagonal entries $W_{kk} = \sqrt{w_k}$. The above equation can be written more compactly as
\[ \hat{X} = ZW^2Z^H = (ZW)(WZ^H) = (ZW)(ZW)^H. \tag{3.54} \]
Following the POD method, we perform a singular value decomposition of $ZW$ to carry out the model reduction:
\[ ZW = VSU, \tag{3.55} \]
with $S$ real diagonal, and $V$ and $U$ unitary matrices. As expected, we have the eigendecomposition of the approximate Gramian $\hat{X}$:
\[ \hat{X} = (ZW)(ZW)^H = (VSU)(VSU)^H = VSUU^HSV^H = VS^2V^H. \tag{3.56} \]
Hence, the dominant singular vectors in $V$ are the eigenvectors of $\hat{X}$. Therefore, $V$ converges to the eigenspaces of $X$, and the Hankel singular values are obtained directly from the entries of $S$. $V$ can then be used as the projection matrix in a model order reduction scheme. The poor man’s TBR method is outlined in the following:

Algorithm 3.4: PMTBR: poor man’s TBR
1. Do until satisfied:
2. Select a frequency point $s_i$.
3. Compute $z_i = (s_iI - A)^{-1}B$.
4. Form the matrix with columns $Z^{(i)} = [z_1, z_2, \cdots, z_i]$. Construct the SVD of $Z^{(i)}$. If the error is satisfactory, set $Z = Z^{(i)}$ and go to Step 6.
5. Otherwise, go to Step 2.
6. Construct the projection space $V$ from the orthogonalized column span of $Z$, dropping columns whose associated singular values fall below a desired tolerance.

Because the full Gramians are not computed, poor man’s TBR as developed here does not possess the stability- and passivity-preserving properties of the full-blown TBR algorithms [90].
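A compact sketch of the PMTBR idea is given below. It departs from Algorithm 3.4 in two ways that are assumptions of this example rather than part of the method as described: all frequency samples are processed in one batch instead of an adaptive loop, and the real and imaginary parts of the weighted samples are stacked so that the projection basis stays real. The function name `pmtbr`, the tridiagonal test matrix, the frequency nodes, and the weights are likewise illustrative.

```python
import numpy as np

def pmtbr(A, B, omegas, weights, tol=1e-12):
    """Batch sketch of PMTBR: SVD of the weighted frequency samples sqrt(w_k) z_k."""
    n = A.shape[0]
    cols = [np.sqrt(w) * np.linalg.solve(1j * om * np.eye(n) - A, B)
            for om, w in zip(omegas, weights)]           # z_k from (3.52), scaled as in (3.54)
    ZW = np.hstack(cols)
    # Real basis: stack real and imaginary parts (implementation choice of this sketch).
    V, S, _ = np.linalg.svd(np.hstack([ZW.real, ZW.imag]), full_matrices=False)
    keep = S > tol * S[0]                                # drop negligible directions
    return V[:, keep], S

def H(A, B, C, s):
    """Scalar transfer function C (sI - A)^{-1} B for a one-port example."""
    return (C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B))[0, 0]

# Symmetric stable test system with C = B^T, matching the assumptions in the text.
n = 30
A = -2.0 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
B = np.zeros((n, 1)); B[0, 0] = 1.0
C = B.T
omegas = np.logspace(-1, 1, 6)           # quadrature nodes (assumed)
weights = np.gradient(omegas)            # crude positive quadrature weights (assumed)

V, S = pmtbr(A, B, omegas, weights)
Ar, Br, Cr = V.T @ A @ V, V.T @ B, C @ V  # congruence projection with the PMTBR basis

s0 = 1j * omegas[2]
print("reduced order:", V.shape[1])
print("error at a sampled frequency:", abs(H(A, B, C, s0) - H(Ar, Br, Cr, s0)))
```

In this example the reduced matrix `Ar` stays stable only because `A` is symmetric negative definite and the projection is a congruence with an orthonormal `V`; as noted above, this is not guaranteed by PMTBR in general.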
3.8
Computational complexities of TBR methods

In this section, we briefly discuss the time complexities of TBR methods, as they are important for understanding the advantages and limitations of the various TBR methods. Specifically, we analyze the time complexities of the standard TBR and poor man’s TBR methods. A similar analysis can be found in [90].
3.8.1
Standard TBR methods

We assume that we start with a linear dynamic system with $n$ states, as in (3.48). We want to derive the cost of computing a $q$th-order model with $q \ll n$. To analyze TBR and its variants, we need to quote the time complexities of the basic operations involved. For the SVD operations, the cost is roughly $O(nq^2)$ for $q \ll n$ [44]. Note that the time complexity of a full SVD of an $n \times m$ matrix, where $n > m$, is approximately $O(nm^2)$ when all the singular values are computed. In practice, however, we can compute only the dominant $q$ singular values, which leads to $O(nq^2)$ complexity. Linear matrix solves take $O(n^\alpha)$ (typically, $1 \le \alpha \le 1.2$ for sparse circuits), and matrix factorizations take $O(n^\beta)$ (typically, $1.1 \le \beta \le 1.5$ for sparse circuits). The formation of the transformation matrices is $O(q^2n)$ and the similarity transformation takes roughly $O(nq)$.

The TBR algorithm requires the computation of the Gramians $X$ and $W$, and of the eigenvalues of their product, at a cost of $O(n^3)$. Since we need two Cholesky decompositions, each with $O(n^\beta)$ cost, the total cost of standard TBR is approximately
\[ O(n^3 + 2n^\beta + nq^2 + nq). \tag{3.57} \]
Clearly, the $O(n^3)$ term prevents the TBR method from being applied to large systems.
3.8.2
Poor man’s TBR methods

Assuming that $q$ frequency points are chosen in the quadrature scheme for PMTBR, an examination of Algorithm 3.4 indicates that it involves one SVD, $q$ linear solutions, and $q$ factorizations, for a total cost of
\[ O(nq^2 + qn^\alpha + qn^\beta + nq^2 + nq). \tag{3.58} \]
The last two terms are the costs of generating the transformation matrix and performing the reduction, as mentioned before. As we can see, poor man’s TBR is much more scalable. It was shown that the computing cost of poor man’s TBR is similar to that of the multi-point Krylov subspace methods [90].
3.9
Practical implementation and numerical issues In this section, we discuss some practical implementation and numerical issues for the standard TBR algorithm and its variants.
3.9.1
Model reduction of unstable systems

Directly applying the TBR method described above may run into numerical issues. One problem we found in practical implementations of TBR methods is that the controllability and observability Gramians computed from the Lyapunov equations (3.23) and (3.24) may not be positive definite. As a result, we cannot perform the Cholesky decomposition in step 3 of Algorithm 3.1 and the algorithm simply breaks. The typical reason for this problem is numerical noise in the reduction process, which can make a given system unstable even though the original system is stable and passive. In this section, we present a method to deal with this problem [10]. The basic idea is to decompose the system into two parts, a stable part and an unstable part, and then perform the reduction on the stable part only.

A system transfer function matrix $G(s)$ can be rewritten in the form
\[ G(s) = G_-(s) + G_+(s), \tag{3.59} \]
such that $G_-$ is stable and $G_+$ is unstable. Then any of the absolute or relative error state-space truncation methods for model reduction can be applied to $G_-$ in order to obtain a reduced-order transfer function $\tilde{G}_-$, and the reduced-order system is synthesized as $\tilde{G}(s) = \tilde{G}_-(s) + G_+(s)$. For model order reduction of RLC circuits, which are stable and passive, we can simply ignore the unstable part, $G_+(s)$, as it is definitely not a dominant part of the system. In some control applications, however, unstable subsystems may be created deliberately, so we still need to put them back into the final reduced system.

We now describe how the decomposition can be computed using the matrix sign function. The matrix sign function of $Z$ is defined as
\[ \mathrm{sign}(Z) := S\begin{bmatrix} -I_k & 0 \\ 0 & I_{n-k} \end{bmatrix}S^{-1}, \tag{3.60} \]
where $S$ is the transformation matrix of the Jordan decomposition of $Z$, ordered so that the $k$ eigenvalues with negative real part come first. If an eigenvalue $\lambda$ of $Z$ has positive real part, then its corresponding eigenvalue in $\mathrm{sign}(Z)$ is $+1$, and if it has negative real part, then its corresponding eigenvalue in $\mathrm{sign}(Z)$ is $-1$. Note that $\mathrm{sign}(Z)$ is unique and independent of the order of the eigenvalues in the Jordan decomposition of $Z$.

Consider the realization $(A, B, C, D)$ of a continuous-time linear time-invariant system, and let $\mathrm{sign}(A)$ denote the sign function of $A$. We start by computing a QR factorization of $I_n - \mathrm{sign}(A)$ as
\[ I_n - \mathrm{sign}(A) = QRP, \qquad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \tag{3.61} \]
where $Q \in \mathbb{R}^{n\times n}$ is orthogonal, $R \in \mathbb{R}^{n\times n}$ is upper triangular with $R_{11} \in \mathbb{R}^{k\times k}$, and $P \in \mathbb{R}^{n\times n}$ is a permutation matrix. Note that all the eigenvalues of $A$ with negative real part become 2 in $I_n - \mathrm{sign}(A)$, while all the eigenvalues of $A$ with positive real part become zero, which in theory produces the last $n - k$ zero rows of $R$. Numerically, however, these rows may not be exactly zero, so the zeros in the last $n - k$ rows of $R$ are to be understood as “zero with respect to a given tolerance threshold.” Then the first $k$ columns of $Q$ span the stable $A$-invariant subspace. Thus,
\[ \tilde{A} := Q^TAQ = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \tag{3.62} \]
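where $\Lambda(A_{11}) = \Lambda(A) \cap \mathbb{C}^-$ and $\Lambda(A_{22}) = \Lambda(A) \cap \mathbb{C}^+$; here $\Lambda(X)$ denotes the eigenvalues of $X$. In a second step, we compute a matrix $V \in \mathbb{R}^{n\times n}$ such that
\[ \hat{A} := V^{-1}\tilde{A}V = \begin{bmatrix} I_k & -Y \\ 0 & I_{n-k} \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}\begin{bmatrix} I_k & Y \\ 0 & I_{n-k} \end{bmatrix} = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \tag{3.63} \]
where $Y \in \mathbb{R}^{k\times(n-k)}$ satisfies the Sylvester equation
\[ A_{11}Y - YA_{22} + A_{12} = 0. \tag{3.64} \]
As $\Lambda(A_{11}) \cap \Lambda(A_{22}) = \emptyset$, the equation has a unique solution. Sylvester equations with strictly stable or unstable coefficient matrices can be solved using the iterative algorithm of [10]. The desired additive decomposition of $G(s) = C(sI - A)^{-1}B + D$ is finally obtained by performing the state-space transformation
\[ (\hat{A}, \hat{B}, \hat{C}, \hat{D}) := (V^{-1}Q^TAQV,\; V^{-1}Q^TB,\; CQV,\; D) = \left(\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \begin{bmatrix} C_1 & C_2 \end{bmatrix}, D\right), \tag{3.65} \]
where $\hat{A}$, $\hat{B}$, $\hat{C}$ are partitioned conformally so that
\[ \begin{aligned} G(s) &= C(sI - A)^{-1}B + D \\ &= \hat{C}(sI - \hat{A})^{-1}\hat{B} + \hat{D} \\ &= C_1(sI_k - A_{11})^{-1}B_1 + D + C_2(sI_{n-k} - A_{22})^{-1}B_2 \\ &= G_-(s) + G_+(s), \end{aligned} \tag{3.66} \]
where $G_-(s)$ is the stable part and $G_+(s)$ is the unstable part of the transfer function matrix. Then we can proceed to perform TBR on $G_-(s)$.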
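The additive decomposition (3.62)-(3.66) can be sketched in a few lines with SciPy. Note the substitution made in this sketch: instead of the matrix sign function and QR factorization described above, it uses an ordered real Schur decomposition to obtain the block-triangular form (3.62), and a direct Sylvester solver instead of the iterative algorithm of [10]. The function names and the five-state test system with three stable and two unstable eigenvalues are assumptions of the example.

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

def split_stable_unstable(A, B, C):
    """Return realizations (A11, B1, C1) of G_minus and (A22, B2, C2) of G_plus (D excluded)."""
    # Ordered real Schur form: A = Q At Q^T with the k stable eigenvalues first, cf. (3.62).
    At, Q, k = schur(A, output="real", sort="lhp")
    A11, A12, A22 = At[:k, :k], At[:k, k:], At[k:, k:]
    Y = solve_sylvester(A11, -A22, -A12)             # A11 Y - Y A22 + A12 = 0, cf. (3.64)
    n = A.shape[0]
    V, Vinv = np.eye(n), np.eye(n)
    V[:k, k:], Vinv[:k, k:] = Y, -Y                  # V = [[I, Y], [0, I]] and its inverse
    Bh, Ch = Vinv @ Q.T @ B, C @ Q @ V               # transformation (3.65)
    return (A11, Bh[:k], Ch[:, :k]), (A22, Bh[k:], Ch[:, k:])

def tf(A, B, C, s):
    """C (sI - A)^{-1} B."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

# Hypothetical system with eigenvalues -1, -2, -3 (stable) and 1, 2 (unstable).
rng = np.random.default_rng(3)
S = rng.standard_normal((5, 5))
A = S @ np.diag([-1.0, -2.0, -3.0, 1.0, 2.0]) @ np.linalg.inv(S)
B = rng.standard_normal((5, 2))
C = rng.standard_normal((2, 5))

stable, unstable = split_stable_unstable(A, B, C)
s0 = 0.3 + 0.5j                                      # arbitrary test point
G = tf(A, B, C, s0)
gap = np.abs(G - tf(*stable, s0) - tf(*unstable, s0)).max()
print("stable order:", stable[0].shape[0], " decomposition mismatch:", gap)
```

The feedthrough term $D$ is left out of both parts here; following (3.66) it would be added to $G_-(s)$.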
3.9.2
Generalized eigenproblem method for AREs

When solving the ARE in (3.45) for linear VLSI interconnect RLC circuits, the direct coupling matrix $D$ is typically zero (or singular). In that case (3.45) cannot be solved directly. This section presents numerical methods to solve AREs for both the non-singular and the singular $D$ cases. Solutions of the Lur’e equations and solutions of algebraic Riccati equations are closely related. We summarize the basic numerically robust computational procedure below [103].

ARE solution for non-singular $D + D^T$

Let $F$, $X$, $Y$ be real $n \times n$ matrices with $X$ and $Y$ symmetric and positive semidefinite. Assume that there exists a symmetric matrix $P \in \mathbb{R}^{n\times n}$ that satisfies the Riccati equation
\[ F^TP + PF + PXP + Y = 0, \tag{3.67} \]
and that all the eigenvalues of $F + XP$ are in $\mathbb{C}^-$ (i.e., they have strictly negative real parts). Then it follows that such a $P$ is unique, and we shall use the notation $\mathrm{Ric}(F, X, Y)$ to refer to it. $P = \mathrm{Ric}(F, X, Y)$ can be determined by finding the eigenvectors of the matrix
\[ M := \begin{bmatrix} F & X \\ -Y & -F^T \end{bmatrix} \in \mathbb{R}^{2n\times 2n}, \tag{3.68} \]
which has $n$ eigenvalues in $\mathbb{C}^-$ and $n$ in $\mathbb{C}^+$. Let the eigenvectors corresponding to the eigenvalues in $\mathbb{C}^-$ be denoted by $w_1, \cdots, w_n$, and define $W := [w_1, \cdots, w_n] \in \mathbb{C}^{2n\times n}$. Partitioning $W$ as $W = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix}$, it follows that $W_1$ is an $n \times n$ non-singular matrix and $\mathrm{Ric}(F, X, Y) = W_2W_1^{-1}$.

The following algorithm computes the $P$ matrix for a positive-real system $(A, B, C, D)$ with non-singular $D + D^T$.

Algorithm 3.5: computation of $P$ with non-singular $D + D^T$
1. Let $F = A - BKC$, $K = (D + D^T)^{-1}$.
2. Determine $P = \mathrm{Ric}(F, BKB^T, C^TKC)$.
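The eigenvector construction of $\mathrm{Ric}(F, X, Y)$ and Algorithm 3.5 can be sketched as follows. This is an illustrative implementation under assumptions: the helper name `ric`, the use of a plain eigendecomposition (a Schur-based method would usually be preferred for robustness), and a small hand-made positive-real test system with $A$ diagonal and stable, $C = B^T$, and $D = 1$.

```python
import numpy as np

def ric(F, X, Y):
    """Stabilizing solution of F^T P + P F + P X P + Y = 0 via the matrix M of (3.68)."""
    n = F.shape[0]
    M = np.block([[F, X], [-Y, -F.T]])
    vals, vecs = np.linalg.eig(M)
    stable = np.argsort(vals.real)[:n]           # the n eigenvalues in the open left half plane
    W1, W2 = vecs[:n, stable], vecs[n:, stable]
    P = np.real(W2 @ np.linalg.inv(W1))          # P = W2 W1^{-1}
    return (P + P.T) / 2                         # symmetrize against round-off

# Hypothetical positive-real system: H(s) = 1 + sum_i 1/(s + a_i), a = (1, 2, 3).
A = -np.diag([1.0, 2.0, 3.0])
B = np.ones((3, 1))
C = B.T
D = np.array([[1.0]])

# Algorithm 3.5.
K = np.linalg.inv(D + D.T)
F = A - B @ K @ C
X, Y = B @ K @ B.T, C.T @ K @ C
P = ric(F, X, Y)

residual = F.T @ P + P @ F + P @ X @ P + Y
print("Riccati residual norm:", np.linalg.norm(residual))
print("eigenvalues of F + XP:", np.linalg.eigvals(F + X @ P))
```

For this example the printed residual should be at round-off level and the eigenvalues of $F + XP$ should lie in the left half plane, which is the defining property of $\mathrm{Ric}(F, X, Y)$.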
ARE solution for singular $D + D^T$

Now we consider a positive-real (PR) solution algorithm for the case of singular $D + D^T$. In Algorithm 3.6, the system (A, B, C, D) is first transformed to a system $(A_I, B_I, C_I, D_I)$ with non-singular $D_I + D_I^T$, to which Algorithm 3.5 is applicable. The P matrix of the original system is then computed from that of $(A_I, B_I, C_I, D_I)$ [103]. The following algorithm computes the P matrix in the positive-real lemma for a positive-real system (A, B, C, D) with $r := \mathrm{rank}(D + D^T) < m$.

Algorithm 3.6: computation of P matrix with singular $D + D^T$

1. Perform the singular value decomposition of $\bar{D} := (D + D^T)/2$:
\[
\bar{D} = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} U^T, \tag{3.69}
\]
and set
\[
R := U \begin{bmatrix} \Sigma^{-1/2} & 0 \\ 0 & I_{(m-r) \times (m-r)} \end{bmatrix}. \tag{3.70}
\]
Then transform $\bar{D}$ to
\[
\hat{D} := R^T \bar{D} R = \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}. \tag{3.71}
\]

2. Partition $B_R := BR$ and $C_R := R^T C$ into
\[
B_R = \begin{bmatrix} B_1 & B_2 \end{bmatrix}, \tag{3.72}
\]
\[
C_R = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}, \tag{3.73}
\]
with $B_1 \in \mathbb{R}^{n \times r}$ and $C_1 \in \mathbb{R}^{r \times n}$. Pick $n - m + r$ linearly independent vectors $v_1, \cdots, v_{n-m+r}$ that span the null space of $C_2$. Set $V := [v_1, \cdots, v_{n-m+r}]$ and $T := [B_2 \; V]$. Note that T is always invertible since $C_2 B_2$ is invertible. Partition $T^{-1}$ into
\[
T^{-1} = \begin{bmatrix} B_2^{\#} \\ V^{\#} \end{bmatrix}, \tag{3.74}
\]
\[
B_2^{\#} = (C_2 B_2)^{-1} C_2. \tag{3.75}
\]

3. Transform $(A, B, C, \bar{D})$ to $(\hat{A}, \hat{B}, \hat{C}, \hat{D})$ by the positive-real preserving transformation (T, R):
\[
\hat{A} := T^{-1} A T = \begin{bmatrix} \hat{A}_{11} & \hat{A}_{12} \\ \hat{A}_{21} & \hat{A}_{22} \end{bmatrix}, \tag{3.76}
\]
\[
\hat{B} := T^{-1} B R = \begin{bmatrix} \hat{B}_{11} & I_{(m-r) \times (m-r)} \\ \hat{B}_{21} & 0 \end{bmatrix}, \tag{3.77}
\]
\[
\hat{C} := C_R T = \begin{bmatrix} \hat{C}_{11} & \hat{C}_{12} \\ \hat{C}_{21} & 0 \end{bmatrix}, \tag{3.78}
\]
\[
\hat{D} := \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}. \tag{3.79}
\]
Go to the next step only if $\hat{C}_{21} = C_2 B_2$ is symmetric and positive definite; otherwise the system in question is not positive real.

4. Determine the state-space realization $(A_I, B_I, C_I, D_I)$ of the partial inverse of $(\hat{A}, \hat{B}, \hat{C}, \hat{D})$ as follows:
\[
A_I := \hat{A}_{22} \in \mathbb{R}^{(n-m+r) \times (n-m+r)}, \tag{3.80}
\]
\[
B_I := \begin{bmatrix} \hat{B}_{21} & \hat{A}_{21} \end{bmatrix}, \tag{3.81}
\]
\[
C_I := \begin{bmatrix} \hat{C}_{12} \\ -\hat{C}_{21} \hat{A}_{12} \end{bmatrix}, \tag{3.82}
\]
\[
D_I := \begin{bmatrix} I_{r \times r} & C_1 B_2 \\ -C_2 B_1 & -C_2 A B_2 \end{bmatrix}. \tag{3.83}
\]
Go to the next step only if $D_I$ is positive definite; otherwise the system in question is not positive real.

5. Use Algorithm 3.5 to find the positive-real matrix, $P_I$, for $(A_I, B_I, C_I, D_I)$, where $D_I + D_I^T$ is non-singular:
Table 3.1 The Hankel singular values for a six-port linear interconnect circuit.

   n   σ            |   n   σ                  |    n   σ
   1   ∞            |  11   0.00074038         |   21   1.8735 × 10^{-9}
   2   32.643       |  12   0.00040137         |   22   7.025 × 10^{-10}
   3   6.7536       |  13   7.6789 × 10^{-5}   |   50   5.2306 × 10^{-19}
   4   2.0489       |  14   3.0178 × 10^{-5}   |  100   3.1665 × 10^{-20}
   5   0.71765      |  15   1.0206 × 10^{-5}   |  150   3.157 × 10^{-21}
   6   0.40582      |  16   2.323 × 10^{-6}    |  200   3.8209 × 10^{-23}
   7   0.10189      |  17   8.8947 × 10^{-7}   |  250   2.5916 × 10^{-27}
   8   0.023015     |  18   1.1729 × 10^{-7}   |  300   8.5601 × 10^{-36}
   9   0.009922     |  19   4.2241 × 10^{-8}   |  400   1.1436 × 10^{-52}
  10   0.0031625    |  20   1.1495 × 10^{-8}   |  516   3.2594 × 10^{-89}
\[
P_I = \mathrm{Ric}(F_I, B_I K_I B_I^T, C_I^T K_I C_I), \tag{3.84}
\]
where $K_I := (D_I + D_I^T)^{-1}$ and $F_I := A_I - B_I K_I C_I$.

6. Compute P according to
\[
P := V^{\#T} P_I V^{\#} + B_2^{\#T} \hat{C}_{21} B_2^{\#}. \tag{3.85}
\]

If matrix D = 0, then all the submatrices in the above algorithm with r rows or r columns are removed from their parent matrices; e.g., $B_R = [B_1 \; B_2]$ with $B_1 \in \mathbb{R}^{n \times r}$ becomes $B_R = B = B_2$ when r = 0.
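Step 1 of Algorithm 3.6 is easy to check numerically. The sketch below assumes that $\bar{D}$ is symmetric positive semidefinite (as it is for a positive-real system), so that its SVD coincides with its eigendecomposition; the function name `scaling_transform`, the rank tolerance, and the 2 × 2 example are illustrative assumptions, not taken from the text.

```python
import numpy as np

def scaling_transform(D, tol=1e-12):
    """Build R of (3.70) so that R^T Dbar R = diag(I_r, 0), Dbar = (D + D^T)/2.
    Assumes Dbar is symmetric positive semidefinite."""
    Dbar = (D + D.T) / 2.0
    w, U = np.linalg.eigh(Dbar)          # eigenvalues in ascending order
    order = np.argsort(w)[::-1]          # reorder to descending, as in an SVD
    w, U = w[order], U[:, order]
    r = int(np.sum(w > tol * max(w[0], 1.0)))   # numerical rank of Dbar
    scale = np.ones_like(w)
    scale[:r] = 1.0 / np.sqrt(w[:r])     # Sigma^{-1/2} on the first r columns
    R = U * scale                        # same as U @ diag(scale)
    return R, r

# Hypothetical example with a singular symmetric part: Dbar = diag(2, 0).
D = np.array([[2.0, 1.0], [-1.0, 0.0]])
R, r = scaling_transform(D)
D_hat = R.T @ ((D + D.T) / 2.0) @ R      # expected to equal diag(1, 0), cf. (3.71)
```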
3.10 Numerical examples

In this section, we present some numerical results of TBR-based reduction of a six-port interconnect circuit from a practical industry chip. The RLC interconnect circuit has 516 states, one of which is unstable (an unstable pole). Its stable part is therefore isolated and balanced, and the entries corresponding to unstable modes are set to ∞. After balancing the system, we list the Hankel singular values in descending order in Table 3.1, where n is the singular value index and σ is the singular value. From Table 3.1, we can observe that the Hankel singular values of the system decay extremely rapidly. As a matter of fact, many RLC interconnect circuits have this property. Hence, very low rank approximations are possible and accurate low-order reduced models can be obtained. After the truncation, the resulting Gramians contain the largest k singular values $\sigma_1$ through $\sigma_k$ (k = 1, 2, 3, 4, 5, respectively). The approximation results with different numbers of singular values (states) are shown in Fig. 3.1.

[Figure 3.1 Frequency responses of a reduced model and its original system. Panels: magnitude (dB) and phase (deg) versus frequency (rad/s); curves: original and reduced models with 2, 3, 4, 5, and 6 states.]

From Fig. 3.1, we observe, first, that the more states we retain, the better the curves match. Second, even a reduced model that retains only a small number of dominant states (six in this case) can accurately approximate the original system with a large number of states (516 in this case). Next, we use the ten largest states to approximate the original system. The frequency response of the input impedance for port 1 is shown in Fig. 3.2. We can see that the two responses are almost the same up to $10^{16}$ Hz.
3.11 Summary

In this chapter, we have introduced truncated balanced realization methods for model order reduction of general linear time-invariant dynamic systems. The TBR method is a singular-value-decomposition-based approach. In the TBR methods, the system is first transformed into an eigenspace of the balanced controllability and observability Gramians. Then SVD is performed and states with weak controllability and observability are dropped. We also discussed positive-real TBR methods, which can produce passive models for passive circuits, but at the expense of high computation costs. To avoid the high computation costs associated with solving Lyapunov equations for classic TBR methods, an empirical TBR method was also presented, where Gramians are computed by sampling the frequency-domain state impulse responses. Finally, we discussed some numerical issues associated with implementations of TBR methods and showed how those issues are resolved when solving Lyapunov equations and algebraic Riccati equations.

[Figure 3.2 Frequency response of the input impedance of a reduced model and its original system. Panels: magnitude (dB) and phase (deg) versus frequency (rad/s); curves: original and reduced.]
4 Passive balanced truncation of linear systems in descriptor form
4.1 Introduction

In Chapter 3, we introduced the truncated balanced realization (TBR) method for compact modeling of interconnect circuits. In this chapter, we introduce a new passive TBR method, which only requires the solving of generalized Lyapunov equations and is able to preserve the structure of the reduced models just like the structure-preserving Krylov subspace method of [37].

In the TBR method, two steps are involved in the reduction process. The balancing step aligns the states such that they can be controlled and observed equally. The truncating step then discards the weak states, which usually leads to a much smaller model. The major advantage of TBR methods is that they give a deterministic global bound for the approximation error and nearly optimal models in terms of error and model size [60]. Standard TBR algorithms, which solve Lyapunov equations (linear matrix equations), do not necessarily preserve passivity. To ensure passivity, positive-real TBR (PR-TBR) has to be carried out [88, 131] by solving the more difficult Lur’e or Riccati equations, which can be computationally prohibitive as they are non-linear (quadratic) matrix equations.

Consider a state-space model in descriptor form
\[
E \frac{dx}{dt} = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t), \tag{4.1}
\]
where $E, A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times p}$, and $y(t), u(t) \in \mathbb{R}^p$. When E = I, (4.1) is in standard state-space form. Note that the descriptor form is the natural form of the circuit MNA matrices for interconnect circuits, where E is the matrix of storage elements, A is the matrix of conductances, $B = C^T$ are the input and output position matrices, and D = 0.

The existing passive TBR methods first convert the original descriptor system into standard state-space equations by mapping $E \to I$, $A \to E^{-1}A$, and $B \to E^{-1}B$, and then solve two Lur’e or Riccati equations to guarantee the passivity of the reduced model. However, there are several issues related to PR-TBR. First, it is not numerically reliable: given an ill-conditioned E, the mapping can generate too much numerical error, and sometimes even the stability of the system cannot be guaranteed in this process. Second, Lur’e and Riccati equations are quadratic matrix equations and thus usually more expensive to solve than Lyapunov equations, which are linear matrix equations. Third, the structural properties (symmetry, sparsity) inherent to RLC circuits cannot be preserved after the reduction. Fourth, in most cases, PR-TBR is not as accurate as standard TBR [88].

In this chapter, we present a novel passivity-preserving TBR method, named PriTBR, for interconnect modeling. Instead of working on the standard state-space equations, PriTBR works directly on state-space equations in descriptor form by solving generalized Lyapunov equations. Since the circuit matrix structure is preserved in the state-space equations in descriptor form, congruence transformation can be applied to ensure the passivity of reduced models. Compared with existing PR-TBR, which solves the more difficult Lur’e or Riccati equations on standard state-space equations, the new method is numerically reliable, less expensive, and more accurate. The new method has the same accuracy as the standard TBR method for RC circuits using the first-order formulation and can easily be extended to RLCM circuits to preserve the structure information of RLC circuit matrices using the second-order formulation. By combining it with structure-preserving Krylov-subspace-based MOR methods, it can generate optimal structure-preserved reduced models from large-scale circuits.
4.2 The passive balanced truncation algorithm: PriTBR

The new approach is motivated by recent balanced truncation work on linear systems in descriptor form [119], as shown in (4.1). That existing method, however, does not give passive reduction. In the PriTBR method, we work directly on systems in descriptor form, which is the natural form of RLC circuits in MNA formulation. Instead of obtaining a balanced form and truncating, we compute a basis that spans the dominant subspace corresponding to the q largest Hankel singular values. After that, we orthonormalize it and project the system onto it so that the reduction process can be viewed as a congruence transformation. The difference between the new method and the standard TBR method is just like the difference between PRIMA [85] and Padé approximation via Lanczos (PVL) [32].
4.2.1 Generalized balanced truncation by projection

Given a state-space model in descriptor form in (4.1) with a stable matrix pencil $\lambda E - A$, which is usually the case in an RLC circuit, we first assume that E is non-singular. This restriction can easily be relaxed with some additional steps [119]. If the system is in descriptor form, the controllability and observability Gramians can be computed by solving the generalized Lyapunov equations [119]
\[
E P A^T + A P E^T = -B B^T, \qquad E^T Q A + A^T Q E = -C^T C. \tag{4.2}
\]
The matrix $P E^T Q E$ has non-negative eigenvalues, and the square roots of these eigenvalues, $\sigma_j = \sqrt{\lambda_j(P E^T Q E)}$, define the Hankel singular values of the system. We assume that the $\sigma_j$ are ordered decreasingly. The system is called balanced if
\[
P = Q = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n). \tag{4.3}
\]
Here $P E^T Q E$ plays the role of $W_c W_o$ in the standard TBR method. After solving the generalized Lyapunov equations, one can compute the similarity transformation matrix T in (3.26) based on the square-root method [68]. The subsequent steps in [119] are the same as in the TBR method in standard form: first balance the system and then truncate it. Here, however, instead of obtaining a balanced form and truncating, we perform the reduction from the viewpoint of projection. As in standard TBR, the similarity transformation matrix T is the eigenmatrix of $P E^T Q E$:
\[
T P E^T Q E T^{-1} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2). \tag{4.4}
\]
Then, the balancing and truncation can be viewed as a special projection of the system onto the dominant eigenspace of the matrix $P E^T Q E$ corresponding to the q largest eigenvalues. Those eigenvalues are the most important in terms of the input–output behavior of the system. The dominant subspace is spanned by the first q columns of matrix T. Partition T into
\[
T = \begin{bmatrix} T_1 & T_2 \end{bmatrix}, \tag{4.5}
\]
where $T_1 \in \mathbb{R}^{n \times q}$ and $T_2 \in \mathbb{R}^{n \times (n-q)}$. Then the dominant subspace is spanned by the basis $T_1$. Orthonormalize the basis,
\[
X = \mathrm{orth}(T_1), \tag{4.6}
\]
where $X \in \mathbb{R}^{n \times q}$, $X^T X = I$, and X spans the same subspace as $T_1$:
\[
\mathrm{span}(X) = \mathrm{span}(T_1). \tag{4.7}
\]
After that, we perform the reduction directly by projecting the system onto the subspace spanned by X,
\[
x = X \tilde{x}, \tag{4.8}
\]
where $\tilde{x} \in \mathbb{R}^q$ is the reduced system variable. The reduced-order system matrices are
\[
\tilde{A} = X^T A X, \quad \tilde{E} = X^T E X, \quad \tilde{B} = X^T B, \quad \tilde{C} = C X. \tag{4.9}
\]
These types of transformation are known as congruence transformations. In the following, we review congruence transformation and its passivity-preserving property.
4.2.2 Passive reduction through congruence transformation

For RLC interconnect circuits, we can formulate the original circuit matrices with the MNA formulation into the so-called passive form [85]:
\[
C \frac{dx}{dt} = -G x(t) + B u(t), \qquad y(t) = B^T x(t), \tag{4.10}
\]
such that the conductance matrix and the storage element matrix are positive semidefinite (i.e., $G \geq 0$ and $C \geq 0$) and the input and output position matrices are the same. Notice that this passive form is also the descriptor form in (4.1). Then the transfer function of the model is positive real, meaning that the model is passive. Applying the projection X to (4.10) gives the reduced matrices
\[
\tilde{C} = X^T C X, \quad \tilde{G} = X^T G X, \quad \tilde{B} = X^T B. \tag{4.11}
\]
Since congruence transformation preserves the definiteness of a matrix, the reduced $\tilde{G}$ and $\tilde{C}$ are still positive semidefinite. Then the transfer function of the reduced model is positive real, and thus passive. Therefore, PriTBR preserves passivity for general RLC circuits.

For large systems, direct application of balanced truncation is computationally infeasible. Therefore, the methods are of more interest when combined with iterative Krylov-subspace procedures like PRIMA. Since an initial reduced model obtained via PRIMA still has this convenient passive form, PriTBR can always be used as the second stage of a composite model reduction procedure to generate a passivity-preserved reduced model.
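The claim that congruence transformation preserves semidefiniteness follows in one line. As a short worked step (using only the symbols of (4.11), with z an arbitrary vector in $\mathbb{R}^q$ introduced here for the argument):

```latex
\[
z^T \tilde{G} z \;=\; z^T X^T G X z \;=\; (Xz)^T G \,(Xz) \;\geq\; 0
\quad \text{for all } z \in \mathbb{R}^q ,
\]
\[
z^T \tilde{C} z \;=\; (Xz)^T C \,(Xz) \;\geq\; 0 ,
\]
% because G >= 0 and C >= 0 hold for every vector, in particular for w = Xz.
```

The argument uses no property of X beyond its dimensions, which is why any real projection matrix applied on both sides, as in (4.11), keeps the passive form.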
4.2.3 Comparison with PR-TBR

Compared with the passive PR-TBR method, this process has the following advantages. First, we do not need to put a descriptor system into standard form by mapping $E \to I$, $A \to E^{-1}A$, $B \to E^{-1}B$. After an initial projection MOR step (assuming a Krylov-subspace method is performed as a first stage), E is usually non-singular [88] but may be ill-conditioned. As a result, the result produced by PR-TBR is no longer accurate from the numerical point of view. Even worse, an unstable system can sometimes be generated after the mapping, so that the Lur’e equations do not have positive semidefinite solutions. Second, the Lur’e equation is a quadratic matrix equation, which is more expensive to solve than linear matrix equations like the Lyapunov equation. The generalized Lyapunov equation is still a linear matrix equation; when E is non-singular, it costs almost the same as the Lyapunov equation [119]. Third, PriTBR has the same accuracy as standard TBR, whereas PR-TBR is usually not as accurate as standard TBR [88]. Fourth, as shown in the next section, PriTBR can be generalized to preserve structure and reciprocity. However, like PRIMA, it cannot be used on systems of non-RLC circuits, because both of them rely on congruence transformations to preserve passivity.
4.2.4 PriTBR reduction algorithm

In this section, we present the complete PriTBR reduction flow. On top of PriTBR, we also present the combined PriTBR and PRIMA flow.

Algorithm 4.1: projection-based passive TBR (PriTBR)
1. Solve $E P A^T + A P E^T = -B B^T$ for P.
2. Solve $E^T Q A + A^T Q E = -C^T C$ for Q.
3. Compute Cholesky factors $P = L_p L_p^T$, $Q = L_Q L_Q^T$.
4. Compute the SVD of $L_p^T E^T L_Q$:
\[
L_p^T E^T L_Q = [U_1, U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} [V_1, V_2]^T. \tag{4.12}
\]
5. Compute the dominant basis $T = L_p U_1 \Sigma_1^{-1/2}$.
6. Orthonormalize T: $X = \mathrm{orth}(T)$.
7. Compute the reduced system with $\tilde{E} = X^T E X$; $\tilde{A} = X^T A X$; $\tilde{B} = X^T B$; $\tilde{C} = C X$.

The basic algorithm is a generalization of the square-root method used in standard balanced truncation [68]. The new method can also be combined with a projection-based MOR method like PRIMA to deal with large-scale circuits. This is similar to the method in [60].

Algorithm 4.2: combined PRIMA and PriTBR model order reduction
1. Perform PRIMA to obtain a small-size passivity-preserved initial reduced model from a large-size original model.
2. Perform PriTBR to obtain an optimal passivity-preserved final reduced model from the initial model.
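The seven steps of Algorithm 4.1 can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not part of the text: the generalized Lyapunov equations are solved with a dense Kronecker-product system (feasible only for very small n), the Gramian factors are taken from a symmetric eigendecomposition instead of a Cholesky factorization (any factor with $P = L L^T$ serves the same purpose), and the function names and the four-node RC ladder are hypothetical.

```python
import numpy as np

def gen_lyap_gramians(E, A, B, C):
    """Solve E P A^T + A P E^T = -B B^T and E^T Q A + A^T Q E = -C^T C
    by dense vectorization (row-major vec); only sensible for small n."""
    n = E.shape[0]
    K = np.kron(E, A) + np.kron(A, E)
    P = np.linalg.solve(K, -(B @ B.T).reshape(-1)).reshape(n, n)
    Q = np.linalg.solve(K.T, -(C.T @ C).reshape(-1)).reshape(n, n)
    return (P + P.T) / 2, (Q + Q.T) / 2   # symmetrize against rounding

def psd_factor(M):
    """Return L with M = L L^T for symmetric positive semidefinite M."""
    w, V = np.linalg.eigh(M)
    return V * np.sqrt(np.clip(w, 0.0, None))

def pritbr(E, A, B, C, q):
    """Sketch of Algorithm 4.1; returns reduced matrices, X, and the HSVs."""
    P, Q = gen_lyap_gramians(E, A, B, C)          # steps 1-2
    Lp, Lq = psd_factor(P), psd_factor(Q)         # step 3 (eigen-factor variant)
    U, s, Vt = np.linalg.svd(Lp.T @ E.T @ Lq)     # step 4
    T1 = Lp @ U[:, :q] / np.sqrt(s[:q])           # step 5
    X, _ = np.linalg.qr(T1)                       # step 6: X = orth(T1)
    return X.T @ E @ X, X.T @ A @ X, X.T @ B, C @ X, X, s   # step 7

# Hypothetical 4-node RC ladder in passive form: E = C_mat, A = -G, C = B^T.
G = 2.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
E = np.diag([1.0, 2.0, 1.0, 2.0])
A = -G
B = np.array([[1.0], [0.0], [0.0], [0.0]])
C = B.T
Er, Ar, Br, Cr, X, hsv = pritbr(E, A, B, C, q=2)
```

Because the reduction is a congruence transformation with X, the reduced `Er` and `-Ar` stay symmetric positive definite for this example, so the reduced pencil is stable regardless of how accurately the Gramians were computed.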
4.3 Structure-preserved balanced truncation

While PR-TBR generates provably passive reduced-order models, it does not preserve other structures, such as reciprocity (symmetry) or the block structures of the circuit matrices, inherent to RLC circuits. However, the PriTBR method of the previous section can easily be extended such that both passivity and structure are preserved in a balanced truncation process.

4.3.1 Structure-preserved balanced truncation

Similar to SPRIM [37], we assume that only current sources are applied and that the transfer function is an impedance matrix function,
\[
Z(s) = B^T (G + sC)^{-1} B. \tag{4.13}
\]
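In the modified nodal analysis (MNA) formulation of an RLC circuit, G, C, and B have the block structure
\[
G = \begin{bmatrix} G_1 & G_2^T \\ -G_2 & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix}, \quad
B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}. \tag{4.14}
\]
Let V be the matrix $T_1$ used in PriTBR, and let
\[
V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \tag{4.15}
\]
be the partitioning of V corresponding to the block sizes of G and C. We then split the projection matrix V and orthonormalize each block separately:
\[
\tilde{V} = \begin{bmatrix} \mathrm{orth}(V_1) & 0 \\ 0 & \mathrm{orth}(V_2) \end{bmatrix}. \tag{4.16}
\]
Notice that
\[
\mathrm{span}(V) \subseteq \mathrm{span}(\tilde{V}). \tag{4.17}
\]
We can therefore project onto $\tilde{V}$, which spans at least the same dominant subspace as V. As shown in experiments, the result has the good properties of standard balanced truncation and matches the original model well globally. At first glance, SP-PriTBR is not optimal in model size compared with PriTBR because, given the same error bound, the SP-PriTBR model would be twice as large as the corresponding PriTBR model. However, the SP-PriTBR model can always be represented in second-order form. In this sense, the SP-PriTBR model (when written in second-order form) has the same dimension as the PriTBR model for the same error bound. With $V_1$ and $V_2$ now denoting the orthonormalized blocks of $\tilde{V}$, the reduced blocks are
\[
\tilde{G}_1 = V_1^T G_1 V_1, \quad \tilde{G}_2 = V_2^T G_2 V_1, \quad
\tilde{C}_1 = V_1^T C_1 V_1, \quad \tilde{C}_2 = V_2^T C_2 V_2, \quad
\tilde{B}_1 = V_1^T B_1, \tag{4.18}
\]
and
\[
\tilde{G} = \begin{bmatrix} \tilde{G}_1 & \tilde{G}_2^T \\ -\tilde{G}_2 & 0 \end{bmatrix}, \quad
\tilde{C} = \begin{bmatrix} \tilde{C}_1 & 0 \\ 0 & \tilde{C}_2 \end{bmatrix}, \quad
\tilde{B} = \begin{bmatrix} \tilde{B}_1 \\ 0 \end{bmatrix}. \tag{4.19}
\]
The reduced model $\tilde{Z}(s)$ in first-order form is
\[
\tilde{Z}(s) = \tilde{B}^T (\tilde{G} + s\tilde{C})^{-1} \tilde{B}, \tag{4.20}
\]
and in second-order form,
\[
\tilde{Z}(s) = \tilde{B}_1^T \left( s\tilde{C}_1 + \tilde{G}_1 + \frac{1}{s} \tilde{G}_2^T \tilde{C}_2^{-1} \tilde{G}_2 \right)^{-1} \tilde{B}_1, \tag{4.21}
\]
and
\[
\tilde{C}_1 \succeq 0, \quad \tilde{G}_1 \succeq 0, \quad \tilde{G}_2^T \tilde{C}_2^{-1} \tilde{G}_2 \succeq 0, \tag{4.22}
\]
and thus $\tilde{Z}(s)$ is passive and also symmetric. This means that the reduced model preserves reciprocity and can be more easily synthesized as an actual circuit.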
4.3.2 Structure-preserving PriTBR algorithm

Algorithm 4.3: structure-preserved PriTBR (SP-PriTBR)
1. Perform Algorithm 4.1 (step 1 to step 5) to obtain $V = T_1$.
2. Partition V corresponding to the block sizes of G and C,
\[
V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}. \tag{4.23}
\]
3. Set
\[
\tilde{V} = \begin{bmatrix} \mathrm{orth}(V_1) & 0 \\ 0 & \mathrm{orth}(V_2) \end{bmatrix}. \tag{4.24}
\]
4. Obtain the reduced model by projection:
\[
\tilde{G} = \tilde{V}^T G \tilde{V}, \quad \tilde{C} = \tilde{V}^T C \tilde{V}, \quad \tilde{B} = \tilde{V}^T B. \tag{4.25}
\]

The SP-PriTBR algorithm can also work together with a structure-preserving projection-based method like SPRIM to produce a structure-preserved compact model from large-scale circuits, as shown in the following algorithm.

Algorithm 4.4: combined SPRIM and SP-PriTBR model order reduction
1. Perform SPRIM to obtain a small-size structure-preserved initial reduced model from a large-size original model.
2. Perform SP-PriTBR to get an optimal structure-preserved final reduced model from the first step.
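Steps 2 to 4 of Algorithm 4.3 can be illustrated with a small NumPy sketch. The block sizes, the random test matrices, and the helper names `orth` and `split_projection` are assumptions made for this illustration; V is a random stand-in for the dominant basis $T_1$, not the output of an actual PriTBR run.

```python
import numpy as np

def orth(V):
    """Orthonormal basis whose span contains the columns of V (thin QR)."""
    Qm, _ = np.linalg.qr(V)
    return Qm

def split_projection(V, n1):
    """Build the block-diagonal projector of (4.24) from V = [V1; V2]."""
    V1, V2 = orth(V[:n1, :]), orth(V[n1:, :])
    n2 = V.shape[0] - n1
    return np.block([[V1, np.zeros((n1, V2.shape[1]))],
                     [np.zeros((n2, V1.shape[1])), V2]])

# Hypothetical RLC-like blocks: 3 node voltages, 2 inductor currents.
rng = np.random.default_rng(0)
n1, n2, q = 3, 2, 2
M1 = rng.standard_normal((n1, n1))
G1 = M1 @ M1.T + np.eye(n1)                  # symmetric positive definite
G2 = rng.standard_normal((n2, n1))
G = np.block([[G1, G2.T], [-G2, np.zeros((n2, n2))]])
Cmat = np.diag([1.0, 2.0, 3.0, 0.5, 0.7])    # block-diagonal storage matrix
B = np.vstack([rng.standard_normal((n1, 1)), np.zeros((n2, 1))])

V = rng.standard_normal((n1 + n2, q))        # stand-in for the basis T1
Vt = split_projection(V, n1)
Gr, Cr, Br = Vt.T @ G @ Vt, Vt.T @ Cmat @ Vt, Vt.T @ B
k1 = q                                       # columns coming from orth(V1)
```

In this sketch `Gr` keeps the zero lower-right block and the skew off-diagonal coupling of (4.19), and `Br` keeps its zero lower block, at the price of a projector with more columns than V.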
4.4 Numerical examples

In this section, we show examples that illustrate the effectiveness of the proposed PriTBR methods and compare them with existing relevant approaches.

4.4.1 The accuracy of PriTBR

First, we demonstrate empirically that PriTBR has the same accuracy as standard balanced truncation for RC circuits. Compared with PRIMA, both of them are optimal in the sense that, for a given reduced order, they are more accurate. Here, the original model is a 515-order RC circuit stimulated by a current source. In Figure 4.1, for the same reduced order of ten, both PriTBR and standard balanced truncation match the original curve equally well, and far better than PRIMA of the same reduced order.
[Figure 4.1 Frequency responses of TBR, PriTBR, and PRIMA reduced models and the original circuit. Magnitude (dB) versus frequency (rad/s); curves: Full (516), PRIMA (10), PriTBR (10), TBR (10).]
4.4.2 The guaranteed passivity of PriTBR

The second example is an RLC transmission line (order 904) with voltage source input. The initial reduction is done by PRIMA and then followed by TBR and PriTBR, respectively. The final reduced order is 12. In Figure 4.2, the Nyquist plot of the driving-point impedance is shown. This Nyquist plot contains both the magnitude and phase information about the network impedance. It also provides a graphical test of port passivity. Indeed, it is well known that the Nyquist plots of positive-real transfer functions lie entirely in the right half of the complex plane. At first glance, it seems that the reduced models are both passive. But when zooming in, we find that, for PriTBR, the entire Nyquist plot lies in the right half of the complex plane. However, for standard balanced truncation, the Nyquist plot extends to the left half of the complex plane, which means that passivity is not preserved in the reduced model.
[Figure 4.2 Nyquist plots of the TBR reduced model and the PriTBR reduced model. Panels: TBR, TBR (zoom in), PriTBR, PriTBR (zoom in); imaginary axis versus real axis.]

4.4.3 The numerical stability and accuracy of reduced model by PriTBR

We use an RC circuit of order 517 with a voltage source input to demonstrate the advantage of PriTBR. We perform an initial reduction by PRIMA and obtain a reduced model of order 75, which matches the original curve very well. First, if we want to use PR-TBR for the second stage, we need to map $E \to I$, $A \to E^{-1}A$, $B \to E^{-1}B$ to put the descriptor system into standard form. We find that after the mapping, the standard form is no longer stable, and thus no positive semidefinite solution is available for the subsequent Lur’e equations. Figure 4.3 is the pole-zero map of the system around the origin before and after mapping. We can see that a positive pole is generated after the mapping. Then, we employ PriTBR as the second reduction stage and get a final reduced model of order 15, which is still indistinguishable from the original curve. However, if we use PRIMA to get a reduced model of order 15 directly, the difference is very obvious.
4.4.4 Comparison of SPRIM and SP-PriTBR

We use an RLC circuit of order 302 with current source input to compare SPRIM and SP-PriTBR at the same reduced order of ten. The structure inherent to the RLC circuit is preserved by both of them. We find that SP-PriTBR inherits the optimal property of standard balanced truncation and matches the original curve better than SPRIM over a wide frequency band.
[Figure 4.3 Pole zero map of system before mapping. Two panels showing poles and zeros near the origin; both axes scaled by 10^12.]

[Figure 4.4 Frequency responses of PRIMA and combined PRIMA and PriTBR reduced models and the original circuit. Impedance versus frequency (rad/s).]

4.5 Summary

In this chapter, we presented a novel passive projection-based balanced truncation model reduction method, named PriTBR. The new method combines the TBR method with a projection framework to produce passive models for the first time. It enjoys the good error bounds of the TBR method and the passive-reduction benefit of congruence transformation. Compared with existing passive TBR, the new technique is numerically reliable, more accurate, and less expensive. In addition to passivity preservation, we also showed that PriTBR can be extended to preserve structure information such as reciprocity and block structure. It can be applied as a second stage of model order reduction, working with Krylov subspace methods to generate a nearly optimal reduced model from a large-scale interconnect circuit while passivity, structure, and reciprocity are preserved at the same time. Experimental results demonstrated the effectiveness of the new method and its advantage over existing passive TBR and projection-based MOR methods.
[Figure 4.5 Frequency responses of SPRIM and SP-PriTBR reduced models and the original circuit. Absolute error versus frequency (rad/s).]
5 Passive hierarchical model order reduction
In this chapter, we focus on passive wideband modeling of RLCM circuits. We propose a new passive wideband reduction and realization framework for general passive high-order RLCM circuits. Our method starts with large RLCM circuits, which are extracted by existing geometry extraction tools like FastCap [83] and FastHenry [59] under some relaxation conditions of the full-wave Maxwell equations (like electro-quasi-static for FastCap or magneto-quasi-static for FastHenry) instead of measured or simulated data. Our ultimate goal is to obtain compact models directly from complex interconnect geometry without measurement or full-wave simulations.

The method presented in this chapter is called hierarchical model order reduction, HMOR. It is based on the general frequency-domain hierarchical model reduction algorithm [121, 122, 124] and an improved VPEC (vector potential equivalent circuit) [134] model for self and mutual inductance, which can be easily sparsified and is hierarchical-reduction friendly. The HMOR method achieves passive wideband modeling of RLC circuits via multi-point expansion and the convex-programming-based passivity enforcement method.

In this chapter, we will show that the frequency-domain hierarchical reduction is equivalent to implicit moment-matching around s = 0, and that the existing hierarchical reduction method by one-point expansion [121, 124] is numerically stable for general tree-structured circuits. We also show that HMOR preserves reciprocity of passive circuit matrices. We present a hierarchical multi-point reduction scheme to obtain accurate order-reduced admittance matrices of general passive circuits. An explicit waveform matching algorithm is applied for searching dominant poles and residues from different expansion points based on the unique hierarchical reduction framework.

To enforce passivity, the state-space-based convex programming optimization technique [21], which is presented in Chapter 10, is applied to the model order reduced admittance matrix. To realize the passivity-enforced admittance, we use a general, reciprocity-preserving, passivity-preserving, multi-port network realization method based on a relaxed one-port network synthesis technique using Foster’s canonical form in an error-free manner, which is presented in Chapter 11. The resulting modeling algorithm can take in general RLCM SPICE netlists and generate SPICE netlists of passive multi-port models for any linear passive network with easily controlled model accuracy and complexity.

5.1 Overview of hierarchical MOR algorithm

In this section, we first briefly review hierarchical subcircuit reduction and the one-point frequency-domain hierarchical model order reduction method [122] before we present the multi-point hierarchical model order reduction method, HMOR. Then we give the flow of the hierarchical model order reduction and realization method.
5.1.1 Review of hierarchical subcircuit reduction

[Figure 5.1 A hierarchical circuit. Labels: four-boundary-node subcircuit with internal (I) and boundary (B) variables; the rest of the circuit (R). Reprinted with permission from [126] (c) 2000 IEEE.]

Consider a subcircuit with some internal structures and terminals, as illustrated in Figure 5.1. The circuit unknowns (the node-voltage variables and branch-current variables) can be partitioned into three disjoint groups $x^I$, $x^B$, and $x^R$, where the superscripts I, B, R stand for, respectively, internal variables, boundary variables, and the remaining variables. Internal variables are those local to the subcircuit; boundary variables are those related to both the subcircuit and the rest of the circuit. Note that boundary variables include those variables required as the circuit inputs and outputs. With this, the system-equation set $Ax = b$ can be rewritten in the following form:
\[
\begin{bmatrix} A^{II} & A^{IB} & 0 \\ A^{BI} & A^{BB} & A^{BR} \\ 0 & A^{RB} & A^{RR} \end{bmatrix}
\begin{bmatrix} x^I \\ x^B \\ x^R \end{bmatrix}
=
\begin{bmatrix} b^I \\ b^B \\ b^R \end{bmatrix}. \tag{5.1}
\]
The matrix $A^{II}$ is the internal matrix associated with the internal variable vector $x^I$. Assume that the row indices for $A^{II}$ are from 1 to m, the row indices for $A^{BB}$ are from m + 1 to m + l, and the row indices for $A^{RR}$ are from m + l + 1 to n, where m and l are the sizes of the submatrices $A^{II}$ and $A^{BB}$, respectively.

Subcircuit suppression (also called Schur decomposition) is used to eliminate all the variables in $x^I$, and to transform (5.1) into the following reduced set of equations:
\[
\begin{bmatrix} A^{BB*} & A^{BR} \\ A^{RB} & A^{RR} \end{bmatrix}
\begin{bmatrix} x^B \\ x^R \end{bmatrix}
=
\begin{bmatrix} b^{B*} \\ b^R \end{bmatrix}, \tag{5.2}
\]
where
\[
A^{BB*} = A^{BB} - A^{BI} (A^{II})^{-1} A^{IB} \tag{5.3}
\]
\[
\phantom{A^{BB*}} = A^{BB} - \frac{1}{\det(A^{II})} A^{BI} [\Delta^{II}_{u,v}]^T A^{IB}, \tag{5.4}
\]
where $[\Delta_{u,v}]^T$ is called the adjoint matrix of A, and $\Delta_{u,v}$ is the first-order cofactor of det(A) with respect to $a_{u,v}$, defined as $\Delta_{u,v} = (-1)^{u+v} \det(A_{a_{u,v}})$, where the matrix $A_{a_{u,v}}$ is the $(n-1) \times (n-1)$ matrix obtained from matrix A by deleting row u and column v. $A^{BB*}$ is also called the Schur complement [40]. We also have
\[
b^{B*} = b^B - A^{BI} (A^{II})^{-1} b^I \tag{5.5}
\]
\[
\phantom{b^{B*}} = b^B - \frac{1}{\det(A^{II})} A^{BI} [\Delta^{II}_{u,v}]^T b^I. \tag{5.6}
\]
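The suppression step in (5.2), (5.3), and (5.5) can be checked numerically at a single frequency point. The sketch below is only an illustration under assumed data: the matrix is a random, strictly diagonally dominant stand-in for a circuit matrix evaluated at one value of s, the function name `suppress_subcircuit` is hypothetical, and the symbolic rational-function machinery of the chapter is not modeled.

```python
import numpy as np

def suppress_subcircuit(A, b, m, l):
    """Eliminate the first m (internal) unknowns of A x = b, assuming the
    block pattern of (5.1): no direct coupling between internal and
    remaining variables. Returns the reduced system of (5.2)."""
    AII, AIB = A[:m, :m], A[:m, m:m + l]
    ABI, ABB = A[m:m + l, :m], A[m:m + l, m:m + l]
    # Solve with A^II once for both A^IB and b^I instead of forming the inverse.
    Y = np.linalg.solve(AII, np.column_stack([AIB, b[:m]]))
    A_red = A[m:, m:].copy()
    b_red = b[m:].copy()
    A_red[:l, :l] = ABB - ABI @ Y[:, :l]       # Schur complement, (5.3)
    b_red[:l] = b[m:m + l] - ABI @ Y[:, l]     # updated right-hand side, (5.5)
    return A_red, b_red

# Hypothetical 6-unknown system: 2 internal, 2 boundary, 2 remaining.
rng = np.random.default_rng(1)
n, m, l = 6, 2, 2
A = rng.uniform(-1.0, 1.0, (n, n)) + 8.0 * np.eye(n)   # diagonally dominant
A[:m, m + l:] = 0.0                                     # A^{IR} = 0
A[m + l:, :m] = 0.0                                     # A^{RI} = 0
b = rng.uniform(-1.0, 1.0, n)

A_red, b_red = suppress_subcircuit(A, b, m, l)
x_full = np.linalg.solve(A, b)
x_red = np.linalg.solve(A_red, b_red)   # should reproduce x^B and x^R
```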
Subcircuit suppression can be performed for all the subcircuits by visiting the circuit hierarchy in a bottom-up fashion. Since the number of internal variables is m, and the number of boundary variables is l, (5.4) and (5.6) can be written in the following expanded forms:
\[
a^{BB*}_{u,v}(s) = a^{BB}_{u,v}(s) - \frac{1}{\det(A^{II}(s))} \sum_{k_1,k_2=1}^{m} a^{BI}_{u,k_1}(s)\, \Delta^{II}_{k_2,k_1}(s)\, a^{IB}_{k_2,v}(s), \tag{5.7}
\]
where $u, v = m + 1, \ldots, m + l$, and
\[
b^{B*}_{u}(s) = b^{B}_{u}(s) - \frac{1}{\det(A^{II}(s))} \sum_{k_1,k_2=1}^{m} a^{BI}_{u,k_1}(s)\, \Delta^{II}_{k_2,k_1}(s)\, b^{I}_{k_2}(s), \tag{5.8}
\]
where $u = m + 1, \ldots, m + l$. From (5.7) and (5.8), we can observe that the admittance $a^{BB*}_{u,v}$ and the input stimulus $b^{B*}_{u}$ at boundary nodes become rational functions of s once the subcircuit is suppressed. The term
\[
\frac{a^{BI}_{u,k_1}(s)\, \Delta^{II}_{k_2,k_1}(s)\, a^{IB}_{k_2,v}(s)}{\det(A^{II})} \tag{5.9}
\]
is called a composite admittance, as it is an admittance generated during subcircuit reduction.

To obtain the rational functions for $a^{BB*}_{u,v}(s)$ and $b^{B*}_{u}(s)$, we need to compute the rational function for a determinant whose elements may again be rational functions of s. This task can be achieved using determinant decision diagrams (DDDs), which were originally proposed for symbolic analysis of large analog circuits [97, 110, 126] and will be discussed in Section 5.2. We will show how the DDD graphs can be modified to compute the rational function for a determinant in later sections.
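A small worked instance of (5.7) shows how a rational admittance arises. The circuit here is an assumption chosen for illustration: one internal node (m = 1) with a grounded capacitor $c$, connected to two boundary nodes through conductances $g_1$ and $g_2$, and only these elements are stamped.

```latex
\[
A^{II} = g_1 + g_2 + sc, \qquad
A^{IB} = \begin{bmatrix} -g_1 & -g_2 \end{bmatrix}, \qquad
A^{BI} = \begin{bmatrix} -g_1 \\ -g_2 \end{bmatrix}, \qquad
A^{BB} = \begin{bmatrix} g_1 & 0 \\ 0 & g_2 \end{bmatrix}.
\]
% For a 1 x 1 internal matrix the only cofactor is \Delta^{II}_{1,1} = 1
% and \det(A^{II}) = g_1 + g_2 + sc, so (5.7) gives
\[
a^{BB*}_{1,1}(s) = g_1 - \frac{g_1^2}{g_1 + g_2 + sc}
                 = \frac{g_1 (g_2 + sc)}{g_1 + g_2 + sc}, \qquad
a^{BB*}_{1,2}(s) = -\frac{g_1 g_2}{g_1 + g_2 + sc}.
\]
```

Both entries are rational functions of s with denominator $\det(A^{II})$, and the subtracted terms are the composite admittances of (5.9).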
An important issue is that if the symbolic expressions are kept (s-expressions are special symbolic expressions), the final expressions of the generated rational admittances are not free from cancellation. There are two cancellation types. The first is common-factor cancellation, which comes from common factors between the numerator and denominator of the resulting rational admittances in $a^{BB*}_{u,v}(s)$ and $b^{B*}_{u}(s)$. The second is term cancellation, which comes from canceling terms (the sum of those symbolic terms equals zero). Such symbolic cancellation problems have been observed and discussed in [96, 97] in the context of Y-Δ transformation. Common-factor cancellation, if left in the expressions, may lead to exponential growth of the magnitude of the coefficients in the numerators and denominators. As we will see in the later sections of this chapter, cancellation-free rational admittances are related directly to the exact admittance expressions computed from the flattened circuit matrix and are also very important to the new de-cancellation method.

Fundamentally, all the cancellations are caused by subcircuit reductions, as shown in [122]. A detailed explanation of the conditions for common-factor cancellation and term cancellation can be found in [97, 122]. Therefore, how to remove those common factors (de-cancellation) in the rational functions of $a^{BB*}_{u,v}(s)$ and $b^{B*}_{u}(s)$ in general becomes a key issue for the subcircuit reduction process. An efficient algorithm has been proposed for de-cancellation during the generalized Y-Δ transformation [97, 122]. We will review how cancellation-free rational functions can be obtained by efficient DDD graph operations during the general hierarchical circuit reduction process.
5.2 DDD-based hierarchical decomposition

The idea of DDD-based hierarchical decomposition [126] is to represent all the determinants and cofactors in (5.7) and (5.8), as well as those determinants and cofactors in the final transfer functions of the desired circuit characteristics, as DDD graphs. Owing to the compactness of DDD graphs, DDD-based hierarchical circuit decomposition was shown to be superior to the previous methods [51, 117].
5.2.1 Determinant decision diagram

In this subsection, we provide a brief overview of the notion of determinant decision diagrams [97, 110]. Determinant decision diagrams [110] are compact and canonical graph-based representations of determinants. The concept is best illustrated using a simple RC filter circuit like the one shown in Figure 5.2. Its system equations can be written as
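\[
\begin{bmatrix}
\frac{1}{R_1} + sC_1 + \frac{1}{R_2} & -\frac{1}{R_2} & 0 \\
-\frac{1}{R_2} & \frac{1}{R_2} + sC_2 + \frac{1}{R_3} & -\frac{1}{R_3} \\
0 & -\frac{1}{R_3} & \frac{1}{R_3} + sC_3
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
=
\begin{bmatrix} I \\ 0 \\ 0 \end{bmatrix}.
\]

[Figure 5.2: A simple RC circuit.]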
We view each entry in the circuit matrix as one distinct symbol, and rewrite its system determinant in the left-hand side of Figure 5.3. Then its DDD representation is shown in the right-hand side. Please refer to [110] for the formal definition of a DDD graph.

[Figure 5.3: A matrix determinant and its DDD. Left: the determinant with rows (A, B, 0), (C, D, E), (0, F, G). Right: its DDD, with 1-edges and 0-edges connecting the signed vertices A to G and the 0- and 1-terminals.]
A DDD is a signed, rooted, directed acyclic graph with two terminal vertices, namely the 0-terminal vertex and the 1-terminal vertex. Each non-terminal DDD vertex is labeled by a symbol in the determinant, denoted by $a_i$ (A to G in Figure 5.3), and a positive or negative sign, denoted by $s(a_i)$. It originates two outgoing edges, called the 1-edge and the 0-edge. Each vertex $a_i$ represents a symbolic expression $D(a_i)$ defined recursively as follows:

\[ D(a_i) = a_i\, s(a_i)\, D_{a_i} + D_{\overline{a}_i}, \]

where $D_{a_i}$ and $D_{\overline{a}_i}$ represent, respectively, the symbolic expressions of the vertices pointed to by the 1-edge and 0-edge of $a_i$. The 1-terminal vertex represents expression 1, whereas the 0-terminal vertex represents expression 0. For example, vertex E in Figure 5.3 represents expression $E$, vertex F represents expression $-EF$, and vertex D represents expression $DG - FE$. We can also say that a DDD vertex D represents an expression defined by the DDD subgraph rooted at D.
[Figure 5.4: Illustration of Theorem 5.1. Reprinted with permission from [122] (c) 2005 IEEE.]
A 1-path in a DDD corresponds to a product term in the original determinant. It is defined as a path from the root vertex (A in our example) to the 1-terminal, including all the symbols and signs of the vertices that originate the 1-edges along the path. In our example, there exist three 1-paths representing three product terms: $ADG$, $-AFE$ and $-CBG$. The root vertex represents the sum of these product terms. The size of a DDD is the number of DDD vertices, denoted by |DDD|. Note that the size of a DDD depends on the size of the circuit in a very complicated way. |DDD| is a linear function of the circuit size for ladder circuits, and it may grow superlinearly with the circuit size in general [110].
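The recursive definition of $D(a_i)$ and the notion of a 1-path can be sketched in a few lines of Python. This is a minimal illustration, not the data structure of [110]: the `Vertex` class and function names are hypothetical, and the graph below is a reconstruction of the DDD of Figure 5.3 that is consistent with the vertex expressions and product terms quoted in the text.

```python
ONE, ZERO = object(), object()   # the 1-terminal and 0-terminal vertices

class Vertex:
    """Non-terminal DDD vertex: symbol, sign (+1 or -1), 1-edge and 0-edge children."""
    def __init__(self, symbol, sign, one, zero):
        self.symbol, self.sign, self.one, self.zero = symbol, sign, one, zero

def evaluate(v, values):
    """D(a_i) = a_i * s(a_i) * D(1-edge child) + D(0-edge child)."""
    if v is ONE:
        return 1
    if v is ZERO:
        return 0
    return (values[v.symbol] * v.sign * evaluate(v.one, values)
            + evaluate(v.zero, values))

def one_paths(v, sign=1, symbols=()):
    """Yield (sign, symbols) for every path from v to the 1-terminal."""
    if v is ONE:
        yield sign, symbols
    elif v is not ZERO:
        # Following the 1-edge picks up this vertex's symbol and sign.
        yield from one_paths(v.one, sign * v.sign, symbols + (v.symbol,))
        # Following the 0-edge skips this vertex.
        yield from one_paths(v.zero, sign, symbols)

# Assumed structure of the DDD in Figure 5.3 (shared vertex G, as in a DAG).
g = Vertex("G", +1, ONE, ZERO)
e = Vertex("E", +1, ONE, ZERO)
f = Vertex("F", -1, e, ZERO)
d = Vertex("D", +1, g, f)
b = Vertex("B", +1, g, ZERO)
c = Vertex("C", -1, b, ZERO)
root = Vertex("A", +1, d, c)

terms = [("-" if s < 0 else "+") + "".join(p) for s, p in one_paths(root)]
print(terms)   # ['+ADG', '-AFE', '-CBG']
```

Evaluating `root` with numeric values for A to G gives the same number as expanding the 3 × 3 determinant directly, while vertex G is stored only once even though two 1-paths pass through it.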
5.2.2 DDD representation for hierarchical subcircuit decomposition

First we introduce the following theorem [124]:

Theorem 5.1. (5.7) can be written in the following form:

\[
a^{BB*}_{u,v} = \frac{\det(A[1,\ldots,m,u\,|\,1,\ldots,m,v])}{\det(A^{II})},
\tag{5.10}
\]

where $u, v = m+1, \ldots, m+l$, and (5.8) can be written in the following form:

\[
b^{B*}_{u} = \frac{\det(A[1,\ldots,m,u\,|\,1,\ldots,m] + b^{I})}{\det(A^{II})},
\tag{5.11}
\]

where $A[1,\ldots,m,u\,|\,1,\ldots,m,v]$ is the submatrix that consists of matrix $A^{II}$, which is $A[1,\ldots,m\,|\,1,\ldots,m]$, plus row $u$ and column $v$ of matrix $A$; $A[1,\ldots,m,u\,|\,1,\ldots,m] + b^{I}$ is the submatrix that consists of matrix $A^{II}$ plus row $u$ of matrix $A$ and the right-hand-side column $b^{I}$. Both submatrices are illustrated in Figure 5.4.

With Theorem 5.1, we only need to compute the rational function for the numerator, which is a determinant (as $\det(A^{II})$ is shared by all the newly created matrix elements for each subcircuit), for each boundary-variable-related element in the reduced circuit matrix, instead of representing each individual first-order cofactor of $\det(A^{II})$ explicitly [126]. This leads to the efficient DDD-based hierarchical decomposition method. Such a determinant-only DDD representation for all the involved matrix elements is more compact than the previous method [126], as first-order cofactors are not explicitly represented. Therefore, more sharing of common terms becomes possible among those cofactors and the elements in row $u$, column $v$ and $b^{I}$. More importantly, such a DDD-only representation of new admittances is more amenable to removing cancellation during a subcircuit reduction process, as shown in the next section [122].
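Theorem 5.1 can be checked numerically on a small example. The sketch below compares the cofactor form (5.7) with the determinant-ratio form (5.10) using exact fractions. The function names, the 0-based indexing, and the matrix values are assumptions for illustration only, and the Laplace-expansion `det` is meant for tiny matrices, not for real circuits.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not M:
        return Fraction(1)
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)) if M[0][j] != 0)

def entry_by_cofactors(A, m, u, v):
    """a*_{u,v} per (5.7): uses the first-order cofactors of A_II."""
    A_II = [row[:m] for row in A[:m]]
    def cofactor(r, c):
        minor = [row[:c] + row[c + 1:] for i, row in enumerate(A_II) if i != r]
        return (-1) ** (r + c) * det(minor)
    total = sum(A[u][k1] * cofactor(k2, k1) * A[k2][v]
                for k1 in range(m) for k2 in range(m))
    return A[u][v] - Fraction(total) / det(A_II)

def entry_by_determinant(A, m, u, v):
    """a*_{u,v} per (5.10): det(A[1..m,u | 1..m,v]) / det(A_II)."""
    rows, cols = list(range(m)) + [u], list(range(m)) + [v]
    sub = [[A[r][c] for c in cols] for r in rows]
    return Fraction(det(sub)) / det([row[:m] for row in A[:m]])

# Hypothetical matrix: indices 0-1 are internal (m = 2), 2-3 are boundary.
A = [[4, -1, -2, 0],
     [-1, 5, 0, -3],
     [-2, 0, 6, -1],
     [0, -3, -1, 7]]
m = 2
for u in range(m, 4):
    for v in range(m, 4):
        print(u, v, entry_by_cofactors(A, m, u, v), entry_by_determinant(A, m, u, v))
```

The second form needs one determinant per new element plus the shared $\det(A^{II})$, which is the property the determinant-only DDD representation exploits.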
5.2.3 Y-expanded DDDs

For our problem, some elements (in admittance forms) of a circuit matrix will become rational functions of $s$ during the hierarchical subcircuit reduction process. As a result, the construction of the rational function for such a determinant will be different from the method used for s-expanded DDDs [111]. To efficiently compute the rational function for a determinant and handle cancellations, a new DDD graph, called a Y-parameter expanded DDD (YDDD), is first introduced. Y-expanded DDDs are also DDD graphs, where each DDD node represents a device admittance or a composite admittance, as shown in Figure 5.5.

[Figure 5.5: A determinant and its YDDD. Reprinted with permission from [122] (c) 2005 IEEE.]
Note that some circuit parameter admittances are functions of the complex frequency variable $s$. This is different from the s-expanded DDD graphs, where $s$ is explicitly extracted and represented [111]. The main purpose of introducing YDDDs is that we can easily handle both term cancellation and common-factor cancellation, as cancellation patterns can be easily detected by examining those device admittances or composite admittances. Like s-expanded DDDs [111], the YDDDs can be constructed from a complex DDD in time linear in the size of the original complex DDD. The time complexity for constructing a complex DDD depends on the circuit topology. Given the best vertex ordering, if the underlying circuit is a ladder or tree circuit, |DDD| is a linear function of the size of the circuit. For general circuits, the size of a DDD graph may grow exponentially in the worst case.
But like BDDs, with proper vertex ordering, the DDD representations are very compact for many real circuits [110, 111]. In Figure 5.6, we present a version of the algorithm called YDDDConst().

    YDDDConst(D)
        if (D = 0 or D = 1)
            return NULL
        L0 = YDDDConst(D.0)
        L1 = YDDDConst(D.1)
        Presult = NULL
        for i = 1 to m do
            Ptmp = YDDDMultiply(L1, D.xi)
            Presult = YDDDUnion(Ptmp, Presult)
        return YDDDUnion(Presult, L0)

Figure 5.6 Y-expanded DDD construction. Reprinted with permission from [122] (c) 2005 IEEE.
Function YDDDConst() takes a complex DDD rooted at D and returns the resulting YDDD tree. D.1 and D.0 denote, respectively, the vertices pointed to by the 1-edge and 0-edge of vertex D. m is the number of device admittances or composite admittances connected to a node, each of which is denoted by D.xi. YDDDUnion(P1, P2) computes the union of two YDDDs, P1 and P2. YDDDMultiply(P, v) computes the product of YDDD P and YDDD vertex v.

It was shown that the YDDD representation is more compact than the sequence of expressions [51, 126], whose complexity essentially represents the complexity of the symbolic node-by-node Gaussian elimination process. Hence the time complexity of the general hierarchical reduction algorithm is better than that of the Y-Δ reduction algorithm [96], which is based on Gaussian elimination. Another advantage of the general reduction algorithm over the existing Y-Δ transformation algorithm is that we need to remember a smaller number of common factors, which are the determinants of the reduced subcircuit matrices. If we reduce one node at a time, each node becomes a subcircuit and we end up with more common factors to deal with. There are also more independent subcircuit hierarchies, as each subcircuit is the smallest possible (it consists of one node). As a result, we have to remember all the common factors of the reduced circuits (nodes) that are connected to nodes yet to be reduced, which in turn leads to more memory usage compared to the general reduction method. Our experimental results confirm those observations.

With YDDD, we can compute the s-polynomial of a determinant in a cancellation-free manner. But in the context of hierarchical decomposition, we need to compute the rational admittance functions in a hierarchical and cancellation-free way with limited orders of $s$. The details of this algorithm can be found in [97, 122]. For large RLC interconnect circuits, truncations have to be carried out to keep limited orders of $s$ in each rational admittance.
The resulting admittances or transfer functions may not be stable. A convex programming based method will be carried out to enforce the passivity of the reduced models [21], which will be discussed in Chapter 10.
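The way an s-polynomial can be accumulated over a DDD whose entries are sums of admittances can be sketched as follows. This is a simplified stand-in, not the YDDD structure of [122]: instead of building a new graph with YDDDUnion() and YDDDMultiply(), it carries plain coefficient lists and performs no cancellation detection. The 2 × 2 determinant with entries $a + bs$, $c + ds$, $e$, $f$, its DDD layout, the numeric values, and all names are assumptions for illustration.

```python
ONE, ZERO = object(), object()   # terminal vertices

def poly_add(p, q):
    """Add two polynomials given as ascending coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

class Vertex:
    def __init__(self, terms, sign, one, zero):
        # terms: the admittances x_i summed in this matrix entry,
        # each one a coefficient list in ascending powers of s
        self.terms, self.sign, self.one, self.zero = terms, sign, one, zero

def s_polynomial(v):
    """Polynomial in s represented by the DDD rooted at v."""
    if v is ONE:
        return [1]
    if v is ZERO:
        return [0]
    p_one = s_polynomial(v.one)
    result = [0]
    for x in v.terms:   # loop over the admittances of this entry
        result = poly_add(result, poly_mul([v.sign * c for c in x], p_one))
    return poly_add(result, s_polynomial(v.zero))

# Assumed determinant | a+bs  c+ds ; e  f | with a=1, b=2, c=3, d=4, e=5, f=6.
v_f = Vertex([[6]], +1, ONE, ZERO)
v_e = Vertex([[5]], +1, ONE, ZERO)
v_cd = Vertex([[3], [0, 4]], -1, v_e, ZERO)
root = Vertex([[1], [0, 2]], +1, v_f, v_cd)

print(s_polynomial(root))   # [-9, -8], i.e. (a+bs)f - (c+ds)e = -9 - 8s
```

Keeping each admittance as a separate term, rather than one merged coefficient list per entry, is what leaves room for inspecting individual device or composite admittances when looking for cancellation patterns.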
5.2.4 Overview of passive hierarchical model order reduction (HMOR) method

In this subsection, we give an overview of the general frequency-domain hierarchical model order reduction and realization flow. We show how the symbolic hierarchical subcircuit decomposition and rational function computations can be used to perform the hierarchical model order reduction.

The whole reduction flow consists of several important steps. The key step for HMOR is hierarchical circuit reduction and model order reduction. The basic idea is to reduce subcircuits in a hierarchical and cancellation-free way. The new admittances coming from subcircuit suppressions shown in (5.10) and (5.11) are kept as rational functions of $s$ in the exact or order-reduced form (with fixed order of $s$). Such a reduction can be repeated until we reach the top-level circuit, which is typically small enough to be solved exactly and symbolically ($s$ is still the only symbol). The reduction process is repeated at multiple frequency points (one expansion per point) for wideband modeling. The resulting circuit unknown variables are rational functions of $s$, which can be further optimized to enforce passivity and be realized for easy use with SPICE-like circuit simulators. The whole hierarchical model order reduction algorithm is described below:

The general hierarchical network modeling algorithm

1. Build the complex DDDs for each subcircuit matrix and all the required cofactors and determinants for each newly created admittance in a bottom-up way.
2. If the present circuit has subcircuits, perform the reduction on each subcircuit first.
3. Derive the YDDDs from complex DDDs for each new composite admittance shown in (5.10) and (5.11).
4. Construct the cancellation-free rational admittances for each YDDD.
5. Repeat step 1 to step 4 for multiple frequency points based on the given error bound.
6. Select the dominant poles from different frequency regions to form the final set of poles, which covers the whole frequency range.
7. Apply convex programming passivity enforcement and optimization to determine all the residues of each rational admittance in the reduced admittance matrix (Chapter 10).
8. Realize the passivity-enforced order-reduced admittance matrix using a general network synthesis technique (Chapter 11).

The general hierarchical reduction algorithm is also illustrated in Figure 5.7. In the following sections, we first present some theoretical results, which show that the one-point hierarchical model order reduction is equivalent to implicit moment-matching around $s = 0$ and that hierarchical model order reduction preserves the reciprocal property of a circuit. Then we present the multi-point hierarchical model order reduction scheme for wideband modeling.

[Figure 5.7: The general hierarchical model order algorithm flow.]
5.3 Hierarchical reduction versus moment-matching

In this section, we first discuss how the frequency-domain hierarchical reduction is related to the implicit moment-matching. Then we discuss the numerical stability and reciprocity-preserving property of the hierarchical reduction process.

5.3.1 Moment-matching connection

Consider a linear system with $n$ state variables in vector $x$. The system is given by

\[ sx = Ax + b, \tag{5.12} \]

where $A$ is an $n \times n$ system matrix and $b$ is the input vector to the circuit. Then we can obtain $x = (Is - A)^{-1} b$. Let us consider single-input single-output systems, where we have only one input $b_j$ and we are interested in the state response at node $i$. In this case we have

\[ x_i(s) = H_{ij}(s)\, b_j = \frac{\Delta_{ij}}{\det(Is - A)}\, b_j, \tag{5.13} \]
where $\Delta_{ij}$ is the first-order cofactor of matrix $M = (Is - A)$ with respect to the element at row $i$ and column $j$, and $H_{ij}(s)$ is the transfer function. So the exact solution of any state variable or its transfer function in the frequency domain can be represented by a rational function of $s$. Hierarchical reduction basically reduces the $n \times n$ matrix $M$ to a much smaller $m \times m$ matrix $M'$ based on block Gaussian elimination, such that $x_i$ can be trivially solved symbolically by using (5.13). During this reduction process, all the rational functions involved are truncated up to a fixed maximum order, and the final solution will be a rational function with the same order for its numerator and its denominator. We then have the following theoretical result for the state variable $x'_i(s)$ computed by the hierarchical reduction process in the frequency domain.

Theorem 5.2. The state variable $x'_i(s)$ computed by the frequency-domain hierarchical reduction with $q$ as the maximum order for all the rational functions will match the first $q$ moments of the exact solution $x_i(s)$ expanded by Taylor series at $s = 0$.

Proof: We know that the exact solution of $x_i(s)$ is a rational function, as shown in (5.13). Because of the truncation, the solution computed by the hierarchical reduction process will be given by

\[
x'_i(s) = \frac{a_0 + a_1 s + \ldots + a_q s^q}{b_0 + b_1 s + \ldots + b_q s^q}, \quad i = 1, \ldots, q.
\tag{5.14}
\]

It was proved in [123] that a cancellation-free rational expression from the hierarchical reduction process is the exact expression obtained from the flat circuit matrix. If we do not perform any truncation, $x'_i(s)$ will be the exact solution, $x_i(s)$, which is obtained from the flat circuit matrix by (5.13) when all cancellations are removed numerically during the hierarchical reduction process (assuming that no numerical error is introduced). With truncation, all the coefficients $a_0, \ldots, a_q$ and $b_0, \ldots, b_q$ are still exactly the same as those in $x_i(s)$. If we compute the moments of $x'_i(s) = m_0 + m_1 s + \ldots$, the first $q$ moments can be uniquely determined by the coefficients $a_0, \ldots, a_q$ and $b_0, \ldots, b_q$:

\[
m_i = \frac{a_i - \sum^{i}_{k+l=i,\; k \le i,\; l \le i,\; k \ne 0} b_k m_l}{b}.
\tag{5.15}
\]

If $b_0$ is not zero, $b$ is simply $b_0$; otherwise $b$ will be the first non-zero coefficient $b_t$, and the first $q$ moments become the coefficients of $s^{-t}$ ($t > 0$) to that of $s^{-t+q}$. So the theorem is proved. Hence the transfer function $H'_{ij}(s)$ will also match the exact one up to the first $q$ moments. QED.

For a general multi-input and multi-output system, each element in the reduced $m \times m$ admittance matrix $M'(s)$ becomes a rational function [124]:

\[
a^{BB*}_{u,v} = \frac{\det(M[1,\ldots,m,u\,|\,1,\ldots,m,v])}{\det(M^{II})},
\tag{5.16}
\]
where $M[1,\ldots,m,u\,|\,1,\ldots,m,v]$ is the matrix that consists of matrix $M^{II}$, which is $M[1,\ldots,m\,|\,1,\ldots,m]$, plus row $u$ and column $v$ of matrix $M$. Then we have the following result:

Corollary 5.1. Each rational admittance function $a'^{BB*}_{u,v}(s)$ in the reduced $m \times m$ matrix $M'(s)$ obtained by the hierarchical reduction process will match the first $q$ moments of the exact rational function $a^{BB*}_{u,v}(s)$ expanded by Taylor series at $s = 0$.
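The recurrence (5.15) and the effect of truncation can be illustrated with a short sketch, assuming $b_0 \ne 0$ so that $b = b_0$. The function name `moments` and the coefficient values are hypothetical. The example keeps the coefficients up to order $q = 2$ and shows that the low-order moments (here $m_0$ to $m_q$) are unchanged, while the next one differs.

```python
from fractions import Fraction

def moments(num, den, count):
    """First `count` Taylor coefficients at s = 0 of num(s)/den(s).

    num and den are ascending coefficient lists (a_0, a_1, ...) and
    (b_0, b_1, ...); den[0] must be non-zero.
    """
    def a(i):
        return Fraction(num[i]) if i < len(num) else Fraction(0)
    m = []
    for i in range(count):
        # m_i = (a_i - sum_{k=1..i} b_k * m_{i-k}) / b_0
        acc = a(i) - sum(Fraction(den[k]) * m[i - k]
                         for k in range(1, min(i, len(den) - 1) + 1))
        m.append(acc / den[0])
    return m

# Hypothetical "exact" rational function and its truncation to order q = 2.
num = [1, 3, 2, 5]
den = [2, 1, 4, 1, 3]
q = 2
exact = moments(num, den, q + 2)
truncated = moments(num[:q + 1], den[:q + 1], q + 2)

print(exact[:q + 1] == truncated[:q + 1])   # low-order moments agree
print(exact[q + 1], truncated[q + 1])       # the next moment does not
```

The agreement follows directly from the recurrence: $m_i$ depends only on $a_0, \ldots, a_i$ and $b_0, \ldots, b_i$, so dropping higher-order coefficients cannot change it.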
5.3.2 Numerical stability of the hierarchical reduction

The hierarchical reduction process is essentially equivalent to implicit moment-matching at $s = 0$. As a result, the frequency response far away from $s = 0$ will become less accurate owing to the truncation of high-order terms. Another source of numerical error comes from the numerical de-cancellation process, where polynomial division is required for removing the common factors (cancellation) in the newly generated rational function, which will, in turn, introduce error from numerical term cancellation (the sum of two symbolic terms should have been zero, but is not zero owing to numerical error). Such numerical noise will cause the higher-order terms to be less accurate even if we try to keep them. In Figure 5.8, we show the responses from the 3-way, 2-level partitioned µA741 circuit [126] under different maximum reduction orders of rational functions. As we can see, increasing the rational function order does not increase the accuracy of the response after the order reaches eight. This is the typical numerical stability problem with the moment-matching method [91].

[Figure 5.8: Frequency responses of µA741 circuit under different reduction orders (exact and 4th- to 14th-order responses; voltage gain in dB versus frequency). Reprinted with permission from [94] (c) 2006 IEEE.]

However, unlike explicit moment-matching methods, the hierarchical reduction is numerically stable for tree-structured circuits. We then show the following result:

Theorem 5.3. For tree-structured circuits, the hierarchical reduction process can be performed such that there is no common-factor cancellation in the generated rational functions.

Proof: For tree-structured circuits, we can always partition the circuit in such a way that each subcircuit only has one node shared with its parent circuit. As a result, there is only one composite admittance $a^{BI}_{u,k_1}\Delta^{II}_{k_2,k_1}a^{IB}_{k_2,v}/\det(M^{II})$ in (5.7) generated in its parent circuit for each subcircuit. According to the common-factor cancellation condition [121], at least four composite elements from the same subcircuit reduction are required for the existence of common-factor cancellation. So no common-factor cancellation will occur under such partitioning. QED.

The significance of Theorem 5.3 is that the hierarchical reduction process becomes numerically stable for an arbitrary order for tree circuits. The only cancellation left is the term cancellation, where the sum of two symbolic terms is zero, which will not introduce any noticeable numerical error in the reduction process. Figure 5.9 shows the voltage gain response (real part) of an RC tree with about 100 nodes (also 100 capacitors) under different reduction orders. As can be seen, the reduced voltage gain matches the exact one well when the kept order reaches about 60.

[Figure 5.9: Frequency responses of an RC tree circuit under different reduction orders (exact and 10th-, 20th-, 40th- and 60th-order responses; real part of voltage gain versus frequency). Reprinted with permission from [94] (c) 2006 IEEE.]
The fact that no common-factor de-cancellation (polynomial division) is required was also exploited in the direct truncation of the transfer-function method (DTT) [56], where only polynomial addition is required to compute the truncated transfer functions for tree-structured RLC circuits. The DTT reduction process can be viewed as a special case of our method. But for general non-tree-structured circuits, polynomial division is required in node-elimination-based reduction methods owing to common-factor cancellation, and polynomial division, owing to truncation, will not be numerically stable for very high frequency ranges far away from DC, as shown before. To mitigate this problem, we propose using multi-point expansion to obtain accurate rational functions or reduced admittance matrices for modeling a general multi-input and multi-output linear system, as will be shown in Section 5.5.
5.4 Preservation of reciprocity

A reciprocal network is one in which the power losses are the same between any two ports regardless of the direction of propagation [127]. Mathematically, this is equivalent to the requirement that the circuit admittance matrix is symmetric (or scattering parameters S21 = S12, S13 = S31, etc.). A network is known to be reciprocal if it is passive and contains only isotropic materials. Reciprocity is an important network property. For the hierarchical reduction, we have the following result:

Theorem 5.4. The hierarchical reduction method preserves the reciprocity of a passive circuit matrix.

Proof: The proof follows from (5.10). We first study a circuit with circuit matrix $M$. The circuit has one subcircuit with circuit matrix $M^{II}$. Assume that the original circuit matrix is symmetric owing to reciprocity (its subcircuit is also symmetric, i.e., both $M$ and $M^{II}$ are symmetric). After reduction, the reduced circuit matrix becomes an $m \times m$ matrix where each matrix element at row $u$ and column $v$ is given by (5.10). Then we look at the element at row $v$ and column $u$, which is

\[
a^{BB*}_{v,u} = \frac{\det(M[1,\ldots,m,v\,|\,1,\ldots,m,u])}{\det(M^{II})}.
\tag{5.17}
\]

Notice that matrix $M$ is symmetric, so row $u$ in (5.10) and column $u$ in (5.17) are the same. The same is true for column $v$ in (5.10) and row $v$ in (5.17). As a result we have

\[ M[1,\ldots,m,v\,|\,1,\ldots,m,u] = M[1,\ldots,m,u\,|\,1,\ldots,m,v]^{T}, \tag{5.18} \]

\[ \det(M[1,\ldots,m,v\,|\,1,\ldots,m,u]) = \det(M[1,\ldots,m,u\,|\,1,\ldots,m,v]^{T}). \tag{5.19} \]

Since the determinant of a matrix equals the determinant of its transpose, $a^{BB*}_{u,v} = a^{BB*}_{v,u}$, and reciprocity is preserved in the reduced circuit matrix when a subcircuit is reduced. In the hierarchical reduction, we reduce one subcircuit at a time and the reduced circuit matrix is still symmetric after each reduction. So the reduced circuit matrix is still symmetric after all the subcircuits are reduced. QED.
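The argument of Theorem 5.4 can be checked numerically with a small sketch that builds every reduced element from the determinant ratio (5.10) and tests the result for symmetry. The function names, the 0-based indexing, and the symmetric matrix values are assumptions for illustration; the Laplace-expansion `det` is only suitable for tiny matrices.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not M:
        return Fraction(1)
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)) if M[0][j] != 0)

def reduce_matrix(M, m):
    """Suppress the first m variables; each new element is the ratio in (5.10)."""
    n = len(M)
    d_II = det([row[:m] for row in M[:m]])
    def element(u, v):
        rows, cols = list(range(m)) + [u], list(range(m)) + [v]
        return Fraction(det([[M[r][c] for c in cols] for r in rows])) / d_II
    return [[element(u, v) for v in range(m, n)] for u in range(m, n)]

# Hypothetical symmetric circuit matrix; the first two variables are internal.
M = [[5, -1, -2, 0, -1],
     [-1, 6, 0, -3, -1],
     [-2, 0, 7, -1, -2],
     [0, -3, -1, 8, -1],
     [-1, -1, -2, -1, 9]]

R = reduce_matrix(M, m=2)
is_symmetric = all(R[i][j] == R[j][i] for i in range(3) for j in range(3))
print(is_symmetric)
```

Applying `reduce_matrix` again to `R` models the suppression of a further subcircuit, and the same check applies at each level of the hierarchy.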
5.5 Multi-point expansion hierarchical reduction

The multi-point expansion scheme by real or complex frequency shifting has been exploited before in projection-based reduction approaches for improving the modeling accuracy [19, 53]. The basic idea of such a strategy is that dominant poles close to an expansion point are more accurately captured than the poles that are far away from the expansion point in the moment-matching-based approximation framework. Therefore, instead of expanding at only one point, we can expand at multiple points along the real or complex axis to capture all the dominant poles in the given frequency range accurately. In this chapter, we extend this concept to the hierarchical reduction algorithm. Specifically, at each expansion point, the driving point function or each rational admittance function in a reduced admittance matrix can be written in the partial fraction form

\[ f(s) = \sum_{i=1}^{n} k_i/(s - p_i). \tag{5.20} \]
By intelligently selecting poles and their corresponding residues from different expansions and combining them into one rational function, we can obtain a more accurate rational function for a very high frequency range. In this chapter, we propose an explicit waveform matching scheme based on the hierarchical reduction framework to find dominant poles and their residues for both SISO (single-input single-output) and MIMO (multi-input multi-output) systems. It is shown experimentally to be superior to the existing pole searching algorithm.
5.5.1 Multi-point expansion in hierarchical reduction

To expand the circuit at an arbitrary location in the complex s-plane, say $s_k = \alpha_k + \omega_k j$, we can simply substitute $s$ in (5.13) by $s + s_k$. Then (5.13) becomes

\[
x_i(s) = H_{ij}(s)\, b_j = \frac{\Delta_{ij}(s + s_k)}{\det(I(s + s_k) - A)}\, b_j.
\tag{5.21}
\]
As shown in [19], poles that dominate the transient response in interconnect circuits are near the imaginary axis with large residues. Hence, we expand along the imaginary axis for RF passive and interconnect circuits. Since only capacitors and inductors are associated with the complex frequency variable $s$, expansion at a real point $\alpha_i$ or a complex point $\omega_i j$ is essentially equivalent to analyzing a new circuit where each capacitor $C$ has a new resistor (with real value $\alpha_i C$ or complex value $\omega_i C j$) connected in parallel with it and each inductor $L$ has a new resistor (with real value $\alpha_i L$ or complex value $\omega_i L j$) connected in series with it [92].

In this chapter, we show that the multi-point expansion can be made very efficiently in the hierarchical reduction framework. The rational functions are constructed in a bottom-up fashion in a Y-parameter determinant decision diagram (YDDD) in the hierarchical reduction algorithm [121]. When a capacitor $C$ or an inductor $L$ (its YDDD node) is visited, we build a simple polynomial $0 + Cs$ or $0 + Ls$ to multiply or add to the existing polynomials seen at that DDD node. In the presence of a non-zero expansion point, $\alpha_i$ or $\omega_i j$, we can simply build a new polynomial $\alpha_i C + Cs$ or $\omega_i C j + Cs$ for the capacitor and $\alpha_i L + Ls$ or $\omega_i L j + Ls$ for the inductor, respectively. So we do not need to rebuild the circuit matrix or the YDDD graphs used for reduction at $s = 0$. Instead, we only need to rebuild the rational functions by visiting every YDDD node once, which takes time linear in the YDDD graph size, a typical time complexity for DDD-graph-based methods [110].
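The per-element polynomial substitution described above can be sketched as follows. This is an illustrative toy, not the YDDD traversal itself: the helper names, the element values, and the use of a plain product of two node admittances in place of a real determinant are all assumptions. The check confirms that building the polynomials with $s_k C + Cs$ gives the same value at $s$ as the unshifted polynomials evaluated at $s + s_k$.

```python
def element_poly(value, s_k=0):
    """Polynomial of a C or L element at expansion point s_k.

    Returns ascending coefficients [s_k*value, value], i.e. value*(s + s_k);
    s_k = 0 gives the usual 0 + value*s.
    """
    return [s_k * value, value]

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, s):
    return sum(c * s ** i for i, c in enumerate(p))

def build(s_k):
    """Product of two node admittances (G1 + C1*s)(G2 + C2*s), shifted by s_k."""
    g1, c1, g2, c2 = 2.0, 3.0, 1.0, 4.0   # hypothetical element values
    node1 = poly_add([g1], element_poly(c1, s_k))
    node2 = poly_add([g2], element_poly(c2, s_k))
    return poly_mul(node1, node2)

s_k = 5j                  # expansion point on the imaginary axis
base = build(0)           # polynomials used for the expansion at s = 0
shifted = build(s_k)      # same structure, only the C terms are rebuilt

s = 7j
print(poly_eval(shifted, s), poly_eval(base, s + s_k))
```

Only `element_poly` knows about the expansion point, which mirrors the observation that the graph structure is reused and only the node polynomials are rebuilt.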
5.5.2 Explicit waveform-matching algorithm

One critical issue in multi-point expansion is to determine, at each expansion point, which poles are accurate and should be included in the final rational function. In the complex frequency hopping method [19], a binary search strategy was used, where poles (common poles) seen by two expansion points and poles whose distance to the expansion points is shorter than that of the common poles are selected. Such a common-pole matching algorithm, however, is very sensitive to the numerical distance criterion for judging whether two poles are actually the same pole. For accurately detecting common poles, a small distance is desirable, but this will lead to more expansion points; even worse, the same pole may be treated as different poles seen by two different expansion points. Also, this method may fail to detect some dominant poles, as the circle for searching accurate poles might be too small, as shown in our experimental results.

In this chapter, we propose a new reliable pole-searching algorithm, which is based on explicit frequency waveform matching. The new algorithm is based on the observation that a complex pole $p_i$ and its residue $k_i$ in the partial fraction form, $k_i/(s - p_i)$, have the largest impact at frequency $f_i$ when the imaginary part of the pole equals $2\pi f_i$. Figure 5.10 shows a typical response of $k_i/(s - p_i)$, where $k_i = 2.78 \times 10^{12} + 2.34 \times 10^{10} j$ and $p_i = -4.93 \times 10^{8} + 2.58 \times 10^{10} j$. The peaks of both the real (absolute) value and the magnitude are around $4.11 \times 10^{9}$, which is equal to $2.58 \times 10^{10}/(2\pi)$. The reason for this is that both the real and imaginary parts of $k_i/(s - p_i)$ reach a peak when their denominator $(\mathit{pr})^2 + (\omega - \mathit{pi})^2$ reaches a minimum at $\omega = \mathit{pi}$, where $s = \omega j$ and $p_i = \mathit{pr} + \mathit{pi}\, j$ ($\mathit{pr}$ and $\mathit{pi}$ are the real and imaginary parts of the pole). A complex pole with a negative imaginary part typically will not have significant impact on the upper half complex plane. The idea of frequency-waveform matching is to explicitly match the approximate frequency waveform with the exact one.
[Figure 5.10: Responses of a typical $k_i/(s - p_i)$: real part (Vol) and magnitude (dB(Vol)) versus frequency (rad/s). Reprinted with permission from [94] (c) 2006 IEEE.]

Specifically, at an expansion point, $f_i$, we perform the hierarchical reduction and then determine an accurate maximum frequency range $[f_i, f_{i+1}]$ such that the error between responses (magnitude) of the reduced rational function and that of the exact one are bounded by a pre-specified error bound. The error is computed as follows:

\[
err = \frac{|dB20(V_e) - dB20(V_a)|}{|dB20(V_e)|}
\tag{5.22}
\]
where $dB20(x) = 20 \log_{10}(|x|)$ and $|x|$ is the magnitude of a complex number $x$; $V_e$ is the exact response and $V_a$ is the approximate response. If $|dB20(V_e)| = 0$, then we use $|dB20(V_a)|$ as the denominator in (5.22), if it is not zero. If $|dB20(V_a)| = 0$ as well, we have $err = 0$. Then $f_{i+1}$ will be the next expansion point. All the poles whose imaginary parts fall within the range $[2\pi f_i, 2\pi f_{i+1}]$ will be selected because their contribution in this frequency range is the largest. The new algorithm does not have the duplicate pole issue, as accurate poles can only be located at one place. The accuracy of the found poles is assured by explicit waveform matching. Experimental results show that it tends to use fewer expansion points than the common-pole matching method, and less CPU time.
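The error measure (5.22), with its two special cases, and the pole selection rule can be sketched as below. The function names and the sample poles and residues are hypothetical, and the sketch assumes non-zero response values so that the logarithm is defined; it does not include the search for $f_{i+1}$ itself.

```python
import math

def db20(x):
    """20*log10(|x|) for a non-zero real or complex response value."""
    return 20 * math.log10(abs(x))

def waveform_error(v_exact, v_approx):
    """Relative dB error per (5.22), including the zero-denominator cases."""
    d_exact, d_approx = db20(v_exact), db20(v_approx)
    diff = abs(d_exact - d_approx)
    if d_exact != 0:
        return diff / abs(d_exact)
    if d_approx != 0:
        return diff / abs(d_approx)   # fall back to |dB20(Va)|
    return 0.0

def select_poles(pole_residue_pairs, f_lo, f_hi):
    """Keep the (pole, residue) pairs whose Im(pole) lies in [2*pi*f_lo, 2*pi*f_hi]."""
    lo, hi = 2 * math.pi * f_lo, 2 * math.pi * f_hi
    return [(p, k) for p, k in pole_residue_pairs if lo <= p.imag <= hi]

# Hypothetical poles and residues found at one expansion point.
pairs = [(-4.93e8 + 2.58e10j, 2.78e12 + 2.34e10j),
         (-1.0e8 + 5.0e9j, 1.0e11 + 0j)]
kept = select_poles(pairs, f_lo=4.0e9, f_hi=5.0e9)

print(waveform_error(10, 100))   # 1.0
print(len(kept))                 # 1
```

In a full flow, `waveform_error` would be evaluated over a frequency sweep starting at $f_i$, and $f_{i+1}$ would be the largest frequency up to which the error stays within the pre-specified bound.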
5.5.3 Multi-point expansion for MIMO system reduction
For a multi-input multi-output system, by using modified nodal analysis, the reduced circuit matrix $M'(s) = [y_{ij}(s)]_{m \times m}$ becomes an $m \times m$ admittance matrix. Each admittance $y_{ij}$ is a complex rational function with real coefficients, or with complex coefficients if the expansion points are on the imaginary axis. In this case, we explicitly monitor the error between each approximate rational admittance and the exact value of that admittance at each frequency. The exact value of each admittance can be computed by visiting the DDD graph representing the admittance. Since there is a lot of sharing among those admittances, the cost of evaluating all the admittances is similar to that of evaluating one admittance, considering that every DDD node needs to be visited only once at each frequency point [126].
5.6 Numerical examples
In this section, we present some numerical examples obtained with the HMOR method and compare them with several other methods.
5.6.1 One-port macro-model for spiral inductor
We first construct a detailed PEEC model for a three-turn spiral inductor with its substrate. We assume copper ($\rho = 1.7 \times 10^{-8}\ \Omega \cdot \mathrm{m}$) for the metal and a low-k dielectric ($\varepsilon = 2.0$). The substrate is modeled as a lossy ground plane (heavily doped) with $\rho = 1.0 \times 10^{-5}\ \Omega \cdot \mathrm{m}$. The conductor is volume-discretized according to the skin depth and longitudinally segmented by one-tenth of the wavelength. The substrate is also discretized as in [79]. The capacitance is extracted by FastCap [83], and only adjacent capacitive coupling is considered, since capacitive coupling is short-range. The partial inductance is extracted by FastHenry at 50 GHz [59]. The inductive coupling between any pair of segments (including segments in the same line) is considered. We then generate the distributed PEEC model by connecting the segments with a π-type RLC topology, which results in a SPICE netlist with 232 passive RLCM elements. The substrate parasitic contribution (eddy-current loss) is lumped into the conductor segment above it. Note that for more accurate extraction at ultra-high frequencies, a full-wave PEEC model description is needed [61]. For mutual inductance, a vector potential equivalent model (VPEC) is used [134], which is friendlier to hierarchical reduction because no coupling inductor branch currents are involved and circuit partitioning can be done easily.
5.6.2 Comparison with common-pole matching method in frequency domain
For the spiral inductor, the driving-point impedance is obtained by the multi-point hierarchical reduction process. We use both the common-pole matching algorithm of the complex frequency-hopping method and the new waveform-matching algorithm to search for dominant poles along the imaginary axis. For a fair comparison, we make sure that the resulting rational functions have similar accuracy. For the common-pole matching algorithm, two poles located within 1% of their magnitude of each other are regarded as the same pole. For the waveform-matching algorithm, the error bound between the approximate response and the exact one is set to 0.1%. As a result, the common-pole approach takes 26 expansions and 37.1 seconds, whereas the waveform-matching method uses 15 expansions and 22.57 seconds. The responses obtained using both methods versus the exact response up to 100 GHz are shown in Figure 5.11. The responses from both methods match the exact ones very well all the way to 100 GHz. Our experience shows that the CPU time of the common-pole method is highly dependent on the common-pole detection criterion. For instance, if we set the criterion for common-pole detection to 0.5%, then 65 expansions are carried out. Also, as more expansions are carried out, the chance that a single pole is seen by two consecutive expansion points becomes larger; with a small distance criterion, that pole may be treated as two different poles, which in turn leads to a significant distortion of the frequency response.
[Figure 5.11 plot: frequency driving-point responses of the on-chip spiral inductor, impedance (Ohm) versus frequency (in 100 GHz); curves: common-pole matching, exact, waveform matching.]
Figure 5.11 Frequency responses of the three-turn spiral inductor and its reduced model by using waveform matching and the common-pole method. Reprinted with permission from [94] (c) 2006 IEEE.
5.6.3 Time-domain simulation of an LC oscillator
We further demonstrate the accuracy and efficiency of the inductor macro-model in time-domain harmonic simulation. Note that the synthesized one-port macro-model can be used to predict efficiently the critical performance parameters of a spiral inductor, such as $\omega_T$, the Q factor, and even the resonance starting condition for an oscillator [99]. We use the Colpitts LC oscillator shown in Figure 5.12 (b) as an example, where the active circuit behaves like a negative resistor to make the oscillator work, as shown in Figure 5.12 (a). In this experiment, the synthesized one-port model is obtained from a 25th-order rational function with 24 poles and results in a macro-model with 40 RLC elements. As shown in Figure 5.13, the steady-state waveforms of the synthesized and original models match very well, while we observe a 10× runtime speed-up (5.17 s versus 0.52 s) by using the reduced model.
[Figure 5.12 schematic, panels (a) and (b); recoverable labels: Vdd, spiral inductor, M1 (W:L), Vb, C1, C2, I, Gnd, passive resonator, active circuit, Rp = −Ra.]
Figure 5.12 Colpitts LC oscillator with spiral inductors. Reprinted with permission from [94] (c) 2006 IEEE.
[Figure 5.13 plot: output of a Colpitts LC oscillator, voltage (V) versus time (s); curves: original, synthesized.]
Figure 5.13 Time-domain comparison between original and synthesized models for a Colpitts LC oscillator with a three-turn spiral inductor. Reprinted with permission from [94] (c) 2006 IEEE.
5.6.4 Multi-port macro-model of coupled transmission line
We then use a two-bit coupled transmission line as the example for multi-port reduction and synthesis. The original PEEC model contains 42 resistors, 63 capacitors, 40 self-inductors, and 760 mutual inductors, where we consider inductive coupling between any two segments, including those in the same line. Again, a vector potential equivalent model (VPEC) is used for the mutual inductance. The matching frequency is up to 31 GHz, and we find 24 dominant poles in this range. There are 150 RLC elements in the synthesized circuit compared with 364 devices in the original circuit; this represents a 58.79% reduction rate. The frequency responses for $Y_{11}(s)$ and $Y_{12}(s)$ are shown in Figure 5.14 and Figure 5.15, respectively. If we only match up to 14 GHz, 12 poles are required and we can achieve a 78.5% reduction rate instead. The time-domain step responses from the original circuit, the 14 GHz synthesized circuit, and the 31 GHz synthesized circuit are shown in Figure 5.16. The difference among these three circuits is fairly small.
In Figure 5.17, we further compare the frequency responses of the 24th-order macro-model by hierarchical model order reduction (HMOR), the 24th-order macro-model by PRIMA, and time-constant-based reduction (with similar reduction ratios) with those of the original circuit. The frequency response of port one is observed at the input port of the first bit, and that of port two at the far end of the first bit. Owing to the preserved reciprocity, the reduced model is easily realized by Foster's synthesis, and the model size is half that of the SPICE-compatible circuit by PRIMA (via recursive convolution for each $Y_{ij}$). Moreover, as shown in Figure 5.17, the 24th-order model by HMOR matches up to 30 GHz, but the model of the same order by PRIMA matches only up to 20 GHz. Note that under a reduction ratio similar to that of HMOR, the time-constant-based reduction matches only up to 5 GHz.
[Figure 5.14 plot: four panels showing the real part (mho), imaginary part (mho), magnitude (mho), and phase versus frequency; curves: original system, synthesized ROM.]
Figure 5.14 Frequency responses of Y11 of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
[Figure 5.15 plot: four panels showing the real part (mho), imaginary part (mho), magnitude (mho), and phase versus frequency; curves: original system, synthesized ROM.]
Figure 5.15 Frequency responses of Y12 of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
Circuits   No. of    No. of   Simulation time (s)                    Model size (kb)
           elements  poles    HMOR   time const.  PRIMA   Original   HMOR   time const.  PRIMA   Original
(1)        (2)       (3)      (4)    (5)          (6)     (7)        (8)    (9)          (10)    (11)
circuit1   84        10       0.15   0.12         0.50    0.51       0.85   0.86         3.51    3.52
circuit2   258       10       0.15   0.15         0.92    5.31       0.85   0.86         3.51    11.23
circuit3   905       25       0.32   0.43         2.25    19.51      1.68   1.70         19.8    41.1
circuit4   5255      30       0.52   0.89         3.13    661.46     1.92   2.23         28.2    243.6
circuit5   20505     30       0.52   1.08         5.98    1356.66    1.92   2.53         28.2    957.9
Table 5.1 Simulation efficiency comparison between original and synthesized model (part I). Reprinted with permission from [94] (c) 2006 IEEE.
5.6.5 Scalability comparison with existing methods
Table 5.3 gives the reduction CPU time comparison for two methods; HMOR denotes the CPU times of hierarchical reduction. We notice that HMOR is slower than PRIMA, but the difference becomes smaller for large circuits. Theoretically, PRIMA and the one-point hierarchical model reduction have the same time complexity, namely the time complexity of one Gaussian elimination. In PRIMA, we have to solve the circuit matrix at least once using Gaussian elimination or LU decomposition to obtain all the Krylov subspace basis vectors (or moments). In the HMOR method, if we reduce one node at a time, the reduction becomes a Gaussian elimination process. All the polynomial operations with fixed order have fixed computing costs.
[Figure 5.16 plot: voltage (V) versus time (s); curves: original system, model fit to 31 GHz, model fit to 14 GHz.]
Figure 5.16 Transient responses of a two-bit transmission line. Reprinted with permission from [94] (c) 2006 IEEE.
[Figure 5.17 plots: frequency-domain accuracy comparison for bus 2x10 at port 1 and at port 2, voltage (dB) versus frequency (Hz); curves: original, time-constant, PRIMA, H-reduction.]
Figure 5.17 Frequency responses of a two-bit transmission line at two ports. Reprinted with permission from [94] (c) 2006 IEEE.
The efficiency difference is mainly due to the expensive recursive operations used in graph operations, which can be further improved, and to multi-point matching. However, multi-point matching makes our method closed-loop, which gives us good control over the model accuracy. For PRIMA, the model accuracy cannot be determined without several trials using different reduction orders.

Circuits   No. of    No. of   Delay error (%)
           elements  poles    HMOR     time const.  PRIMA
circuit1   84        10       −0.16%   −0.86%       −0.15%
circuit2   258       10       −0.24%   −1.12%       −0.22%
circuit3   905       25       −0.41%   −4.43%       −0.37%
circuit4   5255      30       −0.62%   −6.83%       −0.58%
circuit5   20505     30       −1.04%   −12.91%      −0.83%
Table 5.2 Simulation efficiency comparison between original and synthesized model (part-II). Reprinted with permission from [94] (c) 2006 IEEE.

Circuits   No. of elements   PRIMA (s)   HMOR (s)
circuit1   84                0.042       0.6
circuit2   258               0.077       11.7
circuit3   905               0.16        17.8
circuit4   5255              1.73        44.4
circuit5   20505             32.37       96.8
Table 5.3 Comparison of reduction CPU times. Reprinted with permission from [94] (c) 2006 IEEE.

We finally present a scalability comparison in Tables 5.1 and 5.2, based on time-domain transient simulation, for the following aspects: (i) runtime of simulation; (ii) realization efficiency (realized model size); and (iii) accuracy in terms of delay. Several RLCM circuits of different sizes are used. We compare our method (HMOR) with the time-constant-based circuit reduction [3] (time const.) and the projection-based reduction PRIMA as implemented in [64] (PRIMA). The same number of poles is used for the reduction when we compare HMOR with PRIMA. The reduced model by time-constant reduction is obtained with a model size similar to that of HMOR. First, we find that the realized RLCM circuit model size is up to 10 times smaller on average than the SPICE-compatible circuit from PRIMA. Accordingly, a similar simulation speed-up (8×) is observed when we run both circuits in SPICE3. When we further compare the simulation time of our reduced models with that of the PEEC circuits, a significant speed-up (up to 2712× for circuit5) is obtained. Furthermore, the waveform accuracy in terms of delay is given in columns 4–6 of Table 5.2. The reduced models are very accurate, with a worst-case delay error of −1.04% even with a 478× (957.9 kb versus 1.92 kb) reduction ratio in terms of model size. For the same reduction ratio as our reduction, however, we find that the time-constant-based reduction introduces large errors (up to 12.91%) because too many nodes are eliminated and the reduction criteria cannot be satisfied.
Note that the sparsification in the VPEC model can dramatically reduce the number of mutual inductive couplings while maintaining the accuracy [134]. As a result, we use this technique during our reduction for larger circuits. For example, in the case of circuit5 (the largest one) in Tables 5.1 and 5.2, we obtain a 97.5% sparsification, from 19900 to 498 mutual inductors. Because of this sparsification, the reduction time is reduced by 10× (365.4 s versus 47.8 s).
5.7 Summary
In this chapter, we presented a new hierarchical multi-point reduction algorithm for wideband modeling of high-performance RF passive and linear(ized) analog circuits. On the theoretical side, we showed that the frequency-domain hierarchical reduction is equivalent to implicit moment-matching around s = 0 and that the hierarchical one-point reduction is numerically stable for general tree-structured circuits. We also showed that the hierarchical reduction preserves the reciprocity of passive circuit matrices. We presented a hierarchical multi-point reduction scheme for high-fidelity, wideband modeling of general passive and active linear circuits. A novel explicit waveform-matching algorithm was presented for searching dominant poles and their residues from different expansion points; it was shown to be more efficient than the existing pole-search algorithm. The passivity of the reduced models is enforced by a state-space-based optimization method. We also presented a general multi-port network realization framework that generates SPICE-compatible circuits as the macro-models of the reduced circuit admittance matrices. The resulting modeling algorithm can generate a multi-port passive SPICE-compatible model for any linear passive network with easily controlled model accuracy and complexity. Experimental results on a number of passive RF and interconnect circuits have shown that, for the same accuracy requirements, the HMOR modeling technique generates more compact models than existing approaches such as PRIMA and time-constant methods.
5.8 Historical notes on node-elimination-based reduction methods
As VLSI technology advances with increased operating frequency and decreased feature size, parasitics from on-chip interconnects and off-chip packaging will detune the performance of high-speed circuits in terms of slew rate, phase margin, and bandwidth [2]. Reducing design complexity, especially for extracted high-order RLCM networks, is important for efficient VLSI design verification. Compact modeling of passive RLC interconnect networks has been a research-intensive area in the past decade owing to increasing signal integrity effects and interconnect-dominant delay in current system-on-a-chip (SoC) design [72]. Existing approaches can be classified into two categories.
The first category is based on subspace projection [32, 37, 63, 85, 91, 113]. The projection-based method was pioneered by the asymptotic waveform evaluation (AWE) algorithm [91], where explicit moment-matching was used to compute dominant poles at low frequency. The Padé via Lanczos (PVL) [32] and Arnoldi transformation [113] methods improved the numerical stability of AWE; the congruence transformation method [63] and PRIMA [85] can further produce passive models. However, the reduced circuit matrices produced by PRIMA are larger than those obtained by direct pole matching (having more poles than necessary) [1], and PRIMA does not preserve certain important circuit properties, such as reciprocity [37]. The latest development, structured projection, can preserve reciprocity [37], but it does not realize the reduced circuit matrices. An efficient reduction algorithm based on matching a few orders of moments and on realization is proposed in [62]. In general, no systematic approach has been proposed for realizing order-reduced circuit matrices.
Another, quite different approach to circuit complexity reduction is by means of local node elimination and realization [3, 27, 98, 107, 108]. The major advantage of these methods over projection-based methods is that the reduction can be done in a local manner and no overall solution of the entire circuit is required, so that reduced models can be easily realized using RLCM elements. This idea was first explored in selective node elimination for RC circuits [27, 107], where time-constant analysis is used to select nodes for elimination. Node reduction for magnetically coupled interconnect (RLCM) circuits has recently become an active research area. Generalized Y-∆ transformation [98], RLCK circuit crunching [3], and branch merging [108] have been developed based on nodal analysis (NA), where inductance becomes susceptance in the admittance matrix.
Since mutual inductance is coupled via branch currents, an equivalent six-susceptance NA model is introduced in [98] to eliminate the two coupling current variables so that nodal reduction can be performed, and template matching via geometric programming is used to realize the order-reduced admittances. However, its accuracy depends heavily on the selection of templates, and only one-port realization has been reported. Meanwhile, the RLCK circuit crunching and branch merging methods are first-order approximations based on nodal time-constant analysis. The drawbacks of this first-order approximation are: (1) the error is controlled in a local manner and accumulates, so it is difficult to control the global error due to reduction; (2) not many nodes can be reduced if the elimination condition is not satisfied.
Another way to model and characterize complex interconnect structures at high frequencies (in the RF or even microwave range) is by means of rational approximation based on direct measurements or rigorous full-wave electromagnetic simulation [1, 21, 22, 46, 49, 82, 105]. Many of these methods have been used in RF and microwave circuit modeling, as they are very flexible and can be applied to different interconnect structures and to wideband modeling.
6 Terminal reduction of linear dynamic circuits
Complexity reduction and compact modeling of interconnect networks have been an intensive research area in the past decade, owing to increasing signal integrity effects and rising electric and magnetic couplings modeled by parasitic capacitors and inductors. Most previous research has focused on the reduction of internal circuitry by various reduction techniques. The most popular one is based on subspace projection [32, 37, 85, 91, 113]. The projection-based method was pioneered by the asymptotic waveform evaluation (AWE) algorithm [91], where explicit moment matching was used to compute dominant poles at low frequency. Later on, more numerically stable techniques were proposed [32, 37, 85, 113] by using implicit moment matching and congruence transformation. However, nearly all existing model order reduction techniques are restricted to suppressing the internal nodes of a circuit. Terminal reduction, by contrast, has been less investigated for compact modeling of interconnect circuits. Terminal reduction reduces the number of terminals of a given circuit under the assumption that some terminals are similar in terms of performance metrics such as timing or delay. Such reduction leads to some accuracy loss, but it can also lead to more compact models after traditional model order reduction has been applied to the terminal-reduced circuit, as shown in Figure 6.1. For instance, if we use subspace projection methods like PRIMA [85] for the model order reduction, a smaller terminal count leads to smaller reduced models, given the same block moment order requirement for both circuits. The reason is that for every increase in block moment order, PRIMA generates m new poles in the reduced model, where m is the number of terminals. Hence, fewer terminals directly lead to fewer poles in the reduced models. For terminal reduction, a natural question is whether there are many similar terminals in practical interconnect circuits.
One important observation is that many terminals in practical interconnect circuits are structurally close to each other, or are extracted by mathematical discretization through volume or surface meshing in methods such as finite-difference and finite-element schemes. As a result, their electrical characteristics are also similar in a well-designed VLSI system. Examples of such interconnects include clock sinks in clock networks, substrate planes, and critical interconnects in memory circuits such as word or bit lines. Recent studies [31, 34] show that there exists a large degree of correlation between the various input and output terminals. A terminal reduction method named SVDMOR was proposed [31, 34]. The method, which is based on a low-rank approximation, is performed on the input and output position matrices before the model order reduction process. However, the low-rank approximation in the existing methods is based only on the DC moment or a specific order of moments of the responses. Hence, the port-reduced systems may not correlate well with the original systems in terms of timing or delay.
[Figure 6.1 diagram; recoverable labels: model order reduction (MOR), H(s), H*(s); MOR after terminal reduction, H(s), H'(s). Terminal reduction leads to more compact models.]
Figure 6.1 Terminal reduction versus traditional model order reduction.
In this chapter, we present a novel terminal reduction method called TermMerg. The new method is based on the observation that if we allow some delay tolerance or variation, which cannot be avoided in today's VLSI chip manufacturing and working environments, some of the terminals with similar timing responses can be suppressed or merged into one representative terminal during the reduction without affecting their modeling functionality. In contrast to the existing terminal reduction methods, the new approach uses high-order moments as timing and delay metrics. Specifically, given some delay tolerances or variations, TermMerg employs singular value decomposition (SVD) to determine the number of clusters based on the low-rank approximation. The K-means clustering algorithm is then used to group the moments of the terminals into different clusters. After the clustering, we pick, for each cluster, the one terminal that best represents the other terminals. The new method can work with any passive model order reduction method and ensures passive models. In contrast, we show that SVDMOR does not generate passive models in general, and that passivity enforcement in SVDMOR significantly hampers the quality of the terminal reduction.
We also present an improved version of the SVDMOR method, called ESVDMOR, which improves on the computational efficiency of SVDMOR, and on its accuracy for similar reduced model sizes when the moment matrix used does not give a good terminal correlation.
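The three TermMerg steps outlined above (SVD-based cluster count, K-means grouping, and selection of a representative terminal) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the relative singular-value threshold `tol`, the farthest-point initialization, and the "closest to the centroid" rule for choosing a representative are all assumptions made for illustration. The input is assumed to be a matrix with one column of moments per terminal.

```python
import numpy as np

def count_clusters(M, tol):
    """Number of singular values of M larger than tol times the largest one."""
    sigma = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sigma > tol * sigma[0]))

def kmeans_columns(M, k, iters=50):
    """Plain Lloyd's k-means over the columns of M (one column per terminal)."""
    pts = M.T                                   # one row per terminal
    centers = [pts[0]]
    for _ in range(1, k):                       # farthest-point initialization
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)        # nearest center per terminal
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pts[labels == c].mean(axis=0)
    return labels, centers

def representatives(M, labels, centers):
    """For each non-empty cluster, the terminal index closest to its centroid."""
    pts = M.T
    reps = []
    for c in range(len(centers)):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        d = np.linalg.norm(pts[idx] - centers[c], axis=1)
        reps.append(int(idx[np.argmin(d)]))
    return reps

# Hypothetical moment vectors: terminals 0,1 are similar, as are terminals 2,3.
a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, 1.0])
M = np.column_stack([a, 1.01 * a, b, 0.99 * b])

k = count_clusters(M, tol=1e-3)
labels, centers = kmeans_columns(M, k)
reps = representatives(M, labels, centers)
print(k, labels, reps)
```

In this toy example the four terminals should collapse into two clusters, with one representative terminal kept from each.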
6.1 Review of the SVDMOR method
In this section, we briefly review the SVDMOR method for terminal reduction, which was proposed recently for reducing the terminals of interconnect circuits [31, 34]. For a linear RLC interconnect network with p input and q output terminals, we can apply modified nodal analysis to formulate it in the state-space equation form
$$G x(t) + C \dot{x}(t) = B u(t), \qquad y(t) = L x(t), \qquad (6.1)$$
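where $G \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{n \times n}$ are the conductive and storage element matrices, $L \in \mathbb{R}^{q \times n}$ and $B \in \mathbb{R}^{n \times p}$ are the output and input position matrices, $y(t) \in \mathbb{R}^{q}$, and $u(t) \in \mathbb{R}^{p}$. The state variables $x(t) \in \mathbb{R}^{n}$ can be nodal voltages or branch currents of the linear circuit. The circuit transfer function is
$$H(s) = L(G + Cs)^{-1}B. \qquad (6.2)$$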
Then the $i$th block moment of the system is defined as
$$m_i = L(-G^{-1}C)^i G^{-1}B, \qquad (6.3)$$
which is a $q \times p$ matrix. The block moment $m_i$ can be computed directly in a recursive way:
$$x_0 = G^{-1}B, \quad m_0 = Lx_0; \qquad x_1 = -G^{-1}Cx_0, \quad m_1 = Lx_1; \qquad \cdots \qquad x_i = -G^{-1}Cx_{i-1}, \quad m_i = Lx_i \ \text{ for } i > 0. \qquad (6.4)$$
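The recursion (6.4) maps directly onto repeated linear solves with G. The sketch below is illustrative only: the function name and the two-node RC ladder used as test data are assumptions, and a production code would factor G once (for example with an LU factorization such as `scipy.linalg.lu_factor`) instead of calling a dense solver at every step.

```python
import numpy as np

def block_moments(G, C, B, L, count):
    """Return [m_0, ..., m_{count-1}] using the recursion of (6.4)."""
    moments = []
    x = np.linalg.solve(G, B)             # x_0 = G^{-1} B
    for _ in range(count):
        moments.append(L @ x)             # m_i = L x_i
        x = -np.linalg.solve(G, C @ x)    # x_{i+1} = -G^{-1} C x_i
    return moments

# Hypothetical two-node RC ladder: R1 = R2 = 1, C1 = 1, C2 = 2, driven through
# R1 (Norton equivalent at node 1) and observed at node 2.
G = np.array([[2.0, -1.0], [-1.0, 1.0]])
C = np.diag([1.0, 2.0])
B = np.array([[1.0], [0.0]])
L = np.array([[0.0, 1.0]])

m = block_moments(G, C, B, L, 3)
print([float(mi[0, 0]) for mi in m])
```

For this ladder the first three moments should come out close to 1, −5, and 23; the magnitude of the first-order moment, 5, is the Elmore delay $R_1(C_1 + C_2) + R_2 C_2$ at the observed node.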
The SVDMOR method exploits the fact that many terminals are not independent in terms of their timing information, which is reflected in their frequency-domain moments. As a result, we can perform the singular value decomposition (SVD) on a block moment of a specific order. For instance, if we perform the SVD on the zeroth-order block moment (DC response) $m_0$, we have
$$m_0 = LG^{-1}B = U \Sigma V^T, \qquad (6.5)$$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with the singular values on the diagonal in decreasing order. If there are $k$ dominant singular values, we can use a rank-$k$ matrix (with a $k \times k$ full-rank $\Sigma_k$) to approximate the original $m_0$, based on SVD theory, as
$$m_0 = U \Sigma V^T \approx U_k \Sigma_k V_k^T. \qquad (6.6)$$
Notice that $U_k$ is a $q \times k$ matrix, $V_k^T$ is a $k \times p$ matrix, and $\Sigma_k$ is a $k \times k$ matrix. We can then write the following factorizations, which hold in the least-squares sense:
$$B \approx B_b V_k^T, \qquad (6.7)$$
$$L^T \approx L_c U_k^T, \qquad (6.8)$$
where $B_b \in \mathbb{R}^{n \times k}$ and $L_c \in \mathbb{R}^{n \times k}$ are obtained using the Moore–Penrose pseudoinverses of $V_k^T$ and $U_k^T$:
$$B_b = B V_k (V_k^T V_k)^{-1}, \qquad (6.9)$$
$$L_c = L^T U_k (U_k^T U_k)^{-1}. \qquad (6.10)$$
The circuit transfer function now becomes
$$H(s) \approx U_k L_c^T (G + Cs)^{-1} B_b V_k^T. \qquad (6.11)$$
Notice that the transfer function $H_r(s)$,
$$H_r(s) = L_c^T (G + Cs)^{-1} B_b, \qquad (6.12)$$
which appears inside (6.11), is a $k \times k$ matrix transfer function. It is the terminal-reduced transfer function of (6.2) and can be reduced by traditional Krylov subspace-based model order reduction methods. If the reduced transfer function of (6.12) is $\hat{H}_r(s)$, then the final order-reduced transfer function is
$$\hat{H}(s) = U_k \hat{H}_r(s) V_k^T. \qquad (6.13)$$
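The terminal-reduction steps (6.5) to (6.12) can be sketched with NumPy as follows. The function names and the small test circuit are hypothetical. The circuit is deliberately built so that all three inputs drive the same node and both outputs observe the same node, which makes the rank-1 factorization exact; for a general circuit the factorizations of B and L hold only approximately.

```python
import numpy as np

def svdmor_factors(G, B, L, k):
    """Rank-k terminal reduction factors from the SVD of m0 = L G^{-1} B."""
    m0 = L @ np.linalg.solve(G, B)                # q x p DC block moment (6.5)
    U, sigma, Vt = np.linalg.svd(m0, full_matrices=False)
    Uk, Vk = U[:, :k], Vt[:k, :].T                # q x k and p x k
    Bb = B @ Vk @ np.linalg.inv(Vk.T @ Vk)        # (6.9), n x k
    Lc = L.T @ Uk @ np.linalg.inv(Uk.T @ Uk)      # (6.10), n x k
    return Uk, Vk, Bb, Lc, sigma

def transfer(G, C, B, L, s):
    """H(s) = L (G + sC)^{-1} B, as in (6.2)."""
    return L @ np.linalg.solve(G + s * C, B)

# Hypothetical 3-node circuit with p = 3 correlated inputs and q = 2 outputs.
G = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
C = np.eye(3)
B = np.outer([1.0, 0.0, 0.0], [1.0, 1.0, 1.0])    # all inputs drive node 1
L = np.outer([1.0, 1.0], [0.0, 0.0, 1.0])         # both outputs observe node 3

Uk, Vk, Bb, Lc, sigma = svdmor_factors(G, B, L, k=1)
s = 2.0j
Hr = Lc.T @ np.linalg.solve(G + s * C, Bb)        # k x k reduced system (6.12)
H_full = transfer(G, C, B, L, s)
H_rebuilt = Uk @ Hr @ Vk.T                        # right-hand side of (6.11)
print(np.max(np.abs(H_full - H_rebuilt)))
```

Here the 2 × 3 transfer matrix is recovered from a 1 × 1 terminal-reduced system; a Krylov-based reduction would then be applied to `Hr` rather than to the full multi-terminal system.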
The SVDMOR method has several limitations. First, when the input and output terminals are quite different (especially in number), SVDMOR does not work very well, because it performs the terminal reduction on the input and output responses at the same time. As a result, it can only give a good approximation for the type of terminal with the smaller count (as the rank of $m_i$ is typically determined by the smaller of its row and column dimensions). We will compare SVDMOR with the improved SVDMOR, called ESVDMOR, in Section 6.3. The second problem with SVDMOR is that it is difficult to enforce passivity during the combined terminal and model order reduction, and passivity-preserving terminal reduction in SVDMOR leads to less effective terminal reduction. We will discuss the passivity issues in Subsection 6.6.4.
6.2 Input and output moment matrices
Our task is to find the terminals with similar delay or timing behavior so that they can be viewed as one terminal if some delay uncertainty is allowed. We focus on the timing metric of terminals and look at their timing responses to step or impulse inputs, but the new methods can be applied to other metrics of interest. Ideally, the delay or timing information should be represented by waveforms in the time domain. However, this is not the best representation for the terminal merging method, because all the waveforms have to be computed first and because it is difficult to compare two transient waveforms in general. Instead, we use terminal response moments in the frequency domain to represent their time-domain response information. The terminal merging algorithm is based on the observation that if two terminals have similar timing or delay, then they should have numerically similar moments (vectors). Remember that the moments computed in (6.17) represent the impulse responses of the outputs from the inputs. It is well known that the first moment $m_1$ represents the first-order delay approximation, or the Elmore delay, of the corresponding output with respect to a specific input. Higher-order moments represent more detailed timing and delay information. Hence, the moment vector is an ideal expression of timing information for the terminal merging algorithm. Since we need to merge both input and output terminals, we need to represent the timing information for both input and output terminals. As a result, we have two moment matrices, the input moment matrix $M_I$ and the output moment matrix $M_O$. For a 1 × 1 system, the system's transfer function $H(s)$ can be expanded into a Taylor series around $s = 0$. The coefficient of $s^i$ in the series expansion is the $i$th moment of the transfer function:
$$H(s) = m_0 + m_1 s + m_2 s^2 + m_3 s^3 + \ldots \qquad (6.14)$$
For a general linear (linearized) time-invariant network with p input and q output terminals, we can describe it as follows:
$$\dot{x} = Ax + Bu, \qquad y = Cx, \qquad (6.15)$$
where $x$ is an $n$-dimensional state vector, $u$ is a $p$-dimensional input vector, and $y$ is a $q$-dimensional output vector. The transfer function obtained by Laplace transformation is
$$H(s) = C(sI - A)^{-1}B. \qquad (6.16)$$
If we expand the above equation in a Taylor series at $s = 0$, we get the moments at the various terminals:
$$m_0 = -CA^{-1}B, \quad m_1 = -CA^{-2}B, \quad \ldots, \quad m_{r-1} = -CA^{-r}B, \quad \ldots \qquad (6.17)$$
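The step from (6.16) to (6.17) can be written out as a short worked derivation. It assumes that $A$ is nonsingular and that $s$ is small enough for the series to converge; note also that $C$ in (6.15) denotes the output matrix, not the storage element matrix of (6.1).

```latex
\begin{align*}
H(s) &= C(sI - A)^{-1}B \\
     &= -C\,(I - sA^{-1})^{-1}A^{-1}B
        && \text{since } sI - A = -A\,(I - sA^{-1}) \\
     &= -C\Bigl(\sum_{i=0}^{\infty} s^{i}A^{-i}\Bigr)A^{-1}B
        && \text{Neumann series, valid when } \rho(sA^{-1}) < 1 \\
     &= \sum_{i=0}^{\infty}\bigl(-CA^{-(i+1)}B\bigr)\,s^{i},
\end{align*}
% Matching coefficients with H(s) = m_0 + m_1 s + m_2 s^2 + ... gives
% m_i = -C A^{-(i+1)} B, i.e. m_0 = -CA^{-1}B, m_1 = -CA^{-2}B, ...
```

Comparing the coefficient of $s^i$ with the expansion in (6.14) gives $m_i = -CA^{-(i+1)}B$, which agrees with each entry listed in (6.17).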
Each moment $m_i$ is a $q \times p$ matrix,
$$m_i = \begin{bmatrix} m^i_{1,1} & m^i_{1,2} & \cdots & m^i_{1,p} \\ m^i_{2,1} & m^i_{2,2} & \cdots & m^i_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ m^i_{q,1} & m^i_{q,2} & \cdots & m^i_{q,p} \end{bmatrix}, \qquad (6.18)$$
where each column $j$ in $m_i$ represents the moment vector of all output terminals from the input terminal $j$, and each row $k$ in $m_i$ represents the moment vector at the output terminal $k$ from all input terminals. Then a moment matrix can be written as
$$M = \begin{bmatrix} m_0 & m_1 & \ldots & m_{r-1} & \ldots \end{bmatrix}. \qquad (6.19)$$
To perform terminal reduction for both inputs and outputs, different moment matrices are constructed. For the output terminal reduction, we define the output moment matrix $M_O$ as
$$M_O = \begin{bmatrix} m_0^T \\ m_1^T \\ \vdots \\ m_{r-1}^T \end{bmatrix}, \qquad (6.20)$$
where each column $j$ represents a moment series of output node $j$ from the stimuli of all the inputs. Notice that in this way the output terminals' responses are taken with respect to all the inputs, to make sure they are similar under all the inputs. Similarly, for input terminal reduction, the input moment matrix $M_I$ is defined as
$$M_I = \begin{bmatrix} m_0 \\ m_1 \\ \vdots \\ m_{r-1} \end{bmatrix}, \qquad (6.21)$$
where each column $k$ represents a moment series at all the output nodes from an input node $k$. To determine the best order of moments, $r$, we use the following rule: the number of moments from all the inputs should be equal to or larger than the number of terminals to be merged. The reason is that in the worst case, where no terminals can be merged, we should be able to distinguish all the terminals using the moment information. This will become clearer in the following section. As a result, we have
$$rp \ge q \quad \text{for } M_O, \qquad (6.22)$$
$$p \le rq \quad \text{for } M_I. \qquad (6.23)$$
When p = q, any r satisfies (6.22) and (6.23), so we can simply select r to be 2 or 3. If the numbers of input and output terminals are very large, we cannot apply SVD to do the terminal reduction directly (this is also true for [31]), as SVD has about $O(n^3)$ complexity, where n is the size of the matrix [44]. However, we can do the terminal reduction hierarchically: we partition the circuit into k subcircuits such that the number of terminals in each subcircuit is small enough for SVD, as shown in [34]. Other special cases are discussed in Subsection 6.6.3. Moments can be computed efficiently by recursively solving the given circuits using a traditional SPICE-like simulation technique [91]. After we obtain the moment matrix, as shown in (6.20) or (6.21), we proceed to find the optimum number of clusters using the singular value decomposition method shown in the next section.
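The construction of (6.17), (6.20) and (6.21) can be sketched in a few lines of NumPy. This is only an illustrative sketch for small dense systems, not the SPICE-like recursive solver mentioned above; the function names `block_moments`, `output_moment_matrix` and `input_moment_matrix` are hypothetical and do not come from the text.

```python
import numpy as np

def block_moments(A, B, C, r):
    """Return [m_0, ..., m_{r-1}] with m_k = -C A^{-(k+1)} B, as in (6.17)."""
    moments = []
    X = B
    for _ in range(r):
        # Recursive solve: after k+1 passes, X = A^{-(k+1)} B.
        # (A real implementation would factor A once and reuse the factors.)
        X = np.linalg.solve(A, X)
        moments.append(-C @ X)  # each block moment is q x p
    return moments

def output_moment_matrix(moments):
    """M_O of (6.20): stack m_k^T vertically, shape (r*p, q), one column per output."""
    return np.vstack([m.T for m in moments])

def input_moment_matrix(moments):
    """M_I of (6.21): stack m_k vertically, shape (r*q, p), one column per input."""
    return np.vstack(moments)
```

With p inputs, q outputs and r block moments, `output_moment_matrix` has rp rows and q columns, which is why (6.22) asks for rp ≥ q: at least as many rows as columns are needed before all q output columns can be distinguished.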
6.3
The extended-SVDMOR (ESVDMOR) method
In this section, we present an extended terminal and model order reduction algorithm, ESVDMOR, which improves the SVDMOR method. The basic idea of the new method is to perform the SVD low-rank approximation for the input and output terminals separately, and to use higher-order moment information during the SVD approximation to find the true terminal independence and ensure the accuracy of the reduced models.
6.3.1
The ESVDMOR terminal reduction algorithm
The main idea of the new terminal reduction method is to perform the SVD approximation on the input and output moment responses separately, using high-order moment information. We basically follow the terminal reduction framework of the SVDMOR method [34], but with different moment matrices. We also improve the efficiency of SVDMOR by saving one computation step, as shown later. The major problem with the SVDMOR method is that both input and output responses are considered at the same time during SVD, because one specific order of block moments $m_i$ is used. It therefore cannot accommodate higher-order moment information. To mitigate this problem, we create new moment matrices for input and output terminals separately. In this way, we can use higher-order moments during the SVD process for input and output responses. Next, we perform singular value decomposition on both the input moment response matrix $M_I$ and the output moment response matrix $M_O$:
$$M_I = U_I \Sigma_I V_I^T \approx U_{Ik_i} \Sigma_{k_i} V_{Ik_i}^T, \tag{6.24}$$
$$M_O = U_O \Sigma_O V_O^T \approx U_{Ok_o} \Sigma_{k_o} V_{Ok_o}^T, \tag{6.25}$$
where $\Sigma_{k_i}$ is a $k_i \times k_i$ diagonal matrix and $k_i$ is the number of significant singular values of the matrix $M_I$; $V_{Ik_i}^T$ is a $k_i \times p$ matrix. Similarly, $\Sigma_{k_o}$ is a $k_o \times k_o$ diagonal matrix and $k_o$ is the number of significant singular values of the matrix $M_O$; $V_{Ok_o}^T$ is a $k_o \times q$ matrix. Then we can perform the low-rank approximation of the input and output position matrices B and L respectively:
$$B = B_r V_{Ik_i}^T, \tag{6.26}$$
$$L = V_{Ok_o} L_r, \tag{6.27}$$
where $B_r \in R^{n \times k_i}$ and $L_r \in R^{k_o \times n}$ are obtained by computing the Moore–Penrose pseudo-inverses of $V_{Ik_i}$ and $V_{Ok_o}$ respectively:
$$B_r = B V_{Ik_i} (V_{Ik_i}^T V_{Ik_i})^{-1}, \tag{6.28}$$
$$L_r = (V_{Ok_o}^T V_{Ok_o})^{-1} V_{Ok_o}^T L. \tag{6.29}$$
Notice that both $V_{Ik_i}$ and $V_{Ok_o}$ have orthonormal columns, i.e. $V_{Ik_i}^T V_{Ik_i} = I$ and $V_{Ok_o}^T V_{Ok_o} = I$. Therefore, (6.28) and (6.29) can be further simplified to
$$B_r = B V_{Ik_i}, \tag{6.30}$$
$$L_r = V_{Ok_o}^T L. \tag{6.31}$$
Therefore, we save one computation step compared with the SVDMOR method [34]. As a result, the circuit transfer function now becomes
$$H(s) = V_{Ok_o} L_r (G + Cs)^{-1} B_r V_{Ik_i}^T. \tag{6.32}$$
Consequently, we get a terminal-reduced subsystem with transfer function $H_r(s)$:
$$H_r(s) = L_r (G + Cs)^{-1} B_r. \tag{6.33}$$
For this subsystem, the standard model order reduction techniques [32,37,85,91,113] can now be applied. Considering both terminal and model order reductions, we obtain the order-reduced transfer function $\hat{H}_r(s)$,
$$\hat{H}_r(s) = \hat{L}_r (\hat{G} + \hat{C}s)^{-1} \hat{B}_r, \tag{6.34}$$
where
$$\hat{G} = V^T G V; \quad \hat{C} = V^T C V; \quad \hat{B}_r = V^T B V_{Ik_i}; \quad \hat{L}_r = V_{Ok_o}^T L V, \tag{6.35}$$
and V is the projection matrix for reducing the system of (6.33). The final reduced transfer function becomes
$$\hat{H}(s) = V_{Ok_o} \hat{L}_r (\hat{G} + \hat{C}s)^{-1} \hat{B}_r V_{Ik_i}^T. \tag{6.36}$$
6.3.2
ESVDMOR algorithm flow
In this subsection, we give the whole combined terminal and model order reduction flow of the ESVDMOR method.
Algorithm 6.1: Extended SVDMOR: ESVDMOR
1. Compute the block moments $m_i$ up to the (r − 1)th order using (6.4).
2. Construct the input and the output moment response matrices defined in (6.21) and (6.20) respectively.
3. Perform the SVD-based low-rank approximation on the position matrices B and L in (6.1) using (6.30) and (6.31).
4. Perform the normal Krylov-subspace-based MOR on the terminal-reduced system (6.33) and perform the transformations using (6.35).
5. Compute the final reduced system $\hat{H}(s)$ using (6.36).
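Step 3 of the flow, i.e. (6.24), (6.25), (6.30) and (6.31), might look as follows in NumPy. This is a minimal sketch under the assumption that the truncation ranks `k_i` and `k_o` have already been chosen; the function name `esvdmor_terminal_reduction` is hypothetical, and the Krylov-subspace reduction of steps 4 and 5 is not included.

```python
import numpy as np

def esvdmor_terminal_reduction(B, L, M_I, M_O, k_i, k_o):
    """Low-rank terminal reduction of the position matrices B (n x p) and L (q x n)."""
    # Right singular vectors of M_I (r*q x p) span the input-terminal space.
    _, _, VIt = np.linalg.svd(M_I, full_matrices=False)
    # Right singular vectors of M_O (r*p x q) span the output-terminal space.
    _, _, VOt = np.linalg.svd(M_O, full_matrices=False)
    V_I = VIt[:k_i].T   # p x k_i, orthonormal columns
    V_O = VOt[:k_o].T   # q x k_o, orthonormal columns
    B_r = B @ V_I       # (6.30), n x k_i
    L_r = V_O.T @ L     # (6.31), k_o x n
    return B_r, L_r, V_I, V_O
```

The original position matrices are then approximated by `B_r @ V_I.T` and `V_O @ L_r`, matching (6.26) and (6.27). No pseudo-inverse is formed because the truncated singular-vector matrices have orthonormal columns.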
6.3.3
Numerical examples for ESVDMOR method
In this subsection, we compare the SVDMOR and ESVDMOR methods on one RC example. We consider only the DC moment for the SVDMOR method. The example RC circuit, net27, has 14 inputs and 118 outputs with a model order of 182. Because there are more outputs than inputs, the number of independent input terminals can be determined by using only the DC moment as $M_I$. However, higher-order moment information is needed to find the number of independent output terminals. Here we use the first nine orders of block moments to construct the output moment matrix $M_O$, which covers the worst case in which all the output terminals are independent of each other. The singular values of $m_0$ (which is used by SVDMOR), $M_I$ and $M_O$ are shown in Table 6.1. From the table we can see that we need only one input and one output terminal after terminal reduction by the SVDMOR method. By using $M_I$ and $M_O$ in the ESVDMOR method, we find that there is more than one dominant singular value of $M_O$. We choose one input and five outputs to make the reduced model more accurate.

Table 6.1 Singular values of DC moment, input moment matrix and output moment matrix of the circuit net27.
Index   m0                  MI                  MO
1       5.1587              5.1587              19.828
2       3.9883 × 10^−14     3.9883 × 10^−14     4.4677
3       1.6681 × 10^−14     1.6681 × 10^−14     1.6517
4       –                   –                   0.3045
5       –                   –                   0.0348
6       –                   –                   2.4611 × 10^−3
7       –                   –                   1.6134 × 10^−4
To compare the accuracy between the SVDMOR and ESVDMOR methods in a fair way, we make sure that the reduced models for both algorithms have the same number of poles. If we use the same order of block moments, then more terminals will lead to more poles using the Krylov-subspace MOR methods. For circuit net27, six poles are used to approximate the original model for both SVDMOR and ESVDMOR methods. The results are shown in Figure 6.2, which shows the frequency responses corresponding to the second input and the tenth output.

Figure 6.2 Frequency responses from SVDMOR and ESVDMOR for net27 circuit. (Plot of magnitude versus frequency; curves: original model, SVDMOR model, ESVDMOR model.)
From these frequency response results, we can see that the ESVDMOR model matches the original model up to 5 GHz. In contrast, the SVDMOR reduced model matches only up to 500 MHz. This clearly shows the advantage of the ESVDMOR method over the SVDMOR method when the input and output terminals have different dependencies (different ranks in their moment response matrices). One may argue that ESVDMOR is more accurate simply because it uses more terminals. In fact, if we use five inputs and five outputs in the SVDMOR method, the results are still poor compared with the ESVDMOR method for the same six poles. The results are shown in Figure 6.3, where SVDMOR model 2 refers to the SVDMOR results with five input and five output terminals. So simply increasing the number of terminals in SVDMOR does not improve the model accuracy.
Figure 6.3 Frequency response from SVDMOR and ESVDMOR with different terminals for net27 circuit. (Plot of magnitude versus frequency; curves: original model, SVDMOR model, SVDMOR model 2, ESVDMOR model.)

6.4
Determination of cluster number by SVD
In this section, we present the new TermMerg method, together with an algorithm that finds the best number of independent clusters by applying singular value decomposition to the input and output moment matrices discussed in the previous section. If two terminals have similar timing responses, their moments have very similar values. If we have a number of terminals with similar timing behaviors, their moment matrix, where each moment series is a column or a row, will be a low-rank matrix. Singular value decomposition deals very efficiently with rank-deficient matrices and can reveal a great deal about the rank and structure of a matrix, which motivates us to find the optimal number of clusters based on the moment matrices. For an m × n matrix A, the SVD of A is
$$A = U_{m \times m} \Sigma V_{n \times n}^T, \tag{6.37}$$
where $U_{m \times m}$ and $V_{n \times n}$ are orthogonal matrices, $U_{m \times m}^T U_{m \times m} = I$ and $V_{n \times n}^T V_{n \times n} = I$, $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{\min(m,n)})$, the $\sigma_i$ are called singular values, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)}$. Before we present the cluster number determination method, we review an important result on the SVD [44]. For the SVD of a matrix A, if k < r = rank(A) and
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T, \tag{6.38}$$
then
$$\min_{\mathrm{rank}(B)=k} \| A - B \|_2 = \| A - A_k \|_2 = \sigma_{k+1}, \tag{6.39}$$
where $\{u_i\}$ and $\{v_i\}$ are the left and right singular vectors respectively. Equation (6.39) reflects the fact that the rank-k approximation matrix $A_k$ is just $\sigma_{k+1}$ away from the original matrix A in terms of 2-norm distance. In summary, for a matrix A, SVD can do two things. First, SVD can tell us numerically how many rows or columns are independent (the true rank of A). Second, SVD can give a good low-rank approximation to the original matrix, $A \approx A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. It is difficult, however, to find a direct relationship between the left and right singular vectors $u_i$ and $v_i$ and the columns of the matrix A, as those vectors are the dominant (independent) columns of the original matrix expressed in a different coordinate system. The rank information nevertheless remains valid for the columns of the original matrix A.

In our problem, we use only the true (low) rank information provided by the SVD, not the singular vectors. Specifically, we look at the singular values and pick a sufficiently small singular value $\sigma_{k+1}$ to determine the rank. In the moment matrix, each column represents the moment response of an input or an output terminal. If two terminals have the same response, their columns are numerically the same and thus dependent. Hence, the selected rank k can be viewed as the number of clusters we expect, since SVD reveals the true rank of a matrix numerically and the rank of A equals the number of independent columns of A. The true rank information from SVD is therefore a good guide for the later K-means-based clustering method that finds the independent terminal set.

We notice that a similar approach was applied to the efficient clustering of video images, to generate a compact representation of a video sequence that enables fast access to video contents [69]. That method performs the SVD on a transformed matrix, whose column vectors correspond one-to-one with those of the original matrix, and then performs the K-means clustering on the transformed matrix to speed up the clustering process. The clustering, however, is essentially still performed on the columns of the original matrix. We set a threshold ζ for the small singular value $\sigma_{k+1}$. Typically ζ is set to $10^{-3}$ in our experiments, which means we regard $\sigma_{k+1}$ as sufficiently small only if $\sigma_{k+1} \le \zeta$.
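The result (6.39) is easy to check numerically. The sketch below builds the truncated sum (6.38) from NumPy's SVD; the helper name `rank_k_approx` is an illustrative assumption rather than something defined in the text.

```python
import numpy as np

def rank_k_approx(A, k):
    """Return A_k = sum_{i<=k} sigma_i u_i v_i^T as in (6.38), plus all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]
    return A_k, s

# Usage: the 2-norm error of the rank-k approximation equals sigma_{k+1}.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A_2, s = rank_k_approx(A, 2)
error = np.linalg.norm(A - A_2, 2)   # compare with s[2], i.e. sigma_3
```

Here `s[2]` is $\sigma_3$ because NumPy indexes from zero, so `error` and `s[2]` agree to rounding error.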
In the new method, we select the approximation order based not only on the absolute value of the sufficiently small singular value, but also on the relative ratio between two adjacent singular values. If the ratio between two adjacent singular values is close to 1, increasing the approximation order by one makes no big difference. In this case, we keep increasing the approximation order until the ratio becomes small enough. Thus we first define a threshold ε. Then we compare two adjacent singular values $\sigma_{k+1}$ and $\sigma_k$. If $\sigma_{k+1}/\sigma_k \le \varepsilon$ and $\sigma_{k+1} \le \zeta$, then k is our choice. Typically ε is set to $10^{-3}$ in the experiments.
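The two-threshold rule can be written as a short function. This is a sketch of one possible reading of the rule: the name `choose_cluster_number` is hypothetical, and returning the full count when no index satisfies both tests is an assumption, since the text does not say what to do in that case.

```python
import numpy as np

def choose_cluster_number(singular_values, eps=1e-3, zeta=1e-3):
    """Smallest k with sigma_{k+1} <= zeta and sigma_{k+1} / sigma_k <= eps."""
    s = np.asarray(singular_values, dtype=float)
    for k in range(1, len(s)):
        sigma_k, sigma_next = s[k - 1], s[k]   # sigma_k and sigma_{k+1}
        # The ratio test is written as a product to avoid dividing by zero.
        if sigma_next <= zeta and sigma_next <= eps * sigma_k:
            return k
    # Assumption: no sufficiently sharp drop was found, so keep every terminal.
    return len(s)

# Usage with the m0 column of Table 6.1 (net27):
k_in = choose_cluster_number([5.1587, 3.9883e-14, 1.6681e-14])
```

For the $m_0$ singular values of Table 6.1 this gives `k_in == 1`, consistent with the single input terminal reported for net27.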
6.5
K-means clustering algorithm
After we determine the number of clusters for the output (or input) terminals, say k, we proceed to group the output (or input) terminals into the k clusters. In the new approach, we apply the so-called K-means algorithm to determine each cluster [26]. K-means is a widely used clustering algorithm for partitioning (or clustering) N data points into k disjoint subsets $S_j$, each containing $N_j$ data points, so as to minimize the sum-of-squares criterion
$$J = \sum_{j=1}^{k} \sum_{i \in S_j} |x_i - \varphi_j|^2, \tag{6.40}$$
where $x_i$ is a vector representing the ith data point, and $\varphi_j$ is a vector representing the geometric centroid of the data points in $S_j$. It has wide applications in data mining and data analysis. However, the K-means clustering algorithm is very sensitive to the initial choice of cluster centers and to the number of clusters [26]. Only the appropriate number of clusters produces the best approximation result. Fortunately, in this method, the cluster number k can first be obtained from the SVD of the moment matrix, as mentioned above. Consider a general moment matrix M ($M = M_O$ or $M = M_I$, depending on which terminals are reduced) with l terminals to be merged, and let $M = [c_1, c_2, \ldots, c_l]$. The clustering algorithm, which is based on the K-means clustering scheme, is shown in Figure 6.4.

K-MeansCluster
Construct Cluster(k)
  1 select k seed vectors out of $c_1, c_2, \ldots, c_l$ as k centroids
  2 put the rest of the unselected vectors into the nearest cluster $S_j$
  3 compute $\varphi_j$ for each cluster $S_j$
  4 do re-cluster all $c_i$ into k clusters according to $\varphi_j$ of each cluster
  5   recompute $\varphi_j$ for each cluster $S_j$
  6 until no change in $\varphi_j$
  7 return $S_1, S_2, \ldots, S_k$ and $\varphi_1, \varphi_2, \ldots, \varphi_k$
Select Rep Vector($S_1, S_2, \ldots, S_k$)
  1 for $i \in S_j$, compute $\|d_i\|_2 = \|c_i - \varphi_j\|_2$
  2 find $R_j = c_i$ so that $\min\{\|d_i\|_2\}$
  3 return R
  4 end
Figure 6.4 K-means clustering algorithm. Reprinted with permission from [75] (c) 2005 IEEE.
There are two steps (algorithms) in the clustering method. The first step, Construct Cluster(k), clusters the given l vectors into k clusters (l > k). The second step, Select Rep Vector($S_1, S_2, \ldots, S_k$), takes the output of the clustering algorithm as its input and finds the representative vector for each cluster. The representative vectors $R_j$ will be kept during the terminal reduction. All the other, unselected vectors will be suppressed. The basic idea of Construct Cluster() is to dynamically find the k clusters so that each vector is closer to the geometric centroid $\varphi_j$ of its own cluster $S_j$ than to any other centroid. Construct Cluster() uses an iterative algorithm that minimizes the sum of distances from each vector to its cluster centroid vector $\varphi_j$, over all clusters. This algorithm moves vectors among clusters until the sum cannot be decreased any more. The second algorithm, Select Rep Vector(), finds the representative vector for each given cluster $S_j$. This is achieved by calculating the distance between the centroid and each vector belonging to the cluster, to find the terminal closest to the centroid. Since we use the representative terminal to represent all the other terminals in a cluster, the transfer functions of the reduced terminals (and thus their zeros) will differ from the original ones. But we ensure that their expanded moment forms are as close as possible among terminals in a cluster (as we work on the moment matrix). Hence we expect the zeros of the two transfer functions (for the representative terminal and the reduced terminal) to be similar.
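The two routines of Figure 6.4 might be sketched as below. The figure does not say how the seed vectors are selected, so the farthest-point seeding used here is an assumption made only to keep the example deterministic. The names `construct_cluster` and `select_rep_vector` mirror the pseudocode, but the code itself is illustrative and is not the authors' implementation.

```python
import numpy as np

def _sq_dist(M, centers):
    """Squared distances between columns of M (d x l) and of centers (d x k): l x k."""
    return ((M[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)

def construct_cluster(M, k, max_iter=100):
    """Cluster the l columns of M into k groups; return labels and centroids."""
    M = np.asarray(M, dtype=float)
    seeds = [0]  # assumed seeding: start from column 0, then add the farthest columns
    for _ in range(1, k):
        seeds.append(int(_sq_dist(M, M[:, seeds]).min(axis=1).argmax()))
    centroids = M[:, seeds].copy()
    labels = None
    for _ in range(max_iter):
        new_labels = _sq_dist(M, centroids).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments (and hence centroids) no longer change
        labels = new_labels
        for j in range(k):
            members = M[:, labels == j]
            if members.shape[1] > 0:  # keep the old centroid if a cluster empties
                centroids[:, j] = members.mean(axis=1)
    return labels, centroids

def select_rep_vector(M, labels, centroids):
    """For each cluster, return the index of the column closest to its centroid."""
    M = np.asarray(M, dtype=float)
    reps = []
    for j in range(centroids.shape[1]):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # an empty cluster has no representative
        dist = np.linalg.norm(M[:, idx] - centroids[:, [j]], axis=0)
        reps.append(int(idx[dist.argmin()]))
    return reps
```

To cluster output terminals one would pass $M_O$ as `M`, so that each column is the moment series of one output; the returned indices are the representative terminals that are kept.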
6.6
TermMerg algorithm
In this section, we first present the whole terminal reduction flow of the TermMerg algorithm. Then we give the compact modeling flow, which combines the terminal reduction with traditional model order reduction. Finally we discuss some practical considerations of the algorithm.
6.6.1
TermMerg algorithm flow
In this subsection, we present the flow of the TermMerg method.
Algorithm 6.2: TermMerg
1. Construct the input and the output moment matrices.
2. Perform the SVD on the input and/or the output moment matrices to find the best cluster numbers.
3. Invoke K-MeansCluster to find the representative terminal for each cluster.
After the terminal reduction, the input position matrix B and the output position matrix C in (6.15) are modified to include only the representative terminals. We can define terminal selection matrices for this operation on B and C.
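One way to realize the terminal selection matrices is sketched below, assuming the state-space convention of (6.15) with B of size n × p and C of size q × n. The helper name `selection_matrix` is hypothetical.

```python
import numpy as np

def selection_matrix(num_terminals, reps):
    """Return S (num_terminals x len(reps)) with S[reps[j], j] = 1 and zeros elsewhere."""
    S = np.zeros((num_terminals, len(reps)))
    S[reps, np.arange(len(reps))] = 1.0
    return S

# Usage: keep inputs 1 and 3 out of p = 5, and output 0 out of q = 2.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 5))      # n x p
C = rng.standard_normal((2, 4))      # q x n
S_in = selection_matrix(5, [1, 3])
S_out = selection_matrix(2, [0])
B_red = B @ S_in                     # n x 2, columns of the representative inputs
C_red = S_out.T @ C                  # 1 x n, rows of the representative outputs
```

Unlike the dense matrices $V_{Ik_i}$ and $V_{Ok_o}$ of ESVDMOR, a selection matrix only picks existing columns or rows, so every reduced terminal is still a physical terminal of the circuit.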
6.6.2
Modeling flow based on combined terminal and model order reduction
After the terminal reduction, we apply a traditional model order reduction technique to the terminal-reduced circuit, as shown in Figure 6.5. The new terminal reduction can work with any existing model order reduction technique, such as projection-based methods [85], TBR-based methods [87], or hierarchical reduction methods [121].
Figure 6.5 The reduction flow of combined terminal and model order reductions. (Block diagram: the original system H(s) is reduced by model order reduction to H'(s), which is connected through interface circuits.)
To use the terminal and order reduced models, we may need to build simple interface circuits to connect the original terminals to the representative terminals. For output terminals, we may just physically connect the representative terminals to all the fan-out nodes to which the reduced terminals were previously connected (i.e., no interface circuitry is required). For input terminals, to avoid adverse coupling among different sources with low impedance, we may use controlled sources (for instance, current-controlled current sources (CCCS)) to connect several input sources together. For input sources with high impedance, we can just physically connect them together (again, no interface circuitry is required). Figure 6.6 illustrates a simple interface circuit for the input and output terminals. The admittance for each CCCS is the corresponding input admittance of the represented input terminal, which can be passively realized using Foster's realization method [93].
Figure 6.6 Simple interface circuit. (Schematic: CCCS elements carrying currents i connect the sources to input clusters 1 to K; output clusters 1 to S are shown on the output side.)
6.6.3
Practical implementation and consideration
We have mentioned that we determine the order of moments r from the requirement that the number of moments in the moment series from all the inputs should be equal to or larger than the number of terminals to be merged, as shown in (6.22) and (6.23). In this way, we can obtain the complete information of the moment matrix after singular value decomposition. However, if we have many outputs and only a few inputs, we end up with a large r. In other words, we have to use very high order moments. But high-order moments are actually not very informative, as numerically they contain mainly the dominant pole information [91]. The same holds for interconnect circuits with many inputs and a few outputs. On the other hand, a large number of outputs or inputs does not imply that the circuit has more independent outputs and inputs, or clusters. In practice, the final cluster number may still be very small compared with the number of outputs and inputs. In this case, we do not need much high-order moment information to distinguish those terminals. This motivates the following schemes. We pre-define an effective cluster number $q_e$, which is smaller than the number of terminals q to be merged but is typically larger than the resulting number of clusters. For example, we may define $q_e = \lceil q/i \rceil$, i = 2, 3, . . ., depending on the circuit, where $\lceil x \rceil$ denotes rounding x up to the nearest integer. Then the order of moments r would be equal to or larger than $q_e/p$. If the cluster number found by the SVD method at some threshold is equal to $q_e$, the pre-defined number of clusters is too small to find the optimal cluster number at this threshold. We then either increase the threshold value (at the cost of more approximation error) or increase the pre-defined cluster number $q_e$ and re-cluster the terminals.
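The arithmetic behind the effective cluster number is small enough to spell out. The sketch below assumes that r is rounded up to the smallest integer satisfying (6.22) with q replaced by $q_e$; both helper names are hypothetical.

```python
def effective_cluster_number(q, i):
    """q_e = ceil(q / i) for a user-chosen divisor i = 2, 3, ..."""
    return -(-q // i)   # integer ceiling division

def moment_order(num_to_merge, num_other):
    """Smallest integer r with r * num_other >= num_to_merge, as in (6.22)/(6.23)."""
    return -(-num_to_merge // num_other)

# Usage: q = 118 outputs, p = 14 inputs, divisor i = 2.
q_e = effective_cluster_number(118, 2)   # 59
r = moment_order(q_e, 14)                # smallest r with 14 r >= 59
```

Using integer ceiling division avoids floating-point rounding; with $q_e$ = 30 and p = 6, for instance, `moment_order` returns 5.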
If we already have some prior knowledge about the circuit terminals, it will be very helpful in choosing the effective cluster number $q_e$. Another way is to use a hierarchical terminal reduction scheme, which is suitable for a large number of resulting clusters. Specifically, we first partition the terminals to be merged into a number of groups, so that the corresponding moment matrix M of each group does not need the higher moments. We then perform SVD on each group and find several representative terminals from each group using the K-means clustering method. After this, we perform SVD on all the representative terminals to find the best global cluster number. Then we run the K-means-based clustering algorithm on those selected representative terminals to find the global representative terminals. We may have several hierarchical levels when the number of terminals to be merged is large. Another issue is that there may exist DC paths from the inputs to the outputs. In this case, (6.16) is written as $H(s) = C(sI - A)^{-1}B + D$, and the zeroth moment becomes $m_0 = -CA^{-1}B + D$. If the zeroth moments of those DC-coupled terminals are quite different, they will probably not be clustered together. So the method can still be applied.
6.6.4
Passivity analysis
Passivity is an important issue for compact modeling. First we show that the proposed TermMerg terminal reduction algorithm can always guarantee the passivity of reduced models when the model order reduction is passive, regardless of the input and output position matrices B and L. We have the following result:
Proposition 6.1. The TermMerg method can always lead to passive models for any RLCK circuits.
The proof is obvious. For any given RLCK circuit with sets of input and output terminals, after the TermMerg reduction we have a new set of input and output terminals. To make the terminal-reduced system passive, one can use two reduction approaches. One is the projection-based method. In this case, one first makes all the remaining terminals bidirectional (they are both inputs and outputs at the same time). As a result, the terminal-reduced input and output position matrices are the same, and the projection-based method can generate a passive model for this circuit. The second approach is passive truncated balanced realization (TBR), which can passively reduce RLCK circuits with different B and $L^T$ directly, but at higher computational cost [88]. As we can see, as long as the model order reduction is a passive process, the combined terminal and model order reduction process is passive for the new method. However, this is not the case for SVDMOR, as SVDMOR is strongly coupled with the model order reduction process. We analyze two cases. First we have $B \ne L^T$. In this case, $B_r$ and $L_r^T$ are not identical and Vk and Vk are not identical either. As a result, the projection-based MOR method cannot produce passive models. This is also true for the passive TBR method as Vk and Vk are not identical. One may think that one can make all the terminals bidirectional by making B and $L^T$ equal. But we show below that such a simple strategy for enforcing passivity may significantly reduce the quality of the terminal reduction. Table 6.2 lists the singular values obtained by SVD of the 0th and 1st moment matrices for circuit net1026 with both input and output terminals treated as bidirectional.

Table 6.2 Singular values of the DC admittance moment, 1st order admittance moment matrices of the circuit net1026 when all the terminals are treated as bidirectional.
Index   m0                  m1
1       0.58970             0.42268
2       0.55093             0.33515
3       0.50215             0.30440
4       0.41875             0.28897
5       0.31229             0.25942
6       0.064575            0.24131
...     ...                 ...
258     0.0023181           0.0099492
259     0.0013246           0.0099117
260     0.00059548          0.0098789
261     0.0001499           0.0098567
262     2.9535 × 10^−17     0.0096795
From Table 6.2, we can see that the singular values decay very slowly, so the terminals cannot be reduced very much. In fact, the numerical rank of the moment matrix $m_i$ (given by the Matlab rank command) is 261 for $m_0$ and 262 for $m_1$, which is the full rank of the moment matrix. This is true for other test cases as well. As a result, we observe that SVDMOR is not efficient for reducing bidirectional terminals. One obvious reason is that many output terminals now become input terminals. The responses excited by those terminals then have to be considered for all the other terminals, which makes the reduction more difficult or almost impossible. In the second case, we consider $B = L^T$. To ensure passivity, we require that $B_r$ and $L_r^T$ are the same and that Vk and Vk are identical as well, owing to the requirements of the congruence transformation. As a result, the moments $m_i$ must be symmetric. It can be proved that when the inputs to the original models consist only of current sources, or only of voltage sources, for RLCK circuits, $m_i$ is symmetric. If both current and voltage sources are present, $m_i$ will not be symmetric and terminal reduction by SVDMOR will not ensure passivity. But as we have already shown, $B = L^T$ leads to a reduction of bidirectional terminals, which cannot be reduced effectively by SVDMOR. In short, we observe that SVDMOR in general does not lead to passive terminal reduction. When it does produce passive terminal reduction, it does not work well (or does not work at all). In contrast, TermMerg can always ensure passive terminal reduction by using a passive model order reduction method.
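The role of the congruence transformation can be illustrated numerically. The sketch below relies on a standard sufficient condition for projection-based passive reduction, namely $G + G^T \succeq 0$, $C = C^T \succeq 0$ and $B = L^T$; that condition is brought in here as an assumption and is not derived in this section. The names `congruence_reduce` and `is_psd` are hypothetical.

```python
import numpy as np

def congruence_reduce(G, C, B, V):
    """Project with the same V on both sides: V^T G V, V^T C V, V^T B."""
    return V.T @ G @ V, V.T @ C @ V, V.T @ B

def is_psd(S, tol=1e-10):
    """True if the symmetric part of S has no eigenvalue below -tol (scaled)."""
    S = 0.5 * (S + S.T)
    eig = np.linalg.eigvalsh(S)
    return bool(eig.min() >= -tol * max(1.0, abs(eig).max()))

# Usage on a synthetic system that satisfies the assumed conditions.
rng = np.random.default_rng(0)
n, k = 8, 3
X = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))
G = X @ X.T + (K - K.T)              # symmetric part PSD plus a skew-symmetric part
Y = rng.standard_normal((n, n))
C = Y @ Y.T                          # symmetric PSD
B = rng.standard_normal((n, 2))      # bidirectional terminals: L = B^T
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
G_r, C_r, B_r = congruence_reduce(G, C, B, V)
still_ok = is_psd(G_r + G_r.T) and is_psd(C_r)
```

Because $V^T (G + G^T) V$ is a congruence of a positive semidefinite matrix, `still_ok` is true for any V. Reducing B and $L^T$ with two different matrices breaks this structure, which is the difficulty described above for SVDMOR.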
6.7
Numerical examples
The new method has been implemented in MATLAB. We tested the terminal merging algorithm on a number of real interconnect circuits from industry. The first interconnect circuit, net1026, is a one-bit line circuit from an SRAM circuit in 180 nm technology. This network contains 525 resistors, 772 capacitors, 6 drivers, and 256 receivers. We apply the K-means-based TermMerg algorithm to reduce both receiver (output) terminals and driver (input) terminals. Since there are many outputs (256 receivers) compared with only a few inputs (6 drivers), we set the order of moments in the input moment matrix to 1, and the order of moments in the output moment matrix would be computed as 256/6 ≈ 42. Instead, we set the effective cluster number $q_e$ = 30, so the output moment order is r = $q_e$/6 = 5. By using singular value decomposition, the optimal number of clusters is found to be 5 if we define the threshold ε = $10^{-3}$. The dominant singular values of $M_I$ and $M_O$ are listed in Table 6.3.
6.7.1
Comparison with the SVDMOR method
One problem with the SVDMOR method is that only one specific moment is used to determine the correlations of both input and output terminals. A specific moment, such as the DC moment, can give a good estimate of terminal correlations at low frequencies, but may not be accurate at high frequencies. This is especially true for circuits having small numbers of inputs and large numbers of outputs, or vice versa. Figure 6.7 shows the SVDMOR reduction results for the interconnect circuit net1026 in the frequency domain. We first perform the terminal reduction and then Krylov-subspace-based MOR. We keep up to the 2nd order of block moments in the MOR for both the SVDMOR and TermMerg methods. SVDMOR based on $m_0$ reduces the terminals to only one input and one output based on the singular values, as shown in Table 6.3. For TermMerg, we have one input and five outputs after terminal reduction. From Figure 6.7 we can see that the result from SVDMOR does not match the original circuit well at high frequencies, while the proposed method matches better at the same frequencies. The accuracy loss at high frequency after terminal reduction reflects the fact that only one specific order of moments (the DC moments) is used during the SVD-based terminal reduction, and DC moments lead to less accurate terminal relationships in the high frequency ranges. This problem is easily seen when the numbers of input and output terminals are quite different, as in the case of net1026. For this circuit, for each input terminal we can compare the (DC) responses at 256 output terminals, so we have a good picture of the input correlations even using only the DC moments. For each output terminal, however, we can only compare the responses from six inputs, so the terminal relationship will not be accurate with just DC moments. Once we use high-order moments, we obtain better correlations of the output terminals and, thus, the reduced model is more accurate at high frequencies, as shown in Table 6.3 and Figure 6.7.

Figure 6.7 Frequency impedance responses from the SVDMOR method for net1026 circuit. (Plot of magnitude versus frequency; curves: original model, SVDMOR model, TermMerg model.)

Table 6.3 Singular values of DC admittance moment, input moment matrix and output moment matrix of the circuit net1026.
Index   m0                  MI                  MO
1       5789.5              5789.5              46355
2       1.4364 × 10^−12     1.4364 × 10^−12     666.95
3       3.7177 × 10^−13     3.7177 × 10^−13     22.558
4       –                   –                   0.32084
5       –                   –                   0.00266
6       –                   –                   1.4549 × 10^−5
7       –                   –                   5.6469 × 10^−8
6.7.2
Clustering results
The final clustering results from TermMerg are shown in Table 6.4. The first column is the cluster index. The second column is the representative terminal of each cluster. All the terminals in each cluster are listed in the third column.

Table 6.4 Output clustering results for the one-bit lines circuit net1026. Reprinted with permission from [75] (c) 2005 IEEE.
Cluster index   Representative terminal   Clustered terminals
1               Rcv206                    Rcv151, Rcv152, . . . , Rcv256
2               Rcv58                     Rcv39, Rcv40, . . . , Rcv77
3               Rcv19                     Rcv1, Rcv2, . . . , Rcv38
4               Rcv98                     Rcv78, Rcv79, . . . , Rcv119
5               Rcv144                    Rcv120, Rcv121, . . . , Rcv150
Figure 6.8 plots the distribution of terminals (x-axis) with respect to the cluster index number (y-axis).

Figure 6.8 Output terminal distribution for each cluster for net1026 circuit. Reprinted with permission from [75] (c) 2005 IEEE. (Plot of cluster number, 0 to 5, versus receiver number, 0 to 256.)
Then we go back to the time domain to validate the effectiveness of the method. We add a voltage source to a driver input to view the step responses at other receivers. Figure 6.9 shows the responses of five representative terminals. If we compare the 50% delay time, the delay time difference among them is approximately 10–20 ps, which is significant. The enlarged local waveforms of Figure 6.9 are shown in Figure 6.10. If we plot more responses for the suppressed terminals in one cluster, for instance receiver97 and receiver99, whose representative terminal is receiver98, we cannot tell the difference in delay time between these reduced terminals and their representative terminal, receiver98, as shown in Figure 6.11. Detailed analysis shows that the delay time differences among these terminals are only about 1–2 ps, as shown in Figure 6.12. In other words, if we allow 1–2 ps delay variations, those terminals can be viewed as the same terminal. At this point, we can say that it is reasonable to use the response at receiver98 to represent the responses at the suppressed terminals in its cluster, such as receiver97 and receiver99. Considering the process variations and other environmental variations, it is reasonable to combine them into one terminal. We can also improve the accuracy of this method by tightening (reducing) the threshold $\varepsilon$ to generate more clusters.

Figure 6.9 Step responses of representative output terminals (Rcv98, Rcv58, Rcv206, Rcv19, Rcv144). [Plot: voltage (V) versus time (s).] Reprinted with permission from [75] (c) 2005 IEEE.
Figure 6.10 Comparison of 50% delay time among representative output terminals. [Plot: voltage (V) versus time (s).] Reprinted with permission from [75] (c) 2005 IEEE.
Figure 6.11 Step responses of representative output terminals and two suppressed outputs (Rcv97, Rcv99). [Plot: voltage (V) versus time (s).] Reprinted with permission from [75] (c) 2005 IEEE.
Figure 6.12 Comparison of 50% delay time among representative output terminals and two suppressed outputs. [Plot: voltage (V) versus time (s).] Reprinted with permission from [75] (c) 2005 IEEE.

For the input terminal merging, we need to cluster the six input (six drivers) terminals. By using the same threshold level $\varepsilon = 10^{-3}$, the cluster number is 5. Only the terminals at driver3 and driver4 could be merged together.

The second example, net27, is a clock-tree circuit, also in 180 nm technology. It contains 167 resistors, 654 capacitors, 14 drivers, and 118 receivers. For the output reduction, we set the effective cluster number $q_e = 118/2 = 59$. Then we only need $r = q_e/14 \approx 4$ orders of moments to form the output matrix $M_O$. After the SVD step, as shown in Table 6.1, the natural choice of cluster number is $k = 5$ at the given $\varepsilon = 1 \times 10^{-2}$. We also present its distribution of terminals for different clusters in Figure 6.13 when we select the cluster number $k = 5$. The representative terminals are receiver98, receiver18, receiver110, receiver36, and receiver84 corresponding to clusters
1–5. Notice that the receiver numbers were not assigned based on closeness between terminals for net27. For circuit net1026, in contrast, the receivers appear to be numbered based on the closeness between them.
Figure 6.13 Output terminal distribution for each cluster for net27 circuit. [Plot: cluster number versus receiver number.] Reprinted with permission from [75] (c) 2005 IEEE.
The third example, net38, is also a clock-tree circuit. It includes 98 resistors, 342 capacitors, 18 drivers, and 48 receivers. We use $r = (48/18) \approx 3$ moments to form the output matrix $M_O$ as well as the input matrix $M_I$. After the SVD, the input moment matrix $M_I$ has the following singular values: $\Sigma = \mathrm{diag}(1.03 \times 10^{5}, 42.88, 7.35, 2.05, 0.83, 0.17, 3.74 \times 10^{-5}, \ldots)$. If we set the threshold $\varepsilon = 1 \times 10^{-3}$, the cluster number is 16, which is almost equal to the number of inputs. However, we notice that there is a big magnitude drop between the two singular values 0.17 and $3.74 \times 10^{-5}$. If we relax the threshold to $\varepsilon = 3 \times 10^{-3}$, the number of clusters is six. The distribution of terminals for different clusters is presented in Figure 6.14. For the output terminal reduction, the number of clusters is 48 if the threshold is set to $\varepsilon = 3 \times 10^{-3}$, which is the number of output terminals. That means we are not able to cluster the output terminals at this threshold level. After we relax the threshold to $\varepsilon = 6 \times 10^{-2}$, we obtain 13 clusters. We show the output assignment results in Figure 6.15.
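The magnitude drop noted for net38 can be located with a short sketch. It assumes only the seven singular values quoted in the text (the elided ones are not guessed), and the helper `largest_gap_rank` is a hypothetical name. The gap heuristic shown here is an illustration; it is not necessarily the thresholding rule that TermMerg applies with $\varepsilon$.

```python
import numpy as np

def largest_gap_rank(singular_values):
    """Return k such that the largest ratio s[k-1] / s[k] follows the k-th value.

    Hypothetical helper: a gap heuristic, not the book's threshold rule.
    """
    s = np.asarray(singular_values, dtype=float)
    ratios = s[:-1] / s[1:]  # ratio between consecutive singular values
    return int(np.argmax(ratios)) + 1

# Leading singular values of M_I for net38, as quoted in the text.
sigma = [1.03e5, 42.88, 7.35, 2.05, 0.83, 0.17, 3.74e-5]
k = largest_gap_rank(sigma)
print("rank suggested by the largest gap:", k)
```

For these values the largest drop follows the sixth singular value, which agrees with the six input clusters reported for the relaxed threshold.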
Figure 6.14 Input terminal distribution for each cluster for circuit net38. [Plot: cluster number versus driver number.]
Figure 6.15 Output terminal distribution for each cluster for circuit net38. [Plot: cluster number versus receiver number.]

6.8 Summary
In this chapter, we presented a novel method, named TermMerg, to efficiently reduce the terminal number of general linear interconnect circuits considering delay uncertainty. Terminal reduction can lead to more compact order-reduced models. The new method uses the high-order moment responses of terminals in the frequency domain as the metrics for timing or delay. We first applied a singular value decomposition method to determine the best number of clusters based on the low-rank approximation. After this, the K-means-based clustering algorithm was used to cluster the moments of the terminals into the different clusters. An improved SVDMOR, ESVDMOR, has been introduced, which improves on SVDMOR in terms of efficiency and accuracy. Experimental results on a number of real industry interconnect circuits demonstrated the effectiveness of the new method.
7 Vector-potential equivalent circuit for inductance modeling
As VLSI technology advances with decreasing feature size as well as increasing operating frequency, inductive effects of on-chip interconnects become increasingly significant in terms of delay variations, degradation of signal integrity, and aggravation of signal crosstalk [72, 116]. Since inductance is defined with respect to a closed current loop, loop-inductance extraction needs to specify simultaneously both the signal-net current and its return current. To avoid the difficulty of determining the path of the return current, the partial element equivalent circuit (PEEC) model [101] can be used, where each conductor forms a virtual loop with infinity and the partial inductance is extracted. To model inductive interconnects accurately in the high frequency region, RLCM (M here stands for mutual inductance) networks under the PEEC formulation are generated from discretized conductors by volume decomposition according to the skin depth and longitudinal segmentation according to the wavelength at the maximum operating frequency. The extraction based on this approach [59, 83, 84] has high accuracy but typically results in a huge RLCM network with a densely coupled partial inductance matrix L. A dense inductively coupled network sacrifices the sparsity of the circuit matrix and slows down the circuit simulation or makes the simulation infeasible. Because the primary complexity is a result of the dense inductive coupling, efficient yet accurate inductance sparsification is needed for extraction and simulation of inductive interconnects in high-speed circuit design. Because the partial inductance matrix in the PEEC model is not diagonally dominant, simply truncating off-diagonal elements leads to negative eigenvalues and the truncated matrix loses passivity [52]. Several inductance sparsification methods with guaranteed passivity have been proposed.
The return-limited inductance model [109] assumes that the current for a signal wire returns from its nearest power and ground (P/G) wires. This model loses accuracy when the P/G grid is sparsely distributed. The shift-truncation model [65] calculates a sparse inductance matrix by assuming that the current returns from a shell with shell radius $r_0$. But it is difficult to determine the shell radius to obtain the desired accuracy. Because the inverse of the inductance matrix, called the K-element (susceptance) matrix, is strictly diagonally dominant, off-diagonal elements can be truncated without affecting the passivity [9, 25]. Because K is a new circuit element not included in conventional circuit simulators such as SPICE, new circuit analysis tools considering K have been developed [17, 58]. Alternatively, double-inversion-based approaches have been proposed in [9, 140]. Using the control volume to extract adjacently coupled effective resistances to model inductive effects, the vector-potential equivalent circuit (VPEC) model was recently introduced [86]. Its sparsified and SPICE-compatible circuit model is obtained based on a locality assumption that the coupling under the VPEC model exists only between adjacent wire filaments.

This chapter presents a cost-efficient and provably passive inductance model: the vector-potential equivalent circuit (VPEC). Because the extraction in [86] requires optimizing the size of the control volume for each filament during integration, it becomes impractical for large-sized interconnects. Moreover, it needs a locality assumption that may lead to an inaccurate extraction [134]. In contrast, we rigorously derive an accurate VPEC model considering the coupling between any pair of filaments by inverting the partial inductance matrix. We further prove that the resulting circuit matrix for the full VPEC model is passive and strictly diagonally dominant. The diagonal dominance enables truncating small-valued off-diagonal elements to obtain a sparsified VPEC model, named the truncated-VPEC (tVPEC) model, with guaranteed passivity.

Table 7.1 Table of notations. Reprinted with permission from [135] (c) 2005 IEEE.

Symbol             Meaning
$a_i$              ith filament
$A^k$              kth component of vector potential A
$A_i^k$            kth component of averaged A at $a_i$
$B_{ij}^k$         kth component of the flux from $a_i$ to $a_j$
$B_{i0}^k$         kth component of the flux from $a_i$ to vector potential ground
$I_i^k$            kth component of electrical branch current at $a_i$
$\hat{I}_i^k$      kth component of magnetic branch current at $a_i$
$\hat{R}_{ij}^k$   kth component of coupling effective resistance between $a_i$ and $a_j$
$\hat{R}_{i0}^k$   kth component of ground effective resistance of $a_i$
$V_i^k$            kth component of electrical branch voltage at $a_i$
$\hat{V}_i^k$      kth component of magnetic branch voltage at $a_i$
7.1 Vector-potential equivalent circuit
As in FastHenry [59] with the magneto-quasi-static assumption, a conductor can be divided into a number of rectilinear filaments. The current density is constant over the cross-section of the filament. In this chapter, we use superscripts x, y, and z to denote spatial components of a vector variable. Let A be the vector potential, determined by the distribution of the current density J. Then $J^k$ and $A^k$ are the components in the k-direction (k = x, y, z). We further use the subscript i for variables associated with filament $a_i$ ($i \in N$), and every filament $a_i$ has a length $l$ by adequate discretization in the k-direction. Table 7.1 summarizes the notations used in this chapter. To model the inductive effect, we start with the differential Maxwell equations in terms of A [57]:
$$\nabla^2 A^k = -\mu J^k \tag{7.1}$$
$$\frac{\partial A^k}{\partial t} = -E^k, \tag{7.2}$$
where the vector potential A and the current density J are in the z-direction, E is the electrical field, and $\mu$ is the permeability constant. Note that the resistive voltage drop by $(-\nabla^k \phi)$ is not included in (7.2) since we are interested in the inductive voltage drop here. Given the distribution of the current density $J^k$, the vector potential $A^k$ is determined by
$$A^k = \frac{\mu}{4\pi} \int \frac{J^k}{|r - r'_i|}\, d\tau(r'_i).$$
To construct the system equation in the form of an integral equation, we apply the volume and the line integration to (7.1) and (7.2), respectively. For filament $a_i$, (7.1) is integrated within the volume $\tau_i$ of filament $a_i$. Using Gauss' law,
$$\int_S a \cdot dS = \int_\tau \nabla \cdot a\, d\tau,$$
we obtain
$$-\mu \int_{\tau_i} J^k\, d\tau = \int_{S_i} \nabla A^k \cdot dS = B_{i0}^k + \sum_{j \ne i} B_{ij}^k. \tag{7.3}$$
Note that the surface integral $\int_{S_i} dS \cdot \nabla A^k$ is actually the flux of the gradient of the kth component of the vector potential caused by the filament current of $a_i$ in $\tau_i$. It consists of the following parts [86]: (i) the flux to infinity (vector potential ground) $B_{i0}^k$,
$$B_{i0}^k = \int_{S_{i0}} \nabla A^k \cdot dS, \tag{7.4}$$
and (ii) the flux to all other filaments $a_j$ ($j \in N$, $j \ne i$) $B_{ij}^k$,
$$B_{ij}^k = \int_{S_{ij}} \nabla A^k \cdot dS. \tag{7.5}$$
However, explicitly determining the value of $B_{ij}^k$ is difficult because it is hard to partition the flux between filament $a_i$, all other filaments $a_j$, and the vector potential ground.
Figure 7.1 (a) Electronic current-controlled vector-potential current source, $\hat{I}_i = l I_i$; (b) The Kirchhoff current law for vector potential circuit. An invoking vector potential current source is employed at $a_i$, and the responding vector potential at $a_j$ is $A_j^k$, determined by the full effective resistance network. Reprinted with permission from [135] (c) 2005 IEEE.
Moreover, integrating (7.2) along the projected length in the k-direction of filament $a_i$ leads to:
$$\int_{l_i} \frac{\partial A^k}{\partial t}\, dl = -\int_{l_i} E^k \cdot dl. \tag{7.6}$$
Based on (7.3) and (7.6), we can further construct the circuit-level system equation in matrix form. By defining the filament vector potential [86] as the average volume integral of $A^k$ within $\tau_i$ (surrounded by the surface $S_i$),
$$A_i^k = \frac{1}{\tau_i} \int_{\tau_i} A^k(r)\, d\tau(r), \tag{7.7}$$
we can define an effective coupling resistance
$$\hat{R}_{ij}^k = -\mu\, \frac{A_i^k - A_j^k}{B_{ij}^k} \tag{7.8}$$
to model (i.e., replace) the mutual inductive coupling between $a_i$ and $a_j$. In addition, there also exists an effective ground resistance to model the self inductive effect:
$$\hat{R}_{i0}^k = -\mu\, \frac{A_i^k}{B_{i0}^k}. \tag{7.9}$$
Because the filament current is invariant along the k-direction, the volume integral of the current density inside the volume $\tau_i$ reduces to $l I_i^k$, where $I_i^k$ is the electrical current at the cross-section of $a_i$. Therefore (7.3) becomes the Kirchhoff current law (KCL) under the full VPEC model:
$$\frac{A_i^k}{\hat{R}_{i0}^k} + \sum_{j \ne i} \frac{A_i^k - A_j^k}{\hat{R}_{ij}^k} = l I_i^k, \tag{7.10}$$
where a vector potential current source $\hat{I}_i^k$ can be defined:
$$\hat{I}_i^k = l I_i^k, \tag{7.11}$$
which is controlled by the electrical current $I_i^k$. An equivalent circuit to illustrate the VPEC KCL equation (7.10) is shown in Figure 7.1. Clearly, we can see the physical meaning of the effective resistance: given a unit current change at the ith filament, the vector potential observed at the jth filament is exactly $\hat{R}_{ij}^k$ when all other filaments are connected to the vector potential ground. Similarly for (7.6), we have the following inductive nodal voltage equation:
$$l\, \frac{\partial A_i^k}{\partial t} = V_i^k, \tag{7.12}$$
which describes the relation between the vector potential and its corresponding electrical voltage drop caused by the inductive effect. As a result, a voltage-controlled vector potential voltage source $\hat{V}_i^k$ is:
$$\hat{V}_i^k = V_i^k / l. \tag{7.13}$$
To extract the vector-potential equivalent circuit, the integration-based approach in [86] needs to determine the localized flux $B_{ij}^k$ ($j \in n_i$), where $n_i$ is the set of filaments adjacent to $a_i$. The explicit calculation of $B_{ij}^k$ is hard, and considering only the localized flux, as in [86], loses accuracy. The integration-based VPEC model in [86] needs to use a control volume for each filament, but no method was presented in [86] to find an accurate control volume. To avoid both the explicit calculation of $B_{ij}^k$ and the locality assumption, this chapter derives a full VPEC model and then presents an inversion-based extraction below. The detailed derivation of the full VPEC model is presented in [135], where we obtain the following two KCL and KVL equations:
$$\frac{A_i^k}{\hat{R}_{i0}^k} + \sum_{j \ne i,\; i,j \in N} \frac{A_i^k - A_j^k}{\hat{R}_{ij}^k} = \hat{I}_i^k, \tag{7.14}$$
$$\frac{\partial A_i^k}{\partial t} = \hat{V}_i^k, \tag{7.15}$$
where the vector potential current and voltage are related to the electrical branch current and voltage by
$$\hat{I}_i^k = l I_i^k, \qquad \hat{V}_i^k = V_i^k / l. \tag{7.16}$$
Clearly, we can see the physical meaning of the effective resistance by (7.14): given a unit current change at the ith filament, the vector potential observed at the jth filament is exactly $\hat{R}_{ij}^k$ when all other filaments are connected to the vector potential ground.
Figure 7.2 Vector potential equivalent circuit model for three filaments. [Schematic: an electrical circuit block and a magnetic circuit block for each of filaments 1 to 3, coupled through controlled sources.] Reprinted with permission from [135] (c) 2005 IEEE.
Furthermore, (7.15) describes the relation between the vector potential and its corresponding electrical voltage drop caused by the inductive effect. We present a SPICE-compatible VPEC model for three filaments in Figure 7.2. The model consists of two blocks: the electrical circuit (PEEC resistance and capacitance) and the magnetic circuit (VPEC effective resistance and unit inductance). They are connected by the controlled sources. It includes the following components:
1. The resistance $R_i$ and capacitance $C_i$ in the electrical circuit are the same as those in the PEEC model;
2. A dummy voltage source to sense current $I_i^k$ in the electrical circuit controls $\hat{I}_i^k$ in the magnetic circuit;
3. A voltage-controlled current source is used to relate $\hat{V}_i^k$ and $\hat{I}_i^k$ with gain g = 1 in the magnetic circuit;
4. A voltage source $V_i^k$ in the electrical circuit is controlled by $\hat{V}_i^k$ in the magnetic circuit;
5. Effective resistances including ground $\hat{R}_{i0}^k$ and coupling $\hat{R}_{ij}^k$ are used to represent the strength of inductances in the magnetic circuit;
6. A unit inductance $L_i$ in the magnetic circuit is to: (i) take into account the time derivative of $A_i^k$; and (ii) preserve the magnetic energy from the electrical circuit.
Although the number of magnetic circuit blocks increases with more filaments, sparsified VPEC models will be introduced in Subsection 7.3.2 to greatly reduce coupling resistances in magnetic circuit blocks with preserved passivity. Moreover, because the VPEC model largely reduces reactive elements (i.e., inductance) and its effective resistance is less densely stamped in the MNA (modified nodal analysis) matrix compared with the partial inductance under the PEEC model, the full VPEC model reduces the simulation time compared with the PEEC model.
Note that the summation in KCL (7.14) for the full VPEC model is carried out over each pair of filaments. In contrast, this summation in [86] is carried out only for adjacent filaments. The author of [86] obtained the localized model by modeling the flux $B_{ij}^k$ as a current flow through $\hat{R}_{ij}^k$. It is based on the analogy with the conducting current flow at a surface S (Ohm's law):
$$I = -\sigma \int_S \nabla\phi \cdot dS. \tag{7.17}$$
Equation (7.17) means that the conducting current I(x, y, z) is locally related to the flux of the electrical field E(x, y, z) ($-\nabla\phi$) on the surface S. Equation (7.17) is correct because electrons are transported only locally in the conductor. However, for the magnetic coupling problem, the flux $B_{ij}^k$ is caused by the magnetic field, which is not localized. Therefore, we still need non-local resistances to model the long-range effect of inductance accurately. Hence, the KCL equation (7.14) in our method is related not only to the localized $\hat{R}_{ij}^k$ ($j \in n_i$), but also to all other $\hat{R}_{ij}^k$ ($j \ne i$, $j \in N$). The experimental results in Subsection 7.3.1 will show that, taking the PEEC model as the reference, the full VPEC model considering all filaments is more accurate than the localized VPEC model from [86].
7.2 VPEC via PEEC inversion
Because of the difficulty of determining $B_{ij}^k$ explicitly, there is no efficient calculation method for effective resistances in [86]. In this part, we will derive the circuit-level system equation based on the VPEC effective resistance matrix $\hat{G}$, and then present an efficient method to calculate effective resistances from the inversion of the partial inductance matrix. We take the time derivative at both sides of (7.14) and then use (7.15) to replace the time derivative of the vector potential. Consequently, we obtain:
$$\Bigl( \frac{1}{\hat{R}_{i0}^k} + \sum_{j \ne i} \frac{1}{\hat{R}_{ij}^k} \Bigr) V_i^k + \sum_{j \ne i} \Bigl( -\frac{1}{\hat{R}_{ij}^k} \Bigr) V_j^k = l^2\, \frac{\partial I_i^k}{\partial t}. \tag{7.18}$$
We define the circuit matrix $\hat{G}$ of the VPEC model,
$$\hat{G}_{ij}^k = -\frac{1}{\hat{R}_{ij}^k}, \qquad \hat{G}_{ii}^k = \Bigl( \frac{1}{\hat{R}_{i0}^k} + \sum_{j \ne i} \frac{1}{\hat{R}_{ij}^k} \Bigr). \tag{7.19}$$
Then, the system equations can be rewritten as:
$$\hat{G}_{ii}^k V_i^k + \sum_{j \ne i} \hat{G}_{ij}^k V_j^k = l^2\, \frac{\partial I_i^k}{\partial t}. \tag{7.20}$$
Comparing with the system equation based on the inductance matrix $L^k$ and its inverse $S^k = (L^k)^{-1}$:
$$L_{ii}^k\, \frac{\partial I_i^k}{\partial t} + \sum_{j \ne i} L_{ij}^k\, \frac{\partial I_j^k}{\partial t} = V_i^k, \qquad S_{ii}^k V_i^k + \sum_{j \ne i} S_{ij}^k V_j^k = \frac{\partial I_i^k}{\partial t}, \tag{7.21}$$
we find that $\hat{G}^k$ and $S^k$ only differ by a geometrical factor $l^2$:
$$\hat{G}^k = l^2 S^k. \tag{7.22}$$
Therefore, starting with the L matrix under the PEEC model, we first obtain the inverse of L, and then have the following extraction formula for $\hat{R}$ under the VPEC model:
$$\hat{R}_{ij}^k = -\frac{1}{l^2 S_{ij}^k}, \qquad \hat{R}_{i0}^k = \frac{1}{l^2 S_{ii}^k + \sum_{j \ne i} l^2 S_{ij}^k}. \tag{7.23}$$
Because the major computation effort is the inversion of the L matrix, we call this method an inversion-based VPEC model, which leverages the existing PEEC extractor. In contrast, the localized VPEC model is integration based and needs to calculate explicitly the local flux $B_{ij}^k$ ($j \in n_i$) from scratch, where its accuracy is sensitive to the size of the control volume during the integration [86]. Therefore, it is highly accurate only for a small number of filaments and needs to optimize the size of the control volume for each filament when the system has a large number of filaments. Clearly, it becomes impractical for full-chip extraction.
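The extraction formulas (7.22) and (7.23) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `vpec_from_peec`, the 3-filament inductance values, and the filament length are all hypothetical and are not taken from the book's experiments.

```python
import numpy as np

def vpec_from_peec(L, length):
    """Inversion-based VPEC extraction following (7.22) and (7.23).

    L is a partial inductance matrix for one direction k, and length is
    the common filament length l. Hypothetical helper for illustration.
    """
    S = np.linalg.inv(L)                      # S = L^{-1}
    G_hat = length ** 2 * S                   # (7.22): G_hat = l^2 S
    off_diag = ~np.eye(L.shape[0], dtype=bool)
    R_coupling = np.full(L.shape, np.inf)     # diagonal stays inf (no self coupling)
    R_coupling[off_diag] = -1.0 / G_hat[off_diag]   # (7.23), R_ij = -1 / (l^2 S_ij)
    R_ground = 1.0 / G_hat.sum(axis=1)        # (7.23), R_i0 = 1 / (l^2 sum_j S_ij)
    return G_hat, R_coupling, R_ground

# Made-up 3-filament example: inductances in henries, length in metres.
L = 1e-9 * np.array([[1.0, 0.6, 0.4],
                     [0.6, 1.0, 0.6],
                     [0.4, 0.6, 1.0]])
length = 1e-3
G_hat, R_coupling, R_ground = vpec_from_peec(L, length)
print(R_ground)
print(R_coupling)
```

For this example matrix all effective resistances come out positive, and rebuilding $\hat{G}$ from them through (7.19) returns the same matrix, which is a convenient consistency check on an extractor.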
7.2.1 Magnetic energy in the VPEC model
Generally, the magnetic energy is given by the following space integral [57]:
$$u_m = \frac{1}{2} \int_\tau A \cdot J\, d\tau = \sum_{k = x, y, z} u_m^k. \tag{7.24}$$
For the full VPEC model, (7.24) can be rewritten as:
$$u_m^k = \frac{1}{2} \int_\tau A^k J^k\, d\tau = \frac{1}{2} \sum_i A_i^k \cdot l \cdot I_i^k = \frac{1}{2} \sum_i A_i^k \hat{I}_i^k. \tag{7.25}$$
Furthermore, when we rewrite the KCL equation (7.10) in terms of the $\hat{G}$ matrix,
$$\sum_j \hat{G}_{ij}^k A_j^k = \hat{I}_i^k, \tag{7.26}$$
we have the following relation for the magnetic energy under the full VPEC model:
$$u_m^k = \frac{1}{2} \sum_{i,j} \hat{G}_{ij}^k A_i^k A_j^k. \tag{7.27}$$
Below we prove that the $\hat{G}$ matrix is positive definite.
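The step from (7.25) to (7.27) is a direct substitution. A short worked form follows, assuming the KCL (7.10) is written row-wise as $\hat{I}_i^k = \sum_j \hat{G}_{ij}^k A_j^k$; the vector $\mathbf{A}^k$ of filament potentials $A_i^k$ is notation introduced only for this sketch.

```latex
u_m^k = \frac{1}{2}\sum_i A_i^k \hat{I}_i^k
      = \frac{1}{2}\sum_i A_i^k \sum_j \hat{G}_{ij}^k A_j^k
      = \frac{1}{2}\,(\mathbf{A}^k)^{T}\,\hat{G}^k\,\mathbf{A}^k .
```

Written this way, the magnetic energy is a quadratic form in $\hat{G}^k$, so its sign for every nonzero $\mathbf{A}^k$ is what decides definiteness.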
7.2.2 Property of $\hat{G}$ matrix
Theorem 7.1. Circuit matrix $\hat{G}^k$ (k = x, y, z) in the VPEC model is positive definite.
Because $\hat{G}$ only differs from the K matrix by a positive geometric constant, the proof of the matrix property (passivity and strict diagonal dominance) for K is equivalent for $\hat{G}$. The existing proofs in [16, 25] are based on the analogy $[L] = \mu[C^{-1}]$, which holds when [C][L] = constant. However, this relation does not hold in general, as shown in [55]. Below, we present a direct proof for the VPEC model.
Proof: According to (7.27), because the energy $u_m^k$ (k = x, y, z) is positive, it automatically results in a positive definite matrix $\hat{G}^k$ [128].
Therefore, the corresponding VPEC model is passive. However, to further guarantee a passive model after truncating small-valued off-diagonal elements from the original positive definite matrix, we will prove that the matrix $\hat{G}$ is strictly diagonally dominant [128], i.e., $\hat{G}_{ii}^k > \sum_{j \ne i} |\hat{G}_{ij}^k|$.

Lemma 7.1. All the effective resistances $\hat{R}_{i0}^k$ and $\hat{R}_{ij}^k$ (k = x, y, z) in the VPEC model are positive.
Proof: We present the proof based on the KCL equation (7.10). Since effective resistances are only determined by the geometry of the filaments, they do not depend on the applied external sources. Without loss of generality, we assume that an impulse current $I_i^k$ is applied at filament $a_i$ along the z direction, and all other filaments $a_j$ are connected to the vector potential ground. Note that for filament $a_i$, its average vector potential $A_i^k$ is in the same direction as $I_i^k$; for any other grounded filament $a_j$, its average vector potential $A_j^k$ is zero, but its induced current $-I_j^k$ is in the opposite direction to $I_i^k$ according to Lenz's law. Hence for filament $a_j$, (7.10) becomes
$$\frac{A_j^k - A_i^k}{\hat{R}_{ij}^k} = \frac{-A_i^k}{\hat{R}_{ij}^k} = -l I_j^k, \tag{7.28}$$
where the induced current $I_j^k$ is determined by the coupling flux between $a_i$ and $a_j$. Equation (7.28) can be further rewritten as:
$$\hat{R}_{ij}^k = \frac{A_i^k}{l I_j^k} > 0.$$
The positiveness of the ground resistance $\hat{R}_{i0}^k$ can be easily proved in a similar fashion. With this Lemma, we can further prove the following Theorem:

Theorem 7.2. Circuit matrix $\hat{G}^k$ (k = x, y, z) in the VPEC model is strictly diagonally dominant.
Proof: According to (7.19) we have
$$\sum_{j \ne i} |\hat{G}_{ij}^k| = \sum_{j \ne i} \frac{1}{\hat{R}_{ij}^k} \tag{7.29}$$
and
$$\sum_{j \ne i} |\hat{G}_{ij}^k| < \frac{1}{\hat{R}_{i0}^k} + \sum_{j \ne i} \frac{1}{\hat{R}_{ij}^k} = \hat{G}_{ii}^k,$$
or
$$\hat{G}_{ii}^k > \sum_{j \ne i} |\hat{G}_{ij}^k|. \tag{7.30}$$
That is, the circuit matrix $\hat{G}$ is strictly diagonally dominant. Note that truncating small off-diagonal entries from a strictly diagonally dominant matrix still leads to a positive definite matrix, i.e., a passive circuit model [128]. Based on Theorem 7.2, such a truncation-based sparsification still leads to passive circuit models. Intuitively, truncating small off-diagonal entries in the $\hat{G}$ matrix (equivalent to truncating larger off-diagonal entries in the $\hat{R}$ matrix) results in ignoring larger resistors in the equivalent resistance network. Because the larger resistors are less sensitive to, and also contribute less to, the current change, the resulting sparsified model can still have good waveform accuracy, as presented in Subsection 7.3.2. Moreover, our proof assumes that wires can be decomposed into short wires with a similar length. Therefore, in our experiments, we always segment wires according to one-tenth of the wavelength at the maximum operating frequency when the wire lengths are different.
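The statement that truncation keeps the matrix positive definite can be made explicit with the Gershgorin circle theorem. The sketch below assumes that the truncation is symmetric and leaves the diagonal untouched; the truncated matrix $\tilde{G}$ and the kept index set $K_i$ are notation introduced here and do not appear in the original derivation.

```latex
\sum_{j \in K_i} |\tilde{G}_{ij}| \;\le\; \sum_{j \ne i} |\hat{G}_{ij}| \;<\; \hat{G}_{ii} = \tilde{G}_{ii},
\qquad K_i \subseteq \{\, j : j \ne i \,\},
\\
\lambda_{\min}(\tilde{G}) \;\ge\; \min_i \Bigl( \tilde{G}_{ii} - \sum_{j \in K_i} |\tilde{G}_{ij}| \Bigr) \;>\; 0 .
```

Every eigenvalue of the real symmetric matrix $\tilde{G}$ lies in some interval $[\tilde{G}_{ii} - r_i,\ \tilde{G}_{ii} + r_i]$, where $r_i$ is the off-diagonal absolute row sum, so removing off-diagonal entries can only shrink the intervals and never moves them across zero.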
7.3 Numerical examples
7.3.1 Simulation of aligned parallel bus
We consider a five-bit bus, with one segment per line. Each bus line is 1000 µm long, 1 µm wide and 1 µm thick. The space between lines is 2 µm. With the extracted RLCM parameters under the PEEC model, we further construct the full VPEC model (with coupling $\hat{R}_{ij}$ between all bits) from this method, and the localized VPEC model (with coupling $\hat{R}_{ij}$ between adjacent bits) from [86]. We measured responses at the far end of all five bits, and compared waveforms of the second bit in Figure 7.3 (a)–(b), for both the time-domain and the frequency-domain simulations, where a 1-V step voltage with 10 ps rising time is used for the time-domain transient simulation, and a 1-V AC voltage for the frequency-domain (1 Hz to 10 GHz) simulation. Clearly, the full VPEC model and the PEEC model produce identical waveforms in both frequency-domain and time-domain simulations, but the localized VPEC model introduces a non-negligible error relative to the PEEC model: the time-domain response shows a 15% waveform difference and the frequency-domain response shows a large deviation beyond 5 GHz.
Figure 7.3 For five-bit bus, (a) a 1-V step voltage with 10 ps rising time and (b) a 1-V ac voltage are applied to the first bit and all other bits are quiet. The responses of the PEEC model, full VPEC model, and localized VPEC model are measured at the far end of the second bit. [HSPICE plots: (a) transient response, voltage versus time; (b) ac response, voltage magnitude versus frequency.] Reprinted with permission from [135] (c) 2005 IEEE.

7.3.2 Truncated-VPEC model
We further present the truncated-VPEC (tVPEC) model. After the full inversion of L, we obtain a strictly diagonally dominant matrix $\hat{G}$. As explained in Section 7.2, its small-valued off-diagonal elements can be truncated without loss of passivity. We present two truncating approaches below: geometrical (gtVPEC) and numerical (ntVPEC) truncation. The first one is applicable to the aligned parallel bus, and the second is applicable to conductors of any shape.

Geometrical truncation
For the aligned parallel bus, we can define a truncating window $(N_W, N_L)$ for each wire segment, where $N_W$ and $N_L$ are the numbers of coupled segments in the directions of wire width and length, respectively. The coupling along the wire length is the forward coupling, and the one along the wire width is the aligned coupling. Because of the symmetry introduced by aligning and paralleling, each wire segment will have the same-sized truncating window. As a result, the tVPEC model only contains $\hat{R}_{ij}$ within the truncating window for each wire segment, and is called gtVPEC in Table 7.2. We consider a 32-bit bus with eight segments per line and four differently sized truncating windows: (32, 8), (32, 2), (16, 2), and (8, 2), and summarize the experiment settings and results in Table 7.2. Clearly, there is a smooth trade-off between runtime and accuracy for different truncating window sizes, where the average voltage differences and associated standard deviations are calculated over all time steps in the SPICE simulation. We first compare results of different truncating windows. The truncating window (8, 2) achieves the highest speed-up of 30× and the largest difference of about 0.2 mV on average, less than 2% of the noise peak, and the truncating window (32, 2) has the highest accuracy with 0.06 mV on average but a reduced speed-up of 8×. Furthermore, the small difference between windows (32, 8) and (32, 2) implies that the forward couplings between non-adjacent segments are negligible. However, an $N_W$ larger than $N_L$ (as shown in Table 7.2) is needed to achieve high accuracy. This implies that the aligned coupling is stronger than the forward coupling and that considering only the adjacent aligned coupling may lead to a large error.
Table 7.2 Settings and results of geometrical tVPEC models. Reprinted with permission from [135] (c) 2005 IEEE.

Models and window sizings   Number of elements   Run time (s) and speed-up   Average voltage difference (V)   Standard deviation (V)
Full PEEC                   32896                2535.48 (1×)                0                                0
Full VPEC(32, 8)            32896                772.89 (3×)                 1.00 × 10^-5                     6.26 × 10^-4
gtVPEC(32, 2)               11392                311.22 (8×)                 5.97 × 10^-5                     1.84 × 10^-3
gtVPEC(16, 2)               3488                 152.57 (16×)                −1.23 × 10^-4                    4.56 × 10^-3
gtVPEC(8, 2)                2240                 85.14 (32×)                 −2.17 × 10^-4                    8.91 × 10^-3

Figure 7.4 For 128-bit bus by numerical truncation, a 1-V step voltage with 10 ps rising time is applied to the first bit, and all other bits are quiet. The responses of the PEEC model, the full VPEC model, and the tVPEC model (ntVPEC with sparse factors 90.6%, 65.3%, and 30.5%) are measured at the far end of the second bit. [HSPICE plot: voltage versus time.] Reprinted with permission from [135] (c) 2005 IEEE.
Numerical truncation
For the numerical truncation, we define the coupling strength as the ratio of an off-diagonal element to its corresponding diagonal element in each row of $\hat{G}$. We then truncate those off-diagonal elements with coupling strength smaller than a specified threshold. Figure 7.4 plots the simulation results under the numerical sparsification for the non-aligned parallel bus with 128 bits and one segment per line. The sparse factor is the ratio between the numbers of circuit elements in the truncated and full VPEC models. The waveform difference is small in terms of the noise peak for sparse factors down to 30.5%. Table 7.3 summarizes the truncation settings and simulation results, where the values in parentheses in column 1 are truncating thresholds, and the runtime includes both SPICE simulation and matrix inversion in the case of VPEC models. One can see from the table that up to 30× speed-up is achieved when the average waveform difference is up to 0.377 mV, less than 1% of the noise peak.
| Models and truncation thresholds | Full PEEC | Full VPEC(0.0) | ntVPEC(5×10^-5) | ntVPEC(1×10^-4) | ntVPEC(5×10^-4) |
|---|---|---|---|---|---|
| Number of elements | 8256 | 8256 | 7482 | 5392 | 2517 |
| Run time (s) and speed-up | 281.02 (1×) | 36.40 (7×) | 30.89 (9×) | 19.55 (14×) | 8.35 (28×) |
| Average voltage difference (V) | 0 | −1.64×10^-6 | 4.64×10^-6 | 1.29×10^-5 | 3.77×10^-4 |
| Standard deviation (V) | 0 | 3.41×10^-4 | 4.97×10^-4 | 1.37×10^-3 | 5.20×10^-3 |

Table 7.3 Settings and results of numerical tVPEC models. Reprinted with permission from [135] (c) 2005 IEEE.

A larger speed-up factor can be expected, as a higher waveform difference can be tolerated in practice. We also compare the full VPEC model and the PEEC model. The full VPEC simulation is 7× faster and has a negligible waveform difference.
7.4 Inductance models in hierarchical reduction

In this section, we discuss the VPEC (vector potential equivalent circuit) model used in hierarchical reduction. Continuing the earlier discussion, we compare the inductance formulation by the VPEC model with that by the nodal susceptance methods [9, 17, 98]. We will show that the inductance model by nodal susceptance is not physically equivalent to the inductance, because unwanted DC paths are created at low frequency.
7.4.1 Inductance formulation by nodal analysis

For an RLCM circuit, when we assume that only independent current sources exist at external ports, the circuit matrix in the frequency domain starting with the MNA formulation can be written as

$$\mathcal{G}x + s\mathcal{C}x = \mathcal{B}i(s), \qquad v(s) = \mathcal{B}^T x, \tag{7.31}$$

where x, v, and i are the state variables, output voltage and input current vectors, and $\mathcal{G}$, $\mathcal{C}$, and $\mathcal{B}$ are the state and the input and output matrices, respectively. Equation (7.31) can be further written as

$$\begin{bmatrix} G & A_l^T \\ -A_l & 0 \end{bmatrix} \begin{bmatrix} v_n \\ i_l \end{bmatrix} + s \begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix} \begin{bmatrix} v_n \\ i_l \end{bmatrix} = \begin{bmatrix} A_i i_n(s) \\ 0 \end{bmatrix}, \tag{7.32}$$

where G and C are the admittance matrices for resistors and capacitors; L is the inductance matrix, which includes the mutual inductance; $v_n$ is the vector of node voltages; $i_l$ is the branch current vector of the inductors; $A_l$ is the adjacency matrix for all inductors; and $A_i$ is the adjacency matrix for all port current sources. Circuit reduction means applying Gaussian elimination to state variables such as the node voltages $v_n$ and the branch currents $i_l$. If we first eliminate the branch current vector $i_l$, we obtain a state equation in the nodal voltage variables only,

$$\left[ G + sC + \frac{1}{s}\Gamma \right] [v_n] = [A_i i_n(s)], \tag{7.33}$$
[Figure 7.5 schematic: two coupled wires. Element values: Iin = 1 A; Rd1 = Rd2 = 100 Ω; Rw1 = Rw2 = 17 Ω; Lw1 = Lw2 = 1.48 nH; M12 = 1.18 nH; Cw1 = Cw2 = 22.1 fF; CL1 = CL2 = 10 fF; C12 = 11.1 fF.]
Figure 7.5 Example of a coupled two-bit RLCM circuit under the PEEC model. Reprinted with permission from [135] (c) 2005 IEEE.
where $\Gamma = A_l S A_l^T = A_l L^{-1} A_l^T$ is the nodal susceptance [9, 17, 98]. This is exactly the nodal analysis formulation with Γ representing the inductance. Circuit reduction by further eliminating the nodal voltage variables $v_n$ is exactly the Y-Δ transformation in [98]. The nodal susceptance in (7.33) actually creates DC paths in the low-frequency range. We illustrate this with the two-bit interconnect example shown in Figure 7.5. The nodal voltage equations of susceptance at the four nodes (A, B, C, D) become

$$\begin{aligned}
(S_{11}/s)V_A - (S_{11}/s)V_B + (S_{12}/s)V_C - (S_{12}/s)V_D &= I_1 \\
-(S_{11}/s)V_A + (S_{11}/s)V_B - (S_{12}/s)V_C + (S_{12}/s)V_D &= -I_1 \\
(S_{12}/s)V_A - (S_{12}/s)V_B + (S_{22}/s)V_C - (S_{22}/s)V_D &= I_2 \\
-(S_{12}/s)V_A + (S_{12}/s)V_B - (S_{22}/s)V_C + (S_{22}/s)V_D &= -I_2.
\end{aligned} \tag{7.34}$$
As shown in Figure 7.6, this is mathematically equivalent to stamping the six susceptance elements into the admittance matrix [98] when s ≠ 0. However, at s = 0 each susceptance element $S_{ij}/s$ becomes infinite (that is, zero impedance, or a short circuit), so there exist four unwanted DC paths between nodes (A, B, C, D) that did not exist before. As a result, it leads to wrong DC values and inaccurate low-frequency simulation results, even for the two-bit bus example shown in Figure 7.5. We compute the exact driving-point impedance responses using inductance under MNA and the nodal susceptance under NA, respectively, using the symbolic analysis tool [110]. As shown in Figure 7.7, the NA formulation (using the nodal susceptance for inductance) gives the same response as SPICE in the high-frequency range, but the response is not correct in the low-frequency range. When s approaches zero, the actual driving-point impedance in Figure 7.5 should be dominated by three capacitors with a total capacitance value of 43 fF. However, for Figure 7.6 at
[Figure 7.6 schematic: the circuit of Figure 7.5 with the inductors replaced by susceptance elements. Element values: Iin = 1 A; Rd1 = Rd2 = 100 Ω; Rw1 = Rw2 = 17 Ω; Cw1 = Cw2 = 22.1 fF; CL1 = CL2 = 10 fF; C12 = 11.1 fF; S11/s and S22/s with S11 = S22 = 1.85 GH^-1; coupling elements S12/s with values 1.48 GH^-1 and −1.48 GH^-1.]
Figure 7.6 Example of a coupled two-bit RLCM circuit under the nodal susceptance model. Reprinted with permission from [135] (c) 2005 IEEE.
DC, the driving-point impedance becomes a resistor with a total resistance value of 234 Ω (or 49 Ω (dB)) owing to the unwanted DC paths.

[Figure 7.7 plot: frequency responses of SPICE and 6-inductor model. Vertical axis: ohmDB, 0 to 300; horizontal axis: frequency, 10^0 to 10^14 (logarithmic). Curves: PEEC in SPICE, susceptance under NA, VPEC in SPICE.]
Figure 7.7 Frequency responses of PEEC model in SPICE, susceptance under NA and VPEC models for the two-bit bus. Reprinted with permission from [135] (c) 2005 IEEE.
There are two reasons for such a discrepancy. First, the nodal admittance Γ/s is indefinite at DC. Second, and more importantly, the Γ matrix itself is singular. This is due to the fact that the number of nodal-voltage variables $n_v$ is usually much larger than the number of inductive branch-current variables $n_i$. There are originally only $n_i$ independent variables for L, but the dimension of Γ becomes $n_v$, which is much larger than
[Figure 7.8 panels: (a) stamp of G, (b) stamp of Γ, and (c) stamp of C, each a 6 × 6 matrix over the node voltages v1 to v6; (d) stamp of B, relating ports p1 and p2 to the node voltages.]
Figure 7.8 Stamp of the second-order admittance in the NA matrix, where (a), (b), (c), and (d) represent G, Γ, C, and B, respectively. G (rank = 4) and Γ (rank = 4) are both singular 6 × 6 matrices. Reprinted with permission from [135] (c) 2005 IEEE.
ni . Therefore, Γ is singular and has a low rank. Figure 7.8 shows the stamping of Γ for the example of two wires in Figure 7.6. As a result, the NA formulation of inductance, which is based on L−1 , is no longer equivalent to the original circuit matrix. Hence circuit reduction starting with nodal-susceptance formulation cannot give the correct low-frequency response in general, and is not suitable for generating wideband macro-models of the interconnects.
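The rank deficiency of Γ can be checked numerically for the two coupled wires. The sketch below uses the inductance values of Figure 7.5 (self inductance 1.48 nH, mutual inductance 1.18 nH); the 4 × 2 incidence matrix and its sign convention (inductor 1 between nodes A and B, inductor 2 between nodes C and D) are assumptions made for illustration.

```python
import numpy as np

# Partial inductance matrix of the two wires (values from Figure 7.5), in henries.
L = np.array([[1.48e-9, 1.18e-9],
              [1.18e-9, 1.48e-9]])
S = np.linalg.inv(L)                      # susceptance matrix S = L^-1

# Assumed incidence matrix: rows are nodes (A, B, C, D), columns are inductors.
A_l = np.array([[ 1.0,  0.0],
                [-1.0,  0.0],
                [ 0.0,  1.0],
                [ 0.0, -1.0]])

Gamma = A_l @ S @ A_l.T                   # 4 x 4 nodal susceptance
tol = 1e-6 * np.linalg.norm(Gamma, 2)     # explicit tolerance for the rank test
rank = np.linalg.matrix_rank(Gamma, tol=tol)

print("S11, S12 in 1/GH:", S[0, 0] / 1e9, S[0, 1] / 1e9)
print("Gamma is", Gamma.shape, "with rank", rank)
```

With two inductors and four nodes, Γ has only two independent directions, so two of its four eigenvalues are zero. The computed entries of S also agree with the element values shown in Figure 7.6.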
7.4.2 Inductance formulation by VPEC model

From the above discussion, we know that the inductance formulation by nodal susceptance leads to an inaccurate low-frequency response. It is not suitable for generating reduced interconnect models for wideband applications. However, directly handling mutual inductance in a dense MNA formulation, as in [3], is computationally expensive. As shown in [134], the sparsified VPEC model not only achieves a runtime speed-up, but also has high accuracy compared with the original full model. Therefore, we use the VPEC model to represent inductance in our circuit-reduction flow, as it enables passive pre-sparsification [86, 134]. The significant difference between the VPEC and nodal-susceptance models for mutual inductance is that VPEC is a physically equivalent model, and it can exactly represent the original system [134]. As shown in Figure 7.9, this model consists of an
[Figure 7.9 schematic: an electrical circuit block (Iin = 1 A; Rd1 = Rd2 = 100 Ω; Rw1 = Rw2 = 17 Ω; Cw1 = Cw2 = 22.1 fF; CL1 = CL2 = 10 fF; C12 = 11.1 fF; controlled voltage sources V1 = 1e-3 V̂1 and V2 = 1e-3 V̂2) and a magnetic circuit block (controlled sources Î1 = 1e-3 I1 and Î2 = 1e-3 I2; effective resistances R̂10 = R̂20 = 2.66e-3 Ω and R̂12 = 6.75e-4 Ω; unit inductances L1 = L2 = 1 H).]
Figure 7.9 Example of a coupled two-bit RLCM circuit under the VPEC model. Reprinted with permission from [135] (c) 2005 IEEE.
electrical circuit (PEEC resistance and capacitance) and a magnetic circuit (VPEC effective resistance and controlled source). It includes the following components: (1) the wire resistance and capacitance, which are the same as in the PEEC model; (2) a dummy voltage source (sensing the electrical current $I_i$) to control $\hat{I}_i$; (3) a voltage-controlled current source to relate $\hat{V}_i$ and $\hat{I}_i$ with gain g = 1; (4) an electrical voltage source $V_i$ controlled by $\hat{V}_i$; (5) effective resistors, including ground $\hat{R}_{i0}$ and coupling $\hat{R}_{ij}$, to account for the strength of the inductances; and (6) a unit inductance $L_i$ to (i) take into account the time derivative of $A_i$, and (ii) preserve the magnetic energy from the electrical circuit.

Clearly, this SPICE-compatible implementation does not introduce unwanted DC paths when s = 0, as the nodal susceptance does. Moreover, Figure 7.7 shows the response of the VPEC model for the 2-bit circuit, which is identical to SPICE over the entire frequency range. Detailed analysis also shows that the impedance function of the 2-bit circuit modeled by the VPEC model is the same as the impedance function of the PEEC model. As shown in [134], although the VPEC model introduces more circuit elements, it has a faster runtime because it dramatically reduces the number of reactive elements (i.e., inductors), which leads to fewer numerical derivatives and integrals and makes the simulation converge faster. To further improve the sparsified VPEC model extraction without full inversion, as in [134], we extend a windowing technique [9]. It reduces the computational complexity to $O(Nb^3)$, where b is the size of the window (i.e., the size of the submatrix).

Note that although the VPEC model enables efficient inductance simulation, the order of the circuit matrix is still high. Moreover, its SPICE-compatible model still contains controlled sources, and cannot be handled by the existing realizable circuit-reduction approaches [3, 98].
7.5 Summary

The primary content of this chapter is the derivation of the inversion-based full VPEC model for multiple inductive interconnects and an illustration of how to build a sparsified VPEC model for SPICE simulation with guaranteed passivity. Using a vector potential, we developed the VPEC model to model the inverse-inductance elements. The resulting VPEC circuit is passive and has a strictly diagonally dominant matrix. This enables truncation-based sparsification methods with guaranteed passivity. We have presented the truncation-based method and have achieved orders-of-magnitude speed-up in circuit simulation with small errors compared to the PEEC model. We have also shown that the matrix $\hat{G}$ can be used to justify the K-element or susceptance-based simulation [9, 16, 25, 58, 141] from first principles. Note that SPICE can directly simulate the VPEC model but not the K-element-based model. In addition, we also showed that the direct circuit stamping of the K elements by NA leads to singular matrices. In contrast, the VPEC model can be accurately stamped in SPICE.
8 Structure-preserving model order reduction
8.1 Introduction

The integrated circuit industry has continuously enjoyed enormous success owing to its ever-increasing large-scale integration. System-on-a-chip (SOC) technology [30, 133] requires heterogeneous integration to support different modules within one single silicon chip, such as logic, memory, analog, RF/microwave, FPGA, and MEMS sensors. Such heterogeneous integration leads to a highly non-uniform current distribution across one layer or between any pair of layers. As a result, it is beneficial to design a structured multi-layer and multi-scale power and ground (P/G) grid [11] that is globally irregular and locally regular [115] according to the current density. This results in a heterogeneously structured P/G circuit model in which each subblock can have a different time constant. In addition, typical extracted P/G grid circuits usually have millions of nodes and large numbers of ports. To ensure power integrity, specialized simulators for structured P/G grids are required to efficiently and accurately analyze the voltage bounce or drop using macro-models. In [139], internal sources are first eliminated to obtain a macro-model with only external ports. The entire P/G grid is partitioned at and connected by those external ports. Because elimination results in a dense macro-model, [139] applies an additional sparsification procedure that is error-prone and inefficient. In addition, [18, 95] proposed localized simulation and design methods based on the locality of the current distribution in most P/G grids with C4 bumps. The P/G grid is divided into several subblocks, where the current sources in each block only affect the voltage fluctuation in that block. Nevertheless, none of these methods [18, 95, 139] considers the heterogeneous structure of P/G grids resulting from the non-uniform current density. The result is a flat time-domain transient simulation in which all blocks use the same time-step.
It would improve efficiency if each partitioned block had a different time constant and, hence, could be simulated with a different time-step, as in the waveform relaxation method [132]. An alternative approach for obtaining a macro-model is to use moment matching [91] by projection-based model order reduction (MOR) [32, 63, 85, 113], where PRIMA [85] is the method widely used to efficiently generate an order-reduced macro-model with preserved passivity. The reduced model of PRIMA [85] by a projection matrix with order q can match $n = \lfloor q/n_p \rfloor$ block moments (where $n_p$ is the port number). PRIMA can be implemented in an iterative path-tracing fashion to efficiently solve tree-structured P/G grids [120]. However, it is too inefficient to be directly applied to mesh-structured P/G grids. The difficulty in applying MOR to analyze P/G grids stems mainly from the following reasons. The cost of Arnoldi orthonormalization is high for large circuits, and moment matching using the block Krylov subspace is less accurate with an increased number of ports. In addition, the reduced macro-model is dense, which slows down simulation when the port number is large [34]. To reduce the orthonormalization cost for large circuits, HiPRIME [71] applies a partitioned PRIMA to reduce the entire circuit in a divide-and-conquer fashion. After gluing the reduced state matrices, HiPRIME performs an additional projection to further reduce the entire system. However, all these approaches [71, 85] use a flat projection that leads to a loss of the structure information of the state matrices. For example, the original state matrices may be sparse, but they become dense after flat projection. The resulting macro-model is therefore too dense to be efficiently factorized in time-domain or frequency-domain simulation. To leverage the subblock structure in the G and C matrices during macro-model construction, we introduce a structured model order reduction method in this chapter.
8.2 Chapter overview

In Section 8.3, we first review the moment-matching theorems. Instead of matching block moments of the transfer function, we directly match moments of the output with an excitation current vector. As a result, the first q moments or q dominant poles of the output can be matched using a projection matrix with order q, independently of the port number. In contrast, the number of block moments matched by PRIMA decreases as the port number increases. In Section 8.4, we first review a recent 2 × 2 structure-preserving model order reduction: SPRIM [37]. We then introduce a generalized block structure-preserving model order reduction (BSMOR). Unlike SPRIM, it further partitions the flat projection matrix obtained from PRIMA into m blocks according to the provided block representation of the state matrices. We show that this reduction preserves the block structure. As a result, the macro-model obtained by BSMOR is intrinsically sparse, and does not need the LP-sparsification used in [139]. Moreover, we show that the reduced macro-model can exactly match q dominant poles and approximate an additional (m − 1)q poles. However, these two methods do not consider the heterogeneous substructure of the state matrices. The reduced model is not compact or optimal, as discussed in Section 8.5. In addition, as in [85], it orthonormalizes the entire state matrices to obtain the projection matrix. As a result, it is inefficient for large circuits. In Section 8.5, we propose a triangularization-based structure-preserving model order reduction (TBS) method.
• We first represent the original system by interconnected basic blocks. The basic blocks are obtained from the current density of locally regular structures in P/G grids. We reduce each basic block independently with order q, determine its first q dominant poles, and obtain its corresponding projection matrix.
• We then carry out dominant-pole-based clustering to obtain m clusters of basic blocks, where m is decided by the nature of the structured network. Each cluster is called a compact block with a unique pole distribution and a projection matrix. Because clustering reduces the redundant block information, the block-based form in our method is more compact than that in BSMOR.
• We further triangulate the system into a triangular system with m compact blocks on the diagonal. The poles of the resulting triangular system are determined only by the m diagonal blocks. A block-diagonal-structured projection matrix is constructed by stacking projection matrices for individual diagonal blocks in the triangular system. The reduced triangular system can be proved to match mq poles of the original one. This is the primary contribution of this method. Because PRIMA or HiPRIME can only match q poles using the same number of moments, the reduced system by TBS is more accurate, or TBS has a higher reduction efficiency under the same error bound. Moreover, the mq poles are exactly matched in TBS but not in BSMOR.
In addition, as discussed in Section 8.6, the obtained macro-model by BSMOR and TBS also enables a two-level analysis similar to [47] to solve each reduced block independently with different time steps. It reduces the transient simulation time in both frequency and time domains. In contrast, the reduced model by PRIMA and HiPRIME is dense and cannot be analyzed in a fashion of two-level analysis. We present the experiments in Section 8.7.
8.3 Background

8.3.1 Grimme's moment-matching theorem

Using modified nodal analysis (MNA), the system equation of a P/G grid in the frequency domain is

$$(G + sC)x(s) = Bu(s), \qquad y(s) = L^T x(s), \tag{8.1}$$

where x(s) is the state variable vector, G and C ($\in R^{N \times N}$) are the state matrices for conductance and capacitance with size N, and B and L ($\in R^{N \times n_p}$) are the input and output port-incident matrices with $n_p$ ports. Eliminating x(s) in (8.1) gives

$$H(s) = L^T (G + sC)^{-1} B, \tag{8.2}$$

where H(s) is a multiple-input multiple-output (MIMO) transfer function. PRIMA [85] finds a projection matrix V ($\in R^{N \times q}$) of dimension q whose columns span the n-block ($n = \lceil q/n_p \rceil$) Krylov subspace $\mathcal{K}(A, R, n)$, i.e.,

$$\mathcal{K}(A, R, n) = \mathrm{span}(V) = \mathrm{span}\{R, AR, \ldots, A^{n-1}R\}, \tag{8.3}$$

where the two moment-generating matrices are

$$A = (G + s_0 C)^{-1} C, \qquad R = (G + s_0 C)^{-1} B,$$

and $s_0$ is an expansion point that ensures that $G + s_0 C$ is non-singular. The reduced transfer function is

$$\hat{H}(s) = \hat{L}^T (\hat{G} + s\hat{C})^{-1} \hat{B}, \tag{8.4}$$

where

$$\hat{G} = V^T G V, \quad \hat{C} = V^T C V, \quad \hat{B} = V^T B, \quad \hat{L} = V^T L.$$

Note that $\hat{G}$ and $\hat{C} \in R^{q \times q}$, and $\hat{B}$ and $\hat{L} \in R^{q \times n_p}$. As proved in [45], $\hat{H}(s)$ preserves the block moments of H(s), i.e.,

Theorem 8.1. If $\mathcal{K}(A, R, n) \subseteq \mathrm{span}(V)$, then the first n expanded block moments at $s_0$ are identical for H(s) and $\hat{H}(s)$.
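Theorem 8.1 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than the PRIMA implementation: the 12-node RC ladder, the port locations, and the use of a plain QR factorization in place of block Arnoldi are all choices made here for brevity. The block moments are taken as $L^T A^k R$, following the definitions of A and R above.

```python
import numpy as np

def block_krylov_basis(G, C, B, s0, n):
    """Orthonormal basis of span{R, AR, ..., A^(n-1) R}, with
    A = (G + s0 C)^-1 C and R = (G + s0 C)^-1 B."""
    K = G + s0 * C
    blocks = [np.linalg.solve(K, B)]
    for _ in range(n - 1):
        blocks.append(np.linalg.solve(K, C @ blocks[-1]))
    V, _ = np.linalg.qr(np.hstack(blocks))   # plain QR instead of block Arnoldi
    return V

def block_moments(G, C, B, L, s0, count):
    """Block moments L^T A^k R for k = 0 .. count-1, expanded at s0."""
    K = G + s0 * C
    X = np.linalg.solve(K, B)
    moments = []
    for _ in range(count):
        moments.append(L.T @ X)
        X = np.linalg.solve(K, C @ X)
    return moments

# Hypothetical RC ladder: N = 12 nodes, n_p = 2 ports, n = 3 block moments.
N, s0, n = 12, 0.0, 3
G = 3.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C = np.diag(np.linspace(1.0, 2.0, N))
B = np.zeros((N, 2))
B[0, 0] = 1.0
B[-1, 1] = 1.0
L = B

V = block_krylov_basis(G, C, B, s0, n)       # q = n * n_p = 6 columns
Gr, Cr, Br, Lr = V.T @ G @ V, V.T @ C @ V, V.T @ B, V.T @ L

full = block_moments(G, C, B, L, s0, n)
reduced = block_moments(Gr, Cr, Br, Lr, s0, n)
matched = [np.allclose(mf, mr, rtol=1e-8, atol=1e-12)
           for mf, mr in zip(full, reduced)]
print("V shape:", V.shape, "block moments matched:", matched)
```

With q = 6 columns and two ports, only three block moments are guaranteed to match, which illustrates how the matched order drops as the port count grows.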
8.3.2 Moment matching of output response

According to Theorem 8.1, if there is only one port, i.e., we have a single-input single-output (SISO) system, the reduced model can match q moments. When the port number $n_p$ is large, which is typical for P/G grids, the number n of matched block moments is reduced and the reduced transfer function $\hat{H}(s)$ is less accurate. In this case, it is better to define an excitation current vector J = Bu(s) and directly match the moments of the output $x(s) = (G + sC)^{-1}J$ with the input vector J. As a result, the number of matched moments of the output with input J is q, which is independent of the port number $n_p$. This is because a MIMO system with right-hand side Bu can be transformed into superposed SISO systems with the input J. The following theorem has been proved.

Theorem 8.2. Assume a MIMO system with a unit-impulse current source u, and define the excitation current vector J = Bu, where $u \in R^p$ and $J \in R^N$. When the q columns of the projection matrix V are obtained, the reduced response at the output $\hat{x}(s) = (\hat{G} + s\hat{C})^{-1}\hat{J}$ ($\hat{J} = V^T J$) matches the first q moments of the original $x(s) = (G + sC)^{-1}J$.

Note that the following two systems have the same output x(s):

$$(G + sC)x(s) = Bu(s), \qquad (G + sC)x(s) = J(s). \tag{8.5}$$

In addition, J can be decomposed into several excitation components

$$J = \sum_{i=1}^{p} J_i = [J_1 \ 0 \ \ldots \ 0]^T + \ldots + [0 \ \ldots \ 0 \ J_p]^T.$$

Clearly, each $J_i$ is equivalent to exciting a SISO system with input $J_i$. Therefore, $\hat{x}_i(s)$ matches the first q moments of $x_i(s)$. With superposition, it is easy to verify that $\sum_{i=1}^{p} \hat{x}_i(s)$ matches the first q moments of $\sum_{i=1}^{p} x_i(s)$. In contrast, PRIMA [85] matches the block moments of the transfer function with input matrix B. Moreover, we have

Corollary 8.1. With the input J, the first q dominant poles of x(s) are matched by $\hat{x}(s)$.

Using the excitation current vector J as input, the first q moments are identical for x(s) and $\hat{x}(s)$. So are the first q dominant poles. Because typical P/G grids contain large numbers of ports, in this chapter the MOR is performed to match the moments of the output x(s) with the input J = Bu, similar to [71, 130]. This theorem can easily be extended to inputs with non-impulse current sources by using a generalized excitation current source with an augmented Arnoldi orthonormalization [112].
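The output moment matching of Theorem 8.2 can be sketched in the same way. The assumption made here is that V spans the single-vector Krylov subspace generated by $A = (G + s_0 C)^{-1}C$ and the starting vector $(G + s_0 C)^{-1}J$; the ladder circuit, the four port locations, and the helper names are hypothetical. The reduced moments are lifted back with V before they are compared with the moments of x(s).

```python
import numpy as np

def arnoldi_basis(K, C, r0, q):
    """q orthonormal columns spanning {r0, A r0, ..., A^(q-1) r0}, A = K^-1 C."""
    V = np.zeros((len(r0), q))
    v = r0 / np.linalg.norm(r0)
    for k in range(q):
        V[:, k] = v
        w = np.linalg.solve(K, C @ v)
        for j in range(k + 1):                 # modified Gram-Schmidt sweep
            w = w - (V[:, j] @ w) * V[:, j]
        v = w / np.linalg.norm(w)
    return V

def output_moments(G, C, J, s0, count):
    """Moments A^k (G + s0 C)^-1 J of the output, for k = 0 .. count-1."""
    K = G + s0 * C
    x = np.linalg.solve(K, J)
    moments = []
    for _ in range(count):
        moments.append(x)
        x = np.linalg.solve(K, C @ x)
    return moments

# Hypothetical RC ladder with N = 12 nodes and n_p = 4 ports, reduced to q = 4.
N, n_p, q, s0 = 12, 4, 4, 0.0
G = 3.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C = np.diag(np.linspace(1.0, 2.0, N))
B = np.zeros((N, n_p))
B[np.array([0, 3, 7, 11]), np.arange(n_p)] = 1.0
J = B @ np.ones(n_p)                           # excitation vector J = B u

K = G + s0 * C
V = arnoldi_basis(K, C, np.linalg.solve(K, J), q)
Gr, Cr, Jr = V.T @ G @ V, V.T @ C @ V, V.T @ J

full = output_moments(G, C, J, s0, q)
reduced = [V @ m for m in output_moments(Gr, Cr, Jr, s0, q)]
matched = [np.allclose(a, b, rtol=1e-8, atol=1e-10)
           for a, b in zip(full, reduced)]
print("output moments matched:", matched)
```

Here q = 4 moments of the output are matched even though the system has four ports, whereas a block Krylov basis of the same size would cover only one block moment.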
8.4 Block-structure-preserving model reduction

8.4.1 SPRIM method

The SPRIM method [37] observes a 2 × 2 block structure of the MNA matrices

$$\mathcal{G} = \begin{bmatrix} G & A^T \\ -A & 0 \end{bmatrix}, \qquad \mathcal{C} = \begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}, \tag{8.6}$$

where G ($\in R^{n_1 \times n_1}$), C ($\in R^{n_1 \times n_1}$), and L ($\in R^{n_2 \times n_2}$) are the conductance, capacitance and inductance matrices, respectively, and A ($\in R^{n_2 \times n_1}$) is the adjacency matrix indicating the branch current flow at the inductors. Note that $n_1 + n_2 = N$. SPRIM constructs a structured projection matrix $\tilde{V}$ accordingly by partitioning the flat V obtained from the q-th PRIMA iteration:

$$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \rightarrow \tilde{V} = \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}, \tag{8.7}$$

where $V_1 \in R^{n_1 \times q n_p}$, $V_2 \in R^{n_2 \times q n_p}$, and $\tilde{V} \in R^{N \times 2 q n_p}$. As a result, the number of columns in $\tilde{V}$ is twice that in V. Accordingly, the new reduced state matrices are

$$\tilde{\mathcal{G}} = \begin{bmatrix} \tilde{G} & \tilde{A}^T \\ -\tilde{A} & 0 \end{bmatrix}, \qquad \tilde{\mathcal{C}} = \begin{bmatrix} \tilde{C} & 0 \\ 0 & \tilde{L} \end{bmatrix}, \tag{8.8}$$

where

$$\tilde{G} = V_1^T G V_1, \quad \tilde{A} = V_2^T A V_1, \quad \tilde{C} = V_1^T C V_1, \quad \tilde{L} = V_2^T L V_2.$$

Similarly, the sizes of $\tilde{\mathcal{G}}$, $\tilde{\mathcal{C}}$ ($\in R^{2 q n_p \times 2 q n_p}$), and $\tilde{B}$ ($\in R^{2 q n_p \times n_p}$) are twice those of $\hat{G}$, $\hat{C}$, and $\hat{B}$ reduced by using V. Therefore, the reduced model with state matrices $\tilde{\mathcal{G}}$ and $\tilde{\mathcal{C}}$ has approximately twice as many poles as the reduced model with state matrices $\hat{G}$ and $\hat{C}$. An improved accuracy over PRIMA is observed [37]. Since the reduced model is written in the first-order form in (8.8), the model reduced by SPRIM is twice as large as that produced by PRIMA. However, the reduced model produced by SPRIM preserves the structure of the original model and can be further reduced into the second-order form by using Schur's decomposition [44] to eliminate the branch-current variables:

$$\tilde{Y}_{NA} = \tilde{G} + s\tilde{C} + \frac{1}{s}\tilde{A}^T \tilde{L}^{-1} \tilde{A},$$

where $\tilde{Y}_{NA}$ is the reduced state matrix in NA form, which has the same size as the matrix reduced by using V. The difference is that each element of $\tilde{Y}_{NA}$ becomes a second-order rational function of s instead of a first-order function of s. Hence, the SPRIM algorithm essentially consists of two reduction steps: the first step is the structure-preserving projection-based reduction and the second step is block elimination. As a result, SPRIM can match more poles than PRIMA, while both result in a reduced model of the same size.
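The two SPRIM steps, the split projection of (8.7) and the block elimination that leads to $\tilde{Y}_{NA}$, can be illustrated on a small RLC ladder. Everything specific in this sketch is an assumption: the element values, the single port at node 0, the expansion point, and the two-column Krylov basis obtained with a plain QR factorization as a stand-in for PRIMA.

```python
import numpy as np

n1, n2 = 6, 4                                    # node voltages, inductor currents
G = 0.1 * np.eye(n1)
C = np.diag(np.linspace(1.0, 2.0, n1))
L = np.eye(n2) + 0.3 * (np.eye(n2, k=1) + np.eye(n2, k=-1))   # coupled, SPD
A = np.zeros((n2, n1))
for k in range(n2):                              # inductor k joins nodes k and k+1
    A[k, k], A[k, k + 1] = 1.0, -1.0

G_mna = np.block([[G, A.T], [-A, np.zeros((n2, n2))]])
C_mna = np.block([[C, np.zeros((n1, n2))], [np.zeros((n2, n1)), L]])
B = np.zeros((n1 + n2, 1))
B[0, 0] = 1.0

# Flat basis: two Krylov vectors at s0 = 1, orthonormalized by QR.
s0 = 1.0
K = G_mna + s0 * C_mna
r = np.linalg.solve(K, B)
V, _ = np.linalg.qr(np.hstack([r, np.linalg.solve(K, C_mna @ r)]))

# Step 1: split V row-wise and project with the block-diagonal matrix of (8.7).
V1, V2 = V[:n1], V[n1:]
k = V.shape[1]
V_tilde = np.block([[V1, np.zeros((n1, k))], [np.zeros((n2, k)), V2]])
G_red = V_tilde.T @ G_mna @ V_tilde
C_red = V_tilde.T @ C_mna @ V_tilde
B_red = V_tilde.T @ B

# Step 2: eliminate the reduced branch currents to obtain the NA form.
s = 2.0
G_t, A_t = G_red[:k, :k], -G_red[k:, :k]
C_t, L_t = C_red[:k, :k], C_red[k:, k:]
Y_na = G_t + s * C_t + A_t.T @ np.linalg.solve(L_t, A_t) / s

x_red = np.linalg.solve(G_red + s * C_red, B_red)
structure_kept = (np.allclose(G_red[k:, k:], 0.0)
                  and np.allclose(G_red[:k, k:], -G_red[k:, :k].T)
                  and np.allclose(C_red[:k, k:], 0.0))
na_consistent = np.allclose(Y_na @ x_red[:k], B_red[:k])
print("2x2 structure kept:", structure_kept, "NA form consistent:", na_consistent)
```

The reduced first-order model has 2k unknowns, while the NA form has k, the same size as a flat projection with V.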
8.4.2 Block structured projection

Nevertheless, SPRIM does not consider the possible subblock structure inside the G and C matrices. For a structured system, the original P/G grid can be partitioned into $m_0$ basic blocks, where a dense grid with a small pitch is used for a region with high current density, and a sparse grid with a large pitch is used for a region with low current density [115]. Because of the heterogeneous structure of the grids, each basic block can have different RC values, $G_{ii}$ and $C_{ii}$, and the basic blocks are interconnected by the coupling blocks $G_{ij}$ and $C_{ij}$ ($i \neq j$), respectively. As a result, the conductance matrix G can be written in a block representation

$$G = \begin{bmatrix} G_{1,1} & G_{1,2} & \cdots & G_{1,m_0} \\ G_{2,1} & G_{2,2} & \cdots & G_{2,m_0} \\ \vdots & \vdots & \ddots & \vdots \\ G_{m_0,1} & G_{m_0,2} & \cdots & G_{m_0,m_0} \end{bmatrix}, \tag{8.9}$$

where each block has the size $n_k$ ($\sum_{k=1}^{m_0} n_k = N$). A similar block structure is assumed for the C matrix, and x and J become

$$x = [x_1 \ x_2 \ \ldots \ x_{m_0}]^T, \qquad J = [J_1 \ J_2 \ \ldots \ J_{m_0}]^T, \tag{8.10}$$

where each block contains user-specified $n_{p_k}$ ports ($n_p = \sum_{k=1}^{m_0} n_{p_k}$).
Accordingly, we further partition the qth-order projection matrix V obtained from PRIMA according to the $m_0$ blocks in (8.9):

$$V = \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_{m_0} \end{bmatrix} \rightarrow \tilde{V} = \begin{bmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_{m_0} \end{bmatrix}, \tag{8.11}$$
where each $V_i \in R^{n_i \times q}$ and $V \in R^{N \times q}$, but $\tilde{V} \in R^{N \times m_0 q}$. We call this reduction a block structure-preserving model order reduction (BSMOR). Note that $V^T V = I$, i.e., the columns of $\tilde{V}$ are still linearly independent and the total column rank is increased by a factor of the block number $m_0$. The order-reduced state matrices are obtained by projecting with $\tilde{V}$:

$$\tilde{G} = \tilde{V}^T G \tilde{V}, \qquad \tilde{C} = \tilde{V}^T C \tilde{V}, \qquad \tilde{J} = \tilde{V}^T J. \tag{8.12}$$

Elementwise, we have

$$\tilde{G}_{i,j} = V_i^T G_{i,j} V_j, \qquad \tilde{C}_{i,j} = V_i^T C_{i,j} V_j, \qquad \tilde{J}_i = V_i^T J_i, \tag{8.13}$$

where $\tilde{G}_{i,j}$ represents the block at block row i and block column j in the reduced matrix $\tilde{G}$, and similarly for $\tilde{C}_{i,j}$ and $\tilde{J}_i$. Note that such an $m_0 \times m_0$ block projection preserves the structure and sparsity of the original G and C matrices. For example, when projected by $\tilde{V}$, the reduced $\tilde{G}$ matrix is

$$\tilde{G} = \begin{bmatrix} V_1^T G_{1,1} V_1 & V_1^T G_{1,2} V_2 & \cdots & V_1^T G_{1,m_0} V_{m_0} \\ V_2^T G_{2,1} V_1 & V_2^T G_{2,2} V_2 & \cdots & V_2^T G_{2,m_0} V_{m_0} \\ \vdots & \vdots & \ddots & \vdots \\ V_{m_0}^T G_{m_0,1} V_1 & V_{m_0}^T G_{m_0,2} V_2 & \cdots & V_{m_0}^T G_{m_0,m_0} V_{m_0} \end{bmatrix}. \tag{8.14}$$

One important observation is that if the original G is sparse, the resulting $\tilde{G}$ is sparse as well. In contrast, when projected by the flat V using PRIMA or HiPRIME, the resulting $\hat{G}$ is

$$\hat{G} = \sum_{i=1}^{m_0} \sum_{j=1}^{m_0} V_i^T G_{i,j} V_j, \tag{8.15}$$
which is dense in general. Similar sparsity holds for $\tilde{C}$ and $\tilde{J}$. Because the column rank of $\tilde{V}$ is $m_0$ times that of V, the reduced model with state matrices $\tilde{G}$ and $\tilde{C}$ can approximate an additional $(m_0 - 1)q$ poles compared with the reduced model with state matrices $\hat{G}$ and $\hat{C}$. However, the additional $(m_0 - 1)q$ poles obtained with $\tilde{V}$ are not exactly matched. This projection has the following limitations:

1. The system poles are determined not only by the diagonal blocks, but also by the off-diagonal blocks.
2. The initially provided $m_0$ basic blocks are not optimal for pole matching because they could have pole overlap with each other.
3. The construction of $\tilde{V}$ requires obtaining V by reducing the entire state matrices. Its orthonormalization cost is expensive.

As discussed in Section 8.5, these problems can be solved using a triangularization-based block structure-preserving model order reduction.
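The sparsity argument behind (8.14) and (8.15) can be made concrete with a small block-tridiagonal matrix. In this sketch a random orthonormal matrix stands in for the flat PRIMA basis V, and the block sizes and the helper name are hypothetical; the point is only the structure of the two projections.

```python
import numpy as np

def block_diagonal_projection(V, sizes):
    """Place the row blocks V_1 .. V_m0 of V on the diagonal, as in (8.11)."""
    N, q = V.shape
    V_tilde = np.zeros((N, q * len(sizes)))
    row = 0
    for k, n_k in enumerate(sizes):
        V_tilde[row:row + n_k, k * q:(k + 1) * q] = V[row:row + n_k]
        row += n_k
    return V_tilde

sizes, q = [4, 4, 4], 2                          # m0 = 3 basic blocks
N, m0 = sum(sizes), len(sizes)
# Chain-like grid: blocks 1 and 3 are not coupled, so G_13 = G_31 = 0.
G = 3.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((N, q)))  # stand-in for a PRIMA basis

V_tilde = block_diagonal_projection(V, sizes)
G_tilde = V_tilde.T @ G @ V_tilde                # (m0 q) x (m0 q), block structured
G_hat = V.T @ G @ V                              # q x q, flat projection

blocks = [[G_tilde[i * q:(i + 1) * q, j * q:(j + 1) * q] for j in range(m0)]
          for i in range(m0)]
zero_block_kept = np.allclose(blocks[0][2], 0.0) and np.allclose(blocks[2][0], 0.0)
flat_is_sum = np.allclose(sum(blocks[i][j] for i in range(m0) for j in range(m0)),
                          G_hat)
print("zero coupling blocks preserved:", zero_block_kept)
print("flat projection equals the sum of all blocks:", flat_is_sum)
```

The flat projection adds all blocks into one q × q matrix, which is why it cannot retain the zero coupling blocks of G.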
8.5 TBS method

8.5.1 Compact block formulation

We first present a dominant-pole-based clustering to generate a compact block representation of the basic blocks.

Two-level organization of basic blocks

We first decompose G and C in (8.9) by a two-level representation

$$G + sC = Y_0(s) + Y_1(s). \tag{8.16}$$

The diagonal part $Y_0(s)$ is $G_0 + sC_0$, where

$$G_0 = \mathrm{diag}[G_{11}, \ldots, G_{m_0 m_0}], \qquad C_0 = \mathrm{diag}[C_{11}, \ldots, C_{m_0 m_0}].$$

Note that each block matrix $G_{ii}$ or $C_{ii}$ is symmetric positive definite (s.p.d.), i.e., each basic block is passive. The off-diagonal part $(Y_1)_{ij}$ is composed of the coupling blocks $G_{ij} + sC_{ij}$ ($i \neq j$). Its entries are usually smaller than those of the basic blocks on the diagonal. Accordingly, the moment generation matrices for each basic block are

$$(A_0)_i = (G_{ii} + s_0 C_{ii})^{-1} C_{ii}, \qquad (R_0)_i = (G_{ii} + s_0 C_{ii})^{-1} J_i.$$

As will be discussed, the two-level decomposition enables a structure-preserving model order reduction that constructs its projection matrix only from the diagonal blocks. In addition, the reduced macro-model can be efficiently analyzed in a two-level fashion.

Clustering

To obtain a more compact block representation, we propose a bottom-up clustering algorithm based on the dominant poles. The system timing response for each basic block can be approximately determined by its q dominant poles, i.e., the first
145
8.5 TBS method
q most dominant eigenvalues or poles (λ1 ≤ ... ≤ λq ). Poles are calculated from e =G e −1 C e (∈ Rq×q ). the eigendecomposition of the order-reduced moment matrix A Note that when the excitation current vector is used for moment-matching the output, the size q of the reduced model with the desired accuracy can be much smaller than the size of the original model. As a result, the cost of eigendecomposition of reduced model is not high. Precisely, for m0 basic blocks, we calculate the first q dominant poles for each basic block by reducing it independently and finding its projection matrix Vi accordingly span(Vi ) = K((A0 )i , (R0 )i , q)
i = 1, ..., m0 .
(8.17)
ˆ i matches the first q moments of According to Theorem 8.2, using Vi , the reduced x ˆ i hence also matches the first q dominant poles of xi according xi with input Ji . x to Corollary 8.1. Assume block i has (Gii , Cii , Ji ). Its q-dominant pole set is e 0 )i ] = {λ1 ≤ ... ≤ λq }. Λi = eigen[(A
After merging block i with another block j and their interconnection (Gij , Cij ), its q-dominant-pole set becomes e 0 )0 ] = {λ0 ≤ ... ≤ λ0 }, Λ0i = eigen[(A i 1 q
where (A0 )0i is the new moment generation matrix for the merged block. Moreover, we can define the pole distance. If Λi and Λj are two dominant-pole sets, λm ∈ Λi and λn ∈ Λj , then the pole distance d(Λi , Λj ) is d(λm , Λj ) = min{|λm − λn | : λn ∈ Λj }
d(Λi , Λj ) = max{d(λm , Λj ) : λm ∈ Λi }.
The two basic blocks have a similar pole distribution and are clustered if $d(\Lambda'_i, \Lambda_i) < \epsilon$, where $\epsilon$ is a small value specified by the user. More basic blocks can be merged into this cluster if they have a similar pole distribution to the cluster. On the other hand, a basic block itself is a cluster if it does not share a similar pole distribution with other blocks. The clustering obtains m clusters of basic blocks, where m is decided by the structure of the P/G grids and $\epsilon$. We call each cluster a compact block in this chapter. Accordingly, we can obtain a set of projection matrices, $V = [V_{1\,(n_1 \times q)}, \ldots, V_{m\,(n_m \times q)}]$, one for each compact block. The overall procedure is outlined in Algorithm 8.1. This interconnected compact block representation reduces the complexity of the original basic block representation, as fewer blocks are needed to represent the original system. Moreover, the sets of the first q dominant poles of the clustered blocks have a minimum overlap. Note that because the original structured power grid shows a heterogeneous structure, each region can have various RC values, and the clustering algorithm will not converge to one entire circuit. This has been verified experimentally.
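The pole distance and the clustering test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pole_distance` and `similar_distribution`, the sample pole values and the threshold 0.05 are hypothetical and do not come from the text.

```python
import numpy as np

def pole_distance(poles_a, poles_b):
    """Directed distance d(A, B) = max over a in A of (min over b in B of |a - b|)."""
    a = np.asarray(poles_a, dtype=complex).reshape(-1, 1)
    b = np.asarray(poles_b, dtype=complex).reshape(1, -1)
    pairwise = np.abs(a - b)             # |lambda_m - lambda_n| for every pair
    return pairwise.min(axis=1).max()    # min over B first, then max over A

def similar_distribution(poles_merged, poles_block, eps):
    """Clustering test from the text: merge when d(merged, block) < eps."""
    return pole_distance(poles_merged, poles_block) < eps

# Hypothetical dominant-pole sets (illustrative values only)
block = [-1.0, -2.0, -5.0]
merged_close = [-1.01, -2.02, -4.97]     # merging barely moves the poles
merged_far = [-1.0, -2.0, -9.0]          # merging shifts one dominant pole

d_close = pole_distance(merged_close, block)
d_far = pole_distance(merged_far, block)
print(d_close, d_far)
print(similar_distribution(merged_close, block, 0.05),
      similar_distribution(merged_far, block, 0.05))
```

Note that the distance is directed: it measures how far the poles of the first set lie from the second set, which is why the merged set is passed first in the test.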
Algorithm 8.1 Dominant-pole clustering
  I = 0; /* M: unclustered blocks in $m_0$ */
  for every i in M do
    input: $(G_0)^{(0)}_{ii}$, $(C_0)^{(0)}_{ii}$, $(B_0)^{(0)}_i$;
    find: $V_i^{(0)}$ and $(\tilde{A}_0)^{(0)}_{ii}$ by PRIMA;
    /* Eigendecomposition of order reduced A */
    form: $\Lambda_i^{(0)} = \mathrm{eigen}[(\tilde{A}_0)^{(0)}_{ii}]$;
    for every j in M do
      while ($d(\Lambda_i^{(I)}, \Lambda_i^{(I+1)}) < \epsilon$) do
        cluster: Merge block i and block j;
        update: Delete block i and block j in M;
        find: $V_i^{(I+1)}$ and $(\tilde{A}_0)^{(I+1)}_{ii}$ by PRIMA;
        form: $\Lambda_i^{(I+1)} = \mathrm{eigen}[(\tilde{A}_0)^{(I+1)}_{ii}]$;
        calculate: $d(\Lambda_i^{(I)}, \Lambda_i^{(I+1)})$;
      end while
      I++;
    end for
  end for
  output: $\{V^{(I+1)}_{1\,(n_1 \times q)}, \ldots, V^{(I+1)}_{m\,(n_m \times q)}\}$;
8.5.2 TBS model order reduction
Although clustering results in m blocks, each with a unique pole distribution, the poles of the entire grid are not determined only by those diagonal blocks. In this section, we discuss how to form the upper triangular system $(\mathcal{G}, \mathcal{C})$ that is equivalent to the original system (G, C), and show that the system poles of $(\mathcal{G}, \mathcal{C})$ are determined only by its diagonal blocks [44]. With an additional block-structured projection, the reduced blocks can match more poles than the flat projection.
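Triangularization
The triangularization is based on the introduction of a replica block of (G, C), and moving the lower triangular blocks $(G_{ij}, C_{ij})$ ($i > j$) to the upper triangular parts at $(\mathcal{G}_{i,m+j}, \mathcal{C}_{i,m+j})$. The resulting triangular system has an upper triangular state matrix $\mathcal{G}$,
$$\mathcal{G} = \begin{bmatrix}
G_{11} & G_{12} & \cdots & G_{1m} & 0 & 0 & \cdots & 0 \\
0 & G_{22} & \cdots & G_{2m} & G_{21} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & G_{mm} & G_{m1} & G_{m2} & \cdots & 0 \\
0 & 0 & \cdots & 0 & \multicolumn{4}{c}{G}
\end{bmatrix}, \tag{8.18}$$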
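and $\mathcal{C}$ has a similar structure to $\mathcal{G}$. The port matrix (input vector) $\mathcal{J}$ and the state variable $\mathbf{x}$ are
$$\mathcal{J} = [J_1 \; J_2 \; \ldots \; J_m \; J]^T, \quad \mathbf{x} = [x_1 \; x_2 \; \ldots \; x_m \; x]^T,$$
where J and x are defined in (8.9) and (8.10). The resulting triangular system equation is
$$(\mathcal{G} + s\mathcal{C})\,\mathbf{x}(s) = \mathcal{J}. \tag{8.19}$$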
It is easy to verify that the solution $\mathbf{x}(s)$ from (8.19) is the same as x(s) from (8.1). Below, we prove that the new triangular system is passive.
Theorem 8.3. The upper block triangular system $(\mathcal{G}, \mathcal{C})$ is passive.
Proof: Because $\mathcal{G}$ is block upper triangular, its determinant is the product of the determinants of its diagonal blocks, so its eigenvalues are those of the diagonal blocks:
$$|\mathcal{G}| = \prod_{i=1}^{m+1} |(G_0)_i| = |(G_0)_1| \cdots |(G_0)_m|\,|G|.$$
Because each block $(G_0)_i$ ($1 \le i \le m$) and G are positive definite, $\mathcal{G}$ is positive definite as well. The same procedure can be used to prove that $\mathcal{C}$ is positive definite. Therefore, $\mathcal{G} + \mathcal{G}^T$ and $\mathcal{C} + \mathcal{C}^T$ are both s.p.d., and hence the triangular system is passive.
Note that directly solving (8.19) involves a cost similar to that of solving (8.1), as the replica block at the lower-right corner needs to be factorized first. As shown below, its benefits can be appreciated after a structure-preserving model order reduction, where the state variable of each reduced block can be solved independently with q matched poles.
mq-pole matching
After the dominant-pole clustering of Algorithm 8.1, we can also obtain a set of projection matrices $\{V_1, \ldots, V_m, V_{m+1}\}$, where $V_i$ ($1 \le i \le m$) is constructed for each block. Without using orthonormalization for the replica block, $V_{m+1}$ is obtained by
$$V_{m+1} = [V_1, \ldots, V_m] \quad (\in R^{N \times q}). \tag{8.20}$$
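Furthermore, instead of constructing a flat projection matrix
$$V = [V_1, \ldots, V_m, V_{m+1}] \quad (\in R^{2N \times q}), \tag{8.21}$$
we reconstruct a block-diagonal structured projection matrix $\mathcal{V}$:
$$\mathcal{V} = \mathrm{diag}[V_{1\,(n_1 \times q)}, \ldots, V_{m\,(n_m \times q)}, V_{m+1\,(N \times q)}], \tag{8.22}$$
with $\mathcal{V} \in R^{2N \times (m+1)q}$ and $\sum_{i=1}^{m} n_i = N$. Using $\mathcal{V}$ to project the $\mathcal{G}$, $\mathcal{C}$ and $\mathcal{J}$ matrices respectively, we can obtain the order-reduced state matrices
$$\tilde{\mathcal{G}} = \mathcal{V}^T \mathcal{G} \mathcal{V}, \quad \tilde{\mathcal{C}} = \mathcal{V}^T \mathcal{C} \mathcal{V}, \quad \tilde{\mathcal{J}} = \mathcal{V}^T \mathcal{J}.$$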
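In particular, the diagonal blocks in the reduced $\tilde{\mathcal{G}}$ and $\tilde{\mathcal{C}}$ are called reduced blocks. The reduced $\tilde{\mathcal{G}}$ matrix preserves the upper block triangular structure
$$\tilde{\mathcal{G}} = \begin{bmatrix} \tilde{G}_A & \tilde{G}_B \\ 0 & \tilde{G}_D \end{bmatrix}, \tag{8.23}$$
where
$$\tilde{G}_A = \begin{bmatrix}
V_{11}^T G_{11} V_{11} & V_{11}^T G_{12} V_{22} & \cdots & V_{11}^T G_{1m} V_{mm} \\
0 & V_{22}^T G_{22} V_{22} & \cdots & V_{22}^T G_{2m} V_{mm} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V_{mm}^T G_{mm} V_{mm}
\end{bmatrix},$$
$$\tilde{G}_B = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
V_{22}^T G_{21} V_{11} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
V_{mm}^T G_{m1} V_{11} & V_{mm}^T G_{m2} V_{22} & \cdots & 0
\end{bmatrix},$$
$$\tilde{G}_D = V_{m+1,m+1}^T \, G \, V_{m+1,m+1}. \tag{8.24}$$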
Since BSMOR does not use triangularization, its system poles are not determined by the diagonal blocks alone. Therefore, its reduced macro-model does not exactly match mq poles (see Figure 8.1). In contrast, TBS can exactly match mq poles, as discussed below.
Theorem 8.4. For the state matrices $\mathcal{G}$ and $\mathcal{C}$ in the upper triangular block form, if there is no overlap between the eigenvalues of the reduced blocks $(\tilde{G}_{ii}, \tilde{C}_{ii})$ ($\in R^{q \times q}$), i.e.,
$$|(\tilde{G}'_0)_1 + s(\tilde{C}'_0)_1| \cap \ldots \cap |(\tilde{G}'_0)_m + s(\tilde{C}'_0)_m| = \mathrm{null}, \tag{8.25}$$
the reduced system $(\tilde{\mathcal{G}} + s\tilde{\mathcal{C}})$ exactly matches mq poles of the original system $(\mathcal{G} + s\mathcal{C})$.
Proof: Because the original $\mathcal{G}$ and $\mathcal{C}$ are in the upper triangular form, and the projection by $\mathcal{V}$ preserves the structure, the reduced $\tilde{\mathcal{G}}$ and $\tilde{\mathcal{C}}$ are in the upper triangular block form as well. For an upper-triangular-block system $\tilde{\mathcal{G}} + s\tilde{\mathcal{C}}$, its poles (eigenvalues) are the roots of its determinant $|\tilde{\mathcal{G}} + s\tilde{\mathcal{C}}|$, which are determined only by the diagonal blocks:
$$|\tilde{\mathcal{G}} + s\tilde{\mathcal{C}}| = \prod_{i=1}^{m} |\tilde{G}_{ii} + s\tilde{C}_{ii}|.$$
Note that the eigenvalues of $|\tilde{\mathcal{G}} + s\tilde{\mathcal{C}}|$ represent the reciprocal poles of the reduced model [85]. For the reduced block $\tilde{G}_{ii} + s\tilde{C}_{ii}$ with input $J_i$, its output $\tilde{x}_i$ matches q moments and the first q dominant poles of the output $x_i$ for block $G_{ii} + sC_{ii}$ in the triangular system. Since the entire system consists of m compact blocks, each with a unique pole distribution, the reduced model by TBS can match mq poles. Note that the redundant poles obtained from the replica block are not counted here. With more matched poles, TBS is more accurate than HiPRIME and BSMOR. This will be shown in Section 8.7.

Algorithm 8.2 Two-level analysis
  1. Solve bottom level individually for the m + 1 reduced blocks
  for every i in m + 1 do
    (1.1) input: $\tilde{b}_i$, $(\tilde{Y}_0)_i$, $(\tilde{Y}_1)_i$;
    (1.2) factor: LU/Cholesky factor $(\tilde{Y}_0)_i$;
          $(\tilde{Y}_1)_i^{(0)} = \tilde{b}_i$;
    for every j in q do
      (1.3) solve: back-substitution $(\tilde{Y}_0)_i P_i^{(j)} = (\tilde{Y}_1)_i^{(j)}$;
    end for
  end for
  2. Solve top level
    (2.1) input: $P$, $P^{(0)}$;
    (2.2) factor: LU/Cholesky factor $I + P$;
    (2.3) solve: back-substitution $(I + P)Q = P^{(0)}$;
  3. Update bottom level individually
    (3.1) output: $x = P^{(0)} - PQ$.
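The triangularization of (8.18) and the pole argument of Theorem 8.4 can be checked numerically on a small example. The sketch below is only illustrative: the helper names `triangularize` and `pencil_poles`, the block sizes and the random test matrices are assumptions, and a diagonal capacitance matrix is chosen so that the triangular $\mathcal{C}$ is easy to invert.

```python
import numpy as np

def triangularize(M, sizes):
    """Build the 2N x 2N block upper-triangular matrix of (8.18) from an N x N matrix M.

    Blocks below the diagonal, M[i, j] with i > j, are moved to block column
    m + j; the replica of M sits in the lower-right corner.
    """
    N = M.shape[0]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    T = np.zeros((2 * N, 2 * N))
    T[N:, N:] = M                                      # replica block
    for i in range(len(sizes)):
        rows = slice(offsets[i], offsets[i + 1])
        for j in range(len(sizes)):
            cols = slice(offsets[j], offsets[j + 1])
            if j >= i:
                T[rows, cols] = M[rows, cols]          # keep the upper part in place
            else:
                T[rows, N + offsets[j]:N + offsets[j + 1]] = M[rows, cols]
    return T

def pencil_poles(Gm, Cm):
    """Roots of det(Gm + s*Cm) = 0, i.e. eigenvalues of -Cm^{-1} Gm (Cm invertible)."""
    return np.sort(np.linalg.eigvals(-np.linalg.solve(Cm, Gm)).real)

rng = np.random.default_rng(0)
sizes = [3, 2, 4]                                      # hypothetical block sizes n_1..n_m
N = sum(sizes)
W = rng.standard_normal((N, N))
G = W @ W.T + N * np.eye(N)                            # symmetric positive definite
C = np.diag(rng.uniform(1.0, 2.0, N))                  # diagonal capacitances
J = rng.standard_normal(N)
s = 0.7                                                # arbitrary real test frequency

Gt, Ct = triangularize(G, sizes), triangularize(C, sizes)
x_ref = np.linalg.solve(G + s * C, J)
x_tri = np.linalg.solve(Gt + s * Ct, np.concatenate([J, J]))

# Poles of the triangular pencil versus poles collected from its diagonal blocks
offsets = np.concatenate(([0], np.cumsum(sizes)))
diag_poles = [pencil_poles(G[a:b, a:b], C[a:b, a:b])
              for a, b in zip(offsets[:-1], offsets[1:])]
diag_poles.append(pencil_poles(G, C))                  # replica block
poles_from_blocks = np.sort(np.concatenate(diag_poles))
poles_triangular = pencil_poles(Gt, Ct)

print(np.allclose(x_tri[:N], x_ref), np.allclose(poles_triangular, poles_from_blocks))
```

Both halves of the triangular solution reproduce the original x(s), and the pole set of the triangular pencil is the union of the pole sets of its diagonal blocks, including the redundant poles contributed by the replica.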
8.6 Two-level analysis
Because the structure is preserved, the reduced triangular system by BSMOR and TBS can be further analyzed efficiently by a two-level analysis similar to [47]. Note that a direct backward-substitution-based analysis can be applied to the TBS-reduced model, as it is a triangular system. Because the two-level analysis enables a parallelized solution and can be extended to hierarchical analysis, it is used in this chapter to obtain the solution in both the frequency and time domains. As a result, the state variable of each reduced block can be solved independently with q matched poles.
Figure 8.1 Pole matching comparison: mq poles matched by TBS and BSMOR, and q poles matched by HiPRIME. Reprinted with permission from [138] (c) 2006 ACM.
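Consider the system equation for the reduced model
$$\tilde{Y} x = \tilde{b}. \tag{8.26}$$
In the frequency domain at a frequency point s, (8.26) becomes
$$\tilde{Y} = \tilde{\mathcal{G}} + s\tilde{\mathcal{C}} = \tilde{Y}_0(s) + \tilde{Y}_1(s), \quad \tilde{b} = \tilde{\mathcal{J}}(s),$$
and in the time domain at a time instant t with time step h, (8.26) becomes
$$\tilde{Y} = \tilde{\mathcal{G}} + \frac{1}{h}\tilde{\mathcal{C}} = \tilde{Y}_0(h) + \tilde{Y}_1(h), \quad \tilde{b} = \frac{1}{h}\tilde{\mathcal{C}}\,x(t - h) + \tilde{\mathcal{J}}(t).$$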
Note that the time step h can be different for each reduced block according to its dominant pole ($\lambda_1$). The state vector x can be solved for each block by a two-level analysis similar to [47]:
$$x = P^{(0)} - PQ, \tag{8.27}$$
where
$$P^{(0)} = (\tilde{Y}_0)^{-1}\tilde{b}, \quad P = (\tilde{Y}_0)^{-1}\tilde{Y}_1, \quad Q = (I + P)^{-1}P^{(0)}. \tag{8.28}$$
Figure 8.2 Non-zero (nz) pattern of conductance matrices: (a) original system, (b) triangular system, (c) reduced system by TBS. (a)–(c) have different dimensions, but (b)–(c) have the same triangular structure and the same diagonal block structure. Reprinted with permission from [138] (c) 2006 ACM.

To avoid explicit inversion, LU or Cholesky factorization needs to be applied to $\tilde{Y}_0$ and $I + (\tilde{Y}_0)^{-1}\tilde{Y}_1$. As $\tilde{Y}_0$ has a block diagonal form, each reduced block matrix is first solved independently with LU or Cholesky factorization and substitution at the bottom level. The results from each reduced block are then used to solve the coupling block at the top level, and the final $x_k$ of each reduced block is updated. The overall procedure is outlined in Algorithm 8.2.
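The two-level solve of (8.27) and (8.28) can be sketched with NumPy as follows. This is a minimal illustration rather than the implementation used for the experiments: the function name `two_level_solve`, the block sizes and the random test system are assumptions, and `np.linalg.solve` stands in for an explicit LU or Cholesky factorization of each diagonal block.

```python
import numpy as np

def two_level_solve(Y, b, sizes):
    """Solve Y x = b with Y = Y0 + Y1, where Y0 is the block-diagonal part of Y."""
    n = Y.shape[0]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    blocks = [slice(offsets[k], offsets[k + 1]) for k in range(len(sizes))]

    Y1 = Y.astype(float)                 # astype returns a copy
    for blk in blocks:
        Y1[blk, blk] = 0.0               # Y1 keeps only the coupling blocks

    # Bottom level: each diagonal block is solved independently
    P0 = np.empty(n)
    P = np.empty((n, n))
    for blk in blocks:
        rhs = np.column_stack([b[blk], Y1[blk, :]])
        sol = np.linalg.solve(Y[blk, blk], rhs)   # one factorization, many right-hand sides
        P0[blk] = sol[:, 0]              # P0 = Y0^{-1} b
        P[blk, :] = sol[:, 1:]           # P  = Y0^{-1} Y1

    # Top level: couple the blocks through (I + P) Q = P0
    Q = np.linalg.solve(np.eye(n) + P, P0)

    # Update: x = P0 - P Q, as in (8.27)
    return P0 - P @ Q

rng = np.random.default_rng(2)
sizes = [4, 3, 5]                        # hypothetical reduced block sizes
n = sum(sizes)
Y = rng.standard_normal((n, n)) + 2 * n * np.eye(n)   # large diagonal keeps blocks well conditioned
b = rng.standard_normal(n)

x_two_level = two_level_solve(Y, b, sizes)
x_direct = np.linalg.solve(Y, b)
print(np.allclose(x_two_level, x_direct))
```

The bottom-level loop is the part that can run in parallel, since each iteration touches only one diagonal block.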
8.7 Numerical examples
We implemented BSMOR and TBS on a Linux workstation (P4 2.66 GHz, 1 GB RAM). The RC mesh structures of the P/G grid are generated from realistic applications. In this section, we first verify that TBS preserves the triangular structure (sparsity) and matches mq poles, and then compare its accuracy and runtime with HiPRIME [71] and [139]. The excitation current sources (unit impulse) are explicitly considered in all MOR-based methods to avoid block moment matching. The clustered block structure obtained from TBS is used as the partition for HiPRIME and [139], and the same block number is used for BSMOR, but with blocks of equal size. The backward Euler method is used for time-domain simulation, and two-level analysis is applied for TBS, BSMOR, and [139]. In addition, TBS also considers different time steps for different reduced blocks. In the comparison of the macro-model building and simulation time, all reduced models have similar accuracy. In the comparison of the waveform error, all MOR methods use the same number of matched moments, and the macro-models for TBS and [139] have similar sizes and sparsification ratios.
Figure 8.3 Comparison of time-domain responses between HiPRIME, BSMOR, [139], TBS and the original. TBS is identical to the original. (Axes: voltage (V) versus time (ns).) Reprinted with permission from [138] (c) 2006 ACM.
8.7.1 A non-uniform structured RC mesh
To model the power delivery network for four layers in a 3D IC design, we use a non-uniform RC mesh (size 1 M) with 32 equally sized basic blocks (each layer has eight blocks) and 32 unit-impulse current sources located at the centers of the basic blocks. Each basic block has a different magnitude of RC values. The number of connections between any pair of basic blocks is also different. HiPRIME, BSMOR and TBS all use q = 8 moments to generate the reduced model. The clustering algorithm found four clusters with 4, 4, 8 and 16 basic blocks, respectively. As a result, TBS constructs a block-structured projection using four blocks with the aforementioned sizes. In contrast, BSMOR constructs a block-structured projection using four blocks of equal size. Figure 8.2 shows the non-zero pattern of the conductance matrix before triangularization in Figure 8.2 (a), after triangularization in Figure 8.2 (b), and after the TBS reduction (m = 4, q = 8) in Figure 8.2 (c). Figure 8.2 (b) and (c) have similar non-zero patterns; this verifies that TBS preserves the triangular structure. Owing to the intrinsic sparsity of TBS, the reduced model has a 40.1% sparsification ratio. In contrast, HiPRIME generates a fully dense state matrix after the reduction, and the sparsity in the reduced model by [139] is obtained by an additional LP-based sparsification. To compare pole matching, we choose one observation port that is not at the source node. The relative errors are calculated as the magnitude difference of poles between the reduced and original models. As shown in Figure 8.1, HiPRIME can only approximate eight poles of the original model, but TBS and BSMOR can approximate 32 poles owing to the increased column rank in the projection matrix.
Figure 8.4 Comparison of frequency-domain responses between HiPRIME, BSMOR, TBS, and the original. TBS is identical to the original. (Axes: voltage (VdB) versus frequency (Hz).) Reprinted with permission from [138] (c) 2006 ACM.
Moreover, for poles matched by both TBS and BSMOR, TBS is about six times more accurate on average. This is because the system poles of a triangular matrix are determined by its diagonal blocks. With a structure-preserving model order reduction, the reduced triangular system by TBS can exactly match mq poles of the original system. In contrast, the reduction in BSMOR does not have the triangular structure, and hence its approximated mq poles are less accurate than those obtained by TBS. Figure 8.3 compares the time-domain response at one port for HiPRIME, BSMOR, [139], TBS and the original under a unit-impulse input. The time-domain waveform error is counted as the relative deviation at peak voltage. The reduced model by TBS is visually identical to the original model, but HiPRIME shows up to 36% error due to far fewer matched poles, and [139] shows up to 64% error due to the sparsification. As mentioned before, the projection matrix constructed by BSMOR uses four uniform blocks, each of the same size. As a result, it is not optimal for matching poles and results in up to 23% error. Figure 8.4 further shows the frequency-domain response under an impulse input. Using the same number of moments, we observe that the reduced model by TBS is identical to the original up to 50 GHz, but the one by BSMOR or HiPRIME shows non-negligible deviation beyond 10 GHz.
8.7.2 Scalability study
We first study the runtime scalability of the reduced macro-model by HiPRIME, BSMOR, the method from [139] and TBS. The runtime here includes both the macro-model building time and macro-model simulation time (time-domain). All
reduced state matrices are constructed with similar accuracy.

Figure 8.5 Comparison of runtime under similar accuracy. (a) macro-model building time (log scale) comparison; (b) macro-model time-domain simulation time (log scale) comparison. (Axes: building time (s) and simulation time (s) versus circuit size.) Reprinted with permission from [138] (c) 2006 ACM.

Figure 8.5(a) compares the macro-model building time. As [139] needs additional LP-based sparsification, it is inefficient for large P/G grids. For example, for an RC mesh with a size of 7.68 M, the method in [139] needs 4 hours, 42 minutes and 38 seconds to build a reduced macro-model with size 1 K and sparsity 30%, but TBS only spends 2 minutes and 8 seconds (133× speedup) to build a macro-model of similar size. Moreover, TBS also has a 54× speedup over BSMOR (1 hour, 45 minutes and 30 seconds) because orthonormalization is applied to each block independently in TBS. HiPRIME orthonormalizes each block independently, but its building time is still larger than that of TBS. This is because a higher order (4×) is required to generate a reduced model with accuracy similar to TBS. Figure 8.5(b) further compares the simulation time, where we also increase the port number when increasing the circuit size. Because HiPRIME still uses flat projection, it results in a dense macro-model that loses the structure information and cannot be analyzed hierarchically. Therefore, it becomes inefficient when used for time-domain simulation. As a result, its simulation time is much larger than that of the other macro-models. On the other hand, BSMOR, [139] and TBS enable the two-level analysis to handle larger circuits with sizes up to 7.68 M and 1200 ports in similar runtimes. For a circuit with size $(76.8\,\mathrm{K})^2$ and 120 ports, TBS achieves a 109× runtime speedup compared with HiPRIME. In Table 8.1, we further study the accuracy scalability of the reduced macro-model by HiPRIME, BSMOR, [139] and TBS. All reduced models by MOR use the same number of moments. The standard deviation of waveform differences between the reduced and the original model is used as the measure of error. We use a higher
order reduced model (by 4×) as the base if the waveform of the original model is unavailable. We find that the accuracy of [139] degrades when a large sparsity ratio is needed, where LP optimization cannot preserve accuracy. On the other hand, using moment-matching-based projection with preserved sparsity, TBS generates a macro-model with higher accuracy. For example, it has a 38× higher accuracy than [139] when reducing a 7.68 M circuit to a 1 K macro-model with 32% sparsity. For the same circuit, TBS is 17× more accurate than BSMOR owing to the exact mq-pole matching, and is also 33× more accurate than HiPRIME owing to more matched poles.

  node (N)   port (n_p)   order (q)   TBS (m=4)     HiPRIME       BSMOR (m=4)   [139]
  768        12           8           5.03×10^-7    9.09×10^-6    4.87×10^-6    5.54×10^-6
  7.68K      80           40          1.84×10^-6    2.31×10^-5    7.93×10^-6    1.21×10^-5
  76.8K      120          60          3.02×10^-5    6.82×10^-4    1.91×10^-4    1.31×10^-2
  768K       200          100         1.27×10^-4    9.67×10^-3    4.23×10^-3    6.01×10^-2
  7.68M      1200         200         3.01×10^-3    9.97×10^-2    5.10×10^-2    0.11

Table 8.1 Time-domain waveform error of reduced models by HiPRIME, BSMOR, and TBS under the same order (number of matched moments). Reprinted with permission from [138] (c) 2006 ACM.

Figure 8.6 A P/G voltage bounce map without decoupling capacitor allocations. (Vertical axis: voltage bounce (mV).) Reprinted with permission from [138] (c) 2006 ACM.
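The accuracy ratios quoted for the 7.68 M circuit can be cross-checked against the last row of Table 8.1 with a few lines of Python. This is only an arithmetic sketch: the dictionary name `errors_7_68M` is hypothetical, and because the table entry for [139] is rounded to 0.11, the ratio computed for that method is approximate.

```python
# Waveform errors for the 7.68 M circuit, copied from the last row of Table 8.1
errors_7_68M = {"TBS": 3.01e-3, "HiPRIME": 9.97e-2, "BSMOR": 5.10e-2, "[139]": 0.11}

# Error of each method relative to TBS (larger means TBS is more accurate)
ratios = {name: err / errors_7_68M["TBS"]
          for name, err in errors_7_68M.items() if name != "TBS"}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the TBS error")
```

The HiPRIME and BSMOR ratios round to the 33× and 17× figures in the text; the ratio for [139] comes out slightly below the quoted 38×, which is consistent with the two-digit rounding of its table entry.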
Figure 8.7 A P/G voltage bounce map with decoupling capacitors allocated at the centers of four blocks. (Vertical axis: voltage bounce (mV).) Reprinted with permission from [138] (c) 2006 ACM.
8.7.3 Noise map for structured P/G grids
With the use of the TBS model, we can efficiently predict the locations of hot-spots for large P/G grids, and hence obtain guidance on where to place decoupling capacitors. We apply TBS to generate a noise map of voltage bounce for a large non-uniform P/G grid (size 100 M) with 256 basic blocks of one 2D SOC design. We assume 1280 injection inputs distributed uniformly in the basic blocks. The TBS method with (m = 4, n = 64) is used to generate a reduced model with size $256^2$ that is decomposed into four reduced blocks. We now present the time-domain noise map at four time instants under unit-impulse sources generated from the macro-model by TBS. The original model cannot finish the simulation in the time domain due to a lack of memory. Figure 8.6 shows the P/G voltage bounce map at t = 10 unit times (the unit time is the time step h) under unit-impulse sources. Because of the charging and discharging through the interconnection between blocks, the locations of hot-spots drift across the die, with a maximum bounce around 45 mV. Based on this observation, we allocate four decoupling capacitors (100 pF) at the centers of four blocks. As clearly shown in Figure 8.7, most hot-spots disappear and the maximum voltage bounce is reduced to 12 mV.
8.8 Summary
In this chapter, an accurate and efficient block structure-preserving model order reduction method is introduced for large structured systems in the time domain. By further partitioning the projection matrix into more blocks, the resulting macro-model preserves structures such as sparsity. In contrast, the traditional model order reduction methods [32,34,63,71,85,113,120] all generate a dense reduced macro-model with no preserved structure. This reduction is called the BSMOR method. Moreover, with an additional triangularization procedure, the original system is passively transformed into a form with an upper triangular block structure, where the system poles are determined only by m diagonal blocks; m is decided by the nature of the structured network. With an efficient dominant-pole-based clustering and a block-structured projection, the reduced triangular system can match mq poles of the original system. This reduction is called the TBS method. Experiments show that the waveform error is reduced 33× compared with flat projection methods like PRIMA and HiPRIME, and 17× compared with BSMOR using a user-specified partition. Moreover, with a two-level organization, reduction and analysis in TBS can be performed for each block independently. Therefore, it reduces both macro-model building and simulation time. TBS is up to 54× faster at building macro-models than BSMOR, and up to 109× faster in simulating macro-models in the time domain than HiPRIME. In addition, as TBS preserves sparsity, it is up to 133× faster at building macro-models than [139].
9 Block structure-preserving reduction for RLCK circuits
9.1 Introduction
In this chapter, we introduce another structure-preserving model order reduction method, which extends the SPRIM method [37] to more general block forms while the 2q moment-matching property is still preserved. The SPRIM method partitions the state matrix in the MNA (modified nodal analysis) form into natural 2 × 2 block matrices, i.e., conductance, capacitance, inductance, and adjacency matrices. Accordingly, the projection matrix is partitioned. As a result, SPRIM matches twice as many moments of the models by using the projection matrix given by PRIMA. The reduced models also preserve the structural properties of the original models, like symmetry (reciprocity). This idea has been extended to deal with more partitions by block structure-preserving model order reduction (BSMOR) [136], as shown in Chapter 8, to further exploit the regularity of many parasitic networks. It was shown that by introducing more partitions, more poles are matched, and this leads to more accurate order-reduced models [138]. However, the BSMOR method simply introduces more partitions or blocks; it does not truly preserve the circuit structures of general RLCK circuits for different input sources (voltages or currents). The reduced model does not match the 2q moments of the original models, as SPRIM does. In this chapter, we first show theoretically that structure-preserving model order reduction can be applied to RLCK admittance networks, which are driven by voltage sources and require partitioning of the original MNA circuit matrix into 2 × 2 block matrices. We further show that for a hybrid MNA circuit matrix where both current and voltage sources are present, 2q moment matching cannot be achieved. Then we propose a general block structure-preserving reduced order modeling of interconnects (BSPRIM), which generalizes the SPRIM method [37] by considering more partitions. We study the matrix partitioning for both impedance matrices and admittance matrices separately and show that the reduced models will still match the 2q moments of the original circuits and that circuit structures such as symmetry and sparsity can be preserved. Experiments show that with more partitions, the reduced models become more accurate.
9.2 Block structure-preserving model reduction
In this section, we review the structure-preserving projection-based MOR method SPRIM and the important results of the SPRIM method.
9.2.1 Preliminary
Consider a modified nodal analysis (MNA) formulation of the RLCK circuit equation in the frequency domain:
$$\mathcal{G}x(s) + s\mathcal{C}x(s) = \mathcal{B}i_p(s), \quad v_p(s) = \mathcal{B}^T x(s), \tag{9.1}$$
where x(s) is the state variable vector, $\mathcal{G}$ and $\mathcal{C}$ ($\in R^{N \times N}$) are state matrices and $\mathcal{B}$ ($\in R^{N \times n_p}$),
$$\mathcal{B} = [B_1^T \;\; 0]^T, \tag{9.2}$$
is a port incidence matrix. We assume for now that we have only current sources, indicated by $i_p(s)$. Eliminating x(s) in (9.1) gives
$$v_p(s) = H(s)\,i_p(s), \quad H(s) = \mathcal{B}^T(\mathcal{G} + s\mathcal{C})^{-1}\mathcal{B}, \tag{9.3}$$
where H(s) is a multiple-input multiple-output (MIMO) impedance transfer function. PRIMA finds a projection matrix V ($\in R^{N \times qn_p}$) such that its columns span the qth block Krylov subspace $\mathcal{K}(A, R, q)$, i.e.,
$$\mathrm{span}\,V = \mathcal{K}(A, R, q), \tag{9.4}$$
where $A = (\mathcal{G} + s_0\mathcal{C})^{-1}\mathcal{C}$, $R = (\mathcal{G} + s_0\mathcal{C})^{-1}\mathcal{B}$, and $s_0$ is the expansion point that ensures $\mathcal{G} + s_0\mathcal{C}$ is non-singular. The resulting reduced transfer function is
$$\bar{H}(s) = \bar{\mathcal{B}}^T(\bar{\mathcal{G}} + s\bar{\mathcal{C}})^{-1}\bar{\mathcal{B}}, \tag{9.5}$$
where
$$\bar{\mathcal{G}} = V^T\mathcal{G}V, \quad \bar{\mathcal{C}} = V^T\mathcal{C}V, \quad \bar{\mathcal{B}} = V^T\mathcal{B}, \tag{9.6}$$
whose first q moments, expanded about $s_0$, are identical to those of the original transfer function H(s). This is known as Grimme’s projection theorem [45]. Note that $\bar{\mathcal{G}}$ and $\bar{\mathcal{C}}$ are $\in R^{qn_p \times qn_p}$, and $\bar{\mathcal{B}}$ is $\in R^{qn_p \times n_p}$.
The PRIMA-like MOR method cannot preserve the structure of the original models. This is reflected in the fact that if the impedance transfer function H(s) is symmetric, the reduced transfer function $\bar{H}(s)$ is no longer symmetric. Also, the reduced matrices $\bar{\mathcal{G}}$ and $\bar{\mathcal{C}}$ become dense or full matrices.
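The projection of (9.4) to (9.6) and the moment-matching statement can be illustrated on a small example. The sketch below is an assumption-laden illustration, not PRIMA as published: the helper names `block_krylov_basis` and `moments`, the RC-ladder test matrices, the two ports and the expansion point $s_0 = 0$ are all hypothetical choices, and the basis is built with a plain block Arnoldi loop without deflation.

```python
import numpy as np

def block_krylov_basis(A, R, q):
    """Orthonormal basis of span{R, A R, ..., A^(q-1) R} (block Arnoldi, no deflation)."""
    V0, _ = np.linalg.qr(R)
    blocks = [V0]
    for _ in range(q - 1):
        W = A @ blocks[-1]
        for Vk in blocks:                  # orthogonalize against earlier blocks
            W = W - Vk @ (Vk.T @ W)
        Wq, _ = np.linalg.qr(W)
        blocks.append(Wq)
    return np.hstack(blocks)

def moments(G, C, B, count, s0=0.0):
    """First `count` block moments of H(s) = B^T (G + s C)^{-1} B expanded about s0."""
    K = G + s0 * C
    A = np.linalg.solve(K, C)
    X = np.linalg.solve(K, B)
    out = []
    for _ in range(count):
        out.append(B.T @ X)                # k-th moment: B^T (-A)^k R
        X = -A @ X
    return out

# Hypothetical RC ladder with two ports at its ends
N, q, s0 = 30, 3, 0.0
G = 2.5 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C = np.diag(np.linspace(1.0, 2.0, N))
B = np.zeros((N, 2))
B[0, 0] = 1.0
B[-1, 1] = 1.0

A = np.linalg.solve(G + s0 * C, C)
R = np.linalg.solve(G + s0 * C, B)
V = block_krylov_basis(A, R, q)            # N x (q * n_p)

# Congruence projection as in (9.6)
Gr, Cr, Br = V.T @ G @ V, V.T @ C @ V, V.T @ B

full = moments(G, C, B, q, s0)
reduced = moments(Gr, Cr, Br, q, s0)
matched = [np.allclose(mf, mr) for mf, mr in zip(full, reduced)]
print(V.shape, matched)
```

The reduced matrices `Gr` and `Cr` come out dense even though `G` is tridiagonal and `C` is diagonal, which is the loss of structure noted above.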
9.2.2 The SPRIM Method
In [37], a structure-preserving model order reduction technique, SPRIM, was proposed. The primary observation is that the impedance transfer function H(s) of an RLCK circuit is symmetric. By using a split projection matrix, the structure information relevant to the passive and symmetric properties of the original circuit matrices is still preserved. As a result, the reduced transfer function $\tilde{H}(s)$ is still symmetric. This structure-preserving MOR was made possible by the observation that instead of using the Krylov subspace $\mathcal{K}(A, R, q)$ for the projection matrix $\tilde{V}$, one can use any projection matrix such that the space spanned by the columns of $\tilde{V}$ contains the qth block Krylov subspace, i.e.,
$$\mathcal{K}(A, R, q) \subseteq \mathrm{span}(\tilde{V}). \tag{9.7}$$
The SPRIM method starts with the 2 × 2 structured MNA circuit matrices in the following form:
$$\mathcal{G} = \begin{bmatrix} G & A^T \\ -A & 0 \end{bmatrix}, \quad \mathcal{C} = \begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}, \quad \mathcal{B} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \tag{9.8}$$
where G ($\in R^{n_1 \times n_1}$), C ($\in R^{n_1 \times n_1}$) and L ($\in R^{n_2 \times n_2}$) are the conductance, capacitance and inductance matrices, and A ($\in R^{n_2 \times n_1}$) is the adjacency matrix indicating the branch current flow at the inductors. Note that $n_1 + n_2 = N$. A structured projection matrix $\tilde{V}$ is then constructed by partitioning the projection matrix V obtained from the qth PRIMA iteration:
$$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \rightarrow \tilde{V} = \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}, \tag{9.9}$$
where $V_1$ is $\in R^{n_1 \times qn_p}$, $V_2$ is $\in R^{n_2 \times qn_p}$, and hence $\tilde{V}$ is $\in R^{N \times 2qn_p}$. As a result, the number of columns in $\tilde{V}$ is twice that in V. Accordingly, the new reduced state matrices are
$$\tilde{\mathcal{G}} = \begin{bmatrix} \tilde{G} & \tilde{A}^T \\ -\tilde{A} & 0 \end{bmatrix}, \quad \tilde{\mathcal{C}} = \begin{bmatrix} \tilde{C} & 0 \\ 0 & \tilde{L} \end{bmatrix}, \tag{9.10}$$
where $\tilde{G} = V_1^T G V_1$, $\tilde{A} = V_2^T A V_1$, $\tilde{C} = V_1^T C V_1$ and $\tilde{L} = V_2^T L V_2$. Similarly, the size of $\tilde{\mathcal{G}}$, $\tilde{\mathcal{C}}$ ($\in R^{2qn_p \times 2qn_p}$) and $\tilde{\mathcal{B}}$ ($\in R^{2qn_p \times n_p}$) is twice that of $\bar{\mathcal{G}}$, $\bar{\mathcal{C}}$ and $\bar{\mathcal{B}}$ obtained by using V. In addition to the preservation of the structure of the MNA matrices, an important benefit of SPRIM is that the reduced models will match 2q block moments given the qth block Krylov subspace $\mathcal{K}(A, R, q)$. The 2q matching property of SPRIM is mainly a result of the fact that both the original and the reduced impedance transfer functions are symmetric when structure is preserved. When symmetric transfer functions are reduced, 2q moments are matched due to the use of two Krylov subspaces.
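The effect of the split projection in (9.9) and (9.10) on the block structure can be seen in a short NumPy sketch. Everything numeric here is an assumption: the block sizes, the random conductance, capacitance, inductance and incidence data, and the use of a random orthonormal matrix `V` as a stand-in for the PRIMA basis (the structural identities hold for any V).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, k = 6, 4, 3                  # node voltages, inductor currents, columns of V

# Hypothetical RLC data in the 2 x 2 MNA form of (9.8)
W = rng.standard_normal((n1, n1))
G1 = W @ W.T                                   # conductance block (symmetric PSD)
Cap = np.diag(rng.uniform(1.0, 2.0, n1))       # capacitance block
L = np.diag(rng.uniform(1.0, 2.0, n2))         # inductance block
A = rng.standard_normal((n2, n1))              # inductor incidence block
G = np.block([[G1, A.T], [-A, np.zeros((n2, n2))]])
C = np.block([[Cap, np.zeros((n1, n2))], [np.zeros((n2, n1)), L]])

# Stand-in for the PRIMA basis V (assumption: any orthonormal V shows the structure)
V, _ = np.linalg.qr(rng.standard_normal((n1 + n2, k)))
V1, V2 = V[:n1], V[n1:]
V_split = np.block([[V1, np.zeros((n1, k))],
                    [np.zeros((n2, k)), V2]])  # split projection of (9.9)

G_red = V_split.T @ G @ V_split
C_red = V_split.T @ C @ V_split

# Structure of (9.10): zero (2,2) block, off-diagonal blocks A~^T and -A~, block-diagonal C
print(np.allclose(G_red[k:, k:], 0.0),
      np.allclose(G_red[:k, k:], -G_red[k:, :k].T),
      np.allclose(C_red[:k, k:], 0.0))

# span(V) is contained in span(V_split): V = V_split @ [I; I]
print(np.allclose(V_split @ np.vstack([np.eye(k), np.eye(k)]), V))
```

The last check is the containment required by (9.7): every column of V is a combination of columns of the split matrix, so any subspace spanned by V is also spanned by the split matrix.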
However, the SPRIM method [37] only gives the structure-preserving MOR for impedance matrices with current input sources. We show in the following section that admittance circuit matrices, which are driven only by voltage sources, can also be reduced in a structure-preserving way.
9.3 Structure preservation for admittance transfer-function matrices
In this section, we show that for admittance transfer functions, by properly partitioning the circuit matrices and splitting the projection matrix, we can still preserve the structure of the admittance circuit matrices and achieve the 2q moment-matching result.
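9.3.1 Circuit structure partitioning
For an RCL circuit with voltage sources, the resulting MNA equation is written as
$$\mathcal{G}x + \mathcal{C}\frac{dx}{dt} = \mathcal{B}u_t(t), \tag{9.11}$$
where
$$\mathcal{G} = \begin{bmatrix} E_g^T G E_g & E_l^T & E_v^T \\ -E_l & 0 & 0 \\ -E_v & 0 & 0 \end{bmatrix}, \quad \mathcal{C} = \begin{bmatrix} E_c^T C E_c & 0 & 0 \\ 0 & L & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \mathcal{B} = \begin{bmatrix} 0 \\ 0 \\ E_v^T \end{bmatrix},$$
where $E_x$ is the incidence (adjacency) submatrix for the corresponding type of branches in the circuit and $u_t(t)$ is the vector of input voltage sources. The transfer admittance function becomes
$$Y(s) = \mathcal{B}^T(\mathcal{G} + s\mathcal{C})^{-1}\mathcal{B}. \tag{9.12}$$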
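As a result, we can still partition the circuit matrices into a 2 × 2 form as follows:
$$\mathcal{G} = \begin{bmatrix} G_1 & G_2^T \\ -G_2 & 0 \end{bmatrix}, \quad \mathcal{C} = \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix}, \quad \mathcal{B} = \begin{bmatrix} 0 \\ B_2 \end{bmatrix},$$
where $G_1$, $G_2$, $C_1$, $C_2$ and $B_2$ are defined, consistently with the 3 × 3 form above, as
$$G_1 = E_g^T G E_g, \quad G_2 = \begin{bmatrix} E_l \\ E_v \end{bmatrix}, \quad C_1 = E_c^T C E_c, \quad C_2 = \begin{bmatrix} L & 0 \\ 0 & 0 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 \\ E_v^T \end{bmatrix}. \tag{9.13}$$
Meanwhile, the subblock matrices $G_1$, $C_1$ and $C_2$ are positive semidefinite, i.e., they satisfy
$$G_1 \succeq 0, \quad C_1 \succeq 0, \quad C_2 \succeq 0. \tag{9.14}$$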
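Notice that by comparing (9.8) and (9.13) one finds that the major difference between impedance and admittance is that the position matrices $\mathcal{B}$ are different. It turns out that this difference will not change the 2q moment-matching property in the structure-preserving reduction, as shown in the proof of Theorem 9.1 in Section 9.7. After getting V from the qth PRIMA iteration, we obtain $\tilde{V}$ by partitioning V conformally according to the sizes of $\mathcal{G}$, $\mathcal{C}$ and $\mathcal{B}$:
$$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \rightarrow \tilde{V} = \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}. \tag{9.15}$$
Here the numbers of rows of $V_1$ and $V_2$ equal those of $G_1$ and $G_2$, respectively. The new reduced matrices can be obtained by
$$\tilde{\mathcal{G}} = \tilde{V}^T\mathcal{G}\tilde{V} = \begin{bmatrix} \tilde{G}_1 & \tilde{G}_2^T \\ -\tilde{G}_2 & 0 \end{bmatrix}, \quad \tilde{\mathcal{C}} = \tilde{V}^T\mathcal{C}\tilde{V} = \begin{bmatrix} \tilde{C}_1 & 0 \\ 0 & \tilde{C}_2 \end{bmatrix}, \quad \tilde{\mathcal{B}} = \tilde{V}^T\mathcal{B} = \begin{bmatrix} 0 \\ \tilde{B}_2 \end{bmatrix}.$$
The corresponding transfer function is
$$\tilde{Y}(s) = \tilde{\mathcal{B}}^T(\tilde{\mathcal{G}} + s\tilde{\mathcal{C}})^{-1}\tilde{\mathcal{B}}. \tag{9.16}$$
It can be proved that the reduced admittance matrix $\tilde{Y}(s)$ is still symmetric, so the reciprocity property is preserved.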
9.3.2 Re-orthonormalization of split projection matrix
For the projection matrix $V \in R^{N \times q}$ in (9.15), its rank should be q (assuming N > q). After the two-way splitting operation in (9.15), however, the rank of $\tilde{V}$ may not be 2q, and it is typically less than 2q. The reason is that after the splitting operation, some columns become linearly dependent. This is also reflected in the fact that the columns of $\tilde{V}$ are no longer orthonormal, i.e.,
$$\tilde{V}^T\tilde{V} \ne I.$$
To mitigate this problem, we re-orthonormalize each subblock in $\tilde{V}$ and obtain the new split projection matrix $\tilde{T}$,
$$\tilde{T} = \begin{bmatrix} \mathrm{orth}(V_1) & 0 \\ 0 & \mathrm{orth}(V_2) \end{bmatrix}, \tag{9.17}$$
where orth(X) denotes an orthonormal basis for the column space of matrix X. As a result, we may end up with fewer columns in $\tilde{T}$ than in $\tilde{V}$, which leads to smaller sizes of the reduced models. Also, $\tilde{T}$ becomes orthonormal again:
$$\tilde{T}^T\tilde{T} = I. \tag{9.18}$$
Note also that the re-orthonormalization process does not change the subspace spanned by $\tilde{V}$, so the moment-matching property that applies to $\tilde{V}$ is also valid for $\tilde{T}$. Experimental results show, moreover, that the re-orthonormalization always leads to the same or more accurate reduced models.
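A small NumPy sketch can make the rank loss after splitting and the effect of (9.17) concrete. The helper names `orth` and `split`, the sizes and the random orthonormal `V` are assumptions for illustration; `orth` here is an SVD-based basis with a relative tolerance, which is one common way to realize the operation named in the text.

```python
import numpy as np

def orth(X, tol=1e-10):
    """Orthonormal basis for the column space of X, via the SVD (assumes X is non-zero)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, s > tol * s[0]]

def split(blocks):
    """Place the given blocks on the diagonal of an otherwise zero matrix."""
    rows = sum(blk.shape[0] for blk in blocks)
    cols = sum(blk.shape[1] for blk in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for blk in blocks:
        out[r:r + blk.shape[0], c:c + blk.shape[1]] = blk
        r += blk.shape[0]
        c += blk.shape[1]
    return out

rng = np.random.default_rng(3)
n1, n2, q = 3, 9, 5                       # hypothetical partition sizes and order
V, _ = np.linalg.qr(rng.standard_normal((n1 + n2, q)))   # orthonormal stand-in for PRIMA's V
V1, V2 = V[:n1], V[n1:]

V_split = split([V1, V2])                 # (9.15): 2q columns, not orthonormal
T = split([orth(V1), orth(V2)])           # (9.17): re-orthonormalized blocks

print(V_split.shape, T.shape, np.linalg.matrix_rank(V_split))
print(np.allclose(V_split.T @ V_split, np.eye(2 * q)),   # split columns are not orthonormal
      np.allclose(T.T @ T, np.eye(T.shape[1])),          # (9.18)
      np.allclose(T @ (T.T @ V), V))                     # span(V) stays inside span(T)
```

Because the top block has only three rows here, it cannot contribute more than three independent columns, so the re-orthonormalized matrix ends up with fewer columns than the split one while spanning the same space.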
9.3.3 Structure-preserving properties
The proposed 2 × 2 structure-preserving MOR method for admittance transfer-function matrices has properties similar to those of SPRIM for RCL circuits containing current sources. As a result, the reduced transfer function $\tilde{Y}(s)$ is still symmetric and the reciprocity property is preserved. The moment-matching property can be stated as follows:
Theorem 9.1. Choose s0 ∈ 0. Practically, condition (3) is difficult to check as it requires checking the frequency response from DC to infinity. Fortunately, there is a better way to check
the positive-realness. It can be proved that the following statements are equivalent:
(a) A transfer function matrix Y(s) is positive real.
(b) Let (A, B, C, D) be a minimal controllable state-space representation of Y(s); then
$$\exists K, \quad K = K^T, \quad K \ge 0, \tag{10.1}$$
such that the linear matrix inequality (LMI)
$$\begin{bmatrix} A^T K + KA & KB - C^T \\ B^T K - C & -D - D^T \end{bmatrix} \le 0 \tag{10.2}$$
holds.
Constraint (10.2) comes from the Kalman–Yakubovich–Popov (KYP) lemma, which establishes important relations between state-space and frequency-domain objects. The KYP lemma was introduced in control theory and later used in network synthesis [5]. If we include the term proportional to s in the transfer function, which means that we need to know what happens at infinite frequency, we can write the admittance matrix in terms of (A, B, C, D) as
$$Y(s) = sY^{\infty} + D + C(sI - A)^{-1}B, \tag{10.3}$$
where I denotes the identity matrix with the same dimension as A. To keep the transfer function positive real, $Y^{\infty}$ must satisfy
$$Y^{\infty} = (Y^{\infty})^T, \quad Y^{\infty} \ge 0. \tag{10.4}$$
Therefore, we can transform the problem of checking whether the admittance matrix Y(s) is positive real into the problem of checking whether its corresponding state space model in terms of (A B C D) is positive semi-definite. More important is that we can use the PR criterion in terms of the state-space form to enforce the passivity of the reduced circuit matrices, as shown in the next section.
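As a small numerical illustration of the state-space test, the sketch below evaluates the KYP block matrix for a candidate K and cross-checks the result by sampling Re Y(jω). It uses the standard sign convention in which the block matrix must be negative semi-definite. The one-port example Y(s) = 1 + 1/(s + 1), the candidate K = 1, and the helper name `kyp_matrix` are assumptions chosen for the example; in practice K is found by an LMI solver.

```python
import numpy as np

def kyp_matrix(A, B, C, D, K):
    """Block matrix of the KYP linear matrix inequality for a candidate K."""
    return np.block([[A.T @ K + K @ A, K @ B - C.T],
                     [B.T @ K - C,     -D - D.T]])

# Hypothetical one-port example: Y(s) = 1 + 1/(s + 1)
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0]]);  D = np.array([[1.0]])
K = np.array([[1.0]])                      # candidate certificate, K = K^T >= 0

M = kyp_matrix(A, B, C, D, K)
lmi_holds = bool(np.all(np.linalg.eigvalsh(M) <= 1e-12))

# Cross-check by sampling Re Y(jw) on a frequency grid
omega = np.logspace(-3, 3, 61)
re_Y = np.array([(D + C @ np.linalg.solve(1j * w * np.eye(1) - A, B)).real[0, 0]
                 for w in omega])
print(lmi_holds, re_Y.min() > 0)
```

The frequency sweep is only a sanity check on a finite grid; the LMI test is what certifies positive-realness over all frequencies.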
10.1.1 State-space model representation of $\tilde{Y}(s)$
After the multi-point hierarchical model reduction, an n-port order-reduced admittance matrix is generated as shown in (10.5), where each $\tilde{Y}_{p,q}$ is a rational function of s. The reduction process can capture all the dominant complex poles, which means that there are no poles in the right-half plane (RHP) of the complex plane.

$$\tilde{Y}(s) = \begin{bmatrix} \tilde{Y}_{1,1} & \cdots & \tilde{Y}_{1,n} \\ \vdots & \ddots & \vdots \\ \tilde{Y}_{n,1} & \cdots & \tilde{Y}_{n,n} \end{bmatrix}. \tag{10.5}$$

The first step we take is to transform the admittance matrix $\tilde{Y}(s)$ into its state-space representation. We assume that all rational functions in the matrix share the
Model optimization and passivity enforcement
common poles of the system. If there are private poles appearing in a leading-diagonal element, we can separate them and their residues from the rational function after partial-fraction decomposition and realize them separately.
Given a multivariable n-port network, each rational function $\tilde{Y}_{p,q}$ is considered as a single-input single-output (SISO) subsystem and mapped to its state-space representation in the controllable canonical form, which corresponds to $(A_{q,q}, B_{q,q}, C_{p,q}, D_{p,q})$ in the matrices (A, B, C, D), respectively. We can now write the state-space representation as (10.6):

$$A = \begin{bmatrix} A_{1,1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A_{n,n} \end{bmatrix}, \quad B = \begin{bmatrix} B_{1,1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & B_{n,n} \end{bmatrix},$$

$$C = \begin{bmatrix} C_{1,1} & \cdots & C_{1,n} \\ \vdots & \ddots & \vdots \\ C_{n,1} & \cdots & C_{n,n} \end{bmatrix}, \quad D = \begin{bmatrix} D_{1,1} & \cdots & D_{1,n} \\ \vdots & \ddots & \vdots \\ D_{n,1} & \cdots & D_{n,n} \end{bmatrix}. \tag{10.6}$$

This mapping process can also be viewed as n single-input multiple-output (SIMO) subsystems. If we choose the mth port as the input port, the mth column of admittance rational functions is mapped into $(A_{m,m}, B_{m,m}, C_{:,m}, D_{:,m})$.
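The block structure in (10.6) can be sketched for a small two-port whose entries share a common denominator. This is only an illustration under stated assumptions: the helper names `companion` and `assemble`, the coefficient ordering, and the numeric values are hypothetical, and private poles are not handled.

```python
import numpy as np

def companion(den):
    """Controllable canonical pair (A, b) for the monic denominator
    s^n + den[n-1] s^(n-1) + ... + den[0]."""
    n = len(den)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)
    A[-1, :] = -np.asarray(den, dtype=float)
    b = np.zeros(n)
    b[-1] = 1.0
    return A, b

def assemble(den, num, D):
    """num[p][q]: numerator coefficients (ascending powers of s, length n)
    of the strictly proper part of entry (p, q)."""
    ports, n = len(num), len(den)
    A = np.zeros((ports * n, ports * n))
    B = np.zeros((ports * n, ports))
    C = np.zeros((ports, ports * n))
    Aq, bq = companion(den)
    for q in range(ports):
        blk = slice(q * n, (q + 1) * n)
        A[blk, blk] = Aq              # A_{q,q}
        B[blk, q] = bq                # B_{q,q}
        for p in range(ports):
            C[p, blk] = num[p][q]     # C_{p,q}
    return A, B, C, np.asarray(D, dtype=float)

den = [2.0, 3.0]                      # s^2 + 3s + 2, common poles at -1 and -2
num = [[[1.0, 0.0], [0.0, 1.0]],
       [[0.0, 1.0], [2.0, 1.0]]]
D = [[1.0, 0.0], [0.0, 1.0]]
A, B, C, Dm = assemble(den, num, D)

s = 0.7j
Y_ss = Dm + C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)
Y_direct = np.array([[Dm[p, q] + (num[p][q][0] + num[p][q][1] * s) / (s**2 + 3 * s + 2)
                      for q in range(2)] for p in range(2)])
print(np.allclose(Y_ss, Y_direct))
```

Each input port q gets its own copy of the pole block, so A and B are block diagonal while C and D are full, matching the layout in (10.6).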
10.1.2 Passivity-enforcement optimization
In this subsection, we briefly describe how passivity enforcement is done using a convex optimization process on the state-space representation of the admittance matrix.
Assume that we have obtained the admittance matrix of a model-order-reduced system $\tilde{Y}(s)$ with a set of N sampling points. Let $\tilde{Y}_{p,q}(s)$ denote the (p, q) entry of the transfer function $\tilde{Y}(s)$. Let $\hat{Y}_{p,q}(s_k)$ be the exact value of the admittance at the entry (p, q) at the kth frequency point, which can be obtained by the exact hierarchical symbolic analysis [126].
The optimization problem is to determine C, D, and $Y^\infty$ such that a cost function is minimized with constraints on the error $(\hat{Y}_{p,q} - \tilde{Y}_{p,q})$. Here the constraints are on the weighted least-square error, taken over the N frequencies:

$$\sum_{k=1}^{N} w_{p,q,k} \, \|\hat{Y}_{p,q}(s_k) - \tilde{Y}_{p,q}(s_k)\|_2^2 \le t_{p,q}. \tag{10.7}$$
The whole optimization problem can be reformulated as the following convex programming problem:

$$\begin{aligned} \text{minimize:} \quad & t(K, C, D, Y^\infty), \\ \text{subject to:} \quad & (10.1),\ (10.2),\ (10.4), \\ & (10.7), \quad \forall\, 1 \le p, q \le m, \\ & t_{p,q} \le t, \quad \forall\, 1 \le p, q \le m, \\ & t \ge 0, \end{aligned} \tag{10.8}$$
where m is the port number of the network. Both the objective function and the constraints are convex functions of the variables $t$, $t_{p,q}$, K, C, D, and $Y^\infty$. It is shown in [21] that the optimization problem in (10.7) subject to the constraints in (10.2) can be transformed into a convex programming problem, which can be solved efficiently by existing convex programming solvers.
For simplicity, one can assume that $w_{p,q,k} = 1$ for all values of k, p, and q. In this chapter we choose $w_{p,q,k} = 1/\|\tilde{Y}_{p,q}(s_k)\|$ to normalize the relative error. Since N may be large, (10.7) will lead to a very large number of constraints. Thus a more compact form is desired. For a matrix M, let $M_p$ denote the pth row of M and $M_q$ denote the qth column of M. We can write the transfer function for entry (p, q) as follows:

$$Y_{p,q}(s_k) = s_k Y^\infty_{p,q} + D_{p,q} + C_p (s_k I - A)^{-1} B_q. \tag{10.9}$$

Let

$$J(s_k) = w_{p,q,k} \, [\, B_q^T (s_k I - A^T)^{-1} \quad e_q^T \quad s_k e_q^T \,], \tag{10.10}$$

$$L(s_k) = w_{p,q,k} \, \tilde{Y}_{p,q}(s_k),$$

and define, with J(s) and L(s) denoting the values $J(s_k)$ and $L(s_k)$ stacked over the N frequency points,

$$F_{p,q} = \begin{bmatrix} \mathrm{Re}\{J(s)\} \\ \mathrm{Im}\{J(s)\} \end{bmatrix}, \tag{10.11}$$

$$G_{p,q} = \begin{bmatrix} \mathrm{Re}\{L(s)\} \\ \mathrm{Im}\{L(s)\} \end{bmatrix}, \tag{10.12}$$

and

$$X = [\, C \quad D \quad Y^\infty \,]. \tag{10.13}$$

We now have

$$\sum_{k=1}^{N} w_{p,q,k} \, \|Y_{p,q}(s_k) - \tilde{Y}_{p,q}(s_k)\|_2^2 = \|F_{p,q} X_p^T - G_{p,q}\|^2. \tag{10.14}$$
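The compact form can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: unit weights are used (so that the weighted sum and the squared norm coincide exactly), and the state-space data, the sample points, and the stand-in target values are random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, m = 4, 2
A = -np.diag([1.0, 2.0, 3.0, 4.0])           # hypothetical stable A
B = rng.standard_normal((n_state, m))
C = rng.standard_normal((m, n_state))
D = rng.standard_normal((m, m))
Yinf = rng.standard_normal((m, m))
X = np.hstack([C, D, Yinf])                   # X = [C  D  Y_inf]

p, q = 0, 1
e_q = np.eye(m)[q]
s_pts = 1j * np.logspace(-1, 1, 5)
target = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # stand-in sampled data

def J_row(s, w=1.0):
    """Row J(s_k) = w [B_q^T (s_k I - A^T)^{-1},  e_q^T,  s_k e_q^T]."""
    left = B[:, q] @ np.linalg.inv(s * np.eye(n_state) - A.T)
    return w * np.concatenate([left, e_q, s * e_q])

J = np.array([J_row(s) for s in s_pts])
F = np.vstack([J.real, J.imag])
G = np.concatenate([target.real, target.imag])

Y_pq = np.array([s * Yinf[p, q] + D[p, q]
                 + C[p] @ np.linalg.solve(s * np.eye(n_state) - A, B[:, q])
                 for s in s_pts])
lhs = np.sum(np.abs(Y_pq - target) ** 2)
rhs = np.linalg.norm(F @ X[p] - G) ** 2
print(np.isclose(lhs, rhs))
```

The key point is that each row J(s_k) applied to the pth row of X reproduces the model value Y_{p,q}(s_k), so the fitting error becomes linear in the unknowns C, D, and Y_inf.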
We can perform a QR decomposition of the matrix $F_{p,q}$,

$$F_{p,q} = Q_{p,q} R_{p,q}, \tag{10.15}$$

where R is an upper triangular matrix and Q is a matrix with orthonormal columns satisfying $Q^T Q = I$. We can write

$$\|F_{p,q} X_p^T - G_{p,q}\|^2 = (F_{p,q} X_p^T - G_{p,q})^T (F_{p,q} X_p^T - G_{p,q}). \tag{10.16}$$

Let

$$E_{p,q} = R_{p,q} X_p^T - Q_{p,q}^T G_{p,q} \tag{10.17}$$

and

$$\delta_{p,q}^2 = G_{p,q}^T (I - Q_{p,q} Q_{p,q}^T) G_{p,q}; \tag{10.18}$$

we can rewrite (10.16) as

$$\|F_{p,q} X_p^T - G_{p,q}\|^2 = E_{p,q}^T E_{p,q} + \delta_{p,q}^2. \tag{10.19}$$

The least-square constraint (10.14) becomes

$$E_{p,q}^T E_{p,q} + \delta_{p,q}^2 \le t_{p,q}. \tag{10.20}$$
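The QR-based compaction in (10.15) to (10.19) is a standard least-squares identity and can be verified on random data. The matrices below are stand-ins with hypothetical sizes, not data from the book's examples.

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((10, 4))     # stand-in for F_{p,q} (2N rows, few unknowns)
G = rng.standard_normal(10)          # stand-in for G_{p,q}
x = rng.standard_normal(4)           # stand-in for X_p^T

Q, R = np.linalg.qr(F)               # reduced QR: Q is 10 x 4, R is 4 x 4
E = R @ x - Q.T @ G                  # (10.17)
delta2 = G @ (G - Q @ (Q.T @ G))     # (10.18): G^T (I - Q Q^T) G

full = np.linalg.norm(F @ x - G) ** 2
compact = E @ E + delta2             # (10.19)
print(np.isclose(full, compact))
```

The benefit is that E has only as many rows as there are unknowns, so the constraint (10.20) no longer grows with the number of frequency samples; the constant term is computed once.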
Sometimes we need to introduce additional constraints on C, D, and $Y^\infty$. The most common choice is to fix D or $Y^\infty$ to a specific value, such as zero. The fixed value must ensure that the system meets the positive-real condition. We note that passivity enforcement has also been done by the compensation-based approach proposed in [1]. That method, however, does not ensure accurate matching, because the compensated part may have a significant impact on the frequency range of interest.
10.2 Model optimization for active circuits
10.2.1 Introduction
Model order reduction for passive interconnect circuits has been intensively studied in the past owing to the increasing complexity of the parasitic layouts of digital circuits. Many efficient algorithms have been proposed to reduce interconnects modeled as RC/RCL/RLCK circuits, such as Krylov-subspace projection-based methods [32, 63, 85, 91, 113] and local node-reduction-based methods [3, 27, 98, 107, 108, 121]. For linear or linearized active circuits such as filters and opamps, which dominate many analog, mixed-signal, and radio-frequency (RF) circuits, fewer studies have addressed reducing the circuits and realizing the reduced circuit matrices using compact models [41]. On the other hand, the simulation of analog, mixed-signal, and RF circuits is very time-consuming. For instance, RF circuit simulation requires very long simulation times to accommodate both the fastest and the slowest tones in the input signals [80]. As more analog RF components are manufactured on-chip using the latest digital VLSI technologies [70], parasitics, which are associated with many digital circuits, must be considered in the integrated analog and RF circuits. This increases the sizes of analog RF circuits and makes RF simulation a more challenging task. As a result, reduction and compact modeling of circuits with both passive and active elements are important for fast mixed-signal and RF circuit design and verification.
Active circuits are different from passive circuits in several respects. First, many active circuits are non-linear in nature, but they often exhibit linearity when the input signals are small, so that the operating points do not change significantly. Such linearized circuits are good enough for predicting many useful characteristics of analog and RF circuits, such as gain, noise figure, bandwidth, or noise margin, for early-stage verification. Second, active circuits may generate energy, which means they are not passive. Hence, neither passive reduction nor passivity enforcement is required. However, the phase responses of many active circuits, such as opamps, are important, as they determine the stability of circuits with internal or external feedback loops [77]. Existing model order reduction approaches only match the magnitude (or the real and imaginary parts) of the circuit responses. For analog circuit modeling, explicit matching of the phase response is desired in addition to magnitude matching in the frequency domain. Third, active circuits typically do not have the reciprocity property (see footnote 10.1). Mathematically, a reciprocal network has a symmetric matrix, which has been exploited by many existing iterative reduction approaches for RLCK circuits. This is not the case for general active circuits, which makes those methods less efficient. Also, the reduction process should not change the reciprocity property of the circuit. This is not the case for most projection-based reduction algorithms such as PRIMA [91], except for the recent work [37] and the structure-preserving methods in Chapters 8 and 9. Finally, how to realize a general non-reciprocal (non-symmetric) order-reduced circuit matrix remains a less studied problem.
10.2.2 The new modeling method
In this section, we present a general reduction and macro-modeling technique for linear or linearized active circuits. The new reduction process is based on the hierarchical multi-point reduction algorithm shown in Chapter 5, which allows reduction and realization of both passive and active circuits up to very wide frequency ranges [93]. After the hierarchical reduction, the new method applies a constrained linear least-square-based optimization to match both the magnitude and the phase responses of the admittances in the reduced matrix. After the model generation, we can apply a general multi-port network realization process, based on the relaxed one-port Foster’s canonical-form network synthesis technique shown in Chapter 11, to realize any multi-port non-symmetric circuit matrix for macro-modeling of non-reciprocal active circuits. The resulting modeling algorithm can generate high-fidelity multi-port macro-models of any linear active network with easily controlled model accuracy and complexity, up to the given frequency range.
Footnote 10.1: A reciprocal network is one in which the power losses are the same between any two ports regardless of the direction of propagation [127].
10.3 Optimization for magnitude and phase responses
Existing model order reduction tools typically match the admittance responses in terms of magnitudes (or real and imaginary parts). For active circuits, however, phase responses are also important: many active circuits have internal feedback, so the phase is related to their stability. Figure 10.1 shows the optimized admittance response of Y12(s) of the reduced two-port admittance matrix of the µA725 opamp circuit without considering phase matching. As can be seen, the phase discrepancy is quite large in the low-frequency range, even though the magnitudes are matched perfectly. This reduced 2 × 2 circuit model will give incorrect phase responses, which may result in an unstable or oscillating response of the whole system in the time domain under some input stimuli in the given frequency range. As a result, we have to explicitly match the phase response during the optimization process shown below.
[Figure 10.1: four panels (real part, imaginary part, magnitude, and phase versus frequency), each comparing the original and optimized responses.]
Figure 10.1 Admittance Y21 response of the µA725 opamp without considering phase. Reprinted with permission from [73] (c) 2005 IEEE.
10.3.1 Constrained least-square-based optimization
After the multi-point hierarchical model reduction, an n-port order-reduced admittance matrix of the original circuit is generated, as shown in (10.21),

$$\hat{Y}(s) = \begin{bmatrix} \hat{Y}_{1,1}(s) & \cdots & \hat{Y}_{1,n}(s) \\ \vdots & \ddots & \vdots \\ \hat{Y}_{n,1}(s) & \cdots & \hat{Y}_{n,n}(s) \end{bmatrix}, \tag{10.21}$$
where each of the rational admittances, $\hat{Y}_{ij}$, can be represented in the partial-fraction form shown in (10.23). The hierarchical multi-point reduction process typically finds all the dominant poles in the given frequency range for each admittance, but their residues may not be accurate owing to the multi-point expansions. As a result, we need to adjust the residues so that the admittance responses match the exact ones well in both magnitude and phase. This can be done by a constrained least-square optimization process.
Assume that we have obtained the admittance matrix of an order-reduced system $\hat{Y}(s)$ with a set of T frequency sampling points. Let $\tilde{Y}_{p,q}(s_k)$ be the exact value of the admittance at the entry (p, q) at the kth frequency point, which can be obtained by the exact hierarchical symbolic analysis [126]. Let us first consider the magnitude only. The optimization problem is to determine the residues of the poles such that the following least-square cost function is minimized:

$$\min \left( \sum_{k=1}^{T} \|\hat{Y}_{p,q}(s_k) - \tilde{Y}_{p,q}(s_k)\|_2^2 \right). \tag{10.22}$$
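To formulate the optimization problem, we rewrite each rational function at the entry (p, q) of the admittance matrix in the following partial-fraction form:

$$\hat{Y}(s) = s\hat{Y}_\infty + \hat{Y}_0 + \sum_{m=1}^{M} \frac{r_{r_m}}{s - p_{r_m}} + \sum_{n=1}^{N} \left( \frac{r_{c_n}}{s - p_{c_n}} + \frac{r_{c_n}^*}{s - p_{c_n}^*} \right), \tag{10.23}$$

where we have N pairs of conjugate poles $p_{c_n}$ and M real poles $p_{r_m}$. For a given frequency point $s_k$, we define

$$A_k = [\, a_{r_1}(s_k) \ \cdots \ a_{r_M}(s_k) \quad a_{c_1}(s_k) \ \cdots \ a_{c_{2N}}(s_k) \quad 1 \quad s_k \,] \tag{10.24}$$

and

$$x = [\, x_{r_1} \ \cdots \ x_{r_M} \quad x_{c_1} \ \cdots \ x_{c_{2N}} \quad Y_0 \quad Y_\infty \,]^T. \tag{10.25}$$

For each pole lying on the real axis of the s-plane, we have

$$a_{r_m}(s_k) = \frac{1}{s_k - p_{r_m}}, \tag{10.26}$$

and $x_{r_m}$ is the residue corresponding to the pole $p_{r_m}$. For the nth pair of complex poles, we have

$$a_{c_{2n-1}} = \frac{1}{s_k - p_{c_n}} + \frac{1}{s_k - p_{c_n}^*}, \quad a_{c_{2n}} = \frac{j}{s_k - p_{c_n}} - \frac{j}{s_k - p_{c_n}^*}, \tag{10.27}$$

and consequently $x_{c_{2n-1}}$ and $x_{c_{2n}}$ are the real and imaginary parts of the residue $r_{c_n}$ of the complex pole pair, respectively. For T frequency points, we stack the rows $A_k$ into a matrix A and the exact values $\tilde{Y}(s_k)$ into a vector $\tilde{Y}$, and define

$$A_{lin} = \begin{bmatrix} \mathrm{Re}(A) \\ \mathrm{Im}(A) \end{bmatrix}, \quad Y_{lin} = \begin{bmatrix} \mathrm{Re}(\tilde{Y}) \\ \mathrm{Im}(\tilde{Y}) \end{bmatrix}. \tag{10.28}$$

We can then rewrite (10.22) as

$$\min(\|A_{lin} x - Y_{lin}\|_2^2). \tag{10.29}$$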
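A minimal sketch of the unconstrained fit (10.29) follows, assuming the poles are already known and only the residues, the constant term, and the term proportional to s are unknown. The pole values, the normalized frequency grid, and the "exact" data (generated here from known coefficients) are hypothetical, and `basis_row` is an illustrative helper name.

```python
import numpy as np

def basis_row(s, real_poles, complex_poles):
    """Row A_k: real-pole terms, two terms per conjugate pair, then 1 and s."""
    row = [1.0 / (s - p) for p in real_poles]
    for p in complex_poles:                       # one entry per conjugate pair
        row.append(1.0 / (s - p) + 1.0 / (s - np.conj(p)))
        row.append(1j / (s - p) - 1j / (s - np.conj(p)))
    return np.array(row + [1.0, s])

real_poles = [-2.0]                               # normalized units
complex_poles = [-1.0 + 5.0j]
x_true = np.array([3.0, 0.5, -1.2, 0.1, 0.02])    # [x_r1, Re r_c1, Im r_c1, Y0, Y_inf]

s_pts = 1j * np.logspace(-1, 1.5, 40)
A = np.array([basis_row(s, real_poles, complex_poles) for s in s_pts])
Y_exact = A @ x_true                              # synthetic "exact" samples

A_lin = np.vstack([A.real, A.imag])               # (10.28)
Y_lin = np.concatenate([Y_exact.real, Y_exact.imag])
x_fit, *_ = np.linalg.lstsq(A_lin, Y_lin, rcond=None)
print(np.allclose(x_fit, x_true))
```

Splitting into real and imaginary parts doubles the number of rows but keeps every unknown real, which is what allows a standard real-valued least-squares routine to be used.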
In this way, all the variables (the real and imaginary parts of the residues) are real, and the optimization is carried out in the real number domain.
Now we consider the phase constraints. Phase is essentially determined by the ratio of the imaginary and real parts of a complex number. Normally, it is matched automatically if the magnitude is matched, provided that the real and imaginary parts are not small. This is not the case when both of them are very small numbers. There are two aspects of the ratio: its sign and its value. We first consider the sign constraint. Let us define

$$Y_D = \mathrm{diag}(Y_{lin}), \tag{10.30}$$

where $\mathrm{diag}(Y_{lin})$ denotes the diagonal matrix generated from the vector $Y_{lin}$, and

$$D_{lin} = Y_D A_{lin}; \tag{10.31}$$

then the phase sign constraint becomes

$$D_{lin} x \ge 0. \tag{10.32}$$
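Next we consider the ratio value constraint, which requires that the ratio between the real and imaginary parts of the optimized response be the same as that of the exact one. Let us define

$$Y_I = \mathrm{diag}\left( \begin{bmatrix} \mathrm{Im}(\tilde{Y}) \\ \mathrm{Re}(\tilde{Y}) \end{bmatrix} \right) \tag{10.33}$$

and

$$C_{lin} = I_{lin} Y_I A_{lin}, \tag{10.34}$$

where

$$I_{lin} = \begin{bmatrix} 1 & \cdots & 0 & -1 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & -1 \end{bmatrix}; \tag{10.35}$$

then the phase value constraint becomes

$$C_{lin} x = 0. \tag{10.36}$$

Row k of (10.36) reads $\mathrm{Im}(\tilde{Y}_k)\,\mathrm{Re}(A_k x) - \mathrm{Re}(\tilde{Y}_k)\,\mathrm{Im}(A_k x) = 0$, which is the cross-multiplied form of the ratio condition.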
Finally, we have the following constrained linear least-square optimization problem:

$$\min(\|A_{lin} x - Y_{lin}\|_2^2) \quad \text{subject to} \quad D_{lin} x \ge 0, \quad C_{lin} x = 0. \tag{10.37}$$

In this work, the resulting problem is solved with MATLAB’s least-square tool package.
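The constraint matrices of (10.37) can be assembled as sketched below. This is an illustration only: it does not solve the constrained problem (that requires a constrained least-squares or QP solver, which is not shown), it merely builds the matrices and tests whether a candidate x is feasible. The single-pole model and its values are hypothetical, and the stacking used for Y_I (imaginary parts on top, real parts below) is an assumption made so that the matrix dimensions in (10.34) agree.

```python
import numpy as np

p = -2.0                                              # hypothetical real pole (normalized units)
s_pts = 1j * np.logspace(-1, 1, 20)
A = np.array([[1.0 / (s - p), 1.0, s] for s in s_pts])    # rows A_k = [a_r1(s_k), 1, s_k]
x_true = np.array([3.0, 0.1, 0.02])                   # [residue, Y0, Y_inf]
Y_exact = A @ x_true

T = len(s_pts)
A_lin = np.vstack([A.real, A.imag])
Y_lin = np.concatenate([Y_exact.real, Y_exact.imag])

D_lin = np.diag(Y_lin) @ A_lin                               # sign constraint matrix
I_lin = np.hstack([np.eye(T), -np.eye(T)])                   # [I, -I]
Y_I = np.diag(np.concatenate([Y_exact.imag, Y_exact.real]))  # assumed stacking
C_lin = I_lin @ Y_I @ A_lin                                  # ratio constraint matrix

def feasible(x, tol=1e-9):
    """True if x satisfies D_lin x >= 0 and C_lin x = 0 up to a tolerance."""
    return bool(np.all(D_lin @ x >= -tol) and np.allclose(C_lin @ x, 0.0, atol=tol))

x_bad = x_true * np.array([1.0, -1.0, 1.0])           # flipping Y0 distorts the phase
print(feasible(x_true), feasible(x_bad))
```

The exact coefficients satisfy both constraints by construction, while the perturbed vector changes the real part only and therefore violates the ratio condition.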
10.4 Numerical examples
In this section, we present some numerical results on two examples, using the optimization and passivity enforcement methods discussed above. We use SeDuMi [118] and the SeDuMi Interface to solve the convex programming problem for passivity enforcement. Numerical examples of passive modeling can be found in Chapter 5, so we present only the results for active circuit modeling in this section.
The first example is a folded-cascode CMOS opamp [42]. Its small-signal model contains 122 resistors, capacitors, and voltage-controlled current sources. We perform the multi-point hierarchical model order reduction up to 2 MHz, which extracts four dominant common poles for the admittance matrix. We match the frequency response up to 2 MHz during the optimization. The synthesized circuit includes 40 RLC elements, 2 VCVS and 2 CCCS controlled sources, which represents a 63.93% reduction ratio (63.93% of the circuit elements have been suppressed). The resulting frequency-domain waveforms and the comparison with the original waveforms are shown in Figure 10.2 for Y12(s). As one can see, the synthesized circuit matches the original circuit perfectly up to 2 MHz in all aspects of the frequency response.
Using this CMOS opamp, we design two low-pass active filters. The first filter is a tenth-order active Sallen–Key topology low-pass filter, shown in Figure 10.3. After the reduction and realization, there are 88 RLC elements, 2 VCVS and 2 CCCS dependent sources in the synthesized circuit, compared with 636 devices in the original circuit, which represents an 85.53% reduction rate. The resulting frequency-domain waveforms and the comparison with the original waveforms are shown in Figure 10.4 for Y21(s). If the phase is not explicitly considered in the optimization process, the results are as shown in Figure 10.5 for Y21(s). It can be seen that the phase has a noticeable discrepancy compared with the exact response.
The filter’s transfer function response is shown in Figure 10.6. Figure 10.7 shows the time-domain simulation with different excitations. For the left figure, the input is a sinusoidal signal with a frequency of 1 kHz. The outputs of the original and the synthesized circuits are almost the same. For the right figure, the excitation is a sinusoidal signal with a frequency of 1 MHz. The outputs are still very close.
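The reduction ratios quoted for the opamp and the Sallen–Key filter follow directly from the element counts. The short check below assumes that the ratio is defined as the fraction of elements removed, which is how the text describes it; the function name is illustrative.

```python
def reduction_ratio(original, synthesized):
    """Fraction of circuit elements removed, in percent."""
    return 100.0 * (1.0 - synthesized / original)

opamp = reduction_ratio(122, 40 + 2 + 2)        # 40 RLC + 2 VCVS + 2 CCCS
sallen_key = reduction_ratio(636, 88 + 2 + 2)   # 88 RLC + 2 VCVS + 2 CCCS
print(f"{opamp:.2f}% {sallen_key:.2f}%")
```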
[Figure 10.2: four panels (real part, imaginary part, magnitude, and phase versus frequency), each comparing the original and synthesized responses.]
Figure 10.2 Frequency response of Y12 of opamp model. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.3: filter schematic with input Vin and output Vout.]
Figure 10.3 Active Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.4: four panels (real part, imaginary part, magnitude, and phase versus frequency), each comparing the original and synthesized responses.]
Figure 10.4 Frequency response of Y21 of the Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.5: four panels (real part, imaginary part, magnitude, and phase versus frequency), each comparing the original and synthesized responses.]
Figure 10.5 Frequency response of Y21 of the Sallen–Key topology low-pass filter without considering phase. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.6: voltage (dB) versus frequency for the original circuit and the synthesized ROM.]
Figure 10.6 Frequency response of the transfer function of the Sallen–Key topology low-pass filter. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.7: two panels of voltage (V) versus time (s), each comparing the original and synthesized outputs.]
Figure 10.7 Transient response of the Sallen–Key topology low-pass filter with different excitations. Reprinted with permission from [73] (c) 2005 IEEE.
The second filter is shown in Figure 10.8. This filter is a fifth-order elliptic filter using the FDNR (frequency-dependent negative resistor) technique. It contains 507 passive and active elements. The matching frequency is up to 1 MHz, and we find eight dominant poles in this range. There are 56 RLC elements, 2 VCVS and 2 CCCS dependent sources in the synthesized circuit, which gives an 88.17% reduction rate. The frequency responses of the original and synthesized circuits are shown in Figure 10.9. From this figure, we notice that the realized circuit’s response is almost the same as that of the original system from DC to 1 MHz, but there are noticeable differences around 100 MHz. This is expected, as we only match the frequency response up to 1 MHz. As a result, the original circuit can be replaced by the synthesized model if the filter works in the frequency range from DC to 1 MHz (at least) or 100 MHz (at most). Above this frequency range, the realized circuit will not match the original circuit well. The resulting transient waveforms are shown in Figure 10.10. We apply the same signal to the inputs of the original filter and the synthesized circuit and compare the output waveforms. The left figure is the result from a 1 kHz sinusoidal excitation. The outputs of the two filters are almost the same. The right figure shows the result from a 2 MHz sinusoidal signal. The time-domain simulation is still accurate at this frequency.
[Figure 10.8: filter schematic with input Vin and output Vout.]
Figure 10.8 Active low-pass FDNR filter. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.9: voltage (dB) versus frequency for the original circuit and the synthesized ROM.]
Figure 10.9 Frequency response of the transfer function of the low-pass FDNR filter. Reprinted with permission from [73] (c) 2005 IEEE.
[Figure 10.10: two panels of voltage (V) versus time (s), each comparing the original and synthesized outputs.]
Figure 10.10 Transient response of the FDNR filter with different excitations. Reprinted with permission from [73] (c) 2005 IEEE.
10.5 Summary
In this chapter, we presented model optimization and passivity enforcement techniques for general passive and active circuit modeling. We first presented the convex programming, or semi-definite programming, based model optimization and passivity enforcement algorithm. The algorithm formulates the optimization problem as a convex programming problem with semi-definite constraints to ensure the passivity of the optimized models. This method is very general and can be applied to any fitting-based modeling process. However, it suffers from scalability issues owing to the expensive computation in semi-definite programming and to the fast growth of the number of constraints with the increasing number of terminals and poles. We then presented a modeling technique for active linear analog circuits based on constrained least-square optimization. The new method explicitly considers the phase and magnitude responses of the circuits and ensures that those important characteristics are well matched with those of the original circuits for active circuit modeling.
11 General multi-port circuit realization
Circuit realization deals with the issue of interfacing the reduced models with a general circuit simulator such as SPICE. After the model order reduction, we typically end up with an admittance or impedance transfer matrix $H(s) = B^T (G + sC)^{-1} B$ in the frequency domain, which can typically be rewritten as an N × N matrix,

$$H_{N \times N}(s) = \begin{bmatrix} h_{11}(s) & \cdots & h_{1N}(s) \\ \vdots & \ddots & \vdots \\ h_{N1}(s) & \cdots & h_{NN}(s) \end{bmatrix}, \tag{11.1}$$

where N is the number of ports. Each $h_{ij}$ can be a rational function or in the partial-fraction form with poles $p_i$ and corresponding residues $k_i$.
There are two general ways to incorporate frequency-domain data like $H_{N \times N}(s)$ into a circuit simulator. The first is to realize the frequency-domain data as a circuit with simple RLCM elements. The second is by means of time-domain recursive convolution [14, 106]. Recursive convolution requires modification of the existing simulators and a new interface for getting the frequency-domain data into the simulators. Realized circuits, in contrast, can easily be incorporated into existing circuit simulators and are more portable among different simulators. In this chapter, we focus on circuit realization methods for the purpose of circuit simulation. Our goal is different from that of traditional circuit synthesis methods [127], where the synthesis is targeted at the physical realization of some network functions for implementing circuits, such as filters and matching networks. In our problem, we only need to build a model that can be used with circuit simulators. Such relaxed requirements can lead to significant simplification in the synthesis process, as shown in this chapter.
11.1 Review of existing circuit-realization methods
Circuit realization for circuit simulation has been studied before [14, 35, 39, 98] in the computer-aided design community. General one-port and multiple-port network realization techniques were used in the past for solving admittance synthesis problems. For example, Cauer-form-based synthesis was applied earlier by Freund and Feldmann for RC/LC synthesis by matching matrix elements of the state matrix after the interconnect model order reduction [35, 39]. However, the matrix matching is not efficient, involving O(N^2) operations, and the RC/LC template is not sufficient for general RLCM system realization.
PRIME [82] is a circuit modeling and realization tool for high-speed interconnect and transmission lines. It is based on Foster’s form to realize all the admittance and impedance network functions. But it requires every complex pole pair to be physically realizable (every RCL element is positive), which cannot be satisfied in general and may lead to significant errors when unrealizable pole pairs are discarded or their residues are changed.
In [97, 98], a template-fitting approach based on convex programming was proposed. The template is borrowed from Brune’s synthesis method. But this approach is expensive when high-order models are involved, and it can only be applied to the one-port synthesis problem. Also, the realization is an approximate process and errors will be introduced.
All existing realization methods require that all the RLCM elements be physically realizable (with positive values). Therefore, the realized models are always passive.
In the following, we first present Brune’s synthesis method, which can realize any one-port passive (positive-real or PR) network function with realizable RLCM elements. Brune’s method is a traditional network realization method, which is computationally expensive and less suitable for our modeling purpose. But this method contains many fundamental concepts for physical network realization algorithms, and it is instructive for the reader to review those methods before we introduce a new approach to model-oriented realization. A comprehensive treatment of physical network realization can be found in [127].
We then present a more efficient relaxed one-port and multi-port admittance realization method based on a generalized Foster’s form, which can use unrealizable (negative-valued) RLCM elements but still preserve passivity. The new method is an exact realization method that introduces no error, and it also overcomes the over-constrained synthesis conditions of the PRIME method.
11.1.1 One-port realization framework by RLCM elements
In this section, we first review the conditions for positive real functions and then present the general realization framework for one-port networks using RLCM elements.
Positive real function revisited
The RLCM immittance function Z(s) of a one-port network is positive real (passive) only if [127]:
1. Z(s) is a real rational function of s,
2. Re Z(jω) ≥ 0 for all real ω,
3. All poles of Z(s) are in the closed left-half plane (LHP); all jω-axis poles are simple, with positive real residues.
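Conditions 2 and 3 can be sampled numerically for a given rational function. The sketch below is only a sanity check on a finite frequency grid, not a proof of positive-realness, and it does not test the residue condition for jω-axis poles. The test function Z(s) = (2s^3 + 3s^2 + 2s + 3)/(s^3 + 3s^2 + 4s + 1) is chosen here as an illustrative assumption.

```python
import numpy as np

num = [2.0, 3.0, 2.0, 3.0]     # 2s^3 + 3s^2 + 2s + 3 (illustrative test function)
den = [1.0, 3.0, 4.0, 1.0]     # s^3 + 3s^2 + 4s + 1

poles = np.roots(den)
poles_ok = bool(np.all(poles.real <= 1e-9))           # condition 3: closed left-half plane

omega = np.logspace(-3, 3, 2001)
Z = np.polyval(num, 1j * omega) / np.polyval(den, 1j * omega)
real_part_ok = bool(np.all(Z.real >= -1e-9))          # condition 2, sampled on a grid
print(poles_ok, real_part_ok)
```

A small negative tolerance is used because this Z(s) has a zero on the jω-axis, where the sampled real part touches zero.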
It is shown in [127] that these conditions are also sufficient for a one-port network to be physically realizable. With the traditional one-port network realization technique, the positive-real property guarantees that all the RLC elements are positive (and thus physically realizable).
General one-port network realization steps
The basic idea of physical one-port network realization is to break the impedance or admittance network into smaller realizable networks, which can be implemented by simple L, C, and R elements connected in series or in parallel. There are two techniques used in the synthesis of one-port networks: (1) pole removal and (2) constant removal. The first operation realizes a series reactance branch when used on an impedance function Z(s) and a shunt susceptance branch when performed on an admittance function Y(s). The second operation realizes a series resistor from a Z(s) function and a shunt conductance from a Y(s) function.
Notice that for the pole-removal operations, we require that all the poles be on the jω-axis. The reason is that poles or pole pairs on the jω-axis can easily be realized by LC elements in series or in parallel. For instance, consider a network function whose impedance and admittance forms are as follows:

$$Z(s) = s + \frac{1}{s} + Z_1(s) = \frac{s^2 + 1}{s} + Z_1(s), \tag{11.2}$$

$$Y(s) = \frac{s}{s^2 + 1} + Y_1(s). \tag{11.3}$$

Note that Z(s) has at least two poles, at s = 0 and s = ∞, so it can be realized by a series branch with L and C in series, as shown in Figure 11.1(a). Similarly, Y(s) has at least two poles, at s = −j and s = j, which can be realized by a shunt branch with L and C in series, as shown in Figure 11.1(b).
[Figure 11.1: panels (a) and (b), with element values L = 1 and C = 1 and remainders Z1(s) and Y1(s).]
Figure 11.1 Realization of Z(s) in (11.2) and Y(s) in (11.3).
Similarly, a network whose impedance has the form of the admittance in (11.3) is shown below:

$$Z(s) = \frac{s}{s^2 + 1} + Z_1(s), \tag{11.4}$$

$$Y(s) = s + \frac{1}{s} + Y_1(s) = \frac{s^2 + 1}{s} + Y_1(s). \tag{11.5}$$

Now Z(s) has at least two poles, at s = −j and s = j, which can be realized by a series branch with L and C in parallel, as shown in Figure 11.2(a). Similarly, Y(s) has at least two poles, at s = 0 and s = ∞. Hence, it can be realized by a shunt branch with L and C in parallel, as shown in Figure 11.2(b).
[Figure 11.2: panels (a) and (b), with element values L = 1 and C = 1 and remainders Z1(s) and Y1(s).]
Figure 11.2 Realization of Z(s) in (11.4) and Y(s) in (11.5).
In the following, we use an example to illustrate the synthesis process [127]. Consider a positive-real impedance function

$$Z(s) = \frac{2s^3 + 3s^2 + 2s + 3}{s^3 + 3s^2 + 4s + 1}. \tag{11.6}$$

Since the degrees of the numerator and the denominator are the same, there is no pole or zero at s = ∞. For s = 0, Z(0) = 3. Hence there is no pole or zero at s = 0 either. We notice that $s^2 + 1$ is a factor of the numerator, since $2s^3 + 3s^2 + 2s + 3 = (s^2 + 1)(2s + 3)$. Hence Y(s) = 1/Z(s) has poles at j and −j. These poles can be removed by the partial-fraction expansion

$$Y(s) = \frac{k_1 s}{s^2 + 1} + Y_1(s),$$

where

$$k_1 = \left. \frac{s^2 + 1}{s} Y(s) \right|_{s=j} = 1,$$

and

$$Y_1(s) = \frac{s + 1}{2s + 3} = \frac{s}{6s + 9} + \frac{1}{3}.$$

So $Y_1(s)$ can be realized by a shunt resistor in parallel with a series RC branch. The realized circuit for Z(s) in (11.6) is shown in Figure 11.3.
[Figure 11.3: realized ladder circuit for Z(s); element values shown in the figure are 3, 1, 1/9, 1, and 6.]
Figure 11.3 Realization of Z(s) in (11.6).
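The worked example can be checked with exact rational arithmetic. The sketch below takes Z(s) = (2s^3 + 3s^2 + 2s + 3)/(s^3 + 3s^2 + 4s + 1), the form for which s^2 + 1 divides the numerator, and assumes the following reading of the element values 3, 1, 1/9, 1, 6 from Figure 11.3: a shunt branch with L = 1 in series with C = 1, a shunt resistor R = 3, and a shunt branch with R = 6 in series with C = 1/9. That assignment of values to elements is an interpretation, not something the figure residue states explicitly.

```python
from fractions import Fraction as Fr

def Z(s):
    """Impedance of the worked example."""
    return (2 * s**3 + 3 * s**2 + 2 * s + 3) / (s**3 + 3 * s**2 + 4 * s + 1)

def Y_realized(s):
    """Admittance of the assumed realized circuit: three shunt branches in parallel."""
    y_lc = 1 / (s * 1 + 1 / (s * 1))        # L = 1 in series with C = 1
    y_g = Fr(1, 3)                          # shunt resistor R = 3
    y_rc = 1 / (6 + 1 / (s * Fr(1, 9)))     # R = 6 in series with C = 1/9
    return y_lc + y_g + y_rc

samples = [Fr(1, 2), Fr(1), Fr(3), Fr(7, 5)]
matches = all(Z(s) * Y_realized(s) == 1 for s in samples)
print(matches)
```

Evaluating at a few real rational points is enough to catch a wrong coefficient or element value, since the two sides are low-order rational functions.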
We summarize the one-port network realization process for non-minimum functions as follows [127]:
1. Check Z(s) for jω-axis poles. If there are any, remove them, thereby realizing a series reactance branch of the circuit.
2. Check the remainder Z1(s) for jω-axis zeros. If there are any, remove the corresponding poles from Y1(s) = 1/Z1(s). Repeat steps 1 and 2 until there are no jω-axis poles and zeros in the remainder Z(s) or Y(s).
3. Find the minimum value that the real part of the remainder Z(s) or Y(s) assumes along the jω-axis. Remove this value as a series resistor (or shunt conductance). Return to step 1.
It was shown in [127] that, for a positive-real function, the synthesis process always generates positive R, L, and C values, which ensures the physical realizability of the synthesized circuits. For practical interconnect and network functions extracted or obtained from full-wave simulation or measurement, most of the poles of the network functions are not on the jω-axis, as shown in [20]. As a result, this realization process will not work for those network functions. In the sequel, we introduce Brune’s method, which can realize any positive-real function.
11.1.2 One-port realization by Brune’s method

The driving-point synthesis for RLCM circuits was studied by Brune in his significant paper [12], where he pointed out that a positive-real impedance Z(s) can be realized by a passive multiple-stage ladder network. To understand Brune’s method, we first introduce the concept of the minimum function. A minimum function is a function that cannot be reduced further by the pole and constant removal described in the previous subsection. The real part of this function reaches 0 at a finite value ω1 along the jω-axis, as shown in Figure 11.4, where it is indicated by Re Z1(jω) in a dotted line.

[Plot: Re Z(jω) and Re Z1(jω) versus ω, with Rmin and ω1 marked.]

Figure 11.4 Real-part responses of Z(s) and the remainder Z1(s) = Z(s) − Rmin.
Practically, a given network function Z(s) may be a constant away from a minimum function, as shown in Figure 11.4. After we remove the constant, which is realized by a series resistor, the remainder function Z1(s) becomes a minimum function. The idea of Brune’s method is to create a pole in the remainder function explicitly. To do this, we notice that the real part of Z1(jω) is zero at jω1. So we have

$$ Z_1(j\omega_1) = jX_1. \tag{11.7} $$

We try to extract a series element Zs(s) such that the remainder Z2(s) is zero at jω1:

$$ Z_s(j\omega_1) = Z_1(j\omega_1) - Z_2(j\omega_1) = jX_1. \tag{11.8} $$
Assume that X1 < 0; the conclusion also applies to X1 = 0 and X1 > 0. To realize Zs(s), we could use a capacitor C1 such that C1 = 1/(ω1|X1|) > 0. However, the remainder

$$ Z_2(s) = Z_1(s) - \frac{1}{sC_1} $$

would then become non-positive-real, as it has a pole at s = 0 with negative residue −1/C1. (Note that Z1(s) does not have a pole on the jω axis.) Hence, instead of using a positive capacitor, we use a negative inductor L1 = X1/ω1 < 0 to realize Zs(s), and we show later that the negative inductor can be transformed into a physically realizable transformer. So we have

$$ Z_2(s) = Z_1(s) - sL_1 = Z_1(s) + s|L_1|, \tag{11.9} $$

which is the sum of two positive-real functions and thus positive real itself. Since it has a zero at s1 = jω1 and a pole at s = ∞, we can now perform the normal pole removal discussed above. First, we remove the poles at jω1 (and at −jω1) of Y2(s) = 1/Z2(s):

$$ Y_2(s) = \frac{k_1 s}{s^2 + \omega_1^2} + Y_3(s), \tag{11.10} $$

where

$$ k_1 = \left. \frac{s^2 + \omega_1^2}{s}\, Y_2(s) \right|_{s = j\omega_1}. \tag{11.11} $$

As a result, we have a shunt LC branch with L2 = 1/k1 and C2 = k1/ω1². Then we have

$$ Y_3(s) = Y_2(s) - \frac{s/L_2}{s^2 + \omega_1^2}, \tag{11.12} $$
which is positive real as it comes from (11.10). Note that Z2(s) has a pole at s = ∞, which means that Y3(s) has a zero at s = ∞. Hence Z3(s) = 1/Y3(s) has a pole at s = ∞. This pole can be removed by extracting a positive inductor L3, and the remainder becomes

$$ Z_{\mathrm{rem}}(s) = Z_3(s) - sL_3, \tag{11.13} $$

which is positive real too. It can be shown that L3 can be computed as [127]

$$ L_3 = \frac{-L_1 L_2}{L_1 + L_2} = \frac{|L_1| L_2}{-|L_1| + L_2}. \tag{11.14} $$
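One way to see where (11.14) comes from is to look at the high-frequency behavior of one Brune cycle. The following is a sketch, not taken from [127], under the assumption that the cycle is a T network (series L1, a shunt branch with L2 in series with C2, then series L3) followed by a remainder that stays finite as s → ∞.

```latex
% As s -> infinity the shunt branch sL_2 + 1/(sC_2) behaves like sL_2, and the
% finite remainder is negligible next to sL_3, so
Z_1(s) \;\approx\; sL_1 + \frac{(sL_2)(sL_3)}{sL_2 + sL_3}
       \;=\; s\left(L_1 + \frac{L_2 L_3}{L_2 + L_3}\right).

% Z_1(s) is a minimum function and has no pole at infinity,
% so the coefficient of s must vanish:
L_1 + \frac{L_2 L_3}{L_2 + L_3} = 0
\;\Longrightarrow\;
L_1 L_2 + L_1 L_3 + L_2 L_3 = 0
\;\Longrightarrow\;
L_3 = \frac{-L_1 L_2}{L_1 + L_2}.
```

The same identity gives (L1 + L2)(L2 + L3) − L2² = 0, which is the perfect-coupling condition for a coupled-inductor equivalent with primary inductance L1 + L2, secondary inductance L2 + L3, and mutual inductance L2.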
Brune’s realization cycle and the realized circuit are illustrated in Figure 11.5.

[Circuit diagram: Z(s) with series Rmin, one Brune’s cycle consisting of L1, L3, L2, and C, and the termination Rt.]

Figure 11.5 Brune’s driving point synthesis by multiple-stage RLCM ladders (Brune’s cycle).
Note that these synthesis steps stay at the level of mathematical concepts. Next, we present a practical algorithm to implement them [137]. The outline of the algorithm is given in Figure 11.6.

One-Port Brune Synthesis(Z0(s), L)
  While (i ≤ L and Zf is positive real) {
    Do jω-axis pole/zero check/remove {
      Z1(s) = Check/remove jω-axis pole (Z0(s));
      Z2(s) = Check/remove jω-axis zero (Z1(s));
    }
    B(s) = Check/remove jω-axis minimum resistance (Z2(s));
    Brune’s RLCM ladder synthesis (B(s));
    Z0(s) = Zf(s) = Remainder impedance;
    i++;
  }
  Calculate termination resistor Rt by Zf(s);

Figure 11.6 Brune’s multiple level ladder macromodel synthesis.
We now briefly illustrate this procedure with an example that has the following driving-point impedance:

$$ Z(s) = \frac{8s^2 + s + 4}{24s^3 + 11s^2 + 20s + 1}. $$

To obtain a minimum function, it is first necessary to remove all possible poles and zeros of Z(s) on the jω-axis to obtain a remainder for the subsequent minimum-resistance removal. According to Brune’s synthesis rules discussed above, (i) removing a zero results in a series resonant LC element in parallel with the remainder impedance; and (ii) removing a pole results in a parallel resonant LC element in series with the remainder impedance. All these operations preserve the positive-real property. In this example, there is a zero at s → ∞. Removing this zero adds a shunt capacitor C1 = Y(s)/s|s→∞ = 3 F according to rule (i). We then obtain a remainder Z1(s) for the minimum-resistance removal. The minimum resistance is found by numerically solving for the root ω1 of

$$ \frac{d\,\mathrm{Re}\,Z_1(j\omega)}{d\omega} = 0. \tag{11.15} $$

[Circuit diagrams: (a) T circuit with L1 (−1 H), L2 (2 H), L3 (2 H), C1 (3 F), C2 (2 F), and R3 (4 Ω); (b) coupled-inductor circuit with k = 1, Lp (1 H), Ls (4 H), C1 (3 F), C2 (2 F), and R3 (4 Ω).]

Figure 11.7 Example of Brune’s synthesis with passivity-preserved transformation: the non-passive T circuit is transformed to a passive coupled-inductor circuit.
In this case, we find ω1 = 0.5 and the corresponding minimum resistance R = Re Z1(jω1) = 0. Hence we have a minimum function B(s) = Z1(s) − R = Z1(s). According to Brune’s synthesis rule (iii), after this resistance R is removed, the function B(s) is a purely imaginary impedance at jω1, Z1(s)|s=0.5j = −0.5j, and it can be synthesized by a negative inductance L1 = Im Z1(jω1)/ω1 = −1 H. Although we see a negative inductance here, we show in the following that, by transforming the resultant non-passive T circuit into a pair of coupled inductors, we can still preserve passivity. A general proof of this property can be found in [127]. Now the remainder becomes

$$ Z_2(s) = B(s) - sL_1 = \frac{8s^3 + 16s^2 + 2s + 4}{8s^2 + 8s + 1}, $$
and it has a zero at s = jω1 = 0.5j. Hence, by further removing this zero of Z2(s), we can synthesize a shunt branch with L2 = 2 H and C2 = 2 F in series. Finally, we obtain a remainder Zf(s) that cannot be reduced further: Z3(s) = 2s + 4. Hence we can realize it by an inductance L3 = 2 H and a termination resistance Rt = 4 Ω, as shown in Figure 11.7(a). We further apply the passive transformation shown in Figure 11.7(b) and obtain the passive parameters of a pair of coupled inductors: Lp = L1 + L2 = 1 H, Ls = L2 + L3 = 4 H, and M = L2 = 2 H. Note that the coupling between Lp and Ls is perfect (M/√(Lp Ls) = 1).

Because we have an order-reduced driving-point impedance function, we can practically synthesize a low-order driving-point impedance macromodel. As shown in Figure 11.6, we control the number of synthesized Brune ladder stages by a specified number L, at which we terminate the procedure and add a termination resistor Rt:

$$ R_t = Z_f(s)\big|_{s \to 0}, \tag{11.16} $$

where Zf(s) is the remaining impedance in the final synthesis step, and this termination resistor Rt accounts for the remaining DC impedance. We note that usually after several stage extractions, the extracted macromodel can capture the original system response well over a wide frequency range up to 10 GHz.

Generally, if a minimum function is expressed as the ratio of two nth-degree polynomials without a common factor, its degree decreases by two in every synthesis step, and each step adds one stage to the RLCM ladder. By increasing the number of ladder stages, we increase the model order and capture more poles: the procedure captures two poles whenever we synthesize one more stage of the ladder.

Brune’s synthesis needs the realization of an equivalent coupled inductor to enforce passivity. Bott–Duffin’s method [13], on the other hand, avoids the use of mutually coupled inductors, but a stiff price is paid in the extra number of RLC components and, hence, it is not suitable for simple RLCM macromodel realization. Note that for general transfer function realization, two-port network realization is required, as transfer functions are not positive real in general and one-port network realization cannot be applied directly [127]. In the following section, we present a general one-port and multi-port network synthesis method for admittance network realization.
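The element values of the worked Brune example above can be reproduced numerically. The following is a minimal sketch rather than the algorithm of [137]: ω1 = 0.5 is taken from the text instead of being solved from (11.15), the residue k1 is obtained from a finite-difference derivative (an implementation choice made here), and all function and variable names are hypothetical.

```python
import math


def Z(s):
    """Driving-point impedance of the worked example."""
    return (8 * s**2 + s + 4) / (24 * s**3 + 11 * s**2 + 20 * s + 1)


# Zero of Z(s) at infinity: shunt capacitor C1 = lim Y(s)/s = 24/8.
C1 = 24 / 8


def Z1(s):
    """Remainder after removing the shunt capacitor C1."""
    return 1 / (1 / Z(s) - C1 * s)


w1 = 0.5                        # frequency of minimum resistance (from the text)
R_min = Z1(1j * w1).real        # expected to be 0
L1 = Z1(1j * w1).imag / w1      # negative series inductance


def Z2(s):
    """Remainder after extracting the series inductor L1."""
    return Z1(s) - s * L1


# Z2 has a simple zero at s = j*w1, so near it Y2 ~ 1/(Z2'(j*w1) (s - j*w1)).
# With (s^2 + w1^2)/s -> 2 at s = j*w1, the residue is k1 = 2 / Z2'(j*w1).
h = 1e-6
dZ2 = (Z2(1j * w1 + h) - Z2(1j * w1 - h)) / (2 * h)
k1 = (2 / dZ2).real

L2 = 1 / k1
C2 = k1 / w1**2
L3 = -L1 * L2 / (L1 + L2)       # equation (11.14)


def Z_rem(s):
    """Remainder after the shunt L2-C2 branch and the series inductor L3."""
    Y3 = 1 / Z2(s) - (s / L2) / (s**2 + w1**2)
    return 1 / Y3 - s * L3


Rt = Z_rem(1 + 1j).real          # constant remainder, the termination resistor

# Coupled-inductor equivalent of the T network.
Lp, Ls, M = L1 + L2, L2 + L3, L2
k_coupling = M / math.sqrt(Lp * Ls)
```

Under these assumptions the script yields C1 = 3, L1 ≈ −1, L2 ≈ 2, C2 ≈ 2, L3 ≈ 2, Rt ≈ 4, and a coupling coefficient of approximately 1, in agreement with the values quoted in the text.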
11.2 General multi-port network realization

In this section, we introduce a new one-port and multi-port network synthesis approach. The new approach is motivated by the fact that the realized RLC circuits do not need to be physically implemented, as our goal is behavioral modeling. As a result, we allow non-physical (negative) values of RLC elements. Such a relaxation significantly simplifies the realization process and makes the realization error-free. On the other hand, passivity is enforced at the admittance and impedance matrix level (the mathematical level). Hence, the realized circuits are still passive, which was validated in [94].
11.2.1 General relaxed one-port network realization

We start with one-port network realization. The approach is based on one-port network realization using Foster’s form, which can directly synthesize the one-port admittance function [127]. To synthesize the one-port circuit from the driving-point admittance rational function Y(s), we first rewrite it in Foster’s canonical form [127]:

$$ Y(s) = sY_\infty + Y_0 + \sum_{m=1}^{M} \frac{a_m}{s - p_m} + \sum_{n=1}^{N} \left( \frac{a_n}{s - p_n} + \frac{a_n^*}{s - p_n^*} \right), \tag{11.17} $$
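where we expand the rational function into partial-fraction form with N complex-conjugate pole pairs pn and M real poles pm.

[Circuit diagram: Y(s) realized by Gs, Cs, the real-pole branches Rm1, Lm1 through RmM, LmM, and the complex-pole branches Rn1, Ln1, Cn1, Gn1 through RnN, LnN, CnN, GnN.]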
Figure 11.8 One-port Foster admittance realization. Reprinted with permission from [94] (c) 2006 IEEE.
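The admittance function in Foster’s canonical form can then be synthesized by the equivalent circuit in Figure 11.8, with the following relations determining the R, L, C, and G elements:

$$
\begin{aligned}
G_s &= Y_0, & C_s &= Y_\infty; \\
R_{m_m} &= -\frac{p_m}{a_m}, & L_{m_m} &= \frac{1}{a_m}; \\
L_{n_n} &= \frac{1}{2\,\mathrm{Re}\{a_n\}}, & L_{n_n} C_{n_n} |p_n|^2 &= R_{n_n} G_{n_n} + 1, \\
\frac{G_{n_n}}{C_{n_n}} &= -\frac{\mathrm{Re}\{a_n p_n^*\}}{\mathrm{Re}\{a_n\}}, & \frac{R_{n_n}}{L_{n_n}} &= \frac{\mathrm{Re}\{a_n p_n^*\}}{\mathrm{Re}\{a_n\}} - 2\,\mathrm{Re}\{p_n\}.
\end{aligned}
\tag{11.18}
$$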
Some existing works, like PRIME [82], require every complex pole pair to be physically realizable (every RLC element is positive), which is over-constrained and may lead to significant errors when unrealizable pole pairs are discarded or their residues are changed. In this approach, we relax those constraints by allowing some negative RLC elements, so the synthesis may in general produce negative-valued circuit elements. These negative circuit elements do not affect the stability and passivity of the realized circuit when the original circuit matrix or rational function is passive [35, 39, 127], as passivity is guaranteed at the mathematical level of the models and the realization is error-free and reversible.
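The element relations in (11.18) can be turned into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the branch topology assumed for a complex pole pair (R and L in series with the parallel combination of C and G) is the one whose admittance reproduces those relations, and the sample residue and pole are the first pair of the one-port example in (11.22).

```python
def foster_real_pole(a, p):
    """Series R-L branch with admittance 1/(R + sL) = a/(s - p)."""
    L = 1.0 / a
    R = -p / a
    return R, L


def foster_complex_pair(a, p):
    """Elements for a/(s - p) + conj(a)/(s - conj(p)), following (11.18)."""
    ratio = (a * p.conjugate()).real / a.real
    L = 1.0 / (2.0 * a.real)
    R = L * (ratio - 2.0 * p.real)
    g_over_c = -ratio
    # L*C*|p|^2 = R*G + 1 with G = g_over_c*C, solved for C.
    C = 1.0 / (L * abs(p) ** 2 - R * g_over_c)
    G = g_over_c * C
    return R, L, C, G


def branch_admittance(s, R, L, C, G):
    """Admittance of R and L in series with the parallel pair (C, G)."""
    return 1.0 / (R + s * L + 1.0 / (s * C + G))


# First pole pair of (11.22): residue a and pole p.
a = complex(3.4160e7, 1.1311e6)
p = complex(-5.0661e8, 1.3497e10)
R, L, C, G = foster_complex_pair(a, p)

# Compare the realized branch with the partial-fraction pair at one frequency.
s = 1j * 1.0e10
y_partial_fraction = a / (s - p) + a.conjugate() / (s - p.conjugate())
y_circuit = branch_admittance(s, R, L, C, G)
```

No positivity check is applied to the returned values, which mirrors the relaxed realization: a negative R, L, C, or G is simply passed on to the netlist.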
11.2.2 Multiple-port network realization

For the passive multi-port order-reduced admittance matrix, we propose a general complete-graph structure (in the case of a full admittance matrix) to realize the admittance matrix based on the one-port realization. Note that such a structure does not apply to impedance matrices. Hence, our approach can only be applied to multi-port admittance matrix realization. This is not a severe restriction, however, as most RLC circuits can be written in admittance form by using the modified nodal analysis approach. In the following, we first illustrate how a two-port network is realized and then extend this concept to general n-port network realization. Given a 2 × 2 passive admittance matrix, obtained by the hierarchical model order reduction method,

$$ Y_{2\times 2}(s) = \begin{bmatrix} y_{11}(s) & y_{12}(s) \\ y_{21}(s) & y_{22}(s) \end{bmatrix}, \tag{11.19} $$

it can be realized exactly using the Π-structure template shown in Figure 11.9, where each branch admittance is realized by the one-port Foster’s expansion method shown in Figure 11.8. Based on this template, such a realization can be easily extended to the multi-port case.

[Circuit diagram: Π network with branch admittances y1 = y11 + y12, y2 = −y12, and y3 = y22 + y12.]
Figure 11.9 General two-port realization Π model. Reprinted with permission from [94] (c) 2006 IEEE.
$$ Y_{N\times N}(s) = \begin{bmatrix} y_{11}(s) & \cdots & y_{1N}(s) \\ \vdots & \ddots & \vdots \\ y_{N1}(s) & \cdots & y_{NN}(s) \end{bmatrix}. \tag{11.20} $$
Generally, for a reduced N-port network with a full N × N admittance matrix as shown in (11.20), the realized network is a complete graph in which each branch represents an admittance that is realized by the one-port realization method. For instance, Figure 11.10 shows a realization of a synthesized six-port network. The admittance of the mth port branch (the branch between the port and ground) is the sum of all the admittances in the mth row, and the admittance of the branch between the port and any other port is the negative of the corresponding off-diagonal admittance. Notice that the realized circuits automatically preserve the reciprocity of the original circuit matrix, since this structure requires the admittance matrix to be symmetric.
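The branch rule of the complete-graph structure is easy to check at a single frequency point. The following is a small sketch with hypothetical function names and hypothetical sample values: it derives the branch admittances from a symmetric matrix and then rebuilds the nodal admittance matrix of the resulting graph.

```python
def complete_graph_branches(Y):
    """Branch admittances of the complete-graph realization.

    Y is a symmetric N x N list of complex admittances at one frequency.
    ground[m] is the branch from port m to ground (sum of row m);
    mutual[(m, k)] is the branch between ports m and k (negative of Y[m][k]).
    """
    n = len(Y)
    ground = [sum(Y[m]) for m in range(n)]
    mutual = {(m, k): -Y[m][k] for m in range(n) for k in range(m + 1, n)}
    return ground, mutual


def rebuild_matrix(ground, mutual, n):
    """Nodal admittance matrix of the complete graph built from the branches."""
    Yr = [[0j] * n for _ in range(n)]
    for m in range(n):
        Yr[m][m] += ground[m]
    for (m, k), y in mutual.items():
        Yr[m][m] += y
        Yr[k][k] += y
        Yr[m][k] -= y
        Yr[k][m] -= y
    return Yr


# Hypothetical symmetric three-port admittance matrix at one frequency.
Y3 = [
    [3 + 1j, -1 - 0.5j, -0.5 + 0j],
    [-1 - 0.5j, 4 + 2j, -2 - 1j],
    [-0.5 + 0j, -2 - 1j, 5 + 1j],
]
ground, mutual = complete_graph_branches(Y3)
Y3_rebuilt = rebuild_matrix(ground, mutual, 3)
```

For N = 2 the same function returns the Π-model branches y1 = y11 + y12, y2 = −y12, and y3 = y22 + y12.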
11.3 Multi-port non-reciprocal circuit realization

In this section, we discuss methods to realize non-symmetric circuit matrices, which often arise for active circuits owing to the presence of controlled sources.
Figure 11.10 Six-port realization based on Π-structure. Reprinted with permission from [94] (c) 2006 IEEE.
Specifically, we show how to realize a general non-symmetric (non-reciprocal) n × n admittance matrix as a macromodel in the form of RLC and controlled elements, which is accepted by general SPICE-like simulators in both frequency- and time-domain simulation. Our approach is still based on the relaxed one-port realization approach. For a one-port network, the realization process is the same as that shown in Section 11.2. The difference lies in the multiple-port realization, as shown below.

To realize a general n × n non-reciprocal admittance matrix, we propose a general complete-graph structure (for the full admittance matrix) to realize the admittance matrix based on the one-port realization. In the following, we first illustrate how a two-port network is realized and then extend this algorithm to general n-port network realization. Given a 2 × 2 non-symmetric admittance matrix,

$$ Y_{2\times 2}(s) = \begin{bmatrix} y_{11}(s) & y_{12}(s) \\ y_{21}(s) & y_{22}(s) \end{bmatrix}, \tag{11.21} $$

where y12(s) ≠ y21(s), the admittance matrix can be realized exactly by using the circuit template shown in Figure 11.11, where each branch admittance is realized by the one-port Foster’s expansion method shown in Figure 11.8. Notice that the non-symmetric admittance is realized using two voltage-controlled voltage sources (VCVS), EV1 and EV2, and two current-controlled current sources (CCCS), FI12 and FI21. Both the VCVS and the CCCS have unit transfer gain. For instance, to realize y12(s), which represents the transconductor that leads to the current injected into node 1 due to the voltage at node 2, the VCVS EV2 first copies the voltage at node 2 (V2) to node 3 (V3 = V2). Then V3 drives the admittance y12 to generate the current I12, which in turn drives the CCCS FI12 to inject the same amount of current into node 1. The realization of y21(s) can be explained in a similar way.

[Circuit diagram: nodes 1 to 4; port voltages V1 and V2 across y11 and y22; VCVS EV1 and EV2 driving y21 and y12, with branch currents I21 and I12; CCCS FI12 and FI21.]

Figure 11.11 General two-port non-reciprocal active realization. Reprinted with permission from [73] (c) 2005 IEEE.

For a general n × n non-symmetric admittance matrix, we can realize each pair of ports using the aforementioned two-port realization method until all the pairs of ports are realized. The resulting circuit has a complete-graph structure (for the full admittance matrix), and the non-symmetric property is preserved during the realization process.
11.4 Numerical examples

In this section, we present two examples. The first example is a one-port passive network whose network function Y1-port(s), in partial-fraction form with four system poles, is given below:

$$
\begin{aligned}
Y_{1\text{-port}}(s) ={} & \frac{3.4160\times10^{7} + 1.1311\times10^{6}\,i}{s + 5.0661\times10^{8} - 1.3497\times10^{10}\,i} + \frac{3.4160\times10^{7} - 1.1311\times10^{6}\,i}{s + 5.0661\times10^{8} + 1.3497\times10^{10}\,i} \\
& + \frac{2.1699\times10^{8} + 6.9784\times10^{6}\,i}{s + 1.9829\times10^{9} - 7.905\times10^{10}\,i} + \frac{2.1699\times10^{8} - 6.9784\times10^{6}\,i}{s + 1.9829\times10^{9} + 7.905\times10^{10}\,i}.
\end{aligned}
\tag{11.22}
$$
The frequency-domain responses of this circuit and the realized circuit are shown in Figure 11.12. The two responses are the same. Secondly, we present the realization results for a two-port passive network Y2-port(s), in partial-fraction form with six system poles, as shown below:

$$ Y_{2\text{-port}}(s) = \begin{bmatrix} Y_{11}(s) & Y_{12}(s) \\ Y_{12}(s) & Y_{22}(s) \end{bmatrix}, \tag{11.23} $$
[Plot: magnitude versus frequency, from 0 to 2.5 × 10^10; curves: Original, Realized.]
Figure 11.12 Comparison between the transfer function Y1−port (s) and its circuit realization.
where

$$
\begin{aligned}
Y_{11}(s) ={} & \frac{-1.5280\times10^{7} - 1.4916\times10^{8}\,i}{s + 5.9850\times10^{10} - 2.2440\times10^{10}\,i} + \frac{-1.5280\times10^{7} + 1.4916\times10^{8}\,i}{s + 5.9850\times10^{10} + 2.2440\times10^{10}\,i} \\
& + \frac{2.3594\times10^{7} + 1.7770\times10^{7}\,i}{s + 7.8192\times10^{9} - 3.9525\times10^{10}\,i} + \frac{2.3594\times10^{7} - 1.7770\times10^{7}\,i}{s + 7.8192\times10^{9} + 3.9525\times10^{10}\,i} \\
& + \frac{-1.2283\times10^{7} - 2.3393\times10^{7}\,i}{s + 9.8407\times10^{9} - 8.2723\times10^{9}\,i} + \frac{-1.2283\times10^{7} + 2.3393\times10^{7}\,i}{s + 9.8407\times10^{9} + 8.2723\times10^{9}\,i}, \\[6pt]
Y_{12}(s) = Y_{21}(s) ={} & \frac{-7.3490\times10^{7} - 1.3550\times10^{8}\,i}{s + 5.9850\times10^{10} - 2.2440\times10^{10}\,i} + \frac{-7.3490\times10^{7} + 1.3550\times10^{8}\,i}{s + 5.9850\times10^{10} + 2.2440\times10^{10}\,i} \\
& + \frac{3.8682\times10^{7} + 1.1596\times10^{7}\,i}{s + 7.8192\times10^{9} - 3.9525\times10^{10}\,i} + \frac{3.8682\times10^{7} - 1.1596\times10^{7}\,i}{s + 7.8192\times10^{9} + 3.9525\times10^{10}\,i} \\
& + \frac{-5.9312\times10^{6} + 2.2050\times10^{7}\,i}{s + 9.8407\times10^{9} - 8.2723\times10^{9}\,i} + \frac{-5.9312\times10^{6} - 2.2050\times10^{7}\,i}{s + 9.8407\times10^{9} + 8.2723\times10^{9}\,i}, \\[6pt]
Y_{22}(s) ={} & \frac{-6.5196\times10^{8} - 5.9276\times10^{8}\,i}{s + 5.9850\times10^{10} - 2.2440\times10^{10}\,i} + \frac{-6.5196\times10^{8} + 5.9276\times10^{8}\,i}{s + 5.9850\times10^{10} + 2.2440\times10^{10}\,i} \\
& + \frac{5.5074\times10^{7} - 8.9274\times10^{6}\,i}{s + 7.8192\times10^{9} - 3.9525\times10^{10}\,i} + \frac{5.5074\times10^{7} + 8.9274\times10^{6}\,i}{s + 7.8192\times10^{9} + 3.9525\times10^{10}\,i} \\
& + \frac{8.2948\times10^{6} - 1.5720\times10^{7}\,i}{s + 9.8407\times10^{9} - 8.2723\times10^{9}\,i} + \frac{8.2948\times10^{6} + 1.5720\times10^{7}\,i}{s + 9.8407\times10^{9} + 8.2723\times10^{9}\,i}.
\end{aligned}
\tag{11.24}
$$
The frequency responses for Y12(s) and Y22(s) are shown in Figure 11.13 and Figure 11.14. Again, the realized circuit reproduces the responses of the original circuit exactly.

[Plot: magnitude (× 10^−3) versus frequency, from 0 to 2.5 × 10^10; curves: Original, Realized.]
Figure 11.13 Comparison between the transfer function Y12 (s) and its circuit realization.
[Plot: magnitude (× 10^−3) versus frequency, from 0 to 2.5 × 10^10; curves: Original, Realized.]
Figure 11.14 Comparison between the transfer function Y22 (s) and its circuit realization.
11.5 Summary

In this chapter, we have presented two network realization approaches for generating SPICE-compatible circuit netlists from reduced admittance or impedance matrices. The first method, Brune’s method, is based on the classic network synthesis technique, which guarantees that the realized circuits are physically implementable. But we showed that such physical implementability is not necessary from the circuit simulation perspective. Based on this observation, we presented a simpler yet efficient multi-port network realization approach, which is based on the relaxed one-port Foster network synthesis method. The new approach allows non-physical RLC elements; passivity, however, is ensured at the admittance or impedance matrix level, so the realized circuits are still passive. We also extended this method to realize non-symmetric circuit matrices, which typically come from active linear or linearized circuits.
12 Reduction for multi-terminal interconnect circuits
In this chapter, we study model order reduction for interconnect circuits with many terminals or ports. We show that projection-based model order reduction techniques are not very efficient for those circuits. We then present an efficient reduction method that combines projection-based MOR with a frequency-domain fitting method to produce reduced models for interconnect circuits with many terminals.
12.1 Introduction

Krylov subspace projection methods have been widely used for model order reduction, owing to their efficiency and simplicity of implementation [32, 37, 85, 91, 113]. Chapter 2 has a detailed review of those methods. One problem with the existing projection-based model order reduction techniques is that they are not efficient at reducing circuits with many ports. This is reflected in several aspects of the existing Krylov subspace algorithms like PRIMA [85]. First, the time complexity of PRIMA is proportional to the number of ports of the circuit, as the moments excited by every port need to be computed and matrix-valued transfer functions are generated. Second, the number of poles of the reduced model increases linearly with the number of ports, and this makes the reduced models much larger than necessary.

The fundamental reason is that all the Krylov-based projection methods work directly on the moments, which contain the information of both the poles and the residues of the corresponding transfer functions. To deal with more ports, we have more transfer functions and thus more poles and residues to compute. However, the poles of the different transfer functions of the same circuit are the same, as poles are characteristics of the system. Projection-based methods cannot take advantage of this, as they operate directly on moments. As more residues are computed for more transfer functions, more poles are also generated. However, generating more poles does not always help to improve the accuracy of the reduced models, as more block moments are not always matched as the number of poles increases. As a result, projection-based methods lead to larger reduced models than necessary when the number of ports is large.

One way to resolve this problem is by means of port reduction. Recent work by Feldmann et al. [31] exploited the port dependence to reduce the number of ports under some error metric constraints. This work, however, is limited to circuits with many similar ports, and it cannot be applied to general linear circuits with many independent ports. Another approach, which is also amenable to circuits with multiple ports, is the hierarchical model order reduction method [93, 121]. But hierarchical model order reduction is also numerically unstable, except for reducing tree circuits [121]. The improved version can produce more accurate models at the expense of multi-frequency point expansions [93].

In this chapter, we present an efficient model order reduction method that overcomes the difficulty associated with subspace projection-based MOR methods for reducing circuits with many ports. The basic idea of the new method is to compute the poles and the residues of the transfer functions in the reduced admittance matrices separately. This is achieved by first applying traditional subspace projection methods to compute the poles and then using hierarchical symbolic analysis to compute the frequency responses of the admittances, from which the residues of the transfer functions are determined. Since traditional projection-based MOR is used only for computing the poles, we only need to compute the poles necessary for the accuracy requirements. Finally, to ensure the passivity of the reduced model, a convex-programming-based optimization is applied. The numerical results show that, compared with subspace projection-based methods, the new method leads to much smaller reduced models for a given frequency range, or much higher accuracy for the same model size, although at a higher computational cost.
12.2 Problems of subspace projection-based MOR methods

In this section, we briefly review the Krylov subspace-based projection methods and point out their weakness in reducing circuits with multiple ports. Detailed discussions of Krylov subspace methods can be found in Chapter 2. We look at the most representative Krylov subspace method, PRIMA [85]. Without loss of generality, a linear m-port electrical circuit can be expressed as

$$ C\dot{x}_n = -G x_n + B u_m, \qquad i_m = L^T x_n, \tag{12.1} $$

where x is the vector of state variables, n is the number of state variables, m is the number of independent sources specified as ports, C and G are the matrices for the storage elements and the conductances, respectively, and B and L indicate the input and output ports; typically B = L for interconnect circuits, and both of them are n × N matrices, where N is the number of terminals (as input and output ports). Define A = −G^{−1}C, A ∈ [...]

(2) $\overline{Y(s)} = Y(\bar{s})$, for Re(s) > 0,
(3) $Y(s) + Y(s)^H \ge 0$, for Re(s) > 0,

with 0 ≤ Im(s) ≤ 2πfmax. In other words, Y(s) will be positive real for the given frequency range [0, fmax]. The main benefit of a reduced system being conditionally passive is that conditional passivity can be much easier to achieve than strict passivity. Many existing frequency-domain rational fitting methods [50, 78] can be used to do this with much more scalable computational costs than the convex programming method [21]. On the other hand, we put more constraints on the signals driving the conditionally passive systems: we need to make sure that the signal spectrum is band limited such that its bandwidth is within the positive-real bandwidth of the reduced system. In the following section, we present two methods to achieve this requirement.
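A band-limited positive-real test of this kind can be sampled numerically. The sketch below is an illustration under stated assumptions, not the procedure used by the fitting methods cited above: it only samples the jω axis up to fmax (it does not check the whole half plane), it handles 2 × 2 matrices through a closed-form eigenvalue, and the model `Y`, the scalar factor `y_scalar`, and the function names are hypothetical.

```python
import math


def y_scalar(s):
    """Hypothetical scalar factor: real part positive at low frequency only."""
    return 1 / (1 + s) - 0.1 * s / (s + 10)


def Y(s):
    """Hypothetical symmetric two-port admittance matrix."""
    y = y_scalar(s)
    return [[2 * y, -y], [-y, 2 * y]]


def min_eig_hermitian_part(Ys):
    """Smallest eigenvalue of Y + Y^H for a 2 x 2 complex matrix."""
    a = 2 * Ys[0][0].real
    d = 2 * Ys[1][1].real
    b = Ys[0][1] + Ys[1][0].conjugate()
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)


def is_conditionally_passive(Y, f_max, n_samples=2001):
    """Check Y(jw) + Y(jw)^H >= 0 on a uniform grid of 0 <= w <= 2*pi*f_max."""
    for i in range(n_samples):
        w = 2 * math.pi * f_max * i / (n_samples - 1)
        if min_eig_hermitian_part(Y(1j * w)) < 0:
            return False
    return True


in_band = is_conditionally_passive(Y, 5 / (2 * math.pi))    # w up to 5 rad/s
too_wide = is_conditionally_passive(Y, 10 / (2 * math.pi))  # w up to 10 rad/s
```

For this hypothetical model the real part of `y_scalar` changes sign near ω ≈ 6 rad/s, so the test passes for a band ending at 5 rad/s and fails for a band ending at 10 rad/s. A grid test of this kind can miss narrow violations between sample points.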
13.4 Passivity enforcement by waveform shaping

In this section, we discuss two methods to band limit a signal by slightly shaping its waveform. Note that, based on the Fourier transform, if a signal is finite in the time domain, its spectrum extends to infinite frequency, and if its bandwidth is finite, its duration is infinite in the time domain. For a practical non-periodic time-limited signal, such as a switching current in a signal line caused by transistor switching, one can never band limit such a signal from a strictly mathematical point of view. But practically we can make the out-of-band frequency energy sufficiently small compared with the in-band frequency energy, such that the out-of-band energy will not stimulate the non-passive behavior of the system.
13.4.1 FFT and IFFT based waveform shaping

The first method is based on the fast Fourier transform (FFT) and the inverse fast Fourier transform (IFFT). The idea is to first transform the original transient signal into the frequency domain. Since in the FFT (or the discrete Fourier transform, DFT) we treat the non-periodic signal as a periodic signal, the resulting spectrum is discrete. Then we truncate the frequencies beyond fmax, which is given by the designers. After this, we perform the inverse FFT on the truncated spectrum to get the time-domain waveform of the shaped signal (we only take the waveform in one period). The whole process is illustrated in Figure 13.5 and the algorithm is outlined in Figure 13.6.

[Flow diagram: original waveform → fast Fourier transform → original spectrum → spectrum truncation at Fmax → shaped spectrum → inverse fast Fourier transform → shaped waveform.]
Figure 13.5 Algorithm flow of FFT-IFFT-based waveform shaping.
Figure 13.7(a) and (b) show a ramp signal and its spectrum. The shaped waveform with the cut-off frequency fmax = 10 GHz and the corresponding truncated spectrum are shown in Figure 13.7(c) and (d). The shaped waveform with the cut-off frequency fmax = 2 GHz and the corresponding truncated spectrum are shown in Figure 13.7(e) and (f). In general, the spectrum truncation does not significantly change waveform characteristics such as delay and slew. As we truncate high-frequency components, the shaped waveform shows some undershoots and overshoots in Figure 13.7(e). Those small undershoots and overshoots do not affect the delay and timing of the shaped waveform when it propagates through the reduced model. If we truncate the spectrum at a higher frequency such as 10 GHz, the resulting waveform is almost the same as the original one, as shown in Figure 13.7(c). This demonstrates that if the cut-off frequency is high enough, the distortion caused by truncation can be tolerated. The drawback of the explicit waveform shaping method using FFT and IFFT is the extra computational cost of processing the signals, which is O(n log2(n)), where n is the number of sampling points.

FFT IFFT WaveformShaping
  Sample input data with Fs;
  Fast Fourier transform
    $X_0(k) = \sum_{j=1}^{N} x_0(j)\, w_N^{(j-1)(k-1)}$;
  Spectrum truncation
    If fk < fmax or fk > Fs − fmax, X1(k) = X0(k);
    If fmax < fk < Fs − fmax, X1(k) = 0;
  Inverse fast Fourier transform
    $x_1(j) = (1/N) \sum_{k=1}^{N} X_1(k)\, w_N^{-(j-1)(k-1)}$;
  return vector: x1 of length N;

Figure 13.6 Algorithm of FFT-IFFT-based waveform shaping.
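The truncation step of Figure 13.6 can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it uses a naive O(N²) DFT from the standard library in place of an FFT, it keeps the bins that fall exactly on the cut-off (the figure leaves that case open), and the function names and the sample ramp are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
                  for j in range(n))
        out.append(acc / n if inverse else acc)
    return out


def shape_waveform(x0, fs, f_max):
    """Zero all spectral bins between f_max and fs - f_max, then invert."""
    n = len(x0)
    X0 = dft(x0)
    # Work with bin indices so the kept set is symmetric about fs/2.
    k_max = int(math.floor(f_max * n / fs))
    X1 = [X0[k] if (k <= k_max or k >= n - k_max) else 0 for k in range(n)]
    x1 = dft(X1, inverse=True)
    # The kept spectrum is conjugate-symmetric, so the result is real.
    return [v.real for v in x1]


# Hypothetical ramp: 64 samples at fs = 64, rising from 0 to 1.
x0 = [min(1.0, max(0.0, (j - 16) / 16)) for j in range(64)]
x_shaped = shape_waveform(x0, fs=64.0, f_max=4.0)
```

Because the DC bin is always kept, the mean of the waveform is preserved, and the shaped signal shows small ripples around the ramp corners where the truncated components used to contribute.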
Figure 13.7 Ramp signal shaped at different frequencies. (Panels: (a) original ramp and (b) original spectrum; (c) shaped ramp and (d) shaped spectrum for a 10 GHz cut-off; (e) shaped ramp and (f) shaped spectrum for a 2 GHz cut-off. Waveform panels plot voltage (V) against time (s); spectrum panels plot magnitude against frequency (Hz).)

13.4.2 Low-pass-filter-based waveform shaping

The second method is based on implicit waveform shaping: passive low-pass filters are added between the input signal and the reduced system, as shown in Figure 13.8. In this way, we guarantee that the signals passing through the reduced system are band limited. Notice that if we have only a few input terminals (as in many interconnect circuits such as clock trees or clock meshes), adding a few filters at those terminals will not increase the sizes of the reduced models significantly. Since the filter can be passively realized by an LC ladder, it can be combined with the reduced model to function as a passive model. Therefore, we can conveniently use this new model in current simulation software such as SPICE. However, we need to look at several issues associated with this method before we use it. First, the low-pass filter may also distort the input signals, as different frequency components may be delayed differently. Second, the introduction of a low-pass filter will introduce delay. In the following, we discuss methods to mitigate those two problems.
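The two issues just listed, band limiting at the price of added delay, can be seen even with the simplest possible filter. The sketch below is an assumption-laden stand-in: it uses a single-pole RC low-pass discretized with backward Euler, not the LC-ladder Bessel filter discussed in this section, and the names `rc_lowpass` and `first_crossing`, the 1 ps step, and the 10 GHz cut-off are hypothetical.

```python
import math


def rc_lowpass(samples, dt, f_cut):
    """Backward-Euler model of dy/dt = (u - y)/tau, tau = 1/(2*pi*f_cut)."""
    tau = 1.0 / (2.0 * math.pi * f_cut)
    a = dt / tau
    y = 0.0
    out = []
    for u in samples:
        y = (y + a * u) / (1.0 + a)  # implicit update, stable for any dt
        out.append(y)
    return out


def first_crossing(samples, dt, level):
    """Time of the first sample that reaches the given level, or None."""
    for i, v in enumerate(samples):
        if v >= level:
            return i * dt
    return None


DT = 1e-12                     # assumed 1 ps time step
step = [1.0] * 2000            # ideal 1 V step applied at t = 0
shaped = rc_lowpass(step, DT, 10e9)
delay_50 = first_crossing(shaped, DT, 0.5)
print("50% delay added by the filter (s):", delay_50)
```

The ideal step reaches 50% immediately, whereas the filtered edge needs roughly tau*ln(2), about 11 ps for this cut-off. Raising the cut-off frequency shrinks that delay, which is the trade-off examined in the rest of this section.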
Mitigation of distortion problems

The phase function and the resulting group-delay function of a filter have profound time-domain ramifications, as they directly affect the waveform shape of the output signals. We therefore choose the Bessel filter family for its good time-domain properties. A Bessel filter has an approximately linear phase characteristic over its pass-band, which implies an approximately constant time delay over the pass-band (see Figure 13.9), so that phase distortion in the filtering process can be largely avoided.
Figure 13.8 Low-pass-filter-based waveform shaping. (Block diagram labels: Input, Passive low-pass filters, Reduced Model, Output, Conditional passive model.)
From Figure 13.9(a), we can see a constant time delay from DC to the normalized frequency of one when the order (n) of the filter is higher than three. In addition, the step response of a Bessel filter exhibits negligible overshoot and ringing.

Figure 13.9 Group-delay characteristic and magnitude response for different order Bessel filters (normalized frequency). (Panels: (a) group-delay characteristic, group delay (s) against normalized frequency; (b) magnitude response, magnitude (dB) against normalized frequency; curves for n = 1 to 10.)
However, a gradual roll-off (a longer decay range) is the price we pay for good time-domain properties. Fortunately, we can compensate for it by increasing the order of the filter (see Figure 13.9(b)), at the cost of larger reduced models. Another way is to increase the frequency range over which the model is passive, so that the filter has attenuated the spectrum sufficiently within that range (again at the cost of larger reduced models).
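The flat group delay and the order-dependent roll-off can be checked numerically from the standard all-pole Bessel transfer function H(s) = theta_n(0)/theta_n(s), where theta_n is the reverse Bessel polynomial. The sketch below is an illustration only: the helper names are hypothetical, and this form is normalized to unit delay at DC, whereas Figure 13.9 uses a normalized frequency axis, so the absolute numbers are not expected to match the figure.

```python
import math


def bessel_coeffs(n):
    """Coefficients a_0..a_n of the reverse Bessel polynomial theta_n(s)."""
    return [
        math.factorial(2 * n - k)
        // (2 ** (n - k) * math.factorial(k) * math.factorial(n - k))
        for k in range(n + 1)
    ]


def polyval(coeffs, s):
    """Evaluate sum(coeffs[k] * s**k) with Horner's rule."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * s + c
    return acc


def group_delay(n, w):
    """Group delay -d(arg H)/dw = Re(theta'(jw) / theta(jw)), in seconds."""
    a = bessel_coeffs(n)
    da = [k * a[k] for k in range(1, n + 1)]  # coefficients of theta'
    s = 1j * w
    return (polyval(da, s) / polyval(a, s)).real


def magnitude_db(n, w):
    """|H(jw)| in dB for H(s) = theta_n(0) / theta_n(s)."""
    a = bessel_coeffs(n)
    return 20.0 * math.log10(a[0] / abs(polyval(a, 1j * w)))


for order in (1, 4, 8):
    print(order,
          [round(group_delay(order, w), 4) for w in (0.0, 0.5, 1.0)],
          round(magnitude_db(order, 10.0), 1))
```

For the first-order filter the delay already halves at w = 1 rad/s, while higher orders hold it close to 1 s over a wider band and attenuate more strongly far above the pass-band, which matches the qualitative behavior described above.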
Mitigation of delay problems

Another issue we have to take into consideration is the time delay caused by the filter. Three factors can influence the time delay: the prototype, the order, and the cut-off frequency of a filter. Among them, the cut-off frequency is the dominant factor because the delay is inversely proportional to the cut-off frequency of a filter:

\[
\text{Actual delay} = \frac{\text{Normalized delay}}{\text{Actual corner frequency } (f_c)}. \tag{13.1}
\]
For example, for the eighth-order Bessel filter, the normalized delay is 2.703 s. If the cut-off frequency is as high as 20 GHz, the actual delay is as small as 2.703/(20 × 10^9) s ≈ 0.135 ns. Hence, if the cut-off frequency is sufficiently high, the group delay caused by the filter is small enough, compared with the delay of the original circuit, that it can be ignored.
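The arithmetic of equation (13.1) is easy to reproduce for other cut-off frequencies. In the sketch below, the function name `actual_delay` and the list of trial frequencies are assumptions; the normalized delay of 2.703 s is the value quoted in the text.

```python
def actual_delay(normalized_delay_s, corner_frequency_hz):
    """Equation (13.1): scale the normalized delay by the corner frequency."""
    return normalized_delay_s / corner_frequency_hz


NORMALIZED_DELAY = 2.703  # seconds, value quoted in the text

for f_c in (2e9, 10e9, 20e9):  # hypothetical trial cut-off frequencies
    delay = actual_delay(NORMALIZED_DELAY, f_c)
    print(f"f_c = {f_c / 1e9:.0f} GHz -> delay = {delay * 1e9:.3f} ns")
```

Because the delay scales as 1/f_c, moving the cut-off from 2 GHz to 20 GHz cuts the added delay by a factor of ten.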
13.5 Numerical examples

In this section, we present some experimental results on two interconnect circuits from industry designs (from Cadence Design Systems Inc.). Conditional passivity is achieved by using the minimum-square fitting method on the required transfer functions, with poles computed from projection-based methods such as PRIMA. This fitting method can make the reduced models accurate up to the given maximum frequency and ensure the passivity of the models in the given frequency range.

The first example is an RC circuit with 210 nodes and 3 terminals. In this experiment, we used a steep square waveform as the input signal, as shown in Figure 13.10(a). We applied this signal to the original model, the reduced model, and the LPF (low-pass filter) based reduced model. The output waveforms of these three models are shown in Figure 13.10(b), (c), and (d), respectively. The reduced model is conditionally passive: its passivity is preserved only in the frequency range from DC to 15 GHz. Since a steep square waveform contains high-frequency components beyond this range, we observe erratic time-domain behavior caused by energy generated at high frequencies, as shown in Figure 13.10(c). However, once the LPF eliminates those high-frequency components, the output waveform of the LPF-based reduced model (Figure 13.10(d)) matches the output waveform of the original model (Figure 13.10(b)) with little discrepancy. Therefore, the LPF-based reduced model can function as a passive model at all frequencies.

In addition, we compare the qualities of a Bessel-LPF-based reduced model and an elliptic-LPF-based reduced model in Figure 13.11. Figure 13.11(a) and (b) show the transient responses of the original circuit to the square input waveform. Figure 13.11(c) and (d) show the transient responses of the Bessel-LPF-based reduced model, while Figure 13.11(e) and (f) show the transient responses of the elliptic-LPF-based reduced model using the same filter order. The results clearly show that the Bessel-LPF-based reduced model is superior to the elliptic-LPF-based model. As shown in Figure 13.11(d) and (f), the Bessel-LPF-based reduced model effectively avoids overshoots and ringing. This result further supports the choice of a Bessel LPF over other types of LPF.

The second example is an RC circuit (168 nodes) with 132 terminals (14 drivers and 118 receivers). This circuit does not need much reduction, because its terminal count is large compared with its node count. It serves, however, as an example for which the convex programming method fails to optimize because of the large terminal count. Still, we use the fitting method to perform the reduction in the frequency domain and make the reduced models accurate to 50 GHz. We use a steep square waveform, as shown in Figure 13.12(a), as the input. We apply this signal to the original model, the reduced model, and the LPF-based reduced model. The output waveforms of these three models are shown in Figure 13.12(b), (c), and (d), respectively. The results are similar: once the LPF eliminates the high-frequency components, the output waveforms of the LPF-based reduced model (Figure 13.12(d)) match those of the original model (Figure 13.12(b)) well, whereas the simple reduced models lead to erratic time-domain behavior owing to their non-passivity at high frequencies, as shown in Figure 13.12(c).

The experimental results also show that the output of the LPF-based reduced model exhibits less ringing than the output of the original model. This is because the ringing is caused by high-frequency components of the input signal, and many of those components are eliminated by the LPF in the reduced model. If the ringing is of interest, we can observe more of it by increasing the frequency range of the reduced models.
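The notion of a model that is passive only from DC up to some frequency can be made concrete with a toy one-port. The sketch below is purely hypothetical and is not the fitting method used in these experiments: it assumes a pole-residue admittance Y(s) = d + sum of r_i/(s - p_i), uses the fact that a passive one-port must satisfy Re Y(jw) >= 0, and scans for the first angular frequency at which that condition fails. The names `admittance` and `passive_band_edge` and the numerical values are made up for illustration.

```python
def admittance(s, poles, residues, d):
    """Pole-residue one-port model: Y(s) = d + sum(r_i / (s - p_i))."""
    return d + sum(r / (s - p) for p, r in zip(poles, residues))


def passive_band_edge(poles, residues, d, w_max, steps=100000):
    """First angular frequency in [0, w_max] where Re Y(jw) < 0, else None."""
    for i in range(steps + 1):
        w = w_max * i / steps
        if admittance(1j * w, poles, residues, d).real < 0.0:
            return w
    return None


# Hypothetical model: Re Y(jw) = 1/(1 + w^2) - 0.01, which turns negative
# just above w = sqrt(99), so the model is only conditionally passive.
edge = passive_band_edge(poles=[-1.0], residues=[1.0], d=-0.01, w_max=20.0)
print("Re Y(jw) first goes negative near w =", edge, "rad/s")

# With d = 0 the same model keeps Re Y(jw) > 0 over the whole scanned range.
print(passive_band_edge(poles=[-1.0], residues=[1.0], d=0.0, w_max=20.0))
```

Input spectra confined below the returned frequency never excite the band in which this toy model can generate energy, which is the effect the low-pass filter provides in the examples above. A multi-port model would need a test on the Hermitian part of the admittance matrix instead of a scalar real part.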
13.6 Summary

In this chapter, we presented a new passivity enforcement scheme for the passive modeling of general linear time-invariant systems. Instead of making the model passive at all frequencies, as traditional methods do, the new method works on the signal waveforms, assuming that the circuit models are passive only over a limited frequency band (called conditionally passive). This relaxation makes passive circuit modeling with fitting-based methods much easier for a reduced system, especially for systems with many terminals or those requiring wideband accuracy from measured data. By shaping the signal waveforms so that their spectra are band limited, the resulting systems remain passive from the simulation's perspective. We presented two waveform shaping methods for transient waveforms: frequency truncation by means of FFT/IFFT, and low-pass-filter-based shaping. We analyzed the delay and distortion effects caused by low-pass filters and presented methods to mitigate them. Numerical results on several interconnect circuits demonstrated the effectiveness of the method.

Figure 13.10 Comparison of responses of different models in time domain for the first example. (Panels: (a) input square waveform, voltage (V) against time (s); (b) output waveform of the original model; (c) output waveform of the reduced model; (d) output waveform of the reduced model with LPF. Panels (b) to (d) plot current (A) against time (s).)
Figure 13.11 Comparison in time domain between reduced models based on Bessel filters and elliptic filters. (Panels: (a), (b) original model; (c), (d) Bessel LPF based model; (e), (f) elliptic LPF based model. Each panel plots current (A) against time (s).)
Figure 13.12 Comparison of responses of different models in time domain for the second example. (Panels: (a) input square waveform, voltage (V) against time (s); (b) output waveform of the original model; (c) output waveform of the reduced model; (d) output waveform of the reduced model with LPF. Panels (b) to (d) plot current (A) against time (s).)
References
1
2 3 4 5 6 7 8 9 10
11 12
13 14 15 16
R. Achar, P. K. Gunupudi, M. Nakhla, and E. Chiprout, “Passive interconnect reduction algorithm for distributed/measured networks,” IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, 47, (4), 287–301, April 2000. D. Allstot, K. Choi, and J. Park, Parasitic-Aware Optimization of CMOS RF circuits. Kluwer Academic Publishers, 2003. C. S. Amin, M. H. Chowdhury, and Y. I. Ismail, “Realizable RLCK circuit crunching,” in Proc. Design Automation Conf. (DAC), 2003, 226–231. B. D. O. Anderson, “A system theory criterion for positive real matrices,” SIAM J. Contr., 5, 171–182, 1967. B. D. O. Anderson and S. Vongpanitlerd, Network Analysis and Synthesis. Englewood Cliffs, NJ: Prentice-Hall, 1973. A. Antoulas and D. C. Sorensen, “Approximation of large-scale dynamical system: An overview,” Int. J. Appl. Math. Comput. Sci., 11, (5), 1093–1121, 2001. W. F. Arnold and A. J. Laub, “Generalized eigenproblem algorithms and software for algebraic Riccati equation,” Proc. IEEE, 72, 1764–1754, 1984. W. E. Arnoldi, “The principle of minimized iteration in the solution of the matrix eigenvalue problem,” Quat. Appl. Math., 9, 17–29, 1951. M. Beattie and L. Pileggi, “Efficient inductance extraction via windowing,” in Proc. European Design and Test Conf. (DATE), 2001, 430–436. P. Benner, E. S. Quintana-Orti, and G. Quintana-Orti, “State-space truncation methods for parallel model reduction of large-scale systems,” Parallel Comput., 29, (11-12), 1701–1722, 2003. S. Boyd, L. Vandenberghe, A. E. Gamal, and S. Yun, “Design of robust global power and ground networks,” in Proc. ACM Int. Symp. on Physical Design (ISPD), 2001. O. Brune, “Synthesis of a finite two-terminal network whose driving point impedance is a prescribed function of frequency,” Journal of Math. and Phys., 10, 191–236, 1931. R. Brune and R. Duffin, “Impedance synthesis without transformers,” J. of Appl. Phys., 20, 816, 1949. M. Celik, L. Pileggi, and A. Odabasioglu, IC interconnect analysis. 
Kluwer Academic Publishers, 2002. C. Chen, Linear System Theory and Design, 3rd ed. Oxford University Press, 1999. T. Chen, C. Luk, and C. Chen, “Inductwise: Inductance-wise interconnect simulator 229
230
17
18 19
20 21
22
23 24 25
26 27
28 29
30 31
32
References
and extractor,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 22, (7), 884–894, 2003. T. Chen, C. Luk, H. Kim, and C. Chen, “INDUCTWISE: Inductance-wise interconnect simulator and extractor,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2002. E. Chiprout, “Fast flip-chip power grid analysis via locality and grid shells,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov. 2004, 485–488. E. Chiprout and M. S. Nakhla, “Analysis of interconnect networks using complex frequency hopping,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 14, (2), 186–200, Feb. 1995. E. Chiprout and N. S. Nakhla, Asymptotic Waveform Evaluation. Boston: Kluwer Academic Publishers, 1994. C. P. Coelho, J. Phillips, and L. M. Silveira, “A convex programming approach for generating guaranteed passive approximations to tabulated frequency-data,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23, (2), 293–301, Feb. 2004. C. P. Coelho, J. R. Phillips, and L. M. Silveira, “A convex programming approach to positive real rational approximation,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov. 2001, 245–251. J. Cong, L. He, C.-K. Koh, and P. Padden, “Performance optimization of VLSI interconnect layout,” Integration, the VLSI Journal, 21, (1&2), 1–94, Nov 1996. W. J. Dally and J. W. Poulton, Digital Systems Engineering. Cambridge University Press, 1998. A. Devgan, H. Ji, and W. Dai, “How to efficiently capture on-chip inductance effects: introducing a new circuit element K,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2000, 150–155. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. John Wiley and Sons Inc., 2001. P. Elias and N. van der Meijs, “Including higher-order moments of RC interconnections in layout-to-circuit extraction,” in Proc. European Design and Test Conf. (DATE), 1996, 362–366. W. C. 
Elmore, “The transient analysis of damped linear networks with particular regard to wideband amplifiers,” J. Appl. Phys., 1948. D. F. Enns, “Model reduction with balanced realizations: an error bound and frequency weighted generalization,” in Proc. 23th IEEE Conf. Decision and Control, 1984, 127–132. K. K. et al., “Design of mixed-signal systems-on-a-chip,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1561–1571, 2000. P. Feldmann, “Model order reduction techniques for linear systems with large numbers of terminals,” in Proc. European Design and Test Conf. (DATE), 2004, 944– 947. P. Feldmann and R. W. Freund, “Efficient linear circuit analysis by Pade approximation via the Lanczos process,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 14, (5), 639–649, May 1995.
References
33 34
35
36
37 38
39
40 41 42 43 44 45 46
47 48
49
50
231
——, “Reduced-order modeling of large linear subcircuits via block lanczos algorithm,” in Proc. Design Automation Conf. (DAC), 1995, 376–380. P. Feldmann and F. Liu, “Sparse and efficient reduced order modeling of linear subcircuits with large number of terminals,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2004, 88–92. R. Freund and P. Feldmann, “Reduced-order modeling of large linear passive multiterminal circuits using matrix-Pade approximation,” in Proc. European Design and Test Conf. (DATE), 1998, 530–537. R. W. Freund, “Passive reduced-order modeling of interconnects simulation and their computation by Krylov-subspace algorithm,” in Proc. Design Automation Conf. (DAC), 1999, 195–200. ——, “SPRIM: structure-preserving reduced-order interconnect macromodeling,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2004, 80–87. R. W. Freund and P. Feldmann, “Reduced-order modeling of large linear subcircuits by means of the SyPVL algorithm,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1996, 280–287. ——, “Reduced-order modeling of large linear subcircuits by means of the sypvl algorithm,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1996, 280– 287. F. R. Gantmacher, The Theory of Matrices, vol. 1. New York : Chelsea Publishing Co., 1977. G. Gielen and R. Rutenbar, “Computer-aided design of analog and mixed-signal integrated circuits,” Proc. of IEEE, 88, (12), 703–717, Dec. 2000. G. Gielen and W. Sansen, Symbolic Analysis for Automated Design of Analog Integrated Circuits. Kluwer Academic Publishers, 1991. K. Glover, “All optimal Hankel-norm approximations of linear multivariable systems and their l∞ -error bounds,” Int. J. Contr., 39, 1115–1193, 1984. G. H. Golub and C. V. Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996. E. J. Grimme, Krylov projection methods for model reduction (Ph.D. Thesis). University of Illinois at Urbana-Champaign, 1997. S. Grivet-Talocia, I. A. Maio, and F. 
Canavero, “Recent advances in reduced-order modeling of complex interconnects,” in Proc. Dig. Electrical Performance Electronic Packaging, Oct. 2001, 243–246, vol. 10. G. Guardabassi and A. Sangiovanni-Vincentelli, “A two level algorithm for tearing,” IEEE Trans. on Circuits and Systems, 783–791, 1976. R. Gupta, B. Tutuianu, and L. Pileggi, “The Elmore delay as a bound for RC trees with generalized input signals,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, (1), 95–104, Jan. 1997. B. Gustavsen and A. Semlyen, “Rational approximation of frequency domain responses by vector fitting,” IEEE Trans. on Power Delivery, 14, (3), 1052–1061, March 1999. ——, “Enforcing passivity for admittance matrices approximated by rational functions,” IEEE Trans. on Power Systems, 16, (1), 97–104, Feb. 2001.
232
51
52 53
54 55 56
57 58
59
60
61
62 63
64 65
66
67
References
M. M. Hassoun and P. M. Lin, “A hierarchical network approach to symbolic analysis of large scale networks,” IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, 42, (4), 201–211, April 1995. Z. He, M. Celik, and L. Pillegi, “SPIE: Sparse partial inductance extraction,” in Proc. Design Automation Conf. (DAC), 1997, 137–140. X. Huang, V. Rahjavan, and R. A. Rohrer, “Awesim: a program for efficient analysis of linear(ized) circuits,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1990. Y. Ismail, “Efficient model order reduction via multi-node moment matching,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov. 2002, 767–774. Y. Ismail and E. G. Friedman, On-Chip Inductance in High Speed Integrated Circuits. Kluwer Academic Publishers, 2002. ——, “DTT: direct truncation of the transfer function – an alternative to moment matching for tree structured interconnect,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 21, (2), 131–144, Feb. 2003. J. D. Jackson, Classical Electrodynamics. John Wiley and Sons, 1975. H. Ji, A. Devgan, and W. Dai, “Ksim: A stable and efficient RKC simulator for capturing on-chip inductance effect,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), 2001, 379–384. M. Kamon, M. Tsuk, and J. White, “FastHenry: a multipole-accelerated 3D inductance extraction program,” IEEE Trans. on Microwave Theory and Techniques, 1750–1758, Sept. 1994. M. Kamon, F. Wang, and J. White, “Generating nearly optimally compact models from Krylov-subspace based reduced-order models,” IEEE Trans. on ComputerAided Design of Integrated Circuits and Systems, 47, (4), 239–248, 2000. S. Kapur and D. Long, “IES3: A fast integral equation solver for efficient 3dimentsional extraction,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1997. C. Kashyap and B. Krauter, “A realizable driving point model for on-chip interconnect with inductance,” in Proc. Design Automation Conf. (DAC), 2000. K. J. 
Kerns and A. T. Yang, “Stable and efficient reduction of large, multiport RC network by pole analysis via congruence transformations,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 16, (7), 734–744, July 1998. Y. Kim and D. Sylvester, “Indigo: http://vlsida.eecs.umich.edu/,” EECS Dept. at University Of Michigan. B. Krauter and L. Pileggi, “Generating sparse partial inductance matrices with guaranteed stability,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1995, 45–52. K. S. Kundert, ”A formal top-down design process for mixed-signal circuits,” in Analog Circuit Design, R. J. van de Plassche, J. Huigsing, and W. Sansen, Eds. Kluwer Academic Publishers, 2000. C. Lanczos, “An iteration method for the solution of the eigenvalue problem of linear differential and integral operators,” J. Res. Bur. Standards, 45, 255–282,
References
68
69 70 71
72 73
74
75
76
77 78
79
80
81
82
233
1950. A. J. Laub, M. T. Heath, C. C. Paige, and R. C. Ward, “Computation of system balancing transformations and other applications of simultaneous diagonalization algorithms,” IEEE Trans. Automat. Contr., 32, 115–122, 1987. S. Lee and M. H. Hayes, “Properties of the singular value decomposition for efficient data clustering,” IEEE Signal Process Letters, 11, (11), 862–866, Nov. 2004. T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd ed. Cambridge University Press, 2004. Y. Lee, Y. Cao, T. Chen, J. Wang, and C. Chen, “HiPRIME: Hierarchical and passivity preserved interconnect macromodeling engine for RLKC power delivery,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 26, (6), 797–806, 2005. J. Lillis, C. Cheng, S. Lin, and N. Chang, Interconnect analysis and synthesis. John Wiley, 1999. P. Liu, Z. Qi, A. Aviles, and S. X.-D. Tan, “A general method for multi-port active network reduction and realization,” in Proc. IEEE International Workshop on Behavioral Modeling and Simulation (BMAS), 2005, 7–12. P. Liu, Z. Qi, H. Li, L. Jin, W. Wu, S. X.-D. Tan, and J. Yang, “Fast thermal simulation for architecture level dynamic thermal management,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2005, 639–644. P. Liu, S. X.-D. Tan, H. Li, Z. Qi, J. Kong, B. McGaughy, and L. He, “An efficient method for terminal reduction of interconnect circuits considering delay variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005, 821–826. P. Liu, S. X.-D. Tan, B. McGaughy, and L. Wu, “Compact reduced order modeling for multiple-port interconnects,” in Proc. Int. Symposium. on Quality Electronic Design (ISQED), 2006, 413–418. R. Ludwig and P. Bretchko, RF Circuit Design, Theory and Applications. Prentice Hall, 2000. T. Mangold and P. Russer, “Full-wave modeling and automatic equivalent-circuit generation of millimeter-wave planar and multilayer structures,” IEEE Trans. 
on Microwave Theory and Techniques, 47, (6), 851–858, June 1999. Y. Massoud and J. White, “Simulation and modeling of the effect of substrate conductivity on coupling inductance and circuit crosstalk,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, 2002. K. Mayaram, D. C. Lee, S. Moinian, D. R. Rich, and J. Joychowdhury, “Computeraided circuit analysis tools for RFIC simulation: algorithms, features, and limitations,” IEEE Trans. on Circuits and Systems II: analog and digital signal processing, 47, (4), 274–286, 2000. B. Moore, “Principle component analysis in linear systems: Controllability, and observability, and model reduction,” IEEE Trans. Automat. Contr., 26, (1), 17–32, 1981. J. Morsey and A. C. Cangellaris, “PRIME: passive realization of interconnect models from measured data,” Electrical Performance of Electronic Packaging, 47–50, Oct. 2001.
234
83
84
85
86 87
88
89 90 91
92 93
94
95
96
97 98 99 100
References
K. Narbos and J. White, “FastCap: a multipole accelerated 3D capacitance extraction program,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 10, (11), 1447–1459, 1991. A. M. Niknejad and R. G. Meyer, “Analysis of eddy current losses over conductive substrates with applications to monolithic inductors and transformers,” IEEE Trans. on Microwave Theory and Techniques, 166–76, Jan. 2001. A. Odabasioglu, M. Celik, and L. Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 645–654, 1998. A. Pacelli, “A local circuit topology for inductive parasitics,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2002, 208–214. J. R. Phillips, L. Daniel, and L. M. Silveira, “Guaranteed passive balancing transformations for model order reduction,” in Proc. Design Automation Conf. (DAC), 2002, 52–57. ——, “Guaranteed passive balanced transformation for model order reduction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 22, (8), 1027–1041, 2003. J. R. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,” in Proc. European Design and Test Conf. (DATE), 2004, 938–943. ——, “Poor man’s TBR: a simple model reduction scheme,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 24, (1), 43– 55, 2005. L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 352–366, April 1990. L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation Methods. New York: McGraw-Hill, 1994. Z. Qi, S. X.-D. Tan, H. Yu, L. He, and P. Liu, “Wideband modeling of RF/analog circuits via hierarchical multi-point model order reduction,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan. 2005, 224–229. Z. Qi, H. Yu, P. Liu, S. X.-D. Tan, and L. 
He, “Wideband passive multi-port model order reduction and realization of RLCM circuits,” IEEE Trans. on ComputerAided Design of Integrated Circuits and Systems, 1496–1509, Aug. 2006. H. Qian, S. Nassif, and S. Sapatnekar, “Power grid analysis using random walks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1204– 1224, 2005. Z. Qin and C.-K. Cheng, “RCLK-VJ network reduction with Hurwitz polynomial approximation,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan. 2003, 283–291. Z. Qin, S. X.-D. Tan, and C. Cheng, Symbolic Analysis and Reduction of VLSI Circuits. Boston, MA: Kluwer Academic Publishers, 2005. Z. Qin and C. Cheng, “Realizable parasitic reduction using generalized Y -∆ transformation,” in Proc. Design Automation Conf. (DAC), 2003, 220–225. B. Razavi, RF Microelectronics. Prentice Hall, 1998. J. Rubinstein, P. Penfield, and M. A. Horowitz, “Signal delay in RC tree networks,”
References
101 102 103
104
105
106
107 108 109
110
111
112
113
114
115
116
235
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 202– 211, 1983. A. E. Ruehli, “Equivalent circuits models for three dimensional multiconductor systems,” IEEE Trans. on Microwave Theory and Techniques, 216–220, 1974. Y. Saad, Iterative methods for linear systems. PWS publishing, 2000. N. Sadegh, J. D. Finney, and B. S. Heck, “An explicit method for computing the positive real lemma matrices,” Int. J. Robust and Nonlinear Control, 7, 1057–1069, 1997. B. Salimbahrami and B. Lohmann, “Krylov subspace methods in linear model order reduction: introduction and invariance properties,” Institute of Automation, University of Bremen, Tech. Rep., Aug. 2002. D. Saraswat, R. Achar, and M. Nakhla, “A fast algorithm and practical considerations for passive macromodeling of measured/simulated data,” IEEE Trans. on Advanced Packaging, 27, (1), 57–70, Feb. 2004. A. Semlyen and A. Dabuleanu, “Fast and accurate switching transient calculations on transmission lines with ground return using recursive convolution,” IEEE Trans. Power Apparatus and Systems, 94, (2), 1975. B. N. Sheehan, “TICER: Realizable reduction of extracted RC circuits,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1999, 200–203. ——, “Branch merge reduction of RLCM networks,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2003, 658–664. K. Shepard and Z. Tian, “Return-limited inductances: A practical approach to onchip inductance extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 19, (4), 425–436, 2000. C.-J. Shi and X.-D. Tan, “Canonical symbolic analysis of large analog circuits with determinant decision diagrams,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 19, (1), 1–18, Jan. 2000. ——, “Compact representation and efficient generation of s-expanded symbolic network functions for computer-aided analog circuit design,” IEEE Trans. 
on Computer-Aided Design of Integrated Circuits and Systems, 20, (7), 813–827, April 2001. Y. Shi, H. Yu, and L. He, “SAMSON: a generalized second-order Arnoldi mehtod for multi-source network reduction,” in Proc. ACM Int. Symp. on Physical Design (ISPD), 2006. M. Silveira, M. Kamon, I. Elfadel, and J. White, “A coordinate-transformed Arnoldi algorithm for generating guaranteed stable reduced-order models of RLC circuits,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1996, 288–294. M. Silveira, M. Kamon, and J. White, “Efficient reduced-order modeling of frequency-dependent coupling inductances associated with 3-D interconnect structures,” in Proc. Design Automation Conf. (DAC), June 1995, 376–380. J. Singh and S. Sapatnekar, “Congestion-aware topology optimization of structured power/ground networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 683–695, 2005. R. Singh, Signal integrity effects in custom IC and ASIC designs. John Wiley and
236
117 118 119 120
121 122
123
124
125 126
127 128 129 130
131
132 133 134 135
References
Sons, 2003. J. A. Starzky and A. Konczykowska, “Flowgraph analysis of large electronic networks,” IEEE Trans. on Circuits and Systems, 33, (3), 302–315, March 1986. J. Sturm, “Using SuDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optim. Meth. Softw., 10, 625–653, 1999. T. Stykel, “Gramian-based model order reduction for descriptor systems,” Math. Control Signals Systems, 16, 297–319, 2004. H. Su, S. S. Sapatnekar, and S. R. Nassif, “Optimal decoupling capacitor sizing and placement for standard cell layout designs,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 22, (4), 428–436, April 2003. S. X.-D. Tan, “A general s-domain hierarchical network reduction algorithm,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2003, 650–657. ——, “A general hierarchical circuit modeling and simulation algorithm,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 418–434, April 2005. S. X.-D. Tan, W. Guo, and Z. Qi, “Hierarchical approach to exact symbolic analysis of large analog circuits,” in Proc. Design Automation Conf. (DAC), June 2004, 860– 863. S. X.-D. Tan, Z. Qi, and H. Li, “Hierarchical modeling and simulation of large analog circuits,” in Proc. European Design and Test Conf. (DATE), Feb. 2004, 740–741. S. X.-D. Tan and J. Yang, “Hurwitz stable model reduction for non-tree structured RLCK circuits,” in IEEE Int. System-on-Chip Conf. (SOC), 2003, 239–242. X.-D. Tan and C.-J. Shi, “Hierarchical symbolic analysis of large analog circuits via determinant decision diagrams,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 19, (4), 401–412, April 2000. G. C. Temes and J. Lapatra, Introduction to circuit synthesis and design. McGrawHill Book Company, 1977. V. Valkenburg, Linear Circuits. Prentice Hall, 1982. J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design. New York, NY: Van Nostrand Reinhold, 1995. J. M. Wang and T. V. 
Nguyen, “Extended Krylov subspace method for reduced order analysis of linear circuit with multiple sources,” in Proc. Design Automation Conf. (DAC), 2003, 247–252. N. Wang and V. Balakrishnan, “Fast balanced stochastic truncation via a quadratic extension of the alternating direction implicit iteration,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005, 801–805. J. White, F. Odeh, A. Sangiovanni-Vincentelli, and A. Ruehli, “Waveform relaxation: theory and practice,” Trans. Soc. Compu. Simul., 95–133, 1985. W. Wolf, Modern VLSI Design: System-on-Chip Design, 3rd ed. Prentice Hall, 2002. H. Yu and L. He, “Vector potential equivalent circuit based on PEEC inversion,” in Proc. Design Automation Conf. (DAC), 2003, 781–723. ——, “A provably passive and cost-efficient model for inductive interconnects,”
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 24, (8), 1283–1294, 2005. H. Yu, L. He, and S. X.-D. Tan, “Block structure preserving model reduction for linear circuits with large numbers of ports,” in Proc. IEEE International Workshop on Behavioral Modeling and Simulation (BMAS), 2005, 1–6. H. Yu, L. He, and S. X.-D. Tan, “Compact macro-modeling for on-chip RF passive components,” in Proc. IEEE International Conference on Communications, Circuits and Systems, 2004, 199–202. H. Yu, Y. Shi, and L. He, “Fast analysis of structured power grid by triangularization based structure preserving model order reduction,” in Proc. Design Automation Conf. (DAC), 2006, 205–210. M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of power distribution networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 21, (2), 159–168, Feb. 2002. G. Zhong, C. Koh, and K. Roy, “On-chip interconnect modeling by wire duplication,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2002, 341–346. ——, “On-chip interconnect modeling by wire duplication,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 22, (11), 1521–1532, 2003.
Index
Y-∆ introduction, 3 mq pole matching, 149 s-expanded DDDs, 74 adjoint matrix, 69 algebraic Riccati equations, see ARE ARE, 44 algorithm, 50 Arnoldi method, 2, 25 algorithm, 25 asymptotic waveform evaluation, 2, see AWE AWE, 215 algorithm, 13 algorithm flow, 16 derivation of residues and poles, 15 problems, 16 behavioral modeling, 1 block Arnoldi, 31 block Arnoldi method, 26, 31 block Krylov subspace, 31 block Lanczos method, 27 block moment, 13 definition, 31 block moments, 31 block PVL method, see MPVL block structure preserving, 139 block structure-preserving MOR, 5 block structured projection, 144 Brune’s method, 6, 191 algorithm, 193 BSMOR, 139 BSPRIM, 158 admittance function, 166 impedance function, 164 localized moment matching, 165 main idea, 162 partitioning, 161 structure preserving properties, 163 classic TBR, 42 clustering, 149
common factor, 70 common-factor cancellation, 70 compact block formulation, 149 complex DDDs, 74 composite admittance, 69 conditional passivity, 218 conditional positive real definition, 219 conditional positive realness, 218 conditionally passive, 6 congruence transformation, 2, 24 pole computation, 19 constrained least square optimization, 179 controllability, 4 controllability Gramians, 39, 40 convex optimization, 174 convex programming, 174, 216 crosstalk, 1 DDD, 69 definition, 70 DDD based hierarchical decomposition, 70 DDD representation, 72 descriptor form, 56 determinant decision diagram, see DDD dominant poles, 207 DTT, 3 eigendecomposition, 39 Elmore delay, 8 definition, 9 empirical TBR, 3, 46 ESVDMOR, 99 algorithm, 99 reduction flow, 100 fast Fourier transformation, 216 FFT, 216 waveform shaping, 221 fitting approach active circuits, 181 Foster’s method, 6, 195
generalized Lyapunov equation, 59 Gramian, 41 Grimme’s projection theorem, 141 Hankel singular values, 41 Hessenberg matrix, 25 hierarchical model order reduction, see HMOR hierarchical subcircuit reduction, 68 HMOR, 3, 5 algorithm, 75 frequency waveform matching, 82 moment-matching connection, 76 multi-point expansion, 81 overview, 68 preservation of reciprocity, 80 Hurwitz polynomial, 3 hybrid TBR, 45 IFFT waveform shaping, 221 inductance, 118 input Krylov subspace, 23 input moment matrices, 96 definition, 98 inverse FFT, 216 K-means based cluster methods, 104 K-means methods, 104 Kalman Yakubovich Popov lemma, 173 Krylov subspace, 8 definition, 22 introduction, 2 moment matching connection theorem, 24 moment matching connection theorem of two-side Krylov method, 24 multiple terminals, 205 pole computation, 18, 209 TBR, 45 theory, 20 Lanczos method, 2, 25 algorithm, 26 Lur’e equations, 43 Lyapunov equations, 42 MIMO, 141 MIMO system, 13 minimum function, 191 MMM, 17 MNA, 11 RLC circuit formulation, 27 model optimization, 172 active circuits, 176 model order reduction, 2, see MOR, 8 model realization, 6 modified Gram–Schmidt, 25
modified nodal analysis, see MNA, 131 moment definition, 9 MOR, 8 MPVL, 2 MTermMOR, 204 algorithm, 208 multi-node moment matching, see MMM multi-port realization, 187 multiple terminal MOR, 204 multiple-port realization, 196 Non-reciprocal, 197 nodal analysis, 3, 131 nodal susceptance, 131 node reduction introduction, 3 oblique projection, 22 observability, 4 observability Gramians, 39, 40 one-port realization Brune’s method, 191 Framework, 188 Relaxation method, 195 orthogonal projection, 22 output Krylov subspace, 23 output moment matrices, 96, 98 Pade approximation algorithm, 13 introduction, 2 Pade via Lanczos method, see PVL method partial element equivalent circuit, see PEEC partial fraction decomposition, 14 passive HMOR, 75 passive TBR, 56 passivity waveform shaping, 217 passivity enforcement, 6, 172 convex programming, 174 passivity enforcement waveform shaping, 221 PEEC, 118 POD review, 38 poor man’s TBR, 46 algorithm, 47 introduction, 3 positive real, 172, 217 definition, 43 waveform shaping, 217 positive real TBR algorithm, 44 positive real TBR methods, 3 positive-preserving TBR, 43
positive-real constraints, 43 positive-real lemma, 43 positive realness definition, 28 PRIMA algorithm, 30 introduction, 2 passivity preservation, 28 pole computation, 19 PRIME, 216 PriTBR, 4, 57 PriTBR:algorithm, 57 PriTBR:algorithm flow, 60 PriTBR:structure-preserving, 60 PriTBR:congruence transformation, 58 projection based MOR introduction, 2 projection-based MOR passivity preservation, 28 proper orthogonal decomposition, see POD PVL method, 2 reciprocity, 172 recursive moment computation, 11 Schur complement, 69 Schur decomposition, 68 SeDuMi, 181 semi-definite programming, 216 signal integrity, 1 signal waveform shaping, 215 SIMO, 141 singular value, 37 singular value decomposition, see SVD skin effects, 1 SoC, 1 SPICE, 210 SPRIM, 5, 139 algorithm, 160 state-space equations, 174 structure preserving MOR, 5, 159 subcircuit suppression, 68 SVD, 37 review, 38 SVDMOR, 96 SVDMOR, 5, 94 algorithm, 95 Sylvester equation, 50 symbolic Gaussian elimination, 3 symbolic hierarchical analysis, 210 SyPVL method, 2 system-on-a-chip, 1 TBR, 3, 215 algorithm, 39 description, 37
error bounds, 43 for unstable systems, 48 time complexity, 47 TBS, 5, 139 term cancellation, 70 terminal reduction concepts, 93 introduction, 4 MOR, 107 TermMerg, 95 algorithm, 106 cluster number, 102 introduction, 5 TermMOR introduction, 6 TICER, 3 transfer functions, 207 triangularization, 149 truncated balanced realization, see TBR two-level analysis, 150 vector potential equivalent circuit, see VPEC introduction, 5 VPEC, 5, 118 definition, 119 inductance formulation, 134 truncated VPEC, 128 waveform shaping, 215 algorithm, 221 computing cost, 222 delay problem, 225 distortion problem, 223 low pass filters, 222 Y-expanded DDDs, 73 YDDD, 73 YDDD constructions, 74