1 LOW-POWER VLSI DESIGN: AN OVERVIEW
1.1 WHY LOW-POWER?
Historically, VLSI designers have used circuit speed as the "performance" metric. Large gains, in terms of performance and silicon area, have been made for digital processors, microprocessors, DSPs (Digital Signal Processors), ASICs (Application Specific ICs), etc. In general, "small area" and "high performance" are two conflicting constraints, and IC designers' activities have been involved in trading off these constraints. Power dissipation was not a design criterion but an afterthought. In fact, power considerations have been the ultimate design criteria in special portable applications such as wristwatches and pacemakers for a long time. The objective in these applications was minimum power for maximum battery lifetime.
Recently, power dissipation has become an important design constraint. Several reasons underlie the emergence of this issue. Among them we cite:

Battery-powered systems such as laptop/notebook computers, electronic organizers, etc. The need for low-power design in these systems arises from the need to extend battery life. Many portable electronics use rechargeable Nickel Cadmium (NiCd) batteries. Although the battery industry has been making efforts to develop batteries with higher energy capacity than that of NiCd, a sharp increase does not seem imminent. The expected improvement of the energy density is 40% by the turn of the century. With recent NiCd batteries, the energy density is around 20 Watt-hour/pound and the voltage is around 1.2 V. So, for example, a notebook consuming a typical power of 10 Watts and using 1.5 pounds of batteries can operate for 3 hours between recharges. Even with advanced battery technologies, such as Nickel-Metal Hydride (Ni-MH), which provide a larger energy density (about 30 Watt-hour/pound), the lifetime of the battery is still low. Since battery technology has offered limited improvement, low-power design techniques are essential for portable devices.
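The battery-life arithmetic quoted above can be reproduced with a minimal sketch. It assumes an ideal battery (all stored energy is delivered to the load), and the function name `runtime_hours` is only illustrative. The Ni-MH case reuses the same 1.5 pound battery weight and 10 Watt load, which is an assumption made here for comparison and is not a figure from the text.

```python
def runtime_hours(energy_density_wh_per_lb, battery_weight_lb, load_watts):
    """Hours of operation between recharges, assuming an ideal battery."""
    stored_energy_wh = energy_density_wh_per_lb * battery_weight_lb
    return stored_energy_wh / load_watts


# NiCd figures quoted in the text: 20 Wh/lb, 1.5 lb of batteries, 10 W load.
nicd_hours = runtime_hours(20.0, 1.5, 10.0)

# Assumption for comparison: same notebook and battery weight with ~30 Wh/lb Ni-MH.
nimh_hours = runtime_hours(30.0, 1.5, 10.0)

print(f"NiCd:  {nicd_hours:.1f} h")
print(f"Ni-MH: {nimh_hours:.1f} h")
```

Under these assumptions the better chemistry adds only about an hour and a half, which is consistent with the point that battery improvements alone are not enough.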
Low-power design is needed not only for portable applications but also to reduce the power of high-performance systems. With large integration density and improved speed of operation, systems with high clock frequencies are emerging. These systems use high-speed products such as microprocessors. The cost associated with the packaging, cooling and fans required by these systems to remove the heat is increasing significantly. Table 1.1 shows the power consumption of various microprocessors that operate in the frequency range of 66 to 300 MHz. This table demonstrates that, at higher frequencies, the power dissipation is excessive.
Another issue related to high power dissipation is reliability. With the generation of high on-chip temperature, failure mechanisms are provoked [8]. Among them, we cite silicon interconnect fatigue, package-related failure, electrical parameter shift, electromigration, junction fatigue, etc.
In addition, there is a trend to keep computers from using more than a 5% share of the total US power budget [9]. Note that 50% of office power is used by PCs. Since processor frequency is increasing, which results in increased power, low-power design techniques are prerequisites.
The power dissipation issues and the reliability problems of devices scaled down to 0.5 µm and below have driven the electronics industry to adopt a supply voltage lower than the old standard, 5 V. The new industry standard for IC operating voltage is 3.3 V (±10%). The effect of lowering the voltage to much lower values can be impressive in terms of power saving. Not only is the power reduced, but so are the weight and volume associated with batteries in battery-operated systems.
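To give a feel for the size of this saving, the following sketch estimates the change in dynamic power when moving from 5 V to 3.3 V. It assumes the standard first-order CMOS switching-power model, P = C x VDD^2 x f, with the switched capacitance and clock frequency held fixed. That model is not stated in this section, and the function name `dynamic_power_ratio` is hypothetical.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power after/before a supply change.

    Assumes P = C * VDD**2 * f with C and f unchanged, so only the
    quadratic voltage term matters.
    """
    return (v_new / v_old) ** 2


ratio = dynamic_power_ratio(3.3, 5.0)
saving_percent = (1.0 - ratio) * 100.0

print(f"Power at 3.3 V relative to 5 V: {ratio:.3f}")
print(f"Estimated saving: {saving_percent:.1f} %")
```

Under this assumption the 3.3 V standard alone removes more than half of the dynamic power, before any other optimization is applied.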
1.2 LOW-POWER APPLICATIONS
Low-power design is opening a new era in VLSI technology, as it impacts many applications, such as:

• Battery-powered portable systems, for example notebooks, palmtops, CDs, language translators, etc. These systems represent an important growing market in the computer industry. High-performance capabilities, comparable to those of desktops, are demanded. Several low-power microprocessors have been designed for these computers. Table 1.2 shows some examples of these low-power processors. However, these circuits still consume significant power, on the order of 1 to 3 Watts. These systems have their power dissipation dominated by I/O devices such as hard disk drives and LCD displays. The total expected power dissipation of notebooks is 2 Watts with 4 pounds weight and daily recharge.

Table 1.2  Low-power microprocessors

Processor      Frequency (MHz)   Technology (µm)   Voltage (V)   Power (W)   Ref.
PowerPC 603    80                0.5               3.3           2.2         [10]
IBM 486SLC2    66                0.8               3.3           1.8         [11]
MIPS R4200     80                0.64              3.3           1.8         [12]

• Electronic pocket communication products such as cordless and cellular telephones, PDAs (Personal Digital Assistants), pagers, etc. Table 1.3 shows a battery analysis for a handheld cellular system. Low power is crucial for extending the battery life of these systems. Also, battery improvement is needed. PDAs require a large amount of data processing with multimedia capabilities. The expected power of PDAs is around 0.5 Watt with 0.5 pound weight. The expected power for pagers is 10 mW with 0.125 pound weight.
Table 1.3  Battery analysis for a handheld cellular system

Example            Handheld cellular (Motorola Microtac)
RF power           600 mW
Battery            750 mAH secondary NiCd
Battery life       75 minutes talk time, 20 hours standby
Total power load   650 mA x 6 V = 3900 mW
• Sub-GHz processors for high-performance workstations and computers. Systems running at 100 MHz and over are emerging, and 500 MHz and higher will be common by the end of the decade. Since the power consumed increases with frequency, processors with new architectures and circuits optimized for low power are crucial.
• Other applications such as WLANs (Wireless Local Area Networks) and electronic goods (calculators, hearing aids, watches, etc.).
1.3 LOW-POWER DESIGN METHODOLOGY
In order to optimize the power dissipation of digital systems, a low-power methodology should be applied throughout the design process, from the system level to the process level, while realizing that performance is still essential. During optimization, it is very important to know the power distribution within a processor. Thus, the parts or blocks consuming an important fraction of the power are properly optimized for power saving. Fig. 1.1 shows the different design levels of an integrated system. The process technology is under the control of the device/process designer. However, the other levels are controlled by the circuit designer.
Figure 1.1  Power reduction design space (levels legible in the figure: logic/circuit, device/process)

1.3.1 Power Reduction Through Process Technology
One way to reduce the power dissipation is to reduce the power supply voltage. However, the delay increases significantly, particularly when VDD approaches the threshold voltage. To overcome this problem, the devices should be scaled properly. The advantages of scaling for low-power operation are the following:
• Improved device characteristics for low-voltage operation. This is due to the improvement of the current drive capabilities;
• Reduced capacitances through small geometries and junction capacitances;
• Improved interconnect technology;
• Availability of multiple and variable threshold devices. This results in good management of the active and standby power trade-off; and
• Higher density of integration. It was shown that the integration of a whole system into a single chip provides orders of magnitude in power savings.
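The delay penalty of lowering VDD toward the threshold voltage, mentioned at the start of this section, can be illustrated with a rough sketch. It assumes the simple long-channel approximation in which gate delay is proportional to VDD / (VDD - VT)^2. This model, the 0.7 V threshold and the 3.3 V reference supply are illustrative assumptions, not values taken from the text.

```python
def relative_delay(vdd, vt=0.7, vdd_ref=3.3):
    """Gate delay at `vdd` relative to the delay at `vdd_ref`.

    Assumes delay ~ VDD / (VDD - VT)**2 (first-order long-channel model);
    vt and vdd_ref are illustrative defaults.
    """
    if vdd <= vt or vdd_ref <= vt:
        raise ValueError("model only applies for supply voltages above VT")

    def delay(v):
        return v / (v - vt) ** 2

    return delay(vdd) / delay(vdd_ref)


for supply in (3.3, 2.5, 2.0, 1.5, 1.0):
    print(f"VDD = {supply:.1f} V -> relative delay {relative_delay(supply):.2f}")
```

The loop shows the delay growing slowly at first and then steeply as the supply nears VT, which is why the text pairs voltage reduction with proper device scaling.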
Table 1.4 shows the effect of scaling on microprocessor performance [14]. The power dissipation can be reduced by one order of magnitude at a fixed frequency of operation.
Table 1.4  Effect of scaling on microprocessor performance [14]

L (µm)          0.50        0.35      0.25        0.15
Leff (µm)       0.35        0.25      0.15        0.10
VDD (V)         3.3         2.5       1.8         1.5
Area (mm²)      8 x 10      5.6 x …   4 x 5       2.5 x 3
Clock (MHz)     100         150       225         330
Power (W)       5.0         3.3       2.35        1.5
At fixed frequency:
Area (mm²)      6.4 x 8.4   4.5 x 6   3.2 x 4.2   2 x 2.5
Power (W)       5.0         2.2       1           0.45
1.3.2 Power Reduction Through Circuit/Logic Design
To minimize the power at the circuit/logic level, many techniques can be used, such as:
• Use of more static style over dynamic style;
• Reduce the switching activity by logic optimization;
• Optimize clock and bus loading;
• Clever circuit techniques that minimize device count and internal swing;
• Custom design may improve the power; however, the design cost increases;
• Reduce VDD in non-critical paths and use proper transistor sizing;
• Use of multi-VT logic circuits; and
• Re-encoding of sequential circuits.
1.3.3 Power Reduction Through Architectural Design
At the architecture level, several approaches can be applied to the design:
• Power management techniques where unused blocks are shut down;
• Low-power architectures based on parallelism, pipelining, etc.;
• Memory partition with selectively enabled blocks;
• Reduction of the number of global busses; and
• Minimization of the instruction set for simple decoding and execution.
1.3.4 Power Reduction Through Algorithm Selection
Among the techniques to minimize the power at the algorithmic level, we cite:
• Minimizing the number of operations and hence the number of hardware resources; and
• Data coding for minimum switching activity.
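As an illustration of data coding for minimum switching activity, the sketch below counts bit toggles for a 4-bit counter sequence in plain binary and in Gray code. Gray coding is chosen here only as a familiar example and is not named in the text. The helper names `gray` and `total_transitions` are hypothetical.

```python
def gray(n):
    """Reflected-binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def total_transitions(words):
    """Total number of bit toggles between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# A 4-bit counter stepping through 0..15 once.
binary_seq = list(range(16))
gray_seq = [gray(n) for n in binary_seq]

binary_toggles = total_transitions(binary_seq)
gray_toggles = total_transitions(gray_seq)

print("binary toggles:", binary_toggles)
print("gray toggles:  ", gray_toggles)
```

Because consecutive Gray code words differ in exactly one bit, the coded sequence toggles once per step, whereas the binary counter toggles several bits on carries. Fewer toggles on a bus or register generally means less switched capacitance per cycle.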
1.3.5 Power Reduction in System Integration
The system level is also important to the whole process of power optimization. Some techniques are:
• Utilize low system clocks. Higher frequencies are generated with an on-chip phase-locked loop; and
• High level of integration. Integrate off-chip memories (ROM, RAM, etc.) and other ICs such as digital and analog peripherals.

1.4 THIS BOOK
This book is an early contribution to the field of low-power digital VLSI circuit and system design. It targets two types of audiences: senior undergraduate and postgraduate university students, and VLSI circuit and system designers working in industry. In this book we have tried to cover the basics of VLSI systems, from the process technologies and device modeling to the architecture level. The fundamentals of power dissipation in CMOS circuits are presented to provide the readers with sufficient background to be familiar with the low-power design world. Several practical circuit examples and low-power techniques, mainly in CMOS technology, are discussed. Low-voltage issues for digital CMOS and BiCMOS circuits are also emphasized. This book also provides an extensive study of advanced CMOS subsystem design. Various power minimization techniques, at the circuit, logic, architecture and algorithm levels, are presented. Finally, the book includes a rich list of references, treating advanced topics, at the end of each chapter. This allows the readers to study, in depth, any topics they find interesting. This book is organized into eight chapters. The first chapter is an introduction to low-power design. The other chapters are presented in the following sections.
1.4.1 Low-Voltage Process Technology
Chapter 2 deals with CMOS bulk, bipolar, BiCMOS and CMOS Silicon On Insulator (SOI) process technologies. Several CMOS technologies (N-well and twin-tub) and low-voltage CMOS enhancements are reviewed. Bipolar technology, with emphasis on advanced structures, is considered. The topic of the isolation techniques used for both bipolar and CMOS is addressed. Three BiCMOS technologies, with different performance/cost, are presented. A complementary BiCMOS structure, where a vertical isolated PNP transistor is merged with an NPN transistor in a CMOS process, is introduced. The design rules of a 0.8 µm BiCMOS process are supplied. Finally, SOI technology is reviewed for low-voltage and low-power applications.
1.4.2 Low-Voltage Device Modeling
Chapter 3 addresses the topic of device modeling. This topic is of interest to those readers who need to analyze, design and/or simulate circuits. It introduces commonly used models of both MOS and bipolar devices. In this chapter we consider simple analytical models which can be used for circuit analysis and design of deep-submicrometer MOSFETs at low voltage. Also, a simple model to compute the leakage current, hence the static power dissipation, of MOSFETs is discussed. The SPICE¹ device models of a 0.8 µm CMOS/BiCMOS process are also presented. This should help the reader to appreciate the meaning of the model parameters as well as to analyze the power and delay of the low-voltage circuits presented throughout the book. Supply voltage scaling, due to reliability and power dissipation issues, is presented.
1.4.3 Low-Voltage Low-Power VLSI CMOS Circuit Design
Chapter 4 focuses on CMOS logic circuit design. The sources of power dissipation in these circuits are reviewed. Simple models for delay and power dissipation estimation are presented. The concept of switching activity is introduced and examples are given. The power dissipation due to spurious transitions is described. Several CMOS design styles, such as pseudo-NMOS, dynamic and NO RAce (NORA) logic, are studied. Guidelines for low-power physical design are presented. Other circuit variations of static complementary CMOS, which are suitable for low-power applications, are discussed. This includes the pass-transistor logic family, such as Complementary Pass-transistor Logic (CPL), Dual Pass-transistor Logic (DPL), and Swing Restored Pass-transistor Logic (SRPL). An overview of clocking strategy in VLSI systems is also covered. Included in this chapter is one important area, the I/O circuits, whose power dissipation is also analyzed. Finally, techniques to reduce the static and dynamic power components of CMOS design are reviewed. This chapter is intended to provide the readers with sufficient background in low-power circuit design.
1.4.4 Low-Voltage VLSI BiCMOS Circuit Design
A variety of BiCMOS logic circuits suitable for 3.3 V and sub-3.3 V operation are presented in Chapter 5. The chapter starts with the introduction of the conventional BiCMOS (totem-pole) gate, which was used in 5 V applications. The degradation of this gate with supply voltage scaling is demonstrated. The BiNMOS family suitable for low-voltage applications (3.3 to 2 V range) is introduced. It is shown that it provides better performance and delay-power product than CMOS at these voltages, even at low fan-out. Other logic families for low power supply voltage operation are also discussed. Finally, this chapter presents several low-voltage applications of BiCMOS.

¹ SPICE is the most commonly used circuit simulator.
1.4.5 Low-Power CMOS Random Access Memory Circuits
The objective of Chapter 6 is two-fold. It is intended to present circuit techniques for active and standby power reduction in static and dynamic RAMs, and to apply the concepts behind these techniques to other applications, because RAMs have seen remarkable and rapid progress in power reduction. These techniques are applied at the architectural and circuit levels. Several advanced circuit structures and memory organizations are described. Circuits operating at a power supply as low as 1 V are also discussed. The Voltage Down Converters (VDCs) used as DC-DC converters are also treated. Their low-power aspects are investigated.
1.4.6 VLSI CMOS SubSystem Design
Chapter 7 presents a subsystem view of CMOS design. A variety of building blocks of VLSI systems, such as adders, multipliers, ALUs, data paths, ROMs, PLAs, etc., are discussed. Several options for each subsystem are presented with emphasis on power dissipation. The use of a PLL in high-speed CMOS systems for deskewing the internal clock is also examined. Low-power issues of CMOS subsystems are also included.
1.4.7 Low-Power VLSI Design Methodology
In Chapter 8, advanced techniques to reduce the dynamic power component at several levels of design are presented. Lowering the power supply voltage while maintaining the performance is one technique for power reduction addressed extensively in this chapter. It is shown that low-power techniques at the high level (algorithmic and architectural) of the design lead to a power saving of several orders of magnitude. Several examples are included to give the reader a clear picture of low-power design aspects. In addition, the power estimation techniques at the circuit, logical, architectural and behavioral levels are overviewed. The goal of power estimation is to optimize power, meet requirements and know the power distribution through the chip.
REFERENCES
[1] Special Report, "The New Contenders," IEEE Spectrum, pp. 20-25, December 1993.
[2] D. W. Dobberpuhl et al., "A 200-MHz 64-b Dual-Issue CMOS Microprocessor," IEEE J. Solid-State Circuits, vol. 27, no. 11, pp. 1555-1567, November 1992.
[3] W. J. Bowhill et al., "A 300MHz 64b Quad-Issue CMOS RISC Microprocessor," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 182-183, February 1995.
[4] "Technology 1995: Solid State," IEEE Spectrum, pp. 35-39, January 1995.
[5] D. Bearden, et al., "A 133 MHz 64b Four-Issue CMOS Microprocessor," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 174-175, February 1995.
[6] MIPS Press release, 1994.
[7] A. Charnas, et al., "A 64b Microprocessor with Multimedia Support," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 178-179, February 1995.
[8] C. Small, "Shrinking Devices Put the Squeeze on System Packaging," EDN, vol. 39, no. 4, pp. 41-46, February 1994.
[9] P. Verhofstadt, "Keynote Address," IEEE Symposium on Low Power Electronics, Tech. Dig., October 1994.
[10] G. Gerosa, et al., "A 2.2 W 80 MHz Superscalar RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1440-1454, December 1994.
[11] R. Bechade, et al., "A 32b 66MHz Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 208-209, February 1994.
[12] N. K. Yeung, Y-H. Sutu, T. Y-F. Su, E. T. Pak, C-C. Chao, S. Akki, D. D. Yau, and R. Ladenquai, "The Design of a 55SPECint92 RISC Processor under 2W," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 206-207, February 1994.
[13] S. Lipoff and A. D. Little, "Evaluation of New Battery Technology in Selected Applications," IEEE Workshop on Low-Power Electronics, Phoenix, AZ, August 1993.
[14] J. M. C. Stork, "Technology Leverage for Ultra-Low Power Information Systems," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 52-55, October 1994.
2 LOW-VOLTAGE PROCESS TECHNOLOGY
This chapter serves as an introduction to IC fabrication of CMOS bulk, bipolar, BiCMOS and CMOS SOI devices, including sub-micron devices for low-voltage applications. Section 2.1 is a review of CMOS process technologies. Examples of an N-well CMOS process and a twin-tub CMOS process are considered. Section 2.2 deals with bipolar technology, with emphasis on advanced bipolar structures. The topic of the isolation techniques used for both bipolar and CMOS is addressed in Section 2.3. In Section 2.4 we discuss the similarities between advanced CMOS and advanced bipolar transistor structures to demonstrate how both technologies are indeed converging. The BiCMOS technologies are introduced in Section 2.5, with emphasis on CMOS-based processes. Three BiCMOS technologies, with different performance/cost, are presented. Section 2.6 introduces a complementary BiCMOS structure, where a vertical isolated PNP transistor is merged with an NPN transistor in a CMOS process. In Section 2.7, a table with the design rules of a generic 0.8 µm BiCMOS process is supplied. Finally, in Section 2.8, SOI technology is reviewed for low-voltage applications.
2.1 CMOS PROCESS TECHNOLOGY
The idea of CMOS was first proposed by Wanlass and Sah [1]. In the 1980's, it was widely acknowledged that CMOS is the technology for VLSI because of its unique advantages, such as low power, high noise margin, wider temperature and voltage operation range, overall circuit simplification and layout ease. The development of VLSI in the 80's has driven the integration density to millions of transistors on a single chip.
In this section we review two CMOS bulk technologies: the N-well and twin-tub processes. Other processes, such as retrograde-well technology, are not discussed.
2.1.1 N-well CMOS Process
In the N-well CMOS process, the P-channel transistor is formed in the N-well itself and the N-channel transistor in the P-substrate. Fig. 2.1 illustrates cross-sectional views and process steps of a typical N-well process. The process starts by growing an oxide on the wafer. The oxide is then patterned to open N-well windows. Phosphorus atoms are implanted into the silicon, followed by a high-temperature anneal to diffuse the well [Fig. 2.1(a)]. The LOCOS (Local Oxidation of Silicon)¹ technique is used to isolate the different active areas. After removing the nitride used in the LOCOS process, a photoresist layer is deposited and is then patterned by a P-well mask (new mask). This is followed by low energy ion implantation of boron (B I/I) to adjust the threshold voltage of the N-channel transistor [Fig. 2.1(b)]. A second ion implantation can be applied to eliminate punchthrough in the short channel device. Similarly, the threshold voltage of the P-channel transistor is adjusted [Fig. 2.1(c)]. A thin gate oxide is then grown and a layer of polysilicon is deposited and doped with phosphorus. The polysilicon is patterned to form the gates of all the transistors and the interconnection layer [Fig. 2.1(d)]. The source and drain regions are then implanted by using a photoresist mask. Boron is used for the P+ regions of the P-channel transistors and arsenic for the N-channel transistors [Fig. 2.1(e)]. The N+ and P+ regions are also used as N- and P-well contacts, respectively. The photoresist is removed and a thick oxide is deposited by Chemical Vapor Deposition (CVD) as an isolation layer between the polysilicon layer and the subsequent metal layer. Contact holes are opened in the oxide layer and metal (usually aluminum) is deposited on the whole wafer. At this stage, the metal is patterned and annealed at a relatively low temperature (450 °C) [Fig. 2.1(f)]. One or two other metal layers are usually added.
At the end, the wafer is passivated and windows are patterned over the metal bonding pads to provide electrical contacts with pins.

¹ For more details on the LOCOS isolation see Section 2.3.1.
Figure 2.1 (continued)  Process steps annotated in the panels:
(d) Strip resist/oxide; grow gate oxide; deposit polysilicon; apply photoresist and pattern; strip resist.
(e) Apply photoresist; pattern S/D regions for P-channel; implant P+ S/D; strip photoresist; repeat for N+ S/D; strip photoresist.
(f) Grow oxide; etch contact hole; deposit metal; pattern metal; metal anneal.
2.1.2 Twin-Tub CMOS Process
An alternative approach for CMOS device fabrication is to use two separate wells (tubs) for N- and P-channel transistors in a lightly doped N- or P-type substrate. This "twin-tub" CMOS technology uses a single mask that allows it to form two independently doped and self-aligned tubs [2]; hence both CMOS device types are optimized independently. This flexibility in selecting the substrate type with no change in the process flow is the major advantage of twin-tub CMOS. This technology is also more attractive when the devices are scaled down to submicron dimensions.
Fig. 2.2 shows the major steps involved in a typical twin-tub process. The starting material is a lightly doped P-epitaxial material over a P+ substrate to reduce latch-up. In addition to the conventional N-tub process, another N-type (arsenic) shallow implant is used to increase the surface doping of the N-tub to prevent punchthrough (for short channel devices). It is also used to form the channel-stoppers² for the P-channel transistors [Fig. 2.2(a)]. The photoresist is stripped and a selective oxidation of the N-tub is performed. The nitride/pad oxide layers are removed to implant boron, which is driven in to form the P-tub. This is followed by a second boron ion implantation for the channel-stoppers of the N-channel device [Fig. 2.2(b)]. The N-tub oxide is then stripped. So far only one mask (N-tub mask, MASK#1) is required for the self-aligned wells and channel-stopper processes. Both tubs are driven in. LOCOS isolation is developed to isolate the devices using MASK#2, which defines the active areas. After the LOCOS process, boron is implanted through the pad oxide (used in the LOCOS) to reduce the threshold voltage of the P-channel transistor using MASK#3. This process results in a buried-channel PMOS transistor. The pad oxide is then removed. The remaining steps are similar to those used in the N-well process, where MASK#4 is needed to pattern the polysilicon [Fig. 2.2(c)]. MASK#5 and MASK#6 are required to form the N+ and P+ sources/drains (S/D), respectively, MASK#7 is used for contact openings, and MASK#8 for patterning the metal [Fig. 2.2(d)].
The fabrication of submicron MOS transistors requires additional process steps to avoid hot carrier effects. Fig. 2.3 illustrates a CMOS twin-tub structure with Lightly Doped Drain (LDD). Both NMOS and PMOS devices have lightly doped extensions to the source and drain regions. The electric field near the drain is reduced due to its light doping. This prevents the generation of hot carriers. The major process steps to fabricate the LDD structure are shown in Fig. 2.4.
2.1.3 Low-Voltage CMOS Technology
Scaled CMOS has been recognized as the technology suitable for low-power battery-operated systems demanding high-speed operation. Conventional scaled CMOS technology undergoes a drastic reduction in speed when the power supply is reduced to 1 V and sub-1 V. If the threshold voltage is scaled aggressively, the subthreshold leakage current increases drastically, which causes limitations for battery applications. Hence, high-performance low-power scaled CMOS technology is needed for ultra-low voltage operation. One key to achieving low-power CMOS devices is the reduction of the junction capacitances as well as other parasitic capacitances. Also, the subthreshold current should be reduced when a low threshold voltage (VT ≤ 0.3 V) is used.

² For more details on the channel-stoppers refer to Section 2.3.

Figure 2.2  Twin-tub process sequence. Panel annotations: strip resist, grow selective thick oxide; remove nitride/pad oxide, B I/I (P-well), B anneal (P-well), 2nd B I/I (channel-stoppers); N+ S/D, P+ S/D, contacts, metallization. Layer labels: P-tub, N-tub, P epi-layer.

Figure 2.3  (labels legible in the figure: side wall, field oxide)

Extensions and variations of the standard CMOS process have been proposed to enhance the performance of devices at low voltage [3, 4]. These devices have good short channel behavior, low junction capacitance and reduced parasitic resistance. The power supply choice depends on performance/reliability/power trade-offs. A reduced power supply is needed for low-power applications, but a deep-submicron CMOS device with ultrathin gate oxide and low threshold voltage should be used to improve performance. Table 2.1 shows the speed achieved at low voltages using deep-submicron processes.

Table 2.1  Performance comparison at low-voltage

Name [Ref.]    CMOS Process   Voltage (V)   Delay (ps)
IBM [3]        0.10 µm        …             …
AT&T [4]       0.10 µm        …             …
NEC [5]        0.15 µm        …             …
Fujitsu [6]    0.10 µm        …             …
…              0.15 µm        …             …
Toshiba [8]    0.35 µm        …             …

(Delay values legible in the source, row assignment unclear: 21.0, 50.0, 52.0 ps.)
An example of improved-performance CMOS technology suitable for low voltage is the one proposed by Toshiba [8], called CMOS Shallow Junction Well FET (SJET). Fig. 2.5 shows the cross-sectional view of the CMOS-SJET process. The N-well and P-well depths are very shallow and comparable to the maximum depletion layer width in the channel. With this CMOS-SJET structure the depletion layer of the NMOS device, for example, is extended compared to the original one and reaches the depletion layer of the P-well and the N-type substrate. As a result, the total depletion layer width is increased and a low depletion capacitance, CD, is obtained. This leads to the reduction of the subthreshold slope (see Section 3.3.2). Thus, the threshold voltage can be reduced at low power supply voltage compared to the conventional CMOS process. Furthermore, the wells are designed to reduce the junction capacitance of the S/D regions by 40 to 55% compared to the conventional one. The structure of Fig. 2.5 also uses dual polysilicon gates, N+ and P+, to optimize the threshold voltages of the MOS devices. Also, W-polycide gates are used to reduce the poly sheet resistance. The delay of the CMOS-SJET inverter is 2.5 times better than that of conventional CMOS using the same gate size (0.5 µm technology) at 1.5 V power supply. The power-delay product of a CMOS-SJET gate at 1.5 V using 0.35 µm technology is 1.3 fJ which is 113 times improvement of that for conventional CMOS devices. However, the main drawback of the CMOS-SJET is the large body effect due to its retrograde doping profile.

Figure 2.5  (labels legible in the figure: P MOSFET, N MOSFET, N-substrate)
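The interplay between threshold voltage, subthreshold slope and leakage described in this section can be sketched numerically. The snippet assumes the usual exponential subthreshold model, in which the drain current drops by one decade for every S volts of gate voltage below VT. The model, the function name `off_current_fraction` and the sample values of VT and S are illustrative assumptions, not data from the text.

```python
def off_current_fraction(vt, slope_v_per_decade):
    """Leakage at VGS = 0 as a fraction of the current at VGS = VT.

    Assumes an exponential subthreshold characteristic with a slope of
    `slope_v_per_decade` volts per decade of current.
    """
    return 10.0 ** (-vt / slope_v_per_decade)


# Lowering VT at a fixed slope raises the off-state leakage sharply.
for vt in (0.5, 0.3, 0.2):
    print(f"VT = {vt:.1f} V, S = 100 mV/dec -> {off_current_fraction(vt, 0.100):.1e}")

# A steeper (smaller) slope recovers leakage margin at the same low VT.
for slope in (0.100, 0.075):
    print(f"VT = 0.3 V, S = {slope * 1000:.0f} mV/dec -> {off_current_fraction(0.3, slope):.1e}")
```

Under this model, each 100 mV of threshold reduction at S = 100 mV/decade costs a factor of ten in leakage, which is why a structure that reduces the subthreshold slope allows a lower VT for the same standby current.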
2.2 BIPOLAR PROCESS TECHNOLOGY
The technology of epitaxial growth gave rise to the economical manufacturing of monolithic bipolar ICs, as it allows a high-quality thin film of semiconductor to be grown on top of a substrate. Junction-isolation and epitaxy techniques triggered the progress of bipolar technology. Although most of the focus has been on the development of CMOS for the last ten years, bipolar technology has achieved significant progress as well. Impressive high-speed results were demonstrated at the 1985 ISSCC (International Solid-State Circuits Conference) and thereafter. ECL (Emitter Coupled Logic) gate delays of 15 ps have been reported [9]. It was shown that advanced silicon bipolar technologies, although quite complex, could be integrated at the LSI level and operate at frequencies above those of CMOS circuits. Since then, the interest in advanced bipolar processes has increased. The key features of such technologies are: i) self-aligned base, ii) advanced isolation techniques such as deep-trench, and iii) polysilicon emitter contact.
Figure 2.7  Cross-sectional view of the SICOS bipolar device structure [11]
have been replaced by the side wall base electrodes. This allows the base area to be almost as large as the emitter. The SICOS structure is suitable for VLSI applications because of its density and low parasitics.

One of the features of advanced bipolar transistors is the replacement of aluminum by polysilicon for the contact of the emitter. This step has led to a noticeable improvement in the current gain of bipolar transistors. For further reading on polysilicon emitter BJTs refer to [10, 12, 13].
In this section, we introduce a typical Double-Polysilicon Self-Aligned (DPSA) process technology as an example of the advanced bipolar technologies³. Any bipolar process typically starts with creating the buried layers and the epitaxial layer. Fig. 2.8 illustrates the major steps of the epitaxial growth with an N+ buried layer (BL). This buried layer is introduced to reduce the collector resistance of a bipolar transistor, while the epitaxial layer offers the high-quality silicon host for the bipolar transistor. The steps involved in Fig. 2.8 are the following. First, an oxide layer is grown on the substrate and is then patterned using the buried layer mask. The photoresist on the oxide serves as a mask against etching and ion implantation. After etching the oxide, the exposed regions of the silicon surface are implanted with arsenic or antimony to form the N+ buried layers. The photoresist is then removed and an annealing step is carried out. All oxide is then stripped. An N-epitaxial layer is grown on the substrate as shown in Fig. 2.8(b). The thickness of this epitaxial layer can be as low as 0.8 µm for advanced digital bipolar technology. The problems limiting the scaling down of the thickness of the epitaxial layer are the autodoping and out-diffusion of the buried layer.

³ A review of conventional bipolar technology using the junction isolation technique can be found in [14].

Figure 2.8  Process steps annotated in the panels: grow oxide, apply photoresist, pattern/etch N+ BL mask, implant Sb; strip resist, anneal, strip oxide, epitaxy (intrinsic layer). Layer label: Si epitaxial layer.
Fig. 2.9 illustrates the sequence of a DPSA process assuming a starting structure with N+ buried layer, N-epitaxial layer and isolation oxide as shown in Fig. 2.9(a). First, photoresist is deposited and patterned to define the collector contact region (deep N+ collector sink). This region is then implanted with phosphorus to increase its doping level. The photoresist is stripped and the P-type base is implanted through a pre-implantation oxide as shown in Fig. 2.9(b). The resist and the oxide are then removed. A combination of P+ polysilicon and oxide layers is deposited over the wafer. These layers are then etched as shown in Fig. 2.9(c). A CVD oxide is deposited over the wafer. The oxide is then dry etched using reactive ion etching (RIE). The P+ polysilicon is walled with the oxide (called sidewall spacer) [Fig. 2.9(d)]. The second level of polysilicon is deposited and implanted with phosphorus that will ultimately form the diffused emitter junction. At this stage, the wafer is annealed to drive the dopants from the P+ and N+ polysilicon layers. Fig. 2.9(e) illustrates the structure after patterning the N+ polysilicon. The P+ diffusion under the polysilicon forms the extrinsic base. The contact openings to the P+ and N+ polysilicon, and to the collector, are etched. This is followed by the metallization step. At the end, the metal is patterned as shown in Fig. 2.9(f).

[Figure 2.9: Sequence of the DPSA process. Panel steps: initial structure with oxide isolation; apply and pattern photoresist (N+ collector mask); P ion implantation for the N+ sink; CVD oxide; strip photoresist/oxide; deposit P+ polysilicon/oxide; pattern/etch oxide/polysilicon; deposit CVD oxide; RIE etch of oxide; deposit the second level of polysilicon; P ion implantation (N+ poly); anneal; pattern/etch N+ polysilicon; deposit oxide; open contact holes; deposit metal; pattern/etch metal.]
The advantage of bipolar devices is their high-speed performance. However, they are not suitable for battery backup systems because they consume high DC current. Many logic circuit techniques have been proposed for low-power and low-voltage operation, particularly for telecommunications applications [15, 16].
2.3 ISOLATION IN CMOS AND BIPOLAR TECHNOLOGIES
2.3.1 CMOS Device Isolation Techniques
Isolation in an integrated circuit means electrically isolating similar or different transistors from each other. In a CMOS chip, where more than one million transistors can be integrated, 1 µA/transistor of leakage current due to bad isolation can lead to a few watts of DC power consumption. Moreover, this leakage current increases the susceptibility to latch-up, as will be discussed in Section 3.1.6. Isolation in CMOS is required to separate the devices electrically by eliminating the inversion layers, which might be induced by the interconnection layer between the transistors. The principle of isolation in CMOS is based on a field oxide formation between two active areas [Fig. 2.10]. The width of the isolation region should be minimized to attain a dense layout, particularly for VLSI circuits.
• Depletion charge sharing by the drain and source;
• Channel-length modulation;
• Dependence of some electrical parameters on drain and substrate biases;
• Better modeling of weak-, medium-, and strong-inversion regions and elimination of the discontinuity problem in the drain current; and
• Geometric dependencies.
3.2.3.1 Threshold voltage:
The threshold voltage is given by

$$V_T = V_{FB} + \phi_s + K_1\sqrt{\phi_s + |V_{BS}|} - K_2\,(\phi_s + |V_{BS}|) - \eta V_{DS} \qquad (3.51)$$

The two parameters, $K_1$ and $K_2$, model the effect of non-uniform doping of the substrate on the threshold voltage. Typical values for $K_1$ and $K_2$ are 1 V$^{1/2}$ and 0.12 respectively. The factor $\eta$ models the DIBL effect and accounts for the channel-length modulation effect. It is a function of $V_{DS}$ and $V_{BS}$.
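As a rough numerical illustration of Equation (3.51), the sketch below evaluates $V_T = V_{FB} + \phi_s + K_1\sqrt{\phi_s + |V_{BS}|} - K_2(\phi_s + |V_{BS}|) - \eta V_{DS}$. The function name and the values chosen for $V_{FB}$, $\phi_s$ and $\eta$ are assumptions made for the example; only $K_1$ = 1 V$^{1/2}$ and $K_2$ = 0.12 come from the text.

```python
import math


def bsim_threshold(v_fb, phi_s, k1, k2, eta, v_bs, v_ds):
    """Threshold voltage in volts, following the form of Equation (3.51)."""
    body = phi_s + abs(v_bs)  # surface potential plus body-bias magnitude
    return v_fb + phi_s + k1 * math.sqrt(body) - k2 * body - eta * v_ds


# Illustrative (assumed) values for V_FB, phi_s and eta.
vt_zero_bias = bsim_threshold(v_fb=-0.9, phi_s=0.7, k1=1.0, k2=0.12,
                              eta=0.02, v_bs=0.0, v_ds=0.0)
vt_biased = bsim_threshold(v_fb=-0.9, phi_s=0.7, k1=1.0, k2=0.12,
                           eta=0.02, v_bs=-2.0, v_ds=3.3)
print(f"V_T at zero bias: {vt_zero_bias:.3f} V")
print(f"V_T with body and drain bias: {vt_biased:.3f} V")
```

With these assumed numbers the body-bias term raises the threshold more than the DIBL term lowers it, which is the qualitative behavior the $K_1$ and $\eta$ terms are meant to capture.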
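3.2.3.2 Drain current:
When $V_{DS} \le V_{DSsat}$, we have

$$I_{DS} = \frac{\mu_0}{1 + U_0 (V_{GS} - V_T)} \cdot \frac{C_{ox} W_{eff}/L_{eff}}{1 + \frac{U_1}{L_{eff}} V_{DS}} \left[ (V_{GS} - V_T) V_{DS} - \frac{a}{2} V_{DS}^2 \right] \qquad (3.52)$$

where

$$a = 1 + \frac{g K_1}{2\sqrt{\phi_s + |V_{BS}|}} \qquad (3.53)$$

and

$$g = 1 - \frac{1}{1.744 + 0.836\,(\phi_s + |V_{BS}|)} \qquad (3.54)$$

The parameters $U_0 = U_0(V_{BS})$, $U_1 = U_1(V_{BS})$ and $\mu_0 = \mu_0(V_{DS}, V_{BS})$ are bias sensitive. For $V_{DS} > V_{DSsat}$, the drain current is given by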
where

$$K = \frac{1 + v_c + \sqrt{1 + 2 v_c}}{2} \qquad (3.56)$$

and
(3.57)
The drain-source saturation voltage is given by
(3.58)
3.2.3.3 Subthreshold current:
In BSIM, the total drain current is modeled as the linear sum of a strong-inversion component and a weak-inversion component $I_w$. $I_w$ is expressed as
(3.59)
and
(3.61)
The factor $e^{1.8}$ is empirical, chosen to achieve the best fit. The subthreshold parameter $n$ is a function of $V_{DS}$ and $V_{BS}$.
3.2.3.4 Sensitivity Factors of Model Parameters:
BSIM uses the following formula to account for the sensitivity of each parameter to the width and length of the channel
(3.62)
where $P_0$ is an arbitrary parameter, and $L_{P_0}$ and $W_{P_0}$ are the L and W sensitivity factors of $P_0$.
Another deep-submicrometer MOSFET model, called BSIM3 [8], has been developed for circuit simulation. It uses improved threshold voltage, drain current and channel-length modulation models. The model is also simple and has a small number of parameters (≈ 25).
3.2.4 MOS Capacitances
In transient simulation, MOS capacitances are very important for CMOS and BiCMOS circuit analysis. The MOS capacitances can be divided into two types of lumped capacitors:
• the depletion capacitors of the bulk-drain and bulk-source pn junctions ($C_{BD}$ and $C_{BS}$) [Fig. 3.8];
• the capacitors associated with the gate ($C_{GS}$, $C_{GD}$, $C_{GB}$, $C_{GSov}$, $C_{GDov}$ and $C_{GBov}$) [see Fig. 3.8, except for $C_{GBov}$].
3.2.4.1 Junction Depletion Capacitances
The bulk-source and the bulk-drain junctions have a bottom area $A_S$ and $A_D$ respectively and a sidewall with a perimeter $P_S$ and $P_D$ respectively. Both the bottom area and the sidewall contribute to the total depletion capacitance. The bottom area capacitance is measured per unit area, while the sidewall capacitance is measured per unit perimeter. Both of these components are voltage dependent. As these junctions are normally reverse biased, we will consider the case when the bulk-source and bulk-drain voltages ($V_{BS}$ and $V_{BD}$) are less than or equal to $0.5\phi_j$ ($\phi_j$ is the junction built-in potential). The total bulk-source and bulk-drain capacitances can be expressed by the following relations [1]

The exponential factors $M_j$ and $M_{jsw}$ are in the order of 0.3-0.5. $C_j$ is the zero-bias capacitance of the bottom junction per unit area and $C_{jsw}$ is the zero-bias capacitance per unit perimeter.
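The relations themselves did not survive in this copy. As a hedged sketch, the code below assumes the usual form in which each component is its zero-bias value divided by $(1 - V/\phi_j)$ raised to the corresponding grading exponent, i.e. $C = C_j A/(1 - V/\phi_j)^{M_j} + C_{jsw} P/(1 - V/\phi_j)^{M_{jsw}}$. The function name and all numerical values are illustrative assumptions, not taken from the text.

```python
def junction_capacitance(area, perimeter, c_j, c_jsw, m_j, m_jsw, v, phi_j):
    """Total depletion capacitance of a bulk junction (assumed form).

    area in m^2, perimeter in m, c_j in F/m^2, c_jsw in F/m, v and phi_j in V.
    """
    if v > 0.5 * phi_j:
        raise ValueError("expression assumed valid only for v <= 0.5 * phi_j")
    bottom = c_j * area / (1.0 - v / phi_j) ** m_j
    sidewall = c_jsw * perimeter / (1.0 - v / phi_j) ** m_jsw
    return bottom + sidewall


# Hypothetical 2 um x 2 um source diffusion.
params = dict(area=4e-12, perimeter=8e-6, c_j=3e-4, c_jsw=3e-10,
              m_j=0.5, m_jsw=0.33, phi_j=0.8)
c_zero_bias = junction_capacitance(v=0.0, **params)
c_reverse = junction_capacitance(v=-3.0, **params)
print(f"zero bias: {c_zero_bias * 1e15:.2f} fF, reverse bias: {c_reverse * 1e15:.2f} fF")
```

Reverse bias widens the depletion layer, so the computed capacitance drops below its zero-bias value.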
3.2.4.2 Gate Capacitances
The gate capacitances can be divided into two categories:

• The fixed overlap capacitances: gate-drain ($C_{GDov}$), gate-source ($C_{GSov}$), and gate-bulk ($C_{GBov}$) overlap capacitances. Both $C_{GSov}$ and $C_{GDov}$ exist due to the lateral diffusion of the source and drain under the gate. They are usually given per unit width as $C_{GSO}$ and $C_{GDO}$. The total gate-source and gate-drain overlap capacitances are given by:

$$C_{GSov} = C_{GSO} W_{eff} \qquad (3.65)$$
$$C_{GDov} = C_{GDO} W_{eff} \qquad (3.66)$$

where $C_{GSO}$ and $C_{GDO}$ are equal to $C_{ox} L_D$. The capacitor $C_{GBov}$ is due to the overlap of the gate oxide and the bulk along the channel length at both ends of the active area of the transistor. This capacitance is typically normalized to the effective channel length; the total $C_{GBov}$ is hence given by

$$C_{GBov} = C_{GBO} L_{eff} \qquad (3.67)$$

where $C_{GBO}$ is equal to $C_{ox} W_d$.

• The nonlinear capacitance due to the charge of the bulk or the channel. This capacitance is actually distributed but can be modeled by lumped capacitances. In the case when the channel does not exist, the capacitance can be expressed as

$$C_{GB} = C_{ox} W_{eff} L_{eff} \qquad (3.68)$$

When the device is in the linear region, the channel extends uniformly from the source to the drain. The channel shields the bulk and the capacitance exists only between the gate and the channel. The gate-bulk capacitance goes to zero. The gate-channel capacitance can be expressed in terms of two equal lumped capacitances, a gate-source and a gate-drain capacitance, which are denoted $C_{GS}$ and $C_{GD}$ and are given by

$$C_{GS} = C_{GD} = \frac{1}{2} C_{ox} W_{eff} L_{eff} \qquad (3.69)$$

Finally, when the device enters saturation, the channel at the drain pinches off and hence the gate-drain capacitance component becomes zero, while the gate-source capacitance can be expressed by

$$C_{GS} = \frac{2}{3} C_{ox} W_{eff} L_{eff} \qquad (3.70)$$
Fig. 3.9 depicts the change of the capacitance components as a function of the gate-source voltage (assuming that the source-bulk voltage is zero). The total gate-source capacitance is given by the summation of $C_{GSov}$ and $C_{GS}$ and, similarly, the total gate-drain capacitance is given by the summation of $C_{GDov}$ and $C_{GD}$. The capacitance model described above can be used for circuit analysis and circuit design. SPICE uses a charge-control model, which was developed by Ward and Dutton [9]. This model is based on the actual distribution of charge in the MOS structure and its conservation.
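The lumped gate-capacitance model of Equations (3.65) to (3.70) can be summarized by region of operation. The sketch below is only a hand-calculation aid; the function name, the region labels and the numerical values are assumptions made for the example, and the gate-bulk overlap term is left out for brevity.

```python
def gate_capacitances(region, c_ox, w_eff, l_eff, c_gso, c_gdo):
    """Return total (C_GS, C_GD, C_GB) in farads for one operating region.

    c_ox in F/m^2, w_eff and l_eff in m, c_gso and c_gdo in F/m.
    """
    c_gate = c_ox * w_eff * l_eff
    intrinsic = {
        "cutoff": (0.0, 0.0, c_gate),                   # Eq. (3.68)
        "linear": (0.5 * c_gate, 0.5 * c_gate, 0.0),    # Eq. (3.69)
        "saturation": (2.0 * c_gate / 3.0, 0.0, 0.0),   # Eq. (3.70)
    }
    c_gs, c_gd, c_gb = intrinsic[region]
    # Add the fixed overlap terms of Eqs. (3.65) and (3.66).
    return c_gs + c_gso * w_eff, c_gd + c_gdo * w_eff, c_gb


# Hypothetical device: W = 10 um, L = 0.8 um, C_ox about 2.3 fF/um^2.
device = dict(c_ox=2.3e-3, w_eff=10e-6, l_eff=0.8e-6, c_gso=3e-10, c_gdo=3e-10)
for region in ("cutoff", "linear", "saturation"):
    c_gs, c_gd, c_gb = gate_capacitances(region, **device)
    print(f"{region:10s} C_GS={c_gs * 1e15:.2f} fF  "
          f"C_GD={c_gd * 1e15:.2f} fF  C_GB={c_gb * 1e15:.2f} fF")
```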
3.3 CMOS LOW-VOLTAGE ANALYTICAL MODEL
The MOS models discussed previously have been developed for circuit simulators. These models (e.g. BSIM) involve large numbers of parameters whose values must be derived from device measurements. With these models it is difficult to develop an intuitive understanding of the device behavior. Therefore, an analytical drain current model valid for submicrometer MOSFETs operating at low voltage is needed for hand calculation and first order circuit analysis, with reasonable accuracy.
3.3.1 Threshold Voltage Definitions
The threshold voltage, $V_T$, has several definitions, which are important for the estimation of the static power dissipation. The first definition is the extrapolated threshold voltage from the $I_{DS}$-$V_{GS}$ characteristic [see Section 3.2.1]. Another one is the constant-current (i.e., 10 nA per unit width) threshold voltage. These voltages do not have the same value [10, 11]. The extrapolated $V_T$ is approximately 0.2 V higher than the constant-current one [11]. The extrapolated threshold voltage should be scaled down proportionally to the supply voltage. This is because the drive (saturation) current depends on ($V_{DD} - V_T$(extrapolated)).
3.3.2 Subthreshold Current
When the threshold voltage is scaled for low power supply voltage operation, the subthreshold current increases significantly. This current is a limiting factor for battery operated circuits. As shown in Fig. 3.10, the drain current in the subthreshold region can be modeled by

$$I_{DS,sub} = \frac{W_{eff}}{W_0}\, I_0\, 10^{(V_{GS} - V_T)/S} \qquad (3.71)$$

where $V_T$ here is the constant-current threshold voltage. $I_0$ and $W_0$ are the drain current and the gate width used to define $V_T$. $S$ is the subthreshold swing parameter, which is the gate voltage swing required to reduce the drain current by one decade. The current $I_0$ is related to $V_{DS}$ by

$$I_0 = I_0'\,(1 - e^{-V_{DS}/V_t}) \qquad (3.72)$$

The subthreshold swing is given by [12]

$$S \approx 2.3\, V_t \left(1 + \frac{C_d}{C_{ox}}\right) \ \text{V/decade} \qquad (3.73)$$

where $V_t = kT/q$ is the thermal voltage and $C_d$ is the depletion-layer capacitance of the source/drain junctions. Thus, $S$ has a theoretical minimum limit which is 60 mV/decade.

The leakage current, due to the subthreshold conduction, is computed from $I_{DS,sub}$ when $V_{GS} = 0$. Then

$$I_{leak} = \frac{W_{eff}}{W_0}\, I_0\, 10^{-V_T/S} \qquad (3.74)$$
Using the examples of Fig. 3.10, typical values for the constant-current and extrapolated threshold voltages are 0.3 V and 0.5 V respectively. The parameter $S$ is equal to 75 mV/decade and the leakage current is equal to 1 pA/µm. When estimating the static power dissipation, the worst-case leakage current has to be evaluated. In this case, the worst-case threshold voltage, $V_{T,wc}$, has to be used where

$$V_{T,wc} = V_T - \Delta V_T \qquad (3.75)$$

$\Delta V_T$ is the variation of the threshold voltage due to fluctuations of process parameters such as the oxide thickness, doping profile, junction depth, gate width and length, etc. $\Delta V_T$ can be as high as 50 mV on the same wafer and 150 mV for different wafers. This results in an increase of almost two decades in the leakage current. The temperature effect also has to be considered when the leakage current is computed. The temperature affects both $V_T$ and $S$. A typical value of the temperature coefficient of the threshold voltage is a 1.6 mV decrease per degree Celsius. The subthreshold swing $S$ increases by 0.25 mV/(decade·°C) [see Equation 3.73]. For example, if the temperature increases from 25 °C to 75 °C, the threshold voltage decreases by 80 mV and the leakage current equals 30 pA/µm (initial extrapolated $V_T$ = 0.5 V). This value is 30 times higher than that at 25 °C. Both the temperature and process effects can result in a drastic increase of the worst-case static power dissipation. Note that this variation of $V_T$ greatly affects the delay of CMOS circuits at low supply voltage, since the drive current is proportional to ($V_{DD} - V_T$).
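The leakage figures quoted above can be checked with a few lines of arithmetic based on $I_{leak} = I_0 \cdot 10^{-V_T/S}$ (Equation (3.74) with $W_{eff} = W_0$). The sketch assumes $I_0$ = 10 nA/µm, a value chosen because it reproduces the quoted 1 pA/µm at $V_T$ = 0.3 V and $S$ = 75 mV/decade; the function and variable names are illustrative.

```python
def leakage_per_um(i0, v_t_cc, swing):
    """Subthreshold leakage in A/um: I_leak = I0 * 10**(-V_T / S)."""
    return i0 * 10.0 ** (-v_t_cc / swing)


I0 = 10e-9       # A/um, assumed reference current of the constant-current V_T
VT_25C = 0.3     # V, constant-current threshold voltage at 25 C
S_25C = 0.075    # V/decade, subthreshold swing at 25 C


def leakage_at_temperature(temp_c):
    """Apply -1.6 mV/C to V_T and +0.25 mV/(decade C) to S, as in the text."""
    delta_t = temp_c - 25.0
    v_t = VT_25C - 1.6e-3 * delta_t
    swing = S_25C + 0.25e-3 * delta_t
    return leakage_per_um(I0, v_t, swing)


leak_25 = leakage_at_temperature(25.0)
leak_75 = leakage_at_temperature(75.0)
# Wafer-to-wafer worst case: V_T lowered by 150 mV at 25 C.
leak_worst_case = leakage_per_um(I0, VT_25C - 0.150, S_25C)

print(f"25 C: {leak_25 * 1e12:.1f} pA/um")
print(f"75 C: {leak_75 * 1e12:.1f} pA/um ({leak_75 / leak_25:.0f}x)")
print(f"150 mV lower V_T: {leak_worst_case / leak_25:.0f}x")
```

Under these assumptions the temperature rise gives roughly a 30-fold increase and the 150 mV threshold shift gives two decades, in line with the figures in the text.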
3.3.3 Low-Voltage Drain Current
A part of this model is based on the one proposed by [13]. For long-channel devices, the carrier drift velocity $v$ is related to the horizontal electric field $E$ by a simple linear relation ($v = \mu E$) where the carrier mobility is constant. For short-channel devices, the mobility is no longer a constant and is a function of the vertical electric field in the inversion layer. At this point we prefer to use the symbol $\mu_{eff}$ for the mobility to denote its dependence on the vertical electric field. Also, the velocity ($v$) is no longer proportional to $E$ but is given by the following two-region piecewise empirical model [14]
where

$$E_c = \frac{2 v_{sat}}{\mu_{eff}} \qquad (3.77)$$

where the saturation velocity $v_{sat}$ is equal to $8 \times 10^6$ cm/s for electrons (NMOS device) and $6.5 \times 10^6$ cm/s for holes (PMOS device). The drain current in the triode region ($V_{DS} \le V_{DSsat}$) is given by [13]
The saturation current can be expressed by

$$I_{DSsat} = v_{sat} C_{ox} W_{eff} (V_{GS} - V_T - V_{DSsat}) \qquad (3.79)$$

By equating (3.78) and (3.79) we can derive the following expression for $V_{DSsat}$

$$V_{DSsat} = (1 - K)(V_{GS} - V_T) \qquad (3.80)$$

where
(3.81)
The drain current in saturation can be rewritten as

$$I_{DSsat} = K v_{sat} C_{ox} W_{eff} (V_{GS} - V_T) \qquad (3.82)$$
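Note that $V_T$, in the current equation, is the extrapolated threshold voltage. The mobility $\mu_{eff}$ for electrons can be expressed as [15]

$$\mu_n = 240 \sqrt{\frac{0.06\, t_{ox}}{V_{GS} + V_T}} \quad \text{for N+ poly-gate} \qquad (3.83)$$

and for holes

$$\mu_p = \begin{cases} 65 \left[ \dfrac{0.06\, t_{ox}}{V_{GS} - V_T} \right]^{1/2} & \text{for P+ poly-gate} \\[2ex] 65 \left[ \dfrac{0.06\, t_{ox}}{V_{GS} - V_T - 1} \right]^{1/2} & \text{for N+ poly-gate} \end{cases} \qquad (3.84)$$

where $t_{ox}$ is in Å and the mobility in cm²/(V·s). This analytical model can be used for gate lengths down to the deep-submicron range.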
3.4 CMOS POWER SUPPLY VOLTAGE SCALING
Scaling device feature size has been used to increase packing density and speed. MOSFET scaling can follow three theories:
1. Constant Electric Field (CE) scaling [16].
2. Constant Voltage (CV) scaling [17].
3. Quasi-Constant Voltage (QCV) scaling [17].

[Table 3.3: Scaling expressions for the dimensions, gate oxide, doping, voltage, capacitance, current, gate delay, dynamic power and dynamic energy.]
In the CE scheme, all horizontal and vertical dimensions and voltages scale linearly with the same factor. In the CV scheme, the dimensions are scaled, while the voltages are kept constant. This scenario has been the most commonly used. While the constant electric field scaling is natural from the device physics point of view, the constant voltage scaling is more practical from the systems standpoint. Changing the supply voltage every technology generation (when the feature sizes are scaled) is too expensive because multiple power-supply generators will be required for each PC board. However, as the channel length scales below about 0.6 µm, the 5 V supply voltage must be reduced for reliability reasons (e.g. hot carrier effects, breakdown, etc.). The quasi-constant voltage scaling is an intermediary scheme between the CE and CV views. The scaling factors of the horizontal dimensions and the voltage are denoted by $k_h$ and $k_v$, respectively. Table 3.3 summarizes the scaling of the important device parameters according to the three theories as a function of the horizontal scaling factor ($k_h$). Note that in the QCV scheme, the dimensions scale more aggressively than the voltage ($k_v = k_h^{1/2}$).
For the drain current, the following average value is used

$$I_{DS} \propto \frac{W}{L}\, C_{ox}\, (V_{GS} - V_T)^{1.5} \qquad (3.85)$$

This expression is not far from the one proposed by [13]. Table 3.3 shows the effect of device scaling on the delay, power and energy. It is assumed that a gate drives other gates, where the load is mainly the gate capacitance. The threshold voltage is scaled proportionally to the $V_{DD}$ scaling. The gate delays improve with scaling for all the scenarios, but at a better rate in the CV scheme. However, the dynamic power of the gate, at maximal frequency, increases by a factor $k_h$ in the case of CV. For the CE scheme, the power is reduced by a high factor equal to $k_h^{1.5}$. Also in this table, the dynamic energy dissipated by a gate is reported. This is independent of frequency. For all schemes, it improves significantly, particularly for the CE case.
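The delay, power and energy trends can be re-derived from Equation (3.85) by tracking exponents of $k_h$. The sketch below is a back-of-the-envelope check, not a reproduction of Table 3.3: it assumes that all dimensions and the gate oxide thickness scale as $1/k_h$, that the load is a gate capacitance $C_{ox} W L$, that delay goes as $CV/I$, and that the QCV voltage scales as $k_h^{-1/2}$. The function name is hypothetical.

```python
from fractions import Fraction


def scaling_exponents(voltage_exp):
    """Exponents of k_h for a gate under I_DS ~ (W/L) C_ox (V_GS - V_T)^1.5.

    voltage_exp: voltages scale as k_h**(-voltage_exp); dimensions and the
    gate oxide thickness are assumed to scale as k_h**(-1).
    """
    v = -Fraction(voltage_exp)
    c_ox = Fraction(1)                     # C_ox ~ 1 / t_ox
    cap = c_ox - 2                         # C = C_ox * W * L
    current = c_ox + Fraction(3, 2) * v    # W/L is unchanged by scaling
    delay = cap + v - current              # t ~ C * V / I
    energy = cap + 2 * v                   # E ~ C * V^2
    power = energy - delay                 # P ~ E / t at maximal frequency
    return {"delay": delay, "power": power, "energy": energy}


ce = scaling_exponents(1)                  # constant electric field
cv = scaling_exponents(0)                  # constant voltage
qcv = scaling_exponents(Fraction(1, 2))    # quasi-constant voltage (assumed)

for name, result in (("CE", ce), ("CV", cv), ("QCV", qcv)):
    print(name, {key: str(value) for key, value in result.items()})
```

Under these assumptions the CV delay improves fastest (exponent -2 versus -3/2 for CE), the CV power grows as $k_h$, and the CE power falls as $k_h^{-1.5}$.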
Scaling the supply voltage is an efficient way to reduce the power consumption. However, to get better performance at low voltage, the device sizes and the threshold voltage have to be properly scaled. For a fixed sub-micron technology, the supply voltage cannot be reduced aggressively, otherwise the speed is degraded. However, for each fixed technology generation, there is a lower limit power supply voltage $V_{DD,min}$ [18]. For $V_{DD}$ values higher than this minimum limit, the speed does not improve significantly. Typical values for $V_{DD,min}$ are 3.3 V and 2.5 V for $L_{eff}$ of 0.5 µm and 0.3 µm, respectively. On the other hand, the higher limit of $V_{DD}$ is driven by the reliability and power dissipation limitations. The value of this $V_{DD}$ is proportional to the square root of the design rules ($\delta$) [19]. For 0.6 µm and 0.3 µm design rules with LDD structure, these high limits are 4.5 V and 3.3 V, respectively.
3.5 MODELING OF THE BIPOLAR TRANSISTOR
3.5.1 BJT Structure and Operation
Fig. 3.11 shows a cross-sectional view of an NPN bipolar junction transistor with its geometrical layout and the corresponding symbols for NPN and PNP. To understand the basic operation of the bipolar transistor, a one-dimensional representation of the active region can be used. Fig. 3.12(a) illustrates a typical profile of the one-dimensional section of the active region [Fig. 3.12(b)]. The N+PN- sandwich forms the heart of the BJT. Consider an NPN transistor with $V_{BE} > 0.5$ V and $V_{BC} < 0$ V (forward-active mode). The corresponding energy band diagram is shown in Fig. 3.12(c). When the N+P (emitter-base) junction is forward-biased, electrons are injected from the emitter into the base (current $I_{nE}$). A small fraction of these electrons recombine in the neutral base ($I_{rB}$)*. The rest of the electrons, which constitute the current $I_{nC}$, diffuse through the base towards the reverse-biased base-collector junction, where they are swept by the electric field into the base-collector depletion layer. On the other hand, some of the holes in the base are injected into the N+ emitter region, resulting in a current $I_{pE}$. This component is small compared to $I_{nE}$ because the hole concentration in the base is much smaller than the electron concentration in the emitter. The emitter-base depletion layer can be a site for the recombination between the injected electrons and holes, resulting in a current $I_{rd}$. Moreover, some holes are swept into the base due to the generation in the base-collector depletion region, but this component is very small ($\approx 10^{-17}$ A/µm²). The terminal currents can be written as follows

$$I_C = I_{nC} \qquad (3.86)$$
$$I_B = I_{pE} + I_{rd} + I_{rB} \qquad (3.87)$$
$$I_E = I_{nE} + I_{rd} + I_{pE} \qquad (3.88)$$
Note that it has been assumed that the base and collector currents are flowing into the device, while the emitter current is flowing out of it [Fig. 3.12]. The emitter injection efficiency, which is defined as the ratio of the electron current injected into the base to the total emitter current, is given by
(3.89)
This ratio has to be near unity; that is, the emitter current should mostly be due to electrons for an NPN transistor. The ratio

$$\beta = \frac{I_C}{I_B} \qquad (3.90)$$

is defined as the DC current gain.
When the emitter-base junction is reverse-biased and the collector-base junction is forward-biased, the transistor is in the inverse region, where the roles of the emitter and collector are exchanged. When both junctions are reverse-biased, the transistor is in the cutoff region. When both are forward-biased, the device is said to be in the saturation region. In this situation, both junctions are injecting into the base, and the small electric fields in the two depletion regions sweep the carriers into the emitter and collector regions. Both junctions collect as well as emit.
3.5.2 Ebers-Moll Model
In this section, we present the Ebers-Moll (EM) model, which is a simple DC model of the bipolar transistor. The Ebers-Moll model can be used for hand calculations and first order circuit analysis. The derivation of the model equations in this section is based on the analysis by Roulston [20]. In Section 3.5.1, we discussed the device operation in the forward active region only. For a general analysis, we assume that the base-emitter and the base-collector junctions are forward biased. In the following discussion we will neglect the currents due to recombination in the space charge layers and in the base. This implies that $I_{nC} = I_{nE}$; hence, Equation (3.88) reduces to

$$I_E = I_{nC} + I_{pE} \qquad (3.91)$$
The current due to the holes injected from the base into the emitter is given by [20]

$$I_{pE} = \frac{q A_E D_{pE}\, p_{nE0}}{W_E} \left[ e^{V_{BE}/V_t} - 1 \right] \qquad (3.92)$$

where $p_{nE0}$ is the equilibrium hole concentration in the emitter and $W_E$ is the neutral emitter width. The current $I_{nC}$ is dominated by the diffusion current in the base and is proportional to the gradient of the minority carriers (electrons) in the neutral base. Because the neutral base width ($W_B$) is very thin, this gradient is approximately constant. Therefore, we can write $I_{nC}$ as [20]

$$I_{nC} = q A_E D_{nB} \left[ \frac{n_B(0) - n_B(W_B)}{W_B} \right] \qquad (3.93)$$
where $n_B(0)$ and $n_B(W_B)$ are the electron concentrations at the edges of the emitter-base and collector-base depletion regions respectively [see Fig. 3.13]. Note that the slope of the electron concentration in the base is given by the term between the brackets, as demonstrated by Fig. 3.13.

* By applying KCL (i.e. $I_B + I_C - I_E = 0$), we see that $I_{rB}$ is the difference between $I_{nE}$ and $I_{nC}$. If the recombination in the base is neglected ($I_{rB} = 0$), we can assume that $I_{nE} \approx I_{nC}$.

Using the junction law, the electron concentrations $n_B(0)$ and $n_B(W_B)$ can be expressed in terms of $V_{BE}$ and $V_{BC}$ respectively. The current $I_{nC}$ can hence be given by [20]
where $N_B$ is the base impurity concentration. The collector current is given by

$$I_C = I_{nC} - I_{pC} \qquad (3.95)$$

The current $I_{pC}$ is due to the holes injected from the base to the collector*.

* Note that $I_{pC}$ was not included in Equation (3.86) because in deriving Equation (3.86) we have assumed that the collector-base junction was reverse biased.

The base-collector junction is basically a P+NN+ structure as shown in Fig. 3.12(a). An expression for $I_{pC}$ can be derived from the analysis of a P+NN+ diode. The reader is advised to consult reference [20] for the details of this analysis. The current $I_{pC}$ is given by
where $p_{nC0}$ is the equilibrium hole concentration in the collector, $W_C$ is the epitaxial thickness under the base and $\tau_p$ is the hole lifetime in the epitaxial layer. By substituting from Equations (3.92) and (3.94) in Equation (3.91) and from Equations (3.94) and (3.96) in Equation (3.95), we get the following equations for $I_E$ and $I_C$

$$I_E = I_f - \alpha_r I_r \qquad (3.97)$$
$$I_C = -I_r + \alpha_f I_f \qquad (3.98)$$

Equations (3.97) and (3.98) are called the Ebers-Moll equations. Fig. 3.14 shows the equivalent circuit of the BJT based on the Ebers-Moll equations. The Ebers-Moll model described above is general and can be used for any region of operation by substituting the appropriate values for $V_{BE}$ and $V_{BC}$. In the forward active region, assuming that $V_{BE} = 0.8$ V and $V_{BC} < 0.3$ V, the emitter and collector currents of Equations (3.97) and (3.98) reduce to

$$I_E = I_f \approx I_{ES}\, e^{V_{BE}/V_t} \qquad (3.102)$$

where the reverse saturation current of the base-emitter junction, $I_{ES}$, can be derived from Equation (3.99) and is given by
Figure 3.14 Equivalent DC circuit of the BJT based on the Ebers-Moll model
It can easily be shown that the base current can be expressed as

$$I_B = \frac{1 - \alpha_f}{\alpha_f}\, I_C \qquad (3.105)$$

Equations (3.102), (3.103) and (3.105) are the well-known current equations of a forward biased bipolar transistor. Note that Equation (3.105) yields the famous relation between $\alpha_f$ and the DC forward current gain $\beta_f$ [$\beta_f = \alpha_f/(1 - \alpha_f)$]. The simple Ebers-Moll model lacks accuracy for the following three reasons
1. It does not account for the parasitic resistors of the emitter, base and collector.
2. It does not account for the Early effect, which causes the collector current to increase as the collector-emitter voltage increases.
3. It does not account for the effect of high collector currents on the current gain.

Next, we will discuss the modeling of each phenomenon separately.
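Before these refinements are added, the basic Ebers-Moll equations (3.97) and (3.98) can be exercised numerically. The sketch below assumes the usual diode forms $I_f = I_{ES}(e^{V_{BE}/V_t} - 1)$ and $I_r = I_{CS}(e^{V_{BC}/V_t} - 1)$ for the forward and reverse currents, since their defining equations are not legible in this copy. The function name, the symbol $I_{CS}$ and all parameter values are illustrative assumptions.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts


def ebers_moll(v_be, v_bc, i_es, i_cs, alpha_f, alpha_r):
    """Return (I_E, I_C, I_B) from the Ebers-Moll equations (3.97), (3.98)."""
    i_f = i_es * (math.exp(v_be / V_T) - 1.0)   # assumed forward diode current
    i_r = i_cs * (math.exp(v_bc / V_T) - 1.0)   # assumed reverse diode current
    i_e = i_f - alpha_r * i_r
    i_c = alpha_f * i_f - i_r
    return i_e, i_c, i_e - i_c                  # I_B = I_E - I_C by KCL


# Forward-active bias with hypothetical saturation currents.
i_e, i_c, i_b = ebers_moll(v_be=0.8, v_bc=-2.0, i_es=1e-17, i_cs=1.98e-17,
                           alpha_f=0.99, alpha_r=0.5)
beta = i_c / i_b
print(f"I_C = {i_c * 1e6:.1f} uA, I_B = {i_b * 1e6:.2f} uA, beta = {beta:.1f}")
```

In the forward-active region the computed gain approaches $\alpha_f/(1 - \alpha_f)$ = 99 for the assumed $\alpha_f$ = 0.99, consistent with the relation that follows from Equation (3.105).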
3.5.2.1 The Parasitic Resistors of a Bipolar Transistor
Fig. 3.15 shows the modification of the EM model by the addition of the base resistance $R_B$, the collector resistance $R_C$ and the emitter resistance $R_E$. These extrinsic components represent the transistor's parasitic resistances from the active region to the base, collector and emitter terminals, respectively. The effect of the parasitic resistances is important because the voltage drops across them contribute to the external base-emitter and collector-emitter voltages $V_{B'E'}$ and $V_{C'E'}$, respectively, as shown by the following two equations

$$V_{B'E'} = V_{BE} + R_B I_B + R_E I_E \qquad (3.106)$$
$$V_{C'E'} = V_{CE} + R_C I_C + R_E I_E \qquad (3.107)$$

The drop across the parasitic resistors has to be accounted for to get more accurate results from the EM model. Neglecting these drops may even lead to erroneous results. For example, if the external collector-emitter voltage is found to be equal to 2 V, one may deduce that the BJT operates in the active region. However, if $R_C = 1.8$ kΩ, $R_E = 0.1$ kΩ and $I_C \approx I_E = 1$ mA, then the intrinsic collector-emitter voltage ($V_{CE}$) is 0.1 V. This implies that the bipolar transistor is actually saturated. This phenomenon is known as Quasi-Saturation.
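The quasi-saturation example is a one-line rearrangement of Equation (3.107), $V_{CE} = V_{C'E'} - R_C I_C - R_E I_E$. The short check below uses the numbers from the example; the function name is hypothetical.

```python
def intrinsic_vce(v_ce_external, r_c, r_e, i_c, i_e):
    """Solve Equation (3.107) for the intrinsic collector-emitter voltage."""
    return v_ce_external - r_c * i_c - r_e * i_e


# External V_CE = 2 V, R_C = 1.8 kOhm, R_E = 0.1 kOhm, I_C = I_E = 1 mA.
v_ce = intrinsic_vce(2.0, 1.8e3, 0.1e3, 1e-3, 1e-3)
print(f"intrinsic V_CE = {v_ce:.2f} V")
```

Most of the 2 V seen at the terminals drops across $R_C$, leaving only about 0.1 V across the intrinsic device.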
3.5.2.2 The Early Effect
The Early effect refers to the base width modulation due to the change of the collector-base reverse voltage (in the forward active region). As the collector-base reverse voltage increases, the base-collector depletion layer widens. The resulting reduction in the neutral base width causes the current gain to increase which, in turn, leads to an increase in the collector current [see Fig. 3.16]. This effect can be modeled by introducing the Early voltage ($V_{Af}$) in the expression of the collector current as follows
(3.108)
The inverse of the forward Early voltage, $1/V_{Af}$, is analogous to the coefficient $\lambda$ in an MOS transistor. A typical value of $V_{Af}$ is 50 V. The AC output resistance of the BJT in the forward active region is related to the Early voltage and is given by

$$r_o = \frac{V_{Af}}{I_C} \qquad (3.109)$$

The Early effect in the inverse active region can be modeled by using the reverse Early voltage ($V_{Ar}$), which characterizes the slope of the collector current in that region.
Figure 3.16 The I-V characteristics of a BJT

3.5.2.3 High Current Effects
The current gain and the cut-off frequency are degraded at high collector current. Fig. 3.17 shows the effect of the collector current on the gain. This degradation can be attributed to high level injection in the base (Webster effect) and/or base pushout (Kirk effect). For a detailed discussion of these phenomena, the reader is advised to consult reference [20]. In the case where the injection level in the base is high (Webster effect), the collector current can be expressed as [23]

$$I_C = \sqrt{I_s I_{Kf}}\; e^{V_{BE}/2V_t} \qquad (3.110)$$
where the forward knee current $I_{Kf}$* is defined as the collector current at which its slope in the Gummel plot changes from 1 to 1/2 [see Fig. 3.18]. This current marks the onset of high level injection. The degradation of the current gain, when $I_C > I_{Kf}$, can be described by the following relation [20]

$$\beta = \frac{I_C}{I_B} = \beta_0 \frac{I_{Kf}}{I_C} \qquad (3.111)$$

where $\beta_0$ is the value of the gain when $I_C < I_{Kf}$. The modeling of the Kirk effect is very complex. However, a simple model for the current gain, which can be used in first order circuit analysis, is given below [21]
(3.112)
The accuracy of the simple EM model can be enhanced by accounting for the parasitic resistors, the Early effect and the high current effects, which can be modeled by simple analytical expressions as shown above.
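The gain roll-off of Equation (3.111) lends itself to a small piecewise helper for hand analysis. The sketch below treats the gain as constant at $\beta_0$ up to the knee current and as $\beta_0 I_{Kf}/I_C$ above it; joining the two regions exactly at $I_C = I_{Kf}$ is a simplifying assumption, and the function name and values are illustrative.

```python
def current_gain(i_c, beta_0, i_kf):
    """Piecewise gain: beta_0 below the knee, Equation (3.111) above it."""
    if i_c <= i_kf:
        return beta_0
    return beta_0 * i_kf / i_c


# Hypothetical device with beta_0 = 100 and a knee current of 2 mA.
for i_c in (0.5e-3, 2e-3, 4e-3, 20e-3):
    print(f"I_C = {i_c * 1e3:5.1f} mA -> beta = {current_gain(i_c, 100.0, 2e-3):.1f}")
```

Doubling the collector current above the knee halves the gain in this first-order picture.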
3.5.3 Bipolar Models in SPICE
Two BJT models are implemented in SPICE: the Ebers-Moll model and a more sophisticated one, which is based on the Gummel-Poon (GP) model [22]. The second model includes the following second order effects:
• Very low current effect on the gain.
• Base width modulation effect.
• High-level injection effects (the Kirk effect is not included).
• Base resistance variation with current.
The GP model is based on one-dimensional analysis. It is valid for all regions of operation: cutoff, forward-active, inverse-active, and saturation. The GP-based bipolar model is illustrated by the equivalent circuit shown in Fig. 3.19.

* A typical value of $I_{Kf}$ per unit area is 1 mA/µm².
The two back-to-back diodes on the right represent the intrinsic base-emitter and base-collector junctions and their currents are given by [23]

$$I_{CC} = \frac{I_s}{q_b} \left( e^{V_{BE}/n_f V_t} - 1 \right) \qquad (3.113)$$
$$I_{EC} = \frac{I_s}{q_b} \left( e^{V_{BC}/n_r V_t} - 1 \right) \qquad (3.114)$$

where $I_s$ is given by [23]
(3.115)
Figure 3.19 The GP-based model of a bipolar transistor

The forward and reverse current emission coefficients ($n_f$ and $n_r$), which are introduced in Equations (3.113) and (3.114), are used to model the low currents. The parameter $q_b$ (base charge factor) accounts for the high current and base width modulation effects. It is given by [23]

$$q_b = \frac{q_1}{2} + \sqrt{\frac{q_1^2}{4} + q_2} \qquad (3.116)$$

$q_1$ models the effects of base width modulation and can be expressed as

The general expression of $q_b$ [Equation (3.116)] can be simplified for low-level and high-level injection conditions.

$$q_b \approx \begin{cases} q_1 & \text{if } q_2 \ll q_1^2/4 \quad \text{(low-level injection)} \\ \sqrt{q_2} & \text{if } q_2 > q_1^2/4 \quad \text{(high-level injection)} \end{cases} \qquad (3.119)$$
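The two limiting cases can be checked numerically. The sketch below assumes the general form $q_b = q_1/2 + \sqrt{q_1^2/4 + q_2}$ for Equation (3.116), which is the form whose limits are $q_1$ and $\sqrt{q_2}$; the function name and the test values of $q_1$ and $q_2$ are arbitrary.

```python
import math


def base_charge_factor(q1, q2):
    """Base charge factor q_b in the general form assumed for Eq. (3.116)."""
    return q1 / 2.0 + math.sqrt(q1 * q1 / 4.0 + q2)


q_low = base_charge_factor(1.0, 1e-4)    # q2 << q1^2/4, expect about q1
q_high = base_charge_factor(1.0, 1e4)    # q2 >> q1^2/4, expect about sqrt(q2)

print(f"low-level injection:  q_b = {q_low:.4f} (q1 = 1)")
print(f"high-level injection: q_b = {q_high:.2f} (sqrt(q2) = 100)")
```

The high-level approximation is only asymptotic: for the values above it is about half a percent below the full expression.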
The two back-to-back diodes on the left [Fig. 3.19] account for the currents caused by the recombination of carriers in the emitter-base and the collector-base space-charge layers and other recombinations. These currents can be modeled by [23]

$$C_2 I_s \left( e^{V_{BE}/n_e V_t} - 1 \right) \qquad (3.120)$$
$$C_4 I_s \left( e^{V_{BC}/n_c V_t} - 1 \right) \qquad (3.121)$$

where $C_2$, $C_4$, $n_e$ and $n_c$ have been introduced to fit the measured currents. Further improvements to this model are possible by the inclusion of three parasitic resistances ($R_C$, $R_E$, $R_B$); three junction capacitances ($C_E$, $C_C$, $C_S$); and two diffusion capacitances ($C_{DE}$, $C_{DC}$), as shown in Fig. 3.19. The model of the base resistance takes into account the effect of the current (current crowding) through the following expression [24]

$$R_B(I) = R_{Bm} + 3\,(R_B - R_{Bm})\, \frac{\tan(z) - z}{z \tan^2(z)} \qquad (3.122)$$

where the variable $z$ is given by
$R_B$ represents the low-current maximum resistance and $R_{Bm}$ the high-current minimum resistance. The junction depletion capacitance is a function of the junction voltage ($V$). This function can be approximated by the following two expressions

$$C_{j,dep} = C_j \left(1 - \frac{V}{\phi_j}\right)^{-M_j} \quad \text{if } V < FC\,\phi_j \qquad (3.124)$$

The empirical factor FC has a value between 0 and 1. Its default value in SPICE is 0.5. Note that Equations (3.124) and (3.125) apply for a reverse and forward biased junction respectively. The diffusion capacitances model the charge associated with injected carriers. For example, the electrons injected in the base have a corresponding storage charge

$$Q_{DE} = \tau_f I_{CC} \qquad (3.126)$$
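The role of FC can be illustrated with a two-branch capacitance function. The first branch below follows Equation (3.124). The second branch is an assumption: it uses the linear extrapolation commonly used in SPICE-like simulators for $V \ge FC\,\phi_j$, because the corresponding expression (3.125) is not legible in this copy. The function name and the parameter values are illustrative.

```python
def junction_capacitance_fc(v, c_j0, phi_j, m_j, fc=0.5):
    """Depletion capacitance versus junction voltage v (all voltages in V).

    Below fc * phi_j the power law of Equation (3.124) is used; above it,
    an assumed linear extrapolation keeps the capacitance finite.
    """
    if v < fc * phi_j:
        return c_j0 * (1.0 - v / phi_j) ** (-m_j)
    scale = (1.0 - fc) ** (-(1.0 + m_j))
    return c_j0 * scale * (1.0 - fc * (1.0 + m_j) + m_j * v / phi_j)


# Hypothetical junction: 10 fF at zero bias, phi_j = 0.8 V, M_j = 0.4.
for v in (-3.0, 0.0, 0.39, 0.41, 0.8):
    c = junction_capacitance_fc(v, 10e-15, 0.8, 0.4)
    print(f"V = {v:5.2f} V -> C = {c * 1e15:.2f} fF")
```

The extrapolation is chosen so that the two branches meet at $V = FC\,\phi_j$, which avoids the singularity of the power law at $V = \phi_j$.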
The forward transit time $\tau_F$ is current-dependent and is given by an empirical expression [24]

$$ \tau_F = TF \left[ 1 + XTF \left( \frac{I_{CC}}{I_{CC} + ITF} \right)^2 e^{V_{BC}/(1.44\,VTF)} \right] \qquad (3.127) $$

where $VTF$ is a fitting parameter to model the change of $\tau_F$ as a function of $V_{BC}$ (or $V_{CB}$), $ITF$ models the change due to $I_C$, and $XTF$ controls the increase of $\tau_F$. $I_{CC}$ is the collector current in the absence of the high-current effects, which corresponds to that of the Ebers-Moll model. The diffusion capacitance (associated with the electrons injected from the emitter into the base, when the base-emitter junction is forward biased) is given by

$$ C_{DE} = \frac{\partial Q_{DE}}{\partial V_{BE}} \qquad (3.128) $$

Similarly, the base-collector junction has a diffusion capacitance, which is given by

$$ C_{DC} = \frac{\partial Q_{DC}}{\partial V_{BC}} \qquad (3.129) $$

where

$$ Q_{DC} = \tau_R I_{EC} \qquad (3.130) $$
Although the SPICE models account for most of the first- and second-order effects, they are not highly accurate. This originates from some weaknesses in the theory on which the models are based. As the device features are scaled down, the currently available models become less accurate. The physics and the theory of the scaled devices are more complex. Hence, accurate modeling becomes very difficult. One way around that problem is to choose the model parameters such that the simulated device characteristics agree with measurements. In practice, the models' parameters are extracted automatically using parameter analyzers with software tools to obtain the best fit. As a result, the values of the extracted parameters may not correspond to their actual values. For example, it is common to find a discrepancy of 20% between the measured current gain of a bipolar transistor and that listed in the SPICE file. Another approach, which is equivalent to tweaking the parameters, is to use empirical models (e.g., the BSIM model), in which the empirical (fitting) parameters can be optimized to get the best fit between simulation and measurements. Typical GP parameters for the 0.8 µm BiCMOS process presented in Chapter 2 are shown in Tables 3.4 and 3.5.
Table 3.4 Bipolar device parameters and HSPICE correspondence

SPICE Keyword   Description
IS              Saturation current
BF              Ideal maximum forward gain
BR              Ideal maximum reverse gain
NF              Forward current-emission coefficient
NR              Reverse current-emission coefficient
VAF             Forward Early voltage
VAR             Reverse Early voltage
IKF             Forward-knee current
IKR             Reverse-knee current
ISE             Base-emitter leakage saturation current
ISC             Base-collector leakage saturation current
NE              Base-emitter leakage emission coefficient
NC              Base-collector leakage emission coefficient
RE              Emitter resistance
RC              Collector resistance
RB              Base resistance at zero current
IRB             Base current where RB = RB(0)/2
RBM             Minimum high-current base resistance
CJE             Base-emitter zero-bias depletion cap.
VJE             Base-emitter built-in potential
MJE             Base-emitter junction grading factor
CJC             Base-collector zero-bias depletion cap.
VJC             Base-collector built-in potential
MJC             Base-collector junction grading factor
CJS             Collector-substrate zero-bias cap.
VJS             Collector-substrate built-in potential
MJS             Collector-substrate junction grading factor
XCJC            Internal base fraction of base-collector cap.
FC              Coefficient for forward-bias depletion cap.
TF              Forward transit time
XTF             TF bias-dependent coefficient
VTF             TF base-collector voltage dependence coeff.
ITF             TF high-current parameter
TR              Reverse transit time
XTB             Forward and reverse beta temperature exponent
XTI             Saturation current temperature exponent
EG              Energy gap
KF              Flicker noise coefficient
AF              Flicker noise exponent

Table 3.5 A SPICE BJT model parameters (0.8 µm BiCMOS process)
SPICE Keyword IS BF BR NF NR VA P VAR IKF IKR ISE
Vdue
Units A
Zx
100 1 1 1
sn . .
V
5 5n 10P
V
0.
A A
0.
A
Table 3.5 (continued)
RE RC RB IRB
RBM CJE VJE MJE CJC VJC
FC
30 87
650 0 650 1 . 5 1 ~lo-'' 0.87 0 265 1.15~10-14 o 713
XTI EG
0.5 12.5~ 916.2 1.6 a.7x 10-2 4 x 10W8 1.4 3.5 1.11
XF
2.9x10-e
AF
2.0
TF XTF VTF
ITF TR
XTB
n n n A 62 F V F V
Q
J
ev -
3.5.4 Chapter Summary

In this chapter, we have reviewed the fundamentals of the MOS and bipolar devices. The most common device models used in SPICE have been presented. The key device parameters of each model have been defined and explained, so that the reader is familiar with the details of these models and can appreciate the importance of the different model parameters. The reader is given a list of model parameters, for a typical 0.8 µm BiCMOS process, that can be used for circuit simulations. Those models can be used even at low-voltage operation. Moreover, a simple analytical model suited for submicrometer MOSFETs has been discussed.
REFERENCES

[1] A. Vladimirescu and S. Liu, "The Simulation of MOS Integrated Circuits Using SPICE2," Memo. No. UCB/ERL M80/7, Univ. California, Berkeley, October 1980.
[2] H. Masuda, M. Nakai, and M. Kubo, "Characteristics and Limitations of Scaled-Down MOSFET's Due to Two-Dimensional Field Effect," IEEE Trans. on Electron Devices, vol. ED-26, pp. 980-986, 1979.
[3] R.L.M. Dang, "A Simple Current Model for Short-Channel IGFET and Its Application to Circuit Simulation," IEEE Journal of Solid-State Circuits, vol. SC-14, pp. 358-367, 1979.
[4] G. Merckel, J. Borel, and N.Z. Cupcea, "An Accurate Large Signal MOS Transistor Model for Use in Computer-Aided Design," IEEE Trans. on Electron Devices, vol. ED-19, 1972.
[5] G. Baum and H. Beneking, "Drift Velocity Saturation in MOS Transistors," IEEE Trans. on Electron Devices, vol. ED-17, pp. 481-482, 1970.
[6] R.M. Swanson and J.D. Meindl, "Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits," IEEE Journal of Solid-State Circuits, vol. SC-7, pp. 146-153, 1972.
[7] B.J. Sheu, D.L. Scharfetter, P.-K. Ko, and M.C. Jeng, "BSIM: Berkeley Short-Channel IGFET Model for MOS Transistors," IEEE Journal of Solid-State Circuits, vol. SC-22, pp. 558-566, 1987.
[8] J.H. Huang, Z.H. Liu, M.C. Jeng, P.K. Ko, and C. Hu, "A Robust Physical and Predictive Model for Deep-Submicrometer MOS Circuit Simulation," IEEE Custom Integrated Circuits Conf., Tech. Dig., pp. 14.2.1-14.2.4, May 1993.
[9] D.E. Ward and R.W. Dutton, "A Charge-Oriented Model for MOS Transistor Capacitances," IEEE Journal of Solid-State Circuits, vol. SC-13, pp. 703-707, 1978.
[10] Y.P. Tsividis, "Operation and Modeling of the MOS Transistor," McGraw-Hill, 1988.
[11] T. Sakata et al., "Subthreshold-Current Reduction Circuits for Multi-Gigabit DRAM's," IEEE Journal of Solid-State Circuits, vol. 29, no. 7, pp. 761-769, July 1994.
[12] S.M. Sze, "Physics of Semiconductor Devices," John Wiley & Sons, 1981.
[13] C.G. Sodini, P.-K. Ko, and J.L. Moll, "The Effect of High Fields on MOS Device and Circuit Performance," IEEE Trans. on Electron Devices, vol. ED-31, no. 10, pp. 1386-1393, October 1984.
[14] B. Hoefflinger, H. Sibbert, and G. Zimmer, "Model and Performance of Hot-Electron MOS Transistors for VLSI," IEEE Trans. on Electron Devices, vol. ED-26, pp. 513, 1979.
[15] C. Hu, "Low-Voltage CMOS Device Scaling," IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 86-87, 1994.
[16] R.H. Dennard et al., "Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions," IEEE Journal of Solid-State Circuits, vol. SC-9, pp. 256-266, October 1974.
[17] P.K. Chatterjee et al., "The Impact of Scaling Laws on the Choice of N-Channel or P-Channel for MOS VLSI," IEEE Electron Device Letters, vol. EDL-1, pp. 220-223, October 1980.
[18] M. Kakumu, "Process and Device Technologies of CMOS Devices for Low-Voltage Operation," IEICE Trans. Electron., vol. E76-C, no. 5, pp. 672-680, May 1993.
[19] M. Kakumu, M. Kinugawa, and K. Hashimoto, "Choice of Power-Supply Voltage for Half-Micrometer and Lower Submicrometer CMOS Devices," IEEE Trans. Electron Devices, vol. 37, no. 6, pp. 1334-1342, May 1990.
[20] D.J. Roulston, "Bipolar Semiconductor Devices," McGraw-Hill Publishing Company, 1990.
[21] K. Nakazato et al., "Characteristics and Scaling Properties of n-p-n Transistors with a Sidewall Base Contact Structure," IEEE Trans. on Electron Devices, vol. ED-32, no. 2, pp. 328-332, February 1985.
[22] H.K. Gummel and H.C. Poon, "An Integral Charge Control Model of Bipolar Transistors," Bell Syst. Tech. J., vol. 49, 1970.
[23] I. Getreu, "Modeling the Bipolar Transistor," Tektronix, Inc., 1976.
[24] P. Antognetti and G. Massobrio, "Semiconductor Device Modeling with SPICE," McGraw-Hill, 1988.
4 LOW-VOLTAGE LOW-POWER VLSI CMOS CIRCUIT DESIGN
In this chapter we introduce the CMOS logic gate with the development of simple models for delay and power dissipation estimation. These analyses permit us to understand the mechanisms that control the performance, particularly the power dissipation, of a logic circuit. Several CMOS design styles, such as pseudo-NMOS, dynamic logic and NORA, are presented. Other circuit variations of the static complementary CMOS, which are suitable for low-power applications, are discussed. These include the pass-transistor logic families such as Complementary Pass-transistor Logic (CPL), Dual Pass-transistor Logic (DPL), and Swing Restored Pass-transistor Logic (SRPL). An overview of clocking strategy in VLSI systems is also covered. Included in this chapter is one important area, which is the I/O circuits. The power dissipation of the I/O circuits is also analyzed. Finally, low-power techniques for CMOS design are reviewed at the transistor level. We will cover the low-power issues at the subsystem/system/architecture levels in Chapters 6, 7 and 8 in more detail. Several books treat in detail other CMOS circuit design aspects [1, 2, 3]. The reader can refer to them. Many issues existing in today's advanced CMOS circuit structures are considered, such as:

• Power dissipation components of a CMOS gate and their importance;
• Concept of switching activity;
• Power dissipation in I/O circuits;
• Single-phase clocking strategy;
• Clock skew issue;
• Clock distribution in VLSI systems;
• Ground bouncing; and
• Low-power circuit techniques and design guidelines.
4.1 CMOS INVERTER DC CHARACTERISTICS

Fig. 4.1 shows the basic complementary MOS inverter. Before deriving the DC transfer characteristics of this inverter (the output voltage versus the input voltage), let us understand the operation of this circuit.

When the input is HIGH, which means at $V_{DD}$, we have

$$ V_{GSn} = V_{in} = V_{DD} \qquad (4.1) $$

$$ V_{GSp} = V_{in} - V_{DD} = 0 \qquad (4.2) $$

In this case, $V_{GSn} > V_{Tn}$ and $|V_{GSp}| < |V_{Tp}|$. The PMOS is OFF and the NMOS is ON. The NMOS transistor N provides a current path to ground. The final stable value of the output voltage $V_o$ is

$$ V_o = 0 \qquad (4.3) $$

At the steady state, the DC current from $V_{DD}$ to ground is controlled by the subthreshold current of the PMOS P, since this device is OFF and the NMOS N has a $V_{DS}$ equal to zero. We assume that the junction leakage is negligible. If $V_{Tp}$¹ is low enough (lower, for example, than -0.5 V), the subthreshold current is negligible (< 1 pA/µm width). If $V_{Tp}$ (negative) is high, the subthreshold current is not negligible and can be as high as 1 µA/µm for $V_{Tp}$ = -0.05 V [see Section 3.3.2]. In this case the output is not exactly at zero and can have a value of tens of mV. In this section we assume that the subthreshold current is not important. Low-$V_T$ CMOS circuits are treated in Section 4.10.

Similarly, when $V_{in}$ is low (0 V), $V_{GSn} < V_{Tn}$ and $|V_{GSp}| > |V_{Tp}|$. The PMOS transistor is ON and the NMOS transistor is OFF. The output voltage is given by

$$ V_o = V_{DD} \qquad (4.4) $$

Here too we assume that the leakage current is negligible.

¹ Extrapolated threshold voltage.
Figure 4.1 A CMOS inverter

The logic levels of the CMOS inverter are close to $V_{DD}$ and ground, and the logic swing is equal to $V_{DD}$. This is a main feature of CMOS gates.
4.1.1 Transfer Characteristics

In this section we discuss the DC characteristics of the CMOS inverter of Fig. 4.1. Fig. 4.2 shows the DC transfer characteristic with the different regions of operation. For simplicity we use, for the MOS devices, the simple current models presented in Section 3.2.1. The circuit operation can be divided into five regions:

Region (A): $0 \le V_{in} < V_{Tn}$. The NMOS transistor is operating in the subthreshold region and its current is assumed to be zero. Hence the PMOS current is also zero. The PMOS transistor is in the linear region. Thus, $V_o = V_{DD}$.
Region (B): $V_{Tn} < V_{in} < V_{inv}$. $V_{inv}$ is defined as the input voltage at which the gain of the inverter is maximum and is also defined as the gate threshold voltage. In this region, the NMOS transistor is operating in the saturation region and the PMOS is in the linear region. Since the current in both devices is the same (in absolute value), we have

$$ I_{DSp} = -I_{DSn} \qquad (4.5) $$

The PMOS current is given by

$$ I_{DSp} = -\beta_p \left[ (V_{in} - V_{DD} - V_{Tp})(V_o - V_{DD}) - \tfrac{1}{2}(V_o - V_{DD})^2 \right] \qquad (4.6) $$

where

$$ \beta_p = k_p \frac{W_{eff}}{L_{eff}} \qquad (4.7) $$

The saturation current of the NMOS is given by

$$ I_{DSn} = \frac{\beta_n}{2} (V_{GSn} - V_{Tn})^2 \qquad (4.10) $$

where

$$ \beta_n = k_n \frac{W_{eff}}{L_{eff}} \qquad (4.11) $$

and

$$ V_{GSn} = V_{in} \qquad (4.12) $$

Using Equations (4.5), (4.6) and (4.10), the output voltage is given by

$$ V_o = (V_{in} - V_{Tp}) + \sqrt{ (V_{in} - V_{Tp})^2 - 2\left(V_{in} - \frac{V_{DD}}{2} - V_{Tp}\right) V_{DD} - \frac{\beta_n}{\beta_p} (V_{in} - V_{Tn})^2 } \qquad (4.13) $$

This equation of $V_o$ versus $V_{in}$ is plotted in Fig. 4.2, region (B).

Region (C): $V_{in} = V_{inv}$. Both the NMOS and PMOS transistors are in the saturation region. In this case, the PMOS current is given by

$$ I_{DSp} = -\frac{\beta_p}{2} (V_{in} - V_{DD} - V_{Tp})^2 \qquad (4.14) $$
The NMOS saturation current is given in Equation (4.10). By equating the absolute values of the two drain currents we have

$$ V_{inv} = \frac{V_{DD} + V_{Tp} + V_{Tn}\sqrt{\beta}}{1 + \sqrt{\beta}} \qquad (4.15) $$

where

$$ \beta = \frac{\beta_n}{\beta_p} \qquad (4.16) $$

This equation is very useful from a design point of view. Note, from this equation, that the logic threshold voltage of this gate is set by the designer, since the parameters $\beta_n$ and $\beta_p$ are dependent on $W_{eff}$ and $L_{eff}$. Moreover, region (C) is defined for only one point of $V_{in}$. For symmetrical NMOS and PMOS devices we have

$$ V_{Tn} = -V_{Tp} \qquad (4.17) $$

If the designer sets

$$ \beta_n = \beta_p \qquad (4.18) $$

by choosing the transistor size ratio accordingly, we obtain

$$ V_{in} = V_{inv} = \frac{V_{DD}}{2} \qquad (4.21) $$

An inverter with this $V_{inv}$ is sometimes called a symmetrical gate. The output voltage in this case is not necessarily equal to $V_{DD}/2$ and is given by the following inequality

$$ V_{in} - V_{Tn} < V_o < V_{in} - V_{Tp} \qquad (4.22) $$

In reality, $V_o$ is set by the slight dependence of $I_{DS}$ on $V_{DS}$.
Region (D): $V_{inv} < V_{in} \le V_{DD} + V_{Tp}$. In this region the NMOS is in the linear region while the PMOS is in the saturation region. An analysis similar to that used in region (B) can be applied. The output voltage is given by

$$ V_o = (V_{in} - V_{Tn}) - \sqrt{ (V_{in} - V_{Tn})^2 - \frac{\beta_p}{\beta_n} (V_{in} - V_{DD} - V_{Tp})^2 } \qquad (4.23) $$

Region (E): $V_{DD} + V_{Tp} < V_{in} \le V_{DD}$. In this region the NMOS transistor is ON, and in the linear region, and the PMOS is operating in the subthreshold region. If we assume that the PMOS current is negligibly small, then

$$ V_o = 0 \qquad (4.24) $$

The current flowing from $V_{DD}$ to ground, versus the input voltage, is plotted in Fig. 4.2(b). It reaches its maximum when both MOS transistors are in saturation. It is important to note that for $V_{in} = V_{inv}$ the DC power dissipation would be maximal.
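The five regions can be gathered into one piecewise function. The Python sketch below is only an illustration of Equations (4.13), (4.15), (4.23) and (4.24) under the simple square-law model; the function names and the default values (3.3 V supply, 0.5 V thresholds, equal $\beta_n$ and $\beta_p$) are assumptions made for the example, not data from the text.

```python
import math


def inverter_threshold(vdd, vtn, vtp, beta):
    """Gate threshold voltage V_inv of Eq. (4.15); beta = beta_n / beta_p."""
    root = math.sqrt(beta)
    return (vdd + vtp + vtn * root) / (1.0 + root)


def inverter_vout(vin, vdd=3.3, vtn=0.5, vtp=-0.5, beta_n=1.0, beta_p=1.0):
    """Static output voltage of the CMOS inverter for a given input voltage."""
    beta = beta_n / beta_p
    vinv = inverter_threshold(vdd, vtn, vtp, beta)
    if vin <= vtn:                      # region (A): NMOS off
        return vdd
    if vin >= vdd + vtp:                # region (E): PMOS off, Eq. (4.24)
        return 0.0
    if vin < vinv:                      # region (B): Eq. (4.13)
        disc = ((vin - vtp) ** 2
                - 2.0 * (vin - vdd / 2.0 - vtp) * vdd
                - beta * (vin - vtn) ** 2)
        return (vin - vtp) + math.sqrt(max(disc, 0.0))
    if vin > vinv:                      # region (D): Eq. (4.23)
        disc = (vin - vtn) ** 2 - (vin - vdd - vtp) ** 2 / beta
        return (vin - vtn) - math.sqrt(max(disc, 0.0))
    # region (C): the model leaves V_o undetermined inside (4.22);
    # the midpoint of that interval is returned as a convention.
    return vin - 0.5 * (vtn + vtp)


for vin in (0.0, 1.0, 1.65, 2.3, 3.3):
    print(f"Vin = {vin:.2f} V  ->  Vo = {inverter_vout(vin):.3f} V")
```

Because the square-law model ignores the dependence of the saturation current on $V_{DS}$, the characteristic is vertical at $V_{in} = V_{inv}$; a circuit simulator gives a finite (but large) gain there.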
Figure 4.3 Effect of the ratio $\beta$ on (a) the DC transfer characteristic and (b) the threshold voltage of the CMOS inverter

4.1.2 Effect of $\beta$

As we discussed before, the ratio $\beta$ controls the threshold voltage of the CMOS inverter. This parameter is set by the circuit designer through the transistor sizes. Other parameters, such as the mobility and the threshold voltage of the devices, are set during fabrication and the circuit designer cannot change them. Fig. 4.3 illustrates the dependence of the DC transfer characteristics and the threshold voltage of the CMOS inverter on the ratio $\beta$. Increasing $\beta$ decreases the voltage $V_{inv}$. $V_{inv}$ has a practical maximum less than $V_{DD} + V_{Tp}$ and a practical minimum greater than $V_{Tn}$. Practical values mean that $\beta$ cannot be zero or infinite. In general, the circuit designer tries to set $\beta = 1$ for symmetrical operation, unless the gate is used to switch an input swing different from a CMOS swing (from ground to $V_{DD}$).
4.1.3 Noise Margins

Noise margin is an important parameter in logic design. It is defined as the allowable noise voltage on the input such that the output is not affected. In other words, we define the valid logic levels such that they are restored when they propagate through a digital circuit. The logic levels can be extracted from the DC characteristic. As illustrated in Fig. 4.4, we define the levels at the input by

• Logic 0: for $0 \le V_{in} \le V_{IL}$
• Logic 1: for $V_{IH} \le V_{in} \le V_{DD}$

and at the output by

• Logic 0: for $0 \le V_o \le V_{OL}$
• Logic 1: for $V_{OH} \le V_o \le V_{DD}$

The LOW noise margin is defined by

$$ NM_L = |V_{IL} - V_{OL}| \qquad (4.25) $$

and the HIGH noise margin is defined by

$$ NM_H = |V_{OH} - V_{IH}| \qquad (4.26) $$

The $V_{IL}$ and $V_{IH}$ levels can be defined as the points where the slope of the DC transfer characteristic is -1, i.e.,

$$ \frac{dV_o}{dV_{in}} = -1 \qquad (4.27) $$

These values can be deduced using Equations (4.13) and (4.23). To have good noise margins, it is desirable to have $V_{IL}$ and $V_{IH}$ close to each other, around the point $V_{DD}/2$.

For CMOS circuits, the output voltage levels can be defined by letting $V_{OH} = V_{DD}$ and $V_{OL} = 0$. The CMOS logic inverter has a fairly ideal transfer function and tends to have very good noise margins. In some applications, either $NM_H$ or $NM_L$ is compromised to obtain a good speed of operation.
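As an illustration of how the unity-gain points can be deduced from Equations (4.13) and (4.23), the following Python sketch locates them numerically by bisection on the slope condition. The device values are assumptions (a symmetrical inverter with $\beta = 1$, 0.5 V thresholds and a 3.3 V supply), and the square-law model is used throughout, so the numbers are indicative only.

```python
import math

# Assumed, illustrative values for a symmetrical inverter.
VDD, VTN, VTP, BETA = 3.3, 0.5, -0.5, 1.0
VINV = (VDD + VTP + VTN * math.sqrt(BETA)) / (1.0 + math.sqrt(BETA))


def vout_region_b(vin):
    """Eq. (4.13): NMOS saturated, PMOS linear."""
    disc = (vin - VTP) ** 2 - 2.0 * (vin - VDD / 2.0 - VTP) * VDD - BETA * (vin - VTN) ** 2
    return (vin - VTP) + math.sqrt(disc)


def vout_region_d(vin):
    """Eq. (4.23): NMOS linear, PMOS saturated."""
    disc = (vin - VTN) ** 2 - (vin - VDD - VTP) ** 2 / BETA
    return (vin - VTN) - math.sqrt(disc)


def unity_gain_point(vout, lo, hi, h=1e-6, steps=60):
    """Bisection on dVo/dVin + 1 = 0 using a central-difference slope."""
    def g(v):
        return (vout(v + h) - vout(v - h)) / (2.0 * h) + 1.0

    sign_lo = g(lo) > 0.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Stay strictly inside each region so the square roots remain real.
v_il = unity_gain_point(vout_region_b, VTN + 1e-3, VINV - 1e-3)
v_ih = unity_gain_point(vout_region_d, VINV + 1e-3, VDD + VTP - 1e-3)
nm_l = abs(v_il - 0.0)   # Eq. (4.25) with V_OL = 0
nm_h = abs(VDD - v_ih)   # Eq. (4.26) with V_OH = V_DD
print(f"V_IL = {v_il:.4f} V, V_IH = {v_ih:.4f} V, NM_L = {nm_l:.4f} V, NM_H = {nm_h:.4f} V")
```

For a symmetrical gate the two margins come out equal, as expected; an asymmetrical sizing or a more accurate device model shifts them apart.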
4.1.4 Minimum Power Supply

To obtain the maximum power saving in CMOS logic circuits, the power supply voltage should be reduced. So, what is the lowest practical supply voltage at which CMOS will operate? In 1972, Swanson and Meindl [4] demonstrated that the minimum supply voltage is given by

$$ V_{DD,min} = \frac{8kT}{q} \qquad (4.28) $$

At room temperature this value is equal to 0.2 V. This demonstrates that CMOS is a good candidate for ultra-low-power applications.
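The room-temperature figure can be checked with a short calculation. The sketch below simply evaluates Equation (4.28); taking room temperature as 300 K is an assumption.

```python
BOLTZMANN = 1.380649e-23          # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def min_supply_voltage(temp_kelvin=300.0):
    """Minimum CMOS supply voltage of Eq. (4.28): 8 kT/q."""
    return 8.0 * BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE


print(f"V_DD,min at 300 K = {min_supply_voltage():.3f} V")
```

The limit scales linearly with absolute temperature, so it rises slightly at the elevated junction temperatures found in operating circuits.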
4.1.5 Example of Noise Margins

For an inverter with $W_p = 2W_n = 4$ µm (in 0.8 µm CMOS technology), and using a threshold voltage $V_T = V_{Tn} = |V_{Tp}| = 0.5$ V, we have the following values for $NM_L$ and $NM_H$. At a 3.3 V power supply voltage, $NM_L$ = 1.15 V and $NM_H$ = 1.45 V. However, at 1.5 V, $NM_L$ = 0.60 V and $NM_H$ = 0.65 V. So the noise level should be kept low, particularly at low power supply voltage.
Figure 4.5 CMOS inverter and switching characteristic

4.2 CMOS INVERTER SWITCHING CHARACTERISTICS

In this section, we present the transient behavior of the CMOS inverter. A very simple analytic model for delay is developed. The objective of this analysis is to understand the parameters that affect the speed of the gate. We assume that the input has a step waveform. The delay $t_d$ is the time difference between the midpoint of the input swing and the midpoint of the swing of the output signal. Referring to Fig. 4.5,

• $t_{dr}$ is the 50% delay when the output is rising; and
• $t_{df}$ is the 50% delay when the output is falling.

The power dissipation issue during the switching is considered in Section 4.3.
4.2.1 Analytic Delay Models

The load capacitance shown in Fig. 4.5 at the output of the CMOS inverter represents the total of the input capacitance of the driven gates, the parasitic capacitance at the output of the gate itself and the wiring capacitance. In Section 4.4, we discuss the estimation of this load capacitance. For simplicity we assume, for the 50% delay, that the MOS current is averaged and is equal to the saturation current. The equation of the saturation current used in this section is the one given by Equation (3.82), Section 3.3.3. This saturation current is well modeled for short-channel devices.

4.2.1.1 Fall Delay

When the input goes from low (ground) to high ($V_{DD}$), initially the output is at $V_{DD}$ and the pull-down NMOS of Fig. 4.5 is in the saturation region. We assume that until the output falls to $V_{DD}/2$, the NMOS drain current is approximated by the saturation current $I_{DSsat,n}$. Referring to the equivalent circuit of Fig. 4.6(a), the delay is computed from the following differential equation

$$ C_L \frac{dV_o}{dt} = -I_{DSsat,n} \qquad (4.29) $$

where

$$ I_{DSsat,n} = K_n v_{sat} C_{ox} W_{eff,n} (V_{GSn} - V_{Tn}) \qquad (4.30) $$

We assume that the factor $K_n$ does not change. By integrating Equation (4.29) from $t = t_1$, corresponding to $V_o = V_{DD}$, to $t = t_2$, corresponding to $V_o = V_{DD}/2$, and substituting (4.30) into (4.29) with $V_{GSn} = V_{DD}$, we obtain

$$ t_{df} = \frac{C_L V_{DD}}{2 K_n v_{sat} C_{ox} W_{eff,n} (V_{DD} - V_{Tn})} \qquad (4.31) $$

Note from this equation that the delay is inversely proportional to the width of the MOS transistor. So by sizing the gate we can reduce the delay of the gate alone.

4.2.1.2 Rise Delay

When the input goes from high ($V_{DD}$) to low (ground), initially the output is at zero. The pull-up PMOS transistor operates in the saturation region. Similarly, using the equivalent circuit of Fig. 4.6(b), the rise delay is given by

$$ t_{dr} = \frac{C_L V_{DD}}{2 K_p v_{sat} C_{ox} W_{eff,p} (V_{DD} - |V_{Tp}|)} \qquad (4.32) $$

Figure 4.6 Equivalent circuits for the delay computation: (a) fall delay (at $t = t_1$, $V_o = V_{DD}$; at $t = t_2$, $V_o = V_{DD}/2$) and (b) rise delay (at $t = t_3$, $V_o = 0$; at $t = t_4$, $V_o = V_{DD}/2$)

From the above equation we can deduce that the rise delay is greater than the fall delay for equally sized MOS transistors. So $W_{eff,p}$ should be sized such that the two saturation currents are almost equal in order to get symmetrical rise and fall delays.
4.2.1.3 Delay Time

By definition, the delay time (sometimes called propagation delay) is given by

$$ t_d = \frac{1}{2} (t_{dr} + t_{df}) \qquad (4.33) $$

Hence, for $V_{Tn} = -V_{Tp} = V_T$ the delay is given by

$$ t_d = \frac{C_L V_{DD}}{4 v_{sat} C_{ox} (V_{DD} - V_T)} \left[ \frac{1}{K_n W_{eff,n}} + \frac{1}{K_p W_{eff,p}} \right] \qquad (4.34) $$

Or the equation can be written as

$$ t_d = \text{constant} \times \frac{C_L}{1 - V_T / V_{DD}} \qquad (4.35) $$

The constant is slightly affected by $V_{DD}$ through the parameter $K$. This equation shows a simple analytic expression for the delay time. We can observe that the delay is linearly proportional to the total load capacitance. Secondly, the delay increases when the power supply is scaled down. When $V_{DD}$ approaches the threshold voltage of the device, the delay increases drastically. If the threshold voltage is scaled down with the supply voltage and the oxide thickness is scaled down too, then the delay can improve with $V_{DD}$ scaling. From the CMOS circuit designer's point of view, the only parameters that can be controlled to optimize the speed of CMOS gates are:

• The width of the MOS transistor;
• The load capacitances (input of the next stage, wiring, etc.); and
• The supply voltage $V_{DD}$.

Fig. 4.7(a) shows the simulated effect of the power supply voltage on the delay of an inverter with fanout = 3, using the device parameters given in Chapter 3. We buffer the input voltage with one inverter stage to obtain accurate results. The delay is almost stable at high $V_{DD}$; however, when $V_{DD}$ approaches the threshold voltage of the NMOS and PMOS devices, it increases drastically, as expected from Equation (4.35). Therefore, the threshold voltage should be reduced to overcome this problem. In Fig. 4.7(b), the delay of the inverter is plotted versus the ratio $V_T/V_{DD}$ at $V_{DD}$ = 2.5 V. For $V_T/V_{DD} > 0.5$, the delay increases rapidly. In order to maintain improvement in circuit performance at reduced power supply voltage, $V_T/V_{DD}$ must be $\le 0.2$.
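The trend of the analytic delay model with supply voltage can be sketched as follows. The function averages a fall and a rise delay of the saturation-current form discussed in Section 4.2.1; every numerical default (the factors `k_n` and `k_p`, the widths, `v_sat`, `c_ox`, the load and the threshold) is a hypothetical value chosen for illustration, and the factors $K$ are held constant although the text notes that they depend slightly on $V_{DD}$.

```python
def inverter_delay(vdd, c_load=50e-15, vt=0.5,
                   k_n=0.6, k_p=0.4, w_n=2e-6, w_p=4e-6,
                   v_sat=8e4, c_ox=2e-3):
    """Propagation delay t_d = (t_df + t_dr) / 2 with constant-current discharge.

    Units: c_load in F, widths in m, v_sat in m/s, c_ox in F/m^2, voltages in V.
    All defaults are hypothetical illustration values.
    """
    i_n = k_n * v_sat * c_ox * w_n * (vdd - vt)   # NMOS saturation current
    i_p = k_p * v_sat * c_ox * w_p * (vdd - vt)   # PMOS saturation current
    t_df = c_load * vdd / (2.0 * i_n)             # output falls VDD -> VDD/2
    t_dr = c_load * vdd / (2.0 * i_p)             # output rises 0 -> VDD/2
    return 0.5 * (t_df + t_dr)


for vdd in (5.0, 3.3, 2.0, 1.0, 0.7):
    print(f"VDD = {vdd:.1f} V  ->  t_d = {inverter_delay(vdd) * 1e12:.1f} ps")
```

With a fixed threshold the delay grows as $V_{DD}/(V_{DD} - V_T)$, which is nearly flat at high supply and diverges as $V_{DD}$ approaches $V_T$.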
4.2.2 Delay Characterization with SPICE

A data sheet for the delay of a cell (e.g., a CMOS inverter) can be easily prepared using SPICE. For example, the load capacitance or the fanout of a CMOS inverter is swept during the simulation, and a relation of the type $t_d = a + b \cdot C_L$ (or fanout) can be obtained. Fig. 4.8 shows the delay vs. the external load capacitance $C_L$. Other parameters can be extracted also.
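Extracting the coefficients $a$ and $b$ from a load sweep is an ordinary least-squares line fit. The sketch below shows one way to do it in plain Python; the delay values are invented placeholders standing in for simulator output, not results from this book.

```python
def fit_delay_line(loads, delays):
    """Least-squares fit of t_d = a + b * C_L; returns (a, b)."""
    n = len(loads)
    mean_c = sum(loads) / n
    mean_t = sum(delays) / n
    sxx = sum((c - mean_c) ** 2 for c in loads)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(loads, delays))
    b = sxy / sxx              # slope: delay per unit load
    a = mean_t - b * mean_c    # intercept: intrinsic (unloaded) delay
    return a, b


# Placeholder sweep: load in pF, delay in ns (made-up numbers).
loads_pf = [0.0, 0.1, 0.2, 0.4, 0.8]
delays_ns = [0.15, 0.21, 0.27, 0.39, 0.63]
a, b = fit_delay_line(loads_pf, delays_ns)
print(f"t_d = {a:.3f} ns + {b:.3f} ns/pF * C_L")
```

The intercept `a` captures the delay due to the cell's own parasitics, and the slope `b` is the drive-strength figure usually quoted in a cell data sheet.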
4.3 POWER DISSIPATION

To minimize the power consumption of a CMOS circuit, the various power components and their effects must be identified. There are two types of power dissipation. One is the maximum power dissipation, which is related to the peak of the instantaneous current, and the other is the average power dissipation. The peak current has an effect on the supply voltage noise due to the power line resistance. It can cause heating of the device, thus resulting in performance degradation. From the battery lifetime point of view, the average power dissipation is more important.

There are three power dissipation components within the CMOS inverter. These are:

1. Static power caused by the leakage current and other static current $I_{st}$ due to the value of the input voltage;
2. Dynamic power caused by the total output capacitance $C_L$; and
3. Dynamic power caused by the short-circuit current $I_{sc}$ during the switching transient.

Sometimes components (2) and (3) are merged as total dynamic power.
4.3.1 Static Power

This component is sometimes split into two other components. The sources of static power dissipation, in a complementary CMOS inverter, are leakage currents ($P_{s1}$) and current drawn from the supply due to the input voltage ($P_{s2}$). Hence the total static power is given by

$$ P_s = P_{s1} + P_{s2} \qquad (4.36) $$

Leakage current consists of MOS junction leakage currents. Fig. 4.9 shows the parasitic diodes in a CMOS inverter. The body ties in this structure are such that the parasitic diodes are not conducting (i.e., reverse biased and/or at zero voltage). The current in a diode is given by

$$ I_d = I_s \left( \exp\frac{qV_d}{nkT} - 1 \right) \qquad (4.37) $$

where $n$ is the emission coefficient of the diode (sometimes equal to 1) and $V_d$ is the voltage applied to the diode. Note that the current parameter $I_s$ increases with temperature. The total power dissipation due to these leakage currents is given by

$$ P_{s1} = \sum_i I_{d,i} V_{DD} \qquad (4.38) $$

A typical value of this leakage current $I_d$ is 1 fA per device junction. This value is too small to have any effect on the static power: with one million devices, the total contribution to the power would be 0.01 µW. This first component of the static power is neglected in the analysis throughout the chapters of this book, except in Chapter 6 in the case of memory design.
We consider now the second component of the static power, which is a function of the input voltage $V_{in}$. Assume that the input of the pull-down NMOS of the inverter is at a voltage $0 \le V_{in} < V_T$. In this case the current is given by the subthreshold expression (Fig. 4.10)

$$ I_{DS} = I_0 \frac{W_{eff}}{W_0} 10^{(V_{in} - V_T)/S} \qquad (4.39) $$

where $V_T$ is the constant-current threshold voltage. For $V_{in} > V_T$ the current is given by the expressions discussed in Chapter 3. The corresponding static power dissipation is given by

$$ P_{s2} = I_{DS,mean} V_{DD} \qquad (4.40) $$

The mean value of the current is taken over both the PMOS and NMOS transistors. For example, if $V_{in} = 0$, $V_T$ = 0.15 V, $W_{eff}$ = 10 µm and $S$ = 75 mV/decade, this current is 1 nA. For 1 million devices integrated, the total static current would be significant (1 mA). Note that this current increases drastically with the increase of temperature [see Section 3.3.2]. This value, in standby mode, is not acceptable for battery-operated applications. CMOS circuits have been known to consume energy only during switching. But this is no longer true, since low-$V_T$ CMOS is used for low-voltage operation. Some CMOS circuits, which exhibit a high DC current, are discussed in Section 4.6.
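The numerical example above can be reproduced with a short script. The sketch evaluates an exponential subthreshold law of the form used in Equation (4.39); the normalization current of 10 nA per µm of width at $V_{GS} = V_T$ is an assumption, picked because it reproduces the 1 nA figure quoted in the text, and the 1.5 V supply is likewise only an example.

```python
def subthreshold_current(v_gs, v_t, w_eff_um, s_volts_per_decade, i0_per_um=10e-9):
    """Subthreshold drain current (A) for v_gs below the threshold v_t.

    i0_per_um is the current per micrometer of width at v_gs == v_t
    (assumed 10 nA/um here); s_volts_per_decade is the subthreshold swing.
    """
    return i0_per_um * w_eff_um * 10.0 ** ((v_gs - v_t) / s_volts_per_decade)


i_off = subthreshold_current(v_gs=0.0, v_t=0.15, w_eff_um=10.0, s_volts_per_decade=0.075)
n_devices = 1_000_000
i_total = n_devices * i_off
vdd = 1.5  # example supply voltage (assumption)
print(f"per device: {i_off * 1e9:.2f} nA, chip: {i_total * 1e3:.2f} mA, "
      f"static power at {vdd} V: {i_total * vdd * 1e3:.2f} mW")
```

Each additional 75 mV of threshold voltage lowers the off current by one decade, which is why the choice of $V_T$ dominates the standby power of low-voltage designs.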
4.3.2 Dynamic Power of the Output Load

In this section we estimate the power dissipation due to the total output load capacitance $C_L$. This power is due to the currents needed to charge and discharge $C_L$, as shown in Figs. 4.11 and 4.12. We assume a step input, so that the PMOS and NMOS are never on simultaneously. The average dynamic power $P_d$ required to charge and discharge a capacitance $C_L$ at a switching frequency $f = 1/T$ (Fig. 4.12) is given by

$$ P_d = \frac{1}{T} \left[ \int_0^{T/2} i_p(t)\,(V_{DD} - v_o)\,dt + \int_{T/2}^{T} i_n(t)\,v_o\,dt \right] \qquad (4.41) $$

The output current is given during the charging phase by

$$ i_p = C_L \frac{dv_o}{dt} \qquad (4.42) $$

and during the discharge phase by

$$ i_n = -C_L \frac{dv_o}{dt} \qquad (4.43) $$

Then Equation (4.41) becomes

$$ P_d = \frac{C_L}{T} \left[ \int_0^{V_{DD}} (V_{DD} - v_o)\,dv_o - \int_{V_{DD}}^{0} v_o\,dv_o \right] \qquad (4.44) $$

Finally, the dynamic power dissipation is

$$ P_d = \frac{C_L V_{DD}^2}{T} = C_L V_{DD}^2 f \qquad (4.45) $$

This equation shows that the power dissipation is proportional to the operating frequency. Moreover, the reduction of the power supply drastically reduces the power dissipation. Ideally, a 3.3 V supply voltage reduces the power dissipation by 56% compared to that of 5 V. Moreover, at 1 V the power is reduced by 96% compared to 5 V. The expression of dynamic power in Equation (4.45) is valid only for an inverter. However, for a complex gate the concept of switching activity is introduced [see Section 4.5.3].
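The quoted savings follow directly from the quadratic dependence on $V_{DD}$ in Equation (4.45), as the short check below illustrates. The load capacitance and frequency in the last line are arbitrary example values.

```python
def dynamic_power(c_load, vdd, freq):
    """Dynamic power of the output load, Eq. (4.45): C_L * VDD^2 * f."""
    return c_load * vdd ** 2 * freq


def power_saving(vdd_new, vdd_ref):
    """Fractional reduction in dynamic power when VDD drops from vdd_ref to vdd_new."""
    return 1.0 - (vdd_new / vdd_ref) ** 2


print(f"5 V -> 3.3 V: {power_saving(3.3, 5.0) * 100:.1f}% saving")
print(f"5 V -> 1.0 V: {power_saving(1.0, 5.0) * 100:.1f}% saving")
print(f"Example gate: {dynamic_power(50e-15, 3.3, 100e6) * 1e6:.2f} uW")
```

The saving is independent of the load and the frequency, because both appear as common factors at either supply voltage.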
During the first output transition (charging) from $0 \rightarrow V_{DD}$, the energy drawn from the power supply is $E_d = C_L V_{DD}^2$. For this transition, the energy stored in the load capacitor is

$$ E_C = \frac{1}{2} C_L V_{DD}^2 \qquad (4.46) $$

This means that during the output transition $0 \rightarrow V_{DD}$, half of the energy drawn from the supply is stored in the capacitor and the other half is consumed by the pull-up PMOS transistor. For the output transition $V_{DD} \rightarrow 0$, the energy ($\frac{1}{2} C_L V_{DD}^2$) stored in the capacitor is consumed by the pull-down NMOS transistor and no current is drawn from the supply.
4.3.2.1 Energy vs. Power

It is important to distinguish between energy and power. If, for example, we reduce the clock rate of a CMOS gate, its power consumption will be reduced by the same proportion. However, its energy will still be the same. Assume that the gate is powered with a battery to perform computations. The time required to complete the computation, with a low clock rate, will be increased. Therefore, after the computation the battery will be just as dead as if the computation had been performed at a high clock rate. So low-energy design is more important than low-power design. The figure of merit in this case can be defined as the product of the energy times the delay. The conventional term, low-power, is used throughout this book to mean that we design for low energy.
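A small numerical sketch makes the distinction concrete: for a fixed number of switching cycles, lowering the clock rate lowers the power but leaves the energy unchanged. The load, supply voltage and cycle count below are arbitrary example values, and only the dynamic power of the output load is considered.

```python
def computation_cost(c_load, vdd, freq, n_cycles):
    """Return (power, time, energy) for n_cycles of switching at clock rate freq."""
    power = c_load * vdd ** 2 * freq   # dynamic power of the load
    time = n_cycles / freq             # time needed to finish the task
    return power, time, power * time


fast = computation_cost(c_load=1e-9, vdd=3.3, freq=100e6, n_cycles=1_000_000)
slow = computation_cost(c_load=1e-9, vdd=3.3, freq=50e6, n_cycles=1_000_000)
for name, (p, t, e) in (("100 MHz", fast), ("50 MHz", slow)):
    print(f"{name}: P = {p:.3f} W, t = {t * 1e3:.1f} ms, E = {e * 1e3:.3f} mJ")
```

Only a change that reduces the energy per operation, such as a lower supply voltage or a smaller switched capacitance, extends battery life for a given task.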
4.3.3 Short-circuit Power Dissipation

Even if there were no load capacitance on the output of the inverter and the parasitics were negligible, the gate would still dissipate switching energy. If the input changes slowly, both the NMOS and PMOS transistors are ON and an excess power is dissipated due to the short-circuit current. Fig. 4.13 shows the short-circuit currents as the inverter switches, as a function of the rise time of the input. We are assuming that the rise time of the input is equal to the fall time. The short-circuit power is

$$ P_{sc} = I_{mean} V_{DD} \qquad (4.47) $$

Figure 4.13 Short-circuit current as a function of the input slope

To estimate $I_{mean}$, we use the simple model of the short-circuit current of Fig. 4.14 [5]. We also assume that the inverter has symmetrical devices, which means that $\beta_n = \beta_p = \beta$ and $V_{Tn} = -V_{Tp} = V_T$, and that the rise time is equal to the fall time of the input signal ($\tau_r = \tau_f = \tau$). The mean short-circuit current in the unloaded inverter is

$$ I_{mean} = \frac{2}{T} \left[ \int_{t_1}^{t_2} i(t)\,dt + \int_{t_2}^{t_3} i(t)\,dt \right] \qquad (4.48) $$

Due to symmetry we have

$$ I_{mean} = \frac{4}{T} \int_{t_1}^{t_2} i(t)\,dt \qquad (4.49) $$

The NMOS transistor is operating in saturation, hence the above equation becomes

$$ I_{mean} = \frac{4}{T} \int_{t_1}^{t_2} \frac{\beta}{2} \left( V_{in}(t) - V_T \right)^2 dt \qquad (4.50) $$

The input voltage is given by

$$ V_{in}(t) = \frac{V_{DD}}{\tau}\,t \qquad (4.51) $$

It can be derived from Fig. 4.14 that

$$ t_1 = \frac{V_T}{V_{DD}}\,\tau \quad \text{and} \quad t_2 = \frac{\tau}{2} \qquad (4.52) $$

Then the integral leads to

$$ P_{sc} = \frac{\beta}{12} (V_{DD} - 2V_T)^3 \frac{\tau}{T} \qquad (4.53) $$

Figure 4.14 Input voltage and short-circuit current model

This equation shows that the short-circuit power dissipation is also proportional to the frequency. The only parameters that can be controlled by the circuit designer, at a given frequency and power supply, to reduce $P_{sc}$ are $\beta$ and $\tau$. Power supply scaling greatly reduces the short-circuit power dissipation. Note that this analysis was done for an unloaded inverter. For a loaded gate, if the output signal and input signal have equal rise/fall times, the short-circuit power dissipation will be less than 20% of the total power [5]. So it is very important to keep the edges fast, to have negligible $P_{sc}$, or at least it is desirable to have equal input and output rise/fall times.

If the load capacitance is high, the output rise/fall times become larger than the input ones. In this case, the input changes completely before the output changes significantly. Therefore, the short-circuit current is near zero. Note that if $V_{DD}$ approaches $(V_{Tn} + |V_{Tp}|)$ or less, the short-circuit current can be eliminated, because both devices cannot conduct simultaneously.
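The closed-form short-circuit power of an unloaded symmetrical inverter can be cross-checked against a direct numerical integration of the saturation current over the conduction interval. In the sketch below, the closed form $P_{sc} = (\beta/12)(V_{DD} - 2V_T)^3\,\tau/T$ is the standard result for this model and is assumed to be the expression intended by Equation (4.53); the device and timing values are hypothetical.

```python
def short_circuit_power(beta, vdd, vt, tau, period):
    """Closed-form mean short-circuit power of an unloaded symmetrical inverter."""
    if vdd <= 2.0 * vt:
        return 0.0  # both devices can never conduct at the same time
    return beta / 12.0 * (vdd - 2.0 * vt) ** 3 * tau / period


def short_circuit_power_numeric(beta, vdd, vt, tau, period, steps=10000):
    """Same quantity by midpoint-rule integration of the saturation current."""
    t1 = vt / vdd * tau          # input crosses V_T: NMOS turns on
    t2 = tau / 2.0               # input reaches VDD/2: current peak
    dt = (t2 - t1) / steps
    charge = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        vin = vdd * t / tau                       # linear input ramp
        charge += 0.5 * beta * (vin - vt) ** 2 * dt
    i_mean = 4.0 / period * charge                # four identical lobes per period
    return i_mean * vdd


args = dict(beta=100e-6, vdd=3.3, vt=0.5, tau=1e-9, period=10e-9)
print(f"closed form: {short_circuit_power(**args) * 1e6:.3f} uW")
print(f"numerical:   {short_circuit_power_numeric(**args) * 1e6:.3f} uW")
```

The cubic dependence on $(V_{DD} - 2V_T)$ explains why this component shrinks so quickly under supply scaling and vanishes once $V_{DD}$ falls to $2V_T$.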
4.3.4 Other Power Issues
The total power dissipation of a CMOS gate is given by

$$P_{total} = P_s + P_d + P_{sc} \tag{4.54}$$
It represents the total power of a gate when it is switching at the same rate as the operating frequency. In Chapter 8, we will discuss how to estimate the power dissipation of a complex circuit.
Other power dissipation issues exist, such as worst-case power estimation and the effect of temperature. The worst-case conditions are: maximum V_DD, maximum junction temperature, and the fast-fast process corner. Static power dissipation (subthreshold current) is increased by increased temperature and increased power supply. Dynamic power is not sensitive to temperature, but it is affected greatly by the worst-case V_DD. Short-circuit power dissipation depends on temperature just as the short-circuit current does. It is also dependent on the power supply. The mobility and the threshold voltage both decrease with increasing temperature, and these two parameters have opposite effects on the current. So it is important to evaluate the worst-case power consumption in any design.
The simulated average total power dissipation can be easily measured by the SPICE simulator using POWER MEASUREMENT commands. In addition, several papers in the literature have introduced a "power meter" in circuit simulation to measure the power dissipation [6, 7, 8].
4.4 CAPACITANCE ESTIMATION
Previously we saw that the speed and power dissipation of CMOS gates depend strongly on the total output load capacitance. This capacitance is the sum of three components, as shown in Fig. 4.15:
• Total input capacitance of the N driven gates, noted C_in;
• Parasitic output capacitance of the driving gate, noted C_o; and
• Wiring capacitance, noted C_w.
For simplicity we estimate, in this section, the average value of C_L over the range of the output swing. This approach is used only for initial estimation of the design. Circuit simulation, layout extraction and post-layout simulation are needed for more accuracy. Moreover, it is sometimes interesting to derive a simple expression for the load capacitance to see the impact of important parameters on the speed and the power dissipation. We first examine the different components of the output load capacitance; then we illustrate the estimation approach by an example.
4.4.1 Estimation of C_in
The total capacitance of the driven gates can be evaluated by summing the input capacitance of all the receiving gates, and we have

$$C_{in} = \sum_{i=1}^{N} C_{gate,i}$$

The gate capacitance of the receiving gate can be approximated by

$$C_{gate} = C_{ox} \sum (W L)$$
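The estimate of C_in described above can be sketched in a few lines. The code assumes the standard parallel-plate relation C_ox = ε_0 ε_r / t_ox for the oxide capacitance per unit area; the function names, the oxide thickness and the transistor dimensions are hypothetical values chosen only for illustration.

```python
EPS_0 = 8.854e-12     # permittivity of free space in F/m
EPS_R_SIO2 = 3.9      # relative permittivity of SiO2


def oxide_capacitance_per_area(t_ox):
    """C_ox in F/m^2 for an oxide thickness t_ox in metres."""
    return EPS_0 * EPS_R_SIO2 / t_ox


def gate_capacitance(c_ox, transistors):
    """C_gate = C_ox * sum(W * L) over the transistors tied to one input.

    transistors is a list of (W, L) pairs in metres.
    """
    return c_ox * sum(w * l for w, l in transistors)


def total_input_capacitance(c_ox, driven_gates):
    """C_in: sum of the gate capacitances of all receiving gates."""
    return sum(gate_capacitance(c_ox, gate) for gate in driven_gates)


# Hypothetical example: fan-out of 3 identical inverters, t_ox = 10 nm.
C_OX = oxide_capacitance_per_area(10e-9)
INVERTER = [(2e-6, 0.5e-6), (4e-6, 0.5e-6)]   # (W, L) of NMOS and PMOS

if __name__ == "__main__":
    c_in = total_input_capacitance(C_OX, [INVERTER] * 3)
    print(f"C_in = {c_in * 1e15:.2f} fF")
```

This is only a first-order figure: it ignores overlap and fringing capacitances and the bias dependence of the gate capacitance.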
and by the PMOS transistor, as illustrated by the equivalent resistances of Fig. 4.35. In this figure, we assume that at V_in = 0, A and Ā are set to their final values. During this transient switching phase the NMOS is subject to the body effect while the PMOS is not. When a zero, at the input I, is to be transmitted, then the PMOS is subject to the body effect. The PMOS and NMOS transistors should be sized such that they charge and discharge the output symmetrically. If V_Tn = |V_Tp| and the body effect is symmetrical, then we can size the devices such that β_n = β_p. Sometimes, equal-sized NMOS and PMOS devices can be used. It is easy to see that the delay of the TG gate is approximately independent of the input level. This is not the case if the pass logic uses a single-channel transistor. A drawback of the CMOS TG is that it consumes more area than a single-channel transmission gate (NMOS TG or PMOS TG). Thus, if the area is of prime concern, NMOS TGs are used.
Any CMOS TG logic function (we call it here conventional pass-transistor logic) can be implemented using the TG primitive element described above. In such an implementation the transistor count, and hence the silicon area, is low compared to a standard static CMOS implementation. This is highlighted in the implementation of such functions as multiplexing, demultiplexing, decoding and addition. Fig. 4.36 shows a 4:1 multiplexer, where the data lines A, B, C and D are controlled by S1 and S2 such that
$$F = A\,\bar{S}_1 \bar{S}_2 + B\,S_1 \bar{S}_2 + C\,\bar{S}_1 S_2 + D\,S_1 S_2 \tag{4.87}$$
This form of logic is used when the inputs and their logic complements are available. The implementation does not need V_DD or ground lines. However, the implementation suffers from a number of drawbacks: the driving capability of the circuit is limited and the delay increases with long TG chains. Moreover, the circuit does not provide a restoration of the logic levels, i.e., the logic gates are passive with no gain elements. Fig. 4.37 shows an example of how to restore the voltage levels in chained TGs. When 8 TGs are put in series, the output signal changes very slowly. However, when an inverter stage is added every 4 TG stages, the level is restored, as shown in the SPICE voltage waveforms of Fig. 4.37.
The CMOS TG logic can be used in CMOS circuit design, offering an extra degree of circuit design freedom. An example is the full-adder. The adder circuits will be discussed in detail in Chapter 7. Fig. 4.38 shows the schematic of the XOR gate which is used by the adder. When the input A is low, Ā is high. The transmission gate TG is closed, and the output is equal to B. When A is high, Ā is low. The inverter formed by the transistors N and P is enabled, and the output is equal to B̄. The TG gate is open in this case.
To implement an adder, let us first review its functions. The Boolean functions of a full-adder are:

$$S_{out} = A \oplus B \oplus C_{in} \tag{4.88}$$

$$C_{out} = A \cdot B + C_{in} \cdot (A + B) \tag{4.89}$$

A and B are the inputs, C_in is the carry input, S_out is the sum output, and C_out is the carry output. The truth table of an adder is shown in Table 4.10. The CMOS implementation of a one-bit full-adder is shown in Fig. 4.39(a). It requires 28 transistors and has two gate delays. In this circuit the transistors
$$V_L = L \frac{di}{dt}$$
This noise problem can also occur on the power lead, where it is termed power bounce. We will use only one name to refer to this problem. Consider a CMOS output driver driving an output pad of 50 pF at 3.3 V with 2 ns rise/fall times. It can be shown [39] that di/dt is related to the fall/rise times by Eq. (4.142).
The di/dt can be as high as 165 mA/ns. If, for example, 8 drivers are allowed to switch simultaneously per V_DD/V_SS pad pair, the resulting ground bounce for L = 1 nH is 1320 mV. This value can be a problem, particularly for low-voltage applications, since this ground bounce consumes a large fraction of the digital noise margins. Some of the problems encountered are 1) false triggering, 2) double clocking, and/or 3) missing clocked pulses.
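The ground-bounce arithmetic quoted here can be checked with a short calculation. The figures given (50 pF, 3.3 V and 2 ns leading to 165 mA/ns) are consistent with a peak current slope of the form 4 C V / t^2; that closed form is an assumption of this sketch rather than a quotation of Eq. (4.142), and the function names are hypothetical.

```python
def max_di_dt(c_load, v_swing, t_edge):
    """Peak current slope in A/s, assuming (di/dt)_max = 4 * C * V / t**2."""
    return 4.0 * c_load * v_swing / t_edge ** 2


def ground_bounce(inductance, n_drivers, di_dt):
    """Bounce voltage V = L * n * di/dt for n drivers sharing one supply pad pair."""
    return inductance * n_drivers * di_dt


if __name__ == "__main__":
    slope = max_di_dt(50e-12, 3.3, 2e-9)          # 50 pF, 3.3 V, 2 ns edges
    bounce = ground_bounce(1e-9, 8, slope)        # L = 1 nH, 8 drivers
    print(f"di/dt  = {slope * 1e-6:.0f} mA/ns")   # 1 A/s = 1e-6 mA/ns
    print(f"bounce = {bounce * 1e3:.0f} mV")
```

The quadratic dependence on the edge time is the reason why slowing the output edges is such an effective lever: doubling the rise/fall time cuts the bounce by a factor of four under this model.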
I/O buffers are not the only source of ground bounce in CMOS circuits. Clock buffers and, to a lesser extent, the core logic can also cause serious ground bounce in the supply leads when driving large loads. Careful power supply routing is required when powering large buffers. The resistance of the metal should be minimized so that the voltage drop due to the current spike is reduced. There are many techniques to reduce the ground bounce. One simple approach is to use separate supply pins for the output buffers. Some approaches, based on reducing L and di/dt, are the following:
• Multiple supply pads and pins are one way to reduce the inductance of the supply. A recent chip uses 121 power/ground pins out of a total of 293 pins [40].
• Placing power and ground pins adjacent to one another reduces the effective inductance of the power and ground pins through mutual inductance. This approach causes an increase in chip size and cost.
• Circuit techniques can reduce the di/dt of the output and clock buffers while maintaining adequate performance. The simplest way is to control the rise/fall times while maintaining the timing requirement. However, this approach has a serious problem, since the worst-case-slow process dictates the buffer sizing (worst-case delay), while the best-case-fast process dictates the ground bounce level. Hence the buffer design is constrained by the two extremes of process variations. Once the buffer is sized to satisfy the worst-case delay, the worst-case ground bounce may exceed the fixed level. This problem can be solved by controlling the signal slope at the input of the output transistors of the buffer [41].
• For clock buffers, and in high-performance design, on-chip bypass capacitances are added between the power bus and the substrate, as shown in Fig. 4.106. This capacitance lowers the impedance of the power supply. On-chip bypass capacitance does not reduce the noise produced by output buffers.
• Another approach is to reduce the output voltage swing of the large buffer.
In conclusion, to reduce the ground bounce, all these techniques can be combined to reduce L and di/dt. The reader can refer to many other techniques to reduce the ground bounce [42, 43, 44, 45].
4.9.7 Low-Swing Output Circuit
With the advent of high-performance VLSI chips, which operate beyond 100 MHz and have over 100 I/Os on the same chip, high data rate CMOS I/O interfaces with low-swing signals are needed, such as ECL (Emitter Coupled Logic) [46, 47, 48], BTL [49], GTL [50], and CMTL (Current Mode Transceiver Logic) [51]. Conventional unterminated interconnects (between VLSI chips) for CMOS-level signals usually have poor signal quality with severe overshoot and ringing, accompanied by EMI (electromagnetic interference) and the possibility of triggering latch-up.
Fig. 4.107 shows two chips connected to a bidirectional transmission line (50 Ω termination resistors) through GTL I/O (Gunning I/O) transceivers. Both ends of the transmission line are terminated to prevent reflections. The load seen by each driver is 25 Ω. The termination voltage V_TM is about 1.2 V. The output driver is an open-drain NMOS pull-down transistor, and when it is inactive the output is at the high-level signal V_OH, equal to V_TM. The input receiver uses a differential comparator with an external reference voltage V_ref = 0.8 V.
Figure 4.107 GTL I/O with two chips connected to a transmission line.
Fig. 4.108 shows an output driver in open-drain configuration which includes circuitry to reduce overshoot and the turn-off di/dt. When V_in is low, P_1 turns ON, which itself turns N_3 and N_4 ON. In this case, the maximum output voltage is V_OLmax = 0.4 V. The power dissipated by the pull-down NMOS is maximum and mainly static. The static current is equal to (V_TM − V_OL)/R = 0.8/25 = 32 mA (note that this current is supplied by V_TM and not V_DD). Hence, the maximum static power dissipated on-chip is P = 32 mA × 0.4 V = 12.8 mW for each I/O. A typical value of V_OL is 0.24 V, thus the nominal power dissipated by each active driver is 9.2 mW. When the input goes from low to high, N_1 turns ON and N_3 is still ON because the signal through the two inverters I_1 and I_2 is delayed by about 1 ns. The transistor N_1 is weak, hence the output discharge is controlled by N_2 and N_3. These transistors keep the drain of N_4 connected to its gate as long as V_GS is higher than V_T. When N_3 turns OFF, N_1 discharges the gate of N_4 to ground. Thus, the turn-off of N_4 is controlled. In this case, there is no DC power dissipated.
Fig. 4.109 shows the input buffer, which employs a differential comparator. This circuit switches V_out to high (low) when V_in − V_ref > 50 mV (< −50 mV), respectively, over process, power supply and junction temperature variations.
The average power dissipated by this input receiver is 5.5 mW at a 5 V power supply.
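The static power figures quoted for the GTL output driver follow directly from Ohm's law, as the sketch below reproduces. It uses only the values stated in this section (V_TM = 1.2 V, a 25 Ω load, V_OL of 0.4 V worst case and 0.24 V typical); the function name is hypothetical.

```python
def gtl_driver_static(v_tm, v_ol, r_load):
    """Return (static current in A, on-chip power in W) for an active
    open-drain pull-down holding the line at v_ol.

    The current is set by the termination: I = (V_TM - V_OL) / R.
    Only the part dropped across the pull-down, I * V_OL, is dissipated on-chip.
    """
    current = (v_tm - v_ol) / r_load
    return current, current * v_ol


if __name__ == "__main__":
    for label, v_ol in (("worst case", 0.4), ("typical", 0.24)):
        i, p = gtl_driver_static(1.2, v_ol, 25.0)
        print(f"{label}: I = {i * 1e3:.1f} mA, P = {p * 1e3:.1f} mW")
```

The rest of the power, I × (V_TM − V_OL), is dissipated in the termination resistors rather than in the driver.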
4.10 LOW-POWER CIRCUIT TECHNIQUES
Remember that the total power dissipated by a circuit has three components. Two of them are very important: 1) the static power (P_s), and 2) the dynamic power (P_d). This section treats some of the circuit techniques for achieving low power while maintaining performance. Techniques to reduce the power at the subsystem/system and architecture levels will be discussed in Chapters 6, 7 and 8.
4.10.1 Low Static Power Techniques
One important source of static power dissipation is the use of a low threshold voltage. With device scaling, the power supply voltage is scaled. If the threshold voltage is not scaled, and is equal to or greater than one half of V_DD, the gate delay increases drastically [52]. The threshold voltage should be less than 20% of V_DD in order to maintain performance at low supply voltage. At a 1 V power supply, the threshold voltages can be as low as 0.1 V. However, reducing V_T causes a serious increase in the standby subthreshold current, due to the exponential relation between the current and V_T. With low V_T, process fluctuations can increase this current even more. For VLSI integration and future ULSI, the total standby current can be high and not acceptable for low-power applications. There are many techniques to reduce the subthreshold current associated with low-V_T devices. These techniques are based on the principle of reverse biasing the V_GS voltage of the MOS device (in the case of NMOS) in the standby mode of operation, as shown in Fig. 4.110. With a negative V_GS, the standby operating point of the device moves to a state of much lower subthreshold current. We cite two techniques using this principle:
4.10.1.1 Self-Reverse Biasing
This technique has been used mainly to reduce the static power dissipation in standby mode of the memory decoder-driver [53]. The drivers in a memory consist of a large number of circuits, arranged repeatedly, but only a few of them operate simultaneously. The circuit of Fig. 4.111 can drastically reduce the subthreshold current of the drivers. The technique simply consists of inserting a PMOS transistor P_c with a size W_c between the power supply V_DD and the common source node A. All the PMOS transistors (P_d1, P_d2, ..., P_dn) of the drivers have, in this example, the same size W_d and a common source (node A). The number of drivers n can range from a few hundred to a few thousand. The MOS transistors in the drivers have a low |V_Td| (e.g., 0.1 V). The PMOS transistor P_c has a threshold voltage |V_Tc| slightly higher than |V_Td| (e.g., 0.2 to 0.4 V).
In active mode, the input S is low and the transistor P_c is ON. Among the drivers, only one circuit is ON. So that the PMOS transistor P_c does not affect the drive current of the drivers, its size W_c should be larger than W_d, depending on the capacitance of the common source, which is huge for high n. In standby mode, the input S is high and the PMOS transistor P_c is OFF. The inputs of all drivers are set to high (V_DD). Without the PMOS transistor P_c, the total subthreshold current would be n times the current of each driver, which makes this current very high. Hence P_c reduces and limits the subthreshold current. The voltage of the common source node A is reduced by an amount ΔV_SRB (a few hundred mV). This causes the PMOS transistors of all drivers to have a self-reverse-biased gate-source voltage, which drastically reduces the subthreshold current. The time needed for the node to stabilize at V_DD − ΔV_SRB (or the time needed to switch from the active to the standby mode) is called the evolution time and can be very long (on the order of 1 ms) compared to the delay of the driver. The reason is that only the leakage and subthreshold currents discharge the node A in this mode. This time can be insignificant for low-power operation if the standby mode time is large enough, as in the case of many low-power applications. When the input S is turned low (active mode), the time needed for the common source A to recover (to reach almost V_DD) is very short and can be lower than the delay time. Hence, it does not interrupt the start of normal operation.

Figure 4.111 Subthreshold current reduction by self-reverse biasing (standby mode and active mode).
Let us now derive the subthreshold current expressions before and after reduction by the SRB technique. The total subthreshold current without the self-reverse-biasing technique is given by

$$I_{sub1} = n\, I_0 \frac{W_d}{W_0} \exp\left( \frac{-|V_{Td}|}{S/\ln 10} \right) \tag{4.143}$$

With the transistor P_c, the subthreshold current is given by

$$I_{sub2} = I_0 \frac{W_c}{W_0} \exp\left( \frac{-|V_{Tc}|}{S/\ln 10} \right) \tag{4.144}$$

We assume that the devices have the same I_0, W_0 and S. By dividing the current equations (4.143) and (4.144), we have, for the subthreshold current, a reduction factor

$$\gamma = \frac{I_{sub1}}{I_{sub2}} = n \frac{W_d}{W_c} \exp\left( \frac{|V_{Tc}| - |V_{Td}|}{S/\ln 10} \right)$$

For example, for n = 512, W_c = 10 W_d (with this ratio the speed is not affected), |V_Tc| = 0.3 V, |V_Td| = 0.1 V and S = 90 mV/decade, the factor γ = 8.5 × 10^3. So the saving in subthreshold current is substantial. The parameter ΔV_SRB can be easily deduced. Note that this technique needs a multi-V_T technology.
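The reduction factor of the worked example can be reproduced numerically. The sketch below divides Eq. (4.143) by Eq. (4.144), using the identity exp(x / (S / ln 10)) = 10^(x / S); the function name is a hypothetical choice and the inputs are the values stated in the example.

```python
def srb_reduction_factor(n, wd_over_wc, vtc_abs, vtd_abs, s_per_decade):
    """Ratio of standby subthreshold current without and with the series
    PMOS switch: gamma = n * (W_d / W_c) * 10**((|V_Tc| - |V_Td|) / S).

    s_per_decade is the subthreshold swing S in volts per decade.
    """
    return n * wd_over_wc * 10.0 ** ((vtc_abs - vtd_abs) / s_per_decade)


if __name__ == "__main__":
    gamma = srb_reduction_factor(
        n=512,              # number of drivers sharing node A
        wd_over_wc=0.1,     # W_c = 10 * W_d
        vtc_abs=0.3,        # |V_Tc| in V
        vtd_abs=0.1,        # |V_Td| in V
        s_per_decade=0.09,  # S = 90 mV/decade
    )
    print(f"gamma = {gamma:.3g}")
```

Most of the saving comes from the exponential term: each extra S of threshold difference between the switch and the drivers buys one more decade of reduction, while the factor n reflects the sharing of a single switch among all drivers.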
4.10.1.2 Multi-V_T Technique
This technique is similar to the one discussed above, but it can be applied to any CMOS logic [54, 56]. The basic idea is shown in the example of the NAND gate of Fig. 4.112. Here the MOS transistors P and N have a high V_T (e.g., 0.6 V extrapolated) for 1 V power supply applications, while the logic gate has MOSFETs with low V_T (≤ 0.3 V). The signal SL is used to switch the gate between active and sleep (standby) mode. The virtual supply lines V_DDV and V_SSV are common to many gates. We call this logic multi-threshold CMOS logic (MT-CMOS).
In the active mode, the signal SL is low, and P and N are ON, so the virtual supply lines V_DDV and V_SSV can be set to almost V_DD and ground, respectively. Hence, the low-V_T logic can switch efficiently, but care should be taken in the sizing of the P/N devices compared to the logic. Fig. 4.113 shows the effect of sizing the high-V_T devices on the delay of the gate. The width of P/N should be at least 10 times larger than that of the logic cells. This condition depends greatly on the parasitic capacitances C_1 and C_2 of the virtual supply lines [see Fig. 4.112]. If C_1 and C_2 are large, then the width of the P and N transistors can be reduced, because these capacitances tend to suppress the bouncing of V_DDV and V_SSV and hence improve the speed. The high-V_T MOSFETs can be common to several logic gates (e.g., 10).
In the standby (sleep) mode, the signal SL is high, and P and N are OFF. Hence, the subthreshold current is limited by that of these high-V_T devices. In this case, the static power dissipation is dramatically reduced in the sleep mode. The subthreshold reduction factor can be deduced using the analysis presented in the previous section. One problem associated with this MT logic is that the evolution and recovery times can be large.
Figure 4.113 Effect of high-V_T MOS width on the performance of MT-CMOS (horizontal axis: high-V_T transistor gate width, normalized).
The measured delay, as a function of the supply voltage, for a 2-input NAND gate with FO = 3 and a wiring load of 1 mm (0.25 pF) is shown in Fig. 4.114. The technology is 0.5-μm CMOS with low V_Tn = 0.25 V, low V_Tp = −0.35 V, high V_Tn = 0.55 V and high V_Tp = −0.65 V. The MT-CMOS logic has almost the same speed as the full low-V_T logic. The logic delay time is reduced by 70% at 1 V as compared with that of the high-V_T one.
For holding the level of the output during the sleep mode, a level holder is necessary, as shown in Fig. 4.115. It consists only of cross-coupled inverters with high-V_T devices powered from the power supply V_DD.
Figure 4.115 CMOS gate with level holder.

Low-V_T devices are not the only source of static power dissipation. Several other issues contribute to the static power increase. These are some circuit design guidelines to reduce the static power dissipation:
• Avoid the use of pseudo-NMOS circuits in your design.
• Avoid the use of TTL-compatible I/O, or devise low-DC-current level converters.
• Do not use low-V_T devices in the I/O buffers; otherwise the DC power increases remarkably, because the MOS transistors of the I/O buffers have large sizes. If you do not have any other option, then use the subthreshold reduction techniques.
4.10.2 Low Dynamic Power Techniques
ASIC and VLSI processor clocks are improving rapidly, reaching the sub-GHz range [21, 56]. The power dissipation of CMOS digital circuits operating at these high frequencies increases drastically, and it can be the main performance-limiting factor. Therefore, low-power circuit techniques are needed to reduce the dynamic power of digital circuits. Moreover, low chip power consumption is extremely important in order to extend the battery life of portable systems [57].
In general the dynamic power dissipation of a gate (i) is given by:

$$P_{di} = \alpha_i C_i V_s V_{DD} f \tag{4.146}$$

where α_i is the gate activity, V_s is the voltage swing, C_i is the load and parasitic capacitance and f is the operating frequency of the system. Equation (4.146) demonstrates that there are several ways to reduce P_d:
1. Reduce the power supply voltage. Scaling V_DD from 3.3 V to 1 V results in a power reduction factor of 11. However, this approach leads to speed degradation for a given technology. But if device scaling is applied, in a next-generation technology, the delay will improve and hence so will the operating frequency. In a complex digital system, local supply reductions can be used for non-critical circuits.
2. Reduce, temporarily, the clock frequency of unused blocks on a VLSI chip using an on-chip power management unit, or reduce the gate activity. These can be done at the architectural level.
3. Reduce the output capacitance C_i. As a first-order approximation this capacitance is composed of the interconnect capacitance C_int and the total input capacitance of the driven gates C_in. The latter can be reduced by using a low-input-capacitance logic family [60] such as a CPL-like one. Also, using minimum-size logic gates in non-critical parts of the design can reduce the dynamic power significantly.
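The effect of each term in Eq. (4.146) can be illustrated with a small calculation. The sketch below evaluates P = α C V_s V_DD f and reproduces the reduction factor of about 11 quoted for scaling V_DD from 3.3 V to 1 V with a full-rail swing. The function name and the activity, capacitance and frequency values are illustrative assumptions.

```python
def dynamic_power(activity, c_load, v_swing, v_dd, freq):
    """Dynamic power of one gate: P = alpha * C * V_s * V_DD * f."""
    return activity * c_load * v_swing * v_dd * freq


# Illustrative operating point (assumed values, not from the text).
ALPHA = 0.25       # switching activity
C_LOAD = 100e-15   # load plus parasitic capacitance in F
FREQ = 100e6       # operating frequency in Hz

if __name__ == "__main__":
    # Full-rail swing: V_s equals V_DD, so the power scales with V_DD squared.
    p_33 = dynamic_power(ALPHA, C_LOAD, 3.3, 3.3, FREQ)
    p_10 = dynamic_power(ALPHA, C_LOAD, 1.0, 1.0, FREQ)
    print(f"3.3 V: {p_33 * 1e6:.2f} uW, 1.0 V: {p_10 * 1e6:.2f} uW")
    print(f"reduction factor: {p_33 / p_10:.2f}")

    # Reduced swing at unchanged supply: the power scales linearly with V_s.
    p_low_swing = dynamic_power(ALPHA, C_LOAD, 1.0, 3.3, FREQ)
    print(f"3.3 V supply, 1 V swing: {p_low_swing * 1e6:.2f} uW")
```

The comparison shows why supply scaling is the strongest lever (quadratic), while low-swing signalling on a fixed supply only gives a linear saving.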
When C_int dominates, as in busses and high-capacitance interconnections (interblock wires), circuit techniques based on low-swing signals, while maintaining the power supply voltage, can lead to a reduction in power dissipation [58, 59]. With increasing chip dimensions and integration density, the capacitances of wires will dominate. It is expected that the power dissipation associated with the busses and the interconnections in future ULSI chips will reach half of the total power dissipation [58]. These are some guidelines for the design of low-dynamic-power circuits:
• Choose the technology that has low junction and oxide capacitances for the same performance.
• Avoid, if possible, the use of the dynamic logic design style.
• For any logic design, reduce the switching activity by logic reordering and by balancing delays through the gate tree to avoid glitching problems.
• If the pass-transistor logic style is used, careful design should be considered.
• Use a low-input-capacitance logic family.
• In non-critical paths, use minimum-size devices whenever it is possible without degrading the overall performance requirements.
4.11 ADIABATIC COMPUTING
As discussed in Section 4.3.2, the energy provided by the supply to charge a load C_L of a driver during charging and discharging is

$$E = C_L V^2 \tag{4.147}$$

where V is the power supply voltage, as shown in Fig. 4.116(a). Half of the energy is dissipated by the resistor of the pull-up PMOS device during the charging phase. A similar argument applies to the discharge resistor of the pull-down NMOS transistor. This analysis is valid even if a step power supply voltage, V, is applied to the network. From Fig. 4.116(b), the voltage drop across the resistor R_P varies from V (supply voltage) to zero. Hence, the energy dissipated by R_P is given by

$$E_R = \int V_R\, dQ = \int V_R\, C_L\, d(V - V_R) \tag{4.148}$$

then

$$E_R = \frac{1}{2} C_L V^2 \tag{4.149}$$

or, equivalently,

$$E_R = C_L V \bar{V} \tag{4.150}$$

where V̄ is the average voltage drop across the resistor of the pull-up PMOS.
If the power supply voltage has two half steps, as shown in Fig. 4.116(c), the energy dissipated by the resistor is

$$E_R = \frac{1}{4} C_L V^2 \tag{4.151}$$

So less energy is dissipated by the resistor when the average voltage across it is reduced, while keeping the swing and load capacitance constant. This is the principle of adiabatic switching [61, 62, 63].
For a multi-step power supply voltage, as shown in Fig. 4.116(d), the total energy dissipated is given by [61]

$$E = \frac{C_L V^2}{N} = \frac{E_{conventional}}{N} \tag{4.152}$$

and the one dissipated by the resistor is

$$E_R = \frac{1}{2} \frac{C_L V^2}{N} \tag{4.153}$$
where N is the number of uniformly distributed voltage steps. Fig. 4.117 shows an example of a driver with uniformly distributed supplies which are switched in successively. The voltage V_i is given by

$$V_i = \frac{i}{N} V$$

To charge the load, V_1 through V_N are connected to the load in succession (by closing switch 1, opening switch 1, closing switch 2, etc.). To discharge the load, V_{N−1} through V_1 are switched in the same way, and then switch 0 is closed, connecting the output to ground. Note that the multi-step supply voltage needs a longer time period than the conventional case to charge up the load capacitance. This technique has been used for large loads. Another variation is to use a supply voltage with a ramp form [62]. In this case, the energy is drastically reduced if a long time period is used. For the inverter, for example, pulsed power supplies (PPS) are applied to the circuit. Adiabatic computing becomes attractive only when the delay is not critical, because in this technique energy is traded for delay. The energy-delay product of the adiabatic circuit is much worse than that of conventional CMOS gates [64].
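The 1/N scaling of the resistor loss under stepwise charging can be checked with a simple time-domain simulation. The sketch below charges a capacitor through a series resistor in N equal supply steps, integrates i^2 R with a forward-Euler loop, and compares the result with C_L V^2 / (2N). The resistance, the hold time of ten time constants per step and the sample count are arbitrary assumptions, and the function names are hypothetical.

```python
def resistor_energy_stepwise(c_load, v_final, n_steps, r=1.0e3, samples=20000):
    """Energy dissipated in the series resistor while charging c_load to
    v_final through n_steps equal supply steps (forward-Euler integration)."""
    tau = r * c_load
    dt = 10.0 * tau / samples              # each step is held for 10 time constants
    v_cap = 0.0
    energy = 0.0
    for k in range(1, n_steps + 1):
        v_supply = k * v_final / n_steps   # uniformly spaced supplies, V_k = (k / N) * V
        for _ in range(samples):
            i = (v_supply - v_cap) / r
            energy += i * i * r * dt       # instantaneous loss in the resistor
            v_cap += i * dt / c_load       # capacitor voltage update
    return energy


def resistor_energy_formula(c_load, v_final, n_steps):
    """Closed form for the charging phase: E_R = C_L * V**2 / (2 * N)."""
    return 0.5 * c_load * v_final ** 2 / n_steps


if __name__ == "__main__":
    C, V = 1e-12, 3.3
    for n in (1, 2, 4, 8):
        sim = resistor_energy_stepwise(C, V, n)
        ref = resistor_energy_formula(C, V, n)
        print(f"N = {n}: simulated {sim * 1e12:.3f} pJ, formula {ref * 1e12:.3f} pJ")
```

The result does not depend on the value of R, only on the number of steps, which is why the saving has to be paid for in charging time rather than in device sizing.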
4.12 CHAPTER SUMMARY
This chapter has provided an introduction to low-power CMOS design. The power dissipation components of a CMOS gate have been discussed. Techniques to reduce the different components, at the physical and circuit levels, were presented. Novel CMOS design styles such as CPL, DPL, and SRPL were examined. Several issues in CMOS circuit design, such as clock distribution and ground bouncing, were reviewed. This chapter serves as a base for Chapters 6, 7, and 8, where subsystems and low-power architectures are discussed.
REFERENCES
[1] N. H. E. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," second edition, Addison-Wesley, Reading, MA, 1993.
[2] J. P. Uyemura, "Circuit Design for CMOS VLSI," Kluwer Academic Publishers, Norwell, MA, 1992.
[3] M. I. Elmasry, "Digital MOS Integrated Circuits II," IEEE Press Book, 1993.
[4] R. M. Swanson and J. D. Meindl, "Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits," IEEE J. Solid-State Circuits, vol. 7, no. 2, pp. 146-153, April 1972.
[5] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," IEEE J. Solid-State Circuits, vol. 19, no. 4, pp. 468-473, August 1984.
[6] S. M. Kang, "Accurate Simulation of Power Dissipation in VLSI Circuits," IEEE J. Solid-State Circuits, vol. 21, no. 5, pp. 889-891, October 1986.
[7] G. J. Fisher, "An Enhanced Power Meter for SPICE2 Circuit Simulation," IEEE Trans. Computer-Aided Design, vol. 7, pp. 641-643, May 1988.
[8] G. Y. Yacoub and W. H. Ku, "An Enhanced Technique for Simulating Short-Circuit Power Dissipation," IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 844-847, June 1989.
[9] N. Meijs and J. T. Fokkema, "VLSI Circuit Reconstruction From Mask Topology," Integration, vol. 2, no. 2, pp. 85-119, 1984.
[10] D. V. Heinbruch, "CMOS3 Cell Library," Addison-Wesley, Reading, MA, 1988.
[11] R. J. Landers and S. Mahant-Shetti, "Multiplexer-Based Architecture for High-Density, Low-Power Gate Arrays," in Symposium on VLSI Circuits, Tech. Dig., Honolulu, pp. 33-34, June 1994.
[12] M. I. Elmasry, "Digital MOS Integrated Circuits I," IEEE Press Book, 1981.
[13] R. H. Krambeck, C. M. Lee and H-F S. Law, "High Speed Compact Circuits with CMOS," IEEE J. Solid-State Circuits, vol. 17, no. 3, pp. 614-619, June 1982.
[14] V. Friedman and S. Liu, "Dynamic Logic CMOS Circuits," IEEE J. Solid-State Circuits, vol. 19, no. 2, pp. 263-266, April 1984.
[15] N. F. Goncalves and H. J. De Man, "NORA: A Race Free Dynamic CMOS Technique for Pipelined Logic Structures," IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261-266, June 1983.
[16] C. M. Lee and E. W. Szeto, "Zipper CMOS," IEEE Circuits and Devices Mag., vol. 2, no. 3, pp. 10-17, May 1986.
[17] N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," Addison-Wesley, Reading, MA, 1985.
[18] F. Lu and H. Samueli, "A 200-MHz CMOS Pipelined Multiplier-Accumulator Using a Quasi-Domino Dynamic Full-Adder Cell Design," IEEE J. Solid-State Circuits, vol. 28, no. 2, pp. 123-132, February 1993.
[19] J. Yuan and C. Svensson, "High-Speed CMOS Circuit Technique," IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 62-71, February 1989.
[20] M. Afghahi and C. Svensson, "A Unified Single-Phase Clocking Scheme for VLSI Systems," IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 225-233, February 1990.
[21] D. W. Dobberpuhl et al., "A 200-MHz 64-b Dual-Issue CMOS Microprocessor," IEEE J. Solid-State Circuits, vol. 27, no. 11, pp. 1555-1567, November 1992.
[22] H. B. Bakoglu, "Circuits, Interconnects, and Packaging for VLSI," Addison-Wesley, Reading, MA, 1990.
[23] K. Yano et al., "A 3.8-ns CMOS 16x16 Multiplier Using Complementary Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. SC-25, no. 2, pp. 388-394, April 1990.
[24] M. Suzuki et al., "A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. SC-28, no. 11, pp. 1145-1151, November 1993.
[25] A. Parameswar, H. Hara, and T. Sakurai, "A High-Speed, Low-Power, Swing Restored Pass-Transistor Logic Based Multiply and Accumulate Circuit for Multimedia Applications," IEEE Custom Integrated Circuits Conference, Tech. Dig., San Diego, CA, pp. 278-281, May 1994.
[26] L. A. Glasser and D. W. Dobberpuhl, "The Design and Analysis of VLSI Circuits," Addison-Wesley, Reading, MA, 1985.
[27] T. Kobayashi et al., "A Current-Controlled Latch Sense Amplifier and a Static Power-Saving Input Buffer for Low-Power Architecture," IEEE J. Solid-State Circuits, vol. SC-28, no. 4, pp. 523-527, April 1993.
[28] M. S. J. Steyaert et al., "ECL-CMOS and CMOS-ECL Interface in 1.2-μm CMOS for 150-MHz Digital ECL Data Transmission Systems," IEEE J. Solid-State Circuits, vol. SC-26, no. 1, pp. 18-24, January 1991.
[29] C. Mead and L. Conway, "Introduction to VLSI Systems," Addison-Wesley, Reading, MA, 1980.
[30] N. C. Li, G. L. Haviland and A. A. Tuszynski, "CMOS Tapered Buffer," IEEE J. Solid-State Circuits, vol. SC-25, no. 4, pp. 1005-1008, August 1990.
[31] M. Nemes, "Driving Large Capacitances in MOS LSI Systems," IEEE J. Solid-State Circuits, vol. SC-19, no. 1, pp. 159-161, February 1984.
[32] N. Hedenstierna and K. O. Jeppson, "CMOS Circuit Speed and Buffer Optimization," IEEE Trans. Computer-Aided Design, vol. CAD-6, no. 2, pp. 276-281, March 1987.
[33] A. J. Al-Khalili, Y. Zhu and D. Al-Khalili, "A Module Generator for Optimized CMOS Buffer," IEEE Trans. Computer-Aided Design, vol. CAD-9, no. 10, pp. 1028-1046, October 1990.
[34] S. R. Vemuru and A. R. Thorbjornsen, "Variable-Taper CMOS Buffer," IEEE J. Solid-State Circuits, vol. SC-26, no. 9, pp. 1265-1269, September 1991.
[35] J. Burkis, "Clock Tree Synthesis for High Performance ASICs," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. PS-8.1-PS-8.3, September 1991.
[36] P. D. Ta and K. Do, "A Low-Power Clock Distribution Scheme for Complex IC System," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. P1-5.1-P1-5.4, September 1991.
[37] H. Kojima, S. Tanaka, and K. Sasaki, "Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry," Symposium on VLSI Circuits, Tech. Dig., Honolulu, pp. 23-24, June 1994.
[38] J. S. Caravella and J. H. Quigley, "Three Volt to Five Volt Interface Circuit with Device Leakage Limited DC Power Dissipation," in IEEE ASIC Intern. Conf. and Exhibit, Rochester, NY, pp. 448-451, September 1993.
[39] M. Shoji, "CMOS Digital Circuit Technology," Prentice Hall Inc., Englewood Cliffs, NJ, 1988.
[40] F. Abu-Nofal et al., "A Three-Million Transistor Microprocessor," in IEEE International Solid-State Circuits Conf., pp. 108-109, February 1992.
[41] T. Gabara and D. Thompson, "Ground Bounce Control in CMOS Integrated Circuits," in IEEE International Solid-State Circuits Conf., pp. 88-89, February 1988.
[42] T. Gabara, "Ground Bounce Control and Improved Latch-up Suppression Through Substrate Conduction," IEEE J. Solid-State Circuits, vol. 23, no. 5, pp. 1224-1232, October 1988.
[43] M. Hashimoto and O-K Kwon, "Low dI/dt Noise and Reflection Free CMOS Signal Driver," in IEEE Custom Integrated Circuits Conf., Tech. Dig., pp. 14.4.1-14.4.4, 1989.
[44] T. Wada, M. Eino and K. Anami, "Simple Noise Model and Low-Noise Data-Output Buffer for Ultra-High-Speed Memories," IEEE J. Solid-State Circuits, vol. 25, no. 6, pp. 1586-1588, December 1990.
[45] R. Senthinathan and J. L. Prince, "Application Specific CMOS Output Driver Circuit Design Techniques to Reduce Simultaneous Switching Noise," IEEE J. Solid-State Circuits, vol. 28, no. 12, pp. 1383-1388, December 1993.
[46] T. Knight and A. Krymm, "A Self-Terminating Low-Voltage-Swing CMOS Output Driver," IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 457-464, April 1988.
[47] H-J Schumacher, J. Dikken and E. Seevinck, "CMOS Subnanosecond True-ECL Output Buffer," IEEE J. Solid-State Circuits, vol. 25, no. 1, pp. 150-154, February 1990.
[48] M. Pedersen and P. Metz, "A CMOS to 100K ECL Interface Circuit," in IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 226-227, February 1989.
[49] J. Martinez, "BTL Transceivers Enable High-Speed Bus Design," EDN, August 1992.
[50] B. Gunning, L. Yuan, T. Nguyen and T. Wong, "A CMOS Low-Voltage-Swing Transmission-Line Transceiver," in IEEE International Solid-State Circuits Conf., Tech. Dig., pp. 58-59, February 1992.
[51] J. H. Quigley, J. S. Caravella and W. J. Neil, "Current Mode Transceiver Logic (CMTL) for Reduced Swing CMOS, Chip to Chip Communication," in IEEE International ASIC Conference and Exhibit, Rochester, NY, Tech. Dig., pp. 452-457, September 1993.
[52] M. Kakumu, "Process and Device Technologies of CMOS Devices for Low-Voltage Operation," IEICE Trans. Electron., vol. E76-C, no. 5, pp. 672-
680, May 1993. [53] T. Kawahara et al., "Subthreshold Current Reduction for Decoded-Driver by Self-Reverse-Biasing." IEEE J. Solid-state Circuits, vol. 28, no. 11, pp. 1136-1144, November 1993. [54] S. Mutoh et al., "1 V Bigh-Speed Digital Ckcuit Technology with 0.5pm Multi-Threshold CMOS," in IEEE International ASIC Conference and Exhibit, Rocherter, NY,Tech. Dig.,pp. 186-189, September 1993. [55] M. Eoriguchi et el., "SSI CMOS Circuit for Low-Standby Subthreshold Current Giga-Scale LSI'r", IEEE J. of Solid-state Circuits, Vol. 28. No. 11, pp. 1131-1135 November 1993. [56] R. W. Badeau et al., "A 100-MAz Macropipelined VAX Microprocessor,"
IEEE J. Solid-state Cmcnits, vol. 27, no. 11, pp. 1585-1597, November 1992. [57] R. Brodersen, A. Chandrakasan and S. Sheng, "Design Techniques for Portable Systems", in IEEE International Solid-state Circuits Conf., Tech. Dig., pp. 168-169, February 1993. [58] Y.Nakagomeet al., "Sub.1-V Swing Internal Architecture for Futwe LowPower ULSI's," IEEE J . Solid-State Circuits. vol. 28, no. 4, pp. 414419,
A p d 1993. [59] A. Bellaouar, I. S. Abu-Khater, and M. I. Elmssry, "Low-Power CMOS/BiCMOS Drivers and Receivers for On-Chip Interconnects," IEEE 1. Solid-state Circuits. vol. 30, "0.1, May 1995. [601 A. Chandrakaran et al., ~~~~~-Power CMOS Digital Design", IEEE J. Solid-state Circuits, VOL 2, no. 4, pp. 473-484, April 1992.
LOW-POWER DIGITAL VLSI DESIGN
256
[61] L. J. Svensson, and . I . G. Kollcr, "Driving a Capacitive Load without Dissipating fCV'," IEEE Symporiam on Low Power Electronics, Tech. Dig., San-Diego, pp. 100-101, October 1994. 1621 T.Gabara, "Pulsed Power Supply CMOS - PPS CMOS," IEEE Sgmposium on Low Power Elcotronics, Tech. Dig., San-Dicgo, pp. 98-99, October
1994.
[63]J. S. Denker, "A Review of Adiabatic Computing," IEEE Symposium on Low Power Electronics, Tech. Dig.. San-Diego, pp. 94-97, October 1994. [64] M. Horowita, T. Indermaur. and R. Gonadeu, "Low-PowerDigitd Design." IEEE Symposium on Low Power Electroniw, Tech. Dig., Slm-Diego, pp. 8-11, October 1994.
5 LOW-VOLTAGE VLSI BICMOS CIRCUIT DESIGN
BiCMOS technology offers enhanced performance compared to CMOS at 5 V power supply voltage. Many high-speed BiCMOS SRAMs, gate arrays, ASICs, etc. have been fabricated [1]. In this chapter, we present a variety of BiCMOS logic circuits suitable for 3.3 V and sub-3.3 V operation. The potential gates for digital applications are identified. The chapter starts with the introduction of the conventional BiCMOS (totem-pole) gate, which is used in 5 V applications. The degradation of this gate with supply voltage scaling is demonstrated. In Section 5.2, we introduce the BiNMOS family suitable for low-voltage applications. Other logic families for low power supply voltage operation are discussed in Section 5.3. Low-voltage digital applications of BiCMOS are identified. The reader is referred to BiCMOS books [2, 3] to get more familiar with BiCMOS circuits.
5.1 CONVENTIONAL BICMOS LOGIC
In this section, the conventional BiCMOS logic family is introduced. This family has been used successfully in many applications at 5 V power supply voltage. The reason for the speed advantage of BiCMOS compared to CMOS is explained. At low voltage, the performance degradation of conventional BiCMOS is shown.

The CMOS inverter of Fig. 5.1 suffers from limited current drive when the load capacitance is large. To increase the drive capability of CMOS, a bipolar driver can be added at the output of the CMOS inverter. Fig. 5.2 shows one possible configuration to construct what is called a conventional BiCMOS inverter. The addition of the bipolar driver stage to the basic CMOS inverter is responsible for the high current driving capability of BiCMOS over CMOS. As a result, BiCMOS offers lower delay compared to that of CMOS, especially at high loading capacitance.

The operation of this gate is straightforward. When the input is low, the PMOS P is ON and its drain current turns the transistor $Q_1$ ON. The collector current of $Q_1$ charges the output load capacitance. As the output reaches $V_{DD} - V_{BE,on}$, where $V_{BE,on}$ is the turn-on voltage of the bipolar transistor and is about 0.7 V, $Q_1$ gradually turns OFF. During this period, the NMOS transistor $N_{d2}$ is ON. Since $N_{d2}$ is conducting, $Q_2$ is in the cutoff region. Transistor $N_{d2}$ can also be controlled by the output node. However, using the base node results in faster operation because the base of $Q_1$ is pulled up faster than the output node and because the voltage level of the base node is larger. If the input is high, the NMOS transistors N and $N_{d1}$ are ON. $Q_1$ is OFF while $Q_2$ turns ON to discharge the output node. As a result, the load capacitance is pulled down. As the output $V_o$ reaches $V_{BE}$, transistor $Q_2$ turns OFF and the output stays at this level. The conventional BiCMOS gate provides high drive capability, zero static power dissipation and high input impedance. More discussions on this gate are given in the following sections.
Figure 5.2 Conventional BiCMOS inverter.
5.1.1 DC Characteristics
Fig. 5.3 shows the DC transfer characteristic of the conventional BiCMOS inverter of Fig. 5.2. When the input voltage to the BiCMOS inverter is zero, both bipolar transistors are OFF. The PMOS device P operates in the linear region with zero drain-source voltage. Due to the subthreshold current of the transistor N (~10 pA), the base-emitter voltage of $Q_1$ is around 0.45 V. As a result, the output voltage is $V_o$ = 4.55 V (at $V_{DD}$ = 5 V). The base of the bipolar transistor $Q_2$ is at zero voltage because $N_{d2}$ is ON. As the input voltage increases, the subthreshold current of N increases, causing $V_{BE,Q_1}$ to rise and the output voltage to fall. When the input voltage is around mid-$V_{DD}$, both the P and N MOSFETs are ON and operate in the saturation region. The bipolar devices are also ON. At this point, the BiCMOS inverter is in the high-gain region and the output voltage drops sharply towards its low level.
Figure 5.3 The DC transfer characteristic of the conventional BiCMOS at 5 V.
As the input voltage increases again, the base of $Q_2$ follows the voltage of the output since N is ON. When the input voltage reaches $V_{DD}$, the PMOS P is OFF. The discharge device, $N_{d1}$, is ON and the base of $Q_1$ is at zero. Also, the output is completely discharged and N is ON. Then, the base of $Q_2$ is at zero. In this case, the output voltage is zero and both base-emitter voltages are zero.
5.1.2 Transient Switching Characteristics
In this section we study the transient behavior of the conventional inverter of Fig. 5.2. The purpose of this analysis is threefold: i) to understand the transient switching behavior of the gate, ii) to develop a simple analytic model, and iii) to show the superiority of BiCMOS compared to CMOS. The objective of delay analysis is to point out the important device and circuit parameters that affect the response of the gate. The developed model is very simple and can be used as a first-order approximation. We start with the analysis of the pull-up section. Then we show the difference in the case of the pull-down section. We assume a step input.
5.1.2.1 Transient Behavior
Fig. 5.4 shows the transient behavior of the BiCMOS inverter of Fig. 5.2. When the input falls to ground, transistor P turns ON and operates initially in the saturation region. Its drain current charges the parasitic capacitances at the base, and when $V_{BE,Q_1} = V_{BE,on}$, $Q_1$ turns ON. The emitter current increases in a relatively short time to its peak to charge the output load $C_L$, as shown in Fig. 5.4(b). The output voltage is pulled up following the base voltage of $Q_1$, as shown in Fig. 5.4(a). As the base voltage of $Q_1$ exceeds $V_{Tn}$, $N_{d2}$ turns ON to discharge the base of $Q_2$ to ground. But due to capacitive coupling, $V_{BE,Q_2}$ tends to be pulled up. When the base voltage of $Q_1$ is higher than $V_{DD} - V_{DS,sat}$, where $V_{DS,sat}$ is the saturation voltage of P, the PMOS transistor P enters the linear region and the drain (base) current drops gradually. Consequently, the emitter current of $Q_1$ starts falling. As the output voltage $V_o$ approaches the theoretical limit of $V_{DD} - V_{BE,on}$, $Q_1$ is expected to turn OFF gradually. However, due to the capacitive coupling between the base and the output node, $V_o$ exceeds this limit, as shown in Fig. 5.4(a). The same reasoning can be applied when the input rises to $V_{DD}$.
5.1.2.2 Analytic Delay Model
A simple delay analysis is carried out in this section. The reader can refer to [4, 5, 6] for other detailed models. We take into account the parasitic capacitances and the bipolar high-current effects. We do not take into account the parasitic resistances since they have no appreciable effect with advanced bipolar technology. This model is based on the model of [7]. Fig. 5.5 illustrates the transient equivalent circuit of the pull-up section (Fig. 5.2) of the conventional BiCMOS gate driving a load capacitance $C_L$. As we are interested in the 50% rise time, the PMOS current can be modeled by the saturation current of the device. This current is given by Equation (3.82) in Chapter 3

$$I_{DS,sat} = K_p C_{ox} v_{sat,p} W_p \left(|V_{GS}| - |V_{Tp}|\right) \tag{5.1}$$

where $V_{GS}$ is equal to $(V_{in,l} - V_{DD})$ and $V_{in,l}$ is the low level of the input. The capacitance $C_{p1}$ accounts for the parasitic capacitances of the MOS devices P, $N_{d1}$ and $N_{d2}$ at the base of the pull-up bipolar transistor. Therefore, it is given by

$$C_{p1} = C_{d,P} + C_{d,N_{d1}} + C_{g,N_{d2}} \tag{5.2}$$

where $C_{d,P}$ and $C_{d,N_{d1}}$ are the drain junction capacitances of P and $N_{d1}$, and $C_{g,N_{d2}}$ is the gate oxide capacitance of $N_{d2}$. The overlap capacitances of P and $N_{d1}$ are assumed negligible. The bipolar parasitic capacitance $C_{p2}$ of Fig. 5.5(a) is given by

$$C_{p2} = C_{C,Q_1} + C_{E,Q_1} \tag{5.3}$$

The total load capacitance, $C_o$, shown in Fig. 5.5(b), is given by

$$C_o = C_L + C_{S,Q_2} + C_{C,Q_2} \tag{5.4}$$

where $C_L$ is the external load capacitance, $C_{S,Q_2}$ is the average collector-substrate capacitance of $Q_2$ and $C_{C,Q_2}$ is the average base-collector capacitance of $Q_2$. Recall from Section 3.5.3 that the base-emitter diffusion capacitance is given by

$$C_{D,Q_1} = \tau_f \frac{dI_C}{dV_{BE}} \tag{5.5}$$

where $\tau_f$ is the forward transit time subject to high-level effects. The delay can be divided into three components:
1. The first component, $t_1$, is defined as the time required to turn $Q_1$ ON. The model of Fig. 5.5(a) can be used in this case. Writing the current equation at the base node of $Q_1$, we have

$$(C_{p1} + C_{p2}) \frac{dV_{BE}}{dt} = I_{DS,sat} \tag{5.6}$$

Solving that equation and assuming that the base-emitter voltage of $Q_1$ is initially zero, we have

$$t_1 = (C_{p1} + C_{p2}) \frac{V_{BE,on}}{I_{DS,sat}} \tag{5.7}$$

If the initial $V_{BE}$ is not zero then the above expression should be corrected. A typical value of $t_1$ is 17.5 ps for a total parasitic capacitance at the base node of 50 fF, $V_{BE,on}$ = 0.7 V, and $I_{DS,sat}$ = 2 mA.

2. The second component, $t_2$, is defined as the time required to charge the diffusion capacitance, $C_{D,Q_1}$. Starting from $t_1$, the collector current begins to rise quickly and then reaches its peak value, $I_{Cp}$. The output voltage changes slowly (see the waveforms of Fig. 5.4). So $t_2$ is defined as the time required for the collector current to reach its peak. This delay component is given by

$$t_2 I_{DS,sat} = \tau_f I_{Cp} \tag{5.8}$$

which means that the charge furnished by the PMOS is needed to charge the diffusion capacitance. Therefore,

$$t_2 = \frac{\tau_f I_{Cp}}{I_{DS,sat}} \tag{5.9}$$

The peak collector current of $Q_1$ can be approximated using Equation (3.111) [Section 3.5.2]. So we have

$$I_{Cp} = \sqrt{\beta_0 I_{KF} I_{DS,sat}} \tag{5.10}$$

where $\beta_0$ is the value of the gain for low-level injection and $I_{KF}$ is the forward knee current. Note that $\tau_f$ is increased by the collector current [see Equation (3.127), Section 3.5.3]. Hence, an average value of the forward transit time should be used in the above delay expression. The initial value of $\tau_f$ is 12 ps and it can reach 50 ps when the collector current reaches, for example, 5 mA. For $I_{DS,sat}$ = 2 mA, a typical value for $t_2$ is 78 ps (the average forward transit time is 31 ps).

3. The third component, $t_3$, is defined as the time required to charge the total load capacitance to the middle point of the output swing. If we assume that the voltage across the base-emitter of $Q_1$ is almost constant, then we have the following approximation

$$C_o \frac{dV_o}{dt} = I_{DS,sat} + I_{C,Q_1} \tag{5.11}$$

If we assume that $I_{C,Q_1}$ is constant at $I_{Cp}$ during this time [see Fig. 5.4], and the mid-point of the output is $V_{DD}/2$, then we have

$$t_3 = \frac{C_o V_{DD}}{2 (I_{DS,sat} + I_{Cp})} \tag{5.12}$$

The value of this delay varies by more than an order of magnitude depending on the device size and the load capacitance. For example, for a load $C_L$ of 1 pF, this delay, $t_3$, has a typical value of 400 ps at 5 V power supply voltage, while for a load of 100 fF a typical value is 70 ps. Hence, the total delay $t_d$ can be written as

$$t_d = t_1 + t_2 + t_3 \tag{5.13}$$
The first delay is associated with the parasitics at the base, the second one with the forward transit time, and the last one is a function of the load capacitance. For small loads, $t_1$ and $t_2$ dominate. However, for large output loads, the third delay term, $t_3$, dominates.

The expression of the pull-down time is similar to that of the pull-up time except for the value of the drain current of the transistor N [see Fig. 5.2]. The saturation current of this device is given by

$$I_{DS,sat} = K_n C_{ox} v_{sat,n} W_n (V_{GS} - V_{Tn}) \tag{5.14}$$

The $V_{GS}$ of the NMOS during the switching is affected by the $V_{BE}$ drop while that of the PMOS is not. This voltage is given by

$$V_{GS} = V_{in,h} - V_{BE} \tag{5.15}$$

where $V_{in,h}$ is the high level of the input. So the effective gate-source voltage of the NMOS is lower than that of the PMOS. The sizing of the NMOS and PMOS devices does not follow the rule used for CMOS. It can only be determined from circuit simulation to get symmetrical rise/fall delay times. The slope of the delay-load characteristic of the BiCMOS gate is smaller than that of CMOS, since it is equal to $V_{DD}/2(I_{DS,sat} + I_{Cp})$. For a CMOS gate, the slope is simply $V_{DD}/2I_{DS,sat}$. The saturation current in the CMOS is slightly higher than that of BiCMOS because the CMOS inverter has a slightly wider PMOS device [see next section]. However, the slope of the BiCMOS inverter is smaller due to the large $I_{Cp}$. Therefore, the BiCMOS gate has a higher drivability than CMOS.
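As a rough numerical check of the three-component delay model, the sketch below evaluates $t_1$, $t_2$ and $t_3$ with the typical values quoted in this section. The function names are illustrative, and two inputs are assumptions rather than data from the text: a peak collector current of 5 mA (the example current mentioned for $\tau_f$) and 0.1 pF of output parasitics added to the 1 pF load. The load term uses the delay-load slope $V_{DD}/2(I_{DS,sat} + I_{Cp})$ stated above.

```python
# Sketch of the pull-up delay model t_d = t1 + t2 + t3 (helper names are illustrative).

def t1_turn_on(c_base, v_be_on, i_ds_sat):
    """Time for I_DS,sat to charge the base-node parasitics up to V_BE,on."""
    return c_base * v_be_on / i_ds_sat

def t2_diffusion(tau_f_avg, i_cp, i_ds_sat):
    """Time for I_DS,sat to supply the diffusion charge tau_f * I_Cp."""
    return tau_f_avg * i_cp / i_ds_sat

def t3_load(c_out, v_dd, i_ds_sat, i_cp):
    """Time to charge C_o to V_DD/2 with a constant current I_DS,sat + I_Cp."""
    return c_out * v_dd / (2.0 * (i_ds_sat + i_cp))

# Typical values quoted in the text
I_DS_SAT = 2e-3     # A
C_BASE = 50e-15     # F, total parasitic capacitance at the base node
V_BE_ON = 0.7       # V
TAU_F_AVG = 31e-12  # s, average forward transit time
V_DD = 5.0          # V

# Assumptions made for this sketch only
I_CP = 5e-3         # A, assumed peak collector current
C_OUT = 1.1e-12     # F, 1 pF load plus an assumed 0.1 pF of output parasitics

t1 = t1_turn_on(C_BASE, V_BE_ON, I_DS_SAT)
t2 = t2_diffusion(TAU_F_AVG, I_CP, I_DS_SAT)
t3 = t3_load(C_OUT, V_DD, I_DS_SAT, I_CP)
td = t1 + t2 + t3

for name, value in (("t1", t1), ("t2", t2), ("t3", t3), ("td", td)):
    print(f"{name} = {value * 1e12:.1f} ps")
```

With these inputs, $t_1$ and $t_2$ land close to the 17.5 ps and 78 ps quoted above, and $t_3$ comes out near the 400 ps quoted for a 1 pF load, so the load-dependent term clearly dominates at large loads.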
5.1.3 CMOS and BiCMOS Comparison
Let us compare the delay of a BiCMOS gate to that of a CMOS gate, both having the same input capacitance. We consider the case of inverters with the following sizes. For the BiCMOS inverter, we have: $W_p = W_n = 10\ \mu m$, $W_{N_{d1}} = W_{N_{d2}} = 2\ \mu m$, and the emitter area is twice the minimum area. For the CMOS inverter, we have $W_p = 15\ \mu m$ and $W_n = 7\ \mu m$. For unloaded inverters, and from the delay expression of the BiCMOS inverter discussed above, $t_{d,CMOS} < t_{d,BiCMOS}$ because the BiCMOS circuit has more parasitics and requires an initial delay to turn ON the bipolar device. For large loads, $t_{d,CMOS} > t_{d,BiCMOS}$, as explained previously. Fig. 5.6 shows the simulated delays of the CMOS and BiCMOS inverters as a function of the fanout. Fanout is defined here as the ratio of the load seen by the gate to the input capacitance. In other words, fanout is equal to the number of gates connected to the output of the driving gate, all having the same input capacitance. The inputs are driven by a small-size inverter of the same type to have typical input waveform fall/rise times. For low fanout, 1 to 2, CMOS outperforms BiCMOS at 5 V power supply voltage. However, when the fanout is greater than 3, BiCMOS outperforms CMOS, particularly for high loads. In Fig. 5.6, the crossover capacitance (or fanout), denoted $C_x$, is typically on the order of 100 fF. This crossover value is critical for the performance of BiCMOS, particularly when the supply voltage is scaled down.
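The crossover load can be illustrated with two straight-line delay models, one per gate. The sketch below is only a first-order illustration under stated assumptions: each delay is taken as an intrinsic term plus a slope of the form $V_{DD}/2I$ times the load, the extra BiCMOS intrinsic delay is assumed equal to $t_1 + t_2$ (17.5 ps + 78 ps), the CMOS drive current is assumed to be 2.2 mA, and the BiCMOS drive is assumed to be 2 mA of MOS current plus 5 mA of peak collector current. None of these current values come from the simulations of Fig. 5.6.

```python
# First-order crossover estimate between CMOS and BiCMOS delay-versus-load lines.
# All numeric inputs below are assumptions chosen for illustration.

def delay_slope(v_dd, i_drive):
    """Delay per unit load capacitance (s/F) for a constant drive current."""
    return v_dd / (2.0 * i_drive)

def crossover_capacitance(extra_delay, slope_cmos, slope_bicmos):
    """Load at which the two straight-line delay models intersect."""
    return extra_delay / (slope_cmos - slope_bicmos)

V_DD = 5.0
I_CMOS = 2.2e-3          # A, assumed CMOS saturation current (slightly wider PMOS)
I_BICMOS_MOS = 2.0e-3    # A, assumed BiCMOS MOS saturation current
I_CP = 5.0e-3            # A, assumed peak collector current
EXTRA_DELAY = 95.5e-12   # s, assumed BiCMOS penalty at zero load (t1 + t2)

slope_cmos = delay_slope(V_DD, I_CMOS)
slope_bicmos = delay_slope(V_DD, I_BICMOS_MOS + I_CP)
c_x = crossover_capacitance(EXTRA_DELAY, slope_cmos, slope_bicmos)

print(f"CMOS slope   = {slope_cmos * 1e-3:.3f} ns/pF")
print(f"BiCMOS slope = {slope_bicmos * 1e-3:.3f} ns/pF")
print(f"crossover    = {c_x * 1e15:.0f} fF")
```

Under these assumptions the crossover falls a little above 100 fF, which agrees in order of magnitude with the value read from Fig. 5.6. Lowering the bipolar drive, as happens when the supply is scaled down, pushes the crossover to larger loads.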
5.1.4 Power Dissipation
As discussed, the BiCMOS gate of Fig. 5.2 has no DC current path from $V_{DD}$ to $V_{SS}$ if the input has rail-to-rail swing. Hence the static power dissipation is negligible if $V_T$ of the MOS devices is high. The dynamic power dissipation of the gate can be estimated from the circuit diagram of Fig. 5.7. It is estimated by

$$P_d = C_{p1} V_{DD}^2 f + C_{p2} V_{BE,max}^2 f + C_o V_{DD} (V_H - V_L) f \tag{5.16}$$

The first term is due to the total parasitic capacitance at the base node of $Q_1$, where the swing is $V_{DD}$. The second term is due to the parasitic capacitance at the base node of $Q_2$. The swing at this node is limited to $V_{BE,max}$, which is reached when the collector current is at its peak. Finally, the third term is related to the output load capacitance, $C_L$, and the parasitic capacitance at the output. The swing is only $V_H - V_L$, where $V_H$ and $V_L$ are the high level and the low level of the output, respectively. These levels are affected by the output load.
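The three terms of the dynamic power estimate can be evaluated directly. The sketch below follows the description above (base node of $Q_1$ swinging by $V_{DD}$, base node of $Q_2$ swinging by $V_{BE,max}$, output swinging by $V_H - V_L$). The function name and every numeric value are hypothetical and serve only to show the relative weight of the terms.

```python
# Sketch of the three-term dynamic power estimate of the conventional BiCMOS gate.
# The capacitances and voltage levels below are hypothetical illustration values.

def bicmos_dynamic_power(c_p1, c_p2, c_o, v_dd, v_be_max, v_h, v_l, f):
    """Return the three power terms (W): base of Q1, base of Q2, output node."""
    base_q1 = c_p1 * v_dd ** 2 * f
    base_q2 = c_p2 * v_be_max ** 2 * f
    output = c_o * v_dd * (v_h - v_l) * f
    return base_q1, base_q2, output

terms = bicmos_dynamic_power(
    c_p1=30e-15,    # F, assumed parasitics at the base of Q1
    c_p2=20e-15,    # F, assumed parasitics at the base of Q2
    c_o=200e-15,    # F, assumed load plus output parasitics
    v_dd=5.0,       # V
    v_be_max=0.9,   # V, assumed peak base-emitter voltage
    v_h=4.3,        # V, assumed output high level (V_DD - V_BE)
    v_l=0.7,        # V, assumed output low level (V_BE)
    f=100e6,        # Hz
)
p_total = sum(terms)
output_share = terms[2] / p_total

print(f"total = {p_total * 1e3:.3f} mW, output term share = {output_share:.0%}")
```

For these illustrative values the output term accounts for most of the power, while the base-node terms form a fixed overhead that matters mainly at small loads.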
For small loads the power of BiCMOS is greater than that of CMOS, while for large loads they have almost the same dynamic power. Table 5.1 shows the simulation results of the power dissipation for both gates at 5 V power supply. At a fanout of 1, CMOS consumes much lower power than BiCMOS and it is faster. However, at a fanout of 10, the BiCMOS is faster (37.5% delay reduction) and it dissipates only 24% more power than CMOS.

When a BiCMOS gate is driving another BiCMOS gate, or a CMOS gate, the driven gate exhibits a DC power dissipation. This DC current is not acceptable, particularly when the circuit is in standby mode. It is due to the reduced swing at the output of the first gate. Fig. 5.8 shows an example of a BiCMOS gate driving a CMOS gate. If, for example, the output of the first gate (BiCMOS) is at $V_{BE}$, the $V_{GS}$ of the driven NMOS would be higher than zero and around $V_T$, resulting in appreciable DC power. Furthermore, the drive current of the driven gate would be reduced, particularly at low power supply voltage. Another disadvantage of the reduced swing is the noise margin reduction.
Table 5.1 CMOS/BiCMOS power dissipation versus load (fanout) at $V_{DD}$ = 5 V and f = 100 MHz

Driver         Fanout=1   Fanout=5   Fanout=10
CMOS (mW)      0.23       0.58       1.02
BiCMOS (mW)    0.67       0.83       1.26
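The power penalty of BiCMOS at each fanout can be read off the table values with a few lines of code. In the sketch below the assignment of the values to the CMOS and BiCMOS rows follows the statement in the text that CMOS consumes less power; the dictionary layout and function name are illustrative.

```python
# Power figures (mW) at V_DD = 5 V and f = 100 MHz, keyed by fanout.
power_mw = {
    "CMOS": {1: 0.23, 5: 0.58, 10: 1.02},
    "BiCMOS": {1: 0.67, 5: 0.83, 10: 1.26},
}

def bicmos_overhead_percent(fanout):
    """Extra power drawn by the BiCMOS gate relative to CMOS, in percent."""
    cmos = power_mw["CMOS"][fanout]
    bicmos = power_mw["BiCMOS"][fanout]
    return 100.0 * (bicmos / cmos - 1.0)

for fanout in (1, 5, 10):
    print(f"fanout {fanout:2d}: BiCMOS uses {bicmos_overhead_percent(fanout):.0f}% more power")
```

The overhead shrinks from roughly a factor of three at a fanout of 1 to about 24% at a fanout of 10, which matches the figure quoted in the text.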
Figure 5.8 DC power dissipation of the driven gate: Gate 1 (BiCMOS) driving Gate 2 (CMOS).

5.1.5 Full-Swing with Shunting Devices
Previously we have seen that BiCMOS circuits exhibit reduced output swing. To overcome this shortcoming, various types of BiCMOS gates have been devised. They are based on the conventional BiCMOS circuits with base-emitter or collector-emitter shunting techniques, or on other logic circuits which will be discussed in the following sections. Figure 5.9 shows some of the circuits based on shunting devices. Fig. 5.9(a) illustrates one full-swing (FS) configuration, called the "FS type" gate [8], which uses MOS devices to achieve full swing. For the charging phase, as the output exceeds $V_H$, $Q_1$ ceases to source current to the load, and the load capacitance is charged through the shunting PMOS transistor $P_s$. When the input goes HIGH, the load is discharged through N and $N_s$. When $V_o$ falls below $V_L$, $Q_2$ ceases to sink current from the load capacitance. Then the output is discharged to ground through only the MOS transistors N and $N_s$. The final charging and discharging phases occur through the shunting devices. Hence, these phases can be slow because the MOS shunting devices have low drive capabilities. When this FS BiCMOS gate is operating at high frequency, the output swing can be reduced. Another drawback of this circuit is that part of the current supplied by P (N) is wasted through the shunting transistors, which weakens the bipolar drive. The shunting transistors $P_s$ and $N_s$ can be minimum size.

The problem of the base drive inherent in the "FS type" BiCMOS gate can be overcome by using feedback (FB) from the output through an inverter, as shown in Fig. 5.9(b). This circuit is called "FB type" [9]. During the pull-up transition, the shunting device $P_s$ is initially OFF and the PMOS transistor P supplies all its current to the base of $Q_1$. When $V_o$ approaches its high level, the inverter I turns ON $P_s$, which itself charges the output node to $V_{DD}$. The pull-down transition can be explained similarly. The shunting devices $P_s$ and $N_s$ and the inverter I can be sized properly to achieve greater speed than the other configurations, even the conventional BiCMOS gate.
Figure 5.9 Full-swing BiCMOS gate types: (a) "FS type"; (b) "FB type"; (c) "CE-shunting type".
Another full-swing configuration is the one shown in Fig. 5.9(c). It uses a parallel inverter from the input to shunt the collector-emitter (CE) of the $Q_1$ and $Q_2$ outputs. The disadvantage of this gate is the increased input capacitance.
5.1.6 Power Supply Voltage Scaling
The output bipolar stage introduces $V_{BE}$ voltage losses at the output node, as discussed earlier. When a BiCMOS gate is driving another BiCMOS gate, the conventional BiCMOS gate loses its superior performance over CMOS at lower power supply voltage. The major cause of this problem is the pull-down section of the BiCMOS gate. The $V_{GS}$ voltage of the driving NMOS transistor of the pull-down section is equal to $V_{DD} - 2V_{BE}$. As $V_{DD}$ is reduced, $V_{GS}$ is significantly reduced, resulting in degradation of the drain current and hence of the driving capability of the conventional BiCMOS gate. Fig. 5.10 shows the delay of a BiCMOS inverter in comparison to that of a CMOS inverter as the supply voltage is scaled down. The reported delay times were extracted from SPICE simulation by measuring the delay of the second gate in a chain of identical inverters. All gates were equally loaded by a load $C_L$ = 0.25 pF and one fanout. All the circuits have the same input capacitance. The BiCMOS inverter fails to operate at 2 V power supply. The BiCMOS outperforms CMOS, but at 3 V and below it loses its superior performance. The limit of operation of the conventional BiCMOS gate with the power supply voltage is determined by the NMOS device of the pull-down section. The drive current of this NMOS device is proportional to $(V_{DD} - 2V_{BE} - V_{Tn})$. Hence, $V_{DD,min} \approx 2.2$ V. Therefore, high-performance BiCMOS circuits at low voltage are needed that minimize:

■ Technology/process complexity;
■ Circuit complexity by using fewer devices;
■ Area occupied by the gate; and
■ Power dissipation.
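The supply-voltage limit of the conventional gate follows from the gate overdrive of the pull-down NMOS, which loses two $V_{BE}$ drops. The sketch below reproduces the figure of about 2.2 V; the value $V_{BE}$ = 0.7 V is the turn-on voltage quoted earlier, while $V_{Tn}$ = 0.8 V is an assumption chosen to be consistent with that limit, and the function names are illustrative.

```python
# Gate overdrive of the pull-down NMOS in a BiCMOS-to-BiCMOS chain.
# V_TN = 0.8 V is an assumed threshold voltage, not a value given in the text.

V_BE = 0.7   # V, bipolar turn-on voltage
V_TN = 0.8   # V, assumed NMOS threshold voltage

def nmos_overdrive(v_dd, v_be=V_BE, v_tn=V_TN):
    """Overdrive V_GS - V_Tn of the pull-down NMOS, with V_GS = V_DD - 2*V_BE."""
    return v_dd - 2.0 * v_be - v_tn

def v_dd_min(v_be=V_BE, v_tn=V_TN):
    """Supply voltage at which the overdrive reaches zero."""
    return 2.0 * v_be + v_tn

for v_dd in (5.0, 3.3, 2.5, 2.0):
    print(f"V_DD = {v_dd:.1f} V -> overdrive = {nmos_overdrive(v_dd):+.2f} V")
print(f"V_DD,min = {v_dd_min():.1f} V")
```

At 5 V the overdrive is 2.8 V, at 3.3 V only 1.1 V remains, and at 2 V it is negative, which is consistent with the inverter failing to operate at that supply.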
5.2 BINMOS LOGIC FAMILY
BiCMOS technology can gain much of its performance edge over CMOS with circuit techniques that minimize or eliminate the effects of $V_{BE}$ losses. To overcome the problem of delay degradation in conventional BiCMOS with supply voltage scaling, many novel circuits were proposed. In this section, a practical family suitable for the 3.3 V and sub-3.3 V operation regime is outlined. Fig. 5.11 shows the BiNMOS family of BiCMOS & 1.
For poly-Si load (6.5)
Thus, the high-storage node, in the case of the PMOS TFT cell, is charged up quickly to $V_{DD}$. For this reason, the Soft Error Rate (SER) of the PMOS TFT cell is much lower than that of the poly-Si cell [El.
6.1.3 Read/Write Operation
Fig. 6.9 shows a simplified readout circuitry for an SRAM. The circuit has static bit-line loads composed of pull-up NMOS devices $N_1$ and $N_2$. The bit-lines are pulled up to a voltage $(V_{DD} - V_T)$, where $V_T$ is the threshold voltage subject to body effect. When the word-line WL is asserted, one word is selected. At this time, the bit-line BL is pulled down to a level determined by the pull-up NMOS $N_1$, the word-line transistor $N_a$, and the driver NMOS transistor $N_d$, as shown in Fig. 6.9(b). The voltage at the node A should be low (near ground) so as not to alter the RAM content during this read operation. A small swing on BL is desirable to achieve high-speed readout, particularly if $C_{BL}$ is high. The Sense Amplifier (SA) amplifies the small swing, $\Delta V$, on the bit-line. A typical value of $\Delta V$ is 100 mV. It should be noted that the SA should provide a wide operating margin over all process, temperature, and voltage corners.

Figure 6.10 Power reduction by pulsing the word line.
If the WL signal stays asserted, all selected columns consume a DC current flowing through the NMOS devices $N_1$, $N_a$ and $N_d$. Thus, shortening the read mode duration is necessary to reduce the power dissipation during this active mode. This is possible by pulsing WL with enough time to read the cell, as shown in Fig. 6.10. The generation of the pulsed WL signal is possible owing to the Address Transition Detection (ATD) technique, as will be discussed in Section 6.1.5.

Fig. 6.11(a) shows a simplified circuit configuration for the SRAM write operation. For a write operation the memory cell state should be flipped. When the write signal WE is asserted, the input data and its complement are placed on the bit-lines. If, for example, a zero has to be stored in the node A, initially at $V_{DD}$, the voltage at this node should be pulled below the threshold voltage of the cell, as shown in the equivalent circuit of Fig. 6.11(b). The bit-line in this case is pulled down to almost 0 V. The design of the write circuitry should provide a wide operating margin over all process, temperature, and voltage corners. Note that a DC current is consumed during a write mode, hence the WE signal should also be short to cut this current at the end of the write operation. In high-speed SRAMs, write recovery time is an important component of the write cycle time. It is defined as the time necessary to recover from the write cycle to the read state after the WE signal is disabled. Note that the swing on the bit-lines after a write operation is large. Thus, an equalizer circuit is needed to reduce this swing, so that the read operation is performed quickly. Fig. 6.12 illustrates a simplified schematic of an SRAM with read/write circuitry. At the end of the memory cycle a differential voltage exists on the bit-lines. A PMOS equalizing device is used to equalize the bit-lines after each read and write operation. The differential voltages on the bit-lines are restored
■ The decoders (row and column);
■ The memory array. If m memory cells are connected to the word-line, the active power of the memory array (in read mode) is given by

$$P_{\text{mem-array}} = m P_{act} + (n-1) m P_{leak} + m I_{DC} \Delta t f V_{DD} \tag{6.6}$$

where $P_{act}$ is the power dissipated in active mode when selecting the m cells and $P_{leak}$ is the data retention (standby) power of the unselected memory cells in the m × n array. The second term is negligible. The third term is due to the DC current, $I_{DC}$, during the read operation. $\Delta t$ is the activation time of the DC-consuming parts and f is the operating frequency ($f = 1/t_{RC}$). An example of such a current is the DC current flowing from the bit-line load to ground through the memory cell;
■ Sense amplifiers. They are dominated mainly by a DC current; and
■ Remaining periphery such as input/output buffers, write circuitry, etc.

Note that the power dissipated by the pads is not included. The power dissipation of the components other than the memory array depends on the total capacitances, the operating frequency and the internal voltage swing. It can include a DC component with a major contribution from the sense amplifier.
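The effect of the activation time on the array power can be illustrated numerically. The sketch below evaluates the three contributions described above (selected cells, unselected cells, and the DC read current weighted by the activation time) for a static and a pulsed word-line. The array size, per-cell powers, DC current and timing values are all hypothetical, as are the function and variable names.

```python
# Sketch of the memory-array read power with static versus pulsed word-line.
# Every numeric value below is a hypothetical illustration value.

def array_read_power(m, n, p_act, p_ret, i_dc, dt, f, v_dd):
    """Return (selected, unselected, dc) power terms in watts for an m x n array."""
    selected = m * p_act                 # m cells on the active word-line
    unselected = (n - 1) * m * p_ret     # retention power of the other rows
    dc = m * i_dc * dt * f * v_dd        # DC column current, on for dt per cycle
    return selected, unselected, dc

M, N = 128, 512          # assumed cells per word-line and number of rows
F = 50e6                 # Hz, assumed operating frequency (20 ns cycle)
V_DD = 5.0               # V
P_ACT = 10e-6            # W, assumed active power per selected cell
P_RET = 1e-9             # W, assumed retention power per unselected cell
I_DC = 100e-6            # A, assumed DC current per selected column

# Word-line asserted for the whole cycle versus a 5 ns pulse
_, unselected, dc_static = array_read_power(M, N, P_ACT, P_RET, I_DC, 1.0 / F, F, V_DD)
_, _, dc_pulsed = array_read_power(M, N, P_ACT, P_RET, I_DC, 5e-9, F, V_DD)

print(f"unselected rows: {unselected * 1e3:.3f} mW")
print(f"DC term, static word-line: {dc_static * 1e3:.1f} mW")
print(f"DC term, pulsed word-line: {dc_pulsed * 1e3:.1f} mW")
```

In this example the DC term scales directly with the activation time, so a 5 ns pulse in a 20 ns cycle cuts it by a factor of four, while the retention term of the unselected rows stays negligible.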
To reduce the active power consumption many techniques can be used. They are summarized as follows:

■ Reducing the capacitance of the word-line and the number of cells, m, connected to it. This is possible by using Hierarchical Word-Line (HWL) techniques.
■ Reducing the DC current by using the pulse operation technique for the word-line and the periphery circuits (including the sense amplifier).
■ Use of multi-stage static CMOS decoding to reduce the AC current.
■ Lowering the operating power supply voltage.
The standby power (or, as it is sometimes called, the retention current) of an SRAM has a major contribution from the memory cells in the array if the sense amplifiers are disabled in this mode. It is given by

$$P_{standby} = m n P_{leak} \tag{6.7}$$

One way to reduce the standby current is to reduce the operating voltage. However, note that the data-retention current will increase with memory capacity. Moreover, the leakage current per cell tends to increase because the threshold voltage is expected to be reduced for low-voltage operation.
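The two trends mentioned above, growth with capacity and growth with reduced threshold voltage, can be sketched as follows. The standby power is taken as the number of cells times a per-cell retention power. The exponential dependence of leakage on threshold voltage uses an assumed subthreshold swing of 100 mV per decade, which is a generic textbook-style value and not a figure from this chapter; the function names and cell power are likewise hypothetical.

```python
# Sketch of standby power scaling with array size and threshold-voltage reduction.
# The per-cell power and the subthreshold swing are assumed illustration values.

def standby_power(m, n, p_cell):
    """Standby power of an m x n array with p_cell watts of retention power per cell."""
    return m * n * p_cell

def leakage_scale(vt_reduction, swing=0.1):
    """Factor by which subthreshold leakage grows when V_T drops by vt_reduction volts.

    swing is the assumed subthreshold swing in volts per decade of current.
    """
    return 10.0 ** (vt_reduction / swing)

P_CELL = 1e-9  # W, assumed retention power per cell

p_1m = standby_power(1024, 1024, P_CELL)
p_4m = standby_power(2048, 2048, P_CELL)
print(f"1 Mb: {p_1m * 1e3:.2f} mW, 4 Mb: {p_4m * 1e3:.2f} mW")
print(f"leakage growth for a 0.2 V lower V_T: x{leakage_scale(0.2):.0f}")
```

Quadrupling the capacity quadruples the standby power for a fixed cell, and under the assumed swing a 0.2 V reduction in threshold voltage raises the per-cell leakage by about two orders of magnitude.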
In the following sections, many key circuits in an SRAM are reviewed. The circuit techniques and memory organization to reduce the active and data-retention currents are presented.
6.1.5 Address Transition Detector (ATD) Circuit
To generate the different timing signals for word-lines, equalization and sensing, an on-chip pulse generator, which detects the address change, is needed. It is based on the address transition detection technique. The ATD is a key technique to reduce the active power of memories. Fig. 6.14(a) shows the schematic diagram of an ATD pulse generator. Short pulses are generated with XOR circuits when the address changes from "L" to "H" or from "H" to "L"; these pulses are then summed through an OR gate. The overall pulse width is controlled by the RC delay line shown in Fig. 6.14(b). The corresponding waveforms are shown in Fig. 6.14(c). The ATD pulse is usually stretched out with a delay circuit to generate the different pulses needed in the SRAM. Note that the CS signal is also included as an input to the ATD generator.
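The XOR-and-OR principle of the ATD generator can be mimicked with a small discrete-time model. In the sketch below each address bit is XORed with a copy of itself delayed by a fixed number of time steps, standing in for the RC delay line, and the per-bit pulses are ORed together. The function name, the sampled-trace representation and the two-bit example are illustrative assumptions; a CS input could be handled as one more bit in each word.

```python
# Discrete-time sketch of address transition detection (XOR with a delayed copy, then OR).

def atd_pulse(address_trace, delay):
    """Return the ATD output (0/1) for each time step.

    address_trace: sequence of equal-length bit tuples, one tuple per time step.
    delay: delay-line length in time steps; it sets the pulse width.
    """
    output = []
    for t, word in enumerate(address_trace):
        delayed = address_trace[max(t - delay, 0)]
        # XOR each bit with its delayed copy, then OR the per-bit pulses together.
        output.append(int(any(a ^ b for a, b in zip(word, delayed))))
    return output

# Two-bit address: bit 0 rises at t = 3, bit 1 rises at t = 9.
trace = [(0, 0)] * 3 + [(0, 1)] * 6 + [(1, 1)] * 6
pulse = atd_pulse(trace, delay=2)
print(pulse)
```

Each address change, on either bit, produces one pulse whose width equals the delay, which is the behavior the delay line is meant to control.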
6.1.6 Decoders
Usually the decoding in an SRAM is performed using complementary CMOS. Two kinds of decoders are used: the row and the column decoders. Fast static decoders are based on OR/NOR and AND/NAND gates. Fig. 6.15 shows an example of a two-bit input address row decoder. The input buffers have to drive the interconnect capacitance of the address lines and the input capacitance of the NAND gates. To match the pitch of the memory cell and to perform decoding for several blocks, two-stage decoders are used. The first stage performs predecoding and the second one performs the final decoding function [Fig. 6.16]. The two-stage decoder circuit has other advantages over the one-stage decoder, such as reducing the number of transistors and the fan-in. It also reduces the loading on the address input buffers. This predecoding technique optimizes both speed and power. In the last stage an additional signal φ is included in the AND gate. This signal is generated from an ATD pulse generator to enable the decoder and ensure a pulse-activated word-line. There are several ways to build row decoders, and the choice depends on how the RAM architecture is divided.
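The two-stage decoding idea can be illustrated with a small logic-level sketch. It assumes predecoding in groups of two address bits and an active-high enable pulse; the function names and the grouping are hypothetical and do not reproduce the gate-level circuit of Fig. 6.16.

```python
from itertools import product


def predecode(bits):
    """First stage: one-hot decode a small group of address bits (MSB first)."""
    index = 0
    for b in bits:
        index = index * 2 + b
    return [int(i == index) for i in range(2 ** len(bits))]


def row_decode(address_bits, enable, group=2):
    """Second stage: AND one predecoded line per group with the enable pulse."""
    groups = [address_bits[i:i + group] for i in range(0, len(address_bits), group)]
    pre = [predecode(g) for g in groups]
    word_lines = []
    for combo in product(*[range(len(p)) for p in pre]):
        selected = all(p[i] for p, i in zip(pre, combo))
        word_lines.append(int(selected and bool(enable)))
    return word_lines


wl = row_decode([1, 0, 1, 1], enable=1)   # address 0b1011
print(wl.index(1), sum(wl))
```

With four address bits split into two groups, each final gate in this sketch combines two predecoded lines plus the enable, instead of four address lines plus the enable, which is the fan-in reduction the text refers to. With `enable=0` no word-line is activated.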
The column decoder permits the selection of 1 out of m bits of the accessed row. Fig. 6.17(a) shows the circuits involved in column selection, using an example of 4 columns. The selected gate permits the transfer of the data from the bit-lines to the common data-lines I/O. The signals Yi are controlled by the AND/NAND column decoder, as shown in Fig. 6.17(b).
6.1.7 Bit-line Conditioning Circuitry
The NMOS bit-line loads [Fig. 6.18] have been used in many SRAMs at 5 V power supply. They provide a precharge level on the bit-lines of $V_{DD} - V_T$. The threshold voltage of the load, $V_T$, is subject to the body effect. A typical value of this precharge level for a 5 V power supply is 3.5 V. This level is suitable for voltage-type sense amplifiers, providing large gain and a fast sensing delay.
To reduce the DC current during the write cycle, a variable bit-line load technique can be employed [Fig. 6.19]. It realizes fast sensing in the read cycle and a short write pulse width in the write cycle. For fast sensing, the voltage swing of the bit-line should be small. To achieve this, the load impedance should be low. On the other hand, to obtain a low current during the write cycle, the load impedance of the bit-lines should be high. As shown in Fig. 6.19, during the read operation all four NMOS transistors N1, N2, N3, and N4 are turned ON. The bit-lines are switched into a low-impedance state so that the voltage swing of the bit-lines is limited to a small value (e.g., 100 mV). During the write operation, two of the NMOS devices are switched OFF and only the two small-size transistors remain ON.
Figure 6.19 Variable load bit-lines.
As the power supply voltage is scaled down to 3 V, the precharge level can be lower than 2 V. Thus, during the read operation the high-level node of the memory cell can become equal to the bit-line voltage. Hence, the noise margin of the memory cell is drastically degraded and consequently the cell stability and soft-error immunity are degraded. Therefore, at 3 V power supply voltage, a PMOS transistor can be used as the bit-line load [Fig. 6.20]. The bit-line precharge voltage is then $V_{DD}$. At low supply voltage with this bit-line precharge level, special sense amplifiers should be used because conventional sensing circuits have poor voltage gain (less than 10). A variable-impedance bit-line load using PMOS transistors can also be implemented.
6.1.8 Sense Amplifier
When reading a memory cell, the bit-lines are initially precharged; then one of the two bit-lines goes down while the other stays high. Pulling down the bit-line is very slow because the discharging MOS device in the memory cell is small and the bit-line capacitance is high. This results in a very slow memory read time. Sense amplifiers are used to detect the small variation on the bit-lines and amplify it to obtain a full-swing signal at the end. A simple unbalanced inverter with a high logic threshold voltage can be used. Since its input is single-ended and has a very small noise margin, it is very sensitive to noise on the bit-line. Thus, sense amplification for the data-lines is a key to achieving fast access time and low power dissipation. In general, the delay of a sense amplifier (from the time of word-line activation) represents 30 to 40% of the whole read access time. Various kinds of sense amplifiers have been devised for fast sensing operation and low power dissipation.

Fig. 6.21(a) shows a single-end sense amplifier with an active current mirror. This structure forms the basis for many SRAM sense amplifier circuits. It has two differential inputs, $DL$ and $\overline{DL}$. Noise affects both inputs equally and only the difference is detected. One NMOS transistor acts as a current source. Before the signal $\phi_{SA}$ is asserted, the data-lines $DL$ and $\overline{DL}$ are high. All the nodes, A, B and C, are high. The signal $\phi_{SA}$ is asserted when $DL$ starts, for example, to drop slowly. In this case, one of the NMOS driver transistors is ON. The output voltage (node C) drops suddenly to a certain voltage. Thus, the input signal is amplified by the gain of this differential amplifier.

Fig. 6.21(b) shows the voltage waveforms of the single-end sense amplifier obtained by SPICE simulation. The signal $\phi_{SA}$ is generated with an ATD pulse. It is asserted for a time long enough to amplify the small variation (a few hundred mV) on the data-lines, then it is deactivated. In this scheme the DC current consumed by the sense amplifier is cut off. Usually the sense amplifier is common to many columns through the common data-lines. The small-signal gain of this amplifier is given by

$$A = \frac{g_{mn}}{g_o} \qquad (6.8)$$

where $g_{mn}$ is the transconductance of the driver NMOS $N_d$ and $g_o$ is the combined output conductance of the PMOS load and the NMOS driver.
Once sensing is complete, the output of the sense amplifier is latched.

In many SRAMs multi-stage sense amplifiers are needed to attain a large voltage gain. In this case, the double-end sense amplifier is used, as shown in Fig. 6.22. This circuit has been used in many SRAMs. To attain high-speed data sensing, a two- or three-stage sense amplifier technique can be adopted. Fig. 6.23 shows a two-stage amplifier structure. An equalization technique is used for the data-lines, using the equalization pulse $\phi_{eq}$, which is generated with an ATD pulse. It is indispensable, not only to attain faster data transfer during the read operation, but also to suppress incorrect data before the correct data appears in the sense amplifier [17]. This type of sense amplifier can dissipate too much power in high-density memories, even if the current source is pulsed, which is a concern for low-power applications and given the plastic packaging limitations of static memories.

Many circuits have been proposed to reduce the power of the sense amplifier while improving the sensing delay time. One of them is the PMOS cross-coupled amplifier [18] shown in Fig. 6.24. The PMOS loads, P1 and P2, are cross-coupled and the differential outputs $S$ and $\overline{S}$ are connected to their gates. The positive feedback in this latch amplifier permits a much faster sense speed than the conventional one. In this circuit the equalization technique is used for the reasons discussed above.

Figure 6.24 PMOS cross-coupled sense amplifier.

Fig. 6.25 shows the sense delays of both the PMOS cross-coupled amplifier and the double-end current-mirror amplifier as a function of the average current of the amplifier. The input voltages simulate the common data-line voltages, and the sense delay $t_d$ is defined as the delay time from the crossover point of the input voltages to the point when the output reaches a 1 V difference. The PMOS cross-coupled amplifier has less than half the delay of the conventional current-mirror sense amplifier. Moreover, this latch amplifier consumes less than one-fifth of the power of a current-mirror amplifier. The PMOS cross-coupled latch amplifier requires much more accurate timing to optimize the sensing delay [18]. This circuit also has a low-power property compared to the current-mirror amplifier since it has nearly full-swing outputs with positive feedback.

Figure 6.25 Sense delay versus average current for the PMOS cross-coupled and conventional current-mirror sense amplifiers (0.6 µm CMOS).
When the power supply is scaled to 3 V, the data-line voltage is near $V_{DD}$, so level shifting can be performed. Fig. 6.26 shows a two-stage sense amplifier used for a 3.3 V supply. The first stage is a cross-coupled NMOS amplifier which also performs level shifting of the common data-line voltage. In the second stage, a conventional sense amplifier is used which operates at the maximum gain point since the levels on $SA$ and $\overline{SA}$ are medium levels.
Fig. 6.27 shows another sense amplifier developed for low-voltage power supply [19]. This circuit is used when the bit-lines are close to $V_{DD}$, where the gain of a conventional current-mirror amplifier is poor. The circuit is composed of a level-shift circuit and a conventional current-mirror amplifier. The level shifter shifts the bit-line voltage to a medium voltage, 0.6 to 0.7 V (at 1 V power supply voltage), where the gain is maximum. Low-$V_T$ NMOS devices N1 and N2 are used to provide these medium levels. These devices are subject to the body effect.

Recently, current sense amplifiers have been proposed to overcome the gain reduction of voltage amplifiers at low power supply [7, 12]. They also reduce the power dissipation of the sensing operation compared to voltage sense amplifiers at the same delay. These circuits require very careful design.
6.1.9 Output Latch
In a low-power SRAM, the pulse technique for the word-line and sense amplifier is indispensable in order to reduce the DC current. In such a pulse mode, a data-latch circuit is required to store the data amplified by the sense amplifier from the memory cell for the data output circuitry. Fig. 6.28 shows an example of an output latch placed after the sense amplifier. The requirements of such an output latch are the following:

• The latch circuit must not delay the read access time. This requirement is met by connecting the latch to the data-bus lines in parallel. One input transmission gate, controlled by $\phi_I$, is used to enter the data into the latch. Another transmission gate, controlled by $\phi_O$, is used to put the data back onto the data-bus.

• The latched data must not be destroyed by noise entering the SRAM. Noise in an SRAM is generated and propagated by the following mechanism. On the system board, ground noise can enter the SRAM. When the peak level of the ground noise becomes large enough for the first gate of the address buffer to change the logic value of the address input, an ATD noise pulse is generated. This noise pulse could turn on the word-line and the sense amplifier for a short time, resulting in an unexpected signal on the data-bus. Therefore, the latched data could be destroyed if the input gate is ON. To avoid such a problem, two circuit techniques are included in the circuit of Fig. 6.28. The first one is the generation of $\phi_I$ only when the pulse width of the ATD is large enough compared to that of the noise. The other circuit technique is to place latch-protecting inverters [Fig. 6.28] in front of the output gates. The inverters prevent noise from entering the output gates.

• The new data must be quickly latched into the data-latch. The circuit of Fig. 6.28 can be optimized for fast operation.
6.1.10 Hierarchical Word-Line for Low-Power Memory
With increased memory size, the word-line delay and the column power increase. To solve this problem, a Divided Word-Line (DWL) structure was proposed [20]. The concept of DWL is shown in Fig. 6.29. The cell array and the word-line are divided into $n_B$ blocks (sub-arrays). If the SRAM has $n_C$ columns, each block has $n_C/n_B$ columns. The divided word-line of each block is activated by the main word-line and the corresponding block select signal. Consequently, only the memory cells connected to one divided word-line within a selected block are accessed in a cycle. Hence, the column current is reduced, since only the selected columns switch.

Figure 6.29 Divided Word-Line (DWL) concept [20].

Moreover, the word-line selection delay, which is the delay time from the address input to the divided word-line, is reduced. This delay is composed of the main word-line select delay and the divided word-line select delay. The main word-line selection delay is reduced compared to the conventional one, because the total capacitance of the connected transistors is reduced. In a conventional SRAM, the word-line has the gates of all the memory cells of a row connected to it. The main word-line delay increases as the number of blocks increases because the number of block select gates increases. On the other hand, the divided word-line delay decreases, since the number of connected cells is reduced as the number of blocks increases. Consequently, the word-line selection delay has a minimum for a certain number of blocks.
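The existence of a delay minimum can be illustrated with a toy model. Everything numeric here is hypothetical: the linear form of each delay term, the coefficients `t_fixed`, `t_per_block` and `t_per_cell`, and the 256-column array are assumptions chosen only to show the trade-off, not values from the text or from [20].

```python
def columns_per_block(n_columns, n_blocks):
    """Columns activated per access in a DWL array: n_C / n_B."""
    return n_columns // n_blocks


def word_line_delay(n_blocks, n_columns,
                    t_fixed=1.0, t_per_block=0.08, t_per_cell=0.02):
    """Toy delay model in arbitrary units (all coefficients are hypothetical).

    main word-line delay    grows with the number of block select gates;
    divided word-line delay shrinks with the number of cells per block.
    """
    main = t_fixed + t_per_block * n_blocks
    divided = t_per_cell * (n_columns / n_blocks)
    return main + divided


candidates = [1, 2, 4, 8, 16, 32]
best = min(candidates, key=lambda n: word_line_delay(n, 256))
print(best, columns_per_block(256, best))
```

With these made-up coefficients the minimum falls at eight blocks, where only 32 of the 256 columns switch per access. Real optima depend on the technology and layout.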
Fig. 6.30 shows the effect of the number of blocks in the DWL structure on the word-line select delay and the column power for a 64-Kb SRAM [10]. In this example, eight blocks can be chosen. The area penalty for this case is only 5% compared to the conventional memory. As an example, for a 1-Mb SRAM, the cell array is divided into 16 blocks and each block consists of 512 rows by 128 columns. A 9-bit address (A0...A8) is used to select a row within a block using a two-stage row decoder. Global block selection is done using a 4-bit address.

Figure 6.30 Word-line select delay and column power versus number of blocks.

The DWL structure has been widely used in high-density SRAMs for its low-power, high-speed characteristics. However, in high-density SRAMs with a capacity of more than 4 Mb, the number of blocks in the DWL structure will have to increase. Therefore, the capacitance of the global word-line increases, causing the delay and power to increase. To solve this problem, the concept of Hierarchical Word Decoding (HWD) was proposed in [21], as shown in Fig. 6.31. The word select line is divided into more than two levels. The number of levels (hierarchy) is determined by the total load capacitance of the word select line, so as to distribute it efficiently. Hence, the delay and the power are reduced. For 4-Mb, three levels of hierarchy have been used with 32 blocks, each block having 128 columns by 1024 rows. Fig. 6.32 compares the delay time and the total capacitance of the word decoding path for the optimized DWL and HWD structures of 256-Kb, 1-Mb, and 4-Mb SRAMs. For a 256-Kb SRAM there is no significant advantage of HWD over DWL. However, for high-density SRAMs the advantage of HWD, in terms of power and delay, becomes clear. The three-level scheme can be used efficiently for 16-Mb SRAMs.
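The array organizations quoted above can be checked with a few lines of arithmetic. This sketch takes the 1-Mb example as 16 blocks of 512 rows by 128 columns and the 4-Mb example as 32 blocks of 1024 rows by 128 columns; the helper name `organization` and the assumption that every cell is individually addressed (a by-one organization) are illustrative only.

```python
def address_bits(n):
    """Number of address bits needed to select one of n items (n a power of two)."""
    return (n - 1).bit_length()


def organization(blocks, rows, columns):
    """Capacity and address-field widths for a blocked memory array."""
    return {
        "capacity_bits": blocks * rows * columns,
        "row_address_bits": address_bits(rows),
        "block_address_bits": address_bits(blocks),
        "column_address_bits": address_bits(columns),
    }


sram_1mb = organization(blocks=16, rows=512, columns=128)
sram_4mb = organization(blocks=32, rows=1024, columns=128)
print(sram_1mb)
print(sram_4mb)
```

For the 1-Mb case the row field comes out at 9 bits and the block field at 4 bits, and the three fields together account for all 20 address bits of a 1-Mb by-one memory.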
6.1.11 Low-Voltage SRAM Operation and Circuitry
There are several applications which need a 1.2 V battery power supply. For such applications 1 V SRAMs are needed. At 1 V power supply, stable operation is targeted and it is very important that the noise is reduced. Moreover, the active and standby powers should be reduced to meet the requirements of battery operation. For 1 V power supply, a full CMOS memory cell has lower power dissipation in standby mode and greater immunity to transient noise and voltage variation than other cells. It can also operate at the lowest supply voltages. Although a full CMOS cell operates well at ultra-low voltage, its area is almost double that of a PMOS TFT cell. Hence it is not suitable for high-density memories (size > 4 Mb). When the full CMOS memory cell is operated at 1 V power supply, a typical cell ratio is 3 for stable operation. The SNM of this cell at 1 V can be almost the same as that of a poly-Si load memory cell at 5 V. When using the full CMOS cell, no boosting of the word-line is needed to write a high voltage level in the cell. However, the PMOS TFT cell requires a boosted voltage ($V_{ch} > V_{DD}$) on the word-line during the write cycle [19]. If the voltage of the word-line is raised only to $V_{DD}$ in the write cycle, the high node B of Fig. 6.33 is initially at $V_{DD} - V_T$, where $V_T$ is the threshold voltage of the access device subject to the body effect. This low level ($V_{DD} - V_T$) of node B cannot charge up to $V_{DD}$ because of the poor drivability of the PMOS TFT device. When the boosted word-line technique is applied to the PMOS TFT cell during a write cycle, a problem can arise. The unselected cells connected to the boosted common word-line suffer from an instability problem because a large current flows through the low node of the cell. This large current is due to the high voltage on the access transistor. Consequently, this technique is not suitable for 1 V operation.
Figure 6.34 Two-step technique for 1 V operation [19].

Figure 6.35 (a) TSW read/write circuitry [19].
A Two-Step Word (TSW) voltage technique has been proposed by Ishibashi et al. [19] to solve the cited problem. Fig. 6.34 shows the block diagram of the proposed memory. The boosted-level generator (presented in Section 6.2.11) generates a voltage $V_{ch}$ = 1.5 V for $V_{DD}$ = 1 V. The word-line voltage has two steps: one is $V_{DD}$ and the other is $V_{ch}$. The circuitry for the TSW method is shown in Fig. 6.35(a). When the word-driver input signal goes to zero, the signal WL is raised to $V_{DD}$. Then, when $\phi_{ch}$ is asserted with a high level equal to $V_{ch}$, the transistor P1 turns ON and the WL level is increased to $V_{ch}$. In this case, the low-threshold-voltage NMOS device turns OFF and the word-driver inverter is isolated to reduce any leakage current. Fig. 6.35(b) shows the voltage waveforms for the TSW circuitry in read/write modes. During the write cycle, the high node A is first charged to a low voltage, then raised to a higher voltage. The bit-lines are initially floating, then precharged at the end of the write cycle. In the next read cycle, the bit-lines are floating. Before the word-line voltage rises to the boosted level, the cell discharges BL through the low node B. Thus, when the word-line has risen to the boosted level, current does not flow in the cell and node B stays at a low voltage level. Note that this technique requires multi-$V_T$ CMOS devices and causes a delay in writing because the bit-lines are discharged before writing.

However, the low-voltage SRAMs discussed above require a relatively high threshold voltage, $V_T \geq 0.5$ V. Thus, their speed is quite slow. As an example, a 256-Kb SRAM with full CMOS memory cells attained a 3 µs access time at 1 V power supply using 0.8 µm CMOS technology [22]. The active power at 0.1 MHz is 0.2 mW and the standby power is 5 nW. Another example is a 1-Mb SRAM with full CMOS memory cells which achieves a 200 ns access time at 1 V power supply using 0.5 µm CMOS technology [23]. The active power at 1 MHz is 0.1 mW and the standby power is 10 nW. Note that if the threshold voltage is too low for ultra-low-voltage applications, all the circuits composing the SRAM will suffer from subthreshold leakage current. Thus, the retention current increases drastically, causing a serious problem for low-power applications. Moreover, the temperature effect and the threshold voltage variation enhance this current. So far, no practical solution has been proposed.
6.2 DYNAMIC RAM
The first dynamic RAM (DRAM) was introduced in 1970 with a capacity of 1 Kb. Since then, the density has quadrupled every three years (one generation). Recently, some experimental 256-Mb DRAMs were reported [24, 25, 26]. At present, low-voltage 16-Mb DRAMs are in high-volume production. The development of these higher densities has made DRAMs the cheapest per bit compared with other types of memories. They are widely used as the main memory of mainframes, PCs, and workstations. The access time has decreased from a few hundred ns for 4-Kb DRAMs to less than 50 ns for 256-Mb. The power dissipation has also been reduced by an order of magnitude from the 4-Kb capacity to the 256-Mb capacity, reaching 50 mW at 1.5 V power supply. The area of the memory cell has been reduced from more than 100 µm² for the 64-Kb DRAM to 1.28 µm² for the 64-Mb DRAM.

In addition to the trend toward higher-density standard DRAMs, there are two other trends: Low-Power (LP) DRAMs and high-speed DRAMs. The high-speed DRAMs sacrifice the retention current as well as density for faster access time. Low-voltage low-power DRAMs are becoming important, particularly for battery operation. LP DRAMs extend the time of battery operation as well as battery back-up operation. The active current of LP DRAMs has been lowered. The data-retention current has also been reduced, but it is still about one order of magnitude higher than that of SRAMs (this comparison is made for 1-Mb memories).

The 5 V power supply standard has been used externally for many DRAM generations, from 64-Kb to 16-Mb. This was followed by the 64-Mb DRAM powered with an external 3.3 V supply, not only to reduce the power dissipation but also to ensure reliability. The gate oxide reliability limits the maximum voltage, which is related to the boosted voltage inside the chip. Regarding the internal voltage, 5 V can be used up to a maximum DRAM capacity of 4 Mb. At the 16-Mb generation, the internal voltage is 3.3 V while an external 5 V is maintained with an on-chip voltage down converter [see Section 6.3]. However, the 3.3 V external power supply will dominate. Recently, activities to realize 1.5 V battery-operated DRAMs are accelerating the trend toward low-voltage operation [27, 28, 29]. Fig. 6.36 shows the trend of DRAM supply [28]. In battery operation, the chip must be operated on a variety of batteries with various supply voltages for a long term and under supply fluctuations.

Figure 6.36 Trends of DRAM supply [28].
6.2.1 Basics of a DRAM
In general, the pins of a DRAM are:

• Address, which is separated in time into two fields. These fields are the row and column addresses.
• Row Address Strobe ($\overline{RAS}$). The row address is clocked by this signal.
• Column Address Strobe ($\overline{CAS}$). The column address on the multiplexed pins is clocked by this signal.
• Write Enable ($\overline{WE}$).
• Input/output data pins.
• External power supply pins.

It is clear that the multiplexed address penalizes the access delay, so for fast DRAMs separate address input pins can be used. The multiplexing permits the reduction of the pin count and the cost of packaging. An example of DRAM timing, using address multiplexing during read mode, is shown in Fig. 6.37. Some important times are shown, such as the access time from row, $t_{RAC}$, the row address strobe cycle time (or cycle time), $t_{RC}$, and the row address strobe low-state time, $t_{RAS}$.

Fig. 6.38 shows a general 4-Mb DRAM architecture. It uses almost the same circuit techniques as an SRAM except for the memory array. Some additional circuits are needed, such as a Back-Bias Generator (BBG), a Half-Voltage Generator (HVG), an optional Voltage-Down Converter (VDC), a Reference Voltage Generator (RVG), and a boosted voltage generator circuit. The substrate back-bias voltage is indispensable for stable operation of the DRAM array. The half-voltage generator permits generation of the precharge level for the bit-lines at half-$V_{DD}$, as explained in the following sections. The reference voltage generator is needed for the VDC. The boosted voltage generator uses a charge-pump circuit and permits overdriving of the word-line WL to a voltage higher than $V_{DD}$. More details on these circuits composing the DRAM are given in the following sections.
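The pin-count saving from address multiplexing can be sketched as follows. The sketch assumes a 4-Mb by-one organization with an even 11-bit row / 11-bit column split; that split, the class `MultiplexedAddressLatch`, and its method names are hypothetical and serve only to show the row field being latched on the RAS edge and the column field on the CAS edge.

```python
def split_address(address, row_bits, col_bits):
    """Split a full cell address into (row, column) fields."""
    row = (address // 2 ** col_bits) % 2 ** row_bits
    col = address % 2 ** col_bits
    return row, col


class MultiplexedAddressLatch:
    """Latches the shared address pins on the falling edges of RAS and CAS."""

    def __init__(self):
        self.row = None
        self.col = None

    def ras_falls(self, pins):
        self.row = pins      # row address is clocked by RAS

    def cas_falls(self, pins):
        self.col = pins      # column address is clocked by CAS


ROW_BITS, COL_BITS = 11, 11            # assumed split for 2**22 cells
address = 1365 * 2 ** COL_BITS + 682   # arbitrary example address
row, col = split_address(address, ROW_BITS, COL_BITS)

latch = MultiplexedAddressLatch()
latch.ras_falls(row)
latch.cas_falls(col)
pins_multiplexed = max(ROW_BITS, COL_BITS)
pins_separate = ROW_BITS + COL_BITS
print(latch.row, latch.col, pins_multiplexed, pins_separate)
```

Under these assumptions 11 shared address pins replace 22 dedicated ones, at the cost of presenting the address in two steps.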
6.2.2 DRAM Memory Cell
CMOS DRAMs with three-transistor and four-transistor cells were used in the 1- and 4-Kb generations. The one-transistor (1T) cell offers a smaller chip size and low cost. These advantages justify the process complexity needed to fabricate the 1T cell, particularly its capacitor. A schematic of a 1T DRAM cell is illustrated in Fig. 6.39(a). The charge is stored in the capacitor $C_S$. To prevent loss of the stored information, the capacitor must be refreshed within a specific time with special circuitry. The bit-line has a capacitance $C_{BL}$, including the parasitic load of the connected circuits. Typical values for the storage and the bit-line capacitors are 30 fF and 250 fF, respectively. The ratio $R = C_{BL}/C_S$ is very important for the sensing operation.
During the read operation (WL is selected) the bit-line voltage changes by

$$\Delta V_{BL} = \frac{V_{MC} - V_{BL}}{1 + R} \qquad (6.9)$$

where $(V_{MC} - V_{BL})$ is the difference between the memory cell voltage and the bit-line voltage before the selection of the cell. A typical value of the difference is $V_{DD}/2$. Hence, we have for the bit-line sense signal

$$V_S = \frac{V_{DD}}{2(1 + R)} \qquad (6.10)$$

For a 3.3 V supply voltage, and using a ratio $R = 8$ for a 16-Mb DRAM, the sense signal is $V_S \approx 180$ mV. This small voltage change of the bit-line requires sensing circuits. For low-voltage operation, $V_S$ decreases, thus a low ratio $R$ is required. This is possible by reducing $C_{BL}$ and increasing $C_S$.

$C_S$ was implemented using a simple planar-type capacitor, as shown in the structure of Fig. 6.39(b). This structure was used in DRAMs with a capacity of up to 1 Mb. With the increased density, many three-dimensional approaches were used for DRAMs with a capacity higher than 1 Mb. One approach is to stack the capacitor over the access transistor (STC cell). Another approach is to use a trench capacitor. For more details on advanced cell structures the reader can consult [30, 31].
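The sense-signal arithmetic can be reproduced numerically. This sketch assumes the usual charge-sharing relation between the storage capacitor and the bit-line, with a cell-to-bit-line voltage difference of half the supply; the function name `sense_signal` is hypothetical, and the 1.5 V case is added only to show the trend at lower supply.

```python
def sense_signal(vdd, c_bl, c_s):
    """Bit-line sense signal for a VDD/2 difference, with R = C_BL / C_S."""
    r = c_bl / c_s
    return vdd / (2 * (1 + r))


v_16mb = sense_signal(3.3, c_bl=8.0, c_s=1.0)           # R = 8, as in the text
v_typical = sense_signal(3.3, c_bl=250e-15, c_s=30e-15)  # typical 250 fF / 30 fF
v_low = sense_signal(1.5, c_bl=8.0, c_s=1.0)            # same R at a 1.5 V supply
print(round(v_16mb * 1e3), round(v_typical * 1e3), round(v_low * 1e3))
```

The R = 8 case gives roughly 180 mV at 3.3 V, and the same ratio at 1.5 V leaves well under 100 mV, which is why a lower ratio is needed for low-voltage operation.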
The signal charge ($Q_{sig} = C_S \Delta V_S$) transferred to the bit-line during a read operation should have enough margin against noise. The sources of noise are the following:

• bit-line noise, which is caused by capacitive couplings and other sources;
• leakage charge, which is mainly due to the leakage in the junction of the NMOS transistor of a 1T memory cell; and
• α-particle-induced soft errors.

In the early DRAMs, the plate of the capacitor was grounded to reduce the noise injection from the $V_{DD}$ power supply. However, for multi-Mb DRAMs, a $V_{DD}/2$ bias for the cell plate was used. This scheme has several advantages, such as the reduction of the stress on the thinner oxide of the storage capacitor and the reduction of supply voltage noise. Many 1-Mb DRAMs have used this cell biasing scheme.
For Gb DRAM cell design with reduced $V_{DD}$, the ratio $R$ should be reduced. This is possible by reducing the bit-line capacitance, $C_{BL}$, and increasing the storage capacitance, $C_S$. On the other hand, the area occupied by $C_S$ should be reduced to increase the chip capacity. One solution for reducing the area of $C_S$ is the use of a capacitor insulator with extremely high permittivity ε, such as a ferroelectric material like BaSrTiO3 film. Consequently, a simple planar-type capacitor can be used in that case.
6.2.3 Read/Write Circuitry
Fig. 6.40 illustrates the different circuits for the read, write, precharge, and equalization functions. The read operation is performed as follows. Initially both bit-lines ($BL$ and $\overline{BL}$) are precharged to $V_{DD}/2$ and equalized before the data reading operation. This half-$V_{DD}$ precharge technique permits the reduction of the active power dissipation, as discussed in Section 6.2.9. The signal WL is selected by the row decoder. The high level of the word-line voltage has to be greater than $V_{DD}$ to increase the stored charge in the memory cell. The selected memory cell is connected to one bit-line. Then $\Delta V_{BL}$ (100 to 200 mV) appears between the bit-lines immediately after the word-line rises. It is then amplified by the latch-type CMOS sense amplifier, which is connected to both bit-lines. After the sensing and restoring operations, the voltage levels of the bit-lines have a full swing. The bit-line differential voltage signal is transferred to the differential output-lines ($O$ and $\overline{O}$) through a read circuit. The signal YR is selected at almost the same time as WL. The parasitic capacitance of the output-line is large (a typical value is 2 pF for a 4-Mb DRAM), and the readout circuit would need a long time to amplify the output-line signal. A main sense amplifier is used to read the output-lines, then the data is selected among several main SAs connected to different sub-arrays. Finally it is transferred to the output buffer.

The DRAM cell readout mechanism is destructive, and hence the same data must be rewritten to the cell on every read access. Consequently, on each bit-line pair, a CMOS amplifier is needed to amplify and restore the level. This mechanism is not needed in SRAMs since the read operation is non-destructive.

In the write mode, the YW signal is selected by a column decoder, as shown in Fig. 6.40. In this case, the write control signal is activated. The selected bit-lines are connected to a pair of write-lines $W$ and $\overline{W}$, and the data are transferred to the memory cell when WL goes HIGH.
6.2.4 Low-Power Techniques
Fig. 6.38 can be used to identify the different sources of power dissipation in a DRAM. For simplicity we assume that the internal supply voltage is the same as the external one. The total power dissipated is the sum of two components: the active power and the data-retention power. The active power is the sum of the power dissipated by the following components:

• The decoders (row and column);
• The memory array. This is the dominant one. If m memory cells are connected to the word-line, the active power of the memory array is given by

$$P_{act,array} = m \times P_{actBL} \qquad (6.11)$$

where $P_{actBL}$ is the power dissipated per bit-line in active mode when the m cells are selected. It is given by

$$P_{actBL} = C_{BL} \, \Delta V_{BL} \, V_{DD} \, f \qquad (6.12)$$

• The sense amplifier;
• Other circuits such as the refresh circuit, the substrate back-bias generator, the boosted-level generator, the voltage reference circuit, and the half-$V_{DD}$ generator. These circuits also dissipate a DC current;
• The rest of the periphery, such as the main sense amplifier, input/output buffers, write circuitry, etc.

Note that the power dissipated by the pads is not included.
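A rough feel for the array term can be obtained numerically. This sketch assumes the array power takes the form m × C_BL × ΔV_BL × V_DD × f, with the bit-line swing equal to half the supply as in a half-VDD precharge scheme. The 250 fF bit-line capacitance is the typical value quoted earlier; the 1024 cells per word-line and the 10 MHz cycle frequency are hypothetical inputs.

```python
def array_active_power(m, c_bl, delta_v_bl, vdd, f):
    """Active power of the memory array: m bit-lines, each C_BL * dV_BL * VDD * f."""
    return m * c_bl * delta_v_bl * vdd * f


M_CELLS = 1024     # hypothetical number of cells on the selected word-line
C_BL = 250e-15     # typical bit-line capacitance quoted in the text, in farads
FREQ = 10e6        # hypothetical cycle frequency, in hertz

p_full = array_active_power(M_CELLS, C_BL, 3.3 / 2, 3.3, FREQ)
p_half = array_active_power(M_CELLS, C_BL, 1.65 / 2, 1.65, FREQ)
print(round(p_full * 1e3, 3), "mW at 3.3 V")
print(round(p_half * 1e3, 3), "mW at 1.65 V")
```

Because both the swing and the supply scale with the supply voltage in this sketch, halving the supply cuts the array power to a quarter, and reducing m or the bit-line capacitance reduces it linearly.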
To reduce this active power, many techniques can be used. They are summarized as follows:

• Reducing all capacitances, particularly those of the bit-lines and word-lines

[...]

... the output voltage $V_{ref}$ is given by

$$V_{ref} = \Delta V_T \, \frac{R_L}{R_R} \qquad (6.25)$$
This shows that the reference voltage can be adjusted to any voltage. Moreover, with a trimming technique $V_{ref}$ can be adjusted against the effect of process variation ($\Delta V_T$ variation). The output voltage is sampled on the hold capacitor $C_H$. When $\phi_1$ is low, the circuit is in hold mode. Clock $\phi_2$ is delayed relative to clock $\phi_1$ to minimize fluctuation of the output voltage. These clocks are generated from the self-refresh clock circuit in a DRAM. The circuit consumes a DC current only when $\phi_1$ is applied. The average current consumed by this circuit is

$$I_{av} = 3I \times \gamma = 3\,(\Delta V_T / R_R)\,\gamma \qquad (6.26)$$

where $\gamma$ is the duty ratio of $\phi_1$. The current of this circuit can be reduced to a low level in the sub-µA range by controlling the duty ratio. For example, to generate a reference voltage of 2.4 V from an external power supply voltage of 3.3 V, $R_R$ and $R_L$ are 9 kΩ and 72 kΩ, respectively. $\Delta V_T$ has a typical value of 0.3 V. The total DC current is 100 µA. So with a duty ratio lower than 1/100, the average current can be reduced below 1 µA. It can be easily shown that this circuit has a low sensitivity to power supply voltage and temperature variations.
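The duty-ratio argument can be checked with the numbers quoted above. This sketch assumes the average current has the form 3 × (ΔV_T / R) × γ with the 9 kΩ resistor setting the branch current, and that the reference voltage is ΔV_T multiplied by a resistor ratio; the function names are hypothetical.

```python
def average_current(delta_vt, r_current, duty, branches=3):
    """Average supply current of the sampled reference generator, in amperes."""
    return branches * (delta_vt / r_current) * duty


def required_resistor_ratio(v_ref, delta_vt):
    """Resistor ratio needed if V_ref = delta_V_T * ratio."""
    return v_ref / delta_vt


i_dc = average_current(0.3, 9e3, duty=1.0)          # always on: total DC current
i_sampled = average_current(0.3, 9e3, duty=1 / 200)  # duty ratio below 1/100
ratio = required_resistor_ratio(2.4, 0.3)
print(round(i_dc * 1e6, 3), "uA,", round(i_sampled * 1e6, 3), "uA, ratio", round(ratio, 3))
```

The always-on current evaluates to 100 µA, and any duty ratio below 1/100 brings the average under 1 µA, in line with the text. A 2.4 V reference from a 0.3 V threshold difference calls for a resistor ratio of 8.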
6.4 CHAPTER SUMMARY
Low-power architectures and circuit techniques for SRAMs, DRAMs and VDCs were reviewed. The obvious technique to reduce the power dissipation is voltage scaling. The reduction of the power supply voltage to the 1 V and sub-1 V range requires new circuit innovations and breakthroughs, particularly when low threshold voltage devices are used. It was shown that not only power supply voltage scaling contributes to the reduction of power consumption, but also the reduction of capacitances and DC currents using sophisticated techniques. Many of the techniques presented for memories can be useful in other applications such as ASICs, DSPs, etc. Design issues for stable operation of a VDC and low-standby-current techniques were investigated.
REFERENCES
[I] 8. Tram ct al., "An 8 - m 1-Mb ECL BiCMOS SRAM ~ t a hConfigurabIe Memory Array Size," International Solid-state Circuits Cod. Tech. Dig., pp. 36-37, Febzuluy 1989.
[2] M. Matsni et al., "An 8-ns I-Mb ECL BiCMOS SRAM," International Solid-State Circuits Conf. Tech.Dig.,pp. 38-39, February 1989. [3] Y.Maki et al., 'A 6.5-nr 1 Mb BiCMOS ECL SRAM," International SolidState Circuits Conf. Tech. Dig., pp. 136-137, February 1990. [4] M. Takada et al., "A 5-11s 1-Mb ECL BiCMOS SRAM," BEE Journal of Solid State Circuits, uol. 25, no. 5, pp. 1057-1062, October 1990. 151 A. Ohba et al.. "A 7--ns I-Mb BiCMOS ECL SRAM with Program-Free Redundancy," in Symp. VLSI Circuits C o d Tech. Dig., pp. 41-42, May 1990. [6] Y. Okajimact al., "A 7-nr 4-Mb BiCMOS SRAM with a Parallel Testing Circuit," International Solid-State Circuits Conf. Tech. Dig., pp. 54-55, Febrosry 1991. [7] K. Sas&
ct d.,"A 7-ns 140-mW 1-Mb CMOS SRAM with Current Sense Amplifier," IEEE Journal of Solid.State Circuits, vol. 27, no. 11, pp. 15111518, November 1992.
[8] T. Ootani et al., "A 4-Mb CMOS SRAM with a PMOS Thin-Film Transistor Load Cell," IEEE Journal of Solid-State Circuits, "01. 25, no. 5, pp. 1082-1092, October 1990. [9] S. Mur&kami et al.. "A ZI-mW 4 M b CMOS SRAM for Battery Operetion,' lEEE Journal ofSolid-State Circuits, vol. 26, no. 11, pp. 1563-1570, November 1991.
[lo] K. Saraki et al., "16-Mb CMOY SRAM with a 2 . 3 - p ~Single-Bit-Line ~~ Memory Cell," IEEE Journal of Solid-state Circuits, val. 28, no. 11, pp. 1125-1130, November 1993.
404
LOW-POWER DIGITALVLSI DESIGN
[Ill M. Metrumiya et al., 'A 15-ns 16-Mb CMOS SRAM with Interdigitated Bit-Lme Architecture," IEEE Journal of Solid-State Circuits, ual. 27, no. 11, pp. 1497.1503, November 1992. [I21 K. Sen0 et al.. " A 9-ns 16-Mb CMOS SRAM with OfEset-Compensated Cnrrent Sense Amplifier," IEEE Journal of Solid-State Cirenitr, vol. 28, no. 11, pp. 1119-1124,November 1993.
[I31 E. Seevinck, F. J. List, and J. Lohrtroh, Static-Noise Marsin Analysis of MOS SRAM C e b , " IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5 , pp. 748-754, Oetobei 1987.
[I41 H. Kato et al., "Consideration of Poly-Si Loaded Cell Capacity Limits for Low-Power and High-speed," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 683-685. April 1992. [I51 K. Saraki et al.,"A 23-ns 4-Mb CMOS SHAM with 0.2-pA Standby Current," IEEE Journal of Solid-state Circuits, vol. 25, no. 5, pp. 1075-1081, October 1990. [I61 K. Ishibarhi, T. Yamanaka, and K. Shimohigashi, "An a-Immune.2-V Supply Voltage SRAM using a Polysilicon PMOS Load Cell," IEEE Journal of Solid-state Circuits, vol. 25, no. 1, pp. 55-60, February 1990.
[I?] K. Saraki et al., "A 15-ns I-Mbit CMOS SRAM," IEEE Journal of SolidState Circuits, vol. 23, no. 5 , pp. 1067-1072, October 1988. [I81 K. S s a k i e l al., "A 9-ns I-Mbit CMOS SRAM," IEEE Jonrnal of SolidState Circuits, "01. 24, to. 5, pp. 1219-1225, October 1989.
[I91 K. Ishibarhi, K. Takasugi, T. Yamanaka, T. Hashimoto, K. Sasaki. " A I-V TFT-Losd SRAM using a Two-step Word-Voltage Method," IEEE Journal of Solid-state Circuits, vol. 27, no. 11, pp. 1519-1524, Msy 1992. [20] M. Yoshimito, K. An-, H. Shioohara,T. Yoshihara, H. Takagi, S. Nagao, S. Kayano. and T. Nakano, "A Divided Word-Line Structure in the Static RAM and its Applieation to a 64K Fall CMOS RAM," IEEE Journal of Solid-State c i r c u i t s , vol. SC-18, no. 5, pp. 479-485, October 1983. [21] T. Hirose, H. Kuriyama, S. Mnmkami, K. Yuzuriha, T. Mukai, K. Tsutsumi, Y. Nishimura, Y . Kohno, and K. Anami, "A 20-ns 4 M b CMOS
SRAM with Eieraichical Word Decoding Architecture," IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1068-1074, October 1990.
REFERENCES
405
[22] A. Sekiyama, T. Seki, S. Nagai, A. Iwase, N. Surilti, and M. Hayaraka, "A I-V Operating 256-Kb FaLI-CMOS SRAM," IEEE Journal of Solid-state Circuits, vol. 21, no. 5, pp. 776-782, May 1992. [23] T. Yabe, et al.. "High-Speed and Low-Standby-Power Cieuit Design of 1 to 5 V Operating 1 Mb Full CMOS SRAM." Symposium on VLSI Circuits Tech. Dig., pp, 107-108, May 1993. [24] G. Kitrukawa, et 81.. "256-Mb DRAM Circuit Technologies for File Applications," IEEE Journal of Solid-State Circuits, "01. 28, no. 11, pp. 11051113, November 1993. [25] T. Hasegawa, et al., "An Experimental DRAM with a NAND-Structnred Cell," IEEE Journal ofSolid-State Circuits, val. 28, no. 11, pp. 1099-1104, November 1993.
1261 T. Sugibayashi, et al., "A 30-nn 256-Mb DRAM with a Multidivided Array Structure," IEEE Journal of Solid-State Circuits, "01. 28, no. 11, pp. 10921099, November 1993. [27] M. A&, J. Etoh, K. Itoh, S-I. Kimura, and Y. Kawamota, "A 1.5-V DRAM for Battery-Bwed Applications," IEEE Journal of Solid-State Circuits, "01. 24, no. 6, pp. 1206-1212, October 1989.
[28] Y. Nakagome, et d.,-An Experimental 1.5-V 64-Mb DRAM," IEEE Journal of Solid-State Circuits, vol. 26, no. 4, pp. 465-471, April 1991. [29] H. Yamauehi, et al., "A Circuit Technology for High-speed BatteryOpersted 16-Mb CMOS DRAMS,~IEEE Journal of Solid-State Circuits, "01. 28, no. 11, pp. 10841091, November 1993.
[30] N. C. C. Lu, " Advanced Cell Structnres for Dynamic RAMS," IEEE Circuits m d Devices Magashe, no. 1, pp. 21-36, Jenuary 1989. [31] M. Takadn, "DRAM Technology for Giga-bit Age," International Conf. Solid State Devices and Materials, Tech. Dip., pp. 874876, 1993. [32] L. Itoh, et d.,"An Experimental 1-Mb DRAM with on Chip Voltage Limiter," in International Solid-State Circuits Cod., Tech. Dig., pp. 282283, 1984. [33] N. C-C. Lu, and H. H. Chao, '' Half-Voo Bit-Line Sensing Scheme in CMOS DRAMS," IEEE Journal of Solid-State Circuits, "01. SC-19, no. 5, pp. 451-454, August 1984.
LOW-POWER DIGITALVLSI DESIGN
406
(341 B. Kawamoto, T. Shinods, Y. Yamapehi, S. Shimiuu, K.Ohishi, N. Tanimum, T. YasUi, 'A 288K CMOS Pseudostatic RAM," IEEE Journal of Solid-state Circuits, vol. SC-19, no. 5 , pp. 619-625, October 1984. 1.351
Y.Trikihwa et d.,"An Emcient Back-Bias Gcnezstor 6 t h Xybzid P u m p ing Circuit for 1.5 V DRAMs," in Symposium of VLSI Circuits, Tech. Dig., pp. 85-86, May 1993.
(361 Y. KQnishi, ct al., "A 3&ns 4-Mb DRAM with a Battery-Backup (BBU) Mode," IEEE Journal ofsolid-state Circuits, vol. 25, no. 5 , pp. 1112-1117. October 1990.
[37] T. Ooirhi, et al., "A Wen-Synchronized Senring/Equalizing Method for S u b 1 V Operating Advanced DRAMs," in Symposium on VLSI Circuits. Tech. Dig., pp. 81-82, May 1993.
1381 M. Asakura, et al., "An Experimental 256-Mb DRAM with Boosted SenseGround Scheme," IEEE Journal of Solid-state Circuits, d.29. no. 11, pp. 1303-1309, November 1994. 1391 T. Sskata et al., "Subthreshold-Current Reduction Circuits for MultiGigabit DRAMS," in Symposium on VLSl Circuits, Tech. Dig.. pp. 45-46, May 1993. [40] T. hrruyama, et al.. "A New On-Chip Voltage Converter for Submicrome ter High-Density DRAMs," IEEE Journal of Solid-state Circnits, vol. 22, no. 3, pp. 437-441, June 1987. 141) M. T s h d a . e l al., -A 4-Mb DRAM with Aalf Internal Voltage Bit-Cine Precharge," IEEE Journal ofSolid-State Circuits, vol. 21, no. 5 , pp. 612617. October 1986. 1.121 M. Hiroguchi, e l
aL, "Dual-Operation-Vdtage Scheme for B S i g l e 5-V. 16-Mb DRAM," IEEE Journal of Solid-State Circuits, vol. 23, no. 5. pp. 1128-1132, Oetober 1988.
1431 G. Kitsukawe, et al., "A I-Mb BiCMOS DRAM Using TemperatureCompensstion Circuit Techniques," IEEE Journal of Solid-State Circuits, "01. 24, no. 3, pp. 597-602. Jnnc 1989. 144) M. Boriguchi, et al., "A Tunable CMOS-DRAM Voltage Limiter with Stabilised Feedback Amplifier," IEEE Journal of Solid-State Circuits, YO\. 25. no. 5. pp. 1129-1135, October 1990.
REFERENCES
407
[45] M. Roriguchi, et al., "Dual-Regulator Dual-Decoding-Trimmer DRAM Voltage Limiter far Brun-in Test," IEEE Journal of Solid-State Circuits, d.26, no. 11, pp. 15441549, November 1991. and H. Topshima, " A Voltage Doan Converter [46] K. Ishibashi, K. S-ki, with Submicroampere Standby Corrent for Low-Power Static RAMS," IEEE Journal of Solid-State Circuits, "01. 27, no. 6, pp. 920-926, June 1992.
[47] P. E. Anen, and D. R. Rolberg, "CMOS Analog Circuit Design," Holt, Rinehart and Winston Publisher, 1987. [48]
P. R. Gray, and R. G. Meyer, "Analysis and Design of Analog Integrated Cteuit," 2nd Edition Wiley Publisher, 1984.
[49] R. A. Blauschild et al., " A New NMOS Temperature Stable Voltage Reference," IEEE Journal of Solid-State Cicuitr. vol. SC-13, pp. 767-774, December 1978.
Y. Nsksgome, J. Etoh, E. Ymaeki, M. Ao?4 and K. Miyamwa, *Sub-l-prn Dynamic Reference Voltage Generator for BatteryOperated DRAMS," in Symp. VLSI Circuits, T e d . Dig., pp. 87-88, May
[60] H. &aka,
1993.
7 VLSI CMOS SUBSYSTEM DESIGN
In this chapter, we study the application of the circuit techniques developed through Chapter 4 in the implementation of CMOS building blocks such as adders, multipliers, ALUs, data-paths, regular structures, etc. The power dissipation constraint is also included through the several options presented for each circuit. The use of a Phase Locked Loop (PLL) in high-speed CMOS systems for deskewing the internal clock is also examined. Low-power issues of the circuits presented are also discussed.
7.1 PARALLEL ADDERS
Parallel adders are the most important elements used in arithmetic operations of microprocessors, DSPs, etc. As in any logic design they are constrained by parameters such as speed, area, and power dissipation. The adder cell is also an element of multipliers, dividers, multiplier-accumulators (MACs), etc. Among the various adder implementations used in many designs, we can cite the following classes:
• Ripple Carry Adders (RCA);
• Carry Look-Ahead Adders (CLA);
• Carry Select Adders (CS); and
• Conditional Sum Adders (CSA).
This section is devoted to describing all these adder classes.
7.1.1 Ripple Carry Adders
In Chapter 4, a description of the functionality of an adder cell was presented. In an n-bit adder, a propagation of the carry always occurs. This propagation limits the speed of the adder. The simplest way to construct an n-bit adder is to cascade n 1-bit adders as shown in Fig. 7.1. This adder is called a Ripple Carry Adder (RCA). Because the carry ripples through the n stages, the sum of the nth bit cannot be performed until the carry $C_{n-1}$ is evaluated. The delay of an n-bit addition is given by
$$t_{add} = (n - 1) t_c + t_s \tag{7.1}$$
where $t_c$ is the carry delay and $t_s$ is the sum delay. Since the carry propagation path is a critical stage for the delay, the full-adder cell should be optimized. The sum and carry out are given by
$$S = A \oplus B \oplus C_{in} \tag{7.2}$$
$$C_{out} = A \cdot B + (A + B) \cdot C_{in} \tag{7.3}$$
The schematic of Fig. 7.2 can be generated to efficiently implement the adder cell. Compared to the conventional CMOS full-adder implementation, there is no inverter stage. Therefore, the carry delay is reduced. To optimize the cell, the transistors in the carry path can be sized up [see Fig. 7.2]. The other devices can be kept small to reduce the load on the carry and the power dissipation. The transistors driven by the carry-in, $C_{in}$, are placed close to the output. This will reduce the body effect, since the carry signal is the
latest one in an adder chain. The schematic of Fig. 7.2 is symmetrical and leads to a better layout and a small area. Since the outputs are complemented, the configuration of Fig. 7.3 can be used to implement an RCA circuit. In this case, many cells use inverted inputs. Note that an n-bit RCA circuit is subject to the glitching problem. Fig. 7.4 shows a static simulation of a 4-bit adder, with the inputs $A_i$ set to zero (0), and the inputs $B_i$ and $C_{in}$ rising from 0 to 1. The outputs $S_i$ should stay at 0; however, due to the delay of the carry signal through the chain of full-adders, the outputs exhibit spurious transitions (glitching). These dynamic transitions dissipate extra power and can represent an important portion of the total power. With careful design this glitching problem can be minimized. One advantage of the RCA is its low-power characteristic. However, its speed is very limited, particularly when the adder is wide.
Another efficient full-adder cell is based on Transmission Gates (TGs). Fig. 7.5 shows an optimized version of the full-adder cell using TGs already discussed in Chapter 4. The carry signal propagates only through one TG. Hence, an n-bit RCA would be faster and more compact than the conventional one. Fig. 7.6 shows the construction of an n-bit adder. Practically, an inverter is added every four stages to reduce the degradation of the carry signal due to the distributed RC effect. When the carry signal is inverted after four 1-bit stages, complementary carry path adders are used for the next 4-bit stages. This adder structure is sometimes called a Manchester adder. This circuit is faster than the RCA and may have lower power dissipation.
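The glitching scenario described for the 4-bit RCA (all $A_i$ at 0, with $B_i$ and $C_{in}$ rising from 0 to 1) can be reproduced with a small unit-delay model. This is a sketch under a stated assumption, not a circuit simulation: every full-adder is taken to update its sum and carry outputs one time step after its inputs change, and the function names are hypothetical.

```python
def simulate_rca_sums(n=4):
    """Unit-delay model of an n-bit ripple carry adder.

    Inputs: A = 0...0, while B and C_in have just switched from 0 to 1.
    Returns one waveform (list of samples) per sum output, LSB first.
    """
    a = [0] * n
    b = [1] * n
    # carry[0] is C_in (already 1); internal carries still hold their old value 0.
    carry = [1] + [0] * n
    waves = [[0] for _ in range(n)]  # all sums were 0 before the input change
    for _ in range(n + 1):
        sums = [a[i] ^ b[i] ^ carry[i] for i in range(n)]
        next_carry = [carry[0]] + [
            (a[i] & b[i]) | ((a[i] | b[i]) & carry[i]) for i in range(n)
        ]
        carry = next_carry
        for i in range(n):
            waves[i].append(sums[i])
    return waves


def transitions(wave):
    """Number of level changes in a waveform."""
    return sum(1 for u, v in zip(wave, wave[1:]) if u != v)


waves = simulate_rca_sums(4)
for i, wave in enumerate(waves):
    print(f"S{i}: {wave}  transitions={transitions(wave)}")
```

In this model every sum output ends at 0, as the text expects, but each output above the least significant bit rises and falls once while it waits for the carry to arrive. Those extra transitions are the wasted dynamic power.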
7.1.2 Carry Look-Ahead Adders
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel. The carry of each bit is generated from the propagate and the generate signals ($P_i$, $G_i$) as well as the input carry ($C_0$). The propagate and the generate signals ($P_i$, $G_i$) are derived from the operands $A_i$ and $B_i$ by
$$G_i = A_i \cdot B_i \tag{7.4}$$
The carries of the four stages are given by
$$C_1 = G_0 + P_0 C_0 \tag{7.6}$$
$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0 \tag{7.7}$$
$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \tag{7.8}$$
$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \tag{7.9}$$
Fig. 7.7 shows the block diagram of a 4-bit CLA adder. The carry generator blocks (CLG1 to CLG4) generate the carries $C_1$ to $C_4$, in parallel, from the carry-in signal $C_0$. The different $P_i$ and $G_i$ signals are implemented following the expressions given by Equations (7.4) and (7.5). The sum generator blocks (SG1 to SG4) generate the sums. The sum, $S_i$, is generated by
$$S_i = C_{i-1} \oplus A_i \oplus B_i \tag{7.10}$$
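The look-ahead equations (7.6) to (7.9) can be exercised with a short behavioral model. This is a sketch only: it assumes the propagate signal is $P_i = A_i \oplus B_i$ (the inclusive OR would give the same carries), it numbers the bits from 0 so that the carry entering bit i is `C[i]`, and the name `cla4` is hypothetical.

```python
def cla4(a, b, c0=0):
    """Behavioral 4-bit carry look-ahead adder; returns (sum, carry_out)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]  # generate, Eq. (7.4)
    P = [A[i] ^ B[i] for i in range(4)]  # propagate (assumed XOR form)

    C = [c0, 0, 0, 0, 0]
    # All carries are computed directly from G, P and C0, Eqs. (7.6)-(7.9).
    C[1] = G[0] | (P[0] & c0)
    C[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C[4] = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
            | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    # Each sum bit is the XOR of the two operand bits and the carry entering that bit.
    S = [P[i] ^ C[i] for i in range(4)]
    return sum(S[i] << i for i in range(4)), C[4]


print(cla4(0b1011, 0b0110))  # 11 + 6
```

Unlike the ripple structure, no carry here depends on another carry, which is why the four carry generator blocks can work in parallel.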
7.2.2 Baugh-Wooley Multiplier
It was noted that the Braun multiplier performs multiplication of unsigned numbers. The Baugh-Wooley technique [7] was developed to design regular direct multipliers for two's complement numbers. This direct approach does not need any two's complementing operations prior to multiplication. Let us consider two numbers X and Y with the following form
$$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i \tag{7.22}$$
$$Y = -y_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i \tag{7.23}$$
The product P = XY is given by the following equation
$$P = XY = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} - x_{n-1} \sum_{i=0}^{n-2} y_i 2^{n+i-1} - y_{n-1} \sum_{i=0}^{n-2} x_i 2^{n+i-1} \tag{7.24}$$
In order to avoid the use of subtractor cells and use only adders, the negative terms should be transformed. So
$$-x_{n-1} \sum_{i=0}^{n-2} y_i 2^{n+i-1} = x_{n-1} \left( -2^{2n-2} + 2^{n-1} + \sum_{i=0}^{n-2} \bar{y}_i 2^{n+i-1} \right) \tag{7.25}$$
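Using this property in Equation (7.24), the product P becomes
$$P = XY = -2^{2n-1} + (\bar{x}_{n-1} + \bar{y}_{n-1} + x_{n-1} y_{n-1}) 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2} x_{n-1} \bar{y}_i 2^{n+i-1} + \sum_{i=0}^{n-2} y_{n-1} \bar{x}_i 2^{n+i-1} \tag{7.26}$$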
Using the above relation, an n × n multiplier using only adders can be implemented. The schematic circuit diagram of a 4 × 4 two's complement multiplier based on the Baugh-Wooley algorithm is shown in Fig. 7.22. The different cells composing the array are also shown. In this scheme n(n - 1) + 3 full-adders are required. So for the case of n = 4 the array needs 15 adders. When n is relatively large, the final adder stage in the multiplier array can be implemented with the techniques discussed in Section 7.1. This type of multiplier is suitable for applications where operands with less than 16 bits are to be processed. Applications for such a multiplier are, for example, digital filters where small operands are used (e.g., 6, 8 and 12 bits). For low-power and high-speed operation, the array uses a CPL-like adder as mentioned previously in Section 7.2.1, while a CSA scheme, combined with carry select, can be utilized in the final adder. For operands equal to or greater than 16 bits, the Baugh-Wooley scheme becomes too area-consuming and slow. Hence, techniques to reduce the size of the array, while maintaining the regularity, are required.

Figure 7.22 4 × 4 Baugh-Wooley two's complement array (FA: Full-Adder).
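The adder-only form of the product can be checked numerically. The sketch below evaluates the expansion obtained by applying the transformation (7.25) to both negative terms of (7.24): a constant $-2^{2n-1}$, the term $(\bar{x}_{n-1} + \bar{y}_{n-1} + x_{n-1} y_{n-1}) 2^{2n-2}$, the unsigned partial products, the term $(x_{n-1} + y_{n-1}) 2^{n-1}$, and the two sums of complemented bits. The helper names are hypothetical, and the model is arithmetic only; it does not describe the cell arrangement of the array.

```python
def bits(value, n):
    """Two's complement bits of value on n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def baugh_wooley(x, y, n):
    """Product of two n-bit two's complement numbers using only positive bit terms
    plus one constant, following the Baugh-Wooley transformation."""
    xb, yb = bits(x, n), bits(y, n)
    xs, ys = xb[n - 1], yb[n - 1]  # sign bits

    p = -(1 << (2 * n - 1))                                  # constant term
    p += ((1 - xs) + (1 - ys) + xs * ys) << (2 * n - 2)
    p += sum(xb[i] * yb[j] << (i + j)                        # unsigned partial products
             for i in range(n - 1) for j in range(n - 1))
    p += (xs + ys) << (n - 1)
    p += sum(xs * (1 - yb[i]) << (n + i - 1) for i in range(n - 1))
    p += sum(ys * (1 - xb[i]) << (n + i - 1) for i in range(n - 1))
    return p


n = 4
print(baugh_wooley(-8, -8, n), baugh_wooley(-5, 7, n))
print("full-adders needed for n = 4:", n * (n - 1) + 3)
```

Every bit-level term in the sum is non-negative, so the array can be built from AND gates, inverters and adders; the single negative constant only affects the top bit of the 2n-bit result.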
7.2.3 The Modified Booth Multiplier
For operands equal to or greater than 16 bits, the modified Booth algorithm [8] has been used in almost all designed multipliers. It is based on recoding the two's complement operand (i.e., the multiplier) in order to reduce the number of partial products to be added. This makes the multiplier faster and uses less hardware (area). For example, the modified Radix-2 algorithm is based on partitioning the multiplier into overlapping groups of 3 bits, and each group is decoded to generate the correct partial product.
Let us write the multiplier, Y, in two's complement
$$Y = -y_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i \tag{7.27}$$
It can be rewritten as follows
$$Y = \sum_{i=0}^{n/2-1} \left[ y_{2i-1} + y_{2i} - 2 y_{2i+1} \right] 2^{2i}, \qquad y_{-1} = 0 \tag{7.28}$$
In this equation, the terms in brackets have values in the set {-2, -1, 0, +1, +2}. The recoding of Y, using the modified Booth algorithm, generates another number with the following five signed digits: -2, -1, 0, +1, +2. Each recoded digit in the multiplier performs a certain operation on the multiplicand, X, as illustrated in Table 7.1.

Table 7.1 Partial product selection.

y(i+1)  y(i)  y(i-1)   Recoded digit   Operation on X
  0      0      0           0             0 × X
  0      0      1          +1            +1 × X
  0      1      0          +1            +1 × X
  0      1      1          +2            +2 × X
  1      0      0          -2            -2 × X
  1      0      1          -1            -1 × X
  1      1      0          -1            -1 × X
  1      1      1           0             0 × X
So the bits of the multiplier are partitioned into overlapping groups of 3 bits, and each group permits generation of a certain partial product. The five possible multiples of the multiplicand are relatively easy to generate following the explanation given in Table 7.2. The generated partial product is related to the multiplicand for each recoded digit by the relationships presented in Table 7.3. $PP_i$ is bit i of the partial product and $PP_n$ is the sign bit of the partial product, with $PP_n = PP_{n-1}$ when no shifting of the partial product is performed. Note that the partial product is represented on n + 1 bits.

Table 7.2 Operations on the multiplicand.

Recoded digit   Operation on X
     0          Add 0 to the partial product
    +1          Add X to the partial product
    +2          Shift left X one position and add it to the partial product
    -1          Add two's complement of X to the partial product
    -2          Take two's complement of X and shift left one position

Table 7.3 Partial product generation relations.

Recoded digit   Operation on X                                   Added to LSB
     0          $PP_i = 0$              for i = 0, ..., n             0
    +1          $PP_i = x_i$            for i = 0, ..., n             0
    +2          $PP_i = x_{i-1}$        for i = 0, ..., n             0
    -1          $PP_i = \bar{x}_i$      for i = 0, ..., n             1
    -2          $PP_i = \bar{x}_{i-1}$  for i = 0, ..., n             1
To clarify this algorithm, an example is presented in Fig. 7.23. Let X = 10010101 and Y = 01101001. The recoded digits of Y, from the most significant group to the least significant one, are

Y groups:        011   101   100   010
Recoded digits:   +2    -1    -2    +1

The bits are grouped into 3-bit groups overlapped by one bit, and a bit with a value of zero is added on the right side of Y as $y_{-1}$. So the multiplication of two 8-bit numbers generates only 4 partial products. The number of partial products is thus reduced by half. The partial product in this example is represented on 9 bits. For a correct addition of the partial products, the signs are extended as shown in Fig. 7.23. The shape of the multiplier is then trapezoidal due to the sign extension.
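The recoding of the example can be reproduced with a short script. This is a behavioral sketch with hypothetical function names: it scans the multiplier two bits at a time with one bit of overlap, starting from an appended zero on the right, and then sums the selected multiples of X with the appropriate 2-bit shifts.

```python
def booth_recode(y, n):
    """Modified Booth digits of the n-bit two's complement multiplier y
    (n even), least significant digit first."""
    yb = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # the extra bit appended to the right of Y
    for i in range(0, n, 2):
        digits.append(prev + yb[i] - 2 * yb[i + 1])
        prev = yb[i + 1]
    return digits


def booth_multiply(x, y, n):
    """Sum of the partial products; digit k selects a multiple of x shifted by 2k bits."""
    return sum(d * x << (2 * k) for k, d in enumerate(booth_recode(y, n)))


X = 0b10010101 - 256  # 10010101 read as 8-bit two's complement
Y = 0b01101001

digits = booth_recode(Y, 8)
print("recoded digits, MSB first:", digits[::-1])
print("product:", booth_multiply(X, Y, 8))
```

Only four partial products are summed for the 8-bit operands, each being 0, ±X or ±2X, which is where the saving in adder hardware comes from.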
[Figure 7.23: multiplication example. X = 10010101 (-107), Y = 01101001 (+105); recoded bit groups 010 (+1), 100 (-2), 101 (-1), 011 (+2), with sign extension of each partial product; P = 1101010000011101 (-11235).]
In order to make the array rectangular, and therefore more regular for VLSI implementation, the problem of sign extension must be addressed. This problem is more crucial when the operands are wide, where each partial product must be sign-extended to the length of the product. In this section we will not deal with all the techniques to solve the problem of the sign extension, but we will discuss one technique, which is shown in Fig. 7.24 for the example of Fig. 7.23. The basic idea is to use two extra bits in the partial product. For the first partial product, the two additional bits, $PP_{n+1}$ and $PP_{n+2}$, are equal to the sign bit of the partial product
$$PP_{n+2} = PP_{n+1} = PP_n \tag{7.29}$$
For the second partial product, if the first partial product was positive, then the two additional bits for this second partial product are given by the expression above; otherwise we have two cases
$$PP_{n+2} = PP_{n+1} = 1 \quad \text{if } PP_n = 0 \tag{7.30}$$
and
$$PP_{n+2} = \overline{PP_{n+1}} = 1 \quad \text{if } PP_n = 1 \tag{7.31}$$
So it is more interesting to use a third bit, F, as a flag to indicate whether there is, from the previous partial product, a negative sign bit to be propagated. $F_1$ is the flag generated by the first partial product for the next one. For the example of Fig. 7.24, $F_0 = 0$ (no partial product before the first one), and $F_1 = F_2 = F_3 = 1$. So from the first partial product there is a sign propagation to all the others.
Figure 7.24 The previous example of Figure 7.23 with simplified sign extension. (Legend: additional bits to be generated for sign extension; additional bits generated from the previous sign and the present sign.)

Let us explain how [...] $B_1$, then $C_1 = 1$, $D_1 = 0$, and $A_1A_0 > B_1B_0$ regardless of the magnitudes of the lower bits. When $A_1 = B_1 = 0$, the relative magnitudes of the two 2-b numbers depend on $A_0$ and $B_0$. In this situation, there are three different cases:
1. $A_1A_0 < B_1B_0$ for $A_0 < B_0$ (i.e., $C_0 = D_0 = 0$). Then we can set $E_0 = F_0 = 0$.
2. $A_1A_0 = B_1B_0$ for $A_0 = B_0$ (i.e., $C_0 = 0$, $D_0 = 1$). Then we can set $E_0 = 0$ and $F_0 = 1$.
3. $A_1A_0 > B_1B_0$ for $A_0 > B_0$ (i.e., $C_0 = 1$, $D_0 = 0$). Then we can set $E_0 = 1$ and $F_0 = 0$.
These relations can easily be used to implement the second cell, C2, of the comparator as shown in Fig. 7.37(c).
This technique, for the two-bit comparator, can be extended to an n-bit comparator. It can be constructed by using a parallel tree of the cells C1 and C2. A 4-bit comparator could, for example, be constructed with two 2-bit comparators connected in parallel, with the 4 generated E and F signals at their outputs fed to an added C2 cell. In this architecture, the glitching is reduced by equalizing the delay paths of each cell.
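The tree construction can be sketched behaviorally. The encoding is taken from the cases listed above: the pair (1, 0) means "greater", (0, 1) means "equal" and (0, 0) means "less". The gate-level equations inside `c1_cell` and `c2_cell` are assumptions derived from that encoding, not a transcription of Fig. 7.37, and all names are hypothetical.

```python
def c1_cell(a, b):
    """Bit-level cell: returns (C, D) = (1, 0) if a > b, (0, 1) if a == b, (0, 0) if a < b."""
    return a & (1 - b), 1 - (a ^ b)


def c2_cell(high, low):
    """Merges the results of a more significant and a less significant group."""
    c1, d1 = high
    c0, d0 = low
    e = c1 | (d1 & c0)  # greater if the high group is greater, or equal with low greater
    f = d1 & d0         # equal only if both groups are equal
    return e, f


def compare(a, b, n):
    """n-bit magnitude comparison with a balanced tree of C2 cells (n a power of two)."""
    level = [c1_cell((a >> i) & 1, (b >> i) & 1) for i in range(n)]  # LSB first
    while len(level) > 1:
        level = [c2_cell(level[i + 1], level[i]) for i in range(0, len(level), 2)]
    return level[0]


print(compare(0b1010, 0b1001, 4), compare(0b0110, 0b0110, 4), compare(0b0011, 0b0100, 4))
```

For four bits the tree has two levels of C2 cells, matching the construction described in the text; every input reaches the output through the same number of cells, which is the path equalization that limits glitching.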
7.3.4 Shifter
Another macrocell of the data path is the shifter. It performs shift or rotate operations on the data. If the number of bits to be shifted is arbitrary, then a barrel shifter is used [12, 13]. Fig. 7.38 shows the CMOS implementation of a 4-bit barrel shifter. NMOS transistors are used as switches in the array. The input bus ($D_0$ to $D_6$) can be connected to the output bus ($R_0$ to $R_3$) via the pass transistors. The control signals $S_0$ to $S_3$ select the pass transistors to be switched. These signals determine the amount of shift and they are generated by a 2-bit decoder. Since the outputs have a high level of $V_{DD} - V_T$, due to the pass transistors, the output buffer uses a feedback PMOS device, $P_f$, to restore the high level to $V_{DD}$. This eliminates any DC current in the first inverter of the buffer.
Table 7.6 shows the values of the output bus as a function of the input data. Depending on the values of D<6:0>, several shift operations can be performed. For example, if D<6:4> = "0" and D<3:0> is the 4-bit input data, then a logical shift is realized. However, if D<6:4> = "1" and D<3:0> is the input data, then an arithmetic shift operation is performed.

Table 7.6 Output bus as a function of the shifting amount.

The barrel shifter is not a critical unit for the delay. Low-power operation is achieved by using a static implementation. This shifter can also be implemented with transmission gates, in which case the feedback PMOS devices are not required. However, for low power, the use of an NMOS array is more efficient. The feedback PMOS should be sized to minimum.
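A behavioral model helps to see how the 7-bit input bus produces the different shift types. The sketch below assumes the usual pass-transistor array mapping, in which selecting a shift of k connects output $R_i$ to input $D_{i+k}$; this mapping is an assumption, since the body of Table 7.6 is not reproduced here, and the function names are hypothetical.

```python
def to_bits(value, width):
    """Bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bit_list):
    """Integer from a list of bits, LSB first."""
    return sum(bit << i for i, bit in enumerate(bit_list))


def barrel_shift(d_bus, shift):
    """4-bit barrel shifter: d_bus holds D0..D6, shift is 0..3 (one control line active).
    Assumed mapping: R_i = D_(i + shift)."""
    if len(d_bus) != 7 or not 0 <= shift <= 3:
        raise ValueError("expected 7 input bits and a shift amount of 0..3")
    return [d_bus[i + shift] for i in range(4)]


data = 0b1011
low = to_bits(data, 4)

# D<6:4> = 000: zeros enter from the left (logical shift right).
logical = [from_bits(barrel_shift(low + [0, 0, 0], s)) for s in range(4)]
# D<6:4> = 111: ones enter from the left (the second case described in the text).
ones_fill = [from_bits(barrel_shift(low + [1, 1, 1], s)) for s in range(4)]
# D<6:4> = D<2:0>: the bits shifted out re-enter on the left (rotate right).
rotate = [from_bits(barrel_shift(low + to_bits(data, 3), s)) for s in range(4)]

print(logical, ones_fill, rotate)
```

The array itself never changes; the operation type is decided entirely by what is driven onto D<6:4>, which is why a single small NMOS array covers shifts and rotates.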
7.3.5 Register File
A register file is a set of registers which store data. It consists of a small array of static memory cells. Register files are used by microprocessors and DSPs and they permit multiple read and write ports [14, 15, 16, 17]. A typical array is 32 registers of 32 bits. For example, an ALU needs two pieces of data from the register file. The array then has a dual-read single-write architecture.

Fig. 7.39 shows the schematic of the single-ended memory cell with 2 read ports and 1 write port (2R-1W). The read ports are the read bit-lines BL_R1 and BL_R2. The memory cell, composed of two cross-coupled inverters $I_1$ and $I_2$, is addressed by two read word-line signals, WL_R1 and WL_R2. The NMOS transistor $N_1$ is controlled by the Write Enable (WE) signal. $N_1$ is connected serially to the write access transistor $N_2$. The transistor $N_2$ is controlled by the write word-line (WL_W) signal. The transistor $N_1$ isolates the stored data from the write bit-line (BL_W). To write the data in the storage node A from the write bit-line, the inverters $I_1$ and $I_2$ should be sized carefully. The β ratio¹ of the inverter $I_1$ should be larger than 1 (e.g., 5) to set the threshold voltage of $I_1$ to a low level. This is due to the fact that $N_1$ and $N_2$ only weakly transfer a high level (only $V_{DD} - V_{Tn}$). Moreover, to ensure a correct write operation, the
¹ The definition of β is given in Chapter 4.

Figure 7.39 (2R-1W) register file cell. (Signal labels: BL_W, BL_R1, BL_R2, WL_W, WL_R1, WL_R2, WE (Write Enable).)
feedback inverter $I_2$ should be weak so that the access transistors $N_1$ and $N_2$ can change the state of node A. For example, the NMOS and PMOS of $I_2$ should be of minimum size, except that the length of the NMOS is twice the minimum. Also, the access transistors should have a higher β compared to the transistors of $I_2$. For a given technology, the sizes should be determined by circuit simulation for a correct write operation. The inverter $I_3$ is a buffer for the storage node.
A pair of three-port memory cells is shown in Fig. 7.40. This structure has a shared access transistor $N_2$ and a shared write bit-line, BL_W. To read and write the memory cell, the simplified schematic of Fig. 7.41 is used. This schematic uses the column multiplexing scheme. For low power, the register file uses a static design and avoids the use of the conventional sense amplifier for bit-line sensing, because the sense amplifier consumes DC power. For a three-port register file, two read row decoders and one write row decoder are required. Also, Write Enable (WE) and column addresses are needed to produce the column write enable for writing the data to the specified storage node. For fast operation, AND gates can be used with a maximum of 5-bit inputs. During the read operation, if for example the read word-line WL_R1 is asserted, then the data is put on the bit-line, BL_R1. The bit-line is selected through the pass transistor $N_1$. The data is then sensed by the inverter $I_1$ in Fig. 7.41. During this period, the
Figure 7.40 A pair of three-port memory cells (2R-1W). (Signal labels: BL_R2A, BL_R1A, BL_W, WE_1, WE_2, BL_R2B, BL_R1B.)
read enable signal, RE, is asserted, $N_2$ is OFF and only the feedback PMOS $P_f$ is activated when a one ($V_{DD} - V_{Tn}$) is on the data-line. In this situation, the feedback PMOS charges up the data-line to $V_{DD}$. Also, the DC current, which can be generated due to the reduced high level on the data-line, is completely eliminated. The β ratio of the inverter $I_1$ should be higher than one (e.g., 5) to achieve a symmetrical read access time for a zero and a one. When RE = 0, the data-lines are isolated from the bit-lines and the NMOS transistor $N_2$ is ON. Therefore, the latch formed by the pair of inverters $I_1$ and $I_2$ latches the old data. The operation of such a register file is fully static and does not dissipate any static power in any mode of operation. Furthermore, the read and write operations are asynchronous. This type of register file is suitable for low-power applications.
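At the architectural level, the 2R-1W organization amounts to two independent read addresses and one write address gated by the write enable. The following behavioral sketch (class and method names are hypothetical) models only that port structure; it says nothing about the circuit-level sizing or sensing discussed above.

```python
class RegisterFile2R1W:
    """Behavioral model of a register file with two read ports and one write port."""

    def __init__(self, n_regs=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * n_regs

    def write(self, addr, data, write_enable):
        """Single write port: the addressed word is updated only when WE is asserted."""
        if write_enable:
            self.regs[addr] = data & self.mask

    def read(self, addr1, addr2):
        """Two read ports: both words are available in the same access."""
        return self.regs[addr1], self.regs[addr2]


rf = RegisterFile2R1W()
rf.write(1, 20, write_enable=True)
rf.write(2, 22, write_enable=True)
rf.write(2, 99, write_enable=False)  # ignored: write enable is low

a, b = rf.read(1, 2)                 # the ALU fetches both operands at once
rf.write(3, a + b, write_enable=True)
print(rf.read(3, 0))
```

The usage at the end mirrors the ALU example in the text: two operands are read through the two read ports and the result goes back through the single write port.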
7.4 REGULAR STRUCTURES
In this section we examine the design of large regular structures such as Programmable Logic Arrays (PLAs), Read Only Memories (ROMs) and Content Addressable Memories (CAMs). The ROMs and PLAs are not only used to implement controllers in a regular manner, but they can also be applied to signal processing. RAMs are treated separately in Chapter 6. These large structures
[Figure (register file read/write circuit) labels: write decoder (WA); column write signals with WE (Write Enable); column read signals with RE (Read Enable).]
are usually dynamic circuits for fast operation. These dynamic circuits can be shut down with a power management unit for power savings. If, for example, the clock is turned OFF, all dynamic circuits go into a precharge mode with all PMOS precharge devices ON.
7.4.1 Programmable Logic Array
Logic functions such as those used in the control units of VLSI processors, or in finite-state machines, are hard to implement in random logic. One way of implementing these functions in a regular structure is the use of a Programmable Logic Array (PLA) [18, 19].
PLAs have a regular architecture divided mainly into two planes as shown in Fig. 7.42. These planes perform specific functions such as OR and AND. CMOS PLAs can be implemented in both static and dynamic styles. The style is chosen depending on the timing strategy in the chip. Other factors such as speed, power dissipation, and the allowed area play an important role in the PLA design style. A CMOS PLA example, using a pseudo-NMOS like style, is shown in Fig. 7.43. The output OR functions are realized with NOR gates. From Fig. 7.43(a), we have
$$P_1 = \overline{\bar{A} + \bar{B} + \bar{C}} = A \cdot B \cdot C \tag{7.33}$$
$$P_2 = \overline{\bar{A} + \bar{C}} = A \cdot C \tag{7.34}$$
$$P_3 = \overline{\bar{B} + C} = B \cdot \bar{C} \tag{7.35}$$
$$P_4 = \overline{A + \bar{C}} = \bar{A} \cdot C \tag{7.36}$$
The buffers are used when the load on the bit-line is large. They consist in general of two inverter stages. The OR plane is in principle similar to the AND plane [Fig. 7.43(b)]. From Fig. 7.43(b), we have
$$X = P_1 + P_2 + P_3 \tag{7.37}$$
$$Y = P_1 + P_4 \tag{7.38}$$
For this pseudo-NMOS PLA, the NOR-NOR logic gate style is used. This example shows that the PLA organization is useful for implementing Sum Of Products (SOP) functions. Hence any SOP function can be realized by programming the array with the AND and OR cells. Any type of latch or register can be used at the input and output. This design style of PLAs has a small area and
Figure 7.42 AND-OR PLA architecture. (Labels: inputs, outputs.)
it is simple to implement. However, it is not suitable for low-power applications due to the high DC power dissipation, particularly when the PLA is large. Moreover, it has a speed problem.
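The NOR-NOR organization can be illustrated with a small logic model. The sketch below is hedged in two ways: the four product terms (ABC, AC, B·C̄ and Ā·C) are used as an illustrative personality and are an assumed reading of the example, and the output inverters that turn the OR-plane NOR into an OR are assumed as well. The function names are hypothetical.

```python
from itertools import product


def nor(*inputs):
    """A NOR gate: output is 1 only when every input is 0."""
    return 0 if any(inputs) else 1


def pla_nor_nor(a, b, c):
    """Evaluates outputs (X, Y) of a NOR-NOR PLA for the assumed product terms."""
    na, nb, nc = 1 - a, 1 - b, 1 - c

    # AND plane: a NOR of complemented literals yields the product of the true literals.
    p1 = nor(na, nb, nc)  # A.B.C
    p2 = nor(na, nc)      # A.C
    p3 = nor(nb, c)       # B.(not C)
    p4 = nor(a, nc)       # (not A).C

    # OR plane: NOR of the selected product terms, followed by an output inverter.
    x = 1 - nor(p1, p2, p3)
    y = 1 - nor(p1, p4)
    return x, y


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", pla_nor_nor(a, b, c))
```

Changing the personality only means changing which literals feed each NOR row, which is the sense in which any sum-of-products function can be programmed into the array.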
In dynamic CMOS style, the circuit shown in Fig. 7.44 can be used. It is a self-timed PLA, where the AND and OR planes are both realized using a precharged NOR configuration. In this structure, only a single clock phase is needed. When the clock, clk, is high, the bit-lines are precharged in both planes. The NMOS transistors $N_A$ and $N_O$ are OFF, guaranteeing that there is no path to ground. Tracking lines in both planes are used to generate a delayed clock for the OR plane. When the clock is low, the precharge PMOS transistors in the AND plane turn OFF, $N_A$ turns ON and the products are evaluated. The tracking lines ensure that $N_O$ turns ON only when the inputs to the OR plane are stable. Otherwise the outputs can be spuriously discharged. This PLA is fast, but it has a lot of wasted dynamic power. The wasted power has several sources such as:
Figure 7.43 Pseudo-NMOS CMOS PLA: (a) AND plane; (b) OR plane.

Figure 7.44 Self-timed dynamic PLA using NOR-NOR style. (Labels: AND plane, OR plane, clk, virtual ground.)
• The virtual ground lines are charged and discharged every cycle. The total capacitance of the virtual ground is important, particularly for large PLAs, because for the purpose of layout compactness the ground lines are in diffusion. This capacitance can be reduced by using a metal level in a multi-metal technology;
• The number of inverters forming the buffers is important. During the evaluation, several of them switch; and
• The switching activity of the dynamic NOR implementation is high [see Chapter 4].
Consider now the PLA shown in Fig. 7.45 with an AND-NOR structure. The OR plane is still the same as in the PLA of Fig. 7.44. However, the AND plane is considerably simplified because:
• The virtual ground lines disappear; and
• The number of inverters for buffering is reduced by half.

Figure 7.45 Self-timed dynamic PLA using AND-NOR style. (Labels: AND plane, OR plane, delay, tracking, virtual ground.)

The switching activity of the NAND implementation is also lower than that of the NOR implementation, resulting in lower power in the AND plane. One problem associated with this structure is that the use of NAND may result in a large discharge time. Another dynamic PLA combines the pseudo-NMOS and dynamic logic design styles [19]. Fig. 7.46 shows an example of such a structure. The AND plane uses a predischarged pseudo-NMOS NOR style, while the OR plane uses a conventional dynamic precharged style. During the precharge phase, the clock signal is high and the bit-lines in the AND plane are predischarged to ground. In the OR plane, the bit-lines are precharged to $V_{DD}$. The input signals to the OR plane are low. During the evaluation phase (clk = 0), the PMOS loads in the AND plane are ON, and the plane behaves as pseudo-NMOS logic. In this case, the PMOS device should be sized correctly to ensure safe operation when the output stays at a low level. The product terms are evaluated and then the outputs. During this evaluation phase, the PLA dissipates static power, mainly in the AND plane. The power is thus increased by this DC component.
This PLA does not need the self-timing technique used previously. It was also shown that this PLA has a fast operation [19]. When implementing smaller controllers, it is sometimes more interesting to use random logic. The implementation consists of two or more levels of logic gates using a standard cell library. It is much less regular than a PLA structure and it can have lower power dissipation.
7.4.2 Read Only Memory
Read Only Memory (ROM) is used in many applications. In DSPs, for example, it can be used as a lookup table to store coefficients. It is also often used in VLSI processors as a microcode controller. In this case, the ROM contains the microprogram instructions. A typical micro-ROM size is 2k words of 64 bits. The read-out cycle of the ROM limits the speed of the processor. Conceptually, the structure of a ROM is quite similar to that of a PLA. Fig. 7.47 shows a simple ROM circuit architecture using NOR logic design. The state of the memory array is retained even if the ROM is not powered. The
Figure 7.48 Layout of a ROM memory cell (labels: bit-line in metal 1, word-line in metal 2, diffusion, word-line in polysilicon).
The ROM can be implemented in both styles: static and dynamic. In the static style, pseudo-NMOS logic, similar to that of the static PLA, can be used. Fig. 7.49 shows an example of a small ROM using the pseudo-NMOS circuit style. The conditioning circuits use PMOS devices, with their gates grounded, and the sense amplifier circuit is simply an inverter. The column decoder is also shown. One of the column decoders selects one of the two bit-lines. Node A is initially at VDD. If the selected bit-line is discharged, then node A is discharged and the output is pulled up to VDD. The pseudo-NMOS ROM is easy to design; however, the power dissipation may be significant due to the DC current. For a relatively large ROM, like the one used in microcontrollers, the power dissipation can be significantly reduced using the low-power techniques of SRAMs*. They include pulse mode operation using address transition detection, small swing sensing, etc.
*These techniques are discussed in more detail in Chapter 6.
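The read operation described above can be modeled functionally. The following Python sketch is an illustration only, not the circuit of the figure: the class name `PseudoNmosRom`, the `cells` matrix and the method names are hypothetical, and the array contents are made up. A cell value of 1 is assumed to mean that an NMOS pull-down transistor is present at that word-line/bit-line crossing.

```python
from typing import List


class PseudoNmosRom:
    """Functional model of a pseudo-NMOS NOR ROM read (illustrative only)."""

    def __init__(self, cells: List[List[int]]):
        # cells[row][col] == 1: a pull-down transistor connects this
        # bit-line to ground when the word-line of this row is selected.
        self.cells = cells

    def read(self, row: int, col: int) -> int:
        # The row decoder raises one word-line; the column decoder
        # connects one bit-line to node A, which starts at VDD.
        bit_line_discharged = self.cells[row][col] == 1
        node_a_high = not bit_line_discharged
        # The sense amplifier is an inverter on node A.
        return 0 if node_a_high else 1

    def dc_paths(self, row: int) -> int:
        # With grounded-gate PMOS loads, every bit-line pulled low by the
        # selected row carries a DC current from VDD to ground.
        return sum(self.cells[row])


if __name__ == "__main__":
    rom = PseudoNmosRom([[1, 0], [0, 1]])
    print(rom.read(0, 0), rom.read(0, 1), rom.dc_paths(0))
```

The `dc_paths` count hints at why the static style becomes costly for wide words: the DC component grows with the number of bit-lines that are pulled low in the selected row.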
Figure 7.49 ROM using pseudo-NMOS circuit style (labels: row decoder).
B. Fig. 8.6 shows the application of the precomputation technique to the comparator. If the most significant bits, $A_{n-1}$ and $B_{n-1}$, are different, then F can be computed from the 1-bit MSB comparator and the registers R2 and R3 are disabled. Therefore, the (n-1)-bit comparator is shut down. If the inputs have a uniform probability equal to 0.5, the enable signal has a probability of 0.5 to be at the logical level "1" or "0". Therefore, for a relatively large n the power saving can be quite significant even if we include the power due to the additional circuitry. This technique of precomputation can be synthesized for logic optimization. The selection of the subset of input signals for which the output is precomputed
is critical for power savings. Otherwise, the additional circuitry can dissipate a relatively significant amount of power. Note that this added logic slightly increases the area of the circuit and may also increase the clock cycle. The precomputation technique can be applied to a multiple-output function. However, if the logic has a large number of outputs, then it may be worthwhile to selectively apply the precomputation technique to a small number of complex outputs. This selective partitioning duplicates combinational logic and registers, and this may offset the power savings.
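The comparator case can be checked with a short behavioral model. The following Python sketch is only an illustration: it assumes unsigned n-bit inputs and that the comparator computes F = (A > B), which is an assumption because the definition of F is not part of this passage. The function names are hypothetical. The model returns the result together with a flag that says whether the lower (n-1)-bit registers had to be enabled.

```python
import random
from typing import Tuple


def precomputed_compare(a: int, b: int, n: int) -> Tuple[bool, bool]:
    """Return (F, lsb_enabled) for an n-bit comparison, assuming F = (a > b)."""
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # MSBs differ: the 1-bit MSB comparator decides F on its own and
        # the registers holding the lower n-1 bits stay disabled.
        return msb_a > msb_b, False
    # MSBs are equal: the lower n-1 bits must be loaded and compared.
    mask = (1 << (n - 1)) - 1
    return (a & mask) > (b & mask), True


def enable_rate(n: int, trials: int, seed: int = 0) -> float:
    """Fraction of random input pairs that enable the lower registers."""
    rng = random.Random(seed)
    enabled = 0
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        f, en = precomputed_compare(a, b, n)
        assert f == (a > b)  # precomputation must not change the function
        enabled += en
    return enabled / trials


if __name__ == "__main__":
    print(enable_rate(16, 10000))
```

For uniformly distributed inputs the enable rate is close to 0.5, in line with the probability quoted in the text.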
8.3 LP ARCHITECTURE-LEVEL DESIGN
In this section, architecture also means the Register Transfer Level (RTL). The architecture uses a set of primitives such as adders, multipliers, ROMs, register files, etc. RTL synthesis programs are used to convert an RTL description to a set of registers and combinational logic. The impact of low-power techniques at the architecture level can be more significant than at the gate level, as will be shown in this section. The techniques discussed to reduce the power dissipation are: parallelism, pipelining, distributed processing and power management.
[...] ρ > 0 corresponds to a lower activity for positively correlated signals, while ρ < 0 corresponds to a higher activity for negatively correlated signals. The MSB region starts from the break point BP1. The region between BP0 and BP1 can be modeled by linear interpolation. BP0 and BP1 can be determined from the word-level statistics [37]. The power estimation of the architecture modules is based on a black-box technique of the switched capacitance. Typical modules are: adders, multipliers,
shifters, RAMs, ROMs, etc. The power dissipation is modeled for each module by
$$P = C V_{DD}^{2} f \qquad (8.23)$$
where the switched capacitance $C$ is related to the complexity and the activity of the module. For example, for an n-bit ripple-carry subtractor, the switched capacitance is modeled by
$$C = C_{eff}\, n \qquad (8.24)$$
where $C_{eff}$ is a capacitive coefficient (in fF/bit) determined from the DBT model. $C_{eff}$ can be a single coefficient for the UWN case. The DBT model employs several coefficients for $C_{eff}$, which reflect the data representation and signal statistics. For the case of the subtractor, for example, a table of $C_{eff}$ is generated as a function of all possible data transitions, i.e., sign bit transitions and random LSB transitions.
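Equations (8.23) and (8.24) can be combined into a one-line estimate for the single-coefficient (UWN) case. The following Python sketch is only an illustration: the function name `module_power` and the numerical values (50 fF/bit, 16 bits, 3.3 V, 50 MHz) are hypothetical and are not taken from any characterized library.

```python
def module_power(c_eff_ff_per_bit: float, n_bits: int,
                 vdd: float, freq_hz: float) -> float:
    """Power in watts of an n-bit module, P = C * VDD^2 * f with C = C_eff * n."""
    # Eq. (8.24): switched capacitance scales with the word length.
    c_farads = c_eff_ff_per_bit * 1e-15 * n_bits
    # Eq. (8.23): dynamic power of the module.
    return c_farads * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical 16-bit subtractor, 50 fF/bit, 3.3 V supply, 50 MHz.
    p = module_power(50.0, 16, 3.3, 50e6)
    print(f"{p * 1e3:.4f} mW")
```

With these assumed values the switched capacitance is 0.8 pF and the estimate is about 0.44 mW. In the full DBT model the single coefficient would be replaced by a table entry selected by the transition type.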
To extract the capacitive coefficients of each module, the library should be characterized. This operation is performed once per library. The process of extraction consists of several steps:
■ Pattern generation. Input patterns to a module are generated based on the DBT data model. Both random (UWN) and sign data streams should be used. The input patterns containing the UWN component must be simulated for several cycles. This allows convergence of the average capacitance.
■ Simulation. The generated patterns are fed to a simulator (such as a circuit simulator) from which the switching capacitances are extracted.
■ Capacitive coefficient extraction. The simulation step produces the average effective switching capacitances for the entire series of applied input transitions such as U → U, S → S*, etc. The capacitive coefficients are extracted from the effective switching capacitances and the complexity parameters.
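The last step can be sketched numerically. The text does not say how the coefficients are computed from the simulated capacitances, so the following Python sketch makes an assumption: for a module that follows $C = C_{eff}\, n$, the coefficient is obtained by a least-squares fit through the origin over several word lengths. The function name and the sample data are hypothetical.

```python
from typing import Sequence


def fit_c_eff(word_lengths: Sequence[int],
              avg_capacitances_ff: Sequence[float]) -> float:
    """Least-squares fit of C_eff (fF/bit) for the model C = C_eff * n.

    word_lengths: complexity parameter n of each characterized instance.
    avg_capacitances_ff: average effective switched capacitance (fF)
        reported by the simulator for one transition type.
    """
    if len(word_lengths) != len(avg_capacitances_ff) or not word_lengths:
        raise ValueError("need matching, non-empty samples")
    num = sum(n * c for n, c in zip(word_lengths, avg_capacitances_ff))
    den = sum(n * n for n in word_lengths)
    return num / den


if __name__ == "__main__":
    # Hypothetical simulation results for 4-, 8- and 16-bit instances.
    print(fit_c_eff([4, 8, 16], [200.0, 400.0, 800.0]))
```

One such fit would be repeated for each transition type, giving the table of coefficients used by the DBT model.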
Based on this methodology, a power analysis tool, at the architectural level, has been developed
[%I.
*U and S mean the UWN and sign parts of the input bits, respectively.
8.5.4 Behavioral-Level Power Estimation
A behavioral representation describes the function of a system versus a set of inputs. The behavior can be specified, for example, by algorithms (in Verilog, VHDL, etc.) or by boolean functions. The power estimation, at the behavioral level, relates the consumed energy to the execution of an algorithm. Decisions at the system and behavioral levels can influence the final power dissipation of the circuit by several orders of magnitude.
One approach for power estimation, at the behavioral level, has been proposed in [38]. It is based on the combination of analytical and stochastic power models. In this work, a class of applications such as real-time DSPs is considered for the power estimator. In the behavioral context, the power consumed by a hardware resource is given by
$$P = N_a C V_{DD}^{2} f \qquad (8.25)$$
where $N_a$ is the number of accesses to the resource over the period of computation, $C$ is the average capacitance switched per access and $f$ is the computation frequency. In [38] the power of some hardware resources, such as execution units, registers, etc., is analytically modeled (using Equation (8.25)) from the Control/Data Flow Graph (CDFG) which is used to represent the design. The average capacitance switched, per access, for a particular hardware resource is estimated from the white noise data model. The power consumed by hardware resources such as controllers, interconnects, and the clock network is difficult to estimate. For these resources, statistics gathered from a large number of realized chips are used to estimate the switched capacitance.
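Equation (8.25) lends itself to a simple bookkeeping calculation over the resources of a design. The following Python sketch is an illustration under stated assumptions: the resource names, access counts and per-access capacitances are invented for the example, and the function name `behavioral_power` is hypothetical. In practice the access counts would come from the CDFG and the capacitances from a characterized library.

```python
from typing import Dict, Tuple


def behavioral_power(resources: Dict[str, Tuple[int, float]],
                     vdd: float, freq_hz: float) -> float:
    """Sum P = N_a * C * VDD^2 * f over all analytically modeled resources.

    resources maps a resource name to (N_a, C), where N_a is the number of
    accesses per computation period and C is the average capacitance
    switched per access, in farads.
    """
    switched_per_period = sum(n_a * c for n_a, c in resources.values())
    return switched_per_period * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical access counts and capacitances for one algorithm.
    design = {
        "adder": (4, 2e-12),
        "multiplier": (2, 20e-12),
        "register": (10, 0.5e-12),
    }
    print(behavioral_power(design, vdd=3.3, freq_hz=1e6))
```

With these assumed numbers, 53 pF is switched per computation period, which gives roughly 0.58 mW at 3.3 V and a 1 MHz computation rate. Controllers, interconnects and the clock network are not covered by this sum and would need the statistical estimate mentioned above.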
8.6 CHAPTER SUMMARY
Low dynamic power techniques at several levels of abstraction have been presented. Algorithmic and architectural decisions can influence the power dissipation of a circuit by orders of magnitude. Therefore, CAD tools that help the designer analyze the power of the circuit at these levels are needed. At lower levels of the design, the power reduction techniques offer some savings, but less than those expected at higher levels. Several power estimation tools have been discussed at the different levels of the design. Keep in mind that circuit simulators provide high accuracy for power analysis and take into account all power components.
REFERENCES
[1] K-Y. Chao and D. F. Wong, "Low Power Considerations in Floorplan Design," Proc. of the International Workshop on Low Power Design, pp. 45-50, April 1994.
[2] H. Vaishnav and M. Pedram, "PCUBE: A Performance Driven Placement Algorithm for Low Power Designs," Proc. of the EURO-DAC'93, pp. 72-77, September 1993.
[3] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On Average Power Dissipation and Random Pattern Testability of CMOS Combinational Logic Networks," Proc. of the International Conference on Computer-Aided Design, pp. 402-407, November 1992.
[4] K. Keutzer, "The Impact of CAD on the Design of Low Power Digital Circuits," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 42-45, October 1994.
[5] C-Y. Tsui, M. Pedram, and A. M. Despain, "Technology Decomposition and Mapping Targeting Low Power Dissipation," 30th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 68-73, June 1993.
[6] R. Murgai, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Decomposition of Logic Functions for Minimum Transition Activity," Proc. of the International Workshop on Low Power Design, pp. 33-38, April 1994.
[7] V. Tiwari, P. Ashar, and S. Malik, "Technology Mapping for Low Power," 30th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 74-79, June 1993.
[8] K. Scott and K. Keutzer, "Improving Cell Libraries for Synthesis," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 128-151, May 1994.
[9] C. Lemonds and S. Mahant Shetti, "A Low Power 16 by 16 Multiplier using Transition Reduction Circuitry," Proc. of the International Workshop on Low Power Design, pp. 139-142, April 1994.
[10] A. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Design," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 472-484, April 1992.
[11] U. Ko, P. T. Balsara, and W. Lee, "A Self-timed Method to Minimize Spurious Transitions in Low Power CMOS Circuits," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 62-63, October 1994.
[12] R. I. Bahar, H. Cho, G. D. Hachtel, E. Macii, and F. Somenzi, "An Application of ADD-Based Timing Analysis to Combinational Low Power Re-Synthesis," Proc. of the International Workshop on Low Power Design, pp. 139-142, April 1994.
[13] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou, "Precomputation-Based Sequential Logic Optimization for Low Power," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, no. 4, pp. 426-436, December 1994.
[14] A. Gersho and R. Gray, "Vector Quantization and Signal Compression," Kluwer Academic Publishers, MA, 1992.
[15] D. B. Lidsky and J. M. Rabaey, "Low-Power Design of Memory Intensive Functions," IEEE Symposium on Low Power Electronics, Tech. Dig., pp. 16-17, October 1994.
[16] A. P. Chandrakasan, A. Burstein, and R. W. Brodersen, "A Low-Power Chipset for a Portable Multimedia I/O Terminal," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1415-1428, December 1994.
[17] J. Schutz, "A 3.3 V 0.6 μm BiCMOS Superscalar Microprocessor," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 202-203, February 1994.
[18] N. K. Yeung, Y-H. Sutu, T. Y-F. Su, E. T. Pak, C-C. Chao, S. Akki, D. D. Yau, and R. Lodenquai, "The Design of a 55SPECint92 RISC Processor under 2W," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 206-207, February 1994.
[19] D. Pham, et al., "A 3.0W 75SPECint92 85SPECfp92 Superscalar RISC," IEEE International Solid-State Circuits Conference, Tech. Dig., pp. 212-213, February 1994.
[20] G. Gerosa, et al., "A 2.2 W 80 MHz Superscalar RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1440-1454, December 1994.
[21] S. Gary, C. Dietz, J. Eno, G. Gerosa, S. Park, and H. Sanchez, "The PowerPC 603 Microprocessor: A Low-Power Design for Portable Applications," Proc. of COMPCON'94, Tech. Dig., pp. 307-315, February 1994.
[22] R. K. Kolagotla, S-S. Yu, and J. F. JaJa, "VLSI Implementation of a Tree Searched Vector Quantizer," IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 901-905, February 1993.
[23] C-L. Su, C-Y. Tsui, and A. M. Despain, "Low Power Architecture Design and Compilation Techniques for High-Performance Processors," Proc. of COMPCON'94, Tech. Dig., pp. 489-498, February 1994.
[24] A-C. Deng, "Power Analysis for CMOS/BiCMOS Circuits," Proc. of the International Workshop on Low Power Design, pp. 3-8, April 1994.
[25] C. M. Huizer, "Power Dissipation Analysis of CMOS VLSI Circuits by Means of Switch-Level Simulation," Proc. of the European Solid-State Circuits Conference, pp. 61-64, 1990.
[26] M. A. Cirit, "Estimating Dynamic Power Consumption of CMOS Circuits," IEEE International Conference on Computer Aided Design, pp. 534-537, November 1987.
[27] F. Najm, I. Hajj, and P. Yang, "An Extension of Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits," 28th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 644-649, June 1991.
[28] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of Average Switching Activity in Combinational and Sequential Circuits," 29th ACM/IEEE Design Automation Conference, Tech. Dig., pp. 253-259, June 1992.
[29] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, no. 4, pp. 446-455, December 1994.
[30] R. E. Bryant, "Graph-Based Algorithms for Boolean Function Manipulation," IEEE Transactions on Computer-Aided Design, pp. 677-691, August 1986.
[31] B. J. George, G. Yeap, M. G. Wloka, S. C. Tyler, and D. Gossain, "Power Analysis for Semi-Custom Design," IEEE Custom Integrated Circuits Conference, Tech. Dig., pp. 249-252, 1994.
[32] B. J. George, G. Yeap, M. G. Wloka, S. C. Tyler, and D. Gossain, "Power Analysis and Characterization for Semi-Custom Design," Proc. of the International Workshop on Low Power Design, pp. 215-218, April 1994.
[33] D. Liu and C. Svensson, "Power Consumption Estimation in CMOS VLSI Chips," IEEE Journal of Solid-State Circuits, vol. 29, no. 6, pp. 663-670, June 1994.
[34] H. B. Bakoglu, "Circuits, Interconnects, and Packaging for VLSI," Addison-Wesley, Reading, MA, 1990.
[35] S. R. Powell and P. M. Chau, "Estimating Power Dissipation of VLSI Signal Processing Chips: The PFA Technique," VLSI Signal Processing IV, pp. 250-259, 1990.
[36] P. E. Landman and J. M. Rabaey, "Power Estimation for High Level Synthesis," EDAC-EUROASIC, Paris, France, pp. 361-366, February 1993.
[37] P. E. Landman and J. M. Rabaey, "Black-Box Capacitance Models for Architectural Power Analysis," Proc. of the International Workshop on Low Power Design, Napa, CA, pp. 165-170, April 1994.
[38] R. Mehra and J. Rabaey, "Behavioral Level Power Estimation and Exploration," Proc. of the International Workshop on Low Power Design, Napa, CA, pp. 197-202, April 1994.