Reconfigurable Field Programmable Gate Arrays for Mission-Critical Applications
Niccolò Battezzati · Luca Sterpone · Massimo Violante
Niccolò Battezzati
Dipto. Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Luca Sterpone
Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Massimo Violante
Dipto. Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

ISBN 978-1-4419-7594-2
e-ISBN 978-1-4419-7595-9
DOI 10.1007/978-1-4419-7595-9
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010938708

© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Contents

1 Introduction
   References

Part I Basic Concepts

2 Reconfigurable Field Programmable Gate Arrays: Basic Concepts
   2.1 FPGA Architectures
   2.2 FPGA Configuration Technology
      2.2.1 Floating Gate Technology
      2.2.2 Antifuse Technology
      2.2.3 SRAM Technology
   2.3 The Logic Block
      2.3.1 Fine-Grain Logic Blocks
      2.3.2 Coarse-Grain Logic Blocks
   2.4 The Routing Architecture
      2.4.1 The Switching Elements
   2.5 The Input/Output Blocks
   2.6 The Configuration Memory
   2.7 An Overview of the Architecture of Modern FPGAs
      2.7.1 Logic Resources
      2.7.2 Interconnection Resources
      2.7.3 Memory Resources
      2.7.4 Arithmetic Resources
      2.7.5 Processing Resources
      2.7.6 Interfacing Resources
   References

3 Reconfigurable Field Programmable Gate Arrays: Failure Modes and Analysis
   3.1 The Impact of the Environment on the Device
      3.1.1 Radiation Environments
      3.1.2 Radiation Characteristics
      3.1.3 Physical Effects
      3.1.4 From the Effect to the Fault
      3.1.5 Fault Models in FPGAs
      3.1.6 SEUs and MCUs Effects on the FPGA's Routing Resources
      3.1.7 SEUs and MCUs Effects on the FPGA's Logic Resources
      3.1.8 Topological Modifications Induced by SEUs and MCUs
   3.2 Analysis Techniques
      3.2.1 Life Testing
      3.2.2 Accelerated Radiation Testing
      3.2.3 Fault Injection
      3.2.4 Analytical Techniques
   References

4 Reconfigurable Field Programmable Gate Arrays: Hardening Solutions
   4.1 Overview on the Design Process for FPGA Applications
      4.1.1 FPGA Design
      4.1.2 Application Design
   4.2 Techniques for FPGA Manufacturer
      4.2.1 Mitigation Techniques for SEL
      4.2.2 Mitigation Techniques for TID
      4.2.3 Mitigation Techniques for Single Memory Elements
      4.2.4 Mitigation Techniques for Programming Elements
      4.2.5 Mitigation Techniques for Memories
      4.2.6 Mitigation Techniques for Logic Elements
      4.2.7 Mitigation Techniques for Input/Output Elements
   4.3 Overview of Techniques for FPGA User
      4.3.1 In-Chip Mitigation Techniques
      4.3.2 Off-Chip Mitigation Techniques
   References

Part II Practical Concepts

5 Reprogrammable FPGAs for Mission-Critical Applications
   5.1 Introduction
   5.2 Radiation-Tolerant Reprogrammable FPGAs
      5.2.1 Virtex-4 QV Products
      5.2.2 Actel RT ProASIC3
   5.3 Radiation-Hardened Re-programmable FPGAs
      5.3.1 Atmel ATF280
   References

6 Putting Mitigation Techniques at Work
   6.1 Mitigation Techniques for SRAM-Based Devices: The Xilinx Virtex Case Study
      6.1.1 Single Event Upsets Consideration
      6.1.2 Multiple Cell Upsets Considerations
   6.2 Mitigation Techniques for Flash-Based Devices: The Actel ProASIC3 Case Study
      6.2.1 Single Event Transient Characterization
      6.2.2 Single Event Transient Mitigation
   References

7 System-Level Considerations
   7.1 Introduction
   7.2 The Target Radioactive Environment
   7.3 The Impact of the Target Radioactive Environment
   7.4 The Impact of the Target Application
   7.5 System-Level Considerations
   References

8 Conclusions

Subject Index
Chapter 1
Introduction
Field-programmable gate arrays (FPGAs) play an important role in a growing number of applications. Originally devised to implement simple logic functions, FPGAs are today able to implement entire systems on a single chip. The most advanced FPGA devices, such as the Xilinx Virtex-7 family [3], now offer up to 2 million logic cells, 65 Mb of embedded memory, and a number of additional features such as high-performance arithmetic functions and high-speed input/output modules.

Besides resource availability, FPGAs offer designers two additional features that cannot be found in application-specific integrated circuits (ASICs). First, by exploiting FPGAs, designers can concentrate all their efforts on application development, leaving to someone else, i.e., the FPGA manufacturer, the complex task of developing and fabricating a correctly working silicon device. When FPGAs are used, the actual silicon that will implement the application is available from the beginning of the application design. With ASICs, on the contrary, the silicon is manufactured only after application design and validation have been completed. As a result, by exploiting FPGAs, designers can significantly reduce the time to market of their applications, as application development and its silicon implementation are decoupled. Second, FPGAs are general-purpose silicon that designers can customize to implement virtually any application: the very same device can be reused for a wide range of applications. As a result, the cost of developing a new FPGA device is shared among a larger base of users than that of a new ASIC, which is generally targeted at only one specific user. The cost of each FPGA device can hence be kept much lower than that of each ASIC device.

Some FPGAs adopt technologies that make them reconfigurable: the application the device implements is defined by an on-chip memory that can be freely altered by designers.
On the contrary, ASICs are not reconfigurable: once the application has been etched in the silicon, it cannot be modified. Reconfiguration capability offers a significant competitive advantage with respect to ASICs in a number of possible scenarios:
• Bugs can be fixed easily by downloading a new, correct implementation of the application into the FPGA. This operation can be done without removing the device from the system where it is deployed. For certain types of applications, easy reconfiguration for bug-fixing allows for enormous cost savings. For example, if a bug is found in an electronic apparatus employed in a satellite already placed in orbit, the capability of reconfiguring the FPGAs it embeds can make the difference between saving the entire mission and losing it, and with it a substantial amount of money.
• Reconfiguration implies the possibility of changing the algorithm the FPGA implements. New features can thus be added to extend the set of services the application offers when the system is already deployed in the field. By enabling the system to evolve, designers can effectively cope with the obsolescence of apparatus, thus prolonging the useful lifetime of their applications. For example, a telecommunication satellite placed in orbit before a certain standard was even conceived can be updated to support it years after the satellite entered service. As new communication standards appear every few years, this scenario is likely to become quite frequent in the near future.
• Reconfiguration can become an active part of the application. The very same FPGA device can be reprogrammed to implement different functions at different instants of time. As a result, multiple functions can be implemented with a single device, thus saving space and mass. In the case of satellites, where the launch cost depends heavily on these two parameters, reconfiguration can save a significant amount of money.

N. Battezzati et al., Reconfigurable Field Programmable Gate Arrays for Mission-Critical Applications, DOI 10.1007/978-1-4419-7595-9_1, © Springer Science+Business Media, LLC 2011
Developers of mission-critical applications, like those in the space market, have already recognized the benefits stemming from FPGAs. A satellite is normally the unique exemplar of its own species, conceived and manufactured for a single customer. As a result, the cost and the time required for developing new ASICs for each new satellite are often not justified and not available. For this reason FPGAs are widely used in the space market. Due to the mission-critical nature of space-borne applications, and the need to operate in a harsh environment, current design practice is based on non-reconfigurable FPGAs (e.g., the Actel RTAX family [1]). The adoption of this kind of device brings only a few of the benefits that designers can take advantage of when reconfigurable devices are used. Today, new reconfigurable FPGAs are available that, thanks to adequate design techniques and design tools, can find their way into mission-critical applications, offering designers all the benefits stemming from their adoption.

Designing a mission-critical application aimed at a harsh environment such as space using reconfigurable FPGAs is not an easy task. A number of problems can arise, and the appropriate mitigation techniques must be understood and mastered by designers. Tools are available to support designers, but they must be understood and mastered as well. The purpose of this book is to give an in-depth overview of the problems designers have to face when approaching the design of space mission-critical applications using FPGA devices, and to describe possible solutions to cope with them. We focus only on ionizing radiation, presenting the effects it induces in FPGAs and discussing how to evaluate and mitigate them. Many aspects have been left out of the book, such as the problems related to aging of components, as well as packaging issues, and the procedures needed to guarantee an adequate quality for space use.

The book is organized into two parts. The first part, “Basic Concepts,” describes the concept of reconfigurable FPGA, its failure modes when affected by ionizing radiation, and possible mitigation techniques. In particular:
• Chapter 2 presents the concept of reconfigurable FPGAs, describing the resources that can be found in modern devices, the different technologies available for the configuration memory, as well as a general model that we will use throughout the book to present the algorithms at the core of tools for mitigating ionizing radiation effects.
• Chapter 3 discusses the impact of ionizing radiation on FPGA devices, at both the physical and the logic level. First, the physical phenomena are discussed to illustrate the interaction mechanisms between radiation and semiconductor. Then, the physical phenomena are modeled at a more abstract level, identifying the so-called fault models. Finally, the effects induced by the considered fault models when hitting the resources of reconfigurable FPGAs are discussed. The chapter ends with an overview of the techniques that can be used to assess the impact of radiation on a certain FPGA technology and of the techniques that can be used to assess the effects of the considered fault models on applications mapped on FPGA devices.
• Chapter 4 presents the solutions available today for mitigating the effects of radiation. After a review of the design flow needed for implementing an application on an FPGA device, we present the hardening solutions from two different points of view: that of the FPGA manufacturer, by describing how an FPGA device can be made robust against ionizing radiation, and that of the FPGA user, by describing how an application can be designed to tolerate the effects of radiation hitting a non-robust FPGA.
The second part of the book, entitled “Practical Concepts,” focuses on reconfigurable FPGAs specifically designed for space use: the Xilinx Virtex-4 QV device [4], the Actel RT ProASIC3 [1], and the Atmel ATF280 [2]. In particular:
• Chapter 5 illustrates the characteristics of the considered devices. For each of them, a brief description of the available resources is given, and the publicly available data about their sensitivity to ionizing radiation are reported and commented on.
• Chapter 6 discusses how the mitigation solutions presented in the previous chapters can be implemented on the considered devices. Experimental data coming from realistic benchmarks are presented to allow the reader to understand the effectiveness of different mitigation solutions.
• Chapter 7 outlines a possible approach to assess the impact of ionizing radiation on FPGA devices while taking into account the radioactive environment the application is aimed at and the peculiarities of the mission where the application has to be employed. Different solutions are discussed, outlining also the implications they have on the organization of the whole system.
• Chapter 8 draws some concluding remarks.
References
1. Actel Corporation, Radiation-Tolerant ProASIC3 Low-Power Space-Flight Flash FPGAs, 2nd ed., November 2009.
2. Atmel, http://atmel.com/products/fpga/, 2010.
3. Xilinx, 7 Series FPGAs, Tech. report, Xilinx, June 2010.
4. Xilinx, Space-Grade Virtex-4QV Family Overview, DS653 (v2.0) ed., April 2010.
Part I
Basic Concepts
Chapter 2
Reconfigurable Field Programmable Gate Arrays: Basic Concepts
2.1 FPGA Architectures
The first FPGA models were introduced during the 1980s. These first devices were comparable to the earlier, costly programmable logic devices (PLDs), but were able to implement a significantly larger amount of logic. Two initial categories of devices were developed: antifuse FPGAs, whose electrically programmable configuration memory can be programmed only once, and SRAM-based FPGAs, whose configuration memory consists of SRAM cells that can be reconfigured. Although antifuse devices were initially preferred for the greater stability of their configuration memory, by the end of the 1980s most of the early dependability problems had been solved, and SRAM-based technology started growing thanks to the volatility of the configuration memory, which enables a wide range of applications. An FPGA based on an SRAM configuration memory can be configured in a very short time by any processor, unlike an antifuse FPGA, which can be programmed only once.

The FPGA architecture consists of a generic matrix of blocks interconnected by programmable interconnections. The capability of implementing any combinational or sequential function depends on the capabilities of the logic block. The elementary logic block is generally called configurable logic block, and it has the architecture illustrated in Fig. 2.1. The internal components of a configurable logic block may vary among manufacturers. In most cases the configurable logic block contains a main logic circuit called look-up table (LUT); an example of a LUT is given in Fig. 2.2. It consists of a static RAM (SRAM) of dimension $2^m \times 1$, which represents the truth table of a logic function with m inputs. The input lines of the SRAM correspond to the inputs of the truth table, while the output of the SRAM provides the value of the logic function.

The LUT provides high functionality, since it can realize any function of m inputs out of a set of $2^{2^m}$ possible functions. The limit is given by the number of configuration memory cells required for a LUT with k inputs, which is equal to $2^k$ [7, 8].
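The LUT description above can be made concrete with a minimal Python sketch. It models a k-input LUT as a $2^k \times 1$ memory addressed by its inputs. The class name `LUT`, the helper `count_functions`, and the choice of the first input as the most significant address bit are assumptions made for illustration; they do not come from any vendor tool.

```python
from itertools import product


class LUT:
    """Hypothetical model of a k-input look-up table as a 2^k x 1 memory."""

    def __init__(self, k, bits):
        if len(bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.bits = list(bits)

    @classmethod
    def from_function(cls, k, func):
        # Fill the memory by evaluating func on every input combination,
        # in address order (first input = most significant address bit).
        bits = [int(bool(func(*inputs))) for inputs in product((0, 1), repeat=k)]
        return cls(k, bits)

    def read(self, *inputs):
        # The inputs form the memory address; the stored bit is the output.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.bits[address]


def count_functions(k):
    # Each of the 2^k cells can hold 0 or 1, hence 2^(2^k) distinct functions.
    return 2 ** (2 ** k)


# A three-input LUT configured as f = ab + c.
lut = LUT.from_function(3, lambda a, b, c: (a & b) | c)
```

With this configuration the memory holds eight bits, and changing any of them yields a different three-input function without altering the surrounding circuit.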
N. Battezzati et al., Reconfigurable Field Programmable Gate Arrays for Mission-Critical Applications, DOI 10.1007/978-1-4419-7595-9_2, C Springer Science+Business Media, LLC 2011
Fig. 2.1 The main element of the FPGA architecture: the configurable logic block (CLB)
Fig. 2.2 An example of a three-input LUT
The architectures of existing FPGAs mainly differ in three aspects:
• Type of programming technology used
• Structure of the logic block
• Structure of the interconnection network
Although the manufacturing differs, it is possible to create a generalized model of the internal structure; this model consists of four representation levels:
1. Tile
2. Local routing
3. Multiple tile
4. Context
The first representation level of the model, shown in Fig. 2.3, is the tile. The tile contains the CLB elements and the interconnections that guarantee their internal connectivity; these interconnections are used exclusively to link the CLBs within a single tile.
Fig. 2.3 First representation level of the model: the tile
The second representation level, shown in Fig. 2.4, is the local routing among tiles, based on the routing element called switch block. The interconnections between different tiles are realized by wiring segments embedded in the switching architecture; these wiring segments can be connected through the switch block. The interconnections at this level of the model are classified as local; therefore, they characterize the local routing. The switch block shown in Fig. 2.5 is a component consisting of programmable interconnections able to connect the logic resources of the tile.

The third representation level is the multiple tile: a macro element consisting of a set of tiles with their respective switch blocks. This representation accounts for the overlay logic implemented by the FPGA architecture. The third level is illustrated in Fig. 2.6. In detail, the connections V0 and H0 are the wiring segments connecting a generic tile, while V1 and H1 are the segments creating the connectivity among tiles. The connectivity channel between the tiles consists of long, low-resistance wiring segments that signals can traverse in either direction, depending on the direction assigned to the signal by the switch blocks connected at their end points. Depending on the traversing direction, several degrees of connectivity may exist. The degree of connectivity is determined by the flexibility of the switch blocks and by the resolution of the routing architecture; it is an important factor for the efficiency of the FPGA architecture.
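As a rough illustration of how switch blocks create connectivity between wiring segments, the following Python sketch treats each switch block as a set of programmable switches joining pairs of segments, and checks whether two points are connected once some switches have been closed. The class `SwitchBlock`, the function `is_routed`, and the segment names `tile0.out` and `tile1.in` are hypothetical; only the segment labels H1 and V1 echo the model above. Real switch blocks have richer topologies than this simplified model.

```python
from collections import deque


class SwitchBlock:
    """Hypothetical switch block: a set of programmable segment-to-segment switches."""

    def __init__(self, possible_switches):
        # Each switch can join one pair of wiring segments; all start open.
        self.state = {frozenset(pair): False for pair in possible_switches}

    def program(self, seg_a, seg_b, closed=True):
        key = frozenset((seg_a, seg_b))
        if key not in self.state:
            raise KeyError(f"no programmable switch between {seg_a} and {seg_b}")
        self.state[key] = closed

    def closed_pairs(self):
        return [tuple(sorted(pair)) for pair, closed in self.state.items() if closed]


def is_routed(switch_blocks, source, sink):
    # Segments can be traversed in both directions, so build an undirected graph
    # from the closed switches and search it breadth-first.
    adjacency = {}
    for block in switch_blocks:
        for a, b in block.closed_pairs():
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, frontier = {source}, deque([source])
    while frontier:
        segment = frontier.popleft()
        if segment == sink:
            return True
        for nxt in adjacency.get(segment, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Two switch blocks at the ends of the inter-tile segments H1 and V1.
sb_left = SwitchBlock([("tile0.out", "H1"), ("tile0.out", "V1")])
sb_right = SwitchBlock([("H1", "tile1.in"), ("V1", "tile1.in")])

sb_left.program("tile0.out", "H1")
routed_before = is_routed([sb_left, sb_right], "tile0.out", "tile1.in")
sb_right.program("H1", "tile1.in")
routed_after = is_routed([sb_left, sb_right], "tile0.out", "tile1.in")
```

The signal reaches the second tile only after both switch blocks at the ends of H1 have been programmed, which is why the flexibility of the switch blocks bounds the achievable degree of connectivity.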
Fig. 2.4 Second representation level of the model: the local routing
Fig. 2.5 The switch-block element
Fig. 2.6 The third representation level: the multiple tile
Fig. 2.7 The fourth representation level: the context
The fourth level completes the representation of the FPGA context architecture by introducing the input/output blocks dedicated to communication with the world external to the FPGA. The model is illustrated in Fig. 2.7, where the input/output modules are connected to the principal FPGA interconnections, indicated in Fig. 2.7 as V2 and H2. The interconnections of the third and fourth levels are classified as global, and they characterize the global routing of the FPGA architecture.
2.2 FPGA Configuration Technology
An FPGA architecture is built from electrically programmable switches, whose size, capacitance, and resistance characterize the various models. In this section, we describe the principal manufacturing technologies, focusing on the volatility, reprogrammability, and complexity of the manufacturing process.
2.2.1 Floating Gate Technology
The floating gate technology is based either on erasable programmable read-only memory (EPROM), whose memory cells can be erased using ultraviolet light, or on electrically erasable programmable read-only memory (EEPROM), whose cells are erased electrically. In particular, this approach is used in the devices manufactured by Actel. The programmable switch illustrated in Fig. 2.8 consists of a floating-gate avalanche-injection MOS (FAMOS) transistor that can be permanently disabled by injecting a charge into the floating gate. By applying a high voltage difference between the gate and the drain of the FAMOS transistor, it is possible to obtain a transition of high-energy electrons from the transistor junction to the isolated floating gate. At the end of the transition, the charge remains indefinitely in the floating gate and the transistor remains permanently polarized, since there is no electrical connection to the floating gate. The FAMOS transistor, if not programmed, is used to pull the bit line low when the word line is high. This approach can also be used to make connections between the word line and the bit line; for example, it is used in particular to implement wired logic functions such as the Wired-AND. A Wired-AND consists of two or more wires connected to the supply voltage source through a single resistor. The output corresponds to the logical product of the individual signals, the same function obtained with an AND gate. For this reason, the FAMOS transistor can be used for both the logic and the routing resources of an FPGA architecture. The main advantage of the EPROM technology is its non-volatility. Unlike SRAM technology, it requires no permanent memory outside the FPGA chip to store the programming data.
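The behavior of the bit line described above can be sketched in a few lines of Python. The sketch assumes, for illustration only, that each input is routed to the word lines in both true and complemented form, as is common in programmable AND planes; that arrangement, and the names `bit_line` and `product_term_ab`, are assumptions rather than details taken from the text.

```python
from itertools import product


def bit_line(word_lines, programmed):
    """Level of a bit line held high by a pull-up resistor.

    An unprogrammed FAMOS transistor pulls the line low when its word line is
    high; a programmed (permanently disabled) transistor never does.
    """
    for level, disabled in zip(word_lines, programmed):
        if level == 1 and not disabled:
            return 0  # one active pull-down is enough to force the line low
    return 1  # no transistor conducts, so the pull-up keeps the line high


def product_term_ab(a, b):
    # Assumed word-line order: a, NOT a, b, NOT b.
    word_lines = [a, 1 - a, b, 1 - b]
    # Disable the transistors on the true lines and keep those on the
    # complemented lines active: the line is low whenever a = 0 or b = 0.
    programmed = [True, False, True, False]
    return bit_line(word_lines, programmed)


# The wired function matches an AND gate on every input combination.
matches_and_gate = all(product_term_ab(a, b) == (a & b)
                       for a, b in product((0, 1), repeat=2))
```

Choosing which transistors to program therefore selects which product term the shared line computes, which is how the same cell type can serve logic as well as routing.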
However, the EPROM technology requires three additional manufacturing processes in order to integrate the high-value pull-up resistors. Another drawback is the high static power consumption, due to the pull-up resistor of each EPROM cell transistor [11].

Fig. 2.8 The floating gate programming technology
2.2.2 Antifuse Technology
An antifuse device consists of two terminals separated by a high resistance in the unprogrammed state. If a high voltage (i.e., a voltage between 11 and 20 V) is applied between the two terminals, the resistance is melted, creating a permanent connection between the two terminals. Programming an antifuse requires external circuitry providing high voltages and currents. The main advantage of the antifuse technology is the small cell size. However, this advantage is generally reduced by the size of the transistors needed to program the device, since they must handle high voltage levels.
2.2.3 SRAM Technology
The main characteristic of a static RAM (SRAM)-based FPGA is a configuration memory built with SRAM cells that control both the multiplexers and the pass transistors used for the connections, as illustrated in Fig. 2.9. The functional principle is based on the logic value stored by the SRAM cell: when a logic “1” is stored, the pass transistor creates a connection between two hardwired segments; conversely, when a logic “0” is stored in the SRAM cell, the pass transistor is open and presents a high resistance between the two hardwired segments. The state of the SRAM cell connected to the select line of the multiplexer illustrated in Fig. 2.9 controls which input of the multiplexer is connected to the output [14]. SRAM is volatile; therefore, FPGA architectures using SRAM cells must be configured at each power-on. This is a relevant constraint, since an SRAM-based FPGA system must adopt an external permanent memory, such as a programmable ROM (PROM), or a microprocessor-based system able to program the FPGA’s configuration memory. However, the main advantage of this technology is that the device can be programmed an unlimited number of times, and in a short time.

Fig. 2.9 The Static RAM configuration technology
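The role of the SRAM cells can be illustrated with a small Python sketch. It models a pass transistor and a multiplexer driven by configuration bits, together with a volatile configuration memory that must be reloaded after every power-on. All names (`pass_transistor`, `multiplexer`, `ConfigurationMemory`, `external_prom`) and the three-bit bitstream layout are hypothetical, and `None` is used here as a stand-in for an undriven, high-impedance segment.

```python
def pass_transistor(sram_bit, value):
    # A stored 1 closes the switch and the value propagates;
    # a stored 0 leaves the segment undriven (modelled as None).
    return value if sram_bit == 1 else None


def multiplexer(select_bits, inputs):
    # The SRAM cells on the select lines choose which input reaches the output.
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("need 2**n inputs for n select bits")
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]


class ConfigurationMemory:
    """Volatile configuration memory: its content is lost at power-off."""

    def __init__(self):
        self.bits = None

    def load(self, bitstream):
        self.bits = list(bitstream)

    def power_cycle(self):
        self.bits = None

    @property
    def configured(self):
        return self.bits is not None


# Assumed layout: bit 0 drives one pass transistor, bits 1-2 drive a 4:1 mux.
external_prom = [1, 0, 1]
memory = ConfigurationMemory()
memory.load(external_prom)

routed = pass_transistor(memory.bits[0], 1)
selected = multiplexer(memory.bits[1:], ["in0", "in1", "in2", "in3"])

memory.power_cycle()
needs_reload = not memory.configured
```

After the power cycle the memory is empty again, which is the reason an SRAM-based system keeps a copy of the bitstream in a non-volatile device.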
2.3 The Logic Block
The logic blocks of FPGA architectures, also known as configurable logic blocks, differ mainly in their size and in the functions they are able to implement. The differences among logic blocks, partly described in the previous sections, can be classified according to their granularity. The granularity can be defined as the number of Boolean functions implemented by the logic block, or as its total number of transistors. In several FPGA architectures it is difficult to distinguish between the interconnections and the logic blocks, given their connectivity; for simplicity we classify FPGA models into two categories: logic blocks with a fine granularity, also called fine grained, and logic blocks with a coarse granularity, or coarse grained.
2.3.1 Fine-Grain Logic Blocks
Fine-grain logic blocks consist of a few interconnected elements. An example of a fine-grain logic block is the one manufactured by Plessey [13], illustrated in Fig. 2.10. The logic is formed by connecting the NAND gate to the multiplexer in order to create the desired logic function (combinational or sequential). The configuration memory of the block consists of four SRAM bits: three of them configure the multiplexer, while the last bit controls the latch. For example, when the logic block is configured to implement the logic function $f = ab + c$, the latch is not necessary; hence, the corresponding configuration bit is programmed to a logic value that leaves the latch unused. The main advantage of the fine-grain logic block is the full usability of all its components. However, although it is easy to use a few logic gates efficiently, managing the resulting high number of routing segments and switching elements is a serious disadvantage: these components introduce delays and cause a large increase in device size.

Fig. 2.10 Fine-grain logic block manufactured by Plessey
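To give a feeling for the cost of fine granularity, the Python sketch below builds $f = ab + c$ out of two-input NAND gates only and checks it exhaustively. The three-gate decomposition is a textbook identity chosen for illustration; it is an assumption and not the actual mapping onto the Plessey block, whose internal wiring is not detailed here.

```python
from itertools import product


def nand(x, y):
    return 1 - (x & y)


def f_from_nands(a, b, c):
    # f = ab + c = NAND(NAND(a, b), NAND(c, c)), by De Morgan's law.
    not_ab = nand(a, b)   # first gate: NOT(ab)
    not_c = nand(c, c)    # second gate: an inverter built from a NAND
    return nand(not_ab, not_c)  # third gate: NOT(NOT(ab) AND NOT(c))


# Exhaustive check against the target function.
nand_network_is_correct = all(
    f_from_nands(a, b, c) == ((a & b) | c)
    for a, b, c in product((0, 1), repeat=3)
)

gates_used = 3           # three NAND-level elements ...
internal_nets = 2        # ... linked by two intermediate nets that need routing
```

Even this small function needs several gate-level elements and intermediate nets, each of which would occupy routing segments and switches when mapped onto fine-grain blocks.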
2.3.2 Coarse-Grain Logic Blocks

Coarse-grain logic blocks consist of a higher number of components than fine-grain logic blocks, principally including multiplexers, NAND gates, and LUTs. Xilinx is one of the top manufacturers of FPGA architectures embedding coarse-grain logic blocks. Figure 2.11 illustrates the configurable logic block (CLB) of a generic Xilinx device. The basic components
Fig. 2.11 Coarse-grain logic block manufactured by Xilinx (figure labels: logic variables A to E, data in DI, enable clock EC, clock K, direct reset RD, global reset; combinatorial function outputs F and G; flip-flop outputs QX and QY; CLB outputs X and Y)
contained in the CLB are the generator of combinational logic functions and two D-type flip-flops, whose outputs can be connected directly to the inputs of the generator of combinational logic functions through the routing internal to the CLB. The generator of combinational logic functions consists of two four-input look-up tables (LUTs) that can be used separately or combined into a single logic function. The partition of the inputs can be made automatically by a partitioning tool, generally during logic synthesis, or manually when particular constraints must be met. All the sequential components share a common clock signal; moreover, the flip-flops, which cannot be used as latches, share a clock-enable signal and an asynchronous reset. An asynchronous preset can be obtained from the asynchronous reset if the data are stored in active-low form. The principal advantage of the coarse-grain logic block is the possibility of implementing complex logic functions with few elements, reducing the need for a high number of programmable interconnections. However, it is difficult to use the available resources with high efficiency, even though the resulting decrease in functional density is not a prohibitive factor.
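As an illustration of how two four-input LUTs can be used separately or combined into a single function, consider the following minimal sketch. The class and function names are hypothetical, and the combination through a multiplexer driven by a fifth input (a Shannon expansion) is an assumption about one possible realisation, not a description of the circuit in Fig. 2.11.

```python
from itertools import product


class LUT4:
    """A 4-input look-up table: 16 configuration bits hold the truth table."""

    def __init__(self, func):
        # Entry i stores the function value for the input pattern whose
        # binary encoding is i (a is the most significant bit).
        self.bits = [func(*inputs) & 1 for inputs in product((0, 1), repeat=4)]

    def read(self, a, b, c, d):
        # The inputs form the address of the stored bit.
        return self.bits[(a << 3) | (b << 2) | (c << 1) | d]


def make_5input(func5):
    """Combine two LUT4s into one 5-input function (assumed mux on input e)."""
    lut_f = LUT4(lambda a, b, c, d: func5(a, b, c, d, 0))  # cofactor for e = 0
    lut_g = LUT4(lambda a, b, c, d: func5(a, b, c, d, 1))  # cofactor for e = 1

    def read(a, b, c, d, e):
        return lut_g.read(a, b, c, d) if e else lut_f.read(a, b, c, d)

    return read


# Separate use: one LUT implements f = ab + c and ignores its fourth input.
lut_abc = LUT4(lambda a, b, c, d: (a & b) | c)

# Combined use: a 5-input parity function spread over both LUTs.
parity5 = make_5input(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
```

Changing the implemented function only requires rewriting the stored bits, which is what loading a new configuration does.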
2.4 The Routing Architecture

The routing architecture of an FPGA device defines how the programmable switches and the wiring segments that make up the interconnections are placed in order to realize the complete logic functionality of the FPGA device [12]. To properly describe the FPGA routing architecture we introduce the following definitions:

Wire segment: It is a programmable connection through a switch. One or more switches can be connected to the same wire segment. Each access point of a wire segment has at least one switch connected to it.

Track: It is a sequence of one or more wire segments that creates a connection between two points of the routing topological structure.

Routing channel: It is a group of parallel tracks.

The model of the routing architecture, as illustrated in Fig. 2.12, has two basic structures: the connection block and the switch block. Each connection block allows
Fig. 2.12 The model of the routing architecture
the connection between the inputs and the outputs of a logic block and the wire segments of a routing channel, while a switch block allows the connection between horizontal and vertical wire segments, adopting a connectivity model on its four sides. The connection blocks typically connect only a part of the tracks passing through a block. The connections on the four sides of the logic blocks are implemented by means of pass transistors for the output pins and multiplexers for the input pins, thus reducing the number of SRAM cells needed for each pin and therefore the overall size of the FPGA device. While the logic block pins are connected through the connection block, the switch block realizes the connection between segments of intersecting horizontal and vertical channels. There are four types of wire segments in an FPGA architecture:

1. General-purpose interconnections: They consist of wire segments passing through the switches inside the switch block.
2. Direct interconnections: They consist of the wiring segments connecting the outputs of each logic block directly to the neighboring logic blocks.
3. Long line interconnections: They consist of wire segments connecting all the switch blocks of the FPGA architecture.
4. Clock line interconnections: They consist of dedicated wiring segments that connect all the sequential elements of the FPGA architecture. The clock lines are divided into groups depending on the clock domains they belong to.
2.4.1 The Switching Elements

Each switching element of an FPGA architecture consists of a variable number of contacts called programmable interconnect points (PIPs). Considering a switch block, the set of PIPs defines the connection of each wiring segment. Depending on the technology of the FPGA configuration memory, PIPs can be implemented in different ways:

• The floating gate technology implements the PIP with a FAMOS transistor and a pass transistor.
• The antifuse technology implements a permanent PIP with the antifuse.
• The SRAM technology uses five transistors that form the SRAM cell and a pass transistor that creates the contact: the wiring segments controlled by the pass transistor are connected or not, depending on the logic value stored in the configuration memory cell.

In the FPGA architecture, the network of programmable interconnections consists of a set of wire segments that can be interconnected by means of PIPs, where each PIP is controlled by one or more configuration memory cells. There are four kinds of PIPs:

• Cross-point PIP
• Break point PIP
• Multiplexer PIP
• Compound PIP

A cross-point PIP connects wire segments located in different planes: a horizontal segment to a vertical one and vice versa or, considering the FPGA model introduced, a segment of the third level to a segment of the fourth level, or a segment of the second level to one of the third level and vice versa; see Fig. 2.13.
Fig. 2.13 Location of the cross-point PIP
A break point PIP connects two segments of the same plane: two horizontal segments or two vertical segments. A break point PIP can be located between the third and the fourth levels of the FPGA representation model, as illustrated in Fig. 2.14. A multiplexer PIP (MUX PIP) can be of two different types: decoded MUX PIP and non-decoded MUX PIP. A decoded MUX PIP is a set of $2^k$ cross-point PIPs connected to a common output wiring segment. Each decoded MUX PIP is controlled by k bits stored in the configuration memory cells; in particular, the input wiring segment addressed by the configuration memory bits is connected to the output segment, while the decoding logic is embedded between the configuration memory bits and the PIP's pass transistors. A non-decoded MUX PIP has a single configuration memory cell for each pass transistor; in this way, k wiring segments are controlled by k configuration memory bits. As previously underlined, the wiring segments of the FPGA interconnection network include both local and global routing resources. The global routing resources are dedicated to the connection between the CLBs and the input/output blocks, while the local routing, specific to a single CLB, is used to connect a logic block to the global routing resources or to the adjacent logic blocks. The PIPs used for global routing resources are the cross-point PIPs and the break point PIPs, while the PIPs of the local routing resources are of the multiplexer PIP type, as illustrated in Fig. 2.15.
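The difference between the two multiplexer PIP types can be sketched as follows. This is a behavioural model only: the function names are hypothetical, wire segments are represented by plain labels, and the bit ordering of the address is an assumption.

```python
def decoded_mux_pip(inputs, config_bits):
    """k configuration bits address one of 2**k input segments."""
    k = len(config_bits)
    if len(inputs) != 2 ** k:
        raise ValueError("a decoded MUX PIP with k bits needs 2**k inputs")
    address = 0
    for bit in config_bits:          # first bit taken as the MSB (assumption)
        address = (address << 1) | (bit & 1)
    return inputs[address]           # exactly one input reaches the output


def non_decoded_mux_pip(inputs, config_bits):
    """One configuration bit per pass transistor: k bits for k inputs."""
    if len(inputs) != len(config_bits):
        raise ValueError("one configuration bit is needed per input segment")
    # Every input whose pass transistor is enabled drives the output segment.
    return [segment for segment, bit in zip(inputs, config_bits) if bit]
```

For example, `decoded_mux_pip(["w0", "w1", "w2", "w3"], [1, 0])` selects `"w2"` with two bits, whereas the non-decoded form needs four bits for the same four inputs and, as the returned list shows, does not by itself prevent two inputs from being enabled at once.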
Fig. 2.14 Location of the break-point PIP
Fig. 2.15 Location of the multiplexer and compound PIPs
Finally, the fourth PIP category consists of the compound PIPs. A compound PIP is a combination of four cross-point PIPs and two break point PIPs, each one controlled by a particular group of bits in the configuration memory. The most recent FPGA architectures are built with buffered non-decoded MUX PIPs. This
structure is adopted in order to prevent the signal degradation due to the resistance of each pass transistor.
2.5 The Input/Output Blocks

The input/output blocks implement the interface between the pins of the package and the structure of interconnections and logic blocks of the FPGA matrix. This interface provides the additional routing resources, located at the boundary of the device, that connect the FPGA architecture to the external world and to external signals. Communication with outside signals is made through three-state buffers whose direction and polarity are determined by specific configuration memory bits. Each driver can be configured to have a high or low slew rate and to manage different activation levels, depending on the voltage or current standard adopted. Besides, the drivers support different kinds of communication standards, which allow the architecture to be interfaced with electronic systems of different nature. The pads, which are connected to the drivers, are protected against electrostatic discharge (ESD) and over-voltage transients. Within the input/output blocks there are several flip-flops, as illustrated in Fig. 2.16, that can be configured as D-type edge-triggered flip-flops or level-sensitive latches.
Fig. 2.16 An input/output block (figure labels: program-controlled memory cells for out invert, 3-state invert, output select, slew rate, and passive pull-up; signals T (3-state, output enable), O (out), I (direct in), Q (registered in), OK, and IK; output flip-flop, input flip-flop or latch, output buffer, TTL or CMOS input threshold, I/O pad, global reset)
2.6 The Configuration Memory

The configuration memory contains all the information needed to program the circuit on the FPGA architecture. This information is also called the bitstream. As illustrated in the previous sections, each resource within the FPGA architecture is
Fig. 2.17 The general structure of the bitstream (figure labels: frame, IO block bits, clock bits, PIP bits, LUT bits, CLB bits, BRAM bits)
controlled by specific configuration memory bits. The bitstream is generally organized in frames, and each frame corresponds to a set of resources located in a contiguous area of the FPGA architecture, as illustrated in Fig. 2.17. These frames have a regular structure that depends on the kind of resources involved. A finite state machine, dedicated to transferring each frame to the correct area of the FPGA's configuration memory, controls the bitstream loading. The configuration flow, by which the configuration memory is loaded with the bitstream data, basically consists of four phases:

1. Power up: The power supply is applied to the FPGA device.
2. Device initialization: The configuration memory is initialized by the configuration state machine dedicated to the bitstream loading.
3. Configuration load: In this phase, the device receives the configuration bit values, which are progressively stored into the configuration memory by the dedicated configuration state machine. All the configuration events happen on the positive edge of the clock signal.
4. Start up: When all the configuration memory bits have been received, the configuration state machine brings the device to the start-up state: the circuit is actually implemented in the FPGA and is ready to perform its operations.
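The four phases can be summarised by a small behavioural model. This is only a sketch under stated assumptions: the class and method names are hypothetical, frames are filled strictly in order, and every call to `clock_in` stands for one positive clock edge; real bitstreams also carry headers and commands that are not modelled here.

```python
from enum import Enum, auto


class Phase(Enum):
    POWER_UP = auto()
    DEVICE_INITIALIZATION = auto()
    CONFIGURATION_LOAD = auto()
    START_UP = auto()


class ConfigurationController:
    """Toy model of the state machine that loads the bitstream (hypothetical)."""

    def __init__(self, num_frames: int, frame_bits: int):
        self.num_frames = num_frames
        self.frame_bits = frame_bits
        self.phase = Phase.POWER_UP          # phase 1: supply applied
        self.memory = None
        self._received = 0

    def initialize(self) -> None:
        # Phase 2: clear the configuration memory before loading.
        if self.phase is not Phase.POWER_UP:
            raise RuntimeError("initialization must follow power up")
        self.memory = [[0] * self.frame_bits for _ in range(self.num_frames)]
        self.phase = Phase.DEVICE_INITIALIZATION

    def clock_in(self, bit: int) -> None:
        # Phase 3: one bit is stored per call (one positive clock edge).
        if self.phase is Phase.DEVICE_INITIALIZATION:
            self.phase = Phase.CONFIGURATION_LOAD
        if self.phase is not Phase.CONFIGURATION_LOAD:
            raise RuntimeError("device is not ready to receive configuration bits")
        frame, offset = divmod(self._received, self.frame_bits)
        self.memory[frame][offset] = bit & 1
        self._received += 1
        # Phase 4: after the last bit the device enters the start-up state.
        if self._received == self.num_frames * self.frame_bits:
            self.phase = Phase.START_UP
```

A controller created with `ConfigurationController(2, 4)` reaches `Phase.START_UP` after `initialize()` and eight calls to `clock_in`, and rejects further bits afterwards.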
2.7 An Overview of the Architecture of Modern FPGAs

Over the years, FPGAs underwent a substantial evolution that resulted in very complex devices embedding many different resources. The architecture of a typical modern FPGA is depicted in Fig. 2.18, where we can recognize logic and interconnection resources that are used to implement combinational and sequential
Fig. 2.18 Architecture of a modern FPGA
functions and to deliver signals among functions; memory resources that are used to implement random access memories, which the circuit implemented by the FPGA may use for data storage; arithmetic resources, like digital signal processing (DSP) blocks, that are used for improving the performance of some computations (for example, multiply and accumulate operations); processing resources (e.g., processor hard cores like the PowerPC found in Xilinx Virtex-II Pro, Virtex-4, and Virtex-5 devices [20]) that are used for building processor-based systems within a single FPGA; and interface resources that are used for delivering data to and from the FPGA using a variety of protocols, spanning from a simple buffered connection between the outside and the inside of the FPGA to complex high-speed serial links (like the Xilinx RocketIO [16]). The aim of this section is to outline all these kinds of resources.
2.7.1 Logic Resources

FPGAs were originally intended as a means for implementing digital circuits and, therefore, they are equipped with resources that allow implementing both combinational and sequential functions; moreover, the resources are programmable in such a way that designers are free to implement any logic function. To achieve such a goal, different solutions have been pursued by different manufacturers.

2.7.1.1 The Xilinx Configurable Logic Block

The Xilinx configurable logic block (CLB) [19] is the key element for implementing sequential and combinational logic in Xilinx devices. It encompasses a number of slices (each slice includes the configurable resources for implementing boolean functions, flip-flops, and carry-propagation logic) and some interconnection resources. As an example, we report in Fig. 2.19 the architecture of the CLB found in Virtex-4 devices, where we can recognize four slices, grouped two by two. Each group is connected to interconnection resources dedicated to fast propagation of carry signals. Moreover, each slice is connected to a switch matrix to route signals toward the rest of the FPGA and to local interconnection resources to route signals toward neighboring CLBs. Each slice contains several elements:

• At least one n-input memory that can store $2^n$ bits. This memory can be configured in several operational modes:
– Look-up table (LUT) mode: It is used to implement single-output n-input combinational functions by storing the truth table of the function in the LUT.
– RAM mode: It is used to implement $2^n \times 1$ random access memories.
– ROM mode: It is used to implement $2^n \times 1$ read-only memories.
– Shift-register mode: It is used to implement one $2^n$-bit shift register.
Fig. 2.19 The Xilinx Virtex-4 CLB
Optionally, the n-input memory can also be configured to operate as a single-port or dual-port memory.
• Carry-propagation logic: This is used to propagate the carry signals coming from arithmetic operations, using dedicated high-speed lines.
• At least one flip-flop: This is used for implementing state register bits for finite state machines or for data storage. Several parameters can be configured to define the operation of the flip-flop (the operational mode, latch or flip-flop; the reset value; the polarity of the reset signal).
• Glue logic: This is used to route signals inside the slice and to compute the complement of control signals (thus making it possible to implement positive-edge and negative-edge triggered behaviors).

As an example, the Xilinx Virtex-4 family of devices exploits CLBs with four slices, each offering two LUTs and two flip-flops. The CLB can thus be used for implementing eight flip-flops and up to eight 4-input combinational functions in case all the slices are used in LUT mode, or for storing up to 64 bits in case all the slices are used in RAM/ROM mode, or for working as one 64-bit shift register in case all the slices are used in shift-register mode. CLBs are arranged in an array, thus replicating the same structure in a regular fashion. The size of the array defines the resource availability of the device. For
example, the Xilinx Virtex-4 XC4VLX device is available in sizes ranging from 64 × 24 CLBs (offering 12,288 LUTs and flip-flops) up to 192 × 116 CLBs (offering 178,176 LUTs and flip-flops).

2.7.1.2 The Actel VersaTile

The VersaTile is the Actel [1] solution for implementing a configurable element able to work as a combinational function or as a sequential element. The VersaTile is available in the IGLOO and ProASIC3 devices and, as depicted in Fig. 2.20, is composed of 4 inputs, 2 outputs, four 2-input multiplexers, 11 gates (1 NOR and 10 NOT), a number of programmable switches, and hardwired interconnections among these elements.
Fig. 2.20 The Actel VersaTile (figure labels: Data X3, CLK X2, CLR/Enable X1, CLR XC*)
By setting the configuration (open or closed) of each programmable switch, the designer can use the VersaTile to implement

• any 3-input combinational function;
• one latch with clear or set signal; and
• one D flip-flop with clear or set signal.

VersaTiles are replicated in a regular fashion, so that each FPGA carries an array of VersaTiles, whose total number may range from a few hundred (e.g., 384 in the case of the Actel A3P15 device) up to a few tens of thousands (e.g., 75,264 in the case of the Actel A3PE3000).
2.7.1.3 Comments

When comparing the possible architectures for logic resources, two examples of which have been presented in the previous sections, we can see that there is no prevailing solution, either among different manufacturers (e.g., Xilinx versus Actel) or within the same manufacturer (e.g., Xilinx Virtex-4 versus Xilinx Virtex-6). The solution based on LUTs and discrete flip-flops, as in the Xilinx case, is probably more efficient from the performance point of view, as each of these elements can be optimized for one specific function. However, having a huge number of heterogeneous components increases the complexity of the overall design. On the contrary, the solution based on a single, general-purpose, and highly customizable element, as in the Actel case, contributes to simplifying the overall architecture, but may have drawbacks on the efficiency of the resulting device, as compromises must be accepted.
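As a cross-check of the Virtex-4 figures quoted in Sect. 2.7.1.1, the LUT and flip-flop counts of a device follow directly from the size of its CLB array and the per-CLB counts given there (four slices per CLB, two LUTs and two flip-flops per slice). The short sketch below, with hypothetical names, reproduces that arithmetic.

```python
SLICES_PER_CLB = 4   # Virtex-4 CLB, as stated in Sect. 2.7.1.1
LUTS_PER_SLICE = 2
FFS_PER_SLICE = 2


def clb_array_resources(rows: int, cols: int):
    """Return (LUT count, flip-flop count) for a rows x cols CLB array."""
    clbs = rows * cols
    luts = clbs * SLICES_PER_CLB * LUTS_PER_SLICE
    ffs = clbs * SLICES_PER_CLB * FFS_PER_SLICE
    return luts, ffs


smallest = clb_array_resources(64, 24)    # (12288, 12288)
largest = clb_array_resources(192, 116)   # (178176, 178176)
```

Both results match the values reported for the smallest and largest XC4VLX arrays.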
2.7.2 Interconnection Resources

Interconnection resources are the infrastructure that FPGAs offer to exchange information between logic, memory, arithmetic, processing, and interfacing resources [15]. They can be seen as programmable “wires” that in principle can connect any resources together, thus making possible the implementation of any circuit topology. Figure 2.21 depicts a conceptual model for the interconnection resources, where we can find two types of elements:

• Wires, which implement one-to-one hardwired connections between resources. In Fig. 2.21 we can see a number of wires that connect logic resources to routers, as well as pairs of routers. Wires are not programmable, as they implement a fixed connection between resources, which was assigned to them by the designers of the FPGA.
• Routers, which implement programmable many-to-many connections between resources. In general, each signal connected to a router can communicate via a programmable switch (i.e., a pass transistor) with any other signal of the router. Different configurations are possible, and Fig. 2.21 shows a possible example, where a signal on one side of the router (e.g., signal A) is connected to all the signals lying on the other sides of the router.

As a matter of fact, the number of resources that have to be interconnected grows with each new generation of FPGAs. Moreover, when implementing very complex designs on large FPGAs, it is very likely that data have to be exchanged among resources that are physically placed in faraway locations. As a result, a hierarchical interconnection architecture is needed. Different solutions have been developed by FPGA manufacturers. Among them, the solutions implemented by Xilinx, Altera, and Actel are the most interesting.
Fig. 2.21 Conceptual model for the interconnection resources
2.7.2.1 The Xilinx Interconnection Resources

The Xilinx interconnection resources [15, 17] offer a hierarchical structure to designers, comprising the following:

• Routers, which in the Xilinx terminology are known as switch matrices. Each switch matrix offers a given number of hardwired interconnections to other switch matrices. In particular, assuming a regular array of N × M CLBs, each connected to a switch matrix, there are:
– A number of so-called long lines that connect all the switch matrices along the full width and height of the FPGA.
– A number of so-called hex lines that connect every switch matrix in position (x, y) to the switch matrix in position (x + i, y + j), with i ∈ {−6, −3, 3, 6} and j ∈ {−6, −3, 3, 6}.
– A number of so-called double lines that connect every switch matrix in position (x, y) to the switch matrix in position (x + i, y + j), with i ∈ {−2, −1, 1, 2} and j ∈ {−2, −1, 1, 2}.
– A number of direct interconnections between every switch matrix in position (x, y) and the nine switch matrices surrounding it.
• Direct interconnections between slices and the switch matrix, as depicted in Fig. 2.19. By connecting to the switch matrix, any CLB can communicate with any other CLB in the entire FPGA.
• Direct interconnections between slices in the same CLB, as depicted in Fig. 2.19. Thanks to these dedicated, fast interconnection links, it is possible to minimize the communication delays when exchanging information among functions implemented in the same CLB, without wasting the global interconnection resources within the switch matrix.

By exploiting the Xilinx architecture, two functions can be interconnected by traversing a number of interconnection resources that depends on the placement of the design on the device. If the two functions are placed in the same CLB, they will benefit from the availability of fast dedicated interconnection links. On the contrary, if the design is very dense and the resulting routing is very congested, they may have to resort to multiple links traversing a number of switch matrices. As a result, in the case of a very dense design with very congested routing, the performance of the circuit may be limited by the need for traversing several switch matrices before reaching the final destination.

2.7.2.2 The Altera Interconnection Resources

Altera devices also offer a hierarchical organization of interconnection resources [9, 15], which are composed as follows:

• Local interconnections: Altera logic resources are clustered in logic array blocks (LABs), each comprising 10 identical logic elements (LEs) able to implement combinational and sequential functions by offering one 4-input LUT and one flip-flop. LEs within a LAB are directly connected using dedicated, fast local interconnections, so that functions placed on the same LAB do not have to cross the global interconnection resources to exchange data.
• Vertical/horizontal channels: Each channel contains wires of length 4, 8, 16, and 24 that connect to the LABs along the entire length of the wire. As LABs are regularly arranged in an N × M array, the channels offer dedicated wires to each LAB within 4, 8, 16, and 24 columns/rows of a given LAB.

2.7.2.3 The Actel Interconnection Resources

The Actel interconnection resources [1] offer a hierarchy of four types of interconnections:

• Local resources: They are dedicated interconnections that link the outputs of each VersaTile to the inputs of each of the eight surrounding VersaTiles.
• Long-line resources: They are interconnections intended for supporting delivery of data over longer distances and for higher-fanout connections. They have length
spanning one, two, or four VersaTiles; they run both vertically and horizontally and cover the entire device. Each VersaTile can drive long-line resources, which can access every input of every VersaTile.
• Very-long line resources: They span the entire device and are intended for implementing very long or high-fanout interconnections. Their length spans 12 VersaTiles in the vertical direction on each side of a given VersaTile and 16 VersaTiles in the horizontal direction on each side of a given VersaTile.
• VersaNet networks: They are low-skew, high-fanout nets accessible from the outside of the FPGA or from internal logic. They are intended for clock and reset distribution and, in general, for any high-fanout net requiring minimum skew.
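Returning to the Xilinx hex and double lines of Sect. 2.7.2.1, the rule (x + i, y + j) can be turned into a small reachability sketch. The function and parameter names are hypothetical, the rule is taken literally as stated in the text (every combination of i and j is considered), and targets falling outside an assumed array of `n_cols` × `n_rows` switch matrices are simply discarded.

```python
HEX_OFFSETS = (-6, -3, 3, 6)      # i and j values for hex lines
DOUBLE_OFFSETS = (-2, -1, 1, 2)   # i and j values for double lines


def reachable(x, y, offsets, n_cols, n_rows):
    """Switch matrices reachable from (x, y) through one line of a given type."""
    targets = set()
    for i in offsets:
        for j in offsets:
            tx, ty = x + i, y + j
            # Keep only positions that exist in the array (assumed 0-based).
            if 0 <= tx < n_cols and 0 <= ty < n_rows:
                targets.add((tx, ty))
    return targets
```

Under this literal reading, a switch matrix in the middle of the array reaches 16 others through each line type, while one in a corner reaches only 4, which shows how placement affects the number of hops a connection needs.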
2.7.3 Memory Resources

As the number of logic and interconnection resources available in modern FPGAs grows, these devices are becoming a very appealing replacement for ASICs in certain types of applications. As a result, the FPGA is no longer intended for implementing just glue logic; it is becoming the device for building an entire system. The availability of on-chip memory resources is boosting this shift. Indeed, any computing system needs to store data, and keeping them on-chip significantly improves the performance of the system, as most of the tasks are performed on-chip without the need for frequent accesses to external memories via less efficient interfaces. As a result, modern FPGAs embed configurable memory resources that can be seen as static RAM arrays distributed across the whole device. Each RAM array offers a few tens of kilobits of data (e.g., 18 and 36 kb for Xilinx Virtex-6 devices [22], 9 and 144 kb for Altera Stratix IV devices [5]), which can be arranged in several different “aspect ratios.” As an example, the 36 kb RAM available in Xilinx Virtex-6 devices can be configured (as words × bits per word) as 32 K × 1, 16 K × 2, 8 K × 4, 4 K × 9, 2 K × 18, 1 K × 36, or 512 × 72 in dual-port mode. Memory resources can also be cascaded to implement larger blocks. Moreover, the memory resources support at least three operational modes:

• single-port mode;
• dual-port mode; and
• first-in first-out (FIFO) mode.

Other operational modes may optionally be supported:

• shift-register mode, which is particularly convenient when the memory resource is used for implementing data storage in digital signal processing (DSP) algorithms, and
• ROM mode.
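The aspect ratios listed for the 36 kb Virtex-6 RAM can be checked with a few lines of arithmetic. The sketch below (names are hypothetical) multiplies depth by width for each configuration.

```python
K = 1024

# (depth in words, width in bits) for the 36 kb block RAM aspect ratios
ASPECT_RATIOS = [
    (32 * K, 1), (16 * K, 2), (8 * K, 4),
    (4 * K, 9), (2 * K, 18), (1 * K, 36), (512, 72),
]


def total_bits(depth: int, width: int) -> int:
    """Number of bits addressed by one aspect ratio."""
    return depth * width


capacities = [total_bits(d, w) for d, w in ASPECT_RATIOS]
# widths 1, 2, 4      -> 32768 bits (32 kb)
# widths 9, 18, 36, 72 -> 36864 bits (36 kb)
```

Only the configurations whose width is a multiple of 9 use the full 36 kb; the narrower ones address 32 kb, because the ninth bit of each byte (usually described as a parity bit in Xilinx block RAMs) is not reachable at widths of 1, 2, or 4.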
2.7.4 Arithmetic Resources

As more and more complex applications are demanded of FPGAs, manufacturers recognized the benefits of embedding hardwired arithmetic resources in their devices, implementing specific recurring functions that are common to many classes of applications. The arithmetic resource most widely adopted by FPGA manufacturers (e.g., Xilinx, Altera, and Actel) is the multiply and accumulate (MAC) resource, which is common to many applications such as filtering, software-defined radio, and radar systems. By exploiting the arithmetic resources, FPGA users can save logic resources for implementing other non-recurring functions, thus making it possible to build more complex systems on a single FPGA device. Different FPGA manufacturers followed different implementation paths in developing their MAC resource. In general, the arithmetic resources they developed can provide at least one of the following functionalities:

• Multiply
• Multiply and accumulate
• Multiply add
• Three-input add
• Barrel shift
• Wide-bus multiplexing
• Magnitude comparator
• Bit-wise logic functions
• Pattern detect
• Wide counter
As an example of an arithmetic resource, we can consider the Mathblock that Actel recently introduced with the RTAX-DSP family of devices [2]. The architecture of the Mathblock is shown in Fig. 2.22. The arithmetic resource provides hardwired logic for implementing the following function:

$$P_n[40:0] = P_{n-1}[40:0] \pm (A \times B) \tag{2.1}$$
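The behaviour expressed by Eq. (2.1) can be sketched in software as follows. Only the 41-bit accumulator width comes from the equation; the function names, the signed interpretation of the operands, and the two's-complement wrap-around on overflow are assumptions made for illustration.

```python
ACC_BITS = 41  # width of P[40:0] in Eq. (2.1)


def wrap_signed(value: int, bits: int = ACC_BITS) -> int:
    """Reduce value to a two's-complement number of the given width."""
    value &= (1 << bits) - 1
    # If the sign bit is set, interpret the pattern as a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_step(p_prev: int, a: int, b: int, subtract: bool = False) -> int:
    """One accumulation step: P_n = P_{n-1} +/- (A x B)."""
    product = a * b
    return wrap_signed(p_prev - product if subtract else p_prev + product)


def dot_product(xs, ys) -> int:
    """Accumulate a dot product, one MAC step per pair of samples."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac_step(acc, a, b)
    return acc
```

A dot product such as the one computed in each output sample of a FIR filter is the typical use: `dot_product([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18 = 32 in three steps.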
2.7.5 Processing Resources

Similar to what happened with arithmetic resources, FPGA manufacturers recognized the importance of embedding processing resources, in the form of hardwired processor cores, in their most advanced devices. By using hardwired processor cores, designers can implement computing systems on a single FPGA device. Moreover, they can build complex architectures where custom hardware accelerators, implemented using FPGA logic and arithmetic resources, cooperate with embedded software running on the processor core the FPGA embeds. As all the modules are integrated in the same chip, a substantial performance
Fig. 2.22 Architecture of the Actel Mathblock
improvement is achieved with respect to implementations where discrete chips (one for the hardware accelerator and one for the processor) are used. The two most important examples of processor cores embedded in FPGAs are the PowerPC processor, available in Xilinx Virtex-II Pro [20] and in some versions of Xilinx Virtex-4 and Virtex-5 devices [18], and the Cortex-M3 processor, available in Actel SmartFusion devices [3]. The Virtex-5 FXT [21] series of devices from Xilinx offers up to two PowerPC 440 cores (Virtex-II Pro and Virtex-4 adopted the PowerPC 405). Each processor offers the following features, which are schematized in Fig. 2.23:

• A 5 × 2, 128-bit crossbar switch
• Simultaneous memory bus and processor local bus (PLB) access
• Integrated DMA channels, PLB interfaces, and dedicated memory interface, so that FPGA logic resources are not needed to implement the embedded computing system
Fig. 2.23 Architecture of PowerPC 440 adopted in some Virtex-5 FXT devices
• Auxiliary processor unit (APU) controller to integrate hardware accelerators and create custom co-processors
• Non-blocking pipelined point-to-point access to FPGA resources
• Dedicated memory interface port for up to 128-bit data transfer per cycle to offload the PLB
• Highly pipelined transmit and receive scatter–gather DMA channels
• User-selectable port prioritization and operating frequencies

The Cortex-M3 processor is an optimized implementation of the ARMv7-M [6] architecture for low-cost applications. It embeds the following features:

• Hardware divide and single-cycle multiply instructions
• Nested vectored interrupt controller (NVIC)
• Configurable from 1 to 150 physical interrupts; up to 256 levels of priority
• Memory protection unit (MPU)
• Data watchpoint and trace unit (DWT)
• Flash patch and break point unit (FPB) that implements six program break points and two literal data-fetch break points
If we analyze the evolution of the market, we can see that the ARM Cortex architecture is attracting growing interest. Indeed, following the example of Actel with the Cortex-M3 in SmartFusion FPGAs, Xilinx recently announced a novel architecture based on the Cortex-A9 MPCore, which offers a dual-core processor with each core running at 800 MHz [10]. By comparing these examples we can see that there is a great variability in the solutions adopted by FPGA manufacturers, spanning from providing no hard core at all (like Altera, who opted for a soft-processor core, the NIOS II [4], which is implemented using FPGA logic and interconnection resources), to the
microcontroller-like solutions as in the Actel case, to the almost full-fledged high-performance processor as in the Xilinx case.
2.7.6 Interfacing Resources

FPGAs are today embedded in a number of highly demanding applications, where moving data quickly into and out of the device is a crucial issue. To relieve FPGA users from the complexity of implementing standard interfaces using logic resources, and thus also to save device resources, FPGA manufacturers are embedding in their devices hardwired modules providing functionalities like

• PCI Express endpoints
• Ethernet media access controller (MAC) modules
• High-speed serial interfaces (e.g., Xilinx 11.18 Gbps GTH transceivers)

As FPGAs can also be appealing for embedded control applications, some device manufacturers are also implementing mixed-signal interfacing resources.
References 1. Actel Corporation, Fpga array architecture in low-power flash devices, v1.4 ed., December 2008. 2. Actel Corporation, Smartgen hard multiplier adder/subtractor handbook, v1.0 ed., May 2009. 3. Actel Corporation, Actel smartfusion microcontroller subsystem (mss) user’s guide, 1 ed., May 2010. 4. Altera Corporation, Nios ii processor reference handbook, v10.0 ed., July 2010. 5. Altera Corporation, Stratix iv device handbook, siv5v1-4.1 ed., March 2010. 6. ARM, Armv7-m architecture reference manual, 2008. 7. S. Brown, Fpga architectural research: a survey, IEEE Design & Test of Computers 13 (1996), no. 4, 9–15. 8. S. Brown and J. Rose, Fpga and cpld architectures: A tutorial, IEEE Design & Test of Computers 13 (1996), no. 2, 42–57. 9. Altera Corporation, Comparing ip integration approaches for fpga implementation, Tech. report, February 2007. 10. K. DeHaven, Extensible processing platform. ideal solution for a wide range of embedded systems, White Paper WP369 (v1.0), Xilinx, April 2010. 11. H. Haznedar, Digital microelectronics, Benjamin/Cummings Pub. Co., San Francisco, 1991. 12. Y. Khalilollahi, Switching elements, the key to fpga architecture, Conference Record of WESCON/94. ‘Idea/Microelectronics’, Anaheim, CA, 1994, pp. 682–687. 13. Plessey, www.plessey.com; www.plesseysemiconductors.com, Plessey FPGA, November, 1989. 14. J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, Architecture of field-programmable gate arrays, Proceedings of the IEEE, vol. 81, Piscataway, USA, 1993, pp. 1013–1029. 15. J. Rose, Abbas El Gamal, Senior Member, and Albert Sangiovanni-Vincentelli, Architecture of field-programmable gate arrays: The effect of logic block functionality on area efficiency, Proceedings of the IEEE 25 (1990), 1217–1225. 16. Xilinx, Rocketio TM transceiver user guide, ug024 (v3.0) ed., February 2007. 17. Xilinx, Virtex-ii platform fpga user guide, ug002 (v2.2) ed., November 2007.
18. Xilinx, Virtex-4 fpga embedded processor block with powerpc 405 processor, ds306 (v2.01b) ed., April 2009. 19. Xilinx, Virtex-6 fpga configurable logic block, ug364 (v1.1) ed., September 2009. 20. Xilinx, Powerpc 405 processor block reference guide, ug018 (v2.4) ed., January 2010. 21. Xilinx, Virtex-5 fpga user guide, ug190 (v5.3) ed., May 2010. 22. Xilinx, Virtex-6 fpga memory resources user guide, ug363 (v1.4) ed., May 2010.
Chapter 3
Reconfigurable Field Programmable Gate Arrays: Failure Modes and Analysis
3.1 The Impact of the Environment on the Device

Nowadays, electronic devices are used in a huge number of applications, from the entertainment market to military equipment, from personal computing to large-scale business frameworks, from mobile phones to satellites and space probes. Each application has its own requirements and constraints, each of which weighs differently depending on the specifications of the mission to be fulfilled. One particular class of applications is called mission critical. Mission-critical applications are usually characterized by the involvement of a huge amount of money that could be suddenly lost if something goes wrong. This is the case of satellites, for example, which can be neither repaired nor returned for maintenance if some part stops working. This is also the case of banking applications, where an error during a transaction could cause the loss or theft of huge sums of money. In the following we will refer to the use of electronic devices, and in particular of FPGAs, in this kind of application.

When used in mission-critical applications, FPGAs and digital circuits in general require special attention to dependability. In particular, for mission-critical applications we can define “dependability” as the capability to tolerate faults induced by the environment that could lead to a failure of the entire system. A fault is defined as the misbehavior of an internal component of the system. If activated by the operation of the system, it can propagate to the outputs of this component, becoming an error. Finally, if the error propagates and produces a misbehavior of the system outputs, this is called a failure. In mission-critical applications, faults and errors may be acceptable but, depending on the requirements, failures must either be detected and signaled, while bringing the system into a safe state, or masked so that they never occur.
We could generically define this concept as fault tolerance, intended as the capability of guaranteeing that the system will always be in, or be brought to, a safe state. Strategies and techniques should be studied and applied to tolerate faults and errors but, in order to implement them in the most effective and efficient way, failure modes must be studied and analyzed in detail. In this section we are going to present the effects of the environment on integrated circuits, and in particular on FPGAs,
N. Battezzati et al., Reconfigurable Field Programmable Gate Arrays for Mission-Critical Applications, DOI 10.1007/978-1-4419-7595-9_3, © Springer Science+Business Media, LLC 2011
starting with the analysis of the environment itself, of the faults it can induce, and of the models that can be adopted to represent them.

Faults can be introduced in the system by both the user and the surrounding environment. The user can cause faults by providing wrong inputs to the system, causing it to enter an unknown or incoherent state. On the other hand, the environment can cause several kinds of faults, depending on the nature of the stress it applies to the system. For example, mechanical faults can be caused by vibration or shocks, or even by the effects of temperature and pressure variations; electrical faults can be caused, for example, by electromagnetic interference (EMI) or electrostatic discharge (ESD). Moreover, the effects of aging cause faults too. One of the most critical environmental aspects that could lead to failures of modern integrated technologies and systems, especially in space and avionic applications, is radiation [9]. In the remainder of this book we will restrict the environmental impact on the electronic device to the impact of radiation alone. Indeed, even if there are several causes of malfunctioning due to the surrounding environment, in mission-critical space applications radiation is one of the most troublesome concerns.
3.1.1 Radiation Environments

Radiation is a generic term that encompasses many kinds of physical entities. In general, it can be defined as a set of particles, charged or not, that can interact with the electronic system by an exchange of energy. Considering the broad range of applications of electronic systems, the harshest environment from the radiation point of view is space. Many kinds of radiation and particles are present in space, mainly generated by nuclear reactions in the stars. These particles can easily move in the vacuum of space, but when they come close to the Earth they hit the atoms and molecules that compose the atmosphere (nitrogen, oxygen, and other gases), losing a great part of their energy and transforming into other kinds of particles. The atmosphere thus acts as a shield that protects us from radiation moving in space. Electronic circuits used in terrestrial applications are consequently safer than the ones used in space or in avionic applications, the latter being subject to the effects of particles that have not yet completely lost their energy.

3.1.1.1 Space Radiation

In space, several kinds of radiation can interact with electronic devices, provoking faults and damage. The main ones are

• cosmic rays,
• mesons, and
• alpha particles.

Cosmic rays are nuclei of several atoms, such as hydrogen, helium, and iron, that travel through space at speeds of several thousands of kilometers per second. This
kind of radiation has very high energy and ionization degree and primarily derives from explosions of novas and supernovas that happened millions of years before the particles reached our solar system. Among these particles there are also high-energy electrons that travel at nearly the speed of light. The Sun, through solar winds, also contributes to cosmic rays, mainly with protons.

Another kind of radiation that could lead to faults consists of mesons, which are mainly produced by the interaction of high-energy cosmic ray particles with the terrestrial atmosphere. The main kinds of mesons are charged pions and muons, which have a lower energy than heavy ions but can still produce effects on electronic circuits.

Alpha particles are yet another kind of radiation, produced by the decay of radioactive elements. Such particles have a very low energy compared to that of heavy ions, and they can hardly penetrate the package of a chip and produce faults. However, more alpha particles can be produced by the interaction of other particles, like neutrons or protons, with the silicon of the device; being generated inside the device itself, these are able to produce faults.

All these kinds of radiation traveling through space can be captured by magnetic fields and remain trapped there, as happens around the Earth. The magnetic field of the Earth looks like the tail of a comet, stretching millions of kilometers beyond the Moon on the side opposite to the Sun. This magnetic field can trap the particles traveling near it, forming a sort of radioactive cloud around the Earth, called the “Van Allen Radiation Belts” after the scientist who discovered them in 1958. Solar flares, which are giant eruptions of solar gases and plasma on the surface of the Sun, produce strong distortions of the magnetic fields around the Earth, greatly increasing the absorption of cosmic rays. The Van Allen Belts are not the only place where high-energy particles can be trapped. The radiation belts of Jupiter, for example, are a thousand times more powerful than the terrestrial ones.
3.1.1.2 Ground Radiation

Ground-level radiation is mainly due to the interaction of cosmic rays with the Earth’s atmosphere. There are also terrestrial radioactive sources, like the Earth’s crust, which emits alpha particles, but they are negligible with respect to the former. When cosmic rays arrive near the Earth, the primary particles that compose them hit atoms in the atmosphere, losing much of their energy and producing different kinds of secondary particles. These are mainly divided into three groups: hadrons, muons, and electromagnetic components (electrons, positrons, and photons). Depending on their nature, these particles can either travel for some meters from the point of their generation until a new interaction with other atoms, or even reach the ground with no further interactions. The numerically dominant form of ground-level cosmic radiation is composed of muons. Even if cosmic rays continuously lose energy while crossing the atmosphere, technology shrinking and the reduction of voltage levels make electronic devices increasingly sensitive to radiation effects, even at sea level.
3.1.2 Radiation Characteristics

In order to characterize the radioactive environment and its interaction with electronic devices, some definitions are commonly used:

• Flux
• Fluence
• Cross section

As shown in Fig. 3.1, given a piece of material, called the target, with a surface S, it is possible to define a unitary area a on which radiation is incident. Moreover, given a unitary time t, a beam of particles is characterized by the number of particles p that cross the unitary area. Each particle of the beam can have a different incident angle α when crossing the material’s surface.

Fig. 3.1 Definition of the particles beam and target parameters (labels: unitary area a, target surface S, particles p)
The particle flux f, as defined in (3.1), is the number of particles p that cross the unitary area a in one unit of time t:

$$ f = \frac{p}{a \cdot t} \tag{3.1} $$

Flux is measured in [#particles/(cm² s)]. The particle fluence Φ, as defined in (3.2), is the integral of the flux over time, describing the number of particles that cross the unitary area during a specified time t:

$$ \Phi = \int_{0}^{t} f(t)\,dt \tag{3.2} $$

Fluence for a given time is measured in [#particles/cm²].
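As a numerical illustration of (3.1) and (3.2), the following minimal sketch computes a flux from a particle count and approximates the fluence by integrating sampled flux values with the trapezoidal rule. The function names, the sampling scheme, and the beam figures are assumptions made for illustration only; they do not come from the text.

```python
def flux(particles: float, area_cm2: float, time_s: float) -> float:
    """Particle flux f = p / (a * t), in particles/(cm^2 s), as in (3.1)."""
    if area_cm2 <= 0 or time_s <= 0:
        raise ValueError("area and time must be positive")
    return particles / (area_cm2 * time_s)


def fluence(flux_samples, dt_s: float) -> float:
    """Fluence as the time integral of the flux, as in (3.2).

    flux_samples holds flux values taken every dt_s seconds; the integral
    is approximated with the trapezoidal rule.
    """
    samples = list(flux_samples)
    if len(samples) < 2:
        raise ValueError("need at least two flux samples")
    total = 0.0
    for left, right in zip(samples, samples[1:]):
        total += 0.5 * (left + right) * dt_s
    return total


# Hypothetical beam: 6.0e6 particles cross 2 cm^2 in 30 s.
f = flux(6.0e6, 2.0, 30.0)
# Constant flux sampled once per second over 60 s (61 samples).
phi = fluence([f] * 61, 1.0)
print(f"flux = {f:.3e} particles/(cm^2 s), fluence = {phi:.3e} particles/cm^2")
```

With a constant flux the fluence reduces to the flux multiplied by the exposure time; the numerical integration only matters when the flux varies during the exposure.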
Finally, the cross section σ is a hypothetical area that represents a surface of the target such that, if a particle crosses this surface, there will be an interaction. Cross section is measured in [cm²]. Considering an electronic device, we could say that its whole surface can interact with radiation, but not all the particles will induce a fault or damage. For this reason other kinds of cross sections will be introduced later, to correlate the actual sensitive area of the device with the produced effects.
3.1.3 Physical Effects

When an electronic device is exposed to radiation, several kinds of effects can be observed, due to the interaction of the particles with the different materials the circuit is composed of. Before introducing these effects, some further definitions are needed, in particular concerning the penetration of particles in materials. As shown in Fig. 3.2, the incident particle, also called the primary particle, follows a scattering path during its penetration in the material. This random path is determined by the collisions of the particle with nuclei of the material, which divert the particle, making it lose part of its energy and producing secondary particles. However, for the sake of simplicity, such a path can be approximated by a linear one that acts as the axis of a cylinder describing the energy of the generated secondary particles. If a secondary particle has an energy higher than a given threshold Δ, it can leave the cylinder and travel through the material; otherwise it cannot.

Fig. 3.2 Penetration of a particle in the material (labels: primary particle, actual path, approximated path, secondary particles)
First of all, the stopping power sp of a particle incident on a material is defined as its energy loss per unit path length of penetration into the material. As in (3.3), it is the incremental energy loss of the particle (dE) per incremental distance (ds) traversed in the material:

$$ sp = -\frac{dE}{ds} \tag{3.3} $$
It is important to note that the stopping power depends not only on the parameters of the particle, such as its mass, charge, and energy, but also on the parameters of the material in which it deposits its energy, including density, atomic weight, and atomic number. Whereas stopping power is related to the energy loss of the particle, linear energy transfer (LET) focuses on the energy transferred to the material in the proximity of the particle track by means of secondary particles emitted during the interaction. LET can be expressed as follows:

$$ \mathrm{LET} = -\frac{dE}{ds} \tag{3.4} $$
Because LET is related to the energy transferred in the vicinity of the track, secondary particles with an energy greater than the threshold Δ are excluded from the computation, since they will travel far from the track itself. In (3.4), dE is therefore the energy loss due to collisions minus the kinetic energies of all the secondary particles with energy greater than Δ. If Δ approaches infinity, thus taking into account all the secondary particles generated by the interaction with the material, LET becomes identical to the stopping power. In the following, we will consider LET in this latter case. The LET parameter, which in most cases can be approximated by the stopping power, plays an important role in the computation of error rates for electronic system components. In studies of radiation effects on electronic devices, LET is usually expressed in units of MeV·cm²/mg of material (typically silicon). This unit of measurement arises from the energy lost by the particle to the material per unit path length (MeV/cm) divided by the density of the material (mg/cm³).

3.1.3.1 Energy Loss Mechanisms

The major energy loss mechanisms for ionizing particles are ionization and bremsstrahlung. Ionization is a mechanism by which charge is released within the material traversed by the particle. It can be either direct, if the incident particle is charged, or indirect, if the particle is not charged but produces charged particles by interacting with the material. Direct ionization is mainly due to heavy ions, electrons, positrons, and alpha particles. When they enter the material with enough energy, they can tear electrons out of the neutral atoms of the material or give electrons to ionized atoms, provoking a movement of charge that remains along the particle’s track. Indirect ionization, instead, is mainly due to neutrons and photons, which have no charge themselves but can ionize atoms within the crystal lattice.
Bremsstrahlung, on the other hand, is electromagnetic radiation produced by the acceleration of a charged particle, such as an electron, when deflected by another charged particle, such as an atomic nucleus. This is the predominant phenomenon for light, fast electrons. The radiated energy is a continuous spectrum in the X-ray and/or gamma-ray regions. Produced high-energy photons
such as gamma-rays can, in the neighborhood of an atomic nucleus, be annihilated with the immediate appearance of an electron (e⁻) and positron (e⁺) pair. In addition, high-energy bremsstrahlung and pair production can combine to produce what are called cosmic ray showers. These are avalanches or cascades, within the device, of bremsstrahlung photons, electrons, and positrons, each providing a source for the others until their energies are dissipated and the shower ceases.
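To give a feel for how much charge ionization releases along a track, the sketch below converts an LET value in MeV·cm²/mg into an energy loss per micrometer and then into deposited charge in silicon. The silicon density (2.33 g/cm³) and the mean energy needed to create one electron-hole pair (3.6 eV) are commonly quoted textbook values assumed here, not figures taken from the text, and the LET is assumed constant along the short path considered.

```python
SI_DENSITY_MG_PER_CM3 = 2330.0       # silicon density, 2.33 g/cm^3 (assumed value)
EV_PER_PAIR_SI = 3.6                 # mean energy per electron-hole pair in Si (assumed)
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def let_to_mev_per_um(let_mev_cm2_per_mg: float) -> float:
    """Convert LET from MeV cm^2/mg to energy loss per micrometer of silicon."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3  # undo the density division
    return mev_per_cm * 1.0e-4                               # 1 um = 1e-4 cm


def deposited_charge_fc(let_mev_cm2_per_mg: float, path_um: float) -> float:
    """Charge in fC released along path_um micrometers, assuming constant LET."""
    energy_ev = let_to_mev_per_um(let_mev_cm2_per_mg) * path_um * 1.0e6
    pairs = energy_ev / EV_PER_PAIR_SI
    return pairs * ELECTRON_CHARGE_C * 1.0e15


for let in (1.0, 10.0, 40.0):
    print(f"LET {let:5.1f} MeV cm^2/mg -> {deposited_charge_fc(let, 1.0):7.1f} fC per um")
```

Under these assumptions an LET of 1 MeV·cm²/mg corresponds to roughly 10 fC of charge per micrometer of track, which is the order of magnitude to keep in mind when a critical charge is discussed.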
3.1.3.2 Funneling Effect

Considering the previous mechanisms and the technology of digital microcircuits, radiation effects are mainly due to the funneling phenomenon (Fig. 3.3). When a heavy ion penetrates a semiconductor device through its junction and depletion layers, it produces a track of ionization, composed of electrons and holes from the atoms of the semiconductor material. The presence of the track temporarily collapses the depletion layers locally to the track, distorting the equipotential surfaces of the depletion layer electric fields. The distortion results in funnel-shaped equipotential surfaces that can extend into the substrate of the device. This phenomenon creates two corresponding currents along the track, one of electrons and one of holes, flowing in opposite directions. The large increase in charge density raises the probability that a critical charge is collected at a device information node. In particular, if the incident ion track crosses or passes very near to the information node, most of the charge in the funnel is collected by the node depletion layers in fractions of a nanosecond. After collection, the charge density falls to a level comparable to the
Fig. 3.3 Funneling effect in a CMOS transistor due to an ionizing particle (labels: ionizing particle, gate, drain, source, depletion layer, funnel, holes current, electrons current, substrate)
substrate dopant density, and the distributed depletion layer field relaxes back to its state prior to the track onset.

3.1.3.3 Displacement Effect

Another relevant effect is the displacement of the crystal lattice. When a heavy particle enters the device material, it can change the arrangement of the atoms in the crystal lattice, creating lasting damage and increasing the number of recombination locations, thus worsening the analog properties of the affected semiconductor junctions. Such damage can be annealed by heating the device, which provides the lattice with enough energy to recombine and return to the equilibrium state.

3.1.3.4 Charge Accumulation Effect

One last important effect is charge accumulation, due to both the funneling and displacement mechanisms. Considering these two effects over the exposure time, charge can accumulate both in locations formed by lattice defects and in the insulation oxides around and within the transistor’s structure. This effect provokes a gradual degradation of performance, up to a loss of functionality beyond a certain accumulated radiation dose. Annealing by heating is again a possible way to restore the correct functionality of the device but, of course, it cannot always be applied.
3.1.4 From the Effect to the Fault

The physical effects of radiation on the materials used in electronic devices are very complex phenomena that require specific mathematical and probabilistic models to be represented. Simulation of such behaviors is becoming less and less feasible because of the growing complexity and the very long time required. Indeed, modern circuits are growing toward extreme complexity in terms of the materials involved, the manufacturing technologies, and the integration of billions of transistors; studying the effect of an ionizing particle while taking into account all the physical phenomena is becoming too complex and time consuming.

However, some common behaviors of the affected circuit can be recognized, produced by one or by a combination of the previous effects. Such behaviors are related to the functionality of the circuit itself rather than to the physical principles of the technology, and are thus much more comprehensible for the user who exploits the functionalities of the device. For this reason, the concept of fault model has been introduced. Instead of analyzing the effect of the particle on the material and the device structure by means of a complex mathematical model, it is possible to represent common kinds of radiation-induced misbehavior by simple functional models that capture the effect on the correct functioning of the circuit. In particular, such models define the difference between the faulty behavior induced by radiation and the unaffected one, which in this case we can assume to be correct.
Radiation-induced fault models can be classified into two main categories: single event effects (SEEs) and total ionizing dose (TID). SEEs model the effects due to a single particle striking the device, while TID models the effect of charge accumulation and displacement damage.

3.1.4.1 Single Event Effects

Single event effects are models of the effect of the funneling induced by a single particle, charged or not, in a certain location within the device. Depending on the strike location and time, the electric fields, and the energy of the incident particle, the funneling can produce different functional behaviors. Because they are related to a single particle strike, the total exposure time of the device to radiation is not relevant to define their behavior. SEEs can be temporary faults that affect the device for a certain period of time, at most until a power cycle is performed, in which case they are called soft errors; otherwise, if the produced fault is permanent and damages the device itself, it is called a hard error. Depending on the effect they produce, soft and hard errors can be divided into different categories:

• Soft errors
  – Single event upset (SEU)
  – Multiple cell upset (MCU)
  – Single event transient (SET)
  – Single event functional interrupt (SEFI)
• Hard errors
  – Single event latch-up (SEL)
  – Single event gate rupture (SEGR)

SET

When the funneling effect takes place as a consequence of a particle strike, as shown in Fig. 3.3, a track of electron-hole pairs distorts the transistor depletion regions, accumulating charge in the funnel. Within a period of time between picoseconds and nanoseconds the charge is collected by the electric field, injecting holes or electrons, depending on the polarization of the field itself, into one of the sensitive nodes of the transistor. This phenomenon can cause a spurious variation, or glitch, of the voltage level at the output of the transistor, which is called a single event transient (SET). The quantity of charge needed to induce an SET is called the critical charge (Q_crit). The SET shape depends not only on the incident particle but also on the device it strikes. Indeed, it is a function not only of the LET of the particle and its incident angle but also of the materials encountered along its path inside the device and of the electric fields present at that particular moment. The SET pulse is always defined by a double transition, {0 → 1 → 0} or {1 → 0 → 1}. Since it is a transient variation of a logic value, the number in the middle defines the polarity, or sign, of the SET. We will speak of positive SET, if the
Fig. 3.4 SET{0 → 1 → 0} shape and measures
transition is {0 → 1 → 0}, or negative SET otherwise. As shown in Fig. 3.4, the transient pulse, in this case a positive SET, can be characterized by means of different measures. The maximum voltage level it reaches (or the minimum, in the case of a negative SET) is referred to as V_abs; it defines the maximum amplitude of the pulse. The whole period of time during which the pulse perturbs the affected signal is called t_abs; however, this measure is not very significant. Indeed, defining V_IH as the level above which the signal is considered a “one” by the logic, t_real{0 → 1 → 0} is the period of time during which the SET could really affect the behavior of the circuit, changing the value of the affected signal from 0 to 1. For a negative SET, t_real{1 → 0 → 1} is computed on the basis of V_IL, the level below which the signal is considered a “zero” by the logic. Finally, there are two other important measures to be defined: the rising time of the pulse, t_rise{0 → 1 → 0}, defined as the time the pulse takes to go from 10% to 90% of V_abs, and the falling time, t_fall{0 → 1 → 0}, defined as the time the pulse takes to go back from 90% to 10% of V_abs. Note that these two measures are transition-dependent because they depend on the sign of the pulse. In general the rising time of a positive pulse and the rising time of a negative pulse should be considered separately.

SETs are transient faults that last within the circuit for a period of time ranging between picoseconds and nanoseconds, depending on the pulse width and amplitude. However, during their propagation they could be sampled by memory elements, thus introducing errors that will propagate through the circuit up to its outputs, leading to a misbehavior of the system.
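The measures defined above for a positive SET can be extracted from a sampled waveform. The following is a minimal sketch, assuming a single positive pulse on a 0 V baseline, monotonic edges, and linear interpolation between samples; the function names and the triangular test pulse are hypothetical and serve only as an illustration.

```python
def first_rising(times, volts, level):
    """Interpolated time of the first upward crossing of `level`."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 < level <= v1:
            frac = (level - v0) / (v1 - v0)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("waveform never rises through the level")


def last_falling(times, volts, level):
    """Interpolated time of the last downward crossing of `level`."""
    for i in range(len(times) - 2, -1, -1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 >= level > v1:
            frac = (v0 - level) / (v0 - v1)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("waveform never falls through the level")


def positive_set_measures(times, volts, v_ih):
    """Return V_abs, t_real, t_rise and t_fall of a positive SET pulse."""
    v_abs = max(volts)
    low, high = 0.1 * v_abs, 0.9 * v_abs
    t_rise = first_rising(times, volts, high) - first_rising(times, volts, low)
    t_fall = last_falling(times, volts, low) - last_falling(times, volts, high)
    if v_abs > v_ih:
        t_real = last_falling(times, volts, v_ih) - first_rising(times, volts, v_ih)
    else:
        t_real = 0.0  # the glitch never reaches a valid logic "one"
    return {"v_abs": v_abs, "t_real": t_real, "t_rise": t_rise, "t_fall": t_fall}


# Hypothetical triangular glitch: time in ps, voltage in V.
times_ps = [0.0, 100.0, 200.0, 300.0, 400.0]
volts_v = [0.0, 0.0, 1.0, 0.0, 0.0]
measures = positive_set_measures(times_ps, volts_v, v_ih=0.5)
print(measures)
```

A negative SET could be handled by the same routines after inverting the waveform around its baseline and using V_IL in place of V_IH; that variant is not shown here.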
SEU

An SEU, also called upset or bit-flip, is the effect of a particle that changes the value of a memory element, such as a latch or a cell within a memory array. As shown in Fig. 3.5, when an SET is generated by an ionizing particle within a memory element, depending on the amplitude of the resulting glitch, it could force the feedback loop to change its value, thus modifying the value stored in the memory element. In particular, considering the SRAM cell depicted in Fig. 3.5a, suppose that the initial state of the cell is a logic “one”, i.e., the value at the output of the inverter formed by the transistor pair M4–M3, which drives the output of the cell, the bit line (BL). In this case, if an SET is generated in the drain of the pMOS transistor M2 and its amplitude is high enough, it could activate the nMOS transistor M3, thus inverting the output of the inverter M4–M3. Because of the feedback that controls the inverter M2–M1, the content of the other half of the cell is also modified, thus bringing the cell into the new, wrong, state (Fig. 3.5b).
Fig. 3.5 SEU mechanism: (a) radiation-induced pulse in the SRAM cell and (b) flip of the SRAM cell content
SEUs are not usually permanent faults, because the wrong value is overwritten at the first write operation on the affected memory element. However, there are some cases in which the memory element might never be written again, thus making the effect of the SEU permanent until a reset is performed or a non-functional write operation is done, for example, by means of an error correction system.

MCU

There are two cases in which multiple SEUs could be present in the same circuit. The first one is SEU accumulation: if an SEU has not been corrected yet when another particle provokes a second upset, two faults will be present at the same time. This case is more frequent with “permanent” SEUs, in those memory elements that are not usually written but just read.
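The SEU behavior and the accumulation case just described can be mimicked with a toy functional model. The sketch below is only an illustration of the fault model, not of any real device: the class name, the 8-bit word width, and the reference copy used to count faulty bits are all assumptions introduced here.

```python
class MemoryWord:
    """Toy model of a memory word subject to SEUs (illustrative only)."""

    def __init__(self, value: int, width: int = 8):
        self.width = width
        self.mask = (1 << width) - 1
        self.golden = value & self.mask  # fault-free reference copy
        self.stored = self.golden        # value actually held by the cells

    def upset(self, bit: int) -> None:
        """Flip one stored bit, as a single event upset would."""
        if not 0 <= bit < self.width:
            raise ValueError("bit index out of range")
        self.stored ^= 1 << bit

    def write(self, value: int) -> None:
        """A functional write overwrites any upsets accumulated so far."""
        self.golden = value & self.mask
        self.stored = self.golden

    def faulty_bits(self) -> int:
        """Number of bits that currently differ from the fault-free value."""
        return bin(self.stored ^ self.golden).count("1")


word = MemoryWord(0xA5)
word.upset(0)             # first SEU
word.upset(3)             # second SEU before any correction: accumulation
accumulated = word.faulty_bits()
word.write(0xA5)          # the next write removes both upsets
print(accumulated, word.faulty_bits())
```

In a location that is only read, the `write` call never happens, so the flipped bits persist; this is the situation the text refers to as a “permanent” SEU.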
The second case of multiple SEUs is the effect of a single particle that upsets more than one memory element. This effect is called multiple cell upset (MCU). It differs from the first case because it is the effect of a single particle and not of the accumulation of consecutive SEUs. The occurrence of MCUs depends not only on the energy of the particle and its incident angle, but also on the distance between neighboring memory elements. Indeed, if the particle has enough energy to travel a long distance within the device, and an angle such that it crosses the area occupied by different cells, it could cause an MCU. For this reason, the distance between memory elements is a fundamental parameter that affects the probability of MCUs. Nowadays, the smaller size of the cells and the high degree of integration, obtained also by reducing the space between cells, lead to a higher MCU rate, thus making this kind of error a real concern for critical applications.

SEFI

SEFIs are temporary errors that affect the elements of the device that control its functionality. As a result, such faults may produce a malfunction of the whole device that is not recoverable unless a global reset, or even a power cycle, is performed. For example, a fault in the program counter or in the status register of a processor could become an SEFI, bringing the processor into a faulty state from which it is impossible to recover unless a global reset is done. In the same way, faults in the reconfiguration control logic of FPGAs may interrupt the reprogramming functionality, thus requiring a power cycle to restore the correct state of the device.

SEL

SELs are permanent errors that lead to an increase of the device current. This phenomenon usually leads to the destruction of the device itself if not removed in time. The only way to stop an SEL is to power off the device. As shown in Fig.
3.6, the typical CMOS structure includes, besides the designed p and n transistors, additional parasitic devices that are formed by the interaction of differently doped areas. If a current peak is injected in such a parasitic structure, its strong positive feedback triggers a chain reaction that creates a short circuit between ground and Vdd, which could burn out the device. A spurious current peak can be injected in the parasitic structures by direct or indirect ionization, when a particle strikes the device in that area, starting the SEL effect. Detection structures can be added to the sensitive transistors in order to detect SELs and clear them by cutting off the power.

SEGR

SEGRs are destructive errors, observed in non-volatile memories like EEPROMs, that lead to device burnout. The SEGR effect is due to the perforation of the transistor gate insulator. This can happen if a heavy particle, like a heavy ion, strikes the transistor gate while it is stressed by a high internal electric field.
Fig. 3.6 Typical CMOS structure with parasitic devices
Such a field occurs, for example, during a write operation in non-volatile memories like EEPROMs. The heavy ion strike produces, by means of ionization, a highly conductive track within the dielectric, which forms a very low resistance path between the gate and the substrate. This can discharge the capacitor formed by the gate, the insulator, and the bulk and, if enough energy was stored in it, the excessive heating could melt the dielectric.

3.1.4.2 Total Ionizing Dose

Unlike single event effects, total ionizing dose (TID) is the effect of the accumulation of the charge injected by radiation. It depends on the exposure time, the flux of the particles, and their LET. TID in space and avionic applications is mainly due to the effects of Van Allen Belt protons and electrons and of the secondary particles generated by their interaction with the device. TID models the effects of charge accumulation and displacement damage that, together, lead to different kinds of misbehavior. First of all, a global worsening of the device performance is observed: transistors slow down and the power consumption increases. In memory circuits, ionizing dose affects the sensitivity of the logic states of memory cells asymmetrically, causing an imbalance. This effect is due to the changes in mobility and in transistor thresholds caused by ionizing radiation. In Flash memories, it has been shown that TID leads to a change in the threshold of the floating-gate transistors, so that they lose reprogrammability. A second effect of TID is the change in SEE sensitivity. Indeed, the accumulated charge and the displacement within the crystal lattice could make the device more sensitive to single events. One consequence of this is that SEUs can cause the so-called stuck bits, which are memory cells whose value is modified by an SEU but, because of the ionizing dose,
50
3 Reconfigurable Field Programmable Gate Arrays
their correct value cannot be restored. Some of these bits anneal rapidly within a few minutes, some when irradiated by UV photons, some others take months to anneal. In general, TID effects can be annealed by means of heating the device, in order to provide enough energy to the crystalline lattice so that atomic locations can be restored and trapped charges can be released. 3.1.4.3 Cross Sections Cross section of an electronic device has been previously defined as the total area of the circuit such that, if exposed to radiation, there will be faults. The whole unshielded area of an FPGA is sensitive to radiation effects described above. However, considering the different fault models explained in this section, it will be clear that not the whole area is sensitive to all the kinds of fault. For this reason it is possible to define different cross sections, one for each fault model, that better express the sensitiveness of the device to that particular fault. Mitigation techniques, indeed, mainly focus on one or few of these fault models, because they cannot cope with all of them at the same time. It is thus useful to have a comparison measurement for evaluating the goodness of the technique related to the faults it addresses. In the following we will refer to different cross sections, depending on the fault model we are dealing with. For example, the SEU cross section will identify the hypothetical area measure that represents a surface of the target, such that if a particle crosses this surface there will be an SEU. In general, cross section is expressed in terms of area units, and thus measured in square centimeters, but for what concerns fault model cross sections, they can be easily and clearly expressed in terms of sensitive elements. For example, SEU cross section could be expressed by means of the number of sensitive memory elements.
3.1.5 Fault Models in FPGAs

The fault models described in the previous section can occur in any electronic device exposed to radiation and thus also in FPGAs. Depending on the architecture and the resources available in an FPGA, those faults can cause different effects.

3.1.5.1 SETs Effects

SETs are generated in a sensitive node of the FPGA, which could be an internal transistor of a logic gate or a switch that, on the basis of the content of the corresponding configuration bit, programs a certain resource of the FPGA itself. SETs generated in the functional logic can be sampled by flip-flops, thus causing multiple errors that will propagate through the circuit up to its outputs if no mitigation schemes are present. On the other hand, SETs in the configuration logic could cause a transient modification in the circuit structure itself, for example, in the routing architecture, but the structure is restored immediately after the SET. Besides these two kinds of effects, the most critical SET-induced errors are those occurring in global lines, such as clocks and resets. Indeed, if an SET is generated, for example, in a clock line, it could affect the whole circuit by changing the sampling period of many memory elements. SETs in the clock line could lead to metastability problems, caused by the sampling of unstable signals, or to loss of data. SETs in the reset line could also cause the loss of data and of the state information of the system.

3.1.5.2 SEUs / MCUs Effects

SEUs in FPGAs can affect each memory element of the device. In FPGAs, memory elements can be divided into two classes: configuration memory elements and user memory elements. The first class comprises all the memory cells that compose the configuration memory of the device as well as the registers and flip-flops of the control machines that manage the FPGA configuration itself. The second class comprises all the memory elements (flip-flops, registers, embedded memories) that the designer can use to implement the application. It includes not only the flip-flops and shift registers of the FPGA fabric and the block RAMs, but also the registers and memories of hardwired macros such as embedded processors and DSPs. The effects of SEUs depend on the kind of affected resource as well as on its class. SEUs in the configuration memory bits can cause permanent changes in the implemented circuit. Such faults are not transient in FPGAs because the configuration memory is usually not written after the first configuration; it is rewritten only when a new circuit is loaded in the device or a portion of the old one is modified by a reconfiguration operation. Configuration memory bits control the routing architecture of the implemented application and the value of the used functional units.
If basic functions are implemented by means of LUTs, configuration bits control the content of such memories; if, instead, they are implemented by means of configurable interconnections of basic gates, configuration bits control that interconnection network. Configuration memories of modern FPGAs can be built in two different technologies: SRAM and Flash. The first kind of memory is highly sensitive to upsets, as shown in [7, 10, 15], while the second at this moment is not [17]. However, the SEU sensitivity of the latter technology needs a deeper analysis. Indeed, while Flash memories have been proved to be sensitive to radiation-induced SEUs, no upset has been observed when they are used to implement FPGAs. This is because the memory cells used in FPGA configuration memories are larger than the cells used in high-density storage devices, which hold gigabytes of data. The greater size makes the cells less sensitive because the critical charge needed to upset their content is larger. In the future, technology scaling could lead to SEUs being observed also in the configuration memory of Flash-based FPGAs. SEUs in the elements that control the configuration management could lead to a wrong configuration of the device or to the impossibility of reconfiguring it until a reset is performed.
SEUs in user memory elements cause transient faults that propagate wrong values through the logic. The duration of the resulting error depends on the time between the SEU and the next write operation on the upset element. For user flip-flops this period is usually a single clock cycle, while for embedded memories and registers it depends on the write-access rate. Particular attention should be paid to the state flip-flops of finite state machines (FSMs). Indeed, the outputs and the next state of an FSM are computed on the basis of the current state. SEUs in such structures are particularly dangerous because, if the state register is modified, the FSM is brought into a wrong state and all the following operations will be inconsistent with the rest of the circuit. Until the FSM is reset, the wrong sequence of states corrupts its operation and provides wrong output values to the circuit. Analysis and hardening solutions require accurate knowledge of the effects of SEUs and MCUs in the configuration memory. We describe these phenomena and the resulting analysis model in Sections 3.1.6 and 3.1.7.

3.1.5.3 SELs/SEGRs Effects

SELs and SEGRs usually cause the destruction of the device or at least of parts of it. Because of the many heterogeneous elements present in modern FPGAs, such as memories, glue logic, complex processors, and communication peripherals, many different technologies have to be designed to be SEL/SEGR immune.

3.1.5.4 TID Effects

TID provokes three kinds of effects: performance degradation, power consumption increase, and programmability loss. The first effect leads to slower devices, whose maximum operating frequency is reduced in proportion to the increase in the absorbed dose. The second effect leads to higher leakage currents that increase power consumption when transistors are not used. Finally, the third effect leads to the loss of reprogrammability of the FPGA configuration memory: the threshold voltages of the transistors of the memory elements shift as the absorbed dose increases, causing this misbehavior. TID effects are permanent faults that cannot be removed by means of a simple power cycle, because they modify the lattice structure of the materials inside the FPGA. If the accumulated dose is not too high, annealing through heating can restore the device functionality.
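The different persistence of SEUs in user memory elements, discussed in Section 3.1.5.2, can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the 2-bit counter FSM, the register that is rewritten at every clock cycle, and all function names are hypothetical and do not model any specific FPGA resource.

```python
def run_counter_fsm(cycles, seu_cycle=None, seu_bit=0, reset_cycle=None):
    """Simulate a hypothetical 2-bit counter FSM and return its state trace."""
    state = 0
    trace = []
    for t in range(cycles):
        if reset_cycle is not None and t == reset_cycle:
            state = 0                    # a reset restores a known state
        if seu_cycle is not None and t == seu_cycle:
            state ^= 1 << seu_bit        # SEU: flip one bit of the state register
        trace.append(state)
        state = (state + 1) % 4          # the next state depends on the current state
    return trace


def run_data_register(inputs, seu_cycle=None, seu_bit=0):
    """Simulate a register that is overwritten at every clock cycle."""
    trace = []
    for t, value in enumerate(inputs):
        reg = value                      # a new value is written each cycle
        if seu_cycle is not None and t == seu_cycle:
            reg ^= 1 << seu_bit          # SEU corrupts only the value held in this cycle
        trace.append(reg)
    return trace


def mismatches(golden, faulty):
    """Return the cycles in which the faulty trace differs from the golden one."""
    return [t for t, (g, f) in enumerate(zip(golden, faulty)) if g != f]


# FSM: the upset at cycle 3 keeps the machine out of step until the reset at cycle 8.
fsm_errors = mismatches(run_counter_fsm(12),
                        run_counter_fsm(12, seu_cycle=3, reset_cycle=8))

# Data register: the upset at cycle 3 is overwritten by the next write.
data = list(range(12))
reg_errors = mismatches(run_data_register(data),
                        run_data_register(data, seu_cycle=3))
```

In this sketch the register shows a single erroneous cycle, while the FSM remains in a wrong sequence of states until it is reset, which is the behavior described above for state flip-flops.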
3.1.6 SEUs and MCUs Effects on the FPGA’s Routing Resources

The method for analyzing the effects on the interconnections has been developed thanks to the analogy between the switch block and the multiplexer. Multiplexers, also known as MUXes or data selectors, are devices with n inputs and a single output. Given a selector value, they connect the output line to the selected input and thus transmit to the output the data of that input line, as illustrated in Fig. 3.7, where only one of the inputs I1–I7 at a time can be connected to the output Y, and the choice is defined by the selector inputs S0–S2.

Fig. 3.7 Typical switch block main element: the multiplexer

Every multiplexer is characterized by a logic expression of its output. In particular, decoded MUX PIPs have a similar functionality, but they are characterized by a decoding of the selector inputs: a segment is controlled by at least k selector inputs for 2^k inputs. Moreover, a large part of the inputs and outputs are bidirectional and are characterized by a decoding related to the direction of the segments. The switch blocks contain mono-directional buffered routing segments, bidirectional buffered routing segments, and bidirectional segments buffered in both directions. The information extraction process for each switch block consists of three steps:

1. Definition of the electrical node structure. The nodes are organized in three sets for each switch block:
   • Output nodes: {U1, U2, U3, ...}
   • Input nodes: {I1, I2, I3, ...}
   • Bidirectional nodes: {B1, B2, B3, ...}
   This information is extracted by means of editing tools and of electrical analysis of the selected FPGA architecture. Note that the bidirectional nodes are contained in both the input node and the output node sets:
   • Output nodes: {U1, U2, U3, B1 ≡ U4, B2 ≡ U5, ...}
   • Input nodes: {I1, I2, I3, B1 ≡ I4, B2 ≡ I5, ...}
2. For each output node, identification of the set of input nodes that can be connected to it:
   • ∀Ui ∈ A, the set B_I = {connectable_input_nodes} is determined
3. Single activation of each routing segment (PIP) belonging to the segment set identified by an output node and by the input nodes of the corresponding set B_I, and determination of the set of bits that controls the routing resource. The result is:
   • ∀Nodes, the set BITs = {bit1, bit2, bit3} is determined

The extracted information is used to generate a data structure that describes the routing and logic components allocated in the tile, since the correspondence between the resources of the tile and the configuration memory is decoded differently depending on the logic resources controlled. Three main data structures can be generated for each analyzed FPGA architecture:

• Routing database for decoded MUX PIPs
• Routing database for non-decoded MUX PIPs
• Database for the CLB’s internal routing resources

Each database is generated by characterizing all the possible single event upsets (SEUs) or multiple cell upsets (MCUs) occurring in the configuration memory. This characterization is obtained by identifying all the configuration memory bits of a given resource and by storing the configuration that the resource assumes when one or more of those bits are modified.

3.1.6.1 Decoded MUX PIP

The information related to the decoded MUX PIPs describes the corresponding PIPs present inside the switch blocks. Considering a single output node and applying the method illustrated in the previous section, it is possible to obtain several strings of bits, each corresponding to a single PIP or to a group of PIPs between input nodes and the output node, as in the example illustrated in Fig. 3.8. Each string identifies a programmed configuration corresponding to a PIP with a specific output node. For example, consider the string

bit_p = {bit_C39R0, bit_C40R0, bit_C41R0, bit_C42R0, bit_C43R0, bit_C44R0, bit_C45R0}

If, for instance, the string is bit_p = {0, 0, 1, 1, 1, 1, 1}, the instantiated PIP is I1 → A1. Since more than one input node may be connected to an output node, several PIPs with different signal sources and the same destination node can be instantiated in a single switch block; it is therefore necessary to define the behavior of the string bit_p when more than one PIP with the same destination is activated at the same time. This behavior can be described by a logic function that depends on the manufacturer’s characteristics and on the implementation rules adopted. For instance, in the case of Xilinx devices, the decoded MUX PIPs are described by the function illustrated in Fig. 3.9.
Fig. 3.8 Example of the configuration strings for the output node A1 of the correspondent switch block
Fig. 3.9 Decoding function for the configuration memory bit and PIPs with similar destination points
The following terminology is used:

• $BIT_0 = \{bit_0, bit_1, \ldots, bit_7\}$: string of configuration bits corresponding to the non-programmed bits.
• $BIT_C^{I} = \{bit_{c0}^{I}, bit_{c1}^{I}, \ldots, bit_{c7}^{I}\}$, $BIT_C^{II} = \{bit_{c0}^{II}, bit_{c1}^{II}, \ldots, bit_{c7}^{II}\}$, ..., $BIT_C^{M} = \{bit_{c0}^{M}, bit_{c1}^{M}, \ldots, bit_{c7}^{M}\}$: strings of bits corresponding to the single PIPs toward the same output node.

The sets B0, B1, ..., B7 are defined as

• $B_0 = \{bit_{c0}^{I}, bit_{c0}^{II}, \ldots, bit_{c0}^{M}\}$, $B_1 = \{bit_{c1}^{I}, bit_{c1}^{II}, \ldots, bit_{c1}^{M}\}$, ..., and
• $B_7 = \{bit_{c7}^{I}, bit_{c7}^{II}, \ldots, bit_{c7}^{M}\}$.

To characterize all the possible SEU and MCU effects, it is necessary to consider each string and to generate all the possible bit-flips. These combinations can be generated by suitable software tools: all the possible SEUs are injected into each configuration memory string that corresponds to a programmed configuration without bit-flips. Through this process it is possible to identify all the sets of PIPs that can be instantiated to obtain a particular configuration. Since multiple PIP configurations with the same destination node may generate the same configuration memory string, it is necessary to define domination rules, which are fundamental to obtain the effective configuration of the PIPs within a tile. The domination rules are the following:

1. A unique PIP corresponds to singularly activated segments; in other conditions the segments are disabled.
2. A PIP whose string underlies other strings is defined as a dominant PIP.

The logic function describing the relation between the PIPs with the same destination node and the domination rules can be implemented by software algorithms.
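The exhaustive generation of bit-flips described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used by the authors: the routing database below is hypothetical, only its first entry (the string that programs I1 → A1) is taken from the text, and the domination rules are not modeled.

```python
from itertools import combinations

# Hypothetical routing database for the output node A1: each known
# configuration string maps to the PIP it programs. Only the first entry
# comes from the text; the second is invented for illustration.
ROUTING_DB = {
    (0, 0, 1, 1, 1, 1, 1): "I1 -> A1",
    (0, 1, 0, 1, 1, 1, 1): "I2 -> A1",   # assumed entry
}


def flip(bits, positions):
    """Return a copy of the string with the given bit positions inverted."""
    return tuple(b ^ 1 if i in positions else b for i, b in enumerate(bits))


def enumerate_upsets(bits, max_flips=1):
    """Yield (positions, new_string) for every upset of up to max_flips bits.

    max_flips=1 covers SEUs; larger values cover MCUs within the string.
    """
    for k in range(1, max_flips + 1):
        for positions in combinations(range(len(bits)), k):
            yield positions, flip(bits, positions)


def characterize(bits, database, max_flips=1):
    """Map every upset to the PIP programmed by the corrupted string."""
    return {positions: database.get(new_bits, "unknown")
            for positions, new_bits in enumerate_upsets(bits, max_flips)}


golden = (0, 0, 1, 1, 1, 1, 1)           # programs I1 -> A1
seu_effects = characterize(golden, ROUTING_DB, max_flips=1)
mcu_effects = characterize(golden, ROUTING_DB, max_flips=2)
```

With the assumed database, flipping the bits in positions 1 and 2 turns the string into the one associated with the hypothetical PIP I2 → A1, while every string missing from the database is reported as "unknown" and would need the decoding function of Fig. 3.9 to be resolved.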
3.1.6.2 Non-decoded MUX PIP

The information related to the non-decoded MUX PIPs describes the configuration memory bits of the non-decoded MUX PIPs used in the corresponding switch blocks. These PIPs can be organized in two types:

1. Bidirectional non-buffered: these PIPs are related to a single bit of the configuration memory.
2. Bidirectional buffered in a single direction or in both directions: these PIPs are related to a unique set of configuration memory bits.

Both types can be organized in a sequential database, where the keys are defined depending on the type:

1. Identifier of the bit within the bit matrix of a tile
2. Mask of bits corresponding to the rows of bits within a tile
3.1.7 SEUs and MCUs Effects on the FPGA’s Logic Resources

The logic resources of a configurable logic block are essentially divided into three types: multiplexers, look-up tables, and flip-flops. The extraction of the information for the multiplexer resources consists of the following steps:

1. Identification, for each multiplexer of each configurable logic block, of the set of eligible configurations:
   ∀MUXi, identification of A_MUX = {CFG1, CFG2, ...}
2. Activation of each eligible configuration in A_MUX for each MUXi and definition of the corresponding set of configuration memory bits:
   ∀MUXi, the set BITs_MUXi = {bit1, bit2, bit3} is determined

The decoding of the LUT refers to its specific truth table. For example, in the case of four-input LUTs, the reference truth table is illustrated in Table 3.1, where a0, a1, a2, ... are the configuration memory bits that define the LUT’s logic function. In order to identify the correspondence of these bits with the configuration memory cells of the logic tile, it is necessary to identify the following logic situations:

∀LUT:
1. Y = A1
2. Y = A1

Table 3.1 Look-up table decoding data

A4  A3  A2  A1  Y
0   0   0   0   a0
0   0   0   1   a1
0   0   1   0   a2
0   0   1   1   a3
0   1   0   0   a4
0   1   0   1   a5
0   1   1   0   a6
0   1   1   1   a7
1   0   0   0   a8
1   0   0   1   a9
1   0   1   0   a10
1   0   1   1   a11
1   1   0   0   a12
1   1   0   1   a13
1   1   1   0   a14
1   1   1   1   a15
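The decoding of Table 3.1 can be made concrete with a short sketch. It assumes that A4 is the most significant address bit, as the row order of the table suggests; the function names and the AND example are hypothetical and serve only to show how a single upset in one configuration bit changes the implemented function for exactly one input combination.

```python
def lut_index(a4, a3, a2, a1):
    """Row of Table 3.1 selected by the inputs (A4 is the most significant bit)."""
    return (a4 << 3) | (a3 << 2) | (a2 << 1) | a1


def lut_output(config_bits, a4, a3, a2, a1):
    """Output Y of a four-input LUT whose configuration bits are a0..a15."""
    return config_bits[lut_index(a4, a3, a2, a1)]


def truth_table_for(function):
    """Build the 16 configuration bits a0..a15 for a function of (A4, A3, A2, A1)."""
    bits = []
    for index in range(16):
        a4, a3, a2, a1 = (index >> 3) & 1, (index >> 2) & 1, (index >> 1) & 1, index & 1
        bits.append(int(function(a4, a3, a2, a1)))
    return bits


def inject_seu(config_bits, position):
    """Return a copy of the configuration bits with one bit flipped."""
    faulty = list(config_bits)
    faulty[position] ^= 1
    return faulty


# Configuration for Y = A1, one of the reference situations used for decoding.
y_equals_a1 = truth_table_for(lambda a4, a3, a2, a1: a1)

# A four-input AND and the same LUT after an upset in bit a5.
and4 = truth_table_for(lambda a4, a3, a2, a1: a4 & a3 & a2 & a1)
faulty_and4 = inject_seu(and4, 5)
changed_rows = [i for i in range(16) if and4[i] != faulty_and4[i]]
```

Programming a known function such as Y = A1 and reading back which configuration bits are set is one way to recover the correspondence between a0, ..., a15 and the configuration memory cells; the upset example shows why the LUT content does not need a database entry, since the faulty function is read directly from the bits.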
The decoding of the flip-flops refers to their specific configuration settings. In order to identify the correspondence of their bits with the configuration memory cells, it is necessary to activate all the possible configurations. Similarly to the multiplexer decoding, it consists of the following steps:

1. Identification, for each flip-flop of each configurable logic block, of the set of eligible configurations:
   ∀FFi, identification of A_FF = {CFG1, CFG2, ...}
2. Activation of each eligible configuration in A_FF for each FFi and definition of the corresponding set of configuration memory bits:
   ∀FFi, the set BITs_FFi = {bit1, bit2, bit3} is determined

The information on the configurable logic block resources consists of all the configuration memory bit data related to the multiplexers, flip-flops, and logic gates existing in each CLB of a tile. Each logic resource is uniquely associated with more than two bits of the configuration memory. The database consists of a key and a mask that identifies the set of bits of the logic resource. The field that identifies the resource consists of the name of the CLB within a tile (S1 or S0), the name of the logic resource, and its configuration. Please note that the CLB database does not include the information related to the LUTs, since the implemented function can be decoded directly from the values of the bits.
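A possible shape for such a key-and-mask database is sketched below. All resource names, configurations, and mask values are hypothetical; only the structure of the key (CLB name within the tile, logic resource, configuration) and the use of a bit mask follow the description above.

```python
# Hypothetical CLB database: key = (CLB name, resource, configuration),
# value = mask over the configuration bits of the tile (bit 0 = least significant).
CLB_DB = {
    ("S0", "MUX_F", "CFG1"): 0b0_0000_0111,   # assumed: bits 0, 1, 2
    ("S0", "FF_X", "CFG2"): 0b0_0011_1000,    # assumed: bits 3, 4, 5
    ("S1", "MUX_G", "CFG1"): 0b1_1100_0000,   # assumed: bits 6, 7, 8
}


def resources_hit(bit_index, database=CLB_DB):
    """Return the keys of the logic resources whose mask contains the upset bit."""
    probe = 1 << bit_index
    return [key for key, mask in database.items() if mask & probe]


def bits_of(key, database=CLB_DB):
    """Return the configuration bit positions controlled by one resource."""
    mask = database[key]
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


hit = resources_hit(4)        # an upset in bit 4 of the tile
miss = resources_hit(9)       # a bit that controls no CLB resource in this sketch
```

A lookup of this kind answers the question that the analysis needs: given the position of an upset bit within a tile, which logic resource and which configuration of that resource are affected.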
3.1.8 Topological Modifications Induced by SEUs and MCUs

In order to classify and localize the effects that SEUs and MCUs in the FPGA configuration memory induce on the logic and routing resources, it is necessary to identify the topological modification and to recognize the change introduced in the architecture. The effects in the FPGA configuration memory can be catalogued into two classes:

• Routing resources
• CLB resources

Most of the configuration memory bits related to a tile control routing resources. Each net of a circuit implemented by the FPGA architecture is realized by connecting logic modules through programmable interconnection points (PIPs). An SEU in a bit of the matrix that controls the routing resources can modify a PIP, altering or interrupting the propagation of one or more signals to the CLBs downstream of the effect. The effect scenario can be described starting from the original interconnection condition illustrated in Fig. 3.10, which implements two different routing nets, net0 and net1, using, respectively, the two PIPs I3 → O1 and I10 → O4.

Fig. 3.10 The original routing condition

Considering the configuration illustrated in Fig. 3.10, it is possible to identify all the possible effects induced by the modification of a bit in the configuration memory:

• Open: The PIP corresponding to net1 is no longer programmed; therefore, I10 and O4 are not connected. Two cases are classifiable as open. In the first case, illustrated in Fig. 3.11, net1 is deleted. In the second case, illustrated in Fig. 3.12, net1 is deleted and a new net netn connects an unused input node to the previously used output. In the second case, the signal netn has a logic value that cannot be identified by a local electrical analysis.
• Conflict: A new PIP, corresponding to netn, is added between an input node and an output node, both previously used, as illustrated in Fig. 3.13. The new PIP creates a conflict on the output node O1. The propagated signal cannot be identified by topological analysis alone; it is necessary to electrically stimulate the circuit in the corresponding location in order to determine its actual behavior.
• Input antenna: A new PIP, corresponding to netn, is added between an unused input node and a used output node, as illustrated in Fig. 3.14. The new PIP can influence the behavior of the output node depending on the output logic value assumed by the nodes of the CLB.
• Output antenna: A new PIP, corresponding to netn, is added between a used input node and an unused output node, as illustrated in Fig. 3.15. The new PIP does not influence the behavior of the implemented circuit.
• Bridge: The PIP corresponding to net1 is disabled while a new PIP, corresponding to netn, is instantiated between a used input node and the output node of the previously used net1, as illustrated in Fig. 3.16. The behavior of the implemented circuit is modified.
• Tolerant: The configuration of the instantiated PIP is not affected. The modification of the bits in the configuration memory does not change the topology of the nets.
• Unrouted: The modification of the PIP cannot be classified in any of the considered classes.

Fig. 3.11 The open effect, first case
Fig. 3.12 The open effect, second case
Fig. 3.13 The conflict effect
Fig. 3.14 The input antenna effect
Fig. 3.15 The output antenna effect
Fig. 3.16 The bridge effect

The bits of the logic resources inside the CLB are used to

• describe the logic function implemented by the LUT;
• configure the internal multiplexers of a CLB, selecting the internal routing resources;
• define the internal configuration of the CLB.

Modifications of the configuration memory bits corresponding to the CLB’s logic resources can provoke a malfunction of the implemented circuit, depending on the affected resources:

• Fault LUT: An error affecting a configuration bit of a LUT can modify the implemented logic function.
• Fault MUX: An error affecting the selection bits of a MUX can modify the path enabled to the output.
• Fault configuration CLB: An error affecting the bits related to the configuration resources of a CLB may induce a behavioral modification of the logic functionality of the circuit.
Fig. 3.17 The classification of the error effect in the FPGA’s logic and routing resources
The classification of the effects into the configuration memory can be summarized in the diagram illustrated in Fig. 3.17.
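The routing effect classes of Section 3.1.8 can be expressed as a comparison between the set of PIPs programmed before an upset and the set programmed after it. The sketch below is one possible reading of those definitions, not the authors' implementation: PIPs are modeled as (input node, output node) pairs, only upsets that add and/or remove a single PIP are classified, and every other change (including a new PIP between two unused nodes) is reported as unrouted by assumption.

```python
def classify(original, faulty):
    """Classify the routing effect of an upset from two sets of (input, output) PIPs."""
    used_inputs = {src for src, _ in original}
    used_outputs = {dst for _, dst in original}
    removed = original - faulty          # PIPs no longer programmed
    added = faulty - original            # PIPs newly instantiated

    if not removed and not added:
        return "tolerant"
    if len(removed) == 1 and not added:
        return "open"                    # first case: the net is simply deleted
    if len(added) == 1:
        new_in, new_out = next(iter(added))
        if not removed:
            if new_in in used_inputs and new_out in used_outputs:
                return "conflict"        # two drivers on a used output node
            if new_out in used_outputs:
                return "input antenna"   # unused input tied to a used output
            if new_in in used_inputs:
                return "output antenna"  # used input tied to an unused output
        elif len(removed) == 1:
            _, old_out = next(iter(removed))
            if new_out == old_out:
                # The deleted net's output is now driven by another input.
                return "bridge" if new_in in used_inputs else "open"
    return "unrouted"                    # assumption: anything not covered above


# Original condition of Fig. 3.10: net0 = I3 -> O1, net1 = I10 -> O4.
ORIGINAL = frozenset({("I3", "O1"), ("I10", "O4")})
NET1 = {("I10", "O4")}

effects = {
    "tolerant": classify(ORIGINAL, ORIGINAL),
    "open_1": classify(ORIGINAL, ORIGINAL - NET1),
    "open_2": classify(ORIGINAL, (ORIGINAL - NET1) | {("I5", "O4")}),
    "conflict": classify(ORIGINAL, ORIGINAL | {("I10", "O1")}),
    "input_antenna": classify(ORIGINAL, ORIGINAL | {("I5", "O1")}),
    "output_antenna": classify(ORIGINAL, ORIGINAL | {("I3", "O7")}),
    "bridge": classify(ORIGINAL, (ORIGINAL - NET1) | {("I3", "O4")}),
}
```

The node names I5 and O7 are hypothetical unused nodes introduced for the example. As noted in the text, this purely topological view cannot resolve the logic value produced by a conflict or by the second open case; those situations still require electrical analysis.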
3.2 Analysis Techniques

The problem of analyzing the impact of ionizing radiation on applications mapped on FPGA devices entails two aspects. The first aspect, which we can define as technology-dependent analysis, consists in quantifying the effect that a given radioactive environment induces on a given FPGA device in terms of cross section. As the name suggests, technology-dependent analysis focuses only on investigating the impact of ionizing radiation on the silicon technology adopted for manufacturing the FPGA device and does not take into consideration any specific application mapped on it. Indeed, the cross-section figures obtained from this analysis quantify the portion of particles that produce observable effects on the considered device, no matter which function it implements. For example, one of the measures produced by technology-dependent analysis is the configuration memory cross section, which gives the probability that the particles in a certain environment provoke SEUs in the bits of the FPGA configuration memory. The second aspect, which we can refer to as application-dependent analysis, consists in quantifying the impact of the radiation-induced effects on the application the FPGA is implementing. This aspect takes into account the results of the technology-dependent analysis and aims at identifying, among the particles that produce visible effects on the FPGA device, the portion that leads to observable effects on the application the FPGA implements.
To understand the difference between technology- and application-dependent analyses, let us consider a generic SRAM-based FPGA that implements a given circuit, which uses 10% of the FPGA resources. For the considered example, only 20% of the FPGA configuration memory bits hold meaningful information, i.e., only 20% of the bits hold values that, if changed by SEEs, corrupt the implemented circuit. When performing the technology-dependent analysis to measure the configuration memory cross section, we are looking for the portion of ionizing radiation that produces SEUs when hitting the surface of the FPGA device. During this analysis any observed SEU is considered, no matter whether it originates in the 20% of meaningful bits or in the remaining, unused 80%. Let us suppose that the result of this technology-dependent analysis is the FPGA configuration memory cross section $\sigma_{\mathrm{ConfigurationMemory}}$. Conversely, during application-dependent analysis, only the effects on the application of SEUs in the 20% of meaningful bits of the configuration memory are investigated. Let us suppose that the portion of SEUs that produce visible effects on the application the FPGA implements is $\mathrm{Application}$. The application-dependent FPGA configuration memory cross section $\sigma_{\mathrm{Application}}$ can be computed as follows:

$$\sigma_{\mathrm{Application}} = \sigma_{\mathrm{ConfigurationMemory}} \times \mathrm{Application} \qquad (3.5)$$
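Equation (3.5) is a simple scaling, and a worked example may help fix the roles of the two terms. The numbers below are purely hypothetical and are not measurements of any device; the function name is ours.

```python
def application_cross_section(sigma_configuration_memory, application_fraction):
    """Eq. (3.5): scale the device cross section by the portion of SEUs
    that produce visible effects on the application."""
    if not 0.0 <= application_fraction <= 1.0:
        raise ValueError("the portion of SEUs must lie between 0 and 1")
    return sigma_configuration_memory * application_fraction


# Hypothetical values, for illustration only.
sigma_cm = 1.0e-7     # configuration memory cross section, cm^2 per device
fraction = 0.05       # 5% of the observed SEUs disturb the application
sigma_app = application_cross_section(sigma_cm, fraction)   # about 5e-9 cm^2
```

In the example of this section the portion can never exceed 20%, because an SEU outside the meaningful bits cannot corrupt the implemented circuit.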
The technology-dependent analysis should be performed once, when a new FPGA device is introduced in the design flow. Once completed, its results can be re-used for any new design mapped on the same FPGA device. It is worth remarking that, since cross-section figures are obtained by observing the interaction between ionizing radiation in a certain radioactive environment and the FPGA silicon structures, new iterations of the technology-dependent analysis are needed as soon as the characteristics of the environment or the silicon structures change. For example, if the FPGA manufacturer produces an updated mask set (for example, to fix some bugs or to improve some functions), technology-dependent analysis should be repeated. Similarly, as variability can be observed among fabrication lots, technology-dependent analysis should be performed every time FPGA devices coming from new manufacturing lots are adopted. Conversely, application-dependent analysis should be performed every time a new design is produced or an existing design is updated. Indeed, different designs may entail different FPGA resource utilization and, therefore, different application-dependent cross sections are expected. As a result, application-dependent analysis techniques should be able to assist designers during their everyday work and should be intended as electronic design automation (EDA) tools, not dissimilar to simulation and synthesis tools. Due to the different purposes of technology- and application-dependent analyses, different approaches have been proposed, which can be grouped into four categories:
• Life testing, to observe the impact of the natural radiation environment on FPGA devices.
• Accelerated radiation testing, to observe the impact of an artificial radiation environment on FPGA devices, in order to accelerate the observation of meaningful events.
• Fault injection, to investigate the impact of meaningful fault models on applications mapped on FPGA devices using simulation tools or emulation platforms.
• Analytical techniques, to investigate the impact of meaningful fault models on applications mapped on FPGA devices without resorting to either simulation or emulation.

The first two categories encompass technology-dependent analysis techniques, while the last two comprise application-dependent analysis techniques.
3.2.1 Life Testing

In order to understand the impact of a radioactive environment on FPGA devices, the simplest approach is to place the devices of interest in the natural radioactive environment where they have to operate when deployed in a real application, and then to monitor them continuously, looking for symptoms of errors or impending errors [22]. As an example we can consider the Rosetta experiment currently run by Xilinx [12, 13]. The experiment aims at measuring the neutron cross section of the memory elements embedded in several generations of Xilinx SRAM-based FPGAs. It consists in initializing every memory element (configuration memory as well as user memory) with a known pattern and in monitoring every element, looking for any difference from the expected pattern. Given the number of observed errors N over a given period of time T_Experiment, we can compute the failure in time (FIT) of the device as follows:

$$\mathrm{FIT} = \frac{N}{T_{\mathrm{Experiment}}} \qquad (3.6)$$
In a natural radioactive environment the particle flux can be as low as $10^{-13}$ particles/(cm$^2$ × s), and therefore observing effects can take several months. To speed up the process, life-testing experiments are normally performed by monitoring a large number of devices (e.g., 200–1,000 parts), which acts as an acceleration factor. Table 3.2 reports the number of devices used in the experiment [13]. In the case of the Rosetta experiment the large number of devices is combined with an environmental acceleration factor. As the experiment targets atmospheric neutrons, Xilinx exploits altitude as an acceleration factor. Indeed, as the atmospheric neutron flux increases with altitude, bringing the devices to higher altitudes increases the probability of observing meaningful events in a given amount of time. Table 3.3 lists the locations used for deploying the Rosetta experiments. The results collected during the Rosetta experiment are published in [13] and summarized in Table 3.4.

Table 3.2 Devices under test

Device family          Device number   Technology (nm)   Quantity
Virtex-II FPGAs        XC2V6000        150               300
Virtex-II Pro FPGAs    XC2VP50         130               600
Spartan-3 FPGAs        XC3S1500        90                200
Virtex-4 FPGAs         XC4VLX25        90                400
Virtex-4 FPGAs         XC4VLX60        90                300
Virtex-5 FPGAs         XC5VLX110       65                300

Table 3.3 Sites used for the Rosetta experiment

Location                    Altitude (feet)
San Jose, CA                257
Marseilles, France          359
Longmont, CO                4,958
Albuquerque, NM             5,145
Pic du Bure, France         8,196
Pic du Midi, France         9,298
Aiguille du Midi, France    11,289
White Mountain, CA          12,442
Mauna Kea, HI               13,000
Rustrel, France             −1,600

Table 3.4 Results published in [13] (FIT/Mb)

Resource               Virtex-II   Virtex-II Pro   Virtex-4   Virtex-5
Configuration memory   401         384             246        151
Block RAM              397         614             352        635

Another relevant life-testing example is the Cibola Flight Experimental Satellite [6], which was launched into low-Earth orbit (560 km) in March 2007. The satellite embeds three reconfigurable computing processors, each encompassing three Xilinx XQVR1000 FPGAs. The satellite implements a monitoring feature that counts the SEUs affecting the nine FPGAs and sends the collected data to ground. The experiment detected 759 SEUs over 2,830.7 device days. Moreover, as the data about the observed SEUs also record the position of the satellite, the experiment showed that SEUs do not occur uniformly around the Earth: a peak of events was recorded when the satellite traversed the South Atlantic Anomaly.

Life testing is the most accurate approach to observe the impact of the real radioactive environment on the device of interest, as it entails no model of the radioactive environment nor of the device under investigation. However, it requires long observation times (e.g., 6 months or more) to collect statistically meaningful data. Moreover, life testing may entail a very expensive setup (e.g., large
quantities of devices installed at high-altitude facilities, or devices operated in an experimental satellite). As a result, more time- and cost-efficient approaches for performing technology-dependent analysis are needed.
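As a worked example of what life-testing data look like in practice, the Cibola counts reported earlier in this section can be turned into an average upset rate. The per-payload figure assumes that the nine FPGAs contribute equally, which the source does not state.

```latex
\[
\frac{759\ \text{SEUs}}{2{,}830.7\ \text{device days}} \approx 0.27\ \text{SEUs per device day},
\qquad
9 \times 0.27 \approx 2.4\ \text{SEUs per day for the nine FPGAs}.
\]
```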
3.2.2 Accelerated Radiation Testing

The purpose of accelerated radiation ground testing is to perform technology-dependent analysis in artificial radiation environments, where the particle flux is several orders of magnitude higher than in the natural environment [14]. As a result, a few minutes can model years of radiation exposure, which allows collecting statistically meaningful data in a much shorter time than life testing. Radiation testing is widely used to measure the cross sections needed for the technology-dependent characterization of the various elements that can be found in an FPGA architecture:

• Single event upset (SEU) cross section for events affecting user memory elements.
• Single event upset (SEU) cross section for events affecting configuration memory elements.
• Single event transient (SET) cross section for events affecting resources such as global clock and reset lines, and any other resource that is sensitive to SETs.
• Single event functional interruption (SEFI) cross section for events affecting the resources controlling the FPGA behavior (e.g., configuration control logic).

In the above-mentioned cases, radiation testing aims at collecting measurements to experimentally establish the threshold LET for the effects of interest (e.g., the threshold LET for the configuration memory SEU cross section is the lowest LET value for which SEUs in the device configuration memory are observed), and data useful for computing the saturation value of the cross section. Furthermore, radiation testing is used to quantify other aspects that define the radiation sensitivity of a device, in particular:

• Total ionizing dose limit, which quantifies the maximum radiation dose the device can withstand without showing any degradation.
• Single event latch-up threshold, which is the minimum LET value a particle must have to trigger single event latch-up in the device.
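The quantities named above can be illustrated with a short sketch. It assumes the usual definition of cross section as the number of observed events divided by the particle fluence, takes the threshold LET as the lowest LET at which any event was counted, and uses the largest measured cross section as a crude stand-in for the saturation value (in practice a curve is usually fitted to the data). The `BeamRun` type and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BeamRun:
    let: float      # LET of the particle species used in this run (MeV*cm^2/mg)
    fluence: float  # particles per cm^2 delivered during the run
    events: int     # events (e.g., configuration memory SEUs) counted during the run


def cross_section(run: BeamRun) -> float:
    """Observed events per unit fluence, in cm^2."""
    return run.events / run.fluence


def threshold_let(runs: List[BeamRun]) -> Optional[float]:
    """Lowest LET at which at least one event was observed, or None if there was none."""
    hits = [r.let for r in runs if r.events > 0]
    return min(hits) if hits else None


def saturation_estimate(runs: List[BeamRun]) -> float:
    """Crude saturation estimate: the largest measured cross section."""
    return max(cross_section(r) for r in runs)


# Hypothetical heavy-ion campaign with five runs at increasing LET.
runs = [
    BeamRun(let=1.0, fluence=1e7, events=0),
    BeamRun(let=3.0, fluence=1e7, events=4),
    BeamRun(let=10.0, fluence=1e6, events=250),
    BeamRun(let=40.0, fluence=1e6, events=900),
    BeamRun(let=60.0, fluence=1e6, events=910),
]
print(threshold_let(runs), saturation_estimate(runs))
```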
In order to perform accelerated radiation ground testing, a test facility and a suitable setup are needed, as depicted in Fig. 3.18. The main components are as follows:

• Particle accelerator, which is the source of the particle flux used during the experiment. Different technologies can be used to accelerate particles and to produce different cocktails of particles. Some accelerators produce mono-energetic particles, others multi-energetic particles; some produce a single particle type (e.g., protons), others a wide spectrum of particles (e.g., different types of heavy ions).
• Beam spreader, which distributes the particles over a larger beam than that coming from the accelerator.
• Beam collimator, which focuses the beam on a given spot size, typically ∼2 cm in diameter.
• Test chip, which is the device under test. Depending on the particle type used for the experiments, the test chip may need suitable preparation. Protons and neutrons do not require any specific preparation of the device, as they are able to traverse the whole chip and produce ionization (by indirect deposition of charge in the silicon). Heavy ions, being heavier than protons and neutrons, require the active area of the test chip to be exposed in order to produce ionization; otherwise they may not be able to traverse the package and the device substrate. As a result, wire-bond devices must have the top of the package de-lidded, while flip-chip devices must have the package de-lidded and the substrate thinned.
• Faraday cup, which collects the particles after they have reached the test chip.
• Chip tester, which is used to observe the test chip and to provide it with the needed input stimuli. Depending on the type of cross section being measured, different testers are needed. In the case of the SEU cross section the tester should be able to
  1. initialize the memory elements of the device to a known pattern before radiation exposure;
  2. observe the memory elements of the device after radiation exposure;
  3. compare the read information with the expected values;
  4. signal any mismatch to the monitoring station.
  More complex scenarios may be required for other phenomena. For example, in order to measure the SET cross section, the device is configured to implement an application specifically designed to observe SETs, and the chip tester must apply a set of input stimuli that lets SETs propagate to the outputs of the chip. Moreover, during radiation exposure the chip tester must constantly monitor the outputs of the chip to record any SET event and signal it to the monitoring station.
• Monitoring station, which collects the data gathered during the experiment. It is normally implemented using a PC or a laptop suitably connected with the chip tester.

Fig. 3.18 Typical setup for accelerated radiation testing (blocks shown: particle accelerator, beam spreader, beam collimator, test chip, Faraday cup, chip tester, radiation shield, monitoring station)

Radiation testing is an invaluable technique to quantify the response of a device to ionizing radiation; it is therefore mandatory any time a new device has to be characterized or a new silicon version of an already-characterized device becomes available. However, it cannot be regarded as a designer aid. Indeed, highly skilled personnel are needed to develop the setup for radiation testing and to carry out the measurements. Moreover, expensive facilities are needed, which may be accessible only for a few hours every month. As a matter of fact, radiation testing can hardly be used any time a new version of an application is ready to be deployed and evaluated on an FPGA device. As a result of these observations, we can claim that designers should be supported by analysis techniques that can be exploited much more frequently during everyday design work. Analysis techniques are needed that can be used like other tools (for example, HDL simulators) to assist designers in developing new applications mapped on FPGAs.
The new analysis techniques are focused on the application the designer is developing instead of the technology at the basis of the FPGA that will be used to implement the design.
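The four tester steps listed earlier in this section for the SEU cross section (initialize, observe, compare, signal) can be sketched as follows. This is a simulation only: `SimulatedDevice`, the checkerboard pattern, and the `expose` method that stands in for the beam are all hypothetical and do not describe any real tester interface.

```python
from typing import List


def checkerboard(n_bits: int) -> List[int]:
    """Known test pattern: alternating 0/1 (one common choice, assumed here)."""
    return [i % 2 for i in range(n_bits)]


class SimulatedDevice:
    """Stand-in for the test chip: a bit-addressable memory."""

    def __init__(self, n_bits: int) -> None:
        self.bits = [0] * n_bits

    def write(self, pattern: List[int]) -> None:
        self.bits = list(pattern)

    def read(self) -> List[int]:
        return list(self.bits)

    def expose(self, upset_addresses: List[int]) -> None:
        """Simulated radiation exposure: flip the bits at the given addresses."""
        for address in upset_addresses:
            self.bits[address] ^= 1


def seu_test(device: SimulatedDevice, n_bits: int, upset_addresses: List[int]) -> List[int]:
    expected = checkerboard(n_bits)
    device.write(expected)           # step 1: initialize to a known pattern
    device.expose(upset_addresses)   # radiation exposure (simulated)
    observed = device.read()         # step 2: observe the memory elements
    mismatches = [a for a, (e, o) in enumerate(zip(expected, observed)) if e != o]  # step 3
    return mismatches                # step 4: what would be reported to the monitoring station


print(seu_test(SimulatedDevice(16), 16, [3, 9]))  # prints [3, 9]
```

Dividing the number of reported mismatches by the delivered fluence would then give the SEU cross section for the run.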
3.2.3 Fault Injection

The concept of fault injection consists in inoculating a fault in a system and observing how the fault propagates within it, eventually reaching the system outputs [11]. This concept fits well with simulation tools, which offer the capability of studying the dynamic behavior of a system model. Fault injection can be provided as an additional feature of simulation tools, allowing designers to study the dynamic behavior of system models when affected by faults, alongside the more traditional and established features oriented to design debug and to performance and power evaluation. The basic components of a fault injection system are depicted in Fig. 3.19. The target system is the object of the investigation, where faults will be inoculated and whose behavior will be monitored to study the impact of the injected faults. For the purposes of this book, the target system is an FPGA-based design implementing a given application.

Fig. 3.19 Overview of a fault injection system (blocks shown: fault injection manager, fault injector, workload generator, target system, and monitor, collector, and analyzer)

With the term fault we refer to a malfunction that the target system may be subjected to during its lifetime, and which may make its behavior deviate from the intended one. The malfunctions that affect a system may be

• permanent, in case the malfunction always affects the system, or
• transient, in case the malfunction affects the system only when a certain set of conditions is met; otherwise, the system functions normally.

During injection, fault models are used to capture such malfunctions. As far as FPGA-based systems are concerned, three types of fault models are the most widely used:

• User-memory SEUs, to model the effect of ionizing radiation affecting FPGA user memory elements as a flip of one bit of user memory occurring randomly in time and space.
• Configuration memory SEUs, to model the effect of ionizing radiation affecting the memory elements in the FPGA configuration memory as a flip of one bit of configuration memory occurring randomly in time and space.
• Configuration control-logic SEUs, to model the effect of ionizing radiation affecting the memory elements within the FPGA configuration control logic as a flip of one bit occurring randomly in both time and space.
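The three fault models share the same shape: one bit flip at a random location and a random instant. A minimal sketch of a fault list built on that idea is given below. The `Fault` record, the memory sizes, and the choice to weight each memory by its size (so that every bit is equally likely to be hit) are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fault:
    target: str  # "user", "config", or "control"
    bit: int     # location of the upset within the target memory ("space")
    time: int    # injection instant in clock cycles ("time")


def make_fault_list(n_faults: int, sizes: Dict[str, int],
                    workload_cycles: int, seed: int = 0) -> List[Fault]:
    """Draw SEUs at random in space and time; memories are weighted by size."""
    rng = random.Random(seed)
    targets = list(sizes)
    weights = [sizes[t] for t in targets]
    faults = []
    for _ in range(n_faults):
        target = rng.choices(targets, weights=weights)[0]
        faults.append(Fault(target, rng.randrange(sizes[target]),
                            rng.randrange(workload_cycles)))
    return faults


def apply_seu(memory: List[int], bit: int) -> None:
    """An SEU complements exactly one stored bit."""
    memory[bit] ^= 1


# Hypothetical sizes (in bits) for the three memories of a small device.
fault_list = make_fault_list(5, {"user": 64, "config": 1024, "control": 32}, 500)
for fault in fault_list:
    print(fault)
```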
As other effects are becoming of interest, namely multiple cell upsets, single event transients, and SEFIs, other approaches have been proposed to support them. The main components of a typical fault injection system are as follows:

1. Injection manager: It supervises the injection campaign. Given the fault list, i.e., the list of faults that have to be inoculated into the system, it runs an injection experiment for each of them. Each experiment encompasses the following steps:
   (a) The next fault f to be injected is selected from the fault list.
   (b) The target system, the workload generator, the data collector, and the analyzer are set to the reset state.
   (c) The fault injector is programmed to inoculate the fault f, and the system monitor is programmed accordingly. For example, let us suppose the fault f has to be injected in the target system at time t_f, t_0 being the time when the first input stimuli are applied. The fault injector is instructed to stop the system at time t_f, to inoculate the fault (according to the type of the target system), and to resume the target system after fault injection to let the fault propagate. Moreover, the system monitor is programmed to trigger data collection from time t_f until the end of the workload.
   (d) The workload generator is activated. The input stimuli are applied to the target system; in accordance with step (c), fault f is injected in the target system and data are collected from t_f onward.
   (e) Upon completion of the workload, the fault effect is classified, and the whole procedure is repeated from step (a).
2. Fault injector: It inoculates a fault in the target system while the target system is being exercised by the workload generator. As detailed in the following sections, a number of techniques can be used to achieve fault inoculation, depending on the type of the target system.
3. Workload generator: It generates the input stimuli to activate the target system during the fault injection experiment. The input stimuli can be a synthetic workload generated ad hoc or real inputs taken from the application where the system will be deployed.
4. System monitor, collector, and analyzer: The monitor observes the target system and, when necessary, triggers the collection of data. When triggered by the monitor, the collector gathers from the target system the data that are useful to classify the impact of the inoculated fault; for example, it can collect the outputs produced by the target system as well as status information. The collected data are then processed by the data analyzer, which produces the classification of the fault effect. This task is normally performed by comparing the data collected on the faulty system with those produced by the fault-free system when activated by the same workload used during the injection experiment.

In general, a number of different implementations are possible for the fault injection system, which can be grouped into categories as a function of the type of the target system. We can have the following:
1. Simulation-based fault injection [5, 8, 11, 21]. This category collects all the methods developed to inoculate faults in a model of the target system whose dynamic behavior is evaluated through simulation tools. The category can be further divided into sub-categories according to the abstraction level at which the target system is modeled:
   (a) System-level simulation, where the system is described in terms of complex components like processors executing software, memories, and I/O peripherals, possibly connected through a network infrastructure. This abstraction level is suitable for modeling complex systems such as a cluster of computers that build a server farm.
   (b) Register-transfer-level simulation, where the system is described in terms of components like registers, arithmetic and logic units, and caches. This abstraction level is suitable when the target system is a component in a large infrastructure, for example, one of the network cards of a computer in a server farm.
   (c) Gate-level simulation, where the system is described in terms of logic gates and simple memory elements. This abstraction level is suitable when the target system is a component in a larger infrastructure, for example, a protocol controller chip inserted in a network interface card.
2. Emulation-based fault injection, where the system is first described as in the register-transfer-level case and is then emulated using dedicated hardware, like field programmable gate array (FPGA)-equipped boards. This category can be seen as an evolution of register-transfer-level simulation, where hardware emulation is exploited to boost the simulation performance. Indeed, when very complex workloads and very large fault lists have to be considered, the time spent for each fault injection experiment can be prohibitive, and means to reduce it are needed.
3. Software-based fault injection, where the system is a physical model, i.e., a prototype, composed of processors, memories, and I/O peripherals, possibly connected through a network infrastructure. Fault injection is implemented by means of specially crafted software that is added to the software the target system executes to implement the desired functionality. This category is intended for dependability evaluation when the prototype of the target system is available, and it can be applied only to processor-based systems.

Among the different possible implementations, emulation-based fault injection is often preferred for a number of reasons [4]. Hardware emulation allows much faster evaluation of the workload than any simulation-based approach; as fault injection entails the evaluation of very large sets of faults (which may encompass millions of faults), the fastest evaluation technique is generally preferable. FPGA-based hardware emulation can also greatly simplify fault modeling. Indeed, in order to assess the impact of a fault, the system model should include enough detail to allow a meaningful representation of the fault. As far as SEUs in the FPGA configuration memory and in its configuration
control logic are considered, the model used for simulation-based fault injection must include a suitable representation of them, resulting in a very complex model. Hardware emulation eliminates this representation problem at the root, provided that the very same FPGA used in the target system is exploited for hardware emulation.

As an example of an emulation-based fault injection system for FPGA-based target systems, we consider the FLIPPER tool [1–3]. FLIPPER is aimed at evaluating single event upset (SEU) and multiple cell upset (MCU) effects in the configuration memory and configuration control logic of Xilinx SRAM-based FPGAs. FLIPPER can be used to evaluate the SEU sensitivity of a design, for example, by collecting a probability distribution of the number of randomly injected faults in configuration memory necessary to cause a functional fault. This information can also be useful for defining the parameters of SEU/MCU mitigation techniques (such as the scrubbing rate for the configuration memory). Alternatively, FLIPPER can be used to probe the sensitive configuration memory bits of implemented designs, by systematically upsetting each memory bit. Finally, it can be used to model the occurrence of single event functional interruptions (SEFIs) by inoculating SEUs in the device control registers. FLIPPER comprises three main parts:

• A flexible FPGA-based board (control board) that controls the fault injection procedure.
• A device under test (DUT) board that contains the FPGA to be tested, an XQ2VR6000 device.
• A personal computer.

Fig. 3.20 Sequence of operations performed by FLIPPER for each fault in the configuration memory (flowchart: configure the device; while there is a fault to inject, corrupt the bits, apply the functional test, classify the fault as EFFECTLESS if the test is passed or as FAILURE otherwise, and fix the bits; end when no faults remain)

FLIPPER exploits the reconfiguration features offered by Xilinx devices. In such devices it is indeed possible to alter, via a suitable interface, the information stored in the configuration memory, as well as the content of the configuration control registers. Thanks to these features:

• SEUs and MCUs are injected by configuration memory manipulation. The FPGA device is configured once, downloading the nominal configuration information. Then, for each fault, the procedure outlined in Fig. 3.20 is repeated. Through reconfiguration, the bit or bits of the configuration memory targeted by the SEU/MCU are corrupted by complementing their current values. Then the workload is applied, and the outputs of the target system are monitored for any mismatch with respect to the expected values; this operation is referred to as the functional test. If the functional test is passed (i.e., the faulty system produces the same outputs as the fault-free one), the fault is classified as effectless; otherwise it is classified as a failure. At the end of the functional test, regardless of its outcome, the configuration memory is restored to its nominal values (the faulty bits are fixed), and the entire process is repeated. The location in the bitstream of an injected upset can be random, sequential, or user defined. In sequential mode every bit of the bitstream addressing the configuration memory is accessed and modified in sequential order. In random mode, SEUs are accumulated until the first output error is observed. In user-defined mode, selected locations for injection are provided via a text file. FLIPPER reports all discrepancies from the expected behavior through a real-time comparison of design outputs and gold vectors. Test results are collected in a text file, which can also be used to produce the probability distribution of the number of upsets needed to cause the first output error.
• SEUs are injected in the configuration control registers by randomly selecting, after the device has been configured according to its nominal configuration, one register among those in the configuration control logic and one bit among those in the selected register. The nominal value of the targeted register is then read back, the faulty value is computed by complementing the selected bit, and the faulty register value is written back to the FPGA configuration control logic. Finally, the functional test is applied and the produced outputs are compared with the expected ones. In case of mismatch, the fault is classified as a failure, otherwise as effectless.

The main disadvantage of fault injection tools, no matter whether they are based on emulation or not, is that the results they produce are workload dependent. Fault effects are classified by comparing the results produced by the faulty system with those of the fault-free one; as a result, two different workloads may lead to different fault effect classifications for the same fault injected in the same circuit. Certain faults may exist whose effect is observable if, and only if, a very peculiar workload is used. As a consequence, either the designer has all the possible workloads available, which is hardly possible for complex designs, or the analysis produced by fault injection tools captures only a fraction of all the possible effects. To overcome this limitation, a new generation of application-dependent analysis tools has been developed, which classifies fault effects independently of any given workload.
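The per-fault sequence of Fig. 3.20 and the workload dependence just discussed can both be seen on a toy example. The sketch below is not FLIPPER: the target system is a hypothetical 3-input LUT whose truth table occupies the first eight configuration bits, the remaining bits are unused, and all function names are illustrative.

```python
import random
from typing import Dict, List, Optional


def run_design(config: List[int], workload: List[int]) -> List[int]:
    """Toy target system: a 3-input LUT stored in config[0:8]; other bits are unused."""
    return [config[x] for x in workload]


def functional_test(config: List[int], workload: List[int], golden: List[int]) -> bool:
    return run_design(config, workload) == golden


def sequential_campaign(nominal: List[int], workload: List[int]) -> Dict[int, str]:
    config = list(nominal)                   # configure the device once
    golden = run_design(nominal, workload)   # fault-free reference outputs
    result = {}
    for bit in range(len(config)):           # is there a fault to inject?
        config[bit] ^= 1                     # corrupt the bit
        passed = functional_test(config, workload, golden)
        result[bit] = "effectless" if passed else "failure"
        config[bit] ^= 1                     # fix the bit
    return result


def upsets_to_first_error(nominal: List[int], workload: List[int],
                          rng: random.Random, max_upsets: int = 10_000) -> Optional[int]:
    """Random mode: accumulate upsets until the first output error is observed."""
    config = list(nominal)
    golden = run_design(nominal, workload)
    for count in range(1, max_upsets + 1):
        config[rng.randrange(len(config))] ^= 1
        if not functional_test(config, workload, golden):
            return count
    return None  # no error observed within the upset budget


nominal = [0, 1, 1, 0, 1, 0, 0, 1] + [0] * 4  # 8 LUT bits plus 4 unused bits
partial = sequential_campaign(nominal, [0, 1, 2, 3])
full = sequential_campaign(nominal, list(range(8)))
print([b for b, v in partial.items() if v == "failure"])
print([b for b, v in full.items() if v == "failure"])
```

With the partial workload only the LUT entries it exercises are classified as failures, while the exhaustive workload also exposes the remaining LUT bits; the unused bits are effectless in both cases.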
3.2.4 Analytical Techniques

Analytical techniques aim at classifying the effects of faults without resorting to any given workload, and they are mainly used for studying the effects of SEUs/MCUs in the device configuration memory. The basic idea behind this approach is to calculate the modified circuit resulting from the occurrence of configuration memory upsets, and then to compare the modified circuit with the nominal one. As an example of this class of application-based analysis tools, we present STAR [18, 20]. STAR is a technique for predicting the possible impact of SEUs/MCUs in SRAM-based FPGA systems without resorting to either simulation or fault injection. The technique is based on a topological inspection of the design implemented using SRAM-based FPGAs. By coupling information about the modifications that upsets may induce in the resources of the FPGA device in use with a set of dependability rules, the technique is able to identify all the possible upsets that modify the circuit topology in such a way that, when a suitable stimulus is applied to the circuit's inputs, the circuit produces erroneous results. The technique is pessimistic in the sense that it identifies all the possible sources of errors, independently of the workload the circuit is supposed to process. If an upset requires a peculiar input stimulus to be observed, and that stimulus is not included in the circuit's workload, the upset will never lead the circuit to produce an error when deployed in its mission.
STAR was originally developed to support application-dependent analysis of the effects of SEUs in the configuration memory of Xilinx Virtex-I and Virtex-4 devices. It operates in two modes:

• Discovery mode, which is used to identify which bits of the configuration memory are actually sensitive for the circuit the FPGA implements. Given a circuit mapped on an FPGA, we can classify the bits in the device configuration memory as follows:
  1. Not programmed and not sensitive, in case the resource controlled by the configuration memory bit under analysis is not used by the mapped circuit, and any modification induced by an SEU in the bit does not modify the mapped circuit.
  2. Not programmed and sensitive, in case the resource controlled by the configuration memory bit under analysis is not used by the mapped circuit, but a modification induced by an SEU in the bit has side effects on the mapped circuit. For example, a short circuit is created between two wires of the mapped circuit.
  3. Programmed and sensitive, in case the resource controlled by the configuration memory bit under analysis is used by the mapped circuit, and a modification induced by an SEU in the bit has side effects on the mapped circuit.
• TMR mode, which is used to identify which of the sensitive bits of the configuration memory, as identified by the discovery mode, are able to defeat the fault mitigation techniques the mapped circuit embeds.

STAR exploits a set of dependability rules that must be enforced by a mapped circuit in order to be resilient to SEUs in the device configuration memory. As the modifications to an FPGA configuration following the occurrence of a soft error are finite and known, STAR is able first to derive the modified circuit starting from the original one and then to verify whether the modified circuit complies with the dependability rules.
In case violations are found, STAR reports information about the soft error responsible for the violations (for example, the address of the bit within the device configuration memory that, when altered by a soft error, defeats the adopted mitigation technique). The analyzer algorithm exploits a generic architectural model of SRAM-based FPGAs consisting of three kinds of resources, as shown in Fig. 3.21: logic blocks, switch boxes, and wiring segments [19]. The logic blocks model the CLBs and contain the combinational and sequential logic required to implement the user circuit. The input and output signals are connected to adjacent switch boxes through wiring segments. The switch boxes are switch matrices where several programmable interconnect points (PIPs), called routing segments and controlled by the configuration memory, are available. We modeled the resources within SRAM-based FPGAs as vertices and edges of a graph: logic vertices model the FPGA's logic blocks, routing vertices model the input/output points of the switch boxes, routing edges model the PIPs, and wiring edges model the FPGA's wiring segments.

Fig. 3.21 SRAM-based model used by STAR

STAR reads the files produced by the synthesis flow after place and route operations, and it builds the generic model for the mapped circuit, which we refer to as M. Then, for each bit of the configuration memory the following operations are performed:

1. The bit of the configuration memory is upset, to simulate the occurrence of one SEU in the considered bit.
2. A new generic model, which we refer to as M′, is computed starting from the modified configuration memory.
3. The model M is compared with M′. In case a difference is observed, the configuration memory bit responsible for the difference is considered as sensitive (either programmed or not programmed depending on the initial configuration memory). In the case of TMR mode, the modified model M′ is further analyzed
to assess whether the upset in the configuration memory resulted in a circuit that violates a set of dependability rules.

As far as logic block errors are concerned, several different modifications may result from an upset in the configuration memory, depending on which resource of the logic block the SEU affected:

• LUT error: the SEU modified one bit of a LUT, thus changing the combinational function it implements.
• MUX error: the SEU modified the configuration of a MUX in the logic block; as a result, signals are not correctly forwarded inside the logic block.
• FF error: the SEU modified the configuration of an FF, for example, changing the polarity of the reset line or that of the clock line.

As far as switch boxes are concerned, different phenomena are possible. Although an SEU affecting a switch box modifies the configuration of one programmable interconnection point (PIP) in it, both single and multiple effects can originate. Single effects happen when the modifications induced by the SEU alter only the affected PIP. In this case one situation may happen, which we call open: the SEU changes the configuration of the affected PIP in such a way that the existing connection between two routing segments is opened. In the routing graph we model such a situation by deleting the routing edge corresponding to the PIP that connects the two routing vertices. In order to describe the multiple effects in terms of modifications to the routing graph, let us consider the two routing edges A_S/A_D and B_S/B_D connecting the routing vertices A_S, A_D, B_S, B_D, as shown in Fig. 3.22a.
Fig. 3.22 Possible multiple effects induced by one SEU in a bit of the configuration of a switch matrix
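The per-bit discovery loop described above (upset the bit, rebuild the model, compare M with M′) can be sketched on a toy routing graph. The sketch makes strong simplifying assumptions that are not taken from STAR: every configuration bit controls exactly one PIP, the model contains routing edges only, and an edge belongs to the model when it is enabled and touches a routing vertex used by the mapped circuit. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pip = Tuple[str, str]  # a routing edge between two routing vertices


def build_model(config: List[int], pips: List[Pip],
                circuit_vertices: Set[str]) -> FrozenSet[Pip]:
    """Toy generic model: enabled routing edges that touch the mapped circuit."""
    return frozenset(
        pips[bit] for bit, value in enumerate(config)
        if value and (pips[bit][0] in circuit_vertices or pips[bit][1] in circuit_vertices)
    )


def discovery(config: List[int], pips: List[Pip]) -> Dict[int, str]:
    # Routing vertices used by the mapped circuit in the nominal configuration.
    circuit_vertices = {v for bit, value in enumerate(config) if value for v in pips[bit]}
    m = build_model(config, pips, circuit_vertices)
    classes = {}
    for bit in range(len(config)):
        upset = list(config)
        upset[bit] ^= 1                                        # step 1: upset the bit
        m_prime = build_model(upset, pips, circuit_vertices)   # step 2: compute M'
        programmed = "programmed" if config[bit] else "not programmed"
        sensitive = "sensitive" if m_prime != m else "not sensitive"  # step 3: compare
        classes[bit] = f"{programmed} and {sensitive}"
    return classes


pips = [("A_S", "A_D"), ("B_S", "B_D"), ("A_D", "B_S"), ("C_S", "C_D")]
config = [1, 1, 0, 0]  # the mapped circuit uses the edges A_S/A_D and B_S/B_D
for bit, label in discovery(config, pips).items():
    print(bit, label)
```

In this example, upsetting bit 0 or bit 1 opens a used connection, upsetting bit 2 adds an edge between the two nets (a short), and upsetting bit 3 enables a PIP that touches nothing in the mapped circuit.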
We identified the following modifications that could be introduced by an SEU:

• Short between A_S/A_D and B_S/B_D. As shown in Fig. 3.22b, a new routing edge is added to the graph that connects one end of A to one end of B. This effect can happen if A_S/A_D and B_S/B_D belong to the same switch box, and the SEU enables the non-decoded or decoded PIP that connects B with A.
• Open, which corresponds to the deletion of both routing edges A_S/A_D and B_S/B_D, as shown in Fig. 3.22c. This situation may happen if a decoded PIP controls both A_S/A_D and B_S/B_D.
• Open/short, which corresponds to the deletion of either the routing edge A_S/A_D or the routing edge B_S/B_D and to the addition of the routing edge A_S/B_D or B_S/A_D, as shown in Fig. 3.22d. This situation may happen if a decoded PIP controls both A_S/A_D and B_S/B_D.

The short effects, as shown in Fig. 3.22b, may happen if two nets are routed on the same switch box and a new edge is added between them. This kind of faulty effect happens when a cross-point PIP, which is non-buffered and has bidirectional capability, links two wire segments located in disjoint planes. Conversely, the open and the open/short effects, as shown in Fig. 3.22c and d, may happen if two nets are routed using decoded PIPs. The effects of configuration memory upsets on the topological characteristics of routing and logic resources are described in detail in Section 3.1.6.

As far as the dependability rules are concerned, different SEU mitigation techniques correspond to different rules that have to be enforced by the mapped circuit to be insensitive to configuration memory upsets. As an example, in case TMR mitigation is considered, we have defined the following rules:

1. All the circuit modules and connections must be replicated three times.
2. The outputs of the three circuit replicas must be voted according to the TMR principle.
3. The elements of the resulting TMR architecture (logic functions and connections among them) must be placed and routed in such a way that, given the corresponding routing graph, no edge added to (or deleted from) the graph can provoke a fault belonging to the following categories:
   (a) Short between different connections belonging to different circuit replicas.
   (b) Open affecting different connections belonging to different circuit replicas.

With the development of newer generations of devices, MCUs are no longer negligible.
MCUs in non-rad-hard SRAM-based FPGAs have been observed during accelerated radiation testing experiments with protons and heavy ions. In particular, a study presented in [16] shows that MCUs are on the rise when analyzing the occurrence of proton and heavy ion effects in four different Xilinx FPGA families. The study shows that SEUs dominate in older devices such as Virtex-I, where 1-bit upsets account for 99.96% of the observed events. On more recent devices, such as Virtex-4 and Virtex-5, a growing contribution from 2-bit events is observed, which account for 5.43% (Virtex-4) and 8.79% (Virtex-5) of the events, and 3- and 4-bit events are also recorded. As a result, STAR has been extended to support the analysis of MCU effects in the device configuration memory alongside SEU analysis. The MCU-aware STAR analyzes the effects of multiple upsets in the configuration memory of SRAM-based FPGAs as soon as a placed and routed model of the design implementing the desired application is available. The tool is composed of the modules illustrated in Fig. 3.23: native circuit description, layout-aware static analyzer, and MCU violations. The native circuit description contains the structural and topological descriptions of the circuit, which consists of logic functions
Fig. 3.23 Architecture of STAR targeting MCUs
(either combinational or sequential) and connections between them. The resources used by the placed and routed circuit are described in terms of their addresses in the configuration memory. The tool checks the placed and routed circuit by analyzing the locations of the sensitive MCUs that affect the memory elements embedded in the design and the configuration memory. In detail, the tool is composed of three main modules: the redundancy cluster extractor, the dependability rules, and the rules checker:
• The redundancy cluster extractor reads the native circuit description and extracts the place and route information related to each cell of the FPGA architecture. This information is processed by a clustering algorithm that groups the data depending on the FPGA topology and on the redundancy structure of the adopted hardening technique.
• The dependability rules are a database of constraints, related to the topology of the non-rad-hard FPGA, that must be fulfilled by the placed and routed circuit in order to be resilient to the effects provoked by MCUs.
• The rules checker applies the dependability rules: it reads each cluster and analyzes all the bits of the FPGA’s configuration memory. It returns a list of MCUs (MCU violations) that provoke critical modifications that may defeat the adopted hardening technique.
The tool is based on a layout geometry database containing the information extracted from the laser screening. It contains the spatial distribution of the configuration memory along both the X- and Y-axes. The MCU-effect analysis is performed by selecting a desired sensitive radius R (µm): given a configuration memory cell CM_0, each cell CM_i that is within a distance R from CM_0 is considered as the location of an
MCU (M_0, M_i). In the current implementation of the tool, only MCUs corresponding to the bit flip of two memory cells are considered. As suggested by the available data, the 2-bit MCU is the most significant effect besides SEUs in recent generations of Xilinx devices. STAR analyzes MCUs by considering clusters of adjacent configuration memory bits, as illustrated in Fig. 3.24a. As illustrated in Fig. 3.24b, MCUs may affect logic components belonging to the following sets: CLBs, block RAMs (BRAMs), BRAM interconnects, and IOBs. Each resource set is controlled by a defined number of configuration memory frames, where each frame corresponds to a configuration column of SRAM cells of the FPGA. Depending on the orientation of the MCU events (single column, single row, or diagonally adjacent cells), the provoked effects may simultaneously corrupt resources of a single set or of two sets whose configuration memory bits are adjacent.
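The radius-based selection of 2-bit MCU locations described above can be sketched in a few lines. This is only an illustrative sketch and not the STAR implementation: the function names `mcu_pairs` and `orientation`, the cell identifiers, and the toy coordinates are assumptions introduced here, and "within a distance R" is assumed to include cells at exactly distance R.

```python
from itertools import combinations
from math import hypot


def mcu_pairs(cells, radius_um):
    """Return all 2-cell MCU candidates whose cells are at most radius_um apart.

    `cells` maps a (hypothetical) cell identifier to its (x, y) position in µm,
    playing the role of the layout geometry database.
    """
    pairs = []
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(sorted(cells.items()), 2):
        if hypot(xa - xb, ya - yb) <= radius_um:
            pairs.append((id_a, id_b))
    return pairs


def orientation(pos_a, pos_b):
    """Classify a pair of cell positions as single column, single row, or diagonal."""
    (xa, ya), (xb, yb) = pos_a, pos_b
    if xa == xb:
        return "single column"
    if ya == yb:
        return "single row"
    return "diagonal"


# Toy 2x2 layout with a 1 µm pitch (made-up coordinates, not device data).
layout = {"CM0": (0.0, 0.0), "CM1": (0.0, 1.0), "CM2": (1.0, 0.0), "CM3": (1.0, 1.0)}
pairs = mcu_pairs(layout, radius_um=1.0)
```

With R = 1.0 only the row and column neighbours are selected; enlarging R to 1.5 also includes the two diagonal pairs.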
Fig. 3.24 (a) Adjacent cells affected by multiple cell upsets. (b) General organization of the Virtex-II configuration memory layout
We can define the MCU effects using the following parameters:
• Orientation: It defines the position of the MCU within the FPGA’s configuration memory, as single column, diagonal, or single row.
• Case: It defines the transitions induced by the MCU within the FPGA’s configuration memory cells as 00 → 11, 01 → 10/10 → 01, or 11 → 00.
• Effects: It defines the effects induced by the MCU as short, open, short/open, logic, and logic routing.
The classification of the effects can be further refined by considering the number of bits and the occurrence of the effects. Considering a pair of vertices A_S/A_D and B_S/B_D linked by two distinct interconnection segments and controlled by two
Fig. 3.25 MCU fault effects scenario. (a) The original configuration topology of the vertices A_S/A_D and B_S/B_D, as defined by the configuration memory bits. (b) A double open effect, where two different bits in a vertical orientation affect two separate interconnections. (c) An open 2-bit effect, where both the involved bits are related to a single interconnection. (d) An open/short effect
configuration memory bits each, as illustrated in Fig. 3.25a, we can have the following scenarios related to the interconnection resources:
• Open or short 1-bit: Only one bit of the two cells affected by the MCU provokes a failure effect.
• Double open or short: Both the bits of the two cells affected by the MCU provoke failure effects. In particular, each bit affects a distinct interconnection of the TMR structure. For example, Fig. 3.25b shows the double open effect that occurs when two different bits in a vertical orientation affect two separate interconnections.
• Open or short 2-bit: Both the bits of the two cells affected by the MCU provoke failure effects. In this case, both the bits are related to a single interconnection, and thus the MCU does not corrupt the TMR structure. Fig. 3.25c shows an example of an open 2-bit.
• Open–short: Both the bits of the two cells affected by the MCU provoke failure effects. In particular, one bit induces an open effect and the other one a short effect between distinct interconnections, as illustrated in Fig. 3.25d.
When logic resources are considered, the following cases apply:
• Logic failure: Both the bits of the two cells affected by the MCU provoke a failure in a single logic block of the FPGA.
• Logic-routing failure: Both the bits of the two cells affected by the MCU provoke failure effects. In particular, one cell controls logic resources and the other one controls interconnection resources.
STAR does not consider MCUs affecting IOBs and BRAMs.
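The scenarios listed above can be read as a small decision procedure over the two upset bits. The following sketch encodes that reading; it is not the STAR rules checker. The `BitEffect` record, its field names, and the "unclassified" and "no effect" labels for combinations that the text does not enumerate are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BitEffect:
    kind: str                     # "none", "open", "short", or "logic"
    target: Optional[str] = None  # hypothetical name of the affected interconnection or logic block


def classify_mcu(a: BitEffect, b: BitEffect) -> str:
    """Classify a 2-bit MCU from the individual effect of each upset bit."""
    routing = {"open", "short"}
    failing = [e for e in (a, b) if e.kind != "none"]
    if not failing:
        return "no effect"
    if len(failing) == 1:
        # Only one of the two bits provokes a failure effect.
        only = failing[0]
        return f"{only.kind} 1-bit" if only.kind in routing else "unclassified"
    if a.kind == "logic" and b.kind == "logic":
        # The text covers only the case of a single logic block.
        return "logic failure" if a.target == b.target else "unclassified"
    if "logic" in (a.kind, b.kind):
        return "logic-routing failure"
    # From here on, both bits affect interconnection resources.
    if a.kind != b.kind:
        return "open-short"
    if a.target == b.target:
        return f"{a.kind} 2-bit"   # single interconnection: TMR structure not corrupted
    return f"double {a.kind}"      # two distinct interconnections are affected


result = classify_mcu(BitEffect("open", "A"), BitEffect("open", "B"))
```

In this reading, the double and open-short cases are the ones that can involve two distinct interconnections of the TMR structure, which is why they matter for the dependability rules.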
References
1. M. Alderighi, F. Casini, M. Citterio, S. D’Angelo, M. Mancini, S. Pastore, G.R. Sechi, and G. Sorrenti, Using flipper to predict proton irradiation results for virtex 2 devices: A case study, IEEE Transactions on Nuclear Science 56 (2009), no. 4, 2103–2110.
2. M. Alderighi, F. Casini, S. D’Angelo, M. Mancini, S. Pastore, and G.R. Sechi, Evaluation of single event upset mitigation schemes for sram based fpgas using the flipper fault injection platform, Defect and Fault-Tolerance in VLSI Systems, 2007. DFT ’07. Proceedings of the 22nd IEEE International Symposium on, Rome, 2007, pp. 105–113.
3. M. Alderighi, F. Casini, S. D’Angelo, M. Mancini, S. Pastore, L. Sterpone, and M. Violante, Soft errors in sram-fpgas: A comparison of two complementary approaches, IEEE Transactions on Nuclear Science 55 (2008), no. 4, 2267–2273.
4. A. Benso and P. Prinetto, Fault injection techniques and tools for embedded systems reliability evaluation, Springer, Dordrecht, The Netherlands, 2003.
5. J. Boue, P. Petillon, and Y. Crouzet, Mefisto-l: A vhdl-based fault injection tool for the experimental assessment of fault tolerance, Fault-Tolerant Computing, 1998. Digest of Papers. Proceedings of the 28th Annual International Symposium on, Washington, DC, 1998, pp. 168–173.
6. M. Caffrey, K. Morgan, D. Roussel-Dupre, S. Robinson, A. Nelson, A. Salazar, M. Wirthlin, W. Howes, and D. Richins, On-orbit flight results from the reconfigurable cibola flight experiment satellite (cfesat), Field Programmable Custom Computing Machines. FCCM ’09 Proceedings of the 17th IEEE Symposium on, Piscataway, 2009, pp. 3–10.
7. M. Ceschia, M. Bellato, A. Paccagnella, and A. Kaminski, Ion beam testing of altera apex fpgas, Radiation Effects Data Workshop, 2002 IEEE, Piscataway, 2002, pp. 45–50.
8. T.A. Delong, B.W. Johnson, and J.A. Profeta III, A fault injection technique for vhdl behavioral-level models, Design Test of Computers, IEEE 13 (1996), no. 4, 24–33.
9. R.D. Schrimpf and D.M. Fleetwood, Radiation effects and soft errors in integrated circuits and electronic devices, World Scientific, Hoboken, NJ, 2004.
10. E. Fuller, M. Caffrey, P. Blain, C. Carmichael, N. Khalsa, and A. Salazar, Radiation test results of the virtex fpga and zbt sram for space based reconfigurable computing, Military and Aerospace Programmable Logic Devices (MAPLD) Conference, Washington, DC, 1999.
11. Mei-Chen Hsueh, T.K. Tsai, and R.K. Iyer, Fault injection techniques and tools, IEEE Computer 30 (1997), no. 4, 75–82.
12. A. Lesea, S. Drimer, J.J. Fabula, C. Carmichael, and P. Alfke, The rosetta experiment: Atmospheric soft error rate testing in differing technology fpgas, IEEE Transactions on Device and Materials Reliability 5 (2005), no. 3, 317–328.
13. Austin Lesea, Continuing experiments of atmospheric neutron effects on deep submicron integrated circuits, Tech. report, Xilinx, 2009.
14. T.P. Ma and P.V. Dressendorfer, Ionizing radiation effects in mos devices and circuits, Wiley, New York, NY, 1989.
15. H. Quinn, P. Graham, J. Krone, M. Caffrey, and S. Rezgui, Radiation-induced multi-bit upsets in sram-based fpgas, IEEE Transactions on Nuclear Science 52 (2005), no. 6, 2455–2461.
16. H. Quinn, P. Graham, K. Morgan, J. Krone, and M. Caffrey, Fpga testing and trends, Presented at SERESSA school, West Palm Beach, FL, 2008.
17. S. Rezgui, J.J. Wang, E.C. Tung, B. Cronquist, and J. McCollum, Comprehensive see characterization of 0.13μm flash-based fpgas by heavy ion beam test, Proceedings of the 9th European Conference on Radiation and Its Effects on Components and Systems, Deauville, 2007, pp. 1–6.
18. L. Sterpone and M. Violante, A new analytical approach to estimate the effects of seus in tmr architectures implemented through sram-based fpgas, IEEE Transactions on Nuclear Science 52 (2005), no. 6, 2217–2223.
19. L. Sterpone and M. Violante, A new reliability-oriented place and route algorithm for sram-based fpgas, IEEE Transactions on Computers 55 (2006), no. 6, 732–744.
20. L. Sterpone, M. Violante, R.H. Sorensen, D. Merodio, F. Sturesson, R. Weigand, and S. Mattsson, Experimental validation of a tool for predicting the effects of soft errors in sram-based fpgas, IEEE Transactions on Nuclear Science 54 (2007), no. 6, 2576–2583.
21. R. Velazco, P. Fouillat, and R. Reis, Radiation effects on embedded systems, Springer, Dordrecht, 2007.
22. James F. Ziegler, Ser-history, trends and challenges, Cypress, San Jose, 2004.
Chapter 4
Reconfigurable Field Programmable Gate Arrays: Hardening Solutions
During the past years, several mitigation techniques have been proposed to increase the reliability of circuits for avionics and space applications and, in particular, to remove single and multiple points of failure from the designs. The available techniques depend on the kind of FPGA technology. On the one hand, they rely on technological modifications, driven in part by the progressive improvement of the manufacturing process and in part by the need to increase the reliability of FPGA devices and their capacity to tolerate faults. On the other hand, mitigation techniques can be applied at the application level, to exploit commercial technology while still achieving the required degree of reliability. As explained in Chapter 3, different kinds of faults can be considered depending on the technology. In this chapter, we describe the hardening solutions proposed for the different FPGAs, dividing them into two broad categories: techniques for the FPGA manufacturers and techniques for the FPGA users, also referred to as application designers.
4.1 Overview on the Design Process for FPGA Applications
To study and analyze the various hardening techniques that can be applied to FPGAs in order to enable their use in mission-critical applications, we need to briefly outline the design process used to implement applications on such devices and identify at which points mitigation techniques can be applied. The normal design process can be divided into two main parts: the device design/manufacturing process and the application design process. The first aims at designing the FPGA itself, like a particular ASIC, according to the architecture described in Chapter 2. The second aims at designing the application that will be implemented on such an FPGA. For the purposes of this book, we define the first process as composed of two main phases:
1. Architecture design
2. Physical layout design
N. Battezzati et al., Reconfigurable Field Programmable Gate Arrays for Mission-Critical Applications, DOI 10.1007/978-1-4419-7595-9_4, © Springer Science+Business Media, LLC 2011
The application design process can instead be divided into six different phases:
1. Register transfer level (RTL) description
2. Synthesis
3. Mapping
4. Placement
5. Routing
6. Bitstream generation
Figure 4.1 shows the design process with all its phases. First, the architecture of the FPGA device is designed, deciding which logic and interconnection resources will be available to the end-user. Then, the physical layout of the device is generated and, finally, production is started to manufacture the actual chips. Once the FPGA is available to the end-user (the application designer), the first phase of the application design is the implementation of the RTL representation of the application itself. Second, the synthesis and mapping phases translate the RTL representation into a more detailed and technology-dependent one, the netlist. This circuit is then placed and routed, producing the final version of the application that will be implemented on the device. The last phase is the bitstream generation, which produces the content of the FPGA configuration memory, so that the device will implement the desired functionality. During all the steps of the FPGA-based application design process, mitigation and hardening techniques to cope with radiation effects can be introduced, with different costs and benefits.
Fig. 4.1 FPGA-based application design process. FPGA design: architecture design, physical layout design, production. Application design: RTL description, synthesis & mapping, placement & routing, bitstream generation
4.1.1 FPGA Design
The FPGA design is the first step in realizing an FPGA-based application. It is the process of designing and manufacturing a device that can be configured one or more times and that has a homogeneous and regular structure, with programmable logic resources, interconnections, and communication protocols to implement a certain functionality and exchange information with the outside world. This process is composed of several steps, but two of them are particularly important from the reliability point of view, because they allow the designer to implement mitigation techniques against radiation effects. The first is the architecture design and the second is the physical layout design.
4.1.1.1 FPGA Architecture Design
FPGA architecture design is the process aimed at defining which kinds of resources the device will be composed of, how they will be interconnected, and how they will be programmed in order to provide the configurability that characterizes FPGAs. The architecture of an FPGA is defined by its logic resources, which implement configurable basic functions or even complex hardwired cores; by the interconnection network, which routes signals among the different logic resources and can be flat or hierarchical; and by the memory and circuitry that enable the reconfiguration of the whole device, or of only portions of it, to implement the desired functionalities. Finally, the input/output blocks implement interfaces for several communication protocols to transfer data from the FPGA to the external devices attached to it. Starting from this point of the design flow, the FPGA designers can insert hardening and mitigation techniques to cope with radiation effects, in particular with single event effects.
Redundancy can be applied to logic resources, interconnections, configuration memory cells, as well as communication interfaces and hardwired cores to make them robust against SEEs, thus achieving fault tolerance inside the device itself. FPGAs that implement mitigation techniques at the architectural level are referred to as radiation hardened by design (RHBD), because the hardening mechanisms are introduced during the architecture design phase.
4.1.1.2 FPGA Physical Layout Design
The second phase of the FPGA design process is the physical layout. As in normal ASICs, once the architecture is defined and designed, the technology used to implement it has to be chosen. Different technologies have different characteristics in terms of performance, power consumption, area occupation, and also radiation sensitivity. In particular, different materials and transistor form factors could drastically change the device resilience to radiation. The layout design phase is composed of different steps. First, the technology mapping is performed, and the FPGA resources, synthesized as a set of basic
cells, are associated with a particular technology. Such a technology could also have been specifically designed to be radiation resilient, and in the next section we will introduce some of the possible techniques that can be used to achieve this. After the technology mapping, the placement and routing phases are performed and optimized to meet the performance and the other constraints previously defined. Finally, the physical layout of the different material zones in the device is automatically defined by the design tools; however, it can be manually modified in order to add robustness to the device itself. At different points of this phase, the technology designers can intervene to insert hardening measures that make the FPGA robust against radiation, with respect to both ionizing dose and single event effects. First of all, a particular technology can be designed, for both the configuration memory and the FPGA fabric itself. Then, during the placement and routing phases, as well as during the manual layout definition phase, countermeasures can be taken to improve the technology robustness. This is done in two main ways. The first approach is to increase the distances between sensitive points in order to reduce the probability that they share charge injected by radiation. The second approach is to enlarge the transistors of critical resources in order to increase their Q_crit, thus making them more resilient to radiation. Both of these techniques tend to reduce the overall performance of the device and to increase the cost of the design process, because they move it away from the standard ones. Devices that use such approaches are called technology-hardened or rad-hard FPGAs.
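As a first-order illustration of the second approach (an assumption added here, not a formula taken from this chapter), the critical charge of a storage node is often approximated by the charge stored on that node:

```latex
Q_{\mathrm{crit}} \approx C_{\mathrm{node}} \, V_{DD}
```

Here C_node denotes the capacitance of the sensitive node and V_DD the supply voltage; both symbols are introduced only for this sketch. Under this approximation, enlarging the transistors increases C_node and therefore Q_crit, which is consistent with the area and performance penalty mentioned above. More accurate models also account for the restoring current of the cell.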
4.1.2 Application Design
Once the FPGA has been designed and manufactured, the end-users can use it to implement their own applications. The design process is similar to the previous one, but with some differences related to the regular and predefined nature of the FPGA fabric. Mitigation and hardening techniques can also be applied at the several levels of this design process; such an approach is referred to as application hardening. Moreover, during the application design process, static analysis and simulations can be performed in order to study the effects of faults in the application.
4.1.2.1 Application RTL Description
First of all, the application is described at the RT level. Usually, hardware description languages (HDLs), such as VHDL [77] or Verilog, are used to perform such a task. In this phase the designer specifies the architecture of the application from the hardware point of view, describing the components, their functionalities, and their interfaces and connections to implement the complex operations required. In the last years, with the increase in design complexity and size, higher level abstraction languages have been proposed and developed [67], like SystemC [52], Synphony-C [2], the use of ANSI C/C++ [107], and many others [19]. Such
languages are exploited to describe the application at the algorithm level, and then high-level synthesis (HLS) tools are used to translate this representation into a set of hardware components and their relationships. In most cases, HLS tools automatically provide a lower level description using the previously mentioned HDLs. The design process with such tools and languages has many advantages, because it offers the designer a higher level of abstraction and the capability of automatic translation and optimization, using, among others, parallel computation paradigms. However, the development of this technology is still in progress and there is no standard and commonly accepted way of proceeding, which has led to the creation of a variety of different languages, constructs, and optimization policies. SystemC seems to be one of the most supported and used languages, but most designers still use HDLs to describe their applications. Hardening techniques and mitigation approaches can already be inserted at this level of the application design process. Some works, like [10], propose a high-level application of such techniques. On one side, they are easier to implement at the top level of the design process, because they are closer to the abstraction level the designer uses to describe the application; on the other side, however, applying them requires an RTL description of the circuit itself, which is increasingly uncommon, because the design flow is moving toward a direct synthesis from a high-level, behavioral description to a netlist representation. Moreover, applying such techniques at a high level of abstraction requires a very accurate knowledge of the synthesis environment and an ad hoc implementation, so that the added logic is not optimized out.
Behavioral simulation is possible at this level of the design process, allowing the designer to analyze the correct behavior of the application and also, by means of fault injection, its response to radiation effects. However, since the abstraction level of the description is high, it is not possible to inject faults using the usual models, because the circuit is not yet described in detail, but only at a behavioral level. For this reason, high-level (RTL) fault models have been developed [60, 64, 134], trying to abstract the effects of low-level faults to the RTL.
4.1.2.2 Application Synthesis
The application synthesis phase is aimed at translating the high-level description of the architecture, made by means of HDLs, into a more detailed representation by means of a logical technology. A logical technology is the definition of low-level primitives (basic logic gates, cells, and interconnections) that are the basic blocks for building a circuit. These blocks depend on the implementation of the FPGA fabric itself, because different FPGAs could have different kinds of basic blocks, like LUTs or logic gates, and could implement interconnections by means of different routing architectures, like switch matrices or programmable busses. The set of basic blocks that compose a logical technology, usually provided by the FPGA vendor, is called the technological library. The representation of the designer’s application by means of the primitives contained in the library is called the netlist. Different FPGA vendors provide different synthesis tools that translate the designer’s description at the RT level into a lower level representation using their own technological library.
The format used to express this level of abstraction is usually proprietary, but there exists a standard language, supported by all vendors, that is able to describe a netlist in a powerful and detailed manner: the electronic design interchange format (EDIF).
EDIF
The EDIF language was born in the early 1980s as an effort to propose a unified format for electronic information exchange, initially between designers and foundries [39]. It was proposed as a standard in its first version (EDIF 1.0.0), allowing the exchange of a variety of design information, like macro cells, netlists, schematics, and layout information. However, the first actually public release was the EDIF 2.0.0 version, approved in 1988 as the ANSI/EIA-548-1988 standard. EDIF grew continuously, following the evolution of the IC industry [49, 58] and enriching its expressive capabilities with a third and, finally, a fourth version (EDIF 4.0.0) in 1996. EDIF is now recognized as the preferred interchange format for design information, and most electronic design automation (EDA) vendors can export their designs using this standard. In particular, EDIF is widely used to describe post-synthesis circuit netlists. Indeed, this format allows describing both the libraries the netlist’s components are bound to and the netlist itself. EDIF 2.0.0 is the most supported version for describing netlists, even if it is not the newest one. This is because versions 3.0.0 and 4.0.0 added constructs related to schematics and printed circuit board (PCB) design, which are not relevant for netlist description and which made the EDIF standard much larger. The EDIF format supports hierarchical descriptions, which makes it very suitable for preserving the original RTL structure of the circuit after synthesis. Figure 4.2 represents an example of the hierarchy used by the EDIF standard to describe a netlist. Dashed rectangles are hierarchical cells that do not actually exist in the netlist but are useful to keep the description compact and to preserve the modularity expressed in the RTL representation.
Such cells, indeed, can be used as normal cells, thus allowing the instantiation of their whole content without the need to repeat it each time. The other elements that EDIF uses to represent a netlist are as follows:
• Library represents the EDIF libraries, which contain the basic technological cells
• Cell represents a kind of cell that can be instantiated within the netlist; cells are not only the basic technological cells but also hierarchical cells defined by a higher level description of the circuit
• Port represents the connection point for a specified cell; a set of ports defines the interface (inputs and outputs) of the cell
• Instance represents a particular instance of a cell. While each cell can appear only once because it defines a kind of element, the instance is one element of such a kind
• Net represents a connection between two or more instances. A net is defined by a source and one or more sinks
Fig. 4.2 EDIF representation of a netlist (elements shown: library; cells A, B, and C; ports; top-level cell; hierarchical cell; instances A1, A2, A3, B1, B2, and C1; net; port-ref)
• Port-ref represents the connection point onto an instance. It is thus defined by an instance and a port of that instance and is linked to a net as its source or one of its sinks.
Finally, the top-level cell is the cell that contains the whole netlist, all its instances, nets, and port-refs. Hardening techniques and mitigation approaches can be applied at this design step, with a good knowledge of how the application will be implemented on the FPGA. While at higher levels of abstraction the description is mostly behavioral and far from the actual implementation on the device, after synthesis the logic resources of the final version of the application are already defined, and mitigation schemes can be applied without the risk of being optimized out. However, since the netlist is a very complex and large description of the application, it is not feasible to insert mitigation techniques manually. In the best case, if hardening techniques have well-known implementations that are mature and accepted by the market, the designer can rely on automatic tools, offered by different vendors, that implement a very specialized solution, often with a low degree of freedom. This is the case, for example, of the automatic TMR application in Xilinx FPGAs [30]. This tool works at the netlist level, is very specific to the Xilinx architecture, and cannot be used for other purposes or other FPGAs. For other architectures, Synopsys Synplify provides HDL libraries that can automatically infer TMR flip-flops, but requires the modification of the user’s HDL description [3]. Instead, if the designer wants to implement
novel mitigation techniques, or known solutions that are not yet mature, for example to evaluate them against different ones, they usually have to implement them by hand, with a high probability of introducing bugs and making the HDL description confusing. Moreover, this can be done only at the RT level, if available, because manual modifications are practically impossible at a lower level of abstraction, and thus a high overhead is likely to be introduced. Some works are in progress to provide general frameworks for modifying, analyzing, and inserting mitigation structures at the netlist level [17, 27]. After the synthesis phase, simulation can be performed with more accuracy, and fault injection can make use of the usual low-level fault models described in Section 3.1.4 of Chapter 3. At this level of the design process, static analysis can also be performed with good results, because the application model is very detailed and most of the assumptions that had to be made at higher levels are here replaced by a complete definition.
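To make the idea of netlist-level TMR insertion more concrete, the sketch below models a flat netlist with the EDIF concepts introduced in this section (instances, nets, and port-refs) and applies the first two TMR rules: replication and output voting. It is a minimal sketch under stated assumptions and does not reproduce the Xilinx tool, Synplify, or any EDIF parser; the class names, the `_tr0`/`_tr1`/`_tr2` suffixes, and the `VOTER`, `LUT2`, and `FD` cell names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    source: tuple                               # port-ref: (instance name, port name)
    sinks: list = field(default_factory=list)   # port-refs driven by the source


@dataclass
class Netlist:
    instances: dict = field(default_factory=dict)  # instance name -> cell name
    nets: dict = field(default_factory=dict)       # net name -> Net


def triplicate(netlist):
    """Return a new netlist in which every instance and net appears in three replicas."""
    tmr = Netlist()
    for replica in range(3):
        suffix = f"_tr{replica}"
        for inst, cell in netlist.instances.items():
            tmr.instances[inst + suffix] = cell
        for name, net in netlist.nets.items():
            tmr.nets[name + suffix] = Net(
                source=(net.source[0] + suffix, net.source[1]),
                sinks=[(inst + suffix, port) for inst, port in net.sinks],
            )
    return tmr


def add_voter(tmr, net_name):
    """Connect a (hypothetical) VOTER cell to the three replicas of net_name."""
    voter = f"{net_name}_voter"
    tmr.instances[voter] = "VOTER"
    for replica in range(3):
        tmr.nets[f"{net_name}_tr{replica}"].sinks.append((voter, f"I{replica}"))
    return voter


def majority(a, b, c):
    """Bitwise 2-out-of-3 majority, the function computed by a TMR voter."""
    return (a & b) | (a & c) | (b & c)


# Toy design: a 2-input LUT driving a flip-flop whose output q is voted.
design = Netlist(
    instances={"lut_a": "LUT2", "ff_b": "FD"},
    nets={"n1": Net(source=("lut_a", "O"), sinks=[("ff_b", "D")]),
          "q": Net(source=("ff_b", "Q"))},
)
hardened = triplicate(design)
add_voter(hardened, "q")
```

The third TMR rule, which constrains placement and routing, cannot be expressed at this level, since the netlist carries no information about the physical resources that will be used.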
4.1.2.3 Application Mapping
The application mapping phase is the step aimed at associating the netlist, built with the primitives of the technological library, with the real cells of the device. This process associates the logical blocks of the previous representation with the real primitives available in the FPGA. For example, if the technological library contains the LUT primitive and the device offers LUTs of different sizes, e.g., with two or three inputs, the mapping process translates the LUT primitive into a LUT with two or three inputs according to the function that shall be implemented by that resource. This process is aimed at optimizing the area occupancy and delay performance of the overall circuit. Another example is the mapping of two dependent 2-input functions that could be integrated in a single 3-input function. Hardening techniques can be applied at this level of the design process in order to optimize the mapping for the reliability of the whole application. Indeed, the usual optimizations for area and delay are not necessarily good for reliability as well. Grouping two 2-input functions in a single 3-input function, for example, could allow a single particle to cause a fault in both of them.
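The merging of two dependent 2-input functions into a single 3-input function can be illustrated by computing the corresponding LUT contents. The sketch below is only an illustration: the functions `f` and `g`, the helper names, and the bit ordering (input 0 as least significant index bit) are assumptions, and no vendor LUT initialization format is implied.

```python
def lut_init(func, n_inputs):
    """Truth table of func as a list of bits, one per input combination."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits))
    return table


def lut_eval(table, *inputs):
    """Look up the output for the given input bits (input 0 is the least significant)."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


def f(a, b):
    """First 2-input function (example choice)."""
    return a & b


def g(x, c):
    """Second 2-input function, fed by the output of f (example choice)."""
    return x | c


# Separate mapping: two 2-input LUTs, each with its own configuration bits.
lut_f = lut_init(f, 2)
lut_g = lut_init(g, 2)

# Merged mapping: one 3-input LUT computing g(f(a, b), c).
lut_fg = lut_init(lambda a, b, c: g(f(a, b), c), 3)
```

Both mappings compute the same output for every input combination, but in the merged mapping the two functions share the eight configuration bits of a single LUT, so an upset in that resource concerns both of them at once.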
4.1.2.4 Application Placement
Once the circuit is mapped onto a set of real resources, two further steps are necessary to locate it within the FPGA: placement and routing. The first step places each resource defined during the previous phase in a specific location of the device. Since FPGAs have a regular and mostly symmetrical structure, each resource type is replicated many times across the device. LUTs, for example, are replicated thousands of times, as are memory elements like flip-flops and latches. The placement phase assigns to each real resource used by the application one location among all the available ones, trying to optimize the area occupancy according to the constraints of the designer.
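As a rough illustration of what placement does, the following Python sketch assigns each resource to the first free site of the matching type, optionally restricted to a rectangular region. It is a greedy toy under stated assumptions, not the optimizing algorithm of any vendor tool: the device description, the resource names, and the `region` parameter are all hypothetical.

```python
def place(resources, sites, region=None):
    """Greedy placement: assign each resource to the first free site of its type.

    resources: list of (name, kind) pairs to be placed.
    sites:     dict mapping (x, y) -> kind for every site of the device.
    region:    optional (x_min, y_min, x_max, y_max) bounding box, inclusive.
    Returns a dict name -> (x, y); raises ValueError if a resource does not fit.
    """
    def allowed(xy):
        if region is None:
            return True
        x_min, y_min, x_max, y_max = region
        return x_min <= xy[0] <= x_max and y_min <= xy[1] <= y_max

    free = sorted(xy for xy in sites if allowed(xy))
    placement = {}
    for name, kind in resources:
        for xy in free:
            if sites[xy] == kind:
                placement[name] = xy
                free.remove(xy)   # safe: we leave the loop right after removing
                break
        else:
            raise ValueError(f"no free {kind} site for {name}")
    return placement

# Hypothetical 4x2 device: columns 0-2 hold LUTs, column 3 holds flip-flops.
sites = {(x, y): ("FF" if x == 3 else "LUT") for x in range(4) for y in range(2)}
design = [("lut_a", "LUT"), ("lut_b", "LUT"), ("ff_a", "FF")]

# Constrain the design to the two rightmost columns of the device.
placement = place(design, sites, region=(2, 0, 3, 1))
```

A real placer would minimize a cost function (area, wire length, delay) rather than take the first free site; the sketch only shows how a designer constraint narrows the set of candidate locations.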
In recent years, with the emergence of reconfigurable applications, the placement phase has assumed an important role. Indeed, since most FPGAs allow the reconfiguration of portions of the device rather than of the whole device, to save time and keep the application running, the reconfigurable parts of the application should be placed so that changing them does not affect the other parts. The first constraint required to implement this approach correctly is to bound reconfigurable parts into predefined areas of the device, forcing their resources to be placed in specific locations so that they are not mixed with the resources of other parts of the application. Figure 4.3 shows an example FPGA, with different kinds of resources identified by different colors, and two possible placement-constrained areas. The dashed boxes Constraint1 and Constraint2 are the boundaries within which two parts of the reconfigurable application should be located: each logic resource belonging to the first part must be placed in the first bounded region, and each resource of the second part in the second region. The example FPGA is highly regular and structured, but some asymmetry has been drawn on purpose to reflect the actual characteristics of most current FPGAs. This irregularity makes the available resource space difficult to partition, so that a free relocation of the reconfigurable parts of the application within the device becomes unfeasible; they can be placed only in certain, well-defined regions. The flexibility that can be achieved depends on the choice of the bounding boxes: parts constrained with Constraint1 can be placed in 12 different regions, all equal to the first one, while portions of the application constrained with Constraint2 can be placed in only 4 different regions. However, several works are in progress to optimize the flexibility and relocation capability of reconfigurable applications [65, 66]. Even if the design of reconfigurable applications is not yet as mature as that of standard applications, considerable effort is being made to enable designers to use this feature easily [93, 119, 133].
Fig. 4.3 Different placement constraints for reconfigurable applications (dashed boxes: Constraint 1, Constraint 2)
As in the application mapping step, mitigation techniques for radiation effects can also be applied at this level of the design process. Normal placement algorithms are aimed at optimizing the area and delay performance of the application, in compliance with the rules and constraints given by the designer. However, again, the normal optimizations are not the best solution in terms of reliability, and different approaches may be preferable to improve the robustness of the application.
4.1.2.5 Application Routing
The final step of the design process is the application routing. It is aimed at finding a route for each signal within the circuit while minimizing the delay and increasing the operating frequency of the application as much as possible. Routing is an expensive and difficult process, and dedicated algorithms have been developed to cope with it [6, 88, 117, 120]. Given the regular and homogeneous FPGA architecture, routing segments are predefined in the device, and the algorithms have to use them to create the nets that carry the signals. One of the problems that routing algorithms have to face is congestion.
Congestion is the accumulation of several nets in the same neighboring routing resources, such that it becomes difficult to route more signals in the same area and other nets have to pass far away from that location, which increases their delay and consequently decreases the overall performance. For this reason, routing algorithms should avoid congestion by using the routing space as homogeneously as possible, not concentrating all the nets close to each other but spreading them over all the available space and resources. The routing process is even trickier for reconfigurable applications. Indeed, as already said for the previous step, reconfigurable parts of the application have to be bounded in a predefined region of the device, which requires all their nets to be constrained to specific routing resources within the FPGA. For the routing process, however, the problem is even more complex, since the interconnection nets, which allow the different parts of the application to communicate with each other, have to be placed between different reconfigurable areas and, at the same time, must not drive wrong signals during the reconfiguration process. For this reason, standard solutions are under investigation, like Bus Macros [48] or common interfaces [54]. These are fixed parts of the application that route communication nets between reconfigurable parts and that always provide correct signals, also during the reconfiguration process, by making use of tri-state buffers and high-impedance signals.
Fig. 4.4 Reconfigurable application with Bus Macros (fixed region, reconfigurable regions 1 to 4, Bus Macro)
Figure 4.4 shows an example architecture for a reconfigurable application mapped on an FPGA, with a fixed region, whose content will not change for the whole mission period, and four reconfigurable regions, each of which can contain different parts of the application. Since the reconfigurable portions have to communicate with each other and with the fixed one, Bus Macros can be used to implement this communication. These macros are simple pre-defined routing circuits, combined with tri-state buffers, that create a connection between different regions of the FPGA while always providing a correct signal, even during the reconfiguration phase. In this design phase too, the normal optimization goals, such as delay reduction and minimization of resource usage, can go in the opposite direction with respect to reliability considerations. Minimizing resource usage, for example by using the same switch matrix to route more signals, increases the probability of multiple failures on different nets caused by a single particle that strikes the shared resource and provokes an SEU, or even an MCU, in the configuration memory bits that control those nets. On the other hand, avoiding congestion is a benefit from both the performance and the reliability points of view, because signals are not slowed down, delays are balanced, and, at the same time, possible upsets are less likely to affect neighboring resources. For this reason, in mission-critical applications, reliability considerations should be taken into account during the whole design process in the same way as performance considerations are. All the steps of the design process, routing included, should be driven by both performance and reliability constraints.
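The trade-off between resource sharing and reliability can be made concrete with a small Python sketch. It assumes a hypothetical description of a routed design as a mapping from net names to the switch matrices each net traverses, and it reports the largest number of nets that pass through one switch matrix, i.e., how many nets a single upset in that resource could disturb at once. The names `SM1` … `SM5` and both example routings are invented for illustration.

```python
from collections import defaultdict

def nets_per_switch_matrix(routes):
    """Map each switch matrix to the set of nets routed through it."""
    sharing = defaultdict(set)
    for net, matrices in routes.items():
        for sm in matrices:
            sharing[sm].add(net)
    return sharing

def worst_case_exposure(routes):
    """Largest number of nets that pass through a single switch matrix."""
    sharing = nets_per_switch_matrix(routes)
    return max((len(nets) for nets in sharing.values()), default=0)

# Hypothetical routings of the same three nets.
compact = {"a": ["SM1", "SM2"], "b": ["SM2"], "c": ["SM2", "SM3"]}  # shares SM2
spread = {"a": ["SM1", "SM2"], "b": ["SM4"], "c": ["SM5", "SM3"]}   # no sharing

compact_exposure = worst_case_exposure(compact)
spread_exposure = worst_case_exposure(spread)
```

The compact routing uses fewer switch matrices but exposes all three nets to one upset in `SM2`; the spread routing uses more resources and keeps the exposure at one net per switch matrix. A reliability-aware router could use a metric of this kind as an additional cost term, which is an assumption of this sketch rather than a description of an existing tool.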
4.1.2.6 Application Bitstream Generation
The last phase of the design process, before the application is actually implemented inside the FPGA by programming the configuration memory, is the bitstream generation. This step provides the final content of the configuration memory, i.e., the set of bits that define the application within the device: the bitstream. The bitstream generation step is closely related to the organization of the configuration memory, which differs among FPGA vendors and sometimes also among different generations of devices from the same vendor. The organization of the bitstream is mostly proprietary, although some reverse-engineering work has been done [5, 21, 84, 111] in order to intervene at this level of the design process too, introducing further reliability checks and analyses that commercial flows hardly consider. Because the format is proprietary, bitstream generation is a completely automated process, defined by the vendors and executed by their tools. In most commercial applications this is very convenient for the user, who does not have to deal with the problem; in mission-critical applications, however, analyzing the bitstream and relating it to the resources it programs makes it possible to correlate the faults caused by radiation in the configuration memory with the effects they produce in the logic resources [111]. Special attention should be paid to FPGAs that support partial reconfiguration. In such devices it is possible to write a single portion of the configuration memory while leaving the rest unchanged and operating. This requires partial bitstreams, i.e., parts of the whole bitstream that relate to the portion of the FPGA's resources the designer wants to re-program. Partial bitstreams are also generated automatically by commercial tools, when this feature is available.
However, since FPGAs are not completely regular, as they embed resources of different natures in different locations, the same part of a reconfigurable application usually cannot be placed in arbitrary locations, but only in a few structurally and architecturally coherent ones. This means that, if a single reconfigurable part of an application has to be placed in different sites within the FPGA, one partial bitstream has to be generated for each site. This drastically increases the amount of memory required to store the bitstreams, unless the partial bitstream itself can be manipulated at runtime [68], allowing its relocation within the FPGA [38].
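The storage cost described above is simple arithmetic, sketched here in Python under stated assumptions: every partial bitstream has the same size, and runtime relocation lets a single stored copy per module serve all compatible sites. The module count, site count, and bitstream size are hypothetical figures chosen only for illustration.

```python
def bitstream_storage(n_modules, n_sites, bitstream_bytes, relocatable=False):
    """Bytes needed to store the partial bitstreams of a reconfigurable design.

    Without relocation, one bitstream is stored per (module, site) pair.
    With runtime relocation, one copy per module is assumed to be enough.
    """
    copies_per_module = 1 if relocatable else n_sites
    return n_modules * copies_per_module * bitstream_bytes

# Hypothetical figures: 3 modules, 4 compatible sites, 200 KiB per bitstream.
static_bytes = bitstream_storage(3, 4, 200 * 1024)
relocated_bytes = bitstream_storage(3, 4, 200 * 1024, relocatable=True)
```

With these numbers the storage drops by a factor equal to the number of sites, at the price of the extra logic or software needed to rewrite the bitstream addresses at runtime.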
4.2 Techniques for FPGA Manufacturer
Hardening and mitigation techniques for radiation effects can first be introduced by manufacturers during the FPGA design process. As mentioned in the previous section, hardening methodologies can be applied at different levels, from the lowest ones of technology and layout up to the higher level of the resources' architecture. While low-level techniques are usually suitable to cope with physical effects, like latch-up and TID, high-level techniques address more abstract fault models, like SEEs. All these techniques can, however, be combined to obtain a more robust solution and a more reliable device.
Two considerations have to be made with regard to the costs and benefits of such mitigation approaches. First of all, low-level techniques, which modify layout or technology, provide greater control over the behavior they produce in the circuit and over the parameters they change with respect to the normal, commercial technology. This means that even single critical points of a transistor can be hardened, or mitigation structures can be inserted within the device materials, to increase the overall reliability. However, everything comes at a cost. Modifying technology parameters and dopant concentrations, or inserting material structures different from the commercial ones, possibly using different materials during the manufacturing process, requires changing the fabrication process and chain. This can cost millions of dollars, and the scenario becomes even worse if we consider that rad-hard FPGAs are not sold in huge numbers: their sales only address the avionic and space market, as well as some military ones. On the other hand, modifications to the resources' architecture may involve the re-design of technological libraries and their cells, but do not require changing the fabrication process. This results not only in a lower cost but also in a lower level of control, since such an approach cannot intervene at the physical level and can cope only with logical faults. For all these reasons, in recent years we have observed a trend that first moved from technological mitigation to the application of RHBD techniques, and that is now moving toward the use of application-level mitigation measures, with the FPGA device built in a completely commercial off the shelf (COTS) technology [94]. As usual, this trend is staggered between the market and the research.
While research, also pushed by the market of avionic and space applications, is looking for cheaper solutions that adopt commercial technologies and high-level techniques, the market is still using rad-hard technologies in critical applications, because they are the only ones that ensure safe results, even if it is gradually moving toward the adoption of RHBD designs and even COTS-based ones, above all in non-critical applications. In this section we introduce the techniques and approaches that can be applied at the different levels of the FPGA design process to cope with
• SEL
• TID
• radiation effects in single memory elements
• radiation effects in programming elements
• radiation effects in memories (both configuration and user ones)
• radiation effects in logic elements
• radiation effects in input/output elements
4.2.1 Mitigation Techniques for SEL
SEL is one of the most critical effects induced by radiation, since it can cause the destruction of the device. According to Pouponnot [94], “CMOS technologies that are not SEL-free are generally not accepted for space use” and, in general, for mission-critical applications. Different approaches and technological techniques
have been developed over several decades, but all of them share a common strategy: breaking the parasitic junctions that are due to the CMOS structure itself. To do so, both insulating structures and the alteration of the electrical characteristics of CMOS materials can be adopted. Some architectural-level techniques have also been proposed, but they do not avoid SELs; rather, they detect and stop them.
4.2.1.1 Silicon on Insulator (SOI) Technologies
In a MOS transistor, only the very top region (0.1–0.2 µm thick) of the silicon wafer is used for carrier transport. The inactive volume, more than 99.9% of the wafer, is used as a mechanical support for the active device. However, this very part of the device can induce undesirable effects, such as “leakage” currents, which cannot be modulated by the gate. SOI technology was born from the idea of separating the active device layer from the silicon substrate by a thick insulation layer. This idea was at the basis of Lilienfeld's patents of the 1930s [46, 47], in which he observed and studied the insulated-gate field-effect transistor (IGFET). The active part of the transistor was a thin silicon layer deposited on a thick insulator. At the beginning, SOI devices were manufactured for space and military applications. However, after several years of studies that led to significant improvements in SOI wafer quality and to reduced manufacturing costs, SOI technology is now being developed for wide-scale commercial applications. Among all the SOI structures that have been investigated, there are two main types of n-MOS and p-MOS SOI transistors, which mostly differ in the thickness of the top silicon layer: fully depleted transistors and partially depleted transistors [9]. The two structures are depicted in Fig. 4.5. Very thin layers of silicon lead to fully depleted (FD) transistors [74], i.e., the top silicon layer is totally depleted during normal operation.
In addition to the silicon layer thickness, this condition also depends on its doping. Thicker silicon layers are used for partially depleted (PD) structures. PD structures have a neutral zone isolating the front device from the buried oxide. This neutral region, called the floating body, is coupled to the transistor terminals through the gate capacitance and the source and drain diode junctions. Among all the techniques proposed to manufacture SOI wafers, two are the main competitors for commercial applications: separation by implantation of oxygen (SIMOX) [122] and bonded SOI (BESOI) using the Smart Cut™ Technology [26].
Fig. 4.5 SOI transistor structure: fully depleted (a) and partially depleted (b)
SIMOX
SIMOX technology was first developed in the 1980s. The process flow for fabricating SIMOX wafers is illustrated in Fig. 4.6. The first step is the implantation of a high concentration of oxygen underneath the initial silicon surface. A high-temperature anneal (above 1,300 °C) is then performed to regenerate the crystalline quality of the upper silicon layer, damaged by the implantation step. This anneal also drives the chemical reaction that forms the buried oxide. Improvements in the quality of the buried oxide have been obtained through the development of low-dose and multiple-oxygen implantation processes. For low-dose SIMOX, the buried oxide is three to four times thinner than for standard SIMOX and the SOI material cost is lower. The main drawback of SIMOX is the use of specific non-standard implantation equipment. For this reason, the increase in implantation time can make the cost of SIMOX wafers rise rapidly with wafer size.
Fig. 4.6 SIMOX fabrication process (oxygen (O+) implantation; high-temperature annealing; silicon dioxide formation)
BESOI with Smart Cut™ Techniques
Bonding techniques have long been used for the cheap manufacture of thick-film oxide and silicon wafers. The process starts with two bare silicon wafers. An oxide layer is grown on top of one of the wafers, and the two are then bonded together by means of Van der Waals forces. Subsequent annealing increases the mechanical strength of the bonded interface. Thick-film silicon layers are then obtained by thinning the silicon layer to about 1 µm through mechanical grinding and polishing. To compete with thin-film silicon layers, however, several techniques have been developed to produce thin upper silicon layers with good thickness control. One of the most recent and effective methods is the Smart Cut™ Technique, which uses hydrogen implantation to provide good SOI film uniformity at low cost. Moreover, it uses standard equipment, which ensures its compatibility with standard semiconductor processes. Figure 4.7 represents the SOI manufacturing process for fabricating Smart Cut™ wafers. The first phase is the oxidation of a wafer (wafer A) to form the future buried oxide layer of the SOI structure. As the second step, hydrogen implantation is performed through the oxide. The third phase consists of the mechanical cleaning of the second wafer, called the handle wafer (wafer B), to remove particles and create a hydrophilic surface. The two wafers are then bonded together by means of Van der Waals forces. A medium-temperature thermal activation then forms a “cutting” plane at the level of the hydrogen-implanted region, along which the top wafer is cut away. The process ends with a high-temperature anneal that strengthens the bonding interface.
SOI Response to Radiation
The total dielectric isolation of SOI circuits eliminates the four-layer p-n-p-n paths common to bulk-silicon CMOS circuits and makes SOI circuits completely immune to single event latch-up. As far as total dose is concerned, device degradation is caused by radiation-induced charge injection in oxides.
For both SOI and standard bulk-silicon technologies, the same types of oxides are typically used, and the radiation-induced buildup of charge in these oxides is similar. The main difference between total-dose degradation in SOI and in bulk-silicon technologies is due to the buried oxide of SOI transistors. As SOI buried oxides are exposed to ionizing radiation, radiation-induced charge is trapped throughout them. As in the case of gate oxides [104], the radiation response of buried oxides has been observed to depend strongly on the manufacturing process [23].
Fig. 4.7 Smart Cut™ fabrication process (oxide growth; hydrogen (H+) implantation; handle wafer cleaning; flipping and bonding; cleavage cut and high-temperature annealing)
As mentioned above, both wafer manufacturing processes, SIMOX and BESOI with Smart Cut™ Techniques, require high-temperature anneals (≥ 1,100 °C) that cause oxygen to diffuse away from the buried oxide, leaving behind numerous defects [42, 128]. These defects can lead to radiation-induced trapped charge. Implant damage can also cause significant radiation-induced degradation in gate and field oxides [41, 106]. Thus, it is natural to expect that the very high-fluence implants used to fabricate SIMOX substrates (and some bonded oxide substrates) may cause numerous implant-related defects that enhance radiation-induced charge trapping. One technique that has been proposed to reduce the amount of radiation-induced positive trapped charge is the implantation of buried oxides with silicon [87]. The silicon implant creates electron traps throughout the buried oxide that compensate the trapped positive charge, decreasing the total-dose degradation of the oxide. Other hardening approaches for SOI technologies are based on device design techniques that mitigate the effects of radiation-induced trapped charge on transistor performance. The body-under-source field effect transistor (BUSFET) design is one example [103].
4.2.1.2 Trench Isolation and Guard Rings
Trench isolation is a technique that does not rely on particular technologies or variations in process parameters; instead, it is based on isolating barriers between neighboring transistors that break the critical p–n–p–n path. As depicted in Fig. 4.8, the trench isolation (TI) structure is composed of deep trenches placed between the shallow trench isolation (STI) regions, i.e., the narrower trenches that separate the TI from the N+ and P+ transistors, and it forms a ring structure encircling the transistors [124]. In the manufacturing process, the TI step is completed after the STI one. For this reason, the TI is not “capped” by the STI and is therefore not in a floating state (e.g., no floating capacitor element plate exists in this isolation structure). The TI structure can thus be about 10 times deeper than the STI structure, providing a significant increase in the isolation depth relative to the junctions. Morris [86] showed that the CMOS latch-up trigger voltage scales by approximately 1 V for every 0.1 µm change in STI depth, i.e., an increase in latch-up trigger voltage on the order of 10 V per µm of trench depth.
Fig. 4.8 Trench isolation in SOI device: vertical cross section
The guard ring technique has been developed on the basis of trench isolation. To avoid the latch-up paths in CMOS ICs being fired by the overshooting or undershooting current caused by ionizing radiation, double guard rings are often used to surround the p-MOS and n-MOS transistors. However, to prevent the substrate current generated in a single transistor from causing latch-up in the neighboring circuits, transistors should be placed at a sufficient distance from each other. To reduce this distance and thus save chip area, additional guard rings can be placed between adjacent transistors, as shown in Fig. 4.8. Automation tools have been developed to spare the designer the manual insertion of rings [79] and to automatically compute and optimize the distances between transistors [80], thus lowering the probability of introducing design bugs and saving time. Trench isolation and guard rings can be used in combination with SOI devices in order to further increase latch-up robustness. Figure 4.8 shows the vertical cross section of such a structure, while Fig. 4.9 shows the horizontal one, illustrating the isolation ring that the trenches create around the N+ and P+ transistors.
Fig. 4.9 Trench isolation in SOI device: horizontal cross section
4.2.1.3 Other Techniques
Many other techniques have been proposed over the last decades, and they have mainly converged into the development of the ones described above. Another approach worth mentioning is an architectural-level technique, reported in [91]. Indeed, while the guard ring solution is very efficient at preventing latch-up, it results in an excessive area cost. On the other hand, modifications of process parameters may result in imperfect protection, and they incur significant costs. Process modification may also impact circuit performance, in particular for logic parts. At the architectural level, it is not possible to avoid SEL generation, but it is possible to detect and eliminate it. Once an SEL is detected by additional control logic, the action is to limit the current flowing through the activated parasitic devices in order to prevent circuit destruction (in case of catastrophic latch-up). In addition, for both catastrophic latch-up and micro-latch-up, this current has to be limited to a level that eliminates the latch-up and brings the circuit back to normal electrical conditions. This action can be performed by using current-limiting transistors. These transistors are placed on the power grid and can be turned off upon latch-up detection to block the abnormal current induced by a latch-up.
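The detect-and-eliminate behavior described above can be sketched as a small Python simulation. This is an illustrative model under stated assumptions, not the circuit of [91]: the supply current is assumed to be sampled periodically, a latch-up is assumed whenever a sample exceeds a fixed threshold, and the current-limiting switch is assumed to stay open for a fixed number of samples before power is restored. The function name, the threshold, and the sample values are hypothetical.

```python
def latchup_guard(current_samples, threshold, off_samples=2):
    """Simulate a supply-current monitor driving a current-limiting switch.

    current_samples: iterable of measured supply currents (amperes).
    threshold:       current above which a latch-up is assumed.
    off_samples:     number of samples the supply stays cut after a detection.
    Returns one boolean per sample: True if power is on, False if it is cut.
    """
    power_on = []
    remaining_off = 0
    for current in current_samples:
        if remaining_off > 0:          # supply still cut: wait for recovery
            remaining_off -= 1
            power_on.append(False)
        elif current > threshold:      # abnormal current: cut the supply
            remaining_off = off_samples - 1
            power_on.append(False)
        else:                          # nominal current: keep the supply on
            power_on.append(True)
    return power_on

# Hypothetical trace: nominal current around 0.1 A, latch-up at the third sample.
trace = [0.10, 0.11, 0.90, 0.90, 0.10, 0.10]
states = latchup_guard(trace, threshold=0.5, off_samples=2)
```

In this trace the supply is cut for two samples after the abnormal current appears and is then restored, which mirrors the idea of turning the current-limiting transistors off until the parasitic structure is no longer conducting.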
4.2.2 Mitigation Techniques for TID
Hardening techniques for TID effects act at the layout and technology level. Total ionizing dose effects are due to positive trapped charge and to the creation of interface traps in oxides [14]. TID primarily increases leakage under isolation oxides and at the gate edges. Since the trapped charge is positive, only n-MOS transistors suffer from increased leakage due to these gate-edge parasitic devices. Similarly, leakage between N-type diffusions can be increased by reduction of the field oxide. The main idea for reducing such effects is to increase the threshold voltage, by means of variations in the process parameters or by re-designing the MOS structure, and to decrease the probability of ion implantation due to radiation in the device, in order to reduce the TID itself. The latter approach is achieved by the use of SOI technology, as already seen in the previous section. On the other hand, the most effective RHBD techniques to cope with TID effects are edgeless transistor geometries and the use of reverse-body bias (RBB) in MOS design.
4.2.2.1 Edgeless Transistors
A normal MOS structure presents two edges that separate the gate and the channel from the source and drain regions. If the transistor is instead designed with an annular shape, as depicted in Fig. 4.10, the curvature of the gate changes the channel electric field with respect to a conventional MOS with an equivalent applied drain voltage. This change in the channel electric field affects the reliability of the transistor with respect to radiation-induced carrier generation at the drain. Hot-carrier degradation in normal MOS structures is due to the injection of channel hot electrons (CHE) or substrate hot electrons (SHE) into the gate oxide, caused by impact ionization at the drain end of the channel, where the electric field reaches its maximum. Since the rate of impact ionization is known to increase rapidly with the maximum electric field in the MOS channel [116], a change in the electric field of even a few percent can have a significant impact on device lifetime. This difference in ionization coefficient can thus lead directly to enhanced degradation in the devices with the higher electric field.
Fig. 4.10 Annular edgeless transistor: top view (a); vertical cross section (b)
Annular transistors can provide worse or better TID robustness than a conventional MOS depending on whether the drain is chosen as the inner or the outer electrode, which respectively raises or lowers the channel electric field. With the right design parameters, the use of edgeless transistors can thus improve the total-dose radiation hardness of a CMOS integrated circuit [70], even if area, capacitance, and power dissipation suffer some penalties [69]. The curvature of the gate in edgeless transistors also has an impact on device reliability, enhancing or degrading the TID robustness depending on the drain position. The more extreme the curvature, the more relevant the effect becomes. Moreover, the shrinking of gate dimensions in future technologies will increase the impact of this behavior.
4.2.2.2 Reverse-Body Bias
Another technique that can be used to mitigate TID effects is designing transistors with RBB. In order to illustrate the advantages of such an approach, we first need to introduce the so-called body effect. As described in [12], the threshold voltage V_th of a MOS transistor is described as follows:
$$V_{th} = \frac{\sqrt{2\,\varepsilon_S\,q\,N_a}}{C_{OX}} \tag{4.1}$$
However, if the body(bulk)-to-source voltage $V_{SB}$ is not null, (4.1) should be modified to take into account the contribution of this voltage, which affects the size of the depletion region and consequently the transistor's threshold voltage. Equation (4.2), where $\gamma$ and $\Phi_f$ are process parameters, illustrates this change:
$$V_{th} = \frac{\sqrt{2\,\varepsilon_S\,q\,N_a}}{C_{OX}} + \gamma\left(\sqrt{2\Phi_f + V_{SB}} - \sqrt{2\Phi_f}\right) \tag{4.2}$$
In the RBB technique, a negative bias is applied to the body of the n-channel transistors to raise their threshold voltage. The application of back-bias produces two effects that act in opposite directions. On the one hand, applying a negative bias to the substrate increases the electric field across the oxide. As reported above, this has a negative impact on the radiation tolerance because it increases the number of generated holes. On the other hand, it increases the threshold voltage of both the gate oxide and the field oxide regions. Thus, the field oxide region can be exposed to a larger dose before the parasitic current appears [36]. Xapsos et al. [132] gave evidence that, overall, the application of back-bias greatly improves the radiation tolerance. This is because the dominant physical effect is the resulting increase in the threshold voltage of the field oxide, rather than the increase in the channel electric field. The reason why the back-bias has such a large impact on the field oxide compared to the gate oxide is that its effect is mainly proportional to the oxide thickness [121].
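The size of the threshold shift produced by the body-effect term of Eq. (4.2), γ(√(2Φf + VSB) − √(2Φf)), can be explored with a few lines of Python. The values of γ and Φf below are hypothetical and chosen only for illustration; they are not taken from the text or from a specific process. The sketch assumes that a reverse-body bias on an n-channel transistor corresponds to a positive VSB in this expression.

```python
from math import sqrt

def threshold_shift(v_sb, gamma, phi_f):
    """Threshold-voltage shift from the body-effect term of Eq. (4.2), in volts.

    v_sb:  bias voltage V_SB of Eq. (4.2), in volts (0 means no body bias).
    gamma: body-effect coefficient, in V**0.5.
    phi_f: Fermi potential, in volts.
    """
    return gamma * (sqrt(2 * phi_f + v_sb) - sqrt(2 * phi_f))

# Hypothetical process parameters, chosen only for illustration.
GAMMA = 0.4    # V**0.5
PHI_F = 0.35   # V

# Shift of the threshold voltage for a few bias values.
shifts = {v_sb: threshold_shift(v_sb, GAMMA, PHI_F) for v_sb in (0.0, 0.5, 1.0, 2.0)}
```

With these assumed parameters the shift is zero without bias and grows sub-linearly with VSB (roughly 0.19 V at VSB = 1 V), which illustrates why a moderate back-bias already raises the threshold voltage noticeably.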
4.2.3 Mitigation Techniques for Single Memory Elements
One of the most important and critical elements in FPGAs is the memory element. It is used to build the configuration memory as well as the user memory, in the form of flip-flops, latches, and embedded RAMs. Different kinds of mitigation techniques and design structures can be applied to make memory elements robust against single event effects. Two fundamental concepts are at the basis of the design of SEU-immune storage cells. First, redundancy in the memory circuit provides a source of uncorrupted data after an SEU happens. This is obtained by using specifically designed latch replicas that store the same data. Second, the data in the uncorrupted replicas provide state-restoring feedback to recover the corrupted data. This theoretical principle can be realized by means of different practical approaches.
4.2.3.1 Triple Module Redundancy (TMR)
One of the most used techniques for hardening memory elements against SEUs is TMR. It is based on the triplication of the memory element itself and the
4.2
Techniques for FPGA Manufacturer
majority voting for deciding which is the correct output value. It is a particular implementation of the nMR approach, which is used at different levels of abstraction, up to the system level, to cope with single faults in one replica. The voting works only if the majority of the replicas store the correct value; otherwise, the whole mechanism fails. Several versions of this technique have been proposed, depending on the function the memory element has to perform. Details are given in Section 4.3.1.
4.2.3.2 Dual Interlocked Storage Cell (DICE)
First proposed in the second half of the 1990s [28], the DICE technique has been widely used, and is still in use, for hardening both latches and static RAM cells against SEUs. As depicted in Fig. 4.11, it is composed of two conventional cross-coupled inverter latch structures, N0–P1 and N2–P3, connected by bidirectional feedback inverters N1–P2 and N3–P0. The crossing of these structures forms four nodes, X0 ... X3, that store the data as two pairs of complementary values (i.e., 1010 or 0101), which can be accessed simultaneously through transmission gates for write or read operations. To achieve SEU robustness, the additional principle of dual node feedback control is exploited: the logic state of each of the four nodes of the cell is controlled by two adjacent nodes located on the opposite diagonal. The two nodes on each diagonal do not depend directly on one another; their state is controlled by the two nodes of the other diagonal. A node X_i (i = 0 ... 3)
Fig. 4.11 Logical representation of the DICE architecture
4 Reconfigurable Field Programmable Gate Arrays: Hardening Solutions
controls the two complementary nodes on the opposite diagonal, X_{i−1} and X_{i+1} (indices taken modulo 4). The inverter symbols shown in Fig. 4.11 are implemented as either P-type or N-type transistors, as their names suggest. They thus form two opposite feedback loops: a clockwise P-transistor loop, P0 ... P3, and an anti-clockwise N-transistor loop, N3 ... N0. If we consider the logic state “0” as X0 ... X3 = 0101, the horizontal inverter loops formed by transistors N0–P1 and N2–P3 are conducting, forming two latches that store the same data at their nodes, X0–X1 and X2–X3. The transistor pairs N1–P2 and N3–P0 are instead in the off-state. In this way, they act as a feedback interlock, isolating the two horizontal latches from one another. Conversely, for the logic state “1,” X0 ... X3 = 1010, and the vertical inverter pairs N1–P2 and N3–P0 are conducting and perform the latch function. The horizontal transistor pairs N0–P1 and N2–P3 are instead off and perform the feedback interlock function, isolating the two vertical latches from one another. A negative upset pulse at any sensitive node X_i induces a positive spurious transition at node X_{i+1} through the P-transistor feedback P_{i+1}. However, the feedback transistor N_{i−1}, being blocked by the radiation-induced negative pulse, does not allow the spurious transition to affect the same logic state stored at node X_{i−1}. The positive pulse at node X_{i+1} is likewise not propagated further through transistor P_{i+2}. Nodes X_{i−1} and X_{i+2} are thus isolated and keep their logic state unaffected. The spurious transition disappears after the upset transient, thanks to the state-reinforcing feedback provided by the other two nodes, X_{i−1} and X_{i+2}, through transistors P_i and N_{i+1}. The same behavior can be observed for a positive transient pulse at node X_i. The transistor-level schematic of the DICE storage cell is presented in Fig. 4.12. It is important to note that, if two sensitive nodes of the cell,
Fig. 4.12 Transistor level representation of the DICE architecture
which store the same logic state (i.e., either nodes X0–X2 or nodes X1–X3), are simultaneously flipped by a single particle impact, the immunity is lost and the cell is upset. The probability of this event can be made very low if the transistor drain areas of the sensitive node pairs are spaced apart in the cell layout, so that the critical charge cannot be shared.
4.2.3.3 Built-In Soft Error Resilient (BISER) Flip-Flop
The BISER flip-flop design technique is based on two flip-flops joined by a C-element, as shown in Fig. 4.13. To illustrate how SEU resilience is achieved by means of the C-element, consider the scenario where a particle hits one of the four latches when CLK is low. The C-element is a structure whose output floats in the high-impedance state if its two inputs are not identical. When the output node floats, the output voltage is maintained by the keeper structure shown in the figure. When both inputs are identical, instead, the C-element acts like an inverter. Assuming the single event upset (SEU) as the fault model, only one latch is affected by a particle strike. When CLK is high, latches LB and PH1 are transparent, and the same data are stored in these two latches. As seen before, the C-element behaves like an inverter when the outputs of LB and PH1 match, and the flip-flop output Q has the correct value. When CLK turns low, latches LB and PH1 hold the stored logic value inside their feedback loops, and their content becomes sensitive to radiation-induced SEUs. LA and PH2, on the other hand, are not SEU sensitive because they are transparent and driven by the preceding logic stages. If a particle strike flips the logic value stored in PH1, the two inputs of the C-element will differ, but the error will not propagate to the C-element output, according to its operation mode. By the same principle, errors are also corrected when a particle strike occurs while CLK is high.
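The filtering action of the C-element can be sketched with a small behavioral model. This is a purely logical abstraction written for illustration: the class and function names are hypothetical, the keeper is modeled simply as "hold the previous output", and analog effects such as leakage are ignored.

```python
class CElement:
    """Behavioral model of an inverting C-element with a keeper."""

    def __init__(self, initial_q=0):
        self.q = initial_q

    def update(self, a, b):
        if a == b:
            # Inputs agree: the gate behaves like an inverter.
            self.q = 1 - a
        # Inputs disagree: the output floats and the keeper holds self.q.
        return self.q


def biser_hold_phase(stored_bit, flip_lb=False, flip_ph1=False):
    """Return Q during the hold phase (CLK low) of the flip-flop.

    stored_bit : value held by both latches LB and PH1 before any strike
    flip_lb    : True if a particle upsets latch LB
    flip_ph1   : True if a particle upsets latch PH1
    """
    c = CElement()
    c.update(stored_bit, stored_bit)      # both latches agree before the strike
    lb = stored_bit ^ int(flip_lb)
    ph1 = stored_bit ^ int(flip_ph1)
    return c.update(lb, ph1)


if __name__ == "__main__":
    for bit in (0, 1):
        print(bit,
              biser_hold_phase(bit),
              biser_hold_phase(bit, flip_ph1=True),
              biser_hold_phase(bit, flip_lb=True, flip_ph1=True))
```

In this model Q is the inverse of the latch outputs. A single upset in LB or PH1 leaves Q unchanged, whereas a simultaneous upset of both latches, which lies outside the SEU fault model assumed in the text, does corrupt the output.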
The purpose of the keeper circuit in Fig. 4.13 is to fight the leakage current in the C-element when both the pull-up and the pull-down paths in the C-element are shut off, which occurs only when the inputs are in disagreement. Depending on the process technology and also
Fig. 4.13 The BISER flip-flop architecture
on the clock frequency, the keeper structure may be omitted, thus saving some area. A particle strike in the keeper circuit will not cause an error at Q because, under the SEU assumption, both C-element inputs hold the correct logic values and, hence, the Q node is strongly driven by the C-element. Although this solution costs more than the DICE latch, the technique can be reused when dealing with scan flip-flops, as illustrated in [81].
4.2.3.4 Cosmic Ray Immune Latch (CRIL)
Another RHBD memory element is the so-called CRI latch, proposed in [8]. This structure, however, runs into problems when radiation effects are too strong. For this reason, Liang et al. [73] propose a modified version of this latch to enable its use in harsh radiation environments. Figure 4.14 shows the principle of the enhanced-CRIL architecture. There are three storage nodes, XP, XN, and XD, used to keep the latch state. When a radiation-induced pulse corrupts the value of one of these nodes, the correct data can be restored from the values on the other two nodes. If a transient error (an SET) occurs on XP when the latch holds “0,” the pulse turns off the P1 transistor, so that XD loses its drive, but the output Q is still kept at “0.” It keeps N3 in the off
Fig. 4.14 The enhanced-CRI latch (CRIL) architecture
state, thus guaranteeing that the value on XN is preserved even though the error on XP turned on N2. Because these two unchanged “0”s on Q and XN keep P2 and P3 on, the correct value, “0,” is restored on XP. An upset on XP does not affect XN and XD, so this recovery mechanism works. Recovery after an SET on XN works in the same way. For the last node, XD, the pulse may induce an error that is either “1”-to-“0” or “0”-to-“1.” This turns on either the P3 transistor or the N3 one; however, P2 remains off in the former case and N2 remains off in the latter. As a result, the error never propagates to the other nodes. Even if, as a consequence of a radiation-induced SET at one of the latch nodes, the output Q is temporarily lost, the correct value of XD can be restored through P1 or N1. To change the state of the latch or to write data into it, nodes XP and XN must be altered simultaneously. This mechanism can also be exploited to mitigate SEUs induced by an SET on the input signal of the latch. Moreover, because only p-MOS transistors are connected to node XP, a high-to-low transient can never occur on it; correspondingly, a low-to-high transient can never affect node XN because only n-MOS transistors are connected to it. Transistors P1 and N1 should be sized carefully so that they never conduct too much current and node XD is not affected by transients on node XP or XN.
4.2.3.5 Statically and Dynamically Hardened (S/DH) Latch
The latch studied in [83] has the advantage, with respect to those seen so far, of coping with both SEUs caused by a particle striking one of the latch nodes and upsets caused by SETs on the element's input signal. A scheme of the S/DH latch circuit is shown in Fig. 4.15. Two inverters are present: INV1, formed by the transistor pair M2–M3, and INV2, composed of transistors M5 and M6.
In this structure the write operation is always conditioned by the settling of an input, D or Dn, at the high level. If, for example, Dn has been set to “0” just before D is established at “1,” the switching of INV2 is delayed as long as the M4 transistor is not turned on. This condition becomes true when node Qn reaches a low level, which only happens after the D input is set to “1.” As soon as M4 turns on, the “0” present at the Dn input is propagated through the INV2 inverter. The value inside the latch is thus written. The hardening design methodology is based on the fact that only a reverse-biased junction can be upset. This means that the values of an n-MOS storing a “0” and of a p-MOS storing a “1” cannot be changed by radiation effects. For this reason, the hardening technique addresses only the reverse-biased junctions. This implies that the Q or Qn nodes can be upset for both logical values, either through the M2, M6, or MR2 junctions or through the M3, M5, or MR1 junctions, and that the G or Gn nodes can be upset if they store a “1,” either through the MT1 or MR2 junctions or through the MT2 or MR1 junctions. To prevent a possible upset on the Q or Qn node, the concept of feedback poly-silicon resistors is used. Since CMOS resistors are large, they are replaced by MOS transistors. The MR1 and MR2 transistors “shield” the G and Gn nodes against the SEU and keep the memory state safe. Now, for an upset caused on the
Fig. 4.15 The S/DH latch architecture
G or Gn node, data loss is avoided by means of a high-impedance state. Indeed, when the node Gn (G) storing a “1” is upset to “0,” the transistor M4 (M1) stays off. So Q (Qn), the output of the latch, is put in a high-impedance state and keeps its stored value “0” safe during the SET. Finally, after the spurious transition, the Gn (G) node value is restored. A flip-flop based on the S/DH latch, used as both the master and slave latches, is also hardened against SETs in the input signal because
• the high-impedance state protects the cell from a “1”-to-“0” upset on the high-level input of the slave latch;
• if a “0”-to-“1” SET is present on one of the inputs, the erroneous value has to propagate through an inverter plus a resistive n-MOS. The delay required by this propagation and the presence of the dual input produce a very effective hardening.
4.2.3.6 Delayed Clocks
Another technique that can be effectively used in designing SEU-hardened memory elements is replicating and delaying clock signals. This technique has to be used together with the previously mentioned ones, which already imply redundancy of the memory element itself. As explained in Section 3.1.5, SETs in global lines and, in particular, in clock signals can cause data loss or corruption, induced by metastability, that cannot be recovered even with redundant memory elements if the clock is shared among them. For this reason, as suggested in [94], it is good practice to use replicated clock lines in order to enable the use of memory elements in mission-critical applications operating in harsh radiation environments.
In addition, clock signals can be delayed with respect to each other in order to implement time redundancy against radiation-induced transients. If an SET strikes the input line of the element replicas, the wrong input may be sampled by all the replicas at the same time, even if the clocks are replicated. By delaying the clock signals with respect to each other by a slight skew, the probability of sampling the transient fault in the majority of the replicas decreases in proportion to the introduced delay. The most significant disadvantage of this technique is the delay added to the sequential paths containing such memory elements. Wang and Gong [127] propose another latch-hardening design, which combines the delayed clocks principle with the DICE technique.
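The interplay between replicated memory elements, majority voting, and skewed clocks can be sketched with a simple timing model. The function names, the integer time units, and the pulse parameters below are all hypothetical; the SET is idealized as a rectangular pulse that inverts the data line.

```python
def majority(a, b, c):
    """Bitwise majority vote of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def input_value(t, correct, set_start, set_width):
    """Value on the shared data line at time t.

    The SET inverts the correct value during [set_start, set_start + set_width).
    """
    in_pulse = set_start <= t < set_start + set_width
    return correct ^ int(in_pulse)


def sample_tmr(correct, edge, skew, set_start, set_width):
    """Sample the data line with three replicas clocked at edge, edge+skew,
    edge+2*skew, then vote on the three samples."""
    samples = [
        input_value(edge + k * skew, correct, set_start, set_width)
        for k in range(3)
    ]
    return majority(*samples)


if __name__ == "__main__":
    # Hypothetical numbers: clock edge at t=1000, SET of width 100 starting at 950.
    for skew in (0, 30, 150):
        voted = sample_tmr(correct=1, edge=1000, skew=skew,
                           set_start=950, set_width=100)
        print(f"skew={skew:3d} -> voted value {voted}")
```

With no skew all three replicas capture the transient and the voter is defeated; once the skew exceeds the pulse width, at most one replica can sample the wrong value and the vote recovers the correct data. The cost, as noted in the text, is the extra delay of two skew intervals on the sequential path.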
4.2.4 Mitigation Techniques for Programming Elements
The FPGA programming element is at the base of the device functionality. It is the basic element that provides programmability and, in most cases, also reconfigurability. As mentioned in Section 2.1, three main technologies are used to implement programming elements: SRAM, Flash, and antifuse. There are also some other technologies, still in the experimental phase, that are under evaluation for possible optimizations of the configuration memory, based on optical switches [129] or magnetic ones [136].
4.2.4.1 SRAM Technology
Nowadays, SRAM is the most widely used technology for implementing programming elements in FPGA devices. It guarantees on-the-fly reconfiguration with very high performance and can take advantage of developments in the SRAM technology used in commercial applications to implement ever smaller and faster devices. The basic six-transistor (6T) SRAM cell is depicted in Fig. 4.16. Although SRAM is the most widely used technology, among the three main technologies used for implementing FPGA programming elements it is the least robust with respect to radiation effects [29, 34]. For this reason, several techniques and design methodologies have been proposed to harden programming elements built in this technology. Since the operating principle of an SRAM cell is similar to that of a latch, many mitigation techniques are the same as those used for single memory elements. DICE and TMR, or techniques derived from them, can be adopted. However, an FPGA usually contains a huge number of programming elements, so each must occupy as little area as possible in order to integrate an ever-increasing number of them. The DICE cell is twice the size of a standard 6T SRAM cell, being composed of 12 transistors. Other designs have thus been proposed to reduce the memory element size while preserving SEU hardness.
A first approach, proposed in the early 1980s in [7], is to add feedback resistors, also called decoupling resistors, to critical nodes of the original 6T SRAM cell. The concept, which also underlies the S/DH latch technique seen previously, is to place two series resistors, Rg, in the cross-coupling lines of the inverter pairs of a memory
Fig. 4.16 Basic 6T SRAM cell
cell, as depicted in Fig. 4.17. An ionizing particle that strikes the drain depletion regions of transistors in the off state generates electron–hole pairs, which are collected by the electric field and affect the saturation current of the MOS structure. These additional currents change the gate voltages, Vg, of the opposite inverter pair and tend to destabilize the logic state. The amount of change (ΔVg) depends on how much charge actually appears at the sensitive node. The insertion of series resistors Rg reduces the maximum voltage change produced by a given charge. At the same time, this increases the critical charge Qcrit needed to produce an upset, thus hardening the cell. A slight variation of the decoupling resistors technique is proposed in [126]. Figure 4.18 shows the architecture and highlights the addition of a capacitance to provide an RC-based, instead of only R-based, filtering solution. This approach follows the same principle as resistor decoupling. Two RC networks are inserted in front of the inverter gates. They suppress short current pulses induced by single-particle strikes. The advantage of this second solution is that, by introducing extra capacitance, the resistance value can be dramatically reduced. This resistance reduction makes it possible to use silicided polysilicon resistors, which exhibit much less temperature variation and are easier to manufacture. The poly-gate line is used as both Rg and Cg, for two main reasons: first, gate capacitance per unit area keeps increasing with shrinking gate oxide thickness; second, it saves silicon area. However, the normal gate capacitor also has two drawbacks: the source–drain couple forms a junction with the substrate–n-well couple that can lead,
Fig. 4.17 A 6T SRAM cell with decoupling resistors
Fig. 4.18 A 6T SRAM cell with decoupling resistors and capacitors
in case of a radiation strike, to a voltage change in the capacitor’s bottom plate, which can be coupled into the gate node and cause an SEU; moreover, the source and drain contacts increase the area penalty and the local routing complexity. It is thus possible to use, instead, a p-gate on p-substrate capacitor. Because there is no junction in such a structure and the substrate node is always grounded, no particle hit can cause voltage changes in the capacitor’s bottom plate. The scheme for such a solution is depicted in Fig. 4.19 [72].
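The trade-off between Rg and Cg discussed for the decoupling networks can be illustrated with a first-order RC model. This is a deliberate simplification, not the analysis of [7] or [126]: the particle strike is idealized as a rectangular voltage step of amplitude ΔV and duration tp on the struck node, and the gate of the opposite inverter sees it through a single-pole RC filter, so the peak gate disturbance is ΔV(1 − exp(−tp/(Rg·Cg))). All component values and the upset threshold are hypothetical.

```python
import math


def gate_peak(delta_v, pulse_s, r_ohm, c_farad):
    """Peak voltage change at the inverter gate for a rectangular disturbance.

    delta_v : disturbance amplitude at the struck node [V]
    pulse_s : disturbance duration [s]
    r_ohm   : series decoupling resistance Rg [ohm]
    c_farad : total capacitance at the gate node [F]
    """
    tau = r_ohm * c_farad
    return delta_v * (1.0 - math.exp(-pulse_s / tau))


# Hypothetical scenario: 1.2 V disturbance lasting 100 ps.
DELTA_V = 1.2
PULSE = 100e-12
THRESHOLD = 0.6          # assumed gate swing needed to flip the cell

unfiltered = gate_peak(DELTA_V, PULSE, r_ohm=1e3, c_farad=1e-15)
r_only = gate_peak(DELTA_V, PULSE, r_ohm=500e3, c_farad=1e-15)
rc_filter = gate_peak(DELTA_V, PULSE, r_ohm=50e3, c_farad=10e-15)

if __name__ == "__main__":
    for name, peak in (("unfiltered", unfiltered),
                       ("R only", r_only),
                       ("R and C", rc_filter)):
        verdict = "upset" if peak >= THRESHOLD else "filtered"
        print(f"{name:10s}: peak {peak:.3f} V -> {verdict}")
```

In this model only the product Rg·Cg matters, so adding ten times more capacitance allows a ten times smaller resistor for the same attenuation, which is the motivation given in the text for the RC variant.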
Fig. 4.19 8T SEU-hardened CSRAM cell
It is important to note that, using SOI technology, the size of the feedback resistors can be considerably smaller than that of the resistors required for bulk-silicon circuits to obtain the same hardness level.
4.2.4.2 Flash Technology
Another technology used to implement programming elements is Flash. The Flash memory cell is based on the floating-gate transistor, represented in Fig. 4.20. The floating-gate transistor is composed of a silicon (Si) substrate layer, on which the classical MOS structure is built. However, the region between the transistor gate and the channel, which in MOS technology is simply a layer of insulator, is here more complex, in order to provide the memory capability. In particular, there is a stacked structure composed of four different layers: the control gate, electrically controlled by the memory wordline, the ONO insulation layer, another poly layer
Fig. 4.20 Floating-gate transistor architecture
(floating gate), and, finally, an insulation layer that separates the floating gate from the transistor’s channel. The definition of the stacked structure is a very critical step of the manufacturing process, which requires the etching of a triple layer: poly2 (control gate), ONO, and poly1 (floating gate). As described in [98], the Flash memory cell can be used as a programming element, reaching high-density configuration memories. Flash technology responds to radiation differently from SRAM. Indeed, since Flash cells used in FPGAs are larger than cells used in very high-density applications, such as mass storage units, they are not sensitive to SEUs [100]. However, evidence of upsets in commercial, high-density Flash memories has been reported [11], which suggests that future Flash-based FPGAs, with smaller memory cells, could also suffer from this kind of fault. On the other hand, the TID hardness of this technology is lower than that of radiation-hardened technologies [125]. Indeed, considering the programmed state of a cell, when a high-energy ion strikes the floating-gate structure, its behavior is modified by three main causes: electron injection into the floating gate, hole emission from the floating gate, and electron trapping within the insulation layers. For the erased state, the mechanisms are the same, simply exchanging electrons with holes. All three phenomena change the threshold voltage Vth of the structure in a similar manner. In particular, in the programmed state it is raised by the charge injected within the transistor, while in the erased state it is lowered. This effect eventually makes the two states equal, so that the cell is no longer programmable. The mechanisms that are at the basis of such a behavior are
as follows: first, charges injected into the floating gate are not able to overcome the potential barrier of the insulation layers; second, and even more critical, the charge trapped within the dielectric permanently affects the material characteristics, having changed the lattice structure of the material itself. For this reason, Flash-based FPGAs should be used in mission-critical applications only when these are characterized by a low-dose environment or by very brief missions.
4.2.4.3 Antifuse Technology
An antifuse is a device that irreversibly changes from a high to a low resistance state when a programming voltage is applied across its terminals. Thus, one of the main features of antifuse technology is that such devices are programmable only once. Another important characteristic is that they are very small compared with the other technologies, above all SRAM, which allows a very high degree of integration. Antifuse-based FPGAs are thus one-time programmable and do not allow reconfiguration but, at the same time, they are not sensitive to SEEs, because their state cannot be inverted. Antifuses fall into two categories: dielectric and amorphous silicon antifuses. The dielectric antifuse is a structure based on a dielectric layer, usually ONO, placed between an N+ diffusion and poly-Si, as shown in Fig. 4.21. To integrate it in a standard CMOS process, three additional masks are required: N+ antifuse diffusion, antifuse poly, and thin oxide mask [55]. To program the antifuse and bring it into the conductive (low-resistance) state, a high voltage (10–20 V) is applied between the poly and the diffusion to melt the dielectric. The resistance of the conductive state is thus determined by the size of the link between the N+ diffusion and the poly-Si. The size of the link is determined by the amount of power dissipated in the link, which melts the dielectric. Since the temperature of the molten part is inversely proportional to its radius, the molten core expands until its temperature drops below the dielectric’s melting point. This process, on the basis of
Fig. 4.21 Basic structure of a dielectric antifuse
Fig. 4.22 PIP antifuse scheme
the materials’ characteristics and layer sizes, defines the voltage to be applied to program the device. In [62] another dielectric antifuse structure is proposed that uses tungsten silicide, WSi2, instead of both poly-Si and the N+ diffusion, for its good thermal stability and lower resistivity. As shown in Fig. 4.22, the first electrode layer, Poly-1, is embedded in a phosphosilicate glass (PSG), which is used as a dielectric film between layers on a Si substrate. A very thin ONO layer (about 5 nm) is then deposited. The second electrode, Poly-2, is finally deposited onto the ONO layer, and everything is covered and insulated by means of borophosphosilicate glass (BPSG). Because of the presence of two poly-Si-based electrodes, this antifuse is referred to as a poly-insulator-poly (PIP) antifuse structure. Amorphous silicon antifuses rely on the same structure but use amorphous silicon (α-Si) instead of the dielectric. Such devices, also called metal-to-metal antifuses (MMAF), are based on the principle that amorphous silicon placed between two metal layers and subjected to a high voltage can be transformed into a polycrystalline silicon–metal alloy with a low resistance, thus connecting the two metal layers. Figure 4.23 shows a vertical cross section of the antifuse structure as described in [131]. The lower titanium–tungsten (TiW) electrode is deposited on top of the inter-metal dielectric. Another thick oxide layer is deposited above it. It is then etched to form the α-Si layer, leaving two spacers between the amorphous silicon and the second metal electrode. The spacers isolate the thinnest and most variable area of the amorphous silicon from the top TiW electrode to improve the control of the programming voltage, which is a function of the amorphous silicon thickness. There are two advantages of an MMAF over a dielectric antifuse. First of all, connections to an MMAF are made directly to metal, and thus to the wiring layers themselves.
Fig. 4.23 Amorphous silicon antifuse structure
In a dielectric antifuse, instead, connections to the wiring layers require additional space for the contacts and create higher parasitic capacitance. Moreover, this direct connection to the low-resistance metal layers makes it easier to use larger currents when programming the antifuse device. Chiang et al. [35] report a comparison between different antifuse technologies used as programming elements in FPGAs, with respect to the most important parameters: leakage and breakdown voltage, resistance, time-dependent dielectric breakdown (TDDB), on-resistance stability, and thickness. Because of the high voltages applied during programming and of the manufacturing process itself, one of the main issues of antifuse technology is reliability. Indeed, dielectric antifuses are subject to the TDDB reliability mechanism [109]. The time to breakdown of a dielectric antifuse is a function of the applied electric field, and the FPGA fails when any one of its many dielectric antifuses ruptures [35]. For these reasons, several techniques and manufacturing processes have been studied in order to improve the reliability of antifuse technology and achieve a good yield [89, 118].
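The statement that the device fails as soon as any one antifuse ruptures corresponds to a series reliability model. As a rough sketch, assuming that ruptures are statistically independent and that every antifuse has the same rupture probability p over the period of interest (both assumptions are ours, not taken from [35]), the device failure probability is 1 − (1 − p)^N. The function name and the numbers in the example are hypothetical.

```python
import math


def device_failure_probability(p_single, n_antifuses):
    """Probability that at least one of n independent antifuses ruptures.

    p_single    : rupture probability of one antifuse, with 0 <= p_single < 1
    n_antifuses : number of unprogrammed dielectric antifuses under stress
    """
    # 1 - (1 - p)^n, evaluated with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_antifuses * math.log1p(-p_single))


if __name__ == "__main__":
    # Hypothetical values: one-in-a-billion rupture probability per antifuse.
    for n in (10**4, 10**6, 10**8):
        print(f"N={n:>9d}: P(fail) = {device_failure_probability(1e-9, n):.3e}")
```

Even a very small per-antifuse rupture probability is multiplied by roughly the number of antifuses while the product remains small, which is why the reliability of each individual element has to be pushed so far in large devices.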
4.2.5 Mitigation Techniques for Memories
In an FPGA there are mainly two kinds of memories: the configuration memory and the embedded block memories. The first one is composed of programming elements
of one of the technologies mentioned before; the embedded memories are usually blocks of SRAM. In addition to these two kinds of memories, the newest FPGAs, with embedded hard-wired cores, may contain other memories, such as caches in microprocessors; these are also usually SRAMs. Each memory can be hardened according to the techniques and methodologies explained in Sections 4.2.3 and 4.2.4. Usually, the configuration memory is hardened using the techniques reported in Section 4.2.4; other memories, since they are not likely to be large, may be hardened with the techniques presented in Section 4.2.3. All the methodologies and techniques presented so far are single-bit-based: they make every bit of the memory robust, basically by modifying its storage cell structure. However, it is possible to apply another technique that does not address every single bit in the memory but, instead, the words or lines of the memory itself. Such an approach saves space because mitigation is applied only once for each set of bits but, on the other hand, it cannot guarantee the same hardening performance as single-bit methodologies. It is based on the use of information redundancy, i.e., error detection (and correction) codes, and memory scrubbing. Reporting all the possible codes that could be used to detect and even correct single or multiple faults in memories is beyond the scope of this book; however, some brief considerations can be made with respect to memories in FPGAs. A first distinction can be made between single error detection (SED) and single error correction–double error detection (SEC–DED) codes. Codes of the first kind allow only the detection of an error, and these methods are far from guaranteeing the level of reliability required by mission-critical applications. Codes of the second kind, instead, allow the automatic detection of double errors and the correction of single errors, and are thus more suitable for these applications.
The application of such codes, also referred to as error correcting codes (ECC) [1, 76], to a memory system also requires dedicated circuitry to detect and, where possible, correct errors present in memory; the whole set of features, in terms of check bits and detection/correction circuitry, is called an error detection and correction (EDAC) system [95]. In order to enable the use of SED codes or to solve the problem of multiple errors with EDAC systems, scrubbing is required. Scrubbing is a technique by which the memory is rewritten with the correct data, either periodically or when an error is detected. Scrubbing needs additional circuitry that refreshes the content of the memory with the correct values. For memories that store data that change while the application is running, this task can be non-trivial and computationally intensive. In general, scrubbing, with or without SED codes, should be used for non-reliable memories with fixed content, like the SRAM configuration memory, while for embedded RAMs EDAC systems are preferred [94]. It should be noted that, even if these techniques are able to correct memory errors, they cannot prevent faults: when an SEU occurs, the bit remains changed until the correction system intervenes to correct it. For configuration memories, this leads to transient faults in the FPGA resources, and thus in the implemented application, induced by the temporary alteration of a configuration bit.
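As an illustration of how an EDAC system and a scrubber cooperate, the following sketch uses an extended Hamming (8,4) code, one of the simplest SEC–DED codes. It is an example chosen for brevity, not the code used in any particular FPGA; the word size, the function names, and the memory model (a Python list of codewords) are all hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) word.

    Positions 1..7 form a Hamming(7,4) word (check bits at 1, 2, 4);
    position 0 holds the overall parity bit.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    w[0] = sum(w[1:]) % 2
    return w


def decode(word):
    """Return (nibble, status), with status 'ok', 'corrected' or 'double'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    overall = sum(w) % 2             # 0 when the total parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1             # single error; syndrome 0 means the parity bit
        status = "corrected"
    else:
        return None, "double"        # two errors: detected but not correctable
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return nibble, status


def scrub(memory):
    """Rewrite every word holding a correctable error; return the repair count."""
    repaired = 0
    for addr, word in enumerate(memory):
        nibble, status = decode(word)
        if status == "corrected":
            memory[addr] = encode(nibble)
            repaired += 1
    return repaired


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[3][5] ^= 1                 # first SEU in word 3
    print("repaired words:", scrub(memory))
    memory[9][2] ^= 1                 # two SEUs accumulate in word 9 ...
    memory[9][6] ^= 1                 # ... before the next scrub pass
    print("word 9 status :", decode(memory[9])[1])
```

The example also shows why the scrubbing period matters: a single upset is repaired at the next pass, but if a second upset hits the same word before that pass, the word can only be flagged as faulty, not corrected.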
4.2.6 Mitigation Techniques for Logic Elements
In an FPGA the resources that actually implement the requested functionality are the logic elements embedded in the configurable logic blocks (CLBs). The main concern for this kind of resource is SETs, which are caused by ionizing radiation striking a sensitive node and inducing a temporary variation of logic signals; these may turn into SEUs if latched by downstream memory elements, or propagate through the logic elements. In this case, too, mitigation techniques can be applied at different levels: on the one hand, layout techniques provide design rules and methodologies to improve the reliability of the device; on the other hand, RHBD techniques mitigate SETs by introducing new structures based on the concept of time redundancy, in order to filter the transient pulse. The most relevant RHBD technique is called guard gate.
4.2.6.1 Layout Mitigation
In sub-100 nm CMOS technologies, parasitic elements in the CMOS structure lead to significantly distinct P-MOS and N-MOS device charge collection characteristics, especially in the temporal nature of the currents related to charge collection. Bipolar amplification is the main mechanism differentiating P-MOS hits from N-MOS hits. Mitigation techniques aiming at reducing the bipolar effects should result in shorter SET pulses for p-hits. The design parameters that have a major effect on parasitic bipolar amplification, because they determine the voltage drop in the n-well, are the n-well vertical resistance (Rvertical) associated with the contact area [92] and the resistance between the channel and the n-well contact (Rwell) [114]. In detail, Rvertical is a direct function of the n-well contact area, since bigger n-well contacts yield a lower Rvertical, requiring higher currents to turn on the parasitic bipolar transistor.
While Rwell is directly related to the distance between the n-well contact and the transistor, close spacing of these two will result in a diminished effect of the parasitic bipolar amplification. The increased rate of charge removal is another mechanism capable to reduce the SET pulse widths. The rate is calculated referring to the current-carrying capability of the restoring transistor. One of the main techniques for reducing the pulse widths in the past has been to increase the size (i.e., transistor width) of the transistors associated with vulnerable nodes. However, this technique has some drawbacks since the increasing of the transistor width also creates penalties on the area and power characteristics. Therefore, some techniques only harden those nodes that are, probably suspected, to be most vulnerable. However, results reveal that at sub-100 nm technology nodes, NMOS devices collect significantly less charge than PMOS devices due to parasitic bipolar amplification on the PMOS devices. A better transistor sizing can be achieved based on the SET pulse width difference between n-hit and p-hit. This non-uniform increase helps reduce the p-hit pulse width by 30%. The reduction in pulse width is due to the increased rate of charge removal due to the increased NMOS restoring current drive associated with the p-hit node, while keeping the parasitic bipolar transistor area at a minimum.
4.2
Techniques for FPGA Manufacturer
4.2.6.2 Guard Gate
The guard-gate technique is based on the principle of time redundancy: the value of the output signal of a logic element is checked twice at different times, separated by a delay δ; if the two values agree the result passes through, otherwise the previous correct value is maintained until the spurious transient expires. Such an approach is able to filter SETs up to a width equal to the delay between the two checks; however, it involves the insertion of delay structures, and thus an area overhead and a penalty in speed performance, because it introduces additional delays within a combinational path. In order to minimize the area and speed penalty, the guard gate should be introduced just before a memory element, which is the critical element that may eventually sample an SET. The authors of [13, 56, 82] propose the adoption of this technique to mitigate SETs in the combinational logic of ICs. The guard gate by itself is a buffer circuit with two inputs and one output, as shown in Fig. 4.13. The output floats in the high-impedance state if the two inputs are not identical. When the output node floats, the output voltage maintains its value until leakage current degrades it. When both inputs are identical, the gate acts like an inverter. The fact that both inputs need to be identical lends itself to the mitigation of small signal perturbations. To understand the use of the guard gate, we consider a simple example where a combinational logic block feeds data to a latch. If an SET is generated and propagated to the output of the combinational logic, it may result in erroneous data being latched. As an example, the guard gate is employed in the circuit as shown in Fig. 4.24, where both the guard-gate C-element
Fig. 4.24 Mitigation of SETs in logic elements by means of guard gate (diagram labels: logic element, δ-delay, guard-gate inputs A and B, keeper, output Q)
inputs are from the combinational logic block output, but one of them is delayed. Assuming that the delay on the second input is longer than the SET pulse width, the SET pulse present in the original signal and the one in the delayed signal will not be applied simultaneously to the guard gate. As a result, while the SET pulse is present at one of its inputs, the guard-gate output floats and maintains its voltage value. This effectively prevents the SET pulse from reaching the latch input. The above discussion assumes that the delay in the guard-gate input path is longer than the SET pulse width. If the SET pulse width is longer than the delay, the erroneous signals overlap at the inputs of the guard gate, as shown in Fig. 4.25. Panel (a) shows the signals when the SET pulse width is shorter than the delay on the input: the SET pulse shows up at the guard-gate inputs at different times and is eliminated by the guard gate. When the SET pulse width is longer than the delay, instead, the guard-gate inputs experience an overlap of the erroneous signal, and the guard-gate output responds to the erroneous inputs. Under such a scenario, the SET pulse is allowed to propagate in its entirety to the latch. As the delay on the guard-gate inputs must be decided by the designer in advance, SET pulses that are longer than the set delay cannot be eliminated by the guard gate. Another issue related to SETs in logic elements is that recent studies relate the pulse width to the combinational path the pulse traverses. In particular, the so-called propagation induced pulse broadening (PIPB) effect has been observed and investigated [32, 50], providing evidence that, while traversing logic elements, an SET can be broadened as well as filtered, depending on the electrical and logic characteristics of the logic path itself. This means that possible elongations of the pulse should be taken into consideration when designing the guard gates. Some research has been performed to mitigate this behavior at the application design level, as explained in Section 4.3.1.
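The filtering behavior of the guard gate can be reproduced with a small discrete-time model. The sketch below is only an idealized illustration: time is quantized in unit steps, leakage on the floating node is ignored, and the function names `guard_gate` and `with_set` are hypothetical.

```python
def guard_gate(signal, delay):
    """Simulate an inverting guard gate fed by `signal` (input A) and by a
    copy of it delayed by `delay` time steps (input B).

    When A and B agree the gate drives the inverted value; when they differ
    the output floats and keeps its previous value.
    """
    q = 1 - signal[0]                    # assume the output starts settled
    out = []
    for t, a in enumerate(signal):
        b = signal[t - delay] if t >= delay else signal[0]
        if a == b:
            q = 1 - a                    # inputs identical: act as an inverter
        out.append(q)                    # otherwise hold the stored value
    return out


def with_set(length, start, width):
    """Constant-0 signal with one SET pulse of `width` steps at `start`."""
    return [1 if start <= t < start + width else 0 for t in range(length)]


short = guard_gate(with_set(20, 5, 2), delay=3)   # SET shorter than the delay
long_ = guard_gate(with_set(20, 5, 5), delay=3)   # SET longer than the delay
print(short.count(0))   # 0: the output never leaves its correct value
print(long_.count(0))   # 5: the whole pulse reaches the output, delayed
```

In this model a pulse no wider than the delay is removed completely, while a wider pulse appears at the output with its full width, shifted by the delay, which is consistent with the two cases discussed for Fig. 4.25.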
4.2.7 Mitigation Techniques for Input/Output Elements
Input/output (IO) elements in recent FPGAs are complex blocks that embed several logic resources and may also contain sequential resources, such as memory elements or flip-flops, to ease the input/output buffering of the interface signals of the application. In general, IC pads require several circuits. Common I/O circuits include electrostatic discharge (ESD) protection, hysteretic noise rejection circuits, voltage level converters, and buffer circuits to drive the IC pad or the internal routes to the core circuitry. At a minimum, buffers must be supplied for both input and output pads. The main mitigation technique against SEEs in this kind of element is TMR. While the pads use TMR to mitigate SETs that may occur in the signaling path, a single pad is used per signal. The input pad splits a given input into three separate replicas, which are correctable by majority voting. The output pad is similarly controlled by three copies of a given data bit, which are voltage shifted up and fed into a majority-gate pad driving stage. Clark et al. [37] propose a possible implementation of input and output pads with the TMR technique.
Fig. 4.25 Waveforms for guard-gate system in the case of a shorter than δ SET (a) or wider than δ SET (b) (each panel plots the voltage V of signals A, B, and Q over time t)
Swift et al. [115] report testing of FPGA I/O blocks using heavy ions and with fault injection in three modes: non-redundant, TMR on inputs only, and full input and output TMR. The second solution reduced the cross section by only 50%. With full TMR, instead, no beam testing failures were observed. The third approach thus effectively reduces the cross section per I/O channel by at least two orders of magnitude. Comparisons of radiation testing experiments with fault injection simulations showed that about 30% of the observable errors were due to SETs in the output circuit path.
4.3 Overview of Techniques for FPGA User
4.3.1 In-Chip Mitigation Techniques
4.3.1.1 Synthesis Techniques
The redundancy-based techniques presented in this section adopt additional hardware components or additional computation time to detect the presence of soft errors that modify the expected circuit operations, and to mask their propagation to the circuit outputs. Techniques based on redundancy are not intended to remove errors from the configuration memory, but only to mitigate their effects. The errors themselves can be removed by relying on reconfiguration techniques, which remove the transient errors accumulated in the configuration memory. The simplest redundancy-based techniques duplicate the circuit that the FPGA implements: the outputs of the two copies are continuously compared, and a detection signal is raised as soon as a mismatch is found. Some solutions of this kind have been deployed. They are fairly simple and cost-effective; however, they are not able to mask the effects of soft errors or permanent errors. The triple modular redundancy (TMR) approach is adopted when fault-masking capabilities are required. The basic concept of the TMR architecture is that a circuit can be hardened against SEUs by designing three copies of the same circuit and building a majority voter on the outputs of the replicated circuits. In technologies such as ASICs, TMR against SEUs is generally applied to the memory elements only, since combinational logic and interconnections are less sensitive to SEUs. When the configuration memory of FPGAs is considered, the TMR implementation should be revisited, since a modification in the configuration memory may affect any FPGA resource: routing resources implementing interconnections, combinational resources, sequential resources, and I/O logic. This means that three copies of the whole circuit, including I/O logic, have to be implemented to harden it against SEUs [30]. The optimal implementation of the TMR circuitry inside SRAM-based FPGAs depends on the type of circuit that the FPGA implements. As described in [30], the logic may be grouped into four different types of structure: throughput logic, state-machines logic, I/O logic, and special features (embedded RAM modules, DLLs, etc.). Throughput logic is a logic circuit of any size or functionality, synchronous or asynchronous, where the entire logic path flows from the inputs to the outputs of the module without ever forming a logic loop. The TMR architecture for a module M is implemented as shown in Fig. 4.26. Three copies of M are connected to a majority voter V, which computes the output of the throughput logic. In order to prevent common-mode failures, the inputs feeding the throughput logic have to be replicated, too. This implies that, when M is fed directly from I/O pins, the adoption of TMR requires tripling the circuit I/O pins. State-machines logic is, by definition, state dependent. For this reason, it is important that the TMR voting is performed internally rather than externally to such
4.3
Overview of Techniques for FPGA User
Fig. 4.26 TMR architecture for throughput logic
Fig. 4.27 TMR scheme for state-machines logic
a module. Thus, applying TMR to a state machine consists of tripling all circuits and inserting a majority voter for each of the replicated feedback paths. The use of three redundant majority voters eliminates them as single points of failure, as shown in Fig. 4.27. Hardening the I/O logic through TMR causes a severe increase in the number of required I/O pins and this method can be used only when there are enough I/O resources to achieve tripling of all the inputs and outputs of the design. Therefore, as illustrated in Fig. 4.28, each redundant module of a design that uses FPGA inputs should have its own set of inputs. Thus, if one input is affected by an SEU, it only affects one module of the TMR architecture. The majority of any logic design can be realized by using look-up tables (LUTs), flip-flops (FFs), and routing resources that can be hardened against SEUs in the configuration memory through the previously outlined methods. However, there are other special FPGA resources that allow the implementation of more efficient and
Fig. 4.28 TMR scheme for I/O logic
better-performing circuit implementations. These include block RAM, LUT RAM, shift registers, and arithmetic cores. For each of these features, there are particular recommendations to be followed to guarantee an accurate TMR architecture. A detailed presentation of these recommendations is beyond the scope of this book. Readers interested in these subjects may refer to [30, 31]. Other methodologies to implement redundant architectures on SRAM-based FPGAs are available. One of these techniques performs all mitigations at the level of the description language, providing a functional TMR methodology [53]. According to this methodology, interconnections and registers are tripled and internal voters are used before and after each register in the design. The advantage of this methodology is that it can be applied to any type of FPGA. Another approach is based on the concept that a circuit can be hardened against SEUs by applying TMR selectively (STMR) [25]. This approach extends the basic TMR technique by identifying SEU-sensitive gates in a given circuit and then introducing TMR on these gates only. Although this approach optimizes TMR by replicating only the most sensitive portions of a circuit (thus saving area), it needs a high number of majority voters, since one voter is needed for each SEU-sensitive circuit portion. To reduce both the pin count and the number of voters used to implement the TMR approach, Lima et al. proposed a technique based on time and hardware redundancy to harden combinational logic [40, 75]. This technique combines duplication with comparison (DWC) with a concurrent error detection (CED) machine based on time redundancy that works as a self-checking block. DWC detects faults in the system and CED detects which blocks are fault free. Although this fault-tolerant technique aims to reduce the number of I/O pads and the power dissipation, it is applied to a high-level description of the circuit; thus, if its components are not properly placed and routed on the FPGA, they may suffer from the multiple effects induced by an SEU in the FPGA’s configuration memory. In order to address the multiple effects induced by SEUs in the FPGA’s configuration memory, it is mandatory to select a
clever placement and routing of the design. To attack the problem, we abstracted the physical characteristics of an FPGA by using a generic model.
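The TMR schemes for throughput logic and for state-machines logic described in this section can be sketched behaviorally as follows. This is a minimal Python model, not an HDL implementation; the function names, the `upset` parameter used to corrupt one domain, and the example module are all hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority of three integer words."""
    return (a & b) | (a & c) | (b & c)


def tmr_throughput(module, inputs, upset=None):
    """Run three copies of `module` on replicated inputs and vote the outputs.

    `upset` is a hypothetical (domain, xor_mask) pair that corrupts the
    output of one copy, standing in for an SEU in that domain.
    """
    outs = [module(x) for x in inputs]
    if upset is not None:
        domain, mask = upset
        outs[domain] ^= mask
    return majority(*outs)


def tmr_counter_step(states, width=4):
    """One clock cycle of a triplicated counter with voted feedback.

    Each domain has its own voter on the feedback path, so every next state
    is computed from the majority of the three current states.
    """
    mask = (1 << width) - 1
    voted = [majority(*states) for _ in range(3)]   # three redundant voters
    return [(v + 1) & mask for v in voted]


def module(x):
    """Hypothetical 8-bit throughput function used only for the example."""
    return (3 * x + 1) & 0xFF


clean = tmr_throughput(module, [5, 5, 5])
masked = tmr_throughput(module, [5, 5, 5], upset=(1, 0x80))
resynced = tmr_counter_step([3, 3, 12])   # domain 2 holds a corrupted state
print(clean, masked, resynced)
```

The last call shows why voting inside the feedback loop matters: the corrupted state register is overwritten with a value derived from the voted state, so the three domains are synchronized again after one clock cycle.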
4.3.1.2 Placement Techniques: Floorplan
The design of circuits on SRAM-based FPGAs is performed with existing commercial tools. These tools offer a complete tool chain that includes synthesis, mapping, and place and route on the FPGA physical array. In order to help the designer define a circuit’s placement constraints, floorplan tools are provided. These tools are based on the periodic chip die of SRAM-based FPGAs, where each architectural module of a circuit has a list of required logic and routing resources. Therefore each module has to be placed within an area of the FPGA array containing all the needed resources. Floorplan tools rely on floorplan design techniques that are based on a two-step algorithm: the execution of the Parquet floorplanner [4] with a resource-aware cost function, which estimates the number of resources assigned to a single module with respect to the resources available in the FPGA’s array area. At the moment, this approach is the state of the art in resource floorplanning for SRAM-based FPGAs. However, this approach presents several drawbacks with respect to the FPGA reconfigurability characteristics, since reconfigurable modules must occupy a rectangular area in order to deal with the internal reconfiguration port available on the most recent FPGA devices, such as modern Xilinx FPGA families. A work dealing with floorplanning oriented to the reconfiguration capabilities and timing characteristics has been proposed in [123], where the floorplan process consists of two phases: the partitioning and sequencing of the design modules into sub-designs for configuration (this step is related to temporal partitioning and scheduling), and the spatial positioning of design modules and wiring segments within each reconfigurable area. More recent works have contributed mainly to three-dimensional floorplanning [135]. The 3D floorplanning techniques developed there introduce full floorplanning based on simulated annealing. These approaches are able to evaluate the time required by each module to communicate with a RAM module outside the FPGA device; however, the estimation cost function used considers only logic resources, such as logic blocks, without considering routing segments. A different work, which considers the FPGA floorplanning problem oriented to partial dynamic reconfiguration, has been introduced in [105]. The aim of that work was to reduce the area used for the reconfiguration given two designs in two different time instants. The most recent work on floorplanning has been presented in [85], where the authors propose an approach able to perform the floorplan taking into account both the resources and the reconfiguration capabilities. When fault-tolerance characteristics are considered, no pure floorplan approach has been developed in the past to face the problem of single event effect (SEE)-induced errors. In this chapter, we adopt the commercial tool provided by Xilinx in order to perform the floorplan implementation of the benchmark circuits and compare the performance and fault-tolerance characteristics with respect to the developed timing-driven placement algorithm.
4.3.1.3 Limitation of Synthesis and Mapping Techniques
Netlist-based hardening solutions are based on constraints generally applied to replicated registers adopting a voter structure. Implementing netlist-based hardening solutions in other technologies, such as ASICs, is generally limited to protecting only the memory elements, because combinational logic is hard-wired and corresponds to non-configurable gates. Conversely, full module redundancy is required in FPGAs, because memory elements, interconnections, and combinational gates are all susceptible to SEUs. This means that three copies of the user’s design have to be implemented to harden the circuit against SEUs. The implementation of a TMR circuit inside an SRAM-based FPGA depends on the type of circuit that is mapped on the FPGA device. All the logic resources are triplicated in three different domains according to the traditional TMR approach, and voter elements are included at the domains’ outputs. When any of the design domains fails, the other domains continue to work properly and, thanks to the voter, the correct logic value passes to the circuit output, as represented in Fig. 4.29. The redundant module architecture presents two major drawbacks when implemented on SRAM-based FPGAs:
1. It leaves designs vulnerable to SEUs in the voting circuitry. Radiation-hardened voting circuits are available to protect against permanent damage to the voter induced by single event latch-up (SEL); however, CMOS devices using layout geometries below 0.25 µm are still susceptible to transient effects.
2. It does not provide any protection against the accumulation of SEUs within the flip-flops (FFs) of state machines. After an SEU happens in the traditional TMR scheme, the state machine is corrected through scrubbing but the state is not reset for synchronization.
When implemented on SRAM-based FPGAs, TMR must follow specific rules addressing three fundamental design blocks: input and throughput logic, feedback logic, and output pins. Each triplicated TMR logic domain operates independently from the other logic domains. As illustrated in Fig. 4.29, all inputs, outputs, and voters are triplicated, which eliminates single points of failure in these resources. Furthermore, in order to ensure constant synchronization between the redundant state registers, majority voters with feedback paths are inserted. As a result, the feedback logic for each state machine is a function of the current state of all three state registers. Finally, triplicated minority voters protect against SEUs affecting the circuit outputs that are directly connected to the device package pins, as illustrated in Fig. 4.30. If an upset occurs in throughput logic or in a state machine of a design, the voter of the affected domain detects that its output is different and disables the three-state buffer, while the other two domains continue to operate correctly and to drive the correct output of the chip. This scenario makes netlist-hardened circuits logically immune to upsets in the voting circuitry and to transient errors. Recent research demonstrated that this characteristic is not sufficient to guarantee complete protection against SEU effects [61, 113]. In particular, it has been demonstrated that one and only one
Fig. 4.29 The traditional TMR scheme with voter structures and an example of fault vulnerability
Fig. 4.30 The TMR scheme correctly implemented on SRAM-based FPGAs
configuration memory bit controls two or more of the FPGA’s internal routing segments. Thus an SEU affecting the FPGA’s configuration memory may provoke a domain-crossing error bypassing the TMR protection capabilities. In detail, a domain-crossing error is an error that spans two or more redundant domains of the TMR structure, propagating to the voter structure and causing the incorrect operation of the voter. Thus, the circuit becomes unable to detect and correct a wrong circuit behavior [97]. Timing closure in TMR designs today relies only on user constraints to ensure cross-domain data path analysis during place and route. The standard
place and route tool chain assumes that the clock signals of each redundant domain arrive at the FPGA resources simultaneously. However, the input clocks may be phase-shifted with respect to each other, and timing problems can result. These kinds of timing problems are generated by incorrect time balancing introduced by place and route tools. In detail, standard place and route tools consider only FFs within the same voter partition; therefore, the time balancing is optimized within a single voter partition, while no optimization is performed between voter partitions. The shift between redundant clock domains can be minimized by using PCB clock traces of equal length for the redundant versions of the same clock. Nevertheless, this solves the problem only for the clock-tree timing propagation; it does not take into account the data path delay introduced by the voter scheme. A typical non-TMR state machine can be constrained using a single period constraint, as this ensures that setup requirements are met between synchronous elements in the data path. The timing of a TMR state machine with voter partition logic, instead, cannot be fully constrained by applying a period constraint to each path, because this does not cover the paths between all the possible cross-domains existing on the feedback logic signals.
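The output stage with triplicated minority voters and three-state buffers, described earlier in this section, can be illustrated with the following behavioral sketch. It assumes single-bit domain outputs and an ideal wired connection at the package pin; the function names are hypothetical.

```python
def minority(own, other1, other2):
    """True when `own` disagrees with both other domains."""
    return own != other1 and own != other2


def output_pin(tr0, tr1, tr2):
    """Resolve the pin driven by three three-state buffers, each one guarded
    by the minority voter of its own domain.

    Returns the value seen on the pin and the number of enabled buffers.
    """
    values = [tr0, tr1, tr2]
    drivers = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        if not minority(v, *others):     # the buffer of this domain stays enabled
            drivers.append(v)
    # With binary values at least two domains agree, so all enabled buffers
    # drive the same level and there is no contention on the pin.
    return drivers[0], len(drivers)


print(output_pin(1, 1, 1))   # all three domains drive the pin
print(output_pin(1, 0, 1))   # the upset domain is disconnected
```

In the second call the domain whose value differs from the other two disables its own buffer, and the remaining two buffers keep driving the correct value.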
4.3.1.4 Routing Functional Effects
Experimental analysis of the FPGA architecture consists of analyzing the physical behavior of the device while it operates in a critical environment. This kind of analysis makes it possible to evaluate the probability of an erroneous event and to define the proper cross section of SEU and MCU effects. In order to obtain the routing functional effects, an experimental setup has been configured, as illustrated in Fig. 4.31. It consists of two main units: the control host, located 50 m away outside the radiation chamber, and the radiation chamber itself, which contains, at a distance of 10 cm, a PowerPC board based on an MPC860 microprocessor, a control FPGA architecture, and the FPGA under test, which is used to characterize the routing functional effects.
Fig. 4.31 Routing effects experiment setup (blocks: Control Host, MPC 860, Control FPGA, FPGA Under Test)
The controller of the radiation experiment is realized with a PowerPC MPC860 and a control unit implemented on a Virtex FPGA. This FPGA architecture consists of four FIFO buffers and a circuit that controls the writing and reading of the bitstream of the configuration memory of the FPGA under test. The FIFO buffers have been implemented in order to decouple the data flow between the MPC860 microprocessor and the FPGA under test: two FIFOs are used to download the bitstream from the configuration memory and two FIFOs are used for reading and writing the input and output test vectors. Both the MPC860 microprocessor and the control FPGA are placed very close to the FPGA under test inside the radiation chamber. With this solution, only the supply signal and an Ethernet connection, which links the MPC860 microprocessor to the host PC located 50 m from the radiation chamber, have to be driven from outside the radiation chamber. In order to avoid single event latch-up (SEL) phenomena, the power supply signals of the FPGA under test are constantly monitored by a protection circuit placed outside the radiation chamber. During the execution of the radiation experiment, the FPGA under test is constantly exposed to the ion radiation and steadily stimulated and monitored through the control hardware. When a difference between the output vectors and the expected ones is observed, a readback of the configuration memory is executed and the pattern sequence is stored. The FPGA’s configuration memory is periodically reconfigured in order to reduce the possible accumulation of SEUs or multiple particle-induced MCUs. Some combinational test circuits have been used during the radiation experiments. Combinational circuits are extremely effective for observing the functional effects of errors on routing resources. In particular, a combinational multiplier has been adopted.
The combinational multiplier, the c6288 from the ISCAS’85 benchmarks, has been placed on the FPGA in N instances, for different values of N, with the output vectors generated by XORing the outputs of the N instances. The ions reported in Table 4.1 have been used during the radiation experiments. The FPGA’s configuration memory has been periodically read back, comparing the possibly erroneous configuration with the original bitstream. The bitstreams affected by a single error have been compared with the original ones. The effects are reported in Table 4.2, while the SEU/SEFI ratios are reported in Table 4.3.
Table 4.1 Total ionizing dose
Ion       Dose [Krad(Si)]
C 12      6.69
O 16      0.36
F 19      0.09
Si 28     0.16
Cl 36     0.07
Ni 58     0.21
Ag 107    0.17
I 127     0.10
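The test structure described above, N multiplier instances whose outputs are XORed together, can be sketched in software as follows. The c6288 benchmark is commonly described as a 16 × 16-bit multiplier, and that is assumed here; the faulty instance, with one output bit stuck at 0, is a hypothetical example and not a fault observed in the experiment.

```python
from functools import reduce


def multiplier(a, b):
    """Fault-free 16 x 16-bit multiplier, standing in for one c6288 instance."""
    return (a & 0xFFFF) * (b & 0xFFFF)


def faulty_multiplier(a, b):
    """Hypothetical faulty instance: output bit 3 stuck at 0."""
    return multiplier(a, b) & ~(1 << 3)


def xor_outputs(instances, a, b):
    """XOR together the outputs of all instances fed with the same operands."""
    return reduce(lambda acc, inst: acc ^ inst(a, b), instances, 0)


N = 3
golden = xor_outputs([multiplier] * N, 7, 9)
observed = xor_outputs([multiplier] * (N - 1) + [faulty_multiplier], 7, 9)
print(golden, observed, golden != observed)
```

Because XOR propagates any single-bit difference, an error in the output of one instance always changes the combined output vector, which is what makes the mismatch observable from outside the device.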
Table 4.2 Total ionizing dose
Effect class      SEFI [#]   %
Unrouted          184        10
Antenna           59         3
Bridge            39         2
Conflict          676        38
Open              314        17
IO block          88         5
Tolerant          24         1
Fault LUT         368        20
Fault MUX CLB     33         2
Fault CFG CLB     33         2

Table 4.3 Total ionizing dose
Ion       LET [MeV/mg/cm2]   SEU / SEFI
C 12      1.6                33
O 16      3                  12
F 19      4.1                8
Si 28     8.5                6
Ni 19     30                 9
For each read-back process, between 100 and 200 errors at most have been recorded, while in 10% of the cases fewer than 10 errors have been recorded. This result is particularly relevant, since it allows an accurate identification of the errors within the configuration memory and of the correspondence between SEUs and SEFIs. Table 4.4 reports the classification of the SEFIs generated by a single SEU in the FPGA’s configuration memory.
Table 4.4 Total ionizing dose
Effect class      SEFI (#)   %
Unrouted          63         15
Antenna           30         7
Bridge            5          1
Conflict          97         22
Open              113        26
IO block          3          1
Tolerant          0          0
Fault LUT         118        28
Fault MUX CLB     2          0
Fault CFG CLB     1          0
The results confirm that an SEU can provoke erroneous behavior both in the configuration memory bits corresponding to the CLB logic resources and in the bits corresponding to the routing resources. The graph in Fig. 4.32 shows how the exponential growth of the number of SEFIs accompanies the increase in density of the implemented circuit, and how the routing resources generate the critical effects. The graph in Fig. 4.33 shows the percentage of the SEFI effects induced by SEUs in the configuration memory bits related to the interconnections.
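As a worked check, the routing shares of Fig. 4.33 can be recomputed from the counts of Table 4.4. The sketch assumes that the routing-related classes are Unrouted, Antenna, Bridge, Conflict, and Open; the source does not state explicitly which rows the figure is drawn from, so this grouping is an assumption.

```python
# SEFI counts of the five routing-related classes, taken from Table 4.4.
routing_sefi = {
    "Unrouted": 63,
    "Antenna": 30,
    "Bridge": 5,
    "Conflict": 97,
    "Open": 113,
}

total = sum(routing_sefi.values())
shares = {name: round(100 * count / total) for name, count in routing_sefi.items()}

for name, share in shares.items():
    print(f"{name:9s} {share:3d} %")
```

Under this assumption the rounded shares are 20% unrouted, 10% antenna, 2% bridge, 31% conflict, and 37% open, which agrees with the values printed in Fig. 4.33.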
Fig. 4.32 SEFIs in relation to the design density (series: Routing Resources, CLB Resources)
Fig. 4.33 Distribution of SEFIs into the routing resources (Open 37%, Conflict 31%, Unrouted 20%, Antenna Input 10%, Bridge 2%)
The main critical routing effects due to SEUs in the configuration memory correspond to the introduction of opens and conflicts between two nodes. This result is a key issue if the objective is to reduce the probability that an SEU in the configuration memory generates critical circuit modifications. Thanks to the localization, classification, and identification of the modifications induced by an SEU in the routing resources, it has been possible, by means of behavioral simulation, to identify the fault propagation model, and thus the logical model associated with a signal in case of an effect in the routing. The following cases have been identified:
• Stuck-At 0: The affected signal A is fixed at the zero logic value.
• Stuck-At 1: The affected signal A is fixed at the one logic value.
• Wired-AND: The affected signals A and B propagate as the function A AND B.
• Wired-OR: The affected signals A and B propagate as the function A OR B.
• Wired-MIX: The affected signals A and B propagate according to the following two conditions:
  – On the signal A, if A and B are equal, A propagates as A; otherwise A propagates as the zero logic value.
  – On the signal B, if A and B are equal, B propagates as B; otherwise B propagates as the one logic value.
• Bridge: The signal A propagates in B, while the signal B propagates in A.
Functional Model of the Effects
The comparison between the output vectors and the HDL behavioral model obtained with concurrent process generation is executed empirically. This makes it possible to identify a functional model related to the kind of node and the kind of resource present inside the FPGA architecture. The functional model is related to the switch block level, with reference to the model described in Chapter 2. There are three levels of switch blocks:
1. Level 1: The tile switch block
2. Level 2: The local routing switch block
3. Level 3: The multiple tile switch block
In the following, the functional effects related to the different switch block levels are analyzed. Specific effects may change between different FPGA architectures. The behavioral model described in this chapter is related to the general FPGA model described in Chapter 2, which generally contains all the possible propagation effects.
Tile Switch Block
The effects on the tile switch blocks are described in Table 4.5.
Table 4.5 Tile switch block functional effect model
Routing effect    Effect model
Conflict          Wired-AND
Open              Stuck-At 0
Unrouted          Stuck-At 1
Bridge            Bridge
Antenna input     Stuck-At 0

Table 4.6 Local routing switch block functional effect model
Routing effect    Effect model
Conflict          Wired-AND–Wired-Mix
Open              Stuck-At 1
Unrouted          Stuck-At 0
Bridge            Bridge
Antenna input     Stuck-At 0

Table 4.7 Multiple tile switch block functional effect model
Routing effect    Effect model
Conflict          Wired-AND
Open              Stuck-At 0
Bridge            Bridge
Antenna input     Stuck-At 1
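The effect models and the three tables can be restated as a small executable sketch. The Python names below (`MODELS`, `EFFECT_TABLE`, `apply_effect`, and the level keys) are illustrative and are not taken from any tool. Reading the "Wired-AND–Wired-Mix" entry of Table 4.6 as two alternative models is an assumption.

```python
# Each effect model takes the fault-free values of signals A and B (0 or 1)
# and returns the propagated pair (A', B').
def stuck_at_0(a, b):
    return 0, b  # A is fixed at logic 0; B is left untouched

def stuck_at_1(a, b):
    return 1, b  # A is fixed at logic 1; B is left untouched

def wired_and(a, b):
    return a & b, a & b

def wired_or(a, b):
    return a | b, a | b

def wired_mix(a, b):
    # Equal values pass through; otherwise A reads 0 and B reads 1.
    return (a, b) if a == b else (0, 1)

def bridge(a, b):
    return b, a  # the two signals swap

MODELS = {
    "Stuck-At 0": stuck_at_0, "Stuck-At 1": stuck_at_1,
    "Wired-AND": wired_and, "Wired-OR": wired_or,
    "Wired-MIX": wired_mix, "Bridge": bridge,
}

# Tables 4.5-4.7 as lookup tables (level keys are hypothetical names).
# Assumption: "Wired-AND-Wired-Mix" lists two alternative models.
EFFECT_TABLE = {
    "tile": {
        "Conflict": ["Wired-AND"], "Open": ["Stuck-At 0"],
        "Unrouted": ["Stuck-At 1"], "Bridge": ["Bridge"],
        "Antenna input": ["Stuck-At 0"],
    },
    "local_routing": {
        "Conflict": ["Wired-AND", "Wired-MIX"], "Open": ["Stuck-At 1"],
        "Unrouted": ["Stuck-At 0"], "Bridge": ["Bridge"],
        "Antenna input": ["Stuck-At 0"],
    },
    "multiple_tile": {
        "Conflict": ["Wired-AND"], "Open": ["Stuck-At 0"],
        "Bridge": ["Bridge"], "Antenna input": ["Stuck-At 1"],
    },
}

def apply_effect(level, routing_effect, a, b):
    """Return the propagated (A', B') pair for every model listed in the table."""
    return [MODELS[name](a, b) for name in EFFECT_TABLE[level][routing_effect]]
```

For example, `apply_effect("tile", "Open", 1, 1)` applies the Stuck-At 0 model to signal A.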
Local Routing Switch Block
The effects on the local routing switch blocks are described in Table 4.6.

Multiple Tile Switch Block
The effects on the multiple tile switch blocks are described in Table 4.7.

4.3.1.5 The Functional Model of the Redundancy Techniques
The design techniques based on redundancy, as described in Section 4.3.1, rely on triple modular redundancy. In other technologies, such as ASICs, this kind of circuitry is limited to protecting the storage elements, such as the flip-flops. That protection is adequate for those technologies, but it is not sufficient for FPGA architectures. Applying triple modular redundancy to the whole design is a fundamental requirement for making the implemented application more resilient to single event upsets, since all the logic and routing paths, not only the flip-flops, are susceptible to modification by SEUs or MCUs. The correct implementation of TMR circuitry in an FPGA architecture depends on the kind of structure implemented. Three types of structures can be considered: combinational logic, state machines, and special applications. The combinational logic consists of logic modules with different functionalities. They can be synchronous or asynchronous, where all the logic paths between the
inputs and the outputs are realized without creating closed rings. The implementation of a TMR design for a combinational logic structure creates three copies of the basic module.
The state machines consist of a structure in which each output register, whatever its level, feeds back to a previous register inside the module, creating a ring of registers. This structure is used to create accumulators, counters, and different types of state machines. Since a state machine, by definition, depends on the value of its state, the TMR must be implemented inside each logic block. The main concept of the TMR implementation for a state machine is the triplication of the circuit resources and the insertion of a majority voter for each ring of registers or feedback path.
An FPGA architecture provides a large set of special applications that use particular logic resources, such as block RAMs, LUT-based shift registers, or delay-locked loops (DLLs), and that require special implementation methods in order to be made effectively redundant. While the logic block with majority voters can be realized using look-up tables, flip-flops, and routing resources, the special applications must be treated case by case, depending on the kind of special application implemented.
Several libraries have been developed to implement a TMR project. These libraries provide a set of gates that describe the voter characteristics. In particular, two elements are provided: the voter implemented with a LUT and the voter implemented with three-state buffers (BUFT). Tables 4.8 and 4.9 give the functional definition of these two elements.

Table 4.8 Voter implemented with LUT
Input name    Direction    Type         Width
TR0           IN           std_logic    1
TR1           IN           std_logic    1
TR2           IN           std_logic    1
V             OUT          std_logic    1

Table 4.9 Voter implemented with buffer three state
TR0    TR1    TR2    V
0      0      0      0
0      0      1      0
0      1      0      0
0      1      1      1
1      0      0      0
1      0      1      1
1      1      0      1
1      1      1      1
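The truth table above is the three-input majority function. A minimal Python sketch, not part of any vendor library, reproduces it and derives the contents that a 3-input LUT would need to hold. The bit ordering used for `lut_init` (TR0 as the least significant address bit) is an assumption made for illustration.

```python
from itertools import product

def vote(tr0, tr1, tr2):
    """Majority of three 1-bit inputs: 1 when at least two inputs are 1."""
    return (tr0 & tr1) | (tr0 & tr2) | (tr1 & tr2)

# Rows in the same order as the table: TR0 is the slowest-changing column.
truth_table = [(tr0, tr1, tr2, vote(tr0, tr1, tr2))
               for tr0, tr1, tr2 in product((0, 1), repeat=3)]

# LUT contents, assuming address = TR2*4 + TR1*2 + TR0 (illustrative ordering).
lut_init = 0
for tr0, tr1, tr2 in product((0, 1), repeat=3):
    address = (tr2 << 2) | (tr1 << 1) | tr0
    lut_init |= vote(tr0, tr1, tr2) << address

for row in truth_table:
    print(*row)
print(hex(lut_init))
```

Because the majority function is symmetric in its inputs, the resulting LUT contents do not depend on which input is wired to which address bit.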
The selection of the routing or logic elements allocated into the FPGA architecture is determined by the place and route. The place and route is a semi-automatic process based on several constraints. These constraints are related to the following aspects:
• Amount of used area
• Timing constraints on the interconnections
• Kind of adopted signals
• Power consumption
The place and route process does not always complete automatically. The complexity of the circuit, or the impossibility of meeting the specified constraints, can lead to a compromise that cannot be realized. In these situations it is possible to rely on editing tools that allow specific components to be selected manually. However, this is a limit case, and it is applicable only if the complexity of the circuit is limited, since it is impossible to circumvent the constraints manually for complex designs. When the FPGA implements a TMR circuit, the place and route process automatically selects the resources corresponding to the three replicas, allocating the three redundant modules according to the constraints on occupied area and on the timing of the connections. The key point is that the three modules have resources that are logically separate, but not physically separate. This observation leads to two critical situations:
1. Two or more interconnections belonging to different redundant modules can be allocated in the same tile using potentially critical nodes, as illustrated in Figs. 4.34 and 4.35.
2. Two or more interconnections belonging to different redundant modules can be allocated in different tiles but using potentially critical nodes, as illustrated in Figs. 4.36 and 4.37.
Fig. 4.34 Initial configuration of first critical condition (nodes NO3, EO5, SI12, SI5; TMR modules A and B)
Fig. 4.35 First critical condition generated (nodes EI1, EO5, SI12, SI5; TMR modules A and B)
Fig. 4.36 Initial configuration of the second critical condition (nodes NO5, WO1, WI7, SI9; TMR modules A and B)
Fig. 4.37 Second critical condition generated (nodes NO5, SI9, WO1, WI7, SI2; TMR modules A and B)
In the first case, the initial configuration, illustrated in Fig. 4.34, instantiates two PIPs:
• NO3 → SI12, belonging to the TMR module B and programmed by the bits r12c42 = 1, r16c43 = 0, r17c43 = 0, r17c44 = 0, r16c45 = 0, r17c45 = 0, and r17c46.
• EO5 → SI5, belonging to the TMR module A and programmed by the bits r17c44 = 1 and r17c46 = 0.
If an SEU induces a bit-flip in the bit r17c46, changing its value from 0 to 1, the two PIPs are modified in the following way, as illustrated in Fig. 4.35:
• NO3 → SI12 is disabled, since its programming string is no longer valid. An open effect is generated and propagated to the output with a logic value that depends on the switch block level.
• EO5 → SI5 is disabled and a new PIP, EO5 → EI1, is instantiated, generating a bridge effect.
In the second case, the initial configuration, illustrated in Fig. 4.36, instantiates two PIPs:
• NO5 → SI9, belonging to the TMR module B.
• WO1 → WI7, belonging to the TMR module A and programmed by the bits r17c19 and r17c21 at logic level 0 and r15c19 at logic level 1.
If an SEU affects the bit r17c23, changing its content from 0 to 1, the PIP WO1 → WI7 becomes WO1 → SI2. The node SI2 conflicts with the node SI9, which is located in a different tile, as illustrated in Fig. 4.37.

4.3.1.6 Hardening Against Single Event Upsets
In general, the commonly used design flow for mapping designs onto an SRAM-based FPGA consists of three phases. In the first phase, a synthesizer transforms a circuit model coded in a hardware description language into an RTL design. In the second phase, a technology mapper transforms the RTL design into a gate-level model composed of look-up tables (LUTs) and flip-flops (FFs) and binds them to the FPGA's resources, producing the technology-mapped design. In the third phase, the technology-mapped design is physically implemented on the FPGA by the place and route algorithm.
The problem of how to physically implement a circuit on an FPGA device is divided into two sub-problems: placement and routing. The main reason for this decomposition is to reduce the complexity of the problem. The reliability-oriented place and route algorithm presented here, called RoRA, first reads a technology-mapped design. Then it performs a reliability-oriented placement of each logic function, and finally it routes the signals between functions in such a way that multiple errors affecting two different connections are not possible.
Fig. 4.38 The flow of the reliability-oriented place and route algorithm
The algorithm we developed is described in Fig. 4.38, where the placement and routing steps are shown in a C-like pseudo-code. The RoRA placement algorithm performs a robust placement that implements the TMR principle by executing four distinct functions:
1. The generate_functions_replicas() function first reads the design description produced after the technology mapping and identifies the logic functions in the design. It then generates three replicas of the logic functions belonging to the original design. Let F be the set of the original design's logic functions: at the end of this step the three sets F1, F2, and F3 are produced.
2. The generate_majority_voter() function analyzes the three logic function sets F1, F2, and F3 and generates a logic function set F4 that performs the majority voting between them.
3. The generate_partitions() function partitions the routing graph's vertices into four non-overlapping sets, where each set Si (i = 1, 2, 3, 4) has enough logic vertices to contain the logic functions of the corresponding set Fi (i = 1, 2, 3, 4).
4. Every logic function in set Fi is placed heuristically on the logic vertices in set Si, where i = 1, 2, 3, 4. This phase marks the graph by assigning each logic function to exactly one logic vertex in the routing graph.
The RoRA placement algorithm places each logic function in Fi on the graph vertices belonging to Si, with the majority voter on S4. After the placement process, each set Si contains exclusively the functions of set Fi. This guarantees that single or multiple effects confined to one set Si do not provoke any misbehavior of the circuit. Indeed, with this placement, only multiple effects on the boundary between two different sets Si ≠ Sj may generate multiple errors that affect two different replicas. When all the logic functions have been placed on the corresponding set of logic vertices, RoRA routes the interconnections between the logic vertices.
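The four placement steps can be sketched in Python as follows. This is a simplified illustration, not the RoRA implementation. Several choices are assumptions: the naming scheme for replicas and voters, one voter per original function, contiguous slicing of the logic vertices to build the partitions, and the in-order assignment used in place of the heuristic placement.

```python
def generate_functions_replicas(functions):
    """Return the three replica sets F1, F2, F3 of the original functions F."""
    return [[f"{name}_r{k}" for name in functions] for k in (1, 2, 3)]

def generate_majority_voter(functions):
    """Return F4. Assumption: one majority voter per original function."""
    return [f"{name}_voter" for name in functions]

def generate_partitions(logic_vertices, function_sets):
    """Split the logic vertices into non-overlapping sets S1..S4.

    Assumption: contiguous slices, each just large enough for its set Fi.
    """
    partitions, start = [], 0
    for function_set in function_sets:
        end = start + len(function_set)
        if end > len(logic_vertices):
            raise ValueError("not enough logic vertices for the design")
        partitions.append(logic_vertices[start:end])
        start = end
    return partitions

def place(function_sets, partitions):
    """Assign each function in Fi to exactly one vertex of Si (in order)."""
    placement = {}
    for function_set, partition in zip(function_sets, partitions):
        placement.update(zip(function_set, partition))
    return placement

# Example: two logic functions on a 2 x 4 array of logic vertices.
functions = ["f", "g"]
function_sets = generate_functions_replicas(functions)
function_sets.append(generate_majority_voter(functions))
vertices = [(row, col) for row in range(2) for col in range(4)]
partitions = generate_partitions(vertices, function_sets)
placement = place(function_sets, partitions)
```

The property of interest is that no logic vertex is shared between two sets Si, so each replica and the voters occupy disjoint regions of the graph.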
The RoRA routing algorithm works on the routing graph we developed, and it routes each connection between two logic vertices through the shortest path it can find. During path selection, the RoRA routing algorithm dynamically labels the graph's routing vertices so that it avoids instantiating two connections that may be subject to short effects. Each routing vertex (RV) of the graph is labeled as free, used, or forbidden, with the following meanings:
1. Free: The routing vertex is not used by any connection.
2. Used: The routing vertex is already used by a connection.
3. Forbidden: A routing vertex RV is forbidden if and only if
• it belongs to set Si (RV ∈ Si);
• at least one routing edge or one wiring edge exists between RV and another vertex RV′ belonging to Sj (RV′ ∈ Sj), where i ≠ j. If RV is added to the circuit and an SEU affects the routing resources in such a way that both RV and RV′ are affected, the TMR no longer works as expected.
The forbidden vertices sets (FVSs), which are empty at the beginning of the RoRA routing, contain the vertices marked as forbidden that belong to the corresponding set Si of graph routing vertices. RoRA routes each net by considering all the graph's vertices labeled as free, and it progressively updates the FVSs by adding the vertices marked as forbidden. As soon as a net is routed and the marking of the graph has been updated (i.e., the vertices in the routing graph and the associated edges have been marked as used by the circuit implementation), the update() function modifies the corresponding set of forbidden vertices (FVSi). The algorithm starts by reading a description of the circuit, which consists of unplaced logic blocks and a set of nets.
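The labeling rule can be illustrated with a short Python sketch of an update step. The data structures are hypothetical: `adjacency` maps each routing vertex to the vertices it can reach through a routing or wiring edge, `part` maps each vertex to the index of its set Si, and `fvs` maps a set index to its forbidden vertices set. Restricting the check to the neighbors of the vertices just routed is an assumption based on the description of update().

```python
from collections import defaultdict

FREE, USED, FORBIDDEN = "free", "used", "forbidden"

def update(labels, fvs, routed, adjacency, part):
    """Mark a newly routed net as used and forbid its cross-set neighbors.

    A free vertex becomes forbidden when an edge links it to a used vertex
    that belongs to a different set S_j.
    """
    for vertex in routed:
        labels[vertex] = USED
    for vertex in routed:
        for neighbor in adjacency[vertex]:
            if part[neighbor] != part[vertex] and labels[neighbor] == FREE:
                labels[neighbor] = FORBIDDEN
                fvs[part[neighbor]].add(neighbor)

# Example: a1 and a2 belong to S1, b1 and b2 to S2; an edge links a1 to b1.
adjacency = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["a1", "b2"], "b2": ["b1"]}
part = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
labels = {vertex: FREE for vertex in adjacency}
fvs = defaultdict(set)  # forbidden vertices sets, empty at the start

update(labels, fvs, ["a1"], adjacency, part)
```

After routing through a1, vertex b1 can no longer be used by a net of S2, while a2 and b2 remain free.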
While standard placement techniques are sufficient if the application mapped on the FPGA has no particular reliability constraints, special attention must be paid to the FPGA placement algorithm for safety-critical applications, where high reliability is a mandatory requirement. The RoRA placement algorithm, which is described in Fig. 4.39 as C-like pseudo-code, places a logic function by using the concept of window. A window is defined as a rectangular portion of the logic vertices belonging to the routing graph space. In more detail, the RoRA placement algorithm uses two types of windows: the place window PW and the nearby window W. The place window PW defines a rectangular space containing the logic vertices already connected to the logic vertex being placed, while the nearby window W defines the space containing the logic vertices that are labeled as free and are candidates for the placement.
The RoRA placement algorithm implements different heuristic cost functions that measure the wire length as well as the routability of the placement. The wire length is based on the Manhattan distance, that is, the distance between two points measured along axes at right angles, as the sum of its horizontal and vertical components. Minimizing the wire length minimizes the number of routing resources required and therefore reduces the number of SEU-sensitive routing resources; for this reason the Manhattan distance is minimized. However, minimizing the Manhattan distance does not guarantee that a signal can be routed successfully, since not all the available routing resources can be used: some of them must be avoided to satisfy the reliability constraints. To address this problem we added two metric functions, "local density" and "global constraints," which are defined as follows:
1. The local_density (W) function computes the number of routing resources available in the nearby window W. It returns the number of available edges that link two routing vertices labeled as free.
2. The global_constraints (W) function computes the routing reliability constraints in the nearby window W. It returns the number of routing reliability constraints that may be generated between the routing vertices labeled as free and included in the nearby window W.
The local density addresses the degree of routability of the placement. It attaches a cost to the placement that reflects the capacity of the routing resources, and thus aims at avoiding competition among signals for insufficient routing resources. The global constraints complement the routability estimate by computing the congestion caused by the routing reliability constraints. These metrics examine the region contained in the nearby window W and compute a cost from the number of nets and routing reliability constraints that may exist in this region.

Fig. 4.39 The flow of the RoRA placement algorithm
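The three metrics can be sketched in Python under simplifying assumptions. A window is represented as a tuple `(x0, y0, x1, y1)`, `pos` maps each routing vertex to its coordinates, and `part` maps it to its partition set. Counting a reliability constraint as an edge between two free vertices of different sets is an assumption made for illustration; the text does not give the exact rule.

```python
def manhattan(p, q):
    """Distance measured along axes at right angles."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def in_window(point, window):
    x0, y0, x1, y1 = window
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1

def free_edges(window, edges, pos, labels):
    """Edges inside the window whose two end vertices are both labeled free."""
    return [(u, v) for u, v in edges
            if in_window(pos[u], window) and in_window(pos[v], window)
            and labels[u] == "free" and labels[v] == "free"]

def local_density(window, edges, pos, labels):
    """Number of available edges linking two free routing vertices in W."""
    return len(free_edges(window, edges, pos, labels))

def global_constraints(window, edges, pos, labels, part):
    """Number of potential reliability constraints in W.

    Assumption: one constraint per free edge that crosses two different sets.
    """
    return sum(1 for u, v in free_edges(window, edges, pos, labels)
               if part[u] != part[v])

# Example data (hypothetical): four routing vertices, three edges.
pos = {"u": (0, 0), "v": (1, 0), "w": (1, 1), "x": (5, 5)}
edges = [("u", "v"), ("v", "w"), ("w", "x")]
labels = {name: "free" for name in pos}
part = {"u": 1, "v": 1, "w": 2, "x": 2}
W = (0, 0, 2, 2)

density = local_density(W, edges, pos, labels)
constraints = global_constraints(W, edges, pos, labels, part)
```

A placement step would compare `density` and `constraints` with architecture-dependent limits and enlarge W when they are not met.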
For a given placement phase, the nearby window W generated in the routing graph is examined. This allows the RoRA routing algorithm to find a route for every signal easily, since the routing capacity of the nearby window W in which the signals have to be routed is computed during the placement phase. The RoRA placement of a logic function LF on a partition set Si is divided into two phases: pre-placement and placement. During the pre-placement, the window PW is generated by considering the logic functions connected to LF that have already been placed on logic vertices (DLVs). Figure 4.40 shows an example of the PW generation. Suppose that a logic function LFA is connected to the logic functions LFB, LFC, and LFD, and that only LFB and LFD have already been placed, on the logic vertices DLVB and DLVD. During the placement of the logic function LFA, the place window PW is generated and selects an area in which a logic vertex could be used for the placement of LFA. Moreover, W is initialized equal to PW only if PW contains at least one logic vertex. Otherwise, W is generated by extending PW by one row or column that contains at least one free logic vertex. During the placement phase, the RoRA placement algorithm executes three steps until the logic function LF is placed on a logic vertex V. First, the RoRA placement algorithm computes the heuristic cost functions local density and global constraints on the nearby window W and compares their values with the respective limits. The limits depend on the cardinality of the adopted routing graph, and thus on the kind of FPGA architecture used. If the limits are not respected, the nearby window W is updated until the cost functions are satisfied.
Fig. 4.40 An example of placement window
Second, a logic vertex labeled as free is selected from the nearby window W belonging to the partition set Si. A cost MDLV is associated with every logic vertex DLV that is already placed on the partition Si and that is connected to the logic function LF. Each cost MDLV is defined as the Manhattan distance between that DLV and the logic vertex V that is the candidate for the placement of the logic function LF. Finally, the RoRA placement algorithm calculates a Manhattan cost C for the
whole set of DLVs, and if C satisfies the maximum length distance the logic function LF is placed on the candidate logic vertex V.
FPGA routing, on the other hand, is a complex combinatorial problem. The RoRA router works on the routing graph and routes each connection between two logic vertices through the shortest path it can find. During path selection, RoRA dynamically labels the graph's routing vertices so that it avoids instantiating two connections that belong to two different sets S and may be subject to multiple effects. The general approach implemented in the RoRA router is a two-phase method composed of a global routing followed by a detailed routing. As shown in Fig. 4.41, given a source vertex SV belonging to a logic function Fi, a connection between SV and all its destination vertices DVs is computed by executing the global routing followed by the detailed routing. The global routing balances the density of all the routing structures in relation to the reliability constraints, while the detailed routing assigns specific wiring edges, routing edges, and routing vertices to the path. The global routing is based on a super-routing graph architecture composed of logic vertices and super-routing vertices (SRVs) that are linked by super edges (SEs), as shown in Fig. 4.42. A super-routing vertex models a whole group of routing vertices of the FPGA routing graph, while a super edge models a whole group of routing edges between routing vertices or between a routing vertex and a logic vertex.
Fig. 4.41 The flow of the RoRA global and detailed routing algorithm
Fig. 4.42 The super-routing graph architecture

The super-routing graph is used to execute the global routing. The global routing on the super-routing graph architecture is performed by the function find global_route SV to DV. This function generates a global route P that consists of a sequence of super edges and super-routing vertices linking the source logic vertex SV to the destination logic vertex DV. By associating the super-routing graph architecture with the FPGA routing graph, a global route P is decomposed into a sequence of routing vertices, wiring edges, and routing edges that connect SV to DV. Thus, the RoRA global routing generates a set of candidate paths from which the RoRA-detailed routing can choose to connect SV to DV. To determine whether a global route P is optimal, the RoRA global routing selects the super edges and the super-routing vertices by optimizing a heuristic cost function that consists of two components. The first component aims at minimizing the length of the global route by selecting the shortest way to connect the source to the sink. The second component computes the availability of the global route from the number of vertices labeled as forbidden and the number of vertices labeled as free that exist in it. The availability Af of a global route P composed of i super-routing vertices SRV is defined as

$$A_f(P) = \sum_i \frac{\mathrm{free}(SRV_i)}{\mathrm{avoid}(SRV_i)}$$
where avoid(SRVi) is the number of routing vertices labeled as forbidden belonging to the super-routing vertex SRVi and free(SRVi) is the number of routing vertices labeled as free. The global router makes the routing problem easier, since it can estimate the routing congestion due to the routed interconnections and the forbidden vertices. When a global route P is selected, the RoRA routing algorithm executes the detailed routing.
The RoRA-detailed routing algorithm is split into two phases. In the first phase it expands each routing tree, where the root is associated with the logic vertex corresponding to the source of the connection, while the leaves are associated with the logic vertices corresponding to the destinations of the connection. The routing tree is expanded by choosing wiring and routing edges that are linked by routing vertices labeled as free in our routing graph and that belong to the global route selected by the RoRA global routing. The RoRA-detailed routing is based on the approach developed for the PathFinder negotiated-congestion algorithm [22], [45], which relies on the construction of a routing tree. The maze routing described in [71] is usually used for this purpose. The RoRA-detailed router expands the routing tree
progressively toward the leaves while staying within the routing channel defined by the global routing: starting from a tree composed of the source vertex only, new vertices are added until all the destinations of the connection have been added to the tree. The previously executed global routing saves memory and running time in the routing tree expansion, since the detailed router chooses the net paths in a limited space of solutions. The RoRA-detailed router uses the routing tree construction developed for the maze-routing approach, with a fundamental difference in the creation of each routing tree: the key step of the RoRA-detailed router is performed during the routing tree expansion, where the vertices that are labeled as forbidden are not used. Moreover, the set of forbidden vertices is updated in the second phase of the RoRA-detailed router, after the creation of the routing tree.
The detailed routing generates the routing tree through the function create_routing_tree(). This function computes the routing tree by considering all the graph's vertices that are not labeled as forbidden and that belong to the selected global route P. After the expansion, each routing tree (SV, DVs) may contain routing vertices that could acquire a routing edge to other routing vertices in the routing graph model through the modification of a single configuration memory bit. The update function of the RoRA algorithm selects these routing vertices belonging to the set Si and checks whether each of them could be linked, by changing a single configuration memory bit, to a routing tree routed on the part of the routing graph belonging to the set Sj, where i ≠ j. If this happens, the update function labels the vertex as forbidden. In this way, no routing edge can link routing vertices belonging to different sets S, and thus no SEU affecting the configuration memory of the SRAM-based FPGA can affect more than one replica of the implemented TMR architecture.
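The core of the tree expansion, reaching a destination while skipping forbidden vertices and staying inside the global route, can be sketched with a plain breadth-first maze expansion in Python. This is a simplified stand-in: it connects a single source to a single destination and does not reproduce the negotiated-congestion scheme of [22], [45]. The names `route`, `adjacency`, `labels`, and `channel` are illustrative.

```python
from collections import deque

def route(source, dest, adjacency, labels, channel):
    """Breadth-first expansion from source to dest.

    Only vertices labeled free and contained in `channel` (the vertices of
    the selected global route) are expanded; forbidden and used vertices are
    skipped. Returns the shortest such path, or None if none exists.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == dest:
            path = []
            while vertex is not None:  # walk back to the root of the tree
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in adjacency[vertex]:
            if neighbor in parent or neighbor not in channel:
                continue
            if neighbor != dest and labels.get(neighbor, "free") != "free":
                continue
            parent[neighbor] = vertex
            queue.append(neighbor)
    return None

# Example: the short path s-a-d is blocked because a is forbidden.
adjacency = {
    "s": ["a", "b"], "a": ["s", "d"], "b": ["s", "c"],
    "c": ["b", "d"], "d": ["a", "c"],
}
channel = set(adjacency)
path_blocked = route("s", "d", adjacency, {"a": "forbidden"}, channel)
path_free = route("s", "d", adjacency, {}, channel)
```

In the flow described above, each successful route would be followed by the update step, which refreshes the forbidden vertices sets before the next net is routed.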
4.3.1.7 Hardening Against Multiple Cell Upsets
As described in the second chapter of this book, modern FPGAs are designed with advanced integrated circuit techniques that provide high speed and low power consumption together with reconfiguration capabilities. This makes the new families of FPGA devices very advantageous for computation-hungry applications and for safety-critical applications such as avionics and space. However, higher levels of integration make the FPGA's configuration memory more prone to multiple cell upsets (MCUs), caused by a single radiation particle that can flip the content of multiple nearby cells. As explained in the previous chapter on fault effects, MCUs are on the rise in the new generation of SRAM-based FPGAs, since their configuration memory is based on volatile programming cells designed with smaller geometries, which are more sensitive to proton- and heavy ion-induced effects. MCUs drastically limit the capabilities of the hardening techniques adopted in space-based electronic systems, which are mainly based on triple modular redundancy (TMR). In detail, safety systems such as space and avionic electronic systems always require highly dependable components that guarantee correct operation in safety-critical environments characterized by radiation particles [108]. In recent
years, the demand for reliable electronic devices and ICs offering high performance with low power consumption has grown dramatically. Among the electronic technologies recently investigated for use in space and avionic systems, many research activities [51, 98] have focused on field programmable gate arrays (FPGAs) with volatile configuration memory. This is due to the technology scaling process, which made possible state-of-the-art static RAM-based FPGAs with embedded hardwired microprocessors, RAM modules, and numerous configurable logic and routing resources. As explained in the previous section, the main challenge for their adoption in space and avionic applications is mitigating the effects of radiation-induced upsets [33]. In detail, when ionizing radiation hits the surface of a COTS SRAM-based FPGA and affects the configuration memory bits, it may change the original circuit's behavior. In addition to SEUs, multiple cell upsets (MCUs) provoked by ionizing radiation have been observed in SRAM-based memory devices. The effects of MCUs within COTS SRAM-based FPGAs have been reported during accelerated radiation testing experiments with protons and heavy ions [63]. In particular, a more detailed study quantifies the occurrence of MCUs induced by protons and heavy ions in four different SRAM-based FPGA families [96], indicating that two-cell MCUs are on the rise for the Xilinx FPGA families. As a result, the MCU sensitivity of the Xilinx Virtex-II devices increases by two orders of magnitude compared with the previously manufactured Xilinx Virtex-I devices. The effectiveness of TMR hardening techniques in the presence of MCUs in the configuration memory has been evaluated by static analysis [112], showing that 2-bit MCUs may corrupt TMR 2.6 orders of magnitude more often than SEUs.
Therefore, the TMR technique is drastically less effective against MCU effects, and new solutions must be developed. At the moment, there are no proposed approaches for the mitigation of MCU effects. A single MCU affects more than one cell of the FPGA's configuration memory; hence, it violates redundancy techniques developed according to the single-fault assumption. A previous study measures the occurrence of MCUs on Xilinx FPGAs [96], indicating that for recently manufactured FPGA families 98% of the MCUs observed in the FPGA's configuration memory affect the two nearest cells, increasing the sensitivity by a factor of two to three orders of magnitude with respect to SEU effects. Since the configuration memory is organized as a matrix of configuration cells, the effects of an MCU are analyzed by clusters of adjacent configuration memory bits, as illustrated in Fig. 4.43. MCUs may hit logic components belonging to various FPGA resources, such as block RAMs (BRAMs), BRAM interconnects, IOBs, and configuration logic blocks (CLBs). Each resource is controlled by configuration frames, where each frame corresponds to an FPGA configuration column of SRAM cells [24]. CLB resources form the FPGA logic core. They have an array organization and each one consists of a switch matrix and a set of logic tiles. In detail, a single CLB may implement any sequential or combinational logic function by programming configuration memory bits. The number of possible functions depends on the granularity of the resources available. The configuration memory bits controlling the programmable resources of a CLB are organized in a matrix of bits, related to the different sets of resources: LUTs, FFs, and PIPs. Therefore, depending on the orientation of the MCU (adjacent cells on a single column, row, or diagonal), the provoked effect may simultaneously corrupt resources of a single set or of two sets whose configuration memory bits are adjacent.

Fig. 4.43 Multiple cell upset adjacent cells. The general organization of the FPGA's configuration memory, with a detailed view of the FPGA's configurable logic block (CLB) as described in the FPGA architectural model

The effects of the corrupted resources propagate within the circuit's logic depending on the hardening techniques adopted. When redundancy techniques such as TMR are used, the circuit includes multiple voter partitions, where a voter partition consists of the resources (logic and routing) comprised between two voting structures. Considering the scheme illustrated in Fig. 4.44, a voter partition consists of the logic and routing resources that belong to the domain Di, with i ∈ {1, 2, 3}, and that are comprised between the voting structures Vi and Vi+1. Voter partitions may bound the propagation of MCU effects. In detail, when an MCU affects a TMR circuit, two scenarios are possible:
1. The errors induced by the MCU bypass the voter structure, propagating through the voter partitions up to the circuit's outputs, as in the first scenario illustrated in Fig. 4.44.
2. The errors induced by the MCU are corrected by the voting structures if the corrupted resources belong to any domain D within different voter partitions, as in the second scenario illustrated in Fig. 4.44.

Fig. 4.44 MCU's effect corrupting TMR bypassing a voter partition (error propagated to the next voter partition) and MCU's effect masked by the TMR's voter (errors masked by voter structure)

Relating the cells modified by radiation particles to the errors eventually generated in the implemented circuit requires, first of all, knowledge of the link between each configuration memory cell and the corresponding FPGA resource. We used the tool presented in the previous work [112] in order to have a
complete coverage of each configuration memory cell. While effective for the study of the SCU phenomena, these data are not sufficient for studying MCU effects. Knowledge of the physical layout of the configuration memory cells is mandatory in order to obtain each cell's location and dimension and the distances between cells. This information makes it possible to study modifications of multiple cells effectively and to provide suitable hardening solutions. Since manufacturers do not provide such information, we used the approach presented in [112] to extract it. This approach performs a laser screening of the device, allowing the investigation of the physical cells of the configuration memory. The laser screening aims to investigate the structure of the physical device. Through localized photoelectric stimulation, the organization of the configuration memory is deduced, thus identifying where the configuration memory cells are located on the silicon surface of the SRAM-based FPGA. The laser screening was performed thanks to EADS France Innovation Works with the Radiation Analysis Laser Facility [78]. In detail, the pulsed laser is focused on the silicon area through the substrate. The laser scans the silicon surface while the laser energy is adjusted to detect the threshold energy. After each laser pulse, the content of the configuration memory is read and compared with reference configuration data. If upsets occur, both the threshold energy and the cell addresses are recorded. The laser mapping therefore associates the sensitive locations with the addresses of the upset configuration bits. For the purpose of this work, the whole surface of the CLB array has been screened. Given the regularity of the FPGA array, the data obtained apply identically to the other CLBs of the same family of devices. The analysis of the configuration memory layout has been performed on a Xilinx Virtex-II PRO SRAM-based FPGA.
A CLB of these devices has a set of four logic tiles, each one embedding two look-up tables (LUTs) and two flip-flops (FFs), and a switch matrix with more than 4000 programmable interconnection points (PIPs) surrounded by more than 350 hardwired interconnections that can carry data over short, medium, or long distances. All these resources are controlled by a matrix of 1760 configuration memory bits organized in 22 columns (or frames) and 80 rows. The data obtained from the laser screening show that the configuration bits are distributed vertically along each frame in a regular fashion. The distance between CLBs has been measured at around 26 µm on average, and the vertical distance between cells is around 2 µm, while the central LUTs are separated by more than 14 µm. The horizontal distance between configuration cells varies from 1 to 26 µm. The CLB's configuration bitmap is illustrated in Fig. 4.45, and the details of the configuration frame distances are reported in Table 4.10, which indicates the type of resource, the configuration frames, and the average distance between corresponding configuration cells. The largest distance is observed between the frames controlling the multiplexer resource and the first LUT frame, while it is interesting to note that pairs of routing frames (PIPs) are regularly separated by alternating small and wide spaces. Given the obtained configuration memory layout, it is possible to state that the effects of a
4.3 Overview of Techniques for FPGA User Resources

Fig. 4.45 The configuration bitmap (rows and columns are inverted)
Table 4.10 Xilinx Virtex-II CLB resources average distance

Resource   Frames                                                     Average distance [µm]
Mux-LUT    1, 2                                                       26
LUT-LUT    2, 3                                                       2
LUT-PIP    3, 4                                                       20
PIP        4,5 – 6,7 – 8,9 – 10,11 – 12,13 – 14,15 – 16,17 – 18,19     1
PIP        5, 6                                                       5
PIP        7, 8                                                       8
PIP        9, 10                                                      7
PIP        11, 12 – 19, 20                                            5
PIP        13, 14 – 20, 21                                            3
PIP        15, 16 – 21, 22                                            4
PIP        17, 18                                                     14
single radiation particle can dramatically vary depending on the affected location. In the proposed approach, we created a layout geometry database containing the spatial distribution, along both the X and Y axes, of the configuration memory cells obtained from the laser screening. This database is integrated in the placement algorithm we developed in order to tune the placement conditions for MCU mitigation. The PHAM (placement hardening algorithm for multiple cell upsets) algorithm is based on a data structure consisting of a logic placement graph (LPG), where logic resources are represented by vertices associated with their physical location in the FPGA array of CLBs, while PIPs are represented by edges. Each vertex is labeled with its name, type, and physical coordinates expressed as rows and columns within the FPGA array. An example of an LPG is depicted in Fig. 4.46, where combinational (L) and sequential (F) elements are shown. The PHAM algorithm uses the LPG to compute the LUT, voter, and cluster metrics. The LUT metric considers the configuration memory pattern controlling the LUTs. As illustrated in Fig. 4.47, there are 8 LUTs in each CLB, each controlled by 16 configuration cells placed in nearby locations. A critical LUT configuration occurs when two nearby LUTs are programmed with logic belonging to different TMR domains and within the same voter partition. In this case, MCUs affecting cells across the two patterns may compromise the TMR hardening capabilities. In order to classify the critical conditions, a LUT critical index ($CI_{LUT}$) is defined as the sum of coefficients related to the following cases:
Fig. 4.46 A logic placement graph example with routing intersections
Fig. 4.47 LUT’s configuration memory patterns.
• 0: In case of nearby LUTs belonging to a single domain
• 7: In case of critical configurations of two or more nearby LUTs (1/3 or 2/4)
• 1: In case of LUTs placed on the same CLB but not in a critical configuration

Generally, the voting structure consists of nine nets that connect the outputs of the logic TMR domains to the voter elements, also called the output voter macro, as illustrated in Fig. 4.48. The voter interconnections are the most sensitive parts, since the voting structure creates nine crossing points between nets of different TMR domains. Mapping tools implement the voting structure using the LUTs of a single CLB in order to optimize performance; therefore, the voter interconnections
Fig. 4.48 The voter’s structure, output voter macro
generally converge into one switch matrix. Moreover, given the topology of the switch matrix, the voter outputs are also routed in the same switch matrix, as represented in Fig. 4.49. This condition drastically increases the probability of failures due to multiple open or short effects induced by MCUs on the voter inputs and outputs. A voter critical index ($CI_{VOTER}$) is defined considering the locations of the logic voters. $CI_{VOTER}$ is associated with the following coefficients:
Fig. 4.49 Placement of the voter’s structure on a single CLB
• 3: In case of three voters placed on a single CLB
• 2: In case of two voters placed on a single CLB
• 1: In case the three voters are placed on different CLBs

Finally, a clustering metric has been defined in order to minimize the errors induced by MCUs affecting routing resources. As considered in the voter metric, interconnections of different TMR domains routed on the same switch matrix
increase the probability of critical effects since, given the configuration layout and the coding of each PIP, it is highly probable that an MCU induces multiple short or open effects. The clustering metric computes a cluster critical index ($CI_{CLUSTER}$) considering the LPG vertices of each voter partition. In detail, $CI_{CLUSTER}$ is equal to the number of intersections between the routing of different TMR domains divided by the number of CLBs of the cluster. The dimension of the cluster is computed as a square area containing all the logic vertices of the voter partition. A higher value of $CI_{CLUSTER}$ corresponds to a higher density of critical conditions in the CLB area. As an example, Fig. 4.46 illustrates the logic vertices and edges of a single voter partition; in this example, the $CI_{CLUSTER}$ value is 0.56. The placement algorithm PHAM has been implemented as a software tool fully compatible with the Xilinx ISE development tool chain and with all the devices of the Virtex-II and Virtex-II Pro families. The algorithm starts by reading the native circuit description (NCD) generated by the Xilinx ISE tool chain and creates the LPG data structure and the list of used routing switches (RL) of the original design. The flow, illustrated in Fig. 4.50, consists of four phases. The first phase identifies the voter partitions and calculates the original metrics. The voter partitions are identified by searching the voter structures
Fig. 4.50 The flow of the placement algorithm PHAM
in the LPG graph and by labeling the logic vertices and edges comprised between two voting structures with a unique identifier. The LUT, voter, and cluster metrics are calculated on the basis of the original LPG structures. The second phase consists of the macro generation. The first function is Logic_macro(). It analyzes all the $CI_{LUT}$ values using a greedy approach and modifies the locations of the logic vertices belonging to a single CLB area. The search considers the minority vertex domain present in the CLB logic tiles, and the minimum $CI_{LUT}$ determines the new location. Once Logic_macro() has been executed, a set of logic macros (LM) is generated. Each macro contains the LUT vertices for a given CLB area belonging to a unique TMR domain. This guarantees that the placement of LUTs or FFs is performed in the optimal hardening condition for minimizing the configuration memory criticalities. The second function is VOTER_macro(). It analyzes the $CI_{VOTER}$ values and modifies the voter vertices in order to separate each voting structure into three distinct CLBs. This guarantees that the placement never locates three voters in the same CLB. The third phase performs the placement of the logic and voter macros considering the $CI_{CLUSTER}$ values and the configuration memory layout database. The first goal of the placement routines is the global minimization of the critical configurations of the CLBs. This process is performed for each voter partition. First the output voter macro is placed; then the logic vertices are placed into the LPG by the place_solution() function. This function analyzes the LPG vertices in the neighborhood of the LM. It searches for a new location in the LPG where the majority of the domains are equal to the original one. The range of the search is extended up to the maximum tolerable Euclidean distance between the input and output edges, in order not to degrade the routing delay.
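As a minimal sketch of how the voter and cluster critical indices defined above could be evaluated on LPG vertices, consider the following Python fragment. It is not the PHAM implementation: the Vertex class, its domain field, the function names, and the choice of passing the number of routing intersections as a precomputed input are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Vertex:
    """Hypothetical LPG vertex: name, type, and CLB coordinates."""
    name: str
    kind: str     # e.g. "LUT", "FF", "VOTER"
    domain: int   # assumed TMR domain label (0, 1, or 2)
    row: int
    col: int


def ci_voter(voters: Iterable[Vertex]) -> int:
    """Voter critical index: 3, 2, or 1 depending on how many of the
    three voters share a single CLB."""
    per_clb = Counter((v.row, v.col) for v in voters)
    return max(per_clb.values())


def cluster_clb_count(vertices: List[Vertex]) -> int:
    """Number of CLBs in the smallest square area containing all vertices."""
    rows = [v.row for v in vertices]
    cols = [v.col for v in vertices]
    side = max(max(rows) - min(rows), max(cols) - min(cols)) + 1
    return side * side


def ci_cluster(vertices: List[Vertex], intersections: int) -> float:
    """Cluster critical index: intersections between routing of different
    TMR domains divided by the number of CLBs of the cluster."""
    return intersections / cluster_clb_count(vertices)
```

With these hypothetical helpers, a voter partition whose vertices span a 4 x 4 CLB square and whose routing has 4 cross-domain intersections would yield a cluster index of 0.25.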
When that limit is reached, the function returns the vertex location and a list of routing switches (RL) used by the edges linked to the new placement location. The cost $C$ of the placement condition is computed by the function Layout_Cost, which measures the MCU sensitivity on the basis of $CI_{CLUSTER}$ and of the cost related to the physical layout, $C_{LAYOUT}$, where $C = CI_{CLUSTER} + C_{LAYOUT}$. The physical layout cost is computed by measuring the average distance between configuration cells. In detail, the physical cell distance is calculated using the list of routing switches returned by the place_solution() function. The function computes the physical distance between the cells associated with the current routing switches and the cells related to routing switches belonging to different TMR domains and within the same current voter partition P. The cost $C_{LAYOUT}$ is inversely proportional to the physical distance. Finally, the algorithm generates an output user constraint file (UCF) containing the new placement constraints stored in the LPG. The generated file can be used by the commercial Xilinx ISE tool chain in order to generate the FPGA's configuration bitstream.

4.3.1.8 Hardening Against Single Event Transients

The progressive shrinking of process technology has decreased the minimum dimensions of integrated circuits. This advancement accompanied by increasing operating
frequencies leads on the one hand to the availability of low-power circuits with very small noise margins, but on the other hand makes integrated circuits increasingly sensitive to single event transient (SET) pulses that may be generated and propagated through the combinational logic, leading to behavioral errors of the affected circuit. As previously explained, SETs are generated mainly by high-energy particles that impinge close to a junction, resulting in a significant signal energy loss and producing pulses that may be broadened and amplified as they propagate along combinational logic paths. Different solutions have been explored: triple modular redundancy and guard gates, as previously explained, are viable technological solutions. However, place and route algorithms for integrated circuit physical design that are able to mitigate and filter the erroneous effects of SETs have also been explored. These solutions, in particular, have been evaluated on Flash-based FPGAs using accurate timing analyses. Single event transient (SET) effects in CMOS integrated circuits (ICs) have recently become a severe concern in deep submicron (DSM) technologies [90]. Beyond general-purpose applications, SETs are today one of the main challenges for designers of advanced digital circuits used in safety-critical applications operating in a radiation environment. As explained in Section 3.1.4, an SET is a transient pulse generated on a circuit node by the collection of charge deposited by a charged particle, such as a heavy ion or a proton, passing through a depletion region. The transient pulse generated on the node has a precise dynamic, characterized by its width and amplitude. The principal technology factors that make ICs more sensitive to transient pulses generated by energetic particles are the smaller transistor size and the reduced thickness of the interconnections.
Indeed, the progressive technology shrinking induces the simultaneous reduction of both the circuit node capacitance $C_{node}$, due to the decreasing transistor size, and the supply voltage $V_{dd}$. Consequently, the charge needed to store a logic value in a circuit node is also reduced. Therefore, a weak charge deposited by a particle strike is enough to temporarily alter the logic value of a node, creating a transient pulse that propagates through the combinational logic of a circuit [20]. The transient pulse, after propagating through the logic, can be sampled by a storage element, creating a bit-flip, also called a single event upset (SEU), that can propagate through the circuit up to the outputs and lead to an error. The increasing working frequency is clearly another factor that increases the SET sensitivity, since the chance of capturing these transient effects also increases. Several methods have been proposed, providing design techniques and layout models of logic gates and interconnections in order to characterize and mitigate the transient pulse sensitivity of ICs. Generally, all the previously developed methods rely on one of two approaches. The first one aims at minimizing the probability that a transient effect occurs in any sensitive node of the circuit. This approach addresses the source of the problem, by reducing the probability that an SET pulse is generated. Several techniques have been developed using this approach, as will be described in the following section; they are based on identifying the
suspected susceptible gates and selectively hardening them, resulting in the reduction or absence of most faulty SETs in the circuit. The second one consists in reducing the number of SETs being latched or captured by the sequential elements of a circuit. Although it permits SETs to originate, it prevents glitches from being sampled by sequential elements. The methods relying on this technique are basically oriented toward designing a flip-flop able to filter a large fraction of the SET effects at its data input. These methods make it possible to completely nullify a soft error occurring in logic. Both techniques introduce overheads. On the one hand, the modification of the gates inside a circuit introduces a high overhead in power consumption, delay, and circuit area. On the other hand, the modification of the sequential elements has a minor cost in terms of power consumption and area overhead, but the delay of the circuit is dramatically affected by the additional timing constraints. Optimization techniques based on both gate resizing and flip-flop pulse filtering were also proposed in [99]. However, the main drawback of the previously mentioned techniques is that in most cases they are not practicable, since the physical layout of the adopted ICs is based on a regular fabric, such as FPGAs or structured ASICs, where cell resizing and flip-flop modifications are impossible. Besides, in the remaining cases where these techniques are practicable, the designers must bear the cost of cell resizing, which can be dramatically expensive. Recent experimental analyses have demonstrated that the shape of an SET may change along its propagation through the logic paths of the circuit [130].
In detail, an SET may suffer pulse broadening as it propagates through logic chains [32]; however, it has been proven that by changing the placement positions of logic gates, and therefore changing the characteristics of the interconnections in terms of delay and capacitive load, it is possible to nullify the propagation of a transient pulse along combinational logic [110]. A place and route algorithm has been developed that uses an accurate characterization of the logic cells in order to tune their placement positions and to route the interconnections accordingly. This algorithm first analyzes the entire circuit, identifying the timing characteristics and the combinational gates of each logic path; then it modifies the placement positions of the logic gates in order to achieve the best mitigation condition. Finally, the algorithm produces a new layout of the original circuit, where the logic gates are positioned in different locations without introducing any delay overhead and without increasing the circuit area. The place and route algorithm is a viable solution to mitigate SET effects on FPGAs, since it can be adopted at the application level without requiring modifications of the dimensions of the logic cells or of the architecture of the flip-flops. Thanks to this characteristic, place and route algorithms can be considered an effective and general methodology that can be applied to the mitigation not only of SEUs and MCUs but also of SET phenomena. Several works have been proposed in the past in order to analyze and try to mitigate the problem. One of the first methods is based on the classical fault-tolerant approach of triple modular redundancy (TMR) [18].
Other techniques have been proposed relying on a replication design methodology that uses time or space redundancy. An example is flip-flop redesign for SET filtering, where the original architecture of a flip-flop is modified to include a dual-sampling latch with delayed signal sampling able to filter the SET effects [57], as previously described. However, the usage of checkers and logic duplication inherently introduces significant delay, area, and power overhead. Since the overhead introduced by these techniques is dramatically high due to the full logic replication, less area-expensive solutions have been proposed that aim at directly resizing each sensitive node. Some approaches have been oriented to transistor resizing [43, 44, 99]. These techniques analyze the sensitive nodes of a circuit based on the probability of having a critical location. All the selected sensitive gates are resized by reducing the capacitance of the sensitive node, therefore reducing the probability of an SET occurrence. Besides, it is possible to act directly on the sequential elements. Some previously developed techniques use radiation-hardened latches [59], which immunize FFs against SETs due to particle hits; however, it has been demonstrated that transient errors in the combinational logic are going to be the most significant contributor to circuit errors [20]. Although these techniques provide effective solutions, they all rely on the modification of gate and FF libraries, which results in a change of the physical layout characteristics. This represents a critical drawback considering that new generations of ICs are widely based on regular-fabric devices, such as FPGAs or structured ASICs, that are characterized by a regular array of logic cells and FFs having fixed dimensions and architecture. It is therefore impossible to apply gate resizing or flip-flop architectural modifications.
For these reasons, several works have recently investigated the nature of SET effects, studying the propagation of pulses through the combinational logic and routing resources in ad hoc designed circuits implemented on regular-fabric devices [16, 101, 102]. Experiments have proven that an SET may undergo a significant broadening or a significant degradation during propagation, depending on several factors such as the technology, the circuit topology, or the capacitive load and delay of the routing. A recent study detailed a model of SET propagation describing how the pulse shape varies with increasing load capacitance. Besides, by means of electrical pulse injection [110], we proved that the placement positions of the logic cells and the routing delay between them play a fundamental role in broadening or attenuating the SET during propagation. As previously explained, the generation of SET effects is due to the junction charge collection mechanism: when a charged particle crosses a junction area, it generates an amount of current that causes a voltage glitch of elevated magnitude. The voltage glitch propagates for notable distances, and the SET may become indistinguishable from a normal signal; if it arrives at the clock edge, it may be sampled. The main source of SET effects in today's nanometer technologies is the combinational logic, since the effects are generated by the charge collected at the reverse-biased junctions in the sensitive area of the logic gates. It is
notable that for the new generations of FPGAs the critical charge allowing a transient generation decreases as the square of the technology feature size, as does the critical width for propagation [20]. Therefore, the combinational logic circuitry is increasingly sensitive to both SET generation and propagation. An SET has precise characteristics that depend on the polarity, the waveform, the amplitude, the duration, the impact location, and the charged particle (i.e., heavy ions or protons). Besides, the SET shape also depends on factors related to the implemented circuit, such as the bias condition or the output load of the affected logic gate. The silicon area of each gate can be divided into several sensitive areas, each one with a different linear energy transfer (LET) threshold and an associated output response. In the case of heavy ions hitting an inverter gate implemented on a 130 nm technology-based device [15], we obtained an induced transient pulse similar to the waveform reported in Fig. 4.51.
Fig. 4.51 The single event transient pulse effects provoking a transition {0–1–0}
In case a charged particle hits a sensitive area of a logic gate and generates an SET effect, two transitions are possible: {0 → 1 → 0} or {1 → 0 → 1}. We characterized the SET pulse effect considering two parameters: the maximum output voltage difference of the SET pulse with respect to the original signal value, called $SET_{max}$; and the width of the SET pulse, measured at a defined transition level that depends on the adopted electrical logic level, called $SET_{width}$. Once an SET is generated in the sensitive area of a logic gate, it starts propagating through the logic paths until a sequential element is reached. During its propagation the SET pulse may pass through inverting (i.e., INV, NAND, NOR, etc.) and non-inverting (i.e., AND, OR, etc.) gates. If the SET crosses an inverting gate, its logic value is inverted; conversely, if it crosses a non-inverting
Fig. 4.52 SET propagation through an inverting gate with an input transition of {0–1–0}
gate the pulse is not inverted. In order to develop suitable mitigation techniques, it is necessary to characterize both types of logic gate. Considering the shape of the SET effect, as illustrated in Fig. 4.52, which shows a propagation example for an inverting gate given an SET transition {0 → 1 → 0}, it is possible to define a broadening coefficient $C_x = t_U - t_I$: if $C_x > 0$ the SET pulse is broadened, whereas if $C_x < 0$ the SET pulse is attenuated. In order to detail the electrical behavior of an inverting gate, it is necessary to make the $t$ coefficients explicit: $t_I = t^I_{HL} - t^I_{LH}$ and $t_U = t^U_{LH} - t^U_{HL}$, where the propagation times $t^I_{ij}$ and $t^U_{ij}$ depend on the capacitive and resistive loads of the input and output logic cones. The same characterization is defined for an inverting gate with an SET transition {1 → 0 → 1}. In this case it is necessary to consider that $t_I = t^I_{LH} - t^I_{HL}$ and $t_U = t^U_{HL} - t^U_{LH}$. Similarly, it is possible to define the broadening coefficients as well as the propagation times for the non-inverting gates. Given such considerations, it is possible to outline the timing relations illustrated in Table 4.11. An SET propagating through logic gates may also undergo amplitude attenuation. In order to take this phenomenon into account, it is possible to define an amplitude attenuation coefficient described by the relation $A = SET^{I}_{max} - SET^{U}_{max}$.
Table 4.11 Gate propagation timing coefficients

Transition    Function         Propagation time
{0→1→0}       Inverting        $t_I = t^I_{HL} - t^I_{LH}$
{0→1→0}       Inverting        $t_U = t^U_{LH} - t^U_{HL}$
{1→0→1}       Inverting        $t_I = t^I_{LH} - t^I_{HL}$
{1→0→1}       Inverting        $t_U = t^U_{HL} - t^U_{LH}$
{0→1→0}       Non-inverting    $t_I = t^I_{HL} - t^I_{LH}$
{0→1→0}       Non-inverting    $t_U = t^U_{HL} - t^U_{LH}$
{1→0→1}       Non-inverting    $t_I = t^I_{LH} - t^I_{HL}$
{1→0→1}       Non-inverting    $t_U = t^U_{LH} - t^U_{HL}$
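The relations of Table 4.11 can be turned into a small calculation. The Python sketch below is only an illustration under stated assumptions: the function names, the string encoding of the transitions ("010" and "101"), and the interpretation of each t value as the time at which the corresponding edge occurs are introduced here and are not taken from the characterization flow of [110].

```python
from typing import Tuple


def pulse_widths(
    transition: str,
    inverting: bool,
    t_i_hl: float,
    t_i_lh: float,
    t_u_hl: float,
    t_u_lh: float,
) -> Tuple[float, float]:
    """Return (t_I, t_U) for one gate following Table 4.11.

    transition is "010" for {0 -> 1 -> 0} or "101" for {1 -> 0 -> 1},
    referred to the pulse at the gate input.
    """
    if transition not in ("010", "101"):
        raise ValueError("transition must be '010' or '101'")
    input_positive = transition == "010"
    # Input pulse width: the closing edge minus the opening edge.
    t_i = t_i_hl - t_i_lh if input_positive else t_i_lh - t_i_hl
    # An inverting gate flips the polarity of the pulse at its output.
    output_positive = input_positive != inverting
    t_u = t_u_hl - t_u_lh if output_positive else t_u_lh - t_u_hl
    return t_i, t_u


def broadening_coefficient(t_i: float, t_u: float) -> float:
    """C_x = t_U - t_I: positive means broadened, negative means attenuated."""
    return t_u - t_i


def amplitude_attenuation(set_i_max: float, set_u_max: float) -> float:
    """A = SET_max at the input minus SET_max at the output."""
    return set_i_max - set_u_max
```

For example, with hypothetical edge times in picoseconds, a {0 → 1 → 0} input pulse 350 ps wide that leaves an inverting gate 420 ps wide gives a broadening coefficient of 70 ps.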
Both the timing relations and the amplitude attenuation coefficients are fundamental for the description of the SET propagation phenomena. However, those coefficients may vary depending on the electrical characteristics of the implemented circuits. The electrical characteristics include the capacitive load of the input and output logic cones, as well as the propagation delay of the interconnections between logic gates. In order to obtain this information for all the principal logic gates, it is necessary to perform an electrical characterization in which the real behavior of the SET propagation through the considered logic gates is measured. Since the way the SET pulse propagates through logic gates is the core of the proposed algorithm, the complete characterization of the principal logic gates, carried out by analyzing their electrical behavior under the injection of SET effects, is a mandatory process. The main purpose of the logic gate characterization is the measurement of the propagation times and of the attenuation coefficient under different electrical conditions. The information extracted during this phase is used by the place and route algorithm in order to mitigate the SET propagation. For instance, in the work proposed in [110], an electrical pulse generator implemented on a 130 nm Actel FPGA device has been used. The SET pulse generator has been developed at the physical level and injects six different types of pulses, with widths of 250, 350, 600, 850, 1000, and 1270 ps. The SET injection has been performed for both the transitions {0 → 1 → 0} and {1 → 0 → 1}. The transient pulse has then been measured at the output of the gate in terms of propagation times and voltage amplitude. For each basic logic gate, we defined two resistive capacitance load conditions connected to the gate output:

1. High resistive capacitance load condition: A resistive capacitance load consisting of several routing segments, ranging from 20 up to 300, has been added to the output of the gate. This condition emulates the logic gate with a load very close to the maximum tolerable fan-out.
2. Low resistive capacitance load condition: Only a small number of routing segments, ranging from 1 to 20, has been added to the output of the gate. This condition emulates the logic gate with a light capacitive load.
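The characterization campaign described above can be enumerated as a small grid of measurement points. The following Python sketch is illustrative only: the names are hypothetical, and since the two ranges in the text share the value 20, assigning exactly 20 routing segments to the low load condition is an assumption made here.

```python
from itertools import product
from typing import List, Tuple

PULSE_WIDTHS_PS = (250, 350, 600, 850, 1000, 1270)  # injected pulse widths
TRANSITIONS = ("0-1-0", "1-0-1")
LOAD_CONDITIONS = ("high", "low")


def load_condition(routing_segments: int) -> str:
    """Classify an output load by its number of routing segments."""
    if not 1 <= routing_segments <= 300:
        raise ValueError("routing segments must be between 1 and 300")
    # Assumption: the shared boundary value 20 is treated as a low load.
    return "low" if routing_segments <= 20 else "high"


def characterization_points() -> List[Tuple[int, str, str]]:
    """All (pulse width, transition, load condition) combinations per gate."""
    return list(product(PULSE_WIDTHS_PS, TRANSITIONS, LOAD_CONDITIONS))
```

Under these assumptions each gate is characterized at 6 x 2 x 2 = 24 points, each of which yields measured propagation times and an output amplitude.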
The pulse generators have the purpose of injecting SET pulses inside complex logic circuitry. Therefore, the pulse generator scheme must be inserted so that the injection point is located in the desired position, and the delays defined in the scheme must not be biased by the logic of the circuit; otherwise the injected pulses may differ from the defined behavior. To satisfy this condition, the pulse generator schemes have been treated as bound macros; thus the logic locations of the pulse generator remain untouched by the physical mapping of the circuit's logic. As an example of the performed logic gate characterization, Fig. 4.53 reports the analysis of the broadening coefficient for the NAND gate considering a load condition 1 with 150 routing segments and a load condition 2 with 5 routing segments. It is possible to note that load condition 1 provokes a drastic broadening of the SET pulses injected at the input of the NAND gate, whereas a small resistive capacitive load may provoke a width attenuation of the injected pulse. Besides, the width attenuation of the pulse increases proportionally with the width of the injected SET pulses.
Fig. 4.53 NAND broadening coefficient with different resistive capacitive load conditions in response to SET electrical pulse injection
The principal logic gates, namely inverting gates (NAND, NOR, and INVD) and non-inverting gates (AND, OR), have been characterized. Two main conclusions have been drawn from the analysis of the injected SET pulses. The first is that a small or high resistive capacitive load at the output of inverting gates may lead to an attenuation or a broadening, respectively, of the SET pulses injected into the input; the second is that a high or small resistive capacitive load at the output of non-inverting gates may lead to an attenuation or a broadening, respectively, of the injected SET pulse. Therefore, in order to achieve a better SET mitigation, it is necessary to place and route the circuit so that the resistive capacitive loads
between inverting gates are as small as possible; conversely, it is necessary to increase the resistive capacitive load between non-inverting gates. In order to achieve the optimal condition, an automatic solution capable of implementing these rules is mandatory. The results we obtained from the gate characterization have been stored in a database, which is used by the developed place and route algorithm in order to obtain SET pulse mitigation. In order to automatically exploit the SET propagation characteristics, a new place and route algorithm aimed at implementing highly SET-resilient circuits on Flash-based FPGAs has been developed. One of the earliest versions of the algorithm is based on a greedy approach and has been developed considering the layout information related to the 130 nm ProASIC Flash-based FPGA manufactured by Actel. The algorithm consists of the following steps. First it loads the entire netlist of the circuit; then it performs three phases: the logic path tree analysis, the analytical placement of the logic resources, and finally the global routing of the circuit. The principal characteristic of the developed algorithm is that the placement and routing phases are regulated by a timing and capacitive load metric that makes it possible to modify the placement positions of the logic gates in the regular array of the FPGA in order to achieve the best condition for SET mitigation. This modification is made by acting only on the placement location and on the number of adopted routing segments; the algorithm does not modify the netlist of the original circuit. The flow of the developed algorithm is illustrated in Fig. 4.54. The algorithm starts by reading the netlist description of the considered circuit M and creates the place and route graph PRG, where logic functions and interconnections are modeled as logic vertices and edges.
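The load rule stated above (short routing between inverting gates, more routing segments otherwise) can be sketched as a simple pass over the gates of one logic path. This Python fragment is an illustration, not the developed algorithm: the function name, the default segment counts of 1 and 20, and the choice of treating mixed inverting/non-inverting pairs like the non-inverting case are assumptions.

```python
from typing import List, Sequence

INVERTING = {"INV", "INVD", "NAND", "NOR"}
NON_INVERTING = {"AND", "OR"}


def routing_targets(
    path_gates: Sequence[str],
    short_segments: int = 1,    # hypothetical target for a light load
    long_segments: int = 20,    # hypothetical target for a heavier load
) -> List[int]:
    """Return a target number of routing segments for each connection
    between consecutive gates of a logic path."""
    known = INVERTING | NON_INVERTING
    for gate in path_gates:
        if gate not in known:
            raise ValueError(f"unknown gate type: {gate}")
    targets = []
    for gate, next_gate in zip(path_gates, path_gates[1:]):
        # Keep the load small only when both gates are inverting.
        both_inverting = gate in INVERTING and next_gate in INVERTING
        targets.append(short_segments if both_inverting else long_segments)
    return targets
```

A real implementation would take the targets from the characterization database and would still have to check the resulting path delay against the timing constraints of the circuit.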
The algorithm then performs three phases: the logic path tree analysis, the placement, and finally the routing. The logic path tree analysis phase first identifies the logic cones of the circuit. Each logic path tree LT includes several source points (consisting of FFs or IOs) and a destination point consisting of a single FF or IO. Second, the function critical_logic_path identifies the most timing-critical logic path CP between one source point and the destination point of a considered LT. The timing-critical logic path, which is the logic path with the longest delay, is used to constrain the placement and routing functions for each considered LT. Once the logic path tree analysis is completed, the algorithm performs the placement. The first step of the placement is the identification of the logic source and destination sequential elements LS and LD. The second step is the most critical. For each logic path P between a source element LS and a destination element LD, the logic placement and the global routing are performed. First of all, an initial placement of the source and destination sequential elements is created. The placement positions of those elements are stored into LS and LD. In detail, after initializing to zero a cost variable called PlacementDelayCost (PDC), the placement of each gate G belonging to the logic path P is performed with the following steps:
Fig. 4.54 The earliest version of the algorithm for the mitigation of the SET transient effects
1. If the gate G and the next gate G + 1 are both inverting gates, the placement is performed in the closest position, aiming at a short global routing.
2. In the other cases, the placement is performed at a greater distance and the global router uses more interconnection segments.
After each placement step, the global routing functions update the PDC delay cost variable. Once all the logic gates of a logic path P have been placed, the whole cost is evaluated in terms of delay. If the cost of the generated solution is greater than the delay of the most critical logic path CP, the placement is repeated. Conversely, if the delay cost is less than or equal to the CP delay, the place and route graph is updated. The last phase consists of the detailed routing. This is the simplest part of the algorithm, since the function route routes the input and output points between each pair of pre-placed gates. Finally, the place and route graph is updated. Once all the logic paths are placed and routed, the PRG is exported into the native netlist format. Figure 4.55 gives an example of how the algorithm acts on a single logic path. Considering a logic path with two FFs, FFi and FFi+1, and five logic gates, the
Fig. 4.55 An example of the delay modification introduced by the developed algorithm on a single logic path
original routing delay of the logic path is given by the sum of the routing delays of its segments: $T_{tot} = T_1 + T_2 + T_3 + T_4$. The developed algorithm modifies the routing delay between the inverting gates (the NAND and the NOR) and between the non-inverting gates (the OR and the AND). The delay is modified by changing the placement position of each gate and, consequently, the routing between them. The resulting new timing characteristics of the logic path have $T_{N1} > T_3$, $T_{N2} = T_2$, and $T_{N4} = T_4$. The difference between the first and the third delays must guarantee that $T_{Ntot}$