Securing Electricity Supply in the Cyber Age
TOPICS IN SAFETY, RISK, RELIABILITY AND QUALITY
Volume 15
Editor: Adrian V. Gheorghe, Old Dominion University, Norfolk, Virginia, U.S.A.
Editorial Advisory Board:
P. Sander, Technical University of Eindhoven, The Netherlands
D.C. Barrie, Lakehead University, Ontario, Canada
R. Leitch, Royal Military College of Science (Cranfield), Shrivenham, U.K.
Aims and Scope. Fundamental questions which are being asked these days of all products, processes and services with ever increasing frequency are: What is the risk? How safe is it? How reliable is it? How good is the quality? How much does it cost? This is particularly true as the government, industry, public, customers and society become increasingly informed and articulate. In practice none of these topics can be considered in isolation, as they all interact and interrelate in very complex and subtle ways and require a range of disciplines for their description and application; they encompass the social, engineering and physical sciences and quantitative disciplines including mathematics, probability theory and statistics. The major objective of the series is to provide a series of authoritative texts suitable for academic taught courses, reference purposes, postgraduate and other research, and practitioners generally working in or strongly associated with areas such as:
Safety Assessment and Management
Emergency Planning
Risk Management
Reliability Analysis and Assessment
Vulnerability Assessment and Management
Quality Assurance and Management
Special emphasis is placed on texts with regard to readability, relevance, clarity, applicability, rigour and generally sound quantitative content.
For other titles published in this series, go to www.springer.com/series/6653
Zofia Lukszo, Geert Deconinck, Margot P.C. Weijnen
Editors
Securing Electricity Supply in the Cyber Age Exploring the Risks of Information and Communication Technology in Tomorrow’s Electricity Infrastructure
Editors
Zofia Lukszo, Delft University of Technology, The Netherlands
Margot P.C. Weijnen, Delft University of Technology, The Netherlands
Geert Deconinck, Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Belgium
ISBN 978-90-481-3593-6 e-ISBN 978-90-481-3594-3 DOI 10.1007/978-90-481-3594-3 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2009942534 © Springer Science+Business Media B.V. 2010 No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Contents
1 Introduction
Margot P.C. Weijnen, Zofia Lukszo and Geert Deconinck
2 The Future of Electricity Systems: General Trends, Developments
Marija D. Ilić
3 Dependency on Electricity and Telecommunications
Benoît Robert and Luciano Morabito
4 Critical Interrelations Between ICT and Electricity System
Janusz W. Bialek
5 ICT and Powers Systems: An Integrated Approach
C. Tranchita, N. Hadjsaid, M. Viziteu, B. Rozel, and R. Caire
6 Governance: How to Deal with ICT Security in the Power Infrastructure?
Marcelo Masera
7 Deficient ICT Controls Jeopardize Systems Supporting the Electric Grid: A Case Study
Nabajyoti Barkakati and Gregory C. Wilshusen
8 Metering, Intelligent Enough for Smart Grids?
Geert Deconinck
9 Experience From the Financial Sector with Consumer Data and ICT Security
Wim Hafkamp and René Steenvoorden
10 The Way Forward
Laurens J. de Vries, Marcelo Masera and Henryk Faas
Index
Chapter 1
Introduction Margot P.C. Weijnen, Zofia Lukszo and Geert Deconinck
Abstract The infrastructures for electric power and information and telecommunication services are critical enablers for all economic activity. Both of these infrastructure systems evolved over time as networks-of-networks in an institutionally fragmented landscape. In understanding and steering the emergent behaviour of these infrastructure systems both their physical network complexity and their social network complexity pose a formidable challenge. On top of the socio-technical complexity of the electricity infrastructure and the information and telecommunication infrastructure as such, the two infrastructure systems show unprecedented mutual interdependency. Unravelling this multi-level interdependency and identifying strategies to curb the new risks and vulnerabilities it implies for the reliability of electric power services is the goal of this book. It clearly shows that technical solutions alone will not suffice to ensure the future reliability and security of electricity infrastructure operations.
Keywords Cybersecurity • infrastructure vulnerability • infrastructure dependencies • power systems
M.P.C. Weijnen (*) and Z. Lukszo, Technology Policy and Management, Delft University of Technology, P.O. Box 5015, 2600 GA Delft, The Netherlands
G. Deconinck, K.U.Leuven – ESAT/ELECTA, Kasteelpark Arenberg 10 bus 2445, B-3001 Leuven, Belgium
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_1, © Springer Science + Business Media B.V. 2010
1.1 Infrastructures Are Critical
Infrastructures are the backbone of the economy and society. In particular, the network-bound infrastructures operated by public utilities and network industries provide essential services that enable almost every economic and social activity.
The crucial role of the infrastructure networks for energy and water supply, transportation of people and goods, and provision of telecommunication and information services only grows as societies evolve from agricultural to industrial and post-industrial economies. In the development process, economies tend to become increasingly specialized, culminating in a myriad of value-added services. Whereas households in agricultural economies are largely self-sustaining, those in present-day urbanized service economies are totally dependent on the external provision of energy, water, telecom and transportation services, and on many other services, such as financial services, public health services and education, that in turn rely on the aforementioned essential network-bound infrastructures. Given the critical role of networked infrastructures in sustaining society and the economy, it is hard to understand why these systems were neglected for so long by the scientific research community. For decades the public at large considered the availability and affordability of their services as self-evident. As a consequence of the high standards of reliability that were maintained in the Western world, the availability of infrastructure services was accepted as a fact of life and infrastructure-related services were used without reflection. This situation changed dramatically with the events of 9/11, which instilled a general sense of vulnerability and a political awareness of the criticality of infrastructure systems and services.
1.2 Power and Telecom Are Key
Since then, driven by, among others, the US Department of Homeland Security, a massive research effort has been under way to identify critical infrastructures and ensure their adequate protection. As a result, in the USA as well as in many other countries, numerous attempts were made to list the nation’s critical infrastructures and identify critical infrastructure assets [1,4]. The variety of infrastructure sectors identified as critical is enormous, ranging from energy, Information and Communication Technology (ICT), transportation and water supply systems to public and legal order and safety, banking services, emergency services, food and health. It is clear that the diversity of infrastructure systems listed as critical varies with the governance responsibility of the state department compiling the list [9]. In this volume, we focus on two infrastructure systems that are invariably listed as critical, in every such list and in every country: the electrical power infrastructure and the information and telecommunication infrastructure. There is no doubt that these infrastructure systems are critical in every way: they are essential to almost any economic activity, they contribute substantially to social welfare and quality of life and, in fact, deeply influence our way of life in ways never imagined by those who initially designed and shaped these systems.
1.3 Infrastructures Are Complex Adaptive Socio-Technical Systems
When considering how the critical infrastructure systems for electricity and telecommunication services were brought into being, it is intuitively clear that these infrastructure systems were not “designed” as such. Even though each and every bit of the physical system was designed, in the true sense of engineering design, the huge transnational infrastructure networks that today provide us with electricity, information and communication services were not designed as integrated systems. They rather emerged over time, step by step, as a result of technological innovation, changing economic conditions, changing public values and changing end-user demands. Over time, city networks were interconnected to form regional systems, regional networks were interconnected to form national systems, and these were interconnected across national borders, so that continental electricity networks and intercontinental communication networks emerged. The course of their development was not determined by a single decision maker but by a multitude of actors, in different roles, in different countries, with different interests and spheres of influence. For both infrastructure systems, the result of this evolutionary process is a patchwork of networks, a network-of-networks, with old and new technologies, in an institutionally fragmented landscape. The complexity of the physical networks in both infrastructure systems presents only one side of their complexity. In both systems numerous actors are involved in planning, operating, managing and regulating the infrastructure, at the national level and at international and supranational levels. The networks of actors in both the electricity and telecommunication infrastructure systems present another dimension of system complexity.
In recognizing this combination of physical and social network complexity we are in fact considering infrastructure systems as socio-technical systems [8]. In describing infrastructures as systems that evolved, we implicitly take a complex adaptive systems perspective. To understand, steer and control evolving infrastructure systems, we have to acknowledge both the physical and social agents. As previously mentioned, infrastructure systems are characterised by both physical and multi-actor (social) network complexity, see Fig. 1.1. The physical network includes the conversion facilities where the infrastructure-bound service is generated, the end-conversion devices at the end-user side, and all the physical assets in between, including the control systems. The multi-actor network includes decision makers on the physical system’s operation and management, including the users of the system, and the decision making rules set by, e.g. the market, the legislative and regulatory framework and the broader institutional context. Hence, elements such as laws and governance structures count as an integral part of the infrastructure system instead of just as context variables. This is in contrast with the traditional engineering approach to infrastructures, in which the actors tend to be seen as part of the context, and with the traditional approach of the social sciences, in which the actors are central and the physical infrastructure is considered a context variable.
[Figure 1.1: diagram of socio-technical networks, each combining a physical network and an actor network, linked by infrastructure interaction]
Fig. 1.1 Schematic representation of interdependencies within and between socio-technical networks
1.4 Emergent Behaviour
The electricity system as it exists in the Western world evolved over more than a century. During all this time the technology for electricity transport and distribution did not fundamentally change. In the telecommunication sector, however, technological innovations led to a succession of revolutions in transmission modes. Whereas the development of the fixed telephone system took off at around the same time as the electricity system, at the end of the nineteenth century, the second half of the twentieth century brought a wave of new communication technologies and consequently a proliferation of new communication networks, each subsequent one characterized by higher transmission speed and/or new functionalities. Unlike in the electricity sector, competition between networks has emerged in the telecommunication sector. However, the new communication networks are not entirely independent substitutes for one another, even though end-users may have the illusion that they are, as they are still interconnected in many ways. The network-of-networks character of the telecommunication sector thus involves a further level of complexity beyond that prevailing in the electricity sector: on top of the interconnection of national networks using the same transmission technology, as in the electricity sector, communication networks using different transmission technologies are also interconnected, within and across national borders. Not only in an evolutionary perspective, but also at the timescale of network operations, both the electricity infrastructure and the telecommunication infrastructure show emergent behaviour. System operators need to be constantly aware of potentially unpredictable consequences of their actions and need to be constantly prepared for disturbances, the causes of which may originate in parts of the system controlled by other system operators.
Especially in the electricity system where operators cannot direct the flows over the network, examples of cascading
blackouts abound. In November 2006, after a high voltage transmission line over the river Ems was temporarily shut down by local system operator E.ON Netz to allow a cruise ship safe passage, the reconfiguration of electricity flows over the network caused an overload of the local electricity grid which could not be contained. The grid imbalance in Germany cascaded throughout Europe, depriving millions of households of electricity, halting trains in Germany and France, trapping people in lifts, and forcing Cologne Bonn airport to switch to its emergency generator. In this incident, a total European power blackout was narrowly avoided by the automated safety systems in place, which selectively shed part of the power load to safeguard priority end-users, such as hospitals. In 2003, the whole of Italy was affected by a power blackout after a falling tree disrupted a power line in Switzerland. Almost 56 million people were deprived of electricity for as long as 18 hours. This blackout is said to have been exacerbated by slow communication between the Swiss and Italian transmission system operators, as it could have been avoided by timely corrective action on the part of the Italian grid operator [6]. It became painfully clear that system level behaviour emerges from individual agents’ actions and that communication between agents plays a key role, especially in situations of pending crisis. These two cases could have led to far worse consequences if they had not happened at night during the weekend, as they did. Both examples carry a strong message besides their demonstration of the vulnerability of large scale interconnected power systems. In the context of this book, the relevant message is the utter and complete reliance of the electrical power grid on information and telecommunication systems. As we have seen, the complexity of the information and telecommunication infrastructure may even surpass that of the electricity infrastructure.
The good news is that the telecommunication sector, with its competing networks, holds more redundancy in terms of capacity so that data transmission and personal communications can be switched to alternative networks if one network is corrupted. In emergency situations, however, the communication system tends to become quickly overloaded due to shared critical components between interconnected communication networks.
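The cascade mechanism described in this section, where taking one line out of service shifts its flow onto neighbouring lines and may overload them in turn, can be illustrated with a deliberately simplified sketch. The function name, the line labels and the equal-sharing rule are assumptions made purely for illustration; real power flows redistribute according to network physics, not in equal shares, so this is a toy model and not a grid simulation.

```python
def simulate_cascade(loads, capacities, tripped):
    """Toy cascade: trip one line, share its load equally among the
    surviving lines, and keep tripping any line pushed above capacity.

    loads, capacities: dicts mapping a (hypothetical) line name to MW.
    Returns the list of failed lines in order and the surviving loads.
    """
    loads = dict(loads)          # work on a copy, leave the input intact
    failed = []
    queue = [tripped]
    while queue:
        line = queue.pop(0)
        if line not in loads:    # already out of service
            continue
        shed = loads.pop(line)
        failed.append(line)
        if not loads:            # nothing left to carry the load
            break
        share = shed / len(loads)
        for other in loads:      # assumption: equal redistribution
            loads[other] += share
        for other in loads:
            if loads[other] > capacities[other] and other not in queue:
                queue.append(other)
    return failed, loads


# Three illustrative lines, each rated at 100 MW.
initial = {"A": 60, "B": 70, "C": 80}
tight = {"A": 100, "B": 100, "C": 100}
print(simulate_cascade(initial, tight, "A"))
```

With little spare capacity the loss of a single line takes down the whole toy network, whereas raising every rating to 150 MW contains the disturbance to the line that was switched off. The contrast is the point of the sketch: whether an innocent switching action stays local depends on the margins elsewhere in the system.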
1.5 Minor Incidents with Major Consequences
Another message that we can distil from the aforementioned examples is that the causes of service disruptions in the electrical power system were quite innocent in all the prominent blackouts we have seen over the past decade in Europe and the USA. This message does not imply that we underestimate the real threat of terrorist attacks targeting critical infrastructure systems. The massive research efforts made for the purpose of critical infrastructure protection have brought us a far better understanding of the structure and behaviour of infrastructure systems, their interactions and interdependencies, and have led us to explore new avenues towards robustness and resilience of critical infrastructure systems. It is of the utmost importance, however, that this research effort is extended to cope with more mundane
causes of infrastructure service interruptions than sabotage and terrorist acts. For infrastructure systems that are (partially) situated above ground, severe weather conditions are by far the most frequent cause of prolonged service interruptions. In parts of North America and Europe, snow storms and ice loads on power transmission lines are an almost annually recurring cause of electricity service disruptions. An unusually severe November storm in 1999 toppled transmission pylons and tossed uprooted trees onto overhead high voltage power lines in France, thereby disrupting electrical power services in large parts of Europe [3]. Like the power system, the telecommunication system is not immune to “acts of God”. Large scale service disruptions are also known to occur in telecom systems through natural causes. In December 2006, a severe earthquake off the coast of Taiwan damaged several undersea cables. All over East Asia banks and businesses reported telephone and Internet service disruptions following the incident. The damage included communications cables to the USA and Europe, especially affecting Internet services. In the month it took before the cables were fully restored, most of the international Internet traffic was re-routed via landline cables connecting China and Europe, and satellite transmission was also used, according to China Telecom. In April 2004 it took China Netcom 2 weeks to restore service after a fishing boat damaged an undersea cable. Generally speaking, the resilience of the Internet is considerable, as it relies on a vast network of cables and routers offering ample re-routing possibilities. However, the Internet infrastructure also contains critical elements such as the high bandwidth interconnections between continents. In November 2003, one such critical link, an undersea cable linking the American and European continents, failed, causing widespread disruptions to Internet services in the UK.
Even if the high voltage and high bandwidth transmission lines in the electrical power and telecommunications infrastructures, respectively, are generally designed for (n-1) reliability of service, failure is not impossible. In the latter example of the TAT-14 communications cable linking Europe and the USA, which is a dual, bi-directional ring of cable satisfying the (n-1) principle, the undersea cable was not only damaged off the French coast; it was at the same time suffering from a technical fault near the US coast which had been diagnosed earlier that month but had not yet been repaired. For transportation infrastructure, too, the causes of lost service are quite mundane. Bad weather, congestion, traffic accidents, technical failure and human errors are far more dominant causes of lost service than any action with malicious intent. Any interruption of infrastructure-bound services incurs substantial social costs, not only in terms of direct cost of repair or replacement, but especially in terms of lost productivity of end-users. For Western Europe (EU-15), even the most conservative estimates of the direct costs incurred by “normal” service interruptions (excluding, e.g. sabotage and terrorist attacks) in energy, telecommunication and transportation infrastructure amount to a few percent of the Gross Domestic Product [2]. It should be mentioned that virus attacks, hacking, phishing, malware and the like are included in this estimate as “normal” causes of service interruptions. Apparently, certain acts with malicious intent are considered “normal”, even if they are illegal. Whereas hacking was once considered a sport for students in computer science, it is now employed by criminal multinational organizations with
ample resources at their disposal to refine the arts of hacking, phishing and the like. Successful virus attacks on the Internet are known to have been staged by underage high-school students, just for the fun of it. These “normal” threats that any Internet user and service provider has to cope with have created an ongoing race between the developers of protective software such as firewalls and the developers of malicious code. As it stands, the game has lost the innocence it may have had in the early days of the Internet. The current situation is better described as a grim war in which huge interests are at stake, the more so as the Internet and Internet-bound services are penetrating each and every part of the economy and society. The electrical power infrastructure has not escaped this fate. Like any other infrastructure it heavily depends on the Internet, among other things for the provision of value-added consumer services. Many of its market places are supported by the Internet; the Amsterdam power exchange APX was the first of its kind to be fully conducted through the Internet. The interdependence between the electricity and communications infrastructures is symmetrical: the electrical power infrastructure depends on the telecommunication infrastructure as much as any telecommunication network (with the exception of the fixed copper-wired telephone network) depends on the supply of electricity. The two infrastructure systems hold one another in a firm grip. Disturbances in the telecommunication infrastructure can cripple the electricity infrastructure as much as electricity service disruptions can cripple the telecommunications infrastructure. To counter these risks, we need to unravel this interdependence into more specific interdependency relations as they occur at deeper levels in the intertwined system.
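The (n-1) principle mentioned in this section, and the way two coinciding faults defeated it in the ring cable example, can be made concrete with a small connectivity check. This is a minimal sketch under stated assumptions: the four landing stations and the function names are hypothetical, the ring is an abstract graph rather than the actual cable layout, and connectivity stands in for "service is still available".

```python
from itertools import combinations


def is_connected(nodes, links):
    """Return True if every node can reach every other node over links."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:                       # depth-first traversal
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


def survives_k_failures(nodes, links, k):
    """Check whether the network stays connected after ANY k link failures."""
    links = list(links)
    return all(
        is_connected(nodes, [link for link in links if link not in failed])
        for failed in combinations(links, k)
    )


# Hypothetical ring of four landing stations (names are illustrative only).
stations = ["S1", "S2", "S3", "S4"]
ring = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S1")]

print(survives_k_failures(stations, ring, 1))  # single fault
print(survives_k_failures(stations, ring, 2))  # two simultaneous faults
```

A ring passes the single-failure check because traffic can always travel the other way around, but it fails the check for two simultaneous failures. An unrepaired fault therefore silently removes the redundancy margin until it is fixed, which is the situation the cable example describes.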
1.6 The Approach of This Book
In this volume the challenge of managing these interdependencies is approached from the perspective of the electricity infrastructure, as it seems that operators and strategic decision makers in the electricity system are still largely focused on the electricity infrastructure as such. Too often, information and communication systems are treated as add-ons rather than as critical parts of the electricity infrastructure itself. As Marcelo Masera emphasizes in his contribution, the electricity infrastructure has become an E + I infrastructure, an intricately intertwined Electricity and Information infrastructure, and it should be operated and managed as such. The challenge of properly handling the critical role of information and communication infrastructure in the electrical power infrastructure is more important today than ever before, as the reliance on electricity supply security is greater than ever – and only increasing. Virtually every household and industry in Europe is connected to the pan-European grid, and electricity represents an ever increasing share in their energy consumption pattern. At the same time, it seems more difficult than ever before to face this challenge, as the structures co-ordinating the development of the Member States’ national electric power systems have crumbled upon
liberalisation of the European electricity market. The old vertical public monopolies were unbundled, separating grid operation from electricity generation and the provision of electricity services to end-users in wholesale and retail markets. Mechanisms co-ordinating the location, type and capacity of new generation capacity with the transmission and distribution grid were thus lost. The players adopted new roles, and many new players entered the system in new roles such as brokers, traders and market place operators. The multi-actor diversity of the electricity infrastructure system was further aggravated by the privatisation of many generators and service providers and by a subsequent wave of mergers and acquisitions, across national borders. The historically strong ties between electricity companies and local public interests are by and large disappearing. In the institutionally fragmented European landscape, the development of the electricity infrastructure system is determined by the net sum of decentralised decision making by all actors involved, each of them acting in its own interest. In today’s changing electric power system, the primary technical challenge is to improve the control of generation and transmission in order to minimize real-time power imbalances even while the share of decentralized and intermittent power generation capacity increases substantially. This involves ‘smarter’ transmission and distribution networks, supported by smart information and communication systems. The specific vulnerabilities of information and communication systems raise challenging questions for the dependability, security and resilience of the smart electric power system to be. It is questionable if the electric power grid operators and electricity service providers are prepared to deal with the increasing vulnerability of their operations to cyber attacks or to adequately protect data confidentiality, while guaranteeing accessibility. 
Another consequence of smart electricity infrastructure is that electricity grid operators increasingly have access to private and competition sensitive information. This introduces new risks of abuse and manipulation of confidential data, hitherto unknown to the energy sector. Better models and methods are needed for protection against exploitation of system vulnerabilities, whether accidental or intentional, such as in cyber attacks [7]. Research into flaws and weaknesses of the information and communication systems supporting electric power systems is in its infancy. The lack of cross-sectoral co-ordination and knowledge sharing is a significant hurdle in acquiring a sufficient understanding of the risks at hand as well as in formulating and implementing effective strategies to counter these risks. The challenges cannot be tackled with technical solutions alone: a coherent combination of technical, organizational and institutional (governance) solutions is required.
1.7 The Authors’ Contributions
This book presents a cross-section of visions from different experts on electric power infrastructure and the challenges created by its dependency on information and communication systems. The penultimate chapter is an exception, as it stems from
the financial sector; it discusses many issues pertaining to the security of ICT systems in the financial sector which are equally relevant for the electric power sector and from which valuable lessons for the power sector can be learnt. The next chapter, contributed by Ilić, sets the scene for the future electric power infrastructure in the context of interdependent infrastructure systems. It describes the changing landscape of the electricity sector since electricity market liberalisation and the fast technology evolution that enables the future smart grid. In this context, many opportunities are identified to apply techniques from different engineering domains, such as computer science and engineering, to the operation of the smart electric power grids of the future and the applications that will be executed on these smart grids. Most of the contributions in this book zoom in on the so-called cyber interdependency of the electricity infrastructure, still a relatively new phenomenon stemming from the computerisation and automation of the electric power infrastructure, as of all other networked infrastructures. Cyber interdependency, according to Rinaldi et al., implies that the state of an infrastructure is dependent on information transmitted through the information and communication infrastructure [5]. It implies the existence of electronic links between units and subsystems within the electric power infrastructure, and between the power infrastructure and other infrastructures, through which information is transmitted. Although many associate cyber interdependency only with electronic data transmission, information may also be transmitted in the form of personal communications. In the example of the Italian power blackout, personal communications between the Swiss and Italian TSOs played a crucial role.
In our definition of cyber interdependencies, such communication between system operators is included as it is essential for the proper functioning of the integrated electric power system. The specific cases of the blackouts in the eastern USA/Canada and in Italy in 2003 are discussed by Bialek. In this contribution, concrete evidence is gathered on the influence of the communication system and its failures on the emerging failures of the electric power system. It also shows the complexity of interdependencies and stresses the crucial role of real-time information on the state of the network to adequately support the transmission system operators in their increasingly challenging task. As Rinaldi et al. point out, cyber interdependency is only one type of interdependency between critical infrastructures. The other types of interdependencies identified are physical, geographic and logical interdependency [5]. Note that information and communication infrastructure may also be involved in the latter three types of interdependency, even if it is not in the role of directly supporting the operation and management of the power infrastructure. The four types of interdependency are not mutually exclusive. Physical interdependency between infrastructures implies that the state of one is dependent on the outputs of the other, and that this interdependency is embodied by a physical linkage, through which the output of one infrastructure is transmitted as an input to another infrastructure. Geographic interdependency is caused by close spatial proximity of (elements of) multiple infrastructures, so that potentially destructive environmental conditions such as a fire or an explosion that originates in one infrastructure, simultaneously affect all
infrastructure situated in the same location. Logical interdependency denotes all interdependencies other than physical, cyber and geographic interdependencies. In the critical infrastructure interdependency study of Robert, executed for the province of Quebec, Canada, all four types of interdependencies are taken into account in examining how perturbations in one infrastructure propagate through other infrastructures. In decomposing the cyber interdependency of the electricity infrastructure, different authors in this book follow different strategies. Barkakati and Wilshusen take an information and communication security point of view to the control systems and networks needed to operate a large integrated power company in the USA. They distinguish between the distributed control systems used by a single processing or generating plant, which is bound to a specific site and relies on local area information/communication networks, and the SCADA systems employed to control the operation of the integrated power system operated by one company, which is dispersed over a large geographical area. Such a SCADA system necessarily relies on long-distance communication networks. Tranchita et al. take the structural decomposition of the control and management systems needed to operate the electric power infrastructure to the level of a transnational interconnected power grid and make an attempt to find a suitable modelling approach for interdependent information/communication and electricity infrastructures. Such modelling efforts are crucial to acquire a better understanding of the new risks entailed by the information and communication infrastructure enabled management and control systems directing the operation of the electric power infrastructure and to identify opportunities for increasing its efficiency and security. Masera follows a functional decomposition strategy, in which he distinguishes five strata. 
In this approach the first stratum, at the most fundamental level of cyber interdependency, comprises the digital processes that are involved in the measurements, operations and other actions executed directly on a single power variable or on the digital data. Such processes are executed at all levels of the power infrastructure, ranging from subsystems within a power company to market-wide operations. In this decomposition, the strata only partially coincide with the structural system levels of the electric power infrastructure. It is clear that the combination of the power and telecom infrastructures also provides opportunities for new services and applications in the power grid. For instance, many countries are about to make huge investments in advanced metering to enable remote monitoring and control of metering devices at residential customers. In his contribution, Deconinck shows that tailoring the communication infrastructure of the power grid to such a single application would be a missed opportunity for more advanced smart grid applications, which need larger bandwidth and faster and more dedicated communication channels. Valuable lessons for making the ICT infrastructure more dependable and more secure can also be obtained from other sectors, as is shown by Hafkamp and Steenvoorden in their contribution, which stems from the financial sector. In the financial world, the risk assessment methodologies – which once were mainly
related to financial risks only – have been extended to incorporate risks from failing communication infrastructure. The efforts of the banking sector show that, if supported by adequate regulation, the adoption of new information and communication technology does not necessarily increase risks significantly. All contributions emphasize that information and communication infrastructures did not evolve in sync with the electricity infrastructure. In comparison with the steadily but slowly evolving electricity infrastructure, technological change in the information and telecommunication sector has been far more turbulent, giving rise to a great heterogeneity of information and control systems employed in the electricity sector. Whereas the electricity sector has a long tradition of adhering to extremely high standards of reliability and security, the information and telecommunication sector is known for a less risk-averse attitude: if the need arises, security risks and other malfunctions can be ‘patched’. The electricity sector is unaccustomed to this culture of patch management. Moreover, the new challenges faced by the electricity sector cannot be tackled with technical solutions alone: a coherent combination of technical and institutional (governance) solutions is required. Governmental agencies and the owners and operators of critical infrastructure systems must join forces to identify the multi-level interdependencies between the electric power infrastructure and the information and telecommunication infrastructure and co-implement appropriate strategies to reduce the new risks and vulnerabilities.
Acknowledgement The authors wish to acknowledge the Next Generation Infrastructures Foundation and the Delft Research Center for Next Generation Infrastructures for their generous support of the international workshop on “Electricity Security in the Cyber Age: Managing the increasing dependency of the electricity infrastructure on information and communication technology”, that resulted in this book. The book presents the contributions presented by distinguished invited keynote speakers from the international scientific and industrial community. The editors trust that the workshop held May 13–14, 2009, in Utrecht, the Netherlands, marks the beginning of a vibrant research community that will effectively help public policy makers and strategic decision makers in the energy sector in dealing with the risks of the next generation of ‘smarter’ electric power systems.
References
1. European Commission: Green Paper on a European Programme for Critical Infrastructure Protection, COM(2005) 576 final, Brussels, 17 Nov 2005 (2005)
2. Kröger, W.: Critical infrastructures at risk: a need for a new conceptual approach and extended analytical tools. Reliabil. Eng. Syst. Safety 93, 1781–1787 (2008)
3. Le Du, M., Rassineux, B., Cochet, P.: The French Power Network Facing the 1999 Storms. Power Systems and Communication Infrastructures for the Future. In: Proceedings of the CRIS Conference, Beijing, September 2002
4. Moteff, J.D.: Critical infrastructures: background, policy, and implementation. CRS Report for Congress, Order Code RL30153, Updated 13 March 2007 (2007)
5. Rinaldi, S.M., Peerenboom, J.P., Kelly, T.K.: Identifying, understanding, and analyzing critical infrastructure interdependencies. IEEE Control Systems Magazine, December 2001
6. Stefanini, A., Masera, M.: The security of power systems and the role of information and communication technologies: lessons from the recent blackouts. Int. J. Crit. Infrastruct. 4(1/2), 32–45 (2008)
7. Ten, C.W., Liu, C.C., Manimaran, G.: Vulnerability assessment of cybersecurity for SCADA systems. IEEE Trans. Power Syst. 23, 1836–1846 (2008)
8. Weijnen, M.P.C., Bouwmans, I.: Innovations in networked infrastructures: coping with complexity. Int. J. Crit. Infrastruct. 2(2/3), 121–132 (2006)
9. Wenger, A., Mauer, V., Dunn, M.: International CIIP Handbook 2008/2009 – An Inventory of 25 National and 7 International Critical Information Infrastructure Protection Policies 4: Center for Security Studies, ETH Zurich (2008)
Chapter 2
The Future of Electricity Systems: General Trends, Developments
Marija D. Ilić
Abstract In this chapter we consider qualitative changes that are taking place in today’s electric power industry and, in particular, their implications for ICT design needs and specifications. To start with, we briefly summarize today’s operating and planning practices and their relatively weak dependence on Information and Communications Technology (ICT). This generally results in suboptimal utilization. We point out that different drivers of change, ranging from technological to regulatory and social, require transformation of these practices. A conjecture is put forward that ICT must play a fundamental role in transforming the industry services to meet future needs. Most of the ICT-enabled decision tools are distributed and minimally coordinated to allow for the implementation of choice with respect to type of service and willingness to pay, for example. On the other hand, opening the industry from a fully hierarchical, top-down managed structure to open access makes the industry as a whole quite vulnerable as private information becomes more publicly available. We propose that carefully designed standards and protocols are essential for utilizing ICT without exposing the industry to security threats. In this chapter we present our vision for Dynamic Monitoring and Decision Systems (DYMONDS) as the next generation Supervisory Control and Data Acquisition (SCADA) capable of achieving this fundamental objective. The main idea rests on evolving top-down hierarchical ICT into a multi-layered, multi-directional ICT system we call DYMONDS.
Keywords Next generation ICT for power systems • SCADA • DYMONDS
M.D. Ilić
Dept. of Electrical and Computer Engineering, Dept. of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213-3890, USA
and
Technology, Policy and Management, Delft University of Technology, Jaffalaan 5, 2628 BX, Delft, The Netherlands
e-mail: [email protected]
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_2, © Springer Science + Business Media B.V. 2010
2.1 Introduction
The electric power industry worldwide has been at a crossroads for quite some time. The industry has begun to transform from a rather statically planned and operated industry, with well-defined objectives and information protocols within each integrated utility, into an industry driven by a mix of disruptive technologies, organizational reforms and new societal objectives [3–5]. The current situation is one of genuine flux, characterized by many opportunities as well as by many problems as changes occur. Opportunities are seen mainly by the developers of new hardware technologies and of some new software technologies. However, it is fairly straightforward to show that software for supporting the integration of new hardware is lagging. This situation is the result of many factors, primarily the lack of well-defined, quantifiable performance objectives at different industry layers and for the industry as a whole and, moreover, the lack of standards and protocols capable of enforcing the implementation of such objectives. Particularly difficult has been the problem of mandatory information exchange between the industry stakeholders. On the other hand, it has been extremely difficult to reconcile the need for more open access software in support of embedded adaptation to system conditions with the concern that making information available increases vulnerability. For security reasons, industry organizations such as the North American Electric Reliability Council (NERC) in the United States and its counterparts worldwide have been under pressure to establish and mandate Critical Infrastructure Protection (CIP) standards (see http://www.nerc.com/). Similarly, government organizations responsible for ensuring security against terrorist attacks, such as the US Department of Energy (DoE) and the US Department of Homeland Security (DHS), have been working closely with NERC toward such standards (see http://www.gao.gov/cgi-bin/getrpt?GAO-04-780).
These conflicting objectives are difficult to reconcile given the overall spatial, temporal and contextual complexity of large-scale electric power systems [1]. Given this overall situation, it is safe to say that the role of software is not well defined and that the industry badly needs user-friendly software to support its transformation without exposing it to security risks. We go one step further in this chapter and claim that it is not possible to reach the energy and environmental objectives expected by society without beginning to rely actively on ICT-enabled automation. Many functions currently performed by humans must be automated. In what follows we support this claim by reviewing the fairly straightforward need to move beyond worst-case preventive system operation toward on-line, corrective, automated adjustments by different industry layers, driven by their own distributed decision making and minimal information exchange with the others. We are headed toward a qualitatively different industry paradigm: a shift from one in which reliable and secure service is achieved by having full top-down information about and control of power plants, to one in which distributed decisions are made autonomously according to well-defined standards and protocols. Given this change of industry paradigm, it becomes practically impossible to advance today’s Common Information Model
(CIM) to support the objectives of future electric energy systems (choice, in particular) without establishing analytics-based ICT for: (1) providing sufficient information to each industry layer to meet its objectives given (or estimated) information about the rest of the system; and (2) defining and implementing the type and rate of information that must be exchanged between different industry layers for the system-wide objectives to be met. We call such an ICT system DYMONDS, see http://www.eesg.ece.cmu.edu/dymonds. We suggest that the deployment of many disruptive hardware technologies, such as wind and solar power, demand-side response, storage, plug-in hybrid electric vehicles (PHEVs), small distributed generation, combined heat and power plants in particular, powerful communications platforms and sensor networks, advanced metering infrastructure (AMI), and many more unconventional technologies currently pursued, will require very careful ICT specifications and design in order to integrate most effectively into the existing physical power systems. Integration of these new resources requires carefully posed performance objectives and their trade-offs. Needless to say, today’s systems were never designed with the intent to support large-scale penetration of these incoming technologies. A typical demonstration and deployment of some pilot technologies is generally harmless to the system and, at the same time, incapable of claiming benefits beyond the local level. Successful integration of large-scale unconventional resources, on the other hand, can only be achieved by transforming today’s electric power grid into an active ICT-based enabler of monitoring and adjusting these new resources in coordination with the existing physical infrastructure. Given this claim concerning the fundamental role of ICT in the effective integration of unconventional resources, the next question is how to design the framework for such integration.
One could start with today’s CIM and evolve it to support novel technologies. For all practical purposes CIM is simply a means of information exchange regarding:
• Performance objective specification for a (group of) components
• Particular intelligence internal to the distributed decision makers
• Coordination of the (groups of) components by the Load Serving Entities (LSEs)
• Coordination of LSEs by the entities responsible for meeting system-wide performance
While there is a huge effort toward promoting CIM, there has been very little effort toward formalizing the performance objectives and system-wide objectives which define the essential information to be shared through CIM. Not having these directly affects the dynamics of deploying and utilizing ICT and, ultimately, leads to hard-to-predict system-wide performance. In what follows we first discuss the fundamental criteria for ICT specifications and design. Once these are stated, the challenge to the current state-of-the-art ICT for the power industry and a DYMONDS vision are described in some detail as the basis for moving forward. In Section 2.2 we briefly summarize the assumptions and objectives of the electric power industry prior to restructuring and recent technological breakthroughs.
These are then used to assess the state of the art and industry practices in using ICT for meeting its objectives. In Section 2.3 we assess the potential of using ICT more systematically by describing typical performance objectives of both the regulated and the deregulated electric power industry. Several roadblocks to deploying such ICT are discussed, first for the regulated industry and then for the deregulated one. In Section 2.4 we pose the problem of the evolving physical electric energy systems with high penetration of disruptive distributed resources. We give several examples illustrating the need for multi-directional and multi-layered ICT as the next generation SCADA in support of effective integration of these resources. The move toward “smart grids” is posed as a novel systems engineering problem for which much ICT must be developed. In Section 2.4.3 we pose the DYMONDS problem as the problem of designing model-based ICT for quantifiable performance of future electric energy systems. The emphasis is on multi-directional ICT in order to reconcile often conflicting sub-objectives. In Section 2.4.1 we revisit the software state of the art for the old industry and describe some possible solutions for moving forward. We discuss how it would be possible to reconcile the need for more open information exchange with the requirements to hide the data for security reasons. Open research questions and the need for collaboration with experts in cyber-security are stressed (see http://www.cylab.cmu.edu/). Finally, the closing Section 2.5 summarizes the main points of this chapter.
2.2 Assumptions and Objectives Underlying Today’s Electric Power Industry
We start by emphasizing the systems nature of the problem and the underlying engineering principles. In order to identify new needs and explore potential benefits from deploying less traditional technologies, it is essential first to understand the systems aspects of the current operating and planning paradigms of large-scale electric power grids. One way to proceed is to consider the basic features of their architecture. Such understanding is critical for posing the problem as a complex engineering system with well-defined performance objectives, as well as for understanding how these objectives are pursued in today’s industry. This approach, furthermore, helps to identify when the system may fail to perform. Section 2.2.1 is written with this concern in mind. Section 2.2.2 summarizes the overall performance objectives of today’s industry. For a detailed general dynamic model based on the local characteristics of system components and network constraints, see [6], which derives the underlying structure-based model essential for conceptualizing the principles of today’s hierarchical monitoring and control. A systems approach to capturing the complexity of the dynamics ranging over vast time horizons in response to drivers such as exogenous inputs and disturbances, short-term feed-forward and feedback signals, and new candidate technologies is still in its formative stages.
Once the basic principles of today’s monitoring and control are formalized, it becomes possible to describe the challenges seen in operating electric power systems under stress. Moreover, it becomes possible to assess the effects of measurement, estimation, and communications schemes on the types of models needed in order to attempt at least partial automation during abnormal operation.
2.2.1 Basic Physical Architecture of Today’s Electric Power Networks
Electric power systems are very large-scale networks interconnecting sources of electric power (generators) with the points of power consumption (loads). The interconnection has evolved over time to meet the needs of an ever-growing demand for electricity. Several key drivers have shaped the basic topology of today’s systems, such as: (1) large power plants, often remote from load centers; (2) utilities supplying their customers without depending much on the neighboring utilities; and (3) utilities interconnecting for reliability reasons, to help each other during major equipment failures. Consequently, the electric power grid has several voltage levels, converted from one to the other by step-up and step-down transformers. This has led to an extra-high-voltage (EHV) meshed transmission backbone network, and lower-voltage distribution (local) networks closer to the power consumers. Local networks are typically radial in structure. Fig. 2.1 shows a sketch of this inherently hierarchical structure with respect to the voltage levels present in a typical electric power interconnection, such as the Eastern Interconnection in the United States [6]. Generators, denoted by circles with wiggles, are connected to the sub-transmission or lower-voltage transmission portion of the interconnection. They are further connected via step-up transformers to the EHV transmission network, so that power is transferred over long distances at as high a voltage as possible to reduce transmission losses. Closer to the end users, step-down transformers connect the EHV network to the substations and transform back to lower voltage levels to supply consumers down to residential houses. This physical network is structured both vertically and horizontally with respect to the operating objectives. To start with, a large regional interconnection, such as those in the Eastern United States and in Europe, is horizontally structured into many utilities with their own objectives of supplying customers with reliable and economic electricity. Red circles in Fig. 2.1 represent the boundaries of these utilities. Each utility is furthermore vertically integrated into a single owner and operator (control area) of its own generation, typically located in the same geographical area, and of the transmission and distribution networks all the way to the customers. The EHV and HV utility subnetworks are meshed and sparse, while the local (MV and LV) networks are radial during normal operations. Local networks often have normally open switches (NOSs) which close to supply customers from different power sources during equipment failures. There also exist fairly weak connections
Fig. 2.1 Current electric power architectures [1]
at the EHV and HV levels between various utilities for reliable service during large equipment failures. It has generally been more economical to rely on neighboring utilities to send power via these tie-lines at times when utilities lose some of their own power than to build large stand-by generation capacity [2].
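The earlier remark that power is transferred over long distances at as high a voltage as possible to reduce transmission losses can be checked with a back-of-the-envelope calculation. The Python sketch below is illustrative only: the function name, the 500 MW transfer, the 10-ohm per-phase line resistance and the three voltage levels are assumed values, not data from this chapter, and the model ignores reactance, charging current and corona losses.

```python
import math


def line_loss_fraction(power_mw, voltage_kv, resistance_ohm, power_factor=1.0):
    """Fraction of the transmitted power dissipated as I^2 R heat.

    Assumes a balanced three-phase line with line-to-line voltage
    `voltage_kv` and per-phase series resistance `resistance_ohm`.
    """
    power_w = power_mw * 1e6
    voltage_v = voltage_kv * 1e3
    # Three-phase line current: I = P / (sqrt(3) * V_LL * pf)
    current_a = power_w / (math.sqrt(3) * voltage_v * power_factor)
    # Resistive loss summed over the three phases
    loss_w = 3 * current_a ** 2 * resistance_ohm
    return loss_w / power_w


# Hypothetical example: the same 500 MW sent over a 10-ohm line
for kv in (138, 345, 765):
    print(f"{kv} kV: {100 * line_loss_fraction(500, kv, 10):.1f}% lost")
```

For a fixed power transfer the loss fraction scales with 1/V², so in this idealized model raising the voltage from 138 kV to 345 kV cuts resistive losses by a factor of (345/138)² = 6.25.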
2.2.2 Planning and Operating Performance Objectives of Today’s Electric Power Systems
The overall objective of a traditional electric power system is simple: minimizing total cost subject to reliability constraints. The implementation of this objective is complex when viewed as a single decision-making problem for very large-scale dynamic systems. The industry attempts to optimize the expected costs of serving customers subject to a variety of constraints, such as system dynamics and many input and output constraints. This is done for hard-to-forecast demand characteristics of the end users and uncertain system equipment status. Moreover, the ownership of a typical electric power grid is distributed among different utilities with their own sub-objectives, making the objectives of the entire interconnection even more difficult to meet.
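The objective stated above, minimizing total cost subject to reliability constraints, can be illustrated in its simplest static form. The sketch below is not the industry's actual scheduling software: it assumes constant marginal costs, ignores the network and system dynamics, and stands in for the reliability constraint with a single capacity reserve margin. All unit names and numbers are hypothetical.

```python
def merit_order_dispatch(units, demand_mw, reserve_mw=0.0):
    """Cheapest-first dispatch of generating units.

    units: list of (name, capacity_mw, cost_per_mwh) tuples.
    Returns (schedule dict in MW, total cost per hour).
    """
    total_capacity = sum(capacity for _, capacity, _ in units)
    # Simplified reliability constraint: installed capacity must cover
    # the demand plus a reserve margin.
    if total_capacity < demand_mw + reserve_mw:
        raise ValueError("insufficient capacity for demand plus reserve")

    schedule = {}
    remaining = demand_mw
    # With constant marginal costs and only capacity limits, loading the
    # cheapest units first minimizes the total cost.
    for name, capacity, _ in sorted(units, key=lambda u: u[2]):
        output = min(capacity, remaining)
        schedule[name] = output
        remaining -= output

    total_cost = sum(schedule[name] * cost for name, _, cost in units)
    return schedule, total_cost


# Hypothetical three-unit system
units = [("peaker", 100, 120), ("coal", 400, 30), ("gas", 300, 50)]
schedule, cost = merit_order_dispatch(units, demand_mw=600, reserve_mw=100)
print(schedule, cost)
```

Even this toy version shows the tension described in the text: the reserve requirement forces capacity to be available that the cost-minimizing schedule leaves unused.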
It is striking that the existing and changing structures of electric power grids have evolved with very little reliance on formal systems control principles. Instead, various assumptions have made the design and operations manageable by the engineers themselves, without always having to rely on very detailed modeling and analysis. This creates an interesting challenge in its own right, as one wishes to explore the potential of more systematic sensing, communications, computing, and control for predictable performance and more diverse electricity services of the future. Essentially, it becomes necessary to take a step back, pose the problem, and understand the often implicit assumptions made in today’s operating and planning industry practices. For a detailed treatment of this, see [6]. Possibly one of the most difficult challenges in developing effective software tools for the electric power industry is thinking about the problem as a stochastic dynamic problem evolving at vastly different rates. The objective is long-term optimization under uncertainties subject to short-term operating constraints, and the very question of the conditions under which this single problem can be decomposed into simpler subproblems makes it a singularly perturbed stochastic control problem. A possible way to work through this very difficult problem is to decompose it into:
• Functions which require feed-forward scheduling
• Functions which require feedback design
This separation makes it possible to review today’s practices and, ultimately, to assess potential problems and opportunities for improvement. For a summary of the principles underlying feed-forward decision making and the temporal decomposition of planning and scheduling functions in today’s industry, see [6].
2.2.3 Preventive Approach to Managing Uncertain Equipment Status in Today’s Industry
In addition to the demand uncertainties, utilities face a tremendous challenge and high costs because of their obligation to serve customers reliably for at least 30 min following any single major equipment failure, in accordance with the industry guidelines defined by the North American Electric Reliability Council (NERC). Regional utilities have cooperated when adding new equipment in order to ensure that the region as a whole operates reliably. Time-consuming off-line simulations of regional systems are carried out to simulate the worst-case scenarios and to create lists of critical forced outages. Contingencies leading to fast transient instabilities, non-robust response to small deviations in states, parameters, and inputs away from nominal, and uncontrollable voltage collapse are identified, and preventive adjustments of thermally limited power transfers are made in order to ensure that no problems of this type occur during such forced outages. A typical industry approach has been to perform off-line studies to tighten the control and output constraints within which the system would be allowed to operate
during normal conditions. The premise is that in case a fault occurs, there will be no dynamic problems and, therefore, no immediate dangers from time-critical problems. Therefore, feed-forward decision making in electric power systems for scheduling available resources assumes that transitions from one schedule to the next are stable.¹ There are usually several stability-limited interfaces in each region. A combination of local dynamic control and additional constraints on some outputs is used to prevent unstable operation for typical loading conditions and the worst-case equipment failures. The result is suboptimal use of system-wide resources during normal conditions, so that if the worst-case contingency happens, there is no dynamic problem. The industry refers to this practice as preventive control. Preventive control should be contrasted with corrective control, which can result from a near real-time solution of the full stochastic dynamic optimization problem. Some forward-looking utilities have begun developing numerical tools, probabilistic in nature, for a more corrective approach to ensuring stability in electric power systems during major equipment failures.
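The cost of preventive control can be seen in a stylised example. The sketch below assumes, purely for illustration, an interface made of parallel lines whose flows divide in proportion to their thermal ratings, so that the interface can be loaded up to the sum of the ratings when all lines are in service. The preventive (N-1) limit is then the largest transfer that remains feasible after the loss of any single line. The function name and the ratings are hypothetical, and real studies also account for the stability and voltage limits described above.

```python
def interface_limits(line_ratings_mw):
    """Return (all-lines-in-service limit, preventive N-1 limit) in MW."""
    if len(line_ratings_mw) < 2:
        raise ValueError("need at least two lines for an N-1 study")
    total = sum(line_ratings_mw)
    # The binding case is the outage of the line with the largest rating.
    n_minus_1 = min(total - rating for rating in line_ratings_mw)
    return total, n_minus_1


ratings = [1000, 800, 600]  # hypothetical thermal ratings in MW
normal_limit, preventive_limit = interface_limits(ratings)
held_back = normal_limit - preventive_limit
print(f"limit with all lines in service: {normal_limit} MW")
print(f"preventive (N-1) limit:          {preventive_limit} MW")
print(f"capacity held back in normal operation: {held_back} MW")
```

In this toy case more than a third of the interface capacity is left unused during normal conditions. A corrective scheme would aim to release part of that margin by relying on fast automated adjustments after a failure rather than on permanently tightened limits.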
2.2.4 Feedback Control Functions in Today’s Industry
Given today’s preventive control approach to managing large equipment failures by intentionally reducing allowable regions of operation to avoid time-critical events, and relying on human operators to carry out certain pre-defined procedures, the role of automation has been rather limited in large-scale electric power systems. One way to summarize feedback control performance objectives is to keep in mind the basic horizontal organization of a large electric power system into utilities and power pools. Utilities (control areas) are expected to schedule sufficient power to supply the forecast load and the pre-agreed net power exchange with the neighboring utilities. Hierarchical automation comprises a primary (equipment) level and a secondary (utility) level. At the primary component level, local controllers are tuned to stabilize very fast and small supply/demand imbalances around the forecast demand and the corresponding feed-forward generation schedule. At present they are tuned for what may be considered the worst-case scenario and are generally not adaptive to major changes in system conditions. At the secondary utility level, control areas are expected to regulate slower, quasi-stationary supply and demand deviations by means of Automatic Generation Control (AGC). AGC in the United States is the basic mechanism for balancing supply and demand among utilities in an entirely decentralized way. The Area Control Error (ACE) is a weighted sum of the frequency deviation in the control area and the deviation in the net power flow exchange between the control area and its neighbors.
¹ Note that the optimization problem described here does not have dynamics of the system response as a constraint. This formulation implies that system dynamics are stable as scheduling and investment planning are done.
The AGC principle is an effective engineering concept based on the fact that the quasi-static frequency is the same in the entire interconnection; therefore, decentralized feedback implemented by each control area for responding to these frequency deviations contributes to the overall supply/demand balancing in the interconnection. Independently from where the imbalance is created, it can be regulated in response to a single observable variable, system frequency. At present, the scheduled net tie-line flows are based on agreeing with the neighbors regarding what these should be. The power scheduling for forecast demand is done by each control area scheduling internal power plants to meet the forecast demand in the area and the targeted net flow exchange with the neighboring control areas. However, there are no ways to enforce that the actual tie-line flows are what the schedules attempted, and there exists consequently so-called Inadvertent Energy Exchange (IEE) between each control area and the rest of the system. The IEE is a combination of actual tie-line flows deviating from the scheduled tie-line flows and the cumulative fast deviations of tie-line flows from the schedules. Cumulative frequency deviations are controlled by means of time-correction-error control at one power plant in the entire interconnection. We stress here that most of the operating practices are defined and standardized to a large extent for meeting the real power and frequency criteria. The operating and control practices for voltage and reactive power scheduling and control vary largely with the utility practice, much more so in the United States and less so in some countries in Europe. Coordination of real and reactive power scheduling and control practices remains largely a fundamentally open problem. This is discussed in some detail after the introduction of the necessary models.
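The ACE used by AGC is commonly written as the tie-line flow deviation plus a frequency bias term. The sketch below uses that form with a positive bias coefficient in MW/Hz. The sign convention (positive ACE means the area is over-generating), the bias value, the integral-control gain and all operating points are assumptions made for illustration and are not taken from this chapter.

```python
def area_control_error(tie_actual_mw, tie_scheduled_mw,
                       freq_actual_hz, freq_nominal_hz=60.0,
                       bias_mw_per_hz=640.0):
    """ACE = (net tie-line export deviation) + bias * (frequency deviation)."""
    tie_deviation = tie_actual_mw - tie_scheduled_mw
    freq_deviation = freq_actual_hz - freq_nominal_hz
    return tie_deviation + bias_mw_per_hz * freq_deviation


def agc_step(setpoint_mw, ace_mw, gain=0.1):
    """One integral-control step: move the generation set point against the ACE."""
    return setpoint_mw - gain * ace_mw


# Case 1: a generator trips in a neighbouring area. Frequency sags and this
# area exports 20 MW more than scheduled through its primary response.
ace_external = area_control_error(520.0, 500.0, 59.96875)

# Case 2: load rises inside this area. Frequency sags by the same amount and
# the net export falls 30 MW below schedule.
ace_internal = area_control_error(470.0, 500.0, 59.96875)

print(f"external disturbance: ACE = {ace_external:.1f} MW")
print(f"internal disturbance: ACE = {ace_internal:.1f} MW")
print(f"new set point after one AGC step: {agc_step(800.0, ace_internal):.1f} MW")
```

With these numbers the first case gives an ACE of zero, so the area takes no secondary action for a disturbance that is not its own, while the second gives an ACE of -50 MW and the area raises its own generation. This is the sense in which each control area, observing only frequency and its own tie-line flows, contributes to balancing the interconnection in a decentralized way.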
2.2.5 Hierarchical Information Structure in Today’s Industry
At the early stages of electric power network evolution, sensors, controllers and protection relays were based mainly on local measurements. Their set points were either pre-programmed for the forecast demand, and/or were adjusted by the human operators as the loading varied over time. After the first major blackouts in the early 1960s, the electric power industry recognized the need for more near real-time monitoring of their systems. Consequently, all major US utilities have built their Energy Management Systems (EMS), also known as the Control Centers, and have implemented a supporting Supervisory Control and Data Acquisition System (SCADA) for processing data and coordinating commands to manage power generation and delivery within the EHV and HV (bulk) portion of their own electric power system. SCADA is the fundamental monitoring and control architecture at the control area level. Moreover, several regions formed electric power pools to operate and manage several utilities in the same region with their own SCADA systems [8]. Even to this day, the information structure remains highly hierarchical: each primary controller utilizes its own local measurement only, each control area utilizes
M.D. Ilić
measurements in its own utility only and has its own SCADA system. Protection, likewise, is pre-programmed to protect individual pieces of equipment and rarely requires communications. There is no on-line coordination between different regions within a large interconnection. As long as conditions are normal, the industry sees no need for system-wide scheduling of resources, nor for region-wide or interconnection-wide on-line coordination. The control, although entirely decentralized at the level of control areas, is quite effective. The only major issue during normal conditions is a sub-optimal use of regional resources due to decentralization.² Most recently, there has been considerable recognition of the need for synchronizing fast measurements across wider areas, in particular given major breakthroughs in new cost-effective measurement equipment, such as phasor measurement units (PMUs) and frequency measurement units (FMUs). Industry research is under way on the systematic deployment of these sensors and their integration into the existing control design. We discuss later in this paper the relevance of these technologies for automating system operations outside normal regions.
2.3 Potential for Enhanced Performance of Today’s Industry by Means of ICT
The potential of ICT for improving today’s performance of electric power systems is multi-fold. Most generally, the need for more active monitoring, estimation and decision making depends on the performance objectives defined by the industry. Once these are set, it is possible to quantify the potential benefits from performing a particular adaptation in support of the pre-defined system performance and on the performance of different stakeholders by assessing the cost effects of different constraints on the desired performance. There exists a real need to revisit the operating and planning objectives in order to demonstrate the potential shortcomings and opportunities for improvements related to ICT-based decision making. In particular, one could have at least four categories of performance objectives, as follows:
• Normal-condition viability performance criteria
• Reliability performance criteria
• Efficiency (cost, benefit) performance criteria
• Environmental performance criteria
² We differentiate between the “hierarchical” SCADA and the multi-directional, multi-layered future ICT design (DYMONDS) by pointing out that, in the former, very little information, close to none, is communicated from the system users (suppliers, consumers) back to the control center; the information flow is one-directional from the control center to the system users.
Any technology can be evaluated with respect to the performance metric of interest. In what follows we briefly discuss these four types of performance criteria and then explain how enhanced software could be used to quantify the effects of adapting hardware support on these performance criteria for any given power system. Finally, we describe our approach to assessing tradeoff performance when more than one performance criterion is of interest. Quantifying the tradeoffs across several types of performance criteria with respect to various control support and defining frontier curves may become quite important in the future as one attempts to understand the interdependences across technical, economic and environmental performance criteria.
2.3.1 Performance Criteria for Ensuring System Viability During Normal Conditions
Possibly the most basic performance criterion for ensuring that the system operates within the technical constraints during normal conditions is the total transfer capability (TTC). Balancing authorities are required to compute TTC and post it on a daily basis. The TTC is used by the system coordinators and markets to implement energy transactions in an open-access way across different ownership boundaries. The TTC is currently defined as the net power which can be imported into (exported from) the control area without violating the technical constraints inside the control area. In order to quantify TTC non-conservatively it is important to differentiate between:
• Hard constraints (such as $V_i^{\min} < V_i < V_i^{\max}$ and $S_i^{\max}$, the maximum apparent generation power).
• Operating conditions-dependent constraints (the line flow limit can be a thermal, voltage-related or stability-related limit, as a function of system load).
• Control-dependent constraints (line limits depend on the control practices in place; for example, if generation voltage is adjusted as system conditions change, the line flow limits are generally less restrictive than if nothing is adjusted as loading and other system conditions vary).
• Real and reactive power flow balance equations.
It is particularly important to observe that the line flow limit is not a hard limit. To illustrate this, take the case of voltage-limited real power transfer. Current industry practice for testing the point-to-point transfer limit is to compute power flows as power is injected into the sending end of the interface and the same amount of power is taken out at the receiving end of the interface.
The limit to the power transfer is detected when power flow software begins to have convergence problems; the highest injection for which power flow converged is then reduced by 5% and this is then declared to be the maximum power transfer across the interface, or
the net interface into the control area (TTC). Future software needs to be used to demonstrate the difference between the limit based on the P–V curve calculations and the maximum possible transfer obtainable as different controllable devices are adjusted to support the desired power transfer. The basic issue of how much transfer is viable across specific interfaces of interest becomes even more complex at the regional level, since the interface limits are greatly affected by the transfers across multiple control areas. New software can be used to show the dependence of net imports on the TTC in a particular control area. For the transfers to be optimized at the regional level it is necessary to coordinate these interfaces by solving the seams problem. It makes sense to adjust available resources during normal conditions so that, as conditions change, the system conditions-dependent and control-dependent flow limits are optimized to enable the largest TTC.
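The transfer-limit test described above can be sketched on a toy system. The example below is a minimal sketch under stated assumptions, not the industry software: it assumes a lossless two-bus interface with reactance x, a sending-end voltage e and a unity-power-factor load, for which a power flow solution exists only while the transfer does not exceed e²/(2x). The function names, step size and parameter values are hypothetical. Raising the sending voltage illustrates why the resulting limit is control-dependent rather than hard.

```python
import math


def receiving_voltage(p, x, e=1.0):
    """High-voltage power flow solution (p.u.) at the receiving end.

    Assumed toy model: lossless line of reactance x, sending voltage e,
    unity-power-factor load p. Then V^4 - e^2 V^2 + (p x)^2 = 0, which has
    no real solution (the power flow "does not converge") once
    p exceeds e^2 / (2 x).
    """
    disc = e ** 4 - 4.0 * (p * x) ** 2
    if disc < 0.0:
        return None
    return math.sqrt((e ** 2 + math.sqrt(disc)) / 2.0)


def declared_limit(x, e=1.0, step=0.03, margin=0.05):
    """Raise the injection until no solution exists, then back off by margin."""
    k = 0
    while receiving_voltage((k + 1) * step, x, e) is not None:
        k += 1
    return (1.0 - margin) * k * step


fixed = declared_limit(x=0.5)             # sending voltage held at 1.00 p.u.
adjusted = declared_limit(x=0.5, e=1.05)  # sending voltage raised to 1.05 p.u.
print(f"declared limit: {fixed:.3f} p.u. fixed, {adjusted:.3f} p.u. adjusted")
```

The second call stands in for adjusting a controllable device in support of the transfer; in this toy model the declared limit increases when the sending voltage is allowed to rise.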
2.3.2 Performance Criteria for Ensuring Reliable Operations During Non-time critical Contingencies
Similar to using the TTC as a measure of control area performance during normal conditions, available transfer capability (ATC) could be computed for managing the system during contingencies without violating hard system constraints. Current practice is to create P–V curves for the worst-case contingencies and reduce the TTC to the ATC level so that the system is within the hard constraints even as the contingencies occur. Relevant for the direction of future software design is the approach for differentiating between the following two qualitatively different types of possible contingencies:
• Contingencies which are not time critical. During such contingencies the system will move from one steady state to another without dynamic and/or transient stability problems. However, the new post-contingency state may not be within the acceptable technical limits.
• Contingencies which are time critical. These contingencies require very fast fault clearing in order for the system not to become transiently and/or dynamically unstable.
One must be concerned with both non-time critical and time critical contingencies. One needs to assess a possible increase in TTC if adjustments of controllable components are allowed after the contingency takes place. This means that the TTC can be utilized during normal conditions, and as a non-time critical contingency occurs the enhanced software would be used to compute the most critical adjustments capable of bringing the post-contingency system to within the acceptable technical constraints. Significant increases in system ability to operate during contingencies without violating technical constraints are possible.
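The difference between reducing the transfer preventively and allowing adjustments after a non-time critical contingency can be illustrated with a small calculation. The sketch below is only an illustration under assumptions that are not part of the chapter: an interface made of two parallel lossless lines, a unity-power-factor load, a voltage-limited transfer of e²/(2x), single-line outages as the contingency list, and a sending voltage that may be raised to 1.05 p.u. after the outage. All names and values are hypothetical.

```python
def max_transfer(x, e=1.0):
    """Voltage-limited transfer e^2 / (2 x) of the assumed toy model:
    lossless reactance x, sending voltage e, unity-power-factor load."""
    return e ** 2 / (2.0 * x)


def parallel(reactances):
    """Equivalent reactance of parallel lossless lines."""
    return 1.0 / sum(1.0 / x for x in reactances)


lines = [0.5, 0.5]  # hypothetical interface of two parallel lines (p.u.)
ttc = max_transfer(parallel(lines))

# Non-time critical contingencies considered here: outage of any single line.
outages = [lines[:k] + lines[k + 1:] for k in range(len(lines))]

# Preventive practice: limit the transfer to what survives the worst outage
# with no post-contingency adjustment.
atc_preventive = min(max_transfer(parallel(rest)) for rest in outages)

# Corrective alternative: allow the sending voltage to be raised to 1.05 p.u.
# after the outage (an assumed controllable adjustment).
atc_corrective = min(max_transfer(parallel(rest), e=1.05) for rest in outages)

print(f"TTC = {ttc:.4f}, preventive = {atc_preventive:.4f}, "
      f"corrective = {atc_corrective:.4f} (p.u.)")
```

In this toy case the transfer that can be sustained through the worst outage is higher when a post-contingency adjustment is permitted, which is the kind of gain the enhanced software is meant to quantify.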
2.3.3 Criteria for Measuring Economic Performance of the System (Efficiency, Cost, Benefit)
There are several criteria for assessing economic performance of the system, such as:
• Network losses
• Economic transfers
• Total generation cost
• Social welfare (total generation cost minus total customer benefits)
Future enhanced software should be capable of optimizing all of these criteria by adjusting controllable devices throughout the system. Current practice is to optimize only real power generation by performing a DC Optimal Power Flow (DC OPF) and to test whether these real power generation results are viable in a power flow solution. Most of the software is iterative, and it is often hard to get this process to converge. At some point the operator decides that some real power transfers are not possible because of the numerical power flow problems, and modifies the DC OPF-optimized real power generation based on his/her expert knowledge. This inherently leads to suboptimal utilization, and to operator decisions that are often hard to quantify. The settings on controllable network devices (such as capacitor banks, on-load tap-changing transformers, phase-angle regulators, and the like) are generally adjusted by the network owners (transmission and distribution system owners) and not by the system operator. Another way of interpreting this is that the system operator at present only has jurisdiction over generation resources, and the transmission owners are responsible for setting controllers on their network. The distribution systems (load providers) are responsible for adjusting the equipment settings on their portion of the network and for managing demand. Enhanced software needs to be capable of quantifying the differences in the outcomes of the economic criteria of interest between the case in which all controllable equipment is optimized by the system operator and the case in which the system operator only optimizes generation settings.
2.3.4 Environmental Performance Criteria
Most recently, industry has been charged with actively participating in meeting the environmental goals. This is done by adjusting utilization of polluting power plants to remain within the allowable emissions amounts and/or by purchasing cleaner power from others in order not to exceed the pre-specified emissions allowances. In addition, state mandates are under way requiring the addition of wind power, which is generally considered to be clean.
It is fairly straightforward to show that a more adaptive adjustment of the existing resources could contribute significantly to using cleaner power plants when needed. In order to demonstrate this potential benefit it is first necessary to quantify the environmental performance criteria in much the same way as the technical and economic criteria. Once this is done, it becomes possible to assess the cost of sub-optimal voltage/reactive power support as seen in suboptimal environmental performance criteria. One possible way of demonstrating the severity of hardware adaptation-related problems in any given electric power system is by measuring the related missed opportunities to perform better with regard to the technical (viability and reliability), economic and/or environmental performance criteria of interest. New software is needed for identifying the most costly problems and for suggesting the most effective possible solutions (the solutions are both for enhanced operations and for the effective investments needed to relax the most costly problems). The measures are straightforward. Given the performance criterion of interest $J_i$, the cost associated with any active system constraint $y_j$ is simply given as
$$c_{i,j} = \frac{\partial J_i}{\partial y_j} \qquad (2.1)$$
For example, $J_i$ could be ATC, TTC, Network Loss, Transmission Loss, Distribution Loss, Total Generation Cost, Social Welfare, Emissions Cost and the like. The active constraint $y_j$ could be Voltage, Real Power Line Limit, Generation Capacity (Real and/or Reactive), PAR (phase-angle regulator) Line Flow Limit, Power Flow Real and/or Reactive Power Balance Constraint, and the like.
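The sensitivity in (2.1) can be estimated numerically once the performance criterion can be evaluated as a function of a constraint value. The sketch below is a minimal illustration under assumptions not taken from the chapter: a hypothetical two-generator system in which a cheap remote generator is limited by a line limit and an expensive local generator covers the remaining demand, with the total generation cost as the criterion and a central finite difference as the estimate of the partial derivative.

```python
def generation_cost(line_limit_mw, demand_mw=100.0,
                    cheap_cost=20.0, expensive_cost=50.0):
    """Least-cost dispatch J for a hypothetical two-generator system.

    The cheap generator is remote and can deliver at most line_limit_mw;
    the expensive local generator covers the rest of the demand.
    Costs are in $/MWh, so J is in $/h.
    """
    cheap_mw = min(line_limit_mw, demand_mw)
    return cheap_cost * cheap_mw + expensive_cost * (demand_mw - cheap_mw)


def constraint_cost(criterion, y, h=1.0):
    """Central finite-difference estimate of dJ/dy at the constraint value y."""
    return (criterion(y + h) - criterion(y - h)) / (2.0 * h)


active = constraint_cost(generation_cost, 60.0)     # line limit is binding
inactive = constraint_cost(generation_cost, 120.0)  # line limit is not binding
print(f"dJ/dy = {active:.1f} $/h per MW (binding), {inactive:.1f} (not binding)")
```

A negative value for the binding limit means that relaxing the limit by one MW lowers the total cost by that amount per hour, while a constraint that is not active has zero cost in this sense.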
2.4 Toward Open Access Operations of Future Electric Power Systems
There are basically three types of on-going industry changes. The first are technological and are the result of: (1) small-scale distributed generation (DG) becoming cost-effective; (2) developing sensing and actuation technologies for customers to respond to system conditions and prices of electricity; (3) distributed switching technologies for both transmission and distribution systems; and (4) the wide spread of communications. With this technology in place the number of distributed decision makers and controllers is likely to increase significantly. The second driver of change has been organizational. By law, the electric power supply has become competitive, enabling customers to choose providers and go outside of their own control area to purchase cheaper power. Similarly, generators could sell to customers outside the control areas in which they are physically located. Because of this, a typical evolving structure was referred to as being “nested”, instead of hierarchical. At the same time, in parts of the electric interconnection in which power is supplied competitively there has been functional and
corporate unbundling within once tightly coordinated control areas. The main new function of wire companies is to provide “open access” delivery of power, irrespective of ownership, across the entire interconnection. This is contributing to a diminishing role for control areas. The third, most recent, driver has been triggered by the new societal objectives concerning sustainable energy and environment goals. These goals could be considered to be either externalities to the basic energy supply or could be internalized by defining them as quantifiable objectives. One possible way of assessing the impact of these changes is, again, by looking at the evolving industry structures. New technologies are changing the basic electric power structure shown in Fig. 2.1 into the network architectures shown in Figs. 2.2 and 2.3. Figure 2.2 is a sketch of a fully distributed industry architecture characterized by active sensing and actuation by a very large number of small actors, such as small distributed energy resources, customer response and even automated switching of wires interconnecting these actors. The dynamics of interactions between various actors is determined by the distributed sensing and decision making of each actor. Each actor optimizes its own performance sub-objective for the assumed environment conditions. In this sense, the single performance objective solved centrally in the hierarchical system is instead solved by each actor optimizing its own sub-objective with respect to its own local variables; the constraints and interactions with the neighbors are computed for the assumed Lagrangian coefficients. Various iterative methods proposed in the recent literature naturally lend themselves to designing iterative protocols for exchanging information among the actors.
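One simple instance of such an iterative protocol is a price update, in which each actor responds to an assumed Lagrangian coefficient (a price) by optimizing only its own sub-objective, and the price is then adjusted according to the remaining supply/demand imbalance. The sketch below is a hedged illustration of this general idea, not the specific methods of [6, 7]; the quadratic cost model, the step size and all names and numbers are assumptions. Note that the price update itself acts as a minimal form of coordination.

```python
def best_response(price, a, b, p_max):
    """Output maximizing an actor's own profit price*p - (0.5*a*p**2 + b*p),
    subject to 0 <= p <= p_max."""
    return min(max((price - b) / a, 0.0), p_max)


def iterate_price(actors, demand, step=0.05, tol=1e-6, max_iter=10_000):
    """Adjust the price (Lagrangian coefficient) until supply meets demand."""
    price = 0.0
    outputs = [0.0] * len(actors)
    for _ in range(max_iter):
        # Each actor uses only its own local data and the announced price.
        outputs = [best_response(price, a, b, p_max) for a, b, p_max in actors]
        imbalance = demand - sum(outputs)
        if abs(imbalance) < tol:
            break
        # Raise the price when supply is short, lower it when in excess.
        price += step * imbalance
    return price, outputs


# Hypothetical actors: (quadratic cost a, linear cost b, capacity p_max).
actors = [(0.1, 10.0, 100.0), (0.2, 5.0, 100.0)]
price, outputs = iterate_price(actors, demand=90.0)
print(f"price = {price:.3f}, outputs = {[round(p, 2) for p in outputs]}")
```

The step size here was chosen small enough for this particular example to converge; in general, convergence of such iterations depends on the cost functions and on the update rule, which is part of the open question raised in the text.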
One of the major open questions is if and how the system as a whole balances according to the basic network laws without system-wide coordination, and, moreover, how are various constraints, in particular flow congestion
Fig. 2.2 Fully distributed architectures [6, 7]
Fig. 2.3 Reconfiguration options for electric power grids (circles represent already controlled power plants, arrows end users, and the interconnections are controllable wires) [6, 7]
constraints observed. Recent work indicates that this is actually possible to achieve in distributed interactive ways. The lack of storage and, therefore, the need for supply and demand to meet instantaneously point to fundamental research needs concerning distributed decision making and learning which take time into consideration. Very little work has been done on this subject so far, and this makes the broad area of multi-agent decision making not readily applicable to the design of novel electric power system architectures. Figure 2.3 is a more likely future architecture. Rather than each piece of equipment being equipped with sensors and actuators, economic and regulatory considerations will define what is economically viable. So, instead of a fully distributed architecture, one would see gradually evolving, reconfiguring architectures, based on coalitions of actors with common goals. Consequently, the coordination of a large number of distributed decision makers could take place in qualitatively different ways from the traditional top-down hierarchical coordination based on temporal and spatial decomposition. Sensing, measurements, communications and control structures are becoming multi-layered and multi-directional, instead of being hierarchical. The electricity services of the future will be based on differentiable distributed performance metrics at a value which is likely to replace today’s control area-wide reliability criteria. Similarly, the organizational changes themselves lead to distributed decision making and affect control areas’ supply, delivery and purchase of electricity services. Each functional/corporate entity has its own sub-objectives. These sub-objectives are pursued within a very uncertain system, since the rules for mandatory information exchange remain quite vague.
2.4.1 The Need for Novel Standards and Protocols
At first sight, there are many potential opportunities presented to future energy systems through deploying environmentally sustainable distributed energy resources and through the potential to balance supply and demand by adjusting demand during shortages caused by intermittent resources not producing. The actual transition from today’s industry into a functional future industry is going to be difficult, however, given the magnitude of possible changes and the lack of quantifiable performance objectives capable of internalizing system constraints. Much the same as the lack of quantifiable performance objectives presents a real roadblock to the deployment of new software capable of enhancing utilization of existing resources without violating reliability and safety, it is essential to define sub-objectives for different layers within the changing industry, and quantifiable interactions with the rest of the system. This is a major R&D challenge. Once this is established, the actual ICT specifications and design will become quite manageable.
2.4.2 Related Unconventional Requirements for Monitoring and Control
Managing multi-layered architectures in future power grids will effectively require a framework for auto-reconfiguring electric power systems according to the changing needs of customers and system conditions. Auto-reconfiguring is to be done in an adaptive way so that the most is made out of the available resources. Adaptive management is in sharp contrast to today’s top-down deterministic preventive control for managing uncertainties. Moreover, today the emphasis is on automation of local power grids (distribution systems) connecting medium-size and small residential users and not solely on the backbone transmission systems. In order to meet these challenges, it is essential to develop a novel framework for dynamically integrating measurements and actuators throughout a large-scale network system in order to serve the end users efficiently during normal conditions, as well as reliably and securely during extreme conditions. Much intelligence needs to be developed for novel disruptive technologies, such as: (1) automatic metering of end users, digital relays; (2) Flexible AC Transmission Systems (FACTS), which for all practical purposes could be viewed as fast electronic valves for switching the parameters of the wires so that the strength of the interconnection and the overall configuration are dynamically controlled; (3) small distributed generation, ranging from combined heat and power (CHP) through wind, solar, and fuel cells, as well as a very large number of highly unconventional small energy sources; (4) sensors and actuators for automated control of demand by the end users, including their response to dynamic electricity prices; as well as (5) exploring the potential of sensors ranging from typical size, through nano- and micro-sensors, which could be placed, in particular, with the end user [5]. Model-based information and software
algorithms will form the basis for coordinating interactions between these actively responding distributed components. Extensive simulations are needed to demonstrate the potential of just-in-time (JIT) and just-in-place (JIP) auto-reconfigurations of all candidate groups of components.
2.4.3 Next Generation SCADA: Dynamic Monitoring and Decision Systems (DYMONDS) as a Possible Means for Coordinated Interactions Among Distributed Decision Makers
Conceptually, our envisioned DYMONDS would be embedded reconfigurable architectures facilitated by model-based interactive information systems for predictable performance at various layers of the changing industry. Our research focuses on the conceptualization of such evolving system architectures, families of system models, and multi-layered operating and planning decision solutions to support the reconfigurable power grids of the future. We view this task as a problem of designing and operating complex engineering systems by highly unconventional means. A DYMONDS framework is intended to integrate dynamically the measurements and actuators throughout a large-scale network system in order to serve the end users efficiently during normal conditions, and reliably and securely during extreme conditions. No single method lends itself to supporting such a framework. Instead, the envisioned auto-reconfiguration framework draws on several ideas: (1) making the system more observable by enhancing and gradually replacing current centralized Supervisory Control and Data Acquisition (SCADA) systems with multi-layered measurement-communications architectures; (2) making the system more controllable by equipping both end users and the network elements with actuators, in addition to establishing suppliers as the main decision makers; (3) programming the logic for dynamic adjustments of context-dependent performance objectives by various actuators and industry layers; (4) using the context-dependent performance objectives to aggregate dynamically the measurement and control information; and (5) developing a new generation software architecture for simulating and eventually implementing goals (1) through (4).
Our basic approach rests on two ideas. First, one needs to introduce a family of models to be embedded at each (group of) component level to help its sensors and actuators convert large amounts of data into relevant information, and to adjust the logic of the actuators over broad ranges of conditions. These models must capture the physical processes in sufficient detail. Their complexity and order vary dynamically as a function of the objectives for which they are used, and of the type of sensing and actuation available. These models help sensors select the key information out of all available sensed data. In addition, these models help actuators adjust their performance sub-objectives and even re-group their sub-objectives with the sub-objectives of other actuators, depending on the overall system conditions.
The second major idea is the system integration of the (groups of) components by designing an ICT-based DYMONDS framework for iterative interactions among groups of components. The major challenge is how to provide feedback incentives among the groups of components to aggregate dynamically and decide between their own sub-objectives and the objectives of the entire system. Novel concepts are needed for interactive model-supported sensing and decision making at the various industry layers. This plan should ultimately result in dynamically reconfigurable portions of the system according to their own sub-objectives, and with as much consideration for the objectives of the entire system as possible. The ultimate result will be the automation of JIT and JIP electricity service for well-understood performance in a complex electric power grid or similar network infrastructure. Attempting the above two ideas is far beyond the current state-of-the-art in large-scale dynamic systems. We believe that what is needed, instead, are breakthroughs across modeling, sensing, estimation, control, supporting software, and communications. All of these areas require a fresh look prior to efforts to solve the problem at hand. We briefly explain in the remainder of this section both the current state-of-the-art and the new approach needed for each of these areas. For interaction-variables-based modeling of open access DYMONDS and model-based reconfigurable DYMONDS, see [6].
2.5 Conclusions
In summary, the future electric energy industry will require major re-thinking of its objectives. Today’s industry standards are primarily concerned with the worst-case design and operation. This, in turn, allows the system to be operated in a feed-forward manner while ensuring very reliable service. This is done, however, at the expense of under-utilizing the existing system and by observing emissions as a regulatory constraint. An environment of this type does not have a great need for ICT. Therefore, ICT has been, by and large, viewed as an additional expenditure whose benefits are not obvious. In this chapter we have described the significant potential for enhancing both reliability and efficiency by means of novel ICT which draws on on-line monitoring, estimation and robust software for on-line decision making. Much of the inefficiency is caused by the so-called preventive, rather than corrective, operating mode in today’s industry. Nevertheless, the lack of JIT and JIP actions has not been detrimental, largely due to full knowledge and predictability of conventional energy sources and system demand. As the electric power industry is challenged to pro-actively support deployment of novel environmentally sustainable small resources as well as to utilize the effects of demand adaptation, it is no longer possible to apply today’s deterministic worst-case scenarios as the basis for planning and operation. Instead, much can be gained by prediction and dynamic scheduling for predicted, yet uncertain, conditions. There is an emerging literature documenting both operational problems with high
penetration of the intermittent resources, as well as proposed monitoring and decision making capable of eliminating such problems. Moreover, it has been shown that the system-wide performance objectives, everything else being equal, are greatly improved with such new ICT tools. Unfortunately, a systematic transition is not going to take place, or it will be very slow, without establishing model-based information exchange frameworks in support of meeting novel future objectives at reasonable cost. This, in turn, requires policy makers to come up with policies which clearly define such information exchange requirements and provide incentives for valuing the supporting ICT. This is a fundamental departure from the way the electric power industry has operated for a very long time. Without clear incentives and protocols for supporting future objectives, society is likely to quickly become disillusioned with the current energy and environment dream. Finally, the protocols are more complex than in the static hierarchically organized industry with traditional resources. This will require major collaborations between industry, governments and academia if the concepts are to be put in place in a timely way.
Acknowledgement The author greatly appreciates partial funding of this work by the US National Science Foundation grant number CNS-0428404. The author also gratefully acknowledges partial support by the Delft Chair.
References
1. Kundur, P.: Power System Stability and Control. McGraw-Hill, New York (1993)
2. Ilic, M., Zaborszky, J.: Dynamics and Control of Large Electric Power Systems. Wiley, New York (2000)
3. Building Networks for a Brighter Future, 10–12 November 2008, Rotterdam, The Netherlands. http://www.nginfra.nl/conference2008/ (2008). Accessed 7 Dec 2009
4. European Technology Platform for the Electricity Networks of the Future. http://smartgrids.eu/. Accessed 7 Dec 2009
5. Morgan, M.G., Apt, J., Lave, L.B., Ilic, M.D., Sirbu, M., Peha, J.M.: The Many Meanings of “Smart Grid”. EPP Policy Brief, August 2009, EPP Policy Briefing, Carnegie Mellon University (2009)
6. Ilic, M.: From hierarchical to open access electric power systems. In: Haykin S., Moulines E. (guest eds.) Proceedings of the IEEE, Special Issue on Large-Scale Dynamical Systems, vol. 95, no. 5. (2007)
7. Ilic, M., Black, J.W., Prica, M.: Electric Power Systems Research: Institutional and Technological Drivers for Near-Optimal Performance, Electric Power Systems Research. 77(9): 1160–1177 (July 2007)
8. Marceau, R.J., Malihot, R., Galiana, F.D.: A generalized shell for dynamic security and operations planning. IEEE Trans. Power Syst. 8(3), 1098–1106 (Aug. 1993)
Chapter 3
Dependency on Electricity and Telecommunications
Benoît Robert and Luciano Morabito
Abstract In recent years, much has been written about the risks inherent to interdependencies among Critical Infrastructures (CIs). These infrastructures have become increasingly automated and interlinked, and this linkage between CIs results in a highly complex and dynamic system which increases their vulnerability to cascading and domino effects. Four main types of interdependencies among CIs can be distinguished: physical, cybernetic, geographic and logical. This paper addresses the problem of physical interdependencies. It presents two types of tools for managing physical interdependencies that were developed by the Centre Risque & Performance (CRP) at the École Polytechnique de Montréal in partnership with some of the CIs present in Montréal and Québec City. More specifically, this paper analyzes the physical dependencies of CIs on the telecommunications and electrical systems and shows how it is possible to use these tools to identify, characterize and rank the interdependencies among CIs, understand and anticipate domino effects and plan protection and mitigation measures.
Keywords Canada (Quebec) • critical infrastructure • emergency preparedness • interdependencies
3.1 Introduction
It would be difficult, if not impossible, to imagine how our modern societies could function without essential resources. Such resources as drinking water, natural gas, telecommunications and electricity are necessary for the continuity and maintenance of all of society’s economic activities and for the well-being of the population.
B. Robert (*) and L. Morabito Département de mathématiques et génie industriel, Centre Risque & Performance, École Polytechnique de Montréal, Canada e-mail:
[email protected] Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_3, © Springer Science + Business Media B.V. 2010
These essential resources are supplied by Critical Infrastructures (CIs), which are grouped together in different classes depending on each country’s standards. In Canada, CIs are assigned to 10 classes: energy and utilities, communications and information technology, finance, health care, food, water, transportation, safety, government, and manufacturing [13]. In the United States, there are 13 classes of CIs [10], while in Europe, there are nine [6]. A CI’s mission is to supply an essential resource (or an essential service) that will be used by the public and also by other infrastructures, both critical and noncritical. To supply this resource, the CI uses other resources, which in turn come from other infrastructures [20]. Thus, to provide reliable telecommunications services, the telecommunications system uses electricity, water, natural gas, etc. Conversely, to provide electricity, the electric system uses telecommunications connections, water, etc. Thus, to function at their best, CIs need each other: the resources produced by some are used by others to produce their own resources and vice versa. Consequently, there is a high level of interdependency within these infrastructures. These interdependencies, which arise because of the exchange of resources, are functional (or physical) interdependencies. Other kinds of interdependencies also exist: for example, geographic interdependencies, which are due to the geographic proximity of infrastructures; cybernetic interdependencies, which are due to the exchanges of data on computer systems; and logical interdependencies, which are due to market realities or contextual circumstances [15, 18]. 
Although the adequate functioning of all CIs is based on these interdependencies, the four kinds of interdependencies among CIs form a highly complex system in which the failure of a single element (a single resource) may result in a fairly unpredictable domino effect, triggering harmful impacts on all CIs in a region and depriving thousands of users of the resources they need for their activities [12].
The aim of this article is to present some of the findings of a study of interdependencies that was conducted in Montreal and Quebec City from 2005 to 2008 by the Centre Risque & Performance (CRP) at the École Polytechnique de Montréal. Working with multiple partners from both the public and the private sectors and with the main providers of essential resources (electricity, natural gas, telecommunications, transportation (public transportation, road system) and drinking water) in the cities of Montreal and Quebec (Canada), the Center was able to gather the information needed to study the interdependencies existing between these systems in these two specific areas. The participation of all these partners allowed the Center to develop a methodology for identifying the interdependencies between critical infrastructures and also to validate the results obtained.
More specifically, in this paper, our intent is to analyze the importance of two particular resources (electricity and telecommunications) for the functioning of other CIs and of society in general. By focusing on the use that other CIs make of electricity and telecommunications, the findings presented here provide a better understanding of how the other CIs’ operations depend on these resources and how vulnerable they are to a failure of these systems.
3.2 Electricity and Telecommunications: Two Indispensable Resources
Of all the essential resources, electricity and telecommunications are undoubtedly those on which the functioning of society relies the most. These resources are used everywhere and by everybody; they form the very foundation of our daily activities:
Two of the fundamental critical national infrastructures, upon which all others rely heavily, are power and telecom. Emergency services, banking and finance, agriculture and food, the chemical industry, defense industrial base, public health, and government cannot run effectively without them for any sustained period of time. [14, p. 127]
The ubiquity of these resources and the increasing reliability of the systems providing them mean that we sometimes forget that these resources too can fail. As a result, people often overlook the need to protect themselves from a problem affecting the supply of essential resources. Fortunately, breakdowns of the electricity and telecommunications systems are usually minor and of short duration, and so their impacts tend to be small. Nevertheless, major breakdowns can happen. A few examples should be mentioned to remind readers of the effects that blackouts and telecommunications outages can have on society. In January 1998, an ice storm hit the province of Quebec, particularly the area south of Montreal. Under the weight of more than 100 mm of ice, hundreds of transmission towers collapsed, depriving thousands of people of electric power. At the height of the Ice Storm crisis, more than one million people had no electricity, in winter, when the need for electricity is greatest. In all, this event caused the deaths of 28 people and approximately $790 million of damage to real property, vehicles and personal property [8, 17, 24]. On August 14, 2003, the most widespread blackout in the history of North America affected the central and northeast USA, as well as part of Ontario, in Canada. The blackout affected some 50 million people and resulted in the loss of 61,800 MW of electricity. Eight US states and the province of Ontario were affected. In some areas of the USA, it took about four days for the power to be restored. Some parts of Ontario underwent consecutive outages for more than a week before the situation returned to normal. Altogether, the blackout cost between US$4 and US$10 billion in the USA. In Canada, there was a net loss of 18.9 million hours of work. The shipping of manufactured goods declined by US$2.3 billion during this period [16, 25]. 
In September 2003, a similar blackout deprived 55 million people in Italy of power for approximately 3 h, resulting in four deaths [26]. More recently, in November 2006, a blackout hit Western Europe (Germany, France, Italy, Spain, Portugal, Belgium, Croatia, Austria, Slovenia and Luxembourg, but also, to a lesser degree, the Netherlands and Switzerland), with the result that close to 15 million users were without power for almost 2 h [2]. The telecommunications system is no stranger to major failures either. In January 2008, in Asia and the Middle East, two undersea cables were cut, reducing the capacity of telecommunications systems by 75% and depriving millions of
users of service. It took about 15 days for service to be restored [5]. In December 2006, an earthquake in Taiwan damaged six major telecommunications cables. The damaged connections were responsible for communications between China, South Korea, Japan and the United States. It was several weeks before the system was repaired [4]. In September 2004, a technical problem affected the telecommunications service of the provider France Telecom. In this case, it took over 24 h to solve the problem [1, 3]. In April 2007, technical problems brought down the BlackBerry service in the USA for more than 3 h, while in February 2008, another BlackBerry breakdown again deprived North American users (about eight million people) of service for more than 10 h [9]. As these examples clearly show, a large-scale breakdown of electricity and/or telecommunications services can definitely occur. To prepare for such an event, it is important to understand how much our societies depend on these resources and what we must do to reduce our vulnerability to breakdowns. A society that is less vulnerable to such breakdowns will be more resilient; as a result, the associated consequences will be reduced.
3.3 Approach for Identifying Interdependencies Among Critical Infrastructures
The complexity of systems means that failures can occur for many reasons. They may be due to natural causes, such as hurricanes, earthquakes, etc., but they may also be caused by technical problems, human error, or malicious intent. Thus, when analyzing the dependency of a society, an organization or an infrastructure on a given resource, one should not focus on understanding the potential causes of a failure but instead concentrate on its consequences. In the case of interdependencies among systems, the key issue is therefore not to understand why a system failed (i.e., is no longer able to supply a resource) but rather to analyze what the consequences of the unavailability of the resource supplied by this system will be for users.
From this point of view, the most appropriate approach for assessing dependencies among systems is the consequence-based approach. Based on the study of the customer/supplier relationships that exist among the various systems, it is possible to find out which systems use which resources and where. Then, the use made of these resources is analyzed in order to determine the impact on a user of the unavailability of a given resource [22]. This impact is expressed as a function of the system’s capacity to supply its own resource and takes account of two very important parameters: time and geographic space [19–21].
The methodology to identify physical interdependencies among CIs is composed of five main steps:
1. Creation of a cooperative space composed of the managers of the CIs participating in the study
2. Determination of the geographic area where the study of interdependencies will take place
3. Characterization of CIs in order to understand their mission (resource provided by the system to the society) and the functions they must perform in order to realize their mission
4. Analysis of the resources needed by each system (customer/supplier) and determination of the consequences to users of the non-availability of a resource
5. Creation of tools allowing the management of physical interdependencies among CIs (dependency curves and flexible cartography)
The details of these steps can be obtained by consulting the methodological guide produced by the CRP [21].
3.4 Dependency Curves and the Flexible Cartography Approach
To determine changes in the impact on a system’s mission as a function of time, the tools used are dependency curves (Fig. 3.1). Based on a four-color code (green, yellow, orange and red), these curves make it possible to find out the operating status of a system (of its mission) as a function of the length of time during which one of the resources it uses is no longer available [20]. The consequences are assessed over 72 h. This period was chosen based on the 72-h autonomy period during which an organization or a person ought to be able to ensure self-sufficiency:
The objective of 72 h of self-sufficiency constitutes a North American standard recognized by first responders (firemen, police officers and ambulance technicians), by all levels of government and by non-governmental aid organizations. These 72 h correspond to the time needed to organize assistance [23, p. 3; our translation].
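To make the idea concrete, a dependency curve can be sketched as a step function that maps the number of hours since a resource was lost to one of the four indicator colors. The sketch below is only an illustration: the class name `DependencyCurve`, its methods and the transition hours are assumptions made for this example and are not taken from the CRP tools.

```python
from bisect import bisect_right

# Four-color code used by the dependency curves, from least to most severe.
LEVELS = ("green", "yellow", "orange", "red")
HORIZON_H = 72  # assessment window used in the study


class DependencyCurve:
    """Step function: hours since a resource was lost -> indicator color."""

    def __init__(self, system, resource, transitions):
        # transitions: (hour, level) pairs; a level applies from that hour on.
        # All values passed in this example are hypothetical, not study data.
        self.system = system
        self.resource = resource
        self.transitions = sorted(transitions)
        if not self.transitions or self.transitions[0][0] != 0:
            raise ValueError("curve must define a level at hour 0")
        for _, level in self.transitions:
            if level not in LEVELS:
                raise ValueError(f"unknown level: {level}")
        self._hours = [hour for hour, _ in self.transitions]

    def status_at(self, hours):
        """Indicator color after `hours` without the resource."""
        if not 0 <= hours <= HORIZON_H:
            raise ValueError("outside the 72-h assessment window")
        idx = bisect_right(self._hours, hours) - 1
        return self.transitions[idx][1]

    def first_hour_at(self, level):
        """Earliest hour at which the curve reaches `level` or worse, else None."""
        threshold = LEVELS.index(level)
        for hour, lvl in self.transitions:
            if LEVELS.index(lvl) >= threshold:
                return hour
        return None


# Hypothetical curve for a system that loses one of its resources.
curve = DependencyCurve(
    "system A", "resource B",
    [(0, "green"), (2, "yellow"), (6, "orange"), (8, "red")],
)
print(curve.status_at(5))          # yellow
print(curve.first_hour_at("red"))  # 8
```

Storing only the transition hours mirrors the way the curves are shared between partners: the outcome of each system's own analysis is exchanged, not the underlying operational detail.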
The meaning of the indicators in the dependency curves is given in Table 3.1.
Fig. 3.1 Example of a dependency curve for a system in relation to a resource it uses
Table 3.1 Indicators of CIs’ level of functioning
Indicator: Description
Green: The system is functioning normally with the resources it uses on an ongoing basis or without the contribution of one or more of its regular resources.
Yellow: The system is affected by the degradation of a resource in one or more of its infrastructures. The alternative measures or resources it has implemented to offset the degradation of the resource are sufficient and the system’s mission is not considered to be endangered in the next 72 h.
Orange: The system is affected by the degradation of a resource in one or more of its infrastructures. The palliative measures and alternative resources put in place to offset the degradation of the resource are not sufficient and the system’s mission is considered to be endangered in the next 72 h.
Red: The system’s mission has been affected in one location in the study area. An infrastructure belonging to another CI has been deprived of this resource.
To determine the changes in impacts in a geographic area, the tools used are flexible cartographic representations. These cartographic representations make it possible to find out which geographic areas are affected by a breakdown and to anticipate future domino effects. Concretely, it is important to anticipate the impacts on the various systems’ missions of the failure of a resource in a particular geographic area. These areas are defined on the basis of system operations. For example, in the case of the water supply system, we consider the areas supplied by reservoirs; for the electricity system, we consider the areas supplied by transformer stations; for telecommunications, we consider the areas supplied by telephone exchanges; in the case of the public transit system, we consider the areas supplied by a particular transportation center (bus terminus, section of subway line, etc.); and so on (Fig. 3.2).
This kind of cartography is called “flexible” because it offers enough flexibility to address the issue of interdependencies among systems without forcing the systems to reveal confidential information such as the exact location of infrastructures. Thus, attention is paid only to the impact on the systems’ mission of the unavailability of a resource in a given area (supply area) and no attempt is made to identify what is located in the area in question [20].
Figure 3.3 shows how flexible cartography and dependency curves work together. In the event of a water outage in the Water-11 sector, it is possible to anticipate a telecommunications outage in the Telecom-3 sector approximately 8 h later, even though we have no information on what telecommunications system infrastructures are located in the Water-11 supply area.
To reach these kinds of conclusions, multidisciplinary work is necessary. Each system must receive information on the other systems’ supply areas and analyze the impact on its own functioning of a resource outage in each of these areas. Only the outcome of these analyses is transmitted to the partners in the study, in the form of dependency curves. Domino effects must then be studied in order to implement protection and communication mechanisms to reduce the risk of such effects [19].
When used jointly, dependency curves and flexible cartographic representations give users a faithful representation of the impacts over time of a given event (represented by a breakdown in a resource) and the domino effects likely to result.
Fig. 3.2 Supply areas for some systems in Montreal
Fig. 3.3 Joint use of flexible cartography and dependency curves
They bring together the essential information that will make it possible to manage a problematic situation without actually mirroring the complexity of the systems and the databases that contain all the information on these systems, which must be updated continuously. As well, because they make use of information that has already been processed by the system managers themselves, these tools enable users to avoid drawing erroneous conclusions concerning the interpretation of raw data [20, 21]. Ultimately, an early warning system could potentially be developed to manage interdependencies among CIs [7, 19, 20]. A 4-year project (2008 to 2012) to create such a system is now under way.
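The joint use of supply areas and dependency curves to anticipate domino effects can be sketched as an earliest-arrival search over supply areas. This is a simplified illustration, not the CRP implementation: it assumes that each link carries a single delay (the hour at which the dependent area's mission fails) and that delays along a chain simply add up. Only the Water-11 to Telecom-3 delay of about 8 h comes from the text; the other areas and delays are invented.

```python
import heapq

# Hypothetical shared data: supplier area -> [(dependent area, delay in hours)].
# Only the Water-11 -> Telecom-3 link (about 8 h) is taken from the text.
LINKS = {
    "Water-11": [("Telecom-3", 8)],
    "Telecom-3": [("Transport-2", 1), ("Gas-5", 30)],
    "Transport-2": [],
    "Gas-5": [],
}


def anticipate_failures(initial_area, links, horizon_h=72):
    """Return {area: earliest failure hour} for areas failing within the horizon."""
    earliest = {initial_area: 0}
    queue = [(0, initial_area)]
    while queue:
        t, area = heapq.heappop(queue)
        if t > earliest.get(area, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for dependent, delay in links.get(area, []):
            t_fail = t + delay
            if t_fail <= horizon_h and t_fail < earliest.get(dependent, float("inf")):
                earliest[dependent] = t_fail
                heapq.heappush(queue, (t_fail, dependent))
    return earliest


timeline = anticipate_failures("Water-11", LINKS)
for area, hour in sorted(timeline.items(), key=lambda item: item[1]):
    print(f"{area}: affected at about {hour} h")
```

Note that the function never needs to know what is physically located in an area; it works only with area labels and delays, which is the property that makes the cartography "flexible".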
3.5 Dependency on Electricity
The all but universal use of electricity means that, when power outages occur, they always have impacts.
Modern society has come to depend on reliable electricity as an essential resource for national security; health and welfare; communications; finance; transportation; food and water supply; heating, cooling, and lighting; computers and electronics; commercial enterprise; and even entertainment and leisure – in short, nearly all aspects of modern life [25, p. 5].
Whether minor or major, the impacts on users of a power outage vary depending on a multitude of factors. In fact, although electricity is used everywhere, and in a more or less similar way, some users are more vulnerable to a power outage than others. Several parameters explain an organization’s vulnerability with regard to the resource “electricity.” In general, the greater the impacts of a power outage on an organization, the more vulnerable it is to such an outage. Among other things, these impacts depend on the use the organization makes of electricity, the context (time of day or time of year when the outage occurs, etc.), the duration and extent of the outage, the protective measures and management techniques that the organization has put in place (alternative resources) to protect itself from a power outage, and also the importance of the mission or service provided by this organization for its users: some organizations are better able than others to cease activities for some time without causing serious disruptions. Since the context, duration and extent of a power outage are not factors on which an organization can act directly, the use of alternative resources and the implementation of effective management methods are certainly the most important factors that the organization must focus on to reduce its vulnerability to a power outage. By putting alternative resources in place, an organization can increase its autonomy from the electric system to which it is connected. Thus, for example, if two organizations make similar use of the resource, the organization that owns a generator will definitely be less vulnerable to a power outage than the one that does not. As a result, the first organization will also have greater autonomy than the second in the event of an outage. 
Nevertheless, the use of alternative resources (or alternative management methods) in the event of a power outage should not give an organization the impression that it is entirely protected from power outages. As a matter of fact, the work we have done in Montreal and Quebec City revealed that most infrastructures are ready to handle situations when the power is cut and are actually designed to function for a certain amount of time during outages. Most systems have battery systems and/or generators, which give them some autonomy from the electric system. When power outages occur, these systems allow the organizations to keep their systems’ critical functions going and thus ensure their business continuity. Thus, if we ask them “Are you ready to cope with a power outage?” the answer is positive: “Yes, we have our generators.” The answer is just as positive if we ask them how long they can handle a power outage: “Indefinitely, as long as our generators keep working.”
Fig. 3.4 Dependency curves for an electricity outage in downtown Montreal
Thus, if we analyze the dependency curves for the electricity resource in downtown Montreal, we can see that the systems would suffer very few impacts for the first 72 h because of their generators (the dependency curves change from green to yellow) (Fig. 3.4).
Most of the systems’ generators operate on diesel (some use natural gas). In a power outage, then, a new dependency would be created on this alternative resource. A report on the volumes of diesel that the generators of the CIs participating in the study need in order to function makes it obvious that there would be a supply problem in the event of a prolonged electricity outage (Fig. 3.5). As one can see, in downtown Montreal, the cumulative need for diesel by the infrastructures’ generators is approximately 5,300 L/h, or over 127,000 L/day. And it should not be forgotten that these figures represent only the needs of the generators of the partners involved in the study. They do not take into consideration all the generators in downtown Montreal, nor the ongoing diesel needs of vehicles, machinery, etc. A shortage of diesel can therefore be anticipated in the event of a prolonged power outage. This is especially likely, given that the diesel supply systems themselves are powered by electricity (in fact, this exact situation was experienced during the 1998 ice storm).
Fig. 3.5 Diesel consumption by generators in downtown Montreal
Fig. 3.6 Petroleum supply and demand curves
Bearing in mind the refining capacity of the petroleum industry in the region and the systems’ cumulative needs, it is therefore possible to predict that, after these generators had been operating for a certain time, there could be a shortage of diesel. Figure 3.6 illustrates this situation (for reasons of confidentiality, the values presented in this graph are not real ones). As of the time when the production of petroleum products becomes zero (at time T0), the reserve quantity (the total amount of petroleum available less the total amount of petroleum consumed) starts to decline. At time T1, there is no longer enough petroleum to meet all needs. At time T2, there is no longer enough petroleum to keep all the generators operating (Fig. 3.6).
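The arithmetic behind the cumulative diesel need, and the way a fixed reserve shrinks once production stops, can be illustrated with a short calculation. Only the 5,300 L/h figure comes from the study; the reserve volume and the additional demand from other users are hypothetical, since the real values are confidential, and demand is assumed constant over time.

```python
# Cumulative generator demand reported for the study partners (L/h).
GENERATOR_DEMAND_L_PER_H = 5_300


def daily_demand(l_per_h):
    """Convert an hourly consumption rate into litres per day."""
    return l_per_h * 24


def hours_of_autonomy(reserve_l, demand_l_per_h):
    """Hours a fixed reserve lasts under constant demand after production stops."""
    if demand_l_per_h <= 0:
        raise ValueError("demand must be positive")
    return reserve_l / demand_l_per_h


print(daily_demand(GENERATOR_DEMAND_L_PER_H))  # 127200

# Hypothetical values, chosen only to show the effect of cumulative needs.
reserve_l = 1_000_000          # assumed regional diesel reserve (L)
other_demand_l_per_h = 4_700   # assumed vehicles, machinery, other generators

generators_only = hours_of_autonomy(reserve_l, GENERATOR_DEMAND_L_PER_H)
all_users = hours_of_autonomy(
    reserve_l, GENERATOR_DEMAND_L_PER_H + other_demand_l_per_h
)
print(round(generators_only, 1))  # 188.7
print(round(all_users, 1))        # 100.0
```

Even in this toy setting, adding the needs of users outside the study roughly halves the time before the reserve is exhausted, which is why individual continuity plans can look sound while the collective plan is not.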
Fig. 3.7 Domino effects in the event of an electricity outage followed by a diesel shortage
Thus, when we ask system managers the question “What would happen if there was a diesel shortage during a power outage?” the answer for many of them is far less encouraging. Under this assumption, the effect on the transportation system would be almost instantaneous (2 h) and would be reflected in the closure of a major highway corridor. As for the telecommunications system, a total service breakdown would occur after only a few hours, depending on the amount of fuel already inside the generators’ tanks at the time of the shortage (Fig. 3.7).
In reality, as the curves in Fig. 3.7 show, only the water and natural gas systems would be able to continue operating if there was a diesel shortage. This is explained by the fact that Montreal’s drinking water system is gravity-based. Water is pumped from pumping stations up to the top of Mount Royal, where huge reservoirs supply the downtown area by gravity. The water in these reservoirs is sufficient to supply the city for about 8 h, depending on demand. Moreover, the water system is equipped with a generator that functions on natural gas, which enables all of the plant’s critical systems to operate without electricity. This generator was commissioned after the ice storm, when the city came within a couple of hours of running out of water. The natural gas system also has generators that run on natural gas, which would enable it to operate autonomously without electricity. That explains why these two systems would retain a significant amount of autonomy if there was a power outage and a diesel shortage.
Thus, even though the CIs have business continuity plans to deal with power outages, it is important to take cumulative needs into account to determine the long-term viability of these plans. Obviously, lengthy outages are quite rare, but they are not impossible. For that reason, it is important for the managers of these infrastructures to create backup systems that will be able to operate for long periods and also to
contemplate systems that do not all require the same resources to operate. There are many kinds of generators that operate using a variety of fuels. Users need to be sure of making a choice that will give them the maximum autonomy when necessary [14]. Moreover, it would be desirable for CI managers to meet so they can create appropriate plans that will take the systems’ cumulative needs into consideration. These plans will also make it possible to give priority to supplying diesel or restoring electrical service to certain specific infrastructures that are particularly likely to create undesirable domino effects if they cease to function. A similar study could also make it possible to develop integrated plans that will establish priorities for fuel supply for CIs in the event of a prolonged outage and a fuel shortage. In this regard, it is important to note that priorities for fuel supply in the event of a shortage are often determined based on the category to which an infrastructure belongs. In Quebec, for example, four levels of priority for recovery have been established [11]. In the event of a fuel shortage, infrastructures with priority 1 are supplied first, while those with priority 4 are supplied last. For example, hospitals and telephone exchanges are priority 1 infrastructures; office towers and industrial buildings are priority 4 infrastructures. However, these priority levels do not take account of the context of the outage, the autonomy of the systems, their vulnerability to a power outage or the domino effects that could occur following the interruption of a particular infrastructure’s operations. In times of need, allocating diesel to the various infrastructures based on priority level alone, without taking these elements into consideration, would mean that bad decisions would be made, thereby aggravating the situation. 
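The difference between allocating fuel by category alone and allocating it with autonomy and domino effects in mind can be sketched by ranking the same sites in two ways. Everything in this example is hypothetical: the `Site` fields, the autonomy values and the urgency rule (shortest autonomy first, then most dependents) are assumptions chosen for illustration, not an official allocation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    category_priority: int  # 1 = supplied first ... 4 = supplied last
    autonomy_h: float       # hours the site can run without resupply
    dependents: int         # other systems disrupted if this site stops


# Hypothetical sites and values, for illustration only.
SITES = [
    Site("water pumping station", 1, 96, 3),
    Site("hospital", 1, 24, 0),
    Site("office tower with control center", 4, 4, 5),
    Site("industrial building", 4, 48, 0),
]


def by_category(sites):
    """Order used when only the category priority level is considered."""
    return sorted(sites, key=lambda s: s.category_priority)


def by_urgency(sites):
    """Assumed alternative: shortest autonomy first, then most dependents."""
    return sorted(
        sites,
        key=lambda s: (s.autonomy_h, -s.dependents, s.category_priority),
    )


print([s.name for s in by_category(SITES)])
print([s.name for s in by_urgency(SITES)])
```

With these made-up values, the category-only ranking serves the site with 96 h of autonomy first and the control center with 4 h of autonomy third, whereas the urgency ranking reverses that order. A real plan would need criteria agreed jointly by the CI managers.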
For example, certain infrastructures in downtown Montreal have a priority level of 1 but enjoy considerable autonomy (more than 72 h). The water pumping stations are one notable example. Other infrastructures are given a priority level of 4, whereas they have very limited, or even no, autonomy. This is the case of certain office towers that house some of the systems’ call centers and even major control centers that need to be kept operating so that the situation can be monitored and a speedy recovery ensured. These elements need to be taken into consideration at the time when diesel supply priorities are determined but also when determining priorities for restoration of electric power. Thus, the study of domino effects and the vulnerability of infrastructures to power failures makes it possible to jointly determine the priorities for service restoration, based on an analysis of systems’ individual needs and domino effects. Then, when the time comes, good decisions can be made that will favor the collective operation of the systems and a return to normality. As well as being used regularly by organizations, electricity can also be an alternative resource in some cases. For example, many organizations use natural gas for heating in winter. If there is an outage of natural gas, most of these organizations have identified backup electrical heating systems as alternative resources. Evidently, these systems require electricity. As well, some organizations normally use water to cool down critical equipment. In the telecommunications system, for example, certain infrastructures absolutely depend on water for cooling. In the event of a water outage, this system has recourse to air-conditioning systems operating on
Fig. 3.8 Domino effects in the event of an outage of water and electricity
electricity. Thus, in the event of a water shortage, it is important for these infrastructures to continue to be supplied with electricity. A breakdown involving water and electricity simultaneously could potentially result in a loss of telecommunications services after only a few hours (Fig. 3.8). This figure also reveals that a water outage would affect the transportation system. That is because, in Montreal, a problem with the water supply would affect a tunnel: the regulations stipulate that the tunnel must be closed if fire protection systems do not work because users’ safety could not then be guaranteed if there was an accident.
3.6 Dependency on Telecommunications
The functioning of modern society relies to a great extent on information and on the capacity of organizations to communicate with each other fast and efficiently. Thanks to the Internet and the numerous new information technologies, it is possible to speedily access vital information, enabling economic activities to be carried out by all actors in society. Many of today’s businesses are large multinationals that operate more and more on a just-in-time basis with their various suppliers and subcontractors and therefore need information fast. This information is essential for the operation of their systems, for monitoring, for the production and transport of essential resources, for engaging in the phenomenal number of financial transactions that are carried out every second, etc. Consequently, a failure of the telecommunications system could very quickly affect organizations’ functioning.
Fig. 3.9 Domino effects in the event of a telecommunications outage in downtown Montreal
One of the conclusions we drew from our work in Montreal and Quebec City was that telecommunications are mainly used by the other systems for management and financial activities. In fact, many of the systems we studied possess their own internal telecommunications networks for monitoring their systems. However, there were some exceptions to this rule: emergency services would be directly affected by a telecommunications breakdown. As well, in Montreal, the existence of a tunnel means that a breakdown in telecommunications would almost immediately entail the closure of that tunnel, which is a very important highway corridor. Government policies stipulate that it would be impossible to operate this tunnel if a telecommunications problem affected its operators’ ability to alert emergency services (an accident in a tunnel can have disastrous human consequences). The same is true of the public transit systems (trains and metro) and the banking systems. Thus, all these systems are highly vulnerable to a telecommunications breakdown (Fig. 3.9). It should be noted that the curves in Fig. 3.9 were obtained by considering the various systems’ dependency on the telecommunications system in the course of their normal operations. But it is important to remember that the telecommunications system would play a crucial role in emergency management and recovery after a disaster. Thus, as soon as such an event occurred, the first system to be called upon would be the telecommunications system. First responders use the communications system to exchange information; the general public does too, so demand for this service would be heavy. Therefore, in a disaster or a case of failure, the importance of the telecommunications system becomes key. All the systems that participated in the study have call
Fig. 3.10 Domino effects in the event of a telecommunications outage in downtown Montreal (assuming the systems were already operating in degraded mode at the time of the outage)
centers that allow users to report anomalies, breakdowns or hazardous situations. In the absence of telecommunications, these call centers would become inoperative. And unlike a power outage, where most of the systems would be relatively well served by their generators, they are not very well equipped to handle a telecommunications breakdown. Deprived of telecommunications, it would be much more difficult for the systems to identify and locate outages, since they would lose all contact with their customers. Under these circumstances, the various systems’ autonomy in the event of a telecommunications breakdown during an emergency would be greatly reduced (Fig. 3.10). In addition, in Montreal, bearing in mind the impacts that a telecommunications breakdown could have on the road transportation and public transit systems, major traffic jams can be anticipated. It would therefore be harder for the systems to dispatch teams on the ground, so recovery of the systems would take much longer than normal. The result would be a crisis in which impacts continued to amplify each other and it became increasingly difficult to contain the domino effects. A telecommunications outage at any time could have serious consequences for the overall functioning of society, but in a disaster, when it became necessary to intervene and restore systems, the consequences could be catastrophic. In Quebec City, an interesting instance of vulnerability related to telecommunications was identified during our study of interdependencies. Unlike the city of Montreal, the city of Quebec (in the sense of an administration rather than a geographic area) possesses its own internal telecommunications system. A priori, then, it would seem that the city of Quebec would not be affected by a problem with
B. Robert and L. Morabito
Fig. 3.11 Domino effects in the event of a telecommunications outage in Quebec City
the private telecommunications systems. However, a technical study revealed critical points regarding the connections between the city’s internal system and the private external systems. Thus, if any problems affected these critical points, the city of Quebec would find itself completely isolated from the rest of the world, in terms of telecommunications (Fig. 3.11). It may be unnecessary to mention that the system managers acted speedily to correct this weakness! This kind of situation occurs much more frequently than one might think. Even if organization X possesses its own internal telecommunications system or does business with telecommunications company Y, that does not mean that it will not suffer consequences if there is a telecommunications outage at supplier Z. Often, infrastructures are leased and shared among suppliers. Thus, supplier Y may be leasing bandwidth from supplier Z so it can offer its service to organization X. Consequently, a failure affecting supplier Z will entail a failure at supplier Y and thus affect organization X even if there is no customer/supplier relationship between organization X and supplier Z. All of these domino effects are hard to predict because the impacts on one system of the failure of another system are mediated by yet other systems. The study of interdependencies and the analysis of potential domino effects make it possible to identify such effects and consider the related problems. These analyses represent a complex task that needs to be multidisciplinary and absolutely must involve managers from all the CIs so that specific cases can be identified. A society’s resilience is based to a great extent on the reliability of its essential systems, which is why it is important to analyze the dependencies and interdependencies among these systems.
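The X/Y/Z situation described above can be sketched as a search through a supplier graph. This is only a minimal illustration under stated assumptions: the `DIRECT_SUPPLIERS` map and the function name `upstream_dependencies` are hypothetical and are not part of the tools developed in the study.

```python
from collections import deque

# Hypothetical direct-supplier map: each key relies on the listed suppliers.
# The names mirror the X/Y/Z example in the text and are purely illustrative.
DIRECT_SUPPLIERS = {
    "organization X": ["supplier Y"],
    "supplier Y": ["supplier Z"],  # Y leases bandwidth from Z
    "supplier Z": [],
}


def upstream_dependencies(entity, direct_suppliers):
    """Return every supplier that `entity` depends on, directly or indirectly."""
    seen = set()
    queue = deque(direct_suppliers.get(entity, []))
    while queue:
        supplier = queue.popleft()
        if supplier in seen:
            continue  # already visited; also protects against circular leases
        seen.add(supplier)
        queue.extend(direct_suppliers.get(supplier, []))
    return seen


# Suppliers that organization X relies on without any direct contract.
hidden = upstream_dependencies("organization X", DIRECT_SUPPLIERS) - set(
    DIRECT_SUPPLIERS["organization X"]
)
print(sorted(hidden))
```

In this toy map the only hidden dependency is supplier Z, which is exactly the kind of link that remains invisible when each organization looks only at its own contracts.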
3.7 Application and Validation of the Methodology
The approach and tools presented in this paper are used to anticipate the domino effects between CIs and to plan measures to avoid or mitigate the consequences of these phenomena. We therefore cannot really validate these tools by analyzing their application to past situations. It would be too easy to say that, had these tools been used, events such as the ice storm of 1998 might have had less severe consequences. However, several unexpected developments during the ice storm could have been anticipated if similar tools had existed then. For example, the problems of water supply and the shortages of gasoline could have been foreseen before they arose.
The real advantage of these tools lies in the planning phase: they allow CIs to get to know each other and to better target their interdependencies, especially those that may lead to adverse domino effects. Knowledge of these interdependencies then makes it possible to identify measures that mitigate the consequences of these domino effects. For example, in Montreal, critical interdependencies were identified between the water and telecommunications systems (water used for cooling equipment), between the water and transport systems (water used for fire protection in a major road infrastructure), between the telecommunications and transport systems (SCADA systems), and between the telecommunications and natural gas systems (SCADA systems). All these cases have led to specific studies that helped put in place measures aimed at reducing the vulnerability of these critical links. For instance, the maintenance of certain valves in the water system has been changed, and the intervention procedures in the event of a natural gas leak near electrical substations have been revised.
These actions were made possible by information from the study of interdependencies and the tools developed. In this way, the tools can help retain organizational knowledge by integrating considerations on interdependencies directly into the processes of CIs.
The approach was also validated in Quebec City. There again, the approach and tools helped identify critical interdependencies and put in place different measures to mitigate them. One event in Quebec City illustrates the use of the tools. In September 2008 a major fire occurred near a telephone exchange. Before asking for the power supply to be cut, firefighters first asked how long the telecommunications system could operate without electricity, to ensure that the outage would not cause additional problems in the telecommunications service. This example illustrates that these tools can help in taking the right decisions when they are needed.
It should also be noted that, in the context of our work, we have considered only the dependence on the fixed telephone system. Future work will have to integrate other infrastructure systems such as the mobile telephone system, television and radio communications and computer systems (Internet) in order to establish the dependence of the systems on the information (data) that transits through these communications systems (cybernetic dependencies).
3.8 Conclusions
There is no need to demonstrate that society depends on essential resources. However, the various systems and infrastructures do not all have the same vulnerability to an electric or telecommunications outage. Thus, the impacts of such an outage will not be the same for all users. It is important for users that are very vulnerable to a breakdown of these resources to be able to implement protective measures that will make them less vulnerable and enable them to keep their primary functions operating. Nevertheless, reliance on an alternative resource, such as a generator, cannot be considered a valid reason for not asking the question “What would happen if there were a shortage of diesel or, more simply, if the generator didn’t work properly?” As we have illustrated with the examples identified in the study we carried out in Montreal and Quebec City, the cumulative fuel needs of such generators are very high, which means that these systems could soon become inoperative if there were a prolonged blackout. Systems must therefore think beyond such initial management measures and collectively consider the viability of these alternative measures.
When the consequences of an interruption of service would be major, it is essential to provide for other protective measures. For example, one might acquire generators that operate on different kinds of fuel (e.g., natural gas) or, as is common in telecommunications systems, set up battery systems to maintain the organization’s critical activities. It might also turn out that the impacts of an interruption of service would be minimal. In that case, simply interrupting service could also constitute a measure that would promote recovery.
To make these kinds of decisions, organizations need to know each other. An organization needs to know itself and know how much it depends on other organizations, but it also has to be familiar with other organizations and know how they depend on it.
To do this, interdependent organizations must exchange relevant information so they can set up joint action plans to deal with a prolonged lack of a resource, whether it be electricity or any other resource. This is necessary to ensure the business continuity of all actors involved in the supply of services that are essential not only to the population but also to their own functioning. This in turn requires an analysis of interdependencies and domino effects. The analysis should be conducted by a leading, neutral body such as a government. Neutrality is important because of the nature and the confidentiality of the information exchanged. The role of this body is to gather from the different systems all the information needed to evaluate the interdependencies. In Montreal and Quebec, this role was assumed by the Centre risque & performance, a university research center.
In such a study, communication is the key factor of success. However, a simple and systematic approach must be used. Many approaches may be valid for such a study. In our case, we have developed an efficient consequences-based approach that is general enough to be applied in any geographic area with minor adaptation. A step-by-step guide for applying this methodology is available [21].
Once interdependencies are analyzed and their possible impacts identified, organizations can implement local protective and preventive measures in order to decrease their vulnerability to failures that may occur in other systems, to increase their systems’ capacity to respond when needed, and to limit the impacts of certain failures. Measures can also be taken at a more global level by all the systems, the first responders, the emergency managers and the governments in order to decrease our society’s vulnerability to the loss of these resources and increase its resilience, thereby favoring a speedy return to normal after a breakdown.
Acknowledgement
The results presented in this paper are the outcome of work by the Centre risque & performance (CRP) at École Polytechnique de Montréal. They were developed in the City of Montréal and tested and validated in the city of Québec. The CRP’s research project benefited from funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and from Public Safety Canada (PSC) within the Joint Infrastructure Interdependencies Research Program (JIIRP), and from the active participation of the following partners from the key public and private organizations in Montréal:
• Bell Canada
• GazMétro
• Hydro-Québec
• Ministère de la sécurité publique du Québec
• Ministère des transports du Québec
• Public Security and Emergency Preparedness Canada
• Tecsult
• City of Montréal (Water and Sewer System, Emergency Preparedness Centre)
• City of Québec (Water and Sewer System, Transportation System, Telecommunications System, Emergency Preparedness Office)
The contributions of all of these partners, both technical and financial, allowed the center to collect the data we needed to develop concrete tools for managing risk including those presented in this article.
References
1. Chicheportiche, O.: Méga panne chez France Télécom: la faute à la VoIP? Silicon.fr, November 2 (2004)
2. Commission de régulation de l’énergie: Rapport d’enquête de la Commission de régulation de l’énergie sur la panne d’électricité du samedi 4 Novembre 2006, Commission de régulation de l’énergie (CRE), February 2007. http://www.see.asso.fr/bulletin/actu/2007/pdf/cre-rapport-8nov.pdf (2007). Accessed 12 Jan 2009
3. Comparatel: L’explication de la panne de ce week-end chez France Telecom, la VoIP. L’actualité des télécoms, November 3 (2004)
4. Crochet-Damais, A.: Panne Internet en Asie: l’heure du bilan. Le Journal du Net, December 30 (2006)
5. Drothier, Y.: Une panne Internet frappe l’Asie et le Moyen-Orient. Le Journal du Net, January 31 (2008)
6. EU: Protection des infrastructures critiques dans le cadre de la lutte contre le terrorisme, Communication de la Commission au Conseil et au Parlement Européen, June 2004. http://europa.eu/scadplus/leg/fr/lvb/l33259.htm (2004). Accessed 4 June 2008
7. Guichardet, G.: Early warning system and interdependencies. Master’s thesis, École Polytechnique de Montréal (April 2009)
8. Lecomte, E.L., Pang, A.W., Russell, J.W.: La tempête de verglas de 1998, Institut de prévention des sinistres catastrophiques (IPSC). http://www.iclr.org/pdf/ice%20storm%20report%20french.pdf (1998). Accessed 14 Jan 2009
9. Leduc, C.: Panne de BlackBerry: la faute à une mise à jour. Branchez-vous, February 13 (2008)
10. Moteff, J., Copeland, C., Fisher, J.: Critical infrastructures: what makes an infrastructure critical? Report for Congress, The Library of Congress, January 29, 2003. http://www.fas.org/irp/crs/RL31556.pdf (2003). Accessed 4 June 2008
11. MRNFP: Activité pétrole et produits pétroliers: priorités de rétablissement de distribution en situation d’urgence, Ministère des ressources naturelles, de la faune et des parcs du Québec, Direction du développement des hydrocarbures (Feb 2004)
12. OCIPEP: Les menaces aux infrastructures essentielles canadiennes, Office of Critical Infrastructure Protection and Emergency Preparedness, Research and Development Directorate, March 12, 2003. http://www.infrastructure.gc.ca/research-recherche/result/alt_formats/pdf/ocipep_f.pdf (2003a). Accessed 4 June 2008
13. OCIPEP: Infrastructures essentielles nationales, Office of Critical Infrastructure Protection and Emergency Preparedness, Research and Development Directorate, December 2003. http://www.infrastructure.gc.ca/research-recherche/result/alt_formats/pdf/ocipep_f.pdf (2003b). Accessed 4 June 2008
14. O’Reilly, G.P., Kelvin Chu, C.-H.: Optimal deployment of power reserves across telecom critical infrastructures. Bell Labs Tech. J. 12(4), 127–142 (2008)
15. Peerenboom, J.P., Fisher R.E., Rinaldi, S.M., Terrence, K.K.: Studying the chain reaction. Electric Perspectives, January/February (2002)
16. PSEPC: Panne d’électricité en Ontario et aux États-Unis – Impacts sur les infrastructures essentielles, Public Security and Emergency Preparedness Canada, Document IA06-002, August 2006. http://www.securitepublique.gc.ca/prg/em/_fl/ont-us-power-f.pdf (2006). Accessed 14 Jan 2009
17. Québec (Province), Nicolet, R.: Facing the unforeseeable: Lessons from the ice storm of ‘98: report of the Commission scientifique et technique chargée d’analyser les événements relatifs à la tempête de verglas survenue du 5 au 9 janvier 1998, ainsi que l’action des divers intervenants. Québec: The Commission (1999)
18. Rinaldi, S.M., Peerenboom, J.P., Terrence, K.K.: Identifying, understanding, and analyzing critical infrastructures interdependencies. IEEE Control Systems Magazine, December (2001)
19. Robert, B., de Calan, R., Morabito, L.: Modelling interdependencies among critical infrastructures. Int. J. Crit. Infrastruct. 4(4), 392–408 (2008)
20. Robert, B., Morabito, L.: The operational tools for managing physical interdependencies among critical infrastructures. Int. J. Crit. Infrastruct. 4(4), 353–367 (2008)
21. Robert, B., Morabito, L.: Réduire la vulnérabilité des infrastructures essentielles: Guide méthodologique, 62 pp. Les Éditions Tec&Doc, Lavoisier, France (2009)
22. Robert, B., Morabito, L., Quenneville, O.: The preventive approach to risks related to interdependent infrastructures. Int. J. Emerg. Manag. 4(2), 166–182 (2007)
23. SPQ: Trousse promotionnelle pour les entreprises et autres organisations, Sécurité Publique Québec. http://www.msp.gouv.qc.ca/secivile/secivile.asp?txtSection=semaine&txtCategorie=trousses&txtNomAutreFichier=trousse_entreprises.htm (2008). Accessed 14 Jan 2009
24. Statistics Canada: The St. Lawrence River valley 1998 ice storm: maps and facts. Catalogue no. 16F0021XIB (1998)
25. U.S.–Canada Power System Outage Task Force: Final report on the August 14, 2003 blackout in the United States and Canada: causes and recommendations, April 2004. https://reports.energy.gov/. Accessed 7 Dec 2009
26. Vis, S.: ‘Black-out’ à répétition: Quelles conséquences? Les échos de la bourse, November 20 (2003)
Chapter 4
Critical Interrelations Between ICT and Electricity System
Janusz W. Bialek
Abstract The widespread blackouts of 2003 exposed the critical role of ICT systems in maintaining reliable operation of power systems. Fundamental errors in providing back-up and alarm functions in the control room were among the main contributing factors to the 2003 USA/Canada blackout. The lack of a proper ICT infrastructure to enable communication and cooperation between system operators in Italy and Switzerland led to delayed remedial actions and the consequent blackout of Italy in 2003. Improved ICT systems would enable better real-time cooperation and coordination between utilities in an interconnected power system, but the main challenge is political: overcoming the resistance of individual utilities to partially giving up their independence and operating within the paradigm of distributed, but coordinated, control. The emergence of GPS-synchronised Wide Area Measurement Systems (WAMS) holds great promise for improved monitoring and control of modern power systems and therefore for avoiding future blackouts.
Keywords WAMS • power/ICT interdependencies • North-American and Italian blackout
4.1 Introduction
Modern power systems are highly complicated, dynamic, non-linear and time-critical, and they cover large geographic areas. In addition, reliable operation of the power grid is complex and demanding for two main reasons:
• As electricity flows at almost the speed of light and cannot be stored economically in large quantities, its production must follow consumption on a minute-by-minute basis.
J.W. Bialek (*) Institute for Energy Systems, School of Engineering, The University of Edinburgh, UK
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_4, © Springer Science + Business Media B.V. 2010
Any discrepancy between production and consumption is absorbed by the kinetic energy of all rotating generators, resulting in their increased or reduced speed.
• The flow of electricity in a meshed network is governed by the laws of physics and cannot be directly controlled.1 In order to avoid overloading transmission lines, system operators (SOs) must have the ability to adjust the output of generators (or the consumption of loads) in certain locations.
All those reasons mean that monitoring and control of modern power systems require advanced ICT systems. Any problems with ICT systems may have grave consequences for security of supply. This paper will discuss two aspects of critical interrelations between ICT and electricity systems. Firstly, we will illustrate this criticality by discussing how inappropriate ICT systems contributed to widespread blackouts in USA/Canada and Italy in 2003. Then we will discuss the potential of GPS-synchronised Wide Area Measurement Systems (WAMS) in preventing power system blackouts.
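The first reason in the list above, that a mismatch between production and consumption is taken up by the kinetic energy of rotating generators, can be illustrated with the aggregated swing equation, df/dt = f0 · ΔP / (2 · H · S). The sketch below is only a rough illustration: the inertia constant, the rated capacity and the imbalance are assumed values chosen for the example, not data from this chapter, and load damping and governor response are ignored.

```python
def initial_frequency_slope(imbalance_mw, inertia_constant_s, rated_mva, nominal_hz=50.0):
    """Initial rate of change of frequency (Hz/s) from the aggregated swing equation.

    df/dt = f0 * dP / (2 * H * S), where dP = production - consumption (MW),
    H is the aggregate inertia constant (s) and S the rated capacity (MVA).
    Load damping and governor action are neglected.
    """
    return nominal_hz * imbalance_mw / (2.0 * inertia_constant_s * rated_mva)


# Illustrative numbers only (assumptions): a 1,000 MW generation deficit in a
# system with 25,000 MVA of synchronous capacity and H = 4 s.
slope = initial_frequency_slope(
    imbalance_mw=-1000.0, inertia_constant_s=4.0, rated_mva=25000.0
)
print(f"{slope:.2f} Hz/s")
```

With these assumed values the frequency initially falls by a quarter of a hertz per second, which gives a feel for why balancing has to be automated and supported by fast communication.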
4.2 ICT Systems and Blackouts
The critical dependence of electricity supply systems on ICT systems has been most vividly demonstrated during a number of blackouts that occurred in recent years. There were six blackouts within 6 weeks in the late summer of 2003, affecting about 112 million people in the USA, UK, Denmark, Sweden and Italy. They were all transmission-based, i.e. there were no problems at the time with the level of generation. The systems were not stressed before the blackouts occurred; in Italy the blackout even happened at night. Obviously the direct reasons for the blackouts were purely electrical (short circuits), but one of the reasons the blackouts spread so widely was the inadequate provision of ICT systems. This point will be described in more detail using the examples of the USA/Canada and Italian blackouts. Attention will be concentrated not on the blackouts themselves but rather on how inadequate provision of ICT systems exacerbated the situation. Please refer to Appendix 1 for an explanation of how the security of an interconnected power system is maintained and to [2] for more details on the blackouts of 2003.
4.2.1 US Blackout on 14 August 2003
The blackout was triggered by some initial, innocuous-looking outages in northern Ohio, which spread to the North East of the USA and parts of Canada. A total of 62 GW was lost, about 50 million people were affected, and full restoration took several days. The blackout description below contains extended excerpts from the final report [1].
1 Power flows can be controlled to some extent by so-called Flexible AC Transmission System (FACTS) devices but they are still too expensive for general use.
4.2.1.1 The Course of Events
Figure 4.1 shows the geographical area and the control areas where the blackout started. It is important to note that the events directly involved six control areas, which shows the importance of proper communication and coordination between System Operators. The disturbances started in northern Ohio, in the area controlled by FirstEnergy (FE) (see Fig. 4.1). At 14:02 EDT, Dayton Power & Light’s (DPL) Stuart-Atlanta 345-kV line (see Fig. 4.1) tripped off-line due to a tree flashover.2 This line had no direct electrical effect on FE’s system, but it did affect the performance of the Midwest Independent System Operator (MISO) as reliability coordinator, even though PJM is the reliability coordinator for the DPL line.3 One of MISO’s primary system condition evaluation tools, its state estimator, was unable to assess system conditions for most of the period between 12:37 and 15:34 EDT, due to a combination of human error and the effect of the loss of DPL’s Stuart-Atlanta line on other MISO lines as reflected in the state estimator’s calculations. Without an effective state estimator, MISO was unable to perform
Fig. 4.1 The area of blackout origination [1]
2 Flow of current causes transmission lines to heat and sag. If trees growing underneath are not cut in time, a flashover may occur.
3 See Appendix 1 for explanations about reliability of interconnected power systems, state estimation, SCADA systems, etc.
contingency analyses of generation and line losses within its reliability zone. Therefore, through 15:34 EDT MISO could not determine that with an earlier trip of Eastlake 5 power station in Cleveland, other transmission lines would overload if FE lost a major transmission line, and could not issue appropriate warnings and operational instructions. Starting around 14:14 EDT, FE’s control room operators lost the alarm function that provided audible and visual indications when a significant piece of equipment changed from an acceptable to problematic condition. Shortly thereafter, the Energy Management System (EMS) lost a number of its remote control consoles. Next it lost the primary server computer that was hosting the alarm function, and then the backup server such that all functions that were being supported on these servers were stopped at 14:54 EDT. However, for over an hour no one in FE’s control room grasped that their computer systems were not operating properly, even though FE’s Information Technology support staff knew of the problems and were working to solve them, and the absence of alarms and other symptoms offered many clues to the operators of the EMS’s impaired state. Thus, without a functioning EMS or the knowledge that it had failed, FE’s system operators remained unaware that their electrical system condition was beginning to degrade. Unknowingly, they used the outdated system condition information they did have to discount information from others about growing system problems. From 15:05:41 to 15:41:35 EDT, three 345-kV lines tripped in Cleveland area at 43.5%, 87.5% and 93.2%, respectively, of their normal and emergency line rating. As each of the transmission lines failed, power flows shifted to other transmission paths increasing their loading and resulting in further trips. Additionally, voltages on the rest of FE’s system degraded further. 
As each of FE’s 345-kV lines in the Cleveland area tripped out, it increased loading and decreased voltage on the underlying 138-kV system serving Cleveland and Akron, pushing those lines into overload. Starting at 15:39 EDT, the first of an eventual sixteen 138-kV lines began to fail. As these lines failed, the resulting voltage drops caused a number of large industrial customers with voltage-sensitive equipment to go off-line automatically to protect their operations. As the 138-kV lines tripped out, they blacked out customers in Akron and the areas west and south of the city, ultimately dropping about 600 MW of load. The collapse of FE’s transmission system induced unplanned power surges across the region. Shortly before the collapse, large electricity flows were moving across FE’s system from generators in the south (Tennessee, Kentucky, Missouri) to load centers in northern Ohio, eastern Michigan, and Ontario – see Fig. 4.2.1. This pathway in northeastern Ohio became unavailable with the collapse of FE’s transmission system. The electricity then took alternative paths to the load centres located along the shore of Lake Erie – see Fig. 4.2.2. Power surged in from western Ohio and Indiana on one side and from Pennsylvania through New York and Ontario around the northern side of Lake Erie. Transmission lines in these areas, however, were already heavily loaded with normal flows, and some of them began to trip.
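The mechanism described above, in which each line trip shifts flow onto the remaining paths and pushes them towards their own limits, can be illustrated with a deliberately crude toy model. In this sketch the flow of a tripped line is shared equally among the lines still in service, which is an assumption made for brevity; in a real network the redistribution follows the network impedances and trips depend on relay settings. All line names, flows and ratings are invented.

```python
def simulate_cascade(flows, ratings, first_trip):
    """Toy cascade: share the flow of tripped lines equally among the survivors.

    Returns (order in which lines tripped, flows on the lines still in service).
    """
    in_service = dict(flows)  # copy so the caller's data is left untouched
    tripped = []
    to_trip = [first_trip]
    while to_trip:
        lost = 0.0
        for name in to_trip:
            lost += in_service.pop(name)
            tripped.append(name)
        if not in_service:
            break  # nothing left to carry the flow: total collapse
        share = lost / len(in_service)
        for name in in_service:
            in_service[name] += share
        # Any line now above its rating trips in the next round.
        to_trip = [n for n in in_service if in_service[n] > ratings[n]]
    return tripped, in_service


# Invented corridor of four parallel lines (flows and ratings in MW).
flows = {"A": 400.0, "B": 350.0, "C": 300.0, "D": 200.0}
ratings = {"A": 500.0, "B": 450.0, "C": 500.0, "D": 600.0}

print(simulate_cascade(flows, ratings, first_trip="A")[0])  # heavily loaded line lost
print(simulate_cascade(flows, ratings, first_trip="D")[0])  # lightly loaded line lost
```

In this invented corridor, losing the heavily loaded line A takes every other line with it, whereas losing the lightly loaded line D is absorbed, which mirrors the difference between a stressed and an unstressed system.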
Fig. 4.2 Overview of power flows during the blackout [1]
The northeast then separated from the rest of the Eastern Interconnection due to these additional power surges. The power surges resulting from the FE system failures caused lines in neighbouring areas to see overloads that caused impedance
relays to operate.4 The result was a wave of line trips through western Ohio that separated AEP from FE (see Fig. 4.2.3). The line trips then progressed northward into Michigan, separating western and eastern Michigan (see Fig. 4.2.4). With paths cut from the west, a massive power surge flowed from PJM into New York and Ontario in a counter-clockwise flow around Lake Erie to serve the load still connected in eastern Michigan and northern Ohio (Fig. 4.2.4). Power flow from Ontario into Detroit suddenly changed direction and a period of sustained oscillations ensued, indicating system instability. The impedance relays on the lines between PJM and New York saw the massive power surge as faults and tripped those lines. Lines in western Ontario also became overloaded and tripped (Figs. 4.2.5 and 4.2.6). The entire northeastern United States and the province of Ontario then became a large electrical island separated from the rest of the Eastern Interconnection (Fig. 4.2.7). This large island, which had been importing power prior to the cascade, quickly became unstable as there was not sufficient generation in operation within it to meet electricity demand. Systems to the south and west of the split, such as PJM, AEP and others further away, remained intact and were mostly unaffected by the outage. Once the northeast split from the rest of the Eastern Interconnection, the cascade was isolated.
4.2.1.2 Criticality of ICT Systems for the USA/Canada Blackout
The most critical issue from the point of view of ICT systems was that during the crucial hour when transmission lines started to trip, the operators in the FE control room were unaware of what was going on in their system. Not only were they unaware of the situation, but they did not know that they were unaware. It is quite likely that, had they known, they could have taken action to avert the fast-spreading blackout. It is important to emphasise that time is critical in preventing blackouts.
This is illustrated in Fig. 4.3, which shows the number of system elements that tripped during the blackout. Once a cascade starts, it is next to impossible to stop it. The report [1] identified a number of causes of the blackout, which were also reviewed in [2]. Below we quote the causes [1] that directly relate to ICT systems:
• FE lacked procedures to ensure that their operators were continually aware of the functional state of their critical monitoring tools.
• FE lacked procedures to test effectively the functional state of these tools after repairs were made.
• FE did not have additional monitoring tools for high-level visualization of the status of their transmission system to facilitate its operators’ understanding of transmission system conditions after the failure of their primary monitoring/alarming systems.
4 An impedance relay trips a transmission line when its load exceeds a pre-set value.
Fig. 4.3 Speed of blackout spreading [3]
• MISO did not have real-time data from Dayton Power and Light’s Stuart-Atlanta 345-kV line incorporated into its state estimator (a system monitoring tool). This precluded MISO from becoming aware of FE’s system problems earlier and providing diagnostic assistance to FE.
• MISO’s reliability coordinators were using non-real-time data to support real-time “flowgate” monitoring. This prevented MISO from detecting an N-1 security violation in FE’s system and from assisting FE in necessary relief actions.
• MISO lacked an effective means of identifying the location and significance of transmission line breaker operations reported by their Energy Management System (EMS). Such information would have enabled MISO operators to become aware earlier of important line outages.
• PJM and MISO lacked joint procedures or guidelines on when and how to coordinate a security limit violation observed by one of them in the other’s area due to a contingency near their common boundary.
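The first cause in the list, the lack of any procedure that would have told operators their monitoring tools had stopped working, is the kind of gap a simple heartbeat check addresses. The following is a minimal sketch under stated assumptions: the tool names, the timestamps and the 60 s tolerance are hypothetical, and it does not describe how FE’s or MISO’s systems were actually built.

```python
def stale_tools(last_heartbeat_s, now_s, max_age_s):
    """Return the names of tools whose last heartbeat is older than max_age_s."""
    return sorted(
        name
        for name, stamp in last_heartbeat_s.items()
        if now_s - stamp > max_age_s
    )


# Hypothetical tool names and timestamps (seconds since an arbitrary start).
heartbeats = {
    "alarm_processor": 100.0,   # silent for 500 s
    "state_estimator": 580.0,   # reported 20 s ago
    "scada_frontend": 595.0,    # reported 5 s ago
}

overdue = stale_tools(heartbeats, now_s=600.0, max_age_s=60.0)
print(overdue)
```

The design point is that silence is treated as a failure in its own right: an independent watcher raises the alarm when a tool stops reporting, instead of relying on operators to notice that no alarms are arriving.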
4.2.2 Italian Blackout on 28 September 2003
The blackout happened at 3 a.m. when Italy was importing 6,651 MW from France, Switzerland, Austria and Slovenia [3]. The import constituted about 24% of total demand and was about 300 MW above the agreed import level. The pattern of flows into Italy through the tie-lines depends on the overall generation pattern in surrounding countries. At the time, the Swiss transmission grid was highly stressed,
operating close to the (N-1) security criterion; however, the Italian System Operator (SO) was not aware of it. The heavy use of the Swiss grid by imports to Italy was difficult for the Swiss operator to control by its own means.
4.2.2.1 The Course of Events
At 3.01 a.m. a tree flashover tripped the overhead 380 kV Mettlen-Lavorgo line in Switzerland (see Fig. 4.4). At the time the loading on the line was about 86% of its maximum capacity, and the flashover was probably caused by insufficient distance between the tree and the conductors. An attempt was made to reclose the line, but it was unsuccessful because the phase angle difference (42°) resulting from the high power flow to Italy was too large. The load carried by the tripped line was taken over by other parallel lines, which resulted in a 10% overload of another 380 kV line, Sils-Soazza (see Fig. 4.4). According to operating standards, the line load should have been relieved within 15 min to prevent automatic disconnection. The Swiss operator, ETRANS, telephoned the Italian operator GRTN at 3.11 a.m. and requested a reduction of imports by 300 MW to the previously agreed levels. According to ETRANS, they also informed GRTN about the line outage, but this claim is disputed by GRTN. GRTN reduced imports at 3.21 a.m. by shutting down pumps at pumped-storage plants, but this, together with some internal countermeasures undertaken within the Swiss system, was not sufficient. At 3.25 a.m., i.e. 24 min after the first line tripped, the overloaded Sils-Soazza line sagged and tripped after a tree flashover. From this moment on, a severe system failure was inevitable. Loss of the second import line resulted in a severe overload of other
Fig. 4.4 Italian blackout and its timeline [3]
import lines and the third line (Airolo-Mettlen) tripped after 4 s. Additionally, Italy lost synchronism with the rest of UCTE (loss of angle stability) and the remaining import lines tripped almost instantaneously, isolating Italy from the rest of Europe at 3.25 a.m. Quite importantly, the dynamic interaction between Italy and the rest of the UCTE main grid during the last seconds before separation led to a fast voltage collapse in Italy. Following the islanding of Italy, the internal generation deficit was about 6.4 GW and frequency started to fall. Although about 10 GW of load was shed by automatic under-frequency load shedding, this proved ineffective, as 21 out of 50 thermal plants tripped due to low voltage even before frequency reached 47.5 Hz. Consequently the whole of Italy, apart from Sardinia, was blacked out 2 min and 30 s after separation.
Following the separation of Italy, the rest of the UCTE network was also in a dangerous position. Frequency quickly increased to 50.25 Hz, significant power fluctuations were recorded and the European power flows took an unpredicted pattern. Some generating units were tripped by over-frequency or under-voltage relays. Loading of lines from France to Germany and Belgium increased significantly. However, the system operators took various emergency actions, so that further spreading of blackouts was avoided.
4.2.2.2 Criticality of ICT Systems for the Italian Blackout
As was the case with the US blackout, the real underlying reason for such a widespread blackout was insufficient coordination of real-time security assessment and control between the Swiss and Italian System Operators. Additionally, the Italian system operator was unaware of the overall load flow situation in Europe and the resulting consequences for Italy. Proper ICT systems would have made the exchange of real-time information possible and could have prevented the blackout.
The report [3] also concluded that in Europe, where the network is highly meshed and stability problems never appeared to be so critical, power system stability must be thoroughly analysed – even in the case of N-2 contingencies. This will require deeper stability analyses, in order to identify possible conditions leading to stability problems and to define suitable countermeasures if necessary. Such system-wide stability analyses would require a continent-wide comprehensive ICT system.
4.2.3 Need for Coordination of Operation in Interconnected Power Systems
One of the main problems with cross-border trades in an interconnected power system is that trades do not travel according to “contract paths” agreed between the seller and the buyer but rather they flow over many transmission lines, as determined by Kirchhoff’s and Ohm’s laws. This is referred to as a parallel, or loop-flow, effect.
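The parallel-flow effect can be illustrated with a minimal sketch based on the DC (linearised) power flow approximation, which is an assumption made here and not the method used in [4]. When seller and buyer are connected by several parallel routes, every route sees the same voltage angle difference, so each route carries a share of the trade proportional to the inverse of its series reactance. The function name, route names and reactance values below are hypothetical and are not taken from Fig. 4.5.

```python
def split_trade(trade_mw, route_reactances):
    """Split a point-to-point trade over parallel routes (DC approximation).

    route_reactances maps a route name to the total series reactance of
    that route between seller and buyer (hypothetical per-unit values).
    All routes see the same angle difference, so each one carries a share
    proportional to its susceptance 1/x.
    """
    susceptance = {name: 1.0 / x for name, x in route_reactances.items()}
    total = sum(susceptance.values())
    return {name: trade_mw * b / total for name, b in susceptance.items()}


# Hypothetical routes; the reactances are illustrative only.
routes = {"direct": 0.2, "via_transit_1": 0.5, "via_transit_2": 1.0}
flows = split_trade(1000.0, routes)
for name, mw in flows.items():
    print(f"{name}: {mw:.0f} MW ({mw / 10:.1f}% of the trade)")
```

With these illustrative values only part of the trade follows the direct route, although the contract names a single path; the rest loads the transit networks.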
J.W. Bialek
Fig. 4.5 Percentage shares through different transit routes for a trade from northern France to Italy [4]
Consequently utilities find their networks loaded with power transfers they know little about. Figure 4.5 shows the different routes through which an assumed 1,000 MW trade between northern France and Italy would flow [4]. Only 38% of the power would flow directly from France to Italy; the remaining 62% would flow through different parallel routes, loading the transit networks. Note that 15% of the power would even flow in a roundabout way via Belgium and the Netherlands.
Parallel flows did not cause major problems before 1990, as inter-area exchanges were usually agreed well in advance by the system operators and were relatively small. After 1990, inter-area trades not only increased significantly in volume but also started to be arranged by independent agents rather than system operators. The result is illustrated in Fig. 4.6, which shows that a large proportion of flows on the Belgian grid in 1999 was unexpected by the Belgian system operator. That situation led to a few narrowly avoided blackouts in Belgium in the 1990s.
A similar parallel flow effect was one of the main factors contributing to the US and Italian blackouts. In the case of the US, as the existing transmission corridors were increasingly blocked by lines tripping off, power to supply northern Ohio and Ontario had to find alternative ways and did so through neighbouring utilities (Michigan, PJM and New York) – see Fig. 4.2. The relevant system operators suddenly saw huge increases in power transfers through their territories but they did not know what caused them and could do little about them. In the case of the Italian blackout, transfers through Switzerland to Italy depended on the overall pattern of generation in surrounding countries. Although some of the Swiss lines were operating close to their limits, Swiss operators could do little to relieve them.
Proper accounting for parallel flows is difficult enough in operational planning stages. In emergencies, when power system topology may be different from the
Fig. 4.6 Expected and unidentified flows through Belgian grid on July 14, 1999 (power in MW against hour of day) [5]
assumed one and some power stations may be lost, proper accounting for parallel flows would require real-time security monitoring and automatic exchange of information between SOs, rather than telephone-based coordination.
Proper coordination between System Operators in an interconnected power system would require overcoming a number of technical, political and organisational challenges [6]. The best way forward would be to change the paradigm of operation from the existing decentralised one to a coordinated one, in which each SO would still look after its own area in day-to-day operation and planning but would coordinate with the others for system-wide security assessment and control purposes. To do that, information would have to be exchanged to assess the impact of planned trades and outages on all the areas involved or, in other words, to assess accurately all the parallel flows in both operational planning and real-time operation stages. Furthermore, real-time coordinated security assessment would be needed to assess the impact of any contingencies on the whole interconnected network. Obviously this would require advanced ICT systems. The next step would be a coordinated reaction to contingencies. When each SO defends itself against a contingency without taking into account the big picture, the results may be detrimental to the system as a whole. A further important problem is determining who should pay for the reliability-related actions, and how much.
Development of an ICT system to support such a mode of operation is a technical challenge but one which can be overcome relatively easily as the underlying technology is readily available. The main challenge is political: overcoming the resistance of individual utilities to partially giving up their independence and operating within the paradigm of distributed, but coordinated, control.
One of the other main problems with increased coordination is the necessity to exchange operational information about each control area. Such information could be deemed to be commercially sensitive so the SOs would be reluctant to share it. Traditional load-flow or stability programs require detailed information about generation and demand profiles at
each node, but for the proposed coordinated operation and control the exchange of full individual nodal profiles is not needed. The important thing is to assess the system-wide impact of the situation in each control area. Further research is needed to establish what type of information must be exchanged, and in what way, in order to perform system-wide security assessment and control, and which information can remain private to individual SOs. Following that, appropriate organisational structures would have to be established. There may be a need to establish a single body charged with maintaining real-time security of the whole interconnected system.
The need for improved coordination between utilities has been clearly recognised by the industry. The situation has improved over the last few years with the establishment of the Electric Reliability Organisation in the USA in 2005 and a number of initiatives undertaken by UCTE. However, there is still a long way to go to achieve proper real-time cooperation between utilities.
4.3 Preventing Blackouts: Wide Area Measurement Systems
Utilities rely on SCADA systems for operational monitoring of their systems – see Appendix 1 – but SCADA systems have some serious shortcomings. Firstly, SCADA relies on the state estimator to obtain an estimate of the exact system topology, power flows and voltages. However, all known state estimation algorithms have shortcomings, especially with regard to bad data identification and robustness [7]. Bad measurements, especially incorrect switch statuses, may result in large errors in the estimation results. Secondly, the measurements in a SCADA scan are not taken synchronously, which causes significant time skew errors. All those errors may seriously affect power system security evaluation by the SO.
The wave of power system blackouts in 2003 has provided an impetus to the development of monitoring systems based on GPS-synchronised measurement technology, referred to as wide area measurement systems (WAMS) – see Appendix 2 for technological details. Time-synchronised Phasor Measurement Units (PMUs) introduce the possibility of directly measuring the system state (i.e. voltage magnitudes and their angles) rather than estimating it based on system models and telemetry data. As measurements are taken 20–60 times per second, PMUs can track system dynamics in real time. WAMS have a wide area of applications in monitoring and control [8] but here we will concentrate on the applications regarding prevention of blackouts. Due to their accuracy and wide area coverage, WAMS may enable early warning systems to detect conditions that lead to catastrophic events, help with restoration and improve the quality of data for event analysis. It is worth noting here that it took several months to gather information from different measurement stations following the 2003 USA/Canada blackout and perform analyses to recreate the course of events [1].
On the other hand synchronised frequency traces from different parts of Europe were available almost immediately following the UCTE disturbance in 2006 due to availability of PMUs [9].
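The sensitivity of state estimation to bad data, noted at the start of this section, can be illustrated with a minimal weighted least squares sketch. Everything in it is an assumption made for illustration: a hypothetical 3-bus network in the DC approximation, equal measurement accuracies, and a simplified bad data check that uses the chi-square test on the weighted sum of squared residuals and then flags the largest weighted residual (practical estimators use normalised residuals instead).

```python
# Hypothetical 3-bus DC model: bus 0 is the angle reference, the states are
# the angles of buses 1 and 2 (rad). Each row of H maps the states to one
# measurement (per-unit power): flows 0-1, 0-2, 1-2 and injections at 1, 2.
H = [(-10.0, 0.0), (0.0, -5.0), (10.0, -10.0), (20.0, -10.0), (-10.0, 15.0)]
SIGMA = 0.01          # assumed standard deviation of every measurement
CHI2_LIMIT = 11.34    # approx. 99% chi-square limit, 5 - 2 = 3 degrees of freedom


def wls_estimate(z):
    """Return (states, weighted residuals, J) for measurement vector z."""
    w = 1.0 / SIGMA ** 2
    # Normal equations G x = H^T W z, solved by Cramer's rule (2 states).
    g11 = sum(w * a * a for a, _ in H)
    g12 = sum(w * a * b for a, b in H)
    g22 = sum(w * b * b for _, b in H)
    r1 = sum(w * a * zi for (a, _), zi in zip(H, z))
    r2 = sum(w * b * zi for (_, b), zi in zip(H, z))
    det = g11 * g22 - g12 * g12
    x = ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det)
    residuals = [(zi - a * x[0] - b * x[1]) / SIGMA for (a, b), zi in zip(H, z)]
    return x, residuals, sum(r * r for r in residuals)


true_x = (-0.05, -0.08)
clean = [a * true_x[0] + b * true_x[1] for a, b in H]
bad = list(clean)
bad[1] += 0.5         # gross error on the flow 0-2 measurement

x_clean, _, j_clean = wls_estimate(clean)
x_bad, res_bad, j_bad = wls_estimate(bad)
suspect = max(range(len(res_bad)), key=lambda i: abs(res_bad[i]))
print("clean data accepted:", j_clean < CHI2_LIMIT)
print("bad data detected:", j_bad > CHI2_LIMIT, "suspect measurement:", suspect)
print("estimate with bad data:", x_bad)
```

In this sketch a single gross error shifts both estimated angles away from their true values, even though only one measurement is wrong, which is the kind of estimation error the text warns about.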
WAMS may prevent blackouts due to their ability to provide system monitoring and tracking of power system dynamics in real time. The onset of unstable oscillations usually precedes a blackout and this can be detected quickly [10]. WAMS-based wide area protection and control systems offer a chance to see “the big picture”, stop power system degradation, restore the system to a normal state and minimise the effect of a disturbance [11]. WAMS could also provide a crucial building block for the concept of “smart grids” through their wide-area communication infrastructure and ability to monitor power system operation in real time.
Although WAMS have been implemented in a number of places in the USA, Brazil, China and Europe, their more widespread adoption is yet to be achieved. Probably the main factor preventing a wider use of WAMS is a lack of application algorithms which would provide significant additional value in everyday power system operation. Also, the business case for adoption of WAMS is not entirely clear, as the main advantage of using them lies in improved system security, which is a “common good” benefiting everyone. All European countries have adopted, or are adopting, a liberalised model of organisation of the electricity supply industry in which the distribution, supply, generation and transmission sectors are separated (so-called unbundling). In such a business model the costs of WAMS, and the resulting benefits, cannot always be easily associated with one particular player. This makes it difficult to justify and finance any widespread installations.
4.4 Conclusions
The widespread blackouts of 2003 have exposed the critical role of ICT systems in maintaining reliable operation of power systems. Fundamental errors in providing back-up and alarm functions in the control room were among the main contributing factors to the 2003 USA/Canada blackout. The lack of proper ICT infrastructure to enable efficient communication and cooperation between System Operators in Italy and Switzerland led to delayed remedial actions and the consequent blackout of Italy in 2003. Improved ICT systems would enable better real-time cooperation and coordination between utilities in an interconnected power system, but the main challenge is political: overcoming the resistance of individual utilities to partially giving up their independence and operating within the paradigm of distributed, but coordinated, control.
The emergence of GPS-synchronised wide area measurement systems holds great promise for improved monitoring and control of modern power systems and therefore for avoiding future blackouts. Despite some initial successes with WAMS deployment, their wider adoption is still to be achieved due to a lack of application algorithms which would make WAMS more relevant to everyday power system operation. Another obstacle may lie in the unbundled industry organisation, which makes it difficult to associate the costs and benefits of WAMS with one particular industry player.
4.5 Appendix 1: Maintaining Reliability of an Interconnected Power System
A control area is a geographic area within a large interconnected network in which a system operator (SO) balances generation and loads in real time to maintain reliable operation. Control areas are linked with each other through transmission interconnection tie lines. Close cooperation between system operators is required to support the reliability of their interconnection. There are approximately 140 control areas in North America, while in Europe a control area usually means a single country, although larger countries (e.g. Germany) may be divided into several control areas. In the USA, reliability coordinators are responsible for coordination between a number of SOs controlling different control areas.
ICT systems are fundamental for maintaining power system reliability. System operators look at potential problems that could arise on their systems by using contingency analyses, driven by state estimation, which is in turn fed by data collected by the SCADA system.
SCADA: System operators use Supervisory Control and Data Acquisition systems to acquire power system data and control power system equipment. SCADA systems have three types of elements: field remote terminal units (RTUs), communication to and between the RTUs, and one or more master stations. Field RTUs, installed at generation plants and substations, are combined data gathering and device control units. They gather and provide information of interest to system operators, such as the status of a breaker (switch), the voltage on a line or the amount of real and reactive power being produced by a generator, and execute control operations such as opening or closing a breaker. Telecommunications facilities, such as telephone lines or microwave radio channels, are provided for the field RTUs so they can communicate with one or more SCADA master stations or, less commonly, with each other.
Master stations are the pieces of the SCADA system that initiate a cycle of data gathering from the field RTUs over the communications facilities, with time cycles ranging from every few seconds to as long as several minutes. In many power systems, master stations are fully integrated into the control room, serving as the direct interface to the Energy Management System (EMS), receiving incoming data from the field RTUs and relaying control operation commands to the field devices for execution.
State Estimation: System operators must have visibility (condition information) over their own transmission facilities, and recognize the impact on their own systems of events and facilities in neighbouring systems. To accomplish this, system state estimators use real-time data measurements (real and reactive power flows, the state of switches) available on a number, but not all, of the transmission lines, substations and other plant. This information is fed to a mathematical model of the power system to estimate voltages and real and reactive power flows throughout the system.
Contingency Analysis: A power system must be able to withstand on its own, i.e. without intervention of the system operator, the impact of probable events (such as tripping of lines or generators) that are referred to as contingencies. The most common
criterion used is the “N-1” criterion, which means that the trip of a single element should not result in overloading of power system elements, loss of stability or voltage violations. This gives the SO time to adjust operation should a contingency happen. Contingency analysis is run regularly by the SO, based on the current system operating conditions as identified by the state estimator.
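A minimal sketch of an N-1 screening loop is given below. It rests on several assumptions that are not part of the text: a hypothetical 3-bus network, the DC power flow approximation, line outages only, and a check of thermal overloads only (stability and voltage limits are ignored). The names dc_flows and n_minus_1, the reactances, limits and injections are all illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dc_flows(n_bus, lines, injections, ref):
    """DC power flow; lines maps a name to (from_bus, to_bus, reactance)."""
    keep = [k for k in range(n_bus) if k != ref]
    pos = {bus: r for r, bus in enumerate(keep)}
    b_matrix = [[0.0] * len(keep) for _ in keep]
    for i, j, x in lines.values():
        for u, v in ((i, j), (j, i)):
            if u in pos:
                b_matrix[pos[u]][pos[u]] += 1.0 / x
                if v in pos:
                    b_matrix[pos[u]][pos[v]] -= 1.0 / x
    theta = [0.0] * n_bus
    angles = solve(b_matrix, [injections[k] for k in keep])
    for bus, value in zip(keep, angles):
        theta[bus] = value
    return {name: (theta[i] - theta[j]) / x for name, (i, j, x) in lines.items()}


def n_minus_1(n_bus, lines, limits, injections, ref):
    """Trip each line in turn and list the lines overloaded afterwards.

    Assumes the network stays connected after every single outage.
    """
    violations = {}
    for out in lines:
        remaining = {k: v for k, v in lines.items() if k != out}
        flows = dc_flows(n_bus, remaining, injections, ref)
        over = sorted(k for k, f in flows.items() if abs(f) > limits[k] + 1e-6)
        if over:
            violations[out] = over
    return violations


# Hypothetical data: reactances in per unit, limits and injections in MW.
lines = {"L01": (0, 1, 0.1), "L02": (0, 2, 0.1), "L12": (1, 2, 0.1)}
limits = {"L01": 400.0, "L02": 650.0, "L12": 650.0}
injections = [600.0, 300.0, -900.0]

base = dc_flows(3, lines, injections, ref=2)
violations = n_minus_1(3, lines, limits, injections, ref=2)
print("base case flows:", base)
print("N-1 violations:", violations)
```

In this example the base case respects all limits, yet two of the three single outages overload a remaining line, so the operating point would not satisfy the N-1 criterion and the operator would need to adjust generation or topology.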
4.6 Appendix 2: Wide Area Measurement Systems
A wide area measurement system (WAMS) is a measurement system based on the transmission of analogue and/or digital information using telecommunication systems and allowing synchronisation (time stamping) of the measurements using a common time reference. Measuring devices used by WAMS have their own clocks synchronised with the common time reference using the satellite GPS (global positioning system).
4.6.1 WAMS and WAMPAC Based on GPS Signal
The satellite GPS system is the result of many years of research undertaken by US civil and military institutions aiming to develop a very accurate navigation system. The system has been made available to civil users around the world. The accuracy of the GPS reference time, about 1 µs, is good enough to measure AC phasors with a frequency of 50 or 60 Hz. For a 50 Hz system, the period corresponding to a full rotation of 360° is 20 ms = 20 × 10^3 µs. A time error of 1 µs therefore corresponds to an angle error of 360°/(20 × 10^3) = 0.018°, i.e. 0.005% of a full rotation. Such an error is small enough from the point of view of phasor measurements.
The possibility of directly measuring voltage and current phasors in a power system has created new control possibilities:
• Monitoring of operation of a large power system from the point of view of voltage angles and magnitudes and frequency. This is referred to as wide area monitoring (WAM).
• Application of special power system protections based on measuring phasors in large parts of a power system. Such protection is referred to as wide area protection (WAP).
• Application of control systems based on measuring phasors in large parts of a power system. Such control is referred to as wide area control (WAC).
Wide area measurement systems (WAMS) integrated with wide area monitoring (WAM), wide area protection (WAP) and wide area control (WAC) are referred to as wide area measurement, protection, and control (WAMPAC).
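The relation between time-stamp accuracy and phasor angle accuracy used above can be checked with a short calculation. The sketch below assumes a time error of one microsecond; the function name angle_error_deg is illustrative.

```python
def angle_error_deg(time_error_s, frequency_hz):
    """Phase-angle error (degrees) caused by a time-stamp error."""
    # One period of 1/f seconds corresponds to a full rotation of 360 degrees.
    return 360.0 * frequency_hz * time_error_s


for f in (50.0, 60.0):
    err = angle_error_deg(1e-6, f)
    print(f"{f:.0f} Hz: {err:.4f} deg ({100 * err / 360:.4f}% of a rotation)")
```

The same time error gives a slightly larger angle error in a 60 Hz system than in a 50 Hz system because the phasor rotates faster.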
Recent years have seen a dynamic expansion of WAMPAC systems. Measurement and telecommunication techniques have made rapid progress, but the main barrier to the expansion of WAMPAC systems is a lack of WAP and WAC algorithms based on the use of phasors. A lot of research has been devoted to that problem but the state of knowledge cannot yet be regarded as satisfactory.
4.6.2 Structures of WAMS and WAMPAC
WAMS, and the WAMPAC constructed on their basis, may have different structures depending on the telecommunication media used. With point-to-point connections, the structure may be multi-layer, with PMU data sent to phasor data concentrators (PDC). One concentrator may service 20–30 PMUs. Data from concentrators is then sent to computers executing SCADA/EMS functions or WAP/WAC phasor-based functions. An example of a three-layer structure is shown in Fig. 4.7.
Delays are incurred in each stage of data transmission. Concentrators in the lowest layer service PMUs. As the delays are the smallest at that stage, the concentrators may supply data not only for monitoring (WAM) but also for protection (WAP) and control (WAC). The middle-layer concentrators combine data from individual areas of a power system. The data may be used for monitoring and for some WAP or WAC functions. The top, central, concentrator services the area concentrators. As the delays are the longest at that stage, the central layer may be used mainly for monitoring and for those SCADA/EMS functions that do not require a high speed of data transmission.
The main disadvantage of the layered structure is the lack of direct connections between area concentrators. The absence of such connections may make it difficult, or even impossible, to execute those WAP or WAC functions that require data from a number of areas.
Fig. 4.7 An example of a three-layer structure of WAMPAC. PMU – phasor measurement unit, PDC – phasor data concentrator, P&C – protection and control based on phasors
Fig. 4.8 WAMPAC structure based on a flexible communication platform
The only way to get access to data from another area is via the central concentrator, which incurs additional delays. That problem may be solved by adding communication links between area concentrators. This leads to more complicated communication structures as more links are introduced.
Computer networks consisting of many local area networks (LAN) and one wide area network (WAN) offer the best possibilities for further WAMPAC development and application. Such a structure is illustrated in Fig. 4.8. A LAN services all measurement units and protection and control devices in an individual substation. The connecting WAN creates a flexible communication platform. Individual devices can communicate with each other directly. Such a flexible platform may be used to create special protection and control systems locally, for each area, and centrally. The platform could also be used to provide data for local and central SCADA/EMSs.
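The effect of the communication structure on delays can be sketched by treating the concentrators as nodes of a graph and accumulating the delay along the fastest route. The sketch below is only an illustration: the node names and all delay values are hypothetical, and the shortest-path search (Dijkstra's algorithm) is simply a convenient way to add up per-stage delays.

```python
import heapq


def path_delay(links, source, target):
    """Smallest accumulated delay (ms) from source to target (Dijkstra)."""
    graph = {}
    for a, b, delay in links:
        graph.setdefault(a, []).append((b, delay))
        graph.setdefault(b, []).append((a, delay))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, delay in graph.get(node, []):
            cand = dist + delay
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return float("inf")


# Hypothetical three-layer structure: delays in ms are illustrative only.
layered = [
    ("pmu_a", "local_pdc_a", 5.0),
    ("local_pdc_a", "area_pdc_a", 10.0),
    ("area_pdc_a", "central_pdc", 20.0),
    ("central_pdc", "area_pdc_b", 20.0),
]
# Same structure with an added link between the two area concentrators.
with_direct_link = layered + [("area_pdc_a", "area_pdc_b", 15.0)]

print(path_delay(layered, "pmu_a", "area_pdc_b"))
print(path_delay(with_direct_link, "pmu_a", "area_pdc_b"))
```

Under these assumed delays, PMU data from one area reaches the neighbouring area concentrator sooner once a direct link exists, at the price of one more link to install and maintain.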
References
1. US-Canada Power System Outage Task Force: Interim Report: Causes of the August 14th Blackout in the United States and Canada (Nov 2003)
2. Bialek, J.W.: Recent Blackouts in US and Continental Europe: Is Liberalisation to Blame? Working Paper EP34, MIT Institute, Cambridge (2004)
3. UCTE: Final Report of the Investigation Committee on the 28 September 2003 Blackout in Italy (2003)
4. Haubrich, H.-J., Fritz, W.: Study on cross-border electricity transmission tariffs by order of the European Commission, DG XVII/C1, Aachen (April 1999)
5. Bonnard, P.: Power System Collapse: European Utility Experience. 2003 IEEE Trans. Distr. Conf. (2003)
6. Bialek, J.W.: Why has it happened again? Comparison between the 2006 UCTE blackout and the blackouts of 2003. In: IEEE PowerTech Conference, Lausanne (2007)
7. Shahidehpour, M., Tinney, W.F., Fu, Y.: Impact of security on power system operation. Proc. IEEE 93(11) (Nov. 2005)
8. Novosel, D., Madani, V., Bhargava, B., Vu, K., Cole, J.: Dawn of the grid synchronization. IEEE Power and Energy Magazine, Jan/Feb (2008)
9. UCTE: Final Report. System Disturbance on 4 November 2006
10. Wilson, D., Bialek, J.W., Hay, K., McNabb, P., Trehern, J.: Identifying sources of damping problems in power systems. In: 16th Power Systems Computation Conference, Glasgow (2008)
11. Begovic, M., Novosel, D., Karlsson, D., Henville, C., Michel, G.: Wide area protection and emergency control. Proc. IEEE 93(5) (May 2005)
Chapter 5
ICT and Powers Systems: An Integrated Approach C. Tranchita, N. Hadjsaid, M. Viziteu, B. Rozel, and R. Caire
Abstract In recent years, an increasing incorporation of Information and Communication Technologies (ICT) into power systems has been observed. ICT have improved the control of the power grid and, consequently, the reliability and flexibility of these systems. Presently, ICT are a key aspect of smart grid development. Today’s power systems depend on ICT. However, these technologies can fail and are also exposed to threats that can affect their functioning and the operation of the power system. Therefore, it is very important to consider both interconnected infrastructures (the electrical power grid and its information and communication system (ICS)) in the modeling, design and security analysis of electrical power systems. In this chapter, some approaches to modeling the interdependencies between these infrastructures are presented. In addition, some methods based on risk and criticality assessment with regard to cyber attacks and ICT failures are proposed.
Keywords Modelling power/ICT dependencies • cybersecurity
5.1 Introduction
The main objective of power systems is to deliver electrical energy, which is non-storable, to final customers over very large areas, where and when this power is required. A large number of tasks and operations are needed in order to assure a permanently reliable service at the lowest cost. Control of the electrical power network requires precise coordination in the execution of operations. Transactions on the market, load and generation profiles, asset upgrades, maintenance and security needs lead to configuration changes of networks.
C. Tranchita, N. Hadjsaid (*), M. Viziteu, B. Rozel, and R. Caire
Grenoble Electrical Engineering Laboratory (G2ELab), Grenoble Institute of Technology (Grenoble INP) INPG, Laboratoire de Génie Electrique de Grenoble, ENSE3 bat D, 961 rue Houille Blanche, F-38402 St Martin d’Hères Cedex, France
e-mail: [email protected]
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_5, © Springer Science + Business Media B.V. 2010
Power systems are geographically dispersed and have been extended, driven by the economic benefit of interconnecting local, national and international utilities. This interconnection requires basic tasks (such as maintenance scheduling, control and coordination in emergency situations) to be shared and coordinated among producers, operators and customers who act on the power network from physically isolated places [1]. Additionally, due to electricity market deregulation, convergence of the participants’ information is necessary to reach a globally optimal performance of the system.
On the one hand, electrical power systems are continuously faced with a series of events which can affect their integrity and operation. The system itself and its operators must cope with these hazards in order to preserve its security. On the other hand, the disparity between the growth of electrical demand and investments leads to systems being operated close to their limits, diminishing conservative security margins. Due to this complexity, a large number of electronic communication devices are distributed in electrical power systems to control, protect and supervise their functioning. Power systems are for the most part remotely controlled, making remote and fast management possible.
Progress in ICT has been widely exploited in power systems, improving the quality of operations, enabling automation, making data and specialized analyses available and solving challenges in human-related activities. ICT involve any communication device or application used to acquire, store, process and distribute information by electronic means. The term includes both traditional technologies such as radio, television, print and video and newer technologies such as the Internet, among others. ICS (which are composed of different ICT and have a specific objective), such as the ones used in power systems, are complex and large.
Complexity arises because devices are very heterogeneous and extensively networked with other internal and external systems. Besides, even if assets dedicated to control and protection have (in most cases) highly advanced functionality, some networks also depend on the human organization [2].
ICS of power systems, such as those used in energy control centers and corporate computer networks, are generally secured. Nevertheless, because of their complexity and size, numerous access points can be exploited by attackers in many different ways. Thus, ICS are vulnerable to cyber attacks that can restrain their operation, corrupt important data, or expose private information. In addition, cyber and physical natural failures in ICT may also occur, lead to faults in the ICS and affect the behavior of the whole power system.
Presently power systems are highly connected to, and highly interdependent with, other critical infrastructures [3]. This interconnectivity has increased efficiency but has also introduced more vulnerability into the system. Failures and attacks on ICT can affect not only the ICS but also the power system concerned and other interconnected critical infrastructures. The severity of the events could then be more significant due to a domino effect between infrastructures.
In this context, security assessment is more vital than ever to operate power systems correctly. Security is the ability of a power system to withstand sudden disturbances without service interruption [4]. New methods of security assessment are needed to respond to new potential disturbances arising from
failures of ICT, increased control complexity and malicious threats to which a power system is exposed. Research must focus on scalable models and simulation tools to perform risk analysis of power system operation and integrity, showing the consequences of potential failures. Two main challenges arise in achieving such analysis: the first is understanding the interactions between the ICS and the physical power grid; the second is determining the criticality of the information, IC functions or IC assets which are crucial to the operation of the power system.
Some experts [5, 6] stated that one of the most frequently identified shortfalls in knowledge related to enhancing critical infrastructure protection capabilities is the incomplete understanding of interdependencies between infrastructures. Interdependency modeling is the first stage in responding to many queries about the real vulnerability of infrastructures. Modeling is thus required to gain a complete understanding of the relationship between ICT behavior and power grid operation.
This chapter summarizes the most important aspects of Information and Communication Technologies (ICT) involved in power system operation and their relationship with other important concepts such as “traditional” power system security and cyber security. Different efforts have been made in the area of infrastructure interdependency modeling, risk assessment and vulnerability analysis. In that sense, the objective of this chapter is to present some approaches developed at the Grenoble Institute of Technology and G2ELab (France) on the modeling of interdependencies between the Information and Communication System (ICS) and the Electrical Infrastructure. Since traditional security assessment methods cover only failures coming from electrical power grids, new considerations in security analyses are proposed to deal with cyber attacks and ICT failures.
Specifically, new approaches to assess the risk and the criticality are presented.
5.2 Overview: ICT for Power Systems
In order to reach end users, electrical power must be produced, transmitted and distributed through a large electric infrastructure. The objectives of electric utilities are related to security, quality and economy [7]. None of these objectives is likely to be reached without continuous awareness of the different states of the power system. This is why local automation and communication devices are used to gather measurements and send them through a telecommunication infrastructure to a control center. The latter has advanced software systems to process the information and come up with coherent reactions to keep the electrical parameters within nominal ranges.
Traditionally, electrical utilities carry out their operation, as well as planning and maintenance tasks, by using ICS, which enable the flow of information at different levels of the power system. The ICS is generally composed of three networks: (a) regular networks including public switched telephone and data networks; (b) wireless networks with cellular phones and wireless ATM (Asynchronous Transfer
Mode); (c) computing networks including different dedicated LANs, WANs and the Internet [1]. Before explaining the different technologies employed at the different levels in the power system, generalities of ICT are detailed.
5.2.1 Generalities: ICT
ICT designate the essential resources needed to handle information, particularly the computers, programs and communication networks necessary to convert, store, manage, transmit and find it. The present denomination of ICT in the engineering domain indicates everything that concerns the technologies (and techniques) used in the treatment and transmission of information, mainly data processing and telecommunications. More precisely, Information and Communication Technology was defined by [8] as follows:
The technology involved in acquiring, storing, processing and distributing information by electronic means (including radio, television, telephone, and computers).
Information acquisition, communication of the information between different entities and information computerization (including information analysis, storage and visualization) are the different processes identified, see Fig. 5.1. Usually, it is difficult to define a border between the information system and the communication system because both are very closely related. Communication is defined as the action of transmitting information between two or more points/agents of the system.
The ICS obtains or measures physical phenomena that are needed for specialized control/protection/management functions. Once the information exists, it is converted to an analog or digital signal. The communication system then carries the signal from the measurement point to one or more other points by using a communication medium. A receiver gets the signal and converts it back into usable information. The communication process finishes when this information is stored. Computerization is the use of the information by specialized functions for analysis or decision making. The information system is frequently taken to refer
Fig. 5.1 Information and communication system [8]. Diagram labels: communication system; information system; measuring and collecting data; storing, using and visualizing data.
5 ICT and Powers Systems: An Integrated Approach
only to computers. This system comprises software, such as application programs and operating systems, that runs on a computer, a PLC or a control unit to store data, handle large amounts of information, perform complex computations and control processes.
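The acquisition, communication and storage chain described in this section can be sketched in a few lines of Python. This is a minimal illustration only: the `Adc` class, the 12-bit resolution, the 0 to 10 V range and the bit-string frame format are assumptions made for the example and do not come from [8].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adc:
    """Hypothetical ideal converter mapping a voltage range onto n-bit codes."""
    v_min: float
    v_max: float
    bits: int

    @property
    def step(self) -> float:
        # Size of one quantization step in volts.
        return (self.v_max - self.v_min) / (2 ** self.bits - 1)

    def encode(self, volts: float) -> int:
        # Clamp to the converter range, then round to the nearest code.
        clamped = min(max(volts, self.v_min), self.v_max)
        return round((clamped - self.v_min) / self.step)

    def decode(self, code: int) -> float:
        return self.v_min + code * self.step


def transmit(code: int, bits: int) -> str:
    """Sender side: serialize the code as a fixed-width bit string."""
    return format(code, f"0{bits}b")


def receive(frame: str) -> int:
    """Receiver side: parse the bit string back into an integer code."""
    return int(frame, 2)


adc = Adc(v_min=0.0, v_max=10.0, bits=12)
measured = 7.3129                                  # acquisition (assumed sensor value)
frame = transmit(adc.encode(measured), adc.bits)   # communication
recovered = adc.decode(receive(frame))             # back to usable information
archive = [recovered]                              # storage ends the process
```

With an ideal channel, the only difference between `measured` and `recovered` is the quantization error, which is at most half a step.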
5.2.2 Hierarchical Structure of Power Systems and Main ICT
Power systems are usually divided into five hierarchical levels, as shown in Fig. 5.2. The first three levels mainly concern substations: the process level, the bay/unit level and the station level. Substation assets can be categorized into two groups: primary and secondary equipment. In general, primary equipment consists of High Voltage (HV) components and switchgear with their associated sensors, normally located at the process level. Secondary equipment is the protection and control equipment sited at the bay level and the station level [9]. The two other levels are the regional energy control center level and the network energy control center level. The process level is the interface between the HV equipment of the power grid and the control system. Signals between the process and bay levels are analog measurements of current and voltage, and binary signals such as status information or opening/closing commands for switches. Assets used at this level include current, voltage and switch-status sensors, and actuators [10]. Functions performed at the bay level relate to control and protection. Devices at this level acquire data from the bay and act on the bay’s primary equipment. Single devices such as merging units
Fig. 5.2 Hierarchical conceptual levels of power systems
C. Tranchita et al.
(utilized to collect instantaneous values from primary equipment and to send them to the protection and control units), protection units and control units are present at the bay level. Intelligent Electronic Devices (IEDs) are the main assets employed at this level. They host different control and protection functions in a single device, such as switching operation, data acquisition, monitoring and control execution. The station level involves functions conceptually similar to those of the bay level, but with interactions at the scale of the whole station. These functions include control commands for switching operation and the collection of substation data from the bay level. Other station-level functions relate to the interface with the local station operator and with remote control centers [11]. The Regional and Network Control Center levels include the centers responsible for the control and operation of large electrical networks. These tasks have been made possible by important developments in ICT. The centers receive data from substations and, depending on their importance, from other control centers. Generalities of the control center, real-time functions such as SCADA (Supervisory Control And Data Acquisition), Energy Management System (EMS) applications and databases are detailed in the following sections.
5.2.3 The Control Energy Center, EMS and BMS
The control center is the central nerve system of the power system. It senses the pulse of the power system, adjusts its condition, coordinates its movement, and provides defense against exogenous events [12].
Power networks were conceived more than half a century ago and operated by vertically integrated companies. Deregulation has led to deep modifications in the structure of this industry. Investments and acquisitions continue to change the boundaries between companies. The restructuring has changed the decision structure from centralized to coordinated decentralized decision-making [12]. Commercial competition demands transparency and smart software that helps machines and operators obey the sophisticated rules of the market. Present control centers are changing from both a functional and a territorial point of view. Control centers are one of the main tools used by utilities to reach their quality, security and economy aims [7]. The core of a control center is a SCADA/EMS. SCADA refers to a distributed measurement and control system used to perform data collection and control at the supervisory level. The EMS is a system of computer-aided tools used by operators of power grids to monitor, control and optimize the performance of the generation and/or transmission system [6]. The SCADA/EMS has two categories of real-time functions: one dedicated to Automatic Generation Control (AGC) and the other to managing the electric network. These two main function classes exchange information concerning settlements and scheduled prices. This flow of information ensures that the electricity-market rules are fulfilled securely in real time. In a deregulated power system, a BMS (Business Management System) handles all the types of contracts,
Fig. 5.3 EMS and SCADA subdivision within a control center. Diagram labels: market participants; other TSO/DSO and TSO/DSO connection; WWW/Internet market portal; EMS applications for generation control and for network control; data bus; SCADA database; Master Terminal Unit; configurator; Remote Terminal Unit; wallboard display and operator interface; power plants; power system components.
adjustments and settlements required by the electricity market (long-term contracts, day-ahead schedules, hour-ahead adjustments and billing, among others). Information is exchanged in both directions between the SCADA/EMS and the BMS. The BMS provides the set-point schedules for the AGC and uses the operating constraints, the network model and the telemetry data to clear the market [12]. Figure 5.3 illustrates the structure of a typical control center that includes EMS, SCADA functions and ICT. In order to prioritize transparency and ensure market competition, the BMS provides information to market participants through dedicated web services accessible via the Internet (in the US, for example, OASIS, the Open Access Same-Time Information System). For coherent management of the technical resources and real-time monitoring of financial activity, the BMS is also interconnected with the ERP (Enterprise Resource Planning) system of the power utility. Information on real-time consumption is provided by the smart metering system.
5.2.3.1 Brief SCADA/EMS Functions Inventory
The decision coordination that ensures the proper functioning of the electric network is based on the SCADA/EMS. The main classification criteria for the applications included in a SCADA/EMS are related to timing (real time, expanded real time;
preventive, curative, restorative), functionality (security, economic optimization, training), type of managed electric network (transmission, distribution), level of involvement of the operator (dispatching, preparation, training) and priorities (primary, secondary) [13]. This inventory lists the most frequent applications included in a SCADA/EMS from a functional point of view. The main decision-making and control applications are meant to manage the generation and the network in real time. The software applications included in the AGC system are load frequency control, economic dispatch, reserve monitoring and interchange transaction scheduling [14]. A real-time network analysis sequence gets its input data from the SCADA system, takes the network topology into consideration and obtains a complete and coherent profile of the network from the State Estimator. The output of the State Estimator becomes input data for the transmission loss factors and the bus load forecast. Security constraints are enforced by the Contingency Analysis function, which takes its input data from the State Estimator and provides input data for the optimal power flow, security enhancement and preventive action. A network analysis computed in study mode provides the operator with useful information about the most suitable decisions in a given situation. The most frequent applications used in study mode are power flow, contingency analysis, optimal power flow and short circuit analysis [15].
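The data flow of the real-time network analysis sequence can be written down as a dependency graph, from which a valid execution order follows. The sketch below is only an illustration of that flow: the function names are labels chosen for this example, and the assumption that the state estimator reads both the SCADA data and the topology result is an interpretation of the text rather than a description of any particular EMS product.

```python
from graphlib import TopologicalSorter

# Each function maps to the set of functions whose output it consumes.
consumes = {
    "network_topology": {"scada"},
    "state_estimator": {"scada", "network_topology"},
    "transmission_loss_factors": {"state_estimator"},
    "bus_load_forecast": {"state_estimator"},
    "contingency_analysis": {"state_estimator"},
    "optimal_power_flow": {"contingency_analysis"},
    "security_enhancement": {"contingency_analysis"},
    "preventive_action": {"contingency_analysis"},
}

# static_order() yields every function after all of its data sources.
order = list(TopologicalSorter(consumes).static_order())
position = {name: index for index, name in enumerate(order)}

for name in order:
    print(name)
```

Functions that share the same inputs, such as the loss factors and the bus load forecast, may appear in either order; only the precedence constraints are guaranteed.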
5.3 Power System Security Assessment with Regard to ICT Failures and Cyber Attacks
Presently, ICTs are involved at every level and in virtually all functions of power systems. Malfunctioning of ICTs or malicious attacks on these technologies are potential threats at all levels. This section analyzes how the cyber security of the information and communication system affects the security of the power grid. The security of a power system refers to the degree of risk in its ability to withstand sudden disturbances and losses or failures of system components without interruption of customer service. It relates to the robustness of the system to imminent disturbances and hence depends on the system operating condition as well as on the probability of occurrence of contingencies [4, 16, 17]. Security problems vary according to time scales and characteristic symptoms such as under/over voltages or overloads. Under normal conditions, the failure of a single component will not have serious consequences for the system. Nevertheless, if the affected component is critical for operation, or if the power system is near its operational limits and appropriate corrective actions are not taken quickly enough, the failure can trigger a sequence of events causing the partial or total collapse of the system. Although such cascades are infrequent, their consequences are usually catastrophic. Security assessment is performed to examine the capability of a power system to
supply the load without violating the operating limits of the system and its equipment. The capability of the system to remain stable when disturbances occur is also evaluated. In this chapter, the term security covers both static and dynamic failure conditions without distinction. The cyber security of an ICS refers to the establishment and maintenance of policies and methods that ensure the integrity, availability and confidentiality of the organization’s IC resources. This requires taking actual and potential threats into consideration and designing and overseeing a defense strategy accordingly. The major techniques and tools used to maintain the security of an ICS are firewalls (both hardware and software), antivirus software, intrusion detection systems, authentication (such as voiceprint, fingerprint, retinal scanning and smart cards) and encryption. Other measures include redundancy and back-up components. Cyber security of a power system consists of protecting information by preventing, detecting and responding to ICT failures and attacks. This includes the safety of all IC services and computer-based applications employed in the five levels of the power system (see Fig. 5.2). Most of the related applications employing ICT are executed in power stations, substations and control centers. Cyber security principally guarantees the availability and integrity of information, such as measurements, data and commands sent and/or received by components and transmitted over communication channels [18]. In a power system, the most important requirement is continuity of service within the operational limits. For this reason, the cyber security of a power system must focus on preserving the reliability and security of the power system. Cyber security assessment must therefore include the impact on the behavior of the electrical infrastructure. Conversely, all assessments of power system security must also include ICT failures and cyber attacks.
In modern power systems, because of the importance of the control and protection functions and their dependency on ICT, the communication between facilities must be available 100% of the time and the integrity of the information must be guaranteed [19]. Actions on the power grid that can be performed by the control and protection system can also be performed maliciously through a cyber attack, with the effort required depending on the protection measures in place [20]. A cyber attack is an assault against a computer-based information system by electronic means. The objective of these malicious attacks is to interrupt or corrupt flows of data, applications and databases in order to disturb and/or stop major services of the system that rely on computers. Thus, a cyber attack has the potential to damage critical electrical infrastructures. A cyber attack implies a sequence of actions that exploit the vulnerabilities of a system and then produce a negative effect. The attack is an action or a series of actions not permitted by the system’s owner. Figure 5.4, a modification of the cyber attack process diagram presented in [21], illustrates how a cyber attack occurs. The attack is carried out with tools such as packet sniffers, toolkits and keyloggers, among others. These tools enable attackers to exploit ICS vulnerabilities, which exist because of design faults, software weaknesses and problems in the hardware implementation or in the computer system’s
Fig. 5.4 Cyber attack process. Diagram labels: attacker’s tool (user command, script or program, autonomous agent such as viruses and worms, distributed tool, toolkit); computer/network vulnerability; actions (probe, scan, flood, authenticate, bypass, spoof, read, copy, steal, modify, delete); object (logical: data, process, account; physical: computer, component such as router or switch, network); consequence (loss of integrity, loss of availability or DoS, loss of confidentiality, physical destruction).
configuration. Thus, attackers can take actions to affect a component or process and thereby achieve their target [22]. Considering only the ICS, the consequences of a cyber attack can be classified into five main groups as follows:
1. Corruption of information (loss of integrity): data on a computer or network suffers improper modification.
2. Denial-of-service (DoS) (loss of availability): unauthorized persons or systems deny access to authorized users.
3. Disclosure of information (loss of confidentiality): critical information is disclosed to unauthorized persons or systems.
4. Theft of resources (loss of availability): computers or network resources are used by unauthorized entities. This can be a step towards a DoS, although theft of resources is possible without going as far as a DoS.
5. Physical destruction: physical harm or destruction is achieved through the use of the ICS.
The consequences of cyber attacks do not concern the ICS alone; they can also affect the productivity, efficiency and economy of the process for which the ICS is used. As mentioned above, protection and control are vital for correct power system operation. Therefore, one of the major risks for power systems is that a cyber attack prevents the execution of protection or control functions or causes them to execute erroneously. From this perspective, integrity and availability are the two traditional security pillars that matter most for the cyber security of the ICS of power systems. Taking cyber attacks on critical infrastructures into account in security studies is a more sensitive issue today than ever. Attacking by electronic means is one more way to commit crime and terrorism, but in most cases it is cheaper and easier to execute than a physical attack and can be carried out remotely, without being on site. Neither explosives nor weapons nor vehicles are needed.
This type of attack is less risky, because it is relatively anonymous [18]. The attackers’ identity can be hidden in cyberspace, and physical barriers, such as customs or the borders between countries, do not exist. Also, the attacker takes less of a personal risk, since they do not reveal themselves during the attack and the mortality risk is practically zero [23]. In [24], a study for the United States General Accounting Office published in March 2004 with an analysis of the vulnerability of control systems (SCADA) to
cyber attacks, is presented. Such attacks include disrupting the operation of control systems by delaying or blocking the information flowing through the communication links, making unauthorized changes to equipment parameters or supplying false information. Some particularities and trends of the ICS integrated in power systems are sources of cyber vulnerabilities. One example is remote access. Given the size of power systems and the geographical zones where they are located, controlling the system remotely is unavoidable. The links between control centers, substations, generating plants and devices such as RTUs expose the system to remote attacks. Another source of vulnerability is the corporate network, which is often linked to substations and control centers. The objective of this integration is efficiency in essential tasks such as operation planning, maintenance and billing. Besides, control and protection devices are expensive and are traditionally designed to operate for a long time. Most old IC devices include no security measures. Encryption, authentication and intrusion detection are practically non-existent and difficult to implement because of the age of the hardware and the communication channels. However, the age of these IC devices is sometimes helpful, in the sense that old EMS and control systems still use proprietary protocols and operating systems, so attackers have little information or knowledge about their structure and vulnerabilities. By contrast, the operating systems of newer ICS are either Windows or Unix-based. The broad trend for ICS is to move from proprietary protocols and software to commercial solutions. The IEC 61850 standard, for example, is a current effort to set up a single worldwide standard for substation automation communications.
To summarize, today’s ICT-based control systems, used in many critical infrastructures, face an increased risk for several reasons. They use standard technologies with known vulnerabilities. They are connected to other networks, such as the corporate network, which is itself connected to the Internet. There are often non-secure links left open for diagnosis and maintenance, as well as wireless connections. In addition, information on the infrastructure and on these control systems is publicly available. On this last point in particular, it is easy to obtain the manual containing the default password of a piece of equipment, and the password often remains unchanged. The threat of cyber attacks on control systems is real. Attacks of this kind have already been reported, such as the contamination of the private computer network of a nuclear power plant by the Slammer worm. Further information about cyber attacks and ICT failures is presented in Appendix 1. Finally, securing these control systems involves many challenges due to:
• The lack of specialized security technologies for these systems, which are developed with real-time availability rather than cyber security in mind
• The lack of a perceived economic justification for costly security measures
• The organizational conflicts of coordination between computer security staff and the operators of these control systems, who generally belong to different working groups within the same company
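The classification of attack consequences given earlier in this section, together with the tool, action and object stages of Fig. 5.4, lends itself to a small data structure. The following is a minimal sketch: the class and field names are illustrative choices, and the `is_priority` rule simply encodes the statement that integrity and availability are the pillars that matter most for power system ICS. It is not a complete risk ranking.

```python
from dataclasses import dataclass
from enum import Enum


class Pillar(Enum):
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"
    CONFIDENTIALITY = "confidentiality"
    PHYSICAL = "physical"  # kept apart: not one of the three information pillars


# The five consequence groups and the property each one violates.
CONSEQUENCES = {
    "corruption of information": Pillar.INTEGRITY,
    "denial of service": Pillar.AVAILABILITY,
    "disclosure of information": Pillar.CONFIDENTIALITY,
    "theft of resources": Pillar.AVAILABILITY,
    "physical destruction": Pillar.PHYSICAL,
}

# Pillars singled out in the text for protection and control functions.
PRIORITY_PILLARS = {Pillar.INTEGRITY, Pillar.AVAILABILITY}


@dataclass(frozen=True)
class Incident:
    """One pass through the attack process: tool, action, object, consequence."""
    tool: str
    action: str
    target: str
    consequence: str

    @property
    def pillar(self) -> Pillar:
        return CONSEQUENCES[self.consequence]

    def is_priority(self) -> bool:
        return self.pillar in PRIORITY_PILLARS


false_data = Incident("script", "modify", "data", "corruption of information")
eavesdrop = Incident("toolkit", "read", "data", "disclosure of information")
```

Here `false_data.is_priority()` is true because the incident touches integrity, while `eavesdrop.is_priority()` is false under this deliberately narrow rule.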
5.4 Modeling Interdependencies Between the ICS and Electrical Infrastructure of a Power System
As seen throughout this chapter, the electrical infrastructure is heavily dependent on ICT for its operation, both in normal operating conditions and in critical conditions such as peak load or network restoration after a blackout (blackstart). Conversely, all components used by the ICS need to be supplied with electric power. During a power outage, only devices equipped with an Uninterruptible Power Supply (UPS) are likely to operate. Because such power interruptions are relatively exceptional at present, not all of these systems will be available when a failure occurs. Moreover, if the outage lasts several hours, the batteries will gradually run out. Indeed, the design of a UPS is not based on the actual cost of the non-functioning of the equipment it supplies, but rather on the perceived importance of that equipment, which is generally underestimated. This underestimation arises because generally only first-order consequences are considered, while higher-order consequences (i.e. those due to the effects of the first consequences combined with the initial cause of the interruption) are given lower importance at best, or neglected at worst. The electrical infrastructure thus depends on the information flow, and the ICS depends on the availability of electricity. These two dependency relationships, which are highly nonlinear and strongly coupled, ultimately produce interdependencies that are very complex to capture and analyze. As power, telecommunications and information networks have become critical for the functioning of our modern societies, it is very difficult to accept failures, even short ones, in any of these infrastructures. When failures do occur, they can cost human lives, e.g.
due to a traffic accident caused by traffic lights not working at intersections during a power failure, or by an inability to contact emergency services when the telecommunications infrastructure is in trouble. In order to avoid such consequences, which can be even more devastating in the case of a natural disaster, it is necessary to understand these phenomena better. As they are primarily physical, this understanding can be achieved through modeling. Modeling the interdependencies between critical infrastructures is thus the first step towards securing them, and therefore towards securing the main basic services and saving lives. This step is not trivial. The difficulty of the task (due to the aforementioned complexity), combined with the variety of possible goals (minimizing deaths or financial cost, securing one infrastructure or all of them), requires different approaches: theoretical or simulation based, high level across a whole country or local at the scale of a city, and possibly integrating the human factor, in particular its unpredictability. One of the major difficulties of modeling infrastructure interdependencies lies in the deep heterogeneity of the mathematical behavior of the components. From a systems theory point of view, three main components were identified in a power system:
1. The electric network, whose behavior is characterized by differential equations: a continuous sub-system
2. The telecommunication network, whose behavior is characterized by events: a discrete sub-system
Fig. 5.5 Heterogeneous components of a power system and their typical modeling
3. The control center, which issues different commands according to the circumstances and behaves as an expert system with a human in the loop
Figure 5.5 shows these subsystems.
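To make the heterogeneity of the three sub-systems concrete, the following toy sketch puts one of each kind in a single loop. Every modeling choice here is an assumption made for illustration: the grid is reduced to one aggregated swing equation in per unit, integrated with explicit Euler steps; the telecommunication network is a queue of timed command deliveries with a constant delay; and the control center is a single rule that orders load shedding when the frequency drops below 49.5 Hz. The control center is assumed to see the frequency instantly, so only the command path is delayed. All numerical values are hypothetical.

```python
import heapq

F_NOM = 50.0   # nominal frequency in Hz (assumed)
H = 5.0        # aggregated inertia constant in s (assumed)
DT = 0.01      # Euler integration step in s


def simulate(telecom_delay: float, t_end: float = 20.0) -> float:
    """Return the lowest frequency reached for a given command delay."""
    p_gen, p_load = 1.0, 1.0   # per-unit generation and load
    freq = F_NOM
    events = []                # min-heap of (delivery_time, load_to_shed)
    order_sent = False
    lowest = freq
    for k in range(round(t_end / DT)):
        t = k * DT
        # Disturbance: 10 % of the generation is lost at t = 1 s.
        if k == round(1.0 / DT):
            p_gen = 0.9
        # Discrete sub-system: deliver the commands whose delay has elapsed.
        while events and events[0][0] <= t:
            _, shed = heapq.heappop(events)
            p_load -= shed
        # Control center: one rule, fired once.
        if not order_sent and freq < 49.5:
            heapq.heappush(events, (t + telecom_delay, 0.1))
            order_sent = True
        # Continuous sub-system: df/dt = f0 / (2H) * (Pgen - Pload).
        freq += DT * F_NOM / (2.0 * H) * (p_gen - p_load)
        lowest = min(lowest, freq)
    return lowest


fast_link = simulate(telecom_delay=0.1)
slow_link = simulate(telecom_delay=2.0)
```

Because this toy model has no damping or governor response, the frequency keeps falling until the shedding command is delivered, so `slow_link` ends up lower than `fast_link`. The effect of an information delay on the continuous dynamics is exactly the kind of coupling the modeling approaches below try to capture.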
5.4.1 State of the Art of Interdependencies Modeling
This section presents a synthesis of various methods used for modeling interdependencies in critical infrastructures. The use of Petri nets, agent-based modeling and other approaches in this area, together with their advantages and restrictions, is detailed in the following sections.
5.4.1.1 Modeling with Petri Nets
Petri nets are a tool often used to model the interdependencies of infrastructures. A proposal for this use is presented in [25]. The authors developed an interdependency model using an analytical approach based on the incidence matrix and the place-invariants. Relationships and interdependencies between critical infrastructures were identified. The net includes 23 transitions and 33 places for six infrastructures (oil, transportation, electric power, natural gas, water and telecommunication), as presented in Fig. 5.6. Thus, in this net, there are only about five places for each infrastructure. This model is a very high-level description and is not well suited to a finer-grained modeling of infrastructure behavior. (Here, low-level modeling means that the description is complete and simplifications are avoided; high-level modeling, by contrast, describes only some phenomena or characteristics of the studied system.) In [26], a model to identify and analyze common mode faults using “Generalized Petri Stochastic Nets” (GPSN) is proposed. GPSN are an extension of
Fig. 5.6 Petri Nets used to model interdependencies of critical infrastructures [25]
Petri nets. This type of modeling allowed the authors to represent and quantify failures due to interdependencies. In their work, they acknowledge that the model has limitations but state that it performs better than traditional fault tree models because it considers common mode failures explicitly. The restrictions stem from the direct relation between the level of modeling detail and the number of places. In general, the Petri net approach and its derivatives (such as GPSN) have the disadvantage of leading to non-trivial graphs even for simple systems. For instance, [27] presents a method based on Petri nets to analyze the interactions between electrical and communication networks. In this model, a simple line with two switching devices and a control center is represented by the Petri net with 17 places and 23 transitions shown in Fig. 5.7. In short, Petri nets are not recommended for the detailed modeling of real, large, interconnected critical infrastructures because the number of places explodes. These nets can be used to model only a part of the infrastructure, or to model the complete system at a high level, i.e. without going into further detail.
5.4.1.2 Agent Based Modeling
A second commonly used approach is the Agent-Based Modeling (ABM) method. When the model is run as a computer simulation, the approach is called ABS (Agent-Based Simulation) or ABM&S (Agent-Based
Fig. 5.7 Petri Nets to analyze the interactions between electrical and communication networks [27]
Modeling and Simulation) [28]. This type of modeling has been widely used in social sciences and ecology, where it is usually called Individual-Based Modeling (IBM) [29, 30]. The approach is inherently distributed and bottom-up, using a group of intelligent agents interconnected with one another. The definition of an agent varies with the field of study and the author. Nevertheless, a more precise definition emerges from the various ones: an agent is an autonomous system (software and/or hardware) located in an environment (with other agents) and acting on it to pursue its own objectives. This approach has several advantages over conventional modeling techniques, as described in [31, 32]. First, there is no need to develop a high-level model to describe the behavior of a complex infrastructure. Instead, the starting point is the relatively simple behavior of the different low-level components, which are allowed to cooperate. The complex, high-level behavior then emerges by itself from this cooperation. Second, the model is modular. Each agent incorporates its own model (complex algorithm, Markov chains, etc.), which may differ for each type of agent in the same environment. A third advantage lies in the inherently distributed nature of the approach, which makes it easy to distribute the computation over multiple processors if the need arises during the simulation. In conclusion, the three keywords of ABM are object, emergence and complexity. The development of a distributed simulation environment for electrical and communication networks based on agent-based simulation is described in [33].
Research works [34, 35] describe the agent-based modeling and simulation of interdependent critical infrastructures illustrated in Fig. 5.8. This simple example is based on a real system composed of three infrastructures: air conditioning, communication and electrical power supply. The model consists of eight agents. Arrows model the dependency relationships between agents. A solid arrow indicates that an agent needs the output of another agent to perform its activities. Dotted arrows model the propagation of faults along these physical linkages. Intelligent software agents applied to the integration, modeling and simulation of critical infrastructures are presented in [36]. In [37], a model combining game theory and agent-based simulation for critical infrastructure modeling is also proposed. A methodology for studying the interdependencies in critical infrastructures with an implementation of ABM&S is illustrated in [38] on a simple example, the information system for emergency management. In general, the ABM method seems well suited to modeling infrastructures composed of actors that each have their own objectives. Nevertheless, some interconnected critical infrastructures share their objectives, e.g. the power grid and its associated ICS are made up of various components that all contribute to the common goal of supplying electrical energy to final customers. These components are not independent actors with objectives of their own, and modeling by ABM then appears less pertinent.
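The bottom-up idea behind such a model can be sketched with a handful of agents that each apply one local rule. This is not the eight-agent model of [34, 35]: the four agents, their names and their dependency links below are hypothetical, chosen only to echo the three infrastructures of the example (power supply, air conditioning and communication).

```python
class Agent:
    """Hypothetical infrastructure component that needs its suppliers' output."""

    def __init__(self, name):
        self.name = name
        self.suppliers = []
        self.operational = True

    def needs(self, *suppliers):
        # Dependency link: this agent consumes the output of each supplier.
        self.suppliers.extend(suppliers)

    def step(self):
        """Local rule: stop working if any supplier is down. True on change."""
        if self.operational and any(not s.operational for s in self.suppliers):
            self.operational = False
            return True
        return False


def run(agents):
    """Let every agent apply its rule until nothing changes any more."""
    changed = True
    while changed:
        changed = False
        for agent in agents:
            if agent.step():
                changed = True
    return sorted(a.name for a in agents if not a.operational)


grid = Agent("power_supply")
cooling = Agent("air_conditioning")
server = Agent("communication_server")
router = Agent("communication_router")
cooling.needs(grid)
server.needs(grid, cooling)
router.needs(server)

grid.operational = False          # initial fault in the electrical supply
failed = run([router, server, cooling, grid])
print(failed)
```

No agent knows the overall topology, yet the cascade across all three infrastructures emerges from the local rules. A richer model would give each agent its own internal behavior, for example a backup supply with limited autonomy.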
Fig. 5.8 Example of ABM for three infrastructures [34]
ABM is appropriate for including human behavior in representations of critical infrastructures, e.g. the decision making of an operator in a power system. The method is less suitable for modeling only the “technical” functioning of infrastructure components.
5.4.1.3 Other Modeling Approaches
Other concepts also appear in the literature, such as the use of social network techniques, reliability modeling and modeling based on supply-demand graphs [39–42]. There are also modeling and simulation approaches for critical infrastructures based on a Geographic Information System (GIS) [43, 44]. The drawback of this last approach is that it requires accurate and complete data on the studied zone. Besides, the size of the zone is limited, and for real large critical infrastructures, e.g. at the continental scale, modeling by this method becomes difficult. The use of a database of accidents affecting critical infrastructures to highlight their vulnerabilities and interdependencies is presented in [45]. The approach can be useful for determining the most frequent or most severe vulnerabilities and consequently for reducing them. However, because this method is based on historical data, new vulnerabilities cannot be identified. In [46], the modeling of critical infrastructures is coupled with a genetic algorithm for the purpose of infrastructure planning. A study of the propagation of failures in critical infrastructures is made in [47] using the theory of fuzzy sets to quantify the links of
failure propagation between infrastructures. At present it is difficult to judge the usefulness of this approach on large systems, as only a very simple case was studied. Finally, two other existing and exciting approaches are described in detail in the following sections: the behavioral simulation of infrastructures and the use of the theory of complex networks. Behavioral simulation is a time-dependent simulation of the studied infrastructures. Many models and simulators already exist for each infrastructure. However, in the context of interdependent infrastructures, simulating each infrastructure individually does not necessarily provide accurate and complete results, especially when the interdependencies are strong and the risk of cascading effects between coupled infrastructures is high. It is therefore advisable to design an integrated model, within a single simulator, that is able to take several infrastructures into account. A second approach often used in the study of large networks is the theory of complex networks. This is a new field, applied to many different infrastructures, that can be helpful when analyzing large interconnected systems.
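Returning to the Petri net formalism of Sect. 5.4.1.1, the roles of the incidence matrix and of place invariants can be shown on a very small net. The net below is a hypothetical example built for this sketch (a control center that opens one breaker over a communication link); it is not the net of [25] or [27]. The incidence matrix C has one column per transition, firing transition t changes the marking by that column, and a weight vector y is a place invariant when y · C = 0, so that the weighted token count stays constant.

```python
# Hypothetical net: a control center opens one breaker over a telecom link.
PLACES = ["breaker_closed", "breaker_open", "command_in_transit", "control_idle"]

# Tokens consumed (PRE) and produced (POST) by each transition.
PRE = {
    "send_open": {"control_idle": 1},
    "execute_open": {"command_in_transit": 1, "breaker_closed": 1},
    "reclose": {"breaker_open": 1},
}
POST = {
    "send_open": {"command_in_transit": 1},
    "execute_open": {"breaker_open": 1, "control_idle": 1},
    "reclose": {"breaker_closed": 1},
}

# Incidence matrix stored column by column: C[t][i] = POST - PRE for place i.
C = {
    t: [POST[t].get(p, 0) - PRE[t].get(p, 0) for p in PLACES]
    for t in PRE
}


def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= n for p, n in PRE[transition].items())


def fire(marking, transition):
    """Return the new marking M' = M + C[:, t]; refuse disabled transitions."""
    if not enabled(marking, transition):
        raise ValueError(f"{transition} is not enabled")
    return {p: marking[p] + C[transition][i] for i, p in enumerate(PLACES)}


def is_place_invariant(weights):
    """weights is a place invariant when weights . C[:, t] == 0 for every t."""
    return all(
        sum(w * c for w, c in zip(weights, column)) == 0
        for column in C.values()
    )


m0 = {"breaker_closed": 1, "breaker_open": 0,
      "command_in_transit": 0, "control_idle": 1}
m1 = fire(m0, "send_open")
m2 = fire(m1, "execute_open")
```

The vectors [1, 1, 0, 0] and [0, 0, 1, 1] are place invariants of this net: the breaker is always either open or closed, and the control center is always either idle or waiting for its command to arrive. Even this single-breaker example needs four places, which gives a feel for why the number of places grows quickly with the level of detail.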
5.4.2 Multi-infrastructure Simulator for Interdependencies Studies

In [48], the authors present a survey of projects developed in Europe and the USA to simulate the interactions between critical infrastructures. Among the 30 simulators listed, only five take into consideration both the electrical power grid and the ICT infrastructure. G2Elab at the Grenoble Institute of Technology is one of the few laboratories that have developed a combined simulator for inter-infrastructure studies [49]. This section details the main steps and the methodology used to build a combined simulator that models, in a common simulation tool, an electrical power network and the corresponding ICT infrastructure needed to monitor and control it. The focus of the study is the power grid, and the purpose is to better understand its interactions with the ICT infrastructure. The interdependencies between the electric network and the ICT infrastructure are caused mainly by two types of flows: energy and information. The electric network depends on the data flowing through the telecommunication infrastructure and on the commands received from the control center. The focus is therefore mainly on the phenomena caused by information interdependencies. As mentioned earlier, the difficulty of modeling infrastructure interdependencies lies in the deep heterogeneity of mathematical behavior. A coupled simulator has to be able to show the interactions between the three families of components of a power system (see Fig. 5.5). The choice was made to use single-infrastructure simulators and connect them together in a unitary tool. In order to build a robust tool to illustrate the behavior of a multi-infrastructure system, the following steps were pursued. The first one consisted of identifying the main mechanisms of the interdependencies and the mathematical behavior
5 ICT and Powers Systems: An Integrated Approach
of the studied sub-systems (see Fig. 5.5). These observations helped to choose suitable tools able to simulate the behavior of each individual infrastructure with compliant models (from the point of view of their validity domain), which was the second step of the approach. The third step was to build the combined simulator: a coupling of individual simulation tools through inter-process communication. A fourth step, essential for inter-infrastructure simulation, consisted of building a multi-infrastructure benchmark. Simulations of different scenarios were run and the resulting tool was validated on behavioral principles.

5.4.2.1 The Electric Network/Infrastructure

The purpose is to illustrate scenarios inspired by real blackouts, so medium- and long-term stability aspects (durations from one minute up to one day) were mainly taken into account. Thus, the level of detail does not reach electromagnetic transient phenomena. A time-domain simulation software based on the integration of differential-algebraic equations was chosen. PSAT (Power System Analysis Toolbox) is an open source tool able to perform time-domain simulations in the Matlab environment [50]. It gives the user the possibility to modify the source code. New models were also implemented, such as load variations following real load tendencies, line overload protections and automatic under-frequency load shedding, among others.

5.4.2.2 The Telecommunication Network/Infrastructure

The goal was to simulate the behavior of the external telecommunication infrastructure. A “black box” approach was used for the modeling, in which the events are packets entering and exiting the network. The principle is called “discrete-event simulation” and it illustrates the behavior of discrete systems.
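As a rough illustration of the event-driven principle described above, the sketch below moves packets through a single link with a fixed service time. It uses only the Python standard library rather than SimPy, and the function name, the single-link model and all numeric values are assumptions chosen for illustration; they are not taken from the simulator of [49].

```python
import heapq

def simulate_link(arrival_times, service_time):
    """Return the exit time of each packet crossing one FIFO link.

    Hypothetical model: a single link serves one packet at a time with a
    constant service time. Events are tuples (time, kind, packet_id).
    """
    events = [(t, "enter", i) for i, t in enumerate(arrival_times)]
    heapq.heapify(events)
    link_free_at = 0.0
    exit_times = {}
    while events:
        time, kind, pid = heapq.heappop(events)  # next event in time order
        if kind == "enter":
            start = max(time, link_free_at)      # wait if the link is busy
            link_free_at = start + service_time
            heapq.heappush(events, (link_free_at, "exit", pid))
        else:
            exit_times[pid] = time
    return [exit_times[i] for i in range(len(arrival_times))]

# Three packets arriving at 0.0 s, 0.1 s and 0.5 s (illustrative values)
exits = simulate_link([0.0, 0.1, 0.5], service_time=0.3)
```

The simulation clock jumps from one event to the next instead of advancing in fixed steps, which is what distinguishes this principle from the time-domain integration used for the electric network.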
The components of the network are measurement concentrators, routers, the links between them and Remote Terminal Units (RTUs), which give set points to power components (they can command the turbine governors of generators, the alternator excitations and the switches in the electric network). The priority criteria for the choice of the simulation tool were basic features and simplicity of use. SimPy is “an object oriented, process based discrete-event simulation language which rely on standard Python” [51].

5.4.2.3 The Control Center/Information Infrastructure

The aim was to simulate a complete response loop. Once the data has been gathered through the telecommunication infrastructure, the control center needs to provide suitable commands, according to the circumstances, to keep the electric parameters within acceptable limits. It receives frequency and voltage alarms
[Figure 5.9: block diagram of the combined simulator. Electric network (PSAT, Matlab 1): electric network architecture, real load curve, line protection, load shedding, primary voltage and frequency regulation. Telecommunication network (SimPy, Python): telecommunication network architecture, telecommunication component parameters, perturbation file (modification of the functioning state of various telecommunication components), Gaussian noise on measurements, data acquisition. Control center (Matlab 2): input data characterizing every electric equipment, alarm treatment and control (secondary frequency and voltage regulation). The blocks are coupled by IPC through Perl.]
Fig. 5.9 The combined simulator
from the measurement system and sends new references to the generator RTUs, according to the specific situation. Matlab was chosen to simulate the control center. It is important to underline that this is a different process from the one in which the electrical network is computed.

5.4.2.4 Inter-process Communication

In order to couple the three dedicated simulators into one unitary tool, inter-process communication was used. The implementation depends strongly on the operating system, but it is possible to choose an approach that uses the same concept on all platforms. The basic “named pipe” concept was used to implement the inter-process communication (IPC). On Portable Operating System Interface (POSIX) systems this raises no difficulty, and named pipes can easily be integrated with Python or Matlab. On the Windows platform, Matlab cannot manage them directly. The integration was therefore made possible by using Perl, an interpreter that can manage Windows named pipes and is provided with Matlab. Figure 5.9 depicts the scheme of the combined simulator, made of the three individual simulation tools coupled through inter-process communication pipes. The major difficulty is to preserve the synchronization of the different infrastructures [49].

5.4.2.5 The Benchmark

A case study involving an educational test network was also developed. It consists of a multi-infrastructure model that describes scenarios together with user-programmed functions. It was used as a first input for the combined simulator. Figure 5.10 presents the scheme of the proposed test case. For the electrical infrastructure, the IEEE 9-bus benchmark was used. It was enriched by equipping the generators with turbine governors and automatic voltage regulators, by paralleling the lines and by implementing line protection functions. Real load curves were imposed for the consumption.
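The named-pipe coupling of Sect. 5.4.2.4 can be sketched, for the POSIX case only, as follows. This is a minimal illustration and not the implementation of [49]: a writer thread stands in for one simulator and the reading side for another, and the pipe name, message format and function name are all hypothetical.

```python
import os
import tempfile
import threading

def exchange_over_fifo(messages):
    """Send text lines through a POSIX named pipe and return what was read.

    Hypothetical stand-in for two coupled simulators: the writer thread
    plays one simulator, the reading side plays the other.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "sim_pipe")  # hypothetical pipe name
    os.mkfifo(fifo_path)  # POSIX only; not available on Windows

    def writer():
        # open() blocks until the reading side has opened the pipe as well
        with open(fifo_path, "w") as pipe:
            for msg in messages:
                pipe.write(msg + "\n")

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        with open(fifo_path, "r") as pipe:
            # Iteration ends at end-of-file, i.e. when the writer closes
            received = [line.rstrip("\n") for line in pipe]
    finally:
        thread.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return received

# Illustrative measurement messages (time stamp and frequency)
received = exchange_over_fifo(["t=1.0 f=49.8", "t=2.0 f=49.9"])
```

Because opening and reading a pipe block until the other side is ready, the pipe itself provides a simple form of the synchronization that the text identifies as the major difficulty.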
[Figure 5.10: scheme of the test case. Telecommunication layer with RTU 1 to RTU 3, CM 1 to CM 3 and routers (ROUTER 1, ROUTER 2, ROUTER 11, ROUTER 22), connected to the control center (state estimator, alarm treatment, voltage and frequency adjustment), on top of the numbered buses of the electric network.]
Fig. 5.10 The test case
5.4.2.6 Applications

The coupled simulator can be used for different inter-infrastructure purposes. The present version can mainly be employed to measure the impact of various scenarios on the electric network and its ICT operation infrastructure. The events that can be simulated include load variations associated with contingencies in the electric network, failures in the telecommunication infrastructure, and even a breakdown of the control center. Building multi-infrastructure simulation tools and multi-infrastructure test cases is an important step towards a better understanding of critical infrastructures. The coupled simulator is a flexible, extensible and highly modular simulation tool that can be adapted to new needs.

5.4.2.7 Modeling Interdependencies by Complex Networks

The previous approach, based on behavioral simulation, is useful for understanding some of the interdependency phenomena in multi-infrastructure systems. However, this methodology is not sufficient to grasp them in their totality, and a “theoretical” approach appears to be necessary. The aim is not to state that a theoretical approach is better or worse than the previous one, but rather to consider the two as complementary. This double vision allows progress in the analysis and understanding of the behavior of large interconnected systems. Among the various possible theoretical approaches, apart from those outlined in the state of the art section, the theory of complex networks seems to be very promising [52].
Indeed, critical infrastructures are large networks, hence the idea of using the concept of graphs. Complex network theory is an extension of graph theory; it is relatively new because it relies fully on the computing resources and large databases available nowadays [53]. Several documents give a detailed overview of this field [52, 54–56]. A graph, or (complex) network, is composed of edges (also known as connections, links, arcs or lines), each connecting two vertices (also called sites or nodes); the arcs can be directed or not. Depending on the graph, loops (arcs whose two ends are attached to the same vertex) and multiple arcs may or may not exist. In addition, weights can be associated with arcs and/or nodes. The generality of this definition enables complex network theory to be applied to areas as diverse as mathematics, physics, biology, computer science and sociology. To start with, the theory of complex networks focuses on a static or statistical characterization of networks. It concerns the computation of quantities typifying the topology of these networks, such as the degree of the nodes, the distribution of these degrees, the correlations, the diameter, the characteristic length of the graph, the average degree and the clustering coefficient, among others. In addition to this purely static characterization, there is also the issue of modeling dynamic phenomena that can occur in networks. In particular, the question of resilience to failures and to attacks arises. Specifically, results show that a type of network frequently found in practice, the so-called scale-free network, has a high resistance to random failures but is vulnerable to targeted attacks [57]. Another dynamic phenomenon studied is the vulnerability of complex networks to attacks based on cascading failures, detailed in [58, 59].
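Some of the static quantities listed above can be computed directly from an adjacency structure. The sketch below does so for a small undirected toy graph; the graph, the function names and the choice of the local clustering coefficient are assumptions made for illustration only.

```python
from collections import Counter

def degree_stats(adj):
    """Return node degrees, the average degree and the degree distribution."""
    degrees = {node: len(neigh) for node, neigh in adj.items()}
    average = sum(degrees.values()) / len(degrees)
    distribution = Counter(degrees.values())  # degree -> number of nodes
    return degrees, average, distribution

def clustering_coefficient(adj, node):
    """Local clustering: fraction of neighbor pairs that are themselves linked."""
    neigh = sorted(adj[node])
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neigh[j] in adj[neigh[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected graph stored as adjacency sets
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
degrees, average_degree, distribution = degree_stats(adj)
c_a = clustering_coefficient(adj, "A")  # only B-C is linked among A's neighbors
```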
It may also be interesting to apply graph partitioning methods to critical infrastructures. Partitioning consists of splitting a graph into two or N separate parts with a minimum number of line cuts. Such a study can help to identify areas of weakness in a given network. The Internet, the World Wide Web, and collaboration networks of scientists and film actors are the networks most frequently analyzed with this theory, in particular because of the existence of associated databases. However, the electrical infrastructure has also been analyzed as a complex network in [60]. Graph models have been used to analyze the vulnerability of such networks to intentional or random attacks consisting of the deletion of nodes [61]. The authors of [62] proposed models of cascading failures in the North American transmission grid. This approach is also described in [63] as a new method to model cascading failures in power grids. The paper [64] presents an analysis and comparison of different types of models of cascading faults in electricity networks. In addition, [65] proposes an identification of vulnerable lines in electrical networks based on the theory of complex networks. Finally, models for studying the interactions of critical infrastructures as complex systems are described in [66], and an analysis of the associated risks is given in [67]. The described methods provide remarkable results for a graph that models a single infrastructure. Moreover, taking into account the size of the network, these methods are relatively fast compared to classical time-domain simulations. In particular, these techniques and models are “simple” and, in some cases, allow mathematicians to find analytical solutions for different classes of graphs. The model can be refined as needed as long as only (top-level) simulation is concerned.
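The node-deletion analyses mentioned above can be illustrated by comparing a targeted attack with a random failure on a toy graph. The sketch below measures the size of the largest connected component after deletion; the graph, the function names and the use of this particular metric are assumptions for illustration and do not reproduce the models of [60–65].

```python
import random

def largest_component(adj, removed):
    """Size of the largest connected component after deleting some nodes."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                      # depth-first traversal
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def targeted_attack(adj, k):
    """Pick the k nodes of highest degree (ties broken by name)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]

def random_failure(adj, k, seed=0):
    """Pick k nodes uniformly at random, reproducibly."""
    return random.Random(seed).sample(sorted(adj), k)

# Hypothetical hub-and-spoke graph: hub H, five leaves, one leaf-to-leaf link
adj = {
    "H": {"a", "b", "c", "d", "e"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
    "d": {"H"},
    "e": {"H"},
}
intact = largest_component(adj, [])
after_attack = largest_component(adj, targeted_attack(adj, 1))
after_random = largest_component(adj, random_failure(adj, 1))
```

In this toy case, deleting the hub fragments the graph, whereas deleting any single leaf leaves the remaining nodes connected, which mirrors qualitatively the behavior reported for scale-free networks [57].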
Fig. 5.11 Illustration of failure propagation in a two-layer system [69]
However, although these methods are applicable to many different infrastructures, at present they are only applied to a single infrastructure at a time. An attempt to overcome this limitation is the use of layered complex networks, described in [68, 69]. These can take into account the physical infrastructure (low level) and the logical infrastructure (high level). This distinction can be found, for example, in transmission and communication networks, and it is illustrated in Fig. 5.11. The interconnected system is divided into two layers, i.e. into two graphs. In this illustration, a single failure in the physical layer leads to three separate failures in the logical layer (see Fig. 5.11a). The state of the system after the failure is shown in Fig. 5.11b. Dashed lines represent additional links in a “full rerouting” policy. Nevertheless, this approach is not yet sufficient for separate infrastructures. Research work is currently underway at G2Elab to extend this modeling to interdependencies between multiple heterogeneous infrastructures, especially the power grid and its associated ICS.
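The propagation from the physical to the logical layer can be sketched by mapping each logical link onto the physical links that carry it. The mapping, the link names and the function below are hypothetical and only illustrate the idea; rerouting policies such as the “full rerouting” of [69] are not modeled.

```python
def failed_logical_links(mapping, failed_physical):
    """Return the logical links whose physical route uses a failed link.

    mapping: logical link name -> list of physical hops (pairs of nodes).
    failed_physical: list of failed physical links (pairs of nodes).
    Physical links are treated as undirected.
    """
    failed = {frozenset(link) for link in failed_physical}
    return sorted(
        name
        for name, route in mapping.items()
        if any(frozenset(hop) in failed for hop in route)
    )

# Hypothetical two-layer system: four logical links over physical nodes p1..p4
mapping = {
    "L1": [("p1", "p2"), ("p2", "p3")],
    "L2": [("p2", "p3")],
    "L3": [("p3", "p2"), ("p2", "p4")],
    "L4": [("p1", "p4")],
}
broken = failed_logical_links(mapping, [("p2", "p3")])
```

Here a single physical failure takes down three logical links at once, because all three are routed over the same physical link.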
5.4.3 A Comparison of the Studied Modeling Approaches

In the previous sections, an overview of diverse approaches to modeling the power grid and its ICS was presented. The different methods have both advantages and weaknesses, as explained in each section. It is almost impossible to state the supremacy of any approach because, in most cases, they do not compete with but complement each other. Briefly, Petri nets and GSPN are suitable when low-level modeling is required, but in that case they cannot model large systems entirely. Petri nets are therefore appropriate for modeling a fraction of the system in depth, or for modeling the whole system at a very high level. ABM is suitable if humans must be included in the model. The last two approaches explained in the previous sections are studied at G2Elab. The coupled simulator can model large systems at a low level, so dynamic studies can be performed. The different simulators must be synchronized in order
to model correctly the behavior of the whole system. One of its main advantages is that it can be expanded by integrating more infrastructure components. The limit of this expansion is linked to software restrictions. The results are accurate and can be used to provide quantitative indices that represent the operating state of the power system. The second approach, i.e. complex networks, seems adequate for modeling very large systems. This modeling is based principally on network topology; however, some basic physical phenomena related to the infrastructures can be included. For instance, when the electrical network is modeled, the authors believe that load flows or even differential equations can be included in the arcs. Nevertheless, this approach is high-level modeling compared to the combined simulator; given its simplicity, its calculation time is shorter than that of an equivalent dynamic simulation. Moreover, most security studies require a large number of simulations. In that sense, complex networks appear more advantageous than the combined simulator.
5.5 Security Assessment with Regard to Cyber Attacks and ICT Failures

The extensive use of ICT in power grids brings out the notions of risk, vulnerability and interdependency. A better understanding of inter-infrastructure phenomena and of the vulnerabilities due to the interdependence between the power network and the ICT infrastructure is useful for assessing the security of the power system. Nevertheless, there is a lack of appropriate methodologies for security assessment with regard to ICT failures and cyber attacks on critical equipment that include these aspects of risk and vulnerability. The modeling approaches of the previous section help to understand this interdependency. Therefore, in order to respond to the economic and technical challenges imposed on power systems, researchers should focus on this type of analysis. The following sections present new paths for this research area.
5.5.1 Precedence Graphs and Modified FMECA Approach to Assess Criticality of SCADA/EMS and DMS Functions

In order to provide coherent and proper security for critical infrastructures, and especially for the software functions used by operators, priorities among them must be set. This requires a better understanding of the cascading effects of ICT failures on power systems. A hierarchy of criticality for the different components of the studied systems (power and ICT components) is recommended [70]. The term criticality appears in disciplines such as physics, nuclear engineering and industrial management, among others. In physics, the notion of “self-organized criticality” refers to a type of dynamic system whose behavior tends to evolve towards critical points. The concept was initially inspired by the dynamics of sand particles and by avalanches [71]. In nuclear engineering, criticality refers to the potential of an accident to cause chain reactions that can lead to major damage to the nuclear power plant [72]. In industrial management, the definition of criticality is quantitative. Its purpose is to find the “weak link” in an industrial process. In FMECA (Failure Mode, Effects, and Criticality Analysis), criticality is the product of the fault frequency, the fault severity and the fault non-detection probability [73]. In the context of critical infrastructure security (the power system plus its own ICT), criticality quantifies the extent to which the failure of a given function or component of the power system and ICS can lead to a major outage [74].

5.5.1.1 FMECA

FMECA aims to find the components of a system whose failures are frequent, can have severe consequences and are unlikely to be detected. In order to compute criticality according to the criteria provided by industrial management, the fault frequency (F), severity (S) and non-detection probability (D) first have to be known for each component of the studied system [75]. Classical FMECA is an empirical approach in which each factor is given a value between 1 and 10, 10 being the worst possible case (see Table 5.1). Computing criticality for the applications used in control centers to supervise and coherently control the electric network needs an adapted methodology for assigning values to each of the factors concerned. Figure 5.12 presents a deterministic approach to quantify the criticality of SCADA/EMS and Distribution Management System (DMS) functions. The first tool necessary for the approach is a benchmark made of electric, telecommunication and control center components that run as a whole unitary system. Another necessary tool is a combined simulator. For each component of the control center, the fault frequency can be computed by using software reliability methods and input data from reliability reports [76].
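The classical product C = F × S × D on the 1 to 10 scale can be sketched as below. The component names and scores are purely hypothetical values invented for the example; they are not results from [70] or [75].

```python
def criticality(f, s, d):
    """Classical FMECA criticality C = F * S * D, each factor on a 1..10 scale."""
    for name, value in (("F", f), ("S", s), ("D", d)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must lie between 1 and 10, got {value}")
    return f * s * d

def rank_by_criticality(scores):
    """Return (criticality, component) pairs, most critical first."""
    ranked = [(criticality(*fsd), name) for name, fsd in scores.items()]
    return sorted(ranked, reverse=True)

# Hypothetical (F, S, D) scores, for illustration only
scores = {
    "State estimation": (5, 5, 10),
    "Load forecasting": (5, 1, 5),
    "Data acquisition": (1, 10, 5),
}
ranking = rank_by_criticality(scores)
```

With these invented scores, a function with moderate frequency and severity but no detection possibility ranks above a rare but severe one, which illustrates the weight that the non-detection factor carries in the product.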
Software reliability is a discipline descended from classical reliability that takes into consideration the conceptual differences between hardware and software. Reliability reports list the registered failures of software and the time at which each failure occurred. Severity can be determined by using coupled simulators such as those presented earlier in this chapter. In order to better understand the severity of cascading failures, not only the impact on the electric infrastructure is considered; the whole coupled infrastructure, made of the power grid and the ICS components, is studied. This approach can answer questions such as “How serious is the fact that the electric network is in normal state but the operator can’t operate it?” In practical reliability reports, no numeric correlation is made between the faults of dispatching functions/applications (for instance State Estimation, Load Frequency Control, etc.) and their impact on the electric network [70]. The non-detection probability is the probability of failure of the detection components. Its computation is the most problematic, as information on this aspect is rather poor for operation applications. Case-study inventories of different SCADA/EMS and SCADA/DMS architectures have shown self-surveillance tools included by different control center providers. It is important to keep in mind that the most suitable way to reduce criticality is not to diminish the severity of failures but to invest in powerful tools meant to detect them in due time. Once all the factors are computed, they are multiplied, and the result gives the criticality of each studied component. Adapted FMECA for electric network software management applications requires a complex procedure. This is why other approaches must be considered.

Table 5.1 Classic FMECA [75]
Value  F: Frequency  S: Severity                         D: Non-detection probability
10     Permanent     Human death                         No detection possibilities
5      Frequent      Financial or material consequences  A detection system exists but it is not infallible
1      Rare          Not serious                         The detection system is infallible

[Figure 5.12: flowchart. Inputs: dispatching functions benchmark, telecommunication benchmark, electric benchmark, reliability reports (number and detection moments of bugs), simulation software, fault detection systems. Computations: F = fault frequency; S = severity, as a function of the electric network state, the telecommunication network state and the control center state; D = fault non-detection probability. Output: C = F × S × D, criticality ordering/hierarchy.]
Fig. 5.12 Adapted FMECA for SCADA/EMS/DMS applications

5.5.1.2 Precedence Graphs

The core of the concept of criticality refers to systems whose components are deeply interdependent. It is mainly data flows that cause the interdependencies between the electric network and the ICT infrastructure. From the point of view of system theory, the control center is an expert system. Data flows from one software application to another in order to produce a reaction that maintains the power system’s state within its operating limits.
[Figure 5.13: flowchart. Software applications inventory → Precedence graph → Succession indices → Criticality hierarchy.]
Fig. 5.13 Criticality hierarchy procedure for SCADA/EMS, SCADA/DMS functions
A simplified approach to the criticality hierarchy is based on the data flow interdependencies between SCADA/EMS and SCADA/DMS applications. Based on the definitions of criticality, criticality was taken to be the potential of a component to cause a cascade reaction. More precisely, the more critical an application is, the more applications depend directly or indirectly upon it [70]. Graph theory and database theory provide a way to model this type of interdependency, where a chain of applications take as input the output of their predecessors. A precedence graph is a directed, acyclic graph in which nodes represent sequential activities and an arc from node i to node j requires that activity i complete before activity j can start [77]. Precedence graphs are used in database theory in the context of concurrency control [78], where the conflict test consists of finding the cycles in the precedence graph. Figure 5.13 presents a four-step procedure meant to provide the criticality hierarchy of the software applications that sustain the proper functioning of a control center. The first step is to make an inventory of the considered functions by listing the name, the necessary input data, the output data and the methodology of each application. In the second step, the precedence graph is built from the information in the inventory. The third step consists of running a modified depth-first algorithm whose purpose is to find the number of applications directly and indirectly dependent on each studied function. The output of the algorithm is the list of succession indices for all inventoried applications. The fourth and last step gives the criticality hierarchy by considering the succession indices. Figure 5.14 depicts a case study for a SCADA/DMS. The inventory consisted of 24 applications, listed in Table 5.2. Each function was modeled as a finite state machine that reads input, executes an algorithm and writes output.
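The third and fourth steps of the procedure can be sketched with a plain depth-first traversal that counts, for each application, the applications reachable from it. This is an assumption about what a succession index measures, based on the description above; it is not the modified algorithm of [70], and the four-application graph is a hypothetical toy example rather than the case study of Fig. 5.14.

```python
def succession_indices(successors):
    """Count, for each node, the nodes that depend on it directly or indirectly.

    successors: node -> list of nodes that consume its output.
    A visited set keeps the traversal finite even if the graph has cycles.
    """
    indices = {}
    for start in successors:
        seen = set()
        stack = list(successors[start])
        while stack:                       # iterative depth-first traversal
            node = stack.pop()
            if node in seen or node == start:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        indices[start] = len(seen)
    return indices

def criticality_hierarchy(indices):
    """Order nodes by decreasing succession index (ties broken by name)."""
    return sorted(indices, key=lambda n: (-indices[n], n))

# Hypothetical data-flow graph between four applications
successors = {
    "acquisition": ["model", "estimation"],
    "model": ["estimation", "load_flow"],
    "estimation": ["load_flow"],
    "load_flow": [],
}
indices = succession_indices(successors)
hierarchy = criticality_hierarchy(indices)
```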
The input data, the output data and the methodology used are the information necessary to represent the nodes and the branches of the precedence graph [79]. Based on this case study and the procedure presented above, the most critical functions are the Network Model (node 2), Data Acquisition (node 1) and Relay
[Figure 5.14: graph with 24 numbered nodes, one per function listed in Table 5.2.]
Fig. 5.14 Case study: precedent graph for criticality hierarchy for SCADA/DMS functions
Table 5.2 Case study: list of SCADA/DMS functions
Label  Function name
1      Data acquisition
2      Network model
3      Topology analyzer
4      Load flow
5      Fault calculation
6      Reliability analysis
7      State estimation
8      Checking of breakers and fuses capacity
9      Performance indices
10     Thermal monitoring
11     Volt control
12     Relay protection
13     Supply restoration
14     Optimal reconfiguration
15     Load forecasting (short, medium and long term)
16     Capacitor placement
17     RTU placement
18     Network development
19     Network reinforcement
20     Maintenance scheduling
21     Large area restoration
22     Load shedding
23     Security assessment
24     Fault management
Protection (node 12). The function included in the largest number of cycles is the State Estimator (node 7). Criticality and risk assessment for multi-infrastructure systems need adapted concepts, definitions and models for these joint systems. The adapted solutions should take into consideration the mathematical behavior and the flows causing the interdependencies between the infrastructures involved.
5.5.2 Using Bayesian Networks for Security and Risk Assessment

This last approach concerns power system security assessment with regard to potential cyber attacks against the ICS of the power system. Evaluating the impact of cyber attacks on the ICS is not an easy task, because of the uncertainty in modeling this type of intentional act and their uncertain impact on the system’s performance. Because cyber attacks are intentional, their occurrence does not exhibit a random character. Thus, answering the following types of questions becomes difficult: What is the probability that these attacks materialize? What is the probability of a catastrophic impact? The answer requires thinking about the attackers’ motivation, the vulnerabilities that may be exploited, the lack of protection, the generation (age) of the ICT, and the criticality of the functions performed by the attacked technology, among numerous other factors. All these variables carry different levels of uncertainty [18]. Moreover, in this problem, different infrastructures can be affected, finally leading to damage in power system operation. Bayesian networks appear to offer the possibility of including in a single model the different types of uncertainty present in the problem, and of modeling the interdependencies between the different infrastructures.
5.5.2.1 Generalities: Bayesian Networks

In power systems, uncertainty influences models, designs and decisions in both the technical and the economic domain. In the last three decades, much of the research on power system security has tried to include uncertainty. Specifically, sources of uncertainty in security assessment are power system models and parameters, measurements, external events that affect infrastructures, predictions and forecasts, and human intervention in the control loop. Until now, probability has been the most common model for representing uncertainty. Inferences are also made, based on data, experience and a priori reasoning, in order to obtain more extensive information that is not directly observed. In probabilistic inference, probability models the uncertainty, and the joint probability distribution function describes the dependency relationships between variables/facts. Bayesian networks are one of the most common graphical models for probabilistic inference. They are employed to understand, and to draw conclusions about, the interdependency relationships in models. A Bayesian network is a directed acyclic graph whose structure describes a set of conditional independence properties of the variables [80]. For a set of variables {X1, X2, …, Xn}, the network offers a compact representation of the joint probability distribution. The structure and the numerical parameters of a Bayesian network can be elicited from experts or learned from data. Four situations are distinguished in this construction:
• Experts build the Bayesian network based on deterministic knowledge. In this case the modeling does not pose any problem because both laws and variables are known. A typical example in power system operation is: if the transmission line flow exceeds its capacity, then the line is overloaded.
• Experts’ knowledge is modeled when variables and the relationships between them are uncertain. Intuitive reasoning is used for this. In this case, information of a diverse nature is taken into consideration in the modeling. This process is more laborious, even if the experts know the matter well or have a lot of experience. There are many examples in the power system security field, such as the modeling of attackers’ motivation against ICT, of interdependencies between the power grid and its associated ICS, and of human acts and decisions in the control loop.
• Experts assign subjective probability values to network parameters: when data are not available or the uncertainty does not obey randomness, the only possibility is to use a personal measure of uncertainty or belief in an event.
• The Bayesian network is built from data. Variables and the links between them can be established using statistical techniques (analysis of correlation, covariance, etc.) and optimization techniques.
Once the Bayesian network is created, posterior probability distributions of the variables of interest are quantified via probabilistic inference. The problem is then to find the probability of a set of query variables given a set of evidence. This is possible by sequential applications of Bayes’ theorem. The different possible inferences are illustrated in Fig. 5.15. According to Pearl [81], “Bayesian networks are direct representations of the world, not of reasoning process”. This is correct because links in Bayesian networks represent real causal connections and not the flow of information during
a
b
Evidence W
V
Query
X
ZZ Query
Y Y
V
From effects to causes
c
Query W
V Evidence
Query
Query
Z
Y Y
Evidence
From causes to effects
X Evidence
Y
X X
W
Z
d
V
Between causes of a effect
Fig. 5.15 Types of inferences in Bayesian networks
X
W
Query
Y Y Combined
Evidence
Z
Evidence
5 ICT and Powers Systems: An Integrated Approach
Identification of main IC assets
Assessment of the threats
Assessment of the vulnerabilities
101
Assessment of Inter dependency
Risk assessment to the power system
Fig. 5.16 Risk assessment with regard to cyberattacks
reasoning. The different inferences above presented can be obtained from these networks by propagating information in any direction.
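The two basic inference directions can be illustrated with a minimal sketch. It assumes a hypothetical three-node chain (attack, ICS failure, line overload) that is not taken from the chapter, and every probability value in it is invented for illustration. The joint distribution is written as the product of each node's conditional probability given its parent, and queries are answered by brute-force enumeration, which is practical only for very small networks.

```python
from itertools import product

# Hypothetical chain: attack -> failure (of the ICS) -> overload (of a line).
# Every probability value here is an illustrative assumption.
P_ATTACK = {True: 0.05, False: 0.95}
# P(failure | attack): outer key is the parent value, inner key the child value.
P_FAILURE = {True: {True: 0.60, False: 0.40},
             False: {True: 0.02, False: 0.98}}
# P(overload | failure)
P_OVERLOAD = {True: {True: 0.30, False: 0.70},
              False: {True: 0.01, False: 0.99}}

NAMES = ("attack", "failure", "overload")


def joint(attack, failure, overload):
    """Joint probability as the product of the per-node conditional tables."""
    return (P_ATTACK[attack]
            * P_FAILURE[attack][failure]
            * P_OVERLOAD[failure][overload])


def query(target, evidence):
    """Return P(target = True | evidence) by enumerating all assignments."""
    numerator = 0.0
    denominator = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


# From causes to effects: probability of an overload given that an attack occurs.
p_overload_given_attack = query("overload", {"attack": True})
# From effects to causes: probability of an attack given an observed overload.
p_attack_given_overload = query("attack", {"overload": True})
print(round(p_overload_given_attack, 3), round(p_attack_given_overload, 3))
```

With these assumed numbers, observing an overload raises the belief in an attack well above its prior of 0.05, which is the effect-to-cause reasoning of Fig. 5.15. Dedicated libraries use more efficient propagation algorithms than this enumeration.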
5.5.2.2 The Approach
Our approach is to use Bayesian networks to model the cause–effect relationships between the power grid and the ICS, and also between the external variables that influence the decision to attack a critical infrastructure. The potential impacts on power system operation, should the attack actually take place, can also be evaluated. By means of Bayesian networks, it is possible to assess the risk to power grid security due to contingencies of the ICS. This approach is shown in Fig. 5.16. The attackers' motivation, capabilities and resources are modeled, as well as the means that assets have to protect themselves in case of an attack. With these elements, the probability that an attack will take place, and the intensity of such an attack on the ICS, can be found. Using the subjective approach, the probability that the attack on the ICS affects the operation of the power system to a major or minor degree is quantified. Since risk is defined as the product of the probability of occurrence of a defined unwanted event and its consequence, the risk to power system operation related to cyber attacks can then be assessed. The possibility of incorporating subjective probability values into Bayesian networks allows us to model different types of uncertainty and to model the interdependencies between the main functions of the ICS and the operation of the power grid. Additionally, these graphs can improve communication between experts from different domains, such as ICT experts and power system experts.
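The risk definition used above (probability multiplied by consequence) can be made concrete with a short sketch. The attack probability, the split between minor and major impact, and the consequence figures are all hypothetical; in practice the probabilities would be outputs of the Bayesian network and the consequences would come from an impact study.

```python
# All figures below are illustrative assumptions, not values from the chapter.
P_ATTACK = 0.05  # assumed probability that an attack on the ICS takes place

# Assumed conditional probability of each impact level, given an attack.
P_IMPACT_GIVEN_ATTACK = {"minor": 0.7, "major": 0.3}

# Assumed consequence of each impact level, in MWh of energy not supplied.
CONSEQUENCE_MWH = {"minor": 50.0, "major": 2000.0}


def risk(probability, consequence):
    """Risk as the product of an event's probability and its consequence."""
    return probability * consequence


def cyber_risk(p_attack, p_impact_given_attack, consequence):
    """Sum the risk contributions of all impact levels of an attack."""
    return sum(
        risk(p_attack * p_level, consequence[level])
        for level, p_level in p_impact_given_attack.items()
    )


total_risk = cyber_risk(P_ATTACK, P_IMPACT_GIVEN_ATTACK, CONSEQUENCE_MWH)
print(f"Risk index: {total_risk:.2f} MWh")
```

Under these assumptions the rare major impact dominates the index (30 MWh out of roughly 31.75 MWh), which shows why the major/minor distinction matters in the assessment.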
5.5.3 A Comparison of the Presented Security Assessment Approaches
In the previous sections, three approaches were presented for assessing the security of a power system while taking into account failures of its ICS and cyber attacks. The scope of these methods is not the same and, once more, the authors believe that they can be complementary. In the first two approaches, criticality is taken as the measure of security. The FMECA presented in this chapter is an elaborate method that takes into consideration the chain of software applications in a control centre, the impact of failures on the electric network and the fault detection systems. The criticality assessment is based on deterministic methods and not on empirical evaluations. This evaluation is performed with the combined simulator shown in this chapter. Probabilities are estimated from historical data, i.e., using the frequentist interpretation of probability. In that sense, the advantage of FMECA for a given SCADA/EMS/DMS is that the calculation appears to be accurate. However, the FMECA method needs software reliability reports, which are based on specific procedures and are time-consuming to produce.
The second approach was proposed to establish a hierarchy of the criticality of SCADA/EMS and SCADA/DMS applications. Precedence graphs were employed for this purpose. This is a synthetic and basic method based on an inventory of the software applications in a control centre and on an understanding of the data flows among them. Nevertheless, this method must be integrated into an impact study in order to differentiate the criticality of the applications in a cycle.
Finally, Bayesian networks are proposed to assess a security risk index while considering cyber attacks on the ICS. The advantage of this modeling is that intentional events can be taken into account. In addition, interdependencies between the infrastructures can be modeled in a simple way, e.g., by conditional probabilities.
Hybrid approaches are foreseen. For instance, in order to assess the risk of cyber attacks against the control center, the probability of external events can be determined using Bayesian networks, and the impact can be estimated from the criticality of the affected function. This criticality can be obtained by means of a precedence graph model. Another approach is to measure the impact of cyber attacks through the combined simulator or a complex network model of the system. These hybrid solutions would provide quantitative ways of studying inter-infrastructure events.
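One way to picture such a hybrid approach is sketched below. The application names, the data-flow graph and the compromise probabilities are hypothetical, and the criticality measure (an application plus every application downstream of it in the data flow) is only an assumed proxy, not the hierarchy procedure of this chapter. The probabilities stand in for values that a Bayesian network would provide.

```python
# Hypothetical data-flow (precedence) graph of control-centre applications:
# an edge A -> B means that B consumes the output of A.
DATA_FLOW = {
    "scada_acquisition": ["state_estimation", "alarm_processing"],
    "state_estimation": ["contingency_analysis", "optimal_power_flow"],
    "alarm_processing": [],
    "contingency_analysis": [],
    "optimal_power_flow": [],
}

# Assumed probability that a cyber attack disables each application.
P_COMPROMISE = {
    "scada_acquisition": 0.02,
    "state_estimation": 0.05,
    "alarm_processing": 0.08,
    "contingency_analysis": 0.04,
    "optimal_power_flow": 0.03,
}


def downstream(app, graph):
    """Applications reachable from app, i.e. those that lose input if it fails."""
    seen = set()
    stack = list(graph[app])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def criticality(app, graph):
    """Assumed proxy: the application itself plus all that depend on it."""
    return 1 + len(downstream(app, graph))


def risk_ranking(graph, p_compromise):
    """Rank applications by probability of compromise times criticality."""
    scores = {app: p_compromise[app] * criticality(app, graph) for app in graph}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = risk_ranking(DATA_FLOW, P_COMPROMISE)
for app, score in ranking:
    print(f"{app}: {score:.2f}")
```

In this toy setting the state estimator ranks first: it is neither the most exposed nor the most critical application, but the product of the two factors is the largest.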
5.6 Conclusions
After more than a century of electricity usage, and in the era of the Internet, it is hardly acceptable to live without electrical energy or to experience a media interruption. The consequences of a generalized incident, or blackout, whether of natural origin or caused by asset failures or malicious attacks, are dramatic both economically and socially. Present power systems are heavily dependent on ICT at various levels, from the measurement/bay level to control centers, with their related communication and information exchange with market places and other control systems (neighboring systems). Indeed, ICT are used for gathering information as well as for issuing control actions that are vital for system survivability. In addition, effective integration of ICT in power systems can lead to more optimized operation and better control of these systems, thus increasing efficiency. Besides, ICT are key components and functions in the ongoing research and development of "SmartGrids".
However, ICS and power systems have evolved differently and have been considered by different communities. Hence, the integration of ICS in power systems is mostly carried out in a layered structure (adding the ICT infrastructure once the electrical infrastructure is already built). Indeed, these infrastructures were planned almost independently. Nevertheless, they are heavily dependent on each other. With liberalization and the ensuing partitioning of responsibility between the various actors, the interdependency of these infrastructures is becoming more and more critical. Intrusions into ICS or databases through cyber attacks are new threats to the ICT infrastructures. In addition, ICT failures can happen and affect power system operation. Moreover, the use of "off-the-shelf" ICT components, combined with the lack of assessment of the risk that ICT failures, whether component failures or cyber attacks, pose to power system security, is becoming a major threat to the whole system. Therefore, it is of prime importance to consider both interconnected infrastructures in modeling, design and security analysis. Indeed, an integrated approach is needed at all levels. In this context, modeling interdependencies is considered a prerequisite for securing those infrastructures. The expected outcome of such a vision is that ICT can make the power system not only more resilient but also more efficient, while allowing all actors to be involved, including the end user (active customer). Furthermore, it can facilitate the integration of renewable energy sources by providing, for example, efficient, cost-effective and secure solutions to monitor and even control large-scale, dispersed, small-size generators. In this way, ICT also contributes to protection against climate change.
5.7 Appendix 1: ICT Failures and Cyber Attacks
In the following, some cyber incidents that have interfered with the security of power system communications during the last few years are listed.
• Tom Donahue, a CIA analyst, warned the electric power sector in January 2008 that cyber attackers had hacked into the computer systems of utility companies outside the United States and made demands, in at least one case causing a power outage that affected multiple cities. He said "We do not know who executed these attacks or why, but all involved intrusions through the Internet".
• In January 2003, the Internet worm "Slammer" infected the monitoring network of the Davis–Besse nuclear plant of First Energy Corporation in Ohio; the reactor happened to be offline. The worm entered the plant network via a contractor's infected computer connected through a T1 line (a dedicated leased telephone line) directly to the plant network, thus bypassing the firewall [82]. The electric utility company lost control of its EMS/SCADA system for nearly 5 h. A later report by the North American Electric Reliability Council (NERC) concluded that, while nothing serious happened as a result, the EMS/SCADA system was not able to communicate with substations and plants, forcing the company's operations staff to resort to manual operation of their transmission and generation assets until control could be restored.
• In September 2001, the Nimda worm circulated widely throughout the world. NERC knows of an electric utility whose EMS/SCADA network was compromised by the Nimda worm. The worm then propagated itself and spread to the internal project network of a major EMS/SCADA vendor via the vendor's support communications circuit, devastating the EMS vendor's internal network and launching further attacks against the EMS/SCADA networks of all other customers of the vendor with support communications circuits [83].
• In August 2001, the Code Red II worm successfully compromised the internal network of a company that provides services to NERC and numerous electric utility companies. This worm then attacked customers connected to this company, successfully compromising an exposed web server at one of the utility control centers. It is important to note that the compromised server was presumed to be protected, as it was not exposed to the Internet. This attack was propagated via the private frame relay network connecting the service company, the impacted utility, and the other connected utility companies [83].
• For 17 days, between April 25 and May 11 of 2001, hackers managed to remain undetected after they breached the network of the Folsom-based California Independent System Operator. However, the attacks were limited to a "practice network" and so they posed no threat to the real power grid or the primary power distribution network that handles the Western USA. Although no damage was reported, officials traced the intrusion back to a system in China [84].
• In December 2000, the National Infrastructure Protection Center (NIPC) said that "A regional entity in the electric power industry has recently experienced computer intrusions through anonymous FTP (File Transfer Protocol) login exploitation and the intruders used the hacked FTP site to store and play interactive games that consumed 95% of the organization's Internet bandwidth". NIPC added that "the compromised bandwidth threatened the regional entity's ability to conduct bulk power transactions" [85].
A document prepared for the CIGRE Joint Working Group on Security for Information Systems and Intranets in Electrical Power Systems, entitled "Cyber security considerations in power system operations", states that a significant number of cyber incidents have taken place but only some have been admitted to or described [86]. A sample of the incidents reported in this document is given below:
• Large Generating Plant Output Reduced to Zero: The control system of a large generating plant operating at several hundred megawatts was infected by a virus and its output was reduced to virtually zero in a few seconds. The infection came from a connected corporate IT network. The solution was to rigorously separate the real-time and corporate networks [86].
• Distribution SCADA System Partly Disabled: A virus infected a laptop which was used by a maintenance technician to modify a telecoms router. The virus affected all telecom nodes, including some used by a SCADA system. The SCADA system was rendered partially inoperable for a number of days. A partial solution required better management of virus protection on laptops [86].
• Unauthorised Access to EMS Applications: A utility gave remote access rights to an EMS supplier. It was observed that application patches had been applied without agreement. No problems arose, but the situation revealed that continuous, non-verified access had remained open to an external party [86].
Other Important Facts
• Idaho National Laboratory in the USA performed an experiment for the Department of Homeland Security (DHS) in March 2007 in order to evaluate the potential damage resulting from cyberattacks. The laboratory successfully destroyed a generator while conducting an experimental cyberattack. The attack involved the controlled hack of a replica of a control system commonly found throughout American power systems. Members of the House Committee on Homeland Security are concerned that malicious actors could use the same attack vector against large generators and other critical rotating equipment, which could cause widespread and long-term damage to the electric infrastructure of the United States.
The following failures or maloperations of ICT functions, which threatened the security of the power system, were cited by the GRID consortium [6, 74]:
• Tripping of six 400 kV systems in Vallée du Rhône, France, due to time delay in communication
• Substation outages initiated by transfer trip in Froncle and Pont la Ville, France, due to replayed information by the communication grid operator
• Cyber-security problems in China, two events: loss of measurements from a large number of digital recorders, and uncontrolled ramp-up and ramp-down of several hundred megawatts at a hydro power plant
• Loss of dual server at control center due to software changeover
• Loss of communications due to third party lines becoming faulty
References
1. Shahidehpour, M., Wang, Y.: Communication and control in electric power systems. IEEE Press Power Engineering Series (2003)
2. Ekstedt, M., Sommestad, T.: Enterprise architecture models for cyber security analysis. In: Proceedings of Power System Conference and Exposition, Seattle, USA (2009)
3. European Commission: Terms of reference feasibility study: European network of secure test centers for reliable ICT-controlled critical energy infrastructures, July 2007
4. IEEE Working Group: Reliability indices for use in bulk power system supply adequacy evaluation. IEEE Trans. Power Apparatus Syst. 97(4), 1097–1103 (1978)
5. Mussington, D.: Concepts for enhancing critical infrastructure protection: relating Y2K to CIP research and development. RAND: Science and Technology Institute, Santa Monica, CA, 29 (2002)
6. GRID consortium: ICT vulnerabilities of power systems: a roadmap for future research. European Communities, ISBN 978-92-79-07138-6 (2007)
7. Kundur, P.: Power system stability and control. EPRI Editors and McGraw-Hill, New York (1993)
8. Bjorn, T., Fontela, M., Mellstrand, P., Gustavsson, R., Andrieu, C., Bacha, S., Hadjsaid, N., Besanger, Y.: Overview of ICT components and its application in electric power systems. In: Proceedings of 2nd International Conference on Critical Infrastructures, Grenoble, France (2004)
9. Zima, M., Bockarjova, M.: Operation, monitoring and control technology of power systems. EEH Power Systems Laboratory, ETH Zurich. http://www.eeh.ee.ethz.ch (2007). Accessed March 2007
10. Andersson, L., Brand, K.P., Wimmer, W.: The impact of the coming standard IEC61850 on the life-cycle of open communication systems in substations. In: Proceedings of Transmission and Distribution, Brisbane, Australia. http://www.nettedautomation.com/download/mannheim-2003-03/Brisbane_Brand_2002-08.pdf (2001). Accessed 7 Dec 2009
11. Gupta, R.P.: Substation automation using IEC61850 standard. In: Proceedings of 15th National Power Systems Conference, Bombay. http://www.ee.iitb.ac.in/~npsc2008/NPSC_CD/Data/Oral/DIC4/p107.pdf (2008). Accessed 7 Dec 2009
12. Wu, F.F., Moslehi, K., Bose, A.: Power system control centers: past, present, and future. Proc. IEEE (2005). doi: 10.1109/JPROC.2005.857499
13. Wehenkel, L.: Systèmes de conduite des grands réseaux électriques. http://www.montefiore.ulg.ac.be/~lwh/SCGRE/ Accessed 7 Dec 2009
14. Wood, A., Wollenberg, B.: Power generation, operation, and control, 2nd ed. Wiley-Interscience, New York (1996)
15. Grigsby, L.: Electric Power Engineering Handbook: Power System Stability and Control, 2nd edn. CRC Press, Boca Raton, FL (2007)
16. CIGRE WG 38-03, McGillis, D.: Power system reliability analysis application guide. CIGRE, Paris (1987)
17. IEC: International Electrotechnical Vocabulary: Dependability and quality of service. In: International Standard 60050-191, 191, Geneva, Switzerland (1999)
18. Tranchita, C., HadjSaid, N., Torres, A.: Risk assessment for power system security with regard to intentional events. Thesis to obtain the degree of Doctor from the Grenoble Institute of Technology and the Los Andes University (2008)
19. Shaw, W.T.: Cybersecurity for SCADA Systems. PennWell Books, Tulsa, OK (2006)
20. Stamp, J., Dillinger, J., Young, W., DePoy, J.: Common vulnerabilities in critical infrastructure control systems. Sandia National Laboratories report SAND2003-1772C, Albuquerque, New Mexico. http://www.oe.netl.doe.gov/docs/prepare/vulnerabilities.pdf (2003). Accessed 7 Dec 2009
21. Howard, J.D., Longstaff, T.A.: A common language for computer security incidents. Sandia National Laboratories, Report SAND98-8667, USA (1998)
22. Tranchita, C., HadjSaid, N., Torres, A.: Overview of the power systems security with regard to cyberattacks. In: Proceedings of 4th International CRIS Conference on Critical Infrastructures, Sweden (2009). doi: 10.1109/CRIS.2009.5071500
23. National Security Telecommunications Advisory Committee: Electric power risk assessment. http://www.aci.net/Kalliste/electric.htm (1997)
24. Dacey, R.F.: Critical infrastructure protection – challenges and efforts to secure control systems. Technical Report GAO-04-354, United States General Accounting Office (GAO), Washington (2004)
25. Gursesli, O., Desrochers, A.A.: Modeling infrastructure interdependencies using Petri nets. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics, vol. 2, 1506–1512 (2003)
26. Krings, A., Oman, P.: A simple GSPN for modeling common mode failures in critical infrastructures. In: Proceedings of 36th Hawaii Conference on System Sciences (2003)
27. Schneider, K., Liu, C.C., Paul, J.P.: Assessment of interactions between power and telecommunications infrastructures. IEEE Trans. Power Syst. 21(3), 1123–1130 (2006)
28. Macal, C., Sallach, D.: Workshop on agent simulation: applications, models, and tools, University of Chicago, Chicago, IL (1999)
29. Bonabeau, E.: Agent-based modeling: methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. USA 99, 7280–7287 (2002)
30. Grimm, V.: Ten years of individual-based modeling in ecology: what have we learned and what could we learn in the future? Ecol. Model. 115(2–3), 129–148 (1999)
31. Barton, D.C., Stamber, K.L.: An agent-based microsimulation of critical infrastructure systems. In: Proceedings of 8th International Energy Forum, International Energy Foundation's ENERGEX (2000)
32. Macal, C., North, M.: Tutorial on agent-based modeling and simulation. In: Proceedings of the 2005 Winter Simulation Conference (2005)
33. Hopkinson, K., Wang, X., Giovanni, R., Thorp, J., Birman, K., Coury, D.: Epochs: a platform for agent-based electric power and communication simulation built from commercial off-the-shelf components. IEEE Trans. Power Syst. 21(2), 548–558 (2006)
34. Panzieri, S., Setola, R., Ulivi, G.: An agent based simulator for critical interdependent infrastructures. In: Proceedings of 2nd Conference on Securing Critical Infrastructures (2004)
35. Panzieri, S., Setola, R., Ulivi, G.: An approach to model complex interdependent infrastructures. In: Proceedings of 6th IFAC World Congress 2005 (2005)
36. Tolone, W.J., Wilson, D., Raja, A., Xiang, W., Hao, H., Phelps, S., Johnson, E.W.: Critical infrastructure integration modeling and simulation. In: Chen, H. et al. (eds.) Intelligence and Security Informatics, Lecture Notes in Computer Science, LNCS 3073, pp. 214–225. Springer-Verlag, Berlin Heidelberg (2004)
37. Zhang, P., Peeta, S., Friesz, T.: Dynamic game theoretic of multilayer infrastructure networks. In: Proceedings of 10th Conference on Travel Behavior Research, Lucerne (2003)
38. Casalicchio, E., Galli, E., Tucci, S.: Federated agent-based modeling and simulation approach to study interdependencies in its critical infrastructure. In: Proceedings of 11th IEEE International Symposium of Distributed Simulation and Real-Time Applications, 182–189 (2007)
39. Kim, H.M., Biehl, M., Buzacott, J.A.: M-ci2: modeling cyber interdependencies between critical infrastructures. In: Proceedings of 3rd IEEE Conference on Industrial Informatics, 644–648 (2005)
40. Lee, E.E., Mendonça, D.J., Mitchell, J.E., Wallace, W.A.: Restoration of services in interdependent infrastructure systems: a network flows approach. Technical Report 38-03-507, Rensselaer Polytechnic Institute, USA (2003)
41. Lee, E.E., Mitchell, J.E., Wallace, W.A.: Assessing vulnerability of proposed designs for interdependent infrastructure systems. In: Proceedings of 37th Hawaii Conference on System Sciences (2004)
42. Lee, E.E., Mitchell, J.E., Wallace, W.A.: Restoration of services in interdependent infrastructure systems: a network flows approach. IEEE Trans. Syst. Man Cybern. 37(6), 1303–1317 (2007)
43. Wolthusen, S.D.: GIS-based command and control infrastructure for critical infrastructure protection. In: Proceedings of 1st Workshop on Critical Infrastructure Protection (2005)
44. Johnson, C.W., Williams, R.: Computational support for identifying safety and security related dependencies between national critical infrastructures. In: Proceedings of 3rd IET International Conference on System Safety (2008)
45. Zimmerman, R.: Decision-making and the vulnerability of interdependent critical infrastructures. In: Proceedings of IEEE Conference on Systems, Man and Cybernetics, vol. 5, 4059–4063 (2004)
46. Permann, M.R.: Toward developing genetic algorithms to aid in critical infrastructure modeling. In: Proceedings of IEEE Conference on Technologies for Homeland Security, 192–197 (2007)
47. Panzieri, S., Setola, R.: Failures propagation in critical interdependent infrastructures. Int. J. Model. Identif. Control 3(1) (2008)
48. Pederson, P., Permann, M.: Interdependency modeling: a survey of U.S. and international research. Idaho National Laboratory, USA (2006)
49. Rozel, B., Viziteu, M., Caire, R., Hadjsaid, N., Rognon, J.P.: Towards a common model for studying critical infrastructure interdependencies. In: Proceedings of IEEE Power and Energy Society General Meeting (2008)
50. Milano, F.: An open source power system analysis toolbox. IEEE Trans. Power Syst. 20(3), 1199–1206 (2005)
51. Muller, K.: Advanced systems simulation capabilities in SimPy. In: EuroPython, Gothenburg, Sweden (2004)
52. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.U.: Complex networks: structure and dynamics. Phys. Rep. 424(4–5), 175–308 (2006)
53. Diestel, R.: Graph Theory, 2nd ed. Springer-Verlag, New York (1997)
54. Albert, R., Barabási, A.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
55. Dorogovtsev, S.N., Mendes, J.F.F.: Evolution of networks with aging of sites. Phys. Rev. E 62(2), 1842–1845 (2000)
56. Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
57. Barrat, A., Barthélemy, M., Vespignani, A.: Réseaux complexes et physique statistique. Images de la Physique (2006)
58. Lai, Y.C., Motter, A.E., Nishikawa, T.: Attacks and cascades in complex networks. Lecture Notes in Physics, Springer, 650, 299–310 (2004)
59. Motter, A.E., Lai, Y.C.: Cascade-based attacks on complex networks. Phys. Rev. E 66(6) (2002)
60. Holmgren, A.J.: Using graph models to analyze the vulnerability of electric power networks. J. Risk Anal. 26(4), 955–969 (2006)
61. Carreras, B.A., Lynch, V.E., Dobson, I., Newman, D.E.: Critical points and transitions in an electric power transmission model for cascading failure blackouts. CHAOS 12(4), 985–994 (2002)
62. Kinney, R., Crucitti, P., Albert, R., Latora, V.: Modeling cascading failures in the North American power grid. Eur. Phys. J. B 46(1), 101–107 (2005)
63. Sun, K.: Complex networks theory: a new method of research in power grid. In: Proceedings of Transmission and Distribution Conference and Exhibition: Asia and Pacific, 1–6 (2005)
64. Sun, K., Han, Z.X.: Analysis and comparison on several kinds of models of cascading failure in power system. In: Proceedings of Transmission and Distribution Conference and Exhibition: Asia and Pacific, 1–7 (2005)
65. Chen, X., Sun, K., Cao, Y., Wang, S.: Identification of vulnerable lines in power grid based on complex network theory. In: Proceedings of IEEE Power Engineering Society General Meeting, 1–6 (2007)
66. Newman, D.E., Nkei, B., Carreras, B.A., Dobson, I., Lynch, V.E., Gradney, P.: Risk assessment in complex interacting infrastructure systems. In: Proceedings of 39th Hawaii Conference on System Sciences (2005)
67. Carreras, B.A., Newman, D.E., Gradney, P., Lynch, V.E., Dobson, I.: Interdependent risk in interacting infrastructure systems. In: Proceedings of 40th Hawaii Conference on System Sciences (2007)
68. Kurant, M., Thiran, P.: Layered complex networks. Phys. Rev. Lett. 96, 138701 (2006)
69. Kurant, M., Thiran, P., Hagmann, P.: Error and attack tolerance of layered complex networks. Phys. Rev. E 76, 026103 (2007)
70. Hadjsaid, N., Tranchita, C., Rozel, B., Viziteu, M., Caire, R.: Modeling cyber and physical interdependencies – applications in ICT and power grids. Proc. Power Syst. Conf. Expos. (2009). doi: 10.1109/PSCE.2009.4840183
71. Jensen, H.J., Goddard, P., Yeomans, J.: Self-organized Criticality: Emergent Complex Behavior in Physical and Biological Systems. Cambridge University Press, Cambridge, 1–6 (1998)
72. McLaughlin, T.P., Monahan, S.P., Pruvost, N.L., Frolov, V.V., Ryazanov, B.G., Sviridov, V.I.: A review of criticality accidents: 2000 revision. Los Alamos National Laboratory Report LA-13638 (2000)
73. O'Connor, P., Newton, D., Bromley, R.: Practical Reliability Engineering, pp. 206–214. Wiley, New York (2002)
74. Matthew, G.: GRID Consortium, A survey of ICT vulnerabilities of power systems and relevant defense methodologies. In: Proceedings of IEEE Power Engineering Society General Meeting (2007)
75. Stamatis, D.H.: Failure Mode and Effect Analysis: FMEA from Theory to Execution. American Society for Quality (ASQ), Milwaukee, WI (1995)
76. Gaudoin, O., Ledoux, J.: Modélisation aléatoire en fiabilité des logiciels. Hermès Science Publications-Lavoisier, Paris (2007)
77. Giorgio, I.: Introduction to distributed systems and networking. http://www.cis.temple.edu. Accessed 5 January 2009
78. Elmasri, R., Navathe, S.: Fundamentals of Database Systems, pp. 570–591. Addison-Wesley, Upper Saddle River, NJ (2004)
79. Viziteu, M., Caire, R., Georges, D., HadjSaid, N.: Criticality hierarchy procedure applied to software applications for electrical networks management. In: Proceedings of 4th CRIS Conference on Critical Infrastructures, Sweden (2009)
80. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA (1988)
81. Pearl, J.: Bayesian networks. In: Arbib, M. (ed.) Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA (2000)
82. U.S. Nuclear Regulatory Commission NRC: Information Notice 2003-14. http://www.nrc.gov/reading-rm/doc-collections/gen-comm/info-notices/2003/in200314.pdf (2003)
83. North American Electric Reliability Council NERC: Permanent cyber security standard. SAR Drafting Team, USA (2003)
84. Weisman, R.: California power grid hack underscores threat to U.S. http://www.newsfactor.com/perl/story/11220.html (2001)
85. Greene, T.: Civilization hanging by a thread. White papers, Washington, USA. http://www.theregister.co.uk (2000). Accessed 7 Dec 2009
86. Roche, P.: Cyber security considerations in power system operations. CIGRE Joint Working Group Security for Information Systems and Intranets in Electrical Power Systems, JWG D2/B3/C2.01 (2005)
Chapter 6
Governance: How to Deal with ICT Security in the Power Infrastructure?
Marcelo Masera
Abstract The evolution of the power infrastructure has been characterized by the increasing and intensive use of information and communication technologies. This paper argues that this transition has situated the infrastructure in a new paradigm, here denominated "E + I", indicating the inextricably interwoven nature of the energy and information aspects. The paper discusses the main characteristics of the power infrastructure, and identifies five strata for the analysis of the integration of the ICT components: four within the single electric company, and one referring to the interactions among the actors of the power infrastructure. Finally, the paper closes with some considerations on the implications for the governance of cybersecurity in power infrastructures, highlighting the need for legitimacy of the governance arrangement, but also the requirement to produce efficient measures. For this, solid supporting evidence is necessary, and this would require the support of appropriate testing facilities.
Keywords Power infrastructures • cybersecurity • governance
6.1 Introduction The massive use of ICT in the power infrastructure raises the problem of security: from daily experience it is evident that all ICT systems are vulnerable to a wide range of intentional and accidental threats. As the problem affects the whole infrastructure and the consequences can be serious for society at large, both power operators and governments have responsibility over that topic. However, as industry looks at the profitability of their operations, and the efficiency and continuity of their processes, while governments should consider electric power as a vital service M. Masera () Institute for the Protection and Security of the Citizen, Joint Research Centre, European Commission, Ispra, VA, Italy e-mail:
[email protected] Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_6, © Springer Science + Business Media B.V. 2010
111
112
M. Masera
for society and the potential impact onto national security, there is the need for a joint pursuit where all interests and concerns can be taken into account and a workable solution developed. This is the process of governance of ICT security in the power infrastructure. The governance of infrastructures has to respect two conditions: it has to convey legitimacy (i.e. originated from a solid legal basis), and act producing effective solutions [1, 2]. The problem of cybersecurity of the power infrastructure falls short of satisfying both conditions. The legitimacy in USA derives from the Energy Policy Act of 2005 [3], and the creation of a Electric Reliability Organization (ERO). In the European Union, a set of policy packages has focused since 1996 [4, 5] on the of the creation of effective energy markets and deregulating the industry, while the infrastructure security aspects has been the object of a recent legislation on Critical Infrastructure Protection. The weaker points appear to be on the efficiency side of the governance mode. This derives from several special points that are discussed in the following sections: • The power system as an infrastructure with a technical and a market layers comprising multiple actors, and in the case of Europe, involving many countries, cannot be govern without proper institutions. • The power infrastructure is changing its nature due to the intensive use of ICT, and now shows an “E + I” (energy plus information) character that typifies its operation and management [6]. • The inclusion of ICT is accompanied by serious security concerns, at all levels of the power systems. The single power company cannot solve most of them and therefore there is the need for concerted actions that cannot be only of coercive character. • Efficient governance measures of the cybersecurity of the power infrastructure should be supported by suitable evidence. This evidence cannot be drawn exclusively from past experience or analytic sources. 
Specific facilities are required for supporting a sound and helpful decision-making process.

Therefore the basis for the governance of the cybersecurity of the power infrastructure is the participation of all relevant stakeholders (private and public). In Europe this implies a multi-national setting.
6.2 The Electric Power Sector as an Infrastructure

The electric power industry has seen momentous technological and organizational transformation since its inception at the end of the nineteenth century. Due to its importance for society, the electric power infrastructure has always been the subject of special consideration by governments. Much of the attention of legislators and business has been focused on the integration of the vast power infrastructure from the point of view of the stability and security of supply of the system, and on the market structure correlated with it [7, 8].
6 Governance: How to Deal with ICT Security in the Power Infrastructure?
For instance, in Europe, the adoption in 1996 of a first package [1] was aimed at the liberalization of the energy market with an eye on the expansion of competition among alternative suppliers. Further steps advanced the integration of the market across national borders. But reality showed the difficulties in creating a unified market across the several Member States of the Union. Some noteworthy factors were the lack of transmission capacity for facilitating international trade and the monopolistic structure of several national electricity markets. The rationale for that initiative was the concept that more competition would promote an increase in the efficiency of the sector, with a better offer and prices for consumers. It became clear that the approach fell short of its objectives, in part due to the nature of the incumbent operators, with their vertical integration and close links with the national states (in some cases amounting to situations of monopoly) [9]. A second package in 2003 set out the so-called unbundling of the operators, separating the networks from the production and supply functions. The implementation of this policy has not produced all the expected results, and a third package is to be adopted soon. This set of policy initiatives in the European Union (whose efficacy is not the object of this paper) tended towards the creation of a competitive framework based on the existence of a mesh of interconnections among the countries, and caused the increasing use of those cross-border tie-lines to satisfy international market operations [10].
The occurrence of some system malfunctions (among others, the blackout in Italy caused by problems in an interconnector with Switzerland in September 2003 [11]; the blackout in southern Sweden and eastern Denmark in September 2003 [12]; and an important power shortage in many European countries due to some operations in Germany in November 2006 [13]) clearly pointed to the need to consider the stability, reliability, adequacy and security of the European power infrastructure as a whole, beyond the consideration of those factors for each local system. This equates to saying that a large power infrastructure such as the European one is:
• A system-of-systems showing capabilities, performance and behaviors that are more complex than the sum of those of the constituent systems (42 transmission operators from 34 countries will be embraced by the European Network of Transmission System Operators of Electricity, ENTSOE [14]). This is valid for the normal functioning of the system, but also for unwanted states, such as operation under disturbances.
• Large scale, expanded over broad geographic areas and across jurisdictional borders. This is true also in Europe, where under the same legal framework each country can set its own rules.
• Multi-layer, with several technical layers, but also with administrative and market ones: the former related to the coordination of the technical management of the systems, the latter to the trading of energy supplies.

The current decentralized nature of the liberalized electricity infrastructure has as a consequence that individual operators cannot be held responsible for the way the system as a whole functions. This leads to some truisms, which cannot be ignored when analyzing the decision-making process about the systems, mainly when
considering that the complexity of these infrastructures can only grow in the future:
• Nobody owns, designs, or operates the infrastructure. The state of the infrastructure is the result of many independent decisions taken by all the participating actors, at the technical level but also at the market level.
• The analysis of the infrastructure cannot (always) be computed as just some function of the analysis of the single constituents. In many situations, and mostly when it matters most, as in the case of disturbances when non-linear behaviors might arise, the infrastructure has to be analyzed as a single entity.
• The interactions and interdependencies among the many elements that compose the infrastructure can give rise to unexpected properties, transitions and states, which at times cannot be ascribed to individual components but only to the correlation among several of them.
6.3 The E + I Paradigm

What is of interest for us in this paper is the central role of Information and Communication Technologies (ICT) in this respect. The expansion of the national infrastructures and the integration of operators and markets across borders, while satisfying both technical and market conditions, have only been possible due to the development and application of appropriate ICT tools. For instance, the Union for the Coordination of Transmission of Electricity (UCTE), which associates the operators of continental Europe (i.e. excluding the British Isles and Scandinavia), has reported an increase in cross-border exchanges from 50 TWh in 1975 to more than 300 TWh in 2007. ICT has enabled this evolution through substantial technical improvements in all systems dealing with measurement, monitoring, protection, control, management and operation, e.g.:
• Monitoring and sensors: more information on variables (voltages, current, power, generation and load data), from many more points; faster and more accurate identification of disturbances
• Simulation and computation models: faster and more accurate computation of steady states, N-1, loss factors, contingency analysis, load flows and forecasts, short circuit analysis, etc.
• Synchronization: by means of, e.g., GPS over long distances
• Support to maintenance operations: e.g. equipment outage scheduler
• Close interaction between Energy Management Systems and Business Management Systems
• Communication with market operators
• Inter-operator communication

This of course has been complemented by the parallel evolution of the generation and distribution sectors of the power infrastructure.
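As a rough illustration of the growth reported by UCTE, the two figures quoted in this section imply the overall factor and average annual rate computed below. This is only a back-of-the-envelope sketch: it assumes smooth compound growth between 1975 and 2007, which the source does not claim, and it treats "more than 300 TWh" as exactly 300 TWh, so both results are lower bounds. The function and variable names are illustrative.

```python
def compound_annual_growth(start_value, end_value, years):
    """Average yearly growth rate, assuming smooth compound growth (an assumption)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text; 300 TWh is a lower bound ("more than 300 TWh").
exchanges_1975_twh = 50.0
exchanges_2007_twh = 300.0
years = 2007 - 1975

growth_factor = exchanges_2007_twh / exchanges_1975_twh
annual_rate = compound_annual_growth(exchanges_1975_twh, exchanges_2007_twh, years)

print(f"growth factor: at least {growth_factor:.0f}x over {years} years")
print(f"average annual growth: at least {annual_rate:.1%}")
```

Under these assumptions the exchanges grew at least sixfold, which corresponds to a little under 6% per year on average.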
In the future the advances in ICT, enabled by more powerful hardware, software and communication means, will facilitate, among others:
• The improved identification of the causes of disturbances following the observation of abnormalities (e.g. short circuit location)
• The management of large amounts of data across wide area networks
• The state estimation of large systems and the fast computation of their security

As a consequence of this, the E + I paradigm was proposed in [6], derived from the analysis of the historical integration of ICT within energy infrastructures. ICT has evolved from mere support functions, to being an add-on, to a complete blending with the energy technical and managerial systems. Therefore the E + I paradigm postulates that:
• In modern energy infrastructures, and especially in the power infrastructure, the ICT components are an integral part of the system, not an accessory carrying out some auxiliary function. Energy systems are “Energy-and-information” systems, in which it is impossible to uncouple one from the other. Technological evolution will reinforce this condition.
• The operators of the energy infrastructures are also “information and communication” companies, not just “energy companies”. They not only make intense use of ICT for their internal operations, but they also employ it intensely for their links with other operators, with the authorities and regulators, with technical services (such as maintenance) and with suppliers and customers. Their interfaces with all those actors are in fact ICT-supported and rich in digital information, in some cases predominantly so. Technological evolution will also reinforce this situation, further entwining energy supplies with digital information.
6.4 The Governance of Infrastructures

Each operator has control over its own installations and resources, and can therefore apply its management capabilities to the viability and profitability of its enterprise. The sum of all the management decisions of the operators might or might not suffice for satisfying the objectives of the infrastructure. The absence of an actor supervising or controlling the entire system-of-systems does not preclude the existence of global requirements and objectives, that is, conditions that exceed the single actor. Among them we can mention the stability of the system (as local perturbations might affect other parts of the system), and the adequacy of the system, or in other terms the long-term ability of the system-of-systems to cope with its load demand (as local inadequacies might impact on other systems) [15].

Based on this and the notions described in the previous section, we can expect that an infrastructure with many operators, such as the European power infrastructure, since in some situations it will not be able to satisfy its global conditions by just adding local
decisions, will aim at some level of coordination or joint decision making. We call this decision-making process the governance of the whole infrastructure [16].

Governance of the infrastructure thus refers to the structured approach (although not only a legally formal one) integrating rules and regulations for finding effective solutions to common problems. It regards the decisions that define the expected states and outlook of the arrangement and functioning of the system-of-systems, that set the process and metrics for verifying its performance, and that grant the power to act on the system components and its actors. This governance relates to the management activities of the single actors, to the cohesion among their policies and processes, etc., and might take the form of common procedures or agreed-upon technical standards [17]. In this paper we are concerned with the governance of the electric power infrastructure with respect to its ICT elements, and more specifically their security.

In the United States, the Energy Policy Act of 2005 established the setting up of an Electric Reliability Organization, a role that was then awarded to NERC (North American Electric Reliability Corporation). NERC has the legal authority to enforce compliance with reliability standards (including cybersecurity) developed and set by industry representatives in the NERC Standards Committee. These standards are legally binding on all owners, operators and users of the bulk power system. In addition to this, NERC fulfils its statutory role by carrying out reliability and adequacy assessments, analysis of events (e.g. disturbances and abnormal events) and readiness evaluations of the operators regarding the standards, and by developing a situation awareness function that monitors in real time the evolution of the operation of the infrastructures under its oversight. In addition, it is explicitly stated that NERC should coordinate activities for the protection of the infrastructure against physical and cyber threats.
No such organization exists in Europe. While the so-called Third Energy Package of legislation of the European Union aims at fostering greater coordination on network operation and security, which should be put into practice by ENTSOE, the reach and mechanisms of the network codes that it will develop are not yet defined.

What cannot be ignored is that a critical infrastructure such as the power system is also relevant for national security. This issue has often been treated rapidly and superficially, with the indication that some sort of “public–private partnership” is the solution to the problem, without ever indicating which type of partnership is proposed: for instance, does it involve decision-making processes, or is it only about the exchange of information, or something else?

It is easy to foresee the possibility of conflict between differing positions in a multi-national setting. The public–private links in a given country might not be fully compatible with the private–private agreements across industry, which in turn might not match the public–public arrangements among governments. This multi-party, multi-national problem will require appropriate governance approaches, with mechanisms and processes through which the means and ways for assuring the collective properties of the infrastructure (mainly those related to risk and security) are established.
A golden rule is that all concerned parties need to work together, including with parties who do not directly influence the physical system, such as traders, brokers, power exchanges and retail companies. Through some governance process, the different affected actors should cooperate to handle risks that exceed the boundaries of their own risk management processes. Risks that are (or should be) the subject of the risk governance process are either risks that involve multiple actors or risks that originate outside the control of the involved actors.

Therefore, and notwithstanding the steps made in recent years, significant questions lie ahead for the power infrastructure:
• How to reach joint decisions about what is critical, what should be protected, and what are adequate levels of protection?
• How to make provision for the security risks that affect the infrastructure as a whole, while at the same time considering the interests and concerns of the single operators and of each country involved, the security of supply, and other conditions such as environmental constraints?
• How to set standards and guidelines without stifling innovation and the adoption of fast-evolving technologies, mainly ICT, and without hindering the development of fair markets?
• How to apportion costs and responsibilities between business obligations and national security objectives in an international framework?
6.5 ICT, Governance and the Power Infrastructure

ICT is at the same time the main enabler of the expansion of the networking of the electric power infrastructure, and its main source of vulnerabilities today. The more ICT is incorporated into the power infrastructure, the more vulnerabilities materialize, and the more opportunities arise for malicious threats [18]. Older ICT in power systems had no security protections or inadequate ones, and the modern strains based on the Internet and a few standard technologies are intrinsically insecure.

On the other hand, the “culture” of ICT is very different from the basic engineering and management style and methods in power systems. Most ICT products are not fully proven, with reliability characteristics for which the manufacturers rarely accept liability. The mean life of ICT systems is diminishing to a few years, in contrast to the decades-long life of electro-mechanical components [19, 20]. When a power system part is accepted as reliable, it is not subject to periodic “patches” that solve continually emerging problems.

ICT systems in power infrastructures can be considered as arranged in at least five strata, each comprising the lower ones: power physical variables or digital data; power components and digital applications; power stations and digital systems; the whole system under the control of a single actor; and finally the system-of-systems level. For the sake of a comprehensive view of the power system, the following considers the full power company, and not only the components directly related
with the handling of power: taking into account only the power technical components would lead to disregarding the integration of all digital processes within and among actors in the infrastructure. These are not just structural strata, but correspond to services or functions:
• Stratum 1: comprises the measurements, operations and other actions executed directly on the single power variable or on the digital data (e.g. in the management, engineering, control or market sectors of the power company). This stratum could be related to local functions, to centralized functions, or even to overall measurements.
• Stratum 2: comprises the actions executed on the technical components or technical systems (power line, transformer, etc.), or on the other data processing applications (management, engineering, control or market related).
• Stratum 3: comprises the actions executed at the technical stations (e.g. substations, control rooms) and on complete digital systems (e.g. energy management systems).
• Stratum 4: comprises the actions executed on the whole system under the control of a single operator, either the technical power system (i.e. controlled by the Transmission System Operator, for instance by load balancing), or the power market (i.e. with actions such as the balancing of the trading operations). Geographically, this stratum can comprise large regions, or even countries.
• Stratum 5: comprises all the actions involving more than one actor: system-of-systems operations (such as regional coordination), and the interactions between TSOs and market operators, Independent System Operators (ISOs, in charge of balancing wholesale demand and supply), and generation and distribution companies.

From the governance viewpoint, a first reaction could indicate that only the last stratum should be the object of infrastructure-wide measures.
This is surely the case, as this requires agreement among the different actors involved, for instance regarding standardized data formats and harmonized approaches. But the other strata might also be the object of infrastructure governance. As a matter of fact, the NERC cybersecurity standards cover aspects such as personnel access and training, security perimeters and sabotage reporting, all attributable to the lower strata. Why is this necessary? There is an obvious and implicit recognition that failures and security breaches affecting or originating in the lower strata, within the boundaries of one company, might dangerously impact the infrastructure as a whole.

The open question is whether a set of obligations and guidelines on particular aspects of the system (although reasonable and advisable, for instance the need to have a security management structure in place, or to identify the critical assets) is sufficient, or whether there is a need for a more comprehensive approach. With a simple bottom-up approach there is the obvious risk of companies concentrating on the bureaucratic aspects of compliance with the standards, with implicit moral hazards, particularly given the multi-national character of a power infrastructure such as the European one.
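The five strata of this section can be sketched as a small data structure, which makes explicit the two properties used in the argument: each stratum is comprised by the ones above it, and only the top stratum involves more than one actor by definition. This is a minimal illustration, not part of the original text; the member names and both helper functions are assumptions introduced here, and treating every enclosing stratum as potentially affected by an incident is a deliberate simplification.

```python
from enum import IntEnum


class Stratum(IntEnum):
    """The five strata described in the text; member names are illustrative."""
    VARIABLES_AND_DATA = 1           # power physical variables or digital data
    COMPONENTS_AND_APPLICATIONS = 2  # power components and digital applications
    STATIONS_AND_SYSTEMS = 3         # technical stations and complete digital systems
    SINGLE_ACTOR_SYSTEM = 4          # whole system under the control of one operator
    SYSTEM_OF_SYSTEMS = 5            # actions involving more than one actor


def enclosing_strata(origin):
    """Strata that comprise `origin` (hypothetical helper).

    Simplifying assumption: an incident originating in `origin` could, in
    principle, propagate to any of these higher strata.
    """
    return [s for s in Stratum if s > origin]


def involves_multiple_actors(stratum):
    """Only the top stratum involves more than one actor by definition."""
    return stratum is Stratum.SYSTEM_OF_SYSTEMS


if __name__ == "__main__":
    for s in enclosing_strata(Stratum.COMPONENTS_AND_APPLICATIONS):
        print(s.value, s.name, involves_multiple_actors(s))
```

Under this simplification, a breach at any stratum has stratum 5 among its enclosing strata, which is one way to read the argument that infrastructure-wide governance cannot be limited to multi-actor operations alone.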
6.6 Cybersecurity and the Power Infrastructure

The intensive use of ICT in the power infrastructure does not come without problems: straightforward security issues deriving from the intrinsic vulnerabilities of the ICT components and applications, but also some stemming from the specific traits of ICT with respect to the typical electro-mechanical systems in power infrastructures [21–23]. Among them we can mention:
• Differences in life-cycles: ICT has a much shorter mean life; it evolves through modifications and patches (most of them for solving unexpected problems); and when components are replaced, new technologies might have to be employed because the previous ones are unavailable, creating problems of interoperability with other components, etc.
• Differences in the technical approaches: from design to operation, maintenance, testing and the management of reliability, as ICT might require the acceptance of faulty systems for assuring the continuity of the functioning of the system.

The existence of ICT vulnerabilities puts the whole system at risk of failures, some of them with potentially catastrophic consequences. These failures can be related to the main objective of the power infrastructure, that is, they can be blackouts with the loss of service for a significant number of users for a considerable period. But they can also refer to situations that affect the functioning of the system by disturbing the operations of the power market or the intercommunication among the actors of the infrastructure, or to violations of the confidentiality of the business actors with potential financial losses.

Recently in the USA there has been some public acknowledgement of the cyber-vulnerability of the power grid. This includes a public note by NERC warning about the inadequacy of the current protection against possible cyber-attacks, and articles in the press about cyber-espionage in power companies.
[24] Apart from the anecdotes not supported by systematic analysis or the reporting of actual incidents, it is clear that there are some cybersecurity concerns of immediate relevance. In the following we discuss some distinctive cases using the set of strata introduced in the previous section:
• Stratum 1: most functions at the lowest level in the technical systems are shielded from the most direct threats by the “obscure” character of the technologies employed, at least as far as the more street-level hacker is concerned. In any case, the relevant information can be found rather easily in open sources. In addition, this stratum refers to the deepest points in the chains of measurements or operations, so to reach them, several other security breaches at higher levels have to succeed. But a strict defense-in-depth will also require adequate protection at this stratum. On the digital data side, the security problems are similar to those of typical desktop systems: dramatic in most cases, characterized by the dependence upon unreliable operating systems and software implementation processes, and susceptible to attacks, even ones of a not very sophisticated nature.
• Stratum 2: on the technical side, most functions (e.g. those executed by the SCADA systems) rely on protocols designed without any security principle in
mind, as they originate from times when security was not a vital requirement. The migration of those protocols onto IP networks is, if anything, worsening the situation, or at least facilitating the work of potential aggressors. It would not be difficult to imagine the production and broad availability of dedicated exploits for hitting these systems, as is now the case for the desktop environment, if there were the interest and potential benefits in doing so. The data processing applications are, once more, as insecure as other typical software applications of our age, depending, from the security viewpoint, mainly on measures taken at higher levels.
• Stratum 3: the most distinctive feature at this level is the increasing networking, with wireless connections not excluded in the near future. The dominant factors are the expansion of functional and operational capabilities based on timely access to information, the increased efficiency of the systems, and the possibility of early identification and diagnosis of abnormalities and failures. This matches the market pressure towards profitability and operations closer to the technical limits. More networking translates into more entry points for threats, and more possibilities for the spreading of security incidents. The trend is also steered by standards that define new communications architectures, generally based upon widespread Internet technologies. The risk is that of inheriting vulnerabilities, and of being subject to technical evolutions external to the logic of power systems.
• Stratum 4: at the level of companies, the evolution towards e-business is apparent, including functions such as congestion management and ancillary services management. Here too, standards drive the evolution of systems, accompanying the commercial ICT offer. The references are existing or emerging standards, or de facto ones, for global e-business.
The methods, models and implementation mechanisms are those prevailing in software and networking engineering, very different from the normal ones in power companies. The implication for power companies is that they will have to be supported by powerful and skilled networking teams, with strong security knowledge and the authority to impose adequate policies and protections.
• Stratum 5: in deregulated and internetworked systems such as the current power infrastructure, many ICT applications concern the interactions among TSOs for coordinating the management of the systems, and the implementation of the power market through interactions with main customers, traders, power exchanges, metering providers, entities in charge of system balancing (such as ISOs), and the power generation and distribution companies. Most of these can be seen as B2B web applications. The functioning of the system-of-systems depends upon seamless e-business communication among the market participants. The security implications are apparent; less apparent is the accompanying work that should be produced by the relevant standardization bodies (e.g. IEC Technical Committee 57) and other technical bodies.

To face the repercussions of ICT and cybersecurity on the power companies, these will have to recognize their E + I nature, and transform themselves into organizations with robust ICT and cybersecurity capabilities.
6.7 Governing Cybersecurity Issues

Following the previous reasoning, it is clear that the governance of cybersecurity in the power infrastructure will require interventions by different actors at different levels, with stakeholders playing interconnected roles with dissimilar responsibilities. Governance should be the synthesis where each actor finds its most appropriate position in the joint definition of the problem and the joint decision about the acceptable solutions.

What could be the right structure for the governance of cybersecurity in the power sector in Europe? There is no straightforward solution. What we can assert is the set of conditions and rules that will make that governance an effective reality. In the following we briefly discuss the foreseeable actions of the actors.

Cybersecurity requires power companies to solve the confluence of three challenges: the business side (i.e. supply, operations, competitiveness, profitability), the compliance side (i.e. standards, sector agreements, guidelines), and the deployment of security capabilities (i.e. staff, skills, available technologies, costs). The three dimensions are not always compatible with one another, and this mismatch can be a major source of problems. Some mismatches can be:
• The business side might require the limitation of investments, while for compliance purposes new practices should be put in place. Conversely, compliance might require improvements that make sense for the overall infrastructure, without reflecting the concrete situation of the single company.
• The business side might require functionality that stretches to the limit the security potential of the technical applications in use, while the availability of new technologies would demand, with high priority, the recruitment of new personnel with new technical skills beyond the plans of the company.
• For the sake of compliance, companies will have to implement technologies not fully compatible with their technical and competence base. On the other hand, companies might prefer to apply solutions for which there is no compliance programme in place, and so have to choose between the risk of violating the norms and that of pausing their technological advancement.

The other source of conflicts is the already mentioned multi-actor, multi-national character of the power infrastructure. From the operational point of view, the need to coordinate some actions and procedures of the interrelated actors, for instance for the management of congestion, has been recognized (e.g. in Europe with ENTSOE). Cooperation from the security standpoint can entail many levels:
• One-to-one: two companies, e.g. neighboring ones, agree upon the security assessment of the assets interconnecting them.
• Nation-wide: typically with the participation of the authorities, all the companies in the national power infrastructure agree upon common methods for the reporting and analysis of cybersecurity incidents.
• Infrastructure-wide: all companies interconnected in the multi-national infrastructure agree on the mechanisms for enforcement, and on benchmarking exercises.
• Inter-infrastructure: two infrastructures that are not interconnected agree to exchange information on vulnerabilities and the efficiency of security measures.

A key issue for companies and governance arrangements is to have relevant data for supporting their claims about the security and protection of their systems. Having no clear evidence could at times be an excuse for not acting. In other cases, insufficient information can not only hinder the determination of potential solutions, but might obstruct the very formulation of the problem. Several questions rely on the availability of proper data: How much protection is enough with respect to the expected threats? How resourceful and fit for their problems are the implemented solutions? What could be the security consequences of the introduction of new technologies?

The answers to these questions can only have three sources:
• Analysis of events experienced in real systems, but this is limited to actual occurrences of security incidents, and to technologies actually deployed
• Desktop analysis based on models, but this is limited by the pertinence and applicability of the supporting concepts, theories, methods and models used, usually based upon what is known about previous realities
• Experimental work, for which there is the need to employ realistic settings (test beds, test ranges) in which to reproduce the incidents that could potentially happen in real systems [25, 26]

The last option is the most costly, but it is the one that could support the verification of new technologies, architectures and security scenarios. Manufacturers normally test components and devices, not least for obtaining their certification. However, testing at large scale requires specifically devoted installations. In the USA special installations have been developed for the testing of SCADA (ref. the National SCADA Test Bed Program, with installations such as the ones at the Idaho National Laboratory [27]).
In Europe similar facilities do not exist. Their development appears to be necessary. Their funding should involve all interested parties:
• Governments should look at them as basic capabilities for getting answers to security problems, and for enabling both the needed science and technology and innovation in industry. In this light, it can be argued that governments should be the main source of funds for the establishment of these experimental facilities. In addition, governments should fund all experiments justified by national security or societal issues at large [28].
• Power operators should contribute to the development of the facilities, and should shape their way of functioning. Operators obviously know how they are making use of ICT, and which anomalies might cause problems. Their contribution is fundamental, both as providers of field information and as main users of the results of the experiments. Operators should fund the experiments directly linked to their industrial problems, while contributing to those related to national security or society-wide issues [28].
6 Governance: How to Deal with ICT Security in the Power Infrastructure?
• ICT suppliers are key actors as the producers of the ICT systems. Their participation is fundamental, mainly with regard to new technologies and prospective standards. Technology suppliers should financially support the experiments organized to try out their own technologies, while potentially contributing to the other experiments. Technology suppliers could clearly be among the main beneficiaries of this initiative, in terms of innovation and of qualification of their equipment [28, 29].
The real challenge is understanding the security condition of the overall infrastructure. Single events in particular systems can be useful for understanding a particular vulnerability, but extrapolating them to the whole infrastructure is not straightforward. From the analytic standpoint, the power engineering disciplines have developed in recent decades many powerful instruments for the modeling and simulation of power systems. Constructed using the differential algebraic equations that characterize the behavior of the system, these models can be used to simulate the functioning of the power network under different conditions. Less typical are models integrating the ICT functions, and rarer still those that also include other company operations such as the market. In any case, simulating the normal functioning of power systems or power markets (even when some contingencies are taken into account) differs from simulating cybersecurity incidents and the propagation of the resulting effects. For cybersecurity, what matters is not only the effect on the power system (i.e. the continuity of supply) but also the dynamics of the security breach itself (i.e. vulnerability exploited, behavior of the threat, cascading faults, errors of human operators or protection mechanisms, etc.).
In addition, modeling and simulating cybersecurity events requires the capability of representing and reproducing concurrent actions (not least because the same ICT application is used in multiple places), and the interactions with human operators (who can misunderstand a situation due to conflicting data, or act by changing configurations or the operational state of components). In conclusion, we are far from being able to solidly support security management and governance decisions regarding cybersecurity. Most decisions today can only be based upon generic guidelines and standards entailing broad policies (e.g. the requirement of implementing security controls, of basing the security measures on risk assessments, of planning responses to incidents and business continuity, of reporting incidents). These actions are needed, but do not suffice for securing the infrastructure, from at least two perspectives:
• The introduction of innovative ICT systems for which there are not enough antecedents to support security analysis. Vulnerabilities and threats will always be possible. Avoiding these ICT systems is not an option. A disciplined use of experimental security in suitable test ranges appears to be a more sensible approach.
• The extremely large dimension of the power infrastructure, including the technical, management and market components, and the complexity of the potential states and behaviors determined by the countless and unforeseeable interactions
M. Masera
of many actors, warns against the sole use of analytic approaches. Also in this case the application of experimental approaches appears to be a sensible, although costly, solution for providing solid ground for governance decisions.
6.8 Research and Policy Recommendations Based on the previous considerations, the author formulates the following recommendations with respect to the research and policy initiatives that could help in dealing with the topics discussed in the article:
6.8.1 Research Recommendations
• Establish a European research strategy for enabling the definition and implementation of an appropriate governance of ICT security in the power sector:
1. Program of multi-disciplinary studies considering the policy, social, legal, economic and technical factors in the governance of critical infrastructures in general and the power sector in particular, including:
(a) Governance arrangements for ICT security in the power sector, in the context of the European policy framework
(b) Analysis of the international dimension of ICT security and the power sector for an infrastructure that exceeds the borders of the European Union
(c) Effective and fair means for informing the end users (industrial users, general public) about the ICT risks entailed by systems such as smart grids, and for managing incidents
2. Study of rigorous approaches for stating and validating security claims regarding the ICT systems of power systems and systems-of-systems, including:
(a) Modeling and simulation capabilities for dealing with the whole European power infrastructure, including technical (physical and cyber aspects), organization and market components. A key aspect is the need to be able to reproduce security scenarios, including failures, insufficient adequacy, and malicious attacks – aspects that are normally not taken into account in operational simulators.
(b) Experimental facilities for carrying out security experiments, mainly of the ICT components of power systems. These facilities could be test beds, mainly based on emulated components, or test ranges, based on the reproduction of real systems. There is a need for a rigorous, systematic approach to the design, running, observation and analysis of those experiments, as is normal in other fields of experimental science.
3. Programs for supporting the operators in managing the evolution of the power infrastructure, beyond what can be achieved by a single actor. This can include:
(a) Vulnerability and security assessment of power systems at the European infrastructure, system-of-systems level. The open question is how to aggregate the data of the single systems while respecting business confidentiality and national security concerns. The results should be summarized in metrics representative of the vulnerability and security status of the infrastructure. This sensitive information should be available only to a restricted set of persons.
(b) Scenario-based training for pan-European operations in case of security contingencies. This should comprise malicious attacks, for which collaboration with national security services might be required. Research is needed on new methods that take into account the psychological reaction of operators to stress conditions in attack scenarios.
(c) Exchange of security-sensitive information on a cross-border basis. Firsthand knowledge of security events is of foremost importance for defining prevention and protection programs, and for benchmarking the efficiency of different approaches. These data could also be used for supporting specialized R&D activities, with a controlled dissemination of information to trusted researchers.
6.8.2 Policy Recommendations
1. Support the integration and further development of protection capabilities for the power sector at the European level, including the sponsorship of European public–private partnerships and following a jointly developed agenda:
(a) The protection of the power infrastructure depends on the collaboration among many different actors: operators, vendors, regulators, other authorities, R&D. A first step towards any governance structure is the fostering of trust among all stakeholders. The development of joint capabilities could benefit from the outlining of a common agenda.
(b) Foster the definition of an agreed minimum set of capabilities for the detection, response, and mitigation of cyber incidents. Incoherent approaches by different actors could hinder the overall protection. The agreement among the operators could take the form of self-regulation.
2. Support the creation of R&D centers and networks of excellence serving the whole European power system in the field of security in general and cybersecurity in particular. This should aim at:
(c) Producing a critical mass of researchers for tackling the different problems faced by industry, and for anticipating future needs in coordination with the power sector
(d) Producing technical platforms for demonstrating and validating the results of R&D, and for encouraging innovation
(e) Supporting education and training, based on the latest available technologies and on new ICT applications ready to be deployed
6.9 Conclusions
We have discussed why there are no simple solutions for the cybersecurity problem faced by the power infrastructure. Disregarding it, or overlooking the real character of the problem, could only lead to serious incidents in the future. The main implication of this development is the urgent need to take into account the cybersecurity of power systems. But due to the multi-party and, in places such as Europe, multi-national character of the infrastructure, many decisions regarding cybersecurity have to be agreed upon by all relevant private and public actors. Solutions should be based on a proper understanding of the problem, and be the result of decisions taken within some concrete cooperative problem-solving arrangement involving all concerned parties. We have debated the characteristics that this governance of the cybersecurity of the power infrastructure should have. A key issue is the supporting evidence for that decision-making process. It is argued that today we do not have enough data on the ICT security features of the whole power infrastructure for making sound decisions. Dedicated initiatives, mainly based upon experimental tests, are required to contribute towards this end. The alternative is to use unproven applications in real systems, but trial and error is not an option for all systems of the infrastructure.
References
1. Scharpf, F.W.: What have we learned? Problem-solving capacity of the multilevel European polity. MPIfG Working Paper, Cologne (2001)
2. de Jong, H.: Regulatory mode decision-making in the European process of electricity market integration. Netw. Ind. Q. 10(3) (2008)
3. Energy Policy Act: Bill of the Congress, United States of America, 29 July 2005, Public Law 109-58 (2005)
4. European Union: Council Directive 96/92/EC Concerning Common Rules for the Internal Market in Electricity (1996)
5. European Union: Directive 2003/54/EC of the European Parliament and of the Council of 26 June 2003 Concerning Common Rules for the Internal Market in Electricity and repealing Directive 96/92/EC (2003)
6. Gheorghe, A. et al.: Critical Infrastructures at Risk: Securing the European Electric Power System. Springer, New York (2006)
7. International Energy Agency: IEA Energy Policy Review – The European Union. OECD Publishing (2008)
8. International Energy Agency: Energy Policies of IEA Countries – United States. OECD Publishing (2007)
9. Directorate-General Competition, European Commission: DG Competition Report on Energy Sector Inquiry, 10 January 2007, Brussels (2007)
10. Lagendijk, V.C.: Electrifying Europe: The power of Europe in the construction of electricity networks. Aksant, Amsterdam (2008)
11. GRTN: Blackout: The events of 28 September 2003 (October 2003)
12. Larsson, S., Danell, A.: The black-out in southern Sweden and eastern Denmark, September 23, 2003. In: Power Systems Conference and Exposition (PSCE), IEEE PES 2006, pp. 309–313. Atlanta, GA (2006)
13. Union for the Co-ordination of Transmission of Electricity: Final Report – System Disturbance on 4 November 2006. UCTE (2007)
14. European Network of Transmission System Operators for Electricity. http://entsoe.eu/ (2009). Accessed 7 Dec 2009
15. De Jong, H.M., Hakvoort, R.A.: European electricity market integration: an analysis of the European governance dynamics. In: Proceedings of the 31st IAEE International Conference: Bridging Energy Supply and Demand: Logistics, Competition and Environment, Istanbul, June 2008
16. Djelic, M.L., Sahlin-Andersson, K.: Transnational Governance: Institutional Dynamics of Regulation. Cambridge University Press, Cambridge (2008)
17. Finger, M., Groenewegen, J., Künneke, R.: The quest for coherence between institutions and technologies in infrastructures. J. Netw. Ind. 6(4), 227–259 (2005)
18. Eisenhauer, J. et al.: Roadmap to Secure Control Systems in the Energy Sector. Energetics, Columbia, MD (2006)
19. Stouffer, K., Falco, J., Kent, K.: Guide to Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems Security, Initial Public Draft. National Institute of Standards and Technology, Gaithersburg, MD (2006)
20. Ketzner, P. et al.: Process Control System Security Technical Risk Assessment Methodology & Technical Implementation. I3P Research Report no. 13, March 2008 (2008)
21. Sanders, W.: TCIP: Trustworthy Cyber Infrastructure for Power. Keynote at the Cyber Security and Information Infrastructure Research Workshop, Oak Ridge, TN, 14–15 May 2007
22. Tomsovic, K., Bakken, D., Venkatasubramanian, M., Bose, A.: Designing the next generation of real-time control, communication and computations for large power systems. Proc. IEEE (special issue on Energy Infrastruct. Syst.) 93(5) (2005)
23. Hauser, C.H., Bakken, D.E., Dionysiou, I., Gjermundrød, K.H., Irava, V.S., Helkey, J., Bose, A.: Security, trust and QoS in next-generation control and communication for large power systems. Int. J. Crit. Infrastruct. 4(1/2) (2008)
24. NERC: NERC and Electric Industry Continue Efforts to Address Cyber Risk. North American Electric Reliability Corporation, Press Release, 17 June 2009 (2009)
25. Giani, A. et al.: A testbed for secure and robust SCADA systems. ACM SIGBED Rev. 5(2) (2008)
26. Nai Fovino, I., Masera, M.: Methodology for experimental ICT industrial and critical infrastructure security tests. In: Ortiz-Arroyo, D. et al. (eds.) Intelligence and Security Informatics, Lecture Notes in Computer Science, vol. 5376. Springer Verlag, Berlin (2008)
27. Idaho National Laboratory: Securing the power flow, INL’s SCADA test bed. http://www.inl.gov/ (2009). Accessed 7 Dec 2009
28. GRID project: ICT vulnerabilities of power systems: a roadmap for future research. http://grid.jrc.it/ (2007). Accessed 7 Dec 2009
29. Energetics Inc.: Roadmap to secure control systems in the energy sector, Columbia, MD, January 2006. http://www.oe.energy.gov/ (2006). Accessed 7 Dec 2009
Chapter 7
Deficient ICT Controls Jeopardize Systems Supporting the Electric Grid: A Case Study Nabajyoti Barkakati and Gregory C. Wilshusen
Abstract Information and communications technologies (ICT) supporting critical infrastructures, including the electric grid, face increasing risks due to cyber threats, system vulnerabilities, and the serious potential impact of attacks or malfunction, as demonstrated by reported incidents. If these technologies are not adequately secured, their vulnerabilities could be exploited and critical infrastructures could be disrupted or disabled, possibly resulting in loss of life, physical damage, or economic losses. The US Government Accountability Office (GAO) examined the controls implemented by the Tennessee Valley Authority (TVA) – the United States’ largest public power company – to protect ICT including control systems and networks used to operate critical infrastructures. GAO’s examination identified numerous vulnerabilities that placed TVA’s control systems and networks at increased risk of unauthorized modification or disruption by both internal and external threats, and numerous actions that TVA can take to mitigate these vulnerabilities. This case study summarizes the results of GAO’s examination of the controls over TVA’s critical infrastructure control systems. Keywords Critical infrastructures • Information and Communication Technology • control systems • vulnerability • risk analysis
N. Barkakati and G.C. Wilshusen, U.S. Government Accountability Office, 441 G St NW, Washington, D.C. 20548, USA
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_7, © Springer Science + Business Media B.V. 2010
7.1 Introduction
Information security is a critical consideration for any organization that depends on ICT to carry out its mission or business [1, 2]. Of particular importance is the security of information and systems supporting critical infrastructures – physical or virtual systems and assets so vital to a nation that their incapacitation or destruction would have a debilitating impact on national and economic security and on public
health and safety. These systems and assets – such as the electric power grid, chemical plants, and water treatment facilities – are essential to the operations of the economy and the government. The electric power grid’s reliance on ICT puts it at increasing risk from cyber threats [3]. Interconnection of control systems with public data networks makes them accessible to individuals and organizations from anywhere in the world [4]. Cybersecurity becomes even more critical as the electric power industry moves towards Smart Grid technology that further integrates distributed computing and communications into the electric grid to create an automated, widely distributed energy delivery network [5]. A recently completed comprehensive review of United States cyber security related policies and structures acknowledges the importance of security and resilience of infrastructure systems and recommends that the US Department of Energy should work with the US Federal Energy Regulatory Commission “to determine whether additional security mandates and procedures should be developed for energy-related industrial control systems. In addition, as the United States deploys new Smart Grid technology, the Federal government must ensure that security standards are developed and adopted to avoid creating unexpected opportunities for adversaries to penetrate these systems or conduct large-scale attacks” [6].
7.1.1 Industrial Control Systems Are Used in Critical Infrastructures and the Electric Grid Industrial control systems are used within these infrastructures to monitor and control sensitive processes and physical functions. Typically, control systems collect sensor measurements and operational data from the field, process and display this information, and relay control commands to local or remote equipment. Control systems perform functions that range from simple to complex. They can be used to simply monitor processes – for example, the environmental conditions in a small office building – or to manage the complex activities of a municipal water system or a nuclear power plant. In the electric power industry, control systems can be used to manage and control the generation, transmission, and distribution of electric power. For example, control systems can open and close circuit breakers and set thresholds for preventive shutdowns. There are two primary types of control systems: distributed control systems and supervisory control and data acquisition (SCADA) systems. Distributed control systems typically are used within a single processing or generating plant or over a small geographic area and communicate using local area networks, while SCADA systems typically are used for large, geographically dispersed operations and rely on long-distance communication networks. In general, critical infrastructure sectors and industries depend on both types of control systems to fulfill their missions or conduct business. For example, a utility company that serves a large geographic area may use distributed control systems to manage power generation at each power plant and a SCADA system to manage power distribution to its customers. A SCADA system is generally composed of these six components (see Fig. 7.1): (1) operating equipment, which includes pumps, valves, conveyors, and substation
breakers; (2) instruments, which sense conditions such as pH, temperature, pressure, power level, and flow rate; (3) local processors, which communicate with the site’s instruments and operating equipment, collect instrument data, and identify alarm conditions; (4) short-range communication, which carries analog and discrete signals between the local processors and the instruments and operating equipment; (5) host computers, where a human operator can supervise the process, receive alarms, review data, and exercise control; and (6) long-range communication, which connects local processors and host computers using, for example, leased phone lines, satellite, and cellular packet data. A distributed control system is similar to a SCADA system but does not operate over a large geographic area or use long-range communications.
Fig. 7.1 Major components of a SCADA system (figure labels: control center, host computers, long-range communication via telephone, satellite, cellular, and wide area network (WAN), field site, local processors, short-range communication, instruments, operating equipment)
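The six components listed above, and the stated difference between a SCADA system and a distributed control system, can be sketched as a small data model. This is only an illustrative sketch: the names `Component`, `ControlSystem`, and `kind` are hypothetical and do not come from the GAO review, and classifying a system solely by the presence of long-range communication is a simplifying assumption drawn from the last sentence of the paragraph.

```python
from dataclasses import dataclass, field
from enum import Enum


class Component(Enum):
    """The six SCADA components named in the text (hypothetical names)."""
    OPERATING_EQUIPMENT = 1        # pumps, valves, conveyors, substation breakers
    INSTRUMENTS = 2                # sense pH, temperature, pressure, flow rate, ...
    LOCAL_PROCESSORS = 3           # collect instrument data, identify alarms
    SHORT_RANGE_COMMUNICATION = 4  # local processors <-> instruments/equipment
    HOST_COMPUTERS = 5             # operator supervision and control
    LONG_RANGE_COMMUNICATION = 6   # local processors <-> host computers


@dataclass
class ControlSystem:
    name: str
    components: set = field(default_factory=set)

    def kind(self) -> str:
        # Assumption: a system without long-range communication is treated
        # as a distributed control system, following the text's distinction.
        if Component.LONG_RANGE_COMMUNICATION in self.components:
            return "SCADA"
        return "distributed control system"


# A plant-level system uses everything except long-range communication.
plant = ControlSystem("plant", set(Component) - {Component.LONG_RANGE_COMMUNICATION})
# A geographically dispersed system uses all six components.
grid = ControlSystem("grid", set(Component))

print(plant.kind())
print(grid.kind())
```

A real inventory would also record which devices each communication link connects, since those links are where the exposure discussed in the next section arises.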
7.1.2 Control Systems for Critical Infrastructures Face Increasing Risks
Critical infrastructure control systems face increasing risks due to cyber threats, system vulnerabilities, and the potential impact of attacks as demonstrated by reported incidents [1, 2]. Cyber threats can be intentional or unintentional, targeted or nontargeted, and can come from a variety of sources. The United States Federal Bureau of Investigation has identified multiple sources of threats to US critical infrastructures, including foreign nation states engaged in information warfare; criminal groups and hackers; and disgruntled employees, contractors, or business partners working with or within an organization. Disruptions to ICT controls can have significant effects on utilities such as electricity and water. Although there is no comprehensive source for incident reporting, the following incidents [7] demonstrate the potential impact of an attack:
• Maroochy Shire sewage spill. In the spring of 2000, a former employee of an Australian organization that develops manufacturing software applied for a job with the local government, but was rejected. Over a 2-month period, this individual reportedly used a radio transmitter on as many as 46 occasions to remotely break into the
controls of a sewage treatment system. He altered electronic data for particular sewerage pumping stations and caused malfunctions in their operations, ultimately releasing about 264,000 gal of raw sewage into nearby rivers and parks. • Davis–Besse power plant. The Nuclear Regulatory Commission confirmed that in January 2003, the Microsoft SQL Server worm known as Slammer infected a private computer network at the idled Davis–Besse nuclear power plant in Oak Harbor, Ohio, disabling a safety monitoring system for nearly 5 h. In addition, the plant’s process computer failed, and it took about 6 h for it to become available again. • Northeast power blackout. In August 2003, failure of the alarm processor in the control system of FirstEnergy, an Ohio-based electric utility, prevented control room operators from having adequate situational awareness of critical operational changes to the electrical grid. This problem was compounded when the state estimating program at the Midwest Independent System Operator failed due to incomplete information on the electric grid. When several key transmission lines in northern Ohio tripped due to contact with trees, they initiated a cascading failure of 508 generating units at 265 power plants across eight states and a Canadian province. • Taum Sauk Water Storage Dam failure. In December 2005, the Taum Sauk Water Storage Dam, approximately 160 km south of St. Louis, Missouri, suffered a catastrophic failure, releasing a billion gallons of water. According to the dam’s operator, the incident may have occurred because the gauges at the dam read differently than the gauges at the dam’s remote monitoring station. Control systems are more vulnerable to cyber threats, including intentional attacks and unintended incidents, than in the past for several reasons, including their increasing standardization and their increased connectivity to other systems and the Internet. 
According to the Annual Threat Assessment of the Intelligence Community dated February 12, 2009, “the growing interconnectivity between information systems, the Internet, and other infrastructures creates opportunities for attackers to disrupt telecommunications, electrical power, and other critical infrastructures” [8]. A successful cyber attack against physical infrastructure computer systems, such as those that control power grids, has the potential to disrupt services for hours or weeks. The potential impact of intentional attacks and unintentional incidents involving control systems can be serious. In August 2006, two circulation pumps at Unit 3 of the Browns Ferry, Alabama, nuclear power plant operated by TVA failed, forcing the unit to be shut down manually. The failure of the pumps was traced to an unintended incident involving excessive traffic on the control system network caused by the failure of another control system device. Other examples are described in the list of incidents above. Critical infrastructure owners face both technical and organizational challenges to securing control systems. Technical challenges – including control systems’ limited processing capabilities, real-time operations, and design constraints – hinder an infrastructure owner’s ability to implement traditional information technology (IT) security processes, such as strong user authentication and patch management. Organizational challenges include difficulty in developing a compelling business case for investing in control systems security and differing priorities of information security personnel and control systems engineers.
7.1.3 TVA Provides Power to the Southeastern United States
The Tennessee Valley Authority is a federal corporation and the United States’ largest public power company. TVA’s power service area covers about 207,200 km², includes almost all of Tennessee and parts of six other states, and serves about 8.7 million people. It operates 11 coal-fired fossil fuel plants, eight combustion turbine plants, three nuclear plants, and a hydroelectric system that includes 29 hydroelectric dams and one pumped storage facility¹ (see Fig. 7.2). Fossil fuel and nuclear plants account for 90% of TVA’s power. TVA’s coal-fired generating facilities have 15,075 MW of capacity and its combustion turbines provide an additional
6,003 MW [9]. TVA’s nuclear plants generate 6,900 MW of electricity [10]. The hydroelectric system has a total capacity of 4,786 MW [11]. TVA also owns and operates one of the largest transmission systems in North America. It includes over 25,000 km of power lines [12]. TVA’s transmission system moves electric power from the generating plants where it is produced to distributors of TVA power and to industrial and federal customers across the region. TVA provides power to three main customer groups: distributors, directly served customers, and off-system customers. There are 159 distributors – 109 municipal utility companies and 50 cooperatives – that resell TVA power to consumers. These groups represent the base of TVA’s business, accounting for 85% of TVA’s total revenue. Fifty-three large industrial customers and six federal government installations buy TVA power directly. They represent 11% of TVA’s total revenue. Twelve surrounding utilities buy power from TVA on the interchange market. Sales to these utilities represent 4% of TVA’s total revenue.
Fig. 7.2 TVA’s seven state service area and generating facilities (map of Kentucky, Virginia, Tennessee, Mississippi, Alabama, North Carolina, and Georgia; legend: service area, hydroelectric (30), coal and combustion turbine (16), nuclear (3))
¹ A pumped-storage plant uses two reservoirs, with one located at a much higher elevation than the other. During periods of low demand for electricity, such as nights and weekends, energy is stored by reversing the turbines and pumping water from the lower to the upper reservoir. The stored water can later be released to turn the turbines and generate electricity as it flows back into the lower reservoir.
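The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers as printed; the variable names are hypothetical, and treating the 6,900 MW nuclear figure as installed capacity comparable to the other three is an assumption. The capacity share it computes need not equal the 90% figure in the text, which presumably refers to power produced rather than installed capacity.

```python
# Capacity figures in MW, as quoted in the text.
capacity_mw = {
    "coal": 15_075,
    "combustion turbine": 6_003,
    "nuclear": 6_900,        # assumption: treated as installed capacity
    "hydroelectric": 4_786,
}

total_mw = sum(capacity_mw.values())


def share(*sources: str) -> float:
    """Fraction of total quoted capacity provided by the named sources."""
    return sum(capacity_mw[s] for s in sources) / total_mw


fossil_and_nuclear = share("coal", "combustion turbine", "nuclear")

# Customer groups and revenue shares (percent), as quoted in the text.
revenue_share_pct = {"distributors": 85, "directly served": 11, "off-system": 4}
distributors = {"municipal": 109, "cooperative": 50}

print(f"total quoted capacity: {total_mw} MW")
print(f"fossil + nuclear share of capacity: {fossil_and_nuclear:.1%}")
print(f"revenue shares sum to {sum(revenue_share_pct.values())}%")
print(f"distributors: {sum(distributors.values())}")
```

The distributor counts and revenue shares add up to the stated totals of 159 and 100%, so those figures are internally consistent.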
7.1.4 ICT Control Systems Are Essential to TVA’s Operation
TVA uses ICT control systems to both generate and deliver power. To generate power, control systems are used within power plants to open and close valves, control equipment, monitor sensors, and ensure the safe and efficient operation of a generating unit. Many control systems networks connect with other agency networks to transmit system status information. To deliver power, TVA monitors the status of its own and surrounding transmission facilities from two operations centers. Control systems at these centers are used to open and close breakers and balance the transmission of power across the TVA network while accounting for changes in network capacity due to outages and changes in demand that occur continuously throughout the day. TVA’s control systems range in capacity from simple systems with limited functionality located in one facility to complex, geographically dispersed systems with multiple functions. The ages of these control systems range from modern systems to systems dating back 20 or more years to the original construction of a facility. Responsibility for control systems security is distributed throughout TVA. TVA’s Information Services organization manages the overall TVA corporate computer network that links facilities throughout the TVA service area and is connected to the Internet. As of February 2008, the Enterprise IT Security organization within Information Services was given specific responsibility for cyber security throughout the agency. However, the control systems located within a plant are integrated with and managed as part of the generation equipment, safety and environmental systems, and other physical equipment located at that plant. This means that development, day-to-day maintenance and operation, and upgrades of control systems are handled by the business units responsible for the facilities where the systems are located.
Specifically, nuclear systems are managed by the Nuclear Power Group; coal and combustion turbine control systems are managed by the Fossil Power Group; and
hydroelectric facilities are managed by River Operations. Transmission control systems are managed by TVA’s Transmission and Reliability Organization, located within its Power Systems Operations business unit. TVA’s Transmission and Reliability Organization is highly dependent on ICT control systems. In an effort to ensure its systems are secure, the Transmission and Reliability Organization has handled additional aspects of information security compared with other TVA organizations. For example, the organization manages portions of its own network infrastructure. It also has arranged for both internal and external security assessments in order to enhance the security of its control systems.
7.2 Case Study: Implementation of ICT Controls for Systems and Networks Supporting TVA’s Critical Infrastructure At the request of the Chairs and Ranking Members of three Congressional committees or subcommittees, GAO examined the ICT controls and security practices implemented by the Tennessee Valley Authority (TVA) to protect its control systems and networks. The objective of GAO’s review was to determine if TVA had effectively implemented appropriate information security practices for the control systems used to operate its critical infrastructures. GAO conducted its review using the Federal Information System Controls Audit Manual [13], a methodology for reviewing information system controls that affect the confidentiality, integrity, and availability of computerized data. The work was focused on the corporate network and ICT control systems located at six TVA facilities. These facilities were selected to provide a cross-section of the variety of control systems by type of generation facility (coal, combustion turbine, hydroelectric, and nuclear) and function (generation and transmission). To accomplish its objective, GAO conducted tests and observations using federal guidance, checklists, and vendor best practices for information security; examined and analyzed relevant documentation; and held discussions with key security representatives, system administrators, and management officials to determine whether information system controls were in place, adequately designed, and operating effectively. GAO also reviewed previous reports issued by the TVA Inspector General’s Office. This review was conducted from March 2007 to April 2008 in accordance with generally accepted US government auditing standards.
7.2.1 TVA Had Not Fully Implemented Appropriate Security Practices to Protect Its Critical Infrastructures GAO’s review determined that TVA had not fully implemented appropriate security practices to secure the control systems used to operate its critical infrastructures. Both the corporate network infrastructure and control systems networks and devices
at individual facilities and plants were vulnerable to disruption. In addition, physical security controls at multiple locations did not sufficiently protect critical control systems. Perhaps one of the more serious deficiencies was that the interconnections between TVA’s control system networks and its corporate network increased the risk that security weaknesses on the corporate network could affect control systems networks. For example, because of weaknesses in the separation of lower security network segments from higher security network segments on TVA networks, an attacker who gained access to a less secure portion of a network such as the corporate network could potentially compromise equipment in a more secure portion of the network, including equipment that has access to control systems. As a result, TVA’s control systems that operate its critical infrastructures are at increased risk of unauthorized modification or disruption by both internal and external threats.
7.2.1.1 Weaknesses in TVA’s Corporate Network Controls Placed Network Devices at Risk
Multiple weaknesses within the TVA corporate network left it vulnerable to potential compromise of the confidentiality, integrity, and availability of network devices and the information transmitted by the network. For example:
• Almost all of the workstations and servers that were examined on the corporate network lacked key security patches or had inadequate security settings, thereby increasing the risk that known vulnerabilities could allow an attacker to execute malicious code and gain control of or compromise a system.
• TVA had not effectively configured host firewall controls on the laptop computers that were reviewed, and one remote access system that was reviewed had not been securely configured.
• Network services had been configured to operate across lower- and higher-security network segments, which could allow a malicious user to gain access to sensitive systems or modify or disrupt network traffic.
• TVA’s ability to use its intrusion detection system (footnote 2) to effectively monitor its network was limited, which increased the risk that unauthorized access to TVA’s networks may not be detected and mitigated in a timely manner.
Footnote 2: An intrusion detection system detects inappropriate, incorrect, or anomalous activity that is aimed at disrupting the confidentiality, availability, or integrity of a protected network and its computer systems.
7.2.1.2 Weaknesses in TVA Control Systems Networks Jeopardized the Security of Its Control Systems
Security safeguards implemented by TVA did not adequately protect its control systems networks and devices, leaving the control systems vulnerable to disruption by unauthorized individuals. For example:
• TVA had implemented firewalls to segment control systems networks from the corporate network. However, the configuration of certain firewalls limited their effectiveness. For example, firewalls at three of six facilities reviewed were either bypassed or inadequately configured. As a result, the hosts on higher security control system networks were at increased risk of compromise or disruption from the other lower security networks.
• The agency did not have effective passwords or other equivalent documented controls to restrict access to the control systems that were reviewed. According to agency officials, passwords were not always technologically possible to implement, but in the cases that were reviewed there were no documented compensating controls.
• Although TVA had taken steps to establish audit logs for its transmission control centers, it had not established effective audit logs or compensating controls at other facilities that were reviewed. According to agency officials, system limitations at these facilities have historically meant that multiple users shared a single account to access these control systems, thereby diminishing the usefulness of audit logs since system activities could not be traced to a single user.
• TVA had not installed current versions of patches for key applications on computers on control systems networks. In addition, the agencywide policy for patch management did not apply to individual plant-level control systems.
• Although TVA had implemented antivirus software on its transmission control systems network, it had not consistently implemented antivirus software on other control systems that were reviewed.
7.2.1.3 Physical Security Did Not Sufficiently Protect Sensitive Control Systems TVA had not consistently implemented physical security controls at several facilities that were reviewed. For example: • Live network jacks connected to TVA’s internal network at certain facilities that were reviewed had not been adequately secured from unauthorized access. • At one facility, sufficient emergency lighting was not available, a server room had no smoke detectors, and a control room contained a kitchen (a potential fire and water hazard). • The agency had not always ensured that access to sensitive computing and industrial control systems resources had been granted to only those who needed it to perform their jobs. At one facility, about 75% of facility badgeholders had access to a plant computer room, although the vast majority of these individuals did not need access. Officials stated that all of those with access had been through the required background investigation and training process. Nevertheless, an underlying principle for secure computer systems and data is that users should be granted only those access rights and permissions needed to perform their official duties.
7.2.2 Information Security Management Program Was Not Consistently Implemented Across TVA’s Critical Infrastructure An underlying reason for TVA’s ICT control weaknesses was that it had not consistently implemented significant elements of its information security program, such as: documenting a complete inventory of systems; assessing risk of all systems identified; developing, documenting, and implementing information security policies and procedures; and documenting plans for security of control systems as well as for remedial actions to mitigate known vulnerabilities. As a result of not fully developing and implementing these elements of its information security program, TVA had limited assurance that its control systems were adequately protected from disruption or compromise from intentional attack or unintentional incident.
7.2.2.1 TVA’s Inventory of Systems Did Not Include Many Control Systems TVA’s inventory of systems did not include all of its control systems as required by agency policy. In its fiscal year 2007 report on compliance with federal information security requirements, TVA included the transmission and the hydro automation control systems in its inventory. However, the plant control systems at its nuclear and fossil facilities had not been included in the inventory. At the conclusion of GAO’s review, agency officials stated they planned to develop a more complete and accurate system inventory.
7.2.2.2 TVA Had Not Assessed Risks to Its Control Systems
TVA had not completed categorizing risk levels or assessing the risks to its control systems. US law (footnote 3) mandates that agencies assess the risk and magnitude of harm that could result from the unauthorized access, use, disclosure, disruption, modification, or destruction of their information and information systems. However, while the agency had categorized the transmission and hydro automation control systems as high-impact systems, its nuclear division and fossil business unit, which includes its coal and combustion turbine facilities, had not assigned risk levels to its control systems. TVA had also not completed risk assessments for the control systems at its hydroelectric, nuclear, coal, and combustion turbine facilities. According to TVA officials, the agency plans to complete the risk assessments and security categorizations of remaining control systems.
Footnote 3: Federal Information Security Management Act (FISMA) of 2002, which was enacted as title III, E-Government Act of 2002, Pub. L. No. 107-347, 116 Stat. 2899, 2946 (Dec. 17, 2002).
7.2.2.3 Inconsistent Application of TVA’s Policies and Procedures Contributed to Program Weaknesses Several shortfalls in the development, documentation, and implementation of TVA’s information security policies contributed to many of the inadequacies in TVA’s security practices. For example: • TVA had not consistently applied agencywide information security policies to its control systems, and TVA business unit security policies were not always consistent with agencywide policies. • Cyber security responsibilities for interfaces between TVA’s transmission control system and its hydroelectric and fossil generation units had not been documented. • Physical security standards for control system sites had not been finalized or were in draft form. 7.2.2.4 Patch Management Weaknesses Left TVA’s Control Systems Vulnerable Weaknesses in TVA’s patch management process hampered the efforts of TVA personnel to identify, prioritize, and install critical software security patches to TVA systems in a timely manner. For a 15-month period, TVA documented its analysis of 351 reported vulnerabilities, while NIST’s National Vulnerability Database reported about 2,000 vulnerabilities rated as high or medium risk for the types of systems in operation at TVA for the same time period. In addition, upon release of a patch by the software vendor, the agency had difficulty in determining the patch’s applicability to the software applications in use at the agency because it did not have a mechanism in place to provide timely access to software version and configuration information for the applications. Furthermore, TVA’s written guidance on patch management provided only limited guidance on how to prioritize vulnerabilities. The guidance did not refer to the criticality of IT resources or specify situations in which it was acceptable to upgrade or downgrade a vulnerability’s priority from that given by its vendors or third-party patch tracking services. 
As a result, patches that were identified as critical were not applied in a timely manner; in some cases, a patch was applied more than 6 months past TVA deadlines for installation. 7.2.2.5 TVA Had Not Developed System Security and Remedial Action Plans for All Control Systems TVA had not developed system security or remedial action plans for all control systems as required under federal law and guidance. Security plans document the system environment and the security controls selected by the agency to adequately protect the system. Remedial action plans document and track activities to implement missing controls such as missing system security plans and other corrective actions
necessary to mitigate vulnerabilities in the system. Although TVA had developed system security and remedial action plans for its transmission control system, it had not done so for control systems at the hydroelectric, nuclear, or fossil facilities. According to agency officials, TVA plans to develop a system security plan for its hydroelectric automation and nuclear control systems. Until the agency documents security plans and implements a remediation process for all control systems, it will not have assurance that the proper controls will be applied to secure control systems or that known vulnerabilities will be properly mitigated.
7.2.3 Opportunities Exist to Improve Security of TVA’s Control Systems Numerous opportunities exist for TVA to improve the security of its control systems. Specifically, strengthening logical access controls over agency networks can better protect the confidentiality, integrity, and availability of control systems from compromise by unauthorized individuals. In addition, fortifying physical access controls at its facilities can limit entry to TVA restricted areas to only authorized personnel, and enhancing environmental safeguards can mitigate losses due to fire or other hazards. Further, establishing an effective information security program can provide TVA with a solid foundation for ensuring the adequate protection of its control systems. Because of the risk and magnitude of harm that could result due to the interconnectivity between TVA’s corporate network and certain control systems networks, GAO recommended that TVA implement effective patch management practices, securely configure its remote access system, and appropriately segregate specific network services. GAO also recommended that the agency take steps to improve the security of its control systems networks, such as implementing strong passwords or equivalent authentication mechanisms, implementing antivirus software, restricting firewall configuration settings, and implementing equivalent compensating controls when such steps cannot be taken. To prevent unauthorized physical access to restricted areas surrounding TVA’s control systems, it was recommended that the agency take steps to toughen barriers at points of entry to these facilities. In addition, to protect TVA’s control systems operators and equipment from fire damage or other hazards, it was also recommended that the agency improve environmental controls by enhancing fire suppression capabilities and physically separating cooking areas from system equipment areas. 
Finally, to improve the ability of TVA’s information security program to effectively secure its control systems, GAO recommended that the agency improve its configuration management process and enhance its patch management policy. It was also recommended that TVA complete a comprehensive system inventory that identifies all control systems, perform risk assessments and security risk categorization of these systems, and document system security and remedial action plans for these systems. Further, GAO recommended improvements to agency information security policies.
TVA concurred with all of GAO’s 19 recommendations regarding its information security program and the majority of 73 recommendations regarding specific information security weaknesses. The agency agreed on the importance of protecting critical infrastructures and stated that it has taken several actions to strengthen information security for control systems, such as centralizing responsibility for cyber security within the agency. It also provided information on steps the agency was taking to implement certain GAO recommendations.
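Section 7.2.2.4 notes that TVA’s patch guidance did not refer to the criticality of IT resources when prioritizing vulnerabilities, and that only 351 of about 2,000 reported vulnerabilities were analyzed. As a purely illustrative sketch, and not a description of TVA’s or GAO’s method, the following shows one way a vendor-assigned severity could be raised or lowered according to an asset criticality rating. The rating scales, the adjustment rule, and the function names are all hypothetical assumptions.

```python
# Hypothetical sketch: adjust a vendor severity rating by asset criticality.
# The scales and the one-step rule are assumptions, not a published policy.

SEVERITIES = ["low", "medium", "high", "critical"]


def adjusted_priority(vendor_severity, asset_criticality):
    """Raise the priority one step for high-criticality assets (for example
    control systems) and lower it one step for low-criticality assets."""
    level = SEVERITIES.index(vendor_severity)
    if asset_criticality == "high":
        level += 1
    elif asset_criticality == "low":
        level -= 1
    # Clamp to the valid range of the scale.
    level = max(0, min(level, len(SEVERITIES) - 1))
    return SEVERITIES[level]


def analysis_coverage(analysed, reported):
    """Share of reported vulnerabilities that were actually analysed."""
    return analysed / reported


print(adjusted_priority("high", "high"))
print(f"coverage: {analysis_coverage(351, 2000):.3f}")
```

With the figures quoted in Section 7.2.2.4, the coverage works out to less than one fifth of the reported vulnerabilities; since the 2,000 figure is itself approximate, the ratio is only indicative.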
7.3 Conclusions In summary, TVA’s power generation and transmission critical infrastructures are important to the economy of the southeastern United States and the safety, security, and welfare of millions of people. Control systems are essential to the operation of these infrastructures; however, multiple information security weaknesses exist in both the agency’s corporate network and individual control systems networks and devices, including the insecure interconnectivity between these networks and systems. An underlying cause for these weaknesses is that the agency had not consistently implemented its information security program throughout the agency. If TVA does not take sufficient steps to secure its control systems and implement an information security program, it risks not being able to respond properly to a major disruption that is the result of an intended or unintended cyber incident. While this study focused on the control systems risks and vulnerabilities at TVA, other research suggests that similar risks and vulnerabilities exist at other critical infrastructures. According to a control systems cyber security expert, contacts throughout industry have shared details and adverse effects of more than 125 confirmed cyber security incidents. The incidents were international in scope and span multiple industrial infrastructures. With respect to the electric power industry, cyber incidents have occurred in transmission, distribution, and generation including fossil fuel, hydro, combustion turbine, and nuclear power plants. According to the expert, many of the industrial control systems cyber incidents have resulted from the interconnectivity of systems, not the lack of traditional IT security approaches. Impacts, whether intentional or unintentional, range from trivial to significant environmental discharges, serious equipment damages, and even death [14].
References
1. GAO: Critical Infrastructure Protection: Federal Efforts to Secure Control Systems Are Under Way, but Challenges Remain, GAO-07-1036 (2007)
2. GAO: Critical Infrastructure Protection: Multiple Efforts to Secure Control Systems Are Under Way, but Challenges Remain, GAO-08-119T (2007)
3. Watts, D.: Security and vulnerability in electric power systems. 35th North American Power Symposium, University of Missouri-Rolla, Rolla, MO, pp. 559–566 (2003)
4. Amin, M.: Energy infrastructure defense systems. Special Issue of Proc. IEEE 93(5), 855–860 (2005)
5. Electric Power Research Institute: Report to NIST on the Smart Grid Interoperability Standards Roadmap (Contract No. SB1341-09-CN-0031 – Deliverable 7) (2009)
6. The White House: Cyberspace policy review: assuring a trusted and resilient information and communications infrastructure (2009)
7. National Institute of Standards and Technology: Guide to Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems Security: Recommendations of the National Institute of Standards and Technology, Special Publication 800-82 (2006)
8. Blair, D.: Annual threat assessment of the intelligence community for the senate committee on intelligence (12 Feb. 2009)
9. Tennessee Valley Authority: Fossil-fuel generation. http://www.tva.com/power/fossil.htm (2009). Accessed 23 June 2009
10. Tennessee Valley Authority: Nuclear energy. http://www.tva.com/power/nuclear/index.htm (2009). Accessed 23 June 2009
11. Tennessee Valley Authority: TVA’s dams and hydro plants. http://www.tva.com/power/pdf/hydro.pdf (2009). Accessed 23 June 2009
12. Tennessee Valley Authority: TVA’s Transmission. http://www.tvakids.com/electricity/transmission.htm (2009). Accessed 26 June 2009
13. GAO: Federal Information System Controls Audit Manual, GAO-09-232G (2009)
14. Weiss, J.M.: Control systems cyber security – the current status of cyber security of critical infrastructures, testimony before the Committee on Commerce, Science, and Transportation, U.S. Senate (19 March 2009)
Chapter 8
Metering, Intelligent Enough for Smart Grids? Geert Deconinck
Abstract Advanced meters for electricity and gas provide two-way communication to upload commands to, and download measurement data from, the meters. With a specific emphasis on the means for two-way communication, the amount of data to be transmitted and its real time requirements are analysed. Different communication means for smart metering are evaluated for suitability, penetration rate, exploitation costs, flexibility and dependability, focused on the case of a region with an excellent communication infrastructure. Still, it is very difficult to fulfil all requirements for a future-proof smart metering infrastructure.
Keywords Intelligent meters • electricity metering • communication for metering
8.1 Introduction
In the context of the liberalisation of the European electricity markets, several countries are currently in the process of installing advanced meters residentially at each household: Italy [1], The Netherlands [2], Denmark, Sweden, France, Germany, etc. [3–5]. There are ongoing standardisation activities concerning meters and their communication: The Netherlands recently proposed a standard for advanced meters including local and remote communication interfaces [6], CIRED is identifying communication interfaces and protocols [7], the IEC is proposing standards for communication protocols for metering and object representation [8, 9], and Europe stimulates advanced metering in the context of energy efficiency [10], etc. Recently, a standardisation mandate has been given to CEN, CENELEC and ETSI to develop an open system architecture for utility meters involving communication protocols [11].
G. Deconinck (*) K.U. Leuven – ESAT/ELECTA, Kasteelpark Arenberg 10 bus 2445, B-3001, Leuven, Belgium
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_8, © Springer Science + Business Media B.V. 2010
The case of communication for smart metering applications in Flanders (6 million inhabitants, northern region of Belgium with 60% of its population) is specific, because of its high population density (440 inhabitants per square kilometer), a very high penetration of cable TV (95%) and of broadband Internet access (60%), and exploitation of advanced communication means (UMTS, terrestrial trunked radio, etc.). Furthermore, the DSL coverage (i.e. the percentage of households that can subscribe to DSL, digital subscriber line) in Belgium is 100%, which is the highest level worldwide, and it provides the fifth fastest download access (20 Mbps); approximately 90% of the Internet access is via broadband [12]. Advanced meters for electricity and gas require two-way communication to upload commands to, and download measurement data from, the meters. With a specific emphasis on the means for two-way communication, this research starts from an analysis of the amount of data to be transmitted and its real time requirements, and it evaluates different communication means for advanced metering for suitability, penetration rate, exploitation costs, flexibility and dependability, focused on Flanders’ case [13, 14]. Suitability is considered both in the narrow focus of metering and in the broader focus of smart grids [15, 16]. The chapter starts by discussing the context of advanced metering, resulting in the requirements for two-way communication. Section 8.3 elaborates which communication means are suited for advanced metering. Section 8.4 provides a detailed analysis of cost and dependability issues. It concludes that satisfying all communication requirements for a future-proof smart metering infrastructure is far from straightforward, even with an excellent communication infrastructure.
8.2 Advanced Metering: Context and Communication Requirements
8.2.1 From AMR to Smart Metering
Advanced metering can be implemented with different levels of intelligence associated with the meter. Typically three types can be distinguished, in order of increasing interaction level and feature content.
• AMR (automated meter reading) implies the remote reading of the measurement registers of an (electricity, gas, water, etc.) meter without physical access to the meter. It can be implemented via a temporary RF (radio) link to the meter from a car passing by in the street while interrogating the meters, or as an (always connected) communication link to the meter from the data collecting devices. Such a link may use wireless or wired communication media.
• AMM (automatic meter management) or AMI (advanced metering infrastructure) extends AMR with the ability to manage meters remotely. For instance, it allows for disconnection/reconnection of customers, for dimming their usage (e.g. down
to a socially acceptable 6 A or 10 A for non-paying customers) or for integration of different tariff schemes in the meter.
• Smart metering extends AMM with control abilities. For instance, it allows several customers to be shut down simultaneously on short notice (in order to balance the grid in case of an incident), supports demand side management (for usage flattening or load shifting), and enables integration in home automation systems (for automatic response to varying prices in real time pricing or time-of-use pricing scenarios), etc.
As such, smart meters are an indispensable enabler in a context of smart grids, which deploy advanced information and communication technology to control the electrical grid [17]. Smart grids imply intelligence on the entire electrical grid. At the distribution level this comprises the metering, but it also has connections beyond the meter (energy management, active buildings, demand side management, etc.) and to the rest of the grid for monitoring and control (load profiling and forecasting, phase balancing, transformer optimization, asset management, etc.), potentially connected to other systems (web, business IT) for interaction with stakeholders, etc. [15, 16]. Smart grids also often imply that many small generation units from renewable energy sources are to be integrated, and that new services and tariff schemes can be offered to the end customer [18]. Many different types of meters are on the market, from classic electromechanical Ferraris meters with an external pulse output to fully digital electronic power and energy meters [19]. Some manufacturers completely integrate the communication module into the meter (allowing for easier certification and tamper-resistance), while others provide a separate communication module via a dedicated communication port (which may or may not follow a standard, such as M-bus EN 13757 [8]). Most of these fall in the first and second categories.
In the context of the presented study for the Flemish regulator, an advanced meter has been considered [13] that is able
• To store multiple measurement registers (e.g. for consumption and generation of energy during different tariff periods) and to archive this data
• To send measurement registers periodically (at least monthly) as well as on demand (typically once per year, e.g. if a consumer wants to change suppliers)
• To be remotely (dis)connected and to reduce electricity or gas supply and to act as a budget meter
• To undergo remote modification of tariffs and tariff periods
• To have its firmware upgraded remotely, e.g. to incorporate new functionality
• To send on demand power quality data (e.g. surges, outages, etc.) and to send automatic fraud alarms
This set of requirements conforms to the Dutch standard NTA 8130 [6], which additionally requires that a group of meters can be reduced or disconnected as a whole, e.g. in order to cope with grid problems. Hence, according to these specifications, the meter belongs to the class of AMM meters rather than to that of smart meters. In the rest of this chapter, we first consider the above specifications, but also allow for future usage for smart meter purposes.
8.2.2 Advanced Meter Communication Requirements
If advanced metering is only used to transfer data from the measurement registers, a smallband communication medium is sufficient and no real time constraints are involved. However, if demand side management is required, it is necessary to address the smart meters within a given time period and hence real time constraints need to be satisfied [20, 21]. Often, this implies that some type of broadcasting needs to be supported by the communication medium [22]. If, besides the classical measurement registers, much additional data is also to be transmitted (e.g. concerning power quality or load profiles), a broadband connection is required. Table 8.1 summarises the time and data size requirements for some typical transactions with advanced meters. In Flanders, there are about 3 million electricity meters (2.6 million residential) and 1.6 million gas meters (1.44 million residential). Hence, for approximately 3 million advanced meters this requires about 0.5 MiB per meter per year for reading measurement registers monthly. This is negligible with respect to Internet traffic, but still results in approximately 1 TiB of raw data per year for Flanders. If not only monthly measurement registers need to be transferred, but 15 min profile data also needs to be sent daily, or if detailed power quality data needs to be transmitted, then the amount of data per meter can increase by two or three orders of magnitude. Furthermore, if a message needs to be sent to all meters within 1 h sequentially, only 1.2 ms is available per meter if broadcasting is not supported. Hence, a hierarchical or parallel approach is required to send out commands to the meters to which a response is needed in a short time frame. Three communication architectures can be envisaged.
Table 8.1 Time and data size requirements per transaction type per meter

Transaction                                            Time critical  Response (min/typ./max)  No. times/year  Data (min/typ./max)
Command store measurement registers                    Yes            Immediate/5 min/1 h      1               0.5 KiB/1 KiB/16 KiB
Send measurement registers (periodically + on demand)  No             Immediate/10 min/2 h     13 (12 + 1)     1 KiB/32 KiB/16 MiB
Command reduce load                                    Yes            Immediate/5 min/1 h      1               0.5 KiB/1 KiB/16 KiB
Adjust parameters                                      No             Immediate/10 min/2 h     2               0.5 KiB/1 KiB/16 KiB
Upgrade firmware                                       No             10 min/2 h/1 day         0.2             0.5 KiB/1 KiB/512 KiB
Send alarms                                            No             Immediate/10 min/2 h     0.2             0.5 KiB/1 KiB/16 KiB

KiB: kibibyte = kilobinary bytes = 1,024 bytes (according to IEC 60027-2 Ed. 3.0 – 2005)
• A direct connection is set up between the meter and the data collection point via a dial-in modem over the public switched telephone network (PSTN) or via a mobile data connection (GSM/GPRS). • A dedicated intermediate communication infrastructure is deployed between the meters and a concentrator, e.g. for power line carrier (PLC), or low power radio frequency (RF) communication solutions. • An existing intermediate communication infrastructure is used to connect the meters to the data collection points, e.g. by providing the meters with an Internet address and using existing broadband Internet connections of cables or phone lines.
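The data volume and timing figures quoted in Section 8.2.2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the typical message size and yearly count for sending measurement registers from Table 8.1 (32 KiB, 13 transfers per year) and roughly 3 million meters. The function and constant names are hypothetical and do not come from the study.

```python
# Back-of-the-envelope check of the Section 8.2.2 figures.
# Names are illustrative; input values come from the text and Table 8.1.

KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB

METERS = 3_000_000              # approximate number of advanced meters
READINGS_PER_YEAR = 13          # 12 periodic + 1 on demand (Table 8.1)
TYPICAL_READING_BYTES = 32 * KIB


def yearly_bytes_per_meter(readings=READINGS_PER_YEAR, size=TYPICAL_READING_BYTES):
    """Data sent by one meter per year for register readings."""
    return readings * size


def sequential_slot_seconds(meters=METERS, window_seconds=3600):
    """Time available per meter if all meters are addressed one after another."""
    return window_seconds / meters


per_meter = yearly_bytes_per_meter()
total = per_meter * METERS

print(f"per meter:  {per_meter / MIB:.2f} MiB/year")
print(f"all meters: {total / TIB:.2f} TiB/year")
print(f"slot per meter in 1 h: {sequential_slot_seconds() * 1000:.1f} ms")
```

Under these assumptions the result is roughly 0.4 MiB per meter per year and a little over 1 TiB in total, consistent with the rounded figures in the text, and the sequential time slot of 1.2 ms follows directly from dividing one hour by 3 million meters.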
8.3 Suitable Communication Means Three categories of communication media to smart meters have been studied in detail [13]: power line carrier, communication over telephone and cable infrastructure, and wireless communication, discussing technical aspects and the specifics for Flanders as an example of a densely populated area with advanced communication facilities. The following sections elaborate major advantages and disadvantages of these communication media for advanced metering applications.
8.3.1 Power Line Carrier
Power line carrier, or power line telecommunication, uses the power grid for data communication [23, 24]. Digital data is modulated on a carrier at a specific frequency. Its usage is standardized and limited to specific spectra (EN 50065-1) [25]. The spectrum reserved in Europe for PLC lies between 3 and 148.5 kHz, in which the A-band (3–95 kHz) has been reserved for utility communications. This allows smallband communication only; realistic communication bandwidths reach up to about 4 kbps (kilobit per second). PLC has already been used for a long time for lower bandwidth applications, e.g. to switch public lighting or to switch between tariff periods. Each advanced meter requires a PLC modem which communicates with data concentrators, which are often located in medium/low voltage substations. In Flanders, a typical distance of the meter to the substation transformer is about 400 m. In city areas, each transformer serves about 400 households; in rural areas this is often less [26]. Hence, approximately 1,000 concentrators will be required to serve all meters. By using repeaters the communication reliability can be improved. Nevertheless, from personal communications with utilities, it seems that large European PLC-based deployments are not able to reach 1–5% of the meters. Also, when certain power problems occur (e.g. an interrupted distribution cable), the communication medium is not available either.
G. Deconinck
New developments include the usage of adaptive Orthogonal Frequency Division Multiplexing (OFDM) modulation techniques for improved throughput and robustness, compared to the often used Differential Code Shift Keying (DCSK) modulation techniques [27]. The major advantage of PLC is that no additional cabling is required, as the meter is already connected to the communication medium. Also, a concentrator can broadcast messages to all connected meters.
8.3.2 Smallband Communication over Telephone Lines
For analog telephony and Integrated Services Digital Network (ISDN) (digital telephony), a connection is made over the public switched telephone network between an advanced meter and the data collection point. The modem at the meter needs to be connected to the telephone line, which is often not located in the same place as the meter. Communication bandwidth is up to 56 kbps (analog) or 128 kbps (ISDN) for a duplex connection. Setting up a connection (dialling in) takes a non-negligible time. The communication medium is very reliable, but multicasting is not supported. Flanders has a very high penetration rate of telephone (45 connections per 100 inhabitants), hence about 98% of the meters are connectable. This communication network remains functional in case of power problems, provided the modem does not need a grid-connected power supply.
8.3.3 Broadband Connection over Phone Line or TV Cable
The digital subscriber line (DSL) technology provides a broadband connection over a PSTN line, while a cable based connection uses the cable that carries television and radio signals to bring data communication to the home. These implementations allow for a bandwidth of hundreds of kilobit per second to several megabit per second. When this communication medium already exists in a home, it can be shared with the metering application. It is operated by a telecom provider or Internet service provider (ISP). For research purposes, we also include a dedicated broadband connection that is used for advanced metering only. In both cases, it is required to make a connection from the meter to the phone or cable equipment. Cable based broadband also allows for broadcasting data to all meters on the cable segment. Flanders has a high penetration rate for television cable, which serves about 95% of the households. If this is coupled with the high penetration of telephony, it is clear that it is a potentially widely available communication medium for smart metering. The reliability is assumed to be somewhat lower than smallband communication over telephone lines.
8 Metering, Intelligent Enough for Smart Grids?
8.3.4 Second or Third Generation Mobile Telephony and Data
GSM (Global System for Mobile Communications) provides the standard for second generation digital mobile telephony; it is a circuit-switched connection [28]. GPRS (General Packet Radio Service) is the corresponding packet-switched standard for data communication in the same frequency band [29]. GSM, GPRS and their variants or evolutions have a medium bandwidth (tens of kilobit per second) and complete coverage of Flanders; however, reception in cellars and the like (where the meters are often located) is not guaranteed. Third generation mobile telephony (UMTS) is being deployed in Flanders with a coverage of about 60% for outdoor purposes. UMTS allows a broadband connection of several hundreds of kilobit per second. The advanced meter needs to be equipped with a GPRS or UMTS modem, and the communication takes place over the network of the mobile operator, to whom a subscription fee has to be paid. Operators do not guarantee the availability of these networks for the foreseen lifetime of the meters (15 years). These cellular communication systems have a lower reliability than (landline) telephone connections.
8.3.5 Non-licensed RF
Low-power radio often uses the non-licensed ISM (industrial, scientific and medical) band for RF communication, typically around 433 or 860 MHz. Each meter is equipped with an RF transmitter that allows communication to the data concentrator directly, or to other meters with RF transmitters which act as repeaters or forward the data, e.g. in a meshed network configuration. Also, an antenna infrastructure is required at the concentrators; this is typically not operated by a third party, but can be owned by the metering company or distribution system operator. Reliability of non-licensed RF is high, especially if there is a large penetration, such that the repeater functionality can be exploited [30]. It typically provides smallband communication (up to some kilobit per second).
8.3.6 Licensed RF
PMR (Professional Mobile Radio) is a group name for mobile radio systems that use licensed bands of the frequency spectrum. Typical implementations (besides walkie-talkies) include PAMR (Public Access Mobile Radio) and terrestrial trunked radio. PMR has been built for group communication (e.g. used by emergency services), but it is also standardised for utility services. Its main advantages are that it is a very reliable communication means with very good coverage (also in cellars), and that it provides broadcasting with a fast communication setup. It allows medium bandwidth communication (up to some tens of kilobit per second). The network is operated by an independent operator.
WiMAX is a novel RF communication technology of the fourth generation that uses the licensed 2.5 or 3.5 GHz band or the non-licensed 5.8 GHz band. It is standardized as IEEE 802.16a with a potential range up to 50 km and a broadband connection up to 100 Mbps. The author is not aware of advanced metering implementations that use these communication means, which nevertheless seem quite well suited in terms of reliability and responsiveness.
8.4 Detailed Analysis
8.4.1 Cost Analysis
Different studies have tried to identify costs and benefits of advanced metering [2, 3, 21, 31–33]. These studies show that the chosen communication medium plays only a limited role in the overall business case; nevertheless, due to the uncertainty related to estimating the communication costs, its influence is non-negligible. Based on cost elements from other studies and on our own estimates (detailed figures for the different cost elements can be found in [13]), a cost analysis has been made for several scenarios of communication means. Figure 8.1 provides a comparison of initial costs (such as the communication module of the advanced meter and the possible connection to the medium) and recurring costs (such as connection costs and subscription fees) for the different communication means.
Fig. 8.1 Initial and yearly costs per meter
Fig. 8.2 Costs per meter over time horizon
These costs include concentrators where necessary, but they exclude the backbone ICT infrastructure (e.g. between the concentrators and the central data collection points) and the installation costs. For newer technologies (UMTS, PMR, WiMAX) costs are considered equal, due to uncertainty. Figure 8.2 aggregates these costs over a 15-year time horizon, which is assumed to be the lifespan of the smart meter. This analysis shows that communication based on existing broadband connections is the least expensive, closely followed by PLC and RF. Next follow smallband telephony and GSM/GPRS. Novel communication means (such as PMR) do not seem to be significantly more expensive. A dedicated broadband connection is the most expensive.
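The aggregation behind Fig. 8.2 can be sketched as a simple sum of the one-off cost and the recurring cost over the horizon. The sketch below is only illustrative: it assumes no discounting, and the scenario names and cost figures are hypothetical placeholders, not the values of Fig. 8.1 or of [13].

```python
def total_cost_per_meter(initial_cost, yearly_cost, horizon_years=15):
    """One-off cost plus recurring cost accumulated over the time horizon.

    Assumption: no discounting or inflation is applied.
    """
    return initial_cost + yearly_cost * horizon_years


# Hypothetical figures (EUR per meter); NOT taken from Fig. 8.1 or [13].
scenarios = {
    "medium_a": {"initial": 40.0, "yearly": 2.0},  # high initial, low recurring cost
    "medium_b": {"initial": 10.0, "yearly": 6.0},  # low initial, high recurring cost
}

totals = {
    name: total_cost_per_meter(cost["initial"], cost["yearly"])
    for name, cost in scenarios.items()
}

# Rank the media from least to most expensive over the 15-year horizon.
ranking = sorted(totals, key=totals.get)
```

With these placeholder numbers, the medium with the higher initial cost ends up cheaper over the meter lifetime, which is why the comparison has to be made over the full horizon rather than on initial cost alone.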
8.4.2 Dependability Analysis
Several dependability properties need to be analysed when comparing two-way communication media for advanced metering applications.
• Availability: fraction of time that a connection between two communicating points is operational
• Reliability: probability that a connection remains working for a specific time period
• Integrity: probability that data is not wrongfully changed
• Confidentiality: probability that data does not become known to external parties without due authorisation
According to [34], safety and maintainability are also dependability attributes; however, they are not that relevant from the perspective of this study. Security is not defined as a separate dependability attribute, but can be considered a combination of integrity, confidentiality and availability [34]. Other properties can be related to dependability, such as real-timeliness and guaranteeing quality-of-service.
In the context of smart metering, availability and reliability characterisation is required for the entire connection from the meter to the data collection point or vice versa. For network components (modems, routers, medium, concentrators), reliability is typically characterised by a mean-time-to-failure (MTTF), while availability relies on a combination of this MTTF with the mean-time-to-repair (MTTR). As such, a device or communication medium with a low MTTF and a short MTTR will have a low reliability but can still have a high availability. For instance, if PLC is not working for about an hour a day, the availability is about 96% (23 out of every 24 hours). For media with identical availability, the one with a higher MTTF is preferred, because of its higher reliability. As such, cable, GPRS and RF are preferred above PLC.
From an availability and reliability perspective, one needs to consider all network components on the end-to-end path between communicating devices (e.g. from a meter over a concentrator to the data collection point). As all these components need to be working for a working communication path, they form a serial reliability block diagram [35], which means that the reliability of the end-to-end connection can be calculated by multiplying the reliabilities of the individual components:

$$R_{sys}(t) = \prod_{i} R_i(t)$$

As such, for identical MTTF, communication media that involve more components (such as PLC or cable that uses concentrators and an intermediate communication infrastructure) are less reliable than media which involve fewer components (such as GPRS where the meter has a direct connection set up with the data collection point). Media with a lower MTTF (such as PLC) are less reliable than media with a higher MTTF (such as cable). However, when meters can use each other as repeaters (as is the case for PLC), they can tolerate certain faults, at the cost of some additional communication delay. If some meters were to fail, a connection would still be possible from the other meters to the data collection point. Reliability can then be modelled by taking the redundancy into account [35]; for instance, for an m-out-of-n system the reliability is given by

$$R_{sys}(t) = \sum_{i=0}^{n-m} \binom{n}{i} \, \bigl(R(t)\bigr)^{n-i} \, \bigl(1 - R(t)\bigr)^{i}$$
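The availability and reliability relations above can be sketched in a few lines of code. This is a minimal illustration, not part of the original study: the function names are hypothetical, steady-state availability is taken as MTTF / (MTTF + MTTR), and the exponential reliability function assumes a constant failure rate, which the text does not state.

```python
from math import comb, exp, prod


def availability(mttf, mttr):
    """Steady-state availability from mean-time-to-failure and mean-time-to-repair."""
    return mttf / (mttf + mttr)


def reliability(t, mttf):
    """R(t) under the ASSUMPTION of a constant failure rate (exponential model)."""
    return exp(-t / mttf)


def series_reliability(component_reliabilities):
    """Serial block diagram: all components must work, so reliabilities multiply."""
    return prod(component_reliabilities)


def m_out_of_n_reliability(m, n, r):
    """At least m of n identical components (each with reliability r) must work."""
    return sum(comb(n, i) * r ** (n - i) * (1 - r) ** i for i in range(n - m + 1))


# One hour of outage per day (MTTF = 23 h, MTTR = 1 h), as in the PLC example.
plc_availability = availability(23.0, 1.0)

# A meter, a concentrator and a backbone link in series (hypothetical values).
path = series_reliability([0.99, 0.98, 0.995])

# Redundancy: one of two repeaters is sufficient (hypothetical value r = 0.9).
redundant = m_out_of_n_reliability(1, 2, 0.9)
```

For m = n the last function reduces to the serial case r^n, and for m = 1 it reduces to the parallel case 1 - (1 - r)^n, which is a quick consistency check on the formula.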
In the context of smart metering, integrity and confidentiality are typically associated with particular messages (data packets), rather than with the connection. Via a point-to-point or end-to-end protocol between two communicating entities, data can be encoded and encrypted. Point-to-point protocols act at the lower OSI levels [36] (data link layer, network layer, transport layer) from one point in the communication infrastructure to the next (e.g. between a meter and a repeater, between a meter and a concentrator, or between a concentrator and a data processing point); these depend strongly on the communication medium. End-to-end protocols act at the higher OSI levels (application layer), e.g. directly from a meter to the data
collection point; these do not depend on the communication medium and can be applied in any case. Encoding (and decoding) allows integrity violations to be detected, whether they are due to random or malicious faults. Error detecting codes or error correcting codes can be used for this [35]. Such codes can be applied in point-to-point and end-to-end communications. Encryption (and decryption) hides information from third parties, and hence prevents confidentiality failures [37]. This is typically applied at an end-to-end level, to prevent intermediate communication entities from intercepting (and/or modifying, generating or listening to) the messages sent. Asymmetric or symmetric encryption can be used, depending on whether a public key together with a secret private key, or a single shared secret key, is used to encrypt information between two communicating entities. This choice has an impact on key management. For asymmetric encryption, a key pair (consisting of a private and a public part) is required per communicating entity, while for symmetric encryption a secret key is required for each pair of entities that need to communicate with each other [37]. For example, if each meter (i.e. about 3 million meters for the Flanders case) needs to communicate with the database of the metering company (for metering purposes), with the database of the distribution system operator (for monitoring purposes) and with that of the electricity supplier (for demand side management purposes), then each meter needs only a single key pair in the case of asymmetric encryption, while it needs three keys in the case of symmetric encryption. As a result, symmetric encryption, such as AES, requires a more complex key management infrastructure (with the number of keys proportional to n² when each of the n communicating entities must be able to reach every other one), but it also has a higher implementation performance than asymmetric encryption (with the number of key pairs proportional to n).
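The key management argument can be made concrete by counting keys. The sketch below is illustrative only; the function names are hypothetical, and the figures simply restate the Flanders example from the text (about 3 million meters, three back-office parties).

```python
def asymmetric_key_pairs(n_entities):
    """One key pair (private + public part) per communicating entity."""
    return n_entities


def symmetric_keys_full_mesh(n_entities):
    """One shared secret per pair of entities: n(n-1)/2, i.e. proportional to n^2."""
    return n_entities * (n_entities - 1) // 2


def symmetric_keys_star(n_meters, n_back_offices):
    """Meters only talk to the back offices: one shared secret per (meter, office)."""
    return n_meters * n_back_offices


meters = 3_000_000   # approximate number of meters in the Flanders case
back_offices = 3     # metering company, distribution system operator, supplier

pairs_needed = asymmetric_key_pairs(meters + back_offices)
shared_secrets_needed = symmetric_keys_star(meters, back_offices)
```

In this star-shaped example the symmetric scheme needs three secrets per meter, as stated in the text; the quadratic growth only appears when every entity must be able to reach every other entity.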
The threats associated with integrity problems depend on the raw bit error rates of the communication media, combined with the strength of the communication protocols. In this context, PLC is more prone to disturbances than RF, which in turn is more prone to disturbances than DSL and GPRS connections. The threats associated with confidentiality problems depend on whether or not the communication medium is open to other parties or shared with other applications. In this context, PLC and GPRS are less vulnerable than Internet-based or RF solutions.
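How encoding detects integrity violations can be sketched with two mechanisms from the Python standard library. This is an illustrative example, not the protocol of any metering standard: the frame layout, the payload and the key are hypothetical. A CRC-32 checksum detects random bit errors, while a keyed message authentication code (here HMAC-SHA256) is needed against malicious modification, since an attacker can recompute an unkeyed checksum.

```python
import hashlib
import hmac
import zlib


def crc_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so that random transmission errors can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the received one."""
    return zlib.crc32(frame[:-4]).to_bytes(4, "big") == frame[-4:]


def mac_frame(key: bytes, payload: bytes) -> bytes:
    """Append a 32-byte HMAC-SHA256 tag computed with a shared secret key."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def mac_ok(key: bytes, frame: bytes) -> bool:
    """Verify the tag in constant time; fails if payload or key differ."""
    payload, tag = frame[:-32], frame[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


payload = b"meter=0001;kWh=1234.5"          # hypothetical meter reading
key = b"hypothetical-shared-secret"         # placeholder key, for illustration only

frame = crc_frame(payload)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip a single bit

signed = mac_frame(key, payload)
```

Here `crc_ok(frame)` holds while `crc_ok(corrupted)` does not, and `mac_ok` only succeeds for a party that knows the shared key.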
8.5 Assessment
Table 8.2 provides a summary of the most important attributes for these different communication means. The final decision on the best suited communication medium for advanced metering applications first requires an answer to the question whether a solution suited only for AMM will be chosen, or whether future smart metering applications also need to be supported. Besides, such a decision has to consider both technical and non-technical requirements.
Table 8.2 Summarizing table (cable-based media: power line carrier, telephone, Internet; wireless media: GSM/GPRS, UMTS, RF, PMR/trunked radio; D: dedicated connection; E: existing connection; BW: bandwidth)

| | Power line carrier | Telephone | Internet | GSM, GPRS | UMTS | RF | PMR/trunked radio |
|---|---|---|---|---|---|---|---|
| Accessibility | 95% | 98% | D: 95% / E: 60% | 97% | 60% | 100% | 100% |
| Costs | Medium | Medium to high | D: very high / E: medium | High | High | Medium | High |
| Operation | Own | Telephone operator | D: telecom-provider / E: ISP | Mobile phone operator | Mobile phone operator | Own | Own or PMR-operator |
| Addressability of the meter | Via concentrators | Directly | Directly | Directly | Directly | Via concentrators | Directly or via concentrators |
| Suitability (bandwidth, BW) | Functions with low BW | Functions with medium BW | Functions with high BW | Functions with medium BW | Functions with high BW | Functions with low BW | Functions with medium BW |
| Suitability for real-time applications | Yes | No | D: yes / E: no | No | Yes | Yes | Yes |
| Flexibility | Medium | Medium/high | Medium/high | High | High | Medium | Medium/high |
| Reliability | High | Very high | High/very high | Medium/high | Medium/high | High | Very high |
If a future-proof smart meter is required that allows for detailed and frequent consumer data (power quality data, quarter-hour profiles), and which also fits in a smart grid context, a significantly larger bandwidth is required than when only a monthly remote reading is required for AMM purposes. This indicates that broadband Internet-based solutions or third generation mobile telephony would be a suitable communication infrastructure. When it is necessary that a set of meters is reached within a given time span, as for smart metering control applications (real-time requirements), the medium must support a form of broadcasting or parallelism. This favours solutions such as PLC or RF, or other wireless solutions (PMR, UMTS). When costs are an important issue, PLC/RF and existing Internet solutions are preferable for advanced metering applications. Besides, communication media operated by parties external to the energy markets (like telecom operators or Internet service providers) introduce a potentially undesired level of dependence. This indicates PLC or RF as suited media. Concerning flexibility and reliability, all communication media fulfil the basic requirements for AMM meter applications. However, it would be a missed opportunity if large investments were made in the communication infrastructure for advanced metering and this infrastructure were only used for reading measurement registers remotely. From the perspective of its role in an advanced intelligent distribution grid, where smart meters contribute via demand side management to the smart grid management, it would be much more desirable to opt for hybrid, combined, future-proof communication solutions that ensure a suitable simultaneous fulfilment of broadband, real-time and dependability requirements.
Acknowledgement This project is partially supported by the European Commission (SEESGENICT) and NGInfra (05/09/KUL).
References
1. Rogai, S.: Telegestore Project Progresses and Results (Keynote). IEEE International Symposium on Power Line Communications and Its Applications (ISPLC-2007), Pisa, Italy, p. 1, 2007.
2. Dijkstra, A., Leussink, E.M.A., Siderius, P.J.S.: Advies invoering slimme meetinfrastructuur bij kleinverbruikers (FAS 1-2893): SenterNovem, Oct. 2005
3. Energywatch: Get Smart: Bringing meters into the 21st century, p. 16, Aug. 2005
4. Gizmo or Revolutionary Technology: IET Seminar on Smart Metering, London, p. 60 (2008)
5. AlAbdulkarim, L., Lukszo, Z.: Smart Metering for the Future Energy Systems. In: Proceedings of the 4th CRIS International Conference on Critical Infrastructures (CRIS-2009), Linköping, Sweden, 2009
6. NEN: NTA 8130: Basisfuncties voor de meetinrichting voor elektriciteit, gas en thermische energie voor kleinverbruikers. Nederlands Normalisatie-instituut, p. 29 (Jan. 2007)
7. CIRED: Distribution utility telecommunication interfaces, protocols and architectures, Final Report of the CIRED Working Group WG06 (2003)
8. NBN: NBN EN 13757-X: communication system for meters and remote reading of meters. Belgisch Instituut voor Normalisatie (2003–2005)
9. IEC: IEC 62056 Electricity metering – data exchange for meter reading, tariff and load control. International Electrotechnical Commission (1999–2006)
10. COM(2008) 241 Addressing the Challenge of Energy Efficiency through Information and Communication Technologies, Commission of the European Communities, p. 11 (2008)
11. European Commission – DG Enterprise and Industry: M/441 Standardisation mandate to CEN, CENELEC and ETSI in the field of measuring instruments for the development of an open architecture for utility meters involving communication protocols enabling interoperability. European Commission, Brussels, p. 4 (2009)
12. OECD: OECD Communications Outlook 2007: OECD (2007)
13. Deconinck, G., Bekaert, D., Jacqmaer, P., Loix, T., Rigole, T., Verbruggen, B.: Studie communicatiemiddelen voor slimme meters (VREG 2006/0192): K.U.Leuven – ESAT (2007)
14. Deconinck, G.: An evaluation of two-way communication means for advanced metering in Flanders (Belgium). In: Proceedings of IEEE International Instrumentation and Measurement Technology Conference (I2MTC-2008), pp. 900–905. IEEE, Victoria, Vancouver Island, Canada, 2008
15. Chebbo, M.: EU SmartGrids Framework: Electricity Networks of the future 2020 and beyond. In: IEEE 2007 Power Engineering Society General Meeting, pp. 1–8, 2007
16. Vojdani, A.: Smart Integration. Power and Energy Magazine, IEEE 6, 71–79 (2008)
17. European Commission: EUR 22580 – Strategic Research Agenda for Europe’s Electricity Networks of the Future (SmartGrids): Luxembourg: Office for Official Publications of the European Communities (2007)
18. Deconinck, G., Decroix, B.: Smart Metering Tariff Schemes Combined with Distributed Energy Resources. In: Proceedings of the 4th CRIS International Conference on Critical Infrastructures (CRIS-2009), Linköping, Sweden, 2009
19. Tumanski, S.: Principles of Electrical Measurement. Taylor & Francis Group, Boca Raton, FL (2006)
20. Mak, S., Radford, D.: Design considerations for implementation of large scale automatic meter reading systems. IEEE Trans. Power Deliv. 10, 97–103 (1995)
21. Graabak, I., Grande, O.S., Ikaheimo, J., Karkkainen, S.: Establishment of automatic meter reading and load management, experiences and cost/benefit. In: International Conference on Power System Technology (PowerCon 2004), pp. 1333–1338, 2004
22. Sabolic, D.: Influence of the transmission medium quality on the automatic meter reading system capacity. IEEE Trans. Power Deliv. 18, 725–728 (2003)
23. Dostert, K.: Powerline Communications. Prentice Hall, New York (2001)
24. Pavlidou, N., Han Vinck, A.J., Yazdani, J., Honary, B.: Power line communications: state of the art and future trends. IEEE Communications Magazine 41, 34–40 (2003)
25. EN 50065-1:2001 Signalling on low-voltage electrical installations in the frequency range 3 kHz to 148,5 kHz – Part 1: general requirements, frequency bands and electromagnetic disturbances. CENELEC (2001)
26. Hooijen, G., Vinck, H.: On the channel capacity of a European-style residential power circuit. In: Proceedings of 1998 International Symposium on Power Line Communication (ISPLCA’98), pp. 229–237, April 1998
27. Babic, M., Bausch, J., Kistner, T., Dostert, K.: Perfomance analysis of coded OFDM systems at statistically representative PLC channels. In: IEEE International Symposium on Power Line Communications and Its Applications, pp. 104–109 (2006)
28. ETSI EN 300 910: Digital cellular telecommunications system (Phase 2+) (GSM); Radio transmission and reception. European Telecommunications Standards Institute (2000)
29. ETSI EN 301 113: Digital cellular telecommunications system (Phase 2+) (GSM); General Packet Radio Service (GPRS); Service description; Stage 1. European Telecommunications Standards Institute (1999)
30. Levesque, W.: Did rules of thumb ever apply in AMR system selection? Metering International, pp. 42–43 (2006)
31. Owen, G., Ward, J.: Smart meters: commercial, policy and regulatory drivers. Sustainability First, p. 54+36, March 2006
32. van Gerwen, R., Jaarsma, S., Rob Wilhite, K.: Smart Metering. Leonardo-energy.org, July 2006, p. 9
33. Schrijner, M., Burgers, J., Koenis, F.: Energiemeters worden mondiger … Resultaten van een kosten-batenanalyse naar de invoering van ‘slimme meters’ in Vlaanderen (KEMA 30820040-Consulting 08-1386). KEMA Nederland B.V. in opdracht van VREG 2008
34. Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Depend. Secure Comput. 1, 11–33 (2004)
35. Johnson, B.W.: Design and analysis of fault-tolerant digital systems. Addison-Wesley, Reading, MA (1989)
36. Zimmermann, H.: OSI reference model – the ISO model of architecture for open systems interconnection. IEEE Trans. Commun. 28, 425–432 (1980)
37. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton, FL (1996)
Chapter 9
Experience From the Financial Sector with Consumer Data and ICT Security Wim Hafkamp and René Steenvoorden
Abstract This chapter describes some standards, norms and best practices related to ICT security used within the financial services industry in the Netherlands. Although some of the best practices are sector specific, such as ATM security measures, we assume that most of the ICT security challenges for banks are similar to the challenges of the energy sector. Therefore, we hope that this paper will bring up new ideas for those who are responsible for ICT security in the energy sector. Information security and business continuity measures together belong to the area of operational risk management in a banking environment. Operational risk management is highlighted in Section 9.2. Operational risk management within a bank can best be described as a structured approach to respond to a number of threats according to the principles of Basel II. One of the key elements is the capital calculation for operational risks. Section 9.3 explains a simplified model for the analysis of operational risks and the classification of data into three quality aspects: availability, integrity and confidentiality. In Section 9.4 state-of-the-art attacks on Internet banking systems, such as man-in-the-browser attacks, are discussed as an example of the external fraud banks face today. The next section explains the industry's co-operative responses to those attacks. Finally, in Section 9.6 a parallel is drawn with the energy sector, with some conclusions and policy and research recommendations.
Keywords Financial sector • information security • operational risk management
Abbreviations
AMA Advanced Measurement Approach
AIVD Algemene Inlichtingen- en Veiligheidsdienst
CRM Customer Relations Management (system)
CDW Customer Data Warehouse
EMV Europay Mastercard Visa
KLPD Korps Landelijke Politiediensten
SCADA Supervisory Control and Data Acquisition
SEPA Single European Payments Area
SMS Short Message System
W. Hafkamp (*) and R. Steenvoorden Rabobank Nederland, P.O. Box 17100, 3500, HG Utrecht, The Netherlands e-mail:
[email protected] Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality 15, DOI 10.1007/978-90-481-3594-3_9, © Springer Science + Business Media B.V. 2010
9.1 Introduction
Information can exist in many shapes. It can be spoken, written on paper, stored electronically, transmitted by electronic mail or by electronic means. Although the term information should not be confused with technology, it is evident that ICT is a prerequisite for the execution of business processes in a banking environment. Some facts and figures: in 2007 the department Group ICT of Rabobank Nederland managed some 157,000 desktops and 21,000 server based software installations. Every day Rabobank executes more than one million Internet payment transactions. Rabobank has about 3,100 Automated Teller Machines (ATMs) and approximately 14,500,000 successful ATM transactions a year. Information technology and (consumer) data are important assets for a bank. PIN codes, savings account information and transaction data are examples of personalized consumer data. One could say that this information is the raw material which is used to carry out the business processes. This is true not only for payment services, but also for financing, leasing and other processes. If the information is not sound or not available, this could lead to both financial and reputational damage. According to the ISO/IEC standard for information security management practices [1], information security is the protection of information from a wide range of threats in order to ensure business continuity, minimize damage and realize customer confidence. Information security within the banking environment is often related to three quality aspects: Availability (A), Integrity (I) and Confidentiality (C) of data.
9.2 Standards and Norms
9.2.1 Standards and Norms Payment Services
Payment services are strictly regulated at both national and international level. The European Payments Council, for instance, coordinates the process to establish a Single European Payments Area (SEPA) by developing all kinds of rules, including security requirements. The ISO has also developed several standards for (secure) data traffic, data handling and syntax. EMV and PCI (Payment Cards Industry) are typical brand related standards. PCI PED is an industry standard for ATM and PoS (Points of Sale) PIN entry devices. Currence, owner of the ‘brand PIN’, and the Netherlands Bankers’ Association demand a number of extra requirements
on top of PCI, the so-called PCI+ standard. These requirements focus on both physical and logical security aspects, such as the physical installation of ATMs, external magstripe readers and the use of privacy shields.
9.2.2 Basel II
Since the introduction of the term operational risk and the Basel II requirements [2], banks in the Netherlands have implemented a structured approach to respond to the different threats mentioned above. Basel II employs the “cause-event-effect concept”. According to this concept, deficiencies in the internal business operations or external factors (causes) can lead to negative occurrences (events) that can in turn lead to a loss for the bank or damage to the bank’s image (effect). Such events play a key role both in the Basel II agreement and within the operational risk management framework of any bank. The Basel II agreement includes a categorization of operational risks defined on the basis of seven types of events:
1. Internal Fraud – misappropriation of assets, tax evasion, intentional mismarking of positions, bribery
2. External Fraud – theft of information, hacking damage, third-party theft and forgery
3. Employment Practices and Workplace Safety – discrimination, workers compensation, employee health and safety
4. Clients, Products, & Business Practice – market manipulation, antitrust, improper trade, product defects, fiduciary breaches, account churning
5. Damage to Physical Assets – natural disasters, terrorism, vandalism
6. Business Disruption & Systems Failures – utility disruptions, software failures, hardware failures
7. Execution, Delivery, & Process Management – data entry errors, accounting errors, failed mandatory reporting, negligent loss of client assets
The Basel agreements place demands on financial institutions that primarily relate to determining and calculating the amount of capital the bank is required to retain (solvency requirements). While Basel I already requires banks to retain capital for credit risks and market risk, based on Basel II banks will also be required to retain capital for operational risk.
Basel II and the supervisory bodies of the various countries have prescribed soundness standards for operational risk management for banks and similar financial institutions. To complement these standards, Basel II has given guidance on three broad methods of capital calculation for operational risk:
• Basic Indicator Approach – based on annual revenue of the Financial Institution
• Standardised Approach – based on annual revenue of each of the broad business lines of the Financial Institution
• Advanced Measurement Approaches – based on the internally developed risk measurement framework of the bank adhering to the standards prescribed (methods include Scenario-based, Scorecard etc.)
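The simplest of the three methods listed above, the Basic Indicator Approach, can be illustrated with a short sketch. It assumes the formulation of the Basel II framework text, in which the capital charge equals a fixed percentage (alpha, set at 15%) of the average annual gross income over the previous three years, counting only the years with positive gross income. The function name and the income figures are hypothetical.

```python
ALPHA = 0.15  # fixed percentage of the Basic Indicator Approach (Basel II framework)


def basic_indicator_capital(gross_income_last_three_years, alpha=ALPHA):
    """Capital charge for operational risk under the Basic Indicator Approach.

    Years with zero or negative gross income are left out of both the
    numerator and the denominator of the average.
    """
    positive = [gi for gi in gross_income_last_three_years if gi > 0]
    if not positive:
        return 0.0
    return alpha * sum(positive) / len(positive)


# Hypothetical gross income figures (EUR million) for three consecutive years.
capital = basic_indicator_capital([100.0, 120.0, -10.0])
```

With these placeholder figures, only the two positive years count, so the charge is 15% of their average (110), i.e. 16.5. The Advanced Measurement Approach chosen by Rabobank replaces this fixed percentage by the bank's own risk measurement framework.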
Rabobank (and the majority of the Dutch banks) has opted for the Advanced Measurement Approach (AMA).
9.3 Risk Models
The US Committee of Sponsoring Organizations of the Treadway Commission (COSO) developed in 1994 a framework to help organizations evaluate and improve internal control systems. The current framework, called Enterprise Risk Management, consists of eight components:
• Internal Environment – The internal environment includes the organization's risk management philosophy with elements such as risk appetite, integrity and ethical values.
• Objective Setting – Enterprise risk management ensures that management has in place a process to set objectives and that the chosen objectives support and align with the entity’s mission and are consistent with its risk appetite.
• Event Identification – Internal and external events affecting achievement of an entity’s objectives must be identified.
• Risk Assessment – Risks should be analyzed, considering likelihood and impact, as a basis for determining how they should be managed.
• Risk Response – Management selects risk responses – avoiding, accepting, reducing, or sharing risk – developing a set of actions to align risks with the entity’s risk tolerances and risk appetite.
• Control Activities – Policies and procedures should be established and implemented to help ensure that risk responses are effectively carried out.
• Information and Communication – Relevant information should be identified and communicated in a form and time frame that enable people to carry out their responsibilities.
• Monitoring – The entirety of enterprise risk management is monitored and modifications made as necessary.
COSO's Enterprise Risk Management is used worldwide as a de facto standard for integrated risk management activities in an enterprise. Some two years later, in 1996, the US Information Systems Audit and Control Foundation published a framework for IT governance called COBIT. This framework is often seen as the risk management implementation of COSO's ERM for the IT department in a company.
The framework emphasizes the importance of a plan-do-check-act cycle for IT services. COBIT defines four domains that are equal to the life cycle of an IT system: plan and organize, acquire and implement, deliver and support and finally monitor and evaluate [3].
9.3.1 Classification Process

One of the first steps in a risk assessment process within Rabobank is the classification of (ICT) applications. The security classification process is the process by which (business) processes or applications are valued in terms of Availability, Integrity and Confidentiality (AIC). The security classification is assigned by using so-called AIC codes. The possible scores for each class are three (3), two (2) and one (1), the highest number indicating the highest criticality. The following criteria are taken into account when determining the security classes:

Financial risk or risk to reputation
Regulatory implications
Impact on the business

Some security controls in the information security baseline apply by default; others depend on the outcome of the security classification process. For example, the ICT infrastructure provider within Rabobank Group defines five standardized availability classes for business processes, indicating the required level of availability and permitted data loss (see Table 9.1). They are related to the availability security code 1, 2 or 3 provided by the business owners of the applications.

Table 9.1 ICT infrastructure availability classes within Rabobank

Class | Applications | Service requirements
E (A = 3) | Transactions: payment, savings, investing (retail); Internet banking | Service window 7*24; maintenance window 1 * week, 0400–0500; exceptional request: 6 h; no data loss; 99.99% availability
D (A = 3) | Customer (CRM): front office local branch office bank; call center; mail exchange | Fixed service window; maintenance window out of service window; no data loss; 99.99% availability
C (A = 2) | Back office; central data warehouse (CDW) | Fixed service window; maintenance window out of service window; end-of-day actuality; 99.9% availability
B (A = 2) | Management information; control; financial housekeeping | Fixed service window; maintenance window out of service window; end-of-day actuality; 99% availability
A (A = 1) | Static information retrieval (web) applications | Fixed service window; maintenance window out of service window; end-of-week actuality; 98% availability
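The relation between the availability security code and the five availability classes can be sketched as a simple lookup. The following is only an illustration, not Rabobank's implementation: the names `AvailabilityClass`, `CLASSES`, `classes_for_code` and `max_downtime_hours` are hypothetical, and the downtime figure assumes, for simplicity, that the availability percentage applies to all 8,760 hours of a year rather than to the agreed service window.

```python
from dataclasses import dataclass

# Assumption: availability is measured over a full (non-leap) year.
HOURS_PER_YEAR = 24 * 365


@dataclass(frozen=True)
class AvailabilityClass:
    name: str            # class letter as used in Table 9.1
    a_code: int          # availability security code (1-3) set by the business owner
    availability: float  # required availability as a fraction
    actuality: str       # permitted data loss / data actuality


# Values taken from Table 9.1.
CLASSES = [
    AvailabilityClass("E", 3, 0.9999, "no data loss"),
    AvailabilityClass("D", 3, 0.9999, "no data loss"),
    AvailabilityClass("C", 2, 0.999, "end-of-day"),
    AvailabilityClass("B", 2, 0.99, "end-of-day"),
    AvailabilityClass("A", 1, 0.98, "end-of-week"),
]


def classes_for_code(a_code: int) -> list[str]:
    """Return the availability classes that match an availability code."""
    return [c.name for c in CLASSES if c.a_code == a_code]


def max_downtime_hours(cls: AvailabilityClass) -> float:
    """Upper bound on yearly downtime implied by the availability figure."""
    return (1.0 - cls.availability) * HOURS_PER_YEAR
```

Under these assumptions an availability code of 3 maps to classes E and D, and 99.99% availability leaves somewhat less than one hour of downtime per year.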
9.3.2 Risk Analysis

Risk analysis is an important element of any risk management process. It encompasses the systematic identification and analysis of the relevant risks to which an organization is exposed. There are two different approaches: quantitative and qualitative risk analysis. In the first approach, risks are formulated in the form of loss expectancies (usually on an annual basis). In the other approach, qualitative risk analysis, qualifications such as ‘high impact’ are given to each relevant risk. Organizations sometimes combine the approaches in a self-defined risk analysis model or in best-practice models. The criteria by which risks are evaluated, and which determine whether risk treatment is required, have to be decided. Usually they are based on technical, financial, regulatory, environmental or social criteria. Important criteria to be considered are impact (on business processes) and likelihood [4]. The scope of an information risk analysis can be an individual information system, specific system components or information services.

The information risk analysis process within Rabobank Group is the process by which the residual risk is determined by an analysis of the areas of noncompliance with the determined information security controls and measures. The analysis of the areas of noncompliance takes into account all relevant threats, mitigating factors and the determined level of availability, integrity and confidentiality. Information risk analysis activities are performed periodically to address changes in security requirements and in the risk situation, e.g. in the assets, threats, vulnerabilities and impacts, and also when significant changes occur.
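The two approaches can be illustrated with a short sketch. The quantitative part uses the common textbook formula for annual loss expectancy (loss per incident multiplied by the expected number of incidents per year); the qualitative part combines likelihood and impact in a three-level matrix. The function names, the three-level scale and the thresholds are assumptions made for this example and are not taken from the Rabobank model.

```python
LEVELS = ("low", "medium", "high")


def annual_loss_expectancy(loss_per_incident: float, incidents_per_year: float) -> float:
    """Quantitative approach: expected loss per year for one risk."""
    return loss_per_incident * incidents_per_year


def qualitative_rating(likelihood: str, impact: str) -> str:
    """Qualitative approach: combine likelihood and impact into one rating.

    Each level is scored 0 (low) to 2 (high); the thresholds below are
    illustrative and would be set by the organization's own criteria.
    """
    score = LEVELS.index(likelihood) + LEVELS.index(impact)
    if score >= 3:
        return "high"
    if score == 2:
        return "medium"
    return "low"
```

For example, a hypothetical incident costing 50,000 that is expected once every five years (0.2 times per year) gives an annual loss expectancy of 10,000, while a risk with high likelihood and medium impact is rated high in this matrix.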
9.4 External Fraud

9.4.1 Internet Direct Banking Fraud

In the Netherlands approximately 8 million people use the Internet for (regular) payments and investment banking activities. One of the big successes of Internet banking is IDEAL, a joint web payment service offered by five Dutch banks. Bank customers can use the authentication and signing procedures offered by their own bank for Internet direct banking to purchase goods and services from various webshops via this IDEAL service. From the very beginning of Internet direct banking in the Netherlands, which started somewhere around 1997, security has played an important role in the architectural design. Authentication of the customer and the integrity of the transaction were, and still are, key starting points. All banks in the Netherlands have implemented some sort of two-factor authentication to identify their Internet banking clients. Two-factor means something you know (password or PIN) and something you have (token, card or device). With this two-factor authentication mechanism a one-time password is created for identification purposes, and in many cases the mechanism also provides a challenge-response for transaction signing.

Another basic principle of Internet direct banking is that customers are able to do their banking activities “Anytime and Anyplace.” This is one of the reasons that the end-user environment is not (and cannot be) managed by the bank. Banks are also very reluctant to install security software on a client PC, despite the increase of targeted malware. The main reason is that the variety of PC software and hardware versions used by customers can cause technical problems on the platform, and the bank does not want to be held responsible for this.

Despite the use of strong authentication, banks in the Netherlands have faced serious, sophisticated malware attacks against their Internet direct banking applications since the beginning of 2007 [5]. Figure 9.1 shows the modus operandi of one of the attacks.

Fig. 9.1 Anatomy of a malware attack on Internet direct banking

The first step of the attacker is to infect home computers. Often the victim receives an e-mail with a link to update his computer. Once ‘updated’, the computer is infected by a trojan downloader and is now under the control of the attacker. Via the downloader the attacker installs a trojan that collects data from the PC and sends it to the attacker. The attacker needs to know, for instance, at which bank the victim does his direct banking, so that a bank-specific trojan can be downloaded. Furthermore, the attacker needs a money mule to do the money laundering. The recruitment of money mules is a story in itself. Money mules are mostly recruited via e-mail spam runs. Criminals offer a financial administration job via job advertisements in the name of charitable institutions like the Red Cross. The recruits receive look-alike job contracts from the criminal. Money mules are asked to transfer the money via transfer agents to a foreign account within a couple of hours after receipt, and they receive a certain percentage of the transferred amount. Police investigation in the Netherlands showed that money mules in most cases are not aware that they are involved in criminal activities. After all the preparatory activities the attacker is ready to launch the attack, e.g., by showing a fake Internet banking screen (‘error page’) and asking the victim to re-enter the credentials. The trojan is able to penetrate the https session offered by the bank by making use of known vulnerabilities in the browser on the computer.
9.4.2 Defensive Actions

In response to the attacks described above, banks have taken countermeasures both individually and jointly. The countermeasures can be categorized as (a) ‘secure the channel’, (b) ‘educate the customer’, (c) ‘clean the Internet’ and finally (d) ‘transaction monitoring’.
9.4.2.1 Secure the Channel

As stated before, all Dutch banks use some form of two-factor authentication (2FA) for the identification of their Internet banking customers. Although this is a strong defense measure, recent attacks proved that under certain circumstances customers can be tricked, for instance when the computer is not patched with the latest browser and operating system security updates and the customer does not act according to the prescribed rules for Internet banking. Therefore most banks have made changes in their Internet banking dialogue between client and host. Some banks now use an extra channel for verification (SMS), whilst others ask for extra details in the challenge-response process, such as the total transaction amount and the account number of the remittee.
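The idea of including transaction details in the challenge-response can be sketched as follows. This is a minimal illustration and not the scheme used by any Dutch bank: it assumes a secret key shared between the customer's token and the bank, uses an HMAC over the challenge, the amount and the remittee's account number, and truncates the result to a short response code. The function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> str:
    """Bank side: generate a fresh random challenge for one transaction."""
    return secrets.token_hex(8)


def sign_transaction(token_key: bytes, challenge: str,
                     amount_cents: int, remittee_account: str) -> str:
    """Token side: derive a response bound to the transaction details."""
    message = f"{challenge}|{amount_cents}|{remittee_account}".encode("utf-8")
    digest = hmac.new(token_key, message, hashlib.sha256).hexdigest()
    # Truncated for readability; a real token defines its own response format.
    return digest[:8]


def verify_transaction(token_key: bytes, challenge: str, amount_cents: int,
                       remittee_account: str, response: str) -> bool:
    """Bank side: recompute the response and compare in constant time."""
    expected = sign_transaction(token_key, challenge, amount_cents, remittee_account)
    return hmac.compare_digest(expected, response)
```

Because the amount and the remittee's account number are part of the signed message, a response obtained for one transaction does not verify for a transaction that malware has altered in the browser, provided the customer checks the details shown on the token or in the SMS.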
9.4.2.2 Educate the Customer

On the preventive side, the Netherlands Bankers’ Association, in close cooperation with the top eight banks in the Netherlands, started a media campaign on radio, television and the Internet. The campaign was called ‘3 X kloppen’ (knock three times).1 In the campaign customers were asked to check three things each time they did business via their Internet bank: check the security of your PC, check the URL of your bank and check the entered payments. The campaign was evaluated as very successful. Many customers stated that they had improved the security of their home PC, e.g. by updating their anti-virus product, and said that they would pay more attention to recognizing suspicious Internet direct banking websites.
1 http://www.3xkloppen.nl
9.4.2.3 Clean the Internet

This category contains all actions to frustrate criminal activities on the Internet. One example is the take-down of (the IP addresses of) so-called dropzones. A dropzone is a server owned by a criminal which receives information from infected home computers. In the beginning of 2009 banks in the Netherlands jointly developed a service to detect malware threats on the Internet and to respond quickly in the event that a bank is hit by malware.

9.4.2.4 Transaction Monitoring

Finally, a lot of effort is devoted to what is called transaction monitoring. The rationale behind transaction monitoring is that by analysing and combining different (log) data, such as IP addresses and unusual patterns in application log data, suspicious transactions can be recognized and possibly fraudulent transactions can be stopped before the money actually leaves the bank.
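A very simple form of such monitoring can be sketched as a rule that combines two signals: an IP address not seen before for the customer and an amount far above the customer's usual payments. The rule, the threshold and all names in the sketch (`Transaction`, `is_suspicious`, `factor`) are assumptions for illustration; the monitoring systems used by banks are considerably more elaborate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    customer: str
    amount: float
    ip_address: str


def is_suspicious(tx: Transaction,
                  known_ips: dict[str, set[str]],
                  past_amounts: dict[str, list[float]],
                  factor: float = 5.0) -> bool:
    """Flag a transaction when it comes from an unknown IP address
    AND its amount exceeds `factor` times the customer's average."""
    new_ip = tx.ip_address not in known_ips.get(tx.customer, set())
    history = past_amounts.get(tx.customer, [])
    unusual_amount = bool(history) and tx.amount > factor * mean(history)
    return new_ip and unusual_amount
```

A flagged transaction would typically be held for additional verification rather than rejected outright, so that a customer who simply pays from a new location is not blocked.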
9.5 Cooperation

ICT security is not an issue for competition between the banks in the Netherlands. Under the umbrella of the Netherlands Bankers’ Association several fraud and (IT) security related working groups develop specific security guidelines or exchange information about lessons learnt, vulnerabilities, etc. All major Dutch banks participate in the working groups. One of the most successful working groups is FI-ISAC. In 2003 the Netherlands Bankers’ Association (NVB) started a feasibility study on the exchange of information on ICT security incidents and vulnerabilities, based on existing models of ISACs in the United States. By the end of 2003 an NVB working group, called FI-ISAC.NL, actually exchanged lessons learned on phishing incidents.2 Since 2006 FI-ISAC has worked together with the Dutch national police (KLPD), the national intelligence service (AIVD) and the national computer emergency response team (GovCERT) to help the Dutch banks develop preventive and monitoring measures against phishing and malware attacks. This initiative is called the NICC, National Infrastructure against Cybercrime.3 The NICC aims to set up ISACs for each critical infrastructure sector in the Netherlands.

The exchange of information is formalized through guidelines and is based on a so-called traffic light protocol. Red information, very sensitive information, is only exchanged during meetings. This information is not recorded in the minutes of a meeting. Yellow information is anonymous data which can be distributed within the organization on the basis of ‘need-to-know’. Green information can be distributed without restrictions. The ISAC email listservers are operated by GovCERT.

2 ‘Phishing’ is a collective noun for Internet-related identity theft incidents.
3 http://www.samentegencybercrime.nl (in Dutch)
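The traffic light protocol described in Section 9.5 can be expressed as a small set of rules. The sketch below encodes only what the text states (red: meetings only and not minuted; yellow: within the organization on a need-to-know basis; green: unrestricted). The names `Colour`, `may_share` and `may_record_in_minutes` are hypothetical, and how the rules are applied in borderline cases is an assumption.

```python
from enum import Enum


class Colour(Enum):
    RED = "red"        # very sensitive: exchanged during meetings only
    YELLOW = "yellow"  # anonymous data: within the organization, need-to-know
    GREEN = "green"    # no restrictions


def may_share(colour: Colour, *, in_meeting: bool,
              within_organization: bool, need_to_know: bool) -> bool:
    """Decide whether a piece of information may be passed on."""
    if colour is Colour.RED:
        return in_meeting
    if colour is Colour.YELLOW:
        return within_organization and need_to_know
    return True


def may_record_in_minutes(colour: Colour) -> bool:
    """Red information is not recorded in the minutes of a meeting."""
    return colour is not Colour.RED
```

Encoding the rules this way makes the default explicit: anything that is not green needs a positive reason before it is shared.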
9.6 Conclusions and Recommendations

9.6.1 ICT (Security) in an Electricity and Banking Environment, Some Differences

The difference between the ICT systems configuration of an electricity company and the average ICT configuration of a bank is not obvious. Operational processes in both banks and electricity companies rely on fault-tolerant ICT systems. The main difference can be found in the production environment. Electricity companies work with SCADA systems to measure, analyze and control industrial systems in the operator environment. The ICT department of a bank does not work with SCADA systems. In the back office the electricity company makes use of similar ICT systems and applications for administrative functions (e.g. customer relationship management and accounting).

The threat model also contains differences and similarities. A financial institution is more prone to internal and external fraud than an electricity company. Cybercrime, for instance, is a topic that hits modern banks in particular, especially with the current growth of direct Internet banking. On the other hand, the electricity company will suffer more from major disasters, such as the so-called acts of God,4 than a bank. New developments in the world of electricity such as smart metering will probably create new (fraud-related) threats, privacy questions and authentication and integrity challenges that are similar to the issues that banks delivering services via the Internet currently face [6].

Banks and electricity companies are also part of the critical infrastructure. Society relies more or less on the continuity and integrity of both financial and power services. Failures will lead to reputational damage and may lead to chaos. Banks and electricity companies belong to the private sector. To achieve more control, we expect that governments (public sector) will increase their regulatory power over companies that operate in the area of critical infrastructure.
9.6.2 Policy Recommendations

At sector level, banks have agreed to manage information security as an area of non-competition. In practice this means that lessons learnt from IT security incidents are shared and that, where needed, countermeasures are taken jointly. We recommend that the electricity companies in the various nations adopt a similar position and take action to set up a comparable information exchange model for IT security and fraud. We also recommend taking a closer look at other best practices in the finance sector, especially those related to integrity and confidentiality risks.

4 Such as floods, lightning storms, etc.
9.6.3 Research Recommendations

Furthermore, we recommend addressing a number of research questions. The first question is about risk management models. The Basel II framework is, in our opinion, an excellent method to mitigate operational risks. A formalized method of calculating the capital reserves needed to sustain major incidents, combined with a method to influence the total capital amount by implementing sufficient and effective preventive measures, creates management attention and response within the entire organization. We recommend that the energy sector develop a similar model for operational risk management, taking into account the experiences of the financial sector.

The introduction of smart metering introduces new risks related to, for instance, privacy. We recommend that the electricity sector develop a security architecture based on sustainable principles of (customer) authentication and data integrity and confidentiality, similar to the models Dutch banks use for their Internet direct banking applications and for their ATM networks. For the latter we recommend that the sector take a closer look at the way cryptographic schemes are implemented between ATM clients and hosts in payment networks. We also recommend that electricity companies enhance their ICT monitoring systems to detect unusual patterns in power usage in real time.

The security of SCADA systems will become more challenging once they are no longer isolated from the outside world. Special attention is needed for systems that are remotely controlled over the Internet. The ICT banking environment also has remotely controlled critical applications. For these environments specific authentication and authorization mechanisms are in place, such as two-factor authentication, challenge-response, out-of-band and four-hands principles. We advise the energy sector to take advantage of some of the lessons learnt in the financial sector with respect to remote control of ICT systems.
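As an illustration of what detecting unusual patterns in power usage could look like in its simplest form, the sketch below compares each new meter reading with the mean and standard deviation of a sliding window of recent readings. The class name `UsageMonitor`, the window size and the threshold are assumptions made for this example; an operational system would also have to account for daily and seasonal patterns.

```python
from collections import deque
from statistics import mean, stdev


class UsageMonitor:
    """Flag readings that deviate strongly from the recent history."""

    def __init__(self, window: int = 24, threshold: float = 3.0):
        self.readings = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold            # allowed deviation in standard deviations

    def observe(self, kwh: float) -> bool:
        """Return True if the reading is unusual, then add it to the window."""
        unusual = False
        if len(self.readings) >= 3:  # need some history before judging
            mu = mean(self.readings)
            sigma = stdev(self.readings)
            if sigma > 0 and abs(kwh - mu) > self.threshold * sigma:
                unusual = True
        self.readings.append(kwh)
        return unusual
```

In this sketch a flagged reading is still added to the window; whether to exclude it, so that one spike does not mask the next, is a design choice left open here.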
References

1. NEN-ISO/IEC 27002: Information technology – Security techniques – Code of practice for information security management, November (2007)
2. Basel Committee on Banking Supervision: International Convergence of Capital Measurement and Capital Standards, November (2005). http://www.bis.org
3. Guldentops, E.: Governing information technology through COBIT. IFIP TC11/WG11.5 4th Working conference on integrity and internal control in information systems. Kluwer, Brussel (2001)
4. ENISA: Risk management: implementation principles and inventories for risk management/assessments methods and tools, June (2006)
5. Keemink, S., Roos, B.: Security analysis of Dutch smart metering systems. University of Amsterdam, Amsterdam (2007)
6. Hafkamp, W.H.M.: Is internetbankieren nog wel veilig? NIBE Bank en Effectenbedrijf – Fraudeflits, September (in Dutch) (2007)
Chapter 10
The Way Forward

Laurens J. de Vries, Marcelo Masera and Henryk Faas
Abstract Current trends in energy, like the increasing need to integrate renewable energy sources, the evolution of future smart grids and distributed generation, are likely to dramatically increase the dependence of the electricity infrastructure on ICT. This opens new possibilities but also creates new risks, both inherent risks and risks arising from new opportunities for malicious attacks. This book has highlighted these challenges from different points of view. The aim of this final chapter is to demonstrate how they together show a way forward to ensuring a secure and high quality supply of electricity that meets future demands.

Keywords Power sector • information and communication technology • risk assessment
L.J. de Vries (*), Technology, Policy and Management, Delft University of Technology, P.O. Box 5015, 2600 GA Delft, The Netherlands
M. Masera, Joint Research Centre, Institute for the Protection and Security of the Citizen, European Commission, Ispra (VA), Italy
H. Faas, Joint Research Center, Institute for Energy, European Commission, 1755 ZG Petten, The Netherlands
Z. Lukszo et al. (eds.), Securing Electricity Supply in the Cyber Age, Topics in Safety, Risk, Reliability and Quality, 15, DOI 10.1007/978-90-481-3594-3_10, © Springer Science + Business Media B.V. 2010

10.1 Introduction

The electricity infrastructure has benefited from the implementation of ICT applications for technical, economic and administrative functions. This has improved many of these functions, providing more functionality and/or lowering costs. ICT infrastructure and applications have also made entirely new services possible. However, this development also entails new risks, as was shown in Chapter 7 in which Barkakati and Wilshusen described the results of an investigation of the Tennessee Valley Authority, one of the United States of America’s largest electricity companies. This investigation identified numerous vulnerabilities to both internal and external threats.

In this chapter, we will evaluate the implications of the trends with respect to the use of ICT in the electricity infrastructure and associated risks that were identified in this book. Based on this analysis, we will suggest a way forward: which actions should be taken and what should be done by which actors.
10.2 Trends

Many electricity markets around the world have been restructured with the goal of introducing competition, or at least increasing private investment, in the past two decades. This has significantly increased the organizational complexity of the electricity sector, as the number of active parties increased and decision power was devolved. Where possible, this restructuring process also involved increased trade between connected markets, leading to increasing connections between markets or even their integration. As system operators, regulators and market operators tend to have a national scope, and laws and regulations in the connected markets continue to be different, this has caused the organizational and institutional complexity to increase. Certain technical functions, such as balancing and congestion management, must be taken care of at a system level and therefore require close cooperation between many actors in these interconnected markets.

Another trend is the shift towards renewable energy sources. Some sources, like hydropower, geothermal power and biomass, are partly or fully controllable. However, the fastest growing source of renewable generation is wind power, while in the longer run the potential for solar power also is high. These sources are much less controllable. In the absence of hydropower in many areas or of another medium for energy storage, electricity networks are beginning to be subjected to strong variations in the volume of power that is generated in specific locations. According to the binding, unilateral targets that the European Union has set for itself, in 2020, 20% of energy consumption in the EU should be from renewable sources, up from 8.5% in 2005. As many forms of renewable energy are most suitable for electricity production – rather than, for instance, transport or space heating – the share of renewable energy for electricity production will need to be even higher.
In 2008, in the EU more wind capacity was installed than any other electricity generation technology [1]. The increase of wind energy production to a targeted 480 TWh by 2020 will have a significant impact on the electricity infrastructure, both in terms of how it is operated and the demand for network capacity. The intermittent nature of wind (and solar energy, but the volume of solar energy is currently much smaller) creates specific operational challenges, for instance with respect to balancing supply and demand. Wind parks typically require substantial network investments as they often are located away from the main transmission grids, for instance off shore. Further network capacity increases may be needed to spread the fluctuations in wind power
over a larger part of the network, which appears to be a more economic way of dealing with these variations than creating energy storage facilities [2].

Not all renewable energy is developed in large-scale locations. Much of the existing solar power and small-scale wind and hydropower feed directly into the distribution grid. They are part of a wider trend of distributed generation, for which small-scale combined heat and power is an important driver. Distributed generation causes the role of distribution networks to change, as they need to accommodate two-way traffic and may need to start to contribute to the balancing of supply and demand. An open question is how large volumes of distributed generation would be organized. As businesses and even households may become net electricity producers at certain moments, the distribution network will need to change its functionality in order to accommodate bidirectional flows of electricity from small distributed generation plants. This would require more flexible control over the network, which is why distributed generation is often associated with the concept of smart grids.

In a smart grid, ICT is much more integrated into the electricity network than in the existing electricity infrastructure. This provides substantially more functionality to its users, for instance to support distributed generation, and makes new services to consumers possible. In this vision, the roles of producers, consumers, supply companies and network managers change substantially. A further step would be the creation of microgrids. They would include local networks of distributed generation, storage and controllable load; such a network could be operated as a single aggregated load or generation plant. Microgrids might even have the capacity to operate in island mode, disconnected from the main grid, in case of disruptions in the main grid, and might also contribute to black start capabilities of the main grid.
Distributed generation units could be operated by their owners, but another option would be for them to be aggregated into virtual power plants which could be operated as a single unit, for instance by a power company. Of course, the users of these units would be able to place restrictions on the use of the generation units, in particular if they are also used to provide heat.
10.3 The Role of ICT

All these trends lead to a significant increase in the role of ICT in the electricity infrastructure, to the point that ICT will become completely embedded in the electricity infrastructure and the latter will not be able to function without ICT. Competition and market integration require the exchange of large amounts of data, for instance about network flow forecasts, between more parties, more frequently and closer to real time than has been the practice traditionally. Organized power exchanges operate through the Internet; market integration and congestion management build further upon these systems. Therefore the need is growing for large, innovative ICT applications such as wide-area monitoring systems (WAMS); see Chapter 4. As the geographic scale increases, new supervisory control mechanisms will need to be developed, particularly when multiple system operators need to work more closely together. Distributed generation is also unthinkable without a large role for
ICT in the control of the network and the generators, market operation, metering and billing, for instance. Apart from these trends, there is a strong incentive for electricity network companies to implement ICT in order to reduce costs, such as the cost of metering and in order to be able to use available network capacity more efficiently. Thus there are many drivers for a substantial increase in the role of ICT at all levels of the electricity system, for technical as well as commercial applications. ICT applications in the electricity infrastructure can be divided into two categories: communication and information processing. Communication links can be divided into links within organizations and between organizations. Both may cross national borders, as some companies operate in more than one country, while operators from different countries also need to interact. Information processing occurs in many places in the electricity infrastructure and in all stages of the information flows. The increasing integration of ICT in the electricity infrastructure causes the already high complexity of the power infrastructure to increase further. More complexity may create new risks of systemic failures, which are difficult to identify until they manifest themselves, and arduous to counteract as they require many coordinated modifications by numerous actors throughout the electricity infrastructure. In Chapter 3, Robert demonstrates the complexity of interrelations between infrastructures, especially during crises, and the risk of domino effects. In Chapter 5, Tranchita et al. further discuss the consequences of the interdependence of ICT and electricity infrastructures. The dependence of these infrastructures upon each other, each of which is critical to society, combined with the lack of precaution (for instance in the use of ‘off the shelf’ software) and the lack of risk assessments, are troubling. 
The complexity is increased again by the presence of different generations of ICT applications. Because ICT applications in the electricity infrastructure need to be extremely reliable, companies are hesitant to replace existing applications with newer ones – which appear very frequently – even if they add functionality or are faster. Therefore several generations of ICT equipment tend to exist in parallel. This raises maintenance and management costs and increases the risk of errors. It also places extra demands on the training and skills of the workforce. As both the market side and the network management side of electricity infrastructures become more integrated, the lack of standardization will begin to pose a greater challenge.
10.4 Risks

The increasing reliance upon ICT in the electricity infrastructure creates new risks. Nevertheless, the integration of ICT in the electricity infrastructure will take place because of the functionality and the cost reductions that it enables. Therefore we need to analyze these risks and mitigate them where possible. It is convenient to divide risks between inadvertent failures – that originate from within the infrastructure – and malicious threats, which may come from outside the system, but may also be caused by staff within the electricity sector. While the reliability of ICT systems
can be improved by applying better software and hardware engineering technologies, protection against cybersecurity problems is more difficult. The electricity sector does not guide the development of ICT, so it inherits the problems and the weaknesses of generic ICT standards and applications. Many ICT applications have not been designed with security in mind. The trend towards linking formerly standalone computers to the Internet increases the vulnerability to malicious threats. Security breaches may have two types of consequences: they may affect electric power service and they may affect information security. The former category concerns the risk that power service will be interrupted, whereas the second category involves risks related to privacy and data confidentiality. Technical (inadvertent) failure mainly affects the reliability of electricity service, while malicious threats may be aimed at service interruptions or at obtaining confidential information. Access to confidential information can be used for commercial purposes, for instance to manipulate the electricity market. Information about small consumers may provide indications of habits and consumption patterns. Companies may abuse this information for commercial purposes, but more malicious applications can also be imagined (e.g. regarding the presence of people at home, etc.). Cybersecurity incidents (both intentional and accidental) may have severe consequences with respect to the safety of the power installations and their workers. This is a delicate point of contact between the cyber world and the physical one. One should take into consideration that, for instance, potential cyber attackers can modify protections, thereby nullifying all preventive actions.
10.5 Who Benefits, Who Pays?

The implementation of ICT in power systems entails substantial costs, such as for the design, the development of applications, the procurement of equipment and maintenance. Risk-related costs, such as additional protection and the management of incidents and their consequences, are not trivial. While it is clear who pays for the acquisition of the ICT systems, it is less clear who bears the costs of failures. Because the electricity infrastructure is critical to society and potential problems therefore exceed the reach and capacity of individual power companies, the security of this critical infrastructure is overseen by government. However, the decisions regarding the protection of the electricity infrastructure are made by the power operators. Dealing with cybersecurity risks in the power sector therefore raises urgent questions for the industrial actors, for society and for governments.

When a company uses ICT in such a way that its consequences (positive and negative) affect only that company, the decisions about its reliability and security are a matter of management. Industrial actors will normally follow the business logic of maximum return on their investments. Many ICT systems, however, cause effects far beyond the company that uses them. In this case, the difference between who is at the origin of the problem and who may suffer from its effects cannot be ignored. In many cases, the benefits of ICT accrue to the companies who implement
L.J. de Vries et al.
them (e.g. by reducing their costs), whereas the risks are borne by the consumers. Power outages cost society orders of magnitude more than they cost power companies; the negative effects of theft of information may fall entirely on the consumers whose data has been stolen. Small consumers tend to be less able to protect themselves against the risk, in part because they are less aware of ICT security problems. Thus there is a potential conflict of interests between the industrial actors (e.g. the network operators) who benefit from the efficiency increases, and the consumers who may bear the risks in terms of privacy and reliability. In such cases, the benefits and consequences of ICT for all actors involved should be analyzed. Regulation may be needed to ensure that those who control the risks take all the interests that are at stake into account. Matters are further complicated when ICT systems cross jurisdictional borders, for instance to support the information flows among actors on different sides of a border. An important question is who should take the decisions. It is evident that many of the most influential decisions are the responsibility of the operators, as the procurement and usage of ICT systems are in their hands. Nevertheless, when dealing with issues that may be critical to society, there is an important role for government. This can take the form of establishing minimum requirements and other guidelines. When the potential consequences of failures can affect end users, they should be able to evaluate and decide upon those risks. Legal prescriptions may help set the framework for the interaction among the different actors, specifying their rights and obligations. However, the key question is how to determine what should be done about the risks. Effective risk management requires a comprehensive view of all problems that need to be incorporated into the normal design, development and operation of the power systems.
Problem areas can be summarized as:
• How to create incentives for the power system operators to apply sufficient security and safety protection procedures, for issues that might exceed the boundaries of their companies?
• How to balance all hazards, for all stakeholders, in a multi-national setting, being able to synthesize a joint approach across the power infrastructure?
• How to include risk-related cost considerations without distorting the market, or creating unfair advantages for some operators?
• How to deal with the standardization of risk-related measures, without creating unnecessary and inefficient obligations and bureaucratic burden?
• How to deal with transparency issues regarding safety and security: among operators, between operators and their customers, between operators and governments, across national borders?
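The incentive gap described in this section can be made concrete with a small numerical sketch. It compares the expected annual net benefit of a security measure as seen by the operator alone with the net benefit when losses to consumers and society are also counted. All figures, as well as the names `SecurityInvestment`, `private_net_benefit` and `social_net_benefit`, are hypothetical and chosen only to reflect the "orders of magnitude" difference mentioned above; they are not taken from any study cited in this book.

```python
from dataclasses import dataclass


@dataclass
class SecurityInvestment:
    """Hypothetical figures for one security measure (all values are assumptions)."""
    annual_cost: float               # cost of the measure to the operator, per year
    outage_probability_cut: float    # reduction in the annual probability of an outage
    operator_loss_per_outage: float  # loss borne by the operator per outage
    society_loss_per_outage: float   # loss borne by consumers and society per outage


def private_net_benefit(m: SecurityInvestment) -> float:
    """Expected annual gain as seen by the operator alone."""
    return m.outage_probability_cut * m.operator_loss_per_outage - m.annual_cost


def social_net_benefit(m: SecurityInvestment) -> float:
    """Expected annual gain when losses to consumers and society are counted too."""
    total_loss = m.operator_loss_per_outage + m.society_loss_per_outage
    return m.outage_probability_cut * total_loss - m.annual_cost


# Illustrative values only: society's loss is assumed to be 100 times the operator's.
measure = SecurityInvestment(
    annual_cost=2_000_000.0,
    outage_probability_cut=0.01,
    operator_loss_per_outage=50_000_000.0,
    society_loss_per_outage=5_000_000_000.0,
)

print("operator view:", private_net_benefit(measure))
print("society view: ", social_net_benefit(measure))
```

With these assumed numbers the measure is unattractive to the operator but clearly worthwhile for society as a whole, which is the situation in which regulation or other incentives may be needed.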
10.6 Where To Go?
Electricity infrastructure security issues cut across many disciplines: engineering, ICT, economics, management, to name but a few. The issues also affect many actors: generating companies, transmission and distribution companies, consumers,
10 The Way Forward
power exchanges and regulators. Problems are often international or inter-state, multiplying the number of actors and creating questions regarding interconnection and differences in regulation. Security issues involve multiple infrastructures, namely one or more ICT infrastructures (Internet, telephone, SCADA systems) in addition to the electricity infrastructure itself. The classic Cartesian solution of analyzing the parts of a system does not work, because the main goal is to avoid a break-down of the system as a whole. Thus we have a typical example of a complex problem. This means that multiple methods will be needed to improve our understanding and multiple approaches may need to be taken to address the problems. Complex systems can be modeled. Human behavior can be tested, and people can be trained in simulations. Robert (Chapter 3) stresses the importance of human interaction during a crisis, and therefore the need for organizations to know each other. Large-scale emergency exercises may be needed to find out how the system behaves under stress and to teach the involved actors how to respond.
10.6.1 Research, Risk Assessment and Communication
In this book, the need for a comprehensive, systematic risk assessment has been emphasized, in particular with respect to interdependencies between different critical systems (ICT, electricity and others) and possible domino effects, but also with regard to second order failures in which backup systems come under strain due to sudden, high demand. The complexity of the assessment needs to be balanced against the need for a transparent approach. There are good reasons for down-scaling the electricity infrastructure, or at least the level at which it is organized. In Chapter 2, Ilic discussed the merits of Dynamic Monitoring and Decision Systems (DYMONDS) as a possible means for coordinated interactions between distributed decision makers. Key to this solution is that it is not top-down, but that distributed control units together manage the system, and that control is dynamic, rather than static. This places strong demands on modeling capabilities, among other reasons because there is an increasing need for models that operate close to real time, in addition to the models for the planning and the design phases. Significant but scattered efforts in this respect are under way worldwide. The modeling of the power infrastructure with its interconnected ICT infrastructure, interdependencies with other critical systems and between organizations and operators would benefit from an open exchange of models and methodologies. The modeling efforts will also need to be extended more across disciplines, since organizational, market, policy, legal and human behavior aspects play crucial roles in the behavior of the electricity infrastructure. Finally, new and better networked experimental facilities are required, for example to test the ICT components of power systems, through the creation of R&D centers and networks of excellence or technical platforms for demonstration and validation.
Greater emphasis is needed on training of operators and researchers with realistic scenarios. In general, research efforts need to be
better linked and best practices shared to avoid duplication, enhance quality through peer review and concentrate efforts. A key issue is the availability of solid data. Confidentiality, competition and security concerns will need to be balanced with the need for (controlled) dissemination of information, including cross-border data exchange and collaboration. Generating and in particular analyzing data appears to be a significant challenge (see Chapter 4 on wide area measurement systems), but solutions to it may be more straightforward than the question of how to share the information. A major focus in governance therefore needs to be on the international exchange of relevant information between organizations, but also on the communication of risks to industry and household end-users.
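The domino effects between interdependent systems discussed in this section can be illustrated with a very small propagation model. The sketch below is not one of the modeling methods presented in this book; it is a deliberately simplified, hypothetical example in which a system fails as soon as any system it depends on has failed. The system names in `dependencies` and the function name `cascade` are assumptions made for illustration only.

```python
from collections import deque


def cascade(dependencies, initial_failures):
    """Return the set of systems that end up failed.

    `dependencies` maps each system to the set of systems it needs in order
    to operate. Simplifying assumption: a system fails as soon as any one of
    the systems it depends on has failed (no backups, no partial service).
    """
    # Invert the mapping: for each system, which systems rely on it?
    dependents = {}
    for system, needs in dependencies.items():
        for needed in needs:
            dependents.setdefault(needed, set()).add(system)

    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed


# Hypothetical interdependencies between infrastructures.
dependencies = {
    "telecom": {"grid_supply"},
    "scada": {"telecom"},
    "grid_control": {"scada"},
    "water": {"grid_supply"},
    "hospital": {"water", "grid_supply"},
}

print(sorted(cascade(dependencies, ["telecom"])))
print(sorted(cascade(dependencies, ["grid_supply"])))
```

Even this crude model shows why sharing information across organizations matters: each operator knows only its own row of `dependencies`, while the extent of a cascade depends on the whole table.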
10.6.2 Organization and Governance
The organizational challenge is for individual utilities to give up some of their independence and cooperate, sharing information with each other or through neutral bodies. National regulatory authorities will need to improve cooperation and increase transparency and information exchange. In Europe, the planned European Agency for the Cooperation of Energy Regulators should function as an independent body to monitor the security of the network and review rules put in place to ensure reliability of the system. However, red tape and additional administrative burden need to be limited and operators should have a clear interest in developing joint capabilities and public–private partnerships.
10.6.3 Understanding the Cost of Failures
Current market mechanisms are inadequate for electricity as a network-based good with very limited storage options. New ICT systems could improve cooperation and coordination between actors, but they are costly, and incentives would have to be provided so that a public good like security of supply is factored into investment decisions. It is therefore important to understand and communicate to the public the potential impact of failures (such as the costs of blackouts) on different groups of consumers and society in general. With an increased link between ICT and the electricity grid, national security can also be at stake.
10.7 Conclusions
From this book it is clear that the cybersecurity of power systems (including the prevention and the management of digital attacks) requires careful consideration. Part of the solution is technological and consists of designing more secure ICT systems.
Part of the solution is scientific, as there is a need for better understanding of potential vulnerabilities and threats. Part of the solution is organizational, as the many actors of the power infrastructure (in addition to the security solutions implemented by each individual operator) need to agree on and put into practice measures for dealing collectively with the issue of cybersecurity. Challenges are as much on the organizational and governance side as they are on technical feasibility or modeling and risk assessment methodologies. At this point, the way forward is not towards a clearly defined end state, but will need to be found through a process of research and development, dialogue and exchange.
References
1. EWEA (European Wind Energy Association): ‘Wind energy’s number one spot is more than justified’. News release, 19 March 2009. Marseille: EWEA (2009)
2. Ummels, B.: Wind integration: power system operation with large-scale wind power in liberalised environments. Doctoral dissertation, Delft University of Technology (2009)