Optimal Shutdown Control of Nuclear Reactors
MATHEMATICS IN SCIENCE AND ENGINEERING
A Series of Monographs and Textbooks
Edited by Richard Bellman, University of Southern California
1. TRACY Y. THOMAS. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2. TRACY Y. THOMAS. Plastic Flow and Fracture in Solids. 1961
3. RUTHERFORD ARIS. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4. JOSEPH LASALLE and SOLOMON LEFSCHETZ. Stability by Liapunov's Direct Method with Applications. 1961
5. GEORGE LEITMANN (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962
6. RICHARD BELLMAN and KENNETH L. COOKE. Differential-Difference Equations. 1963
7. FRANK A. HAIGHT. Mathematical Theories of Traffic Flow. 1963
8. F. V. ATKINSON. Discrete and Continuous Boundary Problems. 1964
9. A. JEFFREY and T. TANIUTI. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10. JULIUS T. TOU. Optimum Design of Digital Control Systems. 1963
11. HARLEY FLANDERS. Differential Forms: With Applications to the Physical Sciences. 1963
12. SANFORD M. ROBERTS. Dynamic Programming in Chemical Engineering and Process Control. 1964
13. SOLOMON LEFSCHETZ. Stability of Nonlinear Control Systems. 1965
14. DIMITRIS N. CHORAFAS. Systems and Simulation. 1965
15. A. A. PERVOZVANSKII. Random Processes in Nonlinear Control Systems. 1965
16. MARSHALL C. PEASE, III. Methods of Matrix Algebra. 1965
17. V. E. BENES. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18. WILLIAM F. AMES. Nonlinear Partial Differential Equations in Engineering. 1965
19. J. ACZÉL. Lectures on Functional Equations and Their Applications. 1966
20. R. E. MURPHY. Adaptive Processes in Economic Systems. 1965
21. S. E. DREYFUS. Dynamic Programming and the Calculus of Variations. 1965
22. A. A. FEL'DBAUM. Optimal Control Systems. 1965
23. A. HALANAY. Differential Equations: Stability, Oscillations, Time Lags. 1966
24. M. NAMIK OĞUZTÖRELI. Time-Lag Control Systems. 1966
25. DAVID SWORDER. Optimal Adaptive Control Systems. 1966
26. MILTON ASH. Optimal Shutdown Control of Nuclear Reactors. 1966
27. DIMITRIS N. CHORAFAS. Control System Functions and Programming Approaches. (In Two Volumes.) 1966
28. N. P. ERUGIN. Linear Systems of Ordinary Differential Equations. 1966
29. SOLOMON MARCUS. Algebraic Linguistics; Analytical Models. 1966

In preparation

A. KAUFMANN. Graphs, Dynamic Programming, and Finite Games
MINORU URABE. Nonlinear Autonomous Oscillations
A. KAUFMANN and R. CRUON. Dynamic Programming: Sequential Scientific Management
GEORGE LEITMANN (ed.). Optimization: A Variational Approach
A. M. LIAPUNOV. Stability of Motion
Y. SAWAGARI, Y. SUNAHARA, and T. NAKAMIZO. Statistical Decision Theory in Adaptive Control Systems
MASUNAO AOKI. Optimization of Stochastic Processes
F. CALOGERO. Variable Phase Approach to Potential Scattering
J. H. AHLBERG, E. N. NILSON, and J. L. WALSH. The Theory of Splines and Their Application
HAROLD J. KUSHNER. Stochastic Stability and Control
Optimal Shutdown Control of Nuclear Reactors
MILTON ASH
E. H. Plesset Associates, Inc., Santa Monica, California
Academic Press, New York and London, 1966
Copyright © 1966, by Academic Press Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, or any other means, without written permission from the publishers.
Academic Press Inc., 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by Academic Press Inc. (London) Ltd., Berkeley Square House, London W. 1
Library of Congress Catalog Card Number: 66-22148
Printed in the United States of America
To my wife, Shulamite, and children, Avner, Miryam, Reuel
לשמור ולעשות (Sayings of the Fathers, IV, 5)
Preface
The shutdown problem of a high-flux, thermal-neutron nuclear reactor, with respect to the kinetics of the radioactive fission products it contains, principally xenon-135, is about twenty years old, having emerged from the Manhattan Project experience of World War II. That the existence of the isotope xenon-135 in thermal reactors is to be contemplated with a healthy respect is attested to by the fact that xenon is one of the principal factors that limit the maximum operating power of present conventional thermal-neutron reactors. There is another class of xenon problems which come under the heading of xenon spatial oscillations, sometimes called “flux tilt” oscillations. These are discussed only peripherally in this monograph.

Simultaneous with the development of nuclear reactor physics, and nucleonics in general, were the advances made in the areas of applied mathematics stemming from the requirements of practitioners of military operations research and analysis. Such advances also began as a product of the World War II experience and are still being made today. One of these is dynamic programming, for which a better descriptor is multistage decision theory, and which is presently coming of age in manifold applications. These include applications in most branches of engineering, economics, statistics, and physics, not to mention the needs of modern operations analysis in a fast-changing geopolitical and technological world.

In the late 1950's, the obvious interdisciplinary potential in the confluence of the above type of applied mathematics, principally dynamic programming, with certain problem areas in nuclear reactor engineering and physics was recognized. Out of this grew the application to the specific class of problems that comprise the contents of this book. Aspects of this confluence are built into the structure of the first three chapters for pedagogical reasons. That is, an exposition of elementary dynamic programming is included in the first three chapters, which also comprise the background for, and introduction to, the central problem of this work: the investigation of nuclear reactor optimal xenon shutdown programs.

This monograph is addressed to those engaged or interested in nuclear reactor physics and the engineering sciences, as well as to their counterparts in modern optimal control theory. It is felt that readers with a mathematical bent will appreciate the portions of this class of problems that are cast in a mathematical framework, while those with more physical predilections will derive much from what is presented on the physical level.

The writing of most of the manuscript, as well as the final computations, was done during my stay as an Israel Atomic Energy Commission Fellow, in the academic year 1964-1965, at the Soreq Nuclear Research Center, Yavne, Israel. I wish to thank the Israel Defense Ministry for their courtesy in allowing me the use of their large-scale digital computing facilities; Professors S. Yiftah and I. Pelah, respectively Scientific Director and Head of the Physics Department of the Soreq Nuclear Research Center, under whose auspices this work was completed; Dr. R. Bellman, University of Southern California, Los Angeles, and Dr. R. Kalaba, The RAND Corporation, Santa Monica, for their assistance, suggestions, and reading of the manuscript; my son Avner for helping to proofread the manuscript and galleys; Mr. Wayne Jones of the System Development Corporation, Santa Monica, who essentially wrote the central digital computer program, DYNPROG, described in Chapter 6, as well as provided vital critical discussion; and Mrs. Ilana Maik, also of the Soreq Nuclear Research Center, for her painstaking efforts in expertly typing the manuscript.

MILTON ASH
Nahal Soreq, Israel
Santa Monica, California
Contents

PREFACE

Chapter 1. Xenon in Nuclear Reactors. Dynamic Programming I
1.1. Introduction and Historical Review
1.2. Xenon Spatial Oscillations
1.3. Fission-Product Poison Production
1.4. Absorption Cross Section of Xenon
1.5. Thermal-Reactor Xenon Difficulties
1.6. Conventional Approaches to Circumvent Xenon
1.7. Dynamic Programming
1.8. Principle of Optimality and Two Examples

Chapter 2. Reactor Poisons. Dynamic Programming II
2.1. Long-Term Fission-Product Poisons
2.2. Poison Reactivity
2.3. Xenon Spatial Oscillations Revisited
2.4. Discrete Optimal Control
2.5. Averaged Control and Terminal Control
2.6. Impact on Linearized Control Theory

Chapter 3. Poison Kinetics and Xenon Shutdown. Dynamic Programming III
3.1. Reactor-Poison Kinetics Equations
3.2. Immediate Flux Shutdown
3.3. Xenon and Samarium after Protracted Shutdown
3.4. Xenon Minimum and Minimax Problem Statements
3.5. Constraints
3.6. Dynamic Programming. Absolute Value and Minimax Criteria

Chapter 4. The Maximum Principle
4.1. Introduction
4.2. Two Examples
4.3. Bang-Bang Control
4.4. Continuous and Bang-Bang Control
4.5. Optimal Orbital-Rendezvous Control
4.6. Simplified Xenon Shutdown Control
4.7. The Two-Point Boundary-Value Problem

Chapter 5. Minimum and Minimax Xenon Shutdown
5.1. Mathematical Restatement of Optimal Xenon Shutdown
5.2. Mathematical Restatement of Constraints
5.3. Dynamic-Programming Functional Equation
5.4. Derivation of Bellman's Equation
5.5. Bang-Bang Control Dilemma
5.6. Dynamic-Programming versus Maximum-Principle Optimal Shutdown Solutions

Chapter 6. Computational Aspects
6.1. Introduction and Calculation of Fk Tables
6.2. The Xenon Override Constraint
6.3. DYNPROG and COAST Input-Data Format
6.A. Appendix to Chapter 6

Chapter 7. Experimental Verification
7.1. Introduction and IRR-1 Reactor Description
7.2. Immediate Shutdown of IRR-1 to Zero Flux
7.3. Shutdown to Nonzero Power Levels
7.4. Xenon and Iodine Buildup and Decay
7.5. Experimental Results
7.A. Appendix to Chapter 7

Chapter 8. Results and Conclusions
8.1. Introduction and Xenon Unconstrained Extremals
8.2. Xenon Constrained Extremals
8.3. Interdependence of Flux and Xenon Constraints
8.4. Two Types of Optimal Shutdown Payoffs
8.5. Short Allowable Shutdown Durations
8.6. Strongly Limited Xenon Override Shutdown
8.7. Conclusions of Experimental Investigation

Chapter 9. Summary and Equivalences
9.1. Reprise
9.2. Equivalence between the Optimality Principle and the Maximum Principle
9.3. Comparison of Optimal Shutdown Criteria
9.4. Other Equivalences
9.5. Higher-Order-System Formulations

References
Bibliography
Document Glossary
Xenon Bibliography
INDEX
CHAPTER 1
Xenon in Nuclear Reactors. Dynamic Programming I
1.1. Introduction and Historical Review

In a large high-power thermal-energy† nuclear reactor that is operating at steady state, the myriads of fissions from which the power output is derived produce high concentrations of various fission-product radioactive nuclei which are detrimental to normal operation. Certain fission-product nuclei, and their decay products, have tremendous absorption cross sections‡ for thermal-energy neutrons. The neutrons are the “lifeblood” of nuclear reactors, because they maintain the chain reaction by causing fissions, which in turn release more neutrons. The above fission products are called poisons, because they adversely affect the maintenance of the constant neutron population required for equilibrium reactor operation. The principal fission-product poisons of interest are the isotopes samarium-149 and xenon-135, whose thermal neutron absorption cross sections are 50,000 and 3,500,000 barns,§ respectively. Such cross sections are orders of magnitude greater than those ordinarily encountered in reactor physics or engineering. As a matter of fact, the above xenon cross section is the largest known neutron absorption cross section of any nucleus on the Segrè chart (chart of the nuclides). The reasons for the existence of such large cross sections will be discussed briefly later.

† This refers to the kinetic energy of the bulk of the reactor neutron population; thermal energy is essentially the energy at room temperature (300 °Kelvin), which corresponds to 0.025 electron volts.
‡ “Cross section” can be thought of as the microscopic interaction probability per pair of interacting particles. Thus the above absorption cross section is the probability that a fission-fragment nucleus will literally absorb an incident neutron. The noun, cross section, is derived from the fact that it has dimensions of area.
§ 1 barn = 10⁻²⁴ cm²; this unit of cross section originated during Manhattan Project days and, as can be surmised, the derivation hints of someone or something that cannot hit the broad side of a barn.

In an operating reactor, at least enough fuel (uranium or plutonium) must be supplied, in addition to the required critical mass, that sufficient additional neutrons are released from fission to “satiate the appetite” of the fission-product poisons existing in the steady state. However, if the reactor is disturbed from steady-state operation, such as being shut down, inordinately large amounts of poisons accumulate, which can severely restrict the flexibility of subsequent reactor control. The theme of this book centers about the manner of coping with this difficulty. Specifically, it is desired to find the means to shut down a large high-power thermal reactor on a program that permits maintenance of the flexibility of subsequent control. Such flexibility is delineated in the later discussion.

The problem of fission-product poisons, with respect to their effect on preserving the operating integrity of a high-power thermal reactor, has existed for at least a score of years. It was first manifest in the initial operation of the large plutonium-producing thermal reactors at Hanford, Washington, during the World War II Manhattan Project. It was found that these reactors were slowly shutting themselves down for no apparent reason. Since they were intended to produce weapon-grade plutonium, as opposed to gaseous-diffusion methods just starting to produce weapon-grade uranium (the diffusion technology had just been born), the whole atomic-bomb program was thought at that time to be in jeopardy.
As is well known, both methods for producing atom-bomb fuel were “successful,” in that uranium and plutonium were the separate constituents used in the two atom bombs dropped on Japan during the closing phase of World War II. As can be imagined, a top-priority investigation was launched into the reasons for the improper performance of the Hanford reactors. Within two days, Enrico Fermi and J. A. Wheeler found the difficulty. It was caused mainly by the fission product xenon-135, which was poisoning the reactors by depleting them of neutrons because of its (now known) tremendous absorption cross section. The Hanford reactors were immediately supplemented with additional fuel to maintain steady-state operation, i.e., to keep the reactors critical.

Thus in presently existing high-power thermal nuclear reactors, at least enough additional fuel must be incorporated to counteract the influence of fission-product poisons to provide for steady-state reactor operation. For such operation, the amount of fuel must be increased over that required for a critical mass, often by a factor of 2. However, the fuel load is increased over critical for other important reasons, such as the simple fact that fuel is being used up. For example, the fuel charge of a modern Polaris nuclear submarine is many times greater than that called for by the criticality equations, to provide enough fuel over the anticipated military life of the submersible. Such heavy fuel loading in a submarine yields, at the same time, sufficient poison-control flexibility. That is, there is always enough reactor fuel, except when it is severely depleted just prior to recharging with a new fuel core, to override the adverse effect of the fission-product poison concentration in order to restart at will after shutdown.

In the shutdown state, the xenon concentration, for example, can rise to many orders of magnitude over that at steady state. As will be seen, this is due to the kinetic imbalance of the production, decay, and absorption of the various fission products following shutdown. However, in conventional land-based stationary high-power reactors, such heavy fuel loading is prohibitively expensive, so that only partial xenon override is possible. That is, the reactor can be restarted only during a short time following shutdown. If this time is exceeded, the reactor cannot be restarted until the excess poison has decayed away naturally, which is a matter of two days or more.
1.2. Xenon Spatial Oscillations

There is another class of reactor xenon problems. These are caused by spatial oscillations of the flux or power† throughout a large thermal reactor, due principally to the space-time kinetics of xenon-135. Such oscillations tend to occur when the reactor is so heavily loaded with fuel that the power density is constant almost throughout the reactor volume. A classic case of a tendency to xenon spatial oscillations occurs in the tritium-producing reactors at the Savannah River facility. These flux-tilt oscillations are discussed only qualitatively herein. Suffice it to say at this point that the oscillation periods are measured in days or fractions thereof. Thus their effect on the reactor can be controlled adequately and easily by manual surveillance on the part of the reactor operator.

† It is easily shown that the reactor power is proportional to the neutron flux (see the footnote on flux in Section 1.5). Hence flux and power are often used synonymously in this book.

1.3. Fission-Product Poison Production

It would be well now to discuss briefly the origin and subsequent behavior of the radioactive-isotope species that play an important role in xenon control considerations. As mentioned, fissions are the specific causative agent of reactor power production, by virtue of the kinetic energy of the resulting fission-fragment particles, which heat the fuel and consequently the reactor itself. Coolant is circulated through the reactor to extract the heat generated, which is transformed to useful power external to the reactor system. Certain of these fragment-particle nuclei transmute through beta decay (i.e., by emitting β particles, which is another term for electrons) to form the poison isotopes.

Specifically, tellurium-135 is a fission fragment occurring in 5.6 per cent of fissions. It has a half-life of 2 minutes, β-decaying into iodine-135. With a half-life of about 6.7 hours, iodine-135 in turn β-decays into the troublesome isotope xenon-135. Iodine-135 decay provides the principal source of xenon in a steady-state reactor, even though xenon itself is produced directly as a fission fragment. However, the latter occurs in only about 0.3 per cent of the fissions.
Xenon-135, if it does not absorb a neutron to become xenon-136, which is a harmless nonpoison, will β-decay into cesium-135 with a half-life of 9.2 hours. Cesium-135 will ultimately β-decay, with a half-life of 20,000 years, to barium-135, which is a stable isotope. The pertinent decay schemes are depicted in Fig. 1.1.

[Figure 1.1: two panels, “Xenon-Iodine Decay Scheme” and “Samarium-Promethium Decay Scheme.”]
FIG. 1.1. Decay schemes for xenon-iodine and samarium-promethium reactor poisons.

For xenon kinetics purposes, upon the realization that the two principal isotopes to be considered are xenon and iodine, whose half-lives are measured in hours, two approximations usually are made. The first is that tellurium does not “exist” (half-life only 2 minutes) but that iodine-135 is assumed to be formed directly from fission with the same fission yield (5.6 per cent) as tellurium. The second is that although iodine-135 itself, like xenon-135, absorbs neutrons, its absorption cross section is millions of times smaller than that of xenon, so that the corresponding term in the xenon and iodine kinetics equations, to be developed later, is ignored. In a similar manner, samarium-149 kinetics can be considered.
Samarium has the next highest thermal neutron absorption cross section of the reactor-poison fission products, but it is some 1/70th that of xenon. Analogously, neodymium-149 is a fission fragment in 1.4 per cent of fissions and β-decays, with a half-life of 1.7 hours, to promethium-149. The latter also β-decays, with a half-life of 53 hours, to the neutron-absorbing samarium-149. Samarium-149, unlike xenon-135, is a stable isotope. Also, analogous to xenon-iodine, the approximation made on realization that the two important isotopes are samarium and promethium is that neodymium does not “exist” (its half-life is but 1.7 hours compared to 53-hour promethium) but that promethium is assumed to be created directly from fission with the neodymium fission yield of 1.4 per cent. As before, the absorption of neutrons by promethium is small compared to that by samarium, so that the corresponding term in the samarium and promethium kinetics equations, discussed later, is ignored.

However, because of the small thermal neutron absorption cross section of samarium compared to xenon, the reactor optimal-control-flexibility problem depends principally on the xenon concentration, with samarium poison being minor by comparison for high-power thermal reactors. In the main theoretical development later, the emphasis will be on xenon, with the mental reservation that samarium poison is present as well.
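The decay chain described in this section can be turned into numbers with a few lines of Python. The sketch below is only an illustration, not the kinetics model developed later in the book: it converts the quoted half-lives into decay constants and evaluates the standard two-member decay-chain solution for xenon after an idealized, instantaneous shutdown to zero flux. With no neutrons present, neither fission production nor neutron absorption contributes. The function names and the starting concentrations (arbitrary units) are assumptions made for the example.

```python
import math

# Half-lives quoted in the text, in hours.
HALF_LIFE_HOURS = {"I-135": 6.7, "Xe-135": 9.2, "Pm-149": 53.0}


def decay_constant(half_life_hours):
    """Decay constant (per hour) from a half-life: lambda = ln 2 / T_half."""
    return math.log(2.0) / half_life_hours


LAMBDA_I = decay_constant(HALF_LIFE_HOURS["I-135"])
LAMBDA_X = decay_constant(HALF_LIFE_HOURS["Xe-135"])


def xenon_after_shutdown(iodine0, xenon0, t_hours):
    """Xenon concentration t hours after an instantaneous shutdown to zero flux.

    Assumed zero-flux balance (no fission source, no neutron absorption):
        dI/dt = -LAMBDA_I * I
        dX/dt =  LAMBDA_I * I - LAMBDA_X * X
    The closed-form solution of this pair is evaluated below.
    """
    feed = LAMBDA_I * iodine0 / (LAMBDA_X - LAMBDA_I)
    return (xenon0 * math.exp(-LAMBDA_X * t_hours)
            + feed * (math.exp(-LAMBDA_I * t_hours)
                      - math.exp(-LAMBDA_X * t_hours)))


# Hypothetical starting inventory in arbitrary units: five times as much
# iodine as xenon at the moment of shutdown.
for t in (0, 5, 10, 20, 40, 80):
    print(t, round(xenon_after_shutdown(5.0, 1.0, t), 3))
```

With more iodine than xenon at shutdown, the printed xenon level first rises, because iodine decays into xenon faster than xenon decays away, and only later falls off. This is the qualitative behavior that makes the postshutdown period troublesome.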
1.4. Absorption Cross Section of Xenon

As the reason for existence of the xenon control problem hinges on the stupendous thermal neutron absorption cross section of xenon-135, it is appropriate to discuss briefly the reason for such a neutron affinity. When nuclei contain a large number of protons and neutrons, they exhibit a complicated behavior with regard to their interactions with other subatomic particles. This applies as well to the xenon-135 nucleus and incident thermal neutrons. Without going into the quantum-mechanical and nuclear physics explanation, it is found experimentally that the cross sections of such nuclei exhibit resonance properties, in that for certain incident neutron energies, the cross section can increase greatly compared to the remainder of the energy range. In the case of xenon, such a resonance occurs at thermal neutron energies. In fact, for the case of xenon and thermal-energy neutrons, the cross section value at resonance is found experimentally to be quite close to the theoretical upper bound. As mentioned before, it is the largest known thermal neutron absorption (capture) cross section. Figure 1.2 depicts the behavior of the xenon cross section as found by experiment.

[Figure 1.2: horizontal axis, incident neutron energy in electron volts.]
FIG. 1.2. Total cross section of xenon-135. (After H. M. Sumner [32].)
1.5. Thermal-Reactor Xenon Difficulties
As will be seen from later considerations, the amount of xenon poison concentration at steady-state reactor operation increases with increasing equilibrium power output for low-power reactors. These correspond roughly to the conventional research reactor of up to 1-megawatt power output. For high-power thermal reactors, whose output is reckoned in scores or hundreds of megawatts, the xenon concentration at steady state approaches a limiting value independent of the power. However, when the power output of a high-power thermal reactor is reduced, and especially if this reactor is shut down, the xenon concentration quickly builds up (almost as soon as the shutdown procedure is completed) to such proportions that subsequent control flexibility to increase power or to start up is lost, unless sufficient extra fuel has been incorporated into the reactor fuel charge. That is, the xenon poison concentration is so high after shutdown that it is unfortunately necessary to wait until the xenon decays away naturally in order to restart, a matter of some 30 to 50 hours, depending on the equilibrium flux (power) prior to shutdown. After enough xenon has decayed away, the subsequent neutron population will be able to maintain the equilibrium state (critical reactor), since the tremendous xenon absorption of neutrons is then essentially absent.

On the other hand, if there is sufficient additional fuel to provide enough neutrons to “feed” the xenon at all times, whatever the xenon concentration in the postshutdown period, then there is no difficulty. However, for high-power, or high-flux,† thermal reactors, it will be seen presently that the amount of additional fuel required is at least an order of magnitude greater than that needed if no xenon were present. In fact, this consideration, plus the amount of fuel needed even to partially cope with the burgeoning xenon poison following shutdown, limits the present-day practical maximum design power of large thermal reactors. However, it is the thesis of this work that steps can be taken to alleviate this xenon difficulty through optimal shutdown programs that reduce the postshutdown xenon poison concentration.

To provide a concrete example of how much additional fuel is needed to override the postshutdown xenon concentration, the following development is given. This derivation will dwell only on aspects important to the thread of this work. For more details, references in nuclear reactor physics and engineering texts can be consulted [1-3].

Consider a reactor at equilibrium. Then for such a critical reactor, a parameter $k$, called the multiplication factor, must be equal to unity.

† Flux, another important reactor parameter, can be thought of as the number of neutrons per square centimeter per second. Its knowledge is important in reactor physics and engineering, since multiplying it by the appropriate macroscopic cross section of the element under consideration yields the corresponding number of interactions per cubic centimeter per second.
If $k < 1$, the reactor is subcritical, which means that, for whatever reason (e.g., insertion of control rods), the neutron population is decreasing, resulting in eventual shutdown. If, on the other hand, $k > 1$, then the population is increasing, a situation called supercritical. The fractional change in $k$ from the equilibrium state, called reactivity ($K$) and usually measured in units of $\beta$, which is another reactor parameter, is $K = (k - 1)/\beta k$. One unit of $\beta$ is called one dollar of reactivity, and fractions thereof are expressed in cents of reactivity. For U²³⁵, $\beta = 0.0064$.

It is first important to derive an expression for the negative reactivity corresponding to a given amount of xenon concentration in the reactor. In terms of xenon poisoning, $k$ can be expressed as a ratio of particular cross sections. Thus for a reactor not containing xenon poison, $k$ can be considered to within a constant of proportionality as given [1] by

$$k = \frac{\Sigma_{\text{fuel}}}{\Sigma_{\text{fuel}} + \Sigma_{\text{moderator}}}, \tag{1.1}$$

whereas for the same reactor containing xenon poison the analogous proportion is

$$k' = \frac{\Sigma_{\text{fuel}}}{\Sigma_{\text{fuel}} + \Sigma_{\text{moderator}} + \Sigma_{\text{poison}}}. \tag{1.2}$$

$\Sigma_{\text{fuel}}$ is the total macroscopic absorption cross section of the fuel for incident thermal neutrons. This includes absorption that produces fissions, in turn producing a new generation of neutrons, as well as parasitic absorption, which merely implies neutron loss. $\Sigma_{\text{fuel}} = N_{\text{fuel}}\,\sigma_{\text{fuel}}$, where $\sigma_{\text{fuel}}$ is the total microscopic absorption cross section (see the footnote on cross sections in Section 1.1) and $N_{\text{fuel}}$ is the average number of fuel nuclei per cubic centimeter in the reactor. Correspondingly, $\Sigma_{\text{moderator}}$ is the macroscopic absorption cross section of the moderator† for incident thermal neutrons, while $\Sigma_{\text{poison}}$ is the macroscopic absorption cross section of the xenon poison concentration.

† The fuel core of a thermal nuclear reactor is immersed in a moderator, often water or graphite, which slows (moderates) the fast neutrons, born of fission, down to thermal energies through collisional processes. This must be done so that the resulting thermal neutrons can cause new fissions efficiently (fuel fission cross sections are highest for thermal neutrons), thus maintaining the chain reaction. The moderator often acts as the coolant, in which case it is normally circulated through the reactor out to an external heat exchanger.

As the value of $k$, or $k'$, is normally not too far from unity, an equivalent xenon poison negative reactivity can be defined as

$$K_x = \frac{k' - k}{\beta k'}. \tag{1.3}$$

Then in terms of the previous definitions for $k$ and $k'$,

$$K_x = -\frac{mP}{\beta(1 + m)} \quad \text{(dollars)}, \tag{1.4}$$

where the poisoning factor $P = \Sigma_{\text{poison}}/\Sigma_{\text{fuel}}$ and $m = \Sigma_{\text{fuel}}/\Sigma_{\text{moderator}}$. This is the desired relationship for the reactivity due to the poison. For enriched-fuel reactors, $m \gg 1$, so that the poison reactivity is directly proportional to the poisoning factor; thus $K_x = -P/\beta$ (dollars).

It is an easy step to obtain an expression for the required increase in fuel concentration for a given amount of xenon poison reactivity in dollars. That is, for maintaining a reactor critical after xenon has accumulated, the fuel concentration $N_{\text{fuel}}$ must be increased to the value $N'_{\text{fuel}}$ to compensate for the xenon poison, so that $k = k'$. Therefore,

$$\frac{\Sigma_{\text{fuel}}}{\Sigma_{\text{fuel}} + \Sigma_{\text{moderator}}} = \frac{\Sigma'_{\text{fuel}}}{\Sigma'_{\text{fuel}} + \Sigma_{\text{moderator}} + \Sigma_{\text{poison}}} \tag{1.5}$$

must hold, where $\Sigma'_{\text{fuel}} = N'_{\text{fuel}}\,\sigma_{\text{fuel}}$. Solving (1.5) gives $\Sigma'_{\text{fuel}}/\Sigma_{\text{fuel}} = 1 + mP$, and from (1.4), $mP = \beta(m + 1)|K_x|$. The ratio of poisoned to unpoisoned fuel concentration is then given in terms of the (poison) reactivity by

$$\frac{N'_{\text{fuel}}}{N_{\text{fuel}}} = 1 + \beta(m + 1)\,|K_x|. \tag{1.6}$$

For $\beta = 0.0064$, which corresponds to U²³⁵ fuel, and a typical value of $m \approx 20$ for a water-moderated reactor using highly enriched fuel (almost pure U²³⁵),

$$\frac{N'_{\text{fuel}}}{N_{\text{fuel}}} = 1 + 0.13\,|K_x|. \tag{1.7}$$

As will be seen, the postshutdown xenon concentration in a high-power thermal reactor can climb to a maximum of hundreds of dollars of negative reactivity. Then, from (1.7), it is seen that the fuel concentration required for xenon poison override at will, in the postshutdown phase, can be at least an order of magnitude greater than that called for when no xenon is present.

† See note added in proof, page 20.
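The relations of this section are easy to check numerically. The following sketch is an illustration only: it evaluates the poison reactivity of (1.4) and the fuel-concentration ratio of (1.6), taking the poison reactivity as negative, as the phrase “negative reactivity” implies. The function names and the sample reactivity values are hypothetical; β = 0.0064 and m = 20 are the values quoted in the text.

```python
def poison_reactivity_dollars(poisoning_factor, m, beta=0.0064):
    """Eq. (1.4): K_x = -m P / (beta (1 + m)), in dollars."""
    return -m * poisoning_factor / (beta * (1.0 + m))


def fuel_ratio_for_override(k_x_dollars, m=20.0, beta=0.0064):
    """Eq. (1.6): N'_fuel / N_fuel = 1 + beta (m + 1) |K_x|."""
    return 1.0 + beta * (m + 1.0) * abs(k_x_dollars)


# Hypothetical poison reactivities, in dollars.
for dollars in (-10.0, -100.0, -300.0):
    print(dollars, round(fuel_ratio_for_override(dollars), 2))
```

For 100 dollars of negative reactivity the ratio is 1 + 0.1344 × 100 = 14.44, which is the “order of magnitude” increase in fuel concentration referred to above. The coefficient 0.13 in (1.7) is β(m + 1) = 0.0064 × 21 rounded to two figures.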
1.6. Conventional Approaches to Circumvent Xenon

There exist a number of limited methods by which the xenon problem can be overcome at least in principle, and sometimes in practice. Most of these methods come under the category of broad physical or chemical means. For example, circulating-fuel reactors can be constructed to circumvent the xenon difficulty. This type of reactor, in the research and development phase at present, contains a mixture of fuel and fluid (slurry) which circulates into and out of the reactor proper. Part of the circulation loop is external to the reactor, so that heat can be extracted from the slurry. Then the xenon itself can be extracted from the slurry, external to the reactor, by chemical means. With the advent of the newly discovered xenon compounds, perhaps methods can be found to extract the xenon from within the reactor in situ.

The existence of the reactivity poisoning effect of xenon was one of the important reasons that much early interest was evinced in epithermal reactors. In these reactors, means are used to keep the thermal neutron population negligible, so that the xenon poison concentration would be of no consequence. In this system the neutron population peaks above thermal energies (epithermal), where the xenon cross section is negligible compared to its very large thermal-energy magnitude. However, other difficulties with this type of reactor, especially in early applications to the nuclear-submarine-reactor program, resulted in a curtailment of military interest.

One new wave of the future in nuclear energy seems to be the fast reactor. This reactor is built to function with fast (high-energy) neutrons only, because of the desirability of breeding new fuel, which is done most efficiently with fast neutrons. That is, an operating fast reactor will simultaneously produce fuel by breeding the fertile U²³⁸ isotope into new fissionable Pu²³⁹ fuel, for example. Ultimately, the use of fast reactors will produce sufficient fissionable fuel (Pu²³⁹ from U²³⁸, or U²³³ from Th²³²) from the plentiful world supplies of U²³⁸ and Th²³². Then the availability of cheap fuel, and therefore cheap power, will no longer be a world problem, especially for the developing nations. However, more to the point in the present context, the xenon accumulation will no longer pose a problem because of its negligible cross section for fast neutrons.
Another approach to maintain a modicum of xenon control flexibility is the use of an on-line xenon analog computer. That is, the reactor flux as a function of time is monitored and read into this computer, which then computes and displays the corresponding xenon concentration in real time using the xenon-iodine kinetics differential equations, developed later. The reactor operator can thereby monitor the computed xenon concentration at all times, especially after shutdown. For example, he would then be apprised of when it is most expeditious to restart the reactor following shutdown.
With a view toward calculating the optimal shutdown programs mentioned earlier, the thought might occur to one to set up the xenon-iodine kinetics differential equations on an analog computer, and then attempt a search for an optimal shutdown by trying various flux shutdown functions. However, such a “cut and try” method would result, at most, in obtaining relative optimization while missing the actual class of shutdown programs that yield the correct answer. This will be recognized later, as the optimal shutdown functions will be seen to be essentially piecewise constant, hardly the type of results that conventional analog computers would yield.
1.7. Dynamic Programming
Simultaneous with the post-World War II reactor development program was the development of the mathematical discipline of dynamic programming, which grew out of postwar operations research and analysis needs. This technique, invented by the mathematician R. Bellman, is an interesting variant of multistage decision theory [4]. It has application to complicated control processes in which it is desired to optimize the control so that a predetermined criterion, or cost, functional is extremized. By compartmentalizing the particular problem into a series of coupled subproblems, dynamic programming can render solutions to quite complicated control processes. The mathematical aspects of the coupling are manifest in a novel functional equation. This equation provides a recurrence relation or algorithm by which the problem is solved stepwise, obtaining a value for the control variable at each step. The resulting sequence of control or decision values forms the optimal control policy with respect to extremizing the predetermined criterion functional.
In the limit of an infinite number of stages, each of which is infinitesimally short, the sequence of decisions devolves to the sought-for control function, while the preceding recurrence relation becomes a novel partial differential equation called Bellman’s equation. It resembles an ordinary first-order partial differential equation with the salient difference that the maximum or minimum operation occurs in the midst of the equation, which provides the novel aspect. As will be discussed further, Bellman’s equation is essentially the Hamilton-Jacobi differential equation description of the control process. The Pontryagin maximum principle, also to be discussed later, provides a complementary description of the dynamics of the control process, which corresponds to a formulation in terms of Hamilton’s canonical equations. These descriptions are two modern approaches to control processes, or problems, which come under the aegis of the calculus of variations. Actually, the Pontryagin maximum principle is a generalization of the Weierstrass necessary condition for the existence of an extremal, to include control functions that form a closed set. In our context, this means merely control functions that are constrained. Both dynamic programming and the maximum principle provide new avenues for the solution of complicated control processes containing unwieldy constraints and difficult state-variable behavior, which the classical calculus of variations handles in a strained and artificial manner, at best.
1.8. Principle of Optimality and Two Examples
The essence of the method of dynamic programming is contained in its principle of optimality. First, the problem or process is compartmentalized into stages in a way that is as natural as possible. Then the principle of optimality is applied, from which the functional equation, unique to dynamic programming, springs. The principle of optimality is: An optimal policy has the property that whatever the initial state and initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
This is a statement of principle about optimal policy, which will be understood as that policy which extremizes a predetermined criterion unique to the problem at hand. Although “optimal policy” appears in its own statement of principle, it will be realized from the examples that follow, where an optimal policy is defined, that the optimality principle does not constitute a tautology.
Example 1. An idealized breeder reactor [5]. Consider a hypothetical enriched uranium reactor. This means that the naturally occurring fertile fuel (mostly U238), containing 0.7 per cent of the fissionable U235, has been enriched in the diffusion plants to upward of 90 per cent of U235. The reactor, while consuming U235, breeds Pu239 from the fertile U238 as well. For a given fuel charge (lumping U235 and Pu239 together) containing $x$ kilograms, assume that the reactor breeds a net amount $rx$ kilograms per fuel cycle. At the end of such a cycle, certain fuel elements which have been depleted of fuel through irradiation are rearranged and/or removed and new elements are substituted. Such complications will be neglected in that the fuel will be considered homogeneous, so that any given fraction up to $(r - 1)x$, since $x$ kilograms must remain as a critical mass, can be removed at the end of each fuel cycle. It is also assumed that the reactor is able to accommodate more fuel than the critical mass $x$, if need be, through its control system. The reactor will be operated for $N$ fuel cycles before it is dismantled for overhaul. At the end of each cycle, fuel is removed from the reactor and used according to a utility function $g(y_k)$, where $y_k$ is the amount of fuel removed at the beginning of the $k$th cycle. The problem is to determine the optimal fuel-removal policy, which is the one that maximizes the overall utility for $N$ cycles of operation.
Let $f_N(x)$ be the maximum overall utility obtained from the fuel removed from the breeder reactor after $N$ cycles of operation using an optimal fuel-removal policy. First consider a one-cycle operation, which will be embedded in a two-cycle operation, etc. Assuming that $g(y)$ is an increasing function, then for a single cycle, one (trivially) obtains
$$f_1(x) = \max_{0 \le y_1 \le rx} g(y_1) = g(rx) \qquad (y_1 = rx). \tag{1.8}$$
FIG. 3.2. Reactivity due to xenon poison after complete step-function flux shutdown, for various equilibrium flux levels. (Horizontal axis: postshutdown time, in units of the iodine mean lifetime, 9.58 hours.)
for thermal reactors, $K_s(\infty) \approx -11.44$ dollars, following immediate shutdown to zero flux. This is small compared to the maximum reactivity, following immediate shutdown, of $-150$ dollars due to xenon for the same equilibrium flux (Fig. 3.2).
3.3. Xenon and Samarium after Protracted Shutdown
After the reactor has been shut down for a long time, the xenon and iodine concentrations have diminished to negligible levels, which can be taken as zero in the same sense as that of the shutdown flux as discussed in the footnote on page 39. As seen earlier, the samarium concentration has reached its limiting value, as given by (3.25). If the reactor is now restarted to its previous equilibrium operating power, the xenon concentration will also build up to its previous equilibrium value. This is seen by letting $u = 1$ in (3.12) and (3.13) with zero initial conditions, to obtain
$$\dot{x} = -(w + r_0)\,x + \gamma_1 (w + r_0)\,y + \gamma_2 (w + r_0), \qquad x(0) = 0, \tag{3.26}$$
$$\dot{y} = 1 - y, \qquad y(0) = 0. \tag{3.27}$$
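The integral gives the xenon concentration after startup from “zero” concentration, which is (units of iodine mean life, 9.58 hours)
$$x(t) = \frac{\gamma_1 (w + r_0)}{w + r_0 - 1}\left(1 - e^{-t}\right) + \frac{\gamma_2 (w + r_0) - 1}{w + r_0 - 1}\left\{1 - \exp\left[-(w + r_0)t\right]\right\}. \tag{3.28}$$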
This is sketched in Fig. 3.3, where it is combined with the appropriate xenon concentration shutdown curve taken from Fig. 3.2.
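As a hedged numerical check of the startup behavior, the sketch below integrates the startup equations, read here as $\dot{x} = -(w + r_0)x + \gamma_1(w + r_0)y + \gamma_2(w + r_0)$ and $\dot{y} = 1 - y$ with zero initial conditions, and compares the result with the closed-form integral. Several things are assumptions made for the illustration rather than statements from the text: the normalization $\gamma_1 + \gamma_2 = 1$, the parameter values `w = 0.72`, `r0 = 3.0`, `gamma1 = 0.95`, and both function names.

```python
import math


def xenon_startup_closed_form(t, w, r0, gamma1):
    """Closed-form x(t) after startup, assuming gamma1 + gamma2 = 1 and w + r0 != 1."""
    a = w + r0
    gamma2 = 1.0 - gamma1
    iodine_term = (gamma1 * a / (a - 1.0)) * (1.0 - math.exp(-t))
    xenon_term = ((gamma2 * a - 1.0) / (a - 1.0)) * (1.0 - math.exp(-a * t))
    return iodine_term + xenon_term


def xenon_startup_numeric(t_end, w, r0, gamma1, steps=2000):
    """Classical RK4 integration of the startup equations from x(0) = y(0) = 0."""
    a = w + r0
    gamma2 = 1.0 - gamma1

    def rhs(x, y):
        return (-a * x + gamma1 * a * y + gamma2 * a, 1.0 - y)

    h = t_end / steps
    x = y = 0.0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x


# Illustrative parameters only; time is in units of the iodine mean life.
w, r0, gamma1 = 0.72, 3.0, 0.95
for t in (0.5, 1.0, 2.0, 5.0):
    exact = xenon_startup_closed_form(t, w, r0, gamma1)
    approx = xenon_startup_numeric(t, w, r0, gamma1)
    print(f"t = {t:3.1f}: closed form = {exact:.6f}, RK4 = {approx:.6f}")
```

Under these assumptions the two columns agree closely, and the concentration approaches the normalized equilibrium value of unity for large $t$.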
FIG. 3.3. Xenon and samarium concentration buildup after long shutdown. (Concentration versus time: immediate shutdown, many days shut down, then immediate startup; the samarium curve starts from $1 + m_0$.)
Similarly for samarium, except that since it is a stable isotope, its asymptotic concentration prior to restarting the reactor after a long time is obtained by integrating (3.14) and (3.15) with $s(0) = p(0) = 1$ as initial conditions and $u = 0$. This gives (units of promethium mean life, 77.2 hours)
$$s(t) = 1 + m_0\left(1 - e^{-t}\right), \qquad p = e^{-t}, \tag{3.29}$$
so that $s_{\mathrm{asym}} = 1 + m_0$, and $p_{\mathrm{asym}} = 0$. Upon restarting after a long time, $u = 1$ and, from (3.14) and (3.15),
$$\dot{s} = m_0 (p - s), \qquad s(0) = s_{\mathrm{asym}}, \tag{3.30}$$
$$\dot{p} = 1 - p, \qquad p(0) = p_{\mathrm{asym}}, \tag{3.31}$$
giving
$$s(t) = 1 + \frac{m_0}{1 - m_0}\left[e^{-t} - m_0 \exp(-m_0 t)\right]. \tag{3.32}$$
This is sketched in Fig. 3.3 as well.
3.4. Xenon Minimum and Minimax Problem Statements
With the background of the previous discussions in the areas of nuclear-fission-product poison kinetics and dynamic programming, the optimal xenon shutdown class of problems is here stated in a literal fashion. Their mathematical restatement will come later.
The xenon concentration quickly rises, following an abrupt flux shutdown from equilibrium conditions to very low, or “zero,” power, possessing a maximum of considerable magnitude, especially for high equilibrium power (flux) thermal reactors. The corresponding xenon poison maximum reactivities are measured in hundreds of dollars. As discussed in Section 1.5, this necessitates additional fuel loading of one and possibly two orders of magnitude greater than that required for equilibrium xenon operation, to allow for override of the influence of the xenon poison at will in the postshutdown phase. If such large amounts of fuel are not available in addition to the normal fuel charge, the postshutdown control flexibility is severely curtailed, because of the ease with which the xenon poison reactivity can overcome the available reactor positive reactivity. If this happens, the xenon poison will keep the reactor in a shutdown state for a protracted length of time, some 30 to 60 hours, until the xenon decays away naturally, as discussed earlier. For example, in the Materials Testing Reactor (MTR) at the National Reactor Testing Station, Arco, Idaho, which is a large high-power thermal reactor, only enough additional fuel is available to override xenon for the first 30 minutes, following an immediate (abrupt) flux shutdown.
To improve the postshutdown control flexibility where reasonable, but still limited, amounts of additional fuel for xenon override are available, perhaps optimal flux shutdown programs can be found, in the sense of the following problems.
Problem (a). With certain constraints on the flux magnitude, and on the allowable xenon concentration given in Section 3.5, and for a given allowable time $T$ in which to shut the reactor down, what is the flux shutdown program that minimizes the xenon maximum wherever it occurs in the postshutdown period (later than $T$)? This is the xenon minimax problem. The time at which the reactor is restarted in the postshutdown period is assumed irrelevant, but is considered in the formulation of the second problem.
Problem (b). For like constraints on the flux magnitude and the xenon concentration, and with an allowable shutdown time $T$, what is the flux shutdown program that minimizes the xenon concentration at a given time $T_0 > T$ in the postshutdown epoch? $T_0$ is, of course, the time at which it is desired to restart the reactor. This is the xenon minimization problem.
It is seen that (a) is a special case of (b) where $T_0$ coincides with the occurrence of the xenon peak.
3.5. Constraints
As will be recognized, the problems just formulated are couched in the language of the calculus of variations. In this sense, it is well known that the type of constraints imposed are the principal influence on the behavior of the optimal flux control programs. Further, the constraints often spell the difference between whether or not the problem is tractable, using the classical variational calculus. This is because of the severe limitations imposed on the smoothness of the state and control variables by the constraints. Such smoothness, or continuity, is a major requirement for solutions of these kinds of control processes, using the classical approach.
On the other hand, the computational algorithm obtained from the dynamic programming formulation, as employed on large-scale high-speed digital computers, actually thrives on constraints. This is due to the fact that the constraints actually limit the “search space” of the problem, as this algorithm can be construed as the instrument of a dynamic programming search-theoretic method to obtain optimal control policies. That is, the amount of fast (core) memory, and therefore the computation time, can be reduced greatly by the limits imposed on the state variable space by the constraints. The principal constraints to be considered are those on (1) flux magnitude, (2) inverse period, and (3) xenon concentration. These are considered in turn.
(1) Flux magnitude. The constraints on the magnitude of the normalized flux, $u$, are that it possesses a minimum of zero (see footnote, p. 39) and a maximum $M$. Usually the maximum corresponds to equilibrium operating power. However, as will be seen later, because of the xenon constraint, there are two modes of obtaining a given optimal-flux shutdown-control policy. One is to conform strictly to the flux constraint maximum, resulting in one form of optimal flux shutdown. The second is to ignore the flux maximum constraint temporarily, so that the system will proceed to a given flux magnitude somewhat higher than equilibrium operating power over a portion of the optimal shutdown program, and then fall off with a negative exponential behavior back to equilibrium power, as will be seen. From the mathematical point of view, it is shown [9] that the flux magnitude must be constrained at least over part of the optimal shutdown program, resulting in a bang-bang form of control; otherwise problems (a) and (b) of Section 3.4 are not properly posed.
(2) Inverse period. In an operating reactor, the logarithmic time derivative of the flux, called the inverse period in nuclear-reactor parlance, is also constrained. It has a positive upper bound given by the particular reactor-plant regulations governing safe standard operating procedure. The inverse period upper bound must ensure that the reactor does not vary temporally on what is termed a dangerously “fast period.”† For example, if the reactor were proceeding from a startup state on a positive inverse period of large magnitude, it could pass through its desired equilibrium state (critical) too quickly, becoming supercritical‡ and possibly causing an accident before safety mechanisms and/or the human operator could intervene. A lower (negative) bound on the inverse period is normally provided in an operating reactor by the inertia of the control rods and associated mechanisms; occurring as well is the phenomenon of delayed neutrons, which also increase the sluggishness of a decreasing (subcritical) neutron population.
† The term period means the inverse of the logarithmic time derivative of the flux; inverse period means the direct logarithmic time derivative of the flux.
(3) Xenon concentration. The most important fact, and the practical raison d'être of the search for optimal xenon shutdown programs, is that there will probably not be a complete xenon override capability in the reactor, i.e., the ability to restart at will following abrupt flux shutdown. To account for this, there must be a constraint on the allowable xenon concentration. If the reactor possesses complete xenon override capability, optimal shutdown is trivial. In other words, the reactor can be restarted at will following shutdown, so that no optimal shutdown programs are necessary unless it is desired to minimize the postshutdown xenon concentration for some other reason besides attempting to acquire better postshutdown control flexibility. The constraint on the xenon concentration corresponds to the given amount of positive reactivity available for partial xenon override, which is provided by additional fuel, as discussed earlier.
In the discussion to follow, optimal flux shutdown programs for problems (a) and (b) will be obtained using constraints (1) and (3) only, that is, constraints on the flux and xenon concentration magnitudes only. Solutions, specifically to include constraint (2) on the inverse period as well, are beyond the scope of this work. This will be discussed with regard to the amount and type of future elaboration of optimal xenon shutdown control. The approach here is that the flux and xenon constraints delineate an idealization of reality sufficient for present needs. The corresponding shutdown programs obtained will be suboptimal because they do not take account of inverse period constraints that exist in an actual operating reactor. The idealization turns out to be quite a good approximation to the actual situation because of the time scales involved. The allowable shutdown duration $T$, time to xenon maximum, postshutdown epoch, etc., are measured in hours, while the allowable inverse period is measured in minutes, so that flux-level changes can be approximated in a discontinuous manner by flux steps or pulses required by the optimal shutdown programs (as derived later).
‡ Supercritical implies that the reactor multiplication factor $k > 1$, which results in an increasing neutron population that could herald a reactor accident; cf. Section 1.5.
3.6. Dynamic Programming. Absolute Value and Minimax Criteria
One of the optimal xenon control class of problems, which is the central control process investigated in this book, is to find the shutdown control program to minimize the maximum postshutdown xenon concentration at whatever time it occurs. Before discussing the formulation of that problem, it is important to study some simple illustrations using a “minimax” criterion functional. Such problems are quite difficult to handle in a general way using the classical calculus of variations. However, at the very least using dynamic programming, a computational algorithm can be derived in a straightforward manner from which solutions to such problems can be found. This also holds true for the less complicated criterion of minimizing the absolute value of a state variable. For example, what is the optimal control $u$ that yields the minimum of the absolute value of the final state of the system, i.e., the terminal control criterion,
$$|x_N| = \min, \tag{3.33}$$
where the equation of state is
$$x_{n+1} = a x_n + u_n, \qquad x_0 = c, \tag{3.34}$$
and the control is constrained in that $0 \le u_n \le M$? Let
$$f_N(c) = \min_{u} |x_N|. \tag{3.35}$$
Then at the final state, since zero stages remain to control, trivially
$$f_0(c) = |c|. \tag{3.36}$$
For one stage remaining,
$$f_1(c) = \min_{0 \le u_1 \le M} f_0(ac + u_1) = \min\bigl(f_0(ac),\, f_0(ac + M)\bigr) \tag{3.37}$$
when $u_1$ is zero or $M$ respectively, since $f_0$ is linear in $u_1$. Then $f_1(c) = |ac|$. In general, for $j$ stages remaining to control,
$$f_j(c) = \min_{0 \le u_j \le M} f_{j-1}(ac + u_j) \qquad (j = 1, 2, \ldots, N). \tag{3.38}$$
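The recurrence for the terminal absolute-value criterion can be sketched computationally as follows. This is a minimal illustration, not the algorithm used later in the book: it assumes the state equation $x_{n+1} = a x_n + u_n$ with $0 \le u_n \le M$, starts from $f_0(c) = |c|$, and replaces the minimization over $u$ by a brute-force search on a uniform grid. A grid is used rather than an endpoint comparison so that the case in which $ac + u$ can be driven to zero inside the interval is covered as well. The function name and the numerical values are hypothetical.

```python
def terminal_abs_value(c, a, M, stages, grid=11):
    """f_j(c) = min over 0 <= u <= M of f_{j-1}(a*c + u), with f_0(c) = |c|.

    The minimum over u is approximated on `grid` equally spaced points,
    so the cost grows as grid**stages; suitable for small examples only.
    """
    if stages == 0:
        return abs(c)
    best = float("inf")
    for i in range(grid):
        u = M * i / (grid - 1)
        best = min(best, terminal_abs_value(a * c + u, a, M, stages - 1, grid))
    return best


# Illustrative values: a = 0.5, M = 1.
print(terminal_abs_value(2.0, 0.5, 1.0, 3))   # positive start: u = 0 at every stage
print(terminal_abs_value(-2.0, 0.5, 1.0, 2))  # negative start: control can cancel the state
```

With a positive initial state the best policy in this example is to apply no control, giving $|a^N c|$; with a negative initial state the control can be used to bring the state to zero.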
Now consider a minimax criterion problem: to find the $\{u_j\}$ sequence, also constrained as above, that yields
$$\min_{u_j} \max_{j} |x_j(u_j)| \qquad (j = 1, 2, \ldots, N). \tag{3.39}$$
That is, find the optimal control policy $u$ that minimizes the maximum $|x|$ in whichever interval $j$ it happens to occur, using state equation (3.34). Let
$$f_N(c) = \min_{u_j} \max_{j = 1, 2, \ldots, N} |x_j|. \tag{3.40}$$
Again, trivially,
$$f_0(c) = |c|, \tag{3.41}$$
since there are no further decisions to make. Now
$$f_1(c) = \min_{u_j} \max_{j = 0, 1} |x_j| \tag{3.42}$$
or
$$f_1(c) = \min_{u_1} \max\bigl(|c|,\, |ac + u_1|\bigr), \tag{3.43}$$
but
$$f_1(c) = \min_{0 \le u_1 \le M} \max\bigl(|c|,\, f_0(ac + u_1)\bigr), \tag{3.44}$$
so that with $j$ stages remaining to control,
$$f_j(c) = \min_{0 \le u_j \le M} \max\bigl(|c|,\, f_{j-1}(ac + u_j)\bigr) \qquad (j = 1, 2, \ldots, N). \tag{3.45}$$
It is seen that an equivalent way of writing (3.45) is
$$f_j(c) = \max\Bigl(|c|,\, \min_{0 \le u_j \le M} f_{j-1}(ac + u_j)\Bigr) \qquad (j = 1, 2, \ldots, N). \tag{3.46}$$
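The equivalence of the two minimax recurrences can be checked numerically. The sketch below assumes the state equation $x_{n+1} = a x_n + u_n$ with $0 \le u_n \le M$ and $f_0(c) = |c|$, and again approximates the minimization over $u$ by a grid search; the two function names and the parameter values are hypothetical. The first function places the maximum inside the minimum, as in (3.45); the second pulls $|c|$ outside, as in (3.46).

```python
def minimax_inner(c, a, M, j, grid=11):
    """f_j(c) = min_u max(|c|, f_{j-1}(a*c + u)), with f_0(c) = |c|."""
    if j == 0:
        return abs(c)
    return min(
        max(abs(c), minimax_inner(a * c + M * i / (grid - 1), a, M, j - 1, grid))
        for i in range(grid)
    )


def minimax_outer(c, a, M, j, grid=11):
    """f_j(c) = max(|c|, min_u f_{j-1}(a*c + u)), with f_0(c) = |c|."""
    if j == 0:
        return abs(c)
    return max(
        abs(c),
        min(minimax_outer(a * c + M * i / (grid - 1), a, M, j - 1, grid)
            for i in range(grid)),
    )


# Illustrative values: a = 1.5, M = 1, three stages.
for c in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(c, minimax_inner(c, 1.5, 1.0, 3), minimax_outer(c, 1.5, 1.0, 3))
```

The two columns coincide because $|c|$ does not depend on $u_j$, so taking the maximum with $|c|$ is a nondecreasing operation that commutes with the minimization.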
As a second example, examine the following two-dimensional system [6], which achieves a similar solution, but in a more deductive manner. It is desired to find the optimal control sequence $\{u_j\}$ that yields
$$\min_{u_j} \max_{j = 1, 2, \ldots, N} |x_j|, \tag{3.47}$$
where the system state is governed by the pair of difference equations
$$x_{r+1} = x_r + y_r \Delta, \quad x_0 = c_1; \qquad y_{r+1} = y_r + g(x_r, y_r)\Delta, \quad y_0 = c_2. \tag{3.48}$$
It is important to appreciate the following seemingly simple idea. For a sequence of $k$ numbers $\{N_k\}$, any chosen number $N_i$ obeys the partition relationship
$$\max(N_1, N_2, \ldots, N_k) = \max\bigl(N_i,\, \max(N_1, N_2, \ldots, N_{i-1}, N_{i+1}, \ldots, N_k)\bigr). \tag{3.49}$$
Now let
$$f_N(c_1, c_2) = \min_{u_j} \max_{j = 1, 2, \ldots, N} |x_j|. \tag{3.50}$$
Using the partition relationship above, this can be rewritten
$$f_N(c_1, c_2) = \min_{u_j} \max\Bigl(|x_N|,\, \max_{j = 1, 2, \ldots, N-1} |x_j|\Bigr). \tag{3.51}$$
As the minimum is over the set $u_j$, i.e., the minimum operation is on the index $j$ only, it can be placed inside the parentheses to give
$$f_N(c_1, c_2) = \max\Bigl(|x_N|,\, \min_{u_j} \max_{j = 1, 2, \ldots, N-1} |x_j|\Bigr) \tag{3.52}$$
or
$$f_N(c_1, c_2) = \max\bigl(|x_N|,\, f_{N-1}(c_1 + c_2\Delta,\; c_2 + g(c_1, c_2)\Delta)\bigr), \tag{3.53}$$
since with one less $j$ choice to make for $u_j$, the system now finds itself confronted with a similar minimax problem but now advanced from the old state $(c_1, c_2)$ to the new state $(c_1 + c_2\Delta,\; c_2 + g(c_1, c_2)\Delta)$ as given by the difference equations (3.48). Note that this method, if applied to the previous one-dimensional problem, will yield (3.46) as one of its solutions.
As an example of a simplified minimax criterion functional, consider the old chestnut of finding a bogus coin from a seemingly identical pile of coins, using an ungraduated balance, where it is known that the bogus coin is heavier (or lighter) than the others [10]. That is, find the minimum number of weighings required to guarantee finding the bogus coin from among $N$ coins. Let $f(N)$ be the minimum number of weighings required, for $N$ coins, to guarantee finding the bogus coin using an optimal weighing policy. The principle of optimality asserts that
$$f(N) = 1 + \min_{k} \max\bigl(f(k),\, f(N - 2k)\bigr). \tag{3.54}$$
It is obvious that the $N$ coins should be divided into three equal stacks, as nearly as possible. Then two stacks each contain $k$ coins, while the third contains $N - 2k$ coins, yielding $k = [N/3]$; $[n]$ is read as the nearest integer contained in $n$. The two equal stacks containing $k$ coins each are weighed. This accounts for the first term (unity) on the right side of (3.54). If the scale becomes unbalanced, the search is immediately narrowed to $k$ coins. That case is fortuitous, because if the scale balances, the search narrows to the possibly larger stack of $N - 2k$ coins, since $N - 2k \ge N/3$ is possible. Then, at worst, the search narrows to the stack that has one more coin than the other two. Thence, $k = [N/3]$, and
$$f(N) = 1 + \min_{k = [N/3]} \max\bigl(f(k),\, f(N - 2k)\bigr). \tag{3.55}$$
Since the minimum operator commutes with the maximum operator in the above equation, it becomes
$$f(N) = 1 + \max\bigl(f([N/3]),\, f(N - 2[N/3])\bigr). \tag{3.56}$$
This implies that the minimum number of weighings, $f(N)$, equals the first weighing plus the subsequent minimum number of weighings, $f([N/3])$ or $f(N - 2[N/3])$, whichever is the larger, to guarantee finding the bogus coin. To solve (3.56), it is convenient first to let $N = 3m$ to obtain
$$f(3m) = 1 + \max\bigl(f(m), f(m)\bigr) = 1 + f(m). \tag{3.57}$$
Second, let $N = 3m + 1$ in (3.56), to obtain
$$f(3m + 1) = 1 + \max\bigl(f(m), f(m + 1)\bigr) = 1 + f(m + 1). \tag{3.58}$$
Third, let $N = 3m + 2$ in (3.56), to obtain
$$f(3m + 2) = 1 + \max\bigl(f(m + 1), f(m)\bigr) = 1 + f(m + 1). \tag{3.59}$$
Using these recurrence relations, with $m = 1, 2, \ldots$, the nontrivial solutions are easily obtained,
$$f(N) = M, \quad \text{where} \quad 3^{M-1} < N \le 3^{M}. \tag{3.60}$$
For example, six weighings are sufficient to guarantee finding a bogus coin (known to be lighter or heavier) from among collections ranging from 243 to 729 coins.
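The coin-weighing recurrence lends itself to a direct tabulation. The sketch below is an illustration under stated assumptions: the bogus coin is known to be heavier, each weighing places $k$ coins on each pan, and $f(0) = f(1) = 0$ is taken as the starting condition (none of these boundary values is spelled out in the text). The minimization runs over every admissible $k$ rather than only $k = [N/3]$, so it also tests the claim that the three-way split is optimal. The function names are hypothetical.

```python
def min_weighings_table(n_max):
    """Tabulate f(N) = 1 + min_k max(f(k), f(N - 2k)) for N = 0..n_max."""
    f = [0] * (n_max + 1)  # f[0] = f[1] = 0: nothing left to weigh
    for n in range(2, n_max + 1):
        f[n] = 1 + min(max(f[k], f[n - 2 * k]) for k in range(1, n // 2 + 1))
    return f


def closed_form(n):
    """Smallest M with 3**M >= n, i.e. 3**(M-1) < n <= 3**M."""
    m = 0
    while 3 ** m < n:
        m += 1
    return m


f = min_weighings_table(729)
print(all(f[n] == closed_form(n) for n in range(1, 730)))  # prints: True
print(f[243], f[244], f[729])                              # prints: 5 6 6
```

Under this model the tabulated values agree with the closed form for every $N$ up to 729; a pile of exactly 243 coins needs only five weighings, so six are sufficient there but not all required.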
CHAPTER 4
The Maximum Principle
4.1. Introduction
As alluded to in the earlier discussion, the modern theory of optimal control has seen the emergence in the last decade of two complementary methodologies for treating control processes. These are Bellman’s optimality principle of dynamic programming and Pontryagin’s maximum principle. Both of these possess advantages and disadvantages with respect to the formulation and solution of particular control problems. The central problem herein, optimal xenon shutdown control, will be mathematically formulated later, using both principles. The difficulties will be pointed out, and it will be seen that for this class of problems, especially from the computational viewpoint, dynamic programming is the more straightforward and hence the more efficacious method.
The aim of this chapter is to introduce the maximum principle. It will be stated but not proved, and illustrative examples and ramifications will be presented. Proof of the maximum principle can be found in many places in the control-theory literature of today [10, 11].
Consider the system described by the set of ordinary differential equations (2.11), rewritten here in expanded form
$$\begin{aligned}
\dot{x}_1 &= f_1(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; t), & x_1(0) &= x_{10},\\
\dot{x}_2 &= f_2(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; t), & x_2(0) &= x_{20},\\
&\;\;\vdots & &\\
\dot{x}_n &= f_n(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; t), & x_n(0) &= x_{n0},
\end{aligned} \tag{4.1}$$
where the state of the system is described by the vector $(x_1, x_2, \ldots, x_n)$, and the control vector is $(u_1, u_2, \ldots, u_n)$. The number of components in the state vector $\bar{x}$ and the control vector $\bar{u}$ need not be the same. Assume that it is desired to find the optimal control policy, i.e., the vector $\bar{u} = (u_1, u_2, \ldots, u_n)$ which will transform the system described by equations (4.1) from its initial state $(x_{10}, x_{20}, \ldots, x_{n0})$ to a final state in duration $T$ such that the criterion or cost functional,
$$J[\bar{u}] = \int_0^T f_{n+1}(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; t)\,dt, \tag{4.2}$$
is minimized. If it turns out that a particular criterion is to be maximized, one merely minimizes the negative in this context. Some typical examples of cost functionals are: (1) minimum time control, $f_{n+1} = 1$, so that trivially $J = T$; (2) minimum mean-squared control, $f_{n+1} = |\bar{u}|^2$; then $J = \int_0^T |\bar{u}|^2\,dt$; (3) minimum mean-squared deviation, $f_{n+1} = |\bar{x}|^2$; then $J = \int_0^T |\bar{x}|^2\,dt$; or various combinations of these.
Now define a new additional state variable
$$x_{n+1} = \int_0^t f_{n+1}(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; \tau)\,d\tau \tag{4.3}$$
with its differential equation
$$\dot{x}_{n+1} = f_{n+1}(x_1, x_2, \ldots, x_n;\; u_1, u_2, \ldots, u_n;\; t) \tag{4.4}$$
and initial condition $x_{n+1}(0) = 0$ and final condition $x_{n+1}(T) = J$. $x_{n+1}$ is an additional state variable which should be considered as added to the system described by equations (4.1).
As can be appreciated, the above general optimal control process is written in the language of the calculus of variations. With the integral constraint on the motion of the system, Eq. (4.2), such a formulation is called a Lagrange problem. Defining an additional variable $x_{n+1}$ as above, so that now the motion of the system, augmented by the additional variable, is controlled, in order that $x_{n+1}(T) = \min$ is obtained, converts it to a Mayer problem. That is, the augmented system is transformed to a final state where one particular variable, $x_{n+1}$, has its extremum, while the others acquire certain fixed values. This is done because the maximum principle is couched in terms of a Mayer formulation.
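Now define a Hamiltonian $H$, by letting
$$H = p_1 f_1 + p_2 f_2 + \cdots + p_{n+1} f_{n+1}, \tag{4.5}$$
where a set of adjoint (dual, auxiliary) variables $p_1, p_2, \ldots, p_{n+1}$ are defined in that they satisfy the following system of equations, adjoint to the system given by (4.1):
$$\dot{p}_i = -\left(p_1 \frac{\partial f_1}{\partial x_i} + p_2 \frac{\partial f_2}{\partial x_i} + \cdots + p_{n+1} \frac{\partial f_{n+1}}{\partial x_i}\right), \qquad p_i(T) = 0 \quad (i = 1, 2, \ldots, n), \qquad p_{n+1}(T) = -1. \tag{4.6}$$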
From the definition of the Hamiltonian, it is seen that the state vector $\bar{x}$, and its dual $\bar{p}$ (generalized momentum of classical mechanics), satisfy the following canonical equations of Hamilton; i.e.,
$$\dot{x}_i = \frac{\partial H}{\partial p_i}, \qquad \dot{p}_i = -\frac{\partial H}{\partial x_i} \qquad (i = 1, 2, \ldots, n + 1). \tag{4.7}$$
$x_{n+1}$ and $p_{n+1}$ are seen to be “special” variables, since the definition of $x_{n+1}$ from (4.3) implies
$$\dot{p}_{n+1} = -\frac{\partial H}{\partial x_{n+1}} = 0, \tag{4.8}$$
because the Hamiltonian does not depend on $x_{n+1}$ explicitly. In classical mechanics, it is said that the Hamiltonian is cyclic in the coordinate $x_{n+1}$, so that the corresponding generalized momentum $p_{n+1}$ is a constant of the motion. Then, for mathematical convenience, $p_{n+1} = -1$ is chosen, which is the reason for the final condition on $p_{n+1}$ in (4.6).
With regard to the control vector $\bar{u}$, the maximum principle requires that it belongs to a closed set of functions. As mentioned, the maximum principle is an extension of the Weierstrass necessary condition for the existence of an extremal, in the variational calculus, to closed sets of control functions. In our terms, this merely implies that the control vector is constrained, so that for each of its components, $a_i \le u_i \le b_i$, $i = 1, \ldots, n$. The proof of the maximum principle requires closed sets of $\bar{u}$ vectors; otherwise the preceding statement of the control problem is meaningless. This often has a physical analogue as well, since an unconstrained control vector implies the availability of infinite amounts of energy, momentum, thrust, etc., which is physically impossible.
The maximum principle, briefly stated, is that the sought-for optimal control $\bar{u}^*$, which is the control vector that guides the system satisfying equations (4.1) so that $J$ in (4.2) is minimized, is also that control vector which maximizes the Hamiltonian defined in (4.5). That is,
$$\max_{\bar{u}} H\bigl(\bar{p}(t), \bar{x}(t), \bar{u}(t)\bigr) = H\bigl(\bar{p}(t), \bar{x}(t), \bar{u}^*(t)\bigr) = \text{constant}, \tag{4.9}$$
where the adjoint vector $\bar{p}(t)$ satisfies equations (4.6) and $p_{n+1} < 0$. In other words, considering the Hamiltonian as a functional of $\bar{u}$, the optimal control $\bar{u}^*$ is the vector that maximizes $H$, which maximum is a constant of the motion. $H(\bar{p}, \bar{x}, \bar{u}^*)$ is a constant throughout the control duration $T$, and further $H(\bar{p}, \bar{x}, \bar{u}^*) = 0$, if $T$ is not fixed a priori. The maximum principle is a necessary, but not sufficient, condition. However, for most control processes, physical intuition will provide sufficiency arguments as well for the calculated optimal $\bar{u}^*$.
With respect to the $\bar{u}^*$ that gives the sought-for $H(\bar{p}, \bar{x}, \bar{u}^*) = \max$, the optimal control behavior falls into two general categories. The first is that if the system equations (4.1) (usually linear) are such that the resulting Hamiltonian is linear in the $u_i$, then with the constraints $a_i \le u_i \le b_i$, $i = 1, 2, \ldots, n$, the optimal control lies on the boundary of its closed set. That is, all the $u_i$ will equal either $a_i$ or $b_i$ over all or part of the control duration $T$. If the preceding is true only over portions of the control duration, then the $u_i$ will switch back and forth between $a_i$ and $b_i$ during the control phase, so that $H(\bar{p}, \bar{x}, \bar{u}^*) = \max$ will be maintained throughout the control duration. This type of control is called bang-bang, for obvious reasons. The second category is the case where $H(\bar{p}, \bar{x}, \bar{u})$ is nonlinear in $\bar{u}$. Then the $u_i$ are obtained from $\partial H/\partial u_i = 0$, which generally results in a continuous function of time for the optimal control. Both of these types will be illustrated in the following examples.
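The first category can be made explicit with a short worked case. As an illustrative assumption (not taken from the text), suppose a single scalar control $u$, constrained by $a \le u \le b$, enters the Hamiltonian linearly through a hypothetical switching function $\sigma(t)$ that collects the adjoint variables and the coefficients of $u$ in the system equations, with $h_0(t)$ denoting the remaining terms that do not involve $u$:

```latex
H = \sigma(t)\,u + h_0(t), \qquad a \le u \le b
\quad\Longrightarrow\quad
u^*(t) =
\begin{cases}
b, & \sigma(t) > 0,\\[2pt]
a, & \sigma(t) < 0.
\end{cases}
```

Since a linear function of $u$ on a closed interval attains its maximum at an endpoint, the maximizing control jumps between $a$ and $b$ whenever $\sigma(t)$ changes sign. If $\sigma(t)$ vanishes over a finite interval, this argument alone does not determine $u^*$ there.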
The Maximum Principle
4.2. Two Examples
Example 1. The first example is one-dimensional (first-order) and is quite similar to the one in Section 1.8 (which was solved by dynamic programming). The equation of state is

$$\dot{n} = un, \qquad n(0) = 1. \tag{4.10}$$

The cost functional is $J = \beta \int_0^T u^2\,dt$. Following the scheme of the maximum principle, define the additional variable

$$y(t) = \beta \int_0^t u^2\,dt' \tag{4.11}$$

with the corresponding additional equation of state

$$\dot{y} = \beta u^2, \qquad y(0) = 0, \tag{4.12}$$

thereby converting the Lagrange problem of minimizing $\beta \int_0^T u^2\,dt$ to the Mayer problem of seeking $y(T) = \min$. The problem now is one of finding the optimal control $u$ that transfers the system from its initial state $n(0) = 1$ to a prescribed final state $n_T$, in control duration $T$, such that $y(T)$ is a minimum. The Hamiltonian is simply

$$H = p_1 u n + p_2 \beta u^2, \tag{4.13}$$

and the adjoint variables satisfy

$$\dot{p}_1 = -\frac{\partial H}{\partial n} = -u p_1, \qquad \dot{p}_2 = -\frac{\partial H}{\partial y} = 0. \tag{4.14}$$
Since the Hamiltonian is nonlinear in $u$, $\partial H/\partial u = 0$, together with $p_2 = -1$, yields

$$u^* = n p_1 / 2\beta. \tag{4.15}$$

Then to obtain $u^*$ explicitly, it is necessary to solve the following two-point boundary-value problem:

$$\dot{n} = un, \qquad n(0) = 1, \tag{4.16a}$$

$$\dot{y} = \beta u^2, \qquad y(0) = 0, \tag{4.16b}$$

$$\dot{p}_1 = -u p_1, \qquad p_1(T) = 0, \tag{4.16c}$$

$$\dot{p}_2 = 0, \qquad p_2(T) = -1. \tag{4.16d}$$
That is, both $n$ and $p_1$ must be obtained from solutions of equations (4.16). In this case it is easy, since multiplication of (4.16c) by $n$ yields

$$n\dot{p}_1 + p_1 u n = 0. \tag{4.17}$$

Using equation (4.16a), this is equivalent to

$$\frac{d}{dt}(n p_1) = 0 \qquad \text{or} \qquad n p_1 = c_1 \ \text{(constant)}. \tag{4.18}$$

This implies that $u^* = c_1/2\beta$, a constant of the motion. Integrating equation (4.16a) yields

$$n = \exp(c_1 t/2\beta). \tag{4.19}$$

In terms of the final state $n_T$, $c_1 = (2\beta/T) \ln n_T$, which follows from (4.19). In turn, the optimal control is

$$u^* = \frac{1}{T} \ln n_T \tag{4.20}$$

and the corresponding state behavior is

$$n = (n_T)^{t/T}. \tag{4.21}$$
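The result of Example 1 can be checked numerically with a minimal sketch. It assumes the reading $\dot{n} = un$, $n(0) = 1$, cost $\beta \int_0^T u^2\,dt$, and compares the constant control $u^* = (1/T) \ln n_T$ with one other control that reaches the same final state. The function names, the comparison control, and the numbers $n_T = 3$, $T = 2$, $\beta = 1$ are illustrative assumptions, not values from the text.

```python
import math

def optimal_control(n_T, T):
    """Constant control u* = (1/T) ln n_T."""
    return math.log(n_T) / T

def final_state(u, T, steps=2000):
    """n(T) for dn/dt = u(t) n, n(0) = 1, using n(T) = exp(integral of u dt).

    The integral is evaluated with the midpoint rule.
    """
    dt = T / steps
    integral = sum(u((k + 0.5) * dt) for k in range(steps)) * dt
    return math.exp(integral)

def cost(u, T, beta=1.0, steps=2000):
    """J = beta * integral of u(t)^2 dt over [0, T], midpoint rule."""
    dt = T / steps
    return beta * sum(u((k + 0.5) * dt) ** 2 for k in range(steps)) * dt

n_T, T = 3.0, 2.0
u_star = optimal_control(n_T, T)

def constant(t):
    return u_star

def wiggly(t):
    # Same time integral as the constant control, hence the same final state,
    # because the cosine term averages to zero over [0, T].
    return u_star * (1.0 + 0.5 * math.cos(2.0 * math.pi * t / T))

print(final_state(constant, T), final_state(wiggly, T))  # both close to n_T
print(cost(constant, T), cost(wiggly, T))                # constant control is cheaper
```

Both controls transfer the system to $n_T$, but the non-constant one pays a larger quadratic cost, in line with the constancy of $u^*$ found analytically.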
If the final state is greater than the initial state, i.e., $n_T > 1$, then $u^*$ is positive, so that the system increases exponentially to its final state, reaching there in time $T$. On the other hand, if $n_T < 1$, then $u^*$ :
[Figure 8.2: xenon poison reactivity and flux shutdown policy plotted against time in units of iodine mean life (9.58 hours), with time marks at 575, 1150, and 1725 min. Annotation: equilibrium xenon concentration corresponds to poison negative reactivity of 7.82 dollars.]
FIG. 8.2. Xenon poison reactivity and corresponding optimal flux shutdown policies to minimize maximum xenon poison reactivity for various shutdown control durations $T$. Equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.
Results and Conclusions
will not accomplish the trajectory transfer consistent with the parameters. A physical analog of this type of bang-bang control is a child's swing. With a person pushing from behind with the correct tempo (correctly timed pulse train), the amplitude of the swing can easily be made to increase or decrease with push pulses of limited magnitude.
8.2. Xenon Constrained Extremals

The approach taken to compute optimal flux shutdown programs is to obtain them first without the xenon override constraint. Then the xenon constraint is imposed and the resulting shutdown programs are compared with those without the xenon constraint.

First it should be noted that the unconstrained extremals are perfectly valid, provided that the corresponding optimal flux shutdown programs are physically realizable. By this is meant that for a given set of parameters, there exists sufficient partial xenon override reactivity to counteract the xenon poison reactivity at the point in the control phase at which the first flux pulse is called for in the shutdown program. If this is so, then the unconstrained shutdown program can be executed; otherwise not, and the system must then wait until enough xenon has decayed away naturally in order to restart at all, much less in any optimal manner. For example, as seen from Figs. 8.2 and 8.3, a physically realizable shutdown program can require an amount of xenon override capability ranging up to 60 dollars' worth of reactivity, where the xenon maximum occurs at about 70 dollars at this equilibrium-flux magnitude ($\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec).

[Figure 8.3: panels for $T = 0.2$, $T = 0.5$, and $T = 1.0$, each showing the control phase and the post-control phase; abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.3. Xenon poison reactivity and corresponding optimal flux shutdown policies to minimize maximum xenon reactivity for various shutdown control durations $T$. Equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, $u_{\max} = 2.0$.

A second note is that the xenon constrained extremals are coincident with their unconstrained counterparts except for the portion of the extremal field cut off by the xenon override constraint, the horizontal line $x = x_c$. The optimal shutdown program respects this constraint by generating a suboptimal extremal arc consisting of sawtooth segments adjacent to the line $x = x_c$ between its two intersections with the corresponding unconstrained arc, as sketched in Fig. 8.4. The sawtooth extremal corresponds to a flux pulse train which burns out the xenon as the sawteeth zigzag opposite the line $x = x_c$. Sawteeth occur because of the flux constraints, i.e., the bang-bang nature of the control, as well as the constraining equations of state trajectories.

[Figure 8.4: xenon $x$ plotted against iodine in equilibrium units, $I/I_0$; annotation marks the state at which the coast phase begins.]
FIG. 8.4. Minimax phase (xenon-iodine) space plot of extremal arcs plus coasting-phase arcs for control duration $T = 1$. $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$; cf. Fig. 8.2. Time in units of iodine mean life.
8.3. Interdependence of Flux and Xenon Constraints

If the flux-upper-bound constraint $M$ is relaxed, the sawteeth can be dispensed with and the xenon constraint line $x = x_c$ itself can be made part of the extremal. This could be desirable from a practical point of view in that reactor operating personnel might object strenuously to a large number of pulses in any optimal flux shutdown regime. Problems (a) and (b) are terminal-control types of problems; the cost functional depends only on the state at the termination of the control phase, so that the “running” or average cost of the control itself is assumed free (cf. Section 2.5). Hence the cost criterion is unaffected by either mode of suboptimal control, i.e., with or without sawteeth for a xenon constrained extremal.

[Figure 8.5: flux $u$ plotted against time, iodine mean life (9.58 hours), over the shutdown control phase. Annotations: alternate flux program which makes the xenon constraint line $x = x_c$ part of the suboptimal extremal; alternate extremal arc including the xenon constraint line $x = x_c$; $t_2 = 0.80$.]
FIG. 8.5. Optimal flux shutdown program for problem (b), to minimize xenon concentration at end of shutdown duration $T = T_0 = 1$. Equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec.

If the sawteeth are superseded in favor of the xenon constraint itself, then the line $x = x_c$ connects to the unconstrained extremal at the points marked + in Fig. 8.4. However, the flux-upper-bound constraint must be relaxed along the $x = x_c$ arc, or else $u \le 0$ for certain sets of the parameters. On the arc $x = x_c$, $\dot{x} = 0$ holds, which together with the state equations (3.12) and (3.13) results in a differential equation whose solution yields an expression for the shutdown flux $u$ on $x = x_c$. It is
$$u = \cdots \ge M, \qquad t_1 < t < t_2, \tag{8.3}$$

where without loss of generality a high-power reactor is assumed, implying $r_0 \gg w$ and $x_c \gg y_c$. The time $t_1$ is the instant when the unconstrained extremal first intersects the line $x = x_c$, while $t_2$ is the time at which the constrained extremal leaves $x = x_c$, as marked by + signs in Figs. 8.4 and 8.5. Both $t_1$ and $t_2$ are measured from initial equilibrium. At $t_1$ the flux is jumped from zero to its value given by (8.3); i.e., $u(t_1) > M$. The flux $u$ then drops exponentially until time $t_2$, when $u(t_2) = M$, thus coinciding with the value of $u$ on the unconstrained extremal. This is shown in Fig. 8.5 for the specific example $M = 1$.
8.4. Two Types of Optimal Shutdown Payoffs

Another result aids in partially answering the question of whether or not optimal flux shutdown programs pay off in terms of possible fuel savings. For a series of unconstrained xenon minimax cases at equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, the xenon peak as a function of the shutdown duration $T$ is plotted in Fig. 8.6 and is fitted by the empirical relation $\min x_p = -70 \exp(-0.7T)$ dollars. At this equilibrium flux level, immediate shutdown ($T = 0$) yields $\min x_p = -70$ dollars, while if $T = 1$, corresponding to 9.58 hours of shutdown duration, $\min x_p \simeq -34.8$ dollars. Also the time at which the maximum occurs is increased for the optimized shutdown. The 11-hour xenon maximum for zero shutdown time (immediate shutdown) has been shifted to 20 hours, for 9.58 hours of allowed shutdown time, as can be seen from Fig. 8.6.

[Figure 8.6: maximum xenon reactivity and its time of occurrence (hours) plotted against flux shutdown duration time $T$, in units of iodine mean life, for $T$ from 0 to 1.0.]
FIG. 8.6. Maximum xenon reactivity and its time of occurrence following minimax flux shutdown phase. Equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.

For high-flux reactors with normal fuel loads, and $\Sigma_{\text{fuel}}/\Sigma_{\text{moderator}} \gg 1$, Eq. (1.7) asserts that the amount of additional fuel required to override the xenon peak poison reactivity is proportional to $\min x_p$. Hence the amount of fuel needed will be approximately halved with the above parameters if optimal shutdown programs are used. If it is desired to override the xenon concentration at will in the post-shutdown phase (complete override), Eq. (1.7) reveals that, at the above flux level for an enriched-U$^{235}$ water-moderated reactor, approximately 10.1 times as much fuel is required as called for by the critical mass equations. For example, if the critical mass is 6 kilograms, 61 kilograms are needed for complete override, whereas only 34 kilograms would be required if optimal shutdown programs are used with 9.58-hour shutdown durations. This would correspond to a saving of 324,000 dollars at the prevailing U.S. rate of 12,000 dollars per kilogram for enriched U$^{235}$. What is done for economic reasons in high-power thermal reactors, except in military applications, is to settle for a very limited partial override situation, so that the fuel inventory is maintained at reasonably low levels, roughly 2 to 3 critical masses. This corresponds
to being able to restart within only an hour or less following immediate shutdown. Similar results are obtained for the minimization of the post-shutdown xenon concentration, problem (b). This can be seen from the similarity of the functions $\Phi$ and $\Psi$, as discussed previously.

In terms of IRR-1 reactor scheduling, it will be seen that optimal shutdown flux programs can be used to advantage. Under normal daily operation, the IRR-1 is shut down each night at about 2300 hours. For the anticipated higher power operation at 5 megawatts, the post-shutdown xenon maximum will be larger than its magnitude at the present power of 2 megawatts. More important, the time at which the post-shutdown xenon maximum occurs following an immediate nonoptimal shutdown will increase from approximately 7.7 to 9 hours. This will unfortunately coincide with the daily morning startup time. For protracted power operation at 5 megawatts, it is doubtful that sufficient xenon-peak-override capability can be maintained in this reactor to restart the following morning. Therefore, an optimal shutdown program of the minimax type will serve to lower the xenon peak concentration occurring at the early morning startup hour, shifting the xenon peak ahead in time, as seen from Fig. 8.3. Or, an optimal shutdown program of the type of problem (b) can be invoked which will minimize the post-shutdown xenon concentration at a time corresponding to early morning startup. Both types of optimal shutdown programs are equivalent in this case, in that they both shift the xenon peak sufficiently far ahead in time to depress the post-shutdown xenon at the desired startup time.
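The arithmetic behind the empirical fit and the fuel-saving estimate quoted in this section can be retraced with a short sketch. All numbers are the ones stated in the text (the fit $-70 \exp(-0.7T)$ dollars, 61 and 34 kilograms, 12,000 dollars per kilogram); the function and variable names are hypothetical.

```python
import math

def min_xenon_peak_dollars(T):
    """Empirical fit quoted in the text: min x_p = -70 exp(-0.7 T) dollars,
    with the shutdown duration T in units of iodine mean life."""
    return -70.0 * math.exp(-0.7 * T)

peak_immediate = min_xenon_peak_dollars(0.0)   # immediate shutdown, T = 0
peak_optimal = min_xenon_peak_dollars(1.0)     # T = 1, i.e. 9.58 hours
reduction_ratio = peak_optimal / peak_immediate

# Fuel figures quoted in the text for a 6-kilogram critical mass.
full_override_kg = 61.0        # complete override without optimal shutdown
optimal_override_kg = 34.0     # with 9.58-hour optimal shutdown
price_per_kg = 12_000.0        # dollars per kilogram of enriched U-235
saving_dollars = (full_override_kg - optimal_override_kg) * price_per_kg

print(round(peak_optimal, 1), round(reduction_ratio, 3), saving_dollars)
```

The ratio of the two peaks is close to one half, which is the sense in which the required fuel is "approximately halved"; the difference of 27 kilograms at the quoted price reproduces the 324,000-dollar figure.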
8.5. Short Allowable Shutdown Durations

A third general result, which is seen by examining the figures, is that for short shutdown durations ($T \le 0.20$) the minimax xenon reactivity magnitude, problem (a), is relatively insensitive to the shutdown policy. Thus for shutdown durations of two hours or less, nothing much can be done to “minimax” the xenon reactivity poison. For short allowable shutdown durations, problem (b) is probably more apropos in that the corresponding criterion is only to minimize the xenon reactivity at a given post-shutdown time, especially so if it is desired to accomplish this precisely at the termination of the shutdown duration $T$.
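Since shutdown durations are quoted throughout in units of the iodine mean life, a small conversion helper makes the "two hours or less" statement concrete. It assumes only the mean life of 9.58 hours used in the text and figure captions; the helper name is hypothetical.

```python
IODINE_MEAN_LIFE_HOURS = 9.58  # value quoted in the text and on the figure axes

def mean_lives_to_hours(T):
    """Convert a duration T in units of iodine mean life to hours."""
    return T * IODINE_MEAN_LIFE_HOURS

for T in (0.2, 0.3, 0.5, 1.0):
    print(f"T = {T:.1f} mean lives = {mean_lives_to_hours(T):.2f} hr")
```

With this conversion, $T = 0.20$ corresponds to a little under two hours, and $T = 0.3$ and $T = 0.5$ correspond to the 2.87-hour and 4.79-hour durations quoted in the figure captions.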
8.6. Strongly Limited Xenon Override Shutdown
For many thermal reactors, such as the IRR-1 reactor, the fuel inventory is relatively small (about 2 to 2.5 critical masses), so that the corresponding reactivity available to override xenon is also small. In general this can be expressed in terms of the ratio of available xenon override reactivity $K_c$ to the xenon reactivity at equilibrium operating conditions, $K_{x0}$. As can be appreciated from Figs. 8.7 to 8.11, the smaller this ratio for a given shutdown duration, the greater the number of pulses required for optimal shutdown. This is also depicted in Fig. 8.12, where the ratio $K_c/K_{x0}$ is plotted as a function of the percentage of time that the reactor is on, i.e., at operating power, in the shutdown duration. It is seen that as $K_c/K_{x0}$ approaches unity, the reactor approaches its limit of not being shut down. That is, if there is no xenon override reactivity available, no optimal shutdown programs exist, as the reactor cannot be restarted at all following shutdown.

It is also seen from Figs. 8.7 to 8.11 that, as the allowable shutdown duration is reduced, the average pulse widths are larger. This indicates that the reactor must remain at operating power a greater percentage of the time during the shutdown control phase. This is required to burn out the xenon, as there is only a relatively small amount of xenon override reactivity available. However, the fact that the reactor is at operating power so frequently during the shutdown duration does not allow the iodine concentration to be decreased very much. The iodine concentration must be decreased to obtain a substantial post-shutdown xenon minimum. Hence, such a minimum will not be realized. This is another way of saying that the allowable shutdown duration $T$ is too short, as discussed in Section 8.5.
The limited xenon override reactivity available, especially in the cases of short-time allowable shutdown duration, is reflected in the only slight reduction in minimax xenon attained compared to immediate shutdown, as shown in Figs. 8.7 to 8.11. If the flux maximum constraint $M$ can be relaxed, much of the off-on behavior due to the large number of pulses required for optimal shutdown could be eliminated. That is, the flux would be increased to greater-than-rated operating power over part of the extremal, as described in Section 8.3, to burn out the xenon as the alternative to a long flux pulse train. However, for small available xenon override reactivity, this might prove difficult, as the extent to which the flux constraint must be relaxed might be intolerable. This can be easily discerned by examining Eq. (8.3) and Figs. 8.4 and 8.5.

[Figure 8.7: flux programs $u$ for 30-, 25-, and 15-dollar override (scale I); xenon reactivity in the shutdown control phase (scale I) and coasting phase (scale II), with the 70-dollar xenon peak following step shutdown; abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.7. Optimal shutdown for post-shutdown xenon minimax. Allowable shutdown duration $T = 1$ (9.58 hr). Equilibrium flux $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.

[Figure 8.8: flux programs $u$ for 65-, 45-, 30-, and 15-dollar override (scale I); xenon reactivity in the shutdown control phase (scale I) and coasting phase (scale II); abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.8. Optimal shutdown for post-shutdown xenon minimum (at end of shutdown duration). $T = T_0 = 1$, $\Phi_0 = 2 \times 10^{14}$ neutrons/cm$^2$-sec.

[Figure 8.9: flux programs $u$ for 13-, 10-, and 8.30-dollar override (scale I); xenon reactivity in the shutdown control phase (scale I) and coasting phase (scale II), with the nonoptimal step shutdown xenon peak; abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.9. Optimal shutdown for post-shutdown xenon minimax. Allowable shutdown duration $T = 1$ (9.58 hr). Equilibrium flux $\Phi_0 = 4.2 \times 10^{13}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.

[Figure 8.10: flux programs $u$ for 7.80-, 9.00-, and 12.50-dollar override (scale I); xenon reactivity in the shutdown control phase (scale I) and coasting phase (scale II), with the nonoptimal step shutdown xenon peak; abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.10. Optimal shutdown for post-shutdown xenon minimax. Allowable shutdown duration $T = 0.5$ (4.79 hr). Equilibrium flux $\Phi_0 = 4.2 \times 10^{13}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.

[Figure 8.11: flux programs $u$ for 7.00-, 8.75-, and 10.80-dollar override (scale I); xenon reactivity (scales I and II), with the nonoptimal step shutdown xenon peak level; abscissa: time, iodine mean life (9.58 hours).]
FIG. 8.11. Optimal shutdown for post-shutdown xenon minimax. Allowable shutdown duration $T = 0.3$ (2.87 hr). Equilibrium flux $\Phi_0 = 4.2 \times 10^{13}$ neutrons/cm$^2$-sec, $u_{\max} = 1.0$.

[Figure 8.12: ratio $K_c/K_{x0}$ plotted against the percentage of time the reactor is on during the shutdown phase.]
FIG. 8.12. Ratio of xenon override reactivity to xenon equilibrium reactivity versus number of pulses in shutdown program. The latter is in terms of percentage of “on” time during shutdown phase to minimax xenon, problem (a).
8.7. Conclusions of Experimental Investigation

The optimal shutdown experiment is described in Section 7.5, where the results are also given. With the anticipated increase in power of the IRR-1 reactor to 5 megawatts, the post-shutdown xenon maximum will occur almost at the desired morning startup time; commencing daily operation then necessitates a shift in the time of occurrence of the xenon peak. This is easily accomplished by using an optimal shutdown program that minimizes the xenon maximum, problem (a). As discussed in Section 8.4, the shutdown program to
minimize the xenon at a given post-shutdown time, problem (b), can be used as well. As seen from Figs. 8.7 to 8.11, the shutdown programs for problem (a) or (b) are quite similar when strong xenon override constraints are imposed, other parameters being equal.

From Fig. 7.7 it is seen that there are differences between the computed optimal shutdown program and the one that is experimentally realized. This is because (1) the mechanical inertia of the control-rod system and the delayed neutrons constrain the shutdown rate of the reactor, as discussed in Section 5.2; and (2) the startup rate is constrained for safety reasons, so that the reactor does not pass through the desired equilibrium state (criticality at desired operating power) too quickly. If this happens, the reactor can become supercritical ($k > 1$) for a dangerously long time interval, presaging a reactor-core-meltdown accident. For reasons discussed in Section 7.5, the experimental shutdown program was initiated from a nonequilibrium state; i.e., $\dot{x}(0) \ne 0$, $\dot{y}(0) \ne 0$. This produced the initial “transient” in the measured xenon poison reactivity seen in Fig. 7.7. In general, however, the measured xenon reactivity changes are less abrupt than those computed, owing to the less abrupt experimentally realized optimal shutdown program.

The xenon peak time has been shifted from 0.8 (in units of iodine mean life) for an immediate nonoptimal shutdown to 2.0 using the optimal shutdown program described in Section 7.5. However, the magnitude of the peak has remained essentially the same as its nonoptimal shutdown value, because of the strong xenon override constraint imposed. The ratio of the xenon override constraint to the equilibrium xenon level was about 1.30, which is quite near the limiting value for effective optimal shutdown programs, as discussed in Section 9.1.
If no automatic control mechanism is available to exercise such a protracted optimal shutdown program (10 hours), T can be decreased, so that the operator on the late evening shift can manually exercise the shutdown program prior to shutting the reactor down for the night. Shorter optimal shutdown durations T will not depress the xenon reactivity at the termination of shutdown as much as the experiment under discussion (T= 1.04). How small T can be made depends on how much xenon override reactivity is available, which in turn depends on the age and status of the core configuration. Again, see Section 9.1.
Since one principal reason for nightly shutdown is to conserve fuel, perhaps a better shutdown criterion would be one that takes into account fuel usage (burnup). Such a criterion is investigated in Section 9.3. In these experiments the ratio of no-pulse duration to pulse duration during shutdown is effectively 3/5; hence the reactor is down only 37.5 per cent of the time during the shutdown control phase. Again, this is because of the imposition of the strong xenon constraint, which forces the reactor “on” frequently to burn out the xenon to maintain at-will startup capability. Figure 8.13 depicts how the optimal shutdown program discussed would be executed on a daily basis.

It is interesting to examine the optimal shutdown programs to minimize the xenon reactivity in the post-shutdown period, problem (b), for the case of no xenon override constraint. From previous discussion, the latter can be construed as assuming that sufficient xenon reactivity override exists at the onset of the first pulse required by the optimal shutdown program, so that the particular shutdown program is realized. Figure 8.14 depicts such optimal shutdown programs and their resultant xenon reactivity changes for various shutdown durations $T$, with the post-shutdown time of xenon minimization $T_0 = 1.0$ (9.58 hours). It is evident that the longer shutdown durations are more effective in reducing the xenon reactivity at the given post-shutdown time. This is simply because the reactor is operating at full power more of the time, so that more xenon is burned out. This is especially true in the case where $T = T_0 = 1.0$ in Fig. 8.13, where the reactor is effectively “off” only 40 per cent of the shutdown control phase. This is seen to be the case, since the reactor is on during the last portion of the shutdown, which is immediately contiguous to the daily “on” period.

FIG. 8.13. Computed optimal flux shutdown to minimize xenon at $T_0 = 1$ (9.58 hr) for various shutdown durations $T$. Sufficient override reactivity assumed available at onset of first pulse. Equilibrium flux $\Phi_0 = 0.833 \times 10^{13}$ neutrons/cm$^2$-sec.

FIG. 8.14. Suggested daily optimal shutdown program to minimax xenon in IRR-1 reactor for depleted cores.
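The relation between the off-to-on duration ratio and the percentage of the control phase spent shut down can be made explicit with a short sketch. It assumes that the quoted 37.5 per cent follows from a no-pulse to pulse duration ratio of 3/5; the function name is hypothetical.

```python
from fractions import Fraction

def off_fraction(off_to_on_ratio):
    """Fraction of the shutdown control phase spent at zero flux,
    given the ratio (total no-pulse time) / (total pulse time)."""
    r = Fraction(off_to_on_ratio)
    return r / (1 + r)

down_share = off_fraction(Fraction(3, 5))
print(down_share, float(down_share))
```

A ratio of 3/5 gives 3/8 of the control phase, i.e., 37.5 per cent, whereas a ratio of 3 would give 75 per cent; only the former agrees with the percentage stated in the text.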
CHAPTER 9
Summary and Equivalences
9.1. Reprise

The previous chapters have described the formulation, behavior, and optimal control policies of a substantive control process occurring in the field of nuclear reactor physics and engineering. It is that of controlling the reactor flux or power in order to shut the reactor down while minimizing the effect of xenon poison, which is a concomitant part of thermal-reactor operation. We have brought to bear one of the newer and more powerful methods, dynamic programming, for obtaining optimal power shutdown policies or programs to accomplish this task. Without belaboring the obvious, such programs should play a very important role in the control and operation of such reactors, since to date there are no generally accepted measures to determine how, and to what extent, the xenon override problem should be handled. In large thermal reactors built thus far, only sufficient additional fuel has been provided to override the xenon poison for a very short time, compared to the mean life of iodine-135, following shutdown. Such times are of the order of 1 hour or less following immediate shutdown. This is because of the inordinately large amounts of additional fuel required to cope with the xenon poison when no optimal shutdown program is used. As seen in Chapter 2, the amount of fuel inventory required to override the post-shutdown xenon at will ranges from one to two orders of magnitude over that needed to counter the xenon poison at steady-state operation. Even the latter is often two to three times the amount of fuel called for by the criticality
equations used in reactor-design computations. Besides the very great expense (hundreds of thousands of dollars at the prevailing U.S. price of 12,000 dollars per kilogram of U$^{235}$) involved in tying up such a large fuel inventory, the control of such an overfueled reactor poses difficulties from the operational safety point of view. This is because such a system possesses dangerously large amounts of potentially positive reactivity, which could cause a neutron population multiplication at a stupendous exponentially increasing rate if the reactor became supercritical ($k > 1$) for a sustained period. Of course such a situation would only occur during a reactor accident. In general, to prevent the occurrence of such accidents, very elaborate precautions must be, and are, taken.

It is felt, as has been stated often enough, that the xenon override difficulty can be alleviated through the use of optimal shutdown programs described in the previous chapters. This was demonstrated analytically in the description and consequent numerical computation of such programs using the dynamic-programming algorithm. It was also demonstrated experimentally on the (albeit low-to-medium power) IRR-1 reactor described in Chapter 7. For this reactor, considering the anticipated increase in its power, it will be sufficient to shift the xenon peak ahead in time, which is accomplished through optimal power shutdown to minimize the xenon maximum, also discussed in Chapter 7.

From the operational point of view, optimal flux shutdown to control xenon poison provides an example of a process that has sufficiently complicated features to make its solution one which cannot be deduced in detail intuitively. Even though the general nature of the optimal shutdown can be deduced from physical grounds, to obtain the quantitative aspects would prove quite difficult indeed without performing extensive “cut and try” optimal power shutdown experiments on an actual reactor.
As is by now evident, one must pulse the reactor to quickly reduce the iodine concentration, to ultimately minimize the xenon poison after shutdown. Upon initial shutdown there is an immediate increase of xenon which must be balanced against the simultaneous desired decrease of iodine. Hence there is a point at which the reactor must be pulsed to burn out the xenon to maintain control flexibility by respecting the xenon override constraint. However, this procedure increases the iodine concentration. How and when to pulse the reactor, within the limits imposed by the xenon override constraint, is not something that can normally be intuited from physical feeling about the vagaries of xenon behavior in large high-power thermal reactors.

Inroads into the determination of optimal shutdown policies also can be obtained by using the methods of the maximum principle described in Chapters 4 and 5. However, the resulting formulation yields a formidable two-point boundary-value problem which is difficult to handle from the computational point of view. Analytical solution to obtain optimal flux shutdown programs using the maximum principle is out of the question, as hardly more than tutorial examples can be solved in this manner. However, the general nature of the control regimen (whether bang-bang or not) can be determined by inspection of the analytical formulation in terms of either the maximum principle or the principle of optimality of dynamic programming. As mentioned in Chapter 5, certain trial-and-error procedures can be used in conjunction with the maximum principle to obtain optimal shutdown policies, but it is more natural to use a straightforward algorithm as provided by dynamic programming, one with no cumbersome two-point boundary-value difficulties.

There is a kind of equivalence between the optimal control policy corresponding to the xenon minimax criterion and that of, for example, the criterion of minimum-time optimal control. These and other aspects, including the relationship between the use of control criteria singly and in combination, as well as the equivalence between the maximum principle and the optimality principle of dynamic programming, are discussed in Sections 9.2 and 9.3, respectively. As to the question of the efficacy of optimal shutdown programs, they can be employed not only on high-power thermal reactors, but on those of low to medium power as well.
In the case of the IRR-1 reactor, it was seen that the use of an optimal shutdown program, either to minimize the post-shutdown xenon maximum or to minimize the xenon at a given post-shutdown time, will greatly facilitate its normal daily operation. This is necessary because, at the anticipated increase in power operation, the amount of reactivity available to override xenon will be small, and the time of occurrence of the post-shutdown xenon peak will almost coincide with the desired daily time of restarting the reactor. For the case of high-flux thermal reactors, up to approximately 50 per cent less fuel is needed if optimal shutdown programs are employed to override post-shutdown xenon at will. Lesser results are obtained for shorter shutdown duration times and more restrictive xenon override constraints. Generally, optimal shutdown programs are to no avail if the shutdown duration time is less than 2 hours, and/or the xenon override constraint is less than approximately 1.5 times the xenon poison reactivity at equilibrium operating power.

9.2. Equivalence between the Optimality Principle and the Maximum Principle
There is an equivalence between the optimality principle of dynamic programming and the maximum principle, which is readily apparent when both principles are applied to find the solution to a particular optimal control process [21]. This equivalence will be demonstrated by first using the principle of optimality to derive the functional equation of dynamic programming for a general class of optimal control processes. Using this functional equation, the adjoint equations of the maximum principle will follow, as well as the statement of the maximum principle itself. Consider the general control process of finding the optimal control policy that minimizes

$$J = \int_0^T g(x_1, x_2, \ldots, x_N; u_1, u_2, \ldots, u_m)\,dt \qquad (m \le N), \tag{9.1}$$

where $T$ is the allowable duration of control, $\bar{x} = (x_1, x_2, \ldots, x_N)$ is the state vector, and $\bar{u} = (u_1, u_2, \ldots, u_m)$ is the control vector. The components $x_i$ of the state vector satisfy the following equations of state (equations of motion):
That is, it is desired to find the optimal control policy $\bar{u}^* = (u_1^*, u_2^*, \ldots, u_m^*)$ that yields
\[ J^* = \int_0^T g(x_1, x_2, \ldots, x_N;\; u_1^*, u_2^*, \ldots, u_m^*)\,dt = \min, \tag{9.3} \]
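where the $\{x_i\}$ satisfy equations (9.2) with initial conditions $\bar{x}(0) = (c_1, c_2, \ldots, c_N)$, such that the system proceeds to some given final state, $\bar{x}_T$, at the end of the given control duration, $T$. Let a “cost,” or criterion, function $S(c_1, c_2, \ldots, c_N, T)$ be defined as
\[ S(c_1, c_2, \ldots, c_N, T) = \min_{\bar{u}} \int_0^T g(x_1, x_2, \ldots, x_N;\; u_1, u_2, \ldots, u_m)\,dt. \tag{9.4} \]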
Using the principle of optimality of dynamic programming allows one, as explained in previous chapters, to write the following functional equation for $S$:
\[ S(c_1, c_2, \ldots, c_N, T) = \min_{\bar{u}} \bigl[ g(c_1, c_2, \ldots, c_N;\; u_1, u_2, \ldots, u_m)\,\Delta + S(c_1 + \dot{c}_1 \Delta, \ldots, c_N + \dot{c}_N \Delta,\; T - \Delta) \bigr]. \tag{9.5} \]
This equation enunciates the optimality principle in that the cost functional $S$ is given by the minimum, over the allowable $\{u_i\}$, of the sum of two terms on the right side of (9.5). The first term is the cost of control for a duration of $\Delta$ units of time, and the second is the cost of a “new” optimal control process $S(c_1 + \dot{c}_1 \Delta, c_2 + \dot{c}_2 \Delta, \ldots, c_N + \dot{c}_N \Delta, T - \Delta)$, beginning in the state $(c_1 + \dot{c}_1 \Delta, c_2 + \dot{c}_2 \Delta, \ldots, c_N + \dot{c}_N \Delta)$ but of duration $T - \Delta$ units of time. In the usual manner, expansion of the second term on the right side of (9.5) about the initial state $(c_1, c_2, \ldots, c_N)$ yields
\[ S(c_1, c_2, \ldots, c_N, T) = \min_{\bar{u}} \Bigl[ g(c_1, c_2, \ldots, c_N;\; u_1, u_2, \ldots, u_m)\,\Delta + S + \sum_{i=1}^{N} \frac{\partial S}{\partial c_i}\,\dot{c}_i\,\Delta - \frac{\partial S}{\partial T}\,\Delta + O(\Delta^2) + \cdots \Bigr]. \tag{9.6} \]
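The discrete functional equation (9.5) can be iterated directly on a grid of states. The following is a minimal sketch only, for a hypothetical scalar toy problem that does not come from the text: one state with $\dot{c} = u$, integrand $g = c^2 + u^2$, a bounded control $|u| \le 1$, and a free final state so that $S(c, 0) = 0$. The function name `solve_dp`, the grids, and the step $\Delta = 0.05$ are illustrative assumptions.

```python
import numpy as np

def solve_dp(c_grid, u_grid, g, f, delta, n_steps):
    """Iterate S(c, T) = min_u [g(c, u)*delta + S(c + f(c, u)*delta, T - delta)]."""
    # Every (state, control) pair, arrays of shape (n_states, n_controls).
    C, U = np.meshgrid(c_grid, u_grid, indexing="ij")
    stage_cost = g(C, U) * delta
    c_next = (C + f(C, U) * delta).ravel()
    S = np.zeros(len(c_grid))        # S(c, 0) = 0: assumed free final state
    policy = np.zeros(len(c_grid))
    for _ in range(n_steps):
        # Linear interpolation of the previous stage; np.interp clamps outside
        # the grid, which is a boundary approximation of this sketch.
        S_next = np.interp(c_next, c_grid, S).reshape(C.shape)
        total = stage_cost + S_next
        best = np.argmin(total, axis=1)
        S = total[np.arange(len(c_grid)), best]
        policy = u_grid[best]
    return S, policy

# Hypothetical toy problem: c' = u, g = c^2 + u^2, |u| <= 1, duration T = 2.
c_grid = np.linspace(-2.0, 2.0, 81)
u_grid = np.linspace(-1.0, 1.0, 41)
S, policy = solve_dp(c_grid, u_grid,
                     g=lambda c, u: c**2 + u**2,
                     f=lambda c, u: u,
                     delta=0.05, n_steps=40)
```

For this toy problem the continuous-time cost at $c = 1$, $T = 2$ is $\tanh 2$ (the control bound is inactive there), so `S[60]` should come out close to that value, up to the discretization error of order $\Delta$. A fixed final state, as in the text, would replace the zero initialization by a terminal condition that penalizes or excludes states other than $\bar{x}_T$.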
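Canceling $S$ on both sides of (9.6), dividing the result by $\Delta$, taking the limit as $\Delta \to 0$, and substituting from the equations of state (9.2) results in Bellman's equation, viz.,
\[ \frac{\partial S}{\partial T} = \min_{\bar{u}} \Bigl[ g(c_1, c_2, \ldots, c_N;\; u_1, u_2, \ldots, u_m) + \sum_{i=1}^{N} \frac{\partial S}{\partial c_i}\, f_i(c_1, c_2, \ldots, c_N;\; u_1, u_2, \ldots, u_m) \Bigr]. \tag{9.7} \]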
This is a partial differential equation from which the optimal control vector $\bar{u}^*$ can be obtained in principle, together with the minimum cost, $S$. As already discussed a number of times, this equation is too difficult to solve in closed form, except for tutorial examples, so that one must resort to numerical computations, using its discrete analogue, Eq. (9.5). Now define, for convenience, an additional component, $x_{N+1}$, of the state vector $\bar{x}$ by letting $x_{N+1}$ satisfy
\[ \dot{x}_{N+1} = f_{N+1} \equiv g(x_1, x_2, \ldots, x_N;\; u_1, u_2, \ldots, u_m). \tag{9.8} \]
It should be noted that the right side of (9.8) does not depend explicitly on $x_{N+1}$. This fact plays a role in the formulation of the solution using the maximum principle. Here $x_{N+1}$ is introduced so that the resemblance between the two principles will become more apparent; Bellman's equation (9.7) can be rewritten
\[ \frac{\partial S}{\partial T}(c_1, c_2, \ldots, c_N;\; T) = \min_{\bar{u}} \sum_{i=1}^{N+1} \frac{\partial S}{\partial c_i}\, f_i(c_1, c_2, \ldots, c_N;\; u_1, u_2, \ldots, u_m). \tag{9.10} \]
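As a consistency check, the sum in (9.10) can be split into its $(N+1)$th term and the remaining $N$ terms. The sketch below rests on two assumptions that are inferred rather than quoted from the text: that the added component has $f_{N+1} \equiv g$, and that the cost function is extended to the added coordinate so that $\partial S / \partial c_{N+1} = 1$.

```latex
\begin{aligned}
\min_{\bar{u}} \sum_{i=1}^{N+1} \frac{\partial S}{\partial c_i}\, f_i
  &= \min_{\bar{u}} \Bigl[ \frac{\partial S}{\partial c_{N+1}}\, f_{N+1}
     + \sum_{i=1}^{N} \frac{\partial S}{\partial c_i}\, f_i \Bigr] \\
  &= \min_{\bar{u}} \Bigl[ g + \sum_{i=1}^{N} \frac{\partial S}{\partial c_i}\, f_i \Bigr]
  \qquad \text{(assuming } f_{N+1} \equiv g,\ \partial S / \partial c_{N+1} = 1 \text{)}.
\end{aligned}
```

Under these assumptions the right side of (9.10) coincides with the right side of Bellman's equation (9.7).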
To obtain the correspondence between Eq. (9.10) and the maximum principle, define a set of adjoint variables as
\[ p_i(t) = \frac{\partial S}{\partial c_i} \qquad (i = 1, \ldots, N+1). \tag{9.11} \]
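From the definition of $p_i(t)$ in (9.11), its total derivative is
\[ \dot{p}_i(t) = \sum_{j=1}^{N+1} \frac{\partial^2 S}{\partial c_i\, \partial c_j}\, \dot{c}_j - \frac{\partial^2 S}{\partial c_i\, \partial T}, \tag{9.12} \]
where the minus sign arises because $T$, the remaining control duration, decreases at unit rate along the trajectory.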
Furthermore, along an extremal in the phase space, (9.10) asserts that
\[ \frac{\partial S}{\partial T} = \sum_{i=1}^{N+1} \frac{\partial S}{\partial c_i}\, f_i^* = \min = \text{constant}, \tag{9.13} \]
where the optimal control vector $\bar{u}^*$ has been inserted in the $f_i$, giving $f_i^* \equiv f_i(c_1, c_2, \ldots, c_N;\; u_1^*, u_2^*, \ldots, u_m^*)$. Differentiation of (9.13) with respect to, e.g., $u_j^*$, assumed continuous, gives
\[ \sum_{i=1}^{N+1} \frac{\partial S}{\partial c_i}\, \frac{\partial f_i^*}{\partial u_j^*} = 0. \tag{9.14} \]
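If (9.13) is also differentiated with respect to $c_k$, the result is
\[ \sum_{i=1}^{N+1} \Bigl[ \frac{\partial^2 S}{\partial c_i\, \partial c_k}\, f_i^* + \frac{\partial S}{\partial c_i}\, \frac{\partial f_i^*}{\partial c_k} + \frac{\partial S}{\partial c_i} \sum_{j=1}^{m} \frac{\partial f_i^*}{\partial u_j^*}\, \frac{\partial u_j^*}{\partial c_k} \Bigr] - \frac{\partial^2 S}{\partial c_k\, \partial T} = 0. \tag{9.15} \]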
From (9.14) it is seen that the third term of (9.15) vanishes. Comparing (9.15) with (9.11) and (9.12) supplies the equation that the adjoint variables $p_i$ must satisfy, viz.,
\[ \dot{p}_i(t) = -\sum_{j=1}^{N+1} p_j\, \frac{\partial f_j}{\partial c_i}. \tag{9.16} \]
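The comparison that leads to (9.16) can be written out in three lines. This is a sketch under stated assumptions, not a quotation of the original derivation: the adjoint variables are taken as $p_i = \partial S / \partial c_i$, and $T$ is read as the remaining control duration, so that $dT/dt = -1$ along the trajectory.

```latex
\begin{aligned}
\dot{p}_i
  &= \sum_{j=1}^{N+1} \frac{\partial^2 S}{\partial c_i\,\partial c_j}\, f_j
     - \frac{\partial^2 S}{\partial c_i\,\partial T}
  && \text{(chain rule, with } \dot{c}_j = f_j,\ dT/dt = -1\text{)} \\
\frac{\partial^2 S}{\partial c_i\,\partial T}
  &= \sum_{j=1}^{N+1} \frac{\partial^2 S}{\partial c_i\,\partial c_j}\, f_j
     + \sum_{j=1}^{N+1} \frac{\partial S}{\partial c_j}\, \frac{\partial f_j}{\partial c_i}
  && \text{((9.13) differentiated, control term dropped by (9.14))} \\
\dot{p}_i
  &= -\sum_{j=1}^{N+1} \frac{\partial S}{\partial c_j}\, \frac{\partial f_j}{\partial c_i}
   = -\sum_{j=1}^{N+1} p_j\, \frac{\partial f_j}{\partial c_i}
  && \text{(first line minus second)}
\end{aligned}
```

Because the adjoint equation is linear and homogeneous in the $p_j$, the same result holds if the opposite sign is chosen in the definition of the adjoint variables.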
Identifying a Hamiltonian, after Pontryagin, as
\[ H = -\sum_{i=1}^{N+1} p_i f_i, \tag{9.17} \]
then along an extremal, from (9.13), $H$ must satisfy
\[ H^* = -\sum_{i=1}^{N+1} p_i f_i^* = \max = \text{constant}. \tag{9.18} \]
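The constancy of the Hamiltonian along an extremal can be checked numerically. The following is a minimal sketch for a hypothetical toy problem that is not taken from the text: one state with $f_1 = u$ and integrand $g = x^2 + u^2$, so that $f_2 \equiv g$. It assumes the sign convention $H = -(p_1 f_1 + p_2 f_2)$ with $p_2 = 1$; the names `rhs`, `hamiltonian`, and `rk4`, and the starting values, are illustrative.

```python
import numpy as np

def rhs(z):
    """Right sides of the state and adjoint equations for z = (x, p)."""
    x, p = z
    u = -p / 2.0                      # maximizes H = -(p*u + x**2 + u**2) over u
    # x' = f_1 = u;  p' = -p_2 * d(g)/dx = -2*x, with p_2 = 1 (assumed)
    return np.array([u, -2.0 * x])

def hamiltonian(z):
    """H evaluated at the maximizing control."""
    x, p = z
    u = -p / 2.0
    return -(p * u + x**2 + u**2)

def rk4(z, h, n):
    """Classical fourth-order Runge-Kutta integration, returning all n+1 points."""
    traj = [z]
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(z)
    return np.array(traj)

# Arbitrary starting point (x, p) = (1.0, 1.5), integrated over 2 time units.
traj = rk4(np.array([1.0, 1.5]), h=0.001, n=2000)
H = np.array([hamiltonian(z) for z in traj])
spread = H.max() - H.min()            # should be at the level of integration error
```

Here $H$ reduces to $p^2/4 - x^2$, and its time derivative vanishes identically along any solution of the two equations, so `spread` measures only the integration error. The sketch does not impose end conditions; it checks the constancy property alone, not optimality for a particular final state.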
Thus the maximum principle asserts that, for a system satisfying equations (9.2), the optimal control that minimizes $J$ also maximizes $H$, where $-H = \sum_{i=1}^{N+1} p_i f_i$ and where the $p_i$ are defined by (9.16). As can now be seen from (9.10), Bellman's equation is the Hamilton-Jacobi equation, i.e., along an extremal
\[ \frac{\partial S}{\partial T} + H^* = 0, \tag{9.19} \]
and the adjoint equations are the canonical equations of Hamilton. In other words, (9.16), the definition of $H$, and the identification of $c_i$ with $x_i$ yield
\[ \dot{p}_i(t) = \frac{\partial H}{\partial x_i}, \qquad \dot{x}_i = -\frac{\partial H}{\partial p_i} \qquad (i = 1, \ldots, N+1). \tag{9.20} \]
9.3. Comparison of Optimal Shutdown Criteria
As discussed in Section 2.5, there are two general types of control criterion functionals. They are the averaged or integral type, of which
\[ \min_{u} J[u] = \min_{u} \int_0^T L(\bar{x}, u)\,dt \tag{9.21} \]
is typical. Then there is the terminal control criterion, of which optimal xenon shutdown, as presented in the foregoing chapters, is a case in point. For the latter, it is desired to minimize a “cost” functional of the final state only. That is,
\[ \min_{u} J_1[u] = \min_{u} \Phi(\bar{x}(T), u) \quad \text{or} \quad \min_{u} \Psi(\bar{x}(T), u), \tag{9.22} \]
where $\bar{x}(T)$ is the state of the system at the termination of control and $T$ is the shutdown control duration.
Now, to appreciate how the interaction of the constraints influences the character of the optimal xenon shutdown programs, consider initially the xenon minimax criterion, problem (a), with no xenon override constraint but with given flux constraints. It should be noted that any formulation with no constraints whatever is not physically meaningful, as discussed in Section 4.1. Then for no xenon override constraint (complete xenon override capability) but with bounded flux, an optimal shutdown program and a final state $(x_T, y_T)$ can be determined which corresponds to a given xenon “minimax” $\Phi(x_T, y_T)$. To attain the final state $(x_T, y_T)$, one can proceed along an extremal from the initial equilibrium state over a portion consisting of sawtooth segments, which are characteristic of the flux constraints and are depicted in Figs. 8.7 to 8.11. Or, the same final state $(x_T, y_T)$ can be achieved if the flux constraint is relaxed to allow part of the xenon override constraint boundary line to become part of the extremal. The optimal flux $u^*$ is continuous (exponential) and exceeds the upper flux bound over this portion of the extremal. This is shown in Figs. 8.4 and 8.5 and discussed in Section 8.3.
Whether to choose the sawteeth or the xenon override constraint boundary as part of the extremal depends on the particular reactor system. If the reactor is operating at an equilibrium power close to its upper power bound, then the sawteeth must be used, since the flux constraint cannot be relaxed. On the other hand, if the reactor is operating at a conservatively rated equilibrium power, including the xenon override constraint as part of the extremal would eliminate a number of pulses, as can be seen from Figs. 8.7 to 8.11. This would perhaps ease the practical difficulties of requiring a complex shutdown control system, and would allow the reactor to exceed rated power temporarily, which should cause no undue harm.
The fact that the control criterion or cost functional depends only on the final state, so that the cost of control during the shutdown phase per se is assumed to be zero, may not reflect the actual control cost well. An additional cost associated with the shutdown phase can be readily identified as the control mechanism effort used to pulse the control rods. This is developed by first rewriting Bellman’s equation (5.17) for either problem (a) or (b): to find the optimal control policy $u^*$ and the corresponding control functional $F = \min_u \Phi$ or $F = \min_u \Psi$ that satisfies
\[ \frac{\partial F(\bar{c}, T)}{\partial T} = \min_{u} \nabla F \cdot \dot{\bar{c}}(u), \qquad F(\bar{c}, 0) = \Phi(\bar{c}) \ \text{or} \ \Psi(\bar{c}). \tag{9.23} \]