Optimization of Stochastic Systems Topics in Discrete-Time Systems
MATHEMATICS IN SCIENCE AND ENGINEERING A SERIES OF MONOGRAPHS AND TEXTBOOKS
Edited by Richard Bellman, University of Southern California

1. TRACY Y. THOMAS. Concepts from Tensor Analysis and Differential Geometry. Second Edition. 1965
2. TRACY Y. THOMAS. Plastic Flow and Fracture in Solids. 1961
3. RUTHERFORD ARIS. The Optimal Design of Chemical Reactors: A Study in Dynamic Programming. 1961
4. JOSEPH LASALLE and SOLOMON LEFSCHETZ. Stability by Liapunov's Direct Method with Applications. 1961
5. GEORGE LEITMANN (ed.). Optimization Techniques: With Applications to Aerospace Systems. 1962
6. RICHARD BELLMAN and KENNETH L. COOKE. Differential-Difference Equations. 1963
7. FRANK A. HAIGHT. Mathematical Theories of Traffic Flow. 1963
8. F. V. ATKINSON. Discrete and Continuous Boundary Problems. 1964
9. A. JEFFREY and T. TANIUTI. Non-Linear Wave Propagation: With Applications to Physics and Magnetohydrodynamics. 1964
10. JULIUS T. TOU. Optimum Design of Digital Control Systems. 1963
11. HARLEY FLANDERS. Differential Forms: With Applications to the Physical Sciences. 1963
12. SANFORD M. ROBERTS. Dynamic Programming in Chemical Engineering and Process Control. 1964
13. SOLOMON LEFSCHETZ. Stability of Nonlinear Control Systems. 1965
14. DIMITRIS N. CHORAFAS. Systems and Simulation. 1965
15. A. A. PERVOZVANSKII. Random Processes in Nonlinear Control Systems. 1965
16. MARSHALL C. PEASE, III. Methods of Matrix Algebra. 1965
17. V. E. BENES. Mathematical Theory of Connecting Networks and Telephone Traffic. 1965
18. WILLIAM F. AMES. Nonlinear Partial Differential Equations in Engineering. 1965
19. J. ACZEL. Lectures on Functional Equations and Their Applications. 1966
20. R. E. MURPHY. Adaptive Processes in Economic Systems. 1965
21. S. E. DREYFUS. Dynamic Programming and the Calculus of Variations. 1965
22. A. A. FEL'DBAUM. Optimal Control Systems. 1965
23. A. HALANAY. Differential Equations: Stability, Oscillations, Time Lags. 1966
24. M. NAMIK OGUZTORELI. Time-Lag Control Systems. 1966
25. DAVID SWORDER. Optimal Adaptive Control Systems. 1966
26. MILTON ASH. Optimal Shutdown Control of Nuclear Reactors. 1966
27. DIMITRIS N. CHORAFAS. Control System Functions and Programming Approaches. (In Two Volumes.) 1966
28. N. P. ERUGIN. Linear Systems of Ordinary Differential Equations. 1966
29. SOLOMON MARCUS. Algebraic Linguistics; Analytical Models. 1967
30. A. M. LIAPUNOV. Stability of Motion. 1966
31. GEORGE LEITMANN (ed.). Topics in Optimization. 1967
32. MASANAO AOKI. Optimization of Stochastic Systems. 1967

In preparation
A. KAUFMANN. Graphs, Dynamic Programming, and Finite Games
MINORU URABE. Nonlinear Autonomous Oscillations
A. KAUFMANN and R. CRUON. Dynamic Programming: Sequential Scientific Management
Y. SAWARAGI, Y. SUNAHARA, and T. NAKAMIZO. Statistical Decision Theory in Adaptive Control Systems
F. CALOGERO. Variable Phase Approach to Potential Scattering
J. H. AHLBERG, E. N. NILSON, and J. L. WALSH. The Theory of Splines and Their Applications
HAROLD J. KUSHNER. Stochastic Stability and Control
Optimization of Stochastic Systems Topics in Discrete-Time Systems
MASANAO AOKI Department of Engineering University of California Los Angeles, California
1967 ACADEMIC PRESS New York • London
COPYRIGHT © 1967, BY ACADEMIC PRESS INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD.
Berkeley Square House, London W.1
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30117
PRINTED IN THE UNITED STATES OF AMERICA
To M. F. A. and C. A.
Preface

This book is an outgrowth of class notes of a graduate-level seminar on optimization of stochastic systems. Most of the material in the book was taught for the first time during the 1965 Spring Semester while the author was visiting the Department of Electrical Engineering, University of California, Berkeley. The revised and expanded material was presented at the Department of Engineering, University of California, Los Angeles during the 1965 Fall Semester. The systems discussed in the book are mostly assumed to be of discrete-time type with continuous state variables taking values in some subsets of Euclidean spaces. There is another class of systems in which state variables are assumed to take on at most a denumerable number of values, i.e., these systems are of discrete-time discrete-space type. Although the problems associated with the latter class of systems are many and interesting, and although they are amenable to deep analysis on such topics as the limiting behaviors of state variables as time indexes increase to infinity, this class of systems is not included here, partly because there are many excellent books on the subject and partly because inclusion of these materials would easily double the size of the book. The readers are referred to Refs. 47a, 52, 58, 63a, 74a and the books by K. L. Chung, J. G. Kemeny et al., and R. S. Varga listed in the Bibliography. Following the introductory remarks and simple one-dimensional examples to indicate the types of problems dealt with in the book, the procedures for deriving optimal Bayesian control policies for discrete-time stochastic systems are developed systematically in Chapters II through IV. Those readers who are being exposed to the types of problems in the examples in Chapter I for the first time should glance over these examples without unduly concerning themselves with the question of how the optimal controls are derived and then come back to them after reading Chapters II and III. Chapter II treats a class of stochastic control systems such that the complete information on the random variables in the system descriptions is available through their joint probability distribution functions.
Such systems are called purely stochastic. Chapter III treats a class of stochastic systems in which the joint probability distribution functions of the random variables are parametrized by unknown parameters in known parameter spaces. Such systems are called parameter adaptive. Chapter IV presents the most general formulation of optimal Bayesian optimization problems in the book and is a generalization of the material in Chapters II and III. Advanced readers may go directly to Chapter IV to see the general mathematical formulation used. The material in Chapters II and III is included primarily for pedagogical purposes. Since optimal control problems often involve estimation problems as subproblems, and since the topic is of interest in its own right, Chapter V is devoted to discussions of estimation problems of linear and nonlinear systems. Chapter VI concerns the convergence questions in the Bayesian optimization method and includes material on stochastic observability of systems. Some of the material in this chapter is relevant to learning systems. Chapter VII presents approximations in control and estimation problems and current topics such as various suboptimal estimation schemes and construction of suboptimal control policies for adaptive systems. Control problems discussed are mostly of finite duration N. The behaviors of systems as N → ∞ are only touched upon in Chapters VIII and IX. Chapter VIII briefly describes the question of stability of stochastic systems. The last section of Chapter VI and this chapter constitute material on the qualitative aspects of discrete-time systems. Needless to say, the concept of optimal controls is meaningful only when the resultant system behaviors are stable. Implicit in this is the assumption that there is at least one control policy which makes the expected value of the criterion function finite. Although this point becomes important when control problems with infinite duration are discussed, the stability question is considered primarily as an application of the martingales discussed in Chapter VI. This and some other topics not contained in previous chapters are mentioned in Chapter IX, and some future problems are also suggested. All my work on stochastic control systems included here has been supported by the Office of Naval Research. I am particularly grateful to Professor G. Estrin who has supported and encouraged my work in this area for many years since I was a graduate student, to Professor R. Bellman who introduced me to problems of optimization and suggested writing the book, to Professors Zadeh and Desoer who gave me the opportunity to visit the University of California, Berkeley and to give
a seminar, and to Professors A. V. Balakrishnan and C. T. Leondes for their support of the seminar conducted at the University of California, Los Angeles. The book has been improved materially as the results of discussion with D. D. Sworder, R. E. Mortensen, A. R. Stubberud, and J. R. Huddle. I want to express my sincere thanks and appreciation to my teachers, colleagues, and students for their help in preparing the book. The following charts are included as an aid for those readers who wish to follow particular topics of interest.

[Chapter-dependency reading charts, not reproducible here: Optimal Bayesian Control; Estimation; Stability; Approximate Methods in Control and Estimation; Use of Sufficient Statistics in Control and Estimation.]
Los Angeles, California December, 1966
MASANAO AOKI
Contents

Preface  ix

CHAPTER I. Introduction
1. Introduction  1
2. Preliminary Examples  4

CHAPTER II. Optimal Bayesian Control of General Stochastic Dynamic Systems  20
1. Formulation of Optimal Control Problems  21
2. Example: Linear Control Systems with Independent Parameter Variations  36
3. Sufficient Statistics  53
4. Discussions  71
Appendix A. Minimization of a Quadratic Form  73
Appendix B. Use of Pseudoinverse in Minimizing a Quadratic Form  74
Appendix C. Calculation of Sufficient Statistics  76
Appendix D. Matrix Identities  79

CHAPTER III. Adaptive Control Systems and Optimal Bayesian Control Policies  81
1. General Problem Statement (Scope of the Discussions)  82
2. Systems with Unknown Noise Characteristics  83
3. Systems with Unknown Plant Parameters  104
4. Systems with Unknown Plant Parameters and Noise Characteristics  116
5. Sufficient Statistics  117
6. Method Based on Computing Joint Probability Density  120
7. Discussions  125

CHAPTER IV. Optimal Bayesian Control of Partially Observed Markovian Systems  128
1. Introduction  128
2. Markov Properties  132
3. Optimal Control Policies  140
4. Derivation of Conditional Probability Densities  142
5. Examples  143

CHAPTER V. Problem of Estimation  154
1. Least-Squares Estimation  155
2. Maximum Likelihood Estimation  168
3. Optimal Bayesian Estimation  173
Appendix. Completion of Squares  195

CHAPTER VI. Convergence Questions in Bayesian Optimization Problems  197
1. Introduction  197
2. Convergence Questions: A Simple Case  199
3. Martingales  202
4. Convergence Questions: General Case  204
5. Stochastic Controllability and Observability  209

CHAPTER VII. Approximations  223
1. Approximately Optimal Control Policies for Adaptive Systems  224
2. Approximation with Open-Loop Feedback Control Policies  241
3. Sensitivity and Error Analysis of Kalman Filters  246
4. Estimation of State Vectors by a Minimal-Order Observer  250
5. Suboptimal Linear Estimation by State Vector Partition Method: Theory  265
6. Suboptimal Estimation by State Vector Partition: An Example  269
Appendix A. Derivation of the Recursion Formula for Open-Loop Feedback Control Policies (Section 2)  276
Appendix B. Derivation of the Constraint Matrix Equations (Section 4)  278
Appendix C. Computation of Δr(i) (Section 6)  279

CHAPTER VIII. Stochastic Stability  282
1. Introduction  282
2. Stochastic Lyapunov Functions as Semimartingales  284
3. Examples  288
Appendix. Semimartingale Inequality  290

CHAPTER IX. Miscellany  291
1. Probability as a Performance Criterion  291
2. Min-Max Control Policies  298
3. Extensions and Future Problems  300

Appendix I. Some Useful Definitions, Facts, and Theorems from Probability Theory  309
Appendix II. Pseudoinverse  318
Appendix III. Multidimensional Normal Distributions  325
Appendix IV. Sufficient Statistics  333

Bibliography  339
References  339
List of Symbols  347
Author Index  349
Subject Index  352
Chapter I
Introduction
1. Introduction
There is a wide range of engineering problems in which we want to control physical equipment or ensembles of such equipment. These problems may range from a relatively simple problem of controlling a single piece of equipment, such as a motor, to a very complex one of controlling a whole chemical plant. Moreover, we want to control them in the best, or nearly best, possible manner with respect to some chosen criterion or criteria of optimality. These criteria are usually referred to as the performance indices, or criterion functions (functionals), etc. In each of these control problems, we are given a physical system (a plant) that cannot be altered, and a certain amount of key information on the plant and the nature of the control problem. The information on control problems may be classified loosely into four somewhat interrelated classes*: (1) requirements on the over-all control systems to be synthesized, (2) characteristics of the plants, (3) characteristics of the controllers to be used, and (4) permissible interactions between the controllers and the plants. The first class of information will include such things as the desired responses of the plants, which may be given indirectly by the performance indices or directly in terms of the desired outputs of the plants, such as the requirement that outputs of plants follow inputs exactly.
* Superscript numbers refer to the references at the end of this book.
In the second class will be included descriptions of the dynamical behaviors of given plants. For example, plants may be governed by linear or nonlinear ordinary differential equations, difference equations, or by partial differential equations, the last being the case for distributed parameter systems. This class may also include information available on plant parameters and on random disturbances affecting the plant behavior, such as plant time-constant values, probability distribution functions of random noises acting on the outputs of plants, or random variations of some plant characteristics, and so on. Available controllers may be limited in amplitude or in total energy available for control purposes. Controllers to be used may be capable of storing certain amounts of information fed into them. Their complexities may also be constrained. For example, for some reason we may want to use only linear controllers, or we may want to limit their complexities by allowing no more than a specified number of components, such as integrators and so on. This information is given by the third class. Finally, the fourth class may include specifications on the types of feasible measurements to be performed on plants, on the ways actuators can influence plant behaviors, and generally on the way information on the states of plants is fed back to the controllers and descriptions of the class of inputs the controllers are expected to handle, etc. The natures and difficulties of optimal control problems, therefore, vary considerably, depending on the kinds of available information in each of these four categories. The theory of optimal control has reached a certain level of maturity, and we now possess such theoretical tools as Pontryagin's maximum principle, dynamic programming,20-22 functional analysis, RMS filtering and prediction theory,98 etc., in addition to the classical control theory, to synthesize optimal control systems, given the necessary information for the problems. However, one major shortcoming of these theoretical tools is that they assume "perfect" information for the problems to be solved. Namely, for such theories to be applicable one needs information such as the equation for a system to be controlled, the mechanism by which the system is observed, the statistical properties of internally and externally generated noises affecting system performance, if any, the criterion of performance, and so on. In other words, when all pertinent information on the structures, parameter values, and/or nature of random disturbances affecting the system performances is available, the problem of optimally controlling
such systems can, in principle, be solved. Such a theory of optimal control might be termed the theory of optimal control under perfect information. In reality, the "perfect information" situation is never true, and one needs a theory of control which allows acceptable systems to be synthesized even when one or more pieces of key information required by the current optimal control theory are lacking. This book is intended as an attempt to offer partial answers to the defects of "perfect information" optimal control theories. It primarily discusses optimal control problems with varying assumptions on items in Classes 2 and 4, and with relatively standard assumptions on items in Classes 1 and 3. The main objective of the present book, therefore, may be stated as the unified investigation of optimal stochastic control systems, including systems where some information needed for optimal controller synthesis is missing and is to be obtained during the actual controlling of the systems. In this book we are concerned with closed-loop optimal control policies of stochastic and adaptive control systems. More detailed discussion on the nature of optimal controls is found in Section 1 of Chapter II. Although closed-loop control policies and open-loop control policies are equivalent in deterministic systems, they are quite different in systems involving random elements of some kind. For an elementary discussion of this point see, for example, S. Dreyfus. Further discussions are postponed until Section 1, A of Chapter II. Whatever decision procedures controllers employ in supplying the missing information must, of course, be evaluated by the consequences reflected in the qualities of control in terms of the stated control objectives or chosen performance indices. Statistical decision theory29,115 will have a large part to play in synthesizing optimal controllers. Papers on the theoretical and computational aspects of optimal stochastic and adaptive control problems began to appear about 1960.3,21,55,60,61 In particular, in a series of four papers on dual control theory, Fel'dbaum recognized the importance of statistical decision theory. The major part of the present book is concerned with the question of how to derive optimal Bayesian control policies for discrete-time control systems. The derivation is somewhat different from that of Fel'dbaum, however, and is partly based on the method suggested by Stratonovich. For similar or related approaches see Refs. 2, 54, 105a, 124, 132, 133, 141.
2. Preliminary Examples

In order to introduce the topics of the next three chapters and to illustrate the kinds of problems encountered there, very simple examples of optimal control problems are discussed in this section without showing in detail how the indicated optimal controls are derived, before launching into detailed problem formulations and their solutions. These simple examples will also be convenient in comparing the effects on the complexities of optimal control policies of various assumptions on the systems. The readers are recommended to verify these optimal controls after becoming familiar with the materials in Chapters II and III. The plant we consider in these examples is described by the first-order scalar difference equation

x_{i+1} = a x_i + b u_i,   x_0 given,   u_i ∈ (−∞, ∞),   0 ≤ i ≤ N − 1     (1)

where x, a, b, and u are all taken to be scalar quantities. The criterion function is taken to be

J = x_N^2

That is, a final-value control problem of the first-order system is under consideration. We will consider only nonrandomized control policies in the following examples. The questions of randomized controls versus nonrandomized controls will be discussed in the next chapter. For the purpose of comparison, a deterministic system is discussed in Example 1, where the plant parameters a and b are assumed to be known constants. Later this assumption is dropped and the optimal control of System (1) will be discussed (Examples 2, 5-7), where a and/or b are assumed to be random variables. The effects on the form of control of random disturbances on the plant and observation errors will be discussed in Examples 3 and 4. In all examples the control variable u is taken to be unconstrained. Optimization problems, where the magnitude of the control variable is constrained, are rather complex and are discussed in Ref. 45, for example.
A. OPERATIONS WITH CONDITIONAL PROBABILITIES
Before beginning the discussion of examples, let us list here some of the properties of conditional probabilities (or probability densities when they exist) that are used throughout this book. These are given for probability density functions. Analogous relations are valid in terms of probabilities. Some definitions, as well as a more detailed discussion of expectations, conditional expectations, and other useful facts and theorems in the theory of probabilities, are found in Appendix I, at the end of this book. There are three basic operations on conditional probability density functions that are used constantly. The first of these is sometimes referred to as the chain rule:

p(a, b | c) = p(b | c) p(a | b, c)     (2)
Equation (2) is easily verified from the definition of conditional probability densities. The second operation is the integrated version of (2):

p(a | c) = ∫ p(b | c) p(a | b, c) db     (3)

This operation is useful when it is easier to compute p(b | c) and p(a | b, c) than to compute p(a | c) directly. For example, consider a system with a plant equation
x_{i+1} = α x_i + b u_i + ξ_i     (4)

where α is a random system parameter. Assuming that p(α | x_i) is available, this formula is used to compute p(x_{i+1} | x_i), since p(x_{i+1} | x_i, α) is easily obtained from the plant equation (4) if the probability density p(ξ_i) is assumed known. The last of the three basic operations is used to compute certain conditional probability densities when it is easier to compute those conditional probability densities where some of the variables and the conditioning variables are interchanged. This is known as Bayes' formula:

p(a | b, c) = p(a | b) p(c | a, b) / ∫ p(a | b) p(c | a, b) da     (5)

or its simpler version

p(a | b) = p(a) p(b | a) / ∫ p(a) p(b | a) da     (6)
The Bayes formula is used, for example, to compute p(x_i | y_i) given p(y_i | x_i), where y_i is the observed value of x_i.
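The three operations (2), (3), and (5) are easy to check numerically on a small discrete model. The following sketch is not from the book; it builds an arbitrary joint distribution p(a, b, c) over a few discrete values (sums take the place of the integrals) and verifies the chain rule, the marginalization formula, and Bayes' formula directly from the definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint probability table p(a, b, c) over small discrete ranges.
p_abc = rng.random((3, 4, 2))
p_abc /= p_abc.sum()

p_c = p_abc.sum(axis=(0, 1))                 # p(c)
p_bc = p_abc.sum(axis=0)                     # p(b, c)
p_ac = p_abc.sum(axis=1)                     # p(a, c)

p_ab_given_c = p_abc / p_c                   # p(a, b | c)
p_b_given_c = p_bc / p_c                     # p(b | c)
p_a_given_bc = p_abc / p_bc                  # p(a | b, c)
p_a_given_c = p_ac / p_c                     # p(a | c)

# (2) chain rule: p(a, b | c) = p(b | c) p(a | b, c)
assert np.allclose(p_ab_given_c, p_b_given_c * p_a_given_bc)

# (3) marginalization: p(a | c) = sum_b p(b | c) p(a | b, c)
assert np.allclose(p_a_given_c, (p_b_given_c * p_a_given_bc).sum(axis=1))

# (5) Bayes' formula: p(a | b, c) = p(a | b) p(c | a, b) / sum_a p(a | b) p(c | a, b)
p_ab = p_abc.sum(axis=2)                     # p(a, b)
p_a_given_b = p_ab / p_ab.sum(axis=0)        # p(a | b)
p_c_given_ab = p_abc / p_ab[:, :, None]      # p(c | a, b)
numer = p_a_given_b[:, :, None] * p_c_given_ab
assert np.allclose(p_a_given_bc, numer / numer.sum(axis=0))

print("chain rule, marginalization, and Bayes' formula all check out")
```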
In this book the notation E(·) is used for the expectation operation. A detailed discussion of the E(·) operation can be found in Appendix I. This is a linear operation so that, given two random variables X and Y with finite expectations and two scalar quantities a and b,

E(aX + bY) = a E(X) + b E(Y)     (7)

This formula is also valid when E(X) and/or E(Y) is infinite, provided the right-hand side of (7) is well defined. Another useful formula is

E(X^2) = [E(X)]^2 + var X     (8)

where var X is the variance of X, which is defined to be

var X = E(X − EX)^2     (9)

B. EXAMPLE 1. DETERMINISTIC CONTROL SYSTEM
Suppose we have a scalar deterministic control system described by the difference equation (1) with a and b known and observed by

y_i = x_i,   0 ≤ i ≤ N − 1     (10)

Such a system is drawn schematically in Fig. 1.1. Equation (10) shows that the state of the system is observed exactly. That is, the control system of Example 1 is deterministic, completely specified, and its state is exactly measured. This final-value control problem has a very simple optimal control policy. Since

J = x_N^2 = (a x_{N−1} + b u_{N−1})^2

clearly an optimal control variable at time N − 1, denoted by u*_{N−1}, is given by

u*_{N−1} = −a x_{N−1}/b     (11)

Fig. 1.1. System with deterministic plant and with exact measurement. a, b are known constants.
u_0*, u_1*, ..., u*_{N−2} are arbitrary, and min J = 0. Actually, in this example we can choose any one or several of the N control variables u_0, u_1, ..., u_{N−1} appropriately to minimize J. For the purpose of later comparisons we will consider the policy given by (11) and choose u_i* = −a x_i/b, i = 0, 1, ..., N − 1. From (11) we see that this optimal control policy requires, among other things, that (i) a and b of (1) be exactly known, and that (ii) x_{N−1} be exactly observable as indicated by (10). When these assumptions are not both satisfied, the optimal control problem of even such a simple system is no longer trivial. Optimal control problems without Assumptions (i) and/or (ii) will be discussed later. Now let us discuss the optimal control problem of a related stochastic system where the plant time-constant a of (1) is assumed to be a random variable.
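As a quick illustration of Example 1, the following sketch (not from the book; the numbers a = 0.8, b = 0.5, x_0 = 2.0, N = 5 are arbitrary) applies the policy u_i = −a x_i/b of (11) at every stage and confirms that the state is driven to zero immediately, so that J = x_N^2 = 0.

```python
a, b, x0, N = 0.8, 0.5, 2.0, 5   # hypothetical known plant constants and horizon

x = x0
for i in range(N):
    u = -a * x / b          # optimal control (11) applied at every stage
    x = a * x + b * u       # plant equation (1)

print("x_N =", x, " J =", x**2)   # both are 0 (up to round-off)
```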
C. EXAMPLE 2. STOCHASTIC CONTROL SYSTEM: SYSTEM WITH RANDOM TIME CONSTANT
Consider a discrete-time control system

x_{i+1} = a_i x_i + b u_i,   x_0 given,   u_i ∈ (−∞, ∞)     (12)

y_i = x_i,   0 ≤ i ≤ N − 1     (13)

where {a_i} is a sequence of independently and identically distributed random variables with known mean θ and known variance σ^2. This system is a slight modification of the system of Example 1. It is given schematically in Fig. 1.2.

Fig. 1.2. System with random plant and with exact measurement. The a_i are independently and identically distributed random variables with known mean and variance; b is a known constant.

The criterion function is still the same x_N^2. Since x_N
is a random variable now, an optimal control policy is a control policy which minimizes the expected value of J, EJ. Consider the problem of choosing u_{N−1} at the (N − 1)th control stage. Since

E(x_N^2) = E[E(x_N^2 | x_0, ..., x_{N−1}, u_0, ..., u_{N−1})]     (14)

where the outer expectation operation is taken with respect to the random variables x_0, x_1, ..., x_{N−1},* E(x_N^2) is minimized by minimizing the inner conditional expectation with respect to u_{N−1} for every possible collection of x_0, ..., x_{N−1}, u_0, ..., u_{N−1}. Now

E(x_N^2 | x_0, ..., x_{N−1}, u_0, ..., u_{N−1}) = E[(a_{N−1} x_{N−1} + b u_{N−1})^2 | x_0, ..., x_{N−1}, u_0, ..., u_{N−1}]
= (θ x_{N−1} + b u_{N−1})^2 + σ^2 x_{N−1}^2     (15)

where u_{N−1} is taken to be some definite (i.e., nonrandom) function of x_0, x_1, ..., x_{N−1}. In obtaining the last expression in (15), use is made of the basic formulas of the expectation operations (7) and (8). From (15),

u*_{N−1} = −θ x_{N−1}/b     (16)

and

min E(x_N^2 | x_0, ..., x_{N−1}, u_0, ..., u_{N−1}) = σ^2 x_{N−1}^2     (17)
By assumption, σ is a known constant. Therefore, the problem of choosing u_{N−2} is identical to that of choosing u_{N−1}. Namely, instead of choosing u_{N−1} to minimize E(x_N^2), u_{N−2} is now chosen to minimize σ^2 E(x_{N−1}^2). Thus it is generally seen that each control stage can be optimized separately with

u_i* = −θ x_i/b,   0 ≤ i ≤ N − 1     (18)

and

min_{u_0, ..., u_{N−1}} EJ = σ^{2N} x_0^2     (19)

This problem can also be treated by a routine application of dynamic programming.20 Define

I_{N−n}(x) = min_{u_n, ..., u_{N−1}} E(x_N^2 | x_n = x at time n)     (20)

* Since only nonrandomized closed-loop control policies are under consideration, u_0, ..., u_{N−1} are some definite functions of x_0, ..., x_{N−1} for any given control policy.
I_{N−n}(x) is the expected value of x_N^2 starting from x at time n and employing an optimal sequence of controls u_n, ..., u_{N−1}. Then, invoking the principle of optimality, I_{N−n} satisfies the functional equation

I_{N−n}(x_n) = min_{u_n} E[I_{N−n−1}(x_{n+1}) | x_n, u_n]     (21)

where x_{n+1} = a_n x_n + b u_n. To solve (21), it is easily seen that I_{N−n}(x_n) is quadratic in x_n; therefore, put

I_{N−n}(x_n) = Q_{N−n} x_n^2 + μ_{N−n}     (22)

where the Q's and μ's are to be determined. Since I_0(x_N) = x_N^2, we have

Q_0 = 1,   μ_0 = 0     (23)

From (21)-(23) one obtains

Q_n = σ^{2n}     (24)

μ_n = 0     (25)

therefore

min_{u_0, ..., u_{N−1}} E(x_N^2) = σ^{2N} x_0^2

with

u_i* = −θ x_i/b,   0 ≤ i ≤ N − 1     (26)
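A short simulation makes the result of Example 2 concrete. The sketch below is not from the book; the values θ = 0.9, σ = 0.3, b = 1.0, x_0 = 1.0, N = 4 are arbitrary. It applies the policy u_i = −θ x_i/b of (18) to the plant (12) with i.i.d. time constants a_i and compares the Monte Carlo estimate of E(x_N^2) with the closed-form value σ^{2N} x_0^2 of (19).

```python
import numpy as np

theta, sigma, b, x0, N = 0.9, 0.3, 1.0, 1.0, 4   # hypothetical problem data
trials = 200_000
rng = np.random.default_rng(1)

x = np.full(trials, x0)
for i in range(N):
    a_i = rng.normal(theta, sigma, size=trials)  # any i.i.d. law with this mean and variance works
    u = -theta * x / b                           # optimal policy (18)
    x = a_i * x + b * u                          # plant equation (12)

print("Monte Carlo E[x_N^2]:", (x**2).mean())
print("theory sigma^(2N) x0^2:", sigma**(2*N) * x0**2)
```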
Comparing (18) with (11) of the previous example, one notices that u_0, u_1, ..., u_{N−2} are no longer arbitrary and that the mean θ is regarded as the "a" of the deterministic system. If one considers a system associated with (12) where the random variable a_i is replaced by its expected value θ, then we have a deterministic system

x_{i+1} = θ x_i + b u_i,   i = 0, 1, ..., N − 1

If one considers a control problem with this plant equation replacing the original system (12), then from (11) the optimal control policy for this associated system is such that

u_{N−1} = −θ x_{N−1}/b

which turns out to be identical with the optimal control at time N − 1 for the original system (12). This is an example of applying what is
known as the certainty equivalence principle,49,136a where a given stochastic system is replaced by a corresponding deterministic system by substituting expected values for random variables. Sometimes optimal control policies for the deterministic system thus obtained are also optimal for the original stochastic systems. The detailed discussion of this principle is deferred until Chapter II, Section 2. Systems involving randomness of one sort or another are called stochastic to distinguish them from deterministic control systems. The adjective "purely" is used to differentiate stochastic systems with known probability distribution functions or moments, such as mean and variance, from stochastic systems in which some of the key statistical information is lacking or incomplete. Such systems will be called adaptive to differentiate them from purely stochastic systems. The system of this section is therefore a simple example of a purely stochastic control system. One can go a step further in this direction and consider an adaptive system, for example, by assuming that the mean θ is random with a given a priori distribution for θ. Before doing this, let us go back to the basic system of Example 1 and add random disturbances to the state variable measurement (10) and/or to the plant equation (1).
D. EXAMPLE 3. STOCHASTIC CONTROL SYSTEM: SYSTEM WITH NOISY OBSERVATION
Let us now assume that the observations of state variables are noisy. Figure 1.3 is the schematic diagram of this system.

Fig. 1.3. System with deterministic plant and with noisy measurement. a, b are known constants, and η_i are measurement noises.

Later, in Example 4 of this section, as well as in Chapters III and IV, we will consider
several such examples which show that the optimal control problems with noisy observations are substantially more difficult than those with exact state variable observations. In this example, the plant parameters a and b are still assumed given, but instead of (10) we now assume that

y_i = x_i + η_i,   0 ≤ i ≤ N − 1     (27)

where η_i is the noise in the observation mechanism (the observation error random variable of the system at time i). Its first and second moments are assumed given. Otherwise, the system is that of Example 1. Note that it is no longer possible to say, as we did in Example 1, that the control variable of (11) is optimal, since what we know at time N − 1 is the collection y_{N−1}, y_{N−2}, ..., y_0 rather than x_{N−1}, x_{N−2}, ..., x_0; i.e., x_{N−1} is not available for the purpose of synthesizing the control variable u_{N−1}. We must now consider closed-loop control policies where u_i is some deterministic function of the current and past observations on the system state variable and of the past employed controls. That is, the control is taken to be

u_i = φ_i(y_0, ..., y_i, u_0, ..., u_{i−1}),   0 ≤ i ≤ N − 1
and the functions φ_0, φ_1, ..., φ_{N−1} must be chosen to minimize EJ. Control policies are discussed in Section 1, A, Chapter II. Denote the conditional mean and variance of x_i by

E(x_i | y_0, ..., y_i) = μ_i     (28)

and

var(x_i | y_0, ..., y_i) = σ_i^2,   0 ≤ i ≤ N − 1     (29)
Then, from (7), (9), (28), and (29),

E(x_N^2 | y_0, ..., y_{N−1}, u_0, ..., u_{N−1}) = E[(a x_{N−1} + b u_{N−1})^2 | y_0, ..., y_{N−1}, u_0, ..., u_{N−1}]
= (a μ_{N−1} + b u_{N−1})^2 + a^2 σ_{N−1}^2     (30)

By choosing u_{N−1} to minimize (30) for given y_0, ..., y_{N−1}, u_0, ..., u_{N−1}, E(x_N^2) is minimized, since

E(x_N^2) = E[E(x_N^2 | y^{N−1}, u^{N−1})]     (31)

where the outer expectation is with respect to all possible y^{N−1} and u^{N−1}, and where the notation y^{N−1} is used for y_0, ..., y_{N−1} and u^{N−1} for u_0, ..., u_{N−1}.
If σ_{N−1} is independent of u_{N−1}, then

u*_{N−1} = −a E(x_{N−1} | y^{N−1})/b,   u_0, ..., u_{N−2} arbitrary     (32)

is optimal in the sense that this control policy minimizes EJ, and

min EJ = a^2 E(σ_{N−1}^2)     (33)
Note that the problem of choosing u_{N−1} optimally is reduced to that of estimating x_{N−1}, given y^{N−1}, by the conditional mean μ_{N−1}. Later we will see how to generate such estimates using additional assumptions on the observation noises. See, for example, Section 3, Chapter II, and Section 2, Chapter III. Note also that one of the effects of noisy observations is to increase the minimal EJ value by some positive constant value proportional to the variance of the noise.
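The following sketch (not from the book) illustrates Example 3 for a single control stage under an extra assumption the text does not make at this point: x_{N−1} is Gaussian with known mean and variance and the observation noise is Gaussian, so that the conditional mean μ_{N−1} of (28) takes the familiar linear form. It checks by simulation that the policy (32) attains the value a^2 E(σ_{N−1}^2) of (33), and that it beats the cruder policy u = −a y_{N−1}/b that treats the noisy observation as if it were the exact state.

```python
import numpy as np

a, b = 0.8, 0.5                 # known plant constants (hypothetical values)
m, s2 = 1.0, 1.0                # assumed prior: x_{N-1} ~ N(m, s2)
r2 = 0.5                        # assumed observation-noise variance for eta_{N-1}
rng = np.random.default_rng(2)
trials = 400_000

x = rng.normal(m, np.sqrt(s2), trials)          # true state x_{N-1}
y = x + rng.normal(0.0, np.sqrt(r2), trials)    # noisy observation (27)

# Conditional (posterior) mean and variance of x_{N-1} given y under the Gaussian assumption.
k = s2 / (s2 + r2)
mu = m + k * (y - m)            # mu_{N-1} of (28)
post_var = s2 * r2 / (s2 + r2)  # sigma^2_{N-1} of (29), independent of y here

u_opt = -a * mu / b             # policy (32)
u_naive = -a * y / b            # plugging in the raw observation instead

cost_opt = (a * x + b * u_opt) ** 2
cost_naive = (a * x + b * u_naive) ** 2

print("E J, policy (32):     ", cost_opt.mean())
print("theory a^2 E sigma^2: ", a**2 * post_var)
print("E J, naive u=-a y/b:  ", cost_naive.mean())
```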
E. EXAMPLE 4. STOCHASTIC CONTROL SYSTEM: SYSTEM WITH ADDITIVE PLANT NOISE
The system to be considered next is that of Example 1, with random disturbances added to the plant equation:

x_{i+1} = a x_i + b u_i + ξ_i,   x_0 given     (34)

y_i = x_i,   0 ≤ i ≤ N − 1     (35)

where the ξ_i are independent with

E(ξ_i) = 0     (36)

var(ξ_i) = σ_ξ^2     (37)

See Fig. 1.4 for the schematic diagram. Proceeding as in Example 2,

E(x_N^2 | x_0, ..., x_{N−1}, u_0, ..., u_{N−1}) = (a x_{N−1} + b u_{N−1})^2 + σ_ξ^2     (38)

since

E(x_N | x^{N−1}, u^{N−1}) = a x_{N−1} + b u_{N−1}   and   var(x_N | x^{N−1}, u^{N−1}) = σ_ξ^2
Fig. 1.4. System with deterministic plant, with additive random plant disturbances, and with exact measurement. a, b are known constants, and the ξ_i are random disturbances on the plant.
because the conditional probability density p(x_N | x_{N−1}, u_{N−1}) is given by that of ξ_{N−1} with ξ_{N−1} = x_N − a x_{N−1} − b u_{N−1}. From (38), the optimal policy is given by

u*_{N−1} = −a x_{N−1}/b     (39)
since σ_ξ^2 is a constant independent of u_{N−1}. Observe that the random disturbance in the plant equation has an effect on EJ similar to that of the disturbance in the observation equation. In both cases the minimum of E(J | y^{N−1}) is increased by an amount proportional to the variance of the disturbance. Since the mean of ξ_i is zero, the system of Example 1 is the deterministic system obtained from the system of Example 4 by replacing ξ_i by its mean, i.e., by applying the certainty equivalence principle to the system. Comparing (11) with (39), the optimal control policy for this system is seen to be identical with that of Example 1. Comparing Example 3 with Example 4, the optimal control policy for Example 4 is seen to be simpler. In Example 3 it is necessary to compute the μ's, whereas the optimal control policy for Example 4 is the same as that of Example 1. As this example indicates, it is typically more difficult to obtain optimal control policies for systems with noisy state vector observations than with exact state vector measurements.
F. EXAMPLE 5. STOCHASTIC CONTROL SYSTEM: SYSTEM WITH UNKNOWN TIME CONSTANT
In Examples 1, 3, and 4, it is essential that the plant time-constant a be known exactly since it appears explicitly in the expressions for
optimal control policies for these systems. In this example, we consider the system described by

x_{i+1} = a x_i + b u_i,   x_0 given,   u_i ∈ (−∞, ∞)     (40)

y_i = x_i + η_i     (41)

0 ≤ i ≤ N − 1     (42)

where a is now assumed to be a random variable with known mean and variance and where the η's are assumed to be independent. It is further assumed that a is independent of the η's and that

E(a) = α     (43)

var(a) = σ_a^2     (44)
where α and σ_a are assumed known. One may interpret the value of the time constant a as a sample from a common distribution function with known mean and variance. Such a situation may arise when the plant under consideration is one of many manufactured in which, due to manufacturing tolerances, the time constant of the plant is known to have a statistical distribution with known mean and variance. The noise in the observation (41) prevents the determination of a exactly by measuring the state variables at two or more distinct time instants. This problem is a simple example of plant-parameter adaptive control systems. Later we consider another parameter adaptive system, in Section H (Example 7). In Example 3, we have derived the optimal control policy when a is a known constant. There we have

u*_{N−1} = −a μ_{N−1}/b

In this example a is not known exactly. In Examples 1 and 3, by comparing (11) and (32), we see that the only change in u_{N−1} when the observations are noisy is to replace x_{N−1} by its conditional mean value μ_{N−1}. In Example 2, where the time constant is chosen independently from the common distribution at each time instant, the time constant a in the optimal control of (11) has been replaced by the mean value θ in the optimal control of (18). Therefore, it is not unreasonable to expect that

u_{N−1} = −E(a | y^{N−1}, u^{N−2}) E(x_{N−1} | y^{N−1}, u^{N−2})/b     (45)

is optimal, where the random variable a is replaced by its a posteriori mean value.
The control of (45) is not optimal. Namely, the optimal control policy for Example 5 cannot be derived by applying the certainty equivalence principle mentioned in Examples 2 and 4. To obtain the optimal control at time N − 1, compute

E(x_N^2 | y^{N−1}, u^{N−1}) = ∫ x_N^2 p(x_N | x_{N−1}, u_{N−1}, a) p(x_{N−1}, a | y^{N−1}, u^{N−1}) dx_N dx_{N−1} da
= ∫ (a x_{N−1} + b u_{N−1})^2 p(x_{N−1}, a | y^{N−1}, u^{N−1}) dx_{N−1} da     (46)
N-I =
E(ax N-I I yN-I ,
uN-I)
and
(46) can be expressed as E(XN 2 I yN-t,
UN-I)
=
(~N-It-
bU N_ I)2
+ };~-I
Therefore, assuming that l:'~_1 is independent of control at time N - 1 is given by
UN-I'
the optimal
By the chain rule, we can write
In Chapter II, we will show that if the observation noises are Gaussian then the conditional probability density function of XN- I , given a, yN-\ and U N- 2, is Gaussian, and that its conditional mean satisfies the recursion equation where fLi = E(xi I a, yi, U i - 1) and where K N - I is a constant independent of y's and u's. We will also show that the conditional variance of XN- I , given a, yN-\ and UN-2, is independent of y's and u's.
I.
16
INTRODUCTION
The conditional mean and the variance of nonlinear functions of a. Therefore, -=F E(XN _ I
~N-I
XN~l'
I a, y N-l, U N- 2) E(a I yN-l,
however, are some UN-I)
showing that the control given by (45) is not optimal. We will take up the questions of computing the optimal control policies for systems with random parameters in Chapter III. G. EXAMPLE
6.
STOCHASTIC CONTROL SYSTEM:
SYSTEM WITH UNKNOWN GAIN
In Examples 1-4 we see that their optimal control policies have the common structure that the random or the unknown quantities are replaced by their (a posteriori) mean values; i.e., the certainty equivalent principle yields the optimal control policies for these examples. The optimal control policy in Example 5, however, does not have this structure. As another example of the latter nature let us consider a stochastic control system U i E (-00 , 00) (47) X H I = ax, + bu, + ti' Xo
given
Yi
(48)
O~i~N-l
= Xi'
where a is a known constant but where b is now assumed to be a random variable, independent of g's with finite mean and variance. The schematic diagram of this system is also given by Fig. 1.4. The plant disturbance g's are assumed to be independently and identically distributed random variables with (49) (50)
O:(:i~N-l
According to the certainty equivalence principle,
U!;_l , we consider the deterministic plant
ill
order to obtain (51)
where bN -
From (11), the optimal
U N-1
I
~
E(b I X N- I)
(52)
for the system (51) is given by (53)
2.
17
PRELIMINARY EXAMPLES
With this control, the conditional expected value of tribution to E I from the last control stage, is given by E(x N 2IX N-'I
UN-I)
+~
E [(ax N-I _ _b b_ ax N-I -
=
N
I
I.
N-I
r.e., the con-
)21 X N-I] (54)
where a~_1
=
I X N- I)
var(b
Let us try another control variable U N-I
= -
bN _ I
b2 N-I
+
02 N-I
(
aXN-I
)
(55)
With this control, E(x N 2[X N-'I
UN-I)
= E [(ax N-I _
N I _ bb+ 02
b2 N-I
ax N-I
N-I
+ SN-I t )21
X N-I]
(56)
Comparing (54) and (56), we see the optimal control for the deterministic system (5 I) is not optimal since the control variable of (55) is better. This is only one of the many subtle points that arise in optimal control of stochastic systems. In Chapter III we will show how to derive such a policy in a routine manner.
H.
EXAMPLE
7.
STOCHASTIC CONTROL SYSTEM:
RANDOM TIME-CONSTANT SYSTEM WITH UNKNOWN MEAN
In Example 2, the random time-constants {aJ are assumed to have known means. Now, we assume the mean is unknown. The system is described by u, Yi
=
Xi
E (-
OCJ, OCJ)
(57) (58)
where' a/s are independently and identically distributed Gaussian random variables with mean e and variance a 2 , where a is assumed known but e is assumed to be a random variable.
I.
18
INTRODUCTION
I t is convenient to introduce a notation 2(·) to denote the distribution of a random variable. Using this notation, it is assumed that (59)
°
where a is given and where N(a, b) is a standard notation for a normal distribution with mean a and variance b. The unknown mean is assumed to have the a priori distribution 2 0(8)
=
N(8 0
,
u 02 )
with 00 and U o given. This type of control problem, which is stochastic but not purely stochastic, is called adaptive or more precisely parameter-adaptive to distinguish it from purely stochastic problems. If, instead of assuming that the mean of a is known in Example 5, we assume that the mean is a random variable with given a priori distribution, then we obtain another example of adaptive control system. The optimal control policy for parameter adaptive control systems are discussed in Section 3, Chapter III.
I.
EXAMPLE
8.
SYSTEM WITH UNKNOWN NOISE
Most parts of this book are concerned with a class of control policies known as closed-loop Bayes control policies.w Loosely speaking, the Bayesian approach to the optimal control problems requires the assumption of a priori probability distribution functions for the unknown parameters. These distribution function are updated by the Bayes rule, given controls and state vector measurements up to the current time. The Bayes approach is examined in some detail in Chapter VI. The min-max approach does not assume the probability distribution functions for the unknown parameters. In Chapter IX, we will briefly discuss min-max control policiesw and their relationship with Bayes control policies. As an illustration, consider a system with perfect observation:
+ o + ~o
Xl =
ax o
Yo
X o given
=
U
where it is assumed that a is known and that
to is a random variable with
with probability p with probability I - P
where 01 and O2 are given, 01
> O2 •
2.
PRELIMINARY EXAMPLES
19
The criterion function is taken to be
] =
X1
2
=
(aX O
+ Uo + ~O)2
Since] is a function of U o as well as p we write it as ](p, u). The expected value of ] is given as
Therefore, the control given by minimizes E]:
Note that Y1* is maximized when p = 1. When p is known, the control is called the optimal Bayes control for the problem. If p is not given, U o* cannot be obtained. Let us look for the control which makes ] independent of 81 or 82 , Namely, consider Uo given by Uo
Then
Thus, if Uo is employed, X 1 2 is the same regardless of p values. Such a control policy is called an equalizer control policy. 58a ,133 The value of ] is seen to be equal to Y1 * when p = 1. In other words, the control Uo minimizes the criterion function for the worst possible case p = 1. Therefore Uo may be called the min-max control since it minimizes the maximal possible E] value. Comparing Uo and U o*, Uo is seen to be the optimal Bayes control for p = 1. For this example, an equalizer control policy is a min-max control policy, which is equal to the optimal Bayes control policy for the worst possible a priori distribution function for the unknown parameter 8. It is known that the above statements are true generally when the unknown parameter 8 can take on only a finite number of possible values. When 8 can take an infinite number of values, similar but weaker statements are known to be true. See Chapter IX, Section 2 of this book or Ferguson 58a and Sworder.I'" for details.
Chapter II
Optimal Bayesian Control of General Stochastic Dynamic Systems
In this chapter, we develop a systematic procedure for obtaining optimal control policies for discrete-time stochastic control systems, i.e., for systems where the random variables involved are such that they all have known probability distribution functions, or at least have known first, second, and possibly higher moments. Stochastic optimal control problems for discrete-time linear systems with quadratic performance indices have been discussed in literature under the assumptions that randomly varying systems parameters and additive noises in the plant and/or in the state variable measurements are independent from one sampling instant to the next. 67 ,80 The developments there do not seem to admit any ready extensions to problems where the independence assumption is not valid for random system parameters, nor to problems where distribution functions for noises or the plant parameters contain unknown parameters. In this chapter, a method will be given to derive optimal control policies which can be extended to treat a much larger class of optimal control problems than those mentioned above, such as systems with unknown parameters and dependent random disturbances. This method can also be extended to cover problems with unknown parameters or random variables with only partially known statistical properties. Thus, we will be able to discuss optimal controls of parameter adaptive systems without too much extra effort. The method to be discussed-v-" partly overlaps those discussed by other investigators, notably that of Fel'dbaum.v" Although the method presented here is essentially its equivalent,105a the present method is 20
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
21
believed to be more concise and less cumbersome to apply to control problems. For example, the concept of sufficient statistics'" are incorporated in the method and some assumptions on the systems which lead to simplified formulations are explicitly pointed out. 15 ,16 The evaluations of various expectation operations necessary in deriving optimal control policies are all based on recursive derivations of certain conditional probabilities or probability densities. As a result, the expositions are simpler and most formulas are stated recursively which are easier to implement by means of digital computers.
1. Formulation of Optimal Control Problems
A.
PRELIMINARIES
In this section, purely stochastic problems are considered. Namely, all random variables involved are assumed to have known probability densities and no unknown parameters are present in the system dynamics or in the system observation mechanisms. We consider a control system described by Uk E
Uv , k
= 0, I, ... , N - 1
(1)
where Po(x o) is given and observed by k
= 0, I, ...,N
(2)
and where X k is an n-dimensional state vector at kth time instant, Uk is a p-dimensional control vector at the kth time instant, Uk is the set in the p-dimensional Euclidean vector space and is called the admissible set of controls, t k is a q-dimensional random vector at the kth time instant, Yk is an m-dimensional observation vector at the kth time instant, and YJk is an r-dimensional random vector at the kth time instant. The functional forms of F k and G k are assumed known for all k. Figure 2.1 is the schematic diagram of the control system. The vectors tk and YJk are the random noises in the system dynamics and in the observation device, or they may be random parameters of the system. In this chapter, they are assumed to be mutually independent, unless stated otherwise. Their probability properties are assumed to be known completely. The problem of optimal controls with imperfect probability knowledge will be discussed in the next chapter.
22
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
CONTROLLER WITH
Fig. 2.1.
MEMORY
Schematic diagram of general stochastic control system.
From now on, Eq. (1) is referred to as the plant equation and Eq. (2) is referred to as the state variable observation equation or simply as the observation equation. The performance index is taken to be N
]
=
I
Wk(X k, Uk-I),
(3)
k~l
This form of performance index is fairly general. It contains the performance indices of final-value problems, for example, by putting Wi = 0, i = 1,... , N - 1 and taking W N to be a function of X N only. We use a notation uk to indicate the collection U o , U I , ... , Uk . Similarly x k stands for the collection X O, Xl" .. ' X k • Although in the most general formulation the set of admissible control at time k, Uk' will depend on x k and uk-I, Uk is assumed in this book to be independent of x k , uk-I. a. Optimal Control Policy
One of our primary concerns in the main body of this book is the problem of deriving optimal control policies, in other words, obtaining the methods to control dynamic systems in such a way that some chosen numbers related to system performances are minimized. Loosely speaking, a control policy is a sequence of functions (mappings) which generates a sequence of control actions U o , U I , ... according to some rule. The class of control policies to be considered throughout this book is that of closed-loop control policies, i.e., control policies such that the control Uk at time k is to depend only on the past and current observations yk and on the past control sequences U k- I which are assumed to be also observed. A nonrandomized closed-loop control policy for an N-stage
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
23
control process is a sequence of N control actions Ui , such that each Ui takes value in the set of admissible control Vi' Ui E Vi' 0 ~ i ~ N - 1, depending on the past and current observations on the system Yo , Yl ,... , Yi-l ,Yi and on the past control vectors Uo ,... , Ui-l' Since past controls Uo ,... , Ui-l really depend on Yo ,... , Yi-l' Ui depends on Yo ,... , Yi-l 'Yi . * Thus a control policy c?(u) is a sequence of functions (mappings) cpo , c?l ,..., c?N-l such that the domain of c?i is defined to be the collection of all points
v, E
with
Yj
,
O,s;; j ,s;; 1
where Y j is the set in which the jth observation takes its value, and such that the range of c?i is Vi' Namely, u; = Ui(yi, Ui- 1 ) = c?i(yi) E Vi ." When the value of Ui is determined uniquely from yi, u':", that is when the function c?i is deterministic, we say a control policy is nonrandomized. When c?i is a random transformation from y i, Ui-1 to a point in Vi' such that c?i is a probability distribution on Vi' a control policy is called randomized. A nonrandomized optimal control policy, therefore, is a sequence of mappings from the space of observable quantities to the space of control vectors; in other words, it is a sequence of functions which assigns definite values to the control vectors, given all the past and current observations, in such a way that the sequence minimizes the expected value of J. From (3), using E(·) to denote the expectation operation, the expected value of ] is evaluated as N
E]
=
E
(I
Wk)
lc~l
where Essentially, the method of Fel'dbaum'" consists in evaluating E(Wk ) by
* For the sake of convenience, initially available information on the system is included in the initial observation. t Uo = uo(Yo) = oPo(Yo), ... , u, = U;(yi, Ui- I) = U;(yi, oPo(Yo), ... , oPi-l(yi-I)) = oPi(yi).
24
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where dx k f':, dx o dX1 '" dXk , dyk-l f':, dyo ... dYk-l, and dU k- 1 ~ duo'" dU k_l, and writing p(x k, yk-l) in terms of more elementary probability densities related to (1) and (2). Since we do not follow his method directly, we will not discuss it any further in this chapter. However, in order to give the readers some feeling for and exposure to his method, we give, as an example in Section6, his method of the treatment of a particular class of parameter adaptive systems. The other method, to be developed fully in this and later chapters, evaluates not R k directly but the conditional mean of W k , E(Wk I y k-l, u k- Z)
=
JWk(Xk , Uk-I) P(Xk , Uk- I Iyk-l, uk- Z) dXk dUk_I
(4)
and generates p(x k I yk, Uk-I)
and P(Yk+l I yk, Uk),
0 :;:;; k :;:;; N - 1
recursively. See (21) and (22) for the significance of these expressions.
b. Notations It may be helpful to discuss the notations used in the book here. In the course of our discussions, it will become necessary to compute various conditional probability densities such as P(Xi+1 I yi). As mentioned before, we are interested in obtaining optimal closed-loop control policies; i.e., the class of control policies to be considered is such that the ith control variable U i is to be a function of the past and current observable quantities only, i.e., of yi and Ui- l only, 0 ~ i ~ N - 1. If nonrandomized control policies are used, * then at time i, when the ith control Ui is to be determined as a function of yi as Ui = 1>i(yi), it is the functional form of 1>i that is to be chosen optimally, assuming 1>i-I are known. In other words, 1>i depends on 1>i-I. Note that even though the function 1>i is fixed, 1>i(yi) will be a random variable prior to time i since yi are random variables. It will be shown in the next section that these 1>'s are obtained recursively starting from 1>N-I on down to i is expressed as a function of 1>0 ,...,1>i-l , which is yet to be determined. Therefore, it is sometimes more convenient to express Ui = 1>i(yi) as Ui = Ui(U i-l, yi), whereby the dependence of u i on past controls 1>0 ,...,1>i-1 is explicitly shown by a notational abuse of using Uj for 1>j , 0 :;:;; j :;:;; i. Since Ui is taken to be a measurable function of yi,
* It is shown later on that we need consider only the class of nonrandomized closedloop control policies in obtaining optimal Bayesian control policies.
1.
FORMULATION OF OPTIMAL CONTROL PROBLEMS
25
Of course, one must remember that p(. I yi, u i ) is a function of Ui , among others, which may yet be determined as a function of yi (or equivalently of Ui-1 and yi). To make this explicit, sometimes a subscript 4>i is used to indicate the dependence of the argument on the form of the past and current control, e.g., pq,JXi+l I Xi' yi) = p(Xi+l I Xi'
ic,
= 4>i(yi)).
When randomized control policies are used, the situation becomes more complicated since it is the probability distribution on Vi that is to be specified as a function of Ui-1 and yi; i.e., a randomized control policy is a sequence of mappings 4>0 ,4>1 ,..., 4>N-1 such that 4>i maps the space of observed state vectors yi into a probability distribution on Vi . A class of nonrandomized control policies is included in the class of randomized control policies since a nonrandomized control policy may be regarded as a sequence of probability distributions, each of which assigns probability mass 1 to a point in Vi' 0 ~ 1 ~ N - 1. The question of whether one can really find optimal control policies in the class of nonrandomized control policies is discussed, for example, in Ref. 3. For randomized control policies,
hence P(Yi+1 I yi) is a functional depending on the form of the density function of ui , p(ui I yi). When Ui is nonrandomized, P(Yi+1 I yi) is a functional depending on the value of u, and we write
or simply P(Yi+1 I yi, ui )· The variables Ui or ui are sometimes dropped from expressions such as p(. I yi, u i ) or p(. I yi, ui ) where no confusion is likely to occur. Let (5) be the joint conditional probability that the sequence of the state vectors and observed vectors will lie in the elementary volume dx o'" dx, dyo ... dYi-1 around Xi and yi-\ given a sequence of control specified by 4>i-\ where the notation d(Xi,yi-1)
=
d(x o ,... , Xi ,Yo '''·,Yi-1) (6)
II.
26
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
is used to indicate the variables with respect to which the integrations are carried out. Let (7)
be the conditional probability that the observation at time k lies in the elementary volume dYk about Yk , given Xk • Finally, let (8)
be the probability that the initial condition is in the elementary volume about X o ' Various probability density functions in (5), (7), and (8) are assumed to exist. If not, they must be replaced by Stieltjes integral notations. B.
DERIVATION OF OPTIMAL CONTROL POLICIES
We will now derive a general formula to obtain optimal control policies. At this point, we must look for optimal control policies from the class of closed-loop randomized control policies. a. Last Stage Consider the last stage of control, assuming y N - 1 have been observed and U N- 2 have been determined somehow, and that only the last control variable U N-1 remains to be specified. Since U N-1 appears only in W N , EJ is minimized with respect to U N-1 by minimizing EWN with respect to UN-I' Since (9)
where the outer expectation is with respect to y^{N-1} and u^{N-2}, EW_N is minimized if E(W_N \mid y^{N-1}, u^{N-2}) is minimized for every y^{N-1} and u^{N-2}. One can write

E(W_N \mid y^{N-1}, u^{N-2}) = \int W_N(x_N, u_{N-1}) \, p(x_N, u_{N-1} \mid y^{N-1}, u^{N-2}) \, d(x_N, u_{N-1})   (10)

By the chain rule, the probability density in (10) can be written as

p(x_N, u_{N-1} \mid y^{N-1}, u^{N-2}) = p(u_{N-1} \mid y^{N-1}, u^{N-2}) \, p(x_N \mid u^{N-1}, y^{N-1})   (11)
where

p(x_N \mid u^{N-1}, y^{N-1}) = \int p(x_N \mid x_{N-1}, u^{N-1}, y^{N-1}) \, p(x_{N-1} \mid u^{N-1}, y^{N-1}) \, dx_{N-1}   (12)
If the \xi's and \eta's are mutually independent and independent for each k, i.e., if \xi_0, ..., \xi_{N-1}, \eta_0, ..., \eta_{N-1} are all independent, then, from Eqs. (1) and (2),

p(x_{i+1} \mid x^i, u^i, y^i) = p(x_{i+1} \mid x_i, u_i),   0 \le i \le k - 1   (13)

We will use Eq. (13) throughout this section. Developments are quite similar when this Markov property does not hold; one merely uses the left-hand side of Eq. (13). See Section 2 of Chapter IV for more general discussions of the Markov property. In particular, in (12),

p(x_N \mid x_{N-1}, u^{N-1}, y^{N-1}) = p(x_N \mid x_{N-1}, u_{N-1})

and

p(x_{N-1} \mid u^{N-1}, y^{N-1}) = p(x_{N-1} \mid y^{N-1}, u^{N-2})   (14)

since u_{N-1} affects x_N but not x_{N-1}. Define

p_{N-1}(u_{N-1}) \triangleq p(u_{N-1} \mid y^{N-1}, u^{N-2})

Therefore, if one assumes that (14) is available, then (10) can be written as

E(W_N \mid y^{N-1}, u^{N-2}) = \int \lambda_N \, p_{N-1}(u_{N-1}) \, du_{N-1}   (15)

where

\lambda_N \triangleq \int W_N(x_N, u_{N-1}) \, p(x_N \mid x_{N-1}, u_{N-1}) \, p(x_{N-1} \mid y^{N-1}, u^{N-2}) \, d(x_N, x_{N-1})   (16)
In (16), the probability density p(x N I X N- I ,UN-I) is obtainable from the known probability density function for ~N-l and the plant equation (I) under appropriate assumptions on (1). See for example Eq. (27). The second probability density in (16), P(XN-I I yN-I, U N- 2), is not generally directly available. It will be shown in the next section how it can be generated. For the moment assume that it is available.
Thus \lambda_N is in principle computable as a function of y^{N-1} and u^{N-1}, and hence its minimum with respect to u_{N-1} can in principle be found. Denote this minimizing u_{N-1} by u_{N-1}^*. Define

\gamma_N \triangleq \lambda_N,   \gamma_N^* \triangleq \min_{u_{N-1}} \gamma_N   (17)
Thus, the minimization of EW_N with respect to p_{N-1} is accomplished by that of E(W_N \mid y^{N-1}, u^{N-2}), which is achieved by taking p_{N-1}^*(u_{N-1}) = \delta(u_{N-1} - u_{N-1}^*).† Since \lambda_N is a function of y^{N-1} and u_{N-1}, u_{N-1}^* is obtained as a function of y^{N-1} and u^{N-2}, as desired. See Fig. 2.2 for illustrations of random and nonrandom control policies and the corresponding values of the conditional expectation of W_N. In Eq. (15) the expression p_{N-1}(u_{N-1}) represents a probability density function of u_{N-1} \in U_{N-1}, where the functional form of the density function depends on the history of observations, i.e., on y^{N-1}. The functional form of p_{N-1} specifies the probability p_{N-1}(u_{N-1}) \, du_{N-1} with which a control in the neighborhood of a point u_{N-1} is used in the last control stage. However, we have seen that this generality is not necessary, at least for the last control u_{N-1}, and we can actually confine our search for the optimal u_{N-1} to the class of nonrandomized control policies; i.e., the value of the optimal control vector u_{N-1} will actually be determined, given y^{N-1}, and it is not that merely the form of the probability density will be determined. We can see by similar arguments that the u_i are all nonrandomized, 0 \le i \le N - 1. Thus, we can remove u_{N-1} from
Fig. 2.2. E(W_N | y^{N-1}) versus u_{N-1}: a randomized density p_{N-1}(u_{N-1}) compared with the nonrandomized choice \delta(u_{N-1} - u_{N-1}^*).
I yN_l)
versus
UN_I'
† If u_{N-1}^* is not unique, then the following arguments must be modified slightly. By choosing any one control which minimizes \lambda_N and concentrating the probability mass on it, a nonrandomized control still results.
the probability density function in Eq. (11), and we can deal with p(x_N \mid y^{N-1}) with the understanding that u_{N-1} is uniquely determined by y^{N-1}.
Figure 2.3 illustrates this fact schematically for scalar control variable. A typical PN-l(U) may have a form like Fig. 2.3(a), where UN- 1 is taken to be a closed interval. Optimal PN-l , however, is given by Fig. 2.3(b). A nonrandomized control is such that a point in UN-l is taken with probability 1. If U N-l consists of Points A, B, and C for two-dimensional control vectors, as shown in Fig. 2.4(a), then there are three possible nonrandomized U N- 1 , i.e., U N- 1 given by Point A, Point B, or Point C, whereas a neighborhood of any point in Triangle ABC typically may be chosen with a randomized control policy with probability PN-l(U) du, where du indicates a small area about U in Triangle ABC. This is shown in Fig. 2.4(b).
Fig. 2.3. Schematic representation of randomized and nonrandomized control.

Fig. 2.4. Admissible control variable with the randomized and nonrandomized control policies.

b. Last Two Stages

Putting aside, for the moment, the question of how to evaluate p(x_{N-1} \mid y^{N-1}), let us proceed next to the consideration of optimal control
policies for the last two stages of the process. Assume that y^{N-2} and u^{N-3} are given. The control variable u_{N-2} appears in W_{N-1} and W_N. Since

E[W_{N-1}(x_{N-1}, u_{N-2}) + W_N(x_N, u_{N-1})] = E\{E(W_{N-1} + W_N \mid y^{N-2}, u^{N-3})\}

where the outer expectation is with respect to y^{N-2}, and since a choice of a certain u_{N-2} transforms the problem into the last-stage situation just considered, EJ is minimized by choosing u_{N-2} so that it minimizes E(W_{N-1} + W_N \mid y^{N-2}, u^{N-3}) for every y^{N-2}, and by following this u_{N-2} with u_{N-1}^*. Analogous to (15) we have

E(W_{N-1} \mid y^{N-2}, u^{N-3}) = \int \lambda_{N-1} \, p_{N-2}(u_{N-2}) \, du_{N-2}   (18)

where p_{N-2}(u_{N-2}) \triangleq p(u_{N-2} \mid y^{N-2}, u^{N-3}), and where

\lambda_{N-1} \triangleq \int W_{N-1}(x_{N-1}, u_{N-2}) \, p(x_{N-1} \mid x_{N-2}, u_{N-2}) \, p(x_{N-2} \mid y^{N-2}, u^{N-3}) \, d(x_{N-1}, x_{N-2})   (19)

Also, since

E(W_N \mid y^{N-2}, u^{N-3}) = E[E(W_N \mid y^{N-1}, u^{N-2}) \mid y^{N-2}, u^{N-3}],   y^{N-2} \subset y^{N-1}

This is seen also from

p(\cdot \mid y^{N-2}, u^{N-3}) = \int p(\cdot \mid y^{N-1}, u^{N-2}) \, p(y_{N-1} \mid y^{N-2}, u^{N-2}) \, p(u_{N-2}) \, d(y_{N-1}, u_{N-2})

where use is made of the elementary operations (1) and (2) discussed in Chapter I. The optimal p_{N-2} is such that it minimizes E(W_{N-1} + W_N^*), where the asterisk on W_N indicates that u_{N-1}^* is used for the last control. Now,

\min_{p_{N-2}} E(W_{N-1} + W_N^* \mid y^{N-2}, u^{N-3})
= \min_{p_{N-2}} [E(W_{N-1} \mid y^{N-2}, u^{N-3}) + E(W_N^* \mid y^{N-2}, u^{N-3})]
= \min_{p_{N-2}} E[W_{N-1} + E(W_N^* \mid y^{N-1}, u^{N-2}) \mid y^{N-2}, u^{N-3}]
= \min_{p_{N-2}} E[W_{N-1} + \gamma_N^* \mid y^{N-2}, u^{N-3}]
= \min_{p_{N-2}} \int \left[ \lambda_{N-1} + \int \gamma_N^* \, p(y_{N-1} \mid u^{N-2}, y^{N-2}) \, dy_{N-1} \right] p_{N-2} \, du_{N-2}   (20)
where it is assumed that p(y_{N-1} \mid u^{N-2}, y^{N-2}) is available. Defining \gamma_{N-1} by

\gamma_{N-1} = \lambda_{N-1} + \int \gamma_N^* \, p(y_{N-1} \mid y^{N-2}, u^{N-2}) \, dy_{N-1}

Eq. (20) is written as

\min_{p_{N-2}} E(W_{N-1} + W_N^* \mid y^{N-2}, u^{N-3}) = \min_{p_{N-2}} \int \gamma_{N-1} \, p_{N-2} \, du_{N-2}

Comparing this with Eq. (15), it is seen that the optimal control is such that p_{N-2}^* = \delta(u_{N-2} - u_{N-2}^*), where u_{N-2}^* is the u_{N-2} which minimizes \gamma_{N-1}, and the control at the (N - 2)th stage is also nonrandomized.

c. General Case
Generally, E(\sum_{i=k+1}^{N} W_i) is minimized by minimizing E(\sum_{i=k+1}^{N} W_i \mid y^k, u^{k-1}) with respect to p_k for each y^k, u^{k-1} and following it with p_{k+1}^*, ..., p_{N-1}^*. It should now be clear that arguments quite similar to those employed in deriving p_{N-1}^* and p_{N-2}^* can be used to determine p_k^*. Define \gamma_k by

\gamma_k = \lambda_k + \int \gamma_{k+1}^* \, p(y_k \mid y^{k-1}, u^{k-1}) \, dy_k,   \gamma_{N+1}^* \equiv 0,   1 \le k \le N   (21)

where p(y_k \mid y^{k-1}, u^{k-1}) is assumed available and where \lambda_k is given, assuming p(x_{k-1} \mid y^{k-1}, u^{k-2}) is available, by

\lambda_k = \int W_k(x_k, u_{k-1}) \, p(x_k \mid x_{k-1}, u_{k-1}) \, p(x_{k-1} \mid y^{k-1}, u^{k-2}) \, d(x_k, x_{k-1}),   1 \le k \le N   (22)

Then the optimal control at time k - 1, u_{k-1}^*, is the u_{k-1} which minimizes \gamma_k:

\min_{u_{k-1}} \gamma_k = \gamma_k^*   (23)
By computing \gamma_k recursively, the optimal control variables are derived in the order u_{N-1}^*, u_{N-2}^*, ..., u_0^*. Once the optimal control policy is derived, these optimal control variables are used, of course, in the order of time: u_0^*, u_1^*, ..., u_{N-1}^*. The conditional probability densities assumed available in connection with (21) and (22) are derived in Section 1,C. At each time k, u_0^*, ..., u_{k-1}^* and y_0, ..., y_k are no longer random but known. Therefore, u_k^* is determined definitely, since u_k^* = \phi_k(y^k, u^{k-1}) and \phi_k is given as a deterministic function.
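Before specializing these formulas, it may help to see the backward recursion (21)-(23) carried out numerically in the simplest setting of a perfectly observed scalar plant, where \gamma_k can be tabulated as a function of the state alone. The sketch below is only an illustration of the recursion; the plant x_{k+1} = a x_k + b u_k + \xi_k, the quadratic stage cost, the grids, and all numerical values are assumptions introduced here and are not part of the text.

```python
import numpy as np

# Minimal numerical sketch of the backward recursion (21)-(23) for a
# perfectly observed scalar plant x_{k+1} = a*x_k + b*u_k + xi_k with
# Gaussian xi_k and stage cost W_k = x_k**2 + u_{k-1}**2.
a, b, q = 0.9, 1.0, 0.3            # plant parameters and noise std (assumed)
N = 5                               # number of control stages
x_grid = np.linspace(-5.0, 5.0, 201)
u_grid = np.linspace(-3.0, 3.0, 121)
xi_grid = np.linspace(-4 * q, 4 * q, 81)
xi_pdf = np.exp(-xi_grid**2 / (2 * q**2))
xi_pdf /= xi_pdf.sum()              # discretized density of xi

gamma_next = np.zeros_like(x_grid)  # gamma*_{N+1} == 0
policy = []                         # u*_{k-1} as a function of x_{k-1}

for k in range(N, 0, -1):           # k = N, N-1, ..., 1
    gamma_k = np.empty((x_grid.size, u_grid.size))
    for j, u in enumerate(u_grid):
        # next state for every (x_{k-1}, xi) pair
        x_next = a * x_grid[:, None] + b * u + xi_grid[None, :]
        # lambda_k = E[x_k**2 + u**2 | x_{k-1}, u]
        lam = ((x_next**2) * xi_pdf[None, :]).sum(axis=1) + u**2
        # E[gamma*_{k+1}(x_k) | x_{k-1}, u], by interpolation on the grid
        cont = (np.interp(x_next, x_grid, gamma_next) * xi_pdf[None, :]).sum(axis=1)
        gamma_k[:, j] = lam + cont
    best = gamma_k.argmin(axis=1)        # minimizing u for each x_{k-1}
    policy.append(u_grid[best])
    gamma_next = gamma_k.min(axis=1)     # gamma_k* = min over u of gamma_k

policy.reverse()                         # policy[k-1] gives u*_{k-1}(x_{k-1})
print("u_0*(x = 1.0) is approximately", np.interp(1.0, x_grid, policy[0]))
```

The optimal variables are obtained in the order u_{N-1}^*, ..., u_0^*, exactly as described above, and are then applied in the order of time.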
From (22), \lambda_k = 0 if W_k = 0. Therefore, if we have a final-value problem, then \lambda_k = 0, k = 1, 2, ..., N - 1, and, from (21), the \gamma_k's are simply obtained by repeated operations of minimization with respect to the u's and integration with respect to the y's. From (21) and (23) we have

\gamma_k^* = \min_{u_{k-1}} \gamma_k = \min_{u_{k-1}} \left[ \lambda_k + \int \gamma_{k+1}^* \, p(y_k \mid y^{k-1}, u^{k-1}) \, dy_k \right]

This is precisely the statement of the principle of optimality applied to this problem, where \gamma_k^* plays the role of the optimal expected cost from stage k on.
To see this simply, let us assume that the state vectors are perfectly observable, i.e., y_i = x_i, 0 \le i \le N - 1. Then the key relation (21) reads

\gamma_k^* = \min_{u_{k-1}} \left[ \lambda_k + \int \gamma_{k+1}^* \, p(x_k \mid x^{k-1}, u^{k-1}) \, dx_k \right]

which is the result of applying the principle of optimality to

\gamma_k^* = \min_{u_{k-1}, ..., u_{N-1}} E[W_k + \cdots + W_N \mid x^{k-1}]

We have the usual functional equation of dynamic programming if the {x_k}-process is a first-order Markov sequence, for example, if the \xi_k's are all independent. Then

\gamma_k^* = \min_{u_{k-1}} E[W_k + \gamma_{k+1}^* \mid x_{k-1}, u_{k-1}]
When the observations are not perfect, the arguments of \gamma_k^* are generally y^{k-1} and u^{k-2}. Thus the number of arguments changes with k. \gamma_N^* is computed as a function of y^{N-1} and u^{N-2} and, at step k, y_k in \gamma_{k+1}^* is integrated out and the presence of u_{k-1} is erased by the minimization operation on u_{k-1}, to obtain \gamma_k^* as a function of y^{k-1} and u^{k-2}. As we will discuss in Section 3, when the information in (y^k, u^{k-1}) is replaceable by that in quantities called sufficient statistics, s_k, and when s_k satisfies a certain condition, then the recursion relation for the
general noisy observation case also reduces to the usual functional equation of dynamic programming

\gamma_k^*(s_{k-1}) = \min_{u_{k-1}} E[W_k + \gamma_{k+1}^*(s_k) \mid s_{k-1}, u_{k-1}]

where s_k satisfies the relation

s_{k+1} = \psi(s_k, y_{k+1}, u_k)

for some function \psi. For details, the reader is referred to Sections II,3 and IV,2. Similar observations are valid for the recurrence equations in later chapters.

C. DERIVATION OF CERTAIN CONDITIONAL PROBABILITY DENSITIES
Equations (21)-(23) constitute a recursive solution for optimal control policies. One must evaluate the \gamma's recursively, and this requires that the conditional densities p_\phi(x_i \mid y^i) and p_\phi(y_{i+1} \mid y^i) or, equivalently, p(x_i \mid y^i, u^{i-1}) and p(y_{i+1} \mid y^i, u^i), are available.* We have noted, also, that these conditional densities are not readily available in general. The general procedures for deriving such densities are developed in Chapters III and IV. To indicate the method, let us derive these densities under the assumption that the noise random vectors \xi's and \eta's are mutually independent and independent for each time. Consider the conditional density p(x_{i+1}, y_{i+1} \mid y^i, u^i). By the chain rule, remembering that we are interested in control policies of the form u_i = \phi_i(y^i, u^{i-1}), 0 \le i \le N - 1,
we can write, using (13),

p(x_i, x_{i+1}, y_{i+1} \mid y^i, u^i) = p(x_i \mid y^i, u^{i-1}) \, p(x_{i+1} \mid x_i, u_i) \, p(y_{i+1} \mid x_{i+1})   (24)

* Alternatively, one can just as easily generate p(x_{i+1} \mid y^i, u^i) and p(y_{i+1} \mid y^i, u^i) recursively. They are related by

p(x_{i+1} \mid y^i, u^i) = \int p(x_{i+1} \mid x_i, u_i) \, p(x_i \mid y^i, u^{i-1}) \, dx_i
Thus, from (24),

p(x_{i+1}, y_{i+1} \mid y^i, u^i) = \int p(x_i \mid y^i, u^{i-1}) \, p(x_{i+1} \mid x_i, u_i) \, p(y_{i+1} \mid x_{i+1}) \, dx_i   (25)

Hence

p(x_{i+1} \mid y^{i+1}, u^i) = \frac{p(x_{i+1}, y_{i+1} \mid y^i, u^i)}{p(y_{i+1} \mid y^i, u^i)} = \frac{\int p(x_i \mid y^i, u^{i-1}) \, p(x_{i+1} \mid x_i, u_i) \, p(y_{i+1} \mid x_{i+1}) \, dx_i}{\int (\text{numerator}) \, dx_{i+1}}   (26)
where the denominator of (26) gives p(y_{i+1} \mid y^i, u^i), and where p(x_{i+1} \mid x_i, u_i) and p(y_i \mid x_i) are obtainable from the plant and observation equations and the density functions for \xi_i and \eta_i. The recursion formula is started from p(x_0 \mid y_0), which may be computed by the Bayes formula

p(x_0 \mid y_0) = \frac{p_0(x_0) \, p(y_0 \mid x_0)}{\int p_0(x_0) \, p(y_0 \mid x_0) \, dx_0}
where p_0(x_0) is assumed available as a part of the a priori information on the system. Equation (26) is typical in that the recursion formulas for p(x_i \mid y^i, u^{i-1}) and p(y_{i+1} \mid y^i, u^i) generally have this structure for the general stochastic and adaptive control problems in later chapters. In the numerator of Eq. (26), p(x_{i+1} \mid x_i, u_i) is computed from the plant equation and the known density function for \xi_i, and p(y_{i+1} \mid x_{i+1}) is computed from the observation equation and the known density function for \eta_i. The first factor, p(x_i \mid y^i, u^{i-1}), is available from the previous stage of the recursion formula. With suitable conditions,
p(x_{i+1} \mid x_i, u_i) = p(\xi_i) \, |J_\xi|   and   p(y_i \mid x_i) = p(\eta_i) \, |J_\eta|   (27)
where J_\xi and J_\eta are appropriate Jacobians and where the plant and the observation equations are solved for \xi_i and \eta_i, respectively, and substituted into the right-hand sides. When the \xi's and \eta's enter into Eqs. (1) and (2) additively, the probability densities in Eq. (26) can be obtained particularly simply
from the probability densities for the \xi's and \eta's. See Ref. 1 for the multiplicative random variable case. For example, if Eqs. (1) and (2) are

x_{k+1} = F_k(x_k, u_k) + \xi_k,   y_k = G_k(x_k) + \eta_k

then |J_\xi| = |J_\eta| = 1, and \xi_k = x_{k+1} - F_k(x_k, u_k) and \eta_k = y_k - G_k(x_k) are substituted into the right-hand sides of Eq. (27). Thus, if

p(\xi_i) = (2\pi\sigma_1^2)^{-1/2} \exp(-\xi_i^2 / 2\sigma_1^2)   and   p(\eta_i) = (2\pi\sigma_2^2)^{-1/2} \exp(-\eta_i^2 / 2\sigma_2^2)

then

p(x_{i+1} \mid x_i, u_i) = (2\pi\sigma_1^2)^{-1/2} \exp[-(x_{i+1} - F_i(x_i, u_i))^2 / 2\sigma_1^2]

and

p(y_i \mid x_i) = (2\pi\sigma_2^2)^{-1/2} \exp[-(y_i - G_i(x_i))^2 / 2\sigma_2^2]
Equation (26) indicates clearly the kind of difficulties we will encounter time and again in optimal control problems. Equation (26) can be evaluated explicitly by analytical methods only in a special class of problems. Although this special class contains useful problems of linear control systems with Gaussian random noises, as will be discussed in later sections of this chapter, in a majority of cases Eq. (26) cannot be integrated analytically. We must resort either to numerical evaluation, to some approximate analytical evaluation of Eq. (26), or to both. Numerical integration of Eq. (26) is nontrivial by any means, since the probability density function p(x_i \mid y^i, u^{i-1}) will not be any well-known probability density in general, cannot be represented conveniently analytically, and hence must be stored numerically. See Appendix IV at the end of this book and Chapter III for additional details. Also see Ref. 73a. In order to synthesize u_i^*, it is necessary to compute p(x_i \mid y^i, u^{i-1}) by (26), then to compute \lambda_{i+1}, to generate p(y_{i+1} \mid y^i, u^i), to evaluate E(\gamma_{i+2}^* \mid y^i, u^i), to obtain \gamma_{i+1}, and finally to minimize \gamma_{i+1} with respect to u_i.
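To make the preceding remark concrete, the following sketch stores p(x_i | y^i, u^{i-1}) as a table of values on a fixed grid and updates it by a direct numerical evaluation of Eq. (26) for the scalar additive-noise model just described. The grid, the particular functions F and G, and all numerical values are assumptions of the sketch, not part of the text.

```python
import numpy as np

# Sketch of the recursion (26): p(x_i | y^i, u^{i-1}) is held as values on a
# fixed grid and updated numerically, for the scalar model
# x_{i+1} = F(x_i, u_i) + xi_i, y_i = G(x_i) + eta_i with Gaussian noises.
def gauss(z, sigma):
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

F = lambda x, u: 0.8 * x + u          # plant function (assumed)
G = lambda x: x                       # observation function (assumed)
sigma1, sigma2 = 0.4, 0.5             # std of xi and eta (assumed)

x_grid = np.linspace(-6.0, 6.0, 401)
dx = x_grid[1] - x_grid[0]

def filter_step(p_prev, u_prev, y_new):
    """One application of Eq. (26): propagate through the plant, multiply by
    p(y_{i+1} | x_{i+1}), and renormalize (the denominator of (26))."""
    trans = gauss(x_grid[:, None] - F(x_grid[None, :], u_prev), sigma1)
    p_pred = trans @ p_prev * dx
    p_post = p_pred * gauss(y_new - G(x_grid), sigma2)
    return p_post / (p_post.sum() * dx)

# start the recursion from p(x_0 | y_0) via the Bayes formula
p0 = gauss(x_grid - 0.0, 1.0)                      # a priori density p_0(x_0)
y0 = 0.7
p = p0 * gauss(y0 - G(x_grid), sigma2)
p /= p.sum() * dx

for u_i, y_next in [(0.5, 1.1), (-0.2, 0.6)]:      # assumed controls/observations
    p = filter_step(p, u_i, y_next)

print("E(x_2 | y^2, u^1) is approximately", (x_grid * p).sum() * dx)
```

The tabulated density produced at each step is exactly the quantity that must be remembered between stages; the storage requirement grows with the fineness of the grid rather than with the length of the observation record.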
Note that the controller must generally remember y^i and u^{i-1} at time i in order to generate u_i^*. Although some of the information necessary to compute u_i can be precomputed, i.e., generated off-line, all these operations must generally be done on a real-time basis if the control problem is a real-time optimization problem. If k sampling times are needed to perform these operations, one must then either find the optimal control policy from the class of control policies such that u_i depends only on y^{i-k} and u^{i-1}, i = k, k + 1, ..., N - 1,
where u_0^* through u_{k-1}^* must be chosen based on the a priori information only, or use approximations so that all necessary computations can be performed within one sampling time. In practice we may have to consider control policies with constraints on the size of the memory in the controller, and/or we may be forced to use control policies that are functions of several statistical moments (such as the mean or variance) instead of the probability density functions, and to generate these statistics recursively. For example, u_i^* may have to be approximated from the last few observations and controls, say y_{i-1}, y_i, u_{i-2}, and u_{i-1}. The problems of suboptimal control policies are important not only from the standpoint of simple engineering implementations of optimal control policies but also from the standpoint of approximately evaluating Eq. (26). The effects of any suboptimal control policies on the system performance need to be evaluated carefully, either analytically or computationally, for example, by means of Monte Carlo simulations of system behavior. We will return to these points many times in the course of this book, in particular in Chapter VII, where some approximation techniques are discussed.
2. Example. Linear Control Systems with Independent Parameter Variations

A. INTRODUCTION
As an application of the optimal control formulation given in Sections 1,Band 1,C, the optimal control policy for a linear stochastic sampled-data control system with a quadratic performance index will be derived. We assume that system parameters are independent random variables, that systems are subject to external disturbances, and that
the state vector measurements are noisy. These random disturbances are all assumed to have known means and covariances. Specializations of this general problem by dropping appropriate terms lead to various stochastic optimal control problems, such as the optimal control of a deterministic plant with noisy state vector measurements, the optimal control of a random plant with exact state vector measurements, and so on. Scalar cases of such systems have been discussed as Examples 2-4 of Chapter I. This type of optimal control problem has been analyzed by means of dynamic programming.67,80 The key step in such an analysis is, of course, the correct application of the principle of optimality to derive the functional equation. By the method of Section 1,B the correct functional equations will result naturally without invoking the principle of optimality explicitly. Consider the sampled-data control system of Fig. 2.5, where the state vector of the system satisfies the difference equation (28a), where the system output vector is given by (28b), and where the observation equation is given by (33):

x_{k+1} = A_k x_k + B_k u_k + \xi_k,   k = 0, 1, ..., N - 1   (28a)

where p_0(x_0) is assumed given, and

c_k = M_k x_k   (28b)

where
x_k is an n-vector (state vector),
A_k is an n \times n matrix,
B_k is an n \times p matrix,
Fig. 2.5. System with linear random plant, with additive plant disturbances, and with noisy measurement. The sequence of input signals d_k is generated by Eq. (34).
u_k is a p-vector (control vector), u_k \in U_k, where U_k is a subset of E_p (p-dimensional Euclidean space) and is called an admissible set of controls,
\xi_k is an n-vector (noise vector),
c_k is an s-vector (output vector), and
M_k is an s \times n matrix.
In (28a), A_k, B_k, and \xi_k are generally random variables, which are assumed to be independent for each k. The {\xi_k} random variables are also assumed to be independent of {A_k} and of {B_k}. The independence assumption on \xi_k for each k can be weakened somewhat by introducing another random variable v_k such that

\xi_{k+1} = C_k \xi_k + D_k v_k,   k = 0, 1, ..., N - 1   (29)
where C_k is a known (n \times n) matrix, D_k is a known (n \times q) matrix, v_k is a q-vector, and v_k is a random variable assumed to be independent for each k and independent of the A's and B's at all times. Equation (29) is introduced to handle random disturbances on the system which are not independent in k but which may be derived from another stochastic process {v_k} having the desirable property of being independent for each k.* This type of noise is no more general, since by augmenting the state vector x_k with \xi_k, Eqs. (28) and (29) can be combined to give an equation similar to Eq. (28) with an independent random variable as a forcing term. Let

z_k = (x_k', \xi_k')'

then

z_{k+1} = \bar{A}_k z_k + \bar{B}_k u_k + \theta_k   (30)

where \bar{A}_k and \bar{B}_k are the corresponding partitioned matrices built from A_k, I, C_k and from B_k, 0, respectively, \theta_k = (0', (D_k v_k)')', and where z_k is the generalized (or augmented) state vector.† The random noise in (30), \theta_k, is independent for each k and of the remaining random variables in the problem for all k.

* The noises \xi are analogous to those generated by white noise through a linear shaping filter in continuous-time processes. See for example Ref. 98.
† See Chapter IV for more systematic discussions of the idea of augmented state vectors.
Thus, it is seen that, by augmenting the original equation for the system state vector by another equation describing the noise generation mechanism, it is possible to treat certain classes of dependent noises by the augmented state equation, Eq. (30), on which only independent noises act. Thus, there is no loss of generality in discussing Eq. (28) with independent \xi_k for this class. Assume that the control problem is to make the system output follow the desired output sequence {d_k} as closely as possible, measured in terms of the performance index J:
J = \sum_{k=1}^{N} W_k(e_k, u_{k-1})   (31)

where W_k is a functional which assigns a real number to each pair of an error vector e_k \triangleq d_k - c_k and u_{k-1}. For example, W_k may be a quadratic form in e_k:

W_k = (e_k, V_k e_k) = e_k' V_k e_k
where V_k is a positive symmetric (s \times s) matrix, and a prime denotes a transpose. The feedback is assumed to consist of the noisy state measurement of Eq. (33),
where y_k is an m-vector (observation vector); i.e., the controller does not observe x_k directly but receives y_k, where \eta_k is the random observation error. In most control situations, the desired output sequence {d_k} is a sampled sequence of a solution to some linear differential equation on which some noise is possibly superimposed. Assume that {d_k} is generated by

g_{k+1} = F_k g_k + G_k s_k,   d_k = H_k g_k   (34)

where
g_k is an m'-vector,
F_k is an (m' \times m') matrix,
G_k is an (m' \times r) matrix,
s_k is an r-dimensional random vector independent for each k, and
H_k is an (s \times m') matrix.
Since most deterministic signals are solutions of linear differential or difference equations or can be approximated by such solutions, the class of desired output sequences described by (34) is fairly large. It is possible to combine Eqs. (28) and (34) into a single equation. Define the augmented vector

X_k = (x_k', g_k')'   (35)

Then

X_{k+1} = diag(A_k, F_k) X_k + (B_k', 0')' u_k + (\xi_k', (G_k s_k)')'   (36)
and the generalized output of the system is given by (37)
where
The performance index for systems described by (36) can be expressed as a quadratic form in X by defining a new V_k appropriately when the W's are quadratic in (31). For example, since

e_k = d_k - c_k = H_k g_k - M_k x_k = (-M_k, H_k) X_k

letting the new V_k be (-M_k, H_k)' V_k (-M_k, H_k), one can write (X_k, V_k X_k) instead of (e_k, V_k e_k), where the new V_k again is positive symmetric, with dimension (m' + n).* Thus, by suitably

* For those not familiar with operating with partitioned matrices, see for example Gantmacher.
augmenting the state equation for the plant, it is possible to incorporate the mechanisms for dependent noises and/or input signals, and the control problem can be taken to be the regulator problem, i.e., that of bringing the (augmented) state vector to the origin in the state space. Since we are interested in closed-loop control policies, the control at the kth sampling instant is assumed to depend only on the initially available information plus y^k and u^{k-1}, and on nothing else. We see from the above discussions that the problem formulation of this section, with the system of (28) observed by (33), is not as restrictive as it may appear at first and is really a very general formulation of linear control systems with quadratic performance indices. It can cover many different control situations (for example, by regarding (28) as the state equation for the augmented system). With this in mind, we will now discuss the regulator problem of the original system (28). In the development that follows, W_k of the performance index is taken, for definiteness, to be

W_k = x_k' V_k x_k + u_{k-1}' P_{k-1} u_{k-1}
B.
PROBLEM STATEMENT
Having given a general description of the nature of the problem, we are ready to state the problem more precisely. The problem is to find a control policy u^{N-1} such that it minimizes the expected value of the performance index, EJ, where u_i \in U_i, 0 \le i \le N - 1, and where the performance index is

J = \sum_{k=1}^{N} x_k' V_k x_k + \sum_{k=0}^{N-1} u_k' P_k u_k   (38)
0
where the V_k's and P_k's are symmetric positive matrices, and where the system's dynamics is given by

x_{k+1} = A_k x_k + B_k u_k + \xi_k,   k = 0, 1, ..., N - 1   (39a)

where p_0(x_0) is given and where A_k, B_k, and \xi_k are random variables with

E(\xi_i) = 0,   E(\xi_i \xi_i') = Q_i,   i = 0, 1, ..., N - 1   (39b)
It is assumed that the \xi_k's are independent of all (A_k, B_k), that \xi_k and (A_k, B_k) are independent for each k, and that the system is observed by the noisy measurement equation (40), k = 0, 1, ..., N - 1, where E(\eta_k) = 0, E(\eta_k \eta_k') = R_k, and, for simplicity of exposition, \eta_k is assumed independent for each k and of all other random variables \xi_k and (A_k, B_k), k = 0, 1, ..., N - 1. The R's and Q's are assumed known. The situation where the \xi's and \eta's are not independent can be treated also; see, for example, Section 3,E. We have seen in the previous section that this problem statement can cover situations with dependent noise, input signal dynamics, and others by considering Eq. (39) as the equation for the augmented state vectors, if necessary. Various conditional probability density functions and moments are all assumed to exist in the following discussions.
C.
ONE-DIMENSIONAL EXAMPLE
Before launching into the derivations of the optimal control policy for the problem, let us examine its simpler version of the one-dimensional problem so that various steps involved in arriving at the optimal control policy are made clear. In this way, we will avoid getting lost when we deal with general vector cases. The one dimensional problem is given with the plant equation
°
~ i ~ N -
1,
u,
(41)
E ( - 00. 00)
and the observation equation Yi
=
Xi
where (Xi' fJi , gi, and 7Ji' pendent random variables. It is assumed that E(ai)
+ TJi,
°
~
(42)
0~i~N-1
i
~
N - I, are assumed to be inde-
a,
(43a)
E({3i) = b,
(43b)
E(gi)
=
=
E(TJi)
=
0,
O~i~N-l
(43c)
and that the random variables all have finite variances. Take] to be (44)
2.
43
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
Then, according to the development in Section 1,B, in order to obtain must compute
ut-l , one
f
=
2 XN p(x N
I CXN-I , (3N-I , tN-I'
X P(CX N-I , (3N-I , tN-I' X d(x N , XN-I , =
f
(CXN-IXN-I
XN-I
XN-I , UN-I)
I yN-I)
CXN-I , (3N-I , tN-I)
+ (3N-IUN-I + t N_ I)2
X P(CX N-I ,
(3N-I , tN-I'
XN-I
X d(XN_ I ,
CX N-I , (3N-I ,
tN-I)
I yN-I) (45)
By the independence assumption, one has, in (45),
o~ and YN
=
f
[(aN-Ix N-I
X P(X N-I
+ bN _ I UN_ I )2 + a~_lx~_1
I yN-l)
+ .E~_lU~_1
i
~
N -
I
(46)
+ q~-I] (47)
dX N_I
where (48a)
.El var(ti) = ql, var((3i) =
(48b) O~i~N-I
(48c)
Let (49a)
and O~i~N-I
(49b)
These p,'s and zl's may be computed explicitly with the additional assumptions on the random variables. For example, if these random variables are all assumed to be Gaussian, then they can be computed as in the examples of Section 3. From (47)-(49),
(50)
44
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
Assuming fLi and Lli are independent of to U N-1 to give
Ui , YN
is minimized with respect (51)
where (52)
and
. mmYN = UN_I
YN *
(53)
where /
I
£:,
=
L'2
L'~_I
N-I
+ b2
2
N-I
aN -
I
+
2
(54a)
UN-I
and (54b)
The expression for YN * can be put slightly differently, retaining the conditional expectation operation E(· I yN-l) in the expression for YN*' In this alternate form, (47) can be written as (55)
where (56)
One can easily check that Eqs. (53) and (55) give the same value for since
YN*
E[/lX;"-I
+ VI I yN-l]
+ .1~_lJI = IIiL;"-I + PI =
/liL~-I
+ VI
Having obtained some insight into the problem, we will treat the general case next.
D.
OPTIMAL CONTROL POLICY
As discussed in Section I,B, in order to determine the optimal control policy for problem one must first compute Ak , I ~ k ~ N: .\.k =
E(Wk I yk-\
U k-2)
(57a)
2.
45
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
where p(X k I yk-l, Uk- 2) =
J
p(X k I Xle-I , Uk-I' A k- I , B k- I , glc-I)
X
P(Xlc-1 , A k- I , B k- I , gk-I I yk-l, Uk- 2 )
X d(Xk_l
,
Ale-I, B lc-I , gk-I)
(57b)
By the independence assumptions of the random variables Ai , B; , and YJi , 0 :(: i :(: N - 1, one can write
Therefore,
x;
=
~i
,
J[xlc'VleXle + U~-IPk-IUk-]] X p(x k I Xk- I , Ulc-I , A k-I , B le-I , gk-I)
X P(Xk- 1 I yle-l, Uk- 2 ) p(A lc-1 X
,
B k- I) P(gk-I)
d(x le , Xk- I , A lc-I , B k- I , gk-I)
(59)
To obtain U}:;_I , AN is evaluated first. Since the mean of ~N-I is zero by Assumption (39b), the contribution of (xN' V NXN) to AN is given by
J(Xk'VNXN)P(XN I XN- I' UN-I' AN-I, B N-I, gN-I) P(XN- I I yN-I) X p(A N-I, B N- I) P(gN-I) d(x N 'X N-I' AN-I' B N-I, gN-I) =
J[(AN-IXN- I + BN-IUN-I)' VN(AN-IXN-I + BN-IUN-I) + E!;(f vNg)] X
p(A N- I , B N- I) P(XN-I I yN-I) d(XN- I , AN-I, B N- I)
(60)
r
where E!; is the expectation operation with respect to Denoting by a bar the expectation operation with respect to the random variables A and B, we have, from (39b), (59), and (60),
(61 )
By minimizing (61) with respect to UN-I' the optimal UN- I is given by (62)
46
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where (63)
and where (64)
In (63), the superscript plus denotes the pseudoinverse. The pseudoinverse is discussed in Appendix II at the end of this book. If the inverse exists, for example, if P N - 1 is assumed to be positive definite, then the pseudoinverse coincides, of course, with the inverse. See Appendix A and B at the end of this chapter for further discussion of the pseudoinverse and its applications to minimizations of quadratic forms. Substituting (62) into (61) and making use of (B.9) in Appendix B,
(65)
where
and
The first term in Eq. (66c) is independent of X N-1 and yN-l by assumption on ( N - l ' The second term will be independent of past controls if the estimation error (X N-1 - fLN-l) has a conditional covariance matrix independent of X N-1 and yN-\ for example, if X N-1 - fLN-l is normally distributed (see Section 2,F). To obtain U':i-2 , we compute
where use is made of the property of the conditional expectation (68)
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
47
We encountered this relation earlier in Section I,B. Proceeding as before, noting that now (VN- 1 II) corresponds to V N , PN-2 to P N- 1, etc., the development from (60) to (66) is repeated to give
+
(69)
and (70)
where (71)
and where (72a)
and V2
g
VI
+ tr[(VN_ 1 +I1)Q N- 2] + E[(X N_2 -
X 7T 2(XN-2 -
JLN-2)'
JLN-2) I yN-2]
(72d)
In general, O~i~N-l
and O~i~N-l
(73) (74a)
where (74b)
and where (75a)
and VN-i
g
VN-i-l
+ tr[( V i +1 + I N-i-l)Qi]
+ E[(Xi -
JL, , 7T N-i(Xi - JLi)) I yi]
(75d)
48
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
When the \mu_i's are computed explicitly as a function of y^i and u^{i-1}, Eqs. (73)-(75d) solve the proposed problem completely. Equations (74) and (75) show that the feedback coefficients \Lambda are computable before the control process begins, i.e., off-line, since they depend neither on the previous control vectors nor on the observation vectors. Note also that the \Lambda's are not random. Computations of the \mu's generally must be done on-line. They are computed later in Section 3 with the additional assumption that the noises are all Gaussian random vectors. Figure 2.6 shows the configuration of the optimal control system. In terms of \mu, (73) can also be written as (76a)
where
(76b)
and where
(76c)
is the conditional covariance matrix of
Xi .
RANDOMLY VARYING PLANT WITH ADDITIVE N O /
,------
I
I
OPTIMAL ESTIMATOR
Fig. 2.6. Optimal controller for the stochastic system of Fig. 2.5 with noisy state vector measurement. See Fig. 2.8 for the schematic diagram of the optimal estimator. = [P k + Bk'(Vk+1 -+- IN_k_dBkJi B/(VH ated by Eq. (75).
.11 k
1
+
IN __ k_l)A k; {lil. i = I, ... , N gener-
2.
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
49
When the state vectors can be observed exactly, E(x_i \mid y^i) reduces to x_i and the term E[(x_i - \mu_i)' \pi_i (x_i - \mu_i) \mid y^i] vanishes in the equation for \nu_i. Replacing \mu_i by x_i in (62)-(76), the optimal control vectors with exact state vector measurements are given by
(77)
where Ai is the same as before and is given by (75c) and (78)
where (79a)
with (79b)
Figure 2.7 is the optimal control system configuration with no observation noises. Thus, as already observed in connection with a simple system of Example 4 of Chapter I, the effect of additive noise t to the system is merely to increase y* by ii. When the performance index is given by (80)
RANDOMLY VARYING PLANT
I" - - - - - - - - -- - l
.-
I
;-1_ _-.1
I I Ix
I I
Ck k +1
I
I
UNIT DELAY 1
L
Fig. 2.7.
I I I .J
Optimal controller for the stochastic system of Fig. 2,5 when the state
vector measurement is exact, A k = -[Pk {Ii}, i = 1, .,', N, generated by Eq. (75).
+ Bk'(Vk+l ~-
IN~k-_l)Bk]t
Bk;(Vk : 1 + IN_k_1)A k;
50
II.
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
rather than by (38), the recursion formula for y* is obtained by putting all Pi equal to zero. Then, from (75c), the optimal control policy with the new performance index of (80) is given by (74a) and (75c) with Pi 0. In particular,
UJr_l
(81 )
and
Unlike the previous problem, where Pi =F 0, this problem permits a certain simplification and one can write (83)
where PI =
Generally, with Pi
A N _1 =
0,
-
BN_l(B~_1
°
~
VN B N _ 1 ) + (B~_1
VNA N _ 1)
(84)
i ~ N - 1, (85)
where
(86)
Equations (74) and (75), which define recursively the optimal feedback control coefficients and the optimal criterion function values, can be put in a little more transparent form when the system parameters A and B are deterministic and the \xi's and \eta's are the only random variables. From (74a) and (75c), we write \Lambda_i as (74a-1)
where (74a-2)
and where we give a symbol L N - i to Defining Ii by
Vi
+ I N-i for
ease of reference. (75a-1)
we have from (75a) and (75b) Ji
=
L N _ i_1(I - BiNi)
(75a-2)
2.
51
SYSTEMS WITH INDEPENDENT PARAMETER VARIATIONS
and (75a-3)
The recursion formulas (74a-2), (75a-2), and (75a-3) for N's, ]'S and L's are initiated by putting 10 = 0 or equivalently 1N = O. Then from (75a-3) From (74a-l) and (74a-2) AN - I
=
NN-IAN-I
=
(P N -
I
+ B~-IVNBN-IrB~-IVNAN-I
which is in agreement with (63), taking note of the fact that A N B N-I are now deterministic by assumption. By using (75a-2), we have IN-I
=
Lo(I -
I
and
BN-INN-l)
and from (75a-3) Now, N N-2 , 12 , and L 2 etc. are determined in the orders indicated. Later in Section 3 of this chapter as well as in Chapter V, we will encounter a similar set of recursion equations in expressions for conditional means and conditional covariance matrices of certain Gaussian random sequences. We will postpone the discussions of the significance of this similarity until then. E. CERTAINTY EQUIVALENCE PRINCIPLE
If we consider a plant with nonrandom plant parameters and if the \xi's and \eta's are the only random variables in the system, then the bars over the expressions for \Lambda_i, I_i, and \pi_i in (75) can be removed. Since these quantities are independent of the plant noise process {\xi_i} and of the observation noise process {\eta_i}, they are identical to the ones derived for a deterministic plant with no random input and with exact state vector measurements. As observed earlier in connection with (58), (66), (74), and (75), the {\xi_i} and {\eta_i} processes affect only \nu_i and E(x_i \mid y^i). Since the optimal control vectors are specified fully when the E(x_i \mid y^i) are given, the problem of optimal control is separated into two parts: the estimation of the state vectors, given a set of observation data; and the determination of proper feedback coefficients, {\Lambda_i}, which can be done from the corresponding deterministic plant.
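The separation just described can be made concrete with a small numerical sketch: the feedback coefficients are produced off-line by a backward recursion of the type of Eqs. (74) and (75) written for the deterministic plant, and on-line the precomputed gain is simply applied to the current conditional mean E(x_i | y^i). The matrix values, the dimensions, and the use of a pseudoinverse routine below are assumptions of the sketch and not part of the text.

```python
import numpy as np

# Sketch of the off-line gain recursion for a plant with deterministic A and B
# (cost sum of x_k'V x_k + u_k'P u_k), followed by the certainty-equivalent
# feedback u_i* = -Lambda_i * E(x_i | y^i).
N = 4
A = np.array([[1.0, 0.1], [0.0, 0.9]])     # assumed plant matrices
B = np.array([[0.0], [0.1]])
V = np.eye(2)                               # state weighting (constant in k)
P = 0.5 * np.eye(1)                         # control weighting (constant in k)

I_term = np.zeros((2, 2))                   # plays the role of I_0 = 0
gains = [None] * N                          # gains[i] will hold Lambda_i

for i in range(N - 1, -1, -1):
    L = V + I_term                                   # V_{i+1} + I_{N-i-1}
    Lam = np.linalg.pinv(P + B.T @ L @ B) @ B.T @ L @ A
    I_term = A.T @ L @ A - A.T @ L @ B @ Lam         # next value of the I-matrix
    gains[i] = Lam

# on-line part: the precomputed gain acts on the current state estimate
x_hat = np.array([1.0, -0.5])                        # stands for E(x_0 | y^0)
print("u_0* =", -gains[0] @ x_hat)
```

Only the estimate x_hat changes with the incoming observations; the gain sequence is fixed in advance, which is exactly the off-line/on-line split noted above.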
If the A's are random but the B's are deterministic in the plant equation, then \Lambda_i, I_i, and \pi_i are the same as the ones for the equivalent deterministic plant. The procedure of obtaining control policies for stochastic systems by considering the optimal control policies for the related deterministic systems, where the random variables are replaced by their expected values, is called the certainty equivalence principle.49,136a One may speak of a modified certainty equivalence principle when the random variables are replaced with some functions of their statistical moments. For systems with random A's and deterministic B's, the optimal certainty equivalent control policies are the optimal control policies for the same class of stochastic systems with y_i = x_i, i.e., when the x_i are observed exactly and when E(\eta_i) = 0, 0 \le i \le N - 1, or, if y_i \ne x_i, when x_i is replaced by E(x_i \mid y^i). When the A's and B's are both random, the optimal certainty equivalent control policies are the optimal control policies for the deterministic system with the plant equation
°
For example, with N
] =
I
1
u;* = -[Pi
[X/ViX i
+ U;-lPi-1U i- 1]
+ B/(Vi+l + IN-i)B i]+ B/(Vi+l + IN-i)Ai E(Xi Iyi)
where
+ IN-i-1)Ai - A/(Vi+l + I N- i- 1) Bi[Pi + B/(Vi+l + IN-i-1)B i]+ B/(Vi+l + IN-i-1)Ai
I N-i = A/(Vi+l X X
Since B/(Vi+l
+ IN_i_1)B; oj::. B/(Vi+l + IN-i-1)B i
the optimal certainty equivalent control policy is not optimal for this class of stochastic control systems.
F.
GAUSSIAN RANDOM VARIABLES
It has been assumed in connection with Eq. (74) that quantities
are independent of
Xi
and y i •
3.
53
SUFFICIENT STATISTICS
Two sufficient conditions for this to be true are: (a) all random variables in the problem have a joint Gaussian distribution; (b) the plant and observation equations are all linear. This will be shown by computing the conditional error covariance matrix E[(x_i - \mu_i)(x_i - \mu_i)' \mid y^i] explicitly under Assumptions (a) and (b) in the next section, Section 3. See Appendix III at the end of this book for brief expositions of Gaussian random variables.
3. Sufficient Statistics

We have seen in previous sections that u_k is generally a function of y^k and not just of y_k. From (21) and (22) of Section 1,B, we note that this dependence of u_k on y^k occurs through p(x_k | y^k) and p(y_{k+1} | y^k, u_k) in computing the \gamma's. Intuitively speaking, if a vector s_k, a function of y^k, exists such that p(x_k | y^k) = p(x_k | s_k), then the dependence of u_k on past observations is summarized by s_k, and the optimal u_k will be determined, given s_k and perhaps y_k, without the need of additional knowledge of y^{k-1}. Such a function of the observations is called a sufficient statistic. See also Appendix IV. We discuss two simple one-dimensional examples first. Those readers who are familiar with matrix operations and Gaussian random variables may go directly to Section 3,C.
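For the linear system with Gaussian noises treated in the example that follows, the sufficient statistic turns out to be the pair consisting of the conditional mean and the conditional variance of the state, which is propagated by a short recursion. The sketch below shows how the entire observation record is compressed into these two numbers; the plant, the observation gains, and all numerical values are illustrative assumptions only.

```python
# Sketch: recursive propagation of the sufficient statistic (mu_i, sigma_i^2)
# for a scalar system x_{i+1} = a*x_i + b*u_i + xi_i observed through
# y_i = h*x_i + eta_i, with independent Gaussian noises.
a, b, h = 0.9, 1.0, 1.0            # assumed plant and observation parameters
q2, r2 = 0.2**2, 0.5**2            # var(xi), var(eta)
alpha, sigma2 = 0.0, 1.0           # prior mean and variance of x_0

# initial statistic: combine the prior with y_0 (Bayes rule for Gaussians)
y0 = 0.8
prec = 1.0 / sigma2 + h**2 / r2
mu, var = (alpha / sigma2 + h * y0 / r2) / prec, 1.0 / prec

for u_i, y_next in [(0.3, 1.1), (-0.1, 0.7)]:   # assumed controls/observations
    # predict through the plant equation
    m_pred = a * mu + b * u_i
    v_pred = a**2 * var + q2
    # incorporate the new observation
    prec = 1.0 / v_pred + h**2 / r2
    var = 1.0 / prec
    mu = var * (m_pred / v_pred + h * y_next / r2)

print("sufficient statistic after two steps:", mu, var)
```

Note that the variance recursion does not involve the observations at all, so it can be computed in advance; only the conditional mean must be updated on-line.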
A.
ONE-DIMENSIONAL EXAMPLE
To show that such a function exists and to see how it helps simplify the control problem solution, consider a scalar control system with a plant equation
o~ and the observation equation Yi
= hixi
+ 7Ji ,
hi
i
~
* 0,
N -1,
UiE(-oo,oo)
0 ~ i ~ N - 1
(87)
(88)
Take as the performance index a quadratic form in x and u,
] =
N
I
I
(ViXi
2
+ ti_1uL),
(89)
II.
54
OPTIMAL CONTROL OF STOCHASTIC SYSTEMS
where a i and b, are known deterministic plant parameters and where fs and YJ's are assumed to be independent Gaussian random variables with E(ti) = E('r)i) = 0, E(ti 2) = qi2 > 0, E('r)i 2) = r i2 > 0, E(ti'r)j) = 0,
O~i~N-I
(90a)
O~i~N-I
(90b)
O~i~N-I
(9Oc)
all i and j
(90d)
Assume also that x_0 is Gaussian, independent of the \xi's and \eta's, with mean \alpha and variance \sigma^2. This system is a special case of the class of systems discussed in Section 2. Now
where fLo l/a o2
= =
(ex/a 2 + hoyo/ro2)/(I/a2 + ho2/ro2) l/a
2
+
(92a) *
ho2 /ro2
(92b)
From (26) of Section 1,C, P
(
x. HI
I i+l) Y
=
f p(xi I yi) p(xi+1 I Xi, U i)P(Yi+l I Xi+1) dXi P(Yi+1 I y i)
(93)
From (88), (90a), and (90c), P(Yi I Xi)
(y - hX)2)
const exp ( -
=
'2r. 2"
,
(94)
From (87), (90a), and (90b), p(x i+1 I Xi 'U i )
=
const exp ( -
(Xi+1 - aix i - biui )2) 2qi2
(95)
We will now show by mathematical induction on i that p(xi I yi, Ui- 1 )
=
const exp (_ (Xi2-:/'i)2 )
(96)
holds for all 0 ~ i ~ N - 1 with appropriately chosen fLi and ai' This relation is satisfied for i = 0 by (91). Substituting (94)-(96) into (93) and carrying out the integration with respect to Xi , P(Xi+l I yi+l}
=
const exp ( _
(xi+12~
{ 0
The optimal control policy is now derived for the above system. In the previous two examples, the recursion equations for conditional probability density functions involved a one-dimensional normal distribution function. In this example, we will see that, because of noisy observations, two-dimensional normal distribution functions must be computed in the recursive generations of the conditional density functions. To obtain ut-l , compute first
=
=
f
I
QN XN
2
QN XN
2
p(x N I yN-l) p(X N
I X N- l
dX N
, UN-I'
tN-I) P(X N-1 , t N-1 I yN-l) d(x N
, X N- l , tN-l)
In Eq. (67), (68)
2. since
gi
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
is independent of
gi-l and
of
YJi
99
by assumption. From Eq. (62), O:(;i:(;N-l
(69)
Substituting Eqs. (68) and (69) into Eq. (67), AN =
I
qN[(ax N-1
+ bU N_1)2 + ao2] P(X N- 1 I "fJN-l ,yN-l)
X P("fJN-l I yN-l) d(X N- 1 , "fJN-l)
=
I
qN{[a(YN-l - "fJN-l)
+ bUN- l ]2 + ao
2
Defining the conditional mean and variance of
}
P("fJN-l I yN-l) d"fJN-l YJi
(70)
by (71a)
O:(;i~N-l
and (71b)
we have
Since ?}N-l and T N- 1 are independent of UN-I' by minimizing Eq. (72) with respect to U N- 1 the optimal control variable at time (N - 1) is given as (73)
and (74)
Therefore, if fLN-l and T N - 1 are known, so are U'fi-l and YN*' We also see from Eq. (74) that, since YN * is independent of YN-l , each control stage can be optimized separately and the optimal control policy consists of a sequence of one stage optimal policy (75)
In order to compute fL's and T'«, we show first that the conditional probability density of e and YJi are jointly normally distributed, i.e., p(8, n, I yi)
=
canst exp[-
~
(8 - (Ji , 7]i
-
?}i) Mil (8 -
~i)]
"fJi - YJi
(76a)
100 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
where (76b)
is a constant covariance matrix and where
8i
=
E(8
Il)
TJi = E(YJi I yi)
=
var( 8 I yi)
M~2
=
var( YJi I yi)
(76c)
= E[(8 - 8i )(YJi - TJi) IlJ
= M~l
M~2
Mtl
Then with the notation defined by (71b),
To verify (76) for i = 0, consider p(8 YJ I yO) ,
=
0
p(8, YJo ,yo) p(yo)
Po(8) p(YJo I 8) p(Yo I 8, YJo) p(Yo)
(77a)
where, from (65), Po(8)
=
const exp (_ (8 -
fL)2)
2u 22
(77b)
from (64), p(YJo I 8)
= canst exp (- (YJ02~ 28)2 )
(77c)
and from (66),
From (77), (76a) is seen to hold for i (j _
o-
°
with
+ (yo - rx)/(U12 + U 32) 1/u22 + 1/(u12 + u 2) + (yo - rx)/U 2 fL/( u1 + u2 3 1/(u + U2 2) + Iju3 2
fL/U22
2 3 )
(78a)
2
~
YJo
=
(77d)
=
1
2
(78b) (78c) (78d) (78e)
Note that M o is a constant matrix.
2.
101
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
Thus (76a) is verified for i = O. Next, assume that (76a) is true for some i > O. We will show that the equation is true for i + 1, thus completing the mathematical induction on (76a). To go from ito (i + 1), consider p(8, YJi , YJi+1 I yi). By the chain rule, this conditional density can be written as p(8, YJi
,YJi+l ,Yi+l
I y i)
=
p(8, YJi I y i) P(YJi+l I 8, YJi' yi) X
P(Yi+l I 8, YJi , YJi+l ,yi)
(79)
where the second factor reduces to P(YJi+l I 8) since YJ/s are conditionally independent of all other YJ's and t's by assumption. From (61) and (62), y satisfies the difference equation (80)
The third factor of (79) is given, therefore, by
By integrating both sides of (79) with respect to YJi , p(8, YJi+l ,Yi+l I yi) = P(Yi+l I yi) p(8, YJi+l I yi+l)
=
I p(8, YJi I yi) P(YJi+l I 8)P(Yi+l I 8, YJi' YJi+lY) dYJi
Therefore,
P(
I i+l) _
8 ,YJi+l
Y
-
f p(8, YJi I yi) p(YJi+l I 8) P(Yi+l 18, YJi; YJi+l' yi) f·
.
dTji
p(8, YJi I y') P(YJi+l I 8) P(Yi+l I 8, YJi' YJi+l' y') d(8, YJi'
YJi+l)
(81)
After carrying out the tedious integration in (81), we establish (76a) for 1 with i 8 _ [(Ci + Di)/ui 2 + BiD i - Ci2]8i + (B i + Ci)Z;juI 2 i+l L1 2
+
i
~
YJi+l =
Mit
1
(C i
+ Di)8;ju1 2 + [(Bi + Ci)/UI 2 + BiDi L1.2
= (l/u I 2 + B i)/L1 i 2
Miri 1 =
(l/u I 2
-
M~ril
=
(l/uI 2
+D
=
Yi+l - a(Yi - Y]i) - bu,
Zi
Ci )/L1 i 2 i)/L1 1
2
1
C i2]Zi
102 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
where
and where
Therefore, M i in (76a) all turn out to be constants which can be precalculated (i.e., calculated off-line). The only on-line computation in generating ui * is that of updating YJi and Bi where Zi depends on Yi , Yi+1 , and U i .
d. System with Unknown Noise Variance The technique similar to that used in the previous example can be used to treat control problems with unknown noise variances. As an illustration, we derive an optimal control policy for the system of Section 2,D,c given by (61) and (62) assuming now that the variances of the observation noises are unknown, and the mean is known. We only give a summary of the steps involved since they are quite similar to those of Section 2,D,c. Another approach to treat unknown noise variances is to use CYJi instead of YJi with 2?( YJi) = N(O, 1) and assume C to be the unknown parameter and apply the method of Section 3 of this chapter. Also see Sections 2 and 3 of Chapter VII. For an entirely different approach see Ref. 91a. Instead of (64), we assume that 2?(TJi)
=
N(O,.E)
where 1: is the variance which is assumed to be distributed according to
where ZO,l
-+- ZO.2
=
1
2.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
103
Namely, the unknown variance of the observation noise is assumed to be either \Sigma_1 or \Sigma_2, with the a priori probabilities given by z_{0,1} and z_{0,2}, respectively. Other assumptions on the distribution functions of the \xi's and of x_0 are the same as before. The probability of \Sigma is now taken to be independent of x_0. With the criterion function as before, each control stage can be optimized separately. To obtain u_i^*, we compute
Defining j E(xj I y )
\+1 can
=
and
Xj
var(xj I yj)
=
r.,
(82)
O~j~N-I
be expressed as Qi+l[(axi
"i+1 =
+ bui )2 + a2Ti + u02]
Under the assumption, to be verified later, that Xi and of U i we obtain o ~ i ~ -1
r, are independent
as the optimal control variable at time i. In order to compute Xi , consider the joint probability density function p(xi , l: I yi). It is easy to see that it satisfies the recursion equation
.
p(xi+l , l: I y'+1)
I-p(X i , l: yi)P(Xi+l Xi , Ui)P(Yi+l Xi+l , l:) dx, ----------------I [numerator] d(xi+l' l:) 1
=
I
I
It is also easy to show inductively that p(Xi , l: I y i)
=
[Zi.l S(l: - l:1)
+ Zi.2 S(l: -
l:2)]N[fl-i(l:), TiCl:)]
(83)
The second factor in (83) is the Gaussian probability density function with mean fl-i(l:) and variance riCl:) , where
+ bui)/(aTi(l:) + u 2) + Yi+1Il:J/[I/(a 2Ti(l:) + u 2) + Ill:] fl-o(l:) = (alu 2 + Yo/l:)/(Ilu 2 + Ill:) IITi+1(l:) = I!(a 2Ti(l:) + 2) + Ill: IITo(l:) = Ih + Ill: Zi+l.l = Zi,lwi.I!(Zi.l + Zi.2 (0 ~ Zi+1.2 = Zi.2 + Zi.2 fl-i+l(l:)
=
[(afl-i(l:)
0
0
3
3
U0
2
Wi.1
Wi.2)
Wi.2/(Zi.l Wi.l
W i . 2)
i
~
N - 1)
104 III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
and where j
=
1,2
Then from (82) and (83) Xi
=
Zi,lfLi(l:l)
r,
+ Zi,2fLi(l:2)
=
Zi,lri(l:l)
+ Zi,2ri(l:2) + Zi,lZi,2[fLi(l:1)
the assumption that satisfied.
Xi
and
r, are independent of
- fL;(l:2)]2
Ui
is thus seen to be
3. Systems with Unknown Plant Parameters A.
OPTIMAL CONTROL POLICY
We had an occasion to mention briefly as Example 7 of Chapter I a control problem where the time constant of a system is random with an unknown mean. We did not derive its optimal control policy however. In this section we will derive optimal control policies for systems having unknown plant and/or observation parameters. Derivations of optimal control policies for this class of systems are carried out quite analogously as in Section 2. Their plant and observation equations are given by (I) and (2). The derivations hinge on the availability of conditional densities P(Ci, {3, Xi I yi), where Ci and {3 are unknown plant and observation parameters of the system. The class of control policies can again be taken to be nonrandomized by the arguments that parallel exactly those of Section 2. Optimal control policies are derived from (84)
where YN
g
AN
Yk+l
g
Ak+l
+ E(Yk+2 I yk),
O~k~N-2
and where Ai
g
JWi(X i ,
U i-1)
p(x i I Xi-I' ex,
U i-1)
1 ~ i ~ N
pea,
Xi-l
I yi-l) d(Xi_1 ,Xi, a),
(85)
3.
105
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
To obtain p(ex, Xi 1 yi), which appears in (85) recursively, consider fl, Xi , Xi+l ,Yi+l I yi) and write it by the chain rule as
p(ex,
p( ex, f3, Xi , Xi+l 'Yi+l I y i) =
p(ex, f3, Xi I yi) P(Xi+l I ex,
f3, Xi ,yi) P(Yi+l I ex, f3, Xi+l , Xi ,yi)
= p(ex, f3, Xi Iyi) p(Xi+l I ex, Xi , Ui) P(Yi+l I f3, Xi+l) Now since
the
recursion equation for Xi I yi) is obtained as
the
p(ex, fl,
p(ex, f3, Xi+l I yi+l)
=
conditional
probability
density
f p(ex, f3, Xi I yi) p(Xi+l I Xi' Ui, ex) X
P(Yi+l I f3, Xi+l) dXi
(86)
J[numerator] d(ex, f3, Xi+l)
and (87)
We next consider several examples.
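Equation (86) can be carried out numerically in the same spirit as Eq. (26) of Chapter II, by tabulating the joint density of the unknown parameter and the state on a grid and updating the table at each observation. The sketch below does this for a scalar plant with a single unknown parameter; the particular plant, observation model, grids, and numbers are assumptions of the sketch and not part of the text.

```python
import numpy as np

# Sketch of the recursion (86): p(alpha, x_i | y^i) held on a grid, for a
# scalar plant x_{i+1} = alpha*x_i + u_i + xi_i, y_i = x_i + eta_i with
# Gaussian noises and an unknown constant alpha.
def gauss(z, sigma):
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

sig_xi, sig_eta = 0.3, 0.5
a_grid = np.linspace(0.2, 1.2, 101)          # grid for the unknown alpha
x_grid = np.linspace(-5.0, 5.0, 201)
da, dx = a_grid[1] - a_grid[0], x_grid[1] - x_grid[0]

# p(alpha, x_0 | y_0): prior on alpha times prior on x_0, updated by y_0
y0 = 0.5
p = np.ones((a_grid.size, x_grid.size))                  # flat prior on alpha
p *= gauss(x_grid - 0.0, 1.0)[None, :]                   # prior p_0(x_0)
p *= gauss(y0 - x_grid, sig_eta)[None, :]                # p(y_0 | x_0)
p /= p.sum() * da * dx

def step(p, u_i, y_next):
    """One application of Eq. (86): propagate x through the plant for each
    alpha, multiply by the new observation density, and renormalize."""
    p_new = np.empty_like(p)
    for j, a in enumerate(a_grid):
        trans = gauss(x_grid[:, None] - (a * x_grid[None, :] + u_i), sig_xi)
        p_new[j] = trans @ p[j] * dx
    p_new *= gauss(y_next - x_grid, sig_eta)[None, :]
    return p_new / (p_new.sum() * da * dx)

for u_i, y_next in [(0.2, 0.9), (0.0, 1.0)]:             # assumed data
    p = step(p, u_i, y_next)

post_alpha = p.sum(axis=1) * dx                          # marginal of alpha
print("E(alpha | y^2) is approximately", (a_grid * post_alpha).sum() * da)
```

The marginal over x gives the a posteriori density of the unknown parameter, which is how information about the plant is accumulated as the control proceeds.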
B.
EXAMPLES
a. System with Unknown Random Time Constant Consider a one-dimensional control system of Example 7, Chapter I, described by the plant equation
o~
i
~
N - 1,
u,
E (-
00,
(0)
(88)
where a's are independently and identically distributed random variables with (89)
for each e where e is the unknown parameter of the distribution function. It is assumed to have an a priori distribution function (90)
106
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
The system is assumed to be perfectly observed: Yi
=
Xi'
(91 )
O~i~N-l
When the common mean of the random time constants is known, the problem reduces to that of a purely stochastic system. As such, this problem has already been discussed in Section 2 of Chapter II. By letting U o ~ 0, the solution of this problem reduces to that for the purely stochastic system as we will see shortly. The criterion function is taken to be (92)
Now
(93)
°
Because of the assumption of perfect observation, the knowledge of x^{N-1} is equivalent to that of a^{N-2}, since a_i = (x_{i+1} - u_i)/x_i if x_i \ne 0, from (88). If x_i = 0 for some i, it is easy to see that u_j = 0, j = i, i + 1, ..., N - 1, is optimal from that point on. Define
N- I
+
°
ul
=
var(a i I ai-I),
~i~N-l
(94)
~i~N-l
(95)
From Eq. (93), using symbols just defined by (94) and (95), (96)
From Eq. (96), the optimal control variable at time N - 1 is given by (97)
since aN - I and uN - I can be seen to be independent of minimal value of YN is given as
UN-I.
The
(98)
3. Now \
=
SYSTEMS WITH UNKNOWN PLANT PARAMETERS
0, i
0
** In this example, the random variables are discrete and it is convenient to deal with probabilities rather than probability densities. It is, therefore, necessary to modify prior developments in an obvious way. Such modifications will be made without further comments.
5.
SUFFICIENT STATISTICS
119
The pair (k, i) is said to be sufficient for \theta, or the number of times the r's are +c is the sufficient statistic for \theta. Denote this number i by s_k. To obtain an optimal control policy for the system (139), one computes, as usual,
=
E(WN I y N-I)
J
WN(YN)P(YN I y N- I) dYN
=
(143)
The conditional probability one needs in evaluating YN, therefore, P(r i I y i ) or P(ri I r i - 1 ) , 0 ~ i ~ N - 1. One can write it as Prir,
=
e I yi)
Pr[r i
=
=
IS
e I ri- I ]
where and where Si = t(i + (lie) LJ:~ rj) is the number of times +c is observed. Therefore, in (143) the conditioning variable yN-l, which is an N-dimensional vector, is replaced by a single number SN-l' and we can write (143) as E(WN I yN-I) = E(WN I SN-I' yN-I) = YN(YN-I, ZN-I) =
+ e + UN-I) + (l - 81)WN(aYN-I - e + UN-I)] + (1 - ZN-I)[8 N(aYN-I + e + UN-I) + (1 - 8 N(aYN_I - e + uN-I)] WN(aYN_I + e + UN-I)fJN-I + WN(aYN-I -- e + UN_I )(1 - ON-I)
zN_I[8 I WN(aYN-I
2W
2)W
=
where
0N-l
is the a posteriori estimate of 8, given yN-\
(144)
120
III.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
and where
where
cxN-1
[:"
=
(~)SN-l
81
(1 -
82 1 - 81
)N-1-S N_
1
Thus YN *, which is generally a function of yN-1 and U N- 2, is seen to be a function of YN-1 and ZN-1 (or SN-1) only, a reduction in the number of variables from 2N - 3 to just two. The derivation of the optimal control policy for this problem and its relation with optimal control policies of the corresponding stochastic systems, where 8 = 81 or 82 with probability one, is discussed later in Section 1 of Chapter VII.
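A small sketch of the Bayesian computation that underlies the a posteriori estimate of \theta used in (144): the posterior probability of \theta, and hence the estimate, depends on the observation record only through the count s_k of "+c" outcomes. The prior weights and the two candidate values of \theta below are assumed for the illustration.

```python
# Sketch: posterior over a two-point prior for theta, computed from the
# sufficient statistic (k, s_k) of Section 5.
theta1, theta2 = 0.7, 0.3          # the two possible values of theta (assumed)
z1, z2 = 0.5, 0.5                  # a priori probabilities of theta1, theta2

def posterior_theta1(s_k: int, k: int) -> float:
    """P(theta = theta1 | y^k), a function of the count s_k alone."""
    w1 = z1 * theta1**s_k * (1 - theta1)**(k - s_k)
    w2 = z2 * theta2**s_k * (1 - theta2)**(k - s_k)
    return w1 / (w1 + w2)

def theta_hat(s_k: int, k: int) -> float:
    """A posteriori estimate of theta, given k observations with s_k '+c' outcomes."""
    p1 = posterior_theta1(s_k, k)
    return p1 * theta1 + (1 - p1) * theta2

print(theta_hat(s_k=6, k=10))      # e.g. six '+c' outcomes in ten observations
```

Whatever the length of the observation record, only the pair (k, s_k) enters these formulas, which is the reduction in the number of arguments noted above.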
6. Method Based on Computing Joint Probability Density Next, we will describe briefly the procedure along the lines proposed by Fel'dbaum'" to evaluate optimal control policies for the class of system of (1) and (2). The method consists in evaluating R i by first computing
rather than by computing p(81 , 82 , ex, j3, ti , YJi , Xi I yi). We evaluate R k for any non-randomized control policies eP k - 1 = (ePo ,... , eP"-l) where Ui = ePi(Ui - l, yi), i = 0, 1,... , k - 1, since the proof for non-randomized controls proceeds quite analogously. We will first discuss the case when 81 and 82 are the only unknowns.
A.
SYSTEMS WITH UNKNOWN NOISE CHARACTERISTICS
In this section we will obtain optimal control policies for the same class of systems under the same set of assumptions as in Section 2. Define
X
p",k-l(X k , yk-1,
X d(x k , yk-1,
gk-l, YJ k - l,
gk-1, YJ k - l,
81 , 82 )
81 , 82 )
(145)
6. where
COMPUTING JOINT PROBABILITY DENSITY
121
pq;-I(X k, yk-\ gk-\ 71 k- \ 81 , (12)
p(81 , ( 2 ) pq;-I(X k I x k- \ yk-l, gk-\ 71 k- \ 81 X P"k-I(Xk-\yk-\ gk-\ 71 k - 1 181 , ( 2)
=
, ( 2)
(146)
and where
- TI P . I(X Ie-I
-
c , 11'l i i v Y i , Si
l't-
I Xi-1, yi-l , k or 1>k will be omitted when 1>k or 1>k appears as a subscript of p such as Pq,k( ... ), or it may be dropped altogether. It is always clear which 1> is meant. For example, in Pq,(Xi+1 I Xi , t i , yi), 1> is really 1>i and Pq,,(Xi+l I Xi' gi ,yi) = P(Xi+l I Xi' gi' u, = ePi(yi)) If Eq. (2) is invertible, then instead of Eq. (151) the joint probability density can be rewritten as
,°
Pq,k_I(Xk,yk-l, e- l , YJ k- l, 81 2) k-l = peel , 02) P(gi I gi-l, 01)P(YJi I YJi-l, 02)
TI
i=l
X P(Yi I Yi-l , Ui-l , gi-l , tu-: , "Ii) P(Xi I Yi , YJi) X
p(X k I Xk- l , Uk_I' gk-l)
(154)
This expression is simpler if P(Yi I Yi-l , U i- 1 , t i - 1 , YJi-1 , YJi) is easily computable. If the values of parameter vectors (J1 and (J2 are known, say (J1 * and (J2 *, then (155)
but values of 81 and 82 are actually unknown and only the a priori density function for 81 and (J2 is assumed given as poe (Jll (J2)' We use the a posteriori density function for p(81 , (J2)' After y k has been observed, define Pk(Ol , 02) = Pq,(OI , 02 I y k)
°
PO(OI , 02) pq,(yk I 1,°2)
(156)
6.
COMPUTING JOINT PROBABILITY DENSITY
123
Equation (156) is evaluated from (154) or (151) using the simpler expression of the two. Equation (156) is rather complicated in general. If (154) is applicable, then (156) becomes a little simpler with Pk(y k I (Jl , (JZ)
=
JTI p( e. I ti-l, k
(Jl)
P(TJi I TJ i-\ (JZ)
i~O
Otherwise, one needs to evaluate pk-1(yi
I (Jl' (J2)
JTI k
=
P(ti I t i-\ (Jl) P(TJi I Tf i - \ (JZ)
i~O
Note the joint density expressions such as (151) and (154) are essentially the same as (38) or (39) which are obtained by the repeated application of the recursion formula for the conditional probability density functions. The method of the present section requires the generation of the joint probability density functions such as (151) or (154), which are used in obtaining the optimal control policies. In our method developed in Chapter II and in the previous sections of this Chapter, the conditional density expressions appear directly in the equations for optimal control policies, and the conditional densities are generated recursively. For example, Pk(8 1 , 82 ) of (156) is generated as follows. Using the chain rule, P((Jl, (Jz , Xi , Xi+1 , Yi+l I yi)
=
P((Jl , (Jz , Xi I yi) X P(Xi+1 I Xi , (Jl ,yi) P(Yi+l I Xi+l , (Jz)
Integrating it with respect to
thus, P((Jl , (Jz , Xi+1 I yi+l)
=
Xi ,
JP((Jl , (Jz , Xi I yi) p(xi+1 I Xi , e, , ui) X
and
P(Yi+l I Xi+l , (Jz) dx,
-;:-----"---=---'-'-=-_:....:c:c_=-,-
_
J [numerator] d((Jl , (Jz , x i+1)
124 III.
B.
ADAPTIVE CONTROL SYSTEMS; OPTIMAL CONTROL POLICIES
OPTIMAL CONTROL POLICY
The procedure to obtain the optimal control policy is quite similar to that already discussed in Sections 2-4:
Ak ~
JWk(Xk , Uk-1)p(xo , to, TJo I 8
1 ,
k-1 X
II P(ti I t i - \
82 ) p(Xk I Xk-1 , Uk-1 , e-1)Pk(81 , 82 )
81 ) P(TJi I TJ i-\ 82 ) p(X i I Xi-1 , Ui-1 , ti-1)
i~1
X
Eq,k-1(Wk )
=
P(Yi I Xi , TJi) d(xk, t k- \ TJk-1)
JA dy 7e
(159)
k- 1
The optimal control policy is now obtained from
(160)
and k
where the asterisk indicates
= 1,... , N - 1
ut-1 , ut_2 , ... , Uk
* are
substituted in
h *.
C. SYSTEMS WITH UNKNOWN PLANT PARAMETERS
From the discussions in Sections 6,A and B, it is clear that similar developments are possible when the plant equation contains unknown parameters iX and f3 or, more generally, iX, f3, 81> and 82 , Since the procedure to treat such systems are almost identical, only results are given for the system when iX is the only unknown plant parameter. An a priori probability density function for iX, PO{iX) is assumed given. Again, optimal control policies turn out to be nonrandomized. Hence the probability density for Uk is omitted. The probability density function one needs is now k
=
0, 1, 2, ... ,N - 1 (161)
where (162)
7.
125
DISCUSSIONS
and where p",(Xk+1 !IX,yk) = p(XO)
Remember that they depend on
Uk.
k
TI P(Xi+1 I Xi , Ui , IX) o
Equation (162) can be computed from
J
k dx" h( IX I yk) = -;;-'-PO(IX) p",(x I uk-I, IX) c:-:-~
~
JPO(IX)P(Xk I uk-I, IX) d(xk, IX)
Define
Ak
(163)
(164)
by
k-I
=
k-I
JWkP(XO) TIo P(Yi I Xi) TI P(Xi+1 I Xi' Uk' IX) h-I(IX) d(xk, IX)
(165)
0
k
= 1,... ,N
(166)
Define

    γ_N*(y^{N-1}, u^{N-2}) = min_{u_{N-1}} Λ_N(y^{N-1}, u^{N-1})                         (167)

where u*_{N-1} = f(y^{N-1}, u^{N-2}) minimizes (167) with respect to u_{N-1}. Define

    γ*_{N-1}(y^{N-2}, u^{N-3}) = min_{u_{N-2}} [ Λ_{N-1} + ∫ γ_N*(y^{N-1}, u^{N-2}) dy_{N-1} ]      (168)

The optimal u_{N-2} is obtained from (168) as a function of y^{N-2} and u^{N-3}. In general, by defining

    γ_k*(y^{k-1}, u^{k-2}) = min_{u_{k-1}} [ Λ_k + ∫ γ*_{k+1}(y^k, u^{k-1}) dy_k ],    k = 1, ..., N − 1      (169)

we thus obtain a sequential procedure for evaluating the Bayes control policy.
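The backward structure of (167)-(169) can be illustrated with a small dynamic-programming table. The sketch below is an assumption-laden toy: it supposes, purely for illustration, that Λ_k and the predictive density of the next observation depend only on the most recent observation and control, so that γ_k* can be tabulated over a finite observation grid; the arrays Lam and P are invented stand-ins for the quantities the text defines.

```python
import numpy as np

# Toy backward recursion in the spirit of (167)-(169), under the simplifying
# (illustrative) assumption that Lambda_k and the predictive density of the next
# observation depend only on the most recent observation and control, so that a
# finite table gamma[k][y] can stand in for gamma_k*(y^{k-1}, u^{k-2}).

rng = np.random.default_rng(1)
N, n_y, n_u = 6, 5, 3                      # horizon, observation grid size, control grid size
Lam = rng.uniform(0.0, 1.0, size=(N + 1, n_y, n_u))        # assumed one-stage terms Lambda_k(y, u)
P = rng.uniform(0.1, 1.0, size=(n_u, n_y, n_y))            # assumed p(y_k | y_{k-1}, u_{k-1})
P /= P.sum(axis=2, keepdims=True)                          # normalize each row to a density

gamma = np.zeros((N + 2, n_y))             # gamma[N+1] = 0 terminates the recursion
policy = np.zeros((N + 1, n_y), dtype=int)

for k in range(N, 0, -1):                  # k = N, N-1, ..., 1
    for y in range(n_y):
        # value of each candidate control: Lambda_k plus the expected cost-to-go over y_k
        q = Lam[k, y, :] + np.array([P[u, y, :] @ gamma[k + 1] for u in range(n_u)])
        policy[k, y] = int(np.argmin(q))   # minimizing control u_{k-1}*(y)
        gamma[k, y] = q[policy[k, y]]      # gamma_k*(y)

print("gamma_1*(y) on the grid:", np.round(gamma[1], 3))
print("optimal first control for each y:", policy[1])
```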
7. Discussions

In this chapter, the main problem has been to compute certain conditional densities of the form p(v_i | y^i), where the variable v_i contains the unobservable variables such as x_i, (x_i, α), (α, x_i, ξ_i), or (α, β, θ_1, θ_2, ξ_i, η_i), as the case may be. The variable v_i is chosen in such a way that p(v_i | y^i) can be computed recursively starting from p(v_0 | y_0), and such that E(W_k | y^{k-1}) is easily evaluated in terms of p(v_{k-1} | y^{k-1}). The conditioning variables contain only observable variables or known functions of the observable variables. In the next chapter, we will expand on this point further and develop a general theory of optimal control for a more general class of systems.

So far, all the discussions have been on control systems with known deterministic inputs. When stochastic inputs are considered, as indicated in Section 1, and a different criterion function

    J = Σ_{k=1}^{N} W_k(x_k, d_k, u_{k-1})

is taken for the same system (1) and (2), where d_k is the desired stochastic response of the system at time k and where the actual input z_k is given by

    z_k = K_k(d_k, ζ_k)

then a development similar to Sections 2, 3, 4, and 6 to obtain a closed-loop optimal control policy is possible if the desired form for u_k is specified. For example, we may assume that the probability density function for d_k is known except for the parameter μ ∈ Θ_μ, with Θ_μ given, and that the probability density function for ζ_k, which is assumed to be independent of the ξ's and η's, is completely given. If z^k, in addition to y^k and u^{k-1}, is assumed to be observed, then, using ∫ p(μ | z^k) p(d_k | μ) dμ as the probability density function for d_k, it is possible to discuss optimal control policies for discrete-time systems where the θ's, α, β, and/or other parameters may additionally be assumed unknown. Unlike the a posteriori density functions for the θ's, α, or β, the density p(μ | z^k) does not depend on the control policy employed, since the z_k are observed outside the control feedback loop. In such cases information on μ is accumulated passively, by merely observing a sequence of random variables whose realizations cannot be influenced by any employed control policy. The procedure of obtaining the optimal control policies in Section 6 accomplishes the same thing as the procedure based on evaluation of conditional expectations.
In the former, the computation of Λ_k is complicated, whereas in the latter, that of p(v_i | y^i) is the major computation, where v_i could be x_i, (x_i, α), or (x_i, α, β, θ_1, θ_2), as the case may be. Once p(v_i | y^i) is available, the computation of Λ_k is relatively easy. Thus the method of this book differs from that of Fel'dbaum primarily in the way the computations of E(W_k) are divided. Our method is superior in that the dependence of Λ_k on p(v_i | y^i) is explicitly exhibited, and hence the introduction of sufficient statistics is easily understood. The dependence of the Λ's on some statistics is also explicitly shown in our method. The similarity of the main recursion equations of Chapters II and III is also more clearly seen in our formulation. Also, in Section 6, the a posteriori density functions for unknown system and/or noise distribution function parameters are incorporated somewhat arbitrarily and heuristically, whereas in our method they are incorporated naturally when p(v_i | y^i) is computed.

It is worthwhile to mention again that the problems of optimal control become more difficult when observations are noisy. We have discussed enough examples to see that the derivations of optimal control policies are much easier when state vectors and realizations of random variables are measurable without error than when only noisy measurements on state vectors are given. The difficulties in deriving optimal controls are compounded many times when the statistics of the measurement noises are only partially known.
Chapter IV
Optimal Bayesian Control of Partially Observed Markovian Systems
1. Introduction

In the previous two chapters we have derived formulations for optimal Bayesian control policies for purely stochastic and adaptive systems. We noted that the main recursion equations for optimal control policies are identical for these two classes of systems. The slight differences are in the auxiliary equations that generate certain conditional probability densities for these two classes. The only quantities that are not immediately available and must be generated recursively are the conditional probability densities, which are p(x_i | y^i) in the case of purely stochastic systems and are p(x_i, θ | y^i) or p(x_i, θ_1, θ_2, α, β | y^i), etc., in the parameter-adaptive systems. The other probability densities needed in computing the γ's are immediately available from the plant and observation equations and from the assumed probability distribution functions of noises and/or random system parameters. In each case, the conditioning variables are the variables observed by the controller or some functions of these observed variables, such as y's, u's, or sufficient statistics. The other variables are the quantities not observed by the controller, such as x's or (x_i, θ_1, θ_2, α, β), etc. Developments in the previous two chapters are primarily for systems with independent random disturbances, although possible extensions for systems with dependent noises have been pointed out from time to time. In this chapter we consider more general systems where the noises ξ and η may be dependent and where unknown plant and observation parameters α and β may be time-varying. We present a formulation
general enough to cover much wider classes of control systems than those considered so far. See also Refs. 2, 14-16, 56, 105a, 130, and 135 for subjects related to this chapter.

The class of systems of this chapter is assumed to be described by a plant equation

    x_{i+1} = F_i(x_i, u_i, ξ_i, α_i),    i = 0, 1, ..., N − 1                           (1)

and the observation equation

    y_i = G_i(x_i, η_i, β_i),    i = 0, 1, ..., N − 1                                    (2)
where the ξ's and η's are noises and where the system parameters α and β are now subscripted to include the possibility that these unknown system parameters are time-varying. When they are not subscripted, they are understood to be constants. The criterion function is the same as before:

    J = Σ_{i=1}^{N} W_i(x_i, u_{i-1})
Only the class of nonrandomized control policies will be considered. It is fairly clear that one can heuristically argue that optimal control policies for systems of (1) and (2) are nonrandomized in much the same way as before. It is fairly clear also that the approach of Chapters II and III, where certain conditional probability densities have been computed recursively to derive optimal control policies, can be extended to cover the class of control problems of this chapter.

As an illustration, consider a system with the unknown plant parameter α and the unknown observation parameter β. The noises ξ and η are assumed to form mutually independent first-order Markov sequences such that the unknown parameters θ_1 and θ_2 characterize their respective transition probability densities, i.e., p(ξ_{i+1} | ξ_i, θ_1) and p(η_{i+1} | η_i, θ_2). We know that, if p(α, x_j, ξ_j | y^j) is known for all 0 ≤ j ≤ N − 1, then

    Λ_i ≜ E(W_i(x_i, u_{i-1}) | y^{i-1})
        = ∫ W_i(x_i, u_{i-1}) p(x_i | x_{i-1}, u_{i-1}, α, ξ_{i-1}) p(α, x_{i-1}, ξ_{i-1} | y^{i-1}) d(x_{i-1}, x_i, α, ξ_{i-1})
        = ∫ W_i(x_i, u_{i-1}) p(x_i | y^{i-1}) dx_i
is computable for all 1 ≤ i ≤ N, and nonrandomized optimal control policies are derived from them. The conditional density p(α, x_i, ξ_i | y^i) is obtained by computing recursively the conditional densities of a certain vector which suitably augments (x_i, y_i), all conditioned on y^i. For example, p(x_i, ξ_i, η_i, θ_1, θ_2, α, β | y^i) is computed recursively. The derivation of such a recursion relation is carried out as usual. The chain rule is used to write

    p(x_i, ξ_i, η_i, θ_1, θ_2, α, β, x_{i+1}, ξ_{i+1}, η_{i+1}, y_{i+1} | y^i)
        = p(x_i, ξ_i, η_i, θ_1, θ_2, α, β | y^i)
          × p(x_{i+1} | x_i, ξ_i, α, u_i) p(ξ_{i+1} | ξ_i, θ_1) p(η_{i+1} | η_i, θ_2) p(y_{i+1} | x_{i+1}, β)

where the assumptions on the ξ's and η's are used to simplify some of the conditional density expressions. Integrating both sides with respect to x_i, ξ_i, and η_i,

    ∫ p(x_i, ξ_i, η_i, θ_1, θ_2, α, β, x_{i+1}, ξ_{i+1}, η_{i+1}, y_{i+1} | y^i) d(x_i, ξ_i, η_i)
        = p(y_{i+1} | y^i) p(x_{i+1}, ξ_{i+1}, η_{i+1}, θ_1, θ_2, α, β | y^{i+1})

Therefore,

    p(x_{i+1}, ξ_{i+1}, η_{i+1}, θ_1, θ_2, α, β | y^{i+1})
        = ∫ p(x_i, ξ_i, η_i, θ_1, θ_2, α, β | y^i) δ(x_{i+1} − F_i) p(ξ_{i+1} | ξ_i, θ_1) p(η_{i+1} | η_i, θ_2) δ(y_{i+1} − G_{i+1}) d(x_i, ξ_i, η_i)
          / ∫ [numerator] d(x_{i+1}, ξ_{i+1}, η_{i+1}, θ_1, θ_2, α, β)                    (3)

The recursion (3) is started from a given a priori probability density for (x_0, ξ_0, η_0, θ_1, θ_2, α, β):

    p(x_0, ξ_0, η_0, θ_1, θ_2, α, β | y_0)
        = p_0(x_0, ξ_0, η_0, θ_1, θ_2, α, β) p(y_0 | x_0, β)
          / ∫ [numerator] d(x_0, ξ_0, η_0, θ_1, θ_2, α, β)
Conditional probability densities and optimal control policies for systems under different sets of assumptions can be similarly derived by first augmenting (x, y) appropriately so that the conditional densities for the augmented vectors are more easily obtainable.
As another example, if the system parameter α is not a constant but a Markov random variable with known transition probability density p(α_{i+1} | α_i), and if the ξ's and η's are all independent with known densities, then p(x_i, α_i | y^i) can be recursively generated similarly to (3) and used to evaluate

    E(W_{i+1}(x_{i+1}, u_i) | y^i)
        = ∫ W_{i+1}(x_{i+1}, u_i) p(x_{i+1} | x_i, α_i, u_i) p(x_i, α_i | y^i) d(x_i, x_{i+1}, α_i)

where

    p(x_{i+1}, α_{i+1} | y^{i+1})
        = ∫ p(x_i, α_i | y^i) p(x_{i+1} | x_i, α_i, u_i) p(α_{i+1} | α_i) p(y_{i+1} | x_{i+1}) d(x_i, α_i)
          / ∫ [numerator] d(x_{i+1}, α_{i+1})
Instead of cataloging all such systems which are amenable to this approach, we will develop a general method which subsumes these particular cases. The approach we use in deriving optimal control policies for such systems is to augment the state vector x and the observation vector y with appropriate variables in such a way that the augmented state vector becomes a (first-order) Markov sequence. Then, in very much the same way as in Chapters II and III, the optimal control policy is obtained once we compute certain conditional probability densities of the unobserved portion of the augmented state vector, i.e., the components of the augmented state vector which are not available to the controllers, conditioned on the observed portion, i.e., the components of the state vector which are made available to the controllers. The knowledge of the controller consists, then, of the past controls, the observation data, i.e., the collection of the observed portions of the augmented state vectors and of the a posteriori probability distribution function of the unobserved portion of the augmented state vector. This amount of information is summarized by sufficient statistics if they exist. The derivation of the optimal control policy and the a posteriori probability distribution function, assuming the existence of the probability density function, is discussed in Section 3 and 4, respectively. In the next section, we pursue the subject of Markov properties of the augmented state vectors which are of basic importance. See also Ref.51a.
2. Markov Properties

A. INTRODUCTION
In some cases {(x_i, y_i)} is already a first-order Markov sequence, as will be shown presently in this section. When this is not the case, there is more than one way, depending on the assumptions about noises and parameters in the plant and observation equations, of augmenting (x_i, y_i) so that the resulting vector becomes a first-order Markov sequence. Clearly, if {ζ_i} is Markovian, where ζ_i is some augmented state vector, then we do not destroy the Markov property by adding to ζ_i independent random variables with known distribution functions. Simplicity and ease of computing the conditional densities would dictate the particular choice in any problem. The question of the minimum dimension of the augmented state vector ζ_i needed to make {ζ_i} Markovian is important theoretically but will not be pursued here. Generally speaking, the a posteriori probability density functions such as p(x_i | y^i) and p(x_i, ξ_i, η_i | y^i) are sufficient in the sense that the corresponding density functions at time i + 1 are computable from their known values at time i. We can include the a posteriori probability density function as a part of an augmented state vector to make it Markovian. The dimension of such augmented vectors, however, is generally infinite. We are primarily interested in finite-dimensional augmented state vectors. As an example of a system where {(x_i, y_i)} is a first-order Markov sequence, consider a purely stochastic dynamic system of Chapter II:
'i
+
Yi = Gi(Xi, "1i),
i = 0, 1,... , N - 1
(4)
where fs and YJ's are mutually independent and independent in time and have known probability densities and (4) contains no unknown parameters. Consider a vector (5)
In (5), Yi and Ui are the only components observed by the controller. We will see that under certain assumptions gi}-process is a first-order Markov sequence, where the conditional probability of 'HI is such that Pr['i+l EEl
'0 =
Zo , ... , ~i = Zi] = Pr['i+l EEl 'i = Zi]
for all i, where E is any measurable set in the Euclidean space with the same dimension as ,. This is the transition probability of g}-process.
2.
133
MARKOV PROPERTIES
It is assumed furthermore that the transition probability density p( 'i+l I 'i) exists so that Pr['i+l EEl 'i = z;] =
r
&.'
~i+lEE
P('i+l I 'i = Zi) d'i+l
Let us compute the conditional probability density P('i+l I 'i) of (5) assuming that Ui depends only on Yi' or at most on Yi and Ui- 1 • This assumption will be referred to as Assumption Y. We have seen several examples in previous chapters where Assumption Y holds true. Generally speaking Assumption Y implies that Yi is the sufficient statistics for Xi , i.e., p(x i I y i) = p(x i I Yi) and P(Yi+l I yi, Ui) = P(Yi+l I Yi , Ui)' Then 'Yi+l will be a function of Yi and Ui~l rather than y' and u'', and ui * is obtained as a function of 'Yi rather than of y i [see, for example, Eq. (21) of Section II, 2, BJ. Detailed discussions on the validity of Assumption Y is presented later in this section. With Assumption Y, we can write p( 'i+l I 'i) as
= 8(x i+l - F i) P(gi+l) p("Ii+l) 8(Yi+l - Gi+l) X 8(Ui+l ~ (MYi+l , u i ) )
where the independence assumption of the random noises Thus, we see that
IS
used.
and P"'('i+l I 'i) is computable as a function of By a slight change of terminology, this example can be rephrased as a control system with the plant and observation equations
°
Assume that its control policy is given by u_i = φ_i(x_i). Suppose that the ξ_i's are independent and that E{f_i[ξ_i, φ_i(x_i)] | x_i} = 0 for all x_i. Then E(x_{i+1} | x_i) = x_i and {x_i} is a martingale. There are other, less trivial, examples. We discuss next the maximum likelihood estimation problem of an unknown parameter. We know from Chapters II-VI that, for some optimal adaptive control problems, the optimal control policy synthesis can be separated from the optimal estimation process of the unknown plant parameters and/or noise statistics. Maximum likelihood estimates are often used when exact Bayesian conditional expectation estimates are not available or are too cumbersome to work with. If the random variables are Gaussian, these two kinds of estimates coincide. Suppose we have a system whose unknown parameter θ is assumed to be either θ_1 or θ_2.47a Consider the problem of constructing the maximum likelihood estimate of θ given a set of (n + 1) observed state vectors at time n, y^n. Suppose that p(y^n | θ_i) is defined for all n = 0, 1, ... and i = 1, 2. Form the ratio

    z_n = p(y^n | θ_2) / p(y^n | θ_1)                                                    (12)
The probability density p(yn I 8), when regarded as a function of 8 for fixed yn, is a likelihood function mentioned in Section 2, Chapter V. Hence Zn is called the likelihood ratio. Since 8 = 81 or 82 in the example, the maximum likelihood estimate of 8 is 82 if Zn > 1, 81 if Zn < 1, and undecided for Zn = 1. Thus, the stochastic process {zn} of (12) describes the time history of estimate of 8. To study the behavior of the sequential estimate of 8, one must study the behavior of {zn} as n ---+ 00. Since p is a density function, the denominator is nonzero with probability one. Let us assume that
204
VI.
CONVERGENCE QUESTIONS
p(yn I ()2) = 0 whenever p(yn I ()1) = 0 since otherwise we can decide () to be ()2 immediately. Suppose ()1 is the true parameter value. Then
and
Then, since zn are random variables which are functions of yn, with probability
I
(13)
Taking the conditional expectation of (13) with respect to zn, E(E(zn+1 I yn) I zn)
=
E(E(zn+1 I yn, zn) I zn)
= E(Zn+l I zn) =
st»; I zn)
Thus, it is seen that the sequence of likelihood ratios, {zn}, is a martingale. For more practical engineering examples, see, for example, Daly,42.43 Kallianpur'", Raviv.P"
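A quick numerical check of this martingale property can be made with any convenient model. In the sketch below the observations are assumed to be independent N(θ, 1) variables with θ_1 the true value; the model, the numbers, and the function names are illustrative assumptions, not taken from the text. The conditional expectation E(z_{n+1} | y^n) is estimated by averaging over draws of y_{n+1} and compared with z_n.

```python
import numpy as np

# Illustrative check (model and numbers are assumptions, not from the text): with
# i.i.d. observations y_k ~ N(theta, 1) and theta_1 the true value, the likelihood
# ratio z_n = p(y^n | theta_2)/p(y^n | theta_1) satisfies E(z_{n+1} | y^n) = z_n.

rng = np.random.default_rng(7)
theta1, theta2 = 0.0, 0.5

def ratio_step(y):
    """p(y | theta_2) / p(y | theta_1) for a single N(theta, 1) observation."""
    return np.exp(-0.5 * ((y - theta2) ** 2 - (y - theta1) ** 2))

# One observed history y^n generated under theta_1, and its likelihood ratio z_n.
n = 20
y_hist = rng.standard_normal(n) + theta1
z_n = np.prod(ratio_step(y_hist))

# Conditional expectation of z_{n+1} given y^n, averaging over y_{n+1} ~ p(. | theta_1).
y_next = rng.standard_normal(200_000) + theta1
z_next_mean = z_n * ratio_step(y_next).mean()

print("z_n              :", z_n)
print("E(z_{n+1} | y^n) :", z_next_mean)   # agrees with z_n (martingale property)
```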
4. Convergence Questions: General Case

A. PROBLEM STATEMENT
We now make precise the statements made in Section 1. This section is based on Blackwell and Dubins.31 A common frame of reference used in this book in judging the performances of control systems is the expected value of some scalar-valued function J. In the case of adaptive control systems, the expected values EJ depend, among others, on the unknown parameters θ taking their values in some known parameter space Θ.
4.
CONVERGENCE QUESTIONS: GENERAL CASE
205
There are many other systems, not necessarily control systems, whose performances are judged using this common frame of reference. A sequence of measurements y_0, y_1, ... is made while a given system is in operation, where the measurement mechanisms are assumed to be designed so that the y's are functions, among others, of θ; i.e., the joint conditional probability density p(y^n | θ) is assumed given for each θ ∈ Θ. An a priori probability density for θ, p_0(θ), is also assumed given. The a posteriori probability density p(θ | y^n) is computed by the Bayes rule

    p(θ | y^n) = p_0(θ) p(y^n | θ) / ∫ p_0(θ) p(y^n | θ) dθ
Now we formulate (from the questions on pp. 197 and 198):

Question 3'. Under what conditions does p(θ | y^n) converge as n → ∞?

Question 4'. Given two a priori probability densities p_0(θ) and q_0(θ), under what conditions do the resulting a posteriori densities approach the same density?

In Questions 3' and 4', the closeness or the distance of any two probabilities P_1 and P_2, defined for the same class of observable (i.e., measurable) events, is measured by the least upper bound of the absolute differences of the probabilities assigned to all such events by P_1 and P_2. Denote the distance of the two probabilities by ρ(P_1, P_2) and the class of observable events by F.
In the language of the theory of probability, Q is a sample space, ff is the a field of subsets of Q, and PI and P 2 are two probability measures so that PI(A) and P 2(A) are the probabilities assigned to A, for every AEff. Some of the definitions and facts from the probability theory are summarized in Appendix I at the end of this book. Question 3' asks for the conditions for p(pn, P*) ---+ 0 as n ---+ 00 for some probability P" where pn is the nth a posteriori probability. Question 4', therefore, asks for the conditions under which
where pn and Qn are the nth a posteriori probabilities starting from the a priori probabilities Po and Qo , respectively.
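For a finite parameter set the distance ρ(P^n, Q^n) can be computed exactly, since the supremum over events of |P^n(B) − Q^n(B)| equals half the sum of the absolute differences of the point masses. The sketch below, with an assumed three-point parameter set, two assumed priors, and Gaussian likelihoods (none of which come from the text), updates both priors by the Bayes rule on the same data and prints the distance as n grows.

```python
import numpy as np

# Illustrative sketch (model and priors assumed, not from the text): two different
# priors p0 and q0 on a finite parameter set, updated by the Bayes rule on the same
# data.  For a finite space, sup_B |P^n(B) - Q^n(B)| equals half the L1 distance
# between the two posterior probability vectors.

rng = np.random.default_rng(3)
thetas = np.array([-1.0, 0.0, 1.0])        # candidate parameter values (assumed)
true_theta = 1.0                           # data-generating value
p = np.array([0.2, 0.5, 0.3])              # prior p0 (assumed)
q = np.array([0.6, 0.3, 0.1])              # prior q0, absolutely continuous w.r.t. p0

def likelihood(y):
    # p(y | theta) for y ~ N(theta, 1), evaluated at every grid value of theta
    return np.exp(-0.5 * (y - thetas) ** 2)

for n in range(1, 31):
    y = true_theta + rng.standard_normal()
    p = p * likelihood(y); p /= p.sum()    # posterior from prior p0
    q = q * likelihood(y); q /= q.sum()    # posterior from prior q0
    dist = 0.5 * np.abs(p - q).sum()       # sup_B |P^n(B) - Q^n(B)|
    if n in (1, 5, 10, 30):
        print(f"n = {n:2d}   rho(P^n, Q^n) = {dist:.4f}")
```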
206 B.
VI.
CONVERGENCE QUESTIONS
MARTINGALE CONVERGENCE THEOREMS
Both Questions 3' and 4' of Section 4,A are answered by straightforward applications of martingale convergence theorems. The forms in which we will use them are stated here without the proofs: The proofs can be found, for example, in Refs. 31, 47a. See also Appendix I at the end of this book.
Theorem 1. Let Zn be a sequence of random variables such that I has a finite expectation, converges almost everywhere to a random variable z, and let Yo, Yl ,... be a sequence of measurements. Then
SUPn , Zn
lim E(zn I yn)
n->ro
=
E(z I Yo, Yl ,... )
Theorem 2. Let in be any sequence of random variables that converges to 0 with probability 1. Then, with probability 1 and for all € > 0, converges to 0 as n
---+ 00.
We also note here that zero-one laws in their various versions,47a,102 although they can be proved directly, can also be proved by applying Theorem 1. Let
Z
= Is, where Is is the indicator of the event B, i.e., wEB w¢B
where the event B is defined on the sequence of the measurements Yo 1 Yl ,.... Then, from Theorem 1, one has
Theorem 3. on the Yn's.
PCB I y n )
. s t,
with probability 1, where B is defined
C. CONVERGENCE
We now consider the convergence of the a posteriori probability densities pee I y n ) to the true value of e, eo . It is assumed that there exists: (i) an a priori probability density which assigns a positive probability to some neighborhood of eo ;
4.
CONVERGENCE QUESTIONS: GENERAL CASE
207
e
(ii) a subset B in such that the event 80 E B is defined on the Yn's; namely, with nonzero probability, there is a realization of measurements Yo 'Y1 ,... such that the sequence of functions of Yo, Y1 ,... computed according to the Bayes rule converges to 80 • Then, by Theorem 3, PCB [yn)
L
d8p(8 jyn) ......... 1 or 0
=
depending on
This is equivalent to saying
D.
MUTUAL CONVERGENCE
For the sake of convenience in writing, let w = (y n ) be the measurements that have been performed and let v = (Yn+l , Yn+2 ,...) be the future measurements. Let A be a measurable set in the product space ~n+l X ~n+2 X "', where ~k is a a field of subsets of Y k , the space of outcomes for the kth measurement Yle . Let B be a measurable set in ff, a a field of subsets of e. Let pn(A, B I w) be the conditional probability of v being in A and B being in B, given w, when the probability density Po(B) is adopted as the a priori density function on e. The conditional probability Qn(A, B I w) is similarly defined with qo(B) as its a priori probability density. qo( B) is assumed to be absolutely continuous with respect to Po(8). Namely it is assumed that a nonnegative function of 8, feB) ?: 0, exists such that
qo(8)
= Po(8)f(8).
The main result can be stated that except for the measurement sequence with Q-probability (i.e., the probability constructed from qo(8») zero, p(pn, Qn) -+ O. The convergence to zero of distance SUPB p[pn(B I w), Qn(B I w)] as n -+ 00 is implied by the convergence to zero of SUPA,B p(pn(A, B I w), Qn(A, B I w). Therefore, the proof for this latter is sketched in this section. See Ref. 31 for details. Because of the absolute continuity assumption, we can write Q(e) =
Ie
0
[dn(w, v, 8) - 1] dpn(v, 8 I w)
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
=
J
dn-l
>€
(dn ~ 1) dpn(v, 8 I w)
+J
Q E}
Thus The last step comes from Theorem 2. Thus, if given a priori density function for 8, Po(8), any other choices of a priori density functions also converge to the same density function eventually so long as they are absolutely continuous with respect to Po(8).
5. Stochastic Controllability and Observability A.
INTRODUCTION
In deterministic dynamical systems the concepts of controllability and observabilit y8 7- 89 , l oo , 142 play a very important theoretical role in characterizing possible system behaviors. As pointed out in Section 1, the corresponding concepts of observability and controllability of stochastic systems exist and are intimately connected with the convergence questions of a posteriori probability density functions such as P(xi I yi) or p(x o 11') as i -+ 00. We will define these concepts in terms of the covariance matrices associated with these a posteriori density functions. * We have discussed in Chapters II-V the procedure for generating these a posteriori density functions for general systems with nonlinear plant and observation equations. Therefore, their definitions will in principle be applicable
* By the duality principle discussed in Section 3,C, Chapter II, the results of this section can be translated into the corresponding results on the asymptotic behaviors of regulator systems or vice versa. See Ref 89a. The independent investigation of the asymptotic behaviors of the error covanance matrices is of interest since it sheds additional light on the subject.
VI.
210
CONVERGENCE QUESTIONS
to general nonlinear stochastic systems even though they are developed for stochastic systems with linear plant and observation equations in this section. Let us illustrate by simple examples how the question of observability arises in stochastic control systems. Consider a system with the plant equation A nonsingular (14) and with the observation equation Yi =
Hix i
+ n,
(15)
where the matrix A is assumed known, where Hi YJ's are some observation noises. Then, from (14) and (15),
=
Ar', and where
showing that Yi observes a noisy version of X o for all i = 0, 1,.... In this case, since (14) is deterministic, Xi for any i > 0 can be constructed if X o is known exactly. Thus assuming the existence of p(x o I yn), if p(xo I v") converges as n ---+ 00 to a delta function then so does the density function for Xi , at least for stable systems, since Xi = Aix o . Instead of (14) consider now a system (16)
Then Xi
=
i-I
Ai X O
+ L:
Ai-l-jCi~j
j~O
and i-I
Yi =
HAi X o
+ L:
HAi-HC;~j
+ 7];
j~O
If HAi = 0 for some i o , then HAk = 0 for all k ~ i o . Therefore, no matter how many observations are made, Yk does not contain X o for k ~ i o . It is not possible to get more precise information on X o than that contained in Yo ,... , Yi o- l . Similarly, if the density function for X o is not completely known, for example if the distribution function of X o contains an unknown parameter 81 , then the observation scheme of (16) is such that p(81 I y n ) remains the same for n ;:? io . Then we cannot hope to improve our knowledge of 81 beyond that at time i o no matter how
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
211
many observations are taken. Therefore, we may want to call the system with (14) and (15) stochastically observable and the system with (16) stochastically unobservable or observable in a weaker or wider sense. Observability of stochastic systems is then defined as the existence condition of the system state vector estimates with certain specified asymptotic behaviors, where the class of the state vector estimates of Xi is taken to be functions of y i . Such observability may be called on-line observability. There is another type of observability concept which may be called off-line observability. The class of state vector estimates of Xi is no longer restricted to be functions of yi but is taken to be of Yj ,Yj+l ,..., Yj+k' where j > i or j < i. The behavior in some probability sense of such estimates as k ----+ CX) is then investigated. Both are of interest in system applications. In this book, the on-line observability is developed using the convergence in the mean. We now make these preliminary remarks more precise. Given two square matrices of the same dimension, A and B, we use the notation A ? B when (A - B) is nonnegative definite and A > B when (A - B) is positive definite. B.
OBSERVABILITY OF DETERMINISTIC SYSTEMS
Consider a deterministic system described by
where x's are n vectors and y's are m vectors and where Ai exists for all i. The problem is to determine the x/s from a sequence of observations Yo ,Yl ,.... Because of the deterministic plant equation, the problem is equivalent to determining x at anyone particular time, say x o, from a sequence of observations Yo , Yl ,.... Of course, the problem is trivial if m ? n and H o has rank n. Then X o is determined from Yo alone by Xo =
Xi
(Ho'Ho)-lHoyo
More interesting situations arise when m < n, Let us determine from y i . Defining the (i l)m X n augmented H matrix by
+
H z.
=
~
H 0 1> O ' i ~
H l 1>l.i •
Hi
212
VI.
CONVERGENCE QUESTIONS
where ePk,j is the transition matrix from x j to y vector by
Xk
and an augmented
we can write
N ow if (H/Hi ) is nonsingu1ar, then
i.e., if Hi has rank n, then the definition of Hi to
Xi
can be determined from
yi.
By changing
we obtain
Such a system is called observable. The condition that the rank of Hi is n is the observability condition of a deterministic systemJ42 This concept has been introduced by Kalman,87-89 Physically speaking, when a system is observable the observation mechanism of the system is such that all modes of the system response become available to it in a finite time. In other words, when the system is observable it is possible to determine any Xi' i = 0, 1'00" exactly from only a finite number of observations y.
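The rank condition can be checked numerically by stacking the matrices H φ_{k,0}, k = 0, ..., n − 1, and computing the rank of the result. The matrices A and H in the sketch below are assumed for illustration only.

```python
import numpy as np

# Small numerical check of the deterministic observability condition (rank of the
# stacked observation matrix equals n).  The matrices A and H below are assumed
# purely for illustration; phi_{k,0} = A^k plays the role of the transition matrix.

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # assumed 2x2 plant matrix
H = np.array([[1.0, 0.0]])      # assumed 1x2 observation matrix (m < n)

n = A.shape[0]
blocks = []
phi = np.eye(n)
for k in range(n):              # stack H phi_{k,0} for k = 0, ..., n-1
    blocks.append(H @ phi)
    phi = A @ phi
H_stacked = np.vstack(blocks)

print("stacked matrix:\n", H_stacked)
print("rank:", np.linalg.matrix_rank(H_stacked), "of n =", n)  # rank n means observable
# With rank n, x_0 is recovered from noise-free observations by the least-squares form
# (H'H)^{-1} H' y, and every later x_i follows from the deterministic plant equation.
```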
C.
STOCHASTIC OBSERVABILITY OF DETERMINISTIC PLANT WITH NOISY MEASUREMENTS
Let us now consider a deterministic plant with noisy observations: (17) Yi =
H.»,
+ TJi
(18)
5.
STOCHASTIC CONTROLLABILITY AND OBSERVABILITY
213
where X o is a Gaussian random variable with a covariance matrix l:o , where noises are independent Gaussian random variables with E(1)i)
0
=
(19)
E(1)i1)/) = R/Jij
and where R i is nonsingular, i = 0, 1,.... Here again, if we can determine (or estimate) the state vector x at anyone time, such as X o , then from (17) we can determine all other x's. Because of the noisy observations, it is no longer possible to determine Xi from a finite number of observations. We compute, instead, the probability density function p(xi I yi) as our knowledge of Xi , or compute the least-squares estimate of Xi from yi if noises are not Gaussian. Since and where
rpn.i is the state transition matrix from o.i
+
-1 :(:
(k
(35a)
i.e., 0Yi is the special case of LlYi where Yi* has the form (33) as the result of the assumed form (32). Note that the operations of the minimization and the averaging with respect to Z are interchanged in (35a). We may therefore say in this case the increase in the cost of adaptive control over the cost of stochastic control is given by the interchange of the summation and the minimization operations. The relation of the optimal policies for adaptive and purely stochastic systems is established by Observation 4.
d. Observation 4 If s
y;*(x, Zi-l)
=
min Ui~l
L Zi-Lk<Wi + Y;+1.k>k k=l
(36)
+ -t:»,
if <Wi is quadratic in u, and if no constraints are imposed on u and x (i.e., state vector and control vectors are not constrained in any way), then the optimal control vector for the adaptive system at time i ~ 1 is given by a linearly weighted sum of the optimal control vector of the corresponding S stochastic problems at time i - 1. Under the same set of assumptions, 0Yi of (35a) is, at most, quadratic in the optimal control vector for the stochastic problems.
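The weighted-combination form asserted by Observation 4 [cf. (44)] is easy to verify numerically for a small example. In the sketch below the posterior weights z_k, the stochastic optimal controls u_k*, and the quadratic weight matrices Φ_k are all invented for illustration; the point is only that the minimizer of the z-weighted quadratic cost is (Σ_k z_k Φ_k)^{-1} Σ_k z_k Φ_k u_k*.

```python
import numpy as np

# Numerical sketch of Observation 4 (all numbers assumed): with quadratic per-model
# costs  (u - u_k*)' Phi_k (u - u_k*)  weighted by the posterior probabilities z_k of
# the S parameter values, the minimizing adaptive control is
#   u* = (sum_k z_k Phi_k)^{-1} (sum_k z_k Phi_k u_k*),
# a linearly weighted combination of the S stochastic optimal controls u_k*.

rng = np.random.default_rng(5)
S, m = 3, 2                                     # number of models, control dimension
z = np.array([0.5, 0.3, 0.2])                   # assumed posterior weights (sum to 1)
u_star = rng.standard_normal((S, m))            # assumed stochastic optimal controls u_k*
Phi = []
for _ in range(S):                              # assumed positive definite weight matrices
    M = rng.standard_normal((m, m))
    Phi.append(M @ M.T + np.eye(m))
Phi = np.array(Phi)

num = sum(z[k] * Phi[k] @ u_star[k] for k in range(S))
den = sum(z[k] * Phi[k] for k in range(S))
u_adaptive = np.linalg.solve(den, num)

def cost(u):
    return sum(z[k] * (u - u_star[k]) @ Phi[k] @ (u - u_star[k]) for k in range(S))

grad = 2.0 * (den @ u_adaptive - num)           # gradient of the weighted cost at u_adaptive
print("adaptive control u*:", u_adaptive)
print("gradient at u* (should be ~0):", grad)
print("cost at u*, cost at first stochastic control:", cost(u_adaptive), cost(u_star[0]))
```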
Proof of Observation 4.
Since the dependence of
<Wi + Yi\l.l'>
k
on
u is quadratic by assumption, by completing the square, if necessary,
and recalling the recursion formula for
-r, and that the notation
ULl.k
1.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
235
is used to denote the optimal control for the purely stochastic problem with 0 = Ok at time i, we can write it as
wherecfJ~,k = cfJi,k and where cfJi,k and rPi,k generally depend on x. Note that Utl,k will generally depend on Ok . Substituting (37) into (36), the optimal control Ut_1 for the adaptive problem is given by performing the minimization
s
L
min
k=l
Ui_l
Zi_U[(U - ULU)'([>i,k(U - uLu)
+ i.k(ui-l - uL u)
k~l
s
=
s ,
L (Uf-U)'Zi-U([>i.kU'i-l.k - (L Zi-Lk([>i,kUi-U) k~
~I
(39)
As a special case, if the quadratic part of rt,k is independent of Ok , i.e., if
VII.
236
APPROXIMATIONS
then, from (38), the adaptive optimal control 1 ~ k < S, by
IS
related to ut-Lk,
Namely the optimal adaptive control is a weighted average of the corresponding stochastic optimal control. For this special case, (39) reduces to S
0Yi
=
L
(U'Ll,k)'Zi-l,kep,U;-l.k
k~l
(41 )
Even in the general case, by defining s
q), ~
L
(42)
Zi-l,kepi.k
k-l
and (43)
We can express ULI and SYi in forms similar to (40) and (41), respectively. We have, from (40), (42), and (43), (44)
and, from (39), (42), and (43),
(45)
The difference in control cost SYi generally depends on x. If q)k is independent of the state vector x and if the stochastic optimal control vector utk can be expressed as a sum of functions of Xi only and of Ok only, then SYi will be independent of x. To see this we express, byassumpbon, Ui.k as (46)*
.
*
* Equation (13) shows that the system of Section
B satisfies (46), at least for i
~
N -
1.
I.
ADAPTIVE SYSTEM OPTIMAL CONTROL POLICIES
237
Then, substituting (46) into (44) and (45), we obtain u i * = a(x i )
S
+ L zi,kb
(47)
k
lc=l
and s
0Yi+l =
s,
L bk'tPi+lZi.kbk - (L k~l
Zi.kbk) 1>i+l
k~l
S
(L
Zi.kbk)
(48)
k~l
Equation (48) shows that 0Yi is independent of x, when the stochastic problems are such that
k, note the relations bi(k
+-
Jb;(k +- l)u i p(A, B, C I x") d(A, B, C) * f b;(k I) p(A, B, C I x") d(A, B, C)
l)u i ~
+-
Ui
= bi(k
+-
I)ui
2.
245
OPEN-LOOP FEEDBACK CONTROL POLICIES
Similarly u;'kij(k
+ I)uj
0/= u;'kij(k
+ I)uj,
i] > k
etc.,
Note that only when Uk is involved we can write
etc. When hiCk
+
+ I )ui , u/ Kij(k + 1)Uj , etc., are equated with hiCk + 1)u i ,
etc., in the right-hand side of (71), the control variables k 1 are all taken to be functions of x only, i.e., open-loop control variables are substituted for closed-loop control variables. The optimal open-loop control variables Uk Uk+1 , ... , U N- 1 which approximate the optimal closed-loop policy is then given by Uk , ... , UN-I, which minimizes (71). Hence, by differentiating Eq. (71) with respect to u j ,j = k, k 1,... , N - I, we obtain
uiKij(k I)uj , Uk , uk+1 , ... , U N-
+
N-l
z::
i=k
Kji(k
+ I)u;*
N-l
= - (hj(k
+ 1) + I
fJ{O;j (k
i=k
j
=
k, k
+ 1))
+ 1,... , N
- 1
(72)
which, when solved, gives u_k* among others. When u_k* is applied at time k and the time advances from k to k + 1, we have one more observation x_{k+1}. Therefore, rather than using u*_{k+1}, ..., u*_{N-1} obtained by solving Eq. (72) at time k, we re-solve Eq. (72) after the required conditional expectations are re-evaluated conditioned on x^{k+1} rather than on x^k. In other words, only the immediate control u_k* is used from (72). Thus, at each time instant k we have a recursive procedure to obtain u_k*. This approximation generates an open-loop feedback control policy, since a new observation x_{k+1} is incorporated in computing a new optimal open-loop policy based on the knowledge x^{k+1}. It is easy to see that open-loop policies are much easier to compute than closed-loop policies. The question of when optimal open-loop feedback policies are good approximations to optimal closed-loop policies must be carefully considered for each individual problem. See Spang127 for computer studies of simple systems.
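The open-loop feedback procedure can be summarized in a few lines of Python. The sketch below uses an assumed scalar linear-quadratic plant (all numbers are illustrative): at each time k an open-loop plan is computed for the remaining horizon from the latest observation, only its first control is applied, and the plan is recomputed at k + 1. For this linear-quadratic toy the re-planned first control happens to coincide with the closed-loop law, which makes it a convenient sanity check rather than a demonstration of the gap between the two kinds of policies.

```python
import numpy as np

# Sketch of an open-loop feedback policy (all model data assumed for illustration):
# at each time k, an open-loop control sequence u_k, ..., u_{N-1} is found that
# minimizes the remaining cost predicted from the current observation, only u_k is
# applied, and the problem is re-solved at k+1 with the new observation.

rng = np.random.default_rng(11)
N = 10
a, b, q = 0.9, 1.0, 0.5                  # scalar plant x' = a x + b u + noise, cost sum q x^2 + u^2

def open_loop_plan(x0, horizon):
    """First control of the open-loop sequence minimizing sum q x_j^2 + u_j^2 for the
    noise-free predicted trajectory; found by the standard backward Riccati recursion."""
    p = q
    gains = []
    for _ in range(horizon):
        k_gain = (a * b * p) / (1.0 + b * b * p)
        gains.append(k_gain)
        p = q + a * a * p - a * b * p * k_gain
    return -gains[-1] * x0               # only the first planned control is used

x, total_cost = 2.0, 0.0
for k in range(N):
    u = open_loop_plan(x, N - k)          # re-plan from the latest observation of x
    total_cost += q * x * x + u * u
    x = a * x + b * u + 0.3 * rng.standard_normal()   # apply only the first control
print("realized cost with open-loop feedback:", round(total_cost, 3))
```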
+
VII.
246
APPROXIMATIONS
3. Sensitivity and Error Analysis of Kalman Filters A.
INTRODUCTION
If the description of a linear dynamic system set of equations Xi+! = Aixi + ti
IS
given exactly by the
(73)
+ 7Ji
(74)
E(ti) = E(7Ji) = 0
(75a)
Yi = H,», E(tit/)
=
Q;?iij
(75b)
E(7Ji7J/)
=
R i8ij
(75c) (75d)
namely, when the matrices of the system Ai and Hi' the means and covariance matrices of the random noises, are given exactly as above, then the outputs of the Wiener-Kalman filter are the best linear estimates of the state vectors of the system. See Chapter V and Section 4 of Chapter II. It is important to have some measures of the variations of the filter outputs when some of the underlying assumptions are not true, since the system parameters such as Ai and Hi or noise statistics such as Qi and R i will not be generally known exactly in real problems. Such inaccuracy may arise as a result of numerically evaluating Ai and Hi (round-off errors and/or error in quadrature). For example the linear system given by (73) and (74) may be merely an approximate expression of a nonlinear dynamic and/or plant equations obtained by linearizing them about some nominal trajectories. Then, Ai and Hi are evaluated, perhaps numerically, by taking certain partial derivatives. See, for example, Section 3,F of Chapter V. Another reason for such analysis is that, for problems with complex expressions for A and/or H, it is of interest to examine the effect of a simplified approximate expression for A and/or H on the accuracy of the estimation. As for noise statistics, we usually have only their rough estimates. Therefore, it is important to evaluate the effects of inaccuracies in Ai , Hi , Qi , and/or in R i on the estimates, i.e., on the error covariance matrices of the outputs of the Wiener-Kalman filters. It is also important to know the effects of nonoptimal filter gains on the error-covariance matrices. We are interested in nonoptimal gains: (I) to study the sensitivity of the estimates and of the error covariance matrices with respect to the filter gain and (2) to study the effects of the
3.
SENSITIVITY AND ERROR ANALYSIS OF KALMAN FILTERS
247
simplified suboptimal method of gain computation on the filter performance since the gain computation is the most time consuming operation in generating the estimates. For additional details see, for example, Joseph. 8 1 . 8 2 We have derived, in Section 3 of Chapter II and elsewhere, the expressions for the error-covariance matrices for Kalman filter. They are given by (76)
where (77a)
where X,:*
and where
Ki+I
g
E(x,: I y':)
(77b)
is the optimal filter gain given by (77c)
where M':+l
g
E[(Xi+l - x':+l)(x':+l - x':+l)' I y':]
=
A,:r,:A/
+ Qi
(78)
The error-covariance matrix of the optimal estimate I',
IS
calculated by
g E[(x,: - x,:*)(x,: - x,:*)' I y':] = (I - K,:H,:)M,:(I - K,:H,:)' + K;R;K/
(79a)
or equivalently by (79b)
The initial estimate X o* and its error-covariance matrix To is assumed given from a priori information on the initial state of the system. B.
GAIN VARIATION
Let us first consider the effects on T of the gain changes from its optimal values K, by 8Ki . Denoting by 8Ti the deviation of the errorcovariance matrix from its optimal form T'; and dropping the subscripts,
sr = ~
[I - (K + SK)HJM[I - (K + SK)H]' - (I - KH)M(I - KH)' - KRK'
+ (K + SK)R(K + SK)'
SK [- HM(I - KH)'
KH)MH'
+ RK'] + [(I -
where the second-order terms are neglected.
+ KR]
SK'
(80)
248
VII. APPROXIMATIONS
Since K is given by (77c), coefficients multiplying oK and oK' vanish in (80) and we have
sr =
0
The alternate expression for optimal error-covariance (79b), obtainable by substituting (77c) into (79b), gives
sr =
-SKHM
Therefore, in numerically evaluating r, the expression (79a) would be less sensitive than (79b) to small variation in K. In Sections 4-6 we consider several suboptimal filters using non-optimal gains in the Wiener-Kalman filters. See also Section
3,E. C.
THE VARIATION OF THE TRANSITION MATRIX
We now investigate the effects of changes in Ai on the accuracy of computing M i +1 from r i . The noises are taken to be Gaussian random variables. Denoting the small variation of Ai by oA i and dropping subscripts, oM = oArA' AroA' from (78). Since oM will be small compared with M, write M oM = M EN, where E is a small positive constant. Since M is symmetric, by appropriate linear transformation on x, M can be made diagonal, i.e., the components of the estimation error x ~ x after the linear transformation can be taken to be uncorrelated, hence independent. The variances of these independent errors are the eigenvalues of M. Therefore, the change in the eigenvalues of M due to a small change in A may be regarded approximately as the changes in the variances of the components of the estimation error x ~ x. (This is only approximately true since oM will not be generally diagonal even if Mis.) We will now investigate the difference of the eigenvalues of M and of M oM. Denote by t\ the ith eigenvalue of M with its normalized eigenvector denoted by ei . We define .\ and ei as the corresponding quantities for M + EN. Writing
+
+
+
+
3.
SENSITIVITY AND ERROR ANALYSIS OF KALMAN FILTERS
the relation (M
+ EN)e =
Ae yields, to the order
249
E,
+ AT OA')ei
=
e/(oATA'
=
2e/ oATA'ei
= 2e/ oAA-l ATA'ei =
2e/(OAA-l)(A iei
~
Qei)
Therefore, or
If a major contribution to M comes from ATA' and not from II Qi 11/\ ~ 1, and one has approximately t
Q, then
EA;l 1/1 Ai I ~ 2 I oA,Ail II
In computing M N , Eq. (78) will be used N times starting from To. If each step of going from I', to Mj+l' 0 ~ j ~ N - 1, satisfies the assumptions stated above, then the total percentage error in M N is approximately given by N-l
2
I
o
II OAiAil II
or 2N II oAA-l II if A is a constant matrix. Therefore, as a rule of thumb, one must have tlOAAII~
1
2N
in such applications where N total number of estimates are generated.
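The rule of thumb can be examined numerically by propagating a covariance matrix with a nominal and a perturbed transition matrix. The sketch below uses assumed matrices and, for simplicity, repeats the propagation step on M itself rather than interleaving it with measurement updates, so it only illustrates the order of magnitude of the accumulated error.

```python
import numpy as np

# Numerical look at the rule of thumb above (all matrices assumed for illustration):
# propagate a covariance through M <- A M A' + Q repeatedly with A and with a
# perturbed A + dA, and compare the accumulated relative change with 2N ||dA A^{-1}||.

rng = np.random.default_rng(2)
n, N = 3, 20
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
Q = 0.05 * np.eye(n)
dA = 1e-3 * rng.standard_normal((n, n))          # small perturbation of the transition matrix

def propagate(A_used):
    M = np.eye(n)
    for _ in range(N):
        M = A_used @ M @ A_used.T + Q            # simplified: Gamma_j replaced by M_j itself
    return M

M_nom, M_pert = propagate(A), propagate(A + dA)
rel_change = np.abs(np.linalg.eigvalsh(M_pert) - np.linalg.eigvalsh(M_nom)) / np.linalg.eigvalsh(M_nom)
bound = 2 * N * np.linalg.norm(dA @ np.linalg.inv(A), 2)

print("relative eigenvalue changes:", np.round(rel_change, 4))
print("2N ||dA A^{-1}||           :", round(bound, 4))
```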
D.
IMPRECISE NorSE COVARIANCE MATRICES
Since the statistics of the random noises are known only very roughly, the effects of large variations of Q and R, rather than their small variations on T, need be investigated. Such investigations must generally be done numerically in designing filters.
VII.
250
APPROXIMATIONS
One may take the min-max point of view in evaluating the effect of different Q's and R's on r, using techniques similar to those in Aoki'" where the effects of unknown gain (distribution) matrix on the performance index have been discussed. See also Section 2,D of Chapter II, Section 2 of this chapter, and Refs. 64 and 129 for treatment of unknown covariance matrices.
E.
EFFECTS OF SIMPLIFICATION
The amount of computation for implementing the optimal Kalman filter is quite large for systems with high dimensions. In order to update x_i* and Γ_i, i.e., to obtain x*_{i+1} and Γ_{i+1} from x_i* and Γ_i, the following steps are involved: (i) x̄_{i+1} is computed from x_i* and A_i, (ii) M_{i+1} is computed from Γ_i by (78), (iii) K_{i+1} is computed by (77), (iv) x*_{i+1} is computed by (76), and (v) Γ_{i+1} is computed by (79). A rough calculation shows that the number of multiplications involved is of the order n^3, even without counting the multiplications necessary to invert an (m × m) matrix, where n is the dimension of the state vector. In many problems, therefore, one is willing to use slightly inaccurate estimates if a significant reduction in the amount of computation results. One such reduction is achieved by reducing the dimension of the state vectors, for example, by replacing correlated noises by uncorrelated noises, or by partitioning the state vectors.81,104,104a,112 Related approximation methods aimed at the reduction of the amount of computation are discussed in the next two sections. In practice, any such approximation must be carefully evaluated to achieve a reasonable trade-off of accuracy versus the amount of computation.
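Steps (i)-(v) are straightforward to write out for a small system. The sketch below uses assumed matrices; the covariance update is written in the symmetric form (79a), which the gain-variation discussion of Section 3,B indicates is the less sensitive expression to evaluate numerically.

```python
import numpy as np

# One step of the update sequence (i)-(v) above, written out for a small assumed
# system (A, H, Q, R and the starting x*, Gamma are illustrative, not from the text).

n, m = 2, 1
A = np.array([[1.0, 1.0], [0.0, 0.9]])
H = np.array([[1.0, 0.0]])
Q = 0.05 * np.eye(n)
R = 0.2 * np.eye(m)

x_star = np.zeros(n)                 # x_i*      : current estimate
Gamma = np.eye(n)                    # Gamma_i   : current error covariance
y_next = np.array([1.3])             # y_{i+1}   : assumed new observation

x_pred = A @ x_star                                              # (i)   one-step prediction
M = A @ Gamma @ A.T + Q                                          # (ii)  M_{i+1} = A Gamma A' + Q, as in (78)
K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)                     # (iii) optimal filter gain, as in (77c)
x_star = x_pred + K @ (y_next - H @ x_pred)                      # (iv)  updated estimate
I_KH = np.eye(n) - K @ H
Gamma = I_KH @ M @ I_KH.T + K @ R @ K.T                          # (v)   updated covariance, form (79a)

print("updated estimate x*_{i+1}:", x_star)
print("updated covariance Gamma_{i+1}:\n", Gamma)
# Each of steps (ii)-(v) involves products of n x n matrices, which is the source of
# the order-n^3 multiplication count mentioned above.
```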
4. Estimation of State Vectors by a Minimal-Order Observer A.
INTRODUCTION
When the problems of control are separated from those of estimation, * approximation may be made to the subproblems of control, to estimation, or to both. Approximate control schemes may, for example, use some
* This procedure is known to yield an over-all optimal control system for a class of linear systems with quadratic criterion. See Section 2 of Chapter II for detail.
4.
ESTIMATION OF STATE VECTORS
251
statistics which are not sufficient to approximately summarize past and current observation data and use control policies which are functions of these statics. In the next three sections, we will discuss effects on the performances of the optimal Kalman filters of various approximations which reduce the amount of computation required to generate estimates of the state vectors. Consider the case of linear systems with additive noises. We have seen in Section 3 of Chapter II that for the linear observation scheme the best linear estimator of the state vector has the same dimension as the plant. For complex systems with large dimensions, therefore, the problem of constructing the optimal filter or computing the optimal state vector estimates is not trivial. It is, therefore, important in practice to consider approximately optimal estimation procedures where constraints are imposed on the permissible complexities of the estimators or on the amount of computations. One approach is to partition the state vector into subvectors'"; i.e., instead of constructing an optimal estimate for the entire state vector, one may partition the state vector and construct a suboptimal filter for the state vector by combining judiciously estimates of these partitioned components of the state vector. This method requires a smaller amount of computation because of the nonlinear dependence of the amount of computation on the dimension of the state vector. This will be the subject of Sections 5 and 6. Another approach in limiting the complexities of an estimation scheme is to specify the dimension of the estimator. One such proposal has been made by Johansen. 78 Consider a situation where the system is described by Xi+l Yi
= =
+ BU i + gi HXi + TJi
(81)
AXi
(82)
where x is an n-dimensional state vector, y is an m-dimensional observation vector, u is a control vector, and g and YJ are Gaussian noises, and where we use O~i~N-l
as the optimal estimate of Xi at time i. The best estimate fLi+l has been shown to be generated recursively as a linear function of fLi and Yi+l . Note that fL has the same dimension as x. Johansen's proposal in discrete-time version is to generate approximate estimates of Xi , Zi , i.e., an approximation to P« , by Zi+l
=
Fiz i
+ DiYi
or
Zi+l
=
Fiz i
+ DiYi+l
252
VII. APPROXIMATIONS
where the dimension of control generated by
Zi
is generally less than that of u,
=
Xi ,
and to use
CiZ i
in problems with quadratic criterion functions since we know that the optimal control u; is proportional to fLi . In this formulation, matrices C, D, and F are chosen to minimize a given criterion function. These matrices, however, are not determined uniquely and require further conditions and/or numerical experimentation to obtain satisfactory results. Since the observation of the state vector Yi carries a certain amount of information on the state vector Xi , we will now consider a procedure for generating a vector Zi in such a way as to supplement the information carried by Yi so that Zi , together with Yi , can be employed to yield an approximation to fLi . This idea will be made more precise for the case of time-invariant linear sample data systems of (81) and (82). * We are particularly interested in this procedure where the dimension of the vector Zi is smaller than that of Xi' where Zk is the state vector of the dynamic system governed by k
=
0,1, ...
where Zk is the p-dimensional vector at the kth time instant, p :'(: n, F k is the (p X p) matrix, and D k is the (p X m) matrix. Typically, p < n. For example, one may take the estimate of X k , Xk , to be
where K and N are to be chosen. The vector Zk' together with Ylc , acts as the inputs to the estimator of X k • Such a system will be called an observer in this section. We now consider the problem of constructing an estimator Zi of (n - m) dimensions so that Xi is estimated as some linear function of Yi and zi' B.
DETERMINATION OF THE STATE VECTOR OF A DETERMINISTIC SYSTEM
Under some regularity conditions it is possible to generate Zi which, together with Yi , determines Xi exactly for deterministic linear systems. *'The following development is based on Aoki and Huddle.!?
4.
ESTIMATION OF STATE VECTORS
253
Namely, if an n-dimensional linear plant is completely observable, and if the observation on the system state vector produces m independent outputs (m < n), then it is possible to construct a device with (n - m)-dimensional state vector to supply the remaining n - m components of the plant state vector. We will give a sketch of the method developed by Luenberger.l'" A more detailed discussion is given in Section C, where the basic idea is modified to estimate the state vectors of a stochastic system using a more constructive method. Consider a linear system, the plant of which is governed by
where Xi is the n-dimensional state vector and the observation equation is given by Yi =
H»,
where Yi is m-dimensional, m ~ n, and H is an (m X n) matrix. Assume that the system is completely observable.P'' i.e., assume that the m· n column vectors of the matrices k
=
0, I, ... , n -
I}
span the n-dimensional Euclidean space. Then it is possible to design an observer with arbitrarily small time constant, such that Xi can be reconstructed exactly from Yi and Zi where Zi is the state vector of the observer. The design of such an observer is based on the existence of a matrix T which relates the state vectors of the plant and the observer by Tx i
Zi =
,
i = 0,1, ...
The dynamic equation of the observer is given by Zi+l =
FZ i
+ DYi +- CUi
where T and F is related to A and C by the matrix equations TA - FT = DH
(83)
C = TB
(84)
and These equations are derived in Section C. Luenberger shows that if the original plant is observable then F can be chosen so that its norm
254
VII. APPROXIMATIONS
is arbitrarily small and that T can be chosen in such a way that is nonsingular. Therefore,
r = (~)
His proof that T can be chosen to make T nonsingular, however, is not constructive. We will construct an (n - m)-dimensional observer for the stochastic system in such a way that the error-covariance matrices of the estimates of the plant state vector are minimized in a sense to be specified later.
C.
ESTIMATION OF THE STATE VECTOR OF A STOCHASTIC SYSTEM
We will now extend the ideas discussed in Section B to linear stochastic systems and design an estimator of Xi using (Yi , zi)' where Yi is the system observation and Zi is the output of an observer with (n - m) memory elements. If the system is such that nearly all the components of Xi are observed, i.e., if m is close to n, then the number of memory elements employed by the observer is much less than n. If the resultant error covariance matrix indicates that system performance is not much worse than that achieved using the optimal Wiener-Kalman filter, then the estimator considered here may have a practical application. See Section D and Ref. 17 for some numerical comparisons of the filter performances. a. The Stochastic Design Problem and the Estimation Error-Covariance Matrix In this section we consider the stochastic problem without control. The control term is introduced in Section 4,C,c. The system whose state is to be estimated is shown in Fig. 7.2. The state vector satisfies the nth-order time-invariant linear difference equation (85)
where gi is a sequence of independent vector random variables representing disturbance noise. The observation equation of the state vector is assumed given by (82). We will assume here that H is an (m X n) matrix having rank m and in addition is such that the system
4.
255
ESTIMATION OF STATE VECTORS
,-----------------1 : {.
xi +1
,
I '
I
I
I SYSTEM I
I
I I I
(N Ih-ORDER)
I I I I
I I
:
L
--,
r - - - - - - - - - - - - - - - - - - ----, Z· I " D L DELAY ,+ ,MINIMAL
I
ORDER DYNAMIC 'SUBSYSTEM OF I THE OBSERVERI(N - M lth-ORDER
I
I I
rI - - - - - - - - - - - - - - - - - - - - - I SUBSECTION OF I THE OBSERVER
I FOR GENERATING
I I
iTHE STATE VECTOR I ESTIMATE
I
L "
I
IX i + 1
I
I L
Fig. 7.2. estimator.
,
J
Schematic diagram of (n - m)-dimensional observer and the state vector
is observable. * We denote by Rand Q the covariance matrices of the independent noises g, T) which are assumed here, for convenience, to be stationary: E(g;g/) = QOi; E('TJi'TJ/) = us; E(gi'TJ/) = 0 for all i and j
The state vector of the observer is assumed to satisfy the difference equation (86)
where F and D are time-invariant (n - m) X (n - m) matrix and (n - m) X m matrix, respectively, yet to be specified. From the discussion of the previous paragraphs, the observer system is seen to involve two distinct sections. The first is a dynamic subsystem whose output is to represent, under nonstochastic conditions and proper initialization, a linear transformation Tx of the observed system state vector.
* See
Ref. 17 for discussions on unobservable systems.
VII.
256
APPROXIMATIONS
The other section of the observer accomplishes the construction of the observer estimate of the system state vector by applying the inverse linear transformation t-» to the partitioned state vector
Denoting
7'-1 =
(P i V)
(87a)
(-I-)
(87b)
where
t
=
and where P is an (n X (n ~ m» matrix and V is an (n X m) matrix, we may express the observer output as (88)
which is as depicted in Fig. 7.2. The central problem in the design of the observer for deterministic systems is to select the unspecified matrices such that the fundamental matrix equation (83) is satisfied while t remains nonsingular.I'" For the stochastic problem, these conditions must also be specified, but in addition we seek that solution which permits minimization of the elements of the estimation error-covariance matrix. To obtain this design solution we first derive a number of matrix relations involving the estimation error-covariance matrix which is defined as C,
g E[(Xi - Xi)(Xi - Xi)']
(89)
where Xi is the estimate of Xi provided by the observer at time i and where Xi is the true state of the system of (85) at time i. The relations obtained will then be manipulated in such a way that a set of equations are obtained for elements of the covariance matrix C in terms of the given matrices A, H, Q, and R and one design matrix V of (87a). These relations may then be used by the designer to minimize certain elements of the error-covariance matrix C, as desired. It should be emphasized that the constraints placed on observer design lead to a loss of freedom in the minimization process as should be expected. The central question, then, is whether or not these constraints allow a much cheaper estimator to be built which may have performance comparable to that of the (unconstrained) optimal estimator for the particular application considered. Throughout the ensuing discussion we shall
4.
257
ESTIMATION OF STATE VECTORS
use the following relations which, as shown in Appendix B at the end of this chapter, guarantee the existence of the observer:
PT
F= TAP
(90)
D
=
TAV
(91)
+ VH
=
In
(92)
HV =Im TP = I n _ m
(93)
HP=O TV = 0
where lie denotes the (k X k) identity matrix. We begin by considering the error in the dynamic subsystem of the observer. We define it as e,
=
Z,: -
Tx,:
(94)
The dynamic equation for ei may be written from (85) and (86) as em
=
Fe,
+ (DH -
(1' A - FT))x,:
+ ~,:
where But as T is taken to satisfy (83), (94) simplifies to (95)
We note that the augmented observation vector satisfies the equation
[;;J =
ix,: + [~:
J
(96)
where
[;;J is the observation of Xi which augments the originally available observation vector Yi . The noise ei is not white however. Its mean and covariance matrices are given by Ee,
=
0
(97)
TQT'
VII.
258
APPROXIMATIONS
Note that E(eiTJ/)
= 0
The estimate of Xi is constructed as the output of the observer and is given by (88). Expressing Zi and Yi as functions of Xi' we have from (96) Xi
(PT
=
+ VH)X i + Pe, + VTJi
(98)
Using (92) we see that the error-covariance matrix is expressed by C,
= PSiP'
+ VRV'
(99)
Using (93) and (99) we easily obtain TCiH'
=
0
(100)
The covariance Q of the plant disturbance noise does not enter relation (100) explicitly. To obtain an expression containing Q we reconsider the error propagation of the estimate. Defining xi = Xi - Xi' we write xi+l as (101)
Using (86) and (90)-(92), Xi+!
=
= = =
Xi+l
can be rewritten as
+ P(Fzi + DYi) VYi+i + PTA(Pzi + Vy,) VYi+l + PTAXi VYi+l + (I - VH)Ax i VYi+l
(102)
Therefore we have the difference equation for the estimation error as (103)*
From (103), the recursion equation for the error covariance matrix is given by CHi =
VRi+lV'
+ (I
~
VH)(AC,A'
+ Qi)(I -
VH)'
(104)
where V satisfies the constraint HV=Im
(105)
* Matrices VH and PT = I - VH that appear in (103) and elsewhere, are projection operators since (VH)(VH) = VH.
4.
259
ESTIMATION OF STATE VECTORS
Multiplying (104) by H' from right and making use of the relations of (93), we obtain (106)
b. Optimal Design of an Estimator of Minimal Order In this section we modify the matrix relations involving the errorcovariance matrix C, obtained in the previous section, and proceed to an optimal design solution in a rather unencumbered manner, while still satisfying the multiple constraints imposed on the observer structure. The constraint (105) is explicit and can be applied with no difficulty at the outset of the design effort for a given observation matrix H. Since (92) alone is sufficient for the inverse 1'-1 to exist we employ the expresSiOn
PT =1 -
(107)
VH
(with HV = 1m imposed) wherever useful From (106), we obtain
III
the ensuIllg discussion. (108)
From (104), we obtain (1 - VH)Ci+l = (1 - VH)[ACiA'
+ Qi](I -
VH)'
(109)
These two implicit relations involving C i + 1 are sufficient with Eq. (101) to obtain an expression equivalent to (106): (1 - VH)Ci+lH' = 0 The constraint on Ci+l expressed by (108) is easily imposed at the outset of design, to specify the covariance Ci+l given C; and the design matrix V. * Thus, if we address ourselves to the task of minimizing selected elements of Ci+l while (106), (108), and (109) are satisfied, by selection of the matrix V subject to the constraint HV = 1m , we will have optimized the design of the minimal-order estimator for the stochastic application. If this is done sequentially (i = 1,... ) we will derive a sequence of matrices {Vi} which realize the optimal transient response of the filter. On the other hand, by assuming a steady-state condition, (110)
* Although the conditions given by (104) and (105) and those given by (106), (108), and (109) are equivalent, the latter may be more convenient to employ directly.
260
VII.
APPROXIMATIONS
we can use (108)-(110) to yield by the same procedure an estimator design which is optimal in the steady state. c. Control Systems
Now suppose that instead of (85) we have the control system as originally given by (81):
The estimator is now taken, by adding a control term to (86), to be (Ill) Then, as before, in terms of T which satisfies T A difference ei between Zi and TXi satisfies the equation
FT =
DH, the
Therefore, by choosing G
=
TB
(112)
where T is previously chosen, the result of the prevIOUS section still holds true. d. Connection with Optimal Wiener-Kalman Filter The estimator described in this section generates the suboptimal estimate Xi of Xi by (88): (I13)
where Yi+l is the observed state vector of the system given by (82), where Zi+1 is the output of the observer (86), and where F and Dare given by (89) and (90), respectively. Therefore, from (102) we can see that
=
AX i
+ V(Yi+1
- HAx i)
where V must satisfy the constraint given by (105).
(114)
4.
261
ESTIMATION OF STATE VECTORS
In this form, we can see clearly the relation with the Wiener-Kalman filter of the estimation scheme of this section. Instead of using optimal time-varying gains of the Kalman filter, the estimation scheme of this section uses a constant gain V which is chosen optimally in order to minimize the steady-state error-covariance matrix C of (110). Figure 7.2 is the schematic diagram of the estimator showing how the observed state vector and the output of the observer is combined to generate the estimate of the state vector. We next present an example of constructing an observer which IS optimal in the steady state. For this example, the original plant IS chosen to be observable.
D.
EXAMPLE: OBSERVABLE SYSTEM
Consider a two-dimensional system with the plant equation x(i
+ 1) =
Ax(i)
+ Bu(i) + W)
where ") =
x(1
and where equation
Xl , X 2,
(Xl(i))
A=
(") ,
X2 1
u, and
~ 2
( o
B =
(~)
g are scalar quantities and with the observation y(i) = Hx(i)
+ YJ(i)
where H
=
(1,0)
Q(i)
=
(qlo q20)
R(i)
=
r
Xl'
The observation y is taken on but X 2 is not observed. We will construct a first-order observer which estimates X 2 by Zi+l
=
FZ i
+ D y(i) + K
Let and
u(i)
262
VII.
APPROXIMATIONS
The constraint equations (90)-(93) between VI =
l' and 1'-1 yield
1
p] = 0
t]
t 2P 2
=
1
t]P2
=
- V2
=
0
+tv
2 2
(115)
We now compute the steady-state error-covariance matrix C:
Imposing the constraints on the steady-state error-covariance matrices, HCH' = R
yields
Cn
= r
(1 - VH)CH' = 0
yields
C12
=
and (I -
VH)C = (1 -
VH)ACA'
+ Q(1 -
v 2r
VH)'
yields (116)
We choose to minimize the variance of X 2 variable v 2 • Thus we seek V 2 such that
, C2 2 ,
by selection of the free
which yields
Solving this equation with the additional simplifying assumption ( 117)
we find V2
= -0.37
(118)
and Cn
=
02
C22
=
2.4450 2
c12 = -0.370
(119) 2
4.
263
ESTIMATION OF STATE VECTORS
To complete the design of the observer, we compute F = TAP = -0.63
(120)
D = TAV = -0.5t 2
and where
and
t-» _ [ 0 -
l/t 2
1]
-0.37
It is seen that t 2 remains unspecified in the optimization. This due to the fact that the multiplication of the transformation
IS
by t 2 is irrelevant in reconstructing x, so long as it is nonzero, as passing
z through the inversion t-: cancels whatever effect t 2 may have.
The schematic diagram of the observer is given in Fig. 7.3. In order to obtain some insight into the accuracy of the estimation method discussed in this section, the error-covariance matrices of the optimal Wiener-Kalman filter are computed for the system of the example for comparison. Denoting the (i, j)th component of the optimal error-covariance matrix at time k by Tij(k), the following set of equations hold: Tn(i
+ 1) =
[q1
+ Tn(i)
X
[1 - (q1
- 4 T 12(i) + T 22(i)]
+ 4 Tn(i)
- 4 T 12(i) + T 22(i))/L1.J
T 12(i + 1) = [2 T 12(i) - T 22(i)][1 - (q1 Tdi
+ 1) =
q2
+ T 22(i) -
+ 4 Tn(i)
- Tdi)
+ Tdi))/L1i]
(2 T 12(i) - T 22(i))2/L1i
where
In particular, we are interested in T ll(oo), Tu(oo), and T 22(OO ). These satisfy the algebraic equation
VII.
264
APPROXIMATIONS
,--------- - - - - - - - - uti)
-.,
-
I I
I
LOBSERYER
I
y(i) ---'----,---1
I
I I
I I -
-
-
-
-
-
1 I
GENERATOR
OF THE STATE rYECTOR I ESTIMATE
I I X(i) I
I I J
I
L
x(j +1) {-; xu-
(~) urn +(j)
-:) X(;) +
I
SYSTEM
(I,O)X(i) +')(i)
E( ; 0 , E'1; 0
EH';(~'~')'
E'1'1';U'
E('1';O
Fig. 7.3.
Numerical example of the observer construction.
where
In terms of s, T 1 2 ( CIJ)
=
rls
T 22 ( (0)
=
2rjs -
Q2S
Considering s for the same case of (117), we must solve
S4+
S3_6s
2+2s+4=0
The required solution is s =
-3.055
5.
265
SUBOPTIMAL LINEAR ESTIMATION: THEORY
which yields the optimal error covariances for the (unconstrained) Kalman filter as Tn(oo) ~~ O.89a2 T 22 ( (0)
=
2.4a 2
T 1 2 ( (0)
=
-O.33a2
Comparing these results with those obtained for the minimal-order filter, we see that performance appears very favorable while realizing simultaneously a reduction in memory elements.
5. Suboptimal Linear Estimation by State Vector Partition Method: Theory We discuss in this section another approximate estimation scheme of state vectors of dynamical systems with linear plant and observation equations. The scheme is based on the observation made in Section 3,E that the amount of computation involved in generating an optimal estimate of the state vector of a linear dynamical system is a nonlinear function of the dimension of the state vector. For example, it requires a smaller amount of computations to generate k estimates of nlk-dimensional state vectors than to generate one estimate of the n-dimensional state vector. After an introductory discussion of such a suboptimal estimation method in this section an approximate estimation problem will be discussed in Section 6 when a natural partitioning of the state vector is possible based on the difference of time responses of various modes 14 2 of the system. The discussion of this section is done for the system given by (73)-(75d) of Sectjon 3.
A.
CONSTRUCTION OF SUBOPTIMAL FILTER
Suppose
Xi
is partitioned into k subvectors
z/,
Zi
2,
... ,
zl,
where
z/ is the value of the jth subvector at time i and where zj has dimension
nj Dj
, Lj n j ?: n. The jth subvector is related to the state vector by a (nj X n) matrix
:
z/
=
DjXi,
.r
~
j
~
k
(121)
Although the matrices D, could be taken to be time varying, they are assumed to be time invariant in order to avoid the resultant complexities
266
VII.
APPROXIMATIONS
of the filter construction. Therefore, the manner in which the state vector x is partitioned into k subvectors is fixed throughout the estimation process. From (121), (122)
where the notations * and have the same meanmgs gIven by (77a) and (77b). The estimates Xi * and Xi are assumed to be reconstructed from the estimates for the partitioned subvectors by A
k
x·* = "LJ F-z;* ~ J z. ;~l
(123)
where F's and D's must satisfy the relation (124)
in order that the state vector is reconstructed from the partitioned subvectors. Proceeding analogously with the optimal estimation procedure, we consider the sequential estimation equation for the subvectors given by (125)
where Kij is the filter gain at time i and where the matrix G j chooses the subvector of the y that is used in updating the estimate for z/ The matrices G j are also taken to be time invariant. From (123) and (125), x:t
*=
k
IF ·z· J
H t
i
=
k
k
L Fjz/ + L FjK/G;[Yi i
= Xi
- Hix i]
k
+ L FjK/G;[Yi -
HiXi]
(126)
5.
SUBOPTIMAL LINEAR ESTIMATION: THEORY
267
The comparison of (76) and (126) shows that the suboptimal estimation scheme implies that k
L FjK/Gj
s; =
(127)
j~l
is the gain of the filter for the state vector Xi . We choose Kii next in a manner analogous to (77c). From (121) and (122),
= = Since
(Xi -
Dj(Aix i
+ gi -
DjAi(X i - x;*)
Aixi*)
+ Djgj
x i *) is not available, it is approximated by Xi - x/ ~ D/(z/ - z~*)
where the superscript plus indicates a pseudoinverse to yield (128)
Then
(129)
where
Proceeding analogously with the optimal estimation equations we construct a suboptimal filter by the following set of equations: TOi
g
E(zoj - zoj*)(zoj
=
DjroD/
where To is assumed known, P;+l
where
g
A/T/A{
+ Q/
~
z~
*)' (130)
268
VII.
APPROXIMATIONS
and Q/ ~ DjQi D/
H/
~
(131)
c.n.o»
R/ ~ GjR;G/
+ R/]-l
K/ ~ P/H/[H/P/Hj' T/
=
[I - K/H/]P/[I - K/H/]'
+ K/R/K/ (131a)
k
Xi*
Xi
=
+ L FjK,tGj[Yi -
Hix;]
j~l
where Xo is assumed known. Figure 7.4 is a schematic diagram of the suboptimal filter. Computations in (131) involve manipulations of matrices of order less than (n X n) or (m X n). Of course, the above derivation does not give any unique partitions of the state vector. If the partition subvectors are not crosscoupled to any other subvectors, either through the plant or observation equations or through the random disturbances, then the present scheme would be optimal. Physical intuition based on the components of x plus a certain amount of numerical experimentation would be necessary to arrive at a reasonable partition. r--------- -
-
-
-
-
r----l
I
I
I
I
I G I
I
I
Y,
-
r----l
I
NEW
-
I
I
I
-
-- -
-
-
-
r-------,
-
-
-
-
-
-
-
I
I
I I
I
I
I
I
F
I
' I
I I
I
I
I
I
I
Ix·
I ' I
OBSERVATIONI
I I I I I
I
--l
OBSERVATION VECTOR PARTITIONING
I
I
~';E=-
~E-; E~';'~;-
- _J VARYING GAIN
I I I I
0-; - J THE CORRECTION TERMS
I
I
I
I
H.X.
I I
'Hi
X,
I
I I
L
-l ESTIMATOR
Fig. 7.4.
Suboptimal filter by partitioning of the state vector.
6. B.
269
SUBOPTIMAL ESTIMATION: AN EXAMPLE
ERROR COVARIANCE OF THE SUBOPTIMAL FILTER
Noting that the suboptimal filter based on the partitioned vector of Section B is equivalent to using the filter gain (127) in the WienerKalman filter for Xi' the error-covariance matrices of this. suboptimal filter can be computed as follows:
[1 - (L F jK;+1 G j) H k
1"i+l
=
[1 - (L F jK;+1 G j) H
k '
i+1] PHI
J~1
i+1]
J~1
k
k '
+ ( L F jK;+1 G j) R i+1 ( L F jK;+1 G j) J~1
J~1
where To is given and where K/ is computed by (131). Comparing T thus generated with the optimal error covanance which results from using the optimal gain K,
=
PiH/[HiPiH/
+ R i ]- 1
the degradation of the filter accuracy can be computed for this suboptimal filter. See Pentecost-P for an application to a navigation problem. We will next consider in detail a particular partition that results from the certain assumed form of the plant transition matrix.
6. Suboptimal Estimation by State Vector Partition: An Example A.
INTRODUCTION
As an example of the subject of the last section, where the estimation of state vectors via partitioned substate vectors is treated, let us now consider a problem where the state vector can be grouped into two subvectors naturally in a sense to be mentioned below. The system is given by (73) and (74). As before, Xi and ~i are n vectors and Yi and YJi are m vectors where E(gi) = E(1)i) = 0 E(fig/)
=
AiDij
E(1)i1)/)
=
L s;
VII.
270
APPROXIMATIONS
We use A and 1: instead of Q and R since we want to use Q and R to denote submatrices of A and 1:. We have already discussed the desirability of reducing the amount of computations associated with optimal filterings by using some suboptimal filtering scheme, so long as the required accuracy of desired estimates is compatible with that of the suboptimal filters. Suppose that the system is stable and that eigenvalues of Ai can be grouped into two classes such that the real parts of eigenvalues in one group are much different from those of the other group. Using Jordan canonical representation.P'' assume that the state vector Xi can be written as
where Zi is a q vector and becomes
~ -t:. l _ ) ( _ Wi+1
=
Wi
is an (n - q) vector. The plant equation
(_-.!~~_)(_~i_)
0 : Pi
ui,
+ (_~i_)
(132)
vi
where it is assumed that I tJ1i II ~ II epi II, i = 0, I, ... and where fLi q vector and Vi is an (n - q) vector. Define covariance submatrices Qi' Si' and R i by
IS
a
Assume that II ill
= 0(1)
and where E is a small positive quantity and where the notations O( I) and O(E) mean that I cI>i II is of order 1 and II Pi II is of order E, respectively. Partition Hi as (134)
where
M,
=
(m
N;
=
m
X X
q)
matrix
(n - q) matrix
Then, writing the Kalman filter equation for x i * in terms of its components Zi * and Wi * , the optimal estimates of Zi and Wi , they satisfy the equations Z~l
wi+1
+ Ki+l(Yi+1 = Piw;* + L i+1(Yi+1 = i Z;*
Mi+li Z;* - N i +1 P iWi*) M i+1i Z;* - N i+1 P iW,; *)
(135)
6.
SUBOPTIMAL ESTIMATION: AN EXAMPLE
where
K'+l
=
(q
matrix
m)
X
L'+l = (n - q)
271
X
m
matrix
are gains of the filters for the subvectors z and w. Optimal gains of these two filters can be expressed as (136)
where S,*n
g (M'+l , NiH) i'*(i) ( ~f+l
r ~ * ( 1' ) =[', (c[J, 0 A J *(i)
g
) HI
+ .E'+J
0) r*(') (c[J/0 lfI.'0) + (S' Q,
lfI ,
1
M'H i't1(i)
...1 2*(i) g M,+l i'l~(i)
r
z
+ N i+1 i'i;(i) + N'+l i'~(i)
( 137)
(138)
(139)
The time index of is now carried as the argument of T and the subscripts on r refer to the components of r. The asterisk indicates a quantity associated with the optimal filters. Since
the components of the optimal error-covariance matrices are given by rtJ.(i
r1~(i r2~(i
+ 1) = + 1) = + 1) =
i'ti(i) - Al*(i)' 57+11 A/(i)
i'1~(i)
- Al*(i)' 57+11 ...1 2*(i)
i'~(i)
- A/(i)' 57+11 A/(i)
(140)
When arbitrary nonoptimal gainsKi+1 andL i +1 are used these components of the error-covariance matrices are related by
+ 1) = r + 1) = r 22(i + 1) = r
+ K'+l5i+1K'~1 Ki+l A 2(i) + Ki+15i+1L;+1 A 2(i)' L'+J + L,+l5,+lLi+1
ll(i
i'J1(i) - Ki+l AJ(i) - A 1(i)' K'+J
12(i
f'di) - A 1(i)' L~+l
-
i'di) - Li+1 A 2(i) -
(141)
VII.
272
APPROXIMATIONS
where Ei+l is defined by EH 1
and where
=
+ E i+!
Hi+! t(i) HI+!
+ eJ>i Tu(i) eJ>/ Si + eJ>j T (i) P/ s, + Pi T (i ) P/
(142)
tll(i) ~ Qi
t 1 2 (i)
~
Tdi) ~
12
(143)
22
By our assumption on cIJi and Pi ,
II t 22 (i)11= 0(11 n, II) and
if
II s, II = 0(1) II s, II = 0(.:)
(~;
:;).
II t 1 2 (i)11= 0(11 s, II) II t 1 2 (i)11= 0(.:)
0nNil (~'
Hil =
~;)
if
+
(144) ( 145a) (l45b)
~(, =Hi.,fi 11 2 II Mi+lSi'+-/(Mi+l Si
+ Ni+lRi)11 (1 + II(r~.(i)
Mi+l
+ SiNi+l)S"t-/Mi+l II) and (164)
where c, ~
II SiMi+lSi';/Mi+lSi II
r,
III/>i 11 211 Mi+l S"tl\Mi+1 Si
~
+ N i+lRi)11
2
Comparing (157)-(159) with (162)-(164), it is clear that that II Si I is the major source of error of this type of approximations as expected. Appendix A. Derivation of the Recursion Formula for Open-Loop Feedback Control Policies (Section 2) Substituting Eq. (68) into Eq. (67), one obtains N-l
an
+2 L
b/c(n) U/c
+ L L U/c' K/cj(n) Uj
/c~n-l
/c
+ 2Cnxn_1 + x~-lLnxn-l ,
+ 2 L u/ fj(n) j
X n- 1
j
+ 2 L g/c(n) f/c + L L f/c' N/cj(n) fj /c
+ 2 L f/c' M/c(n) /c
/c
X n_1
j
+ 2 L L f/c' O/cj(n) u, /c
j
277
APPENDIX A. RECURSION FORMULA
+ an+l + 2 L bk(n +
+ L L Uk'
I) Uk
k
k
+ 2Cn+lXn + Xn'L'H1Xn + 2 Lgk(n + k
+LL
tk' Nkj(n
+2L
tk Mk(n
k
+
I) t j
j
+
Kkj(n
j
I)
I) ».
e,
+ 2 L u/ fj(n +
I) Xn
j
+
I) Xn
+ 2 L L tk'
+
Okj(n
i
k:
+
+
where Xn = AXn_1 BUn_1 cgn- 1 . Since Eq. (AI) must hold for all values of the variables u, one obtains bn_1(n)
=
Cn+1B
bk(n)
=
b,Ci F 12(i) IfF()N:+I
X S;;lMi +1 .tn(i)]
+ .tl~(i) M:+1S;;lNi+l Tt2(iY + .tn(i)M(+IS;/lOi+1S;;lMi+I .tn(i) - (Mi+1.t1i(i) + N i+1 i\~(i)Y X S;;lLl i+lSi+1(Mi+1.t1i(i) + N i+1.ttz'(i)) ~ Ll.tn(i) - Ll.tn(i) M:+i S;t\Mi+1 .t1i(i) - .ttl(i) M:+1S;;lMi+l Ll.tn(i) - .ti~(i)
M:+1S;;lNi+l(f/>i LlF12(i) IfF(Y
- Ll.tn(i) M:+1S;;lNi+l(Si
+ f/>i Tl~(i)
IfF(Y
- (f/>i LlT12 (i) IfF()N:+1S;;lMi+1 .t1i(i) - (Si
+ f/>i T 12(i) lfFi)N:+1S;;lMi+l Ll.tn(i)
+ .tiz(i) N:+1S;;lNi+l .ti2'(i) + I\i(i) M(+1S;:;lOi+IS;;lMi+l .t{;.(i) - Al*(i)' S;;ILli+lSi+; Al *(i)
We drop subscripts from now on: LlTdi
+ 1) =
Ll.t12(i)
+ [.tliM'S-IM.tl~
- .tnM'S-IM.t12]
+ .tl~N'S-lM.tl~ + [.tliM'S-W.t2~ - .tnl'l!J'S-WR] + (.tl~N'S-W.t2~ - .t12N'S-lNR) + .tnM'S-W(R - .t22) + .t1iM'S-10S-1NR - (.t1iM' + .tl~N')S-l LlS-l(M.tl~ + N.t2~) =
Ll.t12(i) - Ll.tn(i) M'S-IMtl~
- tt{M'S*-lM Llt12
+ ttiM'S-WlfFiTizP/ + Tl~N'S-WlfFiT2~1fF/ -
Lltn M'S-WR Llt1 2 N'S-WR
+ .tl~N'S-lMtl~
- .t1iM'S-WlfFiTizP/ - Lltn M'S-WlfFiT22P/
+ i'riM'S-lOS-lNR LlS-1 (Mtl~ + Ntiz)
- t1iM'S-WlfFi LlT22 IfF;' - (t1iM'
+ tl~N')S-l
APPENDIX C. COMPUTATION OF
11T22(i
.dr(i)
+ 1) = 11t22(i ) + (l\~N'S-1Ml\~ - RN'S-1Mt12) + (t~N'S-Wt~ - RN'S-Wt22) + (t1~'M'S-Wt2~ - t{2M'S-WR) + tt2M'S-1Mt~ + RN'S-WR - t 22N'S-WR + RN'S-18S-WR - (t1~'M' + t2~N')S-1 11-1S-1 X (Mt1~ + Nt~) = 11t22(i) - RN'S-1M 11t12 + PiR/.p/N'S-1Mt1~ - RNS-WPi 11T22 Pi - 11/'{2 M'S-WR
+
t~M'S-1Mt1~
+-
RN'S-18S-WR - (t1~'M' 1
X 11S- (Mt1~
- 11t22 N'S-WR
+ Nt2~)
+ t2~N')S-1
- Pir2~P/N'S-WR
281
Chapter VIII
Stochastic Stability
I. Introduction We consider the question of stability of discrete-time stochastic systems via Lyapunov functions in this chapter. * This topic is not only important in its own right but also provides an example of the classes of engineering problems where the theory of martingales introduced in Chapter VI can be fruitfully applied. It is well known that natures of stability of deterministic dynamical systems can be answered if we can construct Lyapunov functions with certain specified properties. See, for example, La Salle and Lefschetz.s? Hahn.?" or Krasovskii.t" Also see Refs. 141a, 143 for other methods. Generally speaking, given a dynamical system with the state vector x, the stability of the equilibrium point of a dynamical system (which is taken to be the origin without loss of generality) can be shown by constructing a positive definite continuous function of x, Vex), called a Lyapunov function, such that its time derivative dV(x)/dt along a system trajectory is nonpositive definite. A monotonically decreasing behavior of Vex) along the trajectory implies a similar behavior for the norm of the state vector x, II x II, i.e., II x I ---+ as t ---+ 00, which is in correspondence with our intuitive notion that the origin is asymptotically stable. For a discrete-time system with the trajectory {x n , n = 0, I,...},
°
* For discussions of stability of continuous time stochastic systems see, for example, Samue]s,'2'-123 Kozin.f" Bogdanoff," Caughey," and Caughey and Dienes. 3' See also Ref.6Ia.
282
1.
283
INTRODUCTION
the stability of the origin is implied by the behavior of V(x) such that, for any set of i discrete sampling points in time, 0 ,,:;; n 1 < n 2 < ... < ni , (1)
This behavior of V(x) may be intuitively understood by interpreting V(x) as a "generalized" energy of the system which must not increase with time for stable dynamical systems. Now consider a discrete-time stochastic dynamical system described by k
=
0, I, ...
(2)
where Xk is the n-dimensional state vector and glc is a random variable (generally a vector). A control system described by k
=
0,1, ...
(3)
can be regarded as a dynamical system described by (2) for a given control policy, say Uk = rPlc(Xk)' where rPlc is a known function of xk: Xk+l
=
Fk(Xk ,
0, then
~ f3(a)
[1 - Pr(11 X o II ~ a)]
+ f3(M) Pr(11 X o II
~ a)
or
Pr[11 X o II
°
~
a] ~ (EV(II X o II) - f3(a))/f3(M)
Choose p(o, E) > sufficiently small so that f3(p) Then for X o satisfying Pr(11 X o II ? p) ~ p we have E
V(II
Xo
II)
+ pf3(M) ~
EO.
EO
~
From this inequality and (9), (11) follows. Thus we have proved the stability with a slightly different definition of stability that the origin is stable if and only if, for any 0 > and E > 0, there exists a p(o, E) > such that for every X o satisfying Pr(11 X o II? p) ~ p, and Pr(11 X o I ~ M) = 1, (11) holds. The criterion for asymptotic stability is given by the following. Suppose that there exists a continuous nonnegative function y(.) of real numbers such that it vanishes only at zero and
°
°
E[V(x(n, xo)) I xo , ... , xn - 1]
-
V(x(n - 1, xo))
~
-y(11 x(n -
1, xo) II)
0#' if it exists, is given by H(4)o#, 0#)
i~t
=
H(4)#, 0#)
where ()# is the assumed (or known) probability distribution function for (). If, for every € > 0, there exists (),# E e« such that a given control policy 4>0# comes within € of inf H(4)#, ()/), then 4>0# is called an extended Bayes or c-Bayes control policy.P
Equalizer Control Policy.
If there exists a control policy 4># such that H(4)#, 0)
=
constant
for all () E e, then it is called an equalizer control policy. The usefulness of equalizer policies may be illustrated by the following theorem.
Theorem. If 4>0# E M) = 0 for some fixed M;
Case B. Pr(N = j I ~i'
yi, N
>
i)
= Pr(N = j IN> i).
In Case A, we shall find a general procedure following the approach of this book. In Case B, we show that the problem can be transformed into one in which there is an infinite running time and a new cost function ]', which is not a function of stopping time. If the plant and observation equations are linear, with the only random variables (besides stopping time) being additive noise, and if the cost function is quadratic, that is if Xi+!
=
Yi = Wi,N
=
+ Biu, + gi cs, + n, +
AiXi
(U i-1 , GN,iUi-l)
(Xi' HN,iXi)
where Ai , B i , and C; are known and ~i and YJi are independent random variables with known distributions, then we can, formally, write a solution to the optimal problem.
b. Case A [Pr(N
>
M) = 0]
Suppose the system has survived through i = M - 1 steps. Then we know that we have exactly one more step to go. The problem is then the same as the case of a known stopping time and we have already solved that problem. Hence there is an explicit expression for U M- 1 as a function of U M- 2 and y M - I that will minimize the expected cost.
3.
EXTENSIONS AND FUTURE PROBLEMS
305
Now suppose we have survived M - 2 steps. Now there are two possibilities: either the stopping time is N = M - 1 with probability p M-l/ M-2' or N = M with probability P N/ M-2 . If the former holds, the additional cost will be W M-l M - l ; in the latter case it will be W M-l,M W M,M' Hence, taking the conditional expectation with respect to stopping time, the conditional expectation of the last two-stage cost L1] is given by
+
W M, M is a function of x M , U M-l ; U M-l is a function of yM-l , U M-2 ; X M is a function of X M - 1 , UM-l , ~M-l and hence a function of X M - 1 , 2 yM-l , U M- , ~M-l' Hence EN L1] is a function of 'M-2 (because P is a function of 'M-2), yM-2, U M- 3, which are observables, plus YM-l' gM - l ' X M - l ' which are not observables, and also, of course, U M-2 • In principle, we can find the probability distribution
and hence
Then the optimal policy is to choose U M-2 to minimize this conditional expectation. Now we see that this U M- 2 is a function of yM-2, U M - 3, and 'M-2' Going back another step, we have
This expression is a function of the observables 'M-3' yM-3, u M - 4, the nonobservables YM-2' YM-l , gM-2 , gM-l , X M - 2 ; and on U M-3· Again we find the conditional probabilities of the nonobservables conditioned on the observables and U M- 3 • Then we find the conditional expectation of the additional cost after M - 3 stages conditioned on the same variables. Again, we choose U M- 3 to minimize this conditional expectation. We see again that the control U M-3 is a function of the observables yM-3, U M- 4, and SM-3 .
IX.
306
MISCELLANY
The process continues with the general expression M
EN[LJ] I ~i]
=
M
I
P j/ iWi+1,j
+ I
j~i+l
M-l
=
P j/ iWi+2,j
+ ".
j~i+2
M
I
I
Ic~l
j~i+1c
Pj/iWi+Ic,j
which is a function of the observables y i , ui - \
~i'
Yi+l 'Yi+2 ,,,., YM-l , gi' gi+l , ... , gM-l , ~i+l
,,,.,
the nonobservables ~M-l , Xi' and the control Ui' The conditional expectation of Ll j, conditioned on the observables and Ui , is found and Ui chosen to minimize this expectation. As in the case of known stopping times, the practical problem is finding the conditional probabilities required. The process is only slightly more difficult by the inclusion of the extra variable ~, which determines the conditional distribution of stopping times. ~i+2
,
c. Case B Now let us consider the special case where the only additional information about the stopping time we have at the ith step over what we know at the first step is that the stopping time is greater than i. That is, ~i disappears for this problem and we have Pr(N
=
j yi, N I
> i)
=
_
Pr(N
=
!
Pr(N Pr(N
j =
>
1
N
> i)
j) i)
o
Now, if we exarrune EN[Llj n on, we have
+ I i~n+2
IN> n]
if j > i otherwise
and where Llj = cost from
Pr(i = N) Wn +2 , i
+ .,,]
3.
EXTENSIONS AND FUTURE PROBLEMS
307
If we multiplied all of the cost by a constant, we would change nothing. Hence, once we get to the nth stage, we can use EN[LJ]'
IN> n]
I I
=
k~l
Pr(i
=
N) W n +k • i
i~n+k
But then
That is, the expression for the expected cost function from n on is the same as from time zero. This shows us that we can use a single equivalent cost function in which the implicit dependence on the random stopping time disappears. That is, we note that E N [] ] =
I I k~l
i~k
W k'
=
where
Pr(i = N)
Wk,i
=
I
W k'
k=l
I
Pr(i = N)
Wk,i
i~k
N ate that we have left off the upper limits of summation in all cases. We can let this upper limit go to infinity. N ow our optimal control policy is that policy which is optimal for the system given and a cost function of
I'
=
I
Wk'(X k, Uk-I)
k~l
As an example, suppose final state). Then W k'
=
L
=
Wk,i
P(i = N)
xl8 k ,i (that is least squares in the
Wk,i
=
i=k
L
P(i = N)
Xk
20k,i
i=k
or, if
then 00
W k'
=
I
P(i = N)
AX i
i=k
= A P(k = N)
20k,i
+I
(P = i) U~_l
i=k
Xk
2
+ P(N :?' k)
ULI
308
IX.
MISCELLANY
Now, the obvious difficulty is that we have an equivalent system with an infinite time duration. This precludes the possibility of going to the last stage and working back. If we have linear plant and observation equations with additive independent noise and a quadratic cost function, the problem is solvable. This is because we know the optimal policy is the same as in the deterministic case except we use E[x n I ynJ instead of X n. The deterministic linear system of infinite duration can be solved by variational techniques and hence our problem can be solved.w Even in this special case, we may not be able to find explicit expressions for E(x n I yn). If the observation equation is noise free, or if the system is noise free and the observation noise is Gaussian, we can solve the problem in principle.
Appendix J
Some Useful Definitions, Facts, and Theorems from Probability Theory
In order to facilitate the reading of this book (especially of Chapters VI and VIII) several facts and theorems from the theory of probability are collected here together with some of their intuitive explanations. For more detailed and systematic accounts see, for example, Doob 47a or Loeve.l'" PROBABILITY TRIPLE
In order to be able to discuss probabilities of certain events, three things must be specified. They are: (i) the sample space, Q; (ii) the class of events to which probabilities can be assigned, :F. Events in the class :F are certain subsets of the sample space Q; (iii) probability measure P (defined on :F) so that, to any event A in the class :F, a real nonnegative number PA, 0 ~ PA ~ 1, is assigned, with PQ = 1. These three things are collectively referred to as a probability triple (Q,:F, P). Since each event in :F must have a probability assigned to it unambiguously, :F cannot be any arbitrary collection of subset of Q but must have a certain structure. For example, in a single coin tossing, the sample space Q is composed of two points: H (for head) and T (for tail). The class :F consists of four subsets, {(c/», (H), (T), (H, Tn, where c/> denotes a null set. When we say a coin is fair, we mean that PH
=
PT = 309
i
310
APPENDIX I
Intuitively, .fF includes all the events to which probabilities can be assigned. If an event A has a probability p, then one also wants to talk about the probability of A (the event that A does not occur) I - p; i.e., if A E.fF, then .fF must be such that A E .fF. If Al and A 2. are in .fF, then one wants to discuss the event Al n A 2 (the event that Al and A 2 occur simultaneously), the event Al U A 2 (the event that at least one of Al and A 2 occur), etc. Namely, if AI, A 2 E.fF, then.fF must be such that .fF contains Ai U Ai ' A, n Ai' Ai U Ai ' Ai n Ai' Ai U Ai ' and Ai n Ai' i, j = 1, 2. Such a class is known as a field. Since we also want to discuss probabilities of events which are limits of certain other events such as lim n -7w U~ Ai and lim n -7w n~ Ai , Ai E .fF, i = 1,2,... , .fF is usually taken to be a a field.
Example. Given a set AC Q, .fF = {cjY, A, Q - A, Q} is the minimal a field containing A (i.e., the smallest a field containing A). RANDOM VARIABLES
A random variable X (abbreviated as r.v. X) is a mapping from Q to the extended real line R (the real line plus ± (0) such that
for all A E Borel field (a field on R) where X-I is the inverse mapping of X; i.e., X-IA = {w; X(w) E A, w E Q}. Such an X is called .fF measurable. We denote by a(X) = X-I (Borel field) the smallest a field of subsets of Q with respect to which X is measurable. INDEPENDENCE
Let .EI
, .E2 , ... ,.En be sub-a-fields of .fF, i.e., .Ei is a a field such that 1 :s;: i :s;: n. They are called independent if and only if
s, C s ,
P
(n Ai) = 1
fI P(A i)
for arbitrary
Ai E };i'
1:( i :( n
I
A sequence of sub-a-fields of .fF, .Ei , i = 1,... , is independent if .EI , ... , .En are independent for all n = 1,2,.... Random variables Xl' X 2 , ••• are independent if and only if a(XI), a(X2 ) , ••• are independent.
311
PROBABILITY THEORY EXPECTATION
An indicator function (also called a set indicator)
fA
is defined to be
IS
called a simple
WEA
w$A The expectation of
fA
is defined to be
A finite linear combination of indicator functions function. If m
X
n
I
=
aJAi
=
i=l
I s.:s,
i=l
where Ai and B, are measurable, i.e., Ai , B, of X is defined to be m
I «r»,
EX =
E
%, then the expectation
n
=
1
I
bjPB j
1
If X is a nonnegative random variable, and if {Xn } and {Yn } are two sequences of measurable simple functions such that X n i X and r, i Y (Xn i X means that ~ Xn+l and limn Xn(w) = X(w), for all w E Q), then
x;
lim EXn
=
lim EYn
and this common value is defined to be EX (the expectation of X). The expectation of a random variable X on (Q, %, P) is defined to be EX
EX+ - EX-
=
where X+
=
max(X, 0),
X-
=
when the right-hand side is meaningful. EX is also written as EX
=
JX dP.
max(O, -X)
312
APPENDIX I
ABSOLUTE CONTINUITY
Let us suppose that two probabilities P and Q are available for the same (Q, g;). We say P is absolutely continuous with respect to Q, written as P 0,
P 0, Pi > 0 for all i, I :c:;; i :c:;; 4, in the above example, P 0, there exists 0 > 0 and N(o, E) such that Pr[j X n
-
X I
~
E] < 8
for
n
~
N(E, 8)
316
APPENDIX I
Convergence with probability one: A sequence of LV. {Xn } is said to converge with probability one to a LV. X if Pr[Xn ----+ X] = 1, i.e., for every E > 0, Pr
[n U I X n
m
n +m -
X
I~
E]
=
0
or, equivalently, Pr
[U [I X m
n +m -
X
I~
E)] --+ 0,
Convergence in L': A sequence of converges to a LV. X in L! if
LV.
n
n
--+ 00
{Xn } , X n ELI, n
=
1,... ,
--+ 00
EXAMPLES
Convergence in probability does not imply convergence with probability one. Let X n be independent such that Pr[Xn
°
=
0] = 1 - lin,
Pr[X n = 1] = lin
Then, X n ----+ in probability one but not with probability one. As a matter of fact, the set of w such that {Xn ( w)} will be one infinitely often has the probability one. Convergence with probability one does not imply convergence in L': Let X n be independent, EX" < 00 with Pr[Xn Then X n
----+
°
=
0]
=
1 - 1/n2,
with probability one but
EXn = 1 -1+ EX = 0 SOME CONVERGENCE THEOREMS
Monotone Convergence Theorem. Consider a r.v. X and sequence of LV. X n such that X n i X. Then EX = limn EXn . Martingale Convergence Theorem. a martingale on (Q, :7, P).
Let {Xi' :7i
,
a
i = 1, 2, ...} be
317
PROBABILITY THEORY
If E I X n I ~ k
n = rank A, then we may be interested in x = (A'A)-1A'y as a solution to (1) m some cases. We have seen one example in Chapter II, Section 2, where it is necessary to minimize a quadratic expression [(u, Su)
+ 2(u,
Tx)]
(2)
with respect to u even when 8-1 is not defined. In (2), as shown in Appendix A of Chapter II, the desired u is obtained by solving the linear equation
(3) Su + Tx = 0 when 8-1 exists. Even if 8-1 does not exist and (3) cannot be solved for u, one is still interested in finding u which minimizes the quadratic form (2). This minimizing u satisfies (3) in an approximate sense, to be described below. The concept of pseudoinverses of matrices is introduced as an extension of the concept of the inverses to provide the method of solving the
318
319
PSEUDO INVERSE
equation such as (1) or (3) approximately in such a way that, when the inverses of appropriate matrices exist, these two concepts coincide. 65 ,llo ,1l1 There are several ways to introduce and derive properties of pseudoinverses. 27 .47 ,142 Here, the starting point is taken to be the minimization of a quadratic form. Namely, the problem of solving (1) for x is transformed into that of minimizing a quadratic form II
Ax - y
11 2 =
(Ax - y, Ax - y)
(4)
with respect to x. After all, this is the way the pseudoinverses appeared in our problem in Chapter II, Section 2. The minimizing x of (4) may not be unique. Then, let us agree to pick that x with the smallest norm II x II as our solution. This seems quite reasonable for (2), for example, since one is usually interested in minimizing the performance index (2) with the smallest fuel, energy, etc., which may be interpreted as u having the smallest norm. For further discussions of quadratic programming problems to select unique solutions by successive use of various criteria, see Mortensen.J?" Denote x with these properties by x
A+y
=
(5)
where A+ is called the pseudoinverse of A. Note that when A-I exists, x = A-ly satisfies the conditions of uniquely minimizing II Ax _ y 2 • 11
CONSTRUCTION OF THE PSEUDOINVERSE
The development of the pseudoinverses presented here is based on the properties of finite-dimensional Hermitian matrices.l'" See Beutler''? for similar treatments of pseudoinverses in more general spaces. , Let A be an m X n matrix with rank r, C" an n-dimensional complex Euclidean vector space, M(A) the range space of A, %(A) the null space of A, and A* the complex conjugate transpose of A. Vectors are column vectors. Vectors with asterisk are, therefore, row vectors with complex conjugate components. Our construction of A+ is based on the polar decomposition of A: r
A
=
I
(6)
Adigi*
i=l
where r = rank A, and where gi EO C», such that fi*fi = 0ij ,
I.
and gi are column vectors,
g;*gj = 0ij,
~ i, j ~ r
I. EO c»,
320
APPENDIX II
and where .\.; >0 is defined later by (15). In (6),fig/ is a dyad (m X n matrix of rank one) andfi*fj is complex inner product. Then it will be shown that A+ with the desired property is obtained as r
A+
I
=
Ai1gJ;*
(7)
;~l
First, (6) is derived. Let Then one can write X
=
Xi ,
i = 1,... , n, be an orthonormal basis in C",
n
I
for all
«,»,
x
E
en
i=l
where (X;
x;*x
=
Now Ax =
n
I
(X;Ax;
i=l
where 1 =
1...., n
since A is a linear mapping from C» to C», Since rank A Yl ,... , Yr be the orthonormal basis of :a?(A) C c». Then, generally,
=
r, let
r
Ax;
=
I
{3ijYi
(8)
j~l
By suitable choices of bases in c» and C", (3ij in (8) can be made quite simple. To find such suitable bases, consider A *A, an (n X n) matrix. It is a Hermitian linear transformation on C n , hence it has n nonnegative real eigenvalues, and its matrix representation can be made diagonal by a proper choice of a basis in en. Since r = rank A = rank A*A
321
PSEUDOINVERSE
exactly r of the n eigenvalues are nonzero. Let Pi be such positive eigenvalues with eigenvector Zi, I ~ i ~ r, A*Azi
=
PiZi,
Pi
> 0,
Zi E
i
en,
=
1,..., r
(9)
Multiplying (9) by A from left yields AA*(Azi ) = pi(Azi),
i
=
1,... , r
(10)
This shows that, if Zi is an eigenvalue of A *A, with the eigenvalue Pi , then the AZi are eigenvectors of AA * with the same eigenvalue Pi . Since AA * has exactly r positive eigenvalues, rank (AA *) = rank (A*A) = r, hence A*A and AA* have Pi' i = 1,... , r, as their common eigenvalues. Orthonormalize the eigenvectors for A *A and denote them by {gi' i = 1, ... , r}: i = 1,... , r
We have seen in (10) that Agi are eigenvectors for AA*. Choose the eigenvectors for AA* {Ii, i = 1,... , r} by i
=
1,... , r
Since
{Ii' i
=
I, ... , r}
IS
also orthonormal if fJi
=
(Pi)l/2, i
=
I, ... , r. Thus (II)
It is known that14 2 em = 8i'(AA *) EB JV(AA *)
en = 8i'(A*A) EBJV(A*A),
Since ~(A *A) C C", complete an orthonormal basis for Cn by adding gr+l ,... , gn to {gi, i = 1,... , r}. Similarly, ~(AA*) C c» and an orthonormal basis for c» is obtained by augmenting {Ii , i = 1,... , r} by {lr+l ,···,fm}· Then {gr+l' ... , gn} spans A'(A *A) and {lr+l ,... ,Im} spans %(AA*). It is also true 14 2 that ~(A) ~ %(A*) and .'J1'(A*) ~ .Y(A). Thus A*Ax = 0
¢>
Ax = 0,
AA*x=O¢>A*x=O
Hence, from A *Agj = 0, Ag;
= 0,
j
=
r
+ 1,..., n
322
APPENDIX II
and from AA *fj = 0,
(12)
Ar], = 0,
j
+ 1,..., m
r
=
From (11) and (12),
Ar],
0,
=
i
=
1,..., r
i
=
r
(13)
+ 1,... , m
Since {gl , ... , gn} is a basis in en, given any x E en,
where and Ax
=
n
r
1
1
L cxiAgi = L CXiP~/Yi
(14)
or r
A
L Adigi*
=
(15)
1
where Pi is a positive eigenvalue of A*A, 1 ~ i ~ r. Equation (14) is the simplified form of (8). Thus ~(A) is spanned by I. , i = 1,... , r. Now we consider Problem (4) with x E en. Write with
v
with
Yi = f;*v
E
91?(A)
Then v has the expansion r
V
= Lydi
(16)
1
N ow consider a vector related to x by
Then, from (14), Ax
=
v
and therefore II
Ax - y
11
2
~
II
Ax - Y
11
2
for all
x
E
en
323
PSEUDO INVERSE
Also
Therefore, one sees that A+ is defined by
or r
A+ = L/..;lgJt
(17)
1
where
1\ is given by (15). From (15), A*
=
r
L /..igdi* 1
and r
(A *)+
=
L /..;lhgi *
(18)
I
From (17) and (18), one can readily obtain useful identities: (i) (ii) (iii) (iv)
AA+A = A A+AA+ = A+ (A+)* = (A*)+ (A+)+ = A
For example, (i) is obtained from (15) and (17);
=
L
i,j,k
\k;l\J;gi *gj~
"Ls; *
since
Expressions such as (17) and (18) can be put in to matrix forms. Define
= {II ,···,fm}:
m X m
matrix
G = {gl ,..., gn}:
n X n
matrix
F
324
APPENDIX II
and
A 0:) R ~ ( .:H>.~ 1.
.
:0
·
m
X
n
matrix
The orthonormalities of j's and g's imply FF* =F*F
=
t;
where 1m is the m-dimensional identity matrix. Similarly, GG* = G*G = In
From (15) and (17), A =FRG*
and A+ = GR+F*
where
R+
=
Similarly, A* = GR'F* (A*)+ = F(R+),G*
where ['] means a transpose.
Appendix III
Multidimensional Norma! Distributions
In this section certain useful facts on multidimensional normal distributions are listed for easy reference. An attempt has been made to give a logical self-contained presentation wherever deemed possible without unduly lengthening the material presented in the appendix. Most of the proofs are omitted. For a more complete discussion of the material, the reader is referred to Cramer'" and Miller.l?" RANDOM MATRICES AND RANDOM VECTORS
Definition I.
A random (m X n) matrix Z is a matrix Z
=
(Zij),
of random variables
i
= 1,2, ..., m, j
=
1,2, ..., n
Zll' Z12 , .•. , zmn .
Definition 2. EZ
=
(Ez i j )
Let Z be an (m X n) random matrix. Let A be a (l X m) matrix, B an (n X q) matrix, and C a (l X q) matrix. Then
Lemma I.
E(AZB
Example 1. i.e.,
+ C) =
A(EZ)B
+C
Let X be an n-dimensional random vector with mean ft,
EX = flo 325
326
APPENDIX III
Then (X - fL)(X - fL)' is an (n X n) random matrix and 11
~
E[(X - fL)(X - fL)']
is defined as a covariance matrix of the random vector X. Thus, by definition, A is a symmetric positive matrix (i.e., either positive definite or positive semidefinite). CHARACTERISTIC FUNCTIONS AND PROBABILITY DENSITY FUNCTIONS
Definition 3. The characteristic function (abbreviated ch.f.) of an n-dimensional random vector X is q,(t)
~
E
eii'X
for every real n-dimensional vector t. When n = I, this definition reduces to the usual definition of the ch.f. of a random variable.
Theorem 1. Given two distribution functions F I and F 2 on the real line, if the corresponding ch.f. is such that q,l(t) - q,2(t), then F I = F 2 . The inversion formula lim ~21 T->oo
7T
IT
-T
e-
i ta
~
e-
iib
q,(t) dt
It
exists and is equal to F(b) - F(a), where a and b are any continuity points of F. This theorem has the corresponding generalization to n-dimensional Euclidean space.
Definition 4.
When an n-dimensional random vector X has the ch.f. q,(t)
=
exp[it'm - it'l1t]
where m is an n vector and A is a positive (n X n) matrix, then the corresponding distribution function is called normal (n-dimensional normal distribution) and is denoted by N(m, A). The parameters of the distribution function m and A are the mean and the covariance matrix of X, respectively.
Lemma 2. The ch.f. of the marginal distribution of any k components of an n-dimensional vector, say Xl' X2 ,... , Xk , is obtained from q,(t) by putting t i = 0, k + 1 ::s: i ::s: n.
327
NORMAL DISTRIBUTIONS
From Lemma 2 and Definition 4, the ch.f. of k components of X, x k ) , is given by
(Xl' X 2 , ••. ,
rp(u)
exp[iu'(L - tu'Mu]
=
where u is the first k components of t, fL is the first k components of m, and M is the k X k principal minor matrix of .1. Since 4>(u) has the same form as 4>(t), g = (Xl"'" xk) is also normally distributed with N((L, M), or any marginal distribution of a normal distribution is also a normal distribution. LINEAR TRANSFORMATIONS OF RANDOM VARIABLES
Let an n-dimensional random vector X be distributed according to N(m, A) with nonsingular .1. Then there exists a nonsingular (n X n) matrix C such that C'A.-IC
=
I
(1)
Define an n-dimensional random vector Y by CY
=
X - m
(2)
Then the ch.f. if;(t) of Y is given by if;(t)
=
E exp(it'Y)
=
E exp(it'C'(x - m))
= exp( -it'C'm) where rp(t)
E exp(it'C'x)
=
exp( -it'C'm) rp(Ct)
=
E exp(it'X)
Thus
n exp( -tt n
if;(t) = exp( -tt't)
=
2
i )
(3)
i=l
since C' A.C
=
(C'A.-IC)-I
=
1
Therefore Y is also normal and is distributed according to N(O, I). This fact generalizes to any linear transformation.
Lemma 3. Linear transformations on normal random vectors are also normal random vectors.
328
APPENDIX III
Since
J ... J exp[it'y -
tY'Y] dYl ... dYn
=
(27T)n lz exp[ -tt't]
En
where En is the n-dimensional Euclidean space, this shows that
~ (27T~nIZ
f(Yl ,... , Yn)
exp( -t y'y)
(4)
is the probability density function of the d.f. with ch.f. Eq. (3). From Eq. (4), E(y;) = i = 1,2, ... , n E(Yi Z) = 1,
°
E(YiYj)
i=Fj
0,
=
°
Therefore, the covariance matrix of the random vector Y is I. Thus and I in N(O, I) have the physical meaning of the mean and covariance matrices of ¥: E(Y) E(YY')
= =
°
I
(5)
This is also clear from the definition of the ch.f. The probability density function of x can be obtained from Eq. (4) by the change of variables, Eq. (2), as f(x l, X z ,..., xn ) =
(i~Lz
exp( -[t(X - m)'A-l(X - m)])
where] is the Jacobian of the transformation
] =
I 8Yi I= 8x
[ C-l [
j
and where CC '
=
A from (1) is used. Hence
I C 1= and
[A [1/2
I ] I = IA
I-liz
Therefore,
I(xl ,..., xn) =
(27T) n /211A
1
1/ 2
is the density function of N(m, A).
exp[ -t(X - m),A-l(X - m)]
(6)
329
NORMAL DISTRIBUTIONS
Notice that normal distributions are specified as soon as m and A are specified, in other words as soon as the mean and covariance matrices of the random vector X are specified. From (1), (2), and (5), E(X) = m,
E(X - m)(X - m),
=
CC'
(7)
=,1 PARTITION OF RANDOM VECTORS
Let X be an n-dimensional random vector with N(m, A). Assume that A is nonsingular. Partition X into two vectors Xl and X 2 of k and (n - k) dimensions each. Define All = E(X1 - m1)(XI - m 1)', ,122 = E(X2 - m 2)(X 2 - m 2)', ,112 = E(X1 - m1)(X2 - m2)'
m1 = E(X1 ) m2
= E(X2 )
(8)
If A l 2 = 0, then
I A I = I All I I ,122 I ,1-1
=
0)
(,1111
o
A;}
and the density function of x becomes, from Eq. (6), f(X1, X 2) = (27T)k/2+All X
11/2
(27T)(n-kl}2
exp( -{t(X1
-
m1)'A111(X l
-
m1)})
IA-;[172 exp( -{te X 2 - m2)'A;21(X2
-~
m2)})
(9)
Therefore when A l 2 = 0, Xl and X 2 are independent and are distributed according to Nim; ,An) and N(m 2 ,A 22), respectively. Thus, we have Lemma 4. Two uncorrelated normally distributed random vectors are independent. CONDITIONAL DISTRIBUTIONS
Generally, ,112
=I' 0
330
APPENDIX III
In this case, introduce random vectors YI and Y 2 of k and (n - k) dimensions each by Y I = Xl - DX2 Y2 = X 2
where D is a (k
X
(n - k)) matrix to be specified in a moment. Then
If D is chosen to be then YI and Y 2 are uncorrelated normally distributed random vectors, hence independent from Lemma 4. Since YI and Y2 are normally distributed, their distributions are specified by computing their means fil and fi2 and covariance matrices E I and E 2 where 11-1
= EY1 =
m 1 -
11-2
=
m2
EY2
=
111211221m2
and ];2 = E[(Y2 =
-
EY2)(Y2
-
EY2 )']
11 2 2
Then the joint density function of (Xl' X 2 ) when .112 #- 0 is given by
where ] =
I OX °Yi I,
I
~
i, j
~
n
j
Then the conditional probability density function of Xl on X is obtained from
This has the normal distribution law (10)
NORMAL DISTRIBUTIONS
331
Thus the conditional mean of a normally distributed random vector is linear in the conditioning vector X 2 :
SINGULAR DISTRIBUTIONS
When a covariance matrix A is positive semidefinite, then A-I does not exist and the density function cannot be obtained from the inversion formula as has been done in previous sections. The function t/J(t) of Definition 3, however, is still a ch.f. Therefore, there exists a corresponding d.f. even when A-I does not exist. (For necessary and sufficient condition for t/J(t) to be a ch.f. see, for example, Cramer). This d.f. can be obtained as a limit of a d.f. with nonsingular .11 k -+ A. For example, let
Akl now exists and the corresponding d.f. F k can be found. As €k -+ 0, a ch.f. with .11 k converges at every t, Then it can be shown that there exists a d.f. F with t/J(t) as its ch.f. to which F k converges at every continuity point of F. This limit d.f. is called a singular normal distribution. Let rank A
=
r
0, E(y j 2) = 0, E(Yi 2 )
r+I~j~n
by rearranging components of y, if necessary.
332
APPENDIX III
This implies Yi
=
0
Then, from Eq. (11), It is seen therefore can be expressed variables YI ,... , Yr' of Xl"'" X n , each dependent.
with probability 1,
x
=
m
r
+1~
j
~
n
+ CY
that random variables Xl"'" X n , with probability 1, as linear combinations of r uncorrelated random Since each Yi' 1 ~ i ~ r, is a linear combination Yi' 1 ~ i ~ r, is normally distributed and is in-
Theorem 2. If n random variables are distributed normally with the covariance matrix of rank r, then they can be expressed as linear combinations of r independent and normally distributed random variables with probability 1.
Appendix IV
Sufficient Statistics
INTRODUCTION
We have discussed in some detail, in Chapters II-IV, optimal control problems for a class of dynamical systems involving some random variables in the description of their environments, plants, and observation schemes. We have obtained optimal control policies for these problems by first computing y's, the conditional expected values of the criterion function, conditioned on the currently available information about the system and on the utilized control variables, then minimizing y's over the class of admissible control variables. In order to compute Yk we needed the conditional probability density functions p(x k I y k-l, uk-I) or P(XIc-1 I ylc-l, UIc~I). Also, in Chapter IV, in computing Ylc , we needed expressions for P(/l-Ic I vk) and P(VIc I vk~l) where /l-k and Vic are the unobserved and observed portions of the Markov process gk}' 'Ic = (/l-k , Vic)' Generally, expressions for P(Xk I yk, Uk), P(/l-k I vk), and P(Vk I vic) are rather complicated functions of the observed data and employed controls. An optimal controller must remember all past observations and past controls vk or ylc and uk in order to synthesize the optimal control vector 1. Thus, the optimal controller generally needs a memory at time k which grows with time. For certain classes of systems, however, we have seen that it is possible to compute these conditional probability densities by knowing only a fixed and finite number of quantities tk(ylc, Uk-i) of fixed dimensions. They are functions of the observed data (ylc, Ulc-I); i.e., for some problems, optimal control policies can be synthesized by knowing values of only a finite fixed number of functions of the observed data thus eliminating the need for a growing memory.
+
333
334
APPENDIX IV
Random variables which are functions of observed realizations (i.e., samples) of another random variable are called statistics. When statistics carry with them all information about the probability distribution function that can possibly be extracted by studying observed data, they are called sufficient statistics." Thus, we can realize optimal control policies with controllers of finite memory capacity if sufficient statistics exist for the problems. See, for example, Section 3 of Chapter II, Section 5 of Chapter III, and Section 2,B of Chapter IV. SUFFICIENT STATISTICS
A formal definition of sufficient statistics for random variables with probability density functions is as follows. " Let zn be a random sample with the probability density function p(zn; B) which depends on a parameter BEe. A statistic T 1 = t 1 (z n) (a real-valued function) is called a sufficient statistic for B if and only if, for any other real-valued statistics T 2 , ... , Tn such that the Jacobian is not identically zero, the conditional probability density function p(t2 , ... , t n I t 1 ) of T 2 , .. ·, Tn given T 1 = t 1 is independent of B. Namely, not only does B not appear in p(t2 , ... , t n I t 1 ) but also the domain of p(t2 , ... , t n I t 1 ) does not depend on B. A vector-valued sufficient statistic is similarly defined as a finite collection of real-valued sufficient statistics. The above definition is somewhat inconvenient to apply, since one must test conditional density functions of all statistics for the dependence on B. We have a criterion called the Fisher-Neyman criterion or factorization theorem which is much more convenient in practice to test if given statistics are sufficient or not. We state the theorem when the probability density function exists and when its domain is independent of B.
Factorization Theorem. T is a sufficient statistic for Bif and only if it is possible to factor the joint probability density function as p(zn; B)
=
g(zn) h(T, B)
where g does not involve B. Therefore, when a sufficient statistic T exists, an optimal controller needs to remember only T, and the problem of growing memones does not arise. We will now consider an example to illustrate the above discussion. In Section 4, the implications of the existence of sufficient statistics on controller memory requirements are further considered.
335
SUFFICIENT STATISTICS EXAMPLES 73
Consider a sample of size 2, Z2 = (ZI ,Z2) where ZI and Z2 are independent Gaussian random variables with unknown mean Band known variance 1. Then is a sufficient statistic for e. This can be seen, for example, by directly applying the definition. Consider any statistic t 2 = f(zl , Z2) such that ZI and Z2 are expressed by ZI = k 1(t 1 • t z) Zz
=
k Z(t l , t z)
and the Jacobian is nonzero. Then, by writing the density function for p(t l, t z ; 0)
=
=
Z2,
p(kl(t l, t z), kZ(tI' t z» I ] I
ILl
exp ( _ ~ [ (tl -; 20)2
where ] is independent of Since
+
Z t1 - 4 k 1(t1 t z) kZ(t1 , t z) ])
2
e.
the conditional density of t z given t 1 becomes
]
(t
Z
1 4k 1 k z ) p(tz I t 1 ; 0) = (27T)l/Z exp - ---4---
e.
e.
which is independent of Therefore t 1 is a sufficient statistic for Actually, that t 1 is a sufficient statistic for can be seen much more directly by applying the Fisher-Neyman criterion, by writing
e
P(ZI , Zz ; 0) = g(t l , 0) h(ZI , zz)
where
(tl
g(tl' 0)
=
1 (27T)I/Z exp -
-4 20)2 )
h(ZI' zz)
=
1 ( Z l -4 zz)Z ) (27T)I/Z exp -
Other examples are found in Hogg and Craig.?"
336
APPENDIX IV
SELF-REPRODUCING DISTRIBUTION FUNCTION
One of the extraordinary features of Gaussian random variables is that the transformations of a priori probability distribution functions by the Bayes rule into a posteriori distributions turn out to preserve the normal forms of the probability distribution functions when the plant and the observation equations are linear in the random variables. A normal distribution function is completely specified by its mean and covariance matrix. See, for example, Appendix III. This is the reason why the controllers need remember at any time i only two quantities, fLi and T', , in the examples of Section 4, Chapter II, and the controllers can compute the next set of numbers fLi+1 and T i + 1 given a new set of data Yi+l and u; . Unfortunately not all probability distribution functions share this property of "reproducing" the form of the a priori distribution function in the a posteriori distribution function form. If the a posteriori distribution functions have the same form as the a priori distribution functions, then only a set of parameter values need be determined to specify the particular distribution function. Since it is infinitely easier to specify a set of numbers than a function, one sees the advantage of choosing a priori probability distribution functions which reproduce in Bayesian optimization of control problems. See Spragins 1 28 ,1 29 for detail. We have already mentioned that normal distributions have the selfreproducing property. As another example, consider random variables Y such that Yi
=
P /0
with probability with probability
8 1- 8
and where
~ r(a + b + 2) a b Po(8) - I'ia + 1) r(b + 1) 8 (1 - 8) ,
0