Tamer Ba§ar Coordinated ~ University of Urbana, mini USA
Pierre Bernhard Inria Unite de Recherche Sophia-Antipolis Val...
27 downloads
690 Views
4MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Tamer Ba§ar Coordinated ~ University of Urbana, mini USA
Pierre Bernhard Inria Unite de Recherche Sophia-Antipolis Valbone Cedex France
I ~
the main purpose of this book is to provide a self-contained exposition to the essential elements of (and relevant results from) linear-quadratic (LQ) zero-sum dynamic game theory, and to show how these tools can directly be used in solving the HOO-optimal control problem in both continuous and discrete time, for finite and infinite horizons, and under different perfect and imperfect state information patterns. The work presented here is based on partly independent and partly joint recent work of the authors on the topic. Some selected results have been reported earlier in [5], [6], [11],
[9], [8], [14], [12], [10], [23], [25].
1.2
A Relationship Between HOO-Optimal Control and LQ Zero-Sum Dynamic Games
For the basic worst-case controller design problem (in continuous or discrete time) let us adopt the following compact notation: z
y u
(1.1)
Here z, y, w and u denote, respectively, the controlled output, measured output, disturbance, and control variables, and they belong to appropriate
Introduction to Minimax Designs
5
Hilbert spaces, denoted 1i z , 1ly 1lw and 1i u , respectively. Gij, i,j = 1,2, are appropriate bounded causal linear operators, and so is
jj
EM, which is
called the controller. Here M is the controller space, which is assumed to be compatible with the information available to the controller. For every fixed jj
E M we can introduce a bounded causal linear operator TjJ : 1lw
-+
1lz,
defined by
where
jj
0
G2i denotes a composite operator, and we assume here that the
required inverse exists. (A sufficient condition for this would be for either jj
or G 2 2 to be strictly causal, which we henceforth assume.) Furthermore,
implicit in the definition of the controller space M is the requirement that, for infinite-horizon problems, every jj E M internally stabilizes the underlying system. The design objective is to optimally attenuate the disturbance at the output, which, in mathematical terms, is the optimization problem, (1.2a) where
~
. ~ denotes the operator norm of TjJ , i.e.,
~TjJ ~:=
Here
sup IITjJ(w)lIz == sup wE1i .. IIwll .. ~l
\\·110 denotes the appropriate Hilbert space norm, with the subscript
identifying the corresponding space. At places where there is no ambiguity from context, we will drop this identifying subscript. Now, in view of (1.2b), the optimization problem (1.2a) clearly defines a "min max" problem, and a natural question to ask is whether the "infimum" and "supremum" operators can be interchanged in the construction of these controllers. In the parlance of game theory, this question is translated into one of the equivalence of upper and lower values of a zero-sum game defined by the kernel IThe identity here is valid because J.I is a linear controller. H J.I is allowed to be a nonlinear controller, it then becomes more appropriate to work with the second expression.
Chapter 1
6
I\\wllw,
IITI'(w)lI.
a topic which will be studied later. Weshould note at
this point the obvious inequality (see Section 2.1 for details), upperA value
inf I'EM
IITI'(w)II./\\wl\w' >
SUp wE1lw
~
lower value ________
________
SUp wE1l",
inf I'EM
~A~
~
\\TI'(w)\\./\\w\\w',
(1.3)
and also point out to the fact that for a large class of problems to be studied in this book this inequality is in fact strict, unless the disturbance is allowed to be a stochastic process (see Section 3.3.3 for a demonstration of this point). Toward formulating a different (in many ways, a simpler) game associated with the worst-case design problem formulated above, let
---------------
us first assume that there exists a controller p.. E M satisfying the min,
imax disturbance attenuation bound ,. in (1.2a). Then, (1.2a) becomes equivalent to : (i)
(1.4a) and (ii) there is no other p. E M (say, jl), and a corresponding ::y
< ,., such
that
(l.4b) Now, introducing the parameterized (in,
~
0) family of cost functions:
(i) and (ii) above become equivalent to the problem of finding the smallest value of,
~
0 under which the upper value of the associated game with
objective function J-y(p., w) is bounded, and finding the corresponding controller that achieves this upper value. It will turn out that for the class of systems under consideration here, the zero-sum dynamic game defined by the kernel (1.5) has equal upper and lower values (whenever they are
7
Introduction to Minimax Designs
finite), which makes the existing theory on saddle-point solutions of linearquadratic dynamic games directly applicable to this class of worst-case design problems. The dynamic game whose performance index is given by
J-y (I-', w), as above, will be called the soft-constrained game associated with the disturbance attenuation problem. The terminology "soft-constrained" is used to reflect the fact that in this game there is no hard bound on the disturbance w, while in the original problem characterized by the kernel
J(I-', w) (see (1.5)) a norm bound had to be imposed on w. In both cases, the underlying dynamic optimization problem is a two-person zero-sum dy-
namic game, with the controller (u) being the minimizing player (henceforth called Player 1), and the disturbance being the maximizing player
(calle~
Player 2).
1.3
Discrete- and Continuous-time Models
The models we will work with in this book are the following state-space representations of (1.1) in discrete and continuous time:
Discrete Time : (1.6a) Zk = Hk (fhxk \
+ Gk-1Uk-1 + A-1Wk-1)"+Gkuk, .
v
(.
(1.6c) K
L(u,w)
= \(K+1I~1 + ~)zkI2;
QJ ~ 0,
(1.7a)
k=l
K
L-y(u,w)
= L(u,w) _,2 E IWkl2
(1.7b)
k=l
2The reason why we write the "controlled output" z as in (1.6b) is to be a.ble to consider different special cases in Chapters 3 -and 6, without introducing additional notation.
8
Chapter 1
where upper case letters denote matrices of appropriate dimensions, {xl:} is the state vector sequence, norm, and K is either finite
1·1 denotes an appropriate or +00. Furthermore,
Euclidean (semi-)
(1.8) where the control law (or, equivalently, control policy or strategy) J-L :=
J-L[l,K] EM is taken as some linear mapping, even though we will establish some results also for a class of nonlinear controllers. We will also consider the cases where some structural constraints are imposed on the dependence of the control on the measurement sequence
Y[l,k]
(such as delayed
measurements). For any admissible cc5ntrollaw J-L[l,K], if the control action generated by (1.8) is substituted into (1.7b), we will denote the resulting objective function as J..,(J-L, w), which is defined over M x 1lw (whereas
L.., was defined on 1lu x 1lw).3 This corresponds to (1.5) defined in the previous section. Likewise, the counterpart of L (given by (1.7a)) will be denoted by J. Note that the functional J is in fact J.., evaluated at ,
= 0;
the same interpretation applies to the relationship between Land L..,.
Continuous Time:
x = A(t)x +B(t)u(t) + D(t)w(t),
L(u, w)
x(O) =
Xo
(1.9a)
z(t) = H(t)x(t) + G(t)u(t)
(1.9b)
y(t) = C(t)x(t) + E(t)w(t)
(1.9c)
= Ix(tJ )I Q2 I + L..,(u, w)
ltl 0
Iz(t)1 2 dt;
- w) = L(u,
u(t)
,2 10['I Iw(t)1
= J-L(t, Y[O,t]),
t~0
(1.10a) 2
dt
(1.10b) (1.11)
3Clearly, J.., is identical with the functional L.., when the control is restricted to be open-loop. Since J.., is more general than L.." we will mostly be using the notation J.., (and correspondingly J) in this book, even when working with action variables, unless the specific context requires the difference between the two to be explicitly recognized.
-,1
-,1
9
Introduction to Minimax Designs
where J-L E M is again a linear controller, compatible with the underlying information structure (as to whether continuous or sampled measurements are made, whether there is delayed dependence on Y[O,tj, etc.). Again, we will use the notation J-y(J-L, w), to denote the counterpart of the performance index (1.lOb) (or equivalently (1.5)), defined over the product space M x 1iw, and also use J(J-L, w) to denote the counterpart of (1.10a). We will
study in this book both the finite horizon (finite tf) and infinite horizon
(tf
-+
00) cases, and some of the results to be presented will be valid even
in the general class of nonlinear controllers. In both discrete and continuous time, the initial states (Xl and xo) will either be taken to be zero, or be taken as part of the disturbance, and in the latter case an additional negative term will be added to (1.7b) and (1.10b), which will involve Euclidean norms of Xl
and Xo, respectively. We postpone the discussion on the introduction of
these norms into the performance indices until Sections 3.5 and 4.5.
1.4
Organization of the Book
In the next chapter, we introduce some basic notions from zero-sum static and dynamic game theory, and present some key results on the existence, uniqueness, and characterization of saddle-point equilibria of general games. Chapter 3 deals with the class of discrete-time linear-quadratic dynamic games with soft constraints and under various perfect state information patterns, and obtains necessary and sufficient conditions for boundedness of the upper value. These results are then applied to the discrete-time H oo _ optimal control problem so as to obtain optimal controllers under perfect state and delayed state information patterns. Both finite- and infinitehorizon problems are
address~d,
and in the latter case the stabilizability of
the optimal controller is established under appropriate conditions. Chapter 4 presents the counterparts of these results in the continuous time, and particularly deals with the perfect state and sampled state information patterns. Complete extensions to the imperfect information case are developed
10
Chapter 1
in the next two chapters. The first of these (Chapter 5) is. devoted to the class of continuous-time linear-quadratic game and minirllax design problems, under both continuous and sampled imperfect state measurements. Chapter 6, on the other hand, develops a complete set of results for the discrete-time problem with imperfect (disturbance corrupted) state measurements. Chapter 7 discusses a class of related minimax design problems in filtering and smoothing. This is followed by two appendices (Chapters 8 and 9) that present some useful results on conjugate points, which are extensively used in the developments of Chapters 4 and 5, and a technical result needed in the proof of Theorem 5.1 in Chapter 5. The book ends with a list of references, and a table that indicates the page numbers of the Lemmas, Theorems, Definitions, etc. appearing in the text.
1.5
Conventions, Notation and Terminology
The book comprises seven chapters and two appendices. Each chapter is divided into sections, and sections occasionally into subsections. Section 3.2, for example, refers to the second section of Chapter 3, while Section 3.2.3 is the third subsection of Section '3.2. Items like theorems, definitions, lemmas, etc.; are identified within each chapter according to the "telephone numbering system"; thus, Theorem 4.2 would be the second theorem of Chapter 4, and equation (5.8) would be the eighth equation of Chapter 5. The following symbols are adopted in the book, unless stated otherwise in a specific context: n-dimensional Euclidean space If
the set of natural numbers defined by defines
v
for all end of proof, remark, lemma, etc.
j
11
Inrroduction to Minimax Designs
x
state variable;
Xk
in discrete time and
x(t) in continuous time x~ (A')
transpose of the vector
IXkl II xII
a Euclidean norm of the state vector x at time k a Hilbert space norm of the state trajectory x
U
control variable;
Uk
Xk
(of matrix A)
in discrete time
and u(t) in continuous time
u
space where the control trajectory lies; occasionally denoted by llu, 1£ standing for Hilbert space
w
disturbance variable;
Wk
in discrete time and
w(t) in continuous time
w
space where the disturbance trajectory lies; occasionally denoted by llw control restricted to the discrete or continuous time interval [t 1, t2J; also the notation u t is used to denote the sequence U[l,t) in discrete time and u[O,t)
y
in continuous time
measurement variable; Yk in discrete time and
y(t) in continuous time
y
space where the measurement trajectory lies
p. EM
control policy, belonging to a given policy space M
vEN' e2([t 1,t2J,lR n )
policy for disturbance, belonging to policy space N the Hilbert space of square summable functions on the discrete interval [t 1, t2], taking values in lR n the Hilbert space of square integrable functions on
[t 1 , t 2 ], taking values in 1R n J
performance index (cost function) of the original disturbance attenuation problem
J-y
I
performance index of the associated soft-constrained dynamic game, parameterized by 'Y
>0
12
t
II I ;
I!
Chapter 1
ARE
algebraic Riccati equation
RDE
Riccati differential equation
FB
feedback
LQ
line ar-qu adr atic
CLPS
closed-loop perfect state
SDPS
sampled-data perfect state
DPS
delayed perfect state (in the continuous time)
CLD
closed-loop delayed (in the discrete time)
Tr[A]
trace of the square matrix A
peA)
spectral radius of the square matrix A
Ker E
kernel of the matrix (or operator) E
f !
Throughout the book, we will use the words "optimal" and "minimax" interchangeably, when referring to a controller that minimizes the maximum norm. Similarly, the corresponding attenuation level will be referred to as optimal or minimax, interchangeably. Also, the words sury·ective and onto
(and likewise, injective and one-to-one) will be used interchangeably, the former more extensively than the latter.
Chapter 2 Basic Elements of Static and Dynamic Games
Since our approach in this book is based on (dynamic) game theory, it will be useful to present at the outset some of the basic notions of zero-sum game theory, and some general results on the existence and characterization of saddle points. We first discuss, in the next section, static zero-sum games, that is games where the actions of the players are selected independently of each other; in this case we also say that the players' strategies are constants. We then discuss in Sections 2.2 and 2.3 some general properties of dynamic I
games (with possibly nonlinear dynamics), first in the discrete time and then in the continuous time, with the latter class of games also known as differential games. In both cases we also introduce the important notions of representation of a strategy, strong time consistency, and noise i;"sensitivity.
2.1
Static Zero-Sum Games
Let L = L(u, w) be a functional defined on a product vector space U x W, to be minimized by u E U C U and maximized by w EWe W, where U and Ware the constraint sets. This defines a zero-sum game, with kernel
L, in connection with which we "an introduce two values: Upper value:
L:= inf sup L(u,w) uEU wEW
(2.1a)
.oj
Chapter 2
14 Lower value:
J:..:=
(2.1b)
sup inf L(u, w) wEwuEU
with the obvious inequality
L ? J:...
(2.1c)
If we have an equality in (2.1c), the common value
(2.2) is called the value of the zero-sum game, and furthermore if there exists a pair (u· E U, w* E W) such that
L(u*, w*)
= L*
(2.3)
then the pair (u*, w*) is called a (pure-strategy) saddle-point solution. In this case we say that the game admits a saddle point (in pure strategies). Such a saddle-point solution will equivalently satisfy the so-called "pair of saddle-point inequalities": L(u*,w)~L(u*,w·)~L(u,w*),
VuEU,VwEW.
(2.4)
Not every zero-sum game admits a saddle point, or even a value. Consider, for example, the zero-sum game:
L(u,w)=(u-w?;
U=W=[0,2]
for which
J:..= 0, L = 1, and hence the game does not have a value (in pure strategies). If, however, we take for w the two-point probability mass function
p:'
1
W·P·"2
and choose
u· = 1,
with probability (w.p.) 1,
defined by
15
Static and Dynamic Games
then min UE[O,2j
[(u_O)2~+(U_2)2~]=1 2 2
which is attained uniquely at u
= 1, and
max {(I - W)2} = 1 wE[O,2j
attained at w
= 0 and w = 2,
which is the support set of p:,. Hence, in
the extended space of "mixed policies", the game admits a "mixed" saddle point. To define such a saddle point in precise terms, let
F(P, Q)
=
JJ
L(u, w)P(du)Q(dw)
(2.5a)
uxW
where P (respectively, Q) is a probability measure on U (respectively, W). The counterpart of (2.1c) now is
F where
:= ~fsup
F (respectively,
Q
F(P, Q) ~ supi~f F(P, Q) =: E Q
(2 .5b)
F) is the upper (respectively, lower) value in mixed
strategies (policies). If
F = F =: F·,
(2.6a)
then F· is the value in mixed strategies. Furthermore, if there exist probability measures (PO, Q.) such that
F(P", Q")
= F",
(2.6b)
then (P", Q") is a mixed saddle-point solution. Some of the standard existence results on pure and mixed saddle points are the following (see [17] for proofs):
Theorem 2.1. If U and Ware finite sets, the zero-sum game admits a
saddle point in mixed policies.
16
Chapter 2
Theorem 2.2. Let U, W be compact, and L. be continuous in the pair ('1.1,
w). Then, there exists a saddle poin t in mixed policies.
Theorem 2.3. In addition to the hypotheses of Theorem 2.2 above, let
U and W be convex, L(u, w) be convex in '1.1 E U for every w E W, and concave in w E W for every'll E U. Then there exists a saddle point in pure policies. If, furthermore, Lis strictly convex-concave, the saddle point solution is unique.
If the saddle-point solution of a zero-sum game is not unique, then any ordered combination of these multiple saddle-point equilibria could be adopted as a solution, in view of the following simple, but very useful property [17].
Property 2.1:
Ordered interchangeability. For a zero-sum game,
{LjU,W}, if (il,w) E U x Wand (il,w) E U x W are two saddle-point pairs, then the pairs (il, ill) and (ti, w) also constitute saddle points.
0
This ordered interchangeability property of multiple saddle poin ts holds not only for pure-strategy saddle points, but also for mixed-strategy ones, in which case the action variables'll and ware replaced by the corresponding probability measures P and Q. It also holds in the context of dynamic games, with the action variables now replaced by the corresponding strategies of the players.
2.2
Discrete-Time Dynamic Games
In this section we introduce a general class of
finite-hori~on
discrete-time
zero-sum games, discuss in this context various information patterns, and present a sufficient condition for the existence of a saddle point when the
I 1
17
Static and Dynamic Games
information pattern is perfect state. We also introduce the notions of representation of a strategy, strong time consistency, and (asymptotic) noise insensitivity.
Consider a zero-sum dynamic game described by the state equation
(2.7a) (2.7b) and with the finite-horizon cost given by K
L(u,w) =
(2.8)
Lgk(Xk+l,Uk,Wk,Xk) k=l
which is to be minimized by Player 1 and maximized by Player 2, using the control vectors
IR n ,
U[l,K]
and
W[l,K],
respectively. The state
Xk
takes values in
the measurement vector Yk takes values in IRP, and the control vector
of Player i takes values in IR fl., i = 1,2. It is assumed (at this point) that the initial state
Xl
is known to both players.
The formulation above will not be complete, unless we specify the information structure of the problem, that is the nature of the dependence of the control variables on the state or the measurement vector. In this book, we will be interested primarily in four different types of information l
:
(i) Closed-loop perfect state (CLPS): The control is allowed to depend on the current as well as the entire past values of the state, i.e.,
(2.9a) X[l,k]
I 1
(Xl> ... ,Xk)
where Il[l,k] := (Ill, .. " Ilk) is known as a (control) policy. We let MCLPS
denote the set Qf all such (Borel measurable) policies.
(ii) Closed-loop 1-step delay (CLD): The control is not allowed to depend on the current value of the state, but it can depend on the entire past 1 Here we describe these from the point of view of Player 1; a similar listing could be given for Player 2 as well.
18
Chapter 2
values of the state, i.e.,
(2.9b) k = 1. We denote the corresponding policy space by
MCLD.
(iii) Open-loop (OL): The control depends only on the initial state, i.e.,
(2.9c) The corresponding policy space is denoted by
MOL.
(iv) Closed-loop imperfect state (CLlS): The control is allowed to depend on the current as well as the entire past values of the measurement vector, i.e.,
k We let
MCLls
(2.9d)
= 1.
denote the set of all such (Borel measurable) policies.
Several delayed versions o(this CLIS information pattern will also be used in this bookj but we postpone their discussion until Chapter 6. Let M denote the policy space of Player 1 under anyone of the information patterns introduced above. Similarly let N denote the policy space of Player 2. Further, introduce the function J : M
J(J1.,II)=L(u,w),
X
N
-+
IR by
Uk=J1.kO, Wk=lIk(·), kE[l,K]j (2.10)
(J1.[I,Kl, II[I,Kl)
EM X N,
where we have suppressed the dependence on the initial state
Xl.
The
triple {JjM,N} constitutes the normal form of the zero-sum dynamic game, in the context of which we can introduce the notion of a saddle-point equilibrium, as in (2.4).
J
19
Static and Dynamic Games
Definition 2.1. Given a zero-sum dynamic game {J;M,N} in normal form, a pair of policies (p.* , v*) E M x N constitutes a saddle-point solu tion
if, for all (p., v) EM x N,
J(p.*,v)::; J*:= J(p.*,v*)::; J(p.,v*) The quantity J* is the value of the dynamic game.
(2.11a)
The value is defined even if a saddle-point solution does not exist, as .
J:= Here
inf supJ(p.,v)
I'EM vof
J and 1. are the
= J* = vof sup inf J(p.,v) =:J.... I'EM
(2.11b)
upper value and the lower value, respectively, and
generally we have the inequality J ~
I,
as discussed in Section 2.1 in the
context of static games. Only when they are equal, as in (2.11b), that the value J* of the game is defined.
In single player optimization (equivalently, optimal control) problems, the method of dynamic programming provides an effective means of obtaining the optimal solution whenever it exists, -by solving a sequence of static optimization problems in retrograde time. The counterpart of this in two-player zero-sum dynamic games with CLPS information pattern is a recursive equation that involves (again in retrograde time) the saddle-point solutions of static games. Such an equation provides a sufficient condition for the existence of a saddle point for a dynamic game, and it will play an important role in the developments of Chapters 3 and 6. It is known as Isaacs' equation, being the discrete-time version of a similar equation obtained by Rufus Isaacs in the early 1950's in the continuous time [49] (see Section 2.3):
1
.;.~
20
Chapter 2
gk(fk(x'~k(X),Vk(X))'~k(X),Vk(X),X) +Vk+l(fk(X,~kCx),
VK+l(X)
vk(x))) ;
(2.12)
== O.
We now have the following sufficiency result, whose proof can be found in
([17]; Chapter 6). Theorem 2.4. Let there exist a function Vk(·),k 2: 1, and two policies ~* E MCLPS,
v· E NCLPS, generated by (2.12). Then, the pair (~., v*)
provides a saddle point for the discrete-time dynamic game formulated in this section, with the corresponding (saddle-point) value being given by o Note that the policies (~. = ~[l,K]' v· = V[l,K]) are generated recursively in retrograde time, and for each k E [1, K] the computation of the pair
(~k'
Vk) involves the saddle-point solution ofastatic game with (initial)
state Xk. This makes these policies dependent only on the current value of the state, which are therefore called feedback policies. A second point to note in (2.12) is the requirerpent that every static game encountered in the recursion have a (pure-strategy) saddle point. If this requirement does not hold, then the first and second lines of (2.12) lead to different values, say
Vk and L, respectively, with Vl(Xl) corresponding to the upper value and V 1(Xl) to the lower value of the dynamic game, assuming that these values are bounded [15]. Even if the sufficiency result of Theorem 2.4 holds, this does not necessarily imply that all saddle-point policies of the dynamic game under the CLPS information pattern are generated by (2.12). In the characterization of all saddle-point policies of a given dynamic game, or in the generation of saddle points under different information patterns, the following notion of a representation of a strategy (see, [17]; Chapter 5) proves to be very useful, as it will also become evident from the analyses of the chapters to follow.
21
Static and Dynamic Games
Definition 2.2. For a given two-player dynamic game with policy spaces M and N (for Players 1 and 2, respectively), let (J-I E M, II E N) and (jJ E M, iI E N) be two pairs of policies. These are representations of each
other if (i) they both generate the same unique state trajectory, say X[I,K+l], and (ii) they admit the same pair of open-loop values on this trajectory, i.e., J-I1:(X[I,1:])
jJ1:(X[I,1:]) ,
. 111:( X[1,1:])
il1:(X[I,1:]);
Vk E [1, K] . o
Clearly, under dynamic information patterns, a given pair of policies will have infinitely many representations; furthermore, if this pair happens to provide a saddle-point solution, then every representation will be a candidate saddle-point solution for the game, as implied by the following useful result.
Theorem 2.5. For a given dynamic game, let (J-I" EM, II" E N) and (jJ E M, iI E N) be two pairs of saddle-point policies. Then, necessarily,
they are representations of each other.
Proof.
It follows from the ordered interchangeability property of multi-
ple saddle points (cf. Property 2.1) that (jJ, II") is also a saddle-point pair. This implies that, with II = II" fixed, both J-I" andjJ (globally) minimize the game performance index (i.e. J(J-I,II*)) over M, and hence are representations of each other. A symmetrical argument shows that II" and iI are also representations of each other, thus completing the proof.
o
Faced with the possibility of existence of multiple saddle-point equilibria in dynamic games when (at least one of) the players have access to
Chapter 2
22
dynamic (such as CLPS, or CLlS) information,2 a natural question to ask is whether one can refine the notion of a saddle-point soiution further, so as to ensure unicity of equilibria. Two such refinement schemes, which we will occasionally refer to in the following chapters, are strong time consistency and (asymptotic) noise insensitivity [7]. We call a (saddle-point) solution "strongly time consistent," if it provides a solution to any truncated version of the original game, regardless of the values of the new initial states. 3 More precisely,
Definition 2.3: Strong time consistency.
From the original game
defined on the time interval [1, K], construct a new game on a shorter time interval [I, K], by setting J.t[O,l-l]
= a[O,l_l], 1I[0,l-1] = I1[O,l-l],
a(O,l-l]
and I1[O,l-l] are fixed but arbitrarily chosen. Let (J.t* E
NCLPS)
be a saddle-point solution for the original game.
"strongly time consistent," if the pair
(J.t[l,K] ' v(~,K])
where
MCLPS, V*
E
Then, it is
is a saddle-point so-
lution of the new game (on the interval [I, K]), regardless of the choices for a(O,l-l],I1(O,l-l],
and for every I, 2 ~ I ~ K.
Clearly, every solution generated by the recursion (2.12) satisfies (by construction) this additional .requirement, and is therefore. strongly time consistent. Because of this special feature, such a closed-loop solution is also called a feedback saddle-point solution. A feedback solution is not necessarily unique, unless every static game encountered in (2.12) admits a unique saddle-point solution. The second refinement scheme, "noise insensitivity," or its weaker version, "asymptotic noise insensitivity" , refers to the robustness of the saddlepoint solution to additive stochastic perturbations in the state dynamics, 2 All these equilibria lead necessarily to the same saddle-point value (in view of Theorem 2.5), but not necessarily to the same existence conditions - an important point which will become clear in subsequent chapters. 3 A weaker notion is that of weak time consistency, where the truncated game is assumed to have an initial state on the equilibrium trajectory of the original game [7]. Every saddle-point solution is necessarily weakly time consistent.
I
23
Static and Dynamic Games
modeled by zero-mean white noise processes. Such a "stochastically perturbed" game will have its state dynamics (2.7a) replaced by
where {O",k E [1,K]} is a sequence of n-dimensional independent zeromean random vectors, with probability distribution p8. Furthermore, the performance index of the perturbed game will be given by the expected value of (2.8), with the expectation taken with respect to p8. We are now in a position to introduce the notion of (asymptotic) noise insensitivity.
Definition 2.4: Noise insensitivity.
A saddle-point solution of the
unperturbed dynamic game under a given information pattern is noise insensitive if it also provides a saddle point for the stochastically perturbed game introduced above, for every (zero-mean, independent) probability distribution p8. It is asymptotically noise insensitive if it can be obtained as the limit of every saddle-point solution of the perturbed game, as p8 Ulllformly converges4 to the one-point measure concentrated at 0 = O.
Another appealing feature of the feedback saddle-point solution generated by Isaacs' equation (2.12) is that it is asymptotically noise insensi tive.
If the state dynamics are linear, and the objective function is quadratic in the three variables, as in (1.6a)-(1.7a), the feedback solution is in fact noise insensitive. This important property will be further elucidated on in the
I
next chapter. ~Here, the convergence is defined with respect to the weak topology on the underlying sample space (a Polish space); see, for example, [48]. pp. 25-29.
24
Chapter 2
2.3
Continuous-Time Dynamic Games
In the continuous-time formulation, (2.7a), (2.7b) and (2.8) are replaced, respectively, by
!x(t) =:
x=
f(tjx(t),u(t),w(t)),
y(t) = h(tjx(t), w(t)), L(u,w)
= q(x(tf)) +
1t/
t:2: 0
(2.I3a)
t:2: 0
(2.I3b)
g(tjx(t),u(t),w(t))dt,
(2.14)
where t f is the terminal time, and the initial state Xo is taken to be fixed (at this point) and known to both players. Here we introduce five types of -information structure, "listed below:5 (i) Closed-loop perfect state (CLPS):
u(t) = I-'(tj x(s), s
~
t),
t:2: O.
(2.15a)
Here we have to impose some additional (other than measurability) conditions on I-' so that the state differential equation (2.I3a) admits a unique solution. One such condition is to assume that
1-'0
is Lipschitz-continuous in x. We will not discuss these conditions in detail here, because it will turn out (see, Chapters 4 and 8) that they are not explicitly used in the derivation oflthe relevant saddle-point solution in the linear-quadratic game of real interest to us. Nevertheless, let us introduce the policy space
MCLPS
to denote the general
class of smooth closed-loop policies. (ii) Sampled-data perfect state (SDPS): Here we have a partitioning of the time interval [0, tf], as
where tic, k E {I, ... , K}, is a sampling point. The controller has access to the values of the state only at the past (and present, if any) 5 Again,
we list them here only with respect to Player 1.
25
SUllie and Dynamic Games
sampling points, i.e.,
u(t) = Jl(tj X(tk)' ... ' X(t 1), XO), (2.15b) tk~t 0 time units. Hence, admissible
controllers are of the form:
u(t)
t~8
Jl(tjX[O,t_8j),
o ~ t < 8,
Jl(tj xo),
(2.15c).
where Jl(.) is piecewise continuous in t, and Lipschitz-continuous in x. We denote the class of such control policies by Mw.
(iv) Open-loop (OL):
u(t)
= Jl(tj xo),
t
~ 0
(2.15d)
where Jl(.) is piecewise continuous in t and measurable in Xo. The corresponding policy space is denoted by
MOL.
(v) Closed-loop imperfect state (CLlS): The control is allowed to depend on the current as well as the entire past values of the measurement vector, i.e.,
u(t)
= Jl(tjy(s), s ~ t),
t.~ 0,
(2.15e)
where the dependence on the initial state, whose value is known, has been suppressed. Here again we have to impose some regularity and growth conditions on Jl
~o
that the state differential equation admits
a unique solution. We let MeLIs denote the set of all such (Borel measurable) policies. Sampled-data and delayed versions of this eLlS information pattern will also be used in this bookj but we postpone their introduction until Chapter 5.
26
CluJpter2
The normal form for a continuous-time dynamic (differential) game can be defined as in (2.10), with Definition 2.1 being equally valid here for a saddle point. The counterpart of the recursion (2.12) is now the following continuous-time Isaacs' equation, which is a generalization of the Hamilton-J acobi-Bellman partial differential equation that provides a sufficient condition in optimal control: -
aV(t; x ) . a = mm
t
max
uERm, wERm.
[aV(t; x) a J(t; x, tL, w) X
+ get; x, tL, w) ]
. [aV(t;x) = wERm. max mm a J (t;X,tL,w ) +g (t;X,tL,W )] uERm, X
'*( t;x )) = aV(t;x)!( ax t;x,p . ( t;x) ,II +g(t;x,p*(t;x),II*(t;x)) ; (2.16) The following is now the counterpart of Theorem 2.4, in the continuous time, whose proof can be found in ([17]; Chapter 8).
Theorem 2.6. Let there exist a continuously differentiable function Vet; x) on [0, t,] x IR n , satisfying the partial differential equation (2.16), and two policies p* E
MCLPS,
11* E
}fCLPS,
as constructed above. Then,
the pair (p*, II·) provides a saddle point for the (continuous-time) differential game formulated in this section, with the corresponding (saddle-point) value being given by V(O; xo).
0
The notion of a representation of a strategy in continuous time can be introduced exactly as in Definition 2.2. Furthermore, the notions of strong time consistency and (asymptotic) noise insensitivity introduced in the discrete time in Section 2.2 find natural counterparts here. For the former, we simply replace the discrete time interval [1, K] with the continuous interval [O,t,]:
I
27
Static and Dynamic Games
Definition 2.5: Strong time consistency.
From the original game
defined on the time interval [0, tf], construct a new game on a shorter time interval [f, tf], by setting fJ[O,l)
P[O,l)
= a[O,l),
Zl[o,l)
are fixed but arbitrarily chosen. Let (p" E
= fJ[O,l),
where
a[O,l)
and
MCLPS, II" E }/CLPS)
be a
saddle-point solution for the original game. Then, it is "strongly time consistent," if the pair (Pll,t}]' 1I[~,t}]) is a saddle-point solution of the new game (on the interval [f, t f]), regardless of the admissible choices for and for every f E (0, t f).
a[O,l), fi[O,l), 0
Every solution generated by Isaacs' equation (2.16) is strongly time consistent, and it is called (as in the discrete-time case) a feedback saddlepoint solution.
Noise insensitivity can also be defined as in Definition 2.4, with the stochastic perturbation now modeled as a zero-mean independentincrement process. For the linear-quadratic· differential game, it suffices to choose this as a standard Wiener process, with respect to which the feedback saddle-point solution is also noise insensitive. For details, ·the reader is referred to [[17], chapter 6].
J
Chapter 3 The Discrete-Time Minimax Design Problem With Perfect State Measurements
3.1
Introduction
In this chapter, we study the discrete-time minimax controller design problem, as formulated by (1.6)-(1.7), when the controller is allowed to have perfect access to the system state, either without or with one step delay. We first consider the case when the controlled output is a concatenation of the system state and the current value of the control, and the initial state is zero; that is, the system dynamics and the performance index are (without any loss of generality in this claSs):
K
L(u,w)
= IXK+ll~J + L
{IXkl~k + IUkn,
(3.2a)
k=l
Qf
~
0;
Qk
~
0, k
= 1,2, ...
Note that in terms of the notation of (1.6b), we have
For this class of problems, we study in Section 2 the boundedness of the upper value, and the existence and characterization of the saddle-point solution of the associated soft-constrained linear-quadratic game with ob-
J
29
Discrete-Time Problem with Perfect State Measurements
jective function (1.7b), which in this case is written as: K
L1'(u,w)
= \XK+l\~J + L
{\Xk\~.
+ \Uk\2 - ,2\Wk\2} ,
k=l
Qf
~
0;
Qk
~
(3.2b)
0, k = 1,2, ...
We will alternatively use the notation J1'(J.', w), for L1" as introduced in Section 1.3, when the control u is generated by a particular policy J.' E M, compatible with the given information pattern. We study this
soft-constrained dynamic game under three different information patterns; namely, open-loop, closed-loop perfect state (CLPS), and closed-loop onestep delay (CLD), which were introduced in Section 2.2. We then use, in Section 3.3, the solutions of the soft-constrained games to construct optimal (minimax) controllers for the disturbance attenuation problem, under the CLPS and CLD information patterns. We also study the saddle-point property of this optimal controller in the context of the original hard-constrained disturbance attenuation problem, with the disturbance taken as a random sequence. Section 3.4 develops the counterparts of these results for the time-invariant infinite-horizon problem, and Section 3.5 extends them to the more general class where the controlled output is given by (1.6b), and the initial state
Xl
is unknown. Section 3.6 discusses some extensions to
nonlinear systems, and the chapter concludes with Section 3.7 which provides a summary of the main results.
3.2
The Soft-Constrained Linear-Quadratic Dynamic Game
We study the soft-constrained linear-quadratic dynamic game described by (3.1) and (3.2b), under three different information patterns, where we take
J
the initial state
Xl
to be known, but not necessarily zero, as in Section 2.2.
30
Chapter 3
3.2.1
Open-loop information structure for both players.
The procedure to obtain the saddle-point solution of this game is first to fix the open-loop policy of Player 2, say W[l,Kj, and minimize L,(u, W[l,Kj)
given by (3.2b) with respect to
u
=
U[l,Kj.
From linear-quadratic optimal
control theory (see, e.g., [56]) it is well-known that the solution exists and is unique (since L, is strictly convex in u), and is characterized in terms of the standard discrete-time Riccati equation. Furthermore, the optimal u (to be denoted u[l,Kj
=
u*)
is linear in
£1(W[1,Kj,X1)'
Player 1, say
U[l,Kj,
W[l,Kj
and
Xl,
i.e., for some linear function £1(.),
Now, conversely, if we fix the open-loop policy of
and maximize J,( U[l,Kj, w) with respect to w =
the solution will again be linear, i.e.,
W[l,Kj
= £2(fi[1,Kj, Xl)'
W[l,Kj,
but existence
and uniqueness will depend on whether L, is strictly concave in w or not. This requirement translates into the following condition: Lemma 3.1. The quadratic objective functional L,(u, w) given by (3.2b),
and under the state equation (3.1), is strictly concave in w for every openloop policy u of Player 1, if, and only if,
(3.3a) where the sequence Sk+1, k E [1, K], is generated by the Riccati equation
(3.3b) o
Under condition (3.3a), the quadratic open-loop game becomes strictly convex-concave, and it follows from Theorem 2.3 1 that it admits a unique 1 Here, even though the sets U and W are not compact, it follows from the quadratic nature of the objective function that u and w can be restricted to closed and bOilllded (hence compact) subsets of finite-dimensional spaces, without affecting the saddle-point solution. Hence, Theorem 2.3 can ·be used.
.j'•
31
Discrete-TIme Problem with Perfect Stale Measurements
(pure-strategy) saddle point - - which has to be the unique fixed point of the linear mappings:
Certain manipulations, details of which can be found in ([17], p. 248), lead to the conclusion that this unique fixed point is characterized in terms of a matrix sequence generated by another (other than (3.3b)) discrete-time Riccati equation. The result is given in the following theorem:
Theorem 3.1. For the discrete-time linear-quadratic zero-sum dynamic game with open-loop information structure, let (3.3a) be satisfied and M"i
k E [1, K], be a sequence of matrices generated by
(3.4a) where
(3.4b) Then, (i) Ak, k E [1, KJ, are invertible. (ii) The game admits a unique saddle-point solution, given by
(3.5a) k E [l,K]
(3.5b)
where {Xk+l' k E [1, K]} is the corresponding state trajectory, generated by
(3.5c)
j
(iii) The saddle-point value of the game is
L, *- L, U (*, W *) --Xl'MlXl·
(3.6)
Chapter 3
32
(iv) If, however, the matrix in (3.3a) has at least one negative eigenvalue, then the upper value becomes unbounded.
Before closing this subsection, we note that if the matrix sequence generated by (3.4a)-(3.4b) is invertible, then (3.4a) can be written in the more appealing (symmetric) form
A sufficient condition for invertibility (and hence positive definiteness) of M"" rank
3.2.2
k E [1, K], is that (A~ Hk)"
(A~
Hk)
k E [1, K] be injective, that is
= n, k E [1, K], and Qf be positive definite.
Closed-loop perfect state information for both players.
When players have access to closed-loop state information (with memory), then, as discussed in Section 2.2, we cannot expect the saddle-point solution to be unique. There will in general exist a multiplicity of saddlepoint equilibria - - each one leading to the same value (see Theorem 2.5), but not necessarily requiring the same existence conditions. To ensure unicity of equilibrium, one can bring in one of the refinement schemes introduced in Section 2.2, but more importantly, because of the intended application of this result, we have to make sure that the solution arrived at requires the least stringent condition on the positive parameter /, among all saddle-point solutions to the problem. Both these considerations in fact lead to the same (feedback saddle-point 2 ) solution, which is given in the following theorem.
Theorem 3.2. For the two-person zero-sum dynamic game with c1osed-
loop perfect state information pattern, and with a fixed / > 0, 2For this terminology, see Section 2.2.
33
Discrete-Time Problem with Perfect State Measurements
(i) There exists a unique feedback saddle-point solution if, and only if,
where the sequence of nonnegative definite matrices Mk+l, k E [1, K], is generated by (3.4a). (ii) Under condition (3.7), the matrices Ak, k E K, are invertible, and the unique feedback saddle-point policies are (3.8) Wk*
= Vk*() Xk = I -2D'M 10 10+1 A10 A kXk, 1
k E [1, K],
(3.9)
with the corresponding unique state trajectory generated by the difference equation (3.10)
and the saddle-point value is the same as (3.6), that is (3.11)
(iii) If the matrix 310 in (3.7) has a negative eigenvalue for some k E [1, K], then the game does not admit a saddle point under any information structure, and its upper value becomes unbounded. Proof.
Parts (i) and (ii) of the Theorem follow by showing that the
given saddle-point policies uniquely solve the Isaacs' equation (2.12) for this linear-quadratic game, with the corresponding value function being
Vk(X)
= x'Mkx ,
kE[l,K].
For each k, existence of a unique saddle point to the corresponding static game in (2.12) is guaranteed by the positive definiteness of 310, and if there is some
k E [1, K] such that
3~ has a negative eigenvalue, then the corre-
sponding static game in the sequence does not admit a saddle point, and
Chapter 3
34
being a quadratic game this implies that the upper value of the game (which then is different from the lower value) is unbounded. IfBk has a zero (but no negative) eigenvalue, then whether the corresponding game admits a saddle point or not depends on the precise value of the initial state and in particular if
Xl
= 0 one can allow
Bk
Xl,
to have-a. zero eigenvalue and
still preserve the existence of a saddle point. Since the "zero-eigenvalue case" can be recovered as the limit of the ''positive-eigenvalue case", we will henceforth not address the former. Some matrix manipulations, details of which can be found in ([17], pp. 247-257), show that condition (3.7) actually implies the nonsingularity of AI.. We now prove part (iii), which is the necessity part of the result. Contrary to the statement of the Theorem, suppose that there exists a policy for Player 1, say bounded for all
{PI.
E
W[I,Kj,
Ml;,
k E [1, K]}, under which the cost function J"( is
even though
Bk
has a negative eigenvalue for some
k E [1, K]; clearly, this policy cannot be the feedback policy (3.8). Let k be the largest integer in [1, K] for which Bk +l
> O.
Bk
has a negative eigenvalue, and
Furthermore, let a policy for Player 2 be chosen as for k < k for k = k for k > k
where
wli
is (at this point) an arbitrary element of 1R m" and
II;
is as
defined by (3.9). Denote the state trajectory corresponding to the pair (P[I,Kj, 1I[1,Kj) by X[I,Kj, and the corresponding open-loop values of the two
policies by the two players by notation
U[I,Kj
] 0 (sufficiently small, so that 1'2+<
'Y., whlch (intuitively) is generally the case, due to loss of infonnation to the controller under delayed state infonnation.
53
Discrete-TI/7Ie Problem with Perfect Stale Measurements
Then, for the discrete-time disturbance attenuation probJem with onestep deJayed state information,
(3.49)
Remark 3.4. If the controller were allowed to depend only on Xk-l at stage k, and not also on
X[1,k-2j,
then the only representation of the feedback
controller (3.8) in this class would be
=
B~M2Al1 A1Xl,
k
(3.50)
=1
which however requires more stringent conditions on , than the controller (3.48a). If the resulting performance level is acceptable, however, the advantage here is that the controller is static, while (3.48a) is dynamic (of the same order as that of the state). Illustrative Example (continued) Continuing with the illustrative example of Section 3.3.2, we now study -~
the degradation in performance due to a (one step) delay in the use of the state information,
a la Theorem 3.6.
Taking
J{
= 4, we compute the
closed-loop performance bound (from Theorem 3.2) as
,. = 1.4079. For the one-step delay information case, the corresponding performance (as given by (3.47» is
,0 = 2.2083,
whereas the performance unde~ the simpler (no-memory) controller (3.50) 1S
,nm
= 2.5135.
Two other possible constructions for the compensator (3.48b) are
56
Chapter 3
Proof.
The proof is similar to that of monotonicity of the solution of Ric-
cati equation in linear-quadratic control, but now we use the saddle-point value function (of the soft-constrained game) rather than the dynamic programming value function. Toward this end, we first write down the Isaacs' equation (2.12), associated with the linear-quadratic "feedback" game of Section 3.2.2:
Here Vk(x) is the saddle-point value of the dynamic game of Section 3.2.2, with only K - k + 1 stages (i.e., starting at stage k with state x and running through K). We know from Theorem 3.2 that 9 Vk( x) =
Ixlit.,
for every x E JR n . Now, since for two functionals g and
I,
and
g(u,w)2:/(u,w):::}inf sup g(u,w)2:inf sup/(u,w), U
w
w
U
it follows from (*) that the matrix inequality
MHI
2: Mk+2 implies the
inequality Mk 2: Mk+l' Hence, the proof of the lemma will be completed (by induction) if we can show that MK 2: MK+l under condition (3.5) with k
=Q.
Since we know that
= K,
QAI/ = Q[! + (BB' -
,"(-2 DD')Q]-l
2: 0,
it follows that
thus completing the proof.
o
The next lemma provides a set of conditions under which the sequence {Mk,K+lH=K+l is bounded, for every K
> O. Here we use a double index
9This fact has already been used in the proof of Theorem 3.2.
57
Discrete-TIT1U! Problem with Perfect StaJe Measurements
on M to indicate explicitly the dependence of the sequence on the terminal (starting) time point K. Of course, for the time-invariant problem, the elements of the sequence will depend only on the difference, K - k, and not on k and K separately.
Lel1lllla 3.4. Let 1
> 0 be fixed, and AI be a positive definite solution of
the ARE (3.52b), which also satisfies the condition (3.53). Then, for every
K> 0,
AI where
Mk,K
Proof.
Mk,K+l
2: 0,
for all k ~ K
+1
is generated by (3.4a).
First note that (3.53) implies nonsingularity of
hypothesis
(3.54)
AI > 0, it follows that AlA_l > O.
A,
and since by
If this property is used in
(3.52b), we immediately have Nt 2: Q, which proves (3.54) for k We now show that the validity of (3.54) for k k. Toward this end let us first assume that
+ 1 implies
Mk+l,K+l
= J{ + 1.
its validity for
> 0, and note from
(3.4a) and (3.52b), that
.•
t
which is nonnegative definite since Nt 2:
by the hypothesis of
Mk+l,K+l
the inductive argument. By the continuous dependence of the eigenvalues of a matrix on its elements the result holds also for
Mk+l,K+l
2: 0, since
we can choose a matrix N((), and a sufficiently small positive parameter (0
such that 0
< N(()
~
Nt, 0 < (
= x' H' H x + x' A' M A
-1 Ax
=0
Hx=OandM!Ax=O
where M~ is the unique nonnegative definite square-root of M, and the last result follows because both terms on the right are nonnegative definite quadratic terms, and M A-1 can equivalently be written as
Ii U I!
III:
~
Ii ~
~ I!
Next, multiply the ARE from left and right by x' A' and Ax, respectively, to arrive at
The left-hand side is zero by the earlier result, and hence
Now multiply the ARE from left and right by X'(A2)' and A 2x, respectively,
~
and continue in the manner above to finally arrive at the relation
II
x' (HI,AIHI, (A2)' H', ... , (An-1)'
II
H') = 0
59
Discrete-Time Problem with Perfect State Measurements
which holds, under the observability assumption, only if x
= O.
Hence
M> O.
o
An important consequence of the above result, also in view of Lemmas 3.3 and 3.4, is that if there exist mUltiple positive-definite solutions to the ARE (3.52b), satisfying (3.53), there is a minimal such solution (minimal in the sense of matrix partial ordering), say M+, and that lim Mk
K-+oo
K+l
= M+ > 0,
I
whenever (A, C) is an observable pair. Clearly, this minimal solution will also determine the value of the infinite-horizon game, in the sense that
(3.55)
where
J; is (3.2b) with K = 00.
The following lemma now says that the existence of a positive definite solution to the ARE (3.52b), satisfying (3.53), is not only sufficient, but also necessary for the value of the game to be bounded, again under the observability condition.
Lemma 3.6. Let (A, Q1/2) be observable. Then, if the ARE (3.52b) does
not admit a positive definite solution satisfying (3.53), the upper value of the game {J;;MCLPS),1{~} is unbounded.
Proof.
Since the limit point of the monotonic sequence of nonnegative
definite matrices {MI;,K+d has to constitute a solution to (3.52b), nonexistence of a positive definite solution to (3.52b) satisfying (3.53), (which also implies nonexistence of a nonnegative definite solution, in view of Lemma 3.5) implies that for each fixed k, the sequence {MI;,K+dK>O is unbounded. This means that given any (sufficiently large) a > 0, there
60
Chapter 3
exists a K
> 0,
and an initial state
E IR n, such that the value of the K-
Xl
stage game exceeds alxll2. Now choose
>
x1M1,K+1X1
Wi
K. Then,
> alxl1 2
which shows that the upper value can be made arbitrarily large. In the above, lI[l,KJ in the first inequality is the feedback saddle-point controller for Player 2 in the K-stage game, and the second inequality follows because the summation from K
+ 1 to 00 is nonnegative and hence the quantity is
bounded from below by the value of the K-stage game.
o
Lemmas 3.3-3.6 can now be used to arrive at the following counterpart of Theorem 3.2 in the infinite-horizon case.
Theorem 3.7. Consider the infinite-horizon discrete-time linear-quadratic
soft-constrained dynamic game, with I> 0 fixed and (A, Q1/2) constituting an observable pair. Then,
(i) The game has equal upper and lower values if, and only if, the ARE (3.52b) admits a positive definite solution satisfying (3.53).
(ii) If the ARE admits a positive definite solution, satisfying (3.53), then it admits a minimal such solu-tion, to be denoted M+(/)' Then, the finite value of the game is (3.55). (iii) The upper (minimax) value of tbe game is finite if, and only if, the upper and lower values are equal. (iv) If M+(/)
> 0 exists,
as given above, the controIler Ji[l,DO) given by
(3.51a), with M(/) replaced by M+(/), attains tbe finite upper value,
I
61
Discrete-Time Problem with Perfect State Measurements
in the sense that (3.56) and the maximizing feedback solution above is given by (5.1b), again with M replaced by M+. (v) Whenever the upper value is bounded, the feedback matrix (3.57a) is Hurwitz, that is it has all its eigenvalues inside the unit circle. This implies that the linear system (3.57b) is bounded input-bounded output stable.
Proof.
Parts (i)-(iii) follow from the sequence of Lemmas 3.3-3.6, as also
discussed prior to the statement of the theorem. To prove part (iv), we first note that the optimization problem in (3.56) is the maximization of 00
2: {IXk+ll~ +IB'M+A+-
k=l over
W[l,oo),
1
AXkl2 -i2IwkI2} + IXll~
subject to the state equation constraint (3.57b). First consider
the truncated (K-stage) version: K
max W[1,K)
0, which is (3.53) and is satisfied by hypothesis. Hence recursively we solve identical optimization problems at each stage, leading to K
max
L {IXkHlb + IB'M+ X+-' AXkl2 -
,2IwkI2} ~
xi M+Xl
W["K! k:l
where the bound is independent of I
"Y 0L , the RDE (4.5) does not have a conjugate point on the
interval [0, t J]. (ii) For'Y
> .y0L, the game
admits a unique saddle-point solution, given
by
u*(t) = JL*(t;xo) = -B'(t)Z(t)x*(t) w*(t)
1 = 1I*(t; xo) = 2"D'(t)Z(t)x*(t),
'Y
(4.6a)
t 2: 0
(4.6b)
where x[O,t/l is the corresponding state trajectory, generated by
x* (iii) For'Y >
= (A -
.y0L,
(BB' - -;DD')Z(t»x*; 'Y
.y0L,
= Xo.
(4.7) .
the saddle-point value of the game is
L; == L..,(u*, w*) = (iv) If 'Y
= l+tJ-t'
O. Hence, there is no conjugate point for (4.5)
on the interval (0,00). For (4.3), however, a finite solution exists .only if
tJ < 1, in which case 1
S(t) = l+t-tJ'
O~t~tJ 0, there exists a continuously differentiable matrix function Z(·5 solving (4.5) for all t E (O,t,]. Now, as in (4.4), let us introduce i'CL :=
infh > 0: The RDE (4.5) does not have (4.10) a conjugate point on [O,t,]).
Hence, the equivalent representation (4.9) for the game performance index is valid whenever , >
i'CL.
Since this representation for L, completely
decouples the controls of the players, it readily leads to the feedback saddlepoint solution, by minimization and maximization of the second and third terms separately. This observation, together with the results presented in Appendix A, leads to the following theorem:
Theorem 4.2. For the linear-quadratic zero-sum differential game with
closed-loop information structure, defined on the time interval [0, t,], let the parameter
(i)" If, >
.:yCL
i'CL,
be as defined by (4.10). Then:
the differential game admits a unique feedback saddle-point
solution, which is given by
p*(t; x(t)) 1I*(t;X(t)) =
= -B(t)' Z(t)x(t)
,~D(t)'Z(t)x(t),
t 2:
(4.11a)
°
(4.l1b)
79
Continuous-7i17l£ Systems with Perfect State Measure17l£nts
where Z(.) is the unique solution of (4.5). Furthermore, the saddlepoint value is given by (4.8), and the corresponding state trajectory is generated by (4.7). (ii) If,
teL. This solution, however, is neither strongly time consistent, nor
noise insensitive. Two other such constructions, which follow similar lines as above, are given below .in Sections 4.2.3 and 4.2.4, in the contexts of sampled-data and delayed state information patterns. 4.2.3
Sampled-data information for both players.
We now endow both players with sampled-data perfect state (SDPS) information of the type introduced in Section 2.3. Note that the admissible controls for the players are of the form
u(t)
= pet; x(h), ... , x(tt), xo),
tk
wet)
= lI(t;X(tk)' ... ' X(tl), x~),
tk ~ t < tk+l, k = 0, I, ... , K, (4.13a)
~
t < tk+l, k = 0, I, ... , K, (4.13a)
where {O, tl, t 2, ... , tK} is an increasing sequence of sampling times. In view of part (ii) of Theorem 4.2, together with Theorem 2.5, every sarnpleddata solution to the differential game has to be a representation of the feedback saddle-point solution (4.11) on the trajectory (4.7). One such representation, which is in fact the only one that is noise insensitive, and depends only on the most recent sampled value of the state, is
pSD (t; X(tk»
= -B(t)' Z(t)<J>(t, tk)X(t k ),
tk ~ t < h+l'
(4.14a)
1
tk ~ t < tk+l, (4.14b) 'Y where is the state transition function associated with the matrix lISD(t;X(tk» = 2D(t)'Z(t)(t,tk)X(tk),
F(t)
:=
A(t) - [B(t)B(t)' - -;D(t)D(t)']Z(t) 'Y
(4.14c)
81
Continuous-Time Systems with Perfect State Measurements
and Z(-) is the solution to (4.5). Clearly, for (4.14a)-(4.14b) to constitute a saddle-point solution, first we have to require that (4.5) has no conjugate points in the given interval, and in addition we have to make sure that the function J..,(jlSD, w) is concave in w E 1iw' The corresponding condition is given in Theorem 4.3 below, which also says that the solution (4.14a)(4.14b) in fact requires the least stringent conditions, besides being strongly time consistent and noise insensitive. Before presenting the theorem, let us introduce K Riccati differential equations, similar to (4.3) but with different boundary conditions:'
Sk
+ A'Sk + SkA + Q + ,-2Sk DD'Sk = OJ (4.15)
Sk(tk+d = Z(tk+l)j
tk ~ t
< tk+l,
k = K, K - 1, ... O.
Furthermore, let
i'SD
:= infh
> i'CL : The K RDE's (4.15) do not have conjugate
points on the respective sampling intervals}, ( 4.16) where i'CL was defined by (4.10). Then we have:
Theorem 4.3. Consider the linear-quadratic zero-sum differential game (4.1)-(4.2a), with sampled-data state information (4.4).
(i) If, > i'SD, it admits a unique strongly time-consistent and nozseinsensitive saddle-point solution, which is given by the policies
(4. 14a)-(4. 14b). The saddle-point value is again given by (4.8).
(ii) If, < i'SD, the differential game with the given sampling times does not admit a saddle point, and it has unbounded upper value. Proof.
We should first remark that the set of conjugate point conditions
on the Riccati differential equations (4.15), which was used in defining i'~D,
l
I,
82
;
in w E llw, which can be shown using Lemma 4.1. But simply showing
!
l
Chapter 4
is precisely the condition for strict concavity of the function }-y(J-LSD, w)
this would not prove the theorem, since the policy (4.16a) was just one particular representation of the feedback saddle-point policy (4.lla). A more complete proof turns out to be one that follows basically the steps of the proof of Theorem 3.2, with the only difference being that in the present case the static games in between measurement points are continuous-time open-loop games (ofthe type covered by Theorem 4.1), while in the discretetime game of Theorem 3.2 they were games defined on Euclidean spaces. Accordingly, under the sampled-data measurement scheme, we have the dynamic programming recursion (Isaacs' equation):
where, on the time interval [tb tl:+ 1 ),
and
This dynamic programming recursion involves the solution of a sequence of (nested) open-loop linear-quadratic zero-sum differential games, each one having a bounded upper value if, and only if, (from Theorem 4.1) the Riccati differential equation (4.15) does not have a conjugate point on the corresponding subinterval. The boundary values are determined using Theorem 2.5, that is the fact that open-loop and closed-loop feedback saddlepoint policies generate the same trajectory whenever both saddle points exist. It is also because of this reason that the saddle-point policies have the structure (4.15), which are also the unique strongly time-consistent and noise-insensitive saddle-point policies since they are derived using dynamic
83
Continuous-Time Systems with Perfect Stale Measurements
programming (see the discussion in Section 2.3). This completes the proof of part (i). Part (ii) follows from a reasoning identical to the one used in the proof of Theorem 3.2 (iii), under the observation (using Theorem 4.1) that if the upper value of anyone of the open-loop games encountered in the derivation above is unbounded, then the upper value of the original game will also be unbounded with sampled-data measurements.
Remark 4.1. In construction of the sample-data saddle-point strategies (4.15), we need the solution of a single n-dimensional RDE (4.5). However, to determine the existence of a saddle point we need, in addition, the solutions (or the conjugate point conditions) of J{ n-dimensional RDE's (4.15), . one for each sampling subinterval. These conditions would still be the prevailing ones if Player 2 had access to full (continuous) state information, and the pair of policies (4.15) would still be in saddle-point equilibrium under this extended (closed-loop) information for Player 2. However, this solution will not retain- its noise-insensitivity property. Under the CLPS information pattern for Player 2 (and still with SDPS pattern for Player 1), a noiseinsensitive saddle-point policy for Player 2 will depend on the current value of the state, as well as the most recent sampled state measurements. For the construction of such a policy, which will again exist whenever'Y we refer the reader to [13].
Remark 4.2. Since
MSDPS
> t SD ,
C
MCLPS,
it should be intuitively obvious
that if the upper value of the game is unbounded under continuous state measurements, it should also be unbounded under sampled state measurements, but not necessarily vice versa because sampling continuous state measurements generally leads to a loss of information for Player 1. This intuition is of course also reflected in the definition of the parameter value .ySD,
in (4.16). An important consequence of this reasoning is that, because
of the loss of information due to sampling, for "reasonable" problems 'fie
i
I
84
Chapter 4
might expect to see the strict inequality2:
f f
( 4.17)
Derivation of the precise conditions under which this would be true for the time-varying problem is quite a challenging task, which we will not pursue here.
I
Delayed state measurements
4.2.4
Here we consider the linear-quadratic differential game where both players have access to state with a delay of (J
> 0 time units. The players' admissible
control policies will hence be of the form:
u(t)
=
J-l(t, X[O,t-8j), t ~
(J
(4.18)
J-l(t, xo),
o~ t < (J.
for Player 1, and a similar structure for Player 2. We denote the class of such controllers for Players 1 and 2 by M8D and .N8D , respectively. Since
Men
c
MCLPS,
we again know from Theorem 4.2 that for 'Y
< t CL
the
upper value of this differential game (with delayed measurements) will be unbounded, and for 'Y
> .:yCL its saddle-point solution is necessarily a repre-
sentation of the feedback solution (4.15), provided that it exists. Motivated by the argument that led to Theorem 3.4 in Chapter 3 for the discrete-time delayed-measurement game, we can now write down the following representation of (4.15) in Men x .NOD as a candidate saddle-point solution, which is clearly noise insensitive:
u(t) = J-l°(t,1J(t» = -B(t)'Z(t)1J(t), w(t)
1 = 1I0(t,1J(t» = 2D(t)'Z(t)1J(t), 'Y
O:S t:S tj, O:S t:S tj,
(4.19a) (4.19b)
2Note that if the game is complet~ly decoupled in terms of the effects of the controls of Player 1 and Player 2 on the state vector as well as the cost function (such as matrices A and Q being block-diagonal and the range spaces of BB' and DD' having empty intersection), then we would have an equality in (4.17); this, however, is not a problem of real interest to us, especially in the context of the disturbance attenuation problem to be studied in the next section.
85
ContinUous-Time Systems with Perfect State Measurements
where 1)(.) is generated by the (infinite-dimensional) compensator
1)(t) = 1/;(t, t - B)x(t - B)
_it
1/;(t, s)
t-8
. [B(S)B(S)' -
(4.20)
~2D(S)D(S)']
1)( s) = 0,
0~
S
Z(s)1)(s)ds,
< r,
with 1/;(-,.) being the state transition matrix associated with A(} To obtain the conditions under which this is indeed a saddle-point solution, we have to substitute (4.19a) into L..." and require the resulting quadratic function (in w) to be strictly concave. The underlying problem is a quadratic maximization problem in the Hilbert space 1iw, and is fairly complex because the state dynamics are described by a delay-differential equation. However, an alternative argument that essentially uses the property given in Theorem 2.5, yields the required concavity condition quite readily: If the problem admits a saddle-point solution, at any time r the value of the game for the remaining future portion will be x(r)'Z(r)x(r), which follows from Theorem 4.2 and Theorem 2.5. In the interval [r - B, r], the choice of J.'[r-9,r] will not show any dependence on
W[r-8,r],
and hence the game will re-
quire essentially the same existence conditions as the open-loop game of Theorem 4.1, only defined on a shorter interval. Hence, for each r 2: B, we will have a RDE of the type (4.3), defined on
[r-B,r):
(4.21 ) This is a continuum family of Riccati differential equations, each one indexed by the time point r 2: B. Associated with this family, we now intl'o-
86
Chapter 4
duce the counterpart of (4.16): teD := inf{-y
> t CL : For every
T 2: 0, the RDE (4.21) does not have a
conjugate point on the corresponding interval}. (4.22) It now follows from the discussion above that the delayed-measurement
game cannot have a bounded upper value if I < teD. Hence, we arrive at the following counterpart of Theorem 4.3 under delayed measurements.
Theorem 4.4. Consider the linear-quadratic zero-sum differential game (4.1)-(4.2a), with delayed state m~asurements (4.18). Let the scalar teD be defined by (4.22). Then:
(i) If I > teD, the game admits a unique noise-insensitive saddle-point solution, which is given by the policies (4.19a)-(4.19b). The saddlepoint value is again given by (4.8). (ii) If 'Y < t ev , the differential game does not admit a saddle point and has unbounded upper value.
4.3
The Disturbance Attenuation Problem
We now return to the minimax design problem formulated in Chapter 1, by (1.9)-(1.11), where we take H'G = 0, H'H = Q, G'G = I, C = I,
E
= °and Xo = 0, or equivalently the problem formulated in Section 4.1
by (4.1)-(4.2a), with the W>O norm to be minimized given by (1.2b). This is the disturbance attenuation problem with closed-loop perfect state information (2.15a), where the controller space is denoted by MCL. We will also consider the sampled-data state information (2.15b), with the resulting controller belonging to Ms D , and the delayed state information (2.15c), where the controller belongs to Mev. In all cases, the solutions to the disturbance
87
Continuous-TIme Systems with Perfect State Measurements
attenuation problem (to be presented below) follow from the solutions of the corresponding soft-constrained games studied in the previous section - a relationship which has already been expounded on in general terms in Section 1.2, and in the context of the discrete-time problem in Section 3.3. 4.3.1
Closed-loop perfect state information
The associated soft-constrained game is the one studied in Section 4.2.2. Let :yCL be as defined by (4.10). For every / > :yCL, we know (from Theorem 4.2) that the soft-constrained game admits a saddle-point solution, with the minimizing control given by.( 4.11a). Let us rewrite this feedback
1-';, to indicate explicitly its dependence on the parameter /.
controller as
Unlike the case of the discrete-time problem (d. Theorem 3.5) the limit of
1-';
as /
1 :yCL
may not be well-defined (because of the existence of a
conjugate point to (4.5) on the interval [O,t,]), and hence we have to be content with suboptimal solutions. Toward a characterization of one such solution, let
f
> 0 be sufficiently small, and
/f := :yCL
+ f.
Then, it follows
from the saddle-point value (4.8), by taking Xo = 0, that
This implies, in view of part (ii) of Theorem 4.2, that
where the latter is the optimum (minimax) disturbance attenuation level (1.2a) under closed-loop state information. Hence we arrive at the following theorem: Theorem 4.5. For the continllous-time finite-horizon disturbance attenu-
ation problem with closed-loop perfect state information (and with Xo
= 0):
(i) The minimax attenuation level is equal to .yCL, that is (4.23a)
88
Chapter 4
where the bound is given by (4.10). (ii) Given any
f
> 0, ~CL
sup J-ysD(fl, w) < 0, wErt",
which says that the upper value of the soft-constrained game (under sampled state measurements) is bounded, in spite of the fact that I < .:yso. This contradicts with statement (ii) of Theorem 4.3, thus completing the proof. We are now in a position to state the following theorem, which follows readily from Theorem 4.3, in view of (4.25):
90
Chapter 4
Theorem 4.6. For the finite-horizon disturbance attenuation problem
with sampled state information (and Xo = 0): (i) The minimax attenuation level ,SD is equal to
(ii) Given any
£
>
0, and a corresponding
9
1, which indeed turns out to be the case because it can be shown that for,
< 1, regardless of the value of .A E (0,1), either i) or ii) is
violated. At this point we can raise the question of the "optimal choice" of the sampling time parameter .A, so that ,so is minimized. Because of the monotonicity property of the two conditions above, the optimum value of
.A is one under which there exists a , which makes both i) and ii) equalities (simultaneously). Some manipulations bring this condition down to one of existence of a .A E (0, 1) to the trigonometric equation tan [
(2.A-1) .A
'"27r] =
1 4.A2 _ 1 tanh
[7r"2 V4.A2 - 1]
which admits the unique solution '
.A
~
0.6765.
The corresponding minimax attenuation level is ,so
= 2.A ~ 1.353.
Note that the above also shows that in the general disturbance attenuation problem if the choice of sampling times is also part of the design, under the constraint of a fixed total number of sampled measurements in the given time interval, uniform sampling (i.e., tk+l - tk = constant) will not necessarily be optimal. We will later see that for the time-invariant infinite-horizon problem uniform sampling is indeed optimal. 4.3.4
Delayed state measurements
Consider now the case when the controller can access the state with a delay of 0 > 0 time units. The soft-constrained game here is the one discussed in Section 4.2.4, whose relevant saddle-point solution was given in Theorem 4.4. An argument quite identical to the one used in the previous section, applied now to Theorem 4.4, readily leads to the following result.
93
Continuous-Time Systems with Perfect State Measurements
Theorem 4.7. Consider the con tinuous-time disturbance atten uation
problem with 8-delayed state information, of the type given by (4.18). Let the scalar -:reD be defined by (4.22). Then: (i) The minimax attenuation level is equal to -:reD, that is
, o := (ii) For any
f
f
•
'T'
III ~ 11J IJEM8D
»=,
~8D
(4.27a)
> 0, (4.27b)
where J.L~~ is given by (4. 19a), with, =,~. (iii) If,o >
,*,
where the latter was given in Theorem 4.5, the limiting'
controJIer o J.L~o
,
= l' 0 is replaced by the observability of the pair (A, H).
Chapter 4
98
Now, toward establishing (v), consider the finite-horizon truncation of the maximization problem (4.32):
subject to :i: = (A - BB' Z+)x
+ Dw == Ax + Dw.
Since Z+ > 0, by hypothesis, this quantity is bounded from above by
where
Q := Q + Z+ B B' Z+ . In view of Lemma 4.1 and Theorem 8.2 of Appendix A, this maximization problem admits a unique solution provided that the following RDE does not have a conjugate point on [0, tf]:
which, however, admits the solution
s= Z+. Furthermore, the maximum value is given by
-,.17+ Xo, xoz, and the maximizing solution is (4.28b) with Z = Z+. Since the truncated maximization problem is uniformly bounded from above, with the bound being the value (4.31), the result stands proven.
o
b) Sampled-data perfect state information:
Now, given that limtr-+oo Z(tj t f ) = Z+, where Z+ exists as the minimal positive definite solution of the ARE (4.29), it is fairly straightforward to
99
Continuous-Time Systems with Perfect SUlle Measurements
obtain the counterpart of Theorem 4.8 under sampled-data measurements. In view of Theorem 4.3, we have, as candidate solutions, (4.34a) (4.34b)
where q> is the state transition function associated with the matrix
F :=
A- (B B' - ;2 DD') Z+.
-I
(4.34c)
In between two sampling points, still the Riccati differential equations (4.15) will have to be used, for existence considerations, with the boundary conditions now being identical: S(tk)
= Z+.
Being time invariant, the·
Riccati differential equations themselves are also identical, apart from the length of the time interval on which they are defined. Since these Riccati equations are used only for the determination of existence, the conjugatepoint condition of the one defined on the longest time interval will be the determining factor -
~
all others will play no role in the solution. Hence,
letting
t.:=SUp{tk+l-t",
k=O,l, ... },
(4.35)
10
the relevant RDE will be .
1
S+A'S+SA+Q+2SDD'S=0; 'Y
S(t.)=2+
(4.36)
which should not have a conjugate point in the interval [O,t.]. It is now convenient to introduce here the counterpart of (4.16): 7~ := inf{!
>
°:The ARE (4.29) admits a positive definite solution,
and the RDE (4.36) does not have a conjugate point on the interval [0, t.]}. (4.37) This brings us to the following theorem.
100
Chapter 4
4.9. Consider
Theorem
the infinite-horizon
quadratic differential game, where QJ with H' H
the pair (A, H) is observable,
= Q, and the information structure is sampled-data. >
(i) If I
= 0,
time-invariant linear-
Then,
9~, the game has equal upper and lower values, with the
common value being (4.31a). (ii) Ifl
> 9;?, the controller J1.~ given
by (4.34a) attains the finite value,
that is
< 9;?, the upper value 9f the differential game is unbounded.
(iii) If I (iv) If I
> 9!D, the matrix Q is positive definite, and the controller
(4.34a) is used for u, the res[Jiting (hybrid) closed-loop system becomes bounded input-bounded output stable (with w considered to be the input). Proof.
Parts (i) and (iii) follow from Theorem 4.8 and the discussion
preceding the statement of this theorem (see also Theorem 4.3). To prove part (iii), we use the line of reasoning employed in the proof of Theorem 4.8 (v). Note that the optimization problem in (ii) above is the maximization of the performance index lim sup Kloo
over
W[O,oo),
subject to the state equation constraint
x = Ax + Dw -
BE' Z+(t, tk)X(tk)'
tk S t
9,;,0, the maximum above exists, and the
maximizing w is given by
Toward proving this claim, it is convenient to introduce a zero-sum softconstrained differential game of the type defined by (4.1) and (4.2b), but with the interval being [tK -1, tK). In this game, we take Players 1 and 2 to have open-loop information (that is they only have access to the value of X(tK_1)). This is consistent with the nature of the information pattern for Player 1, and for Player 2 we know that the· predse nature of his information pat tern is irrelevant since (as we shall shortly see) the upper value of the game is bounded whenever 'Y
> 9,;,0. Now, in view of Theo-
rem 4.1, a unique saddle-point solution to this open-loop differential game exists and is given by (4.6a)-( 4.6b), where both (4.3) and (4.5) have as terminal (boundary) conditions Z+, replacing Qf. With these boundary conditions, the RDE (4.3) is precisely (4.36) (on the appropriate sampling interval), and the solution of (4.5) is Z+ for all t E [tK-1,tK]. Furthermore, the saddle-point strategies (4.6a) and (4.6b) are exactly (4.34a) and (4.34b), respectively. Hence, the game has a value, given by (in view of
(4.8)) x(tK-d Z+X(tK_1), which directly implies that max {IX(tK )I~+ W[O,'KJ
K-1
+ 2: Fk(x, w)} k=O
K-2
=
max W[O"K_IJ
{IX(tK-dl~+
+ 2: Fk(x,w)}. k=O
102
Chopter4
On every sampling interval, we now have identical problems, and hence (inductively) the maximum above is equal to X~Z+Xo ,
which provides a uniform (upper) bound for the finite-horizon maximization problem (*). Since we already know that this value is the limiting value of the sequence of finite-horizon games, it follows that the control J.L~ attains it, and the steady state controller (4.34b), or even (4.28b) (if disturbance is allowed to depend on the current value of the state), maximizes J:;"(J.L~, w). Finally, part (iv) follows from the boundedness of the upper value, which is x~Z+ Xo. If the hybrid system were not stable, then the upper value could be driven to
+00 by choosing
w
== 0, since Q > O.
c) Delayed perfect state measurements: The infinite-horizon counterpart of Theorem 4.4 can be obtained by following arguments quite similar to those that led to Theorem 4.9. The counterpart of (4.36) here is the time-invariant version of the RDE (4.21):
S· 8+ A 'S8+ S 8A +Q+ 21 S8 DD , S8 'Y
= 0;
which is now a single RDE, identical to (4.36), and defined on the time interval [0, eJ. For boundedness of an upper value to the differential game, we now have to require, as in the sampled-data case, (in addition to the condition of Theorem 4.8) that the RDE (4.38) have no conjugate point on the given interval. To make this condition more precise, we introduce the following counterpart of (4.37):
1:;; := inf {'Y > 0 : ARE (4.29) admits a positive definite solution, and RDE (4.38) does not have a conjugate point on the interval [0, eJ}. (4.39) Then, the following theorem is a replica of Theorem 4.9, for the delayed information case.
103
Continuous-Time Systems with Perfect State Measurements
Theorem
4.10. Consider the infinite-horizon time-invariant linear-
quadratic differential game, where QJ = 0, the pair (A, H) is observable, with H' H = Q, and the information structure is delayed-state. Let 9~ be defined as in (4.39). Then, (i) If 'Y
>
9~, the game has equal upper and lower values, with the
common value being (4.31a). (ii) If 'Y
> 9~, the controller I-'~ given 1-'~(1](t))
by
= -B'Z+1](t),
t 2: 0
( 4.40)
where 1](-) is given by (the time-invariant version of) (4.20), attains the finite value, that is 8D w ) -- Xo'Z-+ Xo· sup J ..,OO(1-'00' wE'H. w
(iii) If 'Y < 9~, the upper value of the differential game is unbounded. (iv)
Ih > 9~, Q is positive definite,
and the controller (4.40) is used for
u, the resulting closed-loop system described by a delayed differential equation becomes bounded input-bounded outpu t stable.
4.4.2
The disturbance attenuation problem
The solution to the (infinite-horizon) HOO-optimal control problem under the three information patterns of this chapter, can now readily be obtained from Theorems 4.8 - 4.10, by essentially following the arguments that led to the results of Section 4.3 (in'the finite horizon) from the soft-constrained counterparts first developed in Section 4.2. We present these results below, without repeating some of the common arguments for their formal justification.
104
Chapter 4
a) Closed-loop perfect state measurements: Let us introduce the scalar
9:;:'L := inf{-y > 0:
ARE (4.29) has a positive definite solution}.
(4.41)
Note that it is quite possible that the set above whose infimum determines
9:;:'L "
is empty. To avoid such pathological cases, we now assume (in addi-
tion to the observability condition of Theorem 4.8) that the pair (A, B) is stabilizable. This ensures, by a continuity argument applied to (4.29) at
1
= 00, that the set in (4.41) is nonempty, and hence 9~L is finite. 4
Then,
because of the established relationspip between the soft-constrained differential game and the disturbance attenuation problem, the following result follows from Theorem 4.8.
Theorem 4.11. For the continuous-time infinite-horizon disturbance at-
tenuation problem (with Xo
= 0),
let the condition of Theorem 4.8 on
observability be satisfied, and the pair (A, B) be stabilizable. Then
* 100:=
. f
III I'EMcLPS
«: T I'
~
~CL = 100 ,
which is finite and is given by (4.41). Moreover, given any
f.
> 0, we have (
the bound
where ( 4.42)
with
Z::-
being the unique minimal positive definite solution of (4.29). Fur-
thermore, for any
stable system.
f.
> 0, /1-';. leads to a bounded input-bounded output
4For a similar argument used in the discrete time, see the proof of Theorem 3.8. Note that with'"Y = 00, the ARE (4.29) becomes the standard ARE that arises in linearquadratic regulator theory ([56], [2]), which is known to admit a unique positive definite solution under the given conditions.
105
Continuous-Time Systems with Perfect State Measurements
Remark 4.4. Again, as in the case of Theorem 4.8, we are not claiming that there exists an optimal controller (corresponding to I
= 9~L), because
the ARE (4.29) may not admit any bounded solution for this limiting value of Ii the example of Section 4.4.3 will further illustrate this point.
0
b) Sampled-data perfect state information pattern: Under the sampled-data information pattern, it follows from the definition of 9~, as given by (4.37), that ~SD
> ~CL
100 _ 100
which is of course a natural relationship because the CLPS information pat- . tern includes the SDPS pattern. In view of this relationship, the followin& theorem now follows from Theorem 4.9.
Theorem 4.12. For the infinite-horizon disturbance attenuation problem with sampled state measurements, let the pair (A, Q1/1:) be observable, and
(A, B) be stabilizable. Then,
i) The limiting (as t f ~ (0) minimax atten uation level I~' as defined by the infinite-horizon version of (l.2a) for the SDPS information, is
equal to i'~ given by (4.37), which is finite. ii) Given any f > 0, and a corresponding 100,. := 9~
+ f,
JL;'oo,.(t, X(tk)) = ~B' Z~oo" exp{F,oo,,(t - tk)}X(tk),
the controller
tk
s t < t H1 , (4.43)
where F.
'00,'
= A - (BBI - 12 _l_DD/) Z+ '00,,' 00,.
(4.44)
achieves an asymptotic performance no worse than 1 00 ,•. In other words, IIT!,~oo" (w)11
S loo,.lIwll, for all wE 1iw'
106
Chapter 4
iii) If Q > 0, the controller given by (4A3) leads to a bounded inputbounded output stable system. iv) The following limiting controller exists: Pr:'so = lim Pc:',00,< . '00 flO
Remark 4.5. If the choice of the sampling times is also part of the design, then for the finite-horizon version and under the constraint of a fixed total number of sampled measurements in the given time interval, uniform sampling (i.e., t,Hl
- tk
= constant) will not necessarily be optimal, as we
have seen earlier. For the time-invariant infinite-horizon version, however, uniform sampling will be overall optimal (under sayan average frequency constraint), since the asymptotic performance is determined by the length of the longest sampling interval. c) Delayed state measurements:
Now, finally, under the
de~ayed
state measurement information pattern,
Theorem 4.10 leads to the following solution for the disturbance attenuation problem.
Theorem 4.13. For the infinite-horizon disturbance attenuation problem
with 8-delayed perfect state measurements, let the pair (A, Ql/2) be observable, and the pair (A, B) be stabilizable. Then, i) The limiting (as tf
--+
(0) minimax attenuation level
'Y~, as defined
by the infinite-horizon version of (l.2a) for the DPS information, is
equal to i'~ given by (4.39), which is finite. ii) Given any f > 0, and a corresponding 'Y~,< := 9~
J-Lr;. (t, X(tk)) oo.~
= -B' Z~o
CCI.~
1J(t),
+ f,
t 2: 8,
the controller (4.45)
(
107
Continuous-Time Systems with Perfect State Measurements
where 7](.) is generated by (the time-invariant verSlOn of) (4.20), achieves an asymptotic performance no worse than
,~,
9NCL ,
= 0,
V, > ~CL
.
( 4.58)
the.feedback controller /-'; ensures a disturbance
attenuation bound" that is ( 4.59) o Note that Theorem 4.14 is weaker than Theorem 4.5, because it does not claim that
:yNCL
is the opti..T.al (minimax) disturba.nce atLenuation ievel.
For a stronger result one has to bring in further structure on
f and g,
and derive (for nonlinear systems) the counterparts of the necessary and sufficient conditions of Appendix A.
4.7
Main Points of the Chapter
This chapter has presented the solution to the continuous-time disturbance attenuation (HOO-optimal control) problem under several perfect state information patterns. When the controller has access to the current value of the state, the optimum attenuation level is determined by the nonexistence of a conjugate point to a particular Riccati differential equation in the finite horizon (see Theorem 4.5), and by the nonexistence of a positive definite solution to an algebraic Riccati equation in the infinite horizon case (see Theorem 4.11). In contradistinction with the discrete-time case, however, a corresponding optimal controller may not exist. There exist suboptimal linear feedback controllers (( 4.23c) or (4.42)) ensuring attenuation levels arbitrarily close to the optimum ones. Under sampled-data measurements, a set of additional RDE's (4.15), whose boundary conditions are determined by the solution of the CLPS
Continuous-Time Systems with Perfect State Measurements
113
RDE (4.5), playa key role in the determination of the optimum attenuation level (see Theorem 4.6). A controller that achieves a performance level arbitrarily close to the optimum is again a linear feedback rule (4.26a), that uses the most recent sampled value of the state. For the infinite-horizon version, the optimum attenuation level is determined by one ARE and one RDE, with the latter determined on the longest sampling interval (see Theorem 4.12)". The same conditions apply to the infinite-horizon delayed state measurement problem as well, even though the near-optimal controllers in this case have completely different structures (see Theorem 4.13; and Theorem 4.7 for the finite horizon case). The chapter has also discussed extensions to problems with (i) performance indices containing·cross terms between state and control, (ii) nonzero initial states, and (iii) nonlinear
dynamics/nonquadratic cost functions. As mentioned earlier, the HOO-optimal control problem under the CLPS information pattern was essentially solved in [65], which, however, does not contain a complete solution (in terms of both necessary and sufficient conditions) to the finite horizon case. The sampled-data solution, as given here, was first presented in [14] and was furthe·r discussed in [8] and [12], which also contain results on the DPS information case. The continuoustime HOO-optimal control problem under CLPS information and using the state space approach has been tJ.1.e subject of several (other) papers and theses, among which are [53], [59], [71], [86], [80], and [51]. A preliminary study of the nonlinear problem, using game-theoretic techniques, can be found in [45].
I
'. . .
·-·~t·. ·~ -.-~ .-
:,-:'"
::~
Chapter 5 The Continuous-Time Problem With Imperfect State Measurements
5.1
Formulation of the Problem
We now turn to the class of continuous-time problems originally formulated in Section 1.3, where the state variable is no longer available to the controller, but only a disturbance-corrupted linear output is. The system is therefore described by the following equations:
x(t)
= A(t)x(t) + B(t)u(t) + D(t)w(t), yet)
x(O)
= C(t)x(t) + E(t)w(t).
= xo
(5.1) (5.2)
We have not included an explicit dependence on u in the output equation (5.2), as the controller knows u and can always subtract out such a term from y. The regularity assumptions on the matrices
CO and D(.) are
as on A, B, D in the previous chapter: piecewise-continuous and bounded. To emphasize duality, it will be helpful to write the performance index in terms of a second output (as in (1.9)-(1.10))
z(t) = H(t)x(t)
+ G(t)u(t),
(5.3)
where we first take
H'(t)H(t)
=:
Q(t),
G'(t)G(t) = I,
(5.4a) (5.4b)
I
'~1.·.- -.-~, .-
H'(t)G(t) :::: 0,
(5.4c)
so that we may write the performance index, to be minimized, as
:,-:'"
::~
115
Imperfect Stale Measurements: Continuous TIme
J(u, w)
= Ix(tJ )I~, + IIxll~ + IIuW == Ix(tJ )I~, + IIzW·
(5.5)
For ease ofreference, we reproduce here the solution, given in Section 4.3.1, of the full state min-max design problem, for a given attenuation level/:
u·(t)
= p.·(t; x(t)) = -B'(t)Z(t)x(t).
(5.7a)
As shown in Section 4.2 (Theorem 4.2), for a related "soft-constrained" differential game, this controller is in saddle-point equilibrium with the "worst-case" disturbance ~·(t)
= v·(t; x(t)) = /-2 D'(t)Z(t)x(t).
(5.7b)
Notice that (5.4b) implies that G is injective. Dually, we shall assume that E is surjective, and we shall let
E(t)E'(t)
=:
N(t),
(5.8a)
where N(t) is invertible. l To keep matters simple, and symmetrically to (5.4c) above, we shall, in the main development, assume that
D(t)E'(t)
= O.
(5.8b)
However, in Section 5.4, we will provide a more general solution to the problem, to cover also the cases where the restrictions (5.4c) and (5.8b) are not imposed. The admissible controllers,
u =p.(y)
(5.9)
1 As a matter of fact, it would be possible to absorb N-~ into D and E, and take EE' = I. We choose not to do so because it is often interesting, in examples and practical
applications, to vary N, which is a measure of the measurement noise intensity.
116
Chapter 5
the set of which will simply be denoted by M, are controllers which are
causal, and under which the triple (5.1), (5.2), (5.9) has a unique solution for every Xo, and every w(.) E L 2([0,t,j,lR n )
= W.
(In the infinite-horizon
case, we shall impose the further condition that (5.9) stabilizes the system, including the compensator). We shall consider, as the standard problem, the case where Xo is part of the unknown disturbance, and will show on our way how the solution is to be modified if, to the contrary, Xo is assumed to be known, and equal to zero. To simplify the notation, we shall let (Xo, w) =: wEn := lR n
x W.
(5.10)
We also introduce the extended performance index (5.11) where Qo is a weighting matrix, taken to be positive definite. In view of the discussion of Section 1.2, the disturbance attenuation problem to be solved is the following:
Problem P-Y'
Determine necessary and sufficient conditions on 'Y such
that the quantity inf sup J-y(J1.,w) /JEM wEn
is finite (which implies that then it will be zero), and for each such 'Y find a controller J1. (or a family of controllers) that achieves the minimum. The infimum of all 'Y's that satisfy these conditions will be denoted by 'Y. .
0
We present the solution to this basic problem, in the finite horizon, in Section 5.2. Section 5.3 deals with the sampled-data case, and Section 5.4 studies the infinite horizon case, also known as the four block problem. Section 5.5 extends these to more general classes of problems with cross terms in the cost function, and with delayed measurements; it also discusses some extensions to nonlinear systems. The chapter concludes with Section 5.6 which summarizes the main results.
117
Imperfect State Measurements: Continuous Time
5.2
A Certainty Equivalence Principle, and Its Application to the Basic Problem P-y
To simplify the notation in the derivation of the next result, we shall rewrite (5.1)-(5.2) as :i:
= f(t; x, u, w),
x(O)
= Xo,
(5.12a)
y= h(t;x,w)
(5.12b)
and (5.11) as
J"(u,w) = q(x(tf))
[', + J g(t;x(t),u(t),w(t))dt+N(xo). o
For a given pair of control and measurement trajectories, U
= yto,t,],
and fJ
(5.12c)
= U[O,t/l
we shall introduce a family of constrained optimization
problems QT( U, fJ), indexed by r E [0, tf], called "auxiliary problems" , in the following way. Let
nT(u, fJ)
= {w E n I y(t) = fJ(t),
Here, of course, y(t) is generated by U and that
n
T
w
'lit E [0, r]}.
(5.13)
through (5.1)-(5:2). Notice
depends only on the restrictions of U and fJ to the subinterval [0, r],
that is on uT := U[O,T] and fJT := ytO,T]' Also, the property
WEnT
depends
only on w T := (xo, wT) = (xo, W[O,T])' and not on W(Th]' so that we shall also write, when convenient,
wT E
Let u and fJ be fixed.
n;(u, fJ), or equivalently w T E n;(uT, fJT).
Obviously, if
r
>
r, nT'
c
n T. But also,
because E is assumed to be surjective, any w TEn; can be extended to an w T'
En;:.
Therefore, for a given pair (u, fJ), the set
of the elements of nT' to [0, r], is
n;"
of the restrictions
n;:
so that, finally, ( 5.14)
.
118
Chapter 5
Consider also the soft-constrained zero-sum differential game with full state information, defined by (5.1) and (5.11) (or equivalentIJ(5.12c)), but with-
out the N(xo) term. Let (J.l', 1/') be its unique state-feedback strongly time consistent saddle-point solution (given by (5.7)), and let V(t; x) (=
Ixlht))
be its value function. Finally introduce the performance index
GT(u,w)
= V(r; x(r)) + faT g(tj x, u, w)dt + N(xo).
(5.15)
We are now in a position to state the auxiliary problems:
(5.16) o
This is a well-defined problem, and, as a matter of fact, n;(u, y) depends only on (UT,yT). However, it will be convenient to use the equivalent form(5.17) since in fact GT depends only on uT and wT, and therefore (5.17) depends on n;/u,y) which is n;(u,y) in view of (5.14). Remark 5.1.
The maximization problem QT defined by (5.16) has an
obvious interpretation in terms of "worst case disturbance". We seek the worst overall disturbance compatible with the information available at time r. Thus, Xo and past values of w should agree with our own past control
and past measurements, but future w's could be the worst possible, without any constraint imposed on them, i.e., chosen according to 1/', leading to a least cost-to-go V(rjx(r)).
o
We now describe the derivation of a controller which will turn out to be an optimal ''min sup" controller. Assume that for (u T, yT) given, the problem QT admits a unique maximum CiT, and let
~
be the trajectory
119
Imperfect State Measurements: Continuous TIme
generated by (tY, &IT) up to time
T.
Let (5.18)
This defines an implicit relation J1, since therefore have a relation u( T)
xT(T)
itself depends on tY. We
= J1( uT, yT) whose fixed point is the control
sought. In fact, because of the casuality inherent in the definition (5.18), this fixed point exists and is unique, and (5.18) turns out to provide an explicit description for J1(y). Relation (5.18) says, in simple terms, that one should look, at each instant of time
~;-I
i _I
T,
for the worst possible disturbance
the available information at that time, and with the
(;jT
compatible with
correspo~ding worst.
possible state trajectory, and use the control which would agree with the optimal state feedback strategy if current state were the worst possible one. Thus this is indeed a worst case certainty equivalence principle, but not a
~I
separation principle since this worst case trajectory depends on the payoff index to be minimized.
Theorem 5.1. If, for every y E Y and every
T
E [0, t I], the problem
QT(J1(y),y) has a unique maximum &IT generating a state trajectory XT, then (5.18) provides a min sup controller solution for problem P-y, and the minimax cost is given by min max J-y
/'EM wEn
Proof.
= "'oER" max {V(O; xo) + N(xo)} = V(O; xg) + N(xg).
Notice first that the maximum in the second expression above
exists under the hypotheses of the theorem, and the notations in the third expression are consistent, since the problem QO, assumed to have a unique maximum, is just that maximizatipn problem. Introduce the function (5.19)
Chapter 5
120 Notice also that for fixed
u and w,
one has
_ _ oV oV a T_ or G (u,w)= or +a;f(r ;x,u,w )+g(r ;x,u,w ), ed at with all functions on the right-h and side above being evaluat (r;x(r) ,u(t),w (r)). It now follows from a theorem due to Danski n (see the Append ix B; Chapte r 9), that, under the hypothe sis that CiT is unique, function WT defined by (5.19) has a derivative in r:
, y) OWT(U Or
_( ) ~T( )) (~T() r, u r ,w r .
oV ( ~T() _( ) ~T()) or + ox f r; x r, u r ,w r + 9 r; x = oV
Suppose that
u = /1(y),
so that -a(r) is chosen according to (5.18); then it
follows from Is~acs' equation (2.16J that
8 T or W (/1(y) , y) ::; O. re As a result, WT(/1(y), y) is a decreasing function of r, and therefo
Moreover, since Vet,; x) = q(x), we have
Gtf
= J,p and thus (5.20a)
Xo On the other hand, for every given u E U, a possible disturb ance is along with a w that agrees with v·, and this leads to the bound
= xg
(5.20b) Compa ring this with (5.20a) proves the theorem .
o
lOur derivation above has in fact establis hed a stronge r type of optima ity than that stated in Theore m 5.1, namely T hypothe ses Coroll ary 5.1. For a given r, and any past u , and under the of Theore m 5.1, the use of (5.18) from r onward s ensures
121
Imperfect State Measurements: Continuous TIme
and this is the best possible (smallest) guaranteed outcome.
o
The above derivation holds for any optimal strategy J-L of the minimizer. As we have discussed in Section 2.3, and also in Section 4.2.2, many such optimal strategies may exist which are all representations (cf. Definition 2.2) of J-L*. We shall exhibit such strategies of the form 2 J-L",(t; x, y)
= J-L*(t; x) + r"p(y -
Cx)
where "p is a causal operator from Y into U, with "p(O)
= O.
Placing this
control into the classical "completion of the squares" argument (see (4.9) or (8.15)), we get q(x(tj))
(J + JT g(t;x,u,w)dt =
V(r;x(r))
Notice that y - Cx = Ew, and that EI/*(t;x) = 0 (from (5.8b)). Therefore, the integral on the right-hand side is stationary at w
= I/*(t; x),
and
maximum if, for instance,
In view of (5.8a), and for a fixed wE W such that Ew that r
Iwl 2: ITJI;"-l, so 2
= TJ, one easily sees
that the above holds, ensuring that all games from
to t j have a saddle point if (5.21)
Applying Theorem 5.1 with J-L'/', and noticing that by construction
yr = y
T ,
we get:
2This parallels the analysis of Section 4.2.2 that led to (4.12b) in the CLPS informa.tion case.
~I :;-;
Chapter 5
122
''1:__ _
-~~"
....-.
~
Corollary 5.2. A family of optimal controllers for problem P.., is given by
~::*
.:
-'
-"-~--",
(5.22)
where 1/J is any causal operator from Y into U such that (5.21) is satisfied, and the existence and unicity of the solution of the differential equation (5.1)
is preserved.
We may notice that (5.21) can be seen as imposing that 1/J be a contraction from L2([T,tJJ,JR~_1) into L2([T,tJJ,JR m l), for all T. The following theorem is
impo~tant
for the application of the above
result to our problem.
Theorem 5.2. If for all (u,y) E U x
y, there exists a t" such that the
problem Qt' fails to have a solution and exhibits an infinite supremum, then the problem P.., has no solution, the supremum in w being infinite for any admissible controller J1. EM.
Proof.
Let fl be an admissible controller,
wEn
the output generated under the control fl, and
be a disturbance,
u = p.(y).
For every real
wE
n tl (u, y) such
number a, there exists, by hypothesis, a disturbance that ct' (u, w)
> a.
y be
By the definition of n tl , the pair (u, w) generates the
same output fi as (u,w), so that
w together
with p. will generate (u,y).
Consider the disturbance w given by
w(t) =
w(t) { v" (x(t))
if if
t
< t"
t > t".
Because of the saddle-point property of /.I", it will cause J..,(J1.,w)
>
Ct'(u,w) 2: a, from which the result follows.
¢
~"2;:
I,:
'~'---
.~~"
....-.
123
Impeifect State Measurements: Continuous Time
~
~::*
.:
"-~--",
~"2;:
Remark 5.2.
The results obtained so far in this section do not depend
on the special linear-quadratic form of problem P..,. They hold equally well for nonlinear/non quadratic problems, of the type covered in Section 4.6, provided that the conditions for the application of Danskin's theorem, as developed in Appendix B,3 apply. However, it is due to the linear character of the problem that we shall always have either Theorem 5.1 or Theorem 5.2 to apply. Moreover, as we shall explain below, Theorem 5.1 can hardly be used to find an optimal controller unless some very special structure is imposed on the problem. We now undertake to use Theorem 5.1 to give a practical solution to problem P-r. The first step is to solve the gp-neric problem QT ('U', y). A simple way to do this is to set it in a Hilbert space setting. Let U T, WT, yT, ZT be the Hilbert spaces L2([O, r], IRk), with appropriate k's, where the
variables u T , w T , yT and ZT live. The system equations (5.1), (5.2), and (5.5) together define a linear operator from UT x W T X IR n into yT x ZT X IR n:
(5.23a) (5.23b) (5.23c) In the development to follow, we shall suppress the superscript r, for simplicity in exposition. First we shall need the following lemma.
Lemma 5.1. The operator dual to (5.23):
A*p + COq
0:'
j3
=
+ r.p*)..,
B*p+D*q+'IjJ*)..,
3In the nonlinear case, hypothesis H2 need to be replaced by weak continuity of the map
o!-+ G.
Chapter 5
124
admits the following internal representation, in terms of the vector function
>.(t): -).(t)
+ G'(t)p(t) + H'(t)q(t),
a(t)
B'(t)>.(t) + G'(t)q(t),
f3(t)
D'(t)>.(t)
>'0 Proof.
A'(t)>.(t)
>'(r)
= >."
+ E'(t)p(t),
>.(0).
The proof simply follows from a substitu tion of
x and
). in the
identity
>"(r)x( r) - >"(O)x(O) == iT ().'x + >.'x)dt, leading to
which proves the claim.
ced in Problem QT may now be stated in terms of the operato rs introdu (5.23), recalling that V(r;x( r)) maxxo,w [I 0), or (A, D) is comple r it is controllable over [t, t J], the matrix ~(t) is positive definite· wh~neve defined.
127
Imperfect State Measurements: Continuous Time
Proof.
In the case when
Xo
is unknown, ~(o)
= Qa 1 is positive definite,
and the result then follows from Proposition 8.4 given in Appendix A. In the case when
Xo
is known (and fixed),
~(o)
= 0, and the result follows
by duality with the Riccati equation of the control problem. By duality, complete controllability of (A, D) translates into complete observability of
(A',D'), which is sufficient to ensure that
~
> 0.
Notice also that the concavity of the problem QO, and hence the existence of an
xg
(which will necessarily be zero), necessitates that
Z(O) - ,2Qo < 0, which is equivalent to the condition that the eigenvalues of Q
a Z(O) = ~(O)Z(O), which are real, be of modulus less than ,2. 1
As
a consequence, the condition that (I - ,-2~(t)Z(t)) be invertible for all
t E [O,t,], translates into a condition on the spectral radius
of~(t)Z(t):
(5.35a)
or equivalently, since
~(t)
is invertible,
(5.35/1 ) We now state the main result of this section. Theorem 5.3. Consider the disturbance attenuation problem of this sec-
tion with continuous imperfect state measurements (5.2), and let its optimum attenuation level be , •. For a given, > 0, if the Riccati differential equations (5.6) and (5.28) have solutions over [0, t,], and if condition (5.35)
(in either of its two equivalent forms) is satisfied, then necessarily, 2: , •. For each such " there exists an optimal controller, given by (5.30) and
(5.32), or equivalently by (5.33) and (5.34), from which a whole family of optimal controllers can be obtained as in (5.22). Conversely, if either (5.6) or (5.28) has a conjugate point in [O,t,], or if (5.35) fails to hold, then ,
~
,., i.e., for any smaller, (and possibly for the one considered) the
supremum in problem P-y is infinite for any admissible J-L.
128
Chapter 5
Proof.
In view of Theorems 5.1 and 5.2, wha:t we have to do is relate the
concavity of the problems QT to the existence of Z,
~
and to condition
(5.35). These problems involve the maximization of a nonhomogeneous quadratic form under an affine constraint, or equivalently over an affine subspace. The concavity of this problem only depends on the concavity of the homogeneous part of the quadratic form over the linear subspace parallel to the affine set. We may therefore investigate only the case obtained by setting u
= 0 (homogeneous part of GT) and y = 0 (linear subspace).
Let us assume that Z and
~
exist over [0, t J]' and that (5.35) holds.
Let 4 (5.36) It is now a simple matter, using Lagrange multipliers or completion of the
squares method, and parameterizing all w's that satisfy Cx + Ew
w = -E' N-1Cx
+ (1 -
E' N- 1E)v,
= 0 by
v E lR m "
to see that: 5
wlc~a:w=o
[a: + a;:
and moreover, W(O; x)
f(t; x, 0, w) + g(t; x, 0, w)]
= -N(x).
=0
(5.37)
Integrating along the trajectory corre-
sponding to an arbitrary w, we get
W(T;X(T))+N(xo)+ iT g(t;x,O,w)dt~O. with an equality sign for the maximizing w T • Hence we have,
Vw E QT(O,O), GT(O,w)::; IX(T)lk(T) - W(T;X(T)) = IX(T)lhT)_-y'!:-l(T)' (5.38) with an equality sign for the maximizing w. Now, the condition (5.35b) implies that this is a concave function of x( T), and hence that GT (0, w) ::; 0 over QT(O, 0). Thus the conditions of Theorem 5.1 apply, and the first part of Theorem 5.3 is proved. 4This flll1ction W is not the same as the one used in the proof of Theorem 5.1. 5We use the same notation j, 9 and N as in the statement and proof of Theorem 5.1.
Imperfect State Measurements: Continuous Time
129
The maximizing w in (5.37) is of the form w
= Px (with P = D''5:,-l -
E' N-1C) , P bounde d. Hence, under that disturb ance, the map Xo t-+ X(T) is surjective. Therefore, the above derivation shows that if, at some condition (5.35) does not hold, OT can be made arbitrar ily large over QT(O,O). So we are only left with the task of proving that existence of '5:, T,
over [0, t J] is necessary, since the necessity of the existence of Z to have , ~ ,. is, as already seen, a direct consequence of Theore m 8.5. Take any (x, A) satisfying the differential equatio ns in (5.26), indepen dently of the bounda ry conditions. Compu ting (,AI x) and integra ting, dt we get
!:..
,2[A'(T)X(T) - A'(O)X(O)] +
{(lxl~ -,2IwI 2)dt = O.
(5.39) .
vU
Introdu ce the square matrice s X(t) and yet) defined by 6
X
AX + DD'Y,
Y
(C' N-1C _,2Q) X - A'Y,
X(O)
= Qiil, YeO) = I.
All solutions of (5.26) that satisfy the first bounda ry conditi on (at 0, in (5.26a)) are of the form x(t) =X(t)e , A(t) = ,2Y(t)e , where E IRn is a constan t. But also, as long as Y, which is invertible at 0, remains invertib le
e
for all t, we have '5:,
= Xy- 1 , as direct differentiation of this last expression
shows. Then, X is invertible, since '5:, is nonsingular by Proposi tion 5.!. Now, let T* be such that Y(T·) is singular, and let i 0 be such that Y(T*)e O. Then X(T*)e i 0, because the vector (x' A)' «(X',2 (Y')',
e
=
=
solution of a linear differential equatio n not zero at t = 0, can never be zero. Using Xo QiiIC, the admissible disturb ance w D'YC - E'N-1 CXC leads to x X(t)C with A Y(t)C. Application of (5.39) then yields
= =
=
=
Therefore, for any smaller " since w here is nonzero, GT • (O,w) will be positive, implying that the (homogeneous) problem is not concave . This completes the proof of the theorem. o 6These are Caratheo dory's canonica l equation s, as revived by Kalman.
130
Chapter 5
Remark 5.4.
We may now check that the same theorem with the same
formulas, except that E(O) = 0, hold for the problem where Xo is given to be zero, and (A, D) is completely controllable. The sufficiency part can be kept with the additional classical device of Tonelli to handle the singularity at OJ see [30] or [21]. (The trick is to note that in a neighborhood of 0, a trajectory X(t)e is optimal for the problem with free initial state, and with initial cost given by -/2(lxoI2
+ 2e'xo).
Therefore, any other trajectory
reaching the same x(t) gives a smaller cost, but for trajectories through
x(O) = 0, the above initial cost is zero, so the costs compared are the same as in our problem). The necessity of the existence ;f E is unchanged, and the necessity of (5.35) win again derive from (5.38) if we can show that any :1:(/) can be reached by maximizing trajectories. The controllability hypothesis ensures
>
that E(t)
0, but then Y being invertible, so is X
= EY.
reachable X(T) are precisely the points generated by X(T)e, completes the proof of the result for the case when Xo
Since all
e E JRn, this
= 0,
Notice also that although this is not the way the result was derived, replacing Q01 by the zero matrix amounts to placing an infinite penalty on nonzero initial states.
o
Proposition 5.2. For the problem with Xo
= 0,
let 10 be the optimum
attenuation level for the full state information problem and IN for the
current problem P...,. Then, if C is injective, IN is continuous with respect to N at N
= O.
> 10. There is an (; > 0 such that E is defined over [0, (;], and sufficiently small so that p(E(t)Z(t)) < 12 over that interval. Now, looking Proof.
Let I
at the equation satisfied by E- 1 , it is easy to see that we may choose N small enough to ensure E-1(t) have I
> IN > 10·
> 12Z(t). Hence, for N small enough, we
Since this was for any I
> 10,
the proof is complete. 0
-~'i~
131
Imperfect State Measurements: Continuous Time
Example.
Let us return to the analysis of the running example of Chap-
ter 4, this time with a measurement noise:
x
U
y
X+V, andletv=N!w2,
J
itl (x2
+ WI,
Xo
+ u2)dt.
= ~/r
Recall that if we set m
= 0 given,
(hence assuming, < 1), we have
Z(t) = (11m) tan[m(t I - t)], and that the conjugate-point condition for Z IS
4tJ ,2 > --::---..:..,..-,:+ 4tJ' or equivalently t
The
RD~
,
11"
--=== ~2'
0.8524.
We end this section with a proposition showing that the above situation prevails for a large class of problems. Proposition 5.3. When both Z(t) and 'E(t) are positive definite (this
would be true if, for example, both Qo and QJ are positive definite), the spectral radius condition (5.35) is the binding constraint that determines
Proof.
As, is decreased to approach the conjugate-point threshold of
either Z or 'E, for some t* the corresponding matrix goes to infinity, while the other one is still bounded away from zero, so that p(E(t*)Z(t*)) Therefore condition (5.35) is violated first.
5.3
-+ 00.
Sampled-Data Measurements
We now replace the continuous measurement (5.2) by a sampled-data measurement. As in Section 4.3.2, let {tkh~l be an increasing sequence of measurement time instants:
133
Imperfect Stale Measurements: Continuous TIme
The measurement at time t", is of the form
(5.41) and we take as the norm of the overall disturbance w
= (xo, w, {vd ):
where K
IIvW
=L
Iv", 12.
"'=1 We therefore have to determine whether, under the given measurement scheme, the upper value of the game with cost function
(5.42) is bounded, and if in the affirmative, to obtain a corresponding min-sup controller
The solution to this problem can be obtained by a simple extension of the method used in the previous section for the continuous measurement case .. Toward this end, we first extend the certainty-equivalence principle. For simplicity in exposition, we will use the following compact notation which, with the exception of the last one, was introduced earlier:
A(t)x + B(t)u + D(t)w G",x + EkV",
Ixl~1 -,2Ixl~o
Ixl~
+ lul 2-,21w1 2 -,21v1 2
f(t;x, u, w) h",(x,v",) q(x)
-. N(x) g(t;x,u,w) K(v).
Chapter 5
134 In the sequel, we will also use the fact that K(G)
'!Iv
0" and K(v)
1,
and
z+ =
/
~'
and for the perfect state information problem, /. = 1. As .,. in--. Section 5.2, the equation for ~ is the same as that for Z. Hence, we have ~+
=Z+
and the global concavity condition yields
hence /2
> 2.
Therefore, for this imperfect information problem, we have
/. = .../2, which shows an increase in the achievable attenuation level, due to the presence of disturbance in the measurement equation.
144
5.5 5.5.1
Chapter 5
More General Classes of Problems Cross terms in the cost function
We investigate here how the results of the previous sections generalize to cases where the simplifying assumptions (5.4c) and (5.8b) do not hold. Consider a problem with a cost function of the form J
= Ix(tj)I~1 +
1tl
(x' u')
(~, ~)
( : ) dt,
(5.60a)
where the weighting matrix of the second term is taken to be nonnegative definite, and R
> O. Note that in this formulation (5.4b) and (5.4c) have
respectively been replaced by
G'(t)G(t)
R,
(5.60b)
H'(t)G(t) =: P.
(5.60c)
=:
Symmetrically, we shall have, instead of (5.8b),
D(t)E'(t) =: L.
(5.61 )
t~rm
in the cost is most easily dealt with
by making a substitution in the forms u
= u-R- 1 P'x. It is also convenient
As seen in Section 4.5.1, the cross
to introduce
A=A-BR-1p',
Q=Q_PR-1p'
(5.62)
so that the RDE (5.6) becomes
z+ zA + A' z -
Z(BR- 1B' - 1'-2 DD')Z + Q = 0,
Z(tj) = Qj (5.63)
and the minimax full state information control for a fixed value of l' is
Similarly, for the estimation part it will be useful to introduce
A=A-LN-1C,
M=DD'-LN-1L'.
(5.64)
8Note that, as compared with the notation of Section 4.5.1, we have u = R-!:U,
.4=A,Q=Q.
145
Imperfect State Measurements: Continuous Time
m'~,
We may either keep the cross terms in the derivation of Theorem 5.3, as in
e'l
[23], or use the transformation we used in the previous section, to obtain the generalized version of the RDE (5.28), which becomes
and of the compensator (5.30), which is now
An alternative form for this equation, also valid in the former case, of course, but more appealing here, is (5.66b)
where i:=
Hx+Gu,
and (5.67) The other form, (5.33)-(5.34), also takes on a rather simple form: (5.68)
u = -R- 1 (B' Z + P')x
(5.69)
where we have introduced
:
Finally, in Corollary 5.2, or Corollary 5.3, we must use 1/J(y - (C
+
'Y- 2 L' Z)X), and take into account the weighting matrix R on u, leading
to the following counterpart of condition (5.40):
'IT E [O,t!], 30:' E (0,1): 'IIy(.) E L2,
loT 11/J(y)I~-ldt:$ 0:' loT Iyl;"'_ldt (5.70 )
We may now summarize the above in the following Theorem.
I
I
146
Chapter 5
Theorem 5.6. Consider the disturbance attenu9-tion problem of this sec-
tion, with the system described by (5.1)-(5.2), and the performance index given by (5.60). Let its optimum attenuation level be 'Y •. i) Using the notation introduced by (5.8a), (5.61), (5.62) and (5.64), if the RDE's (5.63) and (5.65) have solutions Z and that meet condition (5.35), then 'Y
~
~
defined over [0, t J]'
'Y •.
ii) A family of controllers that ensure such an attenuation level 'Y is given by U
= u+'Yt/;(y-y), where Uis given by (5.66) and (5.67), or equivalently
by (5.68) and (5.69), and t/; is any causal operator satisfying condition (5.70), and preserving existence'and unicity of the solution to the differential equation (.5-1)' iii) If either (5.63) or (5.65) has a conjugate point in [0, t J]' or if (5.35) fails to hold, then 'Y
~
'Y •.
ivy If the initial state Xo is known to be zero, then the initial condition in (5.65) must be replaced by
~(O)
= 0, and the above holds provided
that the paIr (A, D) is completely reachable over [r, tJ] for every r E
[0, t J ]. v) For the infinite-horizon version, the solution is obtained by replacing the conditions on the existence of solutions to the two RDE's by conditions on the existence of their (positive definite) limits Z+ as
t
----->
-00 and
~+
as t
----->
+00, and replacing Z and
~
by Z+ and
in condition (5.35) and in (5.66b)-(5.67) or (5.68)-(5.69). 5.5.2
~+
-I
-.-- !
I I
Delayed measurements
I
We now consider the class of problems where the available information at time t is only
Y[O,t-8] =: yt-8
where 0
>
°
is a time delay. As long as
t < 0, there is no available information, and the situation is the same as in the case investigated in Section 4.2.4. Then, we can easily extend
]
-J
147
Imperfect State Measurements: Continuous TIme
the certainty equivalence principle to that situation, the constraint in the auxiliary problem being
Since there is no constraint on wet), t E [r - 8, r], by a simple dynamic programming argument, we easily see that in that time interval, wet) =
v*(t;x(t)), and u(t)
= J.l*(t;x(t)).
VCr, x(r))
+
iT
T-8
As a consequence, we shall have
get; x, u, w)dt
= V(r -
8, x(r - 8))
so that
w
r
T T) sup GT( u,w En;_8
=
sup
GT-8( u T-8, w T-8).
(5.71)
wr-ien~=:
The trajectory of the delayed information auxiliary problem is therefore obtained by solving the standard auxiliary problem up to time t
=r
I
- (),
and then using u(t) = J.l*(t,x(t)), and wet) = v*(t;x(t)). Of course, this assumes that the lliccati equation (5.6) has a solution over (0, r). Now, the concavity conditions for the new auxiliary problem comprises
.[
two parts. Firstly, the problem from r - 8 to r, with u(t) fixed as an open-loop policy as above, must be concave. We have seen in Section 4.2.1 I'
(cf. Theorem 4.1), as also discussed in Section 4.2.4, that this is so if the Riccati equation
S+A'S+SA+-y- 2 SDD'S+Q=0,
S(r)=Z(r)
has a solution over [r - 8, r]. Secondly, we have the conditions of the full state delayed information problem, which allow us to bring concavity back to that of problem GT-8 in (5.71), in view of Theorem 5.3. Hence, we arrive at the following result, which is essentially obtained by applying a certainty equivalence principle to the solution given in Theorem 4.7 for the full state delayed information problem.
I
148
Chapter5
Theorem 5.7. Consider the problem with delayed imperfect information,
with the delay characterized by the constant B. Let its optimum attenuation level be denoted by,·. Given
a, >
0, if the conditions for the full state
delayed information problem (c£ Theorem 4.4 or Theorem 4.7) are satisfied, and if furthermore equation (5.28) has a solution over [0, t! - B], satisfying
(5.35) over that interval, then necessarily ,
~
,.. Moreover, an optimal
controller is obtained by placing xt-S(t-B) instead of x(t-B) in the solution (4.l9a)-(4.20), where xtJ (t) is given by (5.33).
5.5.3 Let
UE
o
Nonlinear/nonquadratic pr?blems take up the nonlinearfnonquadratic problem of Section 4.6, as de-
fined by (4.54), (4.55a), but with a nonlinear imperfect measurement as in (5.12b). We may again assume a quadratic measure of the intensity of the noise, and of the amplitude of the initial state. (Although at this level of generality this might as well be replaced by any other integral measure). For a given " assume that the associated soft-constrained differential game with CLPS information admits a saddle point, with a value function
.. !
Vet; x), and a feedback minimax strategy Il-y(t; x). We have stressed in Remark 5.2 that the certainty equivalence principle holds regardless of the linear-quadratic nature of the problem at h a n d , l provided that the conditions for the application of Danskin's theorem (cf. Appendix B) are satisfied. We may therefore still define the auxiliary problem as in (5.13)-(5.16), and state Theorems 5.1 and 5.2. Furthermore, notice that the function Vet; x) can be computed off-line as a preliminary step, while a possible tool to solve the auxiliary problem
is the forward Hamilton Jacobi equation (5.37) reformulated here as max
wlh(t;z,w)=y(t)
OW aw J(t; x, u, w) [ -at + -a x W(O;x) = N(x).
+ get; x, u, w) ] = 0,
149
Imperfect State Measurements: Continuous Time
In principle, this equation can be solved recursively in real time. Assume, for instance, that the state space has been discretized with a finite mesh, and Wet; x) is represented by the vector Wet) of its values at the discretization points. Assume further that {)W/{)x is approximated by finite differences. Then, this is a forward differential equation driven by y, of the form
dW(t)
~
Finally, we can, in principle, solve in real time for
xl(t)
= argmax[V(t;x) :c
W(t;x)].
If W is well defined and the above argmax is unique, then by Theorem 5.1, a minimax strategy is u(t)
I
= F(t; Wet), u(t), yet)).
= J-L-y(t;xl(t)).
III
Ii /i
I! I:
;i
'[
5.6
Main Results of the Chapter
I:
This chapter has presented counterparts of the results of Chapter 4 when
I
the measurements available to the controller are disturbance corrupted. For
I
the basic finite-horizon problem where the measurements are of the type
Iii
"
1 -- t
(5.2), the characterization of a near-optimal controller is given in terms of two lliccati differential equations, one of which is the RDE encountered
II
ih Chapter 4 (see (5.6)) and the other one is a "filtering'; Riccati equa-
I!
tion which evolves in forward time (see (5.28)). The controller features a
II
certainty equivalence property, reminiscent of the standard LQG regulator problem (see (5.27), or (5.32)), and it exists provided that, in addition to
:1
!I II
II
if
the condition of nonexistence of a conjugate point to the two RDE's, a
I'
spectral radius condition on the product of the two solutions of the RDE's
I:
(see (5.35a)) is satisfied. The pre~ise statement for this result can be found
ir
in Theorem 5.3, whose counterpart for the infinite-horizon case (which is
Ii
the original four-block problem) is Theorem 5.5.
I
In addition to the standard CLIS information structure, the chapter has also presented results on the imperfect sampled-data measurement case (see·
I
l
150
Chapter 5
Theorem 5.4), and delayed imperfect measurement case (see Theorem 5.7), both on a finite time horizon, with the controller (guaranteeing an attenuation level,
> ,.) in each case satisfying a form of the certainty equivalence
principle. The results and the derivation presented here for the linear problem are based on the work reported in [23], [25], and this methodology has some potential applications to nonlinear/nonquadratic problems as well (as discussed in Section 5.5.3). The basic result on the infinite-horizon problem with continuous measurements (cf. Theorem 5.5; with two ARE's and a spectral radius condition) was first derived in [37], using a method quite different than the one presented here.' Since then, several other derivations and extensions have appeared, such as those reported in [80], [55J,
-r
[50], [51], [71], [59], [81], [82], where the last two employ a direct "completion of squares" method. Some extensions to systems with multiple controllers, using decentralized measurements, can be found in [83]. The idea of breaking the time interval into two segments and looking at two different dynamic optimization problems, one in forward and the other one in backward time (as used here) was also employed in [50]; for an extensive study of the backward/forward dynamic programming approach applied to other types of decision problems the reader is referred to [85]. The results on the sampled-data and delayed measurement cases appear here for the first time.
-;
Chapter 6 The Discrete-Time Problem With Imperfect State ~easurements
-r
6.1
The Problem Considered
We study in this chapter the discrete-time counterpart of the problem of Chapter 5; said another way, the problem of Chapter 3, but with the measured output affected by disturbance: (6.1) (6.2)
.........
The cost function for the basic problem is the same as in (3.2a): K
L(u, w)
= IXK+lI~J + L IXkl~. + IUkl 2 == IXK+lI~J +lIxll~ + lIuW
(6.3a)
k=i
that we shall sometimes write as (6.3b) where z is the "controlled output": (6.4)
with (6.5a) (6.5b) -;
1h 1
152
Chapter 6
(6.5c) For ease of reference, we quote the solution given in Section 3.3 for the full state-information min-max design problem with attenuation level,. Let
{Mk h~K+l be the solution of the discrete-time Riccati equation (3.4), that we shall write here in one of its equivalent forms:
This form assumes that Mk is positive definite. As mentioned in Section 3.2.1, this is guaranteed by the following hypothesis that we shall make throughout the present chapter: QJ > 0, and Ak '] rank [ Hk
(i.e.,
[A~ H~l'
= n,
(6.7a)
is injective). Likewise, we shall assume that
(6.7b) (i.e., [Ak Dk] is surjective). The hypotheses (6.7a) and (6.7b) are less restrictive than assuming complete observability in the first case, and complete reachability in the second. Under the first hypothesis, the optimal full state information controller may be written as
In the present chapter, we assume that only imperfect information on the state
Xk
is available. More precisely, in the standard problem we shall
assume that the information available to the controller at time k is the sequence {Yi,i = 1, ... ,k -I}, that we shall write as yk-l, or Y[l,k-l]. Hence, admissible controllers will be of the form
and the set of all such controllers will be denoted by M.
As in the
continuous-time case, we shall assume that Ek is surjective, and let
(6.10)
I,
Imperfect State Measurements: Discrete Time
153
which is therefore invertible. Dually to (6.5c), we shall also assume (6.11) to simplify the derivation. After solving the problem under these restrictions, we will discuss in Section 6.3 extensions to cases where (6.5c) and/or (6.11) are not satisfied. In the standar d problem,
Xl
will be part of the disturb ance, and we
shall use the notatio n w for {wkh~l' and
(Xl'W) =:w E n:= IR n x W. This extende d disturb ance formula tion is convenient to work with, because it ailows for derivations parallel to the continuous-time case, already discussed in Chapte r 5. However, as pointed out in Chapte r 3 (Section 3.5.2), this case can be embedd ed in the fixed initial state case by includin g one more time step and defining Xl
= Wo,
Xo = O.
As in the continu ous-tim e case, we shall show during the course of the derivat ion how the result should be modified if instead Xl is taken to be zero. To obtain relatively simpler results, this will require the complete reachab ility of (Ak, Dk) over [1, k], k
= 2, ... , which is equivalent to (6.7b)
togethe r with Dl being surjective. Introdu ce the extende d perform ance index (6.12) with Qo positive definite. The problem to be solved is then the followin g:
Proble m P-yo
Obtain necessary and sufficient conditions on I for the upper value of the game with kernel (6.12),
..
.. info sup J-y(J.L,w), ~ ~
IJEM wEn
.i
'.'
I
Chapter 6
154
to be finite, and for such a , find a controller under which this upper value (which is zero) is achieved. The infimum of all such ,'s will be denoted by
The solution to this problem is given in Section 6.2, by employing a forward-and-backward dynamic programming approach.l In Section 6.3, we solve the stationary infinite-horizon problem, which corresponds to the discrete-time four-block HOO-optimal control problem. In Section 6.4, we give the formulas for the case where the restriction (6.11) does not apply. We also generalize the "one-step predictor" imposed by the form (6.10) to an arbitrary "8 steps predictor", including the case
(J
= 0,
i.e., the "filtering" problem. Furthermore we include a dis-
cussion on the nonlinear problem. The chapter concludes with Section 6.5, which summarizes the main points covered.
6.2
A Certainty Equivalence Principle and Its Application to the Basic Problem P"(
To simplify the notation, we shall rewrite (6.1), (6.2) as
(6.13a) (6.13b) and (6.12) as K
J-y = M(XK+I)
+ Lgk(Xk,Uk,Wk) + N(xI}. k=l
For fixed sequences ii T- I := (iiI, U2,"" UT_I) and f/
:= (iiI, i12,···, ih),
introduce the constraint set
nT(iiT-I,i/) = {w 1 For
En IYk =Yk,k= 1, ... ,r}.
(6.14)
an extensive discussion of this approach, as it arises in other contexts, see [85].
155
Imperfect Stale Measurements: Discrete nme
Here, Yk is the output generated by u and w, and the constraint may be checked knowing only w T := (Xl, WI, W2, ... , WT). In fact, we shall need to consider sequences of length AT'
HT
2: T, and we shall write
~
- k = 1, ... , } = {T' w E AT' I Yk = Yk, T
H
(6.15)
•
Let the value function of the full state-feedback two-person dynamic game defined by (6.1) and (6.13), i.e., IXkl~k be denoted Vk(Xk). Introduce the auxiliary performance index T
T T GT(U ,w )
= VT+1(X T+I) + L gk(Xk, Uk, Wk) + N(xd
(6.16)
10=1
and consider the auxiliary problem QT (uT-I, iy-l): nT-I
max
LT
w r - 1 En~=!( gT-2 ,gr-l)
1
I
I
(-T-l U ,w T-I)
(6.17)
We may now introduce the controller which will turn out to be optimal (min-max) for P..,. Let CY be the solution of the auxiliary problem QT, and
X'" be the trajectory generated by u
T
-
1
and CY. The controller we will use
is ·UT
= J-LT•
(~T)
XT
This defines a complete sequence
= /l-T (-T-l U , Y-T-l) ..
U
~
= Ji.(y), where Ji. is strictly causal.
The intuition is clear. The above says that at time k =
T -
(6.18)
T,
knowing Yf; up to
1, one should look for the worst possible disturbance compatible
with the available information, compute the corresponding current
XT
,
and
"play" as if the current state were actually that most unfavorable one. We may now state the main theorem:
Theorem 6.1. If, for every Y E Y, and every
T
E [1, Kj, the auxiliary
problem QT(j1(y),y) has a unique.maximum attained for w
= QT, generat-
ing a state trajectory X'", then (6.18) defines a min sup controller for P.."
and the min-max cost is
~. ru
II
156
Chapter 6
We need to consider an auxiliary problem where 'U T - 1 is fixed at
Proof.
UT-I, and 'U T is given a priori as a function of XT, as 'U T = J.l;(x T). Let
u
T
-
1 .
J.l;
be that strategy.
LeIllIlla 6.1. We have max
GT-I(UT-I,wT-I)
wr-lEn;=~( a r -
2
,u r -
1)
and the optimal &y-l of both problems coincide.
(Notice that this is the property (5.71) we used in Section 5.5.2).
Proof.
~irlp --- -}
In the nroblem of thp. rilrht-hann ....
LJ
- -
----
111--,
i~ lJn~()n~t.r"inpd J -~inrp -----
--
-----------~----
it. in-Au-
--
-----
ences only XT+I and YT. Hence, by dynamic programming, WT maximizes VT+1(XT+I)
+ g(xT,J.l;(XT),WT),
and by Isaacs' equation (see (2.12)) we
know that the maximizing WT is lI;(XT), and the resulting value of this sum isVT(x T).
Continuing with the proof of Theorem 6.1, we may now define WT(uT-1,it- l )
.
wr
-
1
GT-I(UT-I,wT-I)
max
En;::: ~(OT-2,11T-l)
Using the representation theorem (Theorem 2.5), we also have, freezing 'U T : W T(-T-I 'U ,y-T-I) -_ WT
(On the right-hand side,
max En;_1 (U T - 2 ,11 1) r
-
GT(-T-I '(~T) ' U . J.l T xT , WT) .
x; is fixed, as generated by the wT solution of QT).
Assume therefore that
u = il(X;). T
-T) C T'U'Y
~y(-T-I
Since
(-T-2 -T-I) T_1'U'Y'
~y
we get, with this controller, provided that GT(UT,w T) is concave, WT+I(U T, ir) = max GT(uT- 1 . J.l;(X;),WT) wrEn~(ur-1JgT)
(6.19)
157
Imperfect Stale MeasuremenJs: Discrete Time
We conclude that if u
= Ji(y), W
T
decreases with
T,
and that
Vy, Furthermore, since GK(u,w)
= J-y(u,w),
and for any wEn, if y is the
output it generates with (6.18), then necessarily w E nKCJi(y), y), we get (6.20) Now, it is possible for the maximizing player to choose the initial state
x~
and then the w coinciding with lI*(x), and this ensures the bound:
(6.21)
A comparison of (6.20b) and (6.21) establishes the result. The above derivation shows that whatever UT time
T
1
and i/- 1 are, if from
onwards we use (6.18), W T decreases. Hence we have in fact estab-
lished a stronger result than that claimed in Theorem 6.1: L L
IL
Corollary 6.1. For any given T and any past u T , and under the hypothes~
of Theorem 6.1, using the controller Ji of (6.18) will ensure
(6.22) and this is the best (least) possible guaranteed outcome.
We now have the following theorem, which is important in the context of our problem P-y.
I, I
Theorem 6.2. If for all (u, y) E U x y there exists a k* such that problem
Qk* defined by (6.17) fails to have a finite supremum, then problem P-y
has no solution, the supremum with respect to w being infinite for any con troller J.L EM. ,
,
I,
158
Chapter 6
Proof.
The proof is identical to that of Theorem 5.2.
Remark 6.1.
Theorems 6.1 and 6.2 do not depend on the linear-quadratic
character of problem P-y, and Theorem 6.1, in particular, might be interesting in its own sake since it applies to problems with nonlinear state dynamics and/or non quadratic cost functions. However, the use we shall now make of them in the sequel is completely dependent on the specific (linear-quadratic) form of P-y.
To make things simpler, we reformulate problem QT in a more compact way. Letu T- 1 E£2([1,T_1],IRml)~1Rml(T-l),yT-l E£2([1,T-1],IRP)~
IRP(T-l\ zT-l E £2([1, T - 1], IR q) V T- 1 , 7J T-
1,
(T-l, tpT-l,
~
"V- 1, and
1R. q(T-l\ and AT-I, BT-l, C- 1 ,
0 , Vk E [1,K];
(ii) Equation (6.25) has a solution over [1, K
+ 1],
with
(iii) Condition (6.30a) or equivalently (6.30b). Under these three conditions, ,
~
,., and an optimal (minimax) controller
is given by (6.6), (6.25), (6.27), and (6.29) .. If one of the three conditions above is not met, then, :::; ,., i.e., for any smaller" problem P-y has an
infinite supremum for any admissible controller J.l EM.
Proof.
We have seen earlier in Section 3.2 (cf. Theorem 3.2) that the
first condition of the theorem is both necessary and sufficient for the full state information problem to have a solution. It is a foriiori necessary here.
ij
:I I
i
l
tIt. . ~
~...
In view of Theorems 6.1 and 6.2, we must study the role of the other two conditions in the concavity of problems QT. These are nonhomogeneous quadratic forms to be maximized under affine constraints. Strict concavity of such a problem is equivalent to that of its homogeneous part. We may therefore restrict our attention to the case where
11 T-l
= 0 and yT-l = O.
We shall use a forward dynamic programming approach, with
:
I ~
162
Chapter 6
and then (6.32) The function W satisfies the dynamic programming equation
W1(x)
= N(x),
() = [ () + ( ° )]
Wk+l x+
max W k x x,w
Uk x, ,w
(6.33a)
. {x+=/J.:(x,O,W) su b Ject to O=hk(X,W)
(6.33b)
Because of the linear-quadratic nature of the problem, we solve it with (6.34) Let us investigate one step of the forward dynamic programming. To simplify the notation, we shall write x and w for Xk and Wk, X+ for Xk+l, and likewise K and K+ for Kk and KHI. Equations (6.1) and (6.2) are now replaced by
X+ =Ax+Dw,
(6.35)
°=Cx+Ew.
(6.36)
Since E was assumed to be surjective, by an orthogonal change of basis on
w, the above can be cast into the following form, where P pi
= I, and E2
is invertible: (6.37) Moreover, hypothesis (6.11) yields
DE'
= D~E2 = 0,
and hence
D2
= 0,
so that (6.35)-(6.36) now read (6.38)
163
Imperfect Stale Measurements: Discrete Time
In the sequel, we shall let D stand for D l , w for , however that
D1D~
= DD',
and
E2E~
Wl,
and E for E 2 . Notice
= EE' = N. The constraint (6.39)
v = -EilCX, or Ivl 2= Ixl~/N-IC' Thus, (6.33b) now reads Ix+li-+ max [lxli- + IxlQ2 - 'llwl 2- r2Ixl~/N_lcl X,w
also yields
(6.40) subject to
Ax + Dw = x+.
We had assumed that [A D] is surjective. Therefore, this problem either has a solution for every
x+,
or for some
x+
it has an infinite supremum.
In that second case, the problem Qr is unbounded. Hence a necessary condition would be that the first case prevails, leading to the conclusion that extremal trajectories may cover all of IR n at any time step. But now, (6.32) reads
rr;~
[lxrI1, + IXrli-J .
This problem is strictly concave if, and only if, (6.41)
r, so that if Mr + Kr is singular for some r > 0, for any smaller value of r the problem has an (And we shall see that Kr is strictly increasing with
infinite supremum). Now, recall that under our standing hypothesis (6.7), Mr
> O. Thus, a necessary condition 'iT E [2, K
is
+ 1],
Kr
< O.
Furthermore, this condition is also satisfied by Kl case where
Xl
(6.42)
= -r2Qo in the standard
is unknown.
Lemma 6.3. Let P := K
+Q
- r 2 C ' N-1C. A necessary condition for
K+ to be negative definite is that P :S 0 and Ker Pc Ker A
Proof. Ixl~
Suppose that P is not nonpositive definite. Let x be such that
> 0. 2 If Ax
= 0,
problem (6.40) is not concave, and has an infinite
2Here, perhaps by a slight abuse of notatioll\ we use it is not a nonn (because of the choice of P).
IxlJ, to mean x, Px, even though
Chapter 6
164
>
0, to allY (x, w) m~eting the con-
= Ax.
Then (x,O) meets the constraint;
supremum (since we may add ax, a straint). If Ax =F 0, choose x+ thus Ix+lk+
2:
and get Ix+ Ik+
Ixl~
2:
> 0. Likewise, if Ixl~ = 0, but Ax =F 0, pick x+ = Ax,
Ixl~
= 0.
This completes the proof of the lemma.
0
Thus, an orthogonal change of basis on x necessarily exists, such that
A with
P < 0.
= [A
0]
P=
[~ ~]
x= [
~]
The maximization problem (6.40) now reads
where [.If D] is still surjective. Introduce a Lagrange multiplier 2p E IR and notice that DD'
n
_,2 AP- 1A' > 0, so that the stationarity conditions
,
yield X
w
_,2 p-1 A(DD' _,2 AP-1 A,)-1 x+, D'(DD'
and thus
K+
_,2 AP- 1A,)-1 x+ ,
= (AP- 1A _ ,-2 DD,)-1.
(6.43)
We now claim the following further fact:
Lemma 6.4. If for some k, Pk introduced earlier in Lemma 6.3· is not
positive definite, then for the corresponding, we have the inequality,
,..
Proof.
:s
We shall prove that if P is only nonnegative definite, for any
smaller, it fails to be so, so that, according to the above, problem QT fails
165
Imperfect State Measurements: Discrete TIme
to be concave for some
T.
From (6.40),
IX2li = max [-,2IxIIQ2 +lxdQ2 -,2IwlI2_,2Ixl~INC]' :l
Hence, here, PI
X'l,W
0
1
1
= _,2Qo +Ql -,2C~NlICI.
1
1
Since Qo is positive definite,
PI being singular for some, implies that for any smaller value of, it will fail to be nonpositive definite. Suppose that PI is negative definite. It is strictly increasing (in the sense of positive definiteness) with,. But then according to (6.43), because [A D] is surjective, with" and recursively, so are the
[{Ie'S,
[{2
is strictly increasing
as long as P is negative definite.
Thus let k = k* be the first time step where P is singular. The above reasoning shows that, for any' smaller 7, P ","",lill no longer be nonpositive
definite. This completes the proof of the lemma.
We now continue with the proof of Theorem 6.3. As a consequence of Lemma 6.4, (6.41) may be rewritten as
or, if we let ~Ie :=
_,2 [{;I, we arrive precisely at equation (6.25a), while
(6.41) is then equivalent to (6.30). Moreover, the condition P
< 0 is exactly
the positivity condition stated in the theorem, so that the necessity of the three conditions to ensure, P
2:
,*
is now proved. But according to (6.40),
< 0 is clearly sufficient to ensure concavity in the dynamic programming
recursion, and therefore together with (6.30) ensures concavity of all the auxiliary problems. Thus, in view of Theorems 6.1 and 6.2, the proof of the main theorem has been completed.
Remark 6.3.
o
In the case where x 1 is given to be zero, and Dl is surjective,
only the first step differs:
166
Chapter 6
I.e., K2
= 1 2(D 1 DJ.)-1,
or E2
tion (6.25) initialized at EI Proposition 6.1. Let
= DID!.
Hence, we may still use equa-
= 0, with Ql = o.
10 be the optimum attenuation level for the full
state feedback problem (as given in Theorem 3.5), and IN be the one for the Xl
= 0 and N = {N,d
IN
is continuous with
El:+l
converges to DI:D~
current (imperfect state measurement) problem with given by (6.10). If the matrices respect to N at N =
Proof. as NI:
CI:
are injective,
o.
According to (6.25a), if CI: is surjective, --->
O. Thus for NI: small enough, (6.25a) has a solution, and the
positivity condition is met. Furthermore, let I
> 10. By hypothesis, the
positivity condition (3.7) is satisfied, i.e.,
or equivalently D I: D I:'
so that, as
El:+l
< I 2M-l 1:+1
DI:D~,
approaches
(6.30a), at time k + 1, will eventually
be satisfied also. Therefore, for N small enough, never be smaller that
I > IN. But since IN can
10, we conclude that I > IN 2: 10. Since this was
o.
true for any
I > I~' we conclude that IN
Example.
We now revisit the illustrative example of Chapter 3, by
--->
10 as
N
--->
adding an observation noise:
=0 where VI: = N~ WI:
XI:+1
XI:
+ UI: + WI:,
YI:
XI:
+ VI:
Xl
Note that the disturbance is now the two-dimensional vector The performance index is again taken to be 3
J
= L: X~+1 + U~ . 1:=1
I
(WI: WI:)').
~I
167
Imperfect State Measurements: Discrete TIme
Applying the theory developed above, it is a simple matter to check that, as Since D.\; = 1, the nonsingularity condition for the first Riccati equation is M.\; < 'Y 2, and therefore, for N small enough, N
---->
0, E.\;
---->
1 for k
= 2,3,4.
the spectral radius condition p(E.\;M.\;) < 'Y2 will be satisfied. Let us now be more explicit for the case N = 1. Equation (6.6) yields
Since we have EI
= 0, there is no condition to check on MI.
It has been
seen in Section 3.2.4 that the most stringent of the positivity conditions'is here M2 < 'Y 2 , which yields 'Y2 > 1.2899 (approximately). We also find that
The second positivity condition of Theorem 6.3 turns out to be less stringent than its first condition (given above), and hence it does not impose any additional restriction on 'Y. The third (spectral radius) condition, on the other hand, yields E2M2 < 'Y2 E4M4 < 'Y2 E3 M 3 < 'Y2
M2 < 'Y2; M4 < 'Y2; Mj < 'Y2.
The first two are already satisfied whenever 'Y2 > 1.2899. The third one brings in a more stringent condition, namely 'Y
v'5 > 1 + 2""
: : : 1.6180,
z
or 'Y > 2.6180,
which is the optimum attenuation level for this example. Note that, as
I
~I
expected, here the optimum attenuation level is higher than of the "noisefree measurement" case.
6.3
o
The Infinite-Horizon Case
We now turn to the time-invariant infinite-horizon problem, with the pair
(A, H) being observable (A, D) being controllable, and the extended per- .
168
Chapter 6
formance index replaced by +00
J;(u,w)
=
2:=
(IXkl~
+ IUkl2 - /2IwkI2)
k=-oo
We further require that Xk
-+
0 as k
-+
-00 and as k
-+
+00.
We first recall from Section 3.4 (Theorem 3.7) that, for any finite integer 7,
and for a fixed / > 0, the perfect state-measurement game defined
over [7, +00) has a finite upper value if, and only if, the algebraic Riccati equation (6.44)
has a (minimal) positive definite solution M+, which is the (monotonic) l~-nit,
as k
---T
-00,
initialized at Mo
of the solution of the PJ.ccati difference equation (6.6)
= O.
The certainty equivalence principle holds intact here, using the stationary (version of) Isaacs 'equation (2.12). We need only consider the auxiliary problem for
7
= 0: -1
C(-l)(u,w)
2:=
= Ixol~+ +
IXkl~
+ IUkl2 - /2IwkI 2.
(6.45)
k=-oo
Now, making the same change of the basis that led to (6.38)-(6.39), letting
Xi :=
X_i,
and likewise for w and y, and suppressing the "tilde"s in the
equations to follow, we are led to an infinite-time control problem for an implicit system, as given below:
00
C
Ixol~+
+ 2:=(IXkl~_'Y2CIN-1C + /2y~N-1Cxk k=l
The forward dynamic programming technique of the previous section applies. Thus, we again know that the homogeneous problem has a solution for any Xo (i.e., it is strictly concave) if, and only if, the associated algebraic
169
Imperfect Stale Measurements: Discrete nme
Riccati equation
(6.46) has a (minimal) positive definite solution .E+, or equivalently the algebraic Riccati equation
(6.47) admits a maximal negative definite solution K-. Furthermore, the closedloop system matrix is Hurwitz. Of course, Xo is now a function of w, so that G(-l) will be concave only if, m addition, the matrix M+
+ K-
IS
negative definite, or equivalently
(6.48) We may notice that the analysis of the asymptotic behavior of the Riccati equation (6.25a) has already been made in Section 3.4, since up to an obvious change of notation, (6.25a) is the same as (6.6). The positivity --~
condition we impose, however, is weaker here, so that to r.ely on the previous analysis, we have to check that the arguments carryover. But (6.40) still shows that K+ is an increasing function of the number of stages in the game, and the positivity condition (3.7), invoked in Lemma 3.3, was in fact used only to show that the matrix MiJl
+ BkB~
- 1- 2 DkD~ is positive
definite, and this corresponds to the positivity condition here. So we may still rely here on the derivation of Chapter 3 (Section 3.4). To recover the nonhomogeneous terms, let
and compute the maximum in
Xb Wk
of
170
Chapter 6
under the constraint
Since we have already resolved the concavity issue, we need only write the stationarity conditions, which may be obtained with the use of a Lagrange multiplier, say 2Pb corresponding to the above constraint. Toward this end, introduce
in terms of which the stationarity conditions read
(These are in fact equivalent to (6.24)). The multiplier Pk is computed by placing this back into the constraint. Let
AXk - A(I< + Q)-l(QXk
Xk+1
+ ,26) + BUk (6.49)
_
A(I< + Q)-l(I< Xk _,2~k)
+ BUk,
I
It is easily verified that the stationarity conditions may be written as
Pk
-I«XH1 - xHd
Wk
,-2 D' I«Xk+1 - Xk+1)
Xk
(I
0 ; (iii) The global concavity condition (6.48) is satisfied. ~
Then, 'Y
'Y., and a controller ensuring an attenuation level 'Y is given by
(6.62) to (6.64). If anyone of the three conditions above fails for some
k E [1, K], then 'Y :-:; 'Y*. If the initial state is fixed at initialization of (6.61) must be taken as
~2
Xl
= 0,
then the
= III, and the result holds
provided that III > O. For the stationary, infinite horizon problem, all equations must be repiaced by their statiuIliiry counterparts, and (6.66) should be taken over [0,00). Then, if the corresponding algebraic Riccati equations have (minimal) positive definite solutions satisfying (6.48), 'Y
--~T
I
~
'Y* ; otherwise 'Y :-:; 'Y •.
6.4.2
Delayed measurements
We consider here the case where, at time k, the available information is the sequence
yk - 8
For k :-:;
e,
of the output Yi, i
= 1, ... , k -
e,
where
e is a fixed
integer.
no measurement is available. It is straightforward to extend
the certainty-equivalence principle to this information structure, with the auxiliary problem at time
T
being
l
.i Ij I
(6.65) (The definition of n~::::~ is, of course, as in (6.15)). Noting that thus, k
= T - e+ 1, ... , T -
Wk,
1, is unconstrained, we may extend Lemma 6.1:
G r(-r-8. max u · Jt[r-8+1 rj'W r) wreO;_,(il,g) J
I I
r-8) = wr-'eo;:: max Gr-8(-r-8 u ,W .
,
We conclude that the minimax control is obtained by solving the standard problem P-y up to time
T -
e,
and then taking
Wk
= Ilk Cxk) for k = T -
e
1
~
...
175
Imperfect State Measurements: Discrete TIme !:.
f
~--T
up to r - 1, and using
UT
= J.L;(x~) according to the certainty-equivalence
principle. This is equivalent to applying the certainty-equivalence principle to the solution of the delayed state measurement problem of Section 3.2.3 (extended to an arbitrary finite delay). Concavity of the auxiliary problem requires that the conditions of Theorem 6.3 on
~
be satisfied up to time K - B, and that the following maxi-
mization problem T
m,:x{VT+!(XT+I)
+
L
gk(Xk, Uk, Wk)}
(6.66)
k=T-8+I with Uk fixed open-loop, be concave. The condition for this to be true is for the (open-loop) Riccati equation (see Lemma 3.1)
(6.67) to have a solution over [r - B + 1, r], that satisfies
(6.68) The worst-case state
x;
will now be obtained by solving-
l
-i Ij
using
x;_8 obtained from (6.27)-(6.28) where the index r must be replaced
by r - B.
1 Theorem 6.6. In the delayed information problem, with a fixed delay of
I
length B, let the following conditions be satisfied:
(i) The Riccati equation (6.6) has a solution over [1,K + 1], with
I
- I - 'Y -2D k D'k > 0 ; M HI
,
(ii) The set of Riccati equations (6.67) have solutions satisfying (6.68) over
I
[r - B+ 1,r] ;
Chapter 6
176
~1 ,J
(iii) The lliccati equation (6.25a) has a solution over [1, K - () fk+l
+ 1], with
>0;
(iv) Condition (6.30) is satisned for k E [1, K - () + 1]. Then, ,
~
,*,
and the procedure outlined prior to the statement of the
theorem yields an optimal (minimax) controller. If anyone of these four conditions fails, then, S
6.4.3
,* .
o
The "filtering" problem
We assumed, in the standard problem P.., that the information available at time k was Yi, i = 1, ... ,k-l. Hence, we may view it as a one-step predictor problern. Vve consider here the case where in addition, Yk is available at the time Uk is computed. Toward this end, we use the auxiliary criterion
GT
= IXT+l11r+l + IXT I~r + IUT 12 T-l
,21wT 12
+ L(lXkl~k + IU kl 2- ,2IwkI2) - ,2Ixll~o·
k=l
Using Lemma 6.1, we set (6.69) We may now compute, using dynamic programming, over one step max G T +1 =
wrEn.,.
max
w.,.-lEOr_l
max GT] . [ wrlYr
The maximum inside the bracket exists if (6.70) which is the standing assumption anyway, and the maximum value is easily found to be T-l
-IYT - CTXTI~'N;-l
+ IXTI~r + L(lxkl~k + IU kl 2k=l
,2Iwd) - ,2Ixll~o
-,
-.
--·····1··-
177
Imperfect State Measurements: Discrete Time
,J
-,
where (6.71)
(Notice that when DE' is not zero, the term in y-Cx is more complicated). We may use this new criterion in the theory as developed in Section 6.2. The only difference being in the final term, it will only show up in the final term of (6.24b) which is now replaced by
As a consequence, we have X• T
=
~T X T - , -2~ L..T
[6- TXT ~T + ,
1 2C'NT T ( YT -
C~T)l XT ,
and hence, (6.28) is replaced by (6.72)
while the global concavity condition (6.30) becomes (6.73) We have therefore obtained the following result:
Theorem 6.7. For the discrete-time disturbance attenuation problem of
Section 6.1, assume that also the current value of the output is available for the controller, and for this problem let ,. again denote the optimum attenuation level. Let the conditions of Theorem 6.4, with (6.30) replaced by (6.73), stand. When,
~
,., the solution is obtained as in Theorem 6.3,
except that (6.28) is now replaced by (6.72) and (6.29), which is
Jlt(xZ),
Uk
=
modified accordingly. (The notation used in (6.72) and (6.73) is
defined in (6.69) to (6.71)).
o
178
Chapter 6
6.4.4
Nonlinearfnonquadratic problems
Consider the nonlinear problem of Section 3.6, but with an imperfect nonlinear measurement as in (6.13b). We may keep the quadratic measure of noise intensity and of initial state amplitude, or change them to any other positive measure. Assume that the soft-constrained dynamic game associated with this problem has been solved under the CLPS information pattern, leading to a value function V.I;(x) and a feedback minimax strategy Ilk(x). We have already pointed out in Remark 6.1 that the certainty equivalence principle (i.e., the results of Theorems 6.1 and 6.2) holds for the nonlinear problem as well, provided that the various maxima involved exist and are unique. Therefore, we may still define an auxiliary problem as in (6.15) to (6.17), and employ Theorem 0.1. Notice that the computation of the value function V can be done offline. Then, the forward dynamic programming approach as in (6.33) with the constraints set to and can in principle be carried out recursively in real time, yielding W'\:+l when
Y.I; becomes available. Therefore,
which can also, in principle, be computed at that time, and placed into
Ilk+! to yield the minimax controller, J.lk+l(x:t~).
6.5
Main Points of the Chapter
This chapter has developed the discrete-time counterparts of the continuous-time results of Chapter 5, by establishing again a certaintyequivalence type of decomposition.
For linear systems where the con-
troller has access to the entire past values of the disturbance-corrupted state measurements, the characterization of a controller that
guaran~ees
a
,i' Imperfect State Measurements: Discrete TIme
179
given disturbance attenuation level is again determined by the solutions of two (discrete-time) Riccati equations «6.6) and (6.25a)), and a spectral radius condition (6.30b). As can be seen from (6.27)-(6.29), the controller is a certainty-equivalence controller, with the compensator (or estimator) having the same dimension as that of the plant. A precise statement on the optimality of such a controller has been given in Theorem 6.3, whose counterpart in the "filtering" case where the controller has access to also the current value of the measurement vector is Theorem 6.7. The chapter has also presented results on the infinite-horizon case (see Theorem 6.4), on problems with more general cost functions with cross terms in the state and control (cf. Theorem 6.5), and delayed information structures (cf. Theorem 6.6).
Furthermore, some discussion on the nonlinear/nonquadratic
problem has been included in the last subsection of the chapter. The solution to the basic problem covered by Theorem 6.3 was first outlined in [23], where complete proofs were not given. Some early work on this problem has also been reported in [60]. The form in which the results are established here, and the solutions to the more generai problems discussed, also involve the contributions of Garry Didinsky, a Ph.D. student at UI. As far as we know, this is the first published account of these results, which completes the theory of HOO-optimal control for linear-quadratic systems, in both continuous and discrete time, and under arbitrary information structures.
I
I
!
Chapter 7 Performance Levels For Minimax Estimators
7.1
Introduction
In the previous two chapters, in the development of the solutions to the disturbance attenuation probiem with imperfect measurements in continuous and discrete time, we have encountered filter equations which resemble the standard Kalman filter when the ,,-:eighting on the state in the cost function (i.e., Q) is set equal to zero. In this chapter we study such problems, independently of the analysis of the previous chapters, and show that the appearance of the Kalman filter is actually not a coincidence. The commonality of the analysis of this chapter with those of the previous ones is again the use of game-theoretic techniques. To introduce the class of estimation problems covered here, let us first consider the discrete-time formulation. Suppose that we are given the linear time-varying system
along with the linear measurement equation
(7.2) and are interested in obtaining an estimate, 6K, for the system state at time k =
K,
using measurements collected up to (and including) that time,
Performance Levels for Mini= Estimators
181
Y[O,Kl' The error in this estimat e is
(7.3) which depend s on the system and measur ement noises ({ Wk, k ~ O} and {vk,k ~ O}, respectively), as well as on the initial state Xo. The Bayesian approac h to this (filtering) problem assigns probab ility distribu tions to the unknowns, and determines DK as the function that minimizes the expecte d value ofa particu lar norm of eK. For a large class of such norms, the unique optimu m is the conditional mean of XK, which is well-known to be linear if the random vectors are also jointly Gaussia n distribu ted. Moreov er if the noise sequences are indepen dent white noise sequences, then the conditi onal . mean is generat ed recursively by the Kalman filter [I]. The alterna tive (minimax) approac h to be discussed here, however, is purely determi nistic. Toward formulating this class of problems, let IIw,v,xoliK denote some Euclidean norm of the (2I< + 2)-tupl e
(W[O,K-ll, V[O,Kl, xo), and introdu ce the mappin g (7.4) which is the counter part of the mappin g Tp of Section 1.2. Then, consist ent with (1.2), we will call an estimat or DK minima x if the following holds: infoK supw,v,':o {IToK (w, v, xo)1 IlIw, v, xollK}
= sUPw,v,':o {ITo:.: (w, v, xo)1 IlIw, V, xollK} Here
(7.5) =:
IK'
IK is the minimax attenua tion level for this I< stage filtering problem.
In the above formulation, if the measur ements available for the estimation of x K are Y[O,K -ll' then this constitu tes the i-step prediction problem; if, on the other hand, we are interest ed in the estimat ion of Xl, for some i < I.y. Note that [or 'Y > the minima x policy is indepen dent of 'Y, and furtherm ore supinfs up L-y y
U
:r,W
1',
= O.
We now claim that (by continu ity) the same holds for 'Y = '}', and hence conditi on (*) in the limit is equival ent to (7.17). This is-true because , as just proven, 7~ sup w,v,z
{ITo(w, v, x)lIlIw, v, xII} 7. Note that 7 > 0 since we had taken (M,D) '(M,D) completes the proof of all parts of the Lemma.
=f.
O. This
We are now in a position to present the solution to (7.13).
Theor em 7.1. The minima x estimat or for the static problem of this section is = 6, which is defined by (7.16). The minima x attenuation level 'Y. is equal to 7, which is the square root of the maximu m eigenvalue of
o·
(7.18). Ii
186
Chapter 7
Proof.
In view of Lemma 7.1, we have for all (x, w, v),
L.y(8(y); x, w, v) ~ 0 which implies
and since
i is the smallest
suc~ scalar,
(7.13) follows.
o
We now conclude this section with a crucial observation.
Corollary 7.1. The minimax estimator, 0·, of Theorem 7.1 is a Bayes estimator for z under the measurement y, when x, w, v are zero-mean in-
dependent Gaussian random vectors with respective covariances I, I, and R- i
.
Equivalently, under this a priori distribution, 8' is the conditionai
mean:
o·(y) Proof.
= E[z I y]
(7.19)
This follows from a well-known property of Gaussian distributions
[1].
0
Remark 7.l. The above result should not be construed as saying that the Gaussian distribution is least favorable [38] for a game with kernel as in (7.13); it is simply an interpretation given to ,., which will prove to be very useful in the next section.
o
Remark 7.2. Lemma 7.1 as well as Theorem 7.1 extend naturally to the case where x, w, v, y and z belong to infinite-dimensional complete inner product (Hilbert) spaces. In this case M, D, Hand F will be taken as bounded linear operators, and R will be taken as a strongly positive linear operator. The Euclidean norms will be replaced by the norms on the corresponding Hilbert spaces, and the scalar products by inner products. __ b
Then, the counterpart of (7.16) will be
8(y)
= (M H· + Dr)(R- 1 + H H· + F r)-ly,
(7.20)
Performance Levels/or Minimax Estimators
187
where "'" denotes the adjoint of a linear operato r. Conditi on (7.18) is also equally valid here, with ",,, replaced by"'''. o
7.3
Optim um Perfo rmanc e Level s
7.3.1
Discre te time
We now return to the dynami c estimat ion problem s formula ted in Section 7.1, where we take the disturb ance norm as K-I
Ilw,v,xolli == Ixol 2 + 2:
{I W kI
2
+ IVkl~.} + IVKI~K
(7.21)
k=O
where Rk is a positive definite weighting coefficient matrix, for every k 0, 1, ... , K. Note that there is no loss of general ity in taking the norms on
=
Xo and Wk as standar d Euclide an norms, because any non unity weighting
can be absorbe d into the problem parame ters, by redefining Xo and
Wk.
a) Filteri ng Since we have a termina l state estimat ion problem (see (7.4)), this problem is no different from the static one formula ted in the previous section, and hence Theore m 7.1 equally applies here. To see the dir~ct corresp ondence, let us rewrite Xk in the form (with some obvious definitions for ~k and N k ):
(7.22a) where Then
W[O,k_l] y[O,K]
is a column vector, made up of the
l.
can be rewritt en as Y[O,K]
__ b
wi's, l = 0, ... , k -
= H Xo + FW[O,K -1] + V[O,K]
(7.22b)
for some Hand F, where F is lower block triangu lar. Now letting M := ~K, D := N K, the problem becomes equivalent to the one of the previous section, and hence by Corollary 7.1, (7.23) .
Chopter7
188
noise sewith {Wk}, {Vk} being indepen dent zero-m ean Gaussia n white N(O,I) , quences, where Wk '" N(O,I) , Vk '" N(O,R ;l). Further more Xo '" ed by and it is indepen dent of {Wk} and {vd. Conseq uently, oK is generat minima x a Kalman filter. In fact, the same Kalman filter generat es the [0, J{j and estimat e for the filtering problem defined on any subinte rval of using the same parame ter
valu~s.
optiNow, to determ ine 'Y., we have to obtain the counter part of the equival ent mizatio n problem (7.17). Using the corresp ondenc e above, the optimiz ation problem can easily be seen to be
(7.24)
subject to the system equatio n
tic (LQ) We seek the smalles t value of 'Y2 under which this Linear- Quadra 3.1, we control problem has a finite (equiva lently zero) cost. Using Lemma now arrive at the following result:
Theor em 7.2. Let
rF
be the set of all 'Y
>
°
for which the sequence of
symme tric matrices {SdZ= K generated by the Riccati equation (7 .25a)
SK
2
CKRK CK - 'Y- I
Pk
:= I
+ D~Sk+1Dk
satisfy the following two conditions: i) Pk , k = 0, ... , J{
-
1, are positive definite
(7.25b)
189
Performance Levels for Minimax Estimators
ii) So
+I
is nonnegative definite
Then, IF := inf {I : I E fF} is the minimax attenuation level lie for the filtering problem.
b) Prediction By the same reasoning as in the filtering problem, the minimax i-step predictor is
where the statistics on the disturbances are as before. Hence 0* is a Kalman predictor. The counterpart of the minimization problem (7.24) in t.his case is (7.26) Hence, the result of Theorem 7.2 equally applies here, provided that we take Rk
= 0 for k > J{ -i.
Denoting the minimax attenuation level in this
case by I}!, clearly we have IP 2: IF.
c) Smoothing Here Theorem 7.1 and Corollary 7.1 are directly applicable, since there is no causality requirement.ol;l,J;he estimator. This also makes the resulting estimator an Hco-optimal smoother. Using the notation of (7.6), we have k=O,I, ... ,I{ where the distributions are agam as before. Hence, the standard Bayes smoot hers (involving two recursive equations - - one in forward and one in backward time [1]) are also minimax estimators. The condition (7.17) now becomes
"
.,
190
Chapter 7
under the system equation (7.1). The following counterpart of Theorem 7.2 readily follows for the smoothing problem. Theorem 7.3. Let fs be the set of alII>
°
for which the sequence of
symmetric matrices {Sd~=K g~nerated by the lliccati equation
Sk
= CkRkCk - ~Nk + A~Sk+l [I I
DkPk- 1 D~Sk+d A k ; (7.28a)
(7.28b)
sa.tisfy the following two conditions:
i) Pk, k ii) So
= 0, ... , [{ -
+I
1, are positive definite
is nonnegative definite.
Then, IS := inf{! : I E fs} is the minimax attenuation level for the smoothing problem.
7.3.2
o
Continuous time
For the continuous-time problem, we take as the counterpart of (7.21): (7.29) Then it is a simple exercise to write down the counterparts of the three minimization problems (7.24), (7.26) and (7.27), and the associated (continuoustime) lliccati equations. We assume below that all coefficient matrices have piecewise continuous entries on [O,t f). a) Filtering The optimization problem is: min W,Xo
{-~lx(tJ W+ Ixol 2 + Jo/" I
[x(t)'C'RCx(t) + W(t)'W(t»)dt} (7.30)
191
Performance Levels for Minimax Estimators
subject to (7.8). Then, the following theorem follows from Lemma 4.1:
Theorem 7.4. Let fp be the set of all I > 0 for which the Riccati equation below does not have any conjugate point in the interval [0, t J]:
S+SA+A'S- SDD'S+ G'RG
= 0,
(7.31)
and furthermore that
S(O) Then, IF := inf h
:I
+I
~
O.
E fp} is the minimax atien uation level for the
filtering problem.
b) Prediction If we allow a
(J
unit delay in the utilization of the measurement, (7.30)
is replaced by
Hence, to obtain the minimax attenuation level, I]" we simply have to take
R(t)
== 0, for t > t J -
(J,
in the statement of Theorem 7.4.
c) Smoothing If we take the integrated smoothing error as
(7.33) the associated optimization problem becomes min W,Xo
{lxol2 +
it! 0
[X(t)'[G'RG - -;N(t)]x(t) I
+ w(t)'w(t)]
for which the counterpart of Theorem 7.4 is the following.
dt}
(7.34)
192
Chapter 7
Theorem 7.5. Let
fs
be the set of all I
>
0 for which the Riccati dif-
ferential equation below does not have any conjugate point in the interval
[0, t,]: .
1
S+ SA+ A'S - SDD'S + C'RC- 2N I
= OJ
S(t,)
= 0,
(7.35)
and furthermore that
S(O)
I
+ I 2: o.
J
Then, IS := inf {I : I E f F} is the minimax (H OO ) attenuation level for the smoothing problem.
o
j
r
l
1
i In all cases above the infinite-dimensional versions of Theorem 7.1 and Corollary 7.1 (ida Remark 7.2) can be used to establish the Bayes property of the minimax estimators. The corresponding distributions for the disturbances are again Gaussian, with w(t), v(t) being independent zero-mean white noises (more precisely, their integrals are Wiener processes), with the associated covariances being I6(t) and R(t)6(t), respectively, where 6(.) is the Dirac delta function. Hence, again the well-known Bayesian filter, predictor and smoother equations generate the minimax estimators. We should also note that if sampled (instead of continuous) measurements are available, the Riccati equations above will have to be modified to accommodate also this type of measurement schemesj this should now be an easy exercise, given the analyses of the previous chapters on the sampled-data information pattern .. Furthermore, if the initial state Xo is instead a known quantity (say the zero vector), then there would not be any need for the second conditions in the theorems above (i.e., those that involve S(O)).
7.4
Summary of Main Results
This chapter has provided a nutshell analysis of a class of minimax filter, prediction and smoothing problems for linear time-varying systems
f
-1
Performance Levels for Minimax Estimators
193
in both discrete and continuous time, by making use of the saddle point of a particular quadratic game. The main structural difference between the performance index here and those adopted in the previous chapters is that here the "gain" is from the disturbance(s) to the pointwise (instead of cumulative) output (which in this case is the estimation error). This differ-
I J
J
r
l
1
i
f
1
ence in the cost functions leads to an important structural difference in the solutions, in that for the class of problems studied here the minimax decision rules (estimators) can be obtained without computing the associated. i·
minimax attenuation levels - - a feature the decision rules (con trollers) 0 b-
I
tained in the previous chapters did not have. As a result, here the minimax estimators and the associated performance levels can be determined independently, with the former being Bayes estimators with respect to Gaussian distributions (which then readily leads to the Kalman-type recursive structures), and the latter (that is, the minimax disturbance attenuation levels) determined from the solutions of some related linear quadratic (indefinite) optimal control problems, as presented in Theorems 7.2-7.5. The derivation here is based on the recent work reported in [12], though
i:
the fact that these minimax estimators are Bayes estimators with respect to Gaussian distributions had in fact been presented some twenty years ago in [28]. The authors' primary interest in [28] was to obtain a complete characterization of the evolution of the uncertainty in the value of the state X/c
(or x(t)), caused by the norm (or energy) bounded disturbances w, v
and
XQ,
and in the light of the measurements received. Indeed, it was
shown in [28] that these uncertainty sets are ellipsoids, whose centroids (which are minimax estimators) are generated by Kalman filter (predictor or smoother) type equations, and independently of the magnitudes of the norm bounds.
i'l'I'
!II
Chapter 8 Appendix A: Conjugate Points
We provide, in this appendix, a self-contained introduction to conjugate points as they arise in dynamic linearrquadratic optimization, and in par-
ticular in the context of linear-quadratic differential games. The results presented below are used in Chapter 4, in the proof of some key results in the solution of the continuous-time disturbance attenuation problem. We consider a two-player system in JR n, defined over a time interval
x(t)
= A(t)x(t) + B(t)u(t) + D(t)w(t), BO
The matrix functions An,
x(O)
= Xo.
and D(.) are assumed to be piecewise
continuous and bounded over [0, t j J, and have dimensions n and n
X m2,
(8.1)
X
n, n
X ml
respectively. We are also given a nonnegative definite matrix
Qj and a nonnegative definite matrix function Q(.), both of dimension ,
-- I
n X n. We let
Ixl~1 := x'QjX Let U
= L2([0,tjJ,JR ml),
and
W
Ilxll~:= lotI x'(t)Q(t)x(t)dt.
= L~([0,tjJ,JRm2), lIull and IIwl! denote the
L2 norms of u E U and w E W. For every positive 'Y E JR+, let J, : JRn
X
U
X
W
-+
JR be given by
195
Appendix A: Conjugate Points
where t
1-+
x(t) is the unique solution of (B.1). When there is no source of
ambiguity, we shall suppress the subindex "I of J. We first consider the minimization of J with respect to u, for fixed The function u
u
1-+
x is affine, and (u, x)
1-+
1-+
WfW.
J is quadratic, and therefore
J(xo; u, w) is quadratic nonhomogeneous. Since QJ 2: 0, and Q 2: 0,
it follows that J is convex, and J(xo, u, w) here being that J(xo; u, w) 2:
lIull
2
-
-+ 00
as
Ilull -+ 00
(the reason
"I2I1wW). Therefore J has a (unique)
minimum, which can be obtained using the standard necessary conditions of the calculus of variations, or the Pontryagin minimum principle, -as
u(t)
= -BI(t) .. (t)
(B.2a)
where .\(t) is given, along with the optimal trajectory, as the solution of the Hamiltonian system (a two point boundary value problem)
x = Ax- BBI).. + Dw, j
x(O)
= Xo
(B.2b)
I
= -Qx- AI)..,
(B.2c)
It follows from an analysis identical to the one to be developed shortly for
the maximization problem that there exists a solution f{ (t) to the associated Riccati differential equation (RDE)
i< + f{ A + AI f{ -
f{ BB' f{
+ Q = 0,
(B.3)
and that the optimal control and optimal cost can be written in terms of f{
and the associated variables. Let us now turn to the maximization problem.
We first prove two
theorems (namely, Theorems B.1 and B.2, to follow) that can be obtained by elementary means. We shall then state and prove a more complete result, given as Theorem B.3. Introduce the following Riccati differential equation for a matrix S (the "maximization Riccati equation")
(B.4).
-I
196
Chapter 8
Whenever necessary t.o indicate the explicit dependence .on the parameter
P
" we shall den.ote the s.oluti.on .of this equati.on by S1'O.
By classical
n
9.
Proof.
The "if' part has already been proven in Theorem 8.1; therefore
we prove here only the "only if' part. Toward this end, suppose that we take I ~
9.
Let hkh~o be a monotonically decreasing sequence in IR +
with limit point
/lSk(t·)/I
-+ 00
9, and write Sk for S,,(,.
as k
-+ 00.
We know that for some t* E [0, t J ),
We now need the following fact":
Lemma 8.1. There exists a fixed
x
E IRn such that the sequence
{/x/th>o is unbounded.
Proof of the Lemma. definite for each k
~
According to Proposition 8.1, Sk is nonnegative
0, so that a valid norm, equivalent to any other one, is
its trace. Hence Tr[Sk)
-+ 00.
As"a consequence, at least one of the diagonal
elements is unbounded, since otherwise the trace would be bounded by the (finite) sum of these bounds" Now, picking
x as the basis vector associated
with that diagonal element proves the lemma. Note that at the expense of taking a subsequence of the k's, we may assume that "x/l~.
-+ 00.
0
j Chapter 8
200
Now, returning to the proof of Theorem 8.2, we let (-'.) denote the
~.
I)
state transition matrix associated with A, and choose
Xo so that w
= (0, t*)x -
= °for t E [O,t*)
1 '
(0, t)B(t)u(t)dt,
will yield x(t*)
J..,(xo;u, w) ;::: Ix(t, )Itf
I
t
= x.
For all such w's
+ itf (Ixl~ + lul 2 -
/2IwI2)dt.
t'
In view of (8.7) on the interval [to , t, J, this implies maxw J"'k (xo; ti, w)
-+ <Xl.
But we also have
Consequently, sUPw J..,(xo; u, w) ;::: maxw J..,k(XO; u, w), for all k ;::: 0. Hence the supremum of J.., over w E W is infinite.
~" Remark 8.3.
that concerns
The result of Theorem 8.2 is the only one in this appendix
J-:y.
Before we state and prove the next theorem, we need to introduce Caratheodory's canonic equations and prove two lemmas. Let P(·) be a symmetric, piecewise continuous, bounded matrix function on [0, t_f], and let X(·) and Y(·) be two n × n matrix functions defined by

Ẋ = AX − PY,  X(t_f) = I,  (8.9a)
Ẏ = −QX − A'Y,  Y(t_f) = Q_f.  (8.9b)

Notice that X and Y are well defined over [0, t_f], and that [X'(t), Y'(t)]' has a constant rank n. As a matter of fact, it is the transition matrix generating all the solutions of the Hamiltonian system

ẋ = Ax − Pλ,  (8.10a)
λ̇ = −Qx − A'λ,  (8.10b)
for all possible initial states. Consider also the associated Riccati differential equation in K:

K̇ + KA + A'K − KPK + Q = 0,  K(t_f) = Q_f.  (8.11)

Lemma 8.2. The RDE (8.11) has a bounded solution over [0, t_f] if, and only if, the matrix function X(·) that solves (8.9) is invertible over [0, t_f]. Otherwise, the finite escape time t* of (8.11) is the largest t* < t_f such that X(t*) is singular.
Proof. By continuity, the matrix X is invertible in a left neighborhood of t_f. Let K(t) = Y(t)X⁻¹(t), and check using (8.9) that this matrix function K is indeed the (unique) solution of (8.11) in that neighborhood. Hence, if X is invertible over [0, t_f], K is defined over that interval. Conversely, assume that K(·) is defined over [t, t_f], and let Φ_K(·,·) be the transition matrix associated with A − PK. Let X(t) = Φ_K(t, t_f) and Y(t) = K(t)X(t), and again check directly that they constitute a solution of (8.9). As a transition matrix is always invertible, this completes the proof of the lemma. □

Lemma 8.3. For any fixed ξ ∈ ℝⁿ, let x(t) = X(t)ξ and λ(t) = Y(t)ξ be the solution of (8.10) initialized at x₀ = X(0)ξ. Then

|x(t_f)|²_{Q_f} + ‖λ‖²_P + ‖x‖²_Q = ξ'Y'(0)X(0)ξ = ξ'X'(0)Y(0)ξ.  (8.12)
Proof. Compute

d(λ'x)/dt = −|x|²_Q − |λ|²_P,

and integrate this from 0 to t_f to obtain the first expression. Doing the same calculation with Y'X shows that this matrix remains symmetric for all t. (This also explains why K = YX⁻¹ is symmetric.) □
We are now in a position to state and prove the following theorem, which holds for every initial state x₀, but says nothing about the case γ = γ̂.

Theorem 8.3. For any fixed x₀ ∈ ℝⁿ and u ∈ U, J_γ(x₀; u, w) has a finite supremum in w ∈ W if γ > γ̂, and only if γ ≥ γ̂.
The "if" part is Theorem 8.1. We prove here the "only if" part.
Notice that the function w
~
x is affine, and therefore w
~
J..,.(xo; u, w) is
quadratic nonhomogeneous. If its homogeneous part is not concave, it has an infinite supremum. This in turn is characterized by the fact that this homogeneous part, J..,.(O; 0, w), is positive for some w fact, if this happens, multiplying
w by
= w.
As a matter of
a positive number a will multiply
the linear terms in any J..,.(xo; u, w) by a, and the quadratic terms by a 2 , so that the quadratic terms will eventually dominate for large values of a. Therefore henceforth, we investigate the case Xo = 0, u
==
o.
= -,-2DD', we associate with this maximization problem a pair of square matrices, X..,.(t), Y..,.(t). For, = 9, we Applying Lemma 8.2 with P
know that there is a conjugate point at some t* E [0, t J). Therefore, by Lemma 8.2, X::y(t*) is singular. Let ~
t= 0 in IR
n
be such that X::y(t*)~
Then, because the matrix [X', Y'l' is always of rank n, Y~(t*)~ ..,.
t=
= O.
0, and
remains nonzero in a right neighborhood of t*. Let wE W be defined by
wet)
={
0
if
t E [0,1*)
,-2D'Y~
if
tE[t*,tJj.
Applying Lemma 8.3 between t* and t J, (8.12) yields J::y(O; 0, w) also that
IlwW > O.
Now, take any,
= O.
Notice
< 9. Again we have
Hence J..,.(O; 0, w) is unbounded, and so is J..,.(xo; u, w) for any Xo E IR n and uEU.
Remark 8.4. It can further be shown that at γ = γ̂ the conjugate point is in fact at t = 0, and that J_γ̂(x₀; u, w) has a finite supremum if, and only if, x₀ ∈ Im X_γ̂(0).
"
203
Appendix A: Conjugate Points
We now turn to the investigation of¹

sup_w min_u J_γ(x₀; u, w).

To this (max min) optimization problem, we now associate a new Riccati differential equation:

Ż + ZA + A'Z − Z(BB' − γ⁻²DD')Z + Q = 0,  Z(t_f) = Q_f.  (8.13)

A finite escape time for (8.13) is also called a conjugate point. A proof identical to the one of Proposition 8.1 yields:

Proposition 8.4. Whenever it exists, Z_γ(t) is nonnegative definite, and it is positive definite if either Q_f or Q(·) is positive definite. □
The counterpart to Theorem 8.1 is the following:

Theorem 8.4. If Z_γ is defined over [0, t_f], then

max_w min_u J_γ(x₀; u, w) = x₀'Z_γ(0)x₀.  (8.14)
Proof. The proof relies upon the method of completion of squares. Using (8.13) and (8.1),

d(x'(t)Z_γ(t)x(t))/dt = −|x|²_Q − |u|² + γ²|w|² + |u + B'Z_γx|² − γ²|w − γ⁻²D'Z_γx|²,

so that, ∀(u, w) ∈ U × W,

J_γ(x₀; u, w) = x₀'Z_γ(0)x₀ + ‖u + B'Z_γx‖² − γ²‖w − γ⁻²D'Z_γx‖².  (8.15)

¹Proofs of some of the results to follow are quite similar to their counterparts in the pure maximization problem already discussed, and hence they will not be included here.
This expression shows that if the "players" are allowed to use closed-loop controls, of the form u(t) = μ(t, x(t)), w(t) = ν(t, x(t)), then the pair

μ*(t, x) = −B'(t)Z_γ(t)x,  ν*(t, x) = γ⁻²D'(t)Z_γ(t)x  (8.16)

is a saddle point. However, we are looking here for open-loop controls. Let in the sequel x* be the trajectory generated by the strategies (8.16), and let u*(t) := μ*(t, x*(t)), w*(t) := ν*(t, x*(t)) be the open-loop controls on that trajectory.

For any fixed w, we may choose the control u(t) that would be generated by μ* together with that w. We see that then J_γ ≤ x₀'Z_γ(0)x₀. Therefore, a fortiori,

min_u J_γ(x₀; u, w) ≤ x₀'Z_γ(0)x₀,

and since this is true for all w ∈ W,

sup_w min_u J_γ(x₀; u, w) ≤ x₀'Z_γ(0)x₀.  (8.17)

Let us now study the optimization problem min_u J_γ(x₀; u, w*). In view of (8.2), its unique solution is given by u = −B'λ, where λ is the solution of

ẋ = Ax − BB'λ + γ⁻²DD'Z_γx*,  x(0) = x₀,
λ̇ = −Qx − A'λ,  λ(t_f) = Q_f x(t_f).

It is easy to check directly that x = x*, λ = Z_γx* =: λ* is a solution of these equations. But then the pair (x*, λ*) satisfies the Hamiltonian system of the form (8.10), with P = BB' − γ⁻²DD'. Therefore, we can apply Lemma 8.3 to arrive at

min_u J_γ(x₀; u, w*) = x₀'Z_γ(0)x₀.

This, together with (8.17), proves the claim. □
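To make (8.14)-(8.16) concrete, the sketch below integrates (8.13) backward, then reads off the game value x₀'Z_γ(0)x₀ and the saddle-point gains of (8.16) at t = 0. The scalar data are purely illustrative assumptions:

```python
import numpy as np

def Z_backward(gamma, A, B, D, Q, Qf, tf, n_steps=4000):
    """Integrate the sup-min RDE (8.13) backward from Z(tf) = Qf:
        Zdot + Z A + A'Z - Z (B B' - gamma^{-2} D D') Z + Q = 0.
    Returns the list [Z(0), ..., Z(tf)] on a uniform grid."""
    dt = tf / n_steps
    P = B @ B.T - (D @ D.T) / gamma**2
    Z, out = Qf.copy(), [Qf.copy()]
    for _ in range(n_steps):
        # step from t to t - dt:  Z <- Z + dt*(ZA + A'Z - ZPZ + Q)
        Z = Z + dt * (Z @ A + A.T @ Z - Z @ P @ Z + Q)
        out.append(Z.copy())
    return out[::-1]

A, B, D, Q, Qf = (np.array([[v]], dtype=float) for v in (-1, 1, 1, 1, 0))
Zs = Z_backward(gamma=0.8, A=A, B=B, D=D, Q=Q, Qf=Qf, tf=5.0)
x0 = np.array([1.0])
print("game value (8.14):", float(x0 @ Zs[0] @ x0))
print("gain of mu* at t = 0:", float(-(B.T @ Zs[0])[0, 0]))
print("gain of nu* at t = 0:", float((D.T @ Zs[0])[0, 0] / 0.8**2))
```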
A reasoning identical to the one used in the proof of Proposition 8.2 now leads to:

Proposition 8.5. For γ large enough, (8.13) has a solution over [0, t_f]. □

As a result, we may introduce the following nonempty set as the counterpart of Γ:

Γ* := {γ ≥ 0 | ∀γ' ≥ γ, the RDE (8.13) has a solution over [0, t_f]}.  (8.18a)

Furthermore, let

γ* := inf{γ ∈ Γ*}.  (8.18b)

Again a proof identical to the one of Proposition 8.3 yields the following result:
Proposition 8.6. For γ < γ*, the RDE (8.13) has a conjugate point. □

Now, Theorem 8.2 does not readily carry over to the sup min problem, because in the time interval [0, t*], u might be able to drive the state through an x(t*) for which the value x'(t*)Z_γ(t*)x(t*) does not go unbounded as γ ↓ γ*. We turn therefore toward establishing the counterpart of Theorem 8.3.
Theorem 8.5. For any fixed x₀ ∈ ℝⁿ, the function min_u J_γ(x₀; u, w) has a finite supremum in w ∈ W if γ > γ*, and only if γ ≥ γ*.
Proof. We first note that, using (8.2), an expression for min_u J_γ is given by

min_u J_γ(x₀; u, w) = x₀'λ(0) + ∫₀^{t_f} (w'D'λ − γ²|w|²) dt.  (8.19)

The function w ↦ (x, λ) in (8.2b)-(8.2c) is still affine, and hence the above expression still defines a quadratic nonhomogeneous function of w. Therefore, as in Theorem 8.3, we study only the homogeneous term, which is given by (8.19) with x₀ = 0. Let X_γ(t), Y_γ(t) be the solutions of (8.9) with P = BB' − γ⁻²DD'. (Note that in this case (8.11) is in fact (8.13).) Let t* be the conjugate point corresponding to γ*, and as in Theorem 8.3 let ξ ≠ 0 be such that X_γ*(t*)ξ = 0, which ensures that Y_γ*(t*)ξ ≠ 0. The control

w̄(t) = 0 for t ∈ [0, t*),  w̄(t) = γ*⁻²D'Y_γ*(t)ξ for t ∈ [t*, t_f],

will lead to x(t) = X_γ*(t)ξ, λ(t) = Y_γ*(t)ξ from t* on, and therefore, by applying Lemma 8.3,

min_u J_γ*(0; u, w̄) = 0,

although ‖w̄‖ > 0. Therefore, as previously, for any γ < γ*, min_u J_γ(0; u, w̄) = (γ*² − γ²)‖w̄‖² > 0, and the supremum over w ∈ W is infinite. □
Although Theorem 8.5 seems to be comparable to Theorem 8.3, the situation is in fact significantly more complicated in the present case than in the case of pure maximization. What happens at γ = γ* depends on further properties of the conjugate point. See [21] for a more complete analysis. The following example is an extension of the example first presented in [21], and shows some of these intricacies.

Example 8.1. Let n = 1, t_f = 2, and the game be described by

ẋ = (2 − t)u + tw,  x(0) = x₀,
J_γ(x₀; u, w) = ½ x(2)² + ‖u‖² − γ²‖w‖².

The associated Riccati differential equation (8.13) is

Ż − ((2 − t)² − γ⁻²t²)Z² = 0,  Z(2) = ½.

It admits the unique solution

Z_γ(t) = [2 + ⅓(2 − t)³ − ⅓γ⁻²(8 − t³)]⁻¹,

provided that the inverse exists. For any γ > 1, Z_γ(t) is positive and hence is defined on [0, 2]. However, for γ = γ* = 1, a conjugate point appears at t* = 1, i.e., in the interior of [0, 2]. Moreover, for γ = 1, the feedback control

u(t) = μ(t, x(t)) = −[(2 − t)/(2(t − 1)²)] x(t),

although defined by a gain that diverges at t = 1, will produce a bounded control trajectory for every w ∈ W, so that the quantity

sup_w inf_u J_γ*(x₀; u, w)

is bounded for every x₀ ∈ ℝ. □
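The closed-form expression for Z_γ above is easily confirmed numerically. The sketch below (illustrative only, with a crude step size) integrates the example's RDE backward and exhibits the interior blow-up at t* = 1 when γ = 1, while the solution stays bounded for γ > 1:

```python
import numpy as np

def Z_example(gamma, n_steps=20000):
    """Backward Euler for Example 8.1:
        Zdot = ((2 - t)^2 - t^2/gamma^2) Z^2,  Z(2) = 1/2.
    Returns (t grid, Z values), stopping early at blow-up."""
    dt = 2.0 / n_steps
    t, Z = 2.0, 0.5
    ts, Zs = [t], [Z]
    while t > 0.0 and abs(Z) < 1e6:
        Z = Z - dt * ((2.0 - t) ** 2 - (t / gamma) ** 2) * Z * Z
        t -= dt
        ts.append(t)
        Zs.append(Z)
    return np.array(ts), np.array(Zs)

for gamma in (1.5, 1.0):
    ts, Zs = Z_example(gamma)
    print(gamma, "-> max Z ~", Zs.max(), "near t =", ts[Zs.argmax()])
# Closed form: 1/Z(t) = 2 + (2 - t)^3/3 - (8 - t^3)/(3 gamma^2); for gamma = 1
# this reduces to 2 (t - 1)^2, which vanishes exactly at t* = 1.
```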
Chapter 9

Appendix B: Danskin's Theorem
In this appendix we state and prove a theorem due to Danskin, which was used in Chapter 5, in the proof of Theorem 5.1. As far as we know, the original version was first presented in [33]. Since then, many more general forms have been derived. More discussion on this can be found in [24]. To follow the arguments used in this chapter, some background in real analysis, for example at the level of [72], is required.

We first introduce some preliminaries. Let I = [0, t_f] ⊂ ℝ, and let Ω be a Hilbert space with inner product ⟨·,·⟩ and norm ‖·‖. Let G : I × Ω → ℝ : (t, w) ↦ G(t, w) be given, and define

W(t) := sup_{w∈Ω} G(t, w).

The hypotheses on G are as follows.

Hypothesis H1. ∀t ∈ I, w ↦ G(t, w) is concave, upper semicontinuous, and G(t, w) → −∞ as ‖w‖ → ∞. □

As a consequence of hypothesis H1,

∀t ∈ I, ∃ ŵ(t) ∈ Ω : G(t, ŵ(t)) = W(t).

Hypothesis H2. ∀t ∈ I, ∃ν > 0 : ∀w ∈ Ω, W(t) − G(t, w) ≥ ν‖w − ŵ(t)‖². □
Hypothesis H3. ∀w ∈ Ω, ∀t ∈ (0, t_f), there exists a partial derivative

(∂/∂t)G(t, w) =: Ġ(t, w);

(t, w) ↦ Ġ(t, w) is continuous, Ġ being bounded over any bounded subset of Ω. (For t = 0 or t_f we may assume one-sided derivatives.) □

As a consequence of hypothesis H3, t ↦ G(t, w) is continuous, uniformly in w over any bounded subset of Ω.
Remark 9.1. For a quadratic form

G(t, w) = −⟨w, A(t)w⟩ + 2⟨b(t), w⟩ + c(t),

the hypothesis H2 is a consequence of H1. As a matter of fact, concavity requires that A be positive definite, and the hypothesis G(t, w) → −∞ as ‖w‖ → ∞ requires that A be coercive (or elliptic), i.e.,

inf_{‖w‖=1} ⟨w, Aw⟩ = ν > 0.  (9.1)

If this infimum were zero, then, letting {w_n} be a sequence with ‖w_n‖ = 1 and lim ⟨w_n, Aw_n⟩ = 0, we could choose w̄_n = ε_n⟨w_n, Aw_n⟩^{-1/2}w_n, with ε_n = ±1 such that ⟨b, w̄_n⟩ ≥ 0. Then w̄_n would diverge to infinity, and yet G(t, w̄_n) ≥ c(t) − 1.

But then, if A is coercive, it is invertible, ŵ(t) = A⁻¹(t)b(t), and

W(t) = ⟨b(t), A⁻¹(t)b(t)⟩ + c(t),  (9.2a)
G(t, w) = −⟨w − ŵ(t), A(t)[w − ŵ(t)]⟩ + ⟨b(t), A⁻¹(t)b(t)⟩ + c(t).  (9.2b)

(9.1) and (9.2) together prove that H2 is satisfied. □
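In this quadratic setting, Theorem 9.1 below can be checked directly by finite differences, since both W and the maximizer ŵ(t) = A⁻¹(t)b(t) are explicit. A small sketch with illustrative data (any coercive A(·) would do; the particular choices are assumptions of the example):

```python
import numpy as np

# illustrative quadratic data on the Hilbert space R^2:
A = lambda t: np.array([[2.0 + t, 0.3], [0.3, 1.0 + t * t]])  # coercive, t >= 0
b = lambda t: np.array([np.sin(t), 1.0 - t])
c = lambda t: float(np.cos(t))

def W(t):
    """W(t) = max_w G(t, w) = <b, A^{-1} b> + c, by (9.2a)."""
    return float(b(t) @ np.linalg.solve(A(t), b(t))) + c(t)

def Gdot_at_max(t, eps=1e-6):
    """Partial t-derivative of G(., w), with w frozen at w_hat(t)."""
    w = np.linalg.solve(A(t), b(t))
    G = lambda s: float(-w @ A(s) @ w + 2.0 * b(s) @ w) + c(s)
    return (G(t + eps) - G(t - eps)) / (2.0 * eps)

t, eps = 0.7, 1e-6
dW = (W(t + eps) - W(t - eps)) / (2.0 * eps)
print(dW, "~", Gdot_at_max(t))   # both approximate dW/dt (Theorem 9.1)
```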
We now state and prove the main theorem of this appendix.

Theorem 9.1. Under the hypotheses H1-H3, W has a derivative, given by

dW(t)/dt = Ġ(t, ŵ(t)).
Proof. We shall look separately at the right derivative and the left derivative, so that the proof will hold at t = 0 and t = t_f as well. For convenience, define ŵ = ŵ(t), and

Δ(h) := (1/h)[W(t + h) − W(t)].

Let {h_n} be a sequence of positive numbers decreasing to zero. From the definition of W, we have

Δ(h_n) = (1/h_n)[W(t + h_n) − G(t, ŵ)] ≥ (1/h_n)[G(t + h_n, ŵ) − G(t, ŵ)],

so that

liminf_{n→∞} Δ(h_n) ≥ Ġ(t, ŵ).  (9.3)

Now let w_n := ŵ(t + h_n). By H1, the w_n's stay in a bounded set. Hence the sequence has at least one weak accumulation point w̃. Take a subsequence, again denoted {w_n}, such that w_n ⇀ w̃ (w_n converges weakly to w̃). We then have

G(t + h_n, w_n) − G(t, w̃) = [G(t + h_n, w_n) − G(t, w_n)] + [G(t, w_n) − G(t, w̃)].

The uniform continuity of G with respect to t yields

G(t + h_n, w_n) − G(t, w_n) → 0.

G, being a concave function of w and upper semicontinuous, is also upper semicontinuous in the weak topology. Thus

limsup_{n→∞} [G(t, w_n) − G(t, w̃)] ≤ 0.

From the above three relations, we conclude that

limsup_{n→∞} G(t + h_n, w_n) ≤ G(t, w̃).  (9.4)
Now, it follows from (9.3) that there exists a real number a such that Δ(h_n) ≥ a, or equivalently (recall that h_n > 0),

G(t + h_n, w_n) = W(t + h_n) ≥ W(t) + a h_n = G(t, ŵ) + a h_n.

Therefore,

liminf_{n→∞} G(t + h_n, w_n) ≥ G(t, ŵ).  (9.5)

It follows from (9.4) and (9.5) that

G(t, w̃) ≥ G(t, ŵ).

This implies that w̃ = ŵ, G(t, w̃) = G(t, ŵ), and

lim_{n→∞} G(t + h_n, w_n) = G(t, ŵ).

Using H2, together with the uniform continuity of G with respect to t, we get

ν‖w_n − ŵ‖² ≤ G(t, ŵ) − G(t, w_n)
 = G(t, ŵ) − G(t + h_n, w_n) + G(t + h_n, w_n) − G(t, w_n) → 0,

which in turn implies that w_n → ŵ in the strong topology of Ω.
Finally, we have

Δ(h_n) = (1/h_n)[G(t + h_n, w_n) − G(t, ŵ)] ≤ (1/h_n)[G(t + h_n, w_n) − G(t, w_n)].

However, G is differentiable in t for all t in I. Thus, by the mean-value theorem, there exists a θ_n ∈ [0, 1] such that

Δ(h_n) ≤ Ġ(t + θ_n h_n, w_n).

By the continuity of Ġ, it follows that

limsup_{n→∞} Δ(h_n) ≤ Ġ(t, ŵ).  (9.6)

The inequalities (9.3) and (9.6) together imply that

lim_{h_n↓0} (1/h_n)[W(t + h_n) − W(t)] = Ġ(t, ŵ).

Since the derivation was carried out for h_n > 0, this shows that W has a right derivative equal to Ġ(t, ŵ).

The proof for the left derivative is entirely similar, with h_n < 0. The first argument of the above proof now gives, instead of (9.3),

limsup_{h_n↑0} Δ(h_n) ≤ Ġ(t, ŵ).  (9.7)
The proof leading to (9.4) does not depend on the sign of h_n. The inequality (9.7) yields Δ(h_n) ≤ a for some real number a, and, h_n being now negative, this gives again (9.5), and hence the strong convergence of w_n to ŵ. Then, the last argument of the earlier proof leads to

liminf_{n→∞} Δ(h_n) ≥ Ġ(t, ŵ),  (9.8)

and this, together with (9.7), completes the proof. □
Chapter 10

References
[1] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ, 1979.
[2] B. D. O. Anderson and J. B. Moore. Optimal Control: Linear Quadratic Methods. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[3] M. Athans, editor. Special Issue on Linear-Quadratic-Gaussian Control, volume 16(6). IEEE Trans. Automat. Control, 1971.
[4] T. Başar. A general theory for Stackelberg games with partial state information. Large Scale Systems, 3(1):47-56, 1982.
[5] T. Başar. Disturbance attenuation in LTI plants with finite horizon: Optimality of nonlinear controllers. Systems and Control Letters, 13(3):183-191, September 1989.
[6] T. Başar. A dynamic games approach to controller design: Disturbance rejection in discrete time. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 407-414, Tampa, FL, December 13-15, 1989.
[7] T. Başar. Time consistency and robustness of equilibria in noncooperative dynamic games. In F. Van der Ploeg and A. de Zeeuw, editors, Dynamic Policy Games in Economics, pages 9-54. North-Holland, 1989.
[8] T. Başar. Game theory and H∞-optimal control: The continuous-time case. In Proceedings of the 4th International Conference on Differential Games and Applications, Helsinki, Finland, August 1990. Springer-Verlag.
[9] T. Başar. Game theory and H∞-optimal control: The discrete-time case. In Proceedings of the 1990 International Conference on New Trends in Communication, Control and Signal Processing, pages 669-686, Ankara, Turkey, July 1990. Elsevier.
[10] T. Başar. H∞-Optimal Control: A Dynamic Game Approach. University of Illinois, 1990.
[11] T. Başar. Minimax disturbance attenuation in LTV plants in discrete time. In Proceedings of the 1990 American Control Conference, San Diego, CA, May 1990.
[12] T. Başar. Optimum performance levels for minimax filters, predictors and smoothers. Preprint, May 1990.
[13] T. Başar. Generalized Riccati equations in dynamic games. In S. Bittanti, A. Laub, and J. C. Willems, editors, The Riccati Equation. Springer-Verlag, March 1991.
[14] T. Başar. Optimum H∞ designs under sampled state measurements. To appear in Systems and Control Letters, 1991.
[15] T. Başar and P. R. Kumar. On worst case design strategies. Computers and Mathematics with Applications: Special Issue on Pursuit Evasion Differential Games, 13(1-3):239-245, 1987.
[16] T. Başar and M. Mintz. Minimax terminal state estimation for linear plants with unknown forcing functions. International Journal of Control, 16(1):49-70, August 1972.
[17] T. Başar and G. J. Olsder. Dynamic Noncooperative Game Theory. Academic Press, London/New York, 1982.
[18] M. D. Banker. Observability and Controllability of Two-Player Discrete Systems and Quadratic Control and Game Problems. PhD thesis, Stanford University, 1971.
[19] A. Bensoussan. Saddle points of convex concave functionals. In H. W. Kuhn and G. P. Szegö, editors, Differential Games and Related Topics, pages 177-200. North-Holland, Amsterdam, 1971.
[20] P. Bernhard. Sur la commandabilité des systèmes linéaires discrets à deux joueurs. RAIRO, J3:53-58, 1972.
[21] P. Bernhard. Linear-quadratic two-person zero-sum differential games: Necessary and sufficient conditions. Journal of Optimization Theory & Applications, 27:51-69, 1979.
[22] P. Bernhard. Exact controllability of perturbed continuous-time linear systems. IEEE Transactions on Automatic Control, AC-25(1):89-96, 1980.
[23] P. Bernhard. A certainty equivalence principle and its applications to continuous-time, sampled data, and discrete time H∞ optimal control. INRIA Report #1347, August 1990.
[24] P. Bernhard. Variations sur un thème de Danskin avec une coda sur un thème de Von Neumann. INRIA Report #1238, March 1990.
[25] P. Bernhard. Application of the min-max certainty equivalence principle to sampled data output feedback H∞ optimal control. To appear in Systems and Control Letters, 1991.
[26] P. Bernhard and G. Bellec. On the evaluation of worst case design with an application to the quadratic synthesis technique. In Proceedings of the 3rd IFAC Symposium on Sensitivity, Adaptivity, and Optimality, Ischia, Italy, 1973.
[27] D. P. Bertsekas and I. B. Rhodes. On the minimax reachability of target sets and target tubes. Automatica, 7:233-247, 1971.
[28] D. P. Bertsekas and I. B. Rhodes. Recursive state estimation for a set-membership description of uncertainty. IEEE Transactions on Automatic Control, AC-16:117-128, April 1971.
[29] D. P. Bertsekas and I. B. Rhodes. Sufficiently informative functions and the minimax feedback control of uncertain dynamic systems. IEEE Transactions on Automatic Control, AC-18(2):117-123, April 1973.
[30] C. Carathéodory. Calculus of Variations and Partial Differential Equations of the First Order. Holden-Day, New York, NY, 1967. Original German edition published by Teubner, Berlin, 1935.
[31] B. C. Chang and J. B. Pearson. Optimal disturbance reduction in linear multivariable systems. IEEE Transactions on Automatic Control, AC-29:880-887, October 1984.
[32] J. B. Cruz, Jr. Feedback Systems. McGraw-Hill, New York, NY, 1971.
[33] J. M. Danskin. The Theory of Max Min. Springer, Berlin, 1967.
[34] G. Didinsky and T. Başar. Design of minimax controllers for linear systems with nonzero initial conditions and under specified information structures. In Proceedings of the 29th IEEE Conference on Decision and Control, pages 2413-2418, Honolulu, HI, December 1990.
[35] P. Dorato, editor. Robust Control. IEEE Press, New York, NY, 1987.
[36] P. Dorato and R. F. Drenick. Optimality, insensitivity, and game theory. In L. Radanovic, editor, Sensitivity Methods in Control Theory, pages 78-102. Pergamon Press, New York, NY, 1966.
[37] J. Doyle, K. Glover, P. Khargonekar, and B. Francis. State-space solutions to standard H2 and H∞ control problems. IEEE Transactions on Automatic Control, AC-34(8):831-847, 1989.
[38] T. S. Ferguson. Mathematical Statistics, A Decision Theoretic Approach. Academic Press, New York, 1967.
[39] B. A. Francis. A Course in H∞ Control Theory, volume 88 of Lecture Notes in Control and Information Sciences. Springer-Verlag, New York, 1987.
[40] B. A. Francis and J. C. Doyle. Linear control theory with an H∞ optimality criterion. SIAM J. Control Optimization, 25:815-844, 1987.
[41] B. A. Francis, J. W. Helton, and G. Zames. H∞-optimal feedback controllers for linear multivariable systems. IEEE Trans. Automat. Control, 29:888-900, 1984.
[42] P. M. Frank. Introduction to System Sensitivity Theory. Academic Press, New York, 1978.
[43] K. Glover and J. C. Doyle. State-space formulae for all stabilizing controllers that satisfy an H∞-norm bound and relations to risk sensitivity. Systems and Control Letters, 11:167-172, 1988.
[44] K. Glover, D. J. N. Limebeer, J. C. Doyle, E. Kasenally, and M. G. Safonov. A characterization of all solutions to the four-block general distance problem. SIAM Journal on Control, 1991.
[45] J. W. Helton and J. A. Ball. H∞ control for nonlinear plants: Connections with differential games. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 956-962, Tampa, FL, December 13-15, 1989.
[46] M. Heymann, M. Pachter, and R. J. Stern. Max-min control problems: A system theoretic approach. IEEE Transactions on Automatic Control, AC-21(4):455-463, 1976.
[47] M. Heymann, M. Pachter, and R. J. Stern. Weak and strong max-min controllability. IEEE Trans. Automat. Control, 21(4):612-613, 1976.
[48] P. J. Huber. Robust Statistics. Wiley, New York, NY, 1981.
[49] R. Isaacs. Differential Games. Krieger Publishing Company, Huntington, NY, 1975. First edition: Wiley, NY, 1965.
[50] P. P. Khargonekar. State-space H∞ control theory and the LQG control problem. In A. C. Antoulas, editor, Mathematical System Theory: The Influence of R. E. Kalman. Springer-Verlag, 1991.
[51] P. P. Khargonekar, K. Nagpal, and K. Poolla. H∞-optimal control with transients. SIAM J. of Control and Optimization, 1991.
[52] P. P. Khargonekar and K. M. Nagpal. Filtering and smoothing in an H∞ setting. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 415-420, Tampa, FL, 1989.
[53] P. P. Khargonekar, I. R. Petersen, and M. A. Rotea. H∞-optimal control with state-feedback. IEEE Transactions on Automatic Control, AC-33(8):786-788, 1988.
[54] P. R. Kumar and P. P. Varaiya. Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[55] H. Kwakernaak. Progress in the polynomial solution of the standard H∞ optimal control problem. In V. Utkin and O. Jaaksoo, editors, Proceedings of the 11th IFAC World Congress, volume 5, pages 122-134, Tallinn, Estonia, USSR, August 13-17, 1990.
[56] H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley-Interscience, New York, 1972.
[57] I. D. Landau. Adaptive Control: The Model Reference Approach. Marcel Dekker, New York, 1979.
[58] G. Leitmann. Guaranteed asymptotic stability for some linear systems with bounded uncertainties. ASME Journal of Dynamic Systems, Measurement, and Control, 101(3), 1979.
[59] D. J. N. Limebeer, B. D. O. Anderson, P. Khargonekar, and M. Green. A game theoretic approach to H∞ control for time varying systems. In Proceedings of the International Symposium on the Mathematical Theory of Networks and Systems, Amsterdam, The Netherlands, 1989.
[60] D. J. N. Limebeer, M. Green, and D. Walker. Discrete time H∞ control. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 392-396, Tampa, FL, 1989.
[61] D. J. N. Limebeer and G. D. Halikias. A controller degree bound for H∞-optimal control problems of the second kind. SIAM Journal on Control and Optimization, 26:646-667, 1988.
[62] D. J. N. Limebeer, E. Kasenally, I. Jaimouka, and M. G. Safonov. All solutions to the four-block general distance problem. In Proceedings of the 27th IEEE Conference on Decision and Control, pages 875-880, Austin, TX, 1988.
[63] D. P. Looze, H. V. Poor, K. S. Vastola, and J. C. Darragh. Minimax control of linear stochastic systems with noise uncertainty. IEEE Transactions on Automatic Control, AC-28(9):882-887, September 1983.
[64] E. F. Mageirou. Values and strategies for infinite duration linear quadratic games. IEEE Transactions on Automatic Control, AC-21(4):547-550, August 1976.
[65] E. F. Mageirou and Y. C. Ho. Decentralized stabilization via game theoretic methods. Automatica, 13:393-399, 1977.
[66] C. J. Martin and M. Mintz. Robust filtering and prediction for linear systems with uncertain dynamics: A game-theoretic approach. IEEE Trans. Automat. Control, 28(9):888-896, 1983.
[67] P. H. McDowell and T. Başar. Robust controller design for linear stochastic systems with uncertain parameters. In Proceedings of the 1986 American Control Conference, pages 39-44, Seattle, WA, June 1986.
[68] M. Mintz and T. Başar. On tracking linear plants under uncertainty. In Proceedings of the 3rd IFAC Conference on Sensitivity, Adaptivity and Optimality, Ischia, Italy, June 1973.
[69] I. R. Petersen. Disturbance attenuation and H∞ optimization: A design method based on the algebraic Riccati equation. IEEE Transactions on Automatic Control, AC-32(5):427-429, May 1987.
[70] B. D. Pierce and D. D. Sworder. Bayes and minimax controllers for a linear system with stochastic jump parameters. IEEE Trans. Automat. Control, 16(4):300-306, 1971.
[71] I. Rhee and J. L. Speyer. A game theoretic controller and its relationship to H∞ and linear-exponential Gaussian synthesis. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 909-915, Tampa, FL, December 13-15, 1989.
[72] H. L. Royden. Real Analysis. Macmillan, 1968.
[73] M. G. Safonov, E. A. Jonckheere, M. Verma, and D. J. N. Limebeer. Synthesis of positive real multivariable feedback systems. International J. Control, 45(3):817-842, 1987.
[74] M. G. Safonov, D. J. N. Limebeer, and R. Y. Chiang. Simplifying the H∞ theory via loop shifting, matrix pencil, and descriptor concepts. International Journal of Control, 50:2467-2488, 1989.
[75] D. M. Salmon. Minimax controller design. IEEE Trans. Automat. Control, 13(4):369-376, 1968.
[76] L. J. Savage. The Foundations of Statistics. Wiley, New York, NY, 1954.
[77] A. Stoorvogel. The discrete time H∞ control problem: The full information case. To appear in SIAM Journal on Control, 1991.
[78] A. A. Stoorvogel. The H∞ Control Problem: A State Space Approach. PhD thesis, University of Eindhoven, The Netherlands, October 1990.
[79] D. Sworder. Optimal Adaptive Control Systems. Academic Press, New York, 1966.
[80] G. Tadmor. H∞ in the time domain: The standard four block problem. To appear in Mathematics of Control, Signals, and Systems, 1991.
[81] K. Uchida and M. Fujita. On the central controller: Characterizations via differential games and LEQG control problems. Systems and Control Letters, 13(1):9-13, 1989.
[82] K. Uchida and M. Fujita. Finite horizon H∞ control problems with terminal penalties. Preprint, April 1990.
[83] R. J. Veillette, J. V. Medanic, and W. R. Perkins. Robust control of uncertain systems by decentralized control. In V. Utkin and O. Jaaksoo, editors, Proceedings of the 11th IFAC World Congress, volume 5, pages 116-121, Tallinn, Estonia, USSR, August 13-17, 1990.
[84] S. Verdu and H. V. Poor. Minimax linear observers and regulators for stochastic systems with uncertain second-order statistics. IEEE Trans. Automat. Control, 29(6):499-511, 1984.
[85] S. Verdu and H. V. Poor. Abstract dynamic programming models under commutative conditions. SIAM J. Control and Optimization, 25(4):990-1006, July 1987.
[86] S. Weiland. Theory of Approximation and Disturbance Attenuation for Linear Systems. PhD thesis, University of Groningen, The Netherlands, January 1991.
[87] J. C. Willems. The Analysis of Feedback Systems. MIT Press, Cambridge, MA, 1971.
[88] J. C. Willems. Least squares stationary optimal control and the algebraic Riccati equation. IEEE Transactions on Automatic Control, AC-16(6):621-634, December 1971.
[89] D. J. Wilson and G. Leitmann. Minimax control of systems with uncertain state measurements. Applied Mathematics & Optimization, 2(4):315-336, 1976.
[90] H. S. Witsenhausen. Minimax control of uncertain systems. Report ESL-R-269, MIT, Cambridge, MA, May 1966.
[91] H. S. Witsenhausen. A minimax control problem for sampled linear systems. IEEE Transactions on Automatic Control, AC-13:5-21, February 1968.
[92] I. Yaesh and U. Shaked. Minimum H∞-norm regulation of linear discrete-time systems and its relation to linear quadratic difference games. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 942-947, Tampa, FL, December 13-15, 1989.
[93] G. Zames. Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses. IEEE Transactions on Automatic Control, AC-26:301-320, 1981.
Chapter 11

List of Corollaries, Definitions, Lemmas, Propositions, Remarks and Theorems

Chapter 2
Definitions: 2.1 p. 19; 2.2 p. 21; 2.3 p. 22; 2.4 p. 23; 2.5 p. 27.
Properties: 2.1 p. 16.
Theorems: 2.1 p. 15; 2.2 p. 16; 2.3 p. 16; 2.4 p. 20; 2.5 p. 21; 2.6 p. 26.

Chapter 3
Lemmas: 3.1 p. 30; 3.2 p. 45; 3.3 p. 55; 3.4 p. 57; 3.5 p. 58; 3.6 p. 59.
Propositions: 3.1 p. 51.
Remarks: 3.1 p. 35; 3.2 p. 36; 3.3 p. 47; 3.4 p. 53; 3.5 p. 64; 3.6 p. 64.
Theorems: 3.1 p. 31; 3.2 p. 32; 3.3 p. 37; 3.4 p. 41; 3.5 p. 46; 3.6 p. 52; 3.7 p. 60; 3.8 p. 53; 3.9 p. 70.

Chapter 4
Lemmas: 4.1 p. 74.
Remarks: 4.1 p. 83; 4.2 p. 83; 4.3 p. 88; 4.4 p. 105; 4.5 p. 106; 4.6 p. 107.
Theorems: 4.1 p. 75; 4.2 p. 78; 4.3 p. 81; 4.4 p. 86; 4.5 p. 87; 4.6 p. 90; 4.7 p. 93; 4.8 p. 96; 4.9 p. 100; 4.10 p. 103; 4.11 p. 104; 4.12 p. 105; 4.13 p. 106; 4.14 p. 112.

Chapter 5
Corollaries: 5.1 p. 120; 5.2 p. 122; 5.3 p. 143.
Lemmas: 5.1 p. 123; 5.2 p. 135.
Propositions: 5.1 p. 126; 5.2 p. 130; 5.3 p. 132.
Remarks: 5.1 p. 118; 5.2 p. 123; 5.3 p. 126; 5.4 p. 130; 5.5 p. 136; 5.6 p. 136.
Theorems: 5.1 p. 119; 5.2 p. 122; 5.3 p. 127; 5.4 p. 137; 5.5 p. 142; 5.6 p. 146; 5.7 p. 148.

Chapter 6
Corollaries: 6.1 p. 157.
Lemmas: 6.1 p. 156; 6.2 p. 158; 6.3 p. 163; 6.4 p. 164.
Propositions: 6.1 p. 166.
Remarks: 6.1 p. 158; 6.2 p. 161; 6.3 p. 165.
Theorems: 6.1 p. 155; 6.2 p. 157; 6.3 p. 161; 6.4 p. 172; 6.5 p. 173; 6.6 p. 175; 6.7 p. 177.

Chapter 7
Corollaries: 7.1 p. 186.
Lemmas: 7.1 p. 183.
Remarks: 7.1 p. 186; 7.2 p. 186.
Theorems: 7.1 p. 185; 7.2 p. 188; 7.3 p. 190; 7.4 p. 191; 7.5 p. 192.

Chapter 8
Definitions: 8.1 p. 196.
Lemmas: 8.1 p. 199; 8.2 p. 201; 8.3 p. 201.
Propositions: 8.1 p. 196; 8.2 p. 198; 8.3 p. 198; 8.4 p. 203; 8.5 p. 205; 8.6 p. 205.
Remarks: 8.1 p. 197; 8.2 p. 199; 8.3 p. 200; 8.4 p. 202.
Theorems: 8.1 p. 196; 8.2 p. 199; 8.3 p. 202; 8.4 p. 203; 8.5 p. 205.

Chapter 9
Remarks: 9.1 p. 209.
Theorems: 9.1 p. 209.
Systems & Control: Foundations & Applications

Series Editor
Christopher I. Byrnes
Department of Systems Science and Mathematics
Washington University
Campus P.O. 1040
One Brookings Drive
St. Louis, MO 63130-4899
U.S.A.

Systems & Control: Foundations & Applications publishes research monographs and advanced graduate texts dealing with areas of current research in all areas of systems and control theory and its applications to a wide variety of scientific disciplines. We encourage preparation of manuscripts in such forms as LaTeX or AMS-TeX for delivery in camera-ready copy, which leads to rapid publication, or in electronic form for interfacing with laser printers or typesetters. Proposals should be sent directly to the editor or to: Birkhäuser Boston, 675 Massachusetts Avenue, Suite 601, Cambridge, MA 02139, U.S.A.

Estimation Techniques for Distributed Parameter Systems
H. T. Banks and K. Kunisch

Set-Valued Analysis
Jean-Pierre Aubin and Hélène Frankowska

Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems
Harold J. Kushner

Methods of Algebraic Geometry in Control Theory: Part I. Scalar Linear Systems and Affine Algebraic Geometry
Peter Falb

H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach
Tamer Başar and Pierre Bernhard