DIFFERENTIAL GAMES OF PURSUIT
SERIES ON OPTIMIZATION VOL. 2
DIFFERENTIAL GAMES OF PURSUIT Leon A. Petrosjan Faculty of Applied Mathematics St. Petersburg State University RUSSIA
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 9128 USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661 UK office: 73 Lynton Mead, Totteridge, London N20 8DH
Library of Congress Cataloging-in-Publication Data
Petrosjan, L. A. (Leon Aganesovich)
Differential games of pursuit / Leon A. Petrosjan.
p. cm. - (Series on optimization; vol. 2)
Includes bibliographical references and index.
ISBN 9810209797 : $82.00 (U.S.)
1. Differential games. I. Title. II. Series.
QA272.P46 1993 519.3-dc20 93-24420 CIP
Copyright © 1993 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, USA.
Printed in Singapore.
Contents

Preface

1 Preliminaries
1.1 Zero-sum two-person games in normal form
1.2 Equilibrium point
1.3 Mixed strategies and existence of equilibrium point
1.4 Games with convex payoff function
1.5 One class of games with complete information
1.6 Simultaneous games of pursuit with non-convex payoff functions

2 Definition of differential game of pursuit and existence theorem of equilibrium points
2.1 Nonformal description
2.2 Game of pursuit in normal form
2.3 Existence of the ε-equilibrium point in differential games with prescribed duration
2.4 Existence of ε-equilibrium points in optimal time differential pursuit games
2.5 Alternative
2.6 Differential games with dependent motions
2.7 Alternative for games with dependent motions and discrimination

3 Class of pursuit-evasion games with optimal open-loop strategy for evader
3.1 Discrete game with terminal payoff and discrimination for player E
3.2 Continuous game without discrimination
3.3 Arbitrary terminal payoff functions and phase constraints
3.4 Optimal time pursuit games
3.5 Necessary and sufficient conditions for existence of optimal open-loop strategy for player E
3.6 Iterative methods for solution of differential game of pursuit

4 Examples of differential games of pursuit
4.1 Games with prescribed duration without phase constraints
4.2 Phase-constrained "simple pursuit" games
4.3 "Simple pursuit" game with two pursuers and one evader
4.4 Relations between maximin time of pursuit and time of absorption

5 "Life line" game of pursuit
5.1 Definition of "life line" game
5.2 Discrete game
5.3 Proof of one geometric lemma
5.4 Basic theorem
5.5 Rejection of discrimination
5.6 Multiplayer "life line" games

6 Differential games with incomplete information
6.1 Pursuit games with delayed information for player P
6.2 Game with information delayed. Case of m pursuers and one evader
6.3 Existence of equilibria in mixed strategies in "princess and monster" game of pursuit
6.4 Differential games with discrete information partition
6.5 Games with mixed information state
6.6 Pursuit game with prescribed duration and delayed information for both players
6.7 Delayed information for both players when the evader takes part in game
6.8 One multistage team game with delayed information

7 Noncooperative differential games
7.1 Game on a finite graph tree
7.2 Nash equilibrium
7.3 Definition of the noncooperative differential games in normal form
7.4 Definition of cooperative differential game in the form of characteristic sets
7.5 Classification of dynamic stable (time-consistent) solutions
7.6 Structure and dynamic stability of Pareto optimal solution in the game of approaching
7.7 Existence of dynamic stable C-kernel and NM-solution in cooperative game of approaching

8 Cooperative differential games with side payments
8.1 Definition of cooperative differential game in the characteristic function form
8.2 Principle of dynamic stability (time-consistency)
8.3 Classification of dynamic stable solutions

9 New optimality principles in n-person differential games
9.1 Integral optimality principles
9.2 Differential strongly time consistent optimality principles
9.3 Strongly time consistent optimality principles for the games with discount payoffs

Bibliography
Index
Preface

The process of pursuit represents a typical conflict situation and it is, therefore, not surprising that such a situation is one of the oldest topics within the field of game theory. This area was pioneered by H. Steinhaus [62], who was the first to formulate, in 1925, the problem of pursuit as the differential game of pursuit. After a prolonged silence, in the 1950's mathematicians resumed their research in differential games. They developed a method based on the "main" first order partial differential equation, which seems to have been first obtained by R. Isaacs. Among the literature of that period are the papers by W. Fleming [52, 53] dealing with the convergence of values of discrete (discrete-time) games to a solution of the "main" equation, the papers by L. Bercovitz [50, 51], where the necessary and sufficient conditions are derived for the existence of the equilibrium situation in terms of the "main" equation, and, finally, the monograph by R. Isaacs [1], where numerous examples are provided to illustrate the whole method of finding a solution, which is based on the consideration of the "main" equation.

The first domestic works on differential games appeared in the 1960's (see [11, 31, 40]). L. S. Pontryagin and his followers considered the problem of pursuit by solving it for Pursuer P, and the problem of evasion for Evader E. The problem of evasion is to describe a set of initial states from which Evader can ensure avoidance of capture. Here, for the linear case, exhaustive solutions are obtained to both problems (see [18-21, 36-41, 45, 46]). N. N. Krasovsky and his followers assess the quality of pursuit by the time span from the initial instant of the process up to the l-capture instant (l > 0). This method is based on the extremal sighting rule which gives in a number of cases the equilibrium point. The method was conclusively formulated in the monograph by N. N. Krasovsky [10]. Important developments are also observed in the theory of differential games with dependent motions, for which a fine machinery of mixed strategies has been developed, leading to the equilibrium point. All the findings in this area are described in [12]. Of special interest are the works [14, 22, 24, 60].

The proposed monograph develops an approach to the problem of pursuit from the point of view of the general theory of zero-sum two-person games. Consideration is given to problems featuring various payoff functions (the distance between players when the game terminates, the minimum distance between players as the game proceeds, the escape from a given set before the capture instant, etc.). For each problem, the strategies of pursuit and evasion are constructed which, under some conditions, prove to be optimal in the sense of "saddle point" (equilibrium situation). The existence theorems are proved for equilibrium points, and optimal strategies are explicitly expressed for a large number of specific games of pursuit. Our considerations are not confined to the perfect information games only. Chapter 6 deals with some of the above problems for the case with delay of information on the process state. The last chapters are devoted to the n-person differential games, where the main problem of time consistency [66] and strong time consistency [67] of the optimal decisions is investigated.

The monograph contains mostly the results obtained by the author, with the exception of some theorems from the general theory of games (see Ch. 1) and some results given in other chapters of the book. In particular, the results presented in §3, Ch. 2, were obtained by O. A. Malafeev, and those in §§4, 5, Ch. 6 by R. V. Khachaturian. Examples 11, 12 in Ch. 1 were solved by N. I. Satimov. The proof of the lemma in §3, Ch. 5 was done by J. Dutkevitsch, the second method of successive approximations in §6, Ch. 3 was derived by S. V. Chistiakov, and the main results in §8, Ch. 7 were proved by N. M. Sloboshanin.

The author wishes to express his indebtedness to Prof. V. I. Zubov and Prof. N. N. Vorobyev for their involvement in the discussions during the preparation of the monograph. His sincere thanks also go to Prof. J. S. Osipov and V. N. Ushakov from the Ural Research Center of the Russian Acad. of Sc. for their invaluable assistance in preparing the manuscript. The author specially thanks A. A. Ovseenko and S. S. Tolstobrov for the preparation of the English publication in PC.

English translation by J. M. Donetz and L. A. Petrosjan.
Chapter 1
Preliminaries

1.1 Zero-sum two-person games in normal form

In order to formalize a conflict situation of decision-making, it is essential to provide a mathematical description of the parties involved in the conflict and of the results of their actions. In compliance with the adopted terminology, such actions are called strategies and their results are assessed by a numerical function which is called a payoff function. The strategy and payoff may vary in nature to suit the task. The notions of strategy and payoff will first be illustrated by examples. We shall then give a formal definition of the zero-sum two-person game in normal form [2].

Example 1 (simultaneous game of pursuit on the plane). Let $S_1$ and $S_2$ be two sets on the plane. The game proceeds as follows. Player I chooses a point $x \in S_1$, and Player II a point $y \in S_2$. In making their choices Players I and II have no information on the opponent's actions and, therefore, such choices can be conveniently interpreted as simultaneous. In this case, the points $x$ and $y$ are the strategies for Players I and II. Thus the sets of all strategies coincide with the sets $S_1$ and $S_2$ on the plane. Player I aims to minimize the distance between himself and Player II. Player II pursues the opposite aim. For this reason, by the payoff of Player II in this game is meant the Euclidean distance $\rho(x, y)$ between the points $x \in S_1$ and $y \in S_2$. Player I's payoff is equal to Player II's payoff taken with the opposite sign (Fig. 1).

Fig. 1.

Example 2 (multistage game of pursuit). Let each point $z$ in the plane be associated with the sets $U_z$ and $V_z$. Consider the game that follows. At the initial time instant, Players I and II are respectively at the points $x_0$ and $y_0$ and make a simultaneous choice of the points $x_1 \in U_{x_0}$ and $y_1 \in V_{y_0}$, whereupon they make a transition to the states $x_1$ and $y_1$. At the points $x_1$ and $y_1$ the players choose respectively the points $x_2 \in U_{x_1}$ and $y_2 \in V_{y_1}$ and make a transition to the states $x_2$ and $y_2$, etc. In the states $x_{k-1}, y_{k-1}$ the players choose the points $x_k \in U_{x_{k-1}}$ and $y_k \in V_{y_{k-1}}$ and make a transition to the states $x_k, y_k$. The process terminates at the $N$-th stage by choosing the points $x_N, y_N$ and making a transition to the states $x_N, y_N$ (Fig. 2).
As a result, such a sequential decision making procedure realizes the trajectories $x_0, x_1, \ldots, x_k, \ldots, x_N$, the trajectory of Player I, and $y_0, y_1, \ldots, y_k, \ldots, y_N$, the trajectory of Player II. The aim of Player I is to minimize the distance between himself and Player II throughout the duration of the game, while the aim of Player II is the opposite one. The definition of strategies in this game involves more complexities than in the earlier example. In this case, the strategy determines at each step $x_{k-1}$ ($y_{k-1}$) the rule for the player to choose the next state (an element of the set $U_{x_{k-1}}$ ($V_{y_{k-1}}$)). It can be easily shown that the player's choice must depend on the state information he has at each step.

Assume that Player I at each $k$-th step has the knowledge of $k$, the state $x_k$, has the memory of all the previous states and knows the states $y_0, y_1, \ldots, y_k$ previously chosen by the opponent and his choice of $y_{k+1} \in V_{y_k}$ in the state $y_k$. Player II at each $k$-th step has complete information on $k$, the state $y_k$, his previous states $y_0, y_1, \ldots, y_{k-1}$ and the states $x_0, x_1, \ldots, x_k$ in which his opponent has been up to the instant $k$. By this definition of the information state, the discrimination of Player II occurs at each step of the game.
Of course the formulation of the problem where, at each $k$-th stage of the game, Player I in making his choice has a knowledge of Player II's choice at this stage and Player II in making his choice has a knowledge of Player I's choice, would be absurd. The implication of the above information state is that Player II is the "leader". Some slight modifications in the definition enable formulation of the game where Player I is the leader. The players have the knowledge of the game at each decision stage. For this reason this game will be called the perfect information game. The case of the perfect information game lends itself most readily to analysis, although the solution of specific games involves problems because of the large amount of information to be treated when the players' strategies are formed.

Fig. 2.

The players' strategies must enable them to make a unique choice in any possible information state. With this understanding of the players' strategies, the strategies for Player I must be the functions which place every possible information state of the player, i.e. every sequence $x_0, x_1, \ldots, x_k, y_0, y_1, \ldots, y_k, y_{k+1}, k$, in correspondence with the choice of a point which is possible in the given information state, i.e. the choice of the point $x_{k+1}$ from the set $U_{x_k}$. Thus, the strategy for Player I is a single-valued function of the variables

\[ u(\cdot) = u(x_0, x_1, \ldots, x_k, y_0, y_1, \ldots, y_k, y_{k+1}, k). \]

Player II's strategy is defined in the same way except that the latter is the function only of $x_0, x_1, \ldots, x_k, y_0, y_1, \ldots, y_k, k$, since the choice of $x_{k+1} \in U_{x_k}$ is unknown to Player II at the $k$-th step. The strategy of Player II will be denoted by

\[ v(\cdot) = v(x_0, \ldots, x_k, y_0, \ldots, y_k, k). \]

The values of the function $v(\cdot)$ belong to the set $V_{y_k}$. It is easily comprehended that each pair of initial conditions $x_0, y_0$ and the pair of chosen strategies uniquely determine the trajectories of Players I and II, $x_0, x_1, \ldots, x_N$ and $y_0, y_1, \ldots, y_N$, that are constructed recurrently by the rule

\[ x_{k+1} = u(x_0, \ldots, x_k, y_0, \ldots, y_k, y_{k+1}, k), \qquad y_{k+1} = v(x_0, \ldots, x_k, y_0, \ldots, y_k, k), \qquad k = 1, \ldots, N-1. \]

Fig. 3.

Consequently, the initial conditions $x_0, y_0$ and the pair of the strategies define the players' payoff uniquely. Considering the previously stated objective of the game, the payoff may be defined as follows:

\[ K(u(\cdot), v(\cdot)) = \min_{0 \le k \le N} \rho(x_k, y_k), \]

where $x_0, x_1, \ldots, x_N$, $y_0, y_1, \ldots, y_N$ are the pair of trajectories corresponding uniquely to the chosen strategies $u(\cdot), v(\cdot)$ and the initial conditions $x_0, y_0$.
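As an illustration, the recurrent construction of the trajectories and the payoff $\min_k \rho(x_k, y_k)$ of Example 2 can be sketched in Python. This is only a sketch under stated assumptions: the names `rho`, `play`, `chase` and `flee`, the concrete motion rules (a pursuer whose step is at most 2 and an evader drifting one unit to the right), and the start of the loop at step 0 so that $x_1, y_1$ are also generated are illustrative choices and are not taken from the text.

```python
import math

def rho(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def play(u, v, x0, y0, N):
    """Construct both trajectories step by step and return them with the payoff."""
    xs, ys = [x0], [y0]
    for k in range(N):
        # Player II chooses from the histories x_0..x_k, y_0..y_k only.
        y_next = v(xs, ys, k)
        # Player I also knows the opponent's choice y_{k+1}.
        x_next = u(xs, ys, y_next, k)
        xs.append(x_next)
        ys.append(y_next)
    # Payoff of Player II: the minimum distance along the trajectories.
    payoff = min(rho(x, y) for x, y in zip(xs, ys))
    return xs, ys, payoff

def flee(xs, ys, k):
    """Hypothetical evader strategy: one unit to the right at every step."""
    return (ys[-1][0] + 1.0, ys[-1][1])

def chase(xs, ys, y_next, k, speed=2.0):
    """Hypothetical pursuer strategy: move towards y_{k+1} by at most `speed`."""
    x, d = xs[-1], rho(xs[-1], y_next)
    if d <= speed:
        return y_next
    t = speed / d
    return (x[0] + t * (y_next[0] - x[0]), x[1] + t * (y_next[1] - x[1]))

xs, ys, payoff = play(chase, flee, (0.0, 0.0), (5.0, 0.0), N=3)
print(payoff)
```

With these assumed rules the successive distances are 5, 4, 3 and 2, so the payoff of Player II is 2 up to rounding. Once `chase` and `flee` are fixed, nothing else is decided during the play, which is the sense in which the pair of strategies determines the payoff.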
Unlike Example 1, the sets of strategies here are sets of functions and have a complex structure (see Fig. 2).

Example 3 (multistage imperfect information game of pursuit). Kinematics and payoff in this game are defined as in Example 2, except that Player I is less informed at each stage of the game. Let $0 \le \ell \le N$ be a given natural number. At each step $k \ge \ell$ Player I knows his choice sequence $x_0, \ldots, x_k$ and the opponent's choice sequence at the first $k - \ell$ steps of the game, i.e. $y_0, y_1, \ldots, y_{k-\ell}, y_{k-\ell+1}$; at the steps $k < \ell$ he knows only $x_0, \ldots, x_k$ and $y_0$. This means that in the case involved he receives the $\ell$-delay information about Player II. The information state of Player II is the same as in Example 2. Schematically, the players' information states can be represented as follows (Fig. 3) [47].

Proceeding from the notion of strategy as a rule which places each information state of the player in correspondence with the choice of a particular decision from the set of decisions that are possible in the given information state, we may say that the strategies in this game for Player I are the functions

\[ u(\cdot) = u(x_0, \ldots, x_k, y_0, \ldots, y_{k-\ell+1}, k) \]

with values in the set $U_{x_k}$ for $k \ge \ell$, and the functions

\[ u(\cdot) = u(x_0, \ldots, x_k, y_0, k) \]

with values in the set $U_{x_k}$ for $k < \ell$. The strategies of Player II remain the same as in Example 2. This game is called the game with delayed information for Player I and, for $\ell > 0$, refers to the zero-sum game with imperfect information. In a special case, for $\ell = 0$, we have the game with perfect information from Example 2.

Let $S_1$ and $S_2$ be bounded sets on the plane, and $C$ a sphere containing the set $S_1 \cup S_2$. We choose the initial conditions $x_0, y_0$ in such a manner that $\rho(x_0, y_0) > d(C)$ ($d(C)$ being the diameter of the sphere $C$). Now set $U_{x_0} = S_1$, $V_{y_0} = S_2$, $N = \ell = 1$. The proposed game becomes the game from Example 1. Indeed, since $x_1$ and $y_1$ are constrained by $x_1 \in U_{x_0} = S_1$, $y_1 \in V_{y_0} = S_2$, both points lie in $C$, so that $\rho(x_1, y_1) \le d(C) < \rho(x_0, y_0)$ and we have

\[ \min_{0 \le k \le 1} \rho(x_k, y_k) = \min[\rho(x_0, y_0), \rho(x_1, y_1)] = \rho(x_1, y_1). \]

Example 4 (multistage time optimal pursuit game).
In this example kinematics and information states of players are the same as in Example 2. At the same time, the payoff is determined here differently. Assume that a real number $R > 0$, called the capture radius, is given. The duration of the game (the number of steps $N$) is not specified. Let $N_n$ be the number of the step at which $\rho(x_{N_n}, y_{N_n}) \le R$ for the first time. The play terminates at the step $N_n$ and the payoff of Player I is assumed to be $-N_n$. The payoff of Player II is said to be $+N_n$. By this definition, Player I is aiming to approach Player II at a distance not exceeding $R$ in a minimal time (in a minimum number of steps). Player II seeks to delay the capture moment. Also possible are such realizations (trajectories) that $\rho(x_N, y_N) > R$ for all $N$. Then the payoff for Player II is assumed to be $+\infty$, and the one of Player I $-\infty$, respectively. Here the strategy sets are wider than those in Example 1 since account must be taken of the possible occurrence of arbitrarily long trajectories. Therefore, the strategy set for Player I (II) is composed of all possible functions of the form

\[ u(\cdot) = u(x_0, \ldots, x_k, y_0, \ldots, y_{k+1}, k), \quad k = 0, 1, \ldots, N, \ldots \]
\[ (v(\cdot) = v(x_0, \ldots, x_k, y_0, \ldots, y_k, k), \quad k = 0, 1, \ldots, N, \ldots) \]

with values in the sets $U_{x_k}$, $V_{y_k}$. As in Example 3, we can formulate the time optimal pursuit game with information delay for Player I.
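The payoff of Example 4 can be illustrated by a short Python sketch that scans a pair of trajectories for the first step at which the distance does not exceed $R$. The function names `capture_step` and `time_optimal_payoffs` and the sample data are hypothetical. The sketch also applies the $\pm\infty$ convention to a finite recorded prefix of the play, which is a simplifying assumption: in the text that convention refers to plays in which capture never occurs.

```python
import math

def capture_step(xs, ys, R):
    """First step n with rho(x_n, y_n) <= R, or None if there is no such step."""
    for n, (x, y) in enumerate(zip(xs, ys)):
        if math.hypot(x[0] - y[0], x[1] - y[1]) <= R:
            return n
    return None

def time_optimal_payoffs(xs, ys, R):
    """Payoffs (Player I, Player II) of the time optimal game."""
    n = capture_step(xs, ys, R)
    if n is None:
        # No capture on the recorded trajectories.
        return (-math.inf, math.inf)
    return (-n, n)

# Hypothetical trajectories: the pursuer advances, the evader stands still.
pursuer = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
evader = [(5.0, 0.0), (5.0, 0.0), (5.0, 0.0)]
print(time_optimal_payoffs(pursuer, evader, R=1.0))
print(time_optimal_payoffs(pursuer, evader, R=0.5))
```

For the sample data the distances are 5, 3 and 1, so with $R = 1$ capture occurs at step 2, while with $R = 0.5$ no capture is recorded.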
Example 5 (multistage game with complete information). The game duration is not specified. Kinematics and information states of players are the same as in Example 4. Let $\{x_k\}$, $\{y_k\}$ be the trajectories realized in the game. The payoff of Player I is assumed to be $+1$ if there is such $N$ that $\rho(x_N, y_N) \le R$, otherwise it is assumed to be $-1$. The payoff of Player II is equal to that of Player I taken with the opposite sign. Thus, Player I aims to approach Player II within a distance $R$, and Player II has the opposite objective. Here, as in the earlier case, we can formulate the incomplete information problem similar to the problem in Example 3.

Example 6 (multistage "Life line" game with complete information). A convex set $S$ is given on the plane. At the initial time instant, Players I and II are in the states $x_0 \in S$, $y_0 \in S$. Next, at each step Player I makes a choice from the set $U_{x_k} \cap S$, and Player II from the set $V_{y_k}$. The game terminates in one of the following two cases:

1. There is such number $N_n$ that $\rho(x_{N_n}, y_{N_n}) \le R$, $y_{N_n} \in S$;
2. There is such number $N_s$ that $y_{N_s} \notin S$.

In the first case, the payoff of Player I (II) is assumed to be $+1$ ($-1$). If neither of these possibilities is realized, i.e. $\rho(x_N, y_N) > R$, $y_N \in S$ for all $N$, then the payoff of both players is said to be zero. From the definition of kinematics and payoff function we may conclude that the object of Player I is to approach Player II within a distance $R$ inside the set $S$ before Player II crosses the boundary of $S$. Therefore, this game is known as the "Life line" game, since the boundary of the set $S$ turns out to be the "Life line" for Player II. Player I must remain in $S$ throughout the game. With the game duration unspecified, the game is represented by the alternating plays with complete information and, therefore, the strategies are determined just as in Example 4. The incomplete information case for Player I (Fig. 4) can also be formulated here.

Fig. 4.

Example 7 (game with several pursuers and one evader). Here the game is played by a detail of pursuers, $P = \{P_1, \ldots, P_m\}$, acting as one player, i.e. having a common objective, and Evader II. Detail $P$ will be referred to as Player I. All the above examples can be generalized to this case. Our consideration will be confined to Example 2.

Let $x_0^1, \ldots, x_0^m, y_0$ be the starting positions of the detail partners $P_1, \ldots, P_m$ and Player II on the plane. $U = \{U_{x^i}\}$, $i = 1, \ldots, m$, and $V = \{V_y\}$ are the set systems on the plane governing the kinematics of the detail partners $P_1, \ldots, P_m$ and Player II.
The game is as follows. At the initial time instant, the detail partners $P_1, \ldots, P_m$ and Player II make a simultaneous choice of points $x_1^1 \in U_{x_0^1}, \ldots, x_1^m \in U_{x_0^m}$, $y_1 \in V_{y_0}$, and pass to the states $x_1^1, \ldots, x_1^m, y_1$. At the points $x_1^1, \ldots, x_1^m, y_1$ the players choose the points $x_2^1 \in U_{x_1^1}, \ldots, x_2^m \in U_{x_1^m}$, $y_2 \in V_{y_1}$ and pass to the states $x_2^1, \ldots, x_2^m, y_2$, etc. When in the states $x_{k-1}^1, \ldots, x_{k-1}^m, y_{k-1}$, the players choose the points $x_k^1 \in U_{x_{k-1}^1}, \ldots, x_k^m \in U_{x_{k-1}^m}$, $y_k \in V_{y_{k-1}}$ and pass to the states $x_k^1, \ldots, x_k^m, y_k$. The process terminates at the $N$-th step by choosing the points $x_N^1 \in U_{x_{N-1}^1}, \ldots, x_N^m \in U_{x_{N-1}^m}$, $y_N \in V_{y_{N-1}}$ and passing to the states $x_N^1, \ldots, x_N^m, y_N$. As a result of such a sequential decision procedure, we obtain uniquely the $m + 1$ trajectories: $x_0^i, \ldots, x_N^i$, $i = 1, \ldots, m$, for the detail partners $P_i$, and the trajectory $y_0, \ldots, y_N$ for Player II.

During the game the detail $P$, acting as one player, aims to minimize the distance between itself and Player II at least for one of its partners. Player II has the opposite object. The strategy for the pursuer detail $P$ is defined as the Cartesian product of strategies for every detail partner, each strategy being covered in Example 2. Assume that the information state is such as given in that example, i.e. Player I (detail $P$) knows $k$ at each $k$-th step and his state $x_k = \{x_k^1, \ldots, x_k^i, \ldots, x_k^m\}$, remembers all his previous states, and is aware of the states $y_0, \ldots, y_k$ previously chosen by his opponent and of the choice $y_{k+1} \in V_{y_k}$ made by his opponent in the state $y_k$. At each $k$-th step Player II has complete information on $k$ and his state $y_k$, remembers his previous states $y_0, \ldots, y_{k-1}$ and the states $x_0, \ldots, x_k$ in which his opponent has been up to the instant $k$.
Thus, the strategy for Player $P$ turns out to be the rule which places each information state, i.e. each sequence $x_0, x_1, \ldots, x_k, y_0, \ldots, y_k, y_{k+1}, k$, in correspondence with the point $x_{k+1} = \{x_{k+1}^1, \ldots, x_{k+1}^m\}$, where $x_{k+1}^i \in U_{x_k^i}$. The strategy for Player I is represented by the vector function $u(\cdot) = \{u^i(\cdot)\}$, where $u^i(\cdot) = u^i(x_0, \ldots, x_k, y_0, \ldots, y_k, y_{k+1}, k)$, $i = 1, \ldots, m$. The strategy for Player II is defined just as that of Player $P$ except that it is only the function of $x_0, \ldots, x_k, y_0, \ldots, y_k, k$, since the choice of the point $x_{k+1} \in U_{x_k}$ is unknown to Player II at the $k$-th step. Player II's strategy will be denoted by $v(\cdot)$. It is easily comprehended that, as in Example 2, to each pair of initial conditions $x_0 = \{x_0^1, \ldots, x_0^i, \ldots, x_0^m\}$, $y_0$ and a pair of chosen strategies $u(\cdot), v(\cdot)$ there uniquely corresponds a trajectory collection for the detail $P = \{P_1, \ldots, P_m\}$ and Player II's trajectory, determined by the recurrent rule

\[ x_{k+1} = u(x_0, \ldots, x_k, y_0, \ldots, y_k, y_{k+1}, k), \qquad y_{k+1} = v(x_0, \ldots, x_k, y_0, \ldots, y_k, k), \qquad k = 1, \ldots, N-1. \]

Consequently, the initial conditions $x_0, y_0$ and the pair of strategies $u(\cdot), v(\cdot)$ define the players' payoff in a unique way. Therefore, the payoff function may be defined as

\[ K(u(\cdot), v(\cdot)) = \min_{1 \le i \le m} \Bigl\{ \min_{0 \le k \le N} \rho(x_k^i, y_k) \Bigr\}, \]

where $x_0^i, \ldots, x_N^i$; $y_0, \ldots, y_N$, $i = 1, \ldots, m$, are the trajectories for the partners of detail $P$ and the Evader $E$ corresponding uniquely to the chosen strategies $u(\cdot), v(\cdot)$ and initial conditions $x_0, y_0$.
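For given trajectories, the payoff of Example 7 is a double minimum, over the pursuers and over the steps. A minimal Python sketch follows; the function name `team_payoff` and the sample trajectories are hypothetical and serve only to illustrate the formula.

```python
import math

def team_payoff(pursuer_paths, evader_path):
    """Minimum over pursuers i and steps k of the distance rho(x_k^i, y_k)."""
    return min(
        math.hypot(x[0] - y[0], x[1] - y[1])
        for path in pursuer_paths          # one trajectory per pursuer P_i
        for x, y in zip(path, evader_path)  # the same step k for both players
    )

# Hypothetical data: two pursuers, one step (N = 1).
p1 = [(0.0, 0.0), (1.0, 0.0)]
p2 = [(0.0, 4.0), (3.0, 4.0)]
evader = [(3.0, 0.0), (3.0, 3.0)]
print(team_payoff([p1, p2], evader))
```

In the sample, the second pursuer ends one unit away from the evader, so the double minimum is attained by him at the last step even though the first pursuer starts closer.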
Examples 3-6 can be naturally generalized to the case where several pursuers, acting as one player, and one evader are involved. The above examples are representative of the variety of zero-sum games even in the simple case when they provide a qualitative model for the same physical process, pursuit. Beginning with the second example, we have been dealing with dynamic decision processes developing in time. To obtain a correct formulation of such an important notion as strategy, it is essential that the information capabilities of both parties should be defined precisely. After the strategies had been defined for both players, the games lost their dynamic character and came to bear resemblance, at least outwardly, to the game in Example 1. Indeed, having chosen strategies $u(\cdot), v(\cdot)$ at the start of the game, the players then did not make any new decisions since, by definition, the strategies envisaged a unique choice of decision at each step of the game and in any information setting. Therefore, the choice of strategies $u(\cdot), v(\cdot)$ determines uniquely the outcome of the game, i.e. the payoff. Such a transition from the dynamic game to the static state, when the entire game reduces to a simultaneous choice of strategies and subsequent computation of the payoff function, is called the normalization of the game, and the game itself the game in normal form [2].

We now offer a general definition for the zero-sum game in normal form.

Definition 1. Let us assume that two arbitrary sets $P$, $E$ are given. The set $P$ is called the set of strategies for Player I, and the set $E$ is the set of strategies for Player II. The elements of the sets $P$ and $E$ are denoted by $u(\cdot)$, $v(\cdot)$, respectively. The real function $K$ is defined on the Cartesian product $P \times E$. The triple $\Gamma = \langle P, E, K \rangle$ is called the zero-sum game in normal form.

The game $\Gamma$ may be interpreted as follows. Players I and II make a simultaneous (that is, without any information on the opponent's choice) choice of elements $u(\cdot) \in P$ and $v(\cdot) \in E$. Player II receives the payoff equal to $K(u(\cdot), v(\cdot))$, and Player I the payoff equal to $-K(u(\cdot), v(\cdot))$. In Example 1, the sets $P$ and $E$ coincide with the sets $S_1$ and $S_2$ on the plane, and the function $K(u(\cdot), v(\cdot))$ was the Euclidean distance between the points $x \in S_1$ and $y \in S_2$.

In the subsequent examples the strategy sets $P$ and $E$ had a more complex structure. They represented the sets of functions $u(\cdot), v(\cdot)$ that were defined for all possible information states governed by the rules of the game, and the payoff function was defined uniquely by the strategies chosen and was computed from the players' trajectories. Another feature of Examples 2-7 is that the game in normal form depended on the initial states of players $x_0, y_0$. Therefore, we actually dealt with the entire family of the games in normal form $\Gamma_{x_0, y_0}$ dependent on the initial states $x_0, y_0$ as parameters.

Definition 2. The pair of strategies $u(\cdot), v(\cdot)$, where $u(\cdot) \in P$, $v(\cdot) \in E$, is called the situation in the game $\Gamma$. The definition of the game $\Gamma$ suggests that the situation in the game $\Gamma$ defines the players' payoff in a unique way.
1.2 Equilibrium Point
Let the game $\Gamma$ be given in normal form $\Gamma = \langle P, E, K \rangle$. For simplicity, assume that the sets $P$ and $E$ are compact, and the payoff function $K$ is continuous. We shall define the minimum payoff which can be ensured by Player II in the game $\Gamma$. If the strategy $u(\cdot)$ chosen by the opponent were known to Player II, he would always ensure the payoff

$$\max_{v(\cdot)} K(u(\cdot), v(\cdot)) = \Phi_1(u(\cdot)).$$

With this in mind, and being "wise", Player I can choose the strategy $u_0(\cdot)$ from the condition $\Phi_1(u_0(\cdot)) = \min_{u(\cdot)} \Phi_1(u(\cdot))$, i.e.

$$\Phi_1(u_0(\cdot)) = \min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)) = \bar{v}. \qquad (1.2.1)$$
Let $u_0(\cdot)$ be a strategy for Player I, under which

$$\min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)) = \max_{v(\cdot)} K(u_0(\cdot), v(\cdot)) = \bar{v}$$

is achieved. Then the strategy $u_0(\cdot)$ is called the minimax strategy for Player I, and $\bar{v}$ the upper value of the game. When Player I plays the minimax strategy $u_0(\cdot)$, Player II's payoff does not exceed the quantity $\bar{v}$ under any strategy $v(\cdot) \in E$. In fact, for all $v(\cdot) \in E$ the following inequality holds:

$$\bar{v} = \max_{v(\cdot)} K(u_0(\cdot), v(\cdot)) \ge K(u_0(\cdot), v(\cdot)).$$

The latter also means that, in playing the strategy $u_0(\cdot)$, the loss of Player I (in the zero-sum game, the loss of Player I is equal to the payoff of Player II) does not exceed $\bar{v}$. Note that, to ensure the payoff $\bar{v}$, Player II has to know the choice made by Player I. In this case, choosing a strategy $v_0(\cdot)$ from the condition $\max_{v(\cdot)} K(u_0(\cdot), v(\cdot)) = K(u_0(\cdot), v_0(\cdot)) = \bar{v}$, he obtains the payoff $\bar{v}$. In choosing the strategy $v_0(\cdot)$, however, he cannot ensure the payoff $\bar{v}$ under any strategy $u(\cdot)$ for Player I, since in the situation $(\tilde{u}(\cdot), v_0(\cdot))$ it may turn out that $K(\tilde{u}(\cdot), v_0(\cdot)) < K(u_0(\cdot), v_0(\cdot))$.
Let us now assume that Player I somehow acquired knowledge of the strategy $v(\cdot)$ chosen by Player II in the game $\Gamma$. Then he can always ensure that the payoff of Player II does not exceed the quantity

$$\min_{u(\cdot)} K(u(\cdot), v(\cdot)) = \Phi_2(v(\cdot)).$$

With this in mind, and being "wise", Player II can choose a strategy $v_0(\cdot)$ from the condition $\Phi_2(v_0(\cdot)) = \max_{v(\cdot)} \Phi_2(v(\cdot))$, i.e.

$$\Phi_2(v_0(\cdot)) = \max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) = \underline{v}. \qquad (1.2.2)$$

Let $v_0(\cdot)$ be a strategy for Player II, under which

$$\max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) = \min_{u(\cdot)} K(u(\cdot), v_0(\cdot)) = \underline{v}.$$
The strategy $v_0(\cdot)$ is called the maximin strategy of Player II, and $\underline{v}$ the lower value of the game. By playing the maximin strategy $v_0(\cdot)$, Player II ensures the payoff $\underline{v}$, whatever strategy may be chosen by Player I. Indeed, for all $u(\cdot) \in P$,

$$\underline{v} = \min_{u(\cdot)} K(u(\cdot), v_0(\cdot)) \le K(u(\cdot), v_0(\cdot)).$$

The definition of the quantities $\bar{v}$ and $\underline{v}$ suggests that there is always

$$\bar{v} \ge \underline{v}. \qquad (1.2.3)$$
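For a finite game the quantities in (1.2.1)-(1.2.3) can be computed directly. The following is a minimal sketch, assuming the payoff function is given as a matrix `K[i][j]` whose rows are the strategies of Player I (the minimizer) and whose columns are the strategies of Player II (the maximizer). The function names and the two sample matrices are illustrative and do not come from the text.

```python
def upper_value(K):
    """Upper value (1.2.1): min over rows u of max over columns v of K[u][v]."""
    return min(max(row) for row in K)


def lower_value(K):
    """Lower value (1.2.2): max over columns v of min over rows u of K[u][v]."""
    n_cols = len(K[0])
    return max(min(row[j] for row in K) for j in range(n_cols))


# Hypothetical payoff matrices (payoff to Player II).
K_no_saddle = [[1, -1],
               [-1, 1]]      # upper value 1, lower value -1
K_saddle = [[3, 1],
            [4, 2]]          # upper value 3, lower value 3

for K in (K_no_saddle, K_saddle):
    # Inequality (1.2.3) holds for every matrix.
    print(upper_value(K), lower_value(K), upper_value(K) >= lower_value(K))
```

In the first matrix the two values differ, so neither cautious strategy can be recommended unconditionally; in the second they coincide.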
Thus, Player I can always ensure that the payoff of Player II does not exceed $\bar{v}$, and Player II can always ensure a payoff that is at least equal to $\underline{v}$. Below is given a rigorous proof of this statement for a more general case.

Lemma 1 Let the game $\Gamma = \langle P, E, K \rangle$ be given. Then

$$\bar{v} = \inf_{u(\cdot)} \sup_{v(\cdot)} K(u(\cdot), v(\cdot)) \ge \sup_{v(\cdot)} \inf_{u(\cdot)} K(u(\cdot), v(\cdot)) = \underline{v} \qquad (1.2.4)$$

(the compactness of the sets $P$, $E$ and the continuity of the function $K(u(\cdot), v(\cdot))$ are not assumed here).

Proof: By definition, for any strategy $u(\cdot)$,

$$\sup_{v(\cdot)} K(u(\cdot), v(\cdot)) \ge \sup_{v(\cdot)} \inf_{u(\cdot)} K(u(\cdot), v(\cdot)). \qquad (1.2.5)$$

Since (1.2.5) holds for all $u(\cdot)$, then

$$\bar{v} = \inf_{u(\cdot)} \sup_{v(\cdot)} K(u(\cdot), v(\cdot)) \ge \sup_{v(\cdot)} \inf_{u(\cdot)} K(u(\cdot), v(\cdot)) = \underline{v}.$$

If $\bar{v} > \underline{v}$, then it is hardly reasonable to recommend that Players I and II adopt the minimax (1.2.1) and maximin (1.2.2) strategies $u_0(\cdot)$, $v_0(\cdot)$, respectively. Both strategies are among the too "cautious" strategies, and their choice is justified if the player is said to acquire, somehow, knowledge of the strategy chosen by his opponent in the game $\Gamma$. At the same time, by the rules of the game, exactly this must not occur in the game $\Gamma$, since the players make their choices simultaneously and independently of one another (none of the players knows the strategy chosen by the opponent). An important problem in the theory of zero-sum two-person games is to reveal the class of games for which the equality

$$\bar{v} = \underline{v} = v \qquad (1.2.6)$$
is satisfied. If (1.2.6) is satisfied, then by properly choosing a maximin strategy Player II ensures the payoff $v = \underline{v} = \bar{v}$, and with the "cautious" play Player I does not allow Player II to get a payoff greater than $v = \underline{v} = \bar{v}$. Thus, if Player II has no additional information on Player I's choice (he may think that Player I makes a random choice of strategy), he must play so as to ensure the payoff $v$. If Player II deviates from the strategy which assures him $v$, his payoff may be less than $v$. It is therefore reasonable to consider $v$ the value of the game for Player II. The strategies $u^*(\cdot), v^*(\cdot)$, for which

$$K(u^*(\cdot), v(\cdot)) \le v \quad \text{for all } v(\cdot) \in E, \qquad (1.2.7)$$

$$K(u(\cdot), v^*(\cdot)) \ge v \quad \text{for all } u(\cdot) \in P, \qquad (1.2.8)$$

are called the optimal strategies for Players I and II. The following theorem relates the optimal strategies to the equality (1.2.6).

Theorem 1 Let $P$ and $E$ be compact sets, and the function $K(u(\cdot), v(\cdot))$ be continuous. For the existence of optimal strategies $u^*(\cdot), v^*(\cdot)$ and the value $v$ in the game $\Gamma$, it is necessary and sufficient that

$$\bar{v} = \min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)) = K(u^*(\cdot), v^*(\cdot)) = v = \max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) = \underline{v}.$$

Proof: Necessity. Since for all $u(\cdot)$, $K(u(\cdot), v^*(\cdot)) \ge v$, then

$$\min_{u(\cdot)} K(u(\cdot), v^*(\cdot)) \ge v, \qquad \max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) \ge v.$$

Similarly, the inequality $\max_{v(\cdot)} K(u^*(\cdot), v(\cdot)) \le v$ implies

$$\min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)) \le v.$$

Thus, we have

$$\max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) \ge \min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)).$$

Together with Lemma 1 we have $\underline{v} = \bar{v}$.

Sufficiency. We choose $u^*(\cdot)$ from the condition

$$\max_{v(\cdot)} K(u^*(\cdot), v(\cdot)) = \min_{u(\cdot)} \max_{v(\cdot)} K(u(\cdot), v(\cdot)) = v$$

and $v^*(\cdot)$ from the condition

$$\min_{u(\cdot)} K(u(\cdot), v^*(\cdot)) = \max_{v(\cdot)} \min_{u(\cdot)} K(u(\cdot), v(\cdot)) = v.$$

Then $K(u(\cdot), v^*(\cdot)) \ge v$ and $K(u^*(\cdot), v(\cdot)) \le v$. We have thus completed the proof of Theorem 1. $\square$
It can be easily seen that if the optimal strategies $u^*(\cdot), v^*(\cdot)$ exist (see conditions (1.2.7), (1.2.8)), then the equality $K(u^*(\cdot), v^*(\cdot)) = v$ is satisfied.

Definition 1. The situation $(u^*(\cdot), v^*(\cdot))$ is called the equilibrium point ("saddle point") in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$ if for all $u(\cdot) \in P$ and $v(\cdot) \in E$ the inequality

$$K(u(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v^*(\cdot)) = v \ge K(u^*(\cdot), v(\cdot)) \qquad (1.2.9)$$

holds.
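For a finite game, condition (1.2.9) can be checked by direct enumeration. The sketch below again assumes a matrix `K[i][j]` with rows for Player I (minimizer) and columns for Player II (maximizer); an entry is a saddle point exactly when it is the smallest entry in its column and the largest entry in its row. The helper name `saddle_points` and the sample matrices are illustrative only.

```python
def saddle_points(K):
    """Return all index pairs (i, j) satisfying (1.2.9):
    K[u][j] >= K[i][j] >= K[i][v] for every row u and column v."""
    points = []
    for i, row in enumerate(K):
        for j, value in enumerate(row):
            column = [r[j] for r in K]
            # Player I cannot lower the payoff by changing the row,
            # Player II cannot raise it by changing the column.
            if value == min(column) and value == max(row):
                points.append((i, j))
    return points


print(saddle_points([[3, 1], [4, 2]]))    # one saddle point, at row 0 and column 0
print(saddle_points([[1, -1], [-1, 1]]))  # no saddle point in pure strategies
```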
The meaning of optimal strategies and saddle point can be readily seen from (1.2.9). If the players choose the strategies $u^*(\cdot), v^*(\cdot)$ constituting the equilibrium point, then none of them seeks to change his strategy. Indeed, from the left-hand side of inequality (1.2.9) it follows that if Player I deviates from the optimal strategy, then the payoff of Player II only increases, and hence his own payoff only decreases (provided Player II continues to play the strategy $v^*(\cdot)$). At the same time, the right-hand side of inequality (1.2.9) implies that if Player II deviates from the optimal strategy $v^*(\cdot)$, then his payoff only decreases (provided Player I plays the optimal strategy). Another important feature of optimal strategies is that, before the game starts, the players may inform one another of the use of optimal strategies in the game. Such information does not increase the opponent's payoff in any possible way, since the only reasonable choice here is the choice of optimal strategies, which involves obtaining the value of the game.

Definition 2. Let an $\varepsilon > 0$ be given. The strategies $u^\varepsilon(\cdot), v^\varepsilon(\cdot)$ are called the $\varepsilon$-optimal strategies for Player I and Player II, and the situation $(u^\varepsilon(\cdot), v^\varepsilon(\cdot))$ the $\varepsilon$-equilibrium point, if the inequality

$$K(u(\cdot), v^\varepsilon(\cdot)) + \varepsilon \ge K(u^\varepsilon(\cdot), v^\varepsilon(\cdot)) \ge K(u^\varepsilon(\cdot), v(\cdot)) - \varepsilon$$

holds for all $u(\cdot) \in P$, $v(\cdot) \in E$.

Definition 3. If in the game $\Gamma$ for any $\varepsilon > 0$ there is the $\varepsilon$-equilibrium point, then the limit $\lim_{\varepsilon \to 0} K(u^\varepsilon(\cdot), v^\varepsilon(\cdot))$ is called the value of the game and is denoted by $v$. From the above reasoning it follows that this definition is in agreement with the definition of the game value given above. Note that the value of the game can also exist even when the optimal strategies for Players I and II do not exist. The following theorem contains the necessary and sufficient condition for the existence of the value of the game.
Theorem 2 For the existence of the $\varepsilon$-equilibrium point and the value of the game in the game $\Gamma$ for any $\varepsilon > 0$, it is necessary and sufficient that

$$\sup_{v(\cdot)} \inf_{u(\cdot)} K(u(\cdot), v(\cdot)) = \inf_{u(\cdot)} \sup_{v(\cdot)} K(u(\cdot), v(\cdot)) = v.$$

The proof is omitted here, since it is similar to the one for Theorem 1. In what follows, by the solution of the (zero-sum) game is meant finding the saddle point, if any, or finding the $\varepsilon$-equilibrium point for any $\varepsilon > 0$.

The above definition may be illustrated by referring to Example 1 from 1. Assume that the sets $S_1$ and $S_2$ are the closed circles with radii $R_1$ and $R_2$ ($R_1 > R_2$). Let us calculate the lower value of the game $\underline{v} = \max_{y \in S_2} \min_{x \in S_1} \rho(x, y)$. Let $y_0 \in S_2$. Then $\min_{x \in S_1} \rho(x, y_0)$ is achieved at the point $x_0$, where the straight line passing through the center $O$ of the circle $S_1$ and the point $y_0$ intersects the boundary of the circle $S_1$. Evidently, the quantity $\min_{x \in S_1} \rho(x, y_0)$ attains its maximum value at the point $M \in S_2$, where the line of centers $OO_1$ (Fig. 5) intersects the boundary of the circle $S_2$ at the point farthest from the point $O$. Thus, $\underline{v} = \rho(O, M) - R_1$.

Fig. 5.
In order to calculate the upper value of the game $\bar{v} = \min_{x \in S_1} \max_{y \in S_2} \rho(x, y)$, consider two cases.

Case 1. The center $O_1$ of the circle $S_2$ belongs to the set $S_1$ (Fig. 6). For each $x_0 \in S_1$ the point $y_0$ delivering $\max_{y \in S_2} \rho(x_0, y)$ is constructed as follows. Let $y_0^1$ and $y_0^2$ be the points of intersection of the straight line $Ox_0$ with the boundary of the circle $S_2$, and $y_0^3$ be the point of intersection of the straight line $O_1 x_0$ with the boundary of the circle $S_2$ which is the farthest from the point $x_0$. Then $y_0$ is determined from the condition

$$\rho(x_0, y_0) = \max_{i = 1, 2, 3} \rho(x_0, y_0^i).$$

By construction, for all $x_0 \in S_1$,

$$\max_{y \in S_2} \rho(x_0, y) = \rho(x_0, y_0) \ge R_2.$$

For $x_0 = O_1$, however, we get

$$\max_{y \in S_2} \rho(O_1, y) = R_2,$$

therefore

$$\min_{x \in S_1} \max_{y \in S_2} \rho(x, y) = R_2.$$
Fig. 7.

It can be directly seen that since $O_1 \in S_1$, in Case 1 $\bar{v} = R_2 \ge OM - R_1 = \underline{v}$. Moreover, the equality $\bar{v} = \underline{v}$ is possible only if $O_1$ belongs to the boundary of the set $S_1$.

Thus, if in Case 1 $O_1$ does not belong to the boundary of the set $S_1$, then the equilibrium point and the value of the game do not exist. If, however, $O_1$ belongs to the boundary of the set $S_1$, there is an equilibrium point. Here the optimal strategy for Player II is to choose the point $M$ lying on the intersection of the line of centers $OO_1$ with the boundary of the set $S_2$ and being farthest from the point $O$. The optimal strategy for Player I is to choose the point $x \in S_1$ which is coincident with the center $O_1$ of the circle $S_2$. The value of the game here is $v = \underline{v} = \bar{v} = R_1 + R_2 - R_1 = R_2$.
Case 2. The center $O_1$ of the circle $S_2$ does not belong to $S_1$: $O_1 \notin S_1$. This case is a version of Case 1 in which the center of the circle $S_2$ belongs to the boundary of the set $S_1$. Let us calculate the quantity $\bar{v}$ (Fig. 7). Let $x_0 \in S_1$. The point $y_0$ delivering $\max_{y \in S_2} \rho(x_0, y)$ then coincides with the point at which the straight line passing through $x_0$ and the center $O_1$ of the circle $S_2$ intersects the boundary of the circle $S_2$ and which is farthest from the point $x_0$. Indeed, the circle with the radius $x_0 y_0$, its center being at the point $x_0$, contains $S_2$, and its boundary touches the boundary of the circle $S_2$ only at the one point $y_0$. Evidently, the magnitude of $\max_{y \in S_2} \rho(x_0, y) = \rho(x_0, y_0)$ attains a minimum at the point $M_1$ where the line segment $OM$ intersects the boundary of the circle $S_1$. Thus, in this case

$$\bar{v} = \min_{x \in S_1} \max_{y \in S_2} \rho(x, y) = OM - R_1 = \underline{v}.$$

The optimal strategies imply the choice of $M_1 \in S_1$ and $M \in S_2$ made by Players I and II, respectively.
If the open circles $S_1$ and $S_2$ are regarded as the strategy sets in Example 1, then in Case 2 the value of the game exists and equals

$$\underline{v} = \sup_{y \in S_2} \inf_{x \in S_1} \rho(x, y) = \inf_{x \in S_1} \sup_{y \in S_2} \rho(x, y) = \bar{v} = OM - R_1 = v.$$

The optimal strategies, however, do not exist, since $M_1 \notin S_1$, $M \notin S_2$. At the same time, $\varepsilon$-optimal strategies exist; these are points from the $\varepsilon$-neighborhoods of the points $M_1$, $M$ belonging to the sets $S_1$ and $S_2$, respectively.
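The two-circle example can be summarized in closed form. The sketch below assumes closed circles, writes `d` for the distance $OO_1$ between the centers, and uses the facts that the distance from a point $y$ to the circle $S_1$ is $\max(0, |Oy| - R_1)$ and that the farthest point of $S_2$ from $x$ lies at distance $|xO_1| + R_2$. The clamp at zero (for the case where $S_2$ lies inside $S_1$) is an addition made here, not a formula from the text; the function names are illustrative.

```python
import math


def lower_value(d, r1, r2):
    """max over y in S2 of the distance from y to S1; equals OM - R1 when M lies outside S1."""
    return max(0.0, d + r2 - r1)


def upper_value(d, r1, r2):
    """min over x in S1 of (|x O1| + R2): R2 in Case 1, OM - R1 in Case 2."""
    return r2 + max(0.0, d - r1)


def has_equilibrium(d, r1, r2):
    """An equilibrium point in pure strategies exists iff the two values coincide."""
    return math.isclose(lower_value(d, r1, r2), upper_value(d, r1, r2))


r1, r2 = 2.0, 1.5
for d in (1.0, 2.0, 3.0):   # O1 inside S1, on its boundary, outside S1
    print(d, lower_value(d, r1, r2), upper_value(d, r1, r2), has_equilibrium(d, r1, r2))
```

With these assumed radii the values coincide only when $O_1$ lies on the boundary of $S_1$ or outside it, in agreement with Cases 1 and 2.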
1.3 Mixed strategies and existence of equilibrium point
Example 1 in 1 shows that the equilibrium point does not exist in all zero-sum two-person games. The existence of the equilibrium point is found to be exceptional rather than common in nature. In this case it is not clear what may be understood by a solution of the game. Assume that the game $\Gamma$ has no equilibrium point. As stated in the preceding section, in such games it is very important for each player to know the opponent's intentions. Although the rules of the game do not provide such an opportunity, with sufficient repetition of the game against the same opponent the player can statistically estimate the possibility of a particular strategy being chosen and take proper action. What shall the player do to keep his intention secret from the opponent? The only reasonable method here is to choose a strategy at random, in compliance with a random mechanism. In this case, the player cannot guess which of the strategies has been chosen by his opponent, since the latter himself cannot foretell the result of the random choice.
Formally, such a possibility can be realized as follows. For each game $\Gamma$, we construct its mixed extension $\bar{\Gamma}$, in which the players' strategies are represented by various probability measures over the sets of strategies for the game $\Gamma$, and by the payoff is meant the mathematical expectation. We offer a rigorous definition. Let $\mathcal{A}$ be some $\sigma$-algebra of the subsets of $P$, and $\mathcal{B}$ some $\sigma$-algebra of the subsets of $E$. The mixed strategies of Players I and II are probability measures $\mu$ and $\nu$ defined on $\mathcal{A}$ and $\mathcal{B}$, respectively; the sets of these measures are denoted by $\bar{P}$ and $\bar{E}$. The payoff in mixed strategies is the mathematical expectation

$$M(\mu, \nu) = \int_P \int_E K(u(\cdot), v(\cdot)) \, d\nu \, d\mu.$$

The pair $(\mu^*, \nu^*)$ is called the equilibrium point in mixed strategies if for all $\mu \in \bar{P}$, $\nu \in \bar{E}$ the inequality

$$M(\mu, \nu^*) \ge M(\mu^*, \nu^*) \ge M(\mu^*, \nu) \qquad (1.3.1)$$

is satisfied. The strategies $\mu^*, \nu^*$ are called the optimal mixed strategies for Players I and II, respectively. Evidently, ordinary strategies are a special case of the mixed strategies, i.e. the inclusion $P \subset \bar{P}$, $E \subset \bar{E}$ is valid. Indeed, each strategy $u(\cdot) \in P$ ($v(\cdot) \in E$) can be associated with the probability measure $\mu_{u(\cdot)} \in \bar{P}$ ($\nu_{v(\cdot)} \in \bar{E}$) concentrating the total mass on the element $u(\cdot) \in P$ ($v(\cdot) \in E$). It can be shown that if in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$ there is an equilibrium point, then the equilibrium point also exists in the game $\bar{\Gamma} = \langle \bar{P}, \bar{E}, M(\mu, \nu) \rangle$ (that is, the equilibrium point also exists in mixed strategies in the game $\Gamma$), and the strategies $\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)}$ prescribing probability 1 to the elements $u^*(\cdot) \in P$ and $v^*(\cdot) \in E$ constitute the equilibrium point in the game $\bar{\Gamma}$, i.e. these are the mixed optimal strategies in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$.
Indeed, let $(u^*(\cdot), v^*(\cdot))$ be the equilibrium point in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$. Then

$$K(u(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v(\cdot)) \qquad (1.3.2)$$

for all $u(\cdot) \in P$, $v(\cdot) \in E$. We shall rewrite (1.3.2) as

$$K(u(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v^*(\cdot)), \qquad (1.3.3)$$

$$K(u^*(\cdot), v(\cdot)) \le K(u^*(\cdot), v^*(\cdot)). \qquad (1.3.4)$$

Integrating (1.3.3) with respect to an arbitrary measure $\mu \in \bar{P}$, and (1.3.4) with respect to an arbitrary measure $\nu \in \bar{E}$, we obtain

$$\int K(u(\cdot), v^*(\cdot)) \, d\mu \ge \int K(u^*(\cdot), v^*(\cdot)) \, d\mu = K(u^*(\cdot), v^*(\cdot)), \qquad (1.3.5)$$

$$\int K(u^*(\cdot), v(\cdot)) \, d\nu \le \int K(u^*(\cdot), v^*(\cdot)) \, d\nu = K(u^*(\cdot), v^*(\cdot)). \qquad (1.3.6)$$

We integrate (1.3.5) and (1.3.6) with the measures $\nu_{v^*(\cdot)}$, $\mu_{u^*(\cdot)}$, respectively. Then, using (1.3.5) and (1.3.6), we find

$$M(\mu, \nu_{v^*(\cdot)}) \ge M(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)}) \ge M(\mu_{u^*(\cdot)}, \nu), \qquad (1.3.7)$$

i.e. the point $(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)})$ is the equilibrium point in mixed strategies in the game $\Gamma$, and $M(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)}) = K(u^*(\cdot), v^*(\cdot))$.
For simplicity, we introduce the following notation:

$$M(\mu_{u(\cdot)}, \nu) = M(u(\cdot), \nu), \qquad M(\mu, \nu_{v(\cdot)}) = M(\mu, v(\cdot)).$$

Theorem 3 For the pair $\mu^*, \nu^*$ to be the equilibrium point in the game $\bar{\Gamma}$, it is necessary and sufficient that for all $u(\cdot) \in P$, $v(\cdot) \in E$ the following inequality holds:

$$M(u(\cdot), \nu^*) \ge M(\mu^*, \nu^*) \ge M(\mu^*, v(\cdot)). \qquad (1.3.8)$$
Proof: Sufficiency. Let $\mu \in \bar{P}$, $\nu \in \bar{E}$ be arbitrary mixed strategies. We integrate both parts of the left-hand inequality in (1.3.8) with respect to the measure $\mu$, and both parts of the right-hand inequality in (1.3.8) with respect to the measure $\nu$. As a result, we obtain

$$M(\mu, \nu^*) = \int M(u(\cdot), \nu^*) \, d\mu \ge \int M(\mu^*, \nu^*) \, d\mu = M(\mu^*, \nu^*), \qquad (1.3.9)$$

$$M(\mu^*, \nu^*) = \int M(\mu^*, \nu^*) \, d\nu \ge \int M(\mu^*, v(\cdot)) \, d\nu = M(\mu^*, \nu). \qquad (1.3.10)$$

From (1.3.9) and (1.3.10) we find that the pair $(\mu^*, \nu^*)$ constitutes the equilibrium point in mixed strategies. The necessity is evident, since if (1.3.1) is satisfied for all $\mu \in \bar{P}$, $\nu \in \bar{E}$, then, in particular, it also holds for all strategies $\mu_{u(\cdot)}, \nu_{v(\cdot)}$ prescribing probability one to the elements $u(\cdot) \in P$, $v(\cdot) \in E$. $\square$
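Theorem 3 reduces the check of a mixed equilibrium to deviations in pure strategies only. A minimal sketch for a finite game follows, assuming a payoff matrix `K[i][j]` (rows for Player I, columns for Player II) and mixed strategies given as probability vectors `mu` and `nu`. The function names, the tolerance and the matching-pennies matrix are assumptions chosen for illustration.

```python
def expected_payoff(K, mu, nu):
    """M(mu, nu): mathematical expectation of the payoff under the product measure."""
    return sum(mu[i] * nu[j] * K[i][j]
               for i in range(len(mu)) for j in range(len(nu)))


def is_mixed_equilibrium(K, mu, nu, tol=1e-12):
    """Check inequality (1.3.8) against every pure strategy of each player."""
    m = expected_payoff(K, mu, nu)
    n_rows, n_cols = len(K), len(K[0])
    # M(u, nu*) >= M(mu*, nu*) for every pure strategy u of Player I.
    left = all(sum(nu[j] * K[i][j] for j in range(n_cols)) >= m - tol
               for i in range(n_rows))
    # M(mu*, nu*) >= M(mu*, v) for every pure strategy v of Player II.
    right = all(m >= sum(mu[i] * K[i][j] for i in range(n_rows)) - tol
                for j in range(n_cols))
    return left and right


K = [[1, -1],
     [-1, 1]]   # no saddle point in pure strategies
print(is_mixed_equilibrium(K, [0.5, 0.5], [0.5, 0.5]))  # uniform mixing by both players
print(is_mixed_equilibrium(K, [1.0, 0.0], [0.5, 0.5]))  # Player I plays a pure strategy
```

The first pair passes the test of Theorem 3; the second fails because Player II gains by deviating to the first column.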
As shown, the mixed strategies extend the players' strategy sets in the original game and preserve the value of the game and the equilibrium point, if any. In most cases, the equilibrium point in mixed strategies is found to exist, while the equilibrium point in the original strategies may not exist. We consider a special class of zero-sum two-person games $\Gamma$ where there is an equilibrium point in mixed strategies.

Theorem 4 Let $P \in \mathrm{Comp}\, R^k$ ($E \in \mathrm{Comp}\, R^l$) be compact sets in Euclidean spaces of corresponding dimension, and the function $K(u(\cdot), v(\cdot))$ be continuous on the Cartesian product $P \times E$. Then in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$ there is an equilibrium point in mixed strategies.

For the proof of this theorem you may refer to [8]. In some cases, using specific properties of the payoff function, we may provide a more precise description of the structure of mixed optimal strategies in the game $\Gamma$. Specifically, this occurs in the games with convex and concave payoff functions. In what follows we will be dealing with this type of games and, therefore, we will consider them in detail in the following section.
1.4 Games with convex payoff function
Assume that the set $E$ is compact in $R^n$, the set $P$ is convex and compact in $R^n$, and the function $K(u(\cdot), v(\cdot))$ is continuous in all its variables and convex in $u(\cdot)$ for all $v(\cdot)$. Such a game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot)) \rangle$ will be called the game with convex payoff function (or convex game).

Let us consider a mixed extension of the game $\Gamma$ over spaces of various probability measures on $P$ and $E$. Investigation of this type of games is of interest to us because the solution of the family of pursuit-evasion type games with incomplete information reduces to them. The solution has a simple geometrical structure. In this section we will define a general form of optimal strategies and indicate a method of computing the value of the game. We hope that the reader has some knowledge of the fundamentals of the theory of convex sets and convex functions; therefore the relevant results of this theory are given without proof. Let $I_z$ be a mixed strategy by which the point $z$ is chosen with probability 1. The following theorem holds.
Theorem 5 The value of the game $\Gamma$ is determined by the formula

$$v = \min_{u} \max_{v} K(u, v).$$

Player II has an optimal strategy of the form

$$\nu_0 = \sum_{i=1}^{n+1} \lambda_i I_{v_i}, \qquad \sum_{i=1}^{n+1} \lambda_i = 1, \quad \lambda_i \ge 0.$$

At the same time, all pure strategies $u$, under which $\min_u \max_v K(u, v)$ is achieved, are optimal for Player I. In addition, if the function $K(u, v)$ with each fixed $v(\cdot)$ is strictly convex in $u(\cdot)$, then the optimal strategy for Player I is unique.
In Theorem 5, $\nu_0$ is a probability measure concentrated on not more than $(n + 1)$ points in the set $E$, $v_i(\cdot) \in E$. Before turning to the proof of Theorem 5, we shall prove some auxiliary lemmas describing the properties of convex functions. Let $B$ be a compact convex region in the $n$-dimensional space whose elements are denoted by $y$. The function $f$ is called linear if $f(\sum \lambda_i y_i) = \sum \lambda_i f(y_i)$ with $\sum \lambda_i = 1$ (the linear function is not necessarily homogeneous). The function $f(y) \equiv 1$, denoted by $w$, is linear. The linear functions form the $(n + 1)$-dimensional linear space $E$. We denote by $F$ an element of the conjugate space $E'$.

Lemma 3 If $F(w) = 1$, then there is such $y$ that $F(f) = f(y)$ for all $f \in E$.

Proof: It suffices to show that the $n + 1$ equations $F(f_i) = f_i(y)$ have a solution with respect to $y$ for $n + 1$ linearly independent elements $f_i \in E$. But $w$ may be taken as $f_1$. The first equation then converts to an identity. The other $n$ equations are independent and have a solution, since $y$ is an element of the $n$-dimensional space. $\square$
Lemma 4 The set of all functions $f$ that are nonnegative on the set $B$ forms a closed convex cone $P \subset E$ with its vertex at zero, and the function $w$ is its interior point. Moreover, the region of points $y$ for which $P(y) \ge 0$ coincides with $B$. (We denote by $P(y)$ the collection of all $f(y)$ for $f \in P$, and by $f(B)$ the collection of all $f(y)$ for $y \in B$.)

Proof: First show that $P$ is a closed convex cone. Let $f_1, f_2 \in P$, $\alpha \ge 0$, $\beta \ge 0$. Then $f = \alpha f_1 + \beta f_2$ also belongs to $P$, since $f(y) = \alpha f_1(y) + \beta f_2(y) \ge 0$ for all $y \in B$. That $P$ is closed is evident. If $f \equiv 0$, then $f \in P$.

We now show that the region for which $P(y) \ge 0$ coincides with $B$. Let $y \notin B$. Then $y$ may be separated from $B$ by a hyperplane, i.e. it is possible to indicate such $f \in E$ that $f(y) < c < f(B)$. Consequently, the function $f - cw$ belongs to $P$ and is negative at $y$.

Suppose $w$ is not an interior point of $P$. Then there is a sequence $f_n \to w$, $f_n \notin P$. For each $f_n$ there is $y_n \in B$ such that $f_n(y_n) < 0$. Since $B$ is a compact region, there exists a subsequence $y_{n_k}$ converging to $y_0 \in B$ and such that $f_{n_k}(y_{n_k}) < 0$. In this case, however, $\lim_{k \to \infty} f_{n_k}(y_0) \le 0$. At the same time, $\lim_{n \to \infty} f_n(y) = 1$ for all $y$ in $B$, which is a contradiction, i.e. $w$ is an interior point of the set $P$. $\square$
Lemma 5 Let $Q$ be a compact convex set from $E$ not intersecting $P$. Then there are such $y \in B$ and $\delta > 0$ that $Q(y) < -\delta$.

Proof: Let $F \in E'$ separate $P$ and $Q$, i.e.

$$F(f) \ge b \ge F(q) \qquad (1.4.1)$$

for all $f \in P$ and $q \in Q$. From this it immediately follows that $F(f) \ge 0$ for all $f \in P$. Indeed, assume the contrary. Let $f_1$ be such that $F(f_1) < 0$. Then $\lambda f_1 \in P$ for any $\lambda > 0$ and we have $F(\lambda f_1) = \lambda F(f_1) < 0$. Since $\lambda$ can be chosen arbitrarily large, this contradicts (1.4.1). Thus, $F(f) \ge 0$ and, additionally, there is $b < 0$ satisfying (1.4.1), since $0 \in P$, $F(0) = 0$, and the sets $P$ and $Q$ are convex, closed and do not intersect. From (1.4.1) we obtain

$$F(f) \ge 0 \ge F(q) - b. \qquad (1.4.2)$$

Setting $-b = \delta > 0$, we derive the inequality

$$F(Q) + \delta \le 0 \le F(P). \qquad (1.4.3)$$

Since $w$ is an interior point of $P$, then $F(w) > 0$ holds. We may choose $F$ in such a manner that $F(w) = 1$. By Lemma 3, we find the point $y$ satisfying the inequality $Q(y) + \delta \le 0 \le P(y)$. In accordance with the preceding lemma, $y \in B$. This completes the proof of Lemma 5. $\square$
Lemma 6 If the function $\sup_\alpha f_\alpha(B)$ is positive for the family $\{f_\alpha\}$ of linear functions, then for properly chosen $\lambda_i \ge 0$, $\sum_{i=1}^{n+1} \lambda_i = 1$ and $\alpha_i$, $i = 1, \ldots, n, n + 1$, the function $f = \sum_{i=1}^{n+1} \lambda_i f_{\alpha_i}$ is a member of $P$, i.e. $f(B) \ge 0$.

Proof: If $f_\alpha(y) > 0$ for fixed $\alpha$ and $y$, then there is an open set containing $y$ where the strict inequality holds. Consequently, by the Heine-Borel theorem on covers, we may find such a finite subset $\{f_{\alpha_j}\}$ that $\max_j \{f_{\alpha_j}(y)\} > 0$ for each $y \in B$.

Denote by $Q$ the convex hull of the subfamily $\{f_{\alpha_j}\}$. Because of Lemma 5, the sets $P$ and $Q$ intersect. Since $Q$ is bounded, and $P$ is not bounded, a boundary point of the set $Q$ lies in $P$. This point belongs to a space of dimension not exceeding $n$, therefore it may be represented as a convex linear combination of not more than $n + 1$ functions $f_{\alpha_j}$. This proves the lemma. $\square$
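Theorem 6 Let $\{\varphi_\alpha\}$ be a family of continuous convex functions defined on the compact convex $n$-dimensional set $B$. The function $\sup_\alpha \varphi_\alpha(y)$ then achieves its minimum value $c$ on $B$, and for any $\delta > 0$ there are such indices $\alpha_i$ and such $\lambda_i$ that for all $y \in B$

$$\sum_{i=1}^{n+1} \lambda_i \varphi_{\alpha_i}(y) \ge c - \delta,$$

where $\lambda_i \ge 0$ and $\sum_{i=1}^{n+1} \lambda_i = 1$.

Proof: Suppose that $\{f_\beta\}$ is the family of all linear functions such that for all $y \in B$ the following inequality holds

$$f_\beta(y) \le \varphi_\alpha(y) \qquad (1.4.4)$$

for a certain $\alpha$. Any tangent plane satisfies this condition, so that the family $\{f_\beta\}$ contains all planes tangent to $\varphi_\alpha$. Consequently, for all $y$

$$\sup_\beta f_\beta(y) = \sup_\alpha \varphi_\alpha(y) \ge c.$$

Consider for a fixed $\delta > 0$ the family

$$F_\beta = f_\beta - (c - \delta) w.$$

Since $\sup_\beta f_\beta(B) \ge c$, we have

$$\sup_\beta \, [f_\beta - (c - \delta) w](B) \ge \delta > 0.$$

We shall apply Lemma 6 to the family $F_\beta$ and choose $\beta_1, \ldots, \beta_{n+1}$, $\lambda_1, \ldots, \lambda_{n+1}$. Let $\varphi_{\alpha_i}$ be the functions corresponding to $f_{\beta_i}$ from (1.4.4). Then

$$\sum_{i=1}^{n+1} \lambda_i \varphi_{\alpha_i}(y) \ge \sum_{i=1}^{n+1} \lambda_i f_{\beta_i}(y) \ge c - \delta$$

for all $y \in B$, which was stated in the theorem.

We now prove Theorem 5. The function $\varphi(u) = \sup_v K(u, v)$ achieves a minimum value at some point $u_0 \in P$, and by Theorem 6 for every $\varepsilon > 0$ we may choose $\{\lambda_i^\varepsilon\}$ and $\{v_i^\varepsilon\}$ such that

$$\sum_{i=1}^{n+1} \lambda_i^\varepsilon K(u, v_i^\varepsilon) \ge \varphi(u_0) - \varepsilon$$

for all $u \in P$, where $\lambda_i^\varepsilon$, $v_i^\varepsilon$ take values from compact sets, i.e. $\lambda_i^\varepsilon \ge 0$ and $\sum_{i=1}^{n+1} \lambda_i^\varepsilon = 1$. Therefore, as $\varepsilon \to 0$, for every $i$ we may choose the limit points $\lambda_i^0$, $v_i^0$, $i = 1, \ldots, n + 1$, satisfying the conditions $\lambda_i^0 \ge 0$ and $\sum_{i=1}^{n+1} \lambda_i^0 = 1$,

$$\sum_{i=1}^{n+1} \lambda_i^0 K(u, v_i^0) \ge \varphi(u_0) \qquad (1.4.5)$$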
for all $u \in P$. The inequality (1.4.5) implies that if Player II adopts a mixed finite strategy $\nu^*(\cdot) = \sum_{i=1}^{n+1} \lambda_i^0 I_{v_i^0}$, then his payoff is at least $\varphi(u_0)$. At the same time, if Player I adopts the pure strategy $u_0$, then with any $v$ Player II's payoff equals

$$K(u_0, v) \le \sup_v K(u_0, v) = \varphi(u_0).$$

Thus, $\varphi(u_0)$ is the value of the game, and $\{I_{u_0}, \nu^*\}$ a pair of optimal strategies. Let us now assume that $K$ is strictly convex with respect to $u$ for any $v$, and show that the optimal strategy for Player I here is unique. For Player II we choose a fixed optimal strategy of the form $\sum_{i=1}^{n+1} \lambda_i^0 I_{v_i^0}$, define $P_k$ as the set of such points $u$ that

$$\sum_{i=1}^{n+1} \lambda_i^0 K(u, v_i^0) \le \varphi(u_0) + \frac{1}{k}, \qquad k = 1, 2, \ldots,$$

and denote by $\mu$ a probability measure related to any optimal strategy for Player I. It is readily seen that $\mu(P - P_k) = 0$ for any $k$. But it follows from the strict convexity that the intersection $\bigcap_k P_k$ consists of a unique point. Consequently, the mixed optimal strategy $I_{u_0}$ is unique. This completes the proof of the theorem. $\square$
The result of this section will be illustrated by referring to examples.

Example 8. Let us consider a special case of Example 1 from 1. Let $S_1 = S_2 = S$ and the set $S$ be a closed circle on the plane, with $O$ as its center and $R$ as radius. The payoff function $K(u, v) = \rho(x, y)$, $x \in S$, $y \in S$, is strictly convex with respect to $x$ for fixed $y$. Indeed, let $y = (y_1, y_2) \in S$, $x^1 = (x_1^1, x_2^1) \in S$, $x^2 = (x_1^2, x_2^2) \in S$. If $x^1 \ne x^2$, then for all $0 < \lambda < 1$

$$\rho(y, \lambda x^1 + (1 - \lambda) x^2) < \lambda \rho(y, x^1) + (1 - \lambda) \rho(y, x^2). \qquad (1.4.6)$$

In fact,

$$\{[\lambda y_1 + (1 - \lambda) y_1] - [\lambda x_1^1 + (1 - \lambda) x_1^2]\}^2 = \{\lambda (y_1 - x_1^1) + (1 - \lambda)(y_1 - x_1^2)\}^2 = \lambda^2 (y_1 - x_1^1)^2 + 2\lambda(1 - \lambda)(y_1 - x_1^1)(y_1 - x_1^2) + (1 - \lambda)^2 (y_1 - x_1^2)^2, \qquad (1.4.7)$$

and the same expansion holds for the second coordinate. Substituting these expansions into the squared inequality (1.4.6) and cancelling like terms, we find that (1.4.6) reduces to

$$(y_1 - x_1^1)(y_1 - x_1^2) + (y_2 - x_2^1)(y_2 - x_2^2) < \rho(y, x^1) \rho(y, x^2). \qquad (1.4.10)$$

By squaring (1.4.10), we get

$$2 (y_1 - x_1^1)(y_1 - x_1^2)(y_2 - x_2^1)(y_2 - x_2^2) < (y_1 - x_1^1)^2 (y_2 - x_2^2)^2 + (y_2 - x_2^1)^2 (y_1 - x_1^2)^2 \qquad (1.4.11)$$

or, which is the same,

$$[(y_1 - x_1^1)(y_2 - x_2^2) - (y_2 - x_2^1)(y_1 - x_1^2)]^2 > 0. \qquad (1.4.12)$$

However, (1.4.12) is always satisfied if $x^1 \ne x^2 \ne y$. Thus, we have completed the proof for the convexity of the function $\rho(x, y)$ in $x$, with each $y$ being fixed. By the condition, the set $S$ (circle) is also convex. Therefore, we can directly apply Theorem 5, from which it follows that the value of the game is

$$v = \min_{x \in S} \max_{y \in S} \rho(x, y). \qquad (1.4.13)$$

The example in 2 suggests that

$$v = \min_{x \in S} \max_{y \in S} \rho(x, y) = R. \qquad (1.4.14)$$
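Formula (1.4.14) can be checked numerically on a discretized circle. The sketch below is only a sanity check under stated assumptions: the circle is replaced by a polar grid that contains the center and, for every grid point, the diametrically opposite boundary direction. The grid sizes and the function names `disk_grid` and `minimax_on_disk` are illustrative.

```python
import math


def disk_grid(radius, n_r=10, n_t=36):
    """Center plus n_r concentric rings of n_t points each (outer ring on the boundary)."""
    points = [(0.0, 0.0)]
    for i in range(1, n_r + 1):
        r = radius * i / n_r
        for k in range(n_t):
            angle = 2.0 * math.pi * k / n_t
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


def minimax_on_disk(radius):
    """Grid estimate of min over x of max over y of the distance rho(x, y), with x, y in S."""
    grid = disk_grid(radius)
    best_x, best_value = None, math.inf
    for x in grid:
        worst = max(math.dist(x, y) for y in grid)  # Player II's best reply to x
        if worst < best_value:
            best_x, best_value = x, worst
    return best_x, best_value


x_star, value = minimax_on_disk(2.0)
print(x_star, value)  # the minimizing grid point and the grid estimate of the value
```

On this grid the minimizing point is the center and the estimate agrees with $R$ up to rounding, as (1.4.14) predicts.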
In this case, the point x 6 S, on which the m i n i m u m of expression m a x s p(x,y) is achieved, is unique a n d coincides w i t h t h e center of the circle 5 (i.e., w i t h the point O (see 2)). B y T h e o r e m 5, this point precisely is the o p t i m a l strategy for Player I (minimizer) i n the given case. T h e o r e m 1 states that Player II (maximizer) has a m i x e d o p t i m a l strategy prescribing a positive probability to not more three points i n t h e set S. However, because of the symmetry of the set S, by using the o p t i m a l strategy, the player chooses w i t h probability ( 4 , |) any two d i a m e t r i c a l l y opposite points on the b o u n d a r y o f S. Denote such a strategy for Player II by f " = i I + ^I . T o prove the o p t i m a l i t y of v' i t suffices to establish t h a t for all x , y £ S, s &
yl
y2
}
M{I y) 0t
$R$, then the above assumption contradicts the minimality of the sphere $C_O(R)$. At the same time, there are points $y$ at which $\min_{x \in S}\max_{y \in S}\rho(x, y) = \rho(O, y)$ is achieved. These are the tangent points of the boundary of the sphere $C_O(R)$ and of the boundary of the set $S$. Thus,

$$v = \min_{x \in S}\max_{y \in S}\rho(x, y) = R.$$

Now let $I_O$ be Player I's strategy prescribing probability 1 to the choice of the center $O$ of the sphere $C_O(R)$ (evidently, $O \in S$). Employing Theorem 1, we have that the optimal strategy for Player II prescribes positive probability to not more than three points of the set $S$. It is readily seen that these points must be tangent points of the sphere $C_O(R)$ and the boundary of the set $S$. To define the probabilities with which the maximizing player chooses the tangent points of the sphere $C_O(R)$ and the boundary of the set $S$, we will consider two cases.

Case 1. There exist two tangent points of $S$ and $C_O(R)$ such that the chord joining them is a diameter of the circle $C_O(R)$.

Case 2. The two points given above do not exist.

Suppose we have the simple Case 1. Denote by $A_1$, $A_2$ diametrically opposite tangent points of $C_O(R)$ and $S$. The optimal strategy for the maximizer is the choice of the points $A_1$ and $A_2$ with equal probabilities $1/2$. Denote this strategy for Player II by $\nu^*$. It suffices to show that for all $A \in S$

$$E(I_A, \nu^*) \ge E(I_O, \nu^*) = R.$$
The latter is proved as in (1.4.15). In Case 2, there are necessarily three tangent points $A_1, A_2, A_3$ such that the center $O$ of the circle $C_O(R)$ belongs to the convex hull of these points. We choose a coordinate system in such a way that the center $O$ of the circle $C_O(R)$ is its origin. In this system, let $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ be the coordinates of the points $A_1, A_2, A_3$. Then there exist $\lambda_i \ge 0$, $i = 1, 2, 3$, $\sum_{i=1}^{3}\lambda_i = 1$, such that $\sum_{i=1}^{3}\lambda_i x_i = 0$, $\sum_{i=1}^{3}\lambda_i y_i = 0$, since any point of a convex polyhedron is representable as a convex linear combination of its vertices.
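The weights $\lambda_1, \lambda_2, \lambda_3$ can be computed explicitly as the barycentric coordinates of the origin in the triangle $A_1A_2A_3$. The sketch below is a hedged illustration, assuming three non-collinear tangent points given in the coordinate system with origin $O$; the function names `convex_weights` and `expected_distance` are hypothetical. The second function evaluates the expected distance from a point $A$ to the randomly chosen tangent point, so that the inequality $E(I_A, \nu^*) \ge R$ can be checked on sample points.

```python
import math

def convex_weights(a1, a2, a3):
    """Barycentric coordinates of the origin in the triangle a1 a2 a3.

    Solves l1*a1 + l2*a2 + l3*a3 = 0 with l1 + l2 + l3 = 1 by Cramer's rule.
    All weights are non-negative exactly when the origin lies in the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = a1, a2, a3
    d1 = x2 * y3 - x3 * y2
    d2 = x3 * y1 - x1 * y3
    d3 = x1 * y2 - x2 * y1
    det = d1 + d2 + d3  # twice the signed area of the triangle
    if abs(det) < 1e-15:
        raise ValueError("tangent points are collinear")
    return d1 / det, d2 / det, d3 / det

def expected_distance(a, points, weights):
    """Expected Euclidean distance from a to a point drawn with the given weights."""
    return sum(
        w * math.hypot(a[0] - p[0], a[1] - p[1]) for p, w in zip(points, weights)
    )
```

With tangent points on the unit circle at angles 0, 2.5 and 4.0 radians (an arbitrary test configuration whose convex hull contains the origin), all three weights are positive and the expected distance at the origin equals 1.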
We show that a mixed optimal strategy for the maximizer is the choice of one of the points $A_1, A_2, A_3$ with probabilities $\lambda_1, \lambda_2, \lambda_3$. Denote this strategy by $\nu^*$. Consider the function $E(I_A, \nu^*)$ for $A \in S$. Based on (1.4.5), it suffices to show that, for all $A \in S$,

$$E(I_A, \nu^*) \ge E(I_O, \nu^*) = R.$$

However, the function

$$E(I_A, \nu^*) = \sum_{i=1}^{3}\lambda_i\sqrt{(x - x_i)^2 + (y - y_i)^2},$$

where $x, y$ are the coordinates of the point $A \in S$, is strictly convex with respect to $x, y$ and is continuously differentiable with respect to $x, y$, therefore it has a unique point of minimum that is determined from the condition
Fig. 9.
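This yields

$$\sum_{i=1}^{3}\frac{\lambda_i(x - x_i)}{\sqrt{(x - x_i)^2 + (y - y_i)^2}} = 0, \qquad \sum_{i=1}^{3}\frac{\lambda_i(y - y_i)}{\sqrt{(x - x_i)^2 + (y - y_i)^2}} = 0. \qquad (*)$$

It can be readily seen that the point $x = 0$, $y = 0$ is a solution to equations (*), since the tangent points satisfy $\sqrt{x_i^2 + y_i^2} = R$ and, substituting $x = 0$, $y = 0$ in (*), we obtain

$$-\sum_{i=1}^{3}\frac{\lambda_i x_i}{R} = -\frac{1}{R}\sum_{i=1}^{3}\lambda_i x_i = 0, \qquad -\sum_{i=1}^{3}\frac{\lambda_i y_i}{R} = -\frac{1}{R}\sum_{i=1}^{3}\lambda_i y_i = 0.$$

Thus, $I_O$ is a point of minimum of the function $E(I_A, \nu^*)$, i.e. for all $A \in S$

$$E(I_O, \nu^*) \le E(I_A, \nu^*),$$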
which was to be shown (Fig. 9).

Consider a generalization of the preceding game. Let Player I choose not one point $x \in S$, but $m$ points $x^{(1)}, \ldots, x^{(m)} \in S$. Formally, the game proceeds as follows. The convex compact set $S$ is given on the plane. Player II chooses a point $y \in S$, and Player I a system of $m$ points $x^{(1)}, \ldots, x^{(m)} \in S$. Choices are made simultaneously and independently of one another. Player II's payoff is assumed to be

$$K(x, y) = \frac{1}{m}\sum_{i=1}^{m}\rho^2(x^{(i)}, y) = \frac{1}{m}\sum_{i=1}^{m}\left[(x_1^{(i)} - y_1)^2 + (x_2^{(i)} - y_2)^2\right],$$

here $x = (x^{(1)}, \ldots, x^{(m)})$. Player I's payoff is equal to $-K$.
In this game the set of strategies for the minimizing player, $P = [S]^m$, is convex and closed as a Cartesian product of closed convex sets. The function $K(x, y)$ is strictly convex in $x$ for every fixed $y$. Indeed, the function $w(u) = u^2$ is strictly convex. If the function $w(u)$ is strictly convex, then the function $w(u + c)$ is also strictly convex for any fixed $c$. If the functions $w_1(u_1), w_2(u_2), \ldots, w_n(u_n)$ are convex, then the function $\Phi(u_1, \ldots, u_n) = \sum_{j=1}^{n} w_j(u_j)$ is also convex. Thus, we may apply Theorem 1 on the structure of optimal strategies for the players, according to which the optimal strategy for Player I is pure, and the mixed strategy of Player II concentrates measure at no more than three points of the set $S$. The value of the game is equal to

$$v = \min_{x \in P}\max_{y \in S}\frac{1}{m}\sum_{i=1}^{m}\rho^2(x^{(i)}, y).$$

Let $(u^*)^N_{x,y}(\cdot, t)$, $(v^*)^N_{x,y}(\cdot, t)$ be optimal strategies for Players I and II, and $V(x, y, N)$ the value of the game $\Gamma(x, y, N)$. The function $F(x, y, k)$ is defined as follows:

$$F(x, y, k) = \max_{y' \in V_y}\min_{x' \in U_x}\{\max[f(x, y, 0), V(x', y', k - 1)]\} =$$
$$= \max[f(x, y, 0), \max_{y' \in V_y}\min_{x' \in U_x}V(x', y', k - 1)] = \max[f(x, y, 0), V(\bar{x}, \bar{y}, k - 1)]. \qquad (1.5.2)$$
Strategies $(u^*)^k_{x,y}$, $(v^*)^k_{x,y}$ are defined by the rule
k
(u-)
= / " r v ( - - 0 , at t> 0, x- e U , y' 6 %, z
'
\
i V(x',y' k~\) >
= mm .Vlx; \k-\)), z
y
t
:
=0
>
i « ^ u < j - |
S
)
a
t
t
=
0
We will show that $F(x, y, k)$ is the value of the game $\Gamma(x, y, k)$, and $(u^*)^k_{x,y}(\cdot, t)$, $(v^*)^k_{x,y}(\cdot, t)$ are optimal strategies for Players I and II in this game. By definition, the payoff $K$ in the situation $((u^*)^k_{x,y}(\cdot, t), (v^*)^k_{x,y}(\cdot, t))$ is equal to $F(x, y, k)$.
Now let one of the players, say Player II, depart from the strategy $(v^*)^k_{x,y}(\cdot, t)$, choosing a strategy $(v)^k_{x,y}(\cdot, t)$ instead of it. Then, from (1.5.2) we obtain

$$F(x, y, k) \ge \max\left[f(x, y, 0), \min_{x' \in U_x}V(x', \tilde{y}, k - 1)\right]. \qquad (1.5.3)$$

Generally, $\tilde{y}$ does not coincide here with $\bar{y}$ from (1.5.2), since the strategy $(v)^k_{x,y}(\cdot, t)$ differs from the strategy $(v^*)^k_{x,y}(\cdot, t)$.
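Let $\min_{x' \in U_x}V(x', \tilde{y}, k - 1)$ be achieved at the point $\tilde{x} = \tilde{x}(\tilde{y})$. Consider the game $\Gamma(\tilde{x}, \tilde{y}, k - 1)$. Let $K(u, v)$ be the payoff function, and $(u^*)^{k-1}_{\tilde{x},\tilde{y}}$, $(v^*)^{k-1}_{\tilde{x},\tilde{y}}$ the optimal strategies for Players I and II in this game. Then

$$V(\tilde{x}, \tilde{y}, k - 1) = K((u^*)^{k-1}_{\tilde{x},\tilde{y}}, (v^*)^{k-1}_{\tilde{x},\tilde{y}}) \ge K((u^*)^{k-1}_{\tilde{x},\tilde{y}}, v^{k-1}_{\tilde{x},\tilde{y}}) \qquad (1.5.4)$$

for any strategy $v^{k-1}_{\tilde{x},\tilde{y}}$ of Player II in the $(k - 1)$-stage game.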
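In (1.5.4), let $v^{k-1}_{\tilde{x},\tilde{y}}$ be the restriction of the strategy $(v)^k_{x,y}(\cdot, t)$ to the position set of the game $\Gamma(\tilde{x}, \tilde{y}, k - 1)$. Then, using (1.5.3), (1.5.4), we may write the following chain of inequalities:

$$F(x, y, k) = K((u^*)^k_{x,y}, (v^*)^k_{x,y}) \ge \max[f(x, y, 0), V(\tilde{x}, \tilde{y}, k - 1)] \ge$$
$$\ge \max[f(x, y, 0), K((u^*)^{k-1}_{\tilde{x},\tilde{y}}, v^{k-1}_{\tilde{x},\tilde{y}})] = K((u^*)^k_{x,y}, (v)^k_{x,y}). \qquad (1.5.5)$$

Similarly, it may be shown that for all $(u)^k_{x,y}$

$$F(x, y, k) = K((u^*)^k_{x,y}, (v^*)^k_{x,y}) \le K((u)^k_{x,y}, (v^*)^k_{x,y}).$$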
We shall find the value of the game and optimal strategies for the players by solving the equation (1.5.7). Since $f(x_k, y_k, k) = 0$ for $k < N$, we get the equation

$$V(x, y, k) = \max_{y' \in V_y}\min_{x' \in U_x}V(x', y', k - 1) \qquad (1.5.9)$$

with the initial condition $V(x, y, 0) = \rho(x, y)$. Hence we have

$$V(x, y, 1) = \max_{y' \in V_y}\min_{x' \in U_x}\rho(x', y').$$

Since $U_x$ and $V_y$ are circles with centers at $x$ and $y$, and with $\alpha$ and $\beta$ as radii, then if $U_x \supset V_y$, the function $V(x, y, 1) = 0$; if, however, $U_x \not\supset V_y$, then $\max_{y' \in V_y}\min_{x' \in U_x}\rho(x', y') = \rho(x, y) + \beta - \alpha = \rho(x, y) - (\alpha - \beta)$ (see the analysis of Example 1 in 2). We have

$$V(x, y, 1) = \begin{cases} 0, & U_x \supset V_y, \\ \rho(x, y) - (\alpha - \beta), & U_x \not\supset V_y. \end{cases}$$
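The recursion $V(x, y, k) = \max_{y'}\min_{x'}V(x', y', k - 1)$ with $V(x, y, 0) = \rho(x, y)$ lends itself to a small numerical check. The Python sketch below is an illustration under stated assumptions: $U_x$ and $V_y$ are taken to be discs of radii $\alpha$ and $\beta$ around $x$ and $y$, the function names are hypothetical, and the multi-step value is obtained by iterating the one-step map on the distance, which relies on the assumption that $V(\cdot, \cdot, k - 1)$ is a nondecreasing function of the distance alone.

```python
import math

def one_step_value(d, alpha, beta):
    """Closed form of V(x, y, 1) for discs: max(0, rho(x, y) + beta - alpha)."""
    return max(0.0, d + beta - alpha)

def dist_point_to_disc(p, center, radius):
    """min over x' in the disc of rho(x', p); zero when p lies inside the disc."""
    return max(0.0, math.hypot(p[0] - center[0], p[1] - center[1]) - radius)

def one_step_value_grid(d, alpha, beta, n=720):
    """Brute-force check: maximize over boundary points y' of the disc V_y."""
    x, y = (0.0, 0.0), (d, 0.0)
    best = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        y_new = (y[0] + beta * math.cos(t), y[1] + beta * math.sin(t))
        best = max(best, dist_point_to_disc(y_new, x, alpha))
    return best

def value(d, alpha, beta, k):
    """Iterate the one-step map k times, starting from the distance d."""
    for _ in range(k):
        d = one_step_value(d, alpha, beta)
    return d
```

With $\alpha > \beta$ the iterated value decreases by $\alpha - \beta$ per stage until it reaches zero, which corresponds to capture within the remaining number of stages.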
In view of the symmetry for $y < 0$, the payoff of Player E turns out to be 1/3. Consequently, playing any pure strategy, Player E cannot obtain a payoff below 1/3.
From (1.6.5) and (1.6.6) we may conclude that $M(\mu^*, \nu^*) = 1/3$.

Case B. $S$ is a circle with radius $R$, its center being at the origin of coordinates. In what follows, we denote by $D_r$ ($S_r$) a circle (the circumference of the circle) with radius $r$, its center being at the origin of coordinates.

Consider the function

$$\Phi(r, \rho) = r^2 + \rho^2 - \frac{4}{\pi}r\rho, \qquad 0 \le r, \rho \le R.$$
Lemma 8 The function $\Phi(r, R)$ (as a function of the variable $r$) is strictly convex and achieves the absolute minimum at a unique point $r_0 = 2R/\pi$.

Proof: We have

$$\frac{\partial^2\Phi(r, \rho)}{\partial r^2} = 2 > 0. \qquad (1.6.7)$$

Consequently, the function $\Phi(r, \rho)$, $r \in [0, R]$, is strictly convex, and the derivative

$$\frac{\partial\Phi(r, R)}{\partial r} = 2r - \frac{4R}{\pi} \qquad (1.6.8)$$

is strictly monotone. Evidently, the function (1.6.8) vanishes at a unique point $r_0 = 2R/\pi$. Because of the strict convexity of the function $\Phi(r, R)$ (1.6.7), the point $r_0$ is a unique point of absolute minimum. This proves the lemma.
Lemma 9 The function $\Phi(r_0, \rho)$ is strictly convex in $\rho$ and achieves the absolute maximum at the point $\rho = R$.

Proof: Because of the symmetric entry of the variables $r$ and $\rho$ into the function $\Phi(r, \rho)$, the strict convexity of $\Phi(r_0, \rho)$ follows from Lemma 8. A strictly convex function on the interval $[0, R]$ achieves its maximum at one of the endpoints. We have

$$\Phi(r_0, R) - \Phi(r_0, 0) = r_0^2 + R^2 - \frac{4}{\pi}r_0R - r_0^2 = R^2 - \frac{8}{\pi^2}R^2 = R^2\left(1 - \frac{8}{\pi^2}\right) > 0.$$

We have thus proved the lemma completely.
•
Remark. Lemmas 8 and 9 suggest that the point $(r_0, R)$ is the saddle point of the function $\Phi(r, \rho)$. Indeed, $\Phi(r_0, \rho) \le \Phi(r_0, R) \le \Phi(r, R)$.
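The saddle-point inequalities of the Remark can be verified on a grid. The sketch below is illustrative only; the helper names `phi`, `saddle_point` and `is_saddle` are hypothetical, and the function $\Phi(r, \rho) = r^2 + \rho^2 - \frac{4}{\pi}r\rho$ is taken in the form used in Lemmas 8 and 9.

```python
import math

def phi(r, rho):
    """Phi(r, rho) = r^2 + rho^2 - (4/pi) r rho."""
    return r * r + rho * rho - 4.0 / math.pi * r * rho

def saddle_point(R):
    """Candidate saddle point (r0, R) with r0 = 2R/pi."""
    return 2.0 * R / math.pi, R

def is_saddle(R, n=200, tol=1e-12):
    """Check Phi(r0, rho) <= Phi(r0, R) <= Phi(r, R) on a grid of [0, R]."""
    r0, rho0 = saddle_point(R)
    mid = phi(r0, rho0)
    grid = [R * k / n for k in range(n + 1)]
    left_ok = all(phi(r0, rho) <= mid + tol for rho in grid)
    right_ok = all(mid <= phi(r, rho0) + tol for r in grid)
    return left_ok and right_ok
```

Substituting $r_0 = 2R/\pi$ gives $\Phi(r_0, R) = R^2(1 - 4/\pi^2)$, which is the number `phi(*saddle_point(R))` reproduces.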
Theorem 8 Mixed optimal strategies are: for Player P, the choice of $x_1$ with uniform distribution on $S_{r_0}$, $x_2 = -x_1$; for Player E, the choice of $y$ with uniform distribution on $S_R$. The value of the game is equal to $\Phi(r_0, R)$.
Fig. 12.
Proof: Denote the stated strategies by $\mu^*$, $\nu^*$.

Let Player E adopt the strategy $\nu^*$, and Player P an arbitrary pure strategy $x = (x_1, x_2)$; $x_i = (r_i\cos\varphi_i, r_i\sin\varphi_i)$, $i = 1, 2$.

Consider the case $x_1 = x_2$. Denote by $r$ the number $r_1 = r_2$, and by
-
{
W
=
Let = -[{R +
f i M
4)0-2Rr
2
f (v>) = - P
+ r ) ( i r - /?) + 2 f l r , sin^cos F,(0) + I5£r) =
- F( )
Vl
2
V2
/ (R + r | - 2 f l r cos ,, r,- s i n ^>). u
2
2 l
Now let $x_1$ and $x_2$ lie on a diameter of the circle $D_R$, the distance between them being $2r$, and let the chord $AB$ (for this case) form the central angle $2\alpha$. Assume that $x_1 = (R\cos\alpha - r, 0)$, $x_2 = (R\cos\alpha + r, 0)$. Player E's payoff is equal to

$$\Psi(\alpha, r) = \frac{1}{\pi}\left\{\int_0^{\alpha}\left[(R\cos\psi - R\cos\alpha - r)^2 + R^2\sin^2\psi\right]d\psi + \int_{\alpha}^{\pi}\left[(R\cos\psi - R\cos\alpha + r)^2 + R^2\sin^2\psi\right]d\psi\right\} =$$
$$= \frac{1}{\pi}\left\{\left[R^2 + (R\cos\alpha + r)^2\right]\alpha - 2R\sin\alpha(R\cos\alpha + r) + \left[R^2 + (R\cos\alpha - r)^2\right](\pi - \alpha) + 2R\sin\alpha(R\cos\alpha - r)\right\}.$$
We show that the function $\Psi(\alpha, r)$, for a fixed $r$, achieves its minimum with respect to $\alpha$ when $\alpha = \pi/2$. Performing elementary computations, we obtain

$$\frac{\partial\Psi}{\partial\alpha} = \frac{2}{\pi}R\sin\alpha\left[(\pi - 2\alpha)r - \pi R\cos\alpha\right],$$

therefore for sufficiently small $\alpha$, $\partial\Psi/\partial\alpha < 0$, because $\sin\alpha > 0$ and $r(\pi - 2\alpha) - \pi R\cos\alpha < 0$ (in the limiting case $r\pi - \pi R = \pi(r - R) < 0$, $r < R$). At the same time, $\partial\Psi(\pi/2, r)/\partial\alpha = \frac{2}{\pi}R \cdot 0 = 0$.
With each fixed $r$, the function $\partial\Psi/\partial\alpha$ has no zeros in $\alpha \in (0, \pi/2]$ except for $\alpha = \pi/2$. Suppose the opposite is true. Let $\alpha_1 \in (0, \pi/2)$ be a zero of the function $\partial\Psi/\partial\alpha$. Then, for $\alpha = \alpha_1$, the function $G(\alpha) = (\pi - 2\alpha)r - \pi R\cos\alpha$ also equals zero. Thus, $G(\alpha_1) = G(\pi/2) = 0$. We compute the derivatives of $G(\alpha)$, the difference of the linear function $(\pi - 2\alpha)r$ and the function $\pi R\cos\alpha$:

$$G'(\alpha) = -2r + \pi R\sin\alpha, \qquad G''(\alpha) = \pi R\cos\alpha > 0,$$

i.e. the function $G(\alpha)$ is strictly convex on $(0, \pi/2)$. A strictly convex function vanishing at $\alpha_1$ and at $\pi/2$ is positive to the left of $\alpha_1$: $G(\alpha) > 0$ for $\alpha \in (0, \alpha_1)$. This contradicts the above inequality $G(\alpha) < 0$ for sufficiently small $\alpha$. Thus, $\partial\Psi/\partial\alpha < 0$ for $\alpha \in (0, \pi/2)$ and $\partial\Psi(\pi/2, r)/\partial\alpha = 0$. Consequently, the function $\Psi(\alpha, r)$ achieves its absolute minimum with respect to $\alpha$ at $\alpha = \pi/2$: $\Psi(\alpha, r) \ge \Psi(\pi/2, r)$. This means that in this case the payoff of Player E

$$M(x, \nu^*) = \Psi(\alpha, r) \ge \Psi(\pi/2, r) \ge \Phi(r, R) \ge \Phi(r_0, R). \qquad (1.6.11)$$

Based on (1.6.9)-(1.6.11), it turns out that for any pure strategy $x = (x_1, x_2)$ Player E has the payoff

$$M(x, \nu^*) \ge \Phi(r_0, R). \qquad (1.6.12)$$
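The monotonicity argument for the payoff as a function of $\alpha$ can be cross-checked numerically. The following sketch is illustrative: the name `psi` for the payoff of Player E against two pursuer points on a diameter, and the names `phi` and `dpsi_dalpha`, are labels chosen here, and the closed-form expressions are the ones given in the derivation above with the circle radius denoted by `R`.

```python
import math

def phi(r, rho):
    """Phi(r, rho) = r^2 + rho^2 - (4/pi) r rho."""
    return r * r + rho * rho - 4.0 / math.pi * r * rho

def psi(alpha, r, R):
    """Payoff for x1 = (R cos(alpha) - r, 0), x2 = (R cos(alpha) + r, 0)."""
    c, s = R * math.cos(alpha), math.sin(alpha)
    near = (R * R + (c + r) ** 2) * alpha - 2.0 * R * s * (c + r)
    far = (R * R + (c - r) ** 2) * (math.pi - alpha) + 2.0 * R * s * (c - r)
    return (near + far) / math.pi

def dpsi_dalpha(alpha, r, R):
    """Derivative of psi in alpha: (2/pi) R sin(alpha) G(alpha)."""
    g = (math.pi - 2.0 * alpha) * r - math.pi * R * math.cos(alpha)
    return 2.0 / math.pi * R * math.sin(alpha) * g
```

At $\alpha = \pi/2$ the two pursuer points are symmetric about the center and `psi(math.pi / 2, r, R)` coincides with `phi(r, R)`, which links this case to the function of Lemma 8.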
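Suppose Player P adopts the strategy $\mu^*$, and Player E an arbitrary pure strategy $y = (\rho\cos\psi, \rho\sin\psi)$. Then Player E obtains the payoff

$$M(\mu^*, y) = \frac{1}{2\pi}\int_0^{2\pi}\min\left[\rho^2 + r_0^2 - 2\rho r_0\cos(\psi - \varphi),\ \rho^2 + r_0^2 + 2\rho r_0\cos(\psi - \varphi)\right]d\varphi =$$
$$= \frac{1}{2\pi}\int_0^{2\pi}\left(\rho^2 + r_0^2 - 2\rho r_0|\cos(\psi - \varphi)|\right)d\varphi = \Phi(r_0, \rho).$$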
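The integral for $M(\mu^*, y)$ can be approximated by a simple quadrature. The sketch below is a hedged illustration, assuming that the payoff is the squared distance from $y$ to the nearer of the two pursuer points $x_1$ and $x_2 = -x_1$, with $x_1$ uniformly distributed on the circumference of radius $r_0$; the function names are hypothetical, and the midpoint rule is a choice made here for simplicity.

```python
import math

def phi(r, rho):
    """Phi(r, rho) = r^2 + rho^2 - (4/pi) r rho."""
    return r * r + rho * rho - 4.0 / math.pi * r * rho

def min_sq_distance(y, x1):
    """Squared distance from y to the nearer of the points x1 and -x1."""
    d_plus = (y[0] - x1[0]) ** 2 + (y[1] - x1[1]) ** 2
    d_minus = (y[0] + x1[0]) ** 2 + (y[1] + x1[1]) ** 2
    return min(d_plus, d_minus)

def payoff_against_mu(rho, psi_angle, r0, n=4000):
    """Midpoint-rule average of the payoff over x1 uniform on the circle of radius r0."""
    y = (rho * math.cos(psi_angle), rho * math.sin(psi_angle))
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        total += min_sq_distance(y, (r0 * math.cos(t), r0 * math.sin(t)))
    return total / n
```

Up to quadrature error, the result does not depend on the angle of $y$ and agrees with `phi(r0, rho)`, so against $\mu^*$ the evader gains nothing by choosing a particular direction and does best on the boundary $\rho = R$.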
1 2m - 1
where G' is a strategy of Player E. We restrict ourselves to the cases when the points Xi are l y i n g o n the interval (in the remaining cases the proof is s i m i l a r ) : 1. n €ti,i= 2. n £ii,
1,2,...,m; 1 , 2 , . . . , * - l,fc + l , . . . , m , x e e „
i=
k?p.
k
In the first case, 2m - 4i + 1