Kronecker Products and Matrix Calculus: with Applications ALEXANDER GRAHMvI, M.A., M.Sc., Ph.D., C.Eng. M.LE.E. Senior L...
63 downloads
1254 Views
1MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Kronecker Products and Matrix Calculus: with Applications ALEXANDER GRAHMvI, M.A., M.Sc., Ph.D., C.Eng. M.LE.E. Senior Lecturer in Mathematics, The Open University, Milton Keynes
.,...;.
ELLIS HORWOOD LIMITED Publishers· Chichester
Halsted Press: a division of JOHN WILEY & SONS New York· Brisbane· Chichester· Toronto
first published in 1981 by ELLiS HORWOOD LIMiTED Market Cross House, Cooper Street, Chichester, West Sussex, PO 19 lEB, England 11Ie publisher's colophon is reproduced from James Gillison's drawing of the allcient Market Cross, Chichester.
Distributors: Australia, New Zealand, South-east Asia; Jacaranda-WUey Ltd., Jacaranda Press, JOHN WILEY & SONS INC., G.P.O. Box 859, Brisbane, Queensland 40001, Australia Canada: JOHN WILEY & SONS CANADA LIMITED 22 Worcester Road, Rexdale, OntariO, Canada. b'urope, Africa.' JOHN WILEY & SONS LIMITED Baffins Lane, Chichester, West Sussex, England, North and South America and the rest of the world: Halsted Press: a division of JOliN WILEY & SONS 605 Third Avenue, New York, N.Y. 10016, U.S.A.
© 1981 A. Graham/Ellis Horwood Ltd. British Library Cataloguing in Publication Data Grw:un. Alexander Kronecker products and matrix calculus. (Ellis Horwood series in mathematics and its applications) 1. Matrices 1. Title 512.9'43 QA188 Library of Congress Card No. 81-7132
AACR2
ISBN 0-85312-391-8 (Ellis Horwood Limited, Library Edition) [SBN 0-85312-427-2 (Ellis Horwood Limited. Student Edition) ISBN 0-470-27300-3 (Halsted Press) Typeset in Press Roman by Ellis Horwood Ltd. PIlnted in Great Britain by R. J. Acford, Chichester
COI'YRIGIIT NOTICE -
All Rillht~ Rescrved. No [lurt or this publication may be rcproduccd, stored in a retricval ~ystCl\\, or tranSlllillcd,ln any form or by any means, ele~tronic, mcchanical, photocopying, recording or otherwise, without the permission of E111s Horwood Limited, Market Cross House, Cooper SIIeet, Chichester, West Sussex, England.
Table of Contents Author's Preface ..........................................7 Symbols and Notation Used ..................................9 Chapter 1 - Preliminaries 1.1
Introduction ....................................... 11
1.2 Unit Vectors and Elementary Matrices ...................... 11
1.3 Decompositions of a Matrix ............................. 13
1.4 The Trace Function .................................. 16 1.5 The Vec Operator
................................. 18
.
Problems for Chapter I ................................20
.01.
Chapter 2 - The Kronecker Product
2.1 Introduction ....................................... 21 2.2 Definition of the Kronecker Product .......................21 (/]
2.3 Some Properties and Rules for Kronecker Products ............. 23
2.4 Definition of the Kronecker Sum .........................30 2.5 The Permutation Matrix associating vccX and vecX' ............. 32 4-.
Problems for Chapter 2 ................................ 35
Q[>
a..
Chapter 3 - Some Applications for the Kronecker Product
3.1 Introduction ....................................... 37 3.2 The Derivative of a Matrix ..............................37 0..
v-1
3.3 Problem 1: solution of AX + XB = C ..................... 38 3.4 Problem 2: solution of AX + XA = µX ..................... 40 3.5 Problem 3: solution of X = AX + XB ..................... 41 3.6 Problem 4: to find the transition matrix associated with }..
the equation X = AX + XB ............................ 42
3.7 Problem 5: solution of AXB = C .........................44 3.8 Problem 6: Pole assignment for a Multivariable System...........45
380A
Table of Contents
6
Chapter 4 - Introduction to Matrix Calculus
4.1 Introduction ....................................... 51
4.2 The Derivatives of Vectors ............................. 52 4.3 The Chain rule for Vectors ............................. 54 4.4 The Derivative of Scalar Functions of a Matrix
with respect to a Matrix ............................... 56 its Elements and Conversely ............................60 4.6 The Derivatives of the Powers of a Matrix ................... 67 Problems for Chapter 4 ................................ 68 4.5 The Derivative of a Matrix with respect to one of
Chapter 5 - Further Development of Matrix Calculus including an Application of Kronecker Products
5.1 Introduction ....................................... 70 5.2 Derivatives of Matrices and Kronecker Products ............... 70
5.3 The Determination of (avecX)/(avecY) for more
complicated Equations ............................... 72 5.5 The Matrix Differential ................................ 78 Problems for Chapter 5 ................................ 80 5.4 More on Derivatives of Scalar Functions with respect to a Matrix .... 75
A.°
Chapter 6 - The Derivative of a Matrix with respect to a Matrix
6.1 Introduction ....................................... 81 6.2 The Definition and some Results ......................... 81 ...
6.3 Product Rules for Matrices ............................. 84 6.4 The Chain Rule for the Derivative of a Matrix with respect to Matrix .88
Problems for Chapter 6 ................................ 92
Chapter 7 - Some Applications of Matrix Calculus '_'
7.1 Introduction ....................................... 94 7.2 The Problems of Least Squares and Constrained Optimization in Scalar Variables ..................................... 94 'C7
7.3 Problem 1: Matrix Calculus Approach to the Problems
of Least Squares and Constrained Optimization ................96 7.4 Problem 2: The General Least Squares Problem ............... 100 ...
'v,
7.5 Problem 3: Maximum Likelihood Estimate of the Multivariate Normal 102
7.6 Problem 4: Evaluation of the Jacobians of some Transformations... 104 7.7 Problem 5: To Find the Derivative of an Exponential
Matrix with respect to a Matrix ......................... 108
Solution to Problems ..................................... III Tables of Formulae and Derivatives ............................ 121
Bibliography ........................................... 126
Index ............................................... 129
Author's Preface
'i.
...
_r.
My purpose in writing this book is to bring to the attention of the reader, some recent developments in the field of Matrix Calculus. Although some concepts, such as Kronecker matrix products, the vector derivative etc. are mentioned in a few specialised books, no book, to my knowledge, is totally devoted to this subject. The interested researcher must consult numerous published papers to appreciate the scope of the concepts involved. Matrix calculus applicable to square matrices was developed by Turnbuil [29,301 as far back as 1927. The theory presented in this book is based on the works of Dwyer and McPhail [15] published in 1948 and others mentioned in the Bibliography. It is more general than Turnbull's development and is applicable to non-square matrices. But even this more general theory has grave limitations, in particular it requires that in general the matrix elements are non constant and +~+
O..
rte.
''.
i1.
T°°
4-.
.'7
independent. A symmetric matrix, for example, is treated as a special case.
~a.,
1)¢
CV]
.'7
'r+
Methods of overcoming some of these limitations have been suggested, but I am not aware of any published theory which is both quite general and simple enough to be useful. The book is organised in the following way: Chapter 1 concentrates on the preliminaries of matrix theory and notation which is found useful throughout the book. In particular, the simple and useful
elementary matrix is defined. The vec operator is defined and many useful
l/1
..^
'-'
^.N
...
in.
...
9a)
relations are developed. Chapter 2 introduces and establishes various important properties of the matrix Kronecker product. Several applications of the Kronecker product are considered in Chapter 3. Chapter 4 introduces Matrix Calculus. Various derivatives of vectors are defined and the chain rule for vector differentiation is established. Rules for obtaining the derivative of a matrix with respect to one of its elements and conversely are discussed. Further developments in Matrix Calculus including derivatives of scalar functions of a matrix with respect to the matrix and matrix differentials are found in Chapter 5. '27
Chapter 6 deals with the derivative of a matrix with respect to a matrix.
Author's Preface
8
'..
This includes the derivation of expressions for the derivatives of both the matrix product and the Kronecker product of matrices with respect to a matrix. There is also the derivation of a chain rule of matrix differentiation, Various applications of at least some of the matrix calculus are discussod in Chapter 7, By making use, whenever possible, of simple notation, including many worked examples to illustrate most of the important results and other examples at the end of each Chapter (except for Chapters 3 and 7) with solutions at the 4..
a`1
1..
4.n 'CJ
461
end of the book, I have attempted to bring a topic studied mainly at post-S7
'C1
graduate and research level to an undergraduate level.
Symbols and Notation Used A,B,C... matrices
el e
Ell 0,,, SU
A., Aj. A1.'
(A')., (A').; tr A
vecA AOB iff diag {A} 8Y
the transpose of A the (i, j)th element of the matrix A the matrix A having arf as its (4 j)th element the unit matrix of order m X in the unit vector the one vector (having all elements equal to one) the elementary matrix the zero matrix of order in X m the Kronecker delta the lth column of the matrix A the jti row of A as a column vector the transpose of Af. (a row vector) the ithe column of the matrix A' the transpose of the ith column of A' (that is, a row vector) the trace of A an ordered stock of columns ofA the Kronecker product of A and B if and only if the square matrix having elements all, a22, . . . along its diagonal and zeros elsewhere .w,
ari [aif] I,,,
,..
A'
a matrix of the same order as Y
aXrs
ayf
ax Ers
E#
a matrix of the same order as X
an elementary matrix of the same order as X an elementary matrix of the same order as Y
CHAPTER I
Preliminaries
1.1 INTRODUCTION
In this chapter we Introduce some notation and discuss some results which will
be found very useful for the development of the theory of both Kronecker products and matrix differentiation. Our aim will be to make the notation as
fl.
.".
simple as possible although inevitably it will be complicated. Some simplification may be obtained at the expense of generality. For example, we may show that a result holds for a square matrix of order n X n and state that it holds in the more general case when A is of order in X n. We will leave it to the interested reader to modify the proof for the more general case. Further, we will often write L°.
.....
or
or
justDij instead of
m n
ij i=1 j=1
when the summation limits are obvious from the context. Many other simplifications will be used as the opportunities arise. Unless of particular importance, we shall not state the order of the matrices considered. It will be assumed that, for example, when taking the product All or ABC the matrices are conformable. 1.2 UNIT VECTORS AND ELEMENTARY MATRICES
The unit vectors of order n are defined as 1
0
0
0
1
0
,
e2 = 0
,
..., e _
0
...
e1 = 0 Pi
L0J
L1
Preliminaries
12
[Ch. 1
The one vector of order n is defined as 11 1
e=
(1.2)
1
1
From (1.1) and (1.2), obtain the relation
e = Eel
(1.3) r.,
The elementary matrix E,i is defined as the matrix (of order m X n) which has a unity in the (i, f)th position and all other elements are zero. For example,
000...0
001 ...0 E23 =
000...0
(1.4)
Lo00...0J The relation between e1, ei and E11 is as follows
Eli = ei el
ti.
Cs.
where ei denotes the transposed vector (that is, the row vector) of el.
t27
Example 1.1 Using the unit vectors of order 3
(i) form Ell, E21, and E23 (ii) write the unit matrix of order 3 X 3 as a sum of the elementary matrices. Solution 1
1
E11= 0 [1 001= 0
0
1 [100]= 100 0 0
E23 =
000 000 --+
E21=
1
00
000
1
0
000 0 0 0
[0 0 1]= 0 0
..-
(i)
1
000
Decompositions of a Matrix
Sec. 1.3]
13
3
(ii)'
! = Eit + E22 + E33 =
eiej r= The Kronecker delta Sij is defined as
1ift=/ Oifizkj
Sid CAD
it can be expressed as
Sij=ejei=ejei
(1.6)
.
We can now determine some relations between unit vectors and elementary matrices.
Eijer = eiejer (by 1.5) (1.7)
= 5/rei and
e,.Eii = e.eiej (1.8)
= Sriej Also
EijErs = eieieres = 5jetes = SjrEis
(1.9)
In p articular if r =f, we have
EijEjs=51jEis=Eis and more generally (1.10)
LijEjsEsrn = EisEsm = Eim
Notice from (1.9) that
EijErs = 0 if / # r . 1.3 DECOMPOSITIONS OF A MATRIX We consider a matrix A of order m X n having the following form
A
all all
ainn
a12
a22
.
a2n
= [a11]
Lamlamt amnJ We denote then columns of A by A.1, A.2, ... A,n. So that
A.j = a2i an,j
(j = 1, 21 .... n)
(1.12)
[Cll. 1
Preliminaries
14
and them rows of A by A1., A.2, ...A.. so that au
A. =
(i = 1,2,... ,m)
ate
(1.13)
at,
Both the A.l and the A. are column vectors. In this notation we can write A as the (partitioned) matrix or as
A = [A.1 A.2 ... A.,,]
(1.14)
A = [A1.A2.... A,,,.]'
(1.15)
fl,
(where the prime means 'the transpose of'). For example, let
A
all al a21 a22
so that
A1. =
all
and
a21
A2. _
a12
then
a22
Pall
a2I' = call a121 a221
L 12
=A.
La21 a22
The elements, the columns and the rows of A can be expressed in terms of the unit vectors as follows:
So that
The jth column A.1 = Ael
(1.16)
The ith rowAi '= ejA.
(1.17)
A;. = (e,A)' = A'e1.
(1.18)
The (i,j)th element ofA can now be written as
all = ejAel = eeA'el We can express A as the sum
(1.20) '+f
A = EEailEfl A = EEaile,e1.
0
(where the Ell are of course of the same order as A) so that
(1.21)
--*
1(,
Decompositions of a Matrix
Sec. 1.31
15
From (1.16) and (1.21)
A. j = Aej = (2Eaiieie)ei = ZEatjet(e/ej) (1.22)
= 2;a;ie; . Similarly
so that
At. = Ea;jej
(1.23)
j
A;. = Eatjej
(1.24)
.
I
It follows from (1.21), (1.22), and (J.24) that
A = XA.jej
(1.25)
A = Eet A;.' .
(1.26)
GIN
[1.
and
Example 1.2 Write the matrix
Fall a,2 N..
A=
L2l a2J as a sum of: (i) column vectors of A; (ii) row vectors of A.
Solutions (i) Using (1.25)
A = A.le'1 + A.2e2
[
a
21
[1 03 +
[0 1]
a22
al e
Using (1.26)
A = el A1: + e2A2.'
ro [all a12] + [00
[a,21
a,22]
a.)
There exist interesting relations involving the elementary matrices operating on the matrix A. For example
EtjA = e;ej'A = e1Aj '
(by 1.5) (by 1.17)
(1.27)
[Ch. I
Preliminaries
16
similarly
AErj = Ae;ej' = A.ree
.(by 1.16)
(1.28) (1.29) CST
.On
sa that
AEij = A.jee AE,jB = Aejej'B = A.,B1.'
(by 1.28 and 1.27)
,ErjAEr,i = ere/Aeres
(by 1.5)
= ejalre'l
(1.30)
(by 1.19) (1.31)
= ajreie; = airEls +:.
t.)
In particular (1.32)
Ej1AErr = airEir
v.. ,...
Example 1.3 Use elementary matrices and/or unit vectors to find an expression for (i) The product AB of the matrices A = [a,1] and B = [bij]. (ii) The kth column of the product AB (iii) The kth column of the product XYZ of the matricesX= [xji], Y= and Z = [zii] Solutions
(i) By (1.25) and (1.29)
A = EA. i e, = EAEii hence
AB = E(AE11)B = E(Aej)(ej'B) = EA.1Bj.'
(by (1.16) and (1.17)
(ii) (a) by (1,16)
`s]
(AB).k = (AB)ek = A(Bek) = AB.k (b) From (i) above we can write
(AB).k =
E(Aejej'B)ek = E(Aej)(e%Bek)
= EA./bjk i
(iii)
by (1.16) and (1.19)
(XYZ).k = Ezjk(XY).j
by (ii)(b) above
= E(zjkX)Y.j
by (ii)(a) above.
1.4 THE TRACE FUNCTION The trace (or the spur) of a square matrix A of order (n X n) is the sum of the diagonal terms
n
art 1=1
The Trace Function
Sec. 1.4] We write
17
tr A = Eau
(1.33)
From (1.19) we have
aj1 = e';Aet, so that
tr A = Ee'iAei
(1.34)
From (1.16) and (1. 34) we find
tr A = Ee'iA.j
(1.35)
and from (1.17) and (1.34)
tr A = EAj.'ej
(1.36)
.
We can obtain similar expression for the trace of a productAB of matrices. For example
tr AB = Ee'jABej
(1.37)
t
= EE(e'Ae1)(e%Bet) II
= Efatlbfj
(See Ex. 1.3) (1.38)
Similarly tr BA = EeeBAe1
=
= EEbljat/
(1.39)
From (1.38) and (1.39) we find that
trAB=trBA.
(1.40)
From (1.16), (1.17) and (1.37) we have
tr AB = EA; B.t
(1.41)
Also from (1.40) and (1.41)
tr AB = EB1.A.j Similarly
.
(1.42)
tr AB' = EAj.B1 .
(1.43)
and since tr AB' = Is A'B
tr AB' = EA.'jB.t
(1.44)
Preliminaries
18
[Ch. I
'C3
Two important properties of the trace are .nd
tr (A + B) = tr A + tr B
(1.45)
tr (a A) = a trA
(1.46)
where a is ascalar.
These properties show that trace is a linear function. For real matrices A and B the various properties of tr (AB') indicated above show that it is an inner product and is sometimes written as
tr (AB') _ (A, B) 1.5 THE VEC OPERATOR We shall make use of a vector valued function denoted by vec A of a matrix A defined by Neudecker (221. If A is of order m X n A.1
vecA = A.2 LA. J,
(1.47) .
-U.
From the definition it is clear that vecA is a vector of order mn. For example if
A =
a21 azz C11 a'2
then
rai n
vecA = a21 a12
a22
Example 1.4 Show that we can write tr AB as (vec A')' vec B Solution By (1.37) tr AB =
Ee'jABe1
= EAi;B,1
by (1.16) and (1.17)
(since the ith row of A is the ith column of A')
Sec. 1.51
The Vec Operator
19
Hence (assuming A and B of order n X n) B.l
tr AB = E(A').1'(A').i 2'.
_ (vec A')'vec B
(A').,,']
B.2
B,
Before discussing a useful application of the above we must first agree on notation for the transpose of an elementary matrix, we do this with the aid of an example. X11 Xl2 X13
Let X =
X21 X22 X23
then an elementary matrix associated with will X will also be of order (2 X 3). For example, one such matrix is
_ E12=
0
0
1
000
The transpose of E12 is the matrix 0 0
E12 =
0
1
00 'C7
t3.
..,
.N.
c$'
Although at first sight this notation for the transpose is sensible and is used frequently in this book, there are associated snags. The difficulty arises when the suffix notation is not only indicative, of the matrix involved but also determines specific elements as in equations (1.31) and (1.32). On such occasions it .NJ
..d
will be necessary to use a more accurate notation indicating the matrix order and the element involved. Then instead of E12 we will write E12(2 X 3) and instead of E12 we write E21(3 X 2),
More generally if X is a matrix or order (in X n) then the transpose of Ers (171 X n)
will be written as Ers
unless an accurate description Is necessary, in which case the transpose will be written as
Esr(nXm)
.
Now for the application of the result of Example 1.4 which will be used later on in the book.
[Ch. 1]
Preliminaries
20
From the above tr E,''A = (vec Ers)' (vec A) ars
..y
where ars is the (r,s)th element of the matrix A. We can of course prove this important result by a more direct method.
tr E',.sA = Ee ErsAek ai/ekese;.eiejek
(sinceA =>aiiEij)
i, j, k
ij'k$Sri'jk = ars i,1, k
Problems for Chapter 1
(1) The matrix A Is of order (4 X n) and the matrix B is of order (n X 3). Write
the product AB in terms of the rows of A, that is, A,., A2., .. , and the columns of B, that is, B.1, B.2, ...
.
(2) Describe in words the matrices (a) AEik
and
(b) EikA .
(3)
Show that
(a) trABC= EA1.BC.i
(b) trABC= trBCA=trCAB
.-.
C.1
Write these matrices in terms of an appropriate product of a row or a column of A and a unit vector.
Show that tr AEij = aji
B = [bij] is a matrix of order (n X n) diag {B} = diag {bll, b22, ... , b,,,, } = EbiiEii .
Show that if aij = tr BEjj6jj
then A = [aij] = diag{B}
CHAPTER 2
The Kronecker Product 2.1 INTRODUCTION
Kronecker product, also known as a direct product or a tensor product is a concept having its origin in group theory and has important applications in +.,
particle physics. But the technique has been successfully applied in various fields
of matrix theory, for example in the solution of matrix equations which arise when using Lyapunov's approach to the stability theory. The development of the [3.
technique in this chapter will be as a topic within the scope of matrix algebra.
2.2 DEFINITION OF THE KRONECKER PRODUCT
Consider a matrix A = [aqj of order (m X n) and a matrix B = [bq] of order
.''
(r X s). The Kronecker product of the two matrices, denoted by A O B is defined as the partitioned matrix
AOB =
a11B
a12B
...
a21B
a22B
...
a2,B a,r,,,
LamIB
(2.1)
B
A O B is seen to be a matrix of order (rnr X its). It has inn blocks, the (i,j)th block is the matrix a11B of order (r X s). For example, let
A E P' 11 ail f a21
B_ I bil b121
a221
I b21 b22
then
rallbll allbl2 al2bll a12b12
AOB
a11B a12B
La21B a22B
=
a11b21 a1lb22 a12b21 a12b22 a21b11 a21b12 a22b11 a22b12 a21 b21 a21 b22 a22 b21 a22 b22
(Ch. 2
The Kronecker Product
".. Z2
CUD
Notice that the Kronecker product is defined irrespective of the order of the makes involved. From this point of view it is a more general concept than matrix multiplication. As we develop the theory we will note other results C3.
w{..'
which are more general than the corresponding ones for matrix multiplication.
The Kronecker product arises naturally in the following way. Consider two
linear transformations
x = Az and y = Bw
xt
x2
Fall
at2 r a t
Last
a22
Yt
btr
Y2
bet
and
Z2
._._
which, in the simplest case take the form bb2r22,
wt (2.2)
LW J
We can consider the two transformations simultaneously by defining the following vectors
xiyt
ztwt
XI VI
z ws
x 0y =
v= z© w=
and
(2.3)
I x2Yt
z2wt
x2 Y2
z2w2
.
To find the transformation between µ and v, we determine the relations between the components of the two vectors. For example, xtyt = (attzt + at2z2) (btt wt + bt2w2) = all btt (ziwt) + all bt2(ztw2) + at2btt(z2wt) + at2bt2(z2w2)
attbt2
at2btt
a,-2b,2
attb21
all b22
at2b2t
a12b22
a2tbtt
a21b12
a22brt
a22b12
a21b12
a2tb22
a22b2t
a22 b22
.N..
u=
alibi,
tr.
Similar expressions for the other components lead to the transformation
v .N.
or
µ = (A®B)v, that is
Az®Bw = (A®B)(z(Dw)
.
(2.4)
Example 2.1
Let Eq be an elementary matrix of order (2 X 2) defined in section 1.2 (see 1.4). Find the matrix 2
U=
2
Ej, i ®EI I
Sec. 2.3]
Some Properties and Rules for Kronecker Products
23
Solution
L°-!+
U =Ell (8) Ell +E1,2 ®E2,1 +E11 ® E12 +E2,2 ®E2,2
f11 ® r61 + roa1 ® roof + (001 00
0 0
0 0
1
0
I
of
(lo
'
+ ( 011 ®
so that 1
U =
it
Lo 0J
011
0 0 0
0 0
1
0
0 0
1
0 0 0
0
1
Note. U is seen to be a square matrix having columns which are unit vectors er(i = 1, 2,.. ). It can be obtained from a unit matrix by a permutation of rows or columns. It is known as a permutation matrix (see also section 2.5). 2.3 SOME PROPERTIES AND RULES FOR KRONECKER PRODUCTS
We expect the Kronecker product to have the usual properties of a product. I
If a is a scalar, then
A O (aB) = a(A ®B)
.
(2.5)
Proof The (i, j)tli block of A O (aB) is [are (aB)J
= a[a11BJ
= a[(i, j) th block of A O BJ The result follows.
It The product is distributive with respect to addition, that is (a)
(b)
(A+B)OC = AOC+B®C A®(B+C) = ul ®B+A®C
Proof We will only consider (a), The (i, j)th block of (A + B) ® C is (ali + b1i) C
.
The (i, j)th block of A ® C + B ® C is
a11C+b;1C = (a11+bl)C
(2.6)
(2.7)
(Ch. 2
The Kronecker Product
24 Since the two
-III The
blocks are equal for every (i,j), the result follows.
product is associative
A®(B®C) _ (A(2-9 B)®C .
(2.8)
IV There exists a zero element Ornr, = Orr, 2) On (2.9)
a unit element Imn ° Im ® In 0
The unit matrices are all square, for example In, in the unit matrix of order (jn X m).
Other important properties of the Kronecker product follow. (2.10)
V (A ®B)' = A' ®B' Proof '+7
The (i,j)th block of (A (D B)' is ai jB' .
VI (The `Mixed Product Rule').
(A ®B) (C ®D) = AC ®BD
(2.11)
provided the dimensions of the matrices are such that the various expressions exist.
C.,
Proof The (i,j)th block of the left hand side is obtained by taking the product of the t.,..
ith row block of (A ® B) and the /th colum block of (C ® D), this is of the following form c11D
(ai1B ai2B ... ajnB)
c21D
cn1D
= EajrcriBD . r
-
-
The (i, j)th block of the right hand side is (by definition of the Kronecker product) gj1BD
where gji is the (i, j)th element of the matrix AC. But by the rule of matrix multiplications
gji=Zajrcri
Sec. 2.31
Some Properties and Rules for Kronecker Products
25
Since the (i,j)th blocks are equal, the result follows.
VII Given A(m X m) and B(n X n) and subject to the existence of the various inverses,
(A©B)'' = A"' OBy'
(2.12)
Proof Use (2.11 )
(A ®B) (A-' ®B"') = AA-' ®BY-' = I, ®In = Inv. The result follows.
VIII (See (1.47))
vec(AYB) _ (B' ®A) vec Y
(2.13)
Proof We prove (2.13) for A, Y and B each of order n X n. The result is true for A(m X n), Y(n X r), B(r X s). We use the solutions to Example 1.3(iii).
(AYB).k = E(bikA)Y.i
'-
i
-,
Y.1
_ [blkA b2kA ... bnkA1
Y. 2
Y.n
= [B.k'®A]vecY = [(B')k: ®A] vec Y since the transpose of the kth column of B is the kth row of B'; the results follows.
Example 2.2 Write the equation
all a21
a12
XI
X3
a22
X2
X4
X11
C12
X21 `2J
in a matrix-vector form.
Solution The equation can be written as AXI = C. Use (2.12), to find
vec (AXI) = (1®A) vec X = vcc C ,
The Kronecker Product
26
[Ch. 2
so that
0-1
x1
C1t
0
x2
X21
a12
X3
a12
o`"
x4
Lc22
Fall a12 0 a21
a22 0
0
0
0
0
all
a22
a21
Example 2.3 A and B are both of order (n X n), show that
(i) vecAB=(1®A)vecB (ii) vecAB=(B'®A)vecl (iii) vec AB = E (B').k ® A.k Solution
(1) (As in Example 2.2)
In (2.13) let Y = B andB =1. (ii) In (2.13) let Y = I . (iii) In vec AB = (B' ®A) vec I substitute (1.25), to obtain
f`7
vecAB =
[(B').ie; O EA.lei]vecl
= [((B').i®A.J)(e.® ee)]
ij
vec 1
(by 2.11)
The product e', O ei' is a one row matrix having a unit element in the [(i - 1)n + j]th column and zeros elsewhere. Hence the product ,U.
[(B').; ®A.i] [el' O el] is a matrix having
(B').1®A.1
'fl
as its [(i -1)n + j]th column and zeros elsewhere. Since vecl is a one column matrix having a unity in the 1st, (n + 2)nd, (2n + 3)rd . . . n2rd position and zeros elsewhere, the product of 61.
[(B').I ®A.l] [ej ® e)] and vec I is a one column matrix whose elements are all zeros unless i and j satisfy
(i-1)n+j = l,orn+2,or2n+3,...,orn2
Sec. 2.3j
Some Properties and Rules for Kronecker Products
that is
1=j=1 or
27
or i=j=3 or ..., i=j=n
i=j=2
in which case the one column matrix is
(B').i®A.r (i = 1,2,...,n) The result now follows.
IX If (X;} and (xj) are the eigenvalues and the corresponding eigenvectors for A and (µi} and (yi) are the eigenvalues and the corresponding eigenvectors for B, then
A®B has eigenvalues (Xrµj} with corresponding eigenvectors (xi ® yi}.
Proof By (2.11)
(A ® B) (x, ® yi) _ (Ax,) © (Byi) _ (Xixr) ® (µ1y1)
= Xjµi(x1 ®yj)
(by 2.5)
The result follows.
X Given the two matrices A and B of order n X n and m X m respectively JAOBI = IAImJBV" where IAA means the determinant of A.
Proof Assume that X1, X2, ... , X and µr, µ2, ... , µ,,, are the eigenvalues of A and B respectively. The proof relies on the fact (see [18] p. 145) that the determinant of a matrix is equal to the product of its eigenvalues. Hence (from Property IX above)
IAOBI = jjXjuf i,l n
n
X ' II µj) 1x2 tI P) 1=t
(X1 X2
...
JAI"' IBI°
...
1=t
ll(22 ...
rr
t X nr1=tl l µ//
The Kronecker Product
28
[Ch. 2
Another important property of Kronecker products follows.
AOB = Ut(BOA)U2
(2.14)
where U1 and U2 are permutation matrices (see Example 2.1).
Proof Let AYB' = X, then by (2.13) (BOA) vec Y = vecX X.
(1)
on taking transpose, we obtain
BY;t' = X' So that by (2.13)
(A 0 B) vec Y' _ vecX'
(2)
.
'I]
From example 1.5, we know that there exist permutation matrices U1 and U2 such that
vec X' = U1 vec X
and
vec Y = U2 vec Y'
.
Substituting for vec Yin (1) and multiplying both sides by U1, we obtain
U1(B 0A)U2vecY' = U1 vecX
.
(3)
Substituting for vec X' in (2), we obtain
(A O B) vec Y' = U1 vecX
.
(4)
The result follows from (3) and (4). We will obtain an explicit formula for the permutation matrix Uin section 2.5. Notice that U1 and U2 are independent of A and B except for the orders of the matrices.
XII if f is an analytic function, A is a matrix of order (n X n) and f(A) exists, then
f(1,n&A) = Im ID AA) and
f(A O Im) = f(A) O I. Proof Since f is an analytic function it can be expressed as a power series such as
f(z) = a°+a1z+a2z2+.. so that
f(A) = aoI,, +a1A+a2A2+... _
where A° = I.
u^,.
By Cayley Hamilton's theorem (see [18]) the right hand side of the equation for f(A) is the sum of at most (n + 1) matrices.
Sec. 2.3]
Some Properties and Rules for Kronecker Products
29
We now have
k =O
k=0
k=0 err,
®
7a
k
k=0
Im ©f (A)
'"1
This proves (2.15); (2.16) is proved similarly. We can write
f(A (D I,,)
)'ak(A Ox Im)k k -O
(Ak ©Im)
by (2.11)
k=0
akAk ®lm) k=0
akA®0Im
=
by (2.6)
k=0
f(A) (& Irn
This proves (2.16). An important application of the above property is for
f(z) = eZ
.
(2.15) leads to the result
elm 6A = Im O eA and (2.16) leads to eA ®rm = eA
O It n
Example 2.4 Use a direct method to verify (2.17) and (2.18).
elm®A =
a~'
Solution
(2.17)
(2.18)
The Kronecker Product
30
[Ch. 2
The right hand side is a block diagonal matrix, each of the m blocks is the sum
I,,,+A+21 A2+...
= eA
.
The result (2.17) follows. eA®Im
(In®Im)+(A(D Im)+21 Q. ®A)2+... ( 1 n ®In,) + ( A ®1m) + 1(A2 01m) + .. .
= Q,,+A+2A2+...)OOIm = eA ®I,,,
XIII tr(A®B)=trAtrB Proof Assume that A is of order (n X n)
tr(A®B) = tr(a1,B)+tr(a22B)+...+tr(annB) = a11trB+a22trB+...+anntrB = (all +a22+...+a.... )trB = tr A tr B . 2.4 DEFINITION OF THE KRONECKER SUM Given a matrix A(n X n) and a matrix B (m X m), their Kronecker Sum denoted by A ®B is defined as the expression
AG+B = A©I,,+1n®B
(2.19)
We have seen (Property IX) that if {X;} and {pj} are the eigenvalues of A and B respectively, then {X;pj} are the eigenvalues of the product A ® B. We now show the equivalent and fundamental property for A (D B. XIV If {X;} and tAj) are the eigenvalues of A and B respectively, then (Xi + pf} are the eigenvalues of A O B.
Proof Let x and y be the eigenvectors corresponding to the eigenvalues X and p of A and B respectively, then
(A(DB)(x®y) _ (A0I)(x0y)+(10B)(x(3y) = (Ax ®y) + (x ®By) = X(x ®y) + U(x ®y)
_ (X+p)(x®y) The result follows.
by (2.19) by (2.11)
Definition of the Kronecker Sum
Sec. 2.41
31
Example 2. S
Verify the Property XN for A
_
-1
l
I
and
B =
0
1-0 1 Cl-lJ
Solution For the matrix A; X, = 1
and
x, = 1101
X2 = 2 and x2 =
-
'=9
For the matrix B;
µ, = 1
[1
and
1122 and
Yi
1i
Y2=L1
We find 2
0 -1 0 0 0 -1
0
0
3
0
0
0
2
1
2
C=AO+B =
(L
and 1 pi - Cl = p (p - 1) (p - 2) (p - 3), so that the eigenvalues of A O B are
and
p = 0 = X, + µ2
and
xt O y2 = [0
1
p = 1 = X2 + 112
and
x2 O Y2 = 10
1 0 -1]'
p = 2 = X,+µr
and
x1Oy, = (1
1
p = 3 = X2 + µr
and
x2 O Yr = 11
1 -1 -1 ]'
0 0]' 0 0]' .
The Kronecker sum frequently turns up when we are considering equations of the form; (2.20) AX + XB = C c,.
where A(n X n), B(m X in) and X(n X m). Use (2.13) and solution to Example 2.3 to write the above in the form vecC or
(11' (D A) vec X = vec C
It is interesting to note the generality of the Kronecker sum. For example,
exp (A + B) = exp A exp B
(2.21)
[Ch. 2
The Kronecker Product
32
if and only if A and B commute (see [ 181 p. 227) 't7
"t7
whereas exp (A 0 B) = exp (A 0 1) exp (I 0 B) even if A and B do not commute! Example 2.6 Show that
exp (A ®B) = expA © exp B
whereA(n X it),B(m X m). Solution By (2.11)
A®B and (A
0 Im) and (In 0 B) commute so that
exp (A ®B) = exp (A 01m + In 0 B) = exp (A ®I,,,) exp (In ®B) = (expA ®Im) (1 ® exp B)
(by 2.15 and 2.16)
= expA 0 expB
(by 2.11)
2.5 THE PERMUTATION MATRIX ASSOCIATING vec X AND vec X'
If X = [x;l] is a matrix of order (in X n) we can write (see (1.20))
X = EEx,/E;j where Eli is an elementary matrix of order (in X n). It follows that
X' = so that
vec X' = EEx11 vec Erl'
(2.22)
.
We can write (2.22) in a form of matrix multiplication as x11
...
x21
.. vec E,;1 vec E12:... vec E,;,n]
I
x,,,, x12 ,,,
vec X' = [vec E11 vec E21
xmn
Sec. 2.5]
The Permutation Matrix
33
that is
vec X' = [vec E11 vec E21; ... vec E,,',,: vec E12 ... vec E,,',,j vec X. So the permutation matrix associating vec X and vec X' is U = [vec E,',
(2.23)
... vec
vec E2,
Example 2.7 Given
X = xli X12 X13
determine the matrix U
x21 x21 x13
such that vecX' = U vec X, Solution
EI'1
0l 1
00
Ei'r
!f
00
El, ' =
1
0
0 0
r f=7
0
0
1,0
E22 =
0
1
0 0
-,
0 0
E23 =
and
0
E13
00
00
= r0
0
1(0)
1
Hence by (2.23) 1
0 0 0 0 0
0 0
1
0 0 0
0 0 0 0 0
--»
U =
1
1
0
0 0 0 0
0 0 0
1
0 0
0 0 0 0 0
1
We now obtain the permutation matrix U in a useful form as a Kronecker product of elementry matrices. As it is necessary to be precise about the suffixes of the elementary matrices, we will use the notation explained at the end of Chapter 1. As above, we write m
X' = > > xrsEsr (n X m)
.
r=l s=1
By (1.31) we can write
X'
Er (nXm)XEsr(11 Xm). r, s
[Ch. 2
The Kronecker Product
34
1 fence,
vec X' = vec Esr (n X nt) XE,rr (n X m) r, s
Er,.(mXn)©E,.r(nXm)jvecX
by (2.13)
r' s
It follows that U = ) Ers (m X n) O Esr (n X m)
(2.24)
r, s
or in our less rigorous notation
U = ,E, Ox Ers
(2.25)
r, s
r..
Notice that U is a matrix of order (nut X nut). At first sight it may appear that the evaluation of the permutation matrices Ut and U2 in (2.14) using (2.24) is a major task. In fact this is one of the examples where the practice is much easier than the theory. We can readily determine the form of a permutation matrix - as in Example 2.7. So the only real problem is to determine the orders of the two matrices. c..
4U.
Since the matrices forming the product (2.14) must be conformable, the orders of the matrices Ut and U2 are determined respectively by the number of rows and the number of columns of (A O B). Example 2.8
Let A = [a111 be a matrix of order (2 X 3), and B = [bit] be a matrix of order (2 X 2). Determine the permutation matrices Ut and U2 such that
A O B = Ut (B 0 A) U2 Solution
(A ©B) is of the order (4 X 6)
From the above discussion we conclude that Ut is of order (4 X 4) and U2 is of order (6 X 6). 0 0 0 0 0 1
1
0 0 0
Ut _ 0 0 0
1
1
1
1
0 0 0
0 0 0 0
0
00
000
0 0 and
U2 =
1
0
0000 00 000 0
1
1
0 0 0 0 0 11
The Permutation Matrix
Sec. 2.51
35
Another related matrix which will be used (in Chapter 6) is
U=
rs O
Ers
(2.26)
r, s
When the matrix X is or order (in X n), U is or order (nr2 X n2).
Problems of Chapter 2
(1) Given
Ers(inX n)0Esr(nX m).
U= r, s
Show that
U-1 = U' =.Er(nXin)0Ers(inXn) r' s C)'
direct method to evaluate
(a) (i) AYB (ii) B' ©A (b) Verify (2.13) that vecAYB = (B' O A) vec Y.
(3) Given A =
r2
-1
1
1
and B = 0
2
0
1
(a) Calculate
AOB add BOA. (-)
(b) Find matrices U, and U2 such that
AOB = Ul(BOA)U2. (4) Given C3
A 2
4
_3
calculate
(a) exp (A)
(b)'exp(A 01). Verify (2.16), that is
exp (A) 01 = exp (A 01).
`L7
(2) A = [at1], B = [b,1] and Y = [y,j] are matrices all of order (2 X 2), use a
[Ch. 2)
The Kronecker Product
36 (5) Given
2
A
-1
1
-
and B
1
1
2
3
4
,
calculate
../
(a) A"' O and B-' (b) (A ©B)'' . Hence verify (2.12), that is
(A © B)'' = A"' © B''
(6) Given
A = L4
2]
and B = L2
,
find
3
(a) The eigenvalues and eigenvectors of A and B. (b) The eigenvalues and eigenvectors of A © B. (c) Verify Property IX of Kronecker Products.
(7) A, B, C and D are matrices such that A is similar to C, and B is similar to D.
Show that A 0 B is similar to C rJ D.
CHAvrER 3
Some Applications of the Kronecker Product
3.1 INTRODUCTION
There are numerous applications of the Kronecker product in various fields including statistics, economics, optimisation and control. It is not our intention to discuss applications in all these fields, just a selected number to give an idea
"'7
..'
.'^
....
,-.
ice'
C].
of the problems tackled in some of the literature mentioned in the Bibliography. There is no doubt that the interested reader will find there various other applications hopefully in his own field of interest. A number of the applications involve the derivative of a matrix - it is a well known concept (for example see [18] p. 229) which we now briefly review. 3.2 THE DERIVATIVE OF A MATRIX Given the matrix
A(t) _ [ar!(t)) the derivative of the matrix, with respect to a scalar variable t, denoted by (d/dt)A(t) or just dA/dt or A(t) is defined as the matrix
dtA(t) - dta;t(t)I
.
(3.1)
I
Similarly, the integral of the matrix is defined as
JA(t)dt = [Jaii@)dt For example, given
A =
2t2
4
sin t
2 + t2
(3.2)
Some Applications of the Kronecker Product
38
[Clt. 3
then
d
Q
14t
fAdt =
and
dt A = cost 2t
t3
4t
-cost 2t + t3/3
+C
where C is a constant matrix. One important property follows immediately. Given conformable matrices A(t) and B(t), then
dt [AB] = aAB+A d-
.
(3.3)
cry
Example 3.1 Given
C = AOB
(each matrix is assumed to be a function of t) show that
717
dC
= dAOB+AO dB
3.4)
Solution On differentiating the (i, j)th block of A O B, we obtain
t d
(aijB)
i'iB + a,i aB
which is the (i, j)th partition of
dAOB+AOdB
,
the result follows.
3.3 PROBLEM 1
Determine the condition for the equation
AX+XB = C to have a unique solution. Solution We have already considered this equation and wrote it (2.21) as
(B'@ A) vec X = vec C or
Gx = c where G = B' (D A and c = vec C.
(3.5)
Problem I
...
Sec. 3.3)
39 "..
Equation (3,5) has a unique solution 1ff G is nonsingular, that is iff the eigenvalues of G are all nonzero. Since, by Property XIV (see section 2.4), the eigenvalues of G are (X1 + µ/) (note that the eigenvalues of the matrix B' are the same as the eigenvalues of B). Equation (3.5) has a unique solution iff
Xr+µl a:0
(all iandj).
We have thus proved that AX + BX = C has a unique solution iff A and (-B) have no eigenvalue in common. If on the other hand,A and (-B) have common eigenvalues then the existence of solutions depends on the rank of the augmented matrix
[Gc] If the rank of [G:c] is equal to the rank of G, then solutions do exist, otherwise the set of equations
AX+XB = C is not consistent. Example 3.2 Obtain the solution to
AX+XB = C (1)
(ii)
A=
I0
A=1
22
0
,
[
B= B=
2,
_;'
where
1
3 +
0]
an d
12
4
0 -1
an d
^'r
Solution Writing the equation in the form of (3.5) we obtain, (1)
-2 - 1 0-1
1
0
x1
I
0
1
x2
-2
0
1-1
x3
3
0
4
0
x4
2
where for convenience we have denoted
x2
x,l
-N1
4
2
C=
C=
2
2 -9
Some Applications of the Kronecker Product
40
[Ch. 3
00.
On solving we obtain the unique solution
X=
10
21 -1
1
(ii) In case (ii) A and (-B) have one eigenvalue (X = 1) in common. Equation (3.5) becomes
H2 -1
0
0
x,
0 -1
0
0
x2
4
0
0 -1
x3
S
LO
4
0
x4
-9
J
0
_
2
and rank G = rank [G; c]. G is seen to be singular, but
rank G = rank [G c] = 3
1
...
hence at least one solution exists. In fact two linearly independent solutions are
Ti
0
X, _
and
-2 -1
X2 =
1-1
-2 -1
0
any other solution is a linear combination of X, and X.2.
3.4 PROBLEM 2
Determine the condition for the equation
AX-XA=yX
(3.6)
to have a nontrivial solution.
Solution We can write (3.6) as
Hx = px
(3.7)
whereH=I®A -A'@ I and
x = vecX . (3.7) has a nontrivial solution for x iff
1,41-HI = 0 0
that is iff p is an eigenvalue of H. But by a simple generalisation of Property XIV,
Sec. 3.5]
Problem 3
41
section 2.4, the eigenvalues of H are {(At - ?l)} where {rr} are the eigenvalues of A. 1-fence (3.6) has a nontrivial solution iff
p= Example 3.3 Determine the solutions to (3.6) when
A =
5 of 2
and
p = -2 .
3
Solution
p = -2 is an eigenvalue of H, hence we expect a nontrivial solution. Equation (3.7) becomes 0
0--2
01
0-2
XI
X1
2
2
0
0 -2
0
X3
X3
0
0
0
xa
x4
2
x2
x2
= -2
On solving, we obtain 1
X=
1
-1 -1
3.5 PROBLEM 3
Use the fact (see [18] p. 230) that the solution to is
z = Ax , x(0) = c
(3.8)
x = exp (A t) c
(3.9)
to solve the equation
X = AX + XB
,
X (O) = C
(3.10)
where A(n X n), B(m X in) and X(n X rn). Solution Using the vec operator on (3.10) we obtain where and
X = GX ,
x (0) = c
x = vecX, c = vecC
G = I,,, OA+B'OI
(3.11)
[Ch. 3
Some Applications of the Kronecker Product
42
By (3.9) the solution to (3.11) is
vee X = exp {(I,,, 0 A) t + (B' ®lr,)t) vcc C vcc C (see Example 2.6) [exp (I,,, ©A)t] [exp [I, © exp (At)] [exp (Bt) O
by (2.17) and
vec C
(2.18).
We now make use of the result
vec AB = (B'(D I) vec A
(in (2.13) put A =1 and Y - A) in conjunction with the fact that
[exp (B'r)] = exp (Bt) , to obtain
vec C = vec [Cexp (Bt)]
(exp (Bt) O
Using the result of Example 2.3(1), we finally obtain vec X = vec [exp (At) C exp (Bt)
(3.12)
So that X = exp (At) C exp (Bt). Example 3.4 Obtain the solution to (3.10) when -1 ,
0
B=
0
1
and
-2
C=
0 -1
2
0
.-.
1
A =
1
I
Solution (See [ 18] p. 227)
exp(At) = hence
er
et - e2t
fl
e2t
,
exp (Bt) =
et 0
0
l
er
_e2r-ear
X e3t
er
3.6 PROBLEM 4 We consider a problem similar to the previous one but in a different context. An important concept in Control Theory is the transition matrix. Very briefly, associated with the equations
X = A(t)X or is = A(t)x is the transition matrix (P1 (t, r) having the following two properties
...
c1(t r) = A(t)'t1(t r) and
`b1(t, t) = 1
(3.13)
Problem 4
Sec. 3.6J
[For simplicity of notation we shall write it is easily shown that
43
for cb(t,T).] lfA is a constant matrix,
(1) = exp(At) Similarly, with the equation
X = X13 so that
X' = 13'X'
we associate the transition matrix 4'2 such that (3.14) 4,2 = B'`F2 . The problem is to find the transition matrix associated with the equation
X=AX+XB
(3.15) ,L]
given the transition matrices 4' and `I'2 defined above.
Solution We can write (3.15) as
is=Gx where x and G were defined in the previous problem. We define a matrix as (3.16)
Ji(t,T) _ 1,2(t,T)0 `P,(t,T) We obtain by (3.4) q)2 ©(>;, +
4)2
by (3.13) and (3.14)
(B'4'2) ® `1't + `1'2 O (A`1)1)
= (B'`F2) ® (I`1't) + (I`F2) ® (A`l't)
= [B'OI+IOAi[(2O(1?,J Hence
by (2.11)
.
=GO .
Also
i (t, t) _ `l'2(4 r) ®`F (t, r)
= I®I =I.
(3.18) .ti
The two equations (3.17) and (3.18) prove that L is the transition matrix for (3.15) Example 3.5 Find the transition matrix for the equation
X
-
X+X IO
2
1
0
0 -1
Some Applications of the Kronecker Product
44
(Clr. 3
Solution In this case both A and B are constant matrices. From Example 3.4.
et-e2` i
et
4'1 = exp(At) =
e2t
Lo Iet
4)2 = exp (Bt) _
coy
0 t e-
0
So that
e2t e2t__e31
1G=(D2O t=
0
0
0
0
0
1
1 -et
0
0
et
0
e
0 Lo
3t
For this equation
2 -1
0
0
0
3
0
0
0
0
0 -1
L0
0
0
G =
1,
and it is easily verified that
=Gi and
3.7 PROBLEM 5 Solve the equation
AXB =C where all matrices are of order n X n.
Solution Using (2.13) we can write (3.19) In the form Hx = c
(3.20)
where H = B'O A, x = vec X and c = vec C. The criteria for the existence and the uniqueness of a solution to (3.20) are well known (see for example [ 18] ). The above method of solving the problem is easily generalised to the linear equation of the form A1XB1 + A2XB2 + ... +A,XB,. = C (3.21)
Problem 6
Sec. 3.81
45
Equation (3.21) can be written as for example (3.20) where this time B,
Example 3.6 Find the matrix X, given
A1XB1 +A2XB2 = C where
02 B2 =
-l
4 -6
1
and
3
C= 1
Solution For this example it is found that
H = B0A1+Bz0A2 =
r
0
8
2
2 -2 - 3
--t
1 -1
1
2
0
2
5
2
-4 -2 -5 - 4
andc'=[4 0 -6 81 It follows that -1
x = H-lc =
-2 0
so that
X
3.8 PROBLEM 6
This problem is to determine a constant output feedback matrix K so that the closed loop matrix of a system has preassigned eigenvalues. A multivariable system is defined by the equations
x = Ax+Bu
y=Cx
(3.22)
where A(n X n), B(n X m) and C(r X n) are constant matrices, u, x and y are column vectors of order in, n and r respectively.
Some Applications of the Kronecker Product
[Ch. 3
We are concerned with a system having an output feedback law of the form
u = Ky
(3.23)
where K(m X r) is the constant control matrix to be determined.
On substituting (3.23) into (3.22), we obtain the equations of the closed loop system
z=(A+BKC)x y
= Cx
(3.24)
.
The problem can now be restated as follows: Given the matrices A, B, and C, determine a matrix K such that
XI -A -BKC I = ao + a, X + ... + an._1 A"-1 + A" (say)
(3.25) = 0 for preassigned values A = A1, A2, ..., An
Solution
Various solutions exist to this problem. We are interested in the application of the Kronecker product and will follow a method suggested in [24]. We consider a matrix H(n X n) whose eigenvalues are the desired values A1, A2 ... , An, that is
IAl-HI = 0 and
for
A=
IAl-HI = ao+a1A+...+an_1A"-1+A"
Let
(3.26) .
(3.27)
A + BKC = H
so that
BKC=H-A=Q (say)
3.28)
Using (2.13) we can write (3.28) as
(C'@ B) vec K = vec Q
(3.29)
or more simply as
Pk = q
(3.30)
where P = C' O B, k = vec K and q = vec Q. Notice that P is of order (n2 X mr) and k and q are column vectors of order mr and n2 respectively. The system of equations (3.30) is overdetennined unless of course to = n =r,
in which case can be solved in the usual manner - assuming a solution does exist!
In general, to solve the system for k we must consider the subsystem of linearly independent equations, the ienraining equations being linearly dependent
I!]
Problem 6
Sec. 3,8 .1
47
on this subsystem. In other words we determine a nonsingular matrix T(n2 X n2) such that
Pt
TP = ---
(3.31)
Li P2 .°,
where P, is the matrix of the coefficients of the linearly independent equations of the system (3.30) and P2 is a null matrix.
Premultiplying both sides of (3.30) by T and making use of (3.31), we obtain
TPk=Tq
or
u
k=
(3.32)
V
LiPN
..Y
'.'
oCD
+'7
'r.
If the rank of P is tnr, then Pl is of order (nir X rnr), P2 is of order ([n2- mr] X mr) and u and v are of order nir and (n2 -mr) respectively. A sufficient condition for the existence of a solution to (3.32) or equivalently to (3.30) is that v = 0
(3.33)
k = Pt-t u
may.'
in (3.32). If the condition (3.33) holds and rank Pt = mr, then (3.34)
,
The condition (3.33) depends on an appropriate choice of H. The underlying assumption being made is that a matrix H satisfying this condition does exist. This in turn depends on the system under consideration, for example whether it .°C
is controllable. Some obvious choices for the forth of matrix H are: (a) diagonal, (b) upper or lower triangular, (c) companion form or (d) certain combinations of the above forms. Although forms (a) and (b) are well known, the companion form is less well documented. Very briefly, the matrix .'.
H=
0
1
0
...
0
0
0
I
...
0
0
0
0
...
1
Lao -ar -a2
-a"-
is said to be in `companion' form, it has the associated characteristic equation
IA! -HI = ao + at?t + ... +
0
(3.35)
(Ch. 3
Some Applications of the Kxonecker Product Example 3. 7
Determine the feedback matrix K so that the two input - two output system 0
1
x= 3
3
2 -3
0
0
0
1 x+
1
0u
2
0
1
has closed loop eigenvalues (-1, -2, -3). Solution
We must first decide on the form of the matrix H. Since (see (3.28))
H - A = BKC and the first row of B is zero, it follows that the first row of
H-A must be zero. We must therefore choose H in the companion form.
Since the characteristic equation of His
(X+1)(X+2)(a+3) = X3 +6X2+11a+ 6 = 0 H
1
0
0
1
(see (3 .35))
[-6 -11 --6 ONO
0
and hence (see (3.28)) 0
0
Q = -3 -3
0
r0
-8 -8 -8
1
P=C'OB=
11
110
0
0
0
0
1
0
1
0
0
1
0
1
.
1
O
0
0
0
0
0
0
1
0
1
0
1
0
0
1
0
1
0
1
0
0
0
0
0
0
1
0
0
0
0
1
Problem 6
Sec. 3.8]
49
0
0
0
0
0
0
0
0
1
0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
T=
0
An appropriate matrix T is the following
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0 0-1
0
0
0
0
1
0
0-1
0
0
0
0
0
0
0
0
0
0
1
0
It follows that 0
1
0
0
0
1
1
0
1
0
0
1
0
1
P,
0
0
0
0
PZ
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
`--O
0
TP =
and
0 - 8 - 3 - 8
Tq =
0
u v
0
0
Some Applications of the Kronecker Product
50 Since
Pt =
0
0
1
0
0
0
0
1
0
1
0
1
0
1
L0
1J
so that (see (3.34) )
-3
k =Pi-lu =
0
0
-8 Hence
A_
[-13
0
-8
-1
Pit =
0
1
0
0-1
0
1
1
0
0
0
0
1
0
0
[Ch. 3]
CHAPTER 4
Introduction to Matrix Calculus 4.1 INTRODUCTION
.CC
n';
SAC
It is becoming ever increasingly clear that there is a real need for matrix calculus in fields such as multivariate analysis. There is a strong analogy here with matrix algebra which is such a powerful and elegant tool in the study of linear systems and elsewhere. Expressions in multivariate analysis can be written in terms of scalar calculus, fl..
but the compactness of the equivalent relations in terms of matrices not only .ti
leads to a better understanding of the problems involved, but also encourages the consideration of problems which may be too complex to tackle by scalar calculus. We have already defined the derivative of a matrix with respect to a scalar (see (3.1)), we now generalise this concept. The process is frequently referred to v0,
'C7
".,
.w,
II.
as formal or symbolic matrix differentiation. The basic definitions involve the partial differentiation of scalar matrix functions with respect to all the `'"
elements of a matrix. These derivatives are the elements of a matrix, of the same order as the original matrix, which is defined as the derived matrix. The words 'formal' and 'symbolic' refer to the fact that the matrix derivatives are defined without the rigorous mathematical justification which we expect for the corresponding scalar derivatives. This is not to say that such justification cannot be made, rather the fact is that this topic is still in its infancy and that appropriate mathematical basis is being laid as the subject develops. With this in mind we make the following observations about the notation used. In general the elements 't7
.."
coo
-L7
of the matrices A, B, C, . will be constant scalars. On the other hand the elements of the matrices X, Y, Z, . . are scalar variables and we exclude the possibility that any element can be a constant or zero. In general we will also demand that these elements are independent. When this is not the case, for example when the matrix X is symmetric, is considered as a special case. The . .
a,:
.
f-'
try
t17
.-.
,w.
°.'
(G9
reader will appreciate the necessity for these restrictions when he considers the partial derivatives of (say) a matrix X with respect to one of its elements xr5. Obviously the derivative is undefined if xr,. is a constant. The derivative is Er,s if xr5 is independent of all the other elements of X, but is Er,s + E,,. if X is symmetric.
Introduction to Matrix Calculus
52
(Ch. 4
There have been attempts to define the derivative when xrs is a constant (or Zero) but, as far as this author knows, no rigorous mathematical theory for the general case has been proposed and successfully applied.
4.2 THE DERIVATIVES OF VECTORS Let x and y be vectors of orders n and m respectively. We can define various derivatives in the following way (15]:
(1) The derivative of the vector y with respect to vector x is the matrix row
ays
aYR,
ax,
ax,
ax,
ay
ayt
rod`
INS
FaYt
ax
ax2
ax2
aYm
3Y2
((d
axe
ay,
by.,
aym
axn
axn
ax-1
(4.1)
of order (n X m) where yr, Y2, ... , y,,, and x,, x2, ... , x are the components of y and x respectively. C1'
`d.
(2) The derivatives of a scalar with respect to a vector. Ify is a scalar ray
ay
T
ax
ax2
ay
(4.2)
by axn
(3) The derivative of a vector y with respect to a scalar x ay,
loo
aye
ax
Lax
ax
Example 4.1 Given
y=
rpm
by
aym
ax X,
Yr
x=
x2
Y2 X3
(4.3)
Sec. 4.2]
53
The Derivatives of Vectors
and
Yi =xi-x2 Y2 = x3 + 3x2
Obtain ay/ax.
Solution
ay
_
3Yi
ay-2
axe
axt
ay,
aye
axe
axe
ay,
ay2
ax3
ax3
2xt\ 0 -1
ax
0
3
2xj
In multivariate analysis, if x and y are of the same order, the absolute value of the determinant of ax/ay, that is of aX
ayJ is called the Jacobian of the transformation determined by
y = Y(x) .ti
Example 4.2
/Ay
The transformation from spherical to Cartesian co-ordinates is defined by x = r sin 0 cos >V ,y = r sin B sin ', and z = r cos B where r > 0, 0 < 0 . A=l
/',. ail bpjxlp
(4,14)
.
1=1
From (4.14) we immediately obtain
roles
ayij axrs
(4.15)
- atrbsj
We can now write the expression for aylj/aX , ayii
ayij
aytj
ax11
aX12
a
ayu
ayu
ax21
ax22
aX2n
, .
aXln
(4.16)
... aylj
ay11
aXm 1 axm 2
...
aye,
axm n
Using (4.15), we obtain
ailbnj
a12b2j
... ...
Limblj aimb2i
...
almbnjj
a11blj
aitb2j
ai2blj
aylj
ax
a12bnj
(4.17)
We note that the matrix on the right hand side of (4.17) can be expressed as (for notation see (1.5) (1.13) (1.16) and (1.17)) ail
ail
(btjb2j ... bnjj
atmj
= Al. B./ = A'e1 ee B'.
(Ch. 4
Introduction to Matrix Calculus
62
So that ay`I
ax
= A'Er/B'
(4.18)
where Ell is an elementary matrix of order (I X q) the order of the matrix Y. We also use (4.14) to obtain an expression for aYlaxrs
=
aXrs
r0'
aY
(r, s fixed, 1, j variable I < i s 1, 1< j 5 q)
ay,I aXrs
that is ayl2
ay,g
axrs
aXrs
aXrs
ay
aye,
ay22
ay2g
axrs
aXrs
axrs
axrs
ay a
8YI2
xs
axrs
Eli ay" axrs
(4.19)
AID"
ayll
aytq axrs
where Et1 is an elementary matrix of order (1 X q). We again use (4.15) to write
ayu
alrbsl
alrbs2
a2rbsi
a2rbs2
arnrbsl
arnrba2
... ... alrbq a2rbsq
axrs .
amrbsq
air
a2r
[bst b52
.
. .
bsq ]
arnr
A.rBs' = AeresB
.
So that axrs
= AErsB SRS
a (AXB)
(4.20)
where Ers is an elementary matrix of order (m X n), the order of the matrix X.
The Derivative of a Matrix
Sec. 4.51
63
Example 4.5 Find the derivative aY/axr,, given
Y = AX'B where the order of the matrices A, X and B is such that the product on the right hand side is defined. Solution By the method used above to obtain the derivative a/axis (AYB), we find a 3Xrs
(AX'B) = AE,,B
.
Before continuing with further examples we need a rule for determining the derivative of a product of matrices. Consider
Y = UV
(4.21) .C)
where U = [u11] is of order (rn X n) and V = [qj] is of order (n X 1) and both U and V are functions of a matrix X. We wish to determine
aY
ay11
axis
ax
-- and
The (i,j)th element of (4.21) is n
ylj =
(4.22)
UIPVPI
P=1
hence
airs
n
vPj P=
i aXrs
P=I
-
avP1 U iP
lam
n UUp
ay;j
(4 23)
.
.
axis
For fixed r and s, (4.23) is the (i,j)th element of the matrix aYlax,s of order (m X 1) the same as the order of the matrix Y. II.
On comparing both the terms on the right hand side of (4.23) with (4.22), we can write
a(UV) axrs
as one would expect.
au axis
V+U
av axis
(4.24)
Introduction to Matrix Calculus
64
[Ch. 4 ti-
On the other hand, when fixing (i,j), (4.23) is the (r,s)th element of the matrix ay;l/aX, which is of the same order as the matrix X, that is ay,l
L "lip ax vpl + L utp
ax
p=1
p-1
avpl (4.25)
ax
We will make use of the result (4.24) in some of the subsequent examples. Example 4.6 Let X = [xrs] be a non-singular matrix. Find the derivative aY/axrs, given
(i) Y = AX -'B, and
(ii) Y=XAX Solution
(i) Using (4.24) to differentiate
yy-t = I, we obtain
aY 3Y-' = 0, -Y-'+Y axrs axrs
hence
aY -axrs _ -Y -ay-' Y-. axrs
But by (4.20) a
axrs
axrs
(B-1XA-1) = B-'Ers q-t
,-.
3Y-' so that
CID
ay
a
- = - (AX-'B) = AX -'BB-'ErsA-'AX -'B axrs
axrs
AX-'ErsX-'B .
(ii) Using (4.24), we obtain
ay axrs
_
-
aX'
axrs
AX+X'
a(AX) axrs
_ E, AX + X'Airrs
(by (4.12) and (4.20)) .
Both (4.18) and (4.20) were derived from (4.15) which is valid for all i, j and r, s, defined by the orders of the matrices involved. 1
The Derivative of a Matrix
Sec. 4.5 1
65
The First Transformation Principle
,R.
It follows that (4.18) is a transformation of (4.20) and conversely. To obtain (4.18) from (4.20) we replace A by A', B by B' and Er: by Eli (careful, Ers and Etl may be of different orders). The point is that although (4.18) and (4.20) were derived for
constant matrices A and B, the above transformation is independent of the status of the matrices and is valid even when A and B are functions of X.
Example 4.7 Find the derivative of aytl/aX, given
(i) Y = AX'B,
(ii) Y=AX-'B, and (iii) Y = X AU where X = [x,l] is a nonsingular matrix, Solution (1) Let W = X', tlien
ay Y = AWB so that by (4.20) - =AEr3B aWrs
hence
ay,l
aw
= A'E;iB'.
But ayL/
a}ri
ax
aw'
_ (ay.l
awl
hence DYq
ax
= BE ;IA
(ii) From Example 4.6(i) aY axrs
-AX-'L,-,,,X-'B.
Let At = AX -1 and Bt = X''B, then aY a xrs
A1E 3B
1
so that ay,t
ax
= AiE,1B1' = -(X )'A'E;1B'(X t)' .
Introduction to Matrix Calculus
66
[Ch. 4
(iii) From Example 4.6 (ii) aY aXrs
= E,,AX + X'AE,,s
.
0.j
LetA,=1,Bt=Ax,A2=XAandB2=1, then ax axrs
= AtErsBl +A2Ersl32 .
The second term on the right hand side is in standard form. The first term is in the form of the solution to Example 4.5 for which the derivative ay;l/aX was found in (i) above, hence ay 'r = B1E;1AI + A2E,/B2'
ax
= AXE; + A`xE;l . It is interesting to compare this last result with the example in section 4.2 when we considered the scalary = x'Ax. In this special case when the matrix X has only one column, the elementary matrix which is of the same order as Y, becomes
E;1=E;j=1. Hence
ay,, = aY
ax
ax
= Ax + A'x
which is the result obtained in section 4.2 (see (4,4)). Conversely using the above techniques we can also obtain the derivatives of the matrix equivalents of the other equations in the table (4.4). Example 4.8 Find aY
ay;; and
aXrs
ax
when (i) Y = AX, and
(ii) Y=X'X. Solution (i) With B = I, apply (4.20) aY axrs
= AEr3.
The Derivatives of the Powers of a Matrix
Sec. 4.61
The transformation principle results in ay11
ax
(ii) This is a special case of Example 4.6 (ii) in which A = I. We have found the solution aY axrs
ErsX + X'Ers
and (Solution to Example 4.7 (iii))
'Y" = XE11 + XEj . ax
4.6 THE DERIVATIVES OF THE POWERS OF A MATRIX
Our aim in this section is to obtain the rules for determining $\partial y_{ij}/\partial X$ and $\partial Y/\partial x_{rs}$ when Y = Xⁿ.

Using (4.24) with U = V = X, so that Y = X², we immediately obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs}$$

and, applying the first transformation principle,

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}X' + X'E_{ij}.$$
It is instructive to repeat this exercise with U = X² and V = X, so that Y = X³. We obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X^2 + XE_{rs}X + X^2E_{rs}$$

and

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}(X')^2 + X'E_{ij}X' + (X')^2E_{ij}.$$
More generally, it can be proved by induction that for Y = Xⁿ

$$\frac{\partial Y}{\partial x_{rs}} = \sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}, \qquad (4.26)$$

where by definition X⁰ = I, and

$$\frac{\partial y_{ij}}{\partial X} = \sum_{k=0}^{n-1} (X')^kE_{ij}(X')^{n-k-1}. \qquad (4.27)$$
Example 4.9
Using the result (4.26), obtain $\partial Y/\partial x_{rs}$ when Y = X⁻ⁿ.

Solution
Using (4.24) on both sides of X⁻ⁿXⁿ = I, we find

$$\frac{\partial(X^{-n})}{\partial x_{rs}}X^n + X^{-n}\frac{\partial(X^n)}{\partial x_{rs}} = 0,$$

so that

$$\frac{\partial(X^{-n})}{\partial x_{rs}} = -X^{-n}\frac{\partial(X^n)}{\partial x_{rs}}X^{-n}.$$
Now making use of (4.26), we conclude that

$$\frac{\partial(X^{-n})}{\partial x_{rs}} = -X^{-n}\left[\sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}\right]X^{-n}.$$
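Both (4.26) and the result of Example 4.9 can be confirmed by finite differences. In the sketch below (Python/NumPy) the power n, the index pair and the well-conditioned test matrix are arbitrary choices.

```python
import numpy as np

# Numerical check of (4.26) and of Example 4.9 for Y = X^{-n};
# the test matrix is shifted to be safely non-singular.
rng = np.random.default_rng(2)
n, (r, s), h = 3, (0, 1), 1e-6
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)
E = np.zeros((4, 4)); E[r, s] = 1.0
mp = np.linalg.matrix_power

dXn = sum(mp(X, k) @ E @ mp(X, n - k - 1) for k in range(n))          # (4.26)
num = (mp(X + h * E, n) - mp(X - h * E, n)) / (2 * h)
print(np.allclose(dXn, num, atol=1e-4))

Xmn = mp(np.linalg.inv(X), n)                                         # X^{-n}
num2 = (mp(np.linalg.inv(X + h * E), n) - mp(np.linalg.inv(X - h * E), n)) / (2 * h)
print(np.allclose(-Xmn @ dXn @ Xmn, num2, atol=1e-4))                 # Example 4.9
```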
Problems for Chapter 4
(1) Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}, \qquad Y = \begin{bmatrix} x^{-1} & 1 \\ 2x^2 & \sin x \end{bmatrix}$$

and $y = 2x_{11}x_{22} - x_{21}x_{13}$, calculate $\partial y/\partial X$ and $\partial Y/\partial x$.
(2) Given

$$X = \begin{bmatrix} \sin x & x \\ \cos x & e^x \end{bmatrix},$$

evaluate $\partial|X|/\partial x$ by
(a) a direct method,
(b) use of a derivative formula.
(3) Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix} \quad\text{and}\quad Y = X'X,$$

use a direct method to evaluate (a) $\partial Y/\partial x_{21}$ and (b) $\partial y_{13}/\partial X$.
(4) Obtain expressions for $\partial Y/\partial x_{rs}$ and $\partial y_{ij}/\partial X$ when (a) Y = XAX and (b) Y = XAX′.

(5) Obtain an expression for $\partial|AXB|/\partial x_{rs}$. It is assumed that AXB is non-singular.

(6) Evaluate $\partial Y/\partial x_{rs}$ when (a) Y = X(X′)² and (b) Y = (X′)²X.
CHAPTER 5

Further Development of Matrix Calculus including an Application of Kronecker Products

5.1 INTRODUCTION
In Chapter 4 we discussed rules for determining the derivatives of a vector and then the derivatives of a matrix. But it will be remembered that when Y is a matrix, then vec Y is a vector. This fact, together with the closely related Kronecker product techniques discussed in Chapter 2, will now be exploited to derive some interesting results. We also explore further the derivatives of some scalar functions with respect to a matrix, first considered in the previous chapter.

5.2 DERIVATIVES OF MATRICES AND KRONECKER PRODUCTS
In the previous chapter we found $\partial y_{ij}/\partial X$ when

$$Y = AXB \qquad (5.1)$$

where Y = [y_ij], A = [a_ij], X = [x_ij] and B = [b_ij]. We now obtain $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$ for (5.1). We can write (5.1) as

$$y = Px \qquad (5.2)$$

where y = vec Y, x = vec X and P = B′ ⊗ A. By (4.1), (4.4) and (2.10),

$$\frac{\partial y}{\partial x} = P' = (B' \otimes A)' = B \otimes A'. \qquad (5.3)$$
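Equation (5.3) is easily confirmed numerically. The sketch below (Python/NumPy, with arbitrary test matrices) uses column-stacking for vec and builds the Jacobian row by row, following the convention of (4.1) that the (i,j) entry is $\partial(\text{vec}\,Y)_j/\partial(\text{vec}\,X)_i$.

```python
import numpy as np

# Checks (5.3): for Y = AXB, d(vec Y)/d(vec X) = B ⊗ A'; sizes are arbitrary.
rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
vec = lambda M: M.flatten(order='F')             # column-stacking vec
h = 1e-6

J = np.zeros((X.size, (A @ X @ B).size))         # rows indexed by vec X
for k in range(X.size):
    d = np.zeros(X.size); d[k] = h
    D = d.reshape(X.shape, order='F')
    J[k] = (vec(A @ (X + D) @ B) - vec(A @ (X - D) @ B)) / (2 * h)
print(np.allclose(J, np.kron(B, A.T)))           # B ⊗ A'
```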
The corresponding result for the equation

$$Y = AX'B \qquad (5.4)$$

is not so simple.
The problem is that when we write (5.4) in the form of (5.2), we have this time

$$y = Pz \qquad (5.5)$$

where z = vec X′. We can find (see (2.25)) a permutation matrix U such that

$$\text{vec}\,X' = U\,\text{vec}\,X, \qquad (5.6)$$

in which case (5.5) becomes y = PUx, so that

$$\frac{\partial y}{\partial x} = (PU)' = U'(B \otimes A'). \qquad (5.7)$$

It is convenient to write

$$U'(B \otimes A') = (B \otimes A')_{(n)}. \qquad (5.8)$$

U′ is seen to premultiply the matrix (B ⊗ A′); its effect is therefore to rearrange the rows of (B ⊗ A′). In fact, the first and every subsequent nth row of (B ⊗ A′) form the first m consecutive rows of (B ⊗ A′)₍ₙ₎; the second and every subsequent nth row form the next m consecutive rows of (B ⊗ A′)₍ₙ₎, and so on. A special case of this notation is for n = 1:

$$(B \otimes A')_{(1)} = B \otimes A'. \qquad (5.9)$$
Now, returning to (5.5), we obtain, by comparison with (5.3),

$$\frac{\partial y}{\partial x} = (B \otimes A')_{(n)}. \qquad (5.10)$$

Example 5.1
Obtain $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$, given X = [x_ij] of order (m × n), when (i) Y = AX, (ii) Y = XA, (iii) Y = AX′ and (iv) Y = X′A.

Solution
Let y = vec Y and x = vec X.
(i) Use (5.3) with B = I:

$$\frac{\partial y}{\partial x} = I \otimes A'.$$
(ii) Use (5.3):

$$\frac{\partial y}{\partial x} = A \otimes I.$$

(iii) Use (5.10):

$$\frac{\partial y}{\partial x} = (I \otimes A')_{(n)}.$$

(iv) Use (5.10):

$$\frac{\partial y}{\partial x} = (A \otimes I)_{(n)}.$$
5.3 THE DETERMINATION OF ∂(vec Y)/∂(vec X) FOR MORE COMPLICATED EQUATIONS
In this section we wish to determine the derivative $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$ when, for example,

$$Y = X'AX \qquad (5.11)$$

where X is of order (m × n). Since Y is a matrix of order (n × n), it follows that vec Y and vec X are vectors of order n² and mn respectively. With the usual notation Y = [y_ij], X = [x_ij], we have, by definition (4.1),

$$\frac{\partial\,\text{vec}\,Y}{\partial\,\text{vec}\,X} = \begin{bmatrix} \dfrac{\partial y_{11}}{\partial x_{11}} & \dfrac{\partial y_{21}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{11}} \\ \dfrac{\partial y_{11}}{\partial x_{21}} & \dfrac{\partial y_{21}}{\partial x_{21}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{21}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_{11}}{\partial x_{mn}} & \dfrac{\partial y_{21}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{mn}} \end{bmatrix} \qquad (5.12)$$

But by definition (4.19), the first row of the matrix (5.12) is $\left(\text{vec}\,\dfrac{\partial Y}{\partial x_{11}}\right)'$, the second row of the matrix (5.12) is $\left(\text{vec}\,\dfrac{\partial Y}{\partial x_{21}}\right)'$, etc.
We can therefore write (5.12) as

$$\frac{\partial\,\text{vec}\,Y}{\partial\,\text{vec}\,X} = \left[\text{vec}\,\frac{\partial Y}{\partial x_{11}} : \text{vec}\,\frac{\partial Y}{\partial x_{21}} : \cdots : \text{vec}\,\frac{\partial Y}{\partial x_{mn}}\right]'. \qquad (5.13)$$

We now use the solution to Example 4.6, where we had established that when Y = X′AX, then

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}. \qquad (5.14)$$

It follows that

$$\text{vec}\,\frac{\partial Y}{\partial x_{rs}} = \text{vec}(E_{rs}'AX) + \text{vec}(X'AE_{rs}) = (X'A' \otimes I)\,\text{vec}\,E_{rs}' + (I \otimes X'A)\,\text{vec}\,E_{rs} \qquad (5.15)$$

(using (2.13)). Substituting (5.15) into (5.13), we obtain

$$\frac{\partial\,\text{vec}\,Y}{\partial\,\text{vec}\,X} = \left[(X'A' \otimes I)[\text{vec}\,E_{11}' : \text{vec}\,E_{21}' : \cdots : \text{vec}\,E_{mn}']\right]' + \left[(I \otimes X'A)[\text{vec}\,E_{11} : \text{vec}\,E_{21} : \cdots : \text{vec}\,E_{mn}]\right]'$$
$$= [\text{vec}\,E_{11}' : \text{vec}\,E_{21}' : \cdots : \text{vec}\,E_{mn}']'(AX \otimes I) + [\text{vec}\,E_{11} : \text{vec}\,E_{21} : \cdots : \text{vec}\,E_{mn}]'(I \otimes A'X) \qquad (5.16)$$

(by (2.10)). The matrix [vec E₁₁ : vec E₂₁ : ⋯ : vec E_mn] is the unit matrix I of order (mn × mn). Using (2.23), we can write (5.16) as

$$\frac{\partial\,\text{vec}\,Y}{\partial\,\text{vec}\,X} = U'(AX \otimes I) + (I \otimes A'X). \qquad (5.17)$$

In the above calculations we have used the derivative $\partial Y/\partial x_{rs}$ to obtain $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$.
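As a numerical aside, (5.17) can be checked directly. The sketch below (Python/NumPy, arbitrary sizes and matrices) builds the permutation U of (5.6) column by column and compares U′(AX ⊗ I) + (I ⊗ A′X) with a finite-difference Jacobian.

```python
import numpy as np

# Checks (5.17) for Y = X'AX, with X of order (m x n).
rng = np.random.default_rng(4)
m, n, h = 3, 2, 1e-6
A = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
vec = lambda M: M.flatten(order='F')

U = np.zeros((m * n, m * n))                     # vec X' = U vec X, as in (5.6)
for k in range(m * n):
    e = np.zeros(m * n); e[k] = 1.0
    U[:, k] = vec(e.reshape((m, n), order='F').T)

pred = U.T @ np.kron(A @ X, np.eye(n)) + np.kron(np.eye(n), A.T @ X)

J = np.zeros((m * n, n * n))
for k in range(m * n):
    d = np.zeros(m * n); d[k] = h
    D = d.reshape((m, n), order='F')
    J[k] = (vec((X + D).T @ A @ (X + D)) - vec((X - D).T @ A @ (X - D))) / (2 * h)
print(np.allclose(J, pred, atol=1e-5))
```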
The Second Transformation Principle
Only slight modifications are needed to generalise the above calculations and show that whenever

$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}B + CE_{rs}'D,$$

where A, B, C and D may be functions of X, then

$$\frac{\partial\,\text{vec}\,Y}{\partial\,\text{vec}\,X} = B \otimes A' + U'(D \otimes C'). \qquad (5.18)$$
We will refer to the above result as the second transformation principle.

Example 5.2
Find $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$ when (i) Y = X′X and (ii) Y = AX⁻¹B.

Solution
Let y = vec Y and x = vec X.
(i) From Example 4.8,

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}.$$

Now use the second transformation principle to obtain

$$\frac{\partial y}{\partial x} = I \otimes X + (X \otimes I)_{(n)}.$$

(ii) From Example 4.6,

$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B,$$

hence

$$\frac{\partial y}{\partial x} = -(X^{-1}B) \otimes (X^{-1})'A'.$$
Hopefully, using the above results for matrices, we should be able to rediscover results for the derivatives of vectors considered in Chapter 4.
For example, let X be a column vector x; then Y = X′X becomes y = x′x (y is a scalar). The above result for ∂y/∂x becomes

$$\frac{\partial y}{\partial x} = (I \otimes x) + (x \otimes I)_{(1)}.$$

But the unit matrices involved are of order (n × n) which, for the one-column vector X, is (1 × 1). Hence

$$\frac{\partial y}{\partial x} = 1 \otimes x + x \otimes 1 = x + x = 2x \qquad \text{(use (5.9))},$$

which is the result found in (4.4).

5.4 MORE ON DERIVATIVES OF SCALAR FUNCTIONS WITH RESPECT TO A MATRIX
In section 4.4 we derived a formula, (4.10), which is useful when evaluating $\partial|Y|/\partial X$ for a large class of scalar matrix functions defined by Y.

Example 5.3
Evaluate the derivatives

(i) $\dfrac{\partial \log|X|}{\partial X}$ and (ii) $\dfrac{\partial |X|^r}{\partial X}$.
Solution
(i) We have

$$\frac{\partial}{\partial x_{rs}}(\log|X|) = \frac{1}{|X|}\frac{\partial|X|}{\partial x_{rs}}.$$

From Example 4.4,

$$\frac{\partial|X|}{\partial X} = |X|(X^{-1})' \qquad \text{(non-symmetric case)}.$$

Hence

$$\frac{\partial \log|X|}{\partial X} = (X^{-1})'.$$

(ii)

$$\frac{\partial|X|^r}{\partial x_{rs}} = r|X|^{r-1}\frac{\partial|X|}{\partial x_{rs}}.$$
Hence

$$\frac{\partial|X|^r}{\partial X} = r|X|^r(X^{-1})'.$$
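Result (i) of Example 5.3 lends itself to a quick numerical spot check; in the sketch below (Python/NumPy) the test matrix is arbitrary, shifted so that |X| stays positive and log|X| is defined.

```python
import numpy as np

# Checks Example 5.3(i): d(log|X|)/dX = (X^{-1})'.
rng = np.random.default_rng(5)
X = rng.standard_normal((3, 3)) + 4 * np.eye(3)   # keep |X| > 0
h = 1e-6
G = np.zeros_like(X)
for r in range(3):
    for s in range(3):
        E = np.zeros((3, 3)); E[r, s] = 1.0
        G[r, s] = (np.log(np.linalg.det(X + h * E))
                   - np.log(np.linalg.det(X - h * E))) / (2 * h)
print(np.allclose(G, np.linalg.inv(X).T))
```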
Traces of matrices form an important class of scalar matrix functions, covering a wide range of applications, particularly in statistics in the formulation of least squares and various optimisation problems. Having discussed the evaluation of the derivative $\partial Y/\partial x_{rs}$ for various products of matrices, we can now apply these results to the evaluation of the derivative $\partial(\text{tr}\,Y)/\partial X$.

We first note that

$$\frac{\partial(\text{tr}\,Y)}{\partial X} = \left[\frac{\partial(\text{tr}\,Y)}{\partial x_{rs}}\right] \qquad (5.19)$$

where the bracket on the right-hand side of (5.19) denotes (as usual) a matrix of the same order as X, defined by its (r,s)th element. As a consequence of (5.19), or perhaps more clearly seen from the definition (4.7), we note that on transposing X we have

$$\frac{\partial(\text{tr}\,Y)}{\partial X'} = \left[\frac{\partial(\text{tr}\,Y)}{\partial X}\right]'. \qquad (5.20)$$
Another, and possibly obvious, property of a trace is found when considering the definition of $\partial Y/\partial x_{rs}$ (see (4.19)). Assuming that Y = [y_ij] is of order (n × n),

$$\text{tr}\,\frac{\partial Y}{\partial x_{rs}} = \frac{\partial y_{11}}{\partial x_{rs}} + \frac{\partial y_{22}}{\partial x_{rs}} + \cdots + \frac{\partial y_{nn}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(y_{11} + y_{22} + \cdots + y_{nn}).$$

Hence

$$\frac{\partial(\text{tr}\,Y)}{\partial x_{rs}} = \text{tr}\,\frac{\partial Y}{\partial x_{rs}}. \qquad (5.21)$$

Example 5.4
Evaluate

$$\frac{\partial\,\text{tr}(AX)}{\partial X}.$$
Solution

$$\frac{\partial\,\text{tr}(AX)}{\partial x_{rs}} = \text{tr}\,\frac{\partial(AX)}{\partial x_{rs}} \quad \text{by (5.21)}$$
$$= \text{tr}(AE_{rs}) \quad \text{by Example 4.8}$$
$$= \text{tr}(E_{rs}'A') \quad \text{since tr}\,Y = \text{tr}\,Y'$$
$$= (\text{vec}\,E_{rs})'(\text{vec}\,A') \quad \text{by Example 1.4.}$$

Hence

$$\frac{\partial\,\text{tr}(AX)}{\partial X} = A'.$$
As we found in the previous chapter, we can use the derivative of the trace of one product to obtain the derivative of the trace of a different product.

Example 5.5
Evaluate

$$\frac{\partial\,\text{tr}(AX')}{\partial X}.$$

Solution
From the previous result,

$$\frac{\partial\,\text{tr}(BX)}{\partial X} = \frac{\partial\,\text{tr}(X'B')}{\partial X} = B'.$$

Let A′ = B in the above equation; it follows that

$$\frac{\partial\,\text{tr}(X'A)}{\partial X} = \frac{\partial\,\text{tr}(A'X)}{\partial X} = A.$$
The derivatives of traces of more complicated matrix products can be found similarly.
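Before moving on, the two trace gradients just derived can be confirmed by a small helper. In this sketch (Python/NumPy) the routine matrix_gradient and all test matrices are invented for the illustration.

```python
import numpy as np

# Checks Examples 5.4 and 5.5: d tr(AX)/dX = A' and d tr(AX')/dX = A.
rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
h = 1e-6

def matrix_gradient(f, X):
    """Central-difference gradient [df/dx_rs] of a scalar function f of X."""
    G = np.zeros_like(X)
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            E = np.zeros_like(X); E[r, s] = 1.0
            G[r, s] = (f(X + h * E) - f(X - h * E)) / (2 * h)
    return G

print(np.allclose(matrix_gradient(lambda M: np.trace(A @ M), X), A.T))   # Example 5.4
print(np.allclose(matrix_gradient(lambda M: np.trace(A @ M.T), X), A))   # Example 5.5
```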
Example 5.6
Evaluate $\partial(\text{tr}\,Y)/\partial X$ when (i) Y = X′AX and (ii) Y = X′AXB.

Solution
It is obvious that (i) follows from (ii) when B = I.
(ii) Y = X₁B, where X₁ = X′AX. Then

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X_1}{\partial x_{rs}}B = E_{rs}'AXB + X'AE_{rs}B \qquad \text{(by Example 4.6)}.$$

Hence

$$\text{tr}\,\frac{\partial Y}{\partial x_{rs}} = \text{tr}(E_{rs}'AXB) + \text{tr}(X'AE_{rs}B) = \text{tr}(E_{rs}'AXB) + \text{tr}(E_{rs}'A'XB') = (\text{vec}\,E_{rs})'\,\text{vec}(AXB) + (\text{vec}\,E_{rs})'\,\text{vec}(A'XB').$$

It follows that

$$\frac{\partial(\text{tr}\,Y)}{\partial X} = AXB + A'XB'.$$

(i) Let B = I in the above equation; we obtain

$$\frac{\partial(\text{tr}\,Y)}{\partial X} = AX + A'X = (A + A')X.$$
5.5 THE MATRIX DIFFERENTIAL
For a scalar function f(x), where x = [x₁ x₂ … x_n]′, the differential df is defined as

$$df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}dx_j. \qquad (5.23)$$

Corresponding to this definition, we define the matrix differential dX for the matrix X = [x_ij] of order (m × n) to be

$$dX = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}. \qquad (5.24)$$
The following two results follow immediately:

$$d(\alpha X) = \alpha(dX) \quad \text{(where } \alpha \text{ is a scalar)} \qquad (5.25)$$

$$d(X + Y) = dX + dY. \qquad (5.26)$$

Consider now X = [x_ij] of order (m × n) and Y = [y_jk] of order (n × p). Then

$$XY = \left[\sum_j x_{ij}y_{jk}\right],$$
hence

$$d(XY) = d\left[\sum_j x_{ij}y_{jk}\right] = \left[\sum_j (dx_{ij})y_{jk}\right] + \left[\sum_j x_{ij}(dy_{jk})\right].$$

It follows that

$$d(XY) = (dX)Y + X(dY). \qquad (5.27)$$
Example 5.7
Given a non-singular matrix X = [x_ij], evaluate (i) d|X| and (ii) d(X⁻¹).

Solution
(i) By (5.23),

$$d|X| = \sum_{i,j} \frac{\partial|X|}{\partial x_{ij}}(dx_{ij}) = \sum_{i,j} X_{ij}(dx_{ij}),$$

since $\partial|X|/\partial x_{ij} = X_{ij}$, the cofactor of x_ij in |X|. By an argument similar to the one used in section 4.4, we can write

$$d|X| = \text{tr}\{Z'(dX)\} \qquad \text{(compare with (4.10))}$$

where Z = [X_ij]. Since Z′ = |X|X⁻¹, we can write

$$d|X| = |X|\,\text{tr}\{X^{-1}(dX)\}. \qquad (5.28)$$

(ii) Since X⁻¹X = I, we use (5.27) to write

$$d(X^{-1})X + X^{-1}(dX) = 0.$$

Hence

$$d(X^{-1}) = -X^{-1}(dX)X^{-1}$$

(compare with Example 4.6). Notice that if X is a symmetric matrix, then X = X′ and (dX)′ = dX.
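Both results of Example 5.7 are first-order identities, so they can be checked against a small random perturbation. The sketch below (Python/NumPy; matrices and tolerances are arbitrary test choices) does exactly that.

```python
import numpy as np

# First-order checks of d|X| = |X| tr(X^{-1} dX), eq. (5.28), and of
# d(X^{-1}) = -X^{-1}(dX)X^{-1} from Example 5.7(ii), for a small random dX.
rng = np.random.default_rng(7)
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)
dX = 1e-6 * rng.standard_normal((4, 4))
Xinv = np.linalg.inv(X)

d_det = np.linalg.det(X + dX) - np.linalg.det(X)
print(np.allclose(d_det, np.linalg.det(X) * np.trace(Xinv @ dX),
                  rtol=1e-3, atol=1e-10))

d_inv = np.linalg.inv(X + dX) - Xinv
print(np.allclose(d_inv, -Xinv @ dX @ Xinv, rtol=1e-3, atol=1e-10))
```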
Problems for Chapter 5

(1) Consider

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \quad\text{and}\quad Y = AX'.$$

Use a direct method to evaluate $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$ and verify (5.10).

(2) Obtain $\partial(\text{vec}\,Y)/\partial(\text{vec}\,X)$ when (i) Y = AX′B and (ii) Y = X′AX².

(3) Find expressions for $\partial(\text{tr}\,Y)/\partial X$ when (a) Y = AXB, (b) Y = X² and (c) Y = XX′.

(4) Evaluate $\partial(\text{tr}\,Y)/\partial X$ when (a) Y = X⁻¹, (b) Y = AX⁻¹B, (c) Y = Xⁿ and (d) Y = e^X.

(5) (a) Use the direct method to obtain expressions for the matrix differential dY when (i) Y = AX, (ii) Y = X′X and (iii) Y = X². (b) Find dY when Y = AXBX.
CHAPTER 6

The Derivative of a Matrix with respect to a Matrix

6.1 INTRODUCTION
In the previous two chapters we have defined the derivative of a matrix with respect to a scalar and the derivative of a scalar with respect to a matrix. We will now generalise the definitions to include the derivative of a matrix with respect to a matrix. The author has adopted the definition suggested by Vetter [31], although other definitions also give rise to some useful results.

6.2 THE DEFINITIONS AND SOME RESULTS
Let Y = [y_ij] be a matrix of order (p × q). We have defined (see (4.19)) the derivative of Y with respect to a scalar x_rs; it is the matrix [∂y_ij/∂x_rs] of order (p × q). Let X = [x_rs] be a matrix of order (m × n). We generalise (4.19) and define the derivative of Y with respect to X, denoted by ∂Y/∂X, as the partitioned matrix whose (r,s)th partition is ∂Y/∂x_rs; in other words,

$$\frac{\partial Y}{\partial X} = \begin{bmatrix} \dfrac{\partial Y}{\partial x_{11}} & \dfrac{\partial Y}{\partial x_{12}} & \cdots & \dfrac{\partial Y}{\partial x_{1n}} \\ \dfrac{\partial Y}{\partial x_{21}} & \dfrac{\partial Y}{\partial x_{22}} & \cdots & \dfrac{\partial Y}{\partial x_{2n}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial Y}{\partial x_{m1}} & \dfrac{\partial Y}{\partial x_{m2}} & \cdots & \dfrac{\partial Y}{\partial x_{mn}} \end{bmatrix} = \sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}. \qquad (6.1)$$
The right-hand side of (6.1) follows from the definitions (1.4) and (2.1), where E_rs is of order (m × n), the order of the matrix X. It is seen that ∂Y/∂X is a matrix of order (mp × nq).

Example 6.1
Consider

$$Y = \begin{bmatrix} x_{11}x_{12}x_{22} & e^{x_{11}x_{21}} \\ \sin(x_{11} + x_{12}) & \log(x_{11} + x_{21}) \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$

Evaluate ∂Y/∂X.

Solution

$$\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} x_{12}x_{22} & x_{21}e^{x_{11}x_{21}} \\ \cos(x_{11} + x_{12}) & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad \frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} x_{11}x_{22} & 0 \\ \cos(x_{11} + x_{12}) & 0 \end{bmatrix},$$

$$\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} 0 & x_{11}e^{x_{11}x_{21}} \\ 0 & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad \frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} x_{11}x_{12} & 0 \\ 0 & 0 \end{bmatrix},$$

so that

$$\frac{\partial Y}{\partial X} = \begin{bmatrix} x_{12}x_{22} & x_{21}e^{x_{11}x_{21}} & x_{11}x_{22} & 0 \\ \cos(x_{11}+x_{12}) & \dfrac{1}{x_{11}+x_{21}} & \cos(x_{11}+x_{12}) & 0 \\ 0 & x_{11}e^{x_{11}x_{21}} & x_{11}x_{12} & 0 \\ 0 & \dfrac{1}{x_{11}+x_{21}} & 0 & 0 \end{bmatrix}.$$
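The construction in (6.1) is entirely mechanical, so a small generic routine can reproduce such results numerically. The sketch below (Python/NumPy) assembles ∂Y/∂X block by block by central differences; the helper dY_dX and the function Yfun, which encodes the Y of Example 6.1 as reconstructed above at an arbitrary test point, are invented names.

```python
import numpy as np

# Assembles ∂Y/∂X in the sense of (6.1): the (r,s) partition is ∂Y/∂x_rs,
# approximated here by central differences.
def dY_dX(Yfun, X, h=1e-6):
    m, n = X.shape
    p, q = Yfun(X).shape
    out = np.zeros((m * p, n * q))
    for r in range(m):
        for s in range(n):
            E = np.zeros_like(X); E[r, s] = 1.0
            out[r*p:(r+1)*p, s*q:(s+1)*q] = (Yfun(X + h*E) - Yfun(X - h*E)) / (2*h)
    return out

def Yfun(X):                                    # the Y of Example 6.1
    (x11, x12), (x21, x22) = X
    return np.array([[x11 * x12 * x22, np.exp(x11 * x21)],
                     [np.sin(x11 + x12), np.log(x11 + x21)]])

X = np.array([[1.0, 0.5], [0.7, 2.0]])
print(np.round(dY_dX(Yfun, X), 4))              # a 4x4 matrix, as in Example 6.1
```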
Example 6.2
Given the matrix X = [x_ij] of order (m × n), evaluate ∂X/∂X when
(i) all elements of X are independent;
(ii) X is a symmetric matrix (of course, in this case m = n).
Solution
(i) By (6.1),

$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial x_{rs}} = \sum_{r,s} E_{rs} \otimes E_{rs}.$$

(ii) In this case

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} \quad\text{for } r \neq s, \qquad \frac{\partial X}{\partial x_{rs}} = E_{rr} \quad\text{for } r = s.$$

We can write the above as

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} - \delta_{rs}E_{rr}.$$

Hence

$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs} + \sum_{r,s} E_{rs} \otimes E_{sr} - \sum_{r} E_{rr} \otimes E_{rr} = \sum_{r,s} E_{rs} \otimes E_{rs} + U - \sum_{r} E_{rr} \otimes E_{rr}$$

(see (2.24) and (2.26)).
Example 6.3
Evaluate and write out in full ∂X′/∂X, given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}.$$

Solution
By (6.1) we have

$$\frac{\partial X'}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs}' = U.$$

Hence
I--
0
0
0
0
0
0
0
1
0
0
0
ax,
0
0
0
0
1
0
ax -
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
1
From the definition (6.1) we obtain

$$\left(\frac{\partial Y}{\partial X}\right)' = \left(\sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}\right)' = \sum_{r,s} E_{rs}' \otimes \left(\frac{\partial Y}{\partial x_{rs}}\right)' \quad \text{by (2.10)}$$
$$= \sum_{r,s} E_{rs}' \otimes \frac{\partial Y'}{\partial x_{rs}} \quad \text{from (4.19).}$$

It follows that

$$\left(\frac{\partial Y}{\partial X}\right)' = \frac{\partial Y'}{\partial X'}. \qquad (6.2)$$

6.3 PRODUCT RULES FOR MATRICES
We shall first obtain a rule for the derivative of a product of matrices with respect to a matrix, that is, to find an expression for

$$\frac{\partial(XY)}{\partial Z}$$

where the orders of the matrices are as indicated: X (m × n), Y (n × v), Z (p × q). By (4.24) we write

$$\frac{\partial(XY)}{\partial z_{rs}} = \frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}}, \qquad\text{where } Z = [z_{rs}].$$

If E_rs is an elementary matrix of order (p × q), we make use of (6.1) to write

$$\frac{\partial(XY)}{\partial Z} = \sum_{r,s} E_{rs} \otimes \left[\frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}}\right] = \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}Y + \sum_{r,s} E_{rs} \otimes X\frac{\partial Y}{\partial z_{rs}}$$
(where I_q and I_p are unit matrices of order (q × q) and (p × p) respectively)

$$= \sum_{r,s}\left(E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}\right)(I_q \otimes Y) + \sum_{r,s}(I_p \otimes X)\left(E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}}\right) \qquad\text{(by (2.11))};$$

finally, by (6.1),

$$\frac{\partial(XY)}{\partial Z} = \frac{\partial X}{\partial Z}(I_q \otimes Y) + (I_p \otimes X)\frac{\partial Y}{\partial Z}. \qquad (6.3)$$
Example 6.4
Find an expression for ∂X⁻¹/∂X.

Solution
Using (6.3) on XX⁻¹ = I, we obtain

$$\frac{\partial(XX^{-1})}{\partial X} = \frac{\partial X}{\partial X}(I \otimes X^{-1}) + (I \otimes X)\frac{\partial X^{-1}}{\partial X} = 0,$$

hence

$$\frac{\partial X^{-1}}{\partial X} = -(I \otimes X)^{-1}\frac{\partial X}{\partial X}(I \otimes X^{-1}) = -(I \otimes X^{-1})\left[\sum_{r,s} E_{rs} \otimes E_{rs}\right](I \otimes X^{-1})$$

(by Example 6.2 and (2.12)).
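The result of Example 6.4 can be verified numerically, using Example 6.2(i) for ∂X/∂X. In this sketch (Python/NumPy) the size and the non-singular test matrix are arbitrary.

```python
import numpy as np

# Checks Example 6.4: ∂X^{-1}/∂X = -(I ⊗ X^{-1}) (Σ E_rs ⊗ E_rs) (I ⊗ X^{-1}).
rng = np.random.default_rng(8)
n, h = 3, 1e-6
X = rng.standard_normal((n, n)) + 3 * np.eye(n)
I = np.eye(n)

def unit(r, s):
    E = np.zeros((n, n)); E[r, s] = 1.0
    return E

dXdX = sum(np.kron(unit(r, s), unit(r, s)) for r in range(n) for s in range(n))
Xinv = np.linalg.inv(X)
pred = -np.kron(I, Xinv) @ dXdX @ np.kron(I, Xinv)

num = np.zeros((n * n, n * n))
for r in range(n):
    for s in range(n):
        blk = (np.linalg.inv(X + h * unit(r, s))
               - np.linalg.inv(X - h * unit(r, s))) / (2 * h)
        num[r*n:(r+1)*n, s*n:(s+1)*n] = blk
print(np.allclose(pred, num, atol=1e-5))
```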
Next we determine a rule for the derivative of a Kronecker product of matrices with respect to a matrix, that is, an expression for

$$\frac{\partial(X \otimes Y)}{\partial Z}.$$

The order of the matrix Y is not now restricted; we will consider that it is (u × v). On representing X ⊗ Y by its (i,k)th partition [x_ik Y] (i = 1, 2, …, m; k = 1, 2, …, n), we can write

$$\frac{\partial(X \otimes Y)}{\partial z_{rs}} = \frac{\partial}{\partial z_{rs}}[x_{ik}Y]$$
where (r,s) are fixed,

$$= \left[\frac{\partial x_{ik}}{\partial z_{rs}}Y\right] + \left[x_{ik}\frac{\partial Y}{\partial z_{rs}}\right] = \frac{\partial X}{\partial z_{rs}} \otimes Y + X \otimes \frac{\partial Y}{\partial z_{rs}}.$$

Hence by (6.1),

$$\frac{\partial(X \otimes Y)}{\partial Z} = \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}} \otimes Y + \sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right) = \frac{\partial X}{\partial Z} \otimes Y + \sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right)$$

where E_rs is of order (p × q). The summation on the right-hand side is not X ⊗ ∂Y/∂Z, as may appear at first sight; nevertheless it can be put into a more convenient form, as a product of matrices. To achieve this aim we make repeated use of (2.8) and (2.11):

$$\sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right) = \sum_{r,s} [I_pE_{rs}I_q] \otimes \left[U_1\left(\frac{\partial Y}{\partial z_{rs}} \otimes X\right)U_2\right] \qquad\text{by (2.14)}$$
$$= [I_p \otimes U_1]\left[\sum_{r,s}\left(E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}}\right) \otimes X\right][I_q \otimes U_2] \qquad\text{by (2.11)}.$$

Hence

$$\frac{\partial(X \otimes Y)}{\partial Z} = \frac{\partial X}{\partial Z} \otimes Y + [I_p \otimes U_1]\left[\frac{\partial Y}{\partial Z} \otimes X\right][I_q \otimes U_2] \qquad (6.4)$$

where U₁ and U₂ are permutation matrices of orders (mu × mu) and (nv × nv) respectively.
We illustrate the use of equation (6.4) with a simple example.

Example 6.5
A = [a_ij] and X = [x_ij] are matrices, each of order (2 × 2). Use (i) equation (6.4), and (ii) a direct method, to evaluate

$$\frac{\partial(A \otimes X)}{\partial X}.$$
Solution
(i) In this example (6.4) becomes

$$\frac{\partial(A \otimes X)}{\partial X} = [I \otimes U_1]\left[\frac{\partial X}{\partial X} \otimes A\right][I \otimes U_2]$$

where I is the unit matrix of order (2 × 2) and

$$U_1 = U_2 = \sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix}.$$

Since

$$\frac{\partial X}{\partial X} = \begin{bmatrix} 1&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&1 \end{bmatrix},$$

only a simple calculation is necessary to obtain the result. It is found that
$$\frac{\partial(A \otimes X)}{\partial X} = \begin{bmatrix}
a_{11}&0&a_{12}&0&0&a_{11}&0&a_{12} \\
0&0&0&0&0&0&0&0 \\
a_{21}&0&a_{22}&0&0&a_{21}&0&a_{22} \\
0&0&0&0&0&0&0&0 \\
0&0&0&0&0&0&0&0 \\
a_{11}&0&a_{12}&0&0&a_{11}&0&a_{12} \\
0&0&0&0&0&0&0&0 \\
a_{21}&0&a_{22}&0&0&a_{21}&0&a_{22}
\end{bmatrix}.$$
(il) We evaluate ICS
Y = AOX =
allxll
alixl2
a12x11
a12x12
a11x21
a11x22
a12X21
a12x22
a21 x11
a21 x 12
a22 x 11
a22 x 12
a21x21
a21x22
a22x21
a22x22
and then make use of (6.1) to obtain the above result.
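The agreement between the two routes of Example 6.5 can also be confirmed in a few lines. In the sketch below (Python/NumPy; A is an arbitrary 2 × 2 test matrix, and the result does not depend on X), the direct evaluation via (6.1) is compared with the product form of (6.4).

```python
import numpy as np

# Checks Example 6.5: Σ E_rs ⊗ (A ⊗ E_rs), the direct (6.1) evaluation of
# ∂(A⊗X)/∂X, equals [I ⊗ U1][∂X/∂X ⊗ A][I ⊗ U2] from (6.4).
rng = np.random.default_rng(9)
A = rng.standard_normal((2, 2))

def unit(r, s):
    E = np.zeros((2, 2)); E[r, s] = 1.0
    return E

pairs = [(r, s) for r in range(2) for s in range(2)]
U = sum(np.kron(unit(r, s), unit(r, s).T) for r, s in pairs)     # U1 = U2
dXdX = sum(np.kron(unit(r, s), unit(r, s)) for r, s in pairs)    # Example 6.2(i)

direct = sum(np.kron(unit(r, s), np.kron(A, unit(r, s))) for r, s in pairs)
product = np.kron(np.eye(2), U) @ np.kron(dXdX, A) @ np.kron(np.eye(2), U)
print(np.allclose(direct, product))
```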
6.4 THE CHAIN RULE FOR THE DERIVATIVE OF A MATRIX WITH RESPECT TO A MATRIX
We wish to obtain an expression for

$$\frac{\partial Z}{\partial X}$$

where the matrix Z is a matrix function of a matrix Y, which is itself a matrix function of the matrix X, with

X = [x_ij] of order (m × n),
Y = [y_ij] of order (u × v),
Z = [z_ij] of order (p × q).

By the definition in (6.1),

$$\frac{\partial Z}{\partial X} = \sum_{r,s} E_{rs} \otimes \frac{\partial Z}{\partial x_{rs}} \qquad (r = 1, 2, \ldots, m;\; s = 1, 2, \ldots, n),$$

where E_rs is an elementary matrix of order (m × n),

$$= \sum_{r,s} E_{rs} \otimes \sum_{i,j} E_{ij}\frac{\partial z_{ij}}{\partial x_{rs}} \qquad (i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, q),$$

where E_ij is of order (p × q). As in section 4.3, we use the chain rule to write

$$\frac{\partial z_{ij}}{\partial x_{rs}} = \sum_{\alpha,\beta} \frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}} \qquad (\alpha = 1, 2, \ldots, u;\; \beta = 1, 2, \ldots, v).$$

Hence

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\sum_{r,s}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}E_{rs}\right) \otimes \left(\sum_{i,j}\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}E_{ij}\right) \qquad\text{(by (2.5))}$$
$$= \sum_{\alpha,\beta}\frac{\partial y_{\alpha\beta}}{\partial X} \otimes \frac{\partial Z}{\partial y_{\alpha\beta}} \qquad\text{(by (4.7) and (4.19)).}$$
If I_n and I_p are unit matrices of orders (n × n) and (p × p) respectively, we can write the above as

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X}I_n\right) \otimes \left(I_p\frac{\partial Z}{\partial y_{\alpha\beta}}\right).$$

Hence, by (2.11),

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X} \otimes I_p\right)\left(I_n \otimes \frac{\partial Z}{\partial y_{\alpha\beta}}\right). \qquad (6.5)$$
Equation (6.5) can be written in a more convenient form, avoiding the summation, if we define an appropriate notation, a generalisation of the previous one. Since

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1v} \\ y_{21} & y_{22} & \cdots & y_{2v} \\ \vdots & & & \vdots \\ y_{u1} & y_{u2} & \cdots & y_{uv} \end{bmatrix},$$

then

$$(\text{vec}\,Y)' = [y_{11}\;\; y_{21}\;\; \cdots\;\; y_{u1}\;\; y_{12}\;\; \cdots\;\; y_{uv}].$$

We will write the partitioned matrix

$$\left[\frac{\partial y_{11}}{\partial X} \otimes I_p \;:\; \frac{\partial y_{21}}{\partial X} \otimes I_p \;:\; \cdots \;:\; \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]$$

as

$$\frac{\partial(\text{vec}\,Y)'}{\partial X} \otimes I_p.$$

Similarly, we write the partitioned matrix

$$\begin{bmatrix} I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\ I_n \otimes \dfrac{\partial Z}{\partial y_{21}} \\ \vdots \\ I_n \otimes \dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}$$

as

$$I_n \otimes \frac{\partial Z}{\partial\,\text{vec}\,Y}.$$
We can write the sum (6.5) in the following order:

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{11}}\right] + \left[\frac{\partial y_{21}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{21}}\right] + \cdots + \left[\frac{\partial y_{uv}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{uv}}\right].$$

We can write this as a (partitioned) matrix product:

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p : \frac{\partial y_{21}}{\partial X} \otimes I_p : \cdots : \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]\begin{bmatrix} I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\ \vdots \\ I_n \otimes \dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}.$$

Finally, using the notations defined above, we have

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial(\text{vec}\,Y)'}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial\,\text{vec}\,Y}\right]. \qquad (6.6)$$
We consider a simple example to illustrate the application of the above formula. The example can also be solved by evaluating the matrix Z in terms of the components of the matrix X and then applying the definition in (6.1).

Example 6.6
Given the matrices A = [a_ij] and X = [x_ij], both of order (2 × 2), evaluate ∂Z/∂X where Z = Y′Y and Y = AX, (i) using (6.6), (ii) using a direct method.

Solution
(i) For convenience, write (6.6) as

$$\frac{\partial Z}{\partial X} = QR, \qquad\text{where}\quad Q = \frac{\partial(\text{vec}\,Y)'}{\partial X} \otimes I_p \quad\text{and}\quad R = I_n \otimes \frac{\partial Z}{\partial\,\text{vec}\,Y}.$$
From Example 4.8 we know that

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij},$$

so that Q can now be easily evaluated. Here

$$\frac{\partial y_{11}}{\partial X} = \begin{bmatrix} a_{11} & 0 \\ a_{12} & 0 \end{bmatrix}, \quad \frac{\partial y_{21}}{\partial X} = \begin{bmatrix} a_{21} & 0 \\ a_{22} & 0 \end{bmatrix}, \quad \frac{\partial y_{12}}{\partial X} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{12} \end{bmatrix}, \quad \frac{\partial y_{22}}{\partial X} = \begin{bmatrix} 0 & a_{21} \\ 0 & a_{22} \end{bmatrix},$$

so that Q is the (4 × 16) partitioned matrix

$$Q = \left[\frac{\partial y_{11}}{\partial X} \otimes I : \frac{\partial y_{21}}{\partial X} \otimes I : \frac{\partial y_{12}}{\partial X} \otimes I : \frac{\partial y_{22}}{\partial X} \otimes I\right].$$
Also in Example 4.8 we have found

$$\frac{\partial Z}{\partial y_{rs}} = E_{rs}'Y + Y'E_{rs},$$

so we can now evaluate R. Here

$$\frac{\partial Z}{\partial y_{11}} = \begin{bmatrix} 2y_{11} & y_{12} \\ y_{12} & 0 \end{bmatrix}, \quad \frac{\partial Z}{\partial y_{21}} = \begin{bmatrix} 2y_{21} & y_{22} \\ y_{22} & 0 \end{bmatrix}, \quad \frac{\partial Z}{\partial y_{12}} = \begin{bmatrix} 0 & y_{11} \\ y_{11} & 2y_{12} \end{bmatrix}, \quad \frac{\partial Z}{\partial y_{22}} = \begin{bmatrix} 0 & y_{21} \\ y_{21} & 2y_{22} \end{bmatrix},$$

so that R is the (16 × 4) partitioned matrix

$$R = \begin{bmatrix} I \otimes \dfrac{\partial Z}{\partial y_{11}} \\ I \otimes \dfrac{\partial Z}{\partial y_{21}} \\ I \otimes \dfrac{\partial Z}{\partial y_{12}} \\ I \otimes \dfrac{\partial Z}{\partial y_{22}} \end{bmatrix}.$$
The product of Q and R is the derivative we have been asked to evaluate:

$$QR = \begin{bmatrix}
2(a_{11}y_{11}+a_{21}y_{21}) & a_{11}y_{12}+a_{21}y_{22} & 0 & a_{11}y_{11}+a_{21}y_{21} \\
a_{11}y_{12}+a_{21}y_{22} & 0 & a_{11}y_{11}+a_{21}y_{21} & 2(a_{11}y_{12}+a_{21}y_{22}) \\
2(a_{12}y_{11}+a_{22}y_{21}) & a_{12}y_{12}+a_{22}y_{22} & 0 & a_{12}y_{11}+a_{22}y_{21} \\
a_{12}y_{12}+a_{22}y_{22} & 0 & a_{12}y_{11}+a_{22}y_{21} & 2(a_{12}y_{12}+a_{22}y_{22})
\end{bmatrix}.$$
(ii) By a simple extension of the result of Example 4.6(ii) we find that when Z = X′A′AX,

$$\frac{\partial Z}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs} = E_{rs}'A'Y + Y'AE_{rs}, \qquad\text{where } Y = AX.$$

By (6.1) and (2.11),

$$\frac{\partial Z}{\partial X} = \sum_{r,s}(E_{rs} \otimes E_{rs}')(I \otimes A'Y) + \sum_{r,s}(I \otimes Y'A)(E_{rs} \otimes E_{rs}).$$
Since the matrices involved are all of order (2 × 2),

$$\sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} \quad\text{and}\quad \sum_{r,s} E_{rs} \otimes E_{rs} = \begin{bmatrix} 1&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&1 \end{bmatrix}.$$
On substitution and multiplying out in the above expression for ∂Z/∂X, we obtain the same matrix as in (i).
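Example 6.6 also makes a convenient numerical test case. The sketch below (Python/NumPy; A and X are arbitrary 2 × 2 test matrices) checks that the blocks of ∂Z/∂X computed by central differences agree with E′_rs A′Y + Y′A E_rs.

```python
import numpy as np

# Numerical confirmation of Example 6.6 for Z = Y'Y with Y = AX (all 2x2).
rng = np.random.default_rng(10)
A = rng.standard_normal((2, 2))
X = rng.standard_normal((2, 2))
Y = A @ X
h = 1e-6

def unit(r, s):
    E = np.zeros((2, 2)); E[r, s] = 1.0
    return E

Z = lambda M: (A @ M).T @ (A @ M)
num = np.zeros((4, 4)); pred = np.zeros((4, 4))
for r in range(2):
    for s in range(2):
        E = unit(r, s)
        num[2*r:2*r+2, 2*s:2*s+2] = (Z(X + h*E) - Z(X - h*E)) / (2*h)
        pred[2*r:2*r+2, 2*s:2*s+2] = E.T @ A.T @ Y + Y.T @ A @ E
print(np.allclose(num, pred, atol=1e-5))
```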
Problems for Chapter 6

(1) Evaluate ∂Y/∂X, given

$$Y = \begin{bmatrix} \cos(x_{12} + x_{22}) & x_{11}x_{21} \\ x_{12}x_{22} & x_{11}x_{22} \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$
(2) The elements of the matrix

$$X = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{bmatrix}$$

are all independent. Use a direct method to evaluate ∂X/∂X.
(3) Given a non-singular matrix

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix},$$

use a direct method to obtain ∂X⁻¹/∂X and verify the solution to Example 6.4.
(4) The matrices A = [a_ij] and X = [x_ij] are both of order (2 × 2), and X is non-singular. Use a direct method to evaluate

$$\frac{\partial(A \otimes X^{-1})}{\partial X}.$$
CHAPTER 7

Some Applications of Matrix Calculus

7.1 INTRODUCTION
As in Chapter 3, where a number of applications of the Kronecker product were considered, in this chapter a number of applications of matrix calculus are discussed. The applications have been selected from a number considered in the published literature, as indicated in the Bibliography at the end of this book. These problems were originally intended for the expert, but by expansion and simplification it is hoped that they will now be appreciated by the general reader.
7.2 THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION IN SCALAR VARIABLES
In this section we consider, very briefly, the Method of Least Squares to obtain a curve or a line of 'best fit', and the Method of Lagrange Multipliers to obtain an extremum of a function subject to constraints.

For the least squares method we consider a set of data

$$(x_i, y_i), \qquad i = 1, 2, \ldots, n, \qquad (7.1)$$

and a relationship, usually a polynomial function,

$$y = f(x). \qquad (7.2)$$

For each x_i we evaluate f(x_i) and the residual, or deviation,

$$e_i = y_i - f(x_i). \qquad (7.3)$$

The method depends on choosing the unknown parameters (the polynomial coefficients, when f(x) is a polynomial) so that the sum of the squares of the residuals is a minimum, that is,

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 \quad\text{is a minimum.} \qquad (7.4)$$
In particular, when f(x) is a linear function

$$y = a_0 + a_1x, \qquad (7.5)$$

S(a₀, a₁) is a minimum when

$$\frac{\partial S}{\partial a_0} = 0 = \frac{\partial S}{\partial a_1}.$$

These two equations, known as normal equations, determine the two unknown parameters a₀ and a₁ which specify the line of 'best fit' according to the principle of least squares.

For the second method we wish to determine the extremum of a continuously differentiable function

$$f(x_1, x_2, \ldots, x_n) \qquad (7.6)$$

whose n variables are constrained by m equations of the form

$$g_i(x_1, x_2, \ldots, x_n) = 0, \qquad i = 1, 2, \ldots, m. \qquad (7.7)$$

The method of Lagrange Multipliers depends on defining an augmented function

$$f^* = f + \sum_{i=1}^{m} \mu_ig_i \qquad (7.8)$$

where the μ_i are known as Lagrange multipliers. The extremum of f is determined by solving the system of (m + n) equations

$$\frac{\partial f^*}{\partial x_r} = 0, \quad r = 1, 2, \ldots, n; \qquad g_i = 0, \quad i = 1, 2, \ldots, m \qquad (7.9)$$

for the m parameters μ₁, μ₂, …, μ_m and the n variables x determining the extremum.

Example 7.1
Given a matrix A = [a_ij] of order (2 × 2), determine a symmetric matrix X = [x_ij] which is a best approximation to A by the criterion of least squares.

Solution
Corresponding to (7.3) we have E = A − X, where E = [e_ij] and e_ij = a_ij − x_ij.
The criterion of least squares for this example is to minimise

$$S = \sum_{i,j} e_{ij}^2,$$

which is the equivalent of (7.6) above. The constraint equation is

$$x_{12} - x_{21} = 0$$

and the augmented function is

$$f^* = \sum_{i,j}(a_{ij} - x_{ij})^2 + \mu(x_{12} - x_{21}).$$

Setting the partial derivatives of f* to zero gives

$$\frac{\partial f^*}{\partial x_{11}} = -2(a_{11} - x_{11}) = 0$$
$$\frac{\partial f^*}{\partial x_{12}} = -2(a_{12} - x_{12}) + \mu = 0$$
$$\frac{\partial f^*}{\partial x_{21}} = -2(a_{21} - x_{21}) - \mu = 0$$
$$\frac{\partial f^*}{\partial x_{22}} = -2(a_{22} - x_{22}) = 0.$$
This system of 5 equations (including the constraint) leads to the solution

$$\mu = a_{12} - a_{21}, \qquad x_{11} = a_{11}, \quad x_{22} = a_{22}, \quad x_{12} = x_{21} = \tfrac{1}{2}(a_{12} + a_{21}).$$

Hence

$$X = \begin{bmatrix} a_{11} & \dfrac{a_{12} + a_{21}}{2} \\ \dfrac{a_{12} + a_{21}}{2} & a_{22} \end{bmatrix} = \frac{1}{2}\left(\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}\right) = \tfrac{1}{2}(A + A').$$
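The conclusion of Example 7.1 can be spot-checked numerically: no other symmetric matrix should fit A better. In this sketch (Python/NumPy) the matrix A and the random symmetric competitors are arbitrary test values.

```python
import numpy as np

# Checks Example 7.1: over symmetric X, the sum of squared residuals
# is minimised at X = (A + A')/2.
rng = np.random.default_rng(11)
A = rng.standard_normal((2, 2))
Xstar = (A + A.T) / 2
best = np.sum((A - Xstar) ** 2)
for _ in range(1000):
    P = rng.standard_normal((2, 2))
    Xtry = Xstar + 0.1 * (P + P.T)          # another symmetric candidate
    assert np.sum((A - Xtry) ** 2) >= best
print("minimum S =", best)
```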
7.3 PROBLEM 1 - MATRIX CALCULUS APPROACH TO THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION
If we can express the residuals in the form of a matrix E, as in Example 7.1, then the sum of the residuals squared is

$$S = \text{tr}\,E'E. \qquad (7.10)$$
The criterion of the least squares method is to minimise (7.10) with respect to the parameters involved.

The constrained optimisation problem then takes the form of finding the matrix X such that the scalar matrix function

$$S = f(X)$$

is minimised subject to constraints on X in the form of

$$G(X) = 0 \qquad (7.11)$$

where G = [g_ij] is a matrix of order (s × t), s and t being dependent on the number of constraints g_ij involved.

As for the scalar case, we use Lagrange multipliers to form an augmented matrix function f*(X). Each constraint g_ij is associated with a parameter (Lagrange multiplier) μ_ij. Since

$$\sum_{i,j} \mu_{ij}g_{ij} = \text{tr}\,U'G, \qquad\text{where } U = [\mu_{ij}],$$

we can write the augmented scalar matrix function as

$$f^*(X) = \text{tr}\,E'E + \text{tr}\,U'G, \qquad (7.12)$$

which is the equivalent of (7.8). To find the optimal X, we must solve the system of equations

$$\frac{\partial f^*}{\partial X} = 0. \qquad (7.13)$$
Given a non-singular matrix A = [ail] of order (n X n) determine a matrix X = [x,1] which is a least squares approximation to A
(i) when X is a symmetric matrix (ii) when X is an orthogonal matrix. Solution (i) The problem was solved in Example 7.1 when A and X are of order (2 X 2). With the terminology defined above, we write
E=A - X G(X) = X -X' = 0 so that G and hence U are both of order (n X n).
Equation (7.12) becomes

$$f^* = \text{tr}\,A'A - \text{tr}\,A'X - \text{tr}\,X'A + \text{tr}\,X'X + \text{tr}\,U'X - \text{tr}\,U'X'.$$

We now make use of the results, in modified form if necessary, of Examples 5.4 and 5.5; we obtain

$$\frac{\partial f^*}{\partial X} = -2A + 2X + U - U' = 0 \qquad\text{for}\quad X = A + \frac{U' - U}{2}.$$

Then

$$X' = A' + \frac{U - U'}{2},$$

and since X = X′, we finally obtain

$$X = \tfrac{1}{2}(A + A').$$
(ii) This time

$$G(X) = X'X - I = 0.$$

Hence

$$f^* = \text{tr}\,[A' - X'][A - X] + \text{tr}\,U'[X'X - I],$$

so that

$$\frac{\partial f^*}{\partial X} = -2A + 2X + X[U + U'] = 0 \qquad\text{for}\quad X = A - X\,\frac{U + U'}{2}.$$

Premultiplying by X′ and using the condition X′X = I, we obtain

$$X'A = I + \frac{U + U'}{2}$$

and, on transposing,

$$A'X = I + \frac{U + U'}{2}.$$

Hence

$$A'X = X'A. \qquad (7.14)$$
If a solution to (7.14) exists, there are various ways of solving this matrix equation.
For example, with the help of (2.13) and Example 2.7, we can write it as

$$[(I \otimes A') - (A' \otimes I)U]\,x = 0 \qquad (7.15)$$

where U is a permutation matrix (see (2.24)) and x = vec X. We have now reduced the matrix equation to a system of homogeneous equations which can be solved by a standard method. If a non-trivial solution to (7.15) does exist, it is not unique; we must scale it appropriately for X to be orthogonal.
There may, of course, be more than one linearly independent solution to (7.15). We must choose the solution corresponding to X being an orthogonal matrix.
Example 7.2
Given

$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix},$$

find the orthogonal matrix X which is the least squares best approximation to A.

Solution

$$I \otimes A' = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 2 & 1 \end{bmatrix} \quad\text{and}\quad (A' \otimes I)U = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$
Equation (7.15) can now be written as

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 1 & -1 & 1 \\ -2 & -1 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} x = 0.$$
There are 3 non-trivial (linearly independent) solutions (see [18] p. 131). They are

$$x = [1\;\; -2\;\; 1\;\; 1]', \qquad x = [1\;\; 1\;\; 2\;\; -1]' \qquad\text{and}\qquad x = [2\;\; -3\;\; 3\;\; 2]'.$$

Only the last solution leads to an orthogonal matrix X; it is

$$X = \frac{1}{\sqrt{13}}\begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.$$
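The stated solution is easily verified: the following sketch (Python/NumPy) confirms that this X is orthogonal and satisfies the stationarity condition (7.14).

```python
import numpy as np

# Checks the conclusion of Example 7.2.
A = np.array([[1.0, 2.0], [-1.0, 1.0]])
X = np.array([[2.0, 3.0], [-3.0, 2.0]]) / np.sqrt(13)
print(np.allclose(X.T @ X, np.eye(2)))   # X'X = I
print(np.allclose(A.T @ X, X.T @ A))     # (7.14)
```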
7.4 PROBLEM 2 - THE GENERAL LEAST SQUARES PROBLEM
The linear regression problem presents itself in the following form: N samples from a population are considered. The ith sample consists of an observation from a variable Y and observations from variables X₁, X₂, …, X_n (say). We assume a linear relationship between the variables. If the variables are measured from zero, the relationship is of the form

$$y_i = b_0 + b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i. \qquad (7.16)$$

If the observations are measured from their means over the N samples, then

$$y_i = b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i \qquad (i = 1, 2, \ldots, N). \qquad (7.17)$$

b₀, b₁, b₂, …, b_n are estimated parameters and e_i is the corresponding residual. In matrix notation we can write the above equations as

$$y = Xb + e \qquad (7.18)$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix} \quad\text{or}\quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, \qquad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix},$$

and

$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1n} \\ 1 & x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & & \vdots \\ 1 & x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix} \quad\text{or}\quad X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix},$$

corresponding to (7.16) and (7.17) respectively.
As already indicated, the 'goodness of fit' criterion is the minimisation, with respect to the parameters b, of the sum of the squares of the residuals, which in this case is

$$S = e'e = (y' - b'X')(y - Xb).$$

Making use of the results in table (4.4), we obtain

$$\frac{\partial(e'e)}{\partial b} = -(y'X)' - X'y + (X'Xb + X'Xb) = -2X'y + 2X'Xb = 0 \qquad\text{for}\quad X'X\hat{b} = X'y \qquad (7.19)$$

where b̂ is the least squares estimate of b. If (X′X) is non-singular, we obtain from (7.19)

$$\hat{b} = (X'X)^{-1}X'y. \qquad (7.20)$$
We can write (7.19) as

$$X'(y - X\hat{b}) = 0 \quad\text{or}\quad X'\hat{e} = 0, \qquad (7.21)$$

which is the matrix form of the normal equations defined in section 7.2.
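Equations (7.20) and (7.21) translate directly into a few lines of code. The sketch below (Python/NumPy, on made-up data) solves the normal equations, compares the result with a standard least-squares routine, and confirms that the residual is orthogonal to the columns of X.

```python
import numpy as np

# A sketch of (7.20) and (7.21) on invented data.
rng = np.random.default_rng(12)
N, n = 50, 3
X = np.column_stack([np.ones(N), rng.standard_normal((N, n))])
y = rng.standard_normal(N)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)                      # (7.20)
print(np.allclose(b_hat, np.linalg.lstsq(X, y, rcond=None)[0]))
print(np.allclose(X.T @ (y - X @ b_hat), 0))                   # (7.21): X'e = 0
```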
Example 7.3
Obtain the normal equations for a least squares approximation when each sample consists of one observation from Y and one observation from (i) a random variable X, (ii) two random variables X and Z.

Solution
(i)
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},$$

hence

$$X'[y - Xb] = \begin{bmatrix} \sum y_i - b_1N - b_2\sum x_i \\ \sum x_iy_i - b_1\sum x_i - b_2\sum x_i^2 \end{bmatrix},$$

so that the normal equations are

$$\sum y_i = b_1N + b_2\sum x_i \quad\text{and}\quad \sum x_iy_i = b_1\sum x_i + b_2\sum x_i^2.$$
(ii) In this case

$$X = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & z_N \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$
The normal equations are