Kronecker Products and Matrix Calculus: With Applications


Author: Alexander Graham


Kronecker Products and Matrix Calculus: with Applications ALEXANDER GRAHAM, M.A., M.Sc., Ph.D., C.Eng., M.I.E.E. Senior Lecturer in Mathematics, The Open University, Milton Keynes

The $(i,j)$th element of $Y = AXB$ is

$$y_{ij} = \sum_{l=1}^{m}\sum_{p=1}^{n} a_{il}\,x_{lp}\,b_{pj}.\qquad(4.14)$$

From (4.14) we immediately obtain

$$\frac{\partial y_{ij}}{\partial x_{rs}} = a_{ir}\,b_{sj}.\qquad(4.15)$$

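We can now write the expression for $\partial y_{ij}/\partial X$:

$$\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix} \frac{\partial y_{ij}}{\partial x_{11}} & \frac{\partial y_{ij}}{\partial x_{12}} & \cdots & \frac{\partial y_{ij}}{\partial x_{1n}}\\ \frac{\partial y_{ij}}{\partial x_{21}} & \frac{\partial y_{ij}}{\partial x_{22}} & \cdots & \frac{\partial y_{ij}}{\partial x_{2n}}\\ \vdots & & & \vdots\\ \frac{\partial y_{ij}}{\partial x_{m1}} & \frac{\partial y_{ij}}{\partial x_{m2}} & \cdots & \frac{\partial y_{ij}}{\partial x_{mn}} \end{bmatrix}.\qquad(4.16)$$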

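Using (4.15), we obtain

$$\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix} a_{i1}b_{1j} & a_{i1}b_{2j} & \cdots & a_{i1}b_{nj}\\ a_{i2}b_{1j} & a_{i2}b_{2j} & \cdots & a_{i2}b_{nj}\\ \vdots & & & \vdots\\ a_{im}b_{1j} & a_{im}b_{2j} & \cdots & a_{im}b_{nj} \end{bmatrix}.\qquad(4.17)$$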

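We note that the matrix on the right-hand side of (4.17) can be expressed as (for notation see (1.5), (1.13), (1.16) and (1.17))

$$\begin{bmatrix} a_{i1}\\ a_{i2}\\ \vdots\\ a_{im} \end{bmatrix}\begin{bmatrix} b_{1j} & b_{2j} & \cdots & b_{nj} \end{bmatrix} = A_{i\cdot}'\,B_{\cdot j}' = A'e_i\,e_j'B'.$$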


So that

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}B',\qquad(4.18)$$

where $E_{ij}$ is an elementary matrix of order $(l\times q)$, the order of the matrix $Y$. We also use (4.14) to obtain an expression for $\partial Y/\partial x_{rs}$.

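By definition,

$$\frac{\partial Y}{\partial x_{rs}} = \left[\frac{\partial y_{ij}}{\partial x_{rs}}\right]\qquad(r, s \text{ fixed};\ i, j \text{ variable},\ 1\le i\le l,\ 1\le j\le q),$$

that is,

$$\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x_{rs}} & \frac{\partial y_{12}}{\partial x_{rs}} & \cdots & \frac{\partial y_{1q}}{\partial x_{rs}}\\ \frac{\partial y_{21}}{\partial x_{rs}} & \frac{\partial y_{22}}{\partial x_{rs}} & \cdots & \frac{\partial y_{2q}}{\partial x_{rs}}\\ \vdots & & & \vdots\\ \frac{\partial y_{l1}}{\partial x_{rs}} & \frac{\partial y_{l2}}{\partial x_{rs}} & \cdots & \frac{\partial y_{lq}}{\partial x_{rs}} \end{bmatrix} = \sum_{i,j} E_{ij}\,\frac{\partial y_{ij}}{\partial x_{rs}},\qquad(4.19)$$

where $E_{ij}$ is an elementary matrix of order $(l\times q)$. We again use (4.15) to write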

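$$\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix} a_{1r}b_{s1} & a_{1r}b_{s2} & \cdots & a_{1r}b_{sq}\\ a_{2r}b_{s1} & a_{2r}b_{s2} & \cdots & a_{2r}b_{sq}\\ \vdots & & & \vdots\\ a_{lr}b_{s1} & a_{lr}b_{s2} & \cdots & a_{lr}b_{sq} \end{bmatrix} = \begin{bmatrix} a_{1r}\\ a_{2r}\\ \vdots\\ a_{lr} \end{bmatrix}\begin{bmatrix} b_{s1} & b_{s2} & \cdots & b_{sq} \end{bmatrix} = A_{\cdot r}B_{s\cdot} = Ae_r\,e_s'B.$$

So that

$$\frac{\partial (AXB)}{\partial x_{rs}} = AE_{rs}B,\qquad(4.20)$$

where $E_{rs}$ is an elementary matrix of order $(m\times n)$, the order of the matrix $X$.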
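Formulas (4.20) and (4.18) can be checked numerically. The sketch below is an illustration, not part of the original text: it assumes NumPy, uses 0-based indices, and the helper names `elementary` and `d_dxrs` are hypothetical. Derivatives are estimated by central differences, which reproduce $AE_{rs}B$ up to rounding here because $Y = AXB$ is linear in $X$.

```python
import numpy as np

def elementary(r, s, shape):
    """Elementary matrix E_rs of the given order (0-based indices)."""
    E = np.zeros(shape)
    E[r, s] = 1.0
    return E

def d_dxrs(f, X, r, s, h=1e-6):
    """Central-difference estimate of dY/dx_rs for Y = f(X)."""
    E = elementary(r, s, X.shape)
    return (f(X + h * E) - f(X - h * E)) / (2 * h)

rng = np.random.default_rng(0)
l, m, n, q = 2, 3, 4, 2                      # A (l x m), X (m x n), B (n x q)
A = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))
f = lambda Z: A @ Z @ B

# (4.20): dY/dx_rs = A E_rs B, with E_rs of order (m x n)
ok_420 = all(np.allclose(d_dxrs(f, X, r, s), A @ elementary(r, s, (m, n)) @ B)
             for r in range(m) for s in range(n))

# (4.18): dy_ij/dX = A' E_ij B', with E_ij of order (l x q);
# entry (r, s) of dy_ij/dX is entry (i, j) of dY/dx_rs
def dyij_dX(i, j):
    return np.array([[d_dxrs(f, X, r, s)[i, j] for s in range(n)] for r in range(m)])

ok_418 = all(np.allclose(dyij_dX(i, j), A.T @ elementary(i, j, (l, q)) @ B.T)
             for i in range(l) for j in range(q))
print(ok_420, ok_418)
```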


Example 4.5 Find the derivative $\partial Y/\partial x_{rs}$, given

$$Y = AX'B,$$

where the orders of the matrices $A$, $X$ and $B$ are such that the product on the right-hand side is defined.

Solution By the method used above to obtain the derivative $\partial(AXB)/\partial x_{rs}$, we find

$$\frac{\partial}{\partial x_{rs}}(AX'B) = AE_{rs}'B.$$

Before continuing with further examples we need a rule for determining the derivative of a product of matrices. Consider

$$Y = UV,\qquad(4.21)$$

where $U = [u_{ij}]$ is of order $(m\times n)$, $V = [v_{ij}]$ is of order $(n\times l)$, and both $U$ and $V$ are functions of a matrix $X$. We wish to determine

$$\frac{\partial Y}{\partial x_{rs}}\quad\text{and}\quad\frac{\partial y_{ij}}{\partial X}.$$

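The $(i,j)$th element of (4.21) is

$$y_{ij} = \sum_{p=1}^{n} u_{ip}v_{pj},\qquad(4.22)$$

hence

$$\frac{\partial y_{ij}}{\partial x_{rs}} = \sum_{p=1}^{n}\frac{\partial u_{ip}}{\partial x_{rs}}v_{pj} + \sum_{p=1}^{n}u_{ip}\frac{\partial v_{pj}}{\partial x_{rs}}.\qquad(4.23)$$

For fixed $r$ and $s$, (4.23) is the $(i,j)$th element of the matrix $\partial Y/\partial x_{rs}$ of order $(m\times l)$, the same as the order of the matrix $Y$.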

On comparing both terms on the right-hand side of (4.23) with (4.22), we can write

$$\frac{\partial (UV)}{\partial x_{rs}} = \frac{\partial U}{\partial x_{rs}}V + U\frac{\partial V}{\partial x_{rs}},\qquad(4.24)$$

as one would expect.



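On the other hand, when $(i,j)$ is fixed, (4.23) is the $(r,s)$th element of the matrix $\partial y_{ij}/\partial X$, which is of the same order as the matrix $X$, that is

$$\frac{\partial y_{ij}}{\partial X} = \sum_{p=1}^{n}\frac{\partial u_{ip}}{\partial X}v_{pj} + \sum_{p=1}^{n}u_{ip}\frac{\partial v_{pj}}{\partial X}.\qquad(4.25)$$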

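We will make use of the result (4.24) in some of the subsequent examples.

Example 4.6 Let $X = [x_{rs}]$ be a non-singular matrix. Find the derivative $\partial Y/\partial x_{rs}$, given

(i) $Y = AX^{-1}B$, and
(ii) $Y = X'AX$.

Solution

(i) Using (4.24) to differentiate $YY^{-1} = I$, we obtain

$$\frac{\partial Y}{\partial x_{rs}}Y^{-1} + Y\frac{\partial Y^{-1}}{\partial x_{rs}} = 0,$$

hence

$$\frac{\partial Y}{\partial x_{rs}} = -Y\frac{\partial Y^{-1}}{\partial x_{rs}}Y.$$

But by (4.20) (taking $A$ and $B$ non-singular, so that $Y^{-1} = B^{-1}XA^{-1}$)

$$\frac{\partial Y^{-1}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(B^{-1}XA^{-1}) = B^{-1}E_{rs}A^{-1},$$

so that

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(AX^{-1}B) = -AX^{-1}B\,B^{-1}E_{rs}A^{-1}\,AX^{-1}B = -AX^{-1}E_{rs}X^{-1}B.$$

(ii) Using (4.24), we obtain

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X'}{\partial x_{rs}}AX + X'\frac{\partial (AX)}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}$$

(by (4.12) and (4.20)).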
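A numerical check of both parts of Example 4.6, as a sketch assuming NumPy. The helpers `elementary` and `d_dxrs` are illustrative names, and the test matrix $X = 4I + $ (small random part) is chosen only to keep it safely non-singular. The matrices $A$ and $B$ used here are rectangular: the derivation of (i) above passes through $B^{-1}XA^{-1}$, but the final formula only requires $X$ to be non-singular.

```python
import numpy as np

def elementary(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    E = np.zeros(shape)
    E[r, s] = 1.0
    return E

def d_dxrs(f, X, r, s, h=1e-6):
    """Central-difference estimate of dY/dx_rs for Y = f(X)."""
    E = elementary(r, s, X.shape)
    return (f(X + h * E) - f(X - h * E)) / (2 * h)

rng = np.random.default_rng(1)
n = 3
X = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))   # well-conditioned, non-singular
A = rng.standard_normal((2, n))                          # rectangular on purpose
B = rng.standard_normal((n, 2))
C = rng.standard_normal((n, n))                          # plays the role of A in part (ii)
Xi = np.linalg.inv(X)

# (i) Y = A X^{-1} B  ->  dY/dx_rs = -A X^{-1} E_rs X^{-1} B
ok_i = all(np.allclose(d_dxrs(lambda Z: A @ np.linalg.inv(Z) @ B, X, r, s),
                       -A @ Xi @ elementary(r, s, X.shape) @ Xi @ B, atol=1e-7)
           for r in range(n) for s in range(n))

# (ii) Y = X' C X  ->  dY/dx_rs = E_rs' C X + X' C E_rs
ok_ii = all(np.allclose(d_dxrs(lambda Z: Z.T @ C @ Z, X, r, s),
                        elementary(r, s, X.shape).T @ C @ X + X.T @ C @ elementary(r, s, X.shape))
            for r in range(n) for s in range(n))
print(ok_i, ok_ii)
```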

Both (4.18) and (4.20) were derived from (4.15), which is valid for all $i, j$ and $r, s$ defined by the orders of the matrices involved.


The First Transformation Principle

It follows that (4.18) is a transformation of (4.20), and conversely. To obtain (4.18) from (4.20) we replace $A$ by $A'$, $B$ by $B'$ and $E_{rs}$ by $E_{ij}$ (careful: $E_{rs}$ and $E_{ij}$ may be of different orders). The point is that although (4.18) and (4.20) were derived for constant matrices $A$ and $B$, the above transformation is independent of the status of the matrices and is valid even when $A$ and $B$ are functions of $X$.

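Example 4.7 Find the derivative $\partial y_{ij}/\partial X$, given

(i) $Y = AX'B$,
(ii) $Y = AX^{-1}B$, and
(iii) $Y = X'AX$,

where $X = [x_{ij}]$ is a non-singular matrix.

Solution

(i) Let $W = X'$; then $Y = AWB$, so that by (4.20)

$$\frac{\partial Y}{\partial w_{rs}} = AE_{rs}B,$$

hence

$$\frac{\partial y_{ij}}{\partial W} = A'E_{ij}B'.$$

But

$$\frac{\partial y_{ij}}{\partial X} = \frac{\partial y_{ij}}{\partial W'} = \left(\frac{\partial y_{ij}}{\partial W}\right)',$$

hence

$$\frac{\partial y_{ij}}{\partial X} = BE_{ij}'A.$$

(ii) From Example 4.6(i),

$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B.$$

Let $A_1 = AX^{-1}$ and $B_1 = X^{-1}B$; then

$$\frac{\partial Y}{\partial x_{rs}} = -A_1E_{rs}B_1,$$

so that

$$\frac{\partial y_{ij}}{\partial X} = -A_1'E_{ij}B_1' = -(X^{-1})'A'E_{ij}B'(X^{-1})'.$$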



(iii) From Example 4.6(ii),

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}.$$

Let $A_1 = I$, $B_1 = AX$, $A_2 = X'A$ and $B_2 = I$; then

$$\frac{\partial Y}{\partial x_{rs}} = A_1E_{rs}'B_1 + A_2E_{rs}B_2.$$

The second term on the right-hand side is in standard form. The first term is in the form of the solution to Example 4.5, for which the derivative $\partial y_{ij}/\partial X$ was found in (i) above, hence

$$\frac{\partial y_{ij}}{\partial X} = B_1E_{ij}'A_1 + A_2'E_{ij}B_2' = AXE_{ij}' + A'XE_{ij}.$$

It is interesting to compare this last result with the example in section 4.2, where we considered the scalar $y = x'Ax$. In this special case, when the matrix $X$ has only one column, the elementary matrix, which is of the same order as $Y$, becomes

$$E_{ij}' = E_{ij} = 1.$$

Hence

$$\frac{\partial y_{ij}}{\partial X} = \frac{\partial y}{\partial x} = Ax + A'x,$$

which is the result obtained in section 4.2 (see (4.4)). Similarly, using the above techniques we can also obtain the derivatives of the matrix equivalents of the other equations in the table (4.4).

Example 4.8 Find

$$\frac{\partial Y}{\partial x_{rs}}\quad\text{and}\quad\frac{\partial y_{ij}}{\partial X}$$

when (i) $Y = AX$, and (ii) $Y = X'X$.

Solution

(i) With $B = I$, apply (4.20):

$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}.$$


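The transformation principle results in

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}.$$

(ii) This is a special case of Example 4.6(ii) in which $A = I$. We have found the solution

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}$$

and (solution to Example 4.7(iii))

$$\frac{\partial y_{ij}}{\partial X} = XE_{ij}' + XE_{ij}.$$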
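The element-wise formulas of Examples 4.7(iii) and 4.8(ii), and the one-column special case $y = x'Ax$, can be spot-checked the same way. This is a sketch assuming NumPy; `dyij_dX` is a hypothetical helper that builds the matrix $[\partial y_{ij}/\partial x_{rs}]$ by central differences, and $X$ is deliberately rectangular.

```python
import numpy as np

def elementary(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    E = np.zeros(shape)
    E[r, s] = 1.0
    return E

def dyij_dX(f, X, i, j, h=1e-6):
    """[d y_ij / d x_rs], a matrix of the same order as X, by central differences."""
    G = np.zeros(X.shape)
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            E = elementary(r, s, X.shape)
            G[r, s] = (f(X + h * E)[i, j] - f(X - h * E)[i, j]) / (2 * h)
    return G

rng = np.random.default_rng(2)
m, n = 3, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))

# Example 4.7(iii): Y = X'AX is (n x n); dy_ij/dX = A X E_ij' + A' X E_ij
ok_iii = all(np.allclose(dyij_dX(lambda Z: Z.T @ A @ Z, X, i, j),
                         A @ X @ elementary(i, j, (n, n)).T + A.T @ X @ elementary(i, j, (n, n)))
             for i in range(n) for j in range(n))

# Example 4.8(ii): Y = X'X; dy_ij/dX = X E_ij' + X E_ij
ok_48 = all(np.allclose(dyij_dX(lambda Z: Z.T @ Z, X, i, j),
                        X @ elementary(i, j, (n, n)).T + X @ elementary(i, j, (n, n)))
            for i in range(n) for j in range(n))

# One-column special case: y = x'Ax gives dy/dx = Ax + A'x, as in (4.4)
x = rng.standard_normal((m, 1))
ok_vec = np.allclose(dyij_dX(lambda z: z.T @ A @ z, x, 0, 0), A @ x + A.T @ x)
print(ok_iii, ok_48, ok_vec)
```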

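4.6 THE DERIVATIVES OF THE POWERS OF A MATRIX

Our aim in this section is to obtain the rules for determining

$$\frac{\partial Y}{\partial x_{rs}}\quad\text{and}\quad\frac{\partial y_{ij}}{\partial X}$$

when $Y = X^n$.

Using (4.24) with $U = V = X$, so that $Y = X^2$, we immediately obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs}$$

and, applying the first transformation principle,

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}X' + X'E_{ij}.$$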

It is instructive to repeat this exercise with $U = X^2$ and $V = X$, so that $Y = X^3$. We obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X^2 + XE_{rs}X + X^2E_{rs}$$

and

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}(X')^2 + X'E_{ij}X' + (X')^2E_{ij}.$$


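More generally, it can be proved by induction that for $Y = X^n$

$$\frac{\partial Y}{\partial x_{rs}} = \sum_{k=0}^{n-1}X^kE_{rs}X^{n-k-1},\qquad(4.26)$$

where by definition $X^0 = I$, and

$$\frac{\partial y_{ij}}{\partial X} = \sum_{k=0}^{n-1}(X')^kE_{ij}(X')^{n-k-1}.\qquad(4.27)$$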
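Formulas (4.26) and (4.27) can be checked in the same spirit: (4.26) against central differences of $X^n$, and (4.27) through the fact that the $(r,s)$ entry of $\partial y_{ij}/\partial X$ equals the $(i,j)$ entry of $\partial Y/\partial x_{rs}$. A sketch assuming NumPy (`numpy.linalg.matrix_power`); the matrix size and the power $n = 4$ are arbitrary choices.

```python
import numpy as np
from numpy.linalg import matrix_power

def elementary(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    E = np.zeros(shape)
    E[r, s] = 1.0
    return E

rng = np.random.default_rng(3)
size, n, h = 3, 4, 1e-6          # X is (size x size); Y = X^n
X = rng.standard_normal((size, size))

def dY_dxrs(r, s):
    """(4.26): sum_{k=0}^{n-1} X^k E_rs X^{n-k-1}."""
    E = elementary(r, s, X.shape)
    return sum(matrix_power(X, k) @ E @ matrix_power(X, n - k - 1) for k in range(n))

def dyij_dX(i, j):
    """(4.27): sum_{k=0}^{n-1} (X')^k E_ij (X')^{n-k-1}."""
    E = elementary(i, j, X.shape)
    return sum(matrix_power(X.T, k) @ E @ matrix_power(X.T, n - k - 1) for k in range(n))

# (4.26) against central differences of X^n
ok_426 = all(
    np.allclose((matrix_power(X + h * elementary(r, s, X.shape), n)
                 - matrix_power(X - h * elementary(r, s, X.shape), n)) / (2 * h),
                dY_dxrs(r, s), atol=1e-6)
    for r in range(size) for s in range(size))

# (4.27) is the first transformation of (4.26): entry (r, s) of dy_ij/dX
# equals entry (i, j) of dY/dx_rs
ok_427 = all(np.isclose(dyij_dX(i, j)[r, s], dY_dxrs(r, s)[i, j])
             for i in range(size) for j in range(size)
             for r in range(size) for s in range(size))
print(ok_426, ok_427)
```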

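Example 4.9 Using the result (4.26), obtain $\partial Y/\partial x_{rs}$ when $Y = X^{-n}$.

Solution Using (4.24) on both sides of $X^{-n}X^n = I$, we find

$$\frac{\partial (X^{-n})}{\partial x_{rs}}X^n + X^{-n}\frac{\partial (X^n)}{\partial x_{rs}} = 0,$$

so that

$$\frac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n}\frac{\partial (X^n)}{\partial x_{rs}}X^{-n}.$$

Now making use of (4.26), we conclude that

$$\frac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n}\left[\sum_{k=0}^{n-1}X^kE_{rs}X^{n-k-1}\right]X^{-n}.$$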
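The result of Example 4.9 can be compared with central differences of $X^{-n}$. This is a sketch assuming NumPy, with $n = 3$ and a diagonally shifted test matrix chosen only so that $X$ is safely non-singular; `neg_power` and `formula` are illustrative names.

```python
import numpy as np
from numpy.linalg import inv, matrix_power

rng = np.random.default_rng(4)
size, n, h = 3, 3, 1e-6
X = 4 * np.eye(size) + 0.5 * rng.standard_normal((size, size))   # well-conditioned

def elementary(r, s):
    """Elementary matrix E_rs of order (size x size), 0-based indices."""
    E = np.zeros((size, size))
    E[r, s] = 1.0
    return E

def neg_power(M):
    """M^{-n} computed as (M^{-1})^n."""
    return matrix_power(inv(M), n)

Xmn = neg_power(X)

def formula(r, s):
    """Example 4.9: -X^{-n} [sum_{k=0}^{n-1} X^k E_rs X^{n-k-1}] X^{-n}."""
    E = elementary(r, s)
    inner = sum(matrix_power(X, k) @ E @ matrix_power(X, n - k - 1) for k in range(n))
    return -Xmn @ inner @ Xmn

ok = all(np.allclose((neg_power(X + h * elementary(r, s)) - neg_power(X - h * elementary(r, s))) / (2 * h),
                     formula(r, s), atol=1e-8)
         for r in range(size) for s in range(size))
print(ok)
```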

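Problems for Chapter 4

(1) Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$

and $y = 2x_{11}x_{22} - x_{21}x_{13}$, calculate $\partial y/\partial X$. (The problem also involves a matrix $Y$ whose entries, which include $2x^2$ and $\sin x$, are not fully legible in the source.)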


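(2) Given a matrix $X$ whose entries are functions of a scalar $x$ (the entries, which involve $x$, $\sin x$, $\cos x$ and $e^x$, are not fully legible in the source), evaluate $\partial|X|/\partial x$ by

(a) a direct method,
(b) use of a derivative formula.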

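(3) Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}\quad\text{and}\quad Y = X'X,$$

use a direct method to evaluate (a) $\dfrac{\partial Y}{\partial x_{21}}$ and (b) $\dfrac{\partial y_{13}}{\partial X}$.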

(4) Obtain expressions for

$$\frac{\partial Y}{\partial x_{rs}}\quad\text{and}\quad\frac{\partial y_{ij}}{\partial X}$$

when (a) $Y = XAX$ and (b) $Y = XAX'$.

(5) Obtain an expression for $\partial|AXB|/\partial x_{rs}$. It is assumed that $AXB$ is non-singular.

(6) Evaluate $\partial Y/\partial x_{rs}$ when (a) $Y = X(X')^2$ and (b) $Y = (X')^2X$.

CHAPTER 5

Further Development of Matrix Calculus including an Application of Kronecker Products

5.1 INTRODUCTION

In Chapter 4 we discussed rules for determining the derivatives of a vector and then the derivatives of a matrix. But it will be remembered that when $Y$ is a matrix, then $\operatorname{vec} Y$ is a vector.

This fact, together with the closely related Kronecker product techniques discussed in Chapter 2, will now be exploited to derive some interesting results. We also explore further the derivatives of some scalar functions with respect to a matrix, first considered in the previous chapter.

5.2 DERIVATIVES OF MATRICES AND KRONECKER PRODUCTS

In the previous chapter we found $\partial y_{ij}/\partial X$ when

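$$Y = AXB,\qquad(5.1)$$

where $Y = [y_{ij}]$, $A = [a_{ij}]$, $X = [x_{ij}]$ and $B = [b_{ij}]$. We now obtain $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$ for (5.1). We can write (5.1) as

$$y = Px,\qquad(5.2)$$

where $y = \operatorname{vec} Y$, $x = \operatorname{vec} X$ and $P = B'\otimes A$. By (4.1), (4.4) and (2.10),

$$\frac{\partial y}{\partial x} = P' = (B'\otimes A)' = B\otimes A'.\qquad(5.3)$$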
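To check (5.3) numerically one has to fix the layout of $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$: as in (5.12) below, row $k$ holds the derivatives of $\operatorname{vec} Y$ with respect to the $k$th element of $\operatorname{vec} X$, which is the transpose of the usual Jacobian. The sketch assumes NumPy, where `reshape(-1, order="F")` stacks columns and so gives vec; `vec` and `dvecY_dvecX` are illustrative helper names.

```python
import numpy as np

def vec(M):
    """vec: stack the columns of M into one column (column-major order)."""
    return M.reshape(-1, order="F")

def dvecY_dvecX(f, X, h=1e-6):
    """Layout of (5.12): row k holds d(vec Y)/dx_k, x_k the k-th element of vec X."""
    rows = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = 1.0
        E = E.reshape(X.shape, order="F")
        rows.append(vec(f(X + h * E) - f(X - h * E)) / (2 * h))
    return np.array(rows)

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

D = dvecY_dvecX(lambda Z: A @ Z @ B, X)
ok = np.allclose(D, np.kron(B, A.T))     # (5.3): B (x) A'
print(D.shape, ok)                        # (12, 4) True
```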

The corresponding result for the equation

$$Y = AX'B\qquad(5.4)$$

is not so simple.


The problem is that when we write (5.4) in the form of (5.2), we have

$$y = Pz,\qquad(5.5)$$

where $z = \operatorname{vec} X'$. We can find (see (2.25)) a permutation matrix $U$ such that

$$\operatorname{vec} X' = U\operatorname{vec} X,\qquad(5.6)$$

in which case (5.5) becomes $y = PUx$, so that

$$\frac{\partial y}{\partial x} = (PU)' = U'(B\otimes A').\qquad(5.7)$$

It is convenient to write

$$U'(B\otimes A') = (B\otimes A')_{(n)}.\qquad(5.8)$$

$U'$ is seen to premultiply the matrix $(B\otimes A')$; its effect is therefore to rearrange the rows of $(B\otimes A')$. In fact, with $X$ of order $(m\times n)$, the first and every subsequent $n$th row of $(B\otimes A')$ form the first $m$ consecutive rows of $(B\otimes A')_{(n)}$. The second and every subsequent $n$th row form the next $m$ consecutive rows of $(B\otimes A')_{(n)}$, and so on. A special case of this notation is for $n = 1$; then

$$(B\otimes A')_{(1)} = B\otimes A'.\qquad(5.9)$$

Now, returning to (5.5), we obtain, by comparison with (5.3),

$$\frac{\partial y}{\partial x} = (B\otimes A')_{(n)}.\qquad(5.10)$$
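The row rearrangement $(\cdot)_{(n)}$ of (5.8) is easy to express as an index permutation, which makes (5.10) checkable. In the sketch below (assuming NumPy; `rows_n` and `dvecY_dvecX` are hypothetical helpers), rows $k, k+n, \ldots, k+(m-1)n$ of the argument become consecutive, for $k = 0, \ldots, n-1$ in 0-based indexing.

```python
import numpy as np

def vec(M):
    """Column-stacking vec (column-major order)."""
    return M.reshape(-1, order="F")

def dvecY_dvecX(f, X, h=1e-6):
    """Row k holds d(vec Y)/dx_k, x_k the k-th element of vec X (layout of (5.12))."""
    rows = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = 1.0
        E = E.reshape(X.shape, order="F")
        rows.append(vec(f(X + h * E) - f(X - h * E)) / (2 * h))
    return np.array(rows)

def rows_n(M, m, n):
    """(M)_(n) of (5.8): rows k, k+n, ..., k+(m-1)n become consecutive, for k = 0..n-1."""
    order = [k + j * n for k in range(n) for j in range(m)]
    return M[order, :]

rng = np.random.default_rng(6)
m, n = 3, 4                       # X is (m x n)
A = rng.standard_normal((2, n))   # Y = A X' B needs A with n columns and B with m rows
B = rng.standard_normal((m, 2))
X = rng.standard_normal((m, n))

D = dvecY_dvecX(lambda Z: A @ Z.T @ B, X)
ok_510 = np.allclose(D, rows_n(np.kron(B, A.T), m, n))                       # (5.10)
ok_59 = np.array_equal(rows_n(np.kron(B, A.T), m * n, 1), np.kron(B, A.T))   # (5.9): n = 1 changes nothing
print(ok_510, ok_59)
```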

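Example 5.1 Obtain $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$, given $X = [x_{ij}]$ of order $(m\times n)$, when

(i) $Y = AX$, (ii) $Y = XA$, (iii) $Y = AX'$ and (iv) $Y = X'A$.

Solution

Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.

(i) Use (5.3) with $B = I$:

$$\frac{\partial y}{\partial x} = I\otimes A'.$$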


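(ii) Use (5.3):

$$\frac{\partial y}{\partial x} = A\otimes I.$$

(iii) Use (5.10):

$$\frac{\partial y}{\partial x} = (I\otimes A')_{(n)}.$$

(iv) Use (5.10):

$$\frac{\partial y}{\partial x} = (A\otimes I)_{(n)}.$$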

5.3 THE DETERMINATION OF $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$ FOR MORE COMPLICATED EQUATIONS

In this section we wish to determine the derivative $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$ when, for example,

$$Y = X'AX,\qquad(5.11)$$

where $X$ is of order $(m\times n)$.

Since $Y$ is a matrix of order $(n\times n)$, it follows that $\operatorname{vec} Y$ and $\operatorname{vec} X$ are vectors of order $n^2$ and $mn$ respectively. With the usual notation

$$Y = [y_{ij}],\qquad X = [x_{ij}],$$

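we have, by definition (4.1),

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{21}}{\partial x_{11}} & \cdots & \frac{\partial y_{nn}}{\partial x_{11}}\\ \frac{\partial y_{11}}{\partial x_{21}} & \frac{\partial y_{21}}{\partial x_{21}} & \cdots & \frac{\partial y_{nn}}{\partial x_{21}}\\ \vdots & & & \vdots\\ \frac{\partial y_{11}}{\partial x_{mn}} & \frac{\partial y_{21}}{\partial x_{mn}} & \cdots & \frac{\partial y_{nn}}{\partial x_{mn}} \end{bmatrix}.\qquad(5.12)$$

But by definition (4.19), the first row of the matrix (5.12) is $\left(\operatorname{vec}\dfrac{\partial Y}{\partial x_{11}}\right)'$, the second row of the matrix (5.12) is $\left(\operatorname{vec}\dfrac{\partial Y}{\partial x_{21}}\right)'$, etc.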


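We can therefore write (5.12) as

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = \left[\operatorname{vec}\frac{\partial Y}{\partial x_{11}} : \operatorname{vec}\frac{\partial Y}{\partial x_{21}} : \cdots : \operatorname{vec}\frac{\partial Y}{\partial x_{mn}}\right]'.\qquad(5.13)$$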

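We now use the solution to Example 4.6, where we established that when $Y = X'AX$,

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}.\qquad(5.14)$$

It follows that

$$\operatorname{vec}\frac{\partial Y}{\partial x_{rs}} = \operatorname{vec}(E_{rs}'AX) + \operatorname{vec}(X'AE_{rs}) = (X'A'\otimes I)\operatorname{vec} E_{rs}' + (I\otimes X'A)\operatorname{vec} E_{rs}\qquad(5.15)$$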

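(using (2.13)). Substituting (5.15) into (5.13), we obtain

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = \Big[(X'A'\otimes I)\left[\operatorname{vec} E_{11}' : \operatorname{vec} E_{21}' : \cdots : \operatorname{vec} E_{mn}'\right]\Big]' + \Big[(I\otimes X'A)\left[\operatorname{vec} E_{11} : \operatorname{vec} E_{21} : \cdots : \operatorname{vec} E_{mn}\right]\Big]'$$

$$= \left[\operatorname{vec} E_{11}' : \operatorname{vec} E_{21}' : \cdots : \operatorname{vec} E_{mn}'\right]'(AX\otimes I) + \left[\operatorname{vec} E_{11} : \operatorname{vec} E_{21} : \cdots : \operatorname{vec} E_{mn}\right]'(I\otimes A'X)\qquad(5.16)$$

(by (2.10)). The matrix $[\operatorname{vec} E_{11} : \operatorname{vec} E_{21} : \cdots : \operatorname{vec} E_{mn}]$ is the unit matrix $I$ of order $(mn\times mn)$. Using (2.23) we can write (5.16) as

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = U'(AX\otimes I) + (I\otimes A'X).$$

That is,

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = (AX\otimes I)_{(n)} + (I\otimes A'X).\qquad(5.17)$$

In the above calculations we have used the derivative $\partial Y/\partial x_{rs}$ to obtain $(\partial\operatorname{vec} Y)/(\partial\operatorname{vec} X)$.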



The Second Transformation Principle

Only slight modifications are needed to generalise the above calculations and show that whenever

$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}B + CE_{rs}'D,$$

where $A$, $B$, $C$ and $D$ may be functions of $X$, then

$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X} = B\otimes A' + (D\otimes C')_{(n)}.\qquad(5.18)$$

We will refer to the above result as the second transformation principle.

Example 5.2 Find

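$$\frac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X}$$

when

(i) $Y = X'X$,
(ii) $Y = AX^{-1}B$.

Solution

Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.

(i) From Example 4.8,

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}.$$

Now use the second transformation principle to obtain

$$\frac{\partial y}{\partial x} = I\otimes X + (X\otimes I)_{(n)}.$$

(ii) From Example 4.6,

$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B,$$

hence

$$\frac{\partial y}{\partial x} = -(X^{-1}B)\otimes(X^{-1})'A'.$$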
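The second transformation principle (5.18) can be written as a small function of $A$, $B$, $C$ and $D$ and compared with numerical derivatives for Example 5.2 and for (5.17). This sketch assumes NumPy; `second_principle`, `rows_n` and `dvecY_dvecX` are illustrative helper names, not notation from the text.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def dvecY_dvecX(f, X, h=1e-6):
    """Row k holds d(vec Y)/dx_k, x_k the k-th element of vec X (layout of (5.12))."""
    rows = []
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = 1.0
        E = E.reshape(X.shape, order="F")
        rows.append(vec(f(X + h * E) - f(X - h * E)) / (2 * h))
    return np.array(rows)

def rows_n(M, m, n):
    """(M)_(n) of (5.8)."""
    order = [k + j * n for k in range(n) for j in range(m)]
    return M[order, :]

def second_principle(A, B, C, D, m, n):
    """(5.18): if dY/dx_rs = A E_rs B + C E_rs' D, then dvecY/dvecX = B (x) A' + (D (x) C')_(n)."""
    return np.kron(B, A.T) + rows_n(np.kron(D, C.T), m, n)

rng = np.random.default_rng(7)
m, n = 3, 2
X = rng.standard_normal((m, n))
I = np.eye(n)

# Example 5.2(i): Y = X'X, dY/dx_rs = E_rs' X + X' E_rs  ->  I (x) X + (X (x) I)_(n)
ok_i = np.allclose(dvecY_dvecX(lambda Z: Z.T @ Z, X), second_principle(X.T, I, I, X, m, n))

# (5.17): Y = X'AX, dY/dx_rs = E_rs' AX + X'A E_rs  ->  (I (x) A'X) + (AX (x) I)_(n)
Amat = rng.standard_normal((m, m))
ok_517 = np.allclose(dvecY_dvecX(lambda Z: Z.T @ Amat @ Z, X),
                     second_principle(X.T @ Amat, I, I, Amat @ X, m, n))

# Example 5.2(ii): Y = A X^{-1} B, dY/dx_rs = -A X^{-1} E_rs X^{-1} B (no E_rs' term)
Xs = 4 * np.eye(3) + 0.5 * rng.standard_normal((3, 3))
A2, B2 = rng.standard_normal((2, 3)), rng.standard_normal((3, 2))
Xi = np.linalg.inv(Xs)
ok_ii = np.allclose(dvecY_dvecX(lambda Z: A2 @ np.linalg.inv(Z) @ B2, Xs),
                    -np.kron(Xi @ B2, Xi.T @ A2.T), atol=1e-7)
print(ok_i, ok_517, ok_ii)
```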

Hopefully, using the above results for matrices, we should be able to rediscover results for the derivatives of vectors considered in Chapter 4.


For example, let $X$ be a column vector $x$; then $Y = X'X$ becomes

$$y = x'x\qquad(y\text{ is a scalar}).$$

The above result for $\partial y/\partial x$ becomes

$$\frac{\partial y}{\partial x} = (I\otimes x) + (x\otimes I)_{(1)}.$$

But the unit matrices involved are of order $(n\times n)$, which for the one-column matrix $X$ is $(1\times 1)$. Hence

$$\frac{\partial y}{\partial x} = 1\otimes x + x\otimes 1\quad(\text{use }(5.9))\quad = x + x = 2x,$$

which is the result found in (4.4).

5.4 MORE ON DERIVATIVES OF SCALAR FUNCTIONS WITH RESPECT TO A MATRIX

In section 4.4 we derived a formula, (4.10), which is useful when evaluating the derivative with respect to $X$ of a large class of scalar functions defined by a matrix $Y$.

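Example 5.3 Evaluate the derivatives

(i) $\dfrac{\partial\log|X|}{\partial X}$ and (ii) $\dfrac{\partial|X|^r}{\partial X}$.

Solution

(i) We have

$$\frac{\partial(\log|X|)}{\partial x_{rs}} = \frac{1}{|X|}\frac{\partial|X|}{\partial x_{rs}}.$$

From Example 4.4,

$$\frac{\partial|X|}{\partial X} = |X|(X^{-1})'.$$

Hence

$$\frac{\partial\log|X|}{\partial X} = (X^{-1})'.$$

(ii)

$$\frac{\partial|X|^r}{\partial x_{rs}} = r|X|^{r-1}\frac{\partial|X|}{\partial x_{rs}}\qquad(\text{non-symmetric case}).$$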



Hence

$$\frac{\partial|X|^r}{\partial X} = r|X|^r(X^{-1})'.$$

Traces of matrices form an important class of scalar matrix functions covering a wide range of applications, particularly in statistics, in the formulation of least squares and various optimisation problems. Having discussed the evaluation of the derivative $\partial Y/\partial x_{rs}$ for various products of matrices, we can now apply these results to the evaluation of the derivative

$$\frac{\partial(\operatorname{tr} Y)}{\partial X}.$$

We first note that

$$\frac{\partial(\operatorname{tr} Y)}{\partial X} = \left[\frac{\partial(\operatorname{tr} Y)}{\partial x_{rs}}\right],\qquad(5.19)$$

where the bracket on the right-hand side of (5.19) denotes (as usual) a matrix of the same order as $X$, defined by its $(r,s)$th element. As a consequence of (5.19), or perhaps more clearly seen from the definition (4.7), we note that on transposing $X$ we have

$$\frac{\partial(\operatorname{tr} Y)}{\partial X'} = \left[\frac{\partial(\operatorname{tr} Y)}{\partial X}\right]'.\qquad(5.20)$$

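Another, and possibly obvious, property of a trace is found when considering the definition of $\partial Y/\partial x_{rs}$ (see (4.19)). Assuming that $Y = [y_{ij}]$ is of order $(n\times n)$,

$$\operatorname{tr}\frac{\partial Y}{\partial x_{rs}} = \frac{\partial y_{11}}{\partial x_{rs}} + \frac{\partial y_{22}}{\partial x_{rs}} + \cdots + \frac{\partial y_{nn}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(y_{11} + y_{22} + \cdots + y_{nn}) = \frac{\partial(\operatorname{tr} Y)}{\partial x_{rs}}.$$

Hence,

$$\frac{\partial(\operatorname{tr} Y)}{\partial x_{rs}} = \operatorname{tr}\left(\frac{\partial Y}{\partial x_{rs}}\right).\qquad(5.21)$$

Example 5.4 Evaluate

$$\frac{\partial\operatorname{tr}(AX)}{\partial X}.$$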


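Solution

$$\frac{\partial\operatorname{tr}(AX)}{\partial x_{rs}} = \operatorname{tr}\frac{\partial(AX)}{\partial x_{rs}}\quad\text{by (5.21)}$$

$$= \operatorname{tr}(AE_{rs})\quad\text{by Example 4.8}$$

$$= \operatorname{tr}(E_{rs}'A')\quad\text{since }\operatorname{tr} Y = \operatorname{tr} Y'$$

$$= (\operatorname{vec} E_{rs})'(\operatorname{vec} A')\quad\text{by Example 1.4}.$$

Hence,

$$\frac{\partial\operatorname{tr}(AX)}{\partial X} = A'.$$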

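As we found in the previous chapter, we can use the derivative of the trace of one product to obtain the derivative of the trace of a different product.

Example 5.5 Evaluate

$$\frac{\partial\operatorname{tr}(AX')}{\partial X}.$$

Solution From the previous result,

$$\frac{\partial\operatorname{tr}(BX)}{\partial X} = \frac{\partial\operatorname{tr}(X'B')}{\partial X} = B'.$$

Letting $A' = B$ in the above equation, it follows that

$$\frac{\partial\operatorname{tr}(X'A)}{\partial X} = \frac{\partial\operatorname{tr}(A'X)}{\partial X} = A.$$

Since $\operatorname{tr}(AX') = \operatorname{tr}(XA') = \operatorname{tr}(A'X)$, this is the required derivative: $\partial\operatorname{tr}(AX')/\partial X = A$.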

The derivatives of traces of more complicated matrix products can be found similarly.

Example 5.6 Evaluate

$$\frac{\partial(\operatorname{tr} Y)}{\partial X}$$

when (i) $Y = X'AX$, (ii) $Y = X'AXB$.

Solution It is obvious that (i) follows from (ii) when $B = I$.



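(ii) $Y = X_1B$, where $X_1 = X'AX$, so that

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X_1}{\partial x_{rs}}B = E_{rs}'AXB + X'AE_{rs}B\qquad(\text{by Example 4.6}).$$

Hence,

$$\operatorname{tr}\left(\frac{\partial Y}{\partial x_{rs}}\right) = \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(X'AE_{rs}B) = \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(E_{rs}'A'XB')$$

$$= (\operatorname{vec} E_{rs})'\operatorname{vec}(AXB) + (\operatorname{vec} E_{rs})'\operatorname{vec}(A'XB').$$

It follows that

$$\frac{\partial(\operatorname{tr} Y)}{\partial X} = AXB + A'XB'.$$

(i) Letting $B = I$ in the above equation, we obtain

$$\frac{\partial(\operatorname{tr} Y)}{\partial X} = AX + A'X = (A + A')X.$$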
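Examples 5.4 to 5.6 can be confirmed by differentiating the traces numerically, element by element, as in (5.19). A sketch assuming NumPy; `grad` is a hypothetical central-difference helper, and $X$ is taken rectangular so that the orders of $A$ and $B$ matter.

```python
import numpy as np

def grad(phi, X, h=1e-6):
    """[d phi / d x_rs], same order as X, by central differences (phi scalar-valued)."""
    G = np.zeros(X.shape)
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            E = np.zeros(X.shape)
            E[r, s] = 1.0
            G[r, s] = (phi(X + h * E) - phi(X - h * E)) / (2 * h)
    return G

rng = np.random.default_rng(8)
m, n = 3, 2
X = rng.standard_normal((m, n))

A1 = rng.standard_normal((n, m))   # Example 5.4: d tr(AX)/dX = A'
ok_54 = np.allclose(grad(lambda Z: np.trace(A1 @ Z), X), A1.T)

A2 = rng.standard_normal((m, n))   # Example 5.5: d tr(AX')/dX = A
ok_55 = np.allclose(grad(lambda Z: np.trace(A2 @ Z.T), X), A2)

A3 = rng.standard_normal((m, m))   # Example 5.6(ii): d tr(X'AXB)/dX = AXB + A'XB'
B3 = rng.standard_normal((n, n))
ok_56 = np.allclose(grad(lambda Z: np.trace(Z.T @ A3 @ Z @ B3), X),
                    A3 @ X @ B3 + A3.T @ X @ B3.T)
print(ok_54, ok_55, ok_56)
```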

5.5 THE MATRIX DIFFERENTIAL

For a scalar function $f(x)$, where $x = [x_1\ x_2\ \cdots\ x_n]'$, the differential $df$ is defined as

$$df = \sum_{j=1}^{n}\frac{\partial f}{\partial x_j}dx_j.\qquad(5.23)$$

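Corresponding to this definition, we define the matrix differential $dX$ for the matrix $X = [x_{ij}]$ of order $(m\times n)$ to be

$$dX = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n}\\ dx_{21} & dx_{22} & \cdots & dx_{2n}\\ \vdots & & & \vdots\\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.\qquad(5.24)$$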

The following two results follow immediately:

$$d(aX) = a(dX)\quad(\text{where } a \text{ is a scalar}),\qquad(5.25)$$

$$d(X + Y) = dX + dY.\qquad(5.26)$$

Consider now $X = [x_{ij}]$ of order $(m\times n)$ and $Y = [y_{jk}]$ of order $(n\times p)$:

$$XY = \left[\sum_j x_{ij}y_{jk}\right],$$


hence

$$d(XY) = d\left[\sum_j x_{ij}y_{jk}\right] = \left[\sum_j (dx_{ij})y_{jk} + \sum_j x_{ij}(dy_{jk})\right].$$

It follows that

$$d(XY) = (dX)Y + X(dY).\qquad(5.27)$$

Example 5.7 Given $X = [x_{ij}]$, a non-singular matrix, evaluate

(i) $d|X|$, (ii) $d(X^{-1})$.

Solution

(i) By (5.23),

$$d|X| = \sum_{i,j}\frac{\partial|X|}{\partial x_{ij}}(dx_{ij}) = \sum_{i,j}X_{ij}(dx_{ij}),$$

since $\partial|X|/\partial x_{ij} = X_{ij}$, the cofactor of $x_{ij}$ in $|X|$. By an argument similar to the one used in section 4.4, we can write

$$d|X| = \operatorname{tr}\{Z'(dX)\}\qquad(\text{compare with }(4.10)),$$

where $Z = [X_{ij}]$. Since $Z' = |X|X^{-1}$, we can write

$$d|X| = |X|\operatorname{tr}\{X^{-1}(dX)\}.$$

(ii) Since $X^{-1}X = I$, we use (5.27) to write

$$d(X^{-1})X + X^{-1}(dX) = 0.$$

Hence

$$d(X^{-1}) = -X^{-1}(dX)X^{-1}\qquad(\text{compare with Example 4.6}).$$

Notice that if $X$ is a symmetric matrix, then $X = X'$ and

$$(dX)' = dX.\qquad(5.28)$$
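The differentials of Example 5.7 can be checked as directional derivatives: for a small step $t$ in an arbitrary direction $dX$, $(f(X + t\,dX) - f(X - t\,dX))/(2t)$ approximates $df$. The same sketch checks the gradients of Example 5.3 through $df = \sum_{i,j}(\partial f/\partial x_{ij})\,dx_{ij}$. It assumes NumPy; `numpy.linalg.slogdet` is used so that $\log|\det X|$ is defined even if $\det X$ were negative, `directional` is an illustrative helper, and $r = 3$ is an arbitrary power.

```python
import numpy as np

rng = np.random.default_rng(9)
n, t = 3, 1e-6
X = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))   # well-conditioned, non-singular
dX = rng.standard_normal((n, n))                           # an arbitrary direction
Xi = np.linalg.inv(X)

def directional(f):
    """Central-difference estimate of the differential of f at X in the direction dX."""
    return (f(X + t * dX) - f(X - t * dX)) / (2 * t)

# Example 5.7(i): d|X| = |X| tr(X^{-1} dX)
ok_det = np.isclose(directional(np.linalg.det), np.linalg.det(X) * np.trace(Xi @ dX))

# Example 5.7(ii): d(X^{-1}) = -X^{-1} (dX) X^{-1}
ok_inv = np.allclose(directional(np.linalg.inv), -Xi @ dX @ Xi, atol=1e-7)

# Example 5.3(i): gradient of log|X| is (X^{-1})', so d log|X| = sum((X^{-1})' * dX)
logabsdet = lambda M: np.linalg.slogdet(M)[1]
ok_log = np.isclose(directional(logabsdet), np.sum(Xi.T * dX))

# Example 5.3(ii): gradient of |X|^r is r |X|^r (X^{-1})'
r = 3
ok_pow = np.isclose(directional(lambda M: np.linalg.det(M) ** r),
                    np.sum(r * np.linalg.det(X) ** r * Xi.T * dX))
print(ok_det, ok_inv, ok_log, ok_pow)
```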



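Problems for Chapter 5

(1) Consider

$$A = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix},\quad X = \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{bmatrix}\quad\text{and}\quad Y = AX'.$$

Use a direct method to evaluate $\dfrac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X}$ and verify (5.10).

(2) Obtain $\dfrac{\partial\operatorname{vec} Y}{\partial\operatorname{vec} X}$ when (i) $Y = AX'B$ and (ii) $Y$ is given by a second expression that is not legible in the source.

(3) Find expressions for $\dfrac{\partial\operatorname{tr} Y}{\partial X}$ when (a) $Y = AXB$, (b) $Y = X^2$ and (c) $Y = XX'$.

(4) Evaluate $\dfrac{\partial\operatorname{tr} Y}{\partial X}$ when (a) $Y = X^{-1}$, (b) $Y = AX^{-1}B$, (c) $Y = X^n$ and (d) $Y = e^X$.

(5) (a) Use the direct method to obtain expressions for the matrix differential $dY$ when (i) $Y = AX$, (ii) $Y = X'X$ and (iii) $Y = X^2$.
(b) Find $dY$ when $Y = AXBX$.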

CHAPTER 6

The Derivative of a Matrix with respect to a Matrix

6.1 INTRODUCTION

In the previous two chapters we have defined the derivative of a matrix with respect to a scalar and the derivative of a scalar with respect to a matrix. We will now generalise the definitions to include the derivative of a matrix with respect to a matrix. The author has adopted the definition suggested by Vetter [31], although other definitions also give rise to some useful results.

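6.2 THE DEFINITIONS AND SOME RESULTS

Let $Y = [y_{ij}]$ be a matrix of order $(p\times q)$. We have defined (see (4.19)) the derivative of $Y$ with respect to a scalar $x_{rs}$; it is the matrix $[\partial y_{ij}/\partial x_{rs}]$ of order $(p\times q)$. Let $X = [x_{rs}]$ be a matrix of order $(m\times n)$. We generalise (4.19) and define the derivative of $Y$ with respect to $X$, denoted by $\partial Y/\partial X$, as the partitioned matrix whose $(r,s)$th partition is $\partial Y/\partial x_{rs}$; in other words,

$$\frac{\partial Y}{\partial X} = \begin{bmatrix} \frac{\partial Y}{\partial x_{11}} & \frac{\partial Y}{\partial x_{12}} & \cdots & \frac{\partial Y}{\partial x_{1n}}\\ \frac{\partial Y}{\partial x_{21}} & \frac{\partial Y}{\partial x_{22}} & \cdots & \frac{\partial Y}{\partial x_{2n}}\\ \vdots & & & \vdots\\ \frac{\partial Y}{\partial x_{m1}} & \frac{\partial Y}{\partial x_{m2}} & \cdots & \frac{\partial Y}{\partial x_{mn}} \end{bmatrix} = \sum_{r,s}E_{rs}\otimes\frac{\partial Y}{\partial x_{rs}}.\qquad(6.1)$$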


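The right-hand side of (6.1) follows from the definitions (1.4) and (2.1), where $E_{rs}$ is of order $(m\times n)$, the order of the matrix $X$. It is seen that $\partial Y/\partial X$ is a matrix of order $(mp\times nq)$.

Example 6.1 Consider

$$Y = \begin{bmatrix} x_{11}x_{12}x_{22} & \sin(x_{11} + x_{12})\\ e^{x_{11}x_{21}} & \log(x_{11} + x_{21}) \end{bmatrix}\quad\text{and}\quad X = \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{bmatrix}.$$

Evaluate $\partial Y/\partial X$.

Solution

$$\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} x_{12}x_{22} & \cos(x_{11} + x_{12})\\ x_{21}e^{x_{11}x_{21}} & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix},\qquad \frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} x_{11}x_{22} & \cos(x_{11} + x_{12})\\ 0 & 0 \end{bmatrix},$$

$$\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} 0 & 0\\ x_{11}e^{x_{11}x_{21}} & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix},\qquad \frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} x_{11}x_{12} & 0\\ 0 & 0 \end{bmatrix}.$$

Hence

$$\frac{\partial Y}{\partial X} = \begin{bmatrix} x_{12}x_{22} & \cos(x_{11} + x_{12}) & x_{11}x_{22} & \cos(x_{11} + x_{12})\\ x_{21}e^{x_{11}x_{21}} & \dfrac{1}{x_{11} + x_{21}} & 0 & 0\\ 0 & 0 & x_{11}x_{12} & 0\\ x_{11}e^{x_{11}x_{21}} & \dfrac{1}{x_{11} + x_{21}} & 0 & 0 \end{bmatrix}.$$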
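The matrix $\partial Y/\partial X$ of Example 6.1 can be checked at a sample point by assembling definition (6.1) from numerical partial derivatives. The sketch assumes NumPy; the point $X$ is an arbitrary choice with $x_{11} + x_{21} > 0$ so that the logarithm is defined, and `Y_of` and `dY_dX` are hypothetical helper names.

```python
import numpy as np

def Y_of(X):
    """The matrix function of Example 6.1."""
    (x11, x12), (x21, x22) = X
    return np.array([[x11 * x12 * x22, np.sin(x11 + x12)],
                     [np.exp(x11 * x21), np.log(x11 + x21)]])

def dY_dX(f, X, h=1e-6):
    """Definition (6.1): sum over (r, s) of E_rs (x) dY/dx_rs, dY/dx_rs by central differences."""
    total = 0
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            E = np.zeros(X.shape)
            E[r, s] = 1.0
            total = total + np.kron(E, (f(X + h * E) - f(X - h * E)) / (2 * h))
    return total

X = np.array([[0.5, 0.3],
              [0.7, 1.2]])
(x11, x12), (x21, x22) = X
c, e, g = np.cos(x11 + x12), np.exp(x11 * x21), 1 / (x11 + x21)
expected = np.array([[x12 * x22, c, x11 * x22, c],
                     [x21 * e,   g, 0,         0],
                     [0,         0, x11 * x12, 0],
                     [x11 * e,   g, 0,         0]])
ok = np.allclose(dY_dX(Y_of, X), expected, atol=1e-6)
print(ok)
```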

Example 6.2 Given the matrix $X = [x_{ij}]$ of order $(m\times n)$, evaluate $\partial X/\partial X$ when

(i) all elements of $X$ are independent,
(ii) $X$ is a symmetric matrix (of course, in this case $m = n$).

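Solution (i) By (6.1),

$$\frac{\partial X}{\partial X} = \sum_{r,s}E_{rs}\otimes E_{rs} = \bar{U}\qquad(\text{see }(2.26)).$$

(ii)

$$\frac{\partial X}{\partial x_{rs}} = \begin{cases} E_{rs} + E_{sr} & \text{for } r\ne s,\\ E_{rr} & \text{for } r = s. \end{cases}$$

We can write the above as

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} - \delta_{rs}E_{rr}.$$

Hence,

$$\frac{\partial X}{\partial X} = \sum_{r,s}E_{rs}\otimes E_{rs} + \sum_{r,s}E_{rs}\otimes E_{sr} - \sum_{r,s}\delta_{rs}E_{rs}\otimes E_{rr} = \bar{U} + U - \sum_{r}E_{rr}\otimes E_{rr}$$

(see (2.24) and (2.26)).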

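Example 6.3 Evaluate and write out in full $\partial X'/\partial X$, given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}.$$

Solution By (6.1) we have

$$\frac{\partial X'}{\partial X} = \sum_{r,s}E_{rs}\otimes E_{rs}' = U.$$

Hence

$$\frac{\partial X'}{\partial X} = \begin{bmatrix} 1&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&0&1&0\\ 0&1&0&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&1 \end{bmatrix}.$$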
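The permutation matrices of Examples 6.2 and 6.3 can be generated directly from elementary matrices. The sketch below (assuming NumPy; helper names are illustrative) builds $\bar{U} = \sum E_{rs}\otimes E_{rs}$ and $U = \sum E_{rs}\otimes E_{rs}'$ for $X$ of order $(2\times 3)$, compares them with $\partial X/\partial X$ and $\partial X'/\partial X$ obtained from definition (6.1), and confirms the 0-based positions of the ones in the $6\times 6$ matrix above.

```python
import numpy as np

def E(r, s, shape):
    """Elementary matrix E_rs of the given order (0-based indices)."""
    M = np.zeros(shape)
    M[r, s] = 1.0
    return M

def dY_dX(f, X, h=1e-6):
    """Definition (6.1) with dY/dx_rs estimated by central differences."""
    total = 0
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            Ers = E(r, s, X.shape)
            total = total + np.kron(Ers, (f(X + h * Ers) - f(X - h * Ers)) / (2 * h))
    return total

m, n = 2, 3
# U-bar = sum E_rs (x) E_rs and U = sum E_rs (x) E_rs', with E_rs of order (m x n)
Ubar = sum(np.kron(E(r, s, (m, n)), E(r, s, (m, n))) for r in range(m) for s in range(n))
U = sum(np.kron(E(r, s, (m, n)), E(r, s, (m, n)).T) for r in range(m) for s in range(n))

X = np.arange(1.0, 7.0).reshape(m, n)
ok_62 = np.allclose(dY_dX(lambda Z: Z, X), Ubar)        # Example 6.2(i)
ok_63 = np.allclose(dY_dX(lambda Z: Z.T, X), U)         # Example 6.3
ok_full = np.array_equal(np.argwhere(U == 1).tolist(),
                         [[0, 0], [1, 2], [2, 4], [3, 1], [4, 3], [5, 5]])
print(ok_62, ok_63, ok_full)
```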


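From the definition (6.1) we obtain

$$\left(\frac{\partial Y}{\partial X}\right)' = \left(\sum_{r,s}E_{rs}\otimes\frac{\partial Y}{\partial x_{rs}}\right)' = \sum_{r,s}E_{rs}'\otimes\left(\frac{\partial Y}{\partial x_{rs}}\right)'\quad\text{by (2.10)}$$

$$= \sum_{r,s}E_{rs}'\otimes\frac{\partial Y'}{\partial x_{rs}}\quad\text{from (4.19)}.$$

It follows that

$$\left(\frac{\partial Y}{\partial X}\right)' = \frac{\partial Y'}{\partial X'}.\qquad(6.2)$$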

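6.3 PRODUCT RULES FOR MATRICES

We shall first obtain a rule for the derivative of a product of matrices with respect to a matrix, that is, to find an expression for

$$\frac{\partial(XY)}{\partial Z},$$

where the orders of the matrices are as indicated:

$$X\ (m\times n),\qquad Y\ (n\times v),\qquad Z\ (p\times q).$$

By (4.24) we write

$$\frac{\partial(XY)}{\partial z_{rs}} = \frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}},$$

where $Z = [z_{rs}]$. If $E_{rs}$ is an elementary matrix of order $(p\times q)$, we make use of (6.1) to write

$$\frac{\partial(XY)}{\partial Z} = \sum_{r,s}E_{rs}\otimes\left[\frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}}\right] = \sum_{r,s}E_{rs}\otimes\frac{\partial X}{\partial z_{rs}}Y + \sum_{r,s}E_{rs}\otimes X\frac{\partial Y}{\partial z_{rs}}$$

$$= \sum_{r,s}(E_{rs}I_q)\otimes\left(\frac{\partial X}{\partial z_{rs}}Y\right) + \sum_{r,s}(I_pE_{rs})\otimes\left(X\frac{\partial Y}{\partial z_{rs}}\right)$$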


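(where $I_q$ and $I_p$ are unit matrices of order $(q\times q)$ and $(p\times p)$ respectively)

$$= \sum_{r,s}\left(E_{rs}\otimes\frac{\partial X}{\partial z_{rs}}\right)(I_q\otimes Y) + \sum_{r,s}(I_p\otimes X)\left(E_{rs}\otimes\frac{\partial Y}{\partial z_{rs}}\right)\quad(\text{by }(2.11));$$

finally, by (6.1),

$$\frac{\partial(XY)}{\partial Z} = \frac{\partial X}{\partial Z}(I_q\otimes Y) + (I_p\otimes X)\frac{\partial Y}{\partial Z}.\qquad(6.3)$$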

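Example 6.4 Find an expression for $\partial X^{-1}/\partial X$.

Solution Using (6.3) on $XX^{-1} = I$, we obtain

$$\frac{\partial(XX^{-1})}{\partial X} = \frac{\partial X}{\partial X}(I\otimes X^{-1}) + (I\otimes X)\frac{\partial X^{-1}}{\partial X} = 0,$$

hence

$$\frac{\partial X^{-1}}{\partial X} = -(I\otimes X)^{-1}\frac{\partial X}{\partial X}(I\otimes X^{-1}) = -(I\otimes X^{-1})\bar{U}(I\otimes X^{-1})\qquad(\text{by Example 6.2 and }(2.12)).$$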
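A numerical check of Example 6.4, again building $\partial X^{-1}/\partial X$ from (6.1) by central differences. This is a sketch assuming NumPy, with a well-conditioned test matrix and illustrative helper names.

```python
import numpy as np

def E(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    M = np.zeros(shape)
    M[r, s] = 1.0
    return M

def dY_dX(f, X, h=1e-6):
    """Definition (6.1) with dY/dx_rs estimated by central differences."""
    total = 0
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            Ers = E(r, s, X.shape)
            total = total + np.kron(Ers, (f(X + h * Ers) - f(X - h * Ers)) / (2 * h))
    return total

rng = np.random.default_rng(11)
n = 3
X = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
Xi = np.linalg.inv(X)
I = np.eye(n)
Ubar = sum(np.kron(E(r, s, (n, n)), E(r, s, (n, n))) for r in range(n) for s in range(n))

# Example 6.4: dX^{-1}/dX = -(I (x) X^{-1}) U-bar (I (x) X^{-1})
formula = -np.kron(I, Xi) @ Ubar @ np.kron(I, Xi)
ok = np.allclose(dY_dX(np.linalg.inv, X), formula, atol=1e-7)
print(ok)
```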

Next we determine a rule for the derivative of a Kronecker product of matrices with respect to a matrix, that is, an expression for

$$\frac{\partial(X\otimes Y)}{\partial Z}.$$

The order of the matrix $Y$ is not now restricted; we will consider it to be $(u\times v)$. On representing $X\otimes Y$ by its $(i,k)$th partition $[x_{ik}Y]$ $(i = 1, 2, \ldots, m;\ k = 1, 2, \ldots, n)$, we can write

$$\frac{\partial(X\otimes Y)}{\partial z_{rs}} = \frac{\partial}{\partial z_{rs}}[x_{ik}Y],$$


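where $(r,s)$ are fixed,

$$= \left[\frac{\partial x_{ik}}{\partial z_{rs}}Y\right] + \left[x_{ik}\frac{\partial Y}{\partial z_{rs}}\right] = \frac{\partial X}{\partial z_{rs}}\otimes Y + X\otimes\frac{\partial Y}{\partial z_{rs}}.$$

Hence by (6.1)

$$\frac{\partial(X\otimes Y)}{\partial Z} = \sum_{r,s}E_{rs}\otimes\frac{\partial X}{\partial z_{rs}}\otimes Y + \sum_{r,s}E_{rs}\otimes X\otimes\frac{\partial Y}{\partial z_{rs}},$$

where $E_{rs}$ is of order $(p\times q)$,

$$= \frac{\partial X}{\partial Z}\otimes Y + \sum_{r,s}E_{rs}\otimes\left(X\otimes\frac{\partial Y}{\partial z_{rs}}\right).$$

The summation on the right-hand side is not $X\otimes\partial Y/\partial Z$, as may appear at first sight; nevertheless it can be put into a more convenient form, as a product of matrices. To achieve this aim we make repeated use of (2.8) and (2.11):

$$\sum_{r,s}E_{rs}\otimes\left(X\otimes\frac{\partial Y}{\partial z_{rs}}\right) = \sum_{r,s}\left[I_pE_{rs}I_q\right]\otimes\left[U_1\left(\frac{\partial Y}{\partial z_{rs}}\otimes X\right)U_2\right]\quad\text{by (2.14)}$$

$$= \left[I_p\otimes U_1\right]\left[\sum_{r,s}E_{rs}\otimes\frac{\partial Y}{\partial z_{rs}}\otimes X\right]\left[I_q\otimes U_2\right]\quad\text{by (2.11)}$$

$$= \left[I_p\otimes U_1\right]\left[\frac{\partial Y}{\partial Z}\otimes X\right]\left[I_q\otimes U_2\right]\quad\text{by (6.1)}.$$

Hence

$$\frac{\partial(X\otimes Y)}{\partial Z} = \frac{\partial X}{\partial Z}\otimes Y + \left[I_p\otimes U_1\right]\left[\frac{\partial Y}{\partial Z}\otimes X\right]\left[I_q\otimes U_2\right],\qquad(6.4)$$

where $U_1$ and $U_2$ are permutation matrices of orders $(mu\times mu)$ and $(nv\times nv)$ respectively.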

We illustrate the use of equation (6.4) with a simple example.

Example 6.5 $A = [a_{ij}]$ and $X = [x_{ij}]$ are matrices, each of order $(2\times 2)$. Use (i) equation (6.4), and (ii) a direct method, to evaluate

$$\frac{\partial(A\otimes X)}{\partial X}.$$

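Solution (i) In this example (6.4) becomes

$$\frac{\partial(A\otimes X)}{\partial X} = [I\otimes U_1]\left[\frac{\partial X}{\partial X}\otimes A\right][I\otimes U_2],$$

where $I$ is the unit matrix of order $(2\times 2)$ and

$$U_1 = U_2 = \sum_{r,s}E_{rs}\otimes E_{rs}' = \begin{bmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{bmatrix}.$$

Since

$$\frac{\partial X}{\partial X} = \begin{bmatrix} 1&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 1&0&0&1 \end{bmatrix},$$

only a simple calculation is necessary to obtain the result. It is found that

$$\frac{\partial(A\otimes X)}{\partial X} = \begin{bmatrix} a_{11}&0&a_{12}&0&0&a_{11}&0&a_{12}\\ 0&0&0&0&0&0&0&0\\ a_{21}&0&a_{22}&0&0&a_{21}&0&a_{22}\\ 0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0\\ a_{11}&0&a_{12}&0&0&a_{11}&0&a_{12}\\ 0&0&0&0&0&0&0&0\\ a_{21}&0&a_{22}&0&0&a_{21}&0&a_{22} \end{bmatrix}.$$

(ii) We evaluate

$$Y = A\otimes X = \begin{bmatrix} a_{11}x_{11} & a_{11}x_{12} & a_{12}x_{11} & a_{12}x_{12}\\ a_{11}x_{21} & a_{11}x_{22} & a_{12}x_{21} & a_{12}x_{22}\\ a_{21}x_{11} & a_{21}x_{12} & a_{22}x_{11} & a_{22}x_{12}\\ a_{21}x_{21} & a_{21}x_{22} & a_{22}x_{21} & a_{22}x_{22} \end{bmatrix}$$

and then make use of (6.1) to obtain the above result.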
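Formula (6.4) can be tested with rectangular matrices, which is a stronger check than the $(2\times 2)$ case of Example 6.5. The sketch assumes NumPy and takes the permutation matrices to be $U_1 = T(m, u)$ and $U_2 = T(v, n)$, where the hypothetical helper $T(a, b) = \sum_{i,j}E_{ij}\otimes E_{ji}$ ($E_{ij}$ of order $(a\times b)$) maps $\operatorname{vec} M$ to $\operatorname{vec} M'$ for $M$ of order $(a\times b)$. This choice of $U_1$, $U_2$ is an assumption of the sketch, made so that $A\otimes D = U_1(D\otimes A)U_2$ for a constant $A$ of order $(m\times n)$ and $D$ of order $(u\times v)$; for $2\times 2$ matrices it reduces to the $U_1 = U_2$ shown above.

```python
import numpy as np

def E(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    M = np.zeros(shape)
    M[r, s] = 1.0
    return M

def T(a, b):
    """sum_{i,j} E_ij (x) E_ji with E_ij of order (a x b): maps vec M to vec M' for M (a x b)."""
    return sum(np.kron(E(i, j, (a, b)), E(j, i, (b, a))) for i in range(a) for j in range(b))

def dY_dX(f, X, h=1e-6):
    """Definition (6.1) with dY/dx_rs estimated by central differences."""
    total = 0
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            Ers = E(r, s, X.shape)
            total = total + np.kron(Ers, (f(X + h * Ers) - f(X - h * Ers)) / (2 * h))
    return total

rng = np.random.default_rng(12)
m, n = 2, 3          # constant factor A is (m x n)
u, v = 3, 2          # variable X (also Z in (6.4)) is (u x v), so p, q = u, v
A = rng.standard_normal((m, n))
X = rng.standard_normal((u, v))

Ubar = sum(np.kron(E(r, s, (u, v)), E(r, s, (u, v))) for r in range(u) for s in range(v))  # dX/dX
U1, U2 = T(m, u), T(v, n)          # orders (mu x mu) and (nv x nv)
formula = np.kron(np.eye(u), U1) @ np.kron(Ubar, A) @ np.kron(np.eye(v), U2)   # (6.4), dA/dX = 0
ok = np.allclose(dY_dX(lambda Z: np.kron(A, Z), X), formula)
print(formula.shape, ok)
```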


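6.4 THE CHAIN RULE FOR THE DERIVATIVE OF A MATRIX WITH RESPECT TO A MATRIX

We wish to obtain an expression for

$$\frac{\partial Z}{\partial X},$$

where the matrix $Z$ is a matrix function of a matrix $Y$, which is itself a matrix function of a matrix $X$, that is $Z = Z(Y)$ with $Y = Y(X)$, where

$X = [x_{ij}]$ is of order $(m\times n)$,
$Y = [y_{ij}]$ is of order $(u\times v)$,
$Z = [z_{ij}]$ is of order $(p\times q)$.

By definition (6.1),

$$\frac{\partial Z}{\partial X} = \sum_{r,s}E_{rs}\otimes\frac{\partial Z}{\partial x_{rs}}\qquad(r = 1, 2, \ldots, m;\ s = 1, 2, \ldots, n),$$

where $E_{rs}$ is an elementary matrix of order $(m\times n)$,

$$= \sum_{r,s}E_{rs}\otimes\left[\sum_{i,j}E_{ij}\frac{\partial z_{ij}}{\partial x_{rs}}\right]\qquad(i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, q),$$

where $E_{ij}$ is of order $(p\times q)$. As in section 4.3, we use the chain rule to write

$$\frac{\partial z_{ij}}{\partial x_{rs}} = \sum_{\alpha,\beta}\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}\qquad(\alpha = 1, 2, \ldots, u;\ \beta = 1, 2, \ldots, v).$$

Hence

$$\frac{\partial Z}{\partial X} = \sum_{r,s}\sum_{i,j}\sum_{\alpha,\beta}E_{rs}\otimes E_{ij}\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}} = \sum_{\alpha,\beta}\left[\sum_{r,s}E_{rs}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}\right]\otimes\left[\sum_{i,j}E_{ij}\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\right]\quad(\text{by }(2.5))$$

$$= \sum_{\alpha,\beta}\frac{\partial y_{\alpha\beta}}{\partial X}\otimes\frac{\partial Z}{\partial y_{\alpha\beta}}\quad(\text{by }(4.7)\text{ and }(4.19)).$$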



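If $I_n$ and $I_p$ are unit matrices of orders $(n\times n)$ and $(p\times p)$ respectively, we can write the above as

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X}I_n\right)\otimes\left(I_p\frac{\partial Z}{\partial y_{\alpha\beta}}\right).$$

Hence, by (2.11),

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X}\otimes I_p\right)\left(I_n\otimes\frac{\partial Z}{\partial y_{\alpha\beta}}\right).\qquad(6.5)$$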

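Equation (6.5) can be written in a more convenient form, avoiding the summation, if we define an appropriate notation, a generalisation of the previous one. Since

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1v}\\ y_{21} & y_{22} & \cdots & y_{2v}\\ \vdots & & & \vdots\\ y_{u1} & y_{u2} & \cdots & y_{uv} \end{bmatrix},$$

then

$$(\operatorname{vec} Y)' = [y_{11}\ y_{21}\ \cdots\ y_{u1}\ y_{12}\ \cdots\ y_{1v}\ y_{2v}\ \cdots\ y_{uv}].$$

We will write the partitioned matrix

$$\left[\frac{\partial y_{11}}{\partial X}\otimes I_p : \frac{\partial y_{21}}{\partial X}\otimes I_p : \cdots : \frac{\partial y_{uv}}{\partial X}\otimes I_p\right]$$

as

$$\frac{\partial}{\partial X}(\operatorname{vec} Y)'\otimes I_p\quad\text{or as}\quad\frac{\partial(\operatorname{vec} Y)'}{\partial X}\otimes I_p.$$

Similarly, we write the partitioned matrix

$$\begin{bmatrix} I_n\otimes\dfrac{\partial Z}{\partial y_{11}}\\ I_n\otimes\dfrac{\partial Z}{\partial y_{21}}\\ \vdots\\ I_n\otimes\dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}\quad\text{as}\quad I_n\otimes\frac{\partial Z}{\partial\operatorname{vec} Y}.$$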


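We can write the sum (6.5) in the following order:

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X}\otimes I_p\right]\left[I_n\otimes\frac{\partial Z}{\partial y_{11}}\right] + \left[\frac{\partial y_{21}}{\partial X}\otimes I_p\right]\left[I_n\otimes\frac{\partial Z}{\partial y_{21}}\right] + \cdots + \left[\frac{\partial y_{uv}}{\partial X}\otimes I_p\right]\left[I_n\otimes\frac{\partial Z}{\partial y_{uv}}\right].$$

We can write this as a (partitioned) matrix product

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X}\otimes I_p : \frac{\partial y_{21}}{\partial X}\otimes I_p : \cdots : \frac{\partial y_{uv}}{\partial X}\otimes I_p\right]\begin{bmatrix} I_n\otimes\dfrac{\partial Z}{\partial y_{11}}\\ I_n\otimes\dfrac{\partial Z}{\partial y_{21}}\\ \vdots\\ I_n\otimes\dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}.$$

Finally, using the notations defined above, we have

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial(\operatorname{vec} Y)'}{\partial X}\otimes I_p\right]\left[I_n\otimes\frac{\partial Z}{\partial\operatorname{vec} Y}\right].\qquad(6.6)$$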

We consider a simple example to illustrate the application of the above formula. The example can also be solved by evaluating the matrix $Z$ in terms of the components of the matrix $X$ and then applying the definition (6.1).

Example 6.6 Given the matrices $A = [a_{ij}]$ and $X = [x_{ij}]$, both of order $(2\times 2)$, evaluate $\partial Z/\partial X$, where $Z = Y'Y$ and $Y = AX$,

(i) using (6.6), (ii) using a direct method.

Solution (i) For convenience write (6.6) as

$$\frac{\partial Z}{\partial X} = QR,$$

where

$$Q = \left[\frac{\partial(\operatorname{vec} Y)'}{\partial X}\otimes I_p\right]\quad\text{and}\quad R = \left[I_n\otimes\frac{\partial Z}{\partial\operatorname{vec} Y}\right].$$


From Example 4.8 we know that

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij},$$

so that $Q$ can now be easily evaluated:

$$Q = \left[A'E_{11}\otimes I : A'E_{21}\otimes I : A'E_{12}\otimes I : A'E_{22}\otimes I\right],$$

a matrix of order $(4\times 16)$ whose nonzero entries are elements of $A$.

Also in Example 4.8 we found

$$\frac{\partial Z}{\partial y_{rs}} = E_{rs}'Y + Y'E_{rs},$$

so we can now evaluate

$$R = \begin{bmatrix} I\otimes(E_{11}'Y + Y'E_{11})\\ I\otimes(E_{21}'Y + Y'E_{21})\\ I\otimes(E_{12}'Y + Y'E_{12})\\ I\otimes(E_{22}'Y + Y'E_{22}) \end{bmatrix},$$

a matrix of order $(16\times 4)$ whose nonzero entries are elements of $Y$; for example,

$$E_{11}'Y + Y'E_{11} = \begin{bmatrix} 2y_{11} & y_{12}\\ y_{12} & 0 \end{bmatrix}.$$



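The product of $Q$ and $R$ is the derivative we have been asked to evaluate:

$$QR = \begin{bmatrix} 2a_{11}y_{11} + 2a_{21}y_{21} & a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21}\\ a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21} & 2a_{11}y_{12} + 2a_{21}y_{22}\\ 2a_{12}y_{11} + 2a_{22}y_{21} & a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21}\\ a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21} & 2a_{12}y_{12} + 2a_{22}y_{22} \end{bmatrix}.$$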
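Both parts of Example 6.6 can be reproduced numerically: $Q$ and $R$ are assembled block by block from $\partial y_{\alpha\beta}/\partial X = A'E_{\alpha\beta}$ and $\partial Z/\partial y_{\alpha\beta} = E_{\alpha\beta}'Y + Y'E_{\alpha\beta}$, with $(\alpha,\beta)$ running in vec order, and the product is compared with the direct expression derived in part (ii) below and with the explicit $4\times 4$ matrix above. A sketch assuming NumPy; the random $A$ and $X$ are arbitrary and the helper names are illustrative.

```python
import numpy as np

def E(r, s, shape):
    """Elementary matrix E_rs (0-based indices)."""
    M = np.zeros(shape)
    M[r, s] = 1.0
    return M

def vec_order(shape):
    """(alpha, beta) index pairs in the order of vec: column by column."""
    rows, cols = shape
    return [(a, b) for b in range(cols) for a in range(rows)]

rng = np.random.default_rng(13)
A = rng.standard_normal((2, 2))
X = rng.standard_normal((2, 2))
Y = A @ X
I2 = np.eye(2)

# Q = [d(vec Y)'/dX (x) I_p], using dy_ab/dX = A'E_ab (Example 4.8)
Q = np.hstack([np.kron(A.T @ E(a, b, (2, 2)), I2) for a, b in vec_order(Y.shape)])
# R = [I_n (x) dZ/d vec Y], using dZ/dy_ab = E_ab'Y + Y'E_ab for Z = Y'Y
R = np.vstack([np.kron(I2, E(a, b, (2, 2)).T @ Y + Y.T @ E(a, b, (2, 2)))
               for a, b in vec_order(Y.shape)])
QR = Q @ R                                                # (6.6)

# Part (ii): [sum E_rs (x) E_rs'](I (x) A'Y) + (I (x) Y'A)[sum E_rs (x) E_rs]
U = sum(np.kron(E(r, s, (2, 2)), E(r, s, (2, 2)).T) for r in range(2) for s in range(2))
Ubar = sum(np.kron(E(r, s, (2, 2)), E(r, s, (2, 2))) for r in range(2) for s in range(2))
direct = U @ np.kron(I2, A.T @ Y) + np.kron(I2, Y.T @ A) @ Ubar

# Explicit 4 x 4 result, written with b = A'Y
b = A.T @ Y
closed = np.array([[2 * b[0, 0], b[0, 1], 0, b[0, 0]],
                   [b[0, 1], 0, b[0, 0], 2 * b[0, 1]],
                   [2 * b[1, 0], b[1, 1], 0, b[1, 0]],
                   [b[1, 1], 0, b[1, 0], 2 * b[1, 1]]])
print(np.allclose(QR, direct), np.allclose(QR, closed))
```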

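(ii) By a simple extension of the result of Example 4.6(ii), we find that when $Z = X'A'AX$,

$$\frac{\partial Z}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs} = E_{rs}'A'Y + Y'AE_{rs},\quad\text{where } Y = AX.$$

By (6.1) and (2.11),

$$\frac{\partial Z}{\partial X} = \left[\sum_{r,s}E_{rs}\otimes E_{rs}'\right](I\otimes A'Y) + (I\otimes Y'A)\left[\sum_{r,s}E_{rs}\otimes E_{rs}\right].$$

Since the matrices involved are all of order $(2\times 2)$,

$$\sum_{r,s}E_{rs}\otimes E_{rs}' = \begin{bmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{bmatrix}\quad\text{and}\quad\sum_{r,s}E_{rs}\otimes E_{rs} = \begin{bmatrix} 1&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 1&0&0&1 \end{bmatrix}.$$

On substitution and multiplying out in the above expression for $\partial Z/\partial X$, we obtain the same matrix as in (i).

Problems for Chapter 6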

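(1) Evaluate $\partial Y/\partial X$, given

$$X = \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{bmatrix}$$

and a matrix $Y$ whose entries include $\cos(x_{12} + x_{22})$, $x_{11}x_{21}$ and $x_{12}x_{22}$ (the layout of $Y$ is not fully legible in the source).

(2) The elements of the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$

are all independent. Use a direct method to evaluate $\partial X/\partial X$.

(3) Given a non-singular matrix

$$X = \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{bmatrix},$$

use a direct method to obtain $\partial X^{-1}/\partial X$ and verify the solution to Example 6.4.

(4) The matrices $A = [a_{ij}]$ and $X = [x_{ij}]$ are both of order $(2\times 2)$, and $X$ is non-singular. Use a direct method to evaluate

$$\frac{\partial(A\otimes X^{-1})}{\partial X}.$$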

CHAPTER 7

Some Applications of Matrix Calculus

7.1 INTRODUCTION

As in Chapter 3, where a number of applications of the Kronecker product were considered, in this chapter a number of applications of matrix calculus are discussed. The applications have been selected from a number considered in the published literature, as indicated in the Bibliography at the end of this book. These problems were originally intended for the expert, but by expansion and simplification it is hoped that they will now be appreciated by the general reader.

7.2 THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION IN SCALAR VARIABLES

In this section we consider, very briefly, the Method of Least Squares to obtain a curve or a line of 'best fit', and the Method of Lagrange Multipliers to obtain an extremum of a function subject to constraints. For the least squares method we consider a set of data

$$(x_i, y_i),\qquad i = 1, 2, \ldots, n,\qquad(7.1)$$

and a relationship, usually a polynomial function,

$$y = f(x).\qquad(7.2)$$

For each $x_i$, we evaluate $f(x_i)$ and the residual or deviation

$$e_i = y_i - f(x_i).\qquad(7.3)$$

The method depends on choosing the unknown parameters (the polynomial coefficients when $f(x)$ is a polynomial) so that the sum of the squares of the residuals is a minimum, that is,

$$S = \sum_{i=1}^{n}e_i^2 = \sum_{i=1}^{n}(y_i - f(x_i))^2\ \text{is a minimum}.\qquad(7.4)$$


In particular, when $f(x)$ is a linear function

$$y = a_0 + a_1x,$$

$S(a_0, a_1)$ is a minimum when

$$\frac{\partial S}{\partial a_0} = 0 = \frac{\partial S}{\partial a_1}.\qquad(7.5)$$

These two equations, known as normal equations, determine the two unknown parameters $a_0$ and $a_1$ which specify the line of 'best fit' according to the principle of least squares.

For the second method we wish to determine the extremum of a continuously differentiable function

$$f(x_1, x_2, \ldots, x_n),\qquad(7.6)$$

whose $n$ variables are constrained by $m$ equations of the form

$$g_i(x_1, x_2, \ldots, x_n) = 0,\qquad i = 1, 2, \ldots, m.\qquad(7.7)$$

The method of Lagrange multipliers depends on defining an augmented function

$$f^* = f + \sum_{i=1}^{m}\mu_ig_i,\qquad(7.8)$$

where the $\mu_i$ are known as Lagrange multipliers.

The extremum of $f(x)$ is determined by solving the system of the $(m + n)$ equations

$$\frac{\partial f^*}{\partial x_r} = 0\quad(r = 1, 2, \ldots, n),\qquad g_i = 0\quad(i = 1, 2, \ldots, m)\qquad(7.9)$$

for the $m$ parameters $\mu_1, \mu_2, \ldots, \mu_m$ and the $n$ variables $x$ determining the extremum.

Example 7.1 Given a matrix $A = [a_{ij}]$ of order $(2\times 2)$, determine a symmetric matrix $X = [x_{ij}]$ which is a best approximation to $A$ by the criterion of least squares.

Solution Corresponding to (7.3) we have

$$E = A - X,\quad\text{where } E = [e_{ij}] \text{ and } e_{ij} = a_{ij} - x_{ij}.$$


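The criterion of least squares for this example is to minimise

$$S = \sum_{i,j}e_{ij}^2,$$

which is the equivalent of (7.6) above. The constraint equation is

$$x_{12} - x_{21} = 0,$$

and the augmented function is

$$f^* = \sum_{i,j}(a_{ij} - x_{ij})^2 + \mu(x_{12} - x_{21}).$$

Setting its partial derivatives to zero gives

$$\frac{\partial f^*}{\partial x_{11}} = -2(a_{11} - x_{11}) = 0,\qquad \frac{\partial f^*}{\partial x_{12}} = -2(a_{12} - x_{12}) + \mu = 0,$$

$$\frac{\partial f^*}{\partial x_{21}} = -2(a_{21} - x_{21}) - \mu = 0,\qquad \frac{\partial f^*}{\partial x_{22}} = -2(a_{22} - x_{22}) = 0.$$

This system of 5 equations (including the constraint) leads to the solution

$$\mu = a_{12} - a_{21},\qquad x_{11} = a_{11},\quad x_{22} = a_{22},\quad x_{12} = x_{21} = \tfrac{1}{2}(a_{12} + a_{21}).$$

Hence

$$X = \begin{bmatrix} a_{11} & \frac{a_{12} + a_{21}}{2}\\ \frac{a_{12} + a_{21}}{2} & a_{22} \end{bmatrix} = \frac{1}{2}\left(\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} a_{11} & a_{21}\\ a_{12} & a_{22} \end{bmatrix}\right) = \tfrac{1}{2}(A + A').$$

7.3 PROBLEM 1 - MATRIX CALCULUS APPROACH TO THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION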

If we can express the residuals in the form of a matrix $E$, as in Example 7.1, then the sum of the squared residuals is

$$S = \operatorname{tr}E'E.\qquad(7.10)$$


The criterion of the least squares method is to minimise (7.10) with respect to the parameters involved.

The constrained optimisation problem then takes the form of finding the matrix $X$ such that the scalar matrix function $S = f(X)$ is minimised subject to constraints on $X$ of the form

$$G(X) = 0 \tag{7.11}$$

where $G = [g_{ij}]$ is a matrix of order $(s \times t)$, and $s$ and $t$ depend on the number of constraints $g_{ij}$ involved.

As for the scalar case, we use Lagrange multipliers to form an augmented matrix function $f^*(X)$. Each constraint $g_{ij}$ is associated with a parameter (Lagrange multiplier) $\mu_{ij}$. Since

$$\sum_{i,j} \mu_{ij} g_{ij} = \operatorname{tr} U'G, \quad \text{where } U = [\mu_{ij}],$$

we can write the augmented scalar matrix function as

$$f^*(X) = \operatorname{tr} E'E + \operatorname{tr} U'G \tag{7.12}$$

which is the equivalent of (7.8). To find the optimal $X$, we must solve the system of equations

$$\frac{\partial f^*}{\partial X} = 0. \tag{7.13}$$
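The two trace identities behind (7.12), $\operatorname{tr} E'E = \sum e_{ij}^2$ and $\operatorname{tr} U'G = \sum \mu_{ij} g_{ij}$, are easy to confirm numerically. This minimal Python sketch (plain lists, no external libraries) evaluates $f^*(X)$ both ways for the symmetry constraint $G(X) = X - X'$, which is the constraint of part (i) of the Problem that follows. The function names and sample matrices are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def symmetry_constraint(X):
    """G(X) = X - X'."""
    return [[x - xt for x, xt in zip(r, rt)] for r, rt in zip(X, transpose(X))]


def f_star_trace(A, X, U, G):
    """f*(X) = tr E'E + tr U'G with E = A - X, as in (7.12)."""
    E = [[a - x for a, x in zip(ra, rx)] for ra, rx in zip(A, X)]
    return trace(matmul(transpose(E), E)) + trace(matmul(transpose(U), G(X)))


def f_star_sums(A, X, U, G):
    """The same quantity as explicit sums: sum e_ij^2 + sum mu_ij g_ij."""
    squares = sum((a - x) ** 2 for ra, rx in zip(A, X) for a, x in zip(ra, rx))
    penalty = sum(u * g for ru, rg in zip(U, G(X)) for u, g in zip(ru, rg))
    return squares + penalty


A = [[1, 2], [3, 4]]
X = [[0, 1], [5, 2]]
U = [[1, -1], [2, 3]]
print(f_star_trace(A, X, U, symmetry_constraint), f_star_sums(A, X, U, symmetry_constraint))
# prints: 22 22
```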

Problem

Given a non-singular matrix $A = [a_{ij}]$ of order $(n \times n)$, determine a matrix $X = [x_{ij}]$ which is a least squares approximation to $A$

(i) when $X$ is a symmetric matrix,
(ii) when $X$ is an orthogonal matrix.

Solution

(i) The problem was solved in Example 7.1 when $A$ and $X$ are of order $(2 \times 2)$. With the terminology defined above, we write

$$E = A - X, \qquad G(X) = X - X' = 0,$$

so that $G$, and hence $U$, are both of order $(n \times n)$.



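Equation (7.12) becomes

$$f^* = \operatorname{tr} A'A - \operatorname{tr} A'X - \operatorname{tr} X'A + \operatorname{tr} X'X + \operatorname{tr} U'X - \operatorname{tr} U'X'.$$

We now make use of the results, in modified form if necessary, of Examples 5.4 and 5.5 to obtain

$$\frac{\partial f^*}{\partial X} = -2A + 2X + U - U' = 0 \quad \text{for} \quad X = A + \frac{U' - U}{2}.$$

Then

$$X' = A' + \frac{U - U'}{2},$$

and since $X = X'$, adding the two expressions gives $2X = A + A'$, so we finally obtain

$$X = \tfrac{1}{2}(A + A').$$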
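The result $X = \tfrac{1}{2}(A + A')$ can be spot-checked numerically. The residual $A - X = \tfrac{1}{2}(A - A')$ is skew-symmetric, so no symmetric perturbation of $X$ should reduce $S = \operatorname{tr} E'E$. The Python sketch below tries random symmetric perturbations of a random $(4 \times 4)$ matrix. It is an illustration under stated assumptions (random test data, fixed seeds, hypothetical helper names), not a proof.

```python
import random


def symmetric_part(A):
    """X = (A + A')/2."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]


def sum_sq_residuals(A, X):
    """S = tr E'E = sum of (a_ij - x_ij)^2."""
    return sum((a - x) ** 2 for ra, rx in zip(A, X) for a, x in zip(ra, rx))


def random_symmetric(n, rng):
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            H[i][j] = H[j][i] = rng.uniform(-1.0, 1.0)
    return H


def worst_improvement(A, trials=200, seed=0):
    """Largest decrease in S over random symmetric perturbations of X (should be <= 0)."""
    rng = random.Random(seed)
    n = len(A)
    X = symmetric_part(A)
    best = sum_sq_residuals(A, X)
    worst = float("-inf")
    for _ in range(trials):
        H = random_symmetric(n, rng)
        t = rng.uniform(-1.0, 1.0)
        Y = [[x + t * h for x, h in zip(rx, rh)] for rx, rh in zip(X, H)]
        worst = max(worst, best - sum_sq_residuals(A, Y))
    return worst


rng = random.Random(42)
A = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(4)]
print(worst_improvement(A))  # a value <= 0 up to rounding
```

Because $\operatorname{tr}[(A - X)'H] = 0$ for every symmetric $H$, each trial changes $S$ by exactly $t^2 \operatorname{tr} H'H \ge 0$, which is why the printed value is never positive apart from rounding.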

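(ii) This time

$$G(X) = X'X - I = 0.$$

Hence

$$f^* = \operatorname{tr}[A' - X'][A - X] + \operatorname{tr} U'[X'X - I]$$

so that

$$\frac{\partial f^*}{\partial X} = -2A + 2X + X[U + U'] = 0 \quad \text{for} \quad X = A - \frac{X[U + U']}{2}.$$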
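The derivative in part (ii) rests on $\partial \operatorname{tr}(U'X'X)/\partial X = X[U + U']$. A central-difference check in Python makes this step concrete. Since $f^*$ is quadratic in the entries of $X$, the central difference agrees with the exact gradient up to rounding. The helper names, step size `h` and random test matrices are assumptions made for illustration.

```python
import random


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def f_star(A, X, U):
    """f* = tr[(A - X)'(A - X)] + tr U'[X'X - I]."""
    n = len(X)
    E = [[a - x for a, x in zip(ra, rx)] for ra, rx in zip(A, X)]
    XtX = matmul(transpose(X), X)
    G = [[XtX[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return trace(matmul(transpose(E), E)) + trace(matmul(transpose(U), G))


def analytic_gradient(A, X, U):
    """-2A + 2X + X[U + U'], as derived in the text."""
    n = len(X)
    S = [[U[i][j] + U[j][i] for j in range(n)] for i in range(n)]
    XS = matmul(X, S)
    return [[-2 * A[i][j] + 2 * X[i][j] + XS[i][j] for j in range(n)] for i in range(n)]


def numeric_gradient(A, X, U, h=1e-5):
    """Central differences, perturbing one entry x_ij at a time."""
    n = len(X)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            grad[i][j] = (f_star(A, Xp, U) - f_star(A, Xm, U)) / (2 * h)
    return grad


rng = random.Random(1)
A, X, U = ([[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3))
diff = max(abs(g - d)
           for rg, rd in zip(analytic_gradient(A, X, U), numeric_gradient(A, X, U))
           for g, d in zip(rg, rd))
print(diff)  # close to zero (rounding error only)
```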

Premultiplying by $X'$ and using the condition $X'X = I$, we obtain

$$X'A = I + \frac{U + U'}{2}$$

and, on transposing (the right-hand side is symmetric),

$$A'X = I + \frac{U + U'}{2}.$$

Hence

$$A'X = X'A. \tag{7.14}$$

If a solution to (7.14) exists, there are various ways of solving this matrix equation.


For example, with the help of (2.13) and Example 2.7, we can write it as

$$[(I \otimes A') - (A' \otimes I)U]\,x = 0 \tag{7.15}$$

where $U$ is a permutation matrix (see (2.24)) and

$$x = \operatorname{vec} X.$$

We have now reduced the matrix equation to a system of homogeneous equations which can be solved by a standard method. If a non-trivial solution to (7.15) does exist, it is not unique, and we must scale it appropriately for $X$ to be orthogonal.

There may, of course, be more than one linearly independent solution to (7.15). We must choose the solution corresponding to X being an orthogonal matrix.
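Equation (7.15) follows from $\operatorname{vec}(A'X) = (I \otimes A')\operatorname{vec} X$ and $\operatorname{vec}(X'A) = (A' \otimes I)\operatorname{vec} X'$, with the permutation matrix $U$ carrying $\operatorname{vec} X$ into $\operatorname{vec} X'$. The Python sketch below builds the coefficient matrix of (7.15) from scratch and checks that it reproduces $\operatorname{vec}(A'X - X'A)$. Here `vec` stacks columns, and `kron`, `commutation` and `system_matrix` are hypothetical helpers, not a library API.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(P, Q):
    """Kronecker product P (x) Q: block (i, j) is p_ij * Q."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def vec(X):
    """Stack the columns of X into one vector."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def commutation(m, n):
    """Permutation matrix U with vec(X') = U vec(X) for an (m x n) matrix X."""
    U = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            U[i * n + j][j * m + i] = 1
    return U


def system_matrix(A):
    """(I (x) A') - (A' (x) I) U, the coefficient matrix of (7.15)."""
    n = len(A)
    At, I = transpose(A), identity(n)
    left = kron(I, At)
    right = matmul(kron(At, I), commutation(n, n))
    return [[l - r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


A = [[1, 2], [-1, 1]]
X = [[3, 1], [0, 2]]
lhs = matvec(system_matrix(A), vec(X))
At = transpose(A)
rhs = vec([[p - q for p, q in zip(r1, r2)]
           for r1, r2 in zip(matmul(At, X), matmul(transpose(X), A))])
print(lhs == rhs)  # True
```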

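Example 7.2

Given

$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix},$$

find the orthogonal matrix $X$ which is the least squares best approximation to $A$.

Solution

$$[I \otimes A'] = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 2 & 1 \end{bmatrix} \quad \text{and} \quad [A' \otimes I]U = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$

Equation (7.15) can now be written as

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 1 & -1 & 1 \\ -2 & -1 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} x = 0.$$

There are 3 non-trivial (linearly independent) solutions (see [18] p. 131). They are

$$x = [1\ \ {-2}\ \ 1\ \ 1]', \qquad x = [1\ \ 1\ \ 2\ \ {-1}]' \qquad \text{and} \qquad x = [2\ \ {-3}\ \ 3\ \ 2]'.$$

Only the last solution leads to an orthogonal matrix $X$; after scaling, it is

$$X = \frac{1}{\sqrt{13}} \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.$$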
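The claims in Example 7.2 can be verified directly. Each listed $x$ should satisfy (7.15), and only the last should give a matrix $X'X = cI$ with $c > 0$, so that dividing by $\sqrt{c}$ makes $X$ orthogonal. This is a minimal Python sketch using exact integer arithmetic. `unvec` (the inverse of column stacking) and `scaled_orthogonal_factor` are hypothetical helper names.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def unvec(x, n):
    """Rebuild an (n x n) matrix from its stacked columns."""
    return [[x[j * n + i] for j in range(n)] for i in range(n)]


def scaled_orthogonal_factor(X):
    """Return c if X'X = c I with c > 0 (so X / sqrt(c) is orthogonal), else None."""
    G = matmul(transpose(X), X)
    n = len(G)
    c = G[0][0]
    ok = c > 0 and all(G[i][j] == (c if i == j else 0) for i in range(n) for j in range(n))
    return c if ok else None


A = [[1, 2], [-1, 1]]
# Coefficient matrix of (7.15) for this A, as printed in the example.
M = [[0, 0, 0, 0],
     [2, 1, -1, 1],
     [-2, -1, 1, -1],
     [0, 0, 0, 0]]
solutions = [[1, -2, 1, 1], [1, 1, 2, -1], [2, -3, 3, 2]]

for x in solutions:
    assert all(sum(m * v for m, v in zip(row, x)) == 0 for row in M)  # x solves (7.15)
    print(x, scaled_orthogonal_factor(unvec(x, 2)))

X = unvec(solutions[2], 2)
AtX = matmul(transpose(A), X)
assert AtX == transpose(AtX)  # A'X = X'A, i.e. (7.14)
```

Only the third vector yields $c = 13$, matching the factor $1/\sqrt{13}$ above.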


7.4 PROBLEM 2 - THE GENERAL LEAST SQUARES PROBLEM

The linear regression problem presents itself in the following form: $N$ samples from a population are considered. The $i$th sample consists of an observation from a variable $Y$ and observations from variables $X_1, X_2, \ldots, X_n$ (say).

We assume a linear relationship between the variables. If the variables are measured from zero, the relationship is of the form

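$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_n x_{in} + e_i \qquad (i = 1, 2, \ldots, N). \tag{7.16}$$

If the observations are measured from their means over the $N$ samples, then

$$y_i = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_n x_{in} + e_i \qquad (i = 1, 2, \ldots, N). \tag{7.17}$$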

$b_0, b_1, b_2, \ldots, b_n$ are estimated parameters and $e_i$ is the corresponding residual. In matrix notation we can write the above equations as

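$$y = Xb + e \tag{7.18}$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix} \text{ or } \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}$$

and

$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1n} \\ 1 & x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix} \quad \text{or} \quad X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix}$$

for (7.16) and (7.17) respectively.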

As already indicated, the 'goodness of fit' criterion is the minimisation, with respect to the parameters $b$, of the sum of the squares of the residuals, which in this case is

$$S = e'e = (y' - b'X')(y - Xb).$$

Making use of the results in table (4.4), we obtain

$$\frac{\partial (e'e)}{\partial b} = -(y'X)' - X'y + (X'Xb + X'Xb) = -2X'y + 2X'Xb = 0 \quad \text{for} \quad X'X\hat{b} = X'y, \tag{7.19}$$

where $\hat{b}$ is the least squares estimate of $b$. If $X'X$ is non-singular, we obtain from (7.19)

$$\hat{b} = (X'X)^{-1} X'y. \tag{7.20}$$
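In computation, (7.20) is usually applied by solving the normal equations $X'X\hat{b} = X'y$ rather than by forming $(X'X)^{-1}$ explicitly. The Python sketch below does this with Gaussian elimination and confirms that the fitted residuals satisfy $X'e = 0$. The data set is hypothetical and the helper names are illustrative. For badly conditioned $X'X$, a QR factorisation of $X$ is generally the more stable choice.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(M, rhs):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("X'X is exactly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b


def least_squares(X, y):
    """b_hat solving X'X b = X'y, i.e. (7.19)-(7.20)."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matvec(Xt, y))


# Hypothetical data: N = 4 samples, a column of ones for b_0 and one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 2, 2, 4]
b_hat = least_squares(X, y)
e = [yi - fi for yi, fi in zip(y, matvec(X, b_hat))]
print(b_hat, matvec(transpose(X), e))  # X'e should be (numerically) zero
```

For this data $\hat{b} \approx [0.9\ \ 0.9]'$.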


We can write (7.19) as

$$X'(y - X\hat{b}) = 0 \quad \text{or} \quad X'e = 0, \tag{7.21}$$

which is the matrix form of the normal equations defined in Section 7.2.

Example 7.3

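Obtain the normal equations for a least squares approximation when each sample consists of one observation from $Y$ and one observation from

(i) a random variable $X$,
(ii) two random variables $X$ and $Z$.

Solution

(i)

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},$$

hence

$$X'[y - Xb] = \begin{bmatrix} \sum y_i - b_1 N - b_2 \sum x_i \\ \sum x_i y_i - b_1 \sum x_i - b_2 \sum x_i^2 \end{bmatrix} = 0,$$

so that the normal equations are

$$\sum y_i = b_1 N + b_2 \sum x_i \quad \text{and} \quad \sum x_i y_i = b_1 \sum x_i + b_2 \sum x_i^2.$$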
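The two normal equations of part (i) can be solved in closed form (for instance by Cramer's rule), giving the intercept $b_1$ and slope $b_2$ of the fitted line. Below is a Python sketch using exact `Fraction` arithmetic, with the same notation $b_1$, $b_2$. The function name and data are illustrative assumptions, and the inputs are assumed to be integers so that `Fraction` accepts them.

```python
from fractions import Fraction


def normal_equations_line(xs, ys):
    """Solve  sum y = b1 N + b2 sum x  and  sum xy = b1 sum x + b2 sum x^2  exactly.

    xs and ys are assumed to hold integers (or Fractions).
    """
    N = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = N * sxx - sx * sx  # equals N * sum (x_i - mean)^2, zero only if all x_i are equal
    if det == 0:
        raise ValueError("all x_i are equal; the line is not determined")
    b1 = Fraction(sy * sxx - sx * sxy, det)  # intercept
    b2 = Fraction(N * sxy - sx * sy, det)    # slope
    return b1, b2


b1, b2 = normal_equations_line([0, 1, 2, 3], [1, 2, 2, 4])
print(b1, b2)  # 9/10 9/10
```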

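(ii) In this case

$$X = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & z_N \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$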

The normal equations are
