Lecture Notes in Control and Information Sciences
Edited by M. Thoma and A. Wyner

86

Time Series and Linear Systems

Edited by S. Bittanti
Springer-Verlag Berlin Heidelberg New York London Paris Tokyo
Series Editors M. Thoma • A. Wyner
Advisory Board
L. D. Davisson · A. G. J. MacFarlane · H. Kwakernaak · J. L. Massey · Ya. Z. Tsypkin · A. J. Viterbi

Editor
Sergio Bittanti
Dipartimento di Elettronica
Politecnico di Milano
Piazza Leonardo da Vinci 32
20133 Milano (Italy)

ISBN 3-540-16903-2 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-16903-2 Springer-Verlag New York Berlin Heidelberg

Library of Congress Cataloging in Publication Data
Time series and linear systems. (Lecture notes in control and information sciences; 86) Includes bibliographies. 1. Time-series analysis. 2. Linear systems. I. Bittanti, Sergio. II. Series. QA280.T558 1986 519.5'5 86-20244 ISBN 0-387-16903-2 (U.S.)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich.

© Springer-Verlag Berlin, Heidelberg 1986
Printed in Germany
Offsetprinting: Color-Druck, G. Baucke, Berlin
Binding: B. Helm, Berlin
2161/3020-543210
PREFACE
Over the past five years, the Politecnico di Milano (Italy) has been visited by a stream of specialists of different backgrounds, contributing to setting up a workshop activity in the methodology of modelling and identification of linear systems by means of time series analysis. Several series of talks and introductory courses on advanced topics have been given by these visitors, with contributions from Time Series Analysis, Statistics, Econometrics, Numerical Analysis, and System and Control Theory. The train of ideas underlying this activity was to develop a system-theoretic point of view for the art of modelling. This book is a partial report of such an activity. Its chapters are extended papers overviewing advanced topics of current interest in the field; they constitute useful introductions to the corresponding research directions and also point to trends and perspectives.

The book is organized as follows. The first chapter is an introduction to the problem of finding a suitable model for the data at hand. The problem of modelling an observed time series is interpreted as the problem of approximating the infinite Hankel matrix of the impulse response coefficients with a Hankel matrix of finite rank, and the use of criteria such as AIC or BIC for determining the best rational transfer function approximant is critically discussed.

Linear systems in which all observed variables are subject to errors are studied in the second chapter. Among other things, the role played by causality assumptions in the analysis of such systems is discussed; the motivation is that prejudicial assumptions can then be avoided.

A new class of dynamic models for time series, based on the classical Factor Analysis approach and strictly related to the errors-in-variables models of the second chapter, is proposed in the third chapter.

The fourth chapter is devoted to the so-called Minimum Description Length approach to the modelling of stochastic systems. A model is judged by the number of binary digits with which it permits one to encode the observed data, and the complexity of the data is the shortest number of binary digits with which the data can be encoded.

Chapter 5 deals with linear periodic systems, i.e. systems with periodically time-varying coefficients, which can be used in the analysis of time series to describe seasonal trends of the data. The structural properties of these systems, such as reachability, stabilizability and so on, are considered in both the deterministic and the stochastic case.

Some numerical problems in linear system theory are considered in the sixth chapter. An overview of the basic properties of the LU, QR, Schur and Singular Value Decomposition algorithms is provided. Then the problem of computing the controllable subspace of a time-invariant system is touched upon.

The last chapter is devoted to the discussion of some recent trends in Econometrics.

The volume can be used either as a textbook for monographic courses on the subject or as a reference book in the field, providing an overview of some recent research directions by the contributing authors.

The editor expresses his sincere acknowledgment to the authors for their most valuable contributions, as well as for their care and patience in the preparation of the manuscripts. The support of the Centro di Teoria dei Sistemi of the National Research Council (C.N.R.) and that of the Ministry of Education (M.P.I.) is gratefully acknowledged.

Sergio Bittanti
ABSTRACTS
Chapter 1
TIME SERIES AND STOCHASTIC MODELS
by E.J. Hannan

The basic concept of this paper is a linear system wherein an output y(t), of q components, is related to an input, u(t), of p components, via a relation

    y(t) = Σ_{i=0}^∞ W_i e(t-i) + Σ_{i=1}^∞ L_i u(t-i),

wherein the e(t) are the linear prediction errors,

    e(t) = y(t) - Σ_{i=1}^∞ W_i e(t-i) - Σ_{i=1}^∞ L_i u(t-i).

The methods of the paper are substantially valid when the system is truly linear, wherein prediction is optimal in the linear sense, but may prove useful over a much wider range. To bring the problem back to reasonable proportions, statistical methods are based on the approximation of the true structure by one wherein the matrix functions W(z) = Σ W_i z^{-i}, L(z) = Σ L_i z^{-i} are approximated by matrices of rational functions. A brief discussion of some basic theory relating to such an approximation is given. It is necessary, in the approximation, to choose the "order" of the approximant, i.e. effectively the maximum lags in the ARMAX model,

    Σ_{i=0}^h A_i y(t-i) = Σ_{i=1}^h B_i u(t-i) + Σ_{i=0}^h C_i e(t-i),

to which the rational approximant corresponds. Various algorithms that are basic in time series analysis are described and are then used to effect a solution to the problem of finding a suitable transfer function approximant. The main algorithm is a Gauss-Newton algorithm in which the order is redetermined by a calculation at each iteration. Finally, on-line recursive implementations of the algorithm are presented for the case where y(t) is scalar.
Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS
by M. Deistler

Linear errors-in-variables (EV) systems, i.e. linear systems where all observed variables are subject to errors, are considered. The statistical analysis of such systems turns out to be significantly more complicated compared to conventional (e.g. ARMAX) systems. A good part of these complications arises from the fact that in the EV case, in general, the transfer function of the system is not uniquely determined from the (ensemble) second moments of the observations.

The paper is organized as follows: In section 2 some well known results concerning the static case are restated. In sections 3-5 the information about the transfer function contained in the second moments of the observations is analysed: In section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal, and with the problem whether causality can be detected from the second moments of the observations. In section 5 conditions for identifiability from the second moments are derived. Section 6 deals with conditions for identifiability using the information coming from moments of order greater than two.
Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES
by G. Picci and S. Pinzoni

A new class of dynamic models for stationary time series is presented. They are a natural generalization of the well-known linear Factor Analysis Models widely used in Statistics and Psychometrics. It is shown in this note that the introduction of Factor Analysis models of multivariate time series serves to some extent to clarify the mathematical structure of (and to subsume the simple schemes of) the Dynamic Errors-In-Variables Models discussed in the recent literature. They also provide schemes for the identification of time series which avoid the unjustified a priori causality assumptions required by conventional models, as for example ARMAX models.
Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES
by J. Rissanen

This chapter presents in a tutorial manner the basic ideas behind the recently developed Minimum Description Length principles, both the predictive and the nonpredictive ones. Briefly, a statistical model is judged by the number of binary digits with which it permits one to encode the observed data, and the stochastic complexity of the data, relative to the considered class of models, is defined to be the shortest code length with which this can be done. Hence, we may say that the fundamental problem in statistical estimation and modeling is to calculate the stochastic complexity and the associated optimal model. The calculation involves a tight lower bound for the code length of the data in a class of stochastic models. Depending on how the available knowledge about the parameters can be extracted from the data and incorporated in the coding scheme, two kinds of complexities are defined, with the associated predictive and nonpredictive principles; for large samples both tend to the same value. As applications we describe the calculation of the stochastic complexity for the gaussian ARMA class of models, both in the single and the multiple input/output case. We illustrate with simulations the consistency both of the number of the parameters and of their estimates. We also describe how the prior knowledge about the parameters, as represented by their estimated values, can be taken advantage of. The feasibility of the ideas is demonstrated by simulations.
Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS
by S. Bittanti

The main results concerning the structural properties of linear periodic systems are reviewed. Both continuous-time and discrete-time systems are dealt with. By a comparison with time-invariant systems, five structural properties are discussed. Three of them are basic properties concerning the reachability and controllability subspaces. The fourth one concerns the length of the time interval required to perform the reachability and controllability transition. The modal (spectral) characterizations are presented as the fifth property. The extended structural properties (i.e. stabilizability and detectability) are also dealt with. Finally, periodic stochastic systems are considered. The existence of a cyclostationary solution is investigated by analyzing the appropriate periodic Lyapunov equation.
Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY
by D. Boley and S. Bittanti

We discuss some numerical aspects in linear system theory. We start by showing the numerical algorithms used to solve systems of linear equations and non-degenerate least squares problems. We then move on to an introduction to more sophisticated matrix decompositions, used to solve more sophisticated problems, and introduce the concept of backward error analysis (Wilkinson, 1965). Among the decompositions we introduce:

  Name                                    Form       Used to obtain
  LU (Gaussian elimination)               A = LU     solution of linear equations; determinant
  QR (orthogonal triangularization)       A = QR     solution to least squares problems (linear,
                                                     nondegenerate); solution to linear equations
                                                     without need to pivot
  Schur                                   A = QRQ'   eigenvalues/eigenvectors
  Singular Value Decomposition (S.V.D.)   A = PΣQ'   singular values; rank; distance to
                                                     singularity; 2-norm of matrix; 2-norm
                                                     condition number

where P, Q denote orthogonal matrices, U, R upper triangular matrices, L lower triangular matrices, and Σ is non-negative diagonal.

In the last section we discuss some recent numerical methods in linear system theory. The attention is focused on the problem of computing the controllable subspace of a time-invariant linear system. It is shown how some classical methods lead to numerical problems, and results giving bounds on the errors of these methods are presented.
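As an illustration of how the four decompositions tabulated above are obtained in present-day numerical software (a sketch of ours, not part of the chapter; the library routines, test matrix and tolerance are assumptions):

    # Sketch: the four decompositions of the table, via NumPy/SciPy.
    import numpy as np
    from scipy import linalg

    A = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.5]])

    P, L, U = linalg.lu(A)        # Gaussian elimination: A = P L U
    Q, R = linalg.qr(A)           # orthogonal triangularization: A = Q R
    T, Z = linalg.schur(A)        # Schur form: A = Z T Z'
    U2, s, Vt = linalg.svd(A)     # SVD: A = U2 diag(s) Vt

    det_from_lu = linalg.det(P) * np.prod(np.diag(U))  # determinant via LU
    rank = int(np.sum(s > 1e-12))                      # rank via singular values
    cond_2norm = s[0] / s[-1]                          # 2-norm condition number
    dist_to_singularity = s[-1]                        # smallest singular value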
Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS
by M. McAleer and M. Deistler

In this paper we discuss some of the main recent developments in econometrics: in particular, methods for specification search, diagnostic checking and specification testing; macroeconomic modelling and forecasting; and some models associated with empirical microeconomics.
AUTHORS
Sergio Bittanti
Dipartimento di Elettronica
Politecnico di Milano
Piazza Leonardo da Vinci, 32
20133 MILANO
ITALY

Daniel Boley
Department of Computer Science
University of Minnesota
136 Lind Hall, 207 Church Street S.E.
MINNEAPOLIS, Minnesota 55455
U.S.A.

Manfred Deistler
Institut für Ökonometrie und Operations Research
Technische Universität Wien
Argentinierstrasse 8/119
A-1040 WIEN
AUSTRIA

Edward J. Hannan
Department of Statistics, Mathematical Sciences Bldg.
The Australian National University
GPO Box 4
CANBERRA, ACT 2601
AUSTRALIA

Michael J. McAleer
Department of Statistics, The Faculties
The Australian National University
GPO Box 4
CANBERRA, ACT 2601
AUSTRALIA

Giorgio Picci
Istituto di Elettrotecnica ed Elettronica
Università di Padova
Via Gradenigo 6/A
35131 PADOVA
ITALY

Stefano Pinzoni
LADSEB-CNR
Corso Stati Uniti 4
35020 PADOVA
ITALY

Jorma Rissanen
IBM Research
650 Harry Road
SAN JOSE, CA 95193
U.S.A.
TABLE OF CONTENTS

Chapter 1  TIME SERIES AND STOCHASTIC MODELS
           by E.J. Hannan                                              1
  1. Introduction                                                      1
  2. Some Basic Algorithms                                             4
  3. Approximation Criteria                                            8
  4. Rational Transfer Function Approximation                         12
  5. A Gauss-Newton Procedure                                         16
  6. Some Theoretical Considerations                                  28
  References                                                          34

Chapter 2  LINEAR ERRORS-IN-VARIABLES SYSTEMS
           by M. Deistler                                             37
  1. Introduction                                                     37
  2. The Static Case                                                  41
  3. Second Moments and Dynamic Models: the General Case              48
  4. Causality                                                        52
  5. Conditions for Identifiability from the Second Moments
     of the Observations                                              58
  6. Identifiability from High Order Moments                          63
  References                                                          66

Chapter 3  A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES
           by G. Picci and S. Pinzoni                                 69
  1. Introduction                                                     69
  2. Dynamic Factor Analysis Models                                   80
  3. Stochastic Realization                                           87
  4. Causality                                                       104
  References                                                         112

Chapter 4  PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH
           PRINCIPLES
           by J. Rissanen                                            115
  1. Introduction                                                    115
  2. Coding and Prediction                                           120
  3. ARMA Estimation and Prediction                                  125
  4. Vector Time Series Models                                       131
  References                                                         137

Chapter 5  DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS
           by S. Bittanti                                            141
  1. Introduction                                                    141
  2. Structural Properties of Continuous-time Periodic Systems       143
     2.1 Continuous-time Linear Periodic Systems                     143
     2.2 Structural Properties                                       145
     2.3 Grammian Matrices                                           146
     2.4 Five Structural Properties of Time-invariant Systems        146
     2.5 Five Structural Properties of Continuous-time Periodic
         Systems                                                     148
  3. Structural Properties of Discrete-time Periodic Systems         156
     3.1 Discrete-time Linear Periodic Systems                       156
     3.2 Structural Properties                                       158
     3.3 Grammian Matrices                                           158
     3.4 Five Structural Properties of Discrete-time Periodic
         Systems                                                     159
  4. Kalman Canonical Decomposition                                  163
  5. Extended Structural Properties                                  165
  6. Stochastic Linear Periodic Systems                              168
  References                                                         176

Chapter 6  NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY
           by D. Boley and S. Bittanti                               183
  1. Introduction                                                    183
  2. Review of Simpler Computational Methods                         183
     2.1 LU Decomposition                                            183
     2.2 Orthogonal Decomposition                                    188
         2.2.1 QR Decomposition                                      188
         2.2.2 Geometric Interpretation of a Rotation                191
         2.2.3 QR Decomposition by Householder Decompositions        192
         2.2.4 Solving Least Squares Problems Using Orthogonal
               Decompositions                                        194
  3. Special Forms Used in Numerical Linear Algebra - Why            196
     3.1 The Jordan Canonical Form                                   196
     3.2 Numerical Conditioning of a Problem                         197
  4. Schur Decomposition                                             199
  5. Singular Value Decomposition - Condition Number of a Matrix     201
  6. Applications of the Previous Decompositions to Linear Systems   211
  References                                                         220

Chapter 7  SOME RECENT DEVELOPMENTS IN ECONOMETRICS
           by M. McAleer and M. Deistler                             222
  1. Introduction                                                    222
  2. Specification and Quality Control of a Model                    226
     2.1 Model Specification                                         227
     2.2 Tight and Loose Specifications                              228
     2.3 Principles for Testing                                      231
     2.4 Diagnostic Testing                                          232
         2.4.1 Serial Correlation                                    232
         2.4.2 Heteroscedasticity                                    233
         2.4.3 Exogeneity                                            234
         2.4.4 Functional Form                                       234
         2.4.5 Parameter Constancy                                   235
         2.4.6 Non-nested Alternatives                               235
  3. Macroeconomic Modelling and Forecasting                         236
  4. Microeconometrics                                               240
  References                                                         241
Chapter 1

Time Series and Stochastic Models

E.J. Hannan

1. Introduction
This chapter will be concerned with procedures for analysing data, y(t), t = 1,2,...,T, where y(t) is a vector of q components that can be thought of as the output of some system to which the input is u(t), an observed vector of p components. The situation held in mind is one where no very precise information is available about the system, and the description will be on the basis of models of such generality that experience suggests will suffice for a good explanation. This will be further discussed below. These models will always be stochastic.

Let us begin by considering y(t), alone, as generated by a stationary stochastic process with finite mean square, E{y_j(t)²} < ∞, j = 1,2,...,q, where y_j(t) is the j'th component of y(t). It is costless to require the process to be ergodic, since only one history or realization of the process is ever seen, and it is reasonable to require that it be purely non-deterministic, so that there is no influence from the indefinitely far past, or rather so that such effects, if there are any, can come only through quantities such as the mean of y(t), as with diurnal or seasonal movements. Such periodic components could first be removed by regression, and it will be assumed that this has already been done, so that, for example, all calculations will be with the mean corrected quantities y(t) - ȳ, ȳ = (1/T) Σ_{t=1}^T y(t), y(t) being understood as the residual from such an adjustment. This makes notation simpler.

Any such stationary, purely non-deterministic process can be analysed, at least in part, through its spectrum, f(ω), a q × q matrix valued function satisfying

    f(ω) = f(ω)* = f(-ω)',   Γ(t) = E{y(s)y(s+t)'} = ∫_{-π}^{π} e^{itω} f(ω) dω.
We shall not discuss Fourier methods in any detail because the main methods of this paper are different. Here "finite parameter" models are emphasised, in contrast to Fourier methods, which are, essentially, non-parametric and in which the generality is reduced to manageable proportions by smoothness requirements for f(ω). These finite parameter models have been especially emphasised in econometrics and systems engineering and are called ARMAX, an acronym for autoregressive moving-average with exogenous components. (Here exogenous means, essentially, input.) For y(t) stationary and non-deterministic,

    y(t) = Σ_{i=0}^∞ W_i e(t-i),   W_0 = I_q.

Here the e(t) are the linear innovations, i.e.

    e(t) = y(t) - ŷ(t|t-1),

where ŷ(t|t-1) is the best linear predictor of y(t) from y(t-1), y(t-2), .... There is an extensive theory, due to Kolmogoroff, Wiener and others, concerning the construction of ŷ(t|t-1) from knowledge of f(ω), but this will not be important here. Put

    W(z) = Σ_{i=0}^∞ W_i z^{-i},   W_0 = I_q,   Σ ||W_i||² < ∞,   E{e(s)e(t)'} = δ_{st} Ω.   (1.1)

Then W(z) is analytic for |z| > 1. However we always assume det W(z) ≠ 0, |z| > 1, since zeros on |z| = 1 cause considerable problems. There is a decomposition

    f(ω) = (1/2π) W(e^{-iω}) Ω W(e^{-iω})*,   (1.2)

which is unique, since there is no other such decomposition with W(z) having the properties stated above. To take account of u(t) we generalise (1.1) to

    y(t) = Σ_{i=0}^∞ W_i e(t-i) + Σ_{i=1}^∞ L_i u(t-i),   L(z) = Σ_{i=1}^∞ L_i z^{-i},   (1.3)

with W(z) as in (1.1). The essential restriction here is that the system is causal, so that there is no influence on y(t) from u(s), s > t. However the relations (1.1), (1.3) are too general to serve as a basis for a worthwhile statistical analysis. To introduce a suitable specialisation of (1.1), (1.3), consider the infinite (Hankel) matrix
        | W1 L1   W2 L2   W3 L3   ... |
    H = | W2 L2   W3 L3   W4 L4   ... |
        | W3 L3   W4 L4   W5 L5   ... |
        | ...                         |

Here [W_j L_j] will be called a "block", a set of q rows and p + q columns. The importance of H can be seen from the, almost obvious, fact that the best j step ahead predictor is

    ŷ(t+j|t) = Σ_{i=0}^∞ W_{i+j} e(t-i) + Σ_{i=0}^∞ L_{i+j} u(t-i),

so that the j'th row of blocks of H gives the coefficient blocks in that prediction. Let n be the rank of H, i.e. the number of rows of H that span all of the rows of H, so that any row can be linearly represented in terms of them. Of course n would be infinite in general. The integer n is called the order or the McMillan degree of H or, equivalently, of [W(z), L(z)]. Its importance will be made evident in other ways in section 4.

Let H_0 be a matrix composed of n linearly independent rows of H and put

    ξ(t) = [e(t)' u(t)' e(t-1)' u(t-1)' ...]',   x(t) = H_0 ξ(t-1).

Call H_1 the first (full) block of q rows of H. Then from (1.3), y(t) = H_1 ξ(t-1) + e(t), and, since the rows of H_1 are linear combinations of the rows of H_0, H_1 = H H_0 for a suitable choice of the q × n matrix H, so that

    y(t) = H x(t) + e(t).   (1.4)

Also, put H_0 = [K L H_2], where K, L comprise respectively the first q and the next p columns of H_0. Then H_2 = F H_0 for a suitable n × n matrix F, and

    x(t+1) = F x(t) + L u(t) + K e(t).   (1.5)

This is the state space representation in prediction error form. Its lack of uniqueness, given that F, H is minimal, i.e. of dimension n, is entirely due to the lack of uniqueness in H_0. That can be made unique by choosing the rows of H_0 as the first linearly independent set found as you go down the rows of H. We will return to this later.
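As a numerical aside (ours, not the author's), the McMillan degree of a system can be estimated as the numerical rank of a truncated version of H built from the impulse response blocks; the truncation level N, the example system and the tolerance below are assumptions of the sketch:

    # Sketch: truncated block Hankel matrix of [W_j L_j], j >= 1, and its
    # numerical rank (an estimate of the McMillan degree n).
    import numpy as np

    def block_hankel(W, L, N):
        """W[j-1], L[j-1] hold the q x q and q x p coefficients W_j, L_j;
        returns the N x N block Hankel matrix with (i,k) block
        [W_{i+k-1} L_{i+k-1}] (1-based block indices)."""
        rows = []
        for i in range(N):
            rows.append(np.hstack([np.hstack([W[i + k], L[i + k]])
                                   for k in range(N)]))
        return np.vstack(rows)

    # Example: scalar degree-1 system x(t+1) = f x(t) + k e(t) + l u(t),
    # y(t) = x(t) + e(t), so W_j = k f^{j-1}, L_j = l f^{j-1}.
    f, k, l = 0.5, 1.0, 0.3
    W = [k * f**j * np.eye(1) for j in range(20)]
    L = [l * f**j * np.eye(1) for j in range(20)]
    H = block_hankel(W, L, 8)
    svals = np.linalg.svd(H, compute_uv=False)
    n_hat = int(np.sum(svals > 1e-10 * svals[0]))   # here n_hat == 1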
The methods used herein are dependent on acting as if n is finite. Then, and only then, W(z) and L(z) are matrices of rational functions of z and can thus be written in the form

    [W(z) L(z)] = A(z^{-1})^{-1} [C(z^{-1}) B(z^{-1})],   (1.6)

where A(z), B(z), C(z) are matrices of polynomials. Of course (1.6) is far from unique, but we shall later describe how the unique prescription of H_0 just described leads to a unique "matrix fraction description", (1.6). We shall use z^{-1} also to indicate the shift operator, i.e. z^{-1} y(t) = y(t-1). Corresponding to (1.6) we have the ARMAX representation

    A(z^{-1}) y(t) = B(z^{-1}) u(t) + C(z^{-1}) e(t).   (1.7)

This is important partly because it expresses y(t) in terms of y(t-1), y(t-2), ..., u(t-1), u(t-2), ..., e(t), e(t-1), e(t-2), ..., and can serve as a basis for an iterative estimation procedure where the coefficient matrices are estimated by regression, the e(t), which are unobserved, being replaced by estimates from a previous stage in the iteration. This will be dealt with in section 5. When no input (or exogenous) variable is observed we speak of the ARMA case.
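Computationally, (1.7) is a finite recursion; a minimal simulation sketch (with illustrative coefficients of our own choosing, not from the text) is:

    # Sketch: simulating the scalar ARMAX recursion (1.7),
    # y(t) + a1 y(t-1) = b1 u(t-1) + e(t) + c1 e(t-1).
    import numpy as np

    rng = np.random.default_rng(0)
    T = 200
    a1, b1, c1 = -0.6, 0.5, 0.4       # illustrative coefficients (ours)
    u = rng.standard_normal(T)         # observed input
    e = rng.standard_normal(T)         # unobserved innovations
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t] + c1 * e[t - 1]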
Notes on References. There are many references for the basic spectral theory, for example Hannan (1970). For the structure theory of systems see Kailath (1980), Casti (1977).

2. Some Basic Algorithms

There are three basic algorithms of time series analysis.

(i) The first algorithm is the discrete Fourier transform of y(t),

    d(ω) = T^{-1/2} Σ_{t=1}^T y(t) e^{itω},

which is cheaply computable at the frequencies 2πj/T', j = 0, 1, ..., [½T'], for T' highly composite. Under smoothness conditions on f(ω),

    E{d(2πj/T) d(2πk/T)*} ≈ δ_{jk} 2π f(2πj/T),

and, indeed, the error is uniformly O(T^{-1}) if n is finite. This algorithm will not be so important to us.

(ii) The second algorithm is the Levinson-Whittle recursion. This is designed, in a sense, to produce approximations to e(t) in (1.1) by considering

    φ(z) = W(z)^{-1},   e(t) = φ(z) y(t).

The procedure recursively calculates polynomial approximations φ̂_n(z), of degree n, to φ(z). We have used n because a system for which φ(z) is a polynomial of degree n will, in fact, be of McMillan degree n. The recursive calculation uses the data through natural estimates of Γ(t) of the form

    Γ̂(t) = (1/T) Σ_{s=1}^{T-t} y(s) y(s+t)',   t ≥ 0;   Γ̂(t) = Γ̂(-t)'.

However it will be convenient to put this Levinson-Whittle recursion in a more general setting because of its many uses later. Thus let v(t) be a vector of s components and put

    Γ̂_v(t) = (1/T) Σ_{s=1}^{T-t} v(s) v(s+t)' = Γ̂_v(-t)',   t ≥ 0.

The recursion calculates matrices Φ_{n,j}, Φ̄_{n,j}, S_n, S̄_n. If v(t) = y(t) then Φ_{n,j} is φ̂_{n,j}, the coefficient of z^{-j} in φ̂_n(z), and correspondingly

    ê_n(t) = Σ_{j=0}^n φ̂_{n,j} y(t-j),

so that we have an estimate of e(t). The Φ̄_{n,j} correspond to the time reversed process and, putting, in this case where v(t) = y(t),

    r̂_n(t) = Σ_{j=0}^n Φ̄_{n,j} y(t-n+j)

for the "backwards" residuals (as distinct from the forwards residuals ê_n(t)),

    Ω̂_n = S_n = (1/T) Σ_1^{T+n} ê_n(t) ê_n(t)',   Ω̄̂_n = S̄_n = (1/T) Σ_1^{T+n} r̂_n(t) r̂_n(t)'.

We now give the recursive algorithm in terms of v(t):

    Φ_{n,j} = Φ_{n-1,j} + Φ_{n,n} Φ̄_{n-1,n-j},   Φ̄_{n,j} = Φ̄_{n-1,j} + Φ̄_{n,n} Φ_{n-1,n-j},   (2.1)

    Φ_{n,0} = Φ̄_{n,0} = I_s,   Φ_{n,n} = -Δ_{n-1} S̄_{n-1}^{-1},   Φ̄_{n,n} = -Δ'_{n-1} S_{n-1}^{-1},   (2.2)

    Δ_n = Σ_{j=0}^n Φ_{n,j} Γ̂_v(j-n-1),

    S_n = (I_s - Φ_{n,n} Φ̄_{n,n}) S_{n-1},   S̄_n = (I_s - Φ̄_{n,n} Φ_{n,n}) S̄_{n-1},   S_0 = S̄_0 = Γ̂_v(0).

In case s = 1 we have S_n = S̄_n, Φ_{n,j} = Φ̄_{n,j}, j = 1,...,n, so that the algorithm is simplified.
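In the scalar case s = 1 the recursion (2.1), (2.2) reduces to the classical Levinson-Durbin algorithm; a minimal sketch (ours; the autocovariance estimator follows Γ̂(t) above) is:

    # Sketch: scalar Levinson-Durbin recursion on sample autocovariances.
    import numpy as np

    def sample_autocov(y, N):
        T = len(y)
        y = y - y.mean()
        return np.array([np.dot(y[:T - t], y[t:]) / T for t in range(N + 1)])

    def levinson(gamma, N):
        """gamma[t]: autocovariances, t = 0..N; returns phi[n] (coefficients
        of phi_n(z), phi[n][0] = 1) and the residual variances sigma2[n]."""
        phi = [np.array([1.0])]
        sigma2 = [gamma[0]]
        for n in range(1, N + 1):
            delta = sum(phi[n - 1][j] * gamma[n - j] for j in range(n))
            kappa = -delta / sigma2[n - 1]        # reflection coefficient
            prev = np.concatenate([phi[n - 1], [0.0]])
            phi.append(prev + kappa * prev[::-1])  # update (2.1) with s = 1
            sigma2.append((1.0 - kappa ** 2) * sigma2[n - 1])
        return phi, sigma2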
These procedures have severe disadvantages when T is small or, better, when n/T is not small. This is because they are implicitly founded on the Toeplitz assumption, that v(t) = 0 for -n < t ≤ 0 and T < t ≤ T+n. (This is so called because the system of equations for the Φ_{n,j}, for given n, has a block Toeplitz matrix, i.e. one with the same elements down any diagonal.) There have been many modifications, often based on calculating ê_n(t), r̂_n(t) (see (2.1), (2.2)) recursively by

    ê_n(t) = ê_{n-1}(t) + Φ_{n,n} r̂_{n-1}(t-1),   r̂_n(t) = r̂_{n-1}(t-1) + Φ̄_{n,n} ê_{n-1}(t),   (2.3)

    ê_0(t) = r̂_0(t) = y(t),   1 ≤ t ≤ T.

Then also

    Δ_n = (1/T) Σ_1^{T+n} ê_n(t) r̂_n(t-1)'.

It is the terms in (2.1), (2.2) for 1 ≤ t < n and T < t ≤ T+n that seem to cause most of the trouble, because of the Toeplitz assumption. In the scalar case, q = 1, the φ̂_{n,n} are called partial autocorrelations (so called by statisticians) and reflection coefficients (by engineers), and in a substantial number of the modifications φ̂_{n,n} is replaced by

    φ̂_{n,n} = Σ_n^T ê_{n-1}(t) r̂_{n-1}(t-1) / {½ Σ_n^T ê_{n-1}(t)² + ½ Σ_n^T r̂_{n-1}(t-1)²},   (2.4)

though one might equally use the coefficient of correlation between ê_{n-1}(t) and r̂_{n-1}(t-1). A virtue of (2.4) is that φ̂_{n,n} lies in (-1,1), as is also true of the correlation, and this is completely equivalent to the fact that φ̂_n(z) ≠ 0, |z| ≤ 1, a desirable property, though (2.3) involves additional calculations. The recursions (2.3) are important also because the flow diagrams describing them are the basis of the lattice or ladder methods that are used in real time calculations. For the purposes of this account we shall continue to write in terms of the Levinson-Whittle recursion, but wherever φ̂_{n,n} occurs it could be replaced by a lattice computing formula such as (2.4).

So far the algorithm has been presented as producing an estimate of e(t) for the case where no inputs are observed. For the general case, when inputs u(t) are observed, put, then,

    ê_n(t) = Σ_{j=0}^n φ̂_{n,j} y(t-j) - Σ_{j=1}^n l̂_{n,j} u(t-j).
Here Σ φ̂_{n,j} z^{-j} is an approximation to W(z)^{-1} and Σ l̂_{n,j} z^{-j} is an approximation to W(z)^{-1} L(z) = C(z^{-1})^{-1} B(z^{-1}), using (1.6). To obtain the [φ̂_{n,j}, l̂_{n,j}] take v(t)' = (y(t)', u(t)'), s = p + q, and take [φ̂_{n,j} l̂_{n,j}] as the first block of q rows in Φ_{n,j}. Then Ω̂_n, the covariance matrix of ê_n(t), is the top left q × q matrix of S_n. This type of procedure will repeatedly be used below.

(iii) The third major algorithm is the Kalman filter, which computes, on the basis of (1.5) and for n finite, a finite past equivalent to e(t). The algorithm is

    x̂(t+1) = F x̂(t) + L u(t) + K(t) ε(t),   y(t) = H x̂(t) + ε(t),

    K(t) = {F P(t) H' + K Ω}{H P(t) H' + Ω}^{-1},

    P(t+1) = F P(t) F' + K Ω K' - K(t){H P(t) H' + Ω} K(t)',

    x̂(1) = 0,   P(1) = F P(1) F' + K Ω K'.

It may be wise to symmetrise P(t), replacing it by ½{P(t) + P(t)'}, to reduce the effects of rounding errors. There is an enormous literature surrounding this algorithm. For our purposes its importance lies in the fact that it allows the Gaussian likelihood to be calculated, or better (-2T^{-1}) times the log of that likelihood, which we call L(θ) and still speak of as the likelihood. This is, apart from a constant,

    L(θ) = (1/T) Σ_{t=1}^T log det{H P(t) H' + Ω} + (1/T) Σ_{t=1}^T ε(t)'{H P(t) H' + Ω}^{-1} ε(t).   (2.5)
Here θ stands for the parameters involved, i.e. those in F, H, K, which we shall call the system parameters, and those in Ω, which are the variances and covariances of the e(t). In (2.5) u(t) is treated as a fixed sequence of vectors. We emphasise that the likelihood (2.5) has been written down on the assumption that the e(t) are Gaussian. Few, if any, of the methods of this chapter depend greatly on that assumption; the likelihood, (2.5), is used to obtain an estimation method rather than because it is the true likelihood.
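A minimal sketch of the filter and of (2.5) in modern numerical software (ours, not the chapter's; the initial condition is obtained by solving the stated equation for P(1) as a discrete Lyapunov equation, and all names are assumptions):

    # Sketch: evaluating (2.5) by the Kalman recursion above.
    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def likelihood(F, H, K, L, Omega, y, u):
        """(-2/T) log Gaussian likelihood of (1.5), up to a constant."""
        x = np.zeros(F.shape[0])
        P = solve_discrete_lyapunov(F, K @ Omega @ K.T)  # P = F P F' + K Omega K'
        val = 0.0
        for t in range(y.shape[0]):
            S = H @ P @ H.T + Omega                 # innovation covariance
            eps = y[t] - H @ x                      # prediction error eps(t)
            val += np.log(np.linalg.det(S)) + eps @ np.linalg.solve(S, eps)
            Kt = (F @ P @ H.T + K @ Omega) @ np.linalg.inv(S)
            x = F @ x + L @ u[t] + Kt @ eps
            P = F @ P @ F.T + K @ Omega @ K.T - Kt @ S @ Kt.T
            P = 0.5 * (P + P.T)                     # symmetrise against rounding
        return val / y.shape[0]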
Notes on References. The fast Fourier transform was introduced to latter day science in Cooley and Tukey (1965). The vector form of the Levinson-Whittle algorithm was given in Whittle (1963). Lattice forms are surveyed in Friedlander (1982). A great amount of detail about the Kalman filter is found in Anderson and Moore (1979).
3. Approximation Criteria

The problem to be considered in the remainder of this chapter is that of approximating the true system by one of finite McMillan degree. This degree, n, has to be determined. Once this is recognised it must also be recognised that it is not possible to proceed purely through the minimisation of (2.5), since that can always be further reduced by taking n large. The alternative procedures here considered choose n by minimising some form of

    log det Ω̂_n + d(n) C_T / T,   n = 0, 1, ..., N.   (3.1)

Here Ω̂_n is the maximum likelihood estimate of Ω, and the first term in (3.1) is, except for a constant, the minimal value of (2.5) for n given. The quantity d(n) is the dimension of the model, essentially n(2q+p). (Some approximation is involved in that statement.) The second term in (3.1) is a penalty term which increases as n increases, whereas the first decreases. Two commonly used sequences are C_T ≡ 2, in which case (3.1) will be called AIC(n), and C_T = log T, in which case (3.1) will be called BIC(n). An upper bound, N, has been imposed on n (N might increase with T) and is needed in connection with proofs of asymptotic properties of the method. In practice such bounds do not seem to be used, probably because the bounds needed for validity are much larger than values of n that an experienced investigator would consider reasonable, and are needed in the theoretical investigation only to exclude ridiculously large values.
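The rule (3.1) is cheap to apply once the variance sequence is available; a sketch (ours), reusing the σ̂²_n produced by a Levinson-type recursion as in section 2, reads:

    # Sketch: choosing the order by (3.1) with C_T = 2 (AIC) or log T (BIC).
    import numpy as np

    def choose_order(sigma2, T, d, C_T):
        """sigma2[n]: ML residual variance at order n; d(n): model dimension."""
        crit = [np.log(s) + d(n) * C_T / T for n, s in enumerate(sigma2)]
        return int(np.argmin(crit))

    # e.g. for a scalar AR fit, where d(n) = n:
    # n_aic = choose_order(sigma2, T, lambda n: n, 2.0)
    # n_bic = choose_order(sigma2, T, lambda n: n, np.log(T))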
For the case of C_T = log T a justification has been given by Rissanen on the basis of a minimum description length principle. The idea is to use the model set to record the data in as few bits as possible. The first term (or rather T/2 by it) gives a measure of the average number of bits required for an optimal encoding when n is fixed and the maximum likelihood structure, on Gaussian assumptions, is taken to be the true structure. To decode, the model parameters must also be transmitted, and T/2 by the second term in (3.1), for BIC, measures the number of bits for an optimal encoding of these, to an accuracy determined by that of the method of maximum likelihood. The use of C_T ≡ 2 has been justified by Akaike on the basis of a prediction theory, and has been widely used.
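For a scalar AR(n) fit the two-part code length just described can be written down directly; the following sketch (ours, a simplified stand-in for the general argument) makes the connection with BIC explicit:

    # Sketch: two-part description length behind BIC for a scalar AR(n) fit:
    # (T/2) log sigma2_n for the data plus (n/2) log T for the parameters,
    # so minimising it over n matches (3.1) with C_T = log T.
    import numpy as np

    def description_length(sigma2_n, n, T):
        data_part = 0.5 * T * np.log(sigma2_n)   # optimal encoding of residuals
        param_part = 0.5 * n * np.log(T)         # parameters to accuracy 1/sqrt(T)
        return data_part + param_part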
to systems of a
T h i s will be f u r t h e r d i s c u s s e d in the next
H o w e v e r h e r e some d i s c u s s i o n of the case w h e r e there is a
true r a t i o n a l t r a n s f e r f u n c t i o n s y s t e m w i l l be g i v e n in r e l a t i o n to the use of
(3.1).
T h e c o n d i t i o n s under w h i c h the s t a t e m e n t s b e l o w
hold true are e s s e n t i a l l y
(6.1),
of fourth m o m e n t s of the
ej (t),
also d e p e n d o n a c o n d i t i o n (Compare b e l o w
(I.i).)
(6.2), below, p l u s the f i n i t e n e s s but the p r o o f s of the t h e o r e m s
det W(z)
This
6
# 0,
Izl >_ 1-6,
6 > 0.
may be as small as d e s i r e d but is
p r e s c r i b e d u pr~or~. Now assume there is a true T ÷ ~,
CT/T
+
0
no
n
minimises
(3.1) while,
(which is an i n s i g n i f i c a n t r e q u i r e m e n t ) .
following holds, w h e r e a.s. (i)
and
lim inf C T / ( 2 1 o g log T) > 1 then T~=
n ÷ n0, a.s.
If
lim sup C T / ( 2 T+=
n
a.s. to
loglog T) < 1 then
does not c o n v e r g e
n0.
lim inf C T = ~ then
If
lim sup C T < = T ~ ~
~ + n O in p r o b a b i l i t y .
then
!im l i m P{n > i. 6~0 T + ~ no } =
(3.2)
These results deserve careful interpretation. (i) should not be interpreted as saying that C_T = 2 log log T is a good value to use, because 2 log log T changes far too slowly with T for that to be meaningful. At T = 10 it is 1.7 and at T = 1000 it is 3.9. It is therefore not far from C_T = 2, which is the value for AIC(n), for most values of T met in practice. The result (3.2) suggests that AIC(n) is bad because it will always overestimate the McMillan degree. However in practice there will be no true degree and then n̂ should increase with T. The question is how fast. Some investigations suggest that C_T = 2, i.e. AIC, gives an optimal rate of increase, according to certain criteria.

The result (3.2) deserves further discussion. We give this for the simplest case where q = 1, n_0 = 0, so that y(t) = e(t). When n = 1 is the model then

    y(t) + a y(t-1) = e(t) + c e(t-1),   |a| < 1,   |c| < 1 - δ.   (3.3)

We indicate why n = 1 will be preferred to n = 0, the true value, when C_T is uniformly bounded. The choice between the two values will be based on

    log σ̂_1² + 2 C_T/T - log σ̂_0²,

so that n = 1 will be preferred when

    A_T = T log(σ̂_0²/σ̂_1²) > 2 C_T.

Consider Fig. 1.
[Figure 1: the region of optimisation in the (a, c) plane, bounded by the lines c = ±(1-δ) and the diagonal c = a.]

The region of optimisation for n = 1 is that below and above the lines through ±(1-δ), excluding the diagonal, but the maximum of the likelihood could be at the boundary. In fact it may be shown that (â, ĉ), the maximum likelihood estimates, move to the diagonal, i.e. (â - ĉ) → 0, so that A_T is eventually the maximum of a function defined on the diagonal of Fig. 1. Let us parameterise the diagonal by ξ = log{(1+a)/(1-a)}, so that ξ takes the place of t in our previous considerations and -∞ < ξ < ∞. Then A_T is eventually the maximum value of ζ(ξ)², where ζ(ξ) is a stationary random function of ξ with spectral density behaving as {cosh u}^{-1}. Thus ζ(ξ)² will take its maximum for increasingly large values of ξ, i.e. values of (a, c) that approach (1,1) or (-1,-1), which will make optimisation increasingly difficult. This result has another interpretation.
It is approximately true that σ̂_1² is the minimum value of

    (1/T) Σ_j {|d(ω_j)|² / |w(e^{iω_j})|²},   w(z) = (z-c)/(z-a).   (3.4)

If â - ĉ → 0 then |w(e^{iω})|² → 1 uniformly, and σ̂_1² is reduced below σ̂_0² only because |w(e^{iω})|^{-2} develops a "notch", at ω = 0 (for a near +1) or at ω = ±π (for a near -1), while remaining bounded and converging to unity uniformly at other values of ω. The method of maximum likelihood attempts to minimise (3.4) and thus seeks to move the notch to where |d(ω)|² is large. Since |d(ω)|² is incredibly irregular (see section 2(i)), the function (3.4) will have many local maxima and minima, which will be compressed into small neighbourhoods of (1,1) or (-1,-1) because of the nature of ξ = log{(1+a)/(1-a)}, and where the notch will be, and what its precise shape will be, is determined by the shape of |d(ω)|², so that the absolute minimum may be very difficult to find. The situation for general n, q, p is essentially the same. It must be emphasised that (3.2) is very "asymptotic", in that T may need to be very large before it is relevant.
Notes on References. The procedures described above were suggested in Akaike (1969), Rissanen (1983). The results on n̂ in (3.2) are in Hannan (1980, 1981). For the results relating to AIC when there is no true n_0 see Shibata (1980), Hannan and Kavalieris (1984).
4. Rational Transfer Function Approximation

In this section a brief account will be given of some deep theory concerning the approximation of the true structure by a rational one, i.e. by approximating to H by a Hankel matrix of finite rank. (Readers less concerned with theory may choose to "skip" this section.) The methods relate mostly to the case where there are no inputs, and only that case will be discussed here. The idea is to approximate to W(z) by a rational transfer function of McMillan degree n so that the error of approximation is as small as possible in the Hankel norm, which measures H, as an operator from one Hilbert space to another, by its largest singular value. To see what is involved put

    y_t = (y(t-1)', y(t-2)', ...)',   y^{t+1} = (y(t)', y(t+1)', ...)',
    e_t = (e(t-1)', e(t-2)', ...)',   e^{t+1} = (e(t)', e(t+1)', ...)'.

Then, from (1.1),

    y^{t+1} = H e_t + K e^{t+1},   (4.1)

where K has W_{k-j} as its (j,k)th block, W_j = 0, j < 0, as is easily checked. Thus H describes the dependence of the future on the past. The space on which H operates is endowed with a metric given by the covariance matrix of e_t, namely (I ⊗ Ω), a block diagonal matrix with the diagonal blocks being Ω. (By I ⊗ Ω we mean the tensor product of the two matrices; for the general definition of tensor product see below (5.5).) The space into which H operates is endowed with a metric given by the covariance matrix of y^{t+1}, whose (j,k)th block is

    E{y(t+j) y(t+k)'} = Γ(k-j).

Since Γ(t)' = Γ(-t), this is the covariance structure of the time reversed process, with spectrum f(-ω). Let the canonical factorisation of f(-ω), as for f(ω) in (1.2), be

    f(-ω) = (2π)^{-1} W̄(e^{-iω}) Ω̄ W̄(e^{-iω})*.

(This notation is in agreement with that in section 2 because f(-ω) is the spectrum of the time reversed process.) Here W̄(z) = Σ W̄_j z^{-j}
and det W̄(z) ≠ 0, |z| > 1. Let W̄ have W̄(k-j) as its (j,k)th block, W̄(j) = 0, j < 0. Then

    S = (I ⊗ Ω̄^{-1/2}) W̄^{-1} H (I ⊗ Ω^{1/2})   (4.2)

operates from ℓ₂ to ℓ₂, where ℓ₂ is the space of all sequences a₁, a₂, ... with Σ|a_j|² < ∞ and with the inner product (a,b) = Σ a_j b_j. A singular value decomposition of S is sought. The matrix W̄^{-1} is block Toeplitz and upper triangular, so that it is easily checked that W̄^{-1} H is of Hankel form, and thus S is also a Hankel matrix. Its blocks, S_{j+k-1}, j,k = 1,2,..., in the (j,k)th place, are generated by the matrix function

    S(z) = Ω̄^{-1/2} W̄(z)^{-1} W(z^{-1}) Ω^{1/2}.   (4.3)

In the scalar case, q = 1, f(-ω) = f(ω), so that Ω̄ = Ω and W̄ = W, and in that case we shall use lower case letters, w(z). Then (4.3) becomes

    s(z) = w(z^{-1})/w(z),
which is obviously of modulus 1 for z = exp(iω), since w(z) has real coefficients, so that this is a unitary function. Of course S(z), unlike W(z), is not analytic for |z| ≥ 1. However only the coefficients of the analytic part, i.e. the coefficients of z^j, j > 0, occur in S. The singular value decomposition of S (assuming that operator to be compact) is of the form

    S = Σ_{j=1}^∞ ρ_j η_j ξ_j*,   η_j*η_k = ξ_j*ξ_k = δ_{jk},   ρ_1 ≥ ρ_2 ≥ ... ≥ 0.

Introduce the new random variables

    u_j = η_j* (I ⊗ Ω̄^{-1/2}) W̄^{-1} y^{t+1},   x_j = ξ_j* (I ⊗ Ω^{-1/2}) e_t.

Then

    E(u_j u_k) = E(x_j x_k) = δ_{jk};   E(u_j x_k) = δ_{jk} ρ_j.

The u_j, x_j might be called "discriminant functions", since they occur in the classical theory of statistical canonical correlation analysis as functions that are used to classify individuals.
The ρ_j themselves would be called "canonical correlations". Since e^s, s ≤ t, spans the same space as do the y^s, s ≤ t, the same canonical correlations would be obtained (but not the same u_j, x_j) if the dependence were considered with the metric determined by the covariance structure of the y^s. Once the singular value decomposition of S is known it is possible (at least for q = 1) to determine uniquely the best Hankel norm approximation to H, and equivalently to W(z), for given n. The virtues of such an approximation from a statistical point of view are by no means evident in this context, nor are the effects of introducing estimates of the ρ_j, u_j, x_j. We shall not enter further into that here. However these ideas have been used by Akaike, in a way that we shall briefly survey, to estimate the structure (1.6), for the ARMA case.

It will be recalled that a canonical representation is attained if H_0 is chosen as constituted by the first n linearly independent rows of H. Call r(v,j) the j'th row, j = 1,...,q, in the v'th block, v = 1,2,.... Then such a set of rows is always of the form

    r(v,j),   v = 1,...,n_j;   j = 1,...,q;   Σ n_j = n.   (4.4)
The n_j are known as the Kronecker indices. They uniquely determine these first linearly independent rows of H_0. There is a corresponding unique factorisation W(z) = A(z^{-1})^{-1} C(z^{-1}), where Ā(z) = Z A(z), C̄(z) = Z C(z) and Z is diagonal with z^{n_j} in the j'th place in the diagonal. Ā, C̄ are matrices of polynomials with diagonal elements of degree n_j which are monic, i.e. have unity as the coefficient of z^{n_j}. Putting B̄ = C̄ - Ā, the decomposition is uniquely defined by the inequalities on degrees

    deg ā_ij < deg ā_jj, j ≠ i;   deg ā_ij ≤ deg ā_ii, j ≤ i;   deg ā_ij < deg ā_ii, j > i;
    deg c̄_ij ≤ deg ā_ii;   i,j = 1,2,...,q.

Akaike's method leads to estimates of the n_j and of A. Put ỹ_t = (y(t)', y(t-1)', ..., y(t-h)')', where h might be chosen by fitting an autoregression and determining h as the order minimising BIC or AIC. Put, for ℓ = 0,1,...; m = 0,1,...,q-1,
    y_{ℓ,m}(t)' = (y(t+1)', y(t+2)', ..., y(t+ℓ)', y_1(t+ℓ+1), ..., y_m(t+ℓ+1))'.

If the smallest n_j is for j = m and n_m = ℓ, then row r(ℓ+1, m) (see (4.4)) is linearly dependent on earlier rows of H, and correspondingly (see (4.1)) there will be some linear function of y_{ℓ,m}(t) that is orthogonal to the past, while this will not be true for ℓ_1 < ℓ or for ℓ_1 = ℓ, m_1 < m. To judge when this is so we consider the solutions ρ̂_1 > ρ̂_2 > ... > ρ̂_{ℓq+m} of

    det[ρ̂_j² I_{ℓq+m} - Ĉ_{ℓ,m} Ĉ_{ℓ,m}'] = 0,

    Ĉ_{ℓ,m} = {(1/T) Σ y_{ℓ,m}(t) y_{ℓ,m}(t)'}^{-1/2} {(1/T) Σ y_{ℓ,m}(t) ỹ_t'} {(1/T) Σ ỹ_t ỹ_t'}^{-1/2},

where the summations are over h+1 ≤ t ≤ T-ℓ-1. It is assumed that ℓq+m ≤ hq. The ρ̂_j are the canonical correlations between the y_{ℓ,m}(t) and ỹ_t. Successively examining these canonical correlations (ordering (ℓ,m) in dictionary order, first according to ℓ and then m), we stop when, for the first time,

    -(T - ν_{ℓ,m}) log(1 - ρ̂_{ℓq+m}²) - ν_{ℓ,m} > 0,   ν_{ℓ,m} = q(h-ℓ) - m + 1.

If this happens at ℓ(1), m(1) then n_{m(1)} is put at ℓ(1). Now eliminate y_{m(1)}(t+ℓ(1)+j), j > 0, from all future y_{ℓ,m} and continue, always taking ν_{ℓ,m} as qh - dim y_{ℓ,m}(t) + 1. Once n_{m(1)} is determined we eliminate y_{m(2)}(t+ℓ(2)+j), j > 0, from future y_{ℓ,m}(t), and continue, and so on. In this way all n_k are determined, and with each will be associated an η̂(k), which is the η̂_j for the smallest ρ̂_j at the step when n_k was determined. η̂_j is determined only up to a scalar factor and that is fixed in η̂(k) by making the last element unity. Now η̂(k) determines the k'th row of the estimate of A(z) in canonical form: the element of η̂(k) corresponding to y_j(t+v) in y_{ℓ,m}(t), for ℓ,m at the values where n_k was determined, is the coefficient of z^{v-1} in the estimate of a_{k,j}(z).
Thus at the end of the calculation the estimates of the Kronecker indices and the estimate Â(z) of A(z) are available. It is then necessary to estimate C(z). This would be done by forming Â(z^{-1})y(t) and using the calculated autocovariances of this to estimate those of C(z^{-1})e(t). Then an estimate of the spectrum will be obtained and factored to find an estimate of C(z). Since A(0) = C(0) and the row degrees of C(z) are prescribed by the degree inequalities, this would have to be done carefully and would not be a trivial calculation for q > 1. In any case these estimates of A(z), C(z) are inefficient but could be used to initiate a minimisation of (2.5), in the form for (1.5) corresponding to the canonical choice of H_0 and the n_k. We do not proceed further with the description because there are problems with the method. It is, so far, restricted to the ARMA case. The n_k are determined in an inefficient estimation procedure and no later adjustment of them has been suggested. However the method is of interest because of its association with the theory of the first part of this section.
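The sample canonical correlations that drive the procedure can be computed stably via QR factorisations; a minimal sketch (ours, not Akaike's original implementation) is:

    # Sketch: sample canonical correlations between a "future" block and
    # the lagged past, via the QR/SVD method.
    import numpy as np

    def canonical_correlations(future, past):
        """future: (T, a) array, past: (T, b) array; returns the canonical
        correlations, i.e. the singular values of Qf' Qp."""
        f = future - future.mean(axis=0)
        p = past - past.mean(axis=0)
        Qf, _ = np.linalg.qr(f)
        Qp, _ = np.linalg.qr(p)
        return np.linalg.svd(Qf.T @ Qp, compute_uv=False)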
Notes on References. Adamyan, Arov and Krein (1971), Glover (1983) deal with the theory of Hankel norm approximation. Jewell and Bloomfield (1983, 1983a) suggest, for q = 1, that the canonical correlations be found directly from s(z) = W(z)/W(z^{-1}), which is to be obtained by factoring a spectral estimate. Akaike (1969a) presents his method. For some estimation of a moving-average model see Hannan (1970).
5. A Gauss-Newton Procedure

(i) First the case q = 1 will be discussed, because this is important and the calculations are then quite feasible. The idea is to use a Gauss-Newton procedure to approximate to the true ARMAX structure, but to include n in the estimation. At each iteration this is to be done recursively. Thus consider, for the ARMAX model with q = 1 and for given n, the minimisation of

    (1/T) Σ_{t=1}^T e_τ(t)²,   e_τ(t) = c_τ(z^{-1})^{-1} {a_τ(z^{-1}) y(t) - b_τ(z^{-1}) u(t)}.   (5.1)

Here τ is the vector of system parameters, i.e. the 3n freely varying coefficients in a_τ, b_τ, c_τ. Here, again, we use lower case letters for the scalar case. Note that b_τ is, in general, a row vector since we do not require p = 1. The e_τ(t) are functions only of the transfer functions

    w_τ(z)^{-1} = c_τ(z^{-1})^{-1} a_τ(z^{-1}),   w_τ(z)^{-1} ℓ_τ(z) = c_τ(z^{-1})^{-1} b_τ(z^{-1}).

The procedure is to linearise these functions about a previous estimate, which reduces the minimisation of (5.1) to a linear problem. As has been said, the procedure is Gauss-Newton but includes n in the optimisation. It is necessary to obtain a first estimate from which to commence the iteration. This is done by taking c_τ ≡ 1 and choosing a_τ, b_τ by regression. We go on to describe the algorithm.

Step 0. Put

    v(t) = (y(t), -u(t)')',   t = 1,...,T,
and use the Levinson-Whittle algorithm. Let σ̂_n² be the top left hand element of S_n. Choose n̂ to minimise

    log σ̂_n² + n(p+1) log T / T.

Let the first row of Φ_{n̂,j} be called (â_j, b̂_j'), where â_j is scalar and b̂_j has p elements. Then â_j is the j'th coefficient in â(z) and b̂_j is the j'th coefficient vector in b̂(z).

The basic algorithm is now given by step 1, which is repeated until convergence. To commence step 1 one needs estimates n̂, â, b̂, ĉ; these will initially come from step 0, with ĉ ≡ 1.

Step 1. Define ê(t), η̃(t), ξ̃(t), ζ̃(t) by

    ĉ(z^{-1}) ê(t) = â(z^{-1}) y(t) - b̂(z^{-1}) u(t),
    ĉ(z^{-1}) η̃(t) = y(t),   ĉ(z^{-1}) ξ̃(t) = u(t),   ĉ(z^{-1}) ζ̃(t) = ê(t),   (5.2)

with y(t) = η̃(t) = ξ̃(t) = ζ̃(t) = ê(t) = 0, t ≤ 0. Put

    v(t) = (η̃(t), -ξ̃(t)', -ζ̃(t))',   t = 1,2,...,T,

treating v(t) as zero for t > T, and use the Levinson-Whittle recursion to generate Φ_{n,j}, Φ̄_{n,j}, S_n, S̄_n, with

    ê_n(t) = Σ_j Φ_{n,j} v(t-n+j).

The first rows of the Φ_{n,j} yield corrections (â_{n,j}^{(1)}, b̂_{n,j}^{(1)}, ĉ_{n,j}^{(1)}): since â(z^{-1}) η̃(t) - b̂(z^{-1}) ξ̃(t) = ê(t), for n ≥ n̂ the difference ê(t) - ê_n(t) is a linear combination of the variables in the regression. When this is done the (â_{n,j}^{(1)}, ĉ_{n,j}^{(1)}) must be regarded as adjustments to the previous â_{n,j}, ĉ_{n,j}, i.e. must be added to these.
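A simplified stand-in for steps 0 and 1 (ours; it uses plain lagged regressions in place of the filtered variables and the Levinson-Whittle machinery, for the ARMA case p = 0) may clarify the structure of the iteration:

    # Sketch: a long autoregression gives residuals (step 0); one step then
    # regresses y on lagged y and lagged residuals (a simplified step 1).
    import numpy as np

    def step0_residuals(y, h):
        """Fit an AR(h) by least squares and return its residuals."""
        T = len(y)
        X = np.column_stack([y[h - j:T - j] for j in range(1, h + 1)])
        coef, *_ = np.linalg.lstsq(X, y[h:], rcond=None)
        e = np.zeros(T)
        e[h:] = y[h:] - X @ coef
        return e

    def step1_arma(y, e, n):
        """Regress y(t) on -y(t-1..t-n) and e(t-1..t-n); returns estimates
        of the a_j and c_j in y(t) + sum a_j y(t-j) = e(t) + sum c_j e(t-j)."""
        T = len(y)
        X = np.column_stack([-y[n - j:T - j] for j in range(1, n + 1)] +
                            [e[n - j:T - j] for j in range(1, n + 1)])
        coef, *_ = np.linalg.lstsq(X, y[n:], rcond=None)
        return coef[:n], coef[n:]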
N o w c o n s i d e r the v e c t o r case, w h i c h is m o r e e l a b o r a t e .
r e t u r n to the set, n,
for g i v e n
~.
M(n),
of all systems,
(We fix
the s y s t e m p a r a m e t e r s
~
the set of all H a n k e l m a t r i c e s
W(z), L(z). dimension
(1.3), of M c M i l l a n d e g r e e
for the m o m e n t only, b e c a u s e
that n e e d discussion.)
the r e q u i r e m e n t s b e l o w
(i.i))
H
of rank
First
M(n) n
it is
is e q u i v a l e n t l y
(for
W(z)
and of all p a i r s of t r a n s f e r
obeying functions
It m a y b e c o n c e p t u a l i s e d as a s m o o t h surface of n(2q+p)
and,
technically,
is an a n a l y t i c m a n i f o l d .
A reasonable approach to estimating a system would therefore be to determine n and then the appropriate point on M(n), and this is what was done for q = 1. For q > 1 there is, however, a problem, because M(n) cannot then be covered by one neighbourhood that may be mapped homeomorphically into Euclidean space. M(n) is the union of the sets of all systems whose Kronecker indices sum to n, and hence an alternative to the consideration of M(n) is the determination of the Kronecker indices, as was the technique used in section 4. There is, however, something very arbitrary in the decomposition of M(n) into sets corresponding to different partitions of n as a sum of q integers, and the effort required for an efficient procedure to discover these indices is fairly considerable. Amongst the sets of Kronecker indices summing to n there is one special set, namely those which, for n = qh + m, 0 ≤ m < q, are of the form

    n_1 = n_2 = ... = n_m = h+1,    n_{m+1} = ... = n_q = h.

Then the first linearly independent rows in H are just the first n rows.
If U(n) is the subset of M(n) for which these rows are linearly independent, then U(n) is open and dense in M(n). Thus little or nothing is lost in restricting attention to U(n). (However, it is most unlikely that the maximum of the likelihood in M(n) will be found off U(n), though U(n) would provide a bad coordinate system in which to work if the maximum was near the edge.) We describe U(n) in another way, by giving a unique description of A(z), B(z), C(z) in

    W(z) = A(z^{-1})^{-1} C(z^{-1}),    L(z) = A(z^{-1})^{-1} B(z^{-1}),

for a system in U(n). We do this by describing the coefficient matrices A_n,j, B_n,j, C_n,j in A(z), B(z), C(z).
These will be depicted with a star indicating a freely varying submatrix of elements; all partitions are after the m'th row or column. Thus

    A_n,0 = [ I_m  0 ; *  I_{q-m} ],    C_n,0 = A_n,0,    B_n,0 = 0,    (5.3)

while A_n,h+1, B_n,h+1, C_n,h+1 are restricted, only the submatrices indicated by a star being free. (We do not mean that A_n,h+1, B_n,h+1, C_n,h+1 are equal.) All other A_n,j, B_n,j, C_n,j, j ≤ h+1, are unrestricted. The vector τ of system parameters coordinatising U(n) is of dimension n(2q+p) and is made up of the freely varying elements in the coefficient matrices.
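The special index set and the parameter count can be computed directly. The following is a small illustrative helper (the function name is ours):

    def special_kronecker_indices(n, q, p):
        """For n = q*h + m, 0 <= m < q, the special index set has
        n_1 = ... = n_m = h+1 and n_{m+1} = ... = n_q = h; the vector tau
        coordinatising U(n) has dimension n*(2q + p)."""
        h, m = divmod(n, q)
        indices = [h + 1] * m + [h] * (q - m)
        return indices, n * (2 * q + p)

    print(special_kronecker_indices(7, 3, 2))   # ([3, 2, 2], 56)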
We now go on to describe how to estimate n, τ and Σ. We do this by a series of steps that are related to those for q = 1 but are more complicated. Steps 0 and 1 are not repeated; only step 2 is iterated. Always the output from the previous step is the input to the next, so we do not indicate those by a special notation; i.e. we do not, for example, write Â_j^(1) for the Â_j matrix found at step 1, since it is clear which Â_j is used at step 2, i.e. that from step 1 and not step 0. Also we shall now index the stages in the Levinson-Whittle recursion by h, rather than by n as before.
Step 0. Put v(t) = (y(t)', -u(t)')', t = 1,...,T, and use the Levinson-Whittle recursion. Let Σ̂_n be the top left hand q × q submatrix of S_h, where n = qh, and choose ĥ, i.e. n̂, to minimise

    log det Σ̂_n + n(q+p) log T/T,    n = hq,  h = 0,1,2,...

Let the first block of q rows in Γ̂_h,j be called [Â_j, B̂_j]. Then Â_j, B̂_j, j = 1,...,ĥ, are the coefficient matrices in Â(z), B̂(z), with Â_0 = I_q and Ĉ(z) ≡ I_q.

Step 1. Put

    ê(t) = Σ_0^ĥ Â_j y(t-j) - Σ_1^ĥ B̂_j u(t-j)

and

    v(t) = (y(t)', -u(t)', -ê(t)')',

and use the Levinson-Whittle algorithm. Again the top left hand q × q submatrix of S_h is called Σ̂_n, n = hq, and we choose ĥ to minimise

    log det Σ̂_n + n(2q+p) log T/T,    n = hq,  h = 0,1,2,...

Now [Â_j, B̂_j, Ĉ_j] are the top q rows in Γ̂_ĥ,j and are the coefficient matrices in Â(z), B̂(z), Ĉ(z), with Â_0 = Ĉ_0 = I_q.

The problem that we consider next is the determination of m in n = ℓq + m, 0 ≤ m < q; m = q corresponds to n = ĥq. If there is a true rational transfer function system then, for large enough T, the ĥ chosen by the criterion will be greater than or equal to the true value, so we consider n = (ĥ-1)q + m, m = 1,2,...,q-1. The problem is to insert zero elements in the appropriate places in A_n,0, C_n,0, the free elements being those indicated by a star in (5.3). This can be done computationally cheaply using the calculations done at step 1.
It is simpler, however, to describe the procedure as a regression, one for each value of m, and that is what we do here, though the calculations can be carried out more cheaply from the output of step 1; the details are too complicated to be described here (see the references). (It is unlikely that the algorithm will be used for q > 5, and then only 4 or fewer values of m need be taken.) The regression is of a vector variable but is carried out row by row, so we describe it for a typical row, j, j = 1,...,q. For 1 ≤ j ≤ m we regress y_j(t) on the following variables:

(1) -ỹ_k(t-i), with i = 1,...,ĥ for k ≤ m and i = 2,...,ĥ for k = m+1,...,q;

(2) ũ_k(t-i), k = 1,...,q; i = 1,...,ĥ;

(3) ẽ_k(t-i), k = 1,...,q; i = 1,...,ĥ;

where

    Σ_0 Ĉ_j ê(t-j) = Σ_0 Â_j y(t-j) - Σ_1 B̂_j u(t-j),

and ỹ(t), ũ(t), ẽ(t) are formed by inverse filtering with Ĉ(z^{-1}), all quantities being null for t ≤ 0. For m < j ≤ q we regress y_j(t) on:

(1) -ỹ_k(t-i), k = 1,...,q; i = 1,...,ĥ-1;

(2) -(ỹ_k(t) - ẽ_k(t)), k = m+1,...,q;

(3) ũ_k(t-i), ẽ_k(t-i), k = 1,...,q; i = 1,...,ĥ-1.

The coefficient of -ỹ_k(t-i) in the regression is the estimate â_j,k(i) of the corresponding element of Â(z), and similarly for ũ_k(t-i) or ẽ_k(t-i) in relation to B̂(z), Ĉ(z); the coefficients of the -(ỹ_k(t) - ẽ_k(t)) estimate the starred elements of A_n,0, C_n,0. The matrix Σ̂_n, n = (ĥ-1)q + m, is estimated by the sums of squares and cross products, divided by T, of the residuals from the q regressions, and m̂ is chosen to minimise

    log det Σ̂_n + n(2q+p) log T/T,    n = (ĥ-1)q + m,  m = 1,...,q.

(For m = q the value of this expression is just the minimised value at the end of step 1.) Now we have n̂ and estimates Â(z), B̂(z), Ĉ(z). As was said above, steps 0 and 1 are not repeated; step 2 may be iterated, but often no repetition, or at most one, will be necessary.
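The order-selection criterion used throughout steps 0 to 2 is easily evaluated given a residual matrix. A minimal sketch, with an illustrative function name:

    import numpy as np

    def multivariate_criterion(residuals, n, q, p):
        """Evaluate log det(Sigma_n) + n*(2q+p)*log(T)/T, where Sigma_n is
        (1/T) times the sum of e_n(t) e_n(t)' over the T residual rows."""
        T = residuals.shape[0]
        sigma = residuals.T @ residuals / T
        sign, logdet = np.linalg.slogdet(sigma)
        return logdet + n * (2 * q + p) * np.log(T) / T

The candidate n (or m) minimising this value over the set considered is the one retained.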
Step 2. Form matrices η(t), ξ(t), ζ(t), of q rows and q², qp, q² columns respectively, by solving

    Σ_0^ĥ Ĉ_j [η(t-j), ξ(t-j), ζ(t-j)] = (y(t)', u(t)', ê(t)') ⊗ I_q,    η(t) = ξ(t) = ζ(t) = 0,  t ≤ 0,    (5.4)

where ê(t) is obtained from

    Σ_0 Ĉ_j ê(t-j) = Σ_0 Â_j y(t-j) - Σ_1 B̂_j u(t-j)    (5.5)

with the usual initial conditions. By X ⊗ Y we mean the tensor product wherein a typical block is x_ij Y, i = 1,...,a; j = 1,...,b, where X is a × b. Of course in (5.4) X is 1 × (2q+p) and all blocks are a scalar multiple of I_q. Thus the {i+q(j-1)}'th column of η(t), for example, is Ĉ(z^{-1})^{-1} E_ij y(t), where E_ij consists of zeros save for a unit in the (i,j)'th place. Put

    Γ̂_v(j) = (1/T) Σ_t [η(t), -ξ(t), -ζ(t)]' Σ̂^{-1} [η(t+j), -ξ(t+j), -ζ(t+j)].    (5.6)

This matrix is of dimension q(2q+p). It is to be the input to the Levinson-Whittle recursion which is to be carried out; it is, thus, q(2q+p) that determines the computational effort. For q = p = 5 this is 75, which already would be a rather large scale implementation of the Levinson-Whittle recursion. In cases where q is larger it may be necessary to use some other expedient, and we discuss this in remarks below.
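The columns of η(t) (and similarly ξ(t), ζ(t)) are obtained by inverse filtering. A minimal sketch under stated assumptions (the function names are ours; C is the list of estimated coefficient matrices with C[0] = I):

    import numpy as np

    def inverse_filter(C, x):
        """Solve sum_j C[j] s(t-j) = x(t) for s, with s(t) = 0 for t < 0.
        C is a list [C_0, C_1, ..., C_h] of q x q arrays."""
        T, q = x.shape
        s = np.zeros((T, q))
        for t in range(T):
            acc = x[t].copy()
            for j in range(1, min(t, len(C) - 1) + 1):
                acc -= C[j] @ s[t - j]
            s[t] = np.linalg.solve(C[0], acc)
        return s

    def eta_column(C, y, i, j):
        """Column i + q*(j-1) of eta(t): C(z^{-1})^{-1} E_ij y(t), where
        E_ij has a single unit in the (i, j) place (1-based indices)."""
        q = y.shape[1]
        E = np.zeros((q, q)); E[i - 1, j - 1] = 1.0
        return inverse_filter(C, y @ E.T)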
Let η̃(t) be the vector obtained by adding the columns numbered i + q(i-1), i = 1,...,q, in η(t), and similarly for ξ̃(t), ζ̃(t) in relation to ξ(t), ζ(t). (It is η̃(t), ξ̃(t), ζ̃(t) that correspond most closely to the quantities defined for q = 1.) Thus

    Σ_0^ĥ Ĉ_j η̃(t-j) = y(t),    Σ_0 Ĉ_j ξ̃(t-j) = u(t),    Σ_0 Ĉ_j ζ̃(t-j) = ê(t).

Now form, for each h value considered in the Levinson-Whittle recursion with (5.6),

    τ̂_h,h = S_{h-1}^{-1} Σ_{j=0}^{h-1} Γ̂_{h-1,j} (1/T) Σ_t [η(t-h+j), -ξ(t-h+j), -ζ(t-h+j)]' {η̃(t) - ζ̃(t) + ê(t)},

    τ̂_h,j = τ̂_{h-1,j} + Γ̂_{h-1,h-j} τ̂_h,h,    j = 0,1,2,...,h-1.    (5.7)

Here ê(t) is as from (5.5), and the vector τ̂_h,h is of dimension q(2q+p). To initiate, take τ̂_0,0 to have zeros everywhere save for units in the places numbered i + q(i-1), i = 1,...,q, and q(q+p) + i + q(i-1), i = 1,...,q.

Now the τ̂_h,j provide estimates of the matrices Â_j, B̂_j, Ĉ_j for n = hq. Thus Â_h,j has as its estimate of a_ik(j) the element in the {i + q(k-1)}'th place in τ̂_h,j; B̂_h,j has as its (i,k)'th element the element in place q² + i + q(k-1); while Ĉ_h,j has as its (i,k)'th element that in the {q² + qp + i + q(k-1)}'th place. Next put

    Σ̂_n = (1/T) Σ ê_n(t) ê_n(t)',    n = hq,

where

    Σ_0^h Ĉ_h,j ê_n(t-j) = Σ_0^h Â_h,j y(t-j) - Σ_1^h B̂_h,j u(t-j),

and choose ĥ, i.e. n̂ = ĥq, so as to minimise

    log det Σ̂_n + n(2q+p) log T/T.
Now we seek to estimate m in n = (ĥ-1)q + m, m = 1,2,...,q-1, as in step 1 of the algorithm. Again it will be easiest to describe this as a regression, though it could be computed using the output from the use of (5.6) at the previous step. Consider the matrix [-η(t-j), ξ(t-j), ζ(t-j)] formed at (5.4). Its q(2q+p) columns are associated with the elements in [A(j), B(j), C(j)], in the sense that the column numbered i+q(k-1) is associated with a_ik(j), i,k = 1,...,q, the column numbered q²+i+q(k-1) with b_ik(j), i = 1,...,q, k = 1,...,p, and the column numbered q²+qp+i+q(k-1) with c_ik(j), i,k = 1,...,q. Now eliminate all columns for which the corresponding parameter is prescribed to be null, according to (5.3). At lag zero only the starred elements of A_n,0, C_n,0 are free, so that there remains a matrix, X_0(t), of only m(q-m) columns for A(0), together with the corresponding columns for C(0); for j ≥ 1 call the resulting matrix X_j(t).

Now call τ the vector of estimates of the system parameters; these are arranged in dictionary order, first according to lag j, then according to whether the parameter comes from A(j), B(j) or C(j), then according to row index i, and finally according to column index k. Then

    τ̂_n = {(1/T) Σ X(t)' Σ̂^{-1} X(t)}^{-1} {(1/T) Σ X(t)' Σ̂^{-1} (η̃(t) + ê(t) - ζ̃(t))};    X(t) = [X_0(t), X_1(t), ...].    (5.8)
We emphasise that X(t), Σ̂, η̃(t), ξ̃(t), ζ̃(t) are all formed using the output from the previous step, which at the first use of step 2 will have been step 1, but later will have been a previous use of step 2. Only ĥ has been determined by previous calculations at this step. The notation τ̂_n in (5.8) should not be confused with the τ̂_h,j type used earlier; τ̂_n is made up of many submatrices of the τ̂_h,j type and is of dimension n(2q+p), n = (ĥ-1)q + m. We now again put

    Σ̂_n = (1/T) Σ ê_n(t) ê_n(t)',    n = (ĥ-1)q + m,    (5.9)

where

    Σ_0 Ĉ_n,j ê_n(t-j) = Σ_0 Â_n,j y(t-j) - Σ_1 B̂_n,j u(t-j),

and Â_n,j, B̂_n,j, Ĉ_n,j have elements obtained from τ̂_n according to the identification discussed before (5.7). We choose m̂ to minimise

    log det Σ̂_n + n(2q+p) log T/T,    n = (ĥ-1)q + m,  m = 1,2,...,q.    (5.10)

Again, for m = q the value of this criterion is that which was optimised before. Then n̂ is taken to be (ĥ-1)q + m̂, and Â_j, B̂_j, Ĉ_j, Σ̂, τ̂ are finally defined to be the values corresponding to (5.9) that optimised (5.10). We may now repeat step 2, commencing from the ê(t) that these define via (5.5). This completes the description of the algorithm.
Remarks. 1. All of the remarks made in relation to the scalar case have analogues here. In relation to (5.4), again a problem will arise unless det Ĉ(z) ≠ 0, |z| ≤ 1. Again this can be checked via a Schur-Cohn criterion and, if the check fails, we should factor Ĉ(e^{iω}) Σ̂ Ĉ(e^{iω})* canonically so as to obtain a Ĉ(z) which does have det Ĉ(z) ≠ 0, |z| ≤ 1. Algorithms for this factorisation are available, but we omit the details here (see the references).

2. Much of the work involved is in step 2, where the sizes of q and p begin to be important. To reduce the calculation it would not be unreasonable to carry out the computations only at the ĥ, m̂ determined at step 1, and, at repetitions of step 2, at the ĥ, m̂ found at the previous step. When that is done, once (5.4) and (5.5) have been computed we may move straight to (5.8), (5.9). However, experience with simulations of generated data shows that the estimate of the transfer function is improved at the first use of step 2, and it may improve again at later iterations of that step.
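The stability check in remark 1 can be sketched as follows. This is an illustrative stand-in: direct root finding replaces a Schur-Cohn test, and the function name is ours.

    import numpy as np

    def c_is_stable(c_coeffs):
        """For scalar c(z) = 1 + c_1 z + ... + c_n z^n, require all zeros
        to lie outside the unit circle (|z| > 1). Root finding stands in
        here for a Schur-Cohn test."""
        roots = np.roots(c_coeffs[::-1])   # numpy expects highest power first
        return np.all(np.abs(roots) > 1.0)

    print(c_is_stable([1.0, -0.5]))   # zero at z = 2: stable
    print(c_is_stable([1.0, -2.0]))   # zero at z = 0.5: unstable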
Notes on References. The algorithms here described were first presented in Hannan and Rissanen (1982) and Hannan and Kavalieris (1984). The emphasis there was more on order determination. For the determination of m in step 1 (i.e. for q > 1) an alternative calculation is given in the second reference. For the structure theory at the beginning of subsection (ii) see Deistler and Hannan (1981), for example. The algorithm in remark 4 in subsection (i) is due to Tunnicliffe Wilson (1969), and its matricial version to Tunnicliffe Wilson (1972).
6. Some Theoretical Considerations

This section will be very brief, since theory is not the purpose of this account, nor could such theory be fully presented in the space available here. However, there seems to be some virtue in indicating the scope of the theory underlying the methods.

In the first place it is not necessary that the linear innovations, e(t), be Gaussian, and all of the methods are valid under much more general conditions, in the sense that the same theory obtains as if the e(t) were Gaussian. The essential condition is that

(6.1)    E{e(t) | e(t-1), e(t-2), ...} = 0.

This is equivalent to the assertion, for (1.3), that the best linear predictor (in the least squares sense) of y(t) - Σ L_i u(t-i) is the best predictor, so that the data is, in that sense, generated by a linear system. Asymptotic distributions, if they are to be the same as for the Gaussian case, require additionally that

(6.2)    E{e(t)e(t)' | e(t-1), ...} = Σ.

Of course some regularity conditions of a reasonably general nature are needed, but we do not discuss them here. (6.1) and (6.2) will hold if the e(t) are independent with zero mean vector and finite second moments, but they are considerably more general conditions.
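To see that (6.1) is strictly weaker than independence, consider the illustrative construction e(t) = z(t)z(t-1) with z(t) independent standard normal; this is our example, not taken from the text. It satisfies (6.1), since E{z(t)} = 0 given the past, yet violates (6.2), since the conditional variance is z(t-1)².

    import numpy as np

    rng = np.random.default_rng(1)
    z = rng.normal(size=100001)
    e = z[1:] * z[:-1]           # e(t) = z(t) * z(t-1)

    # (6.1): sample correlation of e(t) with e(t-1) is ~ 0.
    print(np.corrcoef(e[1:], e[:-1])[0, 1])

    # (6.2) fails: the conditional variance is z(t-1)^2, so e(t)^2 is
    # visibly correlated with e(t-1)^2.
    print(np.corrcoef(e[1:] ** 2, e[:-1] ** 2)[0, 1])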
7. On-Line Procedures

Here only the case p = q = 1 will be considered, though the methods easily generalise to p > 1. There is a large literature concerning methods for real time, on-line estimation of systems, and this has recently been surveyed, as will be indicated in the references. Here attention will be concentrated on an on-line implementation, for q = 1, of the algorithm described in section 5. In other words we implement the two steps of this algorithm in an on-line fashion, with step 1 iterated (i.e. repeated) once.

Before describing that, let us describe three known on-line procedures. Each is of the form

    τ(t) = τ(t-1) + P(t)x(t)ε(t),    ε(t) = w(t) - τ(t-1)'x(t),

    P(t) = {Σ_1^t x(s)x(s)'}^{-1} = P(t-1) - {1 + x(t)'P(t-1)x(t)}^{-1} P(t-1)x(t)x(t)'P(t-1).

Here x(t) is the "independent variable" at time t in a regression of w(t), and τ(t) is the estimate at time t of the vector of regression coefficients. In each case we must be able to construct x(t), w(t) on-line from data to time t together with the estimate τ(t-1).

(1) RLS = Recursive least squares. This corresponds to step 0.

    τ(t)' = (a_1(t), a_2(t), ..., a_h(t), b_1(t), b_2(t), ..., b_h(t)),
    x(t)' = (-y(t-1), -y(t-2), ..., -y(t-h), u(t-1), u(t-2), ..., u(t-h)),
    w(t) = y(t).
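A minimal RLS sketch follows, using the rank-one update of P(t) displayed above; the class name and the diffuse initialisation P(0) are ours.

    import numpy as np

    class RLS:
        """Recursive least squares: tau(t) = tau(t-1) + P(t) x(t) eps(t)."""
        def __init__(self, dim, p0=1e4):
            self.tau = np.zeros(dim)
            self.P = np.eye(dim) * p0   # large P(0) for a diffuse start

        def update(self, x, w):
            eps = w - self.tau @ x                      # eps(t) from tau(t-1)
            Px = self.P @ x
            self.P -= np.outer(Px, Px) / (1.0 + x @ Px) # P(t) update
            self.tau += self.P @ x * eps                # tau(t) update
            return eps

    # ARX example with h = 2: x(t) = (-y(t-1), -y(t-2), u(t-1), u(t-2)).
    rng = np.random.default_rng(2)
    T = 5000
    u = rng.normal(size=T); y = np.zeros(T)
    for t in range(2, T):
        y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + u[t-1] + rng.normal()
    est = RLS(4)
    for t in range(2, T):
        x = np.array([-y[t-1], -y[t-2], u[t-1], u[t-2]])
        est.update(x, y[t])
    print(est.tau)   # approx (-0.6, 0.2, 1.0, 0.0)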
(2) AML = Approximate maximum likelihood. This corresponds to the first use of step 1.

    τ(t)' = (a_1(t), ..., a_n(t), b_1(t), ..., b_n(t), c_1(t), ..., c_n(t)),
    x(t)' = (-y(t-1), ..., -y(t-n), u(t-1), ..., u(t-n), ê(t-1), ..., ê(t-n)),
    w(t) = y(t).

In fact what is most properly called AML uses not ê(t-j) in x(t) but instead ε(t-j), where ε(t) = y(t) - τ(t)'x(t). This can be done since x(t), which uses the latest value ε(t-1), is needed at time t together with τ(t-1), which ε(t-1) uses.

(3) RML = Recursive maximum likelihood. This corresponds to the second use of step 1.

    τ(t)' = (a_1(t), ..., a_n(t), b_1(t), ..., b_n(t), c_1(t), ..., c_n(t)),
    x(t)' = (-ỹ(t-1), ..., -ỹ(t-n), ũ(t-1), ..., ũ(t-n), ẽ(t-1), ..., ẽ(t-n)),
    w(t) = ỹ(t) + ε(t) - ẽ(t),    (7.2)

where the filtered quantities are formed from

    Σ_0^n c_j(t)(-ỹ(t-j), ũ(t-j), ẽ(t-j))' = (-y(t), u(t), ε(t))',    c_0(t) ≡ 1,    ỹ(t) = ũ(t) = ẽ(t) = 0,  t ≤ 0.    (7.1)
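Only the construction of x(t) distinguishes AML from RLS. A sketch, continuing the simulated data and the RLS class from the block above (names ours; n fixed):

    # AML sketch: the regressor contains the recursion's own past
    # residuals eps(t-j) in place of the unobserved innovations e(t-j).
    n = 2
    aml = RLS(3 * n)
    eps = np.zeros(T)
    for t in range(n, T):
        x = np.concatenate([-y[t-n:t][::-1], u[t-n:t][::-1], eps[t-n:t][::-1]])
        eps[t] = aml.update(x, y[t])
    print(aml.tau)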
As presented above, each of these could be used as a procedure independently of the others. Of course n is fixed. It is known that AML may not converge, even if the true system is ARMAX of McMillan degree n, unless

    2ℜ(c(e^{iω})^{-1} - ½) > 0,    ω ∈ [-π, π],

i.e. unless the positive real condition is satisfied. It seems that RML may fail unless the location of the zeros of c(z) is monitored and, when these move inside the unit circle, the c_j(t) used in forming ỹ(t), ũ(t), ẽ(t), ε(t) are held at fixed values until the output vector τ(t) again corresponds to a stable c_j(t) set, i.e. a set with zeros outside of the circle. For these reasons it has been suggested that the algorithms be run in parallel, with the ε(t) for AML provided by the ε(t) from RLS, and with the ẽ(t-j) in RML formed from the ε(t) of AML. The value of h in RLS would in general be much larger than n in AML, RML, where n is the assumed true order. A common choice would be h = 2n, but this
is arbitrary.

One main reason for on-line calculation is to allow the estimates to adapt to an evolving mechanism generating the data. In that case one should also be "forgetting" the remote past, since that will be irrelevant to the estimation problem. Thus a "forgetting factor" λ_t(s) is included that multiplies w(s) and x(s) in the calculations at time t. If

    λ_t(s) = Π_{u=s+1}^t λ(u),

then the nett effect is that only the formula for P(t) is changed, becoming

    P(t) = λ(t)^{-1} [P(t-1) - {λ(t) + x(t)'P(t-1)x(t)}^{-1} P(t-1)x(t)x(t)'P(t-1)].

One reasonable procedure would be to take λ(t) ≡ λ, where 0 < λ < 1 and λ is fairly near to 1, e.g. 0.95. However it is felt that h and n might be made to depend on t. In particular, even if the true system were of the known order, n, then h will have to increase with t, e.g. as log t, in order that τ(t) converge to the true τ. Of course if h increases with t, as it will if AIC or BIC is used to choose h, then eventually the calculation cannot be done in real time. However, if "forgetting" is used then the sample size is not, truly, increasing with t, and thus h should not increase indefinitely. The criterion at time t should be

    log σ̂_h²(t) + h log f(t)/f(t),    (7.3)

where, when "forgetting" is used, f(t) measures the sample size and is

    f(t+1) = λ(t+1)f(t) + 1,

since the effective sample size is

    f(t) = Σ_{s=1}^t Π_{u=s+1}^t λ(u).

It remains to describe how to compute σ̂_h(t) in (7.3) when h, the order, is allowed so to vary. Though these procedures are described for RLS, and for p = 1, it will readily be seen that they can be used in the same way for AML or RML, with n taking the place of h, and for p > 1. Indeed they could also fairly easily be generalised to q > 1. Call x_h(t) the vector x(t) when this has been rearranged as

    (-y(t-1), u(t-1), -y(t-2), u(t-2), ..., -y(t-h), u(t-h)),

and rearrange τ(t) accordingly, calling it τ_h(t). Put

    X_h(t)' = [x_h(1), ..., x_h(t)],    v(t)' = (y(1), ..., y(t)).

If Q is orthogonal and

    Q[X_h(t)  v(t)] = [ R_h(t)  r_h(t) ; 0  s_h(t) ],

where R_h(t) is upper triangular, then R_h(t)τ_h(t) = r_h(t) and f(t)^{-1}s_h(t)² is σ̂_h(t)². Moreover, as will now be indicated, the calculations may be done so that all of the σ̂_h(t)², h = 1,...,H, may be obtained at little cost. Put

    S = [ R_H(t)  r_H(t) ; x_H(t+1)'  y(t+1) ]

and consider Q_{2H}Q_{2H-1} ... Q_1 S, where Q_i, orthogonal, acts only on rows i and 2H+1 and introduces a zero in the (2H+1, i)'th place. Since Q_{2h}Q_{2h-1} ... Q_1 acts only on the rows numbered 1,...,2h and 2H+1, the leading rows of Q_{2h} ... Q_1 S provide the updated R_h(t+1), r_h(t+1) for every h ≤ H from the one sequence of rotations, and the rotations are computed recursively through quantities of the form

    d_i = d + δ_{i-1}x_i²,    c = d/d_i,    δ_i = dδ_{i-1}/d_i,    r̃_k = c r_k + s x_k,    x̃_k = x_k - x_i r_k,

with δ_0 = 1. The bottom right hand element of Q_{2h}Q_{2h-1} ... Q_1 S is δ_h^{1/2} ê_h(t), so that

    s_h(t)² = s_h(t-1)² + δ_h ê_h(t)²,    h ≤ H,

and all of the σ̂_h(t)², h ≤ H, may be computed at no extra cost. Thus, given this and (7.3), the whole thing may be done with an on-line form of the algorithms in section 5, at least for q = 1. How well this works remains to be seen. If it was believed that the system was not evolving, then we set λ(t) ≡ 1 and h could be chosen to minimise (7.3) and should be allowed to increase, e.g. as log t. This means that eventually the algorithm could not be run in real time; however, since log t increases so slowly, the calculation could certainly be run in real time up to values of t so large (say t = 2000) that it could be regarded as a real time algorithm. If it was believed that a rational transfer function system would eventually fit the data well, then n, in AML and RML, could be fixed at its long run value, which also has some virtues, since the calculation then remains fully recursive.
Allowing h and n to increase needs investigation, but could prove useful, with λ(t) adroitly varied, to model an evolving phenomenon, or even a non-linear, episodic phenomenon. When h or n is allowed to vary it is likely that occasionally they will change appreciably from one value of t to another. This is because (7.3) is likely to be flat near its minimum, or even to have several minima near to equality. This may not matter much, since all of the competing models are behaving about equally well, but it could be misinterpreted as evolution.
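The forgetting-factor update and the all-orders residual computation can be sketched as follows. This is a batch illustration under stated assumptions: plain Givens rotations on the triangular factor of [X_H v] stand in for the single-pass, forgetting-weighted recursion described above, and the function names are ours.

    import numpy as np

    def forgetting_update(P, x, lam):
        """One step of the P(t) recursion with forgetting factor lambda(t):
        P(t) = lam^{-1}[P(t-1) - {lam + x'P(t-1)x}^{-1} P(t-1)xx'P(t-1)]."""
        Px = P @ x
        return (P - np.outer(Px, Px) / (lam + x @ Px)) / lam

    def effective_sample_size(lams):
        """f(t+1) = lambda(t+1) f(t) + 1; with lambda fixed at 0.95 this
        tends to 1/(1 - 0.95) = 20."""
        f = 0.0
        for lam in lams:
            f = lam * f + 1.0
        return f

    def absorb(R, row):
        """Rotate one observation row into the upper triangular factor R,
        one Givens rotation per column (the Q_i sequence of the text)."""
        for i in range(R.shape[0]):
            a, b = R[i, i], row[i]
            h = np.hypot(a, b)
            if h == 0.0:
                continue
            c, s = a / h, b / h
            R[i], row = c * R[i] + s * row, -s * R[i] + c * row
        return R

    def all_order_sigma2(X, y, f=None):
        """sigma_h^2(t) for every h <= H from one factor of [X_H(t) v(t)],
        regressors arranged lag by lag: (-y(t-1), u(t-1), -y(t-2), ...).
        Regressing on the first 2h columns leaves, as residual sum of
        squares, the squared y-column entries below row 2h of the factor."""
        T, k = X.shape                    # k = 2H
        R = np.zeros((k + 1, k + 1))
        for t in range(T):
            absorb(R, np.append(X[t], y[t]))
        f = T if f is None else f
        tail = np.cumsum(R[::-1, k] ** 2)[::-1]   # tail[j] = sum_{i>=j} R[i,k]^2
        return [tail[2 * h] / f for h in range(1, k // 2 + 1)]

Criterion (7.3) is then log σ̂_h²(t) + h log f(t)/f(t) for each h. In the on-line form a new row is absorbed at each t, with the factor scaled by λ(t)^{1/2} before absorption, so that all σ̂_h(t)², h ≤ H, are updated at a cost linear in the number of columns.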
Notes on References. The field of on-line calculation is extensively surveyed in Ljung and Söderström (1983). The basic procedure of this section, for h, n fixed, was suggested in Mayne, Astrom and Clarke (1983), and the procedure for h, n varying in Hannan, Kavalieris and Mackisack (1986).
References.

Adamyan, V.M., Arov, D.Z. and Krein, M.G. (1971) Analytic properties of Schmidt pairs for a Hankel operator and the generalised Schur-Takagi problem. Math. USSR Sbornik, 15, 31-73.

Akaike, H. (1969) Fitting autoregressive models for prediction. Ann. Inst. Statist. Math., 21, 243-247.

Akaike, H. (1976) Canonical correlation analysis of time series and the use of an information criterion. In: System Identification, Advances and Case Studies, eds. R.K. Mehra and D.G. Lainiotis, Academic Press, New York, 29-91.

Anderson, B.D.O. and Moore, J.B. (1979) Optimal Filtering, Prentice Hall, Englewood Cliffs.

Casti, J.L. (1977) Dynamical Systems and Their Applications, Academic Press, New York.

Cooley, J.W. and Tukey, J.W. (1965) An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-301.

Deistler, M. and Hannan, E.J. (1981) Some properties of the parameterization of ARMA systems with unknown order. J. Multivariate Anal., 11, 474-484.

Friedlander, B. (1982) Lattice filters for adaptive processing. Proc. IEEE, 70, 830-867.

Glover, K. (1983) All optimal Hankel norm approximations of linear multivariable systems and their L∞ error bounds. Research Report, Control and Management Systems Division, Dept. of Engineering, Cambridge, England.

Hannan, E.J. (1970) Multiple Time Series, Wiley, New York.

Hannan, E.J. (1980) The estimation of the order of an ARMA process. Ann. Statist., 8, 1071-1081.

Hannan, E.J. (1981) Estimating the dimension of a linear system. J. Multivariate Anal., 11, 459-473.

Hannan, E.J. (1982) Testing for autocorrelation and Akaike's criterion. In: Essays in Statistical Science, eds. J.M. Gani and E.J. Hannan, Applied Probability Trust, Sheffield, 403-412.

Hannan, E.J. and Kavalieris, L. (1984) Multivariate linear time series models. Adv. Appl. Prob., 16, 492-561.

Hannan, E.J. and Kavalieris, L. (1986) Regression, autoregression models. J. Time Series Anal., 7.

Hannan, E.J., Kavalieris, L. and Mackisack, M. (1986) Recursive estimation of mixed autoregressive moving-average order. Biometrika, 73, no. 1.

Hannan, E.J. and Rissanen, J. (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika, 69, 81-94.

Jewell, N.P. and Bloomfield, P. (1983) Canonical correlations of past and future for time series: definitions and theory. Ann. Statist., 11, 837-847.

Jewell, N.P. and Bloomfield, P. (1983a) Canonical correlations of past and future for time series: bounds and computation. Ann. Statist., 11, 848-855.

Kailath, T. (1980) Linear Systems, Prentice Hall, Englewood Cliffs.

Ljung, L. and Söderström, T. (1983) Theory and Practice of Recursive Identification, MIT Press, Cambridge, Mass.

Mayne, D.Q., Astrom, K.J. and Clarke, J.M. (1983) A new algorithm for recursive estimation of parameters in controlled ARMA processes. Research Report, Dept. of Electrical Engineering, Imperial College, London.

Rissanen, J. (1983) A universal prior for integers and estimation by minimum description length. Ann. Statist., 11, 416-431.

Shibata, R. (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Statist., 8, 147-164.

Tunnicliffe Wilson, G. (1969) Factorization of the covariance generating function of a pure moving-average process. SIAM J. Numer. Anal., 6, 1-7.

Tunnicliffe Wilson, G. (1972) The factorization of matricial spectral densities. SIAM J. Appl. Math., 23, 420-426.

Whittle, P. (1963) On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.
Chapter 2
Linear Errors-in-Variables Models
Manfred Deistler
1. Introduction

In this contribution we are concerned with some aspects of the identification problem for linear systems where both inputs and outputs are subject to ("observational") errors. Models of this kind are called errors-in-variables (EV) models.

The conventional setting in the statistical analysis of linear systems is to attribute all errors to the outputs, or equivalently (for our purposes) to add the errors to the equations. This gives the errors in equations (EE) models.

Let x̂_t and ŷ_t denote the "true" inputs and outputs respectively, and let x_t and y_t denote the observed inputs and outputs; then the situation can be illustrated as follows. EV models are of the form:

    [Fig. 1: Schematic representation of an EV model]

There u_t and v_t are the errors of the inputs and the outputs respectively. On the other hand, EE models are of the form:

    [Fig. 2: Schematic representation of an EE model]

Of course the EV setting is more general than the EE setting. For a number of purposes, e.g. for the prediction of the observed outputs from observed inputs, the EE setting is adequate. In many cases, however, the EV setting seems to be more appropriate, e.g.

(i) if our main interest concerns the "true" system generating the data (rather than a good representation of the data) and if we cannot be sure a priori that the true inputs are not contaminated by errors;

(ii) if we want to decouple the common effect between the variables from the individual effects;

(iii) if there is no a priori classification of the observed variables into inputs and outputs and if thus a symmetric treatment of the variables would be appropriate.
We are dealing here only with linear systems in a stationary context. Also, if the contrary has not been stated explicitly, we restrict ourselves to the single input - single output case. Our primary interest is in the characteristics of the system, i.e. in the transfer function (or the parameters of the transfer function); but the characteristics of the errors and of (x̂_t) are also of interest.

The statistical theory of linear dynamic EE systems, especially of ARMAX systems (also in the multi-input - multi-output case), has reached a certain stage of completeness now (see Hannan and Kavalieris (1984)). In the EV case, on the other hand, there is still a great number of open problems, and this is the reason why there is still a relatively small number of applications in this field. The main problems in the EV case arise from the fact that the (ensemble) second moments of the observations do in general not uniquely determine the transfer function of the system. Another difference to EE models is that in the EV case higher order moments may contain additional (in the non Gaussian case) information about the transfer function.

Our emphasis is on two problems: The first is the problem of identifiability, i.e. the problem whether the characteristics of interest
mentioned above can be uniquely determined from certain characteristics of the observations, as e.g. from their (ensemble) second moments or from their probability law (see Deistler and Seifert (1978)). If the answer is negative, then the second problem is to describe the sets of observationally equivalent characteristics of interest, i.e. the sets of characteristics of interest which correspond to the same characteristics of the observations.

These questions are questions preceding estimation in the narrow sense and, as has been stated already, they turn out to be the main difficulty in the process of estimation (or inference) in EV models. This difficulty is the reason why not very much attention has been paid to EV models for a long time. However, in the last decade there has been a resurging interest in EV models in econometrics, statistics and system theory; see e.g. Aigner and Goldberger (1977), Aigner et al. (1984), Anderson, B.D.O. (1985), Anderson and Deistler (1984), Anderson, T.W. (1984), Deistler (1984), Deistler (1985a), Fuller (1980), Green and Anderson (1985), Hinich and Weber (1984), Kalman (1982), Kalman (1983), Maravall (1979), Picci (1985), Schneeweiß und Mittag (1985), Söderström (1980), Wegge (1983).

The paper is organized as follows. In section 2 we repeat some well known results for the static case. In sections 3 to 5 we consider the (dynamic) case when the characteristics of the observations considered are their second moments. Thereby in section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal, and with the problem whether causality can be detected from the second moments of the observations. In section 5 several conditions for identifiability are given. Finally, in section 6 we derive conditions for identifiability using information coming from moments of order greater than two.

The system considered is of the form

(1.1)    ŷ_t = w(B) x̂_t,
where B is a complex variable as well as the backward-shift operator, and where

(1.2)    w(B) = Σ w_i B^i

is the transfer function. The summation in (1.2) ranges over all integers, and thus in general the system is not a priori assumed to be causal. The observed processes (x_t) and (y_t) are given by

(1.3)    x_t = x̂_t + u_t,

(1.4)    y_t = ŷ_t + v_t.

We assume throughout:

(1.5) All processes considered are (wide sense) stationary; all limits of random variables are understood in the sense of mean squares convergence;

(1.6)    Ex̂_t = Eu_t = Ev_t = 0;

(1.7)    Ex̂_s u_t = Ex̂_s v_t = 0,    ∀ s,t;

(1.8)    (u_t, v_t) has a spectral density, Σ̃ say.

These assumptions are called the standard assumptions here, and they will not be further explicitly restated. The assumption Ex̂_t = 0 is imposed for notational convenience only and may easily be relaxed. (1.7) is natural in our context. Also, the assumption (1.8) is natural for errors. In many cases we in addition assume
(1.9)    Eu_s v_t = 0,    ∀ s,t,

i.e. Σ̃ is diagonal, and

(1.10) All processes considered have a spectral density.

Thereby, if (z_t) is a stationary process, we often use f_z to denote its spectral density. Assumption (1.9) means that all (linear) common effects between (x_t) and (y_t) are due to the system and that only individual effects are attributed to the errors. Of course situations may occur where such an assumption can not be justified, e.g. if the errors in the measurement devices for inputs and outputs are correlated. Without any additional assumption the situation then is hopeless, because "too many" systems correspond to given second moments of the observations. Additional information to separate the effects of the errors could be obtained from certain frequency domain properties of the errors, or from higher order moments.

2. The Static Case

Here we consider the special case where the system is static, i.e. the transfer function w is simply the slope parameter of a line, and all processes are white noise. This case has been discussed in great detail in the literature, see e.g. Gini (1921), Frisch (1934) and the surveys by Madansky (1959), Moran (1971), Aigner et al. (1984) and T.W. Anderson (1984). For the multivariable case, which is much more complicated, see Kalman (1982) and Klepper and Leamer (1984).

The static EV model is written as

(2.1)    ŷ_t = a x̂_t,    a ∈ ℝ,

together with (1.3) and (1.4), where (x̂_t), (u_t) and (v_t) are white noise and thus

    Ex̂_s x̂_t = δ_st σ_x̂,    Eu_s u_t = δ_st σ_u,    Ev_s v_t = δ_st σ_v.

In addition we assume (1.9), i.e. Eu_s v_t = 0.
If we try to write (2.1)(1.3)(1.4) as a "regression" in the observed variables, we obtain:

    y_t = a x_t + (v_t - a u_t).

But here Ex_t(v_t - a u_t) = -a·σ_u, and thus in general (ordinary) least squares estimators will not be consistent. Therefore we have to investigate the problem in more detail.

The parameters of interest are θ = (a, σ_x̂, σ_u, σ_v). The relation between these parameters and the second moments of the observations is given by

(2.2)    σ_x = Ex_t² = σ_x̂ + σ_u,

(2.3)    σ_xy = Ex_t y_t = Ex̂_t ŷ_t = a·σ_x̂,

(2.4)    σ_y = Ey_t² = a²σ_x̂ + σ_v.

Thus the problem of identifiability for this model is whether θ is uniquely determined from σ_x, σ_xy, σ_y.
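The inconsistency of least squares, and the role of (2.2) - (2.4), are easy to exhibit numerically. A minimal simulation sketch (parameter values are ours):

    import numpy as np

    rng = np.random.default_rng(3)
    T = 200000
    a, s_xhat, s_u, s_v = 2.0, 1.0, 0.5, 0.5     # true parameters
    xh = rng.normal(0, np.sqrt(s_xhat), T)
    x = xh + rng.normal(0, np.sqrt(s_u), T)      # x_t = xhat_t + u_t
    y = a * xh + rng.normal(0, np.sqrt(s_v), T)  # y_t = a*xhat_t + v_t

    sx, sxy, sy = np.mean(x * x), np.mean(x * y), np.mean(y * y)
    print(sxy / sx)   # OLS slope: a*s_xhat/(s_xhat + s_u) = 4/3, not 2
    print(sy / sxy)   # reverse-regression slope: 2.25
    # The true a = 2 lies between these two values.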
A slightly more general model would be of the form

(2.5)    b ŷ_t = a x̂_t

(where a and b are suitably normalized, e.g. by a² + b² = 1) together with (1.3) - (1.4). Sloppily speaking, here we allow for the case a = ∞ in (2.1). Then the problem of observational equivalence is equivalent to the following "Frisch" problem (see Kalman (1982)): Given the covariance matrix

    K = [ σ_x  σ_xy ; σ_yx  σ_y ],

find all decompositions

(2.6)    K = K̂ + Σ̃

into covariance (i.e. symmetric, nonnegative definite) matrices K̂ and Σ̃, such that K̂ is singular and Σ̃ is diagonal. This equivalence is straightforward: here K̂ is the covariance matrix of (x̂_t, ŷ_t); a and b, after suitable normalization, are defined from the linear dependence relations in K̂, and

    Σ̃ = [ σ_u  0 ; 0  σ_v ].

In the case (2.1) (which excludes the possibility b = 0 in (2.5), and which is the only one we treat here, unless the contrary has been explicitly stated)

    K̂ = [ σ_x̂  a σ_x̂ ; a σ_x̂  a² σ_x̂ ]

holds. By the singularity of K̂ we have

(2.7)    det K̂ = σ_x̂ σ_ŷ - σ²_xy = 0,

where σ_ŷ = Eŷ_t², and furthermore

(2.8)    0 ≤ σ_u ≤ σ_x - σ²_xy σ_y^{-1},    0 ≤ σ_v ≤ σ_y - σ²_xy σ_x^{-1}.
In particular, for σ_xy > 0, every a in the interval [σ_xy σ_x^{-1}, σ_y σ_xy^{-1}] gives rise to a decomposition (2.6), with σ_x̂ = a^{-1}σ_xy, σ_u = σ_x - a^{-1}σ_xy and σ_v = σ_y - a σ_xy, and these are the only ones (Theorem 2.1). Thus the model is not identifiable from the second moments alone.

Higher order cumulants change this. Let C_{y^r x^{n-r}} denote the n'th order cumulants of the observations. Assume that (u_t, v_t) is Gaussian, so that its cumulants of order n > 2 vanish; note that in any case

    C_{u^r v^{n-r}} = 0,    r > 0, n - r > 0,

since v and u are independent. Let us assume that x̂_t is non Gaussian and that there is an n > 2 such that C_{x̂^n} ≠ 0. Then, using

(2.16)    C_{y^r x^{n-r}} = a^r C_{x̂^n},    r ≥ 0,  n - r ≥ 0,

a is uniquely determined from the cumulants of the observed processes, since for r > 0, n - r ≥ 0,

(2.18)    a = C_{y^r x^{n-r}} / C_{y^{r-1} x^{n+1-r}}.

Note that (2.16) holds for n > 2 (as opposed to the case n = 2). Once a is determined, σ_x̂, σ_u and σ_v can be uniquely determined from (2.2) - (2.4). Thus we have shown:

Theorem 2.2: Consider the static EV model (2.1)(1.3)(1.4) together with the standard assumptions. If in addition x̂_t is non Gaussian with C_{x̂^n} ≠ 0 for some n > 2, (u_t, v_t) is Gaussian, and a ≠ 0, then the model is identifiable.

This theorem can be extended to the multivariate case in a straightforward manner.

If, instead of assuming (u_t, v_t) Gaussian, we postulate that (u_t) and (v_t) are independent processes with C_{u^n} = C_{v^n} = 0 whenever n > 2, then (2.16) holds for all n > 2 and all r, and therefore a is unique provided there is an n > 2 such that C_{x̂^n} ≠ 0.
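The cumulant-ratio estimator (2.18) is easily tried out. A minimal sketch for n = 3, r = 1 (the data-generating choices are ours; for zero-mean data the third-order cumulants are just the third-order moments):

    import numpy as np

    rng = np.random.default_rng(4)
    T = 500000
    xh = rng.exponential(1.0, T) - 1.0     # skewed, zero mean: C_{xhat^3} != 0
    x = xh + rng.normal(0, 0.7, T)         # Gaussian errors: third cumulants vanish
    y = 2.0 * xh + rng.normal(0, 0.7, T)

    # (2.18) with n = 3, r = 1: a = C_{y x^2} / C_{x^3}.
    a_hat = np.mean(y * x * x) / np.mean(x ** 3)
    print(a_hat)   # approx 2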
Now let us make a few remarks on estimation. The negative logarithmic Gaussian likelihood function of the static model (2.1)(1.3)(1.4) is of the form (where constants have been neglected):

(2.19)    L_T(θ) = T log det K(θ) + Σ_{t=1}^T (x_t, y_t) K^{-1}(θ) (x_t, y_t)'.

Thereby K(θ) is the covariance matrix K in (2.6) corresponding to the parameters θ, and T is the sample size. Thus the corresponding maximum likelihood estimator (MLE), i.e. the estimate of the set of all observationally equivalent parameters, obtained by minimizing L_T(θ), is then, according to theorem 2.1, given by (for the case σ̂_xy,T > 0, say)

(2.20)    {θ = (a, a^{-1}·σ̂_xy,T, σ̂_x,T - a^{-1}·σ̂_xy,T, σ̂_y,T - a·σ̂_xy,T) | a ∈ [σ̂_xy,T σ̂_x,T^{-1}, σ̂_y,T σ̂_xy,T^{-1}]},

where

    K_T = [ σ̂_x,T  σ̂_xy,T ; σ̂_xy,T  σ̂_y,T ] = (1/T) Σ_{t=1}^T (x_t, y_t)'(x_t, y_t)

is the sample covariance matrix corresponding to the true K. In the non Gaussian case, of course, (2.18) can be used to define a consistent estimator for a from the sample cumulants. Here the problem arises which cumulants should be selected; also, information coming from sample cumulants of different order can be combined. Note also that the estimators of a obtained from (2.18) do not necessarily satisfy the restrictions coming from the second moments. The reader is referred to Drion (1951) and Scott (1950).

3. Second Moments and Dynamic Models: The General Case

From now on, linear dynamic systems are considered. In this and in the two following sections, only the information coming from the second moments of the observed processes (x_t, y_t) is used.
For the moment let x_t and y_t be not necessarily one dimensional. Let z_t = (x_t', y_t')', ẑ_t = (x̂_t', ŷ_t')', w_t = (u_t', v_t')'. The general form of a linear dynamic system is:

(3.1)    lim_{N→∞} Σ_{i=-N}^N w̃_i ẑ_{t-i} = 0,    w̃_i ∈ ℝ^{m*×n},  m* < n,

and we assume that (3.1) can be written as Σ_{i=-∞}^∞ w̃_i ẑ_{t-i} = 0.

Clearly, for n = 2, f̂(λ) can either have rank one or rank zero. In the second case f̂(λ) = 0 and f̂_xy(λ) = 0. Imposing (3.4) implies m* = 1 and that f̂(λ) has rank 1 for all λ; (3.4) is automatically fulfilled if f̂_xy(λ) ≠ 0 for all λ. Thus under (3.4) the system can always be written as (1.1), where w̃ = (w, -1) is unique for given f̂, and f̂_xy(λ) = 0 implies w(e^{-iλ}) = 0.

If f itself is singular, then f̂ = f and Σ̃ = 0 defines a decomposition corresponding to an error-free system. This decomposition is unique whenever f_xy(λ) ≠ 0. For λ's where f_xy(λ) = 0 we may have e.g. f̂_y(λ) = 0 (with f_x(λ) > 0 and f̂_x(λ) > 0), and then f_u(λ) > 0 gives rise to another decomposition; of course in this case we have, for the corresponding transfer function, w(e^{-iλ}) = 0. For the rest of the paper we always assume that f(λ) is nonsingular on a set of Lebesgue measure greater than zero.
Besides the transfer function w, the other characteristics of interest are f_x̂, f_u, f_v. Analogously to the static case, the set of pairs (f_x̂, f_ŷ) compatible with given f satisfies

(3.5)    0 ≤ f_x̂ ≤ f_x,    f_x̂(λ) = f_x̂(-λ),    f_x̂ is measurable,

(3.6)    0 ≤ f_ŷ ≤ f_y,    f_ŷ(λ) = f_ŷ(-λ),    f_ŷ is measurable,

and, since f̂ is singular,

(3.7)    |f_xy|² = f_x̂ f_ŷ,

and (3.5)(3.6)(3.7) are the only restrictions on (f_x̂, f_ŷ). Thus we have (Anderson and Deistler (1984), Deistler (1985a)):

Theorem 3.1: Consider the linear dynamic EV system (1.1). Under the additional assumptions (1.9)(1.10), all transfer functions w satisfying

(3.8)    |f_yx(λ)|·f_x^{-1}(λ) ≤ |w(e^{-iλ})| ≤ f_y(λ)·|f_yx(λ)|^{-1},    arg w(e^{-iλ}) = arg f_yx(λ),

and only those, correspond to the given second moments of the observations.
4. The Causal Case

Now assume that the transfer functions are a priori known to be causal and that w = a^{-1}b is rational. Factor the polynomials as

    a = a⁺ a⁻ B^{∂a°},    b = b⁺ b⁻ B^{∂b°},

where a⁺, b⁺ contain the zeros of modulus greater than one, a⁻, b⁻ those of modulus less than one, and ∂a°, ∂b° count the zeros at the origin, so that

    w = b⁺ b⁻ B^{∂b°} (a⁺ a⁻ B^{∂a°})^{-1},

and let n̄ = ∂b⁻ + ∂b° - ∂a° count the zeros of w inside the unit circle together with the net delay. If n̄ = 0, then w is the unique causal and miniphase transfer function giving the observed f_yx (up to the ambiguities described below). More generally, two causal transfer functions w and w̄ correspond to the same f_yx if and only if there exists a polynomial f₂ satisfying

(4.16)    0 ≤ ∂f₂ ≤ n̄

and a constant c > 0 such that

(4.17)    w̄ = c·(b⁺ b̄⁻)(a⁺ ā⁻)^{-1}·f₂ f̄₂^{-1}·B^{n̄-∂f₂}

holds, where, for a polynomial p, p̄ denotes its reflection, with zeros reciprocal to those of p.

Proof: If in (4.17) we take f₂ ≡ 1, then w̄ = c(b⁺b̄⁻)(a⁺ā⁻)^{-1}B^{n̄}, where (b⁺b̄⁻)(a⁺ā⁻)^{-1} is a causal and miniphase transfer function; the remaining cases follow in the same way.

This result has been stated in Deistler (1985a) and Deistler (1985b). Partly more general results have been given, for the causal, not necessarily rational case, in Anderson, B.D.O. (1985) for the single input - single output case, and in Green and Anderson (1985) for a causal multivariable case.

From (4.17) we see that the causality assumption gives a substantial reduction of the set of all transfer functions compatible with given f_yx. If n̄ = 0, then in the causal case w is unique up to multiplication by a positive constant (this has been pointed out by Hinich (1983) and Anderson, B.D.O. (1985)). An estimation procedure for the case n̄ = 0 has been developed by Hinich (1983) and Hinich and Weber (1984).
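Before turning to identifiability conditions, note that the modulus bounds of theorem 3.1 can be evaluated directly from spectral estimates. A minimal sketch (the function name is ours; the spectra are assumed given on a common frequency grid):

    import numpy as np

    def modulus_bounds(f_x, f_y, f_xy):
        """Pointwise bounds (3.8) on |w(e^{-i lambda})| from the spectra:
        |f_xy| / f_x <= |w| <= f_y / |f_xy|."""
        lower = np.abs(f_xy) / f_x
        upper = f_y / np.abs(f_xy)
        return lower, upper

As a degenerate check: in the error-free case f_xy = w f_x and f_y = |w|² f_x, so both bounds collapse onto |w|.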
5. Conditions for Identifiability from the Second Moments of the Observations

This section consists of two parts: In the first part we investigate the additional information coming from (4.8) and (4.10); this is the second step of the analysis of the previous section, for the rational case. Special emphasis is put on identifiability. In the second part we give some other conditions for identifiability.

We have (see Anderson and Deistler (1984), Anderson, B.D.O. (1985), Deistler (1985a)(1985b), Maravall (1979), Nowak (1983)):

Theorem 5.1: Let the assumptions (4.1) - (4.5) hold. Then:

(i) If the transfer functions are a priori known to be causal, and if either n̄ = 0 or

(5.1)    d, b are relatively prime,

(5.2)    a, e are relatively prime,

(5.3)    b⁺, b̄⁻ are relatively prime,

then w is uniquely determined from f under each of the following conditions:

(5.4)    d, c are relatively prime and ∂d > 0;

(5.5)    ad, f are relatively prime and ∂(ad) > 0;

(5.6)    ∂d = 0 and ∂e > ∂h - ∂c;

(5.7)    ∂(ad) = 0 and ∂e + ∂b > ∂g - ∂f.

(ii) If w is a priori assumed to be causal, and if d and c are a priori assumed to be relatively prime, then there is only a finite number of factors f₂ compatible with given f_x.

(iii) If the transfer functions are not necessarily causal, and if (5.1) - (5.3) and

(5.8)    a, e are relatively prime,

(5.9)    a⁺, ā⁻ are relatively prime,

hold, then under (5.4) or (5.5) or (5.6) or (5.7) w is unique.

(iv) If w, f_x̂ corresponds to given f_yx, then all cw, c^{-1}f_x̂, where c satisfies

(5.10)    0 < c_min ≤ c ≤ c_max,

with c_min and c_max defined by

(5.11)    min_{|B|=1} (f_x(B) - c_min^{-1} f_x̂(B)) = 0,

(5.12)    min_{|B|=1} (f_y(B) - c_max |w(B)|² f_x̂(B)) = 0,

correspond to given f, and for all other c, cw, c^{-1}f_x̂ does not correspond to given f.
Proof: (i): If n̄ = 0 then, as has already been stated, w is uniquely determined from (4.9) up to multiplication by a positive constant. The same holds under (5.1) - (5.3): Due to (5.1) and (5.2), no pole-zero cancellations on the r.h.s. in (4.9) can occur, and thus a and d are uniquely determined from the poles in f_yx. By (5.3), e then is uniquely determined from those zeros of a d d* f_yx, B_i say, where also B_i^{-1} is a zero, and thus b and w are unique up to multiplication by a positive constant. From (4.8) we have

(5.13)    d f_x d* = e σ e* + d c^{-1} h σ_u h* c^{-1}* d*.

If (5.4) holds, then there exists at least one zero of d, B₁ say, and we have

    d f_x d*(B₁) = e σ e*(B₁),

and from this σ, and hence the remaining quantities, are uniquely determined. The proof for (5.5) is completely analogous. If (5.6) holds, then (4.8) is of the form

    f_x = e σ e* + c^{-1} h σ_u h* c^{-1}*,

and thus c is uniquely obtained from the poles of f_x. Then σ is obtained from a comparison of coefficients of power ∂e + ∂c in

    c f_x c* = c e σ e* c* + h σ_u h*,

and in the same way we proceed if (5.7) holds. The proof of (iii) is completely analogous, since (5.1) - (5.3) and (5.8)(5.9) again guarantee that w is determined from f_yx up to multiplication by a positive constant.

(ii): If d and c are relatively prime, then all zeros of d are poles of f_x, and thus there is only a finite number of candidates for d, and thus also for f₂ in (4.17).

(iv) is an immediate consequence of Theorem 4.1 and of (4.8) and (4.10), taking into account the non-negativity of spectral densities for |B| = 1.

Clearly, once w is uniquely determined and if w(e^{-iλ}) ≠ 0, then also f_x̂, f_u, f_v are unique. (i) and (iii) show that, e.g. using (4.19), once the degrees are prescribed and if ∂d > 0, we have identifiability on a generic subset of the parameter space.

Now we discuss some other cases where additional a priori restrictions guarantee identifiability from the second moments of the observations.

(i) Let the inputs x̂_t have a spectral distribution function

(5.14)    F_x̂(λ) = ∫_{[-π,λ]} f_x̂ dλ + Σ_{j: λ_j ≤ λ} F_x̂,j.

Thus (x̂_t) is a fairly general process where F_x̂ has an absolutely continuous and a discrete part, and where the discrete part corresponds to a stationary harmonic process Σ e^{iλ_j t} z_x,j, with F_x̂,j = E|z_x,j|². Here we do not impose (1.9). By assumption (1.8), (u_t, v_t) has a spectral density and thus we have

(5.15)    F_x(λ) = ∫_{[-π,λ]} (f_x̂ + f_u) dλ + Σ_{j: λ_j ≤ λ} F_x̂,j.

6. Identifiability from Higher Order Moments

Suppose there is an n > 2 such that C_{x̂^n} ≠ 0. If in addition w(e^{-iλ}) ≠ 0 for all λ, then f_ŷ^n(λ₁,...,λ_{n-1}) ≠ 0 for all λ₁,...,λ_{n-1}, and then due to (6.3) condition (6.8) is fulfilled. The generalization of Theorem 6.1 to the multivariable case is straightforward. Estimators of the transfer function w may be obtained from (6.9), replacing the cumulant spectra by their estimators.
References

Aigner, D.J. and A.S. Goldberger (Eds.) (1977): Latent Variables in Socio-Economic Models. North Holland P.C., Amsterdam

Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1984): Latent Variable Models in Econometrics. In: Griliches, Z. and M.D. Intriligator (Eds.), Handbook of Econometrics. North Holland P.C., Amsterdam

Akaike, H. (1966): On the Use of Non-Gaussian Process in the Identification of a Linear Dynamic System. Annals of the Institute of Statistical Mathematics 18, 269-276

Anderson, B.D.O. (1985): Identification of scalar errors-in-variables models with dynamics. Forthcoming in Automatica

Anderson, B.D.O. and M. Deistler (1984): Identifiability in Dynamic Errors-in-Variables models. Journal of Time Series Analysis 5, 1-13

Anderson, T.W. (1984): Estimating Linear Statistical Relationships. Annals of Statistics 12, 1-45

Brillinger, D.R. (1981): Time Series: Data Analysis and Theory. Expanded Edition. Holden Day, San Francisco

Deistler, M. (1984): Linear errors-in-variables models. In: J. Franke, W. Härdle und D. Martin (Eds.), Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, Springer-Verlag, Berlin

Deistler, M. (1985a): Linear dynamic errors-in-variables models. In: J. Gani and M. Priestley (Eds.), Essays in Time Series and Allied Processes. Forthcoming

Deistler, M. (1985b): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. In: Proc. 5th Franco-Belgian Meeting of Statisticians. Forthcoming

Deistler, M. and H.G. Seifert (1978): Identifiability and Consistent Estimability in Dynamic Econometric Models. Econometrica 46, 969-980

Drion, E.F. (1951): Estimation of the Parameters of a Straight Line and of the Variances of the Variables, if they are Both Subject to Error. Indagationes Math. 13, 256-260

Frisch, R. (1934): Statistical Confluence Analysis by Means of Complete Regression Systems. Publication No. 5, University of Oslo, Economic Institute

Fuller, W.A. (1980): Properties of some Estimators for the Errors-in-Variables Model. Annals of Statistics 8, 407-422

Geary, R.C. (1942): Inherent Relations between Random Variables. Proceedings of the Royal Irish Academy, Sec. A, 47, 63-76

Geary, R.C. (1943): Relations between Statistics: The General and the Sampling Problem When the Samples are Large. Proceedings of the Royal Irish Academy, Sec. A, 49, 177-196

Gini, C. (1921): Sull'interpolazione di una retta quando i valori della variabile indipendente sono affetti da errori accidentali. Metron 1, 63-82

Green, M. and B.D.O. Anderson (1985): Identification of multivariable errors-in-variables models with dynamics. Mimeo

Hannan, E.J. (1970): Multiple Time Series. Wiley, New York

Hannan, E.J. and L. Kavalieris (1984): Multivariate Linear Time Series Models. Advances in Applied Probability 16, 492-561

Hinich, M.J. (1983): Estimating the Gain of a Linear Filter from Noisy Data. In: D.R. Brillinger and P.R. Krishnaiah (Eds.), Handbook of Statistics, Vol 3. North Holland, Amsterdam

Hinich, M.J. and W.E. Weber (1984): Estimating Linear Filters with Errors in Variables Using the Hilbert Transform. Federal Reserve Bank of Minneapolis, Res. Dept. Staff Report 96

Kalman, R.E. (1982): System Identification from Noisy Data. In: A. Bednarek and L. Cesari (Eds.), Dynamical Systems II, a University of Florida International Symposium. Academic Press, New York

Kalman, R.E. (1983): Identifiability and Modeling in Econometrics. In: Krishnaiah, P.R. (Ed.), Developments in Statistics, Vol 4. Academic Press, New York

Kendall, M.G. and A. Stuart (1969): The Advanced Theory of Statistics. Vol I, 3rd Edition, Griffin, London

Klepper, S. and E. Leamer (1984): Consistent Sets of Estimates for Regressions with Errors in all Variables. Econometrica 52, 163-183

Madansky, A. (1959): The Fitting of Straight Lines when Both Variables are Subject to Error. Journal of the American Statistical Association 54, 173-205

Maravall, A. (1979): Identification in Dynamic Shock-Error Models. Springer Verlag, Berlin

Moran, P.A.P. (1971): Estimating Structural and Functional Relationships. Journal of Multivariate Analysis 1, 232-255

Nowak, E. (1983): Identification of the Dynamic Shock-Error Model with Autocorrelated Errors. Journal of Econometrics 23, 211-221

Picci, G. (1985): Factor Analysis Models via Stochastic Realization Methods. This Volume

Reiersøl, O. (1941): Confluence Analysis by Means of Lag Moments and other Methods of Confluence Analysis. Econometrica 9, 1-24

Reiersøl, O. (1950): Identifiability of a Linear Relation Between Variables which are subject to Error. Econometrica 18, 375-389

Schneeweiß, H. und H.J. Mittag (1985): Lineare Modelle mit fehlerbehafteten Daten. Physica Verlag, Würzburg

Scott, E.L. (1950): Note on Consistent Estimates of the Linear Structural Relation Between two Variables. Annals of Mathematical Statistics 21, 284-288

Söderström, T. (1980): Spectral Decomposition with Application to Identification. In: Archetti, F. and M. Cugiani (Eds.), Numerical Techniques for Stochastic Systems. North Holland P.C., Amsterdam

Wegge, L. (1983): ARMAX-Models Parameter Identification without and with Latent Variables. Working Paper, Dept. of Economics, Univ. of California, Davis
Chapter 3

A New Class of Dynamic Models for Stationary Time Series

Giorgio Picci and Stefano Pinzoni
1. Introduction

In this note we shall discuss a new class of dynamic models which may be better suited than conventional ARMAX schemes to describe non-causally interacting time series. Typical areas of application that we have in mind include econometrics (where it is often not clear what variables are "endogenous" and what are "exogenous") and identification of industrial processes operating under feedback. In these situations there is no a priori clear causality relation among the variables and, in fact, a possible goal of the identification experiment could be the testing for existence of causal relations.

The class of models introduced here is a natural dynamic generalization of the well-known static Factor Analysis model, which in various equivalent forms (the most popular of which seems to be the so-called Errors-In-Variables scheme) has been object of much study in the past, especially by econometricians and psychologists. (For definitions of these concepts and a rather comprehensive survey of the literature one may consult the recent paper by Van Schuppen (1985).) The study of these models has recently been revitalized by Kalman in a series of papers (Kalman, 1982a, 1982b and 1983), and some of the critiques presented in Kalman's work have been the motivating stimulus for the earlier paper (Finesso and Picci, 1984). The present exposition represents the natural continuation and generalization of the results presented there. In order to improve readability we have chosen to skip some non essential technical details. A more complete story can be found in (Picci and Pinzoni, 1986). People interested in general philosophical discussions on the modelling problem considered here are referred to the introduction of (Finesso and Picci, 1984).

We should mention that some of the specific issues dealt with in this paper are also treated (in the scalar E.I.V. context) in the work of Anderson and Deistler (1984), Anderson (1985), Deistler (1985). Although the primary motivations (and hence the basic assumptions) in these papers are of a rather different nature than ours, the reader might find some ground for comparisons in the discussion of the causality problem presented in section 4.

For the sake of motivating the introduction of Dynamic Factor Analysis models we shall briefly review the definition of causality of a dynamical model, first in the deterministic and then in the stochastic (Gaussian) case. The idea that we want to convey is that causal models are quite "nongeneric" mathematical descriptions to impose aprioristically on real data, e.g. economic time series or data coming from industrial processes involving feedback.

In a deterministic framework the notion of causality is of course well known. Assume that the components of the m-dimensional variable y(t), whose temporal evolution is described by a certain dynamical model, have been grouped in two subvectors,

(1.1)    y(t) = ( y₁(t) ; y₂(t) ),

with y_i(t) ∈ ℝ^{m_i}, i = 1,2, and m₁ + m₂ = m. It is intuitively clear that a dynamical model should quantify the dynamic relation occurring between the variables y₁ and y₂ (i.e. how much y₁ "influences" y₂ and vice versa). This is made precise in J.C. Willems' refoundation of Systems Theory (Willems, 1979): any model (or equivalently dynamical system) with external variables y is just a subset ℬ (called the behaviour of the system) of trajectories in (ℝ^m)^ℤ = (ℝ^{m₁})^ℤ × (ℝ^{m₂})^ℤ, and therefore a bona fide relation between y₁ (ranging over (ℝ^{m₁})^ℤ) and y₂ (ranging over (ℝ^{m₂})^ℤ). We say that y₁ causes y₂ or, equivalently, that y₁ is the input and y₂ is the output variable of the system, if this relation specializes to a very particular kind of function, namely if

(1.2)    y₂(t) = f(y₁),    t ∈ ℤ,

where f depends only on the values taken by y₁ before and at time t.

In the stochastic case the sharply defined subset ℬ is replaced by a probability measure on the sample space (ℝ^m)^ℤ, and thus the external variable y becomes a stochastic process {y(t)}. The model is in this case just the probability law of {y(t)}. To make things simple we shall consider here about the simplest possible class of random processes, described in the following

BASIC ASSUMPTION. The process {y(t)} is an m-dimensional Gaussian stationary process with zero mean and has a rational spectral density S, strictly positive definite on the unit circle (i.e. S(e^{iθ}) > 0).

We shall write the spectrum S in a partitioned form corresponding to the subdivision (1.1) of the external variables,
72
S
S1
S12
S21
S2
,
=
(I .3)
where the blocks S., of dimension m. xm., represent the auto l
I
l
spectra and $12 the cross spectrum of the two components {y1(t)} and {Y2(t)} of dimension m I and m 2. The definition of causality in this context, essentially due to
Granger (3963 and 1969)~sounds as follows.
DEFINITION 1.1 We say that the process Yl causes Y2 or, equivaleqtl~, that Yl is an input process with correspondlng output Y2 if, for all
t ~ ~ ,
E~2(t ) lyl]
= E[Y2(t)
]Y1(S), si~ ,
(1.4)
where the first conditional expectatlon is with respect to the whole history {Y1(t); t E Z }
of the component YI"
O Causality is just conditional independence of the past and present output history {Y2(S); s ~ t }
from future inputs
{Y1(S); s > t } given the past of the input {Y1(S); s ~ t } and can of course be defined in a much more general setting than the one adopted here. In a Gaussian setting we can however translate everything in the convenient Hilbert space language of the linear theory of random processes (see e.g.Rozanov, 1967).Some of this material necessary for future use will be quickly reviewed in the next paragraphs. We shall denote the vector space of all finite linear combinam tions of the scalar random variables [a'y(t); ~ 6 R , t E Z } closed in the metric induced by the scalar product
< x,z > : = Ex z, by +
the symbol H(y) (sometimes abbreviated to H) • H~(y) , Ht(Y) will
73
denote the past and future subspaces spanned by the random variables y(s) up to and, respectively, after and at, time t. Clearly,
h • (=yurn(y) ) where U : y(t)+y(t+l)
(1.5)
is the (unitary) shift operator of the
process {y(t)}. Normally the subscript zero in (1.5) will be dropped. For the two components Yl and Y2 we shall define the subspaces H(Yl) , H(Y2) (abbreviated to H I and H 2 when there is no danger of confusion) accordingly. Obviously H = HIV H 2
where
the wedge denotes closed vector sum. Subspaces like H I and H 2 are doubly invariant for the shift U, in the sense that they satisfy
UtH. = H. for all t E ~. The i l multiplicity of a doubly invariant subspace X C H is the cardinality of any minimal generating set, i.e. is the smallest n which one can find random variables
{x1''"'Xn } in X
for
such that
the vector space generated by {Utx.; i= 1,...,n , t E Z } is dense i in X. The process {x(t)} with x.(t) =Utx. is called a generating l i process of X. By the Spectra] Representation Theorem (Rozanov, 1967), there is a unitary representation of the random variables in H(x) as n-dimensional
(row) vector functions in the Hilbert space
L2(C,dQ) where C = {z; Izl = l} is the unit circle in the complex n plane and Q is the n x n matrix spectral distribution measure of the process {x(t)} . Each random variable ~(t): = ut~ with ~EX can be written as ~(t) = [~e i@t f(ei@)dx(e i8)
for a unique
f E L 2 (C,dQ). Here ~ is the n-dimensional random n spectral measure of the stationary process {x(t)}. As it is well known ( Rozanov,
1967 )
the spectral distribution matrix is re-
lated to ~ by dQ=E(d~ d~*),where the star means conjugate transpose.
74
The representation will be symbolically written as
(i .6)
~(t) ~ f(z)x(t) .
The System Theoretic interpretation of the notation is that the stationary process {~(t~ is obtained by passing the stationary process {x(t)} through the linear (stable) filter of transfer function f. In all cases of interest for us the spectral distribution measure of {x(t)} will be absolutely continuous with respect to Lebesgue measure on C. The spectral density matrix will still be denoted by the symbol Q.It is well known(compare e.g.Fuhrmann~1981, p. 111) that {x ,...,x } being a minimal generating set is equivalent to Q(e
.@I i
n
. .
. ,
. °
.
. ,
o
) being st rlctly posltlve deflnlte on a set of
positive Lebesgue measure. For example the assumption S(e i8) > 0 a.e. guarantees that H = H ( y ) has precisely multiplicity m, a possible minimal set of generators being given by the m scalar components of the random vector y(0). Observe further that any other minimal generating process for H(x) can be written as
u(t) = T(z)x(t) ,
(1.7)
with T an nxn matrix function having rows in L2(C,dQ) and Q-a.e. n
nonsingular o n t h e
unit circle.
Of course when {x(t)} admits a spectral density Q which is a.e. positive definite on the unit circle, then all admissible T ' s
will be a.e. nonsingular on C and all minimal gene-
rating processes for H(x) will have an a.e. positive definite spectral density on the unit circle.In particular, by choosingT=W I//2n
75
where W is any square solution of the standard spectral factor~zstionproblemWW*=Q,we
obtain white noise generators {u(t)}
for H(x).The transfer function in the representation
(1.6) in
this case belongs to L2(C, d@/2~). In this context we shall call n causal any function f with vanishing positive Fourier coefficients in
L2(C, dO/2~), i.e. such that n f~ . . I e-~Okf (el0)dO/2~ = 0 ~-~
(1.8)
for all k>0. Thus any causal function belongs to the n-dimensional conjugate Hardy space ~2 (Hoffmann,1962) and can be extended n to a function of the complex variable z analytic on {Izl > I} (including the point at infinity). A matrix valued function T will be called causal if its rows are. It can be verified directly that for any generating process {x(t)} with a strictly positive definite
matrix (°)
spectral density
we have
Ht(u) C Ht(x)
if and only if the
transfer matrix T in (1.7) is causal. A (left -) invertible matrix T with rows in L2(C, d9/2~) will be called minimum phase if it is n
causal and its extension has an analytic (left-) inverse on {Izl > I}. This is the same thing as a conjugate outer matrix function in
H2-theory. We finally recall the concept of conditional orthogonalit~.
Two
subspaces HI,H 2 of H will be said conditionally orthogonal, given.
iH21H), if
a third subspace X (notation: H I _
< h I -EXhl , h 2 - E X h 2 > = 0
for all h I E H I and h 2 6 H 2, Here the symbol E
(1.9) X
denotes orthogonal
projection onto X. Since in the Gaussian case conditional expec(o) or, more generally, ]967).
full rank purely non deterministic
(Rozanov,
76
tation given a certain family of random variables in H is the same thing as orthogonal projection onto the subspace of H spanned by them~ we see that conditional orthogonality is the same property as conditional independence, given X, of the two families H I and H 2 of Gaussian random variables. The concept of conditional orthogonality
will be extensively used in this paper. For additional
information one may consult (Lindquist and Picci. ]985). We return to our discussion of causality in the stochastic setting. The following is a rather well known fact although often stated in a different terminology.
THEOREM 1.1 The process {Y1(t)} causes {Y2(t)} if and only if
Y2(t) = A(z)y1(t) +v(t)
,
(1.10)
where A(z) is an m 2 x m I causal matrix function and {v(t)} stationary process completely independent of {Y1(t)},i.e.
E y1(t)v'(s) = 0
for all
(1.11)
t,s E Z.
This result is essentially due to (Caines and Chart, 1976). It is also discussed in (Caines and Chan, ]975) and (Gevers and Anderson, 1982). In these references causality is called "absence of feedback" (from Y2 to y]). Note that (].]0) is nothing else but the popular ARMAX scheme widely used in time series identification. Just express {v(t)} by its innovation representation, v(t) = G(z)e(t)
,
(1.12)
where G(z) is minimum phase, normalized so as to make G(~) = I,
77
and {e(t)} is a white noise process. Recall that, by rationality of S(z), both A(z) and the spectrum of {v(t)} are rational and then express the rational matrix [A(z) G(z)] by a left coprlme M.F.D. D(z) -lIB(z) C(z)] to get
D(z)Y2(t) = B(z)Y1(t) +C(z)e(t).
(1.13)
The orthogonality condition (1.11) holds if and only if
E e(t)y;(s) = 0 ,
t,sEZ
and therefore using ARMAXmodels noise and input (yl) processes
(o)
a causality relation on the data.
,
with independent
(1.14) (or uncorrelated)
.
is equivalent to imposing a priori In this case the statistical
inference problem of estimating the joint law of {Y1(t)} and {Y2(t)} is reduced to the much simpler problem of estimating just the conditional law of future y~s given past inputs YI" Quite often there is no evidence in the data which justifies the use of causal models. What kind of models should then be used in this situation? One obvious answer would be to describe the whole (joint) process {y(t)} by an m-dimensional ARMA scheme corresponding say to the m x m rational minimum phase spectral factor of the joint spectrum S. Our main concern is however in describing how two given groups of variables
(Yl and y2 ) interact
dynamically.
In practice Yl and Y2 have a precise physical or economic meaning and the main reason for doing modelling and identification is to discover how much of the temporal evolution of each variable is "explained" by the other. For this purpose it would be much more useful to have models which (although necessarily equivalent to the joint ARMA scheme mentioned above) put into explicit evidence the mutual influence of the variables Yl and Y2" A class of mathe(o) Actually condition (1.14) is often considered to be part of the definition of an ARMAX model and is not even explicitly mentioned.
78
matical descriptions which in a certain sense generalizes the causal model (1.10) is the stochastic feedback scheme
Y2(t) = L(z)y 1(t) +v 1(t),
(i.~5) Y1(t) = K(z)Y2(t) +v2(t), where L and K are causal transfer functions and {v1(t)} and {v2(t)} stationary"error" processes whose innovations can at most he assumed orthogonal to the past histories of Yl an4 Y2' respectively. This class of models has been extensively investigated in recent years, especially by Gevers and Anderson (1981 and 1982) and Anderson and Gevers (]982) with the main motivation of understanding identifiabi]ity of control systems operating under feedback. Practica] use of these mode]s for time series identification seems however to have been very limited so far. We shall propose here a different class of models in which the dynamic interaction between Yl and Y2 is explicitly by the introduction of an auxiliary
described
variable x. This auxiliary
variable will play a role similar to the state variable in Systems Theory.
DEFIN%TION 1.2 A Dynamic Factor Analysis Model with external variables the (jointly statignary) vector processes
{Y1(t)} and {Y2(t)}, is
a linear relation of the form
Y1(t) = A 1(z)x(t)+w 1(t),
(i .16)
Y2(t) = A2(z)x(t) +w2(t), where A I (z) and A2(z) are transfer matrices of dimension m I x n and. m 2 x n
and {x(t)}, {w1(t)} , {w2(t)} are zero mean stationary
79
processes of dimensions n, m|, m 2 which are pairwise uncorrelated, i.e.
{w1(t)} i {x(t)} i {w2(t)}"
(1.17) []
Note that A I and A 2 need not be causal. The process {x(t)} will sometimes be referred to as the factor process of the model. A Dynamic
Factor Analysis (F.A.) model will be called rational if AI,A 2
are rational matrices and {x(t)} has rationalspectrum. The terminology (although not
terribly elegant) has been extrapolated from the
static case. In the next sections we
shall present a first rudimentary
analysis of the model (1.16). The main questions one would like to answer concern the representability of an arbitrary joint stationary process {y(t)} (with y(t) partitioned as in (1.1)) by models of the type (1.16), the equivalence of representations (i.e. when do different representations describe the same spectrum S or the same process {y(t)}), the "external behaviour" of the model which is obtained once {x(t)} is eliminated, finding a natural notion of minimality and characterizations of minimal models, parametrizations and canonical forms in the rational case and above all discuss
use of Factor Analysis models in Statistical Inference
(i.e. identification). This is quite a large program and only a few of these aspects will be touched upon in this paper. Others (especially the last two mentioned above), which still need more research, will not be discussed here.
80
2. Dynamic Factor Analysis Models The stationary processes {x(t) }, {w 1(t)}, {w2(t)} which define a Factor Analysis model span a certain Hilbert space H(X,Wl,W 2) which we denote by H . The Factor Space X of the O
model (1.16) is the doubly invariant subspace of H
generated O
by the factor process,
X = span {a'x(t); a E R
n
, t6Z}
.
(2.1)
Let n < n be the multiplicity of X and let[x(t)} be a minimal generating process for X. Clearly, since x(t) =T(z)x(t) for some nxn
matrix T, we can always rewrite the model (1.16) with A (z) I and A2(z) replaced by At(z) =A1(z)T(z) and A2(z) =A2(z)T(z) and a factor process x(t) which is a minimal generating process for X. We shall therefore adhere from now on to the convention of considering only F.A. models in which {x(t)}is a minimal generating process for
X. Hence the multiplicity of X will always coincide
with the dimension of x(t). Two F.A. models which differ by a change of (minimal) generators in X will be called equivalent. Obviously two equivalent models have the same {w.(t)} processes (for i=1,2), i
the same factor space X and transfer matrices and factor processes related hy A.(z) = A.(z)T(z) -I , i
i=I,_ 2
,
i
(2.2)
^
x (t) = T(z)x(t), where T is a Q-a.e. nonsingular
n x n matrix function whose rows
belon~ to L2(C,dQ),Q being the spectral distribution measure of n
{x(t)}. It is easy to check that (2.2) defines an equivalence relation on the class of all F.A. models of {Y1(t)}, {Y2(t)}. We shall now introduce the concept of splittin$ subspace. By this idea we shall be able to attach a precise probabilistic meaning
81 to F.A. models and at the same time reduce this notion
to a very
simple geometric object. Let H i=H(yi),i = 1,2 be the Hilbert spaces spanned by the components {Yi(t)},
i= 1,2. It will be useful to
think of H I and H 2 as (doubly invariant) subspaces embedded in a large Hilbert space Ho obtained by suitably augmenting H
NIV H 2. On
there is defined a unitary shift operator U which reduces to O
the shift of the process {y(t)} on the subspaee H=H(y) = H I V H 2. (The role played by H
o
is very similar to that of the space
H(X,Wl,W 2) introduced at the beginning of this section).
DEFINITION 2.1 A (stationary) splitting Subspace is a doubly invariant subspace X __°fHo which makes H(y I) and H(y 2) conditionally orthogonal given X, i.e. satisfies
H(Yl)IH(Y 2) [ X
(2.3)
together with UX = X. A Splitting Subspace X is called minimal if there are no proper subspaces of X which are doubly invariant and still satisfy condition (2.3).
[] The concept of splitting subspace is a generalization of the idea of sufficient statistic (at least in the Gaussian case). It follows in fact from the definition of conditional orthogonality (1.9) that EEh IIXVH2]
= EEh IIX] ,
h IE H I ,
and, equivalently,
E[h2IxVH ] = E[h21X ] ,
h2 E H2 ,
82
so that all what is relevant in H2(H I) at the purpose of predicting any h16 H I (h26 H 2) is already contained in X. Therefore if X (or any system of generators of X) is given, we can disregard H 2 (H I) completely. Note that the concept of splitting is of interest only if it corresponds to effective data reduction. Hence the notion of minimality is of central importance. LEMMA 2.1 (Ruckebusch, 1976 and Lindquist and Picci, 1985) A splitting subspace X is minimal if and only if EXH I = X ,
EXH 2 = X
(2.4)
(here EXH'I is the closure of {EXhl; hie Hi} ,
i = 1,2). []
The following theorem shows that (modulo choice of generators) splitting subspaces and Dynamical Factor Analysis models are essentially the same thing. THEOREM 2.1 The factor space X of any F.A. model of {Y1(t)}, {Y2(t)} i__ss a splitting subspace. Vice versa to every splitting subspace X for H(Yl) , H(Y2) of finite multiplicity there . corresponds the equivalence class, defined modulo choice of generators, of F.A. models having X as
factor space.
Proof: Let X be given by (2.1). Then, since A.(z)x(t) = EXy.(t) , l l
tEZ,
i= 1,2 ,
(2.5)
the o r t h o g o n a l i t y r e l a t i o n of {wl(t)} and {w2(t)} , which holds by assumption for any model (1.16), can be rewritten as X X Y1(t)-g y1(t) l Y2(S)- E Y2(S) ,
t,sEZ •
(2.6)
88
As {Yi(t)} is a generating process for H.I it follows from the definition ( 1 . 9 )
t h a t indeed X i s s p l i t t i n g .
Viceversa, let
X be a splitting subspace and {x(t)} a minimal generating process X for X of dimension n. The projections E Y i ( t ) can be w r i t t e n as in (2.5) for suitable transfer functions A.(z) of dimension m. xn. I i Define
w.(t): = Y i ( t ) - E X . ( t ) l i
,
teT,
i=1,2,
(2.7)
then the stationary processes {w.(t)} are orthogonal to X and, i by the conditional orthogonality condition (2.3),we have also E w1(t)w2(s)' = 0
for all
t,sE ~. Therefore
{y1(t)} and
{Y2(t)} can be written as in (1.16), while satisfying (1.17). [] The equivalence established by Theorem 2.1 permits to define a first rough notion of minimality for F.A. models. We shall say that a F.A. model is irreducible if its factor space is minimal splitting.
THEOREM 2.2
(Picci and Pinzoni, 1986)
A F.A. model is irreducible if and only if the rank a.e. on the unit circle of the matrices A|(z) and A2(z) is equal to the multiplicity of X. All irreducible F.A. models have the same multiplicity (i.e. the same number of factors) n equal to the rank a.e. on the unit circle of the cross spectrum $12 of the processes {Y1(t)} and {Y2(t) }In the rest of this paper we shall concentrate on irreducible models. As we have just seen these models are characterized by a.e. left invertible matrices Ak(Z) , k = 1,2. Their factor process has an absolutely continuous spectrum
with an a.e. positive definite
spectral density matrix Q on the unit circle(Picci and Pinzoni,1986).
84
If in an irreducible F.A. model we eliminate the auxiliary variable {x(t)},we obtain a scheme of the following type, A2(z)-Ly2(t) = A1(z)-Ly1(t)
,
(2.8)
Y1(t) = Y1(t) +w1(t)
,
(2.9a)
y2(t) = Y2(t) +w2(t)
.
(2.9b)
This is essentially what is commonly called an Errors-In-Variables (E.I.V.) model of the processes {Y1(t)}, {Y2(t)}. Here Y1(t) and Y2(t) are represented as "noisy" observations of the "true" variaA
bles y~(t), Y2(t) obeying the deterministic relation (2.8) . Note that the correlation structure of {Y1(t)} and {Y2(t)} is completely embodied in the relation (2.8)
as the noise processes {Wk(t)} are
mutually uncorrelated and also orthogonal to the "true" variables {;k(t)}. An equivalent form of the deterministic link (2.8)
between
the true variables is obtained by substituting x(t) =A1(z)-Ly|(t) into the second equation in (1.16), getting -L Y2(t) = W(z)Y1(t)
,
W(z): =A2(m)AI(Z)
(2. I o)
, W(z) ~=A1(z)A2(z) -L
(2.11)
or,dually,
Y1(t) = W(z~Y2(t)
Note that the transfer functions W(z), W(z) ~ and also the relation (2.8)
are invariant under change of generators , x(t) =
= T(z)x(t) (T nonsingular), and are therefore uniquely attached to the (minimal) splitting subspace X of the model. An important question concerns the existence of models for which W (or W ~
is
a causal transfer function. This is the same as asking if two stationary processes described by an arbitrary joint spectrum S can be represented by the "noisy" input-output model
8G
Yl (t) = yl (t) + w I (t), (2.12) Y2(t) = W(z)Y1(t) +w2(t) , where W(z) is causal and {w1(t)}i{y|(t)}i{w2(t)}.
We shall
take up this kind of questions in section 4. As a last general comment about F.A. models, we remark that the freedom of changing generators in the factor space X permits to choose transfer matrices Ak(Z) or factor processes of very special structure. For example we can always take {x(t)} to be a white noise process or require that both At(z) and A2(z) be causal transfer functions.
For simplicity we shall state the next result
for the case of rational F.A. models.
PROPOSITION 2.1 For every rational irreducible F.A. model there is a choice of (minimal) generators in X, x(t) = T(z)x(t) ,
which (maintains rationality and) achieves causality of the transfer function matrices
Ak(Z) = Ak(Z)T(z)-1,
k = 1,2
(2.13)
Proof: In a rational model both the spectrum Q and the matrices ~,
k = 1,2
are rational functions. Since the joint spectrum
of the processes
Yk(t) = ~ ( z ) x ( t ) ,
=
$12 $21
32
=
k = 1,2
Q A2
I
,
A2
'
(2.14)
86
is then itself a rational function, it admits causal (in particular minimum phase) rational
spectral factors. Note that irre-
ducibility implies that rank S = n = r a n k
$12. Pick a causal full
rank rational spectral factor A (of dimension m x n) of S and write it as a partitioned matrix with two blocks Ak(Z) of dimensions m k x n ,
k = 1,2. The spectral factorization
= [AI(z)
(z)
L A2(z)
A2
is c l e a r l y e q u i v a l e n t t o the r e p r e s e n t a t i o n s k = 1,2
with { x ( t ) } an
Yk(t) = A k ( z ) x ( t ) ,
n - d l m e n s i o n a l white n o i s e p r o c e s s .
We
interpret {~(t)} as the new factor process of the model. Since A(z) is full rank, we can solve for
A1(z) =
A2(z)
Y2 (t)
x(t)
in the representation
x(t)= [X1(z)1 x(t), A2(z)
getting
~(t) = A2(~)
-L [A1(z) A2(z)
Note that T is square n x n
x(t) := T(z)x(t)
.
and nonsingular because of irre-
ducibility. This proves Proposition 2.1.
In the proof we could in particular have chosen =
LA1(z)'X2(z)']' minimum
[] A(z) =
phase. We see that an irreducible
rational F.A. model can always be written as a pair of ARMAX equations,
87
D1(z)Y1(t) = B1(z)x(t) +C1(z)e1(t), (2.15) D2(z)Y2(t) = B2(z)x(t) +C2(z)e2(t),
with {~(t)},
{el(t)} ,
{e2(t)} pairwise uncorrelated white noise
processes and Dk(Z)
and
dimensions
and
mkxm k
Ck(Z) stable polynomial matrices of m k x Pk' Pk
being the multiplicity
of the noise process {Wk(t)} , k = 1,2.
3. Stochastic realization The main problem of this section will be to describe the class of all irreducible F.A. models which match a given spectral density matrix. We shall see that this is equivalent to solving the following problem. PROBLEM P.! Given an
mxm
spectral densit~ matrix S partitioned as in
(1.3) and satisfyin$ the Basic Assumption
of Sect. I, find all
5-tuples of matrix functions {AI,A2,Q,RI,R 2} on the unit circle, with A~ of dimension
mkx n
and of rank n , Q of dimension
and nonsingular , R k of dimension i)
nx n
mkx mk, k = 1,2, which
satisfy the system of equations
S I = AIQ A I + R I ' $12 = A1Q A2 , S 2 = A2Q A 2 * R 2 ,
ii)
make the (m+n) x (m+n) matrix
(3.1)
88
sI
S12
AIQ
$12
S2
A2Q
QA I
QA 2
Q
(3.2)
into a spectral density matrix (in particular Hermitian and nonnegative definite on the unit circle).
[] Assume we have an irreducible F.A. model,
z1(t) = A1(z)x(t) +w1(t) ,
(3.3) z2(t) = A2(z)x(t) +w2(t)
if we interpret Q as the spectral density matrix of Rk, k = 1,2
{x(t)} and
as the spectra of the two noise processes {Wk(t)} ,
we see that eqns. (3.1) express precisely the fact that the joint spectrum of {z1(t)} and {z2(t)} coincides with the given joint spectrum S.
Note also that the matrix S in (3.2) is just the
joint spectral density of the three processes {z1(t)} , {z2(t)} and {x(t)}. Vice versa, assume we are given a 5-tuple {AI,A2,Q, RI,R 2} of matrices satisfying eqns. (3.1) and condition (ii). It is not hard to see and we shall check this later, that condition (ii) implies that
Q,RI,R 2 are necessarily bounded Hermi-
tian positive semidefinite (Q is actually positive definite) matrices on the unit circle and can therefore be interpreted as spectral densities of three mutually uncorrelated zero mean Gaussian processes [x(t)},{w1(t)} , {w2(t)}. Starting from these processes, we generate {z1(t)} and {z2(t)} by the linear transformation (3.3). We see from (3.1) that the joint spectrum of the stationary processes {z1(t)} , {z2(t)} is precisely equal to the given joint spectral density matrix S. In short, solving
89
problem
P.I is the same thing as finding all irreducible F.A.
models (3.3) for which the joint spectrum of the external variable__~s{z1(t)} , {z2(t)} is equal to the given spectral density matrix S. This problem is a distributional or "weak sense" stochastic realization problem (Finesso and Picei, ]984 and Lindquist and Picci, 1985). Interpreting S as the joint spectrum of two given Gaussian processes {y](t)}, {Y2(t)} , we are looking for all irreducible models (3.3) such that {Zk(t)} and {Yk(t)} equal processes in distribution. In "practical"
are
terms this
means that the model (3.3) will only be useful to simulate the signals {Yk(t)} in an "average" sense but not samp]ewise
in
general. A 5-tuple {AI,A2,Q,RI,R 2} satisfying conditions (i) and (ii) above, or, equivalently a F.A. mode] of the type (3.3) matching the given spectrum S, will be called a F.A. representation of the spectrum S. A (strong sense) F.A. representation of the processes {Y1(t)}, {Y2(t)} is instead a F.A. model of the type (3.3) for which Zk(t)= = Yk(t) almost surely for all
t e Z. This type of (samplewise)
equality is clearly stronger than equality in distribution and can only occur when the processes {Zk(t)} and {Yk(t)} are defined on the same probability space. This means that the various processes {x(t)}, {w1(t)} , {w2(t)} in (3.3) must be built in such a way that Ho: =H(X,Wl,W 2) DH(yl,Y 2) =H(zl,z2). Samplewise (i.e. strong sense) F.A. representations of {y1(t)},[Y2(t)} can be classified according to "how big"an underlying space H
is needed to support o the processes which specify the model. Later we shall study in some detail the class of F.A. representations for which H = o = H(yl,y2). These representations will be called "y-measurable" (o)
(o) Clearly an equivalent condition for y-measurability is that the factor space X is included in H(y).
90
Note that whenever {x(t)} is given, the noise processes {Wk(t)} are automatically fixed as functions of {x(t)~, {Y1(t)}, {y2(t)} by the orthogonality condition (1.17), as
Wk(t) = y k ( t ) - EXyk(t)
,
k = ~,2 ,
(3.4)
where X is the splitting subspace generated by {x(t)}.Therefore a (strong) F.A. representation is completely specified once the factor process {x(t)} is assigned as a function of some available generators of the space H . In particular a y-measurable repreo sentation is completely specified once {x(t)} is given as a function of {Y1(t)} and {Y2(t)}. In order to avoid complicated statements about equivalence classes, it will be useful to fix once and for all a rule for choosing generators in each factor space X. A convenient way to do this is to f i x a full rank factorizationof the cross
spectrum $12 ,
$12(z) = H(z)G (z) ,
(3.5)
where H and G are of respective dimensions m I x n , m 2 x n and of rank equal to
n=rank
$12 a.e. on C. Since $12 is rational, we
can always choose H and G to be rational matrices. In fact we shall choose H and G in such a way that (3.5) is a minimal factorization of the rational matrix Sl2(in the sense of Gohberg
and
Kaashoek
Bart,
(]979), p. 84).
Since all entries of a rational spectral density matrix must be analytic on the unit circle, it follows that both H(z) and G(z) must also be analytic on the unit circle. In the following we shall make the simplifying assumption that $12(z) has no zeros on the unit circle, i.e.
rank S I2(e i0) = n,
This guarantees
V e e [0,2~).
(3.6)
that neither H(z) nor G(z) can have zeros
91
on the unit circle, more precisely, both H(e io) and G(e ie) will be of constant rank n
for all
e E [0,2~). From now on the
matrices H and G will be considered as data of our problem.
LEMMA 3.1 Let condition (3.6) hold.
Then for each equivalence class
of irreducible F.A. models of {y1(t)}, {Y2(t)} there is a unique choice of generating process {x(t)} in the factor space X such that -I AI(Z) = H(z),
A2(z) = G(z)Q(z)
,
(3.7)
where Q is the (nonsingular) spectrum of {x(t)} . Alternatively, a unique generating process [x(t)} can be chosen for which
A 1(z) = H(z)Q(z) -I ,
A2(z) = G(z) ,
(3.8)
where Q is the spectrum of {x(t)}. The generating processes {x(t)}, {x(t)} for =he same minimal splitting subspace X are related by the transformation
x(t) = Q-1(z)x(t) .
(3.9)
Proof: In fact, if we start with an arbitrary irreducible model (1.16),there is a unique change of generators in X, ~(t) =T(z)x(t), with T such that H(z)T(z) =A1(z). Note that there is a unique a.e. nonsingular solution to this equation as both A I and H are of full rank n. Moreover T E L2(C,Qd6) where Q is the spectral density of {x(t)}. This follows from T(z) =H(z) -LA 1(z),because A ~ L2(C,Qde) and any left inverse of H(z) is analytic on the I
92
unit circle, in force of assumption (3.6). With this choice we get
$12(z) = H(z)Q(z)(T(z)-I~A2(z)* , where Q is the spectrum of {x(t)}. From (3.5) it follows then A2(z)T(z)-I = G(z)Q(z) -I Similarly, by choosing x(t) = T(z)x(t) with G(z)T(z) =A2(z) , we obtain (3.8). In particular, for x(t) =x(t) we find T =~-I.
[] By choosing the generators as stated in Lemma 3.1, we get a unique irreducible F.A. model representative of each minimal splitting subspace X. These models, for the two different choices (3.7) and (3.8), can be written as Y1(t) = H(z)x(t) + w (t) I
(3.10)
Y2(t) = G(z)Q(z)-Ix(t) +w2(t) ,
and, respectively, as
y1(t) = H(z)~(z)-1~(t) + w1(t) (3.11) Y2(t) = G(z)x(t) +w2(t)
.
We shall call "first" and "second" type canonical forms the two representations (3.10) and (3.11). Clearly each equivalence class of irreducible F.A. representations of a given spectrum S can in turn be represented by a unique 5-tuple -I {H, G Q or by
, Q, RI, R 2}
93
{H
~-I
G, Q
RI
R 2}
Note that R
and R are uniquely determined from the equaliI 2 ties (3.1) as functions of SI,AI, Q and $2,A2, Q. We conclude that all irreducible F.A. representations of the spectrum S• written in the first canonical form, are parametrized in a one-to-one wa 7 By the
nx n
nonsingular matrix function Q as
{H• G Q
-1
*
, Q, SI-HQH • S2-GQ
-1
*
G } ,
(3.12)
where Q is constrained to satisfy the condition that the matrix
S1
HG
GH
S2
QH
G
(3.13)
be a spectral densitY. Dually• all irreducible F.A. representations o_f_fS written in the second canonical form are parametrized in a one-to-one way by the nonsinBular
nx n
matrix function Q
{HQ -I, G, Q• SI-HQ-IH , S2-GQG } ,
a~s
(3.14)
where Q is constrained by the condition that the matrix
SI
HG
GH*
S2
LH*
H (3.15)
QG*
be a spectral density function. At this point we are ready to describe the solution set of our stochastic realization problem P.I. We introduce the
nx n
Hermitian matrices * -I QI: = H S I H•
*
-I
Q2 = G S 2 G
(3.16)
94 and set Q,:
o
=
Note that both Q1 and Q2 are strictly positive definite rational spectral density matrices in force of condition (3.6) and our standing assumptions on S. We define also the
nxn
Hermitian
matrices
A: = q l - q 2
'
~: = Q2-Q1
(3.18)
THEOREM 3.1 All irreducible F.A. representations of the spectrum S written in the first canonical form (3.12) are parametrized hy the solutions Q of the matrix inequality
Q-Q2-(Q-Q2)A-I (Q-Q2)* > 0
(3.19)
•
Dually~ all irreducible F.A. representations
of S written in the
second canonical form (3.14) are parametrized by the solutions of the inequality Q-QI-(Q-QI)% -1 (Q-QI)* ~ 0. A__n_n nx n
(3.20)
matrix function Q solves (3.19) if and only if
solves (3.20). All solutions Q (Q) of(3.19)
~ = Q-I
(resp. (3.20))are
Hermitian bounded and strictly positive definite,in fact they satisfy QI Z Q Z Q2 "
Q2 > ~ > Q1 '
(3.21)
where QI and Q2 (Q| and Q2 ) are the spectra] densities defined by (3.16) and (3.17).
95
Proof: What needs to be shown is that an n x n matrix function Q makes (3.13) a spectral density matrix if and only if it satisfies the quadratic inequality (3.19). Assume there is a Q making (3.13) into a spectral density matrix. Then, by a standard block diagonalization procedure, the positive definiteness of (3.13) is seen to be equivalent to
S2>0
, (3.22)
$I: = S1-S 12S21 S21 > 0 , *
q-G
-I
*
-I
*~-1
*
-I
S 2 G-(q,G S 2 G)H S I H(Q-G S 2 G) > 0 .
The first two inequalities are trivially satisfied. In fact, by our Basic Assumption on S, S 2 and $I are strictly positive definite on the whole of C. By simple matrix manipulations it can be checked that *
* -1
H*S-IHI = H (SI-HQ2 H )
_
H = (QI
-1
Q2 )
(3.23)
and therefore, recalling our notations (3.18), we see that Q has to satisfy the inequality (3.19). Note that Q makes the matrix (3.13) positive semidefinite if and only if Q=Q-I makes (3.15) positive semidefinite. This in turn happens if and only if Q satisfies the dual inequality (3.20) as can be seen by exactly the same argument used before. -I Thus Q satisfies (3.19) if and only if Q satisfies (3.20). -I Observe now that the matrixA , given by the expression (3.23), is strictly positive definite Hermitian on the unit circle and therefore any solution Q to (3.19) makes Q-Q2 positive semidefinlte Hermitian. Hence Q is Hermitian and Q ~ Q 2 " solution Q of (3.20) satisfies Q ~ Q I
•
Similarly any
Then, writing Q as Q-I9
we
96
obtain the first inequality in (3.21). So, any solution to (3.19) has a lower (Q2) and upper bound (~]I),. Q2 being strictly posi----I
rive definite and QI
being trivially bounded on C. It follows
that any solution to (3.19) is a spectral density matrix. The matrix (3.13) constructed from such a solution is also Hermitian positive semidefinite and has bounded entries on the unit circle. Therefore it is a spectral density matrix. [] The solution set of the inequalities
(3.19),
(3.20) can be
described quite explicitly.
THEOREM 3.2 An
nx n
matrix valued function Q on the unit circle solves
the inequality (3.19) if and only if it is Hermitian and Dually, an
nxn
QI ~ Q~Q2'
matrix Q solves (3.20) if and only if it is
Hermitian and satisfies
Q2~Q~QI.
Proof: The "only if "part is already contained in the statement of Theorem 3.1. We only need to prove the "if" part. Assume first that QI > Q > Q2 (with strict inequalities) holds. Then (QI-Q) and (Q-Q2) are both Hermitian strictly positive definite and there-I -I fore (Q-Q2) + (QI-Q) is strictly positive definite. Byawellknown formula for the inverse of a sum of matrices we see that this positivity condition is equivalent to
Q-Q2-(Q-Q2)A
(Q-Q2)
> 0 .
Now, every Q satisfying QI ~ Q ~ Q 2
(3.24)
can be approximated in L=nxn(C)
by a sequence of matrices Qk for which the strict inequalities hold~ Take for instance
97
Qk = -k- Q +
(QI+Q2) '
for which apparently Qk-Q2 > 0 and QI-Qk > 0. Hence Qk satisfies the strict inequality (3.24). But the left hand side of (3.24) is a positive definite matrix which is a continuous function of Qk and, as
k-> ~, it can at most become positive semidefinite.
[3 REMARK As a corollary of Theorems 3.1, 3.2 we obtain that the inequality QI>Q_>Q2 is equivalent to -1
SI>HQH
,
S2>G Q
*
G ,
Q>O,
which form in turn an equivalent set of conditions to the positivity of the matrix (3.13). This fact in particular guarantees that if Q satisfies (3.21) (or equivalently (3.19)), then the noise spectra R I and R 2 will be (Hermitian and) positive semidefinite. Note that the maximal solution QI is in this sense just the matrix which corresponds to the largest approximant of rank n of S I in the ordering of Hermitian positive semidefinite matri-
[] ces.
Theorem 3.1 provides a recipe for computing all irreducible F.A. representations describing a given spectral density matrix S in a fixed coordinate system. We can now easily see that there are many of such representations (a fact that we have not bothered to show till now). For example, as the two "extreme" spectra QI and Q2 defined in (3.16) and (3.17)
both satisfy the inequality
(3.19) (with equality sign), we see that there are a "maximal" and "minimal" irreducible F.A. representations (in the first canonical form) which correspond respectively to the maximal (QI)
98
and minimal (Q2) solutions to the inequality (3.19). Solutions like QI' Q2 above for which (3.19) is satisfied with equality sign have a special meaning. They correspond to joint spectra (3.13) of minimum possible rank, m, as can be seen from the block diagonalization
(3.22). Since the rank of the joint
spectrum of {z1(t)}, {z2(t)} and {x(t)} is equal to the multiplicity of the doubly invariant subspace
H(X,Zl,Z 2) spanned by these
processes, the multiplicity m of
H(X,Zl,Z 2) is equal to the mul-
tiplicity of the subspace H(Zl,Z2). This can only happen if H(X,Zl,Z 2) =H(Zl,Z2) , or, that is the same, if x(t) EH(Zl,Z 2) for all
tEZ. We see that all models which correspond to solu-
tions Q of (3.19) with equality sign are characterized by the fact that the factor process {x(t)} is a function of {z1(t)} , {z2(t)}. This observation is the key to the following result. PROPOSITION 3.1 The solutions Q to the quadratic matrix equation -I Q-Q2-(Q-Q2)A
* (Q-Q2)
= 0
(3.25)
parametrize in a one-to-one way the (strong) irreducible y-measurable representations of the processes {Y1(t)}, {Y2(t)}
of the form
(3.10). Dually, all solutions Q to the quadratic e~uation
Q-QI-
(Q-QI)A-I(Q-QI)* = 0
(3.26)
parametrize in a one-to-one way the (strong) irreducible F.A. representations of {Y1(t)}, {Y2(t)} of the form (3.11) for which XCH(y). Proof: Consider a F.A. representation of the type (3.10). If the factor space X is contained in H(y),then H(x,Yl,Y2) =H(yl,Y2) =H(y)
99
and hence the joint spectrum of {Y1(t)}, {Y2(t)}, {x(t)} has rank m. This implies that the spectrum Q of {x(t)} satisfies (3.19) with equality sign. Viee versa, assume Q is a solution of (3.25). Then, as discussed previously, the factor process of the F.A. -I model of type (3.3) attached to the weak realization {H, GQ , *
Q, SI-HQH , S2-G Q
-I
*
O } of the spectrum S, has the property that
x(t) belongs to H(z 1,z 2) for all t. It can therefore be written as x(t)=P1(Z)Zl(t)+P2(z)z2(t), transfer matrices. Define an
where P.(z), i= 1,2, are n x m . i l n-dimensional process {x(t)} by
setting
x(t) = P1(z)Y1(t) +P2(z)Y2(t).
(3.27)
Then {x(t)}, {Y1(t)}, {Y2(t)} have exactly the same joint second order statistics (i.e. the same spectrum) as {x(t)}, {z1(t)} , {z2(t)}. Since conditional orthogonality depends on joint second order moments o n l ~ i t then follows that is splitting for splitting for
H(Yl) , H(y 2)
X : = span {x(t); t 6 ~ }
exactly as the factor space X was
H(Zl) , H(z2). Hence {x(t)} is the factor process
of a strong F.A. representation of the type (3.10). By construction {x(t)} has spectral density matrix equal to Q and XCH(y). []
Let us define the stationary n-dimensional processes
xl(t) =
Q1(z)
xl(t),
Xl(t) =H
(z)S1(z) -ly I (t)
, (3.28)
x 2(t) = G * (z)S2(z) -ly2(t) where QI is defined by (3.16),(3.17).
Observe that the spectra of
{x1(t)} and {x2(t)} are precisely the extremal solutions QI,Q2 of the quadratic inequality (3.19). It is immediate to check that {x1(t)} and {x2(t)} are minimal generators for the subspaces
100
X1:
-H(Yl)H(Y2) '
= E
X2:
~H(Y2)H(y ]).
=
(In fact, for example X 2 is generated by
Y1(t) = S|2(z)S21(z)Y2(t) =
= H(z)x2(t) ). Moreover both X I and X 2 are minimal splitting suhspaces (compare e.g. Lindquist,Picci and Ruckehusch, 1979) XICH(y I) ,
X2CH(Y 2) ,
therefore they specify two equivalence classes of strong irreducible F.A. representations of {y1(t)}, {Y2(t)}. The particular generators {x1(t) } and {x2(t) } defined in (3.28) correspond to choosing these representations in the first canonical form, namely Y1(t) = H(z)x1(t ) +w1,1(t) , (3.29) Y2(t) = G(z)Q1(z) -Ixi (t) +wl,2(t) , and Y1(t) = H(z)x2(t ) +w2,l(t) , (3.30) Y2(t) = G(z)Q2(z)-Ix2(t ) +w2, 2 (t) . Observe that in the representation (3.29) the second equation is just the decomposition of estimate
Y2(t)
as the sum of the (noncausal)
Y2(t) = S21(z)S](z)-I Y1(t) and of the corresponding
estimation error. The first equation is more interesting. It can be rewritten in the form YI(t) = ~H(z)Y1(t) + (l-nH(z))Y1(t) , where H
(3.31)
i s the p r o j e c t i o n v a l u e d m a t r i x f u n c t i o n ~tt(z) = H(z)(H
(z)S1(z)-IH(z))-'IH*(z)S1(z)-1 (3,32)
101
mapping onto the column space of H. Note that ~ is S1-orthogonal , , H i.e. HHSI(I-~H) = 0 a.e. on the unit circle. Thus x I formally looks like the classical least squares estimate of x linear model Yl = H x + w
in the
. An analogous interpretation holds for the
second equation in (3.30). The next theorem describes quite explicitly
the family of
all (strong) y-measurable irreducible F.A. representations of {Y1(t) },
{Y2(t) }.
THEOREM 3.3 The factor process of any irreducible y-measurable F.A. representation (in the first canonical form) is a combination of {x1(t)} and {x2(t)} of the form x(t) = H(z)x1(t) + (I-E(z))x2(t) ,
(3.33)
where = (Q-Q2)A -1
(3.34)
is a A-ortho~onal projection valued matrix function on the unit circle. Proof : The proof relies on the easily checked fact that IN(x2) ] =x2(t). Then
x(t): =x1(t)-x2(t)
is orthogonal to {x2(t)} and so the direct sum is orthogonal.
ELx1(t) I
form a process which
H(x 1,x 2) =H(x)8H(x2) , where Now, any minimal splitting
suhspace X C H(y) is actually contained in
H(x~,x 2) C H(y)
(Lindquist and Picci, 1985), so that the corresponding factor process {x(t)} can be expressed as x(t) = S
~(z)A(z)-Ix(t) +S (z)Q2(z)-Ix2(t) x,x x,x 2 '
(3.35)
102
where the cross spectra are easily computed from
S = S S-1HQ11 = q x,x I x,Y I SX,X 2 z S S-IG x,y 2 2
=
9
Q2
Equation (3.35) is exactly the same as (3.33). In order to check that H is a projection, notice that right multiplication of -I (3.25) by A gives
(Q-Q2)A -I = (Q-Q2)A-I(Q-Q2)& -I
,
which shows that H = H 2 ; moreover (3.25) can be rewritten to look exactly llke ~g(l-~)
=0. Thus H is a A-orthogonal projection.
If we couple formula (3.33) with the explicit expressions (3.28) given for
x1(t)
and
x2(t) , we obtain a linear trans-
formation acting on the "data" {y1(t)}~Y2(t)}
that ~e want to
represent. This is precisely the rule telling us how the factor process of each y-measurable representation is manifactured. Note that (3.33) is still parametrized by Q. To complete the picture we need now to describe the solution set of the quadratic equation (3.25).
PROPOSITION 3.2 Let V be a square spectral factor of the spectral density matrix A=QI-Q2.
Then all solutions
Q = Q2+v where r is any
nxk
rr v
,
(k~n)
unit circleti.e, such that
Q #Q2
to (3.25) are given
(3.36) isometric matrix function on the
103
F F = Ik ,
Ik being the
kx k
(3.37)
identity matrix.
Proof: Write
Q-Q2' assumed to be of rank
k_ H(n), w h e r e H(n) = -- j.~Xy. P(y) log P(y) denotes the e n t r o p y of the strings of length n. This means that the ideal w a y to encode the strings relative to the given distribution is to assign to string y a code string w i t h length - log P(y). This, of course, c a n n o t a l w a y s be done e x a c t l y because a code string must have an integer length, but at least we k n o w w h a t w e should be striving for, and w e call it t h e / d e a l code length. A n o t h e r good n a m e w o u l d be Shannon complexity o f y relative to the given distribution.
A direct application of these ideas to compressing strings confronts us with the same problem as m e t in traditional statistics: The distribution P0") is not k n o w n to us, and it either has to be imagined or, better, estimated. For this reason w e consider a p a r a m e t r i c family { P0(Y) } of such distributions or models, w h e r e 0 = (01. . . . . Ok), and k ranges over the set of all n a t u r a l numbers, H o w n o w to calculate the ideal code length is the central problem in the
M D L principle to b e discussed next.
122
There are two basic ways to go about encoding a string of data. In the first way we read the entire string and we ^ somehow form the best estimate 0 O,) of the parameter vector 0. Then we design a code C such that the length of the code string CO,) is close to the ideal -- log/~e c,)(y). We need not concern ourselves with the details of how such a code can be designed, which is just a routine matter. The important thing to realize is that the datay can lag de^ coded from the code string CO,) only if the decoder also knows the estimated parameter vector 0 (y). This has to be given in an explicitly coded form, because the decoder at the time it is needed does not yet k n o w y and, hence, cannot calculate the estimate by any conceivable algorithm. The binary code string for the parameter vector, which may be placed as a preamble in front of C(y), must dearly be a prefix code, lor otherwise the decoder would not be able to separate it from the subsequent binary code of the data. Hence, its length L(8) must satisfy the Kraft-inequality, ~2-L(') _< 1, where 0 runs through all its possible values. These values are clearly truncations (think of computing the maximum likelihood estimates, which surely result in truncated numbers). If we carry too many fractional digits, the required code will have to be long, while if we truncate too heavily, the results will deviate too much from the optimum, and we end up coding the string with non-optimal parameters. It turns out that when each component is truncated to its optimal precision, reflecting its importance to the entire code length, the k code length for the k-component parameter vector and the loss due to truncation is ~-- log n bits, Rissanen (1978). In addition, the decoder will have to be given the number of the components k in the estimated parameter vector as another prefix coded preamble, which takes a little more than log k bits. This number, of course, is almost always quite negligible in comparison with the other length, and we drop it. All told, the best ideal code length with this type of "nonpredictive" coding is to within terms of order log n given by
k l~vp(y) = min { -- log P0(Y) + "~- log n]. k,e
(2.2)
The same expression but with different content and scope was also derived by restrictive Bayesian assumptions }n Schwarz (1978). We also refer to the pioneering work of Akaike (1974) for another criterion, where the weaker model complexity penalizing term k gets added to the first, the negative logarithm of the likelihood term. In contrast with (2.2) such a term is too weak to produce consistent estimates of the number of parameters in all the analyzed cases, Hannan (1980) and Shibata (1976). Finally, we add that when the parameter coding job is done more carefully, Rissanen (1983a), a third term is required, namely, k log ]JOII~<e),where M(O) denotes the Hessian
123
matrix of - log Pc(Y). This term turns out to be sensitive to the structure in which the parameters of a multivariable dynamic system are represented, Rissanen (1983b); see also Section 4.
The other way of coding data strings requires no explicit code for the parameters, because the coding will be done in a "predictive" way. What this means is that from each portion y ' =-)1 . . . . . y, of the data string we form an estimate of the distribution P0(.v,+l I.I/) for the possible values of the next symbol, where # is to be replaced by an estiA A mated value 0(t) = O0/), calculated by an algorithm from the so-far processed string. The decoder knows this algorithm, and he can also calculate the same estimate provided that it indeed does not depend on the future and not yet decoded data points. We know from Shannon's result, derived above, that the best way to do the coding is to assign to the next symbol the code length -- log P~0(,)(y,.~ lY'). and hence the best total ideal code length with this type of predictive coding is
n-I
I_p0') = min { - ~'. log ~t0(vt+l lY')}.
k
t-O
(2.3)
We should also have included the code length, log k, required to describe the number of the components in the es^ timated parameter vectors, but as above this term is negligible. How sheuld we pick the estimates O(t)? It seems to make eminent sense to pick them in such a way that the accumulated past code lengths
t-I -- log PO(,,pt) = -- E log PeO~¢+l lyt), t=O
(2.4)
arc minimized, which is seen to be done by the maximum likelihood estimates of the parameters for each value e[ k. This represents a most attractive principle of inductive inference: Make that choice that has worked best in the past. And who can argue against that, provided that we have no other "prior" knowledge about the behavior of the data! This philosophy in his "prequential" approach to estimation was also discovered independently in Dawid (1984). A somewhat similar and yet crucially different "cross- validation" principle has been studied in Stone (t977). Because no "honesty" of the predictions is required, the associated criterion is asymptotically equivalent with Akaike's AIC, and hence the resulting estimates of the number of parameters are not consistent.
124
In order to avoid ill-conditioned optimization problems, w e in (2,4) never estimate more p a r a m e t e r s t h a n d a t a points; t h a t is, w e begin w i t h k = 0 a n d increase/¢ gradually to each final value with which the criterion in (2.3) is evaluated. The case w i t h no free p a r a m e t e r s means that w e need an initial distribution POt) to predict or encode the v e r y first observation. This could be done by having a fixed p a r a m e t e r value 0(0), obtained s o m e h o w on prior grounds, w h i c h singles out a distribution from the family. W e discuss later h o w such a prior knowledge can be t a k e n a d v a n t a g e of in modeling and prediction.
The predictive coding process is seen to be v e r y similar to prediction: In both cases we try to unravel the uncert a i n t y a b o u t the " n e x t " observation Y,.t b y acting on the past d a t a only. In fact, the t w o processes are equivalent. A A TO see this, let 8(Ft+t - - y (t + 1 ] t)) be a n y reasonable prediction error measure, w h e r e y (t + I ] t) is some prediction of the n e x t observation, involving p a r a m e t e r s to be estimated from the past data. Define a conditional density ^ f~(Y~+t [Y) proportional to e -~(y,,t -y 0+tl0) and we get a family of p a r a m e t r i c probahilistic models, w h e r e the code length, a p a r t from a n irrelevant t e r m due to t r u n c a t i o n and proportional to n, is the sum of the prediction errors. A particularly i m p o r t a n t special case results from the quadratic prediction error measure, because then the predictive MDL principle reduces to a predictive least squares (LS) principle. W e discuss its application to A R M A estimation in the n e x t section. Because the non-predictive coding process c a n n o t be interpreted as prediction, w e conclude t h a t coding is a strictly more general process than prediction.
W e conclude this section b y stating that the t w o described coding lengths are asymptotically optimal in the sense that their mean, relative to a n y process in the considered class of " s m o o t h " models, is shortest a m o n g all codes satisfying (2.1). Because the variance of these lengths, c o m p u t e d per observation, behaves like 1/n, w e m a y take these lengths themselves to represent well the shortest possible per s y m b o l code lengths (prediction errors), and w e call It~e(F) and lr(F) the non-predictive and predictive stochastic complexities, respectively, of the stringy, relative to the considered class of models. This result not only generalizes the a b o v e m e n t i o n e d Shannon theorem, giving a tight lower b o u n d for the code length a n d the prediction errors, but it also serves a similar role as C r a m e r - R a o in= equality for estimators, e x c e p t t h a t w e m a y assess the goodness of a n y estimators, including the n u m b e r of parameters. The n a m e " c o m p l e x i t y " seems apt in view of the fact that it represents the ultimate limit to w h i c h the three f u n d a m e n t a l tasks, prediction, estimation, a n d coding, can be performed.
125
3. A R M A E s t i m a t i o n and Prediction
AS w e outlined in the preceding section, estimation and prediction are intertwined: you c a n n o t predict optimally w i t h o u t performing estimation optimally. Here w e m e a n the real prediction problem w h e r e w e are g i v e n an obs e r v e d sequence of numbers, y ( l ) . . . . . y ( n ) , one b y one, and w e are asked to predict for each n the n e x t value, This is to be done w i t h o u t knowing the probabilistie source of the n u m b e r s as usually done in prediction theory. O u r approach is to select a class of models, or perhaps several classes, and fit a model in each class w i t h the predictive L S principle. The prediction will be d o n e with the best model at each instant of time, and if the past is a n y guid-
ance to the future this strategy will provide the best predictions obtainable with the selected class. W e shall choose the model class as the gaussian A R M A class, w h i c h means that w e shall have to k n o w h o w prediction is done optimally for such processes. The K a l m a n t h e o r y in principle is applicable, but the solution it provides involves par a m e t e r s t h a t c a n n o t be estimated from the observations. For this reason w e shall use a n o t h e r approach, Rissanen (1967), and w e give the relevant recurrence equations below.
Consider a process generated b y the recursion
y ( t ) + a l y ( t -- 1) + " - + apy(t -- p ) = e(t) + c l e ( t -- 1) + ... + Cqe(t -- q),
(3.1)
for t > p , w h e r e e is a n orthogonal zero-mean process with variance E ( e ( t ) 2) = 0 2. Letting u ( t ) for t >_p stand for the M A process
u ( t ) = e(t) + c l e ( t -- 1) + ... + ¢qe(t - - q ) ,
we see that the eovarianca E ( u ( t ) u ( s ) )
(3.2)
= r(t,s), t,$ >_ p , satisfies the crucial "bandedness" property
r(t,s) = 0, for [ t -- s] > q.
(3.3)
W e let the initial variables be specified b y the eovariances as follows
E ( u ( t ) y ( $ ) ) = r(t,s) = 0 if t - s > q
(3.4) E(y(t)y(s))
= r(t,s),
t,s < p .
126
The problem is to find the orthogonal projection of y ( t ) on Y0~-t, the subspace spanned b y the observations up to ^ t - 1, w r i t t e n a s y (t I t -- 1). The task is simple if we find. a representation of the process u as follows:
(3.5)
u ( t ) = E(t) + Ci ( O e ( t - - 1) + "'" + Cq(t)e(t -- q ) , t > q,
w h e r e e(t) is a n uneorrelated (but not of unit variance) process; the variables for non-positive indices are zero. The coefficients are found by the C h o l e s k y factorizatlon of the covariance m a t r i x R = {r(ij)] as R = B I B , w h e r e B is upper triangular,
b(O,O) b(O,1) 0
b(O,n)
b(l,1) 0
B=
b(n0
0
l,n)
b(n,n)
Specifically, the f a c t o r s are defined b y the following rccursions, which also are s e e n to result from the G r a m - S m l t h orthogonalization procedure.
q
b ( t -- i j ) = [ r ( t -- i,t) - -
Z b(t - j , t ) b ( t - j,t - i) q b - l ( t j-i+l
b(t,t) = + [r(t,t) - b 2 ( t - q,t) . . . . .
b(0,0)= +~,b(t-i,t)=0,
b(t -
- i,t - i),
l,t)2] 1/2, t > 0
1 _ +
k
2c, c,):c,-,t,-;- 1)=
i--I
(3.8) i--I
where d,(t) = q(t) - a~, i = 1. . . . . k for k = m a x {p:/}, and the coefficients with undefined i n d e x values a r e zero.
One can show that if the polynomial defined by the coefficients c, has its roots strictly outside the unit circle, then c,(t) -4- c,. The limiting predictor, then, agrees w i t h the s t a t i o n a r y optimal predictor
q
~, ft I t -
1) + ~
k
cf, ft -
i It - i -
1) =
i=l
~, (ci -
a i ) y f t - i)
(3.9)
i=1
We n o w r e t u r n to the main problem of h o w to do the prediction w h e n the coefficients and the t w o order n u m b e r s p and q are n o t k n o w n . W e apply the predictive LS principle and proceed as follows. For each pair (p,q) and each t w e solve the following ordinary least squares problem
l
min E ~ 2 ( i ) , fl i--I
(3.10)
where 0 denotes the vector of the coefficients a = (at . . . . ,%,q . . . . . cq) together w i t h the p(p + 1 ) / 2 + q(q + 1 ) / 2 initial elements r(id) in (3.6), defining a vector/3, and r(i) m a y be solved recursively from (3.1), (3.2), and (3.5). A A ^ Let the minimizing p a r a m e t e r s be O (t) = (a (t),/3 (t)). With these we n o w e x t e n d the Cholesky factorization one more step; i.e., w e c o m p u t e the coefficients (3.6) for t + 1, and w i t h (3.7) we calculate the n e w prediction 2~(t + 1 It) f r o m the f o r m u l a (3.8), w h i c h clearly depends only on the past data and the pair (p,q), because the ^ calculation of O(t) is done b y the fixed ordinary least squares algorithm. This gives the prediction error A A ~(t + 1 IpAt) =y(t + 1) - - ~ ( t + 1 It). As the final step w e find the best pair (p(n),q(n)) which solves the optimization problem
128
n-I Ip~v) = mln X # ( t + 1 Ipa). P'q t-O
(3.n)
It can be shown that asymptotically
I
~)
~ 02(1 +
p+q n
In n),
(3.12)
where
02----. ~ - ~ e 2 ( / ) . t-I
(3.13)
Remarks.
In the above described procedure we did not pay any attention to the amount of computations needed. Rather, our aim was to do the prediction as well as we know how, provided, though, that there is no prior knowledge about ^
A
the parameter values. Clearly, when calculating 0 (t + 1) by a suitable hill climbing routine, we should use 8 (t) as the initial estimate. It is also possible to calculate the Cholesky Iactorization by an order of magnitude faster algorithm, Rissanen (1973), in case the eovariance matrix R(t) is a Toeplitz matrix; i.e., if the process u is stationary, and if we set the initial conditions to zero. Alternatively, it is enough to have the initial conditions such that the y -- process is stationary. The resulting fast predictor recursions have been described by Lindquist (1974), Kailath, Morf, and Sidhu (1974), and by Rissanen (19"/5), after this author lectured the topic at Stanford University during 1971-1972. Much earlier, the impulse response of a stationary predictor was found with a fast algorithm by Levinson, but that algorithm required an ever growing memory.
The entire Cholesky factorization can be avoided if we ignore the influence of the initial conditions and simply replace the representation (3.5) by (3.2). The only problem remaining then is to compute the sequence of esti-
129
A
mates a(t), t = 0 . . . . . n - 1 for different values of p and q. In the caze of AR processes such calculations can be done recursively by the so.called ladder forms; see Wax (1985).
As a final remark, the difference between the complexity and the sum of the squared residuals (3.13) was observed in Bittanti (1983), where it was wondered whether the relationship between the two could be clarified. Well, (3.12) does it in a most decisive manner.
We computed in Rissanen (1984c) a small simulation to test the predictive least square principle for estimating an ARMA order. We used the stationary equations which do not require the Cholesky [actorization, The data were generated with an ARMA(1,1) system with parameters a = .5 and c = - . 3 , where e(t) was a computer generated zero mean unit variance independent gaussian sequence. We fitted models of type ARMA(p,q) with (p,q) ffi (1,0), (2,0), (1,1), and (2,2). The following table gives the sum in (3.12), calculated for various values ofp,q and divided by n, along a single sample of size 600.
(p,q)
n ----50
n = 100
n = 200
n = 300
n = 600
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
(1,0) (2,0) (1,I) (2,2)
1.336 1.629 1.505 1.925
1.276 1.385 1.307 1.520
1.101 1.156 1.117 1.221
1.107 1.120 1.091 1.159
1.015 0,996
Table 1. Simulations of ARM A processes
We see that the models (2,0) and (2,2) give uniformly worse values than the two best models (1,0) and (1,1) in the table for all sample sizes (we did not calculate the last entry for them, which surely would have been worse, too). In the last model, in particular, the two extra parameters penalize heavily the prediction errors. For sample sizes up to 200 the simpler model (1,0) performs best, but eventually the model with the right numbers of parameters (1,1) is the winner. This makes sense in that there is no predictive benefit in estimating the second less significant parameter until there is enough data, even if we knew that such a parameter existed; the data are the ultimate arbiter in deciding what is optimal and what is not.
130
We then w a n t e d to study how initial estimates of the p a r a m e t e r s might be t a k e n a d v a n t a g e of to improve t h e par a m e t e r estimates a n d the predictions. After all, in our opinion, the most natural and easy w a y to incorporate inltial knowledge is directly in terms of the estimate of the parameters, including their numbers. Indeed, the p a r a m e t e r s usually represent constants, a n d a n y Bayesian type of prior distribution for t h e m is both a w k w a r d to justify and just a b o u t impossible to estimate in a meaningful way. The traditional Bayesian formalism does not permit a representations of initial k n o w l e d g e in terms of a p a r a m e t e r value, because the so defined singular dis^ tribution c a n n o t be altered by the data. However, our formalism does it easily. In fact, let 0 (0) denote the initial ^ estimate w i t h p + q components. Then w i t h O(t) denoting the predictive L S estimate from the first t observations w i t h initial k n o w l e d g e ignored, as described above, define a new estimate as a linear combination of the t w o
O(t) =
^ ^ at0 (o) + (1 - . c ) O ( t ) .
(3.14)
The coefficient is defined as follows
at=
1
--,_ , ,_ , ' 1 + 2 L'Ow'q't)-Lke'q't)
(3.15)
w h e r e L(p,q,t) denotes the a c c u m u l a t e d prediction errors, (3.11) before the minimization, and Lo(p,q,t) is the s a m e ^ w h e n the p a r a m e t e r is the initial estimate 0 (0). Because this p a r a m e t e r is the same t h r o u g h o u t the data, La(p,q,t) coincides w i t h the usual non-predictive s u m of the squared deviations. We see that a good initial e s t i m a t e tends to m a k e the corresponding code length shorter t h a n the length L(p~l,t) for small values of t, because initially the estiA
m a t e 0 (t) tends to be poor due to a small sample size. This causes a r to be near one, a n d the effective estimate 0 (t) is close to the initial estimate. However, eventually a, gets small, unless the initial estimate is perfect, and the ef^ fective estimate tends to the steadily improving estimate 0 (t).
To test the feasibility of this s c h e m e w e generated a data sequence of length 100 w i t h the A R M A ( I , 1 ) model defined b y the t w o p a r a m e t e r s a = 0 . 7 , c = -- 0.1, a n d w i t h a unit variance zero m e a n gaussian independent input sequence. W e set the initial estimate 0(0) = (0.7, -- 0.1) of the p a r a m e t e r s at the " t r u e " values. W e wish to cornpare the convergence of the p a r a m e t e r e s t i m a t e 0 (t) = (a, c ), given b y such a perfect initial knowledge, w i t h t h a t ^
A
A
of the least squares estimates 0 (t) = (a, c ). In this e x p e r i m e n t we, then, kept the n u m b e r of p a r a m e t e r s at the
131
correct value. The~o estimates along with the two sums of the squared predictions, corresponding to the two estiA
maters 0 and O, r~pectively, were computed along the 100 sample points, and the results are in the following table.
time t
with prior estimates a c L
without prior estimates a c L
.......................................................................
I0 20 30 40 100
0.15 0.47 0.62 0.69 0.69
0.01 -0.23 -0.17 -0.11 -0.11
5.0 17.1 28.7 36.0 98.9
0.04 0.16 0.24 0.43 0.50
0.03 -0.41 -0.50 -0.28 -0.33
4.99 20.0 33.7 42.9 108.2
Table 2. Effect of prior estimates
We see that indeed good initial estimates improve both the convergence and the prediction error.
4. Vector Time Series Models
As quite well known, the class of multi-input/output linear dynamic systems, even of a fixed dimensionality, is topologically a lot more complex space than in the case when either the input or the output is scalar. Hence, when we search for the stochastic complexity of an observed vector li me series, relative to the class of such models, we may find a model which with relatively few parameters will capture the essence in the data. In the older statistical literature the only models that were fitted to a series with, say, p components, had the maximal dimensionality, a multiple of p. This was justified on the grounds that since the estimated l-lankel-matrix, or its equivalent, has the maximal rank, there is no point in fitting other models. Such an argument indicates a gross misunderstanding of modeling, and, in fact, an equivalent argument would dismiss fitting dynamic systems to scalar sequences as well; after all, no observed sequence is generated by any dynamic system.
Since the theory of multivariable linear dynamic systems is by now well known, and, in fact, it ma y even be covered in some of the other chapters of this book, we do not need to describe it here in any detail. Instead, we just summarize the relevant facts. The set of all linear dynamic systems with, say, p inputs and equally many outputs,
132
is in o n e - t o - o n e c o r r e s p o n d e n c e w i t h t h e set of all H a n k e l m a t r i c e s of p x p blocks w i t h finite r a n k , s a y n. G i v e n s u c h a s y s t e m , its m a t r i x i n p u t / o u t p u t impulse response defines a H a n k e l m a t r i x of p x p blocks a n d r a n k n. C o n v e r s e l y , a n y of t h e usual realization a l g o r i t h m s defines a
p-input/output s y s t e m
of d e g r e e n of s u c h a H a n k e l
matrix.
T h e set of all finite r a n k H a n k e l m a t r i c e s of p x p blocks c l e a r l y a d m i t s a p a r t i t i o n into e q u i v a l e n c e classes b y t h e r a n k n. H o w c a n e a c h s u c h class b e p a r a m e t c r i z e d ? Unlike in t h e c a s e w i t h p = 1, a n equivalenc~ class c o r r e s p o n d i n g to a r a n k n, a n d h e n c e t h e set of all linear s y s t e m s of o r d e r n w i t h p inputs a n d o u t p u t s , c a n n o t b e p a r a m e t e r i z e d w i t h a single c o o r d i n a t e s y s t e m , a n d t h e set is n o t a linear space. This i m p o r t a n t o b s e r v a t i o n is a t the r o o t of t h e m o d e r n t h e o r y of linear d y n a m i c s y s t e m s , a n d it also affects in a p r o f o u n d m a n n e r the w a y s u c h models o u g h t to be fitted to the o b s e r v e d time series. C o n s i d e r , f o r e x a m p l e , the set of all H a n k e l m a t r i c e s w i t h p = 2 a n d n -- 3. If w e f u r t h e r a s s u m e t h e first t w o r o w s to b e linearly i n d e p e n d e n t , as w e s h o u l d to a v o i d p a t h o l o g y , t h e n t h e H a n k e l p r o p e r t y implies t h a t either t h e t h i r d or the f o u r t h r o w m u s t be the last r e m a i n i n g r o w t h a t t o g e t h e r w i t h t h e first t w o f o r m s a 3 - e l e m e n t basis f o r t h e s p a n of all the r o w s in t h e m a t r i x . A g a i n the H a n k e l p r o p e r t y implies t h a t these t h r e e basis r o w s a r e defined just as s o o n as w e specify t h e t w o first e l e m e n t s in each, hence, six a l l t o g e t h e r . In the f o r m e r case, w h e r e the basis consists of the Hrst t h r e e r o w s , t h e f o u r t h a n d t h e fifth r o w a r c linear c o m b i n a t i o n s of the basis e l e m e n t s a n d , hence, to specify t h e m w e n e e d six ecefficients. All t h e o t h e r r o w s in the H a n k e l m a t r i x a r e n o w just shifts a n d t r u n c a t i o n s of these a n d t h e basis r o w s .
2np =
Similarly,
12 p a r a m e t e r s a r e n e e d e d to specify all t h e H a n k e l m a t r i c e s , w h e r e the first, s e c o n d , a n d t h e f o u r t h r o w s
f o r m a basis.
C o n s i d e r n o w t h e set of m a t r i c e s w h e r e the f o u r t h r o w is a basis c l e m e n t . T h e n the t h i r d r o w p e r f o r c e is l i n e a r l y d e p e n d e n t o n t h e first, second, a n d t h e f o u r t h . C o n s i d e r t h e f u r t h e r subset w h e r e t h e t h i r d r o w in f a c t is linearly d e p e n d e n t o n t h e first t w o . E v i d e n t l y n o s u c h m a t r i x a n d the c o r r e s p o n d i n g linear s y s t e m c o u l d b e e x p r e s s e d in t e r m s of t h e p a r a m e t e r s d e f i n e d b y t h e basis consisting of t h e first, second, a n d t h e t h i r d r o w . F r o m this w e c o n elude t h a t in o r d e r t o p a r a m e t e r i z c the set of all s y s t e m s of d e g r e e 3 h a v i n g t w o inputs a n d o u t p u t s , w e n e e d t w o distinct c o o r d i n a t e s y s t e m s .
~33
In general, then, the set of all linear systems of degree n having p inputs and outputs, may be partitioned into finitely many equivalence classes, each class corresponding to the so-called lexicographic basis defined by each matrix as follows: Each of the first p rows is included in the basis, and the next basis element is the first row which is not in the linear span of those above it, and so on. Consider the ith row, i _ for
(1.a) which
[(t,x) 1.2
into
Xr (T,t)
System
if there exists an input function
carries
the event
(T,0) into
(t,x)
(T,0) 3 .
[Xc(t,T)] denotes
[controllable] 1.3
t]
(T,t),T t],such over
that x is
(t,T) ~ .
denotes the set of the reachable E c o n t r o l -
states at t. (1.a)
(or, e q u i v a l e n t l y
the pair
(A,B)) is r e a c h -
able[controllable 3 at time t if Xr(t) = R n [Xc(t) 1.7
System
at t 3
(I)
(or, equivalently,
the pair
=
Rn]
(A,B)) is reach-
a b l e [ c o n t r o l l a b l e 3 if Xr(t) = R n, Yt [Xc(t)
= R n, Yt 3 .
146
2.3
GramPian matrices
The following nxn matrices are named reachability and controlla-
bility Grammian matrices respectively.
Wr(T,t)
.t = j~ #(t,~) B(o) B(o)' ¢(t,s)' do , t >
(3.a)
w
=
(3.b)
(t,-r)
~(t,o)
B(a)
B(c~)'
~(t,c~)'
de
,
T>t
C
It is well known (Kalman, 1969) that
Xr(~,t) = R EWr(~,t)]
x (t,~) = R EWc(t,~)] c
L-J m
where R
is the range operator.
In the periodic case, the following recursions can be derived in view of
(2):
Wr(t-(i+1)T,t ) = Wr(t-iT,t ) + [@(t+T,t)] i Wr(t-T,t) [@(t+T,t)'] i (4.a) C
C
s
C
(4.b) 2.4
Five structural properties of time-invariant systems
The structural properties of linear time-invariant systems have received ample coverage in the literature,see e.g. Kalman
(1969),
147
Chen
(1970).
though all,
some
well
of t h e m
in t h e p r o p e r
periodic
A)
Five
known
are
properties
trivial,
order,
for
are
listed
it is a d v i s a b l e
the
subsequent
below.
to
list
discussion
Althem
on
systems.
The reachability
and controllability
subspaces
at t i m e
t
subspaces
are t i m e -
do c o i n c i d e : x
r
(t)
B) T h e
= x
c
(t)
,
reachability
Yt
and controllability
invariant: X
X
C)
r
c
(t) = const.
, Yt
(t) = const.
, Yt.
If t h e p a i r point, X
r
X
(t)
(A,B)
is r e a c h a b l e
it is r e a c h a b l e =
Rn
(t) = R n
---->
X
~
X
c
D)
r
[controllable]
[controllable] (t)
=
Rn
(t) = R n
,
at a n y
at a t i m e time
point:
Vt , Vt.
c
If a s t a t e any
e>0,
(t,
t+~)]
is r e a c h a b l e
it is r e a c h a b l e :
Xr(t)
= Xr(t-e,t )
Xc(t)
= Xc(t,
t+E).
at t [ c o n t r o l l a b l e over
(t-E,t)
at t],
[controllable
then, over
for
148
E) The p a i r
(A,B)
[sI - A
rank o n the s p e c t r u m
to the s y s t e m
a trivial
between
the t i m e - i n v a r i a n t
D can be r e p h r a s e d
systems,
reachability
instantaneous". made
arbitrarily
constraint I.
often
referred
2.5
Five
an
input
the
(1966),
Popov
structural
that,
with
function
properties
the
case.
in t i m e - i n v a r i a n t
are
"asymptotically
of
transitions time
the a b s e n c e
of any
in the c l a s s i c a l
characterization
Belevitch
(1968),
of c o n t i n u o u s - t i m e
Defini-
E is
(Popov-Belevitch-Hautus)
(1966),
is
to stress
and the p e r i o d i c
interval
spectral
to as the PBH
only
and c o n t r o l l a b i l i t y
is c o n n e c t e d
on the
Finally,
see J o h n s o n
in
B is i n t u i t i v e . C
here
and c o n t r o l l a b i l i t y
occur
shor~This
tions
by saying
The reachability to
Property
of B and is l i s t e d
Property
energy
if
of A.
time-invariance,
consequence
difference
can be
if and o n l y
B 3
is full Due
is r e a c h a b l e
condition, Hautus
(1969).
periodi c
systems
The f o l l o w i n g
basic
Do p r o p e r t i e s • Can
anything
questions
A-C hold
true
be said about
lity
intervals?
Does
there
exist
In the
first
place,
and c o n t r o l l a b i l i t y
for p e r i o d i c
version
holds
subspaces
section:
and c o n t r o l l a b i -
of the PBH
true,
coincide
in this
systems?
the r e a c h a b i l i t y
a periodic A still
are c o n s i d e r e d
test?
i.e.
the r e a c h a b i l i t y
even
in the p e r i o d i c
149
case:
X
r
(t) = X
As m a n y r e s u l t s periodic
~t.
concerning
systems,
geometric
this
the s t r u c t u r a l
can b e p r o v e n
derivation
of a t y p i c a l
of inclusion
that a n y s t a t e NT.
geometric
Due
t+NT,
to p e r i o d i c i t y ,
x is c o n t r o l l a b l e
X
reachable
c
(T,t)
contrary and X
c
to zero
(t+NT),
in an
so t h a t
x~
since x ~ is c o n t r o -
(t,x)
(t,0)
into
will
there
exists
(t+NT,0).
By the
t h e n be t r a n s f e r r e d at t+NT,
to p e r i o d i c i t y .
This
and c o n s e leads
to
s y s t e m at e a c h
well
known
property
to the time
and c o n t r o l l a b i l i t y time p o i n t
(T,t) m a y n o t c o i n c i d e
is an e x t e n s i o n
for t i m e - i n v a r i a n t
invariant
case,
sub-
systems.
the s u b s p a c e s
(see E x a m p l e
1 below).
c
dically
r
the
Xr(t).
P r o p e r t y B is o b v i o u s l y
X
that,
x e is r e a c h a b l e
at t thanks
of a p e r i o d i c
r
(t) = X
of the r e a c h a b i l i t y
of the a n a l o g o u s
X
the event
Xc(t) C
The c o i n c i d e n c e
However,
let us o u t l i n e
at t. T h e r e f o r e
transfers
Therefore
the c o n c l u s i o n
spaces
which
function,
(t+NT,-x~).
quently
or
at t + N T as well.
function
same i n p u t
proof,
a t t c a n be d r i v e n
Let x = $ ( t , + N T ) x "e. It is a p p a r e n t
to
algebraic
X
controllable
is c o n t r o l l a b l e
an i n p u t
either via
of l i n e a r
(t) C X (t). L e t x ~ be a s t a t e w h i c h c -- r at t and d e n o t e b y N a p o s i t i v e i n t e g e r such
is c o n t r o l l a b l e
lableat
properties
methods.
As an e x a m p l e
interval
(t),
c
time-varying
(t) = X
X c(t)
:
(t+T)
,
~t
= X c(t+T)
,
~t.
r
false. I n s t e a d X r ( t )
and Xc(t)
are p e r i o -
150
However,
b y the s a m e a r g u m e n t s
ti a n d B o l z e r n
dim X
dim X
From
r
c
(1984,a),
Lemma
used
in d i s c r e t e - t l m e
3, it can be p r o v e n
in B i t t a ~
that
(t) = c o n s t
(t) = const.
this,
periodic
it f o l l o w s
system
reachable
that property
is r e a c h a b l e
C still h o l d s true:
at a g i v e n
at a n y t i m e point.
Hence
s p e a k of s y s t e m r e a c h a b i l i t y
time point,
If a
it is
it is p o s s i b l e
and c o n t r o l l a b i l i t y
to
without
fur-
ther specifications.
The attention of w h i c h are knowledge
is n o w f o c u s e d somewhat
vals
(1969),
algebraic
Theorem
1
If s y s t e m sition
systems
in
C
Bittanti
(Brunovsky,
X
first
concern-
sixties.
interIn B r u -
is p r o v e n b y m e a n s proof,
stories
To the b e s t
statements
late
of
of g e o m e t r i c
(1984,a),
then
Lemma
type,
I.
the c o n t r o l l a b i l i t y
in an i n t e r v a l
(t) = X
C
(t,t+nT).
tran-
of t i m e of l e n g t h nT
:
C
the
and c o n t r o l l a b i l i t y
and Bolzern
(1) is c o n t r o l l a b l e ,
~
e a c h other.
to the
result
D and E,
1969)
c a n be p e r f o r m e d
(t) = R n
the
An alternative
is the s y s t e m order)
X
go b a c k
the f o l l o w i n g
arguments.
can be found
author,
with
of the r e a c h a b i l i t y
of p e r i o d i c
novsky
interwoven
of the p r e s e n t
ing the l e n g t h
on p r o p e r t i e s
~]
(n
151
In Kalman
(1969),
a stronger
statement
is reported
without
proof.
Proposition If system sition
(Kalman,
(I) is controllable,
can be performed
The question untill
1969)
of proving
1975,
when,
in an interval
in a
Riccati
proposition.
Furthermore,
equation,
system controllability
different
and N i s h i m u r a ~(T,0)
condition, matrix
remained
paper
which generalizes
open
on
Hewer gave a proof
he gave a spectral
[]
of Kalman
condition
of
the PBH test to the
case.
A slightly
matrix
lengthy
tran-
of time of length T.
Kalman proposition
the periodic
periodic
then the controllability
condition
is due to Kano
(1979), where reference is made to the m o n o d r o m y RT = e in place of R. The condition, named H-
will now be stated
~(t+T,t).
to discrete-time
H-condition
yet equivalent
This
in terms of a ~ n e r i c
is especially
systems. Recall
at t
that
monodromy
useful
for the extension RT [ is the spectrum of e
(continuous-time)
Given a time point t, the matrix sl-
#(t+T,t)
Wc(t,t+T) 3
is full rank on I-
[]
The first paper where
the condition was
stated
is probably Bittanti,
Bolzern,
and Guardabassi
Colaneri
in these
terms (1983).
152
However,the spectral conditions playeda k e y r o l e in the analysis of the periodic Lyapunov and Riccati Equations, Hewer Kano and Nishimura
(1969),
(1975). The proof given by Hewer of the
validity of the H-condition as controllability test was based on Kalman proposition, in Hewer
though. Unfortunately,
the proof given
(1969) of such a proposition was not correct. Even
more so:the Kalman proposition itself is not true,as shown by the following counterexample.
Example 1 For a given integer n, let 1 I, ~2' ..., I n be n given distinct real numbers. Consider the single-input system:
A(t) = diag [ 1 I, 12, .... Xn] ' E e-X1 (l-t)
e _12 (l_t)
l , sin ... e -ln (l-t) _
B It) = periodic extension of previous,
t ~ [0,1]
For this system, which is periodic of period T = I, 9(0,~) B(o) = (sin ~g) x I where x I = [e -11
e-k2
... e-ln]
Letting 2 = f01 sin ~q dq
=t, te [0,13
153
it f o l l o w s
Wc(0,1)
from
=
(3.b)
~ X l X ~.
Therefore,
dim
X
not
controllable
For
a given
interval
xi
Then, can
(0,1)
c
and
-iX1
e
Wc(0,k)
the
= ~(xlx:]
assuming
n > I, t h i s
system
integer
k,
k ~ n,
consider
now
the
time
let
. .. e
-iXnJ
'
(4)
for
recursion following
,
any
i = I, 2,
...,
k.
integer
k ~n,
W
c
(0,k)
expression:
+ x2x[z + "" " + XkXJ)'k
k < n
Consequently,
Xc(0,k)
Since dim
X
= span
x I, x2,
c
(0,k)
Therefore states shorter
the
which than
Interestingly
Xr(0,k)
Ix1,
...,
= k
x2,
...,
x n are ,
,
Xk~
k
independent,
it f o l l o w s
is c o n t r o l l a b l e , be d r i v e n
to
zero
but
enough,
it t u r n s
X_l
...,
out
X~k+13
there
in an
(n).
= s p a n Ix0,
.< n .
that
Yk ~T, the solution of the Lyapunov equation is given by FT(t) = ¢(t,T) ~ ~(t,T)' + Wr(~,t).
(26)
Setting now T=0, t=T and imposing the periodicity constraint (24), the following equation is obtained: = ¢(T,0)
~
(T,0)'
+ ~.
This is the discrete-time algebraic Lyapunov equation. be shown
It can
(Graham, 1981) that, if the characteristic multipliers
lie within the unit circle, this equation admits a unique solution. From these results, stable, then both
it follows that: if the system is as~nptotically
(20) and (21) admit a unique T-periodic
solution. As a matter of fact, under the assumption of asymptotic stability, the following can be shown to hold true Bolzern and Colaneri,
(Bittanti,
1984):
Consider the solution F (t) of (21) such that F (T) = ~. T T Then F (t) converges to the periodic solution of % (21) as T÷-~, for whichever ~. In particular, taking ~=0,
(26)
entails that the Wr(-~,t) ~8 the T-periodic solution. Moreover,
174
in view of positive
(25), it is also apparent
semidefinite
reachable,
(at each t).
that this solution In fact,
should
is
(A,B) be
the solution is obviously positive definite
(at each
t). This last conclusion Lemma,
is part of the so-called Periodic Lyapunov
which can be stated as follows
Colaneri, Theorem
(Bittanti,
Bolzern and
1985).
6
The system is asymptotically such that
(A,B)
positive definite
stable if and only if, for any
is reachable,
there exists a T-periodic
solution of the Lyapunov equation
(21).~]
An extended version of this lemma can be given under the assumption
that
(A(t), B(t))
be stabilizable
only.
Theorem 7 The system is asymptotically such that
(A,B) is stabilizable,
positive semidefinite Theorem
7 is proven in
is decomposed
to the reachability
there exists a T-periodic
solution of the Lyapunov equation (Bittanti,
by means of a decomposition equation
stable if and only if, for any
Bolzern and Colaneri,
technique.
Precisely,
into three subequations
canonical
decomposition
of
(21).[7 1985)
the Lyapunov
corresponding (A,B).
One could wonder whether the Lyapunov equation may admit a T-periodic
positive
is not stable.
semidefinite
solution even if the system
In case the system is not asymptotically
stable,
175
matrix ~(T,0)
has some eigenvalues
on or outside
the unit
circle. If a characteristic the pair
(A,B)
T-periodic
multiplier
is stabilizable,
solution,
(see
(Bittanti and Colaneri, say, p characteristic
lies on the unit circle, then
(21) does not admit any
(Wimmer and Ziebur,
1986, Thm.
multipliers
2(a)).
that,
if
1984) and
(A,B)
solution of time-points.
(Bittanti,
is reachable
(21)
lower than I. Then,
Colaneri,
1986),
or stabilizable,
it is shown
the T-periodic for each
The remainign n-p ones are all positive if
(A,B)
solutions of
(21) correspond
(12), the conclusion stabilizable.
Then,
eq.(12)
(A,B)
semidefinite
to a cyclostationary
is the following:
if
is stabilizable.
Since it is obvious that only the positive
Assume
solution of
that
(A,B)
is
if the system is not asymptotically
admits no cyclostationary
The analysis of the discrete-time
solution.
periodic Lyapunov equation
is currently underway and partially Colaneri,
Suppose now that,
(if any) has p negative eigenvalues
is reachable or nonnegative
stable,
1975) and
have modulus greater than I,
while the remaining n-p ones have modulus in (Shayman,
and
reported
in
(Bolzern and
1986).
Acknowledgment The author is grateful to Professors Diego Bricio Hernandez comments.
Guido Guardabassi
and
for their helpful and stimulating
176
References Bailey, J.E. (1973): Periodic Operation of Chemical Reactors: A Review. Chem. Eng. Commun. I, 111-124. Bekir, E. and R.S. Bucy (1976): Periodic Equilibria for Matrix Riccati Equations. Stochastics 2, 1-104. Belevitch, V. (1968): Classical Network Theory. Holden Day, San Francisco. Bernstein, D.S and E.G. Gilbert (1980): Optimal Periodic Control: The H Test Revisited. IEEE Trans. Automatic Control AC-25, 673-684. Bittanti, S. and P. Bolzern (1984,a): Can the Kalman Canonical Decomposition be performed for a Discrete-time Linear Periodic System? Ist Latin American Conference on Automatica, Campina Grande, Brazil, 449-453. Bittanti, S. and P. Bolzern (1984,b) : Canonical Decomposition and Discrete-time Linear Systems. 23rd Conference of Decision and Control, Las Vegas, U.S.A., 1737, 1738. Bittanti, S. and P. Bolzern (1984,c): Four Equivalent Notions of Stabilizability of Periodic Linear Systems. 3rd American Control Conference, San Diego, U.S.A., 1321-1323. Bittanti, S. and P. Bolzern (1985,a): Reachability and Controllability of Discrete-time Linear Systems. IEEE Trans. Automatic Control 30, 399-491. Bittanti, S. and P. Bolzern (1985,b): Discrete-time Linear Periodic Systems: Grammian and Modal Criteria for Reachability and Controllability. International J. Control 41, 899-928. Bittanti, S. and P. Bolzern (1985,c): Stabilizability and Detectability of Linear Periodic Systems. Systems and Control Letters 6, 141-145. Plus Addendum, to appear in Systems and Control Letters (1986), 7, 73. Bittanti, S. and P. Bolzern (1986): On the Structure Theory of Discrete-time Linear Systems. International J. Systems Science, 17, 33-47.
177
Bittanti, S., P. Bolzern and P. Colaneri (1984): Stability Analysis of Linear Periodic Systems via the Lyapunov Equation. 9th IFAC World Congress, Budapest, 8, 169-172. Bittanti, S., P. Bolzern and P. Colaneri (1985): The Extended Periodic Lyapunov Lemma. Automatica 5, 603-605. Bittanti, S., P. Bolzern, P. Colaneri and G. Guardabassi (1983): H and K-Controllability of Linear Periodic Systems. 22nd Conference on Decision and Control, S. Antonio, U.S.A., 1376-1379. Bittanti, S. and P. Colaneri (1986): Lyapunov and Riccati Equations: Periodic Inertia Theorems. IEEE Trans. Automatic Control (to appear). Bittanti, S., P. Colaneri and G. De Nicolao (1986): Discretetime Periodic Systems: a note on the Reachability and Controllability interval length. Centro Teoria Sistemi, Politecnico di Milano, Int. Rep. 86-003. Bittanti, S., P. Colaneri and G. Guardabassi (1984): H-Controllability and Observability of Linear Periodic Systems. SIAM J. Control and Optimization 22, 889-893. Bittanti, S., G. Fronza and G. Guardabassi (1973): Periodic Control: A Frequency Domain Approach. IEEE Trans. Automatic Control 18, 33-38. Bittanti, S., G. Guardabassi, C. Maffezzoni and L. Silverman (1978): Periodic Systems: Controllability and the Matrix Riccati Equation. SIAM J. Control and Optimization 16, 37-40. Bittanti, S. and D.B. Hernandez (1986): The Simple Pendulum as an Illustrative Example of the Periodic Control Problem. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-010.
Bolzern, P. (1986): Criteria for Reachability, Controllability and Stabilizability of Discrete-time Linear Periodic Systems. V Polish-English Seminar on Real-Time Process Control, Warsaw, Poland.
178
Bolzern, P. and P. Colaneri (1986): Existence and Uniqueness Conditions for the Periodic Solutions of the Discretetime Periodic Lyapunov Equation. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-011. Brockett, R.W. (1970): Finite Dimensional Linear Systems. Wiley and Sons.
J.
Brunovsky, P. (1969): Controllability and Linear Closed loop Controls in Linear Periodic Systems. J. Differential E~uations 6, 296-313. Chen, C.T. (1970): Introduction to Linear System Theory. Rinehart and Winston.
Holt,
Colonius, F. (1985ja): Optimality for Periodic Control of Functional Differential Systems. J. Mathematical Analysi s and Applications (to appear). Colonius, F. (1985,b): The High Frequency Pi-Criterion for Retarded Systems. IEEE Trans. Automatic Control 11, 1045-1048. DaPrato, G. (1984): Periodic Solutions of Infinite Dimensional Riccati Equations. Rendiconti Accademia Nazionale dei Lincei, (to appear). Dorato, P. and A.H. Levis (1971): Optimal Linear Regulators: the Discrete-time Case. IEEE Trans. Automatic Control 6, 613-620. Dorato, P. and H.K. Knudsen (1979): Periodic Optimization with A p p l i c a t i o n s to Solar Energy Control. Automatica 15, 673-679 Gardner, W.A. and D.E. Franks (1975): Characterization of Cyclo-stationary Random Processes. IEEE Trans. Information T h e o r y 21, 1-24. Gilbert, E.G. (1977): Optimal Periodic Control: A General Theory of Necessary Conditions. SIAM J. Control and Optimization 15, 717-746.
17g
Gilbert, E.G. and D.T. Lyons (1981): The Improvement of Aircraft Specific Range by Periodic Control. AIAA Guidance and Control Conference, Albuquerque. Graham, A. (1981): Kronecker Products and Matric Calculus with Applications. Ellis Horwood Limited, Chichester. Grasselli, O.M. (1984): A Canonical Decomposition of Linear Periodic Discrete-time Systems. International J. Control 40, 201-214. Guardabassi, G. (1971): Optimal Steady State Versus Periodic Control. Ricerche di Automatica 2, 240-252. Guardabassi, G. (1976): The Optimal Periodic Control Problem. Journal A 17, 75-83. Halanay, A.(1966): New York.
Differential Equations.
Academic Press,
Hautus, M.L.J. (1969): Controllability and Observability Conditions of Linear Autonomous Systems. Inda@. Math. 443-448.
72
Hernandez, V. and L. Jodar (1985): Boundary Problems and Periodic Riccati Equations. IEEE Trans. Automatic Control 11, 1131-1133. Hewer, G.A. (1975): Periodicity, Detectability and the Matrix Riccati Equation. SIAM J. Control 13, 1235-1251. Horn, F.J.M. and R.C. Lin (1967): Periodic Processes: A Variational Approach. Ind. Eng. Chem. Process Des. Dev. I, 21-30.
6,
Horn, F.J.M. and J.E. Bailey (1968): An Application of the Theorem of Relaxed Control to the Problem of Increasing Catalyst Selectivity. J. Optimization Theory and Applications 2, 441-449. Houlihan, S.C., E.M. Cliff and H.J. Kelley (1982): Study of Chattering Cruise, Journal Aircraft 19, 119-124.
180
Johnson, C.D. (1966): Invariant Hyperplanes for Linear Dynamical Systems. IEEE Trans. Automatio Control 11, 113-116. Kabamba, P.T. (1985): Monodromy Eigenvalue Assignment in Linear Periodic Systems. 24th Conference on Decision and Control, Ft. Lauderdale, U.S.A., 177, 178. Kalman, R.E. (1969): Theory of Regulators for Linear Plants. In: Kalman R.E., P.L. Falb and M.A. Arbib: Topics in Mathematical S y s t e m Theor[. McGraw-Hill Co., New York. Kano, H. and T. Nishimura (1979): Periodic Solutions of Matrix Riccati Equations with Detectability and Stabilizability. !nternational J. Control 29, 471-487. Kern, G. (1980): Linear Closed-loop Control in Linear Periodic Systems with Application to Spin-stabilized Bodies. International J. Control 31, 905-916. Khandelwal, D.N., J. Sharma and L.M. Ray (1979): Optimal Periodic Maintenance of a Machine. IEEE Trans. Automatic Control 24, 513. Khargonekar, P.P., K. Poolla and A. Tannenbaum (1985): Robust Control of Linear Time-invariant Plants Using Periodic Compensation. IEEE Trans. Automatic Control 11, 1088-1098. Kono, M. (1980): Eigenvalue Assignment in Linear Periodic Discrete-time Systems. International J. Control I, 149-158. Maffezzoni, C. (1974): Hamilton-Jacobi Theory for Periodic Control Problems. J. Optimization Theory and Applications 14, 21-29. Markus, L. (1973): Optimal Control of Limit Cycles or what Control Theory can do to Cure a Heart Attack or to Cause One. Symposium on Ordinary Differential Equations, Minneapolis, Minnesota (1972). W.A. Harris, Y. Sibuya, eds., SpringerVerlag, Berlin. Matsubara, M., N. Nishimura, N. Watanabe and K. Onogi (1981): Periodic Control Theory and Applications. Research Reports of Automatic Control Laboratory Vol. 28, Faculty of Engineering, Nagoya University.
181
Matsubara, M. and K. Onogi (1978): Stabilized Suboptimal Periodic Control of a Chemical Reactor. IEEE Trans. Automatic Control 23, 1005-1008. Meyer, R.A. and C.S. Burrus (1976): Design and Implementation of Multirate Digital Filters. IEEE Trans. Acoustics, Speech and Signal Processing 1, 53-58. Nistri, P. (1983): Periodic Control Problems for a Class of Nonlinear Periodic Differential Systems. Nonlinear Analysis, Theory, Methods and A p p l i c a t i o n s 7, 79-90. Noldus, E. (1975): A Survey of Optimal Periodic Control of Continuous Systems. Journal A 16, 11-16. Onogi, K. and M. Matsubara (1980): Structure Analysis of Periodically Controlled Chemical Processes. Chem. En~. Sci. 34, 1 0 0 9 - 1 0 1 9 . Popov, V.M. Berlin.
(1973): Hyperstability of control systems.
Springer,
Sch~dlich, K., U. Hoffmann and H. Hofmann (1983): Periodical Operation of Chemical Processes and Evaluation of Conversion Improvements. Chemical En~ineerin~ Science 38, 1375-1384. Shayman, M.A. (1984): Inertia Theorems for the Periodic Lyapunov Equation and Periodic Riccati Equation. Systems and Control Letters 4, 27-32. Shayman, M.A. (1985): On the Phase Portrait of the Matrix Riccati Equation Arising from the Periodic Control Problem. SIAM. J. Control and Optimization 23, 717-751. Sincic, D. and J.E. Bailey (1978): Optimal Periodic Control of Variable Time-delay Systems. International J. Control 27, 547-555. Speyer, J.L. (1973): On the Fuel Optimality of Cruise, J. Aircraft 10, 763-764.
182
Speyer, J.L. (1976): Non-optimality of Steady-state Cruise for Aircraft. AIAA Journal 14, 1604-1610. Speyer, J.L. and R.T. Evans (1984): A Second Variational Theory of Optimal Periodic Processes. IEEE Trans. Automatic Control 29, 138-148. Valko~ P. and G.A. Almasy (1982): Periodic Optimization of Hammerstein-type Systems. Automatica 18, 245-148. Watanabe, N., Y. Nishimura and M. Matsubara (1976): Singular Control Test for Optimal Periodic Control Problems. IEEE Trans. Automatic Control 21, 609-610. Watanabe, N., K. Onogi and M. M a t s u b a r a (1981): Periodic Control of Continuous Stirred Tank Reactors - I, Chem. En@. Sci. 36, 809-818, II ibid. 37, 745-752. Watanabe, N., H. Kurimoto and M. M a t s u b a r a (1984): Periodic Control of Continuous Stirred Tank Reactors - I I I , Case of multistage reactors. Chem. En 9. Sci. 39, 31-36. Wimmer, H.K. and A.D. Ziebur (1975): Remarks on Inertia Theorems for Matrices. Czechoslovak Mathematical Journal 25, 556-561. Wong, E. and B. Hajek (1985): Stochastic Processes En~ineerin ~. Springer-Verlag, Berlin.
in
Wonham, W.M. (1968): On a M a t r i x Riccati Equation for Stochastic Control. SIAM Journal Control 6, 681-698. Yakubovich, V.A. and V.M. Starzhinskii (1975): Linear Differential Equations with Periodic Coefficients. J. Wiley, New York.
Chapter
6
Numerical
Problems
Daniel
in L i n e a r
Boley
System
and S e r g i o
Theory
Bittanti
I. I n t r o d u c t i o n The a n a l y s i s tation
of m u l t i v a r i a b l e
of m a t r i x
problems,
rank and e i g e n v a l u e s . computer
In this work,
numerical
We b e g i n
more
we o u t l i n e
problems
with
f r o m linear
linear
2. R e v i e w
used
methods Value
of these
for c o m p u t e r examples
calculations,
of w h e r e
decompositions problems,
eigenvalue
Decompositions). concepts
to m a t r i x
to be u s e d on a
theory.
and r e l a t e d for
systems
the c o m p u -
for h a n d c o m p u t a t i o n s .
and give
in s y s t e m
equations
and S i n g u l a r
few a p p l i c a t i o n s riant
arise
involves
the t e c h n i q u e s
a r e v i e w of the s i m p l e r
of linear
systems
a few t e c h n i q u e s
w h y they are useful,
sophisticated
(Schur
ranging
In general,
are n o t the same as those
illustrate
systems
control
used then
and rank
go on to
computation
We c o n c l u d e
to the a n a l y s i s
to solve
with
a
of t i m e - i n v a
systems.
of S i m p l e r
Computational
Methods
2.1 - LU d e c o m ~ o s ! ~ ! 2 n
We b e g i n that
by i n t r o d u c i n g
the c o n c e p t
is, we try to reduce
simpler we w o u l d
matrices, like
a matrix
from w h i c h
to calculate.
of a m a t r i x
decomposition;
A to the p r o d u c t
we can c a l c u l a t e
of several
whatever
it is
-
184
The
first
a matrix
example
triangular, Gaussian
is the LU d e c o m p o s i t i o n ,
A into the p r o d u c t respectively.
Elimination.
A = LU, w h e r e
in w h i c h w e d e c o m p o s e L, U are
This decomposition
To see this,
lower,
is c o m p u t e d
it is b e s t
upper using
to use an
example.
Consider
A =
[31i] 1 2
1 1
In G a u s s i a n 1 to rows the
Elimination,
2 and
3. T h i s
(1)
the f i r s t
step
is to add m u l t i p l e s
can be a c c o m p l i s h e d
by m u l t i p l y i n g
of r o w A on
l e f t by the m a t r i x
Im
0 1 0
M1 =
21 m31
where,
in this
the r e s u l t
is
0 0 1 case,
m21 = -1/3,
m31 = -2/3
are the m u l t i p l i e r .
Then,
185
MIA
=
2/3
-
(2)
I/3
Then,
in the n e x t
s t e p we
apply
a matrix
I°l
M2 =
I
m32
where row
m 3 2 = -I/2.
2 to r o w
3 U = M2MIA
This
3. T h e
has
=
both
m32 = -I/2
times
2/3
~
(3)
/
sides
by MI I M21
(det M i = I),
so we m a y
to o b t a i n
L = M11 M21 .
By c o m p a r i n g
(I) w i t h
(2) a n d
zero
then
to set to z e r o
M 2 is u s e d
column
2.
In t h e
general
matrices,
one
following
case,
all
the
(3), n o t e
M I is to set to
the
of a d d i n g
is
t h a t M I, M 2 are n o n s i n g u l a r
multiply
where
result
I
0 We n o t e
the e f f e c t
final
where
for e a c h
subdiagonal all
the
Matrix
the
elements
action
Mk,
of m a t r i x
of c o l u m n
subdiagonal
A is n x n, we m u s t
column.
structure:
that
elements
apply
k = 1,2,..,
n-1
I of A; of
"M"
n-l,
has
186
I
"
0 I
Mk =
mk+ I ,k " .
° mn, k 0
T k - t h column Coefficients matrix
The
mj,k,
k + 2 .... , n, w i l l
be
referred
to as the
multipliers.
last
i t e m we n e e d
decomposition that
j = k+1,
the
inverse
as M k w i t h
to c o m p l e t e
is: w h a t of Mk,
is
"L"? To
the description see w h a t
as c a n be e a s i l y
the m u l t i p l i e r s
in t h e k - t h
of the L U
f o r m h a s L, we n o t e
verified,
column
is the
same
negated:
-I I
• Mkl
0 I
= O-mk+]
,k " .
-mn, k
0
T k-th
Secondly, the
result
diagonal.
we n o t e
column
that when
is s i m p l y In o u r
to
fill
we
form
in all
3 × 3 example,
we
the p r o d u c t
L =M11
the m u l t i p l i e r s
have:
"'" Mn
below
the
I'
187
L =
=
Here,
one
can
multipliers
I
I
I_2/3
0
I/2
see
from
the d i a g o n a l we h a v e
/3
that all
the Mk,
in t h e i r
all
"I" 's.
the n e t
=
change
Hence,
the
1/2
is to c o l l e c t
sign
and place
position.
(i,j)
-m.. = the m u l t i p l i e r u s e d on r o w 13 the s t a g e w h e n c o l u m n j is b e i n g
I
I_2/3
effect
corresponding
/3
position
j when
added
all
the
them below
On t h e d i a g o n a l , of L,
i > j, is
to r o w
i during
zeroed:
i 0 L =
(-Iniji" -
In c o n c l u s i o n , L is l o w e r
we have
triangular
found with
a decomposition
"1" 's
on
for A : A = LU,
the d i a g o n a l
where
a n d U is u p p e r
triangular.
What
can we
do w i t h
this?
We give
2 uses
of t h i s
decomposition:
A. Solve Linear Equations By u s i n g
LUx
A=LU,
the
system
Ax =b
is e q u i v a l e n t
to
= b.
If we
(4)
call
y = Ux,
we t h e n
Ly = b
Ux
= y.
reduce
equation
(4) to two t r i a n g u l a r
systems:
188
Triangular
systems
are
"back-substitution". of G a u s s i a n
... M 1 b
Then
solution
Ux
= L-Ib
solved
using
note
that
also
to t h e v e c t o r
= L-lAx
x can be
if w e
= Ux
found
by
the process
apply b,
known
as
the
row operation
the
result
will
be
= y.
solving
= y.
It t u r n s o u t operations work
that also
to s o l v e
except new
We
Elimination
Mn_lMn_ 2
the
easily
that
right
the extra
to b to o b t a i n
L y = b f o r y.
by using
hand
work
involved
in a p p l y i n g
y is e x a c t l y
The
two
L y = b, w e m a y
s i d e b, w i t h o u t
schemes solve
the
same
as t h e
are exactly
directly
repeating
the row
equivalent,
A x = b, w i t h
a
the decomposition.
B. Computing Determinant Since
the determinant
product
of a p r o d u c t
of t h e d e t e r m i n a n t s ,
determinant
uij
known
fact
is t h e
the product
Using
2.2
that of
(i,j)
element
the diagonal
of d e t A
- Orthoqonal
2.2.1
is e q u a l write
to t h e
the
... Unn) ,
of U.
the determinant
the LU decomposition
definition
immediately
of A as:
d e t A = d e t L • d e t U = I " (u 1 1 u 2 2
where
of m a t r i c e s
we may
Here,
we have
of a t r i a n g u l a r
used
matrix
the w e l l is s i m p l y
elements.
is m u c h
(Stewart,
faster
than
using
the
1973).
Decomposition
- Q R Decomposition
In t h e L U d e c o m p o s i t i o n ,
we
have
applied
matrices
that
are not
189
orthogonal; Since
they
2 vectors
formations,
do not p r e s e r v e m a y be m a d e
we w o u l d
transformations,
like
which
almost
to see w h a t
Q is o r t h o g o n a l
ortho-normal,
i.e.
=
{0
or angles
parallel
do p r e s e r v e
A n x n matrix
qlqJ
lengths
of vectors.
by such t r a n s -
one can do w i t h
lengths
orthogonal
and angles.
if its c o l u m n s
qi are m u t u a l l y
, if i ~ j ,ifi=j,
or,
in m a t r i x
notation,
Q'Q = I.
In this
section,
we
show h o w one m a y t r i a n g u l a r i z e
using o n l y o r t h o g o n a l and angles.
transformations,
thereby
preserving
T h e n we show w h y the use of o r t h o g o n a l
is p a r t i c u l a r l y
useful
Consider
the matrix:
A =
I
by g i v i n g
an e x a m p l e
a matrix lengths
decompositions
of its use.
I
We w o u l d where
like to r e d u c e
"?" d e n o t e s
determined.
We
transformation
QI =
where,
the
a nonzero
first
element
see h o w to do this of the
column
D
whose
using
1
value
~ ' to
~
0
is to be
a "rotation",
i.e.
a
form
c 0
by o r t h o g o n a l i t y , c 2 + s 2 = I.
(5)
0~',
190
We w i l l
u s e QI
to z e r o
element
a21,
i.e.
is[!ic0[!I The
second
line yields:
-S"
3
I
+
which,
c"
=
0,
together
with
C = 3//1-0
,
Having
QI'
defined
s a m e way,
Q2
=
to
zero
we
find
c 2 + s 2 = I, y i e l d s
s = 1/I/To .
we a p p l y the
it to A to o b t a i n
elements
c and
s of the
Q1A.
Then,
in the
rotation
(6)
1 0
a third
a31.
To c o m p l e t e
rotation
Q3 to
the zero
triangular a32,
decomposition,
obtaining
we n e e d
finally
R = Q3Q2QIA. In g e n e r a l , zero
all
in the n x n case,
the
R = QrQr_1
subdiagonal
... Q 1 A
= an u p p e r
Let
Q =
(QrQr_1
..- Q1 )-I.
By o r t h o g o n a l i t y :
we n e e d
elements
r = n(n+1)/2
of A:
triangular
matrix.
rotations
to
191
,
!
!
Q = Q1Q2
We h a v e
"'"
Qr
now
the
"
so c a l l e d
triangularization
A
=
2.2.2
ortho
A rotation seen from problem
only
upper triangular
the
of a v e c t o r ,
we may
look
as c a n be
at a r e p r e s e n t a t i v e
2 × 2 rotation:
also
x and e I =
2 components
(6). H e n c e ,
2-space.
represent
Consider
:
R
affects
(5) a n d
in t h e
Consider
X
•
Geometric Interpretation of a Rotation
-
We m a y
or o r t h o g o n a l
of A:
Q
arbitrary
QR decomposition
c = cos
a vector
D
%
, s = sin
~ for
some angle
x of R 2, a n d d e n o t e
by
8 the
~
angle
9':
Ixl fcos:l =
IIx II [ s i n
"
Hence,
[ c
Qx
=
llxll
cos
that
+ s
sin
i cc
os(O-~
=
-S COS
This means
0
Qx
8 + C sin
is t h e v e c t o r
llxll
Lsin(8-%~
x rotated
by angle
.
#.
between
192
2.2.3
Decomposition by Householder Transformations
QR
As we h a v e
seen
in 2 . 2 . 1 ,
be o b t a i n e d
by m u l t i p l y i n g
alternative
way
holder
vector
is t h a t
as o n e
in g e n e r a l , component one
likes
can
only.
rotations.
This
n-1 The
reflection")
implies
Householder
c a n be
of a s i n g l e
that,
introduced
to t r a n s f o r m
by m a k i n g
"House-
trans-
components
to o b t a i n
a vector
so-called
to z e r o
out
can
is an
of a
transformation,
transformations, transformation
There
of t h e s e
as m a n y
it is p o s s i b l e
Householder
We w a n t
on t h e
advantage
zero out
by m e a n s
QR d e c o m p o s i t i o n
rotations.
Q, b a s e d
The main
by a r o t a t i o n
can use
follows.
one
Q of the
n(n+l)/2
of c o m p u t i n g
transformations".
formations
matrix
whereas,
one
the Q R d e c o m p o s i t i o n , instead
of n ( n + l ) / 2
(or " e l e m e n t a r y reference
to R 2 as
x to a v e c t o r
v along
e.g.,
axis
e I = [I
such
03
that
vector
,
n v [[ =
around
[[ x [[
x + v
This
(see Fig.
c a n be a c h i e v e d 1).
[ Z=X+ V
eI
Figure 1.
v
by r e f l e c t i n g
the
193
We
go
v=
through
the
following
steps
(note
that
we know
x
and
ttxti e1~
- Axis
of
reflection
z = x+v
= FXl +
II x il,x~]'
I..
The
corresponding
- Project
x
onto
unit
the
axis
vector
of
is
z/fir If.
reflection
to
obtain:
Z Z Ix
tl z II2 - Find
the
difference
between
x
and
its
projection:
a
(or
equivalently
b=a-x.
- Reflect v
x around
= x+2b=x+
its
2(a-x)
projection = 2a-x
2 -z z- ' x = -
= -x+
z) :
(I- 2
ilzli The
zz'
) X.
ilzil2
matrix ZZ
!
P =I-2----
,
IIzll2
gives
the
Householder
transformation.
Since
v = - Px,
we
can
conclude
"reflect"
a vector
In n - s p a c e , to
zero
that,
out
we at
can once
x
by into
pick as
such any
the
many
a linear axis
of
transformation, the
desired
target
components
of
one
can
space.
direction a vector
as
c we
so
as
like.
194
2.2.4
Solving Least Squares Problems Using Orthogonal Decompositions
Let A e R mxn, m_>n, problem
b e R m and x e R n. The L i n e a r L e a s t Squares
is the p r o b l e m of f i n d i n g the f o l l o w i n g m i n i m u m
min I 1 ~ - b
II
X
The a l g o r i t h m of 2.2.1 may be a p p l i e d to r e c t a n g u l a r just as well as square ones. decomposition
matrices
In this case we see that the Q R
of the r e c t a n g u l a r
w h e r e Q ~ R m x m is o r t h o g o n a l ,
m a t r i x A is:
R ~ R nxn is upper t r i a n g u l a r
0 e R (m-n)xn is a b l o c k of zero elements.
As Q is o r t h o g o n a l ,
it does not change the norm.
llAx-bI' = I'Q' (Ax-b)'l
= " [RTx-c
Lol
II,
where
c = Q'b. Partitioning
C
=
this v e c t o r c o n f o r m a l l y
r
IC21 we have
IIAx - b II = II [RXc2-c I II
with [~I,
Hence:
and
195
In o r d e r
to m i n i m i z e
this norm,
we
set
x = R-Ic I .
(9)
Thus,
min x
II A x - b
To f i n d
II : II c 2 ll-
the o p t i m u m
value
of x g i v e n
by
(9) we h a v e
to s o l v e
system
Rx = c I .
In this of
(10)
respect,
N Ax-b
A'Ax
II l e a d s
to the
noticing
celebrated
a direct
normal
minimization
equations:
of
(11)
(8),
system
(11)
is e q u i v a l e n t
to
= R'C I.
(12)
It is i m p o r t a n t
to o b s e r v e
to s o l v i n g
(12)
as o n e o b t a i n s
discussion
of
Systems,
this,
see e.g.
give an e x a m p l e
Suppose
only
(Lawson
0
solving fewer
use
system
errors.
the e r r o r
and Hanson,
(10)
For
a complete
analysis
1974).
is p r e f e r a b l e
of L i n e a r
However,
we m a y
of the p r o b l e m .
on a computer
7 significant
[ 11 10 -4
that
one must
we a r e w o r k i n g
we c a r r y
A =
that
-- A'b.
In v i e w
R'RX
i t is w o r t h
0
10 -4
digits.
with
precision
Consider
10 -7
i.e
matrix
(13)
196
A has r a n k
A'A
I
=
2, but
if we f o r m
1 + 10-8
1
1
1+10 -8
in o u r c o m p u t e r
J
w e w i l l loose
the p a r t
which
has r a n k
3. S p e c i a l
1. So, we loose r a n k
Forms
Used
in N u m e r i c a l
The LU a n d Q R d e c o m p o s i t i o n s s t e p in the c o m p u t a t i o n in the f o l l o w i n g . flavour
useful
things
(ii)
used
a given
Determinant:
(iii)
R a n k of A
(iv)
Nullspace
is w e l l
known,
section
I
also
are u s e d
as the basic
to be i n t r o d u c e d
serves
of f i n d i n g
square matrix
to give
the
A,
a n u m b e r of i.e.
1
det A
of A: ker A
(v) I m a g e o r R a n g e of A:
- The J o r d a n
above
in the r e s t of this work.
on the p r o b l e m s
(i) E i g e n v a l u e s :
Linear Algebra-Why
discussed
The p r e v i o u s
about
information.
of the d e c o m p o s i t i o n s
of the a p p r o a c h
We n o w c o n c e n t r a t e
As
instead
El ii
A'A =
3.1
"10 -8" a n d o b t a i n
Canonical
there
c o l s p A.
Form
are m a n y
classical
decompositions
for
197
matrices,
the m o s t
common
A is then d e c o m p o s e d
A = PJP
where
-I
into the p r o d u c t
P is n o n s i n g u l a r , Form
eigenvalues
of A
(product
(dimension
corresponding
( elements
so-called
Matrix
minus
corresponding
of J),
and the
of J o r d a n
the c o l u m n s
bloks
of P c o r r e s p o n
the n u l l s p a c e
to n o n z e r o
us the
the
of J)
the n u m b e r
of J g e n e r a t e
Jordan
form can tell
elements
to h i = 0). F u r t h e r m o r e ,
of A,
rows of J g e n e r a t e
of A.
So, the J o r d a n
Canonical
(i) - (v). However,
almost
This
of the d i a g o n a l
of the m a t r i x
the c o l u m n s
the range
1959).
of the d i a g o n a l
ding to all zero c o l u m n s
advised.
decomposition.
of 3 m a t r i c e s :
and J is in the
(Gantmacher,
determinant
whereas
the J o r d a n
,
Canonical
rank
being
The m a t r i x
singular),
separated(almost separated", This calls
especially
Conditionin~
conditioning
finite w o r d
Form
li are p o o r l y
I. are "well l is an i l l - p o s e d problem.
consideration
computer.
Because
can be r e p r e s e n t e d
in the computer.
In the t r e a t m e n t used
are p e r t u r b e d
(i.e.
of a P r o b l e m
model m o s t
often
items is ill-
if the
is an i m p o r t a n t
numbers
form
of
use of a d i g i t a l
length,
even
this
ill-conditioned
if the e i g e n v a l u e s
the J o r d a n
for the q u e s t i o n
one c o n s i d e r s t h e
one to find out all
computations,
P can be e x t r e m e l y
coincidingl.But
finding
3.2 - N u m e r i c a l
Numerical
Form enables
for n u m e r i c a l
only
whenever of the approximately
of such a p p r o x i m a t i o n s ,
is to c o n s i d e r
what happens
the
if the n u m b e r s
slightly.
The e i g e n v a l u e s take for e x a m p l e
can be e x t r e m e l y the
3 x 3 matrix
sensitive
to p e r t u r b a t i o n s :
198
-64
82
144
-178
-46
-778
962
248
A =
which
21]
has e i g e n v a l u e s
(14)
1, 2, 3. If we add a small p e r t u r b a t i o n
EE,
where
01 E = 10 -4
-0.6
I. I
-6
_-0.1
0.3
-1
is a rank one m a t r i x perturbed shows
that
!
matrix
of n o r m ~10 -3, then,
A + EE w i l l
problems
have
m a y occur
complex
for any e > 0.45, eigenvalues!
e v e n on small
the
This
innocuous-looking
matrices!
Even with
7-16 d i g i t s
In the f o l l o w i n g perturbations destroy order
of accuracy,
20 x 20 e x a m p l e s ,
in the
all d i g i t s
of m a g n i t u d e
9 th place
of a c c u r a c y will
this
is a r e l e v a n t
due to W i l k i n s o n
in some e l e m e n t
will
in the e i g e n v a l u e s -
be w r o n g
problem:
(1965), completely even
the
in some cases:
["20 20
20 19
20 18
20 17
0 20 16
20 (15)
A = 5
0
20 4
20 3
20 2
20 I
We point out that to obtain the solution to a problem, one must apply an algorithm, and the word "stability" refers to the algorithm, while "conditioning" refers to the problem itself. If a problem is ill-conditioned or ill-posed ("close to singularity"), then slight perturbations to the data (such as those occurring in the computer from the finite word length) can destroy the accuracy of the result, as was seen in example (15); in this case there is no algorithm that can give satisfactory results, and the conditioning of the specific problem limits the accuracy obtainable by any algorithm. If the problem is well-conditioned but the method for its solution is unstable, then we may hope to improve the answer by a better algorithm; in the opposite case, when the problem is badly conditioned, we cannot. In case a problem is not so badly conditioned, it is desirable to use methods which introduce as few errors as possible, or at least provide bounds on such errors; such methods are collectively called stable. The usual measure of stability is the so-called "Backward Stability", defined in the next section. In short, the conditioning of the problem limits the best accuracy we can hope for, and the stability properties of the algorithm determine whether we may believe the answer (to the accuracy allowed by the conditioning).

4. Schur Decomposition

The decomposition in Linear Algebra which most closely corresponds to the Jordan decomposition, and which is numerically useful, is the so-called Schur decomposition. This decomposition is given by

    A = Q R Q*,          (16)

where, denoting by "*" the conjugate transpose, Q is a (possibly complex) orthogonal matrix (Q*Q = I) and R is a (possibly complex) upper triangular matrix.
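The decomposition is available in standard numerical libraries; a minimal sketch using scipy.linalg.schur on the matrix of (14):

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[ -64.0,   82.0,  21.0],
                  [ 144.0, -178.0, -46.0],
                  [-771.0,  962.0, 248.0]])

    # Complex Schur form A = Q R Q*, R upper triangular, Q*Q = I.
    R, Q = schur(A, output='complex')

    print(np.allclose(Q.conj().T @ Q, np.eye(3)))  # True: Q is unitary
    print(np.allclose(Q @ R @ Q.conj().T, A))      # True: A = Q R Q*
    print(np.diag(R))                              # eigenvalues on the diagonal of R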
We first note the mathematical properties of the Schur decomposition: since (16) is a similarity transformation, the eigenvalues of A are those of R, i.e. the diagonal elements of R. The determinant follows as well:

    det A = det Q det R det Q* = det R = product of the diagonal elements of R.

Hence the Schur form yields items (i) and (ii) of Sect. 3. Moreover, since Q is orthogonal, its columns are of the same size and well-conditioned: they are never "almost parallel", so Q is never close to singularity.

What about the numerical properties? The relevance of (16) lies in the distinction, made in Sect. 3.2, between the ill-conditioning of a problem and the instability of an algorithm. Though no algorithm can hope to remove the ill-conditioning of the original problem, we can make sure that the sensitivity is not made worse: perturbations in A result in perturbations of the same size in R, so the elements of R are never more sensitive to perturbations than the eigenvalues of the original problem. Moreover, it can be shown (Wilkinson, 1965) that the algorithm computing the Schur decomposition, based on unitary transformations, is backward stable.

This calls for the complete statement of the stability of an algorithm mentioned in Sect. 3.2. Two properties must be distinguished: (a) the algorithm has produced an answer R close to the exact R that we would obtain with exact arithmetic starting with problem A ("forward stability"); (b) the algorithm has produced an answer R which is exact for a problem close to the original problem A ("backward stability": another way of saying this is that "the result is the exact result of a slightly perturbed starting problem"). Here, "close" means on the order of the precision of the computer used. In general (a) is not true for any algorithm/form: as is seen from examples (14) and (15), slight changes to A can in general destroy all digits of accuracy in the eigenvalues, so the ill-conditioning of the problem itself prevents forward stability.
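Backward stability can be observed empirically: the residual ||A - QRQ*|| / ||A|| stays near machine precision even when the eigenvalues themselves cannot be trusted. A sketch, hiding the triangular structure of (15) behind an exact orthogonal similarity:

    import numpy as np
    from scipy.linalg import schur

    n = 20
    A = np.diag(np.arange(n, 0, -1.0)) + np.diag(np.full(n - 1, 20.0), k=1)

    # An orthogonal similarity leaves the exact eigenvalues at 20, 19, ..., 1.
    rng = np.random.default_rng(0)
    Qs, _ = np.linalg.qr(rng.standard_normal((n, n)))
    B = Qs @ A @ Qs.T

    R, Q = schur(B, output='complex')

    # Backward stability: the computed factors reproduce B almost exactly ...
    print(np.linalg.norm(Q @ R @ Q.conj().T - B) / np.linalg.norm(B))  # ~ machine precision

    # ... yet the computed eigenvalues can be far from 20, 19, ..., 1,
    # because the eigenvalue problem itself is ill-conditioned.
    print(np.sort_complex(np.diag(R)))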
By using exact orthogonal transformations in exact arithmetic, we make no errors and hence obtain the exact eigenvalues; if our method makes slight errors, we can still hope to satisfy (b) and obtain eigenvalues that are exact for a matrix close to A (backward stability). This is the best one can expect, since examples (14) and (15) show that (a) cannot hold.

Unfortunately, the Schur form cannot be used reliably for items (iii)-(v). This is illustrated as follows. If A is already upper triangular, the Schur form is obtained with R = A, Q = I. Consider the n x n matrix:
    A = [ 1  -1  ...  -1  -1 ]
        [     1  ...  -1  -1 ]
        [          .       : ]          (17)
        [              1  -1 ]
        [ e                1 ]

(ones on the diagonal, -1 in every entry above the diagonal, zeros below, and e in the (n,1) corner). If e = 0, all the eigenvalues are exactly 1, and det A = 1. Nevertheless, the rank is close to deficient: "perturbing" A by choosing e = -1/2^(n-2) will make A singular! This can be verified by forming Ax with

    x = (2^-1, 2^-2, 2^-3, ..., 2^-(n-1), 2^-(n-1))^T,

which gives the zero vector for this value of e. Therefore, we need a better way to find the rank, det, etc. of a matrix. Such a way is provided by the Singular Value Decomposition.
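The contrast between the determinant and the actual distance to singularity is easy to reproduce (the helper tri_matrix below is ours, not from the text):

    import numpy as np

    def tri_matrix(n, eps=0.0):
        # Matrix (17): ones on the diagonal, -1 above it, eps in the (n,1) corner.
        A = np.eye(n) - np.triu(np.ones((n, n)), k=1)
        A[-1, 0] = eps
        return A

    n = 30
    A = tri_matrix(n)
    print(np.linalg.det(A))                        # 1.0: the determinant looks harmless
    print(np.linalg.svd(A, compute_uv=False)[-1])  # < 2**-(n-2): nearly singular

    # The tiny corner perturbation from the text makes A exactly singular:
    Ap = tri_matrix(n, eps=-1.0 / 2 ** (n - 2))
    x = 2.0 ** -np.minimum(np.arange(1, n + 1), n - 1)
    print(np.max(np.abs(Ap @ x)))                  # ~ 0: x lies in the kernel of Ap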
5. Singular Value Decomposition - Condition Number of a Matrix

In this section, we introduce another decomposition relevant to items such as the rank: the Singular Value Decomposition (S.V.D.). We also introduce the concept of Condition Number of a matrix and try to explain its significance.
The S.V.D. of an m x p matrix A is

    A = U S V*,          (18)

where U and V are orthogonal matrices and S is an m x p real diagonal matrix with nonnegative diagonal elements. Letting n = min(m,p),

    S = diag(s1, s2, ..., sn),   with s1 >= s2 >= ... >= sn >= 0.

In what follows, we usually assume square matrices.
We will use the matrix 2-norm:

    ||A|| = max over ||x|| = 1 of ||Ax||,          (19)

where ||x|| is the usual vector 2-norm. In this norm we have several properties; the most relevant is the fact that orthogonal matrices do not affect the 2-norm of a matrix or vector (Stewart, 1973):

    ||Qx|| = ||x||,          (20a)
    ||QA|| = ||A||.          (20b)

Notice also the obvious fact that ||Q|| = 1. What kind of information can one glean from the S.V.D.? The most obvious is the norm of A: from (18), (20) and (19),

    ||A|| = ||S|| = s1.

If A is a nonsingular square matrix, its inverse is given by A^-1 = V S^-1 U*. Moreover,
    ||A^-1|| = ||S^-1|| = 1/sn.

Given a square nonsingular matrix A, the number

    k(A) = ||A|| ||A^-1||          (21)

is said to be the condition number of A. Obviously,

    k(A) = s1/sn.

The condition number happens to be a very useful quantity in estimating the sensitivity of such items as the rank, determinant, inverse, or solution to a set of linear equations, with respect to perturbations to the matrix A. It also gives the "distance to singularity". To see this, we start with the classical origin of k(A). Consider the problem of solving the matrix equation Ax = b. When using a computer, we obtain an approximate result x', which we consider exact for the slightly perturbed problem Ax' = b'. Note that we have perturbed only b, not A. We have

    Ax = b,   Ax' = b'.

Subtracting, A(x' - x) = b' - b. Multiplying both sides by A^-1 and taking norms, one obtains

    ||x' - x|| <= ||A^-1|| ||b' - b||,          (22)

i.e. the (error in answer) is bounded by the (error in right hand side) magnified by ||A^-1||. However, to estimate the number of digits of accuracy in x, we need the "relative error"
    ||x' - x|| / ||x||.

If the relative error is e.g. ~ 10^-6, then we have about 6 digits of accuracy, regardless of the size of x. To obtain an estimate of the relative error, we use the relation Ax = b to obtain ||A|| ||x|| >= ||b||, i.e.

    ||A|| / ||b|| >= 1 / ||x||.          (23)

From (22) and (23), and definition (21),

    ||x' - x|| / ||x|| <= k(A) ||b' - b|| / ||b||,          (24)

i.e. the relative error in the answer is bounded by the relative error in the right hand side, magnified by the condition number. Notice that k(A) >= 1 for any A. It is instructive to consider the condition numbers of some particular matrices.
Indeed, k(A) = ||A|| ||A^-1|| >= ||A A^-1|| = ||I|| = 1 for any A, and k(Q) = 1 if Q is orthogonal. At the other extreme, let T6 be the 6 x 6 Hilbert matrix, the elements of which are t_ij = (1+i+j)^-1; then k(T6) ~ 10^6.
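The magnification predicted by (24) is easy to observe on T6; a minimal NumPy sketch (the perturbation db is an arbitrary small vector of our choosing):

    import numpy as np

    # The matrix T6 from the text: t_ij = 1/(1+i+j), i, j = 1, ..., 6.
    i, j = np.indices((6, 6)) + 1
    T6 = 1.0 / (1.0 + i + j)

    print(np.linalg.cond(T6))   # large: T6 is badly conditioned

    x = np.ones(6)
    b = T6 @ x
    rng = np.random.default_rng(1)
    db = 1e-10 * rng.standard_normal(6)

    x_tilde = np.linalg.solve(T6, b + db)
    print(np.linalg.norm(db) / np.linalg.norm(b))           # relative error in b: ~ 1e-10
    print(np.linalg.norm(x_tilde - x) / np.linalg.norm(x))  # magnified, up to k(T6) times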
The S.V.D. and k(A) can be used to find the "distance to singularity" of a matrix A. We note that

    rank A = rank S = number of nonzero si's.

In particular, if A is nonsingular, si > 0 for all i. Now, suppose that A is nonsingular and E is a perturbation of A such that A + E is singular. Then, using the S.V.D. (18), we may write

    U*(A + E)V = U*AV + U*EV = S + F,   where F = U*EV.

Because U, V are orthogonal, ||F|| = ||E|| exactly. Thus, perturbations E to A correspond exactly to perturbations F to S. We can define the "distance to singularity" as the norm of the smallest E such that A + E is singular:

    d_sing = min over A + E singular of ||E||.          (25)

In view of the discussion above, this corresponds to

    d_sing = min over S + F singular of ||F||.          (26)
Since S = diag(s1, s2, ..., sn), with s1 >= s2 >= ... >= sn > 0, it is clear that the F which achieves the minimum in (26) is

    F = diag(0, 0, ..., 0, -sn),

so that ||F|| = sn. Hence, labelling the columns of U, V as

    U = [u1 u2 ... un],   V = [v1 v2 ... vn],

the E achieving the minimum in (25) is

    E = U F V* = -sn un vn*.

Notice also that ||E|| = sn. Hence, the distance to singularity is

    d_sing = sn.          (27)
Consequently, the "relative distance to singularity" (relative to the size of the starting matrix A) is

    d_sing / ||A|| = sn / s1 = 1 / k(A).          (28)

So, k(A) not only indicates the difficulty one can expect in solving Ax = b, but also shows how close A is to a singular matrix, relative to the size of the matrix itself.
In other words, k(A) gives the sensitivity of A to perturbations in A. It should be noted, in passing, that quantities such as ||A|| and k(A) can be defined in any matrix norm corresponding to a vector norm; the results involving the S.V.D., however, hold only in the 2-norm.

We have seen how the S.V.D. can be used to obtain such quantities as ||A||, rank A and k(A). What about items (iv) and (v) of Sect. 3, ker A and colsp A? Let us start with a singular n x n matrix A. In this case the singular values satisfy

    s1 >= s2 >= ... >= sr > s(r+1) = s(r+2) = ... = sn = 0,

so that rank A = r. Relations analogous to (24), (27) and (28) remain valid: the quantity (sr/s1) gives the relative size of the smallest perturbation needed to reduce the rank of the matrix further. We write the S.V.D. of A as
"a 1
v;]
o2
0 or
v Eu12[iI 0°
0
A = U
0
0 0
where
Z I = diag(ol,
o2
,---,
or )
(29)
is r x r and nonsingular, and U = [U1 U2], V = [V1 V2] have been partitioned conformally to S. Thus

    A = U1 S1 V1*,

where S1 is r x r nonsingular and U1, V1 are n x r orthonormal matrices. Hence U1 is an orthonormal basis for colsp A, and V2 is an orthonormal basis for ker A; i.e. ker A is the orthogonal complement of the space spanned by the columns of V1.
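A sketch extracting the two bases from the partitioned S.V.D. of a rank-deficient matrix (the zero tolerance used for r anticipates the discussion that follows):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 6))  # 6 x 6, rank 4

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10 * s[0]))  # numerical rank

    U1 = U[:, :r]      # orthonormal basis for colsp A
    V2 = Vt[r:, :].T   # orthonormal basis for ker A

    print(r)                       # 4
    print(np.max(np.abs(A @ V2)))  # ~ 0: the columns of V2 are annihilated by A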
In practice, when using (29), one will frequently encounter the situation

    s1 >= s2 >= ... >= sn >= 0,

where some of the singular values are "small", i.e. of the order of the machine precision. The problem is to decide at what point the singular values are to be considered zero, i.e. "how small is small". It is best to illustrate the problem here. Assume 6 digits of accuracy, so that a small singular value is of the order of 10^-6, and suppose the singular values (scaled so that s1 = 1) are e.g.

    10^0   10^-1   10^-2   10^-3   10^-4   10^-8   10^-9   10^-10

(only the orders of magnitude are shown). Any number < 10^-6 should be considered zero: the exact rank of the matrix is 8, but we would consider the effective rank to be 5. If, instead, we had the values

    10^0   10^-2   10^-4   10^-6   10^-8   10^-10   10^-12   10^-14   10^-16   10^-18

or

    10^0   10^-1   10^-2   10^-3   10^-4   10^-5   10^-6   10^-7   10^-8   10^-9,
then there is no obvious gap, so the effective rank depends almost entirely on the choice of the zero tolerance! Unfortunately, this situation can and does arise in practice, especially in large matrices. This is not a defect of the S.V.D.; if this situation arises, it really means that a small (negligible) perturbation to A will reduce the rank, and another perturbation, even only slightly larger, can reduce the rank further. For a full discussion of the S.V.D. and the rank, see (Klema and Laub, 1980).

We close this section with a few examples of situations involving the S.V.D. We just point out the idea, leaving the details to the reader.
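The role of the zero tolerance is easy to reproduce; a sketch comparing a gapped and a smoothly decaying set of singular values:

    import numpy as np

    # Singular values with a clear gap (first list above) ...
    s_gap = np.array([1e0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-8, 1e-9, 1e-10])
    # ... and with a smooth decay and no gap (third list above).
    s_smooth = 10.0 ** -np.arange(10.0)

    # Effective rank = number of singular values above the zero tolerance:
    for tol in (1e-5, 1e-6, 1e-7):
        print(tol, int(np.sum(s_gap > tol)), int(np.sum(s_smooth > tol)))

With the gap, the count stays at 5 for every tolerance inside the gap; without it, each choice of tolerance gives a different "rank".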
(a) Least Squares (Lawson & Hanson, 1974)

This was the classical origin of the S.V.D. It is useful for solving problem (7) in cases where A is rank deficient. In such cases, we cannot use the QR decomposition, because the matrix R in (8) is singular and hence we cannot solve (9). If we resort instead to the S.V.D. of A, we have, in view of (20a),

    ||Ax - b|| = ||U S V* x - b|| = ||S V* x - U* b|| = ||S y - c||,

where y = V* x and c = U* b. The result is that we have converted the original problem to a diagonal problem involving S. We partition the above as in (29) to obtain

    ||S y - c|| = || [ S1  0 ] [ y1 ]  -  [ c1 ] ||
                  || [ 0   0 ] [ y2 ]     [ c2 ] ||

and we minimize this norm by setting y1 = S1^-1 c1. In the solution, we find that y2 is free!
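A sketch of this recipe; choosing the free block y2 = 0 yields the minimum-norm solution, which is also what the S.V.D.-based library solver np.linalg.lstsq returns:

    import numpy as np

    rng = np.random.default_rng(4)
    # An 8 x 5 matrix of rank 3: the R of a QR factorization would be singular.
    A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))
    b = rng.standard_normal(8)

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10 * s[0]))

    c = U.T @ b
    y = np.zeros(5)
    y[:r] = c[:r] / s[:r]   # y1 = S1^-1 c1; the free block y2 is set to 0
    x = Vt.T @ y            # x = V y

    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True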
(b) Pseudo Inverse

The pseudo inverse A+ of A (Lawson and Hanson, 1974) can be expressed as

    A+ = V [ S1^-1  0 ] U*,
           [ 0      0 ]

where we have used the partitioning (29).
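A quick numerical check of this formula (np.linalg.pinv is itself built on the S.V.D.):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))  # 6 x 5, rank 3

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10 * s[0]))

    # A+ = V [S1^-1 0; 0 0] U*, with the partitioning (29):
    Sigma_plus = np.zeros((5, 6))
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])
    A_plus = Vt.T @ Sigma_plus @ U.T

    print(np.allclose(A_plus, np.linalg.pinv(A)))  # True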
(c) Relation to A*A

We point out the relationship of the S.V.D. to the classical idea of eigenvalues. If A = U S V*, then

    A*A = V S U* U S V* = V S^2 V*,   S^2 = diag(s1^2, ..., sn^2).

Hence the singular values si are just the square roots of the eigenvalues of A*A, and the columns of V are the eigenvectors of A*A. In fact, using the fact that A*A is symmetric positive semi-definite for any A, one can carry this argument in reverse to prove the existence of the S.V.D. (Lawson and Hanson, 1974). However, for the actual computation of the S.V.D. or the solution of the least squares problem (a), it is almost always more accurate to perform the computation directly on A, without forming A*A: the results obtained using A*A are often less accurate, especially as regards the small singular values, even though they may be obtained faster, as can be seen from example (13). Incidentally, the S.V.D. of a 2 x 2 matrix (or of a 2 x n matrix for any n) can be computed by hand with ease.
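The loss of accuracy from forming A*A can be demonstrated with a matrix whose smallest singular value lies below the square root of the machine precision; a sketch:

    import numpy as np

    rng = np.random.default_rng(6)
    Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    # Constructed singular values: 1, 1e-3, 1e-6, 1e-9.
    A = Q1 @ np.diag([1.0, 1e-3, 1e-6, 1e-9]) @ Q2

    # Direct S.V.D.: all four values are recovered to high accuracy.
    print(np.linalg.svd(A, compute_uv=False))

    # Via A*A: s^2 = 1e-18 drowns below eps * s1^2 ~ 2e-16, so the smallest
    # singular value comes back with no correct digits.
    print(np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)))[::-1])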
6. Applications of the Previous Results to Linear Systems

We finally take a look at how the previous material on numerical stability applies to linear systems. Consider either the discrete-time case or the continuous-time system

    x'(t) = F x(t) + G u(t)
    y(t)  = H x(t)          (31)

where F is n x n, G is n x m and H is p x n. The Markov parameters are

    w(i) = H F^(i-1) G.

In discrete time they are the values of the impulse response, whereas in continuous time they are the derivatives of the impulse response at the origin. We shall consider two problems for a given system, the first being that of determining from the w(.) whether the system is reachable. A system defined by (31) is reachable (Kalman, Falb and Arbib, 1969) if, for each state x and each t, there is a T