Lecture Notes in Control and Information Sciences
Edited by M. Thoma and A. Wyner
86 Time Series and Linear Systems
Edited by S. Bittanti
Springer-Verlag Berlin Heidelberg New York London Paris Tokyo
Series Editors M. Thoma • A. Wyner
Advisory Board L. D. Davisson • A. G. J. MacFarlane • H. Kwakernaak • J. L. Massey • Ya. Z. Tsypkin • A. J. Viterbi
Editor Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano (Italy)
ISBN 3-540-16903-2 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-16903-2 Springer-Verlag New York Berlin Heidelberg
Library of Congress Cataloging in Publication Data. Time series and linear systems. (Lecture notes in control and information sciences; 86) Includes bibliographies. 1. Time-series analysis. 2. Linear systems. I. Bittanti, Sergio. II. Series. QA280.T558 1986 519.5'5 86-20244 ISBN 0-387-16903-2 (U.S.)
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich.
© Springer-Verlag Berlin, Heidelberg 1986
Printed in Germany
Offsetprinting: Color-Druck, G. Baucke, Berlin
Binding: B. Helm, Berlin
2161/3020-543210
PREFACE
Over the past five years, a stream of specialists of different backgrounds, including Time Series Analysis, Statistics, Econometrics, Numerical Analysis, System Theory and Control, has visited the Politecnico di Milano (Italy). They contributed to setting up a workshop activity at the Politecnico on the methodology of modelling and identification of time series and linear systems. This book is a partial report of such an activity.

The underlying point of view is a system-theoretic one. The chapters are extended introductions to advanced topics of current research interest in the field, overviewing the main train of ideas. They also indicate directions of research.

The book is organized as follows.

The first chapter is devoted to the problem of finding a suitable linear model for the data at hand. Among other things, the use of criteria such as AIC or BIC is discussed. The problem of approximating an infinite impulse response by a rational transfer function is interpreted as the problem of approximating a Hankel matrix by one of finite rank.

Linear systems where all the variables are subject to errors are studied in the second chapter, and the problem of determining the transfer function from the observed second moments is critically discussed. The motivation is that prejudicial assumptions on causality can then be avoided.

A new class of dynamic models for time series, based strictly on the classical Factor Analysis approach, is proposed in the third chapter. These models are related to the errors-in-variables models of the second chapter.

The fourth chapter is devoted to the notion of stochastic complexity and to the so called Minimum Description Length principle. A model is judged by the number of binary digits with which it permits the observed data to be encoded. The best model is the one that leads to the shortest description.

Chapter 5 deals with linear systems with periodically time-varying coefficients. These models are useful in the analysis of seasonal time series. The structural properties of these systems (reachability, stabilizability and so on) are considered, and periodic stochastic systems are also treated.

Some numerical problems in linear system theory are touched upon in the sixth chapter. An extensive introduction to the LU, QR, Schur and Singular Value Decompositions is provided. Then, the problem of computing the reachability subspace of a time-invariant system is studied.

The last chapter is devoted to an overview of some recent trends in Econometrics.

The volume can be used as a textbook for monographic courses. It can also serve as a reference book, providing researchers with an introduction to the main trends and perspectives in the field.

The editor expresses his sincere acknowledgment to the authors for their most valuable contributions, and for their care and patience in the preparation of the manuscripts. The support of the Centro di Teoria dei Sistemi of the National Research Council (C.N.R.) and that of the Ministry of Education (M.P.I.) is gratefully acknowledged.

Sergio Bittanti
ABSTRACTS

Chapter 1
TIME SERIES AND STOCHASTIC MODELS by E.J. Hannan

The basic concept of this paper is that of a linear system wherein an output, $y(t)$, of $q$ components, is related to an input, $u(t)$, of $p$ components, via a relation

$$y(t) = \sum_{i=0}^{\infty} W_i\, e(t-i) + \sum_{i=1}^{\infty} L_i\, u(t-i),$$

wherein the $e(t)$ are the linear prediction errors. The methods of the paper are substantially valid when the system is truly linear and the prediction is optimal, but may prove useful over a much wider range. To bring the problem back to reasonable proportions, $W(z) = \sum W_i z^{-i}$ and $L(z) = \sum L_i z^{-i}$ are approximated by rational functions. A brief discussion of some basic theory relating to such an approximation is given. It is necessary to choose the "order" of the approximant, i.e. effectively the rank of the Hankel matrix formed from the $W_i$, $L_i$. It is also necessary to choose the maximum lags, $h$, in the ARMAX model

$$\sum_{i=0}^{h} A_i\, y(t-i) = \sum_{i=1}^{h} B_i\, u(t-i) + \sum_{i=0}^{h} C_i\, e(t-i),$$

to which the rational approximant corresponds. Various algorithms that are basic in time series analysis are described. They are then used to effect a solution to the problem of finding a suitable approximant. The main algorithm is a Gauss-Newton iteration in which the order is redetermined at each iteration. Finally, on-line recursive implementations of the algorithm are presented for the case where $y(t)$ is scalar.
Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS by M. Deistler

Linear errors-in-variables (EV) systems, where all observed variables are subject to errors, are considered. The statistical analysis of such systems turns out to be significantly more complicated than that of conventional systems (e.g. ARMAX systems). A good part of these complications arises from the fact that, in the EV case, the transfer function of the system is, in general, not uniquely determined from the second moments of the observations.

The paper is organized as follows. In section 2 some well known results concerning the static case are restated. In sections 3-5 the information about the transfer function contained in the (ensemble) second moments of the observations is analysed. In section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal, and with the question of whether causality can be detected from the second moments of the observations. In section 5 conditions for identifiability from the second moments are derived. Section 6 deals with identifiability using information coming from moments of order greater than two.
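To make the non-uniqueness concrete, here is a minimal sketch of the simplest static case, with two observed scalars $x = \xi + \delta$, $y = b\xi + \epsilon$ and $\xi, \delta, \epsilon$ mutually uncorrelated. The interval it computes is the classical two-variable bound usually attributed to Frisch. We assume it is representative of the "well known results" of section 2; it is not quoted from that section. The helper names `ev_slope_bounds` and `implied_variances` are hypothetical.

```python
import numpy as np

def ev_slope_bounds(x, y):
    """Slopes b compatible with the sample second moments of (x, y) in the static EV model
    x = xi + delta, y = b*xi + eps, with xi, delta, eps mutually uncorrelated.
    Assumes the sample covariance s_xy is positive."""
    S = np.cov(x, y)
    sxx, sxy, syy = S[0, 0], S[0, 1], S[1, 1]
    return sxy / sxx, syy / sxy, (sxx, sxy, syy)

def implied_variances(b, moments):
    """var(xi), var(delta), var(eps) that reproduce the three moments exactly for slope b."""
    sxx, sxy, syy = moments
    var_xi = sxy / b
    return var_xi, sxx - var_xi, syy - b * sxy

rng = np.random.default_rng(0)
xi = rng.standard_normal(20000)
x = xi + 0.5 * rng.standard_normal(20000)    # observed "input", measured with error
y = 2.0 * xi + rng.standard_normal(20000)    # observed "output", true slope 2
lo, hi, moments = ev_slope_bounds(x, y)
# Every b in [lo, hi] yields non-negative noise variances, so b is not identified.
```

The lower end is the regression of $y$ on $x$ and the upper end is the reciprocal of the regression of $x$ on $y$. Outside $[lo, hi]$ some implied noise variance becomes negative.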
Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES by G. Picci and S. Pinzoni

A new class of dynamic models for stationary time series is presented. They are a natural generalization of the well-known Factor Analysis Models widely used in Statistics and Psychometrics. It is shown that (linear) Dynamic Factor Analysis Models of multivariate time series provide a simple mathematical structure. To some extent this structure subsumes (and clarifies) the Errors-In-Variables Models discussed in the recent literature. These models avoid the unjustified assumptions of a priori causality required by conventional identification schemes, as for example ARMAX models.
Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES by J. Rissanen

This chapter presents, in a tutorial manner, the basic ideas behind the recently developed Minimum Description Length principle for statistical modeling. Briefly, the stochastic complexity of the observed data, relative to a class of models, is defined to be the shortest code length with which the data can be encoded, as represented by the number of binary digits. It involves both the number of digits needed to encode the data and the number needed to encode the parameters of the model. Hence, the model with which the observed data can be encoded with the fewest digits is judged to be the best. Depending on how the prior knowledge about the parameters is taken advantage of, two kinds of stochastic complexity can be defined: the predictive and the nonpredictive. For large samples their values tend to the same value. The stochastic complexity also gives a tight lower bound for the errors with which the data can be predicted. As applications, we describe the calculation of the stochastic complexity and of the associated optimal estimates for the Gaussian ARMA class and for models with multiple input/output structures. We also discuss the consistency of the parameter estimates, and the feasibility of the procedures is demonstrated by simulations.
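The sketch below gives a rough illustration of the two kinds of code length by selecting the order of a scalar autoregression in two ways. It assumes Gaussian errors and least squares fits. The first criterion is a nonpredictive two-part length, $\tfrac{N}{2}\log\hat\sigma_k^2 + \tfrac{k}{2}\log N$, which is the familiar large-sample form. The second stands in for the predictive one: it accumulates the squared errors of one-step predictions made with parameters fitted to the past only. Both are standard simplifications, not the exact expressions derived in the chapter. All function names are hypothetical.

```python
import numpy as np

def lagged(y, k, start):
    """Rows [y(t-1), ..., y(t-k)] and targets y(t) for t = start, ..., len(y)-1 (0-based)."""
    if k == 0:
        return np.zeros((len(y) - start, 0)), y[start:]
    X = np.column_stack([y[start - j: len(y) - j] for j in range(1, k + 1)])
    return X, y[start:]

def two_part_length(y, k, kmax):
    """Nonpredictive sketch: (N/2) log(residual variance) + (k/2) log N on a common sample."""
    X, z = lagged(y, k, kmax)
    resid = z - X @ np.linalg.lstsq(X, z, rcond=None)[0] if k else z
    N = len(z)
    return 0.5 * N * np.log(np.mean(resid ** 2)) + 0.5 * k * np.log(N)

def predictive_sq_errors(y, k, kmax, burn=50):
    """Predictive sketch: sum of squared one-step errors, parameters fitted to the past only."""
    X, z = lagged(y, k, kmax)
    total = 0.0
    for i in range(burn, len(z)):
        pred = X[i] @ np.linalg.lstsq(X[:i], z[:i], rcond=None)[0] if k else 0.0
        total += (z[i] - pred) ** 2
    return total

rng = np.random.default_rng(4)
T, kmax = 1000, 6
w = rng.standard_normal(T + 200)
y = np.zeros(T + 200)
for t in range(2, T + 200):                     # an AR(2) process, for illustration
    y[t] = 0.75 * y[t - 1] - 0.5 * y[t - 2] + w[t]
y = y[200:]                                     # drop the start-up transient
k_np = min(range(kmax + 1), key=lambda k: two_part_length(y, k, kmax))
k_p = min(range(kmax + 1), key=lambda k: predictive_sq_errors(y, k, kmax))
```

The predictive version needs no explicit penalty term. Extra parameters are paid for automatically, because early predictions are made with poorly estimated coefficients.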
Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS by S. Bittanti
The main results concerning the structural properties of linear periodic systems are reviewed. Both continuous-time and discrete-time systems are dealt with. By a comparison with time-invariant systems, five structural properties are discussed. Three of them are basic properties concerning the reachability and controllability subspaces. The fourth one concerns the length of the time interval required to perform the reachability and controllability transition. The modal (spectral) characterizations are presented as the fifth property. The extended structural properties (i.e. stabilizability and detectability) are also dealt with. Finally, periodic stochastic systems are considered. The existence of a cyclostationary solution is investigated by analyzing the appropriate periodic Lyapunov equation.
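Consider the discrete-time case $x(t+1) = A(t)x(t) + w(t)$, where $A(t)$ and $Q(t) = \mathrm{Var}\,w(t)$ have period $N$. The periodic Lyapunov equation is then $P(t+1) = A(t)P(t)A(t)' + Q(t)$, and a cyclostationary solution is a periodic $P(t)$. The sketch below computes it, assuming that the monodromy matrix $A(N-1)\cdots A(0)$ has all its eigenvalues inside the unit circle. It is an illustration only, not the chapter's own algorithm, and `periodic_lyapunov` is a hypothetical name.

```python
import numpy as np

def periodic_lyapunov(A_list, Q_list):
    """Periodic solution P(0), ..., P(N-1) of P(t+1) = A(t) P(t) A(t)' + Q(t), period N.
    Assumes the monodromy matrix A(N-1)...A(0) has all eigenvalues inside the unit circle."""
    n = A_list[0].shape[0]
    Psi, R = np.eye(n), np.zeros((n, n))
    for A, Q in zip(A_list, Q_list):          # propagate one period starting from P = 0
        Psi = A @ Psi
        R = A @ R @ A.T + Q
    # P(0) = Psi P(0) Psi' + R; with row-major vec, vec(Psi X Psi') = kron(Psi, Psi) vec(X)
    P0 = np.linalg.solve(np.eye(n * n) - np.kron(Psi, Psi), R.ravel()).reshape(n, n)
    Ps = [P0]
    for A, Q in zip(A_list[:-1], Q_list[:-1]):
        Ps.append(A @ Ps[-1] @ A.T + Q)
    return Ps, Psi

# Hypothetical period-2 example: each A(t) is unstable, but the monodromy matrix is stable.
A_list = [np.diag([2.0, 0.1]), np.diag([0.1, 2.0])]
Q_list = [np.eye(2), np.diag([1.0, 0.5])]
Ps, Psi = periodic_lyapunov(A_list, Q_list)
```

The example shows why the periodic analysis is needed. Stability of the individual $A(t)$ is neither necessary nor sufficient; what matters is the product over one period.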
Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY by D. Boley and S. Bittanti
We discuss some numerical aspects of linear system theory. We start by showing the numerical algorithms used to solve systems of linear equations and non-degenerate least squares problems. We then move on to an introduction to more sophisticated matrix decompositions, used to solve more sophisticated problems, and introduce the concept of backward error analysis (Wilkinson, 1965). Among the decompositions we introduce are:

Name | Form | Used to obtain
LU (Gaussian elimination) | A = LU | solution of linear equations; determinant
QR (orthogonal triangularization) | A = QR | solution of (linear, nondegenerate) least squares problems; solution of linear equations without need to pivot
Schur | A = QRQ' | eigenvalues/eigenvectors
Singular Value Decomposition (S.V.D.) | A = PΣQ' | singular values; rank; distance to singularity; 2-norm of matrix; 2-norm condition number

where P, Q denote orthogonal matrices, U, R upper triangular matrices, L lower triangular matrices, and Σ a non-negative diagonal matrix.
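The rows of the table can be illustrated with NumPy, which we assume is available. The sketch writes out Gaussian elimination with partial pivoting by hand, computes least squares through `numpy.linalg.qr`, and obtains the SVD-derived quantities through `numpy.linalg.svd`. The Schur form is omitted because NumPy does not provide it (SciPy's `scipy.linalg.schur` does). `lu_partial_pivot` is a hypothetical helper. It returns a row permutation, so it computes $A = LU$ up to row interchanges.

```python
import numpy as np

def lu_partial_pivot(M):
    """Gaussian elimination with partial pivoting: M[perm] = L @ U, plus the permutation sign."""
    U = np.array(M, dtype=float)
    n = U.shape[0]
    L, perm, sign = np.eye(n), np.arange(n), 1.0
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(U[k:, k])))   # pivot: largest entry in column k
        if p != k:
            U[[k, p]] = U[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]          # keep earlier multipliers with their rows
            perm[[k, p]] = perm[[p, k]]
            sign = -sign
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return perm, L, U, sign

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
perm, L, U, sign = lu_partial_pivot(M)
det_M = sign * np.prod(np.diag(U))             # determinant from the LU factors

A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
Q, R = np.linalg.qr(A)                         # reduced QR: Q is 6x4, R is 4x4 upper triangular
x_ls = np.linalg.solve(R, Q.T @ b)             # least squares solution from R x = Q'b

P, sig, Qt = np.linalg.svd(A)                  # A = P diag(sig) Q', sig in decreasing order
two_norm = sig[0]
dist_to_singularity = sig[-1]                  # 2-norm distance to the nearest rank-deficient matrix
cond_2 = sig[0] / sig[-1]
rank = int(np.sum(sig > sig[0] * max(A.shape) * np.finfo(float).eps))
```

The QR route never forms $A'A$, whose condition number is the square of that of $A$. This is one reason orthogonal decompositions are preferred for least squares.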
In the last section we discuss some numerical aspects of linear system theory. The attention is focused on the problem of computing the controllable subspace of a time-invariant linear system. It is shown how some classical methods lead to numerical problems. Some recent results are then given in terms of these matrix decompositions, with bounds on the errors expressed through the condition numbers of the matrices involved.
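A minimal sketch of an SVD-based computation follows, assuming NumPy. It forms $[B, AB, \ldots, A^{n-1}B]$ and takes as basis the left singular vectors whose singular values exceed a relative tolerance. Forming powers of $A$ explicitly is itself one of the classical steps that can lose accuracy for larger $n$. The example therefore only illustrates how singular values are used for rank decisions; it is not one of the chapter's recommended methods. `controllable_subspace` and the tolerance `rtol` are hypothetical.

```python
import numpy as np

def controllable_subspace(A, B, rtol=1e-10):
    """Orthonormal basis of the range of [B, AB, ..., A^{n-1}B], rank decided by singular values."""
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    U, s, _ = np.linalg.svd(np.hstack(blocks))
    r = int(np.sum(s > rtol * s[0]))               # numerical rank with a relative tolerance
    return U[:, :r]

# Hypothetical example: a 2-dimensional controllable part and one uncontrollable mode,
# hidden by a random orthogonal change of coordinates T.
rng = np.random.default_rng(6)
T, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Ab = np.array([[0.5, 1.0, 0.2],
               [0.0, 0.3, 0.1],
               [0.0, 0.0, 0.9]])
Bb = np.array([[0.0], [1.0], [0.0]])
A, B = T @ Ab @ T.T, T @ Bb
V = controllable_subspace(A, B)
```

In exact arithmetic the controllability matrix here has rank 2. In floating point its third singular value is of the order of rounding error, and the tolerance is what turns "tiny" into "zero".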
Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS by M. McAleer and M. Deistler

In this paper we discuss some of the main recent developments in econometrics. These include, in particular, methods for specification search, specification testing and diagnostic checking; empirical macroeconomic modelling and forecasting; and some models associated with microeconomics.
AUTHORS

Sergio Bittanti, Dipartimento di Elettronica, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 MILANO, ITALY

Daniel Boley, Department of Computer Science, University of Minnesota, 136 Lind Hall, 207 Church Street S.E., MINNEAPOLIS, Minnesota 55455, U.S.A.

Manfred Deistler, Institut für Ökonometrie und Operations Research, Technische Universität Wien, Argentinierstrasse 8/119, A-1040 WIEN, AUSTRIA

Edward J. Hannan, Department of Statistics, Mathematical Sciences Bldg., The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA

Michael J. McAleer, Department of Statistics, The Faculties, The Australian National University, GPO Box 4, CANBERRA, ACT 2601, AUSTRALIA

Giorgio Picci, Istituto di Elettrotecnica ed Elettronica, Università di Padova, Via Gradenigo 6/A, 35131 PADOVA, ITALY

Stefano Pinzoni, LADSEB-CNR, Corso Stati Uniti 4, 35020 PADOVA, ITALY

Jorma Rissanen, IBM-RES, 650 Harry Road, SAN JOSE, CA 95193, U.S.A.
TABLE OF CONTENTS

Chapter 1
TIME SERIES AND STOCHASTIC MODELS by E.J. Hannan
1. Introduction
2. Some Basic Algorithms
3. Approximation Criteria
4. Rational Transfer Function Approximation
5. A Gauss-Newton Procedure
6. Some Theoretical Considerations
References

Chapter 2
LINEAR ERRORS-IN-VARIABLES SYSTEMS by M. Deistler
1. Introduction
2. The Static Case
3. Second Moments and Dynamic Models: the General Case
4. Causality
5. Conditions for Identifiability from the Second Moments of the Observations
6. Identifiability from High Order Moments
References

Chapter 3
A NEW CLASS OF DYNAMIC MODELS FOR STATIONARY TIME SERIES by G. Picci and S. Pinzoni
1. Introduction
2. Dynamic Factor Analysis Models
3. Stochastic Realization
4. Causality
References

Chapter 4
PREDICTIVE AND NONPREDICTIVE MINIMUM DESCRIPTION LENGTH PRINCIPLES by J. Rissanen
1. Introduction
2. Coding and Prediction
3. ARMA Estimation and Prediction
4. Vector Time Series Models
References

Chapter 5
DETERMINISTIC AND STOCHASTIC LINEAR PERIODIC SYSTEMS by S. Bittanti
1. Introduction
2. Structural Properties of Continuous-time Periodic Systems
2.1 Continuous-time Linear Periodic Systems
2.2 Structural Properties
2.3 Grammian Matrices
2.4 Five Structural Properties of Time-invariant Systems
2.5 Five Structural Properties of Continuous-time Periodic Systems
3. Structural Properties of Discrete-time Periodic Systems
3.1 Discrete-time Linear Periodic Systems
3.2 Structural Properties
3.3 Grammian Matrices
3.4 Five Structural Properties of Discrete-time Periodic Systems
4. Kalman Canonical Decomposition
5. Extended Structural Properties
6. Stochastic Linear Periodic Systems
References

Chapter 6
NUMERICAL PROBLEMS IN LINEAR SYSTEM THEORY by D. Boley and S. Bittanti
1. Introduction
2. Review of Simpler Computational Methods
2.1 LU Decomposition
2.2 Orthogonal Decomposition
2.2.1 QR Decomposition
2.2.2 Geometric Interpretation of a Rotation
2.2.3 QR Decomposition by Householder Decompositions
2.2.4 Solving Least Squares Problems Using Orthogonal Decompositions
3. Special Forms Used in Numerical Linear Algebra - Why
3.1 The Jordan Canonical Form
3.2 Numerical Conditioning of a Problem
4. Schur Decomposition
5. Singular Value Decomposition - Condition Number of a Matrix
6. Applications of Previous to Linear Systems
References

Chapter 7
SOME RECENT DEVELOPMENTS IN ECONOMETRICS by M. McAleer and M. Deistler
1. Introduction
2. Specification and Quality Control of a Model
2.1 Model Specification
2.2 Tight and Loose Specifications
2.3 Principles for Testing
2.4 Diagnostic Testing
2.4.1 Serial Correlation
2.4.2 Heteroscedasticity
2.4.3 Exogeneity
2.4.4 Functional Form
2.4.5 Parameter Constancy
2.4.6 Non-nested Alternatives
3. Macroeconomic Modelling and Forecasting
4. Microeconometrics
References
Chapter 1
Time Series and Stochastic Models
E.J. Hannan

1. Introduction
This chapter will be concerned with procedures for analysing data, $y(t)$, $t = 1, 2, \ldots, T$, where $y(t)$ is a vector of $q$ components. It can be thought of as the output of some system to which the input is $u(t)$, an observed vector of $p$ components. The situation held in mind is one where no very precise information about the system is available. The description will therefore be on the basis of models of such generality that, experience suggests, they will suffice for a good explanation. This will be further discussed below. These models will always be stochastic.

Let us begin by considering a stationary stochastic process, $y(t)$, with finite mean and $E\{y_j(t)^2\} < \infty$, $j = 1, \ldots, q$, where $y_j(t)$ is the $j$'th component of $y(t)$. It will be assumed that $y(t)$ is ergodic, since only one history, or realization, of the process is ever seen. It is also reasonable to require that it be purely non-deterministic, so that there is no influence on $y(t)$ from the indefinitely far past. More precisely, any such influence can only be through effects such as diurnal or seasonal movements. Such effects, and quantities such as the mean, could first be removed by regression, at least in part. It is costless to assume that this has already been done, so that $y(t)$ is the residual from such an adjustment. For example, all calculations will be with the mean corrected quantities $y(t) - \bar y$, where $\bar y = T^{-1}\sum_{t=1}^{T} y(t)$. This makes notation simpler. Any such stationary, purely non-deterministic process can be analysed, at least in part, through its spectrum, $f(\omega)$. This is a $q \times q$ matrix valued function satisfying

$$f(\omega)^{*} = f(\omega) = f(-\omega)', \qquad \Gamma(t) = E\{y(s)\,y(s+t)'\} = \int_{-\pi}^{\pi} e^{it\omega} f(\omega)\, d\omega.$$
We shall not discuss Fourier methods in any detail because the main emphases of this paper are different. Here "finite parameter" models are emphasised, in contrast to Fourier (non-parametric) methods, in which the generality is reduced to manageable proportions, essentially, by smoothness requirements for $f(\omega)$. These finite parameter models have been especially emphasised in econometrics and systems engineering. They are called ARMAX, an acronym for autoregressive moving-average with exogenous input. (Here "exogenous" means input.)

For $y(t)$ stationary and purely non-deterministic,

$$y(t) = \sum_{i=0}^{\infty} W_i\, e(t-i), \qquad W_0 = I_q, \qquad \sum_{i=0}^{\infty} \|W_i\|^2 < \infty, \qquad E\{e(s)e(t)'\} = \delta_{st}\,\Omega. \qquad (1.1)$$

Here the $e(t)$ are the linear innovations of $y(t)$, i.e. $e(t) = y(t) - \hat y(t|t-1)$, where $\hat y(t|t-1)$ is the best linear predictor of $y(t)$ from $y(t-1), y(t-2), \ldots$. There is an extensive theory, due to Kolmogoroff, Wiener and others, concerning the construction of $W(z)$ from $f(\omega)$, but this will not be important here. Put

$$W(z) = \sum_{i=0}^{\infty} W_i z^{-i}, \qquad |z| > 1.$$

Then $W(z)$ is analytic, and $\det W(z) \neq 0$, for $|z| > 1$. However, we always assume that $\det W(z)$ has no zeros on $|z| = 1$ either, since such zeros cause considerable problems, algorithmically. Then

$$f(\omega) = \frac{1}{2\pi}\, W(e^{-i\omega})\, \Omega\, W(e^{-i\omega})^{*}. \qquad (1.2)$$

This decomposition is unique: there is no other such decomposition with $W(z)$ having the properties stated above.

To take account of $u(t)$ we generalise (1.1) to

$$y(t) = \sum_{i=0}^{\infty} W_i\, e(t-i) + \sum_{i=1}^{\infty} L_i\, u(t-i), \qquad (1.3)$$

and put $L(z) = \sum_{i=1}^{\infty} L_i z^{-i}$. The essential restriction is that the relation is causal, so that there is no influence on $y(t)$ from $u(s)$, $s > t$. However, (1.1) and (1.3) are too general to serve as a basis for a worthwhile statistical analysis. To introduce a further specialisation, consider the infinite (Hankel) matrix

$$H = \begin{bmatrix} W_1 & L_1 & W_2 & L_2 & W_3 & L_3 & \cdots \\ W_2 & L_2 & W_3 & L_3 & W_4 & L_4 & \cdots \\ W_3 & L_3 & W_4 & L_4 & W_5 & L_5 & \cdots \\ \vdots & & & & & & \ddots \end{bmatrix}.$$

Here $[W_j \;\, L_j]$ will be called a "block", of $q$ rows and $q + p$ columns. The importance of $H$ can be seen from the almost obvious fact that the best $j$ step ahead predictor, ignoring $u(t+1), \ldots, u(t+j-1)$, is

$$\hat y(t+j|t) = \sum_{i=0}^{\infty} W_{i+j}\, e(t-i) + \sum_{i=0}^{\infty} L_{i+j}\, u(t-i), \qquad j \geq 1.$$

Thus the blocks of the $j$'th row of blocks of $H$ are the coefficients in that prediction. The importance of $H$ will be made evident in other ways in section 4.

Let $n$ be the rank of $H$. Let $H_0$ be a set of $n$ rows of $H$ that span all of the rows of $H$, so that any row can be linearly represented in terms of them. Of course $n$ would be infinite in general. Call $H_1$ the first block of $q$ rows of $H$. Since the rows of $H_0$ span those of $H$, $H_1 = HH_0$, where now $H$ denotes a suitable $q \times n$ matrix (not the Hankel matrix). Put $H_0 = [K \;\, L \;\, H_2]$, where $K$ and $L$ comprise, respectively, the first $q$ and the next $p$ columns of $H_0$. The rows of $H_2$ are again rows of the Hankel matrix, since deleting the first block from a row of blocks gives the next row of blocks. Hence $H_2 = FH_0$ for a suitable $F$. Put

$$\xi(t) = [e(t)' \;\; u(t)' \;\; e(t-1)' \;\; u(t-1)' \;\; \cdots]', \qquad x(t) = H_0\, \xi(t-1).$$

Then, from (1.3),

$$y(t) = H_1\, \xi(t-1) + e(t), \qquad x(t+1) = H_0\, \xi(t) = Ke(t) + Lu(t) + H_2\, \xi(t-1), \qquad (1.4)$$

and, using $H_1 = HH_0$ and $H_2 = FH_0$,

$$y(t) = Hx(t) + e(t), \qquad x(t+1) = Fx(t) + Lu(t) + Ke(t). \qquad (1.5)$$

This is the state space representation in prediction error form. Given that it is minimal, i.e. that $F$ is $n \times n$, its lack of uniqueness is entirely due to the lack of uniqueness in the choice of $H_0$. That can be made unique by choosing the rows of $H_0$ as the first linearly independent set found as you go down the rows of the Hankel matrix. We will return to this later. The integer $n$ is called the order, or the McMillan degree, of $[W(z), L(z)]$, or equivalently of the Hankel matrix.
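The construction can be checked numerically on a small example, assuming NumPy. The sketch forms a finite section of the Hankel matrix from the blocks $[W_j \;\, L_j] = [HF^{j-1}K \;\; HF^{j-1}L]$ of a hypothetical minimal system (1.5) with $n = 2$. It picks $H_0$ as the first linearly independent rows found going down, and then recovers $F$, up to a change of coordinates, from $H_2 = FH_0$. The finite truncation is an assumption of the sketch; in the text the matrix is infinite. To avoid the double use of $H$, the Hankel section is called `Hank`. All function names are hypothetical.

```python
import numpy as np

def markov_blocks(F, H, K, L, count):
    """Blocks [W_j L_j] = [H F^(j-1) K, H F^(j-1) L] for j = 1..count."""
    blocks, Fp = [], np.eye(F.shape[0])
    for _ in range(count):
        blocks.append(np.hstack([H @ Fp @ K, H @ Fp @ L]))
        Fp = Fp @ F
    return blocks

def hankel_section(blocks, n_rows, n_cols):
    """Finite section of the Hankel matrix: block (i, j) is blocks[i + j] (0-based)."""
    return np.vstack([np.hstack(blocks[i:i + n_cols]) for i in range(n_rows)])

def first_independent_rows(M, tol=1e-9):
    """Indices of the first linearly independent rows found going down M."""
    chosen = []
    for r in range(M.shape[0]):
        if np.linalg.matrix_rank(M[chosen + [r]], tol=tol) > len(chosen):
            chosen.append(r)
    return chosen

# Hypothetical minimal system (1.5) with q = p = 1 and n = 2.
F = np.array([[0.5, 0.2], [0.0, -0.3]])
H = np.array([[1.0, 0.0]])
K = np.array([[1.0], [1.0]])
L = np.array([[0.0], [1.0]])
q, p, N = 1, 1, 5                                 # use N x N blocks of the infinite matrix
Hank = hankel_section(markov_blocks(F, H, K, L, 2 * N - 1), N, N)
n = np.linalg.matrix_rank(Hank)                   # McMillan degree, seen through this section
H0 = Hank[first_independent_rows(Hank)]
H2 = H0[:, q + p:]                                # H0 with its first block of columns removed
# H2 = F H0 on the columns they share; solve for F in the least-squares sense.
F_hat = np.linalg.lstsq(H0[:, :-(q + p)].T, H2.T, rcond=None)[0].T
```

`F_hat` is similar to `F`, so it has the same eigenvalues (0.5 and -0.3) even though the state coordinates differ.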
The methods used herein depend on acting as if $n$ is finite. Then, and only then, $W(z)$ and $L(z)$ are matrices of rational functions of $z$, and can thus be written in the form

$$[W(z) \;\; L(z)] = A(z^{-1})^{-1}\, [C(z^{-1}) \;\; B(z^{-1})], \qquad (1.6)$$

where $A(z)$, $B(z)$, $C(z)$ are matrices of polynomials. Of course (1.6) is far from unique. We shall later describe how the unique prescription of $H_0$ just described leads to a unique "matrix fraction description", i.e. a unique form of (1.6). We shall also use $z^{-1}$ to indicate the shift operator, i.e. $z^{-1}y(t) = y(t-1)$. Corresponding to (1.6) we have the ARMAX representation

$$A(z^{-1})\, y(t) = B(z^{-1})\, u(t) + C(z^{-1})\, e(t). \qquad (1.7)$$

This is important partly because it expresses $y(t)$ in terms of $y(t-1), y(t-2), \ldots$, $u(t-1), u(t-2), \ldots$, $e(t), e(t-1), e(t-2), \ldots$. It can therefore serve as a basis for an iterative estimation procedure in which the coefficient matrices are estimated by regression. The unobserved $e(t)$ are replaced by estimates from a previous stage in the iteration. This will be dealt with in section 5. When no input (or exogenous) variable is observed we speak of the ARMA case.

Notes on References. There are many references for spectral theory; see, for example, Hannan (1970). For the structure theory of systems, see Kailath (1980) and Casti (1977).
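Given the polynomial coefficients in (1.6), the $W_i$ follow by equating coefficients of $z^{-i}$ in $A(z^{-1})W(z) = C(z^{-1})$, i.e. $\sum_j A_j W_{i-j} = C_i$. Below is a sketch, assuming NumPy and $A_0$ invertible; `impulse_from_mfd` is a hypothetical name. For the scalar ARMA(1,1) example $A(z^{-1}) = 1 - 0.7z^{-1}$, $C(z^{-1}) = 1 + 0.4z^{-1}$, it reproduces $W_0 = 1$ and $W_i = 1.1 \times 0.7^{i-1}$ for $i \geq 1$.

```python
import numpy as np

def impulse_from_mfd(A, C, count):
    """First `count` coefficients of A(z^{-1})^{-1} C(z^{-1}), using sum_j A_j W_{i-j} = C_i.
    A and C are lists of coefficient matrices [A_0, A_1, ...] and [C_0, C_1, ...]."""
    A0_inv = np.linalg.inv(A[0])
    W = []
    for i in range(count):
        Ci = C[i] if i < len(C) else np.zeros_like(C[0])
        acc = Ci - sum(A[j] @ W[i - j] for j in range(1, min(i, len(A) - 1) + 1))
        W.append(A0_inv @ acc)
    return W

# Scalar ARMA(1,1): A(z^{-1}) = 1 - 0.7 z^{-1}, C(z^{-1}) = 1 + 0.4 z^{-1}.
A = [np.array([[1.0]]), np.array([[-0.7]])]
C = [np.array([[1.0]]), np.array([[0.4]])]
W = impulse_from_mfd(A, C, 6)
```

The $L_i$ follow in the same way, with the coefficients of $B(z^{-1})$ (and $B_0 = 0$) in place of those of $C(z^{-1})$.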
2. Some Basic Algorithms

There are three basic algorithms of time series analysis.

(i) The first algorithm is the discrete Fourier transform

$$d(\omega) = T^{-1/2} \sum_{t=1}^{T} y(t)\, e^{it\omega}.$$

By means of the fast Fourier transform it is cheaply computable at the frequencies $2\pi j/T'$, $j = 0, 1, \ldots, [\tfrac12 T']$, for $T'$ highly composite. Under smoothness conditions on $f(\omega)$,

$$E\{d(2\pi j/T)\, d(2\pi k/T)^{*}\} = \delta_{jk}\, 2\pi f(2\pi j/T) + O(T^{-1}),$$

and indeed the error is uniformly $O(T^{-1})$ if $n$ is finite. This algorithm will not be so important to us.
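The following sketch, assuming NumPy, evaluates $d(\omega)$ at $\omega = 2\pi j/T$ in two ways: directly, and through `numpy.fft.ifft`. The sign convention of `ifft` ($e^{+2\pi i jk/T}$) matches $e^{it\omega}$, and the factor $e^{i\omega}$ accounts for $t$ running from 1 rather than 0. Here $T' = T$. NumPy's FFT accepts any length but is fastest when the length is highly composite. The function names are hypothetical.

```python
import numpy as np

def dft_direct(y, omega):
    """d(omega) = T^{-1/2} sum_{t=1}^{T} y(t) e^{i t omega}; y has shape (T, q)."""
    T = y.shape[0]
    t = np.arange(1, T + 1)
    return np.exp(1j * t * omega) @ y / np.sqrt(T)

def dft_fft(y):
    """d(2 pi j / T) for j = 0..T-1 (row j), via the FFT.
    ifft(x)[j] = T^{-1} sum_{k=0}^{T-1} x[k] e^{2 pi i j k / T}; array row k holds y(k+1)."""
    T = y.shape[0]
    omega = 2 * np.pi * np.arange(T) / T
    return np.exp(1j * omega)[:, None] * np.fft.ifft(y, axis=0) * T / np.sqrt(T)

rng = np.random.default_rng(0)
y = rng.standard_normal((64, 2))      # T = 64 (highly composite), q = 2
d = dft_fft(y)                        # row j holds d(2*pi*j/64)
```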
(ii) The second algorithm is the Levinson-Whittle recursion. This is designed, in a sense, to produce approximations to $e(t)$ in (1.1) by considering

$$e(t) = \Phi(z)\, y(t), \qquad \Phi(z) = W(z)^{-1}.$$

The procedure recursively calculates polynomial approximations, $\Phi_n(z)$, of degree $n$, to $\Phi(z)$. The recursive calculation uses the data through the natural estimates of $\Gamma(t)$, of the form

$$\hat\Gamma(t) = \frac{1}{T} \sum_{s=1}^{T-t} y(s)\, y(s+t)', \quad t \geq 0, \qquad \hat\Gamma(-t) = \hat\Gamma(t)'.$$

However, it will be convenient to put this Levinson-Whittle recursion in a more general setting, because of its many uses later. Thus let $v(t)$ be a vector of $s$ components and put

$$\hat\Gamma_v(t) = \frac{1}{T} \sum_{\tau=1}^{T-t} v(\tau)\, v(\tau+t)' = \hat\Gamma_v(-t)', \qquad t \geq 0.$$

The recursion calculates matrices $\Phi_{n,j}$, $\bar\Phi_{n,j}$, $j = 0, 1, \ldots, n$, and $S_n$, $\bar S_n$. If $v(t) = y(t)$, then $\Phi_{n,j}$ is the coefficient of $z^{-j}$ in $\Phi_n(z)$. The $\bar\Phi_{n,j}$ correspond to the time reversed process, i.e. to the "backwards" residuals $r_n(t)$, as distinct from the forwards residuals $e_n(t)$. Putting $y(t) = 0$ for $t \leq 0$ and $t > T$,

$$e_n(t) = \sum_{j=0}^{n} \Phi_{n,j}\, y(t-j), \qquad r_n(t) = \sum_{j=0}^{n} \bar\Phi_{n,j}\, y(t-n+j),$$
$$\hat\Omega_n = S_n = \frac{1}{T} \sum_{t=1}^{T+n} e_n(t)\, e_n(t)', \qquad \bar S_n = \frac{1}{T} \sum_{t=1}^{T+n} r_n(t)\, r_n(t)', \qquad (2.1)$$

so that $\hat\Omega_n$ is an estimate of $\Omega$. We now give the recursive algorithm in terms of $v(t)$:

$$\Phi_{n,j} = \Phi_{n-1,j} + \Phi_{n,n}\, \bar\Phi_{n-1,n-j}, \qquad \bar\Phi_{n,j} = \bar\Phi_{n-1,j} + \bar\Phi_{n,n}\, \Phi_{n-1,n-j}, \qquad j = 1, \ldots, n-1,$$
$$\Phi_{n,0} = \bar\Phi_{n,0} = I_s, \qquad \Phi_{n,n} = -\Delta_n \bar S_{n-1}^{-1}, \qquad \bar\Phi_{n,n} = -\Delta_n' S_{n-1}^{-1}, \qquad \Delta_n = \sum_{j=0}^{n-1} \Phi_{n-1,j}\, \hat\Gamma_v(j-n),$$
$$S_n = (I_s - \Phi_{n,n}\bar\Phi_{n,n})\, S_{n-1}, \qquad \bar S_n = (I_s - \bar\Phi_{n,n}\Phi_{n,n})\, \bar S_{n-1}, \qquad S_0 = \bar S_0 = \hat\Gamma_v(0). \qquad (2.2)$$

In case $s = 1$ we have $\bar S_n = S_n$ and $\bar\Phi_{n,j} = \Phi_{n,j}$, $j = 1, \ldots, n$, so that the algorithm is simplified.
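Below is a direct transcription of the recursion (2.2) in NumPy, as a sketch; the function names are hypothetical. The code then checks two properties of the result. First, with $v(t) = 0$ outside $1 \leq t \leq T$, $S_n$ equals $T^{-1}\sum_{t=1}^{T+n} e_n(t)e_n(t)'$, as in (2.1). Second, the $\Phi_{n,j}$ satisfy the Toeplitz (Yule-Walker) equations $\sum_{j=0}^{n}\Phi_{n,j}\hat\Gamma_v(j-k) = 0$, $k = 1, \ldots, n$, which the recursion solves.

```python
import numpy as np

def sample_autocov(v, max_lag):
    """Gamma_v(t) = T^{-1} sum_{tau=1}^{T-t} v(tau) v(tau+t)', t = 0..max_lag; v has shape (T, s)."""
    T = v.shape[0]
    return [v[:T - t].T @ v[t:] / T for t in range(max_lag + 1)]

def gamma(G, k):
    """Gamma_v(k), using Gamma_v(-k) = Gamma_v(k)' for negative k."""
    return G[k] if k >= 0 else G[-k].T

def levinson_whittle(v, order):
    """Phi_{n,j}, Phibar_{n,j} (j = 0..n), S_n and Sbar_n for n = order, following (2.2)."""
    G = sample_autocov(v, order)
    I = np.eye(v.shape[1])
    Z = np.zeros_like(I)
    Phi, Phib = [I], [I]                       # Phi_{0,0} = Phibar_{0,0} = I_s
    S, Sb = G[0].copy(), G[0].copy()           # S_0 = Sbar_0 = Gamma_v(0)
    for n in range(1, order + 1):
        Delta = sum(Phi[j] @ gamma(G, j - n) for j in range(n))
        P_nn = -Delta @ np.linalg.inv(Sb)      # Phi_{n,n}
        Pb_nn = -Delta.T @ np.linalg.inv(S)    # Phibar_{n,n}
        Phi_x, Phib_x = Phi + [Z], Phib + [Z]  # Phi_{n-1,n} = Phibar_{n-1,n} = 0
        Phi = [Phi_x[j] + P_nn @ Phib_x[n - j] for j in range(n + 1)]
        Phib = [Phib_x[j] + Pb_nn @ Phi_x[n - j] for j in range(n + 1)]
        S, Sb = (I - P_nn @ Pb_nn) @ S, (I - Pb_nn @ P_nn) @ Sb
    return Phi, Phib, S, Sb

def forward_residuals(v, Phi):
    """e_n(t) = sum_j Phi_{n,j} v(t-j), t = 1..T+n, with v(t) = 0 outside 1..T."""
    T, s = v.shape
    n = len(Phi) - 1
    vp = np.vstack([np.zeros((n, s)), v, np.zeros((n, s))])   # v(tau) sits in row tau - 1 + n
    return np.array([sum(Phi[j] @ vp[t - j - 1 + n] for j in range(n + 1))
                     for t in range(1, T + n + 1)])

rng = np.random.default_rng(1)
T = 300
w = rng.standard_normal((T, 2))
v = w.copy()
for t in range(1, T):                          # a simple bivariate AR(1), for illustration only
    v[t] = 0.6 * v[t - 1] + w[t]
Phi, Phib, S, Sb = levinson_whittle(v, 3)
e = forward_residuals(v, Phi)
```

Each step costs $O(n s^3)$ operations, compared with $O(n^3 s^3)$ for solving the block Toeplitz equations from scratch. This saving is the point of the recursion.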
These procedures have severe disadvantages when $n/T$ is not small. This is because they are founded, implicitly or explicitly, on the Toeplitz assumption that $v(t) = 0$ for $-n < t \leq 0$ and $T < t \leq T+n$. As a result, for given $n$, the system of equations for the $\Phi_{n,j}$ has a block Toeplitz matrix. (This is so called because it has the same elements down any diagonal.) There have been many modifications, often based on calculating $\hat e_n(t)$, $\hat r_n(t)$ (see (2.1), (2.2)) recursively by

$$\hat e_n(t) = \hat e_{n-1}(t) + \Phi_{n,n}\, \hat r_{n-1}(t-1), \qquad \hat r_n(t) = \hat r_{n-1}(t-1) + \bar\Phi_{n,n}\, \hat e_{n-1}(t), \qquad (2.3)$$

with $\hat r_n(0) = 0$ and $\hat e_0(t) = \hat r_0(t) = y(t)$, $1 \leq t \leq T$. Then also

$$\Delta_n = \frac{1}{T} \sum_{t=1}^{T+n} \hat e_{n-1}(t)\, \hat r_{n-1}(t-1)'.$$

It is the terms in these sums for $1 \leq t \leq n$ and $T < t \leq T+n$, which involve the zeros introduced by the Toeplitz assumption, that seem to cause most of the trouble. In the scalar case ($q = 1$, with a lower case symbol $\phi_{n,n}$) it has been suggested that $\Phi_{n,n}$ be replaced by

$$\hat\phi_{n,n} = -\frac{\sum_{t=n+1}^{T} \hat e_{n-1}(t)\, \hat r_{n-1}(t-1)}{\tfrac12 \sum_{t=n+1}^{T} \hat e_{n-1}(t)^2 + \tfrac12 \sum_{t=n+1}^{T} \hat r_{n-1}(t-1)^2}. \qquad (2.4)$$

A virtue of (2.4) is that $\hat\phi_{n,n}$ lies in $(-1, 1)$, as does a correlation coefficient. This is a desirable property. That all the $\hat\phi_{k,k}$, $k \leq n$, lie in $(-1, 1)$ is completely equivalent to $\Phi_n(z) \neq 0$, $|z| \geq 1$. The $\phi_{n,n}$ are called partial correlation coefficients by statisticians and reflection coefficients by systems engineers. Because of the flow diagrams used in describing (2.3), such procedures are called lattice or ladder methods, and they are important in real time calculations. For describing purposes we shall continue to use the Levinson-Whittle recursion. Wherever it is used, however, it could be replaced by a lattice form.
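For the scalar case, here is a sketch of the lattice (2.3) with the coefficient (2.4), assuming NumPy; `burg` is a hypothetical name. It updates the coefficients of $\Phi_n(z)$ through the scalar form of the recursion for $\Phi_{n,j}$. It then checks that every $\hat\phi_{n,n}$ lies in $(-1, 1)$ and that the zeros of $z^n\Phi_n(z)$ lie inside the unit circle. It also checks that the lattice residuals agree with filtering $y$ by $\Phi_n(z)$ wherever no zero padding is involved.

```python
import numpy as np

def burg(y, order):
    """Scalar lattice (2.3) with coefficients (2.4). Returns phi_{n,n} for n = 1..order,
    the coefficients of Phi_order(z) in powers of z^{-1}, and the final forward residuals."""
    e = np.asarray(y, dtype=float).copy()       # e_0(t) stored at index t-1, t = 1..T
    r = e.copy()                                # r_0(t) = y(t)
    a = np.array([1.0])                         # Phi_0(z) = 1
    phis = []
    for n in range(1, order + 1):
        ef, rb = e[n:], r[n - 1:-1]             # e_{n-1}(t) and r_{n-1}(t-1) for n < t <= T
        phi = -np.sum(ef * rb) / (0.5 * np.sum(ef ** 2) + 0.5 * np.sum(rb ** 2))   # (2.4)
        e_new, r_new = e.copy(), r.copy()
        e_new[1:] = e[1:] + phi * r[:-1]        # (2.3); in the scalar case Phibar = Phi
        r_new[1:] = r[:-1] + phi * e[1:]
        e, r = e_new, r_new
        a_ext = np.append(a, 0.0)
        a = a_ext + phi * a_ext[::-1]           # Phi_{n,j} = Phi_{n-1,j} + phi_{n,n} Phi_{n-1,n-j}
        phis.append(phi)
    return np.array(phis), a, e

rng = np.random.default_rng(2)
w = rng.standard_normal(700)
y = np.zeros(700)
for t in range(2, 700):                         # a stationary AR(2), for illustration
    y[t] = 1.2 * y[t - 1] - 0.5 * y[t - 2] + w[t]
y = y[200:]
phis, a, e = burg(y, 4)
```

The bound $|\hat\phi_{n,n}| < 1$ follows from $|\sum ab| \leq \tfrac12(\sum a^2 + \sum b^2)$, which is why the denominator of (2.4) uses the average of the two sums of squares.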
engineers. has b e e n p r e s e n t e d
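An estimate of e(t), for general n, can be formed in the same way when inputs are observed. Put, then,

$$\hat e_n(t) = \sum_{0}^{n}\hat\Phi_{n,j}\,y(t-j) - \sum_{1}^{n}\hat\Gamma_{n,j}\,u(t-j), \qquad \hat\Phi_{n,0} = I_q.$$

Here $\sum\hat\Phi_{n,j}z^{-j}$ is an approximation to $W(z)^{-1}$ and $\sum\hat\Gamma_{n,j}z^{-j}$ is an approximation to $W(z)^{-1}L(z) = C(z^{-1})^{-1}B(z^{-1})$, using (1.6). To obtain the $\hat\Phi_{n,j}$, $\hat\Gamma_{n,j}$, put $v(t)' = (y(t)', -u(t)')$, of dimension $s = p + q$, use the Levinson-Whittle algorithm, and take $[\hat\Phi_{n,j}, \hat\Gamma_{n,j}]$ as the first block of q rows in $\hat F_{n,j}$. Then the top left-hand $q \times q$ block of $\hat S_n$ is the estimate of the covariance matrix of $\hat e_n(t)$.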
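For the case with inputs, the coefficients in $\hat e_n(t)$ can be illustrated by an ordinary least squares fit, sketched below for q = p = 1 with hypothetical helper names. The Levinson-Whittle recursion computes essentially the same regression order-recursively (with the Toeplitz treatment of the ends of the series), so this direct solve is a stand-in chosen for brevity, not the algorithm of the text.

```python
import random


def lstsq(rows, target):
    """Least squares by the normal equations and Gauss-Jordan elimination (small k only)."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def fit_arx(y, u, n):
    """Order-n fit of e_n(t) = y(t) + sum_j Phi_j y(t-j) - sum_j Gamma_j u(t-j) (q = p = 1)."""
    T = len(y)
    rows = [[y[t - j] for j in range(1, n + 1)] + [u[t - j] for j in range(1, n + 1)]
            for t in range(n, T)]
    beta = lstsq(rows, y[n:])
    phi = [-b for b in beta[:n]]  # y(t) = -sum Phi_j y(t-j) + sum Gamma_j u(t-j) + e(t)
    gamma = beta[n:]
    resid = [y[t] + sum(phi[j - 1] * y[t - j] for j in range(1, n + 1))
             - sum(gamma[j - 1] * u[t - j] for j in range(1, n + 1)) for t in range(n, T)]
    return phi, gamma, sum(v * v for v in resid) / len(resid)


# Usage: y(t) = 0.5 y(t-1) + u(t-1) + e(t), so Phi_1 = -0.5 and Gamma_1 = 1.
rng = random.Random(2)
u = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0] * 2000
for t in range(1, 2000):
    y[t] = 0.5 * y[t - 1] + u[t - 1] + rng.gauss(0.0, 0.1)
phi, gamma, sigma2 = fit_arx(y, u, 1)
```

The sign convention matches the display above: the fitted $\hat\Phi_{n,j}$ enter $\hat e_n(t)$ with a plus sign and the $\hat\Gamma_{n,j}$ with a minus sign.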
This type of procedure will repeatedly be used below.

(iii) The third major algorithm is the Kalman filter, which computes $\varepsilon(t)$, the innovation on the basis of a finite past, from (1.5), for n finite. The algorithm is

$$\hat x(t+1) = F\hat x(t) + Lu(t) + K(t)\varepsilon(t), \qquad y(t) = H\hat x(t) + \varepsilon(t),$$
$$K(t) = \{FP(t)H' + K\Omega\}\{HP(t)H' + \Omega\}^{-1},$$
$$P(t+1) = FP(t)F' + K\Omega K' - K(t)\{HP(t)H' + \Omega\}K(t)',$$
$$P(1) = FP(1)F' + K\Omega K', \qquad \hat x(1) = 0.$$

It may be wise to symmetrise P(t), replacing it by $\tfrac12\{P(t) + P(t)'\}$, to reduce the effects of rounding errors. There is an enormous literature surrounding this algorithm. For our purposes its importance lies in the fact that it allows the Gaussian likelihood to be calculated, or rather $-2T^{-1}$ times its logarithm, which we shall speak of as the likelihood. This is, apart from a constant,

$$L(\theta) = \frac1T\sum_{1}^{T}\log\det\{HP(t)H' + \Omega\} + \frac1T\sum_{1}^{T}\varepsilon(t)'\{HP(t)H' + \Omega\}^{-1}\varepsilon(t). \qquad (2.5)$$

Here θ stands for the parameters in F, H, K, L and Ω. Those in F, H, K, L we shall call the system parameters and indicate by τ; the remainder are the variances and covariances in Ω. In (2.5), u(t) is treated as a fixed sequence of vectors. We emphasise that the Gaussian likelihood is used to obtain an estimation method, rather than because the e(t) are assumed to be Gaussian. The methods of the remainder of this chapter depend greatly on (2.5), though it is not the true likelihood.
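The likelihood (2.5) can be illustrated for q = 1 with no inputs. The sketch below assumes the innovations state-space form $F = -a$, $K = c - a$, $H = 1$, which represents $y(t) + a\,y(t-1) = e(t) + c\,e(t-1)$; this particular realisation, the function name and the test data are assumptions for illustration. It runs the Kalman recursions exactly as written above, starting from the stationary P(1).

```python
import math
import random


def likelihood_arma11(y, a, c, omega):
    """(2.5) for q = 1 and no inputs, via the Kalman filter for the innovations form
    x(t+1) = F x(t) + K e(t), y(t) = H x(t) + e(t) with F = -a, K = c - a, H = 1,
    which represents y(t) + a y(t-1) = e(t) + c e(t-1).  Requires |a| < 1."""
    F, K, H = -a, c - a, 1.0
    P = K * omega * K / (1.0 - F * F)  # solves P(1) = F P(1) F' + K Omega K'
    x = 0.0                            # x_hat(1) = 0
    total = 0.0
    for yt in y:
        V = H * P * H + omega          # H P(t) H' + Omega
        eps = yt - H * x               # innovation eps(t)
        total += math.log(V) + eps * eps / V
        gain = (F * P * H + K * omega) / V  # K(t)
        x = F * x + gain * eps
        P = F * P * F + K * omega * K - gain * V * gain
    return total / len(y)


# With a = c the model reduces to white noise, so (2.5) is log(omega) + mean(y^2)/omega.
L_white = likelihood_arma11([1.0, -2.0, 0.5, 3.0], a=0.3, c=0.3, omega=2.0)

# On simulated data (a = -0.5, c = 0.4) the true parameters beat the white-noise model.
rng = random.Random(3)
y, y_prev, e_prev = [], 0.0, 0.0
for _ in range(2000):
    e = rng.gauss(0.0, 1.0)
    y_prev = 0.5 * y_prev + e + 0.4 * e_prev
    e_prev = e
    y.append(y_prev)
L_true = likelihood_arma11(y, a=-0.5, c=0.4, omega=1.0)
L_wn = likelihood_arma11(y, a=0.0, c=0.0, omega=1.0)
```

In the scalar case the symmetrisation of P(t) mentioned above is not needed; for matrix P(t) it would be applied after each update.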
Notes on References. The fast Fourier transform was introduced to latter-day science in Cooley and Tukey (1965). The vector form of the Levinson-Whittle algorithm was given in Whittle (1963). Lattice forms are surveyed in Friedlander (1982). A great amount of detail about the Kalman filter is found in Anderson and Moore (1979).

3. Approximation Criteria
The problem to be considered in the remainder of this chapter is that of approximating the true system by one of finite McMillan degree. This degree, n, has to be determined. Once this is recognised it must also be recognised that it is not possible to proceed purely through the minimisation of (2.5), since that can always be further reduced by taking n large. The procedures here considered choose n by minimising some form of

$$\log\det\hat\Omega_n + d(n)\,C_T/T, \qquad n = 0,1,\dots,N. \qquad (3.1)$$

Here $\hat\Omega_n$ is the maximum likelihood estimate of Ω, for n given, and the first term in (3.1) is, except for a constant, the minimal value of (2.5) for n given. (Some approximation is involved in that statement.) The constant d(n) is the dimension of τ, namely n(2q + p). The second term in (3.1) is a penalty term, which increases as n increases, whereas the first decreases. Two commonly used sequences $C_T$ are $C_T = 2$, in which case (3.1) will be called AIC(n), and $C_T = \log T$, in which case (3.1) will be called BIC(n). An upper bound, N, has been imposed on n, and is needed in connection with proofs of asymptotic properties of the method. (N might increase with T.) In practice such bounds do not seem to be used, probably because the bounds needed for validity are much larger than values of n that an experienced investigator would consider reasonable; they are needed in the theoretical investigation only to exclude ridiculously large values.

For the case $C_T = \log T$ a justification has been given by Rissanen on the basis of a minimum description length principle. The idea is to use the model set to record the data in as few bits as possible. The first term in (3.1) (or rather T/2 times it) gives a measure of the average number of bits required for an optimal encoding when n is fixed and the maximum likelihood structure, on Gaussian assumptions, is taken to be the true structure. To decode, the model parameters must also be transmitted, and T/2 times the second term in (3.1), for BIC, measures the number of bits for an optimal encoding of these, to an accuracy determined by that of the method of maximum likelihood. The use of $C_T = 2$ has been justified by Akaike on the basis of a prediction theory, and has been widely used.

The emphasis in this chapter will principally be on the use of rational transfer function systems as approximations to systems of a more general kind. This will be further discussed in the next section. However, here some discussion of the case where there is a true rational transfer function system will be given in relation to the use of (3.1).
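The criterion (3.1) is simple to apply once the values $\log\det\hat\Omega_n$ are available. The sketch below uses hypothetical numbers, not results from any data, to show how $C_T = 2$ (AIC) and $C_T = \log T$ (BIC) can select different orders; the function name is an assumption.

```python
import math


def select_order(logdet, T, q, p, C_T):
    """Return the n minimising log det Omega_n + d(n) C_T / T, d(n) = n(2q + p), n = 0..N."""
    crit = [ld + n * (2 * q + p) * C_T / T for n, ld in enumerate(logdet)]
    return min(range(len(crit)), key=crit.__getitem__)


# Hypothetical values of log det Omega_n for n = 0..3 (illustration only, not data).
logdet = [0.50, 0.20, 0.17, 0.168]
T = 200
n_aic = select_order(logdet, T, q=1, p=0, C_T=2.0)          # AIC(n)
n_bic = select_order(logdet, T, q=1, p=0, C_T=math.log(T))  # BIC(n)
print(n_aic, n_bic)
```

With these numbers the heavier BIC penalty stops at a smaller n than AIC, in line with the remark that AIC tends to choose larger degrees.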
The conditions under which the statements below hold true are essentially (6.1), (6.2), below, plus the finiteness of fourth moments of the $e_j(t)$, but the proofs of the theorems also depend on the condition (compare (1.1))

$$\det W(z) \ne 0, \qquad |z| \ge 1 - \delta, \quad \delta > 0.$$

This δ may be as small as desired but is prescribed a priori. Now assume there is a true McMillan degree $n_0$, and let $\hat n$ minimise (3.1) while, as $T \to \infty$, $C_T/T \to 0$ (which is an insignificant requirement). Then the following holds, where a.s. stands for "almost surely".

(i) If $\liminf_{T\to\infty} C_T/(2\log\log T) > 1$ then $\hat n \to n_0$, a.s. If $\limsup_{T\to\infty} C_T/(2\log\log T) < 1$ then $\hat n$ does not converge a.s. to $n_0$.

(ii) If $\liminf_{T\to\infty} C_T = \infty$ then $\hat n \to n_0$ in probability. If $\limsup_{T\to\infty} C_T < \infty$ then

$$\lim_{\delta\to0}\,\lim_{T\to\infty} P\{\hat n > n_0\} = 1. \qquad (3.2)$$
These results deserve careful interpretation. In the first place, (i) should not be interpreted as saying that $C_T = 2\log\log T$ is a good value to use, because $2\log\log T$ changes far too slowly with T to be meaningful. At T = 10 it is 1.7 and at T = 1000 it is 3.9. It is therefore not far from $C_T = 2$, which is the value for AIC(n), for most values of T met in practice. The result (3.2) suggests that AIC(n) is bad because it will always over-estimate the McMillan degree. However, in practice there will be no true degree, and then $\hat n$ should increase with T. The question is how fast. Some investigations suggest that $C_T = 2$, i.e. AIC, gives an optimal rate of increase, according to certain criteria.

The result (3.2) deserves further discussion. We give this for the simplest case, where q = 1, p = 0 and $n_0 = 0$, so that y(t) = e(t). When n = 1 is the model then

$$y(t) + a\,y(t-1) = e(t) + c\,e(t-1), \qquad |a| < 1, \quad |c| < 1 - \delta. \qquad (3.3)$$

We indicate why n = 1 will be preferred to n = 0, the true value, when $C_T$ is uniformly bounded. The choice between the two values will be based on

$$\log\hat\sigma_1^2 + 2C_T/T - \log\hat\sigma_0^2 = -\log(\hat\sigma_0^2/\hat\sigma_1^2) + 2C_T/T,$$

so that n = 1 will be preferred when

$$A_T = T\log(\hat\sigma_0^2/\hat\sigma_1^2) > 2C_T.$$
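The values quoted for $2\log\log T$, and the comparison $A_T > 2C_T$ between n = 1 and n = 0, are easy to check. The function names are hypothetical, and the residual variances used below are hypothetical values chosen only to show that AIC and BIC can disagree on the same $A_T$.

```python
import math


def two_loglog(T):
    """2 log log T (natural logarithms)."""
    return 2.0 * math.log(math.log(T))


def prefers_n1(sigma2_0, sigma2_1, T, C_T):
    """True if (3.1) with q = 1, p = 0 (so d(1) = 2) prefers n = 1 to n = 0,
    i.e. if A_T = T log(sigma2_0 / sigma2_1) exceeds 2 C_T."""
    A_T = T * math.log(sigma2_0 / sigma2_1)
    return A_T > 2.0 * C_T


values = {T: round(two_loglog(T), 1) for T in (10, 100, 1000, 10**6)}
print(values)  # {10: 1.7, 100: 3.1, 1000: 3.9, 1000000: 5.3}

# Hypothetical residual variances: a 1% reduction at T = 1000.
aic_choice = prefers_n1(1.0, 0.99, 1000, 2.0)
bic_choice = prefers_n1(1.0, 0.99, 1000, math.log(1000))
```

Even at $T = 10^6$, $2\log\log T$ is only about 5.3, which illustrates how slowly it grows.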
Consider Fig. 1.

[Figure 1: horizontal axis a, vertical axis c, with the lines c = ±(1 - δ) marked.]
Figure 1. The region of optimisation for n = 1 lies below the line through 1 - δ and above the line through -(1 - δ), excluding the diagonal a = c.
Since the diagonal a = c is excluded, the maximum of the likelihood could be at the boundary. In fact, when $n_0 = 0$, it may be shown that the maximum likelihood estimates $(\hat a, \hat c)$ move towards the diagonal, i.e. $\hat a - \hat c \to 0$, and that $A_T$ is eventually the maximum of a function defined on that diagonal. Let us parameterise the diagonal by

$$\alpha = \log\{(1+a)/(1-a)\},$$

so that, since $|a| < 1 - \delta$ on it, $|\alpha| < \log\{(2-\delta)/\delta\}$. The function in question is $\xi(\alpha)^2$, where $\xi(\alpha)$ is a stationary random function of α, with covariance function $\{\cosh(s/2)\}^{-1}$ at lag s, i.e. with spectral density $\psi(u) = \{\cosh\pi u\}^{-1}$, $-\infty < u < \infty$; thus α takes the place of t in our previous considerations. It follows that $A_T$ is eventually the maximum of $\xi(\alpha)^2$ over an interval whose length increases without bound as δ → 0, so that $A_T$ becomes large and will exceed any bounded $2C_T$, which is the content of (3.2). Moreover, the maximum will be attained for increasingly large values of $|\alpha|$, i.e. at values of (a, c) that approach (1, 1) or (-1, -1), which will make optimisation difficult.

This result has another interpretation. It is approximately true that $\hat\sigma_1^2$ is the minimum value of

$$\int_{-\pi}^{\pi}\frac{|d(\omega)|^2}{|W(e^{i\omega})|^2}\,d\omega, \qquad (3.4)$$

where $W(z) = (z+c)/(z+a)$ is the transfer function of (3.3). If $a - c \to 0$ then $|W(e^{i\omega})|^2 \to 1$; but if at the same time (a, c) approaches (1, 1) or (-1, -1), then $|W(e^{i\omega})|^{-2}$ converges to unity uniformly except near $\omega = \pm\pi$ (for +1) or $\omega = 0$ (for -1), where it develops a "notch". The method of maximum likelihood attempts to minimise (3.4) and thus seeks to place such a notch where $|d(\omega)|^2$ is large. This function is incredibly irregular, with many local maxima and minima, so that for general n, where notches may be placed at other frequencies, (3.4) will have many local minima and the absolute minimum may be very difficult to find.

It must be emphasised that for general n, q, p the situation is essentially the same, i.e. (3.2) is very relevant, and T may need to be very large before the "asymptotic" theory is relevant.
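The notch can be seen numerically. The sketch evaluates $|W(e^{i\omega})|^{-2}$ for $W(z) = (z + c)/(z + a)$ at a hypothetical point close to (1, 1) with a - c small; the function name and the chosen point are assumptions. The value stays near 1 except in a narrow band around $\omega = \pi$, where it dips to $\{(1-a)/(1-c)\}^2$.

```python
import cmath
import math


def inv_w_sq(omega, a, c):
    """|W(e^{i omega})|^{-2} for W(z) = (z + c)/(z + a), the transfer function of (3.3)."""
    z = cmath.exp(1j * omega)
    return abs(z + a) ** 2 / abs(z + c) ** 2


a, c = 0.999, 0.99  # hypothetical point near (1, 1) with a - c small
freqs = (0.0, math.pi / 2, 0.9 * math.pi, math.pi)
notch = [inv_w_sq(w, a, c) for w in freqs]
print([round(v, 4) for v in notch])
```

Even at $\omega = 0.9\pi$ the weight is still close to 1, so the notch is confined to a very small neighbourhood of $\pi$.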
Notes on References. The procedures described above were suggested in Akaike (1969) and Rissanen (1983). The results in this section were obtained in Hannan (1980), (1981) and Hannan and Kavalieris (1984). For the results relating to AIC when there is no true $n_0$, see Shibata (1980).
4. Rational Transfer Function Approximation

In this section a brief account will be given of some deep theory concerning the approximation of the true structure by a rational transfer function. (Readers less concerned with theory may choose to "skip" this section.) The results relate mostly to the case where there are no inputs, and only that case will be discussed here. The idea is to approximate W(z) by a transfer function of finite McMillan degree n whose Hankel matrix (see (1.2)) is as close as possible to the true H in the Hankel norm, which is the norm of H regarded as an operator from one Hilbert space to another, i.e. its largest singular value. To see what is involved, put

$$y_t = (y(t-1)', y(t-2)', \dots)', \quad y^t = (y(t)', y(t+1)', \dots)', \quad e_t = (e(t-1)', e(t-2)', \dots)', \quad e^t = (e(t)', e(t+1)', \dots)'.$$

Then, from (1.1),

$$y^t = H e_t + \mathcal K e^t, \qquad (4.1)$$

where $\mathcal K$ has $W_{j-k}$ as its (j,k)th block, j,k = 1,2,..., with $W_j = 0$, j < 0. Thus H describes the dependence of the future on the past. The space on which H operates is therefore endowed with the metric given by the covariance matrix of $e_t$, namely $I \otimes \Omega$. (By $I \otimes \Omega$ we mean the tensor product of the two matrices, i.e. a block diagonal matrix with Ω as the diagonal blocks; see below (5.5).) The space to which H operates is endowed with the metric given by the covariance matrix R of $y^t$, whose (j,k)th block is $E\{y(t+j)y(t+k)'\} = \Gamma(k-j)$. Let the canonical factorisation of f(-ω) be, as for f(ω),

$$f(-\omega) = (2\pi)^{-1}\tilde W(e^{i\omega})\,\tilde\Omega\,\tilde W(e^{i\omega})^*.$$

(This notation is in agreement with that in section 2 because f(-ω) is the spectrum of the time-reversed process.) Here $\tilde W(z) = \sum_{j\ge0}\tilde W_j z^{-j}$ and $\det\tilde W(z) \ne 0$, |z| > 1.
Let $\tilde{\mathcal W}$ have $\tilde W_{k-j}$ as its (j,k)th block, with $\tilde W_j = 0$, j < 0, so that $R = \tilde{\mathcal W}(I \otimes \tilde\Omega)\tilde{\mathcal W}'$, the decomposition being upper triangular. Then

$$S = (I \otimes \tilde\Omega^{-1/2})\,\tilde{\mathcal W}^{-1} H\,(I \otimes \Omega^{1/2}) \qquad (4.2)$$

operates from $\ell_2$ to $\ell_2$, where $\ell_2$ is the space of all sequences $a_1, a_2, \dots$ with $\sum|a_j|^2 < \infty$ and with the inner product $(a,b) = \sum a_jb_j$. Thus S is the operator whose singular value decomposition is sought. Since $\tilde{\mathcal W}^{-1}$ is block upper triangular and block Toeplitz, S is again of block Hankel form, its blocks $S_{j+k-1}$, in the (j,k)th place, being generated by

$$S(z) = \tilde\Omega^{-1/2}\,\tilde W(z)^{-1}\,W(z^{-1})\,\Omega^{1/2}, \qquad (4.3)$$

and it is easily checked that S(z) is a unitary function for z = exp iω. In the scalar case, q = 1, $\tilde\Omega = \Omega$ and $\tilde W = W$, and we shall use lower case letters, so that (4.3) becomes

$$s(z) = w(z^{-1})/w(z),$$

which is obviously of modulus 1 for z = exp iω, since w(z) has real coefficients. Only the coefficients of $z^j$, j > 0, in s(z) occur in S.

Introduce the singular value decomposition of S (assuming S to be compact)

$$S = \sum_{j\ge1}\rho_j\,\eta_j\,\xi_j', \qquad \eta_j'\eta_k = \xi_j'\xi_k = \delta_{jk}, \qquad \rho_1 \ge \rho_2 \ge \dots \ge 0,$$

and the new random variables

$$u_j = \eta_j'(I \otimes \tilde\Omega^{-1/2})\tilde{\mathcal W}^{-1}y^t, \qquad x_j = \xi_j'(I \otimes \Omega^{-1/2})e_t.$$

Then $E(u_ju_k) = E(x_jx_k) = \delta_{jk}$ and $E(u_jx_k) = \delta_{jk}\rho_j$. The $u_j$, $x_j$ might be called "discriminant functions", since in the classical theory of statistical canonical correlation analysis they occur as functions that are used to classify individuals.
The $\rho_j$ themselves would be called "canonical correlations". Since the $e_s$, s < t, span the same space as do the $y_s$, s < t, the $\rho_j$ are the canonical correlations between the past and the future of the y(t) process. Once the singular value decomposition of S is known, the best approximation to H, in the Hankel norm, by a Hankel matrix of rank n is known, and it is possible to determine uniquely (at least for q = 1) the best W(z) of McMillan degree n in this sense. Related ideas have been used by Akaike for the ARMA case, and we shall briefly survey his method. However, the virtues of such an approximation in a statistical context are by no means evident, nor are the effects of estimating the $\rho_j$, $u_j$, $x_j$. We shall not enter further into that here.
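For q = 1 and McMillan degree 1 the single nonzero canonical correlation can be checked directly. Under (3.3) with $a \ne c$, the past affects the future only through the one-step predictor x(t) of y(t), and the future affects the past only through the backward predictor of y(t-1); so $\rho_1$ is the correlation of these two predictors, which works out to $|c - a|/|1 - ac|$. The sketch below (hypothetical function names, series truncated after a fixed number of terms) computes that correlation from the autocovariances and compares it with the closed form.

```python
def acov(a, c, k, sigma2=1.0):
    """Autocovariance gamma(k) of y(t) + a y(t-1) = e(t) + c e(t-1), |a| < 1."""
    k = abs(k)
    if k == 0:
        return sigma2 * (1 + c * c - 2 * a * c) / (1 - a * a)
    return sigma2 * (c - a) * (1 - a * c) / (1 - a * a) * (-a) ** (k - 1)


def canonical_correlation(a, c, n_terms=200):
    """Correlation between x(t) = sum_i (c-a)(-c)^i y(t-1-i), the predictor of y(t)
    from the past, and xb(t) = sum_k (c-a)(-c)^k y(t+k), the predictor of y(t-1)
    from the future; both series are truncated at n_terms (requires a != c)."""
    w = [(c - a) * (-c) ** i for i in range(n_terms)]
    cov = sum(w[i] * w[k] * acov(a, c, i + 1 + k)
              for i in range(n_terms) for k in range(n_terms))
    var = sum(w[i] * w[k] * acov(a, c, i - k)
              for i in range(n_terms) for k in range(n_terms))
    return abs(cov) / var  # var(x) = var(xb) by time reversal


rho = canonical_correlation(0.5, -0.3)
closed_form = abs(-0.3 - 0.5) / abs(1 - 0.5 * -0.3)  # |c - a| / |1 - ac| = 0.8 / 1.15
```

When c = 0 (a pure autoregression) this reduces to $\rho_1 = |a|$, and it vanishes as $a - c \to 0$, when the degree drops to zero.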
It will be recalled that such a canonical representation is chosen by taking the first n linearly independent rows of H. Call r(v, j) the j'th row in the v'th block (of q rows) of H, v = 1,2,...; j = 1,...,q. Then such a set of rows is always of the form

$$r(v,j), \qquad v = 1,\dots,n_j; \quad j = 1,\dots,q; \quad \sum n_j = n. \qquad (4.4)$$

The $n_j$ are known as the Kronecker indices. They uniquely determine these first linearly independent rows of H. There is a corresponding unique factorisation $W(z) = A(z^{-1})^{-1}C(z^{-1})$, where $A(z^{-1}) = Z\tilde A(z)$, $C(z^{-1}) = Z\tilde C(z)$, Z is diagonal with $z^{-n_j}$ in the j'th place in the diagonal, and $\tilde A$, $\tilde C$ are matrices of polynomials, $\tilde A$ having diagonal elements $\alpha_{jj}(z)$ of degree $n_j$ which are monic, i.e. have unity as the coefficient of $z^{n_j}$. Putting $\tilde\Gamma = \tilde C - \tilde A$, with elements $\gamma_{ij}$, the decomposition is uniquely defined by the inequalities on degrees

$$\deg\alpha_{ij} < \deg\alpha_{jj},\ j \ne i; \qquad \deg\alpha_{ij} \le \deg\alpha_{ii},\ j < i; \qquad \deg\alpha_{ij} < \deg\alpha_{ii},\ j > i; \qquad \deg\gamma_{ij} < \deg\alpha_{ii}; \qquad i,j = 1,\dots,q.$$
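The degree inequalities can be turned into a small calculation: given Kronecker indices, they fix the maximal degree of each $\alpha_{ij}$ and hence the number of free coefficients. The sketch below uses hypothetical helper names and, since there are no inputs, counts only $\tilde A$ (monic diagonal coefficients excluded) and $\tilde\Gamma = \tilde C - \tilde A$. It suggests that indices of the form $h+1,\dots,h+1,h,\dots,h$ give 2nq free coefficients, while other indices with the same sum can give fewer.

```python
def degree_bounds(kron):
    """Maximal degree of alpha_ij implied by the inequalities, for Kronecker indices
    kron = (n_1, ..., n_q); a bound of -1 means alpha_ij is identically zero."""
    q = len(kron)
    bound = [[0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if i == j:
                bound[i][j] = kron[i]                        # monic, degree n_i
            elif j < i:
                bound[i][j] = min(kron[i], kron[j] - 1)      # <= n_i and < n_j
            else:
                bound[i][j] = min(kron[i] - 1, kron[j] - 1)  # < n_i and < n_j
    return bound


def free_parameters(kron):
    """(free coefficients in A~, free coefficients in Gamma~ = C~ - A~)."""
    q = len(kron)
    b = degree_bounds(kron)
    n_a = sum(b[i][j] + 1 for i in range(q) for j in range(q)) - q  # drop monic leads
    n_gamma = q * sum(kron)  # each gamma_ij in row i has degree < n_i
    return n_a, n_gamma


print(free_parameters((2, 1)), free_parameters((1, 2)), free_parameters((2, 2, 1)))
```

For (2, 1) and (1, 2), both with n = 3 and q = 2, the counts are 12 and 11, which illustrates why the first pattern is the generic one used in section 5.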
Akaike's method leads to estimates of the $n_j$ and of A. Put

$$\tilde y_t = (y(t)', y(t-1)', \dots, y(t-h)')',$$

where h might be chosen by fitting an autoregression, minimising BIC or AIC and determining h as the order. Put, for ℓ = 0,1,...; m = 0,1,...,q-1,

$$y_{\ell,m}(t)' = (y(t+1)', y(t+2)', \dots, y(t+\ell)', y_1(t+\ell+1), \dots, y_m(t+\ell+1)).$$

If the smallest $n_j$ is for j = m and $n_m = \ell$, then row r(ℓ+1, m) (see (4.4)) is linearly dependent on earlier rows of H, and correspondingly (see (4.1)) there will be some linear function of $y_{\ell,m}(t)$ that is orthogonal to the past, while this will not be true for $\ell_1 < \ell$ or for $\ell_1 = \ell$, $m_1 < m$. To judge when this is so we consider the solutions of

$$\det[\hat\mu_j I_{\ell q+m} - \hat R_{\ell,m}\hat R_{\ell,m}'] = 0, \qquad \hat\mu_1 > \hat\mu_2 > \dots > \hat\mu_{\ell q+m},$$
$$\hat R_{\ell,m} = \Big\{\frac1T\sum y_{\ell,m}(t)y_{\ell,m}(t)'\Big\}^{-1/2}\,\frac1T\sum y_{\ell,m}(t)\tilde y_t'\,\Big\{\frac1T\sum \tilde y_t\tilde y_t'\Big\}^{-1/2},$$

where the summations are over $h+1 \le t \le T-\ell-1$. It is assumed that $\ell q + m \le hq$. The $\hat\mu_j$ are the squared canonical correlations between $y_{\ell,m}(t)$ and $\tilde y_t$. Successively examining these (ordering (ℓ, m) in dictionary order, first according to ℓ and then m), we stop when, for the first time,

$$T\log(1 - \hat\mu_{\ell q+m}) + 2\nu_{\ell,m} > 0, \qquad \nu_{\ell,m} = q(h-\ell) - m + 1.$$

If this happens at ℓ(1), m(1), then $n_{m(1)}$ is put at ℓ(1). Now eliminate $y_{m(1)}(t+\ell(1)+j)$, j > 0, from all future $y_{\ell,m}(t)$ and continue, always taking $\nu_{\ell,m}$ as $qh - \dim y_{\ell,m}(t) + 1$. Once $n_{m(2)}$ is determined we eliminate $y_{m(2)}(t+\ell(2)+j)$, j > 0, from future $y_{\ell,m}(t)$ and continue, and so on. In this way all $n_k$ are determined, and with each will be associated a vector $\hat\xi(k)$, which is the eigenvector $\hat\xi_j$ for the smallest $\hat\mu_j$ at the step when $n_k$ was determined. $\hat\xi_j$ is determined only up to a scalar factor, and that is fixed in $\hat\xi(k)$ by making the last element unity. Now $\hat\xi(k)$ determines the k'th row of the estimate of A(z) in canonical form, so that the element of $\hat\xi(k)$ corresponding to $y_j(t+v)$ in $y_{\ell,m}(t)$, for ℓ, m at the values where $n_k$ was determined, is the coefficient of $z^{v-1}$ in the estimate of $\alpha_{kj}(z)$. Thus at the end of the calculation the estimate $\hat A(z)$ and the Kronecker indices are available.

It is then necessary to estimate C(z). This would be done by forming $\hat A(z^{-1})y(t)$ and using the calculated autocovariances of this to estimate those of $C(z^{-1})e(t)$. Then an estimate of the spectrum will be obtained and factored to find an estimate of C(z). Since A(0) = C(0) and the row degrees of C(z) are prescribed by the degree inequalities, this would have to be done carefully and would not be a trivial calculation for q > 1. In any case these estimates of A(z), C(z) are inefficient, but they could be used to initiate a minimisation of (2.5), in the form (1.5) corresponding to the canonical choice of $H_0$ and the $n_k$.
We do not proceed further with the description because there are problems with the method. It is, so far, restricted to the ARMA case. The $n_k$ are determined in an inefficient estimation procedure, and no later adjustment of them has been suggested. However, the method is of interest because of its association with the theory of the first part of this section.

Notes on References. Adamyan, Arov and Krein (1971), Glover (1983) and Jewell and Bloomfield (1983) deal with the theory of Hankel norm approximation. Jewell and Bloomfield (1983,a) suggest, for q = 1, that the canonical correlations be found directly from s(z) = w(z)/w(z^{-1}), which is to be obtained by factoring a spectral estimate. Akaike (1969,a) presents his method. For some estimation of a moving-average model see Hannan (1970).

5. A Gauss-Newton Procedure
(i) First the case q = 1 will be discussed, because this is important and the calculations are then quite feasible. Thus consider the ARMAX model for q = 1 and for given n,

$$e_\tau(t) = c_\tau(z^{-1})^{-1}\{a_\tau(z^{-1})y(t) - b_\tau(z^{-1})u(t)\}.$$

Here τ is the vector of system parameters, i.e. the n(p + 2) freely varying coefficients in $a_\tau$, $b_\tau$, $c_\tau$. Here, again, we use lower case letters for the scalar case. Note that $b_\tau$ is, in general, a row vector, since we do not require p = 1. The idea is to use a Gauss-Newton procedure, but to include n in the estimation, so as to approximate to the true ARMAX structure, by minimising

$$\frac1T\sum_{1}^{T} e_\tau(t)^2. \qquad (5.1)$$

At each iteration this is to be done recursively in n. The $e_\tau(t)$ are functions only of the transfer functions

$$w_\tau(z)^{-1} = c_\tau(z^{-1})^{-1}a_\tau(z^{-1}), \qquad w_\tau(z)^{-1}\ell_\tau(z) = c_\tau(z^{-1})^{-1}b_\tau(z^{-1}).$$

The procedure is to linearise these functions about a previous estimate, which reduces the minimisation of (5.1) to a linear problem. As has been said, the procedure is Gauss-Newton but includes n in the optimisation. It is necessary to obtain a first estimate from which to commence the iteration. This is done by taking $c_\tau \equiv 1$ and choosing $a_\tau$, $b_\tau$ by regression, i.e. by autoregression. We go on to describe the algorithm.

Step 0. Put $v(t) = (y(t), -u(t)')'$, t = 1,...,T, and use the Levinson-Whittle algorithm.
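Let $\hat\sigma_n^2$ be the top left-hand element of $\hat S_n$. Choose $\hat n$ to minimise

$$\log\hat\sigma_n^2 + n(p+1)\log T/T.$$

Let the first row of $\hat F_{\hat n,j}$ be called $(\hat a_j, \hat b_j)$, where $\hat a_j$ is scalar and $\hat b_j$ has p elements. Then $\hat a_j$ is the j'th coefficient in $\hat a(z)$ and $\hat b_j$ is the j'th coefficient vector in $\hat b(z)$.

The basic algorithm is now given by step 1, which is repeated until convergence. To commence step 1 one needs estimates $\hat n$, $\hat a$, $\hat b$, $\hat c$. These will initially come from step 0, with $\hat c \equiv 1$.

Step 1. Define $\hat e(t)$, $\eta(t)$, $\zeta(t)$, $\xi(t)$ by

$$\hat c\,\hat e(t) = \hat a\,y(t) - \hat b\,u(t), \qquad \hat c\,\eta(t) = y(t), \qquad \hat c\,\zeta(t) = u(t), \qquad \hat c\,\xi(t) = \hat e(t),$$

where $\hat a = \hat a(z^{-1})$, $\hat b = \hat b(z^{-1})$, $\hat c = \hat c(z^{-1})$, and with $y(t) = u(t) = \eta(t) = \zeta(t) = \xi(t) = \hat e(t) = 0$, $t \le 0$. Linearising about the previous estimate gives

$$e_\tau(t) \approx \eta(t) - \xi(t) + \hat e(t) + \sum_{j\ge1}(a_j, b_j, c_j)\,v(t-j).$$

Put

$$v(t) = (\eta(t), -\zeta(t)', -\xi(t))', \qquad t = 1,2,\dots,T,$$

and use the Levinson-Whittle recursion to generate $\hat F_{n,j}$, $\hat S_n$. Put $\varepsilon_n(t) = \sum_{0}^{n}\hat F_{n,j}v(t-n+j)$ and

$$\begin{pmatrix}\hat\alpha^{(1)}_{n,n}\\ \hat\beta^{(1)\prime}_{n,n}\\ \hat\gamma^{(1)}_{n,n}\end{pmatrix} = -\hat S_{n-1}^{-1}\,\frac1T\sum_t\varepsilon_{n-1}(t-1)\{\eta(t) - \xi(t) + \hat e(t)\}, \qquad (5.2)$$

taking $\hat e(t)$ as zero for t > T. The relations $\hat c(z^{-1})\xi(t) = \hat e(t)$ and $\hat a(z^{-1})\eta(t) - \hat b(z^{-1})\zeta(t) = \hat e(t)$ imply that $\eta(t) - \xi(t)$ is, for $n \ge \hat n$, a linear combination of the variables in the regression, so that $\eta(t) - \xi(t) + \hat e(t)$ in (5.2) may be replaced by $\hat e(t)$. When this is done the $\hat\alpha^{(1)}_{n,j}$, $\hat\beta^{(1)}_{n,j}$, $\hat\gamma^{(1)}_{n,j}$ must be regarded as adjustments to the previous $\hat a_{n,j}$, $\hat b_{n,j}$, $\hat c_{n,j}$, i.e. must be added to these.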
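The q = 1 procedure can be sketched for the simplest ARMA case, assuming the model (3.3) with no inputs and n fixed at 1. The structure follows the description above: a long autoregression to start (here of a fixed, hypothetical order 10 rather than one chosen by the criterion), then repeated Gauss-Newton steps built from $\hat e(t)$, $\eta(t)$, $\xi(t)$. It uses a direct least squares solve instead of the Levinson-Whittle recursion and does not optimise over n. Each step regresses $\hat e(t)$ on $-\eta(t-1)$ and $\xi(t-1)$ and adds the coefficients to the previous estimates, which is the adjustment form just described. All function names are hypothetical.

```python
import random


def lstsq(rows, target):
    """Least squares by the normal equations and Gauss-Jordan elimination (small k only)."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def simulate_arma11(a, c, T, seed, burn=200):
    """y(t) + a y(t-1) = e(t) + c e(t-1) with standard Gaussian e(t)."""
    rng = random.Random(seed)
    y_prev = e_prev = 0.0
    out = []
    for t in range(T + burn):
        e = rng.gauss(0.0, 1.0)
        y_prev, e_prev = -a * y_prev + e + c * e_prev, e
        if t >= burn:
            out.append(y_prev)
    return out


def inv_c(x, c):
    """Apply 1/(1 + c z^{-1}): out(t) = x(t) - c out(t-1), zero initial condition."""
    out, prev = [], 0.0
    for v in x:
        prev = v - c * prev
        out.append(prev)
    return out


def gauss_newton_arma11(y, long_order=10, n_iter=5):
    """Estimate (a, c) in y(t) + a y(t-1) = e(t) + c e(t-1), n fixed at 1."""
    T, L = len(y), long_order
    # Start (c = 1): long autoregression gives e0(t); regress y(t) on -y(t-1), e0(t-1).
    phi = lstsq([[y[t - i] for i in range(1, L + 1)] for t in range(L, T)], y[L:])
    e0 = [0.0] * L + [y[t] - sum(phi[i - 1] * y[t - i] for i in range(1, L + 1))
                      for t in range(L, T)]
    a, c = lstsq([[-y[t - 1], e0[t - 1]] for t in range(L + 1, T)], y[L + 1:])
    for _ in range(n_iter):
        e_hat = inv_c([y[t] + (a * y[t - 1] if t else 0.0) for t in range(T)], c)
        eta = inv_c(y, c)     # c(z^-1) eta(t) = y(t)
        xi = inv_c(e_hat, c)  # c(z^-1) xi(t) = e_hat(t)
        # e(t) ~ e_hat(t) + da*eta(t-1) - dc*xi(t-1); least squares gives the step.
        da, dc = lstsq([[-eta[t - 1], xi[t - 1]] for t in range(1, T)], e_hat[1:])
        a, c = a + da, c + dc
    return a, c


series = simulate_arma11(a=-0.5, c=0.4, T=3000, seed=1)
a_hat, c_hat = gauss_newton_arma11(series)
```

The filters `inv_c` implement $\hat c(z^{-1})^{-1}$ with zero initial conditions, as in the definitions of $\eta(t)$ and $\xi(t)$.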
(ii) Now consider the vector case, which is more elaborate. We return to the set, M(n), of all systems of McMillan degree n, for given n. (We fix n for the moment only, because there are requirements below that need discussion.) M(n) is equivalently the set of all Hankel matrices H of rank n, and of all pairs of transfer functions W(z), L(z) of McMillan degree n obeying (1.3) (and, for W(z), (1.1)). It may be conceptualised as a smooth surface of dimension n(2q + p) and, technically, is an analytic manifold. A reasonable approach to estimating a system would therefore be to determine n and then the appropriate point on M(n), which is what was done for q = 1. For q > 1 this is, however, a problem, because M(n) cannot then be covered by one neighbourhood that may be mapped homeomorphically into Euclidean space. M(n) is the union of the sets of all systems whose Kronecker indices sum to n, and hence an alternative is the determination of the Kronecker indices, as was the technique used in section 4. There is, however, something very arbitrary in the decomposition of M(n) into sets corresponding to different partitions of n as a sum of q integers, and the effort required for an efficient procedure to discover these is fairly considerable.

Amongst the set of Kronecker indices summing to n there is one special set, namely those which, for n = qh + m, 0 ≤ m < q, are of the form

$$n_1 = n_2 = \dots = n_m = h+1, \qquad n_{m+1} = \dots = n_q = h.$$

Then the first linearly independent rows in H are just the first n rows.
If U(n) is the subset of M(n) for which these rows are linearly independent, then U(n) is open and dense in M(n), so that little or nothing is lost in restricting attention to U(n). It is most unlikely that the maximum of the likelihood in M(n) will be found off U(n). (However, U(n) would provide a bad coordinate system in which to work if the maximum was near the edge.) We describe U(n) in another way by giving a unique description of A(z), B(z), C(z) in

$$W(z) = A(z^{-1})^{-1}C(z^{-1}), \qquad L(z) = A(z^{-1})^{-1}B(z^{-1})$$

for a system in U(n). We do this by describing the coefficient matrices $A_{n,j}$, $B_{n,j}$, $C_{n,j}$ in A(z), B(z), C(z). These will be depicted below with a star indicating a freely varying submatrix of elements. All partitions are after the m'th row or column.

$$A_{n,0} = C_{n,0} = \begin{bmatrix} I_m & 0 \\ * & I_{q-m}\end{bmatrix}, \qquad B_{n,0} = 0, \qquad A_{n,1} = \begin{bmatrix} * & 0 \\ * & * \end{bmatrix}, \qquad A_{n,h+1},\ B_{n,h+1},\ C_{n,h+1} = \begin{bmatrix} * \\ 0 \end{bmatrix}.$$

All other $A_{n,j}$, $B_{n,j}$, $C_{n,j}$, $j \le h+1$, are unrestricted. We do not mean that $A_{n,h+1}$, $B_{n,h+1}$, $C_{n,h+1}$ are equal. The vector τ of system parameters coordinatising U(n) is of dimension n(2q + p) and is made up of the freely varying elements in the coefficient matrices.
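The star pattern for U(n) can be checked by counting. The sketch below (hypothetical function names) builds boolean masks of the free elements of $A_{n,j}$, $B_{n,j}$, $C_{n,j}$, counts the shared block of $A_{n,0} = C_{n,0}$ once, and checks that the total is n(2q + p) for a range of q, p, n. When h = 0 the lag-1 and lag-(h+1) restrictions apply to the same matrix and are combined.

```python
def structure_masks(q, p, n):
    """Boolean masks of the free elements of A_{n,j}, B_{n,j}, C_{n,j}, j = 0..h+1,
    for U(n), n = qh + m, 0 <= m < q, following the star pattern in the text."""
    h, m = divmod(n, q)
    lags = h + 2

    def full(cols):
        return [[True] * cols for _ in range(q)]

    A = [full(q) for _ in range(lags)]
    B = [full(p) for _ in range(lags)]
    C = [full(q) for _ in range(lags)]
    # Lag 0: A_{n,0} = C_{n,0} = [[I_m, 0], [*, I_{q-m}]] and B_{n,0} = 0.
    # The shared star block is counted once, under A.
    for i in range(q):
        A[0][i] = [i >= m and k < m for k in range(q)]
        C[0][i] = [False] * q
        B[0][i] = [False] * p
    # Lag 1: the upper right m x (q-m) block of A_{n,1} is zero.
    for i in range(m):
        for k in range(m, q):
            A[1][i][k] = False
    # Lag h+1: only the first m rows of A, B and C are free.
    for mat in (A[h + 1], B[h + 1], C[h + 1]):
        for i in range(m, q):
            mat[i] = [False] * len(mat[i])
    return A, B, C


def count_free(q, p, n):
    """Total number of free elements, i.e. the dimension of tau."""
    return sum(flag for mats in structure_masks(q, p, n)
               for mat in mats for row in mat for flag in row)


print(count_free(2, 1, 3), count_free(3, 2, 7))
```

For q = 1 the pattern reduces to unrestricted $a_j$, $b_j$, $c_j$, j = 1,...,n, and the count is n(p + 2), as in the scalar procedure.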
We now go on to describe how to estimate n, τ and Ω. We do this by a series of steps that are related to those for q = 1 but are more complicated. Only step 2 is iterated; steps 0 and 1 are not repeated. Always the output from the previous step is the input to the next, so we do not indicate those by a special notation; i.e. we do not, for example, write $\hat A_j^{(1)}$ for the $\hat A_j$ matrix found at step 1, since it is clear which $\hat A_j$ is used at step 2, namely that from step 1 and not step 0. Also, we shall now index the stages in the Levinson-Whittle recursion by h, rather than n as before.

Step 0. Put $v(t) = (y(t)', -u(t)')'$, t = 1,...,T, and use the Levinson-Whittle recursion. Let $\hat\Sigma_n$ be the top left-hand q × q submatrix of $\hat S_h$, h = 0,1,2,..., where n = qh, and choose $\hat h$, i.e. $\hat n$, to minimise

$$\log\det\hat\Sigma_n + n(q+p)\log T/T, \qquad n = hq, \quad h = 0,1,\dots$$

Let the first block of q rows in $\hat F_{\hat h,j}$ be called $[\hat A_j, \hat B_j]$ and let $\hat\Sigma_{\hat n}$ be called $\hat\Omega$. Then the $\hat A_j$, $\hat B_j$, $j = 1,\dots,\hat h$, are the coefficient matrices in $\hat A(z)$, $\hat B(z)$, with $\hat A_0 = I_q$, and $\hat C(z) \equiv I_q$.

Step 1. Put

$$\hat e(t) = \sum_{0}^{\hat h}\hat A_jy(t-j) - \sum_{1}^{\hat h}\hat B_ju(t-j), \qquad v(t) = (y(t)', -u(t)', -\hat e(t)')',$$

and use the Levinson-Whittle algorithm. Again the top left-hand q × q block of $\hat S_h$ is called $\hat\Sigma_n$, n = hq, and we choose $\hat h$ to minimise

$$\log\det\hat\Sigma_n + n(2q+p)\log T/T.$$

Now $[\hat A_j, \hat B_j, \hat C_j]$, the top q rows in $\hat F_{\hat h,j}$, are the coefficient matrices in $\hat A(z)$, $\hat B(z)$, $\hat C(z)$, with $\hat A_0 = \hat C_0 = I_q$, and $\hat\Omega = \hat\Sigma_{\hat n}$.

If there is a true rational transfer function system then, for T large enough, $\hat n = \hat hq$ will be greater than or equal to the true n. The problem is that n need not be a multiple of q, so that m in n = (ĥ - 1)q + m, m = 1,...,q, has to be determined; m = q corresponds to $n = \hat hq$, which has already been considered. (It is unlikely that q > 5 would be used, and then only 4 or fewer values of m need be taken.) It is simpler to describe the calculation as a regression, though it can be done computationally cheaply using the calculations done at step 1; the details are too complicated to be described here (see the references). The regression is of a vector variable but is carried out row by row, so we describe it for a typical row, j.
For n = (ĥ - 1)q + m, m = 1,2,...,q-1, and 1 ≤ j ≤ m, we regress $y_j(t)$ on the following variables:

(1) $-y_k(t-i)$, i = 1,...,ĥ, for k = 1,...,m, and i = 2,...,ĥ, for k = m+1,...,q;
(2) $u_k(t-i)$, k = 1,...,p; i = 1,...,ĥ;
(3) $\hat e_k(t-i)$, k = 1,...,q; i = 1,...,ĥ;

where

$$\sum_{0}^{\hat h}\hat C_j\hat e(t-j) = \sum_{0}^{\hat h}\hat A_jy(t-j) - \sum_{1}^{\hat h}\hat B_ju(t-j), \qquad y(t) = u(t) = \hat e(t) = 0, \quad t \le 0.$$

For m < j ≤ q we regress $y_j(t)$ on

(1) $-y_k(t-i)$, k = 1,...,q; i = 1,...,ĥ - 1;
(2) $-(y_k(t) - \hat e_k(t))$, k = 1,...,m;
(3) $u_k(t-i)$, k = 1,...,p, and $\hat e_k(t-i)$, k = 1,...,q; i = 1,...,ĥ - 1.

The coefficient of $-y_k(t-i)$ in the regression for $y_j(t)$ is the estimate of $a_{jk}(i)$, the (j,k)th element of $A_{n,i}$, and similarly the coefficients of $u_k(t-i)$, $\hat e_k(t-i)$ and $-(y_k(t) - \hat e_k(t))$ estimate the elements of B(z), C(z) and of $A_{n,0} = C_{n,0}$. Now estimate $\Omega_n$, n = (ĥ - 1)q + m, by $\hat\Omega_n$, the matrix of sums of squares and cross products of the residuals from the q regressions, divided by T. Then m is chosen by choosing n = (ĥ - 1)q + m, m = 1,...,q, to minimise

$$\log\det\hat\Omega_n + n(2q+p)\log T/T. \qquad (5.3)$$

(For m = q the latter expression is just the minimised value at the end of step 1.) Now we have $\hat n$, $\hat A(z)$, $\hat B(z)$, $\hat C(z)$, $\hat\Omega$.

Step 2. As was said above, step 2 may be iterated, but often no repetition will be necessary, or at most one. Steps 0 and 1 are not repeated. Form matrices η(t), ζ(t), ξ(t), of q rows and q², qp, q² columns respectively, by solving

$$\sum_{0}^{\hat h}\hat C_j[\eta(t-j), \zeta(t-j), \xi(t-j)] = (y(t)', u(t)', \hat e(t)') \otimes I_q, \qquad t \ge 1, \qquad (5.4)$$

with zero initial conditions. Here $\hat e(t)$ is obtained from

$$\sum_{0}^{\hat h}\hat C_j\hat e(t-j) = \sum_{0}^{\hat h}\hat A_jy(t-j) - \sum_{1}^{\hat h}\hat B_ju(t-j), \qquad (5.5)$$

with the usual initial conditions.
24
where
X
is
a x b.
Of course in (5.4)
all blocks are a scalar multiple of column of
n(t), for example, is
X
lq.
is
1 x (2q+p)
Thus the
and
i+q(j-l)th
O(z=l)-iEijy(t),
where
Eij
consists of zeros save for a unit in the (i,j)th place. Put
I [n(t)!] ~v(J ) = ~ Xl-C(t) ~-l[q(t+j), -C(t+j), -~(t+j)] . L-C (t) This matrix is of dimension
q(2q + p).
It is to be the
that is the input to the Levinson-Whittle carried out.
It is, thus,
computational effort.
For
(5.6)
~v(j)
recursion which is to be
q(2q + p)
that determines the
q = p = 5
this is
75,
which already
would be a rather large scale implementation of the Levinson-Whittle recursion. In cases where q is larger it may be necessary to use some other expedient and we discuss this in remarks below. Let
~(t)
be the vector obtained by adding columns numbered
i + q(i - I),
i -- l,...,q
in
n(t)
and similarly for
~(t), ~(t)
in relation to ~(t), u(t). (It is ~(t), ~(t), ~(t) that correspond most closely to the quantities defined for q=l.) Thus h 0Z Cjn(t-j) = y(t), 0Z Cj~(t-j) = e(t), 0Z Cj~(t-j)=u(t). ^
^
^
Now form, for each $h$ value considered in the Levinson-Whittle recursion with (5.6),

$$\hat s_h = \frac{1}{T}\sum_t\sum_{j=0}^{h-1}\hat\Phi_{h-1,j}\,[\eta(t-h+j),\ -\xi(t-h+j),\ -\zeta(t-h+j)]'\,\hat\Sigma^{-1}\{\bar\eta(t) - \bar\zeta(t) + \hat e(t)\}.$$

Here $\hat e(t)$ is as from (5.5). This vector is of dimension $q(2q+p)$. Put $\hat\tau_{h,h} = \hat S_{h-1}^{-1}\hat s_h$ and

$$\hat\tau_{h,j} = \hat\tau_{h-1,j} + \hat\Phi_{h-1,h-j}\,\hat\tau_{h,h},\qquad j = 0,1,2,\dots,h-1.$$

To initiate, take $\hat\tau_{0,0}$ to have zeros everywhere save for units in the places numbered $i + q(i-1)$, $i = 1,\dots,q$, and $q(q+p) + i + q(i-1)$, $i = 1,\dots,q$.
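Reading $\hat\tau_{0,0}$ through the element positions that the text assigns to $a_{ik}(j)$, $b_{ik}(j)$ and $c_{ik}(j)$ (places $i+q(k-1)$, $q^2+i+q(k-1)$ and $q^2+qp+i+q(k-1)$) shows that it encodes $A(0) = C(0) = I_q$ and $B(0) = 0$. A sketch with hypothetical function names; 1-based place numbers are shifted to 0-based NumPy indices.

```python
import numpy as np

def tau00(q, p):
    """Initial vector: units at places i+q(i-1) and q(q+p)+i+q(i-1), i = 1..q."""
    tau = np.zeros(q * (2 * q + p))
    for i in range(1, q + 1):
        tau[i + q * (i - 1) - 1] = 1.0                  # diagonal of the A block
        tau[q * (q + p) + i + q * (i - 1) - 1] = 1.0    # diagonal of the C block
    return tau

def unpack(tau, q, p):
    """Split a q(2q+p) vector into (A, B, C) using the place numbers of the text."""
    A, B, C = np.zeros((q, q)), np.zeros((q, p)), np.zeros((q, q))
    for i in range(1, q + 1):
        for k in range(1, q + 1):
            A[i - 1, k - 1] = tau[i + q * (k - 1) - 1]
            C[i - 1, k - 1] = tau[q * q + q * p + i + q * (k - 1) - 1]
        for k in range(1, p + 1):
            B[i - 1, k - 1] = tau[q * q + i + q * (k - 1) - 1]
    return A, B, C
```

Note that $q(q+p) + i + q(i-1)$ equals $q^2 + qp + i + q(i-1)$, so the second set of units falls exactly on the diagonal of the $C$ block.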
Now the $\hat\tau_{h,j}$ provide estimates of the matrices $A_j$, $B_j$, $C_j$, for $n = hq$. Thus $\hat A_{h,j}$ has as estimate of $a_{ik}(j)$ the element in the $\{i+q(k-1)\}$th place in $\hat\tau_{h,j}$; $\hat B_{h,j}$ has as its $(i,k)$th element the element in place $q^2 + i + q(k-1)$ in $\hat\tau_{h,j}$; while $\hat C_{h,j}$ has as its $(i,k)$th element that in the $\{q^2 + qp + i + q(k-1)\}$th place. Next put

$$\hat\Sigma_n = \frac{1}{T}\sum_t\hat e_n(t)\hat e_n(t)',\qquad n = hq,$$

where

$$\sum_{j=0}^{h}\hat C_{h,j}\,\hat e_n(t-j) = \sum_{j=0}^{h}\hat A_{h,j}\,y(t-j) - \sum_{j=1}^{h}\hat B_{h,j}\,u(t-j),$$

and choose $h$ so that this minimises

$$\log\det\hat\Sigma_n + n(2q+p)\log T/T,\qquad n = hq. \qquad (5.7)$$
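The order choice is a plain minimisation of the penalised log-determinant over the candidate values of $h$. A minimal sketch, assuming the $\hat\Sigma_n$ for each $h$ have already been computed; the function names and the dictionary input are hypothetical.

```python
import numpy as np

def order_criterion(Sigma_hat, n, q, p, T):
    """log det Sigma_hat + n(2q+p) log T / T."""
    sign, logdet = np.linalg.slogdet(np.atleast_2d(Sigma_hat))
    if sign <= 0:
        raise ValueError("Sigma_hat must be positive definite")
    return logdet + n * (2 * q + p) * np.log(T) / T

def choose_h(sigma_by_h, q, p, T):
    """sigma_by_h maps h -> Sigma_hat_n with n = h q; returns the minimising h."""
    return min(sigma_by_h, key=lambda h: order_criterion(sigma_by_h[h], h * q, q, p, T))
```

`slogdet` is used rather than `log(det(...))` so that the criterion stays accurate when $\det\hat\Sigma_n$ is very small or very large.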
Now we seek to estimate $m$ in $n = (\hat h - 1)q + m$, as in step 1 of the algorithm. Again it will be easiest to describe this as a regression, though it could be computed using the output from the use of (5.6) in the previous step. Consider $[\eta(t-j), -\xi(t-j), -\zeta(t-j)]$, $j = 0,1,\dots,\hat h$. The columns in this matrix are associated with the elements of $[A(j), B(j), C(j)]$ in the sense that the column numbered $i + q(k-1)$, $i,k = 1,\dots,q$, is associated with $a_{ik}(j)$; the column numbered $q^2 + i + q(k-1)$, $i = 1,\dots,q$, $k = 1,\dots,p$, with $b_{ik}(j)$; and the column numbered $q^2 + qp + i + q(k-1)$, $i,k = 1,\dots,q$, with $c_{ik}(j)$. For each $m = 1,2,\dots,q-1$, eliminate, for $j = 1,\dots,\hat h$, all columns associated with parameters which are prescribed in (5.3) to be null, and call the resulting matrix $X_j(t)$. For $j = 0$, the column numbered $i + q(k-1)$ and the corresponding column numbered $q^2 + qp + i + q(k-1)$ are added, for $i = m+1,\dots,q$, $k = 1,\dots,m$ (all others having been eliminated), to form a matrix, $X_0(t)$, of only $m(q-m)$ columns. Now call $\hat\Theta_n$, $n = (\hat h-1)q + m$, the vector of estimates of system parameters, where these parameters are arranged in dictionary order: first according to lag, $j$, then according to whether the parameter comes from $A(j)$, $B(j)$ or $C(j)$, then according to column index, $k$, and finally according to row index, $i$.
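The dictionary order used for $\hat\Theta_n$ is a sort on the key (lag, matrix, column, row). The sketch below makes that key explicit; the tuple layout `(j, M, i, k)` and the helper names are hypothetical conventions, not taken from the text. Within one lag and one matrix, sorting by $(k, i)$ is the same as sorting by the column number $i + q(k-1)$.

```python
from itertools import product

MATRIX_RANK = {"A": 0, "B": 1, "C": 2}

def dictionary_order(params):
    """params: iterable of (j, M, i, k), meaning element (i, k) of M(j), M in 'ABC'.
    Sort first by lag j, then by matrix A, B, C, then by column k, then by row i."""
    return sorted(params, key=lambda t: (t[0], MATRIX_RANK[t[1]], t[3], t[2]))

def all_params(q, p, h):
    """Every element of A(j), B(j), C(j) for lags j = 1..h (unrestricted case)."""
    out = []
    for j in range(1, h + 1):
        out += [(j, "A", i, k) for i, k in product(range(1, q + 1), repeat=2)]
        out += [(j, "B", i, k) for i in range(1, q + 1) for k in range(1, p + 1)]
        out += [(j, "C", i, k) for i, k in product(range(1, q + 1), repeat=2)]
    return out
```

In practice the list would first be pruned of the elements that (5.3) prescribes to be null, and the lag-0 elements would be added, before sorting.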
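Then

$$\hat\Theta_n = \Big\{\frac{1}{T}\sum_t X(t)'\hat\Sigma^{-1}X(t)\Big\}^{-1}\Big\{\frac{1}{T}\sum_t X(t)'\hat\Sigma^{-1}\big(\bar\eta(t) + \hat e(t) - \bar\zeta(t)\big)\Big\},\qquad X(t) = [X_0(t), X_1(t),\dots,X_{\hat h}(t)]. \qquad (5.8)$$

We emphasise that $X(t)$, $\hat\Sigma$, $\bar\eta(t)$, $\hat e(t)$, $\bar\zeta(t)$ are all formed using the output from the previous step, which at the first use of step 2 will have been step 1, but later will have been from a previous use of step 2. Only $m$ is determined at this step, $\hat h$ having been determined by previous calculations. The notation $\hat\Theta_n$ in (5.8) should not be confused with the $\hat\tau_{h,j}$ type of notation used earlier. $\hat\Theta_n$ is made up of many submatrices and is of dimension $n(2q+p)$, $n = (\hat h-1)q + m$. We now again put

$$\hat\Sigma_n = \frac{1}{T}\sum_t\hat e_n(t)\hat e_n(t)',\qquad n = (\hat h-1)q + m,$$

where

$$\sum_{j=0}^{\hat h}\hat C_{n,j}\,\hat e_n(t-j) = \sum_{j=0}^{\hat h}\hat A_{n,j}\,y(t-j) - \sum_{j=1}^{\hat h}\hat B_{n,j}\,u(t-j), \qquad (5.9)$$

and $\hat A_{n,j}$, $\hat B_{n,j}$, $\hat C_{n,j}$ have elements obtained from $\hat\Theta_n$ according to the identification discussed before (5.7). We choose $m$ to minimise

$$\log\det\hat\Sigma_n + n(2q+p)\log T/T,\qquad n = (\hat h-1)q + m,\quad m = 1,2,\dots,q. \qquad (5.10)$$

Again for $m = q$ the value of this criterion is that optimised in (5.7). Then $\hat A_j$, $\hat B_j$, $\hat C_j$, $\hat\Sigma$ are the values from (5.9) corresponding to the $n = (\hat h-1)q + \hat m$ that optimised (5.10).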
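Formula (5.8) is a generalised least squares step with a fixed weighting matrix $\hat\Sigma^{-1}$. A minimal sketch, assuming the matrices $X(t)$ have already been assembled and stacked into one array; the function name and argument layout are hypothetical. `einsum` forms the sums over $t$ without an explicit loop.

```python
import numpy as np

def gls_step(X, Sigma, w):
    """Theta = {T^-1 sum_t X(t)' S^-1 X(t)}^-1 {T^-1 sum_t X(t)' S^-1 w(t)}.

    X     : (T, q, r) stacked regressor matrices X(t)
    Sigma : (q, q) positive definite weighting matrix
    w     : (T, q) dependent vectors, e.g. eta_bar(t) + e_hat(t) - zeta_bar(t)
    """
    Sinv = np.linalg.inv(Sigma)
    T = X.shape[0]
    M = np.einsum("tia,ij,tjb->ab", X, Sinv, X) / T    # r x r normal matrix
    v = np.einsum("tia,ij,tj->a", X, Sinv, w) / T      # r-vector of cross products
    return np.linalg.solve(M, v)
```

If the data satisfy $w(t) = X(t)\Theta$ exactly, the step returns $\Theta$ whatever positive definite $\hat\Sigma$ is used; the weighting matters only when there is noise.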
Remarks.

1. All of the remarks in relation to the scalar case have analogues here. In relation to (5.4) (and $\hat C$ in (5.5)), again a problem will arise unless $\det\hat C(z) \neq 0$, $|z| \le 1$. Again this can be checked via a Schur-Cohn test, and if that fails we should factor $\hat C(e^{i\omega})\hat\Sigma\hat C(e^{i\omega})^*$ so as to obtain a $\tilde C(z)$ which does have $\det\tilde C(z) \neq 0$, $|z| \le 1$. Algorithms for this factorisation are available (see the references), but we omit details here.

2. Much of the work involved is in step 2, where the sizes of $q(2q+p)$ begin to be important. To reduce the calculation it would not be unreasonable to determine $\hat h$, $\hat m$ only at the first use of step 2; in any case, at repetitions of step 2 the values of $\hat h$, $\hat m$ from the first use of step 2 could be used. Once (5.4), (5.5) have been computed we may then move straight to (5.8), (5.9).

This completes the description of the algorithm. We may now repeat step 2, commencing with the $\hat A_j$, $\hat B_j$, $\hat C_j$, $\hat\Sigma$ finally defined by (5.10). However, experience with generated data shows that the transfer function estimate is improved at the first use of step 2, and it may improve again at later iterations of that step.
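The condition $\det\hat C(z) \neq 0$, $|z| \le 1$, can also be checked without a Schur-Cohn table: for $\hat C(z) = I + \hat C_1z + \dots + \hat C_hz^h$, the zeros of $\det\hat C(z)$ are the reciprocals of the nonzero eigenvalues of the block companion matrix of the recursion $x(t) = -\hat C_1x(t-1) - \dots - \hat C_hx(t-h)$. This sketch swaps in that eigenvalue test; it assumes $\hat C_0 = I_q$, the function names are hypothetical, and it does not perform the spectral factorisation mentioned in the remark.

```python
import numpy as np

def block_companion(C):
    """C: (h+1, q, q) with C[0] = I_q. Companion matrix of x(t) = -sum_j C[j] x(t-j)."""
    h, q = C.shape[0] - 1, C.shape[1]
    F = np.zeros((h * q, h * q))
    F[:q, :] = -np.concatenate([C[j] for j in range(1, h + 1)], axis=1)
    F[q:, :-q] = np.eye((h - 1) * q)          # shift the lagged blocks down
    return F

def det_C_nonzero_in_disc(C):
    """True if det C(z) != 0 for |z| <= 1, i.e. all eigenvalues of F lie inside the unit circle."""
    if C.shape[0] == 1:
        return True                           # C(z) = I_q has no zeros
    eig = np.linalg.eigvals(block_companion(C))
    return bool(np.all(np.abs(eig) < 1.0))
```

For example, $C(z) = 1 - 1.5z + 0.56z^2$ has zeros at $1.25$ and about $1.43$, and the companion eigenvalues are $0.8$ and $0.7$, so the check passes.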
Notes on References.

The algorithms here described were first presented in Hannan and Rissanen (1982) and Hannan and Kavalieris (1984). The emphasis there was more on order determination. An alternative calculation for the determination of $m$ in step 1 (i.e. for $q > 1$) is given in the second reference. For the structure theory at the beginning of subsection (ii) see, for example, Deistler and Hannan (1981). The algorithm in remark 4 in subsection (i) is due to Tunnicliffe Wilson (1969) and its matricial version to Tunnicliffe Wilson (1972).
6. Some Theoretical Considerations

This section will be very brief, since theory is not the purpose of this account, nor could such theory be fully presented in the space available here. However there seems to be some virtue in indicating the scope of the theory underlying the methods. In the first place it is not necessary that the linear innovations, $e(t)$, be Gaussian, and all of the methods are valid under much more general conditions, in the sense that the same theory obtains as if the $e(t)$ were Gaussian. The essential condition is that

$$E\{e(t)\mid e(t-1), e(t-2),\dots\} = 0. \qquad (6.1)$$

This is equivalent to the assertion, for (1.3), that the best linear predictor (in the least squares sense) is the best predictor, so that the data is, in that sense, generated by a linear system. For asymptotic distributions, if they are to be the same as for the Gaussian case, we require additionally that

$$E\{e(t)e(t)'\mid e(t-1), e(t-2),\dots\} = \Sigma. \qquad (6.2)$$

Of course some regularity conditions of a reasonably general nature are needed, but we do not discuss them here. (6.1) and (6.2) will hold if the $e(t)$ are independent, with zero mean vector and finite second moments, but they are considerably more general.

7. On-Line Procedures
Here only the case $p = q = 1$ will be considered, though the method easily generalises to $p > 1$. There is a large literature concerning methods for real time, on-line estimation of systems, and this has recently been surveyed, as will be indicated in the references. Here attention will be concentrated on an on-line implementation, for $q = 1$, of the algorithm described in section 5. In other words we implement the two steps of this algorithm in an on-line fashion, with step 1 iterated (i.e. repeated) once. Before describing that, let us describe three known on-line procedures. Each is of the form

$$\tau(t) = \tau(t-1) + P(t)x(t)e(t),\qquad e(t) = w(t) - \tau(t-1)'x(t),$$

$$P(t) = \Big\{\sum_{s=1}^{t}x(s)x(s)'\Big\}^{-1} = P(t-1) - \{1 + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1).$$

Here $w(t)$ is the "dependent variable" and $x(t)$ the vector of "independent variables" in a regression, and $\tau(t)$ is the estimate at time $t$ of the vector of regression coefficients. In each case $w(t)$, $x(t)$ must be constructed on-line at time $t$ from data up to time $t$, together with the estimate $\tau(t-1)$.
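The common recursion is ordinary recursive least squares, with $P(t)$ updated by the rank-one (Sherman-Morrison) formula instead of inverting $\sum x(s)x(s)'$ afresh. A minimal sketch; the function name is hypothetical and $P(t-1)$ is assumed symmetric, so $P(t-1)x(t)x(t)'P(t-1)$ is computed as an outer product.

```python
import numpy as np

def rls_step(tau, P, x, w):
    """tau(t) = tau(t-1) + P(t) x(t) e(t), with e(t) = w(t) - tau(t-1)' x(t)."""
    Px = P @ x
    P_new = P - np.outer(Px, Px) / (1.0 + x @ Px)   # rank-one update of the inverse
    e = w - tau @ x                                   # a priori prediction error
    tau_new = tau + P_new @ x * e
    return tau_new, P_new, e
```

Started from $\tau(0) = 0$ and $P(0) = cI$ with $c$ large, the recursion reproduces the batch least squares fit up to a ridge term of size $c^{-1}$, which is the usual way of starting it when no exact initial block of data is used.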
(1) RLS = Recursive least squares. This corresponds to step 0.

$$\tau(t)' = (a_1(t), a_2(t),\dots,a_h(t), b_1(t), b_2(t),\dots,b_h(t)),$$

$$x(t)' = (-y(t-1), -y(t-2),\dots,-y(t-h), u(t-1), u(t-2),\dots,u(t-h)),$$

$$w(t) = y(t).$$
(2) AML = Approximate maximum likelihood. This corresponds to the first use of step 1.

$$\tau(t)' = (a_1(t),\dots,a_n(t), b_1(t),\dots,b_n(t), c_1(t),\dots,c_n(t)),$$

$$x(t)' = (-y(t-1),\dots,-y(t-n), u(t-1),\dots,u(t-n), e(t-1),\dots,e(t-n)),$$

$$w(t) = y(t).$$

In fact what is most properly called AML uses not $e(t-j)$ in $x(t)$ but instead $\hat\varepsilon(t-j)$, where $\hat\varepsilon(t) = y(t) - \tau(t)'x(t)$. This can be done since at time $t$ the latest value needed is $\hat\varepsilon(t-1)$, which uses $\tau(t-1)$.
(3) RML = Recursive maximum likelihood. This corresponds to the second use of step 1.

$$\tau(t)' = (a_1(t),\dots,a_n(t), b_1(t),\dots,b_n(t), c_1(t),\dots,c_n(t)),$$

$$x(t)' = (-\tilde y(t-1),\dots,-\tilde y(t-n), \tilde u(t-1),\dots,\tilde u(t-n), \tilde\varepsilon(t-1),\dots,\tilde\varepsilon(t-n)),$$

$$w(t) = \tilde y(t) + \hat\varepsilon(t) - \tilde\varepsilon(t), \qquad (7.1)$$

$$\sum_{j=0}^{n}c_j(t)\,\tilde x(t-j)' = (-y(t),\ u(t),\ \hat\varepsilon(t)),\quad c_0(t) = 1;\qquad \tilde x(t) = 0,\ t \le 0, \qquad (7.2)$$

where $\tilde x(t)' = (-\tilde y(t), \tilde u(t), \tilde\varepsilon(t))$. As presented above, each of these could be used as a procedure independently of the others. Of course $n$ is fixed.
It is known that AML may not converge, even if the true system is ARMAX of McMillan degree $n$, unless

$$2\,\mathrm{Re}\{c(e^{i\omega})^{-1} - \tfrac{1}{2}\} > 0,\qquad \omega \in [-\pi, \pi],$$

i.e. unless the positive real condition is satisfied.
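The positive real condition can be checked on a frequency grid. Multiplying the condition by $|c(e^{i\omega})|^2 > 0$ shows that it is equivalent to $|c(e^{i\omega}) - 1| < 1$ for all $\omega$. A minimal sketch; the function name and grid size are assumptions, and a finite grid only approximates the check over all of $[-\pi, \pi]$.

```python
import numpy as np

def positive_real(c, n_grid=4096):
    """Grid check of 2 Re{1/C(e^{iw}) - 1/2} > 0 for C(z) = 1 + c_1 z + ... + c_n z^n.

    Equivalent to |C(e^{iw}) - 1| < 1 on the grid.
    """
    w = np.linspace(-np.pi, np.pi, n_grid)
    z = np.exp(1j * w)
    Cz = 1.0 + sum(cj * z ** (j + 1) for j, cj in enumerate(c))
    return bool(np.all(2.0 * np.real(1.0 / Cz - 0.5) > 0.0))
```

For instance $C(z) = 1 + 1.5z + 0.7z^2$ has both zeros outside the unit circle (modulus about 1.2), yet $|C(1) - 1| = 2.2$, so the condition fails: a stable moving average operator need not be positive real.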
It seems that RML may fail unless the location of the zeros of $\hat C(z)$ is monitored and, when these move inside the unit circle, the $c_j(t)$ used in forming $\tilde y(t)$, $\tilde u(t)$, $\tilde\varepsilon(t)$, $\hat\varepsilon(t)$ are held fixed, at values for which the zeros lie outside the unit circle, until the output vector, $\tau(t)$, corresponds to a stable $c_j(t)$ set, i.e. a set with zeros outside of the circle. For these reasons it has been suggested that the algorithms be run in parallel, with the $e(t-j)$ for AML being provided by the $e(t)$ from RLS, and with the $\hat\varepsilon(t)$ for RML being the $\hat\varepsilon(t)$ from AML. The value of $h$ in RLS would in general be much larger than the value of $n$ in AML and RML, where $n$ is the assumed true order. A common choice would be $h = 2n$, but this is arbitrary.

One main reason for on-line calculation is to allow the estimates to adapt to an evolving mechanism generating the data. In that case one should also be "forgetting" the remote past, since that will be irrelevant to the estimation problem at time $t$. Thus a "forgetting factor" is included in the calculations: at time $t$ the contribution of $x(s)$, $w(s)$ is weighted by

$$\ell_t(s) = \prod_{u=s+1}^{t}\lambda(u),\qquad \ell_s(s) = 1.$$

The nett effect is that only the formula for $P(t)$ is changed, becoming

$$P(t) = \lambda(t)^{-1}\big[P(t-1) - \{\lambda(t) + x(t)'P(t-1)x(t)\}^{-1}P(t-1)x(t)x(t)'P(t-1)\big].$$

One reasonable procedure would be to take $\lambda(t) \equiv \lambda$, where $0 < \lambda < 1$ and $\lambda$ is fairly near to 1, e.g. $\lambda = 0.95$. However it is felt that $h$ and $n$ might be made to depend on $t$. In particular, even if the true system were of known order, $n$, then $h$ will have to increase with $t$ (e.g. as $\log t$) in order that $\tau(t)$ converge to the true values. Of course if $h$ increases with $t$, as it will if AIC or BIC is used to choose $h$, eventually the calculation cannot be done in real time. However, if "forgetting" is used then the sample size is not, truly, increasing with $t$, and thus $h$ should not increase indefinitely. The criterion at time $t$ should be

$$\log\hat\sigma_h^2(t) + h\log f(t)/f(t), \qquad (7.3)$$

where, when "forgetting" is used, $f(t)$ measures the effective sample size and is $f(t+1) = \lambda(t+1)f(t) + 1$, since the effective sample size is

$$f(t) = \sum_{s=1}^{t}\prod_{u=s+1}^{t}\lambda(u).$$

It remains to describe how to compute (7.3) readily when $h$ is allowed so to vary. Though these procedures are described for RLS, where $h$ indicates the order, and for $p = 1$, it will be seen that they can be used in the same way for AML or RML, $n$ taking the place of $h$, and could also fairly easily be generalised to $p > 1$. Indeed they could be generalised to $q > 1$.
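The forgetting-factor update of $P(t)$ is exactly the inverse of $\lambda(t)R(t-1) + x(t)x(t)'$ with $R(t-1) = P(t-1)^{-1}$, and the effective sample size $f(t)$ follows the scalar recursion given above. A minimal sketch with hypothetical function names; `online_criterion` evaluates (7.3) for given $\hat\sigma_h^2(t)$, $h$ and $f(t)$.

```python
import numpy as np

def forgetting_P_update(P, x, lam):
    """P(t) = lam^-1 [P(t-1) - {lam + x'P(t-1)x}^-1 P(t-1) x x' P(t-1)]."""
    Px = P @ x
    return (P - np.outer(Px, Px) / (lam + x @ Px)) / lam

def effective_sample_size(lams_from_2):
    """f(t) from f(1) = 1 and f(s+1) = lambda(s+1) f(s) + 1; pass lambda(2), ..., lambda(t)."""
    f = 1.0
    for lam in lams_from_2:
        f = lam * f + 1.0
    return f

def online_criterion(sigma2_h, h, f):
    """log sigma_h^2(t) + h log f(t) / f(t)."""
    return np.log(sigma2_h) + h * np.log(f) / f
```

With constant $\lambda$ the effective sample size tends to $1/(1-\lambda)$, e.g. 20 for $\lambda = 0.95$, which is why the penalty in (7.3) stops shrinking and $h$ stays bounded.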
Call $x_h(t)$ the vector $x(t)$ when this has been rearranged as

$$x_h(t)' = (-y(t-1), u(t-1), -y(t-2), u(t-2),\dots,-y(t-h), u(t-h)),$$

and rearrange $\tau(t)$ accordingly. Put $v(t)' = (y(1),\dots,y(t))$ and $X_h(t)' = [x_h(1),\dots,x_h(t)]$, and construct

$$Q\,[X_H(t),\ v(t)] = \begin{bmatrix}R_H(t) & r_H(t)\\ 0 & s_H(t)\\ 0 & 0\end{bmatrix},$$

where $Q$ is orthogonal and $R_H(t)$ is upper triangular. If $R_h(t)$ is the leading $2h \times 2h$ block of $R_H(t)$ and $r_h(t)$ consists of the first $2h$ elements of $r_H(t)$, then

$$R_h(t)\,\hat\tau_h(t) = r_h(t),\qquad \hat\sigma_h^2(t) = f(t)^{-1}s_h(t)^2,\qquad s_h(t)^2 = s_H(t)^2 + \sum_{k=2h+1}^{2H}r_{H,k}(t)^2,$$

where $r_{H,k}(t)$ is the $k$th element of $r_H(t)$. Moreover, as now will be indicated, the calculations may be done so that all of the $\hat\sigma_h^2(t)$, $h = 1,\dots,H$, are obtained recursively at little cost. Consider

$$S = \begin{bmatrix}R_H(t) & r_H(t)\\ x_H(t+1)' & y(t+1)\end{bmatrix}$$

and $Q_{2H}Q_{2H-1}\cdots Q_1S$, where $Q_i$ is orthogonal, acts only on rows $i$ and $2H+1$, and introduces a zero in the $(2H+1,i)$th place. If, in $Q_{i-1}\cdots Q_1S$, the rows numbered $i$ and $2H+1$ are (the first $i-1$ entries being zero)

$$(0,\dots,0,\ d^{1/2},\ d^{1/2}r_2,\dots,d^{1/2}r_{2H-i+2}),\qquad (0,\dots,0,\ \delta_{i-1}^{1/2}x_1,\ \delta_{i-1}^{1/2}x_2,\dots,\delta_{i-1}^{1/2}x_{2H-i+2}),$$

then in $Q_iQ_{i-1}\cdots Q_1S$ they become

$$(0,\dots,0,\ d_i^{1/2},\ d_i^{1/2}r_2',\dots,d_i^{1/2}r_{2H-i+2}'),\qquad (0,\dots,0,\ 0,\ \delta_i^{1/2}x_2',\dots,\delta_i^{1/2}x_{2H-i+2}'),$$

where

$$d_i = d + \delta_{i-1}x_1^2,\quad c = d/d_i,\quad s = \delta_{i-1}x_1/d_i,\quad \delta_i = d\,\delta_{i-1}/d_i,$$

$$r_k' = c\,r_k + s\,x_k,\qquad x_k' = x_k - x_1r_k,\qquad k = 2,\dots,2H-i+2,$$

with $r_1 = r_1' = 1$ and $x_1' = 0$. Then the bottom right hand element of $Q_{2h}Q_{2h-1}\cdots Q_1S$ is $\delta_{2h}^{1/2}\hat e_h(t+1)$, and

$$s_h(t+1)^2 = s_h(t)^2 + \delta_{2h}\hat e_h(t+1)^2,\qquad h = 1,\dots,H,$$

at no extra cost. Thus this, and (7.3), defines $\hat h(t)$ recursively, and the whole thing may be done with $H$ fixed.

Thus, given $H$, (7.3) may be computed for all $h \le H$ at little extra cost. This then gives an on-line form of the algorithm of section 5, at least for $q = 1$; how well it performs remains to be seen. If the system was not evolving, then $\lambda(t) \equiv 1$ could be used, and $h$ should be allowed to increase slowly, as $\log t$, until eventually (say $t = 2000$) the calculation becomes so large that the algorithm could not be run in real time. If it was not run in real time it could still be regarded as recursive, and it seems to have some virtues.
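The rotation formulas above are the square-root-free form of a Givens update: each row of the triangular factor is stored as $d^{1/2}(0,\dots,0,1,r_2,\dots)$, so no square roots are taken. A minimal sketch, assuming unit weight $\delta_0 = 1$ for each new row; the class name is hypothetical. For data rows $[x_H(t)', y(t)]$, the last diagonal entry of `d` grows by $\delta\hat e^2$ at each step and equals the residual sum of squares.

```python
import numpy as np

class SqrtFreeQR:
    """Triangular factor R = diag(d)**0.5 @ Rbar of the rows added so far.

    Rbar is unit upper triangular; with rows [x', y], d[-1] is the residual
    sum of squares of the least squares regression of y on x.
    """

    def __init__(self, ncols):
        self.d = np.zeros(ncols)
        self.Rbar = np.eye(ncols)

    def add_row(self, row, delta=1.0):
        """Rotate one new row (weight delta) into the factor, column by column."""
        x = np.array(row, dtype=float)
        for i in range(x.size):
            if x[i] == 0.0 or delta == 0.0:
                continue                                   # nothing to eliminate in column i
            d_new = self.d[i] + delta * x[i] ** 2          # d_i = d + delta x_1^2
            c = self.d[i] / d_new                          # c = d / d_i
            s = delta * x[i] / d_new                       # s = delta x_1 / d_i
            r_old = self.Rbar[i, i + 1:].copy()
            self.Rbar[i, i + 1:] = c * r_old + s * x[i + 1:]   # r_k' = c r_k + s x_k
            x[i + 1:] = x[i + 1:] - x[i] * r_old               # x_k' = x_k - x_1 r_k
            delta *= c                                     # delta_i = d delta_{i-1} / d_i
            self.d[i] = d_new
            x[i] = 0.0
        return delta
```

The regression coefficients follow from the unit triangular block alone, `np.linalg.solve(Rbar[:r, :r], Rbar[:r, r])`, since the scaling $d^{1/2}$ cancels. Multiplying `d` by $\lambda$ before each `add_row` scales the stored sums of squares and products by $\lambda$, which is one way to combine this update with forgetting.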
Allowing $h$ and $n$ to increase needs investigation, but could prove useful, with $\lambda(t)$ adroitly varied, to model an evolving phenomenon, or even a non-linear, episodic phenomenon. When $h$ or $n$ varies, it is likely that occasionally they will change appreciably from one value of $t$ to another. This is because (7.3) is likely to be flat near its minimum, or even to have several minima near to equality. This may not matter much, since all of the competing models are behaving about equally well, but it could be misinterpreted as evolution.

Notes on References.

The field of on-line calculation is extensively surveyed in Ljung and Söderström (1983). The basic procedure of this section, for $h$, $n$ fixed, was suggested in Mayne, Åström and Clarke (1983), and the procedure for $h$, $n$ varying in Hannan, Kavalieris and Mackisack (1986).
References.

Adamyan, V.M., Arov, D.Z. and Krein, M.G. (1971) Analytic properties of Schmidt pairs for a Hankel operator and the generalized Schur-Takagi problem. Math. USSR Sbornik, 15, 31-73.

Akaike, H. (1969) Fitting autoregressive models for prediction. Ann. Inst. Statist. Math. 21, 243-247.

Akaike, H. (1976) Canonical correlation analysis of time series and the use of an information criterion. In: System Identification: Advances and Case Studies, eds. R.K. Mehra and D.G. Lainiotis, Academic Press, New York, 27-96.

Anderson, B.D.O. and Moore, J.B. (1979) Optimal Filtering. Prentice Hall, Englewood Cliffs.

Casti, J.L. (1977) Dynamical Systems and Their Applications. Academic Press, New York.

Cooley, J.W. and Tukey, J.W. (1965) An algorithm for the machine calculation of complex Fourier series. Math. Computation, 19, 297-301.

Deistler, M. and Hannan, E.J. (1981) Some properties of the parameterization of ARMA systems with unknown order. J. Multivariate Anal. 11, 474-484.

Friedlander, B. (1982) Lattice filters for adaptive processing. Proc. IEEE, 70, 830-867.

Glover, K. (1983) All optimal Hankel norm approximations of linear multivariable systems and their L∞ error bounds. Research Report, Control and Management Systems Division, Engineering Dept., Cambridge, England.

Hannan, E.J. (1970) Multiple Time Series. Wiley, New York.

Hannan, E.J. (1980) The estimation of the order of an ARMA process. Ann. Statist. 8, 1071-1081.

Hannan, E.J. (1981) Estimating the dimension of a linear system. J. Multivariate Anal. 11, 459-473.

Hannan, E.J. (1982) Testing for autocorrelation and Akaike's criterion. In: Essays in Statistical Science, eds. J.M. Gani and E.J. Hannan, Applied Probability Trust, Sheffield, 403-412.

Hannan, E.J. and Kavalieris, L. (1984) Multivariate linear time series models. Adv. Appl. Prob. 16, 492-561.

Hannan, E.J. and Kavalieris, L. (1986) Regression, autoregression models. J. Time Series Anal. 7.

Hannan, E.J., Kavalieris, L. and Mackisack, M. (1986) Recursive estimation of linear systems. Biometrika, 73, no. 1.

Hannan, E.J. and Rissanen, J. (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika, 69, 81-94.

Jewell, N.P. and Bloomfield, P. (1983) Canonical correlations of past and future for time series: definitions and theory. Ann. Statist. 11, 837-847.

Jewell, N.P. and Bloomfield, P. (1983,a) Canonical correlations of past and future for time series: bounds and computation. Ann. Statist. 11, 848-855.

Kailath, T. (1980) Linear Systems. Prentice Hall, Englewood Cliffs.

Ljung, L. and Söderström, T. (1983) Theory and Practice of Recursive Identification. MIT Press, Cambridge, Mass.

Mayne, D.Q., Åström, K.J. and Clarke, J.M. (1983) A new algorithm for recursive estimation of parameters in controlled ARMA processes. Research Report, Dept. of Electrical Engineering, Imperial College, London.

Rissanen, J. (1983) A universal prior for integers and estimation by minimum description length. Ann. Statist. 11, 416-431.

Shibata, R. (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann. Statist. 8, 147-164.

Tunnicliffe Wilson, G. (1969) Factorization of the covariance generating function of a pure moving-average process. SIAM J. Numer. Anal. 6, 1-7.

Tunnicliffe Wilson, G. (1972) The factorization of matricial spectral densities. SIAM J. Appl. Math. 23, 420-426.

Whittle, P. (1963) On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.
Chapter 2

Linear Errors-in-Variables Models

Manfred Deistler

1. Introduction

In this contribution we are concerned with some aspects of the identification problem for linear systems where both inputs and outputs are subject to ("observational") errors. Models of this kind are called errors-in-variables (EV) models. The conventional setting in the statistical analysis of linear systems is to attribute all errors to the outputs or, equivalently (for our purposes), to add the errors to the equations. This gives the errors-in-equations (EE) models.

Let $\hat x_t$ and $\hat y_t$ denote the "true" inputs and outputs respectively, and let $x_t$ and $y_t$ denote the observed inputs and outputs. Then the situation can be illustrated as follows. EV models are of the form:

[Fig. 1: Schematic representation of an EV model]

There $u_t$ and $v_t$ are the errors of the inputs and the outputs respectively. On the other hand EE models are of the form:

[Fig. 2: Schematic representation of an EE model]
Of course the EV setting is more general than the EE setting. For a number of purposes, e.g. for the prediction of the observed outputs from observed inputs, the EE setting is adequate. In many cases, however, the EV setting seems to be more appropriate, e.g.

(i) if our main interest concerns the "true" system generating the data (rather than a good representation of the data) and if we cannot be sure a priori that the true inputs are not contaminated by errors;

(ii) if we want to decouple the common effect between the variables from the individual effects;

(iii) if there is no a priori classification of the observed variables into inputs and outputs, and if thus a symmetric treatment of the variables would be appropriate.
We are dealing here only with linear systems in a stationary context. Also, if the contrary has not been stated explicitly, we restrict ourselves to the single input - single output case. Our primary interest is in the characteristics of the system, i.e. in the transfer function (or the parameters of the transfer function); but also the characteristics of the errors and of $(\hat x_t)$ are of interest.

The statistical theory of linear dynamic EE systems, especially of ARMAX systems (also in the multiinput-multioutput case), has reached a certain stage of completeness now (see Hannan and Kavalieris (1984)). In the EV case, on the other hand, there is still a great number of open problems, and this is the reason why there is still a relatively small number of applications in this field. The main problems in the EV case arise from the fact that the (ensemble) second moments of the observations do in general not uniquely determine the transfer function of the system. Another difference to EE models is that in the EV case higher order moments may contain additional information about the transfer function (in the non Gaussian case).

Our emphasis is on two problems. The first is the problem of identifiability, i.e. the problem whether the characteristics of interest mentioned above can be uniquely determined from certain characteristics of the observations, as e.g. from their (ensemble) second moments or from their probability law (see Deistler and Seifert (1978)). If the answer is negative, then the second problem is to describe the sets of observationally equivalent characteristics of interest, i.e. the sets of characteristics of interest which correspond to the same characteristics of the observations. These questions are questions preceding estimation in the narrow sense and, as has been stated already, they turn out to be the main difficulty in the process of estimation (or inference) in EV models. This difficulty is the reason why not very much attention has been paid to EV models for a long time. However, in the last decade there has been a resurging interest in EV models in econometrics, statistics and system theory, see e.g. Aigner and Goldberger (1977), Aigner et al. (1984), Anderson B.D.O. (1985), Anderson and Deistler (1984), Anderson T.W. (1984), Deistler (1984), Deistler (1985a), Fuller (1980), Green and Anderson (1985), Hinich and Weber (1984), Kalman (1982), Kalman (1983), Maravall (1979), Picci (1985), Schneeweiß und Mittag (1985), Söderström (1980), Wegge (1983).
The paper is organized as follows. In section 2 we repeat some well known results for the static case. In sections 3 to 5 we consider the (dynamic) case when the characteristics of the observations considered are their second moments. Thereby in section 3 the set of all transfer functions corresponding to given second moments of the observations is described. Section 4 deals with the same problem when the system is a priori known to be causal, and with the problem whether causality can be detected from the second moments of the observations. In section 5 several conditions for identifiability are given. Finally, in section 6 we derive conditions for identifiability using information coming from moments of order greater than two.
The system considered is of the form

$$\hat y_t = w(B)\hat x_t \qquad (1.1)$$

where $B$ is a complex variable as well as the backward-shift operator on $\mathbb{Z}$, and where

$$w(B) = \sum_{i=-\infty}^{\infty}w_iB^i \qquad (1.2)$$

is the transfer function. The summation on the r.h.s. of (1.2) ranges over all integers, and thus in general the system is not a priori assumed to be causal. The observed processes $(x_t)$ and $(y_t)$ are given by

$$x_t = \hat x_t + u_t \qquad (1.3)$$

$$y_t = \hat y_t + v_t \qquad (1.4)$$

We assume throughout:

(1.5) All processes considered are (wide sense) stationary; all limits of random variables are understood in the sense of mean squares convergence.

(1.6) $E\hat x_t = Eu_t = Ev_t = 0$

(1.7) $E\hat x_su_t = E\hat x_sv_t = 0\quad \forall s,t$

and

(1.8) $(u_t, v_t)$ has a spectral density, $\Sigma$ say.

These assumptions are called the standard assumptions here, and they will not be further explicitly restated. The assumption $E\hat x_t = 0$ is imposed for notational convenience only and may easily be relaxed. Also the assumption (1.7) is natural in our context, and (1.8) is natural for errors. In many cases we in addition assume
(1.9) $Eu_sv_t = 0\quad \forall s,t$, i.e. $\Sigma$ is diagonal

(1.10) All processes considered have a spectral density.

Thereby, if $(z_t)$ is a stationary process, we often use $f_z$ to denote its spectral density. Assumption (1.9) means that all (linear) common effects between $(x_t)$ and $(y_t)$ are due to the system, and that only individual effects are attributed to the errors. Of course situations may occur where such an assumption can not be justified, e.g. if the errors in the measurement devices for inputs and outputs are correlated. Without any additional assumption the situation is hopeless, because then "too many" systems correspond to given second moments of the observations. Additional information to separate the errors could be obtained from certain frequency domain properties of the errors, or from higher order moments.
2. The Static Case

Here we consider the special case where the system is static, i.e. the transfer function $w$ is simply the slope parameter of a line, and all processes are white noise. This case has been discussed in great detail in the literature, see e.g. Gini (1921), Frisch (1934), the surveys by Madansky (1959), Moran (1971), Aigner et al. (1984) and T.W. Anderson (1984). For the multivariable case, which is much more complicated, see Kalman (1982) and Klepper and Leamer (1984).

The static EV model is written as

$$\hat y_t = a\hat x_t;\qquad a \in \mathbb{R} \qquad (2.1)$$

and (1.3), (1.4), where in addition we assume that $(\hat x_t)$, $(u_t)$ and $(v_t)$ are white noise, and thus

$$E\hat x_s\hat x_t = \delta_{st}\sigma_{\hat x};\qquad Eu_su_t = \delta_{st}\sigma_u;\qquad Ev_sv_t = \delta_{st}\sigma_v,$$

and (1.9), i.e. $Eu_sv_t = 0$.
If we try to write (2.1), (1.3), (1.4) as a "regression" in the observed variables, we obtain

$$y_t = ax_t + (v_t - au_t).$$

But here $Ex_t(v_t - au_t) = -a\sigma_u$, and thus in general (ordinary) least squares estimators will not be consistent. Therefore we have to investigate the problem in more detail. The parameters of interest are $\theta = (a, \sigma_{\hat x}, \sigma_u, \sigma_v)$. The relation between these parameters and the second moments of the observations is given by

$$\sigma_x = Ex_t^2 = \sigma_{\hat x} + \sigma_u \qquad (2.2)$$

$$\sigma_{xy} = Ex_ty_t = E\hat x_t\hat y_t = a\sigma_{\hat x} \qquad (2.3)$$

$$\sigma_y = Ey_t^2 = a^2\sigma_{\hat x} + \sigma_v \qquad (2.4)$$

Thus the problem of identifiability from second moments for this model is whether $\theta$ is uniquely determined from $\sigma_x$, $\sigma_{xy}$, $\sigma_y$.

A slightly more general model would be of the form

$$b\hat y_t = a\hat x_t \qquad (2.5)$$

(where $a$ and $b$ are suitably normalized, e.g. by $a^2 + b^2 = 1$) and (1.3), (1.4). Sloppily speaking, here we allow for the case $a = \infty$ in (2.1). Then the problem of observational equivalence is equivalent to the following "Frisch" problem (see Kalman (1982)): Given the covariance matrix

$$K = \begin{pmatrix}\sigma_x & \sigma_{xy}\\ \sigma_{yx} & \sigma_y\end{pmatrix},$$

find all decompositions

$$K = \hat K + \tilde K \qquad (2.6)$$

into covariance (i.e. symmetric, nonnegative definite) matrices $\hat K$ and $\tilde K$, such that $\hat K$ is singular and $\tilde K$ is diagonal. This equivalence is straightforward: here $\hat K$ is the covariance matrix of $(\hat x_t, \hat y_t)$; $a$ and $b$, after suitable normalization, are defined from the linear dependence relations in $\hat K$; and

$$\tilde K = \begin{pmatrix}\sigma_u & 0\\ 0 & \sigma_v\end{pmatrix}.$$

In the case (2.1) (which excludes the possibility $b = 0$ in (2.5), and which is the only one we treat here, unless the contrary has been explicitly stated)

$$\hat K = \sigma_{\hat x}\begin{pmatrix}1 & a\\ a & a^2\end{pmatrix}$$

holds.
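Equations (2.2) to (2.4) can be solved for $(\sigma_{\hat x}, \sigma_u, \sigma_v)$ for any candidate slope $a \neq 0$, and the solution consists of admissible variances exactly when, for $\sigma_{xy} > 0$, $\sigma_{xy}/\sigma_x \le a \le \sigma_y/\sigma_{xy}$. Every such $a$ therefore gives a decomposition (2.6). The lower end is the probability limit of the OLS slope of $y_t$ on $x_t$, the upper end the reciprocal of the reverse-regression slope. A sketch with hypothetical function names; the `s_*` arguments are variances, as the $\sigma$'s are in the text.

```python
import numpy as np

def observed_moments(a, s_xhat, s_u, s_v):
    """Second moments (2.2)-(2.4) of the observations."""
    return s_xhat + s_u, a * s_xhat, a * a * s_xhat + s_v

def ols_slope_limit(a, s_xhat, s_u):
    """Probability limit s_xy / s_x of the OLS slope of y_t on x_t."""
    return a * s_xhat / (s_xhat + s_u)

def frisch_interval(s_x, s_xy, s_y):
    """Slopes a giving nonnegative variances in (2.2)-(2.4), assuming s_xy > 0."""
    if s_xy <= 0:
        raise ValueError("this sketch only handles s_xy > 0")
    return s_xy / s_x, s_y / s_xy

def decomposition(a, s_x, s_xy, s_y):
    """Solve (2.2)-(2.4) for (s_xhat, s_u, s_v) given a candidate slope a != 0."""
    s_xhat = s_xy / a
    return s_xhat, s_x - s_xhat, s_y - a * s_xy
```

With $a = 2$, $\sigma_{\hat x} = 1$, $\sigma_u = 0.5$, $\sigma_v = 0.3$ the observed moments are $(1.5, 2, 4.3)$ and the interval is $[4/3, 2.15]$. The slope $1.5$, for instance, reproduces the same moments with $\sigma_{\hat x} = 4/3$, $\sigma_u = 1/6$, $\sigma_v = 1.3$, so second moments alone cannot single out the true $a = 2$.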
By the singularity of $\hat K$ we have

$$\det \hat K = \sigma_{\hat x}\sigma_{\hat y} - \sigma_{xy}^2 = 0, \tag{2.7}$$

where $\sigma_{\hat y} = E\hat y_t^2$, and furthermore (2.8) …

$$C_{y^r x^{n-r}} = \operatorname{cum}(\underbrace{y_t, \ldots, y_t}_{r \text{ times}}, \underbrace{x_t, \ldots, x_t}_{n-r \text{ times}}) = a^r C_{\hat x^n}, \qquad r > 0,\ n - r > 0, \tag{2.16}$$

since $v$ and $u$ are independent.

Let us assume that $\hat x_t$ is non-Gaussian and that $a \neq 0$. Then there is an $n > 2$ such that $C_{\hat x^n} \neq 0$. Using (2.16) we obtain, for $r - 1 > 0$ and $n - r > 0$ (as opposed to the case $n = 2$, where no such $r$ exists),

$$a = \frac{C_{y^r x^{n-r}}}{C_{y^{r-1} x^{n+1-r}}}. \tag{2.18}$$

Thus $a$ is uniquely determined from the cumulants of the observed processes. Once $a$ is determined, $\sigma_{\hat x}$, $\sigma_u$ and $\sigma_v$ can be uniquely determined from (2.2) - (2.4). We have thus shown:

Theorem 2.2: Consider the static EV model (2.1) (1.3) (1.4). Under the assumptions that $\hat x_t$ is non-Gaussian and that $a \neq 0$, the model is identifiable.

This theorem can be extended to the multivariate case in a straightforward manner. If instead we postulate that $(u_t, v_t)$ is Gaussian, then $C_{u^r v^{n-r}} = 0$ whenever $n > 2$. Consequently (2.16) holds for all $n > 2$ and all $r$, including $r = 0$, and $a$ is uniquely determined from (2.18) with $r = 1$, provided that there is an $n > 2$ such that $C_{x^n} \neq 0$.
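The ratio (2.18) leads directly to a simple moment estimator. The sketch below is a minimal illustration under our own assumptions: a centred exponential latent input (skewed, hence non-Gaussian, with third cumulant 2), Gaussian errors, and $n = 3$, $r = 2$, so that $a = C_{y^2 x}/C_{y x^2}$. The function name is hypothetical. For $n = 3$ the cumulants coincide with third central moments.

```python
import random

def third_order_cross_cumulants(xs, ys):
    """Sample C_{y^2 x} and C_{y x^2}. Third-order cumulants equal third central
    moments, so the data are centred before the products are averaged."""
    T = len(xs)
    mx, my = sum(xs) / T, sum(ys) / T
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    c_yyx = sum(y * y * x for x, y in zip(xc, yc)) / T
    c_yxx = sum(y * x * x for x, y in zip(xc, yc)) / T
    return c_yyx, c_yxx

# Illustrative data (assumption): skewed latent input, Gaussian errors, a = 2
rng = random.Random(1)
a, T = 2.0, 200_000
xs, ys = [], []
for _ in range(T):
    x_hat = rng.expovariate(1.0) - 1.0          # centred exponential, third cumulant 2
    xs.append(x_hat + rng.gauss(0.0, 1.0))      # x_t = x_hat_t + u_t
    ys.append(a * x_hat + rng.gauss(0.0, 0.7))  # y_t = a x_hat_t + v_t
c_yyx, c_yxx = third_order_cross_cumulants(xs, ys)
print(round(c_yyx / c_yxx, 2))   # estimate of a from (2.18) with n = 3, r = 2
```

Unlike the least squares slope, this ratio is not affected by $\sigma_u$. Sample higher-order cumulants are, however, typically much noisier than sample second moments, which is one reason why the choice and combination of cumulants matters.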
Now, let us make a few remarks on estimation. The negative logarithmic Gaussian likelihood function (constants neglected) is of the form

$$L_T(\theta) = T \log\det K(\theta) + \sum_{t=1}^{T} (x_t, y_t)\, K^{-1}(\theta)\, (x_t, y_t)'. \tag{2.19}$$

Here $K(\theta)$ is the covariance matrix $K$ in (2.6) corresponding to the parameters $\theta$, and $T$ is the sample size. Let

$$K_T = \begin{pmatrix} \sigma_{x,T} & \sigma_{xy,T} \\ \sigma_{xy,T} & \sigma_{y,T} \end{pmatrix} = T^{-1} \sum_{t=1}^{T} \begin{pmatrix} x_t \\ y_t \end{pmatrix} (x_t, y_t).$$

According to theorem 2.1, the corresponding maximum likelihood estimator (MLE), obtained by minimizing $L_T(\theta)$, is an estimate of the set of all parameters of the static model (2.1) (1.3) (1.4) that are observationally equivalent to the true one. For the case $\sigma_{xy,T} > 0$, say, it is given by

$$I_T = \{\theta = (a,\ a^{-1}\sigma_{xy,T},\ \sigma_{x,T} - a^{-1}\sigma_{xy,T},\ \sigma_{y,T} - a\,\sigma_{xy,T}) : a \in [\sigma_{xy,T}\,\sigma_{x,T}^{-1},\ \sigma_{xy,T}^{-1}\,\sigma_{y,T}]\}. \tag{2.20}$$

In the non-Gaussian case, (2.18) can of course be used to define a consistent estimator of $a$ from the sample cumulants. The question then arises which cumulants should be selected; information coming from sample cumulants of different orders can also be combined. The reader is referred to Drion (1951) and Scott (1950). Note also that the estimators obtained from (2.18) do not necessarily satisfy the restrictions coming from the second moments.

3. Second Moments and Dynamic Models: The General Case

From now on, linear dynamic systems are considered. In this section and in the two that follow, only the information coming from the second moments of the observed processes $(x_t, y_t)$ is used.
For the moment, let $x_t$ and $y_t$ be not necessarily one-dimensional. Let $z_t = (x_t', y_t')'$, $\hat z_t = (\hat x_t', \hat y_t')'$ and $w_t = (u_t', v_t')'$. The general form of a linear dynamic system is:

$$\lim_{N\to\infty} \sum_{i=-N}^{N} w_i\, \hat z_{t-i} = 0, \qquad w_i \in \mathbb R^{m \times n};\ m > 0. \tag{3.1}$$

We assume … (3.1) can be written as

$$\sum_{i=-\infty}^{\infty} w_i\, \hat z_{t-i} = 0.$$
Clearly, for $n = 2$, $\hat f(\lambda)$ has either rank one or rank zero. In the second case $\hat f(\lambda) = 0$ and $f_{xy}(\lambda) = 0$. Imposing (3.4) implies that $m^* = 1$ and that $\hat f(\lambda)$ has rank 1 for all $\lambda$. (3.4) is automatically fulfilled if $f_{xy}(\lambda) \neq 0$ for all $\lambda$. Thus under (3.4) the system can always be written as (1.1), where $w = (w, -1)$ is unique for given $\hat f$, and $f_{xy}(\lambda) = 0$ implies $w(e^{-i\lambda}) = 0$. If $f$ itself is singular, then $\hat f = f$ and $\tilde f = 0$ define a decomposition corresponding to an error-free system. This decomposition is unique whenever $f_{xy}(\lambda) \neq 0$. For $\lambda$'s where $f_{xy}(\lambda) = 0$ we may have, e.g., $f_y(\lambda) = 0$ with $f_x(\lambda) > 0$; then any split with $\hat f_x(\lambda) > 0$ and $f_u(\lambda) > 0$ gives rise to another decomposition. Of course, in this case the corresponding transfer function satisfies $w(e^{-i\lambda}) = 0$. For the rest of the paper we always assume that $f(\lambda)$ is nonsingular on a set of Lebesgue measure greater than zero.
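Besides the transfer function $w$, the other characteristics of interest are $f_{\hat x}$, $f_u$ and $f_v$. Analogously to the static case, the set of pairs $(\hat f_x, \hat f_y)$ compatible with given $f$ satisfies

$$0 \le \hat f_x \le f_x, \qquad \hat f_x(\lambda) = \hat f_x(-\lambda), \qquad \hat f_x \text{ measurable} \tag{3.5}$$

$$0 \le \hat f_y \le f_y, \qquad \hat f_y(\lambda) = \hat f_y(-\lambda), \qquad \hat f_y \text{ measurable} \tag{3.6}$$

and, since $\hat f$ is singular,

$$|f_{xy}|^2 = \hat f_x \hat f_y. \tag{3.7}$$

(3.5), (3.6) and (3.7) are the only restrictions on $(\hat f_x, \hat f_y)$. Thus we have (Anderson and Deistler (1984), Deistler (1985a)):

Theorem 3.1: Consider the linear dynamic EV system. Under the additional assumptions (1.9)(1.10), all transfer functions $w$ satisfying

$$|f_{yx}(\lambda)|\, f_x^{-1}(\lambda) \le |w(e^{-i\lambda})| \le f_y(\lambda)\, |f_{yx}(\lambda)|^{-1} \tag{3.8}$$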
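The bounds $|f_{yx}| f_x^{-1} \le |w| \le f_y |f_{yx}|^{-1}$ in (3.8) can be checked on a simple example. The sketch below assumes, for illustration only, that $\hat x_t$, $u_t$ and $v_t$ are mutually uncorrelated white noises and that $w(e^{-i\lambda}) = b_0 + b_1 e^{-i\lambda}$. Then $f_x = f_{\hat x} + f_u$, $f_y = |w|^2 f_{\hat x} + f_v$ and $f_{yx} = w f_{\hat x}$. The common factor $1/2\pi$ of the spectral densities cancels in both ratios and is omitted. The function name is hypothetical.

```python
import cmath
import math

def bound_check(b0, b1, s_xhat, s_u, s_v, n_freq=64):
    """Return (lower, |w|, upper) of (3.8) on a frequency grid for the
    illustrative white-noise example; densities omit the factor 1/(2*pi)."""
    rows = []
    for k in range(n_freq):
        lam = -math.pi + 2.0 * math.pi * k / n_freq
        w = b0 + b1 * cmath.exp(-1j * lam)   # transfer function w(e^{-i lambda})
        f_x = s_xhat + s_u                   # f_x = f_xhat + f_u
        f_y = abs(w) ** 2 * s_xhat + s_v     # f_y = |w|^2 f_xhat + f_v
        f_yx = w * s_xhat                    # cross spectrum f_yx = w f_xhat
        rows.append((abs(f_yx) / f_x, abs(w), f_y / abs(f_yx)))
    return rows

rows = bound_check(b0=1.0, b1=0.5, s_xhat=1.0, s_u=0.3, s_v=0.2)
print(all(lo <= mid <= hi for lo, mid, hi in rows))   # True
```

The lower bound is attained when $f_u = 0$ and the upper bound when $f_v = 0$. This mirrors the two extreme regressions of the static case.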
Ifyx(1) l-f~1(~) 0, t h e n
If
(4.12)
of w of m o d u l u s
(4.1)
if t h e r e
and a constant
is a t r a n s f e r
(vi)
and
satisfying
n is u n i q u e l y
if a n d o n l y
which
(and
of p o l e s
see t h a t
functions
no transfer
(v)
we
shown
Let
If n = 0,
(4.11)
- ~a O is t h e n u m b e r
the n u m b e r
(4.13)
the
then
(4.9)
transfer
fxy)
(iv)
via
yx
w and w are
(4.14)
(ii)
f
hold,
w = b + b - B ~ b ° ( a + a - B~a°) -I
. Thus we have
(i)
(4.15)
+ ~b O - ~a
(4.11) a n d
Theorem
-
a w a n d f^ g i v i n g x
Now,
From
(4.13)
transfer
function,
but
there
is m i n i p h a s e
the
transfer
w = a-lb exists
functions
and w correspond
a polynomial
f2
are causal, to the
satisfying
same (4.13)
(4.16)
0 < $f2 0 such that = c. (b+~ -) (a+~ - ) - 1 . f 2 ~ 2 . B n - ~ f 2
(4.17)
holds Proof:
If in
(4.11) w e take f2 = a w = c ( b + ~ -)
and fl
(a+~-)-lB n
w h e r e ( b + ~ - ( a + ~ -) is a causal and m i n i p h a s e with
(ii) this implies (iii) - (v), and then (vi) is also easily seen.

This result has been stated in Deistler (1985a) and Deistler (1985b) for the single input - single output case, and in Green and Anderson (1985) for a causal multivariable case. Partly more general results have been given for the causal, not necessarily rational transfer function case in Anderson, B.D.O. (1985).

From (4.17) we see that the causality assumption gives a substantial reduction of the set of all transfer functions compatible with given $f_{yx}$. If $n = 0$, then in the causal case $w$ is unique up to multiplication by a positive constant (this has been pointed out by Hinich (1983) and Anderson, B.D.O. (1985)). An estimation procedure for the case $n = 0$ has been developed by Hinich (1983) and Hinich and Weber (1984).

5. Conditions for Identifiability from the Second Moments of the Observations

This section consists of two parts. In the first part we investigate the additional information coming from (4.8) and (4.10); this is the second step of the analysis of the previous section, for the rational case. Special emphasis is put on identifiability. In the second part we give some other conditions for identifiability.

We have (see Anderson and Deistler (1984), Anderson, B.D.O. (1985), Deistler (1985a) (1985b), Maravall (1979), Nowak (1983)):

Theorem 5.1: Let the assumptions (4.1) - (4.5) hold. Then:

(i) If the transfer functions are a priori known to be causal, and if either $n = 0$ or
(5.1) $d$, $b$ are relatively prime
(5.2) $a$, $e$ are relatively prime
(5.3) $b^+$, $\tilde b^-$ are relatively prime

then $w$ is uniquely determined from $f$ under each of the following conditions:

(5.4) $d$, $c$ are relatively prime and $\partial d > 0$
(5.5) $a \cdot d$, $f$ are relatively prime and $\partial(ad) > 0$
(5.6) $\partial d = 0$ and $\partial e > \partial h - \partial c$
(5.7) $\partial(ad) = 0$ and $\partial e + \partial b > \partial g - \partial f$

(ii) If $w$ is a priori assumed to be causal, and if $d$ and $c$ are a priori assumed to be relatively prime, then there is only a finite number of transfer functions compatible with given $f$.

(iii) If the transfer functions are not necessarily causal, and if the assumptions (4.14) and (5.1) - (5.3) hold together with

(5.8) $a$, $e$ are relatively prime
(5.9) $a^+$, $\tilde a^-$ are relatively prime,

then $w$ is unique under (5.4) or (5.5) or (5.6) or (5.7).

(iv) If $w, \hat f_x$ correspond to given $f_{yx}$, then all $cw, c^{-1}\hat f_x$ satisfying

$$0 < c_{\min} \le c \le c_{\max}, \tag{5.10}$$

with $c_{\min}$ and $c_{\max}$ defined by

$$\min_{|B|=1}\big(f_x(B) - c_{\min}^{-1}\hat f_x(B)\big) = 0 \tag{5.11}$$

and

$$\min_{|B|=1}\big(f_y(B) - c_{\max}|w(B)|^2 \hat f_x(B)\big) = 0, \tag{5.12}$$

correspond to given $f$; for all other $c$, $cw, c^{-1}\hat f_x$ do not correspond to given $f$.

Proof: (i): If $n = 0$, then, as has already been stated, $w$ is uniquely determined from (4.9) up to multiplication by a positive constant. The same holds under (5.1) - (5.3). Due to (5.1) and (5.2), no pole-zero cancellations can occur on the r.h.s. of (4.9), and thus $a$ and $d$ are uniquely determined from the poles of $f_{yx}$. By (5.3), $e$ is then uniquely determined from those zeros $B_i$ of $a d d^* f_{yx}$ for which $B_i^{-1}$ is also a zero. Thus $b$ and $w$ are unique up to multiplication by a positive constant.

From (4.8) we have

$$d f_x d^* = e\,\sigma_\varepsilon\, e^* + d c^{-1} h\, \sigma_u\, h^* c^{-1*} d^*. \tag{5.13}$$

If (5.4) holds, then there exists at least one zero of $d$, $B_1$ say, and we have $d f_x d^*(B_1) = e\,\sigma_\varepsilon\, e^*(B_1)$; from this, $\sigma_\varepsilon$ and thus $b$ are uniquely determined. The proof for (5.5) is completely analogous. If (5.6) holds, then (4.8) is of the form

$$f_x = e\,\sigma_\varepsilon\, e^* + c^{-1} h\, \sigma_u\, h^* c^{-1*},$$

and thus $c$ is uniquely obtained from the poles of $f_x$. Then $\sigma_\varepsilon$ is obtained by comparing the coefficients of the power $\partial e + \partial c$ in

$$c f_x c^* = c\, e\, \sigma_\varepsilon\, e^* c^* + h\, \sigma_u\, h^*,$$

and we proceed in the same way if (5.7) holds. The proof of (iii) is completely analogous, since (5.1) - (5.3) and (5.8) again guarantee that $w$ is determined from $f_{yx}$ up to multiplication by a positive constant.
(ii): If $d$ and $c$ are relatively prime, then all zeros of $d$ are poles of $f_x$. Thus there is only a finite number of candidates for $d$, and hence also for $f_2$ in (4.17). (iv) is an immediate consequence of Theorem 4.1 and of (4.8) and (4.10), taking into account the nonnegativity of spectral densities for $|B| = 1$.
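Part (iv) of Theorem 5.1 describes a one-parameter family of rescalings. Requiring $f_x - c^{-1}\hat f_x \ge 0$ and $f_v = f_y - c|w|^2\hat f_x \ge 0$ on $|B| = 1$ turns (5.11) and (5.12) into $c_{\min} = \max_\lambda \hat f_x/f_x$ and $c_{\max} = \min_\lambda f_y/(|w|^2 \hat f_x)$. The sketch below approximates both on a frequency grid for a hypothetical scalar example of our own choosing: $\hat f_x = |1 - 0.5e^{-i\lambda}|^{-2}$, $f_u = 0.5$, $w(e^{-i\lambda}) = 1 + 0.3e^{-i\lambda}$, $f_v = 0.2$, with the common factor $1/2\pi$ omitted. The grid extrema only approximate the exact extrema in general. In this example both are attained at $\lambda = 0$, which lies on the grid.

```python
import cmath
import math

def scaling_interval(f_x, f_y, f_xhat, w, n_freq=512):
    """Grid approximation of c_min, c_max in (5.11)-(5.12).
    f_x, f_y, f_xhat: callables lambda -> spectral density value;
    w: callable lambda -> complex value w(e^{-i lambda})."""
    grid = [-math.pi + 2.0 * math.pi * k / n_freq for k in range(n_freq)]
    # f_x - c^{-1} f_xhat >= 0 for all lambda  <=>  c >= max f_xhat / f_x
    c_min = max(f_xhat(lam) / f_x(lam) for lam in grid)
    # f_y - c |w|^2 f_xhat >= 0 for all lambda  <=>  c <= min f_y / (|w|^2 f_xhat)
    c_max = min(f_y(lam) / (abs(w(lam)) ** 2 * f_xhat(lam)) for lam in grid)
    return c_min, c_max

# Hypothetical example spectra (not from the text)
def f_xhat(lam):
    return 1.0 / abs(1.0 - 0.5 * cmath.exp(-1j * lam)) ** 2

def w(lam):
    return 1.0 + 0.3 * cmath.exp(-1j * lam)

def f_x(lam):
    return f_xhat(lam) + 0.5                 # f_u = 0.5

def f_y(lam):
    return abs(w(lam)) ** 2 * f_xhat(lam) + 0.2   # f_v = 0.2

c_min, c_max = scaling_interval(f_x, f_y, f_xhat, w)
print(round(c_min, 3), round(c_max, 3))   # 0.889 1.03
```

The true scaling $c = 1$ lies inside $[c_{\min}, c_{\max}]$. Every other $c$ in this interval yields an equally admissible pair $cw, c^{-1}\hat f_x$, so without the extra conditions of parts (i) and (iii) the transfer function is only determined up to such a rescaling.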
Clearly, once $w$ is uniquely determined, and if $w(e^{-i\lambda}) \neq 0$, then $f_{\hat x}$, $f_u$ and $f_v$ are also unique. (i) and (iii) show that, e.g. using (4.19), once the degrees are prescribed and if $\partial d > 0$, we have identifiability on a generic subset of the parameter space.

Now we discuss some other cases where additional a priori restrictions guarantee identifiability from the second moments of the observations.

(i) Let the inputs $\hat x_t$ have a spectral distribution

$$F_{\hat x}(\lambda) = \int_{[-\pi,\lambda]} f_{\hat x}\, d\lambda + \sum_{j:\lambda_j \le \lambda} F_{x,j}. \tag{5.14}$$

Thus $(\hat x_t)$ is a fairly general process: $F_{\hat x}$ has an absolutely continuous and a discrete part, and the discrete part corresponds to a stationary harmonic process $\sum_j e^{i\lambda_j t} z_{x,j}$ with $F_{x,j} = E|z_{x,j}|^2$. Here we do not impose (1.9). By assumption (1.8), $(u_t, v_t)$ has a spectral density, and thus we have

$$F_x(\lambda) = \int_{[-\pi,\lambda]} (f_{\hat x} + f_u)\, d\lambda + \sum_{j:\lambda_j \le \lambda} F_{x,j} \tag{5.15}$$

…

… such that $C_{e^n} \neq 0$. If in addition $w_2(e^{-i\lambda}) \neq 0$ for all $\lambda$, then $f_{e^n}(\lambda_1, \ldots, \lambda_{n-1}) \neq 0$ for all $\lambda_1, \ldots, \lambda_{n-1}$, and then, due to (6.3), condition (6.8) is fulfilled.

The generalization of Theorem 6.1 to the multivariable case is straightforward. Estimators of the transfer function $w$ may be obtained from (6.9) by replacing the cumulant spectra by their estimators.
References

Aigner,D.J. and A.S.Goldberger (Eds.) (1977): Latent Variables in Socio-Economic Models. North Holland P.C., Amsterdam.
Aigner,D.J., C.Hsiao, A.Kapteyn and T.Wansbeek (1984): Latent Variable Models in Econometrics. In: Griliches,Z. and M.D.Intriligator (Eds.) Handbook of Econometrics. North Holland P.C., Amsterdam.
Akaike,H. (1966): On the Use of Non-Gaussian Process in the Identification of a Linear Dynamic System. Annals of the Institute of Statistical Mathematics 18, 269-276.
Anderson,B.D.O. (1985): Identification of Scalar Errors-in-Variables Models with Dynamics. Forthcoming in Automatica.
Anderson,B.D.O. and M.Deistler (1984): Identifiability in Dynamic Errors-in-Variables Models. Journal of Time Series Analysis 5, 1-13.
Anderson,T.W. (1984): Estimating Linear Statistical Relationships. Annals of Statistics 12, 1-45.
Brillinger,D.R. (1981): Time Series: Data Analysis and Theory. Expanded Edition. Holden Day, San Francisco.
Deistler,M. (1984): Linear Errors-in-Variables Models. In: J.Franke, W.Härdle and D.Martin (Eds.) Robust and Nonlinear Time Series Analysis. Lecture Notes in Statistics, Springer-Verlag, Berlin.
Deistler,M. (1985a): Linear Dynamic Errors-in-Variables Models. In: J.Gani and M.Priestley (Eds.) Essays in Time Series and Allied Processes. Forthcoming.
Deistler,M. (1985b): Identifiability and Causality in Linear Dynamic Errors-in-Variables Systems. In: Proc. 5th Franco-Belgian Meeting of Statisticians. Forthcoming.
Deistler,M. and H.G.Seifert (1978): Identifiability and Consistent Estimability in Dynamic Econometric Models. Econometrica 46, 969-980.
Drion,E.F. (1951): Estimation of the Parameters of a Straight Line and of the Variances of the Variables, if they are Both Subject to Error. Indagationes Math. 13, 256-260.
Frisch,R. (1934): Statistical Confluence Analysis by Means of Complete Regression Systems. Publication No. 5, University of Oslo, Economic Institute.
Fuller,W.A. (1980): Properties of some Estimators for the Errors-in-Variables Model. Annals of Statistics 8, 407-422.
Geary,R.C. (1942): Inherent Relations between Random Variables. Proceedings of the Royal Irish Academy, Sec. A, 47, 63-76.
Geary,R.C. (1943): Relations between Statistics: The General and the Sampling Problem When the Samples are Large. Proceedings of the Royal Irish Academy, Sec. A, 49, 177-196.
Gini,C. (1921): Sull'interpolazione di una retta quando i valori della variabile indipendente sono affetti da errori accidentali. Metron 1, 63-82.
Green,M. and B.D.O.Anderson (1985): Identification of Multivariable Errors-in-Variables Models with Dynamics. Mimeo.
Hannan,E.J. (1970): Multiple Time Series. Wiley, New York.
Hannan,E.J. and L.Kavalieris (1984): Multivariate Linear Time Series Models. Advances in Applied Probability 16, 492-561.
Hinich,M.J. (1983): Estimating the Gain of a Linear Filter from Noisy Data. In: D.R.Brillinger and P.R.Krishnaiah (Eds.) Handbook of Statistics, Vol. 3. North Holland, Amsterdam.
Hinich,M.J. and W.E.Weber (1984): Estimating Linear Filters with Errors in Variables Using the Hilbert Transform. Federal Reserve Bank of Minneapolis, Res. Dept. Staff Report 96.
Kalman,R.E. (1982): System Identification from Noisy Data. In: A.Bednarek and L.Cesari (Eds.) Dynamical Systems II, a University of Florida International Symposium. Academic Press, New York.
Kalman,R.E. (1983): Identifiability and Modeling in Econometrics. In: P.R.Krishnaiah (Ed.) Developments in Statistics, Vol. 4. Academic Press, New York.
Kendall,M.G. and A.Stuart (1969): The Advanced Theory of Statistics. Vol. I, 3rd Edition. Griffin, London.
Klepper,S. and E.Leamer (1984): Consistent Sets of Estimates for Regressions with Errors in all Variables. Econometrica 52, 163-183.
Madansky,A. (1959): The Fitting of Straight Lines when Both Variables are Subject to Error. Journal of the American Statistical Association 54, 173-205.
Maravall,A. (1979): Identification in Dynamic Shock-Error Models. Springer-Verlag, Berlin.
Moran,P.A.P. (1971): Estimating Structural and Functional Relationships. Journal of Multivariate Analysis 1, 232-255.
Nowak,E. (1983): Identification of the Dynamic Shock-Error Model with Autocorrelated Errors. Journal of Econometrics 23, 211-221.
Picci,G. (1985): Factor Analysis Models via Stochastic Realization Methods. This Volume.
Reiersøl,O. (1941): Confluence Analysis by Means of Lag Moments and other Methods of Confluence Analysis. Econometrica 9, 1-24.
Reiersøl,O. (1950): Identifiability of a Linear Relation Between Variables which are Subject to Error. Econometrica 18, 375-389.
Schneeweiß,H. and H.J.Mittag (1985): Lineare Modelle mit fehlerbehafteten Daten. Physica Verlag, Würzburg.
Scott,E.L. (1950): Note on Consistent Estimates of the Linear Structural Relation Between two Variables. Annals of Mathematical Statistics 21, 284-288.
Söderström,T. (1980): Spectral Decomposition with Application to Identification. In: F.Archetti and M.Cugiani (Eds.) Numerical Techniques for Stochastic Systems. North Holland P.C., Amsterdam.
Wegge,L. (1983): ARMAX-Models Parameter Identification without and with Latent Variables. Working Paper, Dept. of Economics, Univ. of California, Davis.
Chapter 3

A New Class of Dynamic Models For Stationary Time Series

Giorgio Picci and Stefano Pinzoni
1. Introduction

In this note we shall discuss a new class of dynamic models which may be better suited than conventional ARMAX schemes to describe non-causally interacting time series. Typical areas of application that we have in mind include econometrics (where it is often not clear which variables are "endogenous" and which are "exogenous") and the identification of industrial processes operating under feedback. In these situations there is no clear a priori causality relation among the variables; in fact, a possible goal of the identification experiment could be to test for the existence of causal relations. The class of models introduced here is a natural dynamic generalization of the well-known static Factor Analysis model. In various equivalent forms (the most popular of which seems to be the so-called Errors-In-Variables scheme), this model has been the object of much study in the past, especially by econometricians and psychologists. (For definitions of these concepts and a rather comprehensive survey of the literature one may consult the recent paper by Van Schuppen (1985).) The study of these models has recently been revitalized by Kalman in a series of papers (Kalman, 1982a, 1982b and 198?), and some of the critiques presented in Kalman's work were the motivating stimulus for the earlier paper (Finesso and Picci, 1984). The present exposition is the natural continuation and generalization of the results presented there. To improve readability we have chosen to skip some non-essential technical details; a more complete account can be found in (Picci and Pinzoni, 1986). Readers interested in general philosophical discussions of the modelling problem considered here are referred to the introduction of (Finesso and Picci, 1984). We should mention that some of the specific issues dealt with in this paper are also treated (in the scalar E.I.V. context) in the work of Anderson and Deistler (1984), Anderson (1985) and Deistler (1985). Although the primary motivations (and hence the basic assumptions) of these papers are of a rather different nature from ours, the reader might find some ground for comparison in the discussion of the causality problem presented in section 4.

To motivate the introduction of Dynamic Factor Analysis models, we shall briefly review the definition of causality of a dynamical model, first in the deterministic and then in the stochastic (Gaussian) case. The idea we want to convey is that causal models are quite "nongeneric" mathematical descriptions to impose a priori on real data, e.g. economic time series or data coming from industrial processes involving feedback. In a deterministic framework the notion of causality is of course well known. Assume that the components of the m-dimensional variable $y(t)$, whose temporal evolution is described by a certain dynamical model, have been grouped in two subvectors,
$$y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix}, \tag{1.1}$$

with $y_i(t) \in \mathbb R^{m_i}$, $i = 1, 2$, and $m_1 + m_2 = m$.
It is intuitively clear that a dynamical model should quantify the dynamic relation between the variables $y_1$ and $y_2$ (i.e. how much $y_1$ "influences" $y_2$ and vice versa). This is made precise in J.C. Willems' refoundation of Systems Theory (Willems, 1979). There, any model (or, equivalently, dynamical system) with external variables $y$ is just a subset of trajectories $\mathfrak B$ (called the behaviour of the system) in $(\mathbb R^m)^{\mathbb Z} = (\mathbb R^{m_1})^{\mathbb Z} \times (\mathbb R^{m_2})^{\mathbb Z}$, and therefore a bona fide relation between $y_1$ (ranging over $(\mathbb R^{m_1})^{\mathbb Z}$) and $y_2$ (ranging over $(\mathbb R^{m_2})^{\mathbb Z}$). We say that $y_1$ causes $y_2$ or, equivalently, that $y_1$ is the input and $y_2$ is the output variable of the system, if this relation specializes to a very particular kind of function, namely if

$$y_2(t) = f(y_1), \qquad t \in \mathbb Z, \tag{1.2}$$

where $f$ depends only on the values taken by $y_1$ before and at time $t$. In the stochastic case the sharply defined subset $\mathfrak B$ is replaced by a probability measure on the sample space $(\mathbb R^m)^{\mathbb Z}$, and the external variable $y$ becomes a stochastic process $\{y(t)\}$. The model is in this case just the probability law of $\{y(t)\}$. To keep things simple we shall consider about the simplest possible class of random processes, described in the following

BASIC ASSUMPTION The process $\{y(t)\}$ is an m-dimensional Gaussian stationary process with zero mean and a rational spectral density $S$ that is strictly positive definite on the unit circle (i.e. $S(e^{i\theta}) > 0$). □

We shall write the spectrum $S$ in a partitioned form corresponding to the subdivision (1.1) of the external variables,
$$S = \begin{bmatrix} S_1 & S_{12} \\ S_{21} & S_2 \end{bmatrix}, \tag{1.3}$$

where the blocks $S_i$, of dimension $m_i \times m_i$, are the auto spectra, and $S_{12}$ is the cross spectrum of the two components $\{y_1(t)\}$ and $\{y_2(t)\}$, of dimensions $m_1$ and $m_2$. The definition of causality in this context, essentially due to Granger (1963 and 1969), reads as follows.
DEFINITION 1.1 We say that the process $y_1$ causes $y_2$ or, equivalently, that $y_1$ is an input process with corresponding output $y_2$ if, for all $t \in \mathbb Z$,

$$E[y_2(t) \mid y_1] = E[y_2(t) \mid y_1(s),\ s \le t], \tag{1.4}$$

where the first conditional expectation is with respect to the whole history $\{y_1(t);\ t \in \mathbb Z\}$ of the component $y_1$.
□

Causality is just conditional independence of the past and present output history $\{y_2(s);\ s \le t\}$ from the future inputs $\{y_1(s);\ s > t\}$, given the past of the input $\{y_1(s);\ s \le t\}$. It can of course be defined in a much more general setting than the one adopted here. In a Gaussian setting, however, we can translate everything into the convenient Hilbert space language of the linear theory of random processes (see e.g. Rozanov, 1967). The material needed later on is briefly reviewed in the next paragraphs.

We shall denote by $H(y)$ (sometimes abbreviated to $H$) the vector space of all finite linear combinations of the scalar random variables $\{a'y(t);\ a \in \mathbb R^m,\ t \in \mathbb Z\}$, closed in the metric induced by the scalar product $\langle x, z\rangle := E\,xz$. $H_t^-(y)$ and $H_t^+(y)$ will denote the past and future subspaces spanned by the random variables $y(s)$ up to time $t$ and, respectively, at and after time $t$. Clearly,

$$H_t^{\pm}(y) = U^t H_0^{\pm}(y), \tag{1.5}$$

where $U: y(t) \mapsto y(t+1)$ is the (unitary) shift operator of the process $\{y(t)\}$. Normally the subscript zero in (1.5) will be dropped. For the two components $y_1$ and $y_2$ we define the subspaces $H(y_1)$ and $H(y_2)$ (abbreviated to $H_1$ and $H_2$ when there is no danger of confusion) accordingly. Obviously $H = H_1 \vee H_2$, where the wedge denotes closed vector sum. Subspaces like $H_1$ and $H_2$ are doubly invariant for the shift $U$, in the sense that $U^t H_i = H_i$ for all $t \in \mathbb Z$. The multiplicity of a doubly invariant subspace $X \subset H$ is the cardinality of any minimal generating set, i.e. the smallest $n$ for which one can find random variables $\{x_1, \ldots, x_n\}$ in $X$ such that the vector space generated by $\{U^t x_i;\ i = 1, \ldots, n,\ t \in \mathbb Z\}$ is dense in $X$. The process $\{x(t)\}$ with $x_i(t) = U^t x_i$ is called a generating process of $X$.

By the Spectral Representation Theorem (Rozanov, 1967), the random variables in $H(x)$ have a unitary representation as n-dimensional (row) vector functions in the Hilbert space $L_n^2(C, dQ)$. Here $C = \{z;\ |z| = 1\}$ is the unit circle in the complex plane and $Q$ is the $n \times n$ matrix spectral distribution measure of the process $\{x(t)\}$. Each random variable $\xi(t) := U^t\xi$ with $\xi \in X$ can be written as

$$\xi(t) = \int e^{i\theta t} f(e^{i\theta})\, d\hat x(e^{i\theta})$$

for a unique $f \in L_n^2(C, dQ)$. Here $\hat x$ is the n-dimensional random spectral measure of the stationary process $\{x(t)\}$. As is well known (Rozanov, 1967), the spectral distribution matrix is related to $\hat x$ by $dQ = E(d\hat x\, d\hat x^*)$, where the star denotes the conjugate transpose.
The representation will be written symbolically as

$$\xi(t) \sim f(z)\,x(t). \tag{1.6}$$

In System Theoretic terms, the notation means that the stationary process $\{\xi(t)\}$ is obtained by passing the stationary process $\{x(t)\}$ through the linear (stable) filter with transfer function $f$. In all cases of interest to us, the spectral distribution measure of $\{x(t)\}$ will be absolutely continuous with respect to Lebesgue measure on $C$. The spectral density matrix will still be denoted by the symbol $Q$. It is well known (compare e.g. Fuhrmann, 1981, p. 111) that $\{x_1, \ldots, x_n\}$ is a minimal generating set if and only if $Q(e^{i\theta})$ is strictly positive definite on a set of positive Lebesgue measure. For example, the assumption $S(e^{i\theta}) > 0$ a.e. guarantees that $H = H(y)$ has multiplicity exactly $m$; a possible minimal set of generators is given by the $m$ scalar components of the random vector $y(0)$. Observe further that any other minimal generating process for $H(x)$ can be written as

$$u(t) = T(z)\,x(t), \tag{1.7}$$

with $T$ an $n \times n$ matrix function having rows in $L_n^2(C, dQ)$ and $Q$-a.e. nonsingular on the unit circle.
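As a scalar illustration of (1.7), consider an example of our own: the AR(1) process $x(t) = \phi\,x(t-1) + \sigma\varepsilon(t)$ with unit-variance white noise $\varepsilon$. Its spectral density with respect to $d\theta$ is $Q = \sigma^2/(2\pi|1 - \phi e^{-i\theta}|^2)$. Hence $W = \sigma/(\sqrt{2\pi}(1 - \phi e^{-i\theta}))$ solves $WW^* = Q$, and $T = W^{-1}/\sqrt{2\pi} = (1 - \phi z^{-1})/\sigma$, with $z^{-1}$ acting as the one-step delay, yields a white generator $u(t)$ with unit variance. The sketch below checks this empirically; the helper names and parameter values are hypothetical.

```python
import random

def simulate_ar1(phi, sigma, T, seed=0, burn_in=500):
    """x(t) = phi x(t-1) + sigma * eps(t), eps unit-variance Gaussian white noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(T + burn_in):
        prev = phi * prev + sigma * rng.gauss(0.0, 1.0)
        x.append(prev)
    return x[burn_in:]                      # drop the transient

def whiten(x, phi, sigma):
    """u(t) = T(z) x(t) with T(z) = (1 - phi z^{-1}) / sigma (z^{-1}: one-step delay)."""
    return [(x[t] - phi * x[t - 1]) / sigma for t in range(1, len(x))]

def autocov(u, lag):
    """Sample autocovariance of u at a nonnegative lag."""
    m = sum(u) / len(u)
    n = len(u) - lag
    return sum((u[t] - m) * (u[t + lag] - m) for t in range(n)) / n

x = simulate_ar1(phi=0.7, sigma=2.0, T=100_000)
u = whiten(x, phi=0.7, sigma=2.0)
print(round(autocov(u, 0), 2), round(autocov(u, 1), 2))   # close to 1 and 0
```

Here $T$ is a polynomial in $z^{-1}$ only, so $u(t)$ depends on the present and past of $x$ alone.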
Of course, when $\{x(t)\}$ admits a spectral density $Q$ which is a.e. positive definite on the unit circle, all admissible $T$'s will be a.e. nonsingular on $C$, and all minimal generating processes for $H(x)$ will have an a.e. positive definite spectral density on the unit circle. In particular, by choosing $T = W^{-1}/\sqrt{2\pi}$, where $W$ is any square solution of the standard spectral factorization problem $WW^* = Q$, we obtain white noise generators $\{u(t)\}$ for $H(x)$. In this case the transfer function in the representation (1.6) belongs to $L_n^2(C, d\theta/2\pi)$. In this context we shall call causal any function $f$ in $L_n^2(C, d\theta/2\pi)$ with vanishing positive Fourier coefficients, i.e. such that

$$\int_{-\pi}^{\pi} e^{-i\theta k} f(e^{i\theta})\, \frac{d\theta}{2\pi} = 0 \tag{1.8}$$

for all $k > 0$. Thus any causal function belongs to the n-dimensional conjugate Hardy space $\bar H_n^2$ (Hoffmann, 1962) and can be extended to a function of the complex variable $z$ that is analytic on $\{|z| > 1\}$ (including the point at infinity). A matrix valued function $T$ will be called causal if its rows are. It can be verified directly that, for any generating process $\{x(t)\}$ with a strictly positive definite spectral density matrix(°), we have $H_t^-(u) \subset H_t^-(x)$ if and only if the transfer matrix $T$ in (1.7) is causal. A (left-) invertible matrix $T$ with rows in $L_n^2(C, d\theta/2\pi)$ will be called minimum phase if it is causal and its extension has an analytic (left-) inverse on $\{|z| > 1\}$. This is the same as a conjugate outer matrix function in $H^2$-theory.

(°) Or, more generally, a full rank purely nondeterministic process (Rozanov, 1967).

We finally recall the concept of conditional orthogonality. Two subspaces $H_1, H_2$ of $H$ are said to be conditionally orthogonal given a third subspace $X$ (notation: $H_1 \perp H_2 \mid X$) if

$$\langle h_1 - E^X h_1,\ h_2 - E^X h_2 \rangle = 0 \tag{1.9}$$

for all $h_1 \in H_1$ and $h_2 \in H_2$. Here the symbol $E^X$ denotes orthogonal projection onto $X$. In the Gaussian case, conditional expectation given a certain family of random variables in $H$ is the same as orthogonal projection onto the subspace of $H$ spanned by them. Hence conditional orthogonality is the same property as conditional independence, given $X$, of the two families $H_1$ and $H_2$ of Gaussian random variables. The concept of conditional orthogonality will be used extensively in this paper; for additional information one may consult (Lindquist and Picci, 1985).

We return to our discussion of causality in the stochastic setting. The following is a rather well known fact, although it is often stated in different terminology.
THEOREM 1.1 The process $\{y_1(t)\}$ causes $\{y_2(t)\}$ if and only if

$$y_2(t) = A(z)\,y_1(t) + v(t), \tag{1.10}$$

where $A(z)$ is an $m_2 \times m_1$ causal matrix function and $\{v(t)\}$ is a stationary process completely independent of $\{y_1(t)\}$, i.e.

$$E\, y_1(t) v'(s) = 0 \quad \text{for all } t, s \in \mathbb Z. \tag{1.11}$$
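Theorem 1.1 can be illustrated with a small simulation. We make our own simplifying assumptions: scalar $y_1$ is white with unit variance, $A(z) = 0.8z^{-1} + 0.3z^{-2}$ with $z^{-1}$ the one-step delay, and $v$ is white noise independent of $y_1$. Because the regressors $y_1(t-k)$ are then mutually uncorrelated, the population least squares coefficient of $y_2(t)$ on $y_1(t-k)$ equals $\operatorname{Cov}(y_2(t), y_1(t-k))/\operatorname{Var}(y_1)$. The coefficients at future lags ($k < 0$) should vanish, in agreement with the causality of $A(z)$. The function name is hypothetical.

```python
import random

def lagged_coefficients(y1, y2, lags):
    """Coefficient of y1(t - k) in the projection of y2(t) onto the y1 history,
    estimated as Cov / Var; valid because y1 is assumed zero-mean white noise."""
    n = len(y1)
    var1 = sum(val * val for val in y1) / n
    K = max(abs(k) for k in lags)
    coef = {}
    for k in lags:
        s = sum(y2[t] * y1[t - k] for t in range(K, n - K))
        coef[k] = s / (n - 2 * K) / var1
    return coef

rng = random.Random(2)
T = 100_000
y1 = [rng.gauss(0.0, 1.0) for _ in range(T)]    # white input
v = [rng.gauss(0.0, 0.5) for _ in range(T)]     # independent of y1, as in (1.11)
# Causal A(z) = 0.8 z^{-1} + 0.3 z^{-2} (illustrative choice)
y2 = [0.8 * y1[t - 1] + 0.3 * y1[t - 2] + v[t] if t >= 2 else v[t] for t in range(T)]
coef = lagged_coefficients(y1, y2, lags=range(-3, 4))
for k in sorted(coef):
    print(k, round(coef[k], 3))   # k < 0 (future inputs): close to 0
```

With a non-white input the same idea requires a joint regression on all lags (or prewhitening of $y_1$), since the single-lag ratios would then mix neighbouring coefficients.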
This result is essentially due to (Caines and Chan, 1976). It is also discussed in (Caines and Chan, 1975) and (Gevers and Anderson, 1982). In these references causality is called "absence of feedback" (from $y_2$ to $y_1$). Note that (1.10) is nothing else but the popular ARMAX scheme widely used in time series identification. To see this, express $\{v(t)\}$ by its innovation representation,

$$v(t) = G(z)\,e(t), \tag{1.12}$$

where $G(z)$ is minimum phase, normalized so that $G(\infty) = I$, and $\{e(t)\}$ is a white noise process. Recall that, by rationality of $S(z)$, both $A(z)$ and the spectrum of $\{v(t)\}$ are rational. Now express the rational matrix $[A(z)\ G(z)]$ by a left coprime M.F.D. $D(z)^{-1}[B(z)\ C(z)]$ to get

$$D(z)\,y_2(t) = B(z)\,y_1(t) + C(z)\,e(t). \tag{1.13}$$
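The simplest scalar case of (1.13) can be simulated directly. For illustration we assume $D(z) = 1 - d\,z^{-1}$, $B(z) = b\,z^{-1}$ and $C(z) = 1$, an ARX special case with $z^{-1}$ the one-step delay. The model then reads $y_2(t) = d\,y_2(t-1) + b\,y_1(t-1) + e(t)$. When $\{e(t)\}$ is independent of the input $\{y_1(t)\}$, both regressors are uncorrelated with $e(t)$, and ordinary least squares recovers $d$ and $b$. The helper names and values are hypothetical.

```python
import random

def simulate_arx(d, b, T, seed=3):
    """y2(t) = d y2(t-1) + b y1(t-1) + e(t), with white e independent of white y1."""
    rng = random.Random(seed)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y2 = [0.0] * T
    for t in range(1, T):
        y2[t] = d * y2[t - 1] + b * y1[t - 1] + rng.gauss(0.0, 0.5)
    return y1, y2

def ols_two_regressors(y, r1, r2):
    """Solve the 2x2 normal equations for y ~ c1 * r1 + c2 * r2 (Cramer's rule)."""
    s11 = sum(p * p for p in r1)
    s22 = sum(p * p for p in r2)
    s12 = sum(p * q for p, q in zip(r1, r2))
    s1y = sum(p * q for p, q in zip(r1, y))
    s2y = sum(p * q for p, q in zip(r2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

y1, y2 = simulate_arx(d=0.6, b=1.5, T=50_000)
d_hat, b_hat = ols_two_regressors(y2[1:], y2[:-1], y1[:-1])
print(round(d_hat, 2), round(b_hat, 2))   # close to 0.6 and 1.5
```

The consistency rests on the independence of noise and input. If part of $y_1$ were fed back from $e$, the regressor $y_1(t-1)$ could become correlated with the disturbance and the estimates would in general be biased.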
The orthogonality condition (1.11) holds if and only if

$$E\, e(t) y_1'(s) = 0, \qquad t, s \in \mathbb Z, \tag{1.14}$$

and therefore using ARMAX models with independent (or uncorrelated) noise and input ($y_1$) processes(°) is equivalent to imposing a causality relation on the data a priori. In this case the statistical inference problem of estimating the joint law of $\{y_1(t)\}$ and $\{y_2(t)\}$ is reduced to the much simpler problem of estimating just the conditional law of future $y_2$'s given past inputs $y_1$.

(°) Actually, condition (1.14) is often considered part of the definition of an ARMAX model and is not even explicitly mentioned.

Quite often there is no evidence in the data to justify the use of causal models. What kind of models should then be used in this situation? One obvious answer would be to describe the whole (joint) process $\{y(t)\}$ by an m-dimensional ARMA scheme corresponding, say, to the $m \times m$ rational minimum phase spectral factor of the joint spectrum $S$. Our main concern, however, is to describe how two given groups of variables ($y_1$ and $y_2$) interact dynamically. In practice $y_1$ and $y_2$ have a precise physical or economic meaning, and the main reason for doing modelling and identification is to discover how much of the temporal evolution of each variable is "explained" by the other. For this purpose it would be much more useful to have models which (although necessarily equivalent to the joint ARMA scheme mentioned above) make the mutual influence of the variables $y_1$ and $y_2$ explicit. A class of mathematical descriptions which in a certain sense generalizes the causal model (1.10) is the stochastic feedback scheme

$$\begin{aligned} y_2(t) &= L(z)\,y_1(t) + v_1(t), \\ y_1(t) &= K(z)\,y_2(t) + v_2(t), \end{aligned} \tag{1.15}$$

where $L$ and $K$ are causal transfer functions and $\{v_1(t)\}$ and $\{v_2(t)\}$ are stationary "error" processes whose innovations can at most be assumed orthogonal to the past histories of $y_1$ and $y_2$, respectively. This class of models has been extensively investigated in recent years, especially by Gevers and Anderson (1981 and 1982) and Anderson and Gevers (1982), with the main motivation of understanding the identifiability of control systems operating under feedback. Practical use of these models for time series identification, however, seems to have been very limited so far. We shall propose here a different class of models in which the dynamic interaction between $y_1$ and $y_2$ is described explicitly by introducing an auxiliary variable $x$. This auxiliary variable will play a role similar to that of the state variable in Systems Theory.
DEFINITION 1.2 A Dynamic Factor Analysis model with external variables the (jointly stationary) vector processes $\{y_1(t)\}$ and $\{y_2(t)\}$ is a linear relation of the form

$$ y_1(t) = A_1(z)\,x(t) + w_1(t), \qquad y_2(t) = A_2(z)\,x(t) + w_2(t), \tag{1.16} $$

where $A_1(z)$ and $A_2(z)$ are transfer matrices of dimension $m_1 \times n$ and $m_2 \times n$, and $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ are zero mean stationary processes of dimensions $n$, $m_1$, $m_2$ which are pairwise uncorrelated, i.e.

$$ \{w_1(t)\} \perp \{x(t)\} \perp \{w_2(t)\}. \tag{1.17} $$

□
Note that $A_1$ and $A_2$ need not be causal. The process $\{x(t)\}$ will sometimes be referred to as the factor process of the model. A Dynamic Factor Analysis (F.A.) model will be called rational if $A_1$, $A_2$ are rational matrices and $\{x(t)\}$ has rational spectrum. The terminology (although not terribly elegant) has been extrapolated from the static case.

In the next sections we shall present a first rudimentary analysis of the model (1.16). The main questions one would like to answer concern: the representability of an arbitrary jointly stationary process $\{y(t)\}$ (with $y(t)$ partitioned as in (1.1)) by models of the type (1.16); the equivalence of representations (i.e. when different representations describe the same spectrum $S$ or the same process $\{y(t)\}$); the "external behaviour" of the model, which is obtained once $\{x(t)\}$ is eliminated; a natural notion of minimality and characterizations of minimal models; parametrizations and canonical forms in the rational case; and, above all, the use of Factor Analysis models in statistical inference (i.e. identification). This is quite a large program and only a few of these aspects will be touched upon in this paper. Others (especially the last two mentioned above), which still need more research, will not be discussed here.
2. Dynamic Factor Analysis Models

The stationary processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ which define a Factor Analysis model span a certain Hilbert space $H(x, w_1, w_2)$, which we denote by $H_o$. The Factor Space $X$ of the model (1.16) is the doubly invariant subspace of $H_o$ generated by the factor process,

$$ X = \operatorname{span}\{a'x(t);\ a \in \mathbb{R}^n,\ t \in \mathbb{Z}\}. \tag{2.1} $$
Let $\bar n \le n$ be the multiplicity of $X$ and let $\{\bar x(t)\}$ be a minimal generating process for $X$. Clearly, since $x(t) = T(z)\bar x(t)$ for some $n \times \bar n$ matrix function $T$, we can always rewrite the model (1.16) with $A_1(z)$ and $A_2(z)$ replaced by $\bar A_1(z) = A_1(z)T(z)$ and $\bar A_2(z) = A_2(z)T(z)$ and a factor process $\bar x(t)$ which is a minimal generating process for $X$. We shall therefore adhere from now on to the convention of considering only F.A. models in which $\{x(t)\}$ is a minimal generating process for $X$. Hence the multiplicity of $X$ will always coincide with the dimension of $x(t)$. Two F.A. models which differ by a change of (minimal) generators in $X$ will be called equivalent. Obviously two equivalent models have the same $\{w_i(t)\}$ processes ($i = 1, 2$), the same factor space $X$, and transfer matrices and factor processes related by

$$ \hat A_i(z) = A_i(z)\,T(z)^{-1}, \quad i = 1, 2, \qquad \hat x(t) = T(z)\,x(t), \tag{2.2} $$

where $T$ is a $Q$-a.e. nonsingular $n \times n$ matrix function whose rows belong to $L^2_n(C, dQ)$, $Q$ being the spectral distribution measure of $\{x(t)\}$. It is easy to check that (2.2) defines an equivalence relation on the class of all F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$.

We shall now introduce the concept of splitting subspace. This idea will allow us to attach a precise probabilistic meaning to F.A. models and at the same time to reduce this notion to a very simple geometric object. Let $H_i = H(y_i)$, $i = 1, 2$, be the Hilbert spaces spanned by the components of $\{y_i(t)\}$, $i = 1, 2$. It will be useful to think of $H_1$ and $H_2$ as (doubly invariant) subspaces embedded in a large Hilbert space $H_o$ obtained by suitably augmenting $H_1 \vee H_2$. On $H_o$ there is defined a unitary shift operator $U$ which reduces to the shift of the process $\{y(t)\}$ on the subspace $H = H(y) = H_1 \vee H_2$. (The role played by $H_o$ is very similar to that of the space $H(x, w_1, w_2)$ introduced at the beginning of this section.)
DEFINITION 2.1 A (stationary) splitting subspace is a doubly invariant subspace $X$ of $H_o$ which makes $H(y_1)$ and $H(y_2)$ conditionally orthogonal given $X$, i.e. satisfies

$$ H(y_1) \perp H(y_2) \mid X \tag{2.3} $$

together with $UX = X$. A splitting subspace $X$ is called minimal if there are no proper subspaces of $X$ which are doubly invariant and still satisfy condition (2.3). □

The concept of splitting subspace is a generalization of the idea of sufficient statistic (at least in the Gaussian case). It follows in fact from the definition of conditional orthogonality (1.9) that

$$ E[h_1 \mid X \vee H_2] = E[h_1 \mid X], \qquad h_1 \in H_1, $$

and, equivalently,

$$ E[h_2 \mid X \vee H_1] = E[h_2 \mid X], \qquad h_2 \in H_2, $$

so that all that is relevant in $H_2$ ($H_1$) for the purpose of predicting any $h_1 \in H_1$ ($h_2 \in H_2$) is already contained in $X$. Therefore, if $X$ (or any system of generators of $X$) is given, we can disregard $H_2$ ($H_1$) completely. Note that the concept of splitting is of interest only if it corresponds to effective data reduction. Hence the notion of minimality is of central importance.

LEMMA 2.1 (Ruckebusch, 1976 and Lindquist and Picci, 1985) A splitting subspace $X$ is minimal if and only if

$$ \overline{E^X H_1} = X, \qquad \overline{E^X H_2} = X \tag{2.4} $$

(here $\overline{E^X H_i}$ is the closure of $\{E^X h_i;\ h_i \in H_i\}$, $i = 1, 2$). □
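As a static illustration of Definition 2.1 and of the conditional orthogonality condition (2.3), the following sketch replaces the processes by single zero-mean jointly Gaussian scalars (an assumption made only for this example; the coefficients are arbitrary). For Gaussian variables, $h_1 \perp h_2 \mid z$ holds exactly when the partial covariance $\mathrm{cov}(h_1,h_2) - \mathrm{cov}(h_1,z)\,\mathrm{cov}(z,h_2)/\mathrm{var}(z)$ vanishes. Conditioning on the common factor splits $h_1$ and $h_2$; conditioning on a noisy copy of the factor does not.

```python
def partial_cov(c12, c1z, c2z, var_z):
    """cov(h1 - E[h1|z], h2 - E[h2|z]) for zero-mean jointly Gaussian scalars h1, h2, z."""
    return c12 - c1z * c2z / var_z

# Hypothetical static factor model: h1 = a*x + w1, h2 = b*x + w2, with x, w1, w2 uncorrelated.
a, b, var_x = 1.5, -0.7, 2.0
c12 = a * b * var_x                 # cov(h1, h2): only the common factor contributes
c1x, c2x = a * var_x, b * var_x     # cov(h1, x), cov(h2, x)

# Conditioning on the factor itself: span{x} splits h1 and h2.
given_x = partial_cov(c12, c1x, c2x, var_x)

# Conditioning on a noisy copy z = x + n (n uncorrelated with everything, variance var_n):
# cov(hi, z) = cov(hi, x) but var(z) is larger, so span{z} is not splitting.
var_n = 0.5
given_z = partial_cov(c12, c1x, c2x, var_x + var_n)

print(c12, given_x, given_z)
```

The nonzero value of `given_z` equals $ab\,\mathrm{var}(x)\,\mathrm{var}(n)/(\mathrm{var}(x)+\mathrm{var}(n))$, which shows how much correlation a non-splitting subspace leaves unexplained.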
The following theorem shows that (modulo choice of generators) splitting subspaces and Dynamic Factor Analysis models are essentially the same thing.

THEOREM 2.1 The factor space $X$ of any F.A. model of $\{y_1(t)\}$, $\{y_2(t)\}$ is a splitting subspace. Vice versa, to every splitting subspace $X$ for $H(y_1)$, $H(y_2)$ of finite multiplicity there corresponds the equivalence class, defined modulo choice of generators, of F.A. models having $X$ as factor space.

Proof: Let $X$ be given by (2.1). Then, since

$$ A_i(z)\,x(t) = E^X y_i(t), \qquad t \in \mathbb{Z},\ i = 1, 2, \tag{2.5} $$

the orthogonality relation of $\{w_1(t)\}$ and $\{w_2(t)\}$, which holds by assumption for any model (1.16), can be rewritten as

$$ y_1(t) - E^X y_1(t) \perp y_2(s) - E^X y_2(s), \qquad t, s \in \mathbb{Z}. \tag{2.6} $$

As $\{y_i(t)\}$ is a generating process for $H_i$, it follows from the definition (1.9) that $X$ is indeed splitting. Vice versa, let $X$ be a splitting subspace and $\{x(t)\}$ a minimal generating process for $X$ of dimension $n$. The projections $E^X y_i(t)$ can be written as in (2.5) for suitable transfer functions $A_i(z)$ of dimension $m_i \times n$. Define

$$ w_i(t) := y_i(t) - E^X y_i(t), \qquad t \in \mathbb{Z},\ i = 1, 2; \tag{2.7} $$

then the stationary processes $\{w_i(t)\}$ are orthogonal to $X$ and, by the conditional orthogonality condition (2.3), we also have $E\,w_1(t)w_2(s)' = 0$ for all $t, s \in \mathbb{Z}$. Therefore $\{y_1(t)\}$ and $\{y_2(t)\}$ can be written as in (1.16), while satisfying (1.17). □

The equivalence established by Theorem 2.1 allows us to define a first rough notion of minimality for F.A. models. We shall say that a F.A. model is irreducible if its factor space is minimal splitting.
THEOREM 2.2 (Picci and Pinzoni, 1986) A F.A. model is irreducible if and only if the rank a.e. on the unit circle of the matrices $A_1(z)$ and $A_2(z)$ is equal to the multiplicity of $X$. All irreducible F.A. models have the same multiplicity (i.e. the same number of factors) $n$, equal to the rank a.e. on the unit circle of the cross spectrum $S_{12}$ of the processes $\{y_1(t)\}$ and $\{y_2(t)\}$.

In the rest of this paper we shall concentrate on irreducible models. As we have just seen, these models are characterized by a.e. left invertible matrices $A_k(z)$, $k = 1, 2$. Their factor process has an absolutely continuous spectrum with an a.e. positive definite spectral density matrix $Q$ on the unit circle (Picci and Pinzoni, 1986).
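Theorem 2.2 identifies the number of factors of an irreducible model with the a.e. rank of the cross spectrum $S_{12} = A_1QA_2^*$. A minimal numerical check at a single point $z = e^{i\theta}$ is sketched below; the matrices $A_1$, $A_2$ are hypothetical examples, and the rank is computed by Gaussian elimination with an assumed absolute tolerance.

```python
import cmath

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose; on |z| = 1 this is the value of A*(z)."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def rank(A, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [[complex(v) for v in row] for row in A]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [M[i][j] - f * M[r][j] for j in range(cols)]
        r += 1
    return r

z = cmath.exp(0.7j)                                    # a point on the unit circle
A1 = [[1, 0.5 / z], [0, 1], [1 / (1 - 0.5 / z), 0]]    # hypothetical 3 x 2 value of A1(z)
A2 = [[1, 0], [0.3, 1 + 0.2 / z]]                      # hypothetical 2 x 2 value of A2(z)
Q = [[1, 0], [0, 1]]                                   # white factor process, n = 2

S12 = matmul(matmul(A1, Q), adjoint(A2))               # S12 = A1 Q A2*
print(rank(S12))                                       # equals n for this model

A2_bad = [[1, 0], [0.3, 0]]                            # rank-deficient A2: model is reducible
S12_bad = matmul(matmul(A1, Q), adjoint(A2_bad))
print(rank(S12_bad))
```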
If in an irreducible F.A. model we eliminate the auxiliary variable $\{x(t)\}$, we obtain a scheme of the following type, where $A_k(z)^{-L}$ denotes a left inverse of $A_k(z)$:

$$ A_2(z)^{-L}\,\hat y_2(t) = A_1(z)^{-L}\,\hat y_1(t), \tag{2.8} $$

$$ y_1(t) = \hat y_1(t) + w_1(t), \tag{2.9a} $$

$$ y_2(t) = \hat y_2(t) + w_2(t). \tag{2.9b} $$

This is essentially what is commonly called an Errors-In-Variables (E.I.V.) model of the processes $\{y_1(t)\}$, $\{y_2(t)\}$. Here $y_1(t)$ and $y_2(t)$ are represented as "noisy" observations of the "true" variables $\hat y_1(t)$, $\hat y_2(t)$, which obey the deterministic relation (2.8). Note that the correlation structure of $\{y_1(t)\}$ and $\{y_2(t)\}$ is completely embodied in the relation (2.8), as the noise processes $\{w_k(t)\}$ are mutually uncorrelated and also orthogonal to the "true" variables $\{\hat y_k(t)\}$. An equivalent form of the deterministic link (2.8) between the true variables is obtained by substituting $x(t) = A_1(z)^{-L}\hat y_1(t)$ into the second equation in (1.16), getting

$$ \hat y_2(t) = W(z)\,\hat y_1(t), \qquad W(z) := A_2(z)\,A_1(z)^{-L}, \tag{2.10} $$

or, dually,

$$ \hat y_1(t) = W^{\#}(z)\,\hat y_2(t), \qquad W^{\#}(z) := A_1(z)\,A_2(z)^{-L}. \tag{2.11} $$

Note that the transfer functions $W(z)$, $W^{\#}(z)$, and also the relation (2.8), are invariant under a change of generators $\hat x(t) = T(z)x(t)$ ($T$ nonsingular), and are therefore uniquely attached to the (minimal) splitting subspace $X$ of the model. An important question concerns the existence of models for which $W$ (or $W^{\#}$) is a causal transfer function. This is the same as asking whether two stationary processes described by an arbitrary joint spectrum $S$ can be represented by the "noisy" input-output model

$$ y_1(t) = \hat y_1(t) + w_1(t), \qquad y_2(t) = W(z)\,\hat y_1(t) + w_2(t), \tag{2.12} $$

where $W(z)$ is causal and $\{w_1(t)\} \perp \{\hat y_1(t)\} \perp \{w_2(t)\}$. We shall take up this kind of question in Section 4.

As a last general comment about F.A. models, we remark that the freedom of changing generators in the factor space $X$ allows us to choose transfer matrices $A_k(z)$ or factor processes of very special structure. For example, we can always take $\{x(t)\}$ to be a white noise process, or require that both $A_1(z)$ and $A_2(z)$ be causal transfer functions.
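The invariance of $W$ under a change of generators can be checked numerically at a single frequency. Since a tall matrix has many left inverses, this sketch fixes one choice, the Moore-Penrose left inverse $A^{+} = (A^*A)^{-1}A^*$ (an assumption of the example); for that choice $(A_1T^{-1})^{+} = TA_1^{+}$, so $\hat A_2\hat A_1^{+} = A_2A_1^{+}$. All numerical values are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def left_pinv(A):
    """Moore-Penrose left inverse (A* A)^{-1} A* of a full-column-rank m x 2 matrix."""
    Ah = adjoint(A)
    return matmul(inv2(matmul(Ah, A)), Ah)

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical values of A1(z), A2(z) and of a change of generators T(z) at one frequency.
A1 = [[1, 0.5], [0.2j, 1], [0.7, -0.3]]    # m1 x n = 3 x 2, full column rank
A2 = [[1, 0], [0.3, 1.2]]                  # m2 x n = 2 x 2
T = [[2, 1j], [0.5, 1]]                    # nonsingular n x n

W = matmul(A2, left_pinv(A1))              # W = A2 A1^{-L}, as in (2.10)

Tinv = inv2(T)
A1_hat, A2_hat = matmul(A1, Tinv), matmul(A2, Tinv)    # change of generators (2.2)
W_hat = matmul(A2_hat, left_pinv(A1_hat))

print(max_abs_diff(W, W_hat))              # W does not depend on the generators
print(max_abs_diff(matmul(W, A1), A2))     # W maps the "true" y1 onto the "true" y2
```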
For simplicity we shall state the next result for the case of rational F.A. models.

PROPOSITION 2.1 For every rational irreducible F.A. model there is a choice of (minimal) generators in $X$, $\hat x(t) = T(z)x(t)$, which (maintains rationality and) achieves causality of the transfer function matrices

$$ \hat A_k(z) = A_k(z)\,T(z)^{-1}, \qquad k = 1, 2. \tag{2.13} $$
Proof: In a rational model both the spectrum $Q$ and the matrices $A_k$, $k = 1, 2$, are rational functions. Since the joint spectrum of the processes $\hat y_k(t) = A_k(z)x(t)$, $k = 1, 2$,

$$ \hat S = \begin{bmatrix} A_1QA_1^* & S_{12} \\ S_{21} & A_2QA_2^* \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} Q \begin{bmatrix} A_1^* & A_2^* \end{bmatrix}, \tag{2.14} $$

is then itself a rational function, it admits causal (in particular minimum phase) rational spectral factors. Note that irreducibility implies that $\operatorname{rank}\hat S = n = \operatorname{rank} S_{12}$. Pick a causal full rank rational spectral factor $\hat A$ (of dimension $m \times n$) of $\hat S$ and write it as a partitioned matrix with two blocks $\hat A_k(z)$ of dimensions $m_k \times n$, $k = 1, 2$. The spectral factorization

$$ \hat S = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix} \begin{bmatrix} \hat A_1(z)^* & \hat A_2(z)^* \end{bmatrix} $$

is clearly equivalent to the representations

$$ \hat y_k(t) = \hat A_k(z)\,\hat x(t), \qquad k = 1, 2, $$

with $\{\hat x(t)\}$ an $n$-dimensional white noise process. We interpret $\{\hat x(t)\}$ as the new factor process of the model. Since $\hat A(z)$ is full rank, we can solve for $\hat x(t)$ in the representation

$$ \begin{bmatrix} A_1(z) \\ A_2(z) \end{bmatrix} x(t) = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix} \hat x(t), $$

getting

$$ \hat x(t) = \begin{bmatrix} \hat A_1(z) \\ \hat A_2(z) \end{bmatrix}^{-L} \begin{bmatrix} A_1(z) \\ A_2(z) \end{bmatrix} x(t) := T(z)\,x(t). $$

Note that $T$ is square $n \times n$ and nonsingular because of irreducibility. This proves Proposition 2.1. □

In the proof we could in particular have chosen $\hat A(z) = [\hat A_1(z)'\ \hat A_2(z)']'$ minimum phase. We see that an irreducible rational F.A. model can always be written as a pair of ARMAX equations,
$$ D_1(z)\,y_1(t) = B_1(z)\,\hat x(t) + C_1(z)\,e_1(t), \qquad D_2(z)\,y_2(t) = B_2(z)\,\hat x(t) + C_2(z)\,e_2(t), \tag{2.15} $$

with $\{\hat x(t)\}$, $\{e_1(t)\}$, $\{e_2(t)\}$ pairwise uncorrelated white noise processes, and $D_k(z)$ and $C_k(z)$ stable polynomial matrices of dimensions $m_k \times m_k$ and $m_k \times p_k$, $p_k$ being the multiplicity of the noise process $\{w_k(t)\}$, $k = 1, 2$.
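The proof of Proposition 2.1, and the noise terms $C_k(z)e_k(t)$ in (2.15), rely on the existence of causal, minimum phase spectral factors of rational spectra. In the simplest scalar case, a spectrum of moving-average order one chosen here purely for illustration, the factor can be written in closed form. This is a sketch of that special case only, not a general matrix spectral factorization algorithm.

```python
import cmath
import math

def ma1_spectral_factor(r0, r1):
    """Minimum phase factor of s(z) = r1*z + r0 + r1/z (real r0 > 2|r1|):
    returns (sigma2, c) with s(z) = sigma2 * (1 + c/z) * (1 + c*z) and |c| < 1."""
    if r1 == 0:
        return r0, 0.0
    # r0 = sigma2 (1 + c^2) and r1 = sigma2 c give r1 c^2 - r0 c + r1 = 0. The two roots are
    # c and 1/c; the one of smaller modulus yields the minimum phase (causal, causally invertible) factor.
    c = (r0 - math.sqrt(r0 * r0 - 4 * r1 * r1)) / (2 * r1)
    return r1 / c, c

sigma2, c = ma1_spectral_factor(2.5, 1.0)
print(sigma2, c)                                  # 2.0 0.5

theta = 0.9
z = cmath.exp(1j * theta)
s = 1.0 * z + 2.5 + 1.0 / z                       # the spectrum evaluated on the unit circle
print(abs(s - sigma2 * abs(1 + c / z) ** 2))      # ~0: s = sigma2 |1 + c e^{-i theta}|^2
```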
3. Stochastic realization

The main problem of this section will be to describe the class of all irreducible F.A. models which match a given spectral density matrix. We shall see that this is equivalent to solving the following problem.

PROBLEM P.1 Given an $m \times m$ spectral density matrix $S$ partitioned as in (1.3) and satisfying the Basic Assumption of Sect. 1, find all 5-tuples of matrix functions $\{A_1, A_2, Q, R_1, R_2\}$ on the unit circle, with $A_k$ of dimension $m_k \times n$ and of rank $n$, $Q$ of dimension $n \times n$ and nonsingular, and $R_k$ of dimension $m_k \times m_k$, $k = 1, 2$, which

i) satisfy the system of equations

$$ S_1 = A_1QA_1^* + R_1, \qquad S_{12} = A_1QA_2^*, \qquad S_2 = A_2QA_2^* + R_2, \tag{3.1} $$

ii) make the $(m+n) \times (m+n)$ matrix

$$ \begin{bmatrix} S_1 & S_{12} & A_1Q \\ S_{21} & S_2 & A_2Q \\ QA_1^* & QA_2^* & Q \end{bmatrix} \tag{3.2} $$

into a spectral density matrix (in particular, Hermitian and nonnegative definite on the unit circle). □
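Conditions (i) and (ii) of Problem P.1 can be checked pointwise on the unit circle. The sketch below assumes the scalar case $m_1 = m_2 = n = 1$ and hypothetical values of $A_1$, $A_2$, $Q$, $R_1$, $R_2$ at one frequency; it builds (3.1) and the matrix (3.2), then tests positive semidefiniteness through the principal minors (for a Hermitian matrix, nonnegativity of all principal minors is equivalent to positive semidefiniteness).

```python
from itertools import combinations

def conj(v):
    return complex(v).conjugate()

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for 3 x 3)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matrix_3_2(a1, a2, q, r1, r2):
    """The matrix (3.2) at one frequency when m1 = m2 = n = 1."""
    S1 = a1 * q * conj(a1) + r1        # first equation of (3.1)
    S12 = a1 * q * conj(a2)            # second equation of (3.1)
    S2 = a2 * q * conj(a2) + r2        # third equation of (3.1)
    return [[S1, S12, a1 * q],
            [conj(S12), S2, a2 * q],
            [q * conj(a1), q * conj(a2), q]]

def is_psd(M, tol=1e-12):
    """Hermitian with all principal minors >= 0 (equivalent to positive semidefiniteness)."""
    n = len(M)
    if any(abs(M[i][j] - conj(M[j][i])) > tol for i in range(n) for j in range(n)):
        return False
    return all(det([[M[i][j] for j in idx] for i in idx]).real >= -tol
               for k in range(1, n + 1) for idx in combinations(range(n), k))

a1, a2, q = 1 + 0.5j, 0.8 - 0.2j, 1.3      # hypothetical values at one frequency
print(is_psd(matrix_3_2(a1, a2, q, r1=0.4, r2=0.1)))    # admissible noise spectra
print(is_psd(matrix_3_2(a1, a2, q, r1=-0.1, r2=0.1)))   # R1 < 0 violates condition (ii)
```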
Assume we have an irreducible F.A. model,

$$ z_1(t) = A_1(z)\,x(t) + w_1(t), \qquad z_2(t) = A_2(z)\,x(t) + w_2(t). \tag{3.3} $$

If we interpret $Q$ as the spectral density matrix of $\{x(t)\}$ and $R_k$, $k = 1, 2$, as the spectra of the two noise processes $\{w_k(t)\}$, we see that eqns. (3.1) express precisely the fact that the joint spectrum of $\{z_1(t)\}$ and $\{z_2(t)\}$ coincides with the given joint spectrum $S$. Note also that the matrix in (3.2) is just the joint spectral density of the three processes $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$. Vice versa, assume we are given a 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ of matrices satisfying eqns. (3.1) and condition (ii). It is not hard to see, and we shall check this later, that condition (ii) implies that $Q$, $R_1$, $R_2$ are necessarily bounded Hermitian positive semidefinite matrices on the unit circle ($Q$ is actually positive definite), and can therefore be interpreted as spectral densities of three mutually uncorrelated zero mean Gaussian processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$. Starting from these processes, we generate $\{z_1(t)\}$ and $\{z_2(t)\}$ by the linear transformation (3.3). We see from (3.1) that the joint spectrum of the stationary processes $\{z_1(t)\}$, $\{z_2(t)\}$ is precisely equal to the given joint spectral density matrix $S$. In short, solving problem P.1 is the same thing as finding all irreducible F.A. models (3.3) for which the joint spectrum of the external variables $\{z_1(t)\}$, $\{z_2(t)\}$ is equal to the given spectral density matrix $S$.

This problem is a distributional, or "weak sense", stochastic realization problem (Finesso and Picci, 1984 and Lindquist and Picci, 1985). Interpreting $S$ as the joint spectrum of two given Gaussian processes $\{y_1(t)\}$, $\{y_2(t)\}$, we are looking for all irreducible models (3.3) such that $\{z_k(t)\}$ and $\{y_k(t)\}$ are equal processes in distribution. In "practical" terms this means that the model (3.3) will only be useful to simulate the signals $\{y_k(t)\}$ in an "average" sense, but not samplewise in general. A 5-tuple $\{A_1, A_2, Q, R_1, R_2\}$ satisfying conditions (i) and (ii) above or, equivalently, a F.A. model of the type (3.3) matching the given spectrum $S$, will be called a F.A. representation of the spectrum $S$. A (strong sense) F.A. representation of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ is instead a F.A. model of the type (3.3) for which $z_k(t) = y_k(t)$ almost surely for all $t \in \mathbb{Z}$. This type of (samplewise) equality is clearly stronger than equality in distribution and can only occur when the processes $\{z_k(t)\}$ and $\{y_k(t)\}$ are defined on the same probability space. This means that the various processes $\{x(t)\}$, $\{w_1(t)\}$, $\{w_2(t)\}$ in (3.3) must be built in such a way that $H_o := H(x, w_1, w_2) \supset H(y_1, y_2) = H(z_1, z_2)$. Samplewise (i.e. strong sense) F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ can be classified according to "how big" an underlying space $H_o$ is needed to support the processes which specify the model. Later we shall study in some detail the class of F.A. representations for which $H_o = H(y_1, y_2)$. These representations will be called "y-measurable" (o).

(o) Clearly an equivalent condition for y-measurability is that the factor space $X$ is included in $H(y)$.
Note that whenever $\{x(t)\}$ is given, the noise processes $\{w_k(t)\}$ are automatically fixed as functions of $\{x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ by the orthogonality condition (1.17), as

$$ w_k(t) = y_k(t) - E^X y_k(t), \qquad k = 1, 2, \tag{3.4} $$

where $X$ is the splitting subspace generated by $\{x(t)\}$. Therefore a (strong) F.A. representation is completely specified once the factor process $\{x(t)\}$ is assigned as a function of some available generators of the space $H_o$. In particular, a y-measurable representation is completely specified once $\{x(t)\}$ is given as a function of $\{y_1(t)\}$ and $\{y_2(t)\}$.

In order to avoid complicated statements about equivalence classes, it will be useful to fix once and for all a rule for choosing generators in each factor space $X$. A convenient way to do this is to fix a full rank factorization of the cross spectrum $S_{12}$,

$$ S_{12}(z) = H(z)\,G^*(z), \tag{3.5} $$

where $H$ and $G$ are of respective dimensions $m_1 \times n$, $m_2 \times n$ and of rank equal to $n = \operatorname{rank} S_{12}$ a.e. on $C$. Since $S_{12}$ is rational, we can always choose $H$ and $G$ to be rational matrices. In fact we shall choose $H$ and $G$ in such a way that (3.5) is a minimal factorization of the rational matrix $S_{12}$ (in the sense of Bart, Gohberg and Kaashoek (1979), p. 84). Since all entries of a rational spectral density matrix must be analytic on the unit circle, it follows that both $H(z)$ and $G(z)$ must also be analytic on the unit circle. In the following we shall make the simplifying assumption that $S_{12}(z)$ has no zeros on the unit circle, i.e.

$$ \operatorname{rank} S_{12}(e^{i\theta}) = n, \qquad \forall\, \theta \in [0, 2\pi). \tag{3.6} $$

This guarantees that neither $H(z)$ nor $G(z)$ can have zeros on the unit circle; more precisely, both $H(e^{i\theta})$ and $G(e^{i\theta})$ will be of constant rank $n$ for all $\theta \in [0, 2\pi)$. From now on the matrices $H$ and $G$ will be considered as data of our problem.
LEMMA 3.1 Let condition (3.6) hold. Then for each equivalence class of irreducible F.A. models of $\{y_1(t)\}$, $\{y_2(t)\}$ there is a unique choice of generating process $\{x(t)\}$ in the factor space $X$ such that

$$ A_1(z) = H(z), \qquad A_2(z) = G(z)\,Q(z)^{-1}, \tag{3.7} $$

where $Q$ is the (nonsingular) spectrum of $\{x(t)\}$. Alternatively, a unique generating process $\{\tilde x(t)\}$ can be chosen for which

$$ A_1(z) = H(z)\,\tilde Q(z)^{-1}, \qquad A_2(z) = G(z), \tag{3.8} $$

where $\tilde Q$ is the spectrum of $\{\tilde x(t)\}$. The generating processes $\{x(t)\}$, $\{\tilde x(t)\}$ for the same minimal splitting subspace $X$ are related by the transformation

$$ \tilde x(t) = Q^{-1}(z)\,x(t). \tag{3.9} $$

Proof: In fact, if we start with an arbitrary irreducible model (1.16), there is a unique change of generators in $X$, $\hat x(t) = T(z)x(t)$, with $T$ such that $H(z)T(z) = A_1(z)$. Note that there is a unique a.e. nonsingular solution to this equation, as both $A_1$ and $H$ are of full rank $n$. Moreover $T \in L^2(C, Q\,d\theta)$, where $Q$ is the spectral density of $\{x(t)\}$. This follows from $T(z) = H(z)^{-L}A_1(z)$, because $A_1 \in L^2(C, Q\,d\theta)$ and any left inverse of $H(z)$ is analytic on the unit circle, in force of assumption (3.6). With this choice we get

$$ S_{12}(z) = H(z)\,\hat Q(z)\,\big(A_2(z)T(z)^{-1}\big)^*, $$

where $\hat Q$ is the spectrum of $\{\hat x(t)\}$. From (3.5) it then follows that $A_2(z)T(z)^{-1} = G(z)\hat Q(z)^{-1}$. Similarly, by choosing $\tilde x(t) = \tilde T(z)x(t)$ with $G(z)\tilde T(z) = A_2(z)$, we obtain (3.8). In particular, for $x(t) = \hat x(t)$ we find $\tilde T = Q^{-1}$. □

By choosing the generators as stated in Lemma 3.1, we get a unique irreducible F.A. model representative of each minimal splitting subspace $X$. These models, for the two different choices (3.7) and (3.8), can be written as

$$ y_1(t) = H(z)\,x(t) + w_1(t), \qquad y_2(t) = G(z)\,Q(z)^{-1}x(t) + w_2(t), \tag{3.10} $$

and, respectively, as

$$ y_1(t) = H(z)\,\tilde Q(z)^{-1}\tilde x(t) + w_1(t), \qquad y_2(t) = G(z)\,\tilde x(t) + w_2(t). \tag{3.11} $$
We shall call the two representations (3.10) and (3.11) canonical forms of the "first" and "second" type, respectively. Clearly each equivalence class of irreducible F.A. representations of a given spectrum $S$ can in turn be represented by a unique 5-tuple

$$ \{H,\ GQ^{-1},\ Q,\ R_1,\ R_2\} \quad \text{or by} \quad \{H\tilde Q^{-1},\ G,\ \tilde Q,\ R_1,\ R_2\}. $$

Note that $R_1$ and $R_2$ are uniquely determined from the equalities (3.1) as functions of $S_1, A_1, Q$ and $S_2, A_2, Q$. We conclude that all irreducible F.A. representations of the spectrum $S$, written in the first canonical form, are parametrized in a one-to-one way by the $n \times n$ nonsingular matrix function $Q$ as

$$ \{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}, \tag{3.12} $$

where $Q$ is constrained to satisfy the condition that the matrix

$$ \begin{bmatrix} S_1 & HG^* & HQ \\ GH^* & S_2 & G \\ QH^* & G^* & Q \end{bmatrix} \tag{3.13} $$

be a spectral density. Dually, all irreducible F.A. representations of $S$ written in the second canonical form are parametrized in a one-to-one way by the nonsingular $n \times n$ matrix function $\tilde Q$ as

$$ \{H\tilde Q^{-1},\ G,\ \tilde Q,\ S_1 - H\tilde Q^{-1}H^*,\ S_2 - G\tilde QG^*\}, \tag{3.14} $$

where $\tilde Q$ is constrained by the condition that the matrix

$$ \begin{bmatrix} S_1 & HG^* & H \\ GH^* & S_2 & G\tilde Q \\ H^* & \tilde QG^* & \tilde Q \end{bmatrix} \tag{3.15} $$
be a spectral density function.

At this point we are ready to describe the solution set of our stochastic realization problem P.1. We introduce the $n \times n$ Hermitian matrices

$$ Q_1 := (H^*S_1^{-1}H)^{-1}, \qquad Q_2 := G^*S_2^{-1}G, \tag{3.16} $$

and set

$$ \tilde Q_i := Q_i^{-1}, \qquad i = 1, 2. \tag{3.17} $$

Note that both $Q_1$ and $Q_2$ are strictly positive definite rational spectral density matrices in force of condition (3.6) and our standing assumptions on $S$. We also define the $n \times n$ Hermitian matrices

$$ \Delta := Q_1 - Q_2, \qquad \tilde\Delta := \tilde Q_2 - \tilde Q_1. \tag{3.18} $$
THEOREM 3.1 All irreducible F.A. representations of the spectrum $S$ written in the first canonical form (3.12) are parametrized by the solutions $Q$ of the matrix inequality

$$ Q - Q_2 - (Q - Q_2)\,\Delta^{-1}(Q - Q_2)^* \ge 0. \tag{3.19} $$

Dually, all irreducible F.A. representations of $S$ written in the second canonical form (3.14) are parametrized by the solutions $\tilde Q$ of the inequality

$$ \tilde Q - \tilde Q_1 - (\tilde Q - \tilde Q_1)\,\tilde\Delta^{-1}(\tilde Q - \tilde Q_1)^* \ge 0. \tag{3.20} $$

An $n \times n$ matrix function $Q$ solves (3.19) if and only if $\tilde Q = Q^{-1}$ solves (3.20). All solutions $Q$ ($\tilde Q$) of (3.19) (resp. (3.20)) are Hermitian, bounded and strictly positive definite; in fact they satisfy

$$ Q_1 \ge Q \ge Q_2, \qquad \tilde Q_2 \ge \tilde Q \ge \tilde Q_1, \tag{3.21} $$

where $Q_1$ and $Q_2$ ($\tilde Q_1$ and $\tilde Q_2$) are the spectral densities defined by (3.16) and (3.17).
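The bounds (3.16), the matrix $\Delta$ of (3.18), and the identity (3.23) used in the proof below can be verified numerically at one frequency. In this sketch (hypothetical data, $n = 1$, $m_1 = m_2 = 2$) the spectrum $S$ is generated by a "true" factor model with factor spectrum $q_0$, so that $S_{12} = HG^*$ and positivity hold by construction.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lincomb(A, B, alpha, beta):
    """alpha*A + beta*B, entrywise."""
    return [[alpha * a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical data at one frequency (n = 1, m1 = m2 = 2).
H = [[1.0], [0.5j]]
G = [[0.8], [-0.3]]
q0 = 1.0                                                  # spectrum of a "true" factor
I2 = [[1.0, 0.0], [0.0, 1.0]]
S1 = lincomb(matmul(H, adjoint(H)), I2, q0, 0.5)          # S1 = q0 H H* + 0.5 I
S2 = lincomb(matmul(G, adjoint(G)), I2, 1 / q0, 0.4)      # S2 = q0^{-1} G G* + 0.4 I

Q1 = 1 / matmul(matmul(adjoint(H), inv2(S1)), H)[0][0].real   # (3.16)
Q2 = matmul(matmul(adjoint(G), inv2(S2)), G)[0][0].real        # (3.16)
Delta = Q1 - Q2                                                # (3.18)

S1_hat = lincomb(S1, matmul(H, adjoint(H)), 1.0, -Q2)          # S1 - S12 S2^{-1} S21 = S1 - H Q2 H*
lhs = matmul(matmul(adjoint(H), inv2(S1_hat)), H)[0][0]        # H* S1_hat^{-1} H

print(Q2, q0, Q1)              # the "true" q0 lies between Q2 and Q1, as in (3.21)
print(abs(lhs - 1 / Delta))    # ~0: identity (3.23)
```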
Proof: What needs to be shown is that an $n \times n$ matrix function $Q$ makes (3.13) a spectral density matrix if and only if it satisfies the quadratic inequality (3.19). Assume there is a $Q$ making (3.13) into a spectral density matrix. Then, by a standard block diagonalization procedure, the positive semidefiniteness of (3.13) is seen to be equivalent to

$$ S_2 > 0, \qquad \hat S_1 := S_1 - S_{12}S_2^{-1}S_{21} > 0, $$

$$ Q - G^*S_2^{-1}G - (Q - G^*S_2^{-1}G)\,H^*\hat S_1^{-1}H\,(Q - G^*S_2^{-1}G) \ge 0. \tag{3.22} $$

The first two inequalities are trivially satisfied. In fact, by our Basic Assumption on $S$, $S_2$ and $\hat S_1$ are strictly positive definite on the whole of $C$. By simple matrix manipulations it can be checked that

$$ H^*\hat S_1^{-1}H = H^*(S_1 - HQ_2H^*)^{-1}H = (Q_1 - Q_2)^{-1}, \tag{3.23} $$

and therefore, recalling our notation (3.18), we see that $Q$ has to satisfy the inequality (3.19). Note that $Q$ makes the matrix (3.13) positive semidefinite if and only if $\tilde Q = Q^{-1}$ makes (3.15) positive semidefinite. This in turn happens if and only if $\tilde Q$ satisfies the dual inequality (3.20), as can be seen by exactly the same argument used before. Thus $Q$ satisfies (3.19) if and only if $Q^{-1}$ satisfies (3.20).

Observe now that the matrix $\Delta^{-1}$, given by the expression (3.23), is strictly positive definite Hermitian on the unit circle, and therefore any solution $Q$ to (3.19) makes $Q - Q_2$ positive semidefinite Hermitian, since $Q - Q_2 \ge (Q - Q_2)\Delta^{-1}(Q - Q_2)^* \ge 0$. Hence $Q$ is Hermitian and $Q \ge Q_2$. Similarly, any solution $\tilde Q$ of (3.20) satisfies $\tilde Q \ge \tilde Q_1$. Then, writing $\tilde Q$ as $Q^{-1}$, we obtain the first inequality in (3.21). So any solution to (3.19) has a lower bound ($Q_2$) and an upper bound ($\tilde Q_1^{-1} = Q_1$), $Q_2$ being strictly positive definite and $\tilde Q_1^{-1}$ being trivially bounded on $C$. It follows that any solution to (3.19) is a spectral density matrix. The matrix (3.13) constructed from such a solution is also Hermitian positive semidefinite and has bounded entries on the unit circle. Therefore it is a spectral density matrix. □

The solution set of the inequalities (3.19), (3.20) can be described quite explicitly.
THEOREM 3.2 An $n \times n$ matrix valued function $Q$ on the unit circle solves the inequality (3.19) if and only if it is Hermitian and satisfies

$$ Q_1 \ge Q \ge Q_2. $$

Dually, an $n \times n$ matrix $\tilde Q$ solves (3.20) if and only if it is Hermitian and satisfies

$$ \tilde Q_2 \ge \tilde Q \ge \tilde Q_1. $$

Proof: The "only if" part is already contained in the statement of Theorem 3.1. We only need to prove the "if" part. Assume first that $Q_1 > Q > Q_2$ (with strict inequalities) holds. Then $(Q_1 - Q)$ and $(Q - Q_2)$ are both Hermitian strictly positive definite, and therefore $(Q - Q_2)^{-1} + (Q_1 - Q)^{-1}$ is strictly positive definite. By a well-known formula for the inverse of a sum of matrices, namely $\big((Q - Q_2)^{-1} + (Q_1 - Q)^{-1}\big)^{-1} = (Q - Q_2) - (Q - Q_2)\Delta^{-1}(Q - Q_2)$ (recall that $(Q - Q_2) + (Q_1 - Q) = \Delta$), we see that this positivity condition is equivalent to

$$ Q - Q_2 - (Q - Q_2)\,\Delta^{-1}(Q - Q_2) > 0. \tag{3.24} $$

Now, every $Q$ satisfying $Q_1 \ge Q \ge Q_2$ can be approximated in $L^\infty_{n \times n}(C)$ by a sequence of matrices $Q_k$ for which the strict inequalities hold. Take for instance

$$ Q_k = \frac{k-1}{k}\,Q + \frac{1}{2k}\,(Q_1 + Q_2), $$

for which evidently $Q_k - Q_2 > 0$ and $Q_1 - Q_k > 0$. Hence $Q_k$ satisfies the strict inequality (3.24). But the left-hand side of (3.24) is a positive definite matrix which is a continuous function of $Q_k$ and, as $k \to \infty$, it can at most become positive semidefinite.
□

REMARK As a corollary of Theorems 3.1 and 3.2 we obtain that the inequality $Q_1 \ge Q \ge Q_2$ is equivalent to

$$ S_1 \ge HQH^*, \qquad S_2 \ge GQ^{-1}G^*, \qquad Q > 0, $$

which in turn form a set of conditions equivalent to the positivity of the matrix (3.13). This fact in particular guarantees that if $Q$ satisfies (3.21) (or equivalently (3.19)), then the noise spectra $R_1$ and $R_2$ will be (Hermitian and) positive semidefinite. Note that the maximal solution $Q_1$ is, in this sense, just the matrix which corresponds to the largest approximant of rank $n$ of $S_1$ in the ordering of Hermitian positive semidefinite matrices. □
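In the scalar case $m_1 = m_2 = n = 1$ everything reduces to numbers at each frequency: $Q_1 = S_1/|H|^2$, $Q_2 = |G|^2/S_2$, and the left side of (3.19) equals $(Q - Q_2)(Q_1 - Q)/\Delta$, so it is nonnegative exactly on $[Q_2, Q_1]$. The sketch below (hypothetical values at one frequency) tabulates (3.19) and the noise spectra of the first canonical form (3.12) for a few values of $Q$, and also defines the second canonical form (3.14) with $\tilde Q = Q^{-1}$, which yields the same noise spectra, consistent with the Remark above.

```python
h, g = 1 + 1j, 0.5 - 1j      # hypothetical factors of S12 = h * conj(g) at one frequency
s1, s2 = 3.0, 2.0            # hypothetical auto-spectra at the same frequency
assert s1 * s2 > abs(h * g.conjugate()) ** 2       # the joint spectrum S is positive definite

Q1 = s1 / abs(h) ** 2        # (3.16) when m1 = m2 = n = 1
Q2 = abs(g) ** 2 / s2
Delta = Q1 - Q2              # (3.18)

def first_canonical(q):
    """Scalar 5-tuple (3.12): {H, G q^{-1}, q, S1 - H q H*, S2 - G q^{-1} G*}."""
    return h, g / q, q, s1 - abs(h) ** 2 * q, s2 - abs(g) ** 2 / q

def second_canonical(qt):
    """Scalar 5-tuple (3.14): {H qt^{-1}, G, qt, S1 - H qt^{-1} H*, S2 - G qt G*}."""
    return h / qt, g, qt, s1 - abs(h) ** 2 / qt, s2 - abs(g) ** 2 * qt

def lhs_3_19(q):
    return (q - Q2) - (q - Q2) ** 2 / Delta

for q in (Q2, 0.9, 1.2, Q1, 2.0):
    _, _, _, r1, r2 = first_canonical(q)
    print(f"q={q:.3f}  (3.19): {lhs_3_19(q):+.4f}  R1={r1:+.4f}  R2={r2:+.4f}")
```

The last value, $Q = 2 > Q_1$, makes both (3.19) and $R_1$ negative, which is the scalar form of the equivalence stated in the Remark.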
Theorem 3.1 provides a recipe for computing all irreducible F.A. representations describing a given spectral density matrix $S$ in a fixed coordinate system. We can now easily see that there are many such representations (a fact that we have not bothered to show until now). For example, as the two "extreme" spectra $Q_1$ and $Q_2$ defined in (3.16) and (3.17) both satisfy the inequality (3.19) (with equality sign), we see that there are a "maximal" and a "minimal" irreducible F.A. representation (in the first canonical form), which correspond respectively to the maximal ($Q_1$) and minimal ($Q_2$) solutions to the inequality (3.19).

Solutions like $Q_1$, $Q_2$ above, for which (3.19) is satisfied with equality sign, have a special meaning. They correspond to joint spectra (3.13) of minimum possible rank, $m$, as can be seen from the block diagonalization (3.22). Since the rank of the joint spectrum of $\{z_1(t)\}$, $\{z_2(t)\}$ and $\{x(t)\}$ is equal to the multiplicity of the doubly invariant subspace $H(x, z_1, z_2)$ spanned by these processes, the multiplicity $m$ of $H(x, z_1, z_2)$ is equal to the multiplicity of the subspace $H(z_1, z_2)$. This can only happen if $H(x, z_1, z_2) = H(z_1, z_2)$ or, what is the same, if $x(t) \in H(z_1, z_2)$ for all $t \in \mathbb{Z}$. We see that all models which correspond to solutions $Q$ of (3.19) with equality sign are characterized by the fact that the factor process $\{x(t)\}$ is a function of $\{z_1(t)\}$, $\{z_2(t)\}$. This observation is the key to the following result.

PROPOSITION 3.1 The solutions $Q$ to the quadratic matrix equation

$$ Q - Q_2 - (Q - Q_2)\,\Delta^{-1}(Q - Q_2)^* = 0 \tag{3.25} $$

parametrize in a one-to-one way the (strong) irreducible y-measurable representations of the processes $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.10). Dually, all solutions $\tilde Q$ to the quadratic equation

$$ \tilde Q - \tilde Q_1 - (\tilde Q - \tilde Q_1)\,\tilde\Delta^{-1}(\tilde Q - \tilde Q_1)^* = 0 \tag{3.26} $$

parametrize in a one-to-one way the (strong) irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$ of the form (3.11) for which $X \subset H(y)$.

Proof: Consider a F.A. representation of the type (3.10). If the factor space $X$ is contained in $H(y)$, then $H(x, y_1, y_2) = H(y_1, y_2) = H(y)$, and hence the joint spectrum of $\{y_1(t)\}$, $\{y_2(t)\}$, $\{x(t)\}$ has rank $m$. This implies that the spectrum $Q$ of $\{x(t)\}$ satisfies (3.19) with equality sign. Vice versa, assume $Q$ is a solution of (3.25). Then, as discussed previously, the factor process of the F.A. model of type (3.3) attached to the weak realization $\{H,\ GQ^{-1},\ Q,\ S_1 - HQH^*,\ S_2 - GQ^{-1}G^*\}$ of the spectrum $S$ has the property that $x(t)$ belongs to $H(z_1, z_2)$ for all $t$. It can therefore be written as $x(t) = P_1(z)z_1(t) + P_2(z)z_2(t)$, where $P_i(z)$, $i = 1, 2$, are $n \times m_i$ transfer matrices. Define an $n$-dimensional process $\{\hat x(t)\}$ by setting

$$ \hat x(t) = P_1(z)\,y_1(t) + P_2(z)\,y_2(t). \tag{3.27} $$

Then $\{\hat x(t)\}$, $\{y_1(t)\}$, $\{y_2(t)\}$ have exactly the same joint second order statistics (i.e. the same spectrum) as $\{x(t)\}$, $\{z_1(t)\}$, $\{z_2(t)\}$. Since conditional orthogonality depends on joint second order moments only, it then follows that $\hat X := \operatorname{span}\{\hat x(t);\ t \in \mathbb{Z}\}$ is splitting for $H(y_1)$, $H(y_2)$, exactly as the factor space $X$ was splitting for $H(z_1)$, $H(z_2)$. Hence $\{\hat x(t)\}$ is the factor process of a strong F.A. representation of the type (3.10). By construction $\{\hat x(t)\}$ has spectral density matrix equal to $Q$ and $\hat X \subset H(y)$. □
Let us define the stationary $n$-dimensional processes

$$ x_1(t) = Q_1(z)\,\bar x_1(t), \qquad \bar x_1(t) = H^*(z)S_1(z)^{-1}y_1(t), \qquad x_2(t) = G^*(z)S_2(z)^{-1}y_2(t), \tag{3.28} $$

where $Q_1$ is defined by (3.16). Observe that the spectra of $\{x_1(t)\}$ and $\{x_2(t)\}$ are precisely the extremal solutions $Q_1$, $Q_2$ of the quadratic inequality (3.19). It is immediate to check that $\{x_1(t)\}$ and $\{x_2(t)\}$ are minimal generators for the subspaces

$$ X_1 := \overline{E^{H(y_1)}H(y_2)}, \qquad X_2 := \overline{E^{H(y_2)}H(y_1)}. $$

(In fact, for example, $X_2$ is generated by $\hat y_1(t) = S_{12}(z)S_2^{-1}(z)y_2(t) = H(z)x_2(t)$.) Moreover, both $X_1$ and $X_2$ are minimal splitting subspaces (compare e.g. Lindquist, Picci and Ruckebusch, 1979) with

$$ X_1 \subset H(y_1), \qquad X_2 \subset H(y_2); $$

therefore they specify two equivalence classes of strong irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$. The particular generators $\{x_1(t)\}$ and $\{x_2(t)\}$ defined in (3.28) correspond to choosing these representations in the first canonical form, namely

$$ y_1(t) = H(z)\,x_1(t) + w_{1,1}(t), \qquad y_2(t) = G(z)\,Q_1(z)^{-1}x_1(t) + w_{1,2}(t), \tag{3.29} $$

and

$$ y_1(t) = H(z)\,x_2(t) + w_{2,1}(t), \qquad y_2(t) = G(z)\,Q_2(z)^{-1}x_2(t) + w_{2,2}(t). \tag{3.30} $$

Observe that in the representation (3.29) the second equation is just the decomposition of $y_2(t)$ as the sum of the (noncausal) estimate $\hat y_2(t) = S_{21}(z)S_1(z)^{-1}y_1(t)$ and of the corresponding estimation error. The first equation is more interesting. It can be rewritten in the form

$$ y_1(t) = \Pi_H(z)\,y_1(t) + (I - \Pi_H(z))\,y_1(t), \tag{3.31} $$

where $\Pi_H$ is the projection-valued matrix function

$$ \Pi_H(z) = H(z)\big(H^*(z)S_1(z)^{-1}H(z)\big)^{-1}H^*(z)S_1(z)^{-1}, \tag{3.32} $$

mapping onto the column space of $H$. Note that $\Pi_H$ is $S_1$-orthogonal, i.e. $\Pi_H^*S_1^{-1}(I - \Pi_H) = 0$ a.e. on the unit circle. Thus $x_1$ formally looks like the classical least squares estimate of $x$ in the linear model $y_1 = Hx + w_1$. An analogous interpretation holds for the second equation in (3.30).

The next theorem describes quite explicitly the family of all (strong) y-measurable irreducible F.A. representations of $\{y_1(t)\}$, $\{y_2(t)\}$.
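Returning briefly to (3.32) before Theorem 3.3: at a fixed frequency $\Pi_H$ is an ordinary weighted projection, and $x_1 = Q_1H^*S_1^{-1}y_1$ is the corresponding weighted least squares filter. The sketch below (hypothetical $H$ and $S_1$, with $m_1 = 2$, $n = 1$) checks that $\Pi_H$ is idempotent, fixes the columns of $H$, satisfies $\Pi_H^*S_1^{-1}(I - \Pi_H) = 0$, and that the output spectrum of the filter equals $Q_1$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs(A):
    return max(abs(v) for row in A for v in row)

# Hypothetical values at one frequency: m1 = 2, n = 1, S1 Hermitian positive definite.
H = [[1.0], [0.5j]]
S1 = [[1.5, -0.5j], [0.5j, 0.75]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

S1inv = inv2(S1)
Q1 = 1 / matmul(matmul(adjoint(H), S1inv), H)[0][0]            # (H* S1^{-1} H)^{-1}, as in (3.16)
F = [[Q1 * v for v in row] for row in matmul(adjoint(H), S1inv)]   # x1 = F y1, as in (3.28)
Pi_H = matmul(H, F)                                            # (3.32): H x1 = Pi_H y1

print(max_abs(sub(matmul(Pi_H, Pi_H), Pi_H)))                       # idempotent
print(max_abs(sub(matmul(Pi_H, H), H)))                             # fixes the column space of H
print(max_abs(matmul(matmul(adjoint(Pi_H), S1inv), sub(I2, Pi_H)))) # Pi_H* S1^{-1} (I - Pi_H) = 0
print(abs(matmul(matmul(F, S1), adjoint(F))[0][0] - Q1))            # spectrum of x1 is Q1
```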
THEOREM 3.3 The factor process of any irreducible y-measurable F.A. representation (in the first canonical form) is a combination of $\{x_1(t)\}$ and $\{x_2(t)\}$ of the form

$$ x(t) = \Pi(z)\,x_1(t) + (I - \Pi(z))\,x_2(t), \tag{3.33} $$

where

$$ \Pi = (Q - Q_2)\,\Delta^{-1} \tag{3.34} $$

is a $\Delta$-orthogonal projection-valued matrix function on the unit circle.

Proof: The proof relies on the easily checked fact that $E[x_1(t) \mid H(x_2)] = x_2(t)$. Then $\tilde x(t) := x_1(t) - x_2(t)$ form a process which is orthogonal to $\{x_2(t)\}$ (its spectrum is $Q_1 - Q_2 = \Delta$), and so $H(x_1, x_2) = H(\tilde x) \oplus H(x_2)$, where the direct sum is orthogonal. Now, any minimal splitting subspace $X \subset H(y)$ is actually contained in $H(x_1, x_2) \subset H(y)$ (Lindquist and Picci, 1985), so that the corresponding factor process $\{x(t)\}$ can be expressed as

$$ x(t) = S_{x\tilde x}(z)\,\Delta(z)^{-1}\tilde x(t) + S_{xx_2}(z)\,Q_2(z)^{-1}x_2(t), \tag{3.35} $$

where the cross spectra are easily computed from

$$ S_{xx_1} = S_{xy_1}S_1^{-1}HQ_1 = Q, \qquad S_{xx_2} = S_{xy_2}S_2^{-1}G = Q_2. $$

Since $S_{x\tilde x} = S_{xx_1} - S_{xx_2} = Q - Q_2$, equation (3.35) reads $x(t) = \Pi(z)\big(x_1(t) - x_2(t)\big) + x_2(t)$, which is exactly (3.33). In order to check that $\Pi$ is a projection, notice that right multiplication of (3.25) by $\Delta^{-1}$ gives

$$ (Q - Q_2)\,\Delta^{-1} = (Q - Q_2)\,\Delta^{-1}(Q - Q_2)\,\Delta^{-1}, $$

which shows that $\Pi = \Pi^2$; moreover, (3.25) can be rewritten to look exactly like $\Pi\Delta(I - \Pi)^* = 0$. Thus $\Pi$ is a $\Delta$-orthogonal projection. □

If we couple formula (3.33) with the explicit expressions (3.28) for $x_1(t)$ and $x_2(t)$, we obtain a linear transformation acting on the "data" $\{y_1(t)\}$, $\{y_2(t)\}$ that we want to represent. This is precisely the rule telling us how the factor process of each y-measurable representation is manufactured. Note that (3.33) is still parametrized by $Q$. To complete the picture we now need to describe the solution set of the quadratic equation (3.25).
PROPOSITION 3.2 Let $V$ be a square spectral factor of the spectral density matrix $\Delta = Q_1 - Q_2$. Then all solutions $Q \ne Q_2$ to (3.25) are given by

$$ Q = Q_2 + V\Gamma\Gamma^*V^*, \tag{3.36} $$

where $\Gamma$ is any $n \times k$ ($k \le n$) isometric matrix function on the unit circle, i.e. such that

$$ \Gamma^*\Gamma = I_k, \tag{3.37} $$

$I_k$ being the $k \times k$ identity matrix.
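Proposition 3.2, the projection (3.34) and formula (3.33) can be checked together at one frequency. The sketch below assumes $n = 2$ and picks hypothetical Hermitian positive definite $Q_2$ and $\Delta$ directly (rather than computing them from a spectrum $S$); $V$ is taken to be the Cholesky factor of $\Delta$, which is one admissible square factor, and $\Gamma$ is a $2 \times 1$ unit vector ($k = 1$). It uses the cross spectrum $S_{x_1x_2} = Q_2$ implied by $E[x_1(t) \mid H(x_2)] = x_2(t)$ in the proof of Theorem 3.3.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def add(A, B, s=1.0):
    """A + s*B, entrywise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def chol2(M):
    """Lower-triangular V with V V* = M, for a 2 x 2 Hermitian positive definite M."""
    v00 = math.sqrt(M[0][0].real)
    v10 = M[1][0] / v00
    v11 = math.sqrt(M[1][1].real - abs(v10) ** 2)
    return [[v00, 0.0], [v10, v11]]

def max_abs(A):
    return max(abs(v) for row in A for v in row)

I2 = [[1.0, 0.0], [0.0, 1.0]]
Q2 = [[1.0, 0.2], [0.2, 0.5]]               # hypothetical Q2 at one frequency
Delta = [[2.0, 0.5j], [-0.5j, 1.0]]         # hypothetical Delta = Q1 - Q2 > 0
Q1 = add(Q2, Delta)
V = chol2(Delta)                            # one square factor: V V* = Delta

def solution(Gamma):
    """Q = Q2 + V Gamma Gamma* V*, formula (3.36)."""
    VG = matmul(V, Gamma)
    return add(Q2, matmul(VG, adjoint(VG)))

phi, psi = 0.4, 1.1
gamma = [[math.cos(phi)], [cmath.exp(1j * psi) * math.sin(phi)]]   # isometric: gamma* gamma = 1
Q = solution(gamma)

D = add(Q, Q2, -1.0)                                                  # Q - Q2
residual = add(D, matmul(matmul(D, inv2(Delta)), adjoint(D)), -1.0)   # left side of (3.25)
Pi = matmul(D, inv2(Delta))                                           # (3.34)
I_minus_Pi = add(I2, Pi, -1.0)

# Spectrum of x = Pi x1 + (I - Pi) x2, as in (3.33), with S_x1 = Q1, S_x2 = Q2, S_x1x2 = Q2.
L = [Pi[i] + I_minus_Pi[i] for i in range(2)]                          # the 2 x 4 matrix [Pi, I - Pi]
joint = [Q1[0] + Q2[0], Q1[1] + Q2[1], Q2[0] + Q2[0], Q2[1] + Q2[1]]   # spectrum of [x1; x2]
S_x = matmul(matmul(L, joint), adjoint(L))

print(max_abs(residual))                                         # (3.25) holds
print(max_abs(add(matmul(Pi, Pi), Pi, -1.0)))                    # Pi is idempotent
print(max_abs(matmul(matmul(Pi, Delta), adjoint(I_minus_Pi))))   # Delta-orthogonality
print(max_abs(add(S_x, Q, -1.0)))                                # x has spectrum Q
print(max_abs(add(solution(I2), Q1, -1.0)))                      # k = n (Gamma unitary) gives Q1
```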
Proof: Write $Q - Q_2$, assumed to be of rank k,

$\ge H(n)$, where $H(n) = -\sum_y P(y)\log P(y)$ denotes the entropy of the strings of length n. This means that the ideal way to encode the strings relative to the given distribution is to assign to string y a code string with length $-\log P(y)$. This, of course, cannot always be done exactly, because a code string must have an integer length, but at least we know what we should be striving for, and we call it the ideal code length. Another good name would be the Shannon complexity of y relative to the given distribution.
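A minimal sketch of the Shannon bound, assuming base-2 logarithms and a small, hypothetical distribution over the four strings of length n = 2. Rounding the ideal lengths $-\log_2 P(y)$ up to integers still satisfies the Kraft inequality, and the mean length stays within one bit of the entropy.

```python
import math

def shannon_lengths(P):
    """Integer code lengths ceil(-log2 P(y)) for a distribution {string: probability}."""
    return {y: math.ceil(-math.log2(p)) for y, p in P.items() if p > 0}

def entropy(P):
    """H = -sum P(y) log2 P(y), in bits."""
    return -sum(p * math.log2(p) for p in P.values() if p > 0)

def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality, sum 2^(-L(y))."""
    return sum(2.0 ** -l for l in lengths.values())

# Hypothetical distribution over the strings of length n = 2.
P = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}
L = shannon_lengths(P)
mean_length = sum(P[y] * L[y] for y in L)
print(L, kraft_sum(L), entropy(P), mean_length)
```

Here the lengths are 2, 2, 3 and 4 bits, the Kraft sum is 0.6875, and the mean length 2.4 lies between H and H + 1.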
A direct application of these ideas to compressing strings confronts us with the same problem as met in traditional statistics: the distribution P(y) is not known to us, and it either has to be imagined or, better, estimated. For this reason we consider a parametric family $\{P_\theta(y)\}$ of such distributions or models, where $\theta = (\theta_1, \ldots, \theta_k)$ and k ranges over the set of all natural numbers. How to calculate the ideal code length is now the central problem in the MDL principle, to be discussed next.
There are two basic ways to go about encoding a string of data. In the first way we read the entire string and somehow form the best estimate $\hat\theta(y)$ of the parameter vector $\theta$. Then we design a code C such that the length of the code string C(y) is close to the ideal $-\log P_{\hat\theta(y)}(y)$. We need not concern ourselves with the details of how such a code can be designed, which is just a routine matter. The important thing to realize is that the data y can be decoded from the code string C(y) only if the decoder also knows the estimated parameter vector $\hat\theta(y)$. This has to be given in an explicitly coded form, because the decoder, at the time it is needed, does not yet know y and, hence, cannot calculate the estimate by any conceivable algorithm. The binary code string for the parameter vector, which may be placed as a preamble in front of C(y), must clearly be a prefix code, for otherwise the decoder would not be able to separate it from the subsequent binary code of the data. Hence, its length $L(\theta)$ must satisfy the Kraft inequality, $\sum_\theta 2^{-L(\theta)} \le 1$, where $\theta$ runs through all its possible values. These values are clearly truncations (think of computing the maximum likelihood estimates, which surely result in truncated numbers). If we carry too many fractional digits, the required code will have to be long, while if we truncate too heavily, the results will deviate too much from the optimum, and we end up coding the string with non-optimal parameters. It turns out that when each component is truncated to its optimal precision, reflecting its importance to the entire code length, the code length for the k-component parameter vector and the loss due to truncation together amount to $\frac{k}{2}\log n$ bits, Rissanen (1978). In addition, the decoder will have to be given the number k of the components in the estimated parameter vector as another prefix-coded preamble, which takes a little more than $\log k$ bits.
This number, of course, is almost always quite negligible in comparison with the other length, and we drop it. All told, the best ideal code length with this type of "nonpredictive" coding is, to within terms of order $\log n$, given by
$$I_{NP}(y) = \min_{k,\theta}\Big\{-\log P_\theta(y) + \frac{k}{2}\log n\Big\}. \tag{2.2}$$
The same expression, but with different content and scope, was also derived under restrictive Bayesian assumptions in Schwarz (1978). We also refer to the pioneering work of Akaike (1974) for another criterion, where the weaker model complexity penalizing term k is added to the first term, the negative logarithm of the likelihood. In contrast with (2.2), such a term is too weak to produce consistent estimates of the number of parameters in all the analyzed cases, Hannan (1980) and Shibata (1976). Finally, we add that when the parameter coding job is done more carefully, Rissanen (1983a), a third term is required, namely $k\log\|\hat\theta\|_{M(\hat\theta)}$, where $M(\theta)$ denotes the Hessian matrix of $-\log P_\theta(y)$. This term turns out to be sensitive to the structure in which the parameters of a multivariable dynamic system are represented, Rissanen (1983b); see also Section 4.
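To make (2.2) concrete, here is a minimal sketch for the simplest model class one can think of, Bernoulli models for binary strings with either k = 0 (the parameter fixed at 1/2) or k = 1 (the ML estimate). The model class and the example strings are illustrative assumptions; logarithms are base 2, and the $\log k$ term and the rounding of code lengths are ignored, as in the text.

```python
import math

def two_part_length(bits):
    """Ideal two-part code length (2.2), in bits, for a binary string under the
    Bernoulli class with k = 0 (theta fixed at 1/2) or k = 1 (ML estimate of theta).
    Returns (length, k) for the better of the two."""
    n, ones = len(bits), sum(bits)
    zeros = n - ones
    cost_k0 = float(n)                      # -log2 (1/2)^n, no parameter to describe
    th = ones / n                           # ML estimate
    nll = -sum(c * math.log2(p) for c, p in ((ones, th), (zeros, 1.0 - th)) if c > 0)
    cost_k1 = nll + 0.5 * math.log2(n)      # + (k/2) log2 n with k = 1
    return min((cost_k0, 0), (cost_k1, 1))

skewed = [1] * 90 + [0] * 10
balanced = [0, 1] * 50
print(two_part_length(skewed))    # k = 1 wins: about 50.2 bits instead of 100
print(two_part_length(balanced))  # k = 0 wins: the extra parameter does not pay for itself
```

The penalty $\frac{1}{2}\log_2 n$ is what stops the balanced string from being "explained" by a fitted parameter that buys nothing.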
The other way of coding data strings requires no explicit code for the parameters, because the coding will be done in a "predictive" way. What this means is that from each portion $y^t = y_1, \ldots, y_t$ of the data string we form an estimate of the distribution $P_\theta(y_{t+1} \mid y^t)$ for the possible values of the next symbol, where $\theta$ is to be replaced by an estimated value $\hat\theta(t) = \hat\theta(y^t)$, calculated by an algorithm from the string processed so far. The decoder knows this algorithm, and he can also calculate the same estimate, provided that it indeed does not depend on the future, not yet decoded data points. We know from Shannon's result, derived above, that the best way to do the coding is to assign to the next symbol the code length $-\log P_{\hat\theta(t)}(y_{t+1} \mid y^t)$, and hence the best total ideal code length with this type of predictive coding is
$$I_P(y) = \min_k\Big\{-\sum_{t=0}^{n-1}\log P_{\hat\theta(t)}(y_{t+1} \mid y^t)\Big\}. \tag{2.3}$$
We should also have included the code length, $\log k$, required to describe the number of the components in the estimated parameter vectors, but as above this term is negligible. How should we pick the estimates $\hat\theta(t)$? It seems to make eminent sense to pick them in such a way that the accumulated past code lengths
$$-\log P_\theta(y^t) = -\sum_{i=0}^{t-1}\log P_\theta(y_{i+1} \mid y^i) \tag{2.4}$$
are minimized, which is seen to be done by the maximum likelihood estimates of the parameters for each value of k. This represents a most attractive principle of inductive inference: make the choice that has worked best in the past. And who can argue against that, provided that we have no other "prior" knowledge about the behavior of the data! This philosophy was also discovered independently by Dawid (1984), in his "prequential" approach to estimation. A somewhat similar and yet crucially different "cross-validation" principle has been studied in Stone (1977). Because no "honesty" of the predictions is required there, the associated criterion is asymptotically equivalent to Akaike's AIC, and hence the resulting estimates of the number of parameters are not consistent.
In order to avoid ill-conditioned optimization problems, in (2.4) we never estimate more parameters than data points; that is, we begin with k = 0 and increase k gradually to each final value with which the criterion in (2.3) is evaluated. The case with no free parameters means that we need an initial distribution $P(y_1)$ to predict or encode the very first observation. This could be done by having a fixed parameter value $\hat\theta(0)$, obtained somehow on prior grounds, which singles out a distribution from the family. We discuss later how such prior knowledge can be taken advantage of in modeling and prediction.
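A minimal sketch of predictive coding (2.3) for the same Bernoulli class. One substitution is made and should be noted: instead of the raw ML estimate, which assigns probability zero to a symbol not yet seen, it uses the add-1/2 (Krichevsky-Trofimov) estimate. At t = 0 that estimate equals 1/2, which plays the role of the fixed initial value $\hat\theta(0)$ described above.

```python
import math

def predictive_length(bits):
    """Predictive code length (2.3), in bits, under a Bernoulli model: each bit is
    coded with a probability computed from the preceding bits only, so a decoder
    can repeat exactly the same computation."""
    total, ones = 0.0, 0
    for t, b in enumerate(bits):
        # Add-1/2 estimate (an assumption; the raw ML estimate ones/t would give
        # probability 0 after a run of equal bits). At t = 0 it is 1/2.
        p1 = (ones + 0.5) / (t + 1.0)
        total -= math.log2(p1 if b else 1.0 - p1)
        ones += b
    return total

skewed = [1] * 90 + [0] * 10
print(predictive_length([0]))      # the first bit costs 1 bit, from theta(0) = 1/2
print(predictive_length(skewed))   # comparable to the two-part length of (2.2)
```

No parameter is ever transmitted: the cost of learning $\theta$ shows up instead as the extra length of the early, poorly predicted symbols.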
The predictive coding process is seen to be very similar to prediction: in both cases we try to unravel the uncertainty about the "next" observation $y_{t+1}$ by acting on the past data only. In fact, the two processes are equivalent. To see this, let $\delta(y_{t+1} - \hat y(t+1 \mid t))$ be any reasonable prediction error measure, where $\hat y(t+1 \mid t)$ is some prediction of the next observation, involving parameters to be estimated from the past data. Define a conditional density $f_\theta(y_{t+1} \mid y^t)$ proportional to $e^{-\delta(y_{t+1} - \hat y(t+1 \mid t))}$, and we get a family of parametric probabilistic models, where the code length, apart from an irrelevant term due to truncation and proportional to n, is the sum of the prediction errors. A particularly important special case results from the quadratic prediction error measure, because then the predictive MDL principle reduces to a predictive least squares (LS) principle. We discuss its application to ARMA estimation in the next section. Because the non-predictive coding process cannot be interpreted as prediction, we conclude that coding is a strictly more general process than prediction.
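The equivalence can be written out in a few lines, assuming the error measure $\delta$ is integrable, so that the normalizing constant Z does not depend on the prediction:
$$f(y_{t+1} \mid y^t) = \frac{1}{Z}\,e^{-\delta(y_{t+1} - \hat y(t+1 \mid t))}, \qquad Z = \int_{-\infty}^{\infty} e^{-\delta(x)}\,dx,$$
$$-\sum_{t=0}^{n-1}\ln f(y_{t+1} \mid y^t) = \sum_{t=0}^{n-1}\delta\big(y_{t+1} - \hat y(t+1 \mid t)\big) + n\ln Z.$$
For the quadratic measure $\delta(x) = x^2$ one gets $Z = \sqrt{\pi}$, so the code length in nats is the sum of the squared prediction errors plus $\frac{n}{2}\ln\pi$, a term proportional to n that plays no role when predictors are compared.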
We conclude this section by stating that the two described code lengths are asymptotically optimal in the sense that their mean, relative to any process in the considered class of "smooth" models, is shortest among all codes satisfying (2.1). Because the variance of these lengths, computed per observation, behaves like 1/n, we may take these lengths themselves to represent well the shortest possible per-symbol code lengths (prediction errors), and we call $I_{NP}(y)$ and $I_P(y)$ the non-predictive and predictive stochastic complexities, respectively, of the string y, relative to the considered class of models. This result not only generalizes the above mentioned Shannon theorem, giving a tight lower bound for the code length and the prediction errors, but it also serves a role similar to that of the Cramér-Rao inequality for estimators, except that we may assess the goodness of any estimators, including those of the number of parameters. The name "complexity" seems apt in view of the fact that it represents the ultimate limit to which the three fundamental tasks, prediction, estimation, and coding, can be performed.

3. ARMA Estimation and Prediction
As we outlined in the preceding section, estimation and prediction are intertwined: you cannot predict optimally without performing estimation optimally. Here we mean the real prediction problem, where we are given an observed sequence of numbers, y(1), ..., y(n), one by one, and we are asked to predict for each n the next value. Unlike the usual setting of prediction theory, this is to be done without knowing the probabilistic source of the numbers. Our approach is to select a class of models, or perhaps several classes, and fit a model in each class with the predictive LS principle. The prediction will be done with the best model at each instant of time, and if the past is any guidance to the future, this strategy will provide the best predictions obtainable with the selected class. We shall choose the model class as the Gaussian ARMA class, which means that we shall have to know how prediction is done optimally for such processes. The Kalman theory is in principle applicable, but the solution it provides involves parameters that cannot be estimated from the observations. For this reason we shall use another approach, Rissanen (1967), and we give the relevant recurrence equations below.
Consider a process generated by the recursion
$$y(t) + a_1 y(t-1) + \cdots + a_p y(t-p) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.1}$$
for t > p, where e is an orthogonal zero-mean process with variance $E(e(t)^2) = \sigma^2$. Letting u(t), for $t \ge p$, stand for the MA process
$$u(t) = e(t) + c_1 e(t-1) + \cdots + c_q e(t-q), \tag{3.2}$$
we see that the covariance $E(u(t)u(s)) = r(t,s)$, $t, s \ge p$, satisfies the crucial "bandedness" property
$$r(t,s) = 0 \quad \text{for } |t-s| > q. \tag{3.3}$$
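The bandedness (3.3) is easy to see in the stationary special case, which the sketch below assumes (the text's r(t,s) also covers non-stationary initial conditions). Two terms of the MA sum are correlated only when they share the same e(.), and no such pair exists at lags beyond q.

```python
def ma_autocov(c, sigma2=1.0, max_lag=None):
    """Autocovariances r(k) = E[u(t) u(t+k)] of the stationary MA(q) process
    u(t) = e(t) + c_1 e(t-1) + ... + c_q e(t-q), e white with variance sigma2."""
    coeffs = [1.0] + list(c)
    q = len(c)
    if max_lag is None:
        max_lag = q + 2
    r = []
    for k in range(max_lag + 1):
        # Only pairs of terms sharing the same e(.) contribute; none exist for k > q.
        r.append(sigma2 * sum(coeffs[i] * coeffs[i + k] for i in range(q + 1 - k)) if k <= q else 0.0)
    return r

print(ma_autocov([-0.3]))        # r(0) = 1.09, r(1) = -0.3, then zeros
print(ma_autocov([0.4, 0.2]))    # r(0) = 1.2, r(1) = 0.48, r(2) = 0.2, then zeros
```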
We let the initial variables be specified by the covariances as follows:
$$E(u(t)y(s)) = r(t,s) = 0 \ \text{ if } t - s > q, \qquad E(y(t)y(s)) = r(t,s), \quad t, s < p. \tag{3.4}$$
The problem is to find the orthogonal projection of y(t) on $Y_0^{t-1}$, the subspace spanned by the observations up to t - 1, written as $\hat y(t \mid t-1)$. The task is simple if we find a representation of the process u as follows:
$$u(t) = \varepsilon(t) + c_1(t)\varepsilon(t-1) + \cdots + c_q(t)\varepsilon(t-q), \quad t > q, \tag{3.5}$$
where $\varepsilon(t)$ is an uncorrelated (but not unit-variance) process; the variables for non-positive indices are zero. The coefficients are found by the Cholesky factorization of the covariance matrix $R = \{r(i,j)\}$ as $R = B'B$, where B is upper triangular,
$$B = \begin{pmatrix} b(0,0) & b(0,1) & \cdots & b(0,n) \\ 0 & b(1,1) & \cdots & b(1,n) \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & b(n,n) \end{pmatrix}.$$
Specifically, the factors are defined by the following recursions, which are also seen to result from the Gram-Schmidt orthogonalization procedure:
$$b(t-i,t) = \Big[r(t-i,t) - \sum_{j=i+1}^{q} b(t-j,t)\,b(t-j,t-i)\Big]\,b^{-1}(t-i,t-i), \quad i = q, q-1, \ldots, 1,$$
$$b(t,t) = +\big[r(t,t) - b^2(t-q,t) - \cdots - b^2(t-1,t)\big]^{1/2}, \quad t > 0, \tag{3.6}$$
$$b(0,0) = +\sqrt{r(0,0)}, \qquad b(t-i,t) = 0, \quad i > q.$$

$$\hat y(t \mid t-1) + \sum_{i=1}^{k} c_i(t)\,\hat y(t-i \mid t-i-1) = \sum_{i=1}^{k} d_i(t)\,y(t-i), \tag{3.8}$$
where $d_i(t) = c_i(t) - a_i$, i = 1, ..., k, for $k = \max\{p, q\}$, and the coefficients with undefined index values are zero.
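The sketch below runs recursions (3.6) and checks them against a library Cholesky factorization. Equation (3.7) did not survive extraction; the helper `innovation_coefficients` (a hypothetical name) therefore uses $c_i(t) = b(t-i,t)/b(t-i,t-i)$, which is what $R = B'B$ together with (3.5) implies if $\varepsilon(t) = b(t,t)\,w(t)$ with w white and of unit variance. The MA(1) test covariance is an illustrative choice.

```python
import numpy as np

def banded_cholesky(R, q):
    """Recursions (3.6): upper-triangular B with R = B'B and b(t-i, t) = 0 for i > q.
    R is symmetric positive definite with entries vanishing beyond lag q."""
    N = R.shape[0]
    B = np.zeros((N, N))
    for t in range(N):
        m = min(q, t)                        # no rows above row 0
        for i in range(m, 0, -1):            # i = q, q-1, ..., 1
            s = sum(B[t - j, t] * B[t - j, t - i] for j in range(i + 1, m + 1))
            B[t - i, t] = (R[t - i, t] - s) / B[t - i, t - i]
        B[t, t] = np.sqrt(R[t, t] - sum(B[t - i, t] ** 2 for i in range(1, m + 1)))
    return B

def innovation_coefficients(B, t, q):
    """Hypothetical helper: c_i(t) = b(t-i, t) / b(t-i, t-i), i = 1..q."""
    return [B[t - i, t] / B[t - i, t - i] for i in range(1, min(q, t) + 1)]

# Covariance of the stationary MA(1) process u(t) = e(t) - 0.3 e(t-1), unit-variance e.
N, c = 30, -0.3
R = np.diag(np.full(N, 1 + c * c)) + np.diag(np.full(N - 1, c), 1) + np.diag(np.full(N - 1, c), -1)
B = banded_cholesky(R, q=1)
print(np.allclose(B.T @ B, R), np.allclose(B, np.linalg.cholesky(R).T))
print(innovation_coefficients(B, N - 1, 1))   # approaches c = -0.3
```

Because the root of $1 - 0.3z$ lies outside the unit circle, $c_1(t)$ converges quickly to $c_1 = -0.3$, which is the behavior described in the next paragraph.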
One can show that if the polynomial defined by the coefficients $c_i$ has its roots strictly outside the unit circle, then $c_i(t) \to c_i$. The limiting predictor, then, agrees with the stationary optimal predictor
$$\hat y(t \mid t-1) + \sum_{i=1}^{q} c_i\,\hat y(t-i \mid t-i-1) = \sum_{i=1}^{k} (c_i - a_i)\,y(t-i). \tag{3.9}$$
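A small sketch of the stationary predictor (3.9), assuming known coefficients and zero initial conditions both in the simulated process and in the predictor. Under that assumption the prediction errors coincide with e(t) exactly, up to rounding, because the difference $y(t) - \hat y(t \mid t-1) - e(t)$ obeys a homogeneous recursion that starts at zero. The values a = 0.5, c = -0.3 are those of the simulation reported in Table 1.

```python
import random

def arma_simulate(a, c, n, seed=0):
    """Simulate (3.1) with zero initial conditions and unit-variance Gaussian e."""
    rng = random.Random(seed)
    y, e = [], []
    for t in range(n):
        et = rng.gauss(0.0, 1.0)
        ma = sum(ci * e[t - i] for i, ci in enumerate(c, start=1) if t - i >= 0)
        ar = sum(ai * y[t - i] for i, ai in enumerate(a, start=1) if t - i >= 0)
        e.append(et)
        y.append(et + ma - ar)
    return y, e

def stationary_predictions(y, a, c):
    """Predictor (3.9): yhat(t|t-1) = sum_i (c_i - a_i) y(t-i) - sum_i c_i yhat(t-i|t-i-1)."""
    k = max(len(a), len(c))
    a_ = list(a) + [0.0] * (k - len(a))
    c_ = list(c) + [0.0] * (k - len(c))
    yhat = []
    for t in range(len(y)):
        v = 0.0
        for i in range(1, k + 1):
            if t - i >= 0:
                v += (c_[i - 1] - a_[i - 1]) * y[t - i] - c_[i - 1] * yhat[t - i]
        yhat.append(v)
    return yhat

a, c = [0.5], [-0.3]
y, e = arma_simulate(a, c, 5000)
yhat = stationary_predictions(y, a, c)
errors = [yt - yh for yt, yh in zip(y, yhat)]
mse = sum(err * err for err in errors) / len(errors)
print(max(abs(err - et) for err, et in zip(errors, e)))   # rounding level: errors equal e(t)
print(mse)                                                # close to sigma^2 = 1
```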
We now return to the main problem of how to do the prediction when the coefficients and the two order numbers p and q are not known. We apply the predictive LS principle and proceed as follows. For each pair (p, q) and each t we solve the following ordinary least squares problem
$$\min_\theta \sum_{i=1}^{t} \varepsilon^2(i), \tag{3.10}$$
where $\theta$ denotes the vector of the coefficients $\alpha = (a_1, \ldots, a_p, c_1, \ldots, c_q)$ together with the $p(p+1)/2 + q(q+1)/2$ initial elements r(i,j) in (3.6), defining a vector $\beta$, and $\varepsilon(i)$ may be solved recursively from (3.1), (3.2), and (3.5). Let the minimizing parameters be $\hat\theta(t) = (\hat\alpha(t), \hat\beta(t))$. With these we now extend the Cholesky factorization one more step; i.e., we compute the coefficients (3.6) for t + 1, and with (3.7) we calculate the new prediction $\hat y(t+1 \mid t)$ from formula (3.8), which clearly depends only on the past data and the pair (p, q), because the calculation of $\hat\theta(t)$ is done by the fixed ordinary least squares algorithm. This gives the prediction error $\hat\varepsilon(t+1 \mid p,q) = y(t+1) - \hat y(t+1 \mid t)$. As the final step we find the best pair (p(n), q(n)), which solves the optimization problem
$$I_P(y) = \min_{p,q}\sum_{t=0}^{n-1}\hat\varepsilon^2(t+1 \mid p,q). \tag{3.11}$$
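A simplified sketch of the predictive LS principle (3.11), restricted to pure AR models so that each fit is a plain linear least squares problem; the MA part, the initial covariances and the Cholesky step are deliberately left out. Following the rule of never estimating more parameters than data points, the sketch predicts 0 until enough rows are available. The data are hypothetical, and the refit from scratch at every t is chosen for clarity, not speed.

```python
import numpy as np

def pls_ar(y, p):
    """Accumulated squared one-step prediction errors, the sum in (3.11), for an AR(p)
    predictor whose coefficients at time t are the ordinary LS fit to y(0..t) only."""
    y = np.asarray(y, dtype=float)
    total = 0.0
    for t in range(len(y) - 1):
        pred = 0.0                            # no free parameters yet: predict 0
        n_rows = t + 1 - p                    # regressions of y(s) on y(s-1..s-p), s = p..t
        if p > 0 and n_rows >= p:             # never more parameters than data points
            X = np.array([y[s - p:s][::-1] for s in range(p, t + 1)])
            theta, *_ = np.linalg.lstsq(X, y[p:t + 1], rcond=None)
            pred = float(theta @ y[t + 1 - p:t + 1][::-1])
        total += (y[t + 1] - pred) ** 2
    return total

# Hypothetical data: AR(1), y(t) = 0.9 y(t-1) + e(t).
rng = np.random.default_rng(1)
e = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.9 * y[t - 1] + e[t]

scores = {p: pls_ar(y, p) / len(y) for p in range(4)}
best_p = min(scores, key=scores.get)
print(scores, best_p)
```

Each score is an honest out-of-sample quantity: extra parameters are not rewarded automatically, because they must first be estimated from the past before they can help predict the future.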
It can be shown that asymptotically
$$\frac{1}{n}I_P(y) \approx \hat\sigma^2\Big(1 + \frac{p+q}{n}\ln n\Big), \tag{3.12}$$
where
$$\hat\sigma^2 = \frac{1}{n}\sum_{t=1}^{n} e^2(t). \tag{3.13}$$
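To get a feel for the size of the penalty in (3.12), take n = 600, the sample size of the simulation reported below:
$$\frac{p+q}{n}\ln n = \frac{2}{600}\ln 600 \approx \frac{2 \times 6.397}{600} \approx 0.0213 \quad (p + q = 2), \qquad \frac{4}{600}\ln 600 \approx 0.0426 \quad (p + q = 4).$$
At this sample size each additional parameter thus raises the per-symbol complexity by about 1.07 percent of $\hat\sigma^2$.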
Remarks.
In the above described procedure we did not pay any attention to the amount of computation needed. Rather, our aim was to do the prediction as well as we know how, provided, though, that there is no prior knowledge about the parameter values. Clearly, when calculating $\hat\theta(t+1)$ by a suitable hill-climbing routine, we should use $\hat\theta(t)$ as the initial estimate. It is also possible to calculate the Cholesky factorization by an algorithm that is an order of magnitude faster, Rissanen (1973), in case the covariance matrix R(t) is a Toeplitz matrix, i.e., if the process u is stationary and we set the initial conditions to zero. Alternatively, it is enough to have initial conditions such that the y-process is stationary. The resulting fast predictor recursions have been described by Lindquist (1974), by Kailath, Morf, and Sidhu (1974), and by Rissanen (1975), after this author lectured on the topic at Stanford University during 1971-1972. Much earlier, the impulse response of a stationary predictor was found with a fast algorithm by Levinson, but that algorithm required an ever growing memory.
The entire Cholesky factorization can be avoided if we ignore the influence of the initial conditions and simply replace the representation (3.5) by (3.2). The only remaining problem then is to compute the sequence of estimates $\hat\alpha(t)$, t = 0, ..., n - 1, for different values of p and q. In the case of AR processes such calculations can be done recursively by the so-called ladder forms; see Wax (1985).
As a final remark, the difference between the complexity and the sum of the squared residuals (3.13) was observed in Bittanti (1983), where it was wondered whether the relationship between the two could be clarified. Well, (3.12) does it in a most decisive manner.
In Rissanen (1984c) we ran a small simulation to test the predictive least squares principle for estimating ARMA orders. We used the stationary equations, which do not require the Cholesky factorization. The data were generated with an ARMA(1,1) system with parameters a = 0.5 and c = -0.3, where e(t) was a computer-generated zero-mean, unit-variance, independent Gaussian sequence. We fitted models of type ARMA(p,q) with (p,q) = (1,0), (2,0), (1,1), and (2,2). The following table gives the sum in (3.12), calculated for various values of p, q and divided by n, along a single sample of size 600.
(p,q)    n = 50   n = 100   n = 200   n = 300   n = 600
(1,0)    1.336    1.276     1.101     1.107     1.015
(2,0)    1.629    1.385     1.156     1.120     -
(1,1)    1.505    1.307     1.117     1.091     0.996
(2,2)    1.925    1.520     1.221     1.159     -
Table 1. Simulations of ARMA processes
We see that the models (2,0) and (2,2) give uniformly worse values than the two best models (1,0) and (1,1) in the table for all sample sizes (we did not calculate the last entry for them, which surely would have been worse, too). In the (2,2) model, in particular, the two extra parameters heavily penalize the prediction errors. For sample sizes up to 200 the simpler model (1,0) performs best, but eventually the model with the right number of parameters, (1,1), is the winner. This makes sense in that there is no predictive benefit in estimating the second, less significant parameter until there is enough data, even if we knew that such a parameter existed; the data are the ultimate arbiter in deciding what is optimal and what is not.
We then wanted to study how initial estimates of the parameters might be taken advantage of to improve the parameter estimates and the predictions. After all, in our opinion, the most natural and easy way to incorporate initial knowledge is directly in terms of the estimate of the parameters, including their number. Indeed, the parameters usually represent constants, and any Bayesian type of prior distribution for them is both awkward to justify and just about impossible to estimate in a meaningful way. The traditional Bayesian formalism does not permit a representation of initial knowledge in terms of a parameter value, because the so-defined singular distribution cannot be altered by the data. Our formalism, however, does it easily. In fact, let $\hat\theta(0)$ denote the initial estimate with p + q components. Then, with $\hat\theta(t)$ denoting the predictive LS estimate from the first t observations with the initial knowledge ignored, as described above, define a new estimate as a linear combination of the two:
$$\bar\theta(t) = \alpha_t\,\hat\theta(0) + (1 - \alpha_t)\,\hat\theta(t). \tag{3.14}$$
The coefficient is defined as follows:
$$\alpha_t = \frac{1}{1 + 2^{L_0(p,q,t) - L(p,q,t)}}, \tag{3.15}$$
where L(p,q,t) denotes the accumulated prediction errors (3.11) before the minimization, and $L_0(p,q,t)$ is the same quantity when the parameter is the initial estimate $\hat\theta(0)$. Because this parameter is the same throughout the data, $L_0(p,q,t)$ coincides with the usual non-predictive sum of the squared deviations. We see that a good initial estimate tends to make the corresponding code length shorter than the length L(p,q,t) for small values of t, because initially the estimate $\hat\theta(t)$ tends to be poor due to the small sample size. This causes $\alpha_t$ to be near one, and the effective estimate $\bar\theta(t)$ is close to the initial estimate. However, eventually $\alpha_t$ gets small, unless the initial estimate is perfect, and the effective estimate tends to the steadily improving estimate $\hat\theta(t)$.
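The weighting (3.14)-(3.15) takes only a few lines. The sketch treats L and $L_0$ as code lengths in bits, as the base 2 in (3.15) suggests, and rewrites the formula so that large differences cannot overflow a float. The numerical inputs in the usage lines are hypothetical.

```python
def alpha(L0, L):
    """Weight (3.15) on the initial estimate: 1 / (1 + 2^(L0 - L)),
    rearranged so that large code-length differences cannot overflow."""
    d = L0 - L
    if d > 0:
        w = 2.0 ** (-d)                  # in (0, 1]; underflows harmlessly to 0.0
        return w / (1.0 + w)
    return 1.0 / (1.0 + 2.0 ** d)

def combined_estimate(theta0, theta_hat, L0, L):
    """Estimate (3.14): alpha * theta0 + (1 - alpha) * theta_hat, componentwise."""
    w = alpha(L0, L)
    return [w * x0 + (1.0 - w) * x for x0, x in zip(theta0, theta_hat)]

print(alpha(5.0, 5.0))                                            # 0.5: no evidence either way
print(combined_estimate([0.7, -0.1], [0.15, 0.01], 0.0, 10.0))   # stays near the initial estimate
print(combined_estimate([0.7, -0.1], [0.50, -0.33], 40.0, 0.0))  # follows the LS estimate
```

Because the weight depends on a difference of code lengths, it switches from the prior value to the data-driven value as soon as the data have "paid" for the change.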
To test the feasibility of this scheme we generated a data sequence of length 100 with the ARMA(1,1) model defined by the two parameters a = 0.7, c = -0.1, and with a unit-variance, zero-mean, independent Gaussian input sequence. We set the initial estimate $\hat\theta(0) = (0.7, -0.1)$ of the parameters at the "true" values. We wish to compare the convergence of the parameter estimate $\bar\theta(t) = (\bar a, \bar c)$, given by such perfect initial knowledge, with that of the least squares estimates $\hat\theta(t) = (\hat a, \hat c)$. In this experiment, then, we kept the number of parameters at the correct value. These estimates, along with the two sums of the squared prediction errors corresponding to the two estimators $\bar\theta$ and $\hat\theta$, respectively, were computed along the 100 sample points, and the results are in the following table.
          with prior estimates      without prior estimates
time t    a       c       L         a       c       L
10        0.15    0.01    5.0       0.04    0.03    4.99
20        0.47   -0.23    17.1      0.16   -0.41    20.0
30        0.62   -0.17    28.7      0.24   -0.50    33.7
40        0.69   -0.11    36.0      0.43   -0.28    42.9
100       0.69   -0.11    98.9      0.50   -0.33    108.2
Table 2. Effect of prior estimates
We see that indeed good initial estimates improve both the convergence and the prediction error.
4. Vector Time Series Models
As is well known, the class of multi-input/output linear dynamic systems, even of a fixed dimensionality, is topologically a much more complex space than in the case when either the input or the output is scalar. Hence, when we search for the stochastic complexity of an observed vector time series, relative to the class of such models, we may find a model which with relatively few parameters will capture the essence of the data. In the older statistical literature the only models that were fitted to a series with, say, p components had the maximal dimensionality, a multiple of p. This was justified on the grounds that, since the estimated Hankel matrix, or its equivalent, has maximal rank, there is no point in fitting other models. Such an argument indicates a gross misunderstanding of modeling, and, in fact, an equivalent argument would dismiss fitting dynamic systems to scalar sequences as well; after all, no observed sequence is generated by any dynamic system.
Since the theory of multivariable linear dynamic systems is by now well known, and may even be covered in some of the other chapters of this book, we do not need to describe it here in any detail. Instead, we just summarize the relevant facts. The set of all linear dynamic systems with, say, p inputs and equally many outputs is in one-to-one correspondence with the set of all Hankel matrices of p x p blocks with finite rank, say n. Given such a system, its matrix input/output impulse response defines a Hankel matrix of p x p blocks and rank n. Conversely, any of the usual realization algorithms defines a p-input/output system of degree n from such a Hankel matrix.
The set of all finite rank Hankel matrices of p x p blocks clearly admits a partition into equivalence classes by the rank n. How can each such class be parametrized? Unlike in the case with p = 1, an equivalence class corresponding to a rank n, and hence the set of all linear systems of order n with p inputs and outputs, cannot be parametrized with a single coordinate system, and the set is not a linear space. This important observation is at the root of the modern theory of linear dynamic systems, and it also affects in a profound manner the way such models ought to be fitted to the observed time series. Consider, for example, the set of all Hankel matrices with p = 2 and n = 3. If we further assume the first two rows to be linearly independent, as we should to avoid pathology, then the Hankel property implies that either the third or the fourth row must be the last remaining row that, together with the first two, forms a 3-element basis for the span of all the rows in the matrix. Again, the Hankel property implies that these three basis rows are defined as soon as we specify the first two elements in each, hence six altogether. In the former case, where the basis consists of the first three rows, the fourth and the fifth rows are linear combinations of the basis elements and, hence, to specify them we need six coefficients.
All the other rows in the Hankel matrix are now just shifts and truncations of these and the basis rows, so that 2np = 12 parameters specify the whole matrix. Similarly, 12 parameters are needed to specify all the Hankel matrices where the first, second, and fourth rows form a basis.
Consider now the set of matrices where the fourth row is a basis element. Then the third row perforce is linearly dependent on the first, second, and fourth rows. Consider the further subset where the third row in fact is linearly dependent on the first two. Evidently no such matrix, and no corresponding linear system, could be expressed in terms of the parameters defined by the basis consisting of the first, second, and third rows. From this we conclude that in order to parametrize the set of all systems of degree 3 having two inputs and outputs, we need two distinct coordinate systems.
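The row-selection rule just described (keep a row if it is not in the span of the rows above it) can be sketched on a finite section of the block Hankel matrix, with rows numbered from 0. The two example systems are hypothetical. The first is random and so generically uses the first three rows. The second is built so that its third row depends on the first two, which forces the basis {first, second, fourth} and hence the second coordinate system.

```python
import numpy as np

def block_hankel(markov, block_rows, block_cols):
    """Block Hankel matrix whose (i, j) block is markov[i + j] (each block p x p)."""
    return np.block([[markov[i + j] for j in range(block_cols)] for i in range(block_rows)])

def hankel_of(A, B, C, block_rows=4, block_cols=4):
    """Finite section of the Hankel matrix of the impulse response C A^k B."""
    markov = [C @ np.linalg.matrix_power(A, k) @ B for k in range(block_rows + block_cols - 1)]
    return block_hankel(markov, block_rows, block_cols)

def lexicographic_basis(H, tol=1e-9):
    """Scan the rows top to bottom, keeping each row not in the span of those above it."""
    basis = []
    for i in range(H.shape[0]):
        if np.linalg.matrix_rank(H[basis + [i]], tol=tol) > len(basis):
            basis.append(i)
    return basis

rng = np.random.default_rng(0)
H_generic = hankel_of(0.5 * rng.standard_normal((3, 3)),
                      rng.standard_normal((3, 2)), rng.standard_normal((2, 3)))

# Hand-made system (p = 2, n = 3) whose third Hankel row equals its second.
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.2, 0.3]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H_special = hankel_of(A, B, np.eye(2, 3))

print(lexicographic_basis(H_generic))   # generically rows 0, 1, 2 (first, second, third)
print(lexicographic_basis(H_special))   # rows 0, 1, 3 (first, second, fourth)
```

Either way the basis has n = 3 rows; what changes is which coordinate system the parameters live in, which is exactly why fitting has to consider more than one structure.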
In general, then, the set of all linear systems of degree n having p inputs and outputs, may be partitioned into finitely many equivalence classes, each class corresponding to the so-called lexicographic basis defined by each matrix as follows: Each of the first p rows is included in the basis, and the next basis element is the first row which is not in the linear span of those above it, and so on. Consider the ith row, i _ for
(1.a) if there exists an input function which carries the event $(\tau, 0)$ into $(t, x)$ [$(t, x)$ into $(\tau, 0)$]. $X_r(\tau,t)$ [$X_c(t,\tau)$] denotes the set of the reachable [controllable] states at t over $(\tau, t)$ [$(t, \tau)$].
1.2 The state x is reachable [controllable] at t if there exists $\tau < t$ [$\tau > t$] such that x is reachable [controllable] at t over $(\tau, t)$ [$(t, \tau)$]. $X_r(t)$ [$X_c(t)$] denotes the set of the reachable [controllable] states at t.
1.3 System (1.a) (or, equivalently, the pair (A,B)) is reachable [controllable] at time t if $X_r(t) = \mathbb{R}^n$ [$X_c(t) = \mathbb{R}^n$].
1.4 System (1) (or, equivalently, the pair (A,B)) is reachable [controllable] if $X_r(t) = \mathbb{R}^n, \forall t$ [$X_c(t) = \mathbb{R}^n, \forall t$].
2.3 Grammian matrices
The following n x n matrices are named reachability and controllability Grammian matrices, respectively:
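$$W_r(\tau,t) = \int_\tau^t \Phi(t,\sigma)B(\sigma)B(\sigma)'\Phi(t,\sigma)'\,d\sigma, \quad t > \tau, \tag{3.a}$$
$$W_c(t,\tau) = \int_t^\tau \Phi(t,\sigma)B(\sigma)B(\sigma)'\Phi(t,\sigma)'\,d\sigma, \quad \tau > t. \tag{3.b}$$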
It is well known (Kalman, 1969) that
$$X_r(\tau,t) = \mathcal{R}[W_r(\tau,t)], \qquad X_c(t,\tau) = \mathcal{R}[W_c(t,\tau)],$$
where $\mathcal{R}[\cdot]$ is the range operator.
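In the periodic case, the following recursions can be derived in view of (2):
$$W_r(t-(i+1)T, t) = W_r(t-iT, t) + [\Phi(t+T,t)]^i\,W_r(t-T,t)\,[\Phi(t+T,t)']^i, \tag{4.a}$$
$$W_c(t, t+(i+1)T) = W_c(t, t+iT) + [\Phi(t+T,t)]^{-i}\,W_c(t,t+T)\,[\Phi(t+T,t)']^{-i}. \tag{4.b}$$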
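Recursion (4.a) can be checked numerically in the scalar case, where $\Phi(t+T,t)$ is just a number. The sketch assumes a hypothetical T-periodic system $\dot x = a(t)x + b(t)u$ with $a(t) = -0.5 + 0.8\sin(2\pi t/T)$ and $b(t) = 1 + 0.5\cos(2\pi t/T)$, whose transition function has a closed form, and evaluates the Grammian (3.a) by Simpson's rule.

```python
import math

T = 1.0                                  # period of the hypothetical example system

def a(t):
    return -0.5 + 0.8 * math.sin(2 * math.pi * t / T)

def b(t):
    return 1.0 + 0.5 * math.cos(2 * math.pi * t / T)

def phi(t, s):
    """Transition function exp(int_s^t a) of the scalar T-periodic system."""
    integral = -0.5 * (t - s) - 0.8 * T / (2 * math.pi) * (
        math.cos(2 * math.pi * t / T) - math.cos(2 * math.pi * s / T))
    return math.exp(integral)

def simpson(f, lo, hi, m):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (hi - lo) / m
    acc = f(lo) + f(hi)
    for j in range(1, m):
        acc += (4 if j % 2 else 2) * f(lo + j * h)
    return acc * h / 3

def W_r(tau, t, m_per_period=2000):
    """Reachability Grammian (3.a), scalar case, by numerical quadrature."""
    m = m_per_period * max(1, round((t - tau) / T))
    return simpson(lambda s: (phi(t, s) * b(s)) ** 2, tau, t, m)

t = 0.3
psi = phi(t + T, t)                      # Phi(t+T, t), here exp(-0.5 T)
W1 = W_r(t - T, t)
for i in (1, 2):
    lhs = W_r(t - (i + 1) * T, t)
    rhs = W_r(t - i * T, t) + psi ** i * W1 * psi ** i     # recursion (4.a)
    print(i, lhs, rhs)
```

The recursion reflects the fact that, by periodicity, each extra period of history contributes a copy of the one-period Grammian propagated through $\Phi(t+T,t)$.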
2.4 Five structural properties of time-invariant systems
The structural properties of linear time-invariant systems have received ample coverage in the literature, see e.g. Kalman (1969), Chen (1970). Five well known properties are listed below. Although some of them are trivial, it is advisable to list them all, in the proper order, for the subsequent discussion on periodic systems.
A) The reachability and controllability subspaces at time t do coincide:
$$X_r(t) = X_c(t), \quad \forall t.$$
B) The reachability and controllability subspaces are time-invariant:
$$X_r(t) = \text{const.}, \quad X_c(t) = \text{const.}, \quad \forall t.$$
C) If the pair (A,B) is reachable [controllable] at a time point, it is reachable [controllable] at any time point:
$$X_r(t_0) = \mathbb{R}^n \text{ for some } t_0 \;\Longrightarrow\; X_r(t) = \mathbb{R}^n,\ \forall t \qquad [X_c(t_0) = \mathbb{R}^n \text{ for some } t_0 \;\Longrightarrow\; X_c(t) = \mathbb{R}^n,\ \forall t].$$
D) If a state is reachable at t [controllable at t], then, for any $\varepsilon > 0$, it is reachable over $(t-\varepsilon, t)$ [controllable over $(t, t+\varepsilon)$]:
$$X_r(t) = X_r(t-\varepsilon, t), \qquad X_c(t) = X_c(t, t+\varepsilon).$$
E) The pair (A,B) is reachable if and only if $[sI - A \;\; B]$ is full rank on the spectrum of A.
Property B is intuitive. C is a trivial consequence of B and is listed here to stress the difference between the time-invariant and the periodic case. Property D can be rephrased by saying that, in time-invariant systems, reachability and controllability are "instantaneous": in the absence of any energy constraint on the input function, transitions can be made to occur in an arbitrarily short time interval. Finally, E is the classical spectral characterization of the reachability of continuous-time systems, often referred to as the PBH (Popov-Belevitch-Hautus) condition; see Johnson (1966), Popov (1966), Belevitch (1968), Hautus (1969).
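A minimal sketch of property E for a constant pair (A,B), cross-checked against the familiar rank test on $[B, AB, \ldots, A^{n-1}B]$. The example matrices are hypothetical, and the numerical rank tolerance is an assumption; this is the time-invariant test only, not its periodic generalization discussed below.

```python
import numpy as np

def pbh_reachable(A, B, tol=1e-9):
    """PBH test (property E): (A, B) is reachable iff [sI - A, B] has full row
    rank n at every eigenvalue s of A."""
    n = A.shape[0]
    for s in np.linalg.eigvals(A):
        M = np.hstack([s * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True

def kalman_reachable(A, B, tol=1e-9):
    """Cross-check with the rank of [B, AB, ..., A^(n-1) B]."""
    n = A.shape[0]
    K = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    return np.linalg.matrix_rank(K, tol=tol) == n

A = np.diag([-1.0, -2.0])
for B in (np.array([[1.0], [1.0]]), np.array([[1.0], [0.0]])):
    print(pbh_reachable(A, B), kalman_reachable(A, B))
```

With the second input matrix the mode at s = -2 receives no input, and the PBH matrix loses rank exactly at that eigenvalue, which pinpoints the unreachable mode rather than just flagging a rank deficiency.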
2.5 Five structural properties of periodic systems
The following basic questions are considered in this section: Do properties A-C hold true for periodic systems? Can anything be said about the reachability and controllability intervals? Does there exist a periodic version of the PBH test?
In the first place, property A still holds true, i.e. the reachability and controllability subspaces coincide even in the periodic case:
$$X_r(t) = X_c(t), \quad \forall t.$$
Like many results concerning the structural properties of periodic systems, this can be proven either via geometric or via algebraic methods. As an example of a typical geometric derivation, let us outline the proof of the inclusion $X_c(t) \subset X_r(t)$. Let $x^\circ$ be a state which is controllable at t, and denote by N a positive integer such that any state controllable at t can be driven to zero in an interval of length NT. Due to periodicity, $x^\circ$ is controllable at t + NT as well. Let $x^* = \Phi(t, t+NT)x^\circ$. It is apparent that $x^*$ is controllable at t, since with zero input the event $(t, x^*)$ is carried into $(t+NT, x^\circ)$ and $x^\circ$ is controllable at t + NT. Therefore there exists an input function which transfers $(t, x^*)$ into $(t+NT, 0)$. By linearity, the same input function transfers the event $(t, 0)$ into $(t+NT, -x^\circ)$, so that $x^\circ$ is reachable at t + NT and consequently at t, thanks to periodicity. This leads to the conclusion $X_c(t) \subset X_r(t)$.
The coincidence of the reachability and controllability subspaces at each time point is an extension of the analogous well known property of time-invariant systems. However, contrary to the time-invariant case, the subspaces $X_r(\tau,t)$ and $X_c(t,\tau)$ may not coincide (see Example 1 below).
Property B is obviously false. Instead, $X_r(t)$ and $X_c(t)$ are periodic:
$$X_r(t) = X_r(t+T), \quad X_c(t) = X_c(t+T), \quad \forall t.$$
However, by the same arguments used in discrete time in Bittanti and Bolzern (1984,a), Lemma 3, it can be proven that
$$\dim X_r(t) = \text{const}, \qquad \dim X_c(t) = \text{const}.$$
From this, it follows that property C still holds true: if a periodic system is reachable at a given time point, it is reachable at any time point. Hence it is possible to speak of system reachability and controllability without further specifications.
The attention is now focused on properties D and E, which are somewhat interwoven with each other. To the best knowledge of the present author, the first statements concerning the length of the reachability and controllability intervals of periodic systems go back to the late sixties. In Brunovsky (1969), the following result is proven by means of algebraic arguments.

Theorem 1 (Brunovsky, 1969)
If system (1) is controllable, then the controllability transition can be performed in an interval of time of length $nT$ ($n$ is the system order):
$$X_c(t, t+nT) = X_c(t) = \mathbb{R}^n. \qquad \square$$

An alternative proof, of geometric type, can be found in Bittanti and Bolzern (1984,a), Lemma 1.
In Kalman (1969), a stronger statement is reported without proof.

Proposition (Kalman, 1969)
If system (1) is controllable, then the controllability transition can be performed in an interval of time of length $T$. $\square$

The question of proving Kalman's proposition remained open until 1975. In that year, in a paper on the periodic Riccati equation, Hewer gave a proof of Kalman's proposition. Furthermore, he gave a spectral condition for system controllability which generalizes the PBH test to the periodic case. A slightly different yet equivalent condition is due to Kano and Nishimura (1979), where reference is made to the monodromy matrix $\Phi(T,0) = e^{RT}$ in place of $R$. The condition, named H-condition, will now be stated in terms of a generic monodromy matrix $\Phi(t+T,t)$. This is especially useful for the extension to discrete-time systems. Recall that $\sigma[e^{RT}]$ denotes the spectrum of $e^{RT}$, which coincides with the spectrum of the (continuous-time) monodromy matrix $\Phi(t+T,t)$ at any $t$.

H-condition
Given a time point $t$, the matrix
$$\begin{bmatrix} sI - \Phi(t+T,t) & W_c(t,t+T) \end{bmatrix}$$
is full rank on $\sigma[e^{RT}]$. $\square$

The first paper where the condition was stated in these terms is probably Bittanti, Bolzern, Colaneri and Guardabassi (1983).
However, the spectral conditions played a key role in the analysis of the periodic Lyapunov and Riccati equations, Hewer (1975), Kano and Nishimura (1979). The proof given by Hewer of the validity of the H-condition as a controllability test was, though, based on Kalman's proposition. Unfortunately, the proof of that proposition given by Hewer was not correct. Even more so: Kalman's proposition itself is not true, as shown by the following counterexample.
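Example 1
For a given integer $n$, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be $n$ given distinct real numbers. Consider the single-input system:
$$A(t) = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n],$$
$$B(t) = \begin{bmatrix} e^{-\lambda_1(1-t)} \\ e^{-\lambda_2(1-t)} \\ \vdots \\ e^{-\lambda_n(1-t)} \end{bmatrix} \sin \pi t, \qquad t \in [0,1],$$
and $B(t)$ = periodic extension of the previous expression for $t \notin [0,1]$.
For this system, which is periodic of period $T = 1$,
$$\Phi(0,\sigma)B(\sigma) = (\sin \pi\sigma)\, x_1, \quad \sigma \in [0,1], \qquad x_1 = [e^{-\lambda_1} \;\; e^{-\lambda_2} \;\; \cdots \;\; e^{-\lambda_n}]'.$$
Letting
$$\alpha = \int_0^1 \sin^2 \pi\sigma \, d\sigma,$$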
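it follows from (3.b) that
$$W_c(0,1) = \alpha\, x_1 x_1'.$$
Therefore $\dim X_c(0,1) = 1$, and, assuming $n > 1$, this system is not controllable in the time interval $(0,1)$.
For a given integer $k$, $k \le n$, consider now the time interval $(0,k)$ and let
$$x_i = [e^{-i\lambda_1} \;\; e^{-i\lambda_2} \;\; \cdots \;\; e^{-i\lambda_n}]', \qquad i = 1, 2, \ldots, k. \qquad (4)$$
Then, for any integer $k \le n$, $W_c(0,k)$ is given by the following expression:
$$W_c(0,k) = \alpha\,(x_1 x_1' + x_2 x_2' + \cdots + x_k x_k').$$
Consequently,
$$X_c(0,k) = \mathrm{span}[x_1, x_2, \ldots, x_k].$$
The vectors $x_1, x_2, \ldots, x_n$ are linearly independent, since the matrix $[x_1 \; \cdots \; x_n]$ has the Vandermonde-type structure $(e^{-\lambda_j})^i$ with distinct nodes $e^{-\lambda_j}$. It follows that
$$\dim X_c(0,k) = k, \qquad k \le n.$$
Therefore the system is controllable, but there are states which cannot be driven to zero in an interval shorter than $n$. Interestingly enough, it turns out that
$$X_r(0,k) = \mathrm{span}[x_0, x_{-1}, \ldots, x_{-k+1}], \qquad k \le n.$$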
$\forall t \ge \tau$, the solution of the Lyapunov equation is given by
$$F_\tau(t) = \Phi(t,\tau)\,\bar F\,\Phi(t,\tau)' + W_r(\tau,t). \qquad (26)$$
Setting now $\tau = 0$, $t = T$ and imposing the periodicity constraint (24), the following equation is obtained:
$$\bar F = \Phi(T,0)\,\bar F\,\Phi(T,0)' + W_r(0,T).$$
This is the discrete-time algebraic Lyapunov equation. It can be shown (Graham, 1981) that, if the characteristic multipliers lie within the unit circle, this equation admits a unique solution. From these results, it follows that, if the system is asymptotically stable, then both (20) and (21) admit a unique T-periodic solution. As a matter of fact, under the assumption of asymptotic stability, the following can be shown to hold true (Bittanti, Bolzern and Colaneri, 1984). Consider the solution $F_\tau(t)$ of (21) such that $F_\tau(\tau) = \bar F$. Then $F_\tau(t)$ converges to the periodic solution of (21) as $\tau \to -\infty$, for whichever $\bar F$. In particular, taking $\bar F = 0$, (26) entails that $W_r(-\infty,t)$ is the T-periodic solution. Moreover,
in view of (25), it is also apparent that this solution is positive semidefinite (at each $t$). Should $(A,B)$ be reachable, the solution is in fact positive definite (at each $t$). This last conclusion is part of the so-called Periodic Lyapunov Lemma, which can be stated as follows (Bittanti, Bolzern and Colaneri, 1985).

Theorem 6
The system is asymptotically stable if and only if, for any $B$ such that $(A,B)$ is reachable, there exists a T-periodic positive definite solution of the Lyapunov equation (21). $\square$
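The lifted equation $\bar F = \Phi(T,0)\bar F\Phi(T,0)' + W_r(0,T)$ can be solved by plain fixed-point iteration when all characteristic multipliers lie inside the unit circle. The iterates are partial sums of the series $\sum_{i \ge 0} \Phi^i W \Phi'^i$. The sketch below illustrates the reachability part of the lemma. The 2 × 2 values used for $\Phi(T,0)$ and $W_r(0,T)$ are hypothetical: $W$ is only semidefinite, yet the pair is reachable, and the solution comes out positive definite. The iteration and the function names are our own choices, not the method of the cited papers.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def solve_discrete_lyapunov(phi, w, tol=1e-13, max_iter=10000):
    """Solve F = phi F phi' + w by the iteration F <- phi F phi' + w from F = 0.
    The iteration converges when all eigenvalues of phi lie inside the unit circle."""
    n = len(phi)
    f = [[0.0] * n for _ in range(n)]
    phi_t = transpose(phi)
    for _ in range(max_iter):
        nxt = matmul(matmul(phi, f), phi_t)
        nxt = [[nxt[i][j] + w[i][j] for j in range(n)] for i in range(n)]
        diff = max(abs(nxt[i][j] - f[i][j]) for i in range(n) for j in range(n))
        f = nxt
        if diff < tol:
            return f
    raise RuntimeError("no convergence: is phi asymptotically stable?")

# Hypothetical monodromy matrix (multipliers 0.5 and 0.3) and W = b b' with b = [0, 1]'
phi = [[0.5, 1.0], [0.0, 0.3]]
w = [[0.0, 0.0], [0.0, 1.0]]
f = solve_discrete_lyapunov(phi, w)
print(f)
```

The pair is reachable because $b = [0\;1]'$ and $\Phi b = [1\;0.3]'$ are independent. The limit is therefore a sum of outer products spanning $\mathbb{R}^2$, hence positive definite, even though $W$ itself is singular.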
An extended version of this lemma can be given under the sole assumption that $(A(t), B(t))$ is stabilizable.

Theorem 7
The system is asymptotically stable if and only if, for any $B$ such that $(A,B)$ is stabilizable, there exists a T-periodic positive semidefinite solution of the Lyapunov equation (21). $\square$

Theorem 7 is proven in (Bittanti, Bolzern and Colaneri, 1985) by means of a decomposition technique. Precisely, the Lyapunov equation is decomposed into three subequations corresponding to the reachability canonical decomposition of $(A,B)$.

One could wonder whether the Lyapunov equation may admit a T-periodic positive semidefinite solution even if the system is not stable.
In case the system is not asymptotically stable, matrix $\Phi(T,0)$ has some eigenvalues on or outside the unit circle. If a characteristic multiplier lies on the unit circle and the pair $(A,B)$ is stabilizable, then (21) does not admit any T-periodic solution (see (Bittanti and Colaneri, 1986, Thm. 2(a))).

Suppose now that, say, $p$ characteristic multipliers have modulus greater than 1, while the remaining $n-p$ ones have modulus lower than 1. If $(A,B)$ is reachable or stabilizable, it is shown in (Wimmer and Ziebur, 1975), (Shayman, 1984) and (Bittanti and Colaneri, 1986) that the T-periodic solution of (21), if any, has $p$ negative eigenvalues at each time point. The remaining $n-p$ eigenvalues are all positive if $(A,B)$ is reachable, or nonnegative if $(A,B)$ is stabilizable.

It is obvious that only the positive semidefinite solutions of (21) correspond to a cyclostationary solution of (12). The conclusion is therefore the following: if the system is not asymptotically stable and $(A,B)$ is stabilizable, then eq. (12) admits no cyclostationary solution. The analysis of the discrete-time periodic Lyapunov equation is currently underway and partially reported in (Bolzern and Colaneri, 1986).
Acknowledgment
The author is grateful to Professors Diego Bricio Hernandez and Guido Guardabassi for their helpful and stimulating comments.
References
Bailey, J.E. (1973): Periodic Operation of Chemical Reactors: A Review. Chem. Eng. Commun. 1, 111-124.
Bekir, E. and R.S. Bucy (1976): Periodic Equilibria for Matrix Riccati Equations. Stochastics 2, 1-104.
Belevitch, V. (1968): Classical Network Theory. Holden Day, San Francisco.
Bernstein, D.S. and E.G. Gilbert (1980): Optimal Periodic Control: The π Test Revisited. IEEE Trans. Automatic Control AC-25, 673-684.
Bittanti, S. and P. Bolzern (1984,a): Can the Kalman Canonical Decomposition be performed for a Discrete-time Linear Periodic System? 1st Latin American Conference on Automatica, Campina Grande, Brazil, 449-453.
Bittanti, S. and P. Bolzern (1984,b): Canonical Decomposition and Discrete-time Linear Systems. 23rd Conference on Decision and Control, Las Vegas, U.S.A., 1737-1738.
Bittanti, S. and P. Bolzern (1984,c): Four Equivalent Notions of Stabilizability of Periodic Linear Systems. 3rd American Control Conference, San Diego, U.S.A., 1321-1323.
Bittanti, S. and P. Bolzern (1985,a): Reachability and Controllability of Discrete-time Linear Systems. IEEE Trans. Automatic Control 30, 399-491.
Bittanti, S. and P. Bolzern (1985,b): Discrete-time Linear Periodic Systems: Grammian and Modal Criteria for Reachability and Controllability. International J. Control 41, 899-928.
Bittanti, S. and P. Bolzern (1985,c): Stabilizability and Detectability of Linear Periodic Systems. Systems and Control Letters 6, 141-145. Plus Addendum, to appear in Systems and Control Letters (1986), 7, 73.
Bittanti, S. and P. Bolzern (1986): On the Structure Theory of Discrete-time Linear Systems. International J. Systems Science 17, 33-47.
Bittanti, S., P. Bolzern and P. Colaneri (1984): Stability Analysis of Linear Periodic Systems via the Lyapunov Equation. 9th IFAC World Congress, Budapest, 8, 169-172.
Bittanti, S., P. Bolzern and P. Colaneri (1985): The Extended Periodic Lyapunov Lemma. Automatica 5, 603-605.
Bittanti, S., P. Bolzern, P. Colaneri and G. Guardabassi (1983): H and K-Controllability of Linear Periodic Systems. 22nd Conference on Decision and Control, S. Antonio, U.S.A., 1376-1379.
Bittanti, S. and P. Colaneri (1986): Lyapunov and Riccati Equations: Periodic Inertia Theorems. IEEE Trans. Automatic Control (to appear).
Bittanti, S., P. Colaneri and G. De Nicolao (1986): Discrete-time Periodic Systems: a note on the Reachability and Controllability interval length. Centro Teoria Sistemi, Politecnico di Milano, Int. Rep. 86-003.
Bittanti, S., P. Colaneri and G. Guardabassi (1984): H-Controllability and Observability of Linear Periodic Systems. SIAM J. Control and Optimization 22, 889-893.
Bittanti, S., G. Fronza and G. Guardabassi (1973): Periodic Control: A Frequency Domain Approach. IEEE Trans. Automatic Control 18, 33-38.
Bittanti, S., G. Guardabassi, C. Maffezzoni and L. Silverman (1978): Periodic Systems: Controllability and the Matrix Riccati Equation. SIAM J. Control and Optimization 16, 37-40.
Bittanti, S. and D.B. Hernandez (1986): The Simple Pendulum as an Illustrative Example of the Periodic Control Problem. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-010.
Bolzern, P. (1986): Criteria for Reachability, Controllability and Stabilizability of Discrete-time Linear Periodic Systems. V Polish-English Seminar on Real-Time Process Control, Warsaw, Poland.
Bolzern, P. and P. Colaneri (1986): Existence and Uniqueness Conditions for the Periodic Solutions of the Discrete-time Periodic Lyapunov Equation. Centro Teoria dei Sistemi, Politecnico di Milano, Int. Rep. 86-011.
Brockett, R.W. (1970): Finite Dimensional Linear Systems. J. Wiley and Sons.
Brunovsky, P. (1969): Controllability and Linear Closed-loop Controls in Linear Periodic Systems. J. Differential Equations 6, 296-313.
Chen, C.T. (1970): Introduction to Linear System Theory. Holt, Rinehart and Winston.
Colonius, F. (1985,a): Optimality for Periodic Control of Functional Differential Systems. J. Mathematical Analysis and Applications (to appear).
Colonius, F. (1985,b): The High Frequency π-Criterion for Retarded Systems. IEEE Trans. Automatic Control 11, 1045-1048.
DaPrato, G. (1984): Periodic Solutions of Infinite Dimensional Riccati Equations. Rendiconti Accademia Nazionale dei Lincei (to appear).
Dorato, P. and A.H. Levis (1971): Optimal Linear Regulators: the Discrete-time Case. IEEE Trans. Automatic Control 6, 613-620.
Dorato, P. and H.K. Knudsen (1979): Periodic Optimization with Applications to Solar Energy Control. Automatica 15, 673-679.
Gardner, W.A. and D.E. Franks (1975): Characterization of Cyclostationary Random Processes. IEEE Trans. Information Theory 21, 1-24.
Gilbert, E.G. (1977): Optimal Periodic Control: A General Theory of Necessary Conditions. SIAM J. Control and Optimization 15, 717-746.
Gilbert, E.G. and D.T. Lyons (1981): The Improvement of Aircraft Specific Range by Periodic Control. AIAA Guidance and Control Conference, Albuquerque.
Graham, A. (1981): Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Limited, Chichester.
Grasselli, O.M. (1984): A Canonical Decomposition of Linear Periodic Discrete-time Systems. International J. Control 40, 201-214.
Guardabassi, G. (1971): Optimal Steady State Versus Periodic Control. Ricerche di Automatica 2, 240-252.
Guardabassi, G. (1976): The Optimal Periodic Control Problem. Journal A 17, 75-83.
Halanay, A. (1966): Differential Equations. Academic Press, New York.
Hautus, M.L.J. (1969): Controllability and Observability Conditions of Linear Autonomous Systems. Indag. Math. 72, 443-448.
Hernandez, V. and L. Jodar (1985): Boundary Problems and Periodic Riccati Equations. IEEE Trans. Automatic Control 11, 1131-1133.
Hewer, G.A. (1975): Periodicity, Detectability and the Matrix Riccati Equation. SIAM J. Control 13, 1235-1251.
Horn, F.J.M. and R.C. Lin (1967): Periodic Processes: A Variational Approach. Ind. Eng. Chem. Process Des. Dev. 6, 21-30.
Horn, F.J.M. and J.E. Bailey (1968): An Application of the Theorem of Relaxed Control to the Problem of Increasing Catalyst Selectivity. J. Optimization Theory and Applications 2, 441-449.
Houlihan, S.C., E.M. Cliff and H.J. Kelley (1982): Study of Chattering Cruise. Journal Aircraft 19, 119-124.
Johnson, C.D. (1966): Invariant Hyperplanes for Linear Dynamical Systems. IEEE Trans. Automatic Control 11, 113-116.
Kabamba, P.T. (1985): Monodromy Eigenvalue Assignment in Linear Periodic Systems. 24th Conference on Decision and Control, Ft. Lauderdale, U.S.A., 177-178.
Kalman, R.E. (1969): Theory of Regulators for Linear Plants. In: Kalman R.E., P.L. Falb and M.A. Arbib: Topics in Mathematical System Theory. McGraw-Hill Co., New York.
Kano, H. and T. Nishimura (1979): Periodic Solutions of Matrix Riccati Equations with Detectability and Stabilizability. International J. Control 29, 471-487.
Kern, G. (1980): Linear Closed-loop Control in Linear Periodic Systems with Application to Spin-stabilized Bodies. International J. Control 31, 905-916.
Khandelwal, D.N., J. Sharma and L.M. Ray (1979): Optimal Periodic Maintenance of a Machine. IEEE Trans. Automatic Control 24, 513.
Khargonekar, P.P., K. Poolla and A. Tannenbaum (1985): Robust Control of Linear Time-invariant Plants Using Periodic Compensation. IEEE Trans. Automatic Control 11, 1088-1098.
Kono, M. (1980): Eigenvalue Assignment in Linear Periodic Discrete-time Systems. International J. Control 1, 149-158.
Maffezzoni, C. (1974): Hamilton-Jacobi Theory for Periodic Control Problems. J. Optimization Theory and Applications 14, 21-29.
Markus, L. (1973): Optimal Control of Limit Cycles or what Control Theory can do to Cure a Heart Attack or to Cause One. Symposium on Ordinary Differential Equations, Minneapolis, Minnesota (1972). W.A. Harris, Y. Sibuya, eds., Springer-Verlag, Berlin.
Matsubara, M., N. Nishimura, N. Watanabe and K. Onogi (1981): Periodic Control Theory and Applications. Research Reports of Automatic Control Laboratory Vol. 28, Faculty of Engineering, Nagoya University.
Matsubara, M. and K. Onogi (1978): Stabilized Suboptimal Periodic Control of a Chemical Reactor. IEEE Trans. Automatic Control 23, 1005-1008.
Meyer, R.A. and C.S. Burrus (1976): Design and Implementation of Multirate Digital Filters. IEEE Trans. Acoustics, Speech and Signal Processing 1, 53-58.
Nistri, P. (1983): Periodic Control Problems for a Class of Nonlinear Periodic Differential Systems. Nonlinear Analysis, Theory, Methods and Applications 7, 79-90.
Noldus, E. (1975): A Survey of Optimal Periodic Control of Continuous Systems. Journal A 16, 11-16.
Onogi, K. and M. Matsubara (1980): Structure Analysis of Periodically Controlled Chemical Processes. Chem. Eng. Sci. 34, 1009-1019.
Popov, V.M. (1973): Hyperstability of Control Systems. Springer, Berlin.
Schädlich, K., U. Hoffmann and H. Hofmann (1983): Periodical Operation of Chemical Processes and Evaluation of Conversion Improvements. Chemical Engineering Science 38, 1375-1384.
Shayman, M.A. (1984): Inertia Theorems for the Periodic Lyapunov Equation and Periodic Riccati Equation. Systems and Control Letters 4, 27-32.
Shayman, M.A. (1985): On the Phase Portrait of the Matrix Riccati Equation Arising from the Periodic Control Problem. SIAM J. Control and Optimization 23, 717-751.
Sincic, D. and J.E. Bailey (1978): Optimal Periodic Control of Variable Time-delay Systems. International J. Control 27, 547-555.
Speyer, J.L. (1973): On the Fuel Optimality of Cruise. J. Aircraft 10, 763-764.
Speyer, J.L. (1976): Non-optimality of Steady-state Cruise for Aircraft. AIAA Journal 14, 1604-1610.
Speyer, J.L. and R.T. Evans (1984): A Second Variational Theory of Optimal Periodic Processes. IEEE Trans. Automatic Control 29, 138-148.
Valkó, P. and G.A. Almasy (1982): Periodic Optimization of Hammerstein-type Systems. Automatica 18, 245-148.
Watanabe, N., Y. Nishimura and M. Matsubara (1976): Singular Control Test for Optimal Periodic Control Problems. IEEE Trans. Automatic Control 21, 609-610.
Watanabe, N., K. Onogi and M. Matsubara (1981): Periodic Control of Continuous Stirred Tank Reactors - I. Chem. Eng. Sci. 36, 809-818; II, ibid. 37, 745-752.
Watanabe, N., H. Kurimoto and M. Matsubara (1984): Periodic Control of Continuous Stirred Tank Reactors - III, Case of Multistage Reactors. Chem. Eng. Sci. 39, 31-36.
Wimmer, H.K. and A.D. Ziebur (1975): Remarks on Inertia Theorems for Matrices. Czechoslovak Mathematical Journal 25, 556-561.
Wong, E. and B. Hajek (1985): Stochastic Processes in Engineering. Springer-Verlag, Berlin.
Wonham, W.M. (1968): On a Matrix Riccati Equation for Stochastic Control. SIAM Journal Control 6, 681-698.
Yakubovich, V.A. and V.M. Starzhinskii (1975): Linear Differential Equations with Periodic Coefficients. J. Wiley, New York.
Chapter 6
Numerical Problems in Linear System Theory
Daniel Boley and Sergio Bittanti

1. Introduction
The analysis of multivariable linear systems involves the computation of a number of matrix problems, ranging from the solution of linear equations to the computation of rank and eigenvalues. In general, the techniques to be used on a computer are not the same as those used for hand computations. In this work, we outline a few techniques from numerical linear algebra and related concepts, illustrate why they are useful, and give a few examples of where these problems arise in linear system theory. We begin with a review of the simpler computational methods, then go on to more sophisticated decompositions (Schur and Singular Value Decompositions). We conclude with a few applications to the analysis of linear time-invariant control systems.

2. Review of Simpler Computational Methods

2.1 - LU decomposition
We begin by introducing the concept of a matrix decomposition; that is, we try to reduce a matrix A to the product of several simpler matrices, from which we can calculate whatever we would like to calculate.
The first example is the LU decomposition, in which we decompose a matrix A into the product A = LU, where L and U are lower and upper triangular, respectively. This decomposition is computed using Gaussian Elimination. To see this, it is best to use an example. Consider
$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix}. \qquad (1)$$
In Gaussian Elimination, the first step is to add multiples of row 1 to rows 2 and 3. This can be accomplished by multiplying A on the left by the matrix
$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & 0 & 1 \end{bmatrix},$$
where, in this case, $m_{21} = -1/3$ and $m_{31} = -2/3$ are the multipliers. Then, the result is
$$M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & 5/3 \\ 0 & 1/3 & 1/3 \end{bmatrix}. \qquad (2)$$
Then, in the next step we apply the matrix
$$M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & m_{32} & 1 \end{bmatrix},$$
where $m_{32} = -1/2$. This has the effect of adding $m_{32} = -1/2$ times row 2 to row 3. The final result is
$$U = M_2 M_1 A = \begin{bmatrix} 3 & 1 & 1 \\ 0 & 2/3 & 5/3 \\ 0 & 0 & -1/2 \end{bmatrix}. \qquad (3)$$
By comparing (1) with (2) and (3), note that $M_1$ is used to set to zero the subdiagonal elements of column 1 of A, while $M_2$ is used to set to zero the subdiagonal element of column 2. We note that $M_1$ and $M_2$ are nonsingular ($\det M_i = 1$), so we may multiply both sides of (3) by $M_1^{-1} M_2^{-1}$ to obtain $A = LU$, where $L = M_1^{-1} M_2^{-1}$.
In the general case, where A is $n \times n$, we must apply $n-1$ matrices $M_k$, $k = 1, 2, \ldots, n-1$, one for each column. Matrix $M_k$ has the following structure:
$$M_k = \begin{bmatrix} I_{k-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \mathbf{m}_k & I_{n-k} \end{bmatrix}, \qquad \mathbf{m}_k = [m_{k+1,k} \;\; m_{k+2,k} \;\; \cdots \;\; m_{n,k}]',$$
i.e. $M_k$ differs from the identity only in the k-th column, below the diagonal. The coefficients $m_{j,k}$, $j = k+1, k+2, \ldots, n$, will be referred to as the multipliers.
The last item we need to complete the description of the LU decomposition is: what form does L have? To see what L is, we note first that the inverse of $M_k$, as can be easily verified, is the same as $M_k$ with the multipliers in the k-th column negated:
$$M_k^{-1} = \begin{bmatrix} I_{k-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\mathbf{m}_k & I_{n-k} \end{bmatrix}.$$
Secondly, we note that when we form the product $L = M_1^{-1} M_2^{-1} \cdots M_{n-1}^{-1}$, the net effect is to collect all the multipliers, change their sign and place them below the diagonal in their corresponding positions. Hence, the $(i,j)$ element of L, $i > j$, is $-m_{ij}$, where $m_{ij}$ is the multiplier used on row $i$ during the stage when column $j$ is being zeroed. On the diagonal, we have all 1's:
$$L = \begin{bmatrix} 1 & & & \\ -m_{21} & 1 & & \\ \vdots & & \ddots & \\ -m_{n1} & \cdots & -m_{n,n-1} & 1 \end{bmatrix}.$$
In our $3 \times 3$ example, we have:
$$L = M_1^{-1} M_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 1/3 & 1 & 0 \\ 2/3 & 1/2 & 1 \end{bmatrix}.$$
Here, one can see that all the multipliers of the $M_k$, with their sign changed, appear below the diagonal of L in their corresponding positions, while the diagonal has all 1's.
In conclusion, we have found a decomposition for A: A = LU, where L is lower triangular with 1's on the diagonal, and U is upper triangular.
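The elimination just described can be sketched in a few lines. The sketch assumes no pivoting, exactly as in the hand computation, and uses exact rational arithmetic through Python's `fractions` module so that the fractions of the example are reproduced exactly. The test matrix `[[3, 1, 1], [1, 1, 2], [2, 1, 1]]` is taken to be the example matrix (1); its multipliers are $-1/3$, $-2/3$ and $-1/2$, as in the text. The function name is hypothetical.

```python
from fractions import Fraction

def lu_nopivot(a):
    """Gaussian elimination without pivoting, as in the text.
    Returns (L, U) with A = L U and L unit lower triangular.
    A zero pivot raises ZeroDivisionError (pivoting would be needed)."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):                 # stage k zeroes column k below the diagonal
        for i in range(k + 1, n):
            m = -u[i][k] / u[k][k]         # multiplier m_{ik} of M_k
            for j in range(k, n):
                u[i][j] += m * u[k][j]     # add m times row k to row i
            l[i][k] = -m                   # L collects the negated multipliers
    return l, u

L, U = lu_nopivot([[3, 1, 1], [1, 1, 2], [2, 1, 1]])
print(L)   # rows: [1, 0, 0], [1/3, 1, 0], [2/3, 1/2, 1] as Fractions
print(U)   # rows: [3, 1, 1], [0, 2/3, 5/3], [0, 0, -1/2] as Fractions
```

Practical codes add row interchanges (partial pivoting), which this sketch omits to stay close to the example.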
What can we do with this? We give 2 uses of this decomposition:

A. Solve Linear Equations
By using A = LU, the system Ax = b is equivalent to
$$LUx = b. \qquad (4)$$
If we call $y = Ux$, we then reduce equation (4) to two triangular systems:
$$Ly = b, \qquad Ux = y.$$
Triangular systems are easily solved using the process known as "back-substitution". We note that if we apply the row operations of Gaussian Elimination also to the vector b, the result will be
$$y = M_{n-1} M_{n-2} \cdots M_1 b = L^{-1} b.$$
Then the solution x can be found by solving $Ux = L^{-1}Ax = y$. It turns out that the extra work involved in applying the row operations to b to obtain y is exactly the same as the work needed to solve $Ly = b$ for y. The two schemes are exactly equivalent, except that, by using $Ly = b$, we may directly solve $Ax = b$ with a new right-hand side b, without repeating the decomposition.
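The two uses of the LU decomposition listed above can be sketched as follows. The sketch assumes the same unpivoted elimination as in the text and exact fractions. `forward_sub` solves $Ly = b$, `back_sub` solves $Ux = y$, and `det_from_lu` multiplies the diagonal of U, anticipating use B below. All function names and the right-hand sides are illustrative.

```python
from fractions import Fraction

def lu_nopivot(a):
    """Unpivoted Gaussian elimination: A = L U with L unit lower triangular."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = -u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] += m * u[k][j]
            l[i][k] = -m
    return l, u

def forward_sub(l, b):
    """Solve L y = b for unit lower triangular L, top row first."""
    y = []
    for i in range(len(b)):
        y.append(Fraction(b[i]) - sum(l[i][j] * y[j] for j in range(i)))
    return y

def back_sub(u, y):
    """Solve U x = y for upper triangular U, bottom row first (back-substitution)."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x

def det_from_lu(u):
    """det A = det L * det U = 1 * (u11 u22 ... unn)."""
    d = Fraction(1)
    for i in range(len(u)):
        d *= u[i][i]
    return d

L, U = lu_nopivot([[3, 1, 1], [1, 1, 2], [2, 1, 1]])
print(back_sub(U, forward_sub(L, [5, 4, 4])))   # x = [1, 1, 1]
print(back_sub(U, forward_sub(L, [8, 9, 7])))   # new right-hand side, same L and U
print(det_from_lu(U))                           # -1
```

Note that the second solve reuses L and U: only the two $O(n^2)$ triangular solves are repeated, not the $O(n^3)$ decomposition.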
B. Computing Determinant
Since the determinant of a product of matrices is the product of the determinants, we may immediately write the determinant of A as:
$$\det A = \det L \cdot \det U = 1 \cdot (u_{11} u_{22} \cdots u_{nn}),$$
where $u_{ij}$ is the $(i,j)$ element of U. Here, we have used the well known fact that the determinant of a triangular matrix is simply the product of its diagonal elements. Using the LU decomposition is much faster than using the definition of $\det A$ (Stewart, 1973).

2.2 - Orthogonal Decompositions

2.2.1 - QR Decomposition
In the LU decomposition, we have applied matrices that are not orthogonal; they do not preserve lengths or angles. Since almost parallel vectors may be made parallel by such transformations, we would like to see what one can do with orthogonal transformations, which do preserve lengths and angles. An $n \times n$ matrix Q is orthogonal if its columns $q_i$ are mutually ortho-normal, i.e.
$$q_i' q_j = \begin{cases} 0, & i \ne j, \\ 1, & i = j, \end{cases}$$
or, in matrix notation, $Q'Q = I$.
In this section, we show how one may triangularize a matrix using only orthogonal transformations, thereby preserving lengths and angles. Then we show why the use of orthogonal decompositions is particularly useful by giving an example of its use.
Consider a $3 \times 3$ matrix A whose first column has $a_{11} = 3$ and $a_{21} = 1$. We would like to reduce A to the form
$$\begin{bmatrix} ? & ? & ? \\ 0 & ? & ? \\ 0 & 0 & ? \end{bmatrix},$$
where "?" denotes a nonzero element whose value is to be determined. We see how to do this using a "rotation", i.e. a transformation of the form
$$Q_1 = \begin{bmatrix} c & s & 0 \\ -s & c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (5)$$
where, by orthogonality, $c^2 + s^2 = 1$.
We will use $Q_1$ to zero element $a_{21}$, i.e. we choose $c$ and $s$ so that the $(2,1)$ element of $Q_1 A$ is zero. The second row of $Q_1$, applied to the first column of A, yields:
$$-s \cdot 3 + c \cdot 1 = 0,$$
which, together with $c^2 + s^2 = 1$, yields
$$c = 3/\sqrt{10}, \qquad s = 1/\sqrt{10}. \qquad (6)$$
Having defined the elements $c$ and $s$ of the rotation $Q_1$, we apply it to A to obtain $Q_1 A$. Then, in the same way, we find a second rotation
$$Q_2 = \begin{bmatrix} c & 0 & s \\ 0 & 1 & 0 \\ -s & 0 & c \end{bmatrix}$$
(with new values of $c$ and $s$) to zero element $a_{31}$. To complete the triangular decomposition, we finally need a third rotation $Q_3$ to zero $a_{32}$, obtaining $R = Q_3 Q_2 Q_1 A$. In general, in the $n \times n$ case, we need $r = n(n-1)/2$ rotations, one per subdiagonal element, to zero all the subdiagonal elements of A:
$$R = Q_r Q_{r-1} \cdots Q_1 A = \text{an upper triangular matrix}.$$
Let $Q = (Q_r Q_{r-1} \cdots Q_1)^{-1}$. By orthogonality:
$$Q = Q_1' Q_2' \cdots Q_r'.$$
We now have the so-called orthogonal triangularization, or QR decomposition, of A:
$$A = QR, \qquad Q \text{ orthogonal}, \quad R \text{ upper triangular}.$$

2.2.2 - Geometric Interpretation of a Rotation
We may look at a representative $2 \times 2$ rotation since, as can also be seen from (5) and (6), a rotation affects only 2 components of a vector. Hence, the problem may be looked at in 2-space. Consider
$$Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \qquad c = \cos\varphi, \quad s = \sin\varphi,$$
for some angle $\varphi$. Consider also an arbitrary vector $x$ of $\mathbb{R}^2$, and denote by $\theta$ the angle between $x$ and $e_1 = [1 \;\; 0]'$:
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \|x\| \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}.$$
Hence,
$$Qx = \|x\| \begin{bmatrix} c\cos\theta + s\sin\theta \\ -s\cos\theta + c\sin\theta \end{bmatrix} = \|x\| \begin{bmatrix} \cos(\theta - \varphi) \\ \sin(\theta - \varphi) \end{bmatrix}.$$
This means that $Qx$ is the vector $x$ rotated (clockwise) by the angle $\varphi$.
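The rotation-based triangularization of 2.2.1 can be sketched in plain Python. Rotations of the form (5) act on pairs of rows in the order used above: rows 1-2, then 1-3, then 2-3 for a $3 \times 3$ matrix. The product $Q_r \cdots Q_1$ is accumulated so that $Q$ is recovered as its transpose. The function names and the test matrix are illustrative assumptions; only the first column $(3, 1, \ldots)$ matches the text's example.

```python
import math

def givens(a, b):
    """(c, s) with c^2 + s^2 = 1 and -s*a + c*b = 0, i.e. the rotation (5)
    that zeroes the second component of [a, b]'."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_givens(a):
    """QR decomposition by plane rotations [c s; -s c] applied to row pairs.
    Returns (Q, R) with A = Q R, Q orthogonal and R upper triangular."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    qt = [[float(i == j) for j in range(m)] for i in range(m)]  # Q_k ... Q_1
    for col in range(min(n, m - 1)):
        for row in range(col + 1, m):
            c, s = givens(r[col][col], r[row][col])
            for mat in (r, qt):               # same rotation on R and on Q'
                top, bot = mat[col], mat[row]
                mat[col] = [c * x + s * y for x, y in zip(top, bot)]
                mat[row] = [-s * x + c * y for x, y in zip(top, bot)]
            r[row][col] = 0.0                 # zero by construction
    q = [list(col) for col in zip(*qt)]       # Q = (Q_r ... Q_1)' by orthogonality
    return q, r

print(givens(3.0, 1.0))   # c = 3/sqrt(10), s = 1/sqrt(10), as in (6)
Q, R = qr_givens([[3, 1, 1], [1, 1, 2], [2, 1, 1]])
```

In line with 2.2.2, each call to `givens` picks the angle $\varphi$ that rotates the pair $(a, b)$ onto the first axis.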
2.2.3 - QR Decomposition by Householder Transformations
As we have seen in 2.2.1, the QR decomposition can be obtained by multiplying A by rotations. An alternative way of computing the QR decomposition is based on the so-called "Householder transformations" (or "elementary reflections"). The main advantage of these transformations is that a single Householder transformation can zero out as many components of a vector as one likes, whereas a rotation can zero out one component only. This implies that, in general, one can use $n-1$ Householder transformations to obtain the matrix Q of the QR decomposition, instead of $n(n-1)/2$ rotations.
The Householder transformation can be introduced with reference to $\mathbb{R}^2$ as follows. We want to transform a vector $x$ into a vector $v$ along the axis $e_1 = [1 \;\; 0]'$, such that $\|v\| = \|x\|$. This can be achieved by reflecting the vector $x$ around $x + v$ (see Fig. 1).

[Figure 1: the vector $x$, the target vector $v$ along $e_1$, and the reflection axis $z = x + v$.]
We go through the following steps (note that we know $x$ and $v = \|x\|\, e_1$):
- Axis of reflection: $z = x + v = [x_1 + \|x\| \;\; x_2]'$. The corresponding unit vector is $z/\|z\|$.
- Project $x$ onto the axis of reflection to obtain: $a = \dfrac{zz'}{\|z\|^2}\, x$.
- Find the difference between $x$ and its projection: $b = a - x$.
- Reflect $x$ around its projection (or, equivalently, around $z$): $v = x + 2b = x + 2(a - x) = 2a - x = 2\dfrac{zz'}{\|z\|^2}x - x = -\left(I - 2\dfrac{zz'}{\|z\|^2}\right)x$. Since $\|v\| = \|x\|$, the axis $z = x + v$ bisects the angle between $x$ and $v$, so this reflection indeed lands on $v$.

The matrix
$$P = I - 2\,\frac{zz'}{\|z\|^2}$$
gives the Householder transformation. Since $v = -Px$, we can conclude that such a linear transformation can "reflect" a vector $x$ into the desired target direction. In n-space, one can pick the axis of reflection so as to zero out at once as many components of a vector as we like.
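The construction of $P$ can be sketched directly from the steps above. The sketch assumes that $x$ is not a nonpositive multiple of $e_1$, since otherwise $z = 0$. It reflects $x \in \mathbb{R}^n$ so that $Px = -\|x\|\,e_1$, zeroing all components but the first at once. The function names and the test vector are illustrative.

```python
import math

def householder(x):
    """P = I - 2 z z' / ||z||^2 with z = x + ||x|| e1 (the text's choice of v),
    so that P x = -||x|| e1. Assumes z != 0."""
    n = len(x)
    norm_x = math.sqrt(sum(t * t for t in x))
    z = list(x)
    z[0] += norm_x                                   # z = x + v, v = ||x|| e1
    zz = sum(t * t for t in z)
    return [[(1.0 if i == j else 0.0) - 2.0 * z[i] * z[j] / zz for j in range(n)]
            for i in range(n)]

def apply(p, x):
    """Matrix-vector product p x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]

x = [3.0, 4.0, 12.0]          # ||x|| = 13
print(apply(householder(x), x))   # approximately [-13, 0, 0]
```

Library implementations usually add $\|x\|e_1$ with the same sign as $x_1$, to avoid cancellation in $x_1 + \|x\|$ when $x_1 < 0$. The sketch keeps the text's choice $v = \|x\|\,e_1$ for readability.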
2.2.4 - Solving Least Squares Problems Using Orthogonal Decompositions
Let $A \in \mathbb{R}^{m \times n}$, $m \ge n$, $b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$. The Linear Least Squares problem is the problem of finding the following minimum:
$$\min_x \|Ax - b\|.$$
The algorithm of 2.2.1 may be applied to rectangular matrices just as well as to square ones. In this case we see that the QR decomposition of the rectangular matrix A is:
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad (7)$$
where $Q \in \mathbb{R}^{m \times m}$ is orthogonal, $R \in \mathbb{R}^{n \times n}$ is upper triangular and $0 \in \mathbb{R}^{(m-n) \times n}$ is a block of zero elements. As Q is orthogonal, it does not change the norm. Hence:
$$\|Ax - b\| = \|Q'(Ax - b)\| = \left\| \begin{bmatrix} R \\ 0 \end{bmatrix} x - c \right\|, \qquad (8)$$
where $c = Q'b$. Partitioning this vector conformally with $\begin{bmatrix} R \\ 0 \end{bmatrix}$ as
$$c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$
we have
$$\|Ax - b\| = \left\| \begin{bmatrix} Rx - c_1 \\ -c_2 \end{bmatrix} \right\|.$$
In order to minimize this norm, we set (assuming A has full column rank, so that R is nonsingular)
$$x = R^{-1} c_1. \qquad (9)$$
Thus,
$$\min_x \|Ax - b\| = \|c_2\|.$$
To find the optimum value of x given by (9), we have to solve the triangular system
$$Rx = c_1. \qquad (10)$$
In this respect, it is worth noticing that a direct minimization of $\|Ax - b\|$ leads to the celebrated normal equations:
$$A'Ax = A'b. \qquad (11)$$
In view of (8), system (11) is equivalent to
$$R'Rx = R'c_1. \qquad (12)$$
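The procedure (7)-(10) can be sketched by applying rotations to the augmented matrix $[A \mid b]$. This turns $A$ into $[R;\,0]$ and, at the same time, $b$ into $c = Q'b$, without forming Q. The last part of the sketch forms $A'A$ explicitly, the route of the normal equations (11). Python floats carry about 16 significant digits rather than 7, so the sketch uses $10^{-8}$ in place of the $10^{-4}$ of the example (13) below. Function names and data are illustrative assumptions, and full column rank is assumed.

```python
import math

def lstsq_givens(a, b):
    """Least squares via rotations on [A | b] -> [R; 0 | c1; c2], then R x = c1.
    Assumes A (m x n, m >= n) has full column rank. Returns (x, ||c2||)."""
    m, n = len(a), len(a[0])
    w = [list(map(float, a[i])) + [float(b[i])] for i in range(m)]  # augmented [A | b]
    for col in range(n):
        for row in range(col + 1, m):
            p, q = w[col][col], w[row][col]
            r = math.hypot(p, q)
            if r == 0.0:
                continue
            c, s = p / r, q / r
            top, bot = w[col], w[row]
            w[col] = [c * u + s * v for u, v in zip(top, bot)]
            w[row] = [-s * u + c * v for u, v in zip(top, bot)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):               # back-substitution, eq. (10)
        x[i] = (w[i][n] - sum(w[i][j] * x[j] for j in range(i + 1, n))) / w[i][i]
    resid = math.sqrt(sum(w[i][n] ** 2 for i in range(n, m)))  # ||c2||
    return x, resid

def normal_matrix(a):
    """Form A'A explicitly, as the normal equations (11) require."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]

eps = 1e-8
A = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]
x, r = lstsq_givens(A, [2.0, eps, eps])
print(x)                      # close to [1.0, 1.0]
print(normal_matrix(A))       # [[1.0, 1.0], [1.0, 1.0]]: the 1e-16 terms are lost
```

The QR route recovers the solution, whereas the computed $A'A$ is exactly singular in double precision. This is the same loss of rank discussed for (13).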
It is important to observe that solving system (10) is preferable to solving (12), as one obtains fewer errors. For a complete discussion of the error analysis of Linear Systems, see e.g. (Lawson and Hanson, 1974). However, we may give an example of the problem. Suppose that we are working on a computer with 7 significant digits of precision, i.e. we carry only 7 significant digits. Consider the matrix
$$A = \begin{bmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{bmatrix}. \qquad (13)$$
A has rank 2, but if we form
$$A'A = \begin{bmatrix} 1 + 10^{-8} & 1 \\ 1 & 1 + 10^{-8} \end{bmatrix}$$
in our computer, we will lose the part "$10^{-8}$" and obtain instead
$$A'A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$
which has rank 1. So, we lose rank.

3. Special Forms Used in Numerical Linear Algebra - Why
The LU and QR decompositions discussed above are used as the basic step in the computation of the decompositions to be introduced in the rest of this work. The previous section also serves to give the flavour of the approach used in the following. We now concentrate on the problems of finding a number of useful things about a given square matrix A, i.e.
(i) Eigenvalues: $\lambda_i$
(ii) Determinant: $\det A$
(iii) Rank of A
(iv) Nullspace of A: $\ker A$
(v) Image or Range of A: $\mathrm{colsp}\, A$.

3.1 - The Jordan Canonical Form
As is well known, there are many classical decompositions for
matrices,
the m o s t
common
A is then d e c o m p o s e d
A = PJP
where
-I
into the p r o d u c t
P is n o n s i n g u l a r , Form
eigenvalues
of A
(product
(dimension
corresponding
( elements
so-called
Matrix
minus
corresponding
of J),
and the
of J o r d a n
the c o l u m n s
bloks
of P c o r r e s p o n
the n u l l s p a c e
to n o n z e r o
us the
the
of J)
the n u m b e r
of J g e n e r a t e
Jordan
form can tell
elements
to h i = 0). F u r t h e r m o r e ,
of A,
rows of J g e n e r a t e
of A.
So, the J o r d a n
Canonical
(i) - (v). However,
almost
This
of the d i a g o n a l
of the m a t r i x
the c o l u m n s
the range
1959).
of the d i a g o n a l
ding to all zero c o l u m n s
advised.
decomposition.
of 3 m a t r i c e s :
and J is in the
(Gantmacher,
determinant
whereas
the J o r d a n
,
Canonical
rank
being
The m a t r i x
singular),
separated(almost separated", This calls
especially
Conditionin~
conditioning
finite w o r d
Form
li are p o o r l y
I. are "well l is an i l l - p o s e d problem.
consideration
computer.
Because
can be r e p r e s e n t e d
in the computer.
In the t r e a t m e n t used
are p e r t u r b e d
(i.e.
of a P r o b l e m
model m o s t
often
items is ill-
if the
is an i m p o r t a n t
numbers
form
of
use of a d i g i t a l
length,
even
this
ill-conditioned
if the e i g e n v a l u e s
the J o r d a n
for the q u e s t i o n
one c o n s i d e r s t h e
one to find out all
computations,
P can be e x t r e m e l y
coincidingl.But
finding
3.2 - N u m e r i c a l
Numerical
Form enables
for n u m e r i c a l
only
whenever of the approximately
of such a p p r o x i m a t i o n s ,
is to c o n s i d e r
what happens
the
if the n u m b e r s
slightly.
The e i g e n v a l u e s take for e x a m p l e
can be e x t r e m e l y the
3 x 3 matrix
sensitive
to p e r t u r b a t i o n s :
$$A = \begin{bmatrix} -64 & 82 & 21 \\ 144 & -178 & -46 \\ -771 & 962 & 248 \end{bmatrix} \tag{14}$$

which has eigenvalues 1, 2, 3. If we add a small perturbation $\epsilon E$, where E is a rank-one matrix of norm $\approx 10^{-3}$, then the perturbed matrix $A + \epsilon E$ will have complex eigenvalues for any $\epsilon > 0.45$. This shows that problems may occur even on small, innocuous-looking matrices!
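The claims about (14) can be checked with exact integer arithmetic. The plain-Python sketch below verifies that the characteristic polynomial is $\lambda^3 - 6\lambda^2 + 11\lambda - 6 = (\lambda-1)(\lambda-2)(\lambda-3)$, and evaluates for each eigenvalue the classical first-order eigenvalue condition number $\|x\|\,\|y\|/|y'x|$ (x, y right and left eigenvectors). This measure is our addition, not one introduced in the text; the helper names are hypothetical.

```python
from math import sqrt

# Matrix (14)
A = [[-64, 82, 21],
     [144, -178, -46],
     [-771, 962, 248]]


def cross(a, b):
    """Cross product of two 3-vectors (a vector orthogonal to both)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def char_poly(M):
    """(c1, c2, c3) with det(lambda I - M) = lambda^3 - c1 lambda^2 + c2 lambda - c3."""
    c1 = M[0][0] + M[1][1] + M[2][2]                          # trace
    c2 = (M[1][1] * M[2][2] - M[1][2] * M[2][1]                # sum of principal 2x2 minors
          + M[0][0] * M[2][2] - M[0][2] * M[2][0]
          + M[0][0] * M[1][1] - M[0][1] * M[1][0])
    c3 = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])    # determinant
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    return c1, c2, c3


def eigen_condition(M, lam):
    """||x|| ||y|| / |y'x| for a simple eigenvalue lam of a 3 x 3 matrix M."""
    B = [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    x = cross(B[0], B[1])                        # B x = 0 (rows 0, 1 of B independent here)
    cols = [[B[i][j] for i in range(3)] for j in range(3)]
    y = cross(cols[0], cols[1])                  # y' B = 0 (columns 0, 1 independent here)
    norm = lambda v: sqrt(sum(t * t for t in v))
    return norm(x) * norm(y) / abs(sum(a * b for a, b in zip(x, y)))


print(char_poly(A))  # (6, 11, 6)
for lam in (1, 2, 3):
    print(lam, round(eigen_condition(A, lam)))
```

Condition numbers of several hundred mean that a perturbation of size $10^{-4}$ in A can move the eigenvalues by amounts of order $10^{-1}$.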
Even with 7-16 digits of accuracy, this is a relevant problem. In the following 20 × 20 example, due to Wilkinson (1965), perturbations in the 9th place of some element will destroy all digits of accuracy in the eigenvalues, which in some cases will be completely wrong:

$$A = \begin{bmatrix}
20 & 20 & & & & \\
 & 19 & 20 & & & \\
 & & 18 & 20 & & \\
 & & & \ddots & \ddots & \\
 & & & & 2 & 20 \\
 & & & & & 1
\end{bmatrix} \tag{15}$$

(all entries not shown are zero).
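Wilkinson's example can be reproduced with NumPy. In this sketch we add $10^{-9}$ to the (20, 1) element of (15), which is zero, as our reading of "the 9th place", and compute the eigenvalues with `numpy.linalg.eigvals`. The function name `wilkinson_bidiagonal` is hypothetical.

```python
import numpy as np


def wilkinson_bidiagonal(n=20):
    """Matrix (15): diagonal n, n-1, ..., 1 and all superdiagonal entries equal to n."""
    return (np.diag(np.arange(n, 0, -1).astype(float))
            + np.diag(np.full(n - 1, float(n)), k=1))


A = wilkinson_bidiagonal(20)
A_pert = A.copy()
A_pert[-1, 0] = 1e-9          # perturb only the (20, 1) element

lam = np.linalg.eigvals(A_pert)
print(np.round(np.sort_complex(lam), 3))
print("largest imaginary part:", np.abs(lam.imag).max())
# Expanding det(A_pert - lambda I) along the first column shows that the
# perturbation subtracts 1e-9 * 20**19 (about 5e15) from det(A - lambda I),
# for every lambda: this is what pushes eigenvalues off the real axis.
```

The unperturbed eigenvalues are the real numbers 1, 2, ..., 20; after the perturbation several of them form complex pairs.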
We point out that the conditioning of a problem does not depend on the algorithm used to solve it. If a problem is ill-conditioned (ill-posed), slight perturbations to the data, such as those occurring from the finite word length of the computer, can destroy the answer, as was seen in example (15). In this case there is probably no algorithm that can give satisfactory results: we cannot hope to improve the answer by any algorithm, and we must instead consider methods that solve a better conditioned problem. In the opposite case, if the problem is well-conditioned, then we may hope to obtain the desired results, but only if the method used for its solution introduces as few errors as possible: an unstable method can destroy the answer even to a well-conditioned problem. The desirable properties of an algorithm in this respect are collectively called the stability of the algorithm; the usual measure of stability is the so-called "Backward Stability", to be introduced in the next section. Thus the accuracy of a computed solution is limited both by the conditioning of the problem and by the stability of the algorithm, and we would like to use the best (most stable) algorithm, which provides the smallest bounds on such errors.

4. Schur Decomposition

The decomposition which most closely corresponds to the Jordan decomposition, and which is numerically best suited for use in Linear Algebra, is the so-called Schur decomposition. Denoting by "*" the conjugate transpose, this decomposition is given by

$$A = QRQ^* \tag{16}$$

where the matrix Q is a (possibly complex) orthogonal matrix ($Q^*Q = I$) and R is a (possibly complex) upper triangular matrix.
We first note the mathematical properties of the Schur decomposition:

(i) Since (16) is a similarity transformation, the eigenvalues of A are those of R, i.e. the diagonal elements of R.
(ii) The determinants satisfy the relation $\det A = \det Q \det R \det Q^* = \det R$ = product of the diagonal elements of R.

Hence, the Schur form yields items (i) and (ii). What about its numerical properties? The advantage of the Schur form is that perturbations in A will not be made worse in R. Since Q is orthogonal, it is always bounded in size and well-conditioned: its columns are never "almost parallel", so Q is never "close to singularity". Hence the sensitivity of the eigenvalues of R to perturbations is the same as that of the eigenvalues of A: the ill-conditioning of the original problem is not made worse.

Here it is worth stating the distinction between the "forward" and "backward" stability of an algorithm, mentioned in Sect. 3.2. In general, an algorithm is said to be

(a) forward stable: "The algorithm has produced an answer R close to the R that we would hope to obtain with exact arithmetic, starting with the original problem A";
(b) backward stable: "The algorithm has produced an answer R which is exact for a problem A close to the original problem A".

Here, "close" means on the order of the precision of the computer used. As is seen from examples (14) and (15), (a) is not true in general: slight changes to A can destroy the eigenvalues (the ill-conditioning mentioned in Sect. 3.2), so no method can guarantee (a), and the best one can expect is (b). In exact arithmetic, by using exact orthogonal transformations, we make no errors and obtain the exact eigenvalues. If our method still makes errors, but these errors are equivalent to slight perturbations of A, then we still obtain the exact eigenvalues of a matrix close to A (backward stability). Algorithms for the Schur form based on unitary (orthogonal) transformations satisfy (b) (Wilkinson, 1965). Though no algorithm can remove the ill-conditioning of a problem, we can hope to remove the instability of an algorithm.
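The difference between forward and backward errors can be observed numerically. The sketch below is our own illustration and uses a linear system rather than the eigenvalue problem: it solves Hx = b for the 10 × 10 Hilbert matrix (an ill-conditioned problem, our choice) with `numpy.linalg.solve`, which calls LAPACK's LU factorization with partial pivoting. The computed answer is exact for a nearby problem (tiny backward error), yet far from the true solution.

```python
import numpy as np

n = 10
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)              # Hilbert matrix (i, j = 0, ..., n-1)
x_true = np.ones(n)
b = H @ x_true

x_hat = np.linalg.solve(H, b)      # LU with partial pivoting (LAPACK gesv)

# Backward error: relative size of the residual; tiny means that x_hat is the
# exact solution of a problem close to the original one
rel_residual = np.linalg.norm(b - H @ x_hat) / (np.linalg.norm(H, 2) * np.linalg.norm(x_hat))
# Forward error: relative distance of x_hat from the exact answer
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(f"cond(H)        = {np.linalg.cond(H):.1e}")
print(f"backward error = {rel_residual:.1e}")
print(f"forward error  = {rel_error:.1e}")
```

The backward error stays at the level of the machine precision; the forward error is larger by many orders of magnitude, and the gap is due to the conditioning of the problem, not to the algorithm.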
Unfortunately, the Schur form cannot be used reliably for items (iii)-(v). This is illustrated as follows.
If A is already upper triangular, the Schur form is obtained with R = A, Q = I. Consider the n × n matrix

$$A = \begin{bmatrix}
1 & -1 & \cdots & \cdots & -1 \\
0 & 1 & -1 & \cdots & -1 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & & & 1 & -1 \\
\epsilon & 0 & \cdots & 0 & 1
\end{bmatrix} \tag{17}$$

If $\epsilon = 0$, all the eigenvalues are exactly 1 and det A = 1, so the Schur form gives no warning. Nevertheless, A is close to rank deficient: it can be verified, by forming Ax with $x = [1, 2^{-1}, 2^{-2}, \ldots, 2^{-(n-2)}, 2^{-(n-2)}]'$, that "perturbing" A by $\epsilon = -1/2^{n-2}$ will make A singular! Therefore, we need a better way to find the rank, the determinant, etc., of a matrix. Such a way is provided by the Singular Value Decomposition.
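The singularity of the perturbed matrix (17) can be verified exactly with rational arithmetic. In this plain-Python sketch, `matrix_17` and `matvec` are hypothetical helper names, and n = 20 is our choice.

```python
from fractions import Fraction


def matrix_17(n, eps):
    """Matrix (17): 1 on the diagonal, -1 everywhere above it, eps in position (n, 1)."""
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = Fraction(1)
        for j in range(i + 1, n):
            A[i][j] = Fraction(-1)
    A[n - 1][0] = Fraction(eps)
    return A


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


n = 20
eps = Fraction(-1, 2 ** (n - 2))
# x = [1, 2^-1, ..., 2^-(n-2), 2^-(n-2)]: the last entry repeats the previous one
x = [Fraction(1, 2 ** k) for k in range(n - 1)] + [Fraction(1, 2 ** (n - 2))]

print(all(v == 0 for v in matvec(matrix_17(n, eps), x)))  # True: A x = 0 exactly
print(float(eps))                                         # -3.814697265625e-06
print(matvec(matrix_17(n, 0), x)[-1])                     # 1/262144: nonzero when eps = 0
```

For n = 20 a perturbation of about $4 \times 10^{-6}$ in a single element makes this matrix, whose eigenvalues are all 1, exactly singular.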
5. Singular Value Decomposition - Condition Number of a Matrix

In this section we introduce the Singular Value Decomposition (S.V.D.) of a matrix and try to explain its relevance. We also introduce another relevant concept, the Condition Number of a Matrix, and its significance for such items as rank.

The S.V.D. of an m × p matrix A is

$$A = U\Sigma V^* \tag{18}$$

where U and V are square orthogonal matrices and Σ is an m × p real diagonal matrix with nonnegative diagonal elements:

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n), \qquad n = \min(m, p).$$

The diagonal elements, called the singular values of A, can always be ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. In what follows we will usually assume square matrices, and we will use the matrix 2-norm:

$$\|A\| = \max_{\|x\| = 1} \|Ax\| \tag{19}$$

where $\|x\|$ is the usual vector 2-norm. The 2-norm has several properties. The most relevant is the fact (Stewart, 1973) that orthogonal matrices do not affect the 2-norm of a matrix or vector:

$$\|Qx\| = \|x\| \tag{20a}$$
$$\|QA\| = \|A\| \tag{20b}$$

and, similarly, $\|AQ\| = \|A\|$. Notice also that, immediately, $\|Q\| = 1$.

What kind of information can one glean from the S.V.D.? The most obvious is the norm of A. From (18), (19) and (20):

$$\|A\| = \|\Sigma\| = \sigma_1.$$

Moreover, if A is a nonsingular square matrix, its inverse is given by

$$A^{-1} = V\Sigma^{-1}U^*.$$
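The identities above can be checked numerically. In this NumPy sketch, matrix (14) serves only as a convenient test case, and the orthogonal Q comes from a QR factorization of a random matrix (our choice). Note that `numpy.linalg.svd` returns V* (as `Vh`), not V.

```python
import numpy as np

# Matrix (14), used here only as a convenient test case
A = np.array([[-64.0, 82.0, 21.0],
              [144.0, -178.0, -46.0],
              [-771.0, 962.0, 248.0]])

U, s, Vh = np.linalg.svd(A)   # A = U @ diag(s) @ Vh, where Vh plays the role of V*
print(np.allclose(U @ np.diag(s) @ Vh, A))                         # (18)
print(np.isclose(np.linalg.norm(A, 2), s[0]))                      # ||A|| = sigma_1
print(np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1 / s[-1]))  # from A^-1 = V Sigma^-1 U*

# Orthogonal invariance (20a)-(20b), with Q taken from a QR factorization
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
x = rng.standard_normal(3)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))        # (20a)
print(np.isclose(np.linalg.norm(Q @ A, 2), np.linalg.norm(A, 2)))  # (20b)
print(np.isclose(np.linalg.norm(Q, 2), 1.0))                       # ||Q|| = 1
```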
Moreover, $\|A^{-1}\| = \|\Sigma^{-1}\| = \sigma_n^{-1}$.

Given a square nonsingular matrix A, the number

$$k(A) = \|A\| \cdot \|A^{-1}\| \tag{21}$$

is said to be the condition number of A. Obviously, $k(A) = \sigma_1/\sigma_n$.

The condition number happens to be a very useful quantity in estimating the sensitivity of such items as rank, determinant, inverse, solution to a set of linear equations, etc., with respect to perturbations to the matrix A. It also gives the "distance to singularity". To see this, we start with the classical origin of k(A). Consider the problem of solving the matrix equation Ax = b. When using a computer, we obtain an approximate result $\hat x$, which we consider exact for the slightly perturbed problem $A\hat x = \hat b$. Note that we have perturbed only b, not A. We have:

$$Ax = b, \qquad A\hat x = \hat b.$$

Subtract to get $A(\hat x - x) = \hat b - b$. Multiplying both sides by $A^{-1}$ and taking the norms, one obtains:

$$\|\hat x - x\| \le \|A^{-1}\| \, \|\hat b - b\|, \tag{22}$$

i.e. the (error in the answer) is bounded by the (error in the right-hand side) magnified by $\|A^{-1}\|$. However, to estimate the number of digits of accuracy in x, we need the "relative error" $\|\hat x - x\|/\|x\|$. If the relative error is e.g. $\approx 10^{-6}$, then we have about 6 digits of accuracy, regardless of the size of x. To obtain an estimate of the relative error, we use the relation Ax = b to obtain $\|A\|\,\|x\| \ge \|b\|$, i.e.

$$\frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}. \tag{23}$$

From (22), (23) and definition (21),

$$\frac{\|\hat x - x\|}{\|x\|} \le k(A) \, \frac{\|\hat b - b\|}{\|b\|}, \tag{24}$$

i.e. the relative error in the answer is bounded by the relative error in the right-hand side magnified by k(A).
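Bound (24) can be checked on an example. In this NumPy sketch, matrix (14), the exact solution x = [1, 2, 3]' and the random perturbation of size about $10^{-8}$ in b are all our choices.

```python
import numpy as np

A = np.array([[-64.0, 82.0, 21.0],
              [144.0, -178.0, -46.0],
              [-771.0, 962.0, 248.0]])   # matrix (14)
x = np.array([1.0, 2.0, 3.0])
b = A @ x

rng = np.random.default_rng(1)
db = 1e-8 * rng.standard_normal(3)       # perturb only the right-hand side
x_hat = np.linalg.solve(A, b + db)       # x_hat solves A x_hat = b_hat (up to roundoff)

s = np.linalg.svd(A, compute_uv=False)
k = s[0] / s[-1]                         # k(A) = sigma_1 / sigma_n
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
bound = k * np.linalg.norm(db) / np.linalg.norm(b)
print(f"k(A) = {k:.2e}")
print(f"relative error {rel_err:.2e} <= bound (24) {bound:.2e}: {rel_err <= bound}")
```

`numpy.linalg.cond(A)` computes the same 2-norm condition number, also from the singular values.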
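Let us list some properties of condition numbers, and the values they take for some particular matrices:

(a) $k(A) \ge 1$ for any A. Indeed: $k(A) = \|A\| \, \|A^{-1}\| \ge \|AA^{-1}\| = \|I\| = 1$.
(b) $k(Q) = 1$ if Q is orthogonal, since $\|Q\| = \|Q^{-1}\| = \|Q^*\| = 1$.
(c) Let $T_6$ be the 6 × 6 Hilbert matrix, the elements of which are $t_{ij} = (1+i+j)^{-1}$. Then $k(T_6) \approx 10^6$.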
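These properties can be checked with `numpy.linalg.cond`. In this sketch the random test matrices are our choice. For $T_6$ we read the text's definition $t_{ij} = (1+i+j)^{-1}$ with i, j running from 1 to 6; this is an assumption, since with indices starting at 0 the same formula gives the usual Hilbert matrix. We print the resulting value rather than assert a specific one.

```python
import numpy as np

rng = np.random.default_rng(2)

# k(A) >= 1 for any nonsingular A
ks = [np.linalg.cond(rng.standard_normal((4, 4))) for _ in range(5)]
print(all(k >= 1.0 for k in ks))

# k(Q) = 1 for an orthogonal Q (taken from a QR factorization)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
print(np.linalg.cond(Q))

# 6 x 6 matrix with t_ij = 1/(1 + i + j), indices taken to run from 1 to 6
i, j = np.indices((6, 6)) + 1
T6 = 1.0 / (1 + i + j)
print(f"k(T6) = {np.linalg.cond(T6):.1e}")
```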
The S.V.D. and k(A) can be used to find the "distance to singularity" of a matrix A. We note that

$$\mathrm{rank}\,A = \mathrm{rank}\,\Sigma = \text{number of nonzero } \sigma_i.$$

In particular, if A is nonsingular, $\sigma_i > 0$ for all i. Now, suppose that A is nonsingular and that E is a perturbation such that A + E is singular. Then, using the S.V.D. (18) of A, we may write

$$U^*(A+E)V = U^*AV + U^*EV = \Sigma + F, \qquad F = U^*EV.$$

Because U and V are orthogonal, $\|E\| = \|F\|$. Thus, to perturbations E of A correspond exactly perturbations F of Σ, of the same norm, and A + E is singular if and only if Σ + F is singular. We can define the "distance to singularity" as the norm of the smallest perturbation E such that A + E is singular:

$$d_{\mathrm{sing}} = \min_{A+E \text{ singular}} \|E\|. \tag{25}$$

In view of the discussion above, this corresponds to

$$d_{\mathrm{sing}} = \min_{\Sigma+F \text{ singular}} \|F\|. \tag{26}$$
Since $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$, it is clear that the F which achieves the minimum in (26) is

$$F = \mathrm{diag}(0, 0, \ldots, 0, -\sigma_n),$$

so that $\|F\| = \sigma_n$; any F with $\|F\| < \sigma_n$ leaves Σ + F nonsingular. Hence, labeling the columns of U and V as

$$U = [u_1\ u_2\ \cdots\ u_n], \qquad V = [v_1\ v_2\ \cdots\ v_n],$$

the E achieving the minimum in (25) is

$$E = UFV^* = -\sigma_n u_n v_n^*.$$

Notice also that $\|E\| = \sigma_n$. Hence, the distance to singularity is

$$d_{\mathrm{sing}} = \sigma_n. \tag{27}$$

Consequently, the "relative distance to singularity" (relative to the size of the starting matrix A) is:

$$\frac{d_{\mathrm{sing}}}{\|A\|} = \frac{\sigma_n}{\sigma_1} = \frac{1}{k(A)}. \tag{28}$$

So, k(A) not only indicates the difficulty one can expect in solving Ax = b, but also shows how close A is to a singular matrix, relative to its size.
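The minimizing perturbation $E = -\sigma_n u_n v_n^*$ can be built directly from a computed S.V.D. In this NumPy sketch, matrix (14) is again only a test case; the last row of `Vh` returned by `numpy.linalg.svd` is $v_n^*$.

```python
import numpy as np

A = np.array([[-64.0, 82.0, 21.0],
              [144.0, -178.0, -46.0],
              [-771.0, 962.0, 248.0]])   # matrix (14)

U, s, Vh = np.linalg.svd(A)
u_n, v_n = U[:, -1], Vh[-1, :]           # last column of U; last row of Vh is v_n*
E = -s[-1] * np.outer(u_n, v_n)          # E = -sigma_n u_n v_n*

s_pert = np.linalg.svd(A + E, compute_uv=False)
print("||E|| =", np.linalg.norm(E, 2), " sigma_n =", s[-1])          # (27)
print("sigma_min(A + E) =", s_pert[-1])                              # roundoff level
print("d_sing/||A|| =", s[-1] / s[0], " 1/k(A) =", 1 / np.linalg.cond(A))  # (28)
```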
It should be noted in passing that such quantities as $\|A\|$ and k(A) can be defined using any matrix norm, and results analogous to (24) hold; however, the relations (27) and (28), obtained from the S.V.D., are valid only for the 2-norm.

We have seen how the S.V.D. can be used to obtain rank A and k(A). What about items (iv) and (v) of Sect. 3, i.e. ker A and colsp A? Let us start with a singular n × n matrix A, and suppose that its singular values satisfy

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0,$$

so that rank A = r. In this case, by the same argument that led to (27), $\sigma_r$ gives the size of the smallest perturbation needed to further reduce the rank of A, and, analogously to (28), $\sigma_r/\sigma_1$ gives the relative size of such a perturbation. We write the S.V.D. of A as

$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^* \\ V_2^* \end{bmatrix} \tag{29}$$

where $\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ is r × r and nonsingular, and U, V have been partitioned conformally to Σ.
Thus, $A = U_1\Sigma_1V_1^*$, where $U_1$ and $V_1$ are n × r orthonormal matrices. Hence, $U_1$ is an orthonormal basis for colsp A, and $V_2$ is an orthonormal basis for ker A (while $V_1$ is an orthonormal basis for the orthogonal complement of ker A).
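The partition (29) gives both bases directly. In this NumPy sketch, the rank-3 test matrix (a product of random factors) and the zero tolerance $10^{-10}\sigma_1$ are our choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r_true = 5, 3
A = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, n))   # rank 3

U, s, Vh = np.linalg.svd(A)
tol = 1e-10 * s[0]               # zero tolerance, chosen here for illustration
r = int(np.sum(s > tol))
U1 = U[:, :r]                    # orthonormal basis for colsp A
V2 = Vh[r:, :].T                 # orthonormal basis for ker A (last n - r columns of V)

print("rank:", r)
print("||A V2|| =", np.linalg.norm(A @ V2))                      # ~ roundoff: V2 lies in ker A
print("||U1 U1* A - A|| =", np.linalg.norm(U1 @ U1.T @ A - A))   # columns of A lie in colsp U1
```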
In practice, when using (29), one will frequently encounter the situation where, in $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, some of the singular values are not exactly zero but "small", i.e. of the order of the machine precision. The problem is to decide at what point a singular value is "small", i.e. "how small is zero". It is best to judge the rank of the matrix only relative to the accuracy of the data. Let us illustrate the problem here. Assume that the singular values (scaled so that $\sigma_1 = 1$) are e.g. $10^0, 10^{-1}, 10^{-2}, 10^{-4}, 10^{-8}, 10^{-9}, 10^{-10}$. None of them is exactly zero, so the matrix has full rank in exact terms. But if the original data had only 6 digits of accuracy, then we would consider any number $< 10^{-6}$ to be zero, and the rank should be taken as the number of singular values above $10^{-6}$; here the large gap between $10^{-4}$ and $10^{-8}$ makes this choice clear-cut.

If instead we had the singular values

$$10^0, 10^{-2}, 10^{-4}, \ldots, 10^{-18} \quad \text{or} \quad 10^0, 10^{-1}, 10^{-2}, \ldots, 10^{-9},$$

then there is no obvious gap, and the rank depends almost entirely on the choice of the zero tolerance! Unfortunately, this situation can and does arise in practice, especially in large matrices. This is not a defect of the S.V.D.: it really means that a small (negligible) perturbation to A will reduce the rank, and another perturbation, only slightly larger, can reduce the rank even further, so that the effective rank depends on the choice of the zero tolerance. For a full discussion of this situation see (Klema and Laub, 1980).
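The effect of the zero tolerance can be seen by counting singular values. In this plain-Python sketch, `numerical_rank` is a hypothetical helper, the lists mirror the two kinds of singular value distributions discussed above (with and without a gap), and the tolerances are our choice.

```python
def numerical_rank(sigmas, tol):
    """Number of singular values larger than tol * sigma_1 (tol = relative zero tolerance)."""
    s1 = max(sigmas)
    return sum(1 for s in sigmas if s > tol * s1)


with_gap = [1e0, 1e-1, 1e-2, 1e-4, 1e-8, 1e-9, 1e-10]
no_gap = [10.0 ** -k for k in range(10)]      # 1, 1e-1, ..., 1e-9

tolerances = (5e-7, 5e-6, 5e-5)
print([numerical_rank(with_gap, t) for t in tolerances])   # [4, 4, 4]
print([numerical_rank(no_gap, t) for t in tolerances])     # [7, 6, 5]
```

With a gap, the rank is insensitive to the tolerance over several orders of magnitude; without one, every change of tolerance changes the rank.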
We close this section with a few examples of situations involving the S.V.D. We just point out the idea, leaving the details to the reader.

(a) Least Squares

This was the classical origin of the S.V.D. (Lawson and Hanson, 1974). It is useful in cases where A is rank deficient: in such cases we cannot use the QR decomposition to solve problem (7), because the matrix R in (8) is singular, and hence we cannot solve (9). If we resort instead to the S.V.D. of A, we have, in view of (20a),

$$\|Ax - b\| = \|U\Sigma V^*x - b\| = \|\Sigma V^*x - U^*b\| = \|\Sigma y - c\|,$$

where $y = V^*x$ and $c = U^*b$. The original problem is thus converted to a problem involving the diagonal matrix Σ. We partition y and c as in (29) to obtain

$$\|\Sigma y - c\|^2 = \|\Sigma_1 y_1 - c_1\|^2 + \|c_2\|^2.$$

We minimize this norm by setting $y_1 = \Sigma_1^{-1}c_1$. In the solution, we find that $y_2$ is free! (Choosing $y_2 = 0$ gives the solution x = Vy of minimum norm.)
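The S.V.D. route to least squares can be written in a few lines. In this NumPy sketch, the rank-deficient 6 × 4 test matrix and the zero tolerance are our choices. We set $y_2 = 0$ and compare the result with `numpy.linalg.lstsq`, which returns the minimum-norm solution when given the same cutoff `rcond`.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6 x 4, rank 2
b = rng.standard_normal(6)

U, s, Vh = np.linalg.svd(A)          # full S.V.D.: U is 6 x 6, Vh is 4 x 4
r = int(np.sum(s > 1e-10 * s[0]))    # numerical rank, zero tolerance chosen by the user
c = U.T @ b                          # c = U* b
y = np.zeros(A.shape[1])
y[:r] = c[:r] / s[:r]                # y1 = Sigma_1^{-1} c1; y2 is free and set to 0 here
x = Vh.T @ y                         # x = V y

x_ref = np.linalg.lstsq(A, b, rcond=1e-10)[0]
print(r, np.allclose(x, x_ref))
print(np.linalg.norm(A @ x - b), np.linalg.norm(c[r:]))   # minimum residual = ||c2||
```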
(b) Pseudo Inverse

The pseudo inverse $A^+$ of A (Lawson and Hanson, 1974) can be expressed as

$$A^+ = V \begin{bmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^* = V_1\Sigma_1^{-1}U_1^*,$$

where we have used the partitioning (29).
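The formula $A^+ = V_1\Sigma_1^{-1}U_1^*$ can be checked against the four Moore-Penrose conditions, which characterize the pseudo inverse uniquely. In this NumPy sketch, the rank-3 test matrix and the zero tolerance are our choices.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5 x 4, rank 3

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))            # zero tolerance chosen by the user
U1, V1 = U[:, :r], Vh[:r, :].T
A_plus = V1 @ np.diag(1.0 / s[:r]) @ U1.T    # A+ = V1 Sigma1^{-1} U1*

print(r)
print(np.allclose(A @ A_plus @ A, A))              # A A+ A = A
print(np.allclose(A_plus @ A @ A_plus, A_plus))    # A+ A A+ = A+
print(np.allclose((A @ A_plus).T, A @ A_plus))     # A A+ symmetric
print(np.allclose((A_plus @ A).T, A_plus @ A))     # A+ A symmetric
```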
(c) Relation to A*A

We point out the relationship of the S.V.D. to the classical idea of eigenvalues. If $A = U\Sigma V^*$, then

$$A^*A = V\Sigma U^*U\Sigma V^* = V\Sigma^2V^*, \qquad \Sigma^2 = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2).$$

Hence, the $\sigma_i$ are just the square roots of the eigenvalues of A*A, and the columns of V are the eigenvectors of A*A. In fact, using the fact that A*A is symmetric positive semi-definite, this argument can be reversed to prove the existence of the S.V.D. for any A (Lawson and Hanson, 1974). However, for the actual computation of the S.V.D., methods using A*A are often faster but less accurate, as can be seen from example (13): it is almost always more accurate to compute the S.V.D. directly, without forming A*A, especially in the solution of least squares problems as in (a). For small cases, such as a 2 × 2 matrix (or a 2 × n matrix, for any n), the S.V.D. can be computed by hand with sufficient ease from the eigenvalues of the 2 × 2 matrix A*A (or AA*).
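The loss of accuracy from forming A*A shows up even in double precision (about 16 digits). This NumPy sketch uses the matrix (13) with $10^{-4}$ replaced by $d = 10^{-9}$, our choice so that $d^2 = 10^{-18}$ is lost when added to 1. The exact singular values are $\sqrt{2 + d^2}$ and d.

```python
import numpy as np

d = 1e-9
A = np.array([[1.0, 1.0],
              [d, 0.0],
              [0.0, d]])       # matrix (13) with 1e-4 replaced by 1e-9

s_direct = np.linalg.svd(A, compute_uv=False)          # S.V.D. computed directly
lam = np.linalg.eigvalsh(A.T @ A)                      # eigenvalues of A*A, ascending
s_normal = np.sqrt(np.clip(lam[::-1], 0.0, None))      # square roots, descending

print("exact sigma_2 :", d)
print("direct S.V.D. :", s_direct[1])
print("via A*A       :", s_normal[1])
```

In floating point, A.T @ A evaluates to exactly [[1, 1], [1, 1]], so the eigenvalue route cannot recover the small singular value, while the direct S.V.D. does.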
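6. Applications to Linear Systems

We finally take a look at how the previous discussion on numerical stability applies to linear systems. Consider either the continuous-time system

$$\dot x(t) = Fx(t) + Gu(t), \qquad y(t) = Hx(t) \tag{31}$$

or the corresponding discrete-time system $x(t+1) = Fx(t) + Gu(t)$, $y(t) = Hx(t)$, where F is n × n, G is n × m and H is p × n. The Markov parameters of the system are

$$w(i) = HF^{i-1}G, \qquad i = 1, 2, \ldots$$

In discrete time they are the values of the impulse response, whereas in continuous time they are the values at the origin of the impulse response $He^{Ft}G$ and of its successive derivatives.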
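In discrete time the statement can be checked by simulation. In this NumPy sketch, the 2-state system (F, G, H) is hypothetical, with values chosen only for illustration, and `markov_parameters` is a hypothetical helper. The response to a unit impulse applied at t = 0 from x(0) = 0 reproduces the Markov parameters.

```python
import numpy as np

# Hypothetical 2-state, single-input, single-output system
F = np.array([[0.5, 1.0],
              [0.0, 0.3]])
G = np.array([[0.0],
              [1.0]])
H = np.array([[1.0, 0.0]])


def markov_parameters(F, G, H, k):
    """w(i) = H F^(i-1) G for i = 1, ..., k."""
    out, Fi = [], np.eye(F.shape[0])
    for _ in range(k):
        out.append(H @ Fi @ G)
        Fi = F @ Fi
    return out


# Impulse response of x(t+1) = F x(t) + G u(t), y(t) = H x(t), with x(0) = 0
x, ys = np.zeros((2, 1)), []
for t in range(6):
    u = 1.0 if t == 0 else 0.0
    ys.append((H @ x).item())
    x = F @ x + G * u

w = [m.item() for m in markov_parameters(F, G, H, 5)]
print(w)
print(ys[1:])   # y(1), ..., y(5) coincide with w(1), ..., w(5)
```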
We shall consider two problems:

(i) Determining whether the system is reachable.
(ii) Determining the system from the Markov parameters w(·).

Obviously, these problems have been studied extensively, and classical criteria are available (Kalman, Falb, ...). The system defined by (31) is reachable if, for each state x and each t, there exists a T