Discrete Stochastic Processes and Optimal Filtering
Jean-Claude Bertein Roger Ceschi
First published in France in 2005 by Hermes Science/Lavoisier entitled “Processus stochastiques discrets et filtrages optimaux” First published in Great Britain and the United States in 2007 by ISTE Ltd Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: ISTE Ltd 6 Fitzroy Square London W1T 5DX UK
ISTE USA 4308 Patrice Road Newport Beach, CA 92663 USA
www.iste.co.uk © ISTE Ltd, 2007 © LAVOISIER, 2005 The rights of Jean-Claude Bertein and Roger Ceschi to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988. Library of Congress Cataloging-in-Publication Data Bertein, Jean-Claude. [Processus stochastiques discrets et filtrages optimaux. English] Discrete stochastic processes and optimal filtering/Jean-Claude Bertein, Roger Ceschi. p. cm. Includes index. "First published in France in 2005 by Hermes Science/Lavoisier entitled "Processus stochastiques discrets et filtrages optimaux"." ISBN 978-1-905209-74-3 1. Signal processing--Mathematics. 2. Digital filters (Mathematics) 3. Stochastic processes. I. Ceschi, Roger. II. Title. TK5102.9.B465 2007 621.382'2--dc22 2007009433 British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 13: 978-1-905209-74-3 Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire.
To our families

We wish to thank Mme Florence François for having typed the manuscript, and M. Stephen Hazlewood, who carried out the translation of the book.
Table of Contents
Preface . . . xi
Introduction . . . xiii

Chapter 1. Random Vectors . . . 1
1.1. Definitions and general properties . . . 1
1.2. Spaces L1(dP) and L2(dP) . . . 20
1.2.1. Definitions . . . 20
1.2.2. Properties . . . 22
1.3. Mathematical expectation and applications . . . 23
1.3.1. Definitions . . . 23
1.3.2. Characteristic functions of a random vector . . . 34
1.4. Second order random variables and vectors . . . 39
1.5. Linear independence of vectors of L2(dP) . . . 47
1.6. Conditional expectation (concerning random vectors with density function) . . . 51
1.7. Exercises for Chapter 1 . . . 57

Chapter 2. Gaussian Vectors . . . 63
2.1. Some reminders regarding random Gaussian vectors . . . 63
2.2. Definition and characterization of Gaussian vectors . . . 66
2.3. Results relative to independence . . . 68
2.4. Affine transformation of a Gaussian vector . . . 72
2.5. The existence of Gaussian vectors . . . 74
2.6. Exercises for Chapter 2 . . . 85

Chapter 3. Introduction to Discrete Time Processes . . . 93
3.1. Definition . . . 93
3.2. WSS processes and spectral measure . . . 105
3.2.1. Spectral density . . . 106
3.3. Spectral representation of a WSS process . . . 110
3.3.1. Problem . . . 110
3.3.2. Results . . . 111
3.3.2.1. Process with orthogonal increments and associated measurements . . . 111
3.3.2.2. Wiener stochastic integral . . . 113
3.3.2.3. Spectral representation . . . 114
3.4. Introduction to digital filtering . . . 115
3.5. Important example: autoregressive process . . . 128
3.6. Exercises for Chapter 3 . . . 134

Chapter 4. Estimation . . . 141
4.1. Position of the problem . . . 141
4.2. Linear estimation . . . 144
4.3. Best estimate – conditional expectation . . . 156
4.4. Example: prediction of an autoregressive process AR(1) . . . 165
4.5. Multivariate processes . . . 166
4.6. Exercises for Chapter 4 . . . 175

Chapter 5. The Wiener Filter . . . 181
5.1. Introduction . . . 181
5.1.1. Problem position . . . 182
5.2. Resolution and calculation of the FIR filter . . . 183
5.3. Evaluation of the least error . . . 185
5.4. Resolution and calculation of the IIR filter . . . 186
5.5. Evaluation of least mean square error . . . 190
5.6. Exercises for Chapter 5 . . . 191

Chapter 6. Adaptive Filtering: Algorithm of the Gradient and the LMS . . . 197
6.1. Introduction . . . 197
6.2. Position of problem . . . 199
6.3. Data representation . . . 202
6.4. Minimization of the cost function . . . 204
6.4.1. Calculation of the cost function . . . 208
6.5. Gradient algorithm . . . 211
6.6. Geometric interpretation . . . 214
6.7. Stability and convergence . . . 218
6.8. Estimation of gradient and LMS algorithm . . . 222
6.8.1. Convergence of the algorithm of the LMS . . . 225
6.9. Example of the application of the LMS algorithm . . . 225
6.10. Exercises for Chapter 6 . . . 234

Chapter 7. The Kalman Filter . . . 237
7.1. Position of problem . . . 237
7.2. Approach to estimation . . . 241
7.2.1. Scalar case . . . 241
7.2.2. Multivariate case . . . 244
7.3. Kalman filtering . . . 245
7.3.1. State equation . . . 245
7.3.2. Observation equation . . . 246
7.3.3. Innovation process . . . 248
7.3.4. Covariance matrix of the innovation process . . . 248
7.3.5. Estimation . . . 250
7.3.6. Riccati's equation . . . 258
7.3.7. Algorithm and summary . . . 260
7.4. Exercises for Chapter 7 . . . 262

Table of Symbols and Notations . . . 281
Bibliography . . . 283
Index . . . 285
Preface
Discrete optimal filtering applied to stationary and non-stationary signals allows us to process, in the most efficient manner possible according to chosen criteria, all of the problems that we might meet in situations where signals have to be extracted from noise. This constitutes the necessary stage in the most diverse domains: the calculation of orbits or the guidance of aircraft in the aerospace or aeronautic domain, the calculation of filters in telecommunications or in control systems, or again in seismic signal processing; the list is not exhaustive. Furthermore, the study of discrete signals and the results obtained for them lend themselves easily to computer implementation. In their book, the authors have taken pains to stress educational aspects, preferring this to displays of erudition; all of the preliminary mathematics and probability theory necessary for a sound understanding of optimal filtering is treated in a rigorous fashion. It should not be necessary to turn to other works to acquire a sound knowledge of the subjects studied. Thanks to this work, the reader will be able not only to understand discrete optimal filtering but also to go more deeply into the different aspects of this wide field of study.
Introduction
The object of this book is to present the bases of discrete optimal filtering in a progressive and rigorous manner. The optimal character is to be understood in the sense that the criterion chosen is always the minimum of the L2 norm of the error. Chapter 1 tackles random vectors, their principal definitions and properties. Chapter 2 covers the subject of Gaussian vectors. Given the practical importance of this notion, the definitions and results are accompanied by numerous commentaries and explanatory diagrams. Chapter 3 is by its very nature more "physical" than the preceding ones and can be considered as an introduction to digital filtering. Results that will be essential for what follows are given there. Chapter 4 provides the prerequisites essential for the construction of optimal filters. The results obtained on projections in Hilbert spaces constitute the cornerstone of future demonstrations. Chapter 5 covers the Wiener filter, a device well adapted to processing second order stationary signals. Practical calculations of such filters, with finite or infinite impulse response, are developed. Adaptive filtering, which is the subject of Chapter 6, can be considered as a relatively direct application of the deterministic or stochastic gradient method. At the end of the process of adaptation or convergence, the Wiener filter is encountered again.
The book is completed with a study of Kalman filtering, which allows stationary or non-stationary signal processing; from this point of view we can say that it generalizes Wiener's optimal filter. Each chapter is complemented by a series of exercises with answers, and worked examples are also supplied using Matlab software, which is well adapted to signal processing problems.
Chapter 1
Random Vectors
1.1. Definitions and general properties

If we remember that ℝⁿ = { x = (x_1,…,x_n) | x_j ∈ ℝ ; j = 1 to n }, the set of real n-tuples, can be fitted with the two laws
\[
\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n : (x,y) \mapsto x+y
\qquad\text{and}\qquad
\mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n : (\lambda,x) \mapsto \lambda x ,
\]
making it a vector space of dimension n. The basis implicitly considered on ℝⁿ will be the canonical base e_1 = (1,0,…,0), …, e_n = (0,…,0,1), and x ∈ ℝⁿ expressed in this base will be denoted
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\quad\text{(or } x^T = (x_1,\dots,x_n)\text{)}.
\]
Definition of a real random vector

Beginning with a basic definition, without concerning ourselves at the moment with its rigor: we can say simply that a real vector
\[
X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}
\]
linked to a physical or biological phenomenon is random if the value taken by this vector is unknown while the phenomenon is not completed.

For typographical reasons, the vector will instead be written X^T = (X_1,…,X_n), or even X = (X_1,…,X_n) when there is no risk of confusion.

In other words, given a random vector X and B ⊂ ℝⁿ, we do not know if the assertion (also called the event) (X ∈ B) is true or false. However, we do usually know the "chance" that X ∈ B; this is denoted P(X ∈ B) and is called the probability of the event (X ∈ B).

After completion of the phenomenon, the result (also called the realization) will be denoted
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\quad\text{or}\quad x^T = (x_1,\dots,x_n) \quad\text{or even}\quad x = (x_1,\dots,x_n)
\]
when there is no risk of confusion.
An exact definition of a real random vector of dimension n will now be given. We take as given that:

– Ω = basic space. This is the set of all possible results (or tests) ω linked to a random phenomenon.

– a = σ-algebra (of events) on Ω, recalling the axioms:
1) Ω ∈ a;
2) if A ∈ a then the complementary Aᶜ ∈ a;
3) if (A_j, j ∈ J) is a countable family of events then ∪_{j∈J} A_j is an event, i.e. ∪_{j∈J} A_j ∈ a.

– ℝⁿ = space of observables.

– B(ℝⁿ) = Borel algebra on ℝⁿ; this is the smallest σ-algebra on ℝⁿ which contains all the open sets of ℝⁿ.

DEFINITION.– X is said to be a real random vector of dimension n defined on (Ω, a) if X is a measurable mapping (Ω, a) → (ℝⁿ, B(ℝⁿ)), i.e.
\[
\forall B \in \mathcal{B}(\mathbb{R}^n) \qquad X^{-1}(B) \in a .
\]

When n = 1 we talk about a random variable (r.v.).

In the following, the event X⁻¹(B) is also denoted {ω | X(ω) ∈ B}, and even more simply (X ∈ B).
PROPOSITION.– In order for X to be a real random vector of dimension n (i.e. a measurable mapping (Ω, a) → (ℝⁿ, B(ℝⁿ))), it is necessary and sufficient that each component X_j, j = 1 to n, is a real r.v. (i.e. a measurable mapping (Ω, a) → (ℝ, B(ℝ))).

ABRIDGED DEMONSTRATION.– It suffices to consider X⁻¹(B_1 × … × B_n) where B_1,…,B_n ∈ B(ℝ), as we show that B(ℝⁿ) = B(ℝ) ⊗ … ⊗ B(ℝ), where B(ℝ) ⊗ … ⊗ B(ℝ) denotes the σ-algebra generated by the measurable blocks B_1 × … × B_n.

Now
\[
X^{-1}(B_1 \times \dots \times B_n) = X_1^{-1}(B_1) \cap \dots \cap X_n^{-1}(B_n),
\]
which belongs to a if and only if each term belongs to a, that is to say if each X_j is a real r.v.
DEFINITION.– X = X_1 + iX_2 is said to be a complex random variable defined on (Ω, a) if the real and imaginary parts X_1 and X_2 are real random variables, that is to say if X_1 and X_2 are measurable mappings (Ω, a) → (ℝ, B(ℝ)).

EXAMPLE.– The complex r.v.
\[
e^{\,i\sum_j u_j X_j} = \cos\Big(\sum_j u_j X_j\Big) + i\,\sin\Big(\sum_j u_j X_j\Big)
\]
can be associated with a real random vector X = (X_1,…,X_n) and a real n-tuple u = (u_1,…,u_n) ∈ ℝⁿ. The study of this random variable will be taken up again when we define the characteristic functions.

Law P_X of the random vector X
First of all we assume that the σ-algebra a is provided with a measure P, i.e. a mapping P : a → [0,1] verifying:

1) P(Ω) = 1;
2) for every countable family (A_j, j ∈ J) of pairwise disjoint events:
\[
P\Big(\bigcup_{j\in J} A_j\Big) = \sum_{j\in J} P(A_j).
\]

DEFINITION.– We call the law of the random vector X the "image measure P_X of P through the mapping X", i.e. the measure on B(ℝⁿ) defined in the following way: ∀B ∈ B(ℝⁿ),
\[
P_X(B) \;=\; \int_B dP_X(x_1,\dots,x_n) \;\overset{\text{def}}{=}\; P\big(X^{-1}(B)\big) \;=\; P\big(\{\omega \mid X(\omega) \in B\}\big) \;=\; P(X \in B).
\]

Terms 1 and 2 on the one hand and terms 3, 4 and 5 on the other are different notations of the same mathematical notion.

Figure 1.1. Measurable mapping X : (Ω, a) → (ℝⁿ, B(ℝⁿ)); for B ∈ B(ℝⁿ), X⁻¹(B) ∈ a
It is important to observe that, as the measure P is given on a, P_X(B) is calculable for all B ∈ B(ℝⁿ) because X is measurable.

The space ℝⁿ, provided with the Borel algebra B(ℝⁿ) and then with the P_X law, is denoted (ℝⁿ, B(ℝⁿ), P_X).
NOTE.– As far as the basic and the exact definitions are concerned, the basic definition of random vectors is obviously a lot simpler and more intuitive and can happily be used in basic applications of probability calculations. On the other hand, in more theoretical or sophisticated studies, and notably in those calling into play several random vectors X, Y, Z,…, considering the latter as mappings defined on the same space (Ω, a), i.e.
\[
X, Y, Z, \dots : (\Omega, a) \to (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)),
\]
will often prove to be useful, even indispensable.

Figure 1.2. Family of measurable mappings X, Y, Z defined on Ω with values in ℝⁿ

In effect, the expressions and calculations calling into play several (or the entirety) of these vectors can be written without ambiguity using the space (Ω, a, P). Precisely, the events linked to X, Y, Z,… are among the elements A of a (and the probabilities of these events are measured by P).
Let us give two examples:

1) If there are 2 random vectors X, Y : (Ω, a, P) → (ℝⁿ, B(ℝⁿ)), and given B and B′ ∈ B(ℝⁿ), the event (X ∈ B) ∩ (Y ∈ B′) (for example) can be translated by X⁻¹(B) ∩ Y⁻¹(B′) ∈ a;

2) there are 3 r.v. X, Y, Z : (Ω, a, P) → (ℝ, B(ℝ)), and given a ∈ ℝ*₊, let us try to express the event (Z ≥ a − X − Y). Let us state U = (X, Y, Z) and
\[
B = \{(x,y,z) \in \mathbb{R}^3 \mid x+y+z \ge a\},
\]
where B, a Borel set of ℝ³, represents the half-space bounded by the plane (Π) not containing the origin 0 and based on the triangle A B C.

Figure 1.3. Example of a Borel set of ℝ³: the half-space bounded by the plane through A(a), B(a), C(a)

U is (Ω, a) → (ℝ³, B(ℝ³)) measurable and:
\[
(Z \ge a - X - Y) = (U \in B) = U^{-1}(B) \in a .
\]
NOTE ON THE SPACE (Ω, a, P).– We said that if we took as given Ω, then a on Ω, and then P on a, and so on, we would consider the vectors X, Y, Z,… as measurable mappings:
\[
(\Omega, a, P) \to (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)).
\]

This way of introducing the different concepts is the easiest to understand, but it rarely corresponds to real probability problems. In general, (Ω, a, P) is not specified, or is even given before "X, Y, Z,… measurable mappings". On the contrary, given the random physical or biological quantities X, Y, Z,… of ℝⁿ, it is on departing from the latter that (Ω, a, P) and X, Y, Z,…, measurable mappings defined on (Ω, a, P), are simultaneously introduced. (Ω, a, P) is an artificial space intended to serve as a link between X, Y, Z,…

What has just been set out may seem exceedingly abstract, but fortunately the general random vectors as they have just been defined are rarely used in practice. In any case, and as far as we are concerned, we will only have to manipulate in what follows the far more specific and concrete notion of a "random vector with a density function".

DEFINITION.– We say that the law P_X of the random vector X has a density if there is a mapping f_X : (ℝⁿ, B(ℝⁿ)) → (ℝ, B(ℝ)), positive and measurable, called the density of P_X, such that ∀B ∈ B(ℝⁿ):
\[
P(X \in B) = P_X(B) = \int_B dP_X(x_1,\dots,x_n) = \int_B f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n .
\]
VOCABULARY.– Sometimes we write
\[
dP_X(x_1,\dots,x_n) = f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n
\]
and we say also that the measure P_X admits the density f_X with respect to the Lebesgue measure on ℝⁿ. We also say that the random vector X admits the density f_X.

NOTE.–
\[
\int_{\mathbb{R}^n} f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n = P(X \in \mathbb{R}^n) = 1 .
\]

For example, let the random vector be X = (X_1, X_2, X_3) of density f_X(x_1, x_2, x_3) = K x_3 1_Δ(x_1, x_2, x_3), where Δ is the half-ball defined by x_1² + x_2² + x_3² ≤ R² with x_3 ≥ 0. We easily obtain, via a passage through spherical coordinates:
\[
1 = \int_\Delta K x_3\, dx_1\, dx_2\, dx_3 = K\,\frac{\pi R^4}{4},
\qquad\text{hence}\qquad K = \frac{4}{\pi R^4}.
\]
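The normalization constant can be checked numerically. The book's worked examples use MATLAB; the sketch below is only an illustrative NumPy Monte Carlo check (with an arbitrarily chosen R) that the density K x_3 1_Δ integrates to 1 when K = 4/(π R⁴).

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2.0                          # arbitrary radius for the check
K = 4.0 / (np.pi * R**4)         # constant obtained above

# Monte Carlo: draw points uniformly in the bounding box [-R,R] x [-R,R] x [0,R]
N = 2_000_000
pts = rng.uniform([-R, -R, 0.0], [R, R, R], size=(N, 3))
inside = (pts**2).sum(axis=1) <= R**2          # indicator of the half-ball
box_volume = (2 * R) * (2 * R) * R

# Estimate of the integral of K*x3 over the half-ball
integral = box_volume * np.mean(K * pts[:, 2] * inside)
print(integral)    # should be close to 1
```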
Marginals
Let the random vector be X = (X_1,…,X_n), which has the law P_X and density of probability f_X.

DEFINITION.– The r.v. X_j, which is the j-th component of X, is called the j-th marginal of X, and the law P_{X_j} of X_j is called the law of the j-th marginal.

If we know P_X, we know how to find the laws P_{X_j}.
In effect, ∀B ∈ B(ℝ):
\[
P(X_j \in B) = P\big[(X_1 \in \mathbb{R}) \cap \dots \cap (X_j \in B) \cap \dots \cap (X_n \in \mathbb{R})\big]
= \int_{\mathbb{R}\times\dots\times B \times\dots\times \mathbb{R}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_j \dots dx_n
\]
and, using the Fubini theorem:
\[
= \int_B dx_j \int_{\mathbb{R}^{n-1}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_n \ (\text{except } dx_j).
\]

The equality applying for all B, we obtain:
\[
f_{X_j}(x_j) = \int_{\mathbb{R}^{n-1}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_n \ (\text{except } dx_j).
\]

NOTE.– Reciprocally: except in the case of independent components, the knowledge of the marginal laws P_{X_j} does not lead to that of P_X.
EXAMPLE.– Let us consider:

1) A Gaussian pair Z^T = (X, Y) of density of probability
\[
f_Z(x,y) = \frac{1}{2\pi} \exp\Big(-\frac{x^2+y^2}{2}\Big).
\]
We obtain the densities of the marginals:
\[
f_X(x) = \int_{-\infty}^{+\infty} f_Z(x,y)\, dy = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x^2}{2}\Big)
\quad\text{and}\quad
f_Y(y) = \int_{-\infty}^{+\infty} f_Z(x,y)\, dx = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big).
\]

2) A second, non-Gaussian random pair W^T = (U, V), whose density of probability f_W is defined by:
\[
f_W(u,v) = 2 f_Z(u,v) \ \text{if } uv \ge 0,
\qquad
f_W(u,v) = 0 \ \text{if } uv < 0 .
\]
Let us calculate the marginals:
\[
f_U(u) = \int_{-\infty}^{+\infty} f_W(u,v)\, dv =
\begin{cases}
\displaystyle\int_{-\infty}^{0} 2 f_Z(u,v)\, dv & \text{if } u \le 0,\\[2mm]
\displaystyle\int_{0}^{+\infty} 2 f_Z(u,v)\, dv & \text{if } u > 0,
\end{cases}
\]
from which we easily come to
\[
f_U(u) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{u^2}{2}\Big),
\qquad\text{and in addition}\qquad
f_V(v) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{v^2}{2}\Big).
\]

CONCLUSION.– We can clearly see from this example that the marginal densities (identical in 1 and 2) do not determine the density of the vector (different in 1 and 2).
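This conclusion can be observed experimentally. The following NumPy sketch is not from the book: it samples the pair W by a construction of our own (a pair of half-normals given a common random sign, which reproduces the density 2 f_Z on the quadrants uv ≥ 0) and checks that each marginal looks standard normal even though the pair is clearly not Gaussian.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Sample W = (U, V): half-normal amplitudes sharing one random sign
n = 200_000
a, b = np.abs(rng.standard_normal(n)), np.abs(rng.standard_normal(n))
sign = rng.choice([-1.0, 1.0], size=n)
U, V = sign * a, sign * b

# Each marginal is compatible with N(0,1)...
print(stats.kstest(U, 'norm').pvalue, stats.kstest(V, 'norm').pvalue)  # typically not small
# ...yet the pair is not Gaussian: U and V always share the same sign
print(np.mean(U * V > 0))   # close to 1.0, impossible for a Gaussian pair with these marginals
```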
Probability distribution function
DEFINITION.– We call the mapping
\[
F_X : (x_1,\dots,x_n) \in \mathbb{R}^n \;\to\; F_X(x_1,\dots,x_n) \in [0,1]
\]
the distribution function of the random vector X^T = (X_1,…,X_n). It is defined by:
\[
F_X(x_1,\dots,x_n) = P\big((X_1 \le x_1) \cap \dots \cap (X_n \le x_n)\big)
\]
and, in integral form, since X is a vector with a probability density:
\[
F_X(x_1,\dots,x_n) = \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} f_X(u_1,\dots,u_n)\, du_1 \dots du_n .
\]

Some general properties:
– ∀j = 1 to n, the mapping x_j → F_X(x_1,…,x_n) is non-decreasing;
– F_X(x_1,…,x_n) → 1 when all the variables x_j → +∞;
– F_X(x_1,…,x_n) → 0 if at least one of the variables x_j → −∞;
– if (x_1,…,x_n) → f_X(x_1,…,x_n) is continuous, then
\[
\frac{\partial^n F_X}{\partial x_n \dots \partial x_1} = f_X .
\]
EXERCISE.– Determine the probability distribution function of the pair (X, Y) of density f(x, y) = K xy on the rectangle Δ = [1,3] × [2,4], and state precisely the value of K.
Independence
DEFINITION.– We say that a family of r.v. X_1,…,X_n is an independent family if ∀J ⊂ {1,2,…,n} and for every family of B_j ∈ B(ℝ):
\[
P\Big(\bigcap_{j\in J} (X_j \in B_j)\Big) = \prod_{j\in J} P(X_j \in B_j).
\]

As ℝ ∈ B(ℝ), it is easy to verify, by making certain Borel sets equal to ℝ, that the definition of independence is equivalent to the following:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P\Big(\bigcap_{j=1}^{n} (X_j \in B_j)\Big) = \prod_{j=1}^{n} P(X_j \in B_j),
\]
again equivalent to:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P(X \in B_1 \times \dots \times B_n) = \prod_{j=1}^{n} P(X_j \in B_j),
\]
i.e., by introducing the laws of probabilities:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P_X(B_1 \times \dots \times B_n) = \prod_{j=1}^{n} P_{X_j}(B_j).
\]

NOTE.– This law of probability P_X (defined on B(ℝⁿ) = B(ℝ) ⊗ … ⊗ B(ℝ)) is the tensor product of the laws of probabilities P_{X_j} (defined on B(ℝ)). Symbolically we write this as P_X = P_{X_1} ⊗ … ⊗ P_{X_n}.
NOTE.– Let X_1,…,X_n be a family of r.v. If this family is independent, the r.v. are independent pairwise, but the converse is false.

PROPOSITION.– Let X = (X_1,…,X_n) be a real random vector admitting the density of probability f_X, and whose components X_1,…,X_n admit the densities f_{X_1},…,f_{X_n}. In order for the family of components to be an independent family, it is necessary and sufficient that:
\[
f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j).
\]

DEMONSTRATION.– In the simplified case where f_X is continuous:

– If (X_1,…,X_n) is an independent family:
\[
F_X(x_1,\dots,x_n) = P\Big(\bigcap_{j=1}^{n} (X_j \le x_j)\Big) = \prod_{j=1}^{n} P(X_j \le x_j) = \prod_{j=1}^{n} F_{X_j}(x_j)
\]
and, by differentiating the two extreme members:
\[
f_X(x_1,\dots,x_n) = \frac{\partial^n F_X(x_1,\dots,x_n)}{\partial x_n \dots \partial x_1} = \prod_{j=1}^{n} \frac{\partial F_{X_j}(x_j)}{\partial x_j} = \prod_{j=1}^{n} f_{X_j}(x_j)\,;
\]

– reciprocally, if \(f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j)\), then for all B_j ∈ B(ℝ), j = 1 to n:
\[
P\Big(\bigcap_{j=1}^{n} (X_j \in B_j)\Big) = P\Big(X \in \prod_{j=1}^{n} B_j\Big)
= \int_{\prod_{j=1}^{n} B_j} f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n
\]
\[
= \int_{\prod_{j=1}^{n} B_j} \prod_{j=1}^{n} f_{X_j}(x_j)\, dx_j
= \prod_{j=1}^{n} \int_{B_j} f_{X_j}(x_j)\, dx_j
= \prod_{j=1}^{n} P(X_j \in B_j).
\]

NOTE.– The equality \(f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j)\) expresses that the function of n variables f_X is the tensor product of the functions of one variable f_{X_j}. Symbolically we write f_X = f_{X_1} ⊗ … ⊗ f_{X_n} (not to be confused with the ordinary product f = f_1 f_2 ⋯ f_n defined by f(x) = f_1(x) f_2(x) ⋯ f_n(x)).
EXAMPLE.– Let the random pair X = (X_1, X_2) have density
\[
\frac{1}{2\pi} \exp\Big(-\frac{x_1^2+x_2^2}{2}\Big).
\]
As
\[
\frac{1}{2\pi} \exp\Big(-\frac{x_1^2+x_2^2}{2}\Big)
= \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x_1^2}{2}\Big)\cdot\frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x_2^2}{2}\Big),
\]
and as \(\frac{1}{\sqrt{2\pi}} \exp(-x_1^2/2)\) and \(\frac{1}{\sqrt{2\pi}} \exp(-x_2^2/2)\) are the densities of X_1 and of X_2, these two components X_1 and X_2 are independent.
DEFINITION.– Two random vectors X = (X_1,…,X_n) and Y = (Y_1,…,Y_p) are said to be independent if ∀B ∈ B(ℝⁿ) and B′ ∈ B(ℝᵖ):
\[
P\big((X \in B) \cap (Y \in B')\big) = P(X \in B)\, P(Y \in B').
\]

The sum of independent random variables
NOTE.– We are frequently led to calculate the probability P that a function of n given r.v. X_1,…,X_n verifies a certain inequality. Let us denote this probability P(inequality), and let us assume that the random vector X = (X_1,…,X_n) possesses a density of probability f_X(x_1,…,x_n). The method of obtaining P(inequality) consists of determining the set B ∈ B(ℝⁿ) such that the inequality is verified exactly when (X_1,…,X_n) ∈ B; we thus obtain:
\[
P(\text{inequality}) = \int_B f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n .
\]

EXAMPLES.–

1)
\[
P(X_1 + X_2 \le z) = P\big((X_1,X_2) \in B\big) = \int_B f_X(x_1,x_2)\, dx_1\, dx_2
\quad\text{where}\quad B = \{(x,y) \in \mathbb{R}^2 \mid x+y \le z\};
\]

2)
\[
P(X_1 + X_2 \le a - X_3) = P\big((X_1,X_2,X_3) \in B\big) = \int_B f_X(x_1,x_2,x_3)\, dx_1\, dx_2\, dx_3
\]
where B is the half-space containing the origin 0 and limited by the plane placed on the triangle A B C, of equation x + y + z = a;

3)
\[
P\big(\mathrm{Max}(X_1, X_2) \le z\big) = P\big((X_1,X_2) \in B\big) = \int_B f_X(x_1,x_2)\, dx_1\, dx_2
\]
where B = {(x, y) ∈ ℝ² | x ≤ z and y ≤ z}.

Starting with example 1) we will show the following.
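As a hedged illustration of this recipe (not taken from the book, which uses MATLAB for its worked examples), the SciPy sketch below evaluates P(X_1 + X_2 ≤ z) for an arbitrarily chosen density, two independent standard normal components, both by integrating f_X over B and by a Monte Carlo frequency.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(5)
z = 1.0                                   # arbitrary threshold

# (a) Integrate f_X(x1, x2) = phi(x1) phi(x2) over B = {x1 + x2 <= z}
p_int, _ = integrate.dblquad(
    lambda x2, x1: stats.norm.pdf(x1) * stats.norm.pdf(x2),
    -8.0, 8.0,                            # x1 range (effectively infinite)
    -8.0, lambda x1: z - x1)              # x2 range: the half-plane x1 + x2 <= z

# (b) Monte Carlo frequency of the event
X = rng.standard_normal((10**6, 2))
p_mc = np.mean(X[:, 0] + X[:, 1] <= z)

print(p_int, p_mc, stats.norm.cdf(z / np.sqrt(2)))   # all three agree
```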
PROPOSITION.– Let X and Y be two real independent r.v. of probability densities f_X and f_Y respectively. The r.v. Z = X + Y admits a probability density f_Z defined as:
\[
f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx .
\]

DEMONSTRATION.– Let us start from the probability distribution function of Z:
\[
F_Z(z) = P(Z \le z) = P(X+Y \le z) = P\big((X,Y) \in B\big)
\]
(where B is defined in example 1) above)
\[
= \int_B f(x,y)\, dx\, dy \underset{\text{(independence)}}{=} \int_B f_X(x)\, f_Y(y)\, dx\, dy
= \int_{-\infty}^{+\infty} f_X(x)\, dx \int_{-\infty}^{z-x} f_Y(y)\, dy .
\]
In stating y = u − x:
\[
= \int_{-\infty}^{+\infty} f_X(x)\, dx \int_{-\infty}^{z} f_Y(u-x)\, du
= \int_{-\infty}^{z} du \int_{-\infty}^{+\infty} f_X(x)\, f_Y(u-x)\, dx .
\]
The mapping \(u \to \int_{-\infty}^{+\infty} f_X(x)\, f_Y(u-x)\, dx\) being continuous, F_Z(z) is a primitive of it, and:
\[
F_Z'(z) = f_Z(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx .
\]

NOTE.– If (for example) the support of f_X and f_Y is ℝ₊, i.e. if f_X(x) = f_X(x) 1_{[0,∞[}(x) and f_Y(y) = f_Y(y) 1_{[0,∞[}(y), we easily arrive at:
\[
f_Z(z) = \int_0^z f_X(x)\, f_Y(z-x)\, dx .
\]
EXAMPLE.– X and Y are two exponential r.v. of parameter λ which are independent. Let us take Z = X + Y.

For z ≤ 0: f_Z(z) = 0.

For z ≥ 0:
\[
f_Z(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx
= \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda(z-x)}\, dx
= \lambda^2 z\, e^{-\lambda z},
\]
and thus \(f_Z(z) = \lambda^2 z\, e^{-\lambda z}\, 1_{[0,\infty[}(z)\).
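The convolution formula is easy to check on a grid. The following NumPy sketch (not from the book; λ is chosen arbitrarily) discretizes the convolution integral and compares it with the closed form λ²z e^(−λz).

```python
import numpy as np

lam = 1.5                      # arbitrary rate for the check
dx = 1e-3
x = np.arange(0.0, 10.0, dx)   # grid covering the effective support
f = lam * np.exp(-lam * x)     # Exp(lambda) density sampled on the grid

# Discrete approximation of the convolution integral (f_X * f_Y)(z)
conv = np.convolve(f, f)[:x.size] * dx

closed_form = lam**2 * x * np.exp(-lam * x)
print(np.max(np.abs(conv - closed_form)))   # small, of order lam**2 * dx
```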
1.2. Spaces L1(dP) and L2(dP)

1.2.1. Definitions

The family of r.v. X : ω → X(ω), (Ω, a, P) → (ℝ, B(ℝ)), forms a vector space on ℝ, denoted ε. Two vector subspaces of ε play a particularly important role, and these are what will be defined.
The definitions would be in effect the final element in the construction of the Lebesgue integral of measurable mappings, but this construction will not be given here and we will be able to progress without it.

DEFINITION.– We say that two random variables X and X′ defined on (Ω, a) are almost surely equal, and we write X = X′ a.s., if X = X′ except possibly on an event N of zero probability (that is to say N ∈ a and P(N) = 0). We note:
– X = {class (of equivalence) of the r.v. X′ almost surely equal to X};
– O = {class (of equivalence) of the r.v. almost surely equal to O}.

We can now give the definition of L1(dP) as the vector space of first order random variables, and the definition of L2(dP) as the vector space of second order random variables:
\[
L^1(dP) = \Big\{ \text{r.v. } X \ \Big|\ \int_\Omega |X(\omega)|\, dP(\omega) < \infty \Big\},
\qquad
L^2(dP) = \Big\{ \text{r.v. } X \ \Big|\ \int_\Omega X^2(\omega)\, dP(\omega) < \infty \Big\}.
\]
In these expressions, the r.v. are clearly defined except on a zero probability event; otherwise said, the r.v. X are any representatives of their classes, because by construction the integrals of the r.v. are not modified if we modify the latter on zero probability events.

Note on the inequality \(\int_\Omega |X(\omega)|\, dP(\omega) < \infty\). Introducing the two positive random variables
\[
X^+ = \sup(X, 0) \quad\text{and}\quad X^- = \sup(-X, 0),
\]
we can write X = X⁺ − X⁻ and |X| = X⁺ + X⁻. Let X ∈ L1(dP); we thus have:
\[
\int_\Omega |X(\omega)|\, dP(\omega) < \infty
\;\Longleftrightarrow\;
\int_\Omega X^+(\omega)\, dP(\omega) < \infty \ \text{ and }\ \int_\Omega X^-(\omega)\, dP(\omega) < \infty .
\]
So, if X ∈ L1(dP), the integral
\[
\int_\Omega X(\omega)\, dP(\omega) = \int_\Omega X^+(\omega)\, dP(\omega) - \int_\Omega X^-(\omega)\, dP(\omega)
\]
is defined without ambiguity.

NOTE.– L2(dP) ⊂ L1(dP). In effect, given X ∈ L2(dP), following Schwarz's inequality:
\[
\Big(\int_\Omega |X(\omega)|\, dP(\omega)\Big)^2 \le \int_\Omega X^2(\omega)\, dP(\omega)\, \int_\Omega 1\, dP(\omega) < \infty .
\]
EXAMPLE.– Let X be a Gaussian r.v., with density
\[
\frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{1}{2}\Big(\frac{x-m}{\sigma}\Big)^2\Big).
\]
This belongs to L1(dP) and to L2(dP).

Let Y be a Cauchy r.v., with density
\[
\frac{1}{\pi (1+x^2)} .
\]
This does not belong to L1(dP) and thus does not belong to L2(dP) either.
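The practical consequence of non-integrability can be seen by simulation. This NumPy sketch (not from the book; the Gaussian parameters are arbitrary) compares running sample means: they stabilize for the Gaussian r.v. but keep jumping for the Cauchy r.v., whose expectation does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6

gauss = rng.normal(loc=1.0, scale=2.0, size=n)   # arbitrary m and sigma
cauchy = rng.standard_cauchy(size=n)

# Running means settle down only when E|X| < infinity
steps = [10**k for k in range(2, 7)]
print("Gaussian:", [np.mean(gauss[:k]) for k in steps])   # approaches m = 1.0
print("Cauchy:  ", [np.mean(cauchy[:k]) for k in steps])  # keeps fluctuating
```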
1.2.2. Properties
– L1(dP) is a Banach space; we will not use this property in what follows;
– L2(dP) is a Hilbert space. We give here the properties without any demonstration.

* We can equip L2(dP) with the scalar product defined by:
\[
\forall X, Y \in L^2(dP) \qquad \langle X, Y\rangle = \int_\Omega X(\omega)\, Y(\omega)\, dP(\omega).
\]
This expression is well defined because, following Schwarz's inequality:
\[
\Big(\int_\Omega |X(\omega)\, Y(\omega)|\, dP(\omega)\Big)^2 \le \int_\Omega X^2(\omega)\, dP(\omega)\, \int_\Omega Y^2(\omega)\, dP(\omega) < \infty,
\]
and the axioms of the scalar product are immediately verifiable.

* L2(dP) is a vector space normed by:
\[
\|X\| = \sqrt{\langle X, X\rangle} = \sqrt{\int_\Omega X^2(\omega)\, dP(\omega)} .
\]
It is easy to verify that:
\[
\forall X, Y \in L^2(dP) \quad \|X+Y\| \le \|X\| + \|Y\|
\qquad\text{and}\qquad
\forall X \in L^2(dP),\ \forall \lambda \in \mathbb{R} \quad \|\lambda X\| = |\lambda|\, \|X\| .
\]
As far as the second axiom is concerned:
– if X = 0 a.s., then ‖X‖ = 0;
– if ‖X‖ = \(\big(\int_\Omega X^2(\omega)\, dP(\omega)\big)^{1/2}\) = 0, then X = 0 a.s.

* L2(dP) is a complete space for the norm ‖ ‖ defined above (every Cauchy sequence X_n converges to some X ∈ L2(dP)).
We are studying a general random vector (not necessarily with a density function):
(
X = X1,..., X n
)
:
( Ω, a , P ) → (
n
,B
( )) . n
24
Discrete Stochastic Processes and Optimal Filtering
Furthermore, we give ourselves a measurable mapping:
Ψ:
(
n
,B
( n )) → (
))
,B (
Ψ X (also denoted Ψ ( X ) or Ψ ( X 1 ,..., X n ) ) is a measurable mapping (thus
an r.v.) defined on ( Ω, a ) .
X
(
( Ω, a, P )
n
,B
n
X
Ψ
Ψ X
(
( ), P )
,B (
))
DEFINITION.– Under the hypothesis Ψ X ∈ L ( dP ) , we call mathematical 1
expectation of the random value Ψ X the expression Ε ( Ψ X ) defined as:
Ε(Ψ X ) = ∫
Ω
(Ψ
or, to remind ourselves that
X )(ω ) dP (ω )
X is a vector:
Ε ( Ψ ( X 1 ,..., X 2 ) ) = ∫ Ψ ( X 1 (ω ) ,..., X n (ω ) ) dP (ω ) Ω
NOTE.– This definition of the mathematical expectation of Ψ X is well adapted to general problems or to those of a more theoretical orientation; in particular, it is 2
by using the latter that we construct L
( dP ) the Hilbert space of the second order
r.v. In practice, however, it is the PX law (similar to the measure P by the mapping
X ) and not P that we do not know. We thus want to use the law PX to
Random Vectors
25
express Ε ( Ψ X ) , and it is said that the calculation of Ε ( Ψ X ) from the space ( Ω, a,P ) to the space
(
n
,B
( ), P ) . n
X
In order to simplify the writing in the theorem that follows (and as will often
occur in the remainder of this work) ( X 1 ,..., X n ) , ( x1 ,..., xn ) and dx1...dxn will often be denoted as
X , x and dx respectively.
Transfer theorem
Let us assume Ψ X ∈ L ( dP ) ; we thus have: 1
Ε(Ψ X ) = ∫
Ω
(Ψ
X )(ω ) dP (ω ) = ∫
n
Ψ ( x ) dPX ( x )
In particular, if PX admits a density f X :
E (Ψ X ) = ∫
n
Ψ ( x ) f X ( x ) dx and Ε X = ∫ x f X ( x ) dx .
Ψ ∈ L1 ( dPX ) DEMONSTRATION.– – The equality of 2) is true if Ψ = 1B with B ∈ B
Ε ( Ψ X ) = Ε (1B X ) = PX ( B ) =∫
n 1B
( x ) dPX ( x ) = ∫
n
– The equality is still true if m
j =1
because
Ψ ( x ) dPX ( x ).
Ψ is a simple measurable mapping, that is to say if
Ψ = ∑ λ j 1B or B j ∈ B j
( n)
( ) and are pairwise disjoint. n
26
Discrete Stochastic Processes and Optimal Filtering
We have in effect:
(
m
)
m
( )
Ε ( Ψ X ) = ∑ λ j Ε 1B j X = ∑ λ j PX B j j =1
m
= ∑λj ∫
n 1B
j =1
=∫
n
( x ) dPX ( x ) = ∫ j
j =1
⎛ m ⎞ λ j 1B ( x ) ⎟ dPX ( x ) n ⎜∑ ⎜ j =1 ⎟ j ⎝ ⎠
Ψ ( x ) dPX ( x )
If we now assume that Ψ is a positive measurable mapping, we know that it is the limit of an increasing sequence of positive simple measurable mappings Ψ P .
⎛
We thus have ⎜
∫ Ω ( Ψ p X ) (ω ) = ∫
⎜ with Ψp ⎝
n
Ψ p ( x ) dPX ( x )
Ψ
Ψ p X is also a positive increasing sequence which converges to Ψ X and by taking the limits of the two members when p ↑ ∞ , we obtain, according to the monotone convergence theorem:
∫Ω (Ψ
X )(ω ) dP (ω ) = ∫
n
Ψ ( x ) dPX ( x ) .
If Ψ is a measurable mapping of any sort we still use the decomposition
Ψ = Ψ + − Ψ − and
Ψ = Ψ+ + Ψ− . +
Furthermore, it is clear that ( Ψ X ) = Ψ
+
−
X and ( Ψ X ) = Ψ − X .
It emerges that: +
−
(
) (
Ε Ψ X = Ε (Ψ X ) + Ε (Ψ X ) = Ε Ψ+ X + Ε Ψ− X
)
Random Vectors
27
i.e. according to what we have already seen:
=∫
n
Ψ + ( x ) dPX ( x ) + ∫
n
Ψ − ( x ) dPX ( x ) = ∫
n
Ψ ( x ) dPX ( x )
As Ψ X ∈ L ( dP ) , we can deduce from this that Ψ ∈ L ( dPX 1
1
(reciprocally if Ψ ∈ L ( dPX 1
In particular Ε ( Ψ X )
) then Ψ
+
)
X ∈ L1 ( dP ) ).
and Ε ( Ψ X ) are finite, and −
(
) (
Ε (Ψ X ) = Ε Ψ+ X − Ε Ψ− X =∫
n
Ψ + ( x ) dPX ( x ) − ∫
=∫
n
Ψ ( x ) dPX ( x )
n
)
Ψ − ( x ) dPX ( x )
NOTE.– (which is an extension of the preceding note) In certain works the notion of “a random vector as a measurable mapping” is not developed, as it is judged as being too abstract. In this case the integral
∫
nΨ
( x ) dPX ( x ) = ∫
n
Ψ ( x ) f X ( x ) dx
PX admits the density f X ) is given as a definition of Ε ( Ψ X ) . EXAMPLES.– 1) Let the “random Gaussian vector” be X
f X ( x1 , x2 ) =
where
ρ ∈ ]−1,1[
1 2π 1 − ρ 2
T
= ( X1 , X 2 ) of density:
⎛ 1 1 ⎞ exp ⎜ − x12 − 2 ρ x1 x2 + x22 ⎟ 2 ⎝ 2 1-ρ ⎠
(
)
and let the mapping Ψ be ( x1 , x2 ) → x1 x2
3
(if
28
Discrete Stochastic Processes and Optimal Filtering
The condition:
∫
2
x1 x23
⎛ 1 exp ⎜ − 2 ⎜ 2 1− ρ 2 2π 1 − ρ ⎝ 1
(
)
(
⎞ x12 − 2 ρ x1 x2 + x22 ⎟ dx1 dx2 < ∞ ⎟ ⎠
)
is easily verifiable and:
EX1 X 23 = ∫
x x3 2 1 2
⎛ 1 exp ⎜ − ⎜ 2 2 1− ρ2 2n 1 − ρ ⎝ 1
(
)
(
⎞ x12 − 2 ρ x1 x2 + x22 ⎟ dx1dx2 ⎟ ⎠
)
2) Given a random Cauchy variable of density
1
π∫
x
fX ( x) =
1
1 π 1 + x2
1 dx = +∞ thus X ∉ L1 ( dP ) and EX is not defined. 1 + x2
Let us consider next the transformation Ψ which consists of “rectifying and clipping” the r.v. X .
Ψ
K
−K
0
K
x
Figure 1.4. Rectifying and clipping operation
Random Vectors
Ψ ( x ) dPX ( x ) =
∫
1
K
K
−K
∞
29
K
∫ − K x 1 + x 2 dx + ∫ −∞ 1 + x 2 dx + ∫ K 1 + x2 dx
⎛π ⎞ = ln 1 + K 2 + 2 K ⎜ − K ⎟ < ∞. ⎝2 ⎠
(
)
Thus, Ψ X ∈ L ( dP ) and: 1
Ε(Ψ X ) = ∫
+∞ −∞
⎛π ⎞ Ψ ( x ) dPX ( x ) = ln 1 + K 2 + 2 K ⎜ − K ⎟ . ⎝2 ⎠
DEFINITION.– Given np r.v. X jk
(
( j = 1 at
)
p, k = 1 at n ) ∈ L1 ( dP ) , we
⎛ X 11 … X 1n ⎞ ⎜ ⎟ define the mean of the matrix ⎡⎣ X jk ⎤⎦ = ⎜ ⎟ by: ⎜ X p1 X pn ⎟⎠ ⎝ ⎛ ΕX 11 … ΕX 1n ⎞ ⎜ ⎟ Ε ⎡⎣ X jk ⎤⎦ = ⎜ ⎟. ⎜ ΕX p1 ΕX pn ⎟⎠ ⎝ In particular, given a random vector:
⎛ X1 ⎞ ⎜ ⎟ X = ⎜ ⎟ or X T = ( X 1 ,..., X n ) verifying X j ∈ L1 ( dP ) ∀j = 1 at n , ⎜X ⎟ ⎝ n⎠
(
)
30
Discrete Stochastic Processes and Optimal Filtering
⎛ EX 1 ⎞ ⎜ ⎟ ⎡ T⎤ We state Ε [ X ] = ⎜ ⎟ or Ε ⎣ X ⎦ = ( EX1 ,..., ΕX n ) . ⎜ EX ⎟ ⎝ n⎠
(
)
Mathematical expectation of a complex r.v.
DEFINITION.– Given a complex r.v. X = X 1 +i X 2 , we say that:
X ∈ L1 ( dP ) if X 1 and X 2 ∈ L1 ( dP ) . If X ∈ L ( dP ) we define its mathematical expectation as: 1
Ε ( X ) = ΕX 1 + i Ε X 2 . Transformation of random vectors
We are studying a real random vector X = ( X 1 ,..., X n ) with a probability
density of f X ( x )1D ( x ) = f X ( x1 ,..., xn ) 1D ( x1 ,..., xn ) where D is an open set of
n
.
Furthermore, we give ourselves the mapping:
α : x = ( x1 ,..., xn ) → y = α ( x ) = (α1 ( x1 ,..., xn ) ,...,α n ( x1 ,..., xn ) ) ∆
D We assume that that
α
α
1
is a C -diffeomorphism of D on an open ∆ of
is bijective and that
α
and
β =α
−1
1
are of class C .
n
, i.e.
Random Vectors
X
α
31
Y =α (X )
∆
D Figure 1.5. Transformation of a random vector
The random vector Y = (Y1 ,..., Yn ) =
X
by a
C1 -diffeomorphism
(α1 ( X1,..., X n ) ,...,α n ( X1,..., X n ) )
takes its values on ∆ and we wish to determine fY ( y )1∆ ( y ) , its probability density. PROPOSITION.–
fY ( y )1∆ ( y ) = f X ( β ( y ) ) Det J β ( y ) 1∆ ( y ) DEMONSTRATION.– Given:
Ψ ∈ L1 ( dy )
Ε ( Ψ (Y ) ) = ∫
n
Ψ ( y ) fY ( y )1∆ ( y ) dy .
Furthermore:
Ε ( Ψ (Y ) ) = ΕΨ (α ( X ) ) = ∫
n
Ψ (α ( x ) ) f X ( x )1D ( x ) dx .
By applying the change of variables theorem in multiple integrals and by denoting the Jacobian matrix of the mapping
=∫
n
β
as J β ( y ) , we arrive at:
Ψ ( y ) f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) dy .
32
Discrete Stochastic Processes and Optimal Filtering
Finally, the equality:
∫ n Ψ ( y ) fY ( y )1∆ ( y ) dy = ∫ n Ψ ( y ) f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) dy has validity for all Ψ ∈ L ( dy ) ; we deduce from it, using Haar’s lemma, the 1
formula we are looking for:
fY ( y )1∆ ( y ) = f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) IN PARTICULAR.– If X is an r.v. and the mapping:
α : x → y = α ( x) D⊂
Α⊂
the equality of the proposition becomes:
fY ( y )1∆ ( y ) = f X ( β ( y ) ) β ′ ( y ) 1∆ ( y ) EXAMPLE.– Let the random ordered pair be Z = ( X , Y ) of probability density:
f Z ( x, y ) =
1 x y
1 2 2 D
( x, y )
where
D = ]1, ∞[ × ]1, ∞[ ⊂
2
Random Vectors 1
Furthermore, we allow the C -diffeomorphism
33
α:
α
β
D 1
∆ 1
0
x
1
0
u
1
defined by:
⎛ ⎜ ⎜ ⎜ ⎜ ⎜⎜ ⎝
α : ( x, y ) → ( u = α1 ( x, y ) = xy , v = α 2 ( x, y ) = x y ) ∈D
∈∆
(
β : ( u, v ) → x = β1 ( u, v ) = uv , y = β 2 ( u, v ) = u v ∈∆
)
∈D
⎛ v u 1⎜ J β ( u, v ) = ⎜ 2⎜ 1 ⎜ uv ⎝
⎞ v ⎟ 1 ⎟ and Det J β ( u , v ) = . u⎟ 2 v − 3 ⎟ v 2⎠ u
(
The vector W = U = X Y , V = X
Y
) thus admits the probability density:
fW ( u , v )1∆ ( u , v ) = f Z ( β1 ( u , v ) , β 2 ( u , v ) ) Det J β ( u, v ) 1∆ ( u, v ) =
(
1 uv
)
1
2
( uv )
2
1 1∆ ( u, v ) = 12 1∆ ( u, v ) 2v 2u v
34
Discrete Stochastic Processes and Optimal Filtering
NOTE.–
Reciprocally
W = (U , V )
vector
of
probability
density
fW ( u , v ) 1∆ ( u , v ) and whose components are dependent is transformed by β
into vector Z = ( X , Y ) of probability density f Z ( x, y ) 1D ( x, y ) and whose components are independent.
1.3.2. Characteristic functions of a random vector
DEFINITION.– We call the characteristic function of a random vector:
X T = ( X1 ... X n ) the mapping ϕ X : ( u1 ,..., u2 ) → ϕ X ( u1 ,..., u2 ) defined by: n
⎛ ⎜ ⎝
n
⎞ ⎟ ⎠
ϕ X ( u1 ,..., un ) = Ε exp ⎜ i ∑ u j X j ⎟ =∫
j =1
⎛ n ⎞ exp ⎜⎜ i ∑ u j x j ⎟⎟ f X ( x1 ,...xn ) dx1... dxn n ⎝ j =1 ⎠
(The definition of ΕΨ ( X 1 ,..., X n ) is written with:
⎛ n ⎞ Ψ ( X 1 ,..., X n ) = exp ⎜ i ∑ u j X j ⎟ ⎜ j =1 ⎟ ⎝ ⎠ and the integration theorem is applied with respect to the image measure.)
ϕX
is thus the Fourier transform of
ϕX = F ( fX )
.
fX
which can be denoted
Random Vectors
35
(In analysis, it is preferable to write:
F ( f X ) ( u1 ,..., un ) = ∫
n ⎛ ⎞ exp − i u x f u ,..., un ) dx1... dxn . ) ⎜ j j⎟ n ⎜ ∑ ⎟ X( 1 ⎝ j =1 ⎠
Some general properties of the Fourier transform: –
ϕ X ( u1 ,...u2 ) ≤ ∫
n
f X ( x1 ,..., xn ) dx1... dxn = ϕ X ( 0,..., 0 ) = 1 ;
– the mapping ( u1 ,..., u2 ) → ϕ X ( u1 ,..., u2 ) is continuous; n
– the mapping F : f X → ϕ X is injective. Very simple example
[
]n
The random vector X takes its values from within the hypercube ∆ = −1,1 and it admits a probability density:
f X ( x1 ,..., xn ) =
1 1 ∆( x1 ,..., xn ) 2n
(note that components X j are independent).
1 exp i ( u1 x1 + ... + un xn ) dx1...dxn 2n ∫ ∆ n sin u 1 n +1 j = n ∏ ∫ exp iu j x j dx j = ∏ uj 2 j =1 −1 j =1
ϕ ( u1 ,..., un ) =
(
)
where, in this last expression and thanks to the extension by continuity, we replace:
sin u1 sin u2 by 1 if u1 = 0 , by 1 if u2 = 0 ,... u1 u2
36
Discrete Stochastic Processes and Optimal Filtering
Fourier transform inversion
F
fX
ϕX
F −1 As shall be seen later in the work, there are excellent reasons (simplified calculations) for studying certain questions using characteristic functions rather than probability densities, but we often need to revert back to densities. The problem which arises is that of the invertibility of the Fourier transform F , which is studied in specialized courses. It will be enough here to remember one condition. PROPOSITION.– If (i.e.
∫
n
ϕ X ( u1 ,..., un ) du1...dun < ∞
ϕ X ∈ L1 ( du1...dun ) ), f X ( x1 ,..., xn ) =
1
( 2π )n
then F
∫
−1
exists and:
⎛ n ⎞ exp − i u x ϕ ⎜ j j⎟ n ⎜ ∑ ⎟ X = 1 j ⎝ ⎠
( u1 ,..., un ) du1...dun .
In addition, the mapping ( x1 ,..., xn ) → f X ( x1 ,..., xn ) is continuous. EXAMPLE.–
Given
a
Gaussian
r.v.
(
)
X ∼ Ν m, σ 2 ,
i.e.
that
⎛ 1 ⎛ x − m ⎞2 ⎞ 1 exp ⎜ − ⎜ ⎟ and assuming that σ ≠ 0 we obtain ⎜ 2 ⎝ σ ⎟⎠ ⎟ 2πσ ⎝ ⎠ 2 2 ⎛ uσ ⎞ ϕ X ( u ) = exp ⎜ ium − ⎟. 2 ⎝ ⎠ fX ( x) =
It is clear that ϕ X ∈ L1 ( du ) and
fX ( x) =
1 2π
+∞
∫ −∞ exp ( −iux ) ϕ X ( u ) du .
Random Vectors
37
Properties and mappings of characteristic functions 1) Independence
PROPOSITION.– In order for the components X j of the random vector
X T = ( X 1 ,..., X n ) to be independent, it is necessary and sufficient that: n
ϕ X ( u1 ,..., un ) = ∏ ϕ X ( u j ) . j
j =1
DEMONSTRATION.– Necessary condition:
⎛ ⎜ ⎝
⎞ ⎟ ⎠
n
ϕ X ( u1 ,..., un ) = ∫ exp ⎜ i ∑ u j x j ⎟ f X ( x1 ,..., xn ) dx1...dxn n
j =1
Thanks to the independence:
=∫
n ⎛ n ⎞ n exp i u x f x dx ... dx = ⎜ ⎟ ( ) ∏ϕ X j (u j ) . 1 j j ⎟∏ X j n n ⎜ ∑ j j =1 ⎝ j =1 ⎠ j =1
Sufficient condition: we start from the hypothesis:
∫ =∫
⎛
i n exp ⎜ ⎜
n
⎞
∑ u j x j ⎟⎟ f x ( x1,..., xn ) dx1... dxn
⎝ j =1 ⎠ ⎛ n ⎞ n exp i u x x j dx1... dxn ⎜ ⎟ j j ⎟∏ f X n ⎜ ∑ j = 1 j = 1 j ⎝ ⎠
( )
38
Discrete Stochastic Processes and Optimal Filtering
from which we deduce: f X ( x1 ,..., xn ) =
n
∏ f X j ( x j ) , i.e. the independence, j =1
since the Fourier transform f X ⎯⎯ → ϕ X is injective. NOTE.– We must not confuse this result with that which concerns the sum of independent r.v. and which is stated in the following manner. n
If X 1 ,..., X n are independent r.v., then
ϕ∑ X ( u ) = ∏ ϕ X j
j
j =1
j
(u ) .
If there are for example n independent random variables:
(
)
(
X 1 ∼ Ν m1 , σ 2 ,..., X n ∼ Ν mn , σ 2 and n real constants
)
λ1 ,..., λn , the note above enables us to determine the law of
n
∑λj X j .
the random value
j =1
λj X j
In effect the r.v.
ϕ∑ j
λ X
=e
and thus
j
j
are independent and:
n
n
j =1
j =1
( )
n
( u ) = ∏ ϕλ j X j ( u ) = ∏ ϕ X j λ j u = ∏ e
1 iu ∑ λ j m j − u 2 ∑ λ 2j σ 2j 2 j j
n
⎛
j =1
⎝
⎞
∑ λ j X j ∼ Ν ⎜⎜ ∑ λ j m j , ∑ λ 2j σ 2j ⎟⎟ . j
j
⎠
j =1
1 iuλ j m j − u 2 λ 2j σ 2j 2
Random Vectors
39
2) Calculation of the moment functions of the components X j (up to the 2nd order, for example)
Let us assume
ϕX ∈C2
( ). n
In applying Lebesgue’s theorem (whose hypotheses are immediately verifiable) once we obtain:
∀K = 1 to n
∂ϕ X ( 0,..., 0 ) ∂uK
⎛ ⎞ ⎛ ⎞ = ⎜ ∫ n ixK exp ⎜ i ∑ u j x j ⎟ f X ( x1 ,..., xn ) dx1...dxn ⎟ ⎜ j ⎟ ⎜ ⎟ ⎝ ⎠ ⎝ ⎠( u1 = 0,...,un = 0 ) = i∫
n
xK f X ( x1 ,..., xn ) dx1...dxn = i Ε X K
i.e. Ε X K = −i
∂ϕ X ( 0,..., 0 ) . ∂u K
By applying this theorem a second time, we have:
∀k
and
∈ (1,2,..., n )
EX K X =
∂ 2ϕ X ( 0,..., 0 ) ∂u ∂uK
1.4. Second order random variables and vectors
Let us begin by recalling the definitions and usual properties relative to 2nd order random variables. DEFINITIONS.–
Given
X ∈ L2 ( dP )
of
probability
density
E X 2 and E X have a value. We call variance of X the expression: Var X = Ε X 2 − ( Ε X ) = E ( X − Ε X ) 2
2
fX ,
40
Discrete Stochastic Processes and Optimal Filtering
We call standard deviation of
X the expression σ ( X ) = Var X . 2
Now let two r.v. be X and Y ∈ L
( dP ) . By using the scalar product on
L2 ( dP ) defined in 1.2 we have: ΕXY = < X , Y > = ∫ X (ω ) Y (ω ) dP (ω ) Ω
and, if the vector Z = ( X , Y ) admits the density f Ζ , then:
EXY = ∫
2
xy f Z ( x, y ) dx dy .
We have already established, by applying Schwarz’s inequality, that ΕXY actually has a value. 2
DEFINITION.– Given that two r.v. are X , Y ∈ L of
( dP ) , we call the covariance
X and Y : The expression Cov ( X , Y ) = ΕXY − ΕX ΕY . Some observations or easily verifiable properties:
Cov ( X , X ) = V ar X Cov ( X , Y ) = Cov (Y , X ) – if
λ
is a real constant
Var ( λ X ) = λ 2 Var X ;
– if X and Y are two independent r.v., then Cov ( X , Y ) = 0 but the reciprocal is not true;
Random Vectors
41
– if X 1 ,..., X n are pairwise independent r.v.
Var ( X1 + ... + X n ) = Var X1 + ... + Var X n Correlation coefficients
The
Var X j (always positive) and the Cov ( X j , X K ) (positive or negative)
can take extremely high algebraic values. Sometimes it is preferable to use the (normalized) “correlation coefficients”:
ρ ( j, k ) =
Cov ( X j , X K ) Var X j
Var X K
whose properties are as follows:
ρ ( j , k ) ∈ [ −1,1] In effect, let us assume (solely to simplify its expression) that X j and X K are centered and let us study the 2nd degree trinomial in
λ.
Τ ( λ ) = Ε ( λ X j − X K ) = λ 2ΕX 2j − 2λΕ ( X j X K ) + Ε X K2 ≥ 0 2
Τ ( λ ) ≥ 0 ∀λ ∈
(
∆ = E X jXK is
negative
or
ρ ( j , k ) ∈ [ −1,1] ).
)
2
zero,
if and only if the discriminant
− Ε X 2j Ε X K2 i.e.
Cov ( X j , X K )
This is also Schwarz’s inequality.
2
≤ Var X j Var X K
(i.e.
42
Discrete Stochastic Processes and Optimal Filtering
Furthermore, we can make clear that
ρ ( j , k ) = ±1
if and only if ∃ λ 0 ∈
such that X K = λ 0 X j p.s. In effect by replacing X K with definition of
λ0 X j
in the
ρ ( j , k ) , we obtain ρ ( j , K ) = ±1 .
Reciprocally, if
ρ ( j , K ) = 1 (for example), that is to say if:
∆ = 0 , ∃ λ0 ∈
such that X K = λ 0 X j a.s.
If X j and X K are not centered, we replace in what has gone before X j by
X j − Ε X j and X K by X K − Ε X K ). 2)
If
(
Xj
and
)
Xk
are
independent,
Ε X j Xk = Ε X j Ε Xk
so
Cov X j , X k = 0 , ρ ( j , k ) = 0 . However, the reciprocity is in general false, as is proven in the following example. Let Θ be a uniform random variable on
f Θ (θ ) =
[0 , 2 π [
that is to say
1 1 (θ ) . 2π [ 0 , 2 π [
In addition let two r.v. be X j = sin Θ and X K = c os Θ . We can easily verify that Ε X j
(
Cov X j , X k
)
and
ρ ( j , k ) are
X j and X k are dependent.
, Ε X k , Ε X j X k are zero; thus 2
2
zero. However, X j + X k = 1 and the r.v.
Random Vectors
43
Second order random vectors
DEFINITION.– We say that a random vector X 2
if X j ∈ L
( dP )
T
= ( X1 ,..., X n ) is second order
∀ j = 1 at n .
DEFINITION.– Given a second order random vector X
T
= ( X1 ,..., X n ) , we call
the covariance matrix of this vector the symmetric matrix:
… Cov ( X 1 , X n ) ⎞ ⎛ Var X 1 ⎜ ⎟ ΓX = ⎜ ⎟ ⎜ Cov ( X , X ) ⎟ X Var 1 n n ⎝ ⎠ If we return to the definition of the expectation value of a matrix of r.v., we see T that we can express it as Γ X = Ε ⎡( X − Ε X )( X − Ε X ) ⎤ .
⎣
⎦
We also can observe that Γ X −ΕX = Γ X . NOTE.– Second order complex random variables and vectors: we say that a complex random variable X = X 1 + i X 2 is second order if X 1 and
X 2 ∈ L2 ( dP ) . The covariance of two centered second order random variables, X = X 1 + i X 2 and Y = Y1 + iY2 has a natural definition:
Cov ( X , Y ) = EXY = E ( X1 + iX 2 )(Y1 − iY2 )
= E ( X 1Y1 + X 2Y2 ) + iE ( X 2Y1 − X 1Y2 )
and the decorrelation condition is thus:
E ( X 1Y1 + X 2Y2 ) = E ( X 2Y1 − X 1Y2 ) = 0 .
44
Discrete Stochastic Processes and Optimal Filtering
We say that a complex random vector X order if
T
(
)
= X 1 ,..., X j ,..., X n is second
j ∈ (1,..., n ) X j = X1 j + iX 2 j is a second order complex random
variable for the entirety. The covariance matrix of a second order complex centered random vector is defined by:
⎛ E X 2 … EX X ⎞ 1 1 n⎟ ⎜ ΓX = ⎜ ⎟ ⎜ ⎟ 2 ⎜ EX X ⎟ E X n ⎠ ⎝ n 1 If we are not intimidated by its dense expression, we can express these definitions for non-centered complex random variables and vectors without any difficulty. Let us return to real random vectors. T DEFINITION.– We call the symmetric matrix Ε ⎡ X X ⎤ the second order matrix
⎣
moment. If
⎦
X is centered Γ X = ⎡⎣ X X ⎤⎦ . T
Affine transformation of a second order vector
Let us denote the space of the matrices at p rows and at n columns as M ( p, n ) .
PROPOSITION.– Let X
T
= ( X1 ,..., X n ) be a random vector of expectation
value vector m = ( m1 ,..., mn ) and of covariance matrix Γ X . T
Furthermore
(
)
let
BT = b1 ,..., bp .
a
matrix
be
A ∈ M ( p, n )
and
a
certain
vector
Random Vectors
45
The random vector Y = A X + B possesses Αm + B as a mean value vector Τ
and Γ y = ΑΓ X Α as a covariance matrix. DEMONSTRATION.–
Ε [Y ] = Ε [ ΑX + B ] = Ε [ ΑX ] + Β = Αm + Β . In addition for example: Τ Ε ⎡( ΑX ) ⎤ = Ε ⎣⎡ X Τ ΑΤ ⎦⎤ = mΤ ΑΤ ⎣ ⎦
Τ ΓY = Γ ΑX +Β = Γ ΑX = Ε ⎡⎢ Α ( X − m ) ( Α ( X − m ) ) ⎤⎥ = ⎣ ⎦ Τ Τ Ε ⎡ Α ( X − m )( X − m ) ΑΤ ⎤ = Α Ε ⎡( X − m )( X − m ) ⎤ ΑΤ = ΑΓ X Α Τ ⎣ ⎦ ⎣ ⎦
for what follows, we will also need the easy result that follows. PROPOSITION.– Let X
T
= ( X 1 ,..., X n ) be a second order random vector, of
covariance matrix Γ Χ . Thus: ∀ Λ = ( λ1 ,..., λn ) ∈ T
n
⎛ n ⎞ Λ Τ Γ X Λ = var ⎜ ∑ λ j X j ⎟ ⎜ j =1 ⎟ ⎝ ⎠
DEMONSTRATION.–
(
)
(
Λ ΤΓ X Λ = ∑ Cov X j , X K λ j λK = ∑ Ε X j − ΕX j j,K
j,K
) ( X K − Ε X K ) λ j λK 2
2 ⎛ ⎛ ⎞⎞ ⎛ ⎞ ⎛ ⎞ = Ε ⎜ ∑ λ j X j − ΕX j ⎟ = Ε ⎜ ∑ λ j X j − Ε ⎜ ∑ λ j X j ⎟ ⎟ = Var ⎜ ∑ λ j X j ⎟ ⎜ j ⎟⎟ ⎜ j ⎟ ⎜ j ⎝ K ⎠ ⎝ ⎠⎠ ⎝ ⎠ ⎝
(
)
46
Discrete Stochastic Processes and Optimal Filtering
CONSEQUENCE.– ∀Λ ∈
n
Τ
we still have Λ Γ Χ Λ ≥ 0 .
Let us recall in this context the following algebraic definitions: – if Λ Γ X Λ > 0 ∀Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) , we say that Γ X is positive T
definite; – if ∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) such that Λ Γ X Λ = 0 , we say that Λ X is Τ
positive semi-definite. NOTE.– In this work the notion of vector appears in two different contexts and in order to avoid confusion, let us return for a moment to some vocabulary definitions. n
1) We call random vector of
(or random vector with values in
n
), every
⎛ X1 ⎞ ⎜ ⎟ n-tuple of random variables X = ⎜ ⎟ ⎜X ⎟ ⎝ n⎠ (or X
T
= ( X 1 ,..., X n ) or even X = ( X 1 ,..., X n ) ).
X is a vector in this sense that for each ω ∈ Ω , we obtain an n-tuple X (ω ) = ( X 1 (ω ) ,..., X n (ω ) ) which belongs to the vector space n . 2) Every random vector of
n
. X = ( X 1 ,..., X n ) of which all the components
X j belong to L2 ( dP ) we call a second order random vector.
In this context, the components X j themselves are vectors since they belong to the vector space L ( dP ) . 2
Thus, in what follows, when we speak of linear independence or of scalar product or of orthogonality, it is necessary to point out clearly to which vector space,
n
or L ( dP ) , we are referring. 2
Random Vectors 2
1.5. Linear independence of vectors of L
( dP ) 2
DEFINITION.– We say that n vectors X 1 ,..., X n of L
λ1 X 1 + ... + λn X n = 0
independent if
2
zero vector of L
a.s.
( dP )
( dP ) ). 2
λ1 ,..., λn are not all zero and ∃ an event A λ1 X 1 (ω ) + ... + λn X n (ω ) = 0 ∀ω ∈ A .
dependent if ∃
In particular: X 1 ,..., X n will be linearly dependent if ∃ zero such that
are linearly
⇒ λ1 = ... = λn = 0 (here 0 is the
DEFINITION.– We say that the n vectors X 1 ,..., X 2 of L such that
λ1 X 1 + ... + λn X n = 0
( dP )
are linearly
of positive probability
λ1 ,..., λn
are not all
a.s.
Examples: given the three measurable mappings:
X1, X 2 , X 3 :
([0, 2] ,B [0, 2] , dω ) → (
,B (
))
defined by:
X 1 (ω ) = ω
X 2 (ω ) = 2ω X 3 (ω ) = 3ω
47
⎫ ⎪ ⎬ on [ 0,1[ and ⎪ ⎭
X 1 (ω ) = e −(ω −1)
⎫ ⎪⎪ X 2 (ω ) = 2 ⎬ on [1, 2[ ⎪ X 3 (ω ) = −2ω + 5⎭⎪
48
Discrete Stochastic Processes and Optimal Filtering
X1 ; X2 ; X3
3
2
1
0
1
ω
2
Figure 1.6. Three random variables
The three mappings are evidently measurable and belong to L ( dω ) , so there 2
are 3 vectors of L ( dω ) . 2
There 3 vectors are linearly dependent on measurement
A = [ 0,1[ of probability
1 : 2
−5 X 1 ( ω ) + 1 X 2 ( ω ) + 1 X 3 ( ω ) = 0
∀ω ∈ A
Covariance matrix and linear independence
Let Γ X be the covariance matrix of X = ( X 1 ,..., X n ) a second order vector.
Random Vectors
49
1) If Γ X is defined as positive: X 1 = X 1 − ΕX 1 ,..., X n = X n − ΕX n are thus *
*
linearly independent vectors of L ( dP ) . 2
In effect:
⎛ ⎛ ⎞ ⎛ ⎞⎞ ΛT Γ X Λ = Var ⎜ ∑ λ j X j ⎟ = Ε ⎜ ∑ λ j X j − Ε ⎜ ∑ λ j X j ⎟ ⎟ ⎜ j ⎟ ⎝ j ⎠ ⎝ j ⎠⎠ ⎝
2
2
⎛ ⎞ = Ε ⎜ ∑ λ j ( X j − ΕX j ) ⎟ = 0 ⎝ j ⎠ That is to say:
∑λ ( X j
j
j
− ΕX j ) = 0
a.s.
This implies, since Γ X is defined positive, that
λ1 =
= λn = 0
We can also say that X 1 ,..., X n generates a hyperplane of L ( dP ) of *
*
2
(
*
*
)
dimension n that we can represent as H X 1 ,..., X n . In particular, if the r.v. X 1 ,..., X n are pairwise uncorrelated (thus a fortiori if they are stochastically independent), we have:
ΛT Γ X Λ = ∑ Var X j .λ j2 = 0 ⇒ λ1 =
= λn = 0
j
thus, in this case, Γ X is defined positive and X 1 ,..., X n are still linearly *
independent.
*
50
Discrete Stochastic Processes and Optimal Filtering
NOTE.– If Ε X X , the matrix of the second order moment function is defined as T
positive definite, then X 1 ,..., X n are linearly independent vectors of L ( dP ) . 2
2) If now Γ X is semi-defined positive:
X 1* = X 1 − ΕX 1 ,..., X n∗ = X n − ΕX n are thus linearly dependent vectors of L ( dP ) . 2
In effect:
∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 )
(
)
⎛
such that: Λ Γ X Λ = Var ⎜ T
⎝
∑λ j
j
⎞ Xj⎟=0 ⎠
That is to say:
∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) such that
∑λ ( X j
j
j
− ΕX j ) = 0 a.s.
Example: we consider X = (X_1, X_2, X_3)^T, a second order random vector of ℝ³, admitting m = (3, −1, 2)^T for the mean value vector and

Γ_X = [ 4  2  0 ]
      [ 2  1  0 ]
      [ 0  0  3 ]

for the covariance matrix. We state that Γ_X is positive semi-definite. Taking for example Λ^T = (1, −2, 0), we verify that Λ^T Γ_X Λ = 0. Thus Var(X_1 − 2X_2 + 0·X_3) = 0 and X*_1 − 2X*_2 = 0 a.s.
When ω describes Ω, X*(ω) = (X*_1(ω), X*_2(ω), X*_3(ω))^T, random vector of ℝ³ of the 2nd order, describes the vertical plane (Π) passing through the straight line (∆) of equation x_1 = 2x_2. The vectors X*_1, X*_2, X*_3 of L²(dP) generate H(X*_1, X*_2, X*_3), subspace of L²(dP) of dimension 2.

Figure 1.7. Vector X*(ω) and vector X*
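A minimal numerical sketch of this example (assuming NumPy is available; this code is not part of the book) checks that Γ_X above is only positive semi-definite and that the relation X*_1 − 2X*_2 = 0 holds almost surely for simulated data:

import numpy as np

# covariance matrix and mean of the example above
Gamma_X = np.array([[4.0, 2.0, 0.0],
                    [2.0, 1.0, 0.0],
                    [0.0, 0.0, 3.0]])
m = np.array([3.0, -1.0, 2.0])

print("eigenvalues of Gamma_X:", np.linalg.eigvalsh(Gamma_X))   # one of them is 0

Lam = np.array([1.0, -2.0, 0.0])
print("Lambda^T Gamma_X Lambda =", Lam @ Gamma_X @ Lam)         # 0

# simulate a Gaussian vector with this mean and covariance and check that
# X1 - 2*X2 is (numerically) constant, i.e. X1* - 2*X2* = 0 a.s.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(m, Gamma_X, size=100_000)
print("Var(X1 - 2*X2) ~", (X[:, 0] - 2.0 * X[:, 1]).var())      # ~0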
1.6. Conditional expectation (concerning random vectors with density function)
Given that X is a real r.v. and Y = (Y_1,..., Y_n) is a real random vector, we assume that X and Y are independent and that Z = (X, Y_1,..., Y_n) admits a probability density f_Z(x, y_1,..., y_n). In this section, we will use as required the notations Y or (Y_1,..., Y_n), and (y_1,..., y_n) or y. Let us recall to begin with that f_Y(y) = ∫ f_Z(x, y) dx.
Conditional probability

We want, for all B ∈ B(ℝ) and all (y_1,..., y_n) ∈ ℝ^n, to define and calculate the probability that X ∈ B knowing that Y_1 = y_1,..., Y_n = y_n. We denote this quantity P((X ∈ B) | (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) or, more simply, P(X ∈ B | y_1,..., y_n).

Take note that we cannot, as in the case of discrete variables, write:

P((X ∈ B) | (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) = P((X ∈ B) ∩ (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) / P((Y_1 = y_1) ∩ ... ∩ (Y_n = y_n))

The quotient here is indeterminate and equals 0/0.

For j = 1 to n, let us note I_j = [y_j, y_j + h[. We write:

P(X ∈ B | y_1,..., y_n) = lim_{h→0} P((X ∈ B) | (Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n))
= lim_{h→0} P((X ∈ B) ∩ (Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n)) / P((Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n))
= lim_{h→0} [∫_B dx ∫_{I_1×...×I_n} f_Z(x, u_1,..., u_n) du_1...du_n] / [∫_{I_1×...×I_n} f_Y(u_1,..., u_n) du_1...du_n]
= ∫_B (f_Z(x, y) / f_Y(y)) dx
It is thus natural to say that the conditional density of the random vector X knowing (y_1,..., y_n) is the function:

x → f(x | y) = f_Z(x, y) / f_Y(y)   if f_Y(y) ≠ 0

We can disregard the set of y for which f_Y(y) = 0, for its measure (in ℝ^n) is zero. Let us state that A = {(x, y) | f_Y(y) = 0}; we observe:

P((X, Y) ∈ A) = ∫_A f_Z(x, y) dx dy = ∫_{{y : f_Y(y)=0}} du ∫ f_Z(x, u) dx = ∫_{{y : f_Y(y)=0}} f_Y(u) du = 0,

so f_Y(y) is not zero almost everywhere.

Finally, we have obtained a family (indexed by the y verifying f_Y(y) > 0) of probability densities f(x | y), with ∫ f(x | y) dx = 1.
Conditional expectation

Let the random vector always be Z = (X, Y_1,..., Y_n) of density f_Z(x, y) and let f(x | y) always be the probability density of X knowing y_1,..., y_n.

DEFINITION.– Given a measurable mapping Ψ : (ℝ, B(ℝ)) → (ℝ, B(ℝ)), under the hypothesis ∫ |Ψ(x)| f(x | y) dx < ∞ (that is to say Ψ ∈ L¹(f(x | y) dx)), we call the conditional expectation of Ψ(X) knowing (y_1,..., y_n) the expectation of Ψ(X) calculated with the conditional density f(x | y) = f(x | y_1,..., y_n), and we write:

E(Ψ(X) | y_1,..., y_n) = ∫ Ψ(x) f(x | y) dx

E(Ψ(X) | y_1,..., y_n) is a certain value, depending on (y_1,..., y_n), and we denote it ĝ(y_1,..., y_n) (this notation will be of use in Chapter 4).

DEFINITION.– We call the conditional expectation of Ψ(X) with respect to Y = (Y_1,..., Y_n) the r.v. ĝ(Y_1,..., Y_n) = E(Ψ(X) | Y_1,..., Y_n) (also denoted E(Ψ(X) | Y)) which takes the value ĝ(y_1,..., y_n) = E(Ψ(X) | y_1,..., y_n) when (Y_1,..., Y_n) takes the value (y_1,..., y_n).

NOTE.– As we do not distinguish between two a.s. equal r.v., we will still call the conditional expectation of Ψ(X) with respect to Y_1,..., Y_n any r.v. ĝ′(Y_1,..., Y_n) such that ĝ′(Y_1,..., Y_n) = ĝ(Y_1,..., Y_n) almost surely. That is to say ĝ′(Y_1,..., Y_n) = ĝ(Y_1,..., Y_n) except possibly on A such that P(A) = ∫_A f_Y(y) dy = 0.
PROPOSITION.– If Ψ(X) ∈ L¹(dP) (i.e. ∫ |Ψ(x)| f_X(x) dx < ∞) then ĝ(Y) = E(Ψ(X) | Y) ∈ L¹(dP) (i.e. ∫_{ℝ^n} |ĝ(y)| f_Y(y) dy < ∞).

DEMONSTRATION.–

∫_{ℝ^n} |ĝ(y)| f_Y(y) dy = ∫_{ℝ^n} |E(Ψ(X) | y)| f_Y(y) dy ≤ ∫_{ℝ^n} f_Y(y) dy ∫ |Ψ(x)| f(x | y) dx

Using Fubini's theorem:

= ∫_{ℝ^{n+1}} |Ψ(x)| f_Y(y) f(x | y) dx dy = ∫_{ℝ^{n+1}} |Ψ(x)| f_Z(x, y) dx dy = ∫ |Ψ(x)| dx ∫_{ℝ^n} f_Z(x, y) dy = ∫ |Ψ(x)| f_X(x) dx < ∞
Principal properties of conditional expectation

The hypotheses of integrability having been verified:

1) E(E(Ψ(X) | Y)) = E(Ψ(X));

2) if X and Y are independent, E(Ψ(X) | Y) = E(Ψ(X));

3) E(Ψ(X) | X) = Ψ(X);

4) successive conditional expectations:
E(E(Ψ(X) | Y_1,..., Y_n, Y_{n+1}) | Y_1,..., Y_n) = E(Ψ(X) | Y_1,..., Y_n);

5) linearity:
E(λ_1 Ψ_1(X) + λ_2 Ψ_2(X) | Y) = λ_1 E(Ψ_1(X) | Y) + λ_2 E(Ψ_2(X) | Y).

The demonstrations, which in general are easy, may be found in the exercises.
Let us note in particular that, as far as the first property is concerned, it is sufficient to re-write the demonstration of the last proposition after stripping it of absolute values. The chapter on quadratic mean estimation will make the notion of conditional expectation more concrete.

Example: let Z = (X, Y) be a random couple of probability density f_Z(x, y) = 6xy(2 − x − y) 1_∆(x, y) where ∆ is the square [0,1] × [0,1].

Let us calculate E(X | Y). We have successively:

– f_Y(y) = ∫_0^1 f(x, y) dx = ∫_0^1 6xy(2 − x − y) dx, i.e. f_Y(y) = (4y − 3y²) 1_{[0,1]}(y);

– f(x | y) = f(x, y) / f_Y(y) = [6x(2 − x − y) / (4 − 3y)] 1_{[0,1]}(x), with y ∈ [0,1];

– E(X | y) = ∫_0^1 x f(x | y) dx · 1_{[0,1]}(y) = [(5 − 4y) / (2(4 − 3y))] 1_{[0,1]}(y).

Thus:

E(X | Y) = [(5 − 4Y) / (2(4 − 3Y))] 1_{[0,1]}(Y).

We also have:

E(X) = E(E(X | Y)) = ∫_0^1 E(X | y) f_Y(y) dy = ∫_0^1 [(5 − 4y) / (2(4 − 3y))] (4y − 3y²) dy = 7/12.
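The closed-form results of this example are easy to cross-check numerically. The following short script — a sketch that assumes NumPy is available, and is not part of the book — estimates E(X) and E(X | Y = y_0) by rejection sampling from the density f_Z(x, y) = 6xy(2 − x − y) on the unit square:

import numpy as np

rng = np.random.default_rng(1)

def f_Z(x, y):
    # joint density 6*x*y*(2 - x - y) on the unit square
    return 6.0 * x * y * (2.0 - x - y)

# rejection sampling: f_Z <= 6 on [0,1]^2, so 6 is a valid bound
n_prop = 2_000_000
x = rng.random(n_prop)
y = rng.random(n_prop)
keep = rng.random(n_prop) * 6.0 < f_Z(x, y)
xs, ys = x[keep], y[keep]

print("E(X) ~", xs.mean(), "  (theory 7/12 =", 7 / 12, ")")

# E(X | Y = y0) from the sample with Y close to y0, against (5-4y0)/(2(4-3y0))
y0 = 0.3
band = np.abs(ys - y0) < 0.01
print("E(X | Y=0.3) ~", xs[band].mean(),
      "  (theory", (5 - 4 * y0) / (2 * (4 - 3 * y0)), ")")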
1.7. Exercises for Chapter 1

Exercise 1.1.

Let X be an r.v. of distribution function

F(x) = 0 if x < …,  1/2 if … ,  1 if x ≥ 2.

Calculate the probabilities:

P(X² ≤ X);  P(X ≤ 2X²);  P(X + X² ≤ 3/4).
Exercise 1.2.

Given the random vector Z = (X, Y) of probability density f_Z(x, y) = K (1/(y x⁴)) 1_∆(x, y), where K is a real constant and where ∆ = {(x, y) ∈ ℝ² | x, y > 0; y ≤ x; y > 1/x}, determine the constant K and the densities f_X and f_Y of the r.v. X and Y.

Exercise 1.3.

Let X and Y be two independent random variables of uniform density on the interval [0,1]:

1) Determine the probability density f_Z of the r.v. Z = X + Y;

2) Determine the probability density f_U of the r.v. U = XY.
Exercise 1.4.

Let X and Y be two independent r.v. of uniform density on the interval [0,1]. Determine the probability density f_U of the r.v. U = XY.

Solution 1.4.

(Figure: the region B_u = A ∪ B of the unit square lying under the hyperbola xy = u.)

U takes its values in [0,1]. Let F_U be the distribution function of U:

– if u ≤ 0, F_U(u) = 0; if u ≥ 1, F_U(u) = 1;

– if u ∈ ]0,1[: F_U(u) = P(U ≤ u) = P(XY ≤ u) = P((X, Y) ∈ B_u),

where B_u = A ∪ B is the cross-hatched area of the figure. Thus

F_U(u) = ∫_{B_u} f_{(X,Y)}(x, y) dx dy = ∫_{B_u} f_X(x) f_Y(y) dx dy
= ∫_A dx dy + ∫_u^1 dx ∫_0^{u/x} dy = u + u ∫_u^1 dx/x = u(1 − ln u).

Finally f_U(u) = F_U′(u) = 0 if u ∈ ]−∞, 0] ∪ [1, ∞[, and f_U(u) = −ln u if u ∈ ]0,1[.
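A quick simulation (a sketch assuming NumPy; not from the book) confirms the density f_U(u) = −ln u by comparing it with a histogram of simulated products:

import numpy as np

rng = np.random.default_rng(2)
u_samples = rng.random(1_000_000) * rng.random(1_000_000)   # U = X*Y, X,Y uniform

edges = np.linspace(0.05, 0.95, 10)            # avoid the endpoints
hist, edges = np.histogram(u_samples, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

for c, h in zip(centers, hist):
    print(f"u={c:.2f}  empirical={h:.3f}  -ln(u)={-np.log(c):.3f}")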
Exercise 1.5.

Under consideration are three r.v. X, Y, Z which are independent and of the same law N(0,1), that is to say admitting the same density (1/√(2π)) exp(−x²/2).

Determine the probability density f_U of the real random variable (r.r.v.) U = (X² + Y² + Z²)^{1/2}.

Solution 1.5.

Let F_U be the distribution function of U:

– if u ≤ 0, F_U(u) = P((X² + Y² + Z²)^{1/2} ≤ u) = 0;

– if u > 0, F_U(u) = P((X, Y, Z) ∈ S_u),

where S_u is the sphere of ℝ³ centered on (0, 0, 0) and of radius u:

F_U(u) = ∫_{S_u} f_{(X,Y,Z)}(x, y, z) dx dy dz = (1/(2π)^{3/2}) ∫_{S_u} exp(−(x² + y² + z²)/2) dx dy dz

and by employing a passage to spherical coordinates:

= (1/(2π)^{3/2}) ∫_0^{2π} dθ ∫_0^π dφ ∫_0^u exp(−r²/2) r² sin φ dr = (1/(2π)^{3/2}) · 2π · 2 ∫_0^u r² exp(−r²/2) dr

and as r → r² exp(−r²/2) is continuous:

f_U(u) = 0 if u ≤ 0, and f_U(u) = √(2/π) u² exp(−u²/2) if u > 0.
Exercise 1.6.

1a) Show that, for a > 0,

f_a(x) = (1/π) · a / (a² + x²)

is a probability density (called Cauchy's density).

1b) Verify that the corresponding characteristic function is φ_X(u) = exp(−a|u|).

1c) Given a family of independent r.v. X_1,..., X_n of density f_a, find the density of the r.v. Y_n = (X_1 + ... + X_n)/n. What do we notice?

2) By considering Cauchy random variables, verify that we can have the equality φ_{X+Y}(u) = φ_X(u) φ_Y(u) with X and Y dependent.
Exercise 1.7.

Show that
M = [ 1  2  3 ]
    [ 2  1  2 ]
    [ 3  2  1 ]
is not a covariance matrix.

Show that
M = [ 1    0.5  0 ]
    [ 0.5  1    0 ]
    [ 0    0    1 ]
is a covariance matrix.

Verify from this example that the property of "not being correlated with" for a family of r.v. is not transitive.

Exercise 1.8.

Show that the random vector X^T = (X_1, X_2, X_3) of expectation EX^T = (7, 0, 1) and of covariance matrix
Γ_X = [ 10  −1   4 ]
      [ −1   1  −1 ]
      [  4  −1   2 ]
belongs almost surely (a.s.) to a plane of ℝ³.
Exercise 1.9.

We are considering the random vector U = (X, Y, Z) of probability density f_U(x, y, z) = K xyz(3 − x − y − z) 1_∆(x, y, z) where ∆ is the cube [0,1] × [0,1] × [0,1].

1) Calculate the constant K.

2) Calculate the conditional probability P(X ∈ [1/4, 1/2] | Y = 1/2, Z = 3/4).

3) Determine the conditional expectation E(X² | Y, Z).
Ε X 2 Y,Z .
Chapter 2
Gaussian Vectors
2.1. Some reminders regarding random Gaussian vectors

DEFINITION.– We say that a real r.v. X is Gaussian, of expectation m and of variance σ², if its law of probability P_X:

– admits the density f_X(x) = (1/(σ√(2π))) exp(−(x − m)²/(2σ²)) if σ² ≠ 0 (using a double integral calculation, for example, we can verify that ∫ f_X(x) dx = 1);

– is the Dirac measure δ_m if σ² = 0.

Figure 2.1. Gaussian density and Dirac measure
If σ² ≠ 0, we say that X is a non-degenerate Gaussian r.v.

If σ² = 0, we say that X is a degenerate Gaussian r.v.; X is in this case a "certain r.v." taking the value m with the probability 1.

EX = m, Var X = σ². This can be verified easily by using the probability distribution function.

As we have already observed, in order to specify that an r.v. X is Gaussian of expectation m and of variance σ², we will write X ∼ N(m, σ²).

Characteristic function of X ∼ N(m, σ²)

Let us begin firstly by determining the characteristic function of X_0 ∼ N(0,1):

φ_{X_0}(u) = E(e^{iuX_0}) = (1/√(2π)) ∫ e^{iux} e^{−x²/2} dx.
We can easily see that the theorem of derivation under the sum sign can be applied:

φ′_{X_0}(u) = (i/√(2π)) ∫ e^{iux} x e^{−x²/2} dx.

Following this by integration by parts:

= (i/√(2π)) ([−e^{iux} e^{−x²/2}]_{−∞}^{+∞} + ∫_{−∞}^{+∞} iu e^{iux} e^{−x²/2} dx) = −u φ_{X_0}(u).

The resolution of the differential equation φ′_{X_0}(u) = −u φ_{X_0}(u) with the condition that φ_{X_0}(0) = 1 leads us to the solution

φ_{X_0}(u) = e^{−u²/2}.

For X ∼ N(m, σ²):

φ_X(u) = (1/(σ√(2π))) ∫_{−∞}^{+∞} e^{iux} e^{−(1/2)((x−m)/σ)²} dx.

By changing the variable y = (x − m)/σ, which brings us back to the preceding case, we obtain

φ_X(u) = e^{ium − u²σ²/2}.

If σ² = 0, that is to say if P_X = δ_m, φ_X(u) = e^{ium} (Fourier transform in the sense of the distribution of δ_m), so that in all cases (σ² ≠ 0 or = 0)

φ_X(u) = e^{ium − u²σ²/2}.

NOTE.– Given the r.v. X ∼ N(m, σ²), we can write:

f_X(x) = (1/((2π)^{1/2}(σ²)^{1/2})) exp(−(1/2)(x − m)(σ²)^{−1}(x − m))

φ_X(u) = exp(ium − (1/2) u σ² u)

These are the expressions that we will find again for Gaussian vectors.
2.2. Definition and characterization of Gaussian vectors

DEFINITION.– We say that a real random vector X^T = (X_1,..., X_n) is Gaussian if ∀(a_0, a_1,..., a_n) ∈ ℝ^{n+1} the r.v. a_0 + Σ_{j=1}^n a_j X_j is Gaussian (we can in this definition assume that a_0 = 0 and this will be sufficient in general).

A random vector X^T = (X_1,..., X_n) is thus not Gaussian if we can find an n-tuple (a_1,..., a_n) ≠ (0,..., 0) such that the r.v. Σ_{j=1}^n a_j X_j is not Gaussian, and for this it suffices to find an n-tuple such that Σ_{j=1}^n a_j X_j is not an r.v. of density.

EXAMPLE.– We allow ourselves an r.v. X ∼ N(0,1) and a discrete r.v. ε, independent of X and such that:

P(ε = 1) = 1/2 and P(ε = −1) = 1/2.

We state that Y = εX. By using what has already been discussed, we will show through an exercise that although Y is an r.v. N(0,1), the vector (X, Y) is not a Gaussian vector.

PROPOSITION.– In order for a random vector X^T = (X_1,..., X_n) of expectation m^T = (m_1,..., m_n) and of covariance matrix Γ_X to be Gaussian, it is necessary and sufficient that its characteristic function (c.f.) φ_X be defined by:

φ_X(u_1,..., u_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u)   (where u^T = (u_1,..., u_n))
DEMONSTRATION.–

φ_X(u_1,..., u_n) = E exp(i Σ_{j=1}^n u_j X_j) = E exp(i·1·Σ_{j=1}^n u_j X_j) = characteristic function of the r.v. Σ_{j=1}^n u_j X_j at the value 1.

That is to say: φ_{Σ_{j=1}^n u_j X_j}(1), and

φ_{Σ_{j=1}^n u_j X_j}(1) = exp(i·1·E(Σ_{j=1}^n u_j X_j) − (1/2)·1²·Var(Σ_{j=1}^n u_j X_j))

if and only if the r.v. Σ_{j=1}^n u_j X_j is Gaussian.

Finally, since Var(Σ_{j=1}^n u_j X_j) = u^T Γ_X u, we arrive indeed at:

φ_X(u_1,..., u_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u).
NOTATION.– We can see that the characteristic function of a Gaussian vector X is entirely determined when we know its expectation vector m and its covariance
matrix Γ X . If X is such a vector, we will write X ∼ N n ( m, Γ X ) .
PARTICULAR CASE.– m = 0 and Γ X = I n (unit matrix), X ∼ N n ( 0, I n ) is called a standard Gaussian vector.
2.3. Results relative to independence

PROPOSITION.–

1) if the vector X^T = (X_1,..., X_n) is Gaussian, all its components X_j are thus Gaussian r.v.;

2) if the components X_j of a random vector X are Gaussian and independent, the vector X is thus also Gaussian.

DEMONSTRATION.–

1) We write X_j = 0 + ... + 0 + X_j + 0 + ... + 0.

2) φ_X(u_1,..., u_n) = Π_{j=1}^n φ_{X_j}(u_j) = Π_{j=1}^n exp(iu_j m_j − (1/2) u_j² σ_j²),

that we can still express as exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u) with Γ_X = diag(σ_1²,..., σ_n²).

NOTE.– As we will see later, "the components X_j are Gaussian and independent" is not a necessary condition for the random vector X^T = (X_1,..., X_j,..., X_n) to be Gaussian.

PROPOSITION.– If X^T = (X_1,..., X_j,..., X_n) is a Gaussian vector of covariance Γ_X, we have the equivalence: Γ_X diagonal ⇔ the r.v. X_j are independent.
DEMONSTRATION.–

Γ_X = diag(σ_1²,..., σ_n²) ⇔ φ_X(u_1,..., u_n) = Π_{j=1}^n φ_{X_j}(u_j).

This is a necessary and sufficient condition of independence of the r.v. X_j.

Let us sum up these two simple results schematically:

– X^T = (X_1,..., X_j,..., X_n) is a Gaussian vector ⇒ the components X_j are Gaussian r.v., and in this case the r.v. X_j are independent ⇔ Γ_X is diagonal;

– the components X_j are Gaussian r.v. and (sufficient condition) the r.v. X_j are independent ⇒ X is a Gaussian vector; but even if Γ_X is diagonal, X_j Gaussian does not by itself imply that X is Gaussian.

NOTE.– A Gaussian vector X^T = (X_1,..., X_j,..., X_n) is evidently of the 2nd order. In effect each component X_j is thus Gaussian and belongs to L²(dP)

(∫ x² (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} dx < ∞).

We can generalize the last proposition and replace the Gaussian r.v. by Gaussian vectors.
Let us consider for example three random vectors:

X^T = (X_1,..., X_n);  Y^T = (Y_1,..., Y_p);  Z^T = (X_1,..., X_n, Y_1,..., Y_p)

and state

Γ_Z = [ Γ_X         Cov(X, Y) ]
      [ Cov(Y, X)   Γ_Y       ]

where Cov(X, Y) is the matrix of the coefficients Cov(X_j, Y_ℓ) and where Cov(Y, X) = (Cov(X, Y))^T.

PROPOSITION.– If Z^T = (X_1,..., X_n, Y_1,..., Y_p) is a Gaussian vector of covariance matrix Γ_Z, we have the equivalence:

Cov(X, Y) = zero matrix ⇔ X and Y are two independent Gaussian vectors.

DEMONSTRATION.–

Γ_Z = [ Γ_X  0   ]
      [ 0    Γ_Y ]

⇔ φ_Z(u_1,..., u_n, u_{n+1},..., u_{n+p}) = exp(i Σ_{j=1}^{n+p} u_j m_j − (1/2) u^T [Γ_X 0; 0 Γ_Y] u) = φ_X(u_1,..., u_n) φ_Y(u_{n+1},..., u_{n+p}),

which is a necessary and sufficient condition for the independence of the vectors X and Y.
NOTE.– Given Z^T = (X^T, Y^T, U^T,...) where X, Y, U,... are r.v. or random vectors:

– that Z is a Gaussian vector is a stronger hypothesis than: X Gaussian and Y Gaussian and U Gaussian, etc.;

– X Gaussian and Y Gaussian and U Gaussian, etc., with their covariances (or covariance matrices) zero, does not imply that Z^T = (X^T, Y^T, U^T,...) is a Gaussian
vector.

EXAMPLE.– Given X, Y, Z three independent r.v. ∼ N(0,1), find the law of the vector W^T = (U, V) where U = X + Y + Z and V = λX − Y with λ ∈ ℝ.

Because of the independence, the vector (X, Y, Z) is Gaussian; ∀a, b ∈ ℝ, aU + bV = (a + λb)X + (a − b)Y + aZ is a Gaussian r.v. Thus W^T = (U, V) is a Gaussian vector.

To determine this entirely we must know m = EW and Γ_W, and we will have W ∼ N_2(m, Γ_W).

It follows easily:

EW^T = (EU, EV) = (0, 0)

and

Γ_W = [ Var U       Cov(U, V) ]   =   [ 3      λ − 1  ]
      [ Cov(V, U)   Var V     ]       [ λ − 1  λ² + 1 ]

In effect:

Var U = EU² = E(X + Y + Z)² = EX² + EY² + EZ² = 3
Var V = EV² = E(λX − Y)² = λ²EX² + EY² = λ² + 1
Cov(U, V) = E(X + Y + Z)(λX − Y) = λEX² − EY² = λ − 1
Particular case: λ = 1 ⇔ Γ_W diagonal ⇔ U and V are independent.
2.4. Affine transformation of a Gaussian vector

We can generalize to vectors the following result on Gaussian r.v.:

If Y ∼ N(m, σ²) then ∀a, b ∈ ℝ, aY + b ∼ N(am + b, a²σ²).

By modifying the notation a little, with N(am + b, a²σ²) becoming N(am + b, a·Var Y·a), we can imagine already how this result is going to extend to Gaussian vectors.

PROPOSITION.– Given a Gaussian vector Y ∼ N_n(m, Γ_Y), A a matrix belonging to M(p, n) and a certain vector B ∈ ℝ^p, then AY + B is a Gaussian
vector ∼ N_p(Am + B, A Γ_Y A^T).

DEMONSTRATION.– AY + B is the vector of ℝ^p whose i-th component is Σ_{ℓ=1}^n a_{iℓ} Y_ℓ + b_i:

– this is indeed a Gaussian vector (of dimension p) because every linear combination of its components is an affine combination of the r.v. Y_1,..., Y_i,..., Y_n, and by hypothesis Y^T = (Y_1,..., Y_n) is a Gaussian vector;

– furthermore, we have seen that if Y is a 2nd order vector:

E(AY + B) = A EY + B = Am + B   and   Γ_{AY+B} = A Γ_Y A^T.

EXAMPLE.– Given (n + 1) independent r.v. Y_j ∼ N(µ, σ²), j = 0 to n, it emerges that Y^T = (Y_0, Y_1,..., Y_n) ∼ N_{n+1}(m, Γ_Y) with m^T = (µ,..., µ) and Γ_Y = σ² I_{n+1}.

Furthermore, given new r.v. X_j defined by:

X_1 = Y_0 + Y_1,..., X_n = Y_{n-1} + Y_n,

the vector X^T = (X_1,..., X_n) is Gaussian, for

(X_1,..., X_n)^T = A (Y_0,..., Y_n)^T   with   A = [ 1 1 0 ... 0 ]
                                                   [ 0 1 1 0 . 0 ]
                                                   [    ...      ]
                                                   [ 0 ... 0 1 1 ];

more precisely, following the preceding proposition, X ∼ N_n(Am, A Γ_Y A^T).

NOTE.– If in this example we assume µ = 0 and σ² = 1, we are certain that the vector X is Gaussian even though its components X_j are not independent. In effect, we have for example: Cov(X_1, X_2) ≠ 0 because

EX_1 X_2 = E(Y_0 + Y_1)(Y_1 + Y_2) = EY_1² = 1   and   EX_1 EX_2 = E(Y_0 + Y_1) E(Y_1 + Y_2) = 0.
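As a quick numerical illustration of the preceding proposition and example (a sketch assuming NumPy, not from the book; A is the n × (n+1) "pairwise sum" matrix described above), one can check on simulated data that X = AY has covariance A Γ_Y A^T:

import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma2 = 4, 0.0, 1.0

# A is n x (n+1): X_i = Y_{i-1} + Y_i
A = np.zeros((n, n + 1))
for i in range(n):
    A[i, i] = A[i, i + 1] = 1.0

Y = rng.normal(mu, np.sqrt(sigma2), size=(200_000, n + 1))
X = Y @ A.T

print("theoretical covariance A Gamma_Y A^T:\n", A @ (sigma2 * np.eye(n + 1)) @ A.T)
print("empirical covariance of X:\n", np.cov(X, rowvar=False).round(2))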
2.5. The existence of Gaussian vectors

NOTATION.– u^T = (u_1,..., u_n), x^T = (x_1,..., x_n) and m^T = (m_1,..., m_n).

We are interested here in the existence of Gaussian vectors, that is to say the existence of laws of probability on ℝ^n having Fourier transforms of the form:

exp(i Σ_j u_j m_j − (1/2) u^T Γ u)

PROPOSITION.– Given a vector m^T = (m_1,..., m_n) and a matrix Γ ∈ M(n, n), which is symmetric and positive semi-definite, there is a unique probability P_X on ℝ^n of Fourier transform:

∫_{ℝ^n} exp(i Σ_{j=1}^n u_j x_j) dP_X(x_1,..., x_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ u)

In addition:

1) if Γ is invertible, P_X admits on ℝ^n the density:

f_X(x_1,..., x_n) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m));

2) if Γ is non-invertible (of rank r < n), the r.v. X_1 − m_1,..., X_n − m_n are linearly dependent. We can still say that ω → X(ω) − m a.s. takes its values on a hyperplane (Π) of ℝ^n, or that the probability P_X loads a hyperplane (Π) and does not admit a density function on ℝ^n.
DEMONSTRATION.–

1) Let us begin by recalling a result from linear algebra:

Γ being symmetric, we can find an orthonormal basis of ℝ^n formed from eigenvectors of Γ; let us call (V_1,..., V_n) this basis. By denoting the eigenvalues of Γ as λ_j, we thus have ΓV_j = λ_j V_j, where the λ_j are solutions of the equation Det(Γ − λI) = 0.

Some consequences

Let us first note Λ = diag(λ_1,..., λ_n) and V = (V_1,..., V_j,..., V_n) (where the V_j are column vectors).

– ΓV_j = λ_j V_j, j = 1 to n, equates to ΓV = VΛ and, the matrix V being orthogonal (VV^T = V^T V = I), Γ = VΛV^T.

Let us demonstrate that if in addition Γ is invertible, the λ_j are ≠ 0 and ≥ 0, and thus the λ_j are > 0.

The λ_j are ≠ 0: in effect, Γ being invertible, 0 ≠ Det Γ = Det Λ = Π_{j=1}^n λ_j.

The λ_j are ≥ 0: let us consider in effect the quadratic form u → u^T Γ u (≥ 0 since Γ is positive semi-definite).
In the basis (V_1,..., V_n), u is written (u_1,..., u_n) with u_j = <V_j, u>, and the quadratic form is written u → (u_1,..., u_n) Λ (u_1,..., u_n)^T = Σ_j λ_j u_j² ≥ 0, from which we get the predicted result.

Let us now demonstrate the proposition.

2) Let us now look at the general case, that is to say, in which Γ is not necessarily invertible (recall that the eigenvalues λ_j are ≥ 0).

Let us consider n independent r.v. Y_j ∼ N(0, λ_j). We know that the vector Y^T = (Y_1,..., Y_n) is Gaussian, as well as the vector X = VY + m (proposition from the preceding section); more precisely X ∼ N(m, Γ = VΛV^T). The existence of Gaussian vectors of given expectation and of given covariance matrix is thus clearly proven.

Furthermore, we have seen that if X is N_n(m, Γ), its characteristic function (Fourier transformation of its law) is exp(i Σ_j u_j m_j − (1/2) u^T Γ u).

We thus in fact have:

∫ exp(i Σ_j u_j x_j) dP_X(x_1,..., x_n) = exp(i Σ_j u_j m_j − (1/2) u^T Γ u).

Uniqueness of the law: this ensues from the injectivity of the Fourier transformation.
3) Let us be clear, to terminate, on the role played by the invertibility of Γ.

a) If Γ is invertible, all the eigenvalues λ_j (= Var Y_j) are > 0 and the vector Y^T = (Y_1,..., Y_n) admits the density:

f_Y(y_1,..., y_n) = Π_{j=1}^n (1/√(2πλ_j)) exp(−y_j²/(2λ_j)) = 1/((2π)^{n/2} (Π_{j=1}^n λ_j)^{1/2}) exp(−(1/2) y^T Λ^{-1} y)

As far as the vector X = VY + m is concerned: the affine transformation y → x = Vy + m is invertible, has y = V^{-1}(x − m) as inverse and has Det V = ±1 (V orthogonal) as Jacobian.

Furthermore Π_{j=1}^n λ_j = Det Λ = Det Γ.

By applying the theorem on the transformation of a random vector by a C¹-diffeomorphism, we obtain the probability density of vector X:

f_X(x_1,..., x_n) = f_X(x) = f_Y(V^{-1}(x − m)) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T (V^T)^{-1} Λ^{-1} V^{-1} (x − m))
As Γ = VΛV^T:

f_X(x_1,..., x_n) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m))

b) If rank Γ = r < n, let us rank the eigenvalues of Γ in decreasing order:

λ_1 ≥ λ_2 ≥ ... ≥ λ_r > 0 and λ_{r+1} = 0,..., λ_n = 0.

Then Y_{r+1} = 0 a.s.,..., Y_n = 0 a.s. and, almost surely, X = VY + m takes its values in (Π), the hyperplane of ℝ^n image of ε = {y = (y_1,..., y_r, 0,..., 0)} by the affine mapping y → Vy + m.

NOTE.– Given a random vector X^T = (X_1,..., X_n) ∼ N_n(m, Γ_X), suppose that we have to calculate an expression of the form:

EΨ(X) = ∫_{ℝ^n} Ψ(x) f_X(x) dx = ∫_{ℝ^n} Ψ(x_1,..., x_n) f_X(x_1,..., x_n) dx_1...dx_n

In general, the density f_X, and in what follows the proposed calculation, are rendered complex by the dependence of the r.v. X_1,..., X_n.

Let λ_1,..., λ_n be the eigenvalues of Γ_X and V the orthogonal matrix which diagonalizes Γ_X.
We have X = VY + m with Y^T = (Y_1,..., Y_n), the Y_j being independent and ∼ N(0, λ_j), and the proposed calculation can be carried out under the simpler form:

EΨ(X) = EΨ(VY + m) = ∫_{ℝ^n} Ψ(Vy + m) (Π_{j=1}^n (1/√(2πλ_j)) e^{−y_j²/(2λ_j)}) dy_1...dy_n
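The following sketch (NumPy assumed; Ψ, m and Γ_X below are arbitrary illustrative choices, not taken from the book) implements this change of variables: it diagonalizes Γ_X, simulates the independent Y_j ∼ N(0, λ_j), and estimates EΨ(X) from X = VY + m:

import numpy as np

rng = np.random.default_rng(4)

# illustrative (hypothetical) ingredients
m = np.array([1.0, 0.0, -2.0])
Gamma_X = np.array([[3.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])
Psi = lambda x: np.sin(x[:, 0]) + x[:, 1] ** 2 + np.abs(x[:, 2])

# diagonalize Gamma_X = V Lambda V^T and simulate Y_j ~ N(0, lambda_j)
lam, V = np.linalg.eigh(Gamma_X)
lam = np.clip(lam, 0.0, None)        # guard against tiny negative round-off
Y = rng.normal(size=(500_000, 3)) * np.sqrt(lam)
X = Y @ V.T + m                      # X = V Y + m, hence X ~ N(m, Gamma_X)

print("E Psi(X) ~", Psi(X).mean())
print("empirical covariance ~\n", np.cov(X, rowvar=False).round(2))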
EXAMPLE.– 1) The expression of a normal case:

Let the Gaussian vector X^T = (X_1, X_2) ∼ N_2(0, Γ_X) where

Γ_X = [ 1  ρ ]
      [ ρ  1 ]   with ρ ∈ ]−1, 1[.

Γ_X is invertible and

f_X(x_1, x_2) = 1/(2π√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x_1² − 2ρ x_1 x_2 + x_2²)).

The intersections of the graph of f_X with the horizontal planes are the ellipses (ε) of equation x_1² − 2ρ x_1 x_2 + x_2² = C (constants).

Figure 2.2. Example of the density of a Gaussian vector
2) We give ourselves the Gaussian vector X^T = (X_1, X_2, X_3) with:

m^T = (1, 0, −2)  and  Γ = [ 3  0  q ]
                           [ 0  1  0 ]
                           [ q  0  1 ].

Because of Schwarz's inequality (Cov(X_1, X_3))² ≤ Var X_1 · Var X_3, we must suppose |q| ≤ √3.

We wish to study the density f_X(x_1, x_2, x_3) of vector X.

Eigenvalues of Γ:

Det(Γ − λI) = (1 − λ)(λ² − 4λ + 3 − q²).

From which we obtain the eigenvalues ranked in decreasing order:

λ_1 = 2 + √(1 + q²),  λ_2 = 1,  λ_3 = 2 − √(1 + q²)

a) if |q| < √3 then λ_1 > λ_2 > λ_3 > 0, Γ is invertible and X has a probability density in ℝ³ given by:

f_X(x_1, x_2, x_3) = 1/((2π)^{3/2} (λ_1 λ_2 λ_3)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m));

b) if q = √3, then λ_1 = 4, λ_2 = 1, λ_3 = 0 and Γ is non-invertible, of rank 2.

Let us find the orthogonal matrix V which diagonalizes Γ by writing ΓV_j = λ_j V_j. For λ_1 = 4, λ_2 = 1, λ_3 = 0 we obtain respectively the eigenvectors

V_1 = (√3/2, 0, 1/2)^T,  V_2 = (0, 1, 0)^T,  V_3 = (−1/2, 0, √3/2)^T

and the orthogonal matrix V = (V_1 V_2 V_3) (VV^T = V^T V = I).

Given the independent r.v. Y_1 ∼ N(0, 4) and Y_2 ∼ N(0, 1) and given the r.v. Y_3 = 0 a.s., we have:

(X_1, X_2, X_3)^T = V (Y_1, Y_2, 0)^T + (1, 0, −2)^T

or, by calling X* = (X*_1, X*_2, X*_3)^T the vector X after centering,

X*_1 = (√3/2) Y_1,  X*_2 = Y_2,  X*_3 = (1/2) Y_1

We can further deduce that X* = (X*_1, X*_2, X*_1/√3)^T.
Figure 2.3. The plane (Π) is the support of the probability P_X

Thus, the vector X* describes almost surely the plane (Π) containing the axis 0x_2 and the vector U^T = (√3, 0, 1). The plane (Π) is the support of the probability P_X.
Probability and conditional expectation

Let us develop a simple case as an example. Let the Gaussian vector Z^T = (X, Y) ∼ N_2(0, Γ_Z). Stating ρ = Cov(X, Y)/(σ_1 σ_2) with Var X = σ_1² and Var Y = σ_2², the density of Z is written:

f_Z(x, y) = 1/(2πσ_1σ_2√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x²/σ_1² − 2ρ xy/(σ_1σ_2) + y²/σ_2²)).
Conditional density of X knowing Y = y

f(x | y) = f_Z(x, y)/f_Y(y) = f_Z(x, y) / ∫ f_Z(x, y) dx

= [1/(2πσ_1σ_2√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x²/σ_1² − 2ρ xy/(σ_1σ_2) + y²/σ_2²))] / [(1/(√(2π)σ_2)) exp(−y²/(2σ_2²))]

= 1/(σ_1√(2π)√(1 − ρ²)) exp(−(1/(2σ_1²(1 − ρ²)))(x − ρ(σ_1/σ_2) y)²)
X being a real variable and y a fixed numeric value, we can recognize a Gaussian density. More precisely: the conditional law of X, knowing Y = y, is N(ρ(σ_1/σ_2) y, σ_1²(1 − ρ²)).

We see in particular that E(X | y) = ρ(σ_1/σ_2) y and that E(X | Y) = ρ(σ_1/σ_2) Y.

In Chapter 4, we will see more generally that if (X, Y_1,..., Y_n) is a Gaussian vector, E(X | Y_1,..., Y_n) is written in the form λ_0 + Σ_{j=1}^n λ_j Y_j.
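A short simulation (NumPy assumed; the values of σ_1, σ_2 and ρ are arbitrary, not from the book) makes this conditional law tangible: conditioning on Y falling in a thin band around y_0, the sample mean of X approaches ρ(σ_1/σ_2) y_0 and its variance approaches σ_1²(1 − ρ²):

import numpy as np

rng = np.random.default_rng(5)
sigma1, sigma2, rho = 2.0, 1.5, 0.6          # illustrative values
cov = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])

Z = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)
X, Y = Z[:, 0], Z[:, 1]

y0 = 1.0
band = np.abs(Y - y0) < 0.02
print("E(X | Y=y0) ~", X[band].mean(), "  theory:", rho * sigma1 / sigma2 * y0)
print("Var(X | Y=y0) ~", X[band].var(), "  theory:", sigma1**2 * (1 - rho**2))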
2.6. Exercises for Chapter 2

Exercise 2.1.

We are looking at a circular target D of center 0 and of radius R which is used for archery. The couple Z = (X, Y) represents the coordinates of the point of impact of the arrow on the target support; we assume that the r.v. X and Y are independent and follow the same law N(0, 4R²).

1) What is the probability that the arrow reaches the target?

2) How many times must one fire the arrow in order that, with a probability ≥ 0.9, the target is reached at least once (we give ln 10 ≈ 2.305)?

3) Let us assume that we fire 100 times at the target; calculate the probability that the target is reached at least 20 times. Hint: use the central limit theorem.

Solution 2.1.

X and Y being independent, the probability density of Z = (X, Y) is

f_Z(x, y) = f_X(x) f_Y(y) = 1/(8πR²) exp(−(x² + y²)/(8R²))

1) P(Z ∈ D) = 1/(8πR²) ∫_D exp(−(x² + y²)/(8R²)) dx dy; using a change from Cartesian to polar coordinates:

= 1/(8πR²) ∫_0^{2π} dθ ∫_0^R e^{−ρ²/(8R²)} ρ dρ = 1 − e^{−1/8}

2) With each shot k we associate a Bernoulli r.v. U_k ∼ b(p) defined by:

U_k = 1 if the arrow reaches the target (probability p);
U_k = 0 if the arrow does not reach the target (probability 1 − p).

In n shots, the number of impacts is given by the r.v. U = U_1 + ... + U_n ∼ B(n, p), and

P(U ≥ 1) = 1 − P(U = 0) = 1 − (1 − p)^n.

We are looking for the n which verifies 1 − (1 − p)^n ≥ 0.9 ⇔ (1 − p)^n ≤ 0.1 ⇔ n ≥ −ln 10 / ln(1 − p) = −ln 10 / ln(e^{−1/8}) = 8 ln 10 ≈ 18.4, i.e. n ≥ 19.

3) By using the previous notations, we are looking to calculate P(U ≥ 20) with U = U_1 + ... + U_{100}, which is to say:

P(U_1 + ... + U_{100} ≥ 20) = P((U_1 + ... + U_{100} − 100µ)/(√100 σ) ≥ (20 − 100µ)/(√100 σ))

with µ = 1 − e^{−1/8} ≈ 0.1175 and σ = ((1 − e^{−1/8}) e^{−1/8})^{1/2} ≈ 0.32,

i.e. ≈ P(S ≥ 8.25/3.2) = P(S ≥ 2.58) = 1 − F_0(2.58)

where S is an r.v. N(0,1) and F_0 the distribution function of the r.v. N(0,1).

Finally P(U ≥ 20) = 1 − 0.9951 ≈ 0.005.
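The three numerical answers above are easy to confirm by simulation — a sketch assuming NumPy (not from the book), with R set to 1 since the probabilities do not depend on R:

import numpy as np

rng = np.random.default_rng(6)
R = 1.0

# 1) probability that one arrow reaches the target
n = 1_000_000
X = rng.normal(0.0, 2.0 * R, n)
Y = rng.normal(0.0, 2.0 * R, n)
print("P(hit) ~", np.mean(X**2 + Y**2 <= R**2), "  theory:", 1 - np.exp(-1 / 8))

# 2) and 3) number of hits in 19 and 100 independent shots
p = 1 - np.exp(-1 / 8)
hits_19 = rng.binomial(19, p, n)
hits_100 = rng.binomial(100, p, n)
print("P(>=1 hit in 19 shots)   ~", np.mean(hits_19 >= 1))    # ~0.9
print("P(>=20 hits in 100 shots) ~", np.mean(hits_100 >= 20)) # ~0.005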
Exercise 2.2.
n independent r.v. of law N ( 0,1) and given
X 1 ,… , X n
Given
a 1 ,… , a n ; b 1,… , b n
2n real constants:
1) Show that the r.v. Y =
n
n
j =1
j =1
∑ a j X j and Z = ∑ b j are independent if and
n
only if
∑ a jb j = 0 . j =1
2) Deduce from this that if the r.v.
X=
X 1 ,..., X n are n independent r.v. of law N ( 0,1) ,
1 n ∑ X j and YK = X K − X (where n j =1
K∈
{1, 2,..., n} )
are
independent. For K
≠
YK and Y are they independent r.v.?
Solution 2.2. 1) U = (Y , Z ) is evidently a Gaussian vector. ( ∀λ and
µ∈ ,
In order for
Y and Z to be independent it is thus necessary and sufficient that:
the r.v. λY + µ Z is evidently a Gaussian r.v.).
0 = Cov (Y , Z ) = EYZ = ∑ a j b j EY j Z j = ∑ a j b j j
2) In order to simplify the expression, let us make K
j
= 1 an example:
1 1 ⎛ 1⎞ X n ; Y1 = ⎜1 − ⎟ X 1 − X 2 − n n ⎝ n⎠ n 1⎛ 1⎞ 1 and ∑ a j b j = ⎜ 1 − ⎟ − ( n − 1) = 0 n⎝ n⎠ n j =1 X=
1 X1 + n
+
−
1 Xn n
88
Discrete Stochastic Processes and Optimal Filtering
– To simplify let us make K
= 1 and
=2
1 1 ⎛ 1⎞ Y1 = ⎜1 − ⎟ X 1 − X 2 − − X n ; n n ⎝ n⎠ 1 1 ⎛ 1⎞ Y2 = − X 1 + ⎜1 − ⎟ X 2 − − X n n n ⎝ n⎠ n
and
⎛
1⎞1
1
∑ a j b j = −2 ⎜⎝1 − n ⎟⎠ n − ( n − 2 ) n < 0 , thus Y1 and Y2 are dependent. j =1
Exercise 2.3.
X ∼ N ( 0,1) and a discrete r.v. ε such that 1 1 P ( ε = −1) = and P = ( ε = +1) = . 2 2 We give a real r.v.
We suppose
X and ε independent. We state Y = ε X :
– by using distributions functions, verify that Y ∼ N ( 0,1) ; – show that Cov ( X , Y ) = 0 ; – is the vector U = ( X , Y ) gaussian? Solution 2.3. 1)
(
FY ( y ) = P (Y ≤ y ) = P ( ε X ≤ y ) = P ( ε X ≤ y ) ∩ ( ( ε = 1) ∪ ( ε = −1) )
=P
( ( (ε X ≤ y ) ∩ (ε = 1) ) ∪ ( (ε X ≤ y ) ∩ (ε = −1) ) )
)
Gaussian Vectors
89
Because of the incompatibility of the two events linked by the union
= P ( ( ε X ≤ y ) ∩ ( ε = 1) ) + P ( ( ε X ≤ y ) ∩ ( ε = −1) ) = P ( ( X ≤ y ) ∩ ( ε = 1) ) + P ( ( − X ≤ y ) ∩ ( ε = −1) ) Because of the independence of
X and ε ,
P ( X ≤ y ) P ( ε = 1) + P ( − X ≤ y ) P ( ε = −1) =
1 ( P ( X ≤ y ) + P ( − X ≤ y )) 2
Finally, thanks to the parity of the density of the law N ( 0,1) ,
= P ( X ≤ y ) = FX ( y ) ; 2) Cov ( X , Y ) = EXY − EXEY = Eε X − EX Eε X = Eε EX 2
0
2
= 0;
0
3) X + Y = X + ε X = X (1 + ε ) ;
(
)
Thus P ( X + Y = 0 ) = P X (1 + ε ) = P (1 + ε = 0 ) =
1 2
λ X + µY (with λ = µ = 1 ) because the law admits no density ( PX +Y ({0} ) = 1 ). 2 We can deduce that the r.v.
Thus the vector U = ( X , Y ) is not Gaussian.
is not Gaussian,
90
Discrete Stochastic Processes and Optimal Filtering
Exercise 2.4. Given a real r.v. X ∼ N ( 0,1) and given a real a > 0 :
⎧⎪ X if X < a is also a real ⎪⎩− X if X ≥ a
1) Show that the real r.v. Y defined by Y = ⎨ r.v. ∼ N ( 0,1)
(Hint: show the equality of the distribution functions FY = FX .)
4 2) Verify that Cov ( X , Y ) = 1 − 2π
∞
∫a
2
x e
− x2
2 dx
Solution 2.4. 1) FY ( y ) = P ( Y ≤ y ) = P
( (Y ≤ y ) ∩ ( X
Distributivity and then incompatibility
( P ( (Y ≤ y )
)
(
< a) ∪ ( X ≥ a)
)
⇒
)
P (Y ≤ y ) ∩ ( X < a ) + P (Y ≤ y ) ∩ ( X ≥ a ) =
)
((
)
X < a P ( X < a) + P Y ≤ y X ≥ a P ( X ≥ a)
P ( X ≤ y ) P ( X < a ) + P (( − X ≤ y )) P ( X ≥ a ) P( X ≤ y )
because
1 − x2 2 e = f X ( x) is even 2π
(
)
= P ( X ≤ y ) P ( X < a ) + P ( X ≥ a ) = P ( X ≤ y ) = FX ( y ) ;
)
Gaussian Vectors
91
2) EX = 0 and EY = 0, thus:
Cov ( X , Y ) = EXY = ∫ =∫ −∫
∞ −∞
−a −∞
a −a
x 2 f X ( x ) dx − ∫
x 2 f X ( x ) dx − ∫
−a −∞
−a −∞
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
The 1st term equals EX 2 = VarX = 1 . The sum of the 4 following terms, because of the parity of the integrated function, equals
∞
−4∫ x 2 f X ( x ) dx from which we obtain the result. a
Exercise 2.5.
⎛X⎞ ⎛ 0⎞ Z = ⎜ ⎟ be a Gaussian vector of expectation vector m = ⎜ ⎟ and of ⎝Y ⎠ ⎝1 ⎠ ⎛ 1 1 ⎞ 2 ⎟ which is to say covariance matrix Γ Z = ⎜ Z ∼ N 2 ( m, Γ Z ) . ⎜1 1 ⎟ ⎝ 2 ⎠ Let
1) Give the law of the random variable
X − 2Y .
2) Under what conditions on the constants a and b is the random variable aX + bY independent of X − 2Y and of variance 1? Solution 2.5. 1) X ∼ N ( 0,1) and Y ∼ N (1,1) ; as
X and Y are also independent thus
X − 2Y is a Gaussian r.v.; precisely X − 2Y ∼ N ( −2,5 ) . ⎛ X − 2Y ⎞ ⎟ is a Gaussian vector (write the definition) X − 2Y and ⎝ aX + bY ⎠
2) As ⎜
aX + bY
are
independent
⇔
Cov ( X − 2Y , aX + bY ) = 0
now
92
Discrete Stochastic Processes and Optimal Filtering
Cov ( X − 2Y , aX + bY ) = aVarX − b Cov ( X , Y )
2 −2a Cov ( X , Y ) − 2bVarY = a − b − a = 0 i.e. b = 0 3 As 1 = Var ( a X
+ bY ) = Var aX = a 2 Var X
: a = ±1 .
Exercise 2.6.
X and Y and we assume that X admits a density probability f X ( x ) and that Y ∼ N ( 0,1) . We are looking at two independent r.v.
Determine the r.v.
(
)
E e XY X .
Solution 2.6.
(
)
E e XY x = E xY = ∫ e xy 1 x2 2 = e ∫ e 2π 1 So y → e 2π finally obtain
(
−( y − x ) 2
−( y − x ) 2
)
1 −y 2 e dy 2π 2
2
dy
2
is a density of probability (v.a. ∼ N ( x,1) ), and we
E e XY X = e
X2
2
.
Chapter 3
Introduction to Discrete Time Processes
3.1. Definition

A discrete time process is a family of r.v. X_T = {X_{t_j} | t_j ∈ T ⊂ ℝ} where T, called the time base, is a countable set of instants. X_{t_j} is the r.v. of the family considered at the instant t_j. Ordinarily, the t_j are uniformly spread and distant by a unit of time and, in the sequel, T will be equal to ℕ, ℤ or ℤ*, and the processes will still be denoted X_T or, if we wish to be precise, X_ℕ, X_ℤ or X_{ℤ*}.
In order to be able to study correctly some sets of r.v. X_j of X_T, and not only the r.v. X_j individually, it is in our interest to consider the latter as mappings defined on the same set, and this leads us to an exact definition.
DEFINITION.– Any family X_T of measurable mappings

X_j : (Ω, a) → (ℝ, B(ℝ)), ω → X_j(ω), with j ∈ T ⊂ ℝ,

is called a real discrete time stochastic process.

We also say that the process is defined on the fundamental space (Ω, a).

In general a process X_T is associated with a real phenomenon, that is to say that the X_j represent (random) physical, biological, etc. values; for example the intensity of electromagnetic noise coming from a certain star.

For a given ω, that is to say after the phenomenon has been performed, we obtain the values x_j = X_j(ω).

DEFINITION.– x_T = {x_j | j ∈ T} is called the realization or trajectory of the process X_T.
Figure 3.1. A trajectory
Laws

We defined the laws P_X of the real random vectors X^T = (X_1,..., X_n) in Chapter 1. These laws are measures defined on the Borel algebra of ℝ^n, B(ℝ^n) = B(ℝ) ⊗ ... ⊗ B(ℝ).

The finite sets (X_i,..., X_j) of r.v. of X_T are random vectors and, as we will be employing nothing but sets such as these in the following chapters, the considerations of Chapter 1 will be sufficient for the studies that we envisage.

However, X_T ∈ ℝ^T and in certain problems we cannot avoid the following additional sophistication:

1) construction of a σ-algebra B(ℝ^T) = ⊗_{j∈T} B(ℝ)_j on ℝ^T;

2) construction of laws on B(ℝ^T) (Kolmogorov's theorem).
Stationarity DEFINITION.– We say that a process
∀i, j , p ∈
the random vectors
same law, i.e. ∀Bi ,..., B j ∈ B (
((
)
(
{
XT = X j j ∈
}
is stationary if
( X i ,..., X j ) and ( X i+ p ,..., X j + p ) have the ) (in the drawing the Borelians are intervals):
P X i + p ∈ Bi ∩ ... ∩ X j + p ∈ B j
) ) = P ( ( X i ∈ Bi ) ∩ ... ∩ ( X j ∈ B j ) )
96
Discrete Stochastic Processes and Optimal Filtering
i +1
i
i+ p
j
i +1+ p
j+ p
t
Wide sense stationarity DEFINITION.– We say that a process
X T is centered if EX j = 0
DEFINITION.– We say that a process
X T is of the second order if:
X j ∈ L2 ( dP )
∀j ∈ T .
∀j ∈ T .
Let us remember that if
X j ∈ L2 ∀j ∈ T then X j ∈ L1 and ∀i, j ∈ T
EX i X j < ∞ . Thus, the following definition is meaningful. DEFINITION.– Given
X a real 2nd order process, we call the covariance function
of this process, the mapping
(
Γ : i, j ⎯⎯ → Γ ( i, j ) = Cov X i , X j
)
x We call the autocorrelation function of this process the mapping:
R : i, j ⎯⎯ → R ( i, j ) = E X i X j x
Introduction to Discrete Time Processes
97
These two mappings obviously coincide if X ] is centered. We can recognize here notions introduced in the context of random vectors, but here as the indices ...i,... j ,... represent instants, we can expect in general that when the deviations
i − j increase, the values Γ ( i, j ) and R ( i, j ) decrease. DEFINITION.– We say that the process X ] is wide sense stationary (WSS) if: – it is of the 2nd order;
→ m ( j ) = EX is constant; – the mapping j ⎯⎯ ]
\ Γ ( i + p, j + p ) = Γ ( i, j )
– ∀ i, j , p ∈ ]
In this case Γ ( i, j ) is instead written C ( j − i ) . Relationship linking the two types of stationarity A stationary process is not necessarily of the 2nd order as we see with the process X ] for example in which we choose for X j r.v. independent of Cauchy’s law:
fX j ( x) =
(
a
π a +x 2
2
)
and a > 0 and
EX j and EX 2j are not defined.
A “stationary process which is also of the 2nd order” (or a process of the 2nd order which is also stationary) must not be confused with a WSS process. It is clear that if a process of the 2nd order is stationary, it is thus WSS. In effect:
EX j + p = ∫ xdPX j+ p ( x ) = ∫ xdPX j ( x ) = EX j \
\
98
Discrete Stochastic Processes and Optimal Filtering
and:
Γ ( i + p, j + p ) = ∫ =∫
xy dPX i+ p , X j+ p ( x, y ) − EX i + p EX j + p
2
2
xy dPX i , X j ( x, y ) − EX i EX j = Γ ( i, j )
The inverse implication “wide sense stationary (WSS) ⇒ stationarity” is false in general. However, it is true in the case of Gaussian processes. Ergodicity Let X
be a WSS process.
DEFINITION.– We say that the expectation of X
EX 0 = lim
N ↑∞
N
1 2N + 1
∑
j =− N
X j (ω ) a.s. (almost surely)
We say that the autocorrelation function X
∀n ∈
is ergodic if:
K ( j, j + n ) = EX j X j +n = lim
N ↑∞
is ergodic if:
1 2N + 1
N
∑
j =− N
X j (ω ) X j +n (ω ) a.s.
That is to say, except possibly for ω ∈ N set of zero probability or even with the exception of trajectories whose apparition probability is zero, we have for any trajectory x :
EX 0 = lim
N ↑∞
+N
1 2N + 1
∑
j =− N
x j (ergodicity of 1st order)
= EX j X j + n = lim N ↑∞
1 2N + 1
+N
∑
j =− N
x j x j + n (ergodicity of 2nd order)
Introduction to Discrete Time Processes
With the condition that the process X
99
is ergodic, we can then replace a
mathematical expectation by a mean in time. This is a sufficient condition of ergodicity of 1st order. PROPOSITION.– Strong law of large numbers: If the X j ( j ∈
)
form a sequence of independent r.v. and which are of the
same law and if E X 0 < ∞ then EX 0 = lim
N ↑∞
+N
1
∑
2 N + 1 j =− N
X j (ω ) a.s.
NOTE.– Let us suppose that the r.v. X j are independent Cauchy r.v. of probability density
1
a π a + x2 2
( a > 0) .
By using the characteristic functions technique, we can verify that the r.v.
YN =
1
+N
∑
2 N + 1 j =− N
X j has the same law as X 0 ; thus YN can not converge a.s. to
the constant EX 0 , but E X 0 = +∞ .
X
EXAMPLE.– We are looking at the process
which consists of r.v.
X j = A cos ( λ j + Θ ) where A is a real constant and where Θ is an r.v. of uniform probability density f Θ (θ ) =
1 1 (θ ) . Let us verify that X 2π [0,2π [
is a
WSS process.
EX j = ∫
2π 0
Acos ( λ j + θ ) f Θ (θ ) dθ =
Γ ( i, j ) = K ( i, j ) = EX i X j = ∫
A2 2π
2π
∫0
2π 0
A 2π
2π
∫0
cos ( λ j + θ ) dθ = 0
Acos ( λ j + θ ) Acos ( λ j+θ ) f Θ (θ ) dθ
cos ( λ i + θ ) cos ( λ j + θ ) dθ =
A2 cos ( λ ( j − i ) ) 2
100
Discrete Stochastic Processes and Optimal Filtering
and X
is in fact WSS.
Keeping with this example, we are going to verify the ergodicity expectation. Ergodicity of expectation
lim N
+N
1
∑
Acos ( λ j + θ ) (with θ fixed ∈ [ 0, 2π [ )
2 N + 1 j =− N
= lim N
1
N
∑
2 N + 1 j =− N
Acosλ j = lim N
2A ⎛ N 1⎞ ⎜⎜ ∑ cosλ j − ⎟⎟ 2 N + 1 ⎝ j =0 2⎠
iλ N +1 N 2A ⎛ 1⎞ 2 A ⎛ 1- e ( ) 1 ⎞ iλ j = lim − ⎟ ⎜ Re ⎜ Re ∑ e − ⎟⎟ = lim N 2N + 1 ⎜ 2 ⎠ N 2 N + 1 ⎝⎜ 2 ⎠⎟ 1 − eiλ ⎝ j =0
If λ ≠ 2kπ , the parenthesis is bounded and the limit is zero and equal to EX 0 . Therefore, the expectation is ergodic. Ergodicity of the autocorrelation function
lim N
(with
∑
2 N + 1 j =− N
θ
= lim N
= lim N
+N
1
Acos ( λ j + θ ) Acos ( λ ( j + n ) + θ )
[
[
fixed ∈ 0, 2π )
A2
+N
∑
2 N + 1 j =− N
cosλ j cosλ ( j + n )
1 A2 + N ∑ ( cosλ ( 2j+n ) + cosλ n ) 2 2 N + 1 j =− N
+N ⎛ 1 A2 ⎛ ⎞ ⎞ A2 = lim ⎜ Re ⎜ eiλ n ∑ eiλ 2 j ⎟ ⎟ + cosλ n ⎜ ⎟⎟ 2 N ⎜ 2 2N + 1 j =− N ⎝ ⎠⎠ ⎝
The
limit
is
still
zero
autocorrelation function is ergodic.
and
A2 cosλ n = K ( j , j + n ) . Thus, the 2
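The ergodicity just verified for the process X_j = A cos(λj + Θ) can also be seen numerically — a sketch assuming NumPy (A, λ and the horizon N below are arbitrary choices): for a single simulated trajectory, the time averages approach EX_0 = 0 and K(j, j+n) = (A²/2) cos λn:

import numpy as np

rng = np.random.default_rng(7)
A, lam, N = 1.5, 0.7, 200_000          # arbitrary, with lam != 2*k*pi

theta = rng.uniform(0.0, 2.0 * np.pi)  # one single draw of Theta
j = np.arange(-N, N + 1)
x = A * np.cos(lam * j + theta)        # one trajectory of the process

print("time average of x_j:", x.mean(), "   (EX_0 = 0)")
for n in (0, 1, 5):
    k_hat = np.mean(x[: len(x) - n] * x[n:])
    print(f"time average of x_j x_(j+n), n={n}:", round(k_hat, 4),
          "  theory:", round(A**2 / 2 * np.cos(lam * n), 4))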
Two important processes in signal processing Markov process DEFINITION.– We say that X – ∀B ∈ B (
is a discrete Markov process if:
);
– ∀t1 ,..., t j +1 ∈
with t1 < t2 < ... < t j < t j +1 ;
– ∀x1 ,..., x j +1 ∈
.
(
) (
)
P X t j+1 ∈ B X t j = x j ,..., X t1 = x1 = P X t j+1 ∈ B X t j = x j , an
Thus
equality that more briefly can be written:
(
) (
)
P X t j+1 ∈ B x j ,..., x1 = P X t j+1 ∈ B x j . We can say that if t j represents the present instant, for the study of X towards the future (instants > t j ), the information
(
{( X
tj
) (
= x j ,..., X t 1 = x1
)
brings nothing more than the information X t j = x j .
B
xt1 xt
t1
j −1
t j −1
tj xt
t j +1 j
t
)}
102
Discrete Stochastic Processes and Optimal Filtering
Markov processes are often associated with phenomena beginning at instant 0 for example and we thus choose the probability law Π 0 of the r.v. X 0 .
(
The conditional probabilities P X t ∈ B x j j +1
) are called transition probabilities.
= j.
In what follows, we suppose t j
DEFINITION.– We say that the transition probability is stationary if
(
)
(
)
P X j +1 ∈ B x j is independent of j = P ( X 1 ∈ B x0 ) .
Here is an example of a Markov process that in practice is often met.
X
is defined by the r.v.
(
)
X 0 and the relation of recurrence
X j +1 = f X j , N j where the N j are independent r.v. and independent of the r.v.
X 0 and where f is a mapping:
×
Thus, let us show that ∀B ∈ B (
):
2
→
Borel function.
( ) ( ) ⇔ P ( f ( X , N ) ∈ B x , x ,..., x ) = P ( f ( X , N ) ∈ B x ) ⇔ P ( f ( x , N ) ∈ B x , x ,..., x ) = P ( f ( x , N ) ∈ B x ) P X j +1 ∈ B x j , x j −1 ,..., x0 = P X j +1 ∈ B x j j
j
j
j
j
j
j −1
j −1
0
0
This equality will be verified if the r.v.
( X j −1 = x j −1 ) ∩ ... ∩ ( X 0 = x0 ) .
j
j
j
j
j
j
N j is independent of
Now the relation of recurrence leads us to expressions of the form:

X_1 = f(X_0, N_0), X_2 = f(X_1, N_1) = f(f(X_0, N_0), N_1) = f_2(X_0, N_0, N_1),..., X_j = f_j(X_0, N_0,..., N_{j-1})

which proves that N_j, being independent of X_0, N_0,..., N_{j-1}, is also independent of X_0, X_1,..., X_{j-1} (and even of X_j).
the random vector
(
X S = X i ,..., X j
remember is denoted X S ∼
(
is Gaussian if ∀ S = ( i,..., j ) ∈
X
DEFINITION.– We say that a process
)
,
is a Gaussian vector that as we will
)
N n mS , Γ X s .
We see in particular that as soon as we know that a process law is entirely determined by its expectation function
X is Gaussian, its
j → m ( j ) and its
covariance function i, j → Γ ( i, j ) . A process such as this is denoted
X ∼ N ( m ( j ) , Γ ( i, j ) ) .
A Gaussian process is obviously of the 2nd order: furthermore if it is a WSS process it is thus stationary and to realize this it is sufficient to write the probability:
(
)
f X S xi ,..., x j =
of whatever vector
1
( 2π )
j −i +1 2
( Det Γ ) XS
1 2
T ⎛ 1 ⎞ exp ⎜ − ( x − mS ) Γ −S1 ( x − mS ) ⎟ ⎝ 2 ⎠
X S extracted from the process.
104
Discrete Stochastic Processes and Optimal Filtering
Linear space associated with a process
X
Given
a WSS process, we note
combinations of the r.v. of
That is to say:
H
X
HX
the family of finite linear
X .
⎧⎪ = ⎨ ∑ λ j X j S finite ⊂ ⎪⎩ j∈S
⎫⎪ ⎬ ⎪⎭
DEFINITION.– We call linear space associated with the process
H
X
2
augmented by the limits in L of the elements of H
denoted
H
X
X
X
the family
. The linear space is
.
NOTES.– 1) H
X
⊂H
X
⊂ L2 ( dP ) and H
2) Let us suppose that X
X
2
is a closed vector space of L
( dP ) .
is a stationary Gaussian process. All the linear 2
combinations of the r.v. X j of X
are Gaussian and the limits in L are equally
(
Gaussian. In effect, we easily verify that if the set of r.v. X n ∼ N mn , σ n 2
converge in L towards an r.v. X of expectation m and of variance
σ n2
then converge towards m and
σ
(
and X ∼ N m, σ
2
σ 2 , mn
) respectively.
2
)
and
Delay operator Process X
being given, we are examining operator T
defined by:
T n : ∑ λ j X j → ∑ λ j X ( j − n ) ( S finished ⊂ j∈S
H
j∈S
X
H
X
)
n
( n ∈ ) on H ∗
X
Introduction to Discrete Time Processes
DEFINITION.– T
n
105
is called operator delay of order n .
Properties of operator delay: –
T n is linear of H ∗
– ∀ n and m ∈ –
X
in H
X
;
T n T m = T n+m ;
T n conserves the scalar product of L2 , that is to say ∀ I and J finite ⊂
:
⎛ ⎞ ⎛ ⎞ < T n ⎜ ∑ λi X i ⎟ , T n ⎜ ∑ µ j X j ⎟ > = < ∑ λi X i , ∑ µ j X j > . ⎜ j∈J ⎟ i∈I j∈J ⎝ i∈I ⎠ ⎝ ⎠ EXTENSION.– T Let
Z ∈H
X
n
extends to all
and Z p ∈ H
X
H
X
in the following way:
be a sequence of r.v. which converge towards Z
2
in L ; Z P is in particular a Cauchy sequence of
( )
Tn Zp
is also a Cauchy sequence of
converges in
H
X
H
X
P
∀Z ∈ H
X
towards Z . It is natural to state T
n
As a consequence,
X
n
and by isometry T ,
H
which, since
. It is simple to verify that lim T
particular series Z p which converges towards
H n
X
is complete,
( Z p ) is independent of the
Z.
and the series Z p ∈ H
X
which converges
T n (Z p ) . ( Z ) = lim P
DEFINITION.– We can also say that
H
X
is the space generated by the X
process. 3.2. WSS processes and spectral measure
In this section it will be interesting to note the influence on the spectral density of the temporal spacing between the r.v. For this reason we are now about to
106
Discrete Stochastic Processes and Optimal Filtering
{
consider momentarily a WSS process X θ = X jθ j ∈ and where jθ has the significance of duration.
} where θ
is a constant
3.2.1. Spectral density
DEFINITION.– We say that the process X θ possesses a spectral density if its
( ( j − i )θ ) = EX iθ X jθ − EX iθ EX jθ can be written 1 C ( nθ ) = ∫ 12θ exp ( 2iπ ( inθ ) u ) S XX ( u ) du and S XX ( u ) is −
covariance C ( nθ ) = C
in the form:
2θ
then called the spectral density of the process X θ . PROPOSITION.– +∞
Under the hypothesis
∑ C ( nθ ) < ∞ :
n =−∞
1) the process X θ admits a spectral density S XX ; 2) S XX is continuous, periodic of
C
− nθ − 2θ − θ
1
θ
period, real and even.
Var X jθ
0 θ
2θ
nθ
S XX
u
t −1
2θ
0
1
2θ
Figure 3.2. Covariance function and spectral density of a process
Introduction to Discrete Time Processes
107
NOTE.– The covariance function C is not defined (and in particular does not equal zero) outside the values nθ . DEMONSTRATION.– Taking into account the hypotheses, the series: +∞
∑ C ( pθ ) exp ( −2iπ ( pθ ) u )
p =−∞
converges uniformly on
1
θ
and defines a continuous function S ( u ) and
-periodic. Furthermore:
∫ =∫
+∞ 2θ C −1 2θ p =−∞ 1
1
2θ −1 2θ
∑ ( pθ ) exp ( −2iπ ( pθ ) u ) exp ( 2iπ ( nθ ) u ) du
S ( u ) exp ( 2iπ ( nθ ) u ) du 2
The uniform convergence and the orthogonality in L
( − 1 2θ , 1 2θ ) of the
complex exponentials enables us to conclude that:
C ( nθ ) = ∫
1
2θ −1 2θ
exp ( 2iπ ( nθ ) u ) S ( u ) du and that S XX ( u ) = S ( u ) .
To finish, C ( nθ ) is a covariance function, thus:
C ( − nθ ) = C ( nθ )
108
Discrete Stochastic Processes and Optimal Filtering
and we can deduce from this that S XX ( u ) =
+∞
∑
p =−∞
real and even (we also have S XX ( u ) = C ( 0 ) + 2 EXAMPLE.– The covariance C ( nθ ) = σ e
C ( pθ ) exp ( −2iπ ( pθ ) u ) is ∞
∑ C ( pθ ) cos2π ( pθ ) u ). p =1
2 − λ nθ
( λ > 0)
of a process X θ in
fact verifies the condition of the proposition and X θ admits the spectral density.
S XX ( u ) = σ 2
+∞
∑ e−λ nθ −2iπ ( nθ )u
n =−∞
∞ ⎛ ⎞ − λ nθ − 2iπ ( nθ )u − λ nθ + 2iπ ( nθ )u =σ 2 ⎜∑e + ∑e − 1⎟ n =0 ⎝ n =0 ⎠ 1 1 ⎛ ⎞ =σ2⎜ + − 1⎟ − λθ − 2iπθ u − λθ + 2iπθ u 1− e ⎝ 1− e ⎠ ∞
=σ2
1 − e−2λθ 1 + e −2λθ − 2e − λθ cos2πθ u
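A quick numerical check of this closed form (a sketch assuming NumPy; θ, λ and σ² below are arbitrary test values, not from the book) truncates the series Σ_n C(nθ) e^{−2iπ(nθ)u} and compares it with the expression just obtained:

import numpy as np

theta, lam, sigma2 = 0.5, 0.8, 2.0      # arbitrary test values
u = np.linspace(-1 / (2 * theta), 1 / (2 * theta), 5)

# truncated series  sum_n C(n*theta) * exp(-2i*pi*n*theta*u)
n = np.arange(-200, 201)
C = sigma2 * np.exp(-lam * np.abs(n) * theta)
series = (C[None, :] * np.exp(-2j * np.pi * n[None, :] * theta * u[:, None])).sum(axis=1).real

closed = sigma2 * (1 - np.exp(-2 * lam * theta)) / (
    1 + np.exp(-2 * lam * theta) - 2 * np.exp(-lam * theta) * np.cos(2 * np.pi * theta * u))

print(np.round(series, 6))
print(np.round(closed, 6))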
White noise
DEFINITION.– We say that a centered WSS process X θ is a white noise if its covariance
function
⎛ C ( 0 ) = EX 2jθ = σ 2 ⎜ ⎜ C ( nθ ) = 0 if n ≠ 0 ⎝
C ( nθ ) = C ( ( j − i )θ ) = EX iθ X jθ
verifies
∀j ∈
The function C in fact verifies the condition of the preceding proposition and
S XX ( u ) =
+∞
∑
n =−∞
C ( nθ ) exp ( −2iπ ( nθ ) u ) = C ( 0 ) = σ 2 .
Introduction to Discrete Time Processes
109
S XX C
σ
σ2
2
t
u
0
0
Figure 3.3. Covariance function and spectral density of a white noise
We often meet “Gaussian white noises”: these are Gaussian processes which are also white noises; the families of r.v. extracted from such processes are independent
(
and ∼ N 0, σ
2
).
More generally we have the following result which we will use as the demonstration. Herglotz theorem
In order for a mapping
nθ → C ( nθ ) to be the covariance function of a
WSS process, it is necessary and sufficient that a positive measurement on
⎛⎡ 1 1 ⎤⎞ , ⎥ ⎟ , which is called the spectral measure, such that: ⎝ ⎣ 2θ 2θ ⎦ ⎠
B ⎜ ⎢-
C ( nθ ) = ∫
1
2θ −1 2θ
exp ( 2iπ ( nθ ) u ) d µ X ( u ) . ∞
In this statement we no longer assume that
∑ C ( nθ ) < ∞ .
n =−∞
µX
exists
110
Discrete Stochastic Processes and Optimal Filtering +∞
∑ C ( nθ ) < ∞ ,
If
we
again
find
the
starting
statement
with:
n =−∞
d µ X ( u ) = S XX ( u ) du (a statement that we can complete by saying that the
spectral density S XX ( u ) is positive).
3.3. Spectral representation of a WSS process
In this section we explain the steps enabling us to arrive at the spectral representation of a process. In order not to obscure these steps, the demonstrations of the results which are quite long without being difficult are not given.
3.3.1. Problem
The object of spectral representation is: 1) To study the integrals (called Wiener integrals) of the
∫S ϕ ( u ) dZu
type
obtained as limits, in a meaning to clarify the expressions with the form:
∑ ϕ ( u j ) ( Zu j
j
− Zu j−1
) , ϕ is a mapping with complex values (and
where S is a restricted interval of
{
other conditions), Z S = Z u u ∈ S
}
is a 2nd order process with orthogonal
increments (abbreviated as p.o.i.) whose definition will be given in what follows. 2) The construction of the Wiener integral being carried out, to show that reciprocally, if we allow ourselves a WSS process X θ , we can find a p.o.i.
{
}
Z S = ZU u ∈ S = ⎡ − 1 , 1 ⎤ such that ∀j ∈ ⎣ 2θ 2θ ⎦ a Wiener integral X jθ =
∫S e
2iπ ( jθ )u
dZu .
X jθ may be written as
Introduction to Discrete Time Processes
NOTE.–
∫ S ϕ ( u ) dZu
and
∫S e
2iπ ( jθ )u
111
dZu will not be ordinary Stieljes
integrals (and it is this which motivates a particular study). In effect:
⎛ ⎞ ⎜ ⎟ ⎜ σ = ,.., u j −1 , u j , u J +1 subdivision of S ⎟ ⎜ ⎟ let us state ⎜ σ = sup u j − u j −1 module of the subdivision σ ⎟ j ⎜ ⎟ ⎜ ⎟ ⎜ Iσ = ∑ ϕ u j Z u j − Z u j−1 ⎟ u j ∈σ ⎝ ⎠
{
}
( )(
)
∀σ , the expression Iσ is in fact defined, it is a 2nd order r.v. with complex values. However, the process Z S not being a priori of bounded variation, the ordinary limit
lim Iσ , i.e. the limit with a given trajectory u → Zu (ω ) , does not exist and
σ →0
∫ S ϕ ( u ) dZu The r.v.
cannot be an ordinary Stieljes integral.
∫ S ϕ ( u ) dZu
will be by definition the limit in
limit exists for the family Iσ when
L2 precisely if this
σ → 0 , i.e.: 2
lim E Iσ − ∫ ϕ ( u ) dZu = 0 .
σ →0
S
This is still sometimes written:
L _ ( Iσ ) . ∫ S ϕ ( u ) dZu = σlim →0 2
3.3.2. Results
3.3.2.1. Process with orthogonal increments and associated measurements
S designates here a bounded interval of
.
112
Discrete Stochastic Processes and Optimal Filtering
DEFINITION.– We call a random process of continuous parameters with base S , all the family of r.v. Z u , the parameter u describing S .
{
}
This process will be denoted as Z S = Z u u ∈ S . Furthermore, we can say that such a process is: – centered if EZ u = 0
∀u ∈ S ; 2
2
– of the 2nd order if EZ u < ∞ (i.e. Z u ∈ L – continuous in
( dP ) ) ∆u ∈ S ;
L2 : if E ( Zu + ∆u − Zu ) → 0 2
when ∆u → 0 ∀u and u + ∆u ∈ S (we also speak about right continuity when
∆u > 0 or of left continuity when ∆u < 0 in L2 ). In what follows Z S will be centered, of the 2nd order and continuous in
L2 .
Z S has orthogonal increments ∀u1 , u2 , u3 , u4 ∈ S with u1 < u2 ≤ u3 < u4
DEFINITION.– We say that the process ( ZS
is
a
p.o.i.)
if
(
< Z u4 − Z u3 , Z u2 − Zu1 > L2 ( dP ) = E Zu4 − Zu3
)(Z
u2
)
− Zu1 = 0 .
We say that Z S is a process with orthogonal and stationary increments ( Z S is a p.o.s.i.) if Z S is a p.o.i. and if in addition ∀u1 , u2 , u3 , u4 with u4 − u3 = u2 − u1
(
we have E Z u − Z u 4
3
)
2
(
= E Z u2 − Zu1
)
2
.
PROPOSITION.– To all p.o.i. Z S which are right continuous in
L2 , we can
associate: –
a
function
F which does not decrease on
F ( u ′ ) − F ( u ) = E ( Zu′ − Zu ) if u < u ′ ; 2
S
such
that:
Introduction to Discrete Time Processes
– a measurement thus
µ
on
B (S )
113
which is such that ∀ u , u ′ ∈ S with u < u ′ ,
( ).
µ ( ]u, u′]) = F ( u′ ) − F u −
3.3.2.2. Wiener stochastic integral
Let Z_S still be a p.o.i., right continuous in L², and let µ be the associated measure.
PROPOSITION.– Given φ and ψ ∈ L²(µ) with complex values:
1) The limit in L², lim_{|σ|→0} Σ_{u_j ∈ σ} φ(u_j)(Z_{u_j} − Z_{u_{j−1}}), exists. This is by definition Wiener's stochastic integral ∫_S φ(u) dZ_u;
2) we have the property:
E[ ∫_S φ(u) dZ_u ( ∫_S ψ(u) dZ_u )* ] = ∫_S φ(u) ψ*(u) dµ(u),
in particular E| ∫_S φ(u) dZ_u |² = ∫_S |φ(u)|² dµ(u).
Idea of the demonstration
Let us denote by ε the vector space of step functions with complex values: φ ∈ ε if φ(u) = Σ_j a_j 1_{]u_{j−1}, u_j]}(u).
We begin by proving the proposition for functions φ, ψ, … ∈ ε, for which (if φ ∈ ε):
∫_S φ(u) dZ_u = Σ_j φ(u_j)(Z_{u_j} − Z_{u_{j−1}}).
We next establish the result in the general case by using the fact that ε (⊂ L²(µ)) is dense in L²(µ), i.e. ∀φ ∈ L²(µ) we can find a sequence φ_n ∈ ε such that:
‖φ − φ_n‖²_{L²(µ)} = ∫_S |φ(u) − φ_n(u)|² dµ(u) → 0 when n → ∞.
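A quick numerical sanity check of the isometry property (a sketch, not from the text; it assumes S = [0,1] and a p.o.s.i. Z built from independent centered Gaussian increments, so that the associated measure µ is the Lebesgue measure):

% Sketch: Monte Carlo check of E| int_S phi dZ |^2 = int_S |phi|^2 dmu
% Assumed setting: S = [0,1], increments of Z i.i.d. N(0,du), hence mu = Lebesgue.
M  = 5000;                     % number of simulated trajectories of Z
n  = 1000;                     % number of subdivision points of S
du = 1/n;
u  = (1:n)*du;
phi = u;                       % test function phi(u) = u
I = zeros(M,1);
for m = 1:M
    dZ = sqrt(du)*randn(1,n);  % orthogonal (here independent) increments Z_uj - Z_uj-1
    I(m) = sum(phi.*dZ);       % the approximating sum I_sigma
end
mean(I.^2)                     % should be close to int_0^1 u^2 du = 1/3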
3.3.2.3. Spectral representation
We start with X_θ, a WSS process. Following Herglotz's theorem, we know that its covariance function nθ → C(nθ) is written:
C(nθ) = ∫_{−1/2θ}^{1/2θ} e^{2iπ(nθ)u} dµ_X(u),
where µ_X is the spectral measure on B([−1/2θ, 1/2θ]).
PROPOSITION.– If X_θ is a centered WSS process of covariance function nθ → C(nθ) and of spectral measure µ_X, there exists a unique p.o.i. Z_S = {Z_u | u ∈ S = [−1/2θ, 1/2θ]} such that ∀j ∈ ℤ:
X_{jθ} = ∫_S e^{2iπ(jθ)u} dZ_u.
Moreover, the measure associated with Z_S is the spectral measure µ_X. The expression of the X_{jθ} as Wiener integrals is called the spectral representation of the process.
NOTE.– By applying property 2) of the preceding proposition:
E[ X_{jθ} X*_{(j+n)θ} ] = E[ ∫_S e^{2iπ(jθ)u} dZ_u ( ∫_S e^{2iπ((j+n)θ)u} dZ_u )* ] = ∫_S e^{−2iπ(nθ)u} dµ_X(u) = C(−nθ) = C*(nθ).
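To make this representation concrete, here is a small simulation (a sketch with θ = 1 and arbitrarily chosen frequencies and amplitudes): a real process built from a few random-phase sinusoids is WSS, and its covariance coincides with the integral of e^{2iπnu} against a discrete spectral measure carrying masses at those frequencies.

% Sketch (theta = 1): WSS process with a discrete spectral measure.
% X_j = sum_k sqrt(2)*s_k*cos(2*pi*u_k*j + Phi_k), Phi_k uniform on [0,2*pi],
% is centered and has covariance C(n) = sum_k s_k^2*cos(2*pi*u_k*n).
uk = [0.10 0.27];  sk = [1 0.5];        % frequencies and amplitudes (assumptions)
M = 2000; N = 200; j = 0:N-1;
C_emp = zeros(1,11);
for m = 1:M
    Phi = 2*pi*rand(1,numel(uk));
    X = zeros(1,N);
    for k = 1:numel(uk)
        X = X + sqrt(2)*sk(k)*cos(2*pi*uk(k)*j + Phi(k));
    end
    for n = 0:10
        C_emp(n+1) = C_emp(n+1) + mean(X(1:N-n).*X(1+n:N))/M;
    end
end
C_theo = zeros(1,11);
for n = 0:10
    C_theo(n+1) = sum(sk.^2.*cos(2*pi*uk*n));
end
[C_emp; C_theo]                          % the two rows should nearly coincide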
3.4. Introduction to digital filtering
We suppose again that θ = 1. Given a WSS process X and a sequence of real numbers h = {h_j ∈ ℝ | j ∈ ℤ}, we are interested in the operation which makes a new process Y correspond to X, defined by:
∀K ∈ ℤ   Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} = ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K
(h_0 T^0 is also denoted h_0 1, where 1 is the identity mapping of L² into L²).
In what follows we will also assume that Σ_{j=−∞}^{+∞} |h_j| < ∞; this condition is generally denoted h ∈ ℓ¹ and is called (for reasons which will be explained later) the condition of stability.
DEFINITION.– We say that the process Y is the transform (or filtering) of the process X by the filter H(T) = Σ_{j=−∞}^{+∞} h_j T^j and we write Y = H(T) X.
NOTES.– 1) The filter H(T) is entirely determined by the sequence of coefficients h = {h_j ∈ ℝ | j ∈ ℤ} and, according to the case in hand, we will speak of the filter H(T), of the filter h, or again of the filter (…, h_{−m}, …, h_{−1}, h_0, …, h_n, …).
2) The expression "∀K ∈ ℤ, Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j}" is the definition of the convolution product (noted ∗) of X by h, which is also written:
Y = h ∗ X, or again ∀K ∈ ℤ, Y_K = (h ∗ X)_K.
3) Given that X is a WSS process and H_X is the associated linear space, it is clear that the r.v. Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} ∈ H_X and that the process Y is also WSS.
Causal filter
Physically, for any given K, Y_K can only depend on the r.v. X_{K−j} previous to it (in the wide sense), i.e. with j ∈ ℕ. A filter H(T) which realizes this condition is called causal or feasible. Amongst these causal filters, we can further distinguish two major classes:
1) Filters of finite impulse response (FIR) such that:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{N} h_j X_{K−j},
the schematic representation of which follows.
Figure 3.4. Schema of a FIR filter (delay line of operators T, coefficients h_0, h_1, …, h_N and summations producing Y_K from X_K)
2) Filters of infinite impulse response (IIR) such that:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{∞} h_j X_{K−j}.
NOTES.– 1) Let us explain the role played by the operator T: at any particular instant K, it replaces X_K with X_{K−1}; we can also say that T blocks the r.v. X_{K−1} for a unit of time and restores it at instant K.
2) Let H(T) be an IIR filter. At the instant K:
Y_K = Σ_{j=0}^{∞} h_j X_{K−j} = h_0 X_K + … + h_K X_0 + h_{K+1} X_{−1} + …
For a process X beginning at the instant 0, we will thus have:
∀K ∈ ℕ   Y_K = Σ_{j=0}^{K} h_j X_{K−j}.
Example of filtering of a Gaussian process
Let us consider the Gaussian process X ∼ N(m(j), Γ(i,j)) and the FIR filter H(T) defined by h = (…, 0, …, 0, h_0, …, h_N, 0, …). We immediately verify that the process Y = H(T) X is Gaussian. Let us consider for example the filtering specified by the following schema: the input is X ∼ N(0, e^{−|j−i|}) and the filter has the two coefficients h_0 = −1 and h_1 = 2, i.e.:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{K} h_j X_{K−j} = −X_K + 2 X_{K−1}.
Y is a Gaussian process. Let us determine its parameters:
m_Y(i) = E Y_i = 0;
Γ_Y(i,j) = E Y_i Y_j = E[ (−X_i + 2X_{i−1})(−X_j + 2X_{j−1}) ]
= E X_i X_j − 2 E X_{i−1} X_j − 2 E X_i X_{j−1} + 4 E X_{i−1} X_{j−1}
= 5 e^{−|j−i|} − 2 e^{−|j−i+1|} − 2 e^{−|j−i−1|}.
Inverse filter of a causal filter
DEFINITION.– We say that a causal filter H(T) is invertible if there is a filter, denoted (H(T))^{−1} and called the inverse filter of H(T), such that for any WSS process X we have:
X = H(T)( (H(T))^{−1} X ) = (H(T))^{−1}( H(T) X )   (∗)
If such a filter exists, the equality Y = H(T) X is equivalent to the equality X = (H(T))^{−1} Y.
Furthermore, (H(T))^{−1} is defined by a sequence of coefficients h′ = {h′_j ∈ ℝ | j ∈ ℤ} and we have the convolution product: ∀K ∈ ℤ, X = h′ ∗ Y.
In order to find the inverse filter (H(T))^{−1}, i.e. in order to find the sequence of coefficients h′ = {h′_j ∈ ℝ | j ∈ ℤ}, we write that the sequence of equalities (∗) is equivalent to:
∀K ∈ ℤ   X_K = ( Σ_{j=−∞}^{+∞} h_j T^j )( ( Σ_{j=−∞}^{+∞} h′_j T^j ) X_K ) = ( Σ_{j=−∞}^{+∞} h′_j T^j )( ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K ),
or even to:
( Σ_{j=−∞}^{+∞} h_j T^j )( Σ_{j=−∞}^{+∞} h′_j T^j ) = ( Σ_{j=−∞}^{+∞} h′_j T^j )( Σ_{j=−∞}^{+∞} h_j T^j ) = 1.
EXAMPLE.– We are examining the causal filter H(T) = 1 − hT.
1) If |h| < 1, H(T) admits the inverse filter (H(T))^{−1} = Σ_{j=0}^{∞} h^j T^j.
To see this we must verify that, X_K being the r.v. at instant K of a WSS process X, we have:
(1 − hT)( ( Σ_{j=0}^{∞} h^j T^j ) X_K ) = X_K   (equality in L²)
⇔ lim_N (1 − hT)( Σ_{j=0}^{N} h^j T^j ) X_K = X_K
⇔ (1 − h^{N+1} T^{N+1}) X_K − X_K = −h^{N+1} X_{K−(N+1)} → 0 when N ↑ ∞,
which is verified if |h| < 1 since ‖h^{N+1} X_{K−(N+1)}‖ = |h|^{N+1} (E X_0²)^{1/2}.
We should also note that (H(T))^{−1} is causal.
2) If |h| > 1, let us write (1 − hT) = −hT(1 − (1/h) T^{−1}); thus:
(1 − hT)^{−1} = −(T^{−1}/h)(1 − (1/h) T^{−1})^{−1}.
As the operators commute and 1/|h| < 1:
(1 − hT)^{−1} = −(T^{−1}/h) Σ_{j=0}^{∞} (1/h^j) T^{−j} = −Σ_{j=0}^{∞} T^{−(j+1)}/h^{j+1}.
However, this inverse has no physical reality and it is not causal (the "lead operators" T^{−(j+1)} are not causal).
3) If |h| = 1, (1 − T) and (1 + T) are not invertible.
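A numerical illustration of case 1) (a sketch, not from the text; the white-noise input, the value h = 0.6 and the truncation order N are arbitrary choices): applying the truncated series Σ_{j=0}^{N} h^j T^j and then (1 − hT) should return the input signal up to a term of order |h|^{N+1}.

% Sketch: (1 - h*T) composed with sum_{j=0}^{N} h^j T^j acts as the identity
h = 0.6; N = 60; K = 500;
X = randn(1,K);                  % a (white, hence WSS) test signal
hj = h.^(0:N);                   % coefficients of the truncated inverse filter
U = filter(hj,1,X);              % U_K = sum_{j=0}^{N} h^j X_{K-j}
Xrec = filter([1 -h],1,U);       % apply (1 - h*T): Xrec_K = U_K - h*U_{K-1}
max(abs(X - Xrec))               % of order |h|^(N+1), i.e. negligible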
Transfer function of a digital filter
DEFINITION.– We call transfer function of the digital filter H(T) = Σ_{j=−∞}^{+∞} h_j T^j the function H(z) = Σ_{j=−∞}^{+∞} h_j z^{−j}, z ∈ ℂ.
We recognize the definition, given in analysis, of a Laurent series if we permute z and z^{−1} = 1/z. As a consequence of this permutation, the transfer functions (sums of the series) will sometimes be written using the variable z. We also say that H(z) is the z-transform of the digital sequence h = (…, h_{−m}, …, h_0, …, h_n, …).
Let us be more precise about the domain of definition of H(z); it is the domain of convergence K of the Laurent series. We already know that K is an annulus of center 0 and thus has the form:
K = {z | 0 ≤ r < |z| < R}.
Moreover, any circle of the complex plane of center 0 and radius ρ is denoted C(0, ρ).
K contains C(0,1) because, owing to the stability hypothesis of the filter, Σ_{j=−∞}^{+∞} |h_j| < ∞, the series Σ_{j=−∞}^{+∞} h_j z^{−j} converges absolutely for any z ∈ C(0,1).
Figure 3.5. Convergence domain of the transfer function (an annulus bounded by C(0,r) and C(0,R) and containing the unit circle)
The singularities σ_j of the transfer function H(z) of any digital filter verify |σ_j| ≤ r or |σ_j| ≥ R, and there is at least one singularity of H(z) on C(0,r) and another on C(0,R) (if not, K, the holomorphic domain of H(z), could be enlarged).
If the filter is now causal:
– if it is an IIR filter then H(z) = Σ_{j=0}^{∞} h_j z^{−j}, so H(z) is holomorphic in K = {z | 0 ≤ r < |z|} (R = +∞);
– if it is an FIR filter then H(z) = Σ_{j=0}^{N} h_j z^{−j}, so H(z) is holomorphic in K = {z | 0 < |z|} (the plane punctured at 0).
We observe above all that the singularities σ_j of the transfer function of a stable, causal filter all have a modulus strictly less than 1.
Figure 3.6. Convergence domain of H(z) of an IIR causal filter and convergence domain of H(z) of an FIR causal filter
NOTE.– In the case of a Laurent series Σ_{j=−∞}^{+∞} h_j z^{−j} (i.e., in the case of a digital filter h = {…, h_{−m}, …, h_0, …, h_n, …}), its domain of convergence K, and thus its sum H(z), is determined in a unique manner; that is to say, the couple (H(z), K) is associated with the filter.
Reciprocally, if, given H(z), we wish to obtain the filter h, it is necessary to begin by specifying the domain in which we wish to expand H(z), because for different domains K we obtain different Laurent expansions having H(z) as their sum.
This can be summed up by the double implication (H(z), K) ⇄ h.
Inversion of the z-transform
Given the couple (H(z), K), we wish to find the filter h.
H being holomorphic in K, we can apply Laurent's formula:
∀j ∈ ℤ   h_j = (1/2iπ) ∮_{Γ+} H(z) z^{j−1} dz,
where (homotopy argument) Γ is any contour contained in K and encircling 0. The integral can be calculated by the residue method or even, since we have a choice of contour Γ, by choosing Γ = C(0,1), parameterizing it and calculating:
∀j ∈ ℤ   h_j = (1/2π) ∫_0^{2π} H(e^{iθ}) e^{ijθ} dθ.
In order to determine the h_j, we can also expand the function H(z) in a Laurent series by making use of the usual known expansions.
SUMMARY EXAMPLE.– Let the stable causal filter be H(T) = 1 − hT with |h| < 1, of transfer function H(z) = 1 − h z^{−1} defined on ℂ − {0}. We have seen that it is invertible and that its inverse, equally causal and stable, is R(T) = Σ_{j=0}^{∞} h^j T^j.
The transfer function of the inverse filter is thus:
R(z) = Σ_{j=0}^{∞} h^j z^{−j} = 1/(1 − h z^{−1}), defined on {z | |z| > |h|}
(note also that R(z) = 1/H(z)).
Figure 3.7. Definition domain of H(z) and definition domain of R(z)
1 on z z > h , let us find (as an exercise) the 1 − hz −1 −j Laurent expansion of R ( z ) , i.e. the h j coefficients of z . Having R ( z ) =
1 1 R ( z )z j −1dz = + ∫ Γ 2iπ 2iπ where Γ is a contour belonging to z z > h . Using the Laurent formulae h j =
{
}
∫Γ
+
zj −dz z−h
126
Discrete Stochastic Processes and Optimal Filtering
By applying the residue theorem:
– if j ≥ 0: h_j = 2iπ · (1/2iπ) (residue of z^j/(z − h) at h) = lim_{z→h} (z − h) z^j/(z − h) = h^j;
– if j < 0: h_j = 2iπ · (1/2iπ) [ (residue of 1/(z^{|j|}(z − h)) at 0) + (residue of 1/(z^{|j|}(z − h)) at h) ] = −1/h^{|j|} + 1/h^{|j|} = 0.
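These coefficients can also be recovered numerically (a sketch; the standard function filter realizes the causal recursion associated with 1/(1 − h z^{−1}), so its impulse response should reproduce the h^j found above):

% Sketch: impulse response of R(z) = 1/(1 - h*z^{-1}) equals h^j for j >= 0
h = 0.6; N = 10;
delta = [1 zeros(1,N)];            % unit impulse
r = filter(1,[1 -h],delta);        % causal realization of 1/(1 - h*z^{-1})
[r; h.^(0:N)]                      % the two rows coincide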
PROPOSITION.– It is given that X is a WSS process and H_X is the associated linear space; we are still considering the filter H(T) of transfer function H(z) = Σ_{j=−∞}^{+∞} h_j z^{−j} with Σ_{j=−∞}^{+∞} |h_j| < ∞.
So:
1) ∀K ∈ ℤ, the r.v. Y_K = ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} of the filtered process converges in, and remains in, H_X; we say that the filter is stable.
2) The filtered process Y is WSS.
3) The spectral densities of X and of Y are linked by the relationship:
S_YY(u) = |H(e^{2iπu})|² S_XX(u).
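Relationship 3) is easy to observe empirically (a sketch, not from the text; it feeds unit-variance white noise, for which S_XX ≡ 1, through an arbitrary FIR filter and uses an averaged periodogram as a crude spectral estimator):

% Sketch: for X white noise (S_XX = 1), S_YY(u) is |H(e^{2*i*pi*u})|^2
h = [1 -0.8 0.2];                  % an arbitrary FIR filter (h_0, h_1, h_2)
N = 512; M = 400;
P = zeros(1,N);
for m = 1:M
    X = randn(1,N);                % unit-variance white noise
    Y = filter(h,1,X);
    P = P + abs(fft(Y)).^2/N;      % periodogram of Y
end
Syy_emp  = P/M;                    % averaged periodogram
u = (0:N-1)/N;
Syy_theo = abs(h(1) + h(2)*exp(-2i*pi*u) + h(3)*exp(-2i*pi*2*u)).^2;
plot(u,Syy_emp,u,Syy_theo)         % the two curves should practically overlap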
DEMONSTRATION.–
1) We have to show that ∀K ∈ ℤ there exists an r.v. Y_K ∈ H_X ⊂ L²(dP) such that the sequence N → Σ_{j=−N}^{N} h_j X_{K−j} converges, for the norm of H_X, towards Y_K when N ↑ ∞. As H_X is a Banach space, it is sufficient to verify the normal convergence, namely:
Σ_{j=−∞}^{+∞} ‖h_j X_{K−j}‖ = Σ_{j=−∞}^{+∞} |h_j| (E X²_{K−j})^{1/2} < ∞,
which follows from the stability condition, since X being WSS, (E X²_{K−j})^{1/2} = (E X_0²)^{1/2} is a constant.
By using the equation of observations:
Y_j ⊥ W_K for 0 ≤ j ≤ K−1 and Y_j ⊥ N_K for 0 ≤ j ≤ K.
The problem of the estimation can now be expressed simply in the following way. Knowing that A(K) is the state matrix of the system, that H(K) is the measurement matrix, and knowing the results y_i of Y_i, i ∈ [1, K], obtain the estimations x̂_j of X_j:
– if 1 ≤ j < K we say that the estimation is a smoothing;
– if j = K we say that the estimation is a filtering;
– if j > K we say that the estimation is a prediction.
NOTE.– The matrices C(K) and G(K) do not play an essential role, insofar as the noise powers appear in the elements of the matrices Q_K and R_K respectively. However, the reader will be able to find analogies with the notations used in "Processus stochastiques et filtrage de Kalman" [BER 98] by the same authors, which examines the continuous case.
7.3.3. Innovation process
The innovation process has already been defined as:
I_K = Y_K − H(K) Proj_{H^Y_{K−1}} X_K = Y_K − H(K) X̂_{K|K−1}   (of dimension m×1),
with:
H^Y_{K−1} = { Σ_{j=0}^{K−1} Λ_j Y_j | Λ_j matrix n×m }.
By this choice of the Λ_j, the space H^Y_{K−1} is adapted to the order of the state multivectors X_j and Proj_{H^Y_{K−1}} X_K = X̂_{K|K−1} has the same order as X_K.
Thus I_K represents the influx of information between the instants K−1 and K.
Reminder of properties established earlier:
I_K ⊥ Y_j and I_K ⊥ I_j for j ∈ [1, K−1].
We will come back to the innovation to stress its physical meaning.
7.3.4. Covariance matrix of the innovation process
Between two measurements, the dynamics of the system lead to an evolution of the state quantities. So the prediction of the state vector at instant K, knowing the measurements (Y_1, …, Y_{K−1}), that is to say X̂_{K|K−1}, is written according to the filtering at instant K−1:
X̂_{K|K−1} = E(X_K | Y_1, …, Y_{K−1}) = Proj_{H^Y_{K−1}} X_K
= Proj_{H^Y_{K−1}} ( A(K−1) X_{K−1} + C(K−1) N_{K−1} )
= A(K−1) X̂_{K−1|K−1} + 0,
i.e.:
X̂_{K|K−1} = A(K−1) X̂_{K−1|K−1}.
Only the information deriving from a new measurement at instant K will enable us to reduce the estimation error at this same instant. Thus, H(K) representing in a certain fashion the measurement operator, or at least its effect, the quantity:
Y_K − H(K) X̂_{K|K−1}
will represent the influx of information between two instants of observation. It is for this reason that this quantity is called the innovation. We observe, furthermore, that I_K and Y_K have the same order. By exploiting the observation equation we can deduce:
I_K = H(K)( X_K − X̂_{K|K−1} ) + G(K) W_K,
i.e.:
I_K = H(K) X̃_{K|K−1} + G(K) W_K,
where X̃_{K|K−1} = X_K − X̂_{K|K−1} is called the prediction error. The covariance matrix of the innovation is finally expressed as:
Cov I_K = E( I_K I_K^T ) = E[ ( H(K) X̃_{K|K−1} + G(K) W_K )( H(K) X̃_{K|K−1} + G(K) W_K )^T ],
that is to say:
Cov I_K = H(K) P_{K|K−1} H^T(K) + G(K) R_K G^T(K),
where P_{K|K−1} = E( X̃_{K|K−1} X̃^T_{K|K−1} ) is called the covariance matrix of the prediction error.
A recurrence formula on the matrices P_{K|K−1} will be developed in Appendix A.
7.3.5. Estimation
In the scalar case, we established a relationship between the estimate of a magnitude X_K and the innovations I_K. We can obviously extend this approach to the case of multivariate processes, that is to say we can write:
X̂_{i|K} = Σ_{j=1}^{K} d_j(i) I_j,
where d_j(i) is an (n × m) matrix. Let us determine the matrices d_j(i):
since E( X̃_{i|K} I_j^T ) = E( (X_i − X̂_{i|K}) I_j^T ) = 0 ∀j ∈ [1, K],
we have E( X_i I_j^T ) = E( X̂_{i|K} I_j^T );
furthermore, knowing the form of X̂_{i|K}, we have:
E( X̂_{i|K} I_j^T ) = E( Σ_{p=1}^{K} d_p(i) I_p I_j^T ).
Then, since I_j ⊥ I_p ∀j ≠ p, j, p ∈ [1, K]:
E( X_i I_j^T ) = d_j(i) E( I_j I_j^T ) = d_j(i) Cov I_j.
Finally: d_j(i) = E( X_i I_j^T )( Cov I_j )^{−1}.
We thus obtain:
X̂_{i|K} = Σ_{j=1}^{K} E( X_i I_j^T )( Cov I_j )^{−1} I_j = Σ_{j=1}^{K−1} E( X_i I_j^T )( Cov I_j )^{−1} I_j + E( X_i I_K^T )( Cov I_K )^{−1} I_K.
We are now going to give the Kalman equations. Let us apply the preceding equality to the filtering X̂_{K+1|K+1}; we obtain:
X̂_{K+1|K+1} = Σ_{j=1}^{K+1} E( X_{K+1} I_j^T )( Cov I_j )^{−1} I_j = Σ_{j=1}^{K} E( X_{K+1} I_j^T )( Cov I_j )^{−1} I_j + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1}.
The state equation reminds us that X_{K+1} = A(K) X_K + C(K) N_K, and we know that N_K ⊥ I_j. Thus:
E( X_{K+1} I_j^T ) = A(K) E( X_K I_j^T ).
The estimate of X_{K+1}, knowing the measurements up to the instant K+1, is thus expressed:
X̂_{K+1|K+1} = A(K) Σ_{j=1}^{K} E( X_K I_j^T )( Cov I_j )^{−1} I_j + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1}.
The term under the summation sign can be written X̂_{K|K}. Let us exploit this, together with:
I_{K+1} = H(K+1) X̃_{K+1|K} + G(K+1) W_{K+1}.
This gives us:
X̂_{K+1|K+1} = A(K) X̂_{K|K} + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1},
which is also written:
X̂_{K+1|K+1} = A(K) X̂_{K|K} + E[ X_{K+1} ( H(K+1) X̃_{K+1|K} + G(K+1) W_{K+1} )^T ] ( Cov I_{K+1} )^{−1} I_{K+1}.
In addition, we have shown that the best estimation at a given instant, knowing the past measurements, which we write X̂_{K+1|K}, is equal to the projection of X_{K+1} on H^Y_K, i.e.:
X̂_{K+1|K} = Proj_{H^Y_K} X_{K+1} = Proj_{H^Y_K} ( A(K) X_K + C(K) N_K ),
and as Y_j ⊥ N_K ∀j ∈ [1, K], it becomes X̂_{K+1|K} = A(K) X̂_{K|K} (A(K) being a square matrix). We can consider this equation as the one which describes the dynamics of the system independently of the measurements, and as one of the equations of the Kalman filter.
In addition, X_K ⊥ W_j ∀K, j > 0; it becomes, for the filtering:
X̂_{K+1|K+1} = X̂_{K+1|K} + E( X_{K+1} X̃^T_{K+1|K} ) H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1}.
As X̂_{K+1|K} ⊥ X̃_{K+1|K}, then:
X̂_{K+1|K+1} = X̂_{K+1|K} + E[ ( X_{K+1} − X̂_{K+1|K} ) X̃^T_{K+1|K} ] H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1},
thus:
X̂_{K+1|K+1} = X̂_{K+1|K} + P_{K+1|K} H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1}.
DEFINITION.– We call Kalman gain the function K defined (here at instant K+1) by:
K(K+1) = P_{K+1|K} H^T(K+1) ( Cov I_{K+1} )^{−1},
with:
Cov I_{K+1} = H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1).
By putting this back into the expression of K(K+1) we obtain:
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1}.
We notice that this calculation does not require direct knowledge of the measurement Y_K. This expression of the gain intervenes, quite obviously, in the algorithm of the Kalman filter and we can write:
X̂_{K+1|K+1} = X̂_{K+1|K} + K(K+1) ( Y_{K+1} − H(K+1) X̂_{K+1|K} ).
This expression of the best filtering represents another equation of the Kalman filter. We observe that the "effect" of the gain is essential. In effect, if the measurement is very noisy, which means that the elements of the matrix R_K are large, then the gain will be relatively weak and the impact of this measurement on the calculation of the filtering will be minimized. On the other hand, if the measurement is not very noisy, we will have the inverse effect: the gain will be large and its effect on the filtering will be appreciable.
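A scalar illustration of this effect (numbers chosen only for illustration, not taken from the text): with a scalar state and measurement, H = G = 1 and a prediction variance P_{K+1|K} = 1, the gain reduces to K(K+1) = 1/(1 + R_{K+1}). For R_{K+1} = 0.1 (an accurate sensor), K(K+1) ≈ 0.91 and the filter follows the measurement closely; for R_{K+1} = 10 (a very noisy sensor), K(K+1) ≈ 0.09 and the measurement barely corrects the prediction.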
We are now going to assess this filtering by calculating the error that we commit, that is to say by calculating the covariance matrix of the filtering error. Let us recall that X̂_{K+1|K+1} is the best of the filterings, in the sense that it minimizes the mapping:
Z ∈ H^Y_{K+1} → tr ‖X_{K+1} − Z‖² = tr E[ (X_{K+1} − Z)(X_{K+1} − Z)^T ] ∈ ℝ.
The minimum is thus:
tr ‖X_{K+1} − X̂_{K+1|K+1}‖² = tr E( X̃_{K+1|K+1} X̃^T_{K+1|K+1} ).
NOTATION.– In what follows, the matrix E( X̃_{K+1|K+1} X̃^T_{K+1|K+1} ) is denoted P_{K+1|K+1} and is called the covariance matrix of the filtering error.
We now give a simple relationship linking the matrices P_{K+1|K+1} and P_{K+1|K}.
We observe that, by using the filtering equation first and the state equation next:
X̃_{K+1|K+1} = X_{K+1} − X̂_{K+1|K+1}
= X_{K+1} − X̂_{K+1|K} − K(K+1)( Y_{K+1} − H(K+1) X̂_{K+1|K} )
= X_{K+1} − X̂_{K+1|K} − K(K+1)( H(K+1) X_{K+1} + G(K+1) W_{K+1} − H(K+1) X̂_{K+1|K} )
= ( I_d − K(K+1) H(K+1) ) X̃_{K+1|K} − K(K+1) G(K+1) W_{K+1},
where I_d is the identity matrix. By bringing this expression of X̃_{K+1|K+1} into P_{K+1|K+1} and by using the fact that X̃_{K+1|K} ⊥ W_{K+1}, we have:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K} ( I_d − K(K+1) H(K+1) )^T + K(K+1) G(K+1) R_{K+1} G^T(K+1) K^T(K+1),
an expression which, since:
Cov I_{K+1} = G(K+1) R_{K+1} G^T(K+1) + H(K+1) P_{K+1|K} H^T(K+1),
can be written:
P_{K+1|K+1} = ( K(K+1) − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} ) (Cov I_{K+1}) ( K(K+1) − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} )^T
+ ( I_d − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} H(K+1) ) P_{K+1|K}.
However, we have seen that:
K(K+1) = P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1}.
So the first term of the second member of the expression is zero and our sought relationship is finally:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K}.
This "updating" of the covariance matrix by iteration is another equation of the Kalman filter.
There is another approach to calculating this minimum [RAD 84]. We notice that the penultimate expression of P_{K+1|K+1} can be put in the form:
P_{K+1|K+1} = ( K(K+1) − P_{K+1|K} H^T(K+1) J^{−1}(K+1) ) J(K+1) ( K(K+1) − P_{K+1|K} H^T(K+1) J^{−1}(K+1) )^T
+ ( I_d − P_{K+1|K} H^T(K+1) J^{−1}(K+1) H(K+1) ) P_{K+1|K},
with:
J(K+1) = H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) = Cov I_{K+1}.
Only the first term of P_{K+1|K+1} depends on K(K+1), and it is of the form M J M^T, symmetric, with J positive; such a term therefore has a positive or zero trace and:
P_{K+1|K+1} = M J M^T + ( I_d − P_{K+1|K} H^T(K+1) J^{−1}(K+1) H(K+1) ) P_{K+1|K}.
The minimum of the trace will thus be reached when M is zero, thus:
K(K+1) = P_{K+1|K} H^T(K+1) J^{−1}(K+1),
i.e.:
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1},
a result which we have already obtained! In these conditions, when:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K},
we obtain the minimum of tr P_{K+1|K+1}. It is important to note that K, the Kalman gain, and P_{K|K}, the covariance matrix of the estimation error, are independent of the magnitudes Y_K. We can also write the best "prediction", i.e. X̂_{K+1|K}, according to the preceding prediction:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) ( Y_K − H(K) X̂_{K|K−1} ).
As for the "best" filtering, the best prediction is written according to the preceding predicted estimate, corrected by the gain applied to the innovation brought along by the measurement Y_K. This Kalman equation is used not in filtering but in prediction. We must now establish a relationship on the evolution of the covariance matrix of the estimation errors.
7.3.6. Riccati's equation
Let us write an evolution relationship between the covariance matrix of the filtering error and the covariance matrix of the prediction error:
P_{K|K−1} = E( X̃_{K|K−1} X̃^T_{K|K−1} ),
or, by incrementation:
P_{K+1|K} = E( X̃_{K+1|K} X̃^T_{K+1|K} ),
with X̃_{K+1|K} = X_{K+1} − X̂_{K+1|K}.
Furthermore we know that:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) I_K,
giving the prediction at instant K+1, and:
X_{K+1} = A(K) X_K + C(K) N_K,
just as I_K = Y_K − H(K) X̂_{K|K−1}. The combination of these expressions gives us:
X̃_{K+1|K} = A(K)( X_K − X̂_{K|K−1} ) − A(K) K(K)( Y_K − H(K) X̂_{K|K−1} ) + C(K) N_K,
but Y_K = H(K) X_K + G(K) W_K, thus:
X̃_{K+1|K} = A(K)( X_K − X̂_{K|K−1} ) − A(K) K(K) H(K)( X_K − X̂_{K|K−1} ) − A(K) K(K) G(K) W_K + C(K) N_K
= ( A(K) − A(K) K(K) H(K) ) X̃_{K|K−1} − A(K) K(K) G(K) W_K + C(K) N_K.
We can now write P_{K+1|K} by observing that X̃_{K|K−1} ⊥ N_K and X̃_{K|K−1} ⊥ W_K.
NOTE.– Please note that X̃_{K+1|K} is not orthogonal to W_K.
Thus:
P_{K+1|K} = ( A(K) − A(K) K(K) H(K) ) P_{K|K−1} ( A(K) − A(K) K(K) H(K) )^T + C(K) Q_K C^T(K) + A(K) K(K) G(K) R_K G^T(K) K^T(K) A^T(K).
This expression of the covariance matrix of the prediction error can be put in the form:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K).
This equality, independent of Y_K, is called Riccati's equation, with P_{K|K} = ( I_d − K(K) H(K) ) P_{K|K−1}, which represents the covariance matrix of the filtering error and is equally independent of Y_K. See Appendix A for the details of the calculation.
7.3.7. Algorithm and summary
The algorithm presents itself in the following form, with the initial conditions P_{0|0} and X̂_{0|0} given, as well as the matrices A(K), Q_K, H(K), R_K, C(K) and G(K).
1) Calculation phase independent of Y_K. Effectively, starting from the initial conditions, we see that the recursions which act on the gain K(K+1) and on the covariance matrices of the prediction and filtering errors, P_{K+1|K} and P_{K+1|K+1}, do not require knowledge of the observation process. Thus, the calculation of these matrices can be done without knowledge of the measurements. The measurements come into play for the calculation of the innovation and of the filtering or of the prediction.
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K)
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1}
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K}
X̂_{K+1|K} = A(K) X̂_{K|K}
(or K(K+1) = P_{K+1|K+1} H^T(K+1) [ G(K+1) R_{K+1} G^T(K+1) ]^{−1} if G(K+1) R_{K+1} G^T(K+1) is invertible).
2) Calculation phase taking into account the results y_K of the process Y_K:
I_{K+1} = Y_{K+1} − H(K+1) X̂_{K+1|K}
X̂_{K+1|K+1} = X̂_{K+1|K} + K(K+1) I_{K+1}
It is by using a new measurement that the calculated innovation, weighted by the gain at the same instant, allows us to obtain the best filtering.
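These two phases fit in a few lines of code (a sketch, not the authors' program; the function name kalman_step and its argument order are ours, and the model matrices A, C, Q, H, G, R are assumed to be available):

function [xf,Pf,xp,Pp,K] = kalman_step(xf_prev,Pf_prev,y,A,C,Q,H,G,R)
% One iteration of the algorithm above (sketch).
% Phase 1: independent of the measurement
xp = A*xf_prev;                    % predicted state   Xhat_{K+1|K}
Pp = A*Pf_prev*A' + C*Q*C';        % Riccati step      P_{K+1|K}
S  = H*Pp*H' + G*R*G';             % Cov I_{K+1}
K  = Pp*H'/S;                      % Kalman gain
Pf = (eye(size(Pp)) - K*H)*Pp;     % P_{K+1|K+1}
% Phase 2: uses the measurement y = y_{K+1}
I  = y - H*xp;                     % innovation
xf = xp + K*I;                     % filtered state    Xhat_{K+1|K+1}
end

Iterating this step over the successive measurements gives the same recursion as the complete Matlab examples at the end of the chapter.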
Figure 7.2. Schema of the principle of the Kalman filter
Important additional information may be obtained in [HAY 91].
NOTE.– If we had designed a Kalman predictor we would have obtained the expression of the prediction seen at the end of section 7.3.5:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) ( Y_K − H(K) X̂_{K|K−1} ),
where Y_K − H(K) X̂_{K|K−1} = I_K.
NOTE.– When the state and measurement equations are no longer linear, a similar solution exists and can be found in other works. The filter then takes the name of the extended Kalman filter.
7.4. Exercises for Chapter 7
Exercise 7.1.
Given the state equation:
X_{K+1} = A X_K + N_K,
where the state matrix A is the identity matrix of dimension 2 and N_K is the system noise, whose covariance matrix is written Q = σ² I_d (I_d: identity matrix).
The system is observed by the scalar equation:
Y_K = X¹_K + X²_K + W_K,
where X¹_K and X²_K are the components of the vector X_K and where W_K is the measurement noise, of variance R = σ₁².
P_{0|0} = I_d and X̂_{0|0} = 0 are the initial conditions.
1) Give the expression of the Kalman gain K(1) at instant "1" according to σ² and σ₁².
2) Give the estimate X̂_{1|1} of X_1 at instant "1" according to the gain K(1) and the first measurement Y_1.
Solution 7.1.
1) K(1) = (1 + σ²)/(2 + 2σ² + σ₁²) · (1  1)^T
2) X̂_{1|1} = K(1) Y_1
Exercise 7.2.
We are considering the movement of a particle. x₁(t) represents the position of the particle and x₂(t) its speed:
x₁(t) = ∫₀^t x₂(τ) dτ + x₁(0).
By differentiating this expression and by noting that x₂(t) = dx₁(t)/dt is approximately x₁(K+1) − x₁(K), we assume that the speed can be represented by:
X²_K = X²_{K−1} + N_{K−1},
where N_K is a stationary Gaussian noise which is centered and of variance 1. The position is measured by y_K, result of the process Y_K. This measurement adds a stationary Gaussian noise, which is centered and of variance 1:
Y(K) = H(K) X(K) + W_K.
We assume that R_K, the covariance matrix (of dimension 1) of the measurement noise, is equal to 1.
1) Give the matrices A, Q (covariance matrix of the system noise) and H.
2) Taking as initial conditions X̂_0 = X̂_{0|0} = 0 and P_{0|0} = I_d (identity matrix), give x̂_{1|1}, the first estimation of the state vector.
Solution 7.2.
1) A = (1  1; 0  1),  Q = (0  0; 0  1),  H = (1  0)
2) x̂_{1|1} = (2/3  1/3)^T y₁
Exercise 7.3. [RAD 84]
We want to estimate two target positions using one measurement. These positions X¹_K and X²_K form the state vector:
X_K = (X¹_K  X²_K)^T.
The process noise is zero. The measurement process Y is affected by a noise W, of mean value zero and of variance R, added to the sum of the positions:
Y_K = X¹_K + X²_K + W_K.
In order to simplify the calculation, we will place ourselves in the case of an immobile target:
X_{K+1} = X_K = X.
The initial conditions are:
– P_{0|0} = Cov(X, X) = I_d (identity matrix);
– R = 0.1;
– y = 2.9 (measurement) and X̂_{0|0} = (0  0)^T.
1) Give the state matrix A and the observation matrix H.
2) Give the Kalman gain K.
3) Give the covariance matrix of the estimation error.
4) Give the estimation, in the sense of the minimum in L², of the state vector X_K.
5) If x = x_K = (1  2)^T, give the estimation error x̃ = x̃_{K|K} = x_K − x̂_{K|K}.
6) Compare the variances of the estimation errors of X¹_K and X²_K and conclude.
conclude. Solution 7.3.
H = (1 1)
1) A = I d
2) K = (1 2,1 1 2,1)
T
⎛ 1,1 2,1
−1,1
⎝
1,1
3) P1|1 = ⎜ −1,1 ⎜
2,1
4) xˆ1|1 = ( 2,9 2,1
(
1
5) xK = xK
xK2
1
2,1 ⎞
2,1
⎟⎟ ⎠
2,9 2,1)
T
)
T
= ( −0,38 − 0, 62 )
T
2
6) var X K = var X K = 0,52 Exercise 7.4.
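These values are easy to verify numerically (a sketch; the results printed in the solution are rounded to two decimals):

% Sketch: numerical check of Exercise 7.3
A = eye(2); H = [1 1]; Q = zeros(2); R = 0.1;
P0 = eye(2); x0 = [0;0]; y = 2.9;
Pp  = A*P0*A' + Q;                 % prediction covariance (= I here)
K   = Pp*H'/(H*Pp*H' + R);         % gain = [1/2.1; 1/2.1]
P11 = (eye(2) - K*H)*Pp            % filtering error covariance
x11 = A*x0 + K*(y - H*A*x0)        % estimate = [2.9/2.1; 2.9/2.1]
err = [1;2] - x11                  % estimation error, about [-0.38; 0.62]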
Exercise 7.4.
Given the state equation of dimension "1" (the state process is a scalar process):
X_{K+1} = X_K.
The state is observed by 2 measurements:
Y_K = (Y¹_K  Y²_K)^T, affected by the noise W_K = (W¹_K  W²_K)^T.
The measurement noise is characterized by its covariance matrix:
R_K = (σ₁²  0; 0  σ₂²).
The initial conditions are:
P_{0|0} = 1 (covariance of the estimation error at instant "0") and X̂_{0|0} = 0 (estimate of X at instant "0").
Let us state D = σ₁² + σ₂² + σ₁² σ₂².
1) Give the expression of K(1), the Kalman gain at instant "1", according to σ₁, σ₂ and D.
2) Give the estimate X̂_{1|1} of X_1 at instant "1" according to the measurements Y¹₁, Y²₁ and σ₁, σ₂ and D.
3) Stating σ² = σ₁² σ₂² / (σ₁² + σ₂²), give P_{1|1}, the covariance of the filtering error at instant "1", according to σ.
Solution 7.4.
1) K(1) = (σ₂²/D  σ₁²/D)
2) X̂_{1|1} = (σ₂² Y¹₁ + σ₁² Y²₁)/D
3) P_{1|1} = σ²/(1 + σ²)
Exercise 7.5.
The fixed distance r of an object is evaluated by 2 radar measurements of different qualities. The 1st measurement gives the result y₁ = r + n₁, a measurement of the process Y = X + N₁, where we know that the noise N₁ is such that E(N₁) = 0 and var(N₁) = σ₁² = 10⁻².
The 2nd measurement gives y₂ = r + n₂, a measurement of the process Y = X + N₂, with E(N₂) = 0 and var(N₂) = w (scalar).
The noises N₁ and N₂ are independent.
1) Give the estimate r̂₁ of r that we obtain from the 1st measurement.
2) Refine this estimate by using the 2nd measurement. We will call r̂₂ this new estimate, which we will express according to w.
3) Draw the graph r̂₂(w) and justify its appearance.
Solution 7.5.
1) r̂₁ = x̂_{1|1} = y₁
2) r̂₂ = x̂_{2|2} = y₁ + σ₁²/(σ₁² + w) · (y₂ − y₁) = (100 w y₁ + y₂)/(100 w + 1)
3) See Figure 7.3.
Figure 7.3. Graph of the evolution of the estimate according to the power of the noise w, parameterized by the magnitude of the measurements
Appendix A
Resolution of Riccati's equation
Let us show that:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K).
Let us take again the developed expression of the covariance matrix of the prediction error of section 7.3.6:
P_{K+1|K} = A(K)( I_d − K(K) H(K) ) P_{K|K−1} ( A(K) − A(K) K(K) H(K) )^T + C(K) Q_K C^T(K) + A(K) K(K) G(K) R_K G^T(K) K^T(K) A^T(K),
with:
K(K) = P_{K|K−1} H^T(K) ( Cov I_K )^{−1}
and:
Cov I_K = H(K) P_{K|K−1} H^T(K) + G(K) R_K G^T(K).
By replacing K(K) and Cov I_K by their expressions in the recursive writing of P_{K+1|K}, we are going to be able to simplify the expression of the covariance matrix of the prediction error. To lighten the expressions, we eliminate the index K when there is no ambiguity, by noting P₁ = P_{K+1|K}, P₀ = P_{K|K−1} and I = I_K:
P₁ = A (I_d − KH) P₀ (A − AKH)^T + C Q C^T + A K G R G^T K^T A^T
K = P₀ H^T (Cov I)^{−1}
Cov I = H P₀ H^T + G R G^T.
Thus:
G R G^T = Cov I − H P₀ H^T
K G R G^T K^T = P₀ H^T (Cov I)^{−1} ( Cov I − H P₀ H^T )(Cov I)^{−1 T} H P₀^T
= P₀ H^T (Cov I)^{−1 T} H P₀^T − P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T.
Hence:
P₁ = A P₀ A^T − A K H P₀ A^T − A P₀ H^T K^T A^T + A K H P₀ H^T K^T A^T + C Q C^T
+ A ( P₀ H^T (Cov I)^{−1 T} H P₀^T − P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T ) A^T,
i.e., replacing K by its expression:
P₁ = A P₀ A^T − A P₀ H^T (Cov I)^{−1} H P₀ A^T − A P₀ H^T (Cov I)^{−1 T} H P₀^T A^T
+ A P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T A^T + C Q C^T
+ A P₀ H^T (Cov I)^{−1 T} H P₀^T A^T − A P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T A^T.
The 3rd and 6th terms cancel each other out, and the 4th and 7th terms also cancel each other out, which leaves:
P₁ = A P₀ A^T − A K H P₀ A^T + C Q C^T,
or:
P₁ = A [ (I_d − KH) P₀ ] A^T + C Q C^T,
i.e.:
P_{K+1|K} = A(K)( I_d − K(K) H(K) ) P_{K|K−1} A^T(K) + C(K) Q_K C^T(K).
Thus:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K) = covariance matrix of the prediction error,
with:
P_{K|K} = ( I_d − K(K) H(K) ) P_{K|K−1} = covariance matrix of the filtering error.
This last result will be demonstrated in Appendix B.
NOTE.– As mentioned in section 7.3.7, knowing the initial conditions and the Kalman gain, the updating of the covariance matrices P_{K|K−1} and P_{K|K} can be made in an iterative manner.
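This simplification can be confirmed numerically (a sketch with randomly generated matrices of compatible sizes; the dimensions chosen are arbitrary):

% Sketch: the long and short forms of P_{K+1|K} coincide when K = P0*H'*(Cov I)^{-1}
n = 3; m = 2;
A = randn(n); C = randn(n); G = randn(m); H = randn(m,n);
P0 = randn(n); P0 = P0*P0' + eye(n);   % a symmetric positive definite P_{K|K-1}
Q  = randn(n); Q  = Q*Q';
R  = randn(m); R  = R*R' + eye(m);
S  = H*P0*H' + G*R*G';                 % Cov I_K
K  = P0*H'/S;                          % Kalman gain
P1_long  = A*(eye(n)-K*H)*P0*(A - A*K*H)' + C*Q*C' + A*K*G*R*G'*K'*A';
P1_short = A*(eye(n)-K*H)*P0*A' + C*Q*C';
max(max(abs(P1_long - P1_short)))      % numerically zero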
Appendix B
We are going to arrive at this result starting from the definition of P_{K|K} and by using the expression of the gain K already obtained.
NOTE.– Unlike the calculation developed in section 7.3.6, we will not show that the tr P_{K|K} obtained is minimal.
Another way of showing the following result:
P_{K|K} = E( X̃_{K|K} X̃^T_{K|K} ) = P_{K|K−1} − K(K) H(K) P_{K|K−1} = ( I_d − K(K) H(K) ) P_{K|K−1}.
Demonstration
Starting from the definition of the covariance matrix of the filtering error, i.e.:
P_{K|K} = E( X̃_{K|K} X̃^T_{K|K} ).
With X̃_{K|K} = X_K − X̂_{K|K} and X̂_{K|K} = X̂_{K|K−1} + K(K) I_K, it becomes:
X̃_{K|K} = X_K − X̂_{K|K−1} − K(K) I_K = X̃_{K|K−1} − K(K) I_K.
Let us now use these results to calculate P_{K|K}:
P_{K|K} = P_{K|K−1} − K(K) E( I_K X̃^T_{K|K−1} ) − E( X̃_{K|K−1} I_K^T ) K^T(K) + K(K) E( I_K I_K^T ) K^T(K).
We observe that:
E( X̃_{K|K−1} I_K^T ) = E( ( X_K − X̂_{K|K−1} ) I_K^T ),
but I_j ⊥ I_K and I_j ⊥ Y_K for j ∈ [1, K−1], thus X̂_{K|K−1} ⊥ I_K. Given this:
E( X̃_{K|K−1} I_K^T ) = E( X_K I_K^T ) = E( A^{−1}(K)( X_{K+1} − C(K) N_K ) I_K^T ),
thus E( X_K I_K^T ) = E( A^{−1}(K) X_{K+1} I_K^T ), for E(N_K) = 0.
However, we have seen elsewhere that:
E( X_{K+1} I_K^T ) = E[ ( A(K) X_K + C(K) N_K )( H(K) X̃_{K|K−1} + G(K) W_K )^T ] = E( A(K) X_K X̃^T_{K|K−1} H^T(K) ),
as N_K ⊥ W_K and N_K ⊥ X̃_{K|K−1} = X_K − X̂_{K|K−1} (and X_K ⊥ W_K).
Furthermore, since X̂_{K|K−1} ⊥ X̃_{K|K−1}:
E( X_K X̃^T_{K|K−1} ) = E( ( X̂_{K|K−1} + X̃_{K|K−1} ) X̃^T_{K|K−1} ) = P_{K|K−1}.
Thus it becomes:
E( X̃_{K|K−1} I_K^T ) = P_{K|K−1} H^T(K),
and thus:
P_{K|K} = P_{K|K−1} − K(K) H(K) P^T_{K|K−1} − P_{K|K−1} H^T(K) K^T(K) + K(K)( Cov I_K ) K^T(K),
with K(K) = P_{K|K−1} H^T(K)( Cov I_K )^{−1}. After simplification, and noting that P_{K|K−1} = P^T_{K|K−1} (these covariance matrices are symmetric, or Hermitian if the elements are complex):
P_{K|K} = P_{K|K−1} − K(K) H(K) P_{K|K−1},
or:
P_{K|K} = [ I_d − K(K) H(K) ] P_{K|K−1}.   QED
Examples treated using Matlab software
First example of Kalman filtering
The objective is to estimate an unknown constant drowned in noise. This constant is measured using a noisy sensor. The noise is centered, Gaussian and of variance equal to 1. The initial conditions are equal to 0 for the estimate and to 1 for the variance of the estimation error.
clear
t=0:500;
R0=1;
constant=rand(1);
n1=randn(size(t));
y=constant+n1;
subplot(2,2,1)
%plot(t,y(1,:));
plot(t,y,'k'); % in B&W
grid
title('sensor')
xlabel('time')
axis([0 500 -max(y(1,:)) max(y(1,:))])
R=R0*std(n1)^2; % variance of noise measurement
P(1)=1; % initial conditions on variance of error estimation
x(1)=0;
for i=2:length(t)
    K=P(i-1)*inv(P(i-1)+R);
    x(i)=x(i-1)+K*(y(:,i)-x(i-1));
    P(i)=P(i-1)-K*P(i-1);
end
err=constant-x;
subplot(2,2,2)
plot(t,err,'k');
grid
title('error');
xlabel('time')
axis([0 500 -max(err) max(err)])
subplot(2,2,3)
plot(t,x,'k',t,constant,'k'); % in W&B
title('x estimated')
xlabel('time')
axis([0 500 0 max(x)])
grid
subplot(2,2,4)
plot(t,P,'k'); % in W&B
grid, axis([0 100 0 max(P)])
title('variance error estimation')
xlabel('time')
Figure 7.3. Line graph of measurement, error, best filtration and variance of error
Second example of Kalman filtering
The objective of this example is to extract a damped sine curve from the noise. The state vector is a two-component column vector:
X1=10*exp(-a*t)*cos(w*t)
X2=10*exp(-a*t)*sin(w*t)
The system noise is centered, Gaussian and of variances var(u1) and var(u2).
The noise of the measurements is centered, Gaussian and of variances var(v1) and var(v2).
Initial conditions: the components of the state vector are zero at the origin and the covariance of the estimation error is initialized at 10 times the identity matrix.
Note: the proposed program is not the shortest nor the fastest in terms of CPU time; it is detailed to allow a better understanding.

clear
%simulation
a=0.05;
w=1/2*pi;
Te=0.005;
Tf=30;
Ak=exp(-a*Te)*[cos(w*Te) -sin(w*Te);sin(w*Te) cos(w*Te)]; % state matrix
Hk=eye(2); % observations matrix
t=0:Te:Tf;
%X1
X1=10*exp(-a*t).*cos(w*t);
%X2
X2=10*exp(-a*t).*sin(w*t);
Xk=[X1;X2]; % state vector
% measurements noise
sigmav1=100;
sigmav2=10;
v1=sigmav1*randn(size(t));
v2=sigmav2*randn(size(t));
Vk=[v1;v2];
Yk=Hk*Xk+Vk; % measurements vector
% covariance matrix of measurements noise
Rk=[var(v1) 0;0 var(v2)]; % covariance matrix of noise
%initialization
sigmau1=0.1; % noise process
sigmau2=0.1; %idem
u1=sigmau1*randn(size(t));
u2=sigmau2*randn(size(t));
%Uk=[sigmau1*randn(size(X1));sigmau2*randn(size(X2))];
Uk=[u1;u2];
Xk=Xk+Uk;
sigq=.01;
Q=sigq*[var(u1) 0;0 var(u2)];
sigp=10;
P=sigp*eye(2); % covariance matrix of estimation error P(0,0)
% line graph
subplot(2,3,1)
%plot(t,X1,t,X2);
plot(t,X1,'k',t,X2,'k') % in W&B
axis([0 Tf -max(abs(Xk(1,:))) max(abs(Xk(1,:)))])
title('state vect. x1&x2')
subplot(2,3,2)
%plot(t,Vk(1,:),t,Vk(2,:),'r')
plot(t,Vk(1,:),t,Vk(2,:)); % in W&B
axis([0 Tf -max(abs(Vk(1,:))) max(abs(Vk(1,:)))])
title('meas. noise w1&w2')
subplot(2,3,3)
%plot(t,Yk(1,:),t,Yk(2,:),'r');
plot(t,Yk(1,:),t,Yk(2,:)); % in W&B
axis([0 Tf -max(abs(Yk(1,:))) max(abs(Yk(1,:)))])
title('observ. proc. y1&y2')
Xf=[0;0];
%%estimation and prediction by Kalman
for k=1:length(t);
    %%prediction
    Xp=Ak*Xf;                       % Xp=Xest(k+1,k) and Xf=Xest(k,k)
    Pp=Ak*P*Ak'+Q;                  % Pp=P(k+1,k) and P=P(k)
    Gk=Pp*Hk'*inv(Hk*Pp*Hk'+Rk);    % Gk=Gk(k+1)
    Ik=Yk(:,k)-Hk*Xp;               % Ik=I(k+1)=innovation
    % best filtration
    Xf=Xp+Gk*Ik;                    % Xf=Xest(k+1,k+1)
    P=(eye(2)-Gk*Hk)*Pp;            % P=P(k+1)
    X(:,k)=Xf;
    P1(:,k)=P(:,1);                 %1st column of P
    P2(:,k)=P(:,2);                 %2nd column of P
end
err1=X1-X(1,:);
err2=X2-X(2,:);
%% line graph
subplot(2,3,4)
%plot(t,X(1,:),t,X(2,:),'r')
plot(t,X(1,:),'k',t,X(2,:),'k') % in W&B
axis([0*Tf Tf -max(abs(X(1,:))) max(abs(X(1,:)))])
title('filtered x1&x2')
subplot(2,3,5)
%plot(t,err1,t,err2)
plot(t,err1,'k',t,err2,'k') % in W&B
axis([0 Tf -max(abs(err1)) max(abs(err1))])
title('errors')
subplot(2,3,6)
%plot(t,P1(1,:),'r',t,P2(2,:),'b',t,P1(2,:),'g',t,P2(1,:),'y')
plot(t,P1(1,:),'k',t,P2(2,:),'k',t,P1(2,:),t,P2(1,:),'b')
axis([0 Tf/10 0 max(P1(1,:))])
title('covar. matrix filter. error.') % p11, p22, p21 and p12
Figure 7.4. Line graphs of noiseless signals, noise measurements, filtration, errors and variances
Table of Symbols and Notations
ℕ, ℝ, ℂ : numerical sets
L² : space of square-summable functions
a.s. : almost surely
E : mathematical expectation
r.v. : random variable
r.r.v. : real random variable
X_n → X a.s. : convergence a.s. of the sequence X_n to X
⟨·,·⟩_{L²} : scalar product in L²
‖·‖_{L²} : norm in L²
Var : variance
Cov : covariance
· ∧ · : Min(·,·)
X ∼ N(m, σ²) : normal law of mean m and of variance σ²
A^T : transposed matrix
H^Y_K : Hilbert space generated by Y_K, scalar or multivariate processes
Proj_{H^Y_K} : projection on the Hilbert space generated by Y (t ≤ K)
X_T : stochastic process defined on T (time describes T)
p.o.i. : process with orthogonal increments
p.o.s.i. : process with orthogonal and stationary increments
X̂_{K|K−1} : prediction at instant K knowing the measurements of the process Y at instants 1 to K−1
X̃_{K|K−1} : prediction error
X̂_{K|K} : filtering at instant K knowing the measurements at instants 1 to K
X̃_{K|K} : filtering error
∇_λ C : gradient of the function C(λ)
{X | P} : the set of elements X which verify the property P
1_D : indicator function of a set D
Bibliography
[BER 98] BERTEIN J.-C. and CESCHI R., Processus stochastiques et filtrage de Kalman, Hermès, 1998.
[BLA 06] BLANCHET G. and CHARBIT M., Digital Signal and Image Processing using MATLAB, ISTE, 2006.
[CHU 87] CHUI C.K. and CHEN G., Kalman Filtering, Springer-Verlag, 1987.
[GIM 82] GIMONET B., LABARRERE M. and KRIEF J.-P., Le filtrage et ses applications, Cépaduès Éditions, 1982.
[HAY 91] HAYKIN S., Adaptive Filter Theory, Prentice Hall, 1991.
[MAC 81] MACCHI O., "Le filtrage adaptatif en télécommunications", Annales des Télécommunications, vol. 36, no. 11-12, 1981.
[MAC 95] MACCHI O., Adaptive Processing: The LMS Approach with Applications in Transmissions, John Wiley, New York, 1995.
[MET 72] METIVIER M., Notions fondamentales de la théorie des probabilités, Dunod, 1972.
[MOK 00] MOKHTARI M., MATLAB et Simulink pour étudiants et ingénieurs, Springer, 2000.
[RAD 84] RADIX J.-C., Filtrages et lissages statistiques optimaux linéaires, Cépaduès Éditions, 1984.
[SHA 88] SHANMUGAN K.S. and BREIPOHL A.M., Random Signals, John Wiley & Sons, 1988.
[THE 92] THERRIEN C.W., Discrete Random Signals and Statistical Signal Processing, Prentice Hall, 1992.
[WID 85] WIDROW B. and STEARNS S.D., Adaptive Signal Processing, Prentice Hall, 1985.
Index
A, B
adaptive filtering 197
algebra 3
analytical 187
autocorrelation function 96
autoregressive process 128
Bienaymé-Tchebychev inequality 143
Borel algebra 3
C
cancellation 199
Cauchy sequence 158
characteristic functions 4
coefficients 182
colinear 213
convergence 218
convergent 219
correlation coefficients 41
cost function 204
covariance 40
covariance function 107
covariance matrix 258
covariance matrix of the innovation process 248
covariance matrix of the prediction error 249
cross-correlation 184
D
deconvolution 199
degenerate Gaussian 64
deterministic gradient 225
deterministic matrix 245
diffeomorphism 31
diphaser 209
E
eigenvalues 75, 215
eigenvectors 75
ergodicity 98
ergodicity of expectation 100
ergodicity of the autocorrelation function 100
expectation 67
F, G
filtering 143, 247
Fubini's theorem 162
Gaussian vectors 13
gradient algorithm 211
H, I
Hilbert spaces 145
Hilbert subspace 144
identification 199
IIR filter 186
impulse response 182
independence 13, 246
innovation 240
innovation process 172, 248
  causal 187
  orthogonal 192
K, L
Kalman gain 254
least mean square 184
linear observation space 168
linear space 104
LMS algorithm 222
lowest least mean square error 185
M, N
marginals 9
Markov process 101
matrix of measurements 246
measure 5
measurement noise 238
measurement noise vector 246
minimum phase 187
multivariate 245, 250
multivariate processes 166
multivector 245
O
observations 245
orthogonal matrix 216
orthogonal projection 238
P
Paley-Wiener 187
prediction 143, 199, 247, 258
prediction error 249
predictor 200
pre-whitening 186
principal axes 215
probability distribution function 12
process noise vector 245
projection 183
Q, R
quadratic form 216
random variables 194
random vector 1, 3
random vector with a density function 8
regression plane 151
Riccati's equation 258, 260
S
Schwarz inequality 160
second order stationarity 96
second order stationary processes 199
singular 185
smoothing 143, 247
spectral density 106
stability 218
stable 219
state matrix 245
stationary processes 181
stochastic process 94
system noise 238
T
Toeplitz 211, 216
trace 257
trajectory 94
transfer function 121
transition equation 247
transition matrix 247
U-Z
unitary matrix Q 216
variance 39
white noise 109, 185
Wiener filter 181