Discrete Stochastic Processes and Optimal Filtering
Jean-Claude Bertein Roger Ceschi
First published in France in 2005 by Hermes Science/Lavoisier entitled “Processus stochastiques discrets et filtrages optimaux” First published in Great Britain and the United States in 2007 by ISTE Ltd Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: ISTE Ltd 6 Fitzroy Square London W1T 5DX UK
ISTE USA 4308 Patrice Road Newport Beach, CA 92663 USA
www.iste.co.uk © ISTE Ltd, 2007 © LAVOISIER, 2005 The rights of Jean-Claude Bertein and Roger Ceschi to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988. Library of Congress Cataloging-in-Publication Data Bertein, Jean-Claude. [Processus stochastiques discrets et filtrages optimaux. English] Discrete stochastic processes and optimal filtering/Jean-Claude Bertein, Roger Ceschi. p. cm. Includes index. "First published in France in 2005 by Hermes Science/Lavoisier entitled "Processus stochastiques discrets et filtrages optimaux"." ISBN 978-1-905209-74-3 1. Signal processing--Mathematics. 2. Digital filters (Mathematics) 3. Stochastic processes. I. Ceschi, Roger. II. Title. TK5102.9.B465 2007 621.382'2--dc22 2007009433 British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 13: 978-1-905209-74-3 Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire.
To our families

We wish to thank Mme Florence François for having typed the manuscript, and M. Stephen Hazlewood, who carried out the translation of the book.
Table of Contents
Preface . . . xi
Introduction . . . xiii

Chapter 1. Random Vectors . . . 1
1.1. Definitions and general properties . . . 1
1.2. Spaces L1(dP) and L2(dP) . . . 20
1.2.1. Definitions . . . 20
1.2.2. Properties . . . 22
1.3. Mathematical expectation and applications . . . 23
1.3.1. Definitions . . . 23
1.3.2. Characteristic functions of a random vector . . . 34
1.4. Second order random variables and vectors . . . 39
1.5. Linear independence of vectors of L2(dP) . . . 47
1.6. Conditional expectation (concerning random vectors with density function) . . . 51
1.7. Exercises for Chapter 1 . . . 57

Chapter 2. Gaussian Vectors . . . 63
2.1. Some reminders regarding random Gaussian vectors . . . 63
2.2. Definition and characterization of Gaussian vectors . . . 66
2.3. Results relative to independence . . . 68
2.4. Affine transformation of a Gaussian vector . . . 72
2.5. The existence of Gaussian vectors . . . 74
2.6. Exercises for Chapter 2 . . . 85

Chapter 3. Introduction to Discrete Time Processes . . . 93
3.1. Definition . . . 93
3.2. WSS processes and spectral measure . . . 105
3.2.1. Spectral density . . . 106
3.3. Spectral representation of a WSS process . . . 110
3.3.1. Problem . . . 110
3.3.2. Results . . . 111
3.3.2.1. Process with orthogonal increments and associated measurements . . . 111
3.3.2.2. Wiener stochastic integral . . . 113
3.3.2.3. Spectral representation . . . 114
3.4. Introduction to digital filtering . . . 115
3.5. Important example: autoregressive process . . . 128
3.6. Exercises for Chapter 3 . . . 134

Chapter 4. Estimation . . . 141
4.1. Position of the problem . . . 141
4.2. Linear estimation . . . 144
4.3. Best estimate – conditional expectation . . . 156
4.4. Example: prediction of an autoregressive process AR(1) . . . 165
4.5. Multivariate processes . . . 166
4.6. Exercises for Chapter 4 . . . 175

Chapter 5. The Wiener Filter . . . 181
5.1. Introduction . . . 181
5.1.1. Problem position . . . 182
5.2. Resolution and calculation of the FIR filter . . . 183
5.3. Evaluation of the least error . . . 185
5.4. Resolution and calculation of the IIR filter . . . 186
5.5. Evaluation of least mean square error . . . 190
5.6. Exercises for Chapter 5 . . . 191

Chapter 6. Adaptive Filtering: Algorithm of the Gradient and the LMS . . . 197
6.1. Introduction . . . 197
6.2. Position of problem . . . 199
6.3. Data representation . . . 202
6.4. Minimization of the cost function . . . 204
6.4.1. Calculation of the cost function . . . 208
6.5. Gradient algorithm . . . 211
6.6. Geometric interpretation . . . 214
6.7. Stability and convergence . . . 218
6.8. Estimation of gradient and LMS algorithm . . . 222
6.8.1. Convergence of the algorithm of the LMS . . . 225
6.9. Example of the application of the LMS algorithm . . . 225
6.10. Exercises for Chapter 6 . . . 234

Chapter 7. The Kalman Filter . . . 237
7.1. Position of problem . . . 237
7.2. Approach to estimation . . . 241
7.2.1. Scalar case . . . 241
7.2.2. Multivariate case . . . 244
7.3. Kalman filtering . . . 245
7.3.1. State equation . . . 245
7.3.2. Observation equation . . . 246
7.3.3. Innovation process . . . 248
7.3.4. Covariance matrix of the innovation process . . . 248
7.3.5. Estimation . . . 250
7.3.6. Riccati's equation . . . 258
7.3.7. Algorithm and summary . . . 260
7.4. Exercises for Chapter 7 . . . 262

Table of Symbols and Notations . . . 281
Bibliography . . . 283
Index . . . 285
Preface
Discrete optimal filtering applied to stationary and non-stationary signals allows us to process, in the most efficient manner possible according to chosen criteria, all of the problems that we might meet in situations where signals have to be extracted from noise. This constitutes the necessary stage in the most diverse domains: the calculation of orbits or the guidance of aircraft in the aerospace or aeronautic domain, the calculation of filters in telecommunications or in control systems, or again in seismic signal processing; the list is not exhaustive. Furthermore, the study of discrete signals and the results obtained for them lend themselves easily to computer implementation. In their book, the authors have taken pains to stress educational aspects, preferring this to displays of erudition; all of the preliminary mathematics and probability theory necessary for a sound understanding of optimal filtering is treated in a rigorous fashion. It should not be necessary to turn to other works to acquire a sound knowledge of the subjects studied. Thanks to this work, the reader will be able not only to understand discrete optimal filtering but also to go more deeply into the different aspects of this wide field of study.
Introduction
The object of this book is to present the bases of discrete optimal filtering in a progressive and rigorous manner. The optimal character is to be understood in the sense that the criterion chosen is always the minimum of the L2 norm of the error. Chapter 1 tackles random vectors, their principal definitions and properties. Chapter 2 covers the subject of Gaussian vectors. Given the practical importance of this notion, the definitions and results are accompanied by numerous commentaries and explanatory diagrams. Chapter 3 is by its very nature more "physical" than the preceding ones and can be considered as an introduction to digital filtering. Results that will be essential for what follows are given there. Chapter 4 provides the prerequisites essential for the construction of optimal filters. The results obtained on projections in Hilbert spaces constitute the cornerstone of future demonstrations. Chapter 5 covers the Wiener filter, a device well adapted to processing second order stationary signals. Practical calculations of such filters, with finite or infinite impulse response, are developed. Adaptive filtering, which is the subject of Chapter 6, can be considered as a relatively direct application of the deterministic or stochastic gradient method. At the end of the process of adaptation or convergence, the Wiener filter is encountered again.
The book is completed with a study of Kalman filtering, which allows stationary or non-stationary signal processing; from this point of view we can say that it generalizes Wiener's optimal filter. Each chapter is complemented by a series of exercises with answers, and worked examples are also supplied using Matlab software, which is well adapted to signal processing problems.
Chapter 1
Random Vectors
1.1. Definitions and general properties

If we remember that ℝⁿ = { x = (x_1,…,x_n) | x_j ∈ ℝ ; j = 1 to n }, the set of real n-tuples, can be fitted with the two laws
\[
\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n : (x,y) \mapsto x+y
\qquad\text{and}\qquad
\mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n : (\lambda,x) \mapsto \lambda x ,
\]
making it a vector space of dimension n. The basis implicitly considered on ℝⁿ will be the canonical base e_1 = (1,0,…,0), …, e_n = (0,…,0,1), and x ∈ ℝⁿ expressed in this base will be denoted
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\quad\text{(or } x^T = (x_1,\dots,x_n)\text{)}.
\]
Definition of a real random vector

Beginning with a basic definition, without concerning ourselves at the moment with its rigor: we can say simply that a real vector
\[
X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}
\]
linked to a physical or biological phenomenon is random if the value taken by this vector is unknown while the phenomenon is not completed.

For typographical reasons, the vector will instead be written X^T = (X_1,…,X_n), or even X = (X_1,…,X_n) when there is no risk of confusion.

In other words, given a random vector X and B ⊂ ℝⁿ, we do not know if the assertion (also called the event) (X ∈ B) is true or false. However, we do usually know the "chance" that X ∈ B; this is denoted P(X ∈ B) and is called the probability of the event (X ∈ B).

After completion of the phenomenon, the result (also called the realization) will be denoted
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\quad\text{or}\quad x^T = (x_1,\dots,x_n) \quad\text{or even}\quad x = (x_1,\dots,x_n)
\]
when there is no risk of confusion.
An exact definition of a real random vector of dimension n will now be given. We take as given that:

– Ω = basic space. This is the set of all possible results (or tests) ω linked to a random phenomenon.

– a = σ-algebra (of events) on Ω, recalling the axioms:
1) Ω ∈ a;
2) if A ∈ a then the complementary Aᶜ ∈ a;
3) if (A_j, j ∈ J) is a countable family of events then ∪_{j∈J} A_j is an event, i.e. ∪_{j∈J} A_j ∈ a.

– ℝⁿ = space of observables.

– B(ℝⁿ) = Borel algebra on ℝⁿ; this is the smallest σ-algebra on ℝⁿ which contains all the open sets of ℝⁿ.

DEFINITION.– X is said to be a real random vector of dimension n defined on (Ω, a) if X is a measurable mapping (Ω, a) → (ℝⁿ, B(ℝⁿ)), i.e.
\[
\forall B \in \mathcal{B}(\mathbb{R}^n) \qquad X^{-1}(B) \in a .
\]

When n = 1 we talk about a random variable (r.v.).

In the following, the event X⁻¹(B) is also denoted {ω | X(ω) ∈ B}, and even more simply (X ∈ B).
PROPOSITION.– In order for X to be a real random vector of dimension n (i.e. a measurable mapping (Ω, a) → (ℝⁿ, B(ℝⁿ))), it is necessary and sufficient that each component X_j, j = 1 to n, is a real r.v. (i.e. a measurable mapping (Ω, a) → (ℝ, B(ℝ))).

ABRIDGED DEMONSTRATION.– It suffices to consider X⁻¹(B_1 × … × B_n) where B_1,…,B_n ∈ B(ℝ), as we show that B(ℝⁿ) = B(ℝ) ⊗ … ⊗ B(ℝ), where B(ℝ) ⊗ … ⊗ B(ℝ) denotes the σ-algebra generated by the measurable blocks B_1 × … × B_n.

Now
\[
X^{-1}(B_1 \times \dots \times B_n) = X_1^{-1}(B_1) \cap \dots \cap X_n^{-1}(B_n),
\]
which belongs to a if and only if each term belongs to a, that is to say if each X_j is a real r.v.
DEFINITION.– X = X_1 + iX_2 is said to be a complex random variable defined on (Ω, a) if the real and imaginary parts X_1 and X_2 are real random variables, that is to say if X_1 and X_2 are measurable mappings (Ω, a) → (ℝ, B(ℝ)).

EXAMPLE.– The complex r.v.
\[
e^{\,i\sum_j u_j X_j} = \cos\Big(\sum_j u_j X_j\Big) + i\,\sin\Big(\sum_j u_j X_j\Big)
\]
can be associated with a real random vector X = (X_1,…,X_n) and a real n-tuple u = (u_1,…,u_n) ∈ ℝⁿ. The study of this random variable will be taken up again when we define the characteristic functions.

Law P_X of the random vector X
First of all we assume that the σ-algebra a is provided with a measure P, i.e. a mapping P : a → [0,1] verifying:

1) P(Ω) = 1;
2) for every countable family (A_j, j ∈ J) of pairwise disjoint events:
\[
P\Big(\bigcup_{j\in J} A_j\Big) = \sum_{j\in J} P(A_j).
\]

DEFINITION.– We call the law of the random vector X the "image measure P_X of P through the mapping X", i.e. the measure on B(ℝⁿ) defined in the following way: ∀B ∈ B(ℝⁿ),
\[
P_X(B) \;=\; \int_B dP_X(x_1,\dots,x_n) \;\overset{\text{def}}{=}\; P\big(X^{-1}(B)\big) \;=\; P\big(\{\omega \mid X(\omega) \in B\}\big) \;=\; P(X \in B).
\]

Terms 1 and 2 on the one hand and terms 3, 4 and 5 on the other are different notations of the same mathematical notion.

Figure 1.1. Measurable mapping X : (Ω, a) → (ℝⁿ, B(ℝⁿ)); for B ∈ B(ℝⁿ), X⁻¹(B) ∈ a
It is important to observe that, as the measure P is given on a, P_X(B) is calculable for all B ∈ B(ℝⁿ) because X is measurable.

The space ℝⁿ, provided with the Borel algebra B(ℝⁿ) and then with the P_X law, is denoted (ℝⁿ, B(ℝⁿ), P_X).
NOTE.– As far as the basic and the exact definitions are concerned, the basic definition of random vectors is obviously a lot simpler and more intuitive and can happily be used in basic applications of probability calculations. On the other hand, in more theoretical or sophisticated studies, and notably in those calling into play several random vectors X, Y, Z,…, considering the latter as mappings defined on the same space (Ω, a), i.e.
\[
X, Y, Z, \dots : (\Omega, a) \to (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)),
\]
will often prove to be useful, even indispensable.

Figure 1.2. Family of measurable mappings X, Y, Z defined on Ω with values in ℝⁿ

In effect, the expressions and calculations calling into play several (or the entirety) of these vectors can be written without ambiguity using the space (Ω, a, P). Precisely, the events linked to X, Y, Z,… are among the elements A of a (and the probabilities of these events are measured by P).
Let us give two examples:

1) If there are 2 random vectors X, Y : (Ω, a, P) → (ℝⁿ, B(ℝⁿ)), and given B and B′ ∈ B(ℝⁿ), the event (X ∈ B) ∩ (Y ∈ B′) (for example) can be translated by X⁻¹(B) ∩ Y⁻¹(B′) ∈ a;

2) there are 3 r.v. X, Y, Z : (Ω, a, P) → (ℝ, B(ℝ)), and given a ∈ ℝ*₊, let us try to express the event (Z ≥ a − X − Y). Let us state U = (X, Y, Z) and
\[
B = \{(x,y,z) \in \mathbb{R}^3 \mid x+y+z \ge a\},
\]
where B, a Borel set of ℝ³, represents the half-space bounded by the plane (Π) not containing the origin 0 and based on the triangle A B C.

Figure 1.3. Example of a Borel set of ℝ³: the half-space bounded by the plane through A(a), B(a), C(a)

U is (Ω, a) → (ℝ³, B(ℝ³)) measurable and:
\[
(Z \ge a - X - Y) = (U \in B) = U^{-1}(B) \in a .
\]
NOTE ON THE SPACE (Ω, a, P).– We said that if we took as given Ω, then a on Ω, and then P on a, and so on, we would consider the vectors X, Y, Z,… as measurable mappings:
\[
(\Omega, a, P) \to (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)).
\]

This way of introducing the different concepts is the easiest to understand, but it rarely corresponds to real probability problems. In general, (Ω, a, P) is not specified, or is even given before "X, Y, Z,… measurable mappings". On the contrary, given the random physical or biological quantities X, Y, Z,… of ℝⁿ, it is on departing from the latter that (Ω, a, P) and X, Y, Z,…, measurable mappings defined on (Ω, a, P), are simultaneously introduced. (Ω, a, P) is an artificial space intended to serve as a link between X, Y, Z,…

What has just been set out may seem exceedingly abstract, but fortunately the general random vectors as they have just been defined are rarely used in practice. In any case, and as far as we are concerned, we will only have to manipulate in what follows the far more specific and concrete notion of a "random vector with a density function".

DEFINITION.– We say that the law P_X of the random vector X has a density if there is a mapping f_X : (ℝⁿ, B(ℝⁿ)) → (ℝ, B(ℝ)), positive and measurable, called the density of P_X, such that ∀B ∈ B(ℝⁿ):
\[
P(X \in B) = P_X(B) = \int_B dP_X(x_1,\dots,x_n) = \int_B f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n .
\]
VOCABULARY.– Sometimes we write
\[
dP_X(x_1,\dots,x_n) = f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n
\]
and we say also that the measure P_X admits the density f_X with respect to the Lebesgue measure on ℝⁿ. We also say that the random vector X admits the density f_X.

NOTE.–
\[
\int_{\mathbb{R}^n} f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n = P(X \in \mathbb{R}^n) = 1 .
\]

For example, let the random vector be X = (X_1, X_2, X_3) of density f_X(x_1, x_2, x_3) = K x_3 1_Δ(x_1, x_2, x_3), where Δ is the half-ball defined by x_1² + x_2² + x_3² ≤ R² with x_3 ≥ 0. We easily obtain, via a passage through spherical coordinates:
\[
1 = \int_\Delta K x_3\, dx_1\, dx_2\, dx_3 = K\,\frac{\pi R^4}{4},
\qquad\text{hence}\qquad K = \frac{4}{\pi R^4}.
\]
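The normalization constant can be checked numerically. The book's worked examples use MATLAB; the sketch below is only an illustrative NumPy Monte Carlo check (with an arbitrarily chosen R) that the density K x_3 1_Δ integrates to 1 when K = 4/(π R⁴).

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2.0                          # arbitrary radius for the check
K = 4.0 / (np.pi * R**4)         # constant obtained above

# Monte Carlo: draw points uniformly in the bounding box [-R,R] x [-R,R] x [0,R]
N = 2_000_000
pts = rng.uniform([-R, -R, 0.0], [R, R, R], size=(N, 3))
inside = (pts**2).sum(axis=1) <= R**2          # indicator of the half-ball
box_volume = (2 * R) * (2 * R) * R

# Estimate of the integral of K*x3 over the half-ball
integral = box_volume * np.mean(K * pts[:, 2] * inside)
print(integral)    # should be close to 1
```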
Marginals
Let the random vector be X = (X_1,…,X_n), which has the law P_X and density of probability f_X.

DEFINITION.– The r.v. X_j, which is the j-th component of X, is called the j-th marginal of X, and the law P_{X_j} of X_j is called the law of the j-th marginal.

If we know P_X, we know how to find the laws P_{X_j}.
In effect, ∀B ∈ B(ℝ):
\[
P(X_j \in B) = P\big[(X_1 \in \mathbb{R}) \cap \dots \cap (X_j \in B) \cap \dots \cap (X_n \in \mathbb{R})\big]
= \int_{\mathbb{R}\times\dots\times B \times\dots\times \mathbb{R}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_j \dots dx_n
\]
and, using the Fubini theorem:
\[
= \int_B dx_j \int_{\mathbb{R}^{n-1}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_n \ (\text{except } dx_j).
\]

The equality applying for all B, we obtain:
\[
f_{X_j}(x_j) = \int_{\mathbb{R}^{n-1}} f_X(x_1,\dots,x_j,\dots,x_n)\, dx_1 \dots dx_n \ (\text{except } dx_j).
\]

NOTE.– Reciprocally: except in the case of independent components, the knowledge of the marginal laws P_{X_j} does not lead to that of P_X.
EXAMPLE.– Let us consider:

1) A Gaussian pair Z^T = (X, Y) of density of probability
\[
f_Z(x,y) = \frac{1}{2\pi} \exp\Big(-\frac{x^2+y^2}{2}\Big).
\]
We obtain the densities of the marginals:
\[
f_X(x) = \int_{-\infty}^{+\infty} f_Z(x,y)\, dy = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x^2}{2}\Big)
\quad\text{and}\quad
f_Y(y) = \int_{-\infty}^{+\infty} f_Z(x,y)\, dx = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big).
\]

2) A second, non-Gaussian random pair W^T = (U, V), whose density of probability f_W is defined by:
\[
f_W(u,v) = 2 f_Z(u,v) \ \text{if } uv \ge 0,
\qquad
f_W(u,v) = 0 \ \text{if } uv < 0 .
\]
Let us calculate the marginals:
\[
f_U(u) = \int_{-\infty}^{+\infty} f_W(u,v)\, dv =
\begin{cases}
\displaystyle\int_{-\infty}^{0} 2 f_Z(u,v)\, dv & \text{if } u \le 0,\\[2mm]
\displaystyle\int_{0}^{+\infty} 2 f_Z(u,v)\, dv & \text{if } u > 0,
\end{cases}
\]
from which we easily come to
\[
f_U(u) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{u^2}{2}\Big),
\qquad\text{and in addition}\qquad
f_V(v) = \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{v^2}{2}\Big).
\]

CONCLUSION.– We can clearly see from this example that the marginal densities (identical in 1 and 2) do not determine the density of the vector (different in 1 and 2).
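This conclusion can be observed experimentally. The following NumPy sketch is not from the book: it samples the pair W by a construction of our own (a pair of half-normals given a common random sign, which reproduces the density 2 f_Z on the quadrants uv ≥ 0) and checks that each marginal looks standard normal even though the pair is clearly not Gaussian.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Sample W = (U, V): half-normal amplitudes sharing one random sign
n = 200_000
a, b = np.abs(rng.standard_normal(n)), np.abs(rng.standard_normal(n))
sign = rng.choice([-1.0, 1.0], size=n)
U, V = sign * a, sign * b

# Each marginal is compatible with N(0,1)...
print(stats.kstest(U, 'norm').pvalue, stats.kstest(V, 'norm').pvalue)  # typically not small
# ...yet the pair is not Gaussian: U and V always share the same sign
print(np.mean(U * V > 0))   # close to 1.0, impossible for a Gaussian pair with these marginals
```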
Probability distribution function
DEFINITION.– We call the mapping
\[
F_X : (x_1,\dots,x_n) \in \mathbb{R}^n \;\to\; F_X(x_1,\dots,x_n) \in [0,1]
\]
the distribution function of the random vector X^T = (X_1,…,X_n). It is defined by:
\[
F_X(x_1,\dots,x_n) = P\big((X_1 \le x_1) \cap \dots \cap (X_n \le x_n)\big)
\]
and, in integral form, since X is a vector with a probability density:
\[
F_X(x_1,\dots,x_n) = \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} f_X(u_1,\dots,u_n)\, du_1 \dots du_n .
\]

Some general properties:
– ∀j = 1 to n, the mapping x_j → F_X(x_1,…,x_n) is non-decreasing;
– F_X(x_1,…,x_n) → 1 when all the variables x_j → +∞;
– F_X(x_1,…,x_n) → 0 if at least one of the variables x_j → −∞;
– if (x_1,…,x_n) → f_X(x_1,…,x_n) is continuous, then
\[
\frac{\partial^n F_X}{\partial x_n \dots \partial x_1} = f_X .
\]
EXERCISE.– Determine the probability distribution function of the pair (X, Y) of density f(x, y) = K xy on the rectangle Δ = [1,3] × [2,4], and state precisely the value of K.
Independence
DEFINITION.– We say that a family of r.v. X_1,…,X_n is an independent family if ∀J ⊂ {1,2,…,n} and for every family of B_j ∈ B(ℝ):
\[
P\Big(\bigcap_{j\in J} (X_j \in B_j)\Big) = \prod_{j\in J} P(X_j \in B_j).
\]

As ℝ ∈ B(ℝ), it is easy to verify, by making certain Borel sets equal to ℝ, that the definition of independence is equivalent to the following:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P\Big(\bigcap_{j=1}^{n} (X_j \in B_j)\Big) = \prod_{j=1}^{n} P(X_j \in B_j),
\]
again equivalent to:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P(X \in B_1 \times \dots \times B_n) = \prod_{j=1}^{n} P(X_j \in B_j),
\]
i.e., by introducing the laws of probabilities:
\[
\forall B_j \in \mathcal{B}(\mathbb{R}) :\quad
P_X(B_1 \times \dots \times B_n) = \prod_{j=1}^{n} P_{X_j}(B_j).
\]

NOTE.– This law of probability P_X (defined on B(ℝⁿ) = B(ℝ) ⊗ … ⊗ B(ℝ)) is the tensor product of the laws of probabilities P_{X_j} (defined on B(ℝ)). Symbolically we write this as P_X = P_{X_1} ⊗ … ⊗ P_{X_n}.
NOTE.– Let X_1,…,X_n be a family of r.v. If this family is independent, the r.v. are independent pairwise, but the converse is false.

PROPOSITION.– Let X = (X_1,…,X_n) be a real random vector admitting the density of probability f_X, and whose components X_1,…,X_n admit the densities f_{X_1},…,f_{X_n}. In order for the family of components to be an independent family, it is necessary and sufficient that:
\[
f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j).
\]

DEMONSTRATION.– In the simplified case where f_X is continuous:

– If (X_1,…,X_n) is an independent family:
\[
F_X(x_1,\dots,x_n) = P\Big(\bigcap_{j=1}^{n} (X_j \le x_j)\Big) = \prod_{j=1}^{n} P(X_j \le x_j) = \prod_{j=1}^{n} F_{X_j}(x_j)
\]
and, by differentiating the two extreme members:
\[
f_X(x_1,\dots,x_n) = \frac{\partial^n F_X(x_1,\dots,x_n)}{\partial x_n \dots \partial x_1} = \prod_{j=1}^{n} \frac{\partial F_{X_j}(x_j)}{\partial x_j} = \prod_{j=1}^{n} f_{X_j}(x_j)\,;
\]

– reciprocally, if \(f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j)\), then for all B_j ∈ B(ℝ), j = 1 to n:
\[
P\Big(\bigcap_{j=1}^{n} (X_j \in B_j)\Big) = P\Big(X \in \prod_{j=1}^{n} B_j\Big)
= \int_{\prod_{j=1}^{n} B_j} f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n
\]
\[
= \int_{\prod_{j=1}^{n} B_j} \prod_{j=1}^{n} f_{X_j}(x_j)\, dx_j
= \prod_{j=1}^{n} \int_{B_j} f_{X_j}(x_j)\, dx_j
= \prod_{j=1}^{n} P(X_j \in B_j).
\]

NOTE.– The equality \(f_X(x_1,\dots,x_n) = \prod_{j=1}^{n} f_{X_j}(x_j)\) expresses that the function of n variables f_X is the tensor product of the functions of one variable f_{X_j}. Symbolically we write f_X = f_{X_1} ⊗ … ⊗ f_{X_n} (not to be confused with the ordinary product f = f_1 f_2 ⋯ f_n defined by f(x) = f_1(x) f_2(x) ⋯ f_n(x)).
EXAMPLE.– Let the random pair X = (X_1, X_2) have density
\[
\frac{1}{2\pi} \exp\Big(-\frac{x_1^2+x_2^2}{2}\Big).
\]
As
\[
\frac{1}{2\pi} \exp\Big(-\frac{x_1^2+x_2^2}{2}\Big)
= \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x_1^2}{2}\Big)\cdot\frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{x_2^2}{2}\Big),
\]
and as \(\frac{1}{\sqrt{2\pi}} \exp(-x_1^2/2)\) and \(\frac{1}{\sqrt{2\pi}} \exp(-x_2^2/2)\) are the densities of X_1 and of X_2, these two components X_1 and X_2 are independent.
DEFINITION.– Two random vectors X = (X_1,…,X_n) and Y = (Y_1,…,Y_p) are said to be independent if ∀B ∈ B(ℝⁿ) and B′ ∈ B(ℝᵖ):
\[
P\big((X \in B) \cap (Y \in B')\big) = P(X \in B)\, P(Y \in B').
\]

The sum of independent random variables
NOTE.– We are frequently led to calculate the probability P that a function of n given r.v. X_1,…,X_n verifies a certain inequality. Let us denote this probability P(inequality), and let us assume that the random vector X = (X_1,…,X_n) possesses a density of probability f_X(x_1,…,x_n). The method of obtaining P(inequality) consists of determining the set B ∈ B(ℝⁿ) such that the inequality is verified exactly when (X_1,…,X_n) ∈ B; we thus obtain:
\[
P(\text{inequality}) = \int_B f_X(x_1,\dots,x_n)\, dx_1 \dots dx_n .
\]

EXAMPLES.–

1)
\[
P(X_1 + X_2 \le z) = P\big((X_1,X_2) \in B\big) = \int_B f_X(x_1,x_2)\, dx_1\, dx_2
\quad\text{where}\quad B = \{(x,y) \in \mathbb{R}^2 \mid x+y \le z\};
\]

2)
\[
P(X_1 + X_2 \le a - X_3) = P\big((X_1,X_2,X_3) \in B\big) = \int_B f_X(x_1,x_2,x_3)\, dx_1\, dx_2\, dx_3
\]
where B is the half-space containing the origin 0 and limited by the plane placed on the triangle A B C, of equation x + y + z = a;

3)
\[
P\big(\mathrm{Max}(X_1, X_2) \le z\big) = P\big((X_1,X_2) \in B\big) = \int_B f_X(x_1,x_2)\, dx_1\, dx_2
\]
where B = {(x, y) ∈ ℝ² | x ≤ z and y ≤ z}.

Starting with example 1) we will show the following.
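As a hedged illustration of this recipe (not taken from the book, which uses MATLAB for its worked examples), the SciPy sketch below evaluates P(X_1 + X_2 ≤ z) for an arbitrarily chosen density, two independent standard normal components, both by integrating f_X over B and by a Monte Carlo frequency.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(5)
z = 1.0                                   # arbitrary threshold

# (a) Integrate f_X(x1, x2) = phi(x1) phi(x2) over B = {x1 + x2 <= z}
p_int, _ = integrate.dblquad(
    lambda x2, x1: stats.norm.pdf(x1) * stats.norm.pdf(x2),
    -8.0, 8.0,                            # x1 range (effectively infinite)
    -8.0, lambda x1: z - x1)              # x2 range: the half-plane x1 + x2 <= z

# (b) Monte Carlo frequency of the event
X = rng.standard_normal((10**6, 2))
p_mc = np.mean(X[:, 0] + X[:, 1] <= z)

print(p_int, p_mc, stats.norm.cdf(z / np.sqrt(2)))   # all three agree
```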
PROPOSITION.– Let X and Y be two real independent r.v. of probability densities f_X and f_Y respectively. The r.v. Z = X + Y admits a probability density f_Z defined as:
\[
f_Z(z) = (f_X * f_Y)(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx .
\]

DEMONSTRATION.– Let us start from the probability distribution function of Z:
\[
F_Z(z) = P(Z \le z) = P(X+Y \le z) = P\big((X,Y) \in B\big)
\]
(where B is defined in example 1) above)
\[
= \int_B f(x,y)\, dx\, dy \underset{\text{(independence)}}{=} \int_B f_X(x)\, f_Y(y)\, dx\, dy
= \int_{-\infty}^{+\infty} f_X(x)\, dx \int_{-\infty}^{z-x} f_Y(y)\, dy .
\]
In stating y = u − x:
\[
= \int_{-\infty}^{+\infty} f_X(x)\, dx \int_{-\infty}^{z} f_Y(u-x)\, du
= \int_{-\infty}^{z} du \int_{-\infty}^{+\infty} f_X(x)\, f_Y(u-x)\, dx .
\]
The mapping \(u \to \int_{-\infty}^{+\infty} f_X(x)\, f_Y(u-x)\, dx\) being continuous, F_Z(z) is a primitive of it, and:
\[
F_Z'(z) = f_Z(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx .
\]

NOTE.– If (for example) the support of f_X and f_Y is ℝ₊, i.e. if f_X(x) = f_X(x) 1_{[0,∞[}(x) and f_Y(y) = f_Y(y) 1_{[0,∞[}(y), we easily arrive at:
\[
f_Z(z) = \int_0^z f_X(x)\, f_Y(z-x)\, dx .
\]
EXAMPLE.– X and Y are two exponential r.v. of parameter λ which are independent. Let us take Z = X + Y.

For z ≤ 0: f_Z(z) = 0.

For z ≥ 0:
\[
f_Z(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx
= \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda(z-x)}\, dx
= \lambda^2 z\, e^{-\lambda z},
\]
and thus \(f_Z(z) = \lambda^2 z\, e^{-\lambda z}\, 1_{[0,\infty[}(z)\).
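The convolution formula is easy to check on a grid. The following NumPy sketch (not from the book; λ is chosen arbitrarily) discretizes the convolution integral and compares it with the closed form λ²z e^(−λz).

```python
import numpy as np

lam = 1.5                      # arbitrary rate for the check
dx = 1e-3
x = np.arange(0.0, 10.0, dx)   # grid covering the effective support
f = lam * np.exp(-lam * x)     # Exp(lambda) density sampled on the grid

# Discrete approximation of the convolution integral (f_X * f_Y)(z)
conv = np.convolve(f, f)[:x.size] * dx

closed_form = lam**2 * x * np.exp(-lam * x)
print(np.max(np.abs(conv - closed_form)))   # small, of order lam**2 * dx
```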
1.2. Spaces L1(dP) and L2(dP)

1.2.1. Definitions

The family of r.v. X : ω → X(ω), (Ω, a, P) → (ℝ, B(ℝ)), forms a vector space on ℝ, denoted ε. Two vector subspaces of ε play a particularly important role, and these are what will be defined.
The definitions would be in effect the final element in the construction of the Lebesgue integral of measurable mappings, but this construction will not be given here and we will be able to progress without it.

DEFINITION.– We say that two random variables X and X′ defined on (Ω, a) are almost surely equal, and we write X = X′ a.s., if X = X′ except possibly on an event N of zero probability (that is to say N ∈ a and P(N) = 0). We note:
– X = {class (of equivalence) of the r.v. X′ almost surely equal to X};
– O = {class (of equivalence) of the r.v. almost surely equal to O}.

We can now give the definition of L1(dP) as the vector space of first order random variables, and the definition of L2(dP) as the vector space of second order random variables:
\[
L^1(dP) = \Big\{ \text{r.v. } X \ \Big|\ \int_\Omega |X(\omega)|\, dP(\omega) < \infty \Big\},
\qquad
L^2(dP) = \Big\{ \text{r.v. } X \ \Big|\ \int_\Omega X^2(\omega)\, dP(\omega) < \infty \Big\}.
\]
In these expressions, the r.v. are clearly defined except on a zero probability event; otherwise said, the r.v. X are any representatives of their classes, because by construction the integrals of the r.v. are not modified if we modify the latter on zero probability events.

Note on the inequality \(\int_\Omega |X(\omega)|\, dP(\omega) < \infty\). Introducing the two positive random variables
\[
X^+ = \sup(X, 0) \quad\text{and}\quad X^- = \sup(-X, 0),
\]
we can write X = X⁺ − X⁻ and |X| = X⁺ + X⁻. Let X ∈ L1(dP); we thus have:
\[
\int_\Omega |X(\omega)|\, dP(\omega) < \infty
\;\Longleftrightarrow\;
\int_\Omega X^+(\omega)\, dP(\omega) < \infty \ \text{ and }\ \int_\Omega X^-(\omega)\, dP(\omega) < \infty .
\]
So, if X ∈ L1(dP), the integral
\[
\int_\Omega X(\omega)\, dP(\omega) = \int_\Omega X^+(\omega)\, dP(\omega) - \int_\Omega X^-(\omega)\, dP(\omega)
\]
is defined without ambiguity.

NOTE.– L2(dP) ⊂ L1(dP). In effect, given X ∈ L2(dP), following Schwarz's inequality:
\[
\Big(\int_\Omega |X(\omega)|\, dP(\omega)\Big)^2 \le \int_\Omega X^2(\omega)\, dP(\omega)\, \int_\Omega 1\, dP(\omega) < \infty .
\]
EXAMPLE.– Let X be a Gaussian r.v., with density
\[
\frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{1}{2}\Big(\frac{x-m}{\sigma}\Big)^2\Big).
\]
This belongs to L1(dP) and to L2(dP).

Let Y be a Cauchy r.v., with density
\[
\frac{1}{\pi (1+x^2)} .
\]
This does not belong to L1(dP) and thus does not belong to L2(dP) either.
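The practical consequence of non-integrability can be seen by simulation. This NumPy sketch (not from the book; the Gaussian parameters are arbitrary) compares running sample means: they stabilize for the Gaussian r.v. but keep jumping for the Cauchy r.v., whose expectation does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6

gauss = rng.normal(loc=1.0, scale=2.0, size=n)   # arbitrary m and sigma
cauchy = rng.standard_cauchy(size=n)

# Running means settle down only when E|X| < infinity
steps = [10**k for k in range(2, 7)]
print("Gaussian:", [np.mean(gauss[:k]) for k in steps])   # approaches m = 1.0
print("Cauchy:  ", [np.mean(cauchy[:k]) for k in steps])  # keeps fluctuating
```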
1.2.2. Properties
– L1(dP) is a Banach space; we will not use this property in what follows;
– L2(dP) is a Hilbert space. We give here the properties without any demonstration.

* We can equip L2(dP) with the scalar product defined by:
\[
\forall X, Y \in L^2(dP) \qquad \langle X, Y\rangle = \int_\Omega X(\omega)\, Y(\omega)\, dP(\omega).
\]
This expression is well defined because, following Schwarz's inequality:
\[
\Big(\int_\Omega |X(\omega)\, Y(\omega)|\, dP(\omega)\Big)^2 \le \int_\Omega X^2(\omega)\, dP(\omega)\, \int_\Omega Y^2(\omega)\, dP(\omega) < \infty,
\]
and the axioms of the scalar product are immediately verifiable.

* L2(dP) is a vector space normed by:
\[
\|X\| = \sqrt{\langle X, X\rangle} = \sqrt{\int_\Omega X^2(\omega)\, dP(\omega)} .
\]
It is easy to verify that:
\[
\forall X, Y \in L^2(dP) \quad \|X+Y\| \le \|X\| + \|Y\|
\qquad\text{and}\qquad
\forall X \in L^2(dP),\ \forall \lambda \in \mathbb{R} \quad \|\lambda X\| = |\lambda|\, \|X\| .
\]
As far as the second axiom is concerned:
– if X = 0 a.s., then ‖X‖ = 0;
– if ‖X‖ = \(\big(\int_\Omega X^2(\omega)\, dP(\omega)\big)^{1/2}\) = 0, then X = 0 a.s.

* L2(dP) is a complete space for the norm ‖ ‖ defined above (every Cauchy sequence X_n converges to some X ∈ L2(dP)).
We are studying a general random vector (not necessarily with a density function):
(
X = X1,..., X n
)
:
( Ω, a , P ) → (
n
,B
( )) . n
24
Discrete Stochastic Processes and Optimal Filtering
Furthermore, we give ourselves a measurable mapping:
Ψ:
(
n
,B
( n )) → (
))
,B (
Ψ X (also denoted Ψ ( X ) or Ψ ( X 1 ,..., X n ) ) is a measurable mapping (thus
an r.v.) defined on ( Ω, a ) .
X
(
( Ω, a, P )
n
,B
n
X
Ψ
Ψ X
(
( ), P )
,B (
))
DEFINITION.– Under the hypothesis Ψ X ∈ L ( dP ) , we call mathematical 1
expectation of the random value Ψ X the expression Ε ( Ψ X ) defined as:
Ε(Ψ X ) = ∫
Ω
(Ψ
or, to remind ourselves that
X )(ω ) dP (ω )
X is a vector:
Ε ( Ψ ( X 1 ,..., X 2 ) ) = ∫ Ψ ( X 1 (ω ) ,..., X n (ω ) ) dP (ω ) Ω
NOTE.– This definition of the mathematical expectation of Ψ X is well adapted to general problems or to those of a more theoretical orientation; in particular, it is 2
by using the latter that we construct L
( dP ) the Hilbert space of the second order
r.v. In practice, however, it is the PX law (similar to the measure P by the mapping
X ) and not P that we do not know. We thus want to use the law PX to
Random Vectors
25
express Ε ( Ψ X ) , and it is said that the calculation of Ε ( Ψ X ) from the space ( Ω, a,P ) to the space
(
n
,B
( ), P ) . n
X
In order to simplify the writing in the theorem that follows (and as will often
occur in the remainder of this work) ( X 1 ,..., X n ) , ( x1 ,..., xn ) and dx1...dxn will often be denoted as
X , x and dx respectively.
Transfer theorem
Let us assume Ψ X ∈ L ( dP ) ; we thus have: 1
Ε(Ψ X ) = ∫
Ω
(Ψ
X )(ω ) dP (ω ) = ∫
n
Ψ ( x ) dPX ( x )
In particular, if PX admits a density f X :
E (Ψ X ) = ∫
n
Ψ ( x ) f X ( x ) dx and Ε X = ∫ x f X ( x ) dx .
Ψ ∈ L1 ( dPX ) DEMONSTRATION.– – The equality of 2) is true if Ψ = 1B with B ∈ B
Ε ( Ψ X ) = Ε (1B X ) = PX ( B ) =∫
n 1B
( x ) dPX ( x ) = ∫
n
– The equality is still true if m
j =1
because
Ψ ( x ) dPX ( x ).
Ψ is a simple measurable mapping, that is to say if
Ψ = ∑ λ j 1B or B j ∈ B j
( n)
( ) and are pairwise disjoint. n
26
Discrete Stochastic Processes and Optimal Filtering
We have in effect:
(
m
)
m
( )
Ε ( Ψ X ) = ∑ λ j Ε 1B j X = ∑ λ j PX B j j =1
m
= ∑λj ∫
n 1B
j =1
=∫
n
( x ) dPX ( x ) = ∫ j
j =1
⎛ m ⎞ λ j 1B ( x ) ⎟ dPX ( x ) n ⎜∑ ⎜ j =1 ⎟ j ⎝ ⎠
Ψ ( x ) dPX ( x )
If we now assume that Ψ is a positive measurable mapping, we know that it is the limit of an increasing sequence of positive simple measurable mappings Ψ P .
⎛
We thus have ⎜
∫ Ω ( Ψ p X ) (ω ) = ∫
⎜ with Ψp ⎝
n
Ψ p ( x ) dPX ( x )
Ψ
Ψ p X is also a positive increasing sequence which converges to Ψ X and by taking the limits of the two members when p ↑ ∞ , we obtain, according to the monotone convergence theorem:
∫Ω (Ψ
X )(ω ) dP (ω ) = ∫
n
Ψ ( x ) dPX ( x ) .
If Ψ is a measurable mapping of any sort we still use the decomposition
Ψ = Ψ + − Ψ − and
Ψ = Ψ+ + Ψ− . +
Furthermore, it is clear that ( Ψ X ) = Ψ
+
−
X and ( Ψ X ) = Ψ − X .
It emerges that: +
−
(
) (
Ε Ψ X = Ε (Ψ X ) + Ε (Ψ X ) = Ε Ψ+ X + Ε Ψ− X
)
Random Vectors
27
i.e. according to what we have already seen:
=∫
n
Ψ + ( x ) dPX ( x ) + ∫
n
Ψ − ( x ) dPX ( x ) = ∫
n
Ψ ( x ) dPX ( x )
As Ψ X ∈ L ( dP ) , we can deduce from this that Ψ ∈ L ( dPX 1
1
(reciprocally if Ψ ∈ L ( dPX 1
In particular Ε ( Ψ X )
) then Ψ
+
)
X ∈ L1 ( dP ) ).
and Ε ( Ψ X ) are finite, and −
(
) (
Ε (Ψ X ) = Ε Ψ+ X − Ε Ψ− X =∫
n
Ψ + ( x ) dPX ( x ) − ∫
=∫
n
Ψ ( x ) dPX ( x )
n
)
Ψ − ( x ) dPX ( x )
NOTE.– (which is an extension of the preceding note) In certain works the notion of “a random vector as a measurable mapping” is not developed, as it is judged as being too abstract. In this case the integral
∫
nΨ
( x ) dPX ( x ) = ∫
n
Ψ ( x ) f X ( x ) dx
PX admits the density f X ) is given as a definition of Ε ( Ψ X ) . EXAMPLES.– 1) Let the “random Gaussian vector” be X
f X ( x1 , x2 ) =
where
ρ ∈ ]−1,1[
1 2π 1 − ρ 2
T
= ( X1 , X 2 ) of density:
⎛ 1 1 ⎞ exp ⎜ − x12 − 2 ρ x1 x2 + x22 ⎟ 2 ⎝ 2 1-ρ ⎠
(
)
and let the mapping Ψ be ( x1 , x2 ) → x1 x2
3
(if
28
Discrete Stochastic Processes and Optimal Filtering
The condition:
∫
2
x1 x23
⎛ 1 exp ⎜ − 2 ⎜ 2 1− ρ 2 2π 1 − ρ ⎝ 1
(
)
(
⎞ x12 − 2 ρ x1 x2 + x22 ⎟ dx1 dx2 < ∞ ⎟ ⎠
)
is easily verifiable and:
EX1 X 23 = ∫
x x3 2 1 2
⎛ 1 exp ⎜ − ⎜ 2 2 1− ρ2 2n 1 − ρ ⎝ 1
(
)
(
⎞ x12 − 2 ρ x1 x2 + x22 ⎟ dx1dx2 ⎟ ⎠
)
2) Given a random Cauchy variable of density
1
π∫
x
fX ( x) =
1
1 π 1 + x2
1 dx = +∞ thus X ∉ L1 ( dP ) and EX is not defined. 1 + x2
Let us consider next the transformation Ψ which consists of “rectifying and clipping” the r.v. X .
Ψ
K
−K
0
K
x
Figure 1.4. Rectifying and clipping operation
Random Vectors
Ψ ( x ) dPX ( x ) =
∫
1
K
K
−K
∞
29
K
∫ − K x 1 + x 2 dx + ∫ −∞ 1 + x 2 dx + ∫ K 1 + x2 dx
⎛π ⎞ = ln 1 + K 2 + 2 K ⎜ − K ⎟ < ∞. ⎝2 ⎠
(
)
Thus, Ψ X ∈ L ( dP ) and: 1
Ε(Ψ X ) = ∫
+∞ −∞
⎛π ⎞ Ψ ( x ) dPX ( x ) = ln 1 + K 2 + 2 K ⎜ − K ⎟ . ⎝2 ⎠
DEFINITION.– Given np r.v. X jk
(
( j = 1 at
)
p, k = 1 at n ) ∈ L1 ( dP ) , we
⎛ X 11 … X 1n ⎞ ⎜ ⎟ define the mean of the matrix ⎡⎣ X jk ⎤⎦ = ⎜ ⎟ by: ⎜ X p1 X pn ⎟⎠ ⎝ ⎛ ΕX 11 … ΕX 1n ⎞ ⎜ ⎟ Ε ⎡⎣ X jk ⎤⎦ = ⎜ ⎟. ⎜ ΕX p1 ΕX pn ⎟⎠ ⎝ In particular, given a random vector:
⎛ X1 ⎞ ⎜ ⎟ X = ⎜ ⎟ or X T = ( X 1 ,..., X n ) verifying X j ∈ L1 ( dP ) ∀j = 1 at n , ⎜X ⎟ ⎝ n⎠
(
)
30
Discrete Stochastic Processes and Optimal Filtering
⎛ EX 1 ⎞ ⎜ ⎟ ⎡ T⎤ We state Ε [ X ] = ⎜ ⎟ or Ε ⎣ X ⎦ = ( EX1 ,..., ΕX n ) . ⎜ EX ⎟ ⎝ n⎠
(
)
Mathematical expectation of a complex r.v.
DEFINITION.– Given a complex r.v. X = X 1 +i X 2 , we say that:
X ∈ L1 ( dP ) if X 1 and X 2 ∈ L1 ( dP ) . If X ∈ L ( dP ) we define its mathematical expectation as: 1
Ε ( X ) = ΕX 1 + i Ε X 2 . Transformation of random vectors
We are studying a real random vector X = ( X 1 ,..., X n ) with a probability
density of f X ( x )1D ( x ) = f X ( x1 ,..., xn ) 1D ( x1 ,..., xn ) where D is an open set of
n
.
Furthermore, we give ourselves the mapping:
α : x = ( x1 ,..., xn ) → y = α ( x ) = (α1 ( x1 ,..., xn ) ,...,α n ( x1 ,..., xn ) ) ∆
D We assume that that
α
α
1
is a C -diffeomorphism of D on an open ∆ of
is bijective and that
α
and
β =α
−1
1
are of class C .
n
, i.e.
Random Vectors
X
α
31
Y =α (X )
∆
D Figure 1.5. Transformation of a random vector
The random vector Y = (Y1 ,..., Yn ) =
X
by a
C1 -diffeomorphism
(α1 ( X1,..., X n ) ,...,α n ( X1,..., X n ) )
takes its values on ∆ and we wish to determine fY ( y )1∆ ( y ) , its probability density. PROPOSITION.–
fY ( y )1∆ ( y ) = f X ( β ( y ) ) Det J β ( y ) 1∆ ( y ) DEMONSTRATION.– Given:
Ψ ∈ L1 ( dy )
Ε ( Ψ (Y ) ) = ∫
n
Ψ ( y ) fY ( y )1∆ ( y ) dy .
Furthermore:
Ε ( Ψ (Y ) ) = ΕΨ (α ( X ) ) = ∫
n
Ψ (α ( x ) ) f X ( x )1D ( x ) dx .
By applying the change of variables theorem in multiple integrals and by denoting the Jacobian matrix of the mapping
=∫
n
β
as J β ( y ) , we arrive at:
Ψ ( y ) f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) dy .
32
Discrete Stochastic Processes and Optimal Filtering
Finally, the equality:
∫ n Ψ ( y ) fY ( y )1∆ ( y ) dy = ∫ n Ψ ( y ) f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) dy has validity for all Ψ ∈ L ( dy ) ; we deduce from it, using Haar’s lemma, the 1
formula we are looking for:
fY ( y )1∆ ( y ) = f X ( β ( y ) ) Dét J β ( y ) 1∆ ( y ) IN PARTICULAR.– If X is an r.v. and the mapping:
α : x → y = α ( x) D⊂
Α⊂
the equality of the proposition becomes:
fY ( y )1∆ ( y ) = f X ( β ( y ) ) β ′ ( y ) 1∆ ( y ) EXAMPLE.– Let the random ordered pair be Z = ( X , Y ) of probability density:
f Z ( x, y ) =
1 x y
1 2 2 D
( x, y )
where
D = ]1, ∞[ × ]1, ∞[ ⊂
2
Random Vectors 1
Furthermore, we allow the C -diffeomorphism
33
α:
α
β
D 1
∆ 1
0
x
1
0
u
1
defined by:
⎛ ⎜ ⎜ ⎜ ⎜ ⎜⎜ ⎝
α : ( x, y ) → ( u = α1 ( x, y ) = xy , v = α 2 ( x, y ) = x y ) ∈D
∈∆
(
β : ( u, v ) → x = β1 ( u, v ) = uv , y = β 2 ( u, v ) = u v ∈∆
)
∈D
⎛ v u 1⎜ J β ( u, v ) = ⎜ 2⎜ 1 ⎜ uv ⎝
⎞ v ⎟ 1 ⎟ and Det J β ( u , v ) = . u⎟ 2 v − 3 ⎟ v 2⎠ u
(
The vector W = U = X Y , V = X
Y
) thus admits the probability density:
fW ( u , v )1∆ ( u , v ) = f Z ( β1 ( u , v ) , β 2 ( u , v ) ) Det J β ( u, v ) 1∆ ( u, v ) =
(
1 uv
)
1
2
( uv )
2
1 1∆ ( u, v ) = 12 1∆ ( u, v ) 2v 2u v
34
Discrete Stochastic Processes and Optimal Filtering
NOTE.–
Reciprocally
W = (U , V )
vector
of
probability
density
fW ( u , v ) 1∆ ( u , v ) and whose components are dependent is transformed by β
into vector Z = ( X , Y ) of probability density f Z ( x, y ) 1D ( x, y ) and whose components are independent.
1.3.2. Characteristic functions of a random vector
DEFINITION.– We call the characteristic function of a random vector:
X T = ( X1 ... X n ) the mapping ϕ X : ( u1 ,..., u2 ) → ϕ X ( u1 ,..., u2 ) defined by: n
⎛ ⎜ ⎝
n
⎞ ⎟ ⎠
ϕ X ( u1 ,..., un ) = Ε exp ⎜ i ∑ u j X j ⎟ =∫
j =1
⎛ n ⎞ exp ⎜⎜ i ∑ u j x j ⎟⎟ f X ( x1 ,...xn ) dx1... dxn n ⎝ j =1 ⎠
(The definition of ΕΨ ( X 1 ,..., X n ) is written with:
⎛ n ⎞ Ψ ( X 1 ,..., X n ) = exp ⎜ i ∑ u j X j ⎟ ⎜ j =1 ⎟ ⎝ ⎠ and the integration theorem is applied with respect to the image measure.)
ϕX
is thus the Fourier transform of
ϕX = F ( fX )
.
fX
which can be denoted
Random Vectors
35
(In analysis, it is preferable to write:
F ( f X ) ( u1 ,..., un ) = ∫
n ⎛ ⎞ exp − i u x f u ,..., un ) dx1... dxn . ) ⎜ j j⎟ n ⎜ ∑ ⎟ X( 1 ⎝ j =1 ⎠
Some general properties of the Fourier transform: –
ϕ X ( u1 ,...u2 ) ≤ ∫
n
f X ( x1 ,..., xn ) dx1... dxn = ϕ X ( 0,..., 0 ) = 1 ;
– the mapping ( u1 ,..., u2 ) → ϕ X ( u1 ,..., u2 ) is continuous; n
– the mapping F : f X → ϕ X is injective. Very simple example
[
]n
The random vector X takes its values from within the hypercube ∆ = −1,1 and it admits a probability density:
f X ( x1 ,..., xn ) =
1 1 ∆( x1 ,..., xn ) 2n
(note that components X j are independent).
1 exp i ( u1 x1 + ... + un xn ) dx1...dxn 2n ∫ ∆ n sin u 1 n +1 j = n ∏ ∫ exp iu j x j dx j = ∏ uj 2 j =1 −1 j =1
ϕ ( u1 ,..., un ) =
(
)
where, in this last expression and thanks to the extension by continuity, we replace:
sin u1 sin u2 by 1 if u1 = 0 , by 1 if u2 = 0 ,... u1 u2
36
Discrete Stochastic Processes and Optimal Filtering
Fourier transform inversion
F
fX
ϕX
F −1 As shall be seen later in the work, there are excellent reasons (simplified calculations) for studying certain questions using characteristic functions rather than probability densities, but we often need to revert back to densities. The problem which arises is that of the invertibility of the Fourier transform F , which is studied in specialized courses. It will be enough here to remember one condition. PROPOSITION.– If (i.e.
∫
n
ϕ X ( u1 ,..., un ) du1...dun < ∞
ϕ X ∈ L1 ( du1...dun ) ), f X ( x1 ,..., xn ) =
1
( 2π )n
then F
∫
−1
exists and:
⎛ n ⎞ exp − i u x ϕ ⎜ j j⎟ n ⎜ ∑ ⎟ X = 1 j ⎝ ⎠
( u1 ,..., un ) du1...dun .
In addition, the mapping ( x1 ,..., xn ) → f X ( x1 ,..., xn ) is continuous. EXAMPLE.–
Given
a
Gaussian
r.v.
(
)
X ∼ Ν m, σ 2 ,
i.e.
that
⎛ 1 ⎛ x − m ⎞2 ⎞ 1 exp ⎜ − ⎜ ⎟ and assuming that σ ≠ 0 we obtain ⎜ 2 ⎝ σ ⎟⎠ ⎟ 2πσ ⎝ ⎠ 2 2 ⎛ uσ ⎞ ϕ X ( u ) = exp ⎜ ium − ⎟. 2 ⎝ ⎠ fX ( x) =
It is clear that ϕ X ∈ L1 ( du ) and
fX ( x) =
1 2π
+∞
∫ −∞ exp ( −iux ) ϕ X ( u ) du .
Random Vectors
37
Properties and mappings of characteristic functions 1) Independence
PROPOSITION.– In order for the components X j of the random vector
X T = ( X 1 ,..., X n ) to be independent, it is necessary and sufficient that: n
ϕ X ( u1 ,..., un ) = ∏ ϕ X ( u j ) . j
j =1
DEMONSTRATION.– Necessary condition:
⎛ ⎜ ⎝
⎞ ⎟ ⎠
n
ϕ X ( u1 ,..., un ) = ∫ exp ⎜ i ∑ u j x j ⎟ f X ( x1 ,..., xn ) dx1...dxn n
j =1
Thanks to the independence:
=∫
n ⎛ n ⎞ n exp i u x f x dx ... dx = ⎜ ⎟ ( ) ∏ϕ X j (u j ) . 1 j j ⎟∏ X j n n ⎜ ∑ j j =1 ⎝ j =1 ⎠ j =1
Sufficient condition: we start from the hypothesis:
∫ =∫
⎛
i n exp ⎜ ⎜
n
⎞
∑ u j x j ⎟⎟ f x ( x1,..., xn ) dx1... dxn
⎝ j =1 ⎠ ⎛ n ⎞ n exp i u x x j dx1... dxn ⎜ ⎟ j j ⎟∏ f X n ⎜ ∑ j = 1 j = 1 j ⎝ ⎠
( )
38
Discrete Stochastic Processes and Optimal Filtering
from which we deduce: f X ( x1 ,..., xn ) =
n
∏ f X j ( x j ) , i.e. the independence, j =1
since the Fourier transform f X ⎯⎯ → ϕ X is injective. NOTE.– We must not confuse this result with that which concerns the sum of independent r.v. and which is stated in the following manner. n
If X 1 ,..., X n are independent r.v., then
ϕ∑ X ( u ) = ∏ ϕ X j
j
j =1
j
(u ) .
If there are for example n independent random variables:
(
)
(
X 1 ∼ Ν m1 , σ 2 ,..., X n ∼ Ν mn , σ 2 and n real constants
)
λ1 ,..., λn , the note above enables us to determine the law of
n
∑λj X j .
the random value
j =1
λj X j
In effect the r.v.
ϕ∑ j
λ X
=e
and thus
j
j
are independent and:
n
n
j =1
j =1
( )
n
( u ) = ∏ ϕλ j X j ( u ) = ∏ ϕ X j λ j u = ∏ e
1 iu ∑ λ j m j − u 2 ∑ λ 2j σ 2j 2 j j
n
⎛
j =1
⎝
⎞
∑ λ j X j ∼ Ν ⎜⎜ ∑ λ j m j , ∑ λ 2j σ 2j ⎟⎟ . j
j
⎠
j =1
1 iuλ j m j − u 2 λ 2j σ 2j 2
Random Vectors
39
2) Calculation of the moment functions of the components X j (up to the 2nd order, for example)
Let us assume
ϕX ∈C2
( ). n
In applying Lebesgue’s theorem (whose hypotheses are immediately verifiable) once we obtain:
∀K = 1 to n
∂ϕ X ( 0,..., 0 ) ∂uK
⎛ ⎞ ⎛ ⎞ = ⎜ ∫ n ixK exp ⎜ i ∑ u j x j ⎟ f X ( x1 ,..., xn ) dx1...dxn ⎟ ⎜ j ⎟ ⎜ ⎟ ⎝ ⎠ ⎝ ⎠( u1 = 0,...,un = 0 ) = i∫
n
xK f X ( x1 ,..., xn ) dx1...dxn = i Ε X K
i.e. Ε X K = −i
∂ϕ X ( 0,..., 0 ) . ∂u K
By applying this theorem a second time, we have:
∀k
and
∈ (1,2,..., n )
EX K X =
∂ 2ϕ X ( 0,..., 0 ) ∂u ∂uK
1.4. Second order random variables and vectors
Let us begin by recalling the definitions and usual properties relative to 2nd order random variables. DEFINITIONS.–
Given
X ∈ L2 ( dP )
of
probability
density
E X 2 and E X have a value. We call variance of X the expression: Var X = Ε X 2 − ( Ε X ) = E ( X − Ε X ) 2
2
fX ,
40
Discrete Stochastic Processes and Optimal Filtering
We call standard deviation of
X the expression σ ( X ) = Var X . 2
Now let two r.v. be X and Y ∈ L
( dP ) . By using the scalar product on
L2 ( dP ) defined in 1.2 we have: ΕXY = < X , Y > = ∫ X (ω ) Y (ω ) dP (ω ) Ω
and, if the vector Z = ( X , Y ) admits the density f Ζ , then:
EXY = ∫
2
xy f Z ( x, y ) dx dy .
We have already established, by applying Schwarz’s inequality, that ΕXY actually has a value. 2
DEFINITION.– Given that two r.v. are X , Y ∈ L of
( dP ) , we call the covariance
X and Y : The expression Cov ( X , Y ) = ΕXY − ΕX ΕY . Some observations or easily verifiable properties:
Cov ( X , X ) = V ar X Cov ( X , Y ) = Cov (Y , X ) – if
λ
is a real constant
Var ( λ X ) = λ 2 Var X ;
– if X and Y are two independent r.v., then Cov ( X , Y ) = 0 but the reciprocal is not true;
Random Vectors
41
– if X 1 ,..., X n are pairwise independent r.v.
Var ( X1 + ... + X n ) = Var X1 + ... + Var X n Correlation coefficients
The
Var X j (always positive) and the Cov ( X j , X K ) (positive or negative)
can take extremely high algebraic values. Sometimes it is preferable to use the (normalized) “correlation coefficients”:
ρ ( j, k ) =
Cov ( X j , X K ) Var X j
Var X K
whose properties are as follows:
ρ ( j , k ) ∈ [ −1,1] In effect, let us assume (solely to simplify its expression) that X j and X K are centered and let us study the 2nd degree trinomial in
λ.
Τ ( λ ) = Ε ( λ X j − X K ) = λ 2ΕX 2j − 2λΕ ( X j X K ) + Ε X K2 ≥ 0 2
Τ ( λ ) ≥ 0 ∀λ ∈
(
∆ = E X jXK is
negative
or
ρ ( j , k ) ∈ [ −1,1] ).
)
2
zero,
if and only if the discriminant
− Ε X 2j Ε X K2 i.e.
Cov ( X j , X K )
This is also Schwarz’s inequality.
2
≤ Var X j Var X K
(i.e.
42
Discrete Stochastic Processes and Optimal Filtering
Furthermore, we can make clear that
ρ ( j , k ) = ±1
if and only if ∃ λ 0 ∈
such that X K = λ 0 X j p.s. In effect by replacing X K with definition of
λ0 X j
in the
ρ ( j , k ) , we obtain ρ ( j , K ) = ±1 .
Reciprocally, if
ρ ( j , K ) = 1 (for example), that is to say if:
∆ = 0 , ∃ λ0 ∈
such that X K = λ 0 X j a.s.
If X j and X K are not centered, we replace in what has gone before X j by
X j − Ε X j and X K by X K − Ε X K ). 2)
If
(
Xj
and
)
Xk
are
independent,
Ε X j Xk = Ε X j Ε Xk
so
Cov X j , X k = 0 , ρ ( j , k ) = 0 . However, the reciprocity is in general false, as is proven in the following example. Let Θ be a uniform random variable on
f Θ (θ ) =
[0 , 2 π [
that is to say
1 1 (θ ) . 2π [ 0 , 2 π [
In addition let two r.v. be X j = sin Θ and X K = c os Θ . We can easily verify that Ε X j
(
Cov X j , X k
)
and
ρ ( j , k ) are
X j and X k are dependent.
, Ε X k , Ε X j X k are zero; thus 2
2
zero. However, X j + X k = 1 and the r.v.
Random Vectors
43
Second order random vectors
DEFINITION.– We say that a random vector X 2
if X j ∈ L
( dP )
T
= ( X1 ,..., X n ) is second order
∀ j = 1 at n .
DEFINITION.– Given a second order random vector X
T
= ( X1 ,..., X n ) , we call
the covariance matrix of this vector the symmetric matrix:
… Cov ( X 1 , X n ) ⎞ ⎛ Var X 1 ⎜ ⎟ ΓX = ⎜ ⎟ ⎜ Cov ( X , X ) ⎟ X Var 1 n n ⎝ ⎠ If we return to the definition of the expectation value of a matrix of r.v., we see T that we can express it as Γ X = Ε ⎡( X − Ε X )( X − Ε X ) ⎤ .
⎣
⎦
We also can observe that Γ X −ΕX = Γ X . NOTE.– Second order complex random variables and vectors: we say that a complex random variable X = X 1 + i X 2 is second order if X 1 and
X 2 ∈ L2 ( dP ) . The covariance of two centered second order random variables, X = X 1 + i X 2 and Y = Y1 + iY2 has a natural definition:
Cov ( X , Y ) = EXY = E ( X1 + iX 2 )(Y1 − iY2 )
= E ( X 1Y1 + X 2Y2 ) + iE ( X 2Y1 − X 1Y2 )
and the decorrelation condition is thus:
E ( X 1Y1 + X 2Y2 ) = E ( X 2Y1 − X 1Y2 ) = 0 .
44
Discrete Stochastic Processes and Optimal Filtering
We say that a complex random vector X order if
T
(
)
= X 1 ,..., X j ,..., X n is second
j ∈ (1,..., n ) X j = X1 j + iX 2 j is a second order complex random
variable for the entirety. The covariance matrix of a second order complex centered random vector is defined by:
⎛ E X 2 … EX X ⎞ 1 1 n⎟ ⎜ ΓX = ⎜ ⎟ ⎜ ⎟ 2 ⎜ EX X ⎟ E X n ⎠ ⎝ n 1 If we are not intimidated by its dense expression, we can express these definitions for non-centered complex random variables and vectors without any difficulty. Let us return to real random vectors. T DEFINITION.– We call the symmetric matrix Ε ⎡ X X ⎤ the second order matrix
⎣
moment. If
⎦
X is centered Γ X = ⎡⎣ X X ⎤⎦ . T
Affine transformation of a second order vector
Let us denote the space of the matrices at p rows and at n columns as M ( p, n ) .
PROPOSITION.– Let X
T
= ( X1 ,..., X n ) be a random vector of expectation
value vector m = ( m1 ,..., mn ) and of covariance matrix Γ X . T
Furthermore
(
)
let
BT = b1 ,..., bp .
a
matrix
be
A ∈ M ( p, n )
and
a
certain
vector
Random Vectors
45
The random vector Y = A X + B possesses Αm + B as a mean value vector Τ
and Γ y = ΑΓ X Α as a covariance matrix. DEMONSTRATION.–
Ε [Y ] = Ε [ ΑX + B ] = Ε [ ΑX ] + Β = Αm + Β . In addition for example: Τ Ε ⎡( ΑX ) ⎤ = Ε ⎣⎡ X Τ ΑΤ ⎦⎤ = mΤ ΑΤ ⎣ ⎦
Τ ΓY = Γ ΑX +Β = Γ ΑX = Ε ⎡⎢ Α ( X − m ) ( Α ( X − m ) ) ⎤⎥ = ⎣ ⎦ Τ Τ Ε ⎡ Α ( X − m )( X − m ) ΑΤ ⎤ = Α Ε ⎡( X − m )( X − m ) ⎤ ΑΤ = ΑΓ X Α Τ ⎣ ⎦ ⎣ ⎦
for what follows, we will also need the easy result that follows. PROPOSITION.– Let X
T
= ( X 1 ,..., X n ) be a second order random vector, of
covariance matrix Γ Χ . Thus: ∀ Λ = ( λ1 ,..., λn ) ∈ T
n
⎛ n ⎞ Λ Τ Γ X Λ = var ⎜ ∑ λ j X j ⎟ ⎜ j =1 ⎟ ⎝ ⎠
DEMONSTRATION.–
(
)
(
Λ ΤΓ X Λ = ∑ Cov X j , X K λ j λK = ∑ Ε X j − ΕX j j,K
j,K
) ( X K − Ε X K ) λ j λK 2
2 ⎛ ⎛ ⎞⎞ ⎛ ⎞ ⎛ ⎞ = Ε ⎜ ∑ λ j X j − ΕX j ⎟ = Ε ⎜ ∑ λ j X j − Ε ⎜ ∑ λ j X j ⎟ ⎟ = Var ⎜ ∑ λ j X j ⎟ ⎜ j ⎟⎟ ⎜ j ⎟ ⎜ j ⎝ K ⎠ ⎝ ⎠⎠ ⎝ ⎠ ⎝
(
)
46
Discrete Stochastic Processes and Optimal Filtering
CONSEQUENCE.– ∀Λ ∈
n
Τ
we still have Λ Γ Χ Λ ≥ 0 .
Let us recall in this context the following algebraic definitions: – if Λ Γ X Λ > 0 ∀Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) , we say that Γ X is positive T
definite; – if ∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) such that Λ Γ X Λ = 0 , we say that Λ X is Τ
positive semi-definite. NOTE.– In this work the notion of vector appears in two different contexts and in order to avoid confusion, let us return for a moment to some vocabulary definitions. n
1) We call random vector of
(or random vector with values in
n
), every
⎛ X1 ⎞ ⎜ ⎟ n-tuple of random variables X = ⎜ ⎟ ⎜X ⎟ ⎝ n⎠ (or X
T
= ( X 1 ,..., X n ) or even X = ( X 1 ,..., X n ) ).
X is a vector in this sense that for each ω ∈ Ω , we obtain an n-tuple X (ω ) = ( X 1 (ω ) ,..., X n (ω ) ) which belongs to the vector space n . 2) Every random vector of
n
. X = ( X 1 ,..., X n ) of which all the components
X j belong to L2 ( dP ) we call a second order random vector.
In this context, the components X j themselves are vectors since they belong to the vector space L ( dP ) . 2
Thus, in what follows, when we speak of linear independence or of scalar product or of orthogonality, it is necessary to point out clearly to which vector space,
n
or L ( dP ) , we are referring. 2
Random Vectors 2
1.5. Linear independence of vectors of L
( dP ) 2
DEFINITION.– We say that n vectors X 1 ,..., X n of L
λ1 X 1 + ... + λn X n = 0
independent if
2
zero vector of L
a.s.
( dP )
( dP ) ). 2
λ1 ,..., λn are not all zero and ∃ an event A λ1 X 1 (ω ) + ... + λn X n (ω ) = 0 ∀ω ∈ A .
dependent if ∃
In particular: X 1 ,..., X n will be linearly dependent if ∃ zero such that
are linearly
⇒ λ1 = ... = λn = 0 (here 0 is the
DEFINITION.– We say that the n vectors X 1 ,..., X 2 of L such that
λ1 X 1 + ... + λn X n = 0
( dP )
are linearly
of positive probability
λ1 ,..., λn
are not all
a.s.
Examples: given the three measurable mappings:
X1, X 2 , X 3 :
([0, 2] ,B [0, 2] , dω ) → (
,B (
))
defined by:
X 1 (ω ) = ω
X 2 (ω ) = 2ω X 3 (ω ) = 3ω
47
⎫ ⎪ ⎬ on [ 0,1[ and ⎪ ⎭
X 1 (ω ) = e −(ω −1)
⎫ ⎪⎪ X 2 (ω ) = 2 ⎬ on [1, 2[ ⎪ X 3 (ω ) = −2ω + 5⎭⎪
48
Discrete Stochastic Processes and Optimal Filtering
X1 ; X2 ; X3
3
2
1
0
1
ω
2
Figure 1.6. Three random variables
The three mappings are evidently measurable and belong to L ( dω ) , so there 2
are 3 vectors of L ( dω ) . 2
There 3 vectors are linearly dependent on measurement
A = [ 0,1[ of probability
1 : 2
−5 X 1 ( ω ) + 1 X 2 ( ω ) + 1 X 3 ( ω ) = 0
∀ω ∈ A
Covariance matrix and linear independence
Let Γ X be the covariance matrix of X = ( X 1 ,..., X n ) a second order vector.
Random Vectors
49
1) If Γ X is defined as positive: X 1 = X 1 − ΕX 1 ,..., X n = X n − ΕX n are thus *
*
linearly independent vectors of L ( dP ) . 2
In effect:
⎛ ⎛ ⎞ ⎛ ⎞⎞ ΛT Γ X Λ = Var ⎜ ∑ λ j X j ⎟ = Ε ⎜ ∑ λ j X j − Ε ⎜ ∑ λ j X j ⎟ ⎟ ⎜ j ⎟ ⎝ j ⎠ ⎝ j ⎠⎠ ⎝
2
2
⎛ ⎞ = Ε ⎜ ∑ λ j ( X j − ΕX j ) ⎟ = 0 ⎝ j ⎠ That is to say:
∑λ ( X j
j
j
− ΕX j ) = 0
a.s.
This implies, since Γ X is defined positive, that
λ1 =
= λn = 0
We can also say that X 1 ,..., X n generates a hyperplane of L ( dP ) of *
*
2
(
*
*
)
dimension n that we can represent as H X 1 ,..., X n . In particular, if the r.v. X 1 ,..., X n are pairwise uncorrelated (thus a fortiori if they are stochastically independent), we have:
ΛT Γ X Λ = ∑ Var X j .λ j2 = 0 ⇒ λ1 =
= λn = 0
j
thus, in this case, Γ X is defined positive and X 1 ,..., X n are still linearly *
independent.
*
50
Discrete Stochastic Processes and Optimal Filtering
NOTE.– If Ε X X , the matrix of the second order moment function is defined as T
positive definite, then X 1 ,..., X n are linearly independent vectors of L ( dP ) . 2
2) If now Γ X is semi-defined positive:
X 1* = X 1 − ΕX 1 ,..., X n∗ = X n − ΕX n are thus linearly dependent vectors of L ( dP ) . 2
In effect:
∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 )
(
)
⎛
such that: Λ Γ X Λ = Var ⎜ T
⎝
∑λ j
j
⎞ Xj⎟=0 ⎠
That is to say:
∃Λ = ( λ1 ,..., λn ) ≠ ( 0,..., 0 ) such that
∑λ ( X j
j
j
− ΕX j ) = 0 a.s.
Example: we consider X = (X_1, X_2, X_3)^T, a second order random vector of ℝ³, admitting m = (3, −1, 2)^T for the mean value vector and

Γ_X = [ 4  2  0 ]
      [ 2  1  0 ]
      [ 0  0  3 ]

for the covariance matrix. We state that Γ_X is positive semi-definite. Taking for example Λ^T = (1, −2, 0), we verify that Λ^T Γ_X Λ = 0. Thus Var(X_1 − 2X_2 + 0·X_3) = 0 and X*_1 − 2X*_2 = 0 a.s.
When ω describes Ω, X*(ω) = (X*_1(ω), X*_2(ω), X*_3(ω))^T, random vector of ℝ³ of the 2nd order, describes the vertical plane (Π) passing through the straight line (∆) of equation x_1 = 2x_2. The vectors X*_1, X*_2, X*_3 of L²(dP) generate H(X*_1, X*_2, X*_3), subspace of L²(dP) of dimension 2.

Figure 1.7. Vector X*(ω) and vector X*
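A minimal numerical sketch of this example (assuming NumPy is available; this code is not part of the book) checks that Γ_X above is only positive semi-definite and that the relation X*_1 − 2X*_2 = 0 holds almost surely for simulated data:

import numpy as np

# covariance matrix and mean of the example above
Gamma_X = np.array([[4.0, 2.0, 0.0],
                    [2.0, 1.0, 0.0],
                    [0.0, 0.0, 3.0]])
m = np.array([3.0, -1.0, 2.0])

print("eigenvalues of Gamma_X:", np.linalg.eigvalsh(Gamma_X))   # one of them is 0

Lam = np.array([1.0, -2.0, 0.0])
print("Lambda^T Gamma_X Lambda =", Lam @ Gamma_X @ Lam)         # 0

# simulate a Gaussian vector with this mean and covariance and check that
# X1 - 2*X2 is (numerically) constant, i.e. X1* - 2*X2* = 0 a.s.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(m, Gamma_X, size=100_000)
print("Var(X1 - 2*X2) ~", (X[:, 0] - 2.0 * X[:, 1]).var())      # ~0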
1.6. Conditional expectation (concerning random vectors with density function)
Given that X is a real r.v. and Y = (Y_1,..., Y_n) is a real random vector, we assume that X and Y are independent and that Z = (X, Y_1,..., Y_n) admits a probability density f_Z(x, y_1,..., y_n). In this section, we will use as required the notations Y or (Y_1,..., Y_n), and (y_1,..., y_n) or y. Let us recall to begin with that f_Y(y) = ∫ f_Z(x, y) dx.
Conditional probability

We want, for all B ∈ B(ℝ) and all (y_1,..., y_n) ∈ ℝ^n, to define and calculate the probability that X ∈ B knowing that Y_1 = y_1,..., Y_n = y_n. We denote this quantity P((X ∈ B) | (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) or, more simply, P(X ∈ B | y_1,..., y_n).

Take note that we cannot, as in the case of discrete variables, write:

P((X ∈ B) | (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) = P((X ∈ B) ∩ (Y_1 = y_1) ∩ ... ∩ (Y_n = y_n)) / P((Y_1 = y_1) ∩ ... ∩ (Y_n = y_n))

The quotient here is indeterminate and equals 0/0.

For j = 1 to n, let us note I_j = [y_j, y_j + h[. We write:

P(X ∈ B | y_1,..., y_n) = lim_{h→0} P((X ∈ B) | (Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n))
= lim_{h→0} P((X ∈ B) ∩ (Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n)) / P((Y_1 ∈ I_1) ∩ ... ∩ (Y_n ∈ I_n))
= lim_{h→0} [∫_B dx ∫_{I_1×...×I_n} f_Z(x, u_1,..., u_n) du_1...du_n] / [∫_{I_1×...×I_n} f_Y(u_1,..., u_n) du_1...du_n]
= ∫_B (f_Z(x, y) / f_Y(y)) dx
It is thus natural to say that the conditional density of the random vector X knowing (y_1,..., y_n) is the function:

x → f(x | y) = f_Z(x, y) / f_Y(y)   if f_Y(y) ≠ 0

We can disregard the set of y for which f_Y(y) = 0, for its measure (in ℝ^n) is zero. Let us state that A = {(x, y) | f_Y(y) = 0}; we observe:

P((X, Y) ∈ A) = ∫_A f_Z(x, y) dx dy = ∫_{{y : f_Y(y)=0}} du ∫ f_Z(x, u) dx = ∫_{{y : f_Y(y)=0}} f_Y(u) du = 0,

so f_Y(y) is not zero almost everywhere.

Finally, we have obtained a family (indexed by the y verifying f_Y(y) > 0) of probability densities f(x | y), with ∫ f(x | y) dx = 1.
Conditional expectation

Let the random vector always be Z = (X, Y_1,..., Y_n) of density f_Z(x, y) and let f(x | y) always be the probability density of X knowing y_1,..., y_n.

DEFINITION.– Given a measurable mapping Ψ : (ℝ, B(ℝ)) → (ℝ, B(ℝ)), under the hypothesis ∫ |Ψ(x)| f(x | y) dx < ∞ (that is to say Ψ ∈ L¹(f(x | y) dx)), we call the conditional expectation of Ψ(X) knowing (y_1,..., y_n) the expectation of Ψ(X) calculated with the conditional density f(x | y) = f(x | y_1,..., y_n), and we write:

E(Ψ(X) | y_1,..., y_n) = ∫ Ψ(x) f(x | y) dx

E(Ψ(X) | y_1,..., y_n) is a certain value, depending on (y_1,..., y_n), and we denote it ĝ(y_1,..., y_n) (this notation will be of use in Chapter 4).

DEFINITION.– We call the conditional expectation of Ψ(X) with respect to Y = (Y_1,..., Y_n) the r.v. ĝ(Y_1,..., Y_n) = E(Ψ(X) | Y_1,..., Y_n) (also denoted E(Ψ(X) | Y)) which takes the value ĝ(y_1,..., y_n) = E(Ψ(X) | y_1,..., y_n) when (Y_1,..., Y_n) takes the value (y_1,..., y_n).

NOTE.– As we do not distinguish between two a.s. equal r.v., we will still call the conditional expectation of Ψ(X) with respect to Y_1,..., Y_n any r.v. ĝ′(Y_1,..., Y_n) such that ĝ′(Y_1,..., Y_n) = ĝ(Y_1,..., Y_n) almost surely. That is to say ĝ′(Y_1,..., Y_n) = ĝ(Y_1,..., Y_n) except possibly on A such that P(A) = ∫_A f_Y(y) dy = 0.
PROPOSITION.– If Ψ(X) ∈ L¹(dP) (i.e. ∫ |Ψ(x)| f_X(x) dx < ∞) then ĝ(Y) = E(Ψ(X) | Y) ∈ L¹(dP) (i.e. ∫_{ℝ^n} |ĝ(y)| f_Y(y) dy < ∞).

DEMONSTRATION.–

∫_{ℝ^n} |ĝ(y)| f_Y(y) dy = ∫_{ℝ^n} |E(Ψ(X) | y)| f_Y(y) dy ≤ ∫_{ℝ^n} f_Y(y) dy ∫ |Ψ(x)| f(x | y) dx

Using Fubini's theorem:

= ∫_{ℝ^{n+1}} |Ψ(x)| f_Y(y) f(x | y) dx dy = ∫_{ℝ^{n+1}} |Ψ(x)| f_Z(x, y) dx dy = ∫ |Ψ(x)| dx ∫_{ℝ^n} f_Z(x, y) dy = ∫ |Ψ(x)| f_X(x) dx < ∞
Principal properties of conditional expectation

The hypotheses of integrability having been verified:

1) E(E(Ψ(X) | Y)) = E(Ψ(X));

2) if X and Y are independent, E(Ψ(X) | Y) = E(Ψ(X));

3) E(Ψ(X) | X) = Ψ(X);

4) successive conditional expectations:
E(E(Ψ(X) | Y_1,..., Y_n, Y_{n+1}) | Y_1,..., Y_n) = E(Ψ(X) | Y_1,..., Y_n);

5) linearity:
E(λ_1 Ψ_1(X) + λ_2 Ψ_2(X) | Y) = λ_1 E(Ψ_1(X) | Y) + λ_2 E(Ψ_2(X) | Y).

The demonstrations, which in general are easy, may be found in the exercises.
Let us note in particular that, as far as the first property is concerned, it is sufficient to re-write the demonstration of the last proposition after stripping it of absolute values. The chapter on quadratic mean estimation will make the notion of conditional expectation more concrete.

Example: let Z = (X, Y) be a random couple of probability density f_Z(x, y) = 6xy(2 − x − y) 1_∆(x, y) where ∆ is the square [0,1] × [0,1].

Let us calculate E(X | Y). We have successively:

– f_Y(y) = ∫_0^1 f(x, y) dx = ∫_0^1 6xy(2 − x − y) dx, i.e. f_Y(y) = (4y − 3y²) 1_{[0,1]}(y);

– f(x | y) = f(x, y) / f_Y(y) = [6x(2 − x − y) / (4 − 3y)] 1_{[0,1]}(x), with y ∈ [0,1];

– E(X | y) = ∫_0^1 x f(x | y) dx · 1_{[0,1]}(y) = [(5 − 4y) / (2(4 − 3y))] 1_{[0,1]}(y).

Thus:

E(X | Y) = [(5 − 4Y) / (2(4 − 3Y))] 1_{[0,1]}(Y).

We also have:

E(X) = E(E(X | Y)) = ∫_0^1 E(X | y) f_Y(y) dy = ∫_0^1 [(5 − 4y) / (2(4 − 3y))] (4y − 3y²) dy = 7/12.
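The closed-form results of this example are easy to cross-check numerically. The following short script — a sketch that assumes NumPy is available, and is not part of the book — estimates E(X) and E(X | Y = y_0) by rejection sampling from the density f_Z(x, y) = 6xy(2 − x − y) on the unit square:

import numpy as np

rng = np.random.default_rng(1)

def f_Z(x, y):
    # joint density 6*x*y*(2 - x - y) on the unit square
    return 6.0 * x * y * (2.0 - x - y)

# rejection sampling: f_Z <= 6 on [0,1]^2, so 6 is a valid bound
n_prop = 2_000_000
x = rng.random(n_prop)
y = rng.random(n_prop)
keep = rng.random(n_prop) * 6.0 < f_Z(x, y)
xs, ys = x[keep], y[keep]

print("E(X) ~", xs.mean(), "  (theory 7/12 =", 7 / 12, ")")

# E(X | Y = y0) from the sample with Y close to y0, against (5-4y0)/(2(4-3y0))
y0 = 0.3
band = np.abs(ys - y0) < 0.01
print("E(X | Y=0.3) ~", xs[band].mean(),
      "  (theory", (5 - 4 * y0) / (2 * (4 - 3 * y0)), ")")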
1.7. Exercises for Chapter 1

Exercise 1.1.

Let X be an r.v. of distribution function

F(x) = 0 if x < …,  1/2 if … ,  1 if x ≥ 2.

Calculate the probabilities:

P(X² ≤ X);  P(X ≤ 2X²);  P(X + X² ≤ 3/4).
Exercise 1.2.

Given the random vector Z = (X, Y) of probability density f_Z(x, y) = K (1/(y x⁴)) 1_∆(x, y), where K is a real constant and where ∆ = {(x, y) ∈ ℝ² | x, y > 0; y ≤ x; y > 1/x}, determine the constant K and the densities f_X and f_Y of the r.v. X and Y.

Exercise 1.3.

Let X and Y be two independent random variables of uniform density on the interval [0,1]:

1) Determine the probability density f_Z of the r.v. Z = X + Y;

2) Determine the probability density f_U of the r.v. U = XY.
Exercise 1.4.

Let X and Y be two independent r.v. of uniform density on the interval [0,1]. Determine the probability density f_U of the r.v. U = XY.

Solution 1.4.

(Figure: the region B_u = A ∪ B of the unit square lying under the hyperbola xy = u.)

U takes its values in [0,1]. Let F_U be the distribution function of U:

– if u ≤ 0, F_U(u) = 0; if u ≥ 1, F_U(u) = 1;

– if u ∈ ]0,1[: F_U(u) = P(U ≤ u) = P(XY ≤ u) = P((X, Y) ∈ B_u),

where B_u = A ∪ B is the cross-hatched area of the figure. Thus

F_U(u) = ∫_{B_u} f_{(X,Y)}(x, y) dx dy = ∫_{B_u} f_X(x) f_Y(y) dx dy
= ∫_A dx dy + ∫_u^1 dx ∫_0^{u/x} dy = u + u ∫_u^1 dx/x = u(1 − ln u).

Finally f_U(u) = F_U′(u) = 0 if u ∈ ]−∞, 0] ∪ [1, ∞[, and f_U(u) = −ln u if u ∈ ]0,1[.
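A quick simulation (a sketch assuming NumPy; not from the book) confirms the density f_U(u) = −ln u by comparing it with a histogram of simulated products:

import numpy as np

rng = np.random.default_rng(2)
u_samples = rng.random(1_000_000) * rng.random(1_000_000)   # U = X*Y, X,Y uniform

edges = np.linspace(0.05, 0.95, 10)            # avoid the endpoints
hist, edges = np.histogram(u_samples, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

for c, h in zip(centers, hist):
    print(f"u={c:.2f}  empirical={h:.3f}  -ln(u)={-np.log(c):.3f}")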
Exercise 1.5.

Under consideration are three r.v. X, Y, Z which are independent and of the same law N(0,1), that is to say admitting the same density (1/√(2π)) exp(−x²/2).

Determine the probability density f_U of the real random variable (r.r.v.) U = (X² + Y² + Z²)^{1/2}.

Solution 1.5.

Let F_U be the distribution function of U:

– if u ≤ 0, F_U(u) = P((X² + Y² + Z²)^{1/2} ≤ u) = 0;

– if u > 0, F_U(u) = P((X, Y, Z) ∈ S_u),

where S_u is the sphere of ℝ³ centered on (0, 0, 0) and of radius u:

F_U(u) = ∫_{S_u} f_{(X,Y,Z)}(x, y, z) dx dy dz = (1/(2π)^{3/2}) ∫_{S_u} exp(−(x² + y² + z²)/2) dx dy dz

and by employing a passage to spherical coordinates:

= (1/(2π)^{3/2}) ∫_0^{2π} dθ ∫_0^π dφ ∫_0^u exp(−r²/2) r² sin φ dr = (1/(2π)^{3/2}) · 2π · 2 ∫_0^u r² exp(−r²/2) dr

and as r → r² exp(−r²/2) is continuous:

f_U(u) = 0 if u ≤ 0, and f_U(u) = √(2/π) u² exp(−u²/2) if u > 0.
Exercise 1.6.

1a) Show that, for a > 0,

f_a(x) = (1/π) · a / (a² + x²)

is a probability density (called Cauchy's density).

1b) Verify that the corresponding characteristic function is φ_X(u) = exp(−a|u|).

1c) Given a family of independent r.v. X_1,..., X_n of density f_a, find the density of the r.v. Y_n = (X_1 + ... + X_n)/n. What do we notice?

2) By considering Cauchy random variables, verify that we can have the equality φ_{X+Y}(u) = φ_X(u) φ_Y(u) with X and Y dependent.
Exercise 1.7.

Show that
M = [ 1  2  3 ]
    [ 2  1  2 ]
    [ 3  2  1 ]
is not a covariance matrix.

Show that
M = [ 1    0.5  0 ]
    [ 0.5  1    0 ]
    [ 0    0    1 ]
is a covariance matrix.

Verify from this example that the property of "not being correlated with" for a family of r.v. is not transitive.

Exercise 1.8.

Show that the random vector X^T = (X_1, X_2, X_3) of expectation EX^T = (7, 0, 1) and of covariance matrix
Γ_X = [ 10  −1   4 ]
      [ −1   1  −1 ]
      [  4  −1   2 ]
belongs almost surely (a.s.) to a plane of ℝ³.
Exercise 1.9.

We are considering the random vector U = (X, Y, Z) of probability density f_U(x, y, z) = K xyz(3 − x − y − z) 1_∆(x, y, z) where ∆ is the cube [0,1] × [0,1] × [0,1].

1) Calculate the constant K.

2) Calculate the conditional probability P(X ∈ [1/4, 1/2] | Y = 1/2, Z = 3/4).

3) Determine the conditional expectation E(X² | Y, Z).
Ε X 2 Y,Z .
Chapter 2
Gaussian Vectors
2.1. Some reminders regarding random Gaussian vectors

DEFINITION.– We say that a real r.v. X is Gaussian, of expectation m and of variance σ², if its law of probability P_X:

– admits the density f_X(x) = (1/(σ√(2π))) exp(−(x − m)²/(2σ²)) if σ² ≠ 0 (using a double integral calculation, for example, we can verify that ∫ f_X(x) dx = 1);

– is the Dirac measure δ_m if σ² = 0.

Figure 2.1. Gaussian density and Dirac measure
If σ² ≠ 0, we say that X is a non-degenerate Gaussian r.v.

If σ² = 0, we say that X is a degenerate Gaussian r.v.; X is in this case a "certain r.v." taking the value m with the probability 1.

EX = m, Var X = σ². This can be verified easily by using the probability distribution function.

As we have already observed, in order to specify that an r.v. X is Gaussian of expectation m and of variance σ², we will write X ∼ N(m, σ²).

Characteristic function of X ∼ N(m, σ²)

Let us begin firstly by determining the characteristic function of X_0 ∼ N(0,1):

φ_{X_0}(u) = E(e^{iuX_0}) = (1/√(2π)) ∫ e^{iux} e^{−x²/2} dx.
We can easily see that the theorem of derivation under the sum sign can be applied:

φ′_{X_0}(u) = (i/√(2π)) ∫ e^{iux} x e^{−x²/2} dx.

Following this by integration by parts:

= (i/√(2π)) ([−e^{iux} e^{−x²/2}]_{−∞}^{+∞} + ∫_{−∞}^{+∞} iu e^{iux} e^{−x²/2} dx) = −u φ_{X_0}(u).

The resolution of the differential equation φ′_{X_0}(u) = −u φ_{X_0}(u) with the condition that φ_{X_0}(0) = 1 leads us to the solution

φ_{X_0}(u) = e^{−u²/2}.

For X ∼ N(m, σ²):

φ_X(u) = (1/(σ√(2π))) ∫_{−∞}^{+∞} e^{iux} e^{−(1/2)((x−m)/σ)²} dx.

By changing the variable y = (x − m)/σ, which brings us back to the preceding case, we obtain

φ_X(u) = e^{ium − u²σ²/2}.

If σ² = 0, that is to say if P_X = δ_m, φ_X(u) = e^{ium} (Fourier transform in the sense of the distribution of δ_m), so that in all cases (σ² ≠ 0 or = 0)

φ_X(u) = e^{ium − u²σ²/2}.

NOTE.– Given the r.v. X ∼ N(m, σ²), we can write:

f_X(x) = (1/((2π)^{1/2}(σ²)^{1/2})) exp(−(1/2)(x − m)(σ²)^{−1}(x − m))

φ_X(u) = exp(ium − (1/2) u σ² u)

These are the expressions that we will find again for Gaussian vectors.
2.2. Definition and characterization of Gaussian vectors

DEFINITION.– We say that a real random vector X^T = (X_1,..., X_n) is Gaussian if ∀(a_0, a_1,..., a_n) ∈ ℝ^{n+1} the r.v. a_0 + Σ_{j=1}^n a_j X_j is Gaussian (we can in this definition assume that a_0 = 0 and this will be sufficient in general).

A random vector X^T = (X_1,..., X_n) is thus not Gaussian if we can find an n-tuple (a_1,..., a_n) ≠ (0,..., 0) such that the r.v. Σ_{j=1}^n a_j X_j is not Gaussian, and for this it suffices to find an n-tuple such that Σ_{j=1}^n a_j X_j is not an r.v. of density.

EXAMPLE.– We allow ourselves an r.v. X ∼ N(0,1) and a discrete r.v. ε, independent of X and such that:

P(ε = 1) = 1/2 and P(ε = −1) = 1/2.

We state that Y = εX. By using what has already been discussed, we will show through an exercise that although Y is an r.v. N(0,1), the vector (X, Y) is not a Gaussian vector.

PROPOSITION.– In order for a random vector X^T = (X_1,..., X_n) of expectation m^T = (m_1,..., m_n) and of covariance matrix Γ_X to be Gaussian, it is necessary and sufficient that its characteristic function (c.f.) φ_X be defined by:

φ_X(u_1,..., u_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u)   (where u^T = (u_1,..., u_n))
DEMONSTRATION.–

φ_X(u_1,..., u_n) = E exp(i Σ_{j=1}^n u_j X_j) = E exp(i·1·Σ_{j=1}^n u_j X_j) = characteristic function of the r.v. Σ_{j=1}^n u_j X_j at the value 1.

That is to say: φ_{Σ_{j=1}^n u_j X_j}(1), and

φ_{Σ_{j=1}^n u_j X_j}(1) = exp(i·1·E(Σ_{j=1}^n u_j X_j) − (1/2)·1²·Var(Σ_{j=1}^n u_j X_j))

if and only if the r.v. Σ_{j=1}^n u_j X_j is Gaussian.

Finally, since Var(Σ_{j=1}^n u_j X_j) = u^T Γ_X u, we arrive indeed at:

φ_X(u_1,..., u_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u).
NOTATION.– We can see that the characteristic function of a Gaussian vector X is entirely determined when we know its expectation vector m and its covariance
matrix Γ X . If X is such a vector, we will write X ∼ N n ( m, Γ X ) .
PARTICULAR CASE.– m = 0 and Γ X = I n (unit matrix), X ∼ N n ( 0, I n ) is called a standard Gaussian vector.
2.3. Results relative to independence

PROPOSITION.–

1) if the vector X^T = (X_1,..., X_n) is Gaussian, all its components X_j are thus Gaussian r.v.;

2) if the components X_j of a random vector X are Gaussian and independent, the vector X is thus also Gaussian.

DEMONSTRATION.–

1) We write X_j = 0 + ... + 0 + X_j + 0 + ... + 0.

2) φ_X(u_1,..., u_n) = Π_{j=1}^n φ_{X_j}(u_j) = Π_{j=1}^n exp(iu_j m_j − (1/2) u_j² σ_j²),

that we can still express as exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ_X u) with Γ_X = diag(σ_1²,..., σ_n²).

NOTE.– As we will see later, "the components X_j are Gaussian and independent" is not a necessary condition for the random vector X^T = (X_1,..., X_j,..., X_n) to be Gaussian.

PROPOSITION.– If X^T = (X_1,..., X_j,..., X_n) is a Gaussian vector of covariance Γ_X, we have the equivalence: Γ_X diagonal ⇔ the r.v. X_j are independent.
DEMONSTRATION.–

Γ_X = diag(σ_1²,..., σ_n²) ⇔ φ_X(u_1,..., u_n) = Π_{j=1}^n φ_{X_j}(u_j).

This is a necessary and sufficient condition of independence of the r.v. X_j.

Let us sum up these two simple results schematically:

– X^T = (X_1,..., X_j,..., X_n) is a Gaussian vector ⇒ the components X_j are Gaussian r.v., and in this case the r.v. X_j are independent ⇔ Γ_X is diagonal;

– the components X_j are Gaussian r.v. and (sufficient condition) the r.v. X_j are independent ⇒ X is a Gaussian vector; but even if Γ_X is diagonal, X_j Gaussian does not by itself imply that X is Gaussian.

NOTE.– A Gaussian vector X^T = (X_1,..., X_j,..., X_n) is evidently of the 2nd order. In effect each component X_j is thus Gaussian and belongs to L²(dP)

(∫ x² (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} dx < ∞).

We can generalize the last proposition and replace the Gaussian r.v. by Gaussian vectors.
Let us consider for example three random vectors:

X^T = (X_1,..., X_n);  Y^T = (Y_1,..., Y_p);  Z^T = (X_1,..., X_n, Y_1,..., Y_p)

and state

Γ_Z = [ Γ_X         Cov(X, Y) ]
      [ Cov(Y, X)   Γ_Y       ]

where Cov(X, Y) is the matrix of the coefficients Cov(X_j, Y_ℓ) and where Cov(Y, X) = (Cov(X, Y))^T.

PROPOSITION.– If Z^T = (X_1,..., X_n, Y_1,..., Y_p) is a Gaussian vector of covariance matrix Γ_Z, we have the equivalence:

Cov(X, Y) = zero matrix ⇔ X and Y are two independent Gaussian vectors.

DEMONSTRATION.–

Γ_Z = [ Γ_X  0   ]
      [ 0    Γ_Y ]

⇔ φ_Z(u_1,..., u_n, u_{n+1},..., u_{n+p}) = exp(i Σ_{j=1}^{n+p} u_j m_j − (1/2) u^T [Γ_X 0; 0 Γ_Y] u) = φ_X(u_1,..., u_n) φ_Y(u_{n+1},..., u_{n+p}),

which is a necessary and sufficient condition for the independence of the vectors X and Y.
NOTE.– Given Z^T = (X^T, Y^T, U^T,...) where X, Y, U,... are r.v. or random vectors:

– that Z is a Gaussian vector is a stronger hypothesis than: X Gaussian and Y Gaussian and U Gaussian, etc.;

– X Gaussian and Y Gaussian and U Gaussian, etc., with their covariances (or covariance matrices) zero, does not imply that Z^T = (X^T, Y^T, U^T,...) is a Gaussian
vector.

EXAMPLE.– Given X, Y, Z three independent r.v. ∼ N(0,1), find the law of the vector W^T = (U, V) where U = X + Y + Z and V = λX − Y with λ ∈ ℝ.

Because of the independence, the vector (X, Y, Z) is Gaussian; ∀a, b ∈ ℝ, aU + bV = (a + λb)X + (a − b)Y + aZ is a Gaussian r.v. Thus W^T = (U, V) is a Gaussian vector.

To determine this entirely we must know m = EW and Γ_W, and we will have W ∼ N_2(m, Γ_W).

It follows easily:

EW^T = (EU, EV) = (0, 0)

and

Γ_W = [ Var U       Cov(U, V) ]   =   [ 3      λ − 1  ]
      [ Cov(V, U)   Var V     ]       [ λ − 1  λ² + 1 ]

In effect:

Var U = EU² = E(X + Y + Z)² = EX² + EY² + EZ² = 3
Var V = EV² = E(λX − Y)² = λ²EX² + EY² = λ² + 1
Cov(U, V) = E(X + Y + Z)(λX − Y) = λEX² − EY² = λ − 1
Particular case: λ = 1 ⇔ Γ_W diagonal ⇔ U and V are independent.
2.4. Affine transformation of a Gaussian vector

We can generalize to vectors the following result on Gaussian r.v.:

If Y ∼ N(m, σ²) then ∀a, b ∈ ℝ, aY + b ∼ N(am + b, a²σ²).

By modifying the notation a little, with N(am + b, a²σ²) becoming N(am + b, a·Var Y·a), we can imagine already how this result is going to extend to Gaussian vectors.

PROPOSITION.– Given a Gaussian vector Y ∼ N_n(m, Γ_Y), A a matrix belonging to M(p, n) and a certain vector B ∈ ℝ^p, then AY + B is a Gaussian
vector ∼ N_p(Am + B, A Γ_Y A^T).

DEMONSTRATION.– AY + B is the vector of ℝ^p whose i-th component is Σ_{ℓ=1}^n a_{iℓ} Y_ℓ + b_i:

– this is indeed a Gaussian vector (of dimension p) because every linear combination of its components is an affine combination of the r.v. Y_1,..., Y_i,..., Y_n, and by hypothesis Y^T = (Y_1,..., Y_n) is a Gaussian vector;

– furthermore, we have seen that if Y is a 2nd order vector:

E(AY + B) = A EY + B = Am + B   and   Γ_{AY+B} = A Γ_Y A^T.

EXAMPLE.– Given (n + 1) independent r.v. Y_j ∼ N(µ, σ²), j = 0 to n, it emerges that Y^T = (Y_0, Y_1,..., Y_n) ∼ N_{n+1}(m, Γ_Y) with m^T = (µ,..., µ) and Γ_Y = σ² I_{n+1}.

Furthermore, given new r.v. X_j defined by:

X_1 = Y_0 + Y_1,..., X_n = Y_{n-1} + Y_n,

the vector X^T = (X_1,..., X_n) is Gaussian, for

(X_1,..., X_n)^T = A (Y_0,..., Y_n)^T   with   A = [ 1 1 0 ... 0 ]
                                                   [ 0 1 1 0 . 0 ]
                                                   [    ...      ]
                                                   [ 0 ... 0 1 1 ];

more precisely, following the preceding proposition, X ∼ N_n(Am, A Γ_Y A^T).

NOTE.– If in this example we assume µ = 0 and σ² = 1, we are certain that the vector X is Gaussian even though its components X_j are not independent. In effect, we have for example: Cov(X_1, X_2) ≠ 0 because

EX_1 X_2 = E(Y_0 + Y_1)(Y_1 + Y_2) = EY_1² = 1   and   EX_1 EX_2 = E(Y_0 + Y_1) E(Y_1 + Y_2) = 0.
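As a quick numerical illustration of the preceding proposition and example (a sketch assuming NumPy, not from the book; A is the n × (n+1) "pairwise sum" matrix described above), one can check on simulated data that X = AY has covariance A Γ_Y A^T:

import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma2 = 4, 0.0, 1.0

# A is n x (n+1): X_i = Y_{i-1} + Y_i
A = np.zeros((n, n + 1))
for i in range(n):
    A[i, i] = A[i, i + 1] = 1.0

Y = rng.normal(mu, np.sqrt(sigma2), size=(200_000, n + 1))
X = Y @ A.T

print("theoretical covariance A Gamma_Y A^T:\n", A @ (sigma2 * np.eye(n + 1)) @ A.T)
print("empirical covariance of X:\n", np.cov(X, rowvar=False).round(2))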
2.5. The existence of Gaussian vectors

NOTATION.– u^T = (u_1,..., u_n), x^T = (x_1,..., x_n) and m^T = (m_1,..., m_n).

We are interested here in the existence of Gaussian vectors, that is to say the existence of laws of probability on ℝ^n having Fourier transforms of the form:

exp(i Σ_j u_j m_j − (1/2) u^T Γ u)

PROPOSITION.– Given a vector m^T = (m_1,..., m_n) and a matrix Γ ∈ M(n, n), which is symmetric and positive semi-definite, there is a unique probability P_X on ℝ^n of Fourier transform:

∫_{ℝ^n} exp(i Σ_{j=1}^n u_j x_j) dP_X(x_1,..., x_n) = exp(i Σ_{j=1}^n u_j m_j − (1/2) u^T Γ u)

In addition:

1) if Γ is invertible, P_X admits on ℝ^n the density:

f_X(x_1,..., x_n) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m));

2) if Γ is non-invertible (of rank r < n), the r.v. X_1 − m_1,..., X_n − m_n are linearly dependent. We can still say that ω → X(ω) − m a.s. takes its values on a hyperplane (Π) of ℝ^n, or that the probability P_X loads a hyperplane (Π) and does not admit a density function on ℝ^n.
DEMONSTRATION.–

1) Let us begin by recalling a result from linear algebra:

Γ being symmetric, we can find an orthonormal basis of ℝ^n formed from eigenvectors of Γ; let us call (V_1,..., V_n) this basis. By denoting the eigenvalues of Γ as λ_j, we thus have ΓV_j = λ_j V_j, where the λ_j are solutions of the equation Det(Γ − λI) = 0.

Some consequences

Let us first note Λ = diag(λ_1,..., λ_n) and V = (V_1,..., V_j,..., V_n) (where the V_j are column vectors).

– ΓV_j = λ_j V_j, j = 1 to n, equates to ΓV = VΛ and, the matrix V being orthogonal (VV^T = V^T V = I), Γ = VΛV^T.

Let us demonstrate that if in addition Γ is invertible, the λ_j are ≠ 0 and ≥ 0, and thus the λ_j are > 0.

The λ_j are ≠ 0: in effect, Γ being invertible, 0 ≠ Det Γ = Det Λ = Π_{j=1}^n λ_j.

The λ_j are ≥ 0: let us consider in effect the quadratic form u → u^T Γ u (≥ 0 since Γ is positive semi-definite).
In the basis (V_1,..., V_n), u is written (u_1,..., u_n) with u_j = <V_j, u>, and the quadratic form is written u → (u_1,..., u_n) Λ (u_1,..., u_n)^T = Σ_j λ_j u_j² ≥ 0, from which we get the predicted result.

Let us now demonstrate the proposition.

2) Let us now look at the general case, that is to say, in which Γ is not necessarily invertible (recall that the eigenvalues λ_j are ≥ 0).

Let us consider n independent r.v. Y_j ∼ N(0, λ_j). We know that the vector Y^T = (Y_1,..., Y_n) is Gaussian, as well as the vector X = VY + m (proposition from the preceding section); more precisely X ∼ N(m, Γ = VΛV^T). The existence of Gaussian vectors of given expectation and of given covariance matrix is thus clearly proven.

Furthermore, we have seen that if X is N_n(m, Γ), its characteristic function (Fourier transformation of its law) is exp(i Σ_j u_j m_j − (1/2) u^T Γ u).

We thus in fact have:

∫ exp(i Σ_j u_j x_j) dP_X(x_1,..., x_n) = exp(i Σ_j u_j m_j − (1/2) u^T Γ u).

Uniqueness of the law: this ensues from the injectivity of the Fourier transformation.
3) Let us be clear, to terminate, on the role played by the invertibility of Γ.

a) If Γ is invertible, all the eigenvalues λ_j (= Var Y_j) are > 0 and the vector Y^T = (Y_1,..., Y_n) admits the density:

f_Y(y_1,..., y_n) = Π_{j=1}^n (1/√(2πλ_j)) exp(−y_j²/(2λ_j)) = 1/((2π)^{n/2} (Π_{j=1}^n λ_j)^{1/2}) exp(−(1/2) y^T Λ^{-1} y)

As far as the vector X = VY + m is concerned: the affine transformation y → x = Vy + m is invertible, has y = V^{-1}(x − m) as inverse and has Det V = ±1 (V orthogonal) as Jacobian.

Furthermore Π_{j=1}^n λ_j = Det Λ = Det Γ.

By applying the theorem on the transformation of a random vector by a C¹-diffeomorphism, we obtain the probability density of vector X:

f_X(x_1,..., x_n) = f_X(x) = f_Y(V^{-1}(x − m)) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T (V^T)^{-1} Λ^{-1} V^{-1} (x − m))
As Γ = VΛV^T:

f_X(x_1,..., x_n) = 1/((2π)^{n/2} (Det Γ)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m))

b) If rank Γ = r < n, let us rank the eigenvalues of Γ in decreasing order:

λ_1 ≥ λ_2 ≥ ... ≥ λ_r > 0 and λ_{r+1} = 0,..., λ_n = 0.

Then Y_{r+1} = 0 a.s.,..., Y_n = 0 a.s. and, almost surely, X = VY + m takes its values in (Π), the hyperplane of ℝ^n image of ε = {y = (y_1,..., y_r, 0,..., 0)} by the affine mapping y → Vy + m.

NOTE.– Given a random vector X^T = (X_1,..., X_n) ∼ N_n(m, Γ_X), suppose that we have to calculate an expression of the form:

EΨ(X) = ∫_{ℝ^n} Ψ(x) f_X(x) dx = ∫_{ℝ^n} Ψ(x_1,..., x_n) f_X(x_1,..., x_n) dx_1...dx_n

In general, the density f_X, and in what follows the proposed calculation, are rendered complex by the dependence of the r.v. X_1,..., X_n.

Let λ_1,..., λ_n be the eigenvalues of Γ_X and V the orthogonal matrix which diagonalizes Γ_X.
We have X = VY + m with Y^T = (Y_1,..., Y_n), the Y_j being independent and ∼ N(0, λ_j), and the proposed calculation can be carried out under the simpler form:

EΨ(X) = EΨ(VY + m) = ∫_{ℝ^n} Ψ(Vy + m) (Π_{j=1}^n (1/√(2πλ_j)) e^{−y_j²/(2λ_j)}) dy_1...dy_n
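The following sketch (NumPy assumed; Ψ, m and Γ_X below are arbitrary illustrative choices, not taken from the book) implements this change of variables: it diagonalizes Γ_X, simulates the independent Y_j ∼ N(0, λ_j), and estimates EΨ(X) from X = VY + m:

import numpy as np

rng = np.random.default_rng(4)

# illustrative (hypothetical) ingredients
m = np.array([1.0, 0.0, -2.0])
Gamma_X = np.array([[3.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])
Psi = lambda x: np.sin(x[:, 0]) + x[:, 1] ** 2 + np.abs(x[:, 2])

# diagonalize Gamma_X = V Lambda V^T and simulate Y_j ~ N(0, lambda_j)
lam, V = np.linalg.eigh(Gamma_X)
lam = np.clip(lam, 0.0, None)        # guard against tiny negative round-off
Y = rng.normal(size=(500_000, 3)) * np.sqrt(lam)
X = Y @ V.T + m                      # X = V Y + m, hence X ~ N(m, Gamma_X)

print("E Psi(X) ~", Psi(X).mean())
print("empirical covariance ~\n", np.cov(X, rowvar=False).round(2))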
EXAMPLE.– 1) The expression of a normal case:

Let the Gaussian vector X^T = (X_1, X_2) ∼ N_2(0, Γ_X) where

Γ_X = [ 1  ρ ]
      [ ρ  1 ]   with ρ ∈ ]−1, 1[.

Γ_X is invertible and

f_X(x_1, x_2) = 1/(2π√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x_1² − 2ρ x_1 x_2 + x_2²)).

The intersections of the graph of f_X with the horizontal planes are the ellipses (ε) of equation x_1² − 2ρ x_1 x_2 + x_2² = C (constants).

Figure 2.2. Example of the density of a Gaussian vector
2) We give ourselves the Gaussian vector X^T = (X_1, X_2, X_3) with:

m^T = (1, 0, −2)  and  Γ = [ 3  0  q ]
                           [ 0  1  0 ]
                           [ q  0  1 ].

Because of Schwarz's inequality (Cov(X_1, X_3))² ≤ Var X_1 · Var X_3, we must suppose |q| ≤ √3.

We wish to study the density f_X(x_1, x_2, x_3) of vector X.

Eigenvalues of Γ:

Det(Γ − λI) = (1 − λ)(λ² − 4λ + 3 − q²).

From which we obtain the eigenvalues ranked in decreasing order:

λ_1 = 2 + √(1 + q²),  λ_2 = 1,  λ_3 = 2 − √(1 + q²)

a) if |q| < √3 then λ_1 > λ_2 > λ_3 > 0, Γ is invertible and X has a probability density in ℝ³ given by:

f_X(x_1, x_2, x_3) = 1/((2π)^{3/2} (λ_1 λ_2 λ_3)^{1/2}) exp(−(1/2)(x − m)^T Γ^{-1} (x − m));

b) if q = √3, then λ_1 = 4, λ_2 = 1, λ_3 = 0 and Γ is non-invertible, of rank 2.

Let us find the orthogonal matrix V which diagonalizes Γ by writing ΓV_j = λ_j V_j. For λ_1 = 4, λ_2 = 1, λ_3 = 0 we obtain respectively the eigenvectors

V_1 = (√3/2, 0, 1/2)^T,  V_2 = (0, 1, 0)^T,  V_3 = (−1/2, 0, √3/2)^T

and the orthogonal matrix V = (V_1 V_2 V_3) (VV^T = V^T V = I).

Given the independent r.v. Y_1 ∼ N(0, 4) and Y_2 ∼ N(0, 1) and given the r.v. Y_3 = 0 a.s., we have:

(X_1, X_2, X_3)^T = V (Y_1, Y_2, 0)^T + (1, 0, −2)^T

or, by calling X* = (X*_1, X*_2, X*_3)^T the vector X after centering,

X*_1 = (√3/2) Y_1,  X*_2 = Y_2,  X*_3 = (1/2) Y_1

We can further deduce that X* = (X*_1, X*_2, X*_1/√3)^T.
Figure 2.3. The plane (Π) is the support of the probability P_X

Thus, the vector X* describes almost surely the plane (Π) containing the axis 0x_2 and the vector U^T = (√3, 0, 1). The plane (Π) is the support of the probability P_X.
Probability and conditional expectation

Let us develop a simple case as an example. Let the Gaussian vector Z^T = (X, Y) ∼ N_2(0, Γ_Z). Stating ρ = Cov(X, Y)/(σ_1 σ_2) with Var X = σ_1² and Var Y = σ_2², the density of Z is written:

f_Z(x, y) = 1/(2πσ_1σ_2√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x²/σ_1² − 2ρ xy/(σ_1σ_2) + y²/σ_2²)).
Conditional density of X knowing Y = y

f(x | y) = f_Z(x, y)/f_Y(y) = f_Z(x, y) / ∫ f_Z(x, y) dx

= [1/(2πσ_1σ_2√(1 − ρ²)) exp(−(1/(2(1 − ρ²)))(x²/σ_1² − 2ρ xy/(σ_1σ_2) + y²/σ_2²))] / [(1/(√(2π)σ_2)) exp(−y²/(2σ_2²))]

= 1/(σ_1√(2π)√(1 − ρ²)) exp(−(1/(2σ_1²(1 − ρ²)))(x − ρ(σ_1/σ_2) y)²)
X being a real variable and y a fixed numeric value, we can recognize a Gaussian density. More precisely: the conditional law of X, knowing Y = y, is N(ρ(σ_1/σ_2) y, σ_1²(1 − ρ²)).

We see in particular that E(X | y) = ρ(σ_1/σ_2) y and that E(X | Y) = ρ(σ_1/σ_2) Y.

In Chapter 4, we will see more generally that if (X, Y_1,..., Y_n) is a Gaussian vector, E(X | Y_1,..., Y_n) is written in the form λ_0 + Σ_{j=1}^n λ_j Y_j.
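A short simulation (NumPy assumed; the values of σ_1, σ_2 and ρ are arbitrary, not from the book) makes this conditional law tangible: conditioning on Y falling in a thin band around y_0, the sample mean of X approaches ρ(σ_1/σ_2) y_0 and its variance approaches σ_1²(1 − ρ²):

import numpy as np

rng = np.random.default_rng(5)
sigma1, sigma2, rho = 2.0, 1.5, 0.6          # illustrative values
cov = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])

Z = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)
X, Y = Z[:, 0], Z[:, 1]

y0 = 1.0
band = np.abs(Y - y0) < 0.02
print("E(X | Y=y0) ~", X[band].mean(), "  theory:", rho * sigma1 / sigma2 * y0)
print("Var(X | Y=y0) ~", X[band].var(), "  theory:", sigma1**2 * (1 - rho**2))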
2.6. Exercises for Chapter 2

Exercise 2.1.

We are looking at a circular target D of center 0 and of radius R which is used for archery. The couple Z = (X, Y) represents the coordinates of the point of impact of the arrow on the target support; we assume that the r.v. X and Y are independent and follow the same law N(0, 4R²).

1) What is the probability that the arrow reaches the target?

2) How many times must one fire the arrow in order that, with a probability ≥ 0.9, the target is reached at least once (we give ln 10 ≈ 2.305)?

3) Let us assume that we fire 100 times at the target; calculate the probability that the target is reached at least 20 times. Hint: use the central limit theorem.

Solution 2.1.

X and Y being independent, the probability density of Z = (X, Y) is

f_Z(x, y) = f_X(x) f_Y(y) = 1/(8πR²) exp(−(x² + y²)/(8R²))

1) P(Z ∈ D) = 1/(8πR²) ∫_D exp(−(x² + y²)/(8R²)) dx dy; using a change from Cartesian to polar coordinates:

= 1/(8πR²) ∫_0^{2π} dθ ∫_0^R e^{−ρ²/(8R²)} ρ dρ = 1 − e^{−1/8}

2) With each shot k we associate a Bernoulli r.v. U_k ∼ b(p) defined by:

U_k = 1 if the arrow reaches the target (probability p);
U_k = 0 if the arrow does not reach the target (probability 1 − p).

In n shots, the number of impacts is given by the r.v. U = U_1 + ... + U_n ∼ B(n, p), and

P(U ≥ 1) = 1 − P(U = 0) = 1 − (1 − p)^n.

We are looking for the n which verifies 1 − (1 − p)^n ≥ 0.9 ⇔ (1 − p)^n ≤ 0.1 ⇔ n ≥ −ln 10 / ln(1 − p) = −ln 10 / ln(e^{−1/8}) = 8 ln 10 ≈ 18.4, i.e. n ≥ 19.

3) By using the previous notations, we are looking to calculate P(U ≥ 20) with U = U_1 + ... + U_{100}, which is to say:

P(U_1 + ... + U_{100} ≥ 20) = P((U_1 + ... + U_{100} − 100µ)/(√100 σ) ≥ (20 − 100µ)/(√100 σ))

with µ = 1 − e^{−1/8} ≈ 0.1175 and σ = ((1 − e^{−1/8}) e^{−1/8})^{1/2} ≈ 0.32,

i.e. ≈ P(S ≥ 8.25/3.2) = P(S ≥ 2.58) = 1 − F_0(2.58)

where S is an r.v. N(0,1) and F_0 the distribution function of the r.v. N(0,1).

Finally P(U ≥ 20) = 1 − 0.9951 ≈ 0.005.
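The three numerical answers above are easy to confirm by simulation — a sketch assuming NumPy (not from the book), with R set to 1 since the probabilities do not depend on R:

import numpy as np

rng = np.random.default_rng(6)
R = 1.0

# 1) probability that one arrow reaches the target
n = 1_000_000
X = rng.normal(0.0, 2.0 * R, n)
Y = rng.normal(0.0, 2.0 * R, n)
print("P(hit) ~", np.mean(X**2 + Y**2 <= R**2), "  theory:", 1 - np.exp(-1 / 8))

# 2) and 3) number of hits in 19 and 100 independent shots
p = 1 - np.exp(-1 / 8)
hits_19 = rng.binomial(19, p, n)
hits_100 = rng.binomial(100, p, n)
print("P(>=1 hit in 19 shots)   ~", np.mean(hits_19 >= 1))    # ~0.9
print("P(>=20 hits in 100 shots) ~", np.mean(hits_100 >= 20)) # ~0.005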
Exercise 2.2.
n independent r.v. of law N ( 0,1) and given
X 1 ,… , X n
Given
a 1 ,… , a n ; b 1,… , b n
2n real constants:
1) Show that the r.v. Y =
n
n
j =1
j =1
∑ a j X j and Z = ∑ b j are independent if and
n
only if
∑ a jb j = 0 . j =1
2) Deduce from this that if the r.v.
X=
X 1 ,..., X n are n independent r.v. of law N ( 0,1) ,
1 n ∑ X j and YK = X K − X (where n j =1
K∈
{1, 2,..., n} )
are
independent. For K
≠
YK and Y are they independent r.v.?
Solution 2.2. 1) U = (Y , Z ) is evidently a Gaussian vector. ( ∀λ and
µ∈ ,
In order for
Y and Z to be independent it is thus necessary and sufficient that:
the r.v. λY + µ Z is evidently a Gaussian r.v.).
0 = Cov (Y , Z ) = EYZ = ∑ a j b j EY j Z j = ∑ a j b j j
2) In order to simplify the expression, let us make K
j
= 1 an example:
1 1 ⎛ 1⎞ X n ; Y1 = ⎜1 − ⎟ X 1 − X 2 − n n ⎝ n⎠ n 1⎛ 1⎞ 1 and ∑ a j b j = ⎜ 1 − ⎟ − ( n − 1) = 0 n⎝ n⎠ n j =1 X=
1 X1 + n
+
−
1 Xn n
88
Discrete Stochastic Processes and Optimal Filtering
– To simplify let us make K
= 1 and
=2
1 1 ⎛ 1⎞ Y1 = ⎜1 − ⎟ X 1 − X 2 − − X n ; n n ⎝ n⎠ 1 1 ⎛ 1⎞ Y2 = − X 1 + ⎜1 − ⎟ X 2 − − X n n n ⎝ n⎠ n
and
⎛
1⎞1
1
∑ a j b j = −2 ⎜⎝1 − n ⎟⎠ n − ( n − 2 ) n < 0 , thus Y1 and Y2 are dependent. j =1
Exercise 2.3.
X ∼ N ( 0,1) and a discrete r.v. ε such that 1 1 P ( ε = −1) = and P = ( ε = +1) = . 2 2 We give a real r.v.
We suppose
X and ε independent. We state Y = ε X :
– by using distributions functions, verify that Y ∼ N ( 0,1) ; – show that Cov ( X , Y ) = 0 ; – is the vector U = ( X , Y ) gaussian? Solution 2.3. 1)
(
FY ( y ) = P (Y ≤ y ) = P ( ε X ≤ y ) = P ( ε X ≤ y ) ∩ ( ( ε = 1) ∪ ( ε = −1) )
=P
( ( (ε X ≤ y ) ∩ (ε = 1) ) ∪ ( (ε X ≤ y ) ∩ (ε = −1) ) )
)
Gaussian Vectors
89
Because of the incompatibility of the two events linked by the union
= P ( ( ε X ≤ y ) ∩ ( ε = 1) ) + P ( ( ε X ≤ y ) ∩ ( ε = −1) ) = P ( ( X ≤ y ) ∩ ( ε = 1) ) + P ( ( − X ≤ y ) ∩ ( ε = −1) ) Because of the independence of
X and ε ,
P ( X ≤ y ) P ( ε = 1) + P ( − X ≤ y ) P ( ε = −1) =
1 ( P ( X ≤ y ) + P ( − X ≤ y )) 2
Finally, thanks to the parity of the density of the law N ( 0,1) ,
= P ( X ≤ y ) = FX ( y ) ; 2) Cov ( X , Y ) = EXY − EXEY = Eε X − EX Eε X = Eε EX 2
0
2
= 0;
0
3) X + Y = X + ε X = X (1 + ε ) ;
(
)
Thus P ( X + Y = 0 ) = P X (1 + ε ) = P (1 + ε = 0 ) =
1 2
λ X + µY (with λ = µ = 1 ) because the law admits no density ( PX +Y ({0} ) = 1 ). 2 We can deduce that the r.v.
Thus the vector U = ( X , Y ) is not Gaussian.
is not Gaussian,
90
Discrete Stochastic Processes and Optimal Filtering
Exercise 2.4. Given a real r.v. X ∼ N ( 0,1) and given a real a > 0 :
⎧⎪ X if X < a is also a real ⎪⎩− X if X ≥ a
1) Show that the real r.v. Y defined by Y = ⎨ r.v. ∼ N ( 0,1)
(Hint: show the equality of the distribution functions FY = FX .)
4 2) Verify that Cov ( X , Y ) = 1 − 2π
∞
∫a
2
x e
− x2
2 dx
Solution 2.4. 1) FY ( y ) = P ( Y ≤ y ) = P
( (Y ≤ y ) ∩ ( X
Distributivity and then incompatibility
( P ( (Y ≤ y )
)
(
< a) ∪ ( X ≥ a)
)
⇒
)
P (Y ≤ y ) ∩ ( X < a ) + P (Y ≤ y ) ∩ ( X ≥ a ) =
)
((
)
X < a P ( X < a) + P Y ≤ y X ≥ a P ( X ≥ a)
P ( X ≤ y ) P ( X < a ) + P (( − X ≤ y )) P ( X ≥ a ) P( X ≤ y )
because
1 − x2 2 e = f X ( x) is even 2π
(
)
= P ( X ≤ y ) P ( X < a ) + P ( X ≥ a ) = P ( X ≤ y ) = FX ( y ) ;
)
Gaussian Vectors
91
2) EX = 0 and EY = 0, thus:
Cov ( X , Y ) = EXY = ∫ =∫ −∫
∞ −∞
−a −∞
a −a
x 2 f X ( x ) dx − ∫
x 2 f X ( x ) dx − ∫
−a −∞
−a −∞
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
∞
x 2 f X ( x ) dx − ∫ x 2 f X ( x ) dx a
The 1st term equals EX 2 = VarX = 1 . The sum of the 4 following terms, because of the parity of the integrated function, equals
∞
−4∫ x 2 f X ( x ) dx from which we obtain the result. a
Exercise 2.5.
⎛X⎞ ⎛ 0⎞ Z = ⎜ ⎟ be a Gaussian vector of expectation vector m = ⎜ ⎟ and of ⎝Y ⎠ ⎝1 ⎠ ⎛ 1 1 ⎞ 2 ⎟ which is to say covariance matrix Γ Z = ⎜ Z ∼ N 2 ( m, Γ Z ) . ⎜1 1 ⎟ ⎝ 2 ⎠ Let
1) Give the law of the random variable
X − 2Y .
2) Under what conditions on the constants a and b is the random variable aX + bY independent of X − 2Y and of variance 1? Solution 2.5. 1) X ∼ N ( 0,1) and Y ∼ N (1,1) ; as
X and Y are also independent thus
X − 2Y is a Gaussian r.v.; precisely X − 2Y ∼ N ( −2,5 ) . ⎛ X − 2Y ⎞ ⎟ is a Gaussian vector (write the definition) X − 2Y and ⎝ aX + bY ⎠
2) As ⎜
aX + bY
are
independent
⇔
Cov ( X − 2Y , aX + bY ) = 0
now
92
Discrete Stochastic Processes and Optimal Filtering
Cov ( X − 2Y , aX + bY ) = aVarX − b Cov ( X , Y )
2 −2a Cov ( X , Y ) − 2bVarY = a − b − a = 0 i.e. b = 0 3 As 1 = Var ( a X
+ bY ) = Var aX = a 2 Var X
: a = ±1 .
Exercise 2.6.
X and Y and we assume that X admits a density probability f X ( x ) and that Y ∼ N ( 0,1) . We are looking at two independent r.v.
Determine the r.v.
(
)
E e XY X .
Solution 2.6.
(
)
E e XY x = E xY = ∫ e xy 1 x2 2 = e ∫ e 2π 1 So y → e 2π finally obtain
(
−( y − x ) 2
−( y − x ) 2
)
1 −y 2 e dy 2π 2
2
dy
2
is a density of probability (v.a. ∼ N ( x,1) ), and we
E e XY X = e
X2
2
.
Chapter 3
Introduction to Discrete Time Processes
3.1. Definition

A discrete time process is a family of r.v. X_T = {X_{t_j} | t_j ∈ T ⊂ ℝ} where T, called the time base, is a countable set of instants. X_{t_j} is the r.v. of the family considered at the instant t_j. Ordinarily, the t_j are uniformly spread and distant by a unit of time and, in the sequel, T will be equal to ℕ, ℤ or ℤ*, and the processes will still be denoted X_T or, if we wish to be precise, X_ℕ, X_ℤ or X_{ℤ*}.
In order to be able to study correctly some sets of r.v. X_j of X_T, and not only the r.v. X_j individually, it is in our interest to consider the latter as mappings defined on the same set, and this leads us to an exact definition.
DEFINITION.– Any family X_T of measurable mappings

X_j : (Ω, a) → (ℝ, B(ℝ)), ω → X_j(ω), with j ∈ T ⊂ ℝ,

is called a real discrete time stochastic process.

We also say that the process is defined on the fundamental space (Ω, a).

In general a process X_T is associated with a real phenomenon, that is to say that the X_j represent (random) physical, biological, etc. values; for example the intensity of electromagnetic noise coming from a certain star.

For a given ω, that is to say after the phenomenon has been performed, we obtain the values x_j = X_j(ω).

DEFINITION.– x_T = {x_j | j ∈ T} is called the realization or trajectory of the process X_T.
Figure 3.1. A trajectory
Laws

We defined the laws P_X of the real random vectors X^T = (X_1,..., X_n) in Chapter 1. These laws are measures defined on the Borel algebra of ℝ^n, B(ℝ^n) = B(ℝ) ⊗ ... ⊗ B(ℝ).

The finite sets (X_i,..., X_j) of r.v. of X_T are random vectors and, as we will be employing nothing but sets such as these in the following chapters, the considerations of Chapter 1 will be sufficient for the studies that we envisage.

However, X_T ∈ ℝ^T and in certain problems we cannot avoid the following additional sophistication:

1) construction of a σ-algebra B(ℝ^T) = ⊗_{j∈T} B(ℝ)_j on ℝ^T;

2) construction of laws on B(ℝ^T) (Kolmogorov's theorem).
Stationarity DEFINITION.– We say that a process
∀i, j , p ∈
the random vectors
same law, i.e. ∀Bi ,..., B j ∈ B (
((
)
(
{
XT = X j j ∈
}
is stationary if
( X i ,..., X j ) and ( X i+ p ,..., X j + p ) have the ) (in the drawing the Borelians are intervals):
P X i + p ∈ Bi ∩ ... ∩ X j + p ∈ B j
) ) = P ( ( X i ∈ Bi ) ∩ ... ∩ ( X j ∈ B j ) )
96
Discrete Stochastic Processes and Optimal Filtering
i +1
i
i+ p
j
i +1+ p
j+ p
t
Wide sense stationarity DEFINITION.– We say that a process
X T is centered if EX j = 0
DEFINITION.– We say that a process
X T is of the second order if:
X j ∈ L2 ( dP )
∀j ∈ T .
∀j ∈ T .
Let us remember that if
X j ∈ L2 ∀j ∈ T then X j ∈ L1 and ∀i, j ∈ T
EX i X j < ∞ . Thus, the following definition is meaningful. DEFINITION.– Given
X a real 2nd order process, we call the covariance function
of this process, the mapping
(
Γ : i, j ⎯⎯ → Γ ( i, j ) = Cov X i , X j
)
x We call the autocorrelation function of this process the mapping:
R : i, j ⎯⎯ → R ( i, j ) = E X i X j x
Introduction to Discrete Time Processes
97
These two mappings obviously coincide if X ] is centered. We can recognize here notions introduced in the context of random vectors, but here as the indices ...i,... j ,... represent instants, we can expect in general that when the deviations
i − j increase, the values Γ ( i, j ) and R ( i, j ) decrease. DEFINITION.– We say that the process X ] is wide sense stationary (WSS) if: – it is of the 2nd order;
→ m ( j ) = EX is constant; – the mapping j ⎯⎯ ]
\ Γ ( i + p, j + p ) = Γ ( i, j )
– ∀ i, j , p ∈ ]
In this case Γ ( i, j ) is instead written C ( j − i ) . Relationship linking the two types of stationarity A stationary process is not necessarily of the 2nd order as we see with the process X ] for example in which we choose for X j r.v. independent of Cauchy’s law:
fX j ( x) =
(
a
π a +x 2
2
)
and a > 0 and
EX j and EX 2j are not defined.
A “stationary process which is also of the 2nd order” (or a process of the 2nd order which is also stationary) must not be confused with a WSS process. It is clear that if a process of the 2nd order is stationary, it is thus WSS. In effect:
EX j + p = ∫ xdPX j+ p ( x ) = ∫ xdPX j ( x ) = EX j \
\
98
Discrete Stochastic Processes and Optimal Filtering
and:
Γ ( i + p, j + p ) = ∫ =∫
xy dPX i+ p , X j+ p ( x, y ) − EX i + p EX j + p
2
2
xy dPX i , X j ( x, y ) − EX i EX j = Γ ( i, j )
The inverse implication “wide sense stationary (WSS) ⇒ stationarity” is false in general. However, it is true in the case of Gaussian processes. Ergodicity Let X
be a WSS process.
DEFINITION.– We say that the expectation of X
EX 0 = lim
N ↑∞
N
1 2N + 1
∑
j =− N
X j (ω ) a.s. (almost surely)
We say that the autocorrelation function X
∀n ∈
is ergodic if:
K ( j, j + n ) = EX j X j +n = lim
N ↑∞
is ergodic if:
1 2N + 1
N
∑
j =− N
X j (ω ) X j +n (ω ) a.s.
That is to say, except possibly for ω ∈ N set of zero probability or even with the exception of trajectories whose apparition probability is zero, we have for any trajectory x :
EX 0 = lim
N ↑∞
+N
1 2N + 1
∑
j =− N
x j (ergodicity of 1st order)
= EX j X j + n = lim N ↑∞
1 2N + 1
+N
∑
j =− N
x j x j + n (ergodicity of 2nd order)
Introduction to Discrete Time Processes
With the condition that the process X
99
is ergodic, we can then replace a
mathematical expectation by a mean in time. This is a sufficient condition of ergodicity of 1st order. PROPOSITION.– Strong law of large numbers: If the X j ( j ∈
)
form a sequence of independent r.v. and which are of the
same law and if E X 0 < ∞ then EX 0 = lim
N ↑∞
+N
1
∑
2 N + 1 j =− N
X j (ω ) a.s.
NOTE.– Let us suppose that the r.v. X j are independent Cauchy r.v. of probability density
1
a π a + x2 2
( a > 0) .
By using the characteristic functions technique, we can verify that the r.v.
YN =
1
+N
∑
2 N + 1 j =− N
X j has the same law as X 0 ; thus YN can not converge a.s. to
the constant EX 0 , but E X 0 = +∞ .
X
EXAMPLE.– We are looking at the process
which consists of r.v.
X j = A cos ( λ j + Θ ) where A is a real constant and where Θ is an r.v. of uniform probability density f Θ (θ ) =
1 1 (θ ) . Let us verify that X 2π [0,2π [
is a
WSS process.
EX j = ∫
2π 0
Acos ( λ j + θ ) f Θ (θ ) dθ =
Γ ( i, j ) = K ( i, j ) = EX i X j = ∫
A2 2π
2π
∫0
2π 0
A 2π
2π
∫0
cos ( λ j + θ ) dθ = 0
Acos ( λ j + θ ) Acos ( λ j+θ ) f Θ (θ ) dθ
cos ( λ i + θ ) cos ( λ j + θ ) dθ =
A2 cos ( λ ( j − i ) ) 2
100
Discrete Stochastic Processes and Optimal Filtering
and X
is in fact WSS.
Keeping with this example, we are going to verify the ergodicity expectation. Ergodicity of expectation
lim N
+N
1
∑
Acos ( λ j + θ ) (with θ fixed ∈ [ 0, 2π [ )
2 N + 1 j =− N
= lim N
1
N
∑
2 N + 1 j =− N
Acosλ j = lim N
2A ⎛ N 1⎞ ⎜⎜ ∑ cosλ j − ⎟⎟ 2 N + 1 ⎝ j =0 2⎠
iλ N +1 N 2A ⎛ 1⎞ 2 A ⎛ 1- e ( ) 1 ⎞ iλ j = lim − ⎟ ⎜ Re ⎜ Re ∑ e − ⎟⎟ = lim N 2N + 1 ⎜ 2 ⎠ N 2 N + 1 ⎝⎜ 2 ⎠⎟ 1 − eiλ ⎝ j =0
If λ ≠ 2kπ , the parenthesis is bounded and the limit is zero and equal to EX 0 . Therefore, the expectation is ergodic. Ergodicity of the autocorrelation function
lim N
(with
∑
2 N + 1 j =− N
θ
= lim N
= lim N
+N
1
Acos ( λ j + θ ) Acos ( λ ( j + n ) + θ )
[
[
fixed ∈ 0, 2π )
A2
+N
∑
2 N + 1 j =− N
cosλ j cosλ ( j + n )
1 A2 + N ∑ ( cosλ ( 2j+n ) + cosλ n ) 2 2 N + 1 j =− N
+N ⎛ 1 A2 ⎛ ⎞ ⎞ A2 = lim ⎜ Re ⎜ eiλ n ∑ eiλ 2 j ⎟ ⎟ + cosλ n ⎜ ⎟⎟ 2 N ⎜ 2 2N + 1 j =− N ⎝ ⎠⎠ ⎝
The
limit
is
still
zero
autocorrelation function is ergodic.
and
A2 cosλ n = K ( j , j + n ) . Thus, the 2
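The ergodicity just verified for the process X_j = A cos(λj + Θ) can also be seen numerically — a sketch assuming NumPy (A, λ and the horizon N below are arbitrary choices): for a single simulated trajectory, the time averages approach EX_0 = 0 and K(j, j+n) = (A²/2) cos λn:

import numpy as np

rng = np.random.default_rng(7)
A, lam, N = 1.5, 0.7, 200_000          # arbitrary, with lam != 2*k*pi

theta = rng.uniform(0.0, 2.0 * np.pi)  # one single draw of Theta
j = np.arange(-N, N + 1)
x = A * np.cos(lam * j + theta)        # one trajectory of the process

print("time average of x_j:", x.mean(), "   (EX_0 = 0)")
for n in (0, 1, 5):
    k_hat = np.mean(x[: len(x) - n] * x[n:])
    print(f"time average of x_j x_(j+n), n={n}:", round(k_hat, 4),
          "  theory:", round(A**2 / 2 * np.cos(lam * n), 4))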
Two important processes in signal processing Markov process DEFINITION.– We say that X – ∀B ∈ B (
is a discrete Markov process if:
);
– ∀t1 ,..., t j +1 ∈
with t1 < t2 < ... < t j < t j +1 ;
– ∀x1 ,..., x j +1 ∈
.
(
) (
)
P X t j+1 ∈ B X t j = x j ,..., X t1 = x1 = P X t j+1 ∈ B X t j = x j , an
Thus
equality that more briefly can be written:
(
) (
)
P X t j+1 ∈ B x j ,..., x1 = P X t j+1 ∈ B x j . We can say that if t j represents the present instant, for the study of X towards the future (instants > t j ), the information
(
{( X
tj
) (
= x j ,..., X t 1 = x1
)
brings nothing more than the information X t j = x j .
B
xt1 xt
t1
j −1
t j −1
tj xt
t j +1 j
t
)}
102
Discrete Stochastic Processes and Optimal Filtering
Markov processes are often associated with phenomena beginning at instant 0 for example and we thus choose the probability law Π 0 of the r.v. X 0 .
(
The conditional probabilities P X t ∈ B x j j +1
) are called transition probabilities.
= j.
In what follows, we suppose t j
DEFINITION.– We say that the transition probability is stationary if
(
)
(
)
P X j +1 ∈ B x j is independent of j = P ( X 1 ∈ B x0 ) .
Here is an example of a Markov process that in practice is often met.
X
is defined by the r.v.
(
)
X 0 and the relation of recurrence
X j +1 = f X j , N j where the N j are independent r.v. and independent of the r.v.
X 0 and where f is a mapping:
×
Thus, let us show that ∀B ∈ B (
):
2
→
Borel function.
( ) ( ) ⇔ P ( f ( X , N ) ∈ B x , x ,..., x ) = P ( f ( X , N ) ∈ B x ) ⇔ P ( f ( x , N ) ∈ B x , x ,..., x ) = P ( f ( x , N ) ∈ B x ) P X j +1 ∈ B x j , x j −1 ,..., x0 = P X j +1 ∈ B x j j
j
j
j
j
j
j −1
j −1
0
0
This equality will be verified if the r.v.
( X j −1 = x j −1 ) ∩ ... ∩ ( X 0 = x0 ) .
j
j
j
j
j
j
N j is independent of
Now the relation of recurrence leads us to expressions of the form:

X_1 = f(X_0, N_0), X_2 = f(X_1, N_1) = f(f(X_0, N_0), N_1) = f_2(X_0, N_0, N_1),..., X_j = f_j(X_0, N_0,..., N_{j-1})

which proves that N_j, being independent of X_0, N_0,..., N_{j-1}, is also independent of X_0, X_1,..., X_{j-1} (and even of X_j).
the random vector
(
X S = X i ,..., X j
remember is denoted X S ∼
(
is Gaussian if ∀ S = ( i,..., j ) ∈
X
DEFINITION.– We say that a process
)
,
is a Gaussian vector that as we will
)
N n mS , Γ X s .
We see in particular that as soon as we know that a process law is entirely determined by its expectation function
X is Gaussian, its
j → m ( j ) and its
covariance function i, j → Γ ( i, j ) . A process such as this is denoted
X ∼ N ( m ( j ) , Γ ( i, j ) ) .
A Gaussian process is obviously of the 2nd order: furthermore if it is a WSS process it is thus stationary and to realize this it is sufficient to write the probability:
(
)
f X S xi ,..., x j =
of whatever vector
1
( 2π )
j −i +1 2
( Det Γ ) XS
1 2
T ⎛ 1 ⎞ exp ⎜ − ( x − mS ) Γ −S1 ( x − mS ) ⎟ ⎝ 2 ⎠
X S extracted from the process.
104
Discrete Stochastic Processes and Optimal Filtering
Linear space associated with a process
X
Given
a WSS process, we note
combinations of the r.v. of
That is to say:
H
X
HX
the family of finite linear
X .
⎧⎪ = ⎨ ∑ λ j X j S finite ⊂ ⎪⎩ j∈S
⎫⎪ ⎬ ⎪⎭
DEFINITION.– We call linear space associated with the process
H
X
2
augmented by the limits in L of the elements of H
denoted
H
X
X
X
the family
. The linear space is
.
NOTES.– 1) H
X
⊂H
X
⊂ L2 ( dP ) and H
2) Let us suppose that X
X
2
is a closed vector space of L
( dP ) .
is a stationary Gaussian process. All the linear 2
combinations of the r.v. X j of X
are Gaussian and the limits in L are equally
(
Gaussian. In effect, we easily verify that if the set of r.v. X n ∼ N mn , σ n 2
converge in L towards an r.v. X of expectation m and of variance
σ n2
then converge towards m and
σ
(
and X ∼ N m, σ
2
σ 2 , mn
) respectively.
2
)
and
Delay operator Process X
being given, we are examining operator T
defined by:
T n : ∑ λ j X j → ∑ λ j X ( j − n ) ( S finished ⊂ j∈S
H
j∈S
X
H
X
)
n
( n ∈ ) on H ∗
X
Introduction to Discrete Time Processes
DEFINITION.– T
n
105
is called operator delay of order n .
Properties of operator delay: –
T n is linear of H ∗
– ∀ n and m ∈ –
X
in H
X
;
T n T m = T n+m ;
T n conserves the scalar product of L2 , that is to say ∀ I and J finite ⊂
:
⎛ ⎞ ⎛ ⎞ < T n ⎜ ∑ λi X i ⎟ , T n ⎜ ∑ µ j X j ⎟ > = < ∑ λi X i , ∑ µ j X j > . ⎜ j∈J ⎟ i∈I j∈J ⎝ i∈I ⎠ ⎝ ⎠ EXTENSION.– T Let
Z ∈H
X
n
extends to all
and Z p ∈ H
X
H
X
in the following way:
be a sequence of r.v. which converge towards Z
2
in L ; Z P is in particular a Cauchy sequence of
( )
Tn Zp
is also a Cauchy sequence of
converges in
H
X
H
X
P
∀Z ∈ H
X
towards Z . It is natural to state T
n
As a consequence,
X
n
and by isometry T ,
H
which, since
. It is simple to verify that lim T
particular series Z p which converges towards
H n
X
is complete,
( Z p ) is independent of the
Z.
and the series Z p ∈ H
X
which converges
T n (Z p ) . ( Z ) = lim P
DEFINITION.– We can also say that
H
X
is the space generated by the X
process. 3.2. WSS processes and spectral measure
In this section it will be interesting to note the influence on the spectral density of the temporal spacing between the r.v. For this reason we are now about to
106
Discrete Stochastic Processes and Optimal Filtering
{
consider momentarily a WSS process X θ = X jθ j ∈ and where jθ has the significance of duration.
} where θ
is a constant
3.2.1. Spectral density
DEFINITION.– We say that the process X θ possesses a spectral density if its
( ( j − i )θ ) = EX iθ X jθ − EX iθ EX jθ can be written 1 C ( nθ ) = ∫ 12θ exp ( 2iπ ( inθ ) u ) S XX ( u ) du and S XX ( u ) is −
covariance C ( nθ ) = C
in the form:
2θ
then called the spectral density of the process X θ . PROPOSITION.– +∞
Under the hypothesis
∑ C ( nθ ) < ∞ :
n =−∞
1) the process X θ admits a spectral density S XX ; 2) S XX is continuous, periodic of
C
− nθ − 2θ − θ
1
θ
period, real and even.
Var X jθ
0 θ
2θ
nθ
S XX
u
t −1
2θ
0
1
2θ
Figure 3.2. Covariance function and spectral density of a process
Introduction to Discrete Time Processes
107
NOTE.– The covariance function C is not defined (and in particular does not equal zero) outside the values nθ . DEMONSTRATION.– Taking into account the hypotheses, the series: +∞
∑ C ( pθ ) exp ( −2iπ ( pθ ) u )
p =−∞
converges uniformly on
1
θ
and defines a continuous function S ( u ) and
-periodic. Furthermore:
∫ =∫
+∞ 2θ C −1 2θ p =−∞ 1
1
2θ −1 2θ
∑ ( pθ ) exp ( −2iπ ( pθ ) u ) exp ( 2iπ ( nθ ) u ) du
S ( u ) exp ( 2iπ ( nθ ) u ) du 2
The uniform convergence and the orthogonality in L
( − 1 2θ , 1 2θ ) of the
complex exponentials enables us to conclude that:
C ( nθ ) = ∫
1
2θ −1 2θ
exp ( 2iπ ( nθ ) u ) S ( u ) du and that S XX ( u ) = S ( u ) .
To finish, C ( nθ ) is a covariance function, thus:
C ( − nθ ) = C ( nθ )
108
Discrete Stochastic Processes and Optimal Filtering
and we can deduce from this that S XX ( u ) =
+∞
∑
p =−∞
real and even (we also have S XX ( u ) = C ( 0 ) + 2 EXAMPLE.– The covariance C ( nθ ) = σ e
C ( pθ ) exp ( −2iπ ( pθ ) u ) is ∞
∑ C ( pθ ) cos2π ( pθ ) u ). p =1
2 − λ nθ
( λ > 0)
of a process X θ in
fact verifies the condition of the proposition and X θ admits the spectral density.
S XX ( u ) = σ 2
+∞
∑ e−λ nθ −2iπ ( nθ )u
n =−∞
∞ ⎛ ⎞ − λ nθ − 2iπ ( nθ )u − λ nθ + 2iπ ( nθ )u =σ 2 ⎜∑e + ∑e − 1⎟ n =0 ⎝ n =0 ⎠ 1 1 ⎛ ⎞ =σ2⎜ + − 1⎟ − λθ − 2iπθ u − λθ + 2iπθ u 1− e ⎝ 1− e ⎠ ∞
=σ2
1 − e−2λθ 1 + e −2λθ − 2e − λθ cos2πθ u
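A quick numerical check of this closed form (a sketch assuming NumPy; θ, λ and σ² below are arbitrary test values, not from the book) truncates the series Σ_n C(nθ) e^{−2iπ(nθ)u} and compares it with the expression just obtained:

import numpy as np

theta, lam, sigma2 = 0.5, 0.8, 2.0      # arbitrary test values
u = np.linspace(-1 / (2 * theta), 1 / (2 * theta), 5)

# truncated series  sum_n C(n*theta) * exp(-2i*pi*n*theta*u)
n = np.arange(-200, 201)
C = sigma2 * np.exp(-lam * np.abs(n) * theta)
series = (C[None, :] * np.exp(-2j * np.pi * n[None, :] * theta * u[:, None])).sum(axis=1).real

closed = sigma2 * (1 - np.exp(-2 * lam * theta)) / (
    1 + np.exp(-2 * lam * theta) - 2 * np.exp(-lam * theta) * np.cos(2 * np.pi * theta * u))

print(np.round(series, 6))
print(np.round(closed, 6))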
White noise
DEFINITION.– We say that a centered WSS process X θ is a white noise if its covariance
function
⎛ C ( 0 ) = EX 2jθ = σ 2 ⎜ ⎜ C ( nθ ) = 0 if n ≠ 0 ⎝
C ( nθ ) = C ( ( j − i )θ ) = EX iθ X jθ
verifies
∀j ∈
The function C in fact verifies the condition of the preceding proposition and
S XX ( u ) =
+∞
∑
n =−∞
C ( nθ ) exp ( −2iπ ( nθ ) u ) = C ( 0 ) = σ 2 .
Introduction to Discrete Time Processes
109
S XX C
σ
σ2
2
t
u
0
0
Figure 3.3. Covariance function and spectral density of a white noise
We often meet “Gaussian white noises”: these are Gaussian processes which are also white noises; the families of r.v. extracted from such processes are independent
(
and ∼ N 0, σ
2
).
More generally we have the following result which we will use as the demonstration. Herglotz theorem
In order for a mapping
nθ → C ( nθ ) to be the covariance function of a
WSS process, it is necessary and sufficient that a positive measurement on
⎛⎡ 1 1 ⎤⎞ , ⎥ ⎟ , which is called the spectral measure, such that: ⎝ ⎣ 2θ 2θ ⎦ ⎠
B ⎜ ⎢-
C ( nθ ) = ∫
1
2θ −1 2θ
exp ( 2iπ ( nθ ) u ) d µ X ( u ) . ∞
In this statement we no longer assume that
∑ C ( nθ ) < ∞ .
n =−∞
µX
exists
110
Discrete Stochastic Processes and Optimal Filtering +∞
∑ C ( nθ ) < ∞ ,
If
we
again
find
the
starting
statement
with:
n =−∞
d µ X ( u ) = S XX ( u ) du (a statement that we can complete by saying that the
spectral density S XX ( u ) is positive).
3.3. Spectral representation of a WSS process
In this section we explain the steps enabling us to arrive at the spectral representation of a process. In order not to obscure these steps, the demonstrations of the results which are quite long without being difficult are not given.
3.3.1. Problem
The object of spectral representation is: 1) To study the integrals (called Wiener integrals) of the
∫S ϕ ( u ) dZu
type
obtained as limits, in a meaning to clarify the expressions with the form:
∑ ϕ ( u j ) ( Zu j
j
− Zu j−1
) , ϕ is a mapping with complex values (and
where S is a restricted interval of
{
other conditions), Z S = Z u u ∈ S
}
is a 2nd order process with orthogonal
increments (abbreviated as p.o.i.) whose definition will be given in what follows. 2) The construction of the Wiener integral being carried out, to show that reciprocally, if we allow ourselves a WSS process X θ , we can find a p.o.i.
{
}
Z S = ZU u ∈ S = ⎡ − 1 , 1 ⎤ such that ∀j ∈ ⎣ 2θ 2θ ⎦ a Wiener integral X jθ =
∫S e
2iπ ( jθ )u
dZu .
X jθ may be written as
Introduction to Discrete Time Processes
NOTE.–
∫ S ϕ ( u ) dZu
and
∫S e
2iπ ( jθ )u
111
dZu will not be ordinary Stieljes
integrals (and it is this which motivates a particular study). In effect:
⎛ ⎞ ⎜ ⎟ ⎜ σ = ,.., u j −1 , u j , u J +1 subdivision of S ⎟ ⎜ ⎟ let us state ⎜ σ = sup u j − u j −1 module of the subdivision σ ⎟ j ⎜ ⎟ ⎜ ⎟ ⎜ Iσ = ∑ ϕ u j Z u j − Z u j−1 ⎟ u j ∈σ ⎝ ⎠
{
}
( )(
)
∀σ , the expression Iσ is in fact defined, it is a 2nd order r.v. with complex values. However, the process Z S not being a priori of bounded variation, the ordinary limit
lim Iσ , i.e. the limit with a given trajectory u → Zu (ω ) , does not exist and
σ →0
∫ S ϕ ( u ) dZu The r.v.
cannot be an ordinary Stieljes integral.
∫ S ϕ ( u ) dZu
will be by definition the limit in
limit exists for the family Iσ when
L2 precisely if this
σ → 0 , i.e.: 2
lim E Iσ − ∫ ϕ ( u ) dZu = 0 .
σ →0
S
This is still sometimes written:
L _ ( Iσ ) . ∫ S ϕ ( u ) dZu = σlim →0 2
3.3.2. Results
3.3.2.1. Process with orthogonal increments and associated measurements
S designates here a bounded interval of
.
112
Discrete Stochastic Processes and Optimal Filtering
DEFINITION.– We call a random process of continuous parameters with base S , all the family of r.v. Z u , the parameter u describing S .
{
}
This process will be denoted as Z S = Z u u ∈ S . Furthermore, we can say that such a process is: – centered if EZ u = 0
∀u ∈ S ; 2
2
– of the 2nd order if EZ u < ∞ (i.e. Z u ∈ L – continuous in
( dP ) ) ∆u ∈ S ;
L2 : if E ( Zu + ∆u − Zu ) → 0 2
when ∆u → 0 ∀u and u + ∆u ∈ S (we also speak about right continuity when
∆u > 0 or of left continuity when ∆u < 0 in L2 ). In what follows Z S will be centered, of the 2nd order and continuous in
L2 .
Z S has orthogonal increments ∀u1 , u2 , u3 , u4 ∈ S with u1 < u2 ≤ u3 < u4
DEFINITION.– We say that the process ( ZS
is
a
p.o.i.)
if
(
< Z u4 − Z u3 , Z u2 − Zu1 > L2 ( dP ) = E Zu4 − Zu3
)(Z
u2
)
− Zu1 = 0 .
We say that Z S is a process with orthogonal and stationary increments ( Z S is a p.o.s.i.) if Z S is a p.o.i. and if in addition ∀u1 , u2 , u3 , u4 with u4 − u3 = u2 − u1
(
we have E Z u − Z u 4
3
)
2
(
= E Z u2 − Zu1
)
2
.
PROPOSITION.– To all p.o.i. Z S which are right continuous in
L2 , we can
associate: –
a
function
F which does not decrease on
F ( u ′ ) − F ( u ) = E ( Zu′ − Zu ) if u < u ′ ; 2
S
such
that:
Introduction to Discrete Time Processes
– a measurement thus
µ
on
B (S )
113
which is such that ∀ u , u ′ ∈ S with u < u ′ ,
( ).
µ ( ]u, u′]) = F ( u′ ) − F u −
3.3.2.2. Wiener stochastic integral
Let Z_S still be a p.o.i., right continuous in L², and let µ be the associated measure.
PROPOSITION.– Given φ and ψ ∈ L²(µ) with complex values:
1) The limit in L², lim_{|σ|→0} Σ_{u_j ∈ σ} φ(u_j)(Z_{u_j} − Z_{u_{j−1}}), exists. This is by definition Wiener's stochastic integral ∫_S φ(u) dZ_u;
2) we have the property:
E[ ∫_S φ(u) dZ_u ( ∫_S ψ(u) dZ_u )* ] = ∫_S φ(u) ψ*(u) dµ(u),
in particular E| ∫_S φ(u) dZ_u |² = ∫_S |φ(u)|² dµ(u).
Idea of the demonstration
Let us denote by ε the vector space of step functions with complex values: φ ∈ ε if φ(u) = Σ_j a_j 1_{]u_{j−1}, u_j]}(u).
We begin by proving the proposition for functions φ, ψ, … ∈ ε, for which (if φ ∈ ε):
∫_S φ(u) dZ_u = Σ_j φ(u_j)(Z_{u_j} − Z_{u_{j−1}}).
We next establish the result in the general case by using the fact that ε (⊂ L²(µ)) is dense in L²(µ), i.e. ∀φ ∈ L²(µ) we can find a sequence φ_n ∈ ε such that:
‖φ − φ_n‖²_{L²(µ)} = ∫_S |φ(u) − φ_n(u)|² dµ(u) → 0 when n → ∞.
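A quick numerical sanity check of the isometry property (a sketch, not from the text; it assumes S = [0,1] and a p.o.s.i. Z built from independent centered Gaussian increments, so that the associated measure µ is the Lebesgue measure):

% Sketch: Monte Carlo check of E| int_S phi dZ |^2 = int_S |phi|^2 dmu
% Assumed setting: S = [0,1], increments of Z i.i.d. N(0,du), hence mu = Lebesgue.
M  = 5000;                     % number of simulated trajectories of Z
n  = 1000;                     % number of subdivision points of S
du = 1/n;
u  = (1:n)*du;
phi = u;                       % test function phi(u) = u
I = zeros(M,1);
for m = 1:M
    dZ = sqrt(du)*randn(1,n);  % orthogonal (here independent) increments Z_uj - Z_uj-1
    I(m) = sum(phi.*dZ);       % the approximating sum I_sigma
end
mean(I.^2)                     % should be close to int_0^1 u^2 du = 1/3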
3.3.2.3. Spectral representation
We start with X_θ, a WSS process. Following Herglotz's theorem, we know that its covariance function nθ → C(nθ) is written:
C(nθ) = ∫_{−1/2θ}^{1/2θ} e^{2iπ(nθ)u} dµ_X(u),
where µ_X is the spectral measure on B([−1/2θ, 1/2θ]).
PROPOSITION.– If X_θ is a centered WSS process of covariance function nθ → C(nθ) and of spectral measure µ_X, there exists a unique p.o.i. Z_S = {Z_u | u ∈ S = [−1/2θ, 1/2θ]} such that ∀j ∈ ℤ:
X_{jθ} = ∫_S e^{2iπ(jθ)u} dZ_u.
Moreover, the measure associated with Z_S is the spectral measure µ_X. The expression of the X_{jθ} as Wiener integrals is called the spectral representation of the process.
NOTE.– By applying property 2) of the preceding proposition:
E[ X_{jθ} X*_{(j+n)θ} ] = E[ ∫_S e^{2iπ(jθ)u} dZ_u ( ∫_S e^{2iπ((j+n)θ)u} dZ_u )* ] = ∫_S e^{−2iπ(nθ)u} dµ_X(u) = C(−nθ) = C*(nθ).
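To make this representation concrete, here is a small simulation (a sketch with θ = 1 and arbitrarily chosen frequencies and amplitudes): a real process built from a few random-phase sinusoids is WSS, and its covariance coincides with the integral of e^{2iπnu} against a discrete spectral measure carrying masses at those frequencies.

% Sketch (theta = 1): WSS process with a discrete spectral measure.
% X_j = sum_k sqrt(2)*s_k*cos(2*pi*u_k*j + Phi_k), Phi_k uniform on [0,2*pi],
% is centered and has covariance C(n) = sum_k s_k^2*cos(2*pi*u_k*n).
uk = [0.10 0.27];  sk = [1 0.5];        % frequencies and amplitudes (assumptions)
M = 2000; N = 200; j = 0:N-1;
C_emp = zeros(1,11);
for m = 1:M
    Phi = 2*pi*rand(1,numel(uk));
    X = zeros(1,N);
    for k = 1:numel(uk)
        X = X + sqrt(2)*sk(k)*cos(2*pi*uk(k)*j + Phi(k));
    end
    for n = 0:10
        C_emp(n+1) = C_emp(n+1) + mean(X(1:N-n).*X(1+n:N))/M;
    end
end
C_theo = zeros(1,11);
for n = 0:10
    C_theo(n+1) = sum(sk.^2.*cos(2*pi*uk*n));
end
[C_emp; C_theo]                          % the two rows should nearly coincide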
3.4. Introduction to digital filtering
We suppose again that θ = 1. Given a WSS process X and a sequence of real numbers h = {h_j ∈ ℝ | j ∈ ℤ}, we are interested in the operation which makes a new process Y correspond to X, defined by:
∀K ∈ ℤ   Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} = ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K
(h_0 T^0 is also denoted h_0 1, where 1 is the identity mapping of L² into L²).
In what follows we will also assume that Σ_{j=−∞}^{+∞} |h_j| < ∞; this condition is generally denoted h ∈ ℓ¹ and is called (for reasons which will be explained later) the condition of stability.
DEFINITION.– We say that the process Y is the transform (or filtering) of the process X by the filter H(T) = Σ_{j=−∞}^{+∞} h_j T^j and we write Y = H(T) X.
NOTES.– 1) The filter H(T) is entirely determined by the sequence of coefficients h = {h_j ∈ ℝ | j ∈ ℤ} and, according to the case in hand, we will speak of the filter H(T), of the filter h, or again of the filter (…, h_{−m}, …, h_{−1}, h_0, …, h_n, …).
2) The expression "∀K ∈ ℤ, Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j}" is the definition of the convolution product (noted ∗) of X by h, which is also written:
Y = h ∗ X, or again ∀K ∈ ℤ, Y_K = (h ∗ X)_K.
3) Given that X is a WSS process and H_X is the associated linear space, it is clear that the r.v. Y_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} ∈ H_X and that the process Y is also WSS.
Causal filter
Physically, for any given K, Y_K can only depend on the r.v. X_{K−j} previous to it (in the wide sense), i.e. with j ∈ ℕ. A filter H(T) which realizes this condition is called causal or feasible. Amongst these causal filters, we can further distinguish two major classes:
1) Filters of finite impulse response (FIR) such that:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{N} h_j X_{K−j},
the schematic representation of which follows.
Figure 3.4. Schema of a FIR filter (delay line of operators T, coefficients h_0, h_1, …, h_N and summations producing Y_K from X_K)
2) Filters of infinite impulse response (IIR) such that:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{∞} h_j X_{K−j}.
NOTES.– 1) Let us explain the role played by the operator T: at any particular instant K, it replaces X_K with X_{K−1}; we can also say that T blocks the r.v. X_{K−1} for a unit of time and restores it at instant K.
2) Let H(T) be an IIR filter. At the instant K:
Y_K = Σ_{j=0}^{∞} h_j X_{K−j} = h_0 X_K + … + h_K X_0 + h_{K+1} X_{−1} + …
For a process X beginning at the instant 0, we will thus have:
∀K ∈ ℕ   Y_K = Σ_{j=0}^{K} h_j X_{K−j}.
Example of filtering of a Gaussian process
Let us consider the Gaussian process X ∼ N(m(j), Γ(i,j)) and the FIR filter H(T) defined by h = (…, 0, …, 0, h_0, …, h_N, 0, …). We immediately verify that the process Y = H(T) X is Gaussian. Let us consider for example the filtering specified by the following schema: the input is X ∼ N(0, e^{−|j−i|}) and the filter has the two coefficients h_0 = −1 and h_1 = 2, i.e.:
∀K ∈ ℤ   Y_K = Σ_{j=0}^{K} h_j X_{K−j} = −X_K + 2 X_{K−1}.
Y is a Gaussian process. Let us determine its parameters:
m_Y(i) = E Y_i = 0;
Γ_Y(i,j) = E Y_i Y_j = E[ (−X_i + 2X_{i−1})(−X_j + 2X_{j−1}) ]
= E X_i X_j − 2 E X_{i−1} X_j − 2 E X_i X_{j−1} + 4 E X_{i−1} X_{j−1}
= 5 e^{−|j−i|} − 2 e^{−|j−i+1|} − 2 e^{−|j−i−1|}.
Inverse filter of a causal filter
DEFINITION.– We say that a causal filter H(T) is invertible if there is a filter, denoted (H(T))^{−1} and called the inverse filter of H(T), such that for any WSS process X we have:
X = H(T)( (H(T))^{−1} X ) = (H(T))^{−1}( H(T) X )   (∗)
If such a filter exists, the equality Y = H(T) X is equivalent to the equality X = (H(T))^{−1} Y.
Furthermore, (H(T))^{−1} is defined by a sequence of coefficients h′ = {h′_j ∈ ℝ | j ∈ ℤ} and we have the convolution product: ∀K ∈ ℤ, X = h′ ∗ Y.
In order to find the inverse filter (H(T))^{−1}, i.e. in order to find the sequence of coefficients h′ = {h′_j ∈ ℝ | j ∈ ℤ}, we write that the sequence of equalities (∗) is equivalent to:
∀K ∈ ℤ   X_K = ( Σ_{j=−∞}^{+∞} h_j T^j )( ( Σ_{j=−∞}^{+∞} h′_j T^j ) X_K ) = ( Σ_{j=−∞}^{+∞} h′_j T^j )( ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K ),
or even to:
( Σ_{j=−∞}^{+∞} h_j T^j )( Σ_{j=−∞}^{+∞} h′_j T^j ) = ( Σ_{j=−∞}^{+∞} h′_j T^j )( Σ_{j=−∞}^{+∞} h_j T^j ) = 1.
EXAMPLE.– We are examining the causal filter H(T) = 1 − hT.
1) If |h| < 1, H(T) admits the inverse filter (H(T))^{−1} = Σ_{j=0}^{∞} h^j T^j.
To see this we must verify that, X_K being the r.v. at instant K of a WSS process X, we have:
(1 − hT)( ( Σ_{j=0}^{∞} h^j T^j ) X_K ) = X_K   (equality in L²)
⇔ lim_N (1 − hT)( Σ_{j=0}^{N} h^j T^j ) X_K = X_K
⇔ (1 − h^{N+1} T^{N+1}) X_K − X_K = −h^{N+1} X_{K−(N+1)} → 0 when N ↑ ∞,
which is verified if |h| < 1 since ‖h^{N+1} X_{K−(N+1)}‖ = |h|^{N+1} (E X_0²)^{1/2}.
We should also note that (H(T))^{−1} is causal.
2) If |h| > 1, let us write (1 − hT) = −hT(1 − (1/h) T^{−1}); thus:
(1 − hT)^{−1} = −(T^{−1}/h)(1 − (1/h) T^{−1})^{−1}.
As the operators commute and 1/|h| < 1:
(1 − hT)^{−1} = −(T^{−1}/h) Σ_{j=0}^{∞} (1/h^j) T^{−j} = −Σ_{j=0}^{∞} T^{−(j+1)}/h^{j+1}.
However, this inverse has no physical reality and it is not causal (the "lead operators" T^{−(j+1)} are not causal).
3) If |h| = 1, (1 − T) and (1 + T) are not invertible.
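A numerical illustration of case 1) (a sketch, not from the text; the white-noise input, the value h = 0.6 and the truncation order N are arbitrary choices): applying the truncated series Σ_{j=0}^{N} h^j T^j and then (1 − hT) should return the input signal up to a term of order |h|^{N+1}.

% Sketch: (1 - h*T) composed with sum_{j=0}^{N} h^j T^j acts as the identity
h = 0.6; N = 60; K = 500;
X = randn(1,K);                  % a (white, hence WSS) test signal
hj = h.^(0:N);                   % coefficients of the truncated inverse filter
U = filter(hj,1,X);              % U_K = sum_{j=0}^{N} h^j X_{K-j}
Xrec = filter([1 -h],1,U);       % apply (1 - h*T): Xrec_K = U_K - h*U_{K-1}
max(abs(X - Xrec))               % of order |h|^(N+1), i.e. negligible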
Transfer function of a digital filter
DEFINITION.– We call transfer function of the digital filter H(T) = Σ_{j=−∞}^{+∞} h_j T^j the function H(z) = Σ_{j=−∞}^{+∞} h_j z^{−j}, z ∈ ℂ.
We recognize the definition, given in analysis, of a Laurent series if we permute z and z^{−1} = 1/z. As a consequence of this permutation, the transfer functions (sums of the series) will sometimes be written using the variable z. We also say that H(z) is the z-transform of the digital sequence h = (…, h_{−m}, …, h_0, …, h_n, …).
Let us be more precise about the domain of definition of H(z); it is the domain of convergence K of the Laurent series. We already know that K is an annulus of center 0 and thus has the form:
K = {z | 0 ≤ r < |z| < R}.
Moreover, any circle of the complex plane of center 0 and radius ρ is denoted C(0, ρ).
K contains C(0,1) because, owing to the stability hypothesis of the filter, Σ_{j=−∞}^{+∞} |h_j| < ∞, the series Σ_{j=−∞}^{+∞} h_j z^{−j} converges absolutely for any z ∈ C(0,1).
Figure 3.5. Convergence domain of the transfer function (an annulus bounded by C(0,r) and C(0,R) and containing the unit circle)
The singularities σ_j of the transfer function H(z) of any digital filter verify |σ_j| ≤ r or |σ_j| ≥ R, and there is at least one singularity of H(z) on C(0,r) and another on C(0,R) (if not, K, the holomorphic domain of H(z), could be enlarged).
If the filter is now causal:
– if it is an IIR filter then H(z) = Σ_{j=0}^{∞} h_j z^{−j}, so H(z) is holomorphic in K = {z | 0 ≤ r < |z|} (R = +∞);
– if it is an FIR filter then H(z) = Σ_{j=0}^{N} h_j z^{−j}, so H(z) is holomorphic in K = {z | 0 < |z|} (the plane punctured at 0).
We observe above all that the singularities σ_j of the transfer function of a stable, causal filter all have a modulus strictly less than 1.
Figure 3.6. Convergence domain of H(z) of an IIR causal filter and convergence domain of H(z) of an FIR causal filter
NOTE.– In the case of a Laurent series Σ_{j=−∞}^{+∞} h_j z^{−j} (i.e., in the case of a digital filter h = {…, h_{−m}, …, h_0, …, h_n, …}), its domain of convergence K, and thus its sum H(z), is determined in a unique manner; that is to say, the couple (H(z), K) is associated with the filter.
Reciprocally, if, given H(z), we wish to obtain the filter h, it is necessary to begin by specifying the domain in which we wish to expand H(z), because for different domains K we obtain different Laurent expansions having H(z) as their sum.
This can be summed up by the double implication (H(z), K) ⇄ h.
Inversion of the z-transform
Given the couple (H(z), K), we wish to find the filter h.
H being holomorphic in K, we can apply Laurent's formula:
∀j ∈ ℤ   h_j = (1/2iπ) ∮_{Γ+} H(z) z^{j−1} dz,
where (homotopy argument) Γ is any contour contained in K and encircling 0. The integral can be calculated by the residue method or even, since we have a choice of contour Γ, by choosing Γ = C(0,1), parameterizing it and calculating:
∀j ∈ ℤ   h_j = (1/2π) ∫_0^{2π} H(e^{iθ}) e^{ijθ} dθ.
In order to determine the h_j, we can also expand the function H(z) in a Laurent series by making use of the usual known expansions.
SUMMARY EXAMPLE.– Let the stable causal filter be H(T) = 1 − hT with |h| < 1, of transfer function H(z) = 1 − h z^{−1} defined on ℂ − {0}. We have seen that it is invertible and that its inverse, equally causal and stable, is R(T) = Σ_{j=0}^{∞} h^j T^j.
The transfer function of the inverse filter is thus:
R(z) = Σ_{j=0}^{∞} h^j z^{−j} = 1/(1 − h z^{−1}), defined on {z | |z| > |h|}
(note also that R(z) = 1/H(z)).
Figure 3.7. Definition domain of H(z) and definition domain of R(z)
1 on z z > h , let us find (as an exercise) the 1 − hz −1 −j Laurent expansion of R ( z ) , i.e. the h j coefficients of z . Having R ( z ) =
1 1 R ( z )z j −1dz = + ∫ Γ 2iπ 2iπ where Γ is a contour belonging to z z > h . Using the Laurent formulae h j =
{
}
∫Γ
+
zj −dz z−h
126
Discrete Stochastic Processes and Optimal Filtering
By applying the residue theorem:
– if j ≥ 0: h_j = 2iπ · (1/2iπ) (residue of z^j/(z − h) at h) = lim_{z→h} (z − h) z^j/(z − h) = h^j;
– if j < 0: h_j = 2iπ · (1/2iπ) [ (residue of 1/(z^{|j|}(z − h)) at 0) + (residue of 1/(z^{|j|}(z − h)) at h) ] = −1/h^{|j|} + 1/h^{|j|} = 0.
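These coefficients can also be recovered numerically (a sketch; the standard function filter realizes the causal recursion associated with 1/(1 − h z^{−1}), so its impulse response should reproduce the h^j found above):

% Sketch: impulse response of R(z) = 1/(1 - h*z^{-1}) equals h^j for j >= 0
h = 0.6; N = 10;
delta = [1 zeros(1,N)];            % unit impulse
r = filter(1,[1 -h],delta);        % causal realization of 1/(1 - h*z^{-1})
[r; h.^(0:N)]                      % the two rows coincide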
PROPOSITION.– It is given that X is a WSS process and H_X is the associated linear space; we are still considering the filter H(T) of transfer function H(z) = Σ_{j=−∞}^{+∞} h_j z^{−j} with Σ_{j=−∞}^{+∞} |h_j| < ∞.
So:
1) ∀K ∈ ℤ, the r.v. Y_K = ( Σ_{j=−∞}^{+∞} h_j T^j ) X_K = Σ_{j=−∞}^{+∞} h_j X_{K−j} of the filtered process converges in, and remains in, H_X; we say that the filter is stable.
2) The filtered process Y is WSS.
3) The spectral densities of X and of Y are linked by the relationship:
S_YY(u) = |H(e^{2iπu})|² S_XX(u).
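Relationship 3) is easy to observe empirically (a sketch, not from the text; it feeds unit-variance white noise, for which S_XX ≡ 1, through an arbitrary FIR filter and uses an averaged periodogram as a crude spectral estimator):

% Sketch: for X white noise (S_XX = 1), S_YY(u) is |H(e^{2*i*pi*u})|^2
h = [1 -0.8 0.2];                  % an arbitrary FIR filter (h_0, h_1, h_2)
N = 512; M = 400;
P = zeros(1,N);
for m = 1:M
    X = randn(1,N);                % unit-variance white noise
    Y = filter(h,1,X);
    P = P + abs(fft(Y)).^2/N;      % periodogram of Y
end
Syy_emp  = P/M;                    % averaged periodogram
u = (0:N-1)/N;
Syy_theo = abs(h(1) + h(2)*exp(-2i*pi*u) + h(3)*exp(-2i*pi*2*u)).^2;
plot(u,Syy_emp,u,Syy_theo)         % the two curves should practically overlap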
DEMONSTRATION.–
1) We have to show that ∀K ∈ ℤ there exists an r.v. Y_K ∈ H_X ⊂ L²(dP) such that the sequence N → Σ_{j=−N}^{N} h_j X_{K−j} converges, for the norm of H_X, towards Y_K when N ↑ ∞. As H_X is a Banach space, it is sufficient to verify the normal convergence, namely:
Σ_{j=−∞}^{+∞} ‖h_j X_{K−j}‖ = Σ_{j=−∞}^{+∞} |h_j| (E X²_{K−j})^{1/2} < ∞,
which follows from the stability condition, since X being WSS, (E X²_{K−j})^{1/2} = (E X_0²)^{1/2} is a constant.
By using the equation of observations:
Y_j ⊥ W_K for 0 ≤ j ≤ K−1 and Y_j ⊥ N_K for 0 ≤ j ≤ K.
The problem of the estimation can now be expressed simply in the following way. Knowing that A(K) is the state matrix of the system, that H(K) is the measurement matrix, and knowing the results y_i of Y_i, i ∈ [1, K], obtain the estimations x̂_j of X_j:
– if 1 ≤ j < K we say that the estimation is a smoothing;
– if j = K we say that the estimation is a filtering;
– if j > K we say that the estimation is a prediction.
NOTE.– The matrices C(K) and G(K) do not play an essential role, insofar as the noise powers appear in the elements of the matrices Q_K and R_K respectively. However, the reader will be able to find analogies with the notations used in "Processus stochastiques et filtrage de Kalman" [BER 98] by the same authors, which examines the continuous case.
7.3.3. Innovation process
The innovation process has already been defined as:
I_K = Y_K − H(K) Proj_{H^Y_{K−1}} X_K = Y_K − H(K) X̂_{K|K−1}   (of dimension m×1),
with:
H^Y_{K−1} = { Σ_{j=0}^{K−1} Λ_j Y_j | Λ_j matrix n×m }.
By this choice of the Λ_j, the space H^Y_{K−1} is adapted to the order of the state multivectors X_j and Proj_{H^Y_{K−1}} X_K = X̂_{K|K−1} has the same order as X_K.
Thus I_K represents the influx of information between the instants K−1 and K.
Reminder of properties established earlier:
I_K ⊥ Y_j and I_K ⊥ I_j for j ∈ [1, K−1].
We will come back to the innovation to stress its physical meaning.
7.3.4. Covariance matrix of the innovation process
Between two measurements, the dynamics of the system lead to an evolution of the state quantities. So the prediction of the state vector at instant K, knowing the measurements (Y_1, …, Y_{K−1}), that is to say X̂_{K|K−1}, is written according to the filtering at instant K−1:
X̂_{K|K−1} = E(X_K | Y_1, …, Y_{K−1}) = Proj_{H^Y_{K−1}} X_K
= Proj_{H^Y_{K−1}} ( A(K−1) X_{K−1} + C(K−1) N_{K−1} )
= A(K−1) X̂_{K−1|K−1} + 0,
i.e.:
X̂_{K|K−1} = A(K−1) X̂_{K−1|K−1}.
Only the information deriving from a new measurement at instant K will enable us to reduce the estimation error at this same instant. Thus, H(K) representing in a certain fashion the measurement operator, or at least its effect, the quantity:
Y_K − H(K) X̂_{K|K−1}
will represent the influx of information between two instants of observation. It is for this reason that this quantity is called the innovation. We observe, furthermore, that I_K and Y_K have the same order. By exploiting the observation equation we can deduce:
I_K = H(K)( X_K − X̂_{K|K−1} ) + G(K) W_K,
i.e.:
I_K = H(K) X̃_{K|K−1} + G(K) W_K,
where X̃_{K|K−1} = X_K − X̂_{K|K−1} is called the prediction error. The covariance matrix of the innovation is finally expressed as:
Cov I_K = E( I_K I_K^T ) = E[ ( H(K) X̃_{K|K−1} + G(K) W_K )( H(K) X̃_{K|K−1} + G(K) W_K )^T ],
that is to say:
Cov I_K = H(K) P_{K|K−1} H^T(K) + G(K) R_K G^T(K),
where P_{K|K−1} = E( X̃_{K|K−1} X̃^T_{K|K−1} ) is called the covariance matrix of the prediction error.
A recurrence formula on the matrices P_{K|K−1} will be developed in Appendix A.
7.3.5. Estimation
In the scalar case, we established a relationship between the estimate of a magnitude X_K and the innovations I_K. We can obviously extend this approach to the case of multivariate processes, that is to say we can write:
X̂_{i|K} = Σ_{j=1}^{K} d_j(i) I_j,
where d_j(i) is an (n × m) matrix. Let us determine the matrices d_j(i):
since E( X̃_{i|K} I_j^T ) = E( (X_i − X̂_{i|K}) I_j^T ) = 0 ∀j ∈ [1, K],
we have E( X_i I_j^T ) = E( X̂_{i|K} I_j^T );
furthermore, knowing the form of X̂_{i|K}, we have:
E( X̂_{i|K} I_j^T ) = E( Σ_{p=1}^{K} d_p(i) I_p I_j^T ).
Then, since I_j ⊥ I_p ∀j ≠ p, j, p ∈ [1, K]:
E( X_i I_j^T ) = d_j(i) E( I_j I_j^T ) = d_j(i) Cov I_j.
Finally: d_j(i) = E( X_i I_j^T )( Cov I_j )^{−1}.
We thus obtain:
X̂_{i|K} = Σ_{j=1}^{K} E( X_i I_j^T )( Cov I_j )^{−1} I_j = Σ_{j=1}^{K−1} E( X_i I_j^T )( Cov I_j )^{−1} I_j + E( X_i I_K^T )( Cov I_K )^{−1} I_K.
We are now going to give the Kalman equations. Let us apply the preceding equality to the filtering X̂_{K+1|K+1}; we obtain:
X̂_{K+1|K+1} = Σ_{j=1}^{K+1} E( X_{K+1} I_j^T )( Cov I_j )^{−1} I_j = Σ_{j=1}^{K} E( X_{K+1} I_j^T )( Cov I_j )^{−1} I_j + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1}.
The state equation reminds us that X_{K+1} = A(K) X_K + C(K) N_K, and we know that N_K ⊥ I_j. Thus:
E( X_{K+1} I_j^T ) = A(K) E( X_K I_j^T ).
The estimate of X_{K+1}, knowing the measurements up to the instant K+1, is thus expressed:
X̂_{K+1|K+1} = A(K) Σ_{j=1}^{K} E( X_K I_j^T )( Cov I_j )^{−1} I_j + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1}.
The term under the summation sign can be written X̂_{K|K}. Let us exploit this, together with:
I_{K+1} = H(K+1) X̃_{K+1|K} + G(K+1) W_{K+1}.
This gives us:
X̂_{K+1|K+1} = A(K) X̂_{K|K} + E( X_{K+1} I_{K+1}^T )( Cov I_{K+1} )^{−1} I_{K+1},
which is also written:
X̂_{K+1|K+1} = A(K) X̂_{K|K} + E[ X_{K+1} ( H(K+1) X̃_{K+1|K} + G(K+1) W_{K+1} )^T ] ( Cov I_{K+1} )^{−1} I_{K+1}.
In addition, we have shown that the best estimation at a given instant, knowing the past measurements, which we write X̂_{K+1|K}, is equal to the projection of X_{K+1} on H^Y_K, i.e.:
X̂_{K+1|K} = Proj_{H^Y_K} X_{K+1} = Proj_{H^Y_K} ( A(K) X_K + C(K) N_K ),
and as Y_j ⊥ N_K ∀j ∈ [1, K], it becomes X̂_{K+1|K} = A(K) X̂_{K|K} (A(K) being a square matrix). We can consider this equation as the one which describes the dynamics of the system independently of the measurements, and as one of the equations of the Kalman filter.
In addition, X_K ⊥ W_j ∀K, j > 0; it becomes, for the filtering:
X̂_{K+1|K+1} = X̂_{K+1|K} + E( X_{K+1} X̃^T_{K+1|K} ) H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1}.
As X̂_{K+1|K} ⊥ X̃_{K+1|K}, then:
X̂_{K+1|K+1} = X̂_{K+1|K} + E[ ( X_{K+1} − X̂_{K+1|K} ) X̃^T_{K+1|K} ] H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1},
thus:
X̂_{K+1|K+1} = X̂_{K+1|K} + P_{K+1|K} H^T(K+1) ( Cov I_{K+1} )^{−1} I_{K+1}.
DEFINITION.– We call Kalman gain the function K defined (here at instant K+1) by:
K(K+1) = P_{K+1|K} H^T(K+1) ( Cov I_{K+1} )^{−1},
with:
Cov I_{K+1} = H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1).
By putting this back into the expression of K(K+1) we obtain:
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1}.
We notice that this calculation does not require direct knowledge of the measurement Y_K. This expression of the gain intervenes, quite obviously, in the algorithm of the Kalman filter and we can write:
X̂_{K+1|K+1} = X̂_{K+1|K} + K(K+1) ( Y_{K+1} − H(K+1) X̂_{K+1|K} ).
This expression of the best filtering represents another equation of the Kalman filter. We observe that the "effect" of the gain is essential. In effect, if the measurement is very noisy, which means that the elements of the matrix R_K are large, then the gain will be relatively weak and the impact of this measurement on the calculation of the filtering will be minimized. On the other hand, if the measurement is not very noisy, we will have the inverse effect: the gain will be large and its effect on the filtering will be appreciable.
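A scalar illustration of this effect (numbers chosen only for illustration, not taken from the text): with a scalar state and measurement, H = G = 1 and a prediction variance P_{K+1|K} = 1, the gain reduces to K(K+1) = 1/(1 + R_{K+1}). For R_{K+1} = 0.1 (an accurate sensor), K(K+1) ≈ 0.91 and the filter follows the measurement closely; for R_{K+1} = 10 (a very noisy sensor), K(K+1) ≈ 0.09 and the measurement barely corrects the prediction.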
We are now going to assess this filtering by calculating the error that we commit, that is to say by calculating the covariance matrix of the filtering error. Let us recall that X̂_{K+1|K+1} is the best of the filterings, in the sense that it minimizes the mapping:
Z ∈ H^Y_{K+1} → tr ‖X_{K+1} − Z‖² = tr E[ (X_{K+1} − Z)(X_{K+1} − Z)^T ] ∈ ℝ.
The minimum is thus:
tr ‖X_{K+1} − X̂_{K+1|K+1}‖² = tr E( X̃_{K+1|K+1} X̃^T_{K+1|K+1} ).
NOTATION.– In what follows, the matrix E( X̃_{K+1|K+1} X̃^T_{K+1|K+1} ) is denoted P_{K+1|K+1} and is called the covariance matrix of the filtering error.
We now give a simple relationship linking the matrices P_{K+1|K+1} and P_{K+1|K}.
We observe that, by using the filtering equation first and the state equation next:
X̃_{K+1|K+1} = X_{K+1} − X̂_{K+1|K+1}
= X_{K+1} − X̂_{K+1|K} − K(K+1)( Y_{K+1} − H(K+1) X̂_{K+1|K} )
= X_{K+1} − X̂_{K+1|K} − K(K+1)( H(K+1) X_{K+1} + G(K+1) W_{K+1} − H(K+1) X̂_{K+1|K} )
= ( I_d − K(K+1) H(K+1) ) X̃_{K+1|K} − K(K+1) G(K+1) W_{K+1},
where I_d is the identity matrix. By bringing this expression of X̃_{K+1|K+1} into P_{K+1|K+1} and by using the fact that X̃_{K+1|K} ⊥ W_{K+1}, we have:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K} ( I_d − K(K+1) H(K+1) )^T + K(K+1) G(K+1) R_{K+1} G^T(K+1) K^T(K+1),
an expression which, since:
Cov I_{K+1} = G(K+1) R_{K+1} G^T(K+1) + H(K+1) P_{K+1|K} H^T(K+1),
can be written:
P_{K+1|K+1} = ( K(K+1) − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} ) (Cov I_{K+1}) ( K(K+1) − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} )^T
+ ( I_d − P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1} H(K+1) ) P_{K+1|K}.
However, we have seen that:
K(K+1) = P_{K+1|K} H^T(K+1)(Cov I_{K+1})^{−1}.
So the first term of the second member of the expression is zero and our sought relationship is finally:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K}.
This "updating" of the covariance matrix by iteration is another equation of the Kalman filter.
There is another approach to calculating this minimum [RAD 84]. We notice that the penultimate expression of P_{K+1|K+1} can be put in the form:
P_{K+1|K+1} = ( K(K+1) − P_{K+1|K} H^T(K+1) J^{−1}(K+1) ) J(K+1) ( K(K+1) − P_{K+1|K} H^T(K+1) J^{−1}(K+1) )^T
+ ( I_d − P_{K+1|K} H^T(K+1) J^{−1}(K+1) H(K+1) ) P_{K+1|K},
with:
J(K+1) = H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) = Cov I_{K+1}.
Only the first term of P_{K+1|K+1} depends on K(K+1), and it is of the form M J M^T, symmetric, with J positive; such a term therefore has a positive or zero trace and:
P_{K+1|K+1} = M J M^T + ( I_d − P_{K+1|K} H^T(K+1) J^{−1}(K+1) H(K+1) ) P_{K+1|K}.
The minimum of the trace will thus be reached when M is zero, thus:
K(K+1) = P_{K+1|K} H^T(K+1) J^{−1}(K+1),
i.e.:
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1},
a result which we have already obtained! In these conditions, when:
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K},
we obtain the minimum of tr P_{K+1|K+1}. It is important to note that K, the Kalman gain, and P_{K|K}, the covariance matrix of the estimation error, are independent of the magnitudes Y_K. We can also write the best "prediction", i.e. X̂_{K+1|K}, according to the preceding prediction:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) ( Y_K − H(K) X̂_{K|K−1} ).
As for the "best" filtering, the best prediction is written according to the preceding predicted estimate, corrected by the gain applied to the innovation brought along by the measurement Y_K. This Kalman equation is used not in filtering but in prediction. We must now establish a relationship on the evolution of the covariance matrix of the estimation errors.
7.3.6. Riccati's equation
Let us write an evolution relationship between the covariance matrix of the filtering error and the covariance matrix of the prediction error:
P_{K|K−1} = E( X̃_{K|K−1} X̃^T_{K|K−1} ),
or, by incrementation:
P_{K+1|K} = E( X̃_{K+1|K} X̃^T_{K+1|K} ),
with X̃_{K+1|K} = X_{K+1} − X̂_{K+1|K}.
Furthermore we know that:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) I_K,
giving the prediction at instant K+1, and:
X_{K+1} = A(K) X_K + C(K) N_K,
just as I_K = Y_K − H(K) X̂_{K|K−1}. The combination of these expressions gives us:
X̃_{K+1|K} = A(K)( X_K − X̂_{K|K−1} ) − A(K) K(K)( Y_K − H(K) X̂_{K|K−1} ) + C(K) N_K,
but Y_K = H(K) X_K + G(K) W_K, thus:
X̃_{K+1|K} = A(K)( X_K − X̂_{K|K−1} ) − A(K) K(K) H(K)( X_K − X̂_{K|K−1} ) − A(K) K(K) G(K) W_K + C(K) N_K
= ( A(K) − A(K) K(K) H(K) ) X̃_{K|K−1} − A(K) K(K) G(K) W_K + C(K) N_K.
We can now write P_{K+1|K} by observing that X̃_{K|K−1} ⊥ N_K and X̃_{K|K−1} ⊥ W_K.
NOTE.– Please note that X̃_{K+1|K} is not orthogonal to W_K.
Thus:
P_{K+1|K} = ( A(K) − A(K) K(K) H(K) ) P_{K|K−1} ( A(K) − A(K) K(K) H(K) )^T + C(K) Q_K C^T(K) + A(K) K(K) G(K) R_K G^T(K) K^T(K) A^T(K).
This expression of the covariance matrix of the prediction error can be put in the form:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K).
This equality, independent of Y_K, is called Riccati's equation, with P_{K|K} = ( I_d − K(K) H(K) ) P_{K|K−1}, which represents the covariance matrix of the filtering error and is equally independent of Y_K. See Appendix A for the details of the calculation.
7.3.7. Algorithm and summary
The algorithm presents itself in the following form, with the initial conditions P_{0|0} and X̂_{0|0} given, as well as the matrices A(K), Q_K, H(K), R_K, C(K) and G(K).
1) Calculation phase independent of Y_K. Effectively, starting from the initial conditions, we see that the recursions which act on the gain K(K+1) and on the covariance matrices of the prediction and filtering errors, P_{K+1|K} and P_{K+1|K+1}, do not require knowledge of the observation process. Thus, the calculation of these matrices can be done without knowledge of the measurements. The measurements come into play for the calculation of the innovation and of the filtering or of the prediction.
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K)
K(K+1) = P_{K+1|K} H^T(K+1) [ H(K+1) P_{K+1|K} H^T(K+1) + G(K+1) R_{K+1} G^T(K+1) ]^{−1}
P_{K+1|K+1} = ( I_d − K(K+1) H(K+1) ) P_{K+1|K}
X̂_{K+1|K} = A(K) X̂_{K|K}
(or K(K+1) = P_{K+1|K+1} H^T(K+1) [ G(K+1) R_{K+1} G^T(K+1) ]^{−1} if G(K+1) R_{K+1} G^T(K+1) is invertible).
2) Calculation phase taking into account the results y_K of the process Y_K:
I_{K+1} = Y_{K+1} − H(K+1) X̂_{K+1|K}
X̂_{K+1|K+1} = X̂_{K+1|K} + K(K+1) I_{K+1}
It is by using a new measurement that the calculated innovation, weighted by the gain at the same instant, allows us to obtain the best filtering.
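These two phases fit in a few lines of code (a sketch, not the authors' program; the function name kalman_step and its argument order are ours, and the model matrices A, C, Q, H, G, R are assumed to be available):

function [xf,Pf,xp,Pp,K] = kalman_step(xf_prev,Pf_prev,y,A,C,Q,H,G,R)
% One iteration of the algorithm above (sketch).
% Phase 1: independent of the measurement
xp = A*xf_prev;                    % predicted state   Xhat_{K+1|K}
Pp = A*Pf_prev*A' + C*Q*C';        % Riccati step      P_{K+1|K}
S  = H*Pp*H' + G*R*G';             % Cov I_{K+1}
K  = Pp*H'/S;                      % Kalman gain
Pf = (eye(size(Pp)) - K*H)*Pp;     % P_{K+1|K+1}
% Phase 2: uses the measurement y = y_{K+1}
I  = y - H*xp;                     % innovation
xf = xp + K*I;                     % filtered state    Xhat_{K+1|K+1}
end

Iterating this step over the successive measurements gives the same recursion as the complete Matlab examples at the end of the chapter.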
Figure 7.2. Schema of the principle of the Kalman filter
Important additional information may be obtained in [HAY 91].
NOTE.– If we had designed a Kalman predictor we would have obtained the expression of the prediction seen at the end of section 7.3.5:
X̂_{K+1|K} = A(K) X̂_{K|K−1} + A(K) K(K) ( Y_K − H(K) X̂_{K|K−1} ),
where Y_K − H(K) X̂_{K|K−1} = I_K.
NOTE.– When the state and measurement equations are no longer linear, a similar solution exists and can be found in other works. The filter then takes the name of the extended Kalman filter.
7.4. Exercises for Chapter 7
Exercise 7.1.
Given the state equation:
X_{K+1} = A X_K + N_K,
where the state matrix A is the identity matrix of dimension 2 and N_K is the system noise, whose covariance matrix is written Q = σ² I_d (I_d: identity matrix).
The system is observed by the scalar equation:
Y_K = X¹_K + X²_K + W_K,
where X¹_K and X²_K are the components of the vector X_K and where W_K is the measurement noise, of variance R = σ₁².
P_{0|0} = I_d and X̂_{0|0} = 0 are the initial conditions.
1) Give the expression of the Kalman gain K(1) at instant "1" according to σ² and σ₁².
2) Give the estimate X̂_{1|1} of X_1 at instant "1" according to the gain K(1) and the first measurement Y_1.
Solution 7.1.
1) K(1) = (1 + σ²)/(2 + 2σ² + σ₁²) · (1  1)^T
2) X̂_{1|1} = K(1) Y_1
Exercise 7.2.
We are considering the movement of a particle. x₁(t) represents the position of the particle and x₂(t) its speed:
x₁(t) = ∫₀^t x₂(τ) dτ + x₁(0).
By differentiating this expression and by noting that x₂(t) = dx₁(t)/dt is approximately x₁(K+1) − x₁(K), we assume that the speed can be represented by:
X²_K = X²_{K−1} + N_{K−1},
where N_K is a stationary Gaussian noise which is centered and of variance 1. The position is measured by y_K, result of the process Y_K. This measurement adds a stationary Gaussian noise, which is centered and of variance 1:
Y(K) = H(K) X(K) + W_K.
We assume that R_K, the covariance matrix (of dimension 1) of the measurement noise, is equal to 1.
1) Give the matrices A, Q (covariance matrix of the system noise) and H.
2) Taking as initial conditions X̂_0 = X̂_{0|0} = 0 and P_{0|0} = I_d (identity matrix), give x̂_{1|1}, the first estimation of the state vector.
Solution 7.2.
1) A = (1  1; 0  1),  Q = (0  0; 0  1),  H = (1  0)
2) x̂_{1|1} = (2/3  1/3)^T y₁
Exercise 7.3. [RAD 84]
We want to estimate two target positions using one measurement. These positions X¹_K and X²_K form the state vector:
X_K = (X¹_K  X²_K)^T.
The process noise is zero. The measurement process Y is affected by a noise W, of mean value zero and of variance R, added to the sum of the positions:
Y_K = X¹_K + X²_K + W_K.
In order to simplify the calculation, we will place ourselves in the case of an immobile target:
X_{K+1} = X_K = X.
The initial conditions are:
– P_{0|0} = Cov(X, X) = I_d (identity matrix);
– R = 0.1;
– y = 2.9 (measurement) and X̂_{0|0} = (0  0)^T.
1) Give the state matrix A and the observation matrix H.
2) Give the Kalman gain K.
3) Give the covariance matrix of the estimation error.
4) Give the estimation, in the sense of the minimum in L², of the state vector X_K.
5) If x = x_K = (1  2)^T, give the estimation error x̃ = x̃_{K|K} = x_K − x̂_{K|K}.
6) Compare the variances of the estimation errors of X¹_K and X²_K and conclude.
conclude. Solution 7.3.
H = (1 1)
1) A = I d
2) K = (1 2,1 1 2,1)
T
⎛ 1,1 2,1
−1,1
⎝
1,1
3) P1|1 = ⎜ −1,1 ⎜
2,1
4) xˆ1|1 = ( 2,9 2,1
(
1
5) xK = xK
xK2
1
2,1 ⎞
2,1
⎟⎟ ⎠
2,9 2,1)
T
)
T
= ( −0,38 − 0, 62 )
T
2
6) var X K = var X K = 0,52 Exercise 7.4.
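These values are easy to verify numerically (a sketch; the results printed in the solution are rounded to two decimals):

% Sketch: numerical check of Exercise 7.3
A = eye(2); H = [1 1]; Q = zeros(2); R = 0.1;
P0 = eye(2); x0 = [0;0]; y = 2.9;
Pp  = A*P0*A' + Q;                 % prediction covariance (= I here)
K   = Pp*H'/(H*Pp*H' + R);         % gain = [1/2.1; 1/2.1]
P11 = (eye(2) - K*H)*Pp            % filtering error covariance
x11 = A*x0 + K*(y - H*A*x0)        % estimate = [2.9/2.1; 2.9/2.1]
err = [1;2] - x11                  % estimation error, about [-0.38; 0.62]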
Exercise 7.4.
Given the state equation of dimension "1" (the state process is a scalar process):
X_{K+1} = X_K.
The state is observed by 2 measurements:
Y_K = (Y¹_K  Y²_K)^T, affected by the noise W_K = (W¹_K  W²_K)^T.
The measurement noise is characterized by its covariance matrix:
R_K = (σ₁²  0; 0  σ₂²).
The initial conditions are:
P_{0|0} = 1 (covariance of the estimation error at instant "0") and X̂_{0|0} = 0 (estimate of X at instant "0").
Let us state D = σ₁² + σ₂² + σ₁² σ₂².
1) Give the expression of K(1), the Kalman gain at instant "1", according to σ₁, σ₂ and D.
2) Give the estimate X̂_{1|1} of X_1 at instant "1" according to the measurements Y¹₁, Y²₁ and σ₁, σ₂ and D.
3) Stating σ² = σ₁² σ₂² / (σ₁² + σ₂²), give P_{1|1}, the covariance of the filtering error at instant "1", according to σ.
Solution 7.4.
1) K(1) = (σ₂²/D  σ₁²/D)
2) X̂_{1|1} = (σ₂² Y¹₁ + σ₁² Y²₁)/D
3) P_{1|1} = σ²/(1 + σ²)
Exercise 7.5.
The fixed distance r of an object is evaluated by 2 radar measurements of different qualities. The 1st measurement gives the result y₁ = r + n₁, a measurement of the process Y = X + N₁, where we know that the noise N₁ is such that E(N₁) = 0 and var(N₁) = σ₁² = 10⁻².
The 2nd measurement gives y₂ = r + n₂, a measurement of the process Y = X + N₂, with E(N₂) = 0 and var(N₂) = w (scalar).
The noises N₁ and N₂ are independent.
1) Give the estimate r̂₁ of r that we obtain from the 1st measurement.
2) Refine this estimate by using the 2nd measurement. We will call r̂₂ this new estimate, which we will express according to w.
3) Draw the graph r̂₂(w) and justify its appearance.
Solution 7.5.
1) r̂₁ = x̂_{1|1} = y₁
2) r̂₂ = x̂_{2|2} = y₁ + σ₁²/(σ₁² + w) · (y₂ − y₁) = (100 w y₁ + y₂)/(100 w + 1)
3) See Figure 7.3.
Figure 7.3. Graph of the evolution of the estimate according to the power of the noise w, parameterized by the magnitude of the measurements
Appendix A
Resolution of Riccati's equation
Let us show that:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K).
Let us take again the developed expression of the covariance matrix of the prediction error of section 7.3.6:
P_{K+1|K} = A(K)( I_d − K(K) H(K) ) P_{K|K−1} ( A(K) − A(K) K(K) H(K) )^T + C(K) Q_K C^T(K) + A(K) K(K) G(K) R_K G^T(K) K^T(K) A^T(K),
with:
K(K) = P_{K|K−1} H^T(K) ( Cov I_K )^{−1}
and:
Cov I_K = H(K) P_{K|K−1} H^T(K) + G(K) R_K G^T(K).
By replacing K(K) and Cov I_K by their expressions in the recursive writing of P_{K+1|K}, we are going to be able to simplify the expression of the covariance matrix of the prediction error. To lighten the expressions, we eliminate the index K when there is no ambiguity, by noting P₁ = P_{K+1|K}, P₀ = P_{K|K−1} and I = I_K:
P₁ = A (I_d − KH) P₀ (A − AKH)^T + C Q C^T + A K G R G^T K^T A^T
K = P₀ H^T (Cov I)^{−1}
Cov I = H P₀ H^T + G R G^T.
Thus:
G R G^T = Cov I − H P₀ H^T
K G R G^T K^T = P₀ H^T (Cov I)^{−1} ( Cov I − H P₀ H^T )(Cov I)^{−1 T} H P₀^T
= P₀ H^T (Cov I)^{−1 T} H P₀^T − P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T.
Hence:
P₁ = A P₀ A^T − A K H P₀ A^T − A P₀ H^T K^T A^T + A K H P₀ H^T K^T A^T + C Q C^T
+ A ( P₀ H^T (Cov I)^{−1 T} H P₀^T − P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T ) A^T,
i.e., replacing K by its expression:
P₁ = A P₀ A^T − A P₀ H^T (Cov I)^{−1} H P₀ A^T − A P₀ H^T (Cov I)^{−1 T} H P₀^T A^T
+ A P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T A^T + C Q C^T
+ A P₀ H^T (Cov I)^{−1 T} H P₀^T A^T − A P₀ H^T (Cov I)^{−1} H P₀ H^T (Cov I)^{−1 T} H P₀^T A^T.
The 3rd and 6th terms cancel each other out, and the 4th and 7th terms also cancel each other out, which leaves:
P₁ = A P₀ A^T − A K H P₀ A^T + C Q C^T,
or:
P₁ = A [ (I_d − KH) P₀ ] A^T + C Q C^T,
i.e.:
P_{K+1|K} = A(K)( I_d − K(K) H(K) ) P_{K|K−1} A^T(K) + C(K) Q_K C^T(K).
Thus:
P_{K+1|K} = A(K) P_{K|K} A^T(K) + C(K) Q_K C^T(K) = covariance matrix of the prediction error,
with:
P_{K|K} = ( I_d − K(K) H(K) ) P_{K|K−1} = covariance matrix of the filtering error.
This last result will be demonstrated in Appendix B.
NOTE.– As mentioned in section 7.3.7, knowing the initial conditions and the Kalman gain, the updating of the covariance matrices P_{K|K−1} and P_{K|K} can be made in an iterative manner.
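This simplification can be confirmed numerically (a sketch with randomly generated matrices of compatible sizes; the dimensions chosen are arbitrary):

% Sketch: the long and short forms of P_{K+1|K} coincide when K = P0*H'*(Cov I)^{-1}
n = 3; m = 2;
A = randn(n); C = randn(n); G = randn(m); H = randn(m,n);
P0 = randn(n); P0 = P0*P0' + eye(n);   % a symmetric positive definite P_{K|K-1}
Q  = randn(n); Q  = Q*Q';
R  = randn(m); R  = R*R' + eye(m);
S  = H*P0*H' + G*R*G';                 % Cov I_K
K  = P0*H'/S;                          % Kalman gain
P1_long  = A*(eye(n)-K*H)*P0*(A - A*K*H)' + C*Q*C' + A*K*G*R*G'*K'*A';
P1_short = A*(eye(n)-K*H)*P0*A' + C*Q*C';
max(max(abs(P1_long - P1_short)))      % numerically zero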
Appendix B
We are going to arrive at this result starting from the definition of P_{K|K} and by using the expression of the gain K already obtained.
NOTE.– Unlike the calculation developed in section 7.3.6, we will not show that the tr P_{K|K} obtained is minimal.
Another way of showing the following result:
P_{K|K} = E( X̃_{K|K} X̃^T_{K|K} ) = P_{K|K−1} − K(K) H(K) P_{K|K−1} = ( I_d − K(K) H(K) ) P_{K|K−1}.
Demonstration
Starting from the definition of the covariance matrix of the filtering error, i.e.:
P_{K|K} = E( X̃_{K|K} X̃^T_{K|K} ).
With X̃_{K|K} = X_K − X̂_{K|K} and X̂_{K|K} = X̂_{K|K−1} + K(K) I_K, it becomes:
X̃_{K|K} = X_K − X̂_{K|K−1} − K(K) I_K = X̃_{K|K−1} − K(K) I_K.
Let us now use these results to calculate P_{K|K}:
P_{K|K} = P_{K|K−1} − K(K) E( I_K X̃^T_{K|K−1} ) − E( X̃_{K|K−1} I_K^T ) K^T(K) + K(K) E( I_K I_K^T ) K^T(K).
We observe that:
E( X̃_{K|K−1} I_K^T ) = E( ( X_K − X̂_{K|K−1} ) I_K^T ),
but I_j ⊥ I_K and I_j ⊥ Y_K for j ∈ [1, K−1], thus X̂_{K|K−1} ⊥ I_K. Given this:
E( X̃_{K|K−1} I_K^T ) = E( X_K I_K^T ) = E( A^{−1}(K)( X_{K+1} − C(K) N_K ) I_K^T ),
thus E( X_K I_K^T ) = E( A^{−1}(K) X_{K+1} I_K^T ), for E(N_K) = 0.
However, we have seen elsewhere that:
E( X_{K+1} I_K^T ) = E[ ( A(K) X_K + C(K) N_K )( H(K) X̃_{K|K−1} + G(K) W_K )^T ] = E( A(K) X_K X̃^T_{K|K−1} H^T(K) ),
as N_K ⊥ W_K and N_K ⊥ X̃_{K|K−1} = X_K − X̂_{K|K−1} (and X_K ⊥ W_K).
Furthermore, since X̂_{K|K−1} ⊥ X̃_{K|K−1}:
E( X_K X̃^T_{K|K−1} ) = E( ( X̂_{K|K−1} + X̃_{K|K−1} ) X̃^T_{K|K−1} ) = P_{K|K−1}.
Thus it becomes:
E( X̃_{K|K−1} I_K^T ) = P_{K|K−1} H^T(K),
and thus:
P_{K|K} = P_{K|K−1} − K(K) H(K) P^T_{K|K−1} − P_{K|K−1} H^T(K) K^T(K) + K(K)( Cov I_K ) K^T(K),
with K(K) = P_{K|K−1} H^T(K)( Cov I_K )^{−1}. After simplification, and noting that P_{K|K−1} = P^T_{K|K−1} (these covariance matrices are symmetric, or Hermitian if the elements are complex):
P_{K|K} = P_{K|K−1} − K(K) H(K) P_{K|K−1},
or:
P_{K|K} = [ I_d − K(K) H(K) ] P_{K|K−1}.   QED
Examples treated using Matlab software
First example of Kalman filtering
The objective is to estimate an unknown constant drowned in noise. This constant is measured using a noisy sensor. The noise is centered, Gaussian and of variance equal to 1. The initial conditions are equal to 0 for the estimate and to 1 for the variance of the estimation error.
clear
t=0:500;
R0=1;
constant=rand(1);
n1=randn(size(t));
y=constant+n1;
subplot(2,2,1)
%plot(t,y(1,:));
plot(t,y,'k'); % in B&W
grid
title('sensor')
xlabel('time')
axis([0 500 -max(y(1,:)) max(y(1,:))])
R=R0*std(n1)^2; % variance of noise measurement
P(1)=1; % initial conditions on variance of error estimation
x(1)=0;
for i=2:length(t)
    K=P(i-1)*inv(P(i-1)+R);
    x(i)=x(i-1)+K*(y(:,i)-x(i-1));
    P(i)=P(i-1)-K*P(i-1);
end
err=constant-x;
subplot(2,2,2)
plot(t,err,'k');
grid
title('error');
xlabel('time')
axis([0 500 -max(err) max(err)])
subplot(2,2,3)
plot(t,x,'k',t,constant,'k'); % in W&B
title('x estimated')
xlabel('time')
axis([0 500 0 max(x)])
grid
subplot(2,2,4)
plot(t,P,'k'); % in W&B
grid, axis([0 100 0 max(P)])
title('variance error estimation')
xlabel('time')
Figure 7.3. Line graph of measurement, error, best filtration and variance of error
Second example of Kalman filtering
The objective of this example is to extract a damped sine curve from the noise. The state vector is a two-component column vector:
X1=10*exp(-a*t)*cos(w*t)
X2=10*exp(-a*t)*sin(w*t)
The system noise is centered, Gaussian and of variances var(u1) and var(u2).
The noise of the measurements is centered, Gaussian and of variances var(v1) and var(v2).
Initial conditions: the components of the state vector are zero at the origin and the covariance of the estimation error is initialized at 10 times the identity matrix.
Note: the proposed program is not the shortest nor the fastest in terms of CPU time; it is detailed to allow a better understanding.

clear
%simulation
a=0.05;
w=1/2*pi;
Te=0.005;
Tf=30;
Ak=exp(-a*Te)*[cos(w*Te) -sin(w*Te);sin(w*Te) cos(w*Te)]; % state matrix
Hk=eye(2); % observations matrix
t=0:Te:Tf;
%X1
X1=10*exp(-a*t).*cos(w*t);
%X2
X2=10*exp(-a*t).*sin(w*t);
Xk=[X1;X2]; % state vector
% measurements noise
sigmav1=100;
sigmav2=10;
v1=sigmav1*randn(size(t));
v2=sigmav2*randn(size(t));
Vk=[v1;v2];
Yk=Hk*Xk+Vk; % measurements vector
% covariance matrix of measurements noise
Rk=[var(v1) 0;0 var(v2)]; % covariance matrix of noise
%initialization
sigmau1=0.1; % noise process
sigmau2=0.1; %idem
u1=sigmau1*randn(size(t));
u2=sigmau2*randn(size(t));
%Uk=[sigmau1*randn(size(X1));sigmau2*randn(size(X2))];
Uk=[u1;u2];
Xk=Xk+Uk;
sigq=.01;
Q=sigq*[var(u1) 0;0 var(u2)];
sigp=10;
P=sigp*eye(2); % covariance matrix of estimation error P(0,0)
% line graph
subplot(2,3,1)
%plot(t,X1,t,X2);
plot(t,X1,'k',t,X2,'k') % in W&B
axis([0 Tf -max(abs(Xk(1,:))) max(abs(Xk(1,:)))])
title('state vect. x1&x2')
subplot(2,3,2)
%plot(t,Vk(1,:),t,Vk(2,:),'r')
plot(t,Vk(1,:),t,Vk(2,:)); % in W&B
axis([0 Tf -max(abs(Vk(1,:))) max(abs(Vk(1,:)))])
title('meas. noise w1&w2')
subplot(2,3,3)
%plot(t,Yk(1,:),t,Yk(2,:),'r');
plot(t,Yk(1,:),t,Yk(2,:)); % in W&B
axis([0 Tf -max(abs(Yk(1,:))) max(abs(Yk(1,:)))])
title('observ. proc. y1&y2')
Xf=[0;0];
%%estimation and prediction by Kalman
for k=1:length(t);
    %%prediction
    Xp=Ak*Xf;                       % Xp=Xest(k+1,k) and Xf=Xest(k,k)
    Pp=Ak*P*Ak'+Q;                  % Pp=P(k+1,k) and P=P(k)
    Gk=Pp*Hk'*inv(Hk*Pp*Hk'+Rk);    % Gk=Gk(k+1)
    Ik=Yk(:,k)-Hk*Xp;               % Ik=I(k+1)=innovation
    % best filtration
    Xf=Xp+Gk*Ik;                    % Xf=Xest(k+1,k+1)
    P=(eye(2)-Gk*Hk)*Pp;            % P=P(k+1)
    X(:,k)=Xf;
    P1(:,k)=P(:,1);                 %1st column of P
    P2(:,k)=P(:,2);                 %2nd column of P
end
err1=X1-X(1,:);
err2=X2-X(2,:);
%% line graph
subplot(2,3,4)
%plot(t,X(1,:),t,X(2,:),'r')
plot(t,X(1,:),'k',t,X(2,:),'k') % in W&B
axis([0*Tf Tf -max(abs(X(1,:))) max(abs(X(1,:)))])
title('filtered x1&x2')
subplot(2,3,5)
%plot(t,err1,t,err2)
plot(t,err1,'k',t,err2,'k') % in W&B
axis([0 Tf -max(abs(err1)) max(abs(err1))])
title('errors')
subplot(2,3,6)
%plot(t,P1(1,:),'r',t,P2(2,:),'b',t,P1(2,:),'g',t,P2(1,:),'y')
plot(t,P1(1,:),'k',t,P2(2,:),'k',t,P1(2,:),t,P2(1,:),'b')
axis([0 Tf/10 0 max(P1(1,:))])
title('covar. matrix filter. error.') % p11, p22, p21 and p12
Figure 7.4. Line graphs of noiseless signals, noise measurements, filtration, errors and variances
Table of Symbols and Notations
ℕ, ℝ, ℂ : numerical sets
L² : space of square-summable functions
a.s. : almost surely
E : mathematical expectation
r.v. : random variable
r.r.v. : real random variable
X_n → X a.s. : convergence a.s. of the sequence X_n to X
⟨·,·⟩_{L²} : scalar product in L²
‖·‖_{L²} : norm in L²
Var : variance
Cov : covariance
· ∧ · : Min(·,·)
X ∼ N(m, σ²) : normal law of mean m and of variance σ²
A^T : transposed matrix
H^Y_K : Hilbert space generated by Y_K, scalar or multivariate processes
Proj_{H^Y_K} : projection on the Hilbert space generated by Y (t ≤ K)
X_T : stochastic process defined on T (time describes T)
p.o.i. : process with orthogonal increments
p.o.s.i. : process with orthogonal and stationary increments
X̂_{K|K−1} : prediction at instant K knowing the measurements of the process Y at instants 1 to K−1
X̃_{K|K−1} : prediction error
X̂_{K|K} : filtering at instant K knowing the measurements at instants 1 to K
X̃_{K|K} : filtering error
∇_λ C : gradient of the function C(λ)
{X | P} : the set of elements X which verify the property P
1_D : indicator function of a set D
Bibliography
[BER 98] BERTEIN J.-C. and CESCHI R., Processus stochastiques et filtrage de Kalman, Hermès, 1998.
[BLA 06] BLANCHET G. and CHARBIT M., Digital Signal and Image Processing using MATLAB, ISTE, 2006.
[CHU 87] CHUI C.K. and CHEN G., Kalman Filtering, Springer-Verlag, 1987.
[GIM 82] GIMONET B., LABARRERE M. and KRIEF J.-P., Le filtrage et ses applications, Cépaduès Éditions, 1982.
[HAY 91] HAYKIN S., Adaptive Filter Theory, Prentice Hall, 1991.
[MAC 81] MACCHI O., "Le filtrage adaptatif en télécommunications", Annales des Télécommunications, vol. 36, no. 11-12, 1981.
[MAC 95] MACCHI O., Adaptive Processing: The LMS Approach with Applications in Transmissions, John Wiley, New York, 1995.
[MET 72] METIVIER M., Notions fondamentales de la théorie des probabilités, Dunod, 1972.
[MOK 00] MOKHTARI M., MATLAB et Simulink pour étudiants et ingénieurs, Springer, 2000.
[RAD 84] RADIX J.-C., Filtrages et lissages statistiques optimaux linéaires, Cépaduès Éditions, 1984.
[SHA 88] SHANMUGAN K.S. and BREIPOHL A.M., Random Signals, John Wiley & Sons, 1988.
[THE 92] THERRIEN C.W., Discrete Random Signals and Statistical Signal Processing, Prentice Hall, 1992.
[WID 85] WIDROW B. and STEARNS S.D., Adaptive Signal Processing, Prentice Hall, 1985.
Index
A, B
adaptive filtering 197
algebra 3
analytical 187
autocorrelation function 96
autoregressive process 128
Bienaymé-Tchebychev inequality 143
Borel algebra 3
C
cancellation 199
Cauchy sequence 158
characteristic functions 4
coefficients 182
colinear 213
convergence 218
convergent 219
correlation coefficients 41
cost function 204
covariance 40
covariance function 107
covariance matrix 258
covariance matrix of the innovation process 248
covariance matrix of the prediction error 249
cross-correlation 184
D
deconvolution 199
degenerate Gaussian 64
deterministic gradient 225
deterministic matrix 245
diffeomorphism 31
diphaser 209
E
eigenvalues 75, 215
eigenvectors 75
ergodicity 98
ergodicity of expectation 100
ergodicity of the autocorrelation function 100
expectation 67
F, G
filtering 143, 247
Fubini's theorem 162
Gaussian vectors 13
gradient algorithm 211
H, I
Hilbert spaces 145
Hilbert subspace 144
identification 199
IIR filter 186
impulse response 182
independence 13, 246
innovation 240
innovation process 172, 248
  causal 187
  orthogonal 192
K, L
Kalman gain 254
least mean square 184
linear observation space 168
linear space 104
LMS algorithm 222
lowest least mean square error 185
M, N
marginals 9
Markov process 101
matrix of measurements 246
measure 5
measurement noise 238
measurement noise vector 246
minimum phase 187
multivariate 245, 250
multivariate processes 166
multivector 245
O
observations 245
orthogonal matrix 216
orthogonal projection 238
P
Paley-Wiener 187
prediction 143, 199, 247, 258
prediction error 249
predictor 200
pre-whitening 186
principal axes 215
probability distribution function 12
process noise vector 245
projection 183
Q, R
quadratic form 216
random variables 194
random vector 1, 3
random vector with a density function 8
regression plane 151
Riccati's equation 258, 260
S
Schwarz inequality 160
second order stationarity 96
second order stationary processes 199
singular 185
smoothing 143, 247
spectral density 106
stability 218
stable 219
state matrix 245
stationary processes 181
stochastic process 94
system noise 238
T
Toeplitz 211, 216
trace 257
trajectory 94
transfer function 121
transition equation 247
transition matrix 247
U-Z
unitary matrix Q 216
variance 39
white noise 109, 185
Wiener filter 181