Series on Advances in Mathematics for Applied Sciences - Vol. 30

LECTURES ON PROBABILITY AND SECOND ORDER RANDOM FIELDS

Diego Bricio Hernandez

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 9128
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Bricio Hernandez, Diego.
Lectures on probability and second order random fields / Diego Bricio Hernandez.
p. cm. - (Series on advances in mathematics for applied sciences; vol. 30)
Includes bibliographical references and index.
ISBN 9810219083
1. Random fields. 2. Probabilities. I. Title. II. Series.
QA274.45.B75 1995
519.2-dc20 94-30544 CIP
Copyright © 1995 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, USA.
Printed in Singapore by Uto-Print
Foreword

This book was written by Diego Bricio Hernandez during his stay at the University of Padua, Italy, in 1991, but was sent to the publisher only after the author's premature death in November 1993, at age 48. Diego spent extensive periods of time in Padua as a Visiting Professor, and maintained regular contacts with many others in the Italian scientific community. Those who knew Diego could not help but be touched by his warmth, generosity, and selflessness, and by his enthusiasm in helping and collaborating with others. In publishing this monograph, Diego's many friends and colleagues join me in paying an affectionate tribute to his memory: Giovanni Andreatta, Paolo Dai Pra, Giovanni Di Masi, Susi Dulli, A. Gombani, Giovanni Marchesini, Luigi Mariani, Claudio Paniconi, Michele Pavon, Giorgio Picci, Stefano Pinzoni, Mario Pitteri, Mario Putti, Andrea Rinaldo, Wolfgang Runggaldier, Flavio Sartoretto and Marco Vianello (in Padua); S. Bittanti, L. Galgani and A. Locatelli (in Milan); Nicola Bellomo and R. Monaco (in Turin).

Padua, June 23, 1994
Renato Spigler
Preface

This report originated from a series of eight weekly lectures delivered by the author at the DMMMSA (Dipartimento di Metodi e Modelli Matematici per le Scienze Applicate, University of Padua, Italy) during the Spring Term of 1991. The target audience was a mix of mathematicians and engineers, recruited from the Universities of Padua and Trento. The purpose of delivering these seminars was twofold, namely to provide:
• the theoretical background material required for the computer generation of random fields, of interest in various fields of Applied Mathematics, and
• the necessary probabilistic background suitable for applied work in Water Resources Engineering as well as Signal and Image Processing.
As to the first of these two goals, the main mathematical tools are the various representation theorems for second order random fields. The Karhunen-Loève expansion is proposed as the main ingredient of a simulation algorithm for mean square continuous random fields defined on compact sets. Analogously, following [17] the Cramér representation formula for random fields as an integral with respect to a random orthogonal measure is made the basis of a simulation algorithm for random fields defined on a Euclidean space. This material is presented in Chapters 7 and 8, and relies heavily on the earlier chapters. In turn, Chapters 1 through 7 develop the mathematical notions referred to in the second of the two points mentioned above. The introductory chapter of the book by Dagan [7] was selected in order to provide some guidance in the selection of topics. In fact, these seven chapters simply develop the concepts required in Dagan's book, trying to present them in accordance with the spirit of modern Probability Theory and in somewhat greater depth.
Chapters 1 and 2 constitute a brief recollection of the fundamental concepts of classical probability theory. For later reference, a brief discussion of Monte Carlo methods has been included in Chapter 3. In turn, chapters 4 and 5 contain the main mathematical tools required for the study of random fields from the "second order" point of view, namely the $L^2$ theory of random variables and the Fourier transform on Euclidean spaces, respectively. With these tools at hand, chapters 6 and 7 develop the second order theory of random fields, including the representation formulas referred to above. Finally, chapter 8 discusses the practical issues involved in generating random fields on a computer, as well as modelling them from experimental data.

The selection of topics is such that concepts are introduced only if they are required later on, thus guaranteeing the minimality of the presentation. The degree of difficulty of this material is intermediate, and thus it should be accessible to an upper level engineering student interested in applying these concepts to her/his discipline. The very basics of Lebesgue integration should be a part of the mathematical toolkit of these students. It is only hoped that these lecture notes will serve to lure some young engineers and applied mathematicians into the study of these probabilistic topics.

Full proofs of theorems are given only in some special cases. Instead, the style of this presentation seeks to motivate the reader before introducing the various concepts, and then to illustrate at least some of the connections among them. However, a modern, measure theoretical presentation of probability is adopted, and no attempt has been made to hide the concepts away from the reader. This is believed to be important if engineers are to ready themselves for a profitable consultation of the modern literature on the subject.

These lectures on second order random fields were delivered following the invitation to do so received from F. Sartoretto and G. Gambolati. Indeed, both the lectures and these notes owe a lot to Sartoretto's initiative. My deepest recognition for his contribution. Besides, M. Putti made me aware of both [7] and [17], both of which constituted the guiding light in the choice of topics covered here. In addition, C. Paniconi translated Chapter 1 from the Italian original into English. Their contribution is gratefully acknowledged. Needless to say, the contents of these notes remain my exclusive responsibility.
I also want to thank the other participants in the seminar on random fields, namely R. Rigon and A. Rinaldo (Trento) as well as P. Salandin, R. Spigler, M. Takagi and M. Vianello (Padua), all of whom provided valuable interventions both at lecture time and afterwards. Partial support was received from DMMMSA, as well as from the Faculty of Engineering of the University of Padua and from The Venice Institute.
DBH, La Stanga, Fall of 1991
Contents

Preface

1 Random Variables
  1.1 The Concept of Probability
  1.2 Random Variables
  1.3 Distributions
  1.4 Expected Value

2 Random Vectors
  2.1 Joint Distributions
  2.2 Independence
  2.3 Transformations of Random Vectors

3 Sampling Random Variables
  3.1 Designing a Sampling Technique
  3.2 Monte Carlo Methods
  3.3 Error Bounds

4 Second Order Properties
  4.1 Orthogonality
  4.2 Orthogonal Projections
  4.3 Conditional Expectation
  4.4 Optimal Linear Estimation

5 The Fourier Transform
  5.1 Characteristic Functions
  5.2 The Fourier Transform
  5.3 The Plancherel Theorem

6 Second Order Random Fields
  6.1 Covariance Functions
  6.2 Construction of Random Fields
  6.3 Analytical Properties of Random Fields
  6.4 Spectral Representation of Covariances

7 Spectral Representation of Random Fields
  7.1 Random Measures
  7.2 Stochastic Integrals
  7.3 The Spectral Representation

8 Sampling and Modeling Random Fields
  8.1 Sampling Orthogonal Random Measures
  8.2 Fast Fourier Computations
  8.3 Simulation and Orthogonal Expansions
  8.4 Mean and Covariance Estimation

Bibliography

A The Sources

Index
Chapter 1

Random Variables

This introductory chapter presents the basic probabilistic ideas that will be required in the sequel. It is not meant to replace a good introductory book (preferably [10]). Instead, it should only be regarded as a friendly introduction to the subject, written as a handy reference and for purposes of motivation. Section 1.1 introduces the concept of probability, from its empirical foundations to the fully axiomatic presentation of our days. The main mathematical object of study in Probability is the probability space - consisting of the space of results ($\Omega$), the family of all events ($\mathcal{A}$), and the probability measure ($P$) - and this is clearly emphasized. The remaining three sections deal with random variables, and with the accompanying concepts of distribution functions, density functions, expected value, mean, variance, etc. Characteristic functions will be presented in section 5.1.
1.1 The Concept of Probability

A general model for an experiment could be constructed in the following manner:
1. Take a non-empty set $\Omega$, in such a way that each $\omega \in \Omega$ represents a possible outcome of the experiment.
2. To the "greatest possible number" of subsets $A \subset \Omega$, each subset $A$ representing a possible event, associate a number $P(A) \in [0,1]$, called the probability of such event.
$\Omega$ is called the sample space of the experiment, and each $A \subset \Omega$ to which we can associate a probability is called an event associated with the experiment.
We see that event $A$ has been observed when we perform a realization of the experiment and an outcome $\omega \in A$ is obtained. What is the probability associated with an event? There are various answers to this question, depending on the field of application. For example, an experimental researcher can decide to proceed in the following way: repeat the experiment a large number of times, say $N$, noting each time whether event $A$ has occurred. Let $N_A$ be the number of times that event $A$ has occurred and let $f_A := N_A/N$ be the corresponding relative frequency. We can take this value $f_A$ as the probability of $A$: the larger $N$ is, the more valid is the assumption. This manner of proceeding is called the frequency approach: the probability of event $A$ is defined as
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}. \tag{1.1}$$
A related approach is one used frequently in fields such as Signal Theory, in which we study phenomena which vary in time in a stationary manner. We define the time average of a signal $f$ as
$$\langle f \rangle := \lim_{T \to \infty} \frac{1}{T} \int_0^T f(t)\,dt.$$
For an experiment in which the outcomes are non-negative real numbers ($\Omega = [0, \infty)$), for each $A \subset \Omega$ we can define the special signal
$$1_A(t) = \begin{cases} 1 & \text{if } t \in A \\ 0 & \text{if } t \notin A. \end{cases}$$
It is clear that the time average
$$\langle 1_A \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T 1_A(t)\,dt \tag{1.2}$$
can be identified with the probability of having an outcome in $A$, that is with the probability of $A$. Clearly equations (1.1) and (1.2) are analogous, and both correspond to an experimental approach to the problem of knowledge acquisition.

A radically different approach, because it is a priori, is the one which we call Laplacian: there are $N$ possible outcomes, or rather $\#\Omega = N$; if an event $A$ can occur in $\#A$ of these equally likely outcomes, its probability is taken to be $P(A) = \#A/N$.

Definition 1.1.1 A family $\mathcal{A}$ of subsets of $\Omega$ is a $\sigma$-algebra if
i) $\Omega \in \mathcal{A}$;
ii) $A \in \mathcal{A} \Rightarrow A^c \in \mathcal{A}$;
iii) $A_1, A_2, \ldots \in \mathcal{A} \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
The elements of $\mathcal{A}$ are called events.

Definition 1.1.2 Let $\mathcal{A}$ be a $\sigma$-algebra of subsets of $\Omega$. A function $P : \mathcal{A} \to [0,1]$ is a probability if
i) $P(\Omega) = 1$;
ii) $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$ for every family $\{A_n\}$ of events for which $A_n \cap A_m = \emptyset$ if $n \neq m$.
The ordered triple
$$(\Omega, \mathcal{A}, P) \tag{1.4}$$
is called a probability space if
• $\Omega$ is a non-empty set (the space of outcomes),
• $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$ (the family of events),
• $P : \mathcal{A} \to [0,1]$ is a probability.

The structure (1.4) constitutes the basis for a mathematical model of an experiment, keeping in mind the non-reproducibility which is often encountered in empirical sciences. For instance, in a coin toss the outcomes are "heads" or "tails" and we can take
$$\Omega = \{H, T\}.$$
All possible events are elements of
$$\mathcal{A} = \{\emptyset, \{H\}, \{T\}, \Omega\}.$$
Lastly, if the coin is "fair", the probability $P$ is given by
$$\emptyset \mapsto 0, \qquad \{H\} \mapsto 1/2, \qquad \{T\} \mapsto 1/2, \qquad \Omega \mapsto 1,$$
according to the Laplacian approach. All the properties of probability, without exception, are derived from the mathematical model (1.4).
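To make the triple $(\Omega, \mathcal{A}, P)$ concrete, here is a small illustrative sketch (ours, not part of the original lectures; all identifiers are invented) that enumerates the event algebra of the coin-toss space, namely the power set of $\Omega$, and evaluates $P$ on it by additivity over outcomes:

```python
from itertools import chain, combinations

# Sample space and elementary probabilities for a fair coin.
omega = {"H", "T"}
p_elem = {"H": 0.5, "T": 0.5}

def events(space):
    """The sigma-algebra of a finite space: its full power set."""
    s = list(space)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def P(event):
    """Probability of an event, additive over elementary outcomes."""
    return sum(p_elem[w] for w in event)

for A in events(omega):
    print(set(A) or "{}", "->", P(A))   # prints 0, 1/2, 1/2, 1
```

The printout reproduces exactly the map displayed above, and additivity over disjoint events holds by construction.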
1.2 Random Variables
A much more interesting (and more complex) example of a probability space is as follows. We wish to study the steady state response of a confined heterogeneous aquifer, under conditions of water withdrawal. We can measure the hydraulic conductivity at various points in the aquifer, but never at enough points for full confidence in the results. Hence it is convenient to consider modeling the aquifer as a medium with random characteristics. The scope of the modeling problem will be to construct a probability space $(\Omega, \mathcal{A}, P)$ adapted to this situation. The possible outcomes in our experiment of measuring hydraulic conductivity (i.e. the corresponding profiles) are functions which assume a value at each point of the space in which we are working. Let a set $S \subset \mathbb{R}^3$ be the mathematical representation of this space, illustrated in figure 1.1. Then the elements of $\Omega$ are the functions $\omega : S \to \mathbb{R}$, where
$$\omega(x) = \text{hydraulic conductivity at point } x.$$

Figure 1.1: The aquifer as our physical space.

We shall ask that the space $S$ be open, for technical reasons, and that the functions be regular, say continuously differentiable. In fact, let us take
$$\Omega = C^1(S).$$
A more classical notation for the hydraulic conductivity at point $x$ when the profile is $\omega$ would be
$$D(x, \omega) = \omega(x).$$
$\Omega$ is made up of continuous functions, and thereby we can specify $\mathcal{A}$ in a natural way, using the properties of the space of continuous functions with respect to uniform convergence.¹ Let $u : S \to \mathbb{R}$ be the piezometric level, which we assume to be a twice continuously differentiable function, $u \in C^2(S)$, for each data configuration in the medium. The advantages of this degree of regularity will become apparent below. The confined nature of the medium is manifested in the boundary conditions
$$\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial S. \tag{1.5}$$
¹Let us consider the space of all continuous functions on $S$, with the topology of uniform convergence on compact subsets of $S$. Then $\Omega$ is a subspace of this space, from which it inherits the relative topology. Let us take $\mathcal{A}$ to be the smallest $\sigma$-algebra containing all the open subsets of $\Omega$, i.e. $\mathcal{A}$ consists of the corresponding Borel subsets of $\Omega$. See [31].
The continuity equation asserts that
$$\nabla \cdot \big(D(x, \omega)\,\nabla u\big) = \phi(x) \quad \text{in } S \tag{1.6}$$
for each $\omega \in \Omega$, $\phi(x)$ being the local extraction rate per unit volume; that is, $\iiint_B \phi(x)\,dv$ m³/sec of water is extracted from the zone $B$, for any open set $B \subset S$. Hence we determine the function
$$u : S \times \Omega \to \mathbb{R}$$
solving the boundary value problem (1.5), (1.6). For each point $x \in S$ we obtain a function $\omega \mapsto u(x, \omega)$ which represents the piezometric level at the point $x$ if the hydraulic conductivity profile in the medium is $\omega \in \Omega$. In other words, the piezometric level at a point is a random variable (see the definition below).

In general we are interested in real functions $X$ defined on the space of outcomes, for a given probability space $(\Omega, \mathcal{A}, P)$. We are interested in associating a probability with all the events of the type:
• The value of $X$ is larger than $b \in \mathbb{R}$.
• The value of $X$ is not larger than $a \in \mathbb{R}$.
• The value of $X$ falls in the interval $[a, b]$, etc.
These events will be denoted by means of the symbols $(X > b)$, $(X \le a)$, $(a \le X \le b)$, etc. In general $(X \in B) = \{\omega \in \Omega : X(\omega) \in B\}$ for each $B \subset \mathbb{R}$. For this purpose we introduce the following concept:

Definition 1.2.1 A random variable on $(\Omega, \mathcal{A}, P)$ is a function $X : \Omega \to \mathbb{R}$ such that $(X \le x) \in \mathcal{A}$ for each $x \in \mathbb{R}$.

For a random variable $X$ the probability $P(X \le x)$ is always defined, since $(X \le x)$ is always an event. It can be shown that $(X > a)$ and $(a \le X \le b)$ are always events if $X$ is a random variable. In fact $(X \in B)$ is always an event -
and hence $P(X \in B)$ is always defined - provided that $B$ is an open or closed set in $\mathbb{R}$, or the countable union or countable intersection of such sets, or the complement of such a union or intersection, etc. The sets $B \subset \mathbb{R}$ which can be obtained starting from the intervals by means of set operations such as complementation, countable union and countable intersection are called Borel subsets of $\mathbb{R}$, in honor of the French mathematician Émile Borel. The set of the Borel sets of $\mathbb{R}$ - denoted by $\mathcal{B}(\mathbb{R})$ or simply $\mathcal{B}$ - is a $\sigma$-algebra. Moreover, if $X$ and $Y$ are random variables, $\alpha, \beta \in \mathbb{R}$, and $\phi : \mathbb{R} \to \mathbb{R}$ is a continuous function, then
i) $\alpha X + \beta Y$ is a random variable,
ii) $XY$ is a random variable,
iii) $\phi \circ X$ is a random variable.
In particular $|X|$, $X^2$, $\cos(X)$, $\sin(X)$, etc. are random variables if $X$ is. We can also consider complex random variables: if $X$ and $Y$ are real random variables, $Z := X + iY$ is a complex random variable, by definition. Hence $e^{iY} := \cos Y + i \sin Y$ is a complex random variable, and likewise $e^{X + iY} := e^X e^{iY}$.
1.3 Distributions
Let $X$ be a random variable in the probability space $(\Omega, \mathcal{A}, P)$. We can determine a function $F_X : \mathbb{R} \to \mathbb{R}$ (called the cumulative distribution function - CDF - or simply distribution function of $X$) given by
$$F_X(x) := P(X \le x). \tag{1.7}$$
Clearly if $x < y$ then $(X \le x) \subset (X \le y)$ and hence $F_X(x) \le F_X(y)$; that is, $F_X$ is monotonically increasing. Furthermore, as $x \to +\infty$ the event $(X \le x)$ tends to $\Omega$, and as $x \to -\infty$ the event $(X \le x)$ tends to $\emptyset$. Hence it is reasonable that
$$\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1. \tag{1.8}$$
We can prove that this occurs, along with the less intuitive fact that $F_X$ is always right-continuous; that is, the following result holds:
Proposition 1.3.1 The cumulative distribution function of a random variable $X$ is monotonically increasing, right-continuous, and satisfies (1.8).

Figure 1.2: A continuous CDF.

The plots of cumulative distribution functions have the general form shown in figure 1.2 or 1.3. How can we determine the cumulative distribution function of a given random variable $X$? We can sample values of $X$, repeating $N$ times the corresponding experiment, obtaining for each $x \in \mathbb{R}$ a table of results:

Repetition | Observation | $x_i \le x$?
1          | $x_1$       | yes
2          | $x_2$       | no
...        | ...         | ...
$N$        | $x_N$       | yes

Clearly $\#\{i : x_i \le x\}/N$ is then an estimate of $F_X(x)$, the better the larger $N$ is. A random variable taking at most countably many values $x_k$ is instead described by the probabilities $p_k = P(X = x_k)$, where
• $p_k \ge 0$ for every $k$, and
• $\sum_k p_k = 1$.
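The frequency estimate of $F_X$ described above is straightforward to mechanize. The following sketch is ours (names are invented, and any sampler standing in for the experiment will do); it estimates $F_X(x)$ as the relative frequency of the event $(X \le x)$:

```python
import random

def empirical_cdf(sample_fn, x, N=10_000):
    """Estimate F_X(x) = P(X <= x) by the relative frequency
    #{i : x_i <= x} / N over N repetitions of the experiment."""
    hits = sum(1 for _ in range(N) if sample_fn() <= x)
    return hits / N

# Example: X uniform on [0, 1]; F_X(0.3) should be close to 0.3.
print(empirical_cdf(random.random, 0.3))
```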
1.4 Expected Value
A random variable can take on a large, often infinite, number of values. We are interested in substituting, in place of the possible values of the random variable, a representative value which takes into account the large or small probabilities associated with the above-mentioned values. The following construction gives a three-step definition (1.12)-(1.13)-(1.15) of this representative value of a random variable, to be termed its expected value. It is given in the spirit of abstract integration theory, and follows closely the first chapter of [31].

Suppose $X$ is a non-negative random variable. For each $n \ge 1$, consider the points $k/2^n$, $k = 0, 1, 2, \ldots$, and let
$$X_n(\omega) = \frac{k}{2^n} \quad \text{if} \quad \frac{k}{2^n} \le X(\omega) < \frac{k+1}{2^n}.$$
Note that $X_n \to X$ as $n \to \infty$ a.s. Observe that $X_n$ assumes at most countably many values, depending on the observed outcome $\omega$, i.e. $X_n$ is simple. Let $\mathcal{E}$ consist of all non-negative simple random variables $S$ such that $S \le X$ a.s.; we now know that $\mathcal{E} \neq \emptyset$. For each $S \in \mathcal{E}$, let $\sigma_1, \sigma_2, \ldots$ be its values, say $S(\omega) = \sigma_k$ if $x_k \le X(\omega) < x_{k+1}$. The weighted average of $S$ is then defined as
$$\int S\,dP := \sum_{k=1}^{\infty} \sigma_k\,P(x_k \le X < x_{k+1}), \tag{1.12}$$
and it is called the integral of $S$ with respect to $P$. It is a non-negative number, possibly $+\infty$. Define the integral of $X$ with respect to $P$ by means of
$$\int X\,dP := \sup_{S \in \mathcal{E}} \int S\,dP. \tag{1.13}$$
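The dyadic construction can be watched numerically: truncating $X$ to the grid $k/2^n$ produces simple random variables whose means increase toward $EX$. A small sketch of ours (the distribution chosen for the demonstration is our own assumption):

```python
import math
import random

def dyadic_truncation(x, n):
    """X_n(omega) = k/2^n when k/2^n <= X(omega) < (k+1)/2^n."""
    return math.floor(x * 2**n) / 2**n

# X exponential with mean 1 (non-negative, as the construction requires).
draws = [random.expovariate(1.0) for _ in range(100_000)]
for n in (0, 1, 2, 4, 8):
    xn = [dyadic_truncation(x, n) for x in draws]
    print(n, sum(xn) / len(xn))   # increases toward EX = 1 as n grows
```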
Again, $\int X\,dP$ is a non-negative number, possibly $+\infty$. Observe that
$$\int X\,dP \le \int Y\,dP \quad \text{if } X \le Y \text{ a.s.}$$
Let $X, Y \in L^1$, $\alpha, \beta \in \mathbb{R}$, and let $\phi$ be a convex function. Then
a) $\alpha X + \beta Y \in L^1$ and $E(\alpha X + \beta Y) = \alpha EX + \beta EY$;
b) $\phi(EX) \le E(\phi \circ X)$ (Jensen's inequality).
In particular, if $X \in L^1$ then $|EX| \le E|X|$, on taking $\phi(x) = |x|$.
Chapter 2

Random Vectors

2.1 Joint Distributions

Recall the aquifer model of Chapter 1: for each $x \in S$,
$$Y_x(\omega) := u(x, \omega)$$
is the piezometric head at $x$ corresponding to a hydraulic conductivity profile $\omega$.
Suppose finitely many points $p_1, \ldots, p_n \in S$ are selected for measurement, and let the random variables $X_1, \ldots, X_n$ be defined by
$$X_i := Y_{p_i}, \quad i = 1, \ldots, n.$$
Then $X := (X_1, \ldots, X_n)$ constitutes a random vector: it is a vector valued random quantity. Abstractly, given a probability space $(\Omega, \mathcal{A}, P)$, an $n$-dimensional random vector is a function $X : \Omega \to \mathbb{R}^n$ such that each $X_i$ is a random variable. Here $X_i$ is the $i$-th coordinate of $X$, defined by
$$X(\omega) = (X_1(\omega), \ldots, X_n(\omega)).$$
Just as in the scalar case, the probabilistic properties of a random vector $X$ are embodied in its distribution function $F_X : \mathbb{R}^n \to \mathbb{R}$, defined as follows:
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) := P(X_1 \le x_1, \ldots, X_n \le x_n),$$
where the various "," stand for "$\cap$", i.e. $(A, B) \equiv A \cap B$ for $A, B \in \mathcal{A}$. Clearly
$$\lim_{x_n \to +\infty} F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1, \ldots, X_{n-1}}(x_1, \ldots, x_{n-1}). \tag{2.2}$$
In addition,
$$(X_{i_1} \le x_{i_1}, \ldots, X_{i_n} \le x_{i_n}) = (X_1 \le x_1, \ldots, X_n \le x_n)$$
for any permutation $\{i_1, \ldots, i_n\}$ of $\{1, \ldots, n\}$, hence the invariance property
$$F_{X_{i_1}, \ldots, X_{i_n}}(x_{i_1}, \ldots, x_{i_n}) = F_{X_1, \ldots, X_n}(x_1, \ldots, x_n). \tag{2.3}$$
Analogously, for a whole prescribed family of joint distributions, the corresponding invariance and marginalization requirements are labelled (2.7) and (2.8). Compare (2.7) with (2.3) and (2.8) with (2.2), respectively. The famous Kolmogorov Extension Theorem states that if both (2.7) and (2.8) hold, then the modeling problem has a solution. Conditions (2.7) and (2.8) are referred to in the literature as Kolmogorov consistency conditions. See chapter 1 of [24] for a proof of Kolmogorov's Theorem.

Just as in the scalar case, a joint distribution may have a density. In such a case
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{X_1, \ldots, X_n}(\xi_1, \ldots, \xi_n)\,d\xi_1 \ldots d\xi_n$$
and
$$f_{X_1, \ldots, X_n} = \frac{\partial^n F_{X_1, \ldots, X_n}}{\partial x_1 \cdots \partial x_n}. \tag{2.9}$$
An example of joint density is the standard multidimensional Gaussian distribution; a random vector $X$ has such a distribution - $X \sim N(0, I)$ - if its density is
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\|x\|^2/2}, \tag{2.10}$$
where $\|\cdot\|$ denotes the ordinary Euclidean norm in $\mathbb{R}^n$. In general a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ will be a probability density only if
• $f(x) \ge 0$ for all $x \in \mathbb{R}^n$,
• $\int_{\mathbb{R}^n} f(x)\,dx = 1$.
If the components $X_1, \ldots, X_n$ of a random vector $X$ have expectation (i.e. if $X_i \in L^1$, $i = 1, \ldots, n$), then we say that $X \in L^1$ and define
$$EX := (EX_1, \ldots, EX_n)^T. \tag{2.11}$$
Analogously, we say that $X \in L^2$ if $X_i \in L^2$, $i = 1, \ldots, n$. In such a case each of the expectations $EX_iX_j$ exists, in virtue of Schwarz's inequality (4.14). Thus the random matrix $(X - EX)(X - EX)^T$ has expectation. The covariance matrix of $X$ is defined as
$$\mathrm{Cov}(X) := E(X - EX)(X - EX)^T. \tag{2.12}$$
The $(i,j)$-th element of $\mathrm{Cov}(X)$ is
$$c_{ij} := E(X_i - EX_i)(X_j - EX_j),$$
and it is referred to as the covariance of $X_i$ and $X_j$. A simple computation shows that if $X \sim N(0, I)$, then $X$ has mean $0 \in \mathbb{R}^n$ and its covariance matrix is the identity matrix $I \in \mathbb{R}^{n \times n}$.

2.2 Independence

In courses on Elementary Probability one defines the conditional probability of $A$ given $B$ as
$$P(A|B) := \frac{P(A \cap B)}{P(B)},$$
provided both $A$ and $B$ are events and $P(B) \neq 0$. One can say that $A$ and $B$ are independent if $P(A|B) = P(A)$ and $P(B|A) = P(B)$, if both conditional probabilities are defined. Thus $A$ and $B$ are independent if and only if
$$P(A \cap B) = P(A)\,P(B). \tag{2.13}$$
Clearly, this last condition makes sense even if either $P(A)$ or $P(B)$ vanishes, and hence can be (and usually is) taken as the definition of independence of $A$ and $B$. Two random variables $X$ and $Y$ are independent if the events $(X \le x)$ and $(Y \le y)$ are independent in the sense of (2.13) for each choice of $x, y \in \mathbb{R}$, i.e. if
$$P(X \le x, Y \le y) = P(X \le x)\,P(Y \le y).$$

2.3 Transformations of Random Vectors

Let $X$ be an $n$-dimensional random vector and let $g : \mathbb{R}^n \to \mathbb{R}^n$. Define $Y : \Omega \to \mathbb{R}^n$ by setting
$$Y(\omega) = g(X(\omega)), \quad \omega \in \Omega$$
- briefly, $Y = g(X)$. What is the distribution of $Y$? Does $Y$ have a density if $X$ does? Let $X$ have a continuous density and suppose $g$ is continuously differentiable, with a non vanishing Jacobian. Then $g$ is given by the equations
$$y_1 = g_1(x_1, \ldots, x_n), \quad \ldots, \quad y_n = g_n(x_1, \ldots, x_n),$$
whereas its inverse is given by the equations
$$x_1 = h_1(y_1, \ldots, y_n), \quad \ldots, \quad x_n = h_n(y_1, \ldots, y_n).$$
We know that
$$J := \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} \neq 0.$$
Let
$$(-\infty, y] := \{\eta \in \mathbb{R}^n : \eta_i \le y_i, \; i = 1, \ldots, n\}.$$
Then $P(Y \in (-\infty, y])$ can be computed by the change of variables formula, and differentiation yields the density of $Y$:
$$f_Y(y_1, \ldots, y_n) = f_X\big(h_1(y_1, \ldots, y_n), \ldots, h_n(y_1, \ldots, y_n)\big)\,\left|\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\right|. \tag{2.24}$$

An important special case is the affine transformation $Y = AX + b$ (2.23) with $X \sim N(0, I)$. In (2.23), assume $A$ is nonsingular. Then the Jacobian needed is $\det(A^{-1}) = \pm 1/\sqrt{\det(AA^T)}$. Since
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n}}\, e^{-\|x\|^2/2},$$
by (2.24) we obtain
$$f_Y(y) = \frac{1}{\sqrt{(2\pi)^n \det(AA^T)}}\, \exp\left(-\frac{1}{2}(y - b)^T (AA^T)^{-1} (y - b)\right).$$
This is the multivariate Gaussian distribution with mean $b$ and covariance matrix $AA^T$, and we say that $Y \sim N(b, AA^T)$. In general, the multivariate Gaussian distribution with mean $m$ and covariance $K > 0$ is that of a random vector $Y$ with density
$$f_Y(y) = \frac{1}{\sqrt{(2\pi)^n \det K}}\, e^{-\frac{1}{2}(y - m)^T K^{-1} (y - m)}. \tag{2.25}$$
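Although sampling is the subject of the next chapter, equation (2.25) already suggests how to generate such a vector: if $AA^T = K$ and $Z \sim N(0, I)$, then $m + AZ \sim N(m, K)$. A small Python sketch of ours (the Cholesky routine is the standard algorithm; all names are invented for the illustration):

```python
import math
import random

def cholesky(K):
    """Lower-triangular factor A with A A^T = K, for K symmetric positive definite."""
    d = len(K)
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / A[j][j]
    return A

def sample_gaussian(m, K):
    """One draw from N(m, K) via Y = m + A Z, with Z standard Gaussian."""
    A = cholesky(K)
    z = [random.gauss(0.0, 1.0) for _ in range(len(m))]
    return [m[i] + sum(A[i][k] * z[k] for k in range(len(m))) for i in range(len(m))]

print(sample_gaussian([1.0, -1.0], [[2.0, 0.6], [0.6, 1.0]]))
```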
Chapter 3

Sampling Random Variables

3.1 Designing a Sampling Technique

The basic inversion technique for sampling a random variable $X$ with distribution function $F_X$ is to draw a random number $u$, uniformly distributed in $[0,1]$, and solve
$$F_X(x) = u \tag{3.3}$$
for $x$. For instance, if $X$ is exponentially distributed with parameter $\lambda$, so that $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, then
$$x = -\frac{1}{\lambda}\log(1 - u)$$
gives a sample from $X$ if $U$ is a random number. So does
$$y = -\frac{1}{\lambda}\log u, \tag{3.4}$$
by the way, with the added advantage of saving one subtraction. If equation (3.3) is difficult to solve, an open possibility is devising an approximation to $F_X^{-1}$ and using $x = \phi(u)$ instead of solving (3.3). See [37] for additional tricks for sampling random variables.
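As a direct rendering of the inversion recipe, here is a short sketch of ours implementing (3.4) for the exponential distribution (function names are invented):

```python
import math
import random

def sample_exponential(lam):
    """Inverse-CDF sampling: solve F(x) = 1 - exp(-lam*x) = u for x.
    Using y = -log(u)/lam saves the subtraction, as noted above."""
    u = 1.0 - random.random()   # in (0, 1], avoids log(0)
    return -math.log(u) / lam

samples = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to the mean 1/lam = 0.5
```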
3.2 Monte Carlo Methods
The Monte Carlo method for solving problems consists of
• devising a probabilistic model which contains the problem's solution as a parameter, and
• sampling the relevant random variables in order to estimate such a parameter.
Typically, the parameter to be estimated is the mean of a certain random variable $X$. By the Law of Large Numbers, the sample mean constitutes a good estimate of $EX$, for large $n$. Hence the solution to the original problem can be approximated by means of a probabilistic mechanism, even if the original problem itself was deterministic in character. For instance, consider the problem of evaluating an integral
$$I = \int_0^1 f(x)\,dx, \tag{3.5}$$
for continuous $f$. If $U$ is a random variable uniformly distributed in $[0,1]$, then $f(U)$ is a random variable, and $Ef(U) = I$. Given a sample $U_1, \ldots, U_n$, the Monte Carlo estimate for the above integral is
$$I \approx \frac{f(U_1) + \cdots + f(U_n)}{n},$$
the larger $n$ the better.
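For illustration, a minimal sketch of ours of this estimator, applied to a function whose integral is known (the test function is our own choice):

```python
import math
import random

def mc_integral(f, n=100_000):
    """Monte Carlo estimate of the integral of f over [0, 1]:
    the sample mean of f(U_1), ..., f(U_n) for uniform U_i."""
    return sum(f(random.random()) for _ in range(n)) / n

# Integral of sin(pi x) over [0, 1] is 2/pi, about 0.6366.
print(mc_integral(lambda x: math.sin(math.pi * x)))
```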
In general, let $\alpha : \mathbb{R} \to \mathbb{R}$ be a bounded, right continuous, non-decreasing function. Consider the problem of evaluating the Stieltjes integral
$$I = \int_{-\infty}^{+\infty} f\,d\alpha$$
with $f$ as above. Let $m := \inf \alpha$, $M := \sup \alpha$, and define $F := (\alpha - m)/(M - m)$, so that
$$I = (M - m) \int_{-\infty}^{+\infty} f\,dF.$$
Observe that $0 \le F(x) \le 1$ and, moreover,
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1,$$
so that $F$ is indeed the distribution function of a random variable $X$. Therefore
$$I = (M - m)\,Ef(X).$$
In theory, in order to evaluate $I$ approximately, it suffices to draw a sample $X_1, \ldots, X_n$ of $X$ and form the estimate
$$I \approx (M - m)\,\frac{f(X_1) + \cdots + f(X_n)}{n}.$$
As an example, let
$$I = \int_0^{+\infty} e^{-\lambda x} f(x)\,dx.$$
Then
$$I = \frac{1}{\lambda} \int_0^{+\infty} f(x)\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}\,Ef(X),$$
where $X$ is exponentially distributed with parameter $\lambda$. Using equation (3.4), a sample $U_1, \ldots, U_n$ from the uniform distribution in $[0,1]$ can be converted into a sample $Y_1, \ldots, Y_n$ from the exponential distribution. Then
$$I \approx \frac{1}{\lambda}\,\frac{f(Y_1) + \cdots + f(Y_n)}{n}$$
for large $n$.
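Combining (3.4) with this estimate gives a complete procedure. A sketch of ours for $I = \int_0^{\infty} e^{-\lambda x} f(x)\,dx$ (names and the test function are our own):

```python
import math
import random

def mc_exp_weighted(f, lam, n=100_000):
    """Estimate I = integral of exp(-lam*x) f(x) dx over [0, inf)
    as (1/lam) * mean of f(Y_i), with Y_i = -log(U_i)/lam as in (3.4)."""
    total = 0.0
    for _ in range(n):
        y = -math.log(1.0 - random.random()) / lam   # exponential sample
        total += f(y)
    return total / (lam * n)

# With f(x) = x and lam = 1: I = integral of x e^{-x} dx = 1.
print(mc_exp_weighted(lambda x: x, 1.0))
```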
This idea is by no means confined to the approximate evaluation of one dimensional integrals. Indeed, let $R$ be a region in $\mathbb{R}^d$, and let $f, \phi : R \to \mathbb{R}$ be two continuous functions. Suppose
$$\phi(x_1, \ldots, x_d) > 0 \quad \text{in } R$$
and let
$$M := \int \cdots \int \phi(x_1, \ldots, x_d)\,dx_1 \ldots dx_d < +\infty.$$
Then $p := \phi/M$ is the density of a certain random vector $X \in \mathbb{R}^d$. Therefore,
$$I := \int \cdots \int f(x_1, \ldots, x_d)\,\phi(x_1, \ldots, x_d)\,dx_1 \ldots dx_d = M\,Ef(X),$$
thus allowing for a Monte Carlo evaluation of $I$, once a sample $X_1, \ldots, X_n$ of $X$ is obtained, with large $n$. For instance, let
$$I_\omega := \int \cdots \int e^{-(\omega_1 x_1^2 + \cdots + \omega_d x_d^2)} f(x_1, \ldots, x_d)\,dx_1 \ldots dx_d,$$
where $\omega := (\omega_1, \ldots, \omega_d)$, with $\omega_i > 0$, $i = 1, \ldots, d$. Letting $y_i = \sqrt{2\omega_i}\,x_i$, $i = 1, \ldots, d$, it follows that
$$I_\omega = \sqrt{\frac{\pi^d}{\omega_1 \cdots \omega_d}}\; E f\!\left(\frac{Y_1}{\sqrt{2\omega_1}}, \ldots, \frac{Y_d}{\sqrt{2\omega_d}}\right), \tag{3.6}$$
where $Y_1, \ldots, Y_d$ are independent standard Gaussian random variables. We have seen how to sample this distribution, and we can use such knowledge to compute a Monte Carlo estimate of $I_\omega$ based on (3.6).

As a last illustration of the Monte Carlo idea, consider the problem of solving a system of linear algebraic equations
$$Ax = b, \tag{3.7}$$
which is assumed to have a unique solution $\bar{x} \in \mathbb{R}^d$. Let us define the quadratic form
$$\Phi(x) = \|Ax - b\|_Q^2, \tag{3.8}$$
with $Q$ a positive definite and symmetric matrix. Then solving (3.7) is equivalent to minimizing $\Phi$. Define the ellipsoid $E = \{x \in \mathbb{R}^d : \Phi(x) \le 1\}$. Observe that the centre of the ellipsoid is precisely $\bar{x} := A^{-1}b$:
$$\Phi(x) = (Ax - b)^T Q (Ax - b) = (x - \bar{x})^T A^T Q A (x - \bar{x}),$$
so that $e^{-\Phi(x)/2}$ differs from the density of $N(\bar{x}, (A^T Q A)^{-1})$ only by a constant factor.