Random Processes in Physics and Finance
Random Processes in Physics and Finance
MELVIN LAX, WEI CAI, MIN XU
Department of Physics, City University of New York
Department of Physics, Fairfield University, Connecticut
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP. Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries. Published in the United States by Oxford University Press Inc., New York. © Oxford University Press 2006. The moral rights of the authors have been asserted. Database right Oxford University Press (maker). First published 2006. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer. British Library Cataloguing in Publication Data: Data available. Library of Congress Cataloging in Publication Data: Data available. Printed in Great Britain on acid-free paper by Biddles Ltd., King's Lynn, Norfolk. ISBN 0-19-856776-6 10 9 8 7 6 5 4 3 2 1
978-0-19-856776-9
Preface
The name "Econophysics" has been used to denote the use of the mathematical techniques developed for the study of random processes in physical systems to applications in the economic and financial worlds. Since a substantial number of physicists are now employed in the financial arena or are doing research in this area, it is appropriate to give a course that emphasizes and relates physical applications to financial applications. The course and text on Random Processes in Physics and Finance differ from mathematical texts by emphasizing the origins of noise, as opposed to an analysis of its transformation by linear and nonlinear devices. Of course, the latter enters any analysis of measurements, but it is not the focus of this work.

The text opens with a chapter-long review of probability theory to refresh those who have had an undergraduate course, and to establish a set of tools for those who have not. Of course, this chapter can be regarded as an oxymoron since probability includes random processes. But we restrict probability theory, in this chapter, to the study of random events, as opposed to random processes, the latter being a sequence of random events extended over a period of time. It is intended, in this chapter, to raise the level of approach by demonstrating the usefulness of delta functions. If an optical experimenter does his work with lenses and mirrors, a theorist does it with delta functions and Green's functions. In the spirit of Mark Kac, we shall calculate the chi-squared distribution (important in statistical decision making) with delta functions. The normalization condition of the probability density in chi-square leads to a geometric result, namely, we can calculate the volume of a sphere in n dimensions without ever transferring to spherical coordinates. The use of a delta function description permits us to sidestep the need for using Lebesgue measure and Stieltjes integrals, greatly simplifying the mathematical approach to random processes. The problems associated with Ito integrals used both by mathematicians and financial analysts will be mentioned below. The probability chapter includes a section on what we call the first and second laws of gambling.

Chapters 2 and 3 define random processes and provide examples of the most important ones: Gaussian and Markovian processes, the latter including Brownian motion. Chapter 4 provides the definition of a noise spectrum, and the Wiener-Khinchine theorem relating this spectrum to the autocorrelation. Our point of view here is to relate the abstract definition of spectrum to how a noise spectrum is measured.
Chapter 5 provides an introduction to thermal noise, which can be regarded as ubiquitous. This chapter includes a review of the experimental evidence, the thermodynamic derivation for Johnson noise, and the Nyquist derivation of the spectrum of thermal noise. The latter touches on the problem of how to handle zero-point noise in the quantum case. The zero-frequency Nyquist noise is shown to be precisely equivalent to the Einstein relation (between diffusion and mobility). Chapter 6 provides an elementary introduction to shot noise, which is as ubiquitous as thermal noise. Shot noise is related to discrete random events, which, in general, are neither Gaussian nor Markovian.

Chapters 7-10 constitute the development of the tools of random processes. Chapter 7 provides in its first section a summary of all results concerning the fluctuation-dissipation theorem needed to understand many aspects of noisy systems. The proof, which can be omitted by many readers, is a succinct one in density matrix language, with a review of the latter provided for those who wish to follow the proof. Thermal noise and Gaussian noise sources combine to create a category of Markovian processes known as Fokker-Planck processes. A serious discussion of Fokker-Planck processes is presented in Chapter 8 that includes generation-recombination processes, linearly damped processes, Doob's theorem, and multivariable processes. Just as Fokker-Planck processes are a generalization of thermal noise, Langevin processes constitute a generalization of shot noise, and a detailed description is given in Chapter 9. The Langevin treatment of the Fokker-Planck process and diffusion is given in Chapter 10. The form of our Langevin equation is different from the stochastic differential equation using Ito's calculus lemma. The transform of our Langevin equation obeys the ordinary calculus rule and hence can be easily performed, and some misleading results can be avoided. The origin of the difference between our approach and that using Ito's lemma lies in the different definitions of the stochastic integral.

Applications of these tools constitute the remainder of the book. These applications fall primarily into two categories: physical examples, and examples from finance. And these applications can be pursued independently. The physical application that required learning all these techniques was the determination of the motion and noise (line-width) of self-sustained oscillators like lasers. When nonlinear terms are added to a linear system, this usually adds background noise of the convolution type, but it does not create a sharp line. The question "Why is a laser line so narrow?" (it can be as low as one cycle per second, even when the laser frequency is of the order of 10^15 per second) is explained in Chapter 11. It is shown that autonomous oscillators (those with no absolute time origin) all behave like van der Pol oscillators, have narrow line-widths, and have a behavior near threshold that is calculated exactly.
Chapter 12 shows that noise in homogeneous semiconductors can be treated by the Lax-Onsager "regression theorem". The random motion of particles in a turbid medium, due to multiple elastic scattering, obeys the classic Boltzmann transport equation. In Chapter 13, the center position and the diffusion behavior of a collimated beam incident on an infinite uniform turbid medium are derived using an elementary analysis of the random walk of photons in a turbid medium. In Chapter 14, the same problem is treated based on a cumulant expansion. An analytical expression for cumulants (defined in Chapter 1) of the spatial distribution of particles at any angle and time, exact up to an arbitrarily high order, is derived in an infinite uniform scattering medium. Up to the second order, a Gaussian spatial distribution solution of the Boltzmann transport equation is obtained, with exact average center and exact half-width as functions of time.

Chapter 15 on the extraction of signals in a noisy, distorted environment has applications in physics, finance and many other fields. These problems are ill-posed and the solution is not unique. Methods for treating such problems are discussed.

Having developed the tools for dealing with physical systems, we learned that the Fokker-Planck process is the one used by Black and Scholes to calculate the value of options and derivatives. Although there are serious limitations to the Black-Scholes method, it created a revolution because there were no earlier methods to determine the values of options and derivatives. We shall see how hedging strategies that lead to a riskless portfolio have been developed based on the Black-Scholes ideas. Thus financial applications, such as arbitrage, based on this method are easy to handle after we have defined forward contracts, futures and put and call options in Chapter 16. The finance literature expends a significant effort on teaching and using Ito integrals (integrals over the time of a stochastic process). This effort is easily circumvented by redefining the stochastic integral by a method that is correct for processes with nonzero correlation times, and then approaching the limit in which the correlation time goes to zero (the Brownian motion limit). The limiting result that follows from our iterative procedure disagrees with the Ito definition of the stochastic integral, and agrees with the Stratonovich definition. It is also less likely to mislead, since conflicting results appear in John Hull's book on Options, Futures and Other Derivative Securities.

In Chapter 17 we turn to methods that apply to economic time series and other forms including microwave devices and global warming. How can the spectrum of economic time series be evaluated to detect and separate seasonal and long term trends? Can one devise a trading strategy using this information? How can one determine the presence of a long term trend such as global warming from climate statistics? Why are these results sensitive to the choice of year, from among the solar year, sidereal year, equatorial year, etc.? Which one is best? The most
careful study of such time series, by David J. Thomson, will be reviewed. For example, studies of global warming are sensitive to whether one uses the solar year, the sidereal year, the equatorial year or any of several additional choices!

This book is based on a course on Random Processes in Physics and Finance taught at the City College of the City University of New York to students in physics who have had a first course in "Mathematical Methods". Students in engineering and economics who have had comparable mathematical training should also be capable of coping with the text. A review/summary is given of an undergraduate course in probability. This also includes an appendix on delta functions, and a fair number of examples involving discrete and continuous random variables.
Contents
A Note from Co-authors

1 Review of probability
  1.1 Meaning of probability
  1.2 Distribution functions
  1.3 Stochastic variables
  1.4 Expectation values for single random variables
  1.5 Characteristic functions and generating functions
  1.6 Measures of dispersion
  1.7 Joint events
  1.8 Conditional probabilities and Bayes' theorem
  1.9 Sums of random variables
  1.10 Fitting of experimental observations
  1.11 Multivariate normal distributions
  1.12 The laws of gambling
  1.13 Appendix A: The Dirac delta function
  1.14 Appendix B: Solved problems

2 What is a random process
  2.1 Multitime probability description
  2.2 Conditional probabilities
  2.3 Stationary, Gaussian and Markovian processes
  2.4 The Chapman-Kolmogorov condition

3 Examples of Markovian processes
  3.1 The Poisson process
  3.2 The one dimensional random walk
  3.3 Gambler's ruin
  3.4 Diffusion processes and the Einstein relation
  3.5 Brownian motion
  3.6 Langevin theory of velocities in Brownian motion
  3.7 Langevin theory of positions in Brownian motion
  3.8 Chaos
  3.9 Appendix A: Roots for the gambler's ruin problem
  3.10 Appendix B: Gaussian random variables

4 Spectral measurement and correlation
  4.1 Introduction: An approach to the spectrum of a stochastic process
  4.2 The definitions of the noise spectrum
  4.3 The Wiener-Khinchine theorem
  4.4 Noise measurements
  4.5 Evenness in ω of the noise?
  4.6 Noise for nonstationary random variables
  4.7 Appendix A: Complex variable notation

5 Thermal noise
  5.1 Johnson noise
  5.2 Equipartition
  5.3 Thermodynamic derivation of Johnson noise
  5.4 Nyquist's theorem
  5.5 Nyquist noise and the Einstein relation
  5.6 Frequency dependent diffusion constant

6 Shot noise
  6.1 Definition of shot noise
  6.2 Campbell's two theorems
  6.3 The spectrum of filtered shot noise
  6.4 Transit time effects
  6.5 Electromagnetic theory of shot noise
  6.6 Space charge limiting diode
  6.7 Rice's generalization of Campbell's theorems

7 The fluctuation-dissipation theorem
  7.1 Summary of ideas and results
  7.2 Density operator equations
  7.3 The response function
  7.4 Equilibrium theorems
  7.5 Hermiticity and time reversal
  7.6 Application to a harmonic oscillator
  7.7 A reservoir of harmonic oscillators

8 Generalized Fokker-Planck equation
  8.1 Objectives
  8.2 Drift vectors and diffusion coefficients
  8.3 Average motion of a general random variable
  8.4 The generalized Fokker-Planck equation
  8.5 Generation-recombination (birth and death) process
  8.6 The characteristic function
  8.7 Path integral average
  8.8 Linear damping and homogeneous noise
  8.9 The backward equation
  8.10 Extension to many variables
  8.11 Time reversal in the linear case
  8.12 Doob's theorem
  8.13 A historical note and summary (M. Lax)
  8.14 Appendix A: A method of solution of first order PDEs

9 Langevin processes
  9.1 Simplicity of Langevin methods
  9.2 Proof of delta correlation for Markovian processes
  9.3 Homogeneous noise with linear damping
  9.4 Conditional correlations
  9.5 Generalized characteristic functions
  9.6 Generalized shot noise
  9.7 Systems possessing inertia

10 Langevin treatment of the Fokker-Planck process
  10.1 Drift velocity
  10.2 An example with an exact solution
  10.3 Langevin equation for a general random variable
  10.4 Comparison with Ito's calculus lemma
  10.5 Extending to the multiple dimensional case
  10.6 Means of products of random variables and noise source

11 The rotating wave van der Pol oscillator (RWVP)
  11.1 Why is the laser line-width so narrow?
  11.2 An oscillator with purely resistive nonlinearities
  11.3 The diffusion coefficient
  11.4 The van der Pol oscillator scaled to canonical form
  11.5 Phase fluctuations in a resistive oscillator
  11.6 Amplitude fluctuations
  11.7 Fokker-Planck equation for RWVP
  11.8 Eigenfunctions of the Fokker-Planck operator

12 Noise in homogeneous semiconductors
  12.1 Density of states and statistics of free carriers
  12.2 Conductivity fluctuations
  12.3 Thermodynamic treatment of carrier fluctuations
  12.4 General theory of concentration fluctuations
  12.5 Influence of drift and diffusion on modulation noise

13 Random walk of light in turbid media
  13.1 Introduction
  13.2 Microscopic statistics in the direction space
  13.3 The generalized Poisson distribution p_n(t)
  13.4 Macroscopic statistics

14 Analytical solution of the elastic transport equation
  14.1 Introduction
  14.2 Derivation of cumulants to an arbitrarily high order
  14.3 Gaussian approximation of the distribution function
  14.4 Improving cumulant solution of the transport equation

15 Signal extraction in presence of smoothing and noise
  15.1 How to deal with ill-posed problems
  15.2 Solution concepts
  15.3 Methods of solution
  15.4 Well-posed stochastic extensions of ill-posed processes
  15.5 Shaw's improvement of Franklin's algorithm
  15.6 Statistical regularization
  15.7 Image restoration

16 Stochastic methods in investment decision
  16.1 Forward contracts
  16.2 Futures contracts
  16.3 A variety of futures
  16.4 A model for stock prices
  16.5 Ito's stochastic differential equation
  16.6 Value of a forward contract on a stock
  16.7 Black-Scholes differential equation
  16.8 Discussion
  16.9 Summary

17 Spectral analysis of economic time series
  17.1 Overview
  17.2 The Wiener-Khinchine and Wold theorems
  17.3 Means, correlations and the Karhunen-Loeve theorem
  17.4 Slepian functions
  17.5 The discrete prolate spheroidal sequence
  17.6 Overview of Thomson's procedure
  17.7 High resolution results
  17.8 Adaptive weighting
  17.9 Trend removal and seasonal adjustment
  17.10 Appendix A: The sampling theorem

Bibliography

Index
A note from co-authors
Most parts of this book were written by Distinguished Professor Melvin Lax (1922-2002), and originated from the notes for classes he taught at the City University of New York from 1985 to 2001. During his last few years, Mel made a big effort in editing this book and, unfortunately, was not able to complete it before his untimely illness. Our work on the book is mostly technical, including correcting misprints and errors in text and formulas, making minor revisions, and converting the book to LaTeX. In addition, Wei Cai wrote Chapter 14, Sections 10.3-10.5 and Section 16.8, and made changes to Sections 8.3, 16.4, 16.6 and 16.7; Min Xu wrote Chapter 13 and part of Section 15.6. We dedicate our work on this book to the memory of our mentor, colleague and friend Melvin Lax. We would like to thank our colleagues at the City College of New York, in particular, Professors Robert R. Alfano, Joseph L. Birman and Herman Z. Cummins, for their strong support in completing this book.
Wei Cai
Min Xu
1 Review of probability
Introductory remarks
The purpose of this chapter is to provide a review of the concepts of probability for use in our later discussion of random processes. Students who have not had an undergraduate probability course may find it useful to have some collateral references to accompany our necessarily brief summary. Bernstein (1998) provides a delightful historical popularization of the ideas of probability, from the introduction of Arabic numbers, to the start of probability with de Méré's dice problem, to census statistics, to actuarial problems, and the use of probability in the assessment of risk in the stock market. Why was the book titled Against the Gods? Because there was no need for probability in making decisions if actions are determined by the gods; it took until the Renaissance before the world was ready for probability. An excellent recent undergraduate introduction to probability is given by Hamming (1991). The epic work of Feller (1957) is not, as its title suggests, an introduction, but a two-volume treatise on both the fundamentals and applications of probability theory. It includes a large number of interesting solved problems. A review of the basic ideas of probability is given by E. T. Jaynes (1958). A brief overview of the frequency ratio approach to probability of von Mises, the axiomatic approach of Kolmogorov, and the subjective approach of Jeffreys is presented below.

1.1 Meaning of probability
The definition of probability has been (and still is) the subject of controversy. We shall mention, briefly, three approaches.

1.1.1 Frequency ratio definition
R. von Mises (1937) introduced a definition based on the assumed existence of a limit of the ratio of the number of successes S to the total number of trials N,
P_N = S/N.
If the limit exists,
P = lim_{N→∞} P_N,
it is regarded as the definition of the probability of success. One can object that this definition is meaningless, since the limit does not exist in the ordinary sense: that for any ε there exists an N such that for all M > N, |P_M − P| < ε. This limit will exist, however, in a probability sense; namely, the probability that these inequalities will fail can be made arbitrarily small. The Chebychev inequality of Eq. (1.32) is an example of a proof that the probability of a deviation will become arbitrarily small for large deviations. What is the proper statement for the definition of probability obtained as a "limit" of ratios in a large series of trials?
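As a concrete illustration of this "limit in a probability sense", the short Python sketch below (ours, not the book's; the success probability p = 0.5, the tolerance ε, and the sample sizes are arbitrary illustrative choices) estimates how often the frequency ratio S/N strays from p by more than ε, and shows that fraction shrinking as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps = 0.5, 0.01            # assumed success probability and tolerance (illustrative)
reps = 2000                   # independent repetitions of the N-trial experiment

for N in (100, 1000, 10_000, 100_000):
    # frequency ratios S/N for many independent series of N Bernoulli trials
    ratios = rng.binomial(N, p, size=reps) / N
    fail = np.mean(np.abs(ratios - p) >= eps)   # fraction violating |S/N - p| < eps
    print(f"N = {N:6d}   P(|S/N - p| >= {eps}) ~ {fail:.3f}")
```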
1.1.2 A priori mathematical approach (Kolmogorov)
Kolmogorov (1950) introduced an axiomatic approach based on set theory. The Kolmogorov approach assumes that there is some fundamental set of events whose probabilities are known, e.g., the six sides of a die are assumed equally likely to appear on top. More complicated events, like those involving the tossing of a pair of dice, can be computed by rules of combining the more elementary events. For variables that can take on continuous values, Kolmogorov introduces set theory and assigns to the probability, p, the ratio between the measure of the set of successful events and the measure of the set of all possible events. This is a formal procedure and begs the question of how to determine the elementary events that have equal probabilities. In statistical mechanics, for example, it is customary to assume a measure that is uniform in phase space. But this statement applies to phase space in Cartesian coordinates, not, for example, in spherical coordinates. There is good reason, based on how discrete quantum states are distributed, to favor this choice. But there is no guide in the Kolmogorov approach to probability theory for making such a choice.

The rigorous axiomatic approach of Kolmogorov raised probability to the level of a fully acceptable branch of mathematics, which we shall call mathematical probability. A major contribution to mathematical probability was made by Doob (1953) in his book on Stochastic Processes and his rigorous treatment of Brownian motion. But mathematical probability should be regarded as a subdivision of probability theory, which includes consideration of how the underlying probabilities should be determined. Because ideal Brownian motion involves white noise (a flat spectrum up to infinite frequencies), sample processes are continuous but not differentiable. This problem provides a stage on which mathematicians can display their virtuosity in set theory and Lebesgue integration. When Black and Scholes (1973) introduced a model for prices of stock in which the logarithm of the stock price executes a
Brownian motion, it supplied the first tool that could be used to price stock (and other) options. This resulted in a Nobel Prize, a movement of mathematicians (and physicists) into the area of mathematical finance theory, and a series of books and courses in which business administration students were coerced into learning set and Lebesgue integration theory. This was believed necessary because integrals over Brownian motion variables could not be done by the traditional Riemann method as the limit of a sum of terms, each of which is a product of a function evaluation and an interval. The difficulty is that with pure Brownian motion the result depends on where in the interval the function is evaluated. Ito (1951) chose to define a stochastic integral by evaluating the function at the beginning of the interval. This was accompanied by a set of rules known as the Ito calculus.

Mathematical probability describes the rules of computation for compound events provided that the primitive probabilities are known. In discrete cases like the rolling of dice there are natural choices (like giving each side of the die equal probability). In the case of continuous variables, the choice is not always clear, and this leads to paradoxes. See for example Bertrand's paradox in Appendix B of this chapter. Feller (1957) therefore makes the logical choice of splitting his book into two volumes, the first of which deals with discrete cases. The hard work of dealing with continuous variables is postponed until the second volume. What "mathematical probability" omits is a discussion of how contact must be made with reality to determine a model that yields the correct measure for each set in the continuous case. The Ito model makes one arbitrary choice. Stratonovich (1963) chooses not the left hand point of the interval, but an average over the left and right hand points. These two procedures give different values to a stochastic integral. Both are arbitrary.

As a physicist, I (Lax) argue that white noise leads to difficulties because the integrated spectrum, or total energy, diverges. In a real system the spectrum can be nearly flat over a wide range, but it must go to zero eventually to yield a finite energy. For real signals, first derivatives exist, and the ordinary Riemann calculus works in the sense that the limiting result is insensitive to where in the interval the function is evaluated. Thus the Ito calculus can be avoided. One can obtain the correct evaluation at each stage, and then approach the limit in which the spectrum becomes flat at infinity. We shall see in Chapters 10 and 16 that this limiting result disagrees with Ito's and provides the appropriate result for the ideal Brownian limit.

1.1.3 Subjective probability
Jeffreys (1957) describes subjective probability in his book on Scientific Inference. One is forced in life to assign probabilities to events where the event may occur only once, so the frequency ratio cannot be used. Also, there may be no obvious elementary events with equal probabilities, e.g. (1) what is the probability that
Clinton would be reelected? (2) What is the probability that the Einstein theory of general relativity is correct? The use of Bayes' theorem, discussed in Section 1.8, will provide a mechanism for starting with an a priori probability, chosen in a possibly subjective manner, but calculating the new, a posteriori, probability that would result if new experimental data becomes available. Although Bayes' theorem is itself irreproachable, statisticians divide into two camps, Bayesians and non-Bayesians. There are, for example, maximum likelihood methods that appear to avoid the use of a priori probabilities. We are Bayesians in that we believe there are hidden assumptions associated with such methods, and it would be better to state one's assumptions explicitly, even though they may be difficult to ascertain.

1.2 Distribution functions
We shall sidestep the above controversy by assuming that for our applications there exists a set of elementary events whose probabilities are equal, or at least known, and shall describe how to calculate the probability associated with compound events. Bertrand's paradox in Appendix 1.B illustrates the clear need for properly choosing the underlying probabilities. Three different solutions are obtained there in accord with three possible choices of that uniform set. Which choice is correct turns out not to be a question of mathematics but of the physics underlying the measurements.

Suppose we have a random variable X that can take a set S of possible values x_j for j = 1, 2, ..., N. It is then assumed that the probability
P_j = P(X = x_j)
of each event j is known. Moreover, since the set of possible events is complete, and something must happen, the total probability must be unity:
Σ_{j=1}^{N} P_j = 1.
If X is a continuous variable, we take the probability p(x) dx of finding X in the interval (x, x + dx) as given, and assume completeness for the density function p(x) in the form
∫ p(x) dx = 1.
The "discrete" case can be reformatted in continuous form by writing
where 6(x) is the Dirac delta function discussed in Appendix l.A. It associates a finite probability PJ to the value X = Xj. Since mathematicians (until the time of Schwarz) regarded delta functions as improper mathematics, they have preferred to deal with the cumulative density function
which they call a distribution whereas physicists often use the name distribution for a density function p(x). The cumulative probability replaces delta functions by jump discontinuities which are regarded as more palatable. We shall only occasionally find it desirable to use the cumulative distribution.
1.3 Stochastic variables
We shall refer to X as a random (or stochastic) variable if it can take a set (discrete or continuous) of possible values x with known probabilities. With no loss of generality, we can use the continuous notation. These probabilities are then required to obey
p(x) ≥ 0,   ∫ p(x) dx = ∫ dP(x) = 1.
Mathematicians prefer the latter form and refer to the integral as a Stieltjes integral. We have tacitly assumed that X is a single (scalar) random variable. However, the concept can be immediately extended to the case in which X represents a multidimensional object and x represents its possible values.

1.4 Expectation values for single random variables
If X is a random variable, the average or expectation of a function f(X) of X is defined by
⟨f(X)⟩ = Σ_j f(x_j) P_j
in the discrete case. Thus, it is the value f(x_j) multiplied by the probability P_j of assuming that value, summed over all possibilities. The corresponding statement
in the continuous case is
⟨f(X)⟩ = ∫ f(x) p(x) dx.
If the range of x is broken up into intervals [x_j, x_j + Δx_j], then the discrete formula with
P_j = p(x_j) Δx_j,
where x_j + Δx_j = x_{j+1}, is simply the discrete approximation to the integral. The most important expectation is the mean value (or first moment)
m = ⟨X⟩ = ∫ x p(x) dx.
More generally, the nth moment of the probability distribution is defined by:
⟨X^n⟩ = ∫ x^n p(x) dx.
The discrete case is included, with the help of Eq. (1.7). Note that moments need not necessarily exist. For example, the Lorentz-Cauchy distribution
p(x) = (a/π) / [(x − m)² + a²]
has the first moment m but no second or higher moments. It may seem to be a tautology, but the choice f(X) = δ(a − X) yields as expectation value
⟨δ(a − X)⟩ = ∫ δ(a − x) p(x) dx = p(a),
the probability density itself. Equation (1.65), below, provides one example in which this definition is a useful way to determine the density distribution. In attempting to determine the expectations of a random variable, it is often more efficient to obtain an equation for a generating function of the random variable first. For example, it may be faster to calculate the expectation ⟨exp(itX)⟩, which includes all the moments ⟨X^n⟩, than to calculate each moment separately.
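A quick numerical illustration of the warning that moments need not exist (our own sketch, not from the text; the location m = 0, scale a = 1 and sample sizes are arbitrary): sample second moments of the Lorentz-Cauchy distribution do not settle down as the sample grows, whereas those of a Gaussian do.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (10**3, 10**5, 10**7):
    cauchy = rng.standard_cauchy(n)      # Lorentz-Cauchy with m = 0, a = 1
    gauss = rng.standard_normal(n)
    print(f"n = {n:8d}   <x^2> Cauchy = {np.mean(cauchy**2):14.1f}"
          f"   <x^2> Gauss = {np.mean(gauss**2):6.3f}")
```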
1.5 Characteristic functions and generating functions
The most important expectation is the characteristic function
φ_X(t) = ⟨e^{itX}⟩ = ∫ e^{itx} p(x) dx,
which is the Fourier transform of the probability distribution. Note that the characteristic function always exists, since |e^{itx}| = 1. If the random variable is bounded from above, x ≤ x_max, then it is convenient to deal with the generating function ⟨e^{sX}⟩ for s > 0. If it is bounded from below, one can use ⟨e^{−sX}⟩. When neither condition is obeyed, one may still use these definitions with s understood to be pure imaginary.

1.6 Measures of dispersion
In this section, we shall introduce moments that are taken with respect to the mean. They are independent of origin, and give information about the shape of the probability density. In addition, there is a set of moments known as cumulants to physicists, or Thiele semi-invariants to statisticians. These are useful in describing the deviation from the normal error curve since all cumulants above the second vanish in the Gaussian case.

Moments
The most important measure of dispersion in statistics is the standard deviation σ defined by
σ² = ⟨(x − m)²⟩ = ∫ (x − m)² p(x) dx,
since it describes the spread of the distribution p(x) about the mean value of x, m = ⟨x⟩. Chebychev's inequality
P(|x − m| ≥ hσ) ≤ 1/h²
guarantees that the probability of deviations that are a large multiple h of the standard deviation σ must be small.

Proof
σ² = ∫ (x − m)² p(x) dx ≥ ∫_{|x−m|≥hσ} (x − m)² p(x) dx,
since the integrand is nonnegative, so restricting the range can only decrease the integral; in the restricted region (x − m)² is at least the value (hσ)².
The inequality remains if we replace the integrand on the RHS by its smallest possible value, (hσ)²:
σ² ≥ (hσ)² P(|x − m| ≥ hσ),
or
P(|x − m| ≥ hσ) ≤ 1/h².
The remarkable Chebychev inequality limits the probability of large deviations with no knowledge of the shape of p(x), only its standard deviation. It is useful in many applications, including the proof by Welsh (1988) of the noisy coding theorem. The square of the standard deviation, σ², is the second-order moment about the mean. Higher order moments about the mean are defined (Kendall and Stuart 1969) by
μ_n = ⟨(x − m)^n⟩
for moments about the mean, m = ⟨x⟩, and, using Eq. (1.14),
μ′_n = ⟨x^n⟩
for the ordinary moments. Thus μ₂ = σ², and μ₁ = 0. The binomial expansion of Eq. (1.33) yields
μ_n = Σ_{j=0}^{n} C(n, j) μ′_j (−m)^{n−j},
where
C(n, j) = n! / [j!(n − j)!]
is the binomial coefficient. Conversely:
μ′_n = Σ_{j=0}^{n} C(n, j) μ_j m^{n−j}.
In particular:
μ₂ = μ′₂ − m²,   μ₃ = μ′₃ − 3mμ′₂ + 2m³,   μ₄ = μ′₄ − 4mμ′₃ + 6m²μ′₂ − 3m⁴.
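A minimal numerical check (our own sketch; the exponential test distribution and the values of h are arbitrary choices) that the Chebychev bound P(|x − m| ≥ hσ) ≤ 1/h² holds regardless of the shape of p(x):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=10**6)   # a deliberately non-Gaussian p(x)
m, sigma = x.mean(), x.std()

for h in (1.5, 2.0, 3.0, 5.0):
    tail = np.mean(np.abs(x - m) >= h * sigma)   # empirical P(|x - m| >= h sigma)
    print(f"h = {h:3.1f}   empirical tail = {tail:.4f}   Chebychev bound = {1/h**2:.4f}")
```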
Cumulants
The cumulants to be described in this section are useful since they indicate clearly the deviation of a random variable from that of a Gaussian. They are sometimes referred to as Thiele semi-invariants (Thiele 1903). The cumulants κ_n are defined by
ln⟨e^{itX}⟩ = Σ_{n=1}^{∞} κ_n (it)^n / n!.
Note that normalization of the probability density p(x) guarantees that μ′₀ = 1 and κ₀ = 0. Equivalently,
⟨e^{itX}⟩ = exp[ Σ_{n=1}^{∞} κ_n (it)^n / n! ].
By separating a factor exp(imt), the cumulants can be expressed in terms of the central moments. Thus κ₁ = m, and the higher κ's are expressible in terms of the moments of (x − m). In particular:
κ₂ = μ₂ = σ²,   κ₃ = μ₃,   κ₄ = μ₄ − 3μ₂².
Cumulants were introduced as a tool in quantum mechanics by Kubo (1962) in conjunction with a convenient notation for the κ_n as linked moments, κ_n = ⟨X^n⟩_c, where the individual linked moments must still be calculated by Eqs. (1.45)-(1.49). However, Eq. (1.43) can be written in a nice symbolic form:
⟨e^{itX}⟩ = exp[ Σ_{n=1}^{∞} (it)^n ⟨X^n⟩_c / n! ].

Example
The normal error distribution (with mean zero) associated with a Gaussian random variable X,
p(x) = (2πσ²)^{−1/2} exp[−x²/(2σ²)],
has the characteristic function
⟨e^{itX}⟩ = ∫ e^{itx} p(x) dx = exp(−σ²t²/2).
The integral can be performed by completing the square and introducing x′ = x − iσ²t as the new variable of integration. In particular, the cumulants are all determined by ln⟨e^{itX}⟩ = −σ²t²/2 to be
κ₂ = σ²,   κ_n = 0 for n ≠ 2.
The characteristic function, Eq. (1.54), can be rewritten in the form
⟨e^{itX}⟩ = exp(−t²⟨X²⟩/2),
where X is a Gaussian random variable of mean 0. A Gaussian variable with a nonvanishing mean can be handled by a shift of origin. For any Gaussian variable X with mean m = ⟨X⟩ not necessarily 0, we have
⟨e^{itX}⟩ = exp[ imt − t²⟨(ΔX)²⟩/2 ],
where ΔX = X − m. This is a convenient way to calculate the average of an exponential of a Gaussian random variable. Since the Fourier transform of a Gaussian is a Gaussian, we know the form of the right hand sides of Eqs. (1.56) and (1.57). The coefficients could have been obtained simply by expanding both sides in powers of t and equating the coefficients of t^n for n = 0, 1, 2.

Skewness and kurtosis
The cumulants describe the probability distribution in an intrinsic way by subtracting off the effects of all lower-order moments. Thus ⟨X²⟩ has a value that depends on the choice of origin, whereas κ₂ = σ² = ⟨(X − m)²⟩ describes the spread about the mean. Similarly, κ₃ = μ₃ describes the skewness or asymmetry of a distribution, and κ₄ = μ₄ − 3μ₂² describes the "kurtosis" of the distribution, that is, the extent to which it differs from the standard bell shape associated with the normal error curve. These measures are usually stated in the dimensionless form
γ₁ = κ₃/σ³,   γ₂ = κ₄/σ⁴.
These measures γ₁ and γ₂ clearly vanish in the Gaussian case. Moreover, they provide a pure description of shape independent of horizontal position or scale.
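The dimensionless measures γ₁ and γ₂ can be estimated directly from samples via the central moments. The sketch below (ours; the sample size and the exponential comparison distribution are arbitrary) shows them vanishing for Gaussian data but not for a skewed distribution.

```python
import numpy as np

def gamma1_gamma2(x):
    """Sample skewness gamma1 and excess kurtosis gamma2 from central moments."""
    d = x - x.mean()
    mu2, mu3, mu4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return mu3 / mu2**1.5, mu4 / mu2**2 - 3.0

rng = np.random.default_rng(3)
print("Gaussian   :", gamma1_gamma2(rng.standard_normal(10**6)))
print("Exponential:", gamma1_gamma2(rng.exponential(size=10**6)))
```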
1.7 Joint events
Suppose that we have two random variables X and Y described (when taken separately) by the probability densities p₁(x) and p₂(y) respectively. The probability that X is found in the interval (x, x + dx) and at the same time Y is found in the interval (y, y + dy) is described by the joint probability density P(x, y) dx dy. Define p₁(x) dx as the probability of finding X in (x, x + dx) regardless of the value of Y. Then we can write
p₁(x) = ∫ P(x, y) dy,
which is referred to as the marginal distribution of x. Conversely,
p₂(y) = ∫ P(x, y) dx.

Example
Two points x and y are selected, at random, uniformly on the line from 0 to 1. (a) What is the density function p(ξ) of the separation ξ = |x − y|? (b) What is the mean separation? (c) What is the root mean squared fluctuation [⟨ξ²⟩ − ⟨ξ⟩²]^{1/2}? (d) What is ⟨W(|x − y|)⟩ for an arbitrary function W?

Solution
It is necessary to map the square with vertices at the four points (0,0), (1,0), (0,1), (1,1) in the (x, y) plane to the corresponding points in the (u, v) plane, by a transformation whose Jacobian is unity (see Fig. 1.1). Using Eq. (1.62) and Eq. (1.64), the density function p(ξ) is then given by
p(ξ) = ⟨δ(ξ − |x − y|)⟩ = ∫₀¹ dx ∫₀¹ dy δ(ξ − |x − y|).
FIG. 1.1. Transformation from x, y variables to u, v variables.
Note that our use of a delta function to specify the variable we are interested in is one of our principal tools. It fulfills our motto that experimentalists do it with mirrors, and theorists do it with delta functions (and Green's functions). The solution (1.65) is even in u; we can integrate over half the interval and double the result:
p(ξ) = 2(1 − ξ),   0 ≤ ξ ≤ 1.
We can verify that:
(a) Normalization:
∫₀¹ p(ξ) dξ = ∫₀¹ 2(1 − ξ) dξ = 1.
(b) The average separation between points is
⟨ξ⟩ = ∫₀¹ ξ · 2(1 − ξ) dξ = 1/3.
(c) The mean square separation is
⟨ξ²⟩ = ∫₀¹ ξ² · 2(1 − ξ) dξ = 1/6.
(d) The average of any function W of the separation |X − Y| is
⟨W(|X − Y|)⟩ = ∫₀¹ dx ∫₀¹ dy W(|x − y|) ∫₀¹ dξ δ(ξ − |x − y|),
where the last integral, whose value is unity, was entered as a means of introducing the variable ξ. Rearranging the order of integration we get for the right
FIG. 1.2. The events A and B are nonoverlapping, and the probability of at least one of these occurring is the sum of the separate probabilities.
hand side:
∫₀¹ dξ W(ξ) ∫₀¹ dx ∫₀¹ dy δ(ξ − |x − y|),
where the second integral is simply the definition of p(ξ). Restoring the left hand side we have established another tautology:
⟨W(|X − Y|)⟩ = ∫₀¹ W(ξ) p(ξ) dξ,
where W(ξ) is an arbitrary function and p(ξ) was given in Eq. (1.66). Finally we obtain an explicit formula
⟨W(|X − Y|)⟩ = 2 ∫₀¹ W(ξ) (1 − ξ) dξ.
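The results ⟨ξ⟩ = 1/3, ⟨ξ²⟩ = 1/6 and p(ξ) = 2(1 − ξ) are easy to confirm by simulation. A short sketch (ours; the sample size and bin count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.random(10**6), rng.random(10**6)
xi = np.abs(x - y)                       # the separation |x - y|

print("mean separation       :", xi.mean(), "(exact 1/3)")
print("mean square separation:", (xi**2).mean(), "(exact 1/6)")

hist, edges = np.histogram(xi, bins=10, range=(0, 1), density=True)
for c, h in zip(0.5 * (edges[:-1] + edges[1:]), hist):
    print(f"xi = {c:.2f}   p(xi) ~ {h:.2f}   2(1 - xi) = {2*(1-c):.2f}")
```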
Disjoint events
If A and B are disjoint events (events that cannot both occur), then
P(A ∪ B) = P(A) + P(B),
where A ∪ B means the union of A and B, that is, at least one of the events A or B has occurred. In the language of set theory the intersection, A ∩ B, is the region in which both events have occurred. For disjoint sets such as those shown in Fig. 1.2, the intersection vanishes.

Overlapping events
The probability that at least one of two events A and B has occurred when overlap is possible is
P(A ∪ B) = P(A) + P(B) − P(A ∩ B),
because the sum of the first two terms counts twice the probability P(A ∩ B) that both events occur (the shaded area in Fig. 1.3).
FIG. 1.3. Events A and B that overlap are displayed. The hashed overlap region is called the intersection and denoted A ∩ B.
Note that an event A or B need not be elementary. For example, A could represent the tossing of a die with an odd number appearing, which is the union of the events a one, a three, or a five appearing. B could be the union of the events one and two. Suppose Y_j is a random variable that takes the value 1 if event A_j occurs, and zero otherwise. We have thus introduced a projection operator, that is, the analogue for discrete variables of our introduction of a delta function for continuous variables. The probability that none of the events A₁, A₂, ..., A_n has occurred can be written as
P(none) = ⟨(1 − Y₁)(1 − Y₂) ··· (1 − Y_n)⟩.
The probability that one or more events has occurred can be written as a generalization of Eq. (1.75):
P(at least one) = 1 − ⟨(1 − Y₁)(1 − Y₂) ··· (1 − Y_n)⟩.
The use of projection operators such as Y_j is often a convenient tool in solving probability problems.

Uncorrelated or independent random variables
Two random variables X and Y are said to be uncorrelated if
⟨XY⟩ = ⟨X⟩⟨Y⟩.
The variables are said to be independent if
P(x, y) = p₁(x) p₂(y).
It is clear that two independent variables are necessarily uncorrelated. The converse is not necessarily true.

1.8 Conditional probabilities and Bayes' theorem
If X and Y are two, not necessarily independent, variables, the conditional probability density P(y|x) dy that Y takes a value in the range [y, y + dy] given that X has the value x is defined by
P(y|x) = P(x, y) / p₁(x),
where P(x, y) is the joint probability density of x and y and
p₁(x) = ∫ P(x, y) dy
is the probability density of x if no information is available about y. Of course, Eq. (1.80) is also valid with the variables x and y interchanged, so that
P(x, y) = P(y|x) p₁(x) = P(x|y) p₂(y).
The notation in which the conditioned variables appear on the right is common in the mathematical literature. It is also consistent with quantum mechanical notation in which one reads the indexes from right to left. Thus verbally we say that the probability that X and Y take the values x and y is the probability that X takes the value x times the probability that Y will take the value y knowing that X has taken the value x, a conclusion that now appears obvious. Equation (1.82) is a general equation that imposes no requirements on the nature of the random variables. Moreover, the same idea applies to events A and B which may be more complicated than single random variables. Thus
P(A ∩ B) = P(B|A) P(A)
is valid if B represents the event X_n = x_n and A represents the compound event
X₁ = x₁, X₂ = x₂, ..., X_{n−1} = x_{n−1}. Thus
P(X₁ = x₁, ..., X_n = x_n) = P(X_n = x_n | X₁ = x₁, ..., X_{n−1} = x_{n−1}) P(X₁ = x₁, ..., X_{n−1} = x_{n−1}).
Suppose that A^c is the complementary event to A (anything but A). Then these events are mutually exclusive and exhaustive:
P(A) + P(A^c) = 1.
Then the events A ∩ B and A^c ∩ B are mutually exclusive and their union is B. Thus
P(B) = P(B|A) P(A) + P(B|A^c) P(A^c).
By the same argument, if the set of events A_j are mutually exclusive, A_i ∩ A_j = ∅ for i ≠ j, and exhaustive,
Σ_j P(A_j) = 1,
then Eq. (1.87) generalizes to
P(B) = Σ_j P(B|A_j) P(A_j),
which describes the probability of an event B in terms of possible starting points, or hypotheses, A_j.

Bayes' theorem
One can determine the probability of a hypothesis A_j if we have made a measurement B. This conditional probability P(A_j|B) is given by Bayes' theorem:
P(A_j|B) = P(B|A_j) P(A_j) / P(B) = P(B|A_j) P(A_j) / Σ_i P(B|A_i) P(A_i).
The first equality follows directly from the definition Eq. (1.80) of a conditional probability. The second equality is obtained by inserting Eq. (1.88). The importance of Bayes' theorem is that it extracts the a posteriori probability, P(A_j|B), of a hypothesis A_j, after the observation of an event B, from the a priori probability P(A_j) of the hypothesis A_j. For simple systems like the tossing of a die, the a priori probabilities are known. In more general problems they have to be estimated, possibly as subjective probabilities. Bayesians believe that this step is necessary. Anti-Bayesians do not. They try to use another approach, such as maximum likelihood. In our opinion this approach is equivalent to making a tacit assumption for the a priori probabilities. We would prefer explicit assumptions.

Bernstein (1998) notes that Thomas Bayes (1763), an English minister, published no mathematical works while he was alive. But he bequeathed his manuscripts, in his will, to a preacher, Richard Price, who passed them to another member of the British Royal Society, and his paper Essay Towards Solving A Problem in the Doctrine of Chance was published two years after his death. Although Bayes' work was still ignored for two decades after his death in 1761, he has since become famous among statisticians, social and physical scientists.
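Equation (1.89) is straightforward to evaluate numerically. The sketch below (our own; the priors and likelihoods are placeholder numbers, not taken from the text) turns a set of a priori probabilities P(A_j) and likelihoods P(B|A_j) into a posteriori probabilities P(A_j|B):

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior P(A_j|B) from priors P(A_j) and likelihoods P(B|A_j)."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior * likelihood           # P(B|A_j) P(A_j)
    return joint / joint.sum()           # divide by P(B) = sum_j P(B|A_j) P(A_j)

# Placeholder example: three mutually exclusive, exhaustive hypotheses
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.8]             # probability of the observed event B under each
print(bayes_update(prior, likelihood))
```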
Example
It is known that of 100 million quarters, 100 are two-headed. Thus the a priori probability of a coin being two-headed is 10⁻⁶. A quarter is selected at random from this population and tossed 10 times. Ten heads are obtained. What is the probability that this coin is two-headed?

Solution
Let A₁ = two-headed, A₂ = A₁^c = fair coin, B = ten heads in ten tosses. We have
P(A₁) = 10⁻⁶,   P(A₂) = 1 − 10⁻⁶,   P(B|A₁) = 1,   P(B|A₂) = 2⁻¹⁰ = 1/1024.
Then,
P(A₁|B) = P(B|A₁) P(A₁) / [P(B|A₁) P(A₁) + P(B|A₂) P(A₂)] = 10⁻⁶ / [10⁻⁶ + 2⁻¹⁰(1 − 10⁻⁶)] ≈ 1.0 × 10⁻³.
Thus observing 10 heads in a row has caused the a priori probability, 10⁻⁶, of a bad coin to increase to about 10⁻³. The point of this problem, as a Bayesian would say, is that one can never calculate the a posteriori probability of a hypothesis without the use of Bayes' theorem with a choice of a priori probability, possibly determined by some subjective means.

Example
Two points are chosen at random in the interval [0,1]. They are connected by a line. Two more points are then chosen over the same interval and connected by a second line. What is the probability that the lines overlap?

Solution
We will choose the complementary question, what is the probability that they do not overlap?, which is easier to answer. Suppose the first two points are x and y. No overlap will occur if the next two points are both left of the smaller of x, y or both right of the larger of x, y. By symmetry, the second probability is the same as the first.
Suppose x is the smaller of x and y. The probability that the third point is less than x is x. The probability that the fourth point is less than x is also x. The probability that both are less than x is x². What is the probability density P(x = ξ) given that x < y? This conditional probability is
P(x = ξ | x < y) = ∫₀¹ dy H(y − ξ) / ∫₀¹ dξ′ ∫₀¹ dy H(y − ξ′),
where H(x) is the Heaviside step function, H(x) = 1 if x > 0 and H(x) = 0 otherwise. We can evaluate the denominator in Eq. (1.91):
∫₀¹ dξ′ ∫₀¹ dy H(y − ξ′) = ∫₀¹ (1 − ξ′) dξ′ = 1/2.
Thus the conditional probability is given by
P(x = ξ | x < y) = 2(1 − ξ).
The probability of both to the left is then ∫₀¹ ξ² · 2(1 − ξ) dξ = 1/6. Therefore the probability of no overlap is 1/6 × 2 = 1/3 and the probability of overlap is 1 − 1/3 = 2/3.
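A Monte Carlo check of the answer 2/3 (our sketch; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10**6
a = np.sort(rng.random((n, 2)), axis=1)   # first interval  [a0, a1]
b = np.sort(rng.random((n, 2)), axis=1)   # second interval [b0, b1]

# Intervals overlap unless one lies entirely to one side of the other
overlap = (a[:, 0] < b[:, 1]) & (b[:, 0] < a[:, 1])
print("P(overlap) ~", overlap.mean(), "(exact 2/3)")
```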
1.9 Sums of random variables
The characteristic function of the sum of two random variables, Z = X + Y, is
⟨e^{itZ}⟩ = ⟨e^{it(X+Y)}⟩ = ∫∫ e^{it(x+y)} P(x, y) dx dy.
If the variables are independent, then the averages over x and y can be performed separately, with the result
⟨e^{itZ}⟩ = ⟨e^{itX}⟩ ⟨e^{itY}⟩.
Because the cumulants are defined in terms of the logarithm of the characteristic function, the cumulants are additive:
κ_n(Z) = κ_n(X) + κ_n(Y).
More generally, if
Z = Σ_j X_j
and the X_j are independent variables,
κ_n(Z) = Σ_j κ_n(X_j).
The characteristic function of the joint distribution, P(x, y), of the two random variables is defined by
φ(s, t) = ⟨e^{i(sX + tY)}⟩.
If X and Y are two jointly Gaussian random variables, so is their linear combination. By Eq. (1.56), we then have
⟨e^{i(sX + tY)}⟩ = exp[ i(s⟨X⟩ + t⟨Y⟩) − (1/2)⟨(sΔX + tΔY)²⟩ ],
where the deviation variables are defined by
ΔX = X − ⟨X⟩,   ΔY = Y − ⟨Y⟩.
If these variables X, Y are uncorrelated, i.e., ⟨ΔX ΔY⟩ = ⟨XY⟩ − ⟨X⟩⟨Y⟩ = 0, then the characteristic function factors:
φ(s, t) = ⟨e^{isX}⟩ ⟨e^{itY}⟩.
Since P(x, y) can be obtained by taking the inverse Fourier transform of φ(s, t), it too must factor. Hence we arrive at the result: if two Gaussian variables are uncorrelated, they are necessarily independent.
Bernoulli trials and the binomial distribution
Bernoulli considered a set of independent trials, each with probability p of success and q = 1 − p of failure. In n trials, the probability that the first r trials are a success and the remainder are failures is
p^r q^{n−r}.
If we ask for the probability that there are r successes in n trials without regard to order, the probability will be
P_r(n) = C(n, r) p^r q^{n−r},
where
C(n, r) = n! / [r!(n − r)!]
is the number of ways r objects can be drawn from n indistinguishable objects. We shall show how to derive Eqs. (1.105) and (1.106) without knowing the combinatorial coefficients by using the characteristic function. If we define a random variable S that takes the value 1 for a success and 0 for a failure, then the characteristic function in a single trial is
⟨e^{itS}⟩ = q + p e^{it},
so that the characteristic function for n independent trials, by Eq. (1.99), is
(q + p e^{it})^n.
With z = e^{it}, the generating function can be expanded using the binomial theorem:
(q + pz)^n = Σ_{r=0}^{n} C(n, r) (pz)^r q^{n−r}.
Since the coefficient of z^r in the generating function is P_r [or P_r(n) with n fixed and r variable], we have established that
P_r(n) = C(n, r) p^r q^{n−r}.
Comparison with Eq. (1.105) shows that the combinatorial and binomial coefficients are equal.
With the abbreviation θ = it, Eq. (1.108) can be rewritten as
(q + p e^{θ})^n,
from which we deduce the cumulants to be
κ₁ = np,   κ₂ = npq,   κ₃ = npq(q − p),   κ₄ = npq(1 − 6pq).
The measures of skewness and kurtosis were given in Eq. (1.58):
γ₁ = κ₃/σ³ = (q − p)/√(npq),   γ₂ = κ₄/σ⁴ = (1 − 6pq)/(npq).
The higher order dimensionless ratios, j > 2, depend on j roughly as
κ_j/σ^j ∝ n^{1−j/2}.
So all vanish as n → ∞ for j > 2. Thus as n → ∞ the distribution approaches a Gaussian. Then the cumulative probability, P_n(u), that (r − np)/√(npq) ≤ u tends uniformly to the limit
P(u) = (2π)^{−1/2} ∫_{−∞}^{u} e^{−t²/2} dt.
See for example Uspensky (1937), Chapter 14. The condition (1.120) is less stringent than the set of conditions in Eq. (1.115).
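The approach of the standardized binomial to the Gaussian limit can be checked directly. The sketch below (ours; the values of n, p and u are arbitrary) estimates the cumulative probability P_n(u) by sampling and compares it with the Gaussian limit, evaluated through the error function.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)
p, u = 0.3, 1.0                               # arbitrary illustrative values
gauss_limit = 0.5 * (1 + erf(u / sqrt(2)))    # (2*pi)^(-1/2) * integral of exp(-t^2/2)

for n in (10, 100, 1000, 10000):
    r = rng.binomial(n, p, size=200_000)
    z = (r - n * p) / np.sqrt(n * p * (1 - p))
    print(f"n = {n:5d}   P_n(u) ~ {np.mean(z <= u):.4f}   Gaussian limit = {gauss_limit:.4f}")
```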
The Poisson distribution
If n → ∞ for fixed p, the binomial distribution approaches the Gaussian distribution, which is an example of the central limit theorem. Another distribution, appropriate for rare events, may be obtained by letting n → ∞, p → 0 with a fixed product, or fixed mean value, np. As n → ∞, we can utilize Stirling's asymptotic approximation for the factorial function to obtain the limiting behavior for large n,
n!/(n − r)! ≈ n^r.
Then the binomial distribution
P_r(n) = C(n, r) p^r q^{n−r} ≈ [(np)^r / r!] (1 − p)^{n−r}.
As p → 0 and n → ∞ (for fixed r),
(1 − p)^{n−r} → e^{−np}.
Thus the binomial distribution in the rare event limit approaches the Poisson distribution:
P_r = (np)^r e^{−np} / r!.
The associated generating function
⟨z^r⟩ = Σ_{r=0}^{∞} P_r z^r = e^{np(z−1)}
yields the corresponding characteristic function
⟨e^{itr}⟩ = exp[np(e^{it} − 1)].
The cumulants then take the remarkably simple single value for all s:
κ_s = np.
A completely different approach to the Poisson process is given in Section 3.2.
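The rare-event limit can also be checked numerically. The sketch below (ours; the fixed mean np = 2 and the listed values of n and r are arbitrary) compares the binomial probabilities P_r(n) with the Poisson limit (np)^r e^{-np}/r! as n grows with np held fixed.

```python
from math import comb, exp, factorial

a = 2.0                                       # fixed mean np (arbitrary choice)
for n in (10, 100, 10_000):
    p = a / n
    for r in (0, 1, 2, 5):
        binom = comb(n, r) * p**r * (1 - p)**(n - r)
        poisson = a**r * exp(-a) / factorial(r)
        print(f"n = {n:5d}  r = {r}   binomial = {binom:.5f}   Poisson = {poisson:.5f}")
```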
1.10 Fitting of experimental observations

Linear fit
Suppose we expect, on theoretical grounds, that some theoretical variable Y should be expressible as a linear combination of the variables X^μ. We wish to determine the coefficients a_μ of this linear expansion in such a way as to best fit a set of experimental data, i = 1 to N, of values X_i^μ, Y_i. We can choose to minimize the least squares deviation between the "theoretical" value of Y_i,
Y_i^{theor} = Σ_μ a_μ X_i^μ,
and the observed value Y_i by minimizing the sum of the squares of the deviations between experiment and theory:
F = Σ_{i=1}^{N} [Y_i − Σ_μ a_μ X_i^μ]².
The number of measurements, N, is, in general, much larger than the number, s, of unknown coefficients, a_μ. The expected value of F is then
⟨F⟩ = N [ ⟨Y²⟩ − 2 Σ_ν a_ν ⟨X^ν Y⟩ + Σ_{λ,ν} a_λ a_ν ⟨X^λ X^ν⟩ ],
where the brackets ⟨·⟩ denote averages over the N measured points, e.g.
⟨X^λ X^ν⟩ = (1/N) Σ_{i=1}^{N} X_i^λ X_i^ν,   ⟨X^ν Y⟩ = (1/N) Σ_{i=1}^{N} X_i^ν Y_i.
We can minimize by differentiating ⟨F⟩ with respect to a_λ. After removing a factor of two, we find that the a_ν obey a set of linear equations:
Σ_ν ⟨X^λ X^ν⟩ a_ν = ⟨X^λ Y⟩.
With M_{λν} = ⟨X^λ X^ν⟩ and B_λ = ⟨X^λ Y⟩, we have the matrix equation
M a = B,
which provides a least squares fit to the data by a linear function of the set of individual random variables. The logic behind this procedure is that the measurement of Y_i is made subject to an experimental error that we treat as a Gaussian random variable e_i. Errors in the points X_i^μ of measurement are assumed negligible, and for simplicity, we assume ⟨e_i e_j⟩ = σ² δ_{ij}.
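A minimal numerical sketch of solving the normal equations M a = B (our own; the synthetic data, the "true" coefficients and the noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
N, s = 200, 3                                 # N measurements, s unknown coefficients
X = rng.random((N, s))                        # X_i^mu; errors in X assumed negligible
a_true = np.array([1.0, -2.0, 0.5])           # invented "true" coefficients
Y = X @ a_true + 0.05 * rng.standard_normal(N)   # Gaussian measurement errors e_i

M = X.T @ X / N                               # M_{lambda nu} = <X^lambda X^nu>
B = X.T @ Y / N                               # B_lambda      = <X^lambda Y>
a_fit = np.linalg.solve(M, B)                 # least squares coefficients
print("fitted a:", a_fit)
```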
Suppose odds up to d₁ = W/(C/2) are available. Then one can bet C/2 on the first bet and win with probability
The best one can do, then, is to stop if one wins on the first bet, and to bet C/2 again if one loses. The expected amount bet is then
so that the probability P of winning is better than that in the first scheme, Eq. (1.190). It is clear that if the degree, ε, of unfairness is the same at all odds, it is favorable to choose the highest odds available and bet no more than necessary to achieve C + W.
1.13 Appendix A: The Dirac delta function
Point sources enter into electromagnetic theory, acoustics, circuit theory, probability and quantum mechanics. In this appendix, we shall attempt to develop a convenient representation for a point source, and establish its properties. The results will agree with those simply postulated by Dirac (1935) in his book on quantum mechanics and called delta functions, or "Dirac delta functions", in the literature. What are the required properties of the density associated with a point source? The essential property of a point source density is that it vanishes everywhere except at the point. There it must go to infinity in such a manner that its integral over all space (for a unit source) must be unity. We shall start, in one dimension, by considering a source function δ(ε, x) of finite size, ε, which is small for x ≫ ε, and of order 1/ε for x < ε, such that the area for any ε is unity:
∫_{−∞}^{∞} δ(ε, x) dx = 1.
All of these requirements can be fulfilled if we choose
δ(ε, x) = (1/ε) g(x/ε),
where g(y) is an integrable mathematical function that describes the "shape" of the source, whereas ε describes its extent. Examples of source functions which can be constructed in this way are
δ(ε, x) = (ε√π)^{−1} exp(−x²/ε²)   and   δ(ε, x) = (1/π) ε/(x² + ε²).
A problem involving a point source can always be treated using one of the finite sources, δ(ε, x), by letting ε → 0 at the end of the calculation. Many of the steps in the calculation (usually integrations) can be performed more readily if we can let ε → 0 at the beginning of the calculation. This will be possible provided that the results are independent of the shape g(y) of the source. Only if this is the case,
however, can we regard the point source as a valid approximation in the physical problem at hand. The limiting process ε → 0 can be accomplished at the beginning of the calculation by introducing the Dirac delta function
δ(x) = lim_{ε→0} δ(ε, x) = lim_{ε→0} (1/ε) g(x/ε).
This is not a proper mathematical function because the shape g(y) is not specified. We shall assume, however, that it contains those properties which are common to all shape factors. These properties can be used in all problems in which the point source is a valid approximation. It is understood that the delta function will be used in an integrand, where its properties become well defined. The most important property of the δ function is
∫_a^c f(x) δ(x − b) dx = f(b)
for a < b < c, and zero if b is outside these limits. Setting b = 0, for simplicity of notation, we can prove this theorem in the following manner:
∫_a^c f(x) δ(ε, x) dx = f(0) ∫_a^c δ(ε, x) dx + ∫_a^c [f(x) − f(0)] δ(ε, x) dx.
In the first term, the limit as ε → 0 can be taken. The integral over g(y) is then 1 if a < 0 < c, since the limits then extend from −∞ to ∞, and 0 otherwise, since both limits approach plus (or minus) infinity. The result then agrees with the desired result, Eq. (1.202). The second integral can be easily shown to vanish under mild restrictions on the functions f and g. For example, if f is bounded and g is positive, the limits can be truncated to fixed finite values, say a′ and c′ (to any desired accuracy), since the integral converges. Then the limit can be performed on the integrand, which then vanishes. The odd part of the function makes no contribution to the above integral, for any f(x). It is therefore customary to choose g(y), and hence δ(x), to be even functions.
The derivative of a delta function can be defined by
δ′(x) = lim_{ε→0} (d/dx) δ(ε, x),
where the limiting process has a similar meaning to that used in defining the delta function itself. An integration by parts then yields the useful relation
∫ f(x) δ′(x) dx = −f′(0)
when the range of integration includes the singular point. The indefinite integral over the delta function is simply the Heaviside unit function H(x):
∫_{−∞}^{x} δ(x′) dx′ = H(x).
With g(y) taken as an even function, its integral from negative infinity to zero is one half, so that the appropriate value of the Heaviside unit function at the origin is H(0) = 1/2. Conversely, it is appropriate to think of the delta function as the derivative of the Heaviside unit function,
δ(x) = dH(x)/dx.
The derivative of a function possessing a jump discontinuity at x₀ can always be written in the form
df/dx = {df/dx} + [f(x₀ + 0) − f(x₀ − 0)] δ(x − x₀),
where {df/dx} denotes the derivative of the continuous part. The last term can be rewritten in a simpler form since
f(x) δ(x − x₀) = f(x₀) δ(x − x₀),
and Eq. (1.211) is clearly valid underneath an integral sign in accord with Eq. (1.202). As a special case Eq. (1.211) yields
x δ(x) = 0,
provided that it is integrated against a factor not singular at x = 0. Thus we can write, with c an arbitrary constant,
x (1/x) = 1,   or   x [1/x + c δ(x)] = 1.
Thus the reciprocal of x is undefined up to an arbitrary multiple of the delta function. A particular reciprocal, the principal valued reciprocal, is customarily defined
by its behavior under an integral sign,
P ∫ f(x)/x dx = lim_{ε→0} [ ∫_{−∞}^{−ε} + ∫_{ε}^{∞} ] f(x)/x dx.
Thus a symmetric region [−ε, ε] is excised by the principal valued reciprocal before the integral is performed. The function x/(x² + ε²) behaves as 1/x for all finite x and deemphasizes the region near x = 0. Thus, in the limit, it reduces to the principal valued reciprocal. The combination
lim_{ε→0} 1/(x ∓ iε) = P(1/x) ± iπ δ(x)
is important in the theory of waves (including quantum mechanics) because it enters the integral representation for "outgoing" waves. Its complex conjugate is used for incoming waves.

Behavior of delta functions under transformations
Because the delta function is even, we can write δ(ax) = δ(|a| x). Thus we can make the transformation y = |a| x to obtain
∫ δ(ax) f(x) dx = ∫ δ(y) f(y/|a|) dy/|a| = f(0)/|a|.
This can be restated as a theorem
δ(ax) = δ(x)/|a|.
The denominator is simply the magnitude of the Jacobian of the transformation to the new variable y = ax. A natural generalization to the case of a nonlinear transformation y = f(x) is given by
δ(f(x)) = Σ_r δ(x − x_r)/|f′(x_r)|,
where the x_r are the nondegenerate roots of f(x) = 0. This theorem follows from the fact that a delta function vanishes everywhere except at its zeros, and near each zero, we can approximate f(x) ≈ f′(x_r)(x − x_r) and apply Eq. (1.220).
A simple example of the usefulness of Eq. (1.222) is the relation
δ(x² − a²) = [δ(x − a) + δ(x + a)] / (2|a|).
Multidimensional delta functions
The concept of a delta function generalizes immediately to a multidimensional space. For example, a three-dimensional delta function has the property that
If the integral is written as three successive one-dimensional integrals in Cartesian coordinates, Eq. (1.226) is equivalent to the statement
If spherical coordinates are used, Eq. (1.226) is equivalent to
The denominator is just the Jacobian for the transformation from Cartesian to spherical coordinates. This is the natural generalization of the Jacobian found in Eq. (1.221), and guarantees that the same result, Eq. (1.226), is obtained regardless of which coordinate system is used.
Volume of a sphere in n dimensions
We shall now show how to calculate the volume of a sphere in n dimensions with the help of delta functions, and probability theory alone! The volume of an n-dimensional sphere of radius R can be written
where the Heaviside unit function confines the integration region to the interior of the sphere. If we now differentiate this equation with respect to R2 to convert the Heaviside function to a delta function, we get
Now, if we let x_i = Ru_i for all i and use the scaling property, Eq. (1.220), of delta functions, we get
where
is a constant that depends only on n. Since dV_n = R^{n−1} dR dS, where dS is a surface element, 2K_n can be interpreted as the surface area S_n of a sphere of radius unity in n dimensions. However, the normalization of the chi-square distribution in Eq. (1.155) forces K_n to take the value, Eq. (1.156),
Finally, Eq. (1.231) can be integrated to yield
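In standard form, with Γ denoting the gamma function, the result of this integration is
\[
V_n(R) = \frac{2K_n}{n}\,R^{\,n} = \frac{\pi^{n/2}}{\Gamma(n/2+1)}\,R^{\,n},
\qquad
S_n = 2K_n = \frac{2\pi^{n/2}}{\Gamma(n/2)}.
\]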
1.14 Appendix B: Solved problems
The dice player
A person bets that in a sequence of throws of a pair of dice he will get a 5 before he gets a 7 (and loses). What odds should he be given to make the bet fair?
Solution
Out of the 36 possible tosses of a pair, only the four combinations, 1 + 4, 2 + 3, 3 + 2, and 4 + 1, add to 5. Similarly, six combinations add to 7: 1 + 6, 2 + 5, 3 + 4, 4 + 3, 5 + 2, and 6 + 1. Thus in a single toss the three relevant probabilities are P5 = 4/36 and P7 = 6/36 for 5 and 7, and the probability for all other possibilities combined is P0 = 26/36. The probability of r tosses of "other", and s ≥ 1 tosses of 5, followed by a toss of a 7 is given by
where the sum on s starts at 1 to insure the presence of a 5 toss, r has been replaced by n — s and the combinatorial coefficient has been inserted to allow the "other" tosses and the 5 tosses to appear in any order. The 7 toss always appears at the end.
The sum over s can be accomplished by adding and subtracting the s = 0 term:
The two terms are simply summable geometric series. Using P5 + P7 + P0 = 1, the final result is
The corresponding result for 7 to appear first is obtained using the formulas with 5 and 7 interchanged:
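Carried out, the two geometric sums give the simple standard forms
\[
P(\text{5 before 7}) = \frac{P_5}{P_5+P_7} = \frac{4/36}{10/36} = \frac{2}{5},
\qquad
P(\text{7 before 5}) = \frac{P_7}{P_5+P_7} = \frac{3}{5}.
\]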
Thus the "house" should give 3 : 2 odds to the bettor on 5. The gold coins problem A desk has four drawers. The first holds three gold coins, the second has two gold and one silver, the third has one gold coin and two silver, and finally the fourth drawer has three silver coins. If a gold coin is drawn at random, what is (a) the probability that there is a second gold coin in the same drawer, and (b) if a second drawing is made from the same drawer, what is the probability that it too is gold? Solution Because of the distribution of the gold coins, the probability that the coin came from the first drawer is pi = 3/6 because three of the six available gold coins were in that drawer. Similarly p2 = 2/6, and p% = 1/6. The probability that there is a second coin in the same drawer is 1 x pi + 1 x p% + 0 x p% = 5/6. Similarly, the probability that the second selected coin (from the same drawer) is gold is 1 x pi + (1/2) x p2 = 2/3, since the second coin is surely one in the first drawer, and has a 1/2 chance in the second drawer, and there are no gold coins left in the third drawer. Note that these values of 1, 1/2, and 0 are conditional probabilities given the outcome of the first choice. The Bertrand paradox The Bertrand problem can be stated in the following form: A line is dropped randomly on a circle. The intersection will be a chord. What is the probability that the length of the chord will be greater than the side of the inscribed equilateral triangle?
Solution 1
The side of the inscribed triangle is a chord at a distance of half of the radius, R/2, from the center of the circle. Assuming that along the direction perpendicular to the side of the inscribed triangle the distance is uniformly distributed, the chord will be longer if the distance from the circle center is less than R/2, which it is with probability 1/2.
Solution 2
The chord will be greater than the triangle side if the angle subtended by the chord is greater than 120 degrees (out of a possible 180) which it achieves with probability 2/3.
Solution 3
Draw a tangent line to the circle at an intersection with the chord. Let φ be the angle between the tangent and the chord. The chord will be larger than the triangle side if φ is between 60 and 120 degrees, which it will be with probability (120 − 60)/180 = 1/3. Solution 2 is given in Kyburg (1969). Solution 3 is given by Uspensky (1937) and the first solution is given by both. Which solution is the correct one? Answer: the problem is not well defined. We do not know which measures have uniform probability unless an experiment is specified. If a board is ruled with a set of parallel lines separated by the diameter, and a circular disk is dropped at random, the first solution is correct. If one spins a pointer at the circle edge, the third solution would be correct.
Gambler's ruin
Hamming (1991) considers a special case of the gambler's ruin problem, in which gambler A starts with capital C, and gambler B starts with W (or more) units. The game will be played until A loses his capital C or wins an amount W (even if B is a bank with infinite capital). Each bet is for one unit, and there is a probability p that A will win and q = 1 − p that B will win.
The problem is solved using the recursion relation
where P(n) is the probability that A will win if he holds n units. The boundary conditions are Strictly speaking, the recursion relation only needs to be satisfied for 0 < n < T, which omits the two end points. However, the boundary conditions, Eq. (1.240), then lead to a unique solution. The solution to a difference equation with constant coefficients is analogous to that of a differential equation with constant coefficients. In the latter case, the solution is an exponential. In the present case, we search for a power law solution, P(n) = r^n, which is an exponential in n. The result is a quadratic equation with roots 1 and q/p. The solutions, 1^n and (q/p)^n, actually obey the recursion relation, Eq. (1.239), for all n. But they do not obey the boundary conditions. Thus we must, as in the continuous case, seek a linear combination and impose the boundary conditions of Eq. (1.240) to obtain simultaneous linear conditions on A and B. The final solution is found to be
where we must set n = C, and T = C + W, to get the probability that A at his starting position will win. If p = 1/2, the two roots for r coalesce, and, as in the differential equation case, a solution linear in n emerges. Application of the boundary conditions leads to
Since our solution obeys the boundary conditions, as well as the difference equation everywhere (hence certainly in the interior) it is clearly the correct, unique solution.
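For reference, the standard closed form of this solution, satisfying the boundary conditions P(0) = 0 and P(T) = 1, is
\[
P(n) = \frac{1-(q/p)^{\,n}}{1-(q/p)^{\,T}} \quad (p \neq q),
\qquad
P(n) = \frac{n}{T} \quad \left(p = q = \tfrac12\right),
\]
so that the probability that A wins is P(C) with T = C + W.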
2 What is a random process
2.1 Multitime probability description
A random or stochastic process is a random variable X(t), at each time t, that evolves in time by some random mechanism (of course, the time variable can be replaced by a space variable, or some other variable in applications). The variable X can have a discrete set of values X_j at a given time t, or a continuum of values x may be available. Likewise, the time variable can be discrete or continuous. A stochastic process is regarded as completely described if the probability distribution is known for all possible sets [t1, t2, ..., tn] of times. Thus we assume that a set of functions
describes the probability of finding
We have previously discussed multivariate distributions. To be a random process, the set of variables x_j must be related to each other as the evolution in time, x_j = X(t)|_{t=t_j}, of a single "stochastic" process.
2.2 Conditional probabilities
The concept of conditional probability introduced in Section 1.8 immediately generalizes to the multivariable case. In particular, Eq. (1.82)
can be iterated to yield
When the variables are part of a stochastic process, we understand x_j to be an abbreviation for X(t_j). The variables are written in time sequence since we regard the probability of x_n as conditional on the earlier time values x_{n−1}, ..., x_1.
2.3 Stationary, Gaussian and Markovian processes
A stationary process is one which has no absolute time origin. All probabilities are independent of a shift in the origin of time. Thus
In particular, this probability is a function only of the relative times, as can be seen by setting r = —ti. Specifically, for a stationary process, we expect that
and the two-time conditional probability
reduces to the stationary state, independent of the starting point when this limit exists. For the otherwise stationary Brownian motion and Poisson processes in Chapter 3, the limit does not exist. For example, a Brownian particle will have a distribution that continues to expand with time, even though the individual steps are independent of the origin of time. A Gaussian process is one for which the multivariate distributions pn(xn,xn-i, ...,xi) are Gaussians for all n. A Gaussian process may, or may not be stationary (and conversely). A Markovian process is like a student who can remember only the last thing he has been told. Thus it is defined by
that is, the probability distribution of x_n is sensitive to the last known event x_{n−1} and forgets all prior events. For a Markovian process, the conditional probability formula, Eq. (2.5), specializes to
so that the process is completely characterized by an initial distribution p(x_1) and the "transition probabilities" p(x_j | x_{j−1}). If the Markovian process is also
stationary, all p(x_j | x_{j−1}) are described by a single transition probability
independent of the initial time t_{j−1}.
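Explicitly, then, a Markovian process has the multitime distribution in the familiar chain form
\[
p_n(x_n, x_{n-1}, \ldots, x_1) \;=\; p(x_n \mid x_{n-1})\, p(x_{n-1} \mid x_{n-2}) \cdots p(x_2 \mid x_1)\, p(x_1).
\]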
2.4 The Chapman-Kolmogorov condition
We have just shown that a Markovian random process is completely characterized by its "transition probabilities" p(x_2 | x_1). To what extent is p(x_2 | x_1) arbitrary? Here the first term is the transition probability per unit time, and the second term has been added to conserve probability. It describes the particles that have not left the state
a, provided that
If we set t = t_0 + Δt_0, we can evaluate the right hand side of the Chapman-Kolmogorov condition to first order in Δt and Δt_0:
which is just the value p(a′, t_0 + Δt + Δt_0 | a_0, t_0) expected from Eq. (2.18). Note, however, that this proof did not make use of the conservation condition, Eq. (2.19). This will permit us, in Chapter 8, to apply the Chapman-Kolmogorov condition to processes that are Markovian but whose probability is not normalized.
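For reference, the Chapman-Kolmogorov condition itself is, in standard form,
\[
p(a_2, t_2 \mid a_0, t_0) \;=\; \int p(a_2, t_2 \mid a_1, t_1)\, p(a_1, t_1 \mid a_0, t_0)\, da_1,
\qquad t_0 < t_1 < t_2 .
\]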
3 Examples of Markovian processes
3.1 The Poisson process
Consider two physical problems describable by the same random process. The first process is the radioactive decay of a collection of nuclei. The second is the production of photoelectrons by a steady beam of light on a photodetector. In both cases, we can let a discrete, positive, integer valued, variable n(t) represent the number of counts emitted in the time interval between 0 and t. In both cases there is a constant probability per unit time ν such that ν dt is the expected number of photocounts in [t, t + dt] for small dt. We use the initial condition Then n − n_0 will be the number of counts in the interval [0, t]. When we talk of P(n, t) we can understand this to mean P(n, t | n_0, 0), the conditional density distribution. Since the state n(t) = n is supplied by transitions from the state n − 1 with production of photoelectrons at a rate ν dt and is diminished by transitions from state n to n + 1 we have the equation with the middle term supplying the increase in P(n) by a transition from the n − 1 state, and the last term describing the exit from state n by emission from that state. These are usually referred to as rate in and rate out terms respectively. Canceling a factor dt we obtain the rate equation In the first term, n increases from n − 1 to n, in the second from n to n + 1. Thus n never decreases. Such a process is called a birth process in the statistics literature, or a generation process in the physics literature. A more general process is called a birth and death process or a generation-recombination process. Since n ≥ n_0 we have no supply from the state P(n_0 − 1, t), so that whose solution is
since P(n, 0) = δ_{n,n_0} at time t = 0, corresponding to the certainty that n = n_0 at time t = 0. The form, Eq. (3.5), of this solution suggests the transformation
with the resultant equation
subject to the initial condition
Thus any Q(n, t) may be readily obtained if Q(n−1) is known. But n, as described by Eq. (3.3), can only increase. Thus
and Eq. (3.5) yields
Solution by induction then yields
or, setting n = no + m,
for n ≥ n_0, with a vanishing result for n < n_0. This result specializes to the usual Poisson answer
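namely, in standard form,
\[
P(n, t) \;=\; e^{-\nu t}\,\frac{(\nu t)^{n}}{n!}
\]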
for the usual case n_0 = 0 (see also Eq. (1.128)). The two formulas, Eqs. (3.12) and (3.13), are, in fact, identical since n − n_0 has the meaning of the number of events occurring in the interval (0, t). The more general form is useful in verifying
the Chapman-Kolmogorov conditions
where the last step recognizes the binomial expansion that occurred in the previous step. The final result is equal to that in Eq. (3.13) if t is replaced by (t − t_0) in the latter. The Poisson process is stationary, so that P(n, t | n_0, t_0) is a function only of t − t_0. However, no limit exists as t − t_0 → ∞, so that there is no time independent P(n). We shall therefore evaluate the characteristic function of the conditional probability density
This result reduces to Eq. (1.130) if one sets n_0 = 0. The cumulants can be calculated as follows:
where the cumulants all have the same value! Here the subscript L is used to denote the linked moment or cumulant as in Section 1.6.
3.2 The one dimensional random walk
The Bernoulli sequence of independent trials described in Section 1.9 can be mapped onto a random walk in one dimension. In Fig. 3.1 we show an array of
FIG. 3.1. Random walk on a discrete lattice with spacing a.
points at the positions ja where j = 0, ±1, ±2, etc., and a is the spacing between the points. At each interval of time, τ, a hop is made with probability p to the right and q = 1 − p to the left. The distribution of r, the number of hops to the right in N steps, is given as before by the Bernoulli distribution:
The first moment, and the second moment about the mean are given as before in Section 1.9 by
A particle that started at 0 and taken r steps to the right, and N — r to the left arrives at position with mean value Notice, if p = q = 1/2, or equal probability to jump to the right or the left, the average position after N steps will remain 0. The second moment about the mean is given by From the central limit theorem, discussed in Section 1.9, the limiting distribution after many steps is Gaussian with the first and second moments just obtained:
If we introduce the position and time variables by the relations
the moments of x are given by
The factor 2 in the definition of the diffusion coefficient D is appropriate for one dimension, and would be replaced by 2d if we were in a space of dimension d. Thus the distribution moves with a "drift" velocity
and spreads with a diffusion coefficient denned by
The appropriateness of this definition of the diffusion coefficient is made clear in Section 3.4 on "Diffusion processes and the Einstein relation". A detailed discussion of random walks in one and three dimensions is given by Chandrasekhar (1943) as well as by Feller (1957). A recent encyclopedia article by Shlesinger (1997) emphasizes recent work in random walk problems. See also A Wonderful World of Random Walks by Montroll and Shlesinger (1983). An "encyclopedic" review of "stochastic processes" is given by Lax (1997).
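As a quick numerical check of these limiting moments (a minimal sketch, not from the text; the step length a, hop time τ, probability p, number of steps, and ensemble size below are arbitrary illustrative choices), one can simulate the walk and compare ⟨x⟩ and ⟨(x − ⟨x⟩)²⟩ with vt and 2Dt:

```python
import random

# Illustrative parameters -- arbitrary choices, not taken from the text
a, tau = 1.0, 1.0             # lattice spacing and hop time
p = 0.6                       # probability of a hop to the right
q = 1.0 - p
N = 500                       # hops per walker
walkers = 5000                # ensemble size

positions = []
for _ in range(walkers):
    r = sum(1 for _ in range(N) if random.random() < p)  # hops to the right
    positions.append((2 * r - N) * a)                     # x = (r - (N - r)) a

t = N * tau
mean = sum(positions) / walkers
var = sum((x - mean) ** 2 for x in positions) / walkers

v = (p - q) * a / tau          # drift velocity
D = 2.0 * p * q * a**2 / tau   # diffusion coefficient defined by <(x-<x>)^2> = 2 D t

print(f"<x>       = {mean:10.2f}    v t   = {v * t:10.2f}")
print(f"variance  = {var:10.2f}    2 D t = {2 * D * t:10.2f}")
```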
3.3 Gambler's ruin
The problem we discussed, in connection with the second law of gambling, that of winning a specific sum W starting with a finite capital C, is referred to as the Gambler's ruin problem. To make connection to physical problems, we map the probability to a random walk problem on a line. It is distinguished from conventional random walk problems because it involves absorbing boundaries. Since the game ends at these boundaries it is also a first passage time problem - a member of a difficult class. The gambling problem with bet b and odds d can be described as a random walk problem on a line with steps to the left of size b if a loss is incurred, and a step to the right of bd if a win occurs. Instead of dealing with the probability of winning at each step, we shall define P(x) as the probability of eventually winning if one starts with capital x. Our random walk starts at the initial position C. The game is regarded as lost if one arrives at 0, i.e., no capital left to play, and it is regarded as won if one arrives at the objective C + W.
Our random walk can be described by the recursion relation:
since the right hand side describes the situation after one step. With probability p one is at position x + bd with winning probability P(x + bd), and with probability q one is at position x − b with winning probability P(x − b). Since the probability of eventual winning depends on x, but not on how we got there, this must also be the probability P(x). The procedure we have just described of going directly after the final answer, rather than following the individual steps, is given the fancy name "invariant embedding" by mathematicians, e.g., Bellman (1964). The boundary conditions we have are
We seek a homogeneous solution of this linear functional equation in exponential form P(x) = exp(λx), just as for linear differential equations. Equation (3.28) then determines λ by requiring
We establish in Appendix 3.A that there are exactly two roots, one with λ = 0, and one with λ > 0. Calling the second root λ, the general solution of Eq. (3.28) is a linear combination of 1 and exp(λx) subject to the boundary conditions, Eq. (3.29), with the result
The probability of winning is that associated with the initial capital C:
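Written out, with λ the positive root just described and with the boundary conditions P(0) = 0 and P(C + W) = 1, the solution takes the standard form
\[
P(x) = \frac{e^{\lambda x} - 1}{e^{\lambda (C+W)} - 1},
\qquad
P_{\rm win} = P(C) = \frac{e^{\lambda C} - 1}{e^{\lambda (C+W)} - 1}.
\]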
Although Eq. (3.30) does not supply an explicit expression for λ (except in the case C 0. The denominator in Eq. (3.32) then increases more rapidly with λ than the numerator. Thus
Since the condition (3.30) involves only the product λb, an increase in b causes a decrease in λ, hence an increase in P. Thus the probability of winning is an increasing function of b. Of course, at the starting position, a bet greater than C is
impossible. Thus the optimum probability is obtained if λ is calculated from Eq. (3.30) with b replaced by C:
Our arguments have tacitly assumed that no one bet requires a step outside the domain 0 < x < W + C. Thus if a game with large odds d = 2W/C were allowed, the preceding argument would not apply, and it would be appropriate to bet C/2, since the objective is to win no more than W, and to terminate the game as soon as possible, in order to minimize the total amount bet.
3.4 Diffusion processes and the Einstein relation
In the large N limit, the distribution function (3.23) for the one-dimensional random walk can be written as
where dx/dn = a is the width of one step, and
Equation (3.35) is the Green's function of the diffusion equation that is written down explicitly in Eq. (3.50) below. That is, Eq. (3.35) is the solution of Eq. (3.50) that obeys the initial condition:
Let us compare this result with the macroscopic theory of diffusion in which a concentration c of particles obeys the conservation law.
where the particle (not electrical) current density is given by Fick's law
and D is the macroscopic diffusion constant. Thus c obeys the diffusion equation
which reduces in one dimension for constant D to
The Green's function solution to this equation appropriate to a point source at x = x_0 at t = t_0 is given by
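namely, in standard form,
\[
c(x,t) \;=\; \frac{1}{\sqrt{4\pi D (t-t_0)}}\;
\exp\!\left[-\,\frac{(x-x_0)^2}{4 D (t-t_0)}\right],
\]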
in agreement with our random walk result, Eq. (3.35), for v = 0 but with the initial position at x_0.
The Einstein relation
Einstein's (1905, 1906, 1956) original idea is that if one sets up a uniform applied force F there will be a drift current
where the mechanical mobility B is the mean velocity of a particle per unit applied force. Thus the drift current per unit of concentration c is proportional to the applied field F. However, if a force F is applied in an open circuit, a concentration gradient will build up large enough to cancel the drift current
or The simplest example of this is the concentration distribution set up in the atmosphere subject to the gravitational force plus diffusion. This steady state result must agree with the thermal equilibrium Boltzmann distribution
where k is Boltzmann's constant, T is the absolute temperature, and the potential energy V is Comparison of the two expressions for c(x) yields the Einstein relation between diffusion, D, and mobility, B: For charged particles, F = eE, and the electrical mobility is μ = v/E = eB, so that a frequently stated form of the Einstein relation.
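In standard form, the two relations just described read
\[
D = B\,kT, \qquad D = \frac{\mu\,kT}{e}.
\]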
If the entire current, Eq. (3.44), is substituted into the continuity equation (3.38), one gets
where v = BF (or μE in the charged case). Equation (3.50) is a special case of a Fokker-Planck equation to be discussed in Section 8.3. We note, here, that the drift is contained in the first derivative coefficient term and the diffusion in the second derivative coefficient. The solution of this equation for a pulse starting at x = 0 at t = 0 is
which is the precise analog of the discrete random walk solution, Eq. (3.35). By injecting a pulse of minority carriers into a semiconductor and examining the response on an oscilloscope at a probe a distance down the sample, a direct measurement can be made of the "time of flight" of the pulse and the spread in its width. This technique introduced by Haynes was applied by his class, the Transistor Teacher's Summer School (1952), to verify the Einstein relation for electrons and holes in germanium. Note that Eq. (3.51) describes a Gaussian pulse whose center travels with a velocity v. Thus the center of the pulse has a position that grows linearly with time. Also, the pulse has a Gaussian shape, and the root mean square width is given by (2Dt)^{1/2}. Measurements were made by each of the 64 students in the class. The reference above contained the average results that verified the Einstein relation between the diffusion constant and the mobility. With holes or electrons injected into a semiconductor, a pulse will appear on a computer screen connected by probes to the semiconductor. For several probes at different distances, the time of arrival can be noted and the width of the pulse is measured at each probe. Thus a direct measurement is made of both the mobility μ and the diffusion constant D.
3.5 Brownian motion
The biologist Robert Brown (1828) observing tiny pollen grains in water under a microscope, concluded that their movement "arose neither from currents in the fluid, nor from its gradual evaporation, but belonged to the particle itself". MacDonald (1962) points out that there were numerous explanations of Brownian motion, proposed and disposed of in the more than 70 years until Einstein (1905, 1906, 1956) established the correct explanation that the motion of the particles was due to impact with fluid molecules subject to their expected Boltzmann distribution of velocities.
It is of interest to comment on the work of von Nageli (1879), who proposed molecular bombardment but then ruled out this explanation because it yielded velocities two orders of magnitude less than the observed velocities of order 10^{−4} cm/sec. von Nageli assumed that the liquid molecules would have a velocity given by with a molecular weight of 100, m ~ 10^{−22} gram and T ~ 300 K, so that v ~ 2 × 10^4 cm/sec. The Brownian particle after a collision can be expected to have a velocity where the mass of the Brownian particle M is proportional to the cube of its radius so that
so that V ~ (2 × 10^4)/(8 × 10^9) ~ 2 × 10^{−6} cm/sec, which is two orders of magnitude too small. Conversely, if we assume the Brownian particle to be in thermal equilibrium, Since M ~ (8 × 10^9)(10^{−22} g) ~ 10^{−12} grams, and T ~ 300 K, we have V ~ 0.2 cm/sec, which is much larger than the observed velocity of 3 × 10^{−4} cm/sec. We shall return to the resolution of this discrepancy after we have discussed Brownian motion from the Langevin (1908) point of view.
3.6 Langevin theory of velocities in Brownian motion
Our introduction to the Langevin treatment of Brownian motion comes from the paper of Chandrasekhar (1943) and the earlier paper of Uhlenbeck and Ornstein (1930), both of which are in the excellent collection made by Wax (1954). However, a great simplification can be made in the algebra if one assumes from the start that the process is Gaussian in both velocity and position. The justification is given in Appendix 3.B. The distribution of velocity is first considered. The free particle of mass M subject to collisions by fluid molecules is described by the equation (for simplicity, we discuss in a one-dimensional case, instead of the actual three-dimensional case)
It was Langevin's (1908) contribution to recognize that the total force F exerted by the fluid molecules contains a smooth part —v/B associated with the viscosity
of the fluid that causes the macroscopic motion of a Brownian particle to decay plus a fluctuating force F(t) whose average vanishes
This fluctuating part will be shown to give rise to the diffusion of the Brownian particle. The relation between the fluctuating part and the diffusion part is the Einstein relation to be derived below. It is also a special case of the fluctuationdissipation theorem to be derived in Chapter 7. Note that if a steady external force G is applied, the average response at long times is v = BG so that B is to be interpreted as the mechanical mobility. If the particle is a sphere of radius a moving in a medium of viscosity rj then Stokes law yields (in the three-dimensional case)
To simplify notation in subsequent calculations, we shall rewrite Eq. (3.56) as
For a Brownian particle of diameter 2a = 1 micron and mass M ~ 8 × 10^{−13} grams moving in a fluid of viscosity η ~ 10^{−2} poise, we estimate that 1/λ ~ 10^{−7} sec. Microscopic collisions with neighboring particles should occur in a liquid at roughly 10^{12}–10^{13}/sec. Thus the correlation
must fall off in times of order 10^{−12} sec, much shorter than the 10^{−7} sec decay time. It is therefore permissible to approximate the correlation as a Dirac delta function.
where d is an as yet unknown constant. Equation (3.59) can be solved for the velocity
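namely, in the standard form,
\[
v(t) = v_0\, e^{-\lambda t} + \int_0^t e^{-\lambda (t-s)}\, A(s)\, ds,
\qquad
\langle v(t) \rangle_{v_0} = v_0\, e^{-\lambda t},
\]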
where ⟨v(t)⟩_{v_0} is the ensemble average velocity contingent on v(0) = v_0.
The mean square velocity fluctuation ⟨(Δv)²⟩, contingent on v(0) = v_0, is then the first term on the left:
for the particular ψ(s) of Eq. (3.62). The limiting value at long times is In the limit as t → ∞, ⟨v²(t)⟩_{v_0} must approach the thermal equilibrium value This yields another Einstein relation for the one-dimensional case,
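namely (writing the correlation as ⟨A(s)A(s′)⟩ = d δ(s − s′), as above)
\[
\frac{d}{2\lambda} = \frac{kT}{M},
\qquad\text{i.e.}\qquad
d = \frac{2\lambda kT}{M},
\]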
that relates a measure d of diffusion in velocity space to the mobility B (or the dissipation). Equation (3.64) can be rewritten Thus the mean square deviation of the velocity from its mean, starting with the initial velocity v_0, namely σ_vv, is independent of the starting velocity v_0! This is a special case, with u = t, of Eq. (8.18) of Classical Noise I in Lax (1960I). For the delta correlated case, Eq. (3.62) shows that the velocity is a sum of uncorrelated (hence independent) Gaussian variables since ⟨A(s)A(s′)⟩ = 0 for s ≠ s′. Since each term is Gaussian, the sum will also be a Gaussian random variable (see Appendix 3.B). Thus the statistics of v(t) are completely determined by its mean and second cumulant since all higher cumulants vanish. Thus the conditional probability density for v(t) is given by
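In standard form, with ⟨v²⟩ = kT/M the equilibrium variance, this conditional density is
\[
P(v, t \mid v_0) = \bigl[\,2\pi \langle v^2\rangle \bigl(1 - e^{-2\lambda t}\bigr)\bigr]^{-1/2}
\exp\!\left[-\,\frac{\bigl(v - v_0\, e^{-\lambda t}\bigr)^{2}}{2\,\langle v^2\rangle\,\bigl(1 - e^{-2\lambda t}\bigr)}\right],
\]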
where ⟨v⟩_{v_0} = v_0 exp(−λt) and the unconditional average ⟨v²⟩ = kT/M is just the thermal average, independent of time. In the limit as t → ∞ we approach the steady state solution
which agrees with the equilibrium Boltzmann distribution.
Is the converse theorem true? That is, does a Gaussian steady state distribution P(v) imply that A(s) must be a Gaussian process? The answer is positive and is presented in Appendix 3.B. We calculated the velocity distribution in this section, Eq. (3.69), by assuming the velocity v(t) is a Gaussian variable even though velocity-velocity correlations exist. Our results can be justified by the fact that Δv(t) is given by the integral in Eq. (3.62), which represents a sum of uncorrelated variables since ⟨A(s)A(s′)⟩ vanishes for s ≠ s′.
3.7 Langevin theory of positions in Brownian motion
Since the position of a particle is determined by the time integral of the velocity, we would expect that the statistics of Brownian motion of a particle, that is the random motion in position space, can be determined fairly directly by a knowledge of its motion in velocity space. More generally, one would like to determine the joint distribution in position and velocity space. We shall see in this section that the manipulations to be performed involve only minor difficulties provided that the distribution in positions, velocities and the joint distribution are all Gaussians. That is because the distributions can be written down fairly readily from the first and second moments if the distributions are Gaussian in all variables. But its proof depends on the Gaussian nature of the sum of Gaussian variables. And we have only established that Gaussian nature if the variables are independent. Since positions and velocities are correlated, it is not clear whether the scaffold we have built using independent variables will collapse. To separate the computational difficulties from the fundamental one, we shall perform the calculations in this section assuming that all the variables involved are Gaussian, and reserve for Appendix 3.B a proof that this is the case. The average position may be obtained by integrating Eq. (3.62) with respect to time and averaging:
Here, all averages are understood to be taken contingent on given initial velocities and positions. Next, we calculate ((x(t) — (x(t))) 2 ). The general value of the random variable, x ( t ) , can be obtained by integrating Eq. (3.62) over time, by setting t = w in Eq. (3.62) and integrating over w from 0 to t. The expression is simplest if we subtract off the average position given by Eq. (3.70). The result takes the form
where
with Integration by parts then yields
where The fluctuations in position are then described after applying Eq. (3.61) by
where VQ has again canceled out. It is of interest to examine axx (t) for small and large t. For small t, we may expand the exponential so that
Thus for small t, x(t) is excellently described by (x(t)}. More specifically
When t > u, the integral over r should be done first in Eq. (9.35). Then the delta function is always satisfied somewhere in the range of integration. Thus
But the integral is a perfect differential, as remarked in Lax (19601):
The complete integral then yields
The absolute value |t − u|, in the first term, is unnecessary since t > u. However, when u > t, the integral must be done over s first, and both answers are given correctly by using |t − u|. Although we are dealing with a stationary process, Eq. (9.40) is not stationary (not a function only of |t − u|) because initial conditions have been imposed at t = t_0. However, if t_0 → −∞, the results approach the stationary limit
Equation (9.40) can also be rewritten by subtracting off the mean values
so that for the fluctuations about the average position:
Again the results approach the stationary results as t_0 → −∞ and u − t_0 → ∞ at fixed t − u. At t = u = t_0, the right hand side vanishes, as it should, since there is no fluctuation at the initial time at which all values are specified. The stationarity of the original expression, Eq. (9.40), is maintained if the times t, t_0, and u are all shifted by the same amount τ.
9.5 Generalized characteristic functions
In this section we will obtain the result in Section 8.7 using the Langevin approach, which was presented in Lax (1966IV), Section 2.
We can continue our discussion of homogeneous noise with linear damping using the same Langevin equation
but specifying the noise source moments:
Thus all linked-moments are assumed maximally delta correlated. The parameters A, D and Dn can be functions of time as discussed in Lax (1966IV), but we shall ignore that possibility here for simplicity of presentation. Here, L denotes the linked average (or cumulant) which is defined by
where y(s) is an arbitrary (vector) function of s. Equation (9.49) is a natural generalization of Eq. (1.51). If we insert Eqs. (9.46), (9.47), (9.48) into Eq. (9.49), we get
Here the symbol ":" means summation on all the indexes. The n = 1 term vanishes in view of Eq. (9.46). Equation (9.50) defines K(y, s) which was previously defined in the scalar case in Eq. (8.113). Instead of evaluating MQ by solving a partial differential equation, as in Section 8.8, we consider the direct evaluation of Eq. (8.114):
To do this, we write the solution of Eq. (9.45) as
If Eq. (9.52) is inserted into Eq. (9.51) and the order of the integration over u and s is reversed, we get
where
Equation (9.53) is now of the form, Eq. (9.49), for which the average is known. The final result
is in agreement with Eq. (8.126), except that here, we have explicitly dealt with the multivariable case.
9.6 Generalized shot noise
In the shot noise case, there is no damping, that is, A = 0, corresponding to a noise source equation of the form
where the η_k are random variables. We use the symbol G rather than F to remind us that ⟨G⟩ ≠ 0. The choice of linked-moments
with
is appropriate to describe Rice's generalized shot noise of Section 6.7. With this choice, the linked-moment relation of Eq. (9.58), with F replaced by G yields
One can explicitly separate the n = 1 term:
These results describe the properties of the noise source G. We are concerned, however, with the average
If we separate off the mean part of G
then
Reversing the order of integration in MQ leads, as in Eq. (9.53), to a result
where
Equation (9.65) is of the form to which Eq. (9.60) can be applied. Since we have the fluctuating part of G in place of G, the first factor on the right hand side of Eq. (9.60) should be omitted in MQ:
For the case of simple shot noise
For the case of generation and recombination,
where v = G + R is the total rate.
When inserted into Eq. (9.63), these results, Eqs. (9.69), (9.71), contain all the multitime statistics of the conditional multitime average MQ. The corresponding multitime correlations can be determined by simply differentiating MQ with respect to y(u_1), y(u_2), . . . .
9.7 Systems possessing inertia
The customary description of random systems in terms of a set of first order equations for da/dt appears to contradict the case of systems possessing inertia, for which second order equations are appropriate. However, by introducing the momentum variables, as in Hamiltonian mechanics, we convert N second order equations to 2N first order equations. Extension of our previous results to inertial systems is immediate, but purely formal, except for the fact that the set of position variables is even under time reversal:
whereas the momentum variables are odd:
and the cross-moments are clearly odd in time (see Eq. (7.14))
Hence such moments must vanish at equal time in the classical case:
In the quantum case, the commutator qp − pq = iħ forces ⟨qp⟩ = −⟨pq⟩ = iħ/2. If we set
then a = (p, q) (here a is Hermitian) obeys a set of first order equations
corresponding to the second order equation
where and F is an external force. In the presence of noise, random forces can be added to the right hand side of Eqs. (9.78) and (9.79). Hashitsume (1956) presented heuristic arguments that no noise source should be added to the momentum-velocity
relation. However, a proof is required, which we shall make based on the Einstein relation. Equations (9.78) and (9.79) correspond to the decay matrix
In the equilibrium case, we can write
The nonequilibrium case is discussed in Lax (19601), Section 10. The Einstein relation Eq. (9.18) in the equilibrium case then yields
The presence of elements only in the lower right hand corner means that, as desired, random forces only enter Eq. (9.79). If F in Eq. (9.79) is regarded as the random force then
may be regarded as a different way of stating the Einstein relation and the fluctuation-dissipation theorem.
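As a concrete sketch (in the notation of Section 3.6, with mobility B and a potential V; this illustrative form is consistent with the statements above but is not a transcription of the displayed equations of this section), a particle with inertia obeys
\[
\frac{dq}{dt} = \frac{p}{M}, \qquad
\frac{dp}{dt} = -\frac{1}{B}\,\frac{p}{M} - \frac{\partial V}{\partial q} + F(t),
\qquad
\langle F(t)\,F(u) \rangle = \frac{2kT}{B}\,\delta(t-u),
\]
with no random force added to the equation for dq/dt, in accord with Hashitsume's argument.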
10 Langevin treatment of the Fokker-Planck process
10.1 Drift velocity
In Chapter 9, the Langevin processes are discussed based on the Langevin equations in the quasilinear form, Eqs. (9.1), (9.2). In this chapter, we consider the nonlinear Langevin process defined by
The coefficients B and σ may explicitly depend on the time, but we will not display this time dependence in our equations. We will now limit ourselves to Langevin processes which lead back to the ordinary Fokker-Planck equation, i.e., a generalized Fokker-Planck equation that terminates with second derivatives. We shall later find that the classical distribution function of the laser, which corresponds to the density matrix of the laser, obeys, to an excellent approximation, an ordinary Fokker-Planck equation. We assume
The Gaussian nature of the force f(t) is implied by Eq. (10.4); it is needed for a conventional Fokker-Planck process, namely one with no derivatives higher than the second. It is possible to construct a Langevin process which can reproduce any given Fokker-Planck process and vice versa. The process defined by Eq. (10.1) is Markovian, because a(t + Δt) can be calculated in terms of a(t), and the result is uninfluenced by information concerning a(u) for u < t. The f's at any time t are not influenced by the f's at any other t′, see Eqs. (10.2)-(10.4), nor are they influenced by the a's at any previous time, since f is independent of previous a. Thus the moments D_n in Section 8.1 can be calculated from the Markovian expression:
The difference between the unlinked and linked averages in Eq. (10.5) vanishes in the limit At —> 0. Rewriting Eq. (10.1) as an integral equation and denoting we have the integral equation:
We shall solve this equation by iteration. In the zeroth approximation we set a(s) = a(t) = a which is not random in averages taken subject to a(t) = a. Our first approximation is then
The first term is already of order At and need not be calculated more accurately. In the second term of Eq. (10.7) we insert the first approximation
Retaining only terms of order Δt or f², but not fΔt or higher, we arrive at the second and final approximation
Let us now take the moments, D_n. For n > 2, using Eq. (10.4), to order Δt, we have Thus from Eq. (10.5) all D_n = 0 for n > 2. For n = 2, using Eqs. (10.2)-(10.5), we obtain
and for n = 1,
The double integral in Eq. (10.14), evaluated using Eq. (10.3), is half that in Eq. (10.12), since only half of the integral over the delta function is taken. Integration over half a delta function is not too well defined. From a physical point of view, we can regard the correlation function in Eq. (10.3) to be a continuous symmetric function, such as a Gaussian of integral unity. As the limit is taken with the correlation becoming a narrower function, the integral does not change from 1/2 during any point of the limiting process. Equations (10.13) and (10.14) have shown that given a Fokker-Planck process, described by a drift vector A(a) and a diffusion D(a), we can determine the functions B(a) and σ(a):
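A minimal sketch of these relations in the scalar case, under the normalizations ⟨f(t)f(u)⟩ = δ(t − u) and ∂P/∂t = −∂[A P]/∂a + ½ ∂²[D P]/∂a² (the factors of 2 depend on these conventions), is
\[
D(a) = \sigma(a)^2, \qquad
B(a) = A(a) - \tfrac12\,\sigma(a)\,\frac{\partial \sigma(a)}{\partial a},
\]
with the converse (Stratonovich) statement A(a) = B(a) + ½ σ(a) σ′(a).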
that leads to a Langevin process with the correct drift A(a) and diffusion D(a) of the associated Fokker-Planck process described by Eqs. (10.1). The reverse is also true. Given the coefficients B and σ(a) in the Langevin equation, we can construct the corresponding coefficients, A and D, of the Fokker-Planck process.
10.2 An example with an exact solution
The procedure used in the above section may be regarded as controversial, because we have used an iterative procedure which is in agreement with Stratonovich's (1963) treatment of stochastic integrals, as opposed to Ito's (1951) treatment. This disagreement arises when integrating over random processes that contain white noise, the so-called Wiener processes. We shall therefore consider an example which can be solved exactly. Moreover, the example will assume a Gaussian random force that is not delta correlated, but has a finite correlation time. In that case, there can be no controversy about the results. At the end, however, we can allow the correlation time to approach zero. In that way we can obtain exact answers even in the white noise limit, without having to make one of the choices proposed by Ito or Stratonovich, as discussed in Lax (1966IV), Section 3.
The example we consider is:
where μ and σ are constants, independent of a and t, and R(t − u) is an analytical function. Thus our problem is linear and Gaussian but not delta correlated; hence, the noise is not white. This example was previously presented (with a slight difference in notation) in Lax (1966III), Section 6C and Lax (1966IV), Section 3:
from Eq. (10.17). Therefore the ensemble average given by
can be evaluated using Eq. (9.49) in terms of the linked average:
The average is then expressed in terms of the cumulants. But for the Gaussian case, the series in the exponent terminates at the second cumulant:
We make a transformation of Eq. (10.18) to variable x, y:
It was permissible, here, to obey the ordinary rules of calculus in this transformation, without requirement of Ito's calculus lemma, because delta function correlations are absent. The equation of motion for x is
Since x would be constant in the absence of the random force f ( t ) , then the probability of x at time t, P(x, t), is necessarily Gaussian, and determined only by the
random force f(t), and therefore has the normalized solution form
where the second moment from Eq. (10.23) must be:
where H is an abbreviation for H(t}. Using Eq. (10.22), and changing back to the original random variable a, Eq. (10.24) leads to
Equation (10.27) is valid even in the non-Markovian case of a continuously differentiable f(t). In this case, ⟨H²⟩ is proportional to t² when t → 0, and its first derivative with respect to t is zero. If the process is Markovian, f(t) will be delta correlated and ⟨H²⟩ will be linear in t.
10.3 Langevin equation for a general random variable
Let us consider an arbitrary function M(a, t) of the random variable a, which obeys the Langevin equations: We ask what is the Langevin equation for M? Following the procedure in Section 8.2, the drift vector for M in the Fokker-Planck process is determined by
where, from Eq. (10.15) and Eq. (10.16), and
Equation (10.30) is terminated to n = 2 for an ordinary Fokker-Planck process.
The fluctuation term for M is determined by
The drift term in the Langevin equation for M is given by
We obtain that
The Langevin equation for M is given by
Therefore, the transform in our Langevin equation obeys the ordinary calculus rule. The average is contributed not only from B(M,t] but also from the second term
which is not zero unless σ(M) is a constant. For the conditional average with M(t) = M, the contribution from the second term is
We consider an example which will be used in Chapter 16 for applications in the finance area. If S is the price of a stock, the fundamental Langevin model is built for the fractional change of the stock value, dx = dS/S:
where μ and σ are not functions of x.
Our Langevin equation for S is simply
Hence, Eq. (10.37) can be simply obtained from Eq. (10.36) by multiplying by S. To obtain the average d⟨S⟩/dt, or the corresponding conditional average with S(t) = S, we have
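A brief sketch of the resulting pair of equations and the averaged growth law, under the Stratonovich-type conventions of this chapter (with W a Wiener process; the factor σ²/2 depends on these conventions), is
\[
dx = \mu\,dt + \sigma\,dW, \qquad
dS = S\,(\mu\,dt + \sigma\,dW), \qquad
\frac{d\langle S\rangle}{dt} = \Bigl(\mu + \tfrac{\sigma^2}{2}\Bigr)\langle S\rangle .
\]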
and for the conditional average, ⟨S⟩ in the last expression of Eq. (10.38) is replaced by S.
10.4 Comparison with Ito's calculus lemma
The stochastic differential equations, in which Ito's calculus lemma is used for the transform of random variables, are broadly applied in the financial area. Ito's stochastic equation is written as
where dz is the differential of a Wiener process of pure Brownian motion, namely one whose mean remains zero and whose standard deviation is (Δt)^{1/2}. Ito's stochastic equation for an arbitrary function M(a, t) is written as
where A(M, t) is determined by Ito's calculus lemma:
and a(M)n is determined by
Our Eq. (10.30) is similar to Ito's calculus lemma, which indicates that the ordinary calculus rule is not valid for Fokker-Planck processes, as shown in Section 8.2. However, the calculus rule for our Langevin equation is not Ito's calculus lemma. The difference between our Langevin stochastic equation and that using Ito's lemma originates from the different definitions of the stochastic integral. Ito's integral tc and write Eq. (10.62) as
i.e., the average conditional on a(i) = a at time t.
11 The rotating wave van der Pol oscillator (RWVP)
11.1 Why is the laser line-width so narrow?
The question has been raised with an entirely different meaning by Scully and Lamb (1967). However, optical lasers with frequencies as high as 10^{15} radians per second can have line-widths of 10^4 and even smaller. The addition of nonlinearity to a system generally leads to combination frequencies, but not to an extremely narrow resonance. The clue for the solution of this difficulty is described in Lax's 1966 Brandeis lectures (Lax 1968), in which Lax considered a viable classical model for a laser containing two field variables (like A and A†), two population levels such as the upper and lower level populations in a two level atomic system, plus a pair of atomic polarization operators that represent raising and lowering operators. When Lax took this nonlinear 6 by 6 system, sought a stationary state, and examined the deviations from the stationary operating state in a quasilinear manner, he discovered that there is always one nonstable degree of freedom: a phase variable that is a combination of field and atomic phases. The next step was the realization that this degree of instability is not an artifact of the particular example, but a necessary consequence of the fact that this system, like many others, including Hewlett-Packard radio-frequency oscillators, is autonomous, namely that the equations of motion contain no time origin and no metronome-like driving source. Mathematically, this means that the system is described by differential equations that do not contain explicit time dependence. As a consequence, if x(t) is a solution, where x(t) is a six-component vector, then x(t + τ) is also necessarily a solution of the differential equation system. But this means that the solution is unstable to a time shift, or more pictorially to a frequency shift. Under an instability, a new, perhaps sharp line can occur, as opposed to the occurrence of summation or difference lines that arise from nonlinearity. Hempstead and Lax (1967CVI) illustrate the key idea in a simpler system, the union of a positive and negative impedance. In this chapter, we will first describe this nonlinear model, build the corresponding differential equation of motion in Section 11.2, and transform this equation to a dimensionless canonical form in Section 11.4. In Section
11.3, the diffusion coefficient in a Markovian process is defined, and the condition for the validity of this approximation is described. In Section 11.5 the phase fluctuations and the corresponding line-width are calculated. The main result, the line-width W obtained in Eq. (11.66), is shown to be very narrow. In Section 11.6, the amplitude fluctuations are calculated using a quasilinear treatment. In Sections 11.7 and 11.8, the exact solutions for the fluctuations are calculated based on the Fokker-Planck equation of this model.
11.2 An oscillator with purely resistive nonlinearities
We consider a self-sustained oscillator, which is modeled by connecting a positive and a negative resistance in series with a resonant circuit, as shown in Fig. 11.1. A more general case was considered in Lax (1967V). The negative resistance is a source of energy, like the pump in a laser. A steady state of energy flow from the negative to the positive resistance is achieved at the oscillation intensity at which the sum of the positive and negative resistances vanishes, and the frequency of oscillation stabilizes at the frequency at which the total reactance vanishes. In addition to the standard resonant circuit terms L dI/dt + Lω_0² Q, we assume a resistivity function R(p)I. Therefore our equation of motion is
where e(t) is a real Gaussian random force.
and p is essentially the energy stored in the circuit, which is defined by
where with A being complex. By taking our nonlinearity R(p) to be a function of the energy stored in the circuit, but not of the current or of the charge, we omit terms that vary as exp(2iω_0 t), etc.; thus we have made the rotating wave approximation. By definition and using Eqs. (11.1), (11.4), (11.5) we obtain
where
FIG. 11.1. A self-sustained oscillator is modeled by connecting a positive and a negative resistance in a resonant circuit. The negative resistance is a source of energy. A steady state of energy flow from the negative to the positive resistance is achieved at the oscillation intensity at which the sum of the positive and negative resistances vanishes, and the frequency of oscillation stabilizes at the frequency at which the total reactance vanishes.
It is consistent with the rotating wave approximation only to retain the slowly varying parts of A and A*, so that the term in A* e^{2iω_0 t} in Eq. (11.6) is dropped, leaving
We first use the equations for A and A* to determine the operating point, i.e.,
From Eq. (11.8), the operating point, p_00, is therefore determined by
or, using Eq. (11.2), that We call this operating point p_00 because we will later find a slightly better one, p_0, using a different reference frame. From Eq. (11.8) we obtain the equation of
motion of A*, where We are doing this problem thoroughly because we will find that this classical random process, which is associated with quantum mechanical problems like the laser in the difficult region near threshold, reduces to the Fokker-Planck equation for this rotating wave van der Pol oscillator.
11.3 The diffusion coefficient
Noticing that e_−(t) and e_+(t) are random forces, we now calculate the diffusion coefficients D_{−+} = D_{+−} defined by
Using the definition of the δ function, we have
Equation (11.15) is an appropriate definition for processes that are Markovian over a time interval ΔT ≫ (ω_0)^{−1} (see Lax 1966IV, Section 5 for a more detailed discussion of processes containing short, nonzero correlation times). Equation (11.15) can be rewritten as
Using the definition of G(e, ω_0) in Eq. (4.50), we see that the diffusion constant in the limit of ΔT → ∞ is
and describes the noise at the resonant frequency ω_0. If we had chosen
then our spectrum would be that of white noise, i.e., independent of frequency. In the case of a not exactly Markovian process we assume that the spectrum does not vary too rapidly about ω_0 (see Fig. 11.2), and thus we can approximate it by
FIG. 11.2. In the case of a not exactly Markovian process we can approximate it by white noise evaluated at the frequency ω_0.
white noise evaluated at the frequency ω_0. The spectrum of the noise source is not necessarily white, but only the change over the frequency width of the oscillator is important, and that change may be small enough to permit approximating the noise as white. Indeed, the situation is even more favorable in a laser containing N photons: the line width will be smaller than the ordinary resonator line-width by a factor of N. In general, for an oscillatory circuit such as shown in Fig. 11.1, it is essential to choose ΔT to be large compared to the period of the circuit, but it is often chosen small compared to the remaining time constants to avoid nonstationary errors. The condition for the validity of this choice is actually
where 1/Λ is the relaxation time associated with the system and δω is the frequency interval
over which the noise displays its nonwhiteness. To order (ω_0 ΔT)^{−1}, we have shown in Lax (1967V) that
Thus we actually require two conditions, Eq. (11.19) and the less stringent condition (ω_0 ΔT)^{−1} ≪ 1.
11.4 The van der Pol oscillator scaled to canonical form
The oscillator shown above is a rotating wave oscillator, but not a van der Pol oscillator, since Eq. (11.8) has an arbitrary nonlinearity R(p). Therefore we expand R(p) about the operating point, forming the linear function
We shall later discuss the condition under which this approximation is valid. We now perform a transformation
and where
and Then Eq. (11.8) becomes a canonical form:
where
and
The coefficients ξ and Γ were determined by the requirement that A′ and h satisfy Eqs. (11.27) and (11.29). The condition for neglect of higher terms in the expansion of R(p) about the operating point is where
Inserting Eq.
we require
If the noise ⟨e²⟩_{ω_0} is weak, as it usually is, the noise width of Δp = ξ² will be small compared to the region δp over which R(p) changes appreciably. In physical terms the width ξ² of the fluctuations (in p) is small compared to the region δp characterizing the nonlinearity. Thus over this region the linear expansion of R(p), resulting in the van der Pol equation, is valid.
11.5 Phase fluctuations in a resistive oscillator
In Eq. (11.10) we found that an oscillator chooses to operate at a point at which its net resistivity and its line-width vanishes. Noise in a stable nonlinear system would add to this signal possessing a delta function spectrum, but not broaden it. Fortunately, an autonomous oscillator (described by a differential equation with time independent coefficients) is indifferent to a shift in time origin and thus is unstable against phase fluctuations. These unstable phase fluctuations will broaden the line, whereas amplitude fluctuations only add a background. For the purpose of understanding phase broadening, therefore, it is adequate to linearize with regard to amplitude fluctuations. Indeed, for a purely resistive oscillator, there is no coupling (at the quasilinear level) between amplitude and phase fluctuations. At least in the region well above threshold, then, when amplitude fluctuations are small, it is adequate to treat phase fluctuations by neglecting amplitude fluctuations entirely. If in Eq. (11.8) we introduce
then the phase is found to obey the equation
from which R(p) has disappeared. Amplitude fluctuations are neglected by setting u = 0. The only vestige of dependence on R(p) is through po, which is \A\^ at the operating point. Equation (11.34), with u = 0, is a differential equation containing no time origin and no metronome-like driving source. As a consequence if x(t) is a solution, then x(t + T) is also necessarily a solution. This means that the solution is unstable to a time shift. po could be replaced by the more accurate (p). Since R(p) no longer enters the problem, we can with no loss of generality work with the dimensionless variables introduced in Section 11.4 for the RWVP oscillator. Dropping the primes in Eq. (11.27), and defining p = (p)/£ 2 , Eq. (11.34) (with u = 0) takes the dimensionless form
of a Langevin problem in φ alone. Since ε(t), hence h(t), has been assumed Gaussian, with vanishing linked moments for n > 2, Eq. (11.35), as shown in Chapter 10, reduces to a
Fokker-Planck process. The diffusion coefficient D is given by Eq. (8.14),
Using Eq.
Since we already have the product of two h's, using Eqs. (11.29) and (11.30) we obtain
Therefore D(φ) is independent of φ below threshold. We will now calculate the amplitude fluctuations. In the quasilinear (QL) approximation, the decay constant for amplitude fluctuations is
where the subscript a denotes the amplitude. Using Eq. (11.81), and then Eq. (11.82),
A better approximation, the intelligent quasilinear approximation (IQL), is obtained by replacing p₀ by ⟨p⟩, the exact (or experimental) mean number:
We note that we need not have used the quasilinear approximation, as we could have solved this problem exactly, using the drift vector Eq. (11.81) and the diffusion coefficient, Eq. (11.78) to obtain the Fokker-Planck equation:
11.7
Fokker-Planck equation for RWVP
A complete Fokker-Planck equation including phase and amplitude fluctuations can be obtained from Eq. (11.27) after dropping the primes:
The last term is obtained from D_{AA*}(∂²P/∂A∂A*) + D_{A*A}(∂²P/∂A*∂A); from Eq. (11.29), D_{AA*} = D_{A*A} = 2. Alternatively, if one prefers real variables, one can set
in Eq. (11.27) to obtain the real Langevin equations
and
with
with
Alternatively, with A = r exp(-iφ(t)), by
The voltage noise is given by
and the total voltage fluctuation may be obtained by replacing the integral by unity. The total noise, which only involves
W_{a'a} represents the transition probability for a "collision" which carries the particle from state a to state a'. Considering the Pauli principle for electrons and holes in semiconductors, the transition rate W_{a'a} is replaced by
The master transition probability from occupation number n(a) to n'(a) can be written as
We now perform the calculation, for the bth state, of the first moment of the transition probability defined by Eq. (8.10):
If one inserts Eq. (12.52) and sums first over n', the only terms in the sum which contribute are those for which a = b or a' = b,
Therefore, from Eqs. (8.22) and (12.51),
In a steady state, one is tempted to make the terms on the right hand side of Eq. (12.55) cancel in pairs by detailed balance:
However, if one requires that three states a,b,c be consistent with one another under this requirement, one finds that
must satisfy the consistency requirement f(c, a) = f(c, b) f(b, a), a functional equation whose only solution is of the form
Thus if the ratio of forward to reverse transition probabilities has the form Eq. (12.58), then the steady state solution has the form
or
Even in the nonequilibrium case, we can choose to define a quasi-Fermi level λ = exp(ζ/kT). In the thermal equilibrium case, g(a) = exp(-E(a)/kT), which leads to the conventional equilibrium result, Eq. (12.48).
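This consistency argument is easy to check numerically on a small system. The three-state energies and symmetric attempt factors below are arbitrary illustrative choices (none of the numbers or the symbol s_ab come from the text); the sketch builds rates whose forward-to-reverse ratio is g(a')/g(a) and verifies that the steady state of the master equation is proportional to g and obeys detailed balance.

```python
# Small numerical sketch of the argument above (illustrative values only):
# if forward/backward rates satisfy W(a',a)/W(a,a') = g(a')/g(a), the steady
# state of the master equation is proportional to g and obeys detailed balance.
import numpy as np

E = np.array([0.0, 0.3, 1.1])          # assumed state energies (units of kT)
g = np.exp(-E)                          # weights g(a)
s = np.array([[0.0, 2.0, 0.5],
              [2.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])         # assumed symmetric "attempt" factors

W = s * g[:, None]                      # W[a', a] = s_{a'a} g(a')  -> ratio g(a')/g(a)
A = W - np.diag(W.sum(axis=0))          # master-equation generator, dP/dt = A P

w, v = np.linalg.eig(A)                 # steady state = null vector of A
P = np.real(v[:, np.argmin(np.abs(w))])
P /= P.sum()

print("steady state:      ", P)
print("g normalized:      ", g / g.sum())                       # the two agree
print("detailed-balance residuals:\n", W * P[None, :] - (W * P[None, :]).T)
```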
In general one obtains a steady state in which neither equilibrium nor detailed balance occurs. In any case, for small deviations from a steady state, we set
and rewrite Eq. (12.55) in the form
where
The equation for the second moments, according to Eq. (8.23), is
By using the definition Eq. (12.52), one obtains
for c ≠ b. The two terms in D_{bc} are equal under detailed balance but not otherwise. For c = b, we obtain
The steady state second moments ⟨Δn(b)Δn(c)⟩ are now chosen so that the right hand side of Eq. (12.64) vanishes, i.e., so that the Einstein relation is obeyed. Assuming that there is no correlation between fluctuations in different states, we
try a solution of the form
and we find that the ansatz satisfies the Einstein relation of Eq. (12.64), provided the steady state obeys detailed balance, Eq. (12.56). Using Eq. (12.59), we have
which leads to
The total number of systems is N = Σ_c n(c), which yields a formula similar to the thermodynamic case, Eq. (12.38). In our model, however, the total number N should be fixed, and we need to enforce a constraint. The solution Eq. (12.68) we found is only a particular solution; to it can be added a solution of the homogeneous equation
For the Boltzmann case (ε = 0), the solution Eq. (12.68) is replaced by
which obeys the constraint ⟨[Σ_a Δn(a)] Δn(c)⟩ = 0. For the Fermi and Bose cases, we have
The added term is of order 1/N and therefore is unimportant in calculating the fluctuations in any small portion of a large system. However, this term does affect fluctuations in appreciable parts of a system. Equation (12.50) can be readily applied to the case in which n(1) = N, the number of conduction electrons, n(2) = 𝒩, the number of trapped electrons, and n(3) = N_v - P, the number of electrons in the valence band, where N_v is the number of valence band states and P is the number of holes. Thus
We have assumed nondegeneracy for the holes and the free electrons, but not for the trapped electrons. Since Δn(3) = -ΔP, we can write the second moments in
the form
These results are consistent with the constraint ΔP = ΔN + Δ𝒩 of charge neutrality. If the number of traps is zero, they reduce to ΔP = ΔN and
12.5
Influence of drift and diffusion on modulation noise
To concentrate on the influence of drift and diffusion on density fluctuations and modulation noise, let us return to the trap-free case discussed in Section 12.3, where ΔN = ΔP. The spectrum of voltage noise already given in Eq. (12.35) can be rewritten as a product
of the total noise
and the normalized spectrum
where
Note that this normalization is four times that used for g(ω) in Lax and Mengert (1960). For simplicity, we confine ourselves to a one-dimensional geometry, as
was done by Hill and van Vliet (1958), and calculate the total hole fluctuation as
We can now apply our techniques to continuous parameter systems by replacing the sum
by the integral
Introducing a more convenient notation for the Green's function
we can write
so that the correlation at two times is, as usual, related to the pair correlation at the initial time
It is customary to treat fluctuations at the same time at two places as uncorrelated. This is clearly the case for independent carriers. It is less obvious when Coulomb attractions (say between electrons and holes) are included. It was shown, however, in Appendix C of Lax and Mengert (1960) that a delta function correlation is valid, as long as we are dealing with distances greater than the screening radius. Thus we can take
where the coefficient of the delta function is chosen so that the fluctuation in the total number of carriers ⟨(ΔP)²⟩ is given correctly by Eq. (12.82). Here L is the distance between the electrodes.
The definition, Eq. (12.34), of Φ(t) yields the expression
If the Green's function is defined appropriately, as in Lax (1960I), to vanish for t < 0, it will obey an equation of the form
where the operator Λ is defined, in the continuous variable case, by
Here, v and D are the bipolar drift velocity and diffusion constants found by van Roosebroeck (1953) to describe the coupled motion of electrons and holes while maintaining charge neutrality
where the individual diffusion constants and mobilities are related by the Einstein relation. Equation (12.95) for the Green's function can be solved by a Fourier transform method
where
Here Λ(k) are the eigenvalues of the Λ operator
With Eq. (12.99) for k, the after-effect function can be calculated from Eq. (12.94)
where z = kL/2. Thus the spectrum, Eq. (12.85), is
where Λ has been re-expressed as a function of z. Lax and Mengert (1960) provide an exact evaluation of this integral. However, the resulting expressions are complicated. It is therefore worthwhile to treat some
limiting cases. For example, if there is no diffusion, then
and the after-effect function is given by
where T_a = L/v is the transit time, and the spectrum is governed by a windowing factor W, with the window factor given by
Indeed, the current noise, in this special case, can be written in the form given by Hill and van Vliet (1958)
which emphasizes the similarity to shot noise. The equivalent current is defined by
The window factor still takes the complicated form
where r = 1/τ. Even this result is complicated to understand. If we take the limiting case when recombination is unimportant over the transit time, the result simplifies to
a windowing factor similar to that associated with the effect of transit time on shot noise.
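A numerical sketch of this drift-only limit is given below. The after-effect function Φ(t) = (1 - t/T_a) exp(-t/τ) on 0 ≤ t ≤ T_a is an assumed transit-time form in the spirit of Hill and van Vliet (1958), not the text's exact expression, and the parameter values are illustrative; it shows that when recombination is slow the window is controlled by the transit time alone, as for transit-time-limited shot noise.

```python
# Hedged sketch: window factor from an assumed transit-time after-effect
# function Phi(t) = (1 - t/Ta) * exp(-t/tau) on 0 <= t <= Ta.
import numpy as np

def window_factor(omega, Ta, tau):
    t = np.linspace(0.0, Ta, 4001)
    phi = (1.0 - t / Ta) * np.exp(-t / tau)
    # one-sided cosine transform of Phi, normalized to unity at omega = 0
    num = np.trapz(phi * np.cos(np.outer(omega, t)), t, axis=1)
    return num / np.trapz(phi, t)

Ta = 1.0                                   # transit time, arbitrary units
omega = np.array([0.0, 1.0, 3.0, 10.0, 30.0]) / Ta
print(window_factor(omega, Ta, tau=50.0))  # slow recombination: set by Ta alone
print(window_factor(omega, Ta, tau=0.2))   # strong recombination modifies the window
```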
In the opposite limit, in which diffusion is retained but drift is neglected, the exact result for the spectrum is given by
where
and is the reciprocal of the diffusion length. The exponential term represents an interference term between the two boundaries that is usually negligible, since they are separated by substantially more than a diffusion length. A simple approximate form over intermediate frequencies is
In summary, in addition to the first term, which represents the volume noise easily computed just by using the total carrier number P(t), the term proportional to an inverse frequency to the three-halves power arises from diffusion across the boundaries at the electrodes.
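The three-halves law can be checked numerically. The sketch below assumes the eigenvalue Λ(k) = Dk² + 1/τ and a [sin(kL/2)/(kL/2)]² window, forms consistent in spirit with Eqs. (12.99)-(12.101) but chosen here as assumptions, and confirms that the resulting spectrum falls off with a log-log slope close to -3/2 at intermediate frequencies.

```python
# Hedged numerical sketch: with an assumed diffusion-recombination eigenvalue
# Lambda(k) = D*k**2 + 1/tau and a sin^2(z)/z^2 window, the spectrum falls off
# roughly as omega**(-3/2) at intermediate frequencies, the signature of
# diffusion across the electrode boundaries.
import numpy as np

D, L, tau = 1.0, 100.0, 1e8                 # illustrative parameters
k = np.linspace(1e-6, 10.0, 200_001)
z = k * L / 2.0
window = (np.sin(z) / z) ** 2

def spectrum(omega):
    Lam = D * k**2 + 1.0 / tau
    return np.trapz(window * 2.0 * Lam / (Lam**2 + omega**2), k)

omegas = np.logspace(-1.5, 0, 20)
S = np.array([spectrum(w) for w in omegas])
slope = np.polyfit(np.log(omegas), np.log(S), 1)[0]
print(f"log-log slope at intermediate frequencies: {slope:.2f}")  # close to -1.5
```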
13
Random walk of light in turbid media
Light propagation in a multiple scattering (turbid) medium such as the atmosphere, colloidal suspensions and biological tissue is commonly treated by the theory of radiative transfer; see, for example, Chandrasekhar (1960). Recent advances in ultrafast lasers and photon detectors for biomedical imaging and diagnostics have revitalized the interest in radiative transfer (Alfano 1994; Yodh et al. 1997; Gandjbakhche 1999). The basic equation of radiative transfer is the elastic Boltzmann equation, a nonseparable integro-differential equation of first order for which an exact closed form solution is not known, except for the case of isotropic scatterers, as far as the authors know (Hauge 1974). Solutions are often based on truncation of the spherical harmonics expansion of the photon distribution function or resort to numerical calculation, including Monte Carlo simulations (Ishimaru 1978; Cercignani 1988). In this chapter, we shall treat light propagation in turbid media as a random walk of photons and determine the characteristics of light propagation (center position and diffusion coefficients) directly from an analysis of the random walk performed by photons in the turbid medium (Xu et al. 2004). In the next chapter, a more advanced approach solving the elastic Boltzmann equation based on a cumulant expansion of the photon distribution function will be presented.
13.1
Introduction
Clouds, sea water, milk, paint and tissues are some examples of turbid media. A turbid medium scatters light strongly. Visible light shed on one side of a cup of milk appears much weaker and diffuse when observed on the other side of the cup, because light is highly scattered in milk while the absorption of light by milk is very low. The scattering and absorption properties of a turbid medium are described by the scattering and absorption coefficients μ_s and μ_a, respectively. Their values depend on the number density of scatterers (absorbers) in the medium and the cross-section of scattering (absorption) of each individual scatterer (absorber). For a collimated beam of intensity I₀ incident at the origin and propagating along the z direction inside a uniform turbid medium, the light intensity in the forward direction at
position z is attenuated according to Beer's law, I(z) = I₀ exp(-μ_T z), where μ_T = μ_s + μ_a is the total attenuation coefficient. The portion of light propagating in the exact forward direction is usually called "ballistic light". The reduction of the intensity of ballistic light comes from the scattering of light into other directions (called "multiply scattered light" or "diffusive light") and light absorption in the medium. Inside the turbid medium, ballistic light decays exponentially and only multiply scattered light survives over some distance away from the incident light source.
The theory to treat propagation of multiply scattered light in a turbid medium is the theory of radiative transfer (Chandrasekhar 1960). Due to the difficulty in solving the elastic Boltzmann equation which governs radiative transfer, in particular in a bounded volume, solutions are often based on truncation of the spherical harmonics expansion of the photon distribution function or resort to numerical calculation, including Monte Carlo simulations (Ishimaru 1978; Cercignani 1988). Monte Carlo methods treat photon migration as a Markov stochastic process. The solution to the elastic Boltzmann equation is equivalent to the probability of finding a photon at any specified location, direction and time in the Monte Carlo simulation. The advantage of the Monte Carlo method is that it can easily handle, at least in principle, a bounded region, different boundary conditions and/or heterogeneity of the medium. However, Monte Carlo methods are computationally prohibitive when the size of the sampling volume becomes large.
In the stochastic picture of photon migration in turbid media, photons take a random walk in the medium and may get scattered or absorbed according to the scattering coefficient μ_s and the absorption coefficient μ_a of the medium. A phase function, P(s, s'), describes the probability of scattering a photon from direction s to s'. The free path (step-size) between consecutive events (either scattering or absorption) has an exponential distribution μ_T exp(-μ_T ℓ) characterized by the total attenuation coefficient μ_T. At an event, photon scattering takes place with probability μ_s/μ_T (the albedo) and absorption with probability μ_a/μ_T. This picture forms the basis for the Monte Carlo simulation of photon migration. Here we shall use this simple picture of a Markov stochastic process of photons to compute analytically macroscopic quantities such as the average central position and half-width of the photon distribution. The idea is to first analyze the microscopic statistics of the photon propagation direction in direction space, which is solely determined by the phase function and the incident direction of light. The connection between the microscopic statistics and the macroscopic quantities at any specified time and position is made by a "bridge", a generalized Poisson distribution p_n(t), the probability that a photon has endured exactly n scattering events before time t. In this book, we will restrict our discussion to light propagation in an isotropic turbid medium where the property of light scattering depends
on the angle between s and s' rather than on the directions themselves, and the phase function can be written in the form P(s · s').
FIG. 13.1. A photon moving along n is scattered to n' with a scattering angle θ and an azimuthal angle φ in a photon coordinate system xyz whose z-axis coincides with the photon's propagation direction prior to scattering. XYZ is the laboratory coordinate system.
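As a concrete example of such a phase function one often takes the Henyey-Greenstein form; it is an assumption here (the text does not single out any particular P(s · s')), but it allows a quick numerical check that a function of s · s' alone is normalized over the sphere and that its first moment is the anisotropy factor g introduced below.

```python
# Sketch (Henyey-Greenstein is an assumed example of a P(s.s') phase function):
# check its normalization over the sphere and that <cos(theta)> equals g.
import numpy as np

def hg(cos_theta, g):
    return (1 - g**2) / (4 * np.pi * (1 + g**2 - 2 * g * cos_theta) ** 1.5)

g = 0.9
mu = np.linspace(-1.0, 1.0, 200_001)                    # mu = cos(theta)
norm    = 2 * np.pi * np.trapz(hg(mu, g), mu)           # should be 1
mean_mu = 2 * np.pi * np.trapz(mu * hg(mu, g), mu)      # should be g
print(norm, mean_mu)
```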
13.2
Microscopic statistics in the direction space
Denote the position, direction and step-size of a photon after the ith scattering event as x^(i), s^(i) and ℓ^(i), respectively. The initial condition is x^(0) = (0, 0, 0) for the starting point and s^(0) = s₀ = (0, 0, 1) for the incident direction. The laboratory Cartesian components of x^(i) and s^(i) are X_α and S_α (α = 1, 2, 3). The photon is incident at time t₀ = 0. For simplicity the speed of light is taken as the unit of speed and the mean free path 1/μ_T as the unit of length. The scattering of photons takes a simple form in an orthonormal coordinate system attached to the moving photon itself
where n is the photon's propagation direction prior to scattering and m is an arbitrary unit vector not parallel to n (see Fig. 13.1). The distribution of the scattering angle θ ∈ [0, π] is given by the phase function of the medium and the azimuthal angle φ is uniformly distributed over [0, 2π). For one realization of the scattering event with angles (θ, φ) in the photon coordinate system, the outgoing propagation direction n' of the photon will be:
FIG. 13.2. The average photon propagation direction (vector) decreases as g^n, where g is the anisotropy factor and n is the number of scattering events.
The freedom of choice of the unit vector m reflects the arbitrariness of the xy axes of the photon coordinate system. For example, taking m = (0, 0,1), Eq. (13.2) gives
Here s_α, etc., are stated in the laboratory coordinate system. The ensemble average of the propagation direction over all possible realizations of (θ, φ) and then over all possible s^(i) in Eq. (13.3) turns out to be ⟨s^(i+1)⟩ = ⟨s^(i)⟩⟨cos θ⟩, because θ and φ are independent and ⟨cos φ⟩ = ⟨sin φ⟩ = 0. By recursion,
where g = ⟨cos θ⟩ = 1 - g₁ is the anisotropy factor (see Fig. 13.2).
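The decay of Fig. 13.2 is easy to reproduce by Monte Carlo. The sketch below samples the scattering angle from a Henyey-Greenstein phase function and uses the standard direction-update formula of photon Monte Carlo codes; both stand in for the text's Eqs. (13.2)-(13.3) and, together with the value g = 0.9, are assumptions for illustration.

```python
# Monte Carlo check of the g**n decay (a sketch; the Henyey-Greenstein sampling
# and the explicit rotation formula are standard Monte Carlo ingredients assumed
# here, not taken from the text).
import numpy as np

rng = np.random.default_rng(1)
g, n_photons, n_scat = 0.9, 200_000, 20
mu = np.zeros((n_photons, 3)); mu[:, 2] = 1.0            # incident along z

for step in range(1, n_scat + 1):
    r = rng.random(n_photons)                            # Henyey-Greenstein cos(theta)
    cos_t = (1 + g**2 - ((1 - g**2) / (1 - g + 2 * g * r))**2) / (2 * g)
    sin_t = np.sqrt(np.clip(1 - cos_t**2, 0, None))
    phi = 2 * np.pi * rng.random(n_photons)
    cx, cy, cz = mu[:, 0], mu[:, 1], mu[:, 2]
    denom = np.sqrt(np.clip(1 - cz**2, 1e-12, None))
    nx = sin_t * (cx * cz * np.cos(phi) - cy * np.sin(phi)) / denom + cx * cos_t
    ny = sin_t * (cy * cz * np.cos(phi) + cx * np.sin(phi)) / denom + cy * cos_t
    nz = -sin_t * np.cos(phi) * denom + cz * cos_t
    vert = 1 - np.abs(cz) < 1e-9                         # (near-)vertical case
    nx[vert] = sin_t[vert] * np.cos(phi[vert])
    ny[vert] = sin_t[vert] * np.sin(phi[vert])
    nz[vert] = np.sign(cz[vert]) * cos_t[vert]
    mu = np.stack([nx, ny, nz], axis=1)
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    if step in (1, 5, 10, 20):
        print(step, mu[:, 2].mean(), g**step)            # the two columns agree
```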
By squaring the third equation in Eq. (13.3) and then taking an ensemble average, we find
where g₂ = (3/2)⟨sin² θ⟩, as ⟨cos φ⟩ = 0 and ⟨sin² φ⟩ = ⟨cos² φ⟩ = 1/2. By forming a product from the first and third equations in Eq. (13.3) and then taking an ensemble average, we find
Similar equalities are obtained for x and y components as the labels are rotated due to the symmetry between x,y,z directions. The correlations between the propagation directions are hence given by
On the other hand, the correlation between s_α^(i) and s_α^(j) (j > i) can be reduced to a correlation of the form of Eq. (13.7) due to the following observation
where p(s^(j)|s^(i)) means the conditional probability that a photon jumps from s^(i) at the ith step to s^(j) at the jth step. Equation (13.8) is a result of the Chapman-Kolmogorov condition (2.17), p(s^(j)|s^(i)) = ∫ ds^(j-1) p(s^(j)|s^(j-1)) p(s^(j-1)|s^(i)), of the Markovian process and the fact that ∫ ds^(j) s_α^(j) p(s^(j)|s^(j-1)) = g s_α^(j-1) from Eq. (13.3). Combining Eqs. (13.7) and (13.8), and using the initial condition of s^(0), that is,
we conclude
where the constants f₁ = f₂ = -1 and f₃ = 2. Here we see that the autocorrelation of the x, y, or z component of the photon propagation direction approaches 1/3, i.e., scattering becomes uniform in all directions, after a sufficiently large number of scattering events (α = β and j = i → ∞), and the cross-correlation between them is always zero (α ≠ β).
13.3
The generalized Poisson distribution p_n(t)
The connection between the macroscopic physical quantities of the photon distribution and the microscopic statistics of the photon propagation direction is made by the probability, p_n(t), that the photon has taken exactly n scattering events before time t (the (n + 1)th event comes at t). We claim p_n(t) obeys the generalized Poisson distribution. This claim was previously proved by Wang and Jacques (1994):
which is the Poisson distribution for the number of scatterings, with expected rate of occurrence μ_s, multiplied by an exponential decay factor due to absorption. Here we have used 1/μ_T = 1 as the unit of length. This form of p_n(t) can be easily verified by recognizing first that p₀(t) = exp(-t) equals the probability that the photon experiences no events within time t (and the first event occurs at t); and second that the probability p_{n+1}(t) is given by
in which the first event, occurring at t', is a scattering and is followed by n scattering events up to but not including time t; this confirms Eq. (13.11) at n + 1 if Eq. (13.11) is valid at n. The total probability of finding a photon at time t
decreases with time due to the annihilation of photons by absorption.
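The generalized Poisson law can be verified directly by simulation. In the sketch below (parameter values assumed, with μ_T = 1 as in the text's units), free paths are exponential with rate μ_T and each event is a scattering with probability μ_s/μ_T; the fraction of photons that survive to time t with exactly n scatterings reproduces (μ_s t)^n exp(-μ_T t)/n!.

```python
# Minimal Monte Carlo sketch (parameter values assumed) of the generalized
# Poisson law: exponential(mu_T) free paths, scattering probability mu_s/mu_T
# per event, absorption otherwise.
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
mu_s, mu_a = 0.9, 0.1                      # mu_T = 1 in these units
mu_T, t = mu_s + mu_a, 3.0
n_photons = 200_000

counts = np.zeros(12)
for _ in range(n_photons):
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.exponential(1.0 / mu_T)
        if elapsed >= t:
            counts[min(n, 11)] += 1        # survived to t with n scatterings
            break
        if rng.random() < mu_a / mu_T:     # absorbed: photon removed
            break
        n += 1

for n in range(6):
    exact = (mu_s * t)**n / factorial(n) * np.exp(-mu_T * t)
    print(n, counts[n] / n_photons, exact)  # the two columns agree
```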
13.4
Macroscopic statistics
The average propagation direction ⟨s(t)⟩ at time t is then
Plugging Eqs. (13.4) and (13.11) into Eq. (13.14), we obtain

⟨s(t)⟩ = ẑ exp(-μ_s g₁ t) = ẑ exp(-t/l_t).    (13.15)

Here l_t = 1/[μ_s(1 - g)] is usually called the transport mean free path, which is the randomization distance of the photon propagation direction. The first moment of the photon density with respect to position is thus
revealing that the center of the photon cloud moves along the incident direction for one transport mean free path l_t before it stops (see Fig. 13.3). The second moment of the photon density is calculated as follows. Denote by p(s₂, t₂|s₁, t₁) the conditional probability that a photon jumps from a propagation direction s₁ at time t₁ to a propagation direction s₂ at time t₂ (t₂ > t₁ > 0). The conditional correlation of the photon propagation direction subject to the initial condition is given by
Denote the numbers of scattering events encountered by the photon at states (s₁, t₁) and (s₂, t₂) as n₁ and n₂, respectively. Here n₂ ≥ n₁ since the photon jumps from (s₁, t₁) to (s₂, t₂). Equation (13.17) can be rewritten as
The evaluation of the denominator in Eq. (13.18) is simple and is given by Σ_{n₂} p_{n₂}(t₂) = exp(-μ_a t₂). To evaluate the numerator in Eq. (13.18), we proceed
as follows:
where C_{n₂}^{n₁} = n₂!/[(n₂ - n₁)! n₁!] and we have repeatedly used the binomial expansion (a + b)^n = Σ_{m=0}^{n} C_n^m a^m b^(n-m).
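Finally, the statement following Eq. (13.16), that the photon cloud advances by about one transport mean free path, can be checked by assembling the pieces of this chapter into a single random walk. Everything specific below (the Henyey-Greenstein phase function, the direction-update formula, the parameter values, and the form assumed for l_t) is illustrative rather than taken from the text.

```python
# Closing sketch (all specifics assumed): a full photon random walk with
# exponential free paths, albedo mu_s/mu_T and anisotropy g.  The mean z of the
# photons still alive at a late time settles near one transport mean free path.
import numpy as np

rng = np.random.default_rng(3)
mu_s, mu_a, g = 0.98, 0.02, 0.8
mu_T = mu_s + mu_a
l_t = 1.0 / (mu_s * (1.0 - g))              # transport mean free path (assumed form)
t_max, n_photons = 25.0, 30_000

z_final = []
for _ in range(n_photons):
    z, sx, sy, sz, t = 0.0, 0.0, 0.0, 1.0, 0.0
    while True:
        step = rng.exponential(1.0 / mu_T)
        if t + step >= t_max:                       # photon survives to t_max
            z_final.append(z + sz * (t_max - t)); break
        z += sz * step; t += step
        if rng.random() < mu_a / mu_T:              # absorbed: drop the photon
            break
        r = rng.random()                            # Henyey-Greenstein cos(theta)
        ct = (1 + g**2 - ((1 - g**2) / (1 - g + 2 * g * r))**2) / (2 * g)
        st, phi = np.sqrt(max(1 - ct**2, 0.0)), 2 * np.pi * rng.random()
        if abs(sz) > 1 - 1e-9:                      # standard direction update
            sx, sy, sz = st * np.cos(phi), st * np.sin(phi), np.sign(sz) * ct
        else:
            d = np.sqrt(1 - sz**2)
            sx, sy, sz = (st * (sx * sz * np.cos(phi) - sy * np.sin(phi)) / d + sx * ct,
                          st * (sy * sz * np.cos(phi) + sx * np.sin(phi)) / d + sy * ct,
                          -st * np.cos(phi) * d + sz * ct)

print("mean z of surviving photons:", np.mean(z_final), "   l_t =", l_t)
```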